test(idn-hostname): add Bidi rule (RFC 5893) cases#934
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds additional negative test coverage for idn-hostname to ensure implementations correctly reject invalid BiDi (bidirectional) IDN label/domain patterns per RFC 5893.
Changes:
- Add new invalid-case vectors covering digit-first labels in BiDi domains, mixed-direction labels, and mixed digit sets in RTL labels.
- Apply the same new vectors across v1 and multiple optional draft test suites.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/v1/format/idn-hostname.json | Adds new invalid BiDi/IDN hostname test vectors. |
| tests/draft7/optional/format/idn-hostname.json | Mirrors the new invalid BiDi/IDN hostname test vectors for draft7 optional. |
| tests/draft2020-12/optional/format/idn-hostname.json | Mirrors the new invalid BiDi/IDN hostname test vectors for draft2020-12 optional. |
| tests/draft2019-09/optional/format/idn-hostname.json | Mirrors the new invalid BiDi/IDN hostname test vectors for draft2019-09 optional. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "description": "Bidi domain name with a digit-first label is invalid", | ||
| "comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character", |
| "description": "Bidi domain name with a digit-first label is invalid", | ||
| "comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character", |
| "description": "Bidi domain name with a digit-first label is invalid", | ||
| "comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character", |
| "description": "Bidi domain name with a digit-first label is invalid", | ||
| "comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character", |
jviotti
left a comment
There was a problem hiding this comment.
Awesome. Looks valid as far as I can tell
| { | ||
| "description": "Bidi domain name with a digit-first label is invalid", | ||
| "comment": "https://www.rfc-editor.org/rfc/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character", | ||
| "data": "0a.\u05d0", | ||
| "valid": false | ||
| }, |
There was a problem hiding this comment.
This one isn't correct. The bidi rules apply per label. This example includes two labels, "0A" and "\u05d0". The first is unambiguously LTR and the second is unambiguously RTL. Both are valid labels. If you remove the ".", it becomes a single label and would violate rule 1 as intended.
Following the methodology I used for ipv4 and uuid, I read RFC 5893 section 2 and found the current idn-hostname.json has no dedicated tests for the Bidi rule (it tests RTL exception characters, but not the six-condition rule itself).
In a Bidi domain name (any name with a right-to-left label), every label must satisfy the Bidi rule: condition 1 requires the first character to be L, R or AL, and the other conditions constrain direction.
Changes
0a.א- a digit-first label in a Bidi domain name (condition 1) - invalid.0ا- a digit before a right-to-left letter (condition 1) - invalid.aא- a left-to-right label containing a right-to-left letter (condition 5) - invalid.א0٠- a right-to-left label mixing European and Arabic-Indic digits (condition 4) - invalid.Ecosystem Impact
0a.א(accepts it). The idna package checks the Bidi rule per label; the0alabel has no RTL character so the check is skipped, missing that theאlabel makes the whole name a Bidi domain name. It correctly rejects the three single-label cases.RFC References
Reproduction commands and the idn-hostname cross-implementation matrix are in my evidence repo: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-hostname.md
Related: #965