
test(idn-hostname): add Bidi rule (RFC 5893) cases #934

Open
vtushar06 wants to merge 3 commits into json-schema-org:main from vtushar06:idn-hostname-bidi-rule

@vtushar06 (Contributor) commented Jun 11, 2026

Following the methodology I used for ipv4 and uuid, I read RFC 5893 section 2 and found that the current idn-hostname.json has no dedicated tests for the Bidi rule. It tests RTL exception characters, but not the six-condition rule itself.

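In a Bidi domain name (any domain name containing a right-to-left label, meaning a label with at least one character of Bidi type R, AL or AN), every label must satisfy the Bidi rule. Condition 1 requires the first character of each label to be L, R or AL. The remaining conditions restrict which Bidi types may appear in right-to-left labels (condition 2) and left-to-right labels (condition 5), restrict how each kind of label may end (conditions 3 and 6), and forbid mixing European (EN) and Arabic-Indic (AN) digits in a right-to-left label (condition 4).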
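To make the six conditions concrete, here is a minimal Python sketch of the rule as described above. It uses only the standard library's `unicodedata.bidirectional`. It is an illustration, not code from this PR or from any validator. The helper names (`check_bidi_rule`, `violated_condition`, `is_bidi_domain_name`) are hypothetical. The sketch splits on ASCII `.` only and assumes the name has already passed the other IDNA checks. It does not handle punycode or the RFC 5892 contextual rules, and it skips empty labels.

```python
import unicodedata
from typing import List, Optional, Tuple

# Bidi classes allowed inside RTL labels (condition 2) and LTR labels (condition 5).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> List[str]:
    """Bidi class of each character, e.g. '0' -> 'EN', 'a' -> 'L'."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_bidi_domain_name(name: str) -> bool:
    """True if any character is R, AL or AN, i.e. some label is an RTL label."""
    return any(c in {"R", "AL", "AN"} for c in bidi_classes(name))


def violated_condition(label: str) -> Optional[int]:
    """Number of the first RFC 5893 condition the label breaks, or None."""
    classes = bidi_classes(label)
    if not classes:
        return None  # empty labels are out of scope for this sketch
    # Condition 1: the first character must be L, R or AL.
    if classes[0] not in {"L", "R", "AL"}:
        return 1
    # Conditions 3 and 6 look at the last character, ignoring trailing NSMs.
    # classes[0] is not NSM, so this loop always stops inside the list.
    last = len(classes) - 1
    while classes[last] == "NSM":
        last -= 1
    if classes[0] in {"R", "AL"}:  # RTL label
        if any(c not in RTL_ALLOWED for c in classes):
            return 2
        if classes[last] not in {"R", "AL", "EN", "AN"}:
            return 3
        if "EN" in classes and "AN" in classes:
            return 4
    else:  # LTR label
        if any(c not in LTR_ALLOWED for c in classes):
            return 5
        if classes[last] not in {"L", "EN"}:
            return 6
    return None


def check_bidi_rule(name: str) -> List[Tuple[str, int]]:
    """(label, condition) pairs for every label that breaks the Bidi rule."""
    if not is_bidi_domain_name(name):
        return []  # the rule only applies to Bidi domain names
    failures = []
    for label in name.split("."):
        condition = violated_condition(label)
        if condition is not None:
            failures.append((label, condition))
    return failures


print(ascii(check_bidi_rule("0a.\u05d0")))      # prints [('0a', 1)]
print(ascii(check_bidi_rule("\u05d00\u0660")))  # prints [('\u05d00\u0660', 4)]
```

Under this reading, an all-LTR name such as `0a.example` is not a Bidi domain name, so `check_bidi_rule` returns an empty list for it. Adding an RTL label changes that and puts `0a` under condition 1.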

Changes

  • Added 4 invalid test cases to each of the draft7, draft2019-09, draft2020-12 and v1 suites.
  • 0a.א - a digit-first label in a Bidi domain name (condition 1) - invalid.
  • - a digit before a right-to-left letter (condition 1) - invalid.
  • - a left-to-right label containing a right-to-left letter (condition 5) - invalid.
  • א0٠ - a right-to-left label mixing European and Arabic-Indic digits (condition 4) - invalid.
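You can trace each of the two vectors whose data appears in the list above to a specific condition by printing the Bidi class of every character. This small Python sketch uses only `unicodedata.bidirectional` from the standard library. The `describe` helper is a hypothetical name used only for illustration.

```python
import unicodedata
from typing import List, Tuple


def describe(name: str) -> List[Tuple[str, List[str]]]:
    """Pair each dot-separated label with the Bidi classes of its characters."""
    return [(label, [unicodedata.bidirectional(ch) for ch in label])
            for label in name.split(".")]


for data in ["0a.\u05d0", "\u05d00\u0660"]:
    for label, classes in describe(data):
        # ascii() keeps the output printable on any terminal encoding.
        print(ascii(label), classes)

# '0a' ['EN', 'L']
# '\u05d0' ['R']
# '\u05d00\u0660' ['R', 'EN', 'AN']
```

The first label of `0a.א` starts with EN, so it fails condition 1 once the name counts as a Bidi domain name. `א0٠` starts with R but contains both EN and AN, which condition 4 forbids.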

Ecosystem Impact

  1. python-jsonschema 4.x: FAILS 0a.א (accepts it). The idna package, which it uses for this format, checks the Bidi rule per label. Because the 0a label contains no RTL character, the check is skipped for that label, and idna misses that the א label makes the whole name a Bidi domain name. It correctly rejects the three single-label cases.
  2. Node (WHATWG): FAILS all four (accepts them) - it does not apply the Bidi rule by default.
  3. Go x/net/idna: PASSES all four (rejects them).
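The python-jsonschema result above comes down to which labels the rule is applied to. The Python sketch below models only condition 1 under two readings. `per_label_ok` is a simplified, hypothetical model of the per-label behavior described for the idna package, not its actual code. `domain_aware_ok` follows the reading in this PR description and checks every label once any label is right-to-left. Both function names are made up for illustration.

```python
import unicodedata

RTL_CLASSES = {"R", "AL", "AN"}


def is_rtl(label: str) -> bool:
    """A label is RTL if it contains at least one R, AL or AN character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def starts_ok(label: str) -> bool:
    """Condition 1: the first character must be L, R or AL."""
    return not label or unicodedata.bidirectional(label[0]) in {"L", "R", "AL"}


def per_label_ok(name: str) -> bool:
    # Hypothetical model: a label is checked only if it is itself RTL.
    return all(starts_ok(label) for label in name.split(".") if is_rtl(label))


def domain_aware_ok(name: str) -> bool:
    # Once any label is RTL, the name is a Bidi domain name and every label is checked.
    labels = name.split(".")
    if not any(is_rtl(label) for label in labels):
        return True
    return all(starts_ok(label) for label in labels)


print(per_label_ok("0a.\u05d0"))     # True: "0a" is never checked
print(domain_aware_ok("0a.\u05d0"))  # False: "0a" starts with EN
```

Both readings reject the single-label form `0a\u05d0`, because that label is itself right-to-left and starts with EN. They disagree only when the digit-first label and the RTL characters sit in different labels.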

RFC References

Reproduction commands and the idn-hostname cross-implementation matrix are in my evidence repo: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-hostname.md

Related: #965

@vtushar06 requested a review from a team as a code owner June 11, 2026 18:06
Copilot AI review requested due to automatic review settings June 11, 2026 18:06

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds additional negative test coverage for idn-hostname to ensure implementations correctly reject invalid BiDi (bidirectional) IDN label/domain patterns per RFC 5893.

Changes:

  • Add new invalid-case vectors covering digit-first labels in BiDi domains, mixed-direction labels, and mixed digit sets in RTL labels.
  • Apply the same new vectors across v1 and multiple optional draft test suites.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/v1/format/idn-hostname.json | Adds new invalid BiDi/IDN hostname test vectors. |
| tests/draft7/optional/format/idn-hostname.json | Mirrors the new invalid BiDi/IDN hostname test vectors for draft7 optional. |
| tests/draft2020-12/optional/format/idn-hostname.json | Mirrors the new invalid BiDi/IDN hostname test vectors for draft2020-12 optional. |
| tests/draft2019-09/optional/format/idn-hostname.json | Mirrors the new invalid BiDi/IDN hostname test vectors for draft2019-09 optional. |


Comment thread tests/v1/format/idn-hostname.json Outdated
Comment on lines +336 to +337
"description": "Bidi domain name with a digit-first label is invalid",
"comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character",
Comment on lines +328 to +329
"description": "Bidi domain name with a digit-first label is invalid",
"comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character",
Comment on lines +336 to +337
"description": "Bidi domain name with a digit-first label is invalid",
"comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character",
Comment on lines +336 to +337
"description": "Bidi domain name with a digit-first label is invalid",
"comment": "https://tools.ietf.org/html/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character",

@jviotti (Member) left a comment

Awesome. Looks valid as far as I can tell

Comment on lines +335 to +340
{
"description": "Bidi domain name with a digit-first label is invalid",
"comment": "https://www.rfc-editor.org/rfc/rfc5893#section-2 a label in a Bidi domain name must start with an L, R or AL character",
"data": "0a.\u05d0",
"valid": false
},

Member

This one isn't correct. The bidi rules apply per label. This example includes two labels, "0A" and "\u05d0". The first is unambiguously LTR and the second is unambiguously RTL. Both are valid labels. If you remove the ".", it becomes a single label and would violate rule 1 as intended.


4 participants