
Suite contains invalid, positive tests for idn-hostname #675

Closed
fdutton opened this issue May 19, 2023 · 2 comments
Labels
waiting for author: A pull request which is waiting for an update from its author.

Comments

@fdutton

fdutton commented May 19, 2023

RFC 5890 is the top-level specification (i.e., the entry point) for describing and validating Internationalized Domain Names for Applications (IDNA). RFC 5893 is a subordinate specification that addresses how to validate domain names compliant with Unicode's bi-directional algorithm.

RFC 5893 Section 2.1 states, "The first character must be a character with Bidi property L, R, or AL." Five tests in tests/draft2020-12/optional/format/idn-hostname fail this check (results are the same in the other drafts).

  • KATAKANA MIDDLE DOT with Hiragana (Bidi class is ON)
  • KATAKANA MIDDLE DOT with Katakana (Bidi class is ON)
  • KATAKANA MIDDLE DOT with Han (Bidi class is ON)
  • Arabic-Indic digits mixed with Extended Arabic-Indic digits (Bidi class is AN)
  • Extended Arabic-Indic digits not mixed with Arabic-Indic digits (Bidi class is EN)
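The Bidi classes listed above can be checked with Python's standard `unicodedata` module. This is only a minimal sketch: the sample strings are illustrative stand-ins chosen for this comment, not necessarily the exact test data in the suite, and the helper name `first_char_bidi_class` is made up for the example.

```python
import unicodedata

# Bidi classes that RFC 5893 Section 2, condition 1 allows for the first character.
ALLOWED_FIRST = {"L", "R", "AL"}

def first_char_bidi_class(label: str) -> str:
    """Return the Unicode Bidi class of the first character of a label."""
    return unicodedata.bidirectional(label[0])

# Illustrative labels (assumed, not copied from the test suite).
samples = {
    "KATAKANA MIDDLE DOT + Hiragana": "\u30fb\u3041",
    "Arabic-Indic digit first": "\u0660\u06f0",
    "Extended Arabic-Indic digit first": "\u06f0\u0030",
    "Hiragana prefix U+3045 added": "\u3045\u30fb\u3041",
}

for name, label in samples.items():
    bidi = first_char_bidi_class(label)
    verdict = "ok" if bidi in ALLOWED_FIRST else "fails condition 1"
    print(f"{name}: first char U+{ord(label[0]):04X} has Bidi class {bidi} ({verdict})")
```

The last sample shows why prefixing with U+3045 makes the check pass: its Bidi class is L.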

I managed to get the tests to pass by prefixing the test data with an arbitrary character from the same script. For example, I prefixed the test data for KATAKANA MIDDLE DOT with Hiragana with U+3045, but I do not know if this is reasonable.

I can submit a pull request but would prefer to do so once I learn how to build and test this project. I would appreciate it if someone could direct me to the relevant portion of the documentation or describe the process. I also need to know if I should update draft-next.

@Julian
Member

Julian commented Aug 30, 2023

I missed this previously, apologies for not following up.

I'm no expert in these RFCs, but my reading, combined with looking at at least one Python implementation, suggests to me that these are correct as is. Specifically, the paragraph just before the one you cite in that section says:

The following rule, consisting of six conditions, applies to labels in Bidi domain names.

and just above in Section 2 is the definition:

A "Bidi domain name" is a domain name that contains at least one RTL label.

i.e. it seems to me at least that what you're citing applies only to bidi names, not all IDN hostnames. In particular, the examples you're citing contain no RTL character, so they indeed do not need to start with a character with such a bidi property.
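To make that reading concrete, here is a rough sketch of the "does the Bidi rule even apply?" check. It rests on an assumption worth verifying against the RFC text: that an RTL label is one containing at least one character of Bidi class R, AL, or AN. The function names and the sample hostnames are hypothetical, and this is not taken from any particular validator.

```python
import unicodedata

# Assumed reading of RFC 5893: a label is RTL if it contains at least one
# character whose Bidi class is R, AL, or AN (AN is "Arabic number").
RTL_CLASSES = {"R", "AL", "AN"}

def is_rtl_label(label: str) -> bool:
    """True if any character in the label has an RTL Bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)

def is_bidi_domain_name(hostname: str) -> bool:
    """True if at least one dot-separated label is an RTL label."""
    return any(is_rtl_label(label) for label in hostname.split(".") if label)

# KATAKANA MIDDLE DOT followed by a Hiragana letter: classes ON and L only.
print(is_bidi_domain_name("\u30fb\u3041"))

# A hostname with a Hebrew label (U+05D0 has class R).
print(is_bidi_domain_name("\u05d0.example"))
```

Under this sketch the six conditions would only be evaluated for hostnames where `is_bidi_domain_name` returns true.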

If you are an expert here please feel free to elaborate :)

@Julian added the "waiting for author" label Aug 30, 2023
@Julian
Member

Julian commented Sep 13, 2023

Going to close given the above, but if you or anyone disagrees do follow up!
