RFC 5890 is the top-level specification (i.e., the entry point) for describing and validating Internationalized Domain Names for Applications (IDNA). RFC 5893 is a subordinate specification that addresses how to validate domain names compliant with Unicode's bi-directional algorithm.
RFC 5893 Section 2.1 states, "The first character must be a character with Bidi property L, R, or AL." Five tests in tests/draft2020-12/optional/format/idn-hostname fail this check (results are the same in the other drafts).
KATAKANA MIDDLE DOT with Hiragana (Bidi class is ON)
KATAKANA MIDDLE DOT with Katakana (Bidi class is ON)
KATAKANA MIDDLE DOT with Han (Bidi class is ON)
Arabic-Indic digits mixed with Extended Arabic-Indic digits (Bidi class is AN)
Extended Arabic-Indic digits not mixed with Arabic-Indic digits (Bidi class is EN)
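The Bidi classes listed above can be checked with Python's standard `unicodedata` module. This is only a rough sketch: the sample code points are the characters named in the test descriptions, not the literal test data from the suite, and `first_char_ok` is a hypothetical helper that applies only the first-character condition quoted from RFC 5893 Section 2.1.

```python
import unicodedata

# Characters named in the test descriptions (illustrative, not the suite's data).
SAMPLES = {
    "KATAKANA MIDDLE DOT": "\u30fb",
    "ARABIC-INDIC DIGIT ZERO": "\u0660",
    "EXTENDED ARABIC-INDIC DIGIT ZERO": "\u06f0",
    "HIRAGANA LETTER SMALL U": "\u3045",
}

# Bidi classes allowed for the first character by the quoted condition.
ALLOWED_FIRST = {"L", "R", "AL"}


def first_char_ok(label: str) -> bool:
    """Hypothetical helper: check only the first-character condition."""
    return unicodedata.bidirectional(label[0]) in ALLOWED_FIRST


for name, ch in SAMPLES.items():
    # Print each code point with its Bidi class as reported by unicodedata.
    print(f"U+{ord(ch):04X} {name}: {unicodedata.bidirectional(ch)}")
```

With this helper, a label that starts with U+30FB fails the condition, while the same label prefixed with U+3045 passes it, which matches the workaround described below.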
I managed to get the tests to pass by prefixing the test data with an arbitrary character from the same script. For example, I prefixed the test data for KATAKANA MIDDLE DOT with Hiragana with U+3045, but I do not know whether this is reasonable.
I can submit a pull request but would prefer to do so once I learn how to build and test this project. I would appreciate it if someone could point me to the relevant documentation or describe the process. I also need to know whether I should update draft-next.
I missed this previously, apologies for not following up.
I'm no expert in these RFCs, but my reading, combined with a look at at least one Python implementation, suggests to me that these tests are correct as is. Specifically, the paragraph just before the one you cite in that section says:
The following rule, consisting of six conditions, applies to labels in Bidi domain names.
and just above in Section 2 is the definition:
A "Bidi domain name" is a domain name that contains at least one RTL label.
i.e. it seems to me at least that what you're citing applies only to bidi names, not all IDN hostnames. In particular, the examples you're citing contain no RTL character, so they indeed do not need to start with a character with such a bidi property.
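As a rough illustration of that reading, here is a minimal sketch of the "Bidi domain name" test. It assumes the RFC 5893 definition of an RTL label (a label containing at least one character of Bidi class R, AL, or AN), splits labels on the ASCII full stop only, and uses hypothetical function names; it is not taken from any particular implementation.

```python
import unicodedata

# Bidi classes that make a label an RTL label (assumed from RFC 5893).
RTL_CLASSES = {"R", "AL", "AN"}


def is_rtl_label(label: str) -> bool:
    """Hypothetical helper: True if any character has class R, AL, or AN."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def is_bidi_domain_name(name: str) -> bool:
    """Hypothetical helper: True if the name contains at least one RTL label.

    Assumption: labels are separated by the ASCII full stop only.
    """
    return any(is_rtl_label(label) for label in name.split(".") if label)


def needs_bidi_rule(name: str) -> bool:
    # Under this reading, the six conditions apply only to Bidi domain names.
    return is_bidi_domain_name(name)
```

Under this sketch, a label made of KATAKANA MIDDLE DOT (U+30FB) and Hiragana is not an RTL label, so the first-character condition would not apply to it. A label containing U+0660 (class AN) does count as RTL under the assumed definition, unlike one made only of U+06F0 (class EN) and ASCII digits, so the two digit cases may be worth checking separately.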
If you are an expert here please feel free to elaborate :)