Tests for mixed Arabic-Indic digits and Extended Arabic-Indic digits violate the Bidi rule #737
Comments
Hi, @Julian @gregsdennis |
I'm not sure what the issue is here. Is there a technical reason this character needs to be added (i.e. is this testing something that's not currently covered?), or is it merely a linguistic problem? |
@OptimumCode I think but haven't re-checked that this is the same as #675 -- can you perhaps have a look there, I recall having a look into this and concluding it seemed correct as is to me (and OP on that issue didn't follow up). But yeah perhaps let me know if you see something wrong there and/or whether you agree you're referring to the same thing. |
Thank you, @Julian. I did not see this issue when I tried to look for an existing one. I am not an expert here either, so I might be wrong somewhere. But let me try to explain why I think this change should be made. Answering @gregsdennis's question:
There are 3 tests for Arabic-Indic digits and Extended Arabic-Indic digits in the JSON schema test suite. One test has them mixed, and the other two contain only Arabic-Indic digits or only Extended Arabic-Indic digits. Now the hardest part - why do they violate the Bidi rule? According to section 4.2.3.4 of RFC5891, labels that contain right-to-left characters MUST meet the Bidi criteria
What counts as right-to-left is explained in RFC5893, which is referenced above. In section 1.4 we can find the definition of right-to-left:
In our case, the label contains characters with the types AN (Arabic Number) and EN (European Number). This would mean that it is an RTL label. Also, the title of RFC5893 is Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA). Because of that, I believe a Bidi domain name and an IDN hostname with an RTL label are the same thing. Could you please share your thoughts on this, considering all the above? There is also one interesting thing: if we take a look at point 4 of the Bidi criteria, we will see that a mix of AN and EN in an RTL label is not allowed
This would make a mix of Arabic-Indic digits and Extended Arabic-Indic digits invalid in any case because Extended Arabic-Indic digits have type EN. |
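The character types discussed above can be inspected directly. The following is a minimal sketch, assuming Python's standard `unicodedata` module as the source of Bidi classes; the helper names (`bidi_classes`, `is_rtl_label`, `mixes_an_and_en`) are hypothetical and not part of the test suite or any validator. It relies on the RFC 5893 definition of an RTL label as one containing at least one character of type R, AL, or AN.

```python
import unicodedata


def bidi_classes(label: str) -> list[str]:
    """Return the Unicode Bidi class of every character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label: str) -> bool:
    """RFC 5893, section 1.4: an RTL label has at least one R, AL, or AN character."""
    return any(cls in {"R", "AL", "AN"} for cls in bidi_classes(label))


def mixes_an_and_en(label: str) -> bool:
    """Condition 4 of the Bidi rule forbids AN and EN in the same RTL label."""
    classes = set(bidi_classes(label))
    return "AN" in classes and "EN" in classes


arabic_indic = "\u0660\u0661"  # ARABIC-INDIC DIGIT ZERO and ONE
extended = "\u06f0\u06f1"      # EXTENDED ARABIC-INDIC DIGIT ZERO and ONE

print(bidi_classes(arabic_indic))  # ['AN', 'AN']
print(bidi_classes(extended))      # ['EN', 'EN']
print(mixes_an_and_en(arabic_indic + extended))  # True
```

Under these assumptions, any label that mixes the two digit ranges is an RTL label (because of the AN characters) and trips condition 4 regardless of what else it contains.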
Okay. Thanks for the explanation. I just didn't know about these kinds of language issues, and it appeared like it might just be a language thing. |
Hi, could you please advise what I should do with the PR? If you disagree with some points in my explanation, please let me know - I will try to elaborate. Also, what do you think about adding test cases for Bidi rule violations? (as a separate PR) |
It will probably be a few days until I can reread the spec but haven't forgotten. |
Thank you, @Julian. If you need something from my side please let me know |
Took another quick look today -- yes I think you look correct, the other four examples in #675 were not RTL I think (they weren't |
Thank you, @Julian for reviewing this |
The current test cases (like this one) violate the Bidi rule and do not test what they are supposed to test.
The Bidi rule is violated because:
If a first character of type AL is added, the string will pass the Bidi rule validation but will still fail the Arabic-Indic digits rule.
Please, correct me if I missed something here and my conclusions are wrong.
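To illustrate the effect of the leading character, here is a rough sketch of the six Bidi rule conditions from RFC 5893, section 2, written against Python's `unicodedata` module. The function name `satisfies_bidi_rule` and the sample labels are assumptions made for illustration; this is not how any particular validator implements the check, and it does not cover the separate Arabic-Indic digits rule.

```python
import unicodedata

# Bidi classes permitted in RTL and LTR labels (conditions 2 and 5).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def satisfies_bidi_rule(label: str) -> bool:
    """Check a single label against the six conditions of the Bidi rule."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return False

    # Condition 1: the first character must be L, R, or AL.
    first = classes[0]
    if first not in {"L", "R", "AL"}:
        return False

    # Conditions 3 and 6 look at the last character before any trailing NSM.
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]

    present = set(classes)
    if first in {"R", "AL"}:
        # RTL label: conditions 2, 3, and 4.
        return (
            present <= RTL_ALLOWED
            and last in {"R", "AL", "EN", "AN"}
            and not {"AN", "EN"} <= present
        )
    # LTR label: conditions 5 and 6.
    return present <= LTR_ALLOWED and last in {"L", "EN"}


digits_only = "\u0660\u0661"          # starts with AN, so condition 1 fails
with_al_prefix = "\u0628" + digits_only  # ARABIC LETTER BEH (AL) first

print(satisfies_bidi_rule(digits_only))     # False
print(satisfies_bidi_rule(with_al_prefix))  # True
```

In this sketch a label made only of Arabic-Indic digits is rejected by condition 1 before any digit-specific rule is reached, while the same label with an AL character in front satisfies the Bidi rule.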