-
-
Notifications
You must be signed in to change notification settings - Fork 215
add tests for pattern matching with unicode semantics #498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'll need to run these. I'm pretty sure that .Net RegEx doesn't fully like Unicode. It might just be UTF-16 and/or -32, though. |
Yeah, the following fail for .Net.
for both It's not an issue with the tests. It's .Net's RegEx implementation that doesn't support surrogate pairs (bunch of SO questions) representing UTF-32 and .Net string are UTF-16. I'm not sure how I want to handle this. I'd be curious to what extent other languages support regex matching on surrogate pairs. |
Since this is now a required part of the spec, I think it's fair to have tests for it in the normal area. However, I could also add these tests to optional/* for drafts 2019-09 and earlier. |
It's just a SHOULD, not a MUST, and as all things regex (where the reality is no language but JS will use ECMAScript regexes) it's I think perfectly expected if a .NET implementation doesn't support matching on surrogate pairs. I still personally support putting them here though I guess, and folks should skip them if their language makes it unduly hard to pass. |
Then the release notes need updating, because it says it's now required -- https://json-schema.org/draft/2020-12/release-notes.html |
Yes, I think that section is misleading (but it's not normative) -- the spec seems to instead say:
The next section (which I didn't read carefully but looks new-ish to me) might give us a reason to move just the surrogate pair test to optional though now that I see it... specifically:
I yeah need to parse that mentally, but we should probably double check surrogate pairs match the criteria that's recommending. |
Yup that's an error. Feel free to make a PR or see if @jdesrosiers wants to handle it =] |
Ok what do people want to do here? Is this good to merge as is or do some bits need to move? |
I'll work on fixing the release notes, probably early next week. |
If it isn't a hard retirement, these tests should go in optional.
If I need to build provision to skip these (or coerce them to optional) in my test suite, I can do that. I've done similar for my JSON Path suite for syntaxes that I'm just not going to support. |
Considering it's just SHOULD, I'd also probably be coming around to all of them belonging in optional. (Basically all SHOULD tests are in there at the minute that I know of, e.g. ecmascript-regex itself). |
I didn't add these tests for 2019-09 and earlier, because I thought that the requirements changed for 2020-12. but if they didn't, and they are just as "SHOULD" as they always were, then we should be consistent. Which I guess that means they go in optional. |
As of draft2020-12 supporting this is still just a "SHOULD".
a42091c
to
c5f08e2
Compare
All tests have been moved into tests/*/optional/unicode.json. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. I think these are good tests for those who can support it.
It looks like CI on the master branch is failing, because this PR contains invalid tests -- specifically the ones that use Is there something we can improve about CI reporting to make things clearer? |
An email notification on test failure would be nice, but I don't know if that's possible. FWIW I don't even see the test results in this PR at all. |
This is actually against the spec, per spec, I'll file a fix PR tomorrow with exact references to the spec. |
In short, https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.6.3.3 refences to a specific regexp dialect, and that's ECMA-262 dialect. In ECMA-262 dialect, > /^\d$/u.test('৪২')
false
> /^\p{digit}+$/u.test('৪২')
true I'll file a PR tomorrow to fix this (with exact refs). |
Fix and explanation in #505. |
Supporting this is now mandatory in draft2020-12.