Add more tests for valid emails #253
Conversation
As much as it pains me to say it, it appears that the JSON schema spec only requires hostnames to be compliant with RFC 1034, which does not permit hostname labels to begin with digits. This is clearly a mistake on their part, since RFC 1123 has been around since 1989, but I guess we're stuck with it. Should I remove my new hostname tests from this MR? |
@bvisness probably file a ticket upstream in the spec repo (or better a PR!) to reference the newer RFC going forward in future drafts |
So @Julian, am I crazy or is the CI for the test suite really rejecting my new tests because the JS implementation has too narrow a definition of an email address? |
That certainly seems like what's happening, though that must have changed without me noticing -- @epoberezkin do you know anything about that? We really shouldn't have the test suite's own checks running an implementation against the tests (not mine, not yours, nor any other's) -- if the implementation has bugs, it will block people from sending patches (which is possibly what's happening here). |
@Julian a long comment here - my day off :)
I don't think that is correct. I believe it is better to slow down adding tests than to have a completely untested test suite. When your tests confirm that your code is compliant with them, and your code confirms that the tests are compliant with it, there is no way to prove that either is correct or fit for purpose; you are only proving that they are consistent with each other. Having an untested test suite makes it completely useless... Would you rely on visual PR inspection alone to merge?

I think the opposite - we should have a bigger test matrix of validators / draft versions, visible in Travis as separate jobs, so we can see which draft version's tests passed against which validators. Once we have several validators running the tests for each draft version, we can define criteria for the partial pass required to merge changes. As we only have one job now, we should not be adding additional tests without them passing that job. And given that format support is optional in the spec, format tests should be minimal rather than comprehensive, and when we create a test matrix, formats should be in separate jobs.

Also, @bvisness, this test suite was never intended to be complete; I believe that "there is no way to prove that a validator fully complies with the spec" - or at least, it is … Some time ago I tested several JS validators against the tests Ajv accumulated over time (when fixing bugs, addressing limitations, etc.) - https://github.com/epoberezkin/test-validators. Not a single validator passes all Ajv tests. That doesn't prove that Ajv is better or even more compliant than other validators; it is quite possible that Ajv would fail a larger number of other validators' tests. But it does mean that Ajv is better for its users, as the bugs and limitations they reported were addressed, and they are not addressed by other JS validators.

Another demo I created at the same time was a fake validator that fully complies with this test suite and yet is not only non-compliant with the spec but absolutely useless - https://github.com/epoberezkin/fjv. This validator fails validation only in the cases that this test suite prescribes, and passes in all other cases (it simply compares schemas and data instances with the examples in this test suite). That example … |
Maybe there is a more generic way to write tests that would be more comprehensive, but a fully comprehensive test suite would only prove compliance with the spec if we agree that this test suite IS the spec, and that there are no requirements to comply with other than passing this test suite. It may be possible to express JSON Schema as a set of tests written in a language that has an advanced type system, e.g. Haskell. |
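(Editor's aside: to make the "fake validator" argument above concrete, here is a minimal sketch of the idea in Python. This is my own illustration, not the actual fjv code, and it assumes the suite's usual layout of JSON files, each an array of groups with `schema` and `tests` entries.)

```python
import json
from pathlib import Path


def load_cases(suite_dir):
    """Collect (schema, instance, expected) triples from the suite's JSON files."""
    cases = []
    for path in Path(suite_dir).glob("*.json"):
        for group in json.loads(path.read_text()):
            for test in group["tests"]:
                cases.append((group["schema"], test["data"], test["valid"]))
    return cases


class FakeValidator:
    """Passes every case in the suite by lookup, with no JSON Schema semantics at all."""

    def __init__(self, cases):
        self.cases = cases

    def is_valid(self, schema, instance):
        for known_schema, known_instance, expected in self.cases:
            if known_schema == schema and known_instance == instance:
                return expected
        return True  # anything the suite never mentions is simply declared valid
```

A validator like this would be green against the whole suite while being useless in practice, which is the point being made.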
What's a day off, that sounds nice :(
That's not how it works for any normal test suite (for any normal code base) -- when you write tests, you don't write tests for your tests, do you? I don't, and it's considered an antipattern to do so -- tests need to be correct by visual inspection -- someone needs to be able to look at the test and tell that it tests correct behavior, not by writing another layer of tests ensuring the test tests the right behavior. Having your validator (or mine, etc.) pass or fail over some new tests being written provides 0 confidence -- both false positives and false negatives are possible -- after all the whole point of a new test is it's a thing the suite previously didn't cover. The point of the "tests" for this test suite is totally different, and they operate at a completely different mental level -- they test things like "are all the descriptions unique" -- something totally different from the level of "does this test pass or fail on instances", which is what the test suite itself is meant to do. You cannot test that, you just have to write correct tests.
Yes precisely, the same way as one does for a normal code base :)
Are you suggesting we merge tests by consensus of validators? Because yeah, I strongly disagree with that one -- again, new tests are "things that are in the spec, that we forgot to test for before". It's highly likely that some, most, or all validators fail a new test being added. If you're instead suggesting we also run this suite against a bunch of validators and report what happens, but do not use that to block PRs, well, that's a great idea -- it's #20 :) and yeah, we should, on every commit, run against all the validators who contribute a way to have their tests run, and then report what happened. But that's totally separate from this process of adding new tests to the suite.
I haven't read … The point of tests is to make the correct implementation be the simplest implementation that still passes all the tests, and therefore make a correct implementation more likely than an incorrect one. Anything beyond that requires formal verification methods. |
I'm just trying to keep people's implementations from rejecting real email addresses! Which the JS one is apparently doing! As @Julian and I were discussing in #254, really strict tests put a large burden on implementers for what is just an optional feature. That's why I have only added valid test cases. Obviously no set of tests can guarantee correctness (as you have so elegantly proven), but when has that ever been a reason not to add more tests? |
My intent with these tests is to make implementers do one of the following:
- fix their email validation so that it accepts these addresses, or
- stop validating the email format altogether.
Either approach will pass the tests, and neither approach will reject real, valid email addresses like the ones that caused our production code to explode this week. |
Of course you do visual inspection, and the tests are supposed to define the behaviour, and they are supposed to be simple and declarative enough to serve as documentation for your code. I am not challenging any of that, and I do not write tests for tests (and neither do we do that here). But you are not saying that you have never had a situation where a mistake in a test was caught by actually running it against your code, are you? I think everybody has (or at least most people have). So unless you run the tests, you cannot be certain they are correct, can you? This is what we do here - we actually run the tests against the code base(s) that is (are) supposed to be correct against these tests; we are not writing tests for tests.
"both false positives and false negatives are possible" - it is definitely correct, but from it doesn't follow that "Having your validator (or mine, etc.) pass or fail over some new tests being written provides 0 confidence" - it actually quite the opposite, having tests actually run on every change did provide me the confidence they are correct, tests that are not executed against the codebase whenever tests change are less valuable. Any test failure warrants the investigation of both the test and the code and the decision which is correct.
Visual inspection is only the first step; then you run them, and then you investigate why they fail - the mistake can be in either the test or the code. If you don't run them, how would you catch mistakes that can creep into a PR that contains 120 lines (like this one) and can be missed during visual inspection?
Tests are not part of the spec. If they were, they should be published together with the spec. They are not even referenced from the spec. Some form of consensus, I didn't suggest any specific process, is a normal way to resolve any contradictions. By the way I didn't challenge whether these tests are correct - they probably are. I only challenged whether they are necessary. If we believe they are, we need to fix validator(s) that failed them, re-run the tests and then merge.
I agree, I was only saying that there is a limited value in adding non-orthogonal tests that test behaviours that are optional.
I am suggesting that we agree on what minimal set of test jobs needs to pass to allow merging new tests. As we only have one test job now - well, from my point of view, it is the minimal set required to merge, unless the problem is not fixed for a long time (no idea what defines "long", but given that there is no urgency to extend the test suite, it can be anything between a few weeks and a few months, whatever feels appropriate). Otherwise, it's better to have un-merged tests for some time than to lose the ability to run them against this code base. To make it more flexible we need to break the one job into several jobs and agree on merging criteria. |
@bvisness on the subject of this PR, leaving test theory aside: only two tests fail with Ajv:
It may be worth splitting it into two PRs: merge the one that passes, and merge the other once it's investigated / fixed in Ajv. |
(Thanks @epoberezkin, I still very strongly disagree I think, but it seems likely that if we chat it out we'll figure out just where we disagree, which possibly I haven't noticed yet.) Will respond to that comment in more detail, but exactly your last comment is what I very strongly disagree with -- it is not @bvisness's responsibility to wait for / fix / investigate the behavior of your validator (nor mine). If tests fail, he should not have to wait until ajv investigates its bugs (again, or mine). Of course many validators failing on a test is a good sign that either the spec is unclear or the test is wrong, which prompts you to be really, really sure the test is correct, but I don't think that's what's happening here, and the way things are set up right now isn't very conducive to that, because the build will remain red until ajv fixes its bug. |
That's why I suggest we split it into a matrix (with formats separate) so at least we can see what is wrong rather than just blindly merge. And fixing won't take long - I just need to confirm that these tests are indeed correct - they have somewhat unusual emails. |
Oh and by the way, I am implicitly assuming a requirement which maybe we all have never agreed upon (but which all my work teams go by) which is "you can't merge until the build is green". |
Ah cool, great! Then we agree. That was exactly my point :), and that we mark all the validator runs as optional for merge. |
Er, well, hopefully we do, as long as this is a typo:
Not formats separate, ajv separate, yes? |
I don't like the idea that everything is optional. I am ok with some jobs being optional; the process should be to flag it, create an issue in the validator in question (which I did), and then decide to merge. Maybe the process should require double (or triple) approval for tests that fail... I just want us to agree on the process to follow whenever not all jobs pass (or at least when more jobs fail than before, what the time is after which the jobs have to be removed, etc.). Let's take this conversation elsewhere :) I will probably fix and we will merge (as I said, I have no idea whether these two unusual emails are supposed to pass - and whether we want to have such tests is another question entirely). |
Both separate. Formats are optional in the spec, and they are more likely to fail, particularly in some pathological cases, like in this PR. So we would have this matrix:
- draft-3
  - keywords: jsonschema
  - optional: jsonschema
- draft-4
  - keywords: jsonschema, ajv
  - optional: jsonschema, ajv
- draft-6
  - keywords: jsonschema, ajv
  - optional: jsonschema, ajv
- draft-7
  - keywords: jsonschema, ajv
  - optional: jsonschema, ajv

8 jobs in total. Even if formats or some other optional tests fail, we would still know that the mandatory tests pass. Ajv only tests some old formats, not the newer ones and not the content encoding keywords. |
If you only run tests against one codebase it is a normal thing. Here we run tests against multiple external codebases we have no control over, so the requirement can be reduced. I just want there to be some clarity about these requirements rather than "it's always ok to merge" |
I disagree with any validator's run being mandatory. Let's discuss wherever is convenient. |
With the Travis build matrix you can define which jobs are allowed to fail, so we can have all optional tests as optional jobs. The whole build will pass, but we will see which tests failed against which validators. And I would really encourage all validator authors to have their tests here. What merging criteria we have is secondary to me. |
Or we can have only the tests that validate the structural validity of the test files - i.e. that the JSON files are valid JSON, even test the tests against their schema, etc. - as mandatory. That would definitely reduce the chances of errors and typos. |
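(Editor's aside: for illustration, a minimal sketch of the kind of mandatory structural check being described; the directory path and the exact assertions here are my own assumptions, not the suite's actual sanity-test code.)

```python
import json
from pathlib import Path

# Structural checks only: every file parses as JSON, every group has the
# expected shape, and group descriptions are unique within a file.
# No validator is ever run against the test cases themselves.
for path in Path("tests/draft7").glob("*.json"):
    groups = json.loads(path.read_text())
    descriptions = [group["description"] for group in groups]
    assert len(descriptions) == len(set(descriptions)), f"duplicate description in {path}"
    for group in groups:
        assert isinstance(group["schema"], (dict, bool)), f"bad schema in {path}"
        for test in group["tests"]:
            assert isinstance(test["valid"], bool), f"missing 'valid' flag in {path}"
```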
Right, so what we're now coming to is, I think, a lot like what I had in mind for #20 -- so I think it's a great idea, but I also want to get @bvisness unblocked -- in other words, get the build green :). There are a bunch of details that need hashing out, I think; e.g. I suspect #53 is even related -- but yeah, if possible I would like to get the build green again so that we can then decide the examples here are correct, merge this, and then figure out the details on #20, rather than making him wait while we discuss. Are you OK with doing that? |
Frankly, you can take as much time as you need. I'm not turning email validation back on for our code any time soon. 😛 |
As I wrote, I am not sure:
So I cannot confirm either way. Definitely happy to merge other tests without these 2 (and the other tests also pass). These two can be separated to another PR. |
Yes sure, agreed, which I haven't done myself because I noticed this setup, but of course we can check fairly easily whether they're permitted under the spec.
If they are correct, your implementation has a bug and rejects them :), so I'd find it hard to hear an argument they don't belong here, the goal of this suite is to help implementations not have bugs. |
See above :) 1) All implementations have bugs; I am more interested in whether these bugs affect any users. 2) No finite set of tests exists that would catch all bugs (at least, not in this format). So the decision whether to include tests or not should be based on something else, otherwise you end up with an ever-growing set of tests that do not reflect actual usage of the spec. The important question is whether a test case reflects a realistic usage scenario - what is the use case for using such emails in any context? It seems it's purely for the sake of theoretical correctness. |
"Don't write tests because it's hopeless and there will always be bugs"?
Really don't see what your argument is, sorry.
Again, this suite was designed to ensure working implementations. If yours
doesn't want to support behavior mandated by the spec you are free to skip
those tests.
My reading of RFC 5322 (the email spec) is that these two failing emails are incorrect in this context. RFC 5322 says:
To re-phrase it, RFC 5322 says that the domain part of an email address should comply with the addressing syntax of the context where it is used. It does list RFC 1123, but only as an example. The context for the email syntax here is JSON Schema validation, and the JSON Schema spec also defines which RFC should be used for hostname validation - it is RFC 1034. So I don't think RFC 1123 should apply to validation of the domain part of the email; RFC 1034 should apply instead. |
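(Editor's aside: for readers following the RFC back-and-forth, the practical difference is only in the first character of a hostname label. A rough illustration with deliberately simplified regexes -- my own, ignoring label and overall length limits and other subtleties of the grammars:)

```python
import re

# RFC 1034: a hostname label must start with a letter.
RFC1034_LABEL = re.compile(r"^[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?$")
# RFC 1123: a hostname label may start with a letter or a digit.
RFC1123_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")

for label in ["example", "1and1", "163"]:
    print(label,
          bool(RFC1034_LABEL.match(label)),   # "1and1" and "163" fail here...
          bool(RFC1123_LABEL.match(label)))   # ...but pass here
```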
No, this is not what I wrote :) What I wrote equates to "do not write more tests than necessary to test realistic use cases". Beyond a certain number of tests, they do not add value and only increase maintenance and cognitive costs. |
Actually, this #253 (comment) relates to the second failing test. |
@epoberezkin May I again point out that I have added test cases for real, valid email addresses because implementations were rejecting real, valid email addresses? What do you think these tests are for? This bug actually affected me, an actual user. Validating actual customer email addresses is certainly "actual usage of the spec". I have included tests that catch a real bug in a real implementation that I really used. I cannot fathom why this is so controversial. |
If you think that it's ok for your email address validation to only support RFC 1034 domains, then you are living in a fantasy world where no customers have email addresses at RFC 1123 domains. This is false; we have customers with email addresses at RFC 1123 domains, and I had a schema implementation break because of it. The JSON schema spec is only as useful as its application in the real world! |
@bvisness This is what I am looking at. Real and valid are not the same thing. There is no universal definition of validity; there are various RFCs that define it. The fact that an email is real and used in some context does not mean that it should be valid against JSON Schema's email format. If the JSON Schema definition is obsolete, then the spec should be updated and then the test suite should be updated, not the other way around. Most validators allow replacing the implementation of format validation (or even turning it off completely) - so you can always supply an email definition that works for your usage scenario, but it does not mean that any particular library has a bug that needs to be fixed. |
... and even if a validator needs to be fixed, proposing a PR that fixes a real issue is more valuable than a test case that makes this test suite deviate from the spec. We're only talking about the [...] domain part here, btw; a quoted string should be valid, it seems. |
Whether we're talking about local part or domain part, every case I've provided is a valid RFC 5322 email address. Maybe RFC 5322 truly doesn't specify anything about the domain part - in that case, implementations should not reject emails because of the domain. There is no need to update any spec to address the test cases I have added. I've already addressed this issue in my actual validator of choice: santhosh-tekuri/jsonschema#20 All I want is for the same bug not to occur in other implementations! |
@bvisness So I still haven't gotten a chance to actually form an opinion here, so apologies :/ I would say that I agree the test suite should not be ahead of the spec though -- so if it's "reasonable" to interpret the spec in some way (even if it's ridiculous, like your hostname RFC bug), we definitely should fix it, but not here, first in the spec, then in the draft that's published with the fix. But yeah will try and carve some time out to read your examples in the next couple of days. @epoberezkin I again still would like to see a PR removing ajv from the build, regardless of this ticket. I still can't tell why you're pushing back on that. Again, your validator should not be more special than any other with regards to a new PR that adds previously unseen tests, and if we want to hit #20, we can do so in a way that does not disrupt changes to the test suite. |
I am only pushing back against these additional tests that possibly don't match the current spec. I agree that we should split out running validators against the test cases, so that PRs only fail on structural validation of the test cases. We can introduce a more complex grid later, but I was going to split running all tests into an optional job so we can still screen PRs like this. |
Here's my opinion: implementations that choose to validate format (and "email" in particular) should probably all behave the same. First, we should ensure that the spec references the correct production. Second, we should add as many tests as necessary to ensure that implementations are using the correct production. Finally, as we add more examples it's likely we'll get something wrong, so we should incorporate an email address test library to ensure that our examples are correct. |
OK cool! Phew, glad we agree -- great. That helps me focus on just whether I agree these tests are conformant or not! Sorry for the confusion. @awwright thanks for weighing in, that's roughly what I think too (though I think the first part of your comment is not very likely to ever be true in practice :D) So let's get concrete here again, because I think I'm coming around to @epoberezkin's concrete recommendation but want to make sure we're saying exactly the same thing (which maybe for the last part of this we are not):
In the face of ambiguity, both behaviors are therefore "acceptable" for the current drafts. So first, if we want one behavior or the other, we update the spec to make sure it clearly says so. That I think we already have, since we no longer ask for RFC 1034. So regardless: these tests should be merged in for draft 8. As for earlier drafts, again, either behavior sounds acceptable (agree? disagree?) but validators still need to "pick" one, and in theory be testing it -- e.g., in my validator, I'd like the liberal behavior, so I want tests making sure that happens even on draft 7. So in some sense there are 2 sets of mutually incompatible results, and a need for each validator to pick one of the two, yes? I think we'd all agree that a validator that, say, returned false for one of the 2 disputed examples and true for the other is definitely not in compliance. So, ridiculous as it sounds, maybe what we want is two conflicting tests with opposite results, and then each downstream user runs the ones for the behavior they want and skips the other test? Otherwise you run into the fundamental problem that the suite tries to solve, which is downstream validators writing their own not-shared tests. |
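(Editor's aside: to make the "two conflicting tests" idea concrete, this is roughly what such a pair could look like, sketched as Python literals in the suite's group shape; the descriptions and the address are purely illustrative, not proposed wording.)

```python
# Hypothetical pair of mutually exclusive optional/format groups; a downstream
# validator would run only the group matching the behaviour it chose.
liberal_email_group = {
    "description": "email with a digit-leading domain label (RFC 1123 reading)",
    "schema": {"format": "email"},
    "tests": [{"description": "digit-leading domain label is accepted",
               "data": "joe@1example.com", "valid": True}],
}

strict_email_group = {
    "description": "email with a digit-leading domain label (RFC 1034 reading)",
    "schema": {"format": "email"},
    "tests": [{"description": "digit-leading domain label is rejected",
               "data": "joe@1example.com", "valid": False}],
}
```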
@Julian correct me if I am wrong (it’s a bit confusing :) - you say that a validator can freely choose to return true or false in the scenario of this test, and in both cases it would be correct from draft-07 point of view, but not from draft-08 point of view where only one result should be correct. If so, then I believe that this test should NOT be added to draft-07 (as both interpretations ARE valid, none is REQUIRED to be correct, so no need to test it either way). This test should be added to draft-08 though. As for validators introducing their own tests - I think they all do it already, so it’s not a problem. Tests appear in validators in the process of addressing the issues that can be very specific to only one implementation - I don’t think it is practical to add them all to the suite. Although if you think otherwise, I don’t mind adding Ajv tests here - some of them are VERY specific though, and many people would not understand why they are needed. |
Yeah, I think I would rather have no tests than two sets of tests (especially for an optional part of the spec). I still believe that my new test cases (though esoteric) are valid RFC 5322 emails, even though JSON Schema's understanding of a hostname is flawed. I read the JSON Schema spec as validating hostname and email independently, but at this point it really seems like it's not worth it anyway. I balk at the idea of adding a full email test suite to any version of the JSON Schema tests. The fact that such a comprehensive test suite would be required to enforce an optional "convenience" feature of the spec makes me think that JSON Schema shouldn't have … |
json-schema-org/json-schema-spec#716 I'd have at-mentioned you but I had forgotten this was a test suite issue and couldn't find it when I put up the PR. |
This is one of the most likely directions for … My personal preference would be to outright remove the assertion behavior (instead of having it be optional, with randomly variable support in practice), say that implementations MAY offer validation as an extension but authors MUST NOT rely on any such thing, and instead encourage applications to leverage … |
That sounds like an excellent direction to me! |
@epoberezkin are you saying you have tests beyond those in this repo? I definitely don't, not ones for schema behavior, just for my own language-level API, so if so yes definitely would love to see them in this repo! Obviously hard to know without seeing them but it seems likely it should at least be possible to describe each one in a transparent enough way. |
@Julian yes, lots of JSON schema tests in Ajv repo. I will commit them here in batches. |
Note that draft … |
Per RFC 1123, hostname labels are allowed to start with either a letter or a digit. The RFC 5322 email spec specifies that the domain part of an email is subject to the modifications in RFC 1123. However, the tests for hostnames and emails do not include any examples of hostname labels beginning with a digit.
This PR adds cases to the hostname and email tests for all draft versions.
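(Editor's aside: as an illustration of the kind of case being added -- the exact strings in the PR may differ -- here is the shape of a digit-leading-label test for each format, written as Python literals rather than the suite's raw JSON.)

```python
# Hypothetical examples of valid cases with a domain label starting with a digit.
hostname_case = {
    "description": "hostname with a label starting with a digit is valid",
    "data": "1host.example.com",
    "valid": True,
}

email_case = {
    "description": "email whose domain label starts with a digit is valid",
    "data": "joe.bloggs@1example.com",
    "valid": True,
}
```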