-
-
Notifications
You must be signed in to change notification settings - Fork 216
Extend email format tests #405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c857051
to
bd97d43
Compare
Whoops. It passes all the dot checks correctly but fails Upd: ah, because of Upd: I just removed |
bd97d43
to
e8e406f
Compare
The library I checked with does not think the last two tests (space in domain; symbol in domain) are correct with respect to RFC 2822. |
But they are listed there with |
No, I mean that the library thinks those two cases are valid emails. |
Ah, I misread your comment, sorry. Reading RFC 5322, it does indeed seem that But I removed both to be on the safe side, because I'm not entirely sure there, domain part of the spec seems trickier than I initially thought. |
Before this, a simple /@/ check passed this format test. This now extends the test a bit more, to check that local part is checked at least for something. Symbol check + dot position check is a good indication here, because: 1) It's common and doesn't need implementing complex quotes logic 2) Dot is the only symbol which validity depends on the position in the local part This should trigger most of the naive implementations (e.g. /@/) and is still not hard to implement as it doesn't include quoted strings support. JSON Schema uses RFC 5322. Refs: https://tools.ietf.org/html/rfc5322#page-17 Refs: https://tools.ietf.org/html/rfc5322#page-12
e8e406f
to
7ad7443
Compare
FWIW this is exactly how my implementation implements the email checker as well :) (and will continue to do so) -- though as I've said in #253 I'm fine with other folks adding more tests here obviously, those of us who don't want to implement more strict email format verification will just skip the new ones. |
My two cents: emails, much more so than hostnames, are hard to parse well, and I wouldn't include any Note that even checking for the presence of a "@" is technically wrong because the escaped form, "\@" is allowed inside the local-part (and actually allowed in any quoted parts of that local part, non-escaped, "Abc@def"@example.com, for example. See: RFC 3696, Section 3). And don't get me started on the dot-separated "quoted-string OR atom" form in obs-local-part. |
@ssilverman I don't think that unquoted
Then The intent of this PR is to check if any minimal email validation is present apart from a |
I don't disagree that test sets should contain Per the link in my comment (RFC 3696, Section 3), the following emails are valid (I'm repeating the link for posterity and for future readers):
My point is that whether that's allowed in RFC 5322 or not, you can bet someone uses them somewhere. :) Emails get very messy to parse "properly" and I personally don't see much value in even writing strict tests for them. The domain parts, maybe, but not the local part (before the '@'). Just to emphasize: I don't disagree that these tests work and that it's often a good idea to test things about a grammar, but emails are a special kind of difficult when you see what's out in the wild. Emails are a case where "best practices" sometimes disagree with the specs and various grammars. The tests will test the RFC 5322 grammar, sure, but it's possible for "real emails" to violate this. This is simply my opinion, and I won't object strongly to the tests being included. |
I'll add, regarding backslash-escapes from RFC 5322, just to illustrate that it gets messy:
To me, I interpret that RFC 5322 flips what are valid emails with backslash-escapes. They can appear inside quoted strings, just like RFC 2822 allows (heck, I don't even think that they're allowed outside quoted strings there, unlike what RFC 3696 mentions). (And yes, I know the tests in this PR don't include quoted strings, as you mention above, I'm just continuing the description of my '@' example, for future readers.) Grammars are certainly "fun"! :) |
For posterity, here was the regex I used until recently: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If others are good with it, I'm good with it.
Addresses failures related to: json-schema-org/JSON-Schema-Test-Suite#405
Addresses failures related to: json-schema-org/JSON-Schema-Test-Suite#405
Addresses failures related to: json-schema-org/JSON-Schema-Test-Suite#405
Addresses failures related to: json-schema-org/JSON-Schema-Test-Suite#405
Addresses failures related to: json-schema-org/JSON-Schema-Test-Suite#405
Addresses failures related to: json-schema-org/JSON-Schema-Test-Suite#405
Before this, a simple
/@/
check passed this format test.This now extends the test a bit more, to check that local part is checked at least for something.
Symbol check + dot position check is a good indication here, because:
This should trigger most of the naive implementations (e.g.
/@/
) and is still not hard to implement.JSON Schema uses RFC 5322.
Refs: https://tools.ietf.org/html/rfc5322#page-17
Refs: https://tools.ietf.org/html/rfc5322#page-12
Also, some explanation for the two dots (with links to same RFC pages): https://serverfault.com/questions/395766/are-two-periods-allowed-in-the-local-part-of-an-email-address