Skip to content

Extend email format tests #405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 4, 2020

Conversation

ChALkeR
Copy link
Member

@ChALkeR ChALkeR commented Jun 12, 2020

Before this, a simple /@/ check passed this format test.

This now extends the test a bit more, to check that local part is checked at least for something.

Symbol check + dot position check is a good indication here, because:

  1. It's common and doesn't need implementing more complex logic (quotes etc)
  2. Dot is the only symbol which validity depends on the position in the local part

This should trigger most of the naive implementations (e.g. /@/) and is still not hard to implement.

JSON Schema uses RFC 5322.

Refs: https://tools.ietf.org/html/rfc5322#page-17
Refs: https://tools.ietf.org/html/rfc5322#page-12

Also, some explanation for the two dots (with links to same RFC pages): https://serverfault.com/questions/395766/are-two-periods-allowed-in-the-local-part-of-an-email-address

@ChALkeR ChALkeR marked this pull request as draft June 12, 2020 06:44
@ChALkeR ChALkeR marked this pull request as ready for review June 12, 2020 06:49
@ChALkeR ChALkeR force-pushed the chalker/email-format branch 2 times, most recently from c857051 to bd97d43 Compare June 12, 2020 10:26
@ChALkeR
Copy link
Member Author

ChALkeR commented Jun 12, 2020

ajv should pass all of these.

Whoops. It passes all the dot checks correctly but fails root@localhost for some reason.
Strange, require('ajv')().compile({format:'email'})('root@localhost') works but npm t in this repo fails on that.

Upd: ah, because of require('ajv')({format:'full'}).compile({format:'email'})('root@localhost')

Upd: I just removed root@localhost for now due to not being sure if that's intended behavior or not.

@ChALkeR ChALkeR force-pushed the chalker/email-format branch from bd97d43 to e8e406f Compare June 12, 2020 10:40
@karenetheridge
Copy link
Member

The library I checked with does not think the last two tests (space in domain; symbol in domain) are correct with respect to RFC 2822.

@ChALkeR
Copy link
Member Author

ChALkeR commented Jun 12, 2020

The library I checked with does not think the last two tests (space in domain; symbol in domain) are correct with respect to RFC 2822.

But they are listed there with valid: false, so that's as it should be?

@karenetheridge
Copy link
Member

No, I mean that the library thinks those two cases are valid emails.

@ChALkeR
Copy link
Member Author

ChALkeR commented Jun 12, 2020

Ah, I misread your comment, sorry.

Reading RFC 5322, it does indeed seem that & in the domain part is allowed (which is strange, I would say).
Space shouldn't be allowed though.

But I removed both to be on the safe side, because I'm not entirely sure there, domain part of the spec seems trickier than I initially thought.

Before this, a simple /@/ check passed this format test.

This now extends the test a bit more, to check that local part is
checked at least for something.

Symbol check + dot position check is a good indication here, because:
1) It's common and doesn't need implementing complex quotes logic
2) Dot is the only symbol which validity depends on the position in the
   local part

This should trigger most of the naive implementations (e.g. /@/) and is
still not hard to implement as it doesn't include quoted strings support.

JSON Schema uses RFC 5322.

Refs: https://tools.ietf.org/html/rfc5322#page-17
Refs: https://tools.ietf.org/html/rfc5322#page-12
@ChALkeR ChALkeR force-pushed the chalker/email-format branch from e8e406f to 7ad7443 Compare June 12, 2020 19:44
@Julian
Copy link
Member

Julian commented Jun 14, 2020

Before this, a simple /@/ check passed this format test.

FWIW this is exactly how my implementation implements the email checker as well :) (and will continue to do so) -- though as I've said in #253 I'm fine with other folks adding more tests here obviously, those of us who don't want to implement more strict email format verification will just skip the new ones.

@ssilverman
Copy link
Member

ssilverman commented Jun 19, 2020

My two cents: emails, much more so than hostnames, are hard to parse well, and I wouldn't include any false tests here. Yes, the tests you have here are technically valid, and yes, they follow the RFCs, but email is notoriously "flexible".

Note that even checking for the presence of a "@" is technically wrong because the escaped form, "\@" is allowed inside the local-part (and actually allowed in any quoted parts of that local part, non-escaped, "Abc@def"@example.com, for example. See: RFC 3696, Section 3). And don't get me started on the dot-separated "quoted-string OR atom" form in obs-local-part.

@ChALkeR
Copy link
Member Author

ChALkeR commented Jun 19, 2020

@ssilverman I don't think that unquoted \@ is allowed per RFC 5322, which the spec uses?
And this purposely excludes any tests for quoted strings.

wouldn't include any false tests here

Then email format should be removed from the spec alltogether, if users can't assume that it will return false on at least something. I.e. if we can't assume that email validation does at least something -- why is that even included?

The intent of this PR is to check if any minimal email validation is present apart from a /@/ check.
If there are some easier tests that these -- I am fine with replacing these with those :-).

@ssilverman
Copy link
Member

ssilverman commented Jun 19, 2020

I don't disagree that test sets should contain false tests. I just think they make life complicated specifically for emails. This is just my opinion and two cents.

Per the link in my comment (RFC 3696, Section 3), the following emails are valid (I'm repeating the link for posterity and for future readers):

  1. Unquoted form: Abc\@[email protected]
  2. Quoted form: "Abc@def"@example.com

My point is that whether that's allowed in RFC 5322 or not, you can bet someone uses them somewhere. :)

Emails get very messy to parse "properly" and I personally don't see much value in even writing strict tests for them. The domain parts, maybe, but not the local part (before the '@'). Just to emphasize: I don't disagree that these tests work and that it's often a good idea to test things about a grammar, but emails are a special kind of difficult when you see what's out in the wild.

Emails are a case where "best practices" sometimes disagree with the specs and various grammars.

The tests will test the RFC 5322 grammar, sure, but it's possible for "real emails" to violate this. This is simply my opinion, and I won't object strongly to the tests being included.

@ssilverman
Copy link
Member

ssilverman commented Jun 19, 2020

I'll add, regarding backslash-escapes from RFC 5322, just to illustrate that it gets messy:

  1. local-part can contain obs-local-part which can contain words.
  2. words are either atoms or quoted-strings.
  3. quoted-strings may contain qtext or quoted_pairs.
  4. And, if you continue going down into the grammar, you'll find that VCHARs include the '@' character. So now, you can have backslash-escaped '@' characters inside quoted strings.

To me, I interpret that RFC 5322 flips what are valid emails with backslash-escapes. They can appear inside quoted strings, just like RFC 2822 allows (heck, I don't even think that they're allowed outside quoted strings there, unlike what RFC 3696 mentions). (And yes, I know the tests in this PR don't include quoted strings, as you mention above, I'm just continuing the description of my '@' example, for future readers.)

Grammars are certainly "fun"! :)

@ssilverman ssilverman requested a review from Julian June 19, 2020 06:33
@ssilverman
Copy link
Member

ssilverman commented Jun 19, 2020

For posterity, here was the regex I used until recently: ^[^@]+@[^@]+$.
Here's the regex I use now: ^.*[^\\]@[^@]+$
Double-yuck. 🤢

@ssilverman ssilverman removed the request for review from Julian June 19, 2020 06:40
Copy link
Member

@ssilverman ssilverman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If others are good with it, I'm good with it.

@ssilverman ssilverman requested a review from a team June 19, 2020 06:48
@Julian Julian merged commit 1d5c3c0 into json-schema-org:master Jul 4, 2020
davishmcclurg added a commit to davishmcclurg/json_schemer that referenced this pull request Oct 21, 2021
davishmcclurg added a commit to davishmcclurg/json_schemer that referenced this pull request Apr 22, 2022
davishmcclurg added a commit to davishmcclurg/json_schemer that referenced this pull request May 15, 2023
davishmcclurg added a commit to davishmcclurg/json_schemer that referenced this pull request May 16, 2023
davishmcclurg added a commit to davishmcclurg/json_schemer that referenced this pull request May 22, 2023
davishmcclurg added a commit to davishmcclurg/json_schemer that referenced this pull request May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants