Skip to content
This repository was archived by the owner on Apr 12, 2024. It is now read-only.

Fix eMail address regex #14719

Closed
wants to merge 7 commits into from
Closed

Fix eMail address regex #14719

wants to merge 7 commits into from

Conversation

mirabilos
Copy link
Contributor

These commits address a few shortcomings of the current eMail address regular expression used for form validation:

  • ‘.’ may not begin or end a local-part (SMTP RFC)
  • limit host part total length to 255, label length to 63 (DNS RFC)
  • case-sensitive comparison (speed optimisation and reliability)

mirabilos added 4 commits June 6, 2016 15:00
A period (‘.’) may not begin or end a local-part

cf. http://www.mirbsd.org/cvs.cgi/contrib/hosted/tg/mailfrom.php?rev=HEAD
and RFC 822 / 5321
A “label” (each of the things between the dots (‘.’) after the ‘@’ in
the eMail address) MUST NOT be longer than 63 characters.

cf. http://www.mirbsd.org/cvs.cgi/contrib/hosted/tg/mailfrom.php?rev=HEAD
and RFC 1035 §2.3.4
RFC 1035 §2.3.4 imposes a maximum length for any DNS object;
RFC 5321 §2.3.5 references this (and requires FQDNs, but there
have been cases where a TLD had an MX RR and thus eMail addresses
like “localpart@tld” are valid).

Credits: Natureshadow <[email protected]>
There is no need to make it case-insensitive; the local-part
already catches the entire range, and the host part is easily
done. Furthermore, this makes the regex locale-independent,
avoiding issues with e.g. turkish case conversions.

cf. http://www.mirbsd.org/cvs.cgi/contrib/hosted/tg/mailfrom.php?rev=HEAD
@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

1 similar comment
@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@mirabilos
Copy link
Contributor Author

I signed it!

(although this diff does not pass threshold of originality, and as such is not covered by copyright protection)

@Narretz
Copy link
Contributor

Narretz commented Jun 6, 2016

@mirabilos CLA is required for all Google projects, unfortunately. I wouldn't require it personally.

Can you include links to the docs on which these changes are based?

And we also need tests for these changes in this file: https://github.com/angular/angular.js/blob/master/test/ng/directive/inputSpec.js#L2784

@Narretz Narretz added this to the Backlog milestone Jun 6, 2016
@mirabilos
Copy link
Contributor Author

Martin Staffa dixit:

@mirabilos CLA is required for all Google projects, unfortunately. I
wouldn't require it personally.

Yeswell, I did sign it, and today I even went and made sure
the eMail address I used in the commits was registered, so
go on ;)

Can you include links to the docs on which these changes are based?

RFC 822/5321 (SMTP) and RFC 1035 (DNS) are linked from the
individual commit messages, shouldn’t that suffice?

And we also need tests for these changes in this file:
https://github.com/angular/angular.js/blob/master/test/ng/directive/inputSpec.js#L2784

When I read that I thought “oh nooooo…” at first, but that
looks reasonable. I’ll see I can throw some together that
cover eMail addresses in more depth.

Is adding them in a subsequent commit okay with you? I don’t
much like destroying the actual history by rebasing too much.

Thanks,

//mirabilos

tarent solutions GmbH
Rochusstraße 2-4, D-53123 Bonn • http://www.tarent.de/
Tel: +49 228 54881-393 • Fax: +49 228 54881-235
HRB 5168 (AG Bonn) • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

@googlebot
Copy link

CLAs look good, thanks!

1 similar comment
@googlebot
Copy link

CLAs look good, thanks!

@googlebot googlebot added cla: yes and removed cla: no labels Jun 6, 2016
@@ -24,7 +24,7 @@ var ISO_DATE_REGEXP = /^\d{4,}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d\.\d+(?:[+-
// 9. Fragment
// 1111111111111111 222 333333 44444 555555555555555555555555 666 77777777 8888888 999
var URL_REGEXP = /^[a-z][a-z\d.+-]*:\/*(?:[^:@]+(?::[^@]+)?@)?(?:[^\s:/?#]+|\[[a-f\d:]+\])(?::\d+)?(?:\/[^?#]*)?(?:\?[^#]*)?(?:#.*)?$/i;
var EMAIL_REGEXP = /^[a-z0-9!#$%&'*+\/=?^_`{|}~.-]+@[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$/i;
var EMAIL_REGEXP = /^[-!#-'*+\/-9=?A-Z^-~]+(\.[-!#-'*+\/-9=?A-Z^-~]+)*@(?=.{1,255}$)[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ranges such as #-' or \/-9 make this much less readable imo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Georgios Kalpakas dixit:

Ranges such as #-' or \/-9 make this much less readable imo.

OK, I’ll expand them. But then, there are people who can’t read
a regex no matter how legible it is, and others will know about
it…

bye,

//mirabilos

tarent solutions GmbH
Rochusstraße 2-4, D-53123 Bonn • http://www.tarent.de/
Tel: +49 228 54881-393 • Fax: +49 228 54881-235
HRB 5168 (AG Bonn) • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

@mirabilos
Copy link
Contributor Author

mirabilos commented Jun 6, 2016

@Narretz tests included, let’s wait for Travis and hope I didn’t add a typo somewhere by accident… and that expanding the character class ranges didn’t fuck it up either…

@mirabilos
Copy link
Contributor Author

@Narretz haha “line is too long”, any idea how I can address this test?

Inhouse we always exclude checks such as this one from being run on the testsuite, only on the main code.

In this special case, the line is legitimately very long as it tests a very long string… which, I suppose, I could concatenate, but… any suggestions? It’s your project after all.

requested by Georgios Kalpakas in #14719

(let’s hope all the special characters are accepted by JS as-is…)
@gkalpak
Copy link
Member

gkalpak commented Jun 6, 2016

LGTM (assuming the tests are green). Now, the big question: Is it a BC ?

@mirabilos
Copy link
Contributor Author

mirabilos commented Jun 6, 2016

@gkalpak still waiting for the tests, I tried to get around the line length issue by concatenating, and had a pasto in them indeed (something I tested from the command-line but mispasted).

Now, the big question: what is a “BC”?

https://www.mirbsd.org/wtf.cgi?q=BC doesn’t seem to match in this context…

@mirabilos
Copy link
Contributor Author

I just recherched the maximum length of a local-part, and it’s 64 (RFC 5321 §4.5.3.1.1) so I’ll add another change to limit (and check!) that, but I’ll wait for this set of tests to pass first.

@Narretz
Copy link
Contributor

Narretz commented Jun 6, 2016

BC means Breaking Change, i.e. if the change will break something that worked before in someone's application. Since this tightens it in terms of the spec, I think it's not BC.

BTW, we are going to squash the commits into one before merging (but preserve the commit message contents)

@gkalpak
Copy link
Member

gkalpak commented Jun 6, 2016

Lol, sorry for the acronym. So, "BC" is a breaking change.

Regarding, RFC 5321 §4.5.3.1.1, it sounds more like a recommendation than a hard limit:

Every implementation MUST be able to receive objects of at least
these sizes. Objects larger than these sizes SHOULD be avoided when
possible. However, some Internet mail constructs such as encoded
X.400 addresses (RFC 2156 [35]) will often require larger objects.

It might be better to leave local-part unrestricted.

@mirabilos
Copy link
Contributor Author

No need to be sorry about the acronyms – I actually do collect them.

I’d prefer to have the commits not squashed, as they contain detail relevant for each partial change, but… I guess it’s your project.

This is not a breaking change because ① it only invalidates some eMail addresses previously passing as valid which are not valid in SMTP, and ② your own documentation – https://docs.angularjs.org/api/ng/input/input%5Bemail%5D – suggests users with specific requirements to use their own checks anyway.

I’d not let things unrestricted because the regex used is a “reasonably sane default” for eMail addresses, which omits, for example, all obsolete forms, the quoted localpart form, and the literal addresses like user@[10.0.0.1] and user@[IPv6:::1] from the set of valid addresses. In short, it’s a regex for accepting “what users are most likely to use as internet-reachable eMail address on a website signup form”. Site-local use, e.g. non-SMTP MTAs (Exchange, Lotus, X.400 or something) have specific rules, but I think they’re outside the scope of this. As the Perl people say: 「How to validate a user supplied email address is a FAQ (see perlfaq9): the only sure way to see if a supplied email address is genuine is to send an email to it and see if the user recieves it.」

In case you’re curious: there’s actually an eMail address validating exactly what the RFC says, but it’s so large it doesn’t even fit on my (laptop) screen. I can dig it out if needed.

I think the use case for this expression is to catch common typos and mistakes.

@gkalpak
Copy link
Member

gkalpak commented Jun 6, 2016

I’d prefer to have the commits not squashed, as they contain detail relevant for each partial change

We prefer to squash the commit when merging (so our history is cleaner on a higher level), but keep the unsquashed commits on the PR, which we link to from the squashed commit message. So, all the context is preserved and is easy to get to from the final commit.

I’d not let things unrestricted because the regex used is a “reasonably sane default” for eMail addresses, which omits, for example, all obsolete forms, the quoted localpart form, and the literal addresses like user@[10.0.0.1] and user@[IPv6:::1] from the set of valid addresses.

Point taken 😃

@mirabilos
Copy link
Contributor Author

OK.

Oh great, now we have a line length issue in the regex, out of all places. I’ll try to wrap this.

(On the topic of acronyms, if you have any that http://www.mirbsd.org/wtf.htm or https://f-droid.org/repository/browse/?fdid=de.naturalnet.mirwtfapp doesn’t know, send them to me privately.)

@Narretz
Copy link
Contributor

Narretz commented Jun 6, 2016

Don't wrap it. Use

/* jshint maxlen:250 */
// regex line
/* jshint maxlen:200 */

Untested, but should work

mirabilos added 2 commits June 7, 2016 00:45
RFC 5321 §4.5.3.1.1 ⇒ local-part can have up to 64 octets
RFC 5321 §4.5.3.1.3 ⇒ path “including the punctuation and
element separators” can have up to 256 octets
RFC 5321 §4.1.2 specifies path as ‘<’ + mailbox¹ + ‘>’ in
the best case, leaving us 254 octets

The limitation of the total path size to 254 octets leaves
at most 252 octets (one local-part, one ‘@’) for the domain,
which means we don’t need to explicitly check the domain
size any more (removing the assertion after the ‘@’).

① RFC 821/5321 “mailbox” is the same as RFC 822 “addr-spec”
@mirabilos
Copy link
Contributor Author

Thanks, that indeed worked. I think we’re ready for review then.

petebacondarwin pushed a commit that referenced this pull request Jun 10, 2016
**Limit size of local-part and total path size in eMail addresses**

RFC 5321 §4.5.3.1.1 ⇒ local-part can have up to 64 octets
RFC 5321 §4.5.3.1.3 ⇒ path “including the punctuation and
element separators” can have up to 256 octets
RFC 5321 §4.1.2 specifies path as ‘<’ + mailbox¹ + ‘>’ in
the best case, leaving us 254 octets

The limitation of the total path size to 254 octets leaves
at most 252 octets (one local-part, one ‘@’) for the domain,
which means we don’t need to explicitly check the domain
size any more (removing the assertion after the ‘@’).

① RFC 821/5321 “mailbox” is the same as RFC 822 “addr-spec”

**Optimise eMail address regex for speed**

There is no need to make it case-insensitive; the local-part
already catches the entire range, and the host part is easily
done. Furthermore, this makes the regex locale-independent,
avoiding issues with e.g. turkish case conversions.

cf. http://www.mirbsd.org/cvs.cgi/contrib/hosted/tg/mailfrom.php?rev=HEAD

**Limit eMail address total host part length**

RFC 1035 §2.3.4 imposes a maximum length for any DNS object;
RFC 5321 §2.3.5 references this (and requires FQDNs, but there
have been cases where a TLD had an MX RR and thus eMail addresses
like “localpart@tld” are valid).

Credits: Natureshadow <[email protected]>

**Limit eMail address individual host part length**

A “label” (each of the things between the dots (‘.’) after the ‘@’ in
the eMail address) MUST NOT be longer than 63 characters.

cf. http://www.mirbsd.org/cvs.cgi/contrib/hosted/tg/mailfrom.php?rev=HEAD
and RFC 1035 §2.3.4

**Fix eMail address local-part validation**

A period (‘.’) may not begin or end a local-part

cf. http://www.mirbsd.org/cvs.cgi/contrib/hosted/tg/mailfrom.php?rev=HEAD
and RFC 822 / 5321

Closes #14719
@frederikprijck frederikprijck mentioned this pull request May 31, 2017
3 tasks
@gkalpak gkalpak mentioned this pull request Sep 11, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants