Email Address Regex [was Re: silly regex question]

Gary_Wright · 4 January 2006 22:28

It doesn't make you a bad person but it certainly makes
your application less interoperable than it might be.
For example, the vast majority of email addresses on this
mailing list are of the form:

Bill Kelly <billk@cts.com>

In some GUI environments it is harder to select the portion
between the <>'s than to select the entire address.

From RFC 1123:

   At every layer of the protocols, there is a general rule whose
   application can lead to enormous benefits in robustness and
   interoperability:

"Be liberal in what you accept, and conservative
in what you send"

Gary Wright

···

On Jan 4, 2006, at 1:27 PM, Bill Kelly wrote:

Personally, right or wrong, to catch that I just reject
email addresses with a "<" or ">" in them. I'll admit I
don't really care if some spec says it's possible to legally
form email addresses with those characters. That may make
me a bad person.

Jacob_Fugal · 9 January 2006 23:59

And my regex matches that address.

Jacob Fugal

···

On 1/9/06, Gavin Kistner <gavin@refinery.com> wrote:

On Jan 4, 2006, at 12:12 PM, mental@rydia.net wrote:
> Quoting "Andreas S." <f@andreas-s.net>:
>
>> It is trivial to create a formally correct address that makes
>> absolutely no sense, so what's the point of doing such a
>> complicated and error-prone validation?

For example, a friend of mine has the email address:

?@hisdomain.net

(The domain above was changed to protect his privacy. But the single
question mark as the 'username' is all that he has

Andreas_S1 · 4 January 2006 17:56

Jacob Fugal wrote:

···

On 1/4/06, Andreas S. <f@andreas-s.net> wrote:

Tim Fletcher wrote:
> By "error prone" do you mean that it won't detect addresses that don't
> exist?

No, I mean that it might declare some addresses invalid although they
aren't.

You'll see from my comments in the original post[1] and in my reply to
David Black in the other thread[2] that this regex is indeed compliant
with a single, non-named address as defined by the RFC[3].

Possibly. Still, I prefer a simple solution over a complicated one. What
type of errors do you hope to catch with this huge regex? Typing errors?
Deliberately entered rubbish? The regex accepts just about anything with
a "@", e.g. "$@$".
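For comparison, here is a rough sketch of what a "simple solution" might look like in Ruby. The method name `plausible_address?` and the pattern are assumptions for illustration, not something posted in this thread. It only checks for a single "@" with non-blank text on either side, and it accepts "$@$" too.

```ruby
# Hypothetical minimal check (not from the thread): exactly one "@",
# with at least one non-blank, non-"@" character on each side.
def plausible_address?(input)
  /\A[^@\s]+@[^@\s]+\z/.match?(input)
end

plausible_address?("$@$")               # => true
plausible_address?("no-at-sign")        # => false
plausible_address?("two@@example.com")  # => false
```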

--
Posted via http://www.ruby-forum.com/.

Yohanes_Santoso1 · 4 January 2006 19:27

Hal Fulton <hal9000@hypermetrics.com> writes:

Bill Kelly wrote:

But whoever wrote that spec should be
infested with the fleas of 1000 camels.

He probably already is.

Hal

In any case, much of the syntax introduced in RFC 822 has been obsoleted
by RFC 2822.

The complexity of RFC 822 (1982) came from the need to
interoperate with wildly different systems. Consider that the RFC
title was: "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT
MESSAGES", as if there were other Internets and it was necessary to
specify which one.

Consider the title of RFC 2822 (2001): "Internet Message Format".
By then it was already clear that the ARPA Internet was the winner, so
the spec could afford to simplify the address syntax.

YS.

Bill_Kelly · 4 January 2006 23:48

Personally, right or wrong, to catch that I just reject
email addresses with a "<" or ">" in them. I'll admit I
don't really care if some spec says it's possible to legally
form email addresses with those characters. That may make
me a bad person.

It doesn't make you a bad person but it certainly makes
your application less interoperable than it might be.
For example, the vast majority of email addresses on this
mailing list are of the form:

Bill Kelly <billk@cts.com>

In some GUI environments it is harder to select the portion
between the <>'s than to select the entire address.

I'd agree that ideally my web form should be smart enough to
handle that. I started out with no validation at all, and only added the <> rejection after observing the occasional
submit with that syntax causing a bounced email.
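A rough sketch of how a form handler might be "smart enough" for the `Name <address>` form, in Ruby. The helper name `extract_addr_spec` is made up for this example, and this is not a full RFC 2822 parser. It only handles the common case where the address sits in a trailing `<...>`.

```ruby
# Hypothetical helper (not from the thread): pull the bare address out of
# input such as "Bill Kelly <billk@cts.com>". Not a full RFC 2822 parser.
def extract_addr_spec(input)
  text = input.strip
  # If the input ends with <...>, keep only what is between the brackets.
  if (m = text.match(/<([^<>]+)>\z/))
    m[1].strip
  else
    text
  end
end

extract_addr_spec("Bill Kelly <billk@cts.com>")  # => "billk@cts.com"
extract_addr_spec("  billk@cts.com ")            # => "billk@cts.com"
```

Anything more exotic (comments, quoted display names containing brackets, address groups) would need a real parser.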

As I recall, when I asked my then-employer what degree of thoroughness
he wanted me to invest in coding the email validation logic, the
answer was something like, "Just give
them the chance to enter it again, properly, in the manner
requested. If they can't follow simple directions then I
don't think we want them using our software." Woot!

From RFC 1123:

  At every layer of the protocols, there is a general rule whose
  application can lead to enormous benefits in robustness and
  interoperability:

    "Be liberal in what you accept, and conservative
     in what you send"

<h2>agreed in general, but I don't
think everyone is in agreement that browsers' willingness to
<html>render this muck<body> have resulted in a net</html>
benefit for mankind - although, i have heard it argued
</body>both ways.<head>

Regards,

Bill

···

From: <gwtmp01@mac.com>

On Jan 4, 2006, at 1:27 PM, Bill Kelly wrote:

Jacob_Fugal · 4 January 2006 18:19

Not possibly. Guaranteed. It's compliant with the portions of the RFC I mentioned.

Still, I'll concede it doesn't prevent rubbish from being entered. The
domain of valid email addresses is much larger than the domain of
*actual* email addresses. I'm not claiming that this regex should even
be used for form validation. I dislike email validation, period. My
intent in first writing the regex two years ago and bringing it up
again now is mostly:

1) To show off my regex-fu
2) To demonstrate the inadequacy of simplistic regexes for email validation.

For instance, I'll often use the "name+tag@domain" construct to filter
mail and/or determine who's selling my address. When I find a form
that claims that email address is invalid, I get upset. As such, I've
taken it as my own personal crusade to punch down inadequate email
validations whenever I see them. My method is to demonstrate a regex
that does allow valid addresses. My first hope is that they'll notice
the futility and just remove the email address validation altogether.
If that fails, I hope they'll actually use the compliant regex.
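To illustrate the "name+tag@domain" complaint, here is a small Ruby sketch. The `SIMPLISTIC` pattern is invented for this example as a stand-in for the kind of validation being objected to. It is not taken from any particular form or from this thread.

```ruby
# A deliberately simplistic pattern (illustrative only): letters, digits,
# underscore and dot before the "@", then a dotted domain.
SIMPLISTIC = /\A[a-z0-9_.]+@[a-z0-9.-]+\.[a-z]{2,}\z/i

SIMPLISTIC.match?("name@example.com")      # => true
SIMPLISTIC.match?("name+tag@example.com")  # => false, "+" is not in the allowed set
```

The second address is perfectly deliverable, but the form would tell the user it is invalid.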

The only reason I defended the regex was because you claimed it was
invalid. If your original argument had been that the regex was
unnecessary, I'd probably have agreed with you. Validating email
addresses by form is pointless. If someone doesn't want to give you
their address, they won't. Requiring them to input a valid fake
address instead of an invalid fake address doesn't improve your data
at all. The only reason I can see that being necessary is to prevent
malformed addresses from breaking your application in some way. But if
that's a problem, fix the application, not the email address.

Jacob Fugal

···

On 1/4/06, Andreas S. <f@andreas-s.net> wrote:

Jacob Fugal wrote:
> On 1/4/06, Andreas S. <f@andreas-s.net> wrote:
>> Tim Fletcher wrote:
>> > By "error prone" do you mean that it won't detect addresses that don't
>> > exist?
>>
>> No, I mean that it might declare some addresses invalid although they
>> aren't.
>
> You'll see from my comments in the original post[1] and in my reply to
> David Black in the other thread[2] that this regex is indeed compliant
> with a single, non-named address as defined by the RFC[3].

Possibly. Still, I prefer a simple solution over a complicated one. What
type of errors do you hope to catch with this huge regex? Typing errors?
Deliberately entered rubbish? The regex accepts just about anything with
a "@", e.g. "$@$".

Chad_Perrin1 · 5 January 2006 02:50

You could always just specifically disallow non-standards-compliant
(X)HTML, though depending on how you handle that it might end up
rejecting a lot of stuff meant for IE and OE that could be of use to you
(depending on what you find useful).

···

On Thu, Jan 05, 2006 at 08:48:48AM +0900, Bill Kelly wrote:

> From RFC 1123:
>
> At every layer of the protocols, there is a general rule whose
> application can lead to enormous benefits in robustness and
> interoperability:
>
> "Be liberal in what you accept, and conservative
> in what you send"

<h2>agreed in general, but I don't
think everyone is in agreement that browsers' willingness to
<html>render this muck<body> have resulted in a net</html>
benefit for mankind - although, i have heard it argued
</body>both ways.<head>

--
Chad Perrin [ CCD CopyWrite | http://ccd.apotheon.org ]

This sig for rent: a Signify v1.14 production from http://www.debian.org/

Andreas_S1 · 4 January 2006 21:00

Jacob Fugal wrote:

The only reason I defended the regex was because you claimed it was
invalid.

I don't remember that. I dislike complex solutions like this regex
because they are error prone (as proved by your correction for Tim's
rfc822.rb); I didn't claim yours was invalid.

If your original argument had been that the regex was
unnecessary, I'd probably have agreed with you. Validating email
addresses by form is pointless. If someone doesn't want to give you
their address, they won't. Requiring them to input a valid fake
address instead of an invalid fake address doesn't improve your data
at all. The only reason I can see that being necessary is to prevent
malformed addresses from breaking your application in some way. But if
that's a problem, fix the application, not the email address.

I totally agree with you.

···

--
Posted via http://www.ruby-forum.com/.

Jeff_Moss1 · 4 January 2006 22:27

Here's my useful form validation:

/^\s*([-a-z0-9&\'*+.\/=?^_{}~]+@([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)+[a-z]{2,5}\s*(,\s*|\z))+$/i

It may not catch EVERYTHING, but should work just fine for most
people. It will allow multiple email addresses separated by commas.

I figure if you want to go beyond that, a verification system would be
the next logical step.
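A quick way to see what the pattern above does and does not accept, sketched in Ruby. The constant name `FORM_EMAILS` and the sample inputs are chosen for this example, and the regex itself is copied unchanged from the post.

```ruby
# The pattern from the post, copied unchanged; only the constant name is new.
FORM_EMAILS = /^\s*([-a-z0-9&\'*+.\/=?^_{}~]+@([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?\.)+[a-z]{2,5}\s*(,\s*|\z))+$/i

FORM_EMAILS.match?("billk@cts.com")                 # => true
FORM_EMAILS.match?("a@example.com, b@example.org")  # => true, comma-separated list
FORM_EMAILS.match?("name+tag@example.com")          # => true, "+" is allowed
FORM_EMAILS.match?("?@hisdomain.net")               # => true, "?" is allowed
FORM_EMAILS.match?("Bill Kelly <billk@cts.com>")    # => false, no display names
FORM_EMAILS.match?("user@example.museum")           # => false, TLD longer than 5 letters
```

The last case shows where "may not catch EVERYTHING" bites: the `{2,5}` limit on the top-level domain rejects longer ones.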

-Jeff

···

On Thu, Jan 05, 2006 at 03:19:09AM +0900, Jacob Fugal wrote:

On 1/4/06, Andreas S. <f@andreas-s.net> wrote:
> Jacob Fugal wrote:
> > On 1/4/06, Andreas S. <f@andreas-s.net> wrote:
> >> Tim Fletcher wrote:
> >> > By "error prone" do you mean that it won't detect addresses that don't
> >> > exist?
> >>
> >> No, I mean that it might declare some addresses invalid although they
> >> aren't.
> >
> > You'll see from my comments in the original post[1] and in my reply to
> > David Black in the other thread[2] that this regex is indeed compliant
> > with a single, non-named address as defined by the RFC[3].
>
> Possibly. Still, I prefer a simple solution over a complicated one. What
> type of errors do you hope to catch with this huge regex? Typing errors?
> Deliberately entered rubbish? The regex accepts just about anything with
> a "@", e.g. "$@$".

Not possibly. Guaranteed. It's compliant with the portions of the RFC I mentioned.

Still, I'll concede it doesn't prevent rubbish from being entered. The
domain of valid email addresses is much larger than the domain of
*actual* email addresses. I'm not claiming that this regex should even
be used for form validation. I dislike email validation, period. My
intent in first writing the regex two years ago and bringing it up
again now is mostly:

1) To show off my regex-fu
2) To demonstrate the inadequacy of simplistic regexes for email validation.

For instance, I'll often use the "name+tag@domain" construct to filter
mail and/or determine who's selling my address. When I find a form
that claims that email address is invalid, I get upset. As such, I've
taken it as my own personal crusade to punch down inadequate email
validations whenever I see them. My method is to demonstrate a regex
that does allow valid addresses. My first hope is that they'll notice
the futility and just remove the email address validation altogether.
If that fails, I hope they'll actually use the compliant regex.

The only reason I defended the regex was because you claimed it was
invalid. If your original argument had been that the regex was
unnecessary, I'd probably have agreed with you. Validating email
addresses by form is pointless. If someone doesn't want to give you
their address, they won't. Requiring them to input a valid fake
address instead of an invalid fake address doesn't improve your data
at all. The only reason I can see that being necessary is to prevent
malformed addresses from breaking your application in some way. But if
that's a problem, fix the application, not the email address.

Jacob Fugal

Jacob_Fugal · 5 January 2006 17:03

Ok, checking back on the flow here, this is what I saw:

[Jacob] Be careful with email validation via regex, it's harder than
you might think: <example regex>

  [Andreas] It is trivial to create a formally correct address that
  makes absolutely no sense, so what's the point of doing such a
  complicated and *error-prone validation*?

[Tim] By "error prone" do you mean that it won't detect addresses
that don't exist?

[Andreas] No, I mean that *it might declare some addresses invalid*
although they aren't.

In my mind, due to the use of pronouns, I believed the "error-prone
validation [that] might declare some addresses invalid" referred to my
example regex. Apparently they referred instead to inadequate regex
validation in general. Sorry for the confusion.

Jacob Fugal

···

On 1/4/06, Andreas S. <f@andreas-s.net> wrote:

Jacob Fugal wrote:
> The only reason I defended the regex was because you claimed it was
> invalid.

I don't remember that. I dislike complex solutions like this regex
because they are error prone (as proved by your correction for Tim's
rfc822.rb); I didn't claim yours was invalid.
