Here's a Rails example for validating email addresses.
validates_format_of :login, :with => /
^[-^!$#%&'*+\/=?`{|}~.\w]+
@[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*
(\.[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*)+$/x,
:message => "must be a valid email address",
:on => :create
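One thing worth checking with a pattern like the one above: in Ruby, `^` and `$` anchor at line boundaries, not at the start and end of the whole string, so a multi-line value can pass. Below is a minimal sketch in plain Ruby (no Rails). The constant names are made up for illustration, and the pattern body is copied from the post with only the anchors changed in the second version.

```ruby
# Pattern body copied from the post; only the anchors differ between the two.
LINE_ANCHORED = /
  ^[-^!$#%&'*+\/=?`{|}~.\w]+
  @[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*
  (\.[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*)+$
/x

# Same pattern, anchored to the whole string with \A and \z.
STRING_ANCHORED = /
  \A[-^!$#%&'*+\/=?`{|}~.\w]+
  @[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*
  (\.[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*)+\z
/x

plain     = "dan@example.com"
two_lines = "dan@example.com\nBcc: someone@example.org"

puts LINE_ANCHORED.match?(plain)         # => true
puts STRING_ANCHORED.match?(plain)       # => true
puts LINE_ANCHORED.match?(two_lines)     # => true  (first line alone satisfies ^...$)
puts STRING_ANCHORED.match?(two_lines)   # => false
```

Whether the stricter anchors matter depends on where the value ends up; for anything that lands in a mail header they probably do.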
Be careful with email validation via regex, it's harder than you might
think[1][2]:
/^([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\x0C\x0E-\x21\x23-\x5B\x5D
-\x7F]|\\[\x00-\x7F])*")(\.([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\
x0C\x0E-\x21\x23-\x5B\x5D-\x7F]|\\[\x00-\x7F])*"))*@([a-zA-Z0-9&_?\/`!
|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\])(\.
([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[
\x00-\x7F])*\]))*$/
Jacob Fugal
[1] From http://phantom.byu.edu/pipermail/uug-list/2004-January/009707.html
[2] That regex needs some serious /x treatment, which I didn't know
about at the time it was written.
···
On 1/3/06, Dan Kohn <dan@dankohn.com> wrote:
Hello.
Jacob Fugal:
Be careful with email validation via regex, it's harder than you might
think[1][2]:
/^([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\x0C\x0E-\x21\x23-\x5B\x5D
-\x7F]|\\[\x00-\x7F])*")(\.([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\
x0C\x0E-\x21\x23-\x5B\x5D-\x7F]|\\[\x00-\x7F])*"))*@([a-zA-Z0-9&_?\/`!
|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\])(\.
([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[
\x00-\x7F])*\]))*$/
It does match
" spaces! @s! \"escaped quotes!\" "@shot.pl
and it's the first one doing this that I know of, kudos!
Unfortunately, it does not match 'international' domains, so
it wouldn't pass addresses in the domain of, say, gżegżółka.pl
Cheers,
-- Shot
···
--
Like the ski resort of girls looking for husbands and husbands looking
for girls, the situation is not as symmetrical as it might seem.
Job security? I mean, without pointer arithmetic and its associated mysteries (negative array indices were a personal favourite), we need something to keep us gainfully employed!
matthew smillie.
···
On Jan 4, 2006, at 12:47, Andreas S. wrote:
Jacob Fugal wrote:
On 1/3/06, Dan Kohn <dan@dankohn.com> wrote:
Here's a rails example for validating email addresses.
validates_format_of :login, :with => /
^[-^!$#%&'*+\/=?`{|}~.\w]+
@[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*
(\.[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*)+$/x,
:message => "must be a valid email address",
:on => :create
Be careful with email validation via regex, it's harder than you might
think[1][2]:
/^([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\x0C\x0E-\x21\x23-\x5B\x5D
-\x7F]|\\[\x00-\x7F])*")(\.([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\
x0C\x0E-\x21\x23-\x5B\x5D-\x7F]|\\[\x00-\x7F])*"))*@([a-zA-Z0-9&_?\/`!
|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\])(\.
([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[
\x00-\x7F])*\]))*$/
It is trivial to create a formally correct address that makes absolutely
no sense, so what's the point of doing such a complicated and
error-prone validation?
By "error prone" do you mean that it won't detect addresses that don't
exist?
Is it not still better to catch some errors than none at all?
Yeah, as I said in the footnote, the regex I posted needed some
readability treatment. Yours looks pretty nice, and exactly equivalent
except for a typo in quoted_pair:
- quoted_pair = '\\x5c\\x00-\\x7f'
+ quoted_pair = '\\x5c[\\x00-\\x7f]'
Jacob Fugal
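To see what the one-line fix to quoted_pair changes, here is a small sketch. It assumes the fragments are single-quoted strings that later get compiled with Regexp.new, as the diff suggests; the variable names and the \A...\z wrapper are only for illustration and are not taken from rfc822.rb.

```ruby
# Single-quoted strings keep the backslashes, as in the diff above.
typo_quoted_pair  = '\\x5c\\x00-\\x7f'
fixed_quoted_pair = '\\x5c[\\x00-\\x7f]'

# Wrap each fragment so it has to match the whole test string.
typo_re  = Regexp.new("\\A(?:#{typo_quoted_pair})\\z")
fixed_re = Regexp.new("\\A(?:#{fixed_quoted_pair})\\z")

escaped_quote = '\\"'   # two characters: a backslash, then a double quote

puts fixed_re.match?(escaped_quote)   # => true
puts typo_re.match?(escaped_quote)    # => false

# Without the brackets the fragment is a literal four-character sequence:
# backslash, NUL, hyphen, DEL.
puts typo_re.match?("\\\x00-\x7F")    # => true
```

So the typo version would only ever accept one very odd escape, while the bracketed version accepts a backslash followed by any 7-bit character.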
···
On 1/4/06, Tim Fletcher <twoggle@gmail.com> wrote:
http://tfletcher.com/lib/rfc822.rb
(doesn't look quite as messy
Yeah, I've seen that one as well. My regex is only meant to match the
definition of an 'addr-spec' token (described as "global" or "simple"
address) in section 6.1 of the RFC822 grammar, as opposed to a
'mailbox' or 'address'. I figure people aren't going to type the "John
Doe <john@doe.com>" format into a form, nor named lists ('group' token
in the grammar).
Jacob Fugal
···
On 1/4/06, dblack@wobblini.net <dblack@wobblini.net> wrote:
See also: Mail::RFC822::Address
Quoting "Andreas S." <f@andreas-s.net>:
It is trivial to create a formally correct address that makes
absolutely no sense, so what's the point of doing such a
complicated and error-prone validation?
Well, I might actually have one.
The comment form on my web site sends email directly to me; as a
convenience, the email address entered on the form becomes the
email's From address (I can see who it's from and reply more
easily).
Now, doing that would open up all sorts of injection attacks if I
didn't do any validation. So I do a quick and paranoid (syntactic)
validity check -- if the address fails, then it is included in the
body of the message instead of a header field.
In this case, a nonsensical address is perfectly fine (I will see it
and know better), and it's even okay if a valid address is rejected
(I'll still get the message and be able to figure things out from
the body), but I have to be able to detect syntactically invalid
addresses.
-mental
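The fallback described above could look roughly like this. It is only a sketch under stated assumptions: the pattern, the method name, the hash layout and the fallback sender address are all hypothetical, and the pattern is deliberately much stricter than the RFC so that it rejects some valid addresses.

```ruby
# Deliberately strict: a small ASCII subset, no quoting, no whitespace,
# and \A...\z so that embedded line breaks can never pass.
PARANOID_ADDRESS = /\A[a-zA-Z0-9._+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+\z/

# Returns a hash describing the mail to send. If the claimed address fails
# the check, it goes into the body (escaped via inspect) and a fixed,
# hypothetical fallback address is used for the From header instead.
def build_comment_mail(claimed_from, text, fallback_from: "webform@example.invalid")
  if PARANOID_ADDRESS.match?(claimed_from)
    { headers: { "From" => claimed_from }, body: text }
  else
    { headers: { "From" => fallback_from },
      body: "Claimed sender: #{claimed_from.inspect}\n\n#{text}" }
  end
end

ok  = build_comment_mail("reader@example.org", "Nice post.")
bad = build_comment_mail("x@example.org\r\nBcc: victim@example.org", "Hi.")

puts ok[:headers]["From"]    # => reader@example.org
puts bad[:headers]["From"]   # => webform@example.invalid
```

A rejected but valid address costs nothing here, since the message still arrives with the address readable in the body.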
Hello.
Jacob Fugal:
> Be careful with email validation via regex, it's harder than you might
> think[1][2]:
>
> /^([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\x0C\x0E-\x21\x23-\x5B\x5D
> -\x7F]|\\[\x00-\x7F])*")(\.([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\
> x0C\x0E-\x21\x23-\x5B\x5D-\x7F]|\\[\x00-\x7F])*"))*@([a-zA-Z0-9&_?\/`!
> |#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\])(\.
> ([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[
> \x00-\x7F])*\]))*$/
It does match
" spaces! @s! \"escaped quotes!\" "@shot.pl
and it's the first one doing this that I know of, kudos!
Not the first, I've been preceded by others that are even more correct
(and complex) :). Particularly:
Mail::RFC822::Address
Unfortunately, it does not match 'international' domains, so
it wouldn't pass addresses in the domain of, say, gżegżółka.pl
Good point. When I wrote this expression, I was only considering ASCII
characters in the 0x00-0x7F (0-127 decimal, which doesn't include
extended characters). Looking back at RFC822, it looks like that RFC
is likewise limited. It has no support for extended ASCII or UNICODE.
This is reasonable, based on the age of the RFC (1982).
As I understand from Yohanes' post in this thread, RFC2822 (2001)
supersedes RFC822, so I assume RFC2822 probably takes extended ASCII
-- and hopefully UNICODE, as well -- into account. Time to update the
regex! I'll leave it to someone else, however.
Jacob Fugal
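A quick way to see the limitation with the domain from the example: an ASCII-only label pattern rejects it, while the same shape written with Ruby's Unicode-aware POSIX bracket accepts it. This is only a sketch. The constant names are hypothetical, and the second pattern is not claimed to follow any RFC; it just shows where the ASCII assumption bites.

```ruby
# One domain label (the part between dots), ASCII letters and digits only.
ASCII_LABEL   = /\A[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])*\z/

# Same shape, but [[:alnum:]] also matches non-ASCII letters and digits
# when the string is UTF-8.
UNICODE_LABEL = /\A[[:alnum:]]([-[:alnum:]]*[[:alnum:]])*\z/

label = "gżegżółka"

puts label.ascii_only?             # => false
puts ASCII_LABEL.match?("shot")    # => true
puts ASCII_LABEL.match?(label)     # => false
puts UNICODE_LABEL.match?(label)   # => true
```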
···
On 1/6/06, Shot - Piotr Szotkowski <shot@shot.pl> wrote:
Hi!
It is trivial to create a formally correct address that makes
absolutely no sense, so what's the point of doing such a complicated
and error-prone validation?
To give one example: On German keyboards "@" is entered using
"AltGr-q". If one releases "AltGr" before pushing "q" (which may well
happen if you type the quick-and-dirty way) "nobody@example.com"
becomes "nobodyqexample.com".
Also one should keep in mind the three commandments of distrust:
1. He who inputs is guilty.
2. He who inputs remains guilty unless he proves that he is *not*
guilty.
3. If the proof under rule 2 leaves any doubt (no matter how tiny it
may be) the first rule applies.
In short: Input is evil unless you know for sure that it is not.
Josef 'Jupp' Schugt
···
At Wed, 4 Jan 2006 21:47:34 +0900, Andreas S. wrote:
--
Anyone who profits from torture or advocates it is a proven enemy of
the free democratic basic order and must not hold any political
office. Any civil-servant status is to be revoked immediately for the
same reason.
Tim Fletcher wrote:
By "error prone" do you mean that it won't detect addresses that don't
exist?
No, I mean that it might declare some addresses invalid although they
aren't.
···
I figure people aren't going to type the "John
Doe <john@doe.com>" format into a form
In my experience, a certain percentage do. I'm guessing it might be because they copied their email address out
of something like Outlook Express, and pasted it into the
form. (OE will display "John Doe" as a hyperlink, which if
selected and copied turns into "John Doe <john@doe.com>"
in the clipboard.)
Personally, right or wrong, to catch that I just reject
email addresses with a "<" or ">" in them. I'll admit I
don't really care if some spec says it's possible to legally
form email addresses with those characters. That may make
me a bad person. But whoever wrote that spec should be
infested with the fleas of 1000 camels.
Regards,
Bill
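Both ways of dealing with a pasted "John Doe <john@doe.com>" can be sketched in a few lines: the plain rejection described above, and a more forgiving variant that pulls the address out of the angle brackets first. The method names are made up for illustration, and the extraction is a rough heuristic rather than a parser for the RFC 'mailbox' grammar.

```ruby
# The blunt check: anything with an angle bracket is rejected.
def contains_angle_brackets?(input)
  input.match?(/[<>]/)
end

# The forgiving variant: if the input ends in <...>, keep only what is
# inside the brackets; otherwise return the trimmed input unchanged.
def extract_addr_spec(input)
  stripped = input.strip
  match = stripped.match(/<([^<>]+)>\z/)
  match ? match[1].strip : stripped
end

pasted = "John Doe <john@doe.com>"

puts contains_angle_brackets?(pasted)     # => true
puts extract_addr_spec(pasted)            # => john@doe.com
puts extract_addr_spec(" john@doe.com ")  # => john@doe.com
```

The extracted value would still need whatever validation the form applies to a bare address.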
···
From: "Jacob Fugal" <lukfugl@gmail.com>
The full RFC2822 regex is too big, but RMail has a parser for it.
For example, a friend of mine has the email address:
?@hisdomain.net
(The domain above was changed to protect his privacy. But the single question mark as the 'username' is all that he has
···
On Jan 4, 2006, at 12:12 PM, mental@rydia.net wrote:
Quoting "Andreas S." <f@andreas-s.net>:
It is trivial to create a formally correct address that makes
absolutely no sense, so what's the point of doing such a
complicated and error-prone validation?
You'll see from my comments in the original post[1] and in my reply to
David Black in the other thread[2] that this regex is indeed compliant
with a single, non-named address as defined by the RFC[3].
Jacob Fugal
[1] http://phantom.byu.edu/pipermail/uug-list/2004-January/009707.html
[2] [ruby-talk:174081]
[3] RFC 822 - STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES (RFC822)
···
On 1/4/06, Andreas S. <f@andreas-s.net> wrote:
Tim Fletcher wrote:
> By "error prone" do you mean that it won't detect addresses that don't
> exist?
No, I mean that it might declare some addresses invalid although they
aren't.
It's possible that someone might copy and paste something in that format
from a number of other, non-Windows email clients, too -- though it's
more difficult to do so by accident, since generally clients like mutt
won't drop more stuff into your copy/paste buffer than you actually
highlighted.
···
On Thu, Jan 05, 2006 at 03:27:48AM +0900, Bill Kelly wrote:
From: "Jacob Fugal" <lukfugl@gmail.com>
>
>I figure people aren't going to type the "John
>Doe <john@doe.com>" format into a form
In my experience, a certain percentage do. I'm guessing
it might be because they copied their email address out
of something like Outlook Express, and pasted it into the
form. (OE will display "John Doe" as a hyperlink, which if
selected and copied turns into "John Doe <john@doe.com>"
in the clipboard.)
--
Chad Perrin [ CCD CopyWrite | http://ccd.apotheon.org ]
unix virus: If you're using a unixlike OS, please forward
this to 20 others and erase your system partition.