Sorry if this is a stupid question, but I am new to Ruby AND regular
expressions.
Has anyone compiled a collection of ‘common’ regular expression
patterns? For example:
- valid email addresses
- valid http address
- etc.
TIA
-Michael Garriss
Hi Michael,
> Has anyone compiled a collection of ‘common’ regular expression
> patterns?
Never heard of such a list. The O’Reilly book “Mastering Regular
Expressions” probably contains a whole lot.
> - valid email addresses
Short answer: /^\w+@[\w.]+\w+$/
Take an email address: “joe97_smith@some.domain.org”
The email starts with “word” characters (letters, underscores and
numbers):
  /^\w+/
    ^    Start of the string
    \w+  Word character(s)
Followed by an “@” symbol.
/^\w+@/
Followed by a collection of word characters and dots, ending in word
characters, and then the end of the string.
  /^\w+@[\w.]+\w+$/
    [\w.]  New class containing either word characters or dots
    $      End of the string
Similarly, you can construct REs for other tasks.
–
Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137
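For readers who want to try the short answer above in Ruby, here is a minimal sketch. The constant and method names are made up for this example, and the \A/\z variant is an addition that is not in the post. It is included because in Ruby ^ and $ match at line boundaries, so the pattern as posted also accepts a multi-line string that merely contains one address-like line.

```ruby
# The approximate pattern from the post above, unchanged.
APPROX_EMAIL = /^\w+@[\w.]+\w+$/

# Assumption (not in the post): the same pattern anchored with \A and \z,
# which match only at the very start and very end of the string.
APPROX_EMAIL_WHOLE = /\A\w+@[\w.]+\w+\z/

# Hypothetical helper: true if +address+ matches +pattern+.
def approx_email?(address, pattern = APPROX_EMAIL_WHOLE)
  !!(pattern =~ address)
end

puts approx_email?("joe97_smith@some.domain.org")  # the example from the post
puts approx_email?("joe.smith@some.domain.org")    # dot in the local part
puts approx_email?("joe@some-domain.org")          # hyphen in the domain
```

The first call matches; the other two do not, because \w covers only letters, digits and underscores. That is the sense in which the pattern is approximate.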
You might look at the URI library at http://arika.org/ruby/uri. It goes a bit beyond
just a regexp for email and HTTP, though.
On Mon, Jan 27, 2003 at 04:59:15AM +0900, Michael Garriss wrote:
> Sorry if this is a stupid question, but I am new to Ruby AND regular
> expressions.
> Has anyone compiled a collection of ‘common’ regular expression
> patterns? For example:
> - valid email addresses
> - valid http address
> - etc.
> TIA
> -Michael Garriss
–
Alan Chen
Digikata Computing
http://digikata.com
> Has anyone compiled a collection of ‘common’ regular expression
> patterns? For example:
> - valid email addresses
To be strictly pedantic, I don’t think there is such a thing. For the truly
adventurous, consider bang-paths, %'s and the like, which have all been valid at
one point or another. There was a big writeup/FAQ in the Perl domain about this
very subject.
The Perl people have (well, Damian Conway has):
martin
Michael Garriss <mgarriss@earthlink.net> wrote:
> Sorry if this is a stupid question, but I am new to Ruby AND regular
> expressions.
> Has anyone compiled a collection of ‘common’ regular expression
> patterns? For example:
> - valid email addresses
> - valid http address
> - etc.
The Perl Cookbook has a chapter on regular expressions, and concludes
with about three pages of nifty little ones. It also has a few items
devoted to things like email addresses.
You might be able to view these online at the PLEAC project:
http://pleac.sourceforge.net. The code from the book can also be
downloaded from the O’Reilly website somewhere.
Gavin
On Monday, January 27, 2003, 6:59:15 AM, Michael wrote:
> Sorry if this is a stupid question, but I am new to Ruby AND regular
> expressions.
> Has anyone compiled a collection of ‘common’ regular expression
> patterns? For example:
> - valid email addresses
> - valid http address
> - etc.
Try this site:
http://www.regxlib.com/Default.aspx
later…
--- Michael Garriss <mgarriss@earthlink.net> wrote:
> Sorry if this is a stupid question, but I am new to Ruby AND regular
> expressions.
> Has anyone compiled a collection of ‘common’ regular expression
> patterns? For example:
> - valid email addresses
> - valid http address
> - etc.
> TIA
> -Michael Garriss
Quoting dcarrera@math.umd.edu, on Mon, Jan 27, 2003 at 05:17:02AM +0900:
> Hi Michael,
> > Has anyone compiled a collection of ‘common’ regular expression
> > patterns?
> Never heard of such a list. The O’Reilly book “Mastering Regular
> Expressions” probably contains a whole lot.
Including one for a valid RFC 822 email address list, which takes about
a page, with no spaces.
> > - valid email addresses
> Short answer: /^\w+@[\w.]+\w+$/
> Take an email address: “joe97_smith@some.domain.org”
How about we take:
  "hi y@"."ruby \",master!" ( ... a comment!!) @ u%me . u+me . u-me
It’s syntactically valid, too! Though admittedly unusual…
Cheers,
Sam
Yes, there is actually. Email is a strict protocol, just like FTP and
others. However, there’s so much flexibility in valid email addresses
that you probably want to stick to common email addresses.
On Mon, Jan 27, 2003 at 05:53:53AM +0900, Mike Campbell wrote:
> > Has anyone compiled a collection of ‘common’ regular expression
> > patterns? For example:
> > - valid email addresses
> To be strictly pedantic, I don’t think there is such a thing.
–
Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137
Perfect! Thank you.
Mike Thomas wrote:
> Try this site:
> later…
> --- Michael Garriss <mgarriss@earthlink.net> wrote:
> > Sorry if this is a stupid question, but I am new to Ruby AND regular
> > expressions.
> > Has anyone compiled a collection of ‘common’ regular expression
> > patterns? For example:
> > - valid email addresses
> > - valid http address
> > - etc.
> > TIA
> > -Michael Garriss
> =====
> Mike Thomas
> http://www.samoht.com
> It’s better backwards
Wrote Mike Thomas <mike_thomas@yahoo.com>, on Tue, Jan 28, 2003 at 02:58:47AM +0900:
> Try this site:
One of those doesn’t allow _ in the domain name,
  admin@ensemble_independant.org
neither allows quoted local-parts,
  "C=ca,CO=Certicom,CN=Sam Roberts"@inet-gateway.x500.org
and neither allows whitespace between tokens.
  sroberts @ uniserve . com
Here’s the version from Mastering Regular Expressions. It’s Perl, so
you’ll have to do a little converting.
A single email address is usually an addr-spec, so if you don’t want to
match things like
  Sam Roberts <sroberts@uniserve.com>
or lists of addresses, you can just look for “Item 2:” below, and expand
that RE. The variables below match the RFC 822 BNF pretty closely.
Cheers,
Sam
–
Sam Roberts <sroberts@certicom.com>
(From http://public.yahoo.com/~jfriedl/regex/code.html)
$esc = '\\\\'; $Period = '\.';
$space = '\040'; $tab = '\t';
$OpenBR = '\['; $CloseBR = '\]';
$OpenParen = '\('; $CloseParen = '\)';
$NonASCII = '\x80-\xff'; $ctrl = '\000-\037';
$CRlist = '\n\015'; # note: this should really be only \015.
$qtext = qq/[^$esc$NonASCII$CRlist"]/; # for within "..."
$dtext = qq/[^$esc$NonASCII$CRlist$OpenBR$CloseBR]/; # for within [...]
$quoted_pair = qq< $esc [^$NonASCII] >; # an escaped character
$atom_char = qq/[^($space)<>\@,;:".$esc$OpenBR$CloseBR$ctrl$NonASCII]/;
$atom = qq<
    $atom_char+    # some number of atom characters...
    (?!$atom_char) # ...not followed by something that could be part of an atom
>;
$ctext = qq< [^$esc$NonASCII$CRlist()] >;
$Cnested = qq< $OpenParen (?: $ctext | $quoted_pair )* $CloseParen >;
$comment = qq< $OpenParen
    (?: $ctext | $quoted_pair | $Cnested )*
    $CloseParen >;
$X = qq< (?: [$space$tab] | $comment )* >; # optional separator
$quoted_str = qq<
    " (?:              # opening quote...
        $qtext         # Anything except backslash and quote
        |              # or
        $quoted_pair   # Escaped something (something != CR)
    )* "               # closing quote
>;
$word = qq< (?: $atom | $quoted_str ) >;
$domain_ref = $atom;
$domain_lit = qq< $OpenBR            # [
    (?: $dtext | $quoted_pair )*     # stuff
    $CloseBR                         # ]
>;
$sub_domain = qq< (?: $domain_ref | $domain_lit ) >;
$domain = qq< $sub_domain            # initial subdomain
    (?:                              #
        $X $Period                   # if led by a period...
        $X $sub_domain               # ...further okay
    )*
>;
$route = qq< \@ $X $domain
    (?: $X , $X \@ $X $domain )*     # further okay, if led by comma
    :                                # closing colon
>;
$local_part = qq< $word              # initial word
    (?: $X $Period $X $word )*       # further okay, if led by a period
>;
$addr_spec = qq< $local_part $X \@ $X $domain >;
$route_addr = qq[ < $X               # leading <
    (?: $route $X )?                 # optional route
    $addr_spec                       # address spec
    $X >                             # trailing >
];
$phrase_ctrl = '\000-\010\012-\037'; # like ctrl, but without tab
$phrase_char =
    qq/[^()<>\@,;:".$esc$OpenBR$CloseBR$NonASCII$phrase_ctrl]/;
$phrase = qq< $word                  # one word, optionally followed by...
    (?:
        $phrase_char |               # atom and space parts, or...
        $comment     |               # comments, or...
        $quoted_str                  # quoted strings
    )*
>;
$mailbox = qq< $X                    # optional leading comment
    (?: $addr_spec                   # address
        |                            # or
        $phrase $route_addr          # name and address
    ) $X                             # optional trailing comment
>;
###########################################################################
my $error = 0;
my $valid;
foreach $address (@ARGV) {
    $valid = $address =~ m/^$mailbox$/xo;
    printf "`$address' is syntactically %s.\n", $valid ? "valid" : "invalid";
    $error = 1 if not $valid;
}
exit $error;
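As a rough idea of what "a little converting" might look like in Ruby, here is a minimal sketch. It is not a port of the Perl program above. It covers only the addr-spec core (atoms, quoted strings, domain literals) and assumes ASCII input. The $X separator (whitespace and comments), routes and phrases are left out, so an address like "sroberts @ uniserve . com" is rejected here even though the full grammar allows it. The constant names and the addr_spec? helper are made up for this example.

```ruby
# Atom characters: printable ASCII except the RFC 822 specials ()<>@,;:\".[]
# and space. ('#' is placed last so that it cannot start an interpolation.)
ATOM       = /[A-Za-z0-9!$%&'*+\-\/=?^_`{|}~#]+/

# "..." containing ordinary characters or backslash-escaped pairs.
QUOTED_STR = /"(?:[^"\\\r\n]|\\.)*"/
WORD       = /(?:#{ATOM}|#{QUOTED_STR})/

# [...] domain literal, for example [10.0.0.1].
DOMAIN_LIT = /\[(?:[^\[\]\\\r\n]|\\.)*\]/
SUB_DOMAIN = /(?:#{ATOM}|#{DOMAIN_LIT})/

# Dot-separated sequences on each side of the @.
DOMAIN     = /#{SUB_DOMAIN}(?:\.#{SUB_DOMAIN})*/
LOCAL_PART = /#{WORD}(?:\.#{WORD})*/

# Anchored to the whole string with \A and \z.
ADDR_SPEC  = /\A#{LOCAL_PART}@#{DOMAIN}\z/

# Hypothetical helper: true if +address+ fits this simplified addr-spec.
def addr_spec?(address)
  !!(ADDR_SPEC =~ address)
end

puts addr_spec?('admin@ensemble_independant.org')
puts addr_spec?('"C=ca,CO=Certicom,CN=Sam Roberts"@inet-gateway.x500.org')
puts addr_spec?('sroberts @ uniserve . com')
```

Interpolating one Regexp into another keeps each piece readable, much as the Perl variables do. With this subset the first two addresses are accepted and the third is not, because whitespace between tokens is not handled.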
Hi,
Isn’t this the very last example in the book “Mastering Regular
Expressions” by Friedl, i.e., Appendix B: Email Regex Program? (At
least it is in the first edition.) Be careful though, because when the
regex is expanded into its plain form, it is 6,598 bytes long.
Regards,
Bill
Mike Campbell <michael_s_campbell@yahoo.com> wrote:
> > Has anyone compiled a collection of ‘common’ regular expression
> > patterns? For example:
> > - valid email addresses
> To be strictly pedantic, I don’t think there is such a thing. For the truly
> adventurous, consider bang-paths, %'s and the like, which have all been valid at
> one point or another. There was a big writeup/FAQ in the Perl domain about this
> very subject.
I only meant to write an “approximate” RE that would work most of the
time. Writing a truly comprehensive RE would be very difficult and
probably not even worth it.
For instance, did you know that the backspace character is technically
allowed? Your address could be:
@domain.com
But who’s really going to have a backspace in their email address? (good
luck getting any email there).
It’s better to just ignore this possibility.
Cheers,
On Mon, Jan 27, 2003 at 05:41:17AM +0900, Sam Roberts wrote:
> How about we take:
> "hi y@"."ruby \",master!" ( ... a comment!!) @ u%me . u+me . u-me
> It’s syntactically valid, too! Though admittedly unusual…
–
Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137
Quoting dcarrera@math.umd.edu, on Mon, Jan 27, 2003 at 06:03:31AM +0900:
> > > - valid email addresses
> > To be strictly pedantic, I don't think there is such a thing.
> Yes, there is actually. Email is a strict protocol, just like FTP and
> others. However, there's so much flexibility in valid email addresses
> that you probably want to stick to common email addresses.
I think Mike's referring to a Perl conversation about validity,
in which there was some ambiguity in what people mean by valid.
The syntax is completely described, but some people mean by "valid
email address" an email address you can actually send mail to, which
involves doing stuff like making sure the domain name exists and
is reachable. That kind of validity is pretty much impossible to
check, without actually sending a mail and getting a reply!
Cheers,
Sam
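To make the "does the domain exist" part concrete, here is a hedged Ruby sketch that asks DNS for a domain's MX records using the standard resolv library. The mail_exchangers helper is made up for this example. As the post says, this cannot show that a mailbox exists or that mail will arrive. A domain with no MX record may still accept mail, and one with MX records may still bounce it.

```ruby
require "resolv"

# Hypothetical helper: returns the mail exchanger host names that DNS lists
# for the domain part of +address+, lowest preference value first.
# An empty array means no MX record was found (or there was no domain part).
# +resolver+ is anything that responds to getresources, so a stub can be
# passed in when no network is available.
def mail_exchangers(address, resolver = Resolv::DNS.new)
  # Take everything after the last "@" as the domain.
  domain = address[/@([^@\s]+)\z/, 1]
  return [] if domain.nil?

  records = resolver.getresources(domain, Resolv::DNS::Resource::IN::MX)
  records.sort_by(&:preference).map { |mx| mx.exchange.to_s }
end
```

Calling mail_exchangers("someone@example.org") performs a live DNS query, so the result depends on the network and on the domain's current records.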
On Mon, Jan 27, 2003 at 05:53:53AM +0900, Mike Campbell wrote:
> > To be strictly pedantic, I don’t think there is such a thing.
> Yes, there is actually. Email is a strict protocol, just like FTP and
> others. However, there’s so much flexibility in valid email addresses
> that you probably want to stick to common email addresses.
As another person noted, I wasn’t saying you couldn’t check for RFC-822
compliance, but rather that you can’t, via a regex, determine whether a mail address
is valid; i.e., that mail sent to it will get there.
Here’s the Perl FAQ to which I was referring. This is a shortened version,
which is a shame, as the one I recall from years gone by gave examples using %'s
which could (IIRC) be legally parsed in more than one way, with either, both, or
neither being “valid”. That may have been pre-RFC-822 though, to be fair.
=============
How do I check a valid email address?
You can’t.
Remember that without sending mail to the address and seeing whether it
bounces (and even then you face the halting problem), you cannot
determine whether an email address is valid. Even if you apply
the email header standard, you can have problems, because there are deliverable
addresses that aren’t RFC-822 (the mail header standard) compliant,
and addresses that aren’t deliverable which are.
Daniel Carrera <dcarrera@math.umd.edu> writes:
> But who’s really going to have a backspace in their email address? (good
> luck getting any email there).
> It’s better to just ignore this possibility.
And who’s going to have crazy things like pluses or hyphens in their
email address? Please, if you’re going to do email address validation,
do it properly, or you won’t be getting any mail from me. Some might
consider that an unexpected bonus, of course.
–
Given an infinite amount of monkeys an infinite amount of time, an
infinite amount of drafting supplies, and an infinite amount of crack,
they’d come up with Downtown Chicago. – David Jacoby, in the monastery
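To illustrate the point about pluses and hyphens, here is a hedged Ruby sketch of a somewhat looser approximate pattern. It is an assumption made for this example, not a pattern anyone in the thread posted, and it is still far from the full RFC 822 grammar (no quoted local parts, no comments, no whitespace between tokens). The names LOOSER_EMAIL and looser_email? are made up.

```ruby
# Local part: word characters plus . + % and -
# Domain: dot-separated labels made of word characters and hyphens.
# \A and \z anchor the pattern to the whole string.
LOOSER_EMAIL = /\A[\w.+%-]+@[\w-]+(?:\.[\w-]+)*\z/

# Hypothetical helper: true if +address+ matches the looser pattern.
def looser_email?(address)
  !!(LOOSER_EMAIL =~ address)
end

puts looser_email?("first.last+tag@my-domain.org")
puts looser_email?("user%host@relay.example.com")
puts looser_email?("sroberts @ uniserve . com")
```

The first two addresses are accepted. The third is rejected, which shows that even a looser pattern still turns away some syntactically valid addresses.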
Quoting dcarrera@math.umd.edu, on Mon, Jan 27, 2003 at 05:53:51AM +0900:
> > How about we take:
> > "hi y@"."ruby \",master!" ( ... a comment!!) @ u%me . u+me . u-me
> > It’s syntactically valid, too! Though admittedly unusual…
> I only meant to write an “approximate” RE that would work most of the
> time. Writing a truly comprehensive RE would be very difficult and
> probably not even worth it.
Oh, hey, I know that! I wasn’t trying to trash your regexp.
Yep, but + and - in domain names aren’t too uncommon, and you still see email
addresses with the %-hack for UUCP in the local-part.
Anyhow, it’s actually not too hard to write an RE from the BNF in RFC 822,
particularly if you ignore some of the stuff deprecated by RFC 2822, but
luckily, we don’t have to, because you can find the RE in the
excellent book you recommended, Mastering Regular Expressions.
I picked it up because I couldn’t believe REs were complicated enough to
need a whole book, and then kept reading out of amazement.
Cheers,
Sam
On Mon, Jan 27, 2003 at 05:41:17AM +0900, Sam Roberts wrote: