Best way to parse email recipient lists?

Hey all,

So... I'm trying to parse email recipient lists (entered by hand into
the "to", "cc" and "bcc" fields of a mail app by users).

These can obviously come in a wild variety of formats, and I'd like to
support as many as possible.

The other gotcha is that I'd like to keep as much of the name metadata
as possible.

I was under the impression that TMail's parser strips the name portion
of the "to", "cc" and "bcc" fields, leaving just an array of email
addresses. (Otherwise we could just use TMail; please let me know if
this is incorrect or if there's a workaround.)

Here are a few example scenarios (from relatively easy to a little
harder):

raw.uncooked@sushi.co.jp
mr.techie+nospam@gmail.com
<raw@hotmail.com>
"Bob Smith" <bob@smith.org>
Bob Smith <bob@smith.org>
"Jones, Craig" <craig.jones@corp.com>
"Summer Thomas" <sum49@aol.com>; "Al Franken" <al@dnc.org>
"Clinton, Bill" <bill@whitehouse.gov>; "Obama, Barack" <barack@senate.gov>; "Jenny McCarthy" <jenny@spike.com>
Bob <bob@accounting.com>, <jessica@email.com>, James Blunt <james@blunt.org>

etc...

Any ideas?

I've been working up regexes like crazy, but my regex-fu isn't quite
what it used to be. Are there any shortcuts, or do I need either one big
regex or many specific ones to match the various scenarios?

We're currently using this RegEx to detect when we have a single
properly formatted address (w/o a name attached):
http://tfletcher.com/lib/rfc822.rb
...but that's only one small portion of the problem.

- Shanti

--
Posted via http://www.ruby-forum.com/.

Shanti,

Try:

/(\W?([\w\s]+)\W+)?(\w[\w\+\-\.]+@[\w\-\.]+)\W?/i
(allows "+" signs in the mailbox, as in user+nospam@domain.com; these
are valid in the local part of an address)

/(\W?([\w\s]+)\W+)?(\w[\w\-\.]+@[\w\-\.]+)\W?/i
(rejects "+" signs, if you'd rather not accept them)

These should break the addresses down into arrays of matches that you
can parse into:
display name
mailbox
domain
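
For what it's worth, here is a minimal sketch of one way to get from a
raw recipient string to those three pieces without a single giant
regex: first split on "," or ";" outside double quotes, then parse each
entry on its own. The method names (split_recipients, parse_recipient,
parse_recipient_list) are made up for illustration, it uses only plain
Ruby, and it is nowhere near a full RFC 2822 parser (no comments, no
groups, no escaped quotes), so treat it as a starting point only.

```ruby
# Split a raw recipient string on "," or ";", ignoring separators that
# appear inside double quotes (as in "Jones, Craig").
def split_recipients(raw)
  parts = []
  current = String.new
  in_quotes = false
  raw.each_char do |ch|
    in_quotes = !in_quotes if ch == '"'
    if !in_quotes && (ch == "," || ch == ";")
      parts << current
      current = String.new
    else
      current << ch
    end
  end
  parts << current
  # Collapse line wraps and runs of spaces, then drop empty entries.
  parts.map { |part| part.gsub(/\s+/, " ").strip }.reject { |part| part.empty? }
end

# Parse one entry into display name, mailbox and domain.
# Returns nil when the entry matches neither supported shape.
def parse_recipient(entry)
  if (m = entry.match(/\A(.*?)\s*<([^<>@\s]+)@([^<>@\s]+)>\z/))
    # Shape 1: optional name followed by <mailbox@domain>
    name = m[1].sub(/\A"(.*)"\z/, '\1').strip
    { name: (name.empty? ? nil : name), mailbox: m[2], domain: m[3] }
  elsif (m = entry.match(/\A([^<>@\s]+)@([^<>@\s]+)\z/))
    # Shape 2: bare mailbox@domain
    { name: nil, mailbox: m[1], domain: m[2] }
  end
end

def parse_recipient_list(raw)
  split_recipients(raw).map { |entry| parse_recipient(entry) }
end

list = %q{"Clinton, Bill" <bill@whitehouse.gov>; Bob <bob@accounting.com>, raw.uncooked@sushi.co.jp}
parse_recipient_list(list).each { |recipient| p recipient }
```

Entries that can't be parsed come back as nil rather than being dropped,
so the caller can decide whether to flag them to the user.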

Let me know if this doesn't pass the tests. Better yet, send me a unit
test and I'll make it work. :)

also: http://www.zenspider.com/Languages/Ruby/QuickRef.html#11

Michael Fleet
Disinnovate
http://www.disinnovate.com/


     harp:~ > cat a.rb
     require 'tmail'
     require 'yaml'

     # parse a sample message (headers plus the start of the body)
     tmail = TMail::Mail::parse <<-msg
     From shanti@braford.org Thu Nov 9 08:55:15 2006
     Date: Fri, 10 Nov 2006 00:52:17 +0900
     From: Shanti Braford <shanti@braford.org>
     Reply-To: ruby-talk@ruby-lang.org
     To: ruby-talk ML <ruby-talk@ruby-lang.org>
     Newsgroups: comp.lang.ruby
     Subject: Best way to parse email recipient lists?

     Hey all,

     So... I'm trying to parse email recipient lists (entered by hand into
     the "to", "cc" and "bcc" fields of a mail app by users).
     msg

     %w( to from cc bcc ).each do |field|
       # fall back to an empty list when the header is absent
       list = tmail.send("#{ field }_addrs") || []
       phrases = list.map{|a| a.phrase}

       # pair each display name with its full address and dump as YAML
       y field => phrases.zip(list.map{|a| a.to_s})
     end

     harp:~ > ruby a.rb
     to:
     - - ruby-talk ML
       - ruby-talk ML <ruby-talk@ruby-lang.org>
     from:
     - - Shanti Braford
       - Shanti Braford <shanti@braford.org>
     cc:

     bcc:

-a
--
my religion is very simple. my religion is kindness. -- the dalai lama