Email parsing

Hi there.

Question: I'm using net/pop to collect email. Is there any way I can get a
message's fields from it as an object, such as the subject, attachments, etc.?

Thanks

Rove Monteux


Rove Monteux
Systems Administrator

rove.monteux@fluid-rock.com

Quoting rove.monteux@fluid-rock.com, on Sat, Feb 07, 2004 at 02:13:56AM +0900:

Question: I'm using net/pop to collect email. Is there any way I can get a
message's fields from it as an object, such as the subject, attachments, etc.?

I don't think Ruby has any built-in mail message decoders, but I've used
rubymail for this kind of thing, and it was easy. There was at least one
other package; I used rubymail because I thought it looked the best at
the time, though I can't for the life of me remember why anymore!
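For very simple needs, the header fields can also be pulled out of the raw message string with plain Ruby. The following is only a minimal sketch under stated assumptions, not how rubymail works: split_message is a hypothetical helper, the addresses are made up, a repeated field keeps only its last value, and MIME parts and encoded words are ignored entirely.

```ruby
# Hypothetical helper (not part of net/pop or rubymail): split a raw
# message string into a hash of header fields and the body text.
def split_message(raw)
  # Headers end at the first blank line; everything after it is the body.
  head, body = raw.split(/\r?\n\r?\n/, 2)

  # A header line starting with whitespace continues the previous field,
  # so join those continuation lines back onto it first.
  unfolded = head.to_s.gsub(/\r?\n[ \t]+/, ' ')

  headers = {}
  unfolded.each_line do |line|
    name, value = line.chomp.split(/:\s*/, 2)
    next if value.nil?               # skip lines that are not "Name: value"
    headers[name.downcase] = value   # a repeated field keeps its last value
  end
  [headers, body.to_s]
end

raw = "From: rove@example.com\r\nSubject: Email\r\n parsing\r\n\r\nHi there.\r\n"
headers, body = split_message(raw)
puts headers['from']
puts headers['subject']
puts body
```

Anything beyond this (attachments, nested parts, transfer encodings) is where a real parser such as rubymail earns its keep.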

Cheers,
Sam

That sounds good, thanks!

Rove Monteux


http://raa.ruby-lang.org/list.rhtml?name=rubymail

-a

On Sat, 7 Feb 2004, Rove Monteux wrote:

Date: Sat, 7 Feb 2004 02:13:56 +0900
From: Rove Monteux rove.monteux@fluid-rock.com
Newsgroups: comp.lang.ruby
Subject: Email parsing


ATTN: please update your address books with address below!

===============================================================================

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
STP :: Solar-Terrestrial Physics Data | NCEI
NGDC :: http://www.ngdc.noaa.gov/
NESDIS :: http://www.nesdis.noaa.gov/
NOAA :: http://www.noaa.gov/
US DOC :: http://www.commerce.gov/

The difference between art and science is that science is what we
understand well enough to explain to a computer.
Art is everything else.
– Donald Knuth, “Discover”

/bin/sh -c 'for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done'
===============================================================================

Gurgitate Mail
http://raa.ruby-lang.org/list.rhtml?name=gurgitate-mail

robert


Thanks again,

I see that the 'Guide' section is incomplete, especially on the subjects
of decoding, encoding, and using rubymail with net/pop and net/smtp
(which is basically what you would use rubymail for). Are there any
suggestions, examples, or basic documentation on that posted somewhere, by any
chance?

Again, thanks.

Cheers

Rove Monteux


Quoting rove.monteux@fluid-rock.com, on Sat, Feb 07, 2004 at 02:58:24AM +0900:

I see that the 'Guide' section is incomplete, especially on the subjects
of decoding, encoding, and using rubymail with net/pop and net/smtp
(which is basically what you would use rubymail for), any

Well, that's one way. The other way is to read a local mailbox in Unix
mbox format (also the format used by Apple's Mail.app), and to inject
mail into the delivery system by piping it to /usr/bin/mail.

Anyhow, it took me a minute to figure out how I did this, too.

Here's some code ripped from a script I wrote to convert mbox files to
vcards, with some comments added:

#!/usr/bin/ruby -w

require 'rmail'

addrs = RMail::Address::List.new

ARGV.each { |mbox|
  File.open(File.expand_path(mbox)) { |file|
    puts "Reading: #{file.path}..."

    RMail::Mailbox::MBoxReader.new(file).each_message { |input|

      # input is a single string that is one message. I get it here from
      # the MBoxReader, you'll get it as the return value of pop.pop (used
      # in the mode where it returns the whole message as a string).

      # Then you have to parse it to create a message, do it like this:
      message = RMail::Parser.read(input)

      # Now you can play with the message:
      header = message.header
      addrs.concat(header.from)
      addrs.concat(header.recipients)
      addrs.concat(header.reply_to)
    }
  }
}

addrs = addrs.uniq

puts "Unique addrs: #{addrs.size}"
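The comment in the script mentions getting each message as the return value of pop.pop. Here is a minimal sketch of that collection step, relying only on the each_mail and pop calls that net/pop provides. The names raw_messages, FakeMail and FakeSession are hypothetical, and the stand-in objects exist only so the sketch runs without a mail server. The host, account and password in the trailing comment are placeholders.

```ruby
# Collect every message of a POP3 session as one raw string per message.
# `session` is expected to behave like the object Net::POP3.start yields:
# it responds to #each_mail, and each mail responds to #pop, which returns
# the whole message (headers and body) as a single string.
def raw_messages(session)
  raws = []
  session.each_mail { |mail| raws << mail.pop }
  raws
end

# Stand-ins for a POP3 session and its mails (assumptions for the demo).
FakeMail = Struct.new(:text) do
  def pop
    text
  end
end

FakeSession = Struct.new(:mails) do
  def each_mail(&block)
    mails.each(&block)
  end
end

session = FakeSession.new([
  FakeMail.new("Subject: first\r\n\r\nHello\r\n"),
  FakeMail.new("Subject: second\r\n\r\nWorld\r\n")
])

raw_messages(session).each_with_index do |raw, i|
  puts "message #{i + 1}: #{raw.bytesize} bytes"
end

# Against a real server it would look roughly like this:
#   require 'net/pop'
#   Net::POP3.start('pop.example.com', 110, 'account', 'password') do |pop|
#     raw_messages(pop).each { |raw| message = RMail::Parser.read(raw) }
#   end
```

Keeping the fetch step separate from the parsing step makes it easy to try the parser on saved messages before pointing the script at a live mailbox.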

Cheers,
Sam

Hi Sam,

Yes, I got that far with net/pop all right, and I can retrieve headers
(subject, from, to, etc.). I'm a bit lost now: I request message.body
as the docs say, and it returns nil, for both multipart and non-multipart
messages.

My snippet follows (message is the full message retrieved with m.pop):

# do whatever with the message
def process_message(message)
  p = RMail::Parser.new
  m = p.parse(message)

  # deals with the header
  h = m.header()
  subject = h.subject
  from = h.from
  to = h.to
  puts "from: " + from.first
  puts "to: " + to.first
  puts "subject: " + subject

  # deals with the body
  puts m.to_s
  if m.multipart?()
    m.each_part { |g|
      puts g.to_s }
  else
    puts m.decode().to_s
  end

end

Everything up to the 'body' bit works correctly, and according to the docs,

http://www.lickey.com/rubymail/rubymail/doc/classes/RMail/Message.html#M000015

it's pretty much the same as with the header, as long as you check whether
the message is multipart or not. Calling to_s should return something, but it
doesn't, so obviously I'm doing something wrong. Any ideas on that? :-)

Thanks again!

Cheers

Rove Monteux


Quoting rove.monteux@fluid-rock.com, on Sat, Feb 07, 2004 at 04:14:01AM +0900:

Yes, I got that far with net/pop all right, and I can retrieve headers
(subject, from, to, etc.). I'm a bit lost now: I request message.body
as the docs say, and it returns nil, for both multipart and non-multipart
messages.

Sorry, you're on your own now; luckily you have the source!

My snippet follows (message is the full message retrieved with m.pop):

# deals with the body
puts m.to_s
if m.multipart?()
  m.each_part { |g|
    puts g.to_s }

I think this should be g.decode().

else
  puts m.decode().to_s

And this shouldn't have a .to_s.

decode() is just a wrapper around a check that the message isn't multipart;
it then looks to see whether the body is quoted-printable (qp) or base64
encoded, and decodes it if appropriate.
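As a rough illustration of that decoding step (a sketch using only core Ruby, not rubymail's actual code), a single-part body can be decoded from its Content-Transfer-Encoding value like this. decode_body is a hypothetical helper name.

```ruby
# Hypothetical helper, NOT rubymail's implementation: decode the body of a
# single-part message according to its Content-Transfer-Encoding value.
def decode_body(body, transfer_encoding)
  case transfer_encoding.to_s.strip.downcase
  when 'base64'
    body.unpack1('m')   # base64 decoding
  when 'quoted-printable'
    body.unpack1('M')   # quoted-printable decoding
  else
    body                # 7bit, 8bit, binary or missing: return unchanged
  end
end

puts decode_body("SGVsbG8sIHdvcmxkIQ==\n", 'base64')
puts decode_body('a=3Db', 'Quoted-Printable')
puts decode_body('plain text', nil)
```

Note that the decoded strings come back with a binary encoding, so a caller that knows the part's charset would still need to tag or convert the result.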

There are no docs on what the parts are, but my bet is that each one is an
RMail::Message, because that's how MIME works: it's recursive. Each of
those parts could have parts, too.
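To make the recursion concrete, here is a small sketch of a depth-first walk over nested parts. Part is an illustrative stand-in defined here, not rubymail's RMail::Message, and each_leaf is a hypothetical helper. The only assumption carried over is that a multipart body is a list of further parts.

```ruby
# Illustrative stand-in for a parsed message, NOT RMail::Message.
# `body` is either a String (a leaf part) or an Array of Part (multipart).
Part = Struct.new(:content_type, :body) do
  def multipart?
    body.is_a?(Array)
  end
end

# Depth-first walk that calls the block once for every leaf part.
def each_leaf(part, &block)
  if part.multipart?
    part.body.each { |child| each_leaf(child, &block) }
  else
    block.call(part)
  end
end

message = Part.new('multipart/mixed', [
  Part.new('multipart/alternative', [
    Part.new('text/plain', 'Hello'),
    Part.new('text/html', '<p>Hello</p>')
  ]),
  Part.new('application/octet-stream', 'attachment bytes')
])

each_leaf(message) { |leaf| puts "#{leaf.content_type}: #{leaf.body.size} bytes" }
```

A flat loop over the top-level parts would miss the text/plain and text/html parts here, because they sit one level down inside multipart/alternative.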

end

end

First thing I'd do, though, is this:

p m.body

if m.body.is_a?(Array)
  m.body.each do |part|
    p part
  end
end

Just to kinda check out what Message.@body is...

The mail message you're looking at DOES have a body, doesn't it? :-)

I wrote a library like this in C++, and it was fun, but it got mostly
abandoned. Lickey isn't really working on this now; it sounded like he's
got a life, last time I exchanged mail with him.

I'd like to do some work on rubymail, I've some local changes I've made,
and I'd like to add some net/(imap,pop) integration, stuff like that,
but there are only so many hours in the day. Canada's pretty
progressive, but we still haven't managed to legislate a 26 hour day (or
even a 4-day workweek, sadly).

Good luck,
Sam

Hi Sam !

Yes, I'm pretty puzzled now.

puts m.to_s

outputs the whole message, including the two parts (normal body + attachment).

Now, p m.body and p part return nil, even though I can clearly see that the email message is properly formed and has a body (+ attachment).

I'm going to browse through the rubymail sources and try to trace what's happening, or where I'm going wrong.

Anyway, I agree with the 4-day scheme (as long as it's still paid full time, of course) :-)

Cheers and thanks again !

Rove Monteux
