Enhancing the Gateway (Help Needed)

James Edward Gray II wrote:

I've been looking into this a little this morning.

We do receive multipart/related messages, though they seem fairly
uncommon compared to multipart/alternative. They don't appear to be
gated properly. In fact, the mailing list archives don't even seem to
show them. For example 271796 was a multipart/related message and I
can't find it in the archives or on comp.lang.ruby.

To understand what we are dealing with here, I read:

  RFC 2387 - The MIME Multipart/Related Content-type (RFC2387)

This type does not seem easy to deal with, and I am open to suggestions for
the best strategy to use.

AFAIK it's mostly used for HTML messages with images embedded in the
email itself.

Yeah, I think that's what I'm seeing in my analysis of the messages.

I guess it would mostly be one part of a multipart/alternative message, of which one alternative should be text/plain anyway.

Most of the cases I have found have a multipart/alternative section inside the multipart/related section, like this example shows:

   271796: multipart/related ()
     multipart/alternative ()
     image/png ()

Obviously I need to extend my statistics gathering script to handle the nesting, but I've checked this message by hand and there was a text/plain part in there.
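A minimal sketch of how a statistics script might walk nested parts, assuming the message has already been parsed into a tree. The `Part` struct, the `outline` and `first_plain_text` helpers, and the contents of the multipart/alternative section (text/plain plus text/html) are illustrative assumptions, not the gateway's actual code or the real contents of message 271796:

```ruby
# Hypothetical stand-in for a parsed MIME part; a real gateway would get
# this structure from its mail-parsing library.
Part = Struct.new(:type, :charset, :children)

# Render the tree in the same indented "type (charset)" style as the listing.
def outline(part, depth = 0)
  lines = ["#{'  ' * depth}#{part.type} (#{part.charset})"]
  part.children.each { |child| lines.concat(outline(child, depth + 1)) }
  lines
end

# Depth-first search for the first text/plain part at any nesting level.
def first_plain_text(part)
  return part if part.type == "text/plain"
  part.children.each do |child|
    found = first_plain_text(child)
    return found if found
  end
  nil
end

# Assumed layout: the inner parts of the alternative section are made up.
message = Part.new("multipart/related", nil, [
  Part.new("multipart/alternative", nil, [
    Part.new("text/plain", "ISO-8859-1", []),
    Part.new("text/html", "ISO-8859-1", [])
  ]),
  Part.new("image/png", nil, [])
])

puts outline(message)
plain = first_plain_text(message)
puts plain ? "plain text found (#{plain.charset})" : "no text/plain part"
```

Searching depth-first means the text/plain part is found whether it sits at the top level or inside a nested multipart section.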

Otherwise, you're most likely left with HTML to
strip, and images which you may either drop or attach to the output as
files.

Right. Which means I still need to settle on an HTML strategy as well.

Sorry if I happen to be wrong on one point or the other.

The other usage that seems common, more common than the HTML case in fact, is as part of a signed message:

   271822: multipart/signed ()
     multipart/related ()
     application/pgp-signature ()

I've not yet checked to see if these messages are gated properly with our current setup.

James Edward Gray II

···

On Oct 29, 2007, at 9:20 AM, mortee wrote:

Todd Benson wrote:

The lowest common denominator for language is US-ASCII (is that a good
thing or bad thing? You decide).

Aside from any language bias: the language of this list/group is
certainly English, which does just well in ASCII. So IMHO we wouldn't
lose much by falling back to that in case of some iconv errors. At
least certainly not as much as it'd be worth extraneous effort to work
around.
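The ASCII fallback described above could look roughly like this. It is a sketch using the `String#encode` and `String#scrub` methods of current Ruby rather than the iconv library under discussion, and the `to_ascii` helper and the "?" replacement character are assumptions for illustration:

```ruby
# Convert text to US-ASCII, replacing anything that cannot be represented.
# declared_charset is whatever the message header claimed; it may be wrong
# or unknown, so both cases fall back to replacement instead of raising.
def to_ascii(raw, declared_charset)
  options = { undef: :replace, replace: "?" }
  begin
    source = Encoding.find(declared_charset.to_s)
    raw.dup.force_encoding(source)
       .scrub("?")                      # bytes invalid in the declared charset
       .encode("US-ASCII", **options)   # valid characters with no ASCII form
  rescue ArgumentError, EncodingError
    # Unknown or unusable charset: treat the bytes as opaque binary data.
    raw.dup.force_encoding(Encoding::BINARY).encode("US-ASCII", **options)
  end
end

puts to_ascii("caf\xC3\xA9".b, "UTF-8")
puts to_ascii("caf\xE9".b, "no-such-charset")
```

Replacing rather than raising keeps the message flowing through the gateway at the cost of some "?" characters in non-English text.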

mortee

Hi,

At Mon, 29 Oct 2007 13:17:24 +0900,
Nobuyoshi Nakada wrote in [ruby-talk:276371]:

I suspect you mean multipart/relative.

I wasn't even aware of that format, to be honest. I knew of
multipart/mixed (which our Usenet host will allow) and multipart/
alternative. What is the purpose of multipart/relative?

As the above.

Oops, it was multipart/related, and I removed the paragraph
mentioned about it. My mistake, sorry.

I've been looking into this a little this morning.

We do receive multipart/related messages, though they seem fairly
uncommon compared to multipart/alternative. They don't appear to be
gated properly. In fact, the mailing list archives don't even seem
to show them. For example 271796 was a multipart/related message and
I can't find it in the archives or on comp.lang.ruby.

To understand what we are dealing with here, I read:

   RFC 2387 - The MIME Multipart/Related Content-type (RFC2387)

This type does not seem easy to deal with, and I am open to suggestions
for the best strategy to use.

James Edward Gray II

I haven't built enough clout in this group for my opinion to matter,
but here goes...

I'm in over my head with all this email stuff and need all the help I can get. The gateway belongs to all of us, not just me. So don't be shy. Help me fix this right and we all benefit.

James did a great job with the gateway ... no doubt about that.

Just to be totally clear, I didn't make the original gateway. I'm just the current caretaker.

Make sure, James and others, that you label the reformed
emails/postings with some kind of rejoinder that says something to the
effect of "mail/posting has been modified to make it available."

I will absolutely do this. The code I posted earlier in this thread already does.
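For illustration only, labeling a modified message might be sketched like this. The notice wording and the `label_if_modified` helper are hypothetical and are not taken from the code posted earlier in the thread:

```ruby
# Hypothetical notice text; the wording used by the real gateway may differ.
NOTICE = "[Note: parts of this message were removed or converted by the gateway.]"

# Append the notice only when the gateway actually changed the body.
def label_if_modified(original_body, gated_body)
  return gated_body if original_body == gated_body
  "#{gated_body.chomp}\n\n#{NOTICE}\n"
end

puts label_if_modified("<p>Hello</p>\n", "Hello\n")
```

Messages that pass through untouched carry no notice, so readers can tell when something was altered.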

James Edward Gray II

···

On Oct 29, 2007, at 10:02 AM, Todd Benson wrote:

On 10/29/07, James Edward Gray II <james@grayproductions.net> wrote:

On Oct 28, 2007, at 11:35 PM, Nobuyoshi Nakada wrote:

One thing I did notice is that when you (James) forwarded Ruby Quiz submissions to the list, they were encoded like:

Content-Transfer-Encoding: 7bit
Content-Type: application/octet-stream;
  x-unix-mode=0666;
  name=time_window.rb
Content-Disposition: attachment;
  filename=time_window.rb

It would be nice to preserve those; but I have no idea whether it
makes sense to just pretend ALL "application/octet-stream" is just
"text/plain". :)

Well, if it's some problem with me that's good news, because I can be fixed. ;)

Was the overall type of the email application/octet-stream? I would assume it was multipart/mixed and you are just showing the attachment portion here. Our host does allow us to send multipart/mixed. Was the message in question gated?
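On the question of whether an application/octet-stream attachment can be treated as text, a cautious middle ground is to check the filename and the bytes rather than trusting the type. The sketch below assumes the media type has already been separated from its parameters; the `TEXT_EXTENSIONS` list and the `text_like_attachment?` helper are hypothetical:

```ruby
# Hypothetical list of extensions the gateway is willing to treat as text.
TEXT_EXTENSIONS = %w[.rb .txt .patch .diff].freeze

# Decide whether an attachment can safely be gated as plain text.
def text_like_attachment?(content_type, filename, body)
  return true if content_type.start_with?("text/")
  return false unless content_type == "application/octet-stream"
  return false unless TEXT_EXTENSIONS.include?(File.extname(filename.to_s).downcase)
  # Reject anything containing control bytes other than tab, LF, and CR.
  body.bytes.none? { |b| b < 0x20 && ![0x09, 0x0A, 0x0D].include?(b) }
end

puts text_like_attachment?("application/octet-stream", "time_window.rb", "puts Time.now\n")
```

This keeps attachments like the quiz submission above while still refusing genuinely binary payloads that happen to share the same content type.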

Ah. LOL. Not only was it indeed gated to Usenet, it apparently was
NOT preserved in the ruby-talk archive. (But it did appear on the
mailing list.)

Yeah, it was multipart/mixed.
http://groups.google.com/group/comp.lang.ruby/msg/ad72badfe1ad61fa?dmode=source

Sorry for the noise,

Bill

···

From: "James Edward Gray II" <james@grayproductions.net>

On Oct 29, 2007, at 5:29 PM, Bill Kelly wrote:

Otherwise, you're most likely left with HTML to
strip, and images which you may either drop or attach to the output as
files.

Right. Which means I still need to settle on an HTML strategy as well.

I'm not sure you have that many HTML only messages. For my mailbox, I
have an HTML-only filter. It catches 0.5% of my incoming mail, and it's
100% spam.

OTOH, I seem to recall we looked at a weird multipart/alternative
message recently which had only one plain text part.

Sorry if I happen to be wrong on one point or the other.

The other usage that seems common, more common than the HTML case in
fact, is as part of a signed message:

   271822: multipart/signed ()
     multipart/related ()
     application/pgp-signature ()

I've not yet checked to see if these messages are gated properly with
our current setup.

Yes. I have <200710281217.12340.konrad@tylerc.org> / ruby-talk 276326,
for instance. I can't guarantee it's propagated as well as a pure text
message, but it should be on most servers.

Fred

···

On October 29 at 16:06, James Edward Gray II wrote:

On Oct 29, 2007, at 9:20 AM, mortee wrote:

--
You walked away from this Did it make it easier on you ? So now what ?
Life must go on still haunted It's so hard to face the day I hope it
is good for you I tried, oh how I tried, but it's broken Let me go, I
could have died (Kittie, Pink Lemonade)

Fred, you always show up when I need you. That's why you're still my best friend. ;)

Otherwise, you're most likely left with HTML to
strip, and images which you may either drop or attach to the output as
files.

Right. Which means I still need to settle on an HTML strategy as well.

I'm not sure you have that many HTML only messages. For my mailbox, I
have an HTML-only filter. It catches 0.5% of my incoming mail, and it's 100% spam.

Yes, you may be right about that. Perhaps not much of a concern. I'm not seeing any such messages in my sample data.

OTOH, I seem to recall we looked at a weird multipart/alternative
message recently which had only one plain text part.

Sadly, that's extremely common. Have a look at just the beginning of my sample data:

271456: multipart/alternative ()
   text/plain (UTF-8)
271541: multipart/signed ()
   text/plain (utf-8)
   application/pgp-signature ()
271567: multipart/signed ()
   text/plain (iso-8859-1)
   application/pgp-signature ()
271588: multipart/signed ()
   text/plain (utf-8)
   application/pgp-signature ()
271569: multipart/alternative ()
   text/plain (ISO-8859-1)
271578: multipart/alternative ()
   text/plain (ISO-8859-1)
271566: multipart/signed ()
   text/plain (iso-8859-1)
   application/pgp-signature ()
271568: multipart/alternative ()
   text/plain (ISO-8859-1)
271444: multipart/alternative ()
   text/plain (ISO-8859-1)
271452: multipart/alternative ()
   text/plain (ISO-8859-1)
271640: multipart/alternative ()
   text/plain (UTF-8)
271669: multipart/alternative ()
   text/plain (ISO-8859-1)

Good thing those are super easy to fix. ;)
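One way to handle the single-part multipart/alternative messages in this sample is to promote the lone part and discard the wrapper. This is a sketch under the assumption that parts are represented as plain hashes; the `unwrap_lone_alternative` helper and the hash keys are hypothetical, and the multipart/signed cases are deliberately left alone:

```ruby
# If a multipart/alternative wrapper holds exactly one part, return that
# part; any other message is returned unchanged.
def unwrap_lone_alternative(part)
  children = part.fetch(:parts, [])
  if part[:type] == "multipart/alternative" && children.size == 1
    children.first
  else
    part
  end
end

wrapped = { type: "multipart/alternative",
            parts: [{ type: "text/plain", charset: "UTF-8" }] }
p unwrap_lone_alternative(wrapped)
```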

Sorry if I happen to be wrong on one point or the other.

The other usage that seems common, more common than the HTML case in
fact, is as part of a signed message:

   271822: multipart/signed ()
     multipart/related ()
     application/pgp-signature ()

I've not yet checked to see if these messages are gated properly with
our current setup.

Yes. I have <200710281217.12340.konrad@tylerc.org> / ruby-talk 276326,
for instance. I can't guarantee it's propagated as well as a pure text
message, but it should be on most servers.

Awesome. That's good to know. Thanks for checking that for me.

James Edward Gray II

···

On Oct 29, 2007, at 1:55 PM, F. Senault wrote:

On October 29 at 16:06, James Edward Gray II wrote:

On Oct 29, 2007, at 9:20 AM, mortee wrote: