Charset Detection

Are there any existing functions or external gems that can take a generic
string and parse it for charset markers, converting those parts of the
string to their appropriate charsets?

For example, there's an email with the headers
Date: Sat, 03 Jul 2010 04:00:29 EDT
From: =?iso-8859-1?B?U0FU?= <CollegeBoard@noreply.collegeboard.org>
Subject: =?iso-8859-1?B?VGhlIE9mZmljaWFsIFNBVCBRdWVzdGlvbiBvZiB0aGUgRGF5?=

How can I convert this to what it should be,
Date: Sat, 03 Jul 2010 04:00:29 EDT
From: SAT <CollegeBoard@noreply.collegeboard.org>
Subject: SAT Question of the Day
--
Posted via http://www.ruby-forum.com/.

Shea Barton wrote:

Are there any existing functions or external gems that can take a generic
string and parse it for charset markers, converting those parts of the
string to their appropriate charsets?

For example, there's an email with the headers
Date: Sat, 03 Jul 2010 04:00:29 EDT
From: =?iso-8859-1?B?U0FU?= <CollegeBoard@noreply.collegeboard.org>
Subject: =?iso-8859-1?B?VGhlIE9mZmljaWFsIFNBVCBRdWVzdGlvbiBvZiB0aGUgRGF5?=

How can I convert this to what it should be,
Date: Sat, 03 Jul 2010 04:00:29 EDT
From: SAT <CollegeBoard@noreply.collegeboard.org>
Subject: SAT Question of the Day

Not answering your question directly, but this syntax is specific to
MIME encoding of e-mail headers (the "encoded-word" form defined in
RFC 2047).

So if you look at Ruby MIME toolkits you may find what you're looking
for.
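
To make the syntax concrete, here is a minimal sketch that splits one encoded-word into its three fields using only core Ruby. The helper name split_encoded_word is made up for illustration and does not come from any MIME library.

```ruby
# Hypothetical helper (not part of any library): splits a single
# encoded-word of the form =?charset?encoding?encoded-text?=
# into [charset, encoding, encoded_text], or returns nil if it does not match.
def split_encoded_word(word)
  m = word.match(/\A=\?([^?]+)\?([BQ])\?([^?]*)\?=\z/i)
  m && m.captures
end

# Subject value taken from the question above.
subject = "=?iso-8859-1?B?VGhlIE9mZmljaWFsIFNBVCBRdWVzdGlvbiBvZiB0aGUgRGF5?="
charset, encoding, text = split_encoded_word(subject)

puts charset                  # iso-8859-1
puts encoding                 # B means Base64; Q means a quoted-printable variant
puts text.unpack("m").first   # The Official SAT Question of the Day
```

Note that this Subject decodes to "The Official SAT Question of the Day", slightly longer than the text given in the question. The bytes are still in the declared charset at this point; converting them to another charset is a separate step.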



Here's the method I created, in case it is useful to anyone else.

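# Decodes a single MIME encoded-word (=?charset?B|Q?text?=) to UTF-8.
# Uses String#encode rather than Iconv, which was removed from the
# standard library in Ruby 2.0.
def convert_mime_encoded_word(mime_encoded_word)
  match = mime_encoded_word.match(/=\?([^?]+)\?([BQ])\?([^?]*)\?=/i)
  return mime_encoded_word unless match # nothing to decode
  from_charset, from_encoding, encoded_word = match.captures
  decoded_word =
    if from_encoding.upcase == "Q"
      # In headers, Q encoding writes a space as "_"
      encoded_word.tr("_", " ").unpack("M").first
    else
      encoded_word.unpack("m").first
    end
  # Tag the raw bytes with their charset, then transcode
  decoded_word.force_encoding(from_charset).encode("UTF-8")
end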
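
The method above handles one encoded-word at a time, while a real header line can mix plain text with several encoded-words. The following is a rough sketch of decoding a whole line in place; decode_mime_header and ENCODED_WORD are hypothetical names, and it assumes every charset label is one that Ruby's String#force_encoding recognises.

```ruby
# Pattern for one encoded-word: =?charset?B|Q?encoded-text?=
ENCODED_WORD = /=\?([^?]+)\?([BQ])\?([^?]*)\?=/i

# Hypothetical helper: returns the header line with every encoded-word
# replaced by its UTF-8 text. Unknown charsets raise ArgumentError.
def decode_mime_header(line)
  # RFC 2047: whitespace between two adjacent encoded-words is ignored
  joined = line.gsub(/(\?=)\s+(=\?)/, '\1\2')
  joined.gsub(ENCODED_WORD) do
    charset, encoding, text = $1, $2, $3
    raw = if encoding.upcase == "Q"
            # Q encoding writes a space as "_" inside headers
            text.tr("_", " ").unpack("M").first
          else
            text.unpack("m").first
          end
    raw.force_encoding(charset).encode("UTF-8")
  end
end

puts decode_mime_header("From: =?iso-8859-1?B?U0FU?= <CollegeBoard@noreply.collegeboard.org>")
# From: SAT <CollegeBoard@noreply.collegeboard.org>
```

Text outside the encoded-words is left untouched, so the same call works on header lines with nothing to decode.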
