How do I decode strings?

Dear friends!

How do I decode MIME-encoded strings like "=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=" to 8-bit and/or 7-bit ASCII?

I tried Base64.decode_b, but the result was not what I expected.

Regards,

--
Dr Balwinder Singh Dheeman Registered Linux User: #229709
CLLO (Chief Linux Learning Officer) Machines: #168573, 170593, 259192
Anu's Linux@HOME Distros: Ubuntu, Fedora, Knoppix
More: http://anu.homelinux.net/~bsd/ Visit: http://counter.li.org/

The "Q" means the string uses the encoding known as "quoted-printable".
This and Base64 are the usual encodings in MIME. Quoted-printable is the
simpler of the two, but every encoded byte expands to three characters
(=XX), so it suits text with only a few characters outside the 7-bit
ASCII range.
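
To make the trade-off concrete, here is a minimal sketch that encodes the bytes from the question both ways. The helper names qp_encode and b64_encode are made up for illustration; only pack("M") and pack("m0") are real Ruby calls.

```ruby
# Hypothetical helpers, named here only for illustration.
def qp_encode(bytes)
  # pack("M") appends a soft line break ("=\n") to unterminated input; drop it.
  [bytes].pack("M").chomp("=\n")
end

def b64_encode(bytes)
  [bytes].pack("m0") # "m0" emits Base64 without line feeds
end

latin = "Jes\xFAs \xC1ngel".b # the ISO-8859-15 bytes from the question

puts qp_encode(latin)  # ASCII letters stay readable; each high byte becomes =XX
puts b64_encode(latin) # every byte is re-encoded; nothing stays readable
```

With quoted-printable only the two accented bytes grow, which is why it tends to be preferred for mostly-ASCII text.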

From "[SUMMARY] Quoted Printable (#23)"
(http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/133976) sent to
this mailing list,

class String
  def to_quoted_printable(*args)
    [self].pack("M").gsub(/\n/, "\r\n")
  end

  def from_quoted_printable
    self.gsub(/\r\n/, "\n").unpack("M").first
  end
end

provides the solution you want. Just try:

    s = "Jes=FAs_=C1ngel"
    # In a MIME header, "_" stands for a space, so translate it first.
    print s.tr("_", " ").from_quoted_printable

Cheers,
Adriano.
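
A note for readers on Ruby 1.9 or later: unpack("M") returns raw bytes, so the result still has to be tagged with the charset named in the header before it can be transcoded. A minimal sketch, assuming the payload and charset have already been cut out of the encoded word (q_payload_to_utf8 is a made-up helper name, not a library method):

```ruby
# Decode the payload of a "Q"-encoded word and transcode it to UTF-8.
# q_payload_to_utf8 is a hypothetical helper, not part of any library.
def q_payload_to_utf8(payload, charset)
  # "_" stands for a space inside an encoded word.
  bytes = payload.tr("_", " ").unpack("M").first
  # Tag the raw bytes with their real charset, then convert.
  bytes.force_encoding(charset).encode("UTF-8")
end

puts q_payload_to_utf8("Jes=FAs_=C1ngel", "ISO-8859-15")
```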

···

On 4/18/05, Dr Balwinder S Dheeman <bsd.SANSPAM@cto.homelinux.net> wrote:

How do I decode MIME-encoded strings like
"=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=" to 8-bit and/or 7-bit ASCII?

Quoting bsd.SANSPAM@cto.homelinux.net, on Tue, Apr 19, 2005 at 02:44:35AM +0900:

Dear friends!

How do I decode MIME-encoded strings like
"=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=" to 8-bit and/or 7-bit ASCII?

I tried Base64.decode_b, but the result was not what I expected.

Yes, it's not Base64, it's RFC 2047.

Here's one way:

# $Id: rfc2047.rb,v 1.4 2003/04/18 20:55:56 sam Exp $

#
# An implementation of RFC 2047 decoding.
#
# This module depends on the iconv library by Nobuyoshi Nakada, which I've
# heard may be distributed as a standard part of Ruby 1.8. Many thanks to him
# for helping with building and using iconv.
#
# Thanks to "Josef 'Jupp' Schugt" <jupp@gmx.de> for pointing out an error with
# stateful character sets.
#
# Copyright (c) Sam Roberts <sroberts@uniserve.com> 2004
#
# This file is distributed under the same terms as Ruby.

require 'iconv'

module Rfc2047

  WORD = %r{=\?([!#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:
  WORDSEQ = %r{(#{WORD.source})\s+(?=#{WORD.source})}

  # Decodes a string, +from+, containing RFC 2047 encoded words into a target
  # character set, +target+. See iconv_open(3) for information on the
  # supported target encodings. If one of the encoded words cannot be
  # converted to the target encoding, it is left in its encoded form.
  def Rfc2047.decode_to(target, from)
    from = from.gsub(WORDSEQ, '\1')
    out = from.gsub(WORD) do |word|
      charset, encoding, text = $1, $2, $3
      
      # B64 or QP decode, as necessary:
      case encoding
        when 'b', 'B'
          #puts text
          text = text.unpack('m*')[0]
          #puts text.dump

        when 'q', 'Q'
          # RFC 2047 has a variant of quoted printable where a ' ' character
          # can be represented as an '_', rather than =20, so convert
          # any of these that we find before doing the QP decoding.
          text = text.tr("_", " ")
          text = text.unpack('M*')[0]

        # Don't need an else, because no other values can be matched in a
        # WORD.
      end

      # Convert:
      #
      # Remember - Iconv.open(to, from)!
      begin
        text = Iconv.iconv(target, charset, text).join
        #puts text.dump
      rescue Errno::EINVAL, Iconv::IllegalSequence
        # Replace with the entire matched encoded word, a NOOP.
        text = word
      end
    end
  end
end
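
The module above depends on iconv, which was removed from Ruby's standard library in Ruby 2.0. For readers on a current Ruby, the same idea can be sketched with String#force_encoding and String#encode instead. Rfc2047Sketch and decode_to_utf8 are made-up names, the regular expressions are simplified compared to the ones above, and the target charset is fixed to UTF-8, so treat this as an illustration rather than a drop-in replacement.

```ruby
# A sketch of RFC 2047 decoding without iconv.
# Rfc2047Sketch and decode_to_utf8 are hypothetical names.
module Rfc2047Sketch
  # Captures: charset, encoding (B or Q), encoded text.
  WORD = /=\?([^?\s]+)\?([BbQq])\?([^?\s]+)\?=/
  # An encoded word followed by whitespace and another encoded word.
  WORDSEQ = /(#{WORD.source})\s+(?=#{WORD.source})/

  def self.decode_to_utf8(from)
    # Whitespace between adjacent encoded words is dropped first.
    from.gsub(WORDSEQ, '\1').gsub(WORD) do |word|
      charset, encoding, text = $1, $2, $3
      bytes = if encoding.downcase == "b"
                text.unpack("m").first
              else
                # "_" stands for a space inside a Q-encoded word.
                text.tr("_", " ").unpack("M").first
              end
      begin
        bytes.force_encoding(charset).encode("UTF-8")
      rescue ArgumentError, EncodingError
        word # unknown charset or unconvertible bytes: keep the encoded form
      end
    end
  end
end

puts Rfc2047Sketch.decode_to_utf8("=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=")
```

As in the original, a word that cannot be converted is left in its encoded form instead of raising.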

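Thanks a lot. After testing, I am adding the following to my script:

   def decode_q(str)
     # [0-9]+ also matches two-digit parts such as ISO-8859-10.
     str.gsub!(/=\?ISO-8859-[0-9]+\?Q\?([!->@-~]+)\?=/i) {
         # Translate "_" to a space only inside the encoded word,
         # so underscores elsewhere in the string are left alone.
         $1.tr("_", " ").unpack("M").first
     }
     str
   end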

···

On 04/18/2005 11:46 PM, Adriano Ferreira wrote:

--
Dr Balwinder Singh Dheeman Registered Linux User: #229709
CLLO (Chief Linux Learning Officer) Machines: #168573, 170593, 259192
Anu's Linux@HOME Distros: Ubuntu, Fedora, Knoppix
More: http://anu.homelinux.net/~bsd/ Visit: http://counter.li.org/

Thanks a lot! That's what I needed :)

···

On 04/19/2005 05:17 AM, Sam Roberts wrote:

--
Dr Balwinder Singh Dheeman Registered Linux User: #229709
CLLO (Chief Linux Learning Officer) Machines: #168573, 170593, 259192
Anu's Linux@HOME Distros: Ubuntu, Fedora, Knoppix
More: http://anu.homelinux.net/~bsd/ Visit: http://counter.li.org/