[QUIZ] Quoted Printable (#23)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The quoted printable encoding is used primarily in email, though it has
recently seen some use in XML areas as well. The encoding is simple to
translate to and from.

This week's quiz is to build a filter that handles quoted printable translation.

Your script should be a standard Unix filter, reading from files listed on the
command-line or STDIN and writing to STDOUT. In normal operation, the script
should encode all text read in the quoted printable format. However, your
script should also support a -d command-line option and when present, text
should be decoded from quoted printable instead. Finally, your script should
understand a -x command-line option and when given, it should encode <, > and &
for use with XML.

Here are the rules we will use, from the quoted printable format:

  1. Bytes with ASCII values from 33 (exclamation point) through 60 (less
      than) and values from 62 (greater than) through 126 (tilde) should be
      passed through the encoding process unchanged. Note that the -x switch
      modifies this rule slightly, as stated above.
  
  2. Other bytes are to be encoded as an equals sign (=) followed by two
      hexadecimal digits. For example, when -x is active less than (<) will
      become =3C. Use only capital letters for hex digits.
  
  3. The exceptions are spaces and tabs. They should remain unencoded as
      long as any non-whitespace character follows them on the line. Spaces
      and tabs at the end of a line must be encoded per rule 2 above.
  
  4. Native line endings should be translated to carriage return-line feed
      pairs.
  
  5. Quoted printable lines are limited to 76 characters of length (not
      counting the line ending pair). Longer lines must be divided up. Any
      line endings added by the encoding process should be preceded by an
      equals sign, so the decoder will know to remove them. The equals sign
      must be the last character on the line, followed immediately by the line
      end pair. Such an equals sign does count as a non-whitespace character
      for rule 3, allowing preceding spaces and tabs to remain unencoded.
      The equals sign must fit inside the 76 character limit.
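
The rules above can be sketched in a few lines of Ruby. This is only an illustration of rules 1-3 and 5, not any submitted solution; the names qp_escape, qp_encode_line, qp_soft_wrap and XML_BYTES are made up for this example, and the input is assumed to be one line with its native line ending already removed.

```ruby
# Bytes that the -x switch additionally escapes: & < >
XML_BYTES = [38, 60, 62]

# Rule 2: an equals sign followed by two uppercase hex digits.
def qp_escape(byte)
  "=%02X" % byte
end

# Rules 1-3 for one line of input (line ending already stripped).
def qp_encode_line(line, xml = false)
  encoded = line.unpack("C*").map { |byte|
    literal = byte == 9 || byte == 32 ||
              (33..60).include?(byte) || (62..126).include?(byte)
    literal = false if xml && XML_BYTES.include?(byte)
    literal ? byte.chr : qp_escape(byte)
  }.join
  # Rule 3: spaces and tabs at the very end of the line are escaped after all.
  encoded.sub(/[\t ]+\z/) { |ws| ws.unpack("C*").map { |b| qp_escape(b) }.join }
end

# Rule 5: split an already encoded line into pieces of at most `limit`
# characters, ending every piece but the last with a soft-break "=".
def qp_soft_wrap(encoded, limit = 76)
  pieces = []
  while encoded.length > limit
    cut = limit - 1                      # leave room for the trailing "="
    # Never cut through an "=XX" escape: back up to just before its "=".
    if encoded[cut - 1, 1] == "="
      cut -= 1
    elsif encoded[cut - 2, 1] == "="
      cut -= 2
    end
    pieces << encoded[0, cut] + "="
    encoded = encoded[cut..-1]
  end
  pieces + [encoded]
end
```

A full encoder would then emit qp_soft_wrap(qp_encode_line(line)).join("\r\n") followed by "\r\n" for each input line, which covers rule 4.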

To decode, just reverse the process.
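
As a rough sketch of that reversal (qp_decode is a hypothetical name, and "\n" is assumed to be the native line ending), the order of the steps matters: soft line breaks and CRLF pairs are handled before the =XX escapes are expanded, so an escaped =0D=0A or a trailing =3D is not misread.

```ruby
def qp_decode(text)
  text.b                                      # work on raw bytes
      .gsub(/=\r?\n/, "")                     # drop soft line breaks
      .gsub(/\r\n/, "\n")                     # CRLF -> native (assumed "\n")
      .gsub(/=([0-9A-F]{2})/) { $1.hex.chr }  # =XX -> the byte it stands for
end
```

The result is a binary string, since the decoded bytes are not guaranteed to be valid in any particular character encoding.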

Note: I assumed it would be cheating to use the builtin quoted printable facilities.

I found it somewhat frustrating that String#each_byte does not return
any useful value (see encode_str).
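
To illustrate the point: each_byte with a block hands back the receiver, so the result has to be accumulated by hand. One possible alternative, shown here only as a sketch (encode_bytes is a made-up name, not part of the solution below), is to go through unpack("C*") and map:

```ruby
str = "a=b"
returned = str.each_byte { |b| b }   # the block's values are discarded
# returned is the very same object as str

# Hypothetical variant of encode_str that needs no accumulator.
def encode_bytes(str)
  str.unpack("C*").map { |byte| "=%02X" % byte }.join
end
```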

I found it a bit more frustrating that String#chomp! is greedier than you might expect, discarding all sorts of potential line endings instead of limiting itself to $/.
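
A small demonstration of that greediness, assuming $/ still has its default value of "\n". The helper strip_newline is a hypothetical name for the stricter behaviour; it matters here because a stray carriage return in the data should be encoded as =0D rather than silently dropped.

```ruby
# With the default record separator, chomp removes "\n", "\r\n" and a lone "\r".
crlf = "text\r\n".chomp
cr   = "text\r".chomp

# Stricter alternative: remove exactly one trailing "\n" and nothing else.
def strip_newline(line)
  line.sub(/\n\z/, "")
end

kept = strip_newline("text\r\n")   # the "\r" survives and can be encoded
```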

I would also suggest adding support for GetoptLong#[] to query options directly, instead of requiring a full iteration.

#!/usr/bin/env ruby -w

require 'getoptlong'

MaxLength = 76

def main
   opts = GetoptLong.new(
     [ "-d", GetoptLong::NO_ARGUMENT ],
     [ "-x", GetoptLong::NO_ARGUMENT ]
   )
   $opt_decode = false
   $opt_xml = false
   opts.each do |opt, arg|
     case opt
     when "-d" then $opt_decode = true
     when "-x" then $opt_xml = true
     end
   end

   if $opt_decode
     decode_input
   else
     encode_input
   end
end

def encode_input
   STDOUT.binmode # We need to control the line-endings.
   while (line = gets) do
     # Note: String#chomp! swallows more than just $/.
     line.sub!(/#{$/}$/o, "")
     # Encode the entire line.
     line.gsub!(/[^\t -<>-~]+/) { |str| encode_str(str) }
     line.gsub!(/[&<>]+/) { |str| encode_str(str) } if $opt_xml
     line.sub!(/\s*$/) { |str| encode_str(str) }
     # Split the line up as needed.
     while line.length > MaxLength
       # index returns nil when no "=" follows, so guard before subtracting.
       split = line.index("=", MaxLength - 4)
       split -= 1 unless split.nil?
       split = (MaxLength - 2) if split.nil? or (split > MaxLength - 2)
       print line[0..split], "=\r\n"
       line = line[(split + 1)..-1]
     end
     print line, "\r\n"
   end
end

def encode_str(str)
   encoded = ""
   str.each_byte { |c| encoded << "=%02X" % c }
   encoded
end

def decode_input
   while (line = gets) do
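     line.chomp!
     # Detect a soft line break before decoding; otherwise a decoded "=3D"
     # at the end of a line would be mistaken for one.
     soft_break = !line.sub!(/=\z/, "").nil?
     line.gsub!(/=([\dA-F]{2})/) { $1.hex.chr }
     if soft_break
       print line
     else
       print line, $/
     end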
   end
end

main


--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

I must sheepishly admit that I was unaware of Ruby's converter when I made the quiz. It was pointed out to me in a private email after I posted it. The converter isn't a complete solution to the quiz, but it gets you very close.

Is it cheating to use Ruby features? Never. Feel free, then poke a little fun at the quiz editor because you're smarter than he is. All part of the fun.

Sorry for the oversight.

James Edward Gray II


On Mar 13, 2005, at 12:57 PM, Glenn Parker wrote:

Note: I assumed it would be cheating to use the builtin quoted printable facilities.

Hi,

Testing. I found building a test suite before doing the code really helpful on this one, to get my head around the intricacies of the encoding. Actually thinking through the edge cases and working out expected results was necessary for me to develop this solution.

Now, of course, this would have been a lot easier if I'd just been able to find the "builtin quoted printable facilities." What builtin quoted printable facilities?

Anyway, here is my result:
http://www.dave.burt.id.au/ruby/quoted-printable.rb

And the tester:
http://www.dave.burt.id.au/ruby/test-quoted-printable.rb

The testing program generates test methods and test data dynamically.
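
The tester itself is only linked, not quoted, so the following is not Dave's code. It is a generic sketch of how test methods can be generated from a data table with define_method. QP_CASES, toy_encode and GeneratedTests are all invented for the illustration, and a real suite would more likely subclass Test::Unit::TestCase.

```ruby
# Hypothetical case table: test name => [input, expected encoding].
QP_CASES = {
  "plain_text"   => ["hello", "hello"],
  "equals_sign"  => ["a=b", "a=3Db"],
  "control_byte" => ["a\ab", "a=07b"],
}

# Stand-in encoder so the sketch runs on its own (no wrapping, no
# trailing-whitespace handling).
def toy_encode(text)
  text.b.gsub(/[^\t -<>-~]/) { |ch| "=%02X" % ch.ord }
end

class GeneratedTests
  # One test_* method per table entry.
  QP_CASES.each do |name, (input, expected)|
    define_method("test_#{name}") do
      actual = toy_encode(input)
      unless actual == expected
        raise "#{name}: expected #{expected.inspect}, got #{actual.inspect}"
      end
      true
    end
  end
end
```

Adding an edge case then means adding one row to the table rather than writing another method.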

The public interface to my solution looks like this:

module QuotedPrintable

  WHITESPACE = [?\t, ?\ ]
  WHITESPACE_REGEXP = /[\t ]/
  WHITESPACE_ESCAPED_REGEXP = /=09|=20/

  # bytes that do not need to be escaped
  PRINTABLES = ((?!..?~).to_a + WHITESPACE) - [?=]

  MAX_LINE_WIDTH = 76

  NEWLINE = "\r\n"

  # additional bytes to escape for safety in an EBCDIC document
  EBCDIC_EXCEPTIONS = %w' ! " # $ @ [ \ ] ^ ` { | } ~ '
  EBCDIC_PRINTABLES = PRINTABLES - EBCDIC_EXCEPTIONS
  # additional bytes to escape for safety in an XML document
  XML_EXCEPTIONS = %w' < > & '
  XML_PRINTABLES = PRINTABLES - XML_EXCEPTIONS

  # Encode self to the quoted-printable transfer encoding
  def to_quoted_printable(printables = QuotedPrintable::PRINTABLES)

  # Decode self from the quoted-printable transfer encoding
  def from_quoted_printable

  # Functions that do quoted-printable encoding and decoding
  class << self

    # Return the quoted-printable escaped representation of the given byte
    # (byte must be a Fixnum between 0 and 255)
    def encode_byte(byte)

    # Return the byte corresponding to the given quoted-printable escape
    # sequence as a String. If it's not valid, return nil.
    def decode_sequence(escape_sequence)

    # Return the given string encoded as quoted-printable, including the
    # canonical \r\n line terminators.
    def encode_string(string, printables = PRINTABLES)

    # Consider the given string quoted-printable encoded, and decode it,
    # including translating line terminators to the native default.
    def decode_string(string)

# Add quoted-printable conversions to String
class String
  include QuotedPrintable # to_quoted_printable, from_quoted_printable
end

Cheers,
Dave

Look up the "M" format for Array.pack.

James Edward Gray II


On Mar 14, 2005, at 9:41 AM, Dave Burt wrote:

Now, of course, this would have been a lot easier if I'd just been able to find the "builtin quoted printable facilities." What builtin quoted printable facilities?

What builtin quoted printable facilities?

Look up the "M" format for Array.pack.

So here's the cheat solution:

class String
  def to_quoted_printable(*args)
    [self].pack("M").gsub(/\n/, "\r\n")
  end
  def from_quoted_printable
    self.gsub(/\r\n/, "\n").unpack("M").first
  end
end

(Just add my original if __FILE__ block to make it almost quiz-compatible)
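
For anyone who has not used the "M" directive, here is a brief round trip using pack and unpack directly. The inputs are made up for illustration; the width check assumes only that the built-in encoder stays within the 76-character limit, not any exact line length.

```ruby
encoded = ["a=b\n"].pack("M")          # "=" is escaped as =3D
decoded = encoded.unpack("M").first    # back to the original bytes

# A long run of text gets soft line breaks inserted by the encoder.
long   = ["x" * 200].pack("M")
widths = long.split("\n").map { |line| line.length }
```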

And here's how it fares against my test suite:

Loaded suite TC_QuotedPrintable
Started
.............FF.FFFFFFF..
Finished in 0.39 seconds.

So it's 10 times the speed of my original one (against random binary data), but it breaks lines too early, producing 73-character lines instead of 76-character ones. Of course, this one won't do XML either.

Interestingly, if I use a gsub! instead of a loop with sub!s in my soft_break! method, I get a 5x speedup... and fail the same tests.

Cheers,
Dave

(from Dave's solution)

if __FILE__ == $0
   require 'optparse'

   # Look, James, I'm opt-parsing! :)
   ...

I'm so proud! :D

James Edward Gray II