It was pointed out, first in private email and later on Ruby Talk, that your
quiz editor isn't quite up on all of Ruby's features. Support for the Quoted
Printable encoding is already in the language. You can access it with the "M"
format specification of Array#pack and String#unpack. Dave Burt posted a
modification to his solution using these features. Here's that class:
    class String
      def to_quoted_printable(*args)
        [self].pack("M").gsub(/\n/, "\r\n")
      end
      def from_quoted_printable
        self.gsub(/\r\n/, "\n").unpack("M").first
      end
    end
Ruby's Quoted Printable encoder uses standard Unix line endings, which is why
you see the gsub() translations to the specified carriage-return line-feed pairs
above. That doesn't handle the XML aspect of the quiz, but you can add that
with a few more calls to gsub() at both ends.
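As a rough sketch of those extra gsub() calls, the helpers below wrap pack("M")
and unpack("M") and also escape the three XML-sensitive characters. The method
names qp_encode_xml and qp_decode are made up for this illustration. The
escaping runs after pack() has already inserted its soft line breaks, so a line
holding many of these characters could grow past the 76 character limit. Treat
it as a starting point, not a finished encoder.

```ruby
# Hypothetical helpers; the names are not from any submission.
def qp_encode_xml(text)
  encoded = [text].pack("M")                               # built-in encoder
  encoded = encoded.gsub(/[&<>]/) { |c| "=%02X" % c.ord }  # hide XML specials
  encoded.gsub("\n", "\r\n")                               # CRLF line endings
end

def qp_decode(text)
  # unpack("M") already understands =26, =3C and =3E, so decoding
  # needs no XML-specific step.
  text.gsub("\r\n", "\n").unpack("M").first
end

p qp_encode_xml("a < b & c\n")       # "a =3C b =26 c\r\n"
p qp_decode("a =3C b =26 c\r\n")     # "a < b & c\n"
```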
Ignoring my knowledge gap, we still have some interesting solutions to discuss.
Let's look at one of them. Here's Glenn Parker's code:
    #!/usr/bin/env ruby -w
    require 'getoptlong'
    MaxLength = 76
    def main
      opts = GetoptLong.new(
        [ "-d", GetoptLong::NO_ARGUMENT ],
        [ "-x", GetoptLong::NO_ARGUMENT ]
      )
      $opt_decode = false
      $opt_xml = false
      opts.each do |opt, arg|
        case opt
        # "then" replaces the old "when ...:" form, which Ruby 1.9+ rejects.
        when "-d" then $opt_decode = true
        when "-x" then $opt_xml = true
        end
      end
      if $opt_decode
        decode_input
      else
        encode_input
      end
    end
    def encode_input
      STDOUT.binmode    # We need to control the line-endings.
      while (line = gets) do
        # Note: String#chomp! swallows more than just $/.
        line.sub!(/#{$/}$/o, "")
        # Encode the entire line.
        line.gsub!(/[^\t -<>-~]+/) { |str| encode_str(str) }
        line.gsub!(/[&<>]+/) { |str| encode_str(str) } if $opt_xml
        line.sub!(/\s*$/) { |str| encode_str(str) }
        # Split the line up as needed.
        while line.length > MaxLength
          ### original code ###
          # split = line.index("=", MaxLength - 4) - 1
          # split = (MaxLength - 2) if split.nil? or (split > MaxLength - 2)
          ### BUGFIX: index() can return nil, so don't subtract blindly -JEG2 ###
          split = line.index("=", MaxLength - 4)
          split -= 1 unless split.nil?  # stop just before the "="
          split = (MaxLength - 2) if split.nil? or (split > MaxLength - 2)
          ### END BUGFIX ###
          print line[0..split], "=\r\n"
          line = line[(split + 1)..-1]
        end
        print line, "\r\n"
      end
    end
    def encode_str(str)
      encoded = ""
      str.each_byte { |c| encoded << "=%02X" % c }
      encoded
    end
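    def decode_input
      while (line = gets) do
        line.chomp!
        # Look for the soft line break before decoding, so a line that
        # ends in an encoded "=3D" is not mistaken for a wrapped line.
        soft_break = (line[-1] == ?=)
        line.chop! if soft_break
        line.gsub!(/=([\dA-F]{2})/) { $1.hex.chr }
        if soft_break
          print line
        else
          print line, $/
        end
      end
    end
    main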
Let me talk a little about that shebang line. It doesn't work on my system:
    $ chmod +x quoted_printable.rb
    $ ./quoted_printable.rb
    env: ruby -w: No such file or directory
That's one of the minuses of using the "env ruby" trick. If you don't want to
hardcode the path and still want to enable warnings inside the script, the
following works:
    #!/usr/bin/env ruby
    $VERBOSE = true  # enable warnings
That doesn't have anything to do with the quiz, of course, and you could still
run Glenn's code with "ruby quoted_printable.rb", but having been bitten by that
same problem myself, I wanted to mention it.
Getting back to the code, Glenn pulls in getoptlong, defines a constant to hold
the line length, and then defines a method called main(). main() just parses
command line options (setting the globals $opt_decode and $opt_xml as needed),
then hands off work to either decode_input() or encode_input().
For encoding, encode_input() handles most of the work. It starts by shutting
off line ending translation with a call to binmode(). I believe that's only
needed when your code is running on Windows, but it's still a great habit to
form anytime you're going to muck with raw line endings.
From there, encode_input() loops over STDIN with a line-by-line read. Note that
it performs its own chomp() with a call to sub!(). The author explains why in
his submission email:
    I found it a bit more frustrating that String#chomp! is greedier than
    you might expect, discarding all sorts of potential line endings,
    instead of limiting itself to $/.
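To make that complaint concrete, here is a small comparison, assuming $/ still
holds its default value of "\n". With that default, chomp() strips a trailing
"\r\n" pair (or a lone "\r") as well, while an anchored sub() removes only the
separator itself. The variable names are just for this illustration.

```ruby
line = "text\r\n"

greedy  = line.chomp                # removes the whole "\r\n" pair
careful = line.sub(/#{$/}\z/, "")   # removes only $/, leaving the "\r"

p greedy          # "text"
p careful         # "text\r"
p "text\r".chomp  # "text" (a lone carriage return is swallowed too)
```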
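The next three substitutions encode the needed characters on the line: first
anything outside printable ASCII plus the "=" sign itself, then the XML special
characters when -x is given, and finally any trailing whitespace. They're just
a combination of simple Regexps and calls to encode_str(). If you glance down
at encode_str(), you can see that it's a very simple byte-to-hex translator.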
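Here is a sketch of those three substitutions pulled out into a function you
can poke at in irb. The encode_line() wrapper and its xml parameter are my own
invention for this example (Glenn's code works on the line in place and reads
the $opt_xml global), and encode_str() is rewritten with map() but does the
same job.

```ruby
# Byte-to-hex translator, equivalent in spirit to encode_str() above.
def encode_str(str)
  str.each_byte.map { |byte| "=%02X" % byte }.join
end

# Hypothetical wrapper around the three substitutions.
def encode_line(line, xml = false)
  # 1. Anything that is not a tab or printable ASCII, plus "=" itself.
  line = line.gsub(/[^\t -<>-~]+/) { |str| encode_str(str) }
  # 2. The XML special characters, only when asked.
  line = line.gsub(/[&<>]+/) { |str| encode_str(str) } if xml
  # 3. Whitespace at the end of the line.
  line.sub(/\s*$/) { |str| encode_str(str) }
end

p encode_line("1 + 1 = 2 ")    # "1 + 1 =3D 2=20"
p encode_line("a < b", true)   # "a =3C b"
```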
The final while loop in encode_input() breaks up long lines. It looks more
complex above, because I added a bug fix to it. When running tests on the
code, Glenn's script crashed on me. The issue was that String#index can
return nil and you can't subtract 1 from nil. I just moved the "- 1" down a
line to work around this.
The reason index() is called looking for an "=" is to prevent breaking up an
already encoded character. If there aren't any encoded characters near the
limit, the line is split at index MaxLength - 2, so the piece plus its trailing
"=" comes to exactly MaxLength characters.
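If you want to experiment with that idea outside the script, the following is
one way to sketch it. This is not Glenn's loop: soft_wrap() is a hypothetical
helper that returns a string instead of printing, and it checks the two
positions just before the cut directly. It assumes the line is already
encoded, so every "=" in it begins a three character "=XX" sequence.

```ruby
# Hypothetical helper: wrap an already encoded line with soft line breaks.
def soft_wrap(line, max = 76)
  pieces = []
  while line.length > max
    cut = max - 1            # characters to keep, leaving room for "="
    # An "=XX" sequence starting one or two characters before the cut
    # would be torn in half, so back up to just before its "=".
    if line[cut - 1] == "="
      cut -= 1
    elsif line[cut - 2] == "="
      cut -= 2
    end
    pieces << line[0, cut] + "="
    line = line[cut..-1]
  end
  pieces << line
  pieces.join("\r\n")
end

wrapped = soft_wrap("a" * 74 + "=3D" + "b" * 10)
puts wrapped
```

In that usage example the "=3D" would straddle the cut, so the first output
line ends after the 74 a's and the encoded character starts the second line.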
This method of breaking up the lines can split them mid-word. You might want
to consider breaking at word boundaries instead. A big advantage of Quoted
Printable is that it's a Base64-like encoding that keeps plain text pretty
readable. That's why I suggested its use to embed data in XML. Breaking lines
on word boundaries just enhances that characteristic.
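One way that suggestion might look, purely as a sketch: prefer the last space
that fits on the line, and only fall back to a hard cut when there is no space
to use. The wrap_at_words() name is hypothetical, and like any soft wrapping it
assumes the line has already been encoded. Keeping the space on the outgoing
line is safe here because the "=" that follows means the line no longer ends in
whitespace.

```ruby
# Hypothetical helper: soft-wrap an encoded line, preferring word boundaries.
def wrap_at_words(line, max = 76)
  pieces = []
  while line.length > max
    limit = max - 1                        # leave room for the trailing "="
    space = line.rindex(" ", limit - 1)    # last space that still fits
    if space
      cut = space + 1                      # keep the space on this line
    else
      cut = limit                          # no space: hard cut, but never
      if line[cut - 1] == "="              # through an "=XX" sequence
        cut -= 1
      elsif line[cut - 2] == "="
        cut -= 2
      end
    end
    pieces << line[0, cut] + "="
    line = line[cut..-1]
  end
  pieces << line
  pieces.join("\r\n")
end

puts wrap_at_words((["word"] * 30).join(" "))
```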
Getting back to the code one last time, decode_input() is even easier to follow.
It too is a line-by-line read, with a gsub!() used to decode and a basic if
statement used to unwrap lines (by dropping the trailing = and not printing a
line ending).
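The same decoding steps can be sketched as a function that takes a string and
returns a string, which is handier for testing than reading STDIN. The
decode_text() name is an assumption of mine, and I've chosen to look for the
soft line break before decoding, so a line that legitimately ends in "=3D" is
not confused with a wrapped one. The result is built as a binary string, since
decoded bytes need not be valid text in any particular encoding.

```ruby
# Hypothetical helper: decode Quoted Printable text held in a string.
def decode_text(text)
  out = "".b                            # binary result buffer
  text.each_line do |line|
    line = line.chomp.b                 # drop the line ending, work in bytes
    soft_break = line.end_with?("=")    # a trailing "=" means "joined to next"
    line = line.chop if soft_break
    out << line.gsub(/=([\dA-F]{2})/) { $1.hex.chr }
    out << "\n" unless soft_break
  end
  out
end

p decode_text("1 + 1 =3D 2=20\r\nlong=\r\nline\r\n")   # "1 + 1 = 2 \nlongline\n"
```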
The other solutions are all quite interesting and I do encourage everyone to
check them out. Most submissions modified String to add the conversions.
Matthew Moss also added foreach() style readers to IO. Dave Burt included a
nice set of test cases, used by himself and at least one other person. Good
stuff all around.
My thanks to all who endure my mental lapses, and to those who gently correct
me. I need all the help I can get.
Great news: We have a record four quizzes queued up right now, all of them
including some contribution from others! I'm so pleased. We'll start our run
tomorrow with a quiz for people who know when to Hold'em and when to fold 'em...