[QUIZ] Text Munger (#76)

From: Gregory Seidman <Gregory_Seidman@alumni.brown.edu>
Date: April 21, 2006 9:24:49 AM CDT
To: Ruby Quiz <james@grayproductions.net>
Subject: Re: [QUIZ] Text Munger (#76)

Note that this was sent directly to you, not to the list as a whole. I was
going for development speed, so waiting until Sunday to submit a solution
would be... frustrating. Anyhow...

} -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
} by Matthew Moss
} Your task for this quiz, then, is to take a text as input and output the text in
} this fashion. Scramble each word's center (leaving the first and last letters of
} each word intact). Whitespace, punctuation, numbers -- anything that isn't a
} word -- should also remain unchanged.
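For comparison with the script below, here is a minimal sketch of the rule exactly as quoted, written with String#gsub. This is not the approach the script takes. It assumes a "word" is any run of ASCII letters, so MacDonald and NASA are each scrambled as one whole word. The names scramble_word, scramble_text and the rng parameter are illustrative only.

```ruby
# Sketch of the quiz rule as stated: shuffle the interior letters of each
# word, keep its first and last letters, and leave everything else alone.
# Assumption: a "word" is any run of ASCII letters (/[A-Za-z]+/).
def scramble_word(word, rng = Random.new)
  return word if word.length < 4   # 3 or fewer letters: nothing to shuffle
  middle = word[1...-1].chars.shuffle(random: rng)
  word[0] + middle.join + word[-1]
end

def scramble_text(text, rng = Random.new)
  # gsub only visits the letter runs; digits, spaces and punctuation
  # between them are copied through unchanged.
  text.gsub(/[A-Za-z]+/) { |w| scramble_word(w, rng) }
end

puts scramble_text("Scrambled words are still readable, mostly.", Random.new(7))
```

Passing a seeded Random makes a run repeatable; leaving it out gives a fresh shuffle each time.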

There are a few interesting things about this script:

- It doesn't bother rearranging any word shorter than four letters.

- It treats a capital letter as the beginning of a word, even if it occurs
  in the middle of a word. This has two effects. First, acronyms and other
  words in all caps are not rearranged at all. Words in all caps don't
  register the same way when we read them, and rearranging their letters is
  much more disruptive. Second, a word like MacDonald is rearranged as two
  separate words, Mac and Donald. The capital letters act as anchors, much
  as the first and last letters of words do, and cannot be moved. For
  example, MacDonald could be rendered MacDonlad and still be readable, but
  MlacdnDao would not be.

- It doesn't bother to compact or flatten the mapped tokens before joining.
  I do need a flatten right after the scan, to turn its array of two-element
  arrays into individual tokens, but I leave in all the nils and just pass
  them through as if they were whitespace or punctuation. They disappear in
  the join anyway, so it doesn't matter.

- Also, the replacement for actual words is an array of three elements: a
  single-character string, an array of single-character strings, and
  another single-character string. Since Array#join joins nested arrays
  recursively, it takes care of that, too.
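The tokenizing and joining behaviour described in the list above can be seen in isolation in the following sketch. The two regular expressions are copied from the script below; the sample sentence is made up for illustration.

```ruby
# The script's token pattern: one letter followed by lowercase letters
# (so every capital starts a new token), or a run of non-letters.
tok_exp  = /([a-zA-Z][a-z]*)|([^A-Za-z]+)/
word_exp = /[a-zA-Z][a-z]{3,}/

# Two capture groups, so scan yields [letters, nil] or [nil, other] pairs;
# flatten turns those into one list that still contains the nils.
tokens = "NASA hired MacDonald.".scan(tok_exp).flatten
p tokens.compact
# => ["N", "A", "S", "A", " ", "hired", " ", "Mac", "Donald", "."]

# Only tokens of four or more letters count as words to scramble.
p tokens.compact.grep(word_exp)   # => ["hired", "Donald"]

# Matching a regexp against nil returns nil, so nils fall through untouched.
p(word_exp =~ nil)                # => nil

# join recurses into nested arrays and turns nil into "".
p ["D", ["o", "n", "a", "l"], "d", nil, "."].join   # => "Donald."
```

Note how NASA becomes four one-letter tokens and MacDonald becomes Mac plus Donald, which is exactly why acronyms are left alone and capitals act as anchors.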

The script:

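#!/usr/bin/env ruby

# Print usage and quit on -?, -h, -help or --help, or on extra arguments.
# The options are grouped so that ^ and $ anchor every alternative.
if /^(-\?|-h|-help|--help)$/ =~ ARGV[0] || ARGV.length > 1
  $stderr.puts "Usage: #{$PROGRAM_NAME} [filename]"
  exit 1
end

# Read the named file if one was given, otherwise standard input.
infile = $stdin
if ARGV.length == 1
  begin
    infile = File.new(ARGV[0])
  rescue SystemCallError
    $stderr.puts("File not found: '#{ARGV[0]}'")
    exit 2
  end
end

# A token is one letter plus any following lowercase letters (so each capital
# starts a new token), or a run of non-letters. Words of 4+ letters get munged.
tok_exp = /([a-zA-Z][a-z]*)|([^A-Za-z]+)/
word_exp = /[a-zA-Z][a-z]{3,}/

infile.each_line { |line|
  puts line.scan(tok_exp).flatten.map { |tok|
    if word_exp =~ tok
      # Keep the first and last letters; shuffle the ones in between.
      newtok = [tok[0..0]]
      newtok << tok[1...-1].split('').sort_by{rand}
      newtok << tok[-1..-1]
    else
      tok   # nil, whitespace, punctuation, digits, or a short word
    end
  }.join
}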
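
Because the script reads from a file or standard input and uses the global rand, its output is hard to check automatically. A minimal sketch, assuming Ruby 1.9 or later, pulls the per-line logic into a method. The names munge_line, TOK_EXP, WORD_EXP and the rng parameter are hypothetical, not part of the original script.

```ruby
# Hypothetical refactoring of the script's per-line logic into a method,
# with an injectable random source so runs can be reproduced in tests.
TOK_EXP  = /([a-zA-Z][a-z]*)|([^A-Za-z]+)/
WORD_EXP = /[a-zA-Z][a-z]{3,}/

def munge_line(line, rng = Random.new)
  line.scan(TOK_EXP).flatten.map { |tok|
    if WORD_EXP =~ tok
      # first letter, shuffled interior, last letter; join flattens it later
      [tok[0], tok[1...-1].chars.sort_by { rng.rand }, tok[-1]]
    else
      tok   # nil, non-letters, or a letter token shorter than four letters
    end
  }.join
end

line = "NASA hired MacDonald in 1962.\n"
print munge_line(line, Random.new(42))
```

With a seeded Random the shuffle is repeatable, so a test can confirm that first and last letters, capitals, short words and punctuation all stay in place. The original's tok[0..0] also works on Ruby 1.8, where tok[0] returned a character code rather than a one-letter string; this sketch relies on the 1.9 behaviour.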


Begin forwarded message:

On Fri, Apr 21, 2006 at 09:34:35PM +0900, Ruby Quiz wrote: