[summary] 1-800-the-quiz (#20)

Here's an interesting quote from Brian Schroeder's solution page:

  To make it more difficult JEGII introduced the possibility to skip letters,
  but in my opinion this leads to bad results (Though it made me think, so it
  was a good idea).

I'm not sure exactly what qualifies as "bad results," but here's a thought I had
while reading this:

  CAR-4-YOU

That's not the reason I added it though. As usual, the real truth is far less
interesting. It was pointed out to me, by Tobias Peters, that this problem
originated as a means to compare various programming languages. I examined the
original description of the problem, forwarded to me by Tobias, and tried to
stay close to the original challenge. Allowing single numbers comes from there.

One last interesting quote from Brian's page:

  -e, --encoding ENCODING
  How the alphabet is encoded to phonenumbers. james or logic are supported.

Does this suggest that "james" and "logic" are opposites? Food for thought...

On to the solutions.

I'll show Brian's solution below, because I ran into some minor issues with the
other two. However, do take a look at Jannis Harder's WEBrick servlet. It's
very little work to offer this service to the whole world through a Web
interface and that program shows it off nicely. Lee Marlow's solution was also
short and very straight forward, though the running time was a little high.

Let's inspect Brian's code:

  # Nodes in the Dictionary.
  class DictionaryNode < Array
    # Terminal info
    attr_reader :words

    def initialize
      super(10)
      @words = []
    end
  end

This first piece of the puzzle is a node to be used by a tree class we'll meet
shortly. DictionaryNode is an Array that contains exactly 10 members. Why 10?
Because that's how many digits our encoding has. DictionaryNode also contains
an Array of words, that will be filled from the dictionary file.

Here's the start of Brian's tree class:

  # A tree-indexed version of the dictionary that allows
  # efficent searching by number 2 alphabet mapping.
  class Dictionary
    def initialize(encoding)
      super()
      @encoding = {}
      @inverse_encoding = {}

      encoding.each do | k, v |
        @encoding[k] = v.split(/\s+/).map{|c| c[0]}
      end

      # Create map from characters to numbers
      @inverse_encoding = @encoding.inject({}) { | r, (k, v) |
        v.each do | l | r[l] = k end
        r
      }
      @root = DictionaryNode.new
    end
  
    # ...

That's pretty easy to follow. This setup work creates maps for the encoding of
this Dictionary object. The maps go both ways, numbers to letters and the
inverse letters to numbers. Finally, the root of the tree is created from a new
DictionaryNode.

The following methods add words to the tree from a dictionary file:

    # ...
  
    # Helper method for rekursive adding of words to the dictionary
    private
    def add_recursive(node, word, position)
      if word.length == position
        node.words << word
        return node
      end
      add_recursive(
        node[@inverse_encoding[word[position]]] ||= DictionaryNode.new,
        word,
        position + 1 )
    end
   
    # Add words to the dictionary
    public
    def add(word)
      add_recursive(@root, word, 0)
      self
    end
   
    # Load a wordlist from a file, which contains one word per line.
    # Ignores punctuation and whitespace.
    def load_wordlist(file, options)
      $stderr.print "Loading dictionary... " if options.verbose
      start = Time.new
      file.read.gsub(/[^A-Za-z\n]/, '').upcase!.split($/).uniq!.each do |w|
        w.chomp!
        next if w.empty? or w.length <= options.min_length
        self.add(w)
      end
      if options.verbose
        $stderr.puts "built dictionary in %f seconds" %
          (Time.new-start).to_f
      end
      self
    end
  
    # ...

The dictionary reading process starts in the last method, load_wordlist(). This
method slurps the file, discards illegal characters, normalizes case, breaks up
the list on line boundaries, and eliminates repetition all on the third line.
Each member of the list of words that creates is then sent on to the add()
method. Note the nice use of options.verbose to show a build time.

Before I move on to add(), let me point out that the above method does have a
few minor issues. When I was playing with this code to figure out how it works,
I fed it a five word dictionary and was surprised when it crashed. The cause?
No duplicates. That causes uniq!() to return nil (a hot topic on Ruby Talk
lately) and since nil doesn't support an each() call, the code blew up.
upcase!() has similar problems.

One more minor issue. Here's a tip: When you normalize case, it's generally
better to go down than up. The reason is international support. Some languages
distinguish between things like uppercase and titlecase. That means that a
bunch of uppercase conversions might not be consistent, based on certain local
settings. The best way to avoid such problems is to lowercase content instead.
This isn't much of a problem, but it's a good habit to build.

Back to the code. add(), as you can see, is just a shell over add_recursive().
It passes the word on with the root node and a starting position of 0.

add_recursive() is pretty clever. It digs down into the tree until finding the
right spot to place the word in the Dictionary. This digging happens at the end
of the method with recursive calls. The current letter in the word is examined
and a branch of the tree is created to handle that encoded letter, if it didn't
already exist. The algorithm then moves to that node, examining the next letter
in line. When all the letters have been branched off, we're at the right place
to insert the word. The if at the beginning of the method handles that end
condition.

The last thing a Dictionary object requires is a way to hunt for words. Here
are those methods:

    # ...
  
    private
    # Search words and return (in the block) words and the unmatched rest
    # of the number
    def sub_find(node, number, &block)
      # Return words found so far
      block[node.words.map{|w|w.dup}, number] unless node.words.empty?
      # No more digits, so stop searching here
      return node if number.empty?
      # Search for longer words
      sub_find(node[number[0]], number[1..-1], &block) if node[number[0]]
    end

    private
    # Calculate all allowed skip patterns for a number of a given length
    def skips(s, length)
      return [s] if length == 0
      result = skips(s + [false], length-1)
      result.concat(skips(s + [true], length-1)) unless s[-1]
      result
    end

    public
    
    def find_noskip(number)
      result = []
      sub_find(@root, number) do | words, rest_number |
        if rest_number.empty?
          result.concat(words)
        else
          find_noskip(rest_number).each do | sentence |
            words.each do | w |
              result << w + '-' + sentence
            end
          end
        end
      end
      result
    end

    # Skipping makes this a bit ugly
    def find(number)
      result = []
      skips([], number.length).each do | skipped |

        # Create the injector that can inject the skipped numbers
        # back into the word
        injector = []
        skipped.zip(number).each_with_index do |(s,n), i|
          injector << [n.to_s, i] if s
        end

        # We search for words built from the unskipped digits
        unskipped_digits =
          number.zip(skipped).select{|(d, s)| !s}.map{|(d,s)|d}
        sentences = find_noskip(unskipped_digits)
        # Inject the skipped digits back into the found sentences
        sentences.each do | s |
          injector.each do | (n, i) | s.insert(i, n) end
        end

        result.concat(sentences)
      end
      result
    end
  end

Start with the sub_find() method. It's the key to the search and easy enough to
digest. sub_find() takes a node to search, the number to use in that search,
and a block to pass results to. The first line passes all matching words from
this node to the block, if there are any. The root node, where the algorithm
begins, won't have any since most dictionaries don't include 0 length words.
The second line finishes the process, if we've examined all the numbers. The
third line recurses, moving to the node for the next digit at the head of the
number variable. That's half of the picture.

The find_noskip() method is the public face for that. It calls sub_find(),
passing a block of code that fills the local results Array as matches are found.
When a word matches in the number, find_noskip() recurses looking other words to
finish off the number. Of course, as the name implies, this version of the
process does not skip digits.

For skipping, you need the find() method. find() first calls skip() to
calculate all possible skip patterns for this number. Then, one skip at a time,
find() removes the skipped digits and calls find_noskip() on the remainder.
After results are generated, the skips are reinserted back into their original
locations. That's pretty tricky.

To be clear, this does not function as I intended the quiz to work (and I now
understand why Brian thinks I allowed "bad results"). Numbers were only to be
allowed at word boundaries, while Brian's algorithm will reinsert them into the
middle of words. Looking back, I did not make this very clear in the quiz and
it's certainly my error. Brian's code is still a very nice implementation of
his interpretation of the rules.

Finally, there's an interface that puts all that code to use. Let's look at
that:

  encodings = {
    :james => {
      2 => 'A B C',
      3 => 'D E F',
      4 => 'G H I',
      5 => 'J K L',
      6 => 'M N O',
      7 => 'P Q R S',
      8 => 'T U V',
      9 => 'W X Y Z'},

    :logic => {
      0 => 'A B',
      1 => 'C D',
      2 => 'E F',
      3 => 'G H',
      4 => 'I J K',
      5 => 'L M N',
      6 => 'O P Q',
      7 => 'R S T',
      8 => 'U V W',
      9 => 'X Y Z'
    }
  }

  require 'optparse'

  class PhonewordOptions < OptionParser
    attr_reader :dictionary, :encoding, :format, :allow_skips, :help,
                :encoding_help, :verbose, :min_length
    def initialize
      super()
      @dictionary = '/usr/share/dict/words'
      @encoding = :james
      @format = :plain
      @allow_skips = true
      @help = false
      @encoding_help = false
      @verbose = false
      @ignore_non_alpha = false
      @min_length = 1
      self.on("-d", "--dictionary DICTIONARY", String) { | v |
        @dictionary = v
      }
      self.on("-e", "--encoding ENCODING", String,
              "How the alphabet is encoded to phonenumbers. " +
              "james or logic are supported.") { | v |
        @encoding = v.downcase.to_sym
      }
      self.on("-p", "--plain",
              'One result per found number, ' +
              'no other information. (Default)') { @format = :plain }
      self.on("-f", "--full", 'Prefix the result with the number') {
        @format = :full
      }
      self.on("-v", "--verbose", 'Make more noise') { @verbose = true }
      self.on("-s", "--skips", "--allow_skips", "--allow-skips",
              'Allow to skip one adjacent number while matching. (Default)',
              'Gives lots of ugly results, but james asked for it.') {
         @allow_skips = true
      }
      self.on("-c", "--no-skips",
              "Don't leave numbers in the detected words") {
        @allow_skips = false
      }
      self.on("-m" "--min-length", "Minimum length of accepted words.",
                "Use this to ignore one-letter words that make " +
                "the output quite uninteresting.", Integer) { | v |
        @min_length = v
      }
      self.on("-?", "--help") { @help = true }
      self.on("--supported-encodings", "--encoding-help",
              "List the supported encodings") { @encoding_help = true }
    end
  end

  options = PhonewordOptions.new
  options.parse!(ARGV)

  if options.help
    puts options
    exit
  end

  if options.encoding_help or !encodings[options.encoding]
    puts "Possible encodings:"
    puts encodings.to_a.sort_by{|(k,v)|k.to_s}.map{|(k,v)|
      "#{k}:\n"+v.map{|(n,e)|" #{n}: #{e}"}.sort.join("\n")
    }
    exit
  end

  dictionary = Dictionary.new(encodings[options.encoding]).load_wordlist(
    File.open(options.dictionary), options
  )

  output = {
    :plain => lambda do | number, sentence | sentence end,
    :full => lambda do | number, sentence |
      "#{number.ljust(15)}: #{sentence}"
    end
  }

  method = {true => :find, false => :find_noskip }

  ARGF.each do | number |
    number.strip!
    number = number.gsub(/[^0-9]/, '').unpack('C*').map{|n|n - ?0}
    $stderr.puts "Searching for #{number}" if options.verbose
    dictionary.send(
      method[options.allow_skips], number
    ).each do | sentence |
      puts output[options.format][number, sentence]
    end
  end

Most of that code is option handling. Brian creates his own PhonewordOptions
object, which inherits from OptionParser. In the setup for that object,
defaults are established and option parsing in defined with several calls to
on(). From there, reader methods are provided for all the defined options.
This makes for a pretty self-contained bundle of option parsing and reading.
You can see the options object put to good use, after the class.

That final block is what actually kicks off the program. Each number is read
from ARGF, cleaned up, and passed to the find methods of the dictionary object.
Results from that find are printed, creating a complete solution.

My thanks go out to all three quiz workers. It was nice to have a few people
playing along again.

Tomorrows quiz will stay with the topic of phones and how we use them...

Thanks for all your effort James. It is always interesting to read your
summaries, and I always learn something.

Again I could not resist and corrected the bugs you spotted and hacked
the solution to incorporate the behaviour you intended. It's amazing how
difficult it is to get the specifications right, and to make a
non-native speaker understand them.

Here is the code portion that contains all the important additions:

    def find_wordskip(number)
      result =
      sub_find(@root, number) do | words, rest_number |
        if rest_number.empty?
          result.concat(words)
        elsif rest_number.length == 1
          find_wordskip(rest_number).each do | sentence |
            words.each do | w |
              result << w + '-' + sentence
            end
          end
          words.each do | w |
            result << w + '-' + rest_number[0].to_s
          end
        else
          find_wordskip(rest_number).each do | sentence |
            words.each do | w |
              result << w + '-' + sentence
            end
          end
        end
      end

      sub_find(@root, number[1..-1]) do | words, rest_number |
        if rest_number.empty?
          result.concat(words.map{|w| number[0].to_s + '-' + w})
        else
          find_wordskip(rest_number).each do | sentence |
            words.each do | w |
              result << number[0].to_s + '-' + w + '-' + sentence
            end
          end
        end
      end
      result
    end

it goes through some hoops to allow adding of digits at the beginning of
a sentence, at the end of a sentence and between any two words in the
sentence.

I spat this out while I had to code something important, so I hope I got
the recursion right. On a quick check it did not spit out anything
unwanted, so I hope its correct.

The full solution can be found at:

http://ruby.brian-schroeder.de/quiz/phonewords/

Best regards,

Brian

···

On Thu, 24 Feb 2005 23:35:28 +0900 Ruby Quiz <james@grayproductions.net> wrote:

[Snipped amazingly long essay]

--
http://ruby.brian-schroeder.de/

Hi James,

May I ask what issues you encountered with the others? (I mean, the only
issue I found in mine is that it's too Perl-ish...)

Thanks for all your work on this.

Cheers,
Dave

···

"Ruby Quiz" <james@grayproductions.net> wrote:

...
I'll show Brian's solution below, because I ran into some minor issues
with the
other two. ...

Welcome back Brian. I've missed being teased about the effort I put into the quizzes by a guy who puts in just as much. :wink:

Your code is also always interesting for me to read. I learn things from it every time, so it's a good trade.

James Edward Gray II

···

On Feb 24, 2005, at 9:59 AM, Brian Schröder wrote:

On Thu, 24 Feb 2005 23:35:28 +0900 > Ruby Quiz <james@grayproductions.net> wrote:

[Snipped amazingly long essay]

Thanks for all your effort James. It is always interesting to read your
summaries, and I always learn something.

I posted all the problems I found to the list.

I never saw your solution, until just now when I went hunting for it on Google Groups. I'm very sorry I missed it.

I'm still not finding it in my mail client, so I'm guessing this might have been a problem with the gateway being down? Do you post from Usenet?

Again, very sorry I didn't see it.

James Edward Gray II

···

On Feb 24, 2005, at 6:55 PM, Dave Burt wrote:

Hi James,

May I ask what issues you encountered with the others? (I mean, the only
issue I found in mine is that it's too Perl-ish...)

Yes, I'm a Usenetter, and the gateway did have a little nap back there
(though it seems happy now, as you read my last message). Not your fault.

Cheers,
Dave

···

"James Edward Gray II" <james@grayproductions.net> wrote:

On Feb 24, 2005, at 6:55 PM, Dave Burt wrote:

I posted all the problems I found to the list.

I never saw your solution, until just now when I went hunting for it on
Google Groups. I'm very sorry I missed it.

I'm still not finding it in my mail client, so I'm guessing this might
have been a problem with the gateway being down? Do you post from Usenet?

Dave, would you please resend your message so it will show up on the mailing list for those of us who missed it? Thanks.

James Edward Gray II

···

On Feb 24, 2005, at 7:25 PM, Dave Burt wrote:

Yes, I'm a Usenetter, and the gateway did have a little nap back there
(though it seems happy now, as you read my last message). Not your fault.

Yes, I'm a Usenetter, and the gateway did have a little nap back there
(though it seems happy now, as you read my last message). Not your fault.

Dave, would you please resend your message so it will show up on the
mailing list for those of us who missed it? Thanks.

OK. Apologies to c.l.r for the re-post. Google groups link:
http://groups-beta.google.com/group/comp.lang.ruby/browse_thread/thread/90246bd3ecdf05cc/53f008423008c2fb?#53f008423008c2fb

Originally posted: Wed, 23 Feb 2005 12:41:42 GMT

i once again found the time to do the ruby quiz. I liked the quiz
because it was short, and on the other hand harder than I thought.

Me too.

I
skipped the part about skipping letters in my first try, and when I had
to add it it made me think quite a bit. (I first skipped letters instead
of numbers, because I got confused in my datastructure.)

I tackled it in bits, too, roughly corresponding to my 3 main functions:
match, combine and new_phone_numbers. (See the end of this message for a
link to the program.)

Step 0: setup
Get a map get a digit corresponding to any character. {'A'=>'2',
'B'=>'2'...}
Read dictionary into a hash mapping digit-strings to an array of the words
they can make (uppercase).
{'228' => ['BAT', 'CAT'], ...}

Step 1: match
Check every possible substring of the number for matches in the dictionary
map. (Initially, I just printed these underneath the string in the
corresponding position. I thought I was nearly there.) To move on, I got
this function to produce an array of all these matches, each match being
represented like this: {:start => 1, :length => 3, :words => ['BAT', 'CAT']}

Step 2: combine
Combine gets all non-overlapping combinations of matches from the above
step, and returns an array of combinations. Each combination is an array of
matches (see above... I really should have made Match a class, hey?).

Step 3: new_phone_numbers
Iterates through each word in each match in each combination... except it's
not that simple. 3 combinations x 3 matches each x 1 word each = 9
solutions, no worries. 3 combinations x 3 matches each x 3 words each = 243
solutions. Each word in each match has to be shown with every word in every
other match in the combination. Enter an array of indexes - an index for
each match to keep track of which word it's up to (the variable's called
"index"). So the array of indexes starts at each match's last word, and gets
decremented until it's [0,0...].
Now we've got that tricky loop sorted out, the easy part: substitute the
current word (using index) from each match into the number, and finally
reject the number if it's got 2 digits in a row.

And here's the final product.
http://www.dave.burt.id.au/ruby/phone_words.rb

Cheers,
Dave

···

"James Edward Gray II" <james@grayproductions.net> wrote:

On Feb 24, 2005, at 7:25 PM, Dave Burt wrote:

Subject: Re: [SOLUTION] [QUIZ] 1-800-THE-QUIZ (#20)
"Brian Schröder" <ruby@brian-schroeder.de> wrote: