[SUMMARY] Word Blender (#108)

I'm almost embarrassed to admit that I originally turned this quiz down. I
thought it would be too similar to the Scrabble Stems problem we did long, long
ago. Ben politely explained that he felt it was different enough though, and
then sent some guy named Guido after me in a parking lot one night. That
convinced me to actually work the problem, and I had a change of heart. I think
we can tell from the popularity of the problem that Ben is smarter than I am, so
I'm glad I did.

Many solvers chose to build the entire game, so we will take a look at that
approach. Though the quiz doesn't explicitly call for it, most programmers
chose to add scoring or a time limit to their solution to spice up the action.
I went with the time limit and we will examine my own code below.

The first step is to get a word list, of course. There were several approaches
to this process, since manipulating every word in the dictionary each time could
get a little slow. My answer to this was just to cache the word list after I
had built it once and reuse that for all future runs. Here's the code that
handles the loading and caching:

  # game date cache
  CACHE_FILE = ".game_words"
  
  if File.exist? CACHE_FILE # load from cache
    word_list = File.open(CACHE_FILE) { |file| Marshal.load(file) }
  else # build word list
    # prepare data structure
    words_by_signature = Hash.new { |words, sig| words[sig] = Array.new }
  
    # read dictionary
    File.foreach(ARGV.shift || "/usr/share/dict/words") do |word|
      word.downcase!
      word.delete!("^a-z")
    
      next unless word.length.between? 3, 6
    
      (words_by_signature[word.split("").sort.join] << word).uniq!
    end
  
    # prepare recursive signature search
    def choices( sig,
                 seen = Hash.new { |all, cur| all[cur] = true; false },
                 &blk )
      sig.length.times do |i|
        shorter = sig[0...i] + sig[(i+1)...sig.length]
        unless seen[shorter]
          blk[shorter]
          choices(shorter, seen, &blk) unless shorter.length == 3
        end
      end
    end
  
    # prepare game data structure
    word_list = Hash.new
  
    # build game choices
    words_by_signature.keys.grep(/\A.{6}\Z/) do |possible|
      word_list[possible] = words_by_signature[possible]
    
      choices(possible) do |shorter_signature|
        if words_by_signature.include? shorter_signature
          word_list[possible].push(*words_by_signature[shorter_signature])
        end
      end
    end
  
    # cache for faster loads
    File.open(CACHE_FILE, "w") { |file| Marshal.dump(word_list, file) }
  end
  
  # ...

This uses Marshal to build a trivial word cache with only a few lines of code.
If the cache exists, we slurp it back in. Otherwise we build the cache and save
it for future runs.

To build the cache, I pluck all three to six letter words out of the indicated
dictionary file and build a word list containing all six letter words linked to
smaller words using the same letters.

The main trick used in this recursive grouping of words is the use of
"signatures." A word's signature is just the sorted order of the letters in the
word: aejms for james, for example. Comparing signatures makes it trivial to
find words that use the same letters, since their signatures will be the same.

Using the signatures, the choices() method just removes one character at a time
recursing through the word list. This allows me to find all of the smaller
words that can be formed using the same letters.

If you wanted to generate the entire list of games as the quiz suggested, the
above is all you need. Each key is one possible round with the values being the
words that can be matched in that round.

I wanted to play though and built a full game interface. My interface requires
Unix, because those are the tricks I know. Here's the start of that code:

  # ...
  
  require "io/wait"
  
  ### game interface (requires Unix) ###
  TERMINAL_STATE = `stty -g`
  system "stty raw -echo cbreak"
  at_exit { system "stty #{TERMINAL_STATE}" }
  clear = `clear`
  
  # a raw mode savvy puts
  def out(*args) print(*(args + ["\r\n"])) end
  
  # for easy selection
  words = word_list.keys
  
  # ...

This setup code memorizes the original state of the user's terminal, modifies
that state to raw mode so I can read individual characters as they are pressed,
arranges to have the terminal settings restored at exit, grabs the escape
sequence we can use to clear the terminal, and builds a puts() like method that
works with raw mode. This code doesn't really have much to do with Ruby. I'm
just shelling out to standard Unix utilities here.

We're now ready for the game event loop:

  # ...
  
  rounds = 0
  loop do
    # select letters
    letters = current = words[rand(words.size)]
    while word_list.include? letters
      letters = letters.split("").sort_by { rand }.join
    end
    letters.gsub!(/.(?=.)/, '\0 ')
    
    # round data
    advance = false
    matches = Array.new
    current_match = String.new
    start = Time.now
    message = nil
    last_update = start - 1
    
    # ...

I begin by selecting a word for the round and scrambling that word's letters
until they are no longer a real word. Then I setup some variables to hold data
for the round like whether or not the user has found a six letter word and
should advance as well as any feedback message I am currently showing the user
and the last time I refreshed the screen.

Now we start the round event loop:

    # ...
    
    # round event loop
    until Time.now >= start + 2 * 60
      # game display
      if last_update <= Time.now - 1
        print clear
  
        out "Your letters: #{letters}"
        out " Time left: #{120 - (Time.now - start).round} seconds"
        out " Your words: #{matches.join(', ')}"
        out
        unless message.nil?
          out message
          out
        end
        print current_match
        $stdout.flush
        
        last_update = Time.now
      end
      
      # ...

The round event loop runs for two minutes and this first bit is responsible for
drawing the screen. After clearing the screen, it prints the letters, time
left, words found, and any feedback message to the screen. Note that I update
the screen each second, assuming no other activity, so the user will notice the
clock counting down.

Here's the input processing:

      # ...
      
      # input handler
      if $stdin.ready?
        char = $stdin.getc
        case char
        when ?a..?z, ?A..?Z # read input
          current_match << char.chr.downcase
          message = nil
          last_update = start - 1
        when ?\b, 127 # backspace/delete
          current_match = current_match[0..-2]
          message = nil
          last_update = start - 1
        when ?\r, ?\n # test entered word
          if word_list[current].include? current_match
            matches << current_match
            matches = matches.sort_by { |word| [word.size, word] }
            if not advance and current_match.length == 6
              advance = true
              message = "You will advance to the next round!"
            else
              message = nil
            end
          else
            message = "Unknown word."
          end
          current_match = String.new
          last_update = start - 1
        end
      end
    end
    
    # ...

This just checks to see if there is data waiting on STDIN. We don't want to
read from it without checking as that could block the application waiting for
input. The ready?() method used here is added by the io/wait library and will
return true if there is data waiting. The rest of the code just handles the
input we get. Letters are recorded, we support backspace, and a carriage-return
tells us to try the current word, set some feedback, and refresh the display.

When the round is done, we finish out the game loop:

    # ...
    
    # round results
    print clear
    missed = word_list[current] - matches
    unless missed.empty?
      out "Other words using \"#{letters}:\""
      out missed.sort_by { |word| [word.size, word] }.join(", ")
      out
    end
    if advance
      rounds += 1
      
      out "You made #{matches.size} word#{'s' if matches.size != 1}, ",
          "including at least one six letter word. Nice work."
      out "Press any key to begin the next round."
      
      $stdin.getc
    else
      out "You made #{matches.size} word#{'s' if matches.size != 1}, ",
          "but failed to find a six letter word."
      
      break # end game
    end
  end
  
  # game results
  out "You completed #{rounds} round#{'s' if rounds != 1}. ",
      "Thanks for playing."

The above code just prints missed words and results. If you found a six letter
word, the code will loop to a new round. Otherwise it will break out of the
program.

A big thank you to Ben (and Guido!) for convincing me to try the quiz and to all
those that took the time to play with it.

Tomorrow we'll try a problem that has been making the rounds, literally...

I used a different signature method - I mapped each unique letter in
the target word to a prime, and then used the product of the primes of
all the letters in a word as its signature. That way, a is contained
in b if signature(b) % signature(a) == 0, and you can generate the
signatures via each_byte, array lookup and integer multiplication (no
need for split, sort or string deletion).

martin

···

On 1/11/07, Ruby Quiz <james@grayproductions.net> wrote:

The main trick used in this recursive grouping of words is the use of
"signatures." A word's signature is just the sorted order of the letters in the
word: aejms for james, for example. Comparing signatures makes it trivial to
find words that use the same letters, since their signatures will be the same.

Using the signatures, the choices() method just removes one character at a time
recursing through the word list. This allows me to find all of the smaller
words that can be formed using the same letters.

I'm almost embarrassed to admit that I originally turned this quiz down. I
thought it would be too similar to the Scrabble Stems problem we did long, long
ago. Ben politely explained that he felt it was different enough though, and
then sent some guy named Guido after me in a parking lot one night. That
convinced me to actually work the problem, and I had a change of heart. I think
we can tell from the popularity of the problem that Ben is smarter than I am, so
I'm glad I did.

Coupla things:

1) Gianni, not Guido. I can understand the confusion, though, as it was
   dark and probably hurt a bunch.

2) I didn't manage to write a solution to the quiz, despite the fact that
   I had at least a month's time more than most everyone else, so none of
   this smarter business. :slight_smile:

I wanted to play though and built a full game interface. My interface requires
Unix, because those are the tricks I know. Here's the start of that code:

<snip>

This setup code memorizes the original state of the user's terminal, modifies
that state to raw mode so I can read individual characters as they are pressed,
arranges to have the terminal settings restored at exit, grabs the escape
sequence we can use to clear the terminal, and builds a puts() like method that
works with raw mode. This code doesn't really have much to do with Ruby. I'm
just shelling out to standard Unix utilities here.

<snip awesomeness>

Further proof of #2 above.

Ben

···

On Thu, Jan 11, 2007, Ruby Quiz wrote:

Very interesting. I've never seen that before. I like it.

My version of "signatures" comes out of Programming Peals, if my memory is right. Just FYI.

James Edward Gray II

···

On Jan 11, 2007, at 7:54 AM, Martin DeMello wrote:

On 1/11/07, Ruby Quiz <james@grayproductions.net> wrote:

The main trick used in this recursive grouping of words is the use of
"signatures." A word's signature is just the sorted order of the letters in the
word: aejms for james, for example. Comparing signatures makes it trivial to
find words that use the same letters, since their signatures will be the same.

Using the signatures, the choices() method just removes one character at a time
recursing through the word list. This allows me to find all of the smaller
words that can be formed using the same letters.

I used a different signature method - I mapped each unique letter in
the target word to a prime, and then used the product of the primes of
all the letters in a word as its signature. That way, a is contained
in b if signature(b) % signature(a) == 0, and you can generate the
signatures via each_byte, array lookup and integer multiplication (no
need for split, sort or string deletion).

Holy crap. That is incredibly cool. Thanks for sharing this technique!

Ben

···

On Thu, Jan 11, 2007, Martin DeMello wrote:

I used a different signature method - I mapped each unique letter in
the target word to a prime, and then used the product of the primes of
all the letters in a word as its signature. That way, a is contained
in b if signature(b) % signature(a) == 0, and you can generate the
signatures via each_byte, array lookup and integer multiplication (no
need for split, sort or string deletion).

The numbers overflow 32 bits in the general case, I think, but for
this restricted problem it works very nicely.

martin

···

On 1/11/07, James Edward Gray II <james@grayproductions.net> wrote:

On Jan 11, 2007, at 7:54 AM, Martin DeMello wrote:
> I used a different signature method - I mapped each unique letter in
> the target word to a prime, and then used the product of the primes of
> all the letters in a word as its signature. That way, a is contained
> in b if signature(b) % signature(a) == 0, and you can generate the
> signatures via each_byte, array lookup and integer multiplication (no
> need for split, sort or string deletion).

Very interesting. I've never seen that before. I like it.

I believe with Ruby they'll get converted to Bignum automatically with no
overflow, though I bet the running time suffers.

···

On 1/11/07, Martin DeMello <martindemello@gmail.com> wrote:

On 1/11/07, James Edward Gray II <james@grayproductions.net> wrote:
> On Jan 11, 2007, at 7:54 AM, Martin DeMello wrote:
> > I used a different signature method - I mapped each unique letter in
> > the target word to a prime, and then used the product of the primes of
> > all the letters in a word as its signature. That way, a is contained
> > in b if signature(b) % signature(a) == 0, and you can generate the
> > signatures via each_byte, array lookup and integer multiplication (no
> > need for split, sort or string deletion).
>
> Very interesting. I've never seen that before. I like it.

The numbers overflow 32 bits in the general case, I think, but for
this restricted problem it works very nicely.

martin

Martin DeMello wrote:

> > I used a different signature method - I mapped each unique letter in
> > the target word to a prime, and then used the product of the primes of
> > all the letters in a word as its signature. That way, a is contained
> > in b if signature(b) % signature(a) == 0, and you can generate the
> > signatures via each_byte, array lookup and integer multiplication (no
> > need for split, sort or string deletion).
>
> Very interesting. I've never seen that before. I like it.

The numbers overflow 32 bits in the general case, I think, but for
this restricted problem it works very nicely.

Specifically (I was wondering) you can use 9 primes and still be under
32 bits, 15 primes and still be under 64 bits.

module Enumerable
  def product; inject(){ |n,p| p*n }; end
end

first9 = primes[0..8]
puts "First 9 primes:\n%s\nproduct: %d (%d bits)" %
  [
    first9.inspect,
    first9.product,
    first9.product.to_s(2).length
  ]

first15 = primes[0..14]
puts "\nFirst 15 primes:\n%s\nproduct: %d (%d bits)" %
  [
    first15.inspect,
    first15.product,
    first15.product.to_s(2).length
  ]

#=> First 9 primes:
#=> [2, 3, 5, 7, 11, 13, 17, 19, 23]
#=> product: 223092870 (28 bits)

#=> First 15 primes:
#=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
#=> product: 614889782588491410 (60 bits)

25 primes puts you at 121 bits; 26 primes at 128 bits on the nose.

···

On 1/11/07, James Edward Gray II <james@grayproductions.net> wrote:
> On Jan 11, 2007, at 7:54 AM, Martin DeMello wrote: