[QUIZ] Word Chains (#44)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Our own Dave Thomas has also posted some Ruby code challenges on his blog in the
past. There are several interesting problems there:

  http://codekata.org/

This week's Ruby Quiz is one of my favorite problems posted by Dave.

Given two words of equal length as command-line arguments, build a chain of
words connecting the first to the second. Each word in the chain must be in the
dictionary and every step along the chain changes exactly one letter from the
previous word.

Again, your program should accept input as two command-line arguments. Your
program should also allow a -d command-line switch followed by a dictionary file
to use, but feel free to choose a sensible default for your system. The result
should be a word chain starting with the first word and ending with the last
printed to STDOUT, one word per line. Print an error message if a chain cannot
be found.

Bonus points for finding the shortest chain and/or a small execution time.

Here's an example:

  $ time ruby -d word_chain.rb duck ruby
  Loading dictionary...
  Building chain...
  duck
  ruck
  rusk
  ruse
  rube
  ruby
  
  real 0m27.551s
  user 0m27.402s
  sys 0m0.113s

does anyone have a working link for the wordlist.txt dictionary that Dave links (broken) to from codekata?

Ruby Quiz wrote:

···

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Our own Dave Thomas has also posted some Ruby code challenges on his blog in the
past. There are several interesting problems there:

http://codekata.org/

This week's Ruby Quiz is one of my favorite problems posted by Dave.

Given two words of equal length as command-line arguments, build a chain of
words connecting the first to the second. Each word in the chain must be in the
dictionary and every step along the chain changes exactly one letter from the
previous word.

Again, your program should accept input as two command-line arguments. Your
program should also allow a -d command-line switch followed by a dictionary file
to use, but feel free to choose a sensible default for your system. The result
should be a word chain starting with the first word and ending with the last
printed to STDOUT, one word per line. Print an error message if a chain cannot
be found.

Bonus points for finding the shortest chain and/or a small execution time.

Here's an example:

$ time ruby -d word_chain.rb duck ruby
Loading dictionary...
Building chain...
duck
ruck
rusk
ruse
rube
ruby

real 0m27.551s
user 0m27.402s
sys 0m0.113s

How about another bonus challenge:

If your program can find the shortest chain between two words, then extend/modify it to find a longest shortest chain for all words with n letters in the dictionary.
In other words: find two words for which the shortest chain is at least as long as the shortest chain between any other word pair.

For example:

$ ruby longest_shortest.rb -d /usr/share/dict/words 4
idol
idyl
odyl
odal
oral
orad
brad
bead
beak
beck
back
bach
each
etch
utch
utah
utas
upas
$ ruby longest_shortest.rb -d /usr/share/dict/words 5
alure
azure
azury
anury
anura
abura
abuta
aluta
alula
amula
amala
amara
arara
araca
braca
brava
breva
breve
beeve
belve
belee
besee
resee
resex
remex
remix
remit
demit
dimit
digit
dight
wight
wicht
wecht
pecht
pacht
yacht
yasht

Dominik

···

On Fri, 26 Aug 2005 15:24:22 +0200, Ruby Quiz <james@grayproductions.net> wrote:

Bonus points for finding the shortest chain and/or a small execution time.

My fun extra credit (that I may or may not get to) is to find the longest chain for a given starting word :slight_smile:

From: "Max Rabenmann" <flashdrvnk@hotmail.com>
Date: August 27, 2005 3:26:14 PM CDT
To: submission@rubyquiz.com
Subject: Please Forward: Ruby Quiz Submission

Hi,

I produced a solution for rubyquiz #44.
It may not be the fastest solution, but it works.

Thank you.

greetings from germany

_________________________________________________________________
Sie suchen E-Mails, Dokumente oder Fotos? Die neue MSN Suche Toolbar mit Windows-Desktopsuche liefert in sekundenschnelle Ergebnisse. Jetzt neu! http://desktop.msn.de/ Jetzt gratis downloaden!

wordmorph.rb (2.53 KB)

···

Begin forwarded message:

Here's the code used in the quiz:

#!/usr/local/bin/ruby -w

class WordChain
     @@dictionary = Array.new

     def self.distance( from, to )
         same = 0
         from.length.times { |index| same += 1 if from[index] == to[index] }
         from.length - same
     end

     def self.load_words( file, limit )
         limit = normalize_word(limit).length

         File.foreach(file) do |word|
             word = normalize_word(word)

             next unless word.length == limit

             @@dictionary << word
         end

         @@dictionary.uniq!
     end

     def self.normalize_word( word )
         normal = word.dup

         normal.strip!
         normal.downcase!
         normal.delete!("^a-z")

         normal
     end

     def initialize( start, finish )
         @start = self.class.normalize_word(start)
         @finish = self.class.normalize_word(finish)

         @chain = nil
     end

     attr_reader :start, :finish, :chain

     def link
         chains = Array[Array[@start]]

         until chains.empty? or
               self.class.distance(chains.first.last, @finish) == 1
             chain = chains.shift
             links = @@dictionary.select do |word|
                 self.class.distance(chain.last, word) == 1 and
                 not chain.include? word
             end
             links.each { |link| chains << (chain.dup << link) }

             chains = chains.sort_by do |c|
                 c.length + self.class.distance(c.last, @finish)
             end
         end

         if chains.empty?
             @chain = Array.new
         else
             @chain = (chains.shift << @finish)
         end
     end

     def to_s
         link if @chain.nil?

         if @chain.empty?
             "No chain found between #{@start} and #{@finish}."
         else
             @chain.join("\n")
         end
     end
end

if __FILE__ == $0
     dictionary_file = "/usr/share/dict/words"
     if ARGV.size >= 2 and ARGV.first == "-d"
         ARGV.shift
         dictionary_file = ARGV.shift
     end

     unless ARGV.size == 2 and ARGV.first != ARGV.last and
            ARGV.first.length == ARGV.last.length
         puts "Usage: #{File.basename($0)} [-d DICTIONARY] START_WORD END_WORD"
         exit
     end
     start, finish = ARGV

     warn "Loading dictionary..." if $DEBUG
     WordChain.load_words(dictionary_file, start)
     warn "Building chain..." if $DEBUG
     puts WordChain.new(start, finish)
end

__END__

James Edward Gray II

Hi,

I implemented a bidirectional breadth-first search. I'm using regexps
to speed up dictionary actions.

Extra features:
     -v Debug output.
     -s Single line output.
           Morphing of more than two words.
           Support for non alphabetic chars.

wordmorph.rb (3.8 KB)

Heya dear quizers,

here is my solution.
This is a breadth-first algorithm from both sides.

It may be interesting that it doesn't search the dictinary
for words simmilar do the given ones (or the reachables) but
generates them (rotating each character of the word from a to
z) and looks up the result in a big Set of words (the $dict).

Because the lookup in a hashtable (what Set uses) is constant
the algorithm doesn't slow down with bigger dictinaries. It
does slow down with longer words, but thats quite normal.

Half of the lines are for option handling, search_words and
find_chain are the only interresting methods.

I would realy like to see some speed comparisons (on a large
dict :slight_smile: ).

One more thing, this doesn't use recursion, so doing realy
long chains shoudn't be a problem (except for runtime of
course)

cheers

Simon

···

-----------------------------------------------------------

require 'set'
require 'getoptlong'

def search_words words, known
   hash= Hash.new
   words.each do |word|
     (s = word.dup).size.times do |i|
       26.times do |c| s[i] = ?a + (s[i] + 1 - ?a) % 26
         hash[s] = word if $dict.include?(s) && !known[s]
       end
     end
   end
   hash
end

def find_chain word_a, word_b
   reachable_a = (temp_a = {word_a => 0}).dup
   reachable_b = (temp_b = {word_b => 0}).dup
   found = nil

   loop do
     reachable_a.merge!(temp_a = search_words(temp_a.keys, reachable_a))
     break if found=temp_a.inject(nil){|r, w|reachable_b[w[0]]? w[0]:r}
     warn "reachable a: #{reachable_a.size}" if $verbose

     reachable_b.merge!(temp_b = search_words(temp_b.keys, reachable_b))
     break if found=temp_b.inject(nil){|r, w|reachable_a[w[0]]? w[0]:r}
     warn "reachable b: #{reachable_b.size}" if $verbose

     if temp_a.size == 0 || temp_b.size == 0
       puts "now way!";
       exit 0
     end
   end

   word, list = found, [found]
   while (word = reachable_a[word]) && word != 0
     list.insert(0, word)
   end
   word = found
   while (word = reachable_b[word]) && word != 0
     list.push(word)
   end
   list
end

def usage
   puts "#{$PROGRAM_NAME} [-v|--verbose] [-h|--help]"+
     "[-d|--dictionary] {word1} {word2}"
   exit
end

opts = GetoptLong.new(
   [ "--dictionary", "-d", GetoptLong::REQUIRED_ARGUMENT ],
   [ "--verbose", "-v", GetoptLong::NO_ARGUMENT ],
   [ "--help", "-h", GetoptLong::NO_ARGUMENT ])

$verbose, dictfile = nil, 'words.txt'
opts.each do |opt, arg|
   usage if opt == "--help"
   $verbose = true if opt == "--verbose"
   if opt == "--dictionary"
     dictfile = arg
     usage if dictfile.empty?
   end
end
usage if ARGV.size != 2

$dict = Set.new
open(dictfile){|f| f.each{|w|$dict << w.chomp}}

puts find_chain(ARGV[0], ARGV[1])

-----------------------------------------------------------

Here is my solution.

It uses breadth first search (unidirectional) and is quite fast.
I use a trick to find all possible steps fast: While reading the words I store them in a hash of arrays, e.g. when adding the word "dog", I register "dog" as step for "\0og", "d\0g" and "do\0", later while look for a step for "dot" I will use all words registered for "\0ot", "d\0t" or "do\0". So reading the dictionary takes some time (and memory), but afterwards finding the shortest chains is really fast. I also use symbols instead of strings in some places for faster hash-lookup.

Of course I have also implemented my extra bonus challenge, to find a longest shortest chain and Simon's suggestion to allow words with different length.

Some examples:

$ time ruby word_chain.rb duck ruby
duck
luck
luce
lube
rube
ruby

real 0m1.414s
user 0m1.326s
sys 0m0.028s
$ time ruby word_chain.rb ape human
ape
are
arm
aum
hum
huma
human

real 0m2.604s
user 0m2.484s
sys 0m0.052s
$ time ruby word_chain.rb computer a
computer
compute
compote
compose
compos
compo
campo
camp
cam
aam
aa
a

real 0m12.550s
user 0m11.890s
sys 0m0.232s

The examples with words of different length take longer because much more words from the dictionary have to be added to the hash.

$ time ruby longest_shortest.rb 3
edh
edo
ado
aho
pho
phi
poi
poa
pya
mya

real 0m4.840s
user 0m4.615s
sys 0m0.055s
$ time ruby longest_shortest.rb 6
zounds
wounds
woundy
[49 lines]
pantry
paltry
peltry

real 1m12.614s
user 1m10.113s
sys 0m0.655s
$ time ruby longest_shortest.rb 9
unpausing
unpassing
unpasting
unwasting
unwaiting
unwailing
unfailing
unfalling
unfilling
untilling
untelling
unselling
unsealing
unhealing
unhearing
unwearing
unweaving
unreaving
unreeving
unreeling
unfeeling
unfeeding
unheeding

real 0m9.252s
user 0m8.836s
sys 0m0.158s
$ time ruby longest_shortest.rb 12
modification
codification
conification
sonification
sinification
vinification
vilification
bilification

real 0m7.492s
user 0m7.127s
sys 0m0.138s

Dominik

The code:

# word_chain.rb

···

####################

DEFAULT_DICTIONARY = "/usr/share/dict/words"

# Data structure that efficiently stores words from a dictionary in a way, that
# it is easy to find all words that differ from a given word only at one
# letter (words that could be the next step in a word chain).
# Example: when adding the word "dog", add_word will register "dog" as step for
# "\0og", "d\0g" and "do\0", later each_possible_step("cat") will yield all words
# registered for "\0at", "c\0t" or "ca\0".
class WordSteps
     def initialize
         @steps = Hash.new { |h, k| h[k] = }
         @words = {}
     end

     # yields all words (as strings) that were added with add_word
     def each_word(&block)
         @words.each_key(&block)
     end

     # add all steps for word (a string) to the steps
     def add_word(word)
         sym = word.to_sym
         wdup = word.dup
         for i in 0...word.length
             wdup[i] = 0
             @steps[wdup] << sym
             wdup[i] = word[i]
         end
         @words[word] = sym # for allow_shorter and each_word
     end

     # yields each possible next step for word (a string) as symbol, some
     # possible steps might be yielded multiple times
     # if allow_shorter is true, word[0..-2].to_sym will also be yielded if
     # available
     # if allow_longer is true, all words that match /#{word}./ will be yielded
     def each_possible_step(word, allow_shorter = false, allow_longer = false)
         wdup = word.dup
         for i in 0...word.length
             wdup[i] = 0
             if @steps.has_key?(wdup)
                 @steps[wdup].each { |step| yield step }
             end
             wdup[i] = word[i]
         end
         if allow_shorter && @words.has_key?(tmp = word[0..-2])
             yield @words[tmp]
         end
         if allow_longer && @steps.has_key?(tmp = word + "\0")
             @steps[tmp].each { |step| yield step }
         end
     end

     # tries to find a word chain between word1 and word2 (strings) using all
     # available steps
     # returns the chain as array of symbols or nil, if no chain is found
     # shorter/longer determines if shorter or longer words are allowed in the
     # chain
     def build_word_chain(word1, word2, shorter = false, longer = false)
         # build chain with simple breadth first search
         current = [word1.to_sym]
         pre = { current[0] => nil } # will contain the predecessors
         target = word2.to_sym
         catch(:done) do
             until current.empty?
                 next_step =
                 current.each do |csym|
                     each_possible_step(csym.to_s, shorter, longer) do

ssym>

                         # have we seen this word before?
                         unless pre.has_key? ssym
                             pre[ssym] = csym
                             throw(:done) if ssym == target
                             next_step << ssym
                         end
                     end
                 end
                 current = next_step
             end
             return nil # no chain found
         end
         # build the chain (in reverse order)
         chain = [target]
         chain << target while target = pre[target]
         chain.reverse
     end

     # builds and returns a WordSteps instance "containing" all words with
     # length in length_range from the file file_name
     def self.load_from_file(file_name, length_range)
         word_steps = new
         IO.foreach(file_name) do |line|
             # only load words with correct length
             if length_range === (word = line.strip).length
                 word_steps.add_word(word.downcase)
             end
         end
         word_steps
     end
end

if $0 == __FILE__
     dictionary = DEFAULT_DICTIONARY

     # parse arguments
     if ARGV[0] == "-d"
         ARGV.shift
         dictionary = ARGV.shift
     end
     unless ARGV.size == 2
         puts "usage: #$0 [-d path/to/dictionary] word1 word2"
         exit 1
     end
     word1, word2 = ARGV[0].strip.downcase, ARGV[1].strip.downcase

     shorter = word1.length > word2.length
     longer = word1.length < word2.length
     length_range = if longer
         word1.length..word2.length
     else
         word2.length..word1.length
     end

     # read dictionary
     warn "Loading dictionary..." if $DEBUG
     word_steps = WordSteps.load_from_file(dictionary, length_range)
     word_steps.add_word(word2) # if it is not in dictionary

     # build chain
     warn "Building chain..." if $DEBUG
     chain = word_steps.build_word_chain(word1, word2, shorter, longer)

     # print result
     puts chain || "No chain found!"
end

# longest_shortest.rb
####################

require "word_chain"

class WordSteps
     # builds a longest shortest chain starting with word and returns it as
     # array of symbols
     # if the found chain is shorter than old_max, then it is possible to
     # determine other words, whose longest shortest chain will also be not
     # longer than old_max, those words will be added to not_longer, that way
     # the caller can avoid searching a longest chain for them
     def longest_word_chain(word, not_longer = {}, old_max = 0)
         # build chain with simple breadth first search until no more steps are
         # possible, one of the last steps is then a longest shortest chain
         current = [word.to_sym]
         pre = { current[0] => nil } # will contain the predecessors
         target = nil
         iterations =
         loop do
             next_step =
             iterations << current
             current.each do |csym|
                 each_possible_step(csym.to_s) do |ssym|
                     unless pre.has_key? ssym
                         pre[ssym] = csym
                         next_step << ssym
                     end
                 end
             end
             if next_step.empty?
                 target = current[0]
                 break
             else
                 current = next_step
             end
         end

         # build the chain (in reverse order)
         chain = [target]
         chain << target while target = pre[target]

         # add words to not_longer if possible (see above)
         if chain.size < old_max
             (0...([old_max+1-chain.size, iterations.size].min)).each do |i|
                 iterations[i].each { |w| not_longer[w] = true }
             end
         end
         chain.reverse
     end
end

if $0 == __FILE__
     dictionary = DEFAULT_DICTIONARY

     # parse arguments
     if ARGV[0] == "-d"
         ARGV.shift
         dictionary = ARGV.shift
     end
     word_length = [ARGV.shift.to_i, 1].max

     # read dictionary
     warn "Loading dictionary..." if $DEBUG
     word_steps = WordSteps.load_from_file(dictionary, word_length)

     # search chain
     warn "Searching longest chain..." if $DEBUG
     max = 0
     chain = nil
     not_longer = {}
     word_steps.each_word { |w|
         next if not_longer[w.to_sym]
         cur = word_steps.longest_word_chain(w, not_longer, max)
         if cur.size > max
             chain = cur
             max = cur.size
             warn chain.inspect if $DEBUG
         end
     }

     # print result
     puts chain || "No chain found!"
end

My solution uses the same method of others here: bi-directional
breadth-first-search. This is usually the first thought when looking
for a shortest path in a non-weighted graph. The only difference is
that after I load the dictionary and set up all of the graph edges (My
algorithm right now is pretty slow for doing this), I use Marshal to
save it off to disk. So I only have to load the dictionary and set up
the graph once. When I do the actual graph search it works pretty
quick. I haven't found an instance where it takes more than half a
second. Since I'm doing this I also had it check for the word lengths
and then decide which dictionary to use instead of the -d option.

dictionary.rb

···

==========

class Dictionary
  # This takes in a straight array of words. (It is assumed that they
  # are all the same length, but I should add better error checking
  # on this at some point.)
  def initialize(words)
    @words = Hash.new
    words.each do |word|
      wordmaskarr = []
      word.length.times { |i| wordmaskarr <<
"#{word[0...i]}.#{word[i+1...word.length]}"}
      #puts "#{word}:\t/#{wordmaskarr.join('|')}/"
      wordmask = Regexp.new(wordmaskarr.join('|'))

      @words[word] ||= []
      words.each do |otherword|
        if(otherword =~ wordmask && otherword != word) then
          @words[otherword] ||= []
          #puts "\t\tfound match: #{otherword}"
          @words[word] |= [otherword]
          @words[otherword] |= [word]
        end
      end
    end
  end

  def chain(from, to)
    if(!@words[from]) then
      puts "#{from} not in dictionary"
      return []
    elsif(!@words[to]) then
      puts "#{to} not in dictionary"
      return []
    elsif(from==to)
      return [from]
    end
    linknode = nil
    #these hashes are used to keep links back to where they came from
    fromedges = {from => ""}
    toedges = {to => ""}
    #these are queues used for the breadth first search
    fromqueue = [from]
    toqueue = [to]
    while(toqueue.length>0 && fromqueue.length>0)
      fromnode = fromqueue.shift
      tonode = toqueue.shift
      if(toedges[fromnode] || fromnode==to) then
        linknode = fromnode
        break
      elsif(fromedges[tonode] || tonode==from) then
        linknode = tonode
        break
      end

      @words[fromnode].each do |i|
        if(!fromedges[i]) then
          fromedges[i] = fromnode
          fromqueue.push(i)
        end
      end

      @words[tonode].each do |i|
        if(!toedges[i]) then
          toedges[i] = tonode
          toqueue.push(i)
        end
      end
    end
    if(linknode == nil) then
      return nil
    end
    chain = []
    currnode = linknode
    while(fromedges[currnode] != "")
      chain.unshift(currnode)
      currnode = fromedges[currnode]
    end
    currnode = toedges[linknode]
    while(toedges[currnode] != "")
      chain.push(currnode)
      currnode = toedges[currnode]
    end
    return [from]+chain+[to]
  end
end

makedicts.rb

require 'dictionary'
dicts = Hash.new
File.open("WORD.LST") do |file|
  file.each_line do |line|
    line.chomp!.downcase!
    (dicts[line.length] ||= []) << line
  end
end

dicts.each do |len,dict|
  dict.sort!
  File.open("dict#{0 if len<10}#{len}.txt", "w") do |file|
    file.write(dict.join("\n"))
  end
end

dicts.each do |len, dict|
  if len<50 then
    puts "Making dictionary graph for #{len} letter words"
    currdict = Dictionary.new(dict)
    File.open("dict#{0 if len<10}#{len}.dump", "w") do |file|
      Marshal.dump(currdict, file)
    end
  end
end

wordchain.rb

require 'dictionary'
if(ARGV.length != 2) then puts "Two words required"; exit(1) end
from = ARGV[0]
to = ARGV[1]
if(from.length != to.length) then puts "Word are different lengths";
exit(1) end
dict = nil
File.open("dict#{0 if from.length<10}#{from.length}.dump") do |file|
  dict = Marshal.load(file)
  chain = dict.chain(from, to)
  if(chain) then
    puts chain.join("\n")
  else
    puts "No link found"
  end
end

Example runs:
$ time ruby wordchain.rb duck ruby
duck
duce
luce
lube
rube
ruby

real 0m0.196s
user 0m0.186s
sys 0m0.030s

$ time ruby wordchain.rb kitten rabies
kitten
kitted
kilted
killed
gilled
galled
gabled
gables
gabies
rabies

real 0m0.347s
user 0m0.343s
sys 0m0.030s

Here comes my solution. It relies on the priority queue posted in
another mail and is really fast.

It includes an intelligent neighbour finder and features the A*
algorithm in "starring" role.

regards,

Brian

wordchain.rb (3.84 KB)

README (953 Bytes)

priority_queue.c (13.3 KB)

test.rb (1.15 KB)

extconf.rb (49 Bytes)

···

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

Not that one specifically, but I usually use SCOWL from:

http://wordlist.sourceforge.net/

Although I see I was lazy with the program used in the quiz and just read from /usr/share/dict/words.

Another word list that has been used in past Ruby Quiz challenges is the 12dicts package from:

http://prdownloads.sourceforge.net/wordlist/12dicts-4.0.zip

Hope that helps.

James Edward Gray II

···

On Aug 26, 2005, at 9:03 AM, Francois Paul wrote:

does anyone have a working link for the wordlist.txt dictionary that Dave links (broken) to from codekata?

Dominik Bathon wrote:

Bonus points for finding the shortest chain and/or a small execution time.

How about another bonus challenge:

If your program can find the shortest chain between two words, then extend/modify it to find a longest shortest chain for all words with n letters in the dictionary.
In other words: find two words for which the shortest chain is at least as long as the shortest chain between any other word pair.

For example:

$ ruby longest_shortest.rb -d /usr/share/dict/words 4
idol
idyl
odyl
odal
oral
orad
brad
bead
beak
beck
back
bach
each
etch
utch
utah
utas
upas

And another challenge, explain all the words your program emits..

who/what/where is utas? or utch? even 'orad' seems totaly foreign
to me - but than perhaps thats normal as i'm just a german guy.

Back to your question, yes sounds more like a challenge (in terms
of acceptable runtime). I would like to suggest finding chains
between words of different lengths.

like this:

refugees
refugee
refuges
refutes
reputes
reputed
deputed
debuted
debated
rebated
related
relayed
delayed
decayed
decoyed
decoded
decodes
decides
deciles
defiles
refiles
refile
refine
repine
rapine
raping
rating
bating
basing
basin
basis
basts
bests
beats
beams
seams
shams
shame

don't get me wrong, i just think your dictionary is a bit over
the top...

cheers

Simon

···

On Fri, 26 Aug 2005 15:24:22 +0200, Ruby Quiz > <james@grayproductions.net> wrote:

Here's my solution. It's most surely not the fastest, but it can be fast.

Despite the availability of HighLine and others, I still am not adept at setting up command-line processing. It just takes too long for such a menial task. So mine takes four arguments, in order:
* The first word
* The second word
* A max recursion depth (more below)
* The path to a dictionary file
(I used 12dicts's "2of12inf" in my testing. My own /usr/share/dict/words didn't even have the word "rats" in it, and was filled with all sorts of stupid password-testing non-words.)

Rather than learn Djikstra's algorithm, I stubbornly yet-again chose to come up with my own shortest-path algorithm. Mine does a single-ended depth-first search, but uses a global variable to keep track of the shortest path found so far, and stops any paths which surpass that limit.

(Due to some fencepost error in my thinking, if it finds a 10-item chain, it will keep finding other 10-item chains until it finds a 9-item chain. I was too lazy to think about the problem and figure out where it lay; I kept trying "empirical programming" ("Let's see what happens when I change this from >= to >...") but never found the right combination to get it pared down properly. This is why I know mine will not be the fastest :slight_smile:

Before I implemented the $max_depth aspect, my algorithm would merrily dive 1600 levels deep and overflow the stack. By starting off with a reasonable maximum depth, I both prevent that and (with a good guess at what chain length I might be able to find) can massively speed things up.

For example, using a pair of words that is particularly hard for my algorithm to find the chain for:
[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 12
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 12
...
425.621661 seconds elapsed

[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 10
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 10
...
17.003073 seconds elapsed

[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 8
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 8
...
3.069903 seconds elapsed

[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 6
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 6
...
(no path found)
1.527844 seconds elapsed

I kept meaning to try a solution that does a double-ended 'blossom', spreading out from the two end words and seeing if any of the available words in each word intersect. Because I had limited programming time, I didn't get to try this out; I was concerned that it would be a memory hog. Possibly it would have been uber fast.

I played around with sets versus hashes versus arrays in various spots of my code, but (even with profiling) couldn't find an area that made much of a speed difference. The algorithm is king.

I just realized that I'm only keeping track of already-visited nodes per recursion path...I probably could speed things up quite a bit by globally reducing the pool of available links for ALL paths as a word is visited. Damn.

Maybe next shortest-path quiz we get I'll finally look up Djikstra's (Dijkstra's?) algorithm and try to win the speed race. It's just less fun (but more effective) to look up someone else's research and implement it; far more fun to play with (broken) algorithms on my own :slight_smile:

#!/usr/local/bin/ruby

class String
   # Finds all words in _legal_words_ which may be achieved by varying
   # one letter of the receiving string.

···

#
   # Legal words must respond to the #include? method
   def variations( legal_words )
     letters = 'abcdefghijklmnopqrstuvwxyz'.split('')
     my_characters = self.split( '' )
     all_variations = []
     my_characters.each_with_index{ |orig_char,i|
       letters.each{ |new_char|
         if new_char != orig_char
           test_word = self.dup
           test_word[ i ] = new_char
           all_variations << test_word if legal_words.include?( test_word )
         end
       }
     }
     all_variations
   end

   # Finds and returns an array that links the current word to _dest_word_,
   # where each link in the chain is a 'variation' of the previous link.
   #
   # Legal words is a common pool of words to
   def link_to( dest_word, legal_words, forbidden_words=[], depth=1 )
     return nil if (depth+1) >= $max_depth

     #if $DEBUG
       #puts "#{'.'*depth}link from '#{self}' (d:#{depth}; max:#{$max_depth})"
     #end

     # find the words that I can link to, that haven't already been seen
     links = self.variations( legal_words ) - forbidden_words

     if links.include?( dest_word )
       #Sweet, I have a direct route!
       [ self ]
     else
       path = nil
       links.each{ |iword|
         seen_words = forbidden_words.dup << self
         if test_path = iword.link_to(dest_word,legal_words,seen_words,depth+1)
           total_length = depth + test_path.length + 1
           return nil if total_length > $max_depth
           path = test_path
           $max_depth = total_length
           puts "FOUND PATH of length #{total_length}" if $DEBUG
         end
       }
       if path
         path << self
       else
         nil
       end
     end
   end
end

start_word = ARGV[0] || 'crave'
end_word = ARGV[1] || 'primp'
$max_depth = Integer( ARGV[2] || 10 )
dict = ARGV[3] || '/usr/share/dict/words'
#dict = ARGV[3] || './12dicts-4/2of12inf.txt'

desired_length = start_word.length
unless end_word.length == desired_length
   msg = "Error: '#{start_word}' and '#{end_word}' are not the same length"
   msg << "(#{start_word.length} vs. #{end_word.length})"
   raise msg
end

print "Finding chain between #{start_word} and #{end_word} using #{dict}, "
puts "going no deeper than #{$max_depth}"

# Hash seems to be a hair faster than Set for my purposes
hash_path = "#{dict.gsub( /[^\w\d]/, '_' )}_#{desired_length}.marshall"

# Load/generate words of the right length
avail_words = {}
if File.exists?( hash_path )
   File.open( hash_path, 'r' ){ |f|
     avail_words = Marshal.load( f )
   }
else
   File.open( dict, 'r' ){ |f|
     w = f.read.split(/[\r\n]+/)
     # No capital words, or words ending with % (12dicts)
     w.reject!{ |word| word.length != desired_length or /[^a-z]/ =~ word }
     w.each{ |word| avail_words[ word ] = true }
   }
   File.open( hash_path, 'wb' ){ |f|
     f << Marshal.dump( avail_words )
   }
end

puts "#{avail_words.length} words with #{desired_length} letters"

unless avail_words.include?( end_word )
   raise "Error: '#{end_word}' is not included in #{dict}"
end

#Because Math is Hard
$max_depth += 1

start_time = Time.new
if path = start_word.link_to( end_word, avail_words )
   path.reverse!
   path << end_word
   puts path.join( "\n" )
else
   puts "(no path found)"
end
end_time = Time.new
puts " ", "#{end_time-start_time} seconds elapsed"

What's the benefit to using #warn rather than #puts here? stderr versus stdout?

···

On Aug 28, 2005, at 7:31 AM, James Edward Gray II wrote:

    warn "Loading dictionary..." if $DEBUG
    WordChain.load_words(dictionary_file, start)
    warn "Building chain..." if $DEBUG
    puts WordChain.new(start, finish)

My solution is attached or at <http://tinyurl.com/dpvot&gt; (needs data:-url support in your browser)

I really liked how the interface turned out:

puts WordChains.new(words).from("ruby").shortest_path_to("duck")

WordChains.new(words).from("ruby").each { |path|
   puts path.join(", ")
}

Is there some kind of priority queue in the stdlib or available as a gem?

Examples:

   $ time ruby word_chains.rb duck ruby
   duck
   dusk
   rusk
   ruse
   rube
   ruby

   real 0m5.173s
   user 0m1.434s
   sys 0m0.181s

All chains from a specific word:

   $ time ruby word_chains.rb ruby
   [...]
   ruby, rubs, ribs, rims, aims, alms, elms, elma, erma, erna, edna, edda, eddy, edgy, edge

   real 0m24.723
   user 0m7.248
   sys 0m0.643

Search the longest chain for all words (this will take _forever_)

   $ time ruby -d word_chains.rb -l 2
   [...]
   al, cl, ca, ya, ye
   5

   real 0m49.587

-Levin

word_chains.rb (4.64 KB)

···

James Edward Gray II <james@grayproductions.net> wrote:

#: Jannis Harder changed the world a bit at a time by saying on 8/28/2005 8:39 PM :#

Hi,

I implemented a bidirectional breadth-first search. I'm using regexps
to speed up dictionary actions.

Extra features:
     -v Debug output.
     -s Single line output.
           Morphing of more than two words.
           Support for non alphabetic chars.

Sorry to come in the middle of the solution postings but I was wondering if anybody can point me to an equivalent of linux 'time' for windows.

thanks in advance,
:alex |.::the_mindstorm::.|

Here's my solution,
It's another bidirectional breadth first search
I coded it up before I saw all the others, but I think it ended up
being pretty similar.
-Adam

···

---
# Ruby Quiz Word Chain Solution
# Adam Shelly
# Late August, 2005

# Usage wordchain [-d dictionary] [-l] word word ...
# finds chains between every pair of words on the command line.
# uses dictionary file specified by -d, or /usr/share/dict/words by default.
# if -l is specified finds the longest chain from each word.

class String
    def shift
        self.slice(1,length-1)
    end
    def parent= p
        @parent = p
    end
    def parent
        @parent
    end
end

class WCSolver
    #Loads only the words with a given length.
    #didn't add much speed, but saves memory.
    def initialize(wordlist, length)
        @t = Hash.new
        @len = length
        add_words(wordlist)
    end
    
    def add_words(wordlist)
        w2=nil
        wordlist.each {|w| @t[w2]=true if (w2=w.strip).length == @len }
    end

    def neighbors word
        list = []
        word.size.times do |i|
            w = word.dup
            w.parent = word
            ?a.upto(?z) {
                >c>w[i]=c
              list << w.dup if (@t[w] )
            }
        end
        list.uniq
    end

    #bidirectional breadth first search
    def solve first, last
        puts "Connecting #{first} -> #{last}"
        return nil if @len && (first.length != @len || last.length != @len)
        seenhead, seentail, match = [],[],[]
        head,tail = [first],[last]
        while match == []
            r = head.collect{|w| neighbors(w)}.flatten - seenhead
      return nil if r == []
            seenhead += head = r
            match = head & tail
            break unless (match == [])
      r= tail.collect{|w| neighbors(w)}.flatten - seentail
      return nil if r == []
      seentail += tail = r
      match = head & tail
        end
        r = [ head.find {|w| w==match[0]}]
        while r[-1].parent do r << r[-1].parent end
        r.reverse!
        r[-1]=tail.find {|w| w==match[0]}
        while r[-1].parent do r << r[-1].parent end
        r
    end

    def longest from
        puts "Longest Chain from #{from}"
        seen,head = [],[from]
        while true
            r= head.collect{|w| neighbors(w)}.flatten - seen
            break if r == []
            seen += head = r
        end
        l = [head[0]]
        while l[-1].parent do
            l << l[-1].parent
        end
        l.reverse!
    end
end

if $0 == __FILE__
    puts "Wordchain Finder\n"

    file = ARGV.slice!(ARGV.index('-d'),2)[1] if ARGV.include?'-d'
    file ||= "/usr/share/dict/words"

    showlong = ARGV.slice!(ARGV.index('-l'),1) if ARGV.include?'-l'
    length = ARGV[0].length if ARGV[0]

        wordlist = File.new(file, "r")
    if (wordlist)
        s = WCSolver.new(wordlist,length)
        while (w = ARGV.shift)
            unless (showlong)
                puts s.solve(w, ARGV[0]) if ARGV[0]
            else
                puts s.longest(w)
            end
        end
    end
end

Here is mine. Nothing special, except that I used Dijkstra algorithm
from the graph library (http://raa.ruby-lang.org/project/graph/).
Looking at the library source, I noticed that it doesn't use a
priority queue, hence it could be easily made faster by using Brian
Schröder's.

chain.rb (1.9 KB)

···

---
Paolo

I updated my solution:

- No longer outputs solution in reverse order
- Can calculate a chain along multiple words
- Nicer commandline parsing

an example follows:

bschroed@black:~/svn/projekte/wordchain$ time ruby wordchain.rb -d
/usr/share/dict/british-english ruby duck send self
Loading database
Searching connection between ruby and duck
ruby
rubs
dubs
duns
dunk
duck

Searching connection between duck and send
duck
fuck
funk
fund
fend
send

Searching connection between send and self
send
wend
weld
well
sell
self

real 0m3.796s
user 0m3.356s
sys 0m0.439s

regards,

Brian

wordchain.rb (3.96 KB)

···

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/