Here's my solution. It's most surely not the fastest, but it can be fast.
Despite the availability of HighLine and others, I still am not adept at setting up command-line processing. It just takes too long for such a menial task. So mine takes four arguments, in order:
* The first word
* The second word
* A max recursion depth (more below)
* The path to a dictionary file
(I used 12dicts's "2of12inf" in my testing. My own /usr/share/dict/words didn't even have the word "rats" in it, and was filled with all sorts of stupid password-testing non-words.)
Rather than learn Djikstra's algorithm, I stubbornly yet-again chose to come up with my own shortest-path algorithm. Mine does a single-ended depth-first search, but uses a global variable to keep track of the shortest path found so far, and stops any paths which surpass that limit.
(Due to some fencepost error in my thinking, if it finds a 10-item chain, it will keep finding other 10-item chains until it finds a 9-item chain. I was too lazy to think about the problem and figure out where it lay; I kept trying "empirical programming" ("Let's see what happens when I change this from >= to >...") but never found the right combination to get it pared down properly. This is why I know mine will not be the fastest
Before I implemented the $max_depth aspect, my algorithm would merrily dive 1600 levels deep and overflow the stack. By starting off with a reasonable maximum depth, I both prevent that and (with a good guess at what chain length I might be able to find) can massively speed things up.
For example, using a pair of words that is particularly hard for my algorithm to find the chain for:
[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 12
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 12
...
425.621661 seconds elapsed
[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 10
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 10
...
17.003073 seconds elapsed
[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 8
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 8
...
3.069903 seconds elapsed
[Slim:~/Sites/Ruby] gavinkis% ruby -d wordchains.rb crate primp 6
Finding chain between crate and primp using ./12dicts-4/2of12inf.txt, going no deeper than 6
...
(no path found)
1.527844 seconds elapsed
I kept meaning to try a solution that does a double-ended 'blossom', spreading out from the two end words and seeing if any of the available words in each word intersect. Because I had limited programming time, I didn't get to try this out; I was concerned that it would be a memory hog. Possibly it would have been uber fast.
I played around with sets versus hashes versus arrays in various spots of my code, but (even with profiling) couldn't find an area that made much of a speed difference. The algorithm is king.
I just realized that I'm only keeping track of already-visited nodes per recursion path...I probably could speed things up quite a bit by globally reducing the pool of available links for ALL paths as a word is visited. Damn.
Maybe next shortest-path quiz we get I'll finally look up Djikstra's (Dijkstra's?) algorithm and try to win the speed race. It's just less fun (but more effective) to look up someone else's research and implement it; far more fun to play with (broken) algorithms on my own
#!/usr/local/bin/ruby
class String
# Finds all words in _legal_words_ which may be achieved by varying
# one letter of the receiving string.
···
#
# Legal words must respond to the #include? method
def variations( legal_words )
letters = 'abcdefghijklmnopqrstuvwxyz'.split('')
my_characters = self.split( '' )
all_variations = []
my_characters.each_with_index{ |orig_char,i|
letters.each{ |new_char|
if new_char != orig_char
test_word = self.dup
test_word[ i ] = new_char
all_variations << test_word if legal_words.include?( test_word )
end
}
}
all_variations
end
# Finds and returns an array that links the current word to _dest_word_,
# where each link in the chain is a 'variation' of the previous link.
#
# Legal words is a common pool of words to
def link_to( dest_word, legal_words, forbidden_words=[], depth=1 )
return nil if (depth+1) >= $max_depth
#if $DEBUG
#puts "#{'.'*depth}link from '#{self}' (d:#{depth}; max:#{$max_depth})"
#end
# find the words that I can link to, that haven't already been seen
links = self.variations( legal_words ) - forbidden_words
if links.include?( dest_word )
#Sweet, I have a direct route!
[ self ]
else
path = nil
links.each{ |iword|
seen_words = forbidden_words.dup << self
if test_path = iword.link_to(dest_word,legal_words,seen_words,depth+1)
total_length = depth + test_path.length + 1
return nil if total_length > $max_depth
path = test_path
$max_depth = total_length
puts "FOUND PATH of length #{total_length}" if $DEBUG
end
}
if path
path << self
else
nil
end
end
end
end
start_word = ARGV[0] || 'crave'
end_word = ARGV[1] || 'primp'
$max_depth = Integer( ARGV[2] || 10 )
dict = ARGV[3] || '/usr/share/dict/words'
#dict = ARGV[3] || './12dicts-4/2of12inf.txt'
desired_length = start_word.length
unless end_word.length == desired_length
msg = "Error: '#{start_word}' and '#{end_word}' are not the same length"
msg << "(#{start_word.length} vs. #{end_word.length})"
raise msg
end
print "Finding chain between #{start_word} and #{end_word} using #{dict}, "
puts "going no deeper than #{$max_depth}"
# Hash seems to be a hair faster than Set for my purposes
hash_path = "#{dict.gsub( /[^\w\d]/, '_' )}_#{desired_length}.marshall"
# Load/generate words of the right length
avail_words = {}
if File.exists?( hash_path )
File.open( hash_path, 'r' ){ |f|
avail_words = Marshal.load( f )
}
else
File.open( dict, 'r' ){ |f|
w = f.read.split(/[\r\n]+/)
# No capital words, or words ending with % (12dicts)
w.reject!{ |word| word.length != desired_length or /[^a-z]/ =~ word }
w.each{ |word| avail_words[ word ] = true }
}
File.open( hash_path, 'wb' ){ |f|
f << Marshal.dump( avail_words )
}
end
puts "#{avail_words.length} words with #{desired_length} letters"
unless avail_words.include?( end_word )
raise "Error: '#{end_word}' is not included in #{dict}"
end
#Because Math is Hard
$max_depth += 1
start_time = Time.new
if path = start_word.link_to( end_word, avail_words )
path.reverse!
path << end_word
puts path.join( "\n" )
else
puts "(no path found)"
end
end_time = Time.new
puts " ", "#{end_time-start_time} seconds elapsed"