[SOLUTION] Word Chains (#44)

(Daniel Sheppard) #1

Bit slow in bringing my code to market.

My code uses the dijkstra shortest-path algorithym. I originally
implemented by just recursing the tree - it ran reasonably well, but
once I realised that dijkstra applied, I kicked that to the kerb - I
might figure out where I put it (if I kept it?) and post it later. The
Dijkstra algorithym is built as separate module.

The big thing that sped up the program was pre-calculating the list of
"related words" as each word is created, rather than recursing the
entire word list and repeatedly comparing words. Did lots of playing
around with adjusting other things to make it run faster, but everything
that made sense to do seemed to make the thing slower. I'll investigate
some of the counter-intuitive points and discuss seperately.

The script accepts three arguments - start, end and dictionary.
Dictionary can be wildcarded to include multiple files. The linking of
differently lengthed words is implemented - the linking words are
restricted to lie between the lengths of the two ends (I thought it
looked funny to go from duck to rubys via pub).

Running the sahara-cloudy test against the debian british-english
dictionary I managed to pull out a 27 word chain (not 47 as previously
posted):

daniels@daniels ~ $ time ruby dijkstra.rb sahara cloudy british-english
sahara samara tamara tamera tamers tapers papers payers payees paynes
paines paints faints flints clints clinks blinks blinds blonds bloods
floods floors flours flouts clouts clouds cloudy

real 0m22.638s
user 0m22.057s
sys 0m0.052s

Full Code:

···

---------------------------------------------------------

#
# requires the implementation of an each_neighbour method which must
yield the neighbours of an object, and optionally the 'cost' of the
link.
# it doesn't require the full object list - it builds the list as it
goes. Would probably be faster if it had the whole list, but I wanted to
keep the
# code generic
#
module Dijkstra
  class Vertex
    attr_reader :object
    attr_accessor :distance
    def initialize(object, distance=nil)
      @object = object
      @distance = distance
    end
    def <=>(other)
      a,b = distance, other.distance
      return 0 unless a || b
      return -1 unless a
      return 1 unless b
      return a <=> b
    end
  end
  # returns a hash where the keys are all reachable nodes and the
values are the shortest distances to those nodes
  # yields each node and the calculated distance hash as they are
calculated
  def find_distances()
    known = Hash.new
    unknown = Hash.new { |h,k| h[k] = Vertex.new(k)}
    unknown[self] = Vertex.new(self, 0)
    a = b = weight = neighbour = min = nil
    until unknown.empty?
      min = unknown.values.min
      known[min.object] = min
      yield min, known
      min.object.each_neighbour do |neighbour, weight|
        weight = 1 unless weight
        unless known.has_key?(neighbour)
          nd = unknown[neighbour]
          nd.distance = [nd.distance,
min.distance+ weight].min
        end
      end
      unknown.delete(min.object)
    end
    return distances
  end
  # finds the shortest path from the current node to the specified
end_node
  def shortest_path(end_node)
    find_distances() do |found_vertex, known_vertexes|
      if(found_vertex.object == end_node)
        return
shortest_path_internal(found_vertex, known_vertexes)
      end
    end
    return nil
  end
  private
  def shortest_path_internal(end_vertex, known_vertexes)
    path = [end_vertex]
    until path[0].object == self
      closest = min_distance = nil
      path[0].object.each_neighbour do |neighbour,
weight>
        weight ||= 1
        if known_vertexes[neighbour]
          vertex =
known_vertexes[neighbour]
          distance = vertex.distance +
weight
          if min_distance.nil? || distance
< min_distance
            closest = vertex
            min_distance = distance
          end
        end
      end
      path = [closest] + path
    end
    return path.collect {|x| x.object }
  end
end

class Word
  include Dijkstra

  @@similar = Hash.new {|h,k| h[k] = Array.new}

  def initialize(string)
    @string = Word.normalize(string)
    @similar_keys = []
    @string.length.times do |i|
      @similar_keys[i] = String.new(@string)
      @similar_keys[i][i] = "."
    end
    #allow changing word lengths - for some reason this
makes it FASTER for the same-length scenario??
    @similar_keys << @string[0..-2] if @string.length > 1
    @similar_keys << @string
    @similar_keys << @string + "."
    # ---
    @similar_keys.each do |key|
      words = @@similar[key]
      words << self
    end
  end
  def each_neighbour(&block)
    @nieghbours = @similar_keys.each do |key|
      @@similar[key].each {|word| yield word }
    end
  end
  def Word.read_words(word_file, size_range=nil)
    Dir[word_file].each do |filename|
      warn "reading #{filename}" if $DEBUG
      File.open(filename) do |f|
        f.each do |x|
          x = normalize(x)
          Word.new(x) if size_range.nil?

size_range.include?(x.size)

        end
      end
    end
  end
  def <=>(other)
    @string <=> other.to_s
  end
  def to_s
    @string
  end
  private
  def Word.normalize(string)
    string = string.downcase
    string.strip!
    string.gsub!(/[^a-z]/,"")
    string
  end
end

if $0 == __FILE__
  start_word = ARGV[0] || "duck"
  end_word = ARGV[1] || "ruby"
  dictionary = ARGV[2] || "scowl/final/english-words.[0-3]*"
  Word.read_words(dictionary, Range.new(*[start_word.length,
end_word.length].sort))
  start_node = Word.new(start_word)
  end_node = Word.new(end_word)
  shortest_path = start_node.shortest_path(end_node)
  puts shortest_path
end

#####################################################################################
This email has been scanned by MailMarshal, an email content filter.
#####################################################################################