Hi all,
I'm hoping for some critique of the code below.
It works but is slow and probably not 'the ruby way'
---------------------------------------------------------
# Anagrams
#
# Purpose: find all anagrams in /usr/share/dict/words
# for 4 letter words (length 5 because of \n)
# path to the word file
filename = "/usr/share/dict/words"
# check file exists and is readable
if File.file?(filename) && File.readable?(filename) then
dict_words =
all_anagrams = {}
open(filename).each do |dict_word|
if dict_word.length == 5 then
dict_words << dict_word.chomp.strip
end
end
wrds = dict_words
dict_words.each do |wrd|
current_wrd = wrd.split(//).sort.join
anagrams =
wrds.each do |w|
if wrd != w then
if not current_wrd == w.split(//).sort.join then
next
end
anagrams.push(w)
end
end
if not anagrams.empty? then
all_anagrams[wrd] = anagrams
end
end
p all_anagrams
end
---------------------------------------------------------
thanks,
I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/index.html
that is exponentially faster than mine and finds all anagrams not just 4
letter words! I just need to understand it now.
---------------------------------------------------------
words = IO.readlines("/usr/share/dict/words")
Slurps in the entire dictionary as an array of lines (typically much quicker than reading a line at a time even with buffering).
anagrams = Hash.new()
Build a new Hash where the default value is an empty array
words.each do |word|
base = Array.new
word.chomp!.downcase!
word.each_byte do |byte|
base << byte.chr
end
Get the individual bytes in the word (after having stripped any line ending [chomp!] and standardizing the lettercase [downcase!])...
base.sort!
...and put them in a standard order (by value, the bytes are just integers). This is essentially the same as you were doing with current_wrd (in fact, it might be interesting to see how much this change affects the overall speed).
anagrams[base.to_s] |= [word]
construct the string representation of the array of byte values to use as the hash key, and then use the array union operator to include the new word.
end
# Find the anagrams by eliminating those with only one word
anagrams.reject! {|k, v| v.length == 1}
values = anagrams.values.sort do |a, b|
b.length <=> a.length
end
Largest sets first.
File.open("anagrams.txt", "w") do |file|
values.each do |line|
file.puts(line.join(", "))
end
end
largest = anagrams.keys.max do |a, b|
a.length <=> b.length
end
puts "Total: #{anagrams.length}" #
puts "Largest set of anagrams: #{values[0].inspect}" #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts "at #{largest.length} characters each"
---------------------------------------------------------
--
Mark
Does that make more sense to you? (I nearly ignored your message, but your second post showed that you were really interested in understanding, not just trying to get others to do your code review
-Rob
Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
···
On Jan 27, 2007, at 8:25 AM, Mark Woodward wrote:
On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote: