Review of code please

Hi all,

I'm hoping for some critique of the code below.
It works but is slow and probably not 'the ruby way'

···

---------------------------------------------------------
# Anagrams
#
# Purpose: find all anagrams in /usr/share/dict/words
# for 4 letter words (length 5 because of \n)

# path to the word file
filename = "/usr/share/dict/words"

# check file exists and is readable
if File.file?(filename) && File.readable?(filename) then
  dict_words = []
  all_anagrams = {}

  open(filename).each do |dict_word|
    if dict_word.length == 5 then
      dict_words << dict_word.chomp.strip
    end
  end
  
  wrds = dict_words

  dict_words.each do |wrd|
    current_wrd = wrd.split(//).sort.join

    anagrams = []

    wrds.each do |w|
      if wrd != w then
        if not current_wrd == w.split(//).sort.join then
          next
        end
        anagrams.push(w)
      end
    end
  
    if not anagrams.empty? then
      all_anagrams[wrd] = anagrams
    end
  end

  p all_anagrams

end
---------------------------------------------------------

thanks,

--
Mark

Well!!,

···

On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote:

Hi all,

I'm hoping for some critique of the code below.
It works but is slow and probably not 'the ruby way'

---------------------------------------------------------
# Anagrams
#
# Purpose: find all anagrams in /usr/share/dict/words
# for 4 letter words (length 5 because of \n)

# path to the word file
filename = "/usr/share/dict/words"

# check file exists and is readable
if File.file?(filename) && File.readable?(filename) then
  dict_words =
  all_anagrams = {}

  open(filename).each do |dict_word|
    if dict_word.length == 5 then
      dict_words << dict_word.chomp.strip
    end
  end
  
  wrds = dict_words

  dict_words.each do |wrd|
    current_wrd = wrd.split(//).sort.join

    anagrams =

    wrds.each do |w|
      if wrd != w then
        if not current_wrd == w.split(//).sort.join then
          next
        end
        anagrams.push(w)
      end
    end
  
    if not anagrams.empty? then
      all_anagrams[wrd] = anagrams
    end
  end

  p all_anagrams

end
---------------------------------------------------------

thanks,

I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/index.html

that is exponentially faster than mine and finds all anagrams not just 4
letter words! I just need to understand it now.

---------------------------------------------------------
words = IO.readlines("/usr/share/dict/words")

anagrams = Hash.new()

words.each do |word|
    base = Array.new
    word.chomp!.downcase!

    word.each_byte do |byte|
        base << byte.chr
    end

    base.sort!

    anagrams[base.to_s] |= [word]
end

# Find the anagrams by eliminating those with only one word
anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
    b.length <=> a.length
end

File.open("anagrams.txt", "w") do |file|
    values.each do |line|
        file.puts(line.join(", "))
    end
end

largest = anagrams.keys.max do |a, b|
    a.length <=> b.length
end

puts "Total: #{anagrams.length}" #
puts "Largest set of anagrams: #{values[0].inspect}" #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts "at #{largest.length} characters each"
---------------------------------------------------------

--
Mark

Hi all,

I'm hoping for some critique of the code below.
It works but is slow and probably not 'the ruby way'

---------------------------------------------------------
# Anagrams
#
# Purpose: find all anagrams in /usr/share/dict/words
# for 4 letter words (length 5 because of \n)

# path to the word file
filename = "/usr/share/dict/words"

# check file exists and is readable
if File.file?(filename) && File.readable?(filename) then
  dict_words =
  all_anagrams = {}

  open(filename).each do |dict_word|
    if dict_word.length == 5 then
      dict_words << dict_word.chomp.strip
    end
  end

  wrds = dict_words

  dict_words.each do |wrd|
    current_wrd = wrd.split(//).sort.join

    anagrams =

    wrds.each do |w|
      if wrd != w then
        if not current_wrd == w.split(//).sort.join then
          next
        end
        anagrams.push(w)
      end
    end

    if not anagrams.empty? then
      all_anagrams[wrd] = anagrams
    end
  end

  p all_anagrams

end
---------------------------------------------------------

thanks,

I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/index.html

that is exponentially faster than mine and finds all anagrams not just 4
letter words! I just need to understand it now.

---------------------------------------------------------
words = IO.readlines("/usr/share/dict/words")

Slurps in the entire dictionary as an array of lines (typically much quicker than reading a line at a time even with buffering).

anagrams = Hash.new()

Build a new Hash where the default value is an empty array

words.each do |word|
    base = Array.new
    word.chomp!.downcase!

    word.each_byte do |byte|
        base << byte.chr
    end

Get the individual bytes in the word (after having stripped any line ending [chomp!] and standardizing the lettercase [downcase!])...

    base.sort!

...and put them in a standard order (by value, the bytes are just integers). This is essentially the same as you were doing with current_wrd (in fact, it might be interesting to see how much this change affects the overall speed).

    anagrams[base.to_s] |= [word]

construct the string representation of the array of byte values to use as the hash key, and then use the array union operator to include the new word.

end

# Find the anagrams by eliminating those with only one word
anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
    b.length <=> a.length
end

Largest sets first.

File.open("anagrams.txt", "w") do |file|
    values.each do |line|
        file.puts(line.join(", "))
    end
end

largest = anagrams.keys.max do |a, b|
    a.length <=> b.length
end

puts "Total: #{anagrams.length}" #
puts "Largest set of anagrams: #{values[0].inspect}" #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts "at #{largest.length} characters each"
---------------------------------------------------------

--
Mark

Does that make more sense to you? (I nearly ignored your message, but your second post showed that you were really interested in understanding, not just trying to get others to do your code review :wink:

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Jan 27, 2007, at 8:25 AM, Mark Woodward wrote:

On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote:

Here's a map-reduce-ish [1] solution.

I don't say it's better, or faster, or smaller (memory-wise).
It's just, uh, different...

(Sorry, couldn't resist, again... ;])

gegroet,
Erik V. - http://www.erikveen.dds.nl/

[1] http://labs.google.com/papers/mapreduce.html

···

----------------------------------------------------------------

WITHOUT THE TOTALS

File.open("anagrams.txt", "w") do |f|
   File.readlines("/usr/share/dict/words").collect do |word|
     word.chomp.downcase
   end.collect do |word|
     [word.scan(/./).sort, word]
   end.inject({}) do |hash, (base, word)|
     (hash[base] ||= []) << word ; hash
   end.values.reject do |set|
     set.length == 1
   end.collect do |set|
     set.sort
   end.sort_by do |set|
     [-set.length, set]
   end.collect do |set|
     set.join("\t")
   end.each do |line|
     f.puts(line)
   end
end

----------------------------------------------------------------

WITH THE TOTALS

sets =
File.readlines("/usr/share/dict/words").collect do |word|
   word.chomp.downcase
end.collect do |word|
   [word.scan(/./).sort, word]
end.inject({}) do |hash, (base, word)|
   (hash[base] ||= []) << word ; hash
end.values.reject do |set|
   set.length == 1
end.collect do |set|
   set.sort
end.sort_by do |set|
   [-set.length, set]
end

largest = sets[0]
longest = sets.inject{|a, b| a[0].length > b[0].length ? a : b}

File.open("anagrams.txt", "w") do |f|
   sets.each do |set|
     f.puts(set.join("\t"))
   end
end

puts "Total: %s" %
[sets.length]
puts "Largest set of anagrams: %s" %
[largest.inspect]
puts "Longest anagrams: %s at %s characters each" %
[longest.inspect, longest[0].length]

----------------------------------------------------------------

Hi Rob,

Hi all,

I'm hoping for some critique of the code below.
It works but is slow and probably not 'the ruby way'

---------------------------------------------------------
# Anagrams
#
# Purpose: find all anagrams in /usr/share/dict/words
# for 4 letter words (length 5 because of \n)

# path to the word file
filename = "/usr/share/dict/words"

# check file exists and is readable
if File.file?(filename) && File.readable?(filename) then
  dict_words =
  all_anagrams = {}

  open(filename).each do |dict_word|
    if dict_word.length == 5 then
      dict_words << dict_word.chomp.strip
    end
  end

  wrds = dict_words

  dict_words.each do |wrd|
    current_wrd = wrd.split(//).sort.join

    anagrams =

    wrds.each do |w|
      if wrd != w then
        if not current_wrd == w.split(//).sort.join then
          next
        end
        anagrams.push(w)
      end
    end

    if not anagrams.empty? then
      all_anagrams[wrd] = anagrams
    end
  end

  p all_anagrams

end
---------------------------------------------------------

thanks,

I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/index.html

that is exponentially faster than mine and finds all anagrams not
just 4
letter words! I just need to understand it now.

---------------------------------------------------------
words = IO.readlines("/usr/share/dict/words")

Slurps in the entire dictionary as an array of lines (typically much
quicker than reading a line at a time even with buffering).

anagrams = Hash.new()

Build a new Hash where the default value is an empty array

words.each do |word|
    base = Array.new
    word.chomp!.downcase!

    word.each_byte do |byte|
        base << byte.chr
    end

Get the individual bytes in the word (after having stripped any line
ending [chomp!] and standardizing the lettercase [downcase!])...

    base.sort!

..and put them in a standard order (by value, the bytes are just
integers). This is essentially the same as you were doing with
current_wrd (in fact, it might be interesting to see how much this
change affects the overall speed).

    anagrams[base.to_s] |= [word]

construct the string representation of the array of byte values to
use as the hash key, and then use the array union operator to include
the new word.

end

# Find the anagrams by eliminating those with only one word
anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
    b.length <=> a.length
end

Largest sets first.

File.open("anagrams.txt", "w") do |file|
    values.each do |line|
        file.puts(line.join(", "))
    end
end

largest = anagrams.keys.max do |a, b|
    a.length <=> b.length
end

puts "Total: #{anagrams.length}" #
puts "Largest set of anagrams: #{values[0].inspect}" #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts "at #{largest.length} characters each"
---------------------------------------------------------

--
Mark

Does that make more sense to you? (I nearly ignored your message,
but your second post showed that you were really interested in
understanding, not just trying to get others to do your code review :wink:

Excellent, thanks

Although my solution worked I knew it wasn't 'rubyish' and also slow. I
then found the second solution and was proved right!! Erics' solution
looks even more 'adventurous' so I'll have to spend some time with the
Cookbook, Pickaxe, irb and ri to try and get a grasp of it.

I actually benchmarked the different ways to get the words from the dict
soon after finding the second solution and IO.readlines was much quicker.

I thought this would be a good way to learn Ruby. ie find a simple
exercise, have a go and then get opinions on better ways to attacked
it. Like others here, rubyquiz is out of my reach ATM and I found this
'kata(6)' at the pragmatic programmers.

I've spent way too much time reading books and not enough time coding!

PS - your inline comments are very helpful.

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

cheers,

···

On Sat, 27 Jan 2007 23:56:42 +0900, Rob Biedenharn wrote:

On Jan 27, 2007, at 8:25 AM, Mark Woodward wrote:

On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote:

--
Mark

Hi Erik,

···

On Sun, 28 Jan 2007 03:30:21 +0900, Erik Veenstra wrote:

Here's a map-reduce-ish [1] solution.

I don't say it's better, or faster, or smaller (memory-wise).
It's just, uh, different...

(Sorry, couldn't resist, again... ;])

gegroet,
Erik V. - http://www.erikveen.dds.nl/

[1] http://labs.google.com/papers/mapreduce.html

----------------------------------------------------------------

WITHOUT THE TOTALS

File.open("anagrams.txt", "w") do |f|
   File.readlines("/usr/share/dict/words").collect do |word|
     word.chomp.downcase
   end.collect do |word|
     [word.scan(/./).sort, word]
   end.inject({}) do |hash, (base, word)|
     (hash[base] ||= ) << word ; hash
   end.values.reject do |set|
     set.length == 1
   end.collect do |set|
     set.sort
   end.sort_by do |set|
     [-set.length, set]
   end.collect do |set|
     set.join("\t")
   end.each do |line|
     f.puts(line)
   end
end

----------------------------------------------------------------

WITH THE TOTALS

sets =
File.readlines("/usr/share/dict/words").collect do |word|
   word.chomp.downcase
end.collect do |word|
   [word.scan(/./).sort, word]
end.inject({}) do |hash, (base, word)|
   (hash[base] ||= ) << word ; hash
end.values.reject do |set|
   set.length == 1
end.collect do |set|
   set.sort
end.sort_by do |set|
   [-set.length, set]
end

largest = sets[0]
longest = sets.inject{|a, b| a[0].length > b[0].length ? a : b}

File.open("anagrams.txt", "w") do |f|
   sets.each do |set|
     f.puts(set.join("\t"))
   end
end

puts "Total: %s" %
[sets.length]
puts "Largest set of anagrams: %s" %
[largest.inspect]
puts "Longest anagrams: %s at %s characters each" %
[longest.inspect, longest[0].length]

----------------------------------------------------------------

wow! now that's going to take some understanding!
Thanks for the homework over the next few nights :wink:

cheers,

--
Mark

Do you know what a one-to-one mapping is?
It's senseless to use "collect"; use "map" instead.

open( 'anagrams.txt', 'w' ) { |file|
  IO.readlines( 'words' ).inject({}){ |hash,word|
  word.strip!.downcase!
  (hash[word.split(//).sort.join] ||=) << word; hash }.
  values.reject{|list| list.size < 2}.
  map{|list| list.sort}.sort_by{|list| [ -list.size, list ] }.
  each{|list| file.puts list.join( "\t" ) } }

···

On Jan 27, 12:30 pm, "Erik Veenstra" <erikv...@gmail.com> wrote:

Here's a map-reduce-ish [1] solution.

I don't say it's better, or faster, or smaller (memory-wise).
It's just, uh, different...

(Sorry, couldn't resist, again... ;])

gegroet,
Erik V. -http://www.erikveen.dds.nl/

[1]http://labs.google.com/papers/mapreduce.html

----------------------------------------------------------------

WITHOUT THE TOTALS

File.open("anagrams.txt", "w") do |f|
   File.readlines("/usr/share/dict/words").collect do |word|
     word.chomp.downcase
   end.collect do |word|
     [word.scan(/./).sort, word]
   end.inject({}) do |hash, (base, word)|
     (hash[base] ||= ) << word ; hash
   end.values.reject do |set|
     set.length == 1
   end.collect do |set|
     set.sort
   end.sort_by do |set|
     [-set.length, set]
   end.collect do |set|
     set.join("\t")
   end.each do |line|
     f.puts(line)
   end
end

Do you know what a one-to-one mapping is? It's senseless to
use "collect"; use "map" instead.

Do you know they're exactly the same? It's senseless to use
"map" instead of "collect".

gegroet,
Erik V. - http://www.erikveen.dds.nl/

···

----------------------------------------------------------------

$ grep *.c -wie rb_define_method | grep -wie collect -wie map
array.c: rb_define_method(rb_cArray, "collect", rb_ary_collect,
0);
array.c: rb_define_method(rb_cArray, "collect!",
rb_ary_collect_bang, 0);
array.c: rb_define_method(rb_cArray, "map", rb_ary_collect, 0);
array.c: rb_define_method(rb_cArray, "map!", rb_ary_collect_bang,
0);
enum.c: rb_define_method(rb_mEnumerable,"collect", enum_collect,
0);
enum.c: rb_define_method(rb_mEnumerable,"map", enum_collect, 0);

----------------------------------------------------------------