[QUIZ][SOLUTION] Scrabble Stems (#12)

Here's my solution. I used the sowpods wordlist available here:
http://www.isc.ro/commands/lists.html (preprocessed to be all-lowercase,
though it doesn't really matter as long as the case is consistent).

It takes entirely too long to run (12 seconds on my P4/3GHz); I'll poke
at it some more if I get the time and see what's slowing it down.

···

--------------------------------------------------------------------------
$stems = {}
$seen = {}
$cutoff = ARGV[0].to_i

IO.foreach("sowpods.txt") {|word|
  next if word.length != 8
  word.chomp!
  letters = word.split(//).sort
  alpha = letters.join
  next if $seen[alpha]
  $seen[alpha] = true
  remove = letters.uniq
  remove.each {|i|
    stem = alpha.sub(i, '')
    $stems[stem] ||= {}
    $stems[stem][i] = true
  }
}

$stem_length = {}
$stems.each {|k,v| $stems[k] = v.keys.sort}
$stems.reject! {|k,v|
  n = v.length
  $stem_length[k] = n
  n < $cutoff
}

results = $stems.keys.sort_by {|i| -$stem_length[i]}
results.each {|s| p [s, $stem_length[s], $stems[s].join]}

Here's mine ... doesn't seem to fair too badly. Using the YAWL list
(264,061 words), it finds 25 bingo-stems that combine with n >= 20 letters.
(about 16 seconds on a 666MHz).

regards,
andrew

#!/usr/bin/ruby -w

n = ARGV[0].to_i
wfile = ARGV[1]

w7sigs = {}
w6sigs = {}

File.open(wfile,'r') do |f|
  f.each do |line|
    next unless line.size == 8
    line.chomp!
    line.downcase!
    sig = line.unpack('aaaaaaa').sort.join("")
    next if w7sigs[sig]
    w7sigs[sig] = 1
  end
end

w7sigs.each_key do |k|
  7.times do |i|
    ns = k.dup
    ll = ns.slice!(i,1)
    w6sigs[ns] ||= []
    w6sigs[ns] << ll
  end
end

w6sigs.each {|k,v| w6sigs[k].uniq!}
w6sigs.reject!{|k,v|v.size < n}
w6sigs.sort_by{|a|a[1].size}.reverse.each do |it|
  puts "#{it[0]} #{it[1].size}"
end
__END__

Hi,

Nothing special here: http://www.dave.burt.id.au/ruby/scrabble_stems.rb

It's quite ridiculous, actually. It makes a tree of about 45MB out of a
2.6MB input file, so it can tell you a word each stem can make with each
letter you can add to it.

Takes a minute for words above 20 on my machine (1.8GHz) using the
dictionary I mentioned. YAWL or something.

Just thought I might weigh in.

Season's greetings,
Dave

Here's my solution. It runs in 4 seconds on my
3932 bogomips machine, using /usr/share/dict/american-english (96,273 words).

My major optimizations (and I didn't fiddle with minor ones) are limiting the "anagram space" with a standard "anagram" representation, and generating the stems by subtraction from bingos instead of generating all gazillion stems and checking them with all letters.

bingo_stems.rb (1.56 KB)