In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words (a 'bingo' is a move using all seven tiles
on your rack). For instance, one of the more prolific stems, SATIRE,
combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N,
O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST
...).
Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.
When talking of 6 letter stems, do I have to find all the 6 letter
stems that also have a meaning by themselves? (Like Satire for
example). Or may I find even nonsense stems?
···
On Fri, 17 Dec 2004 22:45:39 +0900, Ruby Quiz <james@grayproductions.net> wrote:
The three rules of Ruby Quiz:
1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.
2. Support Ruby Quiz by submitting ideas as often as you can:
In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words (a 'bingo' is a move using all seven tiles
on your rack). For instance, one of the more prolific stems, SATIRE,
combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N,
O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST
...).
Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.
In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words.
Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.
If this quiz is really just about Scrabble, can we assume the input word list will contain only 7-letter words?
If I filter my dictionary (2of4brif.txt) down to just 7-letter words, I can handle this task in less than an hour of wall-clock time on a 2GHz PC. If I allow all words with 7 or more letters, I don't know if I'll be able to finish the processing in under a week's time.
I've certainly learned a few things about Ruby performance tuning along the way.
In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words (a 'bingo' is a move using all seven tiles
on your rack). For instance, one of the more prolific stems, SATIRE,
combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N,
O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST
...).
Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.
Here is a possible (I hope correct) solution. SATIRE combines only with 16
letters according to my dictionary.
File.open(DICT) do |f|
f.each do |word|
word.chomp!
next if word.length != 7
word.downcase!
letters = word.split(//).sort!
uniques = letters.uniq
word = letters.join
uniques.each do |letter|
stem = word.sub(/#{letter}/, "")
(STEMS[stem] ||= {})[letter] = 1
end
end
end
In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words (a 'bingo' is a move using all seven tiles
on your rack). For instance, one of the more prolific stems, SATIRE,
combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N,
O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST
...).
Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.
Here's my solution. My earlier performance problems resulted from several very inefficient attempts at generating unique letter combinations, i.e. list all unique sets of "N letters taken 6 at a time". Reducing the problem to "7 letters taken 6 at a time" made that part trivial, but I kept plugging at the more general problem.
On my 2GHz P4, I get 5319 stems with a cutoff of 3 in 4.062 seconds. A version optimized for 7-letter words (not shown) takes barely over 2 seconds.
If I swap line 47 in for line 48, so that all words of 7 letters or more are processed, I get 167,227 stems with a cutoff of 3 in 2196.453 seconds (~37 min.).
I don't believe the stems meaning or lack thereof is significant.
James Edward Gray II
···
On Dec 18, 2004, at 6:16 AM, Giovanni Intini wrote:
When talking of 6 letter stems, do I have to find all the 6 letter
stems that also have a meaning by themselves? (Like Satire for
example). Or may I find even nonsense stems?
No, all six letter stems count. The "meaningful" rearrangement is just a
handy way to refer to the stem.
martin
···
Giovanni Intini <intinig@gmail.com> wrote:
When talking of 6 letter stems, do I have to find all the 6 letter
stems that also have a meaning by themselves? (Like Satire for
example). Or may I find even nonsense stems?
In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words.
Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.
If this quiz is really just about Scrabble, can we assume the input word list will contain only 7-letter words?
I filtered out words less than and more than seven letters.
If I filter my dictionary (2of4brif.txt) down to just 7-letter words, I can handle this task in less than an hour of wall-clock time on a 2GHz PC. If I allow all words with 7 or more letters, I don't know if I'll be able to finish the processing in under a week's time.
I've certainly learned a few things about Ruby performance tuning along the way.
Hmmm, I would say that it doesn't HAVE to take that long. My program runs in under 30 seconds.
There seems to be a problem with german umlauts in words:
rq:16: premature end of regular expression: /üge/ (RegexpError)
from rq:15:in `each'
from rq:15
from rq:8:in `each'
from rq:8
from rq:7:in `open'
from rq:7
Here is a possible (I hope correct) solution. SATIRE combines only with 16
letters according to my dictionary.
Very simmilar to my solution: (it finds 18 combining letters for SATIRE using the ENABLE2K word list)
hash = Hash.new {|h, k| h[k] = 0}
File.foreach(ARGV[0] || 'WORD.LST') do |line|
line.strip!
if line.size == 7
letters = line.downcase.scan(/./).sort.join
7.times do |i|
hash[letters[0, i] + letters[(i+1)..-1]] |= 1 << (letters[i] - 97)
end
end
end
cutoff = (ARGV[1] || '15').to_i
count = {}
hash.each do |k, v|
v = (v & 0x5555555) + ((v>>1) & 0x5555555)
v = (v & 0x3333333) + ((v>>2) & 0x3333333)
v = (v & 0xf0f0f0f) + ((v>>4) & 0xf0f0f0f)
v = (v & 0x0ff00ff) + ((v>>8) & 0x0ff00ff)
v = (v & 0x000ffff) + ((v>>16) & 0x000ffff)
count[k] = v if v >= cutoff
end
count.keys.sort_by {|k| count[k]}.each do |letters|
printf "%s: (%d) ", letters, count[letters]
combi = hash[letters]
26.times do |i|
print((i+97).chr) if combi[i] == 1
end
puts
end
My bad, then - I meant six letter stems that form seven letter words
when combined with at least n out of the 26 letters in the alphabet.
There are seven-to-make-eight stems too, but as James noted, the quiz is
not concerned with them (it's a trivial extension, though).
martin
···
Giovanni Intini <intinig@gmail.com> wrote:
From the original post it looked like you could have anything longer than 6.