My solution's algorithm is fairly simple: Take the words, take their pronunciations in IPA or a suitable equivalent, and then find the word with the pronunciation that requires the least amount of changes to turn it into the concatenated pronunciations of the terms (I am indebted to Steve D for the knowledge that this is called the "Levenshtein Distance").
ruby Rebus..rb (B+YOU+TEA+FULL)
BEAUTIFUL
I had plans for performing this on every combination of the synonyms of the given terms, and acquired both a thesaurus and pronunciation dictionary from http://www.dcs.shef.ac.uk/research/ilash/Moby/, but, unfortunately, they both had quite a few more entries than they should have. Apparently, "skim milk" is a synonym for "cheese." Additionally, according to the Moby Dictionary, "contretemps" is pronounced the way it's spelled, and "ember" is not a word, but "embers" is. To use the thesaurus would be to make a program that already takes a minute or so to run take a day instead. For those reasons, as well as the fact that the dictionary contains a lot that it shouldn't have (what's a "Fayme?"), and due to some unintuitive pronunciation differences, my program doesn't use a thesaurus, and gets a lot of things wrong.
ruby Rebus.rb (E+SCENT+SHELLS)
CENTRAL'S
(Yes, the unweighted Levensgtein distance really is lower than to "ESSENTIALS" when using Moby Pronunciation Dictionary.)
ruby Rebus.rb (FAN+TASK+TICK)
FANTASKTIK
Note on my program's input: only expressions within parentheses are evaluated (allowing for multiple evaluation and leaving known terms unchanged in a multi-rebus phrase that must be evaluated), everything must be cnverted to all caps, and operators must be shown. The latter is because I've implemented a crude interpreation of subtraction in rebuses.
ruby Rebus.rb (LOVE-L)
OF
As can be seen, my program's output is regretfully only mediocre. Changes that would greatly improve output include some mechanism to differentiate between using a letter's name and phoneme (i.e.: "aitch" versus the h-sound) (at present the dictionary just uses the name), using a more restrictive dictionary and thesaurus, and modifying the levenshtein_distance method to make changes between similar vowels and consonants have a lower weight than changes between dissimilar vowels and consonants.
But alas, the period for this quiz is ending.
Anyway, here's my code:
#$thesaurus = File.open("mobythes.aur") do |f|
# h = Hash.new
# f.readlines.each do |line|
# words=line.chomp.split(',').map{|s|s.upcase}
# h[words[0]] = words[1..-1]
# end
# h
#end
#This only works if the beginning comments are manually removed
$pronunciations = File.open("cmudict0.3") do |f|
NUMBER_OF_SYMBOLS = 39
h = Hash.new
lines = f.readlines
lines[0...NUMBER_OF_SYMBOLS].each do |line|
h[lines[0...(line =~ /[A-Z]/)]] = line.split[1..-1]
end
lines[NUMBER_OF_SYMBOLS..-1].each do |line|
words = line.split(/\s+/)
h[words.first] = words[1..-1]
end
h
end
$words = $pronunciations.keys
#See Levenshtein distance - Wikipedia
def levenshtein_distance(a, b)
prev_row = (0..b..length).to_a
cur_row = [0] * b.length
1.upto(a.length) do |i|
cur_row[0] = i
1.upto(b.length) do |j|
cost = (a[i-1] == b[j-1]) ? 0 : 1
cur_row[j] = [prev_row[j]+1,
cur_row[j-1]+1,
prev_row[j-1]+cost].min
end
prev_row = cur_row
cur_row = cur_row.dup
end
prev_row.last
end
#def all_synonym_combinations(word_arr)
# if word_arr.size == 1
# ($thesaurus[word_arr.first] || [word_arr.first]).map{|w|[w]}
# else
# next_combs = all_synonym_combinations(word_arr[1..-1])
# next_combs_length = next_combs.flatten.length
# synonyms = ($thesaurus[word_arr[0]] || ) << word_arr[0]
# synonyms = synonyms.inject(){|a,w| a+=[[w]]*next_combs_length}
# synonyms.flatten.zip(next_combs*synonyms.length
# ).map{|a|a.flatten}
# end
#end
rebus = ARGV[0].chomp
expressions = rebus.scan(/(?:\().+?(?:\))/)
expressions.each do |expression|
ops = expression.scan(/[+-]/)
terms = expression.gsub(/[\(\)]/,"").scan(/[^+-]+/)
#all_synonym_combinations(terms).each do |terms|
pronunciation = $pronunciations[terms[0]]
terms[1..-1].each_with_index do |term,idx|
if ops[idx] == "+"
pronunciation += $pronunciations[term]
else
#Set difference is a very crude interpretation of subtraction in rebuses
#(especially considering, in cmudict, a letter's pronunciation is that
# of its name, not its main phoneme)
pronunciation -= $pronunciations[term]
end
end
reps = $words.sort_by{|word|
levenshtein_distance($pronunciations[word],
pronunciation)}
rep = terms.index(reps[0]) ? reps[1] : reps[0]
rebus[expression] = rep
#end
end
puts rebus.gsub("+"," ")
ยทยทยท
____________________________________________________________________________________
Luggage? GPS? Comic books?
Check out fitting gifts for grads at Yahoo! Search
http://search.yahoo.com/search?fr=oni_on_mail&p=graduation+gifts&cs=bz