[QUIZ] Shirt Reader (#140)

James_Koppel1 · 27 September 2007 03:13

My solution's algorithm is fairly simple: take the words, look up their pronunciations in IPA or a suitable equivalent, and then find the word whose pronunciation requires the fewest changes to turn it into the concatenated pronunciations of the terms (I am indebted to Steve D for the knowledge that this is called the "Levenshtein distance").

ruby Rebus.rb (B+YOU+TEA+FULL)

BEAUTIFUL
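
To make the idea concrete, here is a minimal sketch of the lookup on a toy table. The phoneme spellings and the names TOY_PRONUNCIATIONS, edit_distance and closest_word are assumptions made up for illustration; they are not taken from the dictionary or from the program below. Unlike the real program, the sketch simply excludes every input term from the candidates.

```ruby
# Toy pronunciation table. The phoneme spellings are illustrative
# assumptions, not copied from any real dictionary.
TOY_PRONUNCIATIONS = {
  "B"         => %w[B IY],
  "YOU"       => %w[Y UW],
  "TEA"       => %w[T IY],
  "FULL"      => %w[F UH L],
  "BEAUTIFUL" => %w[B Y UW T AH F AH L],
  "BOUNTIFUL" => %w[B AW N T AH F AH L],
}

# Plain (unweighted) edit distance between two phoneme arrays.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_with_index do |x, i|
    cur = [i + 1]
    b.each_with_index do |y, j|
      cost = x == y ? 0 : 1
      cur << [prev[j + 1] + 1,    # delete x
              cur[j] + 1,         # insert y
              prev[j] + cost].min # substitute (or match)
    end
    prev = cur
  end
  prev.last
end

# Concatenate the terms' phonemes and pick the nearest other word.
def closest_word(terms, table)
  target = terms.flat_map { |t| table.fetch(t) }
  (table.keys - terms).min_by { |w| edit_distance(table[w], target) }
end

puts closest_word(%w[B YOU TEA FULL], TOY_PRONUNCIATIONS)  # prints BEAUTIFUL
```

With these assumed spellings the concatenation is three edits away from BEAUTIFUL and five from BOUNTIFUL, so BEAUTIFUL wins.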

I had plans for performing this on every combination of the synonyms of the given terms, and acquired both a thesaurus and pronunciation dictionary from http://www.dcs.shef.ac.uk/research/ilash/Moby/, but, unfortunately, they both had quite a few more entries than they should have. Apparently, "skim milk" is a synonym for "cheese." Additionally, according to the Moby Dictionary, "contretemps" is pronounced the way it's spelled, and "ember" is not a word, but "embers" is. To use the thesaurus would be to make a program that already takes a minute or so to run take a day instead. For those reasons, as well as the fact that the dictionary contains a lot that it shouldn't have (what's a "Fayme?"), and due to some unintuitive pronunciation differences, my program doesn't use a thesaurus, and gets a lot of things wrong.

ruby Rebus.rb (E+SCENT+SHELLS)

CENTRAL'S

(Yes, the unweighted Levenshtein distance to "CENTRAL'S" really is lower than the distance to "ESSENTIALS" when using the Moby Pronunciation Dictionary.)

ruby Rebus.rb (FAN+TASK+TICK)

FANTASKTIK

Note on my program's input: only expressions within parentheses are evaluated (allowing for multiple evaluations and leaving known terms unchanged in a multi-rebus phrase that must be evaluated), everything must be converted to all caps, and operators must be shown. The latter is because I've implemented a crude interpretation of subtraction in rebuses.
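
As a rough illustration of that input format, the sketch below splits a phrase into its parenthesized expressions and then into terms and operators. The sample phrase and the helper name parse_expression are made up for this example.

```ruby
# Hypothetical input: two rebus expressions plus a known term left as-is.
rebus = "(LOVE-L)+(B+YOU+TEA+FULL)+THINGS"

# Each parenthesized group is one expression to evaluate.
expressions = rebus.scan(/\(.+?\)/)

# Split one expression into its terms and the operators between them.
def parse_expression(expression)
  inner = expression[1..-2]            # drop the surrounding parentheses
  { terms: inner.scan(/[^+-]+/), ops: inner.scan(/[+-]/) }
end

expressions.each { |e| p parse_expression(e) }
```

Anything outside the parentheses (THINGS here) is never looked up, which is how known terms stay unchanged.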

ruby Rebus.rb (LOVE-L)

OF
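
The result above is plausibly explained by how Ruby's Array#- behaves on phoneme lists. The sketch below assumes simplified pronunciations (stress markers omitted); the exact dictionary entries may differ.

```ruby
# Assumed phoneme lists, for illustration only.
love   = %w[L AH V]
letter = %w[EH L]   # the letter's name ("el"), not the bare l-sound

# Array#- drops every element of `love` that also occurs in `letter`,
# regardless of position.
remaining = love - letter
p remaining   # prints ["AH", "V"]
```

What is left is close to the pronunciation of "of", which would account for the output. Because the difference ignores position and repetition, it removes every matching phoneme rather than one specific sound, which is part of why the interpretation is crude.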

As can be seen, my program's output is regrettably only mediocre. Changes that would greatly improve the output include a mechanism to differentiate between a letter's name and its phoneme (e.g. "aitch" versus the h-sound; at present the dictionary uses only the name), a more restrictive dictionary and thesaurus, and a levenshtein_distance method modified so that changes between similar vowels or consonants carry a lower weight than changes between dissimilar ones.
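
One possible shape for the weighted distance is sketched below. It is not part of the submitted program. As a crude stand-in for phonetic similarity, it assumes that swapping a vowel for a vowel (or a consonant for a consonant) costs 0.5, while swapping across the two classes costs 1.0. The VOWELS list, the cost values and the method names are all assumptions; a finer-grained similarity table would slot into substitution_cost.

```ruby
# Assumed vowel inventory (ARPAbet-style symbols without stress markers).
VOWELS = %w[AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW]

# Hypothetical cost model: same-class substitutions are cheaper.
def substitution_cost(x, y)
  return 0.0 if x == y
  VOWELS.include?(x) == VOWELS.include?(y) ? 0.5 : 1.0
end

# Edit distance with weighted substitutions; insertions and deletions
# still cost 1.0 each.
def weighted_distance(a, b)
  prev = (0..b.length).map(&:to_f)
  a.each_with_index do |x, i|
    cur = [(i + 1).to_f]
    b.each_with_index do |y, j|
      cur << [prev[j + 1] + 1.0,
              cur[j] + 1.0,
              prev[j] + substitution_cost(x, y)].min
    end
    prev = cur
  end
  prev.last
end

p weighted_distance(%w[F UH L], %w[F AH L])  # prints 0.5 (vowel for vowel)
p weighted_distance(%w[F UH L], %w[F T L])   # prints 1.0 (vowel for consonant)
```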

But alas, the period for this quiz is ending.

Anyway, here's my code:

#$thesaurus = File.open("mobythes.aur") do |f|
# h = Hash.new
# f.readlines.each do |line|
# words=line.chomp.split(',').map{|s|s.upcase}
# h[words[0]] = words[1..-1]
# end
# h
#end

#This only works if the beginning comments are manually removed
$pronunciations = File.open("cmudict0.3") do |f|
  NUMBER_OF_SYMBOLS = 39

  h = Hash.new

  lines = f.readlines
  lines[0...NUMBER_OF_SYMBOLS].each do |line|
    h[line[0...(line =~ /[A-Z]/)]] = line.split[1..-1]
  end
  lines[NUMBER_OF_SYMBOLS..-1].each do |line|
    words = line.split(/\s+/)
    h[words.first] = words[1..-1]
  end
  h
end

$words = $pronunciations.keys

#See Levenshtein distance - Wikipedia
def levenshtein_distance(a, b)
  prev_row = (0..b.length).to_a
  cur_row = [0] * b.length
  1.upto(a.length) do |i|
    cur_row[0] = i
    1.upto(b.length) do |j|
      cost = (a[i-1] == b[j-1]) ? 0 : 1
      cur_row[j] = [prev_row[j]+1,
                    cur_row[j-1]+1,
                    prev_row[j-1]+cost].min
    end
    prev_row = cur_row
    cur_row = cur_row.dup
  end
  prev_row.last
end

#def all_synonym_combinations(word_arr)
# if word_arr.size == 1
# ($thesaurus[word_arr.first] || [word_arr.first]).map{|w|[w]}
# else
# next_combs = all_synonym_combinations(word_arr[1..-1])
# next_combs_length = next_combs.flatten.length
# synonyms = ($thesaurus[word_arr[0]] || []) << word_arr[0]
# synonyms = synonyms.inject([]){|a,w| a+=[[w]]*next_combs_length}
# synonyms.flatten.zip(next_combs*synonyms.length
# ).map{|a|a.flatten}
# end
#end

rebus = ARGV[0].chomp

expressions = rebus.scan(/(?:\().+?(?:\))/)

expressions.each do |expression|

  ops = expression.scan(/[+-]/)
  terms = expression.gsub(/[\(\)]/,"").scan(/[^+-]+/)
  #all_synonym_combinations(terms).each do |terms|
    pronunciation = $pronunciations[terms[0]]
    terms[1..-1].each_with_index do |term,idx|
      if ops[idx] == "+"
        pronunciation += $pronunciations[term]
      else
        #Set difference is a very crude interpretation of subtraction in rebuses
        #(especially considering, in cmudict, a letter's pronunciation is that
        # of its name, not its main phoneme)
        pronunciation -= $pronunciations[term]
      end
    end
    reps = $words.sort_by{|word|
      levenshtein_distance($pronunciations[word],
        pronunciation)}
    rep = terms.index(reps[0]) ? reps[1] : reps[0]
    rebus[expression] = rep
  #end
end

puts rebus.gsub("+"," ")
