[QUIZ] Shirt Reader (#140)

Hi all,

my solution: http://rn86.net/~stevedp/tshirt_reader.tar.gz

For my solution I used a combination of the Metaphone algorithm,
pronunciation matching (via the CMU Pronouncing Dictionary
[http://www.speech.cs.cmu.edu/cgi-bin/cmudict]), and the Levenshtein
distance algorithm. The input must be words or numbers that sound out
the answer word. It returns at most 10 possible matches. For most of my
test words the correct match is in 1st or 2nd place, but a few land in
5th place or lower.
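steve's actual code is in the tarball and isn't reproduced here. As a rough illustration of just the Levenshtein ranking step, here's a minimal pure-Ruby sketch. `levenshtein` and `closest` are hypothetical names, and it assumes the strings being compared are already phonetic keys (e.g. Metaphone codes), which is a guess at how the pieces fit together:

```ruby
# Plain-Ruby Levenshtein (edit) distance: the minimum number of
# single-character insertions, deletions and substitutions needed to
# turn string a into string b.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                          # distance from a[0, i] to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,             # deletion
               curr[j - 1] + 1,         # insertion
               prev[j - 1] + cost].min  # substitution (or match)
    end
    prev = curr
  end
  prev.last
end

# Rank candidate keys by distance to the query key, best first,
# keeping at most `limit` of them (hypothetical helper).
def closest(query, candidates, limit = 10)
  candidates.min_by(limit) { |c| levenshtein(query, c) }
end

p closest('flaw', %w[lawn flaw xyz], 2)  # prints ["flaw", "lawn"]
```

`min_by(n)` returns its results sorted by the block value, so the list comes back best match first, which fits the "top 10" output described above.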

You must first run prepare_dicts.rb, which does some preparation work.

You need the Text gem installed (gem install text) for the
Metaphone and Levenshtein algorithms. I used RubyInline to
re-implement the Levenshtein algorithm in C (instead of using the Text
gem's pure-Ruby implementation), which made it run at least 20x
faster. If you don't have RubyInline installed, it falls back on the
Text gem.
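The "use the fast version if it's installed, otherwise fall back" trick is the usual `begin`/`rescue LoadError` pattern. A minimal sketch, assuming the Text gem's `Text::Levenshtein.distance` as the preferred backend and a pure-Ruby version as the fallback (so the roles differ from steve's RubyInline-then-Text-gem setup, which this does not reproduce); `edit_distance` and `LEVENSHTEIN_BACKEND` are hypothetical names:

```ruby
# Choose an edit-distance implementation once, at load time.
# Assumption: the Text gem plays the "preferred" role here; steve's fast
# path was a RubyInline C function instead.
begin
  require 'text'
  LEVENSHTEIN_BACKEND = :text_gem
  def edit_distance(a, b)
    Text::Levenshtein.distance(a, b)
  end
rescue LoadError
  # Gem not installed: fall back to a plain-Ruby dynamic-programming version.
  LEVENSHTEIN_BACKEND = :pure_ruby
  def edit_distance(a, b)
    prev = (0..b.length).to_a
    a.each_char.with_index(1) do |ca, i|
      curr = [i]
      b.each_char.with_index(1) do |cb, j|
        curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca == cb ? 0 : 1)].min
      end
      prev = curr
    end
    prev.last
  end
end

puts "using #{LEVENSHTEIN_BACKEND}: #{edit_distance('kitten', 'sitting')}"
```

Callers only ever see `edit_distance`, so swapping in a faster backend later doesn't touch the rest of the program.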

Here's the final list of phrases I was testing with (taken from test/test_tshirt.rb):

    %w[e scent shells] => 'essentials',
    %w[q all if i] => 'qualify',
    %w[fan task tick] => 'fantastic',
    %w[b you tea full] => 'beautiful',
    %w[fun duh mint all] => 'fundamental',
    %w[s cape] => 'escape',
    %w[pan z] => 'pansy',
    %w[n gauge] => 'engage',
    %w[cap tin] => 'captain',
    %w[g rate full] => 'grateful',
    %w[re late shun ship] => 'relationship',
    %w[con grad yeul 8] => 'congratulate',
    %w[2 burr q low sis] => 'tuberculosis',
    %w[my crows cope] => 'microscope',
    %w[add minus ray shun] => 'administration',
    %w[accent you ate it] => 'accentuated',
    %w[add van sing] => 'advancing',
    %w[car knee for us] => 'carnivorous',
    %w[soup or seed] => 'supercede',
    %w[poor 2 bell o] => 'portobello',
    %w[d pen dance] => 'dependence',
    %w[s o tear rick] => 'esoteric',
    %w[4 2 it us] => 'fortuitous',
    %w[4 2 n 8] => 'fortunate',
    %w[4 in R] => 'foreigner',
    %w[naan disk clothes your] => 'nondisclosure',
    %w[Granmda Atika Lee] => 'grammatically',
    %w[a brie vie a shun] => 'abbreviation',
    %w[pheemeeneeneetee] => 'femininity',
    %w[me c c p] => 'mississippi',
    %w[art fork] => 'aardvark',
    %w[liberty giblet] => 'flibbertigibbet',
    %w[zoo key knee] => 'zucchini',
    %w[you'll tight] => 'yuletide',
    %w[Luke I like] => 'lookalike',
    %w[mah deux mah zeal] => 'mademoiselle',
    %w[may gel omen yak] => 'megalomaniac',
    %w[half tell mall eau gist] => 'ophthalmologist',
    %w[whore tea cull your wrist] => 'horticulturist',
    %w[pant oh my m] => 'pantomime',
    %w[tear a ball] => 'terrible',
    %w[a bowl i shun] => 'abolition',
    %w[pre chair] => 'preacher',
    %w[10 s] => 'tennis',
    %w[e z] => 'easy',
    %w[1 door full] => 'wonderful',
    %w[a door] => 'adore',
    %w[hole e] => 'holy',
    %w[grand your] => 'grandeur',
    %w[4 2 5] => 'fortify',
    %w[age it ate her] => 'agitator',
    %w[tear it or eel] => 'territorial',
    %w[s 1] => 'swan'

- steve

On Sun, 23 Sep 2007 21:42:30 +0900, steve d wrote:
> For my solution I used a combination of the Metaphone algorithm,
> pronunciation matching (via CMU pronunciation dictionary
> [http://www.speech.cs.cmu.edu/cgi-bin/cmudict]), and the Levenshtein
> distance algorithm.

My answer is along the same lines, using Metaphone, but it's nowhere
near as good as steve's:

require 'rubygems'
require 'text'
include Text::Metaphone

#use this to do the double_metaphone as a drop-in replacement for metaphone
def dmetaphone word
  first,second = double_metaphone word
  second || first
end

#this solution gets 3 of the test cases correct if single metaphone is used
#it gets 10 of the test cases correct if double-metaphone is used, but also
#provides a much longer list of wrong answers for everything

#use this alias to set the particular phonetic conversion algorithm
#(alias_method is private to Module and fails at the top level; the alias keyword works here)
alias phonetic_convert dmetaphone

NUMBERS=Hash.new{|h,k| k}.merge!({"1"=>"one", "2"=>"two", "3"=>"three",
  "4"=>"four","5"=>"five","6"=>"six","7"=>"seven","8"=>"eight","9"=>"nine"})

DICT=File.open('/usr/share/dict/words') do |f|
  d=Hash.new{|h,k| h[k]=[]}
  f.each_line do |word|
    word=word.chomp
    d[phonetic_convert(word).gsub(/\s/,'')] << word
  end
  d
end

def rebus words
  words=words.collect{|x| NUMBERS[x]}.join(' ')
  DICT[phonetic_convert(words).gsub(/\s/,'')]
end

#tests given by steve d <oksteve@yahoo.com>
expectations = {
  %w[e scent shells] => 'essentials',
  %w[q all if i] => 'qualify',
  %w[fan task tick] => 'fantastic',
  %w[b you tea full] => 'beautiful',
  %w[fun duh mint all] => 'fundamental',
  %w[s cape] => 'escape',
  %w[pan z] => 'pansy',
  %w[n gauge] => 'engage',
  %w[cap tin] => 'captain',
  %w[g rate full] => 'grateful',
  %w[re late shun ship] => 'relationship',
  %w[con grad yeul 8] => 'congratulate',
  %w[con grad yule 8 shins] => 'congratulations', #from Phrogz
  %w[2 burr q low sis] => 'tuberculosis',
}

expectations.each do |words,target|
  result=rebus(words)
  if result.include?(target)
    printf "%s correctly gave %s.\n", words.inspect, target
  else
    printf "%s incorrect. Expected %s.\n", words.inspect, target
  end
  printf "Metaphone of words: %s Metaphone of target: %s\n",
    phonetic_convert(words.collect{|x| NUMBERS[x]}.join(' ')),
    phonetic_convert(target)
  printf "Matching words %s\n", result.inspect
end

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
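Ken's `rebus` only returns words whose phonetic key matches the phrase's key exactly, so a near miss gives nothing; that may be part of why it finds fewer answers than steve's ranked approach. Here's a rough sketch of a fuzzier lookup that ranks index keys by edit distance instead. `toy_key` is a hypothetical, deliberately crude stand-in for Metaphone (it is NOT the Metaphone algorithm), and `build_index`/`fuzzy_lookup` are made-up names:

```ruby
# Hypothetical, deliberately crude stand-in for Metaphone (NOT the real
# algorithm): keep letters only, fold a few sound-alike spellings, drop
# vowels after the first letter, and collapse repeated letters.
def toy_key(text)
  s = text.downcase.delete('^a-z')
  s = s.gsub('ph', 'f').gsub('ck', 'k').tr('qz', 'ks')
  return s if s.empty?
  (s[0] + s[1..-1].delete('aeiouy')).squeeze
end

# Plain-Ruby Levenshtein (edit) distance.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca == cb ? 0 : 1)].min
    end
    prev = curr
  end
  prev.last
end

# key => [words], like Ken's DICT but without a default block, so looking
# up a missing key does not insert it.
def build_index(words)
  words.each_with_object({}) do |w, idx|
    (idx[toy_key(w)] ||= []) << w
  end
end

# Rank every key by edit distance to the phrase's key and return the words
# filed under the closest keys (at most `limit` of them, best first).
def fuzzy_lookup(index, phrase, limit = 10)
  query = toy_key(phrase)
  index.keys
       .min_by(limit) { |k| levenshtein(query, k) }
       .flat_map { |k| index[k] }
       .first(limit)
end

idx = build_index(%w[escape captain cabin pansy])
p idx.fetch(toy_key('s cape'), [])  # exact-key lookup, prints []
p fuzzy_lookup(idx, 's cape', 1)    # prints ["escape"]
```

Here "s cape" keys to `scp` while "escape" keys to `escp`, one edit apart, so the exact lookup misses but the ranked one finds it. Scanning every key is linear in the dictionary size; that cost is where a C Levenshtein like steve's would pay off.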

I wonder whether this quiz would work reasonably well in the opposite direction: given a word (or a sentence), create a series of words that sound it out. If you stick to nouns as the sound words, you can fairly reliably find an image for each one using Google's image search.
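One way the reverse direction could be sketched is as a word-break problem over phonemes: find the fewest sound words whose pronunciations concatenate to the target's pronunciation. This is a minimal sketch assuming hand-written, stress-free ARPAbet-style pronunciations in a hypothetical `SOUND_WORDS` table (a real version might load the CMU dictionary instead). It only finds exact splits, so it would need fuzzy matching like steve's to be practical:

```ruby
# Hypothetical hand-written pronunciations (illustrative only).
SOUND_WORDS = {
  'pan'  => %w[P AE N],
  'z'    => %w[Z IY],
  'cap'  => %w[K AE P],
  'tin'  => %w[T IH N],
  'knee' => %w[N IY],
  'e'    => %w[IY],
}

# Split a target phoneme sequence into the fewest dictionary words whose
# pronunciations concatenate to exactly that sequence (word-break DP).
# Returns nil when no exact split exists.
def sound_out(target, sounds)
  best = Array.new(target.length + 1)  # best[i] = words covering target[0, i]
  best[0] = []
  (1..target.length).each do |i|
    sounds.each do |word, phones|
      start = i - phones.length
      next if start < 0 || best[start].nil?
      next unless target[start, phones.length] == phones
      candidate = best[start] + [word]
      best[i] = candidate if best[i].nil? || candidate.length < best[i].length
    end
  end
  best[target.length]
end

p sound_out(%w[P AE N Z IY], SOUND_WORDS)  # prints ["pan", "z"]
```

Restricting `SOUND_WORDS` to picturable nouns would give the image-search variant described above.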

Cheers,
  B