[QUIZ] Shirt Reader (#140)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Sun Microsystems handed out some fun shirts at the LSRC. The graphic on the
front looks something like:

  sun image
  
  star image + T + up arrow
  
  E + perfume bottle image + sea shells image

This is a clever advertisement for their "Sun Startup Essentials" program.

This week's Ruby Quiz is to write a program that can read such shirts. At the
basic level, we will assume the user will feed us the right words:

  $ ruby read_shirt.rb e scent shells
  essentially

Hopefully your solution will be smarter than mine, which, as you can see, only
gets close. It may be better to give a few possible choices instead of just the
best match.

Of course, the problem comes when you guess the wrong word. My first thought
was of a perfume bottle, as my description shows, but the correct word was
scent. For bonus points, see what you can do about this problem. One idea
might be to allow alternatives for a segment of the word. Another might be to
hit the thesaurus, which does list scent as a synonym for perfume.

Hi,

Please find my solution at http://pastie.caboo.se/99879\. It also uses
Metaphone and Levenshtein distance from the Text gem (
http://rubyforge.org/projects/text\).

Basically, it first find all words in a dictionary whose Metaphone is
closest (i.e. has minimum Levenshtein distance) to the Metaphone of the
shirt phrase. Among those, it finds the word that itself (not its Metaphone)
is closest to the shirt phrase.

Some examples:

$ ruby ./rq140_shirtreader_rafc.rb may gel omen yak
megalomaniac

Several words can be found in case of a tie (pun not intended):
$ ruby ./rq140_shirtreader_rafc.rb hink
Hank, hank, honk, hunk

It is slow:
$ ruby -d ./rq140_shirtreader_rafc.rb half tell mellow jizzed

···

2007/9/21, Ruby Quiz <james@grayproductions.net>:

Sun Microsystems handed out some fun shirts at the LSRC. The graphic on
the
front looks something like:

        sun image

        star image + T + up arrow

        E + perfume bottle image + sea shells image

This is a clever advertisement for their "Sun Startup Essentials" program.

This week's Ruby Quiz is to write a program that can read such shirts.

----------------------------------------
half tell mellow jizzed => ophthalmologist

It took 18.592967s to find this answer.
ophthalmologist | metaphone distance(HLFTLMLJST, OF0LMLJST) = 3 | phrase
distance(HALFTELLMELLOWJIZZED, OPHTHALMOLOGIST) = 14

And it is not particularly accurate. Here's how it fares on Steve's list:
$ ruby ./rq140_shirtreader_rafc.rb
e scent shells => essentials
q all if i => qualify
fan task tick => fantastic
b you tea full => beautiful, beautifully
fun duh mint all => fundamental, fundamentally
s cape => escape
pan z => Panza, pane's, panes
n gauge => engage
cap tin => captain
g rate full => grateful, gratefully
re late shun ship => relationship
con grad yeul 8 => congratulate
2 burr q low sis => tuberculosis
my crows cope => microscope
add minus ray shun => administration
accent you ate it => accentuated
add van sing => advancing
car knee for us => carnivore's, carnivores, carnivorous
soup or seed => supersede, suppressed
poor 2 bell o => portable
d pen dance => dependence
s o tear rick => esoteric
4 2 it us => fortuitous
4 2 n 8 => fortunate
4 in R => foreigner
naan disk clothes your => underclothes
Granmda Atika Lee => grammatical, grammatically
a brie vie a shun => abbreviation
pheemeeneeneetee => femininity
me c c p => Mississippi
art fork => aardvark
liberty giblet => flibbertigibbet
zoo key knee => sicken
you'll tight => Yuletide, yuletide
Luke I like => lookalike
mah deux mah zeal => mademoiselle
may gel omen yak => megalomaniac
half tell mall eau gist => ophthalmologist
whore tea cull your wrist => horticulturist
pant oh my m => pantomime
tear a ball => Tarbell
a bowl i shun => abolition
pre chair => preacher, preachier
10 s => oneness
e z => Essie, ease
1 door full => wonderful, wonderfully
a door => Adar, adder, adore
hole e => Holley, Hollie, hole
grand your => grandeur
4 2 5 => furtive
age, it ate her => agitator
tear it or eel => territorial
s 1 => Essene

It especially seems to have great difficulties with everything ending in the
/ee/ sound: beautifully, gratefully, zucchini, grammatically, pansy, easy,
holy.

Best regards,
Raf

Here is my solution. Obviously it is not as "well educated" as Steve's one
because it does not use pronunciation dictionary, but it behaves pretty well
with what it has :slight_smile:

I've tried different hashing strategies, one of them to use soundex, but
that one had one less match, though it was able to understand 're late shun
ship' - metaphone believes that relationship is pronounced differently (and
I cannot blame it :slight_smile:

Soundex has a big disadvantage of requiring correct first letter, that's why
I think the metaphone-based solution is closer to what we want.

BTW, I was surprised that text gem includes only symmetrical Levenshtein
distance, asymmetrical one (with different costs of operations) could
probably give better results

require 'rubygems'
require 'text'

include Text::Metaphone
include Text::Levenshtein

expectations = {
    %w[e scent shells] => 'essentials',
    %w[q all if i] => 'qualify',
    %w[fan task tick] => 'fantastic',
    %w[b you tea full] => 'beautiful',
    %w[fun duh mint all] => 'fundamental',
    %w[s cape] => 'escape',
    %w[pan z] => 'pansy',
    %w[n gauge] => 'engage',
    %w[cap tin] => 'captain',
    %w[g rate full] => 'grateful',
    %w[re late shun ship] => 'relationship',
    %w[con grad yeul 8] => 'congratulate',
    %w[2 burr q low sis] => 'tuberculosis',
    %w[my crows cope] => 'microscope',
    %w[add minus ray shun] => 'administration',
    %w[accent you ate it] => 'accentuated',
    %w[add van sing] => 'advancing',
    %w[car knee for us] => 'carnivorous',
    %w[soup or seed] => 'supercede',
    %w[poor 2 bell o] => 'portobello',
    %w[d pen dance] => 'dependence',
    %w[s o tear rick] => 'esoteric',
    %w[4 2 it us] => 'fortuitous',
    %w[4 2 n 8] => 'fortunate',
    %w[4 in R] => 'foreigner',
    %w[naan disk clothes your] => 'nondisclosure',
    %w[Granmda Atika Lee] => 'grammatically',
    %w[a brie vie a shun] => 'abbreviation',
    %w[pheemeeneeneetee] => 'femininity',
    %w[me c c p] => 'mississippi',
    %w[art fork] => 'aardvark',
    %w[liberty giblet] => 'flibbertigibbet',
    %w[zoo key knee] => 'zucchini',
    %w[you'll tight] => 'yuletide',
    %w[Luke I like] => 'lookalike',
    %w[mah deux mah zeal] => 'mademoiselle',
    %w[may gel omen yak] => 'megalomaniac',
    %w[half tell mall eau gist] => 'ophthalmologist',
    %w[whore tea cull your wrist] => 'horticulturist',
    %w[pant oh my m] => 'pantomime',
    %w[tear a ball] => 'terrible',
    %w[a bowl i shun] => 'abolition',
    %w[pre chair] => 'preacher',
    %w[10 s] => 'tennis',
    %w[e z] => 'easy',
    %w[1 door full] => 'wonderful',
    %w[a door] => 'adore',
    %w[hole e] => 'holy',
    %w[grand your] => 'grandeur',
    %w[4 2 5] => 'fortify',
    %w[age, it ate her] => 'agitator',
    %w[tear it or eel] => 'territorial',
    %w[s 1] => 'swan'
}

subs={'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
      'h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'e'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
  2*distance(metaphone(str1),metaphone(str2))+distance(str1,str2)
end

hash=Hash.new{|h,k|h[k]=[]}

File.open("/usr/share/dict/words") {|f| f.readlines}.each do |w|
  word=w.downcase.delete("^a-z")
  m1,m2=double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
end
#make sure that expectations are in the word list
expectations.values.each { |word|
  m1,m2=double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
}

inputs=[]
if (ARGV.empty?)
  inputs=expectations.keys
else
  inputs << ARGV
end
inputs.each { |rebus|
  y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
  word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
  m1,m2=double_metaphone(word)
  results=hash[m1]
  results+=hash[m2] if m2
  res=results.uniq.sort{|a,b|
    (metadist(word,a) <=> metadist(word,b)).nonzero? || a.length<=>b.length
  }[0] || 'no clue'
  print "'#{rebus.join(' ')}' => #{res}"
  expected=expectations[rebus]
  print ", expected: #{expected}" if expected
  puts
}

"Eugene Kalenkovich" <rubify@softover.com> wrote in message
news:AK_Ji.6292$TH2.5856@trndny06...

Here is my solution. Obviously it is not as "well educated" as Steve's one
because it does not use pronunciation dictionary, but it behaves pretty
well with what it has :slight_smile:

One more limit of a good education - sometimes it is limited to an education
source. Unfortunately I did not have enough patience to test it on all new
expectations, even single example requires too much time (without ruby
inline), but I know that it will not get at least some part of a new set:

# expectations.rb
$expectations = {
  %w[pro 1 sal] => 'provencal',
  %w[2 thing] => 'toothing',
  %w[r b trash] => 'arbitrage',
  %w[mon 3 l] => 'montreal',
  %w[3 men dose] => 'tremendous',
  %w[mid wind r] => 'midwinter',
  %w[yes tier knight] => 'yesternight',
  %w[mar well s]=>'marvelous',
  %w[vert x]=>'vertex',
  %w[ban l eyes] => 'banalize',
  %w[harm n eyes] => 'harmonize',
  %w[harm o niece east] => 'harmonicist',
  %w[knight in gale] => 'nightingale',
  %w[knee hill east] => 'nihilist',
  %w[mass car pone a] => 'mascarpone',
  %w[cock knee] => 'cockney',
}

···

################################################

And two more variations for my solution.
First - compromise with coarsened hash, balancing decent matching with
decent performance. This one is probably the best I coud get.

Second one - full scan. Allows more concise code and slightly better
matching in cost of performance (but still several time faster than Steve's
without inline).

# Variant 1
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

subs={'1'=>'wan','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
      'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
  2*distance(metaphone(str1),metaphone(str2))+
  distance(str1,str2)
end

def short_double_metaphone(word)
  m1,m2=double_metaphone(word)
  [m1[0,2],m2 ? m2[0,2] : nil]
end

hash=Hash.new{|h,k|h[k]=}

File.open("/usr/share/dict/words") {|f| f.readlines}.each do |w|
  word=w.downcase.delete("^a-z")
  m1,m2=short_double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
end
$expectations.values.each { |word|
  m1,m2=short_double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
}

hash.each_key{|k| hash[k].uniq!}

inputs=
if (ARGV.empty?)
  inputs=$expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
  word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
  m1,m2=short_double_metaphone(word)
  results=hash[m1]
  results+=hash[m2] if m2 && m2!=m1
  res=results.uniq.sort_by{|a| [metadist(word,a),a.length]}.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected=$expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if
expected
  puts
}
################################################

# Variant 2
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

subs={'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
      'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
  2*distance(metaphone(str1),metaphone(str2))+
  distance(str1,str2)
end

words = (File.open("/usr/share/dict/words") {|f| f.readlines}.map{|word|
word.downcase.delete("^a-z")}+$expectations.values).uniq

inputs=
if (ARGV.empty?)
  inputs=$expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
  word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
  res=words.sort_by{ |a| [metadist(word,a),a.length] }.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected=$expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if
expected
  puts
}

################################################