How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%).
Am I making sense? Is there a third party app I would need?
Soundex is a phonetic algorithm for indexing names by their sound when
pronounced in English. The basic aim is for names with the same
pronunciation to be encoded to the same string so that matching can occur
despite minor differences in spelling. Soundex is the most widely known of
all phonetic algorithms and is often used (incorrectly) as a synonym for
"phonetic algorithm".
If you process your database's names into a column of Soundex encodings,
you should be able to search this for an encoded target word. You could
then post-process the results to get the "score" for each one. Note that
you may want to look around for alternative encodings (eg, Metaphone).
-r
···
At 3:37 AM +0900 12/5/05, TomRossi7 wrote:
How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%).
this was discussed on both this ML and the Rails ML about a week ago. The
threads have the word 'fuzzy' in the subject.
TomRossi7 wrote:
···
How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%).
Am I making sense? Is there a third party app I would need?
from http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
.....
The following example is more complex. The query returns the relevance
values and it also sorts the rows in order of decreasing relevance. To
achieve this result, you should specify MATCH() twice: once in the
SELECT list and once in the WHERE clause. This causes no additional
overhead, because the MySQL optimizer notices that the two MATCH()
calls are identical and invokes the full-text search code only once.
SELECT id, body, MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root') AS score
-> FROM articles WHERE MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root');
this was discussed on both this ML and the Rails ML about a week ago. The
threads have the word 'fuzzy' in the subject.
TomRossi7 wrote:
How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%).
Am I making sense? Is there a third party app I would need?
Thanks,
Tom
I needed to do something similar with my office database. I used the Levenshtein distance. The routine was also
quite useful in finding name differences between my office data base and the billing company's.
···
#------------------------------------------------------
# Determine Levenshtein distance of two strings
def Ld(s,t)
#s string #1 #t string #2
n = s.size
m = t.size
a = Array.new
if n != 0 && m != 0 #2 create array
r = Array.new
rz = Array.new
0.upto(m) {|x| r.push(0)}
0.upto(n) {|x|a.push(r.dup)}
a.each_index {|x| a[0] = x}
0.upto(m) {|x| a[0] = x}
cost = 0
1.upto(n) do |i|
1.upto(m) do |j|
if s[i] == t[j]
cost =0
else
cost = 1
end
a[i][j] = [a[ i- 1][j] +1,a[i][j - 1] + 1,a[i - 1][j - 1] + cost].min
end
end
a[n][m]
else
0
end
end
/How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%)./
Less direct, but maybe useful...
The pyLucene project is the apache java lucene library implemented in C and SWIG'd to python (i think).
You might be able to create ruby bindings for the full text search C library used there.
Monday 05 December 2005 11:10、Ben Stroud さんは書きました:
···
/How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%)./
Less direct, but maybe useful...
The pyLucene project is the apache java lucene library implemented in C
and SWIG'd to python (i think).
You might be able to create ruby bindings for the full text search C
library used there.