Is there a way to take two strings and decide if they are "similar"?
I'm creating a contact system in Rails, and am having a large problem
with my users punching in duplicate entries with the last names spelled
slightly differently.

Is there a way to check if 2 strings are "identical" up to a certain
percentage, such as only having 1 or 2 characters different?
Here's a Perl module that does something similar. You might try
porting it over to Ruby.
Farrel
···
On 01/08/06, Dylan Markow <dylan@dylanmarkow.com> wrote:
> Is there a way to take two strings, and decide if they are "similar."
> I'm creating a contact system in Rails, and am having a large problem
> with my users punching in duplicate entries with the last names spelled
> slightly different.
>
> Is there a way to check if 2 strings are "identical" up to a certain
> percentage, such as only having 1 or 2 characters different?
Interesting that your name is almost "Markov" ;-}
Anyway, besides the algorithms mentioned here, I have notes on Double
Metaphone, NYSIIS, and Phonex. For sequence analysis in general, google
McIlroy-Hunt, Ratcliff/Obershelp:
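For a rough idea of what Ratcliff/Obershelp computes, here is a minimal Ruby sketch. It finds the longest common substring, recurses on the pieces to its left and right, and scores 2 * matched characters / total length, which gives the kind of "percentage" the original question asks about. The method names matching_chars and gestalt_ratio are made up for illustration and do not come from any library; ties between equally long substrings are broken by taking the first one found, which is an assumption of this sketch.

```ruby
# Number of characters matched by the Ratcliff/Obershelp procedure:
# longest common substring, plus the same count on what lies to the
# left of it and to the right of it in both strings.
def matching_chars(a, b)
  return 0 if a.empty? || b.empty?

  best_len = 0
  best_a = 0
  best_b = 0
  # prev[j + 1] holds the length of the common run ending at the
  # previous character of a and at b[j].
  prev = Array.new(b.length + 1, 0)
  a.each_char.with_index do |ca, i|
    curr = Array.new(b.length + 1, 0)
    b.each_char.with_index do |cb, j|
      next unless ca == cb
      curr[j + 1] = prev[j] + 1
      if curr[j + 1] > best_len
        best_len = curr[j + 1]
        best_a = i - best_len + 1
        best_b = j - best_len + 1
      end
    end
    prev = curr
  end
  return 0 if best_len.zero?

  best_len +
    matching_chars(a[0, best_a], b[0, best_b]) +
    matching_chars(a[(best_a + best_len)..-1], b[(best_b + best_len)..-1])
end

# Similarity between 0.0 (nothing in common) and 1.0 (identical).
def gestalt_ratio(a, b)
  total = a.length + b.length
  return 1.0 if total.zero?
  2.0 * matching_chars(a, b) / total
end

puts gestalt_ratio("WIKIMEDIA", "WIKIMANIA")  # 14 / 18, about 0.78
puts gestalt_ratio("Markow", "Markov")        # 10 / 12, about 0.83
```

A cutoff such as 0.8 for "probably the same name" would have to be tuned against real data; nothing in this thread says what value works.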
And here is a nice description of the Levenshtein distance:
Farrel
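To make the idea concrete, here is a minimal Ruby sketch of the Levenshtein distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into the other), using the usual two-row dynamic-programming table. The helpers similarity and probably_same_name?, and the default of 2 allowed edits, are hypothetical names and values chosen to match the "1 or 2 characters different" wording of the question, not part of any library.

```ruby
# Levenshtein (edit) distance between two strings.
def levenshtein(a, b)
  a = a.chars
  b = b.chars
  return b.length if a.empty?
  return a.length if b.empty?

  # prev[j] is the distance between the first i characters of a
  # and the first j characters of b.
  prev = (0..b.length).to_a
  a.each_with_index do |ca, i|
    curr = [i + 1]
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1,    # delete ca
               curr[j] + 1,        # insert cb
               prev[j] + cost].min # substitute, or keep a match
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: 1.0 for identical strings, 0.0 for nothing in common,
# measured against the length of the longer string.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

# Hypothetical duplicate check: ignore case and surrounding whitespace,
# then allow up to max_edits differing characters.
def probably_same_name?(a, b, max_edits = 2)
  levenshtein(a.downcase.strip, b.downcase.strip) <= max_edits
end

puts levenshtein("kitten", "sitting")           # 3
puts probably_same_name?("Markow", "markov ")   # true
```

For short last names a fixed limit of 2 edits can be too generous ("Lee" and "Law" would match), so a limit that depends on the length of the name may be a better fit.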
···
You know, there are lots of implementations there, but the Ruby one
seems to be missing. [There's no reason to restrict it to working on
strings. If you duck-type it, it'll work just as nicely on arrays of
what have you.]
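The duck-typing point can be sketched like this: the edit-distance loop only needs two sequences whose elements can be compared with ==, so the same method can serve strings and arrays. The name edit_distance is made up for this example, and the conversion step (using #chars when available, otherwise #to_a) is one possible way to do it, not the only one.

```ruby
# Edit distance over any two sequences. Strings are split into
# characters; anything else only needs #to_a and elements that
# support ==.
def edit_distance(xs, ys)
  xs = xs.respond_to?(:chars) ? xs.chars : xs.to_a
  ys = ys.respond_to?(:chars) ? ys.chars : ys.to_a

  prev = (0..ys.size).to_a
  xs.each_with_index do |x, i|
    curr = [i + 1]
    ys.each_with_index do |y, j|
      cost = x == y ? 0 : 1
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    prev = curr
  end
  prev.last
end

puts edit_distance("Markow", "Markov")                  # 1, on characters
puts edit_distance(%w[John Q Public], %w[John Public])  # 1, on whole words
puts edit_distance([1, 2, 3, 4], [1, 3, 4, 5])          # 2, on integers
```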
···
On 01/08/06, Farrel Lifson <farrel.lifson@gmail.com> wrote:
If the names might be from different languages, I would rather use
Levenshtein than Soundex. Levenshtein is probably good at describing
typos as "very close", but Soundex might be somewhat language-specific.
But I haven't tried, so I might be wrong.
Thanks
Michal
···
On 8/1/06, benjohn@fysh.org <benjohn@fysh.org> wrote:
>>>> sender: "Dylan Markow" date: "Tue, Aug 01, 2006 at 03:25:59PM +0900"
>>>> <<<EOQ
>> Is there a way to take two strings, and decide if they are "similar."
>> I'm creating a contact system in Rails, and am having a large problem
>> with my users punching in duplicate entries with the last names
>> spelled
>> slightly different.
>>
>> Is there a way to check if 2 strings are "identical" up to a certain
>> percentage, such as only having 1 or 2 characters different?
> Looks like there is a soundex implementation for Ruby:
> http://raa.ruby-lang.org/search.rhtml?search=soundex
>
> If by any chance you are using MySQL, you could use the SOUNDEX function
> built into it as well.
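
To show what a Soundex code looks like, here is a minimal Ruby sketch of the common four-character American Soundex rules. It is not the RAA library linked above, and MySQL's built-in function may differ in details, so treat the method name soundex and the constants below as illustrative assumptions only.

```ruby
# Letter groups that share a Soundex digit. Vowels, Y, H, and W have none.
SOUNDEX_GROUPS = {
  "BFPV" => "1", "CGJKQSXZ" => "2", "DT" => "3",
  "L" => "4", "MN" => "5", "R" => "6"
}.freeze

SOUNDEX_CODE = SOUNDEX_GROUPS.each_with_object({}) do |(letters, digit), table|
  letters.each_char { |c| table[c] = digit }
end.freeze

# Four-character American Soundex: first letter plus up to three digits,
# padded with zeros. Non-letters are dropped.
def soundex(name)
  letters = name.upcase.gsub(/[^A-Z]/, "")
  return "" if letters.empty?

  code = letters[0].dup
  last = SOUNDEX_CODE[letters[0]] # digit of the previous coded letter, if any
  letters[1..-1].each_char do |c|
    digit = SOUNDEX_CODE[c]
    if digit
      code << digit unless digit == last # repeated digits are coded once
      last = digit
    elsif c != "H" && c != "W"
      last = nil # a vowel separates repeats; H and W do not
    end
    break if code.length == 4
  end
  code.ljust(4, "0")
end

puts soundex("Smith")   # S530
puts soundex("Smyth")   # S530
puts soundex("Markow")  # M620
puts soundex("Markov")  # M621
```

"Smith" and "Smyth" get the same code, but "Markow" and "Markov" do not, even though they differ by one character. That is one small example of a typo an edit-distance check catches and a phonetic code misses.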