Detecting similar strings

This should point you in the right direction I think:

http://raa.ruby-lang.org/project/levenshtein/
http://raa.ruby-lang.org/project/soundex/
http://raa.ruby-lang.org/project/metaphone/

The Levenshtein algorithm gives you the "edit distance" between two
strings, i.e. the minimum number of insertions, replacements, and
deletions needed to make the strings identical. It gives you a pretty
good indication of how similar the strings are.
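
To make that concrete, here is a rough hand-written sketch of the
edit-distance calculation. It is not the API of the library linked
above; the method names levenshtein and similarity are just ones I made
up for illustration, and the "percentage" is simply the distance
divided by the length of the longer string.

```ruby
# Edit distance using the standard dynamic-programming recurrence.
# (Illustrative helper, not taken from the linked library.)
def levenshtein(a, b)
  a = a.chars
  b = b.chars
  # prev[j] = distance between the processed prefix of a and b[0, j]
  prev = (0..b.length).to_a
  a.each_with_index do |ca, i|
    curr = [i + 1]
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1,      # deletion
               curr[j] + 1,          # insertion
               prev[j] + cost].min   # replacement (or match)
    end
    prev = curr
  end
  prev.last
end

# Similarity as a fraction between 0.0 and 1.0 (hypothetical helper).
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

puts levenshtein("Markow", "Markov")
puts similarity("Markow", "Markov")
```

You could then flag two last names as possible duplicates when the
distance is 1 or 2, or when the similarity is above some cutoff that
you would have to tune against your own data.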

Soundex maps strings that sound alike to the same 4-character code
(a letter followed by three digits, something like "E246").
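
Here is a minimal sketch of how such a code can be computed, assuming
the usual American Soundex rules (keep the first letter, map the
remaining consonants to digits, collapse repeats, pad with zeros).
Again, this is my own illustration with made-up names (SOUNDEX_CODES,
soundex), not the interface of the library linked above.

```ruby
# Consonant groups used by American Soundex (assumed rule set).
SOUNDEX_CODES = {
  "B" => "1", "F" => "1", "P" => "1", "V" => "1",
  "C" => "2", "G" => "2", "J" => "2", "K" => "2",
  "Q" => "2", "S" => "2", "X" => "2", "Z" => "2",
  "D" => "3", "T" => "3",
  "L" => "4",
  "M" => "5", "N" => "5",
  "R" => "6"
}.freeze

# Returns a letter plus three digits, or "" if there are no letters.
def soundex(name)
  letters = name.upcase.gsub(/[^A-Z]/, "").chars
  return "" if letters.empty?

  first = letters.first
  code = first.dup
  last = SOUNDEX_CODES[first]
  letters.drop(1).each do |ch|
    digit = SOUNDEX_CODES[ch]
    if digit
      # Skip a digit that repeats the previous coded letter.
      code << digit unless digit == last
      last = digit
    elsif ch != "H" && ch != "W"
      # Vowels (and Y) separate repeats; H and W do not.
      last = nil
    end
    break if code.length == 4
  end
  code.ljust(4, "0")
end

puts soundex("Robert")
puts soundex("Rupert")
```

Two names count as a likely match when their codes are equal, so you
can store the code in an extra column and look up duplicates with a
plain equality check.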

Metaphone is preferred over Soundex, I believe. It also transforms
similar-sounding strings into the same character sequence, but doesn't
limit itself to just 4 characters. That means it works a bit better
with longer strings.

There are probably other algorithms around as well, but I've had pretty
good luck with these three.

Regards,
Helge Elvik

-----Original Message-----
From: list-bounce@example.com [mailto:list-bounce@example.com] On Behalf
Of Dylan Markow
Sent: 1 August 2006 09:31
To: ruby-talk ML
Subject: Detecting similar strings

Is there a way to take two strings and decide if they are "similar"?
I'm creating a contact system in Rails, and am having a large problem
with my users punching in duplicate entries with the last names spelled
slightly differently.

Is there a way to check if 2 strings are "identical" up to a certain
percentage, such as only having 1 or 2 characters different?

--
Posted via http://www.ruby-forum.com/.