Address search algorithms

Heya all,
Got an interesting little problem, not specifically Ruby, but I hope to
solve it with some nice ruby :slight_smile:

I need to be able to do searches on a data set of a street address.

The address has three fields, what is needed is an efficient way to search
this address with partial data.

The addresses are in a database, what I am thinking is going through and pre
generating a search string based on the three concatenated address values
and then index this search column and search off that using the output from
the same algorithm on the entered search values.

Would full text search be the best solution here? The DB is on postgres by
the well.

The problem is that someone might want to search for "back st" and expect to
match "123 back street", or worse, "123 back lane" and get "123 back street"

Anyone else run into these sorts of problems, any feed back?

Mikel

This gem might help:

ziplookup (1.0.0)
聽聽聽聽Classes for standardizing addresses via the USPS web site.

Heya all,
Got an interesting little problem, not specifically Ruby, but I hope to
solve it with some nice ruby :slight_smile:

I need to be able to do searches on a data set of a street address.

The address has three fields, what is needed is an efficient way to search
this address with partial data.

Assuming these are UK-only addresses, at the very least you should have four fields:

* building name/number;
* street;
* town;
* postcode

and the national postcode database allows for seven if I remember correctly - although it's usual for a couple of those to be blank. Depending on how well normalised you want the database it can be a good idea to split out the last three into their own tables and only store keys in your main address table. You'll get the biggest gain from doing that with towns, but streets are normally a good gain as well. Oh, and obviously use indexes.

The addresses are in a database, what I am thinking is going through and pre
generating a search string based on the three concatenated address values
and then index this search column and search off that using the output from
the same algorithm on the entered search values.

Would full text search be the best solution here? The DB is on postgres by
the well.

The problem is that someone might want to search for "back st" and expect to
match "123 back street", or worse, "123 back lane" and get "123 back street"

* Keep a dictionary of well known abbreviations and massage the query to expand these;
* store the street number in its own column (which won't necessarily be numeric thanks to all the Flat 1's and 22B's lurking in the system, not to mention named properties without a number etc.);
* store the postcode in its own column and filter by postcode whenever possible;
* use a soundex algorithm to store all words as phonetic approximations to deal with simple misspellings (bock vs. back, bak vs. back);
* AND MOST IMPORTANTLY: constrain the way the user can enter a query so that it keeps the query as simple as possible;
* use whatever full text search facility postgres has

Anyone else run into these sorts of problems, any feed back?

Many many years ago :slight_smile:
Out of interest, are you using the postcode database to generate your address information, or is it user-submitted data? If the latter, make sure you constrain how data can be submitted. And either way, do as much pre-processing as possible to get the addresses into a normalised form.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

路路路

On 30 May 2008, at 10:09, Mikel Lindsaar wrote:
----
raise ArgumentError unless @reality.responds_to? :reason

Many many years ago :slight_smile:

Thanks all for your tips!

I have already implemented most of them... but some of them are also good
ideas.

Out of interest, are you using the postcode database to generate your

address

information, or is it user-submitted data? If the latter, make sure you

constrain

how data can be submitted. And either way, do as much pre-processing as
possible to get the addresses into a normalised form.

This is pre-existing data... the worst kind :slight_smile:

Mikel

You have my sympathy :wink:

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

路路路

On 30 May 2008, at 13:08, Mikel Lindsaar wrote:

This is pre-existing data... the worst kind :slight_smile:

----
raise ArgumentError unless @reality.responds_to? :reason