Convert text string i.e 'Peter' into integer ID

Thanks a lot everybody. I'm really impressed by the quick and useful
feedback.

Todd, I was told that searching by integers instead of strings would
speed up performance when using large mysql tables. Is that not so?

You should just index the column you need, and if you want that column to be unique, create a unique constraint:

create index <index_name> on <table_name> (<name_of_the_column>)

Something like that should work. To create the unique constraint, look on your database documentation.
Adding a column and getting a hash from a string is going to be slower than having an index at the database level.

Justus

Regards,

···

On Nov 12, 2008, at 12:06 PM, Justus Ohlhaver wrote:
--
Rolando Abarca M.

Justus Ohlhaver wrote:

I was told that searching by integers instead of strings would
speed up performance when using large mysql tables. Is that not so?

This is what's known as "premature optimisation".

The general rule is: build your application first. If it has performance
problems, profile it. Only after profiling to determine where it
*really* is slow, then modify it. Even the most experienced of
architects and programmers often get it wrong, when they rely on their
intuition as to where optimisation actually benefits you.

I'd say that if you have a few hundred thousand rows in a table, and the
column you are searching on is indexed, then I doubt you will get any
noticeable speed improvement searching on an integer rather than a
string column.

However, note that it is common practice in database applications to
have an integer primary key assigned from a sequence.

1 : "Peter"
2 : "James"
3 : "John"
4 : "Andrew"
... etc

In that way, if you happen to know the ID of the row you want already,
you can jump to it by ID. But if you want to search for it by name -
which might return 0, 1 or more results - you can do that efficiently
too as long as that column is indexed.

What you seem to be asking for is to allocate the IDs in such a way that
given the string, you can calculate the ID off-line without performing a
database search. But to avoid the possibility of two strings giving the
same integer, then you would have to use a strong cryptographic hash
like SHA1. This will give you an integer of size 2^160, which is very
large; so large that actually just storing the string (and searching on
it) will likely be more efficient anyway. Furthermore, the integers
themselves will effectively be randomly distributed, rather than a
linear sequence, so the same sort of tree index and lookup will be
required.

···

--
Posted via http://www.ruby-forum.com/\.

Justus Ohlhaver wrote:

Todd, I was told that searching by integers instead of strings would
speed up performance when using large mysql tables. Is that not so?

Well, it depends.

Searching for numbers can be faster only if the numbers are sorted. For
this you would usually have an index. This way, you halfsplit your way
to the number you want.

i.e. Go the middle: is it bigger or smaller? That narrows your search
by half with one check. Then go to the middle of the half it will be in
and repeat.

The problem with headlines is that you are not necessarily searching for
a whole headline.

e.g. searching for "man on the moon" might not find "There is a man on
the moon again." Then, there is case sensitivity which will affect
conversions to numbers often enough.

You could have what many websites have: a key word search. You could
have a list of key words, "man", "moon", etc., in another file and have
that column indexed. Then, you search for keywords and yield all
articles that have all of them. That would be fast and cover a lot of
bases.

IHTH

···

--
Posted via http://www.ruby-forum.com/\.