Convert text string, e.g. 'Peter', into integer ID

Hello,

is there any method to quickly convert a text string such as 'Peter'
into an integer? (And vice versa?)

I am not asking about just changing the format (to_i). I'm looking for a
method that can do the computation.

Thanks,
Justus

···

--
Posted via http://www.ruby-forum.com/.

code = 0
"Peter".each_byte { |b| code += b } # will not generate unique values, though

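To illustrate why a plain byte sum is not unique, here is a minimal sketch (the helper name byte_sum is made up for this example). Any two strings that contain the same bytes in a different order end up with the same code.

```ruby
# Hypothetical helper: adds up the byte values of a string.
def byte_sum(str)
  str.each_byte.inject(0) { |sum, byte| sum + byte }
end

puts byte_sum("Peter")                         # 80 + 101 + 116 + 101 + 114 = 512
puts byte_sum("listen") == byte_sum("silent")  # true: anagrams always collide
```
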
···

-----Original Message-----
From: ohlhaver@gmail.com [mailto:ohlhaver@gmail.com]
Sent: Wednesday, November 12, 2008 6:47 PM
To: ruby-talk ML
Subject: Convert text string i.e 'Peter' into integer ID

Hello,

is there any method to quickly convert a text string such as 'Peter'
into an integer? (And vice versa?)

I am not asking about just changing the format (to_i). I'm looking for a
method that can do the computation.

What properties do you want the string:integer relationship to have?
The problem is too unbounded at the moment.

["Peter", "James", "John", "Andrew"].index("Peter")

satisfies the constraints given so far.

Thanks,
Justus

        Hugh

···

On Wed, 12 Nov 2008, Justus Ohlhaver wrote:

There are a lot of easy ways to do the string -> integer part - e.g.

1) def x(s); return 42; end # now you can call x on your string and it will convert it to an integer
2) "Peter"[0]
3) "Peter".object_id
4) "Peter".hash

The integer -> string part is the real problem. It's impossible to do with methods 1) - 3); with some extra effort (storing the result in a lookup table) it should be possible to accomplish it with 4), unless you have some extra requirements.
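A lookup table of this kind could be sketched as follows. Note that this sketch swaps the hash value for a sequential counter, which also sidesteps collisions; the class name StringRegistry and its methods are invented for illustration and are not part of Ruby.

```ruby
# Hypothetical class: hands out sequential integer IDs and remembers them,
# so the mapping works in both directions.
class StringRegistry
  def initialize
    @id_by_string = {}   # string  -> integer
    @string_by_id = []   # integer -> string (the array index is the ID)
  end

  # Returns the existing ID for str, or assigns the next free one.
  def id_for(str)
    @id_by_string.fetch(str) do
      @string_by_id << str.dup.freeze
      @id_by_string[str] = @string_by_id.size - 1
    end
  end

  # Returns the string registered under id, or nil if the ID is unknown.
  def string_for(id)
    return nil if id < 0
    @string_by_id[id]
  end
end

registry = StringRegistry.new
peter_id = registry.id_for("Peter")
registry.id_for("James")
puts peter_id                        # 0
puts registry.string_for(peter_id)   # Peter
```

The mapping only exists inside this one registry object, so it would have to be persisted somewhere to survive a restart.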

The question is, what do you really want to achieve? What properties should the mapping have?

Cheers
Peter

···

On 2008.11.12., at 14:16, Justus Ohlhaver wrote:

Hello,

is there any method to quickly convert a text string such as 'Peter'
into an integer? (And vice versa?)

Does "Peter".hash do what you want?

--Ken

···

On Wed, 12 Nov 2008 08:16:36 -0500, Justus Ohlhaver wrote:

--
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

Hugh Sasse wrote:

···

On Wed, 12 Nov 2008, Justus Ohlhaver wrote:

Thanks for your help. No, I need to turn the string, which would usually
be a headline ('Man lands on the moon'), into a unique number. The
purpose is to speed up my database queries when checking whether an
entry with the same headline already exists in the db.
Justus

Peter Szinek wrote:

4) "Peter".hash

The integer -> string part is the real problem; it's impossible to do
it with methods 1) - 3), with some extra effort (storing the result in
a lookup table) it should be possible to accomplish it with 4) unless
you have some extra requirements.

Hash values are not unique. Two different strings can have the same hash
value, so you can't get the string back from the hash alone.

HTH,
Sebastian

···

--
NP: In Flames - Lord Hypnos
Jabber: sepp2k@jabber.org
ICQ: 205544826

Ken Bloom wrote:

Does "Peter".hash do what you want?

--Ken

Yes, thanks, it seems to do what I need, except for two possible
limitations:

1. According to Sebastian (above), 'Hash values are not unique. Two
different strings can have the same hash value'.

2. It may not serve my original purpose, which is speeding up database
queries.

Thanks again,
Justus

···

On Wed, 12 Nov 2008 08:16:36 -0500, Justus Ohlhaver wrote:

Why does it have to be a number? Your db should already index.

Todd

···

On Wed, Nov 12, 2008 at 8:39 AM, Justus Ohlhaver <ohlhaver@gmail.com> wrote:

Thanks for your help. No I need to turn the string which would usually
be a headline ('Man lands on the moon') into a unique number. The
purpose is to speed up my database queries when checking whether an
entry with the same headline already exists in the db.
Justus

I would do something like this with an MD5 hash:

require 'digest/md5'
Digest::MD5.hexdigest('Peter')
# => "6fa95b1427af77b3d769ae9cb853382f"
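Since the original question asked for an integer, the hex digest can be read as a base-16 number. This is only a sketch, and md5_integer is a made-up helper name. Unlike String#hash, whose value may differ between Ruby processes, a digest is the same on every run. Two different strings could still share a digest in principle, and the string cannot be recovered from the number.

```ruby
require 'digest/md5'

# Hypothetical helper: interprets the 128-bit MD5 digest as one big integer.
def md5_integer(str)
  Digest::MD5.hexdigest(str).to_i(16)
end

id = md5_integer('Man lands on the moon')
puts id              # same value on every run and every machine
puts id.bit_length   # at most 128, so too wide for a 64-bit integer column
```
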

Regards
Jan

···

Justus Ohlhaver <ohlhaver@gmail.com> wrote:

Thanks for your help. No I need to turn the string which would usually
be a headline ('Man lands on the moon') into a unique number. The
purpose is to speed up my database queries when checking whether an
entry with the same headline already exists in the db.

If you just want to check for duplicates, just store a SHA1 hash of the string:
(AR example follows, obviously you can use any ORM or plain SQL):

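require "digest/sha1"  # the standalone "sha1" library is gone in newer Rubies

headline = "Man lands on the moon"
# hexdigest returns a printable string, which is easier to store than the raw bytes
Article.create(:headline => headline, :uniq_hash => Digest::SHA1.hexdigest(headline))

...
later
....

you want to decide whether new_headline already exists:

Article.create( ... ) unless Article.find_by_uniq_hash(Digest::SHA1.hexdigest(new_headline))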
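The same check-before-insert idea can be tried without a database. In this minimal sketch the class HeadlineIndex is a hypothetical in-memory stand-in for the table's uniq_hash column, not an ActiveRecord API.

```ruby
require 'digest/sha1'
require 'set'

# Hypothetical stand-in for a table with a unique hash column.
class HeadlineIndex
  def initialize
    @seen_hashes = Set.new
  end

  # Returns true if the headline was new and has been recorded,
  # false if a headline with the same SHA1 digest was seen before.
  def add?(headline)
    !@seen_hashes.add?(Digest::SHA1.hexdigest(headline)).nil?
  end
end

index = HeadlineIndex.new
puts index.add?("Man lands on the moon")   # true
puts index.add?("Man lands on the moon")   # false
```
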

HTH,
Peter

You need to generate a SHA1 or MD5 hash from your string.

require 'digest/md5'
d = Digest::MD5.new
d.update("if you believe it, they have put a man on the moon")
uniq = d.hexdigest

···

-----Original Message-----
From: ohlhaver@gmail.com [mailto:ohlhaver@gmail.com]

Thanks for your help. No I need to turn the string which
would usually be a headline ('Man lands on the moon') into a
unique number. The purpose is to speed up my database queries
when checking whether an entry with the same headline already
exists in the db.
Justus

Viewing your problem from a theoretical point of view, something similar
exists in compression theory, namely arithmetic coding. Of course,
that is not convenient for your purposes.

···

I'm going to suggest what Todd Benson and Rolando Abarca suggested, which
is to just work with strings in the database. Don't bother with computing
some kind of (possibly unique) hash. Use a CREATE INDEX statement to
index the headline field, and you'll probably never notice a speed
difference between your roundabout method and feeding in the string
directly to the database.

--Ken


Unless I'm missing something here, strings are just numbers in order.
Why encode/encrypt?

Most db's should handle natural keys.

If absolutely necessary to store as number strings (I can't see why),
look at #pack and #unpack.

Todd
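One way to read the "strings are just numbers" remark is to treat the bytes of the string as digits of a base-256 number, using unpack and pack. The helper names below are invented for this sketch, and it assumes UTF-8 text without leading NUL bytes.

```ruby
# Hypothetical helper: folds the bytes of the string into one integer.
def string_to_integer(str)
  str.unpack('C*').inject(0) { |number, byte| (number << 8) | byte }
end

# Hypothetical helper: splits the integer back into bytes.
def integer_to_string(number)
  bytes = []
  while number > 0
    bytes.unshift(number & 0xff)
    number >>= 8
  end
  # pack returns a binary string; this sketch assumes the text was UTF-8.
  bytes.pack('C*').force_encoding('UTF-8')
end

id = string_to_integer("Peter")
puts id
puts integer_to_string(id)   # Peter
```

The number grows by 8 bits per byte of input, so anything longer than 8 bytes no longer fits into a 64-bit integer column.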

···

On Wed, Nov 12, 2008 at 8:47 AM, Jan Friedrich <janfri.rubyforge@gmail.com> wrote:

Ken Bloom wrote:

I'm going to suggest what Todd Benson and Rolando Abarca suggested, which
is to just work with strings in the database. Don't bother with computing
some kind of (possibly unique) hash. Use a CREATE INDEX statement to
index the headline field, and you'll probably never notice a speed
difference between your roundabout method and feeding in the string
directly to the database.

--Ken

Thanks again for all your help, everyone!

I have already made one small test using an additional integer column
instead of the original headline string. To convert the headline string
into an integer value I used the .hash method. The db I'm using is
MySQL. Using a very small sample of entries (about 1000), I found
virtually no difference in the time it took to check the entire
table for existing entries when searching by the string column vs.
the integer column. If there is any difference in time, it is less
than 1%. Considering that an additional computation (the .hash method)
is performed when using the integer column, one could maybe assume that
the integer column by itself must be slightly faster for the database
to check. In any case, I am going to stick with the original string
column for the headline field for now.

I will try to optimize the table by indexing the headline field as
suggested. One question regarding this: can this be done from Rails, or
are these MySQL commands ('CREATE INDEX' etc.)?

Thanks again for all the help!
Justus

and found virtually no difference in the time it took to compare about
100,000 rows in MySQL.
using an integer value which was derived using 'headline'.hash

Again, thanks everybody

···

On Wed, 12 Nov 2008 10:22:05 -0500, Justus Ohlhaver wrote:

Todd Benson wrote:

Unless I'm missing something here, strings are just numbers in order.
Why encode/encrypt?

Most db's should handle natural keys.

If absolutely necessary to store as number strings (I can't see why),
look at #pack and #unpack.

Todd

Thanks a lot everybody. I'm really impressed by the quick and useful
feedback.

Todd, I was told that searching by integers instead of strings would
speed up performance when using large mysql tables. Is that not so?

Justus
Unless I'm missing something here, strings are just numbers in order.
Why encode/encrypt?

You're right:
'Peter'.to_i(36) # => 42681699

But you will run into problems with longer strings if your DBMS doesn't
have integers of arbitrary length.
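Besides the size problem, base-36 parsing has two further limits worth knowing, shown in this small sketch: it ignores case, and it stops at the first character that is not a digit or letter, so a headline with spaces is cut off after the first word.

```ruby
puts 'Peter'.to_i(36)     # 42681699
puts 42681699.to_s(36)    # peter (the capital P is lost)

# Parsing stops at the first space, so only "Man" is converted.
puts 'Man lands on the moon'.to_i(36) == 'Man'.to_i(36)   # true
```
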

Most db's should handle natural keys.

This is right. I didn't address the db part in my post.

If absolutely necessary to store as number strings (I can't see why),
look at #pack and #unpack.

This would also not work for larger strings (see above).

HTH,
Jan

···

Todd Benson <caduceass@gmail.com> wrote:

I will try to optimize the table indexing the headline field as
suggested. One question regarding this: Can this be done from rails or
are these mysql commands ('CREATE INDEX' etc.)?

Yeah, you can do it from Rails migrations:

http://apidock.com/rails/ActiveRecord/Migration, section 'Available transformations':

add_index(table_name, column_names, options):
Adds a new index with the name of the column. Other options include :name and :unique (e.g. { :name => "users_name_index", :unique => true }).

Cheers,
Peter

To be honest, I know almost nothing about mysql. I will say, however,
that you should try natural keys and see how the performance works
(testing). PostgreSQL, for example, claims you gain no more
performance on any natural key (be it integer, character, otherwise).
The true bottleneck is almost always in the application. But, I don't
know your exact situation.

From what you have said, it seems like you are looking for a primary
key that's unique and fast. Most db's that are set up correctly do a
"behind-the-scenes" lookup for your key; which means that there is an
ID (number) assigned to your element. The search is definitely what
people are concerned about, but having a string turned into a number
won't help you there, unless it's like a password or something.

If you want the string compacted, then follow some of the other suggestions.

hth,
Todd

···

On Wed, Nov 12, 2008 at 9:06 AM, Justus Ohlhaver <ohlhaver@gmail.com> wrote:
