String#hash changed in Ruby 1.9?

Hi all,
In Ruby 1.8.7:
david@trince ~$ ruby -e 'puts "abc".hash'
833038373
david@trince ~$ ruby -e 'puts "abc".hash'
833038373
david@trince ~$ ruby -e 'puts "abc".hash'
833038373

[always the same number]

In Ruby 1.9.1:
david@trince ~$ ruby -e 'puts "abc".hash'
402929305
david@trince ~$ ruby -e 'puts "abc".hash'
-403532784
david@trince ~$ ruby -e 'puts "abc".hash'
-650364342

What happened? Is this intentional? Rationale? Any tips on how to replace it?


What happened? Is this intentional?

1.9 uses MurmurHash (http://murmurhash.googlepages.com/) with a random seed
that is generated once per application run.
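To illustrate why a per-run seed changes the result, here is a plain-Ruby sketch of 32-bit MurmurHash2 as published by Austin Appleby. It is only an illustration, not Ruby's internal C implementation, whose exact variant and seed handling may differ; `murmur2` is a hypothetical helper name. For a fixed input every step is invertible in the running state, so two different 32-bit seeds always give different outputs, which is exactly what the 1.9.1 runs above show.

```ruby
# Sketch of 32-bit MurmurHash2 (illustrative; not Ruby's internal code).
M      = 0x5bd1e995
R      = 24
MASK32 = 0xffff_ffff

def murmur2(str, seed)
  bytes = str.unpack('C*')          # raw bytes of the string
  len   = bytes.length
  h     = (seed ^ len) & MASK32     # the seed enters here
  i     = 0

  # Mix 4 bytes at a time, read as a little-endian 32-bit word.
  while len - i >= 4
    k = bytes[i] | (bytes[i + 1] << 8) | (bytes[i + 2] << 16) | (bytes[i + 3] << 24)
    k = (k * M) & MASK32
    k ^= k >> R
    k = (k * M) & MASK32
    h = (h * M) & MASK32
    h ^= k
    i += 4
  end

  # Handle the last 1-3 bytes (mirrors the C switch fall-through).
  rest = len - i
  h ^= bytes[i + 2] << 16 if rest >= 3
  h ^= bytes[i + 1] << 8  if rest >= 2
  if rest >= 1
    h ^= bytes[i]
    h = (h * M) & MASK32
  end

  # Final avalanche.
  h ^= h >> 13
  h = (h * M) & MASK32
  h ^= h >> 15
  h
end

# Same seed: same value every run. Fresh random seed: a different value per run.
puts murmur2("abc", 12345)
puts murmur2("abc", rand(2**32))
```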

Any tips on how to replace it?

What does it hurt if the hash value of a string does not remain constant
between runs of the application?

HTH,
Sebastian

···

On Monday 04 May 2009 16:22:01, David Palm wrote:

Any tips on how to replace it?

What does it hurt if the hash value of a string does not remain constant
between runs of the application?

In my case it's pretty bad. I use it in a command-line utility to cache rake tasks. I create one cache file for each directory, naming them using the String#hash of the full path (Dir.pwd.hash). If the hash is different the next time the program runs, the cache lookup fails (and I get a new cache file instead of the old one).

So, I don't need anything fancy, just an equivalent to Dir.pwd.hash that stays consistent. Do I need to MD5 it? Feels like overkill. Why was this changed in the first place?
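One way to answer "Do I need to MD5 it?" is a short sketch with Ruby's standard Digest::MD5, which is not seeded per process, so the same path maps to the same name in every run. `cache_file_for`, `short_cache_key`, and the `~/.rake_cache` location are hypothetical. Zlib.crc32 is shown as a cheaper stable alternative, at the cost of only 32 bits.

```ruby
require 'digest/md5'
require 'zlib'

# Hypothetical helper: a stable cache file name for a directory path.
# MD5 depends only on the input bytes, so it is the same in every run.
def cache_file_for(dir, cache_root)
  File.join(cache_root, Digest::MD5.hexdigest(dir) + ".cache")
end

# Cheaper and also stable, but only 32 bits wide, so collisions are
# more likely than with MD5's 128 bits.
def short_cache_key(dir)
  Zlib.crc32(dir).to_s(16)
end

cache_root = File.join(ENV.fetch("HOME", "."), ".rake_cache")  # assumed location
puts cache_file_for(Dir.pwd, cache_root)
```

Computing MD5 over one short path string is cheap, so "overkill" is mostly a matter of taste here.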

Any tips on how to replace it?

What does it hurt if the hash value of a string does not remain constant between runs of the application?

In my case it's pretty bad. I use it in a command-line utility to cache rake tasks. I create one cache file for each directory, naming them using the String#hash of the full path (Dir.pwd.hash). If the hash is different the next time the program runs, the cache lookup fails (and I get a new cache file instead of the old one).

Hm... But you do admit that this is a bit abusive, don't you? Especially since there is no guarantee that you won't get any collisions with a hash value like the one returned by #hash.
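The collision risk can be put into rough numbers. For an ideal b-bit hash, the expected number of colliding pairs among n distinct paths is n(n-1)/2 divided by 2^b, and by the union bound that is also an upper bound on the chance of any collision. The sketch assumes ideal hash behaviour. The 32-bit and 128-bit widths are illustrative (a #hash-sized value versus MD5), and `expected_collisions` is a hypothetical helper.

```ruby
# Expected number of colliding pairs among n distinct inputs under an ideal
# hash with `bits` output bits: C(n, 2) / 2**bits. This also upper-bounds the
# probability that at least one collision occurs.
def expected_collisions(n, bits)
  n * (n - 1) / 2.0 / 2**bits
end

[32, 128].each do |bits|
  printf("%3d bits, 10_000 paths: %.3g\n", bits, expected_collisions(10_000, bits))
end
```

With 10,000 directories that comes to roughly 0.012 for 32 bits and on the order of 1e-31 for 128 bits.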

How about storing your cache files with a fixed name in the original directory? Or have a file with metadata (mapping from path to cache file name)?
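Robert's second suggestion, a metadata file mapping each path to a cache file name, might look like the minimal sketch below. It assumes a YAML index file, sequential `cache-N.dat` names, and one process at a time (there is no file locking); `cache_name_for` is a hypothetical helper. The lookup key is the full path itself, so no hashing and no collisions are involved.

```ruby
require 'yaml'

# Hypothetical helper: look up (or allocate) the cache file name for +path+
# in a YAML index that maps full directory paths to cache file names.
def cache_name_for(index_file, path)
  # An empty YAML file loads as false, hence the `|| {}`.
  index = File.exist?(index_file) ? (YAML.load_file(index_file) || {}) : {}
  return index[path] if index.key?(path)

  # Sequential names stay unique as long as entries are never removed.
  name = "cache-#{index.size}.dat"
  index[path] = name
  File.open(index_file, "w") { |f| f.write(index.to_yaml) }
  name
end

# Example (assumed location):
#   cache_name_for(File.join(ENV.fetch("HOME", "."), ".rake_cache", "index.yml"), Dir.pwd)
```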

So, I don't need anything fancy, just an equivalent to Dir.pwd.hash that stays consistent. Do I need to MD5 it? Feels like overkill. Why was this changed in the first place?

That's an interesting question. I'm curious as well. Maybe the change is just a side effect of a new, supposedly better, hashing algorithm.

Kind regards

  robert

···

On 04.05.2009 16:48, David Palm wrote:

Any tips on how to replace it?

What does it hurt if the hash value of a string does not remain
constant between runs of the application?

In my case it's pretty bad. I use it in a command-line utility to
cache rake tasks. I create one cache file for each directory, naming
them using the String#hash of the full path (Dir.pwd.hash). If the
hash is different the next time the program runs, the cache lookup
fails (and I get a new cache file instead of the old one).

Hm... But you do admit that this is a bit abusive, don't you?
Especially since there is no guarantee that you won't get any
collisions with a hash value like the one returned by #hash.

Oh yes, it's just a quick and convenient way of doing it. Dunno if I'd call it "abusive", but it's certainly not military-grade programming...

How about storing your cache files with a fixed name in the original
directory? Or have a file with metadata (mapping from path to cache
file name)?

A fixed name won't work; most directories are SCM-tracked, so it'd be a mess to keep the cache files out of the way. One big(ish) cache file might work. Maybe even an SQLite db. I'll have to run some benchmarks on that.
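For the "one big(ish) cache file" idea, Ruby's PStore library is one option that needs no extra gem. It is swapped in here for the SQLite idea purely for illustration, and no benchmarks were run. The sketch assumes the cached data is a Marshal-able Ruby object such as an array of task names; `TaskCache` is a hypothetical class name.

```ruby
require 'pstore'

# Hypothetical sketch: keep every directory's cached tasks in ONE file,
# keyed by the full directory path (no hashing, so no collisions).
class TaskCache
  def initialize(file)
    @store = PStore.new(file)
  end

  # Read-only transaction: returns nil when the directory is not cached yet.
  def fetch(dir)
    @store.transaction(true) { @store[dir] }
  end

  # Read-write transaction: the change is written to disk when the block ends.
  def store(dir, tasks)
    @store.transaction { @store[dir] = tasks }
  end
end

# Example (assumed location):
#   cache = TaskCache.new(File.join(ENV.fetch("HOME", "."), ".rake_tasks.pstore"))
#   cache.store(Dir.pwd, ["build", "test"])
```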

So, I don't need anything fancy, just an equivalent to Dir.pwd.hash
that stays consistent. Do I need to MD5 it? Feels like overkill. Why
was this changed in the first place?

That's an interesting question. I'm curious as well. Maybe the
change is just a side effect of a new, supposedly better, hashing
algorithm.

The link Sebastian provided (http://murmurhash.googlepages.com/) was interesting but not exhaustive, and I still don't know when, how, or why the behaviour was changed. Perhaps the mailing list archives will tell?

···

On Tue, 5 May 2009 02:15:09 +0900, Robert Klemme wrote:

On 04.05.2009 16:48, David Palm wrote: