What’s the standard way of implementing #hash for value objects in Ruby?

Hi!

What’s the standard way of implementing #hash for value objects in Ruby?

For

class A
  def initialize(a, b, c)
    @a, @b, @c = a, b, c
  end
end

is it

def A.hash
  self.class.hash ^ @a.hash ^ @b.hash ^ @c.hash
end

or

def A.hash
  self.class.hash ^ [@a, @b, @c].hash
end

or is it something else?

No classes in the standard library use self.class.hash, but I think it
makes sense to use it.

I don't know how good this is, but if your objects have a unique string
representation, you could use example_object.to_s.hash. I'm sure there
must be more efficient solutions, but this solution has the advantage
that it's pretty easy to implement and should reduce the possibility of
collisions if you use your objects as keys in a Hash.

-Jeremy

···

On 12/29/2011 08:10 AM, Nikolai Weibull wrote:

Hi!

What’s the standard way of implementing #hash for value objects in Ruby?

I think you can't access instance variables from a class method, so
both methods are kinda useless.

···

2011/12/29 Nikolai Weibull <now@bitwi.se>:

Hi!

What’s the standard way of implementing #hash for value objects in Ruby?

For

class A
def initialize(a, b, c)
@a, @b, @c = a, b, c
end
end

is it

def A.hash
self.class.hash ^ @a.hash ^ @b.hash ^ @c.hash
end

or

def A.hash
self.class.hash ^ [@a, @b, @c].hash
end

or is it something else?

No classes in the standard library use self.class.hash, but I think it
makes sense to use it.

Ugh, that should of course be

class A
  def hash
     …
  end
end

···

On Thu, Dec 29, 2011 at 15:52, Gunther Diemant <g.diemant@gmx.net> wrote:

I think you can't access instance variables from a class method, so
both methods are kinda useless.

Easy way:

class A
  def hash
    [@a, @b, @c].hash
  end
end

Easiest:

# also gives you == and eql?
A = Struct.new :a, :b, :c

Kind regards

robert

···

On Thu, Dec 29, 2011 at 4:06 PM, Nikolai Weibull <now@bitwi.se> wrote:

On Thu, Dec 29, 2011 at 15:52, Gunther Diemant <g.diemant@gmx.net> wrote:

I think you can't access instance variables from a class method, so
both methods are kinda useless.

Ugh, that should of course be

class A
def hash

end
end

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

If there isn’t one, perhaps one should be agreed upon?

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Another is the XOR of the hash value of the object’s class and the
hash value of an Array containing its instance variables.

···

On Fri, Dec 30, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:

Easy way:

Hi Nikolai,

Writing a good hash function is actually language neutral. Some
literature suggested the following form:

    p = <a big prime number> # e.g. p = 31
    hash = 1
    for each object you will use in equal do
       hash = hash * p + object.hash
    end

I am not sure if this fits your desire of a ruby-flavor hash
implementation, but I do use this consistently in all my Java
projects.

···

On Fri, Dec 30, 2011 at 7:47 AM, Nikolai Weibull <now@bitwi.se> wrote:

On Fri, Dec 30, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:

Easy way:

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

If there isn’t one, perhaps one should be agreed upon?

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Another is the XOR of the hash value of the object’s class and the
hash value of an Array containing its instance variables.

If there is one, I don't know what it is, which implies there isn't one.

I usually just delegate to one of my attributes' hash methods (e.g. for a
user, I might use its user's name's hash). I'm not sure what advantage
would be gained by establishing a standard.

···

On Thu, Dec 29, 2011 at 5:47 PM, Nikolai Weibull <now@bitwi.se> wrote:

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

If there isn’t one, perhaps one should be agreed upon?

Easy way:

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

I have covered this a bit earlier:
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

Boils down to that the hash code should be derived from hash codes of
all fields used as key fields in determining equivalence.

If there isn’t one, perhaps one should be agreed upon?

Why do you think a standard is needed? There are some well known
requirements for good hash codes and how they are calculated and since
most of the time Hash keys are from the same class whatever algorithm
chosen works.

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Not good, because then the same value in different fields has
identical influence on the hash code. Consequently likelihood of
collisions increases and thus performance of Hash storage and
retrieval may decrease.

Another is the XOR of the hash value of the object’s class and the
hash value of an Array containing its instance variables.

Better because that considers position.

But still, I don't see the need. Note also that a proper Hash key
usually should be immutable because changing them causes all sorts of
trouble if not done carefully. And most of the time Hash keys are
String, Symbol, Fixnum and the like - which all do have their #hash
implementation already.

Kind regards

robert

···

On Fri, Dec 30, 2011 at 12:47 AM, Nikolai Weibull <now@bitwi.se> wrote:

On Fri, Dec 30, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

You should be doing the same attributes you use in an equality test. I usually throw all the attributes in an array and ask for it's hash.

···

On Dec 29, 2011, at 21:25 , Josh Cheek wrote:

On Thu, Dec 29, 2011 at 5:47 PM, Nikolai Weibull <now@bitwi.se> wrote:

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

If there isn’t one, perhaps one should be agreed upon?

If there is one, I don't know what it is, which implies there isn't one.

I usually just delegate to one of my attributes' hash methods (e.g. for a
user, I might use its user's name's hash). I'm not sure what advantage
would be gained by establishing a standard.

So, again, what’s the standard way of implementing #hash for value objects in Ruby?

If there is one, I don't know what it is, which implies there isn't one.

That’s a very bold statement to make, considering that you don’t seem
to know how #hash should be implemented:

I usually just delegate to one of my attributes' hash methods (e.g. for a
user, I might use its user's name's hash). I'm not sure what advantage
would be gained by establishing a standard.

If there was a standard way, then no one would have to ask if there
was one. It would make implementing #hash, which you should/must do
if you implement #==, trivial, as there’s then only one way to do so.
It would make hashing conflicts less likely, as the correct set of
parameters would be used, whatever the object, and a uniform
distribution could be established across different classes of objects.

Perhaps Ruby’s internal hashing functions (rb_hash_start,
rb_hash_uint, rb_hash_end) could be exposed in some new object?

Note that these functions are currently not being used in a uniform
way internally. For example, Struct starts with the #hash of the
object’s class, while Array starts with #length.

···

On Fri, Dec 30, 2011 at 06:25, Josh Cheek <josh.cheek@gmail.com> wrote:

On Thu, Dec 29, 2011 at 5:47 PM, Nikolai Weibull <now@bitwi.se> wrote:

I found this recent discussion interesting in a somewhat related
manner, dealing with hash tables:
http://groups.google.com/group/nodejs/browse_thread/thread/d34ed2ec3526db5a

···

On Fri, Dec 30, 2011 at 9:02 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

On Fri, Dec 30, 2011 at 12:47 AM, Nikolai Weibull <now@bitwi.se> wrote:

On Fri, Dec 30, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:

Easy way:

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

I have covered this a bit earlier:
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

Boils down to that the hash code should be derived from hash codes of
all fields used as key fields in determining equivalence.

If there isn’t one, perhaps one should be agreed upon?

Why do you think a standard is needed? There are some well known
requirements for good hash codes and how they are calculated and since
most of the time Hash keys are from the same class whatever algorithm
chosen works.

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Not good, because then the same value in different fields has
identical influence on the hash code. Consequently likelihood of
collisions increases and thus performance of Hash storage and
retrieval may decrease.

Another is the XOR of the hash value of the object’s class and the
hash value of an Array containing its instance variables.

Better because that considers position.

But still, I don't see the need. Note also that a proper Hash key
usually should be immutable because changing them causes all sorts of
trouble if not done carefully. And most of the time Hash keys are
String, Symbol, Fixnum and the like - which all do have their #hash
implementation already.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

I have covered this a bit earlier:
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

Boils down to that the hash code should be derived from hash codes of
all fields used as key fields in determining equivalence.

Precisely. And how do you get a single value from the hash values of
those fields?

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Not good, because then the same value in different fields has
identical influence on the hash code. Consequently likelihood of
collisions increases and thus performance of Hash storage and
retrieval may decrease.

This is exactly what you suggest in your article, with the addition
(or rather XOR) of the hash value of the object’s class.

But still, I don't see the need. Note also that a proper Hash key
usually should be immutable because changing them causes all sorts of
trouble if not done carefully.

Hence the use of “value object” in my question.

And most of the time Hash keys are
String, Symbol, Fixnum and the like - which all do have their #hash
implementation already.

But how do you combine them? That’s what this whole thread has been
about. I realize now that I made quite a few assumptions about what I
thought readers would understand from my original question that may
not have been as obvious to them as they were to me. First, let me
apologize for this lack of clarity. Second, let me rephrase my
question and add some additional context and examples:

What algorithm should one employ in the calculation of the hash value
of an arbitrary value object?

As an example, what algorithm should one employ in the calculation of
the hash value of an immutable class A containing three immutable
instance variables @a, @b, @c that contain, respectively, a String, a
Fixnum, and a Symbol and that are all used in the calculation of #==?

The semi-standard solutions seem to be

class A
  def hash
    @a ^ @b ^ @c
  end
end

and

class A
  def hash
    [@a, @b, @c].hash
  end
end

These are the two main implementations in the Standard Library,
anyway. The first is also the solution proposed by Robert Klemme in

http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

I would claim that the algorithm should take the class of the object
into account as well, both for consistency with #== (which should
check equality of the classes of the objects being compared) and for
added entropy.

Internally, Ruby (primarily) uses three C functions for the
calculation of combined hash values, namely rb_hash_start,
rb_hash_uint, and rb_hash_end. As an example, the hash value of a
Struct is calculated (in Ruby with these three functions wrapped in an
imaginary module C) as

class Struct
  def hash
    C.rb_hash_end(reduce(C.rb_hash_start(self.class.hash)){ |h, v|
C.rb_hash_uint(h, v.hash) })
  end
end

Might it be useful to have Ruby expose a way to perform this
calculation from the Ruby realm so that other classes may employ this
algorithm?

···

On Fri, Dec 30, 2011 at 15:02, Robert Klemme <shortcutter@googlemail.com> wrote:

On Fri, Dec 30, 2011 at 12:47 AM, Nikolai Weibull <now@bitwi.se> wrote:

Maybe you will be interested in reading "Introduction to Algorithms".
There is a whole chapter discussing how to implement hash tables
including criteria of good hash functions.

···

On Fri, Dec 30, 2011 at 5:09 PM, Nikolai Weibull <now@bitwi.se> wrote:

On Fri, Dec 30, 2011 at 06:25, Josh Cheek <josh.cheek@gmail.com> wrote:

On Thu, Dec 29, 2011 at 5:47 PM, Nikolai Weibull <now@bitwi.se> wrote:

So, again, what’s the standard way of implementing #hash for value objects in Ruby?

If there is one, I don't know what it is, which implies there isn't one.

That’s a very bold statement to make, considering that you don’t seem
to know how #hash should be implemented:

I usually just delegate to one of my attributes' hash methods (e.g. for a
user, I might use its user's name's hash). I'm not sure what advantage
would be gained by establishing a standard.

If there was a standard way, then no one would have to ask if there
was one. It would make implementing #hash, which you should/must do
if you implement #==, trivial, as there’s then only one way to do so.
It would make hashing conflicts less likely, as the correct set of
parameters would be used, whatever the object, and a uniform
distribution could be established across different classes of objects.

Perhaps Ruby’s internal hashing functions (rb_hash_start,
rb_hash_uint, rb_hash_end) could be exposed in some new object?

Note that these functions are currently not being used in a uniform
way internally. For example, Struct starts with the #hash of the
object’s class, while Array starts with #length.

>> So, again, what’s the standard way of implementing #hash for value
objects in Ruby?

> If there is one, I don't know what it is, which implies there isn't one.

That’s a very bold statement to make, considering that you don’t seem
to know how #hash should be implemented:

Not sure what gives this impression, while learning algorithms I've
implemented a couple different hash algorithms (granted the keys were
always strings).

I usually just delegate to one of my attributes' hash methods (e.g. for a
> user, I might use its user's name's hash). I'm not sure what advantage
> would be gained by establishing a standard.

If there was a standard way, then no one would have to ask if there
was one.

That would change the answer, not the question.

It would make implementing #hash, which you should/must do

if you implement #==, trivial, as there’s then only one way to do so.

Delegating to the most relevant attribute still seems more trivial.

It would make hashing conflicts less likely, as the correct set of

parameters would be used, whatever the object, and a uniform
distribution could be established across different classes of objects.

It still seems that delegating to an attribute like an id or a string which
is likely unique, would yield sufficient uniqueness, and probably be more
performant as it would require fewer calculations. You may be more likely
to have a collision, but I'd expect that the likelihood of this is not
increased much, and the benefit saved on calculations and on complexity of
implementation would more than make up for it.

I suppose one reason I take this view could be that the only viable
scenarios I can think of for making some arbitrary object into a hash key
are for sets and Array#uniq. For me, these scenarios are exceedingly rare,
and have always been trivially replaced with alternative keys.

···

On Fri, Dec 30, 2011 at 3:09 AM, Nikolai Weibull <now@bitwi.se> wrote:

On Fri, Dec 30, 2011 at 06:25, Josh Cheek <josh.cheek@gmail.com> wrote:
> On Thu, Dec 29, 2011 at 5:47 PM, Nikolai Weibull <now@bitwi.se> wrote:

This was the main reason for releasing Ruby 1.8.7p357.

···

On Fri, Dec 30, 2011 at 15:42, Zachary Scott <zachary@zacharyscott.net> wrote:

I found this recent discussion interesting in a somewhat related
manner, dealing with hash tables:
http://groups.google.com/group/nodejs/browse_thread/thread/d34ed2ec3526db5a

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

I have covered this a bit earlier:
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

Boils down to that the hash code should be derived from hash codes of
all fields used as key fields in determining equivalence.

Precisely. And how do you get a single value from the hash values of
those fields?

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Not good, because then the same value in different fields has
identical influence on the hash code. Consequently likelihood of
collisions increases and thus performance of Hash storage and
retrieval may decrease.

This is exactly what you suggest in your article, with the addition
(or rather XOR) of the hash value of the object’s class.

Right. I should probably have given more detail in this thread with
regard to what goals this is "worse" than the simple XOR of member
hash values.

But still, I don't see the need. Note also that a proper Hash key
usually should be immutable because changing them causes all sorts of
trouble if not done carefully.

Hence the use of “value object” in my question.

I can't see that "value object" implies "immutable".

http://c2.com/cgi/wiki?ValueObject

And most of the time Hash keys are
String, Symbol, Fixnum and the like - which all do have their #hash
implementation already.

But how do you combine them? That’s what this whole thread has been
about. I realize now that I made quite a few assumptions about what I
thought readers would understand from my original question that may
not have been as obvious to them as they were to me.

There you have learned something about communication. :slight_smile:

First, let me apologize for this lack of clarity.

No sweat.

Second, let me rephrase my
question and add some additional context and examples:

What algorithm should one employ in the calculation of the hash value
of an arbitrary value object?

There is no single standard (or best) way. The fact that different
languages (Java, Ruby...) have different means to calculate combined
hash values which all seem to work pretty well indicates this IMHO.

As an example, what algorithm should one employ in the calculation of
the hash value of an immutable class A containing three immutable
instance variables @a, @b, @c that contain, respectively, a String, a
Fixnum, and a Symbol and that are all used in the calculation of #==?

In this case I would choose the most straightforward solution: XOR of
all member hash codes. This is sufficient since you indicated that no
value can show up in two different fields.

The semi-standard solutions seem to be

class A
def hash
@a ^ @b ^ @c
end
end

and

class A
def hash
[@a, @b, @c].hash
end
end

These are the two main implementations in the Standard Library,
anyway. The first is also the solution proposed by Robert Klemme in

http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

And it does work indeed. But keep in mind that the article is not a
specific article about calculating hash codes but rather about what
has to be done to make a class "complete". Hash codes are just one
aspect.

I would claim that the algorithm should take the class of the object
into account as well, both for consistency with #== (which should
check equality of the classes of the objects being compared) and for
added entropy.

Considering the most common case that all keys in a Hash are of the
same class it is questionable whether this will really add entropy
since then for all instances the class's hash will be the same. You
pay a price for additional calculation though.

Internally, Ruby (primarily) uses three C functions for the
calculation of combined hash values, namely rb_hash_start,
rb_hash_uint, and rb_hash_end. As an example, the hash value of a
Struct is calculated (in Ruby with these three functions wrapped in an
imaginary module C) as

class Struct
def hash
C.rb_hash_end(reduce(C.rb_hash_start(self.class.hash)){ |h, v|
C.rb_hash_uint(h, v.hash) })
end
end

Might it be useful to have Ruby expose a way to perform this
calculation from the Ruby realm so that other classes may employ this
algorithm?

Not sure whether we would really gain that much. Those calls are
efficient in C but if you provide that mechanism in Ruby land you will
have multiple calls, e.g.

def hash
  h = Fixnum::HASH_START
  h = h.combine_hash(@a)
  h = h.combine_hash(@b)
  h = h.combine_hash(@c)
end

It may turn out to be more efficient to just do

def hash
  [@a, @b, @c].hash
end

or simply use instances of Struct generated classes and reuse their
implementation.

Requirements for a hash function are pretty clear (and pretty much standard):

1. It should make it as unlikely as possible that two objects which
are not equivalent have the same hash value.
2. Calculation should be as cheap (in terms of CPU cycles) as possible.

Hash function - Wikipedia has more detail

As you can see these are no hard numeric or formal requirements and
they both have potential to contradict each other. So it depends on
the situation what algorithm one chooses. It is always a trade off
between these two goals and there is no single solution which is best
under all circumstances.

I have attached a stats generator for existing Hashes. So anybody
interested can play with it and find out the distribution of hash
values for a given Hash.

Kind regards

robert

hash_stat.rb (287 Bytes)

···

On Fri, Dec 30, 2011 at 3:43 PM, Nikolai Weibull <now@bitwi.se> wrote:

On Fri, Dec 30, 2011 at 15:02, Robert Klemme <shortcutter@googlemail.com> wrote:

On Fri, Dec 30, 2011 at 12:47 AM, Nikolai Weibull <now@bitwi.se> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

What in what I’ve written in this thread suggests that I don’t know
anything about hash tables and hashing functions?

I have read said book, chapter 6 of The Art of Computer Programming,
and many other books and articles that cover this subject. I don’t
mention this to brag, but to point out the fact that this isn’t a
newbie asking a question on this mailing list.

···

On Fri, Dec 30, 2011 at 10:42, Yong Li <gilbertly@gmail.com> wrote:

Maybe you will be interested in reading "Introduction to Algorithms".
There is a whole chapter discussing how to implement hash tables
including criteria of good hash functions.

It would make implementing #hash, which you should/must do
if you implement #==, trivial, as there’s then only one way to do so.

Delegating to the most relevant attribute still seems more trivial.

Trivial – perhaps. Wrong – most definitely. That you don’t seem to
understand this is what tells me that you don’t understand how #hash
should be implemented.

I suppose one reason I take this view could be that the only viable
scenarios I can think of for making some arbitrary object into a hash key
are for sets and Array#uniq. For me, these scenarios are exceedingly rare,
and have always been trivially replaced with alternative keys.

How about having them as keys in an actual Hash (which is how Sets and
Array#uniq are currently implemented)?

···

On Fri, Dec 30, 2011 at 10:44, Josh Cheek <josh.cheek@gmail.com> wrote:

On Fri, Dec 30, 2011 at 3:09 AM, Nikolai Weibull <now@bitwi.se> wrote:

This was the main reason for releasing Ruby 1.8.7p357.

Oh, I hadn't realized until I just checked ruby-lang.org

http://www.ruby-lang.org/en/news/2011/12/28/denial-of-service-attack-was-found-for-rubys-hash-algorithm/