Q: How to use non-builtin class as hash key?

I have a class defined as:

···

###########################
class MyClass
    def initialize(s1, s2, s3)
        @string1 = s1
        @string2 = s2
        @string3 = s3
    end

    def hash
        return (@string1 + @string2 + @string3).hash
    end
end
###########################

Then I create two instances of this class:
###########################
obj1 = MyClass.new('a', 'b', 'c')
obj2 = MyClass.new('a', 'b', 'c')
###########################

If verify obj1 and obj2 do have the same hash value, I check that
'obj1.hash == obj2.hash'
returns true

Now i try to use those objects as hash key:
h[obj1] = 'abc'
h[obj2] = 'def'

I expected h contains only one item with value 'def', since obj1 and obj2
have the same hash value. But it turns out it has 2 items instead.

Did I miss something?

Jia Pu wrote:

I have a class defined as:

###########################
class MyClass
   def initialize(s1, s2, s3)
       @string1 = s1
       @string2 = s2
       @string3 = s3
   end

   def hash
       return (@string1 + @string2 + @string3).hash
   end
end
###########################

Then I create two instances of this class:
###########################
obj1 = MyClass.new('a', 'b', 'c')
obj2 = MyClass.new('a', 'b', 'c')
###########################

If verify obj1 and obj2 do have the same hash value, I check that
'obj1.hash == obj2.hash'
returns true

Now i try to use those objects as hash key:
h[obj1] = 'abc'
h[obj2] = 'def'

I expected h contains only one item with value 'def', since obj1 and obj2
have the same hash value. But it turns out it has 2 items instead.

Did I miss something?

You need to define #eql? in your class, since that is how Hash compares objects that hash the same:

class MyClass
    def contents
        [@string1, @string2, @string3]
    end

    def eql?(other)
        other.class == self.class and other.contents == self.contents
    end
end

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Thanks, that works.
I am wondering the requirement of #eql? is necessary. If the programmer
wants to have non-equal objects to have same hash value, it is totally up to
the programmer.

···

On 8/31/06, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

Jia Pu wrote:
> I have a class defined as:
>
> ###########################
> class MyClass
> def initialize(s1, s2, s3)
> @string1 = s1
> @string2 = s2
> @string3 = s3
> end
>
> def hash
> return (@string1 + @string2 + @string3).hash
> end
> end
> ###########################
>
> Then I create two instances of this class:
> ###########################
> obj1 = MyClass.new('a', 'b', 'c')
> obj2 = MyClass.new('a', 'b', 'c')
> ###########################
>
> If verify obj1 and obj2 do have the same hash value, I check that
> 'obj1.hash == obj2.hash'
> returns true
>
> Now i try to use those objects as hash key:
> h[obj1] = 'abc'
> h[obj2] = 'def'
>
> I expected h contains only one item with value 'def', since obj1 and
obj2
> have the same hash value. But it turns out it has 2 items instead.
>
> Did I miss something?
>

You need to define #eql? in your class, since that is how Hash compares
objects that hash the same:

class MyClass
    def contents
        [@string1, @string2, @string3]
    end

    def eql?(other)
        other.class == self.class and other.contents == self.contents
    end
end

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Jia Pu wrote:

Thanks, that works.
I am wondering the requirement of #eql? is necessary. If the programmer
wants to have non-equal objects to have same hash value, it is totally up to
the programmer.

I'm not sure what you are asking, but you can define #== and #eql? differently. In fact, ruby already does this in some cases:

irb(main):001:0> 1.0.eql? 1
=> false
irb(main):002:0> 1.0 == 1
=> true

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

$ri Object#hash
------------------------------------------------------------ Object#hash
     obj.hash => fixnum

···

On 9/1/06, Jia Pu <very.funny.j@gmail.com> wrote:

Thanks, that works.
I am wondering the requirement of #eql? is necessary. If the programmer
wants to have non-equal objects to have same hash value, it is totally up to
the programmer.

------------------------------------------------------------------------
     Generates a +Fixnum+ hash value for this object. This function must
     have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
     hash value is used by class +Hash+. Any hash value that exceeds the
     capacity of a +Fixnum+ will be truncated before being used.

Note that the requirement is that if a.eql?(b) then a.hash must be == b.hash

This is because hash is used as a test to avoid a potentially
expensive .eql? test, if two object's DON'T have the same hash value
they are assumed to be not equal.

A Hash (I'm assuming here based on how hash algorithms work in
general) uses the hash of the key to compute an initial slot to store
a reference to an object when you put the object, at a particular key
in the hash with []=. If that slot is empty that's where it goes. If
it's not, meaning that there's a key there with the same hash, then it
uses eql? to see if it's the right slot, and if not it's called a hash
collision, and the hash calculates a new slot and repeats the process.
Finding the object with a given key works basically the same way.

So the net of all this is, if you want hash to treat two different
instances of your class as the same then you must redefine eql?, and
whenever you redefine .eql? then you need to redefine hash to make
sure that the implication in the specification of Object#hash holds.
--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

"Jia Pu" <very.funny.j@gmail.com> writes:

Thanks, that works.
I am wondering the requirement of #eql? is necessary. If the programmer
wants to have non-equal objects to have same hash value, it is totally up to
the programmer.

I assume you are asking why defining .eql? is necessary. (You seem to
have dropped the word "why" in your question)

If two objects have the same hash value, this does not mean that they
should overwrite each other when used as keys! In fact, that would be
disasterous as hash collisions are a fact of life - has must be a
Fixnum, and there are only so many Fixnums to go around.

For example:

irb(main):001:0> h = Hash.new
=> {}
irb(main):002:0> h[1] = 'Value for 1'
=> "Value for 1"
irb(main):003:0> h[4294967298] = 'Value for a really big number'
=> "Value for a really big number"
irb(main):004:0> h[1]
=> "Value for 1"
irb(main):005:0> 1.hash
=> 3
irb(main):006:0> 4294967298.hash
=> 3

Fortunately, 1.eql?(4294967298) is false, so I can store different
values in my hash.

Note that the definition of .eql? and == do not need to have anything
to do with each other, or with the definition of ===. However, it is
generally very good practice to make sure that the following
implication chain holds:

(a.eql?(b)) implies (a == b) implies (a === b)

Also, eql? and == should be symmetric (that is, it should make no
difference switching a and b for those two methods).

···

--
s=%q( Daniel Martin -- martin@snowplow.org
       puts "s=%q(#{s})",s.map{|i|i}[1] )
       puts "s=%q(#{s})",s.map{|i|i}[1]