Ruby and Java equality usage

So the primary reason eql? is separate from == is that eql? needs to
be sure to follow hash. So now my question is: why isn't the default
implementation of eql? to compare hash instead of object_id? Then we
can override hash in cases where we want to, and not need to worry
about eql? unless we really need to...

Hash values don't have to be unique. Two distinct values may return
the same hash code, but should return false for eql?. Remember how hash
tables work from CS class -- if two hash codes are the same, the table
chains the values together in a linked list under the same hash
'bucket'. To retrieve a value it first finds the relevant hash bucket
via #hash, and then walks through the list sequentially using #eql? to
compare. So #eql? had better return false for two distinct values. On
the other hand, 'def hash; 1; end' is a perfectly correct albeit very
inefficient #hash implementation.
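
To make the point above concrete, here is a minimal sketch in which every key collides. The class name ConstantHashKey and its name attribute are made up for illustration; only #hash and #eql? matter. Lookups should still come out right, because Hash falls back to #eql? inside the shared bucket, just slowly.

```ruby
# Hypothetical key class: every instance reports the same hash code,
# so all keys collide and Hash must rely on #eql? to tell them apart.
class ConstantHashKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Legal but inefficient: a constant satisfies the rule that
  # eql? objects must have equal hash codes.
  def hash
    1
  end

  # Distinct values must answer false here, or they would be
  # treated as the same key.
  def eql?(other)
    other.is_a?(ConstantHashKey) && name == other.name
  end
end

table = {}
table[ConstantHashKey.new("a")] = 1
table[ConstantHashKey.new("b")] = 2

lookup_a = table[ConstantHashKey.new("a")]  # finds the entry stored under "a"
lookup_b = table[ConstantHashKey.new("b")]  # finds the entry stored under "b"
lookup_c = table[ConstantHashKey.new("c")]  # no eql? key, so nil
```

With many keys, every lookup degrades toward a linear scan, which is the "very inefficient" part.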

Steve

Ah, gotcha. Thanks for the refresher. :) Is it true then that
a.eql?(b) only if a.hash == b.hash? What's the damage in a.eql?(b)
returning true when a and b are in different buckets? As far as I can
tell, #eql? is only used internally by Hash; other code shouldn't be
calling it. So if a and b are in different buckets (different #hash
values), a.eql?(b) will never be called anyway. Relaxing that
restriction, can't we swing back the other way and ask again:

  Why doesn't the default implementation of Object#eql? just use #== internally?

E.g.:

  class Object
    def eql?(other)
      self == other
    end
  end
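
A small sketch of what is at stake in that question, not a statement of why Ruby was designed this way. The numeric part uses only built-in behaviour; the Label class is hypothetical and deliberately leaves #hash at its default while routing #eql? through a loose #==.

```ruby
# Built-in numerics: == converts across types, eql? does not.
loose  = (1 == 1.0)    # true
strict = 1.eql?(1.0)   # false

by_number = { 1 => :integer }
float_lookup = by_number[1.0]  # nil: 1.0 is not eql? to 1, so it is a different key

# Hypothetical class: eql? follows ==, but #hash is not overridden.
class Label
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Loose, case-insensitive equality.
  def ==(other)
    other.is_a?(Label) && text.casecmp?(other.text)
  end

  # The default proposed in the thread.
  def eql?(other)
    self == other
  end
end

a = Label.new("Ruby")
b = Label.new("ruby")
labels = { a => :found }

same_object  = labels[a]  # :found
equal_object = labels[b]  # a.eql?(b) is true, yet the default #hash values
                          # differ in practice, so eql? is never consulted
```

So an #eql? that answers true across different #hash values is harmless only in the sense that nobody asks it; the lookup still misses.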

Jacob Fugal

···

On 6/27/06, Molitor, Stephen L <Stephen.L.Molitor@erac.com> wrote:

> > So the primary reason eql? is separate from == is that eql? needs to
> > be sure to follow hash. So now my question is: why isn't the default
> > implementation of eql? to compare hash instead of object_id? Then we
> > can override hash in cases where we want to, and not need to worry
> > about eql? unless we really need to...
>
> Hash values don't have to be unique. Two distinct values may return
> the same hash code, but should return false for eql?. Remember how hash
> tables work from CS class -- if two hash codes are the same, the table
> chains the values together in a linked list under the same hash
> 'bucket'. To retrieve a value it first finds the relevant hash bucket
> via #hash, and then walks through the list sequentially using #eql? to
> compare. So #eql? had better return false for two distinct values. On
> the other hand, 'def hash; 1; end' is a perfectly correct albeit very
> inefficient #hash implementation.

I don't know if this is quite accurate. In the Java world, at least,
such an implementation is not good enough. Also, the known hash
implementations use both the hashCode() and equals() methods on key
objects (in that exact order). I think it should be the same in Ruby
(or pretty similar).

./alex

···

On 6/27/06, Jacob Fugal <lukfugl@gmail.com> wrote:

On 6/27/06, Molitor, Stephen L <Stephen.L.Molitor@erac.com> wrote:
> > So the primary reason eql? is separate from == is that eql? needs to
> > be sure to follow hash. So now my question is: why isn't the default
> > implementation of eql? to compare hash instead of object_id? Then we
> > can override hash in cases where we want to, and not need to worry
> > about eql? unless we really need to...
>
> Hash values don't have to be unique. Two distinct values may return
> the same hash code, but should return false for eql?. Remember how hash
> tables work from CS class -- if two hash codes are the same, the table
> chains the values together in a linked list under the same hash
> 'bucket'. To retrieve a value it first finds the relevant hash bucket
> via #hash, and then walks through the list sequentially using #eql? to
> compare. So #eql? had better return false for two distinct values. On
> the other hand, 'def hash; 1; end' is a perfectly correct albeit very
> inefficient #hash implementation.

Ah, gotcha. Thanks for the refresher. :) Is it true then that
a.eql?(b) only if a.hash == b.hash? What's the damage in a.eql?(b)
returning true when in different buckets? As far as I can tell, #eql?
is only used internally by Hash -- it shouldn't be being called by
other code. So if a and b are in different buckets (different #hash
values), a.eql?(b) will never be called anyway. Relaxing that
restriction, can't we swing back the other way and ask again:

  Why doesn't the default implementation of Object#eql? just use #== internally?

E.g.:

  class Object
    def eql?(other)
      self == other
    end
  end

Jacob Fugal

--
.w( the_mindstorm )p.
---
(http://themindstorms.blogspot.com)

I wonder if this has anything to do with

class A
  include Comparable
  def <=>(other)
    ...
  end
end

E.g., maybe you overrode #== (here indirectly, since Comparable defines #== in terms of #<=>) in such a way as to upset a hash without knowing it.
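
A minimal sketch of that scenario, assuming a made-up Version class. Comparable supplies #== from #<=>, while #eql? and #hash stay at their Object defaults, so two values that compare equal are still separate hash keys.

```ruby
# Hypothetical value class: equality comes from Comparable,
# but #eql? and #hash are left untouched.
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  def <=>(other)
    return nil unless other.is_a?(Version)
    [major, minor] <=> [other.major, other.minor]
  end
end

v1 = Version.new(1, 2)
v2 = Version.new(1, 2)

equal_by_comparable = (v1 == v2)   # true: Comparable#== is defined via <=>
equal_by_eql        = v1.eql?(v2)  # false: Object#eql? is still identity

releases = { v1 => "first" }
lookup_same  = releases[v1]  # "first"
lookup_equal = releases[v2]  # nil, because v2 is not eql? to v1
```

If Object#eql? delegated to #==, the last lookup would start depending on whether #hash had also been overridden to match.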

I dunno.

···

On Jun 27, 2006, at 1:45 PM, Jacob Fugal wrote:

Why doesn't the default implementation of Object#eql? just use #== internally?

E.g.:

class Object
   def eql?(other)
     self == other
   end
end

Jacob Fugal

Hmm, I'm not sure I follow. The implementation in Ruby is similar.
This is the basic operation for a fetch (hash[key]):

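  # Sketch only: hash.table and bins stand in for the internal structure.
  hash_val = key.hash
  bin = hash.table.bins[hash_val % hash.table.bins.size]
  found = bin.find do |entry|
    # Compare the stored hash code first, then confirm with #eql?.
    entry.hash_value == hash_val &&
      entry.key.eql?(key)
  end
  found ? found.value : nil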

It is of course implemented in C and has some special cases for
Fixnums, Symbols, Strings and so forth and skips method invocations
when it can see that the objects are the same (same object_id). But
that's the gist of it. Java's hashCode() and equals() become Ruby's
#hash and #eql? respectively. The implementation of #eql? doesn't
affect the mechanism of the hash algorithm, only the semantics of the
keys used in the hash itself.
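
The fetch described above can be tried out with a toy table written in plain Ruby. This is a sketch under assumptions, not Ruby's actual C implementation: ToyHash, Entry, store, fetch and the fixed bin count are all invented for illustration, and there is no resizing or deletion.

```ruby
# Hypothetical toy hash table using chained bins.
class ToyHash
  Entry = Struct.new(:hash_value, :key, :value)

  def initialize(bin_count = 8)
    @bins = Array.new(bin_count) { [] }
  end

  # Insert or overwrite the value stored under key.
  def store(key, value)
    hash_val = key.hash
    bin = bin_for(hash_val)
    entry = find_entry(bin, hash_val, key)
    if entry
      entry.value = value
    else
      bin << Entry.new(hash_val, key, value)
    end
    value
  end

  # Return the stored value, or nil when no eql? key exists.
  def fetch(key)
    hash_val = key.hash
    entry = find_entry(bin_for(hash_val), hash_val, key)
    entry ? entry.value : nil
  end

  private

  # Ruby's % is non-negative for a positive divisor, so negative
  # hash codes still map to a valid index.
  def bin_for(hash_val)
    @bins[hash_val % @bins.size]
  end

  # First compare the cached hash code, then ask the key via #eql?.
  def find_entry(bin, hash_val, key)
    bin.find { |e| e.hash_value == hash_val && e.key.eql?(key) }
  end
end

toy = ToyHash.new
toy.store("one", 1)
toy.store("one".dup, 10)   # a different String object, but eql? with an equal hash
toy.store(1, :integer)

string_lookup  = toy.fetch("one")  # the overwritten value
integer_lookup = toy.fetch(1)
float_lookup   = toy.fetch(1.0)    # 1.0 is not eql? to 1, so nothing is found
```

Swapping e.key.eql?(key) for e.key == key in find_entry is an easy way to see how the key semantics change while the bucket mechanism stays the same.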

Jacob Fugal

···

On 6/27/06, Alexandru Popescu <the.mindstorm.mailinglist@gmail.com> wrote:

On 6/27/06, Jacob Fugal <lukfugl@gmail.com> wrote:
> On 6/27/06, Molitor, Stephen L <Stephen.L.Molitor@erac.com> wrote:
> > Hash values don't have to be unique. Two distinct values may return
> > the same hash code, but should return false for eql?. Remember how hash
> > tables work from CS class -- if two hash codes are the same, the table
> > chains the values together in a linked list under the same hash
> > 'bucket'. To retrieve a value it first finds the relevant hash bucket
> > via #hash, and then walks through the list sequentially using #eql? to
> > compare. So #eql? had better return false for two distinct values. On
> > the other hand, 'def hash; 1; end' is a perfectly correct albeit very
> > inefficient #hash implementation.
>
> Ah, gotcha. Thanks for the refresher. :) Is it true then that
> a.eql?(b) only if a.hash == b.hash? What's the damage in a.eql?(b)
> returning true when in different buckets? As far as I can tell, #eql?
> is only used internally by Hash -- it shouldn't be being called by
> other code. So if a and b are in different buckets (different #hash
> values), a.eql?(b) will never be called anyway. Relaxing that
> restriction, can't we swing back the other way and ask again:
>
> Why doesn't the default implementation of Object#eql? just use #== internally?
>
> E.g.:
>
>   class Object
>     def eql?(other)
>       self == other
>     end
>   end

I don't know if this is quite accurate. In the Java world, at least,
such an implementation is not good enough. Also, the known hash
implementations use both the hashCode() and equals() methods on key
objects (in that exact order). I think it should be the same in Ruby
(or pretty similar).