Hash access

Hi, was playing around with an idea after reading the thread about defining
#hash. My understanding was that #hash gives a unique identifier, and that
#eql? allows the hash to determine whether the two objects are equal in
terms of being the same hash key. So I wrote some code that should take an
equivalent instance, or a string for quick access. But it behaves in a way
that I completely don't understand. Hoping someone can help:

User = Struct.new :name, :age, :identifier do
  def hash
    name.hash
  end

  def eql?(other)
    puts "#{name} was asked if they were equal to #{other.inspect}"
    (other == name) || (other.name == name && other.age == age)
  end
end

josh = User.new 'Josh', 28, 'first Josh'
hash = {josh => josh}

hash[josh] # => #<struct User name="Josh", age=28, identifier="first Josh">
hash[User.new 'Josh', 28, 'second Josh'] # => #<struct User name="Josh", age=28, identifier="first Josh">
hash['Josh'] # => nil

# >> Josh was asked if they were equal to #<struct User name="Josh", age=28, identifier="first Josh">

So I would have expected all three to go through eql? Instead, we see that
only the case where the key was the same object goes through. However, it
identifies that the second Josh is the same key, without invoking User#eql?
How does it do this?

And why does the string "Josh" not find the instance?

This is all probably in my copy of the Pickaxe, but it's in Chicago and I'm
out of town :confused:

Hi Josh,

Here is how Hash in Ruby works when it tries to determine if two keys are equal:
* the #hash method is called on both objects to calculate their hash codes
* if their hash codes are not equal, the keys are not equal
* if their hash codes are equal, then #== is called to determine whether
the two objects are equal

In your example, all three objects actually return the same hash
codes, so #== (instead of eql?) is used to check their equality.

The "first Josh" and the "second Josh" are equal because their #==
(inherited from Object#==) simply calls #eql? which you have
overridden to make them equal.

The "first Josh" is not equal to "Josh" because they are of different
classes, and User#== (inherited from Object#==) does not allow objects
of different classes to be equal.

As a side note: you should always define #hash and #== together and
make sure that whenever #== returns true, #hash returns the same number
for both objects; otherwise, using these objects as hash keys will break
the hash semantics. Also, avoid using mutable objects as hash keys unless
their #hash number is immutable.
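
To illustrate the point about mutable keys, here is a minimal sketch that uses a plain Array as the key. The method name stale_key_demo and the variable names are made up for the example. Mutating the key after insertion changes its #hash, so the entry is normally no longer found until Hash#rehash re-indexes the table:

```ruby
# Sketch only: shows a lookup going stale after the key is mutated.
def stale_key_demo
  key   = [1, 2]              # Array#hash is computed from the contents
  table = { key => :found }

  hash_before = key.hash
  key << 3                    # mutating the key changes key.hash
  hash_after = key.hash

  stale = table[key]          # normally nil: the entry is filed under the old hash
  table.rehash                # re-index the entries using the current key hashes
  fresh = table[key]          # :found again

  { hash_changed: hash_before != hash_after, stale: stale, fresh: fresh }
end

p stale_key_demo
```

Strings are a special case: Hash#[]= stores a frozen copy of an unfrozen String key, which is why string keys do not usually run into this.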

I hope this helps

···

On Fri, Dec 30, 2011 at 1:37 PM, Josh Cheek <josh.cheek@gmail.com> wrote:

I see. The confusion for me was that the comparison goes in the other
direction (i.e. hash["Josh"] turns into "Josh".eql?(#<struct User ...>), but
I was thinking it would be #<struct User ...>.eql?("Josh")). This becomes
apparent if I change the log line to `puts "#{inspect} was asked if they
were #eql? to #{other.inspect}"`. I just didn't do that in the name of
brevity, and it masked the discrepancy.

So it's probably implemented something like this (ignoring nuances like
collisions and default values)

# expectation (pseudocode; at_hash is a made-up bucket lookup)
class Hash
  def [](key)
    potential_key, potential_value = at_hash key.hash
    return potential_value if potential_key.eql? key
  end
end

# actual
class Hash
  def [](key)
    potential_key, potential_value = at_hash key.hash
    return potential_value if key.equal? potential_key # same object, eql? is skipped
    return potential_value if key.eql? potential_key   # the lookup key is the receiver
  end
end
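
The direction of the call can be checked with a small runnable sketch. The Probe class and its labels are hypothetical, made up for this test. It gives every instance the same hash and records which object receives eql?. This reflects MRI's behaviour; other implementations may differ in detail:

```ruby
# Hypothetical Probe class: records which object receives eql? during lookups.
class Probe
  CALLS = []
  attr_reader :label

  def initialize(label)
    @label = label
  end

  # Constant hash so both probes are candidates for the same slot.
  def hash
    42
  end

  def eql?(other)
    CALLS << "#{label}.eql?(#{other.label})"
    true
  end
end

stored = Probe.new('stored')
lookup = Probe.new('lookup')
table  = { stored => :value }

table[stored]   # same object: found by identity, eql? is not called
table[lookup]   # different object: eql? is sent to the lookup key

p Probe::CALLS  # on MRI: ["lookup.eql?(stored)"]
```

That matches the original output: only the 'second Josh' lookup printed a line, and hash['Josh'] ended up in String#eql?, which never calls User#eql?.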

···

On Fri, Dec 30, 2011 at 1:41 AM, Yong Li <gilbertly@gmail.com> wrote:

The other confusion for you is insisting it is eql? instead of ==. Yong Li nailed the description of how it works. Please read it again. It is as close to perfect as we're going to get.

Also, as pointed out on the other hash thread... There _needs_ to be a 1:1 correlation between the result of #== and the result of #hash. You cannot simply use the "most relevant attribute". You _must_ use _all_ the attributes that you use against equality tests. Doing this is fundamental to ruby (and computer science) and must be thoroughly understood.

···

On Dec 30, 2011, at 00:59 , Josh Cheek wrote:

I think there are some errors and/or misleading statements in this
discussion.

First of all, the implementation of Hash depends on testing the
equality of two objects via #eql? and not via #==. This is easy
to see by using 1 and 1.0 in a hash:

1.hash #=> 3943323080027384908
(1.0).hash #=> -6757032739833615
1 == 1.0 #=> true
1.eql?(1.0) #=> false
h = {} #=> {}
h[1] = 'a' #=> "a"
h[1.0] = 'b' #=> "b"
h #=> {1=>"a", 1.0=>"b"}

If #== was being used by Hash, the hash at the end of that sequence
would only have one entry with a key of 1.0.

I don't think it is correct to call the relationship between equality
and #hash one-to-one. In terms of eql?, which is what Hash uses:

(a.eql?(b)) implies (a.hash == b.hash)

but the reverse is not true.

(a.hash == b.hash) does not imply (a.eql?(b))

If two objects have the same hash, they may or may not be equal.
If they aren't equal, you just have a hash collision that has to
be disambiguated by doing a full equality test via eql?.
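
A minimal sketch of that disambiguation, using a hypothetical Tag class whose instances all collide on purpose. Keys with the same hash that are not eql? are kept as separate entries, while a key that is eql? to an existing one overwrites its value:

```ruby
# Hypothetical Tag class: every instance returns the same hash on purpose.
class Tag
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def hash
    1
  end

  def eql?(other)
    other.is_a?(Tag) && name == other.name
  end
  alias_method :==, :eql?
end

tags = {}
tags[Tag.new('a')] = 1
tags[Tag.new('b')] = 2   # same hash as 'a' but not eql?: a second entry
tags[Tag.new('a')] = 3   # same hash and eql?: overwrites the first entry

p tags.size              # 2
p tags[Tag.new('a')]     # 3
p tags[Tag.new('b')]     # 2
```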

Finally, there is no hard requirement that a hash implementation
'must use all the attributes' used for the equality test. If there
is a subset of attributes that are generally different for non-equal
objects then the hash function will be more performant if it only
uses the subset of attributes.
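
As a sketch of hashing on a subset, here is a hypothetical Account struct modelled on the User struct from the original post. #hash looks only at name, while eql? compares name and age. That is still consistent, because any two objects that are eql? necessarily share a name and therefore a hash:

```ruby
# Hypothetical Account struct, modelled on the User struct from the original post.
Account = Struct.new(:name, :age, :identifier) do
  # Hash on a subset of the attributes that eql? compares.
  def hash
    name.hash
  end

  # Equality uses name and age, and only matches other Accounts.
  def eql?(other)
    other.is_a?(Account) && name == other.name && age == other.age
  end
  alias_method :==, :eql?
end

first  = Account.new('Josh', 28, 'first Josh')
second = Account.new('Josh', 28, 'second Josh')
older  = Account.new('Josh', 40, 'older Josh')

accounts = { first => first }

p accounts[second].identifier   # "first Josh"
p accounts[older]               # nil: same hash, but eql? says no
```

The trade-off is that every Account with the same name collides, so this only pays off when names are mostly distinct.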

The important point is that you don't want your hash function to
create too many collisions, where non-equal objects have the same
hash value.

For example:

   def hash; 1; end

will 'work' but will cause performance problems when those objects
are stored in a Hash:

require 'benchmark'

class A; end                     # uses the default Object#hash
class B; def hash; 1; end; end   # every instance collides

n = 10_000

Benchmark.bm(20) do |x|
  x.report('Object#hash') { h = {}; n.times { |i| h[A.new] = i } }
  x.report('1')           { h = {}; n.times { |i| h[B.new] = i } }
end

                          user     system      total        real
Object#hash           0.010000   0.000000   0.010000 (  0.008124)
1                     2.300000   0.000000   2.300000 (  2.311228)

Gary Wright

···

On Dec 30, 2011, at 4:59 PM, Ryan Davis wrote:

You're right. I really should have something that prevents me from even opening my email until it detects I've had my second espresso. I wonder if that can be done without a blood sample.

···

On Dec 30, 2011, at 15:05 , Gary Wright wrote:
