Equal? versus eql? versus == versus === verus <=>

Paul_Butcher1 · 16 October 2007 10:03

I recently found myself explaining to a friend how Ruby's various
comparison operators work. In the process, I tried to find some decent
documentation on why Ruby has so many different ways to test for
equality, how they differ and how they should be implemented and used.

I was unable to find any such documentation, so I decided to have a go
myself You can see the fruits of my labours here:

http://www.texperts.com/2007/10/16/navigating-the-equality-maze/

I believe that it's an accurate reflection of both how things work and
the philosophy underlying the design of this area. I would be very
grateful if you could let me know if I have anything wrong or have left
anything out though!

There are a number of subtle pitfalls in this area waiting to trap the
unwary. I hope that this may go some way to helping a few people avoid
them

Comments, criticisms and suggestions all gratefully received.

Paul.

···

--
Posted via http://www.ruby-forum.com/.

Jesus_Gabriel_y_Gala · 16 October 2007 11:01

Thanks for this. I found it great for a quick reference and to explain
it to someone else :-).

There are a couple of typos that you might want to correct:

Note that this means that, unlike the other methods we're considering
here, this means that === won't in general be commutative:

String === 'foo'
=> true
'foo' == String
=> false

Should be 'foo' === String (three equals).

[...]
In the vast majority of cases, you will either want to test for
"natural" equality (===)

Should be (==).

Thanks,

Jesus.

···

On 10/16/07, Paul Butcher <paul@texperts.com> wrote:

I recently found myself explaining to a friend how Ruby's various
comparison operators work. In the process, I tried to find some decent
documentation on why Ruby has so many different ways to test for
equality, how they differ and how they should be implemented and used.

I was unable to find any such documentation, so I decided to have a go
myself You can see the fruits of my labours here:

钛刻专家——官网

I believe that it's an accurate reflection of both how things work and
the philosophy underlying the design of this area. I would be very
grateful if you could let me know if I have anything wrong or have left
anything out though!

There are a number of subtle pitfalls in this area waiting to trap the
unwary. I hope that this may go some way to helping a few people avoid
them

Comments, criticisms and suggestions all gratefully received.

Paul_Butcher1 · 17 October 2007 09:26

Paul Butcher wrote:

I was unable to find any such documentation, so I decided to have a go
myself You can see the fruits of my labours here:

钛刻专家——官网

Nobody has (yet) risen to the challenge at the end of my article, so I
thought that I'd ask here

Just about every class in the standard library implements == and eql? as
I describe in the article, i.e. eql? tests for equal values and == tests
for "natural" equality (which normally means equal values).

For example:

    [1, 2].eql? [1,2]
    => true
    'foo'.eql? 'foo'
    => true
    [1, 2] == [1,2]
    => true
    'foo' == 'foo'
    => true

Hash, however, is an exception. Hash#== tests for equal values.
Hash.eql?, however, tests for object identity:

    {:x=>1, :y=>2} == {:x=>1, :y=>2}
    => true
    {:x=>1, :y=>2}.eql?({:x=>1, :y=>2})
    => false

Why is hash the odd one out? I'm sure that there must be a good reason
(Matz?) but I can't at the moment work out what it might be.

I'd be very grateful for any light anyone could cast on this. Thanks!

Paul.

···

--
Posted via http://www.ruby-forum.com/\.

Paul_Butcher1 · 16 October 2007 12:11

Jesús Gabriel y Galán wrote:

Thanks for this. I found it great for a quick reference and to explain
it to someone else :-).

Thanks Jesús - glad to know that it was of some use!

There are a couple of typos that you might want to correct:

Thanks. These are now fixed!

Paul.

···

--
Posted via http://www.ruby-forum.com/\.

Rick_DeNatale1 · 17 October 2007 12:12

Nobody has (yet) risen to the challenge at the end of my article, so I
thought that I'd ask here

Just about every class in the standard library implements == and eql? as
I describe in the article, i.e. eql? tests for equal values and == tests
for "natural" equality (which normally means equal values).

...

Hash, however, is an exception. Hash#== tests for equal values.
Hash.eql?, however, tests for object identity:

...

Why is hash the odd one out? I'm sure that there must be a good reason
(Matz?) but I can't at the moment work out what it might be.

I'd be very grateful for any light anyone could cast on this. Thanks!

Some clues:
irb(main):001:0> h1 = {:a => "1", :b => "2"}
=> {:b=>"2", :a=>"1"}
irb(main):002:0> h2 = {:a => "1", :b => "2"}
=> {:b=>"2", :a=>"1"}
irb(main):003:0> h1 == h2
=> true
irb(main):004:0> h1.eql? h2
=> false
irb(main):005:0> h1.hash
=> 19850
irb(main):006:0> h2.hash
=> 276530
irb(main):007:0> h1.object_id
=> 19850
irb(main):008:0> h2.object_id
=> 276530

I think the reason is twofold:

1) Using hashs as keys in another hash is not a common use case. I'm a
little hard-pressed to think of why I'd want to, although I'm famous
for lack of imagination.
2) Because of the requirement that obj1.eql? obj2 => obj1.hash ==
obj2.hash, implementing Hash#hash requires iterating over the keys and
values and would be fairly expensive and make accessing a hash with
hash keys by key impractical.

···

On 10/17/07, Paul Butcher <paul@texperts.com> wrote:

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Chad_Perrin1 · 17 October 2007 15:32

foo.eql? still seems a little inconsistent (and surprising) to me.
Essentially, it's == except when comparing hashes, at which point it
suddenly becomes foo.equal?. Is there some instance in which foo.eql?
and foo.equal? evaluate differently for hashes that escapes me at the
moment? Is there a particular use-case for foo.eql? (which for the
moment looks to me like the red-headed stepchild of Ruby equality) that
isn't satisfied by any of the other equality comparison methods?

While we're discussing things with equal signs -- why isn't = a method?

···

On Wed, Oct 17, 2007 at 09:12:26PM +0900, Rick DeNatale wrote:

1) Using hashs as keys in another hash is not a common use case. I'm a
little hard-pressed to think of why I'd want to, although I'm famous
for lack of imagination.
2) Because of the requirement that obj1.eql? obj2 => obj1.hash ==
obj2.hash, implementing Hash#hash requires iterating over the keys and
values and would be fairly expensive and make accessing a hash with
hash keys by key impractical.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
print substr("Just another Perl hacker", 0, -2);

Gavin_Kistner3 · 17 October 2007 15:55

> Just about every class in the standard library implements == and eql? as
> I describe in the article, i.e. eql? tests for equal values and == tests
> for "natural" equality (which normally means equal values).

> Hash, however, is an exception. Hash#== tests for equal values.
> Hash.eql?, however, tests for object identity:

> Why is hash the odd one out? I'm sure that there must be a good reason
> (Matz?) but I can't at the moment work out what it might be.

I think the reason is twofold:

1) Using hashs as keys in another hash is not a common use case. I'm a
little hard-pressed to think of why I'd want to, although I'm famous
for lack of imagination.

I've wanted it on 3 occasions (that I can remember) now. Here's a
contrived example derived from the real-world use case I can no longer
remember:

You have a file like this...
alpha,beta,15
gamma,delta,3
beta,alpha,4
alpha,alpha,3
delta,alpha,5
gamma,delta,7
...and you want to sum up the numbers for each unique pair of greek
letters. Naively, I'd do (and initially tried) something like:

  sums = Hash.new{ 0 }
  DATA.each{ |line|
    _, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
    sums[ { g1=>true, g2=>true } ] += num.to_i
  }

I believe I instead resorted to sorting the keys and using a nested
hash to drill down to the value. It was annoying.

2) Because of the requirement that obj1.eql? obj2 => obj1.hash ==
obj2.hash, implementing Hash#hash requires iterating over the keys and
values and would be fairly expensive and make accessing a hash with
hash keys by key impractical.

That logic seems slightly mothering, though. "Ruby prevents you from
doing A because if you did A it might be slow." Ruby doesn't prevent
me from writing:
  my_huge_array.delete_if{ |v1|
    my_huge_array.find{ |v2| (v1 - v2).abs < mu }
  }
I suppose the distinction is that the above is a foolish pairing of
individually-reasonable parts, while Hash#hash is an atomic method
written to optimize speed for one (reasonably useless) use case at the
expense of allowing another use case.

As a related aside:
Having never written a hashing function, I'm uncertain how I'd write
Hash#hash in a way that reasonably prevented two hashes with different
keys and/or values from ending up with the same value. (Multiply
the .hash values of all keys and values in the Hash and then mod them
on a big prime number?) Has anyone taken a stab at implementing this?

···

On Oct 17, 6:12 am, "Rick DeNatale" <rick.denat...@gmail.com> wrote:

On 10/17/07, Paul Butcher <p...@texperts.com> wrote:

Rick_DeNatale1 · 17 October 2007 19:38

> > Just about every class in the standard library implements == and eql? as
> > I describe in the article, i.e. eql? tests for equal values and == tests
> > for "natural" equality (which normally means equal values).
>
> > Hash, however, is an exception. Hash#== tests for equal values.
> > Hash.eql?, however, tests for object identity:
>
> > Why is hash the odd one out? I'm sure that there must be a good reason
> > (Matz?) but I can't at the moment work out what it might be.
>
> I think the reason is twofold:
>
> 1) Using hashs as keys in another hash is not a common use case. I'm a
> little hard-pressed to think of why I'd want to, although I'm famous
> for lack of imagination.

I've wanted it on 3 occasions (that I can remember) now. Here's a
contrived example derived from the real-world use case I can no longer
remember:

You have a file like this...
alpha,beta,15
gamma,delta,3
beta,alpha,4
alpha,alpha,3
delta,alpha,5
gamma,delta,7
...and you want to sum up the numbers for each unique pair of greek
letters. Naively, I'd do (and initially tried) something like:

  sums = Hash.new{ 0 }
  DATA.each{ |line|
    _, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
    sums[ { g1=>true, g2=>true } ] += num.to_i
  }

I believe I instead resorted to sorting the keys and using a nested
hash to drill down to the value. It was annoying.

Well in the above code you can change the line:

sums[ { g1=>true, g2=>true } ] += num.to_i
to
sums[[g1,g2].sort] += num.to_i

which I actually think looks cleaner

> 2) Because of the requirement that obj1.eql? obj2 => obj1.hash ==
> obj2.hash, implementing Hash#hash requires iterating over the keys and
> values and would be fairly expensive and make accessing a hash with
> hash keys by key impractical.

That logic seems slightly mothering, though. "Ruby prevents you from
doing A because if you did A it might be slow." Ruby doesn't prevent
me from writing:
  my_huge_array.delete_if{ |v1|
    my_huge_array.find{ |v2| (v1 - v2).abs < mu }
  }
I suppose the distinction is that the above is a foolish pairing of
individually-reasonable parts, while Hash#hash is an atomic method
written to optimize speed for one (reasonably useless) use case at the
expense of allowing another use case.

I suspect that it's just pragmatism

As a related aside:
Having never written a hashing function, I'm uncertain how I'd write
Hash#hash in a way that reasonably prevented two hashes with different
keys and/or values from ending up with the same value. (Multiply
the .hash values of all keys and values in the Hash and then mod them
on a big prime number?) Has anyone taken a stab at implementing this?

That's not the requirement, the requirement is that if two objects are
eql? then their hashes must also be ==, there's nothing to prevent two
objects which aren't eql? to have the same hash value.

In fact let me offer:

require "benchmark"

DATA = <<-END
alpha,beta,15
gamma,delta,3
beta,alpha,4
alpha,alpha,3
delta,alpha,5
gamma,delta,7
END

def hash_key_impl
  sums = Hash.new(0)
  DATA.each{ |line|
    _, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
    sums[ { g1=>true, g2=>true } ] += num.to_i
  }
  sums
end

class HashableHash < Hash
  def hash
    to_a.sort.hash
  end

  def eql?(other)
    self == other
  end
end

def hashable_key_impl
  sums = Hash.new(0)
  DATA.each{ |line|
    _, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
    sums[ HashableHash[g1=>true, g2=>true ] ] += num.to_i
  }
  sums
end

class HashableHash2 < Hash
  def hash
    1
  end

  def eql?(other)
    self == other
  end
end

def hashable2_key_impl
  sums = Hash.new(0)
  DATA.each{ |line|
    _, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
    sums[ HashableHash2[g1=>true, g2=>true ] ] += num.to_i
  }
  sums
end

def array_key_impl
  sums = Hash.new(0)
  DATA.each{ |line|
    _, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
    sums[ [g1,g2].sort ] += num.to_i
  }
  sums
end

p hash_key_impl
p hashable_key_impl
p hashable2_key_impl
p array_key_impl

TESTS = 1000
Benchmark.bmbm do |results|

  results.report("hash key:") do
    TESTS.times do
      hash_key_impl
    end
  end

  results.report("hashable key:") do
    TESTS.times do
      hashable_key_impl
    end
  end

  results.report("hashable2 key:") do
    TESTS.times do
      hashable2_key_impl
    end
  end

  results.report("array key:") do
    TESTS.times do
      array_key_impl
    end
  end

end

{{"delta"=>true, "gamma"=>true}=>7, {"alpha"=>true, "delta"=>true}=>5,
{"alpha"=>true}=>3, {"alpha"=>true, "beta"=>true}=>4, {"delta"=>true,
"gamma"=>true}=>3, {"alpha"=>true, "beta"=>true}=>15}
{{"delta"=>true, "gamma"=>true}=>10, {"alpha"=>true}=>3,
{"alpha"=>true, "beta"=>true}=>19, {"alpha"=>true, "delta"=>true}=>5}
{{"alpha"=>true, "delta"=>true}=>5, {"alpha"=>true}=>3,
{"delta"=>true, "gamma"=>true}=>10, {"alpha"=>true, "beta"=>true}=>19}
{["alpha", "alpha"]=>3, ["delta", "gamma"]=>10, ["alpha", "beta"]=>19,
["alpha", "delta"]=>5}
Rehearsal --------------------------------------------------
hash key: 0.050000 0.000000 0.050000 ( 0.080213)
hashable key: 0.100000 0.000000 0.100000 ( 0.104884)
hashable2 key: 0.080000 0.000000 0.080000 ( 0.079243)
array key: 0.050000 0.000000 0.050000 ( 0.055140)
----------------------------------------- total: 0.280000sec

user system total real
hash key: 0.050000 0.000000 0.050000 ( 0.050082)
hashable key: 0.100000 0.010000 0.110000 ( 0.101206)
hashable2 key: 0.070000 0.000000 0.070000 ( 0.077322)
array key: 0.050000 0.000000 0.050000 ( 0.051378)

The original hash implementation performs almost exactly the same as
my simple substitution of sorted arrays as the keys, but it has the
distinct disadvantage of getting the wrong answer

I did two different implementations of a HashableHash which use == for
eql? and differ in that one uses information in the hash to compute a
hash value, while the other returns a constant value for hash.

Both are slower than the array key implementation. HashableHash2 is
somewhat faster than HashableHash since the hash method runs in
constant time, but I'm not sure that mapping the hash value to a
constant is a good idea.

···

On 10/17/07, Phrogz <phrogz@mac.com> wrote:

On Oct 17, 6:12 am, "Rick DeNatale" <rick.denat...@gmail.com> wrote:
> On 10/17/07, Paul Butcher <p...@texperts.com> wrote:

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Rick_DeNatale1 · 17 October 2007 19:56

I assume you mean in the execution of an expression like

a = b

as opposed to

a.b = c

Now in the first case, what object is sent the message :=

Remember that variable aren't objects, they are references to objects,
and assignment changes the binding of a variable it doesn't operate on
an object.

···

On 10/17/07, Chad Perrin <perrin@apotheon.com> wrote:

While we're discussing things with equal signs -- why isn't = a method?

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Gary_Wright · 28 November 2007 06:06

Chad Perrin wrote:

1) Using hashs as keys in another hash is not a common use case. I'm a
littlehard-pressed to think of why I'd want to, although I'm famous
for lack of imagination.
2) Because of the requirement that obj1.eql? obj2 => obj1.hash ==
obj2.hash, implementing Hash#hash requires iterating over the keys and
values and would be fairly expensive and make accessing a hash with
hash keys by key impractical.

foo.eql? still seems a little inconsistent (and surprising) to me.
Essentially, it's == except when comparing hashes, at which point it
suddenly becomes foo.equal?.

This isn't really true. I just spent some time looking at the 1.8 sources
for Object/Array/String/Hash/Set in order to understand things better and
found some interesting insights into eql? vs == that I haven't seen
mentioned before.

Here is how I would characterize equal?, eql? and ==

  equal? object identity test
         eql? strict equality test (defaults to identity)
         == loose equality test (defaults to identity)

Strict equality for Object, Array, String, and Hash means that
the class of the objects are identical. I mention this because that
is not the case with loose equality.

Array and String define 'strict equality' based on their contents. The
containers don't have have to be identical but the contents must follow
the 'strict equality' rule. For arrays, this means considering x.eql?(y)
for all the corresponding elements. For strings, it just means a byte
comparison of the contents.

Object and Hash define 'strict equality' as 'object identity'. The
other messages in the thread give some reasons why this is a reasonable
definition for Hash (i.e. why the contents are ignored).

The more interesting concept is 'loose equality' defined by ==.
What I didn't know and what isn't clear from the documentation is
that Array, String, and Hash all implement a coercion 'protocol' for
#==. If the object on the right hand side responds to to_ary, to_str,
or to_hash respectively, then the receiver and argument are reversed and
#== is called again.

str == str_pretender

becomes

str_pretender == str

at which point the implementation of == as provided by the 'pretender'
comes into play. This little trick means that can you construct
your own classes that pretend to be arrays, strings, or hashes and
== will commute correctly with the standard classes as long as you
take care to define to_ary/to_str/to_hash and ==. Nice.

In addition, it is important to note that while Hash#== uses #eql?
to compare *keys* it uses #== to compare values:

     h1 = {1 => 2}
     h2 = {1 => 2.0}
     h3 = {1.0 => 2.0}

     h1.eql?(h2) # false (identity test)
     h1 == h2 # true (2 == 2.0 is true)
     h1 == h3 # false (1.eql?(1.0) is false)

Because #== is used on the values, the coercion protocol I mentioned
above will come into play when comparing hash contents.

I didn't look at <=> as closely but it looks like Array#<=> will
happily convert the right hand side via #to_ary if available.
String#<=> looks for to_str and then applies the coercion protocol
as with == taking care to reverse the results.
Objects and Hashes don't have a natural ordering so <=> isn't defined.

Set

Set is a bit strange. Set#eql? depends on Hash#eql? on an internal
hash but it isn't possible for two sets to share the same internal
hash and since Hash#eql? is based on object identity it isn't possible
for Set#eql? to ever return true, which means that 'strict equality'
for sets is the same as object identity. If the definition of Hash#eql?
changed, then Set#eql? would follow. Set#== doesn't implement a
coercion protocol and sets aren't ordered so no Set#<=>.

Gary Wright

···

On Wed, Oct 17, 2007 at 09:12:26PM +0900, Rick DeNatale wrote:

Chad_Perrin1 · 17 October 2007 21:19

Oops. I guess I should have thought of that.

Thanks.

···

On Thu, Oct 18, 2007 at 04:56:48AM +0900, Rick DeNatale wrote:

Remember that variable aren't objects, they are references to objects,
and assignment changes the binding of a variable it doesn't operate on
an object.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
John Kenneth Galbraith: "If all else fails, immortality can always be
assured through spectacular error."

Stefan_Rusterholz · 18 October 2007 10:18

Rick Denatale wrote:

  def hash
    to_a.sort.hash
  end

This requires the hash key and value to implement <=> which will not be
always the case and thus breaks.

  def hash
    1
  end
...
but I'm not sure that mapping the hash value to a
constant is a good idea.

I'm sure it is not. That way every Hash as key is stored in the same
bucket, essentially making it no different from an Array in the way keys
are looked up (only advantage is, that you don't have the method call
overhead you'd have with an Array solution).

Regards
Stefan

···

--
Posted via http://www.ruby-forum.com/\.

Rick_DeNatale1 · 18 October 2007 18:07

Rick Denatale wrote:
> def hash
> to_a.sort.hash
> end

This requires the hash key and value to implement <=> which will not be
always the case and thus breaks.

Correct, depending on the individual hash. On the other hand this is
yet another aspect of the subtle differences between the 'type' of an
object and its class, and I don't think that such things are probably
unavoidable is such an evil thing:

http://talklikeaduck.denhaven2.com/articles/2007/10/17/you-cant-judge-a-book-some-mental-traps-in-learning-ruby

> def hash
> 1
> end
> ...
> but I'm not sure that mapping the hash value to a
> constant is a good idea.

I'm sure it is not. That way every Hash as key is stored in the same
bucket, essentially making it no different from an Array in the way keys
are looked up (only advantage is, that you don't have the method call
overhead you'd have with an Array solution).

Well not exactly, it means that there will always be a collision,
followed by searching in the hash.

I was actually subtly reinforcing the point that performance
considerations might well have driven the decision to make Hash#eql?
be an alias for equal? To be fair, my naive constant hash could be
improved by using some inexpensive quality which did partitioned
hashes into a greater number of equivalence classes, for example
perhaps:
class Hash
  def hash
       length
  end
end

Although this would be been pretty ineffective in Phrogz' use-case

It might be of interest to note that Smalltalk has both a Dictionary
(which is pretty much equivalent to Ruby's Hash) and an
IdentityDictionary which uses object-identity instead of equality to
differentiate keys.

···

On 10/18/07, Stefan Rusterholz <apeiros@gmx.net> wrote:

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Topic		Replies	Views
Ruby and Java equality usage ruby-talk	31	170	27 June 2006
Hash, ==, key-value comparison ruby-talk	19	98	19 April 2004
Whats so different about a Hash? ruby-talk	14	118	5 May 2005
Proposal Hash eql? , ==, === ruby-talk	2	81	19 April 2004
Ruby and Java equality usage ruby-talk	2	139	27 June 2006

Equal? versus eql? versus == versus === verus <=>

Related topics