Comparing objects

Array#& uses eql? instead of == because internally, it works something
like this:

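class Array
  def &(other)
    h1 = {}
    # Index the other array's elements; Hash keys are matched via #hash and #eql?
    other.each { |x| h1[x] = true }
    # Keep only the elements that appear in the index
    select { |x| h1[x] }
  end
end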

In other words, it creates a (hash) index to get a speedup. (From
O(M*N) to O(M+N).)
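
The effect of matching by eql? rather than == can be seen with core classes alone. A minimal sketch, relying only on the fact that the Integer 1 and the Float 1.0 are == but not eql? (how a given Ruby version implements Array#& internally may differ from the sketch above):

```ruby
# Integer 1 and Float 1.0 compare equal with ==, but not with eql?.
p 1 == 1.0       # true
p 1.eql?(1.0)    # false

# Array#& follows eql?, so nothing is considered common.
p [1, 2] & [1.0, 2.0]    # []

# A naive ==-based intersection (the O(M*N) approach) keeps both elements.
p [1, 2].select { |x| [1.0, 2.0].any? { |y| x == y } }    # [1, 2]
```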

···

On 6/11/10, Robert Dober <robert.dober@gmail.com> wrote:

OP wanted to use Array#&, and Array#&, for a reason not too clear to
me, uses Object#eql? instead of Object#==. I did discourage the
overloading of Object#eql? and Object#hash for *that purpose*.

I don't think we disagree, nor do I argue with you. I just posted blog links as illustration to Rein's point about how to implement those methods.

Kind regards

  robert

···

On 06/11/2010 08:15 PM, Robert Dober wrote:

On Fri, Jun 11, 2010 at 6:47 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

On 10.06.2010 18:27, Robert Dober wrote:

On Thu, Jun 10, 2010 at 6:10 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html

I
You define #eql? and #hash for your convenience. So good, so bad. My
question simply was: Show me why *not* redefining #hash and #eql? will
cause problems, because that was Wilson's statement. I am still
waiting :(.

The advice to implement #eql? and #hash really only makes sense if
equivalence can reasonably be defined for a class and if instances of that
class should be used as Hash keys or in a Set. If equivalence cannot be
defined other than via identity (which is the default), then it is
perfectly reasonable to not override either method and go with the default
implementation.

But that was *exactly* my point.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Marcin Wolski wrote:

Rein Henrichs wrote:

Mark Abramov wrote:

[tl;dr]

Sorry, guys, didn't notice how I used eql instead of eql?
Btw, without #hash it won't work anyways which I consider *weird* at the
very least.

#hash makes sense for Hash# and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.

If two objects are said to be eql?, their hash methods must also return
the same value. More details in The Ruby Programming Language book.

Thus, when you redefine eql?, the hash methods also should be redefined.
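
A minimal sketch of that rule, using hypothetical Plain and Point classes that do not appear in the thread. Plain keeps the default identity-based eql? and hash; Point defines ==, eql? and hash consistently from the same fields:

```ruby
# Hypothetical value classes, for illustration only.
class Plain
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end
end

class Point < Plain
  def ==(other)
    other.class == self.class && x == other.x && y == other.y
  end
  alias eql? ==

  # Objects that are eql? must report the same hash value.
  def hash
    [self.class, x, y].hash
  end
end

# Default identity semantics: two distinct instances never match.
p(([Plain.new(1, 2)] & [Plain.new(1, 2)]).size)    # 0

# Consistent eql? and hash: equal points match in Array#& and as Hash keys.
p(([Point.new(1, 2)] & [Point.new(1, 2)]).size)    # 1
p({ Point.new(1, 2) => :found }[Point.new(1, 2)])  # :found
```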

Well, it doesn't say much in core api :frowning:

···

On 2010-06-10 07:20:03 -0700, Mark Abramov said:

--
Posted via http://www.ruby-forum.com/.

Rein Henrichs:

#hash makes sense for Hash# and etc. #eql? makes more
sense for Array#&. I too find it odd that both are necessary.

Both are necessary because #eql? says whether two objects are surely
the same, while #hash says whether they’re surely different – which,
perhaps counterintuitively, is not the same problem.

The difference is that in many, many cases it’s much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).

This is not necessarily true. Any reasonable implementation of #eql?
will bail out as soon as it sees a difference. On the contrary, you
always need to look at the complete state of an instance to calculate
#hash. I can easily construct an example where #eql? beats #hash:

14:40:54 Temp$ ruby19 eql-test.rb
same
  0.110000 0.000000 0.110000 ( 0.098000)
  0.093000 0.000000 0.093000 ( 0.099000)
  0.157000 0.000000 0.157000 ( 0.151000)
different early
  0.093000 0.000000 0.093000 ( 0.101000)
  0.094000 0.000000 0.094000 ( 0.096000)
  0.000000 0.000000 0.000000 ( 0.000000)
different late
  0.109000 0.000000 0.109000 ( 0.105000)
  0.094000 0.000000 0.094000 ( 0.098000)
  0.156000 0.000000 0.156000 ( 0.149000)
14:40:56 Temp$ cat eql-test.rb
require 'benchmark'
a1 = Array.new 1_000_000
a2 = Array.new 1_000_000
puts "same"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a1[0] = 1
a2[0] = 2
puts "different early"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a2[0] = a1[0]
a2[999_999] = 1
puts "different late"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
14:40:58 Temp$

Notice also how #eql? with equal arrays is not much slower than #hash.

The main difference between #eql? and #hash is that #hash can return the
same value for objects that are not #eql? (but if two objects are #eql?
then #hash must return the same value).
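
To illustrate that colliding hash values are legal, here is a small sketch with a hypothetical Collider class (not from the thread) whose #hash is the same constant for every instance. Hash lookups still give correct answers because #eql? makes the final decision; the collision only costs speed:

```ruby
# Hypothetical class: every instance deliberately reports the same hash.
class Collider
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Valid but useless as an index: all instances collide.
  def hash
    42
  end

  def ==(other)
    other.is_a?(Collider) && name == other.name
  end
  alias eql? ==
end

h = { Collider.new("a") => 1, Collider.new("b") => 2 }
p h.size                  # 2 (same hash, but not eql?, so two separate keys)
p h[Collider.new("b")]    # 2
p h[Collider.new("c")]    # nil
```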

An untested, and definitely not optimal
(but hopefully simple) example follows. :slight_smile:

Imagine that you want to implement a new immutable string class, one
which caches the string length (for performance reasons). Imagine also
that the vast majority of such strings you use are of different lengths,
and that you want to use them as Hash keys.

class ImmutableString

def initialize string
@string = string.dup.freeze
@length = string.length
end

end

Given the above assumptions, it might make sense for #hash to
return the @length, while #eql? makes the ‘proper’ comparison:

class ImmutableString

def hash
@length

Bad hash implementation. Why don't you use String#hash?

end

alias eql? ==

end

This way in the vast majority of cases, when your ImmutableStrings will
be considered for Hash keys, the check whether a given key exists will
be very quick; only when two objects #hash to the same value (i.e.,
when they’re not surely different) the #eql? is called to tell whether
they’re surely the same.
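
A runnable variant of the example, as a sketch only. The class as posted never defines #==, so `alias eql? ==` would alias the inherited identity comparison; the #==, #to_s and #length methods below are assumed additions, not part of the original post:

```ruby
class ImmutableString
  attr_reader :length

  def initialize(string)
    @string = string.dup.freeze
    @length = @string.length
  end

  def to_s
    @string
  end

  # Cheap hint: strings of different length can never be equal.
  def hash
    @length
  end

  # Assumed addition: compare class, then cached length, then contents.
  def ==(other)
    other.class == self.class &&
      length == other.length &&
      to_s == other.to_s
  end
  alias eql? ==
end

index = {
  ImmutableString.new("short") => :a,
  ImmutableString.new("a much longer key") => :b
}
p index[ImmutableString.new("short")]    # :a
p index[ImmutableString.new("sharp")]    # nil (same length, but eql? says no)
```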

If the set of attributes to be used for the specific comparison needed
in this thread is not the same as the set that we identify as keyish
for class User in general one cannot use User#eql? and User#hash for
quick set intersection. That's why I suggested to use a Struct for
key fields (which has proper #hash and #eql? built in).
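
A sketch of the Struct suggestion, under assumed names: the User class, its fields, the UserKey struct and the common_users helper are all hypothetical. Only Struct's built-in value-based #==, #eql? and #hash are relied upon, so User itself keeps its default equality:

```ruby
# Hypothetical key struct holding just the fields used for this comparison.
UserKey = Struct.new(:login, :domain)

# Hypothetical User class; its own #eql? and #hash stay untouched.
class User
  attr_reader :login, :domain, :display_name

  def initialize(login, domain, display_name)
    @login, @domain, @display_name = login, domain, display_name
  end

  def key
    UserKey.new(login, domain)
  end
end

# Intersection by key fields, using a Hash index for O(M+N) behaviour.
def common_users(left, right)
  wanted = {}
  right.each { |u| wanted[u.key] = true }
  left.select { |u| wanted[u.key] }
end

a = [User.new("ann", "example.org", "Ann"), User.new("bob", "example.org", "Bob")]
b = [User.new("bob", "example.org", "Robert"), User.new("eve", "example.org", "Eve")]
p common_users(a, b).map(&:display_name)    # ["Bob"]
```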

Kind regards

robert

···

2010/6/11 Shot (Piotr Szotkowski) <shot@hot.pl>:


I see, thanx :slight_smile:

···

On Fri, Jun 11, 2010 at 9:11 PM, Caleb Clausen <vikkous@gmail.com> wrote:

Forgive my confusion then.
Cheers
Robert

···

On Sat, Jun 12, 2010 at 10:55 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

I don't think we disagree, nor do I argue with you. I just posted blog
links as illustration to Rein's point about how to implement those methods.

--
The best way to predict the future is to invent it.
-- Alan Kay

Robert Klemme:

The difference is that in many, many cases it’s much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).

This is not necessarily true. Any reasonable implementation
of #eql? will bail out as soon as it sees a difference.

Sure, reasonable implementations of #eql? will test object properties in
the decreasing order of probability of a given property being different
between two objects (and bail out as soon as possible), but there are
cases where a fast #hash might be useful (partly immutable objects
which cache the hash based on the immutable parts, perhaps?), exactly
because it doesn’t have to reliably tell whether two objects are surely
the same (just whether they surely differ).

I agree I might’ve gone over the top with the ‘many, many’
remark, though (but then I did not say ‘most’, just ‘many’…). :wink:

On the contrary, you always need to look at the
complete state of an instance to calculate #hash.

This is definitely not true; you only should consider the parts
that differentiate two objects of a given class most often, but
you definitely do not ‘need’ to look at the complete state (even
a constant #hash is valid, albeit quite useless).

The whole point of #hash is that it acts only as a hint whether two
objects are ‘the same’ – it’s your choice how credible vs how performant
it needs to be. At the same time, #eql? has to be 100% credible
(although you’re right that it can take many of the same shortcuts
a given #hash takes, and that there are cases where #hash can be slower
than #eql?, as in your Array example, but it’s just because you *want*
that #hash to depend on the complete state of an instance).

I can easily construct an example where #eql? beats #hash: […]
Notice also how #eql? with equal arrays is not much slower than #hash.

Sure, because Array#hash is implemented in the way you describe (its
hash depends on all of its elements). I was pointing out that there
are cases where it doesn’t make sense to implement #hash like this,
and having both #hash and #eql? gives you more control and more choices.

class ImmutableString

def initialize string
@string = string.dup.freeze
@length = string.length
end

def hash
@length
end

alias eql? ==

end

Bad hash implementation. Why don't you use String#hash?

Because String#hash depends on the contents of the string and is
recomputed every time, while in this particular scenario (where the
vast majority of very long strings differ in length) it might be faster
to refer to the cached length. Of course with immutable strings you
probably should just cache the hash, but I made the example immutable
to not have to add that @length needs to be recomputed on mutations
(I was also quite explicit that this is not an optimal example, just
a simple one).
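
The "just cache the hash" alternative mentioned above might look like the following sketch. CachedHashString is a hypothetical name, and the #== definition is an assumption added to make the class usable as a Hash key:

```ruby
# Hypothetical immutable wrapper that computes its hash once.
class CachedHashString
  # Overrides Object#hash with a reader for the cached value.
  attr_reader :hash

  def initialize(string)
    @string = string.dup.freeze
    @hash = @string.hash    # safe to cache because @string is frozen
  end

  def to_s
    @string
  end

  # Assumed addition: cheap checks first, full comparison last.
  def ==(other)
    other.class == self.class &&
      hash == other.hash &&
      to_s == other.to_s
  end
  alias eql? ==
end

seen = { CachedHashString.new("some very long text") => true }
p seen.key?(CachedHashString.new("some very long text"))    # true
p seen.key?(CachedHashString.new("some other text"))        # false
```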

Of course in this case a sane #eql? implementation would also bail out
as soon as the lengths differ, but my point was that #hash doesn’t have
to be credible on whether two objects really differ, while #eql? has
to, so in many cases #eql? has to start with checking all the properties
that #hash value depends upon anyway (but Array#eql? and Array#hash are
a good counterexample where such checks can bail out faster), plus it
often should check the class of the ‘other’ as well (which is quick,
but one more check nevertheless).

If the set of attributes to be used for the specific comparison needed
in this thread is not the same as the set that we identify as keyish
for class User in general one cannot use User#eql? and User#hash for
quick set intersection.

Sure, but I assume it’s not a very common situation; I’d think twice
before I designed an object with different ‘equality’ semantics. On
the other hand, crafting your own #==, #hash and #eql? is quite common
(at least I do it very often, because I often end up storing my objects
in Sets).

Note also that I was explicitly replying to the remark that it’s
‘odd that both [#hash and #eql?] are necessary’, not to the OP. :wink:

— Shot

···

2010/6/11 Shot (Piotr Szotkowski) <shot@hot.pl>:

--
1986: Brad Cox and Tom Love create Objective-C, announcing ‘this
language has all the memory safety of C combined with all the blazing
speed of Smalltalk’. Modern historians suspect the two were dyslexic.
[James Iry, A Brief, Incomplete, and Mostly Wrong History of Programming Languages]

No problem. I think I fueled it by not including a comment in the original posting. Sorry for that.

Kind regards

  robert

···

On 12.06.2010 11:01, Robert Dober wrote:

On Sat, Jun 12, 2010 at 10:55 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

I don't think we disagree, nor do I argue with you. I just posted blog
links as illustration to Rein's point about how to implement those methods.

Forgive my confusion then.
