Inconsistent behavior in Array methods

I recently tracked down a bug in my code to a line that read:

    cards = cards.sort.uniq

This seemed perfectly reasonable - the cards array contained Card
objects which were Enumerable. However, the bug was due to the fact that
Array#uniq does not use the object’s == method to do comparisons - even
though Array#sort uses the object’s <=> method. Upon investigation, I found
that the implementation of Array#uniq uses a hash, and since the key to the
hash is the object ID, uniqueness ends up being defined as (obj1.id ==
obj2.id). This was surprising to me.

Further investigation shows that the following methods of Array ignore
the object’s == method:

    Array#&
    Array#|
    Array#uniq
    Array#uniq!

On the other hand, the following methods honor the object's == method:

    Array#-
    Array#==
    Array#===
    Array#assoc
    Array#delete
    Array#include?
    Array#index
    Array#rassoc
    Array#rindex

In addition, the following methods honor the object's <=> method:

    Array#<=>
    Array#sort
    Array#sort!

Not only does this cause an inconsistency in the definition of equality
between “set intersection” (Array#&) and “set difference” (Array#-), it
also leads to different behavior between Numerics (and Strings) and other
objects:

    irb(main):001:0> RUBY_VERSION
    => "1.6.8"
    irb(main):002:0> class A
    irb(main):003:1>   def initialize(i) @i = i end
    irb(main):004:1>   def ==(j) @i == j end
    irb(main):005:1> end
    => nil
    irb(main):006:0> [1,1,2].uniq
    => [1, 2]
    irb(main):007:0> ['1','1','2'].uniq
    => ["1", "2"]
    irb(main):008:0> [A.new(1),A.new(1),A.new(2)].uniq
    => [#<A:0x810190c @i=1>, #<A:0x81018f8 @i=1>, #<A:0x81018e4 @i=2>]

It seems to me that Array#&, Array#|, Array#uniq, and Array#uniq! should
determine uniqueness by the object's == method instead of the object’s ID.
This seems more in line with methods like Array#- and Array#include?.

Any other thoughts?

- Warren Brown

Hi,

In message “Inconsistent behavior in Array methods” on 03/02/21, “Warren Brown” wkb@airmail.net writes:

> I recently tracked down a bug in my code to a line that read:
>
>     cards = cards.sort.uniq
>
> This seemed perfectly reasonable - the cards array contained Card
> objects which were Enumerable. However, the bug was due to the fact that
> Array#uniq does not use the object’s == method to do comparisons - even
> though Array#sort uses the object’s <=> method. Upon investigation, I found
> that the implementation of Array#uniq uses a hash, and since the key to the
> hash is the object ID, uniqueness ends up being defined as (obj1.id ==
> obj2.id). This was surprising to me.

I know, but I thought the order of execution time was more important,
especially for big arrays.

						matz.
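
To make the execution-time trade-off concrete, here is a minimal sketch of what a uniq based purely on == might look like. The helper uniq_by_equality and the Token class are hypothetical names made up for this illustration; they are not part of Ruby or of the original code. Because == gives no way to jump straight to a candidate match, each element has to be compared against every element already kept, which is quadratic in the worst case, whereas a hash lookup is roughly constant time per element.

```ruby
# Hypothetical helper, not part of Ruby: deduplicate using only ==.
def uniq_by_equality(items)
  result = []
  items.each do |item|
    # Array#include? compares with ==, so every check scans the kept items.
    result << item unless result.include?(item)
  end
  result
end

# Hypothetical class that defines == but neither eql? nor hash.
class Token
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def ==(other)
    other.is_a?(Token) && value == other.value
  end
end

tokens = [Token.new(1), Token.new(1), Token.new(2)]
by_equality = uniq_by_equality(tokens) # slow path, honors ==
by_hash = tokens.uniq                  # built-in, hash based
```

Here by_equality keeps two tokens, while tokens.uniq keeps all three because Token does not define eql? and hash.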

Hello Warren,

Thursday, February 20, 2003, 9:27:27 PM, you wrote:

> though Array#sort uses the object’s <=> method.

are you sure? :-)

    class Fixnum
      alias q <=>
      def <=>x
        -q(x)
      end
    end

    p [1,2,3,4].sort #=>[1, 2, 3, 4]
    p [1,2,3,4].sort {|a,b| a<=>b} #=>[4, 3, 2, 1]


Best regards,
Bulat mailto:bulatz@integ.ru

I ran into a similar issue that also surprised me. Below is an example of
it. It seems that the union operator (‘|’) uses object#id instead of
object#== to compare values. I understand the need for performance, but it
would be nice to be able to use == instead of comparing the id’s.

    irb(main):001:0> [1,2,3] | [3,4]
    => [1, 2, 3, 4]
    irb(main):002:0> ["a","b","c"] | ["c","d"]
    => ["a", "b", "c", "d"]
    irb(main):003:0> class A
    irb(main):004:1>   def initialize(x,y)
    irb(main):005:2>     @x,@y = x,y
    irb(main):006:2>   end
    irb(main):007:1>   def ==(other)
    irb(main):008:2>     @x == other.x && @y == other.y
    irb(main):009:2>   end
    irb(main):010:1> end
    => nil
    irb(main):011:0> class A
    irb(main):012:1>   attr_reader :x,:y
    irb(main):013:1> end
    => nil
    irb(main):014:0> [A.new(1,1), A.new(1,2)] | [A.new(1,2), A.new(1,3)]
    => [#<A:0x2810408 @y=1, @x=1>, #<A:0x28103f0 @y=2, @x=1>,
        #<A:0x2810390 @y=2, @x=1>, #<A:0x2810318 @y=3, @x=1>]

Oddly though, if you do the same with strings, it doesn’t double up on the
strings even though they have differing object id’s. Why is that?

    irb(main):001:0> a = "a"
    => "a"
    irb(main):002:0> b = "b"
    => "b"
    irb(main):003:0> b1 = "b"
    => "b"
    irb(main):004:0> c = "c"
    => "c"
    irb(main):005:0> [a, b] | [b1, c]
    => ["a", "b", "c"]
    irb(main):006:0> b.id
    => 20682852
    irb(main):007:0> b1.id
    => 20674092

Steve Tuckner
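
A likely explanation for the string case, sketched below: String defines hash and eql? in terms of its contents, so two distinct String objects with the same characters land on the same hash key. The variable names are only for illustration, and the sketch uses equal? for the identity check rather than comparing ids directly.

```ruby
b  = "b"
b1 = b.dup # a second, distinct String object with the same contents

same_object = b.equal?(b1)       # identity comparison
same_hash   = b.hash == b1.hash  # String#hash is computed from the contents
same_by_eql = b.eql?(b1)         # String#eql? compares the contents

union = ["a", b] | [b1, "c"]
```

Here same_object is false while same_hash and same_by_eql are both true, which is why the union does not double up on "b".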

Bulat,

> > though Array#sort uses the object’s <=> method.
>
> are you sure? :-)
>
>     class Fixnum
>       alias q <=>
>       def <=>x
>         -q(x)
>       end
>     end
>
>     p [1,2,3,4].sort #=>[1, 2, 3, 4]
>     p [1,2,3,4].sort {|a,b| a<=>b} #=>[4, 3, 2, 1]

Quite sure:

    irb(main):001:0> RUBY_VERSION
    => "1.6.8"
    irb(main):002:0> class A; end
    => nil
    irb(main):003:0> [A.new,A.new].sort
    NameError: undefined method `<=>' for #<A:0x8054cb4>
            from (irb):3:in `sort'
            from (irb):3

There are special cases for Fixnum and String in Array#sort:

    if (FIXNUM_P(a) && FIXNUM_P(b)) {
        if (a > b) return 1;
        if (a < b) return -1;
        return 0;
    }
    if (TYPE(a) == T_STRING && TYPE(b) == T_STRING) {
        return rb_str_cmp(a, b);
    }

    retval = rb_funcall(a, cmp, 1, b);
    return rb_cmpint(retval);

- Warren Brown
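
For objects that are neither Fixnums nor Strings, the fall-through to the object's own <=> can be seen with a small sketch. The Descending class is a hypothetical name invented for this illustration; its <=> is deliberately reversed so that the effect on Array#sort is visible.

```ruby
# Hypothetical class whose <=> is deliberately reversed.
class Descending
  attr_reader :number

  def initialize(number)
    @number = number
  end

  # Larger numbers compare as "smaller", so sort puts them first.
  def <=>(other)
    other.number <=> number
  end
end

items = [Descending.new(1), Descending.new(3), Descending.new(2)]
sorted_numbers = items.sort.map { |item| item.number }
```

Here sorted_numbers comes out as [3, 2, 1], so Array#sort without a block does call the object's <=> for ordinary objects.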

Hi,

In message “Re: Inconsistent behavior in Array methods” on 03/02/21, Steve Tuckner STUCKNER@MULTITECH.COM writes:

> I ran into a similar issue that also surprised me. Below is an example of
> it. It seems that the union operator (‘|’) uses object#id instead of
> object#== to compare values. I understand the need for performance, but it
> would be nice to be able to use == instead of comparing the id’s.

It uses Hash inside, so that you have to override “eql?” and “hash” as
well as “==”.

						matz.
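
A minimal sketch of that advice, assuming a Card class with rank and suit attributes (both attribute names are made up for this illustration; the original Card class was never shown). Once eql? and hash agree with ==, the hash-based methods treat equal cards as duplicates.

```ruby
# Hypothetical Card class; rank and suit are assumed attributes.
class Card
  attr_reader :rank, :suit

  def initialize(rank, suit)
    @rank = rank
    @suit = suit
  end

  def ==(other)
    other.is_a?(Card) && rank == other.rank && suit == other.suit
  end

  # Hash-based methods call eql? and hash, so keep them consistent with ==.
  alias_method :eql?, :==

  def hash
    [rank, suit].hash
  end
end

hand = [Card.new(1, :spades), Card.new(1, :spades), Card.new(2, :hearts)]
unique_cards = hand.uniq
union_cards = [Card.new(1, :spades)] | [Card.new(1, :spades), Card.new(3, :clubs)]
```

With these definitions, unique_cards and union_cards each contain two cards. The important design point is that objects which are eql? must also return the same hash value, otherwise the lookup can miss them.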