Array::uniq { block }?

Belorion · 26 January 2005 15:33

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Daniel_Berger3 · 26 January 2005 16:15

Belorion wrote:

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Not that this is a bad idea, but why do I get this feeling that we're
eventually going to find an excuse to allow blocks to virtually every
method in the core library?

Where does it all end?!

Dan

Robert · 26 January 2005 18:00

"Belorion" <belorion@gmail.com> schrieb im Newsbeitrag news:a48d774d05012607334cd93e1e@mail.gmail.com...

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

The block is accepted but apparently not used:

%w{a b c d a d}.uniq {|*x| p x}

=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

How about using a Hash in the meantime?

h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Or

h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Kind regards

robert

Robert · 26 January 2005 18:05

"Robert Klemme" <bob.news@gmx.net> schrieb im Newsbeitrag news:35q3t9F4jlog7U1@individual.net...

"Belorion" <belorion@gmail.com> schrieb im Newsbeitrag news:a48d774d05012607334cd93e1e@mail.gmail.com...

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

The block is accepted but apparently not used:

%w{a b c d a d}.uniq {|*x| p x}

=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

Delete that: you can provide a block for *every* method - it's simply ignored. So no bug, but unique obviously is not built to use it. Stupid me...

robert

···

How about using a Hash in the meantime?

h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Or

h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Kind regards

robert

Its_Me · 26 January 2005 18:55

I agree, #uniq is certainly not the only candidate. I think it would be very
useful to compile a list of all such methods that could use a block to
override some default internal method (#== in the case of uniq), and
consider these for 2.0.

"Daniel Berger" <djberg96@hotmail.com> wrote in message
news:1106756045.223279.94600@f14g2000cwb.googlegroups.com...

···

Belorion wrote:
> I have an array of arrays. I want to be able to do a uniq operation
> on my array, and have it accept a block so I can specify what to
> examine for uniqueness. For example, in IRB
>
> a = [ [1,2], [3,4], [1,3] ]
> a.sort { |x,y| x[0] <=> y[0] }
>
> returns => [ [1,2], [1,3], [3,4] ]
>
> However, it does not appear uniq accepts blocks:
>
> a = [ [1,2], [3,4], [1,3] ]
> a.uniq { |x,y| x[0] == y[0] }
>
> returns => [ [1,2], [3,4], [1,3] ]
> and a.uniq! with the above block returns nil.
>
> Am I stuck writing my own deep_uniq? Granted, it would be hard, but
> certainly not as clean as the above.

Not that this is a bad idea, but why do I get this feeling that we're
eventually going to find an excuse to allow blocks to virtually every
method in the core library?

Where does it all end?!

Dan

ts1 · 27 January 2005 10:33

I agree, #uniq is certainly not the only candidate. I think it would be very
useful to compile a list of all such methods that could use a block to
override some default internal method (#== in the case of uniq), and
consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Guy Decoux

Robert · 27 January 2005 11:07

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200501271033.j0RAXMu05608@moulon.inra.fr...

> I agree, #uniq is certainly not the only candidate. I think it would

be very

> useful to compile a list of all such methods that could use a block

to

> override some default internal method (#== in the case of uniq), and
> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Hm...

class Dummy
  def eql?(x) puts "eql?"; super end
  def equal?(x) puts "equal?"; super end
  def ==(x) puts "=="; super end
  def hash() puts "hash"; super end
  def id() puts "id"; super end
end

h={Dummy.new => 1, Dummy.new =>2}

hash
hash
=> {#<Dummy:0x10187428>=>2, #<Dummy:0x10187440>=>1}

h.keys.concat(h.keys).uniq

hash
hash
hash
hash
hash
hash
hash
hash
=> [#<Dummy:0x10187428>, #<Dummy:0x10187440>]

I don't see eql? called here - maybe it's an internal optimization that
short circuits identity. Or did I miss something?

Kind regards

robert

Martin_DeMello1 · 27 January 2005 11:11

Which raises the problem that it works globally, unlike the unix uniq
which only uniqs consecutive elements. Passing in a block would lead to
a performance hit since you'd need to compare every pair of elements
without the benefit of a hash function. A uniq_by would probably cover
most of the use cases anyway.

martin

···

ts <decoux@moulon.inra.fr> wrote:

> I agree, #uniq is certainly not the only candidate. I think it would be very
> useful to compile a list of all such methods that could use a block to
> override some default internal method (#== in the case of uniq), and
> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

ts1 · 27 January 2005 11:11

I don't see eql? called here - maybe it's an internal optimization that
short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
  * if eql? return true, this is the "same" object (i.e. it will use the
    same key)
  * otherwise there is collision

Object#hash return the id of object

Guy Decoux

Its_Me · 28 January 2005 02:25

"Martin DeMello" <martindemello@yahoo.com> wrote in message
news:rZ3Kd.187524$6l.10402@pd7tw2no...

>
> > I agree, #uniq is certainly not the only candidate. I think it would

be very

> > useful to compile a list of all such methods that could use a block

to

> > override some default internal method (#== in the case of uniq), and
> > consider these for 2.0.
>
> #uniq don't use #==, it use #hash and #eql?

Which raises the problem that it works globally, unlike the unix uniq
which only uniqs consecutive elements. Passing in a block would lead to
a performance hit since you'd need to compare every pair of elements
without the benefit of a hash function.

Good points. Probably applies to many other cases that I was thinking 'Now,
why is there no block here?'

A uniq_by would probably cover
most of the use cases anyway.

Why a different method? Is that the way it is done elsewhere? Why not
def uniq
    if block_given?
        ... less_efficient
    else default_way
end

···

ts <decoux@moulon.inra.fr> wrote:

Robert · 27 January 2005 12:25

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200501271111.j0RBBJu07211@moulon.inra.fr...

> I don't see eql? called here - maybe it's an internal optimization

that

> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Ah, yes of course. That's it! Thanks for reminding me.

Now if they have the same hash value :
  * if eql? return true, this is the "same" object (i.e. it will use the
    same key)
  * otherwise there is collision

Of all what you said I don't understand your last point: Why is there a
collision?

class Foo; def hash() 0 end end

=> nil

h = {Foo.new => 1, Foo.new => 2}

=> {#<Foo:0x101a9808>=>2, #<Foo:0x101a9898>=>1}

h.keys.uniq

=> [#<Foo:0x101a9808>, #<Foo:0x101a9898>]

And indeed a redefined Dummy clarifies this:

class Dummy
  def eql?(x) puts "eql?"; super end
  def equal?(x) puts "equal?"; super end
  def ==(x) puts "=="; super end
  def hash() puts "hash"; 0 end
  def id() puts "id"; super end
end

h={Dummy.new => 1, Dummy.new => 2}

hash
hash
eql?
=> {#<Dummy:0x1018da28>=>2, #<Dummy:0x1018da88>=>1}

h.keys.uniq

hash
hash
eql?
=> [#<Dummy:0x1018da28>, #<Dummy:0x1018da88>]

Again a reminder that it's always a good idea to override #hash, #== and
#eql? in one go for classes that should be used as hash keys or in sets.

Object#hash return the id of object

Yeah, but that might be overridden in an arbitrary way. Normally one
can't rely on that #hash tells anything about the identity of an instance
other than probably that instances with different hash are not identical
and not equivalent.

Kind regards

robert

Bertram_Scharpf · 27 January 2005 13:46

Hi,

···

Am Donnerstag, 27. Jan 2005, 20:11:35 +0900 schrieb ts:

> I don't see eql? called here - maybe it's an internal optimization that
> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
  * if eql? return true, this is the "same" object (i.e. it will use the
    same key)
  * otherwise there is collision

What a look at the interpreter source code will validate.
The elements addes as hash keys.

Strings are treated special, by the way.

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

Martin_DeMello1 · 28 January 2005 08:40

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i[i] rather than
i. a.uniq {|i,j| f(i,j) == true } would be the generalised method, but
as noted that's O(n^2) rather than O(n). (Also I'm not sure it'd be
algorithmically sound - you could introduce order dependencies with a
sufficiently adversarial comparison function.)

martin

···

itsme213 <itsme213@hotmail.com> wrote:

"Martin DeMello" <martindemello@yahoo.com> wrote in message

> A uniq_by would probably cover
> most of the use cases anyway.

Why a different method? Is that the way it is done elsewhere? Why not
def uniq
    if block_given?
        ... less_efficient
    else default_way
end

ts1 · 27 January 2005 12:39

Of all what you said I don't understand your last point: Why is there a
collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a
different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same hash
value but are different (eql? return false) (see st.c)

Guy Decoux

7rans · 30 January 2005 10:45

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i[i]
rather than i. a.uniq {|i,j| f(i,j) == true } would be the
generalised method, but as noted that's O(n^2) rather
than O(n). (Also I'm not sure it'd be algorithmically sound
- you could introduce order dependencies with a sufficiently
adversarial comparison function.)

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.
Thanks,
T.

Robert · 27 January 2005 12:50

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200501271239.j0RCdOg11318@moulon.inra.fr...

> Of all what you said I don't understand your last point: Why is there

a

> collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a

I think the term is "bucket" and not "key". Maybe that was confusing me.

different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same

hash

value but are different (eql? return false) (see st.c)

Ok, I now see what "collision" you meant. Of course there is a collision
but still both keys are stored.

Thx for clarifying!

robert

···

Guy Decoux

Pit · 30 January 2005 11:07

Trans schrieb:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

require 'set'

   class Array
     def uniq_by
       result =
       values = Set.new
       each do |elem|
         value = yield elem
         unless values.include? value
           values << value
           result << elem
         end
       end
       result
     end
   end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject

Regards,
Pit

Martin_DeMello1 · 30 January 2005 11:25

module Enumerable
  def uniq_by(*args, &blk)
    blk = lambda {|i| i.send(*args)} unless args.empty?
    raise "needs args or block" unless blk
    h = {}
    a =
    each {|i|
      k = blk.call(i)
      a << i unless h[k]
      h[k] = true
    }
    a
  end
end

p (-5..5).to_a.uniq_by {|i| i*i}
p [[1, 2], [3, 4], [5, 6], [7, 2]].uniq_by(:at, 1)

martin

···

Trans <transfire@gmail.com> wrote:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

7rans · 30 January 2005 13:55

LOL

*http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/128868
Oh, that's rich. I think I need a vacation.

Florian_Gross · 30 January 2005 14:10

Pit Capitain wrote:

Trans schrieb:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

  require 'set'

  class Array
    def uniq_by
      result =
      values = Set.new
      each do |elem|
        value = yield elem
        unless values.include? value
          values << value
          result << elem
        end
      end
      result
    end
  end

  p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject

module Enumerable
   def uniq_by()
     inject() do |state, item|
       value = yield(item)
       state.include?(value) ? state : state + [item]
     end
   end
end

Topic		Replies	Views
Array uniq ruby-talk	1	88	5 March 2005
Array.uniq on specific attributes ruby-talk	0	142	20 February 2012
Making Array#uniq work ruby-talk	0	70	24 January 2006
Uniq! array of arrays based on one element? ruby-talk	4	147	31 October 2012
Array.uniq on specific attributes ruby-talk	0	127	20 February 2012

Array::uniq { block }?

Related topics