Array::uniq { block }?

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Belorion wrote:

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Not that this is a bad idea, but why do I get this feeling that we're
eventually going to find an excuse to allow blocks to virtually every
method in the core library?

Where does it all end?!

Dan

"Belorion" <belorion@gmail.com> schrieb im Newsbeitrag news:a48d774d05012607334cd93e1e@mail.gmail.com...

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

The block is accepted but apparently not used:

%w{a b c d a d}.uniq {|*x| p x}

=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

How about using a Hash in the meantime?

h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Or

h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Kind regards

    robert

"Robert Klemme" <bob.news@gmx.net> schrieb im Newsbeitrag news:35q3t9F4jlog7U1@individual.net...

"Belorion" <belorion@gmail.com> schrieb im Newsbeitrag news:a48d774d05012607334cd93e1e@mail.gmail.com...

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

The block is accepted but apparently not used:

%w{a b c d a d}.uniq {|*x| p x}

=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

Delete that: you can provide a block for *every* method - it's simply ignored. So no bug, but unique obviously is not built to use it. Stupid me...

    robert

···

How about using a Hash in the meantime?

h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Or

h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

=> {1=>[1, 2], 3=>[3, 4]}

h.values

=> [[1, 2], [3, 4]]

Kind regards

   robert

I agree, #uniq is certainly not the only candidate. I think it would be very
useful to compile a list of all such methods that could use a block to
override some default internal method (#== in the case of uniq), and
consider these for 2.0.

"Daniel Berger" <djberg96@hotmail.com> wrote in message
news:1106756045.223279.94600@f14g2000cwb.googlegroups.com...

···

Belorion wrote:
> I have an array of arrays. I want to be able to do a uniq operation
> on my array, and have it accept a block so I can specify what to
> examine for uniqueness. For example, in IRB
>
> a = [ [1,2], [3,4], [1,3] ]
> a.sort { |x,y| x[0] <=> y[0] }
>
> returns => [ [1,2], [1,3], [3,4] ]
>
> However, it does not appear uniq accepts blocks:
>
> a = [ [1,2], [3,4], [1,3] ]
> a.uniq { |x,y| x[0] == y[0] }
>
> returns => [ [1,2], [3,4], [1,3] ]
> and a.uniq! with the above block returns nil.
>
> Am I stuck writing my own deep_uniq? Granted, it would be hard, but
> certainly not as clean as the above.

Not that this is a bad idea, but why do I get this feeling that we're
eventually going to find an excuse to allow blocks to virtually every
method in the core library?

Where does it all end?!

Dan

I agree, #uniq is certainly not the only candidate. I think it would be very
useful to compile a list of all such methods that could use a block to
override some default internal method (#== in the case of uniq), and
consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Guy Decoux

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200501271033.j0RAXMu05608@moulon.inra.fr...

> I agree, #uniq is certainly not the only candidate. I think it would

be very

> useful to compile a list of all such methods that could use a block

to

> override some default internal method (#== in the case of uniq), and
> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Hm...

class Dummy
  def eql?(x) puts "eql?"; super end
  def equal?(x) puts "equal?"; super end
  def ==(x) puts "=="; super end
  def hash() puts "hash"; super end
  def id() puts "id"; super end
end

h={Dummy.new => 1, Dummy.new =>2}

hash
hash
=> {#<Dummy:0x10187428>=>2, #<Dummy:0x10187440>=>1}

h.keys.concat(h.keys).uniq

hash
hash
hash
hash
hash
hash
hash
hash
=> [#<Dummy:0x10187428>, #<Dummy:0x10187440>]

I don't see eql? called here - maybe it's an internal optimization that
short circuits identity. Or did I miss something?

Kind regards

    robert

Which raises the problem that it works globally, unlike the unix uniq
which only uniqs consecutive elements. Passing in a block would lead to
a performance hit since you'd need to compare every pair of elements
without the benefit of a hash function. A uniq_by would probably cover
most of the use cases anyway.

martin

···

ts <decoux@moulon.inra.fr> wrote:

> I agree, #uniq is certainly not the only candidate. I think it would be very
> useful to compile a list of all such methods that could use a block to
> override some default internal method (#== in the case of uniq), and
> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

I don't see eql? called here - maybe it's an internal optimization that
short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
  * if eql? return true, this is the "same" object (i.e. it will use the
    same key)
  * otherwise there is collision

Object#hash return the id of object

Guy Decoux

"Martin DeMello" <martindemello@yahoo.com> wrote in message
news:rZ3Kd.187524$6l.10402@pd7tw2no...

>
> > I agree, #uniq is certainly not the only candidate. I think it would

be very

> > useful to compile a list of all such methods that could use a block

to

> > override some default internal method (#== in the case of uniq), and
> > consider these for 2.0.
>
> #uniq don't use #==, it use #hash and #eql?

Which raises the problem that it works globally, unlike the unix uniq
which only uniqs consecutive elements. Passing in a block would lead to
a performance hit since you'd need to compare every pair of elements
without the benefit of a hash function.

Good points. Probably applies to many other cases that I was thinking 'Now,
why is there no block here?'

A uniq_by would probably cover
most of the use cases anyway.

Why a different method? Is that the way it is done elsewhere? Why not
def uniq
    if block_given?
        ... less_efficient
    else default_way
end

···

ts <decoux@moulon.inra.fr> wrote:

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200501271111.j0RBBJu07211@moulon.inra.fr...

> I don't see eql? called here - maybe it's an internal optimization

that

> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Ah, yes of course. That's it! Thanks for reminding me.

Now if they have the same hash value :
  * if eql? return true, this is the "same" object (i.e. it will use the
    same key)
  * otherwise there is collision

Of all what you said I don't understand your last point: Why is there a
collision?

class Foo; def hash() 0 end end

=> nil

h = {Foo.new => 1, Foo.new => 2}

=> {#<Foo:0x101a9808>=>2, #<Foo:0x101a9898>=>1}

h.keys.uniq

=> [#<Foo:0x101a9808>, #<Foo:0x101a9898>]

And indeed a redefined Dummy clarifies this:

class Dummy
  def eql?(x) puts "eql?"; super end
  def equal?(x) puts "equal?"; super end
  def ==(x) puts "=="; super end
  def hash() puts "hash"; 0 end
  def id() puts "id"; super end
end

h={Dummy.new => 1, Dummy.new => 2}

hash
hash
eql?
=> {#<Dummy:0x1018da28>=>2, #<Dummy:0x1018da88>=>1}

h.keys.uniq

hash
hash
eql?
=> [#<Dummy:0x1018da28>, #<Dummy:0x1018da88>]

Again a reminder that it's always a good idea to override #hash, #== and
#eql? in one go for classes that should be used as hash keys or in sets.

Object#hash return the id of object

Yeah, but that might be overridden in an arbitrary way. Normally one
can't rely on that #hash tells anything about the identity of an instance
other than probably that instances with different hash are not identical
and not equivalent.

Kind regards

    robert

Hi,

···

Am Donnerstag, 27. Jan 2005, 20:11:35 +0900 schrieb ts:

> I don't see eql? called here - maybe it's an internal optimization that
> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
  * if eql? return true, this is the "same" object (i.e. it will use the
    same key)
  * otherwise there is collision

What a look at the interpreter source code will validate.
The elements addes as hash keys.

Strings are treated special, by the way.

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i[i] rather than
i. a.uniq {|i,j| f(i,j) == true } would be the generalised method, but
as noted that's O(n^2) rather than O(n). (Also I'm not sure it'd be
algorithmically sound - you could introduce order dependencies with a
sufficiently adversarial comparison function.)

martin

···

itsme213 <itsme213@hotmail.com> wrote:

"Martin DeMello" <martindemello@yahoo.com> wrote in message

> A uniq_by would probably cover
> most of the use cases anyway.

Why a different method? Is that the way it is done elsewhere? Why not
def uniq
    if block_given?
        ... less_efficient
    else default_way
end

Of all what you said I don't understand your last point: Why is there a
collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a
different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same hash
value but are different (eql? return false) (see st.c)

Guy Decoux

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i[i]
rather than i. a.uniq {|i,j| f(i,j) == true } would be the
generalised method, but as noted that's O(n^2) rather
than O(n). (Also I'm not sure it'd be algorithmically sound
- you could introduce order dependencies with a sufficiently
adversarial comparison function.)

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.
Thanks,
T.

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200501271239.j0RCdOg11318@moulon.inra.fr...

> Of all what you said I don't understand your last point: Why is there

a

> collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a

I think the term is "bucket" and not "key". Maybe that was confusing me.

different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same

hash

value but are different (eql? return false) (see st.c)

Ok, I now see what "collision" you meant. Of course there is a collision
but still both keys are stored.

Thx for clarifying!

    robert

···

Guy Decoux

Trans schrieb:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

   require 'set'

   class Array
     def uniq_by
       result =
       values = Set.new
       each do |elem|
         value = yield elem
         unless values.include? value
           values << value
           result << elem
         end
       end
       result
     end
   end

   p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject :wink:

Regards,
Pit

module Enumerable
  def uniq_by(*args, &blk)
    blk = lambda {|i| i.send(*args)} unless args.empty?
    raise "needs args or block" unless blk
    h = {}
    a =
    each {|i|
      k = blk.call(i)
      a << i unless h[k]
      h[k] = true
    }
    a
  end
end

p (-5..5).to_a.uniq_by {|i| i*i}
p [[1, 2], [3, 4], [5, 6], [7, 2]].uniq_by(:at, 1)

martin

···

Trans <transfire@gmail.com> wrote:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

LOL :slight_smile:

*http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/128868
Oh, that's rich. I think I need a vacation.

Pit Capitain wrote:

Trans schrieb:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

  require 'set'

  class Array
    def uniq_by
      result =
      values = Set.new
      each do |elem|
        value = yield elem
        unless values.include? value
          values << value
          result << elem
        end
      end
      result
    end
  end

  p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject :wink:

module Enumerable
   def uniq_by()
     inject() do |state, item|
       value = yield(item)
       state.include?(value) ? state : state + [item]
     end
   end
end