Detecting duplicates in an array, anything in the standard library?

Hi!

Just wondering if there is something simple already built in the std
library to remove duplicates from an array (or an enumerable). I've
seen and used various approaches, like:

module Enumerable
  def dups
    inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
  end
end

which will give:

%w(a b c c).dups

=> ["c"]
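For reference, the one-liner above can be unpacked into named steps. This is only a sketch: `duplicates_of` is a made-up name, and it assumes Ruby 1.9 or later, where Hash#select returns a Hash and hashes keep insertion order.

```ruby
# Hypothetical helper, not part of any library: returns each value
# that occurs more than once, in order of first appearance.
def duplicates_of(enum)
  counts = Hash.new(0)                 # missing keys start at 0
  enum.each { |v| counts[v] += 1 }     # one pass to count occurrences
  counts.select { |_, n| n > 1 }.keys  # keep values seen at least twice
end

p duplicates_of(%w(a b c c))   # => ["c"]
```

Using Hash.new(0) avoids the `h[v].to_i` conversion that the inject version needs for keys that have not been seen yet.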

Anything more elegant ?

cheers

Thibaut

Thibaut Barrère wrote:

Anything more elegant ?

No! :-)) - I tried it only using Arrays...

a = [1,2,3,4,5,4,2,2]
p a.inject([[],a[1..-1]]){|r,e| r[1].include?(e) ? [r[0]<<e, r[1][1..-1]] :
  [r[0], r[1][1..-1]]}[0].uniq # => [2, 4]
b = %w(a b c c)
p b.inject([[],b[1..-1]]){|r,e| r[1].include?(e) ? [r[0]<<e, r[1][1..-1]] :
  [r[0], r[1][1..-1]]}[0].uniq # => ["c"]

Wolfgang Nádasi-Donner

--
Posted via http://www.ruby-forum.com/.

Couldn't you also just do a union with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :-))
~ Ari
English is like a pseudo-random number generator - there are a bajillion rules to it, but nobody cares.

On Aug 19, 2007, at 6:39 AM, Thibaut Barrère wrote:

> Anything more elegant ?

Here's a modification of a technique used by
Simon Kroger:

class Array
  def dups
    values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
  end
end
    ==>nil

%w(a b a c c d).dups
    ==>["a", "c"]
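A short sketch of what the index trick does, written as a plain method so nothing is monkey patched (`later_occurrences` is a hypothetical name). It keeps every element that is not the first occurrence of its value, so a value that appears three times comes back twice; appending `.uniq` is one way to collapse that.

```ruby
# Hypothetical standalone version of the index-based technique.
def later_occurrences(ary)
  first_indexes = ary.uniq.map { |x| ary.index(x) }  # first position of each value
  rest = (0...ary.size).to_a - first_indexes         # every later position
  ary.values_at(*rest)
end

p later_occurrences(%w(a b a c c d))   # => ["a", "c"]
p later_occurrences(%w(a a a b))       # => ["a", "a"]
p later_occurrences(%w(a a a b)).uniq  # => ["a"]
```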

On Aug 19, 5:38 am, Thibaut Barrère <thibaut.barr...@gmail.com> wrote:

> Anything more elegant ?

Actually you are not deleting duplicates as far as I can see. Here's another one

irb(main):012:0> a.inject(Hash.new(0)) {|h,x| h[x]+=1;h}.inject([]){|h,(k,v)|h<<k if v>1;h}
=> ["c"]

You could even change that to need just one iteration through the original array but it's too late and I'm too lazy. :-)
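One possible reading of the "one iteration" remark, as a sketch rather than Robert's actual code: keep the running counts in a hash and record a value the moment its count reaches 2. The method name `dups_single_pass` is invented for this example.

```ruby
# Hypothetical single-pass variant: no second pass over the counts.
def dups_single_pass(enum)
  counts = Hash.new(0)
  enum.inject([]) do |found, x|
    counts[x] += 1
    found << x if counts[x] == 2   # record on the second sighting only
    found
  end
end

p dups_single_pass(%w(a b c c))               # => ["c"]
p dups_single_pass([1, 2, 3, 4, 5, 4, 2, 2])  # => [4, 2]
```

Note that the result is ordered by where each value's second occurrence falls, not by first appearance.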

Kind regards

  robert

On 19.08.2007 12:38, Thibaut Barrère wrote:

> Just wondering if there is something simple already built in the std
> library to remove duplicates from an array (or an enumerable).

# inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys

sshhh, in ruby1.9, i think you just do

   group_by{|e|e}.select{|_,v| v.size>1}.keys

yes, yes, hash#select now hopefully returns hash.
can't we have group_by now ? :-)
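A self-contained sketch of the group_by idea, assuming Ruby 1.9 or later (where Enumerable#group_by is available and Hash#select returns a Hash). The helper name `dups_by_grouping` is hypothetical.

```ruby
# Hypothetical helper built on Enumerable#group_by.
def dups_by_grouping(enum)
  groups = enum.group_by { |e| e }          # value => array of its occurrences
  groups.select { |_, v| v.size > 1 }.keys  # keep values that occur at least twice
end

p dups_by_grouping(%w(a b c c))              # => ["c"]
p dups_by_grouping([nil, 1, 2, 2, 3, nil])   # => [nil, 2]
```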

kind regards -botp


I just thought I would put in my 2 cents. I actually had to create a
script that would run through a file and find all the duplicate account
numbers and the number of times they were duplicated and write that to a
new file.

@lines = Hash.new(0)
@group = Array.new
IO.readlines("C:/test/" + @file).each { |line|
@lines[line.split(';')[5].chomp] += 1 }
@lines.each_pair { |k,v| @group << k.to_s + " => " + v.to_s if v > 1 }

This is the part of the script that reads the file and collects the duplicates.
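For anyone who wants to try the idea without a file on disk, here is a sketch that runs on in-memory lines. The sample data is invented, and the account number is assumed to be the sixth semicolon-separated field, as in the snippet above.

```ruby
# Invented sample data standing in for the lines of the input file.
lines = [
  "a;b;c;d;e;1001\n",
  "a;b;c;d;e;1002\n",
  "a;b;c;d;e;1001\n",
  "a;b;c;d;e;1003\n",
  "a;b;c;d;e;1001\n"
]

counts = Hash.new(0)
lines.each { |line| counts[line.split(';')[5].chomp] += 1 }

# One report line per account number that appears more than once.
report = counts.select { |_, n| n > 1 }.map { |acct, n| "#{acct} => #{n}" }
puts report   # prints: 1001 => 3
```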

~Jeremy


Hi --

Thibaut Barrère wrote:

Anything more elegant ?

No! :-)) - I tried it only using Arrays...

a = [1,2,3,4,5,4,2,2]
p a.inject([[],a[1..-1]]){|r,e| r[1].include?(e) ? [r[0]<<e, r[1][1..-1]] :
  [r[0], r[1][1..-1]]}[0].uniq # => [2, 4]
b = %w(a b c c)
p b.inject([[],b[1..-1]]){|r,e| r[1].include?(e) ? [r[0]<<e, r[1][1..-1]] :
  [r[0], r[1][1..-1]]}[0].uniq # => ["c"]

How about:

   >> a = [1,2,3,4,5,4,2,2]
   => [1, 2, 3, 4, 5, 4, 2, 2]
   >> a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
   => [1, 2, 3, 4, 5]

David

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

--
* Books:
   RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
   RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
     & consulting: Ruby Power and Light, LLC (http://www.rubypal.com)

Hi --

Couldn't you also just do a union with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :-))

I think that just reinvents uniq (see my previous reinvention :-)
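A quick check of that claim, as a sketch: Array#& is set intersection (union would be Array#|), and intersecting an array with itself leaves one copy of each element, which is what uniq returns.

```ruby
a = %w(a b c b a)

p a & a                  # => ["a", "b", "c"]
p a | a                  # => ["a", "b", "c"]
p a.uniq                 # => ["a", "b", "c"]
p((a & a) == a.uniq)     # => true
```

So `a & a` removes duplicates; it does not report which elements were duplicated.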

For what it's worth, here's a nice-looking but probably very
inefficient version:

module ArrayStuff
   def count(e)
     select {|f| f == e }.size
   end

   def dups
     select {|e| count(e) > 1 }.uniq
   end
end

a = [1,2,3,3,4,5,2].extend(ArrayStuff)

p a.dups # [2,3]

David

On Sun, 19 Aug 2007, Ari Brown wrote:

On Aug 19, 2007, at 6:39 AM, Thibaut Barrère wrote:

Thanks for all your replies!

Does everyone agree that #dups is the best name for this? I recently
added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?

(Facets already had #nonuniq, btw.)

T.

On Aug 19, 12:34 pm, William James <w_a_x_...@yahoo.com> wrote:

> Here's a modification of a technique used by
> Simon Kroger:

> Actually you are not deleting duplicates as far as I can see.

Did I say it's too late? Man, I should've worn my glasses...

> Here's another one
>
> irb(main):012:0> a.inject(Hash.new(0)) {|h,x| h[x]+=1;h}.inject([]){|h,(k,v)|h<<k if v>1;h}
> => ["c"]
>
> You could even change that to need just one iteration through the original array but it's too late and I'm too lazy. :-)

Cheers

  robert

On 19.08.2007 23:15, Robert Klemme wrote:

On 19.08.2007 12:38, Thibaut Barrère wrote:

Duplicates can also be extracted from an array like this:

class Array

  def find_dups
    uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
  end

end

(The faster, the better; http://snippets.dzone.com/posts/show/4148 )

Cheers,

j.k.


Jeremy Woertink wrote:
I actually had to ... find all the duplicate account
numbers and the number of times they were duplicated and ... .
...
~Jeremy

A much less verbose 'nil' fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten
=> [nil, 2]

And with this fixed version it's also possible to count & grab duplicate
array items in one go:

a = [nil,1,2,2,3,nil,nil]
a = (a * 5 << "unique_obj1" << "unique_obj2").sort_by { rand }

p a.uniq.map {|v| diff = (a.size - (a-[v]).size); (diff > 1) ? [v, diff] :
nil}.compact

=> [[2, 10], [3, 5], [nil, 15], [1, 5]]

Cheers,

j.k.


David A. Black wrote:

Hi --

How about:

   >> a = [1,2,3,4,5,4,2,2]
   => [1, 2, 3, 4, 5, 4, 2, 2]
   >> a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
   => [1, 2, 3, 4, 5]

David

The problem is that he wants all non-unique elements. Unfortunately, the
difference of two arrays ignores how often an element occurs;
otherwise...

irb(main):004:0> a
=> [1, 2, 3, 4, 5, 4, 2, 2]
irb(main):005:0> b
=> [1, 2, 3, 4, 5]
irb(main):006:0> a-b
=> []

...would work. My solution is not recommended at all - it's Sunday after
lunch, and I had to choose between doing the dishes and doing something
nice first...
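To make the point about Array#- concrete, here is a sketch. Array#- drops every occurrence of each element found in the other array, so subtracting the unique elements leaves nothing. The `subtract_once` helper is hypothetical; it shows the multiset-style difference that would have worked, removing only one occurrence per element.

```ruby
a = [1, 2, 3, 4, 5, 4, 2, 2]
b = a.uniq

p a - b   # => []

# Hypothetical helper: remove one occurrence of each element of `other`.
def subtract_once(ary, other)
  rest = ary.dup
  other.each do |e|
    i = rest.index(e)
    rest.delete_at(i) if i   # delete only the first match
  end
  rest
end

p subtract_once(a, b)        # => [4, 2, 2]
p subtract_once(a, b).uniq   # => [4, 2]
```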

Wolfgang Nádasi-Donner

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

The only reason I'll accept that is because you wrote the book I'm reading.

---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est man alive

On Aug 19, 2007, at 9:06 AM, David A. Black wrote:

> I think that just reinvents uniq (see my previous reinvention :-)

> I recently added this to Facets as #duplicates to avoid proximity to
> #dup. Is that reasonable?

+1

On Aug 19, 3:05 pm, Trans <transf...@gmail.com> wrote:

> (Facets already had #nonuniq, btw.)

or...

require 'set'

new_ary = ary.to_set.to_a #set strips dups.
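As a sketch of how a Set relates to the original question: converting to a Set and back removes duplicates, much like uniq. To list the duplicates instead, one option is Set#add?, which returns nil when the element is already in the set. The helper name `dups_via_set` is made up for this example.

```ruby
require 'set'

ary = %w(a b c b a)
p ary.to_set.to_a   # => ["a", "b", "c"]

# Hypothetical helper: keep only elements that were already seen.
def dups_via_set(enum)
  seen = Set.new
  enum.reject { |e| seen.add?(e) }.uniq
end

p dups_via_set(ary)  # => ["b", "a"]
```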

On Aug 19, 5:16 pm, Robert Klemme <shortcut...@googlemail.com> wrote:

> Did I say it's too late? Man, I should've worn my glasses...

# uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

kind regards -botp


Hi --

Duplicates can also be extracted from an array like this:

class Array

def find_dups
   uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
end

end

It's buggy, though:

  >> [nil,1,2,2,3,nil].find_dups
  => [2]
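The nil goes missing because map uses nil to mean "not a duplicate", and compact then strips the genuine nil element along with those markers. A sketch of one way around it is to filter with select, so no placeholder value is needed (`find_dups_nil_safe` is a hypothetical name):

```ruby
# Hypothetical nil-safe variant of the same size-comparison idea.
def find_dups_nil_safe(ary)
  # A value is a duplicate if removing it shrinks the array by more than one.
  ary.uniq.select { |v| (ary - [v]).size < ary.size - 1 }
end

p find_dups_nil_safe([nil, 1, 2, 2, 3, nil])  # => [nil, 2]
```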

David

On Tue, 21 Aug 2007, Jimmy Kofler wrote: