How to get non-unique elements from an array?

Sam_Kong · 15 October 2005 17:56

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

h = {}
a.each do |i|
  if h[i]
    h[i] += 1
  else
    h[i] = 1
  end
end

u = []
h.each do |k, v|
  if v > 1
    u << k
  end
end

#now u == [2,3]

This works fine.
But I think there's a better way.
How do you handle such a case?

Thanks in advance.

Sam

Paolo_Capriotti · 15 October 2005 18:17

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Paolo

···

On 10/15/05, Sam Kong <sam.s.kong@gmail.com> wrote:

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

James_Edward_Gray_II · 15 October 2005 18:23

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

Here's what I thought of:

>> a = [0,1,2,3,4,5,2,3]
=> [0, 1, 2, 3, 4, 5, 2, 3]
>> seen = Hash.new(0)
=> {}
>> a.select { |e| (seen[e] += 1) > 1 }.uniq
=> [2, 3]

Hope that helps.

James Edward Gray II

···

On Oct 15, 2005, at 12:56 PM, Sam Kong wrote:

Simon_Strandgaard2 · 15 October 2005 18:31

How about?

ary = [1, 1, 2, 3, 3, 4, 2, 5, 7]
#=> [1, 1, 2, 3, 3, 4, 2, 5, 7]
uary = ary.uniq; ary.map.delete_if{|i| x=uary.member?(i); uary.delete(i); x}
#=> [1, 3, 2]

···

On 10/15/05, Sam Kong <sam.s.kong@gmail.com> wrote:

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

--
Simon Strandgaard

para.hsu · 16 October 2005 11:31

Hello!

How about this ?
(0...a.length-1).select{|i| a[i]==a[i+1]}.map{|i|a[i]}.uniq

David_A_Black3 · 16 October 2005 15:51

Hi --

···

On Sun, 16 Oct 2005, Sam Kong wrote:

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

One probably slow but kind of cool way is:

a.uniq.find_all {|x| a.find_all {|y| y == x }.size > 1 }

David

--
David A. Black
dblack@wobblini.net

Lyndon_Samson · 17 October 2005 01:38

Not very elegant, but a different approach...

bn = bn2 = 0
a.each {|el|(bn2 |= (1<<el) if (bn & ( 1<<el) ) != 0) | bn |= (1<<el) }
0.upto(a.length) { |bitNdx| puts bitNdx if (bn2 & ( 1<<bitNdx )) != 0 }

Martin_DeMello1 · 17 October 2005 07:36

Without counting:

h1 = {}
h2 = {}
a.each {|i|
h2[i] = true if h1[i]
h1[i] = true
}

p h2.keys

martin

···

Sam Kong <sam.s.kong@gmail.com> wrote:

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

Relm · 17 October 2005 09:06

Two variants using regular expressions:

eval(a.inspect.gsub(/\b(\d+),(?!.*\b\1\b)/,'')).uniq

a.sort.inspect.scan(/\b(\d+)(, \1\b)+/).map{|x|x[0].to_i}

···

On Sun, 16 Oct 2005, Sam Kong wrote:

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

--
Relm

pauldacus · 17 October 2005 16:03

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]

Sure... it returns the elements as strings, but a quick "to_i" call in
the inevitable iterator that is coming solves that. This thing is also
subject to a Schwartzian Transform of the "map-sort-map" type.

Paul Dacus

Sam Kong wrote:

···

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

h = {}
a.each do |i|
  if h[i]
    h[i] += 1
  else
    h[i] = 1
  end
end

u =
h.each do |k, v|
  if v > 1
    u << k
  end
end

#now u == [2,3]

This works fine.
But I think there's a better way.
How do you handle such a case?

Thanks in advance.

Sam

pauldacus · 17 October 2005 17:41

Well, until someone catches the horrible gaff on my last try by solving
the EXACT OPPOSITE PROBLEM WHILE BEING CONDESCENDING (my wife is used
to this), I'll keep trying:

Of course, the following is what works, finding duplicates ELEMENTS:

a.sort.to_s.scan(/(.)(\1)+/).flatten.uniq

Yeah... I guess it's clear this finds individual duplicate ELEMENTS,
not numbers. Works on single digits/letters/etc, blows up on all else.
How's that for useless?

What would do the trick is a 'non-greedy' array subtraction:

Normal(greedy): [0,1,2,3,4,5,2,3] - [0,1,2,3,4,5] =
Non-greedy: [0,1,2,3,4,5,2,3] - [0,1,2,3,4,5] = [2,3]

Seems like there has got to be a way of solving this w/o counters or
blocks...

Sam Kong wrote:

···

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

h = {}
a.each do |i|
  if h[i]
    h[i] += 1
  else
    h[i] = 1
  end
end

u =
h.each do |k, v|
  if v > 1
    u << k
  end
end

#now u == [2,3]

This works fine.
But I think there's a better way.
How do you handle such a case?

Thanks in advance.

Sam

Sam_Kong · 17 October 2005 19:36

Hi!

I'm the OP.
I thank those who replied to my message.
I learned a lot and seeing all different approaches was fun.

Somebody emailed me with his own solution which I like very much.
I want to share it with the group.

a = [0,1,2,3,4,5,2,3]
a.select{|e| a.index(e) != a.rindex(e)}.uniq

It is very brief and still very easy to understand.
Don't you think so?

I wonder if my slightly modified version works.

a.select{|e| a.grep(e).length > 1}.uniq

grep is for pattern but seems to work fine for integers.
I'm not sure though. (Simple tests show it works.)

In my opinion, using regex is too much for this problem (maybe
slower?).
Array#inject seems to be good but I get confused whenever I come across
it (maybe I'm not smart enough).

What do you think?

Thanks all!

Sam

Sam_Kong · 15 October 2005 19:11

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Thank you for the insight.

However, your code doesn't work if an element repeats more than twice.

a = [0,1,2,3,4,5,2,3,2] #there are three 2's.
Your code's result is [2, 3, 2] instead of [2, 3]
It should be modified like
u = a.inject() {|res, x| h ? res + : h = res}.uniq

Other than that, I like your way.
Thank you again.

Sam

Robert · 16 October 2005 12:52

a needs to be sorted for this to work. As far as I remember that was not a guaranteed precondition so you would have to add that here.

IMHO most efficient variants are still those that use a hash for element counting.

Kind regards

robert

···

para.hsu@gmail.com wrote:

Hello!

How about this ?
(0...a.length-1).select{|i| a[i]==a[i+1]}.map{|i|a[i]}.uniq

Lyndon_Samson · 17 October 2005 01:44

Funny, doesnt work when an item in the array > 10...

Plus, reading the documentation ( doh! ) I can access a Bignum bit with

···

On 10/17/05, Lyndon Samson <lyndon.samson@gmail.com> wrote:

Not very elegant, but a different approach...

bn = bn2 = 0
a.each {|el|(bn2 |= (1<<el) if (bn & ( 1<<el) ) != 0) | bn |= (1<<el) }
0.upto(a.length) { |bitNdx| puts bitNdx if (bn2 & ( 1<<bitNdx )) != 0 }

--
Into RFID? www.rfidnewsupdate.com <http://www.rfidnewsupdate.com> Simple,
fast, news.

Simon_Strandgaard2 · 17 October 2005 16:41

no need to sort..

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
#=> [0, 1, 4, 5]

···

On 10/17/05, pauldacus@gmail.com <pauldacus@gmail.com> wrote:

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]

--
Simon Strandgaard

Ryan_Leavengood · 17 October 2005 20:11

It works, but is considerably slower (I've included a few other recent
submissions as well):

require 'benchmark'
include Benchmark

a = [0,1,2,3,4,5,2,3]

BMN = 50000
bm(5) do |bx|
  bx.report('index ') { BMN.times { a.select{|e| a.index(e) !=
a.rindex(e)}.uniq } }
  bx.report('grep ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
  bx.report('scan ') { BMN.times { a.sort.join('
').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
  bx.report('inject') { BMN.times { a.uniq.inject(a.dup) { |b,i|
b.delete_at(b.index(i)) ; b } } }
  bx.report('gsub ') { BMN.times {
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i} } }
end

Results on my laptop:

user system total real
index 1.281000 0.000000 1.281000 ( 1.281000)
grep 3.813000 0.015000 3.828000 ( 3.827000)
scan 3.515000 0.000000 3.515000 ( 3.531000)
inject 1.594000 0.000000 1.594000 ( 1.593000)
gsub 3.063000 0.016000 3.079000 ( 3.078000)

Whoever created that index and rindex based solution created a very
fast one. I'll leave it as an exercise for the reading to benchmark
all the other solutions.

Ryan

···

On 10/17/05, Sam Kong <sam.s.kong@gmail.com> wrote:

I wonder if my slightly modified version works.

pauldacus · 17 October 2005 20:26

That is a good one.

Personally, I was curious though when you said you thought regex-ing it
was slower or 'too much', if it really would be on huge arrays.

And so a test:

(REGEX) a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq
vs.
(INDEX) a.select{|e| a.index(e) != a.rindex(e)}.uniq

I ran these on an array with 11,000 elements, 2,000 of which were
duplicates, as follows:

(1..10000).to_a + (2000..3000).to_a

The scan (REGEX) script finished in 0.21 seconds
The (INDEX) script ran in 28.77 seconds

The regex was over 100X faster!

And I actually was trying to run this with 110,000 elements but the
INDEX script seem to just lock up. The regex was done in 2.63 seconds,
and the INDEX script was still going 10 minutes later....

Have no doubt... a scan is darn fast!

Mark_Van_Holstyn1 · 17 October 2005 20:40

Hey Sam,

I too tried a solution similar to...
a.select{|e| a.index(e) != a.rindex(e)}.uniq

Then i tried the one which i wrote...
t=; a.select{ |e| r=t.include?e; t<<e; r }.uniq

The second runs MUCH faster. The first one must search the array twice for
each element in in, and therefor runs something on the order of n^3 worst
case. The second only searches once for each and runs n^2 worst case. As for
the version with grep... Also, a.grep seems to run somewhere on the order of
n^4.

Adding mine to ryan's bench mark i come out with...

a = [0,1,2,3,4,5,2,3]

BMN = 50000
bm(5) do |bx|
bx.report('index ') { BMN.times { a.select{|e| a.index(e) != a.rindex(e)}.uniq
} }
bx.report('grep ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
bx.report('scan ') { BMN.times { a.sort.join('
').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
bx.report('inject ') { BMN.times { a.uniq.inject(a.dup) { |b,i| b.delete_at(
b.index(i)) ; b } } }
bx.report('gsub ') { BMN.times {
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i}
} }
bx.report('include') { BMN.times { t=; a.select{ |e| r=t.include?e; t<<e;
r }.uniq } }
end

user system total real
index 1.312000 0.000000 1.312000 ( 1.328000)
grep 3.594000 0.000000 3.594000 ( 3.625000)
scan 3.094000 0.000000 3.094000 ( 3.109000)
inject 1.406000 0.000000 1.406000 ( 1.407000)
gsub 2.844000 0.000000 2.844000 ( 2.843000)
include 1.109000 0.000000 1.109000 ( 1.110000)

Mark

···

On 10/17/05, Sam Kong <sam.s.kong@gmail.com> wrote:

Hi!

I'm the OP.
I thank those who replied to my message.
I learned a lot and seeing all different approaches was fun.

Somebody emailed me with his own solution which I like very much.
I want to share it with the group.

a = [0,1,2,3,4,5,2,3]
a.select{|e| a.index(e) != a.rindex(e)}.uniq

It is very brief and still very easy to understand.
Don't you think so?

I wonder if my slightly modified version works.

a.select{|e| a.grep(e).length > 1}.uniq

grep is for pattern but seems to work fine for integers.
I'm not sure though. (Simple tests show it works.)

In my opinion, using regex is too much for this problem (maybe
slower?).
Array#inject seems to be good but I get confused whenever I come across
it (maybe I'm not smart enough).

What do you think?

Thanks all!

Sam

--
Mark Van Holstyn
mvette13@gmail.com
http://lotswholetime.com

Robert · 15 October 2005 19:31

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Thank you for the insight.

However, your code doesn't work if an element repeats more than twice.

a = [0,1,2,3,4,5,2,3,2] #there are three 2's.
Your code's result is [2, 3, 2] instead of [2, 3]
It should be modified like
u = a.inject() {|res, x| h ? res + : h = res}.uniq

Other than that, I like your way.
Thank you again.

Sam

···

Sam Kong <sam.s.kong@gmail.com> wrote:

Topic		Replies	Views
Isolating non-unique items in an array ruby-talk	15	111	13 October 2006
Making a count of unique elements in an array ruby-talk	1	129	9 January 2006
How to get only the duplicatet items from an array ruby-talk	4	114	27 October 2006
Duplicate elements in array ruby-talk	14	91	29 October 2007
Isolating non-unique items in an array ruby-talk	0	116	12 October 2006

How to get non-unique elements from an array?

Related topics