How to get non-unique elements from an array?

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

h = {}
a.each do |i|
  if h[i]
    h[i] += 1
  else
    h[i] = 1
  end
end

u = []
h.each do |k, v|
  if v > 1
    u << k
  end
end

#now u == [2,3]

This works fine.
But I think there's a better way.
How do you handle such a case?

Thanks in advance.

Sam

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Paolo

···

On 10/15/05, Sam Kong <sam.s.kong@gmail.com> wrote:

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

Here's what I thought of:

>> a = [0,1,2,3,4,5,2,3]
=> [0, 1, 2, 3, 4, 5, 2, 3]
>> seen = Hash.new(0)
=> {}
>> a.select { |e| (seen[e] += 1) > 1 }.uniq
=> [2, 3]

Hope that helps.

James Edward Gray II

···

On Oct 15, 2005, at 12:56 PM, Sam Kong wrote:

How about?

ary = [1, 1, 2, 3, 3, 4, 2, 5, 7]
#=> [1, 1, 2, 3, 3, 4, 2, 5, 7]
uary = ary.uniq; ary.map.delete_if{|i| x=uary.member?(i); uary.delete(i); x}
#=> [1, 3, 2]

···

On 10/15/05, Sam Kong <sam.s.kong@gmail.com> wrote:

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

--
Simon Strandgaard

Hello!

How about this ?
(0...a.length-1).select{|i| a[i]==a[i+1]}.map{|i|a[i]}.uniq

Hi --

···

On Sun, 16 Oct 2005, Sam Kong wrote:

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

One probably slow but kind of cool way is:

   a.uniq.find_all {|x| a.find_all {|y| y == x }.size > 1 }

David

--
David A. Black
dblack@wobblini.net

Not very elegant, but a different approach...

bn = bn2 = 0
a.each {|el|(bn2 |= (1<<el) if (bn & ( 1<<el) ) != 0) | bn |= (1<<el) }
0.upto(a.length) { |bitNdx| puts bitNdx if (bn2 & ( 1<<bitNdx )) != 0 }

Without counting:

h1 = {}
h2 = {}
a.each {|i|
  h2[i] = true if h1[i]
  h1[i] = true
}

p h2.keys

martin

···

Sam Kong <sam.s.kong@gmail.com> wrote:

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

Two variants using regular expressions:

  eval(a.inspect.gsub(/\b(\d+),(?!.*\b\1\b)/,'')).uniq

  a.sort.inspect.scan(/\b(\d+)(, \1\b)+/).map{|x|x[0].to_i}

···

On Sun, 16 Oct 2005, Sam Kong wrote:

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

--
Relm

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]

Sure... it returns the elements as strings, but a quick "to_i" call in
the inevitable iterator that is coming solves that. This thing is also
subject to a Schwartzian Transform of the "map-sort-map" type.

Paul Dacus

Sam Kong wrote:

···

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

h = {}
a.each do |i|
  if h[i]
    h[i] += 1
  else
    h[i] = 1
  end
end

u =
h.each do |k, v|
  if v > 1
    u << k
  end
end

#now u == [2,3]

This works fine.
But I think there's a better way.
How do you handle such a case?

Thanks in advance.

Sam

Well, until someone catches the horrible gaff on my last try by solving
the EXACT OPPOSITE PROBLEM WHILE BEING CONDESCENDING (my wife is used
to this), I'll keep trying:

Of course, the following is what works, finding duplicates ELEMENTS:

a.sort.to_s.scan(/(.)(\1)+/).flatten.uniq

Yeah... I guess it's clear this finds individual duplicate ELEMENTS,
not numbers. Works on single digits/letters/etc, blows up on all else.
How's that for useless?

What would do the trick is a 'non-greedy' array subtraction:

Normal(greedy): [0,1,2,3,4,5,2,3] - [0,1,2,3,4,5] =
Non-greedy: [0,1,2,3,4,5,2,3] - [0,1,2,3,4,5] = [2,3]

Seems like there has got to be a way of solving this w/o counters or
blocks...

Sam Kong wrote:

···

Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
elements.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

h = {}
a.each do |i|
  if h[i]
    h[i] += 1
  else
    h[i] = 1
  end
end

u =
h.each do |k, v|
  if v > 1
    u << k
  end
end

#now u == [2,3]

This works fine.
But I think there's a better way.
How do you handle such a case?

Thanks in advance.

Sam

Hi!

I'm the OP.
I thank those who replied to my message.
I learned a lot and seeing all different approaches was fun.

Somebody emailed me with his own solution which I like very much.
I want to share it with the group.

a = [0,1,2,3,4,5,2,3]
a.select{|e| a.index(e) != a.rindex(e)}.uniq

It is very brief and still very easy to understand.
Don't you think so?

I wonder if my slightly modified version works.

a.select{|e| a.grep(e).length > 1}.uniq

grep is for pattern but seems to work fine for integers.
I'm not sure though. (Simple tests show it works.)

In my opinion, using regex is too much for this problem (maybe
slower?).
Array#inject seems to be good but I get confused whenever I come across
it (maybe I'm not smart enough).

What do you think?

Thanks all!

Sam

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Thank you for the insight.

However, your code doesn't work if an element repeats more than twice.

a = [0,1,2,3,4,5,2,3,2] #there are three 2's.
Your code's result is [2, 3, 2] instead of [2, 3]
It should be modified like
u = a.inject() {|res, x| h ? res + : h = res}.uniq

Other than that, I like your way.
Thank you again.

Sam

a needs to be sorted for this to work. As far as I remember that was not a guaranteed precondition so you would have to add that here.

IMHO most efficient variants are still those that use a hash for element counting.

Kind regards

    robert

···

para.hsu@gmail.com wrote:

Hello!

How about this ?
(0...a.length-1).select{|i| a[i]==a[i+1]}.map{|i|a[i]}.uniq

Funny, doesnt work when an item in the array > 10...

Plus, reading the documentation ( doh! ) I can access a Bignum bit with

···

On 10/17/05, Lyndon Samson <lyndon.samson@gmail.com> wrote:

Not very elegant, but a different approach...

bn = bn2 = 0
a.each {|el|(bn2 |= (1<<el) if (bn & ( 1<<el) ) != 0) | bn |= (1<<el) }
0.upto(a.length) { |bitNdx| puts bitNdx if (bn2 & ( 1<<bitNdx )) != 0 }

--
Into RFID? www.rfidnewsupdate.com <http://www.rfidnewsupdate.com> Simple,
fast, news.

no need to sort.. :slight_smile:

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
#=> [0, 1, 4, 5]

···

On 10/17/05, pauldacus@gmail.com <pauldacus@gmail.com> wrote:

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]

--
Simon Strandgaard

It works, but is considerably slower (I've included a few other recent
submissions as well):

require 'benchmark'
include Benchmark

a = [0,1,2,3,4,5,2,3]

BMN = 50000
bm(5) do |bx|
  bx.report('index ') { BMN.times { a.select{|e| a.index(e) !=
a.rindex(e)}.uniq } }
  bx.report('grep ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
  bx.report('scan ') { BMN.times { a.sort.join('
').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
  bx.report('inject') { BMN.times { a.uniq.inject(a.dup) { |b,i|
b.delete_at(b.index(i)) ; b } } }
  bx.report('gsub ') { BMN.times {
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i} } }
end

Results on my laptop:

           user system total real
index 1.281000 0.000000 1.281000 ( 1.281000)
grep 3.813000 0.015000 3.828000 ( 3.827000)
scan 3.515000 0.000000 3.515000 ( 3.531000)
inject 1.594000 0.000000 1.594000 ( 1.593000)
gsub 3.063000 0.016000 3.079000 ( 3.078000)

Whoever created that index and rindex based solution created a very
fast one. I'll leave it as an exercise for the reading to benchmark
all the other solutions.

Ryan

···

On 10/17/05, Sam Kong <sam.s.kong@gmail.com> wrote:

I wonder if my slightly modified version works.

That is a good one.

Personally, I was curious though when you said you thought regex-ing it
was slower or 'too much', if it really would be on huge arrays.

And so a test:

(REGEX) a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq
   vs.
(INDEX) a.select{|e| a.index(e) != a.rindex(e)}.uniq

I ran these on an array with 11,000 elements, 2,000 of which were
duplicates, as follows:

(1..10000).to_a + (2000..3000).to_a

The scan (REGEX) script finished in 0.21 seconds
The (INDEX) script ran in 28.77 seconds

The regex was over 100X faster!

And I actually was trying to run this with 110,000 elements but the
INDEX script seem to just lock up. The regex was done in 2.63 seconds,
and the INDEX script was still going 10 minutes later....

Have no doubt... a scan is darn fast!

Hey Sam,

I too tried a solution similar to...
a.select{|e| a.index(e) != a.rindex(e)}.uniq

Then i tried the one which i wrote...
t=; a.select{ |e| r=t.include?e; t<<e; r }.uniq

The second runs MUCH faster. The first one must search the array twice for
each element in in, and therefor runs something on the order of n^3 worst
case. The second only searches once for each and runs n^2 worst case. As for
the version with grep... Also, a.grep seems to run somewhere on the order of
n^4.

Adding mine to ryan's bench mark i come out with...

a = [0,1,2,3,4,5,2,3]

BMN = 50000
bm(5) do |bx|
bx.report('index ') { BMN.times { a.select{|e| a.index(e) != a.rindex(e)}.uniq
} }
bx.report('grep ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
bx.report('scan ') { BMN.times { a.sort.join('
').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
bx.report('inject ') { BMN.times { a.uniq.inject(a.dup) { |b,i| b.delete_at(
b.index(i)) ; b } } }
bx.report('gsub ') { BMN.times {
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i}
} }
bx.report('include') { BMN.times { t=; a.select{ |e| r=t.include?e; t<<e;
r }.uniq } }
end

user system total real
index 1.312000 0.000000 1.312000 ( 1.328000)
grep 3.594000 0.000000 3.594000 ( 3.625000)
scan 3.094000 0.000000 3.094000 ( 3.109000)
inject 1.406000 0.000000 1.406000 ( 1.407000)
gsub 2.844000 0.000000 2.844000 ( 2.843000)
include 1.109000 0.000000 1.109000 ( 1.110000)

Mark

···

On 10/17/05, Sam Kong <sam.s.kong@gmail.com> wrote:

Hi!

I'm the OP.
I thank those who replied to my message.
I learned a lot and seeing all different approaches was fun.

Somebody emailed me with his own solution which I like very much.
I want to share it with the group.

a = [0,1,2,3,4,5,2,3]
a.select{|e| a.index(e) != a.rindex(e)}.uniq

It is very brief and still very easy to understand.
Don't you think so?

I wonder if my slightly modified version works.

a.select{|e| a.grep(e).length > 1}.uniq

grep is for pattern but seems to work fine for integers.
I'm not sure though. (Simple tests show it works.)

In my opinion, using regex is too much for this problem (maybe
slower?).
Array#inject seems to be good but I get confused whenever I come across
it (maybe I'm not smart enough).

What do you think?

Thanks all!

Sam

--
Mark Van Holstyn
mvette13@gmail.com
http://lotswholetime.com

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Thank you for the insight.

However, your code doesn't work if an element repeats more than twice.

a = [0,1,2,3,4,5,2,3,2] #there are three 2's.
Your code's result is [2, 3, 2] instead of [2, 3]
It should be modified like
u = a.inject() {|res, x| h ? res + : h = res}.uniq

Other than that, I like your way.
Thank you again.

Sam

···

Sam Kong <sam.s.kong@gmail.com> wrote: