How to get non-unique elements from an array?

Robert · 15 October 2005 19:31

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Thank you for the insight.

However, your code doesn't work if an element repeats more than twice.

a = [0,1,2,3,4,5,2,3,2] #there are three 2's.
Your code's result is [2, 3, 2] instead of [2, 3]
It should be modified like
u = a.inject() {|res, x| h ? res + : h = res}.uniq

Other than that, I like your way.
Thank you again.

Sam

a = [0,1,2,3,4,5,2,3]

=> [0, 1, 2, 3, 4, 5, 2, 3]

a.inject(Hash.new(0)) {|h,e| h[e]+=1;h}.select {|k,v|v>1}.map {|a,b|a}

=> [2, 3]

Quite complicated...

robert

···

Sam Kong <sam.s.kong@gmail.com> wrote:

Ara.T.Howard6 · 16 October 2005 14:11

and you needed even count i think:

   harp:~ > cat a.rb
   class Array
     def dups
       h, d = {}, ; each{|e| h[e] ? (d << h.delete(e)) : (h[e] = e) }; d
     end
   end

a = 0, 1, 2, 3, 4, 5, 2, 3, 2

p a.dups

harp:~ > ruby a.rb
[2, 3]

regards.

-a

···

On Sun, 16 Oct 2005, Robert Klemme wrote:

para.hsu@gmail.com wrote:

Hello!

How about this ?
(0...a.length-1).select{|i| a[i]==a[i+1]}.map{|i|a[i]}.uniq

a needs to be sorted for this to work. As far as I remember that was not a guaranteed precondition so you would have to add that here.

IMHO most efficient variants are still those that use a hash for element counting.

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
anything that contradicts experience and logic should be abandoned.
-- h.h. the 14th dalai lama

===============================================================================

Explosiv0SX · 17 October 2005 02:07

You could just sort the array, then iterate. If the previous iteration value is equal to the current, you have a non-unique. This would work with string or numeric data types.

Rick

···

On Oct 16, 2005, at 9:44 PM, Lyndon Samson wrote:

Funny, doesnt work when an item in the array > 10...

Plus, reading the documentation ( doh! ) I can access a Bignum bit with

On 10/17/05, Lyndon Samson <lyndon.samson@gmail.com> wrote:

Not very elegant, but a different approach...

bn = bn2 = 0
a.each {|el|(bn2 |= (1<<el) if (bn & ( 1<<el) ) != 0) | bn |= (1<<el) }
0.upto(a.length) { |bitNdx| puts bitNdx if (bn2 & ( 1<<bitNdx )) != 0 }

--
Into RFID? www.rfidnewsupdate.com <http://www.rfidnewsupdate.com> Simple,
fast, news.

Ryan_Leavengood · 17 October 2005 17:53

Both are interesting solutions, but you are returning the unique
elements of the array, not the non-unique ones, which was the original
problem. I say this not to be pedantic, but to see how you would solve
it.

Ryan

···

On 10/17/05, Simon Strandgaard <neoneye@gmail.com> wrote:

On 10/17/05, pauldacus@gmail.com <pauldacus@gmail.com> wrote:
> Young grasshoppers... let us never forget the power of the REGEX! Let
> us not bother with counters or blocks... let us GSUB this bad boy!
>
> a = [0,1,2,3,4,5,2,3]
>
> a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)
>
> =>["0", "1", "4", "5"]

no need to sort..

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
#=> [0, 1, 4, 5]

Ryan_Leavengood · 17 October 2005 21:16

And so a test:

(REGEX) a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq
vs.
(INDEX) a.select{|e| a.index(e) != a.rindex(e)}.uniq

I ran these on an array with 11,000 elements, 2,000 of which were
duplicates, as follows:

(1..10000).to_a + (2000..3000).to_a

The scan (REGEX) script finished in 0.21 seconds
The (INDEX) script ran in 28.77 seconds

The regex was over 100X faster!

Thanks for this. I was wondering if my method of benchmarking might be
flawed, and it sort of is in this case. But it makes sense here
because the scan just has to make one run through the string, whereas
each iteration of the select has multiple calls to an index which
might potentially traverse the entire array. As my benchmark shows
though, a lot of searches in a smaller array is faster using the
index.

Also another problem with the string scanning is that it depends on
the array elements being numbers (at least in the case above.)

I decided to benchmark again using your method and adding some more
implementations. Here we go (beware of wrapped lines):

require 'benchmark'
include Benchmark

[[[0,1,2,3,4,5,2,3], 50000],[(1..5000).to_a + (2000..2500).to_a,
1]].each do |a, iterations|
  bm(12) do |bx|
    bx.report('simonk') { iterations.times { a.select{|e| a.index(e)
!= a.rindex(e)}.uniq } }
    bx.report('samk ') { iterations.times { a.select{|e|
a.grep(e).length > 1}.uniq } }
    bx.report('paul ') { iterations.times { a.sort.join('
').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
    bx.report('jeff ') { iterations.times { a.uniq.inject(a.dup) {

b,i| b.delete_at(b.index(i)) ; b } } }

    bx.report('paolo ') { iterations.times { h = {};u = a.inject()
{|res, x| h ? res + : h = res}.uniq } }
    bx.report('jegII ') { iterations.times { seen =
Hash.new(0);a.select { |e| (seen[e] += 1) > 1 }.uniq } }
    bx.report('simons') { iterations.times { uary = a.uniq;
a.map.delete_if{|i| x=uary.member?(i); uary.delete(i); x} } }
    bx.report('robert') { iterations.times { a.inject(Hash.new(0))
{|h,e| h[e]+=1;h}.select {|k,v|v>1}.map {|x,y|x} } }
    bx.report('david ') { iterations.times { a.uniq.find_all {|x|
a.find_all {|y| y == x }.size > 1 } } }
    bx.report('zach ') { iterations.times { t=; a.delete_if{ |e|
r=(not t.include? e); t.push(e); r }.uniq } }
    bx.report('markvh') { iterations.times { t=; a.select{ |e|
r=t.include?e; t<<e; r }.uniq } }
    bx.report('martin') { iterations.times { h1 = {};h2 = {};a.each
{|i| h2[i] = true if h1[i];h1[i] = true };h2.keys } }
  end
end

The results:

Small array, lots of iterations:
user system total real
simonk 1.344000 0.000000 1.344000 ( 1.344000)
samk 3.859000 0.000000 3.859000 ( 3.876000)
paul 3.532000 0.000000 3.532000 ( 3.547000)
jeff 1.640000 0.000000 1.640000 ( 1.641000)
paolo 2.656000 0.000000 2.656000 ( 2.672000)
jegII 1.704000 0.016000 1.720000 ( 1.719000)
simons 2.109000 0.000000 2.109000 ( 2.110000)
robert 3.172000 0.015000 3.187000 ( 3.203000)
david 5.078000 0.000000 5.078000 ( 5.079000)
zach 0.328000 0.000000 0.328000 ( 0.328000)
markvh 0.391000 0.000000 0.391000 ( 0.391000)
martin 0.390000 0.000000 0.390000 ( 0.484000)

Big array, one iteration:
user system total real
simonk 4.125000 0.000000 4.125000 ( 4.453000)
samk 17.125000 0.000000 17.125000 ( 17.283000)
paul 0.063000 0.000000 0.063000 ( 0.062000)
jeff 0.109000 0.000000 0.109000 ( 0.110000)
paolo 0.031000 0.000000 0.031000 ( 0.031000)
jegII 0.016000 0.000000 0.016000 ( 0.016000)
simons 2.219000 0.000000 2.219000 ( 2.265000)
robert 0.031000 0.000000 0.031000 ( 0.032000)
david 34.188000 0.016000 34.204000 ( 35.127000)
zach 2.156000 0.000000 2.156000 ( 2.203000)
markvh 0.015000 0.000000 0.015000 ( 0.016000)
martin 0.000000 0.000000 0.000000 ( 0.000000)

Overall Martin's was the best. As he suspected, David's was quite
slow. Sam's as well.

But it is interesting to see the varying results for each algorithm
based on how they are used.

Ryan

···

On 10/17/05, pauldacus@gmail.com <pauldacus@gmail.com> wrote:

pauldacus · 18 October 2005 23:36

For those still the slightest bit interested... I found two slightly
faster approaches. I stood on the shoulders of giants....

paul2: one={};two={};a.each{|i| one[i] ? two[i]=1 : one[i]=1;two.keys
paul2a: one={};two={};a.each{|i| (one[i] && two[i]=1) ||
one[i]=1;two.keys

These two scripts are exactly the same concept, the second just uses &&
and || shortcut evaluation... which proved infintesimally faster....
actually scored a 'zero' on the large array.

user system total real
david 35.150000 0.000000 35.150000 ( 35.150000)
jeff 9.720000 0.000000 9.720000 ( 9.720000)
jegII 9.830000 0.000000 9.830000 ( 9.830000)
markvh 8.130000 0.000000 8.130000 ( 8.130000)
martin 8.680000 0.000000 8.680000 ( 8.680000)
paolo 18.350000 0.000000 18.350000 ( 18.350000)
park 11.310000 0.000000 11.310000 ( 11.310000)
paul 19.390000 0.000000 19.390000 ( 19.390000)
robert 22.570000 0.000000 22.570000 ( 22.570000)
samk1 12.740000 0.000000 12.740000 ( 12.740000)
samk2 29.000000 0.000000 29.000000 ( 29.000000)
simonk 8.190000 0.000000 8.190000 ( 8.190000)
simons 13.670000 0.000000 13.670000 ( 13.670000)
zach 1.540000 0.000000 1.540000 ( 1.540000)
paul2 1.760000 0.000000 1.760000 ( 1.760000)
paul2a 1.760000 0.000000 1.760000 ( 1.760000)
user system total real
david 206.850000 0.000000 206.850000 (206.850000)
jeff 0.770000 0.000000 0.770000 ( 0.770000)
jegII 0.110000 0.000000 0.110000 ( 0.110000)
markvh 10.820000 0.000000 10.820000 ( 10.820000)
martin 0.050000 0.000000 0.050000 ( 0.050000)
paolo 0.170000 0.000000 0.170000 ( 0.170000)
park 0.110000 0.000000 0.110000 ( 0.110000)
paul 0.220000 0.000000 0.220000 ( 0.220000)
robert 0.220000 0.000000 0.220000 ( 0.220000)
samk1 0.110000 0.000000 0.110000 ( 0.110000)
samk2 79.210000 0.000000 79.210000 ( 79.210000)
simonk 22.850000 0.000000 22.850000 ( 22.850000)
simons 11.370000 0.000000 11.370000 ( 11.370000)
zach 11.750000 0.000000 11.750000 ( 11.750000)
paul2 0.050000 0.000000 0.050000 ( 0.050000)
paul2a 0.000000 0.000000 0.000000 ( 0.000000)

A combined score shows them to be prettty good, shortcut evaluation
winning by a nose:

1.76 paul2a
1.81 paul2
8.73 martin
9.94 jegII
10.49 jeff
11.42 park
12.85 samk1
13.29 zach
18.52 paolo
18.95 markvh
19.61 paul
22.79 robert
25.04 simons
31.04 simonk
108.21 samk2
242.00 david

Simon_Kroger · 15 October 2005 20:31

Robert Klemme wrote:

···

Sam Kong <sam.s.kong@gmail.com> wrote:

You don't need the exact number of repetitions. Try this:
a = [0,1,2,3,4,5,2,3]
h = {}
u = a.inject() {|res, x| h ? res + : h = res}

Thank you for the insight.

However, your code doesn't work if an element repeats more than twice.

a = [0,1,2,3,4,5,2,3,2] #there are three 2's.
Your code's result is [2, 3, 2] instead of [2, 3]
It should be modified like
u = a.inject() {|res, x| h ? res + : h = res}.uniq

Other than that, I like your way.
Thank you again.

Sam

a = [0,1,2,3,4,5,2,3]

=> [0, 1, 2, 3, 4, 5, 2, 3]

a.inject(Hash.new(0)) {|h,e| h[e]+=1;h}.select {|k,v|v>1}.map {|a,b|a}

=> [2, 3]

Quite complicated...

Yeah,

some thoughts of mine:

---------------------------------------------------------------------
require 'enumerator'

a = [0,1,2,3,4,3,5,2,3]

p a.sort.to_enum(:each_cons, 2).select{|i, j|i==j}.flatten.uniq

p a.select{|x|a.index(x) != a.rindex(x)}.uniq

p a.dup.reject{|x|a.delete(x)}.uniq
---------------------------------------------------------------------

all have downsides (performance wise and the last one is destructive)

cheers

Simon

Robert · 16 October 2005 17:11

Hello!

How about this ?
(0...a.length-1).select{|i| a[i]==a[i+1]}.map{|i|a[i]}.uniq

a needs to be sorted for this to work. As far as I remember that
was not a guaranteed precondition so you would have to add that here.

IMHO most efficient variants are still those that use a hash for
element counting.

and you needed even count i think:

  harp:~ > cat a.rb
  class Array
    def dups
      h, d = {}, ; each{|e| h[e] ? (d << h.delete(e)) : (h[e] = e)
    }; d end
  end

  a = 0, 1, 2, 3, 4, 5, 2, 3, 2

  p a.dups

  harp:~ > ruby a.rb
  [2, 3]

Did you mean to provide this as an example that you actually need to count? Because that's what it is:

?> a = 0, 1, 2, 3, 4, 5, 2, 3, 2, 2
=> [0, 1, 2, 3, 4, 5, 2, 3, 2, 2]

?> p a.dups
[2, 3, 2]
=> nil

Here's another alternative with limited counting:

a = 0, 1, 2, 3, 4, 5, 2, 3, 2, 2

=> [0, 1, 2, 3, 4, 5, 2, 3, 2, 2]

a.inject({}) do |h,i|

?> case h[i]

    when nil
      h[i] = 1
    when 1
      h[i] = 2
    when 2
      # ok

?> else
?> raise "Error"

end
h
end.inject() {|ar,(k,v)| ar << k if v == 2; ar}

=> [2, 3]

I doubt though that performance is better than that with counting.

Kind regards

robert

···

Ara.T.Howard <Ara.T.Howard@noaa.gov> wrote:

On Sun, 16 Oct 2005, Robert Klemme wrote:

para.hsu@gmail.com wrote:

J-Van · 17 October 2005 04:30

You could just sort the array, then iterate. If the previous iteration value is equal to the current, you have a non-unique. This would work with string or numeric data types.

Rick

Or, probably not efficient, but:

irb(main):031:0> a
=> [1, 2, 3, 4, 5, 5, 3]

irb(main):032:0> a.find_all { |x| a.find_all { |y| y == x }.size > 1 }.uniq
=> [3, 5]

···

On Oct 16, 2005, at 7:07 PM, Explosiv0SX wrote:

On Oct 16, 2005, at 9:44 PM, Lyndon Samson wrote:

Funny, doesnt work when an item in the array > 10...

Plus, reading the documentation ( doh! ) I can access a Bignum bit with

On 10/17/05, Lyndon Samson <lyndon.samson@gmail.com> wrote:

Not very elegant, but a different approach...

bn = bn2 = 0
a.each {|el|(bn2 |= (1<<el) if (bn & ( 1<<el) ) != 0) | bn |= (1<<el) }
0.upto(a.length) { |bitNdx| puts bitNdx if (bn2 & ( 1<<bitNdx )) != 0 }

--
Into RFID? www.rfidnewsupdate.com <http://www.rfidnewsupdate.com> Simple,
fast, news.

Simon_Strandgaard2 · 17 October 2005 18:15

Sorry should be without the delete..

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i}
#=> [2, 3]

···

On 10/17/05, Ryan Leavengood <leavengood@gmail.com> wrote:

On 10/17/05, Simon Strandgaard <neoneye@gmail.com> wrote:
> On 10/17/05, pauldacus@gmail.com <pauldacus@gmail.com> wrote:
> > Young grasshoppers... let us never forget the power of the REGEX! Let
> > us not bother with counters or blocks... let us GSUB this bad boy!
> >
> > a = [0,1,2,3,4,5,2,3]
> >
> > a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)
> >
> > =>["0", "1", "4", "5"]
>
>
> no need to sort..
>
> a = [0, 1, 2, 3, 4, 5, 2, 3]
> a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
> #=> [0, 1, 4, 5]

Both are interesting solutions, but you are returning the unique
elements of the array, not the non-unique ones, which was the original
problem. I say this not to be pedantic, but to see how you would solve
it.

--
Simon Strandgaard

Ryan_Leavengood · 17 October 2005 21:22

On second thought, Mark's may have been the best overall. Either way,
both Mark's and Martin's are good algorithms. Many of the others were
good as well.

Ryan

···

On 10/17/05, Ryan Leavengood <leavengood@gmail.com> wrote:

The results:

Small array, lots of iterations:
user system total real
markvh 0.391000 0.000000 0.391000 ( 0.391000)
martin 0.390000 0.000000 0.390000 ( 0.484000)

Big array, one iteration:
user system total real
markvh 0.015000 0.000000 0.015000 ( 0.016000)
martin 0.000000 0.000000 0.000000 ( 0.000000)

Overall Martin's was the best. As he suspected, David's was quite
slow. Sam's as well.

pauldacus · 17 October 2005 22:16

Awesome! What seemed like a fairly simple question... not so! I'm
going to look over all the answers again. The benchmarking is awesome.

Sam_Kong · 18 October 2005 01:21

I must confess that I love this group...
Very enthusiastic!!!

Ryan Leavengood wrote:

Overall Martin's was the best. As he suspected, David's was quite
slow. Sam's as well.

Ryan, thanks for the benchmarking.
Please, add my original solution to your benchmark.
It seems to be very fast.
The reason I asked the question was not performance but briefness and
rubish style.
The slow one I came up later looks very intuitive and brief but suffers
from performance.
Briefness and performance don't always go together.

Regards,
Sam

pauldacus · 19 October 2005 00:17

Well... it was a short run. At least I keep my streak alive... Still,
the only thing I've won in my life was a can of soup when I was 8 years
old!

user system total real
david 34.320000 0.000000 34.320000 ( 34.320000)
jeff 9.560000 0.000000 9.560000 ( 9.560000)
jegII 9.390000 0.000000 9.390000 ( 9.390000)
markvh 7.860000 0.000000 7.860000 ( 7.860000)
martin 8.790000 0.000000 8.790000 ( 8.790000)
paolo 18.340000 0.000000 18.340000 ( 18.340000)
park 11.320000 0.000000 11.320000 ( 11.320000)
paul 20.150000 0.000000 20.150000 ( 20.150000)
robert 22.910000 0.000000 22.910000 ( 22.910000)
samk1 11.920000 0.000000 11.920000 ( 11.920000)
samk2 28.340000 0.000000 28.340000 ( 28.340000)
simonk 8.070000 0.000000 8.070000 ( 8.070000)
simons 13.790000 0.000000 13.790000 ( 13.790000)
zach 1.540000 0.000000 1.540000 ( 1.540000)
paul2 1.810000 0.000000 1.810000 ( 1.810000)
paul2a 1.760000 0.000000 1.760000 ( 1.760000)
Sylv 1.370000 0.000000 1.370000 ( 1.370000)
user system total real
david 207.840000 0.000000 207.840000 (207.840000)
jeff 0.770000 0.000000 0.770000 ( 0.770000)
jegII 0.060000 0.000000 0.060000 ( 0.060000)
markvh 10.440000 0.000000 10.440000 ( 10.440000)
martin 0.060000 0.000000 0.060000 ( 0.060000)
paolo 0.160000 0.000000 0.160000 ( 0.160000)
park 0.110000 0.000000 0.110000 ( 0.110000)
paul 0.330000 0.000000 0.330000 ( 0.330000)
robert 0.210000 0.000000 0.210000 ( 0.210000)
samk1 0.110000 0.000000 0.110000 ( 0.110000)
samk2 78.160000 0.000000 78.160000 ( 78.160000)
simonk 21.640000 0.000000 21.640000 ( 21.640000)
simons 16.310000 0.000000 16.310000 ( 16.310000)
zach 10.650000 0.000000 10.650000 ( 10.650000)
paul2 0.000000 0.000000 0.000000 ( 0.000000)
paul2a 0.000000 0.000000 0.000000 ( 0.000000)
Sylv 0.000000 0.000000 0.000000 ( 0.000000)

Well, it was a good ride while it lasted:

Sylv 1.37
paul2a 1.76
paul2 1.81
martin 8.85
jegII 9.45
jeff 10.33
park 11.43
samk1 12.03
zach 12.19
markvh 18.30
paolo 18.50
paul 20.48
robert 23.12
simonk 29.71
simons 30.10
samk2 106.50
david 242.16

Ara.T.Howard6 · 16 October 2005 18:01

no - i meant you have to count - merely note that a element has been seen and,
if so, add it to the list. i notice now my example has a bug in that an
element occuring 4 times is added twice. it's easy fix though. i suppose you
could call that counting - but's it's really only 0 or 1, seen or seen again.

cheers.

-a

···

On Mon, 17 Oct 2005, Robert Klemme wrote:

Did you mean to provide this as an example that you actually need to count? Because that's what it is:

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
anything that contradicts experience and logic should be abandoned.
-- h.h. the 14th dalai lama

===============================================================================

Zach_Dennis1 · 17 October 2005 05:14

I don't' think I've seen this solution yet, so here's my stab:

irb(main):013:0> a = [1,2,5,4,3,5,3]
=> [1, 2, 5, 4, 3, 5, 3]

irb(main):017:0> t=[]; a.delete_if{ |e| r=(not t.include? e); t.push(e); r }
=> [5, 3]

Zach

Jeff_Wood · 17 October 2005 18:46

How 'bout

a = [ 0,1,2,3,4,5,2,3 ]
a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b }

···

On 10/17/05, Simon Strandgaard <neoneye@gmail.com> wrote:

On 10/17/05, Ryan Leavengood <leavengood@gmail.com> wrote:
> On 10/17/05, Simon Strandgaard <neoneye@gmail.com> wrote:
> > On 10/17/05, pauldacus@gmail.com <pauldacus@gmail.com> wrote:
> > > Young grasshoppers... let us never forget the power of the REGEX! Let
> > > us not bother with counters or blocks... let us GSUB this bad boy!
> > >
> > > a = [0,1,2,3,4,5,2,3]
> > >
> > > a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)
> > >
> > > =>["0", "1", "4", "5"]
> >
> >
> > no need to sort..
> >
> > a = [0, 1, 2, 3, 4, 5, 2, 3]
> > a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
> > #=> [0, 1, 4, 5]
>
> Both are interesting solutions, but you are returning the unique
> elements of the array, not the non-unique ones, which was the original
> problem. I say this not to be pedantic, but to see how you would solve
> it.

Sorry should be without the delete..

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i}
#=> [2, 3]

--
Simon Strandgaard

--
"http://ruby-lang.org -- do you ruby?"

Jeff Wood

pauldacus · 17 October 2005 19:11

That '?!' zero width lookahead regex is nice, Simon.

A workable alternative, with nary a block or hash counter to offend the
eye:

a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq

This works for all numbers, not just single digits. Returns a string,
which you will 'to_i' in an iterator. Replace the \d with \w for
something to find dup strings... I think.

Lyndon_Samson · 17 October 2005 23:16

user system total real
index 1.344000 0.000000 1.344000 ( 1.437000)
grep 3.875000 0.000000 3.875000 ( 4.266000)
scan 2.703000 0.000000 2.703000 ( 3.188000)
inject 1.625000 0.000000 1.625000 ( 1.812000)
gsub 3.078000 0.000000 3.078000 ( 3.547000)
bits 0.969000 0.000000 0.969000 ( 1.062000)
include 1.250000 0.000000 1.250000 ( 1.469000)

BMN = 50000
bm(5) do |bx|
bx.report('index ') { BMN.times { a.select{|e| a.index(e) !=a.rindex(e)}.uniq
} }
bx.report('grep ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
bx.report('scan ') { BMN.times {
a.sort.join('').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq
} }
bx.report('inject') { BMN.times { a.uniq.inject(a.dup) { |b,i|b.delete_at(
b.index(i)) ; b } } }
bx.report('gsub ') { BMN.times
{a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i}
} }
bx.report('bits ') { BMN.times {
bn = bn2 = 0
a.each {|el|(bn2 |= (1<<el) if bn[el] == 1) | bn |= (1<<el) }
#0.upto(a.length) { |bitNdx| puts bitNdx if bn2[bitNdx] == 1 }
} }
bx.report('include') { BMN.times { t=[]; a.select{ |e| r=t.include?e; t<<e;r
}.uniq } }

end

Nyah nyah, unlimited length bitstrings win No output array of course but
the results are encoded in Fixnum/Bignum bits.

Zach_Dennis1 · 17 October 2005 23:58

Ryan Leavengood wrote:

The results:

Small array, lots of iterations:
user system total real
markvh 0.391000 0.000000 0.391000 ( 0.391000)
martin 0.390000 0.000000 0.390000 ( 0.484000)

Big array, one iteration:
user system total real
markvh 0.015000 0.000000 0.015000 ( 0.016000)
martin 0.000000 0.000000 0.000000 ( 0.000000)

Overall Martin's was the best. As he suspected, David's was quite
slow. Sam's as well.

On second thought, Mark's may have been the best overall. Either way,
both Mark's and Martin's are good algorithms. Many of the others were
good as well.

Mine isn't the shortest, but it's the fastest I've seen thus far! Twice as fast as the next closest! Why does everyone leave me out of benchmark,

---my code
t=; a.delete_if{ |e| r=(not t.include? e); t.push(e); r }
---end my code

> "c:\ruby\bin\ruby.exe" find_non-uniques.rb -d
user system total real
delete 0.218000 0.000000 0.218000 ( 0.218000) <---- MINE!!!
index 0.500000 0.000000 0.500000 ( 0.547000)
grep 0.500000 0.000000 0.500000 ( 0.516000)
scan 1.078000 0.000000 1.078000 ( 1.172000)
inject 0.656000 0.000000 0.656000 ( 0.672000)
gsub 1.250000 0.000000 1.250000 ( 1.250000)
include 0.672000 0.000000 0.672000 ( 0.687000)

Zach

···

On 10/17/05, Ryan Leavengood <leavengood@gmail.com> wrote:

Topic		Replies	Views
Isolating non-unique items in an array ruby-talk	15	111	13 October 2006
Making a count of unique elements in an array ruby-talk	1	129	9 January 2006
How to get only the duplicatet items from an array ruby-talk	4	114	27 October 2006
Duplicate elements in array ruby-talk	14	91	29 October 2007
Isolating non-unique items in an array ruby-talk	0	116	12 October 2006

How to get non-unique elements from an array?

--

--

Related topics