A high-level approach (short, but not really inefficient):
···
####
x = [1, 2, 3, 2, 4, 5, 4, 4]
a = x.each_with_index.group_by(&:first).inject({}) do |result, (val, group)|
  next result if group.length == 1
  result.merge val => group.map { |pair| pair[1] }
end
p a
####
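For that sample array this should print {2=>[1, 3], 4=>[4, 6, 7]}, since only the values 2 and 4 occur more than once.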
A more efficient low-level solution would be to iterate over the array,
compare the values, and save the indices in case of duplicates.
I do a lot of indexing of data. Usually it's multi-field records, and
each data set will have multiple indexes. The technique I use is below.
It indexes all elements of the array, so to find duplicates you then have
to loop again looking for index lists with counts > 1. Anyway, it's a
slightly different way to do it, in the event an index of all entries
would be useful for lookups.
data = [1, 2, 3, 1, 3, 4, 6, 5, 4]
dataIndex = {}
data.each_with_index do |value, indx|
  index_key = value
  index_value = indx
  if !dataIndex.has_key?(index_key)
    dataIndex.store(index_key, [])
  end
  dataIndex[index_key].push(index_value)
end
p dataIndex
dataIndex.each do |key, indxs|
  if indxs.count > 1
    # do something
    puts "#{key} appears at #{indxs.join(', ')}"
  end
end
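With the sample data above this should print something like:
1 appears at 0, 3
3 appears at 2, 4
4 appears at 5, 8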
If you really want to know how the duplicates get detected under the hood,
look at the source code for the .uniq method as a starting point.
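If digging through the C source isn't appealing, the core idea of a uniq-style pass can be sketched in plain Ruby with a "seen" hash (an illustration of the idea, not the actual implementation of .uniq):

seen = {}
dups = []
[1, 2, 3, 1, 3].each do |v|
  dups << v if seen[v]  # a second (or later) sighting means a duplicate
  seen[v] = true
end
p dups.uniq  # => [1, 3]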
···
On Apr 19, 2012 5:06 PM, "Jeff Lunt" <jefflunt@gmail.com> wrote:
I'm curious what the purpose is. If it's simply to remove the duplicates,
you can call .uniq on the array.
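For the example array that would be:

a = [1, 2, 3, 1, 3]
a.uniq  # => [1, 2, 3]

though note that .uniq drops the index information the question asks for.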
On Apr 19, 2012 4:55 PM, "newto ruby" <lists@ruby-forum.com> wrote:
Hi,
I have the following array and I am trying to find out whether there are
duplicates in it. I also want to find out their indices.
e.g.
a=[1,2,3,1,3]
Here 1 is a duplicate which appears twice, at indices 0 and 3 in the
array.
A more efficient low-level solution would be to iterate over the array,
compare the values, and save the indices in case of duplicates.
... like this:
···
####
x = [1, 2, 3, 2, 4, 5, 4, 4]
duplicates = {}
x.each_with_index do |value, i|
  (i + 1).upto(x.length - 1) do |j|
    if x[j] == value
      duplicates[value] = [i] if duplicates[value].nil?
      duplicates[value] << j
      break
    end
  end
end
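With the sample array above, duplicates should end up as {2=>[1, 3], 4=>[4, 6, 7]}, i.e. each duplicated value mapped to all of its indices.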
DOH. That first loop in my indexing example only needed to be:
data.each_with_index do |value, indx|
  if !dataIndex.has_key?(value)
    dataIndex.store(value, [])
  end
  dataIndex[value].push(indx)
end
That's what I get for doing a quick transcription of project code.
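For what it's worth, the shortened loop should build the same index hash as before, i.e. {1=>[0, 3], 2=>[1], 3=>[2, 4], 4=>[5, 8], 6=>[6], 5=>[7]} for the sample data.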
Don't use reduce/inject for non-reductive applications. Use something more appropriate, like a plain each.
Also use Hash to its full capabilities:
hash = Hash.new { |h, k| h[k] = [] }
then it is as clean as:
array.each_with_index do |val, idx|
  hash[val] << idx
end
Notice how you're not constantly re-assigning hash for no good reason in my version? That adds up, but more importantly it obfuscates the original intent.
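Putting that together with the duplicate filtering from earlier in the thread, a complete version might look like this (using the sample array from the original question):

array = [1, 2, 3, 1, 3]
hash = Hash.new { |h, k| h[k] = [] }  # default proc creates the index array on first access
array.each_with_index do |val, idx|
  hash[val] << idx
end
dups = hash.select { |_, idxs| idxs.size > 1 }
p dups  # => {1=>[0, 3], 3=>[2, 4]}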
···
On Apr 20, 2012, at 03:46 , Lars Haugseth wrote:
On 04/19/2012 11:53 PM, newto ruby wrote:
Hi, I have the following array and I am trying to find out whether there are duplicates in the array. I also want to find out their indices.
Generate a hash with unique items as keys, and arrays of indices as values,
then select only those with more than one index:
News from the service department. Here's the shootout:
Turns out the quickest is one of the less arcane solutions:
dat = [1, 2, 3, 2, 4, 5, 4, 4]  # sample input; the actual benchmark data isn't quoted here
dups = {}
dat.each_with_index do |val, idx|
  (dups[val] ||= []) << idx
end
dups.delete_if { |k, v| v.size == 1 }
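With that sample input, dups ends up as {2=>[1, 3], 4=>[4, 6, 7]} after the delete_if.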
A few remarks:
It's amazing how many forms even the same algorithm can take in Ruby.
Just the different options for adding to a Hash (#default_proc, #fetch, ||=) can make a huge visual difference although the underlying
strategy of all these algorithms is the same.
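For illustration, the #fetch spelling of the same strategy would look roughly like this (a sketch, using the same sample data):

dat = [1, 2, 3, 2, 4, 5, 4, 4]
dups = {}
dat.each_with_index do |val, idx|
  dups[val] = dups.fetch(val) { [] } << idx  # fetch with a default block instead of ||=
end
dups.delete_if { |k, v| v.size == 1 }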
The downside of Jan's first approach is that it costs O(n**2) in the
number of array elements because of the nested iteration. Sam's approach
has a similar quality, although it is less obvious (it's hidden in the #count).
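Sam's count-based code isn't quoted in this excerpt, but the hidden cost presumably comes from something along these lines, where every #count call is itself a full O(n) scan of the array:

x = [1, 2, 3, 1, 3]
dups = x.uniq.select { |val| x.count(val) > 1 }  # each #count walks the whole array
p dups  # => [1, 3]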
Allocating multiple Hash instances is what makes Jan's second approach slow.
As always, #inject (or #reduce) approaches are slower than others.
The best strategy is to create a Hash with elements as keys and Arrays
of indexes as values. This approach won't suffer for large inputs
from the O(n**2) issue.
For one-off scripts and infrequent execution it doesn't really matter much.