Finding duplicates in an array and its index number

Hi,

I have following array and I am trying to find if there are duplicates
in the array. I also want to find out their index.

e.g.

a=[1,2,3,1,3]

Here 1 is a duplicate which comes twice and its index is 0 and 3 in the
array.

Any suggestions?

Thanks

···

--
Posted via http://www.ruby-forum.com/.

I'm curious what the purpose is. If it's simply to remove the duplicates
you can call .uniq on the array to remove duplicates.

···

On Apr 19, 2012 4:55 PM, "newto ruby" <lists@ruby-forum.com> wrote:

Hi,

I have following array and I am trying to find if there are duplicates
in the array. I also want to find out their index.

e.g.

a=[1,2,3,1,3]

Here 1 is a duplicate which comes twice and its index is 0 and 3 in the
array.

Any suggestions?

Thanks

--
Posted via http://www.ruby-forum.com/\.

Hi,

As always, there are many ways to do this.

A highlevel approach: (short but not really inefficient)

···

####
x = [1, 2, 3, 2, 4, 5, 4, 4]

a = x.each_with_index.group_by(&:first).inject({}) do |result, (val,
group)|
  next result if group.length == 1
  result.merge val => group.map {|pair| pair[1]}
end

p a
####

A more efficient lowlevel solution would be to iterate over the array,
compare the values and save the indices in case of duplicates.

--
Posted via http://www.ruby-forum.com/.

How about;

1.9.3p125 :001 > foo = [1,2,3,1,3]
  => [1, 2, 3, 1, 3]
1.9.3p125 :002 > dup_indices = Hash.new {|h,k| h[k]=}
  => {}
1.9.3p125 :003 > foo.each_index {|i| dup_indices[foo[i]] << i unless 1 == foo.count(foo[i])}
  => [1, 2, 3, 1, 3]
1.9.3p125 :004 > dup_indices
  => {1=>[0, 3], 3=>[2, 4]}

Sam

···

On 20/04/12 09:53, newto ruby wrote:

Hi,

I have following array and I am trying to find if there are duplicates
in the array. I also want to find out their index.

e.g.

a=[1,2,3,1,3]

Here 1 is a duplicate which comes twice and its index is 0 and 3 in the
array.

Any suggestions?

Thanks

I do a lot of indexing of data. Usually they're multi-field records, and
each data set will have multiple indexes. The technique I use is below.
This indexes all elements of the array. So, then to find duplicates, you
have to loop again to find array counts > 1. Anyway, it's a slightly
different way to do it in the event an index of all entries would be
useful for lookups.

data = [1, 2, 3, 1, 3, 4, 6, 5, 4]
dataIndex = {}

data.each_with_index do |value, indx|

  index_key = value
  index_value = indx

  if !dataIndex.has_key?(index_key)
    dataIndex.store(index_key, [])
  end

  dataIndex[index_key].push(index_value)
end

p dataIndex

dataIndex.each do |key, indxs|
  if indxs.count > 1
    # do something
    puts "#{key} appears at #{indxs.join(', ')}"
  end
end

-- gw

···

--
Posted via http://www.ruby-forum.com/.

Generate a hash with unique items as keys, and arrays of indices as values,
then select only those with more than one index:

array = [1, 2, 3, 4, 5, 3, 6, 7, 2, 8, 1, 9]

array.each_with_index.reduce({}) { |hash, (item, index)|
   hash[item] = (hash[item] || ) << index
   hash
}.select { |key, value|
   value.size > 1
}

# => {1=>[0, 10], 2=>[1, 8], 3=>[2, 5]}

···

On 04/19/2012 11:53 PM, newto ruby wrote:

Hi, I have following array and I am trying to find if there are duplicates in the array. I also want to find out their index.

--
Lars Haugseth

If you really want to know how it finds the duplicates, look at the source
code for the .uniq method as a starting point.

···

On Apr 19, 2012 5:06 PM, "Jeff Lunt" <jefflunt@gmail.com> wrote:

I'm curious what the purpose is. If it's simply to remove the duplicates
you can call .uniq on the array to remove duplicates.
On Apr 19, 2012 4:55 PM, "newto ruby" <lists@ruby-forum.com> wrote:

Hi,

I have following array and I am trying to find if there are duplicates
in the array. I also want to find out their index.

e.g.

a=[1,2,3,1,3]

Here 1 is a duplicate which comes twice and its index is 0 and 3 in the
array.

Any suggestions?

Thanks

--
Posted via http://www.ruby-forum.com/\.

Jan E. wrote in post #1057484:

A more efficient lowlevel solution would be to iterate over the array,
compare the values and save the indices in case of duplicates.

... like this:

···

####
x = [1, 2, 3, 2, 4, 5, 4, 4]

duplicates = {}
x.each_with_index do |value, i|
  (i + 1).upto x.length - 1 do |j|
    if x[j] == value
      duplicates[value] = [i] if duplicates[value].nil?
      duplicates[value] << j
      break
    end
  end
end

p duplicates
####

--
Posted via http://www.ruby-forum.com/\.

Greg Willits wrote in post #1057552:

I do a lot of indexing of data. Usually they're multi-field records, and
each data set will have multiple indexes. The technique I use is below.
This indexes all elements of the array. So, then to find duplicates, you
have to loop again to find array counts > 1. Anyway, it's a slightly
different way to do it in the event an index of all entries would be
useful for lookups.

DOH. That first loop only needed to be:

data.each_with_index do |value, indx|
   if !dataIndex.has_key?(key)
     dataIndex.store(key, )
   end
   dataIndex[key].push(value)
end

That's what I get for doing a quick transcribe of project code.

-- gw

···

--
Posted via http://www.ruby-forum.com/\.

Don't use reduce/inject for non-reductive applications. Use something more appropriate, like a plain each.

Also use Hash to its full capabilities:

hash = Hash.new { |h,k| h[k] = }

then it is as clean as:

array.each_with_index do |val, idx|
  hash[val] << idx
end

Notice how you're not constantly re-assigning hash for no good reason in my version? That adds up, but more importantly it obfuscates the original intent.

···

On Apr 20, 2012, at 03:46 , Lars Haugseth wrote:

On 04/19/2012 11:53 PM, newto ruby wrote:

Hi, I have following array and I am trying to find if there are duplicates in the array. I also want to find out their index.

Generate a hash with unique items as keys, and arrays of indices as values,
then select only those with more than one index:

array = [1, 2, 3, 4, 5, 3, 6, 7, 2, 8, 1, 9]

array.each_with_index.reduce({}) { |hash, (item, index)|
hash[item] = (hash[item] || ) << index
hash
}.select { |key, value|
value.size > 1
}

# => {1=>[0, 10], 2=>[1, 8], 3=>[2, 5]}

OK folks,

news from the service department. Here's the shootout:

Turns out quickest is one of the less arcane solutions:

      dups = {}

      dat.each_with_index do |val, idx|
        (dups[val] ||= []) << idx
      end

      dups.delete_if {|k,v| v.size == 1}

A few remarks:

It's amazing how many forms even the same algorithm can take in Ruby.
Just the different options for adding to a Hash (#default_proc,
#fetch, ||=) can make a huge visual difference although the underlying
strategy of all these algorithms is the same.

Downside of Jan's first approach is that it has effort O(n**2) in
terms of elements in the array because of the nested iteration. Sam's
approach has a similar quality although less obvious (it's in the
#count).

Allocating multiple Hash instances is what makes Jan's second approach slow.

As always, #inject (or #reduce) approaches are slower than others.

The best strategy is to create a Hash with elements as keys and Arrays
of indexes as values. This approach won't suffer for large inputs
from the O(n**2) issue.

For one off scripts and infrequent execution it doesn't really matter much. :slight_smile:

Kind regards

robert

···

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/