A high-level approach (short, but not really inefficient):
···
####
x = [1, 2, 3, 2, 4, 5, 4, 4]
a = x.each_with_index.group_by(&:first).inject({}) do |result, (val, group)|
  next result if group.length == 1
  result.merge val => group.map { |pair| pair[1] }
end
p a
####
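For that sample array this should print {2=>[1, 3], 4=>[4, 6, 7]}, since only the values 2 and 4 occur more than once.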
A more efficient low-level solution would be to iterate over the array,
compare the values, and save the indices in case of duplicates.
I do a lot of indexing of data. Usually it's multi-field records, and
each data set will have multiple indexes. The technique I use is below.
It indexes all elements of the array, so to find duplicates you then have
to loop again looking for index lists with counts > 1. Anyway, it's a
slightly different way to do it, in the event an index of all entries
would be useful for lookups.
data = [1, 2, 3, 1, 3, 4, 6, 5, 4]
dataIndex = {}
data.each_with_index do |value, indx|
  index_key = value
  index_value = indx
  if !dataIndex.has_key?(index_key)
    dataIndex.store(index_key, [])
  end
  dataIndex[index_key].push(index_value)
end
p dataIndex
dataIndex.each do |key, indxs|
  if indxs.count > 1
    # do something
    puts "#{key} appears at #{indxs.join(', ')}"
  end
end
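With the sample data above this should print something like:
1 appears at 0, 3
3 appears at 2, 4
4 appears at 5, 8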
If you really want to know how the duplicates get detected under the hood,
look at the source code for the .uniq method as a starting point.
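If digging through the C source isn't appealing, the core idea of a uniq-style pass can be sketched in plain Ruby with a "seen" hash (an illustration of the idea, not the actual implementation of .uniq):

seen = {}
dups = []
[1, 2, 3, 1, 3].each do |v|
  dups << v if seen[v]  # a second (or later) sighting means a duplicate
  seen[v] = true
end
p dups.uniq  # => [1, 3]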
···
On Apr 19, 2012 5:06 PM, "Jeff Lunt" <jefflunt@gmail.com> wrote:
I'm curious what the purpose is. If it's simply to remove the duplicates,
you can call .uniq on the array.
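For the example array that would be:

a = [1, 2, 3, 1, 3]
a.uniq  # => [1, 2, 3]

though note that .uniq drops the index information the question asks for.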
On Apr 19, 2012 4:55 PM, "newto ruby" <lists@ruby-forum.com> wrote:
Hi,
I have the following array and I am trying to find out whether there are
duplicates in it. I also want to find out their indices.
e.g.
a=[1,2,3,1,3]
Here 1 is a duplicate which appears twice, at indices 0 and 3 in the
array.
A more efficient low-level solution would be to iterate over the array,
compare the values, and save the indices in case of duplicates.
... like this:
···
####
x = [1, 2, 3, 2, 4, 5, 4, 4]
duplicates = {}
x.each_with_index do |value, i|
  (i + 1).upto(x.length - 1) do |j|
    if x[j] == value
      duplicates[value] = [i] if duplicates[value].nil?
      duplicates[value] << j
      break
    end
  end
end
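With the sample array above, duplicates should end up as {2=>[1, 3], 4=>[4, 6, 7]}, i.e. each duplicated value mapped to all of its indices.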
DOH. That first loop in my indexing example only needed to be:
data.each_with_index do |value, indx|
  if !dataIndex.has_key?(value)
    dataIndex.store(value, [])
  end
  dataIndex[value].push(indx)
end
That's what I get for doing a quick transcription of project code.
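For what it's worth, the shortened loop should build the same index hash as before, i.e. {1=>[0, 3], 2=>[1], 3=>[2, 4], 4=>[5, 8], 6=>[6], 5=>[7]} for the sample data.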
Don't use reduce/inject for non-reductive applications. Use something more appropriate, like a plain each.
Also use Hash to its full capabilities:
hash = Hash.new { |h, k| h[k] = [] }
then it is as clean as:
array.each_with_index do |val, idx|
  hash[val] << idx
end
Notice how you're not constantly re-assigning hash for no good reason in my version? That adds up, but more importantly it obfuscates the original intent.
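Putting that together with the duplicate filtering from earlier in the thread, a complete version might look like this (using the sample array from the original question):

array = [1, 2, 3, 1, 3]
hash = Hash.new { |h, k| h[k] = [] }  # default proc creates the index array on first access
array.each_with_index do |val, idx|
  hash[val] << idx
end
dups = hash.select { |_, idxs| idxs.size > 1 }
p dups  # => {1=>[0, 3], 3=>[2, 4]}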
···
On Apr 20, 2012, at 03:46 , Lars Haugseth wrote:
On 04/19/2012 11:53 PM, newto ruby wrote:
Hi, I have the following array and I am trying to find out whether there are duplicates in the array. I also want to find out their indices.
Generate a hash with unique items as keys, and arrays of indices as values,
then select only those with more than one index:
News from the service department. Here's the shootout:
Turns out the quickest is one of the less arcane solutions:
dat = [1, 2, 3, 2, 4, 5, 4, 4]  # sample input; the actual benchmark data isn't quoted here
dups = {}
dat.each_with_index do |val, idx|
  (dups[val] ||= []) << idx
end
dups.delete_if { |k, v| v.size == 1 }
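With that sample input, dups ends up as {2=>[1, 3], 4=>[4, 6, 7]} after the delete_if.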
A few remarks:
It's amazing how many forms even the same algorithm can take in Ruby.
Just the different options for adding to a Hash (#default_proc, #fetch, ||=) can make a huge visual difference although the underlying
strategy of all these algorithms is the same.
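For illustration, the #fetch spelling of the same strategy would look roughly like this (a sketch, using the same sample data):

dat = [1, 2, 3, 2, 4, 5, 4, 4]
dups = {}
dat.each_with_index do |val, idx|
  dups[val] = dups.fetch(val) { [] } << idx  # fetch with a default block instead of ||=
end
dups.delete_if { |k, v| v.size == 1 }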
The downside of Jan's first approach is that it costs O(n**2) in the
number of array elements because of the nested iteration. Sam's approach
has a similar quality, although it is less obvious (it's hidden in the #count).
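Sam's count-based code isn't quoted in this excerpt, but the hidden cost presumably comes from something along these lines, where every #count call is itself a full O(n) scan of the array:

x = [1, 2, 3, 1, 3]
dups = x.uniq.select { |val| x.count(val) > 1 }  # each #count walks the whole array
p dups  # => [1, 3]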
Allocating multiple Hash instances is what makes Jan's second approach slow.
As always, #inject (or #reduce) approaches are slower than others.
The best strategy is to create a Hash with elements as keys and Arrays
of indexes as values. This approach won't suffer for large inputs
from the O(n**2) issue.
For one-off scripts and infrequent execution it doesn't really matter much.