I need to remove duplicates from an array of arrays. I can't use a plain Array#uniq because some fields differ but aren't part of the "key." Here's an example where the first 3 elements of each subarray are the "key" and determine uniqueness; I want to keep only the first row I see for each key.
>> a = [[1, 2, 3, 4, 5], [1, 2, 3, 9, 4], [1, 2, 3, 4, 4]]
=> [[1, 2, 3, 4, 5], [1, 2, 3, 9, 4], [1, 2, 3, 4, 4]]
The return value of deduplicating this array should be: [[1, 2, 3, 4, 5]]
Here is my first attempt at solving the problem:
>> def dedup ary
>> ary.map do |line|
?> dupes = ary.select { |row| row[0..2] == line[0..2] }
?> dupes.first
>> end.uniq
>> end
=> nil
>>
?> dedup a
=> [[1, 2, 3, 4, 5]]
This works. However, it is *super slow* on my real dataset. My arrays contain hundreds of thousands of subarrays, and the unique key for each subarray is the first 12 (of 18) elements. Because the inner select rescans the entire array for every row, the approach is O(n²): it takes many seconds to produce each intermediate array ("dupes" in the example above), so deduping the entire thing would likely take days.
Anyone have a superior and faster solution?
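For reference, one way to make this linear (a sketch, not from the original post) is a single pass with a hash keyed on the key slice, storing only the first row seen for each key. The `dedup` helper and `key_size` parameter below are names I've invented for illustration:

```ruby
# O(n) dedup: key each row by its first `key_size` elements and keep
# only the first row seen for each key.
def dedup(ary, key_size = 3)
  seen = {}                    # key slice => first row with that key
  ary.each do |row|
    key = row[0...key_size]
    seen[key] ||= row          # ||= stores only the first occurrence
  end
  seen.values                  # Ruby hashes preserve insertion order
end

a = [[1, 2, 3, 4, 5], [1, 2, 3, 9, 4], [1, 2, 3, 4, 4]]
dedup(a)  # => [[1, 2, 3, 4, 5]]
```

Note also that on Ruby 1.9+ `Array#uniq` accepts a block, so `a.uniq { |row| row[0..2] }` does the same thing (keeping the first row per key) in one call.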