Determining uniqueness on a single array element

I am loading file names and mtimes into an array and then putting that
array inside an outer array. I have run into the situation where the
same file sometimes exists in different places in the file system and
occasionally with a different file name.

I need to ensure that I process the contents of each file only once.
So, in addition to the two elements originally captured I now create an
MD5 hexdigest of the file contents: [ f.mtime, f.name, f.hexdigest ]
and store that.
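For readers following along, here is a minimal sketch of how such [ mtime, name, hexdigest ] triples might be built. The helper name `file_record` and the temporary demo files are assumptions made for illustration; the original post does not show how the arrays are populated.

```ruby
require "digest/md5"
require "tmpdir"

# Hypothetical helper (not from the original post): build the
# [mtime, name, hexdigest] triple for a single file path.
def file_record(path)
  [File.mtime(path), File.basename(path), Digest::MD5.file(path).hexdigest]
end

# Demo with throwaway files: two of them share the same contents
# under different names, the third differs.
records = Dir.mktmpdir do |dir|
  a = File.join(dir, "a.txt")
  b = File.join(dir, "copy_of_a.txt")
  c = File.join(dir, "c.txt")
  File.write(a, "same contents")
  File.write(b, "same contents")
  File.write(c, "different contents")
  [a, b, c].map { |path| file_record(path) }
end
```

Files with identical contents get identical digests regardless of name or location, which is what makes the third element usable as the uniqueness key.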

Now I wish to ensure that each distinct hexdigest is processed but once.
I can do this:

   hex_array = []
   outer_array.each do |inner_array|
     next if hex_array.include?( inner_array[2] )
     hex_array << inner_array[2]
     . . .

I wonder if there is a better way? Any suggestions?


--
Posted via http://www.ruby-forum.com/.

This assumes Ruby 1.9.2 where Array#uniq takes a block:

outer_array.uniq { |mtime, name, md5| md5 }.each do |mtime, name, md5|
  # do stuff here
end
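A small self-contained sketch of the uniq-with-a-block idea, assuming Ruby 1.9.2 or later. The sample entries and the `processed` array are invented for illustration and stand in for the real per-file work.

```ruby
# Made-up sample data in the [mtime, name, md5] layout from the question.
outer_array = [
  [Time.at(0),  "a.txt",         "d41d8"],
  [Time.at(10), "copy_of_a.txt", "d41d8"],
  [Time.at(20), "b.txt",         "9e107"]
]

processed = []

# uniq with a block compares entries by the block's return value (the
# digest) and keeps the first entry seen for each distinct digest.
outer_array.uniq { |_mtime, _name, md5| md5 }.each do |_mtime, name, _md5|
  processed << name   # placeholder for the real processing
end
```

Here the second entry is skipped because its digest matches the first, so only a.txt and b.txt are processed.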

Gary Wright


On Feb 4, 2011, at 4:58 PM, James Byrne wrote:

  hex_array = []
  outer_array.each do |inner_array|
    next if hex_array.include?( inner_array[2] )
    hex_array << inner_array[2]
    . . .

James Byrne wrote in post #979694:

Now I wish to ensure that each distinct hexdigest is processed but once.
I can do this:

   hex_array = []
   outer_array.each do |inner_array|
     next if hex_array.include?( inner_array[2] )
     hex_array << inner_array[2]
     . . .

I wonder if there is a better way? Any suggestions?

(1) auto-splat to avoid the [2] magic index

outer_array.each do |mtime, name, hexdigest|

(2) Use a hash, rather than an array, to record the digests you've
already processed. This avoids a linear search of the array on every
iteration.

seen = {}
outer_array.each do |mtime, name, hexdigest|
  next if seen[hexdigest]
  seen[hexdigest] = true
  ...
end
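As a variation on the same idea (not something proposed in the thread), the standard library's Set can play the role of the hash. This is only a sketch with invented sample data; `processed` is a placeholder for the real work.

```ruby
require "set"

# Made-up sample data in the [mtime, name, hexdigest] layout.
outer_array = [
  [Time.at(0),  "a.txt",         "d41d8"],
  [Time.at(10), "copy_of_a.txt", "d41d8"],
  [Time.at(20), "b.txt",         "9e107"]
]

seen = Set.new
processed = []

outer_array.each do |_mtime, name, hexdigest|
  # Set#add? returns nil when the element is already present,
  # so the membership check and the insert happen in one call.
  next unless seen.add?(hexdigest)
  processed << name   # placeholder for the real processing
end
```

Like the hash version, lookups do not scan the whole collection, so the cost per iteration stays roughly constant as the list grows.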


Brian Candler wrote in post #979745:

(1) auto-splat to avoid the [2] magic index

outer_array.each do |mtime, name, hexdigest|

(2) Use a hash, rather than an array, to record ones you've processed.
This avoids a linear search on every iteration

seen = {}
outer_array.each do |mtime, name, hexdigest|
  next if seen[hexdigest]
  seen[hexdigest] = true
  ...
end

Very nice. Thank you.
