Comparing elements in same array

Hello,

I have an (name-sorted) Excel spreadsheet that looks like the following

   Name Location AA /opt/files/1 AA /opt/files/2 AB /opt/files/3 AB
/opt/files/4 BB /opt/files/5 BB /opt/files/6 BB /opt/files/7 CC
/opt/files/8 CC /opt/files/9
I want to open the spreadsheet and compare the contents of the files (in
column Location) that have the same name.

I am using the gem 'roo' to push all file names into an array. My plan is to
traverse the array to find the same names and open their respective files (
I've created a Name => Location hash for this purpose.)

I am stuck at finding files with same names from the array that I've
created. any help/pointer would be appreciated.

TIA.

post sample code pls
thanks -botp

···

On Thu, Oct 6, 2011 at 9:06 AM, Thomas Le <metasmorph@gmail.com> wrote:

I am stuck at finding files with same names from the array that I've
created. any help/pointer would be appreciated.

glad to hear it... happy ruby-ing!

- j

···

--
Posted via http://www.ruby-forum.com/.

Sorry .. this is my sample code

···

---------
require 'roo'

ORGFILE = 'organize.xlsx'

xcel = Excelx.new("#{ORGFILE}")
xcel.default_sheet = xcel.sheets.first

startingRow = 2
nameArray =
nameProfile = Hash.new

startingRow.upto(xcel.last_row) do |row|

    name = xcel.cell(row,'A')
    location = xcel.cell(row,'B')

    nameProfile["#{name}"] = "#{location}"

    nameArray.push("#{name}")

    startingRow += 1

end
-------

Thanks

On Wed, Oct 5, 2011 at 8:19 PM, botp <botpena@gmail.com> wrote:

On Thu, Oct 6, 2011 at 9:06 AM, Thomas Le <metasmorph@gmail.com> wrote:
> I am stuck at finding files with same names from the array that I've
> created. any help/pointer would be appreciated.

post sample code pls
thanks -botp

require 'roo'
ORGFILE = 'organize.xlsx'
xcel = Excelx.new("#{ORGFILE}")

lose the inlining hash, just do
  xcel = Excelx.new(ORGFILE)

same w others below..

xcel.default_sheet = xcel.sheets.first
startingRow = 2
nameArray =
nameProfile = Hash.new
startingRow.upto(xcel.last_row) do |row|
name = xcel.cell(row,'A')
location = xcel.cell(row,'B')
nameProfile["#{name}"] = "#{location}"

at this point, since you are associating the name to the location, you
are assumming that the name is unique, ie each name has one and only
one location. now what if you're wrong?

nameArray.push("#{name}")
startingRow += 1
end

hmm, finding duplicates names in nameArray is so obvious. why do you
want it? to remove duplicate rows in your excel file? .... or maybe,
you may also want to know duplicate names but different location, no?

kind regards -botp

···

On Thu, Oct 6, 2011 at 10:02 AM, Thomas Le <metasmorph@gmail.com> wrote:

hi Thomas,

  well, if i understand correctly what you want to do, i think this...

    nameProfile["#{name}"] = "#{location}"

  ...is going to be a problem. if the names repeat, the location will
be overwritten each time the name is found, and you'll wind up with only
the last one found as the value for your 'name' key.

  i might instead first set up an array of arrays with your names and
locations, so that you can account for the same name being in multiple
locations. then i would step through the array and create a hash in
which each name was a key, and the value was an array of the locations
associated with that name. it would then be simple to do whatever you
wanted with the location values stored in the array of each name key.

  a dumb example:

···

########

nl_array = [
  ["Hector", "Barcelona"],
  ["Bill", "New York"],
  ["Ivan", "Moscow"],
  ["Hector", "San Francisco"],
  ["Bill", "Boston"]
]

nl_hash = Hash.new

nl_array.each{|entry|
  unless nl_hash.has_key?(entry[0])
    nl_hash[entry[0]] = [entry[1]]
  else
    nl_hash[entry[0]] << entry[1]
  end
}

p nl_hash

########

=> {"Ivan"=>["Moscow"], "Bill"=>["New York", "Boston"],
"Hector"=>["Barcelona", "San Francisco"]}

  hth,

  - j

--
Posted via http://www.ruby-forum.com/\.

I want to find (thousands of) duplicate names in the spreadsheet so I can
compare the corrresponding files to see if they produce the same results ...

my logic is like this "if name in column A row 2 is same as that in colum A
row 3, open files in column B row 2 and column B row 3,...do something else"
...

Thanks,
TL

···

On Thu, Oct 6, 2011 at 2:26 AM, botp <botpena@gmail.com> wrote:

On Thu, Oct 6, 2011 at 10:02 AM, Thomas Le <metasmorph@gmail.com> wrote:
> require 'roo'
> ORGFILE = 'organize.xlsx'
> xcel = Excelx.new("#{ORGFILE}")

lose the inlining hash, just do
xcel = Excelx.new(ORGFILE)

same w others below..

> xcel.default_sheet = xcel.sheets.first
> startingRow = 2
> nameArray =
> nameProfile = Hash.new
> startingRow.upto(xcel.last_row) do |row|
> name = xcel.cell(row,'A')
> location = xcel.cell(row,'B')
> nameProfile["#{name}"] = "#{location}"

at this point, since you are associating the name to the location, you
are assumming that the name is unique, ie each name has one and only
one location. now what if you're wrong?

> nameArray.push("#{name}")
> startingRow += 1
> end

hmm, finding duplicates names in nameArray is so obvious. why do you
want it? to remove duplicate rows in your excel file? .... or maybe,
you may also want to know duplicate names but different location, no?

kind regards -botp

This works for me beautifully ... my code looks like this .. still more to
go to complete my task but I've passed the first hurdle.

#---start code

require 'roo'

ORGFILE = 'organize.xlsx'
xcel = Excelx.new(ORGFILE)
xcel.default_sheet = xcel.sheets.first

startingRow = 2
nameArray =

startingRow.upto(xcel.last_row) do |row|
   name = xcel.cell(row,'A')
   location = xcel.cell(row,'B')
   nameArray << [name,location]

   startingRow += 1
end

nameHash = Hash.new

   nameArray.each { |entry|
       unless nameHash.has_key?(entry[0])
           nameHash[entry[0]] = [entry[1]]
       else
           nameHash[entry[0]] << entry[1]
end
}
# --end code

Thanks to Jake & Botp. I've learned a lot from your suggestions.

TL

···

On Thu, Oct 6, 2011 at 9:28 AM, jake kaiden <jakekaiden@yahoo.com> wrote:

hi Thomas,

well, if i understand correctly what you want to do, i think this...
>
> nameProfile["#{name}"] = "#{location}"
>
  ...is going to be a problem. if the names repeat, the location will
be overwritten each time the name is found, and you'll wind up with only
the last one found as the value for your 'name' key.

i might instead first set up an array of arrays with your names and
locations, so that you can account for the same name being in multiple
locations. then i would step through the array and create a hash in
which each name was a key, and the value was an array of the locations
associated with that name. it would then be simple to do whatever you
wanted with the location values stored in the array of each name key.

a dumb example:

########

nl_array = [
["Hector", "Barcelona"],
["Bill", "New York"],
["Ivan", "Moscow"],
["Hector", "San Francisco"],
["Bill", "Boston"]
]

nl_hash = Hash.new

nl_array.each{|entry|
unless nl_hash.has_key?(entry[0])
   nl_hash[entry[0]] = [entry[1]]
else
   nl_hash[entry[0]] << entry[1]
end
}

p nl_hash

########

=> {"Ivan"=>["Moscow"], "Bill"=>["New York", "Boston"],
"Hector"=>["Barcelona", "San Francisco"]}

hth,

- j

--
Posted via http://www.ruby-forum.com/\.

I want to find (thousands of) duplicate names in the spreadsheet so I can
compare the corrresponding files to see if they produce the same results ...

ok

my logic is like this "if name in column A row 2 is same as that in colum A
row 3, open files in column B row 2 and column B row 3,...do something else"

column A is already name column. files of the same pathname are the
same. compare only if they have different pathnames.

you can try something like,

#---start code
require 'roo'

ORGFILE = 'organize.xlsx'
xcel = Excelx.new(ORGFILE)
xcel.default_sheet = xcel.sheets.first

startingRow = 2
nameProfile = Hash.new { |h, k| h[k] = }

startingRow.upto(xcel.last_row) do |row|

   name = xcel.cell(row,'A')
   location = xcel.cell(row,'B')

   nameProfile[name] << location

   startingRow += 1
end

nameProfile.delete_if{|k,v| v.unique.size>1}

#---end code

nameProfile should only contain hash of names with multiple pathnames

note, code above untested since i do not have your actual file.
try it in irb.

kind regards -botp

···

On Thu, Oct 6, 2011 at 12:04 PM, Thomas Le <metasmorph@gmail.com> wrote:

sorry, that should be the opposite, ie,

   nameProfile.delete_if{|k,v| v.unique.size<2}

thanks -botp

···

On Thu, Oct 6, 2011 at 1:23 PM, botp <botpena@gmail.com> wrote:

nameProfile.delete_if{|k,v| v.unique.size>1}