I have an (name-sorted) Excel spreadsheet that looks like the following
Name Location AA /opt/files/1 AA /opt/files/2 AB /opt/files/3 AB
/opt/files/4 BB /opt/files/5 BB /opt/files/6 BB /opt/files/7 CC
/opt/files/8 CC /opt/files/9
I want to open the spreadsheet and compare the contents of the files (in
column Location) that have the same name.
I am using the gem 'roo' to push all file names into an array. My plan is to
traverse the array to find the same names and open their respective files (
I've created a Name => Location hash for this purpose.)
I am stuck at finding files with same names from the array that I've
created. any help/pointer would be appreciated.
name = xcel.cell(row,'A')
location = xcel.cell(row,'B')
nameProfile["#{name}"] = "#{location}"
nameArray.push("#{name}")
startingRow += 1
end
-------
Thanks
On Wed, Oct 5, 2011 at 8:19 PM, botp <botpena@gmail.com> wrote:
On Thu, Oct 6, 2011 at 9:06 AM, Thomas Le <metasmorph@gmail.com> wrote:
> I am stuck at finding files with same names from the array that I've
> created. any help/pointer would be appreciated.
at this point, since you are associating the name to the location, you
are assumming that the name is unique, ie each name has one and only
one location. now what if you're wrong?
nameArray.push("#{name}")
startingRow += 1
end
hmm, finding duplicates names in nameArray is so obvious. why do you
want it? to remove duplicate rows in your excel file? .... or maybe,
you may also want to know duplicate names but different location, no?
kind regards -botp
···
On Thu, Oct 6, 2011 at 10:02 AM, Thomas Le <metasmorph@gmail.com> wrote:
well, if i understand correctly what you want to do, i think this...
nameProfile["#{name}"] = "#{location}"
...is going to be a problem. if the names repeat, the location will
be overwritten each time the name is found, and you'll wind up with only
the last one found as the value for your 'name' key.
i might instead first set up an array of arrays with your names and
locations, so that you can account for the same name being in multiple
locations. then i would step through the array and create a hash in
which each name was a key, and the value was an array of the locations
associated with that name. it would then be simple to do whatever you
wanted with the location values stored in the array of each name key.
I want to find (thousands of) duplicate names in the spreadsheet so I can
compare the corrresponding files to see if they produce the same results ...
my logic is like this "if name in column A row 2 is same as that in colum A
row 3, open files in column B row 2 and column B row 3,...do something else"
...
Thanks,
TL
···
On Thu, Oct 6, 2011 at 2:26 AM, botp <botpena@gmail.com> wrote:
On Thu, Oct 6, 2011 at 10:02 AM, Thomas Le <metasmorph@gmail.com> wrote:
> require 'roo'
> ORGFILE = 'organize.xlsx'
> xcel = Excelx.new("#{ORGFILE}")
lose the inlining hash, just do
xcel = Excelx.new(ORGFILE)
at this point, since you are associating the name to the location, you
are assumming that the name is unique, ie each name has one and only
one location. now what if you're wrong?
> nameArray.push("#{name}")
> startingRow += 1
> end
hmm, finding duplicates names in nameArray is so obvious. why do you
want it? to remove duplicate rows in your excel file? .... or maybe,
you may also want to know duplicate names but different location, no?
Thanks to Jake & Botp. I've learned a lot from your suggestions.
TL
···
On Thu, Oct 6, 2011 at 9:28 AM, jake kaiden <jakekaiden@yahoo.com> wrote:
hi Thomas,
well, if i understand correctly what you want to do, i think this...
>
> nameProfile["#{name}"] = "#{location}"
>
...is going to be a problem. if the names repeat, the location will
be overwritten each time the name is found, and you'll wind up with only
the last one found as the value for your 'name' key.
i might instead first set up an array of arrays with your names and
locations, so that you can account for the same name being in multiple
locations. then i would step through the array and create a hash in
which each name was a key, and the value was an array of the locations
associated with that name. it would then be simple to do whatever you
wanted with the location values stored in the array of each name key.
I want to find (thousands of) duplicate names in the spreadsheet so I can
compare the corrresponding files to see if they produce the same results ...
ok
my logic is like this "if name in column A row 2 is same as that in colum A
row 3, open files in column B row 2 and column B row 3,...do something else"
column A is already name column. files of the same pathname are the
same. compare only if they have different pathnames.