Parsing a CSV file with Ruby

I'm currently trying to do something that seems rather simple, but I'm
fairly new to Ruby. I want to read in a CSV file, find rows that are
distinct with respect to one of the elements in the row (for example,
all rows in which the first element is "A"), and then do something with
these rows (in this case, parse them, build some XML, and write it to a
file). I'm not very familiar with iterators in Ruby, but I seem to
remember there being functionality that will allow me to get distinct
rows based on some element in the row. Let me know if this is possible
and how I should approach it.

Thanks,
Drew

--
Posted via http://www.ruby-forum.com/.

Let me be more specific: essentially I want to find the groups of rows
that share an element. Let's say each row in my CVS doc has 3 elements.
I want to iterate across every group of rows that share the same value
for the first element. Hope this makes sense.

Drew Olson wrote:

Let me be more specific: essentially I want to find the groups of rows
that share an element. Let's say each row in my CVS doc has 3 elements.
I want to iterate across every group of rows that share the same value
for the first element. Hope this makes sense.

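#!/usr/bin/ruby -w

# Group records by their first comma-separated field.
row_hash = {}

# File.foreach closes the file once every line has been read.
File.foreach("data.txt") { |record|
   fields = record.chomp.split(",")
   # Start an empty list the first time a key is seen.
   row_hash[fields.first] = [] unless row_hash[fields.first]
   row_hash[fields.first] << record
}

# Print each group in sorted key order.
row_hash.keys.sort.each { |key|
   puts "Group: #{key}"
   row_hash[key].each { |record|
      puts "\t#{record}"
   }
}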

data.txt:

a,this,is,one,record
a,this,is,another,record
b,this,is,one,record
b,this,is,another,record
c,this,is,one,record
c,this,is,another,record

output:

Group: a
        a,this,is,one,record
        a,this,is,another,record
Group: b
        b,this,is,one,record
        b,this,is,another,record
Group: c
        c,this,is,one,record
        c,this,is,another,record
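
As a variation on the script above, the "create the array if it is
missing" step can be folded into the hash itself by giving Hash.new a
default block. This is only a minimal sketch, not the code as originally
posted: the helper name group_by_first_field and the inline sample lines
are assumptions made for illustration.

```ruby
# Hypothetical helper: group raw comma-separated lines by their first field.
def group_by_first_field(lines)
  # The default block stores an empty array the first time a key is seen.
  groups = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    record = line.chomp
    next if record.empty?   # skip blank lines
    groups[record.split(",").first] << record
  end
  groups
end

# Sample lines standing in for data.txt (an assumption for this sketch).
sample = [
  "a,this,is,one,record\n",
  "b,this,is,one,record\n",
  "a,this,is,another,record\n"
]

groups = group_by_first_field(sample)
groups.keys.sort.each do |key|
  puts "Group: #{key}"
  groups[key].each { |record| puts "\t#{record}" }
end
```

To run it against a real file, something like
group_by_first_field(File.foreach("data.txt")) should work, since
File.foreach without a block returns an enumerator over the lines.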

--
Paul Lutus
http://www.arachnoid.com

Let me be more specific: essentially I want to find the groups of rows
that share an element. Let's say each row in my CVS doc has 3 elements.
I want to iterate across every group of rows that share the same value
for the first element. Hope this makes sense.

I'm assuming you meant CSV (not CVS). :wink:

See if this gets you going:

Firefly:~/Desktop$ cat data.csv
one,1,A
one,2,B
one,3,C
two,1,A
two,2,B
three,1,A
Firefly:~/Desktop$ irb -r csv
>> rows = CSV.read("data.csv")
=> [["one", "1", "A"], ["one", "2", "B"], ["one", "3", "C"], ["two", "1", "A"], ["two", "2", "B"], ["three", "1", "A"]]
>> groups = rows.map { |row| row.first }.uniq
=> ["one", "two", "three"]
>> groups.each do |group|
?>   puts group
>>   rows.select { |row| row.first == group }.each { |row| puts "   #{row.inspect}" }
>> end
one
   ["one", "1", "A"]
   ["one", "2", "B"]
   ["one", "3", "C"]
two
   ["two", "1", "A"]
   ["two", "2", "B"]
three
   ["three", "1", "A"]
=> ["one", "two", "three"]
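
The session above rescans all rows once per group. If your Ruby has
Enumerable#group_by (1.8.7 and later), the same grouping can probably be
done in a single pass. The following is a hedged sketch rather than part
of the original reply; the data is inlined in a heredoc (which needs
Ruby 2.3 or later) purely so the example runs on its own.

```ruby
require "csv"

# Inline copy of the data.csv shown above, so the sketch is self-contained.
data = <<~CSV_TEXT
  one,1,A
  one,2,B
  one,3,C
  two,1,A
  two,2,B
  three,1,A
CSV_TEXT

rows = CSV.parse(data)

# group_by walks the rows once and returns a hash keyed by the block's
# value; on Ruby 1.9 and later the keys keep first-seen order.
groups = rows.group_by { |row| row.first }

groups.each do |name, members|
  puts name
  members.each { |row| puts "   #{row.inspect}" }
end
```

With a file on disk, CSV.read("data.csv") would take the place of
CSV.parse(data).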

James Edward Gray II

On Aug 30, 2006, at 11:01 AM, Drew Olson wrote:

Thank you both for the responses. Both look EXTREMELY helpful.
I'll be sure to post any issues I have in the forum in the future.

Thanks,
Drew
