Parsing a comma-separated file

Justin_To · 9 June 2008 16:11

Hi, I asked earlier about parsing just one line at a time, and now I'm
working on a program to parse multiple items on each line, something
like the following:

name, age, gender
Bob, 32, M
Stacy, 14, F
...
...

How do I parse 'Bob', knowing it's the first element on the line, '32'
is the second, 'M' is the last...I've been reading about regular
expressions. Is this the best way to solve this problem? And how exactly
do you use them?

Thanks!!

--
Posted via http://www.ruby-forum.com/.

ThoML · 9 June 2008 16:25

Are you looking for this?
http://fastercsv.rubyforge.org/

Ruby also has the csv standard library.

Regards,
Thomas.
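
For reference, a minimal sketch of the standard-library route mentioned above. It assumes a Ruby where `require 'csv'` is available (in Ruby 1.9 and later the bundled csv library is the former FasterCSV code), and it uses the sample data from the question as an inline string rather than a real file. The extra `strip` is there because the sample has a space after each comma, which the parser keeps as part of the field.

```ruby
require 'csv'

# Sample data from the question, inlined so the sketch is self-contained.
data = "name, age, gender\nBob, 32, M\nStacy, 14, F\n"

# CSV.parse returns an array of rows, each row an array of fields.
# Fields keep the space that follows each comma, so strip them afterwards.
rows = CSV.parse(data).map { |row| row.map { |field| field.to_s.strip } }

# Separate the header row from the data rows.
header, *records = rows

puts header.inspect
records.each { |name, age, gender| puts "#{name} is #{age} (#{gender})" }
```

To read from disk instead, `CSV.read(path)` or `CSV.foreach(path)` take a file name in place of the string.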

Greg_Willits · 9 June 2008 22:55

name, age, gender
Bob, 32, M
Stacy, 14, F
...
How do I parse 'Bob', knowing it's the first element on the line, '32'
is the second, 'M' is the last...I've been reading about regular
expressions. Is this the best way to solve this problem? And how exactly
do you use them?

This doesn't handle the full CSV format, but if you know you have pure
data like you show above, these are the rudimentary steps without the
one-liner tricks, so each step should be pretty straightforward to
understand. Arranging them as methods of a class would be good.

# read the whole file into a single string
# (IO.readlines would return an array, which has no strip!/gsub!)

  if File.exist?(file_name)
    file_lines = File.read(file_name)
  end

# normalize line endings so it doesn't matter what they are
# (double quotes are needed: '\n' in single quotes is a literal backslash + n)

  file_lines.strip!
  file_lines.gsub!(/\r\n/, "\n")
  file_lines.gsub!(/\r/, "\n")

# normalize comma delimiters so it doesn't matter
# if you have one, two or one,two or one , two etc...

  file_lines.gsub!(/\s*,\s*/, ',')

# split lines into a single array of lines

  lines_array = file_lines.split("\n")

# split each line into an array

  final_data = []

  lines_array.each do |this_line|
    final_data << this_line.split(',')
  end

# final_data is now an array of arrays that looks like this:

  [
    ['name', 'age', 'gender'],
    ['Bob', '32', 'M'],
    ['Stacy', '14', 'F']
  ]

So, to get Bob, you'd have to know his line number, and index into the
record array:

final_data[1][0] # Bob
final_data[2][3] # F

-- greg willits
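
The suggestion above to arrange the steps as methods of a class could look roughly like the sketch below. The class name `SimpleTable` and its `fetch` method are hypothetical names chosen for illustration, not part of any library, and the same caveat applies: this only works for plain data with no quoted fields or embedded commas.

```ruby
# Hypothetical wrapper around the split-based steps; not a full CSV parser.
class SimpleTable
  attr_reader :header, :rows

  def initialize(text)
    # Normalize \r\n and bare \r to \n, then break the text into lines.
    lines = text.strip.gsub(/\r\n?/, "\n").split("\n")

    # Skip blank lines, split on commas, and trim whitespace around fields.
    records = lines.reject { |line| line.strip.empty? }
                   .map { |line| line.split(',').map(&:strip) }

    # The first record is the header; the rest are data rows.
    @header, *@rows = records
  end

  # Look up a field by data-row index (0-based, header excluded) and column name.
  def fetch(row_index, column)
    @rows.fetch(row_index).fetch(@header.index(column))
  end
end

table = SimpleTable.new("name, age, gender\r\nBob, 32, M\r\nStacy, 14, F\r\n")
puts table.fetch(0, 'name')
puts table.fetch(1, 'gender')
```

Looking fields up by column name avoids having to remember which numeric index holds which value.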


Justin_To · 9 June 2008 16:46

ThoML wrote:

Are you looking for this?
http://fastercsv.rubyforge.org/

Ruby also has the csv standard library.

Regards,
Thomas.

That is great Thomas! Although, I'd like to know how to do it with the
regular expressions as well.

Thanks!


Justin_To · 9 June 2008 23:54

final_data[1][0] # Bob
final_data[2][3] # F

should the last one be:
final_data[2][2] # F
??

Thanks! Also, is this an effective way to parse a large file? What if I
had to read a million lines with multiple columns? Would this solution
still be practical?

Thanks again!
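
On the large-file question, one common approach is to process the file one line at a time instead of reading it all into memory first. The sketch below is only an illustration under that assumption: `each_record` is a hypothetical helper name, the temporary file stands in for a real large input, and no timing claims are made here.

```ruby
require 'tempfile'

# Hypothetical helper: yields one array of trimmed fields per non-blank line.
# File.foreach reads lazily, so only the current line is held in memory.
def each_record(path)
  return enum_for(:each_record, path) unless block_given?

  File.foreach(path) do |line|
    line = line.strip
    next if line.empty?

    yield line.split(',').map(&:strip)
  end
end

# A throwaway file stands in for the large input.
Tempfile.create('people') do |f|
  f.write("name, age, gender\nBob, 32, M\nStacy, 14, F\n")
  f.flush

  each_record(f.path) { |fields| puts fields.inspect }
end
```

Because each record is handed to the block as soon as it is read, memory use depends on the longest line rather than on the size of the file.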


Avdi_Grimm1 · 9 June 2008 16:56

I'd recommend using String#split. In the simplest case you could just
specify line.split(','); no regular expressions needed. If you wanted
you could use a regular expression argument to #split in order to skip
whitespace:

line.split(/\s*,\s*/)

but you could just as easily trim the values after the fact too:

line.split(',').map{|v| v.strip}

Regular expressions are not the best solution for parsing CSV,
especially once you start dealing with quoted values.
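
A small sketch comparing the two split variants above, and showing what goes wrong with a quoted value. The sample lines are made up for illustration.

```ruby
line = " Bob , 32 , M\n"

# Regex split: removes whitespace around the commas only,
# so whitespace at the very start and end of the line survives.
by_regex = line.split(/\s*,\s*/)

# Split then strip: trims every field, including the first and last.
by_strip = line.split(',').map { |v| v.strip }

p by_regex
p by_strip

# A quoted field containing a comma is cut in two by a plain split,
# which is why split and regexes stop being enough for general CSV.
quoted = '"Smith, Bob", 32, M'
naive = quoted.split(/\s*,\s*/)

p naive
```

The quoted line is meant to have three fields, but the plain split produces four.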

On Mon, Jun 9, 2008 at 12:46 PM, Justin To <tekmc@hotmail.com> wrote:

That is great Thomas! Although, I'd like to know how to do it with the
regular expressions as well.

--
Avdi

Home: http://avdi.org
Developer Blog: http://avdi.org/devblog/
Twitter: http://twitter.com/avdi
Journal: http://avdi.livejournal.com

Justin_To · 9 June 2008 18:08

So is FasterCSV the most effective way of parsing a comma-separated
file?


Avdi_Grimm1 · 9 June 2008 19:09

It is the fastest and most robust way.

On Mon, Jun 9, 2008 at 2:08 PM, Justin To <tekmc@hotmail.com> wrote:

So is the fasterCSV the most effective way of parsing a comma-separated
file?


Charles_Walden · 9 June 2008 21:52

My experience (at least a year ago) was that FasterCSV was a great way to go if you had very clean files without errors, odd characters, etc. Unfortunately, I had files that were a bit more problematic, so I ended up either parsing them myself (split, regexes, etc.) and catching and handling all the errors, or using the parse_line method in the standard csv library.
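
A rough sketch of the per-line approach described above, using `CSV.parse_line` from the standard csv library and rescuing `CSV::MalformedCSVError` for lines it rejects. The helper name `parse_lines_tolerantly` and the sample lines are hypothetical, and what counts as "bad" data in a real file will vary.

```ruby
require 'csv'

# Hypothetical helper: parses each line independently and sets aside the
# lines the csv library rejects, instead of aborting the whole run.
def parse_lines_tolerantly(lines)
  good = []
  bad = []

  lines.each_with_index do |line, index|
    begin
      fields = CSV.parse_line(line)
      # parse_line returns nil for an empty line; skip those.
      good << fields.map { |f| f.to_s.strip } if fields
    rescue CSV::MalformedCSVError => e
      # Keep the 1-based line number, the raw text, and the parser's message.
      bad << [index + 1, line, e.message]
    end
  end

  [good, bad]
end

good, bad = parse_lines_tolerantly([
  "Bob, 32, M\n",
  "\"Stacy,14,F\n",        # unclosed quote: rejected by the parser
  "\"Smith, Al\",40,M\n"   # quoted comma: handled correctly
])

p good
bad.each { |number, text, message| warn "line #{number}: #{message}" }
```

The rejected lines can then be logged, repaired by hand, or passed to a fallback such as a plain split.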

On Jun 9, 2008, at 2:09 PM, Avdi Grimm wrote:

On Mon, Jun 9, 2008 at 2:08 PM, Justin To <tekmc@hotmail.com> wrote:

So is the fasterCSV the most effective way of parsing a comma-separated
file?

It is the fastest and most robust way.


Justin_To · 9 June 2008 22:53

Great guys, thanks for the help!


James_Edward_Gray_II · 9 June 2008 23:26

FasterCSV has a parse_line() method as well, just FYI.

James Edward Gray II

On Jun 9, 2008, at 4:52 PM, Charles Walden wrote:

My experience (at least a year ago) was that FasterCSV was a great way to go if you had very clean files without errors, odd characters, etc. Unfortunately, I had files that were a bit more problematic, so I ended up either parsing them myself (split, regexes, etc.) and catching and handling all the errors, or using the parse_line method in the standard csv library.
