1.9 CSV Parsing Issues

I'm currently porting a script to 1.9 and I'm having problems getting
CSV parsing to work. This script worked fine in 1.8.7 and used the
FasterCSV library for parsing. After playing around in the IRB, I have
determined that the current parser seems incapable of handling newlines
as row separators (a rather basic and important feature).

I tested with a simple file whose contents are:
field1,field2
field3,field4

This file was created using a basic text editor and does not contain any
unorthodox newline characters. Attempting to parse this file results in
the following error:

C:/Ruby192/lib/ruby/1.9.1/csv.rb:1885:in `block (2 levels) in shift':
Unquoted fields do not allow \r or \n (line 1). (CSV::MalformedCSVError)
  from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1856:in `each'
  from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1856:in `block in shift'
  from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1818:in `loop'
  from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1818:in `shift'
  from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1760:in `each'

The return value of the opened csv file shows row_sep to be "\r\n" which
seems correct. I have tried manually setting the value of row_sep when
calling CSV::open but I get the same issue.

Once again, I do not have this problem with FasterCSV under 1.8.7 (which
as I understand, is the same code used in 1.9's csv library). I'm using
Ruby 1.9.2p0 on Windows XP. I would greatly appreciate any help.

···

--
Posted via http://www.ruby-forum.com/.

I'm currently porting a script to 1.9 and I'm having problems getting
CSV parsing to work.

I tested with a simple file whose contents are:
field1,field2
field3,field4

CSV should definitely handle that data. Indeed it does for me:

$ ruby -v -r csv -e 'p CSV.parse("field1,field2\r\nfield3,field4\r\n")'
ruby 1.9.2dev (2010-04-28 trunk 27536) [x86_64-darwin10.3.0]
[["field1", "field2"], ["field3", "field4"]]

This file was created using a basic text editor and does not contain any
unorthodox newline characters.

Can we see exactly what the file does contain, with code like:

$ ruby -e 'p File.read("path/to/file.csv")'

?

James Edward Gray II

···

On Nov 4, 2010, at 1:40 PM, Kenny Lam wrote:

File.read shows "field1,field2\nfield3,field4\n"
I have played around with some of the other methods and have
determined that this problem only seems to occur when using CSV::open
and then looped through with CSV::each. CSV::foreach and CSV::parse
seem fine. Unfortunately, I need to use CSV::open because I need a
reference to the opened file object in order to do some file cursor
manipulation.

Another thing I have noted is that running CSV.open('file','r') shows
the result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\r\n" quote_char:"\"">

While CSV.open('test.log','r',:row_sep => '\r\n') shows result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\\r\\n" quote_char:"\"">

The double backslashes make me question if the escape character is being
processed correctly. I am relatively new to Ruby. Am I using the
language incorrectly, or is this a bug?

···

Excellent, that works perfectly. Thanks a lot for your help.

···

I'm running into this same error, file reads like so: (Client
Uploaded CSV)

"field1,field2\rfield3,field4\r\n"

Is this an issue with how the CSV file was generated, or is there
some setting I can use to avoid this error?

Appreciate any assistance!

···

The issue is that I don't have control over the generation of the file
(it is uploaded by the client).

Here is the solution I came up with. Since the file is stored on S3, I
have to write a new tempfile and then edit that... Unless someone can
suggest how to read

    csv_file = open(path_to_file, "r:windows-1251:utf-8")
    csv_file.seek(-2, IO::SEEK_END) # seek to two bytes before the end of the file
    if csv_file.read == "\r\n"
      uri = URI.parse(path_to_file)
      tempfile = Tempfile.new File.basename(uri.path), "#{Rails.root}/tmp"
      csv_file.seek 0
      tempfile.write csv_file.read
      tempfile.seek(-2, IO::SEEK_END)
      tempfile.write " "
      tempfile.seek 0
      csv_file = tempfile
    end
    ::CSV.new(csv_file, :headers => :first_row).each do |row|
    ...........

···

You could always "chomp" the file before passing it to the parser, or
make that the default action.

···

You can chomp a file? I thought that only worked on strings.

···

Well, if you read the file into a string, you can chomp it.
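
A minimal sketch of that chomp suggestion, using the sample data posted
earlier in the thread. It assumes the whole upload fits in memory and
that "\r" really is the record separator, which is why row_sep is passed
explicitly instead of being left to auto-detection:

```ruby
require "csv"

# Sample data from this thread: "\r" separates records, but the file
# ends with a stray "\r\n".
data = "field1,field2\rfield3,field4\r\n"

# String#chomp with no argument strips one trailing "\r\n", "\n" or "\r".
cleaned = data.chomp

# Tell CSV explicitly that records are separated by a bare "\r".
rows = CSV.parse(cleaned, row_sep: "\r")
p rows
```

Because chomp only touches the very end of the string, line breaks
inside quoted fields are left alone. For a real file, the string could
come from something like File.binread(path).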

···

File.read shows "field1,field2\nfield3,field4\n"

Great. That's what we expected to see. You are right about the content.

I have played around with some of the other methods and have
determined that this problem only seems to occur when using CSV::open
and then looped through with CSV::each. CSV::foreach and CSV::parse
seem fine.

Ah, and let me guess, you always pass a read mode of 'r' to open(), right? CSV is clever: if you don't specify a mode, it opens the file with 'rb', which shuts off Ruby's line ending translation on Windows. By specifying a mode yourself, you leave that translation on, which allows Ruby to switch \r\n to \n, as it did with the read above.
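
To see that translation for yourself, it can help to compare a text-mode
read with a binary read. The sketch below writes a throwaway temp file
(its name and contents are only an illustration) with Windows line
endings. On Windows, File.read would typically show "\n" where
File.binread shows the raw "\r\n"; on other platforms both show the same
bytes.

```ruby
require "tempfile"

# Throwaway file with CRLF line endings; binmode keeps them untouched on disk.
sample = Tempfile.new(["crlf_sample", ".csv"])
sample.binmode
sample.write("field1,field2\r\nfield3,field4\r\n")
sample.close

text_mode = File.read(sample.path)     # text mode: newline translation may apply on Windows
raw_bytes = File.binread(sample.path)  # binary mode: bytes exactly as stored

p text_mode
p raw_bytes
sample.unlink
```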

Unfortunately, I need to use CSV::open because I need a
reference to the opened file object in order to do some file cursor
manipulation.

No worries, open() is going to work for you.

Another thing I have noted is that running CSV.open('file','r') shows
the result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\r\n" quote_char:"\"">

While CSV.open('test.log','r',:row_sep => '\r\n') shows result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\\r\\n" quote_char:"\"">

The double backslashes make me question if the escape character is being
processed correctly. I am relatively new to Ruby. Am I using the
language incorrectly, or is this a bug?

You have a misunderstanding of Ruby Strings. Double quotes allow for escapes like \r or \n, but single quotes do not. You've set the :row_sep to the four literal characters backslash, r, backslash, and n.
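
A quick way to convince yourself of the difference (the variable names
are just for illustration):

```ruby
single = '\r\n'   # four characters: backslash, r, backslash, n
double = "\r\n"   # two characters: carriage return, line feed

p single.length   # => 4
p double.length   # => 2
p single          # prints "\\r\\n", the same form seen in the row_sep output
p double          # prints "\r\n"
```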

I imagine all you need to do is switch your open() call to:

  CSV.open('path/to/file')

The library should take it from there.
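
For completeness, here is a small sketch of that open() call in use. The
temp file stands in for the real log; its name and contents are made up
for the illustration. The rewind call is only there to show that the
cursor can still be moved through the CSV object, on the assumption that
this is the kind of manipulation the original script needs.

```ruby
require "csv"
require "tempfile"

# Stand-in for the real file; name and contents are illustrative only.
sample = Tempfile.new(["test", ".csv"])
sample.binmode
sample.write("field1,field2\r\nfield3,field4\r\n")
sample.close

rows = []
first_again = nil

# No mode argument: CSV picks its own default for reading.
CSV.open(sample.path) do |csv|
  csv.each { |row| rows << row }
  csv.rewind                # move back to the start of the file
  first_again = csv.shift   # read the first row a second time
end

p rows
p first_again
sample.unlink
```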

Hope that helps.

James Edward Gray II

···

On Nov 4, 2010, at 2:26 PM, Kenny Lam wrote:

My pleasure.

James Edward Gray II

···

On Nov 4, 2010, at 2:52 PM, Kenny Lam wrote:

Excellent, that works perfectly. Thanks a lot for your help.

I'm running into this same error, file reads like so: (Client
Uploaded CSV)

What *same* error?

"field1,field2\rfield3,field4\r\n"

Is this an issue with how the CSV file was generated, or is there
some setting I can use to avoid this error?

It looks like your CSV is broken. I do not know what your goal is but
I assume you want the piece above to be treated as a single record.
You could manually parse:

irb(main):013:0> s="field1,field2\rfield3,field4\r\n"
=> "field1,field2\rfield3,field4\r\n"
irb(main):014:0> CSV.parse_line(s.gsub(/[\r\n]+/, ''), col_sep: ',')
=> ["field1", "field2field3", "field4"]
irb(main):015:0> CSV.parse_line(s.gsub(/[\r\n]+/, ','), col_sep: ',')
=> ["field1", "field2", "field3", "field4", nil]

Kind regards

robert

···

On Wed, Dec 11, 2013 at 4:53 PM, a grave robber wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert's responses notwithstanding, I've never seen a CSV file that uses
just CR ("\r") as a record separator. But even then, the CSV parser
wouldn't know what to do with the LF ("\n") at the very end: is that a new
record? How would it parse that as data? And it doesn't have the same
number of fields as the others.

How *did* you generate this file?

Can you show your code for how you are currently reading and parsing this
file?

···

On Wed, Dec 11, 2013 at 9:53 AM, Mark W. <lists@ruby-forum.com> wrote:

I'm running into this same error, file reads like so: (Client
Uploaded CSV)

"field1,field2\rfield3,field4\r\n"

Is this an issue with how the CSV file was generated, or is there
some setting I can use to avoid this error?

Appreciate any assistance!

Robert Klemme wrote in post #1130373:

I'm running into this same error, file reads like so: (Client
Uploaded CSV)

What *same* error?

Unquoted fields do not allow \r or \n (line 152344).
(CSV::MalformedCSVError)

It looks like your CSV is broken. I do not know what your goal is but
I assume you want the piece above to be treated as a single record.

Actually the \r is the record separator, and it processes the correct
number of records but errors on the \n at the end of the file.

Wouldn't your gsub line possibly strip any \r\n's from quoted text
contained in the CSV?
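
That concern is easy to check with a small, made-up record whose quoted
field contains a line break. This is only an illustration: the data is
invented, and "\r" is assumed to be the record separator as described
above.

```ruby
require "csv"

# Invented sample: one record, terminated by "\r", whose first field
# is quoted and contains an embedded "\r\n".
s = %Q{"multi\r\nline",b\r}

# Stripping every CR/LF before parsing also removes the embedded break.
stripped = CSV.parse_line(s.gsub(/[\r\n]+/, ''))

# Parsing with an explicit record separator keeps the quoted field intact.
intact = CSV.parse(s, row_sep: "\r").first

p stripped
p intact
```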

···

On Wed, Dec 11, 2013 at 4:53 PM, a grave robber wrote:

I was going to suggest to fix the generation code instead of adjusting
the parsing side - but that seemed too obvious. :-)

Cheers

robert

···

On Thu, Dec 12, 2013 at 5:47 AM, tamouse pontiki <tamouse.lists@gmail.com> wrote:

How *did* you generate this file?

Yes. For production code that needs to be made more robust of course.

Cheers

robert

···

On Wed, Dec 11, 2013 at 6:30 PM, Mark W. <lists@ruby-forum.com> wrote:

Robert Klemme wrote in post #1130373:

On Wed, Dec 11, 2013 at 4:53 PM, a grave robber wrote:

I'm running into this same error, file reads like so: (Client
Uploaded CSV)

What *same* error?

Unquoted fields do not allow \r or \n (line 152344).
(CSV::MalformedCSVError)

It looks like your CSV is broken. I do not know what your goal is but
I assume you want the piece above to be treated as a single record.

Actually the \r is the record separator, and it processes the correct
number of records but errors on the \n at the end of the file.

Wouldn't your gsub line possibly strip any \r\n's from quoted text
contained in the CSV?
