CSV parsing issue

I've generated a CSV document using the OpenOffice CSV export.
When I read it with Ruby 1.8.7, everything is fine.

Code:

require 'csv'
reader = CSV.open("C:tmp/document.csv", "r")

headline = reader.shift
reader.each do |row|
  puts row
end
reader.close()

When I read the same document with Ruby 1.9.2, I get the following
error:

C:/Ruby192/lib/ruby/1.9.1/csv.rb:1886:in `block (2 levels) in shift':
CSV::MalformedCSVError (CSV::MalformedCSVError)
        from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1863:in `each'
        from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1863:in `block in shift'
        from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1825:in `loop'
        from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1825:in `shift'
        from C:/Dokumente und Einstellungen/josemi1/Eigene
Dateien/NetBeansProjects/Test/lib/main.rb:8:in `<main>'

And with JRuby 1.6.0 (in 1.9 mode) I get the following error message:

CSV::MalformedCSVError: Unquoted fields do not allow \r or \n (line 2).
   shift at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1893
    each at org/jruby/RubyArray.java:1572
   shift at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1863
    loop at org/jruby/RubyKernel.java:1417
   shift at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1825
    each at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1768
  (root) at /home/michael/NetBeansProjects/Test/lib/main.rb:12

Hint:
ruby -e 'p File.read("/tmp/document.csv")'

"\"Projekt-ID\",<< cut off some data >>,\"letzte
Anderung\"\r\n\n\"\",\"HSW G04\",\"Prim\303\244rprojekt\",<< cut off
some data>>,\"zlebpa1\",07.03.2011\r\n"

Note: I have cut out some irrelevant data above and marked it with '<<
cut off some data >>'.

Questions:
Is the problem the '\r\n\n' above?
Is it a Ruby error or an OpenOffice error?
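
One way to check the first question is to count the line terminators in the raw bytes. The following is only a sketch: the helper name line_ending_counts and the sample string are made up for illustration, and the sample merely imitates the shape of the dump above (CRLF, a stray LF, CRLF).

```ruby
# Count each kind of line terminator in a string.
# For a real file, read it in binary mode first, e.g. File.binread(path),
# so the OS does not translate line endings before they are counted.
def line_ending_counts(data)
  counts = { "\r\n" => 0, "\n" => 0, "\r" => 0 }
  # CRLF is listed first so its CR and LF are not counted a second time.
  data.scan(/\r\n|\r|\n/) { |ending| counts[ending] += 1 }
  counts
end

# Hypothetical sample, not the real export.
sample = "\"Projekt-ID\",\"Name\"\r\n\n\"\",\"HSW G04\"\r\n"
p line_ending_counts(sample)
```

If more than one kind of terminator has a non-zero count, the file mixes line endings, which is what the '\r\n\n' in the dump suggests.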

--
Posted via http://www.ruby-forum.com/.

I've generated a CSV document using the OpenOffice CSV export.
When I read it with Ruby 1.8.7, everything is fine.

Code:

require 'csv'
reader = CSV.open("C:tmp/document.csv", "r")

Try using mode "rb" so the line-endings are handled by CSV rather than the OS.

headline = reader.shift

You can also pass :headers => true to the open call:

reader = CSV.open("C:tmp/document.csv", "rb", :headers => true)

reader.each do |row|
  puts row
end
reader.close()

And even better for your example, use the .foreach method:

CSV.foreach("C:tmp/document.csv", "rb", :headers => true) do |row|
   puts row
end
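
To illustrate what :headers => true changes, here is a small sketch that parses an in-memory string instead of the file. The two-column sample data is invented; the real export has many more columns.

```ruby
require 'csv'

# Hypothetical sample data standing in for the real export.
data = "Projekt-ID,Name\n1,HSW G04\n2,Demo\n"

names = []
CSV.parse(data, :headers => true) do |row|
  # With :headers => true the first line is consumed as the header row and
  # each following row is a CSV::Row that can be indexed by header name.
  names << row["Name"]
end
p names
```

With headers enabled there is no need for the separate reader.shift to skip the headline.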

When I read the same document with Ruby 1.9.2, I get the following
error:

Aha! The CSV code in 1.9 is what used to be FasterCSV, which is available as a gem for earlier versions (gem install fastercsv).

Questions:
Is the problem the '\r\n\n' above?
Is it a Ruby error or an OpenOffice error?

It's entirely possible that the error is from OO, but the use of the "rb" mode might solve the problem, too. (In which case, the blame is moot.)

-Rob

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

On Aug 31, 2011, at 3:31 PM, Michael Blue wrote:

Hello Rob,

Thank you for the answer.
No. Using the rb-mode does not solve the problem. The error still
occurs.

Michael Blue wrote in post #1019459:

Hint:
ruby -e 'p File.read("/tmp/document.csv")'

"\"Projekt-ID\",<< cut off some data >>,\"letzte
Anderung\"\r\n\n\"\",\"HSW G04\",\"Prim\303\244rprojekt\",<< cut off
some data>>,\"zlebpa1\",07.03.2011\r\n"

You really couldn't come up with a 3 word sentence that duplicates the
problem?

Hint: post something legible.
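
For what it is worth, a short input along these lines might reproduce the problem. This is a sketch, not taken from the attached files: the string is made up and only copies the pattern from the dump, a CRLF row ending followed by a stray LF. On CSV versions derived from FasterCSV I would expect the rescue branch to run.

```ruby
require 'csv'

# Hypothetical minimal input: first row ends in CRLF, then a lone LF follows.
data = "a,b\r\n\n\"\",c\r\n"

begin
  CSV.parse(data)
  outcome = :parsed
rescue CSV::MalformedCSVError => e
  # The parser auto-detects "\r\n" as the row separator from the first row,
  # so the lone "\n" ends up inside what it treats as an unquoted field.
  outcome = :malformed
  puts "#{e.class}: #{e.message}"
end
p outcome
```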

Did any of the other suggestions work?

Surprisingly, the "rb" option did work on Windows (ruby); on Linux
(jruby) the error remained.

In particular, the fact that CSV in 1.8.x and CSV in 1.9 are *different code*.
If you stay with 1.8.7, try using FasterCSV.

On 1.8.7 (Linux, jruby) with FasterCSV I get the same error as on
1.9.2 (Windows, ruby) with CSV.
It seems to be an issue with FasterCSV. I am not sure whether there is an
additional issue with jruby.
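
A possible workaround, not confirmed anywhere in this thread, would be to normalize the line endings before handing the data to the parser. The sketch below assumes the whole file fits in memory; normalize_line_endings is a made-up helper name and the input string is invented. Note that it also rewrites any CR or CRLF that sits inside a quoted field.

```ruby
require 'csv'

# Convert CRLF and lone CR to LF so that only one row terminator remains.
def normalize_line_endings(data)
  data.gsub(/\r\n?/, "\n")
end

# Hypothetical input with the same CRLF + stray LF pattern as the dump.
data = "a,b\r\n\n\"\",c\r\n"

# :skip_blanks drops the empty line left behind by the stray LF.
rows = CSV.parse(normalize_line_endings(data), :skip_blanks => true)
p rows
```

For a real file the input would come from something like File.binread(path) rather than a literal string.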

I have attached two test files that I cut down from the original, very
large file with a hex editor.

Attachments:
http://www.ruby-forum.com/attachment/6571/testfile.csv
http://www.ruby-forum.com/attachment/6572/testfile1.csv

Did any of the other suggestions work? In particular, the fact that CSV in 1.8.x and CSV in 1.9 are *different code*.

If you stay with 1.8.7, try using FasterCSV.

-Rob

On Aug 31, 2011, at 5:09 PM, Michael Blue wrote:

Hello Rob,

Thank you for the answer.
No. Using the rb-mode does not solve the problem. The error still
occurs.