CSV bug?

Nahi aka NAKAMURA, Hiroshi [mailto:nahi@keynauts.com] humbly replied:

In particular, the CSV looks something like this:

row 1, field 1, field 2\r\n
row 2, "some\n
text", field 2\r\n

CSV module (of mine) just does not expect the space just after field
delimiter(,) and before quote marker(").

0% ruby -rcsv -e 'CSV::Reader.parse(%Q("a","b")) { |row| p row }'
["a", "b"]
0% ruby -rcsv -e 'CSV::Reader.parse(%Q("a", "b")) { |row| p row }'
/usr/local/lib/ruby/1.9/csv.rb:557:in `get_row': CSV::IllegalFormatError (CSV::IllegalFormatError)
        from /usr/local/lib/ruby/1.9/csv.rb:506:in `each'
        from /usr/local/lib/ruby/1.9/csv.rb:484:in `parse'
        from -e:1
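
A note for readers on a current Ruby: CSV::Reader is not part of the csv library that ships today, but the same strictness can still be observed. The sketch below assumes the present-day `CSV.parse_line` method and its `liberal_parsing` option, neither of which appears in this thread, so treat it as an illustration of the behaviour under discussion rather than what the 2004 module did.

```ruby
require "csv"

strict_input = %Q("a","b")    # no space after the delimiter
spaced_input = %Q("a", "b")   # space between the delimiter and the quote

p CSV.parse_line(strict_input)

# The default (strict) parser rejects the space before the opening quote.
strict_error =
  begin
    CSV.parse_line(spaced_input)
    nil
  rescue CSV::MalformedCSVError => e
    e
  end
puts "strict parse failed: #{strict_error.class}" if strict_error

# liberal_parsing asks the parser to tolerate such input instead of raising.
lenient_row = CSV.parse_line(spaced_input, liberal_parsing: true)
p lenient_row
```

Whether the lenient result keeps the space and the quote characters in the second field is up to the library, so check the printed row before relying on it.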

Does Excel generate such a line? Hmm. It must be supported

I am not SER, but I hope I can comment, too, pls.

then, even though it could be an option. Do you expect 'a, "b"' to be
parsed as 'a,"b"', not as an error, right?

Yes, it must NOT be an error. You may include the spaces (to stay close to the
original), but it does not matter now (in my case only).

What do you think about
'a,junk"b"' and

Yes, we need it here. Parse it like ['a','junk"b"']. E.g., we have lines like

1;SMTP:"local;part"@test.com;x400:blahblahblah;smtp:test@test.com
2;SMTP:"local;part2"@test.com;x400:blahblahblah;smtp:test2@test.com

Though my sample is “;”-delimited, I would like to get the following result when
parsing line 1:

['1','SMTP:"local;part"@test.com','x400:blahblahblah','smtp:test@test.com']

I really hope that is possible (I beg).

'a,\t\r\n"b"'?

As long as the delimiter is “,”, then that should be parsed as
['a','\t\r\n"b"']

Regards,
// NaHi

Thank you for your csv module. It’s very helpful. btw, Who does the
documentation? The doc is very sparse. I hope I can help :(

kind regards -botp

Hi,

Peña, Botp wrote:

CSV module (of mine) just does not expect the space just after field
delimiter(,) and before quote marker(").

0% ruby -rcsv -e 'CSV::Reader.parse(%Q("a","b")) { |row| p row }'
["a", "b"]
0% ruby -rcsv -e 'CSV::Reader.parse(%Q("a", "b")) { |row| p row }'
/usr/local/lib/ruby/1.9/csv.rb:557:in `get_row': CSV::IllegalFormatError (CSV::IllegalFormatError)
        from /usr/local/lib/ruby/1.9/csv.rb:506:in `each'
        from /usr/local/lib/ruby/1.9/csv.rb:484:in `parse'
        from -e:1

Does Excel generate such a line? Hmm. It must be supported

I am not SER, but I hope I can comment, too, pls.

Of course.

What do you think about
'a,junk"b"' and

Yes, we need it here. Parse it like ['a','junk"b"'].

I understand the point. And Excel parses it as ["a", "junk\"b\""]. So
it must be a bug of my CSV module.

Eg, we have lines like

1;SMTP:"local;part"@test.com;x400:blahblahblah;smtp:test@test.com
2;SMTP:"local;part2"@test.com;x400:blahblahblah;smtp:test2@test.com

With the following patch, CSV module can handle non-quoted " field. But…

--- csv.rb.dist 2004-04-20 23:17:20.000000000 +0900
+++ csv.rb      2004-05-18 14:15:13.698072000 +0900
@@ -398,5 +398,5 @@ class CSV
               state = :ST_QUOTE
             else
-              raise IllegalFormatError.new
+              cell.data << c.chr
             end
           elsif state.equal?(:ST_QUOTE)
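
The effect of that one-line change can be sketched with a small state machine. This is not the csv.rb code: `split_lenient` and its state names are hypothetical, it handles a single line with a one-character separator only, and it exists just to show why a quote in the middle of an unquoted field becomes data while a separator still ends the field.

```ruby
# Hypothetical helper for illustration; this is NOT the csv.rb implementation.
# A quote opens a quoted field only at the start of a field. Anywhere else in
# an unquoted field it is kept as ordinary data instead of raising.
def split_lenient(line, sep = ",")
  fields = []
  cell = +""            # unfrozen buffer for the field being built
  state = :field_start

  line.each_char do |c|
    case state
    when :field_start
      if c == '"'
        state = :quoted
      elsif c == sep
        fields << cell               # empty field
        cell = +""
      else
        cell << c
        state = :unquoted
      end
    when :unquoted
      if c == sep
        fields << cell
        cell = +""
        state = :field_start
      else
        cell << c                    # includes '"': kept, not an error
      end
    when :quoted
      if c == '"'
        state = :quote_seen
      else
        cell << c                    # separators inside quotes are data
      end
    when :quote_seen
      if c == '"'
        cell << '"'                  # doubled quote inside a quoted field
        state = :quoted
      elsif c == sep
        fields << cell
        cell = +""
        state = :field_start
      else
        raise ArgumentError, "unexpected #{c.inspect} after closing quote"
      end
    end
  end

  raise ArgumentError, "unterminated quoted field" if state == :quoted
  fields << cell
end

p split_lenient('a,junk"b"')
p split_lenient('1;SMTP:"local;part"@test.com', ";")
```

Because the quote in `SMTP:"local;part"` does not open a quoted field, the `;` after `local` still ends the field.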
    

Though my sample is “;”-delimited, I would like to get the following result when
parsing line 1:

['1','SMTP:"local;part"@test.com','x400:blahblahblah','smtp:test@test.com']

I really hope that is possible (I beg).

Unfortunately, it works as below…

0% cat pena.csv
1;SMTP:"local;part"@test.com;x400:blahblahblah;smtp:test@test.com
2;SMTP:"local;part2"@test.com;x400:blahblahblah;smtp:test2@test.com

0% ruby -rcsv -e 'CSV.parse("pena.csv", ?;) { |row| p row.to_a }'
["1", "SMTP:\"local", "part\"@test.com", "x400:blahblahblah",
"smtp:test@test.com"]
["2", "SMTP:\"local", "part2\"@test.com", "x400:blahblahblah",
"smtp:test2@test.com"]

Excel and OpenOffice seem to parse it the same way (though Excel cannot
handle ;-separated values). Is this enough for your purpose?
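
For comparison on a current Ruby, here is a minimal sketch assuming today's keyword-style options (`col_sep:`, `liberal_parsing:`), which are not the positional arguments used in this thread. It shows the same split on the embedded `;`, and one way a producer could write the line so that a strict parser keeps the address together.

```ruby
require "csv"

line = '1;SMTP:"local;part"@test.com;x400:blahblahblah;smtp:test@test.com'

# Quotes in the middle of a field do not protect the ";" that follows them,
# so the address is still split into two fields.
lenient = CSV.parse_line(line, col_sep: ";", liberal_parsing: true)
p lenient

# If the producer quotes the whole field and doubles the inner quotes,
# the strict parser keeps the address in one field.
quoted = '1;"SMTP:""local;part""@test.com";x400:blahblahblah;smtp:test@test.com'
strict = CSV.parse_line(quoted, col_sep: ";")
p strict
```

The second form only helps when the program writing the file can be changed.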

As long as the delimiter is “,”, then that should be parsed as
['a','\t\r\n"b"']

Sure.

Thank you for your csv module. It’s very helpful. btw, Who does the
documentation? The doc is very sparse. I hope I can help :(

It’s my fault. Ruby now has great RDoc, but I don’t like
comments in code… they confuse me, even though I set
set foldexpr=getline(v:lnum)=~'^\s*#.*'
set foldmethod=expr
in my .vimrc.

The newest version is at
http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/lib/csv/ , and it will be
merged into Ruby/1.9 soon. I hope someone will point out which parts of the API
are under-documented or hard to use. Those parts must be bad interface
design.

Regards,
// NaHi

I have news for you, botp: you do the documentation :)

If you can write a single document that gives a good overview of
csv.rb, to whatever level of detail you think is appropriate, then I
will include it in the standard library and everyone will benefit.

I’ll offer as much help as you need to get it polished, but
essentially it has to come from someone who understands the library.

So what do you think?

Cheers,
Gavin

On Tuesday, May 18, 2004, 2:16:16 PM, Botp wrote:

Thank you for your csv module. It’s very helpful. btw, Who does the
documentation? The doc is very sparse. I hope I can help :(

BTW, the csv.rb is really nice. I wanted to thank you for it. I’m
doing a subversive Ruby project at work, and was doing it “the hard
way” before I discovered csv.rb. Now the app is smaller and more
robust.

Thanks again!

— SER

Hi,

SER wrote:

BTW, the csv.rb is really nice. I wanted to thank you for it. I’m
doing a subversive Ruby project at work, and was doing it “the hard
way” before I discovered csv.rb. Now the app is smaller and more
robust.

Great to hear. Thank you and all users.

I recently added a feature which allows a multi-character sequence as the record
separator and field separator, plus some incompatible changes such as
CSV::Row → Array and CSV::Cell → String. It passes all the same tests as
before, so I believe the newer version will work for my “the hard way”
thing, but the API incompatibility will cause problems for many users.
Apologies to those users in advance.

With the new version, you can do;

0% ruby -rcsv -e '
CSV::Reader.parse("a||b==c||d", "||", "==") { |r| p r }
'
["a", "b"]
["c", "d"]

0% cat matrix.csv; echo
 | |o
-+-+-
x|x|o
-+-+-
o| |

0% ruby -rcsv -e '
CSV.parse("matrix.csv", "|", "\n-+-+-\n") { |r| p r }
'
[" ", " ", "o"]
["x", "x", "o"]
["o", " ", nil]
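
The same two examples can be tried on a current Ruby. The sketch below assumes the keyword options `col_sep:` and `row_sep:` of today's csv library and parses strings instead of files, so it is an approximation of the calls above, not the 2004 signature.

```ruby
require "csv"

# Multi-character field separator "||" and record separator "==".
rows = CSV.parse("a||b==c||d", col_sep: "||", row_sep: "==")
p rows

# The tic-tac-toe board: "|" separates cells, the rule line separates rows.
matrix = " | |o\n-+-+-\nx|x|o\n-+-+-\no| |"
grid = CSV.parse(matrix, col_sep: "|", row_sep: "\n-+-+-\n")
p grid
```

The empty last cell of the third row comes back as nil, matching the output shown above.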

And say your boss doesn’t like open source because of security;

0% cat src.rb
require 'openssl'
text = ARGV.shift
key = 'it is my secret key!'
des = OpenSSL::Cipher::DES.new( 'EDE3', 'CBC' )
des.decrypt( key )
result = des.update( text.unpack( 'm' )[0] )
result += des.final
p result

0% ruby src.rb DMZZ+eRpyQbv+nRlr6Y5BQ==
"hello world"

0% ruby -rcsv -e '
CSV.generate("dest.csv", "ae", "oe") { |w|
  CSV.parse("src.rb", " ") { |r| w << r }
}
'

0% cat dest.csv; echo

requireae'openssl'oetextae=aeARGV.shiftoekeyae=ae'itaeisaemyaesecretaekey!'oedesae=aeOpenSSL::Cipher::DES.new(ae'EDE3',ae'CBC'ae)oedes.decrypt(aekeyae)oeresultae=aedes.update(aetext.unpack(ae'm'ae)[0]ae)oeresultae+=aedes.finaloepaeresultoe

0% ruby -rcsv -e '
CSV.generate("dest.rb", " ") { |w|
  CSV.parse("dest.csv", "ae", "oe") { |r| w << r }
}
'

0% diff src.rb dest.rb

0%

:)
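
The round trip above used the 2004 file-based signatures. A minimal in-memory sketch of the same idea on a current Ruby follows, assuming the block form of `CSV.generate` that returns a String and the keyword options `col_sep:` and `row_sep:`. The two-line `source` text is made up for the example, and `String#split` stands in for the first `CSV.parse("src.rb", " ")` step.

```ruby
require "csv"

# Made-up stand-in for src.rb; any text without double quotes will do.
source = "require 'openssl'\nkey = 'it is my secret key!'\n"

# Split each line on single spaces (stand-in for parsing with " " as separator).
rows = source.lines(chomp: true).map { |line| line.split(" ") }

# "Obfuscate": write the rows with "ae" between fields and "oe" between records.
encoded = CSV.generate(col_sep: "ae", row_sep: "oe") do |csv|
  rows.each { |row| csv << row }
end
puts encoded

# Read it back with the same separators and rebuild the text.
decoded = CSV.parse(encoded, col_sep: "ae", row_sep: "oe")
restored = decoded.map { |row| row.join(" ") }.join("\n") + "\n"
puts restored
```

A current writer may quote fields differently from the 2004 output shown above, so the encoded text can look different even though the round trip still restores the source.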

The current version is in csv’s repository located at
http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/lib/csv/
and I’ll merge it to ruby’s repository tonight. Of course at HEAD(1.9).
What do you think about the branch for 1.8? I’m afraid that 1.8 users
don’t want incompatibilities…

Here’s the summary of API change (excerpted from README.txt);

[CAUTION] API change: CSV::Row removed. A row is represented as just an
Array. Since CSV::Row was a subclass of Array, this should not affect most
programs, except those which depended on CSV::Row#match.

[CAUTION] API change: CSV::Cell removed. A cell is represented as just
a String or nil (NULL). This change will cause widespread destruction.

CSV.open("foo.csv", "r") do |row|
  row.each do |cell|
    if cell.is_null     # Cell#is_null
      p "(NULL)"
    else
      p cell.data       # Cell#data
    end
  end
end

must be just;

CSV.open("foo.csv", "r") do |row|
  row.each do |cell|
    if cell.nil?
      p "(NULL)"
    else
      p cell
    end
  end
end
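
The String-or-nil representation can be seen without a file. This sketch assumes the current `CSV.parse` with a block, which yields each row as a plain Array. The sample data is made up: an unquoted empty field and a quoted empty field sit side by side to show that only the former is nil.

```ruby
require "csv"

# Three fields: "a", an unquoted empty field, and a quoted empty field.
data = %Q(a,,""\n)

labels = []
CSV.parse(data) do |row|
  row.each do |cell|
    # nil marks a NULL cell; a quoted empty field is an empty String.
    labels << (cell.nil? ? "(NULL)" : cell)
  end
end
p labels
```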

[CAUTION] record separator (CR, LF, CR+LF) behavior change: CSV.open,
CSV.parse, and CSV.generate no longer force binmode on the opened file.
Formerly they set binmode explicitly. With CSV.open, the binmode of the opened
file depends on the given mode parameter: “r”, “w”, “rb”, or “wb”.
CSV.parse and CSV.generate open the file with “r” and “w”. Setting the mode
properly is now the user’s responsibility.

Regards,
// NaHi