FasterCSV problem

Mark_Van_Holstyn1 · 28 August 2006 20:40

Is there any way to make the FasterCSV library parse this line?

20 6" Multibrand Pricer Insert 2 4

I know I can use the :col_sep option to change the column separator to a
tab, but it fails to parse this because of an unclosed quoted field. It
seems like there should be an option to say that the fields are not quoted.

Thanks,

--
Mark Van Holstyn
mvette13@gmail.com
http://lotswholetime.com

Ara.T.Howard6 · 28 August 2006 21:23

if that's indeed the case, why not simply do it yourself?

     harp:~ > cat a.rb
     require 'rubygems'
     require 'fastercsv'

     def munge line
       line.gsub!(%r/"+/){|q| q.size % 2 == 0 ? q : '"' + q}
       line.gsub!(%r/\ *\t\ */, '","')
       "%s%s%s" % ['"', line, '"']
     end

     def show line
       puts line
       munged = munge line
       puts munged
       p(FCSV.parse(munged).first)
       puts
     end

     lines = <<-lines
     20 6" Multibrand Pricer Insert 2 4
     20 6"" Multibrand Pricer Insert 2 4
     20 6""" Multibrand Pricer Insert 2 4
     20 6"""" Multibrand Pricer Insert 2 4
     lines

     lines.each_line{|line| show line.strip}

     harp:~ > ruby a.rb
     20 6" Multibrand Pricer Insert 2 4
     "20","6""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"", "Multibrand", "Pricer", "Insert", "2", "4"]

     20 6"" Multibrand Pricer Insert 2 4
     "20","6""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"", "Multibrand", "Pricer", "Insert", "2", "4"]

     20 6""" Multibrand Pricer Insert 2 4
     "20","6""""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"\"", "Multibrand", "Pricer", "Insert", "2", "4"]

     20 6"""" Multibrand Pricer Insert 2 4
     "20","6""""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"\"", "Multibrand", "Pricer", "Insert", "2", "4"]

if fastercsv handled __all__ the 'simple' exceptions it would be slow and
complicated to maintain.

kind regards.

-a

--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama

James_Edward_Gray_II · 28 August 2006 21:25

Well, if quotes aren't quoted it's not CSV, and all the parser you really need is:

line.split("\t")

right?

FasterCSV uses a very strict parser, so no, it won't allow this. Sorry.
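
For anyone reproducing the strictness described here on a current Ruby, the following is a minimal sketch rather than anything from the original FasterCSV. It assumes the standard `csv` library (the successor to FasterCSV since Ruby 1.9), Ruby 2.4 or later for the `liberal_parsing` option, and that the sample row really is tab-separated. The exact error message may differ from the "unclosed quoted field" wording reported above.

```ruby
require 'csv'

# Sample row from the question; the tab separators are an assumption.
line = "20\t6\"\tMultibrand Pricer Insert\t2\t4"

# Strict parsing rejects the stray quote in the second field.
begin
  CSV.parse_line(line, col_sep: "\t")
rescue CSV::MalformedCSVError => e
  puts "strict parse failed: #{e.message}"
end

# liberal_parsing (Ruby 2.4+) tolerates a quote inside an unquoted field.
fields = CSV.parse_line(line, col_sep: "\t", liberal_parsing: true)
p fields
```

Whether a liberal mode is preferable to cleaning the row first depends on how much of the input is malformed; it relaxes the checks for every field, not just this one.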

James Edward Gray II

Mark_Van_Holstyn1 · 28 August 2006 22:22

I did end up cleaning the row myself. I just wondered if I was missing the
option somewhere. The only reason I ask is because Excel/OOCalc allow you to
say whether or not fields are surrounded by "'s. It would be a nice option.

mark

James_Edward_Gray_II · 28 August 2006 22:30

I guess I'm dense today...

FasterCSV is for parsing CSV. Without quoting, we are not talking about CSV.

Can you please explain how `fields = line.split("\t")` fails you?

If there's a real need for this, I'll consider it. But right now I would implement it as the above and I hope that's not what you're asking for.
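
The split-based approach can be sketched roughly as follows. The helper name `split_tsv_row` is hypothetical, and the sketch assumes tab-separated rows with no quoting at all. One place where a bare `split("\t")` might surprise is trailing empty fields, which it drops unless a negative limit is passed.

```ruby
# Hypothetical helper: split one unquoted, tab-separated row into fields.
def split_tsv_row(line)
  # chomp removes the line ending; the -1 limit keeps trailing empty
  # fields, which a plain split("\t") would discard.
  line.chomp.split("\t", -1)
end

row = "20\t6\"\tMultibrand Pricer Insert\t2\t4\n"
p split_tsv_row(row)

# Empty fields, including trailing ones, are preserved.
p split_tsv_row("a\t\tb\t\t\n")
```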

James Edward Gray II

Mark_Van_Holstyn1 · 28 August 2006 22:42

> FasterCSV is for parsing CSV. Without quoting, we are not talking
> about CSV.

Technically, yes.

> Can you please explain how `fields = line.split("\t")` fails you?

This would work in my situation just fine.

> If there's a real need for this, I'll consider it. But right now I
> would implement it as the above and I hope that's not what you're
> asking for.

If this is something you don't think should be in the CSV library, because
it is not actually "official" CSV, then that is fine. I look at that file as
being "almost" CSV (with the exception of putting "'s around fields). The
only reason I even ran into this is because MySQL outputs bad CSV.

mark

James_Edward_Gray_II · 28 August 2006 22:57

> If this is something you don't think should be in the CSV library, because
> it is not actually "official" CSV, then that is fine.

Well, it's more that I don't see what I can give you that split() doesn't. Hard for me to improve on that, you know?

> I look at that file as
> being "almost" CSV (with the exception of putting "'s around fields).

In proper CSV the 6" field would really be:

"6"""

It's pretty different. Without the quotes it's illegal to use \t, \r, and \n in fields (I assume). There's just really nothing there you need a parser for, in my opinion.
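
To make the quoting rule concrete, here is a small sketch of how a field is conventionally quoted for CSV: wrap it in double quotes and double any quote it contains. The helper name `csv_quote` is hypothetical and not part of FasterCSV.

```ruby
# Hypothetical helper: quote a single field the way strict CSV expects.
def csv_quote(field)
  # Double every embedded quote, then wrap the whole field in quotes.
  '"' + field.gsub('"', '""') + '"'
end

# The problem field from this thread.
puts csv_quote('6"')

# A whole row, quoted field by field and joined with commas.
row = ['20', '6"', 'Multibrand Pricer Insert', '2', '4']
puts row.map { |f| csv_quote(f) }.join(',')
```

The first `puts` prints the same `"6"""` form shown above.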

James Edward Gray II
