FasterCSV problem

Is there any way to make the FasterCSV library parse this line?

20 6" Multibrand Pricer Insert 2 4

I know I can use the :col_sep option to change the column separator to a
tab, but it fails to parse this because of an unclosed quoted field. It
seems like there should be an option to say that the fields are not quoted.

Thanks,

···

--
Mark Van Holstyn
mvette13@gmail.com
http://lotswholetime.com

if that's indeed the case, why not simply do it yourself?

     harp:~ > cat a.rb
     require 'rubygems'
     require 'fastercsv'

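     def munge line
       # pad any odd-length run of double quotes to an even length, so every
       # quote ends up escaped ("" is a literal quote inside a quoted field)
       line.gsub!(%r/"+/){|q| q.size % 2 == 0 ? q : '"' + q}
       # turn each tab (and any spaces around it) into a quoted-field boundary
       line.gsub!(%r/\ *\t\ */, '","')
       # wrap the whole line so the first and last fields are quoted too
       "%s%s%s" % ['"', line, '"']
     end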

     def show line
       puts line
       munged = munge line
       puts munged
       p(FCSV.parse(munged).first)
       puts
     end

     lines = <<-lines
     20 6" Multibrand Pricer Insert 2 4
     20 6"" Multibrand Pricer Insert 2 4
     20 6""" Multibrand Pricer Insert 2 4
     20 6"""" Multibrand Pricer Insert 2 4
     lines

     lines.each_line{|line| show line.strip}

     harp:~ > ruby a.rb
     20 6" Multibrand Pricer Insert 2 4
     "20","6""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"", "Multibrand", "Pricer", "Insert", "2", "4"]

     20 6"" Multibrand Pricer Insert 2 4
     "20","6""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"", "Multibrand", "Pricer", "Insert", "2", "4"]

     20 6""" Multibrand Pricer Insert 2 4
     "20","6""""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"\"", "Multibrand", "Pricer", "Insert", "2", "4"]

     20 6"""" Multibrand Pricer Insert 2 4
     "20","6""""","Multibrand","Pricer","Insert","2","4"
     ["20", "6\"\"", "Multibrand", "Pricer", "Insert", "2", "4"]

if fastercsv handled __all__ the 'simple' exceptions it would be slow and
complicated to maintain.

kind regards.

-a

···

On Tue, 29 Aug 2006, Mark Van Holstyn wrote:

--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama

Well, if quotes aren't quoted it's not CSV and all the parser you really need is:

   line.split("\t")

right? :wink:

FasterCSV uses a very strict parser, so no it won't allow this. Sorry.
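
As a rough sketch of the split approach: the helper name `split_tsv` is made up for illustration, and the sample row assumes the original fields were separated by single tabs. The `-1` limit is an addition not mentioned above; it keeps trailing empty fields, which a plain `split("\t")` would silently drop.

```ruby
# hypothetical helper, not part of FasterCSV
def split_tsv(line)
  # chomp removes the line ending; the -1 limit keeps trailing empty fields
  line.chomp.split("\t", -1)
end

# the sample row from the question, with the tabs written out explicitly
row = "20\t6\"\tMultibrand\tPricer\tInsert\t2\t4\n"
p split_tsv(row)

# a row ending in empty fields keeps its column count
p split_tsv("a\tb\t\t\n")
```

Note that an unbalanced quote is just another character here, so nothing needs escaping.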

James Edward Gray II

···

On Aug 28, 2006, at 3:40 PM, Mark Van Holstyn wrote:

I did end up cleaning the row myself. I just wondered if I was missing the
option somewhere. The only reason I ask is because Excel/OOCalc allow you to
say whether or not fields are surrounded by "'s. It would be a nice option.

mark

I guess I'm dense today...

FasterCSV is for parsing CSV. Without quoting, we are not talking about CSV.

Can you please explain how `fields = line.split("\t")` fails you?

If there's a real need for this, I'll consider it. But right now I would implement it as the above and I hope that's not what you're asking for. :wink:

James Edward Gray II

···

On Aug 28, 2006, at 5:22 PM, Mark Van Holstyn wrote:

FasterCSV is for parsing CSV. Without quoting, we are not talking
about CSV.

Technically, yes.

Can you please explain how `fields = line.split("\t")` fails you?

This would work in my situation just fine.

If there's a real need for this, I'll consider it. But right now I
would implement it as the above and I hope that's not what you're
asking for. :wink:

If this is something you don't think should be in the CSV library, because
it is not actually "official" CSV, then that is fine. I look at that file as
being "almost" CSV (with the exception of putting "'s around fields). The
only reason I even ran into this is because MySQL outputs bad CSV.

mark

If this is something you don't think should be in the CSV library, because
it is not actually "official" CSV, then that is fine.

Well, it's more that I don't see what I can give you that split() doesn't. Hard for me to improve on that, you know? :wink:

I look at that file as
being "almost" CSV (with the exception of putting "'s around fields).

In proper CSV the 6" field would really be:

   "6"""

It's pretty different. Without the quotes it's illegal to use \t, \r, and \n in fields (I assume). There's just really nothing there you need a parser for, in my opinion.
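
To make the difference concrete, here is a small sketch of what a properly quoted tab-separated row looks like. It assumes Ruby 1.9 or later, where FasterCSV became the standard `csv` library, so it uses `CSV` rather than `FCSV`; the three-field row is a shortened version of the line from the question.

```ruby
require 'csv'

fields = ['20', '6"', 'Multibrand']

# generate_line quotes only the fields that need it and doubles any
# embedded quote, so the 6" field is written out as "6"""
row = CSV.generate_line(fields, col_sep: "\t")
p row

# a strict parser can read the properly quoted row back unchanged
parsed = CSV.parse_line(row, col_sep: "\t")
p parsed
```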

James Edward Gray II

···

On Aug 28, 2006, at 5:42 PM, Mark Van Holstyn wrote: