Q about the FasterCSV

_Pena_Botp1 · 27 April 2006 07:32

# No, it's not a bug. CSV is a simple delimited format. It's
# delimited by
# a comma character, not a comma then some arbitrary whitespace. That's
# how Microsoft's Excel and Access and SQL Server parsers deal
# with it, too.

Dave, you're a cool rubyist. I think you are cooler than microsoft's.

and maybe, fastercsv can be more "intelligent" than other csv by

1) ignoring extra spaces in a captured separated value

eg,

test, "1" ==> ["test","1"]

iow, quotes rule (as in shellwords)

2) not ignore spaces yet escape the quotes

eg,

test, "1" ==> ["test"," \"1\""]
test, "1"111 ==> ["test"," \"1\"111"]

3) or maybe, fastercsv can include an option/flag to allow the above

Also, it would be nice if fastercsv could show what particular field it balked at.

kind regards -botp

Eric_Luo1 · 27 April 2006 08:01

I agree!

I'm really missing this feature for my current work!


James_Edward_Gray_II · 27 April 2006 11:58

and maybe, fastercsv can be more "intelligent" than other csv by

FasterCSV is intentionally a strict parser. For one thing, that helps a lot with the speed.

Also, it would be nice if fastercsv could show what particular field it balked at.

Sheesh, I just got it doing line numbers very recently. (Hard in CSV, where \n can be embedded in a field.) It's never enough...

James Edward Gray II
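For readers on a current Ruby, the strictness discussed here can be seen with the standard library's csv, which replaced the old CSV library with FasterCSV's code in Ruby 1.9. This is only a sketch: `liberal_parsing` is an option of the modern stdlib CSV, not something FasterCSV 0.2.0 offered, and the sample line is the one from the original question.

```ruby
require 'csv'

line = 'test, "1"'

# Strict mode: a quote that appears after other characters in a field
# (here, after the leading space) is treated as malformed input.
strict_error =
  begin
    CSV.parse_line(line)
    nil
  rescue CSV::MalformedCSVError => e
    e
  end

# liberal_parsing (stdlib CSV only) keeps the space and the quotes as
# ordinary field content instead of raising.
liberal = CSV.parse_line(line, liberal_parsing: true)

puts strict_error.message if strict_error
p liberal
```

The liberal result is close to option 2 in the original post: the whitespace is kept and the quotes become part of the value.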


Dave_Burt2 · 27 April 2006 15:28

Peña wrote:

Dave, you're a cool rubyist. I think you are cooler than microsoft's.

Thanks, I think. (I don't know how many Rubyists Microsoft has - I don't
recall anyone on this list signing their email with an MS certification.)

and maybe, fastercsv can be more "intelligent" than other csv by

1) ignoring extra spaces in a captured separated value

  eg,

  test, "1" ==> ["test","1"]

  iow, quotes rule (as in shellwords)

2) not ignore spaces yet escape the quotes

  eg,

  test, "1" ==> ["test"," \"1\""]
  test, "1"111 ==> ["test"," \"1\"111"]

3) or maybe, fastercsv can include an option/flag to allow the above

Let's choose option 1. Ruby lets you modify classes from libraries.
Let's call this "lenient_and_still_a_little_bit_faster_csv.rb":

require 'faster_csv'
class FasterCSV
  # Pre-compiles parsers and stores them by name for access during
  # reads, just like the official FasterCSV version, BUT the central
  # parser allows arbitrary whitespace before and after the column
  # separator.
  def init_parsers( options )
    # prebuild Regexps for faster parsing
    @parsers = {
      :leading_fields =>
        /\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields
      :csv_row =>
        ### The Primary Parser ###
        / \G(?:^|#{Regexp.escape(@col_sep)}) # anchor the match
          \s* # <----- # ignore some whitespace
          (?: "((?>[^"]*)(?>""[^"]*)*)" # find quoted fields
              | # ... or ...
              ([^"#{Regexp.escape(@col_sep)}]*) # unquoted fields
              )/x,
        ### End Primary Parser ###
      :line_end =>
        /#{Regexp.escape(@row_sep)}\Z/ # safer than chomp!()
    }
  end
end

All that code except for the line consisting entirely of "\s*" was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don't think Mr. Gray will mind this
particular use of his excellent work.

Cheers,
Dave
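The same `\s*` idea can be tried without touching FasterCSV's internals. What follows is a minimal sketch, not the library's implementation: `lenient_split` is a made-up helper name, it handles a single line only, and it borrows just the quoted/unquoted field pattern from the patch above, driven by the standard library's StringScanner.

```ruby
require 'strscan'

# Hypothetical helper: splits one line on col_sep, skipping whitespace
# that follows a separator (option 1 in the thread).
def lenient_split(line, col_sep = ",")
  sep = Regexp.escape(col_sep)
  # Optional whitespace, then either a quoted field or an unquoted one.
  field = /\s*(?:"((?>[^"]*)(?>""[^"]*)*)"|([^"#{sep}]*))/
  scanner = StringScanner.new(line)
  fields = []

  loop do
    scanner.scan(field) # always matches; the unquoted branch may be empty
    fields <<
      if scanner[1]
        scanner[1].gsub('""', '"')             # unescape doubled quotes
      else
        scanner[2].empty? ? nil : scanner[2]   # empty unquoted field => nil
      end
    break if scanner.eos?
    # Anything other than a separator here is leftover junk, so stay strict.
    scanner.skip(/#{sep}/) or
      raise ArgumentError, "unexpected text at offset #{scanner.pos}"
  end

  fields
end

p lenient_split('test, "1"')
p lenient_split('a, "say ""hi"""')
```

Note that only whitespace after a separator is skipped. Trailing whitespace in an unquoted field is kept, and whitespace between a closing quote and the next separator is still rejected, as is input like `test, "1"111`.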

Abu_A · 8 February 2011 17:33

I'm trying to upload data into the database and I've done so using
paperclip. However, I am having trouble loading the contents into the
database using fastercsv. I am using Hobo, but I suppose that after
managing to upload the csv file, it's standard RoR.

This is my model:

import.rb:

class Import < ActiveRecord::Base

  hobo_model # Don't put anything above this

  fields do
    datatype :string
    abu      :string
    paul     :string
    age      :integer
    timestamps
  end

  # Paperclip
  has_attached_file :csv
  validates_attachment_presence :csv
  validates_attachment_content_type :csv, :content_type =>
    ['text/csv', 'text/comma-separated-values', 'application/csv',
     'application/excel', 'application/vnd.ms-excel',
     'application/vnd.msexcel', 'text/anytext', 'text/plain']
end

This works fine, and it puts the csv file in public/systems/csvs.

I am having trouble using FasterCSV to load the contents into the
database.

Can you point me in the right direction with this, please?

Thanks in advance.

Abu


James_Edward_Gray_II · 27 April 2006 15:35

Let's choose option 1. Ruby lets you modify classes from libraries.
Let's call this "lenient_and_still_a_little_bit_faster_csv.rb":

require 'faster_csv'
class FasterCSV
  # Pre-compiles parsers and stores them by name for access during
  # reads, just like the official FasterCSV version, BUT the central
  # parser allows arbitrary whitespace before and after the column
  # separator.
  def init_parsers( options )
    # prebuild Regexps for faster parsing
    @parsers = {
      :leading_fields =>
        /\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields

You should modify the above line too. It takes both to correctly parse some lines:

/\A\s*#{Regexp.escape(@col_sep)}+/

      :csv_row =>
        ### The Primary Parser ###
        / \G(?:^|#{Regexp.escape(@col_sep)}) # anchor the match
          \s* # <----- # ignore some whitespace
          (?: "((?>[^"]*)(?>""[^"]*)*)" # find quoted fields
              | # ... or ...
              ([^"#{Regexp.escape(@col_sep)}]*) # unquoted fields
              )/x,
        ### End Primary Parser ###
      :line_end =>
        /#{Regexp.escape(@row_sep)}\Z/ # safer than chomp!()
    }
  end
end

All that code except for the line consisting entirely of "\s*" was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don't think Mr. Gray will mind this
particular use of his excellent work.

Looks good to me. Just don't hold your breath waiting on the patch...

James Edward Gray II


JEG2 · 8 February 2011 19:42

I'm not totally sure I understand the question, but loading data with FasterCSV is usually done something like:

FasterCSV.foreach( path, :headers => true,
                         :header_converters => :symbol ) do |row|
  SomeModel.create!(row.to_hash)
end

Hope that helps.

James Edward Gray II
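To see what each row looks like before it reaches the model, here is a small sketch that runs without Rails. It assumes a current Ruby, where FasterCSV's code ships as the standard csv library, so it calls `CSV` rather than `FasterCSV`. The sample data is made up, with column names chosen to mirror the fields in the Import model above.

```ruby
require 'csv'

# Made-up sample data; the header names mirror the Import model's fields.
data = <<~DATA
  Datatype,Abu,Paul,Age
  sample,one,two,30
DATA

# headers: true treats the first line as column names, and the :symbol
# header converter turns "Datatype" into :datatype, and so on.
rows = CSV.parse(data, headers: true, header_converters: :symbol).map(&:to_hash)

rows.each do |attrs|
  # In a Rails app this is where SomeModel.create!(attrs) would go.
  p attrs
end
```

Every value arrives as a String, including the age, so the model (or a CSV converter) has to do any type casting.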


Dave_Burt2 · 27 April 2006 15:52

James Edward Gray II wrote:

:leading_fields =>
/\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields

You should modify the above line too. It takes both to correctly parse
some lines:

/\A\s*#{Regexp.escape(@col_sep)}+/

I looked at this, but I deduced from [1] that a number of fields equal
to the match size are added, so (I guess) " , foo" would get extra
leading fields: [nil, nil, nil, nil, "foo"]. So I skipped it. I'm also
guessing the OP doesn't need it, anyway.

Looks good to me. Just don't hold your breath waiting on the patch...

Oh, I don't want the patch. It's a terrible idea! "foo, bar, 'baz'"
aren't CSV, they're CASWSSV (comma and some white-space separated
values). That's got to be a whole new library.

Cheers,
Dave

[1] faster_csv.rb lines 1114..1115:
csv = if parse.sub!(@parsers[:leading_fields], "")
[nil] * $&.length

P.S.: There's a bug here, and not just here, I think. Maybe
init_separators should raise an exception if @col_sep.size != 1, or use
options[:col_sep][0,1]. It currently barfs late and in various
interesting ways for multi-character values of col_sep.
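The counting problem can be reproduced in isolation. This is a sketch only: `leading_nils` is a hypothetical helper that mimics the `[nil] * $&.length` logic quoted in [1], and the patterns are written out for a "," separator and, as an assumed example, a "::" separator.

```ruby
# Hypothetical helper (not part of FasterCSV): returns the nil padding that
# the quoted `[nil] * $&.length` logic would produce for a given pattern.
def leading_nils(line, pattern)
  match = line.match(pattern)
  match ? [nil] * match[0].length : []
end

strict  = /\A,+/     # the original :leading_fields pattern for col_sep ","
lenient = /\A\s*,+/  # the variant suggested in the thread

# Two commas, two empty leading fields: the count is right.
p leading_nils(",,foo", strict)

# One comma, but the match is four characters long (three spaces plus the
# comma), so four nils are produced.
p leading_nils("   , foo", lenient)

# A two-character separator is over-counted the same way: two separators,
# four matched characters.
two_char = /\A#{Regexp.escape('::')}+/
p leading_nils("::::foo", two_char)
```

Because the padding is derived from the length of the matched text, the count is only correct when every matched character is a one-character separator.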


James_Edward_Gray_II · 27 April 2006 16:42

Good points all around. Dave knows this code better than I do, clearly.

James Edward Gray II


Dave_Burt2 · 27 April 2006 18:43

James Edward Gray II wrote:

Good points all around. Dave knows this code better than I do,
clearly.

Thanks, but credit to you -- IIRC part of your stated aim for FasterCSV
was to make it short, legible, and therefore maintainable, and if I can
pick up this stuff in literally one minute of looking at the code,
you've succeeded. Well done.

Cheers,
Dave
