Q about the FasterCSV

Dave Burt [mailto:dave@burt.id.au] :

# No, it's not a bug. CSV is a simple delimited format. It's
# delimited by
# a comma character, not a comma then some arbitrary whitespace. That's
# how Microsoft's Excel and Access and SQL Server parsers deal
# with it, too.

Dave, you're a cool rubyist. I think you are cooler than microsoft's.

and maybe fastercsv can be more "intelligent" than other CSV parsers by

1) ignoring extra spaces in a captured separated value

  eg,

  test, "1" ==> ["test","1"]

  iow, quotes rule (as in shellwords)

2) not ignoring spaces, yet escaping the quotes

  eg,

  test, "1" ==> ["test"," \"1\""]
  test, "1"111 ==> ["test"," \"1\"111"]

3) or maybe, fastercsv can include an option/flag to allow the above

Also, it would be nice if fastercsv could show which particular field it balked at

kind regards -botp
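
For reference, the strict behaviour being discussed can be reproduced with the csv library that ships with Ruby, which has been the FasterCSV code base since Ruby 1.9. This is only a sketch: it assumes `require "csv"` works in your Ruby install, and the helper name `strict_parse` is made up for illustration.

```ruby
require "csv"

# Hypothetical helper: returns the parsed fields, or the error class
# when the strict parser rejects the line.
def strict_parse(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  e.class
end

p strict_parse('test,"1"')   # quote starts right after the comma: parsed
p strict_parse('test, "1"')  # space before the quote: rejected as malformed
```

The second call is the case from the original question: the space makes the field an unquoted one, and a quote character inside an unquoted field is not valid in strict CSV.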

I agree!

I'm really missing this feature for my current work!

···

On 4/27/06, Peña, Botp <botp@delmonte-phil.com> wrote:


and maybe fastercsv can be more "intelligent" than other CSV parsers by

FasterCSV is intentionally a strict parser. For one thing, that helps a lot with the speed.

Also, it would be nice if fastercsv could show which particular field it balked at

Sheesh, I just got it doing line numbers very recently. (Hard in CSV where \n can be embedded in a field.) It's never enough... :wink:
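
A small illustration of why line numbers are awkward, assuming Ruby's bundled csv library (the descendant of FasterCSV) is available. The sample data is invented: it has three physical lines but only two CSV records, because the quoted field contains a newline.

```ruby
require "csv"

# Physical line 2 opens a quoted field that only closes on line 3.
data = "id,note\n1,\"first line\nsecond line\"\n"

physical_lines = data.lines.size
records        = CSV.parse(data)

p physical_lines  # number of "\n"-terminated lines in the text
p records.size    # number of CSV records the parser sees
p records.last    # the embedded newline survives inside the field
```

So a parser that wants to report "line N" has to decide whether it means the record number or the physical line where the problem started.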

James Edward Gray II

···

On Apr 27, 2006, at 2:32 AM, Peña, Botp wrote:

Peña wrote:

Dave, you're a cool rubyist. I think you are cooler than microsoft's.

Thanks, I think. (I don't know how many Rubyists Microsoft has - I don't
recall anyone on this list signing their email with an MS certification.)

and maybe fastercsv can be more "intelligent" than other CSV parsers by

1) ignoring extra spaces in a captured separated value

  eg,

  test, "1" ==> ["test","1"]

  iow, quotes rule (as in shellwords)

2) not ignoring spaces, yet escaping the quotes

  eg,

  test, "1" ==> ["test"," \"1\""]
  test, "1"111 ==> ["test"," \"1\"111"]

3) or maybe, fastercsv can include an option/flag to allow the above

Let's choose option 1. Ruby lets you modify classes from libraries.
Let's call this "lenient_and_still_a_little_bit_faster_csv.rb":

require 'faster_csv'
class FasterCSV
  # Pre-compiles parsers and stores them by name for access during
  # reads, just like the official FasterCSV version, BUT the central
  # parser allows arbitrary whitespace before and after the column
  # separator.
  def init_parsers( options )
    # prebuild Regexps for faster parsing
    @parsers = {
      :leading_fields =>
        /\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields
      :csv_row =>
        ### The Primary Parser ###
        / \G(?:^|#{Regexp.escape(@col_sep)}) # anchor the match
          \s* # <----- # ignore some whitespace
          (?: "((?>[^"]*)(?>""[^"]*)*)" # find quoted fields
              | # ... or ...
              ([^"#{Regexp.escape(@col_sep)}]*) # unquoted fields
              )/x,
        ### End Primary Parser ###
      :line_end =>
        /#{Regexp.escape(@row_sep)}\Z/ # safer than chomp!()
    }
  end
end

All that code except for the line consisting entirely of "\s*" was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don't think Mr. Gray will mind this
particular use of his excellent work.
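
To try the lenient row regexp without patching the library, here is a standalone sketch. It assumes a single-character separator and covers only the primary row regexp, not the separate :leading_fields handling, so lines that start with an empty field are out of scope. The names `COL_SEP`, `LENIENT_ROW` and `lenient_fields` are invented for this example.

```ruby
COL_SEP = ","

# Same shape as the :csv_row regexp above, with the extra \s* after
# the separator.
LENIENT_ROW = /
  \G(?:^|#{Regexp.escape(COL_SEP)})       # anchor the match
  \s*                                     # skip whitespace after the separator
  (?: "((?>[^"]*)(?>""[^"]*)*)"           # quoted field
      |                                   # ... or ...
      ([^"#{Regexp.escape(COL_SEP)}]*)    # unquoted field
  )
/x

# Hypothetical helper: returns the fields of one line, unescaping
# doubled quotes inside quoted fields.
def lenient_fields(line)
  line.scan(LENIENT_ROW).map do |quoted, unquoted|
    quoted ? quoted.gsub('""', '"') : unquoted
  end
end

p lenient_fields('test, "1"')            # => ["test", "1"]
p lenient_fields('a,   "b ""c""",d')     # => ["a", "b \"c\"", "d"]
```

Note that only whitespace after the separator is skipped here. Whitespace before a separator still ends up inside an unquoted field, and it stops the match entirely after a quoted one.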

Cheers,
Dave

I'm trying to upload data into the database, and I've managed the file
upload using Paperclip. However, I am having trouble loading the contents
into the database using FasterCSV. I am using Hobo, but I suppose that once
the CSV file is uploaded, it's standard RoR.

This is my model:

import.rb:

class Import < ActiveRecord::Base

  hobo_model # Don't put anything above this

  fields do
    datatype :string
    abu      :string
    paul     :string
    age      :integer
    timestamps
  end

  # Paperclip
  has_attached_file :csv
  validates_attachment_presence :csv
  validates_attachment_content_type :csv, :content_type => [
    'text/csv', 'text/comma-separated-values', 'application/csv',
    'application/excel', 'application/vnd.ms-excel',
    'application/vnd.msexcel', 'text/anytext', 'text/plain'
  ]
end

This works fine, and it saves the CSV file in public/systems/csvs.

I am having trouble using Fastercsv to load the contents into the
database.

Can you point me in the right direction with this, please?

Thanks in advance.

Abu

···


Let's choose option 1. Ruby lets you modify classes from libraries.
Let's call this "lenient_and_still_a_little_bit_faster_csv.rb":

require 'faster_csv'
class FasterCSV
  # Pre-compiles parsers and stores them by name for access during
  # reads, just like the official FasterCSV version, BUT the central
  # parser allows arbitrary whitespace before and after the column
  # separator.
  def init_parsers( options )
    # prebuild Regexps for faster parsing
    @parsers = {
      :leading_fields =>
        /\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields

You should modify the above line too. It takes both to correctly parse some lines:

/\A\s*#{Regexp.escape(@col_sep)}+/

      :csv_row =>
        ### The Primary Parser ###
        / \G(?:^|#{Regexp.escape(@col_sep)}) # anchor the match
          \s* # <----- # ignore some whitespace
          (?: "((?>[^"]*)(?>""[^"]*)*)" # find quoted fields
              | # ... or ...
              ([^"#{Regexp.escape(@col_sep)}]*) # unquoted fields
              )/x,
        ### End Primary Parser ###
      :line_end =>
        /#{Regexp.escape(@row_sep)}\Z/ # safer than chomp!()
    }
  end
end

All that code except for the line consisting entirely of "\s*" was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don't think Mr. Gray will mind this
particular use of his excellent work.

Looks good to me. Just don't hold your breath waiting on the patch... :wink:

James Edward Gray II

···

On Apr 27, 2006, at 10:28 AM, Dave Burt wrote:

I'm not totally sure I understand the question, but loading data with FasterCSV is usually done something like:

FasterCSV.foreach( path, :headers           => true,
                         :header_converters => :symbol ) do |row|
   SomeModel.create!(row.to_hash)
end

Hope that helps.
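
For anyone who wants to try the loop above without a Rails app, here is a self-contained sketch. It uses Ruby's bundled csv library (the FasterCSV code base since Ruby 1.9) and parses from a string instead of a file path. `FakeImport` is a made-up stand-in for the ActiveRecord model, and the column names are borrowed from the Import model in the question.

```ruby
require "csv"

# Hypothetical stand-in for an ActiveRecord model: it just collects
# the attribute hashes it is asked to create.
class FakeImport
  @rows = []

  class << self
    attr_reader :rows

    def create!(attrs)
      @rows << attrs
    end
  end
end

data = <<~ROWS
  datatype,abu,paul,age
  sample,foo,bar,42
ROWS

# With headers enabled, each row knows its column names; the :symbol
# converter turns them into symbols suitable for create!.
CSV.parse(data, headers: true, header_converters: :symbol) do |row|
  FakeImport.create!(row.to_h)
end

p FakeImport.rows
```

Every value arrives as a string (age is "42" here), so a real model relies on ActiveRecord's type casting or on explicit converters.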

James Edward Gray II

···

On Feb 8, 2011, at 11:33 AM, Abu A. wrote:

I am having trouble using Fastercsv to load the contents into the
database.

James Edward Gray II wrote:

      :leading_fields =>
        /\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields

You should modify the above line too. It takes both to correctly parse
some lines:

/\A\s*#{Regexp.escape(@col_sep)}+/

I looked at this, but I deduced from [1] that a number of fields equal
to the match size are added, so (I guess) " , foo" would get extra
leading fields: [nil, nil, nil, nil, "foo"]. So I skipped it. I'm also
guessing the OP doesn't need it, anyway.
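
The deduction can be checked in isolation. The sketch below mimics the two lines cited in [1] with a hard-coded comma separator; `leading_nils` and the two pattern names are invented for the example and are not FasterCSV API.

```ruby
STRICT_LEADING  = /\A,+/      # the original :leading_fields pattern
LENIENT_LEADING = /\A\s*,+/   # the suggested whitespace-tolerant variant

# Hypothetical helper: one nil per character of the leading match,
# which is what `[nil] * $&.length` does in the cited lines.
def leading_nils(line, pattern)
  rest = line.dup
  rest.sub!(pattern, "") ? [nil] * $&.length : []
end

p leading_nils(",,foo", STRICT_LEADING)      # two separators, two nils
p leading_nils("   ,foo", STRICT_LEADING)    # no match, no leading fields
p leading_nils("   ,foo", LENIENT_LEADING)   # three spaces and a comma: four nils
```

Because the count comes from the match length, every whitespace character matched by `\s*` is counted as if it were a separator.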

Looks good to me. Just don't hold your breath waiting on the patch... :wink:

Oh, I don't want the patch. It's a terrible idea! "foo, bar, 'baz'"
aren't CSV, they're CASWSSV (comma and some white-space separated
values). That's got to be a whole new library :slight_smile:

Cheers,
Dave

[1] faster_csv.rb lines 1114..1115:
      csv = if parse.sub!(@parsers[:leading_fields], "")
        [nil] * $&.length

P.S.: There's a bug here, and not just here, I think. Maybe
init_separators should raise an exception if @col_sep.size != 1, or use
options[:col_sep][0,1]. It currently barfs late and in various
interesting ways for multi-character values of col_sep.
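
A sketch of the early check suggested in the P.S., plus one concrete way a longer separator goes wrong. `checked_col_sep` is a hypothetical guard, not something FasterCSV provides.

```ruby
# Hypothetical guard: fail early instead of misparsing later.
def checked_col_sep(col_sep)
  unless col_sep.size == 1
    raise ArgumentError,
          "col_sep must be exactly one character, got #{col_sep.inspect}"
  end
  col_sep
end

# Inside a character class the separator is treated as a set of
# single characters, so "::" excludes every lone ":" as well.
col_sep  = "::"
unquoted = /\A[^"#{Regexp.escape(col_sep)}]*/

p "a:b::c"[unquoted]  # stops at the first ":", not at "::"
```

With "::" as the separator one would expect the first field to be "a:b", but the character class ends the field at the first colon.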

···

On Apr 27, 2006, at 10:28 AM, Dave Burt wrote:

Good points all around. Dave knows this code better than I do, clearly. :wink:

James Edward Gray II

···

On Apr 27, 2006, at 10:52 AM, Dave Burt wrote:


James Edward Gray II wrote:

Good points all around. Dave knows this code better than I do,
clearly. :wink:

Thanks, but credit to you -- IIRC part of your stated aim for FasterCSV
was to make it short, legible, and therefore maintainable, and if I can
pick up this stuff in literally one minute of looking at the code,
you've succeeded. Well done.

Cheers,
Dave