# No, it's not a bug. CSV is a simple delimited format. It's
# delimited by
# a comma character, not a comma then some arbitrary whitespace. That's
# how Microsoft's Excel and Access and SQL Server parsers deal
# with it, too.
Dave, you're a cool Rubyist. I think you are cooler than Microsoft's.
And maybe FasterCSV can be more "intelligent" than other CSV parsers by
1) ignoring extra spaces in a captured separated value
I'm really missing this feature for my current work!
On 4/27/06, Peña, Botp <botp@delmonte-phil.com> wrote:
Dave Burt [mailto:dave@burt.id.au] :
# No, it's not a bug. CSV is a simple delimited format. It's
# delimited by
# a comma character, not a comma then some arbitrary whitespace. That's
# how Microsoft's Excel and Access and SQL Server parsers deal
# with it, too.
Dave, you're a cool Rubyist. I think you are cooler than Microsoft's.
And maybe FasterCSV can be more "intelligent" than other CSV parsers by
1) ignoring extra spaces in a captured separated value
3) or maybe, fastercsv can include an option/flag to allow the above
Let's choose option 1. Ruby lets you modify classes from libraries.
Let's call this "lenient_and_still_a_little_bit_faster_csv.rb":
require 'faster_csv'
class FasterCSV
  # Pre-compiles parsers and stores them by name for access during
  # reads, just like the official FasterCSV version, BUT the central
  # parser allows arbitrary whitespace before and after the column
  # separator.
  def init_parsers( options )
    # prebuild Regexps for faster parsing
    @parsers = {
      :leading_fields =>
        /\A#{Regexp.escape(@col_sep)}+/,         # for empty leading fields
      :csv_row =>
        ### The Primary Parser ###
        / \G(?:^|#{Regexp.escape(@col_sep)})     # anchor the match
          \s*                        # <-----    # ignore some whitespace
          (?: "((?>[^"]*)(?>""[^"]*)*)"          # find quoted fields
              |                                  # ... or ...
              ([^"#{Regexp.escape(@col_sep)}]*)  # unquoted fields
              )/x,
        ### End Primary Parser ###
      :line_end =>
        /#{Regexp.escape(@row_sep)}\Z/           # safer than chomp!()
    }
  end
end
All that code except for the line consisting entirely of "\s*" was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don't think Mr. Gray will mind this
particular use of his excellent work.
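To see what the added "\s*" line changes without installing FasterCSV, here is a minimal standalone sketch. It copies the :csv_row pattern from the patch into a constant and applies it with String#scan. The names COL_SEP, LENIENT_ROW and lenient_fields are made up for this illustration, the separator is hard-coded to a comma, and the real library does more work around this regexp (leading fields, line endings), so treat it as an approximation rather than FasterCSV's actual read loop.

```ruby
COL_SEP = ","

# The :csv_row pattern from the patch, with the separator taken from a
# constant instead of @col_sep (an assumption made for this sketch).
LENIENT_ROW = /
  \G(?:^|#{Regexp.escape(COL_SEP)})      # start of line, or right after a separator
  \s*                                    # the added line: skip whitespace before the field
  (?: "((?>[^"]*)(?>""[^"]*)*)"          # quoted field
    |                                    # ... or ...
      ([^"#{Regexp.escape(COL_SEP)}]*)   # unquoted field
  )
/x

# Hypothetical helper: returns the fields of one line as an array of strings.
def lenient_fields(line)
  line.scan(LENIENT_ROW).map do |quoted, unquoted|
    # A doubled quote inside a quoted field stands for one literal quote.
    quoted ? quoted.gsub('""', '"') : unquoted
  end
end

p lenient_fields('foo, bar, "baz"')   # prints ["foo", "bar", "baz"]
p lenient_fields('foo , bar')         # prints ["foo ", "bar"]
```

Note the second call: the pattern skips whitespace only before a field, so whitespace between an unquoted value and the following comma stays in the value.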
I'm trying to load data into the database, and I've uploaded the file using
Paperclip. However, I am having trouble loading the contents into the
database using FasterCSV. I am using Hobo, but I suppose that after the CSV
file is uploaded, it's standard RoR.
This is my model:
import.rb:
class Import < ActiveRecord::Base
  hobo_model # Don't put anything above this
  fields do
    datatype :string
    abu :string
    paul :string
    age :integer
    timestamps
  end
:leading_fields =>
/\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields
You should modify the above line too. It takes both to correctly parse some lines:
/\A\s*#{Regexp.escape(@col_sep)}+/
Looks good to me. Just don't hold your breath waiting on the patch...
:leading_fields =>
/\A#{Regexp.escape(@col_sep)}+/, # for empty leading fields
You should modify the above line too. It takes both to correctly parse
some lines:
/\A\s*#{Regexp.escape(@col_sep)}+/
I looked at this, but I deduced from [1] that a number of fields equal
to the match size are added, so (I guess) "   , foo" would get extra
leading fields: [nil, nil, nil, nil, "foo"]. So I skipped it. I'm also
guessing the OP doesn't need it, anyway.
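The concern can be checked in isolation. This sketch only measures how much of a line each :leading_fields pattern consumes. It assumes, as the post deduces, that the parser adds one empty field per matched character, which is not verified here against the FasterCSV source. The variable names are made up for the example, and the sample line has three spaces before the comma.

```ruby
col_sep = ","
strict  = /\A#{Regexp.escape(col_sep)}+/      # the pattern quoted from FasterCSV 0.2.0
lenient = /\A\s*#{Regexp.escape(col_sep)}+/   # the suggested whitespace-tolerant variant

line = "   , foo"   # three spaces, then one comma

p line[strict]          # prints nil: the line does not start with a comma
p line[lenient].size    # prints 4: three spaces plus the comma
p ",,foo"[strict].size  # prints 2: two genuinely empty leading fields
```

If the match length is used as the count of empty leading fields, the lenient pattern would report four of them for a line that has at most one, which is the mismatch described in the post.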
Looks good to me. Just don't hold your breath waiting on the patch...
Oh, I don't want the patch. It's a terrible idea! "foo, bar, 'baz'"
aren't CSV, they're CASWSSV (comma and some white-space separated
values). That's got to be a whole new library.
P.S.: There's a bug here, and not just here, I think. Maybe
init_separators should raise an exception if @col_sep.size != 1, or use
options[:col_sep][0,1]. It currently barfs late and in various
interesting ways for multi-character values of col_sep.
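Both suggestions in the P.S. are easy to sketch. The helper below is hypothetical: FasterCSV has no checked_col_sep method, and the comma default is an assumption made for this example. It only shows the fail-early check, with the truncating alternative left as a comment, followed by one way a longer separator can go wrong inside the unquoted-field character class.

```ruby
# Hypothetical helper, not part of FasterCSV.
def checked_col_sep(options)
  col_sep = options.fetch(:col_sep, ",")
  # The alternative from the P.S. would silently truncate instead:
  #   col_sep = col_sep[0, 1]
  unless col_sep.size == 1
    raise ArgumentError,
          ":col_sep must be a single character, got #{col_sep.inspect}"
  end
  col_sep
end

p checked_col_sep(:col_sep => ";")   # prints ";"

# Why a longer separator misbehaves late: inside a character class each
# character counts on its own, so "=>" forbids "=" and ">" separately.
sep = "=>"
unquoted = /[^"#{Regexp.escape(sep)}]*/
p "a=b"[unquoted]   # prints "a", although "=" alone is not the separator
```

Raising in the setup step keeps the error next to the bad option instead of surfacing later as oddly split fields.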
Good points all around. Dave knows this code better than I do, clearly.
James Edward Gray II
On Apr 27, 2006, at 10:52 AM, Dave Burt wrote:
James Edward Gray II wrote:
On Apr 27, 2006, at 10:28 AM, Dave Burt wrote:
Good points all around. Dave knows this code better than I do,
clearly.
Thanks, but credit to you -- IIRC part of your stated aim for FasterCSV
was to make it short, legible, and therefore maintainable, and if I can
pick up this stuff in literally one minute of looking at the code,
you've succeeded. Well done.