Faster_csv vs File+split, why it is not faster?

Pablo_Q · 21 November 2008 17:55

Hi folks,

Why am I getting this result? Is it due just to this specific problem?

The file has 293858 records; here are some sample records:

"MARCOS, LUIS","547 N LAKE ST","","MUNDELEIN","IL","000000000"
"BALDWIN, T & S","4732 NE 203RD ST","","LAKE FOREST PARK","WA","000000000"
"RYBOLT, C","401 CEDAR DR","","CLINTON","IL","000000000"
"WELDT, KRISTINA","1945 N ORLEANS ST","","MCHENRY","IL","000000000"
.....

CODE

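require 'rubygems'
require 'fastercsv'
require 'benchmark'

# Full CSV parser
Benchmark.bm do |x|
  x.report do
    FasterCSV.foreach("data_test/match.csv") do |row|
    end
  end
end

# Naive split on "," (assumes every field is quoted and on one line)
Benchmark.bm do |x|
  x.report do
    File.foreach("data_test/match.csv") do |line|
      row = line.split("\",\"", -1)
      row[0].gsub!('"', '')   # strip the leading quote of the first field
      row[-1].gsub!('"', '')  # strip the trailing quote of the last field
    end
  end
end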

RESULTS

      user     system      total        real
 16.180000   0.740000  16.920000 ( 17.246190)
      user     system      total        real
  5.830000   0.120000   5.950000 (  6.028469)

Is this true?

--
Pablo Q.

James_Edward_Gray_II · 21 November 2008 18:07

Is it true that File plus split() is faster than FasterCSV? Yeah, I bet it is. Likely reasons are:

* split() is written in C, while FasterCSV is pure Ruby
* It doesn't handle all types of CSV data, so it has less work to do

To give some examples, your split code doesn't parse this valid CSV data:

no,quotes

Or this:

"embedded
newlines"
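
The two cases above can be checked with a small sketch. It assumes Ruby 1.9 or later, where FasterCSV became the standard csv library, so it uses CSV rather than the FasterCSV gem. The naive_split helper is a hypothetical wrapper around the split code from the original post (with a chomp added so the trailing newline is dropped); it is only an illustration, not a recommended parser.

```ruby
require 'csv' # assumption: Ruby 1.9+, where FasterCSV was adopted as the standard CSV

# Hypothetical helper: the split approach from the original post,
# plus a chomp to drop the line ending.
def naive_split(line)
  row = line.chomp.split('","', -1)
  row[0]  = row[0].delete('"')   # strip the leading quote of the first field
  row[-1] = row[-1].delete('"')  # strip the trailing quote of the last field
  row
end

quoted   = %Q{"RYBOLT, C","401 CEDAR DR","","CLINTON","IL","000000000"\n}
unquoted = "no,quotes\n"
embedded = %Q{"embedded\nnewlines"\n}

# Fully quoted, single-line records: both approaches agree.
naive_quoted = naive_split(quoted)
csv_quoted   = CSV.parse_line(quoted)

# Unquoted fields: split never sees the "," delimiter, so it returns one field.
naive_unquoted = naive_split(unquoted)
csv_unquoted   = CSV.parse_line(unquoted)

# Embedded newline: reading line by line breaks one record into two.
naive_records = embedded.each_line.map { |line| naive_split(line) }
csv_records   = CSV.parse(embedded)

p naive_quoted == csv_quoted
p naive_unquoted, csv_unquoted
p naive_records.length, csv_records.length
```

The split version only matches the parser on input shaped like the sample file, which is part of why it can do so much less work per line.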

Hope that explains things a bit.

James Edward Gray II


Pablo_Q · 21 November 2008 18:23

I thought so...

I'm just comparing code that handles a single case against a library that
implements all of CSV.

Thank you for your time!

--
Pablo Q.
