CSV Reader (and Type Inference and Data Conversion) Benchmarks (Faster, Fasterer, Fastest) - And the Winner is... String#split

Gerald_Bauer1 · 22 November 2018 15:01

Hello,

I've put together some basic CSV reader / parser benchmarks [1].
The "Raw" Read Benchmark returns all values as strings, with no type
inference or data conversion (*).
The Numerics Benchmark returns all values as numbers, using simple type
inference or data conversion. The data is all numbers, all the time
(except for the header row).

Here's the result for the numerics benchmark using the weather
station data from
the University of Waterloo, Ontario, Canada:

n = 100
                       user     system      total        real
std:              20.781000   0.234000  21.015000 ( 21.039186)
split:             1.531000   0.063000   1.594000 (  1.582496)
split(table):      2.000000   0.015000   2.015000 (  2.016913)
reader:           63.500000   0.203000  63.703000 ( 63.691851)
reader(table):    37.407000   0.188000  37.595000 ( 37.601160)
reader(numeric):  40.421000   0.141000  40.562000 ( 40.595467)
reader(json):      1.125000   0.062000   1.187000 (  1.191145)
reader(yaml):     38.485000  15.672000  54.157000 ( 54.229705)
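
The table has the shape of the report printed by Ruby's standard Benchmark module (user, system, total and real time in seconds). As a rough sketch of how such a comparison can be set up, assuming made-up sample data instead of the Waterloo file and two hypothetical helpers (parse_std, parse_split) standing in for the "std" and "split" contenders:

```ruby
require 'benchmark'
require 'csv'

# Made-up sample data (header row plus numeric rows), NOT the Waterloo file.
DATA = "temp,humidity,wind\n" + ( "21.5,40,3.2\n" * 1_000 )

# Hypothetical stand-in for "std": the csv library that ships with Ruby.
def parse_std( text )
  CSV.parse( text )
end

# Hypothetical stand-in for "split": plain String#split per line.
def parse_split( text, sep: ',' )
  text.each_line.map { |line| line.chomp( '' ).split( sep ) }
end

n = 10    # the post uses n = 100

# Benchmark.bm prints one row per report and returns the timings.
results = Benchmark.bm( 8 ) do |x|
  x.report( 'std:' )   { n.times { parse_std( DATA ) } }
  x.report( 'split:' ) { n.times { parse_split( DATA ) } }
end
```

The absolute numbers depend on the machine, the Ruby version and the input, so only the ratios between rows are worth comparing.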

And the winner is...

Of course - nothing is faster than "plain" String#split (with "simple
CSV", that is,
no escape rules and no edge cases):

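   def read_faster_csv( path, sep: ',' )
     recs = []
     File.open( path, 'r:utf-8' ) do |f|
       f.each_line do |line|
         line   = line.chomp( '' )    # chomp( '' ) strips all trailing newlines (\n or \r\n)
         values = line.split( sep )   # note: split drops trailing empty values
         recs << values
       end
     end
     recs                             # array of rows, every value a string
   end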
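
For the Numerics Benchmark the values come back as numbers instead of strings. The following is only a sketch of what a split-based numeric reader could look like, not the code from the benchmark repository: the helper name read_faster_csv_numeric and the choice of Float( ) for the conversion are assumptions made for illustration.

```ruby
require 'tempfile'

# Hypothetical helper (not from the benchmark repository): split-based reader
# that keeps the header row as strings and converts all other values to Float.
def read_faster_csv_numeric( path, sep: ',' )
  recs = []
  File.open( path, 'r:utf-8' ) do |f|
    f.each_line.with_index do |line, i|
      values = line.chomp( '' ).split( sep )
      # Float( ) raises ArgumentError on non-numeric or empty values
      values = values.map { |value| Float( value ) }  unless i == 0
      recs << values
    end
  end
  recs
end

# Usage with a small made-up file
Tempfile.create( ['sample', '.csv'] ) do |f|
  f.write( "temp,humidity\n21.5,40\n19,38.2\n" )
  f.flush
  p read_faster_csv_numeric( f.path )
  # => [["temp", "humidity"], [21.5, 40.0], [19.0, 38.2]]
end
```

Float( ) is strict and fails loudly on bad input. String#to_f would be more forgiving and silently turn non-numeric text into 0.0.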

(*) Note: YAML and JSON - of course - always use YAML and JSON
encoding (and data conversion) rules :-).
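
To make the footnote concrete, here is a small sketch of the difference between String#split and the JSON / YAML rules. Wrapping a line in brackets so that it parses as an array is an assumption made for this example and is not confirmed to be how the benchmarked readers work internally.

```ruby
require 'json'
require 'yaml'

line = '1, 2.5, "abc"'

# String#split knows nothing about types or quotes: everything stays a string,
# including the leading spaces and the quote characters.
from_split = line.split( ',' )
p from_split    # => ["1", " 2.5", " \"abc\""]

# Assumption for illustration: treat the line as the inside of an array literal.
from_json = JSON.parse( "[#{line}]" )
p from_json     # => [1, 2.5, "abc"]

from_yaml = YAML.load( "[#{line}]" )
p from_yaml     # => [1, 2.5, "abc"]
```

With JSON and YAML the numbers arrive already converted, which is why those formats cannot take part in a strictly "raw" all-strings comparison.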

Happy data wrangling with ruby. Cheers. Prost.

[1] https://github.com/csvreader/benchmarks
