Why the CSV standard library is broken (and how to fix it), Part IV or Numerics a.k.a. Auto-Magic Type Inference for Strings and Numbers?

Gerald_Bauer1 · 11 October 2018 15:51

Hello,

I've written a new (and fourth) episode on why the CSV standard library is
broken, broken, broken (and how to fix it).

Let's have a look at numerics a.k.a. auto-magic type inference for
strings and numbers [1].

Here's the challenge for the standard csv library.
Let's read data.csv:

1,2,3
"4","5","6"

Read it using these two popular rules (bonus points for handling NaN, not a number):

Rule 1: Use "un-quoted" values for float numbers e.g. 1,2,3 or 1.0,
2.0, 3.0 etc.

Rule 2: Use quoted values for "non-numeric" strings e.g. "4", "5", "6"
or "Hello, World!" etc.

In the new csv reader it works like this :-):

   records = Csv.numeric.read( 'data.csv' )
   pp records
   # => [[1.0, 2.0, 3.0],
   #     ["4", "5", "6"]]

And with your own not-a-number (NaN) constants configured:

   records = Csv.numeric.parse( '1,2,NAN,#NA', nan: ['NAN', '#NA'] )
   pp records
   # => [[1.0, 2.0, NaN, NaN]]
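
To make the two rules concrete, here's a minimal sketch in plain Ruby.
Note that `parse_numeric_line` is a hypothetical helper made up for
illustration; it is not the csvreader API or its implementation. It assumes
one record per line and no whitespace around quoted values, and it raises
an ArgumentError for an unquoted value that is neither a number nor a
configured NaN marker.

```ruby
require 'strscan'

# Hypothetical helper (an assumption for illustration, not a library API):
# applies Rule 1 and Rule 2 to a single line of comma-separated values.
def parse_numeric_line(line, nan: ['NaN'])
  scanner = StringScanner.new(line.chomp)
  values  = []
  loop do
    if scanner.scan(/"((?:[^"]|"")*)"/)
      # Rule 2: a quoted value stays a string; "" inside is an escaped quote.
      values << scanner[1].gsub('""', '"')
    else
      raw = scanner.scan(/[^,]*/).strip
      # Rule 1: an unquoted value becomes a float,
      # a configured NaN marker becomes Float::NAN, an empty value becomes nil.
      values << if raw.empty?
                  nil
                elsif nan.include?(raw)
                  Float::NAN
                else
                  Float(raw)
                end
    end
    break unless scanner.scan(/,/)   # no more separators: end of record
  end
  values
end

p parse_numeric_line('1,2,3')
# => [1.0, 2.0, 3.0]
p parse_numeric_line('"4","5","6"')
# => ["4", "5", "6"]
p parse_numeric_line('1,2,NAN,#NA', nan: ['NAN', '#NA'])
# => [1.0, 2.0, NaN, NaN]
```

The point of the sketch is that the decision "string or number?" depends
on whether the value was quoted, so it has to be made while parsing and
not after the quotes have been stripped.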

Let's quote an old quote from this mailing list:

> I disagree that it's broken.
> It's implementing the [strict] RFC [CSV format] and gives you the tools that allow you to be less strict.

Anyone? Can you show us how to read the numerics variant, including
Not a Number (NaN) values, with the standard csv library?

Questions and comments welcome. Cheers. Prost.

PS: If you want to see other CSV formats / dialects pre-configured
and supported "out-of-the-box" in the new csv reader, please let me know.
