Finding out what row DBI chokes on?

Sometimes I deal with a legacy table with screwy datetime info, so you get fields that in MySQL are '0000-00-00' or other kinds of invalid values. When I pull these out with DBI I get an ArgumentError that simply says "argument out of range", which I believe is caused by DBI calling Time.gm( 0, 0, 0, 0, 0, 0 ) or something like that.
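If that theory is right, it is easy to reproduce outside DBI entirely; here is a minimal sketch of what an all-zero timestamp does to Time.gm:

```ruby
# An all-zero MySQL datetime, split into fields and passed to
# Time.gm(year, mon, day, hour, min, sec): a month (and day) of 0
# is out of range, so Ruby raises ArgumentError before DBI ever
# gets a Time object back.
begin
  Time.gm(0, 0, 0, 0, 0, 0)
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}"
end
```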

Now, it makes sense to me that DBI doesn't want to deal with this, but in some cases it might be convenient for me to get a message that tells me specifically what row has the bad value. Does anybody know how I might do this, short of hacking DBI directly?

Francis Hwang
http://fhwang.net/

Hi Francis,

···

--- Francis Hwang <sera@fhwang.net> wrote:

Sometimes I deal with a legacy table with screwy datetime info, so you get fields that in MySQL are '0000-00-00' or other kinds of invalid values. When I pull these out with DBI I get an ArgumentError that simply says "argument out of range", which I believe is caused by DBI calling Time.gm( 0, 0, 0, 0, 0, 0 ) or something like that.

Is it possible to do validation at the SQL level itself? What does the SQL that generates the result look like?

-- shanko



Perhaps I didn't explain my problem very clearly. None of my current code is generating bad datetimes, but at work I deal with a database with legacy data going back 7 years, and from time to time I come across bad datetimes that were inserted years before. DBI can't handle these at all. Here's test code to show it:

1) First I create a table and insert a bogus datetime:

create table test ( dt datetime );

Query OK, 0 rows affected (0.26 sec)

insert into test( dt ) values( '2005-01-01' );

Query OK, 1 row affected (0.13 sec)

insert into test( dt ) values( '2004-01-01' );

Query OK, 1 row affected (0.37 sec)

insert into test( dt ) values( '0000-00-00' );

Query OK, 1 row affected (0.21 sec)

···

On Dec 30, 2004, at 7:00 PM, Shashank Date wrote:


select * from test;

+---------------------+
| dt                  |
+---------------------+
| 2005-01-01 00:00:00 |
| 2004-01-01 00:00:00 |
| 0000-00-00 00:00:00 |
+---------------------+
3 rows in set (0.24 sec)

2) Then I try to query this through DBI:

irb(main):001:0> require 'dbi'
=> true
irb(main):002:0> dbh = DBI.connect( 'dbi:Mysql:test:localhost', 'francis', 'xxxxxxx' )
=> #<DBI::DatabaseHandle:0x5e76a4 @trace_mode=2, @handle=#<DBI::DBD::Mysql::Database:0x5e6ec0 @handle=#<Mysql>, @mutex=#<Mutex:0x5df4b8 @waiting=[], @locked=false>, @have_transactions=true, @attr={"AutoCommit"=>true}>, @trace_output=#<IO:0x1d3a5c>>
irb(main):003:0> dbh.select_all( 'select * from test' )
ArgumentError: argument out of range
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:59:in `gm'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:59:in `as_timestamp'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:79:in `send'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:79:in `coerce'
         from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:418:in `fill_array'
         from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `each_with_index'
         from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `each'
         from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `each_with_index'
         from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `fill_array'
         from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:425:in `fetch'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:1145:in `fetch_all'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:1144:in `loop'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:1144:in `fetch_all'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:909:in `fetch_all'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:666:in `select_all'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:662:in `execute'
         from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:662:in `select_all'
         from (irb):3
irb(main):004:0>
irb(main):005:0*

Now, it isn't surprising or necessarily a bug that DBI won't even sensibly process this value, and catching the exception is easy enough. But it slows me down a little that

1) I don't know exactly what the bad value is. You'd be surprised how many ways MySQL lets you insert bad datetime strings.
2) I don't know which row the bad value is in if I've done a select that should return more than 1 row.

So I'm wondering if, short of patching DBI, there is a way for a DBI client (Lafcadio, in my case) to get more info to return a more informative error message. I'm big into informative errors.
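One way to at least localize the failure, without touching DBI's source, is to fetch rows one at a time instead of using select_all, rescuing around each fetch so the row count can go into the error. A sketch; each_row_with_rescue is a hypothetical helper, and it assumes a DBI statement handle whose fetch returns nil at the end of the result set:

```ruby
# Hypothetical helper: walk a result set row by row, so that a
# coercion error (like DBI's Time.gm blowing up on '0000-00-00')
# can be re-raised with the offending row's position attached.
def each_row_with_rescue(sth)
  i = 0
  loop do
    begin
      row = sth.fetch or break  # nil signals end of result set
      yield row, i
    rescue ArgumentError => e
      raise ArgumentError, "#{e.message} in row #{i} of result set"
    end
    i += 1
  end
end
```

This still will not say which column or what the raw string was, since DBI has already consumed that by the time fetch raises, but a row number narrows the hunt considerably.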

Francis Hwang

Hi Francis,

Francis Hwang wrote:

Perhaps I didn't explain my problem very clearly. None of my current code is generating bad datetimes, but at work I deal with a database with legacy data going back 7 years, and from time to time I come across bad datetimes that were inserted years before.

This much I had guessed from your earlier email, so I guess you had explained the problem clearly :wink:

Apologies if my answers did not make sense to you.

> DBI can't handle these at all. Here's test code to show it:

<snip>

Now, it isn't surprising or necessarily a bug that DBI won't even sensibly process this value, and catching the exception is easy enough. But it slows me down a little that

1) I don't know exactly what the bad value is. You'd be surprised how many ways MySQL lets you insert bad datetime strings.

I deal with such data on an almost routine basis (although not using MySQL), and use standard "scrubbing techniques":

1. Try to limit the data in the SELECT, e.g. by borrowing from your example:

SELECT * FROM TEST WHERE DT > '1997-01-01';

or

SELECT DISTINCT DT FROM TEST;

to see what values show up

2. Use the datetime / string conversion functions to try to identify the bad data.

3. If the data set is small (< 100,000 rows) use CURSOR logic (usually frowned upon) to identify the bad rows

4. Export the table to CSV, run it through text-filtering tools, and then import it back.

Again, not knowing exactly what your situation is, I do not know if all of these are even relevant. So I guess what I am trying to say is: as far as possible, try to handle it at the backend BEFORE it reaches DBI.
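The text-filtering step can also be done from Ruby, as long as the column reaches Ruby as a plain string (via the CSV export, or by selecting CAST(dt AS CHAR) so DBI never attempts the timestamp coercion). A sketch; bad_datetime? is a hypothetical helper, and it simply asks Time.gm the same question DBI would:

```ruby
# Hypothetical helper: true if a datetime string that MySQL was
# happy to store would make Ruby's Time.gm raise ArgumentError.
def bad_datetime?(str)
  parts = str.scan(/\d+/).map { |digits| digits.to_i }
  begin
    Time.gm(*parts)  # (year, mon, day, hour, min, sec)
    false
  rescue ArgumentError
    true
  end
end

bad_datetime?("2005-01-01 00:00:00")  # => false
bad_datetime?("0000-00-00 00:00:00")  # => true
```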

2) I don't know which row the bad value is in if I've done a select that should return more than 1 row.
So I'm wondering if, short of patching DBI, there is a way for a DBI client (Lafcadio, in my case) to get more info to return a more informative error message. I'm big into informative errors.

I am afraid there may be no way to do that. But I am by no means a DBI expert.

HTH,


-- shanko

These scrubbing techniques do help, thanks.

Francis Hwang

···

On Jan 2, 2005, at 10:31 AM, Shashank Date wrote:
