I'm using the csv module to parse 76,000 rows of patient data in a CSV
file. I use the line below to read in the file and loop through the
rows.
CSV.open("patientfile.txt", "r") do |row|
When the script reaches a row like the one shown below, it blows up:
/usr/local/lib/ruby/1.8/csv.rb:639:in `get_row': CSV::IllegalFormatError (CSV::IllegalFormatError)
from /usr/local/lib/ruby/1.8/csv.rb:556:in `each'
from /usr/local/lib/ruby/1.8/csv.rb:531:in `parse'
from /usr/local/lib/ruby/1.8/csv.rb:311:in `open_reader'
from /usr/local/lib/ruby/1.8/csv.rb:85:in `open'
from sync.rb:1
The offending row looks similar to the one below. Note the embedded "B"
within the address field.
"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"
Is there a way to get around this error and escape the "B" properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
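If flagging bad records and moving on is acceptable, one option is to parse the file one physical line at a time and note the lines the parser rejects. This is a minimal sketch, not the only way to do it. It assumes Ruby 1.9 or later, where the standard CSV library raises CSV::MalformedCSVError; the Ruby 1.8 library in the trace above raises CSV::IllegalFormatError instead. It also assumes no legitimate record spans more than one line. parse_or_flag is a hypothetical helper name.

```ruby
require "csv"

# Hypothetical helper (not part of the CSV library): parse each physical
# line on its own, keeping the rows that parse and recording the 1-based
# line numbers of the ones the parser rejects. Assumes no valid record
# spans more than one line.
def parse_or_flag(lines)
  good = []
  flagged = []
  lines.each_with_index do |line, index|
    next if line.strip.empty? # skip blank lines
    begin
      good << CSV.parse_line(line)
    rescue CSV::MalformedCSVError
      flagged << index + 1
    end
  end
  [good, flagged]
end

# Usage against the real file (Ruby 1.9+):
#   rows, flagged = parse_or_flag(File.foreach("patientfile.txt"))
#   warn "skipped lines: #{flagged.join(', ')}" unless flagged.empty?
```

The flagged line numbers can then be fixed by hand or fed to a repair step, while the rest of the 76,000 rows are processed normally.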
It's just a guess, but maybe you could try replacing every
double-quote character that is neither preceded nor followed by a
comma with a single quote? Something like the untested code below:
line.gsub(/[^,]"[^,]/,"'")
It would probably require reading the whole file first, writing out a
corrected version, and then calling the CSV methods on that, but it
beats doing it by hand :).
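The two-pass approach described above (write a corrected copy first, then hand that copy to the CSV library) could look roughly like the sketch below. It assumes every record sits on one physical line. The names repair_line and write_repaired_copy, and the output file name, are hypothetical. The pattern differs slightly from the one-liner above. It captures the character before the quote and only peeks at the character after it with a lookahead, so the neighbouring letters are not replaced along with the quote. It also skips a quote followed by a line ending, so the closing quote of the last field survives. It is still a heuristic and cannot repair a stray quote that sits right next to a comma.

```ruby
# Hypothetical helper: replace any double quote that has a non-comma
# character on both sides with a single quote. The preceding character is
# captured and written back (\1); the following one is only checked with a
# lookahead, so neither is consumed by the substitution. Line endings do
# not count as "non-comma", so a quote that ends a line is left alone.
def repair_line(line)
  line.gsub(/([^,])"(?=[^,\r\n])/, "\\1'")
end

# Hypothetical helper: first pass writes a repaired copy of the file.
def write_repaired_copy(src_path, dest_path)
  File.open(dest_path, "w") do |out|
    File.foreach(src_path) { |line| out.write(repair_line(line)) }
  end
end

# Usage (output file name is an assumption):
#   write_repaired_copy("patientfile.txt", "patientfile_fixed.txt")
#   # second pass: read the repaired copy with the CSV library as usual,
#   # e.g. CSV.foreach("patientfile_fixed.txt") { |row| ... } on Ruby 1.9+
```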
···
On 5/5/06, Sean Clark <smc7000@gmail.com> wrote:
Is there a way to get around this error and escape the "B" properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
Well, the long and the short of this story is that the above line is not valid CSV: a double quote inside a quoted field has to be escaped by doubling it, as in "321 NORTH ""B"" ST". Gotta fix that somehow: by hand, with a preprocessor, or by fixing the broken software that spit it out.
James Edward Gray II
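For the preprocessor route, an alternative to swapping in a single quote is to escape the stray quotes the way CSV expects, by doubling them, so the address keeps its original characters. The following is a minimal sketch that uses the same neighbour-checking heuristic as above, so it shares its blind spot for stray quotes next to a comma. The verification step assumes Ruby 1.9 or later, and escape_inner_quotes is a hypothetical name.

```ruby
require "csv"

# Hypothetical helper: double every quote that has a non-comma character on
# both sides (line endings excluded). In valid CSV a literal quote inside a
# quoted field is written as "", so "321 NORTH "B" ST" becomes
# "321 NORTH ""B"" ST".
def escape_inner_quotes(line)
  line.gsub(/([^,])"(?=[^,\r\n])/, '\1""')
end

line = %q("M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555")
row = CSV.parse_line(escape_inner_quotes(line))
p row[4] # prints "321 NORTH \"B\" ST"
```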
···
On May 5, 2006, at 11:48 AM, Sean Clark wrote:
"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"
Is there a way to get around this error and escape the "B" properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
Bira, I'm testing your idea with the script below, but I'm having
problems. Thanks for the start, though.
TEST PROGRAM:
line = "\"NAME\",\"610 \"A\" STREET\",\"STATE\",\"POSTAL_CODE\""
puts line
# If a double quote is neither preceded nor followed by a comma,
# replace it with a single quote.
new_line = line.gsub(/[^,]"[^,]/,"'")
puts new_line
OUTPUT:
"NAME","610 "A" STREET","STATE","POSTAL_CODE"
"NAME","610'" STREET","STATE","POSTAL_CODE"
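The output above suggests what went wrong. Both [^,] parts of the pattern are ordinary character classes, so each match consumes three characters (the space, the quote and the A), and gsub replaces all three with a single '. The quote after the A is then never matched, because the A in front of it was already used up. The sketch below prints the matches to make this visible, then shows one possible fix. The fix captures the preceding character and puts it back with \1, and only peeks at the following character with a lookahead. Lookahead is, as far as I know, also supported by Ruby 1.8's regex engine (unlike lookbehind).

```ruby
line = "\"NAME\",\"610 \"A\" STREET\",\"STATE\",\"POSTAL_CODE\""

# The original pattern consumes the characters on both sides of the quote.
p line.scan(/[^,]"[^,]/) # prints [" \"A"]

# Capture the preceding character and write it back with \1; check the
# following character with a lookahead so it is not consumed.
new_line = line.gsub(/([^,])"(?=[^,])/, "\\1'")
puts new_line # prints "NAME","610 'A' STREET","STATE","POSTAL_CODE"
```

When applying this to lines read from a file, the trailing newline also counts as "not a comma", so the last closing quote on each line would be replaced unless the line is chomped first or the newline is added to the excluded characters.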