Nuby problem w/CSV, tab-delimited files & embedded double-quotes

Hey All,

I've got a file of tab-delimited data that I need to read in. Up until
today this approach has worked wonderfully:

   this_file = CSV.open(decrypted_file, "r", "\t")
   header = this_file.shift
   this_file.each do |line|
      # do stuff w/line here
   end
   this_file.close

But today's file has an entry w/a pair of double-quotes around it. So
now I get:

c:/program files/ruby/lib/ruby/1.8/CSV.rb:639:in `get_row':
CSV::IllegalFormatError (CSV::IllegalFormatError)
  from c:/program files/ruby/lib/ruby/1.8/CSV.rb:556:in `each'

I've looked through the rubydocs on CSV & am not finding a method for
telling CSV to expect double-quotes in the file. Is there such a
thing?

Thanks!

-Roy

P.S. I believe the following illustrates the problem--the "F" street
entry line does not seem to parse:

require "CSV" # Lib for working with comma-separated-values files

somedata = <<END_OF_FILE
userid line1
1-2700 1313 Mockingbird Lane
2-2706 7100 58th Ave SE
4-2718 128 S. "F" Street
3-2712 45 600th Ave. NE
END_OF_FILE

somedata.each_line do |l|
   x = CSV.parse_line(l, "\t")[0]
   puts x
end

puts "Finished!"

The problem is that there's no CSV standard. The Ruby CSV library requires you to put values containing quotes into quotes themselves. In addition, you have to double the quotes within the quotes, i.e.

4-2718 "128 S. ""F"" Street"

will do it.
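
A minimal sketch of that rule, assuming a current Ruby where the csv library takes keyword options (`col_sep:`) rather than the positional `"\t"` argument used in the 1.8 code above. The variable names are illustrative only, and the tabs are written as `\t` so they survive copy and paste:

```ruby
require "csv"

# The same record twice: once as it appears in the problem file, once with
# the field wrapped in quotes and the inner quotes doubled.
raw_line    = "4-2718\t128 S. \"F\" Street"
quoted_line = "4-2718\t\"128 S. \"\"F\"\" Street\""

# Keyword form of the call; CSV.parse_line(l, "\t") is the Ruby 1.8 signature.
fields = CSV.parse_line(quoted_line, col_sep: "\t")
puts fields.inspect

# The unescaped line is rejected. Current versions raise
# CSV::MalformedCSVError where 1.8 raised CSV::IllegalFormatError.
begin
  CSV.parse_line(raw_line, col_sep: "\t")
  raw_ok = true
rescue CSV::MalformedCSVError => e
  raw_ok = false
  puts "raw line rejected: #{e.message}"
end
```

The parsed field comes back with single quotes around the F, so the doubling is only an on-disk escape and does not end up in the data.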

Cheers

Maik

i think that, to be legitimate, this file would have to be

4-2718 "128 S. ""F"" Street"

so the data is corrupt and may need to be pre-munged. if your file is always
tab delimited why not

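   table = []
   # split each line on tabs and strip whitespace from every field
   parse = proc{|line| line.split(%r/\t/).map{|c| c.strip}}
   # IO::foreach yields each line; IO::readlines would ignore the block
   IO::foreach(decrypted_file){|line| table << parse[line]}
   header = table.shift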

or do you sometimes have escaped tabs?
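
For the pre-munging route, here is a rough sketch under two assumptions: no field contains a tab, and no field in the file is already quoted correctly (such a field would get escaped twice). `munge_line` is a hypothetical helper, not part of the CSV library, and the `col_sep:` keyword assumes a current Ruby rather than 1.8:

```ruby
require "csv"

# Hypothetical helper: wrap any field that contains a double quote in quotes
# and double the quotes inside it, so the line satisfies CSV's quoting rule.
def munge_line(line)
  fields = line.chomp.split("\t", -1)   # -1 keeps trailing empty fields
  fields.map { |field|
    field.include?('"') ? '"' + field.gsub('"', '""') + '"' : field
  }.join("\t")
end

munged = munge_line("4-2718\t128 S. \"F\" Street\n")
row    = CSV.parse_line(munged, col_sep: "\t")
puts row.inspect
```

If the file really never has escaped tabs, the plain split above is simpler. The munging step only earns its keep when the rows still have to go through CSV for some other reason.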

cheers.

-a

On Thu, 2 Jun 2005 rpardee@comcast.net wrote:
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

Woah--you are giving me the freak-out with that code. 8^)

I shouldn't have any escaped tabs--so that should suit.

Thanks!

-Roy

Come to think of it--I don't really need CSV at all for this. I can
just .each_line the file to get my rows and then .split("\t") each line
to get my fields. Or whatever the heck I'm trying to say...
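
Roughly what that looks like, as a sketch run against the sample data from the first post. It assumes the two columns are separated by real tab characters (written as `\t` here) and that no field contains a tab. The names `rows` and `records` are made up for the example:

```ruby
# Sample data from the thread, with explicit tabs between the columns.
somedata = "userid\tline1\n" \
           "1-2700\t1313 Mockingbird Lane\n" \
           "2-2706\t7100 58th Ave SE\n" \
           "4-2718\t128 S. \"F\" Street\n" \
           "3-2712\t45 600th Ave. NE\n"

# One array of fields per line; chomp drops the newline and the -1 limit
# keeps empty trailing fields instead of silently discarding them.
rows = somedata.each_line.map { |line| line.chomp.split("\t", -1) }

header  = rows.shift
records = rows.map { |fields| header.zip(fields).to_h }

records.each { |r| puts "#{r["userid"]}: #{r["line1"]}" }
```

Since split gives double quotes no special meaning, the "F" Street row comes through untouched. For a real file, `File.foreach(decrypted_file)` would stand in for `somedata.each_line`.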

Thanks again everyone!

-Roy