Regexp for CSV header

Paul_Shapiro · 17 June 2009 15:31

My script currently is processing various csv files. The top row/header
resembles this format:

Device ID,1) S31 Which best describes how you answered the online
reading comprehension quiz?,2) S32 Which best describes how you answered
the online timed retrieval quiz?,3) B19. If you want your product to be
easy to find in the supermarket then you should make its container,"4)
C19. So that he can shift attention between the radio and his
incessantly talking girl friend when she is in the car, Joe adjusts his
radio",5) B20. Early selection is most likely to occur for,6) C20.
Early selection for a red target is most likely to occur when there
is,"7) B21. In a lexical decision task, when the target is a bird name,
e.g. robin, it is usually preceded by the prime BODY but is sometimes
preceded by the prime BIRD."

Most of the headers begin with '1)', '5)', etc. I need to remove these
prefixes from the csv files. Another problem I've encountered while doing
this is that some of the headers are enclosed in double quotes, like '"4)
C19. So that he can shift attention between the radio and his incessantly
talking girl friend when she is in the car, Joe adjusts his radio", 5)
B20'

I have tried converting the top row from an array to a string and then
calling gsub(/[\d]+\)/,''). This kind of works, but it is unable to deal
with the double quote problem. It also leaves whitespace where the prefix
was, which I don't want. Also, I can't figure out how to put the result
back in the array as it was and then write it back to the csv.
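
The quoting problem described above can be illustrated with a small sketch. It assumes Ruby 1.9 or later, where FasterCSV became the standard `csv` library, and it uses a shortened, made-up header line rather than the real one. Splitting on commas cuts the quoted field in two, while a CSV parser keeps it whole, so a regexp anchored to the start of each field can remove the number together with the whitespace after it.

```ruby
require "csv"

# Made-up, shortened header line; the last field is quoted because it
# contains a comma.
line = %q{Device ID,1) S31 First question?,"4) C19. In the car, Joe adjusts his radio"}

# Naive approach: splitting on commas cuts the quoted field in two.
naive = line.split(",")

# Parser approach: the quoted field stays in one piece and loses its quotes.
fields = CSV.parse_line(line)

# \A anchors the match to the start of the field, and \s* also consumes the
# whitespace that follows the "N)" prefix.
cleaned = fields.map { |f| f.to_s.sub(/\A\d+\)\s*/, "") }

puts naive.length
puts fields.length
puts CSV.generate_line(cleaned)
```

Because the pattern is anchored with `\A`, a digit followed by `)` that appears later inside a question is left alone, which a global gsub over the joined string would not guarantee.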

Help would be appreciated. Thanks.

--
Posted via http://www.ruby-forum.com/.

James_Edward_Gray_II · 17 June 2009 22:26

I recommend using a CSV parser so it can worry about all of those little details for you. Here's an example script to give you ideas:

#!/usr/bin/env ruby -wKU

require "rubygems"
require "faster_csv"

# read a line of CSV
fields = FCSV.parse_line(DATA.read)

# edit the fields
fields.each do |f|
  f.sub!(/\A\d+\)\s*/, "")
end

# show fields
puts fields

# write back out as CSV
puts FCSV.generate_line(fields)

__END__
Device ID,1) S31 Which best describes how you answered the online reading comprehension quiz?,2) S32 Which best describes how you answered the online timed retrieval quiz?,3) B19. If you want your product to be easy to find in the supermarket then you should make its container,"4) C19. So that he can shift attention between the radio and his incessantly talking girl friend when she is in the car, Joe adjusts his radio",5) B20. Early selection is most likely to occur for,6) C20. Early selection for a red target is most likely to occur when there is,"7) B21. In a lexical decision task, when the target is a bird name, e.g. robin, it is usually preceded by the prime BODY but is sometimes preceded by the prime BIRD."

Hope that helps.

James Edward Gray II
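
To apply the same idea to a file on disk rather than to DATA, something like the following sketch might work. It assumes the standard `csv` library (Ruby 1.9+) in place of FasterCSV; the method name `strip_header_numbers` and the throwaway demo file are made up for illustration. Only the first row is edited, and the parser takes care of re-quoting fields when the file is written back.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: strip a leading "N) " prefix from every header field
# of the CSV file at +path+, leaving the data rows untouched.
def strip_header_numbers(path)
  rows = CSV.read(path)               # array of arrays; quoting is handled
  return if rows.empty?

  # Edit only the header row; keep nil (empty) fields as they are.
  rows[0] = rows[0].map { |f| f.nil? ? f : f.sub(/\A\d+\)\s*/, "") }

  # Write everything back; fields that need quotes get them again.
  CSV.open(path, "w") do |csv|
    rows.each { |row| csv << row }
  end
end

# Demo on a throwaway file with a made-up header and one data row.
demo = Tempfile.new(["header", ".csv"])
demo.write(%Q{Device ID,1) S31 Quiz?,"4) C19. In the car, Joe adjusts his radio"\n42,a,b\n})
demo.close

strip_header_numbers(demo.path)
puts File.read(demo.path)
```

Reading the whole file into memory keeps the sketch short; for very large files a row-by-row copy into a second file would be the more careful choice.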

Paul_Shapiro · 18 June 2009 05:44

#!/usr/bin/env ruby

require 'rubygems'
require 'roo'
require 'csv'
require 'fileutils'
require 'rio'
require 'fastercsv'

FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/xls"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/tmp"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/csv"

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/*.xls"]
for file in @filesxls
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/xls")
end

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls"]
@filetmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filesxls
  convert = Excel.new(file)
  convert.default_sheet = convert.sheets[0]
  convert.to_csv(file+"_tmp")
end

@filestmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filestmp
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/tmp")
end

dir = "/Users/pshapiro/Desktop/Excel/tmp/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/tmp/*.csv"]

for file in @filescsv
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/csv")
end

FileUtils.rm_rf("/Users/pshapiro/Desktop/Excel/tmp")

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  5.times {
    text=""
    File.open(file,"r"){|f|f.gets;text=f.read}
    File.open(file,"w+"){|f| f.write(text)}
  }
end

dir = "/Users/pshapiro/Desktop/Excel/csv/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*') + ".tmp"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  lastc = csv.headers.length-1
# puts lastc
  rio(file).csv.skipcolumns(1..2,lastc) > rio(file+".csv").csv(',')
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  FileUtils.remove(file)
end

dir = "/Users/pshapiro/Desktop/Excel/csv"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

2.times {
  files = Dir.entries(dir)
  files.each do |f|
    next if f == "." or f == ".."
    oldFile = dir + "/" + f
    newFile = dir + "/" + File.basename(f, '.*')
    File.rename(oldFile, newFile)
  end
}

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

#####################################

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  csv = csv.to_s
  fields = FCSV.parse_line(csv)

  fields.each do |f|
    f.sub!(/[\d]+\)+[\s]/,'')
  end

  puts fields

  wline = FCSV.generate_line(fields)
  astring = rio(file).contents
  rio(file).csv.print(astring).close

  text=""
  File.open(file,"r"){|f|f.gets;text=f.read}
  File.open(file,"w+"){|f| f.write(text)}

  astring = rio(file).contents
  rio(file).csv.print(wline+astring).close
end

Again, Thanks!!!!!!!!!!
--
Posted via http://www.ruby-forum.com/.
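
The script above gets to its result through several rounds of renaming and rewriting files. As a rough sketch only, the three content steps it performs (dropping the first five lines, skipping some columns, and stripping the header numbers) could be done in one pass with the standard `csv` library. The helper name `clean_export`, its parameters, and the sample data are made up for illustration. The column positions 1, 2 and last mirror the `skipcolumns(1..2,lastc)` call and are assumed here to be zero-based array indices; the sketch also assumes none of the dropped lines contains a quoted field that spans several lines.

```ruby
require "csv"

# Hypothetical helper: takes the raw text of an exported file and returns
# cleaned CSV text.
#   skip_lines - number of leading lines to throw away
#   drop       - column indices to remove (negative counts from the end)
def clean_export(text, skip_lines: 5, drop: [1, 2, -1])
  body = text.lines.drop(skip_lines).join
  rows = CSV.parse(body)
  return "" if rows.empty?

  # Turn negative indices into positions relative to the header width.
  width = rows.first.length
  drop_idx = drop.map { |i| i % width }

  # Remove the unwanted columns from every row.
  rows = rows.map do |row|
    row.reject.with_index { |_, i| drop_idx.include?(i) }
  end

  # Strip the leading "N) " prefix from the header fields only.
  rows[0] = rows[0].map { |f| f.nil? ? f : f.sub(/\A\d+\)\s*/, "") }

  rows.map { |row| CSV.generate_line(row) }.join
end

# Made-up sample: five junk lines, then a header and one data row.
sample = "junk\n" * 5 +
  %Q{Device ID,colA,colB,1) S31 Quiz?,"2) C19. In the car, Joe waits",last\n} +
  "7,x,y,a,b,z\n"

puts clean_export(sample)
```

Working on the text in memory avoids the intermediate .tmp files; the result could then be written once with File.write.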
