Converting CSV

7stud2 · 18 April 2012 19:15

Hello all!
I'm trying to learn Ruby and I'm using it for different tasks, but I have
no one to ask for help, so bear with me.

I receive CSV files whose fields are separated by semicolons and not quoted.
Received format:

Heading1;Heading2;Heading3;
String1;String2;String3

I would like to write a script that reads all these files and converts
them to CSV files separated by commas, with all fields quoted. If
possible, I would also like to change the encoding to UTF-8 without BOM.

Desired format:

"Heading1","Heading2","Heading3"
"String1","String2","String3"

My first problem is how to parse and create a new csv.
I read that I could parse a whole csv to an array of arrays like this:

require 'fastercsv'
array_of_arrays = FasterCSV.read("myfile.csv")

But how should I use this array when creating a new file?
I would like to loop through the array.

Any suggestions?

Br
cristian

--
Posted via http://www.ruby-forum.com/.

7stud2 · 18 April 2012 19:29

Hi,

Why do you even want to parse the CSV? I would simply replace the
semicolons with commas and quote the strings with a regex.

7stud2 · 18 April 2012 20:02

Well, parsing the file and processing its content is certainly more
difficult than doing search and replace.

File.open 'myfile.csv', 'r+' do |csv|
  new_csv = csv.read.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.rewind  # back to the start; otherwise print appends after the old content
  csv.print new_csv
end

This quotes everything between the semicolons and then replaces them
with commas.
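
To see what the two substitutions do, here is a small sketch run on the sample lines from the first post. The method name quote_fields is made up for illustration. Note that the pattern only matches non-empty fields, so empty fields stay unquoted, and double quotes inside a field would not be escaped.

```ruby
# Hypothetical helper wrapping the two gsub calls from the snippet above.
def quote_fields(text)
  # Wrap every run of characters that is not a semicolon or line break in quotes.
  quoted = text.gsub(/[^\n\r;]+/, '"\0"')
  # Then turn each semicolon into a comma.
  quoted.gsub(';', ',')
end

sample = "Heading1;Heading2;Heading3;\nString1;String2;String3\n"
puts quote_fields(sample)
```

The trailing semicolon on the heading line comes out as a trailing comma, so it may need to be stripped separately if the target system objects to it.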

7stud2 · 18 April 2012 20:10

Hi Christian:

The way to loop through the array is with a Ruby iterator. The method
"each" will iterate over just about any collection:

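require 'fastercsv'
# Read the semicolon-separated input into an array of arrays.
array_of_arrays = FasterCSV.read("myfile.csv", :col_sep => ";")

# Write each row to a new comma-separated file, quoting every field.
FasterCSV.open("saved_file.csv", "w", :force_quotes => true) do |new_file|
  array_of_arrays.each do |item|
    new_file << item
  end
end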

It seems to me that there must be a better way to do what you're trying
to accomplish. I don't know why you'd want to mess-around with csv
files in the first place. I don't understand why the comma delimited
csv file is better than the original. You could just use the original.

Also, if you're just doing a simple conversion, I'd just use simple ruby
code instead of having to learn everything about the CSV stuff:

File.open("original_file.txt", "r") do |f|
  new_file = File.open("new_file.csv", "w")
  f.each_line do |line|
    fields = line.split(";")
    fields.each { |fd| fd = "\"#{fd}\""
    new_file.write(fields.join(",")
  end
  new_file.close
end

Perhaps if you explain what you're trying to do with these files, I
could give you better advice. Ruby has many tools for saving objects, like
YAML and JSON. Generally it's better not to mess around with formatting
files yourself.

7stud2 · 19 April 2012 12:33

Thank you!
Both examples work great!

How can I use an array of file names from the directory when
creating the new CSVs?

I was thinking something like this:

files = Dir.glob("*.csv")

files.each do |filename|
  File.open(filename + "_converted.csv", 'w') do |csv|
    new_csv = File.read(filename).gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    csv.print new_csv
  end
end

Br
cristian

7stud2 · 18 April 2012 19:43

Jan E. wrote in post #1057239:

Hi,

Why do you even want to parse the CSV? I would simply replace the
semicolons with commas and quote the strings with a regex.

Ok? Sounds difficult. Could you give me an example?

Thanks.

Br
cristian

7stud2 · 18 April 2012 20:22

Jan E. wrote in post #1057244:

Well, parsing the file and processing its content is certainly more
difficult than doing search and replace.

File.open 'myfile.csv', 'r+' do |csv|
  new_csv = csv.read.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.rewind  # back to the start; otherwise print appends after the old content
  csv.print new_csv
end

This quotes everything between the semicolons and then replaces them
with commas.

Ok, thanks! Will this save the new file with the name "new_csv"?

Br
cristian

7stud2 · 18 April 2012 20:26

Eric C. wrote in post #1057247:

Hi Christian:

The way to loop through the array is with a Ruby iterator. The method
"each" will iterate over just about any collection:

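require 'fastercsv'
# Read the semicolon-separated input into an array of arrays.
array_of_arrays = FasterCSV.read("myfile.csv", :col_sep => ";")

# Write each row to a new comma-separated file, quoting every field.
FasterCSV.open("saved_file.csv", "w", :force_quotes => true) do |new_file|
  array_of_arrays.each do |item|
    new_file << item
  end
end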

It seems to me that there must be a better way to do what you're trying
to accomplish. I don't know why you'd want to mess-around with csv
files in the first place. I don't understand why the comma delimited
csv file is better than the original. You could just use the original.

Also, if you're just doing a simple conversion, I'd just use simple ruby
code instead of having to learn everything about the CSV stuff:

File.open("original_file.txt", "r") do |f|
  new_file = File.open("new_file.csv", "w")
  f.each_line do |line|
    fields = line.split(";")
    fields.each { |fd| fd = "\"#{fd}\""
    new_file.write(fields.join(",")
  end
  new_file.close
end

Perhaps if you explain what you're trying to do with these files, I
could give you better advice. Ruby has many tools for saving objects, like
YAML and JSON. Generally it's better not to mess around with formatting
files yourself.

Hi!
The files are to be read by a system that cannot handle CSV files
separated by semicolons. The problem is that the fields might also contain
commas, so the file should be comma separated with quoted fields.

I get these error messages with your last example:

CSVConverter.rb:7: syntax error, unexpected kEND, expecting ')'
CSVConverter.rb:9: syntax error, unexpected kEND, expecting '}'
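
Both errors point at the same two lines of that snippet: the block on the fields.each line is never closed with "}", and the new_file.write( call is missing its closing parenthesis. Beyond the syntax, assigning to fd inside each only rebinds the block variable and leaves the array unchanged, and each line still carries its newline. A minimal sketch of a corrected version follows; the method names convert_line and convert_file are made up for illustration.

```ruby
# Hypothetical helper: quote every field of one semicolon-separated line.
def convert_line(line)
  fields = line.chomp.split(";")               # drop the line break, then split
  fields.map { |fd| "\"#{fd}\"" }.join(",")    # map returns the quoted copies
end

# Hypothetical helper: convert a whole file line by line.
def convert_file(source_path, target_path)
  File.open(target_path, "w") do |new_file|
    File.foreach(source_path) do |line|
      new_file.puts convert_line(line)         # puts adds the line break back
    end
  end
end
```

Because split drops trailing empty strings, the trailing semicolon on the heading line does not produce an extra field here. Double quotes inside a field are still not escaped.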

Br
cristian

7stud2 · 19 April 2012 13:05

cristian cristian wrote in post #1057358:

How can I use an array of file names from the directory to use it when
creating the new csv's?

I was thinking something like this:

files = Dir.glob("*.csv")

files.each do |filename|
  File.open(filename + "_converted.csv", 'w') do |csv|
[...]

You should strip the ".csv" extension from filename. Otherwise, you'll
end up with names like "myfile.csv_converted.csv".

For example:

"#{filename[0...-4]}_converted.csv"

Also, it doesn't really make sense to store the array returned by Dir.glob in
files (unless you want to use it again).

Simply write it as one continuous expression:

Dir.glob("*.csv") do |file|
...
end
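
Putting the two suggestions together, a rough sketch of the whole loop could look like this. The method names converted_name and convert_all are made up for illustration, and skipping files that already end in "_converted.csv" is an added assumption so that a second run does not convert its own output.

```ruby
# Hypothetical helper: "myfile.csv" becomes "myfile_converted.csv".
def converted_name(filename)
  "#{filename[0...-4]}_converted.csv"
end

# Hypothetical helper: convert every semicolon-separated .csv file in dir.
def convert_all(dir)
  Dir.glob(File.join(dir, "*.csv")) do |file|
    next if file.end_with?("_converted.csv")   # assumption: skip earlier output
    new_csv = File.read(file).gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    File.open(converted_name(file), 'w') { |csv| csv.print new_csv }
  end
end
```

Calling convert_all(".") would process the current directory.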

7stud2 · 18 April 2012 20:30

cristian cristian wrote in post #1057250:

Ok, thanks! Will this save the new file with the name "new_csv"?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end

7stud2 · 18 April 2012 21:15

Jan E. wrote in post #1057256:

cristian cristian wrote in post #1057250:

Ok, thanks! Will this save the new file with the name "new_csv"?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end

Thanks! Works fine. I have to read a lot about regular expressions to
understand it.

I will play around a little now with an array of files in the directory.

Br
cristian

Robert_K1 · 18 April 2012 21:25

If the file is large this can easily break because you need to read
the whole thing into memory. I'd also rather use the proper tool for
the job instead of cooking something with regexp. In this case I'd do

require 'csv'

CSV.open("new.csv", "wb", col_sep: ",", force_quotes: true) do |csv_out|
  CSV.foreach("old.csv", col_sep: ";") do |rec|
    csv_out << rec
  end
end
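
The first post also asked for UTF-8 output without a BOM. Here is a minimal sketch of how the same loop might handle that, assuming the incoming files are ISO-8859-1 (the thread never says what the source encoding is) and using a made-up method name, convert_csv. The "ISO-8859-1:UTF-8" encoding option asks Ruby to transcode to UTF-8 while reading, and Ruby does not write a BOM unless asked to.

```ruby
require 'csv'

# Hypothetical helper; source_encoding is an assumption about the input files.
def convert_csv(source_path, target_path, source_encoding = "ISO-8859-1")
  CSV.open(target_path, "wb", col_sep: ",", force_quotes: true) do |csv_out|
    # Read with the external encoding given and transcode each row to UTF-8.
    CSV.foreach(source_path, col_sep: ";", encoding: "#{source_encoding}:UTF-8") do |rec|
      csv_out << rec
    end
  end
end
```

If the files turn out to be UTF-8 already, passing "UTF-8" as the third argument should leave the bytes unchanged.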

Kind regards

robert

On Wed, Apr 18, 2012 at 10:30 PM, Jan E. <lists@ruby-forum.com> wrote:

cristian cristian wrote in post #1057250:

Ok, thanks! Will this save the new file with the name "new_csv"?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

7stud2 · 18 April 2012 22:19

Robert Klemme wrote in post #1057261:

If the file is large this can easily break because you need to read
the whole thing into memory.

I don't expect the CSVs to be *that* big. But sure, if we're talking
about hundreds of millions of entries here, you'll have to read the file
in small portions.

I'd also rather use the proper tool for
the job instead of cooking something with regexp.

Well, that's probably a question of personal preferences. I don't think
it's necessary to load a complete library for every tiny task that comes
around.

I mean: If I want to do some simple matrix calculations for example, I
don't really need a full 100 MB algebra library.

Robert_K1 · 19 April 2012 08:48

Of course you can write everything yourself. For anything other than
trivial applications it's absurd, though. Plus, even for the small
ones, using an existing library is often quicker than coding it yourself.
As always, it's a matter of tradeoffs.

Btw, your code creates two copies of the input. You could reduce
memory requirements by using String#gsub! instead of String#gsub.
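
A sketch of what that might look like, with a made-up method name convert_in_place!. One thing to watch: gsub! returns nil when nothing was replaced, so the two calls should not be chained the way the gsub calls are.

```ruby
# Hypothetical helper: apply both substitutions to the same String object.
def convert_in_place!(text)
  text.gsub!(/[^\n\r;]+/, '"\0"')  # quote every field, modifying text itself
  text.gsub!(';', ',')             # may return nil if there is no semicolon
  text                             # so return the string explicitly
end

File.open('new.csv', 'w') do |csv|
  csv.print convert_in_place!(File.read('old.csv'))
end if File.exist?('old.csv')
```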

Kind regards

robert

On Thu, Apr 19, 2012 at 12:19 AM, Jan E. <lists@ruby-forum.com> wrote:

Robert Klemme wrote in post #1057261:

I'd also rather use the proper tool for
the job instead of cooking something with regexp.

Well, that's probably a question of personal preferences. I don't think
it's necessary to load a complete library for every tiny task that comes
around.

I mean: If I want to do some simple matrix calculations for example, I
don't really need a full 100 MB algebra library.
