Converting CSV

Hello all!
I'm trying to learn Ruby and I'm using it for different tasks, and I have
no one to ask for help. So bear with me :)

I receive CSV files that are separated by semicolons and have no quotes.
Received format:

Heading1;Heading2;Heading3;
String1;String2;String3

I would like to make a script for reading all these files and converting
them to CSV files separated by commas and I would like all fields to be
quoted. If possible, also change the encoding to UTF-8 without BOM.

Desired format:

"Heading1","Heading2","Heading3"
"String1","String2","String3"

My first problem is how to parse and create a new csv.
I read that I could parse a whole csv to an array of arrays like this:

require 'fastercsv'
array_of_arrays = FasterCSV.read("myfile.csv")

But how should I use this array when creating a new file?
I would like to loop through the array.

Any suggestions?

Br
cristian

--
Posted via http://www.ruby-forum.com/.

Hi,

Why do you even want to parse the CSV? I would simply replace the
semicolons with commas and quote the strings with a regex.

Well, parsing the file and processing its content is certainly more
difficult than doing search and replace.

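File.open 'myfile.csv', 'r+' do |csv|
  new_csv = csv.read.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  # Reading leaves the position at the end of the file; rewind so the
  # converted text overwrites the original instead of being appended.
  csv.rewind
  csv.print new_csv
end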

This quotes everything between the semicolons and then replaces them
with commas.
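
To see what the two substitutions do, here is a small sketch that applies them to an in-memory string instead of a file. The method name `quote_and_comma` is only for illustration; the regex and the replacement are the ones from the snippet above.

```ruby
# Hypothetical helper name; the regex and the '"\0"' replacement come from the post.
def quote_and_comma(text)
  # First pass: wrap every run of characters that is neither a semicolon nor a
  # line break in double quotes (\0 in the replacement means the whole match).
  # Second pass: turn the semicolons into commas.
  text.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
end

puts quote_and_comma("Heading1;Heading2;Heading3\nString1;String2;String3\n")
# "Heading1","Heading2","Heading3"
# "String1","String2","String3"
```

Note that an empty field stays unquoted (`a;;b` becomes `"a",,"b"`) and a double quote inside a field is not escaped, so this only suits simple input.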

Hi Christian:

The way to loop through the array is with a Ruby iterator. The method
"each" will iterate over any collection:

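require 'fastercsv'
# The input is semicolon-separated, so tell the parser about the separator.
array_of_arrays = FasterCSV.read("myfile.csv", :col_sep => ";")

# Write every row to a new comma-separated file with all fields quoted.
FasterCSV.open("saved_file.csv", "w", :force_quotes => true) do |new_file|
  array_of_arrays.each do |item|
    new_file << item
  end
end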

It seems to me that there must be a better way to do what you're trying
to accomplish. I don't know why you'd want to mess around with CSV
files in the first place. I don't understand why the comma-delimited
CSV file is better than the original. You could just use the original.

Also, if you're just doing a simple conversion, I'd just use simple Ruby
code instead of having to learn everything about the CSV stuff:

File.open("original_file.txt", "r") do |f|
  new_file = File.open("new_file.csv", "w")
  f.each_line do |line|
    fields = line.split(";")
    fields.each { |fd| fd = "\"#{fd}\""
    new_file.write(fields.join(",")
  end
  new_file.close
end

Perhaps if you explain what you're trying to do with these files, I
could give you better advice. Ruby has many tools to save objects, like
YAML and JSON. Generally it's better not to mess around with formatting
files yourself.

Thank you!
Both examples work great!

How can I use an array of the file names in the directory when
creating the new CSVs?

I was thinking something like this:

files = Dir.glob("*.csv")

files.each do |filename|
  File.open(filename + "_converted.csv", 'w') do |csv|
    new_csv = File.read(filename).gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    csv.print new_csv
  end
end

Br
cristian

Jan E. wrote in post #1057239:

Hi,

Why do you even want to parse the CSV? I would simply replace the
semicolons with commas and quote the strings with a regex.

Ok? Sounds difficult. Could you give me an example?

Thanks.

Br
cristian

Jan E. wrote in post #1057244:

Well, parsing the file and processing its content is certainly more
difficult than doing search and replace.

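File.open 'myfile.csv', 'r+' do |csv|
  new_csv = csv.read.gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  # Reading leaves the position at the end of the file; rewind so the
  # converted text overwrites the original instead of being appended.
  csv.rewind
  csv.print new_csv
end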

This quotes everything between the semicolons and then replaces them
with commas.

Ok, thanks! Will this save the new file with the name "new_csv"?

Br
cristian

Eric C. wrote in post #1057247:

[...]

Also, if you're just doing a simple conversion, I'd just use simple ruby
code instead of having to learn everything about the CSV stuff:

File.open("original_file.txt", "r") do |f|
  new_file = File.open("new_file.csv", "w")
  f.each_line do |line|
    fields = line.split(";")
    fields.each { |fd| fd = "\"#{fd}\""
    new_file.write(fields.join(",")
  end
  new_file.close
end

Perhaps if you explain what you're trying to do with these files, I
could give you better advice. Ruby has many tools to save objects, like
YAML and JSON. Generally it's better not to mess around with formatting
files yourself.

Hi!
The files are to be read by some system that cannot handle csv files
with semicolon. The problem is that the fields might contain commas
also. So the file should be comma separated with quoted fields.

I get these error messages with your last example:

CSVConverter.rb:7: syntax error, unexpected kEND, expecting ')'
CSVConverter.rb:9: syntax error, unexpected kEND, expecting '}'
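
Both errors come from the example itself: the `{` on the `fields.each` line and the `(` on the `new_file.write` line are never closed. A corrected sketch could look like the following. The method name `convert_by_split` is hypothetical, and besides closing the brackets it uses `map` and `chomp`, which the original did not, so that the quoted values are actually written and the line ending stays out of the last field.

```ruby
# Hypothetical method name; paths are parameters instead of hard-coded names.
def convert_by_split(src_path, dst_path)
  File.open(dst_path, "w") do |out|
    File.foreach(src_path) do |line|
      # chomp removes the line ending so it does not end up inside the last field
      fields = line.chomp.split(";")
      # map builds the quoted copies; assigning to the block variable inside
      # `each` would leave the array unchanged
      quoted = fields.map { |fd| "\"#{fd}\"" }
      out.puts quoted.join(",")
    end
  end
end

# Usage (file names as in the example above):
# convert_by_split("original_file.txt", "new_file.csv")
```

`split(";")` drops trailing empty fields, so a header line that ends in a semicolon still produces three columns. Double quotes inside a field are not escaped here.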

Br
cristian

cristian cristian wrote in post #1057358:

How can I use an array of file names from the directory to use it when
creating the new csv's?

I was thinking something like this:

files = Dir.glob("*.csv")

files.each do |filename|

File.open(filename+"_converted.csv",'w') do |csv|
[...]

You should strip the ".csv" extension from filename. Otherwise, you'll
end up with names like "myfile.csv_converted.csv".

For example:

"#{filename[0...-4]}_converted.csv"

Also, it doesn't really make sense to save the array returned by
Dir.glob in files (unless you want to use it again).

Simply write it as one continuous expression:

Dir.glob("*.csv") do |file|
  ...
end
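
Putting these suggestions together, a sketch of the whole loop might look like this. The method name `convert_directory` and the check that skips already converted files are assumptions added for illustration; the substitution is the one used earlier in the thread.

```ruby
# Hypothetical helper: converts every *.csv in dir and writes
# NAME_converted.csv next to the original.
def convert_directory(dir = ".")
  Dir.glob(File.join(dir, "*.csv")) do |file|
    # Skip output from an earlier run so it is not converted a second time.
    next if file.end_with?("_converted.csv")

    # file[0...-4] strips the ".csv" extension before adding the suffix.
    target = "#{file[0...-4]}_converted.csv"
    File.open(target, "w") do |csv|
      csv.print File.read(file).gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
    end
  end
end

# Usage: convert_directory(".")
```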

cristian cristian wrote in post #1057250:

Ok, thanks! Will this save the new file with the name "new_csv"?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end

Jan E. wrote in post #1057256:

cristian cristian wrote in post #1057250:

Ok, thanks! Will this save the new file with the name "new_csv"?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end

Thanks! Works fine. I have to read a lot about regular expressions to
understand it.

I will play around a little now with an array of files in the directory.

Br
cristian

If the file is large this can easily break because you need to read
the whole thing into memory. I'd also rather use the proper tool for
the job instead of cooking something with regexp. In this case I'd do

require 'csv'

CSV.open("new.csv", "wb", col_sep: ",", force_quotes: true) do |csv_out|
  CSV.foreach("old.csv", col_sep: ";") do |rec|
    csv_out << rec
  end
end
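
The original question also asked for UTF-8 output without a BOM. A possible variation on the code above is sketched below. It assumes the input files are ISO-8859-1, because the thread never states the source encoding, and `convert_csv` is a hypothetical name.

```ruby
require 'csv'

# Hypothetical wrapper. src_encoding is an assumption: the thread never says
# which encoding the received files use.
def convert_csv(src, dst, src_encoding = "ISO-8859-1")
  # "w:UTF-8" makes the output file UTF-8.
  CSV.open(dst, "w:UTF-8", col_sep: ",", force_quotes: true) do |csv_out|
    # "external:internal" reads src_encoding from disk and hands UTF-8 strings to the block.
    CSV.foreach(src, col_sep: ";", encoding: "#{src_encoding}:UTF-8") do |rec|
      csv_out << rec
    end
  end
end

# Usage: convert_csv("old.csv", "new.csv")
```

Ruby does not write a BOM unless the program emits one itself, so the output is plain UTF-8. A header line that ends in a semicolon is parsed as having one extra empty field, which `force_quotes` writes as `""`.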

Kind regards

robert

On Wed, Apr 18, 2012 at 10:30 PM, Jan E. <lists@ruby-forum.com> wrote:

cristian cristian wrote in post #1057250:

Ok, thanks! Will this save the new file with the name "new_csv"?

No, it will overwrite the original file. If you want to write the CSV to
a new file, change the code to

File.open 'new.csv', 'w' do |csv|
  new_csv = File.read('old.csv').gsub(/[^\n\r;]+/, '"\0"').gsub(';', ',')
  csv.print new_csv
end

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote in post #1057261:

If the file is large this can easily break because you need to read
the whole thing into memory.

I don't expect the CSVs to be *that* big. But sure, if we're talking
about hundreds of millions of entries here, you'll have to read the file
in small portions.

I'd also rather use the proper tool for
the job instead of cooking something with regexp.

Well, that's probably a question of personal preferences. I don't think
it's necessary to load a complete library for every tiny task that comes
around.

I mean: If I want to do some simple matrix calculations for example, I
don't really need a full 100 MB algebra library.

Of course you can write everything yourself. For anything other than
trivial applications that's absurd, though. Plus, even for the small
ones, using an existing lib is often quicker than coding it yourself.
As always, it's a matter of tradeoffs.

Btw, your code creates two copies of the input. You could reduce
memory requirements by using String#gsub! instead of String#gsub.
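
A sketch of that suggestion, with a hypothetical method name. One thing to watch: `gsub!` returns `nil` when nothing was replaced, so the two calls should not be chained the way the `gsub` calls are.

```ruby
# Hypothetical helper; it modifies the string it is given instead of
# returning new String objects.
def quote_and_comma!(text)
  text.gsub!(/[^\n\r;]+/, '"\0"') # may return nil, so do not chain on it
  text.gsub!(';', ',')
  text
end

data = "String1;String2;String3\n".dup # dup so the literal itself is not mutated
quote_and_comma!(data)
puts data
```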

Kind regards

robert

On Thu, Apr 19, 2012 at 12:19 AM, Jan E. <lists@ruby-forum.com> wrote:

Robert Klemme wrote in post #1057261:

I'd also rather use the proper tool for
the job instead of cooking something with regexp.

Well, that's probably a question of personal preferences. I don't think
it's necessary to load a complete library for every tiny task that comes
around.

I mean: If I want to do some simple matrix calculations for example, I
don't really need a full 100 MB algebra library.
