How to improve this code?

Hi guys,
I am new to the Ruby world (I am coming from Java), and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values, and print the output.

Basically the CSV looks like this:
bla@bla.com,value1
bla@bla.com,value2
bla@bla.com,value3
bla@bla.com,value4
ruby@ruby.br,value1
ruby@ruby.br,value2

The output should be two lines:
bla@bla.com,value1,value2,value3,value4
ruby@ruby.br,value1,value2

My initial thought was to store the values in a Hash object, where the
KEY is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already exists in the
Hash; if so, I update the Array, otherwise I create a new entry in the
Hash.

The code is below:

h = Hash.new
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.chomp.split(",")  # chomp drops the trailing newline from the last field
  email = values[0]
  content = values[1]
  if h.key?(email)
    l = h[email]
    l.push content
    h[email] = l
  else
    l = [content]
    h[email] = l
  end
end

I left out the code that prints the Hash. I also didn't wrap the code
above in a class because it is just a test.

Could anyone look at my code and give me some comments? How could the
code above be improved?

Thanks in advance.

Junior


--
Posted via http://www.ruby-forum.com/.
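The question leaves out the code that prints the Hash. A minimal sketch of that step, assuming the Hash maps each email to an Array of already-chomped values, as the loop above builds it; the helper name `format_groups` and the sample data are hypothetical:

```ruby
# Hypothetical helper: turn {email => [values]} into "email,v1,v2,..." lines.
def format_groups(groups)
  groups.map { |email, values| [email, *values].join(",") }
end

# Sample hash shaped like the one the loop above builds (values already chomped).
h = {
  "bla@bla.com"  => ["value1", "value2", "value3", "value4"],
  "ruby@ruby.br" => ["value1", "value2"]
}

puts format_groups(h)   # puts prints each Array element on its own line
```

This prints the two lines shown in the question. If the values still carried their trailing newlines (no `chomp`), the joined lines would be broken up by stray line breaks.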

Try this:

h = Hash.new do |hash, key|
  hash[key] = []
end

IO.foreach('data.txt') do |line|
  data = line.chomp.split(',')
  h[data[0]] << data[1]
end

p h

--output:--
{"ruby@ruby.br"=>["value1", "value2"], "bla@bla.com"=>["value1",
"value2", "value3", "value4"]}


Hm, I think that we Ruby programmers prefer "<<" over "push" (it's more
concise and less typing), so:

h[email] << content


--
Radosław Bułat

http://radarek.jogger.pl - my blog
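`h[email] << content` only works if `h[email]` already holds an Array, which is what the `Hash.new do |hash, key| ... end` block in the earlier reply provides. A short sketch contrasting that block form with `Hash.new([])`, a common look-alike that hands back one shared default Array for every missing key and never stores it:

```ruby
# Pitfall: Hash.new([]) returns the *same* Array for every missing key,
# and << mutates that shared default without storing anything in the hash.
shared = Hash.new([])
shared["a"] << 1
shared["b"] << 2
p shared.keys     # => []
p shared["zzz"]   # => [1, 2]

# The block form runs once per missing key and stores a fresh Array under it.
fresh = Hash.new { |hash, key| hash[key] = [] }
fresh["a"] << 1
fresh["b"] << 2
p fresh.keys      # => ["a", "b"]
p fresh["a"]      # => [1]
```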

On Wed, 23 Jan 2008, Jair Rillo Junior wrote:

Hi guys,
I am new to the Ruby world (I am coming from Java), and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values, and print the output.

Probably the most useful first thought in Ruby is... "Nah, I bet
it's in the standard library somewhere, better check ruby-doc.org"

From standard library module 'csv'

   # Open a CSV formatted file for reading or writing.
   #
   # For reading.
   #
   # EXAMPLE 1
   # CSV.open('csvfile.csv', 'r') do |row|
   #   p row
   # end
   #
   # EXAMPLE 2
   # reader = CSV.open('csvfile.csv', 'r')
   # row1 = reader.shift
   # row2 = reader.shift
   # if row2.empty?
   #   p 'row2 not find.'
   # end
   # reader.close
   #
   # ARGS
   # filename: filename to parse.
   # col_sep: Column separator. ?, by default. If you want to separate
   # fields with semicolon, give ?; here.
   # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
   # want to separate records with \r, give ?\r here.
   #
   # RETURNS
   # reader instance. To get parse result, see CSV::Reader#each.
   #
   # For writing.
   #
   # EXAMPLE 1
   # CSV.open('csvfile.csv', 'w') do |writer|
   #   writer << ['r1c1', 'r1c2']
   #   writer << ['r2c1', 'r2c2']
   #   writer << [nil, nil]
   # end
   #
   # EXAMPLE 2
   # writer = CSV.open('csvfile.csv', 'w')
   # writer << ['r1c1', 'r1c2'] << ['r2c1', 'r2c2'] << [nil, nil]
   # writer.close
   #
   # ARGS
   # filename: filename to generate.
   # col_sep: Column separator. ?, by default. If you want to separate
   # fields with semicolon, give ?; here.
   # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
   # want to separate records with \r, give ?\r here.
   #
   # RETURNS
   # writer instance. See CSV::Writer#<< and CSV::Writer#add_row to know how
   # to generate CSV string.
   #

My initial thought was to store the values in a Hash object, where the
KEY is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already exists in the
Hash; if so, I update the Array, otherwise I create a new entry in the
Hash.

My favourite idiom is...
   require 'set'
   h = Hash.new{|hash,key| hash[key] = Set.new}

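then, in the loop...
   values = lines.chomp.split(",")
   email = values.shift       # first column; the remaining columns stay in values
   h[email].merge(values)     # add every remaining column to this email's Set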

Ooh... That's just sooo pretty!

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand
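A runnable version of the Set idiom, fed from an in-memory string instead of a file so it is self-contained (the sample data, with a deliberately repeated `value1` row, is hypothetical). Unlike an Array, a Set silently drops a value that repeats for the same email, which may or may not be what you want:

```ruby
require 'set'

# In-memory stand-in for the CSV file (hypothetical data); "value1" repeats.
TEXT = <<~EOS
  bla@bla.com,value1
  bla@bla.com,value2
  bla@bla.com,value1
  ruby@ruby.br,value1
EOS

h = Hash.new { |hash, key| hash[key] = Set.new }
TEXT.each_line do |line|
  values = line.chomp.split(",")
  email = values.shift     # first column is the key; the rest stay in values
  h[email].merge(values)   # add the remaining columns to this email's Set
end

h.each { |email, set| puts [email, *set].join(",") }
```

Here `bla@bla.com` ends up with only two values, because the second `value1` is absorbed by the Set.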

The others have shown you how to create a Hash with a block to provide
a default value. Another way to program this is to say:

h = {}
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.chomp.split(",")
  (h[values[0]] ||= []) << values[1]
end

...or the equivalent using one of the CSV libraries.

Clifford Heath.
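`(h[k] ||= []) << v` works because `h[k] ||= []` behaves like `h[k] || (h[k] = [])`: the empty Array is stored only when the key is missing (or maps to nil/false), and the expression returns the Array either way. A small sketch with a plain Hash and no default block; the method name `group_rows` and the sample rows are hypothetical:

```ruby
# (h[k] ||= []) behaves like (h[k] || (h[k] = [])): assign only when the key is missing.
def group_rows(lines)
  h = {}                                 # plain Hash, no default block
  lines.each do |line|
    email, value = line.chomp.split(",")
    (h[email] ||= []) << value
  end
  h
end

rows = ["bla@bla.com,value1\n", "bla@bla.com,value2\n", "ruby@ruby.br,value1\n"]
p group_rows(rows).keys   # => ["bla@bla.com", "ruby@ruby.br"]
```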

My initial thought was just to output them plainly :)
My stupid example follows:

botp@pc4all:~$ cat test.rb
v0=nil
File.open("test.txt").each_line do |lines|
  values = lines.chomp.split(",")
  if v0 != values[0]
    puts unless v0.nil?
    v0 = values[0]
    print v0
  end
  print ",",values[1]
end

botp@pc4all:~$ ruby test.rb
bla@bla.com,value1,value2,value3,value4
ruby@ruby.br,value1,value2
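This loop only works because all rows for one email are adjacent in the file; if an email reappeared further down, it would start a second output line. A sketch of the same streaming idea with `Enumerable#chunk` (available since Ruby 1.9.2, not something used in the thread), which makes that adjacency assumption explicit; the method name `runs` is hypothetical:

```ruby
# Group *consecutive* rows by email and build one output line per run.
# Like the loop above, this assumes rows for one email are adjacent.
def runs(lines)
  lines.map { |line| line.chomp.split(",") }
       .chunk { |email, _value| email }
       .map { |email, rows| [email, *rows.map { |_email, value| value }].join(",") }
end

input = ["bla@bla.com,value1\n", "bla@bla.com,value2\n",
         "ruby@ruby.br,value1\n", "ruby@ruby.br,value2\n"]
puts runs(input)
```

Given `a,1`, `b,2`, `a,3`, it produces three lines rather than merging the two `a` rows, which is exactly the limitation of the streaming approach.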


On Jan 23, 2008 9:05 AM, Jair Rillo Junior <jrjuniorsp@yahoo.com.br> wrote:

My initial thought was to store the values in a Hash object, where the

awk -F, '{a[$1]=a[$1] FS $2} END{for(k in a)print k a[k]}' file
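The awk program keeps, for each first field, a string that grows by `FS` (a comma here) plus the second field, and prints key and string at the end. A rough Ruby rendering of the same idea (the method name `awk_style` is hypothetical). `Hash.new("")` is safe here, unlike `Hash.new([])` with `<<`, because `+=` stores a new String instead of mutating the shared default:

```ruby
# Ruby rendering of: awk -F, '{a[$1]=a[$1] FS $2} END{for(k in a)print k a[k]}'
def awk_style(lines)
  acc = Hash.new("")                 # missing keys read as "" (the default is never mutated)
  lines.each do |line|
    key, value = line.chomp.split(",")
    acc[key] += "," + value          # a[$1] = a[$1] FS $2; += stores a new String
  end
  acc.map { |key, rest| key + rest } # print k a[k]
end

puts awk_style(["bla@bla.com,value1\n", "bla@bla.com,value2\n", "ruby@ruby.br,value1\n"])
```

One difference: awk's `for (k in a)` visits keys in an unspecified order, while a Ruby Hash iterates in insertion order.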


On Jan 22, 7:05 pm, Jair Rillo Junior <jrjunio...@yahoo.com.br> wrote:

Basically the CSV looks like this:
b...@bla.com,value1
b...@bla.com,value2
b...@bla.com,value3
b...@bla.com,value4
r...@ruby.br,value1
r...@ruby.br,value2

The output should be two lines:
b...@bla.com,value1,value2,value3,value4
r...@ruby.br,value1,value2

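You can try this code:

#!/usr/bin/env ruby

require "csv"

# Each email maps to an Array, created the first time that email is seen.
hash = Hash.new { |h, key| h[key] = [] }

# CSV.foreach yields each parsed row as an Array of fields.
CSV.foreach("file.csv") do |row|
  hash[row[0]] << row[1]
end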

Good luck

Stephane

John Carter wrote:

Probably the most useful first thought in Ruby is... "Nah, I bet
it's in the standard library somewhere, better check ruby-doc.org"

I disagree with that. In my experience if you can avoid the Ruby
Standard library, you will save yourself a lot of headaches because it
is so poorly documented, and your code will probably run faster as well.

From standard library module 'csv'

In particular, the csv module is so inefficient that James Gray wrote his
own module and called it fastercsv.


Hey guys, so many ways to do it!!!

I didn't know about the csv library, or about the <<
operator.

Thank you very much guys!!


Jair Rillo Junior wrote:

Hey guys, so many ways to do it!!!

I didn't know about the csv library, or about the <<
operator.

Yeah, it's actually a method: Array#<<

It seems to be preferred over Array#push, though both return the array
itself, so you can chain a series of appends together:

foo = [1,2]
=> [1, 2]

foo << 3 << 4
=> [1, 2, 3, 4]

foo.push(5,6)
=> [1, 2, 3, 4, 5, 6]

foo.push(7).push(8)
=> [1, 2, 3, 4, 5, 6, 7, 8]

Regards,
Lee
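A small sketch of the difference the session above hints at: both methods modify the receiver and return it (which is what makes chaining work), but `<<` takes exactly one object, while `push` accepts any number of arguments:

```ruby
foo = [1, 2]
same = (foo << 3)       # << appends exactly one object and returns the receiver
puts same.equal?(foo)   # prints "true": the very same Array, which is why chaining works
foo.push(4, 5)          # push appends each argument in order
foo << [6, 7]           # an Array passed to << is appended as one nested element
p foo                   # => [1, 2, 3, 4, 5, [6, 7]]
```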


On Jan 23, 2008, at 12:05 AM, 7stud -- wrote:

In particular, the csv module is so inefficient that James Gray wrote his
own module and called it fastercsv.

Most of the time, though, it's better to look for standard libraries, or at least good third-party libraries, than to reinvent the wheel. When one reinvents the wheel, one usually gets it wrong.

David Morton
Maia Mailguard http://www.maiamailguard.com
mortonda@dgrmm.net

David Morton wrote:

In particular, the csv module is so inefficient that James Gray wrote his
own module and called it fastercsv.

Most of the time, though, it's better to look for standard libraries, or at
least good third-party libraries, than to reinvent the wheel. When one
reinvents the wheel, one usually gets it wrong.

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.


On Jan 23, 2008, at 1:21 PM, 7stud -- wrote:

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.

If you think parsing a CSV file is as simple as splitting on a comma, you need to think again.

Look up RFC 4180. It's not a hard format, but it *is* more than just "foo,bar".split(',').

It's enough code that I'd rather use an existing library than waste a ridiculous amount of time doing it (correctly) myself.

David Morton
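To make the RFC 4180 point concrete: a quoted field may itself contain a comma, and a plain `split` tears it apart. A sketch comparing `String#split` with `CSV.parse_line` from the csv library; the sample line is made up:

```ruby
require "csv"

# Hypothetical sample row whose first field contains a comma inside quotes.
line = %q{"Smith, John",value1}

naive  = line.split(",")        # String#split knows nothing about quoting
parsed = CSV.parse_line(line)   # CSV follows the quoting rules

p naive    # => ["\"Smith", " John\"", "value1"]
p parsed   # => ["Smith, John", "value1"]
```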

I used the standard CSV class and James' FasterCSV with Ruby 1.8.x, and
James' solution is much faster. If I'm not mistaken, FasterCSV
replaced the 1.8 class in Ruby 1.9.
Just `gem install fastercsv`, then google for it. You'll find a lot of
good explanations, and the docs were mostly good enough for me.

On Jan 23, 2008, at 2:41 PM, Thomas Wieczorek wrote:

If I'm not mistaken, FasterCSV replaced the 1.8 class in Ruby 1.9.

Correct. In Ruby 1.9, when you `require "csv"` you are getting the FasterCSV code under its new name.

James Edward Gray II
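Since `require "csv"` loads the FasterCSV code in Ruby 1.9 and later, the whole original task can lean on it. A sketch assuming the data has exactly two columns; it parses a hypothetical in-memory sample with `CSV.parse` so it runs without a file. With the real file, `CSV.foreach("Sector_brand.csv") { |email, value| ... }` would take the place of the `CSV.parse` call:

```ruby
require "csv"

# Hypothetical in-memory data standing in for Sector_brand.csv.
SAMPLE = <<~EOS
  bla@bla.com,value1
  bla@bla.com,value2
  ruby@ruby.br,value1
  ruby@ruby.br,value2
EOS

# Group the second column under the first; the default block supplies fresh Arrays.
groups = Hash.new { |hash, key| hash[key] = [] }
CSV.parse(SAMPLE) { |email, value| groups[email] << value }

# Array#to_csv (added by require "csv") quotes fields if needed and ends each row with "\n".
output = groups.map { |email, values| [email, *values].to_csv }.join
puts output
```

Writing the result back with `to_csv` rather than `join(",")` means a value containing a comma would come out properly quoted.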