Output unique values in CSV columns to a text file

What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
   UniqueVal1
   UniqueVal2
Column: ColHeader2
   UniqueVal1
   UniqueVal2
...

What I'm currently getting is output that looks as follows:

Column: ColHeader1

   ColHeader1UniqueVal1
   ColHeader1UniqueVal2
Column: ColHeader2

   ColHeader2UniqueVal1
   ColHeader2UniqueVal2
...

For some reason, it is appending the column header to each value and
also printing a blank row to start each column. My code is below. Any
help is much appreciated. Essentially I read the CSV into a hash where
the key is the column header and the element is an array of values from
that column. I then run .uniq! on each array in the hash and print the
results to a file.

require 'rubygems'
require 'faster_csv'

infile = "xyz.csv"

uniques = {}

FCSV.open(infile, :headers => true).each do |row|
  row.each_with_index do |element,j|
    uniques[row.headers[j]] ||= []
    uniques[row.headers[j]] << element
  end
end

uniques.each do |key,element|
  element.uniq!
end

File.open("unique_output.txt","w+") do |out|
  uniques.each_key do |key|
    out.write "Column: #{key}\n"
    uniques[key].each do |element|
      out.write " #{element}\n"
    end
  end
end
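
For what it's worth, the fused header text is probably explained by how a row iterates: each element handed to the block is a [header, field] pair, not a bare field, and Ruby 1.8 printed an interpolated array by joining its elements. Below is a minimal sketch, assuming Ruby's standard csv library (which grew out of FasterCSV) in place of faster_csv. SAMPLE and collect_uniques are made-up names for illustration only.

```ruby
require "csv" # standard library, assumed here as a stand-in for faster_csv

# Hypothetical sample data standing in for xyz.csv.
SAMPLE = "color,size\nred,S\nblue,S\nred,M\n"

# Build { header => [unique values] } from CSV text.
def collect_uniques(csv_text)
  uniques = {}
  CSV.parse(csv_text, headers: true).each do |row|
    # Iterating a row yields [header, field] pairs, so destructure them
    # instead of indexing back into row.headers.
    row.each do |header, value|
      (uniques[header] ||= []) << value
    end
  end
  uniques.each_value(&:uniq!)
  uniques
end

# What each_with_index actually passes as "element" for the first row:
first_row = CSV.parse(SAMPLE, headers: true).first
pairs = first_row.each_with_index.map { |element, _j| element }
p pairs
p collect_uniques(SAMPLE)
```

With the pair destructured, only the field value lands in each array, so the header no longer shows up in front of it.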

--
Posted via http://www.ruby-forum.com/.

Well, if it all fits in memory it's super easy using FCSV's Tables:

#!/usr/bin/env ruby -w

require "rubygems"
require "faster_csv"

table = FCSV.parse(DATA.read, :headers => true)
table.by_col!.each do |header, col|
   puts "#{header}:"
   puts " #{col.uniq.join(', ')}"
end

__END__
nums,letters
1,a
2,b
3,c
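
To get the exact "Column:" layout asked for, the same table idea can be sketched roughly as follows. This assumes Ruby's standard csv library rather than faster_csv, and unique_report is a hypothetical helper name, not part of either library.

```ruby
require "csv" # standard library, assumed here as a stand-in for faster_csv

# Return the report as one string: a "Column: name" line per column,
# followed by that column's unique values, indented three spaces.
def unique_report(csv_text)
  table = CSV.parse(csv_text, headers: true)
  lines = []
  # In column mode, each yields a [header, values] pair per column.
  table.by_col.each do |header, column|
    lines << "Column: #{header}"
    column.uniq.each { |value| lines << "   #{value}" }
  end
  lines.join("\n") + "\n"
end

report = unique_report("nums,letters\n1,a\n2,b\n1,c\n")
puts report
```

Writing the result out is then a one-liner such as File.write("unique_output.txt", report). As before, the whole file has to fit in memory.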

James Edward Gray II

On Dec 18, 2006, at 4:04 PM, Drew Olson wrote:

What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
   UniqueVal1
   UniqueVal2
Column: ColHeader2
   UniqueVal1
   UniqueVal2
...

Drew Olson wrote:

What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
   UniqueVal1
   UniqueVal2
Column: ColHeader2
   UniqueVal1
   UniqueVal2
...

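# Split each line of the DATA section into fields (naive: no quoted commas).
data = DATA.readlines.map{|s| s.chomp.split(",")}
# The first row holds the column names.
header = data.shift.map{|s| "Column: " + s}

# Turn rows into columns, then de-duplicate and indent each column's values.
data = data.transpose.map{|ary| ary.uniq.map{|s| " " + s} }

# puts flattens the nested [header, values] pairs, printing one item per line.
puts header.zip(data)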

__END__
It's,so,simple!
a,b,c
a,b,c
d,e,f
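
One caveat with splitting on "," by hand: a quoted field that contains a comma gets cut in two. Here is a hedged variant of the same transpose idea, assuming Ruby's standard csv library does the parsing and that every row has the same number of fields (transpose raises an error otherwise). unique_lines is a made-up name for illustration.

```ruby
require "csv" # standard library, assumed available

# Return the report as an array of lines, using a real CSV parser
# so that quoted fields containing commas stay intact.
def unique_lines(csv_text)
  rows = CSV.parse(csv_text) # array of arrays, header row first
  header = rows.shift.map { |name| "Column: #{name}" }
  # Rows become columns; each column is de-duplicated and indented.
  columns = rows.transpose.map { |col| col.uniq.map { |v| "   #{v}" } }
  header.zip(columns).flatten
end

lines = unique_lines(%(name,place\nAnn,"Paris, FR"\nBob,"Paris, FR"\n))
puts lines
```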