FasterCSV - varying headers

Hello,

Quick warning: I am very much a Ruby newbie and am extremely new to
programming in general.

I'm attempting to build a little program that operates on a large CSV
file (potentially 100,000+ lines), but the challenge is that while I
will have a couple of required columns, I must provide some naming
flexibility, as it is unlikely that the user will be able to match my
headers word for word in every case. As such, my goal is to provide an
interface that asks what each header should represent and then treat the
user's headers as if they followed my original specifications exactly.

For example, let's say that I require the following columns: Product
Title, Product Price. If the user were to provide me with the headers
worded as Product Name and Product Pricing, I would want to assign
'Product Name' to represent 'Product Title.'
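A plain Hash keyed by the user's wording is one way to record that kind of mapping. This is a minimal sketch; `HEADER_MAP` and `canonical_header` are hypothetical names for illustration, not part of FasterCSV:

```ruby
# Hypothetical mapping from the user's header wording (keys) to the
# headers the program expects (values).
HEADER_MAP = {
  "Product Name"    => "Product Title",
  "Product Pricing" => "Product Price"
}

# Return the expected header for a user-supplied header, or the
# (whitespace-trimmed) header unchanged when no mapping exists for it.
def canonical_header(user_header, map = HEADER_MAP)
  name = user_header.strip
  map.fetch(name, name)
end

user_headers = ["Product Name", "Product Pricing", "Notes"]
p user_headers.map { |h| canonical_header(h) }
# prints ["Product Title", "Product Price", "Notes"]
```

Headers with no entry pass through unchanged, so optional columns survive the rename.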

I suspect that throwing the headers into a hash would be ideal, but
I'm not entirely sure how to approach it. Here is an excerpt from my
attempt thus far...

require "rubygems"
require "fastercsv"

class HeaderProcessing
  attr_accessor :file
  attr_accessor :headers
  attr_accessor :clientid
  attr_accessor :product_title_header, :product_price_header

  def initialize
    puts "What is the client ID?"
    @clientid = gets.chomp
    open_file
  end

  def open_file
    infile  = "tobeprocessed/#{@clientid}.csv"
    outfile = "tobeprocessed/#{@clientid}_out.csv"
    # Not sure if read is the best approach here, since some files
    # could get quite large.
    # Note: :header_converters => :symbol turns a header such as
    # "Product Name" into the Symbol :product_name.
    csv = FasterCSV.read(infile, :headers           => true,
                                 :return_headers    => true,
                                 :header_converters => :symbol)
    puts "The user's headers are "
    puts csv.headers.inspect
    puts "\n \n Please enter the user supplied Product Title header"
    @product_title_header = gets.chomp
    puts "\n \n Please enter the user supplied Product Price"
    @product_price_header = gets.chomp
    # I do this with each required and optional header. Not very DRY for
    # now...
    # I now have each of the user's headers I intend to use in a number of
    # instance variables.
    # placeholder for user product data clean up
    File.open(outfile, "w") { |f| f.puts csv }
  end
end

queued = HeaderProcessing.new
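The repeated prompts could be driven by a list of required headers, collecting the answers into a single Hash instead of one instance variable per header. A minimal sketch, assuming the headers are plain Strings (that is, read without the `:symbol` converter); `REQUIRED_HEADERS` and `ask_for_header_map` are hypothetical names, and the IO parameters exist only so the prompts can be fed from a StringIO:

```ruby
require "stringio"

# Hypothetical list of the columns the program needs; extend as required.
REQUIRED_HEADERS = ["Product Title", "Product Price"]

# Ask once per required header which of the user's headers supplies it,
# re-asking until the answer matches one of the user's headers.
# Returns a Hash of { user_header => expected_header }.
def ask_for_header_map(user_headers, input = $stdin, output = $stdout)
  map = {}
  REQUIRED_HEADERS.each do |expected|
    loop do
      output.puts "The user's headers are: #{user_headers.inspect}"
      output.print "Which header holds #{expected}? "
      answer = input.gets or raise "No input for #{expected}"
      answer = answer.strip
      if user_headers.include?(answer)
        map[answer] = expected
        break
      end
      output.puts "#{answer.inspect} is not one of the headers; try again."
    end
  end
  map
end

# Simulated answers: a valid header, a typo, then the correct header.
fake_input = StringIO.new("Product Name\nPricing\nProduct Pricing\n")
p ask_for_header_map(["Product Name", "Product Pricing"], fake_input, StringIO.new)
```

Adding an optional column then only means adding one entry to the list rather than another prompt and instance variable.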

If I understand FasterCSV correctly, by setting :headers to true, the
CSV file was read as a table object (FasterCSV::Table). Is it possible to
turn the table's headers into a hash and then set each key/value to the
appropriate variable (as per @product_title_header etc.)? If so, how? I've
been rummaging through the FasterCSV docs that I believe pertain to the
question, but I'm a bit lost on the actual implementation.
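One way to apply such a Hash is through `:header_converters`, which accepts a lambda as well as a built-in name like `:symbol`. A sketch assuming Ruby 1.9 or later, where FasterCSV became the standard library's `CSV` class; the `header_map` contents and sample data are made up for the example:

```ruby
require "csv"  # Ruby 1.9+: the former FasterCSV, now the standard CSV class

# Assumed to come from asking the user, e.g. { user_header => expected_header }.
header_map = { "Product Name" => "Product Title", "Product Pricing" => "Product Price" }

data = "Product Name,Product Pricing\nAgricola,$55.99\nDominion,$35.99\n"

# The converter runs once per header cell, so every row ends up keyed by
# the expected header names instead of the user's wording.
rename = lambda { |h| header_map.fetch(h, h) }
table = CSV.parse(data, :headers => true, :header_converters => rename)

p table.headers
table.each { |row| puts "#{row["Product Title"]} costs #{row["Product Price"]}" }
```

With the headers renamed at parse time, the rest of the program can refer to `row["Product Title"]` regardless of what the client called the column.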

Is it also feasible to save these hash definitions to a separate file so
that I won't have to go through the same process when/if the user
provides a new file with updated prices? Alternatively, if there's a
more appropriate way to tackle this, I'm all ears.
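Saving the Hash per client looks feasible with YAML from the standard library, which round-trips a Hash of Strings. A sketch assuming one file per client ID in some directory; `save_header_map`, `load_header_map`, and the file naming are hypothetical:

```ruby
require "yaml"
require "tmpdir"

# Hypothetical layout: one YAML file per client, e.g. <dir>/<client_id>.yml.
def map_path(dir, client_id)
  File.join(dir, "#{client_id}.yml")
end

# Write the { user_header => expected_header } Hash for a client.
def save_header_map(dir, client_id, map)
  File.open(map_path(dir, client_id), "w") { |f| f.write(map.to_yaml) }
end

# Return the saved Hash, or nil when this client has not been set up yet
# (in which case the program would fall back to asking the user).
def load_header_map(dir, client_id)
  path = map_path(dir, client_id)
  File.exist?(path) ? YAML.load_file(path) : nil
end

Dir.mktmpdir do |dir|
  save_header_map(dir, "acme", { "Product Name" => "Product Title" })
  p load_header_map(dir, "acme")
  p load_header_map(dir, "unknown")   # prints nil
end
```

When a client sends an updated price file, loading its saved map skips the prompts entirely.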

Thanks in advance!
Inf


--
Posted via http://www.ruby-forum.com/.

I wrote a Rails plugin that does this type of translation between
user-supplied columns and expected columns. It is specific to Rails,
but you might be able to get some ideas from it.

Andrew Timberlake
http://ramblingsonrails.com

http://MyMvelope.com - The SIMPLE way to manage your savings


On Thu, Oct 1, 2009 at 10:09 AM, Sean Mcknew <smcknew@gmail.com> wrote:


> Hello,

Hello.

> I'm attempting to build a little program that operates on a large csv
> file (potentially 100,000+ lines), but the challenge is that while I
> will have a couple required columns, I must provide some naming
> flexibility as it is unlikely that the user will be able to match my
> headers word for word in every case. As such, my goal is to provide an
> interface that asks what each header should represent and then treat the
> user's headers as if they followed my original specifications exactly.

> Alternatively, if there's a more appropriate way to tackle this, I'm all ears.

I have some ideas.

First, let's talk about the matching headers problem. Coming up with everything a user might think of to type in sounds hard to me. What if we showed the user which headers are available instead and had them pick from a list? It seems like that would be easier and more accurate.
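The pick-from-a-list idea also fits the original goal of mapping the user's columns onto the expected ones: for each expected column, show the user's remaining headers as numbered choices. A sketch with hypothetical names; input and output are parameters only so the menu can be exercised with StringIO:

```ruby
require "stringio"

# For each expected column, list the user's unused headers by number and
# record the pick. Returns { user_header => expected_column }.
def pick_headers(expected_columns, user_headers, input = $stdin, output = $stdout)
  map = {}
  expected_columns.each do |expected|
    choices = user_headers - map.keys # hide headers already assigned
    loop do
      choices.each_with_index { |h, i| output.puts "#{i + 1}: #{h}" }
      output.print "Which column is #{expected}? "
      line = input.gets or raise "No input for #{expected}"
      n = Integer(line.strip) rescue nil # non-numbers become nil
      if n && n >= 1 && n <= choices.size
        map[choices[n - 1]] = expected
        break
      end
      output.puts "Invalid selection."
    end
  end
  map
end

# Simulated answers: 2 for the price, an out-of-range 9, then 1 for the title.
answers = StringIO.new("2\n9\n1\n")
p pick_headers(["Product Price", "Product Title"],
               ["Product Name", "Product Pricing", "Rating"], answers, StringIO.new)
```

Because the user only types numbers, a misspelled header can never end up in the map.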

My other thought is that it looks like you are slurping the whole file into memory just to write it all back out. Why don't we just read a line, fix it, write it out, and move on to the next line? That should take less memory.
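The row-at-a-time approach can be combined with the header renaming from the original question. A sketch assuming Ruby 1.9+ (`CSV`, formerly FasterCSV) and a `header_map` Hash of `{ user_header => expected_header }`; `rename_headers` and the file names are hypothetical:

```ruby
require "csv"    # Ruby 1.9+: the former FasterCSV
require "tmpdir"

# Stream infile to outfile one row at a time, renaming headers through
# header_map. Only the current row is held in memory, unlike CSV.read.
def rename_headers(infile, outfile, header_map)
  rename = lambda { |h| header_map.fetch(h, h) }
  CSV.open(outfile, "w") do |out|
    wrote_headers = false
    CSV.foreach(infile, :headers => true, :header_converters => rename) do |row|
      unless wrote_headers
        out << row.headers # already renamed by the converter
        wrote_headers = true
      end
      out << row.fields
    end
  end
end

Dir.mktmpdir do |dir|
  infile  = File.join(dir, "client.csv")
  outfile = File.join(dir, "client_out.csv")
  File.open(infile, "w") { |f| f.write("Product Name,Product Pricing\nAgricola,$55.99\n") }
  rename_headers(infile, outfile,
                 { "Product Name" => "Product Title", "Product Pricing" => "Product Price" })
  puts File.read(outfile)
end
```

Memory use should stay roughly flat regardless of file size. One limitation of this sketch: a file containing only a header line produces an empty output file.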

Here's some example code combining these thoughts:

   $ cat products.csv
   Product Title,Product Price,Product Rating
   Agricola,$55.99,4.5
   Dominion,$35.99,5
   Pandemic,$27.99,4.75
   $ ruby csv_transfer.rb products.csv
   1: Product Title
   2: Product Price
   3: Product Rating
   d: Done

   Column to include: 1
   Added Product Title.
   2: Product Price
   3: Product Rating
   d: Done

   Column to include: 2
   Added Product Price.
   3: Product Rating
   d: Done

   Column to include: d
   $ cat products_new.csv
   Product Title,Product Price
   Agricola,$55.99
   Dominion,$35.99
   Pandemic,$27.99
   $ cat csv_transfer.rb
   #!/usr/bin/env ruby -wKU

   require "rubygems"
   require "faster_csv"

   file = ARGV.shift or abort "USAGE: #{$PROGRAM_NAME} CSV_FILE"
   columns = [ ]
   FCSV.open("#{File.basename(file, '.csv')}_new.csv", "w") do |csv|
     FCSV.foreach(file, :headers => true) do |row|
       # The following is a simple menu selection for columns.
       if columns.empty?
         loop do
           choices = { }
           row.headers.each_with_index do |column, i|
             unless columns.include? column
               n = i + 1
               puts "#{n}: #{column}"
               choices[n] = column
             end
           end
           puts "d: Done"
           puts
           print "Column to include: "
           choice = gets or break
           if column = choices[choice.strip.to_i]
             columns << column
             puts "Added #{column}."
           elsif choice =~ /\Ad(?:one)?\Z/i
             break
           else
             puts "Invalid column selection."
           end
         end
         if columns.empty?
           puts "No columns selected."
           exit
         end
         csv << columns
       end

       # Copy only the selected columns.
       csv << columns.map { |column| row[column] }
     end
   end

   __END__

Hope that helps.

James Edward Gray II


On Oct 1, 2009, at 3:09 AM, Sean Mcknew wrote:

Good choices for the example (the ratings are out of 5, right?) :)

Jesus.


On Thu, Oct 1, 2009 at 3:56 PM, James Edward Gray II <james@graysoftinc.com> wrote:

> $ cat products.csv
> Product Title,Product Price,Product Rating
> Agricola,$55.99,4.5
> Dominion,$35.99,5
> Pandemic,$27.99,4.75

Andrew: A Rails version was definitely in the pipeline on this end, so
you will have saved me quite a bit of time. Thanks for sharing the
plugin!

James: I very much appreciate the assistance. I suspect I'll learn
quite a bit as I experiment with the example code you've posted.
Thanks!

Regards,
S


Absolutely. I'm glad someone appreciated the examples. ;)

James Edward Gray II


On Oct 1, 2009, at 9:27 AM, Jesús Gabriel y Galán wrote:
