Help with refactor and db/ar advice

Hi all,
I have a number of delimited data files that I would like to import
into MySQL. I am using ActiveRecord to create the schema and do the
import of the data. A code snippet follows showing how I do this for
one of the many files; however, the code bloats when I repeat it for
all of them. Now, of course, I can't get around creating a schema for
each table, but the function that loads the data seems a bit redundant.
So, my end goal in this post is just to learn a few things and write
some cleaner code, so I would love your comments.

My questions are:
1. Is there a way to take advantage of the schema so that I don't have
to re-write it in the load function?
2. Is there a way I can write a generic load function (taking
advantage of #1 if available, or not if not).
3. What is the "better" way (than [1..-2]) to strip a string of a
starting/ending ~?
4. Should I just use an AR-generated primary key rather than the data
set's ASCII primary key? (Many of the tables in the datasets have
composite keys, which I will not implement, so I am curious whether I
should just create all of the primary keys through AR instead.)
5. Any other advice?

Thanks in advance!

===========> CURRENT CODE <=============

  class FoodDescription < ActiveRecord::Base
  end

  def create_food_descriptions
    ActiveRecord::Schema.define do
      create_table(:food_descriptions, :primary_key => 'NDB_No',
                   :options => 'DEFAULT CHARSET=utf8') do |table|
        table.column :NDB_No,      :string, :limit => 5,   :null => false
        table.column :FdGrp_Cd,    :string, :limit => 4,   :null => false
        table.column :Long_Desc,   :string, :limit => 200, :null => false
        table.column :Shrt_Desc,   :string, :limit => 60,  :null => false
        table.column :CommName,    :string, :limit => 100, :null => true
        table.column :ManufacName, :string, :limit => 65,  :null => true
        table.column :Survey,      :string, :limit => 1,   :null => true
        table.column :Ref_desc,    :string, :limit => 135, :null => true
        table.column :Refuse,      :string, :limit => 2,   :null => true
        table.column :SciName,     :string, :limit => 65,  :null => true
        table.column :N_Factor,    :float, :null => true
        table.column :Pro_Factor,  :float, :null => true
        table.column :Fat_Factor,  :float, :null => true
        table.column :CHO_Factor,  :float, :null => true
      end
      puts " ... food_descriptions created"
    end
  end

  def load_food_descriptions( food_desc_file_name )
    # Loop through the file, assuming the following format:
    # ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
    # ndbnum, fdgrp, long_desc, shrt_desc, common_name, manufacturer_name,
    # survey, refuse_desc, percent_refuse, science_name, n_fac, pro_fac,
    # fat_fac, CHO_fac

    # File.foreach closes the file for us when the loop finishes
    File.foreach(food_desc_file_name) do |line|
      # caret (^) is the field delimiter
      # tildes delimit strings: ~string~
      # hence below, we slice with [1..-2] to strip them from string fields
      record_split = line.chomp.split('^')

      FoodDescription.create(
        :NDB_No      => record_split[0][1..-2],
        :FdGrp_Cd    => record_split[1][1..-2],
        :Long_Desc   => record_split[2][1..-2],
        :Shrt_Desc   => record_split[3][1..-2],
        :CommName    => record_split[4][1..-2],
        :ManufacName => record_split[5][1..-2],
        :Survey      => record_split[6][1..-2],
        :Ref_desc    => record_split[7][1..-2],
        :Refuse      => record_split[8][1..-2],
        :SciName     => record_split[9][1..-2],
        :N_Factor    => record_split[10],
        :Pro_Factor  => record_split[11],
        :Fat_Factor  => record_split[12],
        :CHO_Factor  => record_split[13]
      )
    end
  end

ball wrote:

1. Is there a way to take advantage of the schema so that I don't have
to re-write it in the load function?

puts FoodDescription.columns.inspect
puts FoodDescription.columns.map { |c| c.name }.inspect

2. Is there a way I can write a generic load function (taking
advantage of #1 if available, or not if not).

As long as the columns in the text source appear in the same order as
the columns in the database definition you should be fine. I think this
ordering is guaranteed by the database, otherwise you couldn't
meaningfully do "INSERT INTO foo VALUES (x,y,z)" (without naming the
columns) or "SELECT * FROM foo"
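Putting that together, a generic loader could derive the column list from
the model and zip it with the split fields. This is only a sketch under
the same-ordering assumption above; `parse_record` and `load_delimited`
are invented names, not part of ActiveRecord:

```ruby
# Hypothetical helper: turn one delimited line into an attributes hash,
# given a list of column names (e.g. from model.columns).
def parse_record(line, column_names)
  # split on a literal caret, then strip wrapping tildes from each field
  fields = line.chomp.split('^').map { |f| f.sub(/\A~(.*)~\z/, '\1') }
  Hash[column_names.zip(fields)]
end

# Hypothetical generic loader: assumes the text columns appear in the
# same order as the table's columns, and skips any AR-generated 'id'.
def load_delimited(model, file_name)
  names = model.columns.map { |c| c.name } - ['id']
  File.foreach(file_name) do |line|
    model.create(parse_record(line, names))
  end
end
```

With that, `load_delimited(FoodDescription, 'FOOD_DES.txt')` would replace
a hand-written per-table loader, provided the file's column order really
does match the table's.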

3. What is the "better" way (than [1..-2]) to strip a string of a
starting/ending ~?

Do it all in one go. A couple of options:

records = line.split('^').map { |rec| rec[1..-2] }
records = line.scan(/~([^~^]*)~/).flatten

(note the literal '^' — splitting on the regexp /^/ would split at line
starts, not on carets). But these won't work unless *all* your fields
are ~ delimited. So this might work better for you:

records = line.split('^').map { |rec| rec.sub(/~(.*)~/) { $1 } }
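As a quick sanity check, here is a split-then-strip variant on a fragment
of the sample line, splitting on a literal caret and using an anchored
regexp so only tildes that wrap the whole field are removed:

```ruby
line = "~01001~^~0100~^~Butter, salted~^6.38"
# \A and \z anchor the match to the whole field, so the bare
# numeric field "6.38" passes through untouched
fields = line.split('^').map { |f| f.sub(/\A~(.*)~\z/, '\1') }
# fields == ["01001", "0100", "Butter, salted", "6.38"]
```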

4. Should I just use an AR-generated primary key rather than the data
set's ASCII primary key? (Many of the tables in the datasets have
composite keys, which I will not implement, so I am curious whether I
should just create all of the primary keys through AR instead.)

ActiveRecord has some hard-coded assumptions about primary keys. If you
override its assumption that the key is called "id" then it won't
generate keys automatically for new records. If you have a non-integer
primary key then it may or may not work. It certainly doesn't handle
composite primary keys; I saw a plugin which claimed to do this some
time ago, but I don't know if it's been kept up to date.

So my advice with AR is: if you are working with an existing
database/schema that you don't control, then try using non-standard
primary keys; but if you control the schema yourself, follow AR's
assumption of an auto-incrementing integer primary key.


--
Posted via http://www.ruby-forum.com/.