CSV import to hash to compare with database

Hello All,
Is it practical to import data from a csv file, via FasteCSV to a
temporary data structure, like a hash, for editing purposes?
I'm looking to read a file, send the data to a temp location, then
compare the file with similar files in a database table, if any. Then
import the edited data to the database table.

Thank you for any help.

JohnM

···

--
Posted via http://www.ruby-forum.com/.

Hello All,

Hello.

Is it practical to import data from a csv file, via FasteCSV to a
temporary data structure, like a hash, for editing purposes?

I don't see why it wouldn't be. FasterCSV::Row has a to_hash() method. I use this regularly to pump CSV rows into ActiveRecord's create!() method to add records to my database.

James Edward Gray II

···

On Aug 17, 2009, at 10:54 AM, John Mcleod wrote:

Hi,
If you wish to avoid programming and would consider a third party
tool, Try this one:

http://www.nobhillsoft.com/Columbo.aspx

it pretty much comapres everything tabular, so you can easily compare
CSV files. Its still on beta so its free, and our tech support will be
happy to walk you through the steps required to get it done, should
you run into difficulties

Mr. Gray,
Thank you for the quick reply.
I'm a PHP guy with 3 weeks in Ruby on Rails (Department changed platform
3 months after they hired me)
Could you please point me to a good 'FasterCSV' resource with some
examples?
Thank you very much.
JohnM

James Gray wrote:

···

On Aug 17, 2009, at 10:54 AM, John Mcleod wrote:

Hello All,

Hello.

Is it practical to import data from a csv file, via FasteCSV to a
temporary data structure, like a hash, for editing purposes?

I don't see why it wouldn't be. FasterCSV::Row has a to_hash()
method. I use this regularly to pump CSV rows into ActiveRecord's
create!() method to add records to my database.

James Edward Gray II

--
Posted via http://www.ruby-forum.com/\.

Mr. Gray,

You can stick with James. I'm not that old yet. :wink:

Thank you for the quick reply.

Sure. Happy to help.

Could you please point me to a good 'FasterCSV' resource with some
examples?

The documentation is pretty thorough:

http://fastercsv.rubyforge.org/

and the tests show usage:

http://fastercsv.rubyforge.org/svn/trunk/test/

If your needs are simple though, maybe this code is all you need:

   FCSV.foreach( path, :headers => true,
                       :header_converters => :symbol ) do |row|
     YourARModelClass.create!(row.to_hash)
   end

Hopefully that at least gets you started.

James Edward Gray II

···

On Aug 17, 2009, at 11:56 AM, John Mcleod wrote:

James,
Thank you for the example code.
It's funny, I found an old post of yours from 2005.

  h = Hash.new(0)
  file = FasterCSV.read(filename)[1..-1].each { |row| h[row[0]] += 1 }

I've tested it and it seems to work.
I will however, check out the documentation and your test code.

This is step 1 though. Next, populate a hash with data from the
database comparing data from the CSV file and making a editable CSV grid
(post to rails forum).

I'm not done yet.

Thanks again.

JohnM

James Gray wrote:

···

On Aug 17, 2009, at 11:56 AM, John Mcleod wrote:

Mr. Gray,

You can stick with James. I'm not that old yet. :wink:

Thank you for the quick reply.

Sure. Happy to help.

Could you please point me to a good 'FasterCSV' resource with some
examples?

The documentation is pretty thorough:

http://fastercsv.rubyforge.org/

and the tests show usage:

http://fastercsv.rubyforge.org/svn/trunk/test/

If your needs are simple though, maybe this code is all you need:

   FCSV.foreach( path, :headers => true,
                       :header_converters => :symbol ) do |row|
     YourARModelClass.create!(row.to_hash)
   end

Hopefully that at least gets you started.

James Edward Gray II

--
Posted via http://www.ruby-forum.com/\.

John,

if I understand you correctly you have data in CSV files and you have
a relational database. You want to compare the content of those files
with data already present in the database and edit the CSV data before
importing it into the DB.

I do not know what your schema looks like nor what "comparing" in your
case means, but did you consider first loading CSV data into the
database using some staging table, doing the comparison and editing
there and then copying the data over to the target location? That way
you can use the full power of SQL for comparison purposes. And there
are a lot of tools that allow to edit database data in tabular
representation.

Kind regards

robert

···

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert,
Yes, you understand my dilemma.
I want to thank you for your keen insight. I didn't even thought of
that.

When I say "compare", I mean a similarity comparison not an exact
comparison.
I installed 'amatch' gem. It has several comparison features that I
believe will benefit me. In particular, the Levenshtein distance.

Thank you again for the advice.

JohnM

Robert Klemme wrote:

···

John,

if I understand you correctly you have data in CSV files and you have
a relational database. You want to compare the content of those files
with data already present in the database and edit the CSV data before
importing it into the DB.

I do not know what your schema looks like nor what "comparing" in your
case means, but did you consider first loading CSV data into the
database using some staging table, doing the comparison and editing
there and then copying the data over to the target location? That way
you can use the full power of SQL for comparison purposes. And there
are a lot of tools that allow to edit database data in tabular
representation.

Kind regards

robert

--
Posted via http://www.ruby-forum.com/\.

Yes, you understand my dilemma.
I want to thank you for your keen insight. I didn't even thought of
that.

It may not be a viable solution in your case but it could be
worthwhile to ponder.

When I say "compare", I mean a similarity comparison not an exact
comparison.
I installed 'amatch' gem. It has several comparison features that I
believe will benefit me. In particular, the Levenshtein distance.

Two things come to mind: depending on the RDBMS you are using it may
have similar tools (functions) for text processing and analysis.
Depending on your requirements you can even write a database function
that calculates Levenshtein Distance.

Second, you could extract the data from the database via any of Ruby's
database interfaces, do the comparison in Ruby code and write out
results. Then, edit them and import the result back into the DB.

Thank you again for the advice.

Your welcome!

Kind regards

robert

···

2009/8/18 John Mcleod <john.mcleod@hc.msu.edu>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert,

Two things come to mind: depending on the RDBMS you are using it may
have similar tools (functions) for text processing and analysis.

Currently, my development database is SQLite, but my supervisor's plan
is to have Oracle as a production database.

Once again, I'll be in learning mode. (knowledge of Oracle = 0) I'm
experienced in mySQL mostly.

Second, you could extract the data from the database via any of Ruby's
database interfaces, ...

Are the "interfaces" easy to work with?

Then, edit them and import the result back into the DB.

I've been looking at jquery jgrid for rails. It looks easy to implement
(that's a topic for another forum)

Thanks again.

John

Robert Klemme wrote:

···

2009/8/18 John Mcleod <john.mcleod@hc.msu.edu>:

Yes, you understand my dilemma.
I want to thank you for your keen insight. I didn't even thought of
that.

It may not be a viable solution in your case but it could be
worthwhile to ponder.

When I say "compare", I mean a similarity comparison not an exact
comparison.
I installed 'amatch' gem. It has several comparison features that I
believe will benefit me. In particular, the Levenshtein distance.

Two things come to mind: depending on the RDBMS you are using it may
have similar tools (functions) for text processing and analysis.
Depending on your requirements you can even write a database function
that calculates Levenshtein Distance.

Second, you could extract the data from the database via any of Ruby's
database interfaces, do the comparison in Ruby code and write out
results. Then, edit them and import the result back into the DB.

Thank you again for the advice.

Your welcome!

Kind regards

robert

--
Posted via http://www.ruby-forum.com/\.

DBI is pretty straightforward - once you have it installed IIRC. It's
a while since I used it.

But you can as easily export data as CSV with SQL Plus - example for
the SCOTT demo schema that comes with the database (if it is
installed):

c:\Temp>sqlplus -S scott/PWD @emp
7369;"SMITH";"CLERK"
7499;"ALLEN";"SALESMAN"
7521;"WARD";"SALESMAN"
7566;"JONES";"MANAGER"
7654;"MARTIN";"SALESMAN"
7698;"BLAKE";"MANAGER"
7782;"CLARK";"MANAGER"
7788;"SCOTT";"ANALYST"
7839;"KING";"PRESIDENT"
7844;"TURNER";"SALESMAN"
7876;"ADAMS";"CLERK"
7900;"JAMES";"CLERK"
7902;"FORD";"ANALYST"
7934;"MILLER";"CLERK"

c:\Temp>

emp.sql:

set pagesize 0 linesize 200
set feedback off
select empno || ';"' || ename || '";"' || job || '"'
from emp
order by empno
/
exit

You can find all the docs over at http://tahiti.oracle.com/

Kind regards

robert

···

2009/8/18 John Mcleod <john.mcleod@hc.msu.edu>:

Robert,

Two things come to mind: depending on the RDBMS you are using it may
have similar tools (functions) for text processing and analysis.

Currently, my development database is SQLite, but my supervisor's plan
is to have Oracle as a production database.

Once again, I'll be in learning mode. (knowledge of Oracle = 0) I'm
experienced in mySQL mostly.

Second, you could extract the data from the database via any of Ruby's
database interfaces, ...

Are the "interfaces" easy to work with?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/