I don't have access to the database, but every day I get a CSV dump of all
live customer listings.
There are around 15,000 live listings.
A record contains 3 fields: id, name, description.
Every day some new accounts are created while others are deleted.
Every day I want to compare yesterday's dump with today's dump and print
the full record of each terminated account and each new account.
I am achieving this with the script below.
That last line makes the script run very slowly.
Is there a more elegant way to compare two arrays, hashes, sets, or
FasterCSV tables while carrying a few fields along for the ride?
WORKING CODE:
require 'rubygems'
require 'fastercsv'
sundays_dump = FasterCSV.read("./sunday.csv")
mondays_dump = FasterCSV.read("./monday.csv")
# Compare on the id column (row[0]) rather than the name column (row[1]).
sundays_ids = sundays_dump.collect {|row| row[0]}
mondays_ids = mondays_dump.collect {|row| row[0]}
newaccount_ids = mondays_ids - sundays_ids
terminated_ids = sundays_ids - mondays_ids
# Print the id and name of each record that disappeared or appeared.
sundays_dump.each {|row| puts 'delete,'+row[0]+','+row[1] if terminated_ids.include? row[0]}
mondays_dump.each {|row| puts 'create,'+row[0]+','+row[1] if newaccount_ids.include? row[0]}
TEST DATA:
sunday.csv
id,name,otherstuff
1,larry,bald
2,curly,tall
3,moe,meanie
monday.csv
id,name,otherstuff
1,larry,bald
4,shemp,greasy
OUTPUT:
delete,2,curly
delete,3,moe
create,4,shemp
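
POSSIBLE FASTER VARIANT (sketch only):
One way to carry the extra fields along without scanning an array for every
row might be to index each dump in a hash keyed by id. Then the id difference
is computed once and each full record is fetched by key. The sketch below makes
several assumptions that are not in my script. It uses Ruby's standard csv
library (which replaced FasterCSV in Ruby 1.9) instead of the fastercsv gem. It
parses inline strings instead of reading sunday.csv and monday.csv. The helper
names index_by_id and diff_dumps are made up for illustration. It also assumes
ids are unique within a dump.

```ruby
require 'csv'

# Build a hash of id => row so each record can be fetched by key
# instead of being found with Array#include?.
# Assumes the first line is a header containing an "id" column.
def index_by_id(csv_text)
  index = {}
  CSV.parse(csv_text, headers: true).each do |row|
    index[row['id']] = row
  end
  index
end

# Return the change lines in the same "action,id,name" format
# as the original script's output.
def diff_dumps(old_text, new_text)
  old_rows = index_by_id(old_text)
  new_rows = index_by_id(new_text)

  deleted = (old_rows.keys - new_rows.keys).map do |id|
    "delete,#{id},#{old_rows[id]['name']}"
  end
  created = (new_rows.keys - old_rows.keys).map do |id|
    "create,#{id},#{new_rows[id]['name']}"
  end
  deleted + created
end

# Same test data as above, inlined so the sketch runs on its own.
sunday = <<~SUNDAY
  id,name,otherstuff
  1,larry,bald
  2,curly,tall
  3,moe,meanie
SUNDAY

monday = <<~MONDAY
  id,name,otherstuff
  1,larry,bald
  4,shemp,greasy
MONDAY

puts diff_dumps(sunday, monday)
```

With the real files, File.read("./sunday.csv") and File.read("./monday.csv")
could be passed in place of the inline strings. Since each row is kept in the
hash, the remaining fields are available as well, for example
old_rows[id]['otherstuff'].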