Hello List,
I have really just started dabbling in Ruby and have found my first little task that I can use to try out some of what I've read.
Basically, I'm reading in a CSV file and converting it into a proprietary XML format. To make the XML generation easier, I store the items in a
Hash of Hashes of Arrays: the first level is keyed on the type of data item, the second level is keyed on the item's parent, and the Array holds all the items for a specific parent.
e.g. hash = Hash["class1" => {"parent1" => [class1("a"), class1("b"), class1("c")], "parent2" => [class1("e"), class1("f"), class1("g")], ...}, "class2" => {...}, ...]
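To make the layout concrete, here is a minimal sketch with made-up data. `Entry` and `new_store` are hypothetical stand-ins, not my real classes, and the default blocks on `Hash.new` are just one way to create the nested levels on demand.

```ruby
# Hypothetical stand-in for the real item classes (not the actual definitions).
Entry = Struct.new(:kind, :parent, :name)

# Outer hash: item type => inner hash.
# Inner hash: parent => array of items for that parent.
# The default blocks create a missing level the first time it is accessed.
def new_store
  Hash.new do |by_type, type|
    by_type[type] = Hash.new { |by_parent, parent| by_parent[parent] = [] }
  end
end

store = new_store
[
  Entry.new("class1", "parent1", "a"),
  Entry.new("class1", "parent1", "b"),
  Entry.new("class1", "parent2", "e"),
  Entry.new("class2", "parent1", "x")
].each { |e| store[e.kind][e.parent] << e }

# Walk the structure the way the XML generation would: type, then parent, then items.
store.each do |type, by_parent|
  by_parent.each do |parent, items|
    puts "#{type} / #{parent}: #{items.map(&:name).join(', ')}"
  end
end
```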
The code below does this in a somewhat clumsy way. When I run it on 10,000 lines the execution time is 4.5 seconds, but if I increase the input to 300,000 lines (30 times the data) the execution time is 14 min 21 sec (roughly 190 times longer). Is this nonlinear scaling due to allocating memory to hold the data? Could I improve this using symbols? Any other general remarks would be appreciated.
csv_file.each_line do |line|
  # FORMAT: PARENT1,PARENT2,PARENT3,ITEM,ITEMR,ITEMTYPE,SUBITEM,ITEM_LAYER,PNAME,PVALUE,LEVEL
  # chomp so the last field (LEVEL) does not carry the trailing newline
  parent1, parent2, parent3, item, itemr, itemtype, subitem, item_layer, pname, pvalue, level = line.chomp.split(",")

  # The parent node is whichever of PARENT1/PARENT2 is not "NONE"
  # (node stays nil if both or neither are "NONE")
  if parent1 == "NONE" && parent2 != "NONE"
    parent_type = "PARENT2"
    node = parent2
  elsif parent1 != "NONE" && parent2 == "NONE"
    parent_type = "PARENT1"
    node = parent1
  end

  # Build the object for this line's level; i is nil for an unknown level
  i = case level
      when "ITEM"       then Item.new(node, item)
      when "SUBITEM"    then Subitem.new(node, item, subitem)
      when "ITEM_LAYER" then Item_layer.new(node, item, subitem, item_layer)
      when "RELATION"   then Relation.new(node, item, itemr)
      when "PARENT3"    then Parent3.new(node, parent3)
      when "PARENT2"    then Parent2.new(parent2)
      when "PARENT1"    then Parent1.new(parent1)
      end

  unless i.nil?
    # Create the nested levels on first use
    hash[i.class] ||= Hash.new
    hash[i.class][i.parent] ||= Array.new
    siblings = hash[i.class][i.parent]

    # Look for an equal item already stored under this parent
    ind = siblings.index(i)
    if ind.nil?
      i.addSetting(Setting.new(pname, pvalue))
      siblings.push(i)
    else
      siblings[ind].addSetting(Setting.new(pname, pvalue))
    end
  end
end
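
The loop relies on classes I haven't shown. The sketch below is a simplified, hypothetical stand-in for two of them (the field names and the `==` definition are assumptions for illustration, not the real code), mainly to show that `Array#index` finds an existing item by comparing with `==`.

```ruby
# Hypothetical stand-ins; the real classes are not shown in this post.
class Setting
  attr_reader :name, :value

  def initialize(name, value)
    @name = name
    @value = value
  end
end

class Item
  attr_reader :parent, :item, :settings

  def initialize(parent, item)
    @parent = parent
    @item = item
    @settings = []
  end

  def addSetting(setting)
    @settings << setting
  end

  # Array#index compares elements with ==, so two Items with the same
  # parent and name must compare equal for the merge in the loop to work.
  def ==(other)
    other.is_a?(Item) && parent == other.parent && item == other.item
  end
end

items = [Item.new("p1", "a")]
dup = Item.new("p1", "a")

# index walks the array from the front, calling == on each element
ind = items.index(dup)
items[ind].addSetting(Setting.new("color", "red"))
```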
Thanks in advance,
Jeremy