This post is a bit lengthy. I am not expecting fully working code that
solves the problem, just some pointers on how to approach parsing in
this example.
I have large text file(s) that I need to parse into a more usable form.
The contents are output from a legacy ERP system.
The source files look like the example file I have attached. The report
lists shipped amounts per week range, by customer, by customer
department, by product, by weekday. Page breaks are hard-coded, so the
customer and week info can appear several times for a given week if
there are lots of products.
I am hoping to get the data into a form where each line includes all
relevant information for a given product on a given day. The columns
could be, for example:
Year, Week, Weekday, CustomerID, Customer, Sub-custID, Sub-cust,
ProductID, Product, Quantity
->
2008, 39, MON, 97, CUSTOMER A, 999, DEPARTMENT A, 123, PRODUCT A, 150
2008, 39, TUE, 97, CUSTOMER A, 999, DEPARTMENT A, 123, PRODUCT A, 50
I am pretty much a noob with Ruby, so bear with me, but I had this kind
of idea on how this could work.
First the file is read into an array, and the array is fed to a method
that does the work. Finished rows are saved in an array of arrays, and
that array is later written to a file.
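To make the "array of arrays, later written to a file" step concrete,
here is a minimal sketch. The write_rows method, the HEADER constant and
the sample rows are names and data made up for illustration. It joins
fields with ", " as in the example rows above, so it assumes no field
contains a comma itself.

```ruby
require 'stringio'

# Column names taken from the desired output format described above.
HEADER = %w[Year Week Weekday CustomerID Customer Sub-custID Sub-cust
            ProductID Product Quantity]

# Writes the header and one comma-separated line per row to any IO-like
# object (a File, $stdout, a StringIO, ...). Hypothetical helper.
def write_rows(io, rows)
  io.puts HEADER.join(', ')
  rows.each { |row| io.puts row.join(', ') }
end

# Sample rows in the shape the parser is supposed to produce.
rows = [
  [2008, 39, 'MON', 97, 'CUSTOMER A', 999, 'DEPARTMENT A', 123, 'PRODUCT A', 150],
  [2008, 39, 'TUE', 97, 'CUSTOMER A', 999, 'DEPARTMENT A', 123, 'PRODUCT A', 50]
]

out = StringIO.new
write_rows(out, rows)
puts out.string
```

For a real file the same method could be called inside a
File.open(name, 'w') block, with whatever output file name is wanted.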
Check if a row includes 'Customer' and compare it to the current
customer to see if the customer has changed. Check for 'Sub-cust' to see
if the department has changed, and save these in variables. Same checks
for week and year.
If a row starts with a number, it is a product heading row, and that row
also includes the quantity for Monday (this is consistent). The six rows
after that start with TUE-SUN, except when a page break interrupts the
flow. The quantity is always the first number after the day.
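The idea above could be sketched roughly as follows. This is only a
sketch: the line layouts in the sample array ("Customer 97 ...",
"Week 39 2008", "123 PRODUCT A   MON 150") are guesses made up for
illustration, not the real report format, so every regex would need
adjusting to the actual file. parse_report, DAYS and DAY_QTY are
hypothetical names.

```ruby
DAYS    = %w[MON TUE WED THU FRI SAT SUN]
# A weekday abbreviation followed by the first number after it.
DAY_QTY = /\b(#{DAYS.join('|')})\b\D*(\d+)/

# Walks the lines once, remembering the most recent header values in ctx
# and emitting one row per weekday quantity. The layouts are ASSUMED.
def parse_report(lines)
  ctx  = {}
  rows = []
  lines.each do |line|
    if (m = line.match(/Sub-cust\D*(\d+)\s+(.*\S)/))
      ctx[:sub_id], ctx[:sub] = m[1], m[2]
    elsif (m = line.match(/Customer\D*(\d+)\s+(.*\S)/))
      ctx[:cust_id], ctx[:cust] = m[1], m[2]
    elsif (m = line.match(/Week\D*(\d+)\D+(\d{4})/))
      ctx[:week], ctx[:year] = m[1], m[2]
    elsif (m = line.match(/\A\s*(\d+)\s+(.*?)\s+MON\b/))
      ctx[:prod_id], ctx[:prod] = m[1], m[2]
    end

    # Only lines with a weekday and a quantity produce output, and only
    # once a product heading has been seen.
    next unless ctx[:prod_id] && (m = line.match(DAY_QTY))
    rows << [ctx[:year], ctx[:week], m[1], ctx[:cust_id], ctx[:cust],
             ctx[:sub_id], ctx[:sub], ctx[:prod_id], ctx[:prod], m[2]]
  end
  rows
end

sample = [
  "Customer 97 CUSTOMER A",
  "Sub-cust 999 DEPARTMENT A",
  "Week 39 2008",
  "123 PRODUCT A   MON 150",
  "TUE 50",
  "Customer 97 CUSTOMER A",   # header repeated after a page break
  "WED 75"
]
parse_report(sample).each { |row| puts row.join(', ') }
```

Because the current product is kept in ctx and not reset when a header
line shows up, a header repeated by a page break in the middle of a
product does not lose the remaining weekday rows.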
I have managed to extract the customer ID, and I think the other heading
info won't differ much from that, but I got stuck when I tried to
determine whether a string starts with a number, to see if that row
includes product info. I tried start_with?(/\d+/)[0] but that didn't
work.
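Since start_with? returns true or false, indexing its result with [0]
cannot give back the matched number. One possible check is a regex
anchored at the start of the string, sketched below. product_row? is a
made-up name, the sample lines are invented, and allowing leading
whitespace is an assumption about the report layout.

```ruby
# Returns true when the line begins with a digit, ignoring any leading
# whitespace (that allowance is an assumption about the layout).
def product_row?(line)
  !!(line =~ /\A\s*\d/)
end

puts product_row?("123 PRODUCT A   MON 150")
puts product_row?("Customer 97 CUSTOMER A")
```

If the number itself is needed, line[/\A\s*(\d+)/, 1] returns the
leading digits as a string, or nil when the line does not start with a
number.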
def readfile(file)
  IO.readlines(file)
end

def getcust(arr)
  custs = []
  arr.each do |fl|
    if fl.include? 'Customer'
      custs.push fl.scan(/\d+/)[0]
    end
  end
  custs
end

lines = readfile 'source_example'
customers = getcust lines
puts customers
All comments that move me forward are appreciated.
Attachments:
http://www.ruby-forum.com/attachment/2757/source_example
--
Posted via http://www.ruby-forum.com/.