Hi,
So I've been digging into Ruby for the past week, and I've come across an
interesting problem that I want to solve with my new friend. Only I don't
want to reinvent the wheel.
So here's the problem: I have a CSV file that I need to munge into a batch
file for a mainframe to process. This file has many (say, 30 or more) fixed
width fields per record with distinct rules attached to each field (e.g.,
field1 is an eight-position date following the pattern YYYYMMDD, field2 is a
five-position enumerated customer type, field3 may contain either a 70 or a
71 depending on the customer type, etc.). Say, ~10K records per batch file
(plus header, subheaders, footers, and possibly addendums) and the batch
file has to get rebuilt nightly, plus a great big monthly summary file
composed of all the nightly files strung together. I want to go from CSV to
batch file automagically, and just writing a one-off for this one file
format seems like a total waste of time.
There are a bazillion of these formats out there, and I have to imagine
spooning CSV files into yet another format is pretty common. I'm wondering
if Ruby already has a batch file library, something like text-format.rb only
more useful?
If not then I'm going to write one... but right now I'm also likely to write
something that looks way more like C than Ruby.
Any tips, pointers, hints, or code that might exemplify the Ruby way to go
about it would be much appreciated. Thanks in particular to any Ruby local
out there who might take a sec to point a tourist in the right direction.
TIA,
Steve
# go from csv to batch file automagically, and just writing a one-off for
# this one file format seems like a total waste of time.
# There are a bizillion of these formats out there, and I have to imagine
# spooning csv files into yet another format is pretty common. I'm wondering
# if Ruby already has a batch file library, something like text-format.rb
# only more useful?
just in case you'd want to start anew, try getting some ideas from:
1. http://www.devsource.com/article2/0,1895,1928561,00.asp
2. http://fastercsv.rubyforge.org/
kind regards -botp
···
From: Stephen Smith [mailto:4fires@gmail.com] :
I'm a little confused by your description of the file. You call it a CSV file and say it has fixed-width fields, but those are two different things.
Either way though, Ruby has the tools you need.
For CSV data, see the standard "csv" library. For splitting up fixed width fields, a call to String#unpack will do.
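A minimal sketch of both calls, for illustration only. The 8/5/2 layout and the sample values are assumptions loosely based on the fields described in the original question, and the helper names are made up:

```ruby
require 'csv'

# Hypothetical layout: 8-char date, 5-char customer type, 2-char code.
FIXED_LAYOUT = "A8A5A2"

# Split one fixed-width line into its fields. The "A" directive strips
# trailing spaces from each field as it is extracted.
def split_record(line)
  line.unpack(FIXED_LAYOUT)
end

# Parse CSV text into an Array of rows (each row an Array of Strings).
def read_rows(csv_text)
  CSV.parse(csv_text)
end

p split_record("20070417RETL 70")
p read_rows("20070417,RETL,70\n")
```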
Hope that helps.
James Edward Gray II
···
On Apr 17, 2007, at 1:40 AM, Stephen Smith wrote:
So here's the problem: I have a CSV file that I need to munge into a batch
file for a mainframe to process. This file has many (say, 30 or more) fixed
width fields per record with distinct rules attached to each field (e.g.,
field1 is an eight-position date following the pattern YYYYMMDD, field2 is a
five-position enumerated customer type, field3 may contain either a 70 or a
71 depending on the customer type, etc).
Thanks for asking for clarification.
I frequently have to turn CSV files into files that follow different
fixed-width formats. So it's not that I need to unpack my data. Rather, I
need a way to take csv'd data, and a separate file format or even
specification, and "pack" the CSV data into the format. For example, right
now I need to generate a nightly batch file with headers and footers and
arbitrarily complicated field specifications from a largish csv file.
And the formats that I have to satisfy are often complex, so doing one-off
conversion scripts can get to be a pain to maintain very quickly.
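To make the shape of the job concrete, here is a rough sketch of one such conversion. The "HDR"/"DTL"/"TRL" record layouts and the method name are invented for the example and do not come from any real mainframe format:

```ruby
require 'csv'

# Build a batch file body from CSV text: one header record, one detail
# record per CSV row, and a trailer carrying the detail record count.
# Note that ljust/rjust pad short values but do not truncate long ones.
def build_batch(csv_text, run_date)
  details = CSV.parse(csv_text).map do |date, type, code|
    "DTL" + date.ljust(8) + type.ljust(5) + code.rjust(2, "0")
  end
  header  = "HDR" + run_date
  trailer = "TRL" + details.size.to_s.rjust(6, "0")
  ([header] + details + [trailer]).join("\n") + "\n"
end

puts build_batch("20070417,RETL,70\n20070418,WHSL,71\n", "20070418")
```

Hard-coding the layout like this is exactly the one-off that gets painful to maintain once there are several formats.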
Your FasterCSV and String#unpack seem like a great place to start. At a
minimum, I need to be able to attach rules to each field like the number of
positions (as well as packing character, eg, whitespace or zeroes, and maybe
formulas based on other field values) in my format string. Is there an
equivalent of String#pack out there?
Steve
···
On 4/17/07, James Edward Gray II <james@grayproductions.net> wrote:
On Apr 17, 2007, at 1:40 AM, Stephen Smith wrote:
> So here's the problem: I have a CSV file that I need to munge into
> a batch
> file for a mainframe to process. This file has many (say, 30 or
> more) fixed
> width fields per record with distinct rules attached to each field
> (e.g.,
> field1 is an eight-position date following the pattern YYYYMMDD,
> field2 is a
> five-position enumerated customer type, field3 may contain either a
> 70 or a
> 71 depending on the customer type, etc).
I'm a little confused by your description of the file. You call it a
CSV file and say it has fixed-width fields, but those are two
different things.
Either way though, Ruby has the tools you need.
For CSV data, see the standard "csv" library. For splitting up fixed
width fields, a call to String#unpack will do.
Hope that helps.
James Edward Gray II
@Botp, That's right up my alley... Here's where I'm headed. Since CSV file
structure is a bit like vanilla ice cream, I'd like to provide a file
structure with more complexity or "flavor" as an input, along with some
CSV'd data. That way I can read in the structure and the data separately and
get my data out in the format I want. My inspiration is deBabelizer, but
also now simply some kind of String#pack.
···
On 4/17/07, Peña, Botp <botp@delmonte-phil.com> wrote:
From: Stephen Smith [mailto:4fires@gmail.com] :
# go from csv to
# batch file automagically, and just writing a one-off for this one file
# format seems like a total waste of time.
# There are a bizillion of these formats out there, and I have
# to imagine
# spooning csv files into yet another format is pretty common.
# I'm wondering
# if Ruby already has a batch file library, something like
# text-format.rb only
# more useful?
just in case you'd want to start anew, try getting some ideas fr
1. http://www.devsource.com/article2/0,1895,1928561,00.asp
2. http://fastercsv.rubyforge.org/
kind regards -botp
[snip]
Your FasterCSV and String#unpack seem like a great place to start. At a
minimum, I need to be able to attach rules to each field like the number of
positions (as well as packing character, eg, whitespace or zeroes, and maybe
formulas based on other field values) in my format string. Is there an
equivalent of String#pack out there?
Steve
Array#pack
So unpack goes from String to Array, and pack goes from Array to String.
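A small round-trip sketch, assuming a made-up three-field layout (the format string and helper names are only for illustration):

```ruby
# Hypothetical three-field layout: 8-char date, 5-char type, 2-char code.
RECORD_FORMAT = "A8A5A2"

# Array -> String. "A<n>" pads with spaces to n bytes and truncates
# anything longer.
def pack_fields(fields)
  fields.pack(RECORD_FORMAT)
end

# String -> Array. "A<n>" also strips the trailing padding on the way out.
def unpack_fields(record)
  record.unpack(RECORD_FORMAT)
end

record = pack_fields(["20070417", "RETL", "70"])
puts record.inspect                  # "20070417RETL 70"
puts unpack_fields(record).inspect

# pack has no zero-fill directive for text, so pad before packing.
puts "42".rjust(6, "0")              # 000042
```

Widths in pack are counted in bytes, so non-ASCII data would need extra care.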
-A
···
On 4/17/07, Stephen Smith <4fires@gmail.com> wrote:
Ah, I understand now. Your project is to translate CSV to fixed-width. Got it.
Obviously I'm biased, but FasterCSV should give you rich handling on the reading end, I think. That part should be pretty covered.
Where you are likely to spend the effort is in the fixed-width writing. There is an Array#pack, as others have pointed out, but you sound like you're after something higher level than that. You want it to catch the dates and pack them as YYYYMMDD strings for you and the like.
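One possible shape for that higher-level layer, as a sketch only. FieldSpec, RECORD_SPEC, render_record, and the 70/71 rule are all hypothetical names and rules made up here, not an existing library:

```ruby
require 'date'

# Each field has a name, a width, a pad character, an alignment, and an
# optional converter that receives the raw value and the whole row.
FieldSpec = Struct.new(:name, :width, :pad, :align, :convert) do
  def render(row)
    value = convert ? convert.call(row[name], row) : row[name]
    text = value.to_s
    raise ArgumentError, "#{name} too long: #{text.inspect}" if text.length > width
    align == :right ? text.rjust(width, pad) : text.ljust(width, pad)
  end
end

RECORD_SPEC = [
  # Normalize whatever date format arrives into YYYYMMDD.
  FieldSpec.new(:date, 8, " ", :left, ->(v, _row) { Date.parse(v).strftime("%Y%m%d") }),
  FieldSpec.new(:customer_type, 5, " ", :left, nil),
  # Derived field: 70 or 71 depending on the customer type (invented rule).
  FieldSpec.new(:code, 2, "0", :right, ->(_v, row) { row[:customer_type] == "RETL" ? 70 : 71 })
]

# Render one row (a Hash keyed by field name) as a fixed-width record.
def render_record(row, spec = RECORD_SPEC)
  spec.map { |field| field.render(row) }.join
end

puts render_record({ date: "2007-04-17", customer_type: "RETL" })
```

Keeping the rules in data rather than in code means a new format is a new spec array, not a new script.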
If you come up with a good solution it may be worth generalizing and sharing, in my opinion.
James Edward Gray II
···
On Apr 17, 2007, at 10:49 AM, Stephen Smith wrote:
I frequently have to turn CSV files into files that follow different
fixed-width formats.
Hi,
Since you are describing a lot of what I do (but not in Ruby), I thought I
might point you here for some ideas: http://www.unidex.com/overview.htm
Essentially, that link describes a flat-file parser that reads an XML
definition of the layout. You might want to use that and model your own
schema to accomplish something similar in Ruby.
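A rough Ruby sketch of that idea, using REXML. The XML vocabulary below is invented for illustration and is not XFlat's actual schema; the helper names are hypothetical too:

```ruby
require 'rexml/document'

# Invented layout vocabulary; pad defaults to a space, align to left.
LAYOUT_XML = <<~XML
  <record name="detail">
    <field name="date" width="8"/>
    <field name="customer_type" width="5"/>
    <field name="code" width="2" pad="0" align="right"/>
  </record>
XML

# Read the layout into an Array of Hashes, one per field, in document order.
def load_layout(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//field").map do |el|
    { name:  el.attributes["name"],
      width: el.attributes["width"].to_i,
      pad:   el.attributes["pad"] || " ",
      align: el.attributes["align"] || "left" }
  end
end

# Format one row (a Hash keyed by field name) according to the layout.
def format_row(row, layout)
  layout.map do |f|
    text = row.fetch(f[:name]).to_s
    f[:align] == "right" ? text.rjust(f[:width], f[:pad]) : text.ljust(f[:width], f[:pad])
  end.join
end

layout = load_layout(LAYOUT_XML)
puts format_row({ "date" => "20070417", "customer_type" => "RETL", "code" => "7" }, layout)
```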
Best Regards,
Dan
From: "Stephen Smith" <4fires@gmail.com>
To: ruby-talk@ruby-lang.org (ruby-talk ML)
Date: 04/17/2007 11:12 AM
Subject: Re: text processing library
Please respond to: ruby-talk@ruby-lang.org
@Botp, That's right up my alley... Here's where I'm headed. Since CSV file
structure is a bit like vanilla icecream, I'd like to provide a file
structure with more complexity or "flavor" as an input, along with some
CSV'd data. That way I can read in the structure and the data separately and
get my data out in the format I want. My inspiration is deBabelizer, but
also now simply some kind of String#pack.
This message and any attachments contain information from Union Pacific which may be confidential and/or privileged.
If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited by law. If you receive this message in error, please contact the sender immediately and delete the message and any attachments.
@James, Right on. I think you're correct - FasterCSV should handle the CSV
reading no problem. I think my first edition might handle writing to a flat
file based on an XML schema. I'm going to try to think about generalization
from the beginning, and see if that keeps my code easier to maintain and
use. When you wrote FasterCSV, how did you bake in the formatting rules? Did
you write an XML schema or something similar based on the CSV RFC?
@Dan, Many thanks! XFlat seems like exactly what I need. Only I can't find
the DTD anywhere... is it proprietary?
Now you've both got me thinking about a text conversion swiss army knife ...
so long as a conversion library has access to input and output schemas,
there's no reason why our well-formed data can't enjoy total freedom. Which
I probably can't convince anyone to pay me to write, but I might anyway.
Hm... In any event, my little csv-to-flat_file_format_X tool is a good way
to introduce myself to text processing in Ruby. (And I'm hoping not that
tough...) 
This seems like a much stronger idea now. Thanks guys.
4fires
···
On 4/17/07, DDENNISON@up.com <DDENNISON@up.com> wrote:
Hi,
Since you are describing a lot of what I do (but not in Ruby), I thought I
might point you here for some ideas: http://www.unidex.com/overview.htm
Essentially, that link describes a flat-file parser that reads an XML
definition of the layout. You might want to use that and model your own
schema to accomplish something similar in Ruby.
Best Regards,
Dan
FasterCSV was born from a discussion on Ruby Core about how we might speed up the CSV library. I provided some information out of the book Mastering Regular Expressions, which claimed to have a single expression for parsing the format.
Some edge cases that the expression didn't handle were raised, I fixed those, and that's pretty much FasterCSV's parser to this day. It's not too glamorous I guess, but I like how it shows what we can do when we work together.
James Edward Gray II
···
On Apr 17, 2007, at 9:31 PM, Stephen Smith wrote:
@James, Right on. I think you're correct - FasterCSV should handle the CSV
reading no problem. I think my first edition might handle writing to a flat
file based on an XML schema. I'm going to try to think about generalization
from the beginning, and see if that keeps my code easier to maintain and
use. When you wrote FasterCSV, how did you bake in the formatting rules? Did
you write an XML schema or something similar based on the CSV RFC?