One thing I find myself doing over and over is parsing some type of text file and making something sensible out of it.
So far I haven't found a good solution (I'm probably missing some gems or modules someone has already written). BNF is fine for writing compilers, but I think it's more complex than necessary for parsing data files. I've recently put together a solution of my own, borrowing the modus operandi of OptionParser (with thanks to its author). The following is part of a routine for parsing some really horrid SAP R/3 files:
# horrid file
VERSION
100
DEBUG
0
SYSTEM
PRD
HOSTNAME
198.203.4.202
SYSNUM
01
...
CONTAINER_ELEMENT_INFO
BASEUNITOFMEASURE 000002003C
CONTAINER_ELEMENT_VALUE
EA
CONTAINER_ELEMENT_INFO
CAUSECODEGROUP 000000008C
CONTAINER_ELEMENT_VALUE
CES
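In this excerpt, every keyword line is followed by exactly one value line, and that includes the CONTAINER_ELEMENT_INFO / CONTAINER_ELEMENT_VALUE records. If that holds for the whole file (only part of it is shown, and the "..." lines stand for omitted content), a plain line-pair reader is one possible alternative to multi-line regexps. This is a hypothetical sketch, not my parser: `read_pairs` and `decode` are illustrative names, and the descriptor layout (4 unknown digits, 2-digit index, 3-digit field width, trailing "C") is assumed from the regexp in the routine.

```ruby
# Hypothetical sketch: treat the file as (keyword, value) line pairs.
# Assumes every keyword line is followed by exactly one value line.
def read_pairs(text)
  text.lines.map(&:chomp).each_slice(2).to_a
end

# Splits the pairs into a header Hash and a Hash of indexed element values.
def decode(pairs)
  header = {}
  elements = Hash.new { |hash, key| hash[key] = [] }
  pending = nil # [name, index] from the most recent CONTAINER_ELEMENT_INFO

  pairs.each do |key, value|
    case key
    when 'CONTAINER_ELEMENT_INFO'
      name, descriptor = value.split
      # Assumed layout: unknown(4) index(2) width(3) literal "C"
      m = /\A(\d{4})(\d{2})(\d{3})C\z/.match(descriptor) or
        raise ArgumentError, "unexpected descriptor: #{value.inspect}"
      pending = [name, m[2].to_i] # unknown (m[1]) and width (m[3]) unused here
    when 'CONTAINER_ELEMENT_VALUE'
      raise ArgumentError, 'value without element info' unless pending
      name, index = pending
      elements[name][index] = value
      pending = nil
    else
      header[key] = value # VERSION, DEBUG, SYSTEM, HOSTNAME, SYSNUM, ...
    end
  end

  [header, elements]
end
```

The trade-off is that a single missing or extra line shifts every later pair, whereas the regexp approach only loses the record that doesn't match.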
# Parsing routine using my parser module
DOTTED_QUAD_RE = '(\d+\.\d+\.\d+\.\d+)' # one group, so the handler gets the whole address
...
@wrk = RegexpParser.new
@wrk.on( /VERSION\n(\d+)\n/m ) { |version| @version = version }
@wrk.on( /DEBUG\n(\d+)\n/m ) { |debug| @debug = debug }
@wrk.on( /SYSTEM\n([^\n]+)\n/m ) { |system| @system = system }
@wrk.on( /HOSTNAME\n#{DOTTED_QUAD_RE}\n/m ) { |hostname|
  @hostname = hostname
}
...
@wrk.on(
  /CONTAINER_ELEMENT_INFO\n
   (\S+)        (?# element name )
   \s+
   (\d{4})      (?# unknown digits )
   (\d{2})      (?# index number )
   (\d{3})      (?# field width )
   C\n          (?# trailing literal C )
   CONTAINER_ELEMENT_VALUE\n
   ([^\n]+)\n   (?# value of the element )
  /mx
) { |name, unknown, index, width, value|
  case name
  when /REQUIREDSTARTDATE/, /REQUIREDEND/, /DATESENT/,
       /CONTRACTSTARTDATE/, /CONTRACTENDDATE/
    value = yyyymmdd_to_datetime(value)
  ...
  end
  # index is a String such as "02"; Array#[]= needs an Integer
  (@elements[name] ||= [])[index.to_i] = value
}
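For anyone wondering what sits behind `@wrk.on(...)`, here is one way an OptionParser-style class with that interface could work. It is a hypothetical sketch, not the module I'm asking about: the `parse` method name and the rule-by-rule scanning strategy are assumptions.

```ruby
# Hypothetical sketch of a RegexpParser-style class. Everything beyond
# the name and the on(regexp) { |captures...| } usage is assumed.
class RegexpParser
  def initialize
    @rules = [] # [regexp, handler] pairs, in registration order
  end

  # Registers a handler block that receives the pattern's capture groups.
  def on(regexp, &handler)
    raise ArgumentError, 'a block is required' unless handler
    @rules << [regexp, handler]
    self
  end

  # Runs every rule over the whole text and calls its handler once per match.
  # Handlers fire rule by rule, not in the order matches appear in the file.
  def parse(text)
    @rules.each do |regexp, handler|
      # String#scan sets $~ for each match, so Regexp.last_match is available.
      text.scan(regexp) { handler.call(*Regexp.last_match.captures) }
    end
    self
  end
end
```

Usage follows the routine: create the parser, register rules with `on`, then pass the file contents (e.g. from `File.read`) to `parse`. Because each rule scans the whole text independently, the order of registration doesn't matter as long as handlers store results by key or index, as the `@elements[name][index]` handler does.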
Is there a better way to do this?
Should I share my parser with others (via RubyForge or the like)?
Thanks,
JJ
---
Help everyone. If you can't do that, then at least be nice.