i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
processing
a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....
..i get the whole file matched....i just want each invoice...
it will eventually be in a oneliner like
a=File.read("billfile").scan(regexp)
so what is the non-greedy way for the above regexp to properly match
each invoice...
.*? has some pecularities, that were discussed here some time ago, so
perhaps you'd want to find them in the archives. (search for 'greedy'
or 'regex' - I don't remeber now)
···
On 11/9/06, Dave Rose <bitdoger2@yahoo.com> wrote:
i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
processing
a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....
..i get the whole file matched....i just want each invoice...
it will eventually be in a oneliner like
a=File.read("billfile").scan(regexp)
so what is the non-greedy way for the above regexp to properly match
each invoice...
i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
processing
a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....
..i get the whole file matched....i just want each invoice...
it will eventually be in a oneliner like
a=File.read("billfile").scan(regexp)
so what is the non-greedy way for the above regexp to properly match
each invoice...
try:
/(^BillHead(.*?))(^Bill_End(.*?))\n/m
or
/(^BillHead(.*?))(^Bill_End([^\n].*))\n/m
notice the .*? instead of .*
*? has some pecularities, that were discussed here some time ago, so
perhaps you'd want to find them in the archives. (search for 'greedy'
or 'regex' - I don't remeber now)
I would also remove the last .* because that likely eats up the rest of the document. So that would be
i played around in irb with a shorten extract file and found that:
b=File.read("drbilp.txt").scan(/(^BillHead(.*?))(^Bill_End(\d*)(\s*UBPBILP1\n)(.*?))/m)
works in that separates each invoice in an sub-array of size=6
in which b[0]+b[2] completes that task of reading,scanning
correctly
and puting all in a ruby 'container' that i can do an each on....thanx
dave