Multi-line regular expression match question

Hi,

I have been trying to create a solid regular expression to match a
possibly multi-line expression, without success. After several hours I
almost got there, but not to the point I would like, so I'm hoping
somebody can point me in the right direction.
Here is an example of what I am dealing with:

01xxxxxxxxxxxxxx
|-:
<20>ABCD
<30>edfghi212
-|
|-:
<20>EFGH
<30>hjkli3232
-|
89xxxxxxxxxxxxx

I need to match anything that is enclosed between "|-:" and "-|".
So far I've got "/^\{\|-:$.*^-\|$/m". This one is greedy, returning the
complete set instead of each match; I just haven't figured out how to
make it reluctant enough to return the matches one by one.

The matches I expect should look something like this:

1:
|-:
<20>ABCD
<30>edfghi212
-|

2:
|-:
<20>EFGH
<30>hjkli3232
-|

Currently it is returning:

1:
|-:
<20>ABCD
<30>edfghi212
-|
|-:
<20>EFGH
<30>hjkli3232
-|

Any suggestion is greatly appreciated.
And finally, any good regular expressions book ?? =)

Cheers,
guillermo

···

--
Posted via http://www.ruby-forum.com/.

irb(main):018:0> s =<<EOF
irb(main):019:0" 01xxxxxxxxxxxxxx
irb(main):020:0" |-:
irb(main):021:0" <20>ABCD
irb(main):022:0" <30>edfghi212
irb(main):023:0" -|
irb(main):024:0" |-:
irb(main):025:0" <20>EFGH
irb(main):026:0" <30>hjkli3232
irb(main):027:0" -|
irb(main):028:0" 89xxxxxxxxxxxxx
irb(main):029:0" EOF
=> "01xxxxxxxxxxxxxx\n|-:\n<20>ABCD\n<30>edfghi212\n-|\n|-:\n<20>EFGH\n<30>hjkli3232\n-|\n89xxxxxxxxxxxxx\n"
irb(main):036:0> s.scan(/(\|-:.*?-\|)/m)
=> [["|-:\n<20>ABCD\n<30>edfghi212\n-|"], ["|-:\n<20>EFGH\n<30>hjkli3232\n-|"]]
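For comparison, here is a minimal sketch of what the `?` after `.*` changes, run against the same sample text. The variable names (`sample`, `greedy`, `lazy`) are only for illustration and do not come from the original posts; the capture group is dropped so that `scan` returns plain strings.

```ruby
# Same sample text as in the irb session above.
sample = <<EOF
01xxxxxxxxxxxxxx
|-:
<20>ABCD
<30>edfghi212
-|
|-:
<20>EFGH
<30>hjkli3232
-|
89xxxxxxxxxxxxx
EOF

# Greedy: .* runs to the end of the string, then backtracks to the LAST "-|".
greedy = sample.scan(/\|-:.*-\|/m)

# Reluctant: .*? stops at the FIRST "-|" that follows each "|-:".
lazy = sample.scan(/\|-:.*?-\|/m)

puts greedy.size  # 1
puts lazy.size    # 2
```

With no capture group in the pattern, each element of the result is the matched text itself rather than a one-element array.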

Jesus.

···

On Fri, Nov 19, 2010 at 1:42 PM, Guillermo Riojas <guillermo.riojas@gmail.com> wrote:

I need to match anything that is enclosed between "|-:" and "-|".
So far I've got "/^\{\|-:$.*^-\|$/m". This one is greedy, returning the
complete set instead of each match; I just haven't figured out how to
make it reluctant enough to return the matches one by one.

If you're using Ruby 1.9 you can use this; *? is the reluctant version of *:

  /^\|-:$.*?^-\|$/m

Note that I removed the '\{' from your original pattern. It is not
needed in this case.

If you're on 1.8, one possibility is:

  /^\|-:$(?m:.*?)(?=^-\|)^-\|$/

But there are other, probably more efficient, ways to do it as well.
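As a rough illustration of what the line anchors add, the sketch below runs both patterns against a made-up input in which "-|" also appears in the middle of a data line. That input and the variable names (`text`, `loose`, `anchored`, `alt`) are assumptions for the example, not part of the original data.

```ruby
# Hypothetical input: the first block contains "-|" in the middle of a line.
text = "|-:\n<20>AB-|CD\n-|\n|-:\n<20>EFGH\n-|\n"

# Unanchored reluctant pattern: stops at the first "-|" anywhere.
loose = text.scan(/\|-:.*?-\|/m)

# Anchored pattern: the delimiters must each sit alone on a line.
anchored = text.scan(/^\|-:$.*?^-\|$/m)

# Second variant: only the middle part lets "." match newlines.
alt = text.scan(/^\|-:$(?m:.*?)(?=^-\|)^-\|$/)

p loose.first     # "|-:\n<20>AB-|"
p anchored.first  # "|-:\n<20>AB-|CD\n-|"
p alt == anchored # true
```

In Ruby, ^ and $ match at the start and end of every line regardless of flags, and /m only makes "." match newlines, which is why the second variant can scope it to the middle group.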

And finally, any good regular expressions book ?? =)

Jeffrey Friedl's Mastering Regular Expressions is an excellent read
and covers regular expressions inside and out, literally. Here's the
Amazon link:

  Mastering Regular Expressions, 3rd Edition [Book]

HTH,
Ammar

···

On Fri, Nov 19, 2010 at 2:42 PM, Guillermo Riojas <guillermo.riojas@gmail.com> wrote:

If you're using Ruby 1.9 you can use this; *? is the reluctant version of *:

/^\|-:$.*?^-\|$/m

---8<---

If you're on 1.8, one possibility is:

/^\|-:$(?m:.*?)(?=^-\|)^-\|$/

I was under the impression that the reluctant versions of the four
quantifiers were only available under Ruby 1.9, but they are apparently
available under 1.8 as well. I used one in the example I showed for 1.8
without noticing.
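A small sketch of the greedy and reluctant forms side by side, assuming the four quantifiers meant here are *, +, ? and {n,m}. The test string and variable names are made up for the example.

```ruby
str = "aaaa"

# Greedy forms take as much as they can.
greedy = [str[/a*/], str[/a+/], str[/a?/], str[/a{2,3}/]]

# Reluctant forms (a trailing ?) take as little as they can.
reluctant = [str[/a*?/], str[/a+?/], str[/a??/], str[/a{2,3}?/]]

p greedy     # ["aaaa", "aaaa", "a", "aaa"]
p reluctant  # ["", "a", "", "aa"]
```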

Regards,
Ammar

···

On Fri, Nov 19, 2010 at 3:50 PM, Ammar Ali <ammarabuali@gmail.com> wrote:

"Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post #962571:

irb(main):018:0> s =<<EOF
irb(main):019:0" 01xxxxxxxxxxxxxx
irb(main):020:0" |-:
irb(main):021:0" <20>ABCD
irb(main):022:0" <30>edfghi212
irb(main):023:0" -|
irb(main):024:0" |-:
irb(main):025:0" <20>EFGH
irb(main):026:0" <30>hjkli3232
irb(main):027:0" -|
irb(main):028:0" 89xxxxxxxxxxxxx
irb(main):029:0" EOF
=> "01xxxxxxxxxxxxxx\n|-:\n<20>ABCD\n<30>edfghi212\n-|\n|-:\n<20>EFGH\n<30>hjkli3232\n-|\n89xxxxxxxxxxxxx\n"
irb(main):036:0> s.scan(/(\|-:.*?-\|)/m)
=> [["|-:\n<20>ABCD\n<30>edfghi212\n-|"], ["|-:\n<20>EFGH\n<30>hjkli3232\n-|"]]

Jesus.

Muchas gracias =)
works like a charm
guillermo.

···

Thanks for the clarification, Ammar; it works perfectly. And thanks a
lot for the book recommendation too, very useful.

guillermo