Why does this:
text= "AA<X>BB<X>CC</X>DD</X>EE"
regex = %r{(.*)<X>(.*)}
t = text.sub( regex, "z" );
print "$1=#{$1}\n$2=#{$2}\n$3=#{$3}\n$4=#{$4}\n"
Return this:
$1=AA<X>BB
$2=CC</X>DD</X>EE
$3=
$4=
Instead of:
$1=AA
$2=BB<X>CC</X>DD</X>EE
$3=
$4=
And how would I fix it?
Paul
Why does this:
text= "AA<X>BB<X>CC</X>DD</X>EE"
regex = %r{(.*)<X>(.*)}
t = text.sub( regex, "z" );
print "$1=#{$1}\n$2=#{$2}\n$3=#{$3}\n$4=#{$4}\n"
Return this:
$1=AA<X>BB
$2=CC</X>DD</X>EE
$3=
$4=
Because the construct .* means, "Zero or more non-newline characters, but as many as I can get". We say the * operator is "greedy".
Instead of:
$1=AA
$2=BB<X>CC</X>DD</X>EE
$3=
$4=
And how would I fix it?
One way would be to switch from the greedy * to the non-greedy (lazy) *?, which matches as few characters as it can. That would have your Regexp looking like this:
%r{(.*?)<X>(.*)}
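To see the difference side by side, here is a small sketch that runs both patterns against the string from the question. It uses String#match and reads the groups from the returned MatchData rather than from $1 and $2; the variable names greedy and lazy are just for illustration.

```ruby
text = "AA<X>BB<X>CC</X>DD</X>EE"

# Greedy: the first .* grabs as much as it can and backs off
# only far enough to let the literal <X> match.
greedy = text.match(%r{(.*)<X>(.*)})

# Non-greedy: the first .*? stops at the earliest point where
# the literal <X> can match.
lazy = text.match(%r{(.*?)<X>(.*)})

puts "greedy: 1=#{greedy[1]} 2=#{greedy[2]}"
puts "lazy:   1=#{lazy[1]} 2=#{lazy[2]}"
```

The greedy version splits at the last <X>, the lazy one at the first.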
Another way is to use split() with a limit:
irb(main):001:0> text= "AA<X>BB<X>CC</X>DD</X>EE"
=> "AA<X>BB<X>CC</X>DD</X>EE"
irb(main):002:0> first, rest = text.split(/<X>/, 2)
=> ["AA", "BB<X>CC</X>DD</X>EE"]
irb(main):003:0> first
=> "AA"
irb(main):004:0> rest
=> "BB<X>CC</X>DD</X>EE"
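A couple of things worth knowing about the limit argument, sketched below as a plain script rather than an irb session. The second string ("no delimiter here") is made up for the example and is not from the original question.

```ruby
text = "AA<X>BB<X>CC</X>DD</X>EE"

# With a limit of 2, split stops after the first <X>.
first, rest = text.split(/<X>/, 2)

# Without a limit, every <X> is a split point.
pieces = text.split(/<X>/)

# If the delimiter never occurs, split returns a one-element
# array, so the second variable ends up nil.
only, missing = "no delimiter here".split(/<X>/, 2)

p first, rest, pieces, only, missing
```

So if the input might not contain <X> at all, check rest for nil before using it.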
Hope that helps.
James Edward Gray II
On Mar 31, 2005, at 2:49 PM, Paul Hanchett wrote:
Hi --
Why does this:
text= "AA<X>BB<X>CC</X>DD</X>EE"
regex = %r{(.*)<X>(.*)}
t = text.sub( regex, "z" );
print "$1=#{$1}\n$2=#{$2}\n$3=#{$3}\n$4=#{$4}\n"
Return this:
$1=AA<X>BB
$2=CC</X>DD</X>EE
$3=
$4=
Instead of:
$1=AA
$2=BB<X>CC</X>DD</X>EE
$3=
$4=
Because * is "greedy" -- meaning, it eats up as many characters as
possible, from left to right, while still allowing for a successful
match overall.
So your first .* eats up everything until it reaches as far right as
it possibly can -- namely, just before the second <X> (which it then
leaves intact so that it can be matched by the literal <X> in your
regex). It even eats up the first <X>.
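If you want to check where the first group actually stops, MatchData#offset reports the start and end positions of each group. This is only a sketch to illustrate the point above; the variable names are arbitrary.

```ruby
text = "AA<X>BB<X>CC</X>DD</X>EE"
m = text.match(%r{(.*)<X>(.*)})

# [start, end] character positions of each capture group
# (end is exclusive).
first_start, first_end = m.offset(1)
second_start, second_end = m.offset(2)

# Position of the last literal <X> in the string.
last_x = text.rindex("<X>")

puts "group 1 covers #{first_start}...#{first_end}"
puts "last <X> starts at #{last_x}"
puts "group 2 covers #{second_start}...#{second_end}"
```

Group 1 ends exactly where the last <X> begins, which is the "as far right as it possibly can" behaviour.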
And how would I fix it?
Use *? instead of * -- like this:
regex = %r{(.*?)<X>(.*)}
David
On Fri, 1 Apr 2005, Paul Hanchett wrote:
--
David A. Black
dblack@wobblini.net
* Paul Hanchett (Mar 31, 2005 23:00):
text= "AA<X>BB<X>CC</X>DD</X>EE"
regex = %r{(.*)<X>(.*)}
use
regex = %r{(.*?)<X>(.*)}
The first .* is greedy, so it will match right through the first <X> and
will only relinquish the second one so that an overall match can be made
(for the <X>-part of the regex).
nikolai
--
::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: minimalistic.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
Thanks all for the help. I understand better now.
Paul