Hi,
The following regexp is supposed to chop off the last / of a string
and all characters following it, but it seems to be ignoring the
non-greedy indicator (?):
irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?.html$|, '')
"http:"
The expected result is "http://www.x.com/y". I thought this
was a bug, but Perl produces the same result, so what am I missing?
Is there a better alternative to parsing URLs by hand?
Thanks
--
tom@alkali.spamfree.org
remove ‘spamfree.’ to respond
Hello --
> Hi,
> The following regexp is supposed to chop off the last / of a string
> and all characters following it, but it seems to be ignoring the
> non-greedy indicator (?):
> irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?.html$|, '')
> "http:"
> The expected result is "http://www.x.com/y". I thought this
> was a bug, but Perl produces the same result, so what am I missing?
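You're missing the notion of a leftmost match. The regex engine scans
from left to right for the first position where the whole pattern can
match, so it begins at the first '/', which is the sixth character.
From there, the non-greedy .+? only controls how far the match extends,
not where it starts: it grows one character at a time until '.html'
matches at the end of the line, which consumes everything after 'http:'.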
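To see where the match actually begins, here is a small sketch that inspects the MatchData for the original pattern. The variable names are just for illustration and are not from the original post.

```ruby
url = "http://www.x.com/y/z.html"

# The original pattern. The engine commits to the first "/" from which
# the rest of the pattern can succeed.
m = url.match(%r|/.+?.html$|)

start_index = m.begin(0)   # 0-based offset where the match starts
matched     = m[0]         # the text the pattern consumed
remainder   = m.pre_match  # what sub(..., '') leaves behind

puts start_index  # 5, i.e. the sixth character
puts matched      # //www.x.com/y/z.html
puts remainder    # http:
```

The non-greedy quantifier is honoured here: from offset 5 there is simply no shorter way to reach '.html' at the end of the line.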
To do what you were trying to do, try this:
"http://www.x.com/y/z.html".sub(%r|/[^/]+\.html$|, '')
"http://www.x.com/y"
That also finds the leftmost match, but in this case the leftmost
match doesn't start until the last '/' (because none of the other
'/'s, even though they're further left, allow the rest of the match to
succeed).
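A minimal sketch of three ways to anchor the cut at the last '/', assuming the input always contains at least one '/'. The first follows the negated-character-class idea; the other two are alternatives that were not part of the original suggestion and do not depend on a '.html' suffix.

```ruby
url = "http://www.x.com/y/z.html"

# [^/]+ cannot cross a "/", so only the last "/" lets the rest of the
# pattern reach the end of the string.
html_only = url.sub(%r|/[^/]+\.html$|, '')

# More general: drop the last "/" and whatever follows it, whatever the
# extension is.
any_tail = url.sub(%r|/[^/]*\z|, '')

# No regex: slice up to (not including) the last "/" found by String#rindex.
# Assumes a "/" is present; rindex returns nil otherwise.
sliced = url[0...url.rindex('/')]

puts html_only
puts any_tail
puts sliced
```

All three yield "http://www.x.com/y" for this input.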
David
On Tue, 13 Aug 2002, Tom Robinson wrote:
--
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav
irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/[^/]+\.html$|, '')
"http://www.x.com/y"
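On the question of avoiding URL parsing by hand: one possible approach, assuming a Ruby version that ships the standard `uri` library, is to parse the URL and trim only its path component. This is a sketch rather than a recommendation from the thread, and the variable names are illustrative.

```ruby
require 'uri'

uri = URI.parse("http://www.x.com/y/z.html")

# File.dirname treats the URL path like a file path and drops the
# last segment: "/y/z.html" becomes "/y".
uri.path = File.dirname(uri.path)

parent = uri.to_s
puts parent
```

This keeps the scheme and host out of the string manipulation entirely, so the '//' after 'http:' can never be mistaken for a path separator.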
On Tue, Aug 13, 2002 at 03:28:26AM +0900, Tom Robinson wrote:
> Hi,
> The following regexp is supposed to chop off the last / of a string
> and all characters following it, but it seems to be ignoring the
> non-greedy indicator (?):
> irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?.html$|, '')
> "http:"
> The expected result is "http://www.x.com/y". I thought this
> was a bug, but Perl produces the same result, so what am I missing?
> Is there a better alternative to parsing URLs by hand?
--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com
Never trust an operating system you don’t have sources for. 
– Unknown source