Non-greedy regexp

Hi,

The following regexp is supposed to chop off the last / of a string
and all characters following it, but it seems to be ignoring the
non-greedy indicator (?):

irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?.html$|, '')
"http:"

The expected result should be “http://www.x.com/y”. I thought this
was a bug, but Perl produces the same result, so what am I missing?

Is there a better alternative to parsing URLs by hand?
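Ruby's standard library does include a URI parser (require 'uri'). It splits a URL into scheme, host, path, query and so on, which leaves only the path to be handled as a string. A minimal sketch, assuming the goal is the same "drop the last path segment" operation asked about here; parent_url is a hypothetical helper name, not part of the library.

```ruby
require 'uri'

# Hypothetical helper (not part of the uri library): return the URL with
# its last path segment, and the '/' before it, removed.
def parent_url(url)
  uri = URI.parse(url)                       # scheme, host, path, query, ...
  uri.path = uri.path.sub(%r{/[^/]*\z}, '')  # "/y/z.html" -> "/y"
  uri.to_s
end

uri = URI.parse("http://www.x.com/y/z.html")
puts uri.scheme  # http
puts uri.host    # www.x.com
puts uri.path    # /y/z.html

puts parent_url("http://www.x.com/y/z.html")  # http://www.x.com/y
```

Because the query string is stored separately from uri.path, editing the path leaves it alone: "http://www.x.com/y/z.html?a=1" becomes "http://www.x.com/y?a=1".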

Thanks



tom@alkali.spamfree.org
remove ‘spamfree.’ to respond

Hello –

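You’re missing the notion of a leftmost match. The regex engine works
from left to right, so to speak, trying each starting position in turn
as it looks for the ‘/’. It finds it in the sixth character. From there
it does what you asked: it extends ‘.+?’ one character at a time until
‘.html’ matches at the end of the line. The non-greedy ‘?’ only limits
how much ‘.+?’ consumes from a given starting point; it never makes the
engine pass over an earlier starting point at which the whole pattern
can succeed.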
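The same point can be checked in Ruby by asking where the match begins. A small sketch using MatchData#begin and #pre_match on the original pattern; it also compares the greedy form ‘.+’, which gives the identical result here because the starting position, not the quantifier, decides the outcome.

```ruby
url = "http://www.x.com/y/z.html"

# The engine tries start positions from the left and keeps the first one at
# which the whole pattern succeeds; here that is the first '/'.
m = url.match(%r|/.+?.html$|)
puts m.begin(0)   # 5 (zero-based, i.e. the sixth character)
puts m[0]         # //www.x.com/y/z.html
puts m.pre_match  # http:

# Greedy and non-greedy agree: both start at index 5 and both must reach
# ".html" at the end of the line, so they consume the same text.
lazy   = url.sub(%r|/.+?.html$|, '')
greedy = url.sub(%r|/.+.html$|, '')
puts lazy == greedy  # true
```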

To do what you were trying to do, try this:

"http://www.x.com/y/z.html".sub(%r|/[^/]+/?.html$|, '')
"http://www.x.com/y"

That also finds the leftmost match – but in this case, the leftmost
match doesn’t start until the last ‘/’ (because none of the other
'/'s, even though they’re further left, allow the rest of the match to
succeed).
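A sketch of the same idea, checking where the match now begins. Two small changes relative to the patterns in this thread are assumptions of this example, not something the posters used: the dot is escaped as \. (an unescaped ‘.’ matches any character, so a name like "z_html" would also match), and \z anchors at the very end of the string, whereas $ matches at the end of any line.

```ruby
url = "http://www.x.com/y/z.html"

# [^/]+ cannot cross a '/', so the only '/' from which the rest of the
# pattern can reach the end of the string is the last one.
pattern = %r|/[^/]+\.html\z|

m = url.match(pattern)
puts m.begin(0)              # 18, same as url.rindex('/')
puts m[0]                    # /z.html
puts url.sub(pattern, '')    # http://www.x.com/y

# The escaped dot rejects names that merely end in "html" (no match, so
# the string comes back unchanged).
p "http://www.x.com/y/z_html".sub(pattern, '')  # "http://www.x.com/y/z_html"
```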

David

On Tue, 13 Aug 2002, Tom Robinson wrote:


David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/[^/]+.html$|, '')
"http://www.x.com/y"
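Since the goal is only to cut at the last ‘/’, a regexp is not strictly needed. A minimal sketch using String#rindex; chop_last_segment is a hypothetical helper name. String#rpartition is shown as a one-line alternative.

```ruby
# Hypothetical helper: drop the last '/' and everything after it.
# Returns the string unchanged when it contains no '/'.
def chop_last_segment(str)
  i = str.rindex('/')   # index of the last '/', or nil
  i ? str[0...i] : str
end

url = "http://www.x.com/y/z.html"
puts chop_last_segment(url)       # http://www.x.com/y

# rpartition splits around the last '/': [before, "/", after].
puts url.rpartition('/').first    # http://www.x.com/y
```

Unlike the regexps, this cuts at the last ‘/’ whether or not the name ends in .html. Note that rpartition returns an empty first element when there is no ‘/’ at all, so it differs from the helper on such input.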

On Tue, Aug 13, 2002 at 03:28:26AM +0900, Tom Robinson wrote:

Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Never trust an operating system you don’t have sources for. ;-)
– Unknown source

Hi –

On Tue, 13 Aug 2002 dblack@candle.superlink.net wrote:

To do what you were trying to do, try this:

"http://www.x.com/y/z.html".sub(%r|/[^/]+/?.html$|, '')
"http://www.x.com/y"

Whoops, having seen Mauricio’s, I now see that a meaningless “/?” has
slipped into mine :-)

David

