Surprising Regexp Behavior

What's the bug to you: the fact that the second <p></p> wasn't
stripped, or the fact that $2 is empty?

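In the former case, sub != gsub: sub replaces only the first match,
while gsub replaces every match. In the latter, you need multi-line
mode (/m) because of the "\n\n"; without it, "." doesn't match a newline: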

# Without /m
irb(main):026:0> html =~ /<p>(.*?)<\/p>(.*)/
=> 0
irb(main):027:0> $1
=> "one"
irb(main):028:0> $2
=> ""

# With /m
irb(main):023:0> html =~ /<p>(.*?)<\/p>(.*)/m
=> 0
irb(main):024:0> $1
=> "one"
irb(main):025:0> $2
=> "\n\n<p>two</p>"
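Both points can be checked outside irb with a short script. This is a minimal sketch, assuming a plain Ruby script run top to bottom; the variable names (first_only, every, plain_tail, dotall_tail, dotall_sub) are illustrative only. It also shows a side effect of adding /m to this particular pattern: the greedy (.*) then consumes the rest of the string.

```ruby
# Minimal sketch: the thread's two observations, run as a plain script.
html    = "<p>one</p>\n\n<p>two</p>"
pattern = /<p>(.*?)<\/p>(.*)/

# sub stops after the first match; gsub keeps scanning the string.
first_only = html.sub(pattern) { $1.strip }
every      = html.gsub(pattern) { $1.strip }
p first_only   # => "one\n\n<p>two</p>"
p every        # => "one\n\ntwo"

# Without /m, "." does not match "\n", so (.*) captures an empty string.
plain_tail = html.match(pattern)[2]
p plain_tail   # => ""

# With /m (Regexp::MULTILINE), "." matches "\n" as well.
dotall      = /<p>(.*?)<\/p>(.*)/m
dotall_tail = html.match(dotall)[2]
p dotall_tail  # => "\n\n<p>two</p>"

# Caveat: with /m the greedy (.*) consumes the rest of the string,
# so sub now discards the second paragraph entirely.
dotall_sub = html.sub(dotall) { $1.strip }
p dotall_sub   # => "one"
```

Note that Ruby's /m corresponds to what Perl calls /s; in Ruby, ^ and $ already match at every line boundary without any flag.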

Regards,

Dan

···

-----Original Message-----
From: James Edward Gray II [mailto:james@grayproductions.net]
Sent: Tuesday, September 13, 2005 12:31 PM
To: ruby-talk ML
Subject: Surprising Regexp Behavior

I keep running into some surprising points with Ruby's Regexp engine
today, and this first one just looks plain wrong to me:

irb(main):001:0> html = "<p>one</p>\n\n<p>two</p>"
=> "<p>one</p>\n\n<p>two</p>"
irb(main):002:0> html.sub!(/<p>(.*?)<\/p>(.*)/) { $1.strip }
=> "one\n\n<p>two</p>"
irb(main):003:0> $2
=> ""

Can anyone explain to me how that isn't a bug?
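To rule out irb itself as the cause, one option is to record the MatchData inside the block rather than reading $2 on a later prompt line. A minimal sketch, assuming a plain script; the captures variable is a hypothetical name used only for illustration:

```ruby
# Sketch: record the MatchData inside the block instead of reading $2
# on a later irb line, to confirm the empty capture is not an irb quirk.
html = "<p>one</p>\n\n<p>two</p>".dup   # sub! mutates; start from a mutable copy
captures = nil
html.sub!(/<p>(.*?)<\/p>(.*)/) do
  captures = $~.captures   # $~ is the MatchData for the current match
  $1.strip
end
p html       # => "one\n\n<p>two</p>"
p captures   # => ["one", ""]
```

The second capture is already empty at match time, which points at the pattern (no /m) rather than at irb.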

Yep, that's what I was forgetting. Thanks for the lesson.

James Edward Gray II

···

On Sep 13, 2005, at 1:46 PM, Berger, Daniel wrote:

In the former, sub != gsub. In the latter, you need multi-line mode
because of the "\n\n":

# Without /m
irb(main):026:0> html =~ /<p>(.*?)<\/p>(.*)/
=> 0
irb(main):027:0> $1
=> "one"
irb(main):028:0> $2
=> ""

# With /m
irb(main):023:0> html =~ /<p>(.*?)<\/p>(.*)/m
=> 0
irb(main):024:0> $1
=> "one"
irb(main):025:0> $2
=> "\n\n<p>two</p>"

Thank Dave Thomas: the Pickaxe (HTML version I) is always open in my
browser, and by far its most-used page is the one on regex syntax. It just
happened to be open ;-)

-a

···

On Wed, 14 Sep 2005, James Edward Gray II wrote:

Yep, that's what I was forgetting. Thanks for the lesson.

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells among the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna
