Ruby multiline regex problem

Gregg_Yows · 8 April 2008 16:21

Code:

"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here Best</td>"

Pattern:

<td.*?>.*?<\/td\s*>

I'm trying to match this whole block and use it for further parsing.
This started from an example in Brian Marick's book "Everyday
Scripting..." that had to be modified because Amazon has changed their
presentation to tables instead of lists.

Anyway, the regex works fine when the input is on a single line. As soon
as I introduce this:

"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here

Best</td>"

it fails.

When I try this same expression in Perl using the //s mode, it works.
I understand Ruby uses //m (multi-line mode) in nearly the same fashion,
so that "." also matches newlines, so it should work, right? Can anyone
tell me what I am doing wrong here? Why isn't "multiline" mode working?

Thanks!


--
Posted via http://www.ruby-forum.com/.

Todd_Benson · 8 April 2008 18:43

<CODE>

s = '<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here

Best</td>'

puts "######\ns:"
puts s

# r1 matches the whole cell; r2 additionally captures the cell contents.
r1 = /<td.*?>.*?<\/td.*?>/m
r2 = /<td.*?>(.*?)<\/td.*?>/m

puts "######\nscan with r1:"
puts s.scan(r1)
puts
puts "######\nmatch with r1:"
puts s.match(r1)[0]
puts

s =~ r1
puts "######\n=~ and $1 with r1:"
# r1 has no capture group, so $1 is nil here and this prints an empty line.
puts $1

puts
puts
puts

puts "######\nscan with r2:"
puts s.scan(r2)
puts
puts "######\nmatch with r2:"
puts s.match(r2)[0]
puts

s =~ r2
puts "######\n=~ and $1 with r2:"
puts $1

</CODE>
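
The script above prints quite a lot, so here is a smaller sketch of the same comparison. It assumes a made-up one-cell string (`row` and the other variable names are only for illustration) and shows how `scan` and `$1` differ between a pattern without a capture group and one with a group:

```ruby
# Hypothetical one-cell sample with a newline inside the cell.
row = "<td>one\ntwo</td>"

whole   = /<td.*?>.*?<\/td.*?>/m      # no capture group
capture = /<td.*?>(.*?)<\/td.*?>/m    # one capture group

# Without groups, scan returns the full matches as strings.
whole_matches = row.scan(whole)       # => ["<td>one\ntwo</td>"]

# With groups, scan returns one array of captures per match.
captured = row.scan(capture)          # => [["one\ntwo"]]

# $1 is only set when the pattern has a first group.
row =~ whole
first_group_whole = $1                # => nil
row =~ capture
first_group_capture = $1              # => "one\ntwo"

p whole_matches, captured, first_group_whole, first_group_capture
```

So with r2, `scan` gives nested arrays of captures, while `match(...)[0]` is still the whole matched text in both cases.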

Hmm, I'm not sure if the regexp /<td[^>]*>.*?<\/td[^>]*>/m would be
more appropriate or not.
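
One practical difference between the two forms, sketched under the assumption that an opening tag may itself be split across lines (the `tag` string is invented for this example): a negated class such as `[^>]` matches newlines regardless of flags, whereas `.` needs /m to do so.

```ruby
# Hypothetical opening tag whose attributes continue on the next line.
tag = "<td\n  align=\"left\">"

dot_without_m     = tag[/<td.*?>/]     # "." stops at the newline
negated_without_m = tag[/<td[^>]*>/]   # [^>] also matches "\n"
dot_with_m        = tag[/<td.*?>/m]    # /m lets "." match "\n"

p dot_without_m       # => nil
p negated_without_m   # => "<td\n  align=\"left\">"
p dot_with_m          # => "<td\n  align=\"left\">"
```

The cell body between the tags still relies on `.*?`, so /m is needed there either way.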

Todd


Robert_K1 · 9 April 2008 13:17

Works for me: no match without /m, match with /m:

irb(main):004:0> s=%q{<td align="left" ><div style="width: 165px; height: 175px;"><a
irb(main):005:0' href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
irb(main):006:0' something here Best</td>}
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):007:0> s[%r{<td.*?</td\s*>}]
=> nil
irb(main):008:0> s[%r{<td.*?</td\s*>}m]
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):009:0>
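
For anyone coming from Perl, here is a minimal sketch of how the flags line up, assuming a made-up two-line string (`text` is only for illustration). Ruby's /m plays the role of Perl's /s: it lets "." match a newline. Ruby has no counterpart to Perl's /m because `^` and `$` already match at every line boundary; `\A` and `\z` anchor to the whole string.

```ruby
# Hypothetical two-line input.
text = "first line\nsecond line"

without_m = text[/first.*second/]    # "." does not cross the newline
with_m    = text[/first.*second/m]   # /m: "." matches "\n" as well

line_anchor   = text[/^second/]      # ^ matches at the start of any line
string_anchor = text[/\Asecond/]     # \A matches only at the start of the string

p without_m       # => nil
p with_m          # => "first line\nsecond"
p line_anchor     # => "second"
p string_anchor   # => nil
```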

Cheers

robert


--
use.inject do |as, often| as.you_can - without end

Gregg_Yows · 10 April 2008 02:52

Thanks, folks, for all your help. It turns out that I was using the regex
test view in Eclipse (RDT), which was evidently not behaving properly in
multi-line mode. I guess I need to go out and get the Aptana/Radrails
plugin that has the latest RDT and ruby-debug built in. I identified the
issue using Mike Lovitt's Rubular regex tester. Thanks, Mike, for
restarting that server!

http://www.rubular.com/


Robert_K1 · 10 April 2008 08:42

Why look so far? IRB serves the same purpose.

Cheers

robert


Gregg_Yows · 10 April 2008 12:33


I'm a newb with Ruby and IRB. I did test the regex in IRB, but did not
know that I could set up a literal string containing \n characters
through the interface, like you did above. So, of course, my single-line
test was passing every time. That is very cool! I am growing fonder of
IRB every day...
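
For other newcomers, a small sketch of three ways to get a string with embedded newlines into IRB, assuming a shortened version of the sample text (all variable names here are made up): a double-quoted string with `\n` escapes, a `%q{}` literal typed over several lines, and a heredoc. Note that single quotes do not interpret `\n`.

```ruby
# 1. Double quotes: \n is turned into a real newline.
escaped = "testPit\nsomething here\n\nBest</td>"

# 2. %q{...}: press Enter inside the braces; IRB keeps reading until "}".
literal = %q{testPit
something here

Best</td>}

# 3. Heredoc: everything up to the HTML terminator; chomp drops the last newline.
heredoc = <<HTML.chomp
testPit
something here

Best</td>
HTML

# Single quotes keep the backslash and the "n" as two separate characters.
single = 'a\nb'

p escaped == literal              # => true
p escaped == heredoc              # => true
p single.length                   # => 4
p escaped[/testPit.*Best/m]       # => "testPit\nsomething here\n\nBest"
```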
