Code:
"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here Best</td>"
Pattern:
<td.*?>.*?<\/td\s*>
I'm trying to match this whole block and use it for further parsing.
This started from an example in Brian Marick's book "Everyday
Scripting..." that had to be modified because Amazon has changed their
presentation to tables instead of lists.
Anyway, the regex works fine on a single line. As soon as I introduce
this:
"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here
Best</td>"
it fails.
When I try this same expression in Perl using the //s mode, it works.
I understand Ruby uses //m (multi-line mode) in nearly the same fashion,
so that "." matches newlines like any other character, so it should work,
right? Can anyone tell me what I am doing wrong here? Why isn't
"multiline" mode working?
Thanks!
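For reference, a minimal sketch of how the flags line up, using a shortened stand-in string rather than the real Amazon markup (the string and variable names are assumptions for illustration). In Ruby, /m is the flag that lets "." match a newline, which is what Perl calls /s, while ^ and $ already match at line boundaries without any flag.

```ruby
# Shortened stand-in for the multi-line <td> block (not the real Amazon markup).
cell = "<td align=\"left\"><a href=\"#\">testPit\nsomething here\nBest</td>"

pattern_plain = /<td.*?>.*?<\/td\s*>/   # "." stops at "\n"
pattern_m     = /<td.*?>.*?<\/td\s*>/m  # "." also matches "\n" (Perl's /s)

no_flag_match = cell[pattern_plain]  # nil: the block spans several lines
m_flag_match  = cell[pattern_m]      # the whole block

# ^ and $ are line anchors in Ruby even without a flag (Perl needs /m for that).
line_hit = cell[/^Best/]

puts no_flag_match.inspect
puts m_flag_match == cell
puts line_hit.inspect
```

If the pattern behaves like this in irb but not in some other tool, the tool's handling of the flag is the more likely culprit than the pattern itself.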
···
--
Posted via http://www.ruby-forum.com/.
<CODE>
s = '<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here
Best</td>'
puts "######\ns:"
puts s
# /m lets "." match newlines as well; r2 adds a capture group for the cell body
r1 = /<td.*?>.*?<\/td.*?>/m
r2 = /<td.*?>(.*?)<\/td.*?>/m
puts "######\nscan with r1:"
puts s.scan(r1)
puts
puts "######\nmatch with r1:"
puts s.match(r1)[0]
puts
s =~ r1
puts "######\n=~ and $1 with r1:"
# r1 has no capture group, so $1 is nil here
puts $1
puts
puts
puts
puts "######\nscan with r2:"
puts s.scan(r2)
puts
puts "######\nmatch with r2:"
puts s.match(r2)[0]
puts
s =~ r2
puts "######\n=~ and $1 with r2:"
# with r2, $1 holds everything between the opening and closing tags
puts $1
</CODE>
Hmm, I'm not sure if the regexp /<td[^>]*>.*?<\/td[^>]*>/m would be
more appropriate or not.
Todd
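One way to see what the character-class variant changes is a sketch like the following. The string and the shortened patterns are made up for illustration and are not Todd's exact regexps; the point is only that ".*?" may grow past the first ">" when that lets the rest of the pattern succeed, whereas "[^>]*" can never leave the opening tag.

```ruby
# Illustrative string, not taken from the Amazon page.
html = "<td><b>x</b><a href='#'>y</a></td>"

lazy_dot   = /<td.*?><a/m    # ".*?" may expand past the first ">" if that helps the match
no_bracket = /<td[^>]*><a/m  # "[^>]*" stops at the first ">", so the tag must end there

lazy_hit   = html[lazy_dot]
strict_hit = html[no_bracket]

puts lazy_hit.inspect    # "<td><b>x</b><a"
puts strict_hit.inspect  # nil
```

For the single cell in this thread both forms give the same result; the difference only shows up when whatever follows the opening tag forces the engine to backtrack.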
···
On Tue, Apr 8, 2008 at 11:21 AM, Gregg Yows <gregg@yows.net> wrote:
Works for me: no match without /m, match with /m:
irb(main):004:0> s=%q{<td align="left" ><div style="width: 165px; height: 175px;"><a
irb(main):005:0' href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
irb(main):006:0' something here Best</td>}
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):007:0> s[%r{<td.*?</td\s*>}]
=> nil
irb(main):008:0> s[%r{<td.*?</td\s*>}m]
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):009:0>
Cheers
robert
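Since the goal was to use the matched block for further parsing, here is a small sketch of one possible next step. It relies on String#[] accepting a capture-group index as a second argument; the two inner patterns are assumptions for illustration and are not from the thread.

```ruby
# The cell from the original post, with the same line breaks.
s = %q{<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here Best</td>}

cell = s[%r{<td.*?</td\s*>}m]   # whole cell, same pattern as in the irb session

# The second argument to String#[] selects a capture group instead of the full match.
href = cell[/href="([^"]*)"/, 1]
text = cell[%r{<a[^>]*>(.*?)</td\s*>}m, 1]

puts href
puts text.inspect
```

Regexps are fine for a quick script like this, but they get brittle once the markup nests or varies, so treat the inner patterns as a starting point only.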
···
2008/4/8, Gregg Yows <gregg@yows.net>:
--
use.inject do |as, often| as.you_can - without end
Thanks, folks, for all your help. It turns out that I was using the regex
test view in Eclipse (RDT), which was obviously not behaving properly in
multi-line mode. I guess I need to go out and get the Aptana/RadRails
plugin that has the latest RDT and ruby-debug built in. I identified the
issue using Mike Lovitt's Rubular regex tester. Thanks, Mike, for
restarting that server!
http://www.rubular.com/
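When a tool's flag handling is in doubt, the newline-matching behaviour can also be written into the pattern itself. The sketch below shows three spellings that behave the same way in Ruby, on a shortened stand-in string; whether the Eclipse view would have honored any of them is not something this thread establishes.

```ruby
s = "<td>testPit\nsomething here\nBest</td>"   # shortened stand-in, not the real markup

inline_flag = /(?m:<td.*?>.*?<\/td\s*>)/        # flag scoped to the group, no trailing /m
class_based = /<td[^>]*>[\s\S]*?<\/td\s*>/      # [\s\S] matches any character, newline included
from_string = Regexp.new('<td.*?>.*?</td\s*>', Regexp::MULTILINE)

results = [inline_flag, class_based, from_string].map { |re| s[re] }
puts results.all? { |hit| hit == s }
```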
Why look so far? IRB serves the same purpose.
Cheers
robert
···
2008/4/10, Ransom Tullis <gregg@yows.net>: