Hi Carlos,
Carlos wrote:
> Hi. The problem is that you can very easily NOT match "three" even if it's
> there. I mean, if you have
> /1.*3?.*5/ =~ '12345'
> the engine can succeed matching the 1 at the beginning, the 5 at the end,
> and trying to match the 3 where the 4 is... and failing, but since it's
> optional, the overall match succeeds.
> I think you should try it in two steps: first, try to match with the 3; if
> that fails, without the "3". Something like:
> /(?:(1).*(3)|(1)).*(5)/
> (The '1' will come either on the first or third array position, you'll have
> to take care of that.)
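For anyone following along, here is a minimal Ruby sketch of the behaviour described in the quote. The helper name `one_three_five` is made up for illustration. It uses the two-branch pattern from the quote and moves the '1' back into a fixed position, so callers don't have to check two slots.

```ruby
# With an optional 3, the first greedy .* swallows the real 3, and the
# optional group is then free to match nothing.
naive = /1.*(3)?.*5/.match('12345')
naive_three = naive[1] # nil, even though the string contains a 3

# Hypothetical helper: try the branch with the 3 first, then the branch
# without it, and always return [one, three_or_nil, five].
def one_three_five(str)
  m = /(?:(1).*(3)|(1)).*(5)/.match(str)
  return nil unless m

  one = m[1] || m[3] # the 1 is in slot 1 or slot 3, depending on the branch
  [one, m[2], m[4]]
end

p naive_three
p one_three_five('12345')
p one_three_five('1245')
```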
Yes, you understand exactly what my problem is.
I had guessed as much, though I couldn't have explained it as well
as you did.
The solution I found uses two regexes.
First, I try to find a match assuming "three" is there.
If that fails, I try to find a match without "three".
This solved my problem.
But I wanted to know whether there's a one-shot solution.
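One possible one-shot form, offered as a sketch rather than a definitive answer: move the wildcard that precedes the optional item inside the optional group. The `?` quantifier is greedy, so the engine first tries the whole group (skip ahead, then match the 3) and only skips the group if no 3 can be found. The constant name `ONE_SHOT` is made up for this example.

```ruby
# The skip-ahead (.*?) lives inside the optional group, so "find a 3
# somewhere ahead" is attempted before "do without the 3".
ONE_SHOT = /(1)(?:.*?(3))?.*(5)/

with_three    = ONE_SHOT.match('12345').captures
without_three = ONE_SHOT.match('1245').captures

p with_three
p without_three
```

The captures stay in the same positions in both cases; the middle one is simply nil when there is no 3.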
This is the actual problem, just in case someone wants to know.
html = <<END
<tr id="row2_210819526">
<td class="year">
<h5>2004 Used</h5></td>
<td class="carlink"><h5>
<a name="210819526" href="210819526.html">BMW 325Ci
Coupe</a><br />
</h5></td>
<td class="mileage">
<span class="body20">38,604<br /></span><span
class="body30">Mileage</span></td>
<td class="price">
<span class="body20">
$24,995
<br />
</span>
<span class="body30">Price</span>
</td>
<td class="distanceFromZip">
<div class="zip">
<span class="body20">0 mi<br /></span><span
class="body30">from ZIP</span>
</div></td>
<td class="productTileCell" rowspan="2" valign="top">
<div class="srlProductContainer">
</div>
</td>
</tr>
<tr id="row3_210819526">
<td class="left">
<a href=210819526.html><img
src="http://images.autotrader.com/images/2006/10/16/210/819/1092478286.210819526.IM1.MAIN.60x45_A.60x45.jpg"
border="0" bordercolor="#000000" width="60" height="45"></a>
<div class="body40" style="padding-bottom:3px">
<img
src="Autotrader - page unavailable;
alt="Actual Photo Available" width="17" height="17" border="0"
/> 9 Photos
<br />
</div>
<img src="Autotrader - page unavailable;
width="60" height="1" /></td>
<td class="center" colspan="2">
<div class="centerinfo">
<p class="color body20">Color - Mystic Blue
Metallic</p>
<p class="description">Dark Blue/Beige, Premium Pkg,
Xenon Light, Single Compact Disc, Dual Power Seats, Memory Seat, Still
under Free BMW Maintenance and 4yr/50k Factory...</p>
<p class="vin">VIN WBABV13454JT20104</p>
<div class="body40" style="padding-top:5px;"><a
name="210819526" href="210819526.html">View Car Details</a><br /></div>
</div></td>
<td> </td>
<td valign="top" class="right body30">
<p class="dealername">
<a name="210819526" href="210819526.html">null</a>
<br />
</p>
<br />
</td>
</tr>
END
# Build the row pattern. Extended mode (x) ignores whitespace inside the
# pattern, so it can span several lines; \s+ matches the spaces and line
# breaks in the HTML. `image` is the sub-pattern for the thumbnail URL.
def row_pattern(image)
  %r{<h5>(\d{4})\s+Used</h5>.+?<h5>.+?
     <a\s+name="\d+"\s+href="(\d+\.html)">(.+?)</a><br\s*/>.+?</h5>.+?
     <span\s+class="body20">([0-9,]+)<br\s*/></span>
     <span\s+class="body30">Mileage</span>.+?
     (\$[0-9,]+).+?#{image}.+?Color\s+-\s+(.+?)</p>}mx
end

def parse_row(row)
  # First attempt: require the image URL.
  m = row.scan(row_pattern('(http://[^"]+?\.jpg)'))
  # Second attempt: make the image URL optional, so its capture is nil.
  m = row.scan(row_pattern('(http://[^"]+?\.jpg)?')) if m.empty?
  m[0]
end
p parse_row(html)
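A single-pattern variant might look like the sketch below. It is cut down to the price, the optional image URL, and the colour, and it is only an illustration: the name `ROW_TAIL`, the two fragments, and the example.com URL are assumptions, not part of the real page. The idea is to put the skip-ahead `.+?` inside the optional group, so the engine looks for an image URL before giving up on it.

```ruby
# Hypothetical trimmed pattern: price, optional image URL, colour.
ROW_TAIL = %r{(\$[0-9,]+)(?:.+?(http://[^"]+?\.jpg))?.+?Color\s+-\s+(.+?)</p>}m

# Made-up fragments; single-quoted heredocs avoid any interpolation.
with_image = <<'FRAGMENT'
<span class="body20">$24,995</span>
<img src="http://example.com/car.jpg" border="0">
<p class="color body20">Color - Mystic Blue</p>
FRAGMENT

without_image = <<'FRAGMENT'
<span class="body20">$24,995</span>
<p class="color body20">Color - Mystic Blue</p>
FRAGMENT

captures_with    = ROW_TAIL.match(with_image).captures
captures_without = ROW_TAIL.match(without_image).captures

p captures_with
p captures_without
```

One caveat: if the string holds several rows, the `.+?` inside the group can reach into a later row to find an image, so this is best applied to one row at a time.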
Sorry about the messy code.
Thanks.
Sam