Sorry for double posting; it seems there is no way to edit a post.
I have an HTML parsing problem. I'll try to explain it as clearly
as I can, and I hope someone can help me with this.
I've been given the task of fetching specific data from an HTML page. I'm
planning to use the hpricot library to do this.
It's an online shop page, and I have to fetch the clothing size information.
The product information part of the page can be in either of these 2
formats:
<table>
<tr>
... Some information ...
</tr>
<tr>
<td>Available in:</td>
</tr>
<tr>
<td>... (The data I want to fetch) ...</td>
</tr>
</table>
OR
<table>
<tr>
... Some information ...
</tr>
<tr>
<td>... Content ...</td>
<td>Available in:</td>
</tr>
<tr>
<td>... Content ...</td>
<td>... (The data I want to fetch) ...</td>
</tr>
</table>
The clue is that the row whose data I want to fetch is always preceded by
a row containing the string "Available in".
And I do NOT want the content of the whole row, ONLY the content of the
last cell (<td>) inside that row.
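To make the rule concrete, here is a rough sketch of the lookup in Ruby. It is only an illustration of the logic, not hpricot code: it uses REXML from Ruby's bundled libraries so it runs without extra gems, and it assumes the fragment is well-formed XML, which a real shop page may not be. The names LABEL, full_text and size_cell_text are made up for this example.

```ruby
require "rexml/document"

# Hypothetical constant: the marker text described in the post.
LABEL = "Available in"

# Collect all text under a node, including text inside nested tags
# (REXML's Element#text only returns the first text node).
def full_text(node)
  return node.value if node.is_a?(REXML::Text)
  return "" unless node.is_a?(REXML::Element)
  node.children.map { |child| full_text(child) }.join
end

# Return the stripped text of the last <td> in the row that follows the
# "Available in" row, or nil when no such row is found.
# Assumption: markup is well-formed enough for an XML parser.
def size_cell_text(markup)
  doc = REXML::Document.new(markup)

  label_cell = REXML::XPath.match(doc, "//td").find do |td|
    # Skip layout cells that merely wrap a nested table holding the label.
    REXML::XPath.match(td, ".//table").empty? && full_text(td).include?(LABEL)
  end
  return nil unless label_cell

  # The data row is the next element sibling of the label's row.
  next_row = label_cell.parent.next_element
  return nil unless next_row && next_row.name == "tr"

  # Only direct <td> children, so cells of nested tables are ignored.
  last_cell = next_row.elements.to_a("td").last
  last_cell && full_text(last_cell).strip
end

sample = "<table><tr><td>Available in:</td></tr><tr><td>S, M, L</td></tr></table>"
puts size_cell_text(sample)
```

The same steps (find the label cell, move to the next row, take its last direct cell) should carry over to an HTML parser that tolerates messy markup; only the traversal calls would change.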
It's complicated, and I have no idea how to approach it. Can someone help me
with this?
Thanks in advance.
PS: The table snippets I posted above may be nested inside another
table.
Apparently, the online shop uses tables for page layout.