Hi,
I've been using Ruby for a few weeks and have been experimenting with DOM-like
programming using Hpricot.
I'm looking for an efficient (Ruby-ish) way to find the set of element nodes
that have a text child containing a given string.
For instance, given the HTML
<html><body><table><tr><td>harr is a dog</td><td>fuu is a cat</td><td>jii is a dog</td></tr></table></body></html>
I would like to run a query such as
getNodesContaining("dog")
that returns an Array with the XPaths of the first and third td (since they
contain the text "dog").
Thanks in advance,
Sylvain
--
Nothing can ever work if you think about everything it takes for it to
work.
-- Daniel Pennac
Well, I think I solved it:
#!/usr/bin/ruby
require 'hpricot'

html = <<EOS
<html><body><table><tr><td>harr is a dog</td><td>fuu is a cat</td><td>jii is adog</td></tr></table></body></html>
EOS

doc = Hpricot(html)
result = []
# Visit every text node and keep the XPath of the direct parent of each match.
doc.traverse_text do |text|
  text_out = text.to_s.strip
  if text_out =~ /dog/
    result << text.parent.xpath
  end
end
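The same idea can be wrapped in a reusable method, closer to the getNodesContaining query asked for originally. The following is only a minimal sketch: it swaps Hpricot for REXML so that it runs without extra gems, and the names walk_elements and nodes_containing are made up for illustration. It returns the elements themselves rather than their XPaths.

```ruby
require 'rexml/document'

# Yield the element itself, then every descendant element, depth first.
# (walk_elements is a hypothetical helper, not part of REXML or Hpricot.)
def walk_elements(element, &block)
  yield element
  element.elements.each { |child| walk_elements(child, &block) }
end

# Return the elements that directly own a text node containing +needle+.
# Ancestors of such elements are not returned.
def nodes_containing(root, needle)
  found = []
  walk_elements(root) do |element|
    found << element if element.texts.any? { |text| text.value.include?(needle) }
  end
  found
end

html = '<html><body><table><tr>' \
       '<td>harr is a dog</td><td>fuu is a cat</td><td>jii is a dog</td>' \
       '</tr></table></body></html>'

doc = REXML::Document.new(html)
hits = nodes_containing(doc.root, 'dog')
hits.each { |element| puts "#{element.name}: #{element.text}" }
```

Only the first and third td should be reported here, since tr, table, body and html have no text node of their own.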
Thanks anyway,
Sylvain
Quoting Sylvain Tenier <sylvain.tenier@loria.fr>:
> I would like to set this query
> getNodesContaining("dog")
(Hpricot(html)/"//td").reject { |node| node.inner_text !~ /dog/ }
Cheers,
Peter
__
http://www.rubyrailways.com
Your code assumes that the text is contained in a leaf that is a child of a td
node. What I'm looking for is the direct parent of any leaf containing the
string.
For instance, if I replace td with tr in your code, I get
[{elem <tr> {elem <td> {text "harr is a dog"} </td>} {elem <td> {text "fuu is a
cat"} </td>} {elem <td> {text "jii is adog"} </td>} </tr>}]
I don't want tr to be returned, since it is an ancestor, not a direct parent.
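To make the ancestor versus direct parent distinction concrete, here is a small sketch. It uses REXML instead of Hpricot so it runs without extra gems, and deep_text is a hypothetical helper that only approximates what inner_text returns (it does not preserve document order).

```ruby
require 'rexml/document'

# Concatenate the element's own text plus all text found in its descendants.
# (deep_text is a made-up stand-in for Hpricot's inner_text.)
def deep_text(element)
  text = element.texts.map(&:value).join
  element.elements.each { |child| text += deep_text(child) }
  text
end

row = REXML::Document.new(
  '<tr><td>harr is a dog</td><td>fuu is a cat</td></tr>'
).root

# Matching on descendant text: tr qualifies because one of its td children does.
ancestor_match = deep_text(row).include?('dog')

# Matching on the element's own text nodes only: tr has none, so it is skipped.
direct_match = row.texts.any? { |text| text.value.include?('dog') }

puts "tr matches on descendant text: #{ancestor_match}"
puts "tr matches on its own text:    #{direct_match}"
```

Testing only an element's own text nodes is what keeps tr out of the result.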
Sorry if I wasn't clear enough in my question.
Sylvain
Quoting Peter Szinek <peter@rubyrailways.com>:
>> I would like to set this query
>>
>> getNodesContaining("dog")
> (Hpricot(html)/"//td").map.reject{ |node| node.inner_text !~ /dog/ }