Find all possible container nodes of a given text

Hi,
I've been using Ruby for a few weeks and have been experimenting with DOM-like
programming using Hpricot.
I'm looking for an efficient (Ruby-ish) way to find the set of element nodes
that have a text child containing a given string.

For instance, given the HTML code

<html><body><table><tr><td>harr is a dog</td><td>fuu is a cat</td><td>jii is a dog</td></tr></table></body></html>

I would like to run this query

getNodesContaining("dog")

that would return an Array with the XPaths of the first and third td (since they
contain the text "dog").

Thanks in advance

Sylvain

--
Nothing can ever work if you think about everything it takes to make it
work.
         -- Daniel Pennac

Well, I think I solved it:

#!/usr/bin/ruby
require 'hpricot'

html = <<EOS
<html><body><table><tr><td>harr is a dog</td><td>fuu is a cat</td><td>jii is adog</td></tr></table></body></html>
EOS

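doc = Hpricot(html)
result = []  # XPaths of the elements that directly contain a matching text node
doc.traverse_text do |text|
  text_out = text.to_s.strip
  if text_out =~ /dog/
    # text.parent is the element that directly contains this text node
    result << text.parent.xpath
  end
end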

thanks anyway

Sylvain
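
The loop above can be wrapped into something close to the getNodesContaining query from the original question. What follows is only a minimal sketch, and it makes a few assumptions: it uses REXML from the Ruby standard library instead of Hpricot so that it runs without extra gems (this works here because the sample markup is well-formed XML), and the method name nodes_containing is hypothetical, not part of any library.

```ruby
require 'rexml/document'

# Hypothetical helper (the name is ours, not a library API): returns the XPath
# of every element that has a text child of its own containing `needle`.
def nodes_containing(doc, needle)
  paths = []
  doc.each_recursive do |element|
    # Element#texts lists only the element's own text children, so ancestors
    # such as <tr> or <table> are not matched.
    paths << element.xpath if element.texts.any? { |t| t.value.include?(needle) }
  end
  paths
end

html = '<html><body><table><tr><td>harr is a dog</td>' \
       '<td>fuu is a cat</td><td>jii is a dog</td></tr></table></body></html>'

doc = REXML::Document.new(html)
paths = nodes_containing(doc, 'dog')
puts paths
```

With Hpricot, the same shape should carry over by keeping the traverse_text loop from the message above inside the method body.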


> I would like to set this query
>
> getNodesContaining("dog")

(Hpricot(html)/"//td").select { |node| node.inner_text =~ /dog/ }

Cheers,
Peter

__
http://www.rubyrailways.com

Your code assumes that the text is contained in a leaf that is a child of a td
node. What I'm looking for is the direct parent of any leaf containing the
string.
For instance, in your code, if I replace td with tr, I get

[{elem <tr> {elem <td> {text "harr is a dog"} </td>} {elem <td> {text "fuu is a
cat"} </td>} {elem <td> {text "jii is adog"} </td>} </tr>}]

I don't want tr to be returned, since it is an ancestor, not a direct parent.

Sorry if I wasn't clear enough in my question.

Sylvain
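
To make the ancestor versus direct parent distinction concrete, here is a small sketch under stated assumptions: it uses REXML from the standard library rather than Hpricot, and the all_text helper is hypothetical, written here to behave roughly like the inner_text call in the one-liner above (it concatenates the text of all descendants).

```ruby
require 'rexml/document'

# Hypothetical helper: all text below an element, descendants included.
def all_text(element)
  element.children.map { |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then all_text(child)
    else ''
    end
  }.join
end

html = '<table><tr><td>harr is a dog</td><td>fuu is a cat</td></tr></table>'
doc = REXML::Document.new(html)

ancestors_too = []  # elements whose combined text mentions "dog"
direct_only   = []  # elements with a text child of their own mentioning "dog"
doc.each_recursive do |el|
  ancestors_too << el.name if all_text(el).include?('dog')
  direct_only   << el.name if el.texts.any? { |t| t.value.include?('dog') }
end

puts "combined text: #{ancestors_too.inspect}"
puts "own text only: #{direct_only.inspect}"
```

Matching on the combined text also picks up table and tr, while checking only an element's own text children keeps just the td that directly holds the string.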
