How to write this XPath?

I have an HTML file:
<table>
<tr>
<td>ok
<strong>Sep 10</strong>

<a href="ttt">Oct 10</a>
<a href="kkk">Dec 10</a>

<table>
<tr>
<td>
123
</td>
<td>
567
</td>
</tr>
</table>
</td>
</tr>
</table>
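When I open it in Firefox, the output is:
ok Sep 10 | Oct 10 | Dec 10
123 567
What I want to get is:
ok Sep 10 | Oct 10 | Dec 10
Here is my code:
require 'rubygems'
require 'nokogiri'
# path to the HTML file shown above
web = '/home/test'
# read the file explicitly instead of relying on Kernel#open
doc = Nokogiri::HTML.parse(File.read(web))
# selects the whole outer cell, nested table included
data = doc.xpath('/html/body/table/tr/td')
puts data
I get: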
<td>ok
<strong>Sep 10</strong>

<a href="ttt">Oct 10</a>
<a href="kkk">Dec 10</a>

<table><tr>
<td>
123
</td>
<td>
567
</td>
</tr></table>
</td>
How can I get:
ok
<strong>Sep 10</strong>

<a href="ttt">Oct 10</a>
<a href="kkk">Dec 10</a>

--
Posted via http://www.ruby-forum.com/.

You want the first row? Try

'/html/body/table/tr[1]/td'

See also
http://www.zvon.org/xxl/XPathTutorial/General/examples.html
http://www.w3schools.com/xpath/

Cheers

  robert

On 07.09.2010 07:07, Pen Ttt wrote:
> How can I get:
> ok
> <strong>Sep 10</strong>
> <a href="ttt">Oct 10</a>
> <a href="kkk">Dec 10</a>

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Thanks for your help, but your method doesn't work; I tried it.

Ah, I see - it's more complicated. I think it should be one of these
depending on whether you want the text nodes:

/html/body/table[1]/tr[1]/td[1]/(.|strong|a)
/html/body/table[1]/tr[1]/td[1]/(.|strong|a)/text()

but I cannot test it right now since I don't have Nokogiri on this machine.

Kind regards

robert

On Tue, Sep 7, 2010 at 9:05 AM, Pen Ttt <myocean135@yahoo.cn> wrote:
> Thanks for your help, but your method doesn't work; I tried it.

Now this looks awful but apparently it works:

$doc.xpath '/html/body/table[1]/tr[1]/td[1]/text()|/html/body/table[1]/tr[1]/td[1]/strong/text()|/html/body/table[1]/tr[1]/td[1]/a/text()'

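If you want the elements instead of their text, remove "/text()" from the
strong and a branches only. Keep it on the first branch: without it that
branch selects the whole td again, nested table included.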

The difficult thing here is that you want to select only a portion of
the child nodes of /table/tr/td.

Kind regards

robert

On Tue, Sep 7, 2010 at 9:37 AM, Robert Klemme <shortcutter@googlemail.com> wrote:
> /html/body/table[1]/tr[1]/td[1]/(.|strong|a)
> /html/body/table[1]/tr[1]/td[1]/(.|strong|a)/text()
>
> but I cannot test it right now since I don't have Nokogiri on this machine.