Select tr>3 with nokogiri

I want to select the table rows that contain more than 3 columns.
How do I write the XPath for that with Nokogiri?

  require 'rubygems'
  require 'nokogiri'
  item='sometext'
  doc = Nokogiri::HTML.parse(open(item))
  data=doc.xpath('/html/body/table/tr[@td.size>3]')
  puts data
  It does not run. Help and advice appreciated.

--
Posted via http://www.ruby-forum.com/.

for example,
table1:
<table >
<tr>
<td>kk</td>
</tr>
<tr>
<td > 1 </td>
<td > 2 </td>
</tr>
<tr>
<td > 3 </td>
<td > 4 </td>
</tr>
<tr>
<td>qq</td>
</tr>
</table>

table2:
<table >
<tr>
<td>kk</td>
</tr>
<tr>
<td > 1 </td>
<td > 2 </td>
</tr>
<tr>
<td > 3 </td>
<td > 4 </td>
</tr>
</table>

I want to get table2 from table1, that is, to get the rows which contain
more than one column. How do I do it with Nokogiri?

doc.xpath('/html/body/table/tr[count(td)>3]')

On Fri, 27 Aug 2010 23:26:53 +0900, Pen Ttt wrote:

i want to get row which it contains more than 3 columns how to write
xpath with nokogiri

  require 'rubygems'
  require 'nokogiri'
  item='sometext'
  doc = Nokogiri::HTML.parse(open(item))
  data=doc.xpath('/html/body/table/tr[@td.size>3]')
  puts data
  it can not run , help and advices appreciated.

--
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

Use count(), like:

  document.xpath("//*[count(td)=2]")

You can also select children at certain offsets: with the CSS selector
td:nth-child(N), or in XPath with td[N], which is short for
td[position()=N].

HTH,
Ammar

On Sat, Aug 28, 2010 at 10:17 AM, Pen Ttt <myocean135@yahoo.cn> wrote:

for example,
table1:
<table >
<tr>
<td>kk</td>
</tr>
<tr>
<td > 1 </td>
<td > 2 </td>
</tr>
<tr>
<td > 3 </td>
<td > 4 </td>
</tr>
<tr>
<td>qq</td>
</tr>
</table>

table2:
<table >
<tr>
<td>kk</td>
</tr>
<tr>
<td > 1 </td>
<td > 2 </td>
</tr>
<tr>
<td > 3 </td>
<td > 4 </td>
</tr>
</table>

i want to get table2 from table1,to get row which contains more then
one column,how to do it with nokogiri??

I found something. If my file /home/pt/mytest is changed to:
<table>
<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
  <td align="right" width="65" class="tickerSm">10/31/09</td>
  <td align="right" width="65" class="tickerSm">10/31/08</td>
  <td align="right" width="65" class="tickerSm">10/31/07</td>
  <td align="right" width="65" class="tickerSm">10/31/06</td>
  <td align="right" width="65" class="tickerSm">10/31/05</td>
  </tr>
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
  <td align="right" class="ticker">2,493</td>
  <td align="right" class="ticker">1,429</td>
  <td align="right" class="ticker">1,826</td>
  <td align="right" class="ticker">2,262</td>
  <td align="right" class="ticker">2,251</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
  <td align="right" class="ticker">595</td>
  <td align="right" class="ticker">770</td>
  <td align="right" class="ticker">735</td>
  <td align="right" class="ticker">692</td>
  <td align="right" class="ticker">753</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
  <td align="right" class="ticker">552</td>
  <td align="right" class="ticker">646</td>
  <td align="right" class="ticker">643</td>
  <td align="right" class="ticker">627</td>
  <td align="right" class="ticker">722</td>
  </tr>
</table>

with this code:
  require 'rubygems'
  require 'nokogiri'
  doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
  result=doc.xpath('//table/tr[*[not(@class="tickerSm")]]')
  puts result

what I get is:
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
  <td align="right" class="ticker">2,493</td>
  <td align="right" class="ticker">1,429</td>
  <td align="right" class="ticker">1,826</td>
  <td align="right" class="ticker">2,262</td>
  <td align="right" class="ticker">2,251</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
  <td align="right" class="ticker">595</td>
  <td align="right" class="ticker">770</td>
  <td align="right" class="ticker">735</td>
  <td align="right" class="ticker">692</td>
  <td align="right" class="ticker">753</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
  <td align="right" class="ticker">552</td>
  <td align="right" class="ticker">646</td>
  <td align="right" class="ticker">643</td>
  <td align="right" class="ticker">627</td>
  <td align="right" class="ticker">722</td>
  </tr>

The following row is not selected by my code:
<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
  <td align="right" width="65" class="tickerSm">10/31/09</td>
  <td align="right" width="65" class="tickerSm">10/31/08</td>
  <td align="right" width="65" class="tickerSm">10/31/07</td>
  <td align="right" width="65" class="tickerSm">10/31/06</td>
  <td align="right" width="65" class="tickerSm">10/31/05</td>
  </tr>

But how do I also exclude the following row with XPath?

<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
  <td align="right" class="ticker">2,493</td>
  <td align="right" class="ticker">1,429</td>
  <td align="right" class="ticker">1,826</td>
  <td align="right" class="ticker">2,262</td>
  <td align="right" class="ticker">2,251</td>
  </tr>
This does not work:
xpath('//table/tr[*[not(@class="tickerSm")]]')
Maybe the reason is that some td cells have class "ticker" and others
have class "tickerSm".
How do I write an XPath expression that does not select this row?

p1
data=doc.xpath('/table/tr/*[count(td)>1]')
  puts data
p2
data=doc.xpath('/table/tr/td[count(td)>1]')
  puts data
Neither of them is right. Why do I get nothing?

xpath('//table/tr[*[not(@class="tickerSm")]]')
maybe the reason is : some class of td is "ticker",another is
"tickerSm",
if i don't want to select it with xpath,how to express it with xpath??

Hi Pen,

I don't know if "not" is valid like that, I have to double check. But
you can use "!=" with attributes.

  doc.xpath('//table/tr/*[@class!="tickerSm"]')

I hope it helps,
Ammar

document.xpath("//*[count(td)=2]") is right,but i want to know
p1
data=doc.xpath('/table/tr/*[count(td)>1]')
  puts data
p2
data=doc.xpath('/table/tr/td[count(td)>1]')
  puts data
How do I fix p1 and p2?

I found that not() and != give the same result in my Nokogiri XPath
expressions.
One problem still remains. If my HTML is the following:

<table>
<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
  <td align="right" width="65" class="tickerSm">10/31/09</td>
  <td align="right" width="65" class="tickerSm">10/31/08</td>
  <td align="right" width="65" class="tickerSm">10/31/07</td>
  <td align="right" width="65" class="tickerSm">10/31/06</td>
  <td align="right" width="65" class="tickerSm">10/31/05</td>
  </tr>
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
  <td align="right" class="ticker">2,493</td>
  <td align="right" class="ticker">1,429</td>
  <td align="right" class="ticker">1,826</td>
  <td align="right" class="ticker">2,262</td>
  <td align="right" class="ticker">2,251</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
  <td align="right" class="ticker">595</td>
  <td align="right" class="ticker">770</td>
  <td align="right" class="ticker">735</td>
  <td align="right" class="ticker">692</td>
  <td align="right" class="ticker">753</td>
  </tr>
</table>

xpath('//table/tr[td[@class="tickerSm"]]') gets:

<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
  <td align="right" width="65" class="tickerSm">10/31/09</td>
  <td align="right" width="65" class="tickerSm">10/31/08</td>
  <td align="right" width="65" class="tickerSm">10/31/07</td>
  <td align="right" width="65" class="tickerSm">10/31/06</td>
  <td align="right" width="65" class="tickerSm">10/31/05</td>
  </tr>

xpath('//table/tr[td[@class="ticker"]]') gets:

<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
  <td align="right" class="ticker">595</td>
  <td align="right" class="ticker">770</td>
  <td align="right" class="ticker">735</td>
  <td align="right" class="ticker">692</td>
  <td align="right" class="ticker">753</td>
  </tr>

But how can I get the following with an XPath expression?
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
  <td align="right" class="ticker">2,493</td>
  <td align="right" class="ticker">1,429</td>
  <td align="right" class="ticker">1,826</td>
  <td align="right" class="ticker">2,262</td>
  <td align="right" class="ticker">2,251</td>
  </tr>

If the table is not the root element, you need two "/" at the
beginning (or the full path from the root). The count function belongs
in a predicate on the tr, where it counts that row's td children, so
you don't need the "*" in p1, or the td in p2. Try this:

  doc.xpath('//table/tr[count(td)>1]')

Good Luck,
Ammar

On Sat, Aug 28, 2010 at 3:33 PM, Pen Ttt <myocean135@yahoo.cn> wrote:

document.xpath("//*[count(td)=2]") is right,but i want to know
p1
data=doc.xpath('/table/tr/*[count(td)>1]')
puts data
p2
data=doc.xpath('/table/tr/td[count(td)>1]')
puts data
how to fix p1\p2?

A friend told me that
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
works.

Thanks Ammar. One problem vanished, another appeared.
Here is the content of /home/pt/mytest:

<table>
<tr bgcolor="F3F3F3">
<td align="right" width="240">reportdate</td>
  <td align="right" width="65" class="tickerSm">10/31/09</td>
  <td align="right" width="65" class="tickerSm">10/31/08</td>
  <td align="right" width="65" class="tickerSm">10/31/07</td>
  <td align="right" width="65" class="tickerSm">10/31/06</td>
  <td align="right" width="65" class="tickerSm">10/31/05</td>
  </tr>
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
  <td align="right" class="ticker">2,493</td>
  <td align="right" class="ticker">1,429</td>
  <td align="right" class="ticker">1,826</td>
  <td align="right" class="ticker">2,262</td>
  <td align="right" class="ticker">2,251</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
  <td align="right" class="ticker">595</td>
  <td align="right" class="ticker">770</td>
  <td align="right" class="ticker">735</td>
  <td align="right" class="ticker">692</td>
  <td align="right" class="ticker">753</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
  <td align="right" class="ticker">552</td>
  <td align="right" class="ticker">646</td>
  <td align="right" class="ticker">643</td>
  <td align="right" class="ticker">627</td>
  <td align="right" class="ticker">722</td>
  </tr>
</table>

What I want to get is:
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
  <td align="right" class="ticker">595</td>
  <td align="right" class="ticker">770</td>
  <td align="right" class="ticker">735</td>
  <td align="right" class="ticker">692</td>
  <td align="right" class="ticker">753</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  <td align="right" class="ticker">0</td>
  </tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
  <td align="right" class="ticker">552</td>
  <td align="right" class="ticker">646</td>
  <td align="right" class="ticker">643</td>
  <td align="right" class="ticker">627</td>
  <td align="right" class="ticker">722</td>
  </tr>

  p1:
  require 'rubygems'
  require 'nokogiri'
  doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
  result=doc.xpath('//table/tr[td[@class="ticker"]]')
  puts result

  I can get what I want with p1.

  p2:
  require 'rubygems'
  require 'nokogiri'
  doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
  result=doc.xpath('//table/tr[td[not(@class="tickerSm")]]')
  puts result

  Why can't I get what I want with p2?
  How do I fix p2?
  Thanks for your help.

That's good. Another possible approach is using following-sibling, if
you don't want the first td[@class="tickerSm"]

//table/tr/td[1][@class="tickerSm"]/following-sibling::td[@class!="tickerSm"]

Ammar

On Sun, Aug 29, 2010 at 9:40 AM, Pen Ttt <myocean135@yahoo.cn> wrote:

a friend tell me,
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
it is ok