Nested loop for xpath

7stud2 · 5 April 2013 05:57

Rails 1.9.3

I am scraping a web page,
List of birds by common name - Wikipedia

using Nokogiri and Xpath

I want all the names of the birds listed there.

xpath1 = '//div[@id='bodyContent']/div[@id='mw-content-text']/ul[1]'

should get a group of birds

Great Tinamou
Andean Tinamou
Elegant Crested Tinamou
Little Tinamou
Slaty-breasted Tinamou
Thicket Tinamou

I need to get each one of these names and tried nested loop

xpath1 = '//div[@id='bodyContent']/div[@id='mw-content-text']/ul'
doc = Nokogiri::HTML(open(url))
categories = doc.xpath(xpath1)

categories.each do | c |
    c.xpath('/li').each do | n |
         p n.text
    end
end

gives empty values.
Can anyone tell why? or are there better ways?

soichi


Robert_K1 · 5 April 2013 07:08

> Rails 1.9.3
>
> I am scraping a web page,
> List of birds by common name - Wikipedia
>
> using Nokogiri and Xpath
>
> I want all the names of the birds listed there.
>
> xpath1 = '//div[@id='bodyContent']/div[@id='mw-content-text']/ul[1]'

That does not work. You need to escape single quotes or use double quotes.
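
To illustrate the quoting point, here is a minimal sketch of a few ways the same XPath can be written as a valid Ruby string literal. The variable names are made up for this example; only the XPath itself comes from the post.

```ruby
# Outer double quotes, inner single quotes.
xpath_a = "//div[@id='bodyContent']/div[@id='mw-content-text']/ul[1]"

# Outer single quotes, inner double quotes (XPath accepts either quote style).
xpath_b = '//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]'

# Outer single quotes with the inner single quotes escaped.
xpath_c = '//div[@id=\'bodyContent\']/div[@id=\'mw-content-text\']/ul[1]'

# %q{} behaves like a single-quoted string but needs no escaping here.
xpath_d = %q{//div[@id='bodyContent']/div[@id='mw-content-text']/ul[1]}

puts xpath_a == xpath_c
puts xpath_a == xpath_d
```

xpath_a, xpath_c and xpath_d are the same string; xpath_b differs only in which quote character the XPath uses internally.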

> should get a group of birds
>
> Great Tinamou
> Andean Tinamou
> Elegant Crested Tinamou
> Little Tinamou
> Slaty-breasted Tinamou
> Thicket Tinamou
>
> I need to get each one of these names and tried nested loop
>
> xpath1 = '//div[@id='bodyContent']/div[@id='mw-content-text']/ul'
> doc = Nokogiri::HTML(open(url))
> categories = doc.xpath(xpath1)
>
> categories.each do | c |
>     c.xpath('/li').each do | n |
>          p n.text
>     end
> end
>
> gives empty values.
> Can anyone tell why?

Yes, your XPath in the nested loop searches for an <li> at _the top of the
document_ because you prefix it with "/". You would need a relative path instead.

> or are there better ways?

Yes. Having a loop here does not really make sense since that can be
solved by the XPath.

irb(main):009:0> puts dom.xpath('//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]//a/text()').map(&:to_s)
Great Tinamou
Andean Tinamou
Elegant Crested Tinamou
Little Tinamou
Slaty-breasted Tinamou
Thicket Tinamou

Note why #to_s is necessary to get String instances:

irb(main):012:0> dom.xpath('//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]//a/text()').each {|n| p n}
#<Nokogiri::XML::Text:0x..fc02b6340 "Great Tinamou">
#<Nokogiri::XML::Text:0x..fc02b469e "Andean Tinamou">
#<Nokogiri::XML::Text:0x..fc02b3d48 "Elegant Crested Tinamou">
#<Nokogiri::XML::Text:0x..fc02b33f2 "Little Tinamou">
#<Nokogiri::XML::Text:0x..fc02b1822 "Slaty-breasted Tinamou">
#<Nokogiri::XML::Text:0x..fc02b1016 "Thicket Tinamou">
=> 0

Btw, you can also use a more explicit XPath:
'//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]/li/a/text()'

(replaced "//" with "/li/")

Kind regards

robert


On Fri, Apr 5, 2013 at 7:57 AM, Soichi Ishida <lists@ruby-forum.com> wrote:


7stud2 · 10 April 2013 07:15

thank you for your help! It worked.

