Parse both string and url using Nokogiri xpath

ruby 1.9.3
nokogiri 1.5.5

Say, a web page has a link,

  <a href="http://example.com">reference</a>

I would like to get both the url and text, "http://example.com" and
"reference".

First, open the page that contains this link.

doc = Nokogiri::HTML(open(url))

then,

name = doc.xpath('//div.../a').text
url = doc.xpath('//div.../a/@href').text

It works, but the problem is that this parses twice, separately.
If you want to apply the same procedure to many links that exist in a
single page, it seems inefficient.

Is there any way to produce both the url and the text in a single parse? Something like

def parse_link_and_text(xpath)
   ...
end

p parse_link_and_text('//div...')

gives a hash

=> {'reference' => 'http://example.com'}

?


--
Posted via http://www.ruby-forum.com/.

Just search for <a> and go from there.

$ irb -r nokogiri
irb(main):001:0> dom = Nokogiri.HTML('<x><a href="link">text</a></x>')
=> #<Nokogiri::HTML::Document:0x434197c name="document"
children=[#<Nokogiri::XML::DTD:0x43411d4 name="html">,
#<Nokogiri::XML::Element:0x433df20 name="html"
children=[#<Nokogiri::XML::Element:0x433daac name="body"
children=[#<Nokogiri::XML::Element:0x433d48a name="x"
children=[#<Nokogiri::XML::Element:0x433cfee name="a"
attributes=[#<Nokogiri::XML::Attr:0x433b086 name="href" value="link">]
children=[#<Nokogiri::XML::Text:0x433be5a "text">]>]>]>]>]>
irb(main):002:0> node = dom.at_xpath '//a'
=> #<Nokogiri::XML::Element:0x433cfee name="a"
attributes=[#<Nokogiri::XML::Attr:0x433b086 name="href" value="link">]
children=[#<Nokogiri::XML::Text:0x433be5a "text">]>
irb(main):003:0> node[:href]
=> "link"
irb(main):004:0> node.text
=> "text"
irb(main):005:0>

Now, what is so difficult about that? You can easily find out more via
documentation.

Cheers

robert


On Sun, May 12, 2013 at 1:37 AM, Soichi Ishida <lists@ruby-forum.com> wrote:


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Maybe using a temp variable?

    links = doc.xpath('//div/a[@href]')
    links.map { |x| [x.text, x['href']] }
    # => [["reference", "http://example.com"]]


2013/5/12 Soichi Ishida <lists@ruby-forum.com>


Thanks, both replies are helpful!
