Nokogiri parsing Google page. Want links

Ruby friends,

I do a Google search on, for instance,

"Bob Golba" "(970) 581-8551"

I then inspect the results in Firefox and see the first result as

<h3 class="r">
<a href="http://www.lbaronline.com/rosters/showAgent.asp?id=5281" onmousedown="return rwt(this,'','','','1','AOvVaw3V3rJ0Lm_aRJ7Vp4EavfPN','','0ahUKEwivv_er0JnYAhVH12MKHfIpB6IQFggpMAA','','',event)">
Golba, Bob golba.bob@gmail.com - Loveland Berthoud Association of …
</a>
</h3>

I wish to pick up the string

"http://www.lbaronline.com/rosters/showAgent.asp?id=5281"

The test program I am using is:


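require 'open-uri'
require 'nokogiri'
require 'byebug'

# Let URI escape the quoted search phrases for the query string
query = URI.encode_www_form(q: '"Bob Golba" "(970) 581-8551"')
# URI.open fetches a URL (Ruby 2.5+); a bare open(url) no longer works in Ruby 3.0
html_data = URI.open("https://www.google.com/search?#{query}").read
nokogiri_object = Nokogiri::HTML(html_data)

# The <a> element inside each result's <h3> heading
elements = nokogiri_object.xpath('//h3/a')

byebug
xyz = 123 # filler statement after the byebug breakpoint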


When I evaluate

elements[0]

at the byebug prompt, I get


#<Nokogiri::XML::Element:0x2af427b2e0f8 name="a"
attributes=[#<Nokogiri::XML::Attr:0x2af427b2efd0 name="href"
value="/url?q=http://www.lbaronline.com/rosters/showAgent.asp%3Fid%3D5281&sa=U&ved=0ahUKEwiChu-a05nYAhVR8mMKHTW_CaQQFggUMAA&usg=AOvVaw1sF6uZXxfSqYfqythiWpjj">]
children=[#<Nokogiri::XML::Text:0x2af427b29b20 "Golba, ">,
#<Nokogiri::XML::Element:0x2af427b29a58 name="b"
children=[#<Nokogiri::XML::Text:0x2af427b29850 "Bob golba">]>,
#<Nokogiri::XML::Text:0x2af427b296ac ".bob@gmail.com - Loveland Berthoud Association …">]>


I have tried many things, and the closest I have come is this:


(byebug) nokogiri_object.xpath('//h3/a')[0].values

["/url?q=http://www.lbaronline.com/rosters/showAgent.asp%3Fid%3D5281&sa=U&ved=0ahUKEwj389zjsJnYAhVRImMKHVrSDKYQFggUMAA&usg=AOvVaw3LKL9gpeGOlDLLysgxrTQw"]


How do I get at the target of the href? That is, how do I get the actual string

http://www.lbaronline.com/rosters/showAgent.asp?id=5281

?
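The href string is not the final link in the page open-uri fetched: as the byebug output shows, it is a Google redirect of the form /url?q=<target>&sa=...&ved=...&usg=..., with the target percent-encoded in the q parameter. A minimal sketch of unwrapping it with Ruby's standard uri library, assuming the hrefs keep that shape; google_target is a hypothetical helper name:

```ruby
require 'uri'

# Hypothetical helper: return the real target of a Google result href.
# In the page open-uri fetched, hrefs look like
#   /url?q=<percent-encoded target>&sa=...&ved=...&usg=...
def google_target(href)
  uri = URI.parse(href)
  # Anything that is not a /url redirect (e.g. a direct link) is returned as-is
  return href unless uri.path == '/url' && uri.query

  # decode_www_form splits the query on '&' and percent-decodes keys and values
  params = URI.decode_www_form(uri.query).to_h
  params.fetch('q', href)
end

raw = '/url?q=http://www.lbaronline.com/rosters/showAgent.asp%3Fid%3D5281&sa=U&ved=0ahUKEwiChu-a05nYAhVR8mMKHTW_CaQQFggUMAA&usg=AOvVaw1sF6uZXxfSqYfqythiWpjj'
puts google_target(raw)
# prints http://www.lbaronline.com/rosters/showAgent.asp?id=5281
```

Because direct links pass through unchanged, something like elements.map { |a| google_target(a['href']) } could be used over every result, assuming each <a> actually carries an href.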

Ralph Shnelvar

I don't have a clue about how to use Nokogiri, but you may want to try
Selenium and the Watir gem for this kind of thing nowadays.

On 12/21/2017 12:45 AM, Ralph Shnelvar wrote:

--
A.D Masiakos
GIAC REM #4706
KeyId: 0x48D84811
http://recodestuff.wordpress.com

Hey Ralph,

Try using an XPath query that selects the attribute directly, like:

//h3/a/@href

Sorry, I can't try this at the moment since I'm typing on my phone, but I
think it should get you a bit closer.

Just got to my computer:

The XPath query I wrote returns a list of the href attributes of the <a>
elements; you can then read the value directly, like this:

irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> html_data = <<EOF
irb(main):003:0" <h3 class="r">
irb(main):004:0" <a href="http://www.lbaronline.com/rosters/showAgent.asp?id=5281" onmousedown="return rwt
irb(main):005:0" (this,'','','','1','AOvVaw3V3rJ0Lm_aRJ7Vp4EavfPN','','0ahUKEwivv_er0JnYAhVH12MKHfIpB6IQFggpMAA','','',event)">
irb(main):006:0" Golba, Bob golba.bob@gmail.com - Loveland Berthoud Association of ...
irb(main):007:0" </a>
irb(main):008:0" </h3>
irb(main):009:0" EOF
...
irb(main):010:0> nokogiri_object = Nokogiri::HTML(html_data)
...
irb(main):011:0> links = nokogiri_object.xpath('//h3/a/@href')
=> [#<Nokogiri::XML::Attr:0x3febad440984 name="href" value="http://www.lbaronline.com/rosters/showAgent.asp?id=5281">]
irb(main):012:0> links.first.value
=> "http://www.lbaronline.com/rosters/showAgent.asp?id=5281"

Note: I trimmed some of the irb output since it's just noise in this
context.

Regards,

Jonathan

On Wed, 20 Dec 2017 at 23:14 Jonathan Harden <jfharden@gmail.com> wrote:
