How do I get open-uri to deliver the same html as what Firefox is seeing?
Ruby-talk friends,
This is a followup to “Re: Nokogiri parsing Google page. Want links”
When I do
html_data = open('https://www.google.com/search?&q="Bob+Golba"+"(970)+581-8551"').read
The part of html_data I see that is relevant to my problem is
<a href="/url?q=http://www.lbaronline.com/rosters/showAgent.asp%3Fid%3D5281&sa=U&ved=0ahUKEwiBmuT_75nYAhUGKWMKHZU3CKAQFggUMAA&usg=AOvVaw2QRcnJeHOoQFCVpVySzba1">Golba, Bob golba.bob@gmail.com - Loveland Berthoud Association …
The string in html_data is not the same html I see when I place
https://www.google.com/search?&q="Bob+Golba"+"(970)+581-8551"
into the address bar in Firefox and then use
Tools > Web Developer > Inspector
I see the following HTML in the inspector. It is not the same as the html above.
Golba, Bob golba.bob@gmail.com - Loveland Berthoud Association of …
How do I get open-uri to deliver the same html as what Firefox is seeing?
And a big thank you to Jonathan Harden for his help with xpath in my “Re: Nokogiri parsing Google page. Want links” post.
Ralph Shnelvar
Google customises search results based on the individual, using all
sorts of markers and heuristics. To get the exact same result, you
have to send the exact same query (with the same headers, including
things like session-identification cookies, possibly in the same
order; maybe even with the same timing characteristics) at the same
time from the same address.
Otherwise you just have to accept that Google doesn't always respond
the same way to similar requests.
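As a rough sketch of the "same headers" part only: open-uri treats string keys in its options hash as request header fields, so you can at least present yourself more like a browser. The header values below, and the helper names browser_headers and fetch_like_browser, are made up for illustration and are not taken from a real Firefox session. Copy the real ones from the Network tab if you want to get closer. Even then there is no guarantee Google answers with the same HTML.

```ruby
require "open-uri"

# Hypothetical header values (assumption): replace them with the ones your
# own Firefox actually sends.
FIREFOX_LIKE_HEADERS = {
  "User-Agent"      => "Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0",
  "Accept"          => "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "Accept-Language" => "en-US,en;q=0.5"
}.freeze

# Build the header hash, optionally adding a Cookie header copied from the
# browser session.
def browser_headers(cookie: nil)
  headers = FIREFOX_LIKE_HEADERS.dup
  headers["Cookie"] = cookie if cookie
  headers
end

# Fetch a page with those headers. URI.open needs Ruby 2.5 or later; on
# older versions the bare open(url, headers) from open-uri does the same.
def fetch_like_browser(url, cookie: nil)
  URI.open(url, browser_headers(cookie: cookie)).read
end
```

Nothing is fetched until fetch_like_browser is called with a URL. Comparing its result with a plain open(url).read shows how much the headers alone change the response.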
Somehow Google has decided not to send AMP results to Firefox. Maybe
Firefox requested that in your session, or maybe Google just doesn't
send them to Firefox. That's Google's prerogative, though.
Note that this is the nature of the web: you cannot control, in any
way, what a server sends in response to a query. You just have to
take what you're given, and evaluate it on its own merits.
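For what it's worth, the href in the open-uri response still carries the destination: it is the percent-encoded q parameter of the /url?q=... link quoted in the question. Here is a small sketch of pulling it out with the standard library. The method name target_from_redirect_href is made up for illustration, and it assumes the href has already been taken from the parsed document (for example with Nokogiri), so that any &amp; entities are already decoded.

```ruby
require "uri"

# Extract the destination from a redirect-style href of the form
# "/url?q=<encoded target>&...". Any other href is returned unchanged.
def target_from_redirect_href(href)
  path, query = href.split("?", 2)
  return href unless path == "/url" && query

  # decode_www_form splits the query string on "&" and percent-decodes
  # each key and value.
  params = URI.decode_www_form(query).to_h
  params.fetch("q", href)
end

# The href from the open-uri response quoted in the question.
href = "/url?q=http://www.lbaronline.com/rosters/showAgent.asp%3Fid%3D5281" \
       "&sa=U&ved=0ahUKEwiBmuT_75nYAhUGKWMKHZU3CKAQFggUMAA" \
       "&usg=AOvVaw2QRcnJeHOoQFCVpVySzba1"

puts target_from_redirect_href(href)
# prints http://www.lbaronline.com/rosters/showAgent.asp?id=5281
```

That is the same URL Firefox shows in its href, so for link extraction the two responses may be closer than they look.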
On 21 December 2017 at 11:28, Ralph Shnelvar <ralphs@dos32.com> wrote:
Ruby-talk friends,
This is a followup to "Re: Nokogiri parsing Google page. Want links"
When I do
html_data =
open('https://www.google.com/search?&q="Bob+Golba"+"(970)+581-8551"').read
The part of html_data I see that is relevant to my problem is
- - -
<h3 class=\"r\">
<a
href=\"/url?q=http://www.lbaronline.com/rosters/showAgent.asp%3Fid%3D5281&sa=U&ved=0ahUKEwiBmuT_75nYAhUGKWMKHZU3CKAQFggUMAA&usg=AOvVaw2QRcnJeHOoQFCVpVySzba1\">Golba,
<b>Bob golba</b>.bob@gmail.com - Loveland Berthoud Association ...
</a>
</h3>
- - -
The string in html_data is not the same html I see when I place
https://www.google.com/search?&q="Bob+Golba"+"(970)+581-8551"
into the address bar in Firefox and then use
Tools > Web Developer > Inspector
I see the following HTML in the inspector. It is not the same as the html
above.
- - -
<h3 class="r">
<a href="http://www.lbaronline.com/rosters/showAgent.asp?id=5281"
onmousedown="return
rwt(this,'','','','1','AOvVaw3V3rJ0Lm_aRJ7Vp4EavfPN','','0ahUKEwjot9zIxZnYAhUByGMKHe6cAZsQFggpMAA','','',event)">Golba,
Bob golba.bob@gmail.com - Loveland Berthoud Association of ...
</a>
</h3>
- - -
How do I get open-uri to deliver the same html as what Firefox is seeing?
And a big thank you to Jonathan Harden for his help with xpath in my "Re:
Nokogiri parsing Google page. Want links" post.
Ralph Shnelvar
--
Matthew Kerwin
http://matthew.kerwin.net.au/