HTTP headers and source

Ok guys, let's say I wanted to grab the source for google.com or
something... it won't allow it unless I send the correct headers to
spoof the program. Can anyone give me a working example of how to send
headers and download a webpage's source?

I tried looking through all of the docs and coming up with something,
but I failed...

Thanks for any replies

···

--
Posted via http://www.ruby-forum.com/.

With open-uri[0] you can open URIs just like local files. That would be
entirely sufficient to get the content of the index page of google.com, for
example. Instead of a simple URL you can also pass the open call a URI[1]
object, and you can pass request headers to open as a hash if you need to.
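
As a rough sketch of the open-uri approach: string keys in the options
hash are sent as request headers. The header values and the helper name
fetch_with_open_uri below are made up for illustration, and URI.open
assumes Ruby 2.5 or later (older Rubies used a plain open call).

```ruby
require "open-uri"

# Hypothetical header values; copy whatever your own browser really sends.
BROWSER_HEADERS = {
  "User-Agent"      => "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0",
  "Accept-Language" => "en-US,en;q=0.8"
}

# Fetch a page and return its body as a String.
# open-uri treats String keys in the hash as HTTP request headers.
def fetch_with_open_uri(url, headers = BROWSER_HEADERS)
  URI.open(url, headers) { |page| page.read }
end
```

Calling puts fetch_with_open_uri("http://www.google.com/") should then
print the page source, network permitting.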

You could then also use Hpricot[2] to do all sorts of nifty HTML
parsing.

[0] - http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/
[1] - http://www.ruby-doc.org/stdlib/libdoc/uri/rdoc/index.html
[2] - http://code.whytheluckystiff.net/hpricot/

Felix

···

-----Original Message-----
From: list-bounce@example.com
[mailto:list-bounce@example.com] On Behalf Of Haze Noc
Sent: Wednesday, August 15, 2007 10:05 AM
To: ruby-talk ML
Subject: HTTP headers and source

Here's my suggestion:

Firefox + LiveHTTPHeaders - http://livehttpheaders.mozdev.org/installation.html

LHH shows all HTTP chatter, so there's nothing that a server can see
that you can't. From there it's just a matter of imitating the headers
with Net::HTTP.
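
Something along these lines might do for the Net::HTTP part. It is only
a sketch: build_request and fetch_with_net_http are names I made up, and
the header values you pass in are whatever LHH showed for your browser.

```ruby
require "net/http"
require "uri"

# Build a GET request carrying the given headers, without sending it.
# Returns the parsed URI together with the prepared request object.
def build_request(url, headers)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  # Each assignment overrides Net::HTTP's default for that header.
  headers.each { |name, value| request[name] = value }
  [uri, request]
end

# Send the request and return the response body as a String.
def fetch_with_net_http(url, headers)
  uri, request = build_request(url, headers)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request).body
  end
end
```

Usage would be roughly
fetch_with_net_http("http://www.google.com/", "User-Agent" => "...")
with the User-Agent string copied from the LHH output.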

Remember, though, that you have some vague sort of obligation to
maintain netiquette. If a server rejects automated requests, its
operators may have a good reason to, and by mimicking a real browser
you're going against their wishes. I doubt the Feds are going to come
kicking your door in over it, but it's still worth trying to be
respectful.

Google, for example, has an API that they encourage for automated
usage. Here are some details: http://code.google.com/apis/soapsearch/api_terms.html

-rking