I'm rather new to both web programming and ruby so forgive me if my question is ill formed.
I'm trying to do some screen scraping on a website that requires a login. What I would like to happen is for the user to log in to the website normally, then run my script, which uses the existing login session to grab the page and do whatever to it.
To illustrate my problem: if I use Net::HTTP.get_response(URI.parse("http://foo.bar/baz.php")).body, it serves up the index page asking for a login. How do I get the contents of baz.php?
I suspect that the user agent (i.e., the code, as opposed to a browser) needs to include site cookies in the request headers.
After you sign in using a browser, you'll need to find the cookie left by the site, or inspect a session cookie if the browser is not writing it to disk. Most browsers have a way to show the cookies a site has set.
James
Thank you, James. I see that when I log in to the site, four cookies are set. How would I include them in the request headers?
I *think* you pass a hash into the Net::HTTP initializer, or perhaps as a parameter to 'get', but I can't find docs or examples to prove this.
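For what it's worth, Net::HTTP#get (the instance method, not the class method) does take an optional hash of extra request headers, so the cookies can go there as a single Cookie header. A rough sketch along those lines -- the cookie names and values below are made up, copy the real ones from the browser after logging in:

```ruby
require 'net/http'
require 'uri'

# Build a Cookie header value ("name1=value1; name2=value2")
# from a hash of cookie names and values.
def cookie_header(cookies)
  cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
end

# Fetch a page, sending the given cookies with the request.
# Net::HTTP#get accepts an optional hash of request headers,
# which is where the Cookie header goes.
def fetch_with_cookies(url, cookies)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.get(uri.request_uri, 'Cookie' => cookie_header(cookies))
  end
end

# Hypothetical usage -- substitute the four cookies your site sets:
# response = fetch_with_cookies('http://foo.bar/baz.php',
#                               'PHPSESSID' => 'abc123', 'user' => 'jdoe')
# puts response.body
```

Whether this works depends on the site accepting the copied session cookie from a different client, but for plain cookie-based sessions it usually does.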