Image scraping from behind a proxy

Abhishek_Ghose · 4 June 2008 10:48

Hi,

I was looking at this post in the forum for downloading image files from
the www:
http://www.ruby-forum.com/topic/133833

But it doesn't work for me, apparently because I am behind a proxy. With
the code from that thread I get errors like the following:

c:/ruby/lib/ruby/1.8/net/http.rb:564:in `initialize': No connection could be made because the target machine actively refused it. - connect(2) (Errno::ECONNREFUSED)
        from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `open'
        from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
        from c:/ruby/lib/ruby/1.8/timeout.rb:48:in `timeout'
        from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
        from c:/ruby/lib/ruby/1.8/net/http.rb:557:in `do_start'
        from c:/ruby/lib/ruby/1.8/net/http.rb:546:in `start'
        from c:/ruby/lib/ruby/1.8/open-uri.rb:243:in `open_http'
         ... 7 levels...
        from test.rb:48:in `write_images'
        from test.rb:45:in `each'
        from test.rb:45:in `write_images'
        from test.rb:76

I ran into similar problems when I tried to obtain an HTTP response.
Back then I started doing this (which works perfectly for me):

require 'net/http'
require 'uri'

$proxy_addr = 'proxyservername'
$proxy_port = 8080
$proxy = Net::HTTP::Proxy($proxy_addr, $proxy_port)

http_query = "http://www.yahoo.com"
url = URI.parse(http_query)
http_response = $proxy.get_response(url)

Is there something similar I can do to obtain image files? I did
tweak the above code to put the HTTP location of an image file in
http_query and store http_response.body in a normal file. Though
that didn't give me any errors, my JPEG is unreadable.


--
Posted via http://www.ruby-forum.com/.

Abhishek_Ghose · 4 June 2008 11:10

While I was writing my query I figured out what I am supposed to do.
Sorry for the thread. I hope it helps other visitors to the forum.

Here's how it works now:

require 'net/http'

$proxy_addr = 'proxyservername'
$proxy_port = 8080

Net::HTTP::Proxy($proxy_addr, $proxy_port).start("static.flickr.com") { |http|
  resp = http.get("/92/218926700_ecedc5fef7_o.jpg")
  # "wb" opens the file in binary mode so the image bytes are written unchanged
  open("fun.jpg", "wb") { |file|
    file.write(resp.body)
  }
}

The above is a tweaked version of the example available here:
http://www.rubynoob.com/articles/2006/8/21/how-to-download-files-with-a-ruby-script

It just uses Net::HTTP::Proxy instead of Net::HTTP.
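
For anyone who wants to reuse this, here is a minimal sketch that wraps the same approach in two helpers. The method names fetch_via_proxy and save_binary are made up for illustration and do not come from either linked example, and the proxy host and port are the same placeholders as above. The "wb" mode is the part that matters for images: on Windows, text mode translates line-ending bytes on write, which is one likely reason a JPEG saved without it comes out unreadable.

```ruby
require 'net/http'
require 'uri'

# Fetch the body of +url+ through an HTTP proxy and return it as a string.
# proxy_addr and proxy_port are placeholders for your own proxy settings.
def fetch_via_proxy(url, proxy_addr, proxy_port)
  uri = URI.parse(url)
  Net::HTTP::Proxy(proxy_addr, proxy_port).start(uri.host, uri.port) do |http|
    resp = http.get(uri.request_uri)
    # Fail loudly instead of saving an HTML error page under an image name.
    raise "GET #{url} failed: #{resp.code}" unless resp.is_a?(Net::HTTPSuccess)
    resp.body
  end
end

# Write +data+ to +path+ in binary mode so no bytes are translated.
def save_binary(path, data)
  File.open(path, 'wb') { |file| file.write(data) }
end

# Example call (needs network access and a reachable proxy):
#   image = fetch_via_proxy('http://static.flickr.com/92/218926700_ecedc5fef7_o.jpg',
#                           'proxyservername', 8080)
#   save_binary('fun.jpg', image)
```

If you would rather keep the open-uri code from the other thread, open-uri also accepts a :proxy option on open, though that route is untested here.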

