Open-uri question

I am using open-uri for the first time. I need to visit a bunch of urls
and gather some data. Here is a small code snippet

require 'open-uri' # allows the use of a file-like API for URLs
open("http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal
Server Error (OpenURI::HTTPError)
        from /usr/local/lib/ruby/1.8/open-uri.rb:629:in `buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:167:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:165:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:135:in `open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:531:in `open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:86:in `open'
        from test.rb:2

However
require 'open-uri' # allows the use of a file-like API for URLs
open("http://www.google.com/") { |file|
  lines = file.read
  puts lines
}

works just fine. What am I doing wrong??

akanksha wrote:

I am using open-uri for the first time. I need to visit a bunch of urls
and gather some data. Here is a small code snippet

require 'open-uri' # allows the use of a file-like API for URLs
open("http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal
Server Error (OpenURI::HTTPError)

...

You can see some info on HTTP 500 errors here:
http://www.checkupdown.com/status/E500.html

Maybe the service was down?
Or they may have it restricted to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...
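
In any case, since you're looping over a bunch of URLs you probably don't
want one bad one to kill the whole run. open-uri raises OpenURI::HTTPError,
which you can rescue. A rough sketch (the URLs are just the two from your post):

require 'open-uri'

urls = [
  "http://www.google.com/",
  "http://no-way-outspaik375.spaces.msn.com/"
]

urls.each do |url|
  begin
    open(url) { |file| puts file.read }
  rescue OpenURI::HTTPError => e
    # e.message holds the status line, e.g. "500 Internal Server Error"
    warn "#{url}: #{e.message}"
  end
end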

Cheers

Maybe the service was down?

The service was not down. Both urls open in a browser.

Or they may have it restricted to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...

How would I go about doing that... could you please point me to some
info?
Thank you.

Use something like Ethereal to capture the packets sent between your browser
and the service. Then imitate that in code. You will just need to send the
same HTTP headers and follow any redirects that the server sends.
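
An untested sketch of that idea with Net::HTTP - the header values are only
examples of what a browser capture might show, so substitute whatever
Ethereal reports:

require 'net/http'
require 'uri'

# Fetch a URL sending browser-like headers, following redirects by hand.
def fetch(url, limit = 5)
  raise 'too many redirects' if limit == 0
  uri = URI.parse(url)
  headers = {
    'User-Agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
    'Accept'     => '*/*'
  }
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.get(uri.request_uri, headers)
  end
  case response
  when Net::HTTPSuccess     then response.body
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else response.error!
  end
end

puts fetch("http://no-way-outspaik375.spaces.msn.com/")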

Good luck!

Justin

···

On 7/26/06, akanksha <akanksha.baid@gmail.com> wrote:

> Or they may have it restricted to prevent scraping?
> You may need to provide some info to fool the site into
> thinking you're a regular browser...

How would I go about doing that... could you please point me to some
info?
Thank you.

you need to set the user-agent header to that of a 'real' browser. something like 'Mozilla/4.0'
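
open-uri will send extra request headers if you pass them as a hash, e.g.
(the exact user-agent string is just an example):

require 'open-uri'

open("http://no-way-outspaik375.spaces.msn.com/",
     "User-Agent" => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)") do |file|
  puts file.read
end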

-a

···

On Thu, 27 Jul 2006, akanksha wrote:

Maybe the service was down?

The service was not down. Both urls open in a browser.

Or they may have it restricted to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...

How would I go about doing that... could you please point me to some
info?
Thank you.

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dalai lama

Yes, that works, and so does mechanize... thanks!!!
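
In case it's useful to anyone else, the mechanize route looks roughly like
this (a sketch; the class lives under WWW::Mechanize, and 'Windows IE 6' is
one of its built-in user-agent aliases):

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Windows IE 6'   # pretend to be a regular browser
page = agent.get("http://no-way-outspaik375.spaces.msn.com/")
puts page.body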

···

ara.t.howard@noaa.gov wrote:

On Thu, 27 Jul 2006, akanksha wrote:

>> Maybe the service was down?
>
> The service was not down. Both urls open in a browser.
>
>
>
>> Or they may have it restricted to prevent scraping?
>> You may need to provide some info to fool the site into
>> thinking you're a regular browser...
>
> How would I go about doing that... could you please point me to some
> info?
> Thank you.

you need to set the user-agent header to that of a 'real' browser. something like 'Mozilla/4.0'

-a
--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dalai lama