Open-uri question

I am using open-uri for the first time. I need to visit a bunch of urls
and gather some data. Here is a small code snippet

require 'open-uri' # allows the use of a file-like API for URLs
open("http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal
Server Error (OpenURI::HTTPError)
        from /usr/local/lib/ruby/1.8/open-uri.rb:629:in `buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:167:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:165:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:135:in `open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:531:in `open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:86:in `open'
        from test.rb:2

However
require 'open-uri' # allows the use of a file-like API for URLs
open("http://www.google.com/") { |file|
  lines = file.read
  puts lines
}

works just fine. What am I doing wrong??

akanksha wrote:

I am using open-uri for the first time. I need to visit a bunch of urls
and gather some data. Here is a small code snippet

require 'open-uri' # allows the use of a file-like API for URLs
open("http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal
Server Error (OpenURI::HTTPError)

...

You can see some info on HTTP 500 errors here:
http://www.checkupdown.com/status/E500.html

Maybe the service was down?
Or they may have it restricted to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...
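
In any case, since you're looping over a bunch of URLs you probably don't
want one bad one to kill the whole run. open-uri raises OpenURI::HTTPError,
which you can rescue. A rough sketch (the URLs are just the two from your post):

require 'open-uri'

urls = [
  "http://www.google.com/",
  "http://no-way-outspaik375.spaces.msn.com/"
]

urls.each do |url|
  begin
    open(url) { |file| puts file.read }
  rescue OpenURI::HTTPError => e
    # e.message holds the status line, e.g. "500 Internal Server Error"
    warn "#{url}: #{e.message}"
  end
end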

Cheers

Maybe the service was down?

The service was not down. Both urls open in a browser.

Or they may have it restricted to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...

How would I go about doing that... could you please point me to some
info?
Thank you.

Use something like Ethereal to capture the packets sent between your browser
and the service. Then imitate that in code. You will just need to send the
same HTTP headers and follow any redirects that the server sends.
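
An untested sketch of that idea with Net::HTTP - the header values are only
examples of what a browser capture might show, so substitute whatever
Ethereal reports:

require 'net/http'
require 'uri'

# Fetch a URL sending browser-like headers, following redirects by hand.
def fetch(url, limit = 5)
  raise 'too many redirects' if limit == 0
  uri = URI.parse(url)
  headers = {
    'User-Agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
    'Accept'     => '*/*'
  }
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.get(uri.request_uri, headers)
  end
  case response
  when Net::HTTPSuccess     then response.body
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else response.error!
  end
end

puts fetch("http://no-way-outspaik375.spaces.msn.com/")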

Good luck!

Justin

···

On 7/26/06, akanksha <akanksha.baid@gmail.com> wrote:

> Or they may have it restricted to prevent scraping?
> You may need to provide some info to fool the site into
> thinking you're a regular browser...

How would I go about doing that... could you please point me to some
info?
Thank you.

you need to set the user-agent header to that of a 'real' browser. something like 'Mozilla/4.0'
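
open-uri will send extra request headers if you pass them as a hash, e.g.
(the exact user-agent string is just an example):

require 'open-uri'

open("http://no-way-outspaik375.spaces.msn.com/",
     "User-Agent" => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)") do |file|
  puts file.read
end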

-a

···

On Thu, 27 Jul 2006, akanksha wrote:

Maybe the service was down?

The service was not down. Both urls open in a browser.

Or they may have it restricted to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...

How would I go about doing that... could you please point me to some
info?
Thank you.

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dalai lama

Yes, that works, and so does mechanize... thanks!!!
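
In case it's useful to anyone else, the mechanize route looks roughly like
this (a sketch; the class lives under WWW::Mechanize, and 'Windows IE 6' is
one of its built-in user-agent aliases):

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Windows IE 6'   # pretend to be a regular browser
page = agent.get("http://no-way-outspaik375.spaces.msn.com/")
puts page.body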

···

ara.t.howard@noaa.gov wrote:

On Thu, 27 Jul 2006, akanksha wrote:

>> Maybe the service was down?
>
> The service was not down. Both urls open in a browser.
>
>
>
>> Or they may have it restricted to prevent scraping?
>> You may need to provide some info to fool the site into
>> thinking you're a regular browser...
>
> How would I go about doing that... could you please point me to some
> info?
> Thank you.

you need to set the user-agent header to that of a 'real' browser. something like 'Mozilla/4.0'

-a
--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dalai lama