Open-uri question

I am using open-uri for the first time. I need to visit a bunch of URLs
and gather some data. Here is a small code snippet:

require 'open-uri' # allows the use of a file like API for URLs
open( "http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal
Server Error (OpenURI::HTTPError)
        from /usr/local/lib/ruby/1.8/open-uri.rb:629:in `buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:167:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:165:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:135:in `open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:531:in `open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:86:in `open'
        from test.rb:2

However
require 'open-uri' # allows the use of a file like API for URLs
open( "http://www.google.com/") { |file|
  lines = file.read
  puts lines
}

works just fine. What am I doing wrong??

akanksha wrote:

I am using open-uri for the first time. I need to visit a bunch of URLs
and gather some data. Here is a small code snippet:

require 'open-uri' # allows the use of a file like API for URLs
open( "http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal
Server Error (OpenURI::HTTPError)

<snip>

works just fine. What am I doing wrong??

Nothing, from the looks of your code. The server's returning an error message, which could (I guess) be something to do with cookies. Mechanize has no problems with that site, anyway. I'd suggest using that.

--
Alex
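
To see what the server actually sent along with the 500, one option is to rescue the exception and look at the response attached to it. This is only a minimal sketch: `fetch_with_status` is a made-up helper name, and `URI.open` is the spelling for current Ruby versions (on Ruby 1.8, as in the traceback above, plain `open` does the same job).

```ruby
require 'open-uri'

# Hypothetical helper (not part of open-uri): fetch a URL and return
# [status, body]. On an HTTP error it returns the server's status and
# error page instead of letting the exception propagate.
def fetch_with_status(url)
  URI.open(url) { |file| [file.status, file.read] }
rescue OpenURI::HTTPError => e
  # e.io is the error response; it carries the same status/meta
  # accessors as a successful open-uri result.
  [e.io.status, e.io.read]
end

# Example call (needs network access):
#   status, body = fetch_with_status("http://no-way-outspaik375.spaces.msn.com/")
#   puts status.inspect
#   puts body[0, 500]
```

The body of an error page sometimes says more about the cause than the bare status line does.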

I get the same error, although I noticed a semicolon in the code as posted that causes a syntax error:

SyntaxError: compile error
(irb):2: syntax error
open( "http://no-way-outspaik375.spaces.msn.com";) { |file|
                                                 ^
(irb):5: syntax error
        from (irb):5
irb(main):006:0>

... which could (I guess) be something to do with cookies

I don't know why this is occurring, but here is what the server sends back to wget:

wget --verbose --no-proxy --no-cookies --server-response http://no-way-outspaik375.spaces.msn.com

--20:22:47-- http://no-way-outspaik375.spaces.msn.com/
           => `index.html.5'
Resolving no-way-outspaik375.spaces.msn.com... 207.46.217.223, 207.46.217.219
Connecting to no-way-outspaik375.spaces.msn.com|207.46.217.223|:80... connected.

HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Connection: keep-alive
  Server: Microsoft-IIS/6.0
  P3P:CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"
  X-Powered-By: ASP.NET
  MSNSERVER: H: TK2SPCWEBA96 V: 1 D: 1/1/2000
  X-AspNet-Version: 2.0.50727
  Set-Cookie: MC1=V=3&GUID=e79f712dd7ae49d4ab26790953d8b9e0; domain=.msn.com; expires=Mon, 04-Oct-2021 19:00:00 GMT; path=/
  Set-Cookie: S_VDT=Fri, 07 Jul 2006 16:32:00 GMT; expires=Sat, 29-Jul-2006 00:22:48 GMT; path=/
  Set-Cookie: sc_stgcls_107=NzAwRGRYZlU3Q3Q5WDBYYnB2bktaSjkwY3hnUnllVU56SQ==; domain=no-way-outspaik375.spaces.msn.com; expires=Fri, 25-Aug-2006 00:22:48 GMT; path=/
  Set-Cookie: S_USI=0; expires=Sat, 29-Jul-2006 00:22:48 GMT; path=/
  Last-Modified: Wed, 19 Jul 2006 01:59:40 GMT
  Expires: Tue, 25 Jul 2006 17:22:48 GMT
  Cache-Control: private
  Content-Type: text/html; charset=utf-8
  Content-Length: 66500
Length: 66,500 (65K) [text/html]

100%[====================================>] 66,500 88.00K/s

20:22:48 (87.74 KB/s) - `index.html.5' saved [66500/66500]
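
Since wget was run with --no-cookies and still got a 200, cookies do not look like a requirement. One thing that may differ between the two clients is the set of request headers (wget sends a User-Agent, for instance). That is only a guess and nothing in this thread confirms it, but it is cheap to test: open-uri treats string keys in its options hash as extra request headers. A minimal sketch, where `fetch_with_headers` and the User-Agent value are made up for illustration:

```ruby
require 'open-uri'

# Hypothetical helper: fetch a URL while sending explicit request headers.
# The default User-Agent string is an arbitrary placeholder, not a value
# known to satisfy this particular server.
def fetch_with_headers(url, headers = { "User-Agent" => "Mozilla/5.0 (compatible; open-uri test)" })
  # String keys in the hash are passed through as HTTP header fields.
  URI.open(url, headers) { |file| file.read }
end

# Example call (needs network access):
#   puts fetch_with_headers("http://no-way-outspaik375.spaces.msn.com/")
```

If the request still fails with the header set, the cause lies elsewhere and the rest of the request is worth comparing against what wget sends.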

Date: Wed, 26 Jul 2006 00:22:48 GMT