Open-uri / net/http bug?

I was trying to use RSSscraper to pull some web forums, and something
low-level went bang in the Net::* libraries.

I found some old references to this error from last year, and I
got the impression it was platform specific?

Can anyone else let me know if this causes problems for them?

It's obviously site-specific: url = 'http://www.google.com' has no problems...

Here's the minimal code (open(url)... is 'line 6' in the code below):

  require 'open-uri'

  url = 'http://p218.ezboard.com/fdebatingukfrm9'
  page = open(url).readlines

If I run this I get:

rasputin@lb:rss$ ./regex.rb
/data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `sysread': End of file reached (EOFError)
        from /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `rbuf_fill'
        from /data/ruby/lib/ruby/1.9/net/protocol.rb:116:in `readuntil'
        from /data/ruby/lib/ruby/1.9/net/protocol.rb:126:in `readline'
        from /data/ruby/lib/ruby/1.9/net/http.rb:1850:in `read_status_line'
        from /data/ruby/lib/ruby/1.9/net/http.rb:1839:in `read_new'
        from /data/ruby/lib/ruby/1.9/net/http.rb:934:in `request'
        from /data/ruby/lib/ruby/1.9/net/http.rb:834:in `request_get'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:545:in `proxy_open'
         ... 7 levels...
        from /data/ruby/lib/ruby/1.9/open-uri.rb:134:in `open_uri'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:424:in `open'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:85:in `open'
        from ./regex.rb:6

This is exactly the error I was getting on the front of RSSscraper.
If it helps narrow it down, through a proxy I get:

rasputin@lb:rss$ ./regex.rb
/data/ruby/lib/ruby/1.9/open-uri.rb:574:in `proxy_open': 503 Service Unavailable (OpenURI::HTTPError)
        from /data/ruby/lib/ruby/1.9/open-uri.rb:167:in `open_loop'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:164:in `catch'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:164:in `open_loop'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:134:in `open_uri'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:424:in `open'
        from /data/ruby/lib/ruby/1.9/open-uri.rb:85:in `open'
        from ./regex.rb:6

--
A general leading the State Department resembles a dragon commanding
ducks.
    -- New York Times, Jan. 20, 1981
Rasputin :: Jack of All Trades - Master of Nuns

It appears to me that this site refuses to respond unless you have a
recognized User-agent set in the request header. That's probably the
problem with open-uri.

Chad
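
For the record, a minimal sketch of what that could look like with open-uri itself, which treats string keys in its options hash as request header fields. The helper name fetch_lines and the default agent string are made up for illustration, and URI.open is the spelling on current Rubies (on the Ruby of this thread, plain open(url, ...) takes the same options). Whether this particular site accepts any given agent string is an assumption, not something tested here.

```ruby
require 'open-uri'

# Hypothetical helper (not part of open-uri): fetch a page as an array of
# lines while sending an explicit User-Agent header.
def fetch_lines(url, user_agent = 'Mozilla/5.0 (compatible; example-agent)')
  # String keys in the options hash are sent as HTTP request header fields.
  URI.open(url, 'User-Agent' => user_agent) { |page| page.readlines }
end
```

Something like fetch_lines('http://p218.ezboard.com/fdebatingukfrm9', 'RssScraper') would then replace the bare open(url).readlines call.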

* Chad Fowler <chadfowler@gmail.com> [0655 03:55]:

> On Sun, 6 Jun 2004 05:02:25 +0900, Dick Davies

> >   require 'open-uri'
> >
> >   url = 'http://p218.ezboard.com/fdebatingukfrm9'
> >   page = open(url).readlines

> > /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `sysread': End of file reached (EOFError)
> >         from /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `rbuf_fill'
> >         from /data/ruby/lib/ruby/1.9/net/protocol.rb:116:in `readuntil'
> >         from /data/ruby/lib/ruby/1.9/net/protocol.rb:126:in `readline'
> >         from /data/ruby/lib/ruby/1.9/net/http.rb:1850:in `read_status_line'
> >         from /data/ruby/lib/ruby/1.9/net/http.rb:1839:in `read_new'
> >         from /data/ruby/lib/ruby/1.9/net/http.rb:934:in `request'
> >         from /data/ruby/lib/ruby/1.9/net/http.rb:834:in `request_get'
> >         from /data/ruby/lib/ruby/1.9/open-uri.rb:545:in `proxy_open'
> >          ... 7 levels...
> >         from /data/ruby/lib/ruby/1.9/open-uri.rb:134:in `open_uri'
> >         from /data/ruby/lib/ruby/1.9/open-uri.rb:424:in `open'
> >         from /data/ruby/lib/ruby/1.9/open-uri.rb:85:in `open'
> >         from ./regex.rb:6

> It appears to me that this site refuses to respond unless you have a
> recognized User-agent set in the request header. That's probably the
> problem with open-uri.

Ah crap. wget worked fine.

Is there a workaround (other than wget'ting the file to a local
webserver and pulling it from there)? I can't see an easy way of
adding a User-agent header to the requests net/http.rb sends.....
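
For anyone reading this later with the same question, here is a rough sketch of setting the header on a Net::HTTP request directly. The helper names build_get and fetch_with_agent are invented for this example, and the agent string is whatever you choose to send; that the site will accept it is an assumption.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request carrying an explicit User-Agent.
# Returns the parsed URI together with the unsent request object.
def build_get(url, user_agent)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  # Header fields are set with []= ; this replaces any default User-Agent.
  request['User-Agent'] = user_agent
  [uri, request]
end

# Hypothetical helper: send the request and return the body as lines.
def fetch_with_agent(url, user_agent)
  uri, request = build_get(url, user_agent)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response.body.to_s.lines
end
```

Building the request separately from sending it makes it easy to inspect the headers (request['User-Agent']) before anything goes over the wire.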

--
The District of Columbia has a law forbidding you to exert pressure on
a balloon and thereby cause a whistling sound on the streets.
Rasputin :: Jack of All Trades - Master of Nuns

Bad form to reply to myself, but for the record, adding a
header was incredibly easy:

.....
class DukPolScanner < RSSscraper::AbstractScanner
  def initialize
    @get_headers = { 'User-agent' => 'RssScraper' }
.....

Thanks to Chad for the pointer, and to RSSscraper's creator for a
well-designed tool....

* Dick Davies <rasputnik@hellooperator.net> [0632 12:32]:

* Chad Fowler <chadfowler@gmail.com> [0655 03:55]:
> On Sun, 6 Jun 2004 05:02:25 +0900, Dick Davies

> > /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `sysread': End of file reached (EOFError)
> > from /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `rbuf_fill'

> It appears to me that this site refuses to respond unless you have a
> recognized User-agent set in the request header. That's probably the
> problem with open-uri.

--
There are two types of people in this world, good and bad. The good
sleep better, but the bad seem to enjoy the waking hours much more.
    -- Woody Allen
Rasputin :: Jack of All Trades - Master of Nuns