I was trying to use RSSscraper to pull some web forums, and something
low-level went bang in the Net::* libraries.
I found some old references to this error from last year, and I
got the impression it was platform specific?
Can anyone else let me know if this causes problems for them?
It's obviously site-specific: url = 'http://www.google.com' has no problems...
Here's the minimal code (open(url)... is 'line 6' in the code below):
require 'open-uri'
url = 'http://p218.ezboard.com/fdebatingukfrm9'
page = open(url).readlines
If I run this I get:
rasputin@lb:rss$ ./regex.rb
/data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `sysread': End of file reached (EOFError)
from /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `rbuf_fill'
from /data/ruby/lib/ruby/1.9/net/protocol.rb:116:in `readuntil'
from /data/ruby/lib/ruby/1.9/net/protocol.rb:126:in `readline'
from /data/ruby/lib/ruby/1.9/net/http.rb:1850:in `read_status_line'
from /data/ruby/lib/ruby/1.9/net/http.rb:1839:in `read_new'
from /data/ruby/lib/ruby/1.9/net/http.rb:934:in `request'
from /data/ruby/lib/ruby/1.9/net/http.rb:834:in `request_get'
from /data/ruby/lib/ruby/1.9/open-uri.rb:545:in `proxy_open'
... 7 levels...
from /data/ruby/lib/ruby/1.9/open-uri.rb:134:in `open_uri'
from /data/ruby/lib/ruby/1.9/open-uri.rb:424:in `open'
from /data/ruby/lib/ruby/1.9/open-uri.rb:85:in `open'
from ./regex.rb:6
This is exactly the error I was getting on the front of RSSscraper.
If it helps narrow it down, through a proxy I get:
rasputin@lb:rss$ ./regex.rb
/data/ruby/lib/ruby/1.9/open-uri.rb:574:in `proxy_open': 503 Service Unavailable (OpenURI::HTTPError)
from /data/ruby/lib/ruby/1.9/open-uri.rb:167:in `open_loop'
from /data/ruby/lib/ruby/1.9/open-uri.rb:164:in `catch'
from /data/ruby/lib/ruby/1.9/open-uri.rb:164:in `open_loop'
from /data/ruby/lib/ruby/1.9/open-uri.rb:134:in `open_uri'
from /data/ruby/lib/ruby/1.9/open-uri.rb:424:in `open'
from /data/ruby/lib/ruby/1.9/open-uri.rb:85:in `open'
from ./regex.rb:6
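For anyone trying to reproduce this, here is a small sketch of how the two failures above could be told apart from Ruby. The helper name probe is made up for illustration; it assumes only the standard open-uri library, where OpenURI::HTTPError#io exposes the failed response.

```ruby
require 'open-uri'

# Hypothetical diagnostic helper: try to fetch a URL and describe the outcome.
# `headers` is an optional Hash of extra request header fields.
def probe(url, headers = {})
  URI.parse(url).open(headers) { |f| "ok: #{f.status.first}" }
rescue OpenURI::HTTPError => e
  # The server (or a proxy in between) answered with an error status.
  # e.io is the failed response; its status is [code, message].
  "http error: #{e.io.status.first}"
rescue EOFError
  # The peer closed the connection before sending any status line.
  'connection closed before a status line arrived'
end
```

Calling probe(url) once directly and once through the proxy should show which of the two cases applies.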
--
A general leading the State Department resembles a dragon commanding
ducks.
-- New York Times, Jan. 20, 1981
Rasputin :: Jack of All Trades - Master of Nuns
It appears to me that this site refuses to respond unless you have a
recognized User-agent set in the request header. That's probably the
problem with open-uri.
Chad
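A minimal sketch of what that suggestion could look like with open-uri, which treats String-keyed options as request header fields. The helper name fetch_lines and the agent string 'RssScraper' are placeholders, and whether this particular site accepts that agent string is an assumption.

```ruby
require 'open-uri'

# Hypothetical helper: fetch a page as an array of lines while sending an
# explicit User-Agent header. The default agent string is arbitrary.
def fetch_lines(url, user_agent = 'RssScraper')
  # String keys in the options are passed through as HTTP request headers.
  URI.parse(url).open('User-Agent' => user_agent) { |f| f.readlines }
end
```

Usage would be along the lines of page = fetch_lines('http://p218.ezboard.com/fdebatingukfrm9').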
* Chad Fowler <chadfowler@gmail.com> [0655 03:55]:
> On Sun, 6 Jun 2004 05:02:25 +0900, Dick Davies
> > require 'open-uri'
> >
> > url = 'http://p218.ezboard.com/fdebatingukfrm9'
> > page = open(url).readlines
> > /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `sysread': End of file reached (EOFError)
> > from /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `rbuf_fill'
> > from /data/ruby/lib/ruby/1.9/net/protocol.rb:116:in `readuntil'
> > from /data/ruby/lib/ruby/1.9/net/protocol.rb:126:in `readline'
> > from /data/ruby/lib/ruby/1.9/net/http.rb:1850:in `read_status_line'
> > from /data/ruby/lib/ruby/1.9/net/http.rb:1839:in `read_new'
> > from /data/ruby/lib/ruby/1.9/net/http.rb:934:in `request'
> > from /data/ruby/lib/ruby/1.9/net/http.rb:834:in `request_get'
> > from /data/ruby/lib/ruby/1.9/open-uri.rb:545:in `proxy_open'
> > ... 7 levels...
> > from /data/ruby/lib/ruby/1.9/open-uri.rb:134:in `open_uri'
> > from /data/ruby/lib/ruby/1.9/open-uri.rb:424:in `open'
> > from /data/ruby/lib/ruby/1.9/open-uri.rb:85:in `open'
> > from ./regex.rb:6
> It appears to me that this site refuses to respond unless you have a
> recognized User-agent set in the request header. That's probably the
> problem with open-uri.
Ah crap. wget worked fine.
Is there a workaround (other than wget'ting the file to a local
webserver and pulling it from there)? I can't see an easy way of
adding a user-agent header to net/http.rb headers.....
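For the record, net/http does let you set request headers directly on the request object. A rough sketch, assuming plain HTTP (no TLS) and using a made-up helper name get_with_agent; the agent string is arbitrary:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL with Net::HTTP, adding a User-Agent header.
def get_with_agent(url, user_agent = 'RssScraper')
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = user_agent # header names are case-insensitive
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(request) # returns a Net::HTTPResponse
  end
end
```

The response body is then available as get_with_agent(url).body, and the status code as a String via .code.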
--
The District of Columbia has a law forbidding you to exert pressure on
a balloon and thereby cause a whistling sound on the streets.
Rasputin :: Jack of All Trades - Master of Nuns
Bad form to reply to myself, but for the record, adding a
header was incredibly easy:
.....
class DukPolScanner < RSSscraper::AbstractScanner
  def initialize
    @get_headers = {'User-agent' => 'RssScraper' }
.....
Thanks to Chad for the pointer, and to RSSscraper's creator for a
well-designed tool....
* Dick Davies <rasputnik@hellooperator.net> [0632 12:32]:
* Chad Fowler <chadfowler@gmail.com> [0655 03:55]:
> On Sun, 6 Jun 2004 05:02:25 +0900, Dick Davies
> > /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `sysread': End of file reached (EOFError)
> > from /data/ruby/lib/ruby/1.9/net/protocol.rb:135:in `rbuf_fill'
> It appears to me that this site refuses to respond unless you have a
> recognized User-agent set in the request header. That's probably the
> problem with open-uri.
--
There are two types of people in this world, good and bad. The good
sleep better, but the bad seem to enjoy the waking hours much more.
-- Woody Allen
Rasputin :: Jack of All Trades - Master of Nuns