Rescue HTTP Bad File Descriptor error (EBADF)

I have a little web spider that scrapes several web pages. Sometimes the
script gets a Bad File Descriptor error and bails out.

As far as I can understand, this error is an OS (Windows XP) error and there
is nothing Ruby can do to avoid it (maybe XP cannot handle so many HTTP
connections so rapidly). But I cannot find a way to recover from the
error... I don't want the script to bail out, but simply continue with the
next page.

I have tried to rescue and try/catch the error with every imaginable
exception class, but the script still bails out when this error occurs. I
know this error is in the Ruby net/http library since I have used Mechanize,
http-access2, and http-access, and all of them suffer from this error.

Here are the details of the error:

c:/ruby/lib/ruby/1.8/net/http.rb:562:in `initialize': Bad file descriptor - connect(2) (Errno::EBADF)
        from c:/ruby/lib/ruby/1.8/net/http.rb:562:in `connect'
        from c:/ruby/lib/ruby/1.8/timeout.rb:48:in `timeout'
        from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from c:/ruby/lib/ruby/1.8/net/http.rb:562:in `connect'
        from c:/ruby/lib/ruby/1.8/net/http.rb:555:in `do_start'
        from c:/ruby/lib/ruby/1.8/net/http.rb:544:in `start'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.4.7/lib/mechanize.rb:279:in `fetch_page'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.4.7/lib/mechanize.rb:138:in `get'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.4.7/lib/mechanize.rb:193:in `submit'
        from ./urls.rb:1001
        from ./urls.rb:998
        from ./spider.rb:540:in `get_pages'
        from ./spider.rb:470:in `start'
        from ./spider.rb:733:in `process_link'
        from ./spider.rb:677:in `start'
        from ./spider.rb:670:in `start'
        from main2.rb:4

I repeat that this error is very sporadic... it occurs at different times,
on different pages, and with either Mechanize, http-access2, or plain
open-uri.
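
For example, the open-uri version is nothing more than this (a rough
sketch; @url is just whatever page the spider is visiting), and it hits
the same error:

require 'open-uri'

@page = open(@url).read   # sporadically raises Errno::EBADF on XP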

The code I use (Mechanize case) is simply:

require 'rubygems'
require 'mechanize'   # this Mechanize version (0.4.7) uses the WWW namespace

def get_page
  @agent = WWW::Mechanize.new   # a fresh agent for every page
  @page  = @agent.get @url
end

This code is executed for each page (~100 of them). Adding rescue or
try/catch anywhere inside the get_page method, or around it where it is
called, does not catch the error... the script always stops when the error
occurs. Also, since this error is very sporadic, I cannot reproduce it,
which makes it very difficult to debug.
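
To be concrete, this is roughly the kind of wrapper I have tried
(simplified; urls stands in for my real list of pages):

urls.each do |url|
  begin
    @url = url
    get_page
  rescue Exception => e   # even rescuing Exception does not catch it
    warn "skipping #{url}: #{e.class}: #{e.message}"
  end
end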

Any tips are much appreciated.

Horacio

Horacio Sanson wrote:

> I have tried to rescue and try/catch the error with every imaginable
> exception class, but the script still bails out when this error occurs.
> I know this error is in the Ruby net/http library since I have used
> Mechanize, http-access2, and http-access, and all of them suffer from it.

Did you try this?

begin
  # Read stuff etc.
rescue Errno::EBADF
  # Whatever
end
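
Errno::EBADF is a subclass of SystemCallError, so if related socket errors
turn up as well, something along these lines (url and agent are just
placeholders here) would catch the whole family:

begin
  page = agent.get(url)
rescue SystemCallError => e
  # Errno::EBADF, Errno::ECONNRESET, etc. all inherit from SystemCallError
  warn "skipping #{url}: #{e.class}: #{e.message}"
end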

I think I did try... but will try again. Now I have to wait until the error
occurs again...
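
In the meantime I will wrap the call so it retries a couple of times and
then just skips the page instead of dying. A rough sketch (the limit of 3
retries is arbitrary):

attempts = 0
begin
  @page = @agent.get @url
rescue Errno::EBADF => e
  attempts += 1
  retry if attempts < 3        # arbitrary retry limit
  warn "giving up on #{@url}: #{e.message}"
end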

Thanks

On 8/28/06, Eero Saynatkari <eero.saynatkari@kolumbus.fi> wrote:

> <snip />
>
> Did you try this?
>
> begin
>   # Read stuff etc.
> rescue Errno::EBADF
>   # Whatever
> end