Exception in thread?

Hi,

I'm trying to use Net::SSH::Multi. http://net-ssh.rubyforge.org/

It seems that Ruby has difficulty catching exceptions in multi-threaded mode. Any hints?

The doc says that with (:on_error => :warn) it shouldn't fail, but the begin/rescue block below did not catch the exception, here Errno::EHOSTUNREACH:

/var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/./multi/session.rb
468 def next_session(server, force=false) #:nodoc:
[...]
482   begin
483     server.new_session
484   rescue Exception => e
485     server.fail!
486     @session_mutex.synchronize { @open_connections -= 1 }
487
488     case on_error
489     when :ignore then
490       # do nothing
491     when :warn then
492       warn("error connecting to #{server}: #{e.class} (#{e.message})")
493     when Proc then
494       go = catch(:go) { on_error.call(server); nil }
495       case go
496       when nil, :ignore then # nothing
497       when :retry then retry
498       when :raise then raise
499       else warn "unknown 'go' command: #{go.inspect}"
500       end
501     else
502       raise
503     end
504
505     return nil
506   end
[...]
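
For readers unfamiliar with catch/throw, here is a minimal standalone sketch of the dispatch pattern in the listing above. It does not use Net::SSH::Multi: try_connect, its arguments, and the simulated failure are hypothetical stand-ins. It only illustrates how a Proc handler could steer the rescue clause by throwing :go with :retry, assuming the handler is invoked the way line 494 shows.

```ruby
# Standalone sketch that mimics the catch(:go) dispatch from the listing.
# try_connect is hypothetical; it is NOT part of the Net::SSH::Multi API.
def try_connect(name, on_error)
  attempts = 0
  begin
    attempts += 1
    # Simulate a host that is unreachable on the first two attempts.
    raise Errno::EHOSTUNREACH, "connect(2)" if attempts < 3
    "connected to #{name} after #{attempts} attempts"
  rescue Exception => e
    case on_error
    when :ignore then nil
    when :warn   then warn("error connecting to #{name}: #{e.class} (#{e.message})")
    when Proc
      # If the handler throws :go, catch returns the thrown value;
      # otherwise the block finishes normally and yields nil.
      go = catch(:go) { on_error.call(name); nil }
      case go
      when nil, :ignore then nil
      when :retry then retry      # re-run the begin block
      when :raise then raise      # re-raise the rescued exception
      else warn "unknown 'go' command: #{go.inspect}"
      end
    else
      raise
    end
  end
end

retrying = lambda { |name| throw :go, :retry }

result_retry  = try_connect("srv-07", retrying)
result_ignore = try_connect("srv-07", :ignore)

puts result_retry.inspect
puts result_ignore.inspect
```

With the retrying handler the simulated connection succeeds on the third attempt; with :ignore the error is swallowed and the method returns nil.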

As the test below shows, we are able to catch the exception at top level. Could you confirm that it's thread-related?

require 'rubygems'
require 'net/ssh/multi'
Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use 'root@server-04'
  session.use 'root@server-07' # doesn't exist
  session.use 'root@server-08'

  # execute commands on all servers
  begin
    session.exec( "hostname" )
  rescue Exception => e
    p "main:#{e}"
  end

  # run the aggregated event loop
  session.loop
end

ruby 1.8.5 (2006-08-25) [x86_64-linux]

Regards,
Sylvain.

Sylvain Viart wrote:

It seems that Ruby has difficulty catching exceptions in multi-threaded
mode. Any hints?

For debugging purposes, maybe you want Thread.abort_on_exception = true
(or just run ruby with -d flag)

Other than that I don't understand your problem. What behaviour do you
see when you run your test program? What behaviour do you expect? Is no
warning generated for the non-existent host?

  # execute commands on all servers
  begin
    session.exec( "hostname" )
  rescue Exception => e
    p "main:#{e}"
  end

That rescue won't catch exceptions in other threads. Each thread of
execution is responsible for catching its own exceptions. If it doesn't,
then the thread just terminates (unless Thread.abort_on_exception is
set)
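
A small sketch of this behaviour, using only core Ruby. One addition not stated above, but standard Thread behaviour: calling join (or value) on a thread that died from an exception re-raises that exception in the calling thread. The host error here is simulated, and the report_on_exception line merely silences the stderr notice that newer Rubies print.

```ruby
# Silence the "terminated with exception" notice on newer Rubies.
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)

events = []

begin
  # The exception is raised inside the worker thread, not in this one.
  worker = Thread.new { raise Errno::EHOSTUNREACH, "connect(2)" }
  sleep 0.1   # give the worker time to run and die
  events << "no exception reached the main thread"
rescue Exception => e
  events << "not reached: #{e.class}"
end

begin
  worker.join   # join re-raises the worker's exception in the caller
rescue Exception => e
  events << "join re-raised #{e.class}"
end

puts events
```

So a rescue around the code that starts the thread sees nothing, while a rescue around join does.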

It *is* possible for one thread to raise an exception in another thread
(Thread#raise), but this is extremely hairy asynchronous programming and
I would strongly discourage it.

It would seem reasonable for session.exec to collect the status of each
of the threads and return an array of them. I don't know if it does so.
Perhaps you can use something like this:

    errs = []
    ...
      :on_error => lambda { |server| errs << server }
    ...
    session.exec "hostname"
    unless errs.empty?
      puts "The command failed on #{errs.size} hosts"
    end
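
As an illustration of the "collect the status of each thread" idea, here is a sketch with plain threads. It is not how Net::SSH::Multi works internally; fake_connect and the host names are made up, and srv-07 is simply hard-coded to fail.

```ruby
require 'thread'

hosts = ["srv-04", "srv-07", "srv-08"]   # hypothetical host names
errs  = []
lock  = Mutex.new

# Stand-in for a real connection attempt; srv-07 is made to fail.
def fake_connect(host)
  raise Errno::EHOSTUNREACH, "connect(2)" if host == "srv-07"
  "#{host} ok"
end

threads = hosts.map do |host|
  Thread.new do
    begin
      fake_connect(host)
    rescue SystemCallError => e
      # Each thread rescues its own exception and records the failure.
      lock.synchronize { errs << [host, e.class] }
      nil
    end
  end
end

# Thread#value joins the thread and returns the value of its block.
results = threads.map { |t| t.value }

puts results.inspect
puts "The command failed on #{errs.size} hosts" unless errs.empty?
```

Because every thread handles its own error, nothing is re-raised when the main thread collects the results.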

--
Posted via http://www.ruby-forum.com/.

Hi Brian,

Brian Candler wrote:

Sylvain Viart wrote:
  

It seems that Ruby has difficulty catching exceptions in multi-threaded
mode. Any hints?
    
For debugging purposes, maybe you want Thread.abort_on_exception = true (or just run ruby with -d flag)
  

Thanks, good to know.

Other than that I don't understand your problem. What behaviour do you see when you run your test program? What behaviour do you expect?

Sorry, it was late yesterday and my post was confusing.

In fact, I've run some tests and I suspect some strange behavior (or one unknown to me) in exception handling.
In the lib, we have a block with

484 rescue Exception => e

which I would expect to catch anything, but it missed Errno::EHOSTUNREACH.
I didn't find a good explanation, so I suspect that exceptions in threads behave somewhat differently.

Strangely, if I add another rescue statement in the lib:

/var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/./multi/session.rb
506 rescue
507 puts "caught:#{$!}"
508 end

it works?? Why?

Why didn't the normally broader "rescue Exception => e" do its job?
That's why I suspect some interaction between exceptions and threaded execution.
It seems I've missed something about exceptions. :-\

Is no warning generated for the non-existent host?

  # execute commands on all servers
  begin
    session.exec( "hostname" )
  rescue Exception => e
    p "main:#{e}"
  end
    

Sorry about that; I actually meant to post this block without the begin/rescue.
The rescue here catches the Errno::EHOSTUNREACH that is not caught internally by the lib.

I should have written:

session.exec( "hostname" )

With no rescue, the program fails and no job is performed on any host.

It would seem reasonable for session.exec to collect the status of each of the threads and return an array of them. I don't know if it does so. Perhaps you can use something like this:

    errs = []
    ...
      :on_error => lambda { |server| errs << server }
    ...
    session.exec "hostname"
    unless errs.empty?
      puts "The command failed on #{errs.size} hosts"
    end
  

Hum, nice, I'm gonna try. :-)
Would it catch the Errno::EHOSTUNREACH?

Thanks for your hints.
Regards,
Sylvain.

Sylvain Viart wrote:

In the lib, we have a block with

484 rescue Exception => e

which I would expect to catch anything, but it missed
Errno::EHOSTUNREACH.

Probably I should not try to answer this as I don't use Net::SSH::Multi,
but I've installed the gem now:

    Successfully installed net-ssh-2.0.4
    Successfully installed net-ssh-gateway-1.0.0
    Successfully installed net-ssh-multi-1.0.0

I see that rescue only covers the preceding line:

      begin
        server.new_session
      rescue Exception => e

That is, it will catch an exception raised by server.new_session only.
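
To illustrate the scoping, a sketch in plain Ruby: a begin/rescue only sees exceptions raised between its begin and its rescue, so anything raised earlier in the method goes past it, while a method-level rescue would see it. The method name scoped and the simulated errors are hypothetical.

```ruby
# Hypothetical method with two rescue clauses covering different scopes.
def scoped(fail_before)
  # Raised outside the begin block, so the inner rescue never sees it.
  raise Errno::EHOSTUNREACH, "before begin" if fail_before

  begin
    raise Errno::EHOSTUNREACH, "inside begin"
  rescue Exception => e
    "inner rescue: #{e.message}"
  end
rescue
  # Method-level rescue: covers the whole method body.
  "outer rescue: #{$!.message}"
end

inside = scoped(false)
before = scoped(true)

puts inside
puts before
```

The same exception class is caught by a different clause depending only on where it is raised.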

Strangely, if I add another rescue statement in the lib:

/var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/./multi/session.rb
506 rescue
507 puts "caught:#{$!}"
508 end

it works?? Why?

Possibly it's rescuing an exception which is occurring between lines 469
and 480. But since you didn't show the actual backtrace, then this is
pure guesswork.

One option is:

  puts "caught:#{$!}\n#{$!.backtrace.join("\n")}"

Why didn't the normally broader "rescue Exception => e" do its job?

I don't know. But when making an extraordinary claim ("rescue is not
doing its job") you need to provide the evidence to back it up.
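
For what it's worth, the class hierarchy can be checked directly. Errno::EHOSTUNREACH descends from StandardError, so a bare rescue and rescue Exception both match it, and rescue Exception matches strictly more. A sketch (caught_by_bare_rescue? is a hypothetical helper):

```ruby
# SystemCallError, the parent of all Errno::* classes, is a StandardError.
is_standard = Errno::EHOSTUNREACH.ancestors.include?(StandardError)

# Hypothetical helper: true if a bare `rescue` catches the given class.
def caught_by_bare_rescue?(exception_class)
  begin
    raise exception_class, "demo"
  rescue              # same as `rescue StandardError`
    true
  end
rescue Exception      # catches whatever the bare rescue let through
  false
end

host_error   = caught_by_bare_rescue?(Errno::EHOSTUNREACH)
script_error = caught_by_bare_rescue?(NotImplementedError)

puts "EHOSTUNREACH is a StandardError: #{is_standard}"
puts "bare rescue catches EHOSTUNREACH: #{host_error}"
puts "bare rescue catches NotImplementedError: #{script_error}"
```

So if a bare rescue catches something that a rescue Exception did not, the two clauses are most likely covering different code.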

Now, I can replicate something like your problem: pointing to a
non-existent host on my LAN gives an Errno::EHOSTUNREACH.

require 'rubygems'
require 'net/ssh/multi'
Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use 'root@localhost'
  session.use 'root@10.1.1.10' # non-existent host on local LAN

  # execute commands on all servers
  begin
    session.exec( "hostname" )
  rescue Exception => e
    puts "main:#{e}\n#{e.backtrace.join("\n")}"
  end

  # run the aggregated event loop
  session.loop
end

$ ruby test.rb
error connecting to root@localhost: Net::SSH::AuthenticationFailed (root@localhost)
main:No route to host - connect(2)
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-2.0.4/lib/net/ssh/transport/session.rb:65:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `join'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `each'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:81:in `open_channel'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:120:in `exec'
test.rb:10
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi.rb:62:in `start'
test.rb:3

I get an exception like you do. However I see no evidence at all that
lib/net/ssh/multi/session.rb is involved.

This is ruby 1.8.6p114. I don't have any specific reason why 1.8.5
wouldn't work, but I consider 1.8.6p114 to be the most "stable" Ruby
available (certainly more stable than later releases in the 1.8.6 family
:-) and it may well be that Net::SSH::Multi hasn't been well tested with
1.8.5. So it could be worth a try.

Hi,

Sylvain Viart wrote:

It would seem reasonable for session.exec to collect the status of each of the threads and return an array of them. I don't know if it does so. Perhaps you can use something like this:

    errs = []
    ...
      :on_error => lambda { |server| errs << server }
    ...
    session.exec "hostname"
    unless errs.empty?
      puts "The command failed on #{errs.size} hosts"
    end

Hum, nice, I'm gonna try. :-)
Would it catch the Errno::EHOSTUNREACH?

It didn't catch the exception.

A workaround: use a closure and the Net::SSH::Multi::DynamicServer <http://net-ssh.rubyforge.org/multi/v1/api/classes/Net/SSH/Multi/DynamicServer.html> behavior instead of specifying the server directly. The closure is evaluated by attempting the SSH connection first, which means each server is connected to twice: once during the test and again later in the session. Note that I discard the 'options' when testing the connection.

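require 'rubygems'
require 'net/ssh/multi'

errs = []

# Returns a lambda for session.use that probes the host with a plain
# Net::SSH connection first and returns nil when it is unreachable.
def test_server(errs, server)
    lambda do |options|
        begin
            server =~ /(.+)@(.+)/
            server_name, user = $2, $1
            puts server_name
            s = Net::SSH.start(server_name, user)
            s.close
            s = server
        rescue Errno::EHOSTUNREACH, SocketError
            # "echec connexion" means "connection failed"
            puts "echec connexion #{server} : #{$!}"
            errs << server
            s = nil
        end

        return s
    end
end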

Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use &test_server(errs, 'root@srv-04')
  session.use &test_server(errs, 'root@srv-07')
  session.use &test_server(errs, 'root@srv-08')
  session.use &test_server(errs, 'root@fail-08.local')

  # execute commands on all servers
  session.exec( "hostname" )

  # run the aggregated event loop
  session.loop
end

unless errs.empty?
  puts "The command failed on #{errs.size} hosts"
end

#srv-04
#srv-07
#echec connexion root@srv-07 : No route to host - connect(2)
#srv-08
#fail-08.local
#echec connexion root@fail-08.local : getaddrinfo: Name or service not known
#[srv-04] srv-04
#[srv-08] srv-08
#The command failed on 2 hosts

It works, but I still don't know why the exceptions are not handled in the Net::SSH::Multi lib; this may be specific to this lib.
I would still appreciate more hints.

Regards,
Sylvain.

Sylvain Viart wrote:

It works, but I still don't know why the exceptions are not handled in
the Net::SSH::Multi lib; this may be specific to this lib.

Show the backtrace! Otherwise, nobody is going to be able to help you.

That is, in your original demo code, either remove the top-level rescue
clause entirely, or change it to

rescue Exception => e
    puts "main:#{e}\n#{e.backtrace.join("\n")}"
end

Then paste the full, unedited result here.
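
For anyone not used to backtraces, this is what that rescue clause does, shown on a simulated failure in plain Ruby (failing_connect is a hypothetical stand-in for the real connection code). Each backtrace line names the file, line, and method, innermost first, which is what tells us where the exception came from.

```ruby
# Hypothetical stand-in for the code that fails to connect.
def failing_connect
  raise Errno::EHOSTUNREACH, "connect(2)"
end

report = nil
begin
  failing_connect
rescue Exception => e
  # Message first, then one backtrace frame per line.
  report = "main:#{e}\n#{e.backtrace.join("\n")}"
end

puts report
```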

Hi Brian,

Thanks a lot for your work, I really appreciate your effort. :-)

Brian Candler wrote:

Sylvain Viart wrote:
  

It works, but I still don't know why the exceptions are not handled in
the Net::SSH::Multi lib; this may be specific to this lib.
    
Show the backtrace! Otherwise, nobody is going to be able to help you.
  

Sorry about that, I'm not very backtrace-friendly. :-\

----------------------------8<----------------------- t3.rb
require 'rubygems'
require 'net/ssh/multi'

Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use 'root@srv-04'
  session.use 'root@srv-07'
  session.use 'root@srv-08'
  session.use 'root@fail-08.local'

  # execute commands on all servers
  session.exec( "hostname" )

  # run the aggregated event loop
  session.loop
end
----------------------------8<-----------------------

ruby t3.rb
error connecting to root@srv-04: Net::SSH::AuthenticationFailed (root@srv-04)
Text will be echoed in the clear. Please install the HighLine or Termios libraries to suppress echoed text.
Password: /var/lib/gems/1.8/gems/net-ssh-2.0.4/lib/net/ssh/transport/session.rb:65:in `initialize': No route to host - connect(2) (Errno::EHOSTUNREACH)
        from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `join'
        from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
        from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `each'
        from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
        from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:81:in `open_channel'
        from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:120:in `exec'
        from t3.rb:12
        from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi.rb:62:in `start'
        from t3.rb:4

shell returned 1

I think you're right and this kind of exception is not handled by the lib.
I have to reread the lib, but its doc is confusing.

Regards,
Sylvain.

It does seem inconsistent that "error connecting to root@srv-04:
Net::SSH::AuthenticationFailed" is caught as a warning, but
Errno::EHOSTUNREACH is not. I suggest you check for a project mailing
list or bug tracker and report it there.
http://rubyforge.org/projects/net-ssh
