Issue with drb

Hello all; I'm having a bit of a problem with distributed Ruby on
Win32; I'm using 1.8.2, from the installer, rc7a. I'm rather enjoying
ruby, and we're using it for a lot of testing in the company where I
work, but this problem is causing some grumbling about whether Ruby is
the right tool for the job... :-/

The issue is that sockets basically seem to be going away; I'm getting
a DRbConnectionError:Invalid argument in the 'read' function, (line
554) in 'load', coming from 'recv_reply'(611) from recv_reply(865),
from 'send_message' (1104), from method_missing (1015)

It looks like the handle it's trying to read is invalid somehow; this
seems very odd. What's even odder is that I still seem able to
communicate with the other end; when my testing script gets the
exception, it proceeds to shut down our software on all the remote
machines that are part of the actual 'test', seemingly without any
trouble. This includes the machine where we were trying to
communicate and got this invalid socket thing.

What's especially irritating is that it is looking like the message is
being sent; the error is occurring on the reception of the reply, so
this means I can't just put in a 'retry the whole thing on error'
(well, I probably CAN, at least in this case, because it's just asking
for a report, but I don't think it's a Good Idea in general...). OR
does the invalid handle suggest that the message never got through, so
we could be safe doing a Big Retry? Note that I'm having trouble
getting this to reproduce on anything smaller than like a 2 hour test,
which makes the 'make a little change, check if it's working' not work
quite so well..
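
For the narrow case described above (a read-only request such as asking for a report), a retry wrapper could look roughly like the sketch below. It is an illustration only: `with_drb_retry` and its `attempts`/`delay` parameters are made-up names, it is written for a current Ruby rather than 1.8.2, and it assumes the failure surfaces as `DRb::DRbConnError`. It does nothing to answer whether the first request reached the server, so it is only reasonable for calls that are safe to repeat.

```ruby
require 'drb/drb'

# Hypothetical helper (not part of DRb): re-run the block when DRb reports
# a connection error, giving up after `attempts` tries in total.
def with_drb_retry(attempts: 3, delay: 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue DRb::DRbConnError
    raise if tries >= attempts   # out of attempts: let the caller see the error
    sleep delay
    retry                        # re-enters the begin block; `tries` is kept
  end
end
```

Usage would be along the lines of `report = with_drb_retry { agent.report }`, where `agent` and `report` stand in for whatever remote object and idempotent call the test harness actually uses.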

Has anyone seen anything like this? I've been getting suggestions
here like 'replace the whole test infrastructure with an equivalent in
C#' which makes me itch, so I'd really like to get this solved!

Thanks, and if you need any more info, feel free to email me; I'm Very
Interested in getting this solved...

Chris

Chris,

There are quite a few reasons this could be going wrong, both inside
and outside of your Ruby test harness. Since you said that the problem
was only reproducible on long (>1hr.) tests, though, I would suspect
socket connection (or other system-level resource) timeouts.

I would suggest adding a 'ping()' method to your DRb server, and then
having clients call it periodically (say, every 5-10 seconds) in a
background thread or process, as well as optionally before any call
with important data to be transferred. That way, both the client and
the server can detect connection failures before you have to worry
about losing data.

DRb is cheap wire-level scaffolding, but it's not a reliable messaging
system; that has to be handled at the application level.

···

--
Lennon
rcoder.net

"Chris Sheppard" <chris+ggl@kinitos.com> wrote in message
news:5c3f388a.0409130750.5b2692f6@posting.google.com...

> Hello all; I'm having a bit of a problem with distributed Ruby on
> Win32; I'm using 1.8.2, from the installer, rc7a. I'm rather enjoying
> ruby, and we're using it for a lot of testing in the company where I
> work, but this problem is causing some grumbling about whether Ruby is
> the right tool for the job... :-/
>
> The issue is that sockets basically seem to be going away; I'm getting
> a DRbConnectionError:Invalid argument in the 'read' function, (line
> 554) in 'load', coming from 'recv_reply'(611) from recv_reply(865),
> from 'send_message' (1104), from method_missing (1015)

Maybe the server does not enter an infinite loop and is gone by the time
the client tries to connect (a second time).

Difficult to answer without more input...

    robert

Lennon Day-Reynolds <rcoder@gmail.com> wrote in message news:<5d4c612404091310044fa47610@mail.gmail.com>...

> Chris,
>
> There are quite a few reasons this could be going wrong, both inside
> and outside of your Ruby test harness. Since you said that the problem
> was only reproducible on long (>1hr.) tests, though, I would suspect
> socket connection (or other system-level resource) timeouts.

This is what I'm suspecting too; but the weird thing is, I would
expect to get something like a 0 result out of the read, or a
'connection reset by peer' or some similar business, not an Invalid
Handle sort of error.

The other thing that bugs me about this is that we got an error on the
receive part of the connection; I don't want that! If it's gonna
fail, I'd much rather it fail on the send. Is this related somehow?
I.e., when you close the write end of a socket, you send a FIN; so if the
other end ('them') has closed, it will send the FIN, immediately get
the ACK (courtesy of the kernel) and go into FIN_WAIT_2, whereas the
receiving end ('us') will go into CLOSE_WAIT (as it waits for your app
to notice and close the socket). So then we just write the request
(there's no way for the opposite end to signal that it's not going to
read any more; on Unix if the opposite end is closed I generally find
that I can write once, then I get an EPIPE on the second write), and
then we proceed to try to read, when we get the invalid handle error.
The other alternative is somehow this socket got closed but we're
still using it... but how would that happen between us writing and
reading?

The only issue with this is that the liveness checking should have
already detected that the socket was closed, ya? It does a quick
select() poll on the socket to see if it was readable; this should
notice if the socket was closed, I would think...
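
The expectation described above (a closed peer makes the socket readable, and the read then reports end-of-file) can be checked with a small loopback experiment like the one below. This is only a sketch of the ordinary case on a current Ruby; the variable names are arbitrary, and it neither reproduces nor explains the 'Invalid argument' error seen on Win32.

```ruby
require 'socket'

# Loopback pair: `client` plays "us", `peer` plays the remote end.
server = TCPServer.new('127.0.0.1', 0)      # port 0: let the OS choose
client = TCPSocket.new('127.0.0.1', server.addr[1])
peer   = server.accept

# Nothing has been sent yet, so a zero-timeout poll finds nothing readable.
idle = IO.select([client], nil, nil, 0)

# The remote end closes its socket, which sends a FIN to us.
peer.close

# The pending end-of-file makes the socket readable; wait up to one second.
ready = IO.select([client], nil, nil, 1)

# Reading at end-of-file returns nil instead of data.
data = client.read(1)

client.close
server.close
```

Here `idle` is `nil`, `ready` contains `client` in its readable list, and `data` is `nil`, so a select() poll does notice an orderly close, provided the FIN has already arrived when the poll runs.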

> I would suggest adding a 'ping()' method to your DRb server, and then
> having clients call it periodically (say, every 5-10 seconds) in a
> background thread or process, as well as optionally before any call
> with important data to be transferred. That way, both the client and
> the server can detect connection failures before you have to worry
> about losing data.

Well, that's the problem; I'm not totally sure that that will actually
help us out, since the connection seemed to fail in the mid-point; so
maybe the ping will succeed, which is well and good, but how do I know
the data traffic won't fail right after? Not to mention the fact that the
pool may grow, and therefore the ping would get a different connection
than the subsequent 'real' call...

> DRb is cheap wire-level scaffolding, but it's not a reliable messaging
> system; that has to be handled at the application level.

Yup; I'm not expecting foolproof-ness (I'm much too ingenious) but I'm
just curious how it can fail in the rather bizarre way it seems to be
failing in...

Thanks,
Chris

"Robert Klemme" <bob.news@gmx.net> wrote in message news:<2qnpgnF11sr31U1@uni-berlin.de>...

> "Chris Sheppard" <chris+ggl@kinitos.com> wrote in message
> news:5c3f388a.0409130750.5b2692f6@posting.google.com...
> > Hello all; I'm having a bit of a problem with distributed Ruby on
> > Win32; I'm using 1.8.2, from the installer, rc7a. I'm rather enjoying
> > ruby, and we're using it for a lot of testing in the company where I
> > work, but this problem is causing some grumbling about whether Ruby is
> > the right tool for the job... :-/
> >
> > The issue is that sockets basically seem to be going away; I'm getting
> > a DRbConnectionError:Invalid argument in the 'read' function, (line
> > 554) in 'load', coming from 'recv_reply'(611) from recv_reply(865),
> > from 'send_message' (1104), from method_missing (1015)
>
> Maybe the server does not enter an infinite loop and is gone by the time
> the client tries to connect (a second time).
>
> Difficult to answer without more input...
>
>     robert

No, I'm reasonably certain the server is still running, because it
proceeds to shut down our piece of software on that machine, i.e.
FURTHER COMMUNICATION WORKS. Which is weird. Really, to boil it
down, I want to know what the Invalid Argument error is, above all
other things; a closed connection I can deal with, as well as a reset
connection and suchlike; I know what causes them. I don't know what
causes the Invalid argument error, so I have no idea how to handle
it...

Thanks!

Chris