Recovering from failure of a Rinda ring server

I'm in the process of designing a fault-tolerant distributed
application using Ruby, and am looking at whether Rinda will be
suitable for this purpose. I am wondering what strategies are
available for recovering from the failure of the ring server, which
seems to me like a critical single point of failure. It is possible to
run the ring server/tuplespace daemon as part of a Linux-HA heartbeat
cluster to guard against physical failure of the primary ring server,
but this requires restarting the ring server on the secondary node.
The new ring server instance running on the backup server is now
ignorant of all services that were previously registered on the old
ring server before the failure. This is unacceptable for the
distributed application. Is there a way for live services to
automagically detect failure of the ring server, and automatically
reregister themselves with it when it goes back up in that case, or
some way for a primary and backup ring server to communicate with each
other and share information about registered services transparently?

> I'm in the process of designing a fault-tolerant distributed
> application using Ruby, and am looking at whether Rinda will be
> suitable for this purpose. I am wondering what strategies are
> available for recovering from the failure of the ring server, which
> seems to me like a critical single point of failure. It is possible to
> run the ring server/tuplespace daemon as part of a Linux-HA heartbeat
> cluster to guard against physical failure of the primary ring server,
> but this requires restarting the ring server on the secondary node.
> The new ring server instance running on the backup server is now
> ignorant of all services that were previously registered on the old
> ring server before the failure.

Yup.

> This is unacceptable for the
> distributed application. Is there a way for live services to
> automagically detect failure of the ring server, and automatically
> reregister themselves with it when it goes back up in that case, or
> some way for a primary and backup ring server to communicate with each
> other and share information about registered services transparently?

1) Run more than one RingServer, and have a small service cross-register each RingServer's services with the other. (That cross-registering service doesn't have to run on a RingServer itself, actually...)

2) The RingServer removes a service automatically when the service's renewer fails to respond. The RingServer invokes the renewer each time the registration's timeout expires. On the service side, if the renewer has not been invoked within some longer timeout, the service can assume the RingServer has gone away and re-register itself, something like IRC's PING/PONG handshake.
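A minimal sketch of option 2, under some assumptions. `WatchdogRenewer`, `check_registration`, `register`, and the lease/grace values are hypothetical names and numbers chosen for illustration; they are not part of Rinda. The only Rinda behaviour relied on is that `write(tuple, renewer)` accepts an object responding to `renew`, and that the tuplespace calls `renew` again when the lease runs out and it next checks the tuple. The keeper thread only checks periodically (every 60 seconds by default, as far as I understand it), so the grace period should comfortably exceed the lease plus that period.

```ruby
# Renewer that doubles as a watchdog: it records when the ring server
# last asked for a renewal, so the service can notice prolonged silence.
class WatchdogRenewer
  # lease: seconds granted per renewal.
  # grace: seconds of silence tolerated before the ring server is presumed gone.
  # clock: injectable time source (hypothetical hook, handy for testing).
  def initialize(lease: 30, grace: 90, clock: -> { Time.now })
    @lease = lease
    @grace = grace
    @clock = clock
    @lock = Mutex.new
    @last_renewed = @clock.call
  end

  # Called by the tuplespace; returns the number of seconds to extend the lease.
  def renew
    @lock.synchronize { @last_renewed = @clock.call }
    @lease
  end

  # True when no renewal request has arrived within the grace period.
  def stale?
    @lock.synchronize { @clock.call - @last_renewed > @grace }
  end
end

# One watchdog pass. Yields to the caller's re-registration code when the
# renewer is stale. Returns true if a re-registration was attempted.
def check_registration(renewer)
  return false unless renewer.stale?
  yield
  true
rescue StandardError => e
  # Leave the renewer stale so the next pass tries again.
  warn "re-registration failed, will retry: #{e.message}"
  true
end

# Hypothetical registration helper. Assumes DRb.start_service has already been
# called in this process and that a RingServer answers the lookup broadcast.
def register(front, renewer, name: :MyService, desc: 'example service')
  # Required here rather than at the top so the watchdog class above can be
  # loaded without DRb.
  require 'drb'
  require 'rinda/ring'
  # The ring server must call back into this process, so pass the renewer
  # by reference instead of marshalling a copy.
  renewer.extend(DRb::DRbUndumped)
  # A fresh finger avoids reusing a cached reference to the dead server.
  ring = Rinda::RingFinger.new.lookup_ring_any
  ring.write([:name, name, DRbObject.new(front), desc], renewer)
end
```

A service would call `register(front, renewer)` once at startup, then run something like `loop { sleep 10; check_registration(renewer) { register(front, renewer) } }` in a background thread. If the ring server was merely slow rather than dead, this can leave a duplicate registration behind, so a real implementation may want to take any old tuple out before writing the new one.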



On 20 Jan 2005, at 22:10, Dido Sevilla wrote:

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04