[Q] synchronize a "mocked" clock in a distributed system

I've been banging on a problem for a few days now and don't feel any closer to solving it. I'm hoping some of the big brains on the Ruby ML can shed some light. Following are a few paragraphs with a brief system overview before I state the problem. I apologize in advance for this question being only tangentially related to Ruby the language. :)

I have written a distributed message passing system (in Ruby!) for doing some mathematical simulation work. Each component of the system does a very specific job. Each component may run on any of 3 distinct machines on a LAN. Components communicate with each other using the 0mq "socket" library to pass messages on well-defined ports that all components know about (hard-coded information instead of a dynamic lookup via a "service directory" mechanism).

The entire system is akin to a distributed state machine. I poke a command into it from the outside and it sets off a cascade of events which in turn generate more events until eventually I have my answer. Some of the events have timeouts or other time-based characteristics associated with them. Also, some of the returned data has time-based characteristics (e.g. a timestamp) which impacts the transitioning of the state machine. It's all working quite nicely in real-time.

My problem is mocking out the time source so that I can run simulations in faster than real-time. For example, I may send a request for a data record and give it a 5 second timeout. This works fine when the clock source is the actual operating system, but if I want to run faster than real-time I need to mock the clock out. That is, I want to take a simulation that might run in 4 hours real-time (with lots of waiting or other timer related delays) to run in 20 minutes because 1 second of simulation time is only a fraction of a second in the real world.

This is simple to do for a single component on a single system because I can intercept all calls to Time and replace it with my own source. However, I don't know how to get all of the distributed components (across multiple machines or multiple processes on one machine) to use a mocked clock.
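For reference, the single-process case might look something like the sketch below. The class name `SimClock`, its `speedup` and `epoch` parameters, and the injectable `source` are all hypothetical names chosen for illustration; nothing here is taken from the actual system. A speedup of 12 corresponds to squeezing 4 hours into 20 minutes.

```ruby
# Hypothetical SimClock: maps real elapsed time onto faster simulated time.
class SimClock
  # speedup: simulated seconds that pass per real second
  # epoch:   simulated time at the moment the clock is created
  # source:  callable returning real monotonic seconds (injectable for tests)
  def initialize(speedup:, epoch: Time.at(0),
                 source: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @speedup = speedup.to_f
    @epoch = epoch
    @source = source
    @started = source.call
  end

  # Current simulated time as a Time object.
  def now
    @epoch + (@source.call - @started) * @speedup
  end

  # Real seconds to wait for a timeout given in simulated seconds.
  def real_seconds(sim_seconds)
    sim_seconds / @speedup
  end
end

clock = SimClock.new(speedup: 12.0)
real_wait = clock.real_seconds(5) # a 5 second simulated timeout, in real seconds
```

Components would then call `clock.now` wherever they currently call `Time.now`, and pass `clock.real_seconds(timeout)` to whatever actually blocks.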

I tried googling around for answers, but all of the papers appear to be concerned with adjusting clock skew across a network where each device already has a local time source. I don't know if those solutions apply here.

Anyone have any bright ideas? Need more information?

cr

A very simplistic solution would be to use DRb and have a centralized clock. Depending on the number of clients this may of course turn out to be a bottleneck. In that case you would have to devise a more complex mechanism.
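A minimal sketch of the DRb idea, assuming a hypothetical `CentralClock` front object that scales real elapsed time by a fixed factor. The class, its `speedup` argument and the loopback URI are illustrative only; in a real setup the client half would run in each component's process and use the server's published URI.

```ruby
require 'drb/drb'

# Hypothetical front object: the one authoritative source of simulated time.
class CentralClock
  def initialize(speedup)
    @speedup = speedup.to_f
    @started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Simulated seconds elapsed since the clock was created.
  def now
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started) * @speedup
  end
end

# Server side: port 0 asks the OS for any free port; DRb.uri reports the real one.
DRb.start_service('druby://127.0.0.1:0', CentralClock.new(12.0))
uri = DRb.uri

# Client side (normally a different process on another machine).
remote_clock = DRbObject.new_with_uri(uri)
sim_now = remote_clock.now
```

Every `now` call is a network round trip when the client is remote, which is where the bottleneck concern comes from.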

Maybe looking at time protocols such as NTP might give you some inspiration. Basically you want to solve the same problem, just with a different time source (I don't think that a mocked NTP server will work because that needs local clocks with a particular precision).

Another option might be UDP broadcast with the "current time" - if precision limited by network latency is good enough. If not, again you need a more complex mechanism (see time protocols).
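A rough sketch of the datagram approach, using loopback unicast so that it runs on a single machine. The wire format (one big-endian double holding simulated seconds) and the sample value are assumptions made for illustration; an actual broadcast would additionally set `SO_BROADCAST` on the sending socket and target the LAN broadcast address.

```ruby
require 'socket'

# Receiver: a component listening for clock datagrams (port 0 = any free port).
receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0)
port = receiver.addr[1]

# Sender: the clock source. Loopback unicast here; for a real broadcast,
# call sender.setsockopt(Socket::SOL_SOCKET, Socket::SO_BROADCAST, true)
# and send to the subnet's broadcast address instead.
sender = UDPSocket.new
sim_time = 1_278_000_000.25 # simulated epoch seconds (made-up value)
sender.send([sim_time].pack('G'), 0, '127.0.0.1', port)

# Each datagram carries exactly one 8-byte double.
payload, _addr = receiver.recvfrom(8)
received = payload.unpack1('G')
```

UDP gives no delivery or ordering guarantee, so receivers should tolerate missing or late pulses.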

Kind regards

  robert

On 01.07.2010 23:10, Chuck Remes wrote:

My problem is mocking out the time source so that I can run
simulations in faster than real-time. For example, I may send a
request for a data record and give it a 5 second timeout. This works
fine when the clock source is the actual operating system, but if I
want to run faster than real-time I need to mock the clock out. That
is, I want to take a simulation that might run in 4 hours real-time
(with lots of waiting or other timer related delays) to run in 20
minutes because 1 second of simulation time is only a fraction of a
second in the real world.

This is simple to do for a single component on a single system
because I can intercept all calls to Time and replace it with my own
source. However, I don't know how to get all of the distributed
components (across multiple machines or multiple processes on one
machine) to use a mocked clock.

I tried googling around for answers, but all of the papers appear to
be concerned with adjusting clock skew across a network where each
device already has a local time source. I don't know if those
solutions apply here.

Anyone have any bright ideas? Need more information?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

It sounds like the way you've written your program is time-dependent, or, as
ChucK (the music language) would describe it, "strongly timed".

Right off the bat my initial advice would be to eliminate the need for a
central clock in your system and make it fully asynchronous. Creating
"strongly timed" synchronized distributed systems is rather non-trivial.

On Thu, Jul 1, 2010 at 3:10 PM, Chuck Remes <cremes.devlist@mac.com> wrote:

The entire system is akin to a distributed state machine. I poke a command
into it from the outside and it sets off a cascade of events which in turn
generate more events until eventually I have my answer. Some of the events have
timeouts or other time-based characteristics associated with them. Also,
some of the returned data has time-based characteristics (e.g. a timestamp)
which impacts the transitioning of the state machine. It's all working quite
nicely in real-time.

My problem is mocking out the time source so that I can run simulations in
faster than real-time. For example, I may send a request for a data record
and give it a 5 second timeout. This works fine when the clock source is the
actual operating system, but if I want to run faster than real-time I need
to mock the clock out. That is, I want to take a simulation that might run
in 4 hours real-time (with lots of waiting or other timer related delays) to
run in 20 minutes because 1 second of simulation time is only a fraction of
a second in the real world.

--
Tony Arcieri
Medioh! A Kudelski Brand

Yes, I suppose it is strongly timed. I didn't realize that was going to be such a problem.

Right now it is completely asynchronous when running across multiple nodes. Each machine's clock is NTP synched so it just does the "right thing" when it runs in real-time. This notion of strongly timed doesn't rear its *ugly* head until I try to replace the clock.

I'm going to try to broadcast a clock pulse or heartbeat to all components. I can set it up so that each component uses the real clock when no clock pulse message has been received but switch over to the mocked clock when it sees the first clock message. Hopefully the delivery latencies don't cause too much trouble by skewing the time between components.
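The switch-over described here could be sketched roughly as follows. `PulseClock`, `on_pulse` and the injectable `real` source are hypothetical names, and the rule that the clock never steps backwards when a late pulse arrives is an added assumption, not something decided in this thread.

```ruby
# Hypothetical PulseClock: real time until the first pulse, mocked time after.
class PulseClock
  # real: callable returning real epoch seconds (injectable for tests)
  def initialize(real: -> { Time.now.to_f })
    @real = real
    @pulse = nil
  end

  # Called by the component's message handler for each clock pulse.
  # Assumption: ignore pulses older than the latest one already seen,
  # so delivery delays cannot move the clock backwards.
  def on_pulse(sim_seconds)
    @pulse = sim_seconds if @pulse.nil? || sim_seconds > @pulse
  end

  # True once at least one pulse has been received.
  def mocked?
    !@pulse.nil?
  end

  # Latest pulse if any has arrived, otherwise the real clock.
  def now
    @pulse || @real.call
  end
end
```

Between pulses `now` stays constant, so the pulse interval bounds the resolution of any timeout measured against this clock.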

I'll try it and see. Thanks to all for the suggestions.

cr

On Jul 1, 2010, at 6:43 PM, Tony Arcieri wrote:

It sounds like the way you've written your program is time-dependent, or as
ChucK (the music language) would describe it "strongly timed"

Right off the bat my initial advice would be to eliminate the need for a
central clock in your system and make it fully asynchronous. Creating
"strongly timed" synchronized distributed systems is rather non-trivial.

<snip>

A very simplistic solution would be to use DRb and have a centralized clock.
Depending on the number of clients this may of course turn out as a
bottleneck. In that case you would have to devise a more complex mechanism.

Hmm, would a messaging-based time-mocking server be faster? I ask
because that was my idea, but I feel that DRb is easier to integrate.
Cheers
R

On Fri, Jul 2, 2010 at 12:10 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

--
The best way to predict the future is to invent it.
-- Alan Kay

Chuck Remes wrote:

On Jul 1, 2010, at 6:43 PM, Tony Arcieri wrote:

It sounds like the way you've written your program is time-dependent, or as
ChucK (the music language) would describe it "strongly timed"

Right off the bat my initial advice would be to eliminate the need for a
central clock in your system and make it fully asynchronous. Creating
"strongly timed" synchronized distributed systems is rather non-trivial.
    
Yes, I suppose it is strongly timed. I didn't realize that was going to be such a problem.

Right now it is completely asynchronous when running across multiple nodes. Each machine's clock is NTP synched so it just does the "right thing" when it runs in real-time. This notion of strongly timed doesn't rear its *ugly* head until I try to replace the clock.

I'm going to try to broadcast a clock pulse or heartbeat to all components. I can set it up so that each component uses the real clock when no clock pulse message has been received but switch over to the mocked clock when it sees the first clock message. Hopefully the delivery latencies don't cause too much trouble by skewing the time between components.

I'll try it and see. Thanks to all for the suggestions.

cr

Could you set up a mock NTP time source that supplies "fast" time to its clients, then configure each machine to use the mock NTP and update very frequently? This may not be practical and would certainly not work if the machines are being used for anything except your tests.

I have heard that being killed by a sysadmin is a terrible fate ;)

Cheers
R.

On Mon, Jul 5, 2010 at 7:44 PM, William Rutiser <wruyahoo05@comcast.net> wrote:

Could you set up a mock NTP time source that supplies "fast" time to its
clients then configure each machine to use the mock NTP and update very
frequently? This may not be practical and would certainly not work if the
machines are being used for anything except your tests.

The idea of using a hacked NTP daemon to speed up the clocks is not feasible. Interesting idea though...

cr

On Jul 5, 2010, at 2:12 PM, Robert Dober wrote:

On Mon, Jul 5, 2010 at 7:44 PM, William Rutiser <wruyahoo05@comcast.net> wrote:

Could you set up a mock NTP time source that supplies "fast" time to its
clients then configure each machine to use the mock NTP and update very
frequently? This may not be practical and would certainly not work if the
machines are being used for anything except your tests.

I have heard that being killed by a sysadmin is a terrible fate ;)

Why can't the "central time" be maintained by whatever process is scattering
work to your distributed nodes, and just asynchronously included in the
messages for use whenever your workers get around to processing them?

On Mon, Jul 5, 2010 at 2:06 PM, Chuck Remes <cremes.devlist@mac.com> wrote:

The idea of using a hacked NTP daemon to speed up the clocks is not
feasible. Interesting idea though...

--
Tony Arcieri
Medioh! A Kudelski Brand

Because there is no centralized server that all messages, data or control must pass through.

cr

On Jul 5, 2010, at 3:21 PM, Tony Arcieri wrote:

On Mon, Jul 5, 2010 at 2:06 PM, Chuck Remes <cremes.devlist@mac.com> wrote:

The idea of using a hacked NTP daemon to speed up the clocks is not
feasible. Interesting idea though...

Why can't the "central time" be maintained by whatever process is scattering
work to your distributed nodes, and just asynchronously included in the
messages for use whenever your workers get around to processing them?

If your system is fully asynchronous and there's no central data source, how
is it possible for nodes to synchronize to a central clock? That makes
absolutely no sense.

On Mon, Jul 5, 2010 at 3:15 PM, Chuck Remes <cremes.devlist@mac.com> wrote:

Because there is no centralized server that all messages, data or control
must pass through.

--
Tony Arcieri
Medioh! A Kudelski Brand

I wrote a long email describing why I thought I was right, but I kept coming back to your earlier question about a centralized data source. The problem I have with my data source is that the documents within it have different time granularities for the data. For example, some documents represent data aggregated over 1 minute, 1 day, or 1 week. Since documents of each time granularity may be requested by various processes, I didn't see how I could use them as a source for the mock clock.

And then it hit me. I could have a mock clock process that subscribes to all of those data sources and receives all of those messages. The mock clock should *only* pay attention to the document data with the smallest time granularity for setting the clock and ignore the rest.
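A rough sketch of that mock clock process, leaving out the subscription plumbing. The `Document` struct, the `DataDrivenClock` class, the `finest:` setting and the use of plain numeric seconds for timestamps are all assumptions made for illustration; the real document format is not described in this thread.

```ruby
# Hypothetical document shape: granularity and timestamp, both in seconds.
Document = Struct.new(:granularity, :timestamp)

# Hypothetical clock that advances only on the finest-grained documents.
class DataDrivenClock
  attr_reader :now

  # finest: the smallest granularity in the data source, e.g. 60 for 1 minute
  def initialize(finest:)
    @finest = finest
    @now = nil
  end

  # Feed every document seen on the subscriptions through here.
  def observe(doc)
    return @now unless doc.granularity == @finest        # ignore coarser data
    @now = doc.timestamp if @now.nil? || doc.timestamp > @now
    @now
  end
end

clock = DataDrivenClock.new(finest: 60)
clock.observe(Document.new(60, 120))         # 1 minute data sets the clock
clock.observe(Document.new(86_400, 86_400))  # daily data is ignored
```

The resulting `clock.now` is what would be published to the other components as the mocked time.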

So yes, you are right. I *do* have a central data source that I can use to set the clock. I just didn't see it before.

Thanks for pressing me on this. It forced me to really figure it out.

cr

On Jul 6, 2010, at 12:52 PM, Tony Arcieri wrote:

On Mon, Jul 5, 2010 at 3:15 PM, Chuck Remes <cremes.devlist@mac.com> wrote:

Because there is no centralized server that all messages, data or control
must pass through.

If your system is fully asynchronous and there's no central data source, how
is it possible for nodes to synchronize to a central clock? That makes
absolutely no sense.

Cool, glad I could help

On Tue, Jul 6, 2010 at 12:47 PM, Chuck Remes <cremes.devlist@mac.com> wrote:

On Jul 6, 2010, at 12:52 PM, Tony Arcieri wrote:

> On Mon, Jul 5, 2010 at 3:15 PM, Chuck Remes <cremes.devlist@mac.com> wrote:
>
>> Because there is no centralized server that all messages, data or control
>> must pass through.
>
> If your system is fully asynchronous and there's no central data source, how
> is it possible for nodes to synchronize to a central clock? That makes
> absolutely no sense.

I wrote a long email describing why I thought I was right, but I kept
coming back to your earlier question about a centralized data source. The
problem I have with my data source is that the documents within it have
different time granularities for the data. For example, some documents
represent data aggregated over 1 minute, 1 day, or 1 week. Since documents of each
time granularity may be requested by various processes, I didn't see how I
could use them as a source for the mock clock.

And then it hit me. I could have a mock clock process that subscribes to
all of those data sources and receives all of those messages. The mock clock
should *only* pay attention to the document data with the smallest time
granularity for setting the clock and ignore the rest.

So yes, you are right. I *do* have a central data source that I can use to
set the clock. I just didn't see it before.

Thanks for pressing me on this. It forced me to really figure it out.

cr

--
Tony Arcieri
Medioh! A Kudelski Brand