Replacing the use of gettimeofday in the scheduler

Tomas_Pospisek · 5 March 2007 16:12

Quoting Tomas Pospisek <tpo2@sourcepole.ch>:

Quoting Yukihiro Matsumoto <matz@ruby-lang.org>:

>
> >This might well be. Not being a contributor to the Ruby kernel, I
> >don't know what the policy is: does Ruby only implement features which
> >can be built with pure POSIX, or can they have OS-specific
> >implementations?
>
> It can have OS-specific implementation, but I want the core behavior
> being common on most (if not all) platforms. Besides that, I have no
> idea to fix this "bug" on _any_ platform right now. Any idea?

So here's what I found out after a bit of research and my proposition for a
solution.

Most languages use native threads to implement multithreading so they do not
have to care about scheduling and blocking by themselves. There do not seem
to be many languages/runtimes that use green threads.

* the Gambc Scheme implementation is regarded as being very high quality
  with respect to its implementation of multithreading/green threads.
  Although it goes to great lengths to be portable, it seems to base its
  scheduling decisions on the "wall clock"
  (gettimeofday and clock_gettime(CLOCK_REALTIME, ...)), so I expect Gambc
  to suffer from the very same scheduling problems as Ruby.

* Python uses native threads, but Stackless Python implements green thread
  scheduling by directly accessing the Pentium's internal clock through the
  RDTSC machine instruction. I have not checked how it implements sleep, i.e.
  whether and how hh:mm:ss.mm is calculated from it. This solution evidently
  has a very high hardcore coolness geek factor but is not very portable.

* The GNU Portable Threads library uses gettimeofday as well, thus...

So after a day or so of research I am realizing the shocking fact - what Matz
saw too - that the core of the problem is base POSIX not providing any
monotonic clock API and apparently everybody's scheduler being at the mercy of
some sysadmin issuing a "date -s". If anybody knows any better, then pointers
are welcome.

So I see three approaches for a solution:

1. eliminate the worst case:

   eval.c has a few places where the timeofday() function is used, almost
   exclusively to do something like the following:

     loop() {
         start:     start = timeofday()
                    do_something()
         meanwhile: elapsed_time = timeofday() - start
                    remaining_time = interval_of_interest - elapsed_time
                    if( remaining_time <= 0 )
                        break;
                    else
                        # loop again
     }

   the code between start and meanwhile represents here a critical section
   where no one should on a system scale be allowed to mess with system time,
   which timeofday() doesn't guarantee.

   Thus what we *can* do is to at least guarantee that remaining_time
   *never ever* increases:

     if( remaining_time > previous_remaining_time )
        remaining_time = previous_remaining_time;
     else
        previous_remaining_time = remaining_time;

2. Do it "right":

   Doing it right would require having a monotonic time source, which the
   REALTIME extension of POSIX provides through the
   clock_gettime( CLOCK_MONOTONIC, ... ) function.

   Thus Ruby could schedule correctly on systems that *do* implement the
   POSIX REALTIME extension and use the old "broken" method on the other
   systems or add system specific solutions for those at a later time/as
   needed/submitted.

   Linux and DragonFly BSD do have CLOCK_MONOTONIC but OSX does not seem
   to have it. If people want to check whether their systems provide it,
   here's a test:

     #include <stdio.h>
     #include <unistd.h>
     /* compiles and links only if the macro is defined */
     #ifdef _POSIX_MONOTONIC_CLOCK
     int main(void) {
       printf("yes\n");
       return 0;
     }
     #endif

3. use Ruby's own thread_timer as a source or as a
   time_sanity_offset_correction

   However - I'm not sure whether this approach yields reliable results and
   does not add unnecessary complexity.

All solutions however have a semantic side effect: timeofday is being called
from:

   a) the scheduler
   b) sleep()
   c) indirectly from timeout() through sleep()

Guaranteeing that remaining_time never increases is good for
a) the scheduler and c) timeout(), but can break existing programs using
b) sleep(), in case someone was doing something along the lines of:

     # need to wake up at noon
     sleep_time = noon() - now()
     sleep( sleep_time )

With the current "broken" semantics, that would work just right, since with
the current implementation sleep() time would increase/decrease in parallel
with the "sysadmin" changing the system time with "date -s" or similar.

Thus the question here is: do we want the scheduler and timeout() to work as
naively expected even in a situation where the "wall clock" suddenly changes,
or do we want sleep to work correctly in the same situation? Do we want the
absolute "wall clock" to work right or do we want the relative "stop watch"
to work right?

I'd suggest to apply both solutions from above, that is:

a) eliminate the worst case behaviour, where "remaining_time" is growing with
   the current implementation Ruby has and

b) "do the right thing" and use clock_gettime( CLOCK_MONOTONIC, ... ) instead
   of gettimeofday where available.

Opinions? Shall I try to submit a patch?
*t

[1]
clock_getres

It's funny to note that the identical problem was recently reported against
Python [1], that the discussion also contains the two proposed approaches [2]
and that there's no decision yet on how to proceed and thus the bug is still
open. So Ruby is not alone.

*t

[1]
https://sourceforge.net/tracker/index.php?func=detail&aid=1607041&group_id=5470&atid=105470
[2]
https://sourceforge.net/tracker/index.php?func=detail&aid=1607149&group_id=5470&atid=305470

···

> In message "Re: replacing the use of gettimeofday in the scheduler" > > on Fri, 2 Mar 2007 08:07:20 +0900, "Avdi Grimm" <avdi@avdi.org> writes:

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

Gary_Wright · 5 March 2007 16:32

Any serious sysadmin will use an NTP-based configuration to manage the time
on their system. Time adjustments will slow or speed up the clock but won't
cause it to skip ahead or back.

Your proposal seems like a lot of effort to go through and it certainly won't
prevent a sysadmin from just issuing "kill -9 pid". My point is that you'll
never be able to write a program that successfully operates in the presence of
a malicious or ignorant superuser.

As a compromise, maybe a scheme could be devised to *detect* when the clock
has been readjusted and to cause an exception. I'm not sure if that would
be helpful or useful, but it is probably easier than trying to craft a
portable user-space scheduler that is independent of the system clock
(especially without vendor support via POSIX or other industry standard).

Gary Wright

···

On Mar 5, 2007, at 9:03 AM, Tomas Pospisek wrote:

With the current "broken" semantics, that would work just right, since with
the current implementation sleep() time would increase/decrease in parallel
with the "sysadmin" changing the system time with "date -s" or similar.

Tomas_Pospisek · 5 March 2007 18:34

Quoting Gary Wright <gwtmp01@mac.com>:

···

> On Mar 5, 2007, at 9:03 AM, Tomas Pospisek wrote:
> > With the current "broken" semantics, that would work just right,
> > since with the current implementation sleep() time would
> > increase/decrease in parallel with the "sysadmin" changing the
> > system time with "date -s" or similar.
>
> Any serious sysadmin will use an NTP-based configuration to manage the
> time on their system. Time adjustments will slow or speed up the clock
> but won't cause it to skip ahead or back.
>
> Your proposal seems like a lot of effort to go through and it
> certainly won't prevent a sysadmin from just issuing "kill -9 pid". My
> point is that you'll never be able to write a program that successfully
> operates in the presence of a malicious or ignorant superuser.
>
> As a compromise, maybe a scheme could be devised to *detect* when the
> clock has been readjusted and to cause an exception. I'm not sure if that
> would be helpful or useful, but it is probably easier than trying to craft a
> portable user-space scheduler that is independent of the system clock
> (especially without vendor support via POSIX or other industry
> standard).

That means, to be precise, that Ruby is "serious admin proof" only and won't
"correctly" survive some noob who actually dares to use "date -s" on his system
(or - for that matter - double-clicks on the clock at the bottom right of
the screen) at the wrong instant...

And no, the effort to fix it is not so big. Here's the patch for the "do it
right" approach:

-------------------------------------------------------
--- eval.c.orig 2007-03-05 14:51:31.000000000 +0100
+++ eval.c 2007-03-05 18:12:42.823969936 +0100
@@ -9858,9 +9858,15 @@
 static double
 timeofday()
 {
+#ifdef _POSIX_MONOTONIC_CLOCK
+    struct timespec tv;
+    clock_gettime(CLOCK_MONOTONIC, &tv );
+    return (double)tv.tv_sec + (double)tv.tv_nsec * 1e-9;
+#else
     struct timeval tv;
     gettimeofday(&tv, NULL);
     return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
+#endif
 }
 
 #define STACK(addr) (th->stk_pos<(VALUE*)(addr) && (VALUE*)(addr)<th->stk_pos+th->stk_len)
-------------------------------------------------------

Some comments about this patch:

* it survives the unit tests in the test/ directory
* it actually does fix the issue (changing the system time while sleeping)

* clock_gettime requires linking with librt, so this needs to be taken care of
  in configure.in
* for some reason unknown to me I had to use gcc > 3.5.5 to link it, maybe
  I'd have needed to "make clean && configure ..." - I don't know.

* it only takes care of the problem on systems that have
  _POSIX_MONOTONIC_CLOCK. Other systems either need to provide their own ways,
  live with the old, "buggy" behaviour, or implement the other suggested
  fix, namely to "fix the worst case" by not letting the "remaining_time"
  ever increase.

* it'd be nice to change the function name timeofday() to something more
  neutral like getcurrenttime() or such, to disambiguate the name from the
  POSIX functions

Greets,
*t


Gary_Wright · 5 March 2007 19:43

> That means, to be precise, that ruby is "serious admin proof" only and won't
> "correctly" survive some noob that actually dares to use "date -s" on his
> system (or - for that matter - doubleclicks on the clock on the right on the
> bottom of the screen) at the wrong instant...

I don't think my comment was about Ruby but more about superusers. Python,
Java, C, C++, and so on are all defenseless against noobs with superuser
privileges.

> And no, the effort to fix it is not so big. Here's the patch for the "do it
> right" approach:

Well, I did qualify my statement about needing vendor support for the
appropriate standards...

Gary Wright

···

On Mar 5, 2007, at 1:34 PM, Tomas Pospisek wrote:
