Ruby lacks atfork : The evil that lives in fork

Consider this simple usage of Thread and Process....

I use a mutex to block access to the $state variable when it is in
an "inconsistent" state.

···

======================================================================
require 'thread'
require 'pp'
Thread.abort_on_exception = true
$state = "Uninited"

def state(m)
    print "#{caller(0)[1]}: #{Time.now} #{$state} "
    if m.locked?
       puts "Mutex locked"
    else
       puts "Mutex unlocked"
    end
end

m = Mutex.new
state(m)
$state = "Good"
t = Thread.new do
    begin
       state(m)
       m.synchronize do
          $state = "Inconsistent"
          state(m)
          sleep 10
          $state = "Good Again"
          state(m)
       end
    ensure
       state(m)
    end
end

state(m)
sleep 2

state(m)

pid = Process.fork do
    state(m)

    sleep 2

    state(m)
end

state(m)
pp Process.waitpid2(pid)

t.join

state(m)

This is what it outputs...

ruby -v;ruby -w evil_fork.rb
ruby 1.8.7 (2008-06-20 patchlevel 22) [i686-linux]
evil_fork.rb:16: Mon Oct 06 14:51:17 +1300 2008 Uninited Mutex unlocked
evil_fork.rb:20: Mon Oct 06 14:51:17 +1300 2008 Good Mutex unlocked
evil_fork.rb:23: Mon Oct 06 14:51:17 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:33: Mon Oct 06 14:51:17 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:36: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:39: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex unlocked
evil_fork.rb:46: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:43: Mon Oct 06 14:51:21 +1300 2008 Inconsistent Mutex unlocked
[5082, #<Process::Status: pid=5082,exited(0)>]
evil_fork.rb:26: Mon Oct 06 14:51:27 +1300 2008 Good Again Mutex locked
evil_fork.rb:29: Mon Oct 06 14:51:27 +1300 2008 Good Again Mutex unlocked
evil_fork.rb:51: Mon Oct 06 14:51:27 +1300 2008 Good Again Mutex unlocked

======================================================================

Oh dear!

When I Process.fork'ed I saw this....
evil_fork.rb:39: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex unlocked

ie. I could be accessing $state when it is in an inconsistent state
and the Mutex doesn't protect me.

From the fork man page...

        * The child process is created with a single thread — the one
          that called fork(). The entire virtual address space of
          the parent is replicated in the child, including the states
          of mutexes, condition variables, and other pthreads objects;
          the use of pthread_atfork(3) may be helpful for dealing with
          problems that this can cause.

Unfortunately Ruby doesn't provide an atfork facility.

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

Hi,

···

In message "Re: Ruby lacks atfork : The evil that lives in fork..." on Mon, 6 Oct 2008 11:11:26 +0900, John Carter <john.carter@tait.co.nz> writes:

Consider this simple usage of Thread and Process....

I use a mutex to block access to the $state variable when it is in
an "inconsistent" state.

When I Process.fork'ed I saw this....
evil_fork.rb:39: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex unlocked

ie. I could be accessing $state when it is in an inconsistent state
and the Mutex doesn't protect me.

I am not sure what you meant here. It worked as I expected. You
didn't wrap state(m) by synchronize, so that they are not mutually
exclusive. What did you expect out of the script?
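
A minimal sketch of the point about wrapping the read in synchronize, inside a single process. The helper name observe_with_lock and the 0.2 second delay are made up for illustration. A reader that takes the same mutex blocks until the writer has left its critical section, while an unguarded read can catch the intermediate value.

```ruby
require 'thread'

# Hypothetical helper: one writer passes through an inconsistent state,
# and the caller reads the state once without and once with the mutex.
def observe_with_lock
  mutex   = Mutex.new
  state   = "Good"
  entered = Queue.new            # signals that the writer holds the lock

  writer = Thread.new do
    mutex.synchronize do
      state = "Inconsistent"
      entered << true
      sleep 0.2                  # stay inconsistent for a moment
      state = "Good Again"
    end
  end

  entered.pop                    # the writer is inside its critical section
  unguarded = state              # no lock taken: may observe "Inconsistent"
  guarded   = mutex.synchronize { state }  # waits for the writer to finish
  writer.join
  [unguarded, guarded]
end
```

The second element of the returned pair is always "Good Again", because the guarded read cannot start before the writer releases the mutex. This only holds between threads of one process, which is exactly what the rest of the thread is about.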

              matz.

John Carter wrote:

From the fork man page...

        * The child process is created with a single thread — the one
          that called fork().

That manpage is talking about OS threads, not Ruby threads.

If you want to know how Ruby handles its green threads through fork, you
need to refer to "ri Process::fork" instead.

According to ri, when (Ruby's) fork is called only the currently-running
(Ruby) thread continues to live in the child process.
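
The behaviour described by ri can be checked with a small sketch like the following. The helper name thread_counts_across_fork is an assumption for illustration; the child reports its Thread.list size back through a pipe.

```ruby
# Hypothetical helper: count live Ruby threads in the parent and in a
# forked child while a second thread exists in the parent.
def thread_counts_across_fork
  worker = Thread.new { sleep }          # an extra thread in the parent
  reader, writer = IO.pipe

  pid = Process.fork do
    reader.close
    writer.puts Thread.list.size         # threads alive in the child
    writer.close
    exit!(0)                             # skip at_exit handlers copied from the parent
  end

  writer.close
  child_count = reader.read.to_i
  reader.close
  Process.waitpid(pid)
  parent_count = Thread.list.size

  worker.kill
  worker.join
  [parent_count, child_count]
end
```

Only the thread that called fork should be counted in the child, while the parent still sees its worker thread.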

          The entire virtual address space of
          the parent is replicated in the child, including the states
          of mutexes, condition variables, and other pthreads objects;

This is talking about OS mutexes etc. Again, the Ruby objects with
corresponding names are entirely different.

···

--
Posted via http://www.ruby-forum.com/.

state(m) is merely reporting the value of $state and whether the
mutex was locked or not.

For the time $state is "Inconsistent", the mutex should be in a locked
state. Which it is, when viewed by any other thread _in the same
process_.

However, if you fork a process, the mutex in the child process is in
the unlocked state whilst the resource is still in the inconsistent
state.

The usual pattern for a lock/unlock pair is to be wrapped round some
access to a shared resource.

In this case the shared resource is $state.

Let us make that more explicit. Suppose we are transferring money from
one account to another...

···

======================================================================
require 'thread'
Thread.abort_on_exception = true
STDOUT.sync = true

$account_a = 100
$account_b = 100
$total = $account_a + $account_b
$mutex = Mutex.new

def log(msg,level=1)
    puts "\n#{caller(0)[level]}:#{Time.now} #{msg}"
end

def invariant_check
    if $total == ($account_a + $account_b)
       log( "We are in a consistent state", 2)
    else
       log( "We are in an inconsistent state", 2)
    end
end

def transfer( sum)
    log "At the start of transaction the invariant holds $account_a + $account_b == 200"
    invariant_check
    $mutex.synchronize do
       log "Got lock"
       $account_a = $account_a - sum
       log " For the next 10 seconds we have lost money from our system. We are inconsistent."
       sleep 10
       $account_b = $account_b + sum
       log "Ah! Their it is again. We're consistent again."
    end
    log "Invariant holds at end"
    invariant_check
end

t1 = Thread.new do
    log "Sleep 4 to ensure we wait for other"
    sleep 4
    log "Try get lock, can't since t2 has it. #{$mutex.locked?}"
    $mutex.synchronize do
       log "Only unblocks after 12 seconds into the program"
       invariant_check
       log "Release lock"
    end
    log "t1 exits"
end

sleep 1

t2 = Thread.new do
    log "t2 grabs lock immediately and holds for 10"
    transfer(50)
    log "t2 exits"
end

sleep 1

pid = Process.fork do
    log "Forked process wakes and sleeps 5"
    sleep 5
    log "By now t2 has the lock, but will try get it anyway"
    log( "Looky the lock is free") if !$mutex.locked?
    $mutex.synchronize do
       log "What! it Unblocks immediately!"
       log "Announces we're inconsistent!"
       invariant_check
       log "Relinquish lock"
    end
    log "exit process"
end

log "Wait for process"
p Process.waitpid2 pid

log "Wait for t1"
t1.join

log "Wait for t2"
t2.join

Then the output is...
ruby -w fork.rb

fork.rb:40:Mon Oct 06 17:23:25 +1300 2008 Sleep 4 to ensure we wait for other

fork.rb:54:Mon Oct 06 17:23:26 +1300 2008 t2 grabs lock immediately and holds for 10

fork.rb:24:in `transfer':Mon Oct 06 17:23:26 +1300 2008 At the start of transaction the invariant holds $account_a + $account_b == 200

fork.rb:25:in `transfer':Mon Oct 06 17:23:26 +1300 2008 We are in a consistent state

fork.rb:27:in `transfer':Mon Oct 06 17:23:26 +1300 2008 Got lock

fork.rb:29:in `transfer':Mon Oct 06 17:23:26 +1300 2008 For the next 10 seconds we have lost money from our system. We are inconsistent.

fork.rb:62:Mon Oct 06 17:23:27 +1300 2008 Forked process wakes and sleeps 5

fork.rb:75:Mon Oct 06 17:23:27 +1300 2008 Wait for process

fork.rb:42:Mon Oct 06 17:23:29 +1300 2008 Try get lock, can't since t2 has it. true

fork.rb:64:Mon Oct 06 17:23:32 +1300 2008 By now t2 has the lock, but will try get it anyway

fork.rb:65:Mon Oct 06 17:23:32 +1300 2008 Looky the lock is free

fork.rb:67:Mon Oct 06 17:23:32 +1300 2008 What! it Unblocks immediately!

fork.rb:68:Mon Oct 06 17:23:32 +1300 2008 Announces we're inconsistent!

fork.rb:69:Mon Oct 06 17:23:32 +1300 2008 We are in an inconsistent state

fork.rb:70:Mon Oct 06 17:23:32 +1300 2008 Relinquish lock

fork.rb:72:Mon Oct 06 17:23:32 +1300 2008 exit process
[15355, #<Process::Status: pid=15355,exited(0)>]

fork.rb:78:Mon Oct 06 17:23:32 +1300 2008 Wait for t1

fork.rb:32:in `transfer':Mon Oct 06 17:23:36 +1300 2008 Ah! Their it is again. We're consistent again.

fork.rb:44:Mon Oct 06 17:23:36 +1300 2008 Only unblocks after 12 seconds into the program
fork.rb:34:in `transfer':Mon Oct 06 17:23:36 +1300 2008 Invariant holds at end

fork.rb:45:Mon Oct 06 17:23:36 +1300 2008 We are in a consistent state
fork.rb:35:in `transfer':Mon Oct 06 17:23:36 +1300 2008 We are in a consistent state

fork.rb:46:Mon Oct 06 17:23:36 +1300 2008 Release lock
fork.rb:56:Mon Oct 06 17:23:36 +1300 2008 t2 exits

fork.rb:48:Mon Oct 06 17:23:36 +1300 2008 t1 exits

fork.rb:81:Mon Oct 06 17:23:36 +1300 2008 Wait for t2

======================================================================

Where the crucial lines are...
fork.rb:65:Mon Oct 06 17:23:32 +1300 2008 Looky the lock is free

fork.rb:67:Mon Oct 06 17:23:32 +1300 2008 What! it Unblocks immediately!

fork.rb:68:Mon Oct 06 17:23:32 +1300 2008 Announces we're inconsistent!

fork.rb:69:Mon Oct 06 17:23:32 +1300 2008 We are in an inconsistent state

fork.rb:70:Mon Oct 06 17:23:32 +1300 2008 Relinquish lock

The solution provided by POSIX is pthread_atfork

        pthread_atfork - register handlers to be called at fork(2) time

SYNOPSIS
        #include <pthread.h>

        int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));

DESCRIPTION

        "pthread_atfork" registers handler functions to be called just
        before and just after a new process is created with
        "fork"(2). The 'prepare' handler will be called from the parent
        process, just before the new process is created. The 'parent'
        handler will be called from the parent process, just before
        "fork"(2) returns. The 'child' handler will be called from the
        child process, just before "fork"(2) returns.

        One or several of the three handlers 'prepare', 'parent' and
        'child' can be given as "NULL", meaning that no handler needs
        to be called at the corresponding point.

        "pthread_atfork" can be called several times to install several
        sets of handlers. At "fork"(2) time, the 'prepare' handlers are
        called in LIFO order (last added with "pthread_atfork", first
        called before "fork"), while the 'parent' and 'child' handlers
        are called in FIFO order (first added, first called).

        To understand the purpose of "pthread_atfork", recall that
        "fork"(2) duplicates the whole memory space, including mutexes
        in their current locking state, but only the calling thread:
        other threads are not running in the child process. The
        mutexes are not usable after the "fork" and must be initialized
        with 'pthread_mutex_init' in the child process. This is a
        limitation of the current implementation and might or
        might not be present in future versions.

In my example, an atfork handler could hold the Mutex in the parent
process for the lifetime of the child, leaving it unlocked in the
child process.

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

John Carter wrote:

From the fork man page...

        * The child process is created with a single thread — the one
          that called fork().

That manpage is talking about OS threads, not Ruby threads.

Correct. But I had just demonstrated that the problem pthread_atfork
was designed to solve exists within ruby threads.

According to ri, when (Ruby's) fork is called only the currently-running
(Ruby) thread continues to live in the child process.

Exactly the same as with pthreads and linux fork.

          The entire virtual address space of
          the parent is replicated in the child, including the states
          of mutexes, condition variables, and other pthreads objects;

This is talking about OS mutexes etc. Again, the Ruby objects with
corresponding names are entirely different.

That is neither here nor there. The point is I have just shown that
the problem described exists within Ruby.

ie. If deep within a library routine there are threads and mutexes
and deep within another library routine there is a Process.fork the
potential for "the wrong thing" to happen exists.

Where the wrong thing is that :- if a thread is in the critical
section protected by the mutex, it may leave the shared state
inconsistent and unusable when the "fork" is executed by another thread.

If the child process ever invokes the library with the mutex, it may
find the Mutex unlocked, when it should be locked, and hence enter a
critical section, when it shouldn't, and find an inconsistent state
which leads to an erroneous result.

The solution proposed by POSIX is to provide the facility for
libraries to chain handlers, to handle in some sensible fashion, any
fork event occurring in a different library.

If Ruby can come up with a better mechanism than atfork to handle this
problem, I would be very pleased.

But some solution is required to be able to have reusable multiple
libraries some of which use sub-processes, some which use threads.

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

···

On Tue, 7 Oct 2008, Brian Candler wrote:

Hi,

···

In message "Re: Ruby lacks atfork : The evil that lives in fork..." on Mon, 6 Oct 2008 13:32:30 +0900, John Carter <john.carter@tait.co.nz> writes:

state(m) is merely reporting the value of $state and whether the
mutex was locked or not.

For the time $state is "Inconsistent", the mutex should be in a locked
state. Which it is, when viewed by any other thread _in the same
process_.

However, if you fork a process, the mutex in the child process is in
the unlocked state whilst the resource is still in the inconsistent
state.

When you fork off the process, the entire resources are (virtually)
copied, so that there's no way to ensure (copied) mutex to share
locking status across processes. The basic rule is: don't mix threads
(and thread related resources like mutex) with processes.
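
A small sketch of why a copied mutex cannot share its locking status. The helper name lock_status_after_child_locks is hypothetical. The child locks its own copy of the mutex and reports back through a pipe, and the parent then inspects its copy.

```ruby
# Hypothetical helper: lock the mutex in a forked child and compare what
# each process sees afterwards.
def lock_status_after_child_locks
  mutex = Mutex.new
  reader, writer = IO.pipe

  pid = Process.fork do
    reader.close
    mutex.lock                        # locks the child's copy only
    writer.puts mutex.locked?
    writer.close
    exit!(0)                          # skip at_exit handlers copied from the parent
  end

  writer.close
  child_locked = reader.read.strip == "true"
  reader.close
  Process.waitpid(pid)

  { :child_locked => child_locked, :parent_locked => mutex.locked? }
end
```

The child sees its mutex as locked and the parent sees its own as unlocked: after the fork they are two unrelated objects that happen to start from the same memory image.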

              matz.

The problem with "don't mix threads with processes" is unless you
inspect the source code of each version of each library in turn... it
is very hard to prove that nothing in your system is using a thread
and a process together.

Ah, but that's the point of pthread_atfork... it gives you several
possible strategies to allow you to mix threads with processes...

1) Give the resource to the child.

    Use the atfork handler to lock the mutex in the prepare handler
    (blocking if need be until you can obtain it) perform the fork,
    release the lock in the child handler. Thereafter the resource will be
    unobtainable in the parent until the child exits, and the child may
    continue to use the resource as need be.

2) Give the resource to the parent.

    Use atfork to lock the resource in the prepare handler, ensure it is
    locked in the child handler, unlock in the parent
    handler. Thereafter the child process will find it is always
    locked, and the parent process will be able to access it.

3) Lock both out for the duration.

4) Mutate the mutex into a valid interprocess lock like flock or
    fcntl.
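
Strategy 4 could look roughly like this in Ruby, using File#flock. This is a sketch under assumptions: the helper names with_file_lock and file_lock_held? are made up, and the lock file path is whatever the caller chooses. Each process opens the lock file itself, because a descriptor inherited across fork shares the parent's lock rather than competing for it.

```ruby
# Hypothetical helper: an interprocess critical section built on flock.
def with_file_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX)            # blocks until no other holder remains
    begin
      yield
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

# Hypothetical helper: true if some other open of the file holds the lock.
def file_lock_held?(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    # With LOCK_NB, flock returns false instead of blocking.
    got_it = f.flock(File::LOCK_EX | File::LOCK_NB)
    f.flock(File::LOCK_UN) if got_it
    !got_it
  end
end
```

Unlike a Mutex, the lock lives in the kernel, so a forked child that calls with_file_lock on the same path really does wait for the parent.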

Sun Solaris has a "forkall" variant that creates copies of all
active threads as well... but I'm not sure that "forkall" really
solves the problem instead of multiplying it.

What the open group has to say on the subject is informative...

http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

     There are at least two serious problems with the semantics of
     fork() in a multi-threaded program. One problem has to do with
     state (for example, memory) covered by mutexes. Consider the case
     where one thread has a mutex locked and the state covered by that
     mutex is inconsistent while another thread calls fork(). In the
     child, the mutex is in the locked state (locked by a nonexistent
     thread and thus can never be unlocked). Having the child simply
     reinitialize the mutex is unsatisfactory since this approach does
     not resolve the question about how to correct or otherwise deal
     with the inconsistent state in the child.

     It is suggested that programs that use fork() call an exec
     function very soon afterwards in the child process, thus resetting
     all states. In the meantime, only a short list of
     async-signal-safe library routines are promised to be available.

     Unfortunately, this solution does not address the needs of
     multi-threaded libraries. Application programs may not be aware
     that a multi-threaded library is in use, and they feel free to
     call any number of library routines between the fork() and exec
     calls, just as they always have. Indeed, they may be extant
     single-threaded programs and cannot, therefore, be expected to
     obey new restrictions imposed by the threads library.

     On the other hand, the multi-threaded library needs a way to
     protect its internal state during fork() in case it is re-entered
     later in the child process. The problem arises especially in
     multi-threaded I/O libraries, which are almost sure to be invoked
     between the fork() and exec calls to effect I/O redirection. The
     solution may require locking mutex variables during fork(), or it
     may entail simply resetting the state in the child after the
     fork() processing completes.

     The pthread_atfork() function provides multi-threaded libraries
     with a means to protect themselves from innocent application
     programs that call fork(), and it provides multi-threaded
     application programs with a standard mechanism for protecting
     themselves from fork() calls in a library routine or the
     application itself.

     The expected usage is that the prepare handler acquires all mutex
     locks and the other two fork handlers release them.

     For example, an application can supply a prepare routine that
     acquires the necessary mutexes the library maintains and supply
     child and parent routines that release those mutexes, thus
     ensuring that the child gets a consistent snapshot of the state of
     the library (and that no mutexes are left
     stranded). Alternatively, some libraries might be able to supply
     just a child routine that reinitializes the mutexes in the library
     and all associated states to some known value (for example, what
     it was when the image was originally executed).

     When fork() is called, only the calling thread is duplicated in
     the child process. Synchronization variables remain in the same
     state in the child as they were in the parent at the time fork()
     was called. Thus, for example, mutex locks may be held by threads
     that no longer exist in the child process, and any associated
     states may be inconsistent. The parent process may avoid this by
     explicit code that acquires and releases locks critical to the
     child via pthread_atfork(). In addition, any critical threads need
     to be recreated and reinitialized to the proper state in the child
     (also via pthread_atfork()).

     A higher-level package may acquire locks on its own data
     structures before invoking lower-level packages. Under this
     scenario, the order specified for fork handler calls allows a
     simple rule of initialization for avoiding package deadlock: a
     package initializes all packages on which it depends before it
     calls the pthread_atfork() function for itself.

Yes, I'm aware the author of that document was describing POSIX pthreads
not ruby threads. But clearly the same problems exist in both.
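
The "expected usage" from the Open Group text (prepare acquires the lock, parent and child release it) can be sketched in plain Ruby without any atfork support. The helper name fork_holding is hypothetical and not a Ruby API. It relies on the forking thread still owning the mutex in the child, since that is the one thread that survives the fork.

```ruby
# Hypothetical helper: fork while holding +mutex+, so the child starts
# from a state that no other thread is halfway through changing.
def fork_holding(mutex)
  mutex.lock                         # "prepare": wait out any critical section
  begin
    Process.fork do
      mutex.unlock                   # "child": release the child's copy
      yield
    end
  ensure
    mutex.unlock if mutex.owned?     # "parent": release the parent's copy
  end
end
```

With the transfer example from earlier in the thread, calling fork_holding($mutex) { invariant_check } would block until the transfer has finished, so the child would be forked from a consistent snapshot. It only helps for mutexes the forking code knows about, which is the gap a chained handler facility is meant to close.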

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

···

On Tue, 7 Oct 2008, Yukihiro Matsumoto wrote:

When you fork off the process, the entire resources are (virtually)
copied, so that there's no way to ensure (copied) mutex to share
locking status across processes. The basic rule is: don't mix threads
(and thread related resources like mutex) with processes.

John Carter wrote:

Ah, but that's the point of pthread_atfork... it gives you several
possible strategies to allow you to mix threads with processes...

You can atfork in ruby; fsdb uses this kind of construct:

module ForkSafely
   def fork
     # clean up before forking
     super do
       # clean up after forking, in child
       # clean up inconsistent mutexes, etc.
       yield
     end
     # clean up after forking, in parent
   end
end

include ForkSafely

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Hi,

···

In message "Re: Ruby lacks atfork : The evil that lives in fork..." on Tue, 7 Oct 2008 09:41:55 +0900, John Carter <john.carter@tait.co.nz> writes:

The problem with "don't mix threads with processes" is unless you
inspect the source code of each version of each library in turn... it
is very hard to prove that nothing in your system is using a thread
and a process together.

The point is not to touch thread related objects from the forked child
process. They won't work as expected anyway. It's not too hard to
do, I believe. You were touching them in your example.
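
One way to follow this rule, which is also what the Open Group text quoted earlier in the thread suggests, is to exec almost immediately in the child. A minimal sketch, assuming a hypothetical helper named run_in_fresh_process:

```ruby
require 'rbconfig'

# Hypothetical helper: run a Ruby snippet in a fresh process image and
# return its Process::Status.
def run_in_fresh_process(ruby_code)
  pid = Process.fork do
    # The child does nothing but exec, so it never touches the copied
    # mutexes or half-updated globals of the parent.
    exec(RbConfig.ruby, "-e", ruby_code)
  end
  _, status = Process.waitpid2(pid)
  status
end
```

For example, run_in_fresh_process("exit 7").exitstatus reports the exit code of the new interpreter. The cost is that the child shares none of the parent's in-memory state, so anything it needs has to be passed through arguments, the environment, files or pipes.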

              matz.

Hmm. Cute. Really cute.

Can I chain the at fork handlers?
Does it work for Process.fork as well?

Let me try...

···

======================================================================

module A

    def fork
       puts "Prereal"
       pid = super do
          puts "In real"
          yield
       end
       puts "post real"
       pid
    end

end
include A

pid = fork do
    puts "Did it work?"
end

Process.waitpid2 pid

module B
    def fork
       puts "Prereal1"
       pid = super do
          puts "In real1"
          yield
       end
       puts "post real1"
       pid
    end
end
include B

pid2 = fork do
    puts "Does chained work?"
end
Process.waitpid2 pid2

puts "Yes nested did work, does Process.fork work too?"

pid3 = Process.fork do
    puts "Does Process.fork work?"
end
Process.waitpid2 pid3

puts "No it didn't"

module C

    def Process.fork
       puts "b4 proc fork"
       pid = super do
          puts "in proc fork"
          yield
       end
       puts "Post proc fork"
       pid   # return the child's pid, as the wrappers in A and B do
    end
end

include C

pid2 = Process.fork do
    puts "Bah"
end

ruby -w a.rb
Prereal
In real
Did it work?
post real
Prereal1
Prereal
In real
In real1
Does chained work?
post real
post real1
Yes nested did work, does Process.fork work too?
Does Process.fork work?
No it didn't
a.rb:52: warning: redefine fork
b4 proc fork
Prereal1
Prereal
In real
In real1
in proc fork
Bah
post real
post real1
Post proc fork

Hmm. Almost, but I can't get rid of this warning...
   a.rb:52: warning: redefine fork

any suggestions on how to get rid of that pesky warning?

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

Use alias to copy it to a "backup" name before overriding.

   alias fork_orig fork
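
Applied to Process.fork, the alias suggestion might look like the sketch below. The backup name fork_without_hooks and the messages are made up for illustration, and only the block form of fork is handled.

```ruby
class << Process
  # Keep the original under a backup name before replacing it.
  alias_method :fork_without_hooks, :fork

  def fork(&block)
    raise ArgumentError, "this sketch only supports the block form" unless block
    puts "before fork"
    pid = fork_without_hooks do
      puts "in child"
      block.call
    end
    puts "after fork, in parent"
    pid                                # keep returning the child's pid
  end
end
```

Because the old method is still reachable through its alias, the wrapper calls it by name instead of relying on super.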

···

On Oct 6, 2008, at 20:08 PM, John Carter wrote:

Hmm. Almost, but I can't get rid of this warning...
a.rb:52: warning: redefine fork

any suggestions on how to get rid of that pesky warning?

John Carter wrote:

Can I chain the at fork handlers?

Chaining is possible. You can have multiple copies of the following code (changing the Mutex-specific part of course):

class Mutex
   module ForkSafely
     def fork
       super do
         ObjectSpace.each_object(Mutex) { |m| m.remove_dead }
         yield
       end
     end
   end
end

module ForkSafely
   include Mutex::ForkSafely
end
include ForkSafely
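
Putting the pieces of this thread together, a chained handler registry in the spirit of pthread_atfork could be sketched as follows. Everything named AtFork here is hypothetical, not an existing library. It uses Module#prepend on the singleton class of Process, so it needs Ruby 2.0 or later and only covers Process.fork; Kernel#fork would need the same treatment. Ruby 3.1 and later also provide Process._fork as a hook point for this purpose, which is not used here.

```ruby
module AtFork
  HANDLERS = []   # one hash per registration

  # Register up to three callables, mirroring pthread_atfork's arguments.
  def self.register(prepare: nil, parent: nil, child: nil)
    HANDLERS << { :prepare => prepare, :parent => parent, :child => child }
  end

  def self.run(kind, list = HANDLERS)
    list.each { |h| h[kind].call if h[kind] }
  end

  module Hook
    def fork
      AtFork.run(:prepare, AtFork::HANDLERS.reverse)   # LIFO, as in POSIX
      if block_given?
        pid = super do
          AtFork.run(:child)                           # FIFO, in the child
          yield
        end
      else
        pid = super
        if pid.nil?                                    # we are the child
          AtFork.run(:child)
          return nil
        end
      end
      AtFork.run(:parent)                              # FIFO, in the parent
      pid
    end
  end
end

Process.singleton_class.prepend(AtFork::Hook)
```

A library with its own mutex could then call AtFork.register with a prepare handler that locks it and parent and child handlers that unlock it, without knowing who else has registered handlers.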

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407