The real difference between Mutex and Sync

As I've mentioned before, Sync and Mutex are very similar, and Mutex
is very simple.

Their locking algorithm (for exclusive locking) is essentially identical.

And in some detailed examinations of Mutex's behavior, there's nothing
superficially wrong with it. It's pure ruby, so there are no funny
memory allocations at the C level, and it essentially operates by
using an array as a queue for waiting threads. Very little is
involved.

Sync does a bunch of other things, since it supports shared locking in
addition to exclusive, so it is much more complicated. However, the
main difference between them, when one is using exclusive locking, is
in the way they perform unlocking.

Mutex:

    Thread.critical = true
    @locked = false
    begin
      t = @waiting.shift
      t.wakeup if t
    rescue ThreadError
      retry
    end
    Thread.critical = false
    begin
      t.run if t
    rescue ThreadError
    end
    self

With Mutex, the next thread to unlock is shifted off the beginning of the array.

With Sync, there is one key difference:

    wait = sync_waiting
    self.sync_waiting = []
    Thread.critical = false
    for w in wait
      w.run
    end

With Sync, a copy is made of the array or waiting threads, the waiting
thread queue is set to a fresh, empty array, and all of the threads
pointed to in the copy are woken up. One will get the lock, and the
rest insert themselves back into the waiting queue.

That copy of the sync_waiting array, "wait", which is now the only
thing pointing at the original array, then gets thrown away, letting
it get garbage collected.

That's the key to the difference, right there.

With Mutex, the shift operation reduces the size of the array, but
shift never reallocs.

With Sync, that array gets thrown away, and when gc runs, it is
cleaned up. This whole thing with Mutex hinges on the memory
management behavior in array.c. This is why throwing away the mutex
and creating a new one between iterations in that test script produces
a result similar to what Sync produces, too, as the net effect is that
the array that the mutex uses to track threads gets thrown away.

I made a copy of Mutex and changed it to use a linked list instead of an array to queue waiting threads, thereby eliminating array.c from the mix while keeping the rest of Mutex.rb intact, and I now get completely perfect, deterministic behavior on both Linux (1.8.4 and 1.8.5) and Win XP (one-click ruby 1.8.4 and cygwin ruby 1.8.5)

array.c's behavior is what needs to be examined in greater detail, here.
Mutex, itself, is not doing anything surprising.

Kirk Haines

Good findings.

I think the problem lies in the Array#shift (rb_ary_shift)
implementation. Basically, it just increments the internal pointer and
it never modifies the size of an array. This means that if you have an
array with 1000 elements and you 'shift' it 999 times, all these
elements are still visible to the garbage collector, until you modify
this array by triggering rb_ary_store method, for example.

···

On 9/4/06, khaines@enigo.com <khaines@enigo.com> wrote:

array.c's behavior is what needs to be examined in greater detail, here.
Mutex, itself, is not doing anything surprising.

Kirk Haines

--
Kent
---
http://www.datanoise.com

I meant that rb_ary_shift never modifies the size of allocated memory.

···

On 9/4/06, Kent Sibilev <ksruby@gmail.com> wrote:

On 9/4/06, khaines@enigo.com <khaines@enigo.com> wrote:
>
> array.c's behavior is what needs to be examined in greater detail, here.
> Mutex, itself, is not doing anything surprising.
>
> Kirk Haines
>

Good findings.

I think the problem lies in the Array#shift (rb_ary_shift)
implementation. Basically, it just increments the internal pointer and
it never modifies the size of an array. This means that if you have an
array with 1000 elements and you 'shift' it 999 times, all these
elements are still visible to the garbage collector, until you modify
this array by triggering rb_ary_store method, for example.

--
Kent
---
http://www.datanoise.com

Good findings.

I think the problem lies in the Array#shift (rb_ary_shift)
implementation. Basically, it just increments the internal pointer and
it never modifies the size of an array. This means that if you have an
array with 1000 elements and you 'shift' it 999 times, all these
elements are still visible to the garbage collector, until you modify
this array by triggering rb_ary_store method, for example.

Would this help?

class Array
   alias rb_ary_shift shift
   def shift
     @len ||= length
     @shift_count ||= 0
     res = rb_ary_shift
     @shift_count += 1
     if @shift_count >= @len / 2
       @shift_count = nil
       @len = nil
       replace(dup)
     end
     res
   end
end

···

On Sep 4, 2006, at 2:50 PM, Kent Sibilev wrote:

--
Kent
---
http://www.datanoise.com

This patch will make Mutex a bit slower, but much better in terms of
garbage collection:

Index: lib/thread.rb

···

===================================================================
RCS file: /src/ruby/lib/thread.rb,v
retrieving revision 1.16.2.2
diff -r1.16.2.2 thread.rb
99c99
< @waiting.push Thread.current
---

      @waiting.unshift Thread.current

115c115
< t = @waiting.shift
---

      t = @waiting.pop

On 9/4/06, Kent Sibilev <ksruby@gmail.com> wrote:

On 9/4/06, Kent Sibilev <ksruby@gmail.com> wrote:
> On 9/4/06, khaines@enigo.com <khaines@enigo.com> wrote:
> >
> > array.c's behavior is what needs to be examined in greater detail, here.
> > Mutex, itself, is not doing anything surprising.
> >
> > Kirk Haines
> >
>
> Good findings.
>
> I think the problem lies in the Array#shift (rb_ary_shift)
> implementation. Basically, it just increments the internal pointer and
> it never modifies the size of an array. This means that if you have an
> array with 1000 elements and you 'shift' it 999 times, all these
> elements are still visible to the garbage collector, until you modify
> this array by triggering rb_ary_store method, for example.
>

I meant that rb_ary_shift never modifies the size of allocated memory.

--
Kent
---
http://www.datanoise.com

--
Kent
---
http://www.datanoise.com

While I am not benchmarking with performance in mind, that change doesn't seem to have any significant effect on overall speed, which doesn't surprise me. Speed-wise, it is just flipping the location of the expensive array operation from the unlock to the lock.

And yes, that simple change does seem to make a significant difference, because pop will realloc the array.

Kirk Haines

···

On Tue, 5 Sep 2006, Kent Sibilev wrote:

This patch will make Mutex a bit slower, but much better in terms of
garbage collection:

Index: lib/thread.rb

RCS file: /src/ruby/lib/thread.rb,v
retrieving revision 1.16.2.2
diff -r1.16.2.2 thread.rb
99c99
< @waiting.push Thread.current
---

      @waiting.unshift Thread.current

115c115
< t = @waiting.shift
---

      t = @waiting.pop

I've attached a trivial script that demonstrates on OS X and likely linux (unlikely windows because of the ps stuff) the problem. To see the bug, run it like:

  ruby blowup2.rb

A *simple* fix is to just remove the reference to whatever is pointed to by the first element. To see this work, run the script as:

  ruby blowup2.rb fix

This trick will probably lead to a quicker fix (and if you are daring you might actually patch the C implementation of shift to do it for you). Might want to hear what Matz has to say about that.

Cheers,
Bob

···

On Sep 5, 2006, at 4:25 AM, khaines@enigo.com wrote:

On Tue, 5 Sep 2006, Kent Sibilev wrote:

This patch will make Mutex a bit slower, but much better in terms of
garbage collection:

Index: lib/thread.rb

RCS file: /src/ruby/lib/thread.rb,v
retrieving revision 1.16.2.2
diff -r1.16.2.2 thread.rb
99c99
< @waiting.push Thread.current
---

      @waiting.unshift Thread.current

115c115
< t = @waiting.shift
---

      t = @waiting.pop

While I am not benchmarking with performance in mind, that change doesn't seem to have any significant effect on overall speed, which doesn't surprise me. Speed-wise, it is just flipping the location of the expensive array operation from the unlock to the lock.

And yes, that simple change does seem to make a significant difference, because pop will realloc the array.

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

$VERBOSE = nil
STDOUT.sync = true
trap('INT'){ exit }

fixit = 0 < ARGV.size

m = 1000000
n = 1000

all_arrays = []
first = true

n.times do |i|
   if 0 < m then
     a = ["x" * m ]
     all_arrays << a
     a[0] = nil if fixit and 0 < a.size
     a.shift
   else
     a = []
     all_arrays << a
   end

   if 0 == (all_arrays.size % 10) then
     GC.start
     stdout = `ps v -p #{ Process.pid }`
     stdout = stdout.split(%r/\n/)
     if first then
       printf("\n %s\n", stdout.first)
       first = false
     end
     printf("%6d:: %s\n", all_arrays.size, stdout.last)
   end
end

This is 'windows' version, using pslist[1]
And, yes, it does the same under windows as well.

Jano

[1] http://www.sysinternals.com/Utilities/PsList.html

$VERBOSE = nil
STDOUT.sync = true
trap('INT'){ exit }

fixit = 0 < ARGV.size

m = 1000000
n = 1000

all_arrays = []
first = true

n.times do |i|
  if 0 < m then
    a = ["x" * m ]
    all_arrays << a
    a[0] = nil if fixit and 0 < a.size
    a.shift
  else
    a = []
    all_arrays << a
  end

  if 0 == (all_arrays.size % 10) then
    GC.start
    stdout = `pslist -m #{ Process.pid }`
    stdout = stdout.split(%r/\n/)
    if first then
      printf("\n %s\n", stdout[7])
      first = false
    end
    printf("%6d:: %s\n", all_arrays.size, stdout.last)
  end
end

···

On 9/5/06, Bob Hutchison <hutch@recursive.ca> wrote:

I've attached a trivial script that demonstrates on OS X and likely
linux (unlikely windows because of the ps stuff) the problem. To see