[BUG] thread/sync.rb memory corruption

i’m using ef (electric fence) to show the memory corruption in this script.
to cause it corrupt memory use

CORRUPT=true ef ruby bug.rb

file: bug.rb

 require 'sync'

 class A
   def initialize
     extend Sync_m
     @observers = []
   end
   def meth
     synchronize(:EX){
       @observers.each do |o|
         if ENV['CORRUPT']
           o.notify nil
         else
           o.notify
         end
       end
     }
   end
   def add_observer o
     synchronize(:EX){
       @observers << o
     }
   end
 end

 class B
   def initialize a
     @a = a
     @a.add_observer self
   end
   def notify *a
     Thread.new{ @a.meth }
   end
 end

 a = A.new
 b = B.new a
 a.meth
 STDIN.gets

note that it is the simple passing, or not, of arguments to the notify method
which triggers the bug. perhaps this will yield some hints.

regards.

-a

bug.rb (547 Bytes)

···


suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.

  • h.h. the 14th dali lama

Ara, I still believe the error is in the script (see my other posting).

Kind regards

robert

···

2006/7/6, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov>:

i'm using ef (electric fence) to show the memory corruption in this script.
to cause it corrupt memory use

   CORRUPT=true ef ruby bug.rb

file: bug.rb

     require 'sync'

     class A
       def initialize
         extend Sync_m
         @observers =
       end
       def meth
         synchronize(:EX){
           @observers.each do |o|
             if ENV['CORRUPT']
               o.notify nil
             else
               o.notify
             end
           end
         }
       end
       def add_observer o
         synchronize(:EX){
           @observers << o
         }
       end
     end

     class B
       def initialize a
         @a = a
         @a.add_observer self
       end
       def notify *a
         Thread.new{ @a.meth }
       end
     end

     a = A.new
     b = B.new a
     a.meth
     STDIN.gets

note that it is the simple passing, or not, of arguments to the notify method
which triggers the bug. perhaps this will yield some hints.

regards.

-a
--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dali lama

--
Have a look: Robert K. | Flickr

hi robert-

i never saw any post!? maybe mailing issues again? sigh...

can you re-state?

regards.

-a

···

On Thu, 6 Jul 2006, Robert Klemme wrote:

Ara, I still believe the error is in the script (see my other posting).

Kind regards

robert

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dali lama

> Ara, I still believe the error is in the script (see my other posting).
>
> Kind regards
>
> robert

hi robert-

i never saw any post!? maybe mailing issues again? sigh...

can you re-state?

No prob. I'll put you on CC - just in case.

robert

···

2006/7/6, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov>:

On Thu, 6 Jul 2006, Robert Klemme wrote:

2006/7/6, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov>:

On Thu, 6 Jul 2006, Tom Rauchenwald wrote:

> ara.t.howard@noaa.gov writes:
>
>> any idea why this script slows downs drastically as it runs and seems to leak
>> memory?
>
> First, I'm not a Ruby expert or something, but..
>
> [Code skipped]
>
>> switch = Switch.new
>
> Here you create the switch.
>
>> toggle = SwitchToggle.new switch
>
> Here, toggle is created and registered as listener (in the constructor)
>
>> switch.on!
>
> So now you switch it on.
> Toggle gets notified, and creates a thread that switches it off.
> But when that Thread switches it off, toggle gets notified, and starts a
> thread that switches it on again..
> Basically it's an infinite loop, and i suspect it's leaking because GC
> doesn't kick in.

it would be, but the synchronize prevents two threads from entering any of the
Switch methods at a time so only thread can actually be running at once.

Yes, but they are still queued up against the mutex. I completely
agree with Tom's analysis. Having an Observer trigger the
notification again is definitively a bad idea - at least if events are
not queued up.

Kind regards

robert

--
Have a look: http://www.flickr.com/photos/fussel-foto/

ah - but they are indeed queued - this is precisely what sync.rb is supposed
to do. you can prove this for yourself by adding

   p 'Thread.list.size' => Thread.list.size

in the notify method. if you do so you'll see something like

   on @ 1152201072.40667
   {"Thread.list.size"=>1}
   off @ 1152201072.41042
   {"Thread.list.size"=>2}
   on @ 1152201072.41508
   {"Thread.list.size"=>3}
   off @ 1152201072.4219
   {"Thread.list.size"=>3}
   on @ 1152201072.43386
   {"Thread.list.size"=>2}
   off @ 1152201072.44186
   {"Thread.list.size"=>3}
   on @ 1152201072.46079
   {"Thread.list.size"=>2}

the number of threads running will never exceed three. the bug does not seem
to be directly caused by sync.rb in any case. here is a more minimal script
which will produce the error:

     require 'sync'
     class A
       def initialize
         extend Sync_m
         @observers =
       end
       def meth
         synchronize(:EX){
           @observers.each do |o|
             if ENV['CORRUPT']
               o.notify nil
             else
               o.notify
             end
           end
         }
       end
       def add_observer o
         synchronize(:EX){
           @observers << o
         }
       end
     end

     class B
       def initialize a
         @a = a
         @a.add_observer self
       end
       def notify *a
         Thread.new{ @a.meth }
       end
     end

     a = A.new
     b = B.new a
     a.meth
     STDIN.gets

if you run it normally

   ruby a.rb

it will run fine. if you run with

   CORRUPT=true ruby a.rb

you will trash memory. if you have ef and do

   CORRUPT=true ef ruby a.rb

electric fence will core dump almost immediately.

note that the only difference between crashing and not is calling notify with
and argument - when that occurs the number of threads will remain steady, but
the GC will stop collecting finished threads even though no reference to them
is held.

regards.

-a

···

On Thu, 6 Jul 2006, Robert Klemme wrote:

Yes, but they are still queued up against the mutex. I completely agree
with Tom's analysis. Having an Observer trigger the notification again is
definitively a bad idea - at least if events are not queued up.

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dali lama

> Yes, but they are still queued up against the mutex. I completely agree
> with Tom's analysis. Having an Observer trigger the notification again is
> definitively a bad idea - at least if events are not queued up.

ah - but they are indeed queued - this is precisely what sync.rb is supposed
to do. you can prove this for yourself by adding

With queued I meant something different in this case, i.e. have a
single event queue in the obervable that is processed by a single
thread only.

   p 'Thread.list.size' => Thread.list.size

in the notify method. if you do so you'll see something like

   on @ 1152201072.40667
   {"Thread.list.size"=>1}
   off @ 1152201072.41042
   {"Thread.list.size"=>2}
   on @ 1152201072.41508
   {"Thread.list.size"=>3}
   off @ 1152201072.4219
   {"Thread.list.size"=>3}
   on @ 1152201072.43386
   {"Thread.list.size"=>2}
   off @ 1152201072.44186
   {"Thread.list.size"=>3}
   on @ 1152201072.46079
   {"Thread.list.size"=>2}

Well, yes. Still I think that the design is flawed because there is a
recursion via threads. If you have two listeners the # of threads
will constantly grow. Try it out! I tried it with these changes
(attached). Probably your version crashes with a single listener
because threads are not as fast GC'ed as on my machine or whatever.

the number of threads running will never exceed three. the bug does not seem
to be directly caused by sync.rb in any case. here is a more minimal script
which will produce the error:

Try my changes. More and more threads queue up at the mutex.

it will run fine. if you run with

   CORRUPT=true ruby a.rb

you will trash memory. if you have ef and do

   CORRUPT=true ef ruby a.rb

electric fence will core dump almost immediately.

Btw, it does not crash on cygwin - neither way. :slight_smile:

note that the only difference between crashing and not is calling notify with
and argument - when that occurs the number of threads will remain steady, but
the GC will stop collecting finished threads even though no reference to them
is held.

Weird indeed.

Kind regards

robert

cr.rb (566 Bytes)

···

2006/7/6, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov>:

On Thu, 6 Jul 2006, Robert Klemme wrote:

--
Have a look: Robert K. | Flickr

With queued I meant something different in this case, i.e. have a single
event queue in the obervable that is processed by a single thread only.

<snip>

Well, yes. Still I think that the design is flawed because there is a
recursion via threads. If you have two listeners the # of threads will
constantly grow. Try it out! I tried it with these changes (attached).
Probably your version crashes with a single listener because threads are not
as fast GC'ed as on my machine or whatever.

agreed. this is actually the model i am using in the real code. note that
this code is simply a minimal example which triggers the bug. regardless of
whether it's a bad design or not (which it is) memory should not be corrupted
by a simply ruby script - if so i think it's a bug.

note that it's not that it simple gets slow that i'm showing - if you run
under electric fence is shows actual memory corruption - same with valgrind.

Try my changes. More and more threads queue up at the mutex.

oh i know - that's how the real code works - with a Queue and a single thread
consumer.

Btw, it does not crash on cygwin - neither way. :slight_smile:

huh. suprising. do you have electric fence? you cannot make it crash
without it unless you wait a very long time.

Weird indeed.

yes - this is the issue that concerns me.

cheers.

-a

cr.rb (566 Bytes)

···

On Thu, 6 Jul 2006, Robert Klemme wrote:
--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dali lama

agreed. this is actually the model i am using in the real code. note that
this code is simply a minimal example which triggers the bug. regardless of
whether it's a bad design or not (which it is) memory should not be corrupted
by a simply ruby script - if so i think it's a bug.

Hm, I'm not sure I agree. Try this one liner that will eventually
blow because too many threads are created that all wait for the mutex:

robert@fussel ~
$ ruby -r thread -e 'm=Mutex.new; m.synchronize { loop { Thread.new {
m.synchronize { } } } }'
[FATAL] failed to allocate memory

robert@fussel ~
$

This is not a Ruby bug but a but in the code.

note that it's not that it simple gets slow that i'm showing - if you run
under electric fence is shows actual memory corruption - same with valgrind.

> Try my changes. More and more threads queue up at the mutex.

oh i know - that's how the real code works - with a Queue and a single thread
consumer.

Good. You could also prevent notifications while notifying. Or store
notifications in a queue and loop as long in the notify method as
there are notifications in the queue.

> Btw, it does not crash on cygwin - neither way. :slight_smile:

huh. suprising. do you have electric fence? you cannot make it crash
without it unless you wait a very long time.

Please recheck that it's not the same situation I presented above. I
still have the feeling that this might be the real reason.

Kind regards

robert

···

2006/7/6, Ara.T.Howard <ara.t.howard@noaa.gov>:

--
Have a look: Robert K. | Flickr

This is an out of memory failure, with no unreleased GC'able memory in
sight. Of course this fails, and there is no alternative to this.

Memory corruption is a very different, and should not be produceable
by ruby scripts.

-Jürgen

···

On Fri, Jul 07, 2006 at 05:01:25AM +0900, Robert Klemme wrote:

2006/7/6, Ara.T.Howard <ara.t.howard@noaa.gov>:
>
>agreed. this is actually the model i am using in the real code. note that
>this code is simply a minimal example which triggers the bug. regardless
>of
>whether it's a bad design or not (which it is) memory should not be
>corrupted
>by a simply ruby script - if so i think it's a bug.

Hm, I'm not sure I agree. Try this one liner that will eventually
blow because too many threads are created that all wait for the mutex:

robert@fussel ~
$ ruby -r thread -e 'm=Mutex.new; m.synchronize { loop { Thread.new {
m.synchronize { } } } }'
[FATAL] failed to allocate memory

--
The box said it requires Windows 95 or better so I installed Linux

yes - this is exactly my point - no whether the approach is sane or not. it
definitely should not be able to corrupt memory with a vanilla ruby script
but, with the one i posted, it seems to be.

regards.

-a

···

On Fri, 7 Jul 2006, Juergen Strobel wrote:

Memory corruption is a very different, and should not be produceable
by ruby scripts.

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dali lama