[ANN] Mongrel 0.3.13.4 Pre-Release -- Ruby's LEAK Fixed (Death To Mutex!)

Howdy Folks,

This release is after painstaking analysis of a memory leak that was
reported by Bradley Taylor, reduced by myself, and then fixed after much
work. You should all thank Bradley for finding the bizarre fix.

It turns out that Ruby has a memory leak when you use pretty much any
thread locking primitive other than Sync (Mutex, Monitor, etc.):

http://pastie.caboo.se/10194

The fix (for whatever reason) is to use Sync and put it in a block:

http://pastie.caboo.se/10317

Those two scripts are mini versions of how Mongrel manages threads so
that I could figure out a solution or get some input. The graph shows
reported RAM usage, sampled once per second. As you can see, the first
(leaking) graph goes up and doesn't go down, while the second (fixed)
graph cycles properly.

** This is a Ruby issue, so if you have software using Mutex or Monitor,
change to Sync now. **
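
For code following this advice, the change is a one-liner, since Sync#synchronize takes the same block form as Mutex#synchronize. Here is a minimal sketch of the guarded-dispatch pattern in question (sync.rb only shipped with the 1.8-era stdlib, so this runnable version uses Mutex and marks the Sync substitution in comments; `dispatch` is an illustrative name, not Mongrel's actual API):

```ruby
# Reduced sketch of a guard around dispatching. The suggested 1.8-era
# change is just the two commented lines:
#   require 'sync'
#   GUARD = Sync.new   # Sync#synchronize takes the same block form
GUARD = Mutex.new

def dispatch(request)
  GUARD.synchronize do
    # ... only one thread at a time runs the dispatch body here ...
    "handled #{request}"
  end
end

puts dispatch("/testuri")
```

The point is that the guard object is the only thing that changes; every caller's `synchronize { ... }` block stays as-is.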

Tests of this latest pre-release show that the RAM is properly cycled by
the GC and that it’s actually finally solved. If you run your app using
this release and you still have a leak then use the memory debugging
tools mongrel has to rule out your code (see below).

CHANGES

  • No more allow_concurrency. Until Ruby’s fixed I can’t let people do
    this anymore.
  • USR1 debugging. If you’re wondering about how Mongrel’s locking of
    Rails impacts your application, or what is causing BAD CLIENT then just
    hit your mongrel_rails with USR1 and Mongrel will tell you.
  • More extensive and accurate memory debugging. Use -B and look at the
    log/mongrel_log/objects.log to get a good idea of counts of objects,
    delta changes in counts, and mean+standard deviation lengths of objects
    with length methods.
  • Fixes a few places where sockets are closed and left in CLOSE_WAIT.

INSTALLING

As per usual:

sudo gem install mongrel --source=http://mongrel.rubyforge.org/releases/

Initial tests show it works on 1.8.5 and is actually faster, but this is
unsupported for now.

TESTING THIS RELEASE

If you want to test the memory leak, here’s the process:

  1. Start your application in production mode:
    mongrel_rails start -e production

  2. Hit it with USR1:
    killall -USR1 mongrel_rails

  3. Start running something that prints out the RAM (here's my fish
    code):
    while sleep 1
    ps aux | grep mongrel_rails | grep -v grep | grep -v gvim | ruby -aln -e "puts split[4..5].join(',')"
    end

  4. Thrash a simple rails controller with httperf:
    httperf --server 127.0.0.1 --port 3000 --num-conns 1000 --rate 120 --uri /testuri

What you want to do is adjust num-conns and rate until Mongrel reports
"X threads waiting for /testuri..."

The bug only manifests itself when threads pile up behind the guard
around Rails dispatching. This is also how you’d find out which Rails
actions are too slow.

Please report any bugs you find in this release, and a Win32 release
will come out after I’m sure it works for everyone else.

···


Zed A. Shaw
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 – Come get help.

hi zed-

if you are really serious about fixing your leak i suggest you re-work your
tests. as i mentioned before they have several race conditions, not least of
which that they both start a random number of threads, not 1000 as the code
suggests (you can easily confirm by printing out the number of times the
thread init loop executes). further, sync.rb is the single ruby lib i've had
memory issues with on production systems. i have never managed to figure out
why that is...
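
the "print out the number of times the thread init loop executes" check is easy to sketch - here is a hypothetical harness (spawn_and_count is a made-up name, not the test script itself):

```ruby
# count how many times the spawn loop actually executes, then compare
# with how many of those threads are really alive and parked.
def spawn_and_count(n)
  spawned = 0
  threads = []
  n.times do
    threads << Thread.new { Thread.stop }
    spawned += 1
  end
  # wait until every thread has parked itself in Thread.stop
  sleep 0.05 until threads.all? { |t| t.status == 'sleep' }
  alive = threads.select { |t| t.alive? }.size
  threads.each { |t| t.kill }
  threads.each { |t| t.join }
  [spawned, alive]
end

spawned, alive = spawn_and_count(100)
puts "loop ran #{spawned} times, #{alive} threads were alive"
```

if the two numbers ever disagree with each other, or with what the loop bound claims, the spawn loop is racing.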

in any case a careful script which allocates memory in a thread, waits for all
threads to finish allocating, checks memory, and then kills all threads before
checking again shows some surprising results which you should read carefully:

using mutex shows a nice cycle of memory freed:

     harp:~ > cat a.rb.mutex
     using: Mutex
     n: 420
     iter: 0
     with 420 threads holding memory : 44.0%
     with 0 threads holding memory : 13.0%
     iter: 1
     with 420 threads holding memory : 43.9%
     with 0 threads holding memory : 13.0%
     iter: 2
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.3%
     iter: 3
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.2%
     iter: 4
     with 420 threads holding memory : 44.0%
     with 0 threads holding memory : 13.5%
     iter: 5
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.2%
     iter: 6
     with 420 threads holding memory : 43.9%
     with 0 threads holding memory : 13.2%
     iter: 7
     with 420 threads holding memory : 44.2%
     with 0 threads holding memory : 13.2%
     iter: 8
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.5%
     iter: 9
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.9%

using sync, on the other hand, looks leaky, though i'm not saying it is.

     harp:~ > cat a.rb.sync
     using: Sync
     n: 420
     iter: 0
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 1.0%
     iter: 1
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 2.0%
     iter: 2
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 2.7%
     iter: 3
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 3.5%
     iter: 4
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 3.8%
     iter: 5
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 4.6%
     iter: 6
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 5.4%
     iter: 7
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 6.4%
     iter: 8
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 7.2%
     iter: 9
     with 420 threads holding memory : 43.7%
     with 0 threads holding memory : 8.1%

note that the code below is quite careful to only create a fixed number
of threads, to wait for them to each init a mb of memory, and only then to
check memory usage. likewise for checking after killing all threads - it's
done immediately after killing threads and running gc. here is the code:

     harp:~ > cat a.rb
     require 'thread'
     require 'sync'

     class TestThreads
       def initialize which, n
         c = case which
           when /mutex/io
             Mutex
           when /sync/io
             Sync
         end
         @guard = c.new
         @n = Integer n
         puts "using: #{ c.name }"
         puts "n: #{ @n }"
       end

       def pct_mem # linux specific field pos i'm sure
         stdout = `ps v #{ Process.pid }`
         stdout.split(%r/\n/).last.strip.split(%r/\s+/)[8] + '%'
       end

       def tq
         q = Queue.new
         t = Thread.new{
           mb = @guard.synchronize{ 0.chr * (2 ** 20) }
           q.push :ready
           Thread.stop
         }
         [t, q]
       end

       def run
         list = []
         10.times do |i|
           puts "iter: #{ i }"

            # load @n threads up
           @n.times{ list << tq }

           # wait for all threads to init memory with mb of data
           list.each{|t,q| q.pop}

           # show memory usage
           GC.start
           puts "with #{ list.size } threads holding memory : #{ pct_mem }"

           # kill all threads - clean up
           list.each{|t,q| t.kill}
           list.clear
           sleep 1 until Thread.list.size == 1

           # show memory usage
           GC.start
           puts "with 0 threads holding memory : #{ pct_mem }"
         end
       end
     end

     $VERBOSE = nil
     STDOUT.sync = true
     Thread.abort_on_exception = true
     trap('INT'){ exit }

     which, n, ignored = ARGV
     TestThreads.new(which, n).run

in any case, i'd carefully examine your tests (or the rails code if that is
indeed what it's modeled after) to make sure that they test
Mutex/Sync/Thread/Ruby and not your os virtual memory system and look closely
at the results again - like i said, i have had issues with sync.rb.

the point here is that it is probably the code in question and not Mutex per
se that was causing your process to grow in vmsize.

regards.

-a

···

On Sat, 26 Aug 2006, Zed Shaw wrote:


--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama

Ara, this is uncool. Seriously man, don't e-mail me personally and then
e-mail the list the exact same e-mail. Pick one and fight your battle
there.

As I mentioned to you before, the evidence shows you are wrong. Sure
you've cooked up a script that has a memory leak with Sync, but that
script doesn't operate the way Mongrel does. The sample I developed
does operate the way Mongrel does. It's called a "bug reduction". I'm
not going to test the leak in Mongrel with a reduction that doesn't
simulate Mongrel.

Also, as I said before, this shows the leak:

http://pastie.caboo.se/10194

And this script, with just 3 lines changed to use Sync shows it's fixed:

http://pastie.caboo.se/10317

With graphs even Ara! Graphs! We ran these tests for 30-40 minutes with
hundreds of thousands of threads being cycled during the tests.

Not to mention about 8 other people switching to Sync report their leaks fixed,
our own test script showing it's fixed, Mongrel tests showing it's fixed,
several other frameworks showing it, and you can't argue with the evidence.

If your script has a leak then fine, just don't do that. Ultimately though the
ruby lang guys need to fix this because either way, there's leaks. For now,
Mongrel is not leaking and I'm happy with that.

Now, I'd appreciate it if you'd maybe spend your energy looking into the ruby
source to find this leak rather than bothering me about it.

Thank you.

···

On Sat, 2006-08-26 at 18:12 +0900, ara.t.howard@noaa.gov wrote:

On Sat, 26 Aug 2006, Zed Shaw wrote:


--
Zed A. Shaw
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

I ran your test on OS/X looking at VSZ and RSS. And, like you, initially got Sync with no leak visible, and mutex with what looks like a bad leak. However, I notice that you only called GC once. I have a years old habit of always running GC at least three times when I really wanted GC to run (and in Java I had a loop that ran GC until it stopped freeing stuff which in some cases was eight or nine times). Superstition? Apparently not. On OS X, when I run GC three times neither sync nor mutex show a memory leak.

Zed, just for fun, try running GC a few times in a row (like GC.start; GC.start; GC.start).
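
The run-GC-until-it-stops-helping habit can also be written as a small loop; `gc_until_stable` is a made-up helper, and using the live object count as the settling test is just one heuristic:

```ruby
# Keep collecting until the live object count stops shrinking, instead
# of trusting a single GC.start to reclaim everything in one pass.
def gc_until_stable(max_passes = 10)
  passes = 0
  previous = ObjectSpace.each_object { }   # block form returns the count
  loop do
    GC.start
    passes += 1
    current = ObjectSpace.each_object { }
    break if current >= previous || passes >= max_passes
    previous = current
  end
  passes
end

garbage = Array.new(50_000) { "x" * 64 }
garbage = nil
puts "GC passes needed: #{gc_until_stable}"
```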

Cheers,
Bob

···

On Aug 26, 2006, at 5:12 AM, ara.t.howard@noaa.gov wrote:

in any case, i'd carefully examine your tests (or the rails code if that is
indeed what it's modeled after) to make sure that they test
Mutex/Sync/Thread/Ruby and not your os virtual memory system and look closely
at the results again - like i said, i have had issues with sync.rb.

the point here is that it is probably the code in question and not Mutex per
se that was causing your process to grow in vmsize.

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

ara.t.howard@noaa.gov writes:

in any case a careful script which allocated memory in a thread,
waits for all threads to finish allocation, checks memory, and then
kills all threads before checking again shows some suprising results
which you should read carefully:

There's a problem with how this script measures its memory usage - it
measures %MEM, which is proportional to RSS. However, RSS is just how
much of the program's memory is *resident*; i.e. not swapped out.

To look at how much total memory is used, the variable you want is
VSZ, which isn't available in a "ps v" but is in a "ps u". I modified
your script to look at the %MEM, VSZ, and RSS numbers from "ps u" and
also to run GC.start a variable number of times.

Here's what I found, for running GC.start 4 times for mutex and sync,
on my system. VSZ is the middle of the three numbers; I really should
prune away the other two measures, as they are just indicative of how
actively my machine feels like swapping at that moment:
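
The ps-based sampling described above can be sketched like this; `mem_stats` is an illustrative helper, and the field positions assume the usual procps column order for "ps u" (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND):

```ruby
# Sample %MEM, VSZ (total virtual size) and RSS (resident pages) for the
# current process from "ps u" output.
def mem_stats
  line = `ps u -p #{Process.pid}`.split(/\n/).last
  pct, vsz, rss = line.strip.split(/\s+/).values_at(3, 4, 5)
  { :pct_mem => pct.to_f, :vsz_kb => vsz.to_i, :rss_kb => rss.to_i }
end

s = mem_stats
puts "%MEM=#{s[:pct_mem]} VSZ=#{s[:vsz_kb]}kB RSS=#{s[:rss_kb]}kB"
```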

using: Mutex
n: 400
gctimes: 4
iter: 0
with 400 threads holding memory : 83.9 432136 262688
with 0 threads holding memory : 4.0 23792 12760
iter: 1
with 400 threads holding memory : 86.6 435044 271356
with 0 threads holding memory : 4.3 29808 13584
iter: 2
with 400 threads holding memory : 82.6 435060 258664
with 0 threads holding memory : 4.3 35872 13564
iter: 3
with 400 threads holding memory : 92.1 435240 288316
with 0 threads holding memory : 4.8 39052 15300
iter: 4
with 400 threads holding memory : 91.9 435012 287916
with 0 threads holding memory : 4.4 43252 13944
iter: 5
with 400 threads holding memory : 92.2 435060 288920
with 0 threads holding memory : 12.2 48440 38276
iter: 6
with 400 threads holding memory : 90.0 435000 282032
with 0 threads holding memory : 8.3 53516 26236
iter: 7
with 400 threads holding memory : 90.0 435012 281944
with 0 threads holding memory : 10.6 55500 33332
iter: 8
with 400 threads holding memory : 91.9 435108 287776
with 0 threads holding memory : 4.9 59632 15496
iter: 9
with 400 threads holding memory : 88.4 435024 276876
with 0 threads holding memory : 11.4 64828 35760

using: Sync
n: 400
gctimes: 4
iter: 0
with 400 threads holding memory : 87.1 432216 272720
with 0 threads holding memory : 4.4 23980 13864
iter: 1
with 400 threads holding memory : 85.4 435276 267484
with 0 threads holding memory : 4.4 31196 13880
iter: 2
with 400 threads holding memory : 85.0 435400 266116
with 0 threads holding memory : 4.2 37564 13280
iter: 3
with 400 threads holding memory : 89.8 435408 281320
with 0 threads holding memory : 4.1 43568 12876
iter: 4
with 400 threads holding memory : 92.1 435400 288608
with 0 threads holding memory : 4.4 47796 13988
iter: 5
with 400 threads holding memory : 91.5 435348 286668
with 0 threads holding memory : 8.6 51804 26984
iter: 6
with 400 threads holding memory : 91.3 435312 285848
with 0 threads holding memory : 4.4 56664 13828
iter: 7
with 400 threads holding memory : 92.0 435360 288156
with 0 threads holding memory : 9.1 61024 28788
iter: 8
with 400 threads holding memory : 91.9 435316 287780
with 0 threads holding memory : 11.2 63508 35184
iter: 9
with 400 threads holding memory : 91.2 435364 285672
with 0 threads holding memory : 11.0 67260 34468

What this tells me is that there is a memory leak, but it appears that
both sync and mutex suffer from it. It also appears to be, at this
point, relatively minor. I haven't looked at modifying Zed Shaw's
script to have it record its own numbers, so I can't verify what he
saw. It may make a big difference that Zed's script allocates memory
by building up a big array, whereas yours does so by making a big
string - I could easily believe that there's something subtle hidden
in the interaction of Array and Mutex, for example.

···

--
s=%q( Daniel Martin -- martin@snowplow.org
       puts "s=%q(#{s})",s.map{|i|i}[1] )
       puts "s=%q(#{s})",s.map{|i|i}[1]

Zed Shaw wrote:

> Howdy Folks,
>
> This release is after painstaking analysis of a memory leak that was
> reported by Bradley Taylor, reduced by myself, and then fixed after much
> work. You should all thank Bradley for finding the bizarre fix.

if you are really serious about fixing your leak i suggest you re-work your
tests. as i mentioned before they have several race conditions, not least of
which that they both start a random number of threads, not 1000 as the code
suggests (you can easily confirm by printing out the number of times the
thread init loop executes). further, sync.rb is the single ruby lib i've had
memory issues with on production systems. i have never managed to figure out
why that is...

Ara, this is uncool. Seriously man, don't e-mail me personally and then
e-mail the list the exact same e-mail. Pick one and fight your battle
there.

Maybe he simply cc'd you on the reply to the list? I doubt it was a
personal attack. At worst, probably a mistake. He's trying to be
helpful.

As I mentioned to you before, the evidence shows you are wrong. Sure
you've cooked up a script that has a memory leak with Sync, but that
script doesn't operate the way Mongrel does. The sample I developed
does operate the way Mongrel does. It's called a "bug reduction". I'm
not going to test the leak in Mongrel with a reduction that doesn't
simulate Mongrel.

I think the point here is that it isn't necessarily Sync/Mutex/Ruby, but
the way you use it. He managed to show Mutex leaking, you managed to
show Sync leaking. And you each managed to show neither of them
leaking.

With graphs even Ara! Graphs! We ran these tests for 30-40 minutes with
hundreds of thousands of threads being cycled during the tests.

Graphs don't make it true.

Not to mention about 8 other people switching to Sync report their leaks
fixed, our own test script showing it's fixed, Mongrel tests showing it's
fixed, several other frameworks showing it, and you can't argue with the
evidence.

This has only been tested for a day. There's still huge possibilities
that what you're seeing isn't what you think you're seeing. We've all
made that mistake more than once.

If your script has a leak then fine, just don't do that. Ultimately though
the ruby lang guys need to fix this because either way, there's leaks. For
now, Mongrel is not leaking and I'm happy with that.

Now, I'd appreciate it if you'd maybe spend your energy looking into the
ruby source to find this leak rather than bothering me about it.

Thank you.

He's doing the same thing you are. Trying to expose the leak for what
it is. You can't go telling him to 'fix ruby' rather than bother the
list when you've done the same thing he has, just a day earlier. He's
submitting the information he has collected so others can try to figure
this out as well.

None of this is a personal attack. Nobody has said you didn't fix a
leak. There simply may be more going on here than anybody has seen yet.

···

On Sat, 2006-08-26 at 18:12 +0900, ara.t.howard@noaa.gov wrote:

On Sat, 26 Aug 2006, Zed Shaw wrote:

--
Posted via http://www.ruby-forum.com/.

Well I tried your test on OS X. The Sync had no problem; the mutex showed the memory growth (though it eventually cleaned itself up -- fifth iteration, I think). I modified your test to create exactly 1000 threads and call GC three times at the end, and things were better, i.e. it released its memory more quickly than without, but still not good. I ended up with:

       GC.start
       `sync; sync; sync`
       sleep 1
       GC.start
       `sync; sync; sync`
       sleep 1
       GC.start
       `sync; sync; sync`
       sleep 1
       GC.start
       `sync; sync; sync`
       sleep 1

and this made a bigger difference. The memory usage was much more tightly bound.

(And yes, the three calls to sync are also on purpose... in the late 70s through the 80s, calling sync once didn't guarantee anything, you had to call it a few times, three generally worked... I don't know the current situation because it is easy enough to type sync;sync;sync (well, in truth, I usually alias sync to the three calls))

But of course, the point is that despite appearances there is likely no memory leak at all on OS X, just some kind of long term cycle of process resource utilisation -- this is a complex situation, Ruby GC, process resource utilisation/optimisation, and system optimisation all interacting. Who knows what's actually going on.

So.

Cheers,
Bob

···

On Aug 26, 2006, at 10:22 AM, Bob Hutchison wrote:



Ara, this is uncool. Seriously man, don't e-mail me personally and then
e-mail the list the exact same e-mail. Pick one and fight your battle
there.

hi zed - calm down a minute! ;-) i have no idea what my mailer did but
suspect it has something to do with all the recipients and the fact that it
has to switch roles to post to rails/ruby-talk. in any case, it wasn't by
design.

As I mentioned to you before, the evidence shows you are wrong.

but my evidence shows you are! i'm merely pointing out it's unclear at this
point.

Sure you've cooked up a script that has a memory leak with Sync, but that
script doesn't operate the way Mongrel does. The sample I developed does
operate the way Mongrel does. It's called a "bug reduction".

i think it's 'bug production' though - for instance, your loop that starts
'1000' threads starts a random number of them, your gc call, as bob showed,
doesn't seem to be quite enough (not that it's your fault). and there are
race conditions/edge cases with the os memory subsystem.

I'm not going to test the leak in Mongrel with a reduction that doesn't
simulate Mongrel.

but you could change the code in mongrel, right!? bear in mind you have not
just been talking about changing your code, but advocating changing ruby and
all ruby software - you have to back up that sort of thing!

Also, as I said before, this shows the leak:

http://pastie.caboo.se/10194

but it doesn't show a leak on my system, on bob's system, or on ed's system?
just tons of memory usage. are you ignoring the fact that no one else who has
run it shows a leak either? how about the fact that electric fence, mpatrol,
and mtrace also show no leak? how about the fact that very few people have
had issues with mutex and memory?

And this script, with just 3 lines changed to use Sync shows it's fixed:

http://pastie.caboo.se/10317

With graphs even Ara! Graphs! We ran these tests for 30-40 minutes with
hundreds of thousands of threads being cycled during the tests.

well, you can graph my numbers too and they'll show a sawtooth for mutex and
linearly increasing line for sync, at least for the number of iterations i
posted. which is, of course, exactly the opposite of what you showed. btw,
i'd be glad if both of us could be proven wrong.

Not to mention about 8 other people switching to Sync report their leaks
fixed, our own test script showing it's fixed, Mongrel tests showing it's
fixed, several other frameworks showing it, and you can't argue with the
evidence.

i'm sure that's not true. what has happened is that they've shown a reduction
in memory usage. i don't think anyone else has even been able to confirm
your code leaked - just that it grew large - eg had a bad memory
allocation/free cycle. read bob's post for more info.

If your script has a leak then fine, just don't do that.

exactly my point too!

Ultimately though the ruby lang guys need to fix this because either way,
there's leaks.

here i disagree. i'm not sure either of us has shown a leak in __ruby__. we
may have demonstrated a bad memory usage pattern for certain memory subsystems
- this is not far-fetched; read comp.unix.programmer and you can see thousands
of threads on this subject. it's a very frustrating one for many people.

For now, Mongrel is not leaking and I'm happy with that.

granted. however - screaming that everyone should re-write their code which
has used anything other than sync is just a bit extreme considering no one
else can show a leak - only badly behaving scripts - isn't it!?

Now, I'd appreciate it if you'd maybe spend your energy looking into the
ruby source to find this leak rather than bothering me about it.

the whole reason i'm posting is that i too have had issues with mutex/sync etc
and not solved it to my satisfaction. if you are satisfied fixing a heisenbug
and calling it good then fine. however the other reason i was posting is that
i've fixed hundreds of bugs in long running near realtime systems where the
coder wrote something they didn't really understand that seemed to 'fix' the
problem, only to create new ones and release a program that's even more
difficult to debug. it's very surprising how many pieces of code work out of
luck but really shouldn't - if you're ok with that then fine. i just don't
happen to be.

regards.

-a

···

On Sat, 26 Aug 2006, Zed Shaw wrote:

Daniel Martin wrote:

There's a problem with how this script measures its memory usage - it
measures %MEM, which is proportional to RSS. However, RSS is just how
much of the program's memory is *resident*; i.e. not swapped out.

You're assuming here that the OS is accurately counting RSS and that the
user-space tool is correctly picking it up from the OS. That's not
necessarily a given in Linux. :-) What's worse, no two programmers have
the same view of how one should export performance counters in /proc,
nor is there often much documentation except the source. "vi
/usr/src/linux/fs/proc/*.c" is your friend. :-)

To look at how much total memory is used, the variable you want is
VSZ, which isn't available in a "ps v" but is in a "ps u". I modified
your script to look at the %MEM, VSZ, and RSS numbers from "ps u" and
also to run GC.start a variable number of times.

Yeah, but ... threads share the same virtual address space, making the
"%MEM" more or less meaningless. To use the J word again, a Java runtime
can have ten or twenty or more threads, all sharing the same address
space and all having the same VSZ. They get counted more than once.
Ugly. Basically, if you want accurate memory usage information from
Linux, you need to read from /dev/kmem and walk the tables yourself. :-(
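
Short of /dev/kmem, one lighter-weight option on Linux is reading VmSize and VmRSS straight from /proc/self/status, which at least avoids parsing ps output; a sketch (`proc_status_mem` is a made-up helper, Linux-only, returning nil elsewhere):

```ruby
# Read VmSize and VmRSS (both in kB) for the current process directly
# from /proc. Returns nil on systems without a readable /proc.
def proc_status_mem
  path = "/proc/self/status"
  return nil unless File.readable?(path)
  fields = {}
  File.foreach(path) do |line|
    fields[$1] = $2.to_i if line =~ /\A(VmSize|VmRSS):\s+(\d+)\s+kB/
  end
  fields
end

p proc_status_mem
```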

good catch! i'm tweaking my script now....

-a

···

On Sun, 27 Aug 2006, Daniel Martin wrote:


[snip snip snip]

And nobody has yet answered my questions about the platform. The fact
that two different people see different behavior of the Ruby interpreter
in different environments makes me wonder if there aren't some
underlying race conditions or similar platform gotchas at work here, in
addition to a Ruby problem and a Mongrel workaround for a Ruby problem.

So ... Zed ... how many processors do you have, how much RAM and what OS
are you running?

Ara ... how many processors do you have, how much RAM and what OS are
you running?

I have been playing with this all morning.

I added a line to the script to print the object count when it prints how many threads are still spinning:

oc = 0; ObjectSpace.each_object {oc += 1}; STDERR.puts "OC: #{oc}"

And I changed it so that it creates a specific number of threads, like other people have, and I played both with keeping the explicit GC.start and removing it.

I altered the array creation line in the thread to use memory faster:

30000.times {|i| ta << "#{i}" }

Making a string out of that number is a much hungrier operation that, much more quickly, seems to show Mutex "leaking".

Finally, I added a three second sleep between when one set of 1000 threads finishes and when the next is started.

Other than that, I left it alone.

One version of the code I have been using is here:

http://rafb.net/paste/results/NbwVMo51.html

Then I sat here and ran the Mutex version and the Sync version over and over again.

The Mutex version ends up with many more objects sitting in ObjectSpace than the Sync based version has, and while the Sync based version's object counts are pretty consistent, the Mutex version's are volatile, jumping up and down wildly, with a trend towards an increasing average count.

However, I don't think that's the whole story here. Sync, because it is a lot more complex, is also a lot slower than Mutex. This speed issue is part of the key to the appearance that Mutex leaks while Sync does not.

If I add one line to the test code that uses a Mutex, I get very different results.

select(nil,nil,nil,0.025)

The number of objects floating in objectspace at any given time, while still a bit more volatile than with the Sync version, is now close to the same overall numbers that I see in the Sync version, and does not show a continually upward trending average count.

In addition, RAM usage at the OS level does not continue to increase.

This strongly suggests to me that the reported leak in Mutex is no leak at all, and that impression is bolstered by having looked at the Mutex code -- it's very simple and itself creates very few objects.

Rather, in the test cases, using a Mutex just lets the threads build objects faster than the GC thread can clean them up. Slowing the threads down, either by using Sync, or by sticking a micro delay into them, gives the GC enough time to keep pace, and this is the only way that I have found to show any sort of leak with Mutex.
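Concretely, the one-line change is a sub-frame pause inside each worker thread. The surrounding loop here is my own minimal reconstruction, not the exact test code:

```ruby
# Kirk's micro-delay, in context (my minimal reconstruction; on
# Ruby 1.8, Mutex additionally needs `require 'thread'`).
guard = Mutex.new

threads = (1..20).map do
  Thread.new do
    ta = []
    guard.synchronize do
      1_000.times { |i| ta << "#{i}" }
    end
    # The added line: a ~25ms select-based pause that yields the
    # timeslice, giving the GC thread room to keep pace with allocation.
    select(nil, nil, nil, 0.025)
  end
end
threads.each(&:join)
STDERR.puts "all #{threads.size} threads finished"
```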

Kirk Haines

···

On Sun, 27 Aug 2006 ara.t.howard@noaa.gov wrote:

but it doesn't show a leak on my system, on bob's system, or on ed's system?
just tons of memory usage. are you ignoring the fact that no one else who has
run it shows a leak either? how about the fact that electric fence, mpatrol,
and mtrace also show no leak? how about the fact that very few people have
had issues with mutex and memory?

Bob Hutchison wrote:

[snip]

Well I tried your test on OS X. The Sync had no problem, the mutex
showed the memory growth (though it eventually (fifth iteration I think)
cleaned itself up). I modified your test to create exactly 1000 threads
and call GC three times at the end, things were better, i.e. it released
its memory more quickly than without, but still not good. I ended up with:

      GC.start
      `sync; sync; sync`
      sleep 1
      GC.start
      `sync; sync; sync`
      sleep 1
      GC.start
      `sync; sync; sync`
      sleep 1
      GC.start
      `sync; sync; sync`
      sleep 1

and this made a bigger difference. The memory usage was much more
tightly bound.

(And yes, the three calls to sync are also on purpose... in the late 70s
through the 80s, calling sync once didn't guarantee anything, you had to
call it a few times, three generally worked... I don't know the current
situation because it is easy enough to type sync;sync;sync (well, in
truth, I usually alias sync to the three calls))

But of course, the point is that despite appearances there is likely no
memory leak at all on OS X, just some kind of long term cycle of process
resource utilisation -- this is a complex situation, Ruby GC, process
resource utilisation/optimisation, and system optimisation all
interacting. Who knows what's actually going on.

Finally someone with some platform details!! <vbg>

OK ... here's my take

1. The OS "three-sync" thing is, as you pointed out, a throwback to days
when you needed to do that sort of thing. It's superstition.

2. IIRC OS X is a BSD-type kernel rather than a Linux kernel, so at
least we know a different memory manager still needs "help" to deal with
this kind of application.

3. Typing a single "sync" into a *Linux* kernel when there's a lot of
"stuff" built up in RAM is a very bad idea. It will force the system
into an I/O bound mode and lock everybody out until the kernel has
cleaned up after itself. Either OS X has a better memory and I/O manager
than Linux, or you didn't have a lot of "stuff" built up from this
simple test. The second and third syncs are both unnecessary and
harmless. :-)

4. Deleting references to no-longer-needed objects and then explicitly
calling the garbage collector has a longer history and tradition than
UNIX. It is "standard software engineering practice" in any environment
that has garbage collection. Just last week, I had to stick such a call
into an R program to keep it from crashing (on a Windows machine with 2
GB of RAM!)
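In Ruby terms, the practice Ed describes looks like this (a generic illustration of the idiom, not code from the thread):

```ruby
# Standard practice in a garbage-collected environment: drop the
# references first, then explicitly ask the collector to run.
big = Array.new(100_000) { |i| "#{i}" }   # large transient structure
checksum = big.size                        # ... use it for something ...
big = nil        # delete the reference so the object graph is collectable
GC.start         # then explicitly invoke the garbage collector
```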

For the software engineering philosophers on the list, what's the
difference between a language that forces the engineer to explicitly
manage dynamic memory allocation and de-allocation, and one that
supposedly relieves the engineer from that need -- until you crash in a
production system that worked on smaller test cases a couple of months
ago? :-)

5. Can somebody run these Ruby leak tests/demos on a Windows XP or 2003
Server with multiple processors? I'm really curious what happens.

"M. Edward (Ed) Borasky" <znmeb@cesmail.net> writes:

To look at how much total memory is used, the variable you want is
VSZ, which isn't available in a "ps v" but is in a "ps u". I modified
your script to look at the %MEM, VSZ, and RSS numbers from "ps u" and
also to run GC.start a variable number of times.

Yeah, but ... threads share the same virtual address space, making the
"%MEM" more or less meaningless. To use the J word again, a Java runtime
can have ten or twenty or more threads, all sharing the same address
space and all having the same VSZ. They get counted more than once.
Ugly. Basically, if you want accurate memory usage information from
Linux, you need to read from /dev/kmem and walk the tables yourself. :-(

You're telling me I shouldn't be using %MEM. I already knew that, and
that was why I was making this post. I'm basing my conclusions off VSZ.

VSZ is the total amount of virtual memory allocated by the OS to the
application, and is also the number that Linux uses when deciding what
to kill. Since this began with Mongrel processes being killed, this
is the relevant number. As ruby threads aren't seen as threads by the
underlying OS, what you see with java (where the OS does see the java
threads) really isn't relevant. As far as the OS is concerned, ruby
is just one process like other processes - ruby threads don't, for
example, take advantage of SMP machines and execute in parallel on
separate CPUs.
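For what it's worth, a Ruby process can sample its own VSZ the same way the script does, by shelling out to ps. This helper is mine, not code from the thread, and assumes a Unix-like ps that supports -o and -p:

```ruby
# Sample a process's VSZ (total virtual size, in KB) via ps.
# Hypothetical helper for illustration; assumes a Unix-like ps.
def vsz_kb(pid = Process.pid)
  `ps -o vsz= -p #{pid}`.strip.to_i
end

STDERR.puts "VSZ: #{vsz_kb} KB"
```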

···

--
s=%q( Daniel Martin -- martin@snowplow.org
       puts "s=%q(#{s})",s.map{|i|i}[1] )
       puts "s=%q(#{s})",s.map{|i|i}[1]

fortytwo :~ > cat /etc/redhat-release; uname -srm; cat /proc/cpuinfo; cat /proc/meminfo
Fedora Core release 3 (Heidelberg)
Linux 2.6.12-1.1381_FC3 i686
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) processor
stepping : 2
cpu MHz : 1201.740
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 2375.68

MemTotal: 483260 kB
MemFree: 23380 kB
Buffers: 27868 kB
Cached: 273552 kB
SwapCached: 70392 kB
Active: 294380 kB
Inactive: 128220 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 483260 kB
LowFree: 23380 kB
SwapTotal: 983032 kB
SwapFree: 796192 kB
Dirty: 40 kB
Writeback: 0 kB
Mapped: 150804 kB
Slab: 28032 kB
CommitLimit: 1224660 kB
Committed_AS: 599536 kB
PageTables: 3764 kB
VmallocTotal: 540664 kB
VmallocUsed: 3600 kB
VmallocChunk: 536872 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB

cheers.

-a

···

On Sun, 27 Aug 2006, M. Edward (Ed) Borasky wrote:

[snip snip snip]

And nobody has yet answered my questions about the platform. The fact
that two different people see different behavior of the Ruby interpreter
in different environments makes me wonder if there aren't some
underlying race conditions or similar platform gotchas at work here, in
addition to a Ruby problem and a Mongrel workaround for a Ruby problem.

So ... Zed ... how many processors do you have, how much RAM and what OS
are you running?

Ara ... how many processors do you have, how much RAM and what OS are
you running?

--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama

System Info:
OS Name Microsoft Windows XP Professional
Version 5.1.2600 Service Pack 2 Build 2600
OS Manufacturer Microsoft Corporation
System Name DADDY
System Manufacturer Dell Inc.
System Model Precision WorkStation 490
System Type X86-based PC
Processor x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2327 Mhz
BIOS Version/Date Dell Inc. A01, 6/30/2006
SMBIOS Version 2.3
Windows Directory C:\WINDOWS
System Directory C:\WINDOWS\system32
Boot Device \Device\HarddiskVolume1
Locale United States
Hardware Abstraction Layer Version = "5.1.2600.2180 (xpsp_sp2_rtm.040803-2158)"
User Name DADDY\patrick
Time Zone Eastern Standard Time
Total Physical Memory 2,048.00 MB
Available Physical Memory 598.48 MB
Total Virtual Memory 2.00 GB
Available Virtual Memory 1.96 GB
Page File Space 3.85 GB
Page File C:\pagefile.sys

Ruby 1.8.4

I see slow growth in the virtual bytes counter: at the first
iteration it is ~32 MB, but every few iterations I see it kick up a
notch, and after 20+ iterations it is now ~42 MB. Just as
interestingly, the working set size of the app is rock solid at just
under 12 MB, with no visible growth.
iteration is ~32Meg, but every few iterations I see it kick up a notch
after 20+ iterations it is now ~42Meg. Now just as interestingly the
working set size on the app is rock solid just under 12Meg with no
visible growth.

Let me know if you need any more information
pth

···

On 8/26/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:

5. Can somebody run these Ruby leak tests/demos on a Windows XP or 2003
Server with multiple processors? I'm really curious what happens.

Good question, Ed. From what I know it was tested all over by most people
who use it. The leak was tested and confirmed on the following that I
know of:

* Ubuntu Linux on a dual-core 1.83GHz machine
* Mac OSX on Intel, not sure about PPC.
* FreeBSD (not sure which version or what processors)
* Win32 -- And it appears that Mutex works but Sync doesn't on Win32 so
we're basically screwed.

···

On Sun, 2006-08-27 at 01:18 +0900, M. Edward (Ed) Borasky wrote:

[snip snip snip]

And nobody has yet answered my questions about the platform. The fact
that two different people see different behavior of the Ruby interpreter
in different environments makes me wonder if there aren't some
underlying race conditions or similar platform gotchas at work here, in
addition to a Ruby problem and a Mongrel workaround for a Ruby problem.

So ... Zed ... how many processors do you have, how much RAM and what OS
are you running?

Ara ... how many processors do you have, how much RAM and what OS are
you running?

--
Zed A. Shaw


http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

For the software engineering philosophers on the list, what's the
difference between a language that forces the engineer to explicitly
manage dynamic memory allocation and de-allocation, and one that
supposedly relieves the engineer from that need -- until you crash in a
production system that worked on smaller test cases a couple of months
ago? :-)

This is exactly the problem I'm complaining about. It's not that Ruby's
GC isn't collecting the ram fast enough or anything like that. If Ruby
runs out or I call GC.start then dammit clear the dead objects. It's
not a suggestion, it's a command. Collect the RAM. Don't do it lazy
until my OS or process crashes.

So, for me, despite everyone talking about timeouts and stuff (and I
know people are just trying to find out why Ruby does this) it's all
useless since I still have a program crashing and I'm now looking at a
total redesign to work around a GC.

And yet, all I really need is a single option I can pass to ruby that
says "never go above X" instead of this pseudo-leak-until-crash crap.

5. Can somebody run these Ruby leak tests/demos on a Windows XP or 2003
Server with multiple processors? I'm really curious what happens.

Luis Lavena ran our tests on Win32 and found that Mutex works but Sync
doesn't, so I'm back to square one. I'm probably going to do a fast
patch so that people can just pick either and see if the leak goes away.

···

On Sun, 2006-08-27 at 12:04 +0900, M. Edward (Ed) Borasky wrote:

--
Zed A. Shaw


http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

Indeed.

So, instead of just counting the live objects, I decided to take a look at what was there.

Guess what I found.

Using Ara's test and Mutex, I was getting a kind of oscillation of memory use. Not a nice profile at all. Having a look at the objects still in ObjectSpace, what do I see? Why, the Thread objects! I was running Ara's test with 100 threads and there were 101 Thread objects in ObjectSpace (well, sometimes 103).

So I set the guard to nil before running the cleanup step, and created a new mutex at the beginning of the iteration. Now only one Thread in the ObjectSpace. And guess what? A nice *constant* memory utilisation (according to VSZ).

The Mutex was somehow holding the Thread object in memory and that was holding other stuff.
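As a sketch, the change amounts to this (my reconstruction of the pattern described, not the actual diff):

```ruby
# Reconstruction of the fix: make a fresh Mutex each iteration and
# nil the old reference before cleanup, so a long-lived Mutex can't
# pin dead Thread objects (and everything they reference) in memory.
thread_counts = []

3.times do
  guard = Mutex.new                  # fresh mutex each iteration ...
  threads = (1..10).map do
    Thread.new do
      ta = []
      guard.synchronize { 1_000.times { |i| ta << "#{i}" } }
    end
  end
  threads.each(&:join)

  guard = nil                        # ... and drop the reference before cleanup
  GC.start
  live = ObjectSpace.each_object(Thread).count
  STDERR.puts "live Thread objects: #{live}"
  thread_counts << live
end
```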

You're right, deleting references to no-longer-needed objects is standard. I suppose it is also standard to make sure you get *all* of them :-)

Cheers,
Bob

···

On Aug 26, 2006, at 11:04 PM, M. Edward (Ed) Borasky wrote:

4. Deleting references to no-longer-needed objects and then explicitly
calling the garbage collector has a longer history and tradition than
UNIX. It is "standard software engineering practice" in any environment
that has garbage collection. Just last week, I had to stick such a call
into an R program to keep it from crashing (on a Windows machine with 2
GB of RAM!)

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

Well, I certainly hope it is superstition. But it is kind of like those fancy new stove tops that are not supposed to get hot enough to burn you... *you* try it out, I'll keep my hands away :-)

FWIW, I also still write 0 == N rather than N == 0 all the time. Sometimes the burn was so bad that it is psychologically easier to just keep up the avoidance behaviour :-)

Cheers,
Bob

···

On Aug 26, 2006, at 11:04 PM, M. Edward (Ed) Borasky wrote:

1. The OS "three-sync" thing is, as you pointed out, a throwback to days
when you needed to do that sort of thing. It's superstition.

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>