Debugging memory use and GC

What is a good way to find out which objects are not being GC'd? I am
seeing a strange pattern I can't figure out. The app handles
large files and uses up to 150 MB or so of memory; when I
call GC.start it goes back down to around 8 MB. But after a few cycles,
memory stops being reclaimed.

Chris

Run it under a debugger, set a breakpoint after GC at the point where you
expect the anomaly to occur, and inspect ObjectSpace?

David Vallner

Print out all objects to a file before the leak and after the leak,
sorted by class and then by object_id, and diff the two files.
If you suspect particular objects, add a finalizer to them using ObjectSpace.define_finalizer and put some tracing in it.
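A rough Ruby version of this before/after comparison, as a sketch: it assumes MRI, where ObjectSpace.each_object is enabled, and the helper names object_snapshot and diff_snapshots are made up for illustration. It compares per-class instance counts rather than diffing full dumps, which is often enough to spot the class that is piling up.

```ruby
# Hypothetical helpers (not a standard API): count live objects per class
# before and after a suspected leak, then report the classes that grew.
def object_snapshot
  GC.start # reclaim whatever is already collectable before counting
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |o| counts[o.class] += 1 }
  counts
end

def diff_snapshots(before, after)
  after.each_with_object({}) do |(klass, n), grown|
    delta = n - before[klass]
    grown[klass] = delta if delta > 0
  end
end

# Usage: a deliberately leaky global array.
class Leaky; end
$held = []
before = object_snapshot
1000.times { $held << Leaky.new }
after = object_snapshot
p diff_snapshots(before, after)[Leaky]  # => 1000
```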

--
Darshan Patil

"The trouble with work is that it interferes with living." - Peter Mckill 1968

http://scattrbrain.com

Of course, what you really want to know is not just what isn't getting
GCed but WHY.

This can be a difficult problem. You really want to find the
reference paths from the root objects.

Some garbage-collected languages, like Smalltalk, have a method on Object
that returns a list of everything that references the receiver. I haven't
seen such a facility in Ruby. The problem with such a method is that
calling it creates additional references to the referring objects,
and so on. This kind of Heisenberg effect makes it difficult to build a
tool that finds reference paths.
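In Ruby you can approximate a one-level "who references this?" query by scanning ObjectSpace, as in the sketch below. The name direct_referrers is hypothetical, and it only inspects instance variables, Array elements and Hash keys/values; locals, globals, constants, Struct members and references held by C code are invisible to it. The array it returns is itself a new reference to every referrer, which is exactly the Heisenberg effect described above.

```ruby
# Hypothetical helper: objects that directly hold `target` in an instance
# variable, an Array slot, or a Hash key/value (MRI, ObjectSpace enabled).
def direct_referrers(target)
  found = []
  ObjectSpace.each_object(Object) do |o|
    next if o.equal?(target) || o.equal?(found)
    held =
      case o
      when Array then o.any? { |e| e.equal?(target) }
      when Hash  then o.any? { |k, v| k.equal?(target) || v.equal?(target) }
      else false
      end
    held ||= o.instance_variables.any? { |iv| o.instance_variable_get(iv).equal?(target) }
    found << o if held
  end
  found
end

# Usage
class Holder
  def initialize(x)
    @x = x
  end
end

t = "target"
h = Holder.new(t)
arr = [1, t]
refs = direct_referrers(t)
p refs.any? { |r| r.equal?(h) }    # => true
p refs.any? { |r| r.equal?(arr) }  # => true
```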

On 10/25/06, snacktime <snacktime@gmail.com> wrote:

What is a good way to find out which objects are not being GC'd? I am
seeing a strange pattern I can't figure out. The app handles
large files and uses up to 150 MB or so of memory; when I
call GC.start it goes back down to around 8 MB. But after a few cycles,
memory stops being reclaimed.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Of course, adding a finalizer won't be much help in debugging why an
object is not being GCed, since the finalizer of an object that is never
collected will never be invoked.
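A small demonstration of that point, assuming MRI: the finalizer is registered with ObjectSpace.define_finalizer, and a global keeps the object reachable, so the finalizer does not run after GC.start (Ruby only runs such leftover finalizers at interpreter exit). finalizer_for is a hypothetical helper; building the proc in a separate method keeps it from capturing the object, which would keep the object alive on its own.

```ruby
$finalized = []

# Hypothetical helper: build the finalizer proc where it cannot see the
# object, so the proc itself does not hold a reference to it.
def finalizer_for(label)
  proc { |_object_id| $finalized << label }
end

retained = Object.new
ObjectSpace.define_finalizer(retained, finalizer_for("retained"))
$keep = retained # a live global reference

GC.start
p $finalized.include?("retained") # => false: still reachable, never finalized
```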

On 10/26/06, Darshan Patil <dapatil@nerdshack.com> wrote:

If you suspect particular objects, add a finalizer to them using
ObjectSpace.define_finalizer and put some tracing in it.

--
Rick DeNatale

it would be expensive, but i wonder if dumping the objects in objectspace
might be useful - since Marshal.dump already follows all references it seems
like a custom _dump method on objects, which would add them to a tree,
might do the trick. in other words, if you dumped an object with a global tree
in context, then all objects being dumped as a result would add themselves to
this tree. after the dump, you simply keep a copy of the tree...

just a thought...

-a

On Fri, 27 Oct 2006, Rick DeNatale wrote:

Of course, what you really want to know is not just what isn't getting
GCed but WHY.

This can be a difficult problem. You really want to find the
reference paths from the root objects.

Some garbage-collected languages, like Smalltalk, have a method on Object
that returns a list of everything that references the receiver. I haven't
seen such a facility in Ruby. The problem with such a method is that
calling it creates additional references to the referring objects,
and so on. This kind of Heisenberg effect makes it difficult to build a
tool that finds reference paths.

--
my religion is very simple. my religion is kindness. -- the dalai lama

Rick DeNatale wrote:

Of course, what you really want to know is not just what isn't getting
GCed but WHY.

This can be a difficult problem. You really want to find the
reference paths from the root objects.

Some garbage-collected languages, like Smalltalk, have a method on Object
that returns a list of everything that references the receiver. I haven't
seen such a facility in Ruby. The problem with such a method is that
calling it creates additional references to the referring objects,
and so on. This kind of Heisenberg effect makes it difficult to build a
tool that finds reference paths.

I wrote a patch for ruby 1.6/1.7 that would search for all ways of reaching an object from root objects in objectspace:

http://redshift.sourceforge.net/debugging-GC/

Usage info at:

http://redshift.sourceforge.net/debugging-GC/gc-patch.txt

Nobu updated it for CVS as of 12 Aug 2005:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/151854

I've found this to be useful only once or twice, but in those rare cases it can be very helpful...
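Joel's patch works inside the interpreter. For comparison, here is a pure-Ruby approximation of the same idea (not his patch): a breadth-first search from a root you supply, following only instance variables, Array elements and Hash keys/values. reference_path and children_of are hypothetical names, and real roots such as globals, constants and the C stack are not covered.

```ruby
require "set"

# Hypothetical helper: breadth-first search from `root` for `target`.
# Returns a list of [label, object] steps, or nil if no path was found.
def reference_path(root, target)
  seen = Set.new([root.object_id])
  queue = [[root, [[:root, root]]]]
  until queue.empty?
    obj, path = queue.shift
    return path if obj.equal?(target)
    children_of(obj).each do |label, child|
      next if seen.include?(child.object_id)
      seen << child.object_id
      queue << [child, path + [[label, child]]]
    end
  end
  nil
end

# Outgoing references of one object, each tagged with how it is held.
def children_of(obj)
  kids = []
  case obj
  when Array then obj.each_with_index { |e, i| kids << ["[#{i}]", e] }
  when Hash
    obj.each { |k, v| kids << ["key #{k.inspect}", k] << ["[#{k.inspect}]", v] }
  end
  obj.instance_variables.each { |iv| kids << [iv.to_s, obj.instance_variable_get(iv)] }
  kids
end

# Usage
class Cache
  attr_reader :items
  def initialize
    @items = []
  end
end

c = Cache.new
leaked = "big buffer"
c.items << :x << leaked
p reference_path({ cache: c }, leaked).map(&:first)
# => [:root, "[:cache]", "@items", "[1]"]
```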

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Not sure. What I was suggesting is that the real goal is to somehow
root out the reference path (or paths) keeping an object from
being reclaimed, without creating additional references.

Another issue is that ObjectSpace.each_object can give you objects
which aren't really alive:

rick@frodo:/public/rubyscripts$ cat gctest.rb
class Foo
  def initialize
    @iv = "bar"
  end
end

def make_foo
  p Foo.new
end

GC.enable

make_foo

ObjectSpace.each_object {|f| p f if Foo === f }
ObjectSpace.garbage_collect
puts "after gc"
ObjectSpace.each_object {|f| p f if Foo === f }
puts "done"
rick@frodo:/public/rubyscripts$ ruby gctest.rb
#<Foo:0xb7dc1804 @iv="bar">
after gc
#<Foo:0xb7dc1804 @iv="bar">
done

I've played around with various versions of this, such as
each_object(Foo), and that instance of Foo, with no apparent references
to it, still seems to stick around for some reason.

I instantiated Foo in the make_foo method to make sure that it wasn't
still in the current stack frame.

This really goes to show that the only guarantee the GC makes is that
it won't free live objects; it does not promise to free dead ones ASAP.
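A sketch of a more cautious way to use each_object for leak hunting: treat the count after a full GC as an upper bound. Reachable instances are always reported, but an unreachable one may linger, as in the transcript above; one likely cause in MRI is that the GC scans the machine stack conservatively, so a stale pointer left there can keep an object alive. live_count and Widget are hypothetical names.

```ruby
# Hypothetical helper: instances of klass still reported after a full GC.
# An upper bound: reachable objects are always counted, dead ones may linger.
def live_count(klass)
  GC.start
  ObjectSpace.each_object(klass).count
end

class Widget; end
held = Array.new(3) { Widget.new } # definitely reachable
p live_count(Widget) >= held.size  # => true
```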

It also shows why you shouldn't rely on finalization as part of
application/system logic, since you never know when, or even if, it
will be called.
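If cleanup has to happen at a known time, the usual Ruby idiom is the block form used by File.open: do the cleanup in an ensure clause instead of a finalizer. TempBuffer below is a hypothetical class sketching that pattern; it releases its payload when the block exits, even though the object itself may stay reachable.

```ruby
# Hypothetical resource class: cleanup runs in `ensure` when the block
# exits (normally or via an exception), independent of the GC.
class TempBuffer
  attr_reader :closed

  def self.open
    buf = new
    return buf unless block_given?
    begin
      yield buf
    ensure
      buf.close
    end
  end

  def initialize
    @data = +""
    @closed = false
  end

  def <<(chunk)
    raise IOError, "buffer closed" if @closed
    @data << chunk
    self
  end

  def close
    @data = nil # drop the large payload right away
    @closed = true
  end
end

leaked = nil
TempBuffer.open { |b| b << "x" * 1024; leaked = b }
p leaked.closed # => true, even though `leaked` still references it
```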

On 10/26/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:

it would be expensive, but i wonder if dumping the objects in objectspace
might be useful - since Marshal.dump already follows all references it seems
like a custom _dump method on objects, which would add them to a tree,
might do the trick. in other words, if you dumped an object with a global tree
in context, then all objects being dumped as a result would add themselves to
this tree. after the dump, you simply keep a copy of the tree...

--
Rick DeNatale

<snip>

right. i was thinking of something like this:

     harp:~ > cat a.rb
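     def really_stupid_reference_finder obj
       # give obj a _dump that aborts any Marshal.dump that reaches it
       begin
         class << obj
           def _dump(*_) throw :referer, true end
         end
       rescue TypeError # e.g. integers and symbols can't have singleton methods
         nil
       end
       ObjectSpace.each_object do |candidate|
         next if candidate.equal? obj
         # symbol tag: since ruby 1.9, catch/throw compare tags by identity,
         # so two separate 'referer' string literals need not match
         referer = catch :referer do
           begin
             Marshal.dump candidate # follows everything candidate references
           rescue TypeError         # candidate (or something it holds) can't be dumped
             false
           end
           false
         end
         return candidate if referer
       end
       return nil
     ensure
       GC.start
     end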

     a = [b = '42']

     referer = really_stupid_reference_finder b
     p referer
     p referer == a

     referer = really_stupid_reference_finder [ 'new_array' ]
     p referer
     p referer == a

     harp:~ > ruby a.rb
     ["42"]
     true
     nil
     false

-a

On Fri, 27 Oct 2006, Rick DeNatale wrote:

Not sure. What I was suggesting is that the real goal is to somehow root
out the reference path (or paths) keeping an object from being
reclaimed, without creating additional references.

That's a nice idea!

One problem is that a referrer can be something other than an object: a Ruby global variable, a C global variable, or a local variable. Or it can be an object that is not dumpable, such as a proc binding.

But the throw/dump combo is a great trick to remember...

What I'd really like to see is a general object graph traversal mechanism that can be used to help implement marshal and other dumpers, gc tools, etc. Several (3 or 4) years ago, matz said he was moving in this direction...[1]
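Joel's point that a referrer may be a variable rather than an object can be partly checked from Ruby code. A sketch, assuming Ruby 2.1 or later for Binding#local_variable_get; local_referrers and global_referrers are hypothetical names. It still cannot see C globals, or locals captured in bindings other than the one passed in.

```ruby
# Hypothetical helper: names of locals in `bind` that hold `target`.
def local_referrers(bind, target)
  bind.local_variables.select { |name| bind.local_variable_get(name).equal?(target) }
end

# Hypothetical helper: names of global variables that hold `target`.
def global_referrers(target)
  saved = $VERBOSE
  $VERBOSE = nil # silence warnings from reading obsolete special globals
  global_variables.select do |name|
    begin
      eval(name.to_s).equal?(target)
    rescue StandardError, NotImplementedError
      false
    end
  end
ensure
  $VERBOSE = saved
end

buf = "payload"
alias_ref = buf
$cache = buf
p local_referrers(binding, buf).sort       # => [:alias_ref, :buf]
p global_referrers(buf).include?(:$cache)  # => true
```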

ara.t.howard@noaa.gov wrote:

On Fri, 27 Oct 2006, Rick DeNatale wrote:

Not sure. What I was suggesting is that the real goal is to somehow root
out the reference path (or paths) keeping an object from being
reclaimed, without creating additional references.

<snip>

right. i was thinking of something like this:

    harp:~ > cat a.rb
    def really_stupid_reference_finder obj

----

[1] See:

http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/ruby/marshal.c?cvsroot=src

and grep for vjoel:

* marshal.c (w_object): T_DATA process patch from Joel VanderWerf
   <vjoel@PATH.Berkeley.EDU>. This is temporary hack; it remains
   undocumented, and it will be removed when marshaling is
   re-designed.

The hack is still there (as of 1.8.5, anyway), still undocumented, and still useful.

The original discussion about why it is useful starts at:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/34037

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf wrote:

What I'd really like to see is a general object graph traversal mechanism that can be used to help implement marshal and other dumpers, gc tools, etc. Several (3 or 4) years ago, matz said he was moving in this direction...[1]

This is the thread where matz said he was looking at a more general traversal mechanism to support marshal and other purposes:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/34335

Maybe it is still "vapor"...

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

now __that__ is good to know!

cheers.

-a

On Fri, 27 Oct 2006, Joel VanderWerf wrote:

----

[1] See:

http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/ruby/marshal.c?cvsroot=src

and grep for vjoel:

* marshal.c (w_object): T_DATA process patch from Joel VanderWerf
<vjoel@PATH.Berkeley.EDU>. This is temporary hack; it remains
undocumented, and it will be removed when marshaling is
re-designed.

The hack is still there (as of 1.8.5, anyway), still undocumented, and still useful.

The original discussion about why it is useful starts at:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/34037
