Fun with finalizers

Hi all,

Just ran into something very interesting with finalizers. I've found a workaround (it'll be obvious what it is from the code below), but I just thought I'd share it for discussion's sake.

Consider the code below:

$fcount = 0

class A
   def initialize
   end
end

class B

   def initialize
   end

   def bar a
     ObjectSpace.define_finalizer(a, lambda {|oid| $fcount += 1})
     a = nil # xxx
     nil
   end

   def foo

     a = A.new
     bar a
     nil
   end
end

b = B.new
for i in 1 .. 10000
   GC.start
   b.foo
   GC.start
end
$stderr.print "Program ends. #{$fcount} finalizers called.\n"

All but one of the finalizers have run by the point of the trace (the final print).

Now, comment out the line marked with xxx. This shouldn't make any difference, but it does. The program will report that 0 finalizers ran at the point of the trace. You can confirm that the finalizers did run, but they *only* ran at program exit, after the trace. Basically, the resources are never released while the program is running when finalizers are used. This is a big problem in a long-running program.

If 10000 iterations isn't enough, you can always increase the counter.

Note that I am using Ruby 1.9.2p136, Linux. Other versions may behave differently.

Why the code above? I was noticing in my code that finalizers were *never* being run under any circumstances. The above is a stripped-down set of code that acts similarly to mine.

From a bit of research online, I've seen comments that say sometimes values are left in registers, which affects GC. That seems fair enough in general, but not here. There are 10000 objects here that aren't being finalized, and they're not all in registers. If it's the stack, then the first run should also have failed. It's not the return value, which is nil. It's not the parameter coming in, or the first test would have failed.

Is it the current scope? I have a feeling, based on the one-line change I made, that the current scope is somehow being captured by the finalizer, so that if "a" remains set, the finalizer holds on to it, and the object is never released. That's just my theory; I could be wrong. This situation is particularly bad if you want to set up a finalizer and then immediately return the value (say, as a result of caching a value): the finalizer will never be called, because you can't clear the value before returning it.

If you've followed me so far, you can probably guess the workaround: call a separate method to set the finalizer, clear the parameter afterward in that call, then return to the caller with a nil return value. It's annoying, but not too painful.

What I am incredibly curious about is why this happens in the first place, and why there doesn't seem to be much talk online of this specific problem when using finalizers. Finalizers failing to work when used in the current scope without explicitly clearing the object afterward seems like the sort of problem other people should be running into more often.

It's bizarre. I'm wondering what everyone else thinks of it. Have I missed something?

Garth

Quick guess, it's the lambda. Replace it with #proc and try again?

···

Sent from my phone, so excuse the typos.
On Feb 16, 2013 4:39 PM, "Garthy D" <garthy_lmkltybr@entropicsoftware.com> wrote:


Hi Matthew,

Excellent thinking. I thought it might be something along those lines too. I tried various combinations as well: proc, Proc.new, I think a return from method(), calls to a separate object; but there was no impact on the result. If "a" isn't cleared, the object is held. I'm guessing that there might be some way to say not to touch a thing in the current scope, but I'm not sure *how* to specify it.

I also adapted the main program based on my experience with the code below, and suddenly the finalizers were called. So it's the same type of problem.

So there's a problem, and it's avoidable. I know the "what", but don't know the "why". There is some subtlety I'm missing. Most interesting. :)

Cheers,
Garth

···

On 16/02/13 20:30, Matthew Kerwin wrote:


> Excellent thinking. I also thought it might be something along those lines
> too.

To make it crystal clear: the reason is that there is a closure
involved. The closure will hold on to the object referenced by "a" on
method entry, unless, as you discovered, that reference is cleared.

Just in case you don't know, here's what a closure does:

irb(main):015:0> def f; x=0; lambda { x+=1 } end
=> nil
irb(main):016:0> g = f
=> #<Proc:0x802b7778@(irb):15 (lambda)>
irb(main):017:0> g.call
=> 1
irb(main):018:0> g.call
=> 2
irb(main):019:0> g.call
=> 3

The closure captures the current scope, i.e. all local variables.
This includes method arguments and "self" - and hence all member
variables of self as well:

irb(main):023:0> def f; @x = 0; lambda { @x += 1 } end
=> nil
irb(main):024:0> g = f
=> #<Proc:0x8029f934@(irb):23 (lambda)>
irb(main):025:0> g.call
=> 1
irb(main):026:0> g.call
=> 2
irb(main):027:0> g.call
=> 3
irb(main):028:0> g.call
=> 4
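
The same capture can be made visible for the finalizer case. The following is only a sketch: the helper name make_finalizer is made up for illustration, and it assumes Ruby 2.1 or later for Binding#local_variable_get (the 1.9.2 used earlier in the thread does not have it). The block never mentions "a", yet the object is still reachable through the scope the proc holds on to:

```ruby
# Hypothetical helper: the block body never refers to `a`.
def make_finalizer(a)
  lambda { |oid| }
end

obj = Object.new
fin = make_finalizer(obj)

# Proc#binding exposes the scope the proc closed over; `a` is part of it,
# so the proc keeps `obj` alive for as long as the proc itself is alive.
puts fin.binding.local_variables.include?(:a)
puts fin.binding.local_variable_get(:a).equal?(obj)
```

Since a finalizer proc is kept alive until the object is collected, a proc that can reach the object through its scope would prevent that collection from ever happening.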

> I tried various combinations as well: proc, Proc.new, I think a return
> from method(), calls to a separate object; but there was no impact on the
> result. If "a" isn't cleared, the object is held.

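I would be surprised if there was. lambda and Proc both create a
closure. The differences between lambda and Proc lie in a different area:
http://stackoverflow.com/questions/1740046/whats-the-difference-between-a-proc-and-a-lambda-in-ruby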
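
For readers who don't want to follow a link, a small sketch of that "different area" (argument checking and the meaning of return); the method and variable names here are made up for illustration:

```ruby
strict  = lambda { |x, y| [x, y] }
lenient = proc   { |x, y| [x, y] }

# A proc fills in missing arguments with nil; a lambda checks arity.
p lenient.call(1)
begin
  strict.call(1)
rescue ArgumentError => e
  puts "lambda rejected the call: #{e.message}"
end

# `return` inside a lambda leaves only the lambda;
# inside a proc it returns from the enclosing method.
def via_lambda
  lambda { return :from_lambda }.call
  :after_lambda
end

def via_proc
  proc { return :from_proc }.call
  :after_proc
end

p via_lambda
p via_proc
```

Neither difference affects what the closure captures, which is why swapping one for the other changes nothing for the finalizer.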

> I'm guessing that there
> might be some way to say to not touch a thing in the current scope, but I'm
> not sure *how* to specify it.

There is no way to exclude that variable from the closure - other than
not passing it. But that would be pointless here. :)

> I also adapted the main program based on my experience with the code below,
> and suddenly the finalizers were called. So it's the same type of problem.
>
> So there's a problem, and it's avoidable. I know the "what", but don't know
> the "why". There is some subtlety I'm missing. Most interesting. :)

Now you should know.

Kind regards

robert

···

On Sat, Feb 16, 2013 at 12:08 PM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Hi Robert,

Thank you very much, yet again. An excellent and incredibly informative response, as always.

The hole in my understanding (and what I had begun to suspect was the case, and I think you have identified) was pretty much here:

> The closure captures the current scope, i.e. all local variables.

Before I had encountered the problem, my understanding was that the closure would capture any referenced variables similarly to a function/method call. I wasn't 100% sure of the mechanics, but I believed that it "just happened". What I didn't understand was that it did this by holding on to the entire scope- and just that one scope. The distinction had not become apparent to me as my understanding did not clash with what was actually happening when finalizers were not involved. However, the differences that arise once finalizers enter the picture are actually very significant.

I think I did not pick this up earlier because most of the discussion online regarding Ruby finalizers either glosses over this point or flat out misses it. There are plenty of warnings against referencing the object being finalized from within the finalizer (by, say, expecting to be able to call a method on it), but I'm not sure I've seen a mention of the current scope being captured and of needing to be careful with the variables visible in it. However, a common solution seems to be to use a method that returns the finalizer proc, and I'd missed the point that, done this way, the proc is created in a different scope from the one that actually sets the finalizer. Thus the finalizer never even sees the value being finalized, avoiding the problem nicely.
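
For anyone finding this later, here is roughly what I understand that common solution to look like. It is only a sketch: the Cache class, its method names and the log array are made up for illustration, and Binding#receiver assumes Ruby 2.2 or later.

```ruby
class Cache
  # The proc is built in a class method, so the scope it captures holds only
  # `label` and `log`, and self is the class. The object being finalized is
  # never visible here.
  def self.finalizer_for(label, log)
    proc { |oid| log << "#{label} finalized" }
  end

  def self.fetch(label, log)
    value = Object.new
    ObjectSpace.define_finalizer(value, finalizer_for(label, log))
    value # can be returned directly; no need to clear it first
  end
end

log = []
fin = Cache.finalizer_for("demo", log)
p fin.binding.local_variables.sort
p fin.binding.receiver.equal?(Cache)
```

This also covers the caching case from my first post, since the value can be returned from fetch without the finalizer holding on to it.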

The first example you have given makes it completely clear what is happening. Based on my previous understanding, I would be unsure of what the output from "g.call" would be. My first two guesses would probably have been an exception, or possibly one or zero, but at that point I'd be questioning if my understanding was actually correct. Knowing what I know now, the answer is obvious, even trivial. Note that if the example didn't use an integer, but an object, it would have fit in with my previous understanding. Only by being an immediate value did the flaw in my previous understanding become apparent.

Thank you for taking the time to put together yet another superb post for the list. I am frequently in awe of the level of detailed knowledge you have of some of the more complex mechanics in Ruby. I hope that people encountering similar issues can also stumble across it, so that the post ends up helping considerably more people than just myself.

Cheers,
Garth


···

On 17/02/13 01:00, Robert Klemme wrote: