dRuby and garbage collection

Hello,

I must be doing something quite wrong, but I'm stumped by this:

The server:

require 'drb'

class Server
  def pgproxy
    return DatabaseProxy.new
  end
end

class DatabaseProxy
  include DRbUndumped

  def exec(q, *methods)
      a = Array.new(10000)
      STDERR.puts "Created #{a.object_id}"; STDERR.flush
      ObjectSpace::define_finalizer(a, proc {|id|
        STDERR.puts "Finalizer #{id}"; STDERR.flush
      })
      a
  end
  def gc
      GC.start
  end
end
DRb.start_service("drbunix:/tmp/testdrb", Server.new)
DRb.thread.join
### end of server.rb

And the client:

require 'drb'

DRb.start_service
b = DRbObject.new(nil, "drbunix://tmp/testdrb").pgproxy
100.times do
  a = b.exec('aa')
end
b.gc
### end of client.rb

If I run them I don't see any garbage collection going on in the
server process, and the process size grows considerably.
Who is holding references to these arrays?

Thanks in advance for any light shed on this.

Cheers,

Han Holl

> Hello,
>
> I must be doing something quite wrong, but I'm stumped by this:
>
> The server:
>
> require 'drb'
>
> class Server
>   def pgproxy
>     return DatabaseProxy.new
>   end
> end

This is very, very unsafe. Your DatabaseProxy may destroy itself at any
time, since the reference is pulled out of ObjectSpace. You probably
want to use TimerIdConv instead of the default DRbIdConv.

The default IdConv uses ObjectSpace._id2ref to retrieve remote objects.
Without some reference to these objects your remote processes may be
left hanging even immediately after object creation. TimerIdConv ages
objects and allows them to expire N minutes after the last access.
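
A minimal sketch of switching the id converter, assuming the standard drb/timeridconv library is available. The 60 is an arbitrary value picked for illustration (as far as I know the argument is a keep-alive period in seconds), and the round trip at the end is only there to show that the converter itself keeps the object reachable:

```ruby
require 'drb'
require 'drb/timeridconv'

# Keep remotely referenced objects alive for a while after their last use.
# The value 60 is an arbitrary choice for this sketch.
id_conv = DRb::TimerIdConv.new(60)

# Install it before DRb.start_service so the server picks it up.
DRb.install_id_conv(id_conv)

# Round trip: the converter hands out an id and holds on to the object,
# so looking the id up again returns the very same object.
proxy = Object.new
ref = id_conv.to_id(proxy)
same = id_conv.to_obj(ref).equal?(proxy)
```

With this in place the server no longer depends on something else happening to hold a reference to each proxy between calls.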

> class DatabaseProxy
>   include DRbUndumped
>
>   def exec(q, *methods)
>       a = Array.new(10000)
>       STDERR.puts "Created #{a.object_id}"; STDERR.flush
>       ObjectSpace::define_finalizer(a, proc {|id|
>         STDERR.puts "Finalizer #{id}"; STDERR.flush
>       })

This is the wrong way to define a finalizer. The proc must be created
outside of where self exists, otherwise the block holds onto self, and
self can never be gc'd. (This is a big gotcha! Understanding why this
is wrong took me quite some time to figure out.)

>       a
>   end
>   def gc
>       GC.start
>   end
>
> end

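STDERR.sync = true

class DatabaseProxy
  include DRbUndumped

  # Built in the class body, where no local variable refers to the
  # array, so the proc's closure cannot keep the array alive.
  @@finalizer = proc do |id|
    STDERR.puts "Finalizing #{id}"
  end

  def exec(q, *methods)
    a = Array.new(10000)

    STDERR.puts "Created #{a.object_id}"

    ObjectSpace.define_finalizer a, @@finalizer

    return a
  end

  # Called by the client to force a collection on the server.
  def gc
    GC.start
  end

end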

> DRb.start_service("drbunix:/tmp/testdrb", Server.new)
> DRb.thread.join
> ### end of server.rb
>
> And the client:
>
> require 'drb'
>
> DRb.start_service
> b = DRbObject.new(nil, "drbunix://tmp/testdrb").pgproxy
> 100.times do
>   a = b.exec('aa')
> end
> b.gc
> ### end of client.rb
>
> If I run them I don't see any garbage collection going on in the
> server process, and the process size grows considerably.
> Who is holding references to these arrays?

To answer again, the finalizer is.

Defining a finalizer in a scope that has a reference to the object you
want to finalize creates a circular reference: the proc's closure keeps
the object reachable, so it is never collected and the finalizer never
runs. You must declare your finalizer in its own scope.
Either creating a static finalizer or invoking a method that creates
one for you will allow your finalizer to work correctly.
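
The second option (a method that creates the finalizer for you) could look roughly like the sketch below. FinalizerFactory, build and make_tracked_array are made-up names for illustration, not anything from drb or the original server:

```ruby
# Hypothetical helper; the names here are illustrative only.
module FinalizerFactory
  # The proc is created inside this method, where the only local is
  # `label` and self is the module. Its closure therefore cannot reach
  # the object that will later be finalized.
  def self.build(label)
    proc { |id| STDERR.puts "Finalizing #{label} (#{id})" }
  end
end

def make_tracked_array(size)
  a = Array.new(size)
  # The finalizer comes from another scope, so it does not capture `a`.
  ObjectSpace.define_finalizer(a, FinalizerFactory.build("array of #{size}"))
  a
end

tracked = make_tracked_array(10)

# Inspect what the closure can see: `label` yes, the array no.
finalizer = FinalizerFactory.build("demo")
captured = finalizer.binding.local_variables
```

When the finalizer actually fires is up to the garbage collector, so the sketch only checks what the proc captures rather than waiting for any output.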

Han Holl (han.holl@pobox.com) wrote:

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Eric Hodel <drbrain@segment7.net> wrote in message news:<20040707063932.GA5027@segment7.net>...

> This is very, very unsafe. Your DatabaseProxy may destroy itself at any
> time, since the reference is pulled out of ObjectSpace. You probably
> want to use TimerIdConv instead of the default DRbIdConv.

[ cut ]
Ah, yes. I see that now. Thanks for pointing that out.
I think the solution in this case can be to make the proxy a Singleton, and
return DatabaseProxy.instance instead.
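
A minimal sketch of that Singleton idea, assuming the standard singleton library. The exec body is a stand-in, not the real database code. The point is only that the class itself keeps its single instance referenced, so the proxy cannot be collected between remote calls:

```ruby
require 'drb'
require 'singleton'

class DatabaseProxy
  include Singleton
  include DRb::DRbUndumped

  # Stand-in for the real query method.
  def exec(q, *methods)
    Array.new(10)
  end
end

class Server
  # Always hands out the one instance, which DatabaseProxy keeps referenced.
  def pgproxy
    DatabaseProxy.instance
  end
end

first = Server.new.pgproxy
second = Server.new.pgproxy
```

Every call to pgproxy returns the same object, so there is nothing per-call on the server side that could disappear.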

> a = Array.new(10000)
> STDERR.puts "Created #{a.object_id}"; STDERR.flush
> ObjectSpace::define_finalizer(a, proc {|id|
> STDERR.puts "Finalizer #{id}"; STDERR.flush
> })

> This is the wrong way to define a finalizer. The proc must be created
> outside of where self exists, otherwise the block holds onto self, and
> self can never be gc'd. (This is a big gotcha! Understanding why this
> is wrong took me quite some time to figure out.)

Ok, this example works fine now. My original problem was that large
returned datasets didn't seem to get garbage collected. Only _after_
that did I put the finalizer in, and then I simplified the server to
the silly example above.
I'll go and recheck the original now, which is too large and too convoluted
to post in a message.

Thanks a lot for your help,

Han Holl

(and somehow STDERR.sync doesn't quite work for me).