Soft object reference for mark and sweep

GD1 · 26 January 2013 06:13

Hi.

I recall a mention a bit back about wanting to see more interesting problems on ruby-talk. Let's see if I can help out.

I have a system in which a Ruby script holds a number of remote object references (to a script in another process). Each one is represented by an object with a unique string ID, which is used to make remote calls on or with these objects. The system is all in place and working fine. It's actually multi-way and each connection is bidirectional, but that is unimportant for now.

I would like the script to be able to effectively ignore whether something is remote or not, which means I'd like it to be able to pass around these remote objects and be able to assume that once an object is no longer used, it'll just drop out of use and be garbage collected in time, like everything else.

One problem: the store keeps a reference to the object in its Hash, for use in lookups and incoming calls. Because of this, the object will never be collected. I'd like it to be collected once the store's entry is the only "reference" left, and I'd like that information to be available to the store, so it knows it can drop (or has just lost) the object.

The problem is essentially similar to the idea of weak pointers in a reference-counted system: something by which you can hold a weak reference that can detect, but doesn't prevent, cleanup. If this were a C++ problem, I'd be using Boost smart pointers; the store would use a weak_ptr, and everything else would use a shared_ptr.

However, this doesn't really make sense in a mark-and-sweep context, nor does the idea of the object being "partially freeable", i.e. aware that it is just about to vanish or can be freed.

Any ideas as to a good way or suitable mechanisms to implement this? Is there some way to look at an object and be able to detect that this is the last reference that exists to the object in a script?

NB: I already have one idea- the Ruby script is embedded, so I might be able to do something clever on the C/C++ side with Data_Wrap_Struct and family, so that when a certain object is freed, I change something on the C side, and the objects are stored in a container that doesn't pass on any of the mark calls. I'm not sure of the specifics yet, I'm sure it's quite possible, but I thought I'd just dig around for suggestions beforehand.

Cheers,
Garth

Nokan_Emiro · 26 January 2013 08:52

> Any ideas as to a good way or suitable mechanisms to implement this?

Just a silly one: if you seldom use that Hash in the store, you can eliminate it. You could still look up your objects by iterating through ObjectSpace to find them. This way the GC will be able to drop objects once there is no reference to them. Basically, what you do here is the same old stuff: a trade-off between memory and CPU. You have to decide whether lookup speed is more important than memory footprint, or the other way around. (...or wait for a less silly answer here...)
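
A minimal sketch of the ObjectSpace lookup described above, assuming MRI. The RemoteObject class, its remote_id attribute, and the find_remote helper are hypothetical names used only for illustration; they are not from the original system.

```ruby
# Hypothetical stand-in for the remote proxy objects discussed in the thread.
class RemoteObject
  attr_reader :remote_id

  def initialize(remote_id)
    @remote_id = remote_id
  end
end

# Linear scan over every live RemoteObject. No Hash holds the objects,
# so nothing here prevents the GC from collecting an unused one.
def find_remote(remote_id)
  ObjectSpace.each_object(RemoteObject).find { |obj| obj.remote_id == remote_id }
end

kept = RemoteObject.new("abc-123")   # the script itself holds the only reference
found = find_remote("abc-123")
puts(found.equal?(kept) ? "found the live object" : "not found")
```

Each lookup walks the heap, so this trades lookup speed for the ability to let unused objects go.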

GD1 · 18 February 2013 07:25

Hi all,

To anybody interested, I've recently finished implementing a solution to my original problem, and I thought I'd share the results.

In the end, I basically used two main tools to solve the entire problem:

- I created my own C/C++-based data type which did one thing: it held a Ruby value and didn't mark it during mark and sweep. Nothing fancy, and it probably could have been done in pure Ruby by storing the object_id and resolving it with ObjectSpace._id2ref. In fact, part of the first-pass solution did exactly that.

- I registered a finalizer each time one of these weak reference objects was created. When called, it wiped the value in the weak reference, amongst other tasks.

I then used these two things together as a mostly functional form of weak reference. The whole problem essentially reduced to the application of these two tools in some way. Well, that, and rewriting a whole bunch of code that made some assumptions that clashed with how it worked.
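
A rough pure-Ruby sketch of how the two tools might fit together, assuming MRI. This is not the C/C++ implementation described above; RemoteStore, register, fetch, known? and forgetter are hypothetical names. The store records only the Integer object_id, and a finalizer wipes the entry when the object is collected.

```ruby
class RemoteStore
  def initialize
    @ids = {}   # remote_id => object_id (an Integer, so not a strong reference)
  end

  def register(remote_id, obj)
    @ids[remote_id] = obj.object_id
    ObjectSpace.define_finalizer(obj, self.class.forgetter(@ids, remote_id))
    obj
  end

  def known?(remote_id)
    @ids.key?(remote_id)
  end

  # Returns the live object, or nil if it is unknown or already collected.
  # ObjectSpace._id2ref is generally discouraged; it is used here only
  # because the post mentions it as the first-pass approach.
  def fetch(remote_id)
    id = @ids[remote_id]
    id && ObjectSpace._id2ref(id)
  rescue RangeError
    nil
  end

  # Built in a class method so the proc cannot capture the registered
  # object; a finalizer that references its own object keeps it alive.
  def self.forgetter(ids, remote_id)
    proc { |object_id| ids.delete(remote_id) if ids[remote_id] == object_id }
  end
end

store = RemoteStore.new
proxy = Object.new
store.register("abc-123", proxy)
puts store.fetch("abc-123").equal?(proxy)
```

When the finalizer actually runs is up to the garbage collector, so code using such a store has to tolerate entries that linger for a while after the last real reference is gone.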

Cheers,
Garth

GD1 · 26 January 2013 09:12

Hi Nokan,

Thank you for the input, and no, it's not silly. It's opened up the possibility of another approach: don't use the Hash at all, and merely dig through ObjectSpace when needed, possibly even externally to the script in some way. Combined with a cache, it might form an interesting first-pass solution. In any case, it raises the question of whether looking through things externally (i.e. via ObjectSpace) could offer a potential solution. I'm not 100% sure how it'd all fit together, but it's definitely got me thinking.

Cheers,
Garth

PS. I'm also relieved that my original post was understandable- it's a problem that's a little hard to explain.

Robert_K1 · 26 January 2013 16:16

I think using WeakRef is even better, because ObjectSpace lookup
is slow and does not work in some circumstances (JRuby with specific
settings).

http://www.ruby-doc.org/stdlib-1.9.3/libdoc/weakref/rdoc/index.html

If need be, one can always create a special Hash wrapper which converts
to and from WeakRef on insert and retrieval.
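
A minimal sketch of such a wrapper, assuming the standard library's WeakRef. WeakStore and its methods are hypothetical names, not an existing API.

```ruby
require "weakref"

# Hash-like store that wraps values in WeakRef on insert and unwraps them
# on retrieval, so the store alone does not keep a value alive.
class WeakStore
  def initialize
    @refs = {}   # key => WeakRef
  end

  def []=(key, obj)
    @refs[key] = WeakRef.new(obj)
  end

  # Returns the original object, or nil if the key is unknown or the
  # object has already been collected.
  def [](key)
    ref = @refs[key]
    return nil unless ref
    ref.__getobj__
  rescue WeakRef::RefError
    @refs.delete(key)   # drop the dead entry while we are here
    nil
  end
end

store = WeakStore.new
proxy = Object.new
store["abc-123"] = proxy
puts store["abc-123"].equal?(proxy)
```

Dead entries are only pruned when they are looked up, so a long-lived store may also want a periodic sweep.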

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

GD1 · 27 January 2013 07:21

Hi Robert,

Thanks for that. I didn't even know WeakRef existed! That's going to open up a wide range of solutions now. Looking at the implementation, it's using finalizers, and I had been wondering if finalizers were going to lead to a potential solution.

The WeakRef interface worries me though. You'd normally expect to see just one call on such a thing (lock, which turns weak to strong/fail) and maybe a check call, with a big warning that the result might change post-check. A delegation-style interface with a check only seems a bit unusual. However, I may just not have a proper understanding of how it works yet.

Still, it's opened up a bunch of possibilities and ideas, with lots of things to look into and explore. Thank you again, Robert, for yet another one of your excellent suggestions.

Cheers,
Garth

Charles_Nutter · 8 February 2013 04:42

I strongly discourage using WeakRef for its delegate interface. It's a
terrible pattern that simply needs to go away. If you use WeakRef,
just use it as a weak object holder and always check for nil on the
return value. You should be ok.
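
One way to read this advice, as a sketch assuming the standard library's WeakRef: fetch the target once into an ordinary local variable, check it for nil, and never call through the delegate. The lock helper is a hypothetical name, not part of WeakRef.

```ruby
require "weakref"

# Hypothetical helper: turn a weak reference into a strong one,
# or return nil if the target has already been collected.
def lock(weak)
  weak.__getobj__
rescue WeakRef::RefError
  nil
end

target = Object.new
weak = WeakRef.new(target)

strong = lock(weak)          # take one strong reference...
if strong
  puts strong.class          # ...and use only that from here on
else
  puts "target was collected"
end
```

Holding the result in a local keeps the object alive for as long as it is being used, which avoids the check-then-use race that the delegate interface invites.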

You might also check out the "weakling" gem, which provides some other
nice features from the JVM like reference queues, where your WeakRefs get
inserted as their referents get collected. It's a bit more efficient
(or at least doesn't impact GC performance) than using finalizers.

- Charlie

GD1 · 16 February 2013 06:46

Hi Charlie,

Thank you for your thoughts on this one. Apologies for my slow reply. Some time had elapsed, and I don't have anything set up to notify me of new replies to the thread, so I only just noticed this one.

Doing some research on WeakRef revealed it had significant problems under MRI: not just with the Delegate interface, but that it sometimes got the object references wrong. That pretty much rules it out for me, and I didn't proceed with WeakRef in the end. This still isn't a solved problem for me, so I can't say how I've overcome it yet, but if my current approach works out then hopefully I'll have something interesting to report back.

Cheers,
Garth
