I'm working on a project that involves wrapping C libraries and
presenting them as Ruby classes.
The libraries are logically grouped together around "objects", where the
object is a pointer to a struct that is passed to each of the
appropriate library functions. For example, we have the idea of a
Selectable type which is represented by an instance of pn_selectable_t
in C.
And it's here where I'm hitting a lack of documentation: object
references and reference counting.
I need to be able to update reference counts on C "objects" when they're
used by Ruby so that nothing is mistakenly GC'd. But I can't seem to
find anything online or in my books for how to increment/decrement
references.
Ruby does not rely on reference counting of objects, but instead scans
the machine and VM stack for live objects. The API for interacting with
the GC is documented in README.EXT in the source tree.
Like BoehmGC, it's mostly "hands-off" from the perspective of a C
programmer, but there are some unfortunate subtleties such as the need
for RB_GC_GUARD in some places.
Feel free to ask here if you need any clarification and I'll
try my best to answer and update README.EXT as needed.
···
"Darryl L. Pierce" <mcpierce@gmail.com> wrote:
> And it's here where I'm hitting a lack of documentation: object
> references and reference counting.
> I need to be able to update reference counts on C "objects" when they're
> used by Ruby so that nothing is mistakenly GC'd. But I can't seem to
> find anything online or in my books for how to increment/decrement
> references.
Can anyone explain to me why the tests exist?
Is the API broken?
Are the Ruby tests broken?
How can we infer the memsize of an object if we are not able to use ObjectSpace#memsize_of?
If you've received this email by mistake, we're sorry for bothering you. It may contain information that's confidential, so please delete it without sharing it. And if you let us know, we can try to stop it from happening again. Thank you.
We may monitor any emails sent or received by us, or on our behalf. If we do, this will be in line with relevant law and our own policies.
Sage (UK) Limited. Registered in England at North Park, Newcastle upon Tyne, NE13 9AA. Registered number 1045967.
> Then I create a pure Ruby object (say an instance of a class named Farkle)
> and assign that object to my instance of pn_holder_t.
Use Data_Wrap_Struct and friends to wrap up the C struct as a ruby object. Wrapping will involve registering a mark function and a free function. The mark function will mark any sub-objects that it holds. This way the GC can do its job w/o knowing how to walk through a C struct. The free function is responsible for cleaning up its children and getting rid of itself. In this case, you'd NULL out your void* to the pure ruby object and you'd free the struct.
One question I'd have is: who's holding onto the C struct?
> Finally, I delete at some point my pn_holder_t instance by calling the
> function pn_holder_free. That method just frees the used memory that the
> struct was holding.
Don't do this. Instead, let the GC do it for you. It'll go through the registered free function.
That, or, depending on the answer to my question above about who's holding the C struct... if it has a very limited lifespan, say that in the span of one function call it is created and then freed, then you go ahead and free your own object, but you NULL out your reference to the pure ruby object before you do so.
> How do I ensure that my instance of Farkle isn't out there taking up
> memory and never being reclaimed? Assume that Farkles have a one-way
> reference to some other object that may or may not be used, but nothing
> but the pn_holder_t ever referenced the Farkle.
You can never _really_ ensure that things get cleaned up. That's the problem with more conservative collectors. If something _looks_ like it is pointing to your object, then your object will remain. There are pros and cons to this approach, like not having problems with cycles (like reference counting does) and having a much cleaner implementation.
···
On Jan 22, 2015, at 13:36, Darryl L. Pierce <mcpierce@gmail.com> wrote:
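Swig doesn't really do anything but set up the structs. It's the Ruby GC that ensures that things don't get cleaned up until there are no more references to them. For example:

+-----------+
| ruby-obj1 |---
+-----------+   \   +------------------+         +----------+
                 -->| Data_Wrap_Struct |-------->| C-struct |
+-----------+   /   +------------------+         +----------+
| ruby-obj2 |---
+-----------+
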
···
On Jan 22, 2015, at 14:23, Darryl L. Pierce <mcpierce@gmail.com> wrote:
> In the wrapper code generated by Swig (called rubyRUBY_wrap.c), I see
> where Data_Wrap_Struct is called for each of our structs in the
> library. So is it possible that, when these structs are reclaimed Swig
> is also making it so that any possible Ruby object is no longer
> referenced and can be garbage collected by Ruby?
If ruby-obj1 goes out of scope, it can get collected, but if ruby-obj2 is still valid, then the GC won't collect the C-struct or its ruby wrapper. The struct+wrapper won't get collected until all references to them go out of scope.
It's similar to reference counting, except that it's actually reference _checking_ instead of just _counting_. This is why it is possible to collect cyclic references.
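To make the counting-versus-checking distinction concrete, here is a toy model in plain Ruby. It is only a sketch: Node, reachable_names and refcount_survivors are made-up names, and nothing here is how the VM is actually implemented. It builds the two-wrappers-one-struct picture, adds a Farkle that points back at the wrapper (a cycle), and compares what survives under naive reference counting with what survives under reachability from the roots.

```ruby
# Toy heap object: a name plus outgoing references (hypothetical, for illustration).
class Node
  attr_reader :name, :refs

  def initialize(name)
    @name = name
    @refs = []
  end
end

# "Reference checking": everything reachable from the roots is live.
def reachable_names(roots)
  seen = {}.compare_by_identity
  pending = roots.dup
  until pending.empty?
    node = pending.pop
    next if seen[node]
    seen[node] = true
    pending.concat(node.refs)
  end
  seen.keys.map(&:name).sort
end

# Naive reference counting: free anything whose count drops to zero.
def refcount_survivors(heap, roots)
  counts = Hash.new(0).compare_by_identity
  roots.each { |n| counts[n] += 1 }
  heap.each { |n| n.refs.each { |r| counts[r] += 1 } }
  alive = heap.dup
  loop do
    dead = alive.select { |n| counts[n].zero? }
    break if dead.empty?
    dead.each { |n| n.refs.each { |r| counts[r] -= 1 } }
    alive -= dead
  end
  alive.map(&:name).sort
end

obj1    = Node.new("ruby-obj1")
obj2    = Node.new("ruby-obj2")
wrapper = Node.new("wrapper")   # stands in for the wrapped C struct
farkle  = Node.new("farkle")
obj1.refs << wrapper
obj2.refs << wrapper
wrapper.refs << farkle
farkle.refs << wrapper          # cycle: farkle points back at the wrapper
heap = [obj1, obj2, wrapper, farkle]

p reachable_names([obj2])         # => ["farkle", "ruby-obj2", "wrapper"]
p reachable_names([])             # => []
p refcount_survivors(heap, [])    # => ["farkle", "wrapper"]
```

With obj1 gone and obj2 still a root, the wrapper and everything behind it stay live. With no roots at all, reachability finds nothing to keep, while the counting model is stuck with the wrapper/farkle cycle.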
Oh, I'm going to have TONS of questions. And I appreciate your help
in getting my head around this.
So, let me give a more concrete example of what I'm trying to do. Maybe
by kicking this around we can come to a better understanding of what I'm
doing and how to do it in Ruby.
So, in our project, we have "objects", as I mentioned earlier.
They are internally represented as structs that have void * to other
objects. And in some cases those void * will point to a _Ruby_ object
and not something the C code knew about.
Say we have one such struct named pn_holder_t that has a void * that
points to an arbitrary chunk of memory:
I create an instance of it with a C function named pn_holder_new, which
allocates memory and initializes the pointer reference.
Then I create a pure Ruby object (say an instance of a class named Farkle)
and assign that object to my instance of pn_holder_t.
Finally, I delete at some point my pn_holder_t instance by calling the
function pn_holder_free. That method just frees the used memory that the
struct was holding.
How do I ensure that my instance of Farkle isn't out there taking up
memory and never being reclaimed? Assume that Farkles have a one-way
reference to some other object that may or may not be used, but nothing
but the pn_holder_t ever referenced the Farkle.
Does that spell out a clear enough scenario for what I'm trying to do?
···
On Thu, Jan 22, 2015 at 09:13:08PM +0000, Eric Wong wrote:
> Ruby does not rely on reference counting of objects, but instead scans
> the machine and VM stack for live objects. The API for interacting with
> the GC is documented in README.EXT in the source tree.
> Like BoehmGC, it's mostly "hands-off" from the perspective of a C
> programmer, but there are some unfortunate subtleties such as the need
> for RB_GC_GUARD in some places.
> Feel free to ask here if you need any clarification and I'll
> try my best to answer and update README.EXT as needed.
--
Darryl L. Pierce <mcpierce@gmail.com>
Famous last words:
"I wonder what happens if we do it this way?"
>
> Then I create a pure Ruby object (say an instance of a class named Farkle)
> and assign that object to my instance of pn_holder_t.
Use Data_Wrap_Struct and friends to wrap up the C struct as a ruby object. Wrapping will involve registering a mark function and a free function. The mark function will mark any sub-objects that it holds. This way the GC can do its job w/o knowing how to walk through a C struct. The free function is responsible for cleaning up its children and getting rid of itself. In this case, you'd NULL out your void* to the pure ruby object and you'd free the struct.
I forgot one (apparently important) element to our scenario. We're using
Swig to generate a one-to-one Ruby language binding for the C libraries.
So it's Swig that's taking something like pn_holder_t and wrapping it so
that Ruby can see it.
At the Ruby layer that pn_holder_t looks like a Ruby object, but has no
useful methods. It just gets passed around to the library functions, and
Swig ensures that the C struct inside is unpackaged before the C code is
called. See my next paragraph below.
One question I'd have is: who's holding onto the C struct?
In our project, we have classes that either call down to the C libraries
to create an instance of a struct, or else are passed in a previously
created instance. So, for example, a Ruby wrapper class for the
pn_holder_t type would be named Holder and would have an instance
variable named @impl that references a Swig-wrapped pn_holder_t struct.
> Finally, I delete at some point my pn_holder_t instance by calling the
> function pn_holder_free. That method just frees the used memory that the
> struct was holding.
Don't do this. Instead, let the GC do it for you. It'll go through the registered free function.
I think this is where I'm concerned, but maybe unnecessarily?
In the wrapper code generated by Swig (called rubyRUBY_wrap.c), I see
where Data_Wrap_Struct is called for each of our structs in the
library. So is it possible that, when these structs are reclaimed Swig
is also making it so that any possible Ruby object is no longer
referenced and can be garbage collected by Ruby?
···
On Thu, Jan 22, 2015 at 01:58:25PM -0800, Ryan Davis wrote:
Thanks for the info! I think I've got a decent enough grasp of this. And
coupled with some experimenting I did last night, I'm fairly confident
that what I'm doing at this point won't result in a memory leak.
I created a branch [1] on my repository that:
* creates a C struct named pn_rubyref_t which contains a single void *
that references any arbitrary chunk of data
* a few C library APIs to do CRUD operations on pn_rubyref_t
* wrapped the above in Swig
* created a pure Ruby class, named Farkle, to be attached to
pn_rubyref_t objects, and
* a circular reference from Farkle back to pn_rubyref_t
I wrote an app to test this, creating 1M instances of pn_rubyref_t from
Ruby (via Swig) and adding an instance of Farkle to each one. I also added
a finalize! method to Farkle to simply output text when the object is
GC'd. And I used ObjectSpace to count the number of instances of each
type after doing the create and some partial clean up.
All of it looked good. Memory usage showed no leaks, and I saw objects
being collected in just the way I would have expected.
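For anyone wanting to repeat that kind of check without the Swig layer, a rough pure-Ruby sketch follows. It is not the test app from the branch; the Farkle class here and the farkle_count helper are stand-ins. It uses ObjectSpace.define_finalizer to record when an instance is finalized and ObjectSpace.each_object to count live instances. The counts after releasing references can vary from run to run because the collector is conservative, so treat them as indicative only.

```ruby
class Farkle
  # Build the finalizer in a class method so the proc does not capture the
  # instance itself; a captured reference would keep the object alive forever.
  def self.finalizer(log, id)
    proc { log << id }
  end

  def initialize(log)
    ObjectSpace.define_finalizer(self, self.class.finalizer(log, object_id))
  end
end

# Number of Farkle instances the VM currently knows about.
def farkle_count
  ObjectSpace.each_object(Farkle).count
end

finalized = []
held = Array.new(1_000) { Farkle.new(finalized) }
GC.start
count_while_held = farkle_count      # everything in `held` is still referenced

held = nil                           # drop the only references
GC.start
count_after_release = farkle_count   # usually much lower; exact value may vary

puts "while held:      #{count_while_held}"
puts "after release:   #{count_after_release}"
puts "finalizers run:  #{finalized.size}"
```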
Now I see the require 'objspace' in the unit test file.
Does this mean that the unit tests for the objspace standard library are being run in the core unit tests?
Is there a common logic for those situations?
Thanks
···
On 26 Jan 2015, at 17:23, Abinoam Jr. <abinoam@gmail.com> wrote:
Hi Daniel,
This method is from 'that' ObjectSpace from standard lib (not core).
This happens in some particular situations in Ruby. You have a
"simple" version at core and an "extended" version at standard lib so
you keep thin and get fat only when you need.
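A small sketch of what that looks like in practice, assuming a stock MRI install: the core ObjectSpace module is always there, and requiring 'objspace' adds the extra methods, memsize_of among them. Note that it is called as a module function, ObjectSpace.memsize_of(obj). The approx_size helper is just a name made up for this example, and the numbers it prints depend on the Ruby version and platform.

```ruby
require 'objspace'   # loads the standard-library extension to ObjectSpace

# Hypothetical helper: size in bytes that the VM attributes to obj.
def approx_size(obj)
  ObjectSpace.memsize_of(obj)
end

small = "a"
large = "a" * 10_000

puts "small string: #{approx_size(small)} bytes"
puts "large string: #{approx_size(large)} bytes"
```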
The thing I didn't verify was that the underlying pure Ruby object was
still valid. So I'm three steps forward but maybe 1 1/2 steps back
in that regard.
So now the question becomes: how to mark a Ruby object so that it won't
get GC'd even if _Ruby_ doesn't see it? I have control over when the
object holding it lets go so I can, at that time, update the object so
that it can be GC'd.
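One Ruby-level way to get that effect, sketched here purely as an assumption about how it could be done rather than anything the bindings provide: park a strong reference in a table that Ruby can see for as long as the C side holds the object, and delete it when the C side lets go. PinRegistry and its methods are hypothetical names. At the C level the equivalent job is done by a mark function or by rb_gc_register_address / rb_gc_unregister_address.

```ruby
# Hypothetical registry: anything stored here stays reachable, so the GC
# will not collect it even if the only other pointer lives in a C struct.
module PinRegistry
  @pinned = {}

  # Keep obj alive; returns a key the caller can hand to the C side.
  def self.pin(obj)
    key = obj.object_id
    @pinned[key] = obj
    key
  end

  # Look the object up again from its key.
  def self.fetch(key)
    @pinned.fetch(key)
  end

  # Drop the strong reference; the object becomes collectable once nothing
  # else refers to it.
  def self.unpin(key)
    @pinned.delete(key)
  end

  def self.pinned?(key)
    @pinned.key?(key)
  end
end

farkle = Object.new
key = PinRegistry.pin(farkle)          # call when the C struct takes the object
puts PinRegistry.fetch(key).equal?(farkle)
PinRegistry.unpin(key)                 # call when the C struct releases it
puts PinRegistry.pinned?(key)
```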
···
On Fri, Jan 23, 2015 at 10:39:23AM -0500, Darryl L. Pierce wrote:
On Thu, Jan 22, 2015 at 04:04:01PM -0800, Ryan Davis wrote:
> Swig doesn't really do anything but set up the structs. It's the Ruby GC that ensures that things don't get cleaned up until there are no more references to them. For example:
>
> +-----------+
> | ruby-obj1 |---
> +-----------+   \   +------------------+         +----------+
>                  -->| Data_Wrap_Struct |-------->| C-struct |
> +-----------+   /   +------------------+         +----------+
> | ruby-obj2 |---
> +-----------+
>
> If ruby-obj1 goes out of scope, it can get collected, but if ruby-obj2 is still valid, then the GC won't collect the C-struct or its ruby wrapper. The struct+wrapper won't get collected until all references to them go out of scope.
>
> It's similar to reference counting, except that it's actually reference _checking_ instead of just _counting_. This is why it is possible to collect cyclic references.
Wrap a C pointer into a Ruby object. If object has references to other
Ruby objects, they should be marked by using the mark function during
the GC process. Otherwise, mark should be 0. When this object is no
longer referred by anywhere, the pointer will be discarded by free
function.
···
On Jan 23, 2015, at 10:33, Darryl L. Pierce <mcpierce@gmail.com> wrote:
> So now the question becomes: how to mark a Ruby object so that it won't
> get GC'd even if _Ruby_ doesn't see it? I have control over when the
> object holding it lets go so I can, at that time, update the object so
> that it can be GC'd.
I sent out a question in a separate message just a few minutes ago
regarding this; i.e., how to actually mark them. I see rb_gc_mark() as
the way to do this part. But now I want to know how to unmark objects so
they can then be GC'd as needed?
···
On Fri, Jan 23, 2015 at 02:25:13PM -0800, Ryan Davis wrote:
> On Jan 23, 2015, at 10:33, Darryl L. Pierce <mcpierce@gmail.com> wrote:
>
> So now the question becomes: how to mark a Ruby object so that it won't
> get GC'd even if _Ruby_ doesn't see it? I have control over when the
> object holding it lets go so I can, at that time, update the object so
> that it can be GC'd.
You don't unmark. Each GC cycle calls your mark function. Anything NOT marked is considered garbage.
Yes, it should be rb_gc_mark. Yes, it seems to be entirely undocumented. Yes, I'm not happy about that and am considering my options to fix it.
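A toy illustration of why there is nothing to unmark, written in plain Ruby and not meant to mirror the real collector: ToyHeap, ToyHolder and mark_children are invented for this sketch, with mark_children playing the part of the C mark function. Every collection starts with an empty mark set and asks each live holder what it currently references, so clearing the holder's pointer is all it takes for the payload to be swept on the next cycle.

```ruby
# Toy collector: every cycle starts with nothing marked.
class ToyHeap
  attr_reader :objects, :roots

  def initialize
    @objects = []
    @roots = []
  end

  def allocate(obj)
    @objects << obj
    obj
  end

  # One GC cycle: mark from the roots via each object's mark hook, then
  # sweep whatever was not marked. Returns the swept objects.
  def collect
    marked = {}.compare_by_identity
    pending = @roots.dup
    until pending.empty?
      obj = pending.pop
      next if marked[obj]
      marked[obj] = true
      pending.concat(obj.mark_children) if obj.respond_to?(:mark_children)
    end
    swept = @objects.reject { |o| marked[o] }
    @objects = @objects.select { |o| marked[o] }
    swept
  end
end

# Stands in for a wrapped C struct holding one reference to a Ruby object.
class ToyHolder
  attr_accessor :payload

  # Plays the role of the mark function: report what is held right now.
  def mark_children
    payload ? [payload] : []
  end
end

heap   = ToyHeap.new
holder = heap.allocate(ToyHolder.new)
farkle = heap.allocate(Object.new)
holder.payload = farkle
heap.roots << holder

puts heap.collect.size   # 0: farkle is marked through the holder
holder.payload = nil     # the holder lets go; no "unmark" call anywhere
puts heap.collect.size   # 1: farkle was not marked this cycle, so it is swept
```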
···
On Jan 26, 2015, at 11:08, Darryl L. Pierce <mcpierce@gmail.com> wrote:
> I sent out a question in a separate message just a few minutes ago
> regarding this; i.e., how to actually mark them. I see rb_gc_mark() as
> the way to do this part. But now I want to know how to unmark objects so
> they can then be GC'd as needed?
> I sent out a question in a separate message just a few minutes ago
> regarding this; i.e., how to actually mark them. I see rb_gc_mark() as
> the way to do this part. But now I want to know how to unmark objects so
> they can then be GC'd as needed?
I believe that happens automatically when the object whose mark method
was called is freed. The time I used the mark method (3 years ago) it
looks like I did not have to do anything special when freeing the
object.
Carlo
···
Subject: Re: Native extensions and reference counting....
Date: Mon 26 Jan 15 02:08:26PM -0500
--
Carlo E. Prelz - fluido@fluido.as
If the Way and its Virtue had not been set aside, what need would there
be to talk so much of love and righteousness? (Chuang-Tzu)
Understood. Now I'm looking for how to hook into this with Swig so that
I can add a mark function to the underlying C code. It seems quite
possible, however the example in the Swig docs is for C++ code and not C
structs. While they're effectively the same, I haven't managed to get my
mark function to be invoked as of yet...
···
On Mon, Jan 26, 2015 at 01:35:57PM -0800, Ryan Davis wrote:
> On Jan 26, 2015, at 11:08, Darryl L. Pierce <mcpierce@gmail.com> wrote:
>
> I sent out a question in a separate message just a few minutes ago
> regarding this; i.e., how to actually mark them. I see rb_gc_mark() as
> the way to do this part. But now I want to know how to unmark objects so
> they can then be GC'd as needed?