C Extension: Why would VALUE state appear differently during shutdown than immediately before it?

I have a graph of objects in a C extension, and because deallocation is
order-dependent, I maintain reference counts from children to parents
(see the struct definition below).

While the program runs, the reference counts match up. But when the
program exits, Ruby dumps its entire object table, calling free on every
VALUE, and the parents' free functions are called before the children's,
because the parents were allocated first and appear first in Ruby's
object table. (That was the whole reason for adding reference counting:
otherwise the order-dependent deallocations in the C code cause a SEGV.)

Anyway, here are my reference counts before shutdown:

[REFERENCE COUNT][O INCR] (child @ 00007fd6d0ddd5c0): 1 (parent @ 00007fd6d2538460): 2

n.b. 2 refs

And after shutdown the ref counts are "magically" different:

[REFERENCE COUNT][I DECR] (child @ 00007fd6d2538460): 1 (parent @ 0000000000000000): -10

n.b. -10 is just a sentinel marker; ignore that.
n.b. 1 ref, though, does not match the 2 above.

The latter trace is for the root object, hence no parent. But notice the
root object's reference count: 1. Before program termination it was 2
(the root's address, 00007fd6d2538460, is the parent in the first trace).

This suggests one of three things: memory corruption on my part, memory
corruption on Ruby's part, or a seriously flawed understanding on my part
of how to set up graphs of interrelated objects in C extensions.

Regarding the last option, all I am doing is maintaining a struct of:

struct handle
{
    RUBY_DATA_FUNC free_func;
    rb_atomic_t atomic;
    VALUE parent;
};

Does anyone have an idea how to approach this, or what could possibly
be going on? I have been at this for three days now and I don't see any
bug on my part.

Bob

--
Posted via http://www.ruby-forum.com/.

For the record, in case anyone else tries to write a database driver, or
any C extension with a graph of objects that have deallocation-order
dependencies...

If you are integrating with mark-and-sweep and provide free hooks for
your objects, be aware that at shutdown Ruby drops objects from its
object table in allocation order (table order). This means that once
Ruby has called your free hooks (the ones you registered with
Data_Wrap_Struct), from then on you CANNOT / MUST NOT, ***EVER***, call
Data_Get_Struct on a VALUE that has already been visited/dropped from the
Ruby object table. In practice, a child's free hook must not reach its
parent through the parent's VALUE, because the parent may already be
gone. Ruby will SEGV, and the crash is very hard to diagnose.

Your C structs should probably do something like this:

struct hierarchical_handle
{
    // the hook to the real C/C++ free code; never called
    // directly by the Ruby GC, only via the ref-count decr routines
    RUBY_DATA_FUNC free_func;

    // set this to 1 on allocation; each child increments its
    // parent's count, and only free hooks decrement counts; in
    // fact, your free hook could simply be atomic_dec(handle*)
    rb_atomic_t atomic;

    // used by the ref-counting routines to pin C/C++ objects
    // in memory until the last one drops its reference
    struct hierarchical_handle * parent_handle;

    // used to pin parents in memory via rb_gc_mark
    VALUE parent;
};

Like me, you can just borrow atomic.[c|h] from Ruby. But N.B.: there is
a major bug in them; somebody in the Ruby camp used the WRONG
GCC_ATOMIC_BUILTINS !@!!

Somebody ought to fix atomic.h; it's badly screwed up.
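A minimal sketch of the reference-counting scheme described above, under a few loudly stated assumptions: C11 <stdatomic.h> stands in for Ruby's rb_atomic_t/atomic.h, the VALUE parent field used for rb_gc_mark is left out so it compiles without ruby.h, and every name here (rc_handle, rc_new, rc_release, wrapper_free, example_free) is hypothetical. The point it illustrates is that the free hook only drops a C-level count and never calls Data_Get_Struct on another VALUE, so a parent's C object survives even when Ruby frees the parent's wrapper first at shutdown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified hierarchical_handle: C11 atomics stand in for
 * rb_atomic_t, and the VALUE parent field (for rb_gc_mark) is omitted so
 * the sketch compiles without ruby.h. */
typedef struct rc_handle rc_handle;
struct rc_handle {
    void (*free_func)(rc_handle *); /* real cleanup; only rc_release calls it */
    atomic_int refs;                /* 1 for the Ruby wrapper + 1 per live child */
    rc_handle *parent_handle;       /* pinned until this handle's count hits 0 */
    const char *name;               /* for illustration only */
};

static rc_handle *rc_new(const char *name, rc_handle *parent,
                         void (*free_func)(rc_handle *))
{
    rc_handle *h = malloc(sizeof *h);
    if (h == NULL)
        abort();
    h->free_func = free_func;
    atomic_init(&h->refs, 1);               /* the Ruby wrapper's reference */
    h->parent_handle = parent;
    h->name = name;
    if (parent != NULL)
        atomic_fetch_add(&parent->refs, 1); /* the child pins its parent */
    return h;
}

/* Drop one reference. When a count reaches zero, run the real cleanup,
 * then drop the reference that handle held on its own parent. */
static void rc_release(rc_handle *h)
{
    while (h != NULL) {
        int prev = atomic_fetch_sub(&h->refs, 1);
        assert(prev > 0 && "handle released too many times");
        if (prev != 1)
            return;                      /* someone still holds a reference */
        rc_handle *parent = h->parent_handle;
        h->free_func(h);                 /* h must not be touched after this */
        h = parent;
    }
}

/* The free hook you would pass to Data_Wrap_Struct: it only drops a
 * C-level reference and never calls Data_Get_Struct on another VALUE. */
static void wrapper_free(void *ptr)
{
    rc_release(ptr);
}

/* Example cleanup that records the order in which handles are really freed. */
static const char *freed_names[8];
static int freed_count = 0;

static void example_free(rc_handle *h)
{
    freed_names[freed_count++] = h->name;
    free(h);
}
```

Simulating the shutdown order described above: calling wrapper_free on the root first only drops the root's count from 2 to 1; calling it on the child afterwards cleans up the child and then the root, so the C-level frees still happen child-first.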

Robert Buck wrote in post #1086203:
> I have a graph of objects in a C extension, and as such I maintain
> reference counts from children to parents as deallocations are order
> dependent [...]

I had a similar problem. I fixed it by preventing the deallocation code
from running at program termination.

#include <ruby.h>

static int in_finalizer = 0;

/* Registered with rb_set_end_proc(); runs at exit, before Ruby frees
 * every remaining object. */
static void at_exit_func(VALUE val)
{
    in_finalizer = 1;
}

static void cleanup_func(void *ptr)
{
    if (in_finalizer) {
        /* Do nothing.
         * Resources are freed at the program termination.
         */
        return;
    }
    /* ... deallocate ptr ... */
}

/* ... */
   obj = Data_Wrap_Struct(klass, mark_func, cleanup_func, ptr);
/* ... */

void
Init_xxx(void)
{
    rb_set_end_proc(at_exit_func, Qnil);
    /* ... */
}

On Sat, Nov 24, 2012 at 10:44 PM, Robert Buck <lists@ruby-forum.com> wrote:
> Would someone have an idea of how to approach this, what could possibly
> be going on? I have been at this for three days now and I don't see any
> bug on my part.

I live for the day 80% of these posts don't sound like Greek to me.

Being pretty new to all of this, though, I am loving the little 'peek inside'.

OK, carry on :)

On Nov 24, 2012, at 10:55 AM, Robert Buck <lists@ruby-forum.com> wrote:
> For the record, if anyone else tries to write a database driver, or any
> C extension that has a graph of objects that have deallocation order
> dependencies...
> [...]

Robert Buck wrote in post #1086213:

> struct hierarchical_handle
> {
>
> .......
>
>     // used to pin parents in memory via rb_gc_mark
>     VALUE parent;
> };

Hi,

Just a little comment here. If I had to traverse a graph of objects, I
would use something like "VALUE* children" instead of "VALUE parent"
for rb_gc_mark (and that applies even more to the free function
pointers).

Regards,

Bill


Now that is not something I tried:

  rb_set_end_proc

Thank you very much for this tip. I will try this out too.

What I had tried, which did not work for me, was:

  ruby_vm_at_exit

Threads on related topics pointed to the ruby_vm_at_exit function, but
when I traced all calls into my code, I found that it is called after
the finalizers have finished dumping the object table, which makes it
effectively useless here.

I am actually writing a new GC-integration mini-framework (really mini,
perhaps 100 lines at most) that will handle all the basic cases for
object graphs in C; I will share it on GitHub. I have convinced myself
that, to keep this clean, there has to be a fundamental split between
GC-managed objects (structs) and managed resources (other structs). The
former manage the lifecycle of the Data_Wrap_Struct objects and only
increment/decrement reference counts on the other, internal C structs;
depending on the specified release policy, they recursively release all
internal resources. This would allow optional, user-directed closing of
parent resources, which would otherwise cause a SEGV when the child
resources are closed. I will share the link when I have it wrapped up.
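One way such a split could look, as a minimal sketch under stated assumptions (this is not the framework described above): a GC-managed wrapper, which is what Data_Wrap_Struct would wrap, holds exactly one reference to an internal, reference-counted resource, and closing is idempotent because the wrapper forgets its resource after dropping that reference. All names (resource, wrapper, wrapper_close, resource_usable, live_resources) are hypothetical, C11 atomics stand in for rb_atomic_t, and only a single "defer the free until the children are gone" release policy is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical internal resource (the "managed resource" side).
 * The GC never sees it; C11 atomics stand in for rb_atomic_t. */
typedef struct resource {
    atomic_int refs;          /* 1 for its wrapper + 1 per live child */
    atomic_bool closed;       /* set when its owner closes it */
    struct resource *parent;  /* pinned until this resource is freed */
} resource;

/* Hypothetical GC-managed wrapper: what Data_Wrap_Struct would wrap.
 * It holds exactly one reference to its resource. */
typedef struct wrapper {
    resource *res;            /* NULL once closed */
} wrapper;

/* Debugging aid: resources allocated but not yet freed. */
static atomic_int live_resources;

static resource *resource_new(resource *parent)
{
    resource *r = malloc(sizeof *r);
    if (r == NULL)
        abort();
    atomic_init(&r->refs, 1);
    atomic_init(&r->closed, false);
    r->parent = parent;
    if (parent != NULL)
        atomic_fetch_add(&parent->refs, 1);   /* child pins its parent */
    atomic_fetch_add(&live_resources, 1);
    return r;
}

/* Drop one reference; free the resource (and release its parent) at zero. */
static void resource_release(resource *r)
{
    while (r != NULL) {
        int prev = atomic_fetch_sub(&r->refs, 1);
        assert(prev > 0 && "resource released too many times");
        if (prev != 1)
            return;
        resource *parent = r->parent;
        free(r);
        atomic_fetch_sub(&live_resources, 1);
        r = parent;
    }
}

/* Usable only if neither the resource nor any ancestor has been closed,
 * so a statement can report an error instead of using a closed connection. */
static bool resource_usable(resource *r)
{
    for (; r != NULL; r = r->parent)
        if (atomic_load(&r->closed))
            return false;
    return true;
}

static wrapper *wrapper_new(resource *res)
{
    wrapper *w = malloc(sizeof *w);
    if (w == NULL)
        abort();
    w->res = res;
    return w;
}

/* User-directed close (e.g. connection.close). Idempotent, because the
 * wrapper forgets its resource after dropping its one reference. */
static void wrapper_close(wrapper *w)
{
    if (w->res == NULL)
        return;
    atomic_store(&w->res->closed, true);
    resource_release(w->res);
    w->res = NULL;
}

/* The free hook you would pass to Data_Wrap_Struct. */
static void wrapper_free(void *ptr)
{
    wrapper_close(ptr);
    free(ptr);
}
```

With this layout a user can close a connection while a statement is still alive: the connection is marked closed, so the statement can refuse to run, but its memory stays pinned until the statement is released, whichever order Ruby later frees the two wrappers in.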

The immediate application for me is Ruby DBI-like drivers for NuoDB, a
new RDBMS going GA very soon.

Takehiro, thank you for your tip.

Takehiro Kubo wrote in post #1086244:
> I had a similar problem. I fixed it by preventing deallocation from
> being called at the program termination.
> [...]

I hear what you are saying, Bill. I am writing up my thoughts on how to
do this much better than in my interim NuoDB RC2 release. In effect, you
referred to VALUE**. Here are my thoughts so far; I am still polishing
them and thinking through all the use cases in terms of language
integration and all the permutations of code folks could write (read the
reply I just sent to Takehiro about a bifurcation of the internally
managed objects), but this will give you a glimpse:

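typedef struct os_entity os_entity;     /* assumed, so the fields below compile */
typedef void (*callback)(os_entity *);  /* assumed signature, not from the post */

struct os_entity // internal stuff, user managed or ref count managed
{
    void * data_handle;
    unsigned int flags;
    rb_atomic_t refers;

    callback incr_func;
    callback decr_func;
    callback free_func;

    os_entity * parent;
};

struct gc_entity // what the gc sees and manages
{
    os_entity * entity;
    VALUE ** marks;
};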

You mentioned a list of VALUE*; yes, that is where my thoughts were
leading me too. But because out-of-order frees can occur at shutdown,
these VALUE** are used only in the mark phase. When releasing, I do not
traverse the graph of VALUE references; instead I traverse the internal
graph of os_entity objects, as I call them (for lack of a better name).

Comment if you are interested; I will share the end result on GitHub
when I am done. For NuoDB RC2, the updated Ruby driver, which
incorporates some ideas from you folks, will show up on GitHub within a
day or two. I'd love for folks to comment on my code there (but please,
not on what exists there until after Nov 27, 2012).

Admin Tensor wrote in post #1086223:
> Just a little comment here. If I have to traverse a graph of
> objects, I will use something similar to "VALUE* children" instead of
> "VALUE parent" for rb_gc_mark [...]