C extension: How to check if a VALUE is still alive (not being GC'ed)?

Hi, I'm coding an async DNS resolver for EventMachine based on udns (a
C library).

In the Ruby C extension, when submitting a DNS query I pass the
current object (its VALUE) as an argument to the udns query function.
Later on, when the DNS server replies, EventMachine (which watches the
UDP socket of udns) invokes the C callback function registered for that
query, and the callback receives the same VALUE as one of its
arguments. I then invoke a Ruby method on that VALUE (rb_funcall).

The problem is that the VALUE could be GC'ed (or be marked for GC)
if the programmer assigns "nil" to the variable before the callback is
executed, so we get a core dump.

So when the callback is executed I need to check that the given VALUE
is still alive (neither GC'ed nor marked for GC). How can I
inspect that? (I mean in C).

Thanks a lot.

···

--
Iñaki Baz Castillo
<ibc@aliax.net>

The easiest way would be to assign this pointer to the resolver object
as an instance variable (with rb_iv_set), or perhaps store it inside a
hash which maps request IDs to pointers, if you want to be able to use
the same resolver for several simultaneous callbacks.

···

On Sun, 6 Feb 2011 08:02:39 +0900 Iñaki Baz Castillo <ibc@aliax.net> wrote:

> Hi, I'm coding an async DNS resolver for EventMachine based on udns (a
> C library).

> In the Ruby C extension, when submitting a DNS query I pass the
> current object (its VALUE) as an argument to the udns query function.
> Later on, when the DNS server replies, EventMachine (which watches the
> UDP socket of udns) invokes the C callback function registered for that
> query, and the callback receives the same VALUE as one of its
> arguments. I then invoke a Ruby method on that VALUE (rb_funcall).

> The problem is that the VALUE could be GC'ed (or be marked for GC)
> if the programmer assigns "nil" to the variable before the callback is
> executed, so we get a core dump.

> So when the callback is executed I need to check that the given VALUE
> is still alive (neither GC'ed nor marked for GC). How can I
> inspect that? (I mean in C).

--
  WBR, Peter Zotov

How is rb_gc_mark() insufficient here? Marking prevents objects from being
GCed... you seem to be suggesting it's the other way around.

If you mark all VALUEs your C extension is internally referencing in your C
extension's "mark" function it will prevent them from being garbage
collected.

As Peter said though, if you use rb_iv_set, you get this behavior for free
as Ruby automatically marks instance variables for you.

···

On Sat, Feb 5, 2011 at 4:02 PM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

> The problem is that the VALUE could be GC'ed (or be marked for GC)
> if the programmer assigns "nil" to the variable before the callback is
> executed, so we get a core dump.

> So when the callback is executed I need to check that the given VALUE
> is still alive (neither GC'ed nor marked for GC). How can I
> inspect that? (I mean in C).

--
Tony Arcieri
Medioh! Kudelski

Yes, in fact I must do that to keep the Query object from being GC'ed.
So yes, it's better if I handle that hash internally within the resolver
rather than letting the user access the hash.

Thanks.

···

2011/2/6 Peter Zotov <whitequark@whitequark.org>:

> The easiest way would be to assign this pointer to the resolver object
> as an instance variable (with rb_iv_set), or perhaps store it inside a
> hash which maps request IDs to pointers, if you want to be able to use
> the same resolver for several simultaneous callbacks.

--
Iñaki Baz Castillo
<ibc@aliax.net>

2011/2/6 Tony Arcieri <tony.arcieri@medioh.com>:

>> The problem is that the VALUE could be GC'ed (or be marked for GC)
>> if the programmer assigns "nil" to the variable before the callback is
>> executed, so we get a core dump.

>> So when the callback is executed I need to check that the given VALUE
>> is still alive (neither GC'ed nor marked for GC). How can I
>> inspect that? (I mean in C).

> How is rb_gc_mark() insufficient here? Marking prevents objects from being
> GCed... you seem to be suggesting it's the other way around.

Sorry, I was wrong about the meaning of "marking an object".

> If you mark all VALUEs your C extension is internally referencing in your C
> extension's "mark" function it will prevent them from being garbage
> collected.

But this is not my case. My case is as follows (example code):

-------------------------------------------------------------------------------------
domain = ARGV[0]

module UdnsWatcher
  def initialize(resolver)
    @resolver = resolver
  end

  def notify_readable
    @resolver.ioevent
  end
end

EM.run do

  resolver = EM::Udns::Resolver.new

  EM.watch resolver.fd, UdnsWatcher, resolver do |conn|
    conn.notify_readable = true
  end

  query = EM::Udns::Query.new
  query.submit_A resolver, domain
  query.callback do |result|
    puts "DEFERRABLE CALLBACK: result = #{result.inspect}"
  end

end
-------------------------------------------------------------------------------------

- The "Resolver" class is my C extension, which wraps a C
'struct dns_ctx'; it holds no other Ruby objects for now.

- The "Query" class is pure Ruby and includes EM::Deferrable (it has nothing else).

- "query.submit_A resolver, domain" calls the following C function, which invokes the udns C library:

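     VALUE Query_submit_query_A(VALUE self, VALUE context, VALUE str)
     {
       struct dns_ctx *dns_context = NULL;
       char *domain;

       /* Unwrap the struct dns_ctx held by the Resolver object. */
       Data_Get_Struct(context, struct dns_ctx, dns_context);
       domain = StringValueCStr(str);

       /* self is handed to udns as opaque user data; nothing here
          keeps it alive as far as the GC is concerned. */
       dns_submit_a4(dns_context, domain, 0, dns_res_A_cb, (void*)self);
       [...]
     }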

As you can see, "self" is passed as an argument to dns_submit_a4(). This
is because when the DNS response arrives, the callback function
"dns_res_A_cb()" will be called, and that function receives the
same (void*)self as an argument, so I know which Query instance the
response belongs to and can invoke "set_deferred_status" on it with
rb_funcall().

But in my Ruby code above I don't store "query" in a hash or array, so
it could be GC'ed before the DNS response arrives, and then I'd get a
core dump when the callback is called. I don't want to store "query"
in a Hash or Array since that requires inserting and deleting it
(wasted time); I just want "query" not to be GC'ed until the udns
callback function has been executed.

So, if I call rb_gc_mark(self) in the Query_submit_query_A() function,
would it prevent "query" from being GC'ed?
But in that case, how do I unmark it so it can be GC'ed after the query
completes? Wouldn't it leak otherwise? (Note that the Resolver instance
lives forever.)

Thanks a lot.

--
Iñaki Baz Castillo
<ibc@aliax.net>

After rechecking it I strongly think I must store the object "query"
in a Hash. If not, the object clearly "disappears" and could be
legitimately GC'ed at any time.

···

2011/2/6 Iñaki Baz Castillo <ibc@aliax.net>:

> But in my Ruby code above I don't store "query" in a hash or array, so
> it could be GC'ed before the DNS response arrives, and then I'd get a
> core dump when the callback is called. I don't want to store "query"
> in a Hash or Array since that requires inserting and deleting it
> (wasted time); I just want "query" not to be GC'ed until the udns
> callback function has been executed.

--
Iñaki Baz Castillo
<ibc@aliax.net>

I'd think the resolver object would hold on to all active queries until
completed.

···

On Sun, Feb 6, 2011 at 3:53 PM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

> After rechecking it I strongly think I must store the object "query"
> in a Hash. If not, the object clearly "disappears" and could be
> legitimately GC'ed at any time.

--
Tony Arcieri
Medioh! Kudelski

Yes, that works: the resolver object contains a hash attribute in
which queries are stored until completed.
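
A minimal pure-Ruby sketch of the bookkeeping described above, under stated assumptions: the Query and Resolver classes below are hypothetical stand-ins (the real Resolver is the C extension and the real Query includes EM::Deferrable), and the names submit, complete, pending_count and the integer request id are invented for illustration. The point is only that the hash entry is an ordinary Ruby reference, so the GC keeps the query alive while it is pending, and deleting the entry is all the "unmarking" that is needed.

```ruby
# Hypothetical stand-in for the Deferrable-based Query class.
class Query
  attr_reader :result

  def callback(&block)
    @callback = block
  end

  # Called by the resolver when the (simulated) DNS reply arrives.
  def succeed(result)
    @result = result
    @callback.call(result) if @callback
  end
end

# Hypothetical stand-in for the C-backed Resolver class.
class Resolver
  def initialize
    @pending = {}   # request id => Query; this reference keeps the Query alive
    @next_id = 0
  end

  # Register the query and return the id that would be handed to the C library.
  def submit(query)
    id = (@next_id += 1)
    @pending[id] = query
    id
  end

  # Simulates the C callback firing for request +id+.
  def complete(id, result)
    query = @pending.delete(id)   # drop the reference; the Query may now be GC'ed
    return false unless query     # unknown or already completed request
    query.succeed(result)
    true
  end

  def pending_count
    @pending.size
  end
end

resolver = Resolver.new
query = Query.new
query.callback { |result| puts "result = #{result.inspect}" }
id = resolver.submit(query)
query = nil                           # the caller drops its own reference
resolver.complete(id, ["192.0.2.1"])  # the callback still runs safely
```

In the C extension the same idea could presumably be expressed with rb_iv_get, rb_hash_aset and rb_hash_delete on the resolver's hash. Passing the integer id (rather than the VALUE) as the callback's user data is an assumption of this sketch, not something stated in the thread.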

···

2011/2/8 Tony Arcieri <tony.arcieri@medioh.com>:

>> After rechecking it I strongly think I must store the object "query"
>> in a Hash. If not, the object clearly "disappears" and could be
>> legitimately GC'ed at any time.

> I'd think the resolver object would hold on to all active queries until
> completed.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Use a ruby hash and supply mark and GC callbacks to Data_Wrap_Struct so the GC will keep track of it.

···

On Feb 8, 2011, at 12:53 AM, Iñaki Baz Castillo wrote:

> 2011/2/8 Tony Arcieri <tony.arcieri@medioh.com>:

>>> After rechecking it I strongly think I must store the object "query"
>>> in a Hash. If not, the object clearly "disappears" and could be
>>> legitimately GC'ed at any time.

>> I'd think the resolver object would hold on to all active queries until
>> completed.

> Yes, that works: the resolver object contains a hash attribute in
> which queries are stored until completed.

Thanks, but I don't understand why I must use a mark callback:

- My class uses Data_Wrap_Struct with an xxx_free function to
free the C structure when the object is GC'ed.

- An instance of my class contains an @hash attribute (which is set
to an empty hash in the initialize method).

- That @hash is populated with some normal Ruby objects at runtime.
Nothing special here.

- So when my instance is GC'ed, at some point Ruby will GC the @hash
attribute and also the objects it contains.

Then... why do I need a mark callback? Maybe I'm missing something, but
I tested my code under high load and it doesn't seem to leak.
Thanks a lot.

···

2011/2/8 Eric Hodel <drbrain@segment7.net>:

>> Yes, that works: the resolver object contains a hash attribute in
>> which queries are stored until completed.

> Use a ruby hash and supply mark and GC callbacks to Data_Wrap_Struct so the GC will keep track of it.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Since you use the instance variable you do not need the mark callback. The mark callback is only required if you are storing ruby objects in a structure that hides them from ruby.

···

On Feb 8, 2011, at 3:18 PM, Iñaki Baz Castillo wrote:

> 2011/2/8 Eric Hodel <drbrain@segment7.net>:

>>> Yes, that works: the resolver object contains a hash attribute in
>>> which queries are stored until completed.

>> Use a ruby hash and supply mark and GC callbacks to Data_Wrap_Struct so the GC will keep track of it.

> Thanks, but I don't understand why I must use a mark callback:

> - My class uses Data_Wrap_Struct with an xxx_free function to
> free the C structure when the object is GC'ed.

> - An instance of my class contains an @hash attribute (which is set
> to an empty hash in the initialize method).

> - That @hash is populated with some normal Ruby objects at runtime.
> Nothing special here.

> - So when my instance is GC'ed, at some point Ruby will GC the @hash
> attribute and also the objects it contains.

> Then... why do I need a mark callback? Maybe I'm missing something, but
> I tested my code under high load and it doesn't seem to leak.
> Thanks a lot.

Ok. So if for example my class stores some VALUE objects in a pure C
array, and these objects are not referenced at Ruby level, then I must
mark them. If not, Ruby GC could remove them from memory at any time,
am I right?

Thanks a lot.

···

2011/2/9 Eric Hodel <drbrain@segment7.net>:

>> Then... why do I need a mark callback? Maybe I'm missing something, but
>> I tested my code under high load and it doesn't seem to leak.
>> Thanks a lot.

> Since you use the instance variable you do not need the mark callback. The mark callback is only required if you are storing ruby objects in a structure that hides them from ruby.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Correct.

···

On Feb 8, 2011, at 4:29 PM, Iñaki Baz Castillo wrote:

> 2011/2/9 Eric Hodel <drbrain@segment7.net>:

>>> Then... why do I need a mark callback? Maybe I'm missing something, but
>>> I tested my code under high load and it doesn't seem to leak.
>>> Thanks a lot.

>> Since you use the instance variable you do not need the mark callback. The mark callback is only required if you are storing ruby objects in a structure that hides them from ruby.

> Ok. So if for example my class stores some VALUE objects in a pure C
> array, and these objects are not referenced at Ruby level, then I must
> mark them. If not, Ruby GC could remove them from memory at any time,
> am I right?

> Thanks a lot.