Thread.critical and segfaults calling into Ruby callbacks from C

hi,

i just ran into an interesting segfault this weekend.

i have a Ruby application that uses my D-BUS bindings.

to receive messages, i register a Proc object with the extension API,
which then registers a wrapper callback in C which does:

rb_funcall(proc_obj_value, rb_intern("call"), n, ...);

this is all fine, and works well.

as it happens though, one of my callbacks now does something which
changes hardware status, causing HAL to emit a message which will be
delivered to this same callback.

now, i'm not entirely sure whether its being delivered from another OS
thread, i suspect it may be (its invocation is being dispatched by
D-BUS itself by way of GLib).

if it is being called from another OS thread, what would be the safest
way to ensure callback invocation is serialized? Thread.critical?

on a slower machine, the code as it stands causes an interpreter
coredump, with the following backtrace:

#0 0x4015acdf in raise () from /lib/tls/libc.so.6
#1 0x4015c4e5 in abort () from /lib/tls/libc.so.6
#2 0x40039afe in rb_bug (fmt=0x0) at error.c:214
#3 0x4009f262 in sigsegv (sig=11) at signal.c:446
#4 <signal handler called>
#5 0x400a18c3 in st_lookup (table=0x812b7d0, key=5209,
value=0xbfffd7c8) at st.c:259
#6 0x4003cd66 in search_method (klass=1087036240, id=5209,
origin=0xbfffd7f4) at eval.c:380
#7 0x4003cdc1 in rb_get_method_body (klassp=0xbfffd850,
idp=0xbfffd834, noexp=0xbfffd838) at eval.c:401
#8 0x40047d6c in rb_call (klass=1087036240, recv=1087038460,
mid=5209, argc=2, argv=0xbfffd868, scope=1) at eval.c:5713
#9 0x400481dc in vafuncall (recv=1087038460, mid=1769366852, n=2,
ar=0xbfffd8a4) at ruby.h:631
#10 0x4004825c in rb_funcall (recv=1769366852, mid=1769366852,
n=1769366852) at eval.c:5831
#11 0x40302fe0 in _real_call_filter_cb (args=3221215760) at
ruby-dbus-connection.c:598
#12 0x40046d19 in rb_protect (proc=0x40302f50 <_real_call_filter_cb>,
data=3221215760, state=0xbfffda0c) at eval.c:5163
#13 0x4030308b in _on_filter_message (conn=0x8202678, msg=0x8152e78,
user_data=0x80c8f38) at ruby-dbus-connection.c:619
#14 0x40322e0c in dbus_connection_dispatch (connection=0x8202678) at
dbus-connection.c:2550
#15 0x40379a78 in gsource_connection_dispatch (source=0x8152d18,
callback=0x69766544, user_data=0x69766544) at dbus-gmain.c:258
#16 0x403a3c30 in unblock_source () from /usr/lib/libglib-2.0.so.0
#17 0x403a4c98 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#18 0x403a4fad in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#19 0x403a56cf in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#20 0x4084445f in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#21 0x4070b8d0 in Init_gtk_curve () from
/usr/lib/ruby/site_ruby/1.8/i386-linux/gtk2.so
#22 0x40054cce in call_cfunc (func=0x4070b8a0 <Init_gtk_curve+1376>,
recv=1076360312, len=135444432, argc=5209, argv=0x400cf620) at
eval.c:5384
#23 0x40047535 in rb_call0 (klass=1086822540, recv=1076360312,
id=5073, oid=1769366852, argc=0, argv=0x0, body=0x40c796d4, nosuper=0)
at eval.c:5516
#24 0x40047e54 in rb_call (klass=1086822540, recv=1076360312,
mid=5073, argc=0, argv=0x0, scope=0) at eval.c:5737
#25 0x4004275b in rb_eval (self=1076489492, n=0x69766544) at eval.c:3240
#26 0x4004782c in rb_call0 (klass=1086750040, recv=1076489492,
id=5105, oid=1769366852, argc=0, argv=0xbfffe66c, body=0x402c11d4,
nosuper=0)
    at eval.c:5644
#27 0x40047e54 in rb_call (klass=1086750040, recv=1076489492,
mid=5105, argc=1, argv=0xbfffe668, scope=0) at eval.c:5737
#28 0x4004275b in rb_eval (self=1076476352, n=0x69766544) at eval.c:3240
#29 0x40041dde in rb_eval (self=1076476352, n=0x69766544) at eval.c:3030
#30 0x40041f4c in rb_eval (self=1076476352, n=0x69766544) at eval.c:3079
#31 0x4003e7dc in ruby_exec_internal () at eval.c:1467
#32 0x4003e834 in ruby_exec () at eval.c:1487
#33 0x4003e875 in ruby_run () at eval.c:1497
#34 0x08048713 in main (argc=1769366852, argv=0x69766544,
envp=0xbffffad0) at main.c:46
(gdb)

this occurs at the same place every time, and i've tried three
different Ruby versions (a stable snapshot before -preview3, -preview3
itself, and the latest stable snapshot (20041211).

this leads me to believe i'm doing something unexpected by the interpreter.

modifying the callback's Ruby code to contain:

def cb
Thread.critical = true
....
ensure
Thread.critical = false
end

seems to fix the problem as well.

i'd appreciate if someone had a good explanation for the best thing to
do in this case. i'd prefer to do it at the C level so that users of
the bindings don't have to worry about this.

thanks!
leon

modifying the callback's Ruby code to contain:

def cb
Thread.critical = true

You may already be in a critical section here.

....
ensure
Thread.critical = false
end

def cb
   old_critical = Thread.critical = true
   # ...
ensure
   Thread.critical = old_critical
end

(GC.disable/enable works the same way.)

···

On 10 Dec 2004, at 22:27, leon breedt wrote:

Thanks for the tip.

It seems I spoke too quickly though, doing something as simply as add
$stderr.puts before and after the Thread.critical modifications brings
back the same segfault...Doing an 'info threads' in gdb also shows
only the one process.

Back to debugging...

···

On Sat, 11 Dec 2004 15:45:56 +0900, Eric Hodel <drbrain@segment7.net> wrote:

You may already be in a critical section here.

> ....
> ensure
> Thread.critical = false
> end

def cb
   old_critical = Thread.critical = true
   # ...
ensure
   Thread.critical = old_critical
end

(GC.disable/enable works the same way.)

It seems I spoke too quickly though, doing something as simply as add
$stderr.puts before and after the Thread.critical modifications brings
back the same segfault...Doing an 'info threads' in gdb also shows
only the one process.

Ruby's threads are 'green' so you won't see a second thread.

···

On 10 Dec 2004, at 23:11, leon breedt wrote:

Back to debugging...