Dear All,
I have created a Ruby extension that runs under both Windows (Ruby
1.8.4) and Linux (Ruby 1.8.5), using SWIG. The extension is written in
C and is multi-threaded, thanks to PThreads. Several C functions are
exported to Ruby to perform various operations.
One of the C functions is a callback notifier whose usage is as
follows:
1) The main Ruby program must first invoke a native C function:
Libext.ext_notifier_add(dhtid, proc, "method@class", 0), where
"method@class" is a dummy argument normally used by the Java version of
the extension, and the key parameter is proc, the address of a Ruby
block such as: proc = Proc.new { | value | Libext.ext_put(id, key,
value, index) } . This registers a Ruby callback with the C extension.
2) The main Ruby program will also invoke C native functions that will
result in PThreads being created to receive and process UDP messages.
On average, some 50 PThreads are active at any time inside the C
extension. There are many separate instances of the Ruby program
(processes) running simultaneously on the same or different (networked)
computers.
3) When a Ruby program calls Libext.ext_put(id, key, value, index), a
message is sent to another node. On the destination node, upon
receiving the message, some parameters are evaluated and, if applicable,
the registered Ruby callback is invoked with:
    rb_thread_critical = Qtrue;
    res = rb_funcall(proc, rb_intern("call"), 4,
                     rb_str_new2(triggerstr), rb_str_new2(evalstr),
                     INT2NUM(index), LONG2NUM(state->id));
    rb_thread_critical = 0;
4) The callback will normally contain only fairly simple statements but
may also need to invoke native C functions (as in the example
above).
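To make the flow in steps 1) to 4) concrete, here is a minimal pure-Ruby
sketch. FakeLibext and simulate_incoming are hypothetical stand-ins
invented for illustration only; the real Libext is native code and works
differently internally. The block takes four parameters to match the four
arguments passed by rb_funcall in step 3); passing dhtid as the last
argument is an assumption.

```ruby
# Hypothetical pure-Ruby stand-in for the C extension (illustration only).
# The real Libext is native code generated with SWIG; nothing here reflects
# its actual implementation.
module FakeLibext
  @notifiers = {}

  # Mirrors the call shape Libext.ext_notifier_add(dhtid, proc, "method@class", 0).
  def self.ext_notifier_add(dhtid, callback, _java_target, _flags)
    @notifiers[dhtid] = callback
    true
  end

  # Stands in for the native code that runs when a UDP message arrives and
  # the trigger conditions hold. It passes four values, like the rb_funcall
  # in step 3): trigger string, eval string, index, id (assumed to be dhtid).
  def self.simulate_incoming(dhtid, trigger, evalstr, index)
    callback = @notifiers[dhtid]
    return nil unless callback
    callback.call(trigger, evalstr, index, dhtid)
  end
end

received = []
callback = Proc.new { |trigger, value, index, id| received << [trigger, value, index, id] }

FakeLibext.ext_notifier_add(42, callback, "method@class", 0)
FakeLibext.simulate_incoming(42, "key == 'a'", "value-1", 0)
```

In this sketch everything runs on the single Ruby thread, which is
exactly what the real extension does not do: there, the call into the
block originates from one of the PThreads.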
So far I have seen the following problems with this solution:
a) cross-thread violation on rb_thread_schedule() : this happens on the
Windows version of the C extension when NO native functions are invoked
from the callback and rb_thread_critical = Qtrue; is not used.
Under heavy load of incoming UDP messages from other nodes, the
callback execution produces that error. It seems that the Ruby
scheduler preempts the execution of the Ruby callback and then
determines that a native thread invoked it (a bad thing, it seems).
Adding rb_thread_critical = Qtrue; makes it impossible for the
scheduler to change the thread of execution and eliminates the error,
except for roughly once every several hundred callbacks under sustained
load.
b) SystemStackError: stack level too deep : this happens exclusively on
Linux (Debian) after only a few callbacks when NO native functions are
invoked and rb_thread_critical = Qtrue; is used. The C extension uses
fairly deeply nested function calls (with several layers of callbacks).
I assume that Ruby somehow (how?) detects that the stack has grown past
permitted levels (which are?) and throws that error.
Now coming to the main issue... I would like to get rid of problem b)
and one possible solution that occurred to me is to create a native
PThread that will perform the Ruby callback and exit when all Ruby and
native method/function calls have completed. This seems possible under
Java JNI through the use of AttachCurrentThread(), which temporarily
attaches a native thread to the JVM as a Java thread.
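The shape of what I have in mind can be sketched at the Ruby level, with
a Ruby Thread standing in for the native PThread. The helper name
run_callback_on_fresh_thread is hypothetical, and the sketch says nothing
about whether the interpreter tolerates being entered from a real PThread,
which is the actual open question.

```ruby
# Runs the callback on a dedicated, short-lived thread and waits for it.
# A Ruby Thread stands in for the native PThread here (assumption made
# purely for illustration).
def run_callback_on_fresh_thread(callback, *args)
  worker = Thread.new { callback.call(*args) }
  worker.value # joins the thread and returns the block's result
end

callback = Proc.new { |trigger, value, index, id| "#{id}:#{index}:#{value}" }
result = run_callback_on_fresh_thread(callback, "trigger", "v", 3, 7)
```

The thread exits as soon as the block returns, which matches the "exit
when all calls have completed" part of the idea.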
However, I have not found any similar function in Ruby. Besides, I
have read plenty of comments saying that mixing native threads with the
Ruby interpreter is a bad idea. As a side note, all the testing has been
done from the command line using ruby.exe, although under Windows I will
soon have to migrate to Active Ruby.
Being new to all this Ruby internals coding, I would like to solicit
your feedback and advice on how I could tackle this problem. I hope I
have overlooked something that will make my life easier. Overall,
Ruby has proven a fantastic language to develop in, but I fear that
compared to JNI the support for native threads is severely lacking.
Cheers,
Serge Kruppa
CTO - Simitel