Understanding "[BUG] cross-thread violation on rb_gc()"

Hello,

After some deep thought, I think I finally understand what the error
message "[BUG] cross-thread violation on rb_gc()" means. I will
present my understanding here; please correct me if I am wrong or
confirm if I am correct.

Suppose you have a C program which statically links with the Ruby
library and makes use of Ruby's C API functions. In particular, the
C program creates some Ruby strings from C strings using
StringValueCStr() and puts them inside a Ruby array.

So far so good. Now suppose that the C program wants to run an
embedded Ruby interpreter in the "background" while it is doing
other things. The C program does this by wrapping the ruby_run()
function (which never returns) inside a pthread.

Let's review the situation so far with a diagram:

{a process:

   main()

   {a pthread:

      ruby_run()
   }
}

Here we have the C program's main() function running in the
foreground and the embedded Ruby interpreter running in the background.

Now, suppose the embedded Ruby interpreter inside ruby_run() invokes
the garbage collector by calling the rb_gc() function. What happens?

There are two situations:

1. The garbage collector collects garbage created *only* by the
embedded Ruby interpreter inside ruby_run(). It ignores the garbage
created by the C program (remember that it created a Ruby array of
Ruby strings).

No problem here. The garbage collector frees up space for the
embedded Ruby interpreter and the interpreter merrily resumes
whatever it was doing.

2. The garbage collector tries to collect the Ruby array of Ruby
strings created by the C program. But wait a minute... the C program
which created that garbage is running in a *different* thread than
the embedded Ruby interpreter inside ruby_run()!

So, the garbage collector emits this error message and terminates
the Ruby interpreter (ruby_run): "[BUG] cross-thread violation on
rb_gc()".

That's my understanding of the error message. If I am correct, then
a solution to this problem would be to prohibit the C program from
using Ruby's C API functions. That way, the C program never creates
any Ruby objects which can be garbage collected by the embedded Ruby
interpreter that runs inside another thread.

What do you think?

Suraj N. Kurapati wrote:

In particular, the
C program creates some Ruby strings from C strings using
StringValueCStr() and puts them inside a Ruby array.

Whoops, my mistake. The C program creates Ruby strings from C
strings using rb_str_new2(), not by using StringValueCStr().

Hi,

At Sun, 16 Apr 2006 10:52:55 +0900,
Suraj N. Kurapati wrote in [ruby-talk:188996]:

Now, suppose the embedded Ruby interpreter inside ruby_run() invokes
the garbage collector by calling the rb_gc() function. What happens?

There are two situations:

1. The garbage collector collects garbage created *only* by the
embedded Ruby interpreter inside ruby_run(). It ignores the garbage
created by the C program (remember that it created a Ruby array of
Ruby strings).

There is no difference between objects created inside ruby_run()
and others. The ruby interpreter itself, which consists of the
core and many class libraries that use each other, is also just
one of the programs using the "embedded" interpreter.

2. The garbage collector tries to collect the Ruby array of Ruby
strings created by the C program. But wait a minute... the C program
which created that garbage is running in a *different* thread than
the embedded Ruby interpreter inside ruby_run()!

The point is where the object is stored.

Ruby GC scans the current machine stack to detect live objects,
from the bottom of it through the current top of it. Note that
the bottom is initialized to the position of the stack when
ruby_init() is called. If you run the GC from a native thread
other than the initial thread, the stack pointer will point into
another stack, and the GC will run off into the "gaps" between
those stacks.

That's my understanding of the error message. If I am correct, then
a solution to this problem would be to prohibit the C program from
using Ruby's C API functions. That way, the C program never creates
any Ruby objects which can be garbage collected by the embedded Ruby
interpreter that runs inside another thread.

Incorrect.

You have to ensure that rb_gc() runs only in the initial thread,
and that all objects created in your program are referenced from
somewhere other than the main stack.

···

--
Nobu Nakada

Hi,

nobu@ruby-lang.org wrote:

Ruby GC scans the current machine stack to detect live objects,
from the bottom of it through the current top of it. Note that
the bottom is initialized to the position of the stack when
ruby_init() is called. If you run the GC from a native thread
other than the initial thread, the stack pointer will point into
another stack, and the GC will run off into the "gaps" between
those stacks.

Ah! Now I see. Thank you. :-)

That's my understanding of the error message. If I am correct, then
a solution to this problem would be to prohibit the C program from
using Ruby's C API functions. That way, the C program never creates
any Ruby objects which can be garbage collected by the embedded Ruby
interpreter that runs inside another thread.

Incorrect.

You have to ensure that rb_gc() runs only in the initial thread,
and that all objects created in your program are referenced from
somewhere other than the main stack.

Could you suggest a way to do this?

I hoped that after ruby_init() was called, all new Ruby objects
would automatically use the stack pointer prepared by ruby_init().
Unfortunately, that was not so:

I tried the following scenario, where the C process (1) creates and
runs the Ruby pthread, and (2) creates a Ruby array. But when the C
process creates the Ruby array, the cross-thread violation occurs.
This means that new Ruby objects are always created with respect to
the stack pointer of the current thread/process, not with respect to
the stack pointer prepared by ruby_init().

{process:
  pthread_create();
  ...

  {pthread:
    ruby_init();
    ...
    ruby_run();
  }

  ...
  rb_ary_new(); // causes cross-thread violation on rb_gc()
}

At present, the only solution I know is to prohibit the C process
from creating Ruby objects. Is there a way that the C process can
create and use Ruby objects with respect to the stack pointer
prepared by ruby_init()?

Thanks.

nobu@ruby-lang.org wrote:

That's my understanding of the error message. If I am correct, then
a solution to this problem would be to prohibit the C program from
using Ruby's C API functions. That way, the C program never creates
any Ruby objects which can be garbage collected by the embedded Ruby
interpreter that runs inside another thread.

Incorrect.

You have to ensure that rb_gc() runs only in the initial thread,
and that all objects created in your program are referenced from
somewhere other than the main stack.

IMHO, the only practical way to implement your suggestion is
through this policy:

1. Code executed by the C process *cannot* use the Ruby C API.

2. Code executed by the Ruby thread *can* use the Ruby C API.

At heart, this policy simply prohibits using the Ruby C API as a
means of inter-process communication between the C process and the
Ruby thread. Other means of inter-process communication, such as
pipes or sockets, can be used instead.

Thus, the resulting scenario looks like this:

{process:
  // *never* use the Ruby C API here

  pthread_create();
  ...

  {pthread:
    // use Ruby C API here

    ruby_init();
    ...
    ruby_run();
  }

  ...
}

Thanks for your advice and explanations.

Suraj N. Kurapati wrote:

{process:
  pthread_create();
  ...

  {pthread:
    ruby_init();
    ...
    ruby_run();
  }

  ...
  rb_ary_new(); // causes cross-thread violation on rb_gc()
}

Attached to this e-mail is example code for the above scenario. If
you comment out the function call to rb_ary_new() inside main(), then
the cross-thread violation does not occur.

Here is what happens when the scenario is compiled and run:

$ gcc -Wall -g -I$(ruby -rmkmf -e 'puts $topdir') \
-l$(ruby -rmkmf -e 'puts CONFIG["RUBY_SO_NAME"]') \
-lpthread ctviol.c -o ctviol

$ ./ctviol
[BUG] cross-thread violation on rb_gc()
ruby 1.8.4 (2005-12-24) [i486-linux]

Aborted (core dumped)

Here is the environment in which I ran the scenario:

$ ruby -v
ruby 1.8.4 (2005-12-24) [i486-linux]

$ uname -a
Linux yantram 2.6.15-20-386 #1 PREEMPT Tue Apr 4 17:48:51 UTC 2006
i686 GNU/Linux

$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
max nice (-e) unlimited
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) unlimited
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Thanks for your consideration.

ctviol.c (515 Bytes)

Interestingly, the previous example code can be simplified into the
following:

#include <ruby.h>

int main(int argc, char** argv) {
  rb_ary_new(); // causes cross-thread violation on rb_gc()

  return 0;
}

Now the reason for the cross-thread violation is quite obvious: you
cannot create Ruby objects without initializing the Ruby library
first! So, a solution would look like this:

#include <ruby.h>

int main(int argc, char** argv) {
  ruby_init();
  rb_ary_new(); // no cross-thread violation!

  return 0;
}

I applied this same technique to the previous example code (with the
Ruby interpreter inside a pthread). As expected, the cross-thread
violation did not occur:

#include <ruby.h>
#include <pthread.h>

void* ruby_run_handshake(void* dummy) {
  ruby_init();
  ruby_init_loadpath();

  ruby_run(); // this never returns
  return NULL; // not reached; keeps -Wall quiet about the missing return
}

int main(int argc, char** argv) {
  // run a Ruby interpreter in the background
  pthread_t rubyThread;
  pthread_create(&rubyThread, 0, ruby_run_handshake, 0);

  // create some Ruby objects
  ruby_init();
  rb_ary_new(); // no cross-thread violation!

  return 0;
}

Now, I have some questions regarding the above code:

1. Is the Ruby "environment" running inside ruby_run_handshake()
different from the Ruby "environment" running inside main()?

2. Does ruby_init() inside main() override the stack pointer
prepared by ruby_init() inside ruby_run_handshake()?

3. If we create some Ruby arrays inside main(), will they be garbage
collected by the Ruby "environment" inside the pthread? vice versa?

Thanks for your attention.

Suraj N. Kurapati wrote:

I applied this same technique to the previous example code (with the
Ruby interpreter inside a pthread). As expected, the cross-thread
violation did not occur:

Unfortunately, this approach doesn't work for a larger program where
the Ruby thread creates so many Ruby objects that the garbage
collector is invoked---at which point the cross-thread violation occurs.

1. Is the Ruby "environment" running inside ruby_run_handshake()
different from the Ruby "environment" running inside main()?

From what I have observed, no.

3. If we create some Ruby arrays inside main(), will they be garbage
collected by the Ruby "environment" inside the pthread? vice versa?

From what I have observed, yes.