Basic threading question: can ruby use real threads?

I've read somewhere, and would love for it to be wrong, that ruby
doesn't use real threads, that it handles its threads internally. Is
that true? If it is true, will that still be true when ruby 1.9 or
2.0 comes out?

For many systems this isn't a big deal one way or the other, since
they only have one physical processor. Luckily(?) pretty much all my
systems have two procs. (Two real processors, not HT, but that's a
debate for another day.) I'd like to write some threaded ruby code,
and have it spread across my cpus, share data structures etc.

I'm used to pthreads on UNIX systems :) so I'd _really_ like it if I
could do the same type of things I've done before, just in a rubyish
sort of way. Setting up a shared memory area and all that jazz that
you have to do for forking really doesn't sound like fun, especially
when the point of the code I wanna write _is_ for fun.

Thanks,
         Kyle

This was recently discussed in detail by the creators:

James Edward Gray II

···

On May 8, 2007, at 3:52 PM, Kyle Schmitt wrote:

I've read somewhere, and would love for it to be wrong, that ruby
doesn't use real threads, that it handles its threads internally. Is
that true? If it is true, will that still be true when ruby 1.9 or
2.0 comes out?

Quoting Kyle Schmitt <kyleaschmitt@gmail.com>:

I've read somewhere, and would love for it to be wrong, that ruby
doesn't use real threads, that it handles its threads internally. Is
that true?

You have heard correctly and yes it is a pain.

···

--
Posted via http://www.ruby-forum.com/.

Sweet, thanks for the link!

OK, so I'm reading that article, and I'm getting three things from it:
YARV uses native threads.
YARV doesn't run them simultaneously.
YARV will eventually run them simultaneously.

Good enough for me, I'll just hope that writing threaded code doesn't
change too much with ruby2.0/YARV.

--Kyle

Well, you can use the fastthread gem (part of Mongrel).
You can also fork your script. Threads usually execute on the same
processor, AFAIK, which is why you have to fork your script if you want
to use 2 processors. If you need communication between the processes,
consider using DRb.

Another very good gem is slave. It makes creating new processes super
easy and provides an easy way to communicate, so you can create 4-6 new
processes, each of which gets data to compute from the mother process,
and they'll use both processors.

Sorry for lots of randomness and strange grammar - too much caffeine.
To summarize - read the rdoc for these gems:
- fastthread(s)
- slave(s)
(I never remember if they are plural or singular)

···

On Tuesday 08 May 2007 21:34, Kyle Schmitt wrote:

OK, so I'm reading that article, and I'm getting three things from it:
YARV uses native threads.
YARV doesn't run them simultaneously.
YARV will eventually run them simultaneously.

Good enough for me, I'll just hope that writing threaded code doesn't
change too much with ruby2.0/YARV.

--Kyle

--
Marcin Raczkowski
---
Friends teach what you should know
Enemies Teach what you have to know

fastthread just makes the locking primitives from thread.rb a little faster; it doesn't otherwise affect the operation of Ruby threads. Additionally, it is applicable only to Ruby 1.8, not YARV/1.9.
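
For anyone unsure which primitives are meant here, a minimal sketch of Queue from thread.rb follows. The producer/consumer setup and the :done sentinel are made up for illustration and are not taken from fastthread or Mongrel.

```ruby
require 'thread' # no-op on current Rubies, needed on 1.8

queue   = Queue.new
results = []

# Consumer thread: blocks in Queue#pop until an item is available.
consumer = Thread.new do
  while (item = queue.pop) != :done # :done is an arbitrary sentinel
    results << item * 2
  end
end

# Producer (the main thread) hands work over through the queue.
[1, 2, 3].each { |n| queue << n }
queue << :done

consumer.join
p results
```

Queue does its own locking, so no explicit Mutex is needed around the hand-off.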

-mental

···

On Thu, 10 May 2007 01:00:04 +0900, Marcin Raczkowski <swistak@mailx.expro.pl> wrote:

well you can use fastthreads gem (part of mongrel)

I didn't say it makes use of POSIX threads - I just recommended it because
they are, well... faster.

The only thing right now that'll let you use both processors is fork.

···

On Wednesday 09 May 2007 18:27, MenTaLguY wrote:

On Thu, 10 May 2007 01:00:04 +0900, Marcin Raczkowski <swistak@mailx.expro.pl> wrote:
> well you can use fastthreads gem (part of mongrel)

fastthread just makes the locking primitives from thread.rb a little
faster; it doesn't otherwise affect the operation of Ruby threads.
Additionally, it is applicable only to Ruby 1.8, not YARV/1.9.

-mental

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?

If someone were to... write a C extension whose objects were threaded,
via pthreads, would it be a nightmare?

Even just typing that line almost scares me... but I can think of some
clean(ish?) ways of doing it. I'm just worried I'd lose the rubyness
of the thing if I did it that way.

Thanks,
          Kyle

Just my opinion but my default choice would be fork when I need
concurrency rather than threads. The main reason is that it forces you
to be explicit in how you structure the communication between processes.
One process can't inadvertently change the state of another.
On a multi-processor box you'll get IO multiplexing and real CPU
concurrency automatically with fork.
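
A rough sketch of what that explicit structure can look like, assuming a Unix-like platform where fork is available. The work split (summing two ranges) and all variable names are invented for illustration.

```ruby
# Each child computes a partial sum and sends it back over its own pipe.
chunks = [(1..50_000), (50_001..100_000)]

children = chunks.map do |range|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    partial = range.inject(0) { |sum, n| sum + n }
    writer.write(Marshal.dump(partial)) # explicit, serialized hand-off
    writer.close
    exit!(0) # leave the child without running the parent's at_exit hooks
  end
  writer.close # the parent only reads from this pipe
  [pid, reader]
end

total = children.inject(0) do |sum, (pid, reader)|
  partial = Marshal.load(reader.read) # read returns once the child closes its end
  reader.close
  Process.wait(pid)
  sum + partial
end

puts total
```

Nothing is shared after the fork; the only data crossing the process boundary is what gets written to the pipe.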

Some problems can't be partitioned easily into separate address spaces,
in which case threads are a better choice. Even then I might consider
using shared memory among cooperating processes first.

I realize that the Unix fork/exec model of processes doesn't quite apply
in the Windows environment. Anecdotal evidence makes me think that
Windows programmers tend to reach for threads as a multi-tasking solution
more often than Unix programmers.

One more observation. The desire for real concurrency using multiple
processors is great for problems that can be cleanly partitioned, but if
you have a problem that requires concurrent access to shared data then
you'll have to keep in mind the memory/cache contention that will be
created when processing is distributed across multiple processors (via
processes or threads).

Gary Wright

···

On May 9, 2007, at 2:57 PM, Marcin Raczkowski wrote:

I didn't say it makes use of POSIX threads - I just recommended it because
they are, well... faster.

The only thing right now that'll let you use both processors is fork.

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?

Totally RPC. You could use DRb to do this in a Rubyesque fashion.
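
A minimal sketch of the DRb route, assuming a Unix-like system with fork. The Squarer class, the loopback address, and the use of port 0 (let the OS pick a free port) are choices made for this example only.

```ruby
require 'drb/drb'

# Hypothetical service object; it lives in the child process only.
class Squarer
  def square(n)
    n * n
  end
end

reader, writer = IO.pipe

server_pid = fork do
  reader.close
  trap('TERM') { exit!(0) }
  DRb.start_service('druby://127.0.0.1:0', Squarer.new)
  writer.puts DRb.uri # tell the parent which port was picked
  writer.close
  DRb.thread.join     # serve requests until terminated
end

writer.close
uri = reader.gets.chomp
reader.close

# The parent talks to the child's object through a proxy.
remote  = DRbObject.new_with_uri(uri)
results = [2, 3, 4].map { |n| remote.square(n) }

Process.kill('TERM', server_pid)
Process.wait(server_pid)

p results
```

Arguments and return values are marshalled across the connection, so each call is a round trip rather than a shared-memory access.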

It's worth noting that no matter _what_ threading approach you use, it's
absolutely best to minimize the number of objects shared between threads.

If someone were to... write a C extension whose objects were threaded,
via pthreads, would it be a nightmare?

Yes, somewhere between nightmare and flesh-rending terror. At least if you're
planning on manipulating Ruby objects from each thread.

You might want to consider using JRuby instead. It's compatible enough with MRI
that it runs Rails, and it uses "real" threads for multi-CPU goodness.

-mental

···

On Thu, 10 May 2007 04:03:57 +0900, "Kyle Schmitt" <kyleaschmitt@gmail.com> wrote:

As I mentioned earlier, the easiest way to get REAL concurrency is to use
the slave gem. (The Java VM will NOT use both processors - for a few
reasons the Java VM ALWAYS uses one processor - so scaling, for example,
Tomcat in a production environment requires running 2-4 Java VMs.) I'm
using slave in my project for concurrent parsing of logs. The DRb overhead
is not big, and what's more, you can spread the work over a few machines
if you want to scale it further.

http://www.codeforpeople.com/lib/ruby/slave/slave-1.2.1/

Creating new forks is really easy, and you can create just one class for
the data processing that can run concurrently; everything else can be
done in the main program.

···

On Wednesday 09 May 2007 19:20, Gary Wright wrote:

Just my opinion but my default choice would be fork when I need
concurrency rather than threads. The main reason is that it forces you
to be explicit in how you structure the communication between processes.
One process can't inadvertently change the state of another.
On a multi-processor box you'll get IO multiplexing and real CPU
concurrency automatically with fork.

Manipulating ruby objects from inside the threads would be the idea in
some cases I'm thinking of... so it looks like JRuby until YARV gets
concurrent threads... and ooh do I hope it does.

Will the threading interface be drastically different between
MRI/JRuby/YARV? I.e., does anyone know: if I code on MRI, will it
automatically use real threads on JRuby, or will I have to re-code
some parts to get that?

Thanks again,
Kyle

Have you got evidence for this? I do not believe it to be the case for a
non-green-threaded JVM.

-mental

···

On Thu, 10 May 2007 18:45:47 +0900, Marcin Raczkowski <swistak@mailx.expro.pl> wrote:

As I mentioned earlier - easiest way to get REAL concurrency (Java VM will
NOT use both processors - for few reasons Java VM ALWAYS uses one processor -

Yes.

The APIs are the same between MRI and JRuby, though JRuby deliberately hedges
on the implementation of certain unsafe features like Thread#kill, Thread#raise,
and Thread.critical=.
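
As a rough illustration, code along these lines sticks to the common subset (Thread.new, Thread#join, Mutex#synchronize) and stays away from Thread#kill, Thread#raise, and Thread.critical=. The counter and the thread count are made up for the example.

```ruby
require 'thread' # no-op on current Rubies, needed on 1.8

counter = 0
lock    = Mutex.new

# Four threads bump a shared counter; the Mutex protects the read-modify-write.
threads = (1..4).map do
  Thread.new do
    1_000.times do
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each { |t| t.join }
puts counter
```

The locking matters most on an implementation that really runs threads in parallel, which is exactly where an unprotected `counter += 1` would start losing updates.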

-mental

···

On Thu, 10 May 2007 06:59:33 +0900, "Kyle Schmitt" <kyleaschmitt@gmail.com> wrote:

does anyone know if I code on MRI will it automatically use real threads on JRuby,

MenTaLguY wrote:

···

On Thu, 10 May 2007 18:45:47 +0900, Marcin Raczkowski <swistak@mailx.expro.pl> wrote:

As I mentioned earlier - easiest way to get REAL concurrency (Java VM will
NOT use both processors - for few reasons Java VM ALWAYS uses one processor -

Have you got evidence for this? I do not believe it to be the case for a
non-green-threaded JVM.

The OP is incorrect. Java VMs always use all cores in the system, except in a very few specialized VMs that are green threaded.

Even if we're talking about only one thread of execution, there's still the GC thread which generally runs in parallel.

- Charlie

The APIs are the same between MRI and JRuby, though JRuby deliberately
hedges on the implementation of certain unsafe features like
Thread#kill, Thread#raise, and Thread.critical=.

Thread#raise, "unsafe"? It is the most useful thread-related functionality
I've seen since I started using threads! It allows, for instance, handling
a failing rendezvous the proper way (by using exceptions).

Could you tell us why you think it is "unsafe"?

···

--
Sylvain Joyeux

The APIs are the same between MRI and JRuby, though JRuby deliberately
hedges on the implementation of certain unsafe features like
Thread#kill, Thread#raise, and Thread.critical=.

Thread#raise, "unsafe"? It is the most useful thread-related functionality
I've seen since I started using threads! It allows, for instance, handling
a failing rendezvous the proper way (by using exceptions).

Could you tell us why you think it is "unsafe"?

Hi,

I'm not sure if this is what MenTaLguY meant, but one way that
Thread#raise is unsafe is that it can raise an exception in the
specified thread while that thread is executing an 'ensure' block.

This can leave critical resources, such as locks on mutexes, without
proper cleanup, because some or all of the code in the ensure block
is skipped.

I first ran into this when I tried to use timeout{} to implement
a ConditionVariable#timed_wait, like:

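  require 'thread'
  require 'timeout'

  class ConditionVariable
    # Wait on the condition, giving up after timeout_secs seconds.
    def timed_wait(mutex, timeout_secs)
      # THIS IS UNSAFE: the timeout exception can arrive inside an ensure block
      Timeout.timeout(timeout_secs) { wait(mutex) }
    end
  end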

Note that 'timeout' functions by creating a temporary new thread
which sleeps for the duration, then raises an exception in the
'current' thread that invoked timeout.

If the timeout raises its exception at an unlucky moment, the
various internals of ConditionVariable#wait and Mutex#synchronize
that depend on ensure blocks to restore their class invariants are
skipped, resulting in nasty things like a permanently locked mutex.
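
A small sketch of the failure mode, independent of timeout. The sleep stands in for slow cleanup, the flag and queue names are invented, and the report_on_exception line assumes Ruby 2.5 or later.

```ruby
require 'thread'

cleanup_finished = false
in_ensure = Queue.new

worker = Thread.new do
  Thread.current.report_on_exception = false # keep stderr quiet (Ruby 2.5+)
  begin
    # ... normal work would go here ...
  ensure
    in_ensure << true        # tell the main thread we are inside ensure
    sleep 0.5                # stand-in for slow cleanup, e.g. unlocking
    cleanup_finished = true  # skipped if an exception arrives first
  end
end

in_ensure.pop # wait until the worker has entered its ensure block
worker.raise(RuntimeError, 'timed out')

begin
  worker.join # join re-raises the exception that ended the thread
rescue RuntimeError => e
  puts "worker died with: #{e.message}"
end

puts "cleanup finished? #{cleanup_finished}"
```

The exception lands while the worker is still inside ensure, so the rest of the cleanup never runs. A mutex unlock sitting in that spot would be skipped the same way.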

Not fun... :(

Regards,

Bill

···

From: "Sylvain Joyeux" <sylvain.joyeux@polytechnique.org>