Threads vs Processes

I have just switched back to Windows from the Mac/Linux world and my
first non-web Ruby project is a program to manage a bunch of independent
backup tasks ... the spec calls for allowing many tasks to be queued but
to only allow a certain number to run at a time. I don't foresee the
tasks needing to share any data with the parent process or each other
but that might be an option to keep open...

My first thought is to do something along the lines of a thread pool and
have each task as a thread... but then I thought of processes... I am
admittedly a bit fuzzy on the distinction between processes and threads
in Ruby, especially on Windows... can someone shine some light on this
for me?

Would one approach be easier to manage in this type of scenario? Are
there any performance or portability issues I should be aware of?

Thanks much!
Tim


--
Posted via http://www.ruby-forum.com/.

You might take a look at SizedQueue. There's a good description of it in
the book "The Ruby Programming Language". I used it as a work queue of
remote files to request; a set of concurrent FTP connection threads
pulled from it to go get them.
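A minimal sketch of that producer/consumer pattern, assuming hypothetical file names and a `sleep` standing in for the actual FTP transfer. The SizedQueue caps how many items can wait at once (pushing blocks when it is full), and a fixed number of worker threads caps how many transfers run at a time:

```ruby
require 'thread' # Queue/SizedQueue live here on 1.8/1.9; already loaded on newer Rubies

# Hypothetical work items standing in for the remote files to fetch.
files   = (1..8).map { |i| "file#{i}.dat" }
workers = 3
stop    = :stop                       # sentinel that tells a worker to quit

queue   = SizedQueue.new(workers * 2) # push blocks while this many items are waiting
fetched = Queue.new                   # thread-safe collection of finished items

threads = Array.new(workers) do
  Thread.new do
    while (item = queue.pop) != stop
      sleep 0.05                      # stand-in for the real FTP transfer
      fetched << item
    end
  end
end

files.each { |f| queue << f }         # producer: blocks whenever the queue is full
workers.times { queue << stop }       # one sentinel per worker
threads.each(&:join)

done = []
done << fetched.pop until fetched.empty?
puts "fetched #{done.size} files"     # prints "fetched 8 files"
```

Pushing one sentinel per worker lets every thread leave its loop cleanly, so `join` returns once the queue is drained.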

I did notice, as forewarned, that performance degraded after a while
when using the standard (MRI) Ruby threads. If you're on Windows, though,
you might give IronRuby a try (or JRuby, if it's installed). Those map
Ruby threads onto their platform's own thread classes (.NET and Java,
respectively), and they worked very well.


> I have just switched back to Windows from the Mac/Linux world and my
> first non-web Ruby project is a program to manage a bunch of independent
> backup tasks ... the spec calls for allowing many tasks to be queued but
> to only allow a certain number to run at a time. I don't foresee the
> tasks needing to share any data with the parent process or each other
> but that might be an option to keep open...
>
> My first thought is to do something along the lines of a thread pool and
> have each task as a thread... but then I thought of processes... I am
> admittedly a bit fuzzy on the distinction between processes and threads
> in Ruby, especially on Windows... can someone shine some light on this
> for me?

Processes are fairly independent. Threads share the same memory space, and if the process exits they are all gone. In Ruby 1.8 there was only one OS-level thread that did the work for all Ruby threads, so you could not make good use of multiple cores that way. OTOH, if your threads just control external programs that you execute (e.g. via "system" or "IO.popen"), then the single thread might be sufficient. In 1.9 things have improved (Ruby threads are now native OS threads), but a global interpreter lock still keeps more than one thread from running Ruby code at the same time. Using JRuby with real threads is also an option.
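To make the memory difference concrete, here is a small sketch (not from the original reply; the variable names are illustrative). A change made inside a thread is visible to the rest of the program, while a change made in a forked child only touches the child's copy. The fork part is guarded because native Windows Ruby has no fork:

```ruby
# Threads share the parent's memory: a change made inside a thread is visible afterwards.
shared = []
t = Thread.new { shared << :from_thread }
t.join
puts "after thread: #{shared.inspect}"   # prints "after thread: [:from_thread]"

# A forked child gets its own copy of memory, so its change never reaches the parent.
# fork is not available on native Windows Ruby, hence the guard.
if Process.respond_to?(:fork)
  pid = fork { shared << :from_child }   # runs only in the child process
  Process.wait(pid)
  puts "after child:  #{shared.inspect}" # still only the thread's entry
else
  puts "fork not supported on this platform"
end
```

If the backup tasks ever do need to report data back, threads can simply write to shared objects (with a Mutex), whereas processes would need a pipe, a file, or some other IPC channel.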

> Would one approach be easier to manage in this type of scenario? Are
> there any performance or portability issues I should be aware of?

Performance-wise, and from a robustness point of view, multiple processes are probably better. AFAIK the Windows version of Ruby does not support "fork" (unless you are using Cygwin), so there you might rather want to use threads.
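For the thread route on Windows, a hedged sketch of a fixed-size pool where each thread launches an external program with `system`, which fits the "threads just control external programs" case. It assumes Ruby 1.9 or later; the jobs are hypothetical stand-ins (child Ruby processes started via `RbConfig.ruby` that sleep briefly), and in a real program they would be the actual backup command lines:

```ruby
require 'rbconfig'
require 'thread'

# Hypothetical "backup" jobs: each runs a child Ruby that sleeps briefly.
# Job 2 exits with status 1 to show how a failure is reported.
ruby     = RbConfig.ruby
commands = (0...6).map { |i| [ruby, "-e", "sleep 0.2; exit(#{i == 2 ? 1 : 0})"] }
limit    = 2                          # at most this many jobs run at a time

jobs = Queue.new
commands.each_with_index { |cmd, i| jobs << [i, cmd] }
results = {}                          # job index => true/false returned by system
lock    = Mutex.new

workers = Array.new(limit) do
  Thread.new do
    loop do
      begin
        i, cmd = jobs.pop(true)       # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break                         # no jobs left: this worker is done
      end
      ok = system(*cmd)               # this thread waits while the external program runs
      lock.synchronize { results[i] = ok }
    end
  end
end
workers.each(&:join)
results.keys.sort.each { |i| puts "job #{i}: #{results[i] ? 'ok' : 'failed'}" }
```

All jobs are queued before the workers start, so a non-blocking pop that finds the queue empty reliably means the work is finished.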

Using processes is fairly easy; you can try it out with something like this (it uses fork, so it needs Linux/Mac or Cygwin rather than the native Windows Ruby):

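#! /usr/bin/env ruby19

# Print a log line tagged with the current process id and time.
def log msg
  printf "pid %5d %-10s %s\n", $$, Time.now, msg
end

# Ten dummy tasks, each "working" (sleeping) for 2 to 6 seconds.
tasks = (1..10).map { 2 + rand(5) }

limit = 2        # maximum number of child processes running at once
processes = []   # pids of the children currently running

log "starting"

tasks.each do |t|
  # At the limit: wait for any child to finish and forget its pid.
  if processes.size == limit
    processes.delete Process.wait
  end

  # fork runs the block in a new child process and returns its pid to the parent.
  processes << fork do
    log "start #{t}"
    sleep t
    log "end #{t}"
  end
end

log "all started"
Process.waitall   # reap the remaining children
log "done"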
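The fork-based script above needs a Unix-like system. As a hedged variant (not part of the original reply), the same throttling can be done with `Process.spawn` (Ruby 1.9+), which starts a new program instead of copying the current process and should therefore also work on Windows. The child command here is a hypothetical stand-in (a child Ruby started via `RbConfig.ruby` that sleeps), and the durations are shortened to tenths of a second:

```ruby
require 'rbconfig'

def log msg
  printf "pid %5d %-10s %s\n", $$, Time.now, msg
end

tasks    = (1..6).map { (1 + rand(3)) / 10.0 } # short sleeps so the demo is quick
limit    = 2
running  = {}                                  # child pid => task duration
finished = []

log "starting"
tasks.each do |t|
  if running.size == limit
    pid = Process.wait                         # wait for any child; returns its pid
    finished << running.delete(pid)
  end
  pid = Process.spawn(RbConfig.ruby, "-e", "sleep #{t}") # stand-in for a backup command
  running[pid] = t
  log "start #{t} (child pid #{pid})"
end

log "all started"
Process.waitall.each { |pid, _status| finished << running.delete(pid) }
log "done"
```

Keeping a pid => task hash (instead of a plain array of pids) makes it easy to tell which task a finished child belonged to, e.g. to report its exit status.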

Kind regards

  robert


On 02/21/2010 01:50 AM, Tim Ferrell wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/