[ANN] process-group gem - concurrent processes with fibers

Hi Everyone,

`Process::Group` is a class for coordinating and managing multiple
processes which execute concurrently in fibers.

In some of my testing scripts, multiple processes need to run. In the past,
I've just done this sequentially. However, I've been modernising some
scripts and I've bundled up the code into this gem.

Previously:

Process.spawn("some-task")
process_and_email_results

Process.spawn("some-other-task --foobar")
process_and_email_results

Now I can run like this:

group = Process::Group.new

Fiber.new do
  group.spawn("some-task")
  process_and_email_results
end.resume

Fiber.new do
  group.spawn("some-other-task --foobar")
  process_and_email_results
end.resume

group.wait

Process::Group allows you to run the two tasks concurrently, and in cases
like this it was an easy way to modernise existing scripts. You can call
spawn multiple times in a fiber and it will work as expected, each call
running in sequence within that fiber. You can also kill the entire group
of processes if you wish.
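
For example, a minimal sketch (the commands are hypothetical, and the
signal argument to kill is an assumption modelled on Process.kill):

group = Process::Group.new

Fiber.new do
  # Sequential within this fiber: the second spawn only starts
  # after the first command has finished.
  group.spawn("make clean")
  group.spawn("make all")
end.resume

Fiber.new do
  # Runs concurrently with the fiber above.
  group.spawn("long-running-task")
end.resume

# group.kill(:TERM)  # assumed signature, mirroring Process.kill
group.wait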

Examples, documentation and code: https://github.com/ioquatix/process-group

Kind regards,
Samuel

What's the advantage over using threads, the old school way?

Thread.new do
   system("some-task")
   process_and_email_results
end

···

On 03/11/2014 12:17 AM, Samuel Williams wrote:

[…]

Or a system like http://celluloid.io which provides both threads and fibers
and can integrate with things like I/O reactors...

···

On Tue, Mar 11, 2014 at 6:56 PM, Joel VanderWerf <joelvanderwerf@gmail.com> wrote:

What's the advantage over using threads, the old school way?

--
Tony Arcieri

Threads are good, but I felt like I wanted something more predictable.
Also, not all implementations of Ruby use green threads, so you might have
synchronisation issues if you use shared global state (either directly or
indirectly through a gem/library).
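
The classic illustration is an unsynchronised read-modify-write on shared
state (illustrative only, not from the gem):

counter = 0

threads = 4.times.map do
  Thread.new { 10_000.times { counter += 1 } }
end
threads.each(&:join)

# On implementations with native threads (e.g. JRuby), the increments
# can interleave and be lost unless you add a Mutex; fibers never run
# two blocks at once, so no lock is needed there.
puts counter  # may print less than 40_000 without a lock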

···

On 12 March 2014 14:56, Joel VanderWerf <joelvanderwerf@gmail.com> wrote:

[…]

Celluloid looks pretty interesting - I've seen it pop up quite a few
times. A unix process group and a set of actors are two completely
different things (e.g. signal handling). I wanted something dead simple and
specific to what I was trying to do. I've also got some use-cases for which
Celluloid feels too heavy.

···

On 12 March 2014 15:00, Tony Arcieri <tony.arcieri@gmail.com> wrote:

[…]

You can avoid fibers/threads entirely, too.
Just a hash, lambdas, and waitpid2:

# tasks is a hash which maps pids to lambdas (callbacks):
tasks = {
  Process.spawn("some-task") => lambda do |status|
    process_and_email_results(status, "some task done!")
  end,
  Process.spawn("some-other-task --foobar") => lambda do |status|
    process_and_email_results(status, "some other task done!")
  end,
}

until tasks.empty?
  # Process.waitpid2(-1) blocks until any child exits and returns
  # [pid, Process::Status]:
  pid, status = Process.waitpid2(-1)
  if callback = tasks.delete(pid)
    callback.call(status)
  else
    warn "reaped unknown process: #{status.inspect}"
  end
end

Even green threads have this danger, don't they?

Taking over manual scheduling seems a bit awkward compared to using some kind of concurrency control (mutexes, queues, actors). What happens if application code inside the fiber (process_and_email_results in the example) makes a blocking IO call?

Manual scheduling with fibers is great for testing concurrent code which would otherwise run in threads, because you can force a certain kind of contention in a predictable way. I'm working on extracting a library for doing this from a project where it's been a useful technique.
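
A minimal sketch of the technique (the names are invented for
illustration):

require 'fiber'

log = []

a = Fiber.new { log << :a1; Fiber.yield; log << :a2 }
b = Fiber.new { log << :b1; Fiber.yield; log << :b2 }

# The test decides exactly when each "worker" runs, so the
# interleaving is identical on every run:
a.resume; b.resume; a.resume; b.resume

p log  # => [:a1, :b1, :a2, :b2]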

···

On 03/12/2014 06:22 AM, Samuel Williams wrote:

Threads are good but I felt like I wanted something more predictable.
Also, not all implementations of Ruby use green threads and therefore
might have synchronisation issues if you use (either directly or
indirectly through a gem/library) shared global state.

Even green threads have this danger, don't they?

Yes, but in this context, I'm actually not sure I'd call the manual
scheduling a danger. While it could be referred to as explicit scheduling,
I prefer to look at it as providing a specific, well defined, non-blocking
API with explicit synchronisation points.

(I think what I really like about fibers is they make it very easy to
compose concurrent code in a predictable way. For all intents and purposes,
the code is still sequential with very little overhead.)

Taking over manual scheduling seems a bit awkward compared to using some
kind of concurrency control (mutexes, queues, actors).

I would have said the opposite. Code using threads is typically very hard
to reason about compared to sequential code (like the API I've proposed).

Except in specific situations (e.g. game engines, data processing/access,
algorithms/compression), I find threading causes more problems than it
solves (e.g.
http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
). Even debugging code with threads can be a nightmare: why is there a
deadlock? why is there memory corruption? etc. The only situations where
I've seen this work well in general are languages/environments designed
from the ground up to support parallel processing (e.g. Haskell, Clojure,
etc). Everything else seems like a hack that requires careful analysis to
verify correctness, and the path to the dark side is always just one
(poorly chosen) line of code away.

Anyway, basically, I really like fibers - if you want to run concurrent
unix processes, this gem is a good starting point.

Thanks for your thoughts and input.

Kind regards,
Samuel

···

On 13 March 2014 09:04, Joel VanderWerf <joelvanderwerf@gmail.com> wrote:

On 03/12/2014 06:22 AM, Samuel Williams wrote:

Threads are good but I felt like I wanted something more predictable.
Also, not all implementations of Ruby use green threads and therefore
might have synchronisation issues if you use (either directly or
indirectly through a gem/library) shared global state.

Even green threads have this danger, don't they?

Taking over manual scheduling seems a bit awkward compared to using some
kind of concurrency control (mutexes, queues, actors). What happens if
application code inside the fiber (process_and_email_results in the
example) makes a blocking IO call?

Manual scheduling with fibers is great for testing concurrent code which
would otherwise run in threads, because you can force a certain kind of
contention in a predicable way. I'm working on extracting a library for
doing this from a project where it's been a useful technique.

Still wondering how you handle blocking IO in fibers.

If all of the code inside the fiber is under your control, you can use non-blocking operations, and Fiber.yield if the operation would block. (See example below.)

But I get the impression you are dealing with various third-party libs which might just open a socket and start talking? Couldn't that block the fiber and therefore the whole thread?

This has always seemed to me to be the compelling feature of ruby's threads: you just let the thread scheduler manage blocking.

For anyone else who's reading and hasn't played with fibers, here's what you can do to avoid blocking the whole thread while one fiber waits for input:

----

require 'socket'
require 'fiber'

s1, s2 = UNIXSocket.pair

f = Fiber.new do
  loop do
    begin
      puts "Fiber checking for available data"
      data = s1.read_nonblock(10)
      puts "Fiber received #{data.inspect}"
    rescue IO::WaitReadable
      puts "Fiber yielding"
      Fiber.yield
      puts "Fiber resuming"
      unless IO.select([s1], nil, nil, 0)
        puts "..even though no data is available"
      end
      retry
    rescue => ex
      puts ex
    end
  end
end

f.resume

puts "writing to socket"
s2.write "123456"

f.resume

puts "writing to socket"
s2.write "abcdef"

f.resume

On 03/12/2014 05:03 PM, Samuel Williams wrote:

[…]

This is a genuine concern with this sort of library. For it to really be
useful, you need to be able to do things like I/O concurrently. In fact, if
it can't do I/O, it's not particularly helpful, because Fibers are useless
for CPU-bound tasks by default. I/O is one of the biggest use cases of
fibers.

If you're curious how Celluloid handles it, it provides a Celluloid::IO
companion library which has duck types of things like TCPSocket, UDPSocket,
and UNIXSocket which interact with Celluloid's scheduling and can
suspend/resume fibers when they make "blocking" calls. I/O multiplexing is
handled by a central reactor/event loop (provided by nio4r).
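
A rough sketch of that pattern (from memory of the Celluloid::IO README;
treat the details as illustrative, not authoritative):

require 'celluloid/io'

class Pinger
  include Celluloid::IO  # run this actor inside the IO reactor

  def ping(host, port)
    # Evented duck type of ::TCPSocket: "blocking" calls suspend only
    # this actor's fiber while the reactor services other work.
    sock = Celluloid::IO::TCPSocket.new(host, port)
    sock.write("ping\n")
    sock.readline
  end
end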

···

On Wed, Mar 12, 2014 at 5:56 PM, Joel VanderWerf <joelvanderwerf@gmail.com> wrote:

Still wondering how you handle blocking IO in fibers.

--
Tony Arcieri

Still wondering how you handle blocking IO in fibers.

That wasn't an important feature for the intended purpose of the gem,
therefore there is no explicit support for it at the moment. That might
seem like a cop-out, but it is exactly what I wanted (minimal features,
specific use-case).

But I get the impression you are dealing with various third-party libs
which might just open a socket and start talking? Couldn't that block the
fiber and therefore the whole thread?

That is the same problem you'd have for any sequential code, whether it is
running in a fiber or in an actor: calling something that blocks
indefinitely. But I think as a user you'd be aware of this. I'm not
proposing a solution to this problem; I think that's probably impossible
anyway.

This has always seemed to me to be the compelling feature of ruby's
threads: you just let the thread scheduler manage blocking.

The thread scheduler may seem like a good idea in theory, but in practice
event driven code that works with OS primitives (select, epoll, kevent) is
generally more efficient. I think there are good arguments either way (e.g.
Sun UltraSPARC chips seemed to be designed for thread-based workloads,
running up to 64 threads in parallel, a bit like Hyper-Threading on x86),
but event driven systems generally seem easier to reason about, with more
predictable behaviour, better defined resource usage, etc. Also, as
mentioned, not all implementations use green threads. That means that if
you use threads, you need to deal with reentrancy and contention issues,
which are at least as complex as, if not more complex than, dealing with
fibers (e.g. calling fork might break everything when using threads, as
mentioned).

Thanks for the example code. I'm sure that can be done more efficiently and
cleanly by having one function call #select and resume the correct fiber.
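
Something like this minimal sketch of the idea (illustrative only, not
part of the gem):

require 'socket'
require 'fiber'

waiting = {}  # io => fiber blocked waiting for it to become readable

s1, s2 = UNIXSocket.pair

reader = Fiber.new do
  begin
    puts "read: #{s1.read_nonblock(10).inspect}"
  rescue IO::WaitReadable
    waiting[s1] = Fiber.current
    Fiber.yield
    retry
  end
end

reader.resume      # would block, so it registers itself and yields

s2.write "123456"  # now s1 is readable

# The "one function calling #select": resume whichever fiber owns a
# ready IO.
until waiting.empty?
  ready, = IO.select(waiting.keys)
  ready.each { |io| waiting.delete(io).resume }
end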

Thanks for your ideas and feedback.

Kind regards,
Samuel

···

On 13 March 2014 13:56, Joel VanderWerf <joelvanderwerf@gmail.com> wrote:

[…]

This is a genuine concern with this sort of library. For it to really be
useful

This library is VERY useful for me in its current form. If you want
concurrent I/O, yes, don't use this library. If you just want to run
processes to completion concurrently, this library is perfect. I'm using it
to retrofit existing sequential scripts, and also in another project,
similar to make, which doesn't care about IO, just running
compilers/linkers, etc.

···

On 13 March 2014 19:01, Tony Arcieri <tony.arcieri@gmail.com> wrote:

[…]