Parallel for loop

There doesn't seem to be any EASY way of doing a parallel computation
in Ruby.
I would like to do something like this:

array.map do |i|
  fork do
    i + 1
  end
end
Process.waitall

which would give back the array with one added to each element, and
it would perform this "calculation" in parallel. However, this doesn't
work since fork runs a subprocess which is another Ruby interpreter,
and I can't get anything back from that black hole except an exit
status.

Actually, it would be really nice if there was a 'forkmap' method that
could do this:

array.forkmap do |i|
  i + 1
end

But there isn't, right?
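
To make the "black hole" concrete, here is a minimal sketch (not from the original post) of what plain fork gives back without extra plumbing: only an exit status, which is a small integer. Passing the result through exit! is purely an illustration of the limitation, not a recommended technique.

```ruby
# Each child computes i + 1, but the only thing the parent can observe
# without a pipe or socket is the child's exit status (0..255).
pids = [1, 2, 3].map do |i|
  fork do
    exit!(i + 1)   # smuggle the result out as the exit status
  end
end

statuses = pids.map do |pid|
  _, status = Process.wait2(pid)   # wait for this specific child
  status.exitstatus
end

p statuses
```

Anything richer than a small integer (a string, an array, an object) needs a channel such as a pipe between parent and child.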

http://peach.rubyforge.org/?peach


Fredrik wrote:

But there isn't, right?

Ruby (the standard 1.8 interpreter) uses green threads. All your
threads run inside the single Ruby process and are scheduled by the
interpreter itself, so they don't run in parallel in the sense you seem
to imply. If you want to use real threads, maybe JRuby and Java are
what you seek.

JRuby uses Java threads, which are OS threads.

If I'm on the wrong tangent, just ignore this reply. :)
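
For comparison, a thread-based map might look like the sketch below. The name thread_map is hypothetical, not an existing Ruby method. With green threads (or any interpreter that runs one Ruby thread at a time) this only overlaps waiting, such as sleep or I/O; CPU-bound blocks would not get faster. On JRuby the same code would run on OS threads.

```ruby
module Enumerable
  # Hypothetical helper: run the block for each element in its own thread
  # and collect the results in the original order.
  def thread_map(&block)
    threads = map { |item| Thread.new(item, &block) }
    threads.map { |t| t.value }   # Thread#value joins and returns the block's result
  end
end

p [1, 2, 3].thread_map { |i| sleep 0.1; i + 1 }
```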

- --
Phillip Gawlowski
Twitter: twitter.com/cynicalryan

You thought I was taking your woman away from you. You're jealous.
You tried to kill me with your bare hands. Would a Kelvan do that?
Would he have to? You're reacting with the emotions of a human.
You are human.
~ -- Kirk, "By Any Other Name," stardate 4657.5

cfp:~ > cat a.rb
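module Enumerable
   # run the block for each element in a forked child and collect the results
   def forkify &b
     map do |*a|
       r, w = IO.pipe
       fork do
         r.close                                  # child only writes
         w.write( Marshal.dump( b.call(*a) ) )    # serialize the result for the parent
       end
       [ w.close, r ].last                        # parent closes its write end, keeps the reader
     end.map{|r| Marshal.load [ r.read, r.close ].first}   # read each result, then close
   end
end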

result =
   [0, 1, 2, 3].forkify do |i|
     p [ Process.ppid, Process.pid ]
     i ** 2
   end

p result

cfp:~ > ruby a.rb
[80870, 80871]
[80870, 80872]
[80870, 80873]
[80870, 80874]
[0, 1, 4, 9]

a @ http://codeforpeople.com/

On Apr 15, 2008, at 9:35 PM, Fredrik wrote:

But there isn't, right?

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

http://skynet.rubyforge.org/

HTH,
Michael Guterl

On Tue, Apr 15, 2008 at 11:35 PM, Fredrik <fredjoha@gmail.com> wrote:

But there isn't, right?

Thanks! This code is just what I am looking for!
Peach for JRuby seems nice too, but I don't have JRuby :)


Actually, I'll change it a bit. I added Process.waitall since there
are otherwise some dead(?) processes left.

module Enumerable
  def fmap &b
    result = map do |*a|
      r, w = IO.pipe
      fork do
        r.close
        w.write( Marshal.dump( b.call(*a) ) )
      end
      [ w.close, r ].last
    end
    Process.waitall
    result.map{|r| Marshal.load [ r.read, r.close ].first}
  end
end
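
One thing to watch in this variant: Process.waitall runs before the pipes are read. A child whose marshaled result is larger than the OS pipe buffer blocks in write until the parent reads, while the parent is blocked waiting for the child to exit. A hedged sketch that drains the pipes first and reaps afterwards follows; fmap_safe is a hypothetical name, and the explicit exit! in the child is an addition that is not in the posted code.

```ruby
module Enumerable
  # Hypothetical variant of fmap: read every pipe first, reap children last.
  def fmap_safe(&b)
    readers = map do |*a|
      r, w = IO.pipe
      fork do
        r.close
        w.write(Marshal.dump(b.call(*a)))
        w.close
        exit!(0)            # leave without running the parent's at_exit handlers
      end
      w.close               # parent keeps only the read end
      r
    end
    results = readers.map do |r|
      data = r.read         # returns once the child has closed its write end
      r.close
      Marshal.load(data)
    end
    Process.waitall         # children are done writing by now; reap them
    results
  end
end

# Results well beyond a typical pipe buffer size still come back.
big = [1, 2].fmap_safe { |i| "x" * 200_000 * i }
p big.map { |s| s.size }
```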

Fredrik wrote:

Thanks! This code is just what I am looking for!
Peach for JRuby seems nice too, but I don't have JRuby :)

It's just a download away. ;)
jruby.codehouse.org

However, you'll need a JVM that is compatible (IIRC, JRE 1.4.2 and newer).

- --
Phillip Gawlowski
Twitter: twitter.com/cynicalryan

~ "But the important thing is persistence." -Calvin trying to juggle eggs

probably some errors there to catch - but the concept is solid enough. you might also be interested in slave.rb

   http://codeforpeople.com/lib/ruby/slave/slave-1.2.1/README

regards.

a @ http://codeforpeople.com/

On Apr 15, 2008, at 10:30 PM, Fredrik wrote:

Thanks! This code is just what I am looking for!


indeed, and you can blow up in the child and not know. read this code to see how to handle that:

http://codeforpeople.com/lib/ruby/open4/open4-0.9.6/lib/open4.rb

the bit about EOFError

it's damn tricky - it *expects* to get an exception marshaled up a dedicated pipe; if this does *not* occur we know the child process started successfully. you can adapt.

cheers.
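
One way to adapt that idea to the fork-and-pipe map, as a rough sketch rather than a copy of what open4 does: the child marshals either its result or the exception it hit, and the parent re-raises. The name fmap_raising and the :ok / :error tags are hypothetical.

```ruby
module Enumerable
  # Hypothetical variant: a child that raises sends the exception back
  # through the pipe instead of a result, and the parent re-raises it.
  def fmap_raising(&b)
    readers = map do |*a|
      r, w = IO.pipe
      fork do
        r.close
        payload = begin
          [:ok, b.call(*a)]
        rescue Exception => e
          [:error, e]       # exceptions can be marshaled like other objects
        end
        w.write(Marshal.dump(payload))
        w.close
        exit!(0)
      end
      w.close
      r
    end
    payloads = readers.map do |r|
      data = r.read
      r.close
      Marshal.load(data)
    end
    Process.waitall
    payloads.map do |tag, value|
      raise value if tag == :error   # surface the child's failure in the parent
      value
    end
  end
end

begin
  [1, 0].fmap_raising { |i| 10 / i }
rescue ZeroDivisionError => e
  puts "child failed: #{e.message}"
end
```

This does not cover a child that dies before it can write anything (for example on a signal); in that case the parent reads an empty string and Marshal.load fails.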

a @ http://codeforpeople.com/

On Apr 15, 2008, at 10:55 PM, Fredrik wrote:

Actually, I'll change it a bit. I added Process.waitall since there
are otherwise some dead(?) processes left.


Actually, I'll change it a bit. I added Process.waitall since there
are otherwise some dead(?) processes left.

It's great:

Benchmark.realtime { [1,2,3,4,5,6,7,8,9].map { |i| sleep 1; i + 1 } }
=> 8.99636912345886

Benchmark.realtime { [1,2,3,4,5,6,7,8,9].forkmap { |i| sleep 1; i + 1 } }
=> 1.02371001243591

XD

2008/4/16, Fredrik <fredjoha@gmail.com>:

--
Iñaki Baz Castillo
<ibc@aliax.net>

Phillip Gawlowski wrote:

It's just a download away. ;)
jruby.codehouse.org

However, you'll need a JVM that is compatible (IIRC, JRE 1.4.2 and newer).

www.jruby.org will get you there, and JRuby 1.1 requires Java 1.5 or higher.

- Charlie

I added an argument to limit the number of concurrent processes (my
workstation practically died when I ran all the processes I wanted to
run):

module Enumerable
  def forkmap n, &b
    result = map do |*a|
      nproc = 0
      r, w = IO.pipe
      fork do
        r.close
        w.write( Marshal.dump( b.call(*a) ) )
      end
      if (nproc+=1) >= n
        Process.wait ; nproc -= 1
      end
      [ w.close, r ].last
    end
    Process.waitall
    result.map{|r| Marshal.load [ r.read, r.close ].first}
  end
end

It seems to be doing its job correctly :

Benchmark.realtime { [1,2,3].forkmap(3){|i| sleep(1) ; i * 2} }

=> 1.01134896278381

Benchmark.realtime { [1,2,3].forkmap(1){|i| sleep(1) ; i * 2} }

=> 3.01262402534485

/Fredrik

indeed, and you can blow up in the child and not know. read this code
to see how to handle that:

http://codeforpeople.com/lib/ruby/open4/open4-0.9.6/lib/open4.rb

the bit about EOFError

it's damn tricky - it *expects* to get an exception marshaled up a
dedicated pipe; if this does *not* occur we know the child process
started successfully. you can adapt.

I'm not sure I understand what you mean. But are you saying that open4
can solve all my problems?

Charles Oliver Nutter wrote:

Phillip Gawlowski wrote:


It's just a download away. ;)
jruby.codehouse.org

However, you'll need a JVM that is compatible (IIRC, JRE 1.4.2 and
newer).

www.jruby.org will get you there,

Right, codehaus, not house. *facepalm*

and JRuby 1.1 requires Java 1.5 or
higher.

Oops, my mistake (didn't the 1.0 series require only JRE 1.4.2, or so?).

- --
Phillip Gawlowski
Twitter: twitter.com/cynicalryan

~ - You know you've been hacking too long when...
...you see a flock of birds and try to figure out the algorithms that
determine their movement.

Sorry... I posted the wrong code. These lines should be flipped:

    result = map do |*a|
      nproc = 0

should be

    nproc = 0
    result = map do |*a|

Sorry 'bout that...

/Fredrik

It's really great. I just see one thing to improve:
The new "n" parameter is mandatory since it's the first parameter. It would be
nice if it could be not defined (so = infinite):

forkmap(4) { code } --> max 4 process
forkmap { code } --> max infinite

Do you think your code is feasible for production environments? Or does it
involve some danger or risk? If not, I suggest you publish it in some way,
since it's really cool and a missing feature of Ruby. ;)

--
Iñaki Baz Castillo

Phillip Gawlowski wrote:

Oops, my mistake (didn't the 1.0 series require only JRE 1.4.2, or so?).

Yes, JRuby 1.0 worked on Java 1.4.2, but there were too many benefits moving to Java 5 to keep it that way, especially availability of annotations and the concurrency APIs.

- Charlie

i've got something close to gem'ing... there is nothing wrong with the concept - this is precisely how objects are returned from drb: marshaled data over a socket/pipe.

a @ http://codeforpeople.com/

On Apr 17, 2008, at 1:02 PM, Iñaki Baz Castillo wrote:

Do you think your code is feasible for production environments? Or does it
involve some danger or risk? If not, I suggest you publish it in some way,
since it's really cool and a missing feature of Ruby. ;)


The new "n" parameter is mandatory since it's the first parameter. It would be
nice if it could be not defined (so = infinite):

forkmap(4) { code } --> max 4 process
forkmap { code } --> max infinite

I was thinking about that too, but as far as I understand it Ruby only
allows optional arguments to be the last arguments - i.e. the "n"
parameter would have to appear after the code block. And that would
look strange : forkmap{ code }(4).
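
For what it's worth, the block is passed separately from the positional arguments, so n can take a default value and both call styles from the suggestion work. Below is a rough sketch under a few assumptions of mine: nil means "no limit", and items are handled in batches of at most n children, which is simpler than (and not the same as) the Process.wait throttling posted earlier in the thread.

```ruby
module Enumerable
  # Sketch: n is optional; nil (the default) means no limit.
  # Items are processed in batches of at most n forked children.
  def forkmap(n = nil, &b)
    items = to_a
    return [] if items.empty?
    results = []
    items.each_slice(n || items.size) do |batch|
      readers = batch.map do |item|
        r, w = IO.pipe
        fork do
          r.close
          w.write(Marshal.dump(b.call(item)))
          w.close
          exit!(0)
        end
        w.close
        r
      end
      readers.each do |r|
        results << Marshal.load(r.read)   # read before reaping
        r.close
      end
      Process.waitall   # reap this batch before starting the next one
    end
    results
  end
end

p [1, 2, 3].forkmap(2) { |i| i * 2 }   # at most 2 children at a time
p [1, 2, 3].forkmap    { |i| i * 2 }   # no limit
```

The batch approach leaves some children idle while the slowest one in a batch finishes; a sliding window would use the processes better at the cost of more bookkeeping.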