Parallel for loop

Fredrik · 16 April 2008 03:35

There doesn't seem to be any EASY way of doing a parallel computation
in Ruby.
I would like to do something like this:

array.map do |i|
  fork do
    i + 1
  end
end
Process.waitall

which would give back the array with one added to each element and
perform this "calculation" in parallel. However, this doesn't work,
since fork runs the block in a subprocess, which is another Ruby
interpreter, and I can't get anything back from that black hole except
an exit status.
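
To make the problem concrete, here is a minimal sketch (not part of the original post) of what the map/fork combination hands back to the parent. The variable names are made up for illustration. The block's value never leaves the child; the parent only sees PIDs and, after waiting, small integer exit statuses.

```ruby
# What map + fork actually returns: child PIDs, not the block values.
pids = [1, 2, 3].map do |i|
  fork do
    value = i + 1      # computed in the child process only
    exit!(value)       # an exit status (0..255) is all the parent can collect
  end
end

statuses = pids.map do |pid|
  _, status = Process.wait2(pid)   # returns [pid, Process::Status]
  status.exitstatus
end

p pids.all? { |pid| pid.is_a?(Integer) }   # true
p statuses                                  # [2, 3, 4]
```

Smuggling the result through the exit status only works for tiny integers, so it is not a general way to return values.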

Actually, it would be really nice if there was a 'forkmap' method that
could do this:

array.forkmap do |i|
  i + 1
end

But there isn't, right?

Kyle_Hunter · 16 April 2008 03:42

http://peach.rubyforge.org/?peach

--
Posted via http://www.ruby-forum.com/.

Phil · 16 April 2008 03:47

Ruby uses green threads. All your threads would run within the Ruby
process, and aren't running parallel in the sense you seem to imply. If
you want to use threads, maybe JRuby and Java are what you seek.

JRuby uses Java threads, which are OS threads.

If I'm on the wrong tangent, just ignore this reply.
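
For comparison, a thread-based version of the original example might look like the sketch below (an illustration, not code from the thread). Threads share memory with the caller, so Thread#value can return the block's result directly. As noted above, on the standard interpreter this gives concurrency rather than true CPU parallelism, while on JRuby the same code would run on OS threads.

```ruby
array = [1, 2, 3]

# One thread per element; each thread's block value is kept by the Thread object.
threads = array.map { |i| Thread.new { i + 1 } }

# Thread#value joins the thread and returns what its block evaluated to.
result = threads.map { |t| t.value }
p result   # [2, 3, 4]
```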

- --
Phillip Gawlowski

a11 · 16 April 2008 04:10

cfp:~ > cat a.rb
module Enumerable
   def forkify &b
     map do |*a|
       r, w = IO.pipe
       fork do
         r.close
         w.write( Marshal.dump( b.call(*a) ) )
       end
       [ w.close, r ].last
     end.map{|r| Marshal.load [ r.read, r.close ].first}
   end
end

result =
   [0, 1, 2, 3].forkify do |i|
     p [ Process.ppid, Process.pid ]
     i ** 2
   end

p result

cfp:~ > ruby a.rb
[80870, 80871]
[80870, 80872]
[80870, 80873]
[80870, 80874]
[0, 1, 4, 9]

a @ http://codeforpeople.com/

On Apr 15, 2008, at 9:35 PM, Fredrik wrote:

But there isn't, right?

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Michael_Guterl1 · 16 April 2008 04:26

http://skynet.rubyforge.org/

HTH,
Michael Guterl

Fredrik · 16 April 2008 04:30

Thanks! This code is just what I am looking for!
Peach for JRuby seems nice too, but I don't have JRuby

Fredrik · 16 April 2008 04:55

Actually, I'll change it a bit. I added Process.waitall since there
are otherwise some dead(?) processes left.

module Enumerable
  def fmap &b
    result = map do |*a|
      r, w = IO.pipe
      fork do
        r.close
        w.write( Marshal.dump( b.call(*a) ) )
      end
      [ w.close, r ].last
    end
    Process.waitall
    result.map{|r| Marshal.load [ r.read, r.close ].first}
  end
end
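
One thing to watch with waiting before reading: a child whose marshaled result is larger than the OS pipe buffer will block in write until someone reads, so Process.waitall could hang. A possible variant, sketched here under the hypothetical name fmap_drained (not from the thread), drains every pipe first and reaps the children afterwards.

```ruby
module Enumerable
  # Hypothetical variant of fmap: read all results first, then reap children.
  def fmap_drained(&b)
    readers = map do |*a|
      r, w = IO.pipe
      fork do
        r.close
        w.write(Marshal.dump(b.call(*a)))
        w.close
        exit!(0)   # leave immediately, skipping at_exit handlers inherited from the parent
      end
      w.close      # the parent keeps only the read end
      r
    end

    results = readers.map do |r|
      data = r.read   # returns at EOF, i.e. once the child has closed its write end
      r.close
      Marshal.load(data)
    end

    Process.waitall   # no zombies left behind
    results
  end
end

# Results well beyond a typical pipe buffer still come back.
sizes = [1, 2].fmap_drained { |i| "x" * (200_000 * i) }.map(&:size)
p sizes   # [200000, 400000]
```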

Phil · 16 April 2008 04:35

Fredrik wrote:

Thanks! This code is just what I am looking for!
Peach for JRuby seems nice too, but I don't have JRuby

It's just a download away.
jruby.codehouse.org

However, you'll need a JVM that is compatible (IIRC, JRE 1.4.2 and newer).

- --
Phillip Gawlowski

a11 · 16 April 2008 04:44

probably some errors there to catch - but the concept is solid enough. you might also be interested in slave.rb

http://codeforpeople.com/lib/ruby/slave/slave-1.2.1/README

regards.

a @ http://codeforpeople.com/

On Apr 15, 2008, at 10:30 PM, Fredrik wrote:

Thanks! This code is just what I am looking for!


a11 · 16 April 2008 05:24

indeed, and you can blow up in the child and not know. read this code to see how to handle that:

http://codeforpeople.com/lib/ruby/open4/open4-0.9.6/lib/open4.rb

the bit about EOFError

it's damn tricky - it *expects* to get an exception marshaled up a dedicated pipe; if this does *not* occur, we know the child process started successfully. you can adapt.
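
A much simpler adaptation of that idea (a sketch only, not open4's actual code) is to marshal either the value or the exception back over the same pipe and re-raise in the parent. The helper name fork_value and the :ok/:error tags are assumptions made up for this example.

```ruby
# Hypothetical helper: run the block in a forked child and bring back
# either its value or the exception it raised.
def fork_value
  r, w = IO.pipe
  pid = fork do
    r.close
    payload = begin
      [:ok, yield]
    rescue Exception => e
      [:error, e]          # exceptions can be marshaled like other objects
    end
    w.write(Marshal.dump(payload))
    w.close
    exit!(0)               # skip at_exit handlers inherited from the parent
  end
  w.close
  data = r.read
  r.close
  Process.wait(pid)
  # An empty pipe means the child died before it could report anything.
  raise EOFError, "child #{pid} exited without reporting" if data.empty?
  status, value = Marshal.load(data)
  raise value if status == :error
  value
end

answer = fork_value { 6 * 7 }
p answer   # 42

begin
  fork_value { raise ArgumentError, "boom" }
rescue ArgumentError => e
  p e.message   # "boom"
end
```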

cheers.

a @ http://codeforpeople.com/

On Apr 15, 2008, at 10:55 PM, Fredrik wrote:

Actually, I'll change it a bit. I added Process.waitall since there
are otherwise some dead(?) processes left.


Inaki_Baz_Castillo · 16 April 2008 10:17

It's great:

Benchmark.realtime { [1,2,3,4,5,6,7,8,9].map { |i| sleep 1; i + 1 } }
=> 8.99636912345886

Benchmark.realtime { [1,2,3,4,5,6,7,8,9].forkmap { |i| sleep 1; i + 1 } }
=> 1.02371001243591

XD

--
Iñaki Baz Castillo
<ibc@aliax.net>

Charles_Oliver_Nutte · 16 April 2008 05:04

Phillip Gawlowski wrote:

> It's just a download away.
> jruby.codehouse.org
>
> However, you'll need a JVM that is compatible (IIRC, JRE 1.4.2 and newer).

www.jruby.org will get you there, and JRuby 1.1 requires Java 1.5 or higher.

- Charlie

Fredrik · 17 April 2008 03:15

I added an argument to limit the number of concurrent processes (my
workstation practically died when I ran all the processes I wanted to
run):

module Enumerable
  def forkmap n, &b
    result = map do |*a|
      nproc = 0
      r, w = IO.pipe
      fork do
        r.close
        w.write( Marshal.dump( b.call(*a) ) )
      end
      if (nproc+=1) >= n
        Process.wait ; nproc -= 1
      end
      [ w.close, r ].last
    end
    Process.waitall
    result.map{|r| Marshal.load [ r.read, r.close ].first}
  end
end

It seems to be doing its job correctly:

Benchmark.realtime { [1,2,3].forkmap(3){|i| sleep(1) ; i * 2} }

=> 1.01134896278381

Benchmark.realtime { [1,2,3].forkmap(1){|i| sleep(1) ; i * 2} }

=> 3.01262402534485

/Fredrik

Fredrik · 17 April 2008 03:15

indeed, and you can blow up in the child and not know. read this code
to see how to handle that:

http://codeforpeople.com/lib/ruby/open4/open4-0.9.6/lib/open4.rb

the bit about EOFError

it's damn tricky - it *expects* to get an exception marshaled up a
dedicated pipe; if this does *not* occur, we know the child process
started successfully. you can adapt.

I'm not sure I understand what you mean. But are you saying that open4
can solve all my problems?

Phil · 16 April 2008 05:11

Charles Oliver Nutter wrote:

> www.jruby.org will get you there,

Right, codehaus, not house. *facepalm*

> and JRuby 1.1 requires Java 1.5 or
> higher.

Oops, my mistake (didn't the 1.0 series require only JRE 1.4.2, or so?).

- --
Phillip Gawlowski

Fredrik · 17 April 2008 05:55

Sorry... I posted the wrong code. These lines should be flipped:

result = map do |*a|
nproc = 0

should be

nproc = 0
result = map do |*a|

Sorry 'bout that...

/Fredrik
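
Put together, the posted forkmap with the two lines swapped would read roughly as follows. This is the code from the thread with only that swap and some spacing changes; the usage line at the end is an added illustration.

```ruby
module Enumerable
  def forkmap(n, &b)
    nproc = 0   # shared across iterations, no longer reset to 0 for every element
    result = map do |*a|
      r, w = IO.pipe
      fork do
        r.close
        w.write(Marshal.dump(b.call(*a)))
      end
      # Once n children are running, wait for any one of them to finish.
      if (nproc += 1) >= n
        Process.wait
        nproc -= 1
      end
      [w.close, r].last
    end
    Process.waitall
    result.map { |r| Marshal.load [r.read, r.close].first }
  end
end

doubled = [1, 2, 3].forkmap(2) { |i| i * 2 }
p doubled   # [2, 4, 6]
```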

Inaki_Baz_Castillo · 17 April 2008 19:02

It's really great. I see just one thing to improve:
the new "n" parameter is mandatory since it's the first parameter. It would be
nice if it could be left out (meaning no limit):

forkmap(4) { code } --> max 4 processes
forkmap { code } --> no limit

Do you think your code is suitable for production environments? Does it
involve any danger or risk? If not, I suggest you publish it somewhere,
since it's really cool and a feature missing from Ruby.

--
Iñaki Baz Castillo

Charles_Oliver_Nutte · 16 April 2008 07:03

Phillip Gawlowski wrote:

Oops, my mistake (didn't the 1.0 series require only JRE 1.4.2, or so?).

Yes, JRuby 1.0 worked on Java 1.4.2, but there were too many benefits moving to Java 5 to keep it that way, especially availability of annotations and the concurrency APIs.

- Charlie

a11 · 17 April 2008 20:13

i've got something close to gem'ing... there is nothing wrong with the concept - this is precisely how objects are returned from drb: marshaled data over a socket/pipe.
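
As a small illustration of that mechanism (a sketch added here, not taken from any library), the round trip is just Marshal.dump on one end of a pipe and Marshal.load on the other. The practical limit is that the block's result has to be something Marshal can serialize; a Proc, for example, cannot be dumped.

```ruby
# Round trip of an ordinary object through a pipe, as the forked versions do.
r, w = IO.pipe
w.write(Marshal.dump({ "squares" => [0, 1, 4, 9] }))
w.close
copy = Marshal.load(r.read)
r.close
p copy   # a deep copy of the original hash

# Objects tied to the running interpreter cannot be serialized.
dump_error =
  begin
    Marshal.dump(proc { 1 })
    nil
  rescue TypeError => e
    e
  end
p dump_error.class   # TypeError
```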

a @ http://codeforpeople.com/

On Apr 17, 2008, at 1:02 PM, Iñaki Baz Castillo wrote:

Do you think your code is suitable for production environments? Does it
involve any danger or risk? If not, I suggest you publish it somewhere,
since it's really cool and a feature missing from Ruby.


Fredrik · 18 April 2008 01:15

The new "n" parameter is mandatory since it's the first parameter. It would be
nice if it could be left out (meaning no limit):

forkmap(4) { code } --> max 4 processes
forkmap { code } --> no limit

I was thinking about that too, but as far as I understand it, Ruby only
allows optional arguments to be the last arguments, i.e. the "n"
parameter would have to appear after the code block. And that would
look strange: forkmap{ code }(4).
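
For what it's worth, the block is not part of the positional argument list, so a default value on "n" can be combined with a block parameter. The toy method below (hypothetical name, only to show the signature shape) accepts both call styles; a forkmap written this way could treat a nil "n" as "no limit".

```ruby
# Hypothetical toy method: optional positional argument plus a block.
def limit_and_block(n = nil, &b)
  [n, b.call]
end

with_limit    = limit_and_block(4) { :code }
without_limit = limit_and_block { :code }

p with_limit      # [4, :code]
p without_limit   # [nil, :code]
```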
