[ANN] threadify-0.0.1

this one's for you charlie :wink:

NAME
   threadify.rb

SYNOPSIS
   enumerable = %w( a b c d )
   enumerable.threadify(2){ 'process this block using two worker threads' }

DESCRIPTION
   threadify.rb makes it stupid easy to process a bunch of data using 'n'
   worker threads

INSTALL
   gem install threadify

URI
   http://rubyforge.org/projects/codeforpeople

SAMPLES

   <========< sample/a.rb >========>

   ~ > cat sample/a.rb

     require 'open-uri'
     require 'yaml'

     require 'rubygems'
     require 'threadify'

     uris =
       %w(
         http://google.com
         http://yahoo.com
         http://rubyforge.org
         http://ruby-lang.org
         http://kcrw.org
         http://drawohara.com
         http://codeforpeople.com
       )

     time 'without threadify' do
       uris.each do |uri|
         body = open(uri){|pipe| pipe.read}
       end
     end

     time 'with threadify' do
       uris.threadify do |uri|
         body = open(uri){|pipe| pipe.read}
       end
     end

     BEGIN {
       def time label
         a = Time.now.to_f
         yield
       ensure
         b = Time.now.to_f
         y label => (b - a)
       end
     }

   ~ > ruby sample/a.rb

     ---
     without threadify: 7.41900205612183
     ---
     with threadify: 3.69886112213135
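
For readers curious about the mechanism, here is a minimal sketch of how a threadify-style helper can be built from the standard library. It is not the gem's source: `each_in_threads` is a hypothetical name, and the queue-based design is an assumption made for illustration.

```ruby
require 'thread'

# Hypothetical helper, NOT the threadify gem's code: run the block over
# every item using n worker threads that pull work from a shared queue.
def each_in_threads(items, n = 4)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  results = Array.new(items.size)

  workers = Array.new(n) do
    Thread.new do
      loop do
        begin
          item, index = queue.pop(true)   # non-blocking pop
        rescue ThreadError
          break                           # queue drained: leave the loop
        end
        results[index] = yield(item)      # each slot is written by one thread only
      end
    end
  end

  workers.each(&:join)
  results
end

squares = each_in_threads([1, 2, 3, 4, 5], 2) { |x| x * x }
p squares
```

Results are stored by index, so the returned array keeps the input order even though the items finish in whatever order the threads get to them.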

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Thank you! This gem pretty much makes my life simpler, and will continue to make it simpler!

(stdlib please?)

~ ari

I only see a tgz link which redirects me to

http://rubyforge.rubyuser.de/codeforpeople/threadify-0.0.1.tgz

which in turn 404s

martin

···

On Tue, Jul 1, 2008 at 1:04 PM, ara howard <ara.t.howard@gmail.com> wrote:

URI
http://rubyforge.org/projects/codeforpeople

ara howard wrote:

this one's for you charlie :wink:

Appears to work just dandy under JRuby:

➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"
  ...
real 0m11.889s
user 0m11.733s
sys 0m0.188s
~/NetBeansProjects/jruby ➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.threadify {|i| p fib(i)}"
  ...
real 0m8.213s
user 0m12.722s
sys 0m0.178s

(One thread on my system consumes roughly 65-70% CPU, which explains why full CPU on both cores doesn't double performance here)

I also found some weird bug where Thread#kill/exit from within the thread interacts weirdly with join happening outside, and never terminates. Fixing that now.

- Charlie

Pretty cool. I tried it with file-find. Here was the code:

require 'file/find'
require 'threadify'

rule = File::Find.new(
   :pattern => "*.rb",
   :path => "C:\\ruby"
)

start = Time.now

rule.find.threadify(10){ |f|
   p f
}

p start
p Time.now

Without threadify, it took 1:40 on my laptop. With threadify(10) it
dropped to 44 seconds.

I think I'll add a "threads" option directly, and borrow some of your
code. :slight_smile:

Thanks,

Dan

···

On Jul 1, 2:04 pm, ara howard <ara.t.how...@gmail.com> wrote:

this one's for you charlie :wink:

Martin DeMello wrote:

···

On Tue, Jul 1, 2008 at 1:04 PM, ara howard <ara.t.howard@gmail.com> wrote:

URI
http://rubyforge.org/projects/codeforpeople

I only see a tgz link which redirects me to

http://rubyforge.rubyuser.de/codeforpeople/threadify-0.0.1.tgz

which in turn 404s

mirror delay. check codeforpeople svn, it's only one file.

- Charlie

Appears to work just dandy under JRuby:

➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"

wow that's cool - now that's a seriously easy way to parallelize :wink:

I also found some weird bug where Thread#kill/exit from within the thread interacts weirdly with join happening outside, and never terminates. Fixing that now.

glad to have helped :wink:

i just pushed out 0.0.2 and it just lets the thread die rather than self-destructing. see how that works...

cheers.

a @ http://codeforpeople.com/

···

On Jul 1, 2008, at 3:04 PM, Charles Oliver Nutter wrote:
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

I was doing some comparison between threadify and peach with JRuby,
when I noticed some interesting behavior with using
Enumerator#to_enum.

Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.

I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

threadify-0.0.2
jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-08 rev 7130) [i386-java]
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)

OS X 10.5.4

Thanks,
Michael Guterl

···

On Tue, Jul 1, 2008 at 5:04 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:

ara howard wrote:

this one's for you charlie :wink:

Appears to work just dandy under JRuby:

sweet. i wouldn't launch rockets with it - but it's a cheap speedup for a bunch of ruby code. btw - check out my find method

   http://codeforpeople.com/lib/ruby/alib/alib-0.5.1/lib/alib-0.5.1/find2.rb

very stolen and hacked

a @ http://codeforpeople.com/

···

On Jul 10, 2008, at 8:39 PM, Daniel Berger wrote:

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

thanks, gotit. will also install the gem when it propagates, just to
keep my system informed :slight_smile:

m.

···

On Tue, Jul 1, 2008 at 2:02 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:

mirror delay. check codeforpeople svn, it's only one file.

ara.t.howard wrote:

i just pushed out 0.0.2 and it just lets the thread die rather than self-destructing. see how that works...

I fixed in JRuby just now (Thread#kill does an implicit join in JRuby to make sure the thread dies...but if target == caller it was still trying to join itself in a weird way) but basically breaking out of the loop instead of Thread#exit solved it. Your 0.0.2 change is probably equivalent.
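
The pattern described above, leaving the worker loop instead of calling Thread#exit from inside the thread, might look like the following. This is a sketch under assumptions, not the code from threadify 0.0.2: the sentinel object `done` and the variable names are made up for illustration.

```ruby
require 'thread'

queue = Queue.new
done  = Object.new                  # sentinel marking "no more work" (hypothetical)
seen  = []

worker = Thread.new do
  loop do
    job = queue.pop                 # blocks until something is available
    break if job.equal?(done)       # leave the loop; the thread then ends on its own
    seen << job
  end
end

%w( a b c ).each { |job| queue << job }
queue << done
worker.join                         # returns once the loop has been left
p seen
```

Because the thread finishes by falling off the end of its block, the outer join has nothing unusual to wait on and no Thread#kill or Thread#exit is involved.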

- Charlie

bunch of 'java.lang' stuff in there - i'm out! :wink:

a @ http://codeforpeople.com/

···

On Jul 8, 2008, at 5:43 PM, Michael Guterl wrote:

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Michael Guterl wrote:

ara howard wrote:

this one's for you charlie :wink:

Appears to work just dandy under JRuby:

I was doing some comparison between threadify and peach with JRuby,
when I noticed some interesting behavior with using
Enumerator#to_enum.

Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.

I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

Thanks for filing the bug. I'm looking into it now.

In general we have inserted synchronization code only where it really appears to be necessary to maintain the integrity of data structures. That means that in some cases, you need to be mindful of code actually running in parallel against e.g. arrays, hashes, strings, and so on. But we do want to reduce the possibility of a Java exception, so I'll investigate a bit.
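
As an illustration of being mindful about shared collections, the sketch below guards a plain Array with a Mutex so that appends from truly parallel threads cannot interleave. The names `results` and `lock` are illustrative, and nothing here is JRuby-specific.

```ruby
require 'thread'

results = []
lock    = Mutex.new

threads = (1..8).map do |i|
  Thread.new do
    value = i * 10                          # work done outside the lock
    lock.synchronize { results << value }   # only the shared append is serialized
  end
end

threads.each(&:join)
p results.sort
```

Keeping the work outside the synchronized section means the threads only contend for the brief moment they touch the shared array.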

- Charlie

···

On Tue, Jul 1, 2008 at 5:04 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:

Michael Guterl wrote:

I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

Ok, there's good news and bad news. First the good news.

I've found several egregious threading bugs in JRuby's Enumerable implementation that probably caused the bulk of errors you saw. Basically, the runtime information for the main Ruby thread in JRuby was getting reused by the blocks passed into threadify, causing all sorts of wacky errors (multiple threads all sharing runtime thread data...fun!). Fixing that seems to have resolved most of the errors.

Now the bad news...

What you're doing is a bit suspect. In this case, it works out reasonably well, since you're just doing a map and gathering results. There are some remaining bugs in JRuby wrt the temporary data structure used to gather map results (it needs to be made thread-safe) but it can work. However in general I don't think this use of threadify is going to apply well to Enumera(ble|tor) since so many of the operations depend on the result of the previous iteration.

I'll have the remaining issues wrapped up shortly, but I'd love to see someone come up with a safe set of Enumerable-like operations that can run in parallel. For example, a detect that uses a cross-thread trigger to stop all iterations (rather than the naive threadification of detect which would not propagate a successful detection out of the thread). Things like that could be very useful.
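
One possible shape for such a detect is sketched below: a shared flag acts as the cross-thread trigger, and every worker checks it before taking more work. `parallel_detect` is a hypothetical helper, not part of threadify or JRuby, and it returns some matching element rather than necessarily the first one in order.

```ruby
require 'thread'

# Hypothetical helper: returns some element for which the block is true
# (not necessarily the first in order), or nil if none matches.
def parallel_detect(items, n = 4)
  queue = Queue.new
  items.each { |item| queue << item }
  lock  = Mutex.new
  found = false
  match = nil

  workers = Array.new(n) do
    Thread.new do
      loop do
        break if lock.synchronize { found }   # another thread already succeeded
        begin
          item = queue.pop(true)              # non-blocking pop
        rescue ThreadError
          break                               # nothing left to check
        end
        next unless yield(item)
        lock.synchronize do
          unless found                        # first successful thread wins
            found = true
            match = item
          end
        end
        break
      end
    end
  end

  workers.each(&:join)
  match
end

hit = parallel_detect((1..1000).to_a, 4) { |x| x % 97 == 0 }
p hit
```

The value printed is a multiple of 97, but which one depends on thread scheduling.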

I'd also love to see someone come up with a nice installable gem of truly thread-safe wrappers around the core collections, since in general I don't believe the core array and friends should suffer the perf penalty that comes from always synchronizing.
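
A rough sketch of what such a wrapper could look like follows. `SynchronizedArray` is a hypothetical class written for this illustration, not an existing gem. It forwards calls to a wrapped Array while holding a Monitor, which is reentrant so a block passed to a forwarded method can call back into the wrapper.

```ruby
require 'monitor'

# Hypothetical wrapper, an illustration only: every forwarded call runs
# while holding a monitor, so the plain Array underneath stays unsynchronized.
class SynchronizedArray
  def initialize(array = [])
    @array   = array
    @monitor = Monitor.new
  end

  # Forward anything the wrapped Array understands. Methods that Object
  # already defines (inspect, ==, and so on) are NOT forwarded by this sketch.
  def method_missing(name, *args, &block)
    return super unless @array.respond_to?(name)
    @monitor.synchronize { @array.__send__(name, *args, &block) }
  end

  def respond_to_missing?(name, include_private = false)
    @array.respond_to?(name, include_private) || super
  end
end

shared  = SynchronizedArray.new
threads = Array.new(4) do |t|
  Thread.new { 100.times { |i| shared << (t * 100 + i) } }
end
threads.each(&:join)
p shared.size
```

Individual calls are atomic here, but a sequence such as "check size, then push" still needs an outer lock, which is one reason a complete wrapper is harder than it looks.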

- Charlie

0.0.2 and gem should be up

a @ http://codeforpeople.com/

···

On Jul 1, 2008, at 3:11 PM, Martin DeMello wrote:

thanks, gotit. will also install the gem when it propagates, just to
keep my system informed :slight_smile:

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Charles Oliver Nutter wrote:

In general we have inserted synchronization code only where it really appears to be necessary to maintain the integrity of data structures. That means that in some cases, you need to be mindful of code actually running in parallel against e.g. arrays, hashes, strings, and so on. But we do want to reduce the possibility of a Java exception, so I'll investigate a bit.

I did find a few threading bugs in JRuby, and I'm working on them now. Most of them seem specific to Enumerator...

- Charlie

check out 0.0.3, it allows this, but the sync overhead is prohibitive for in memory stuff - for network scraping it'd be great though. anyhow, 0.0.3 allows one to 'break' from parallel processing and the value broken with will be the same as if the jobs were run serially. damn tricky that.
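
To make the "same value as a serial run" property concrete, here is a sketch that searches in parallel but always returns the match with the lowest index. It is not how threadify 0.0.3 is implemented: `first_match_in_threads` is a hypothetical helper and the lowest-index bookkeeping is an assumption chosen for illustration.

```ruby
require 'thread'

# Hypothetical helper: like Array#find, but evaluated by n threads.
# Returns the match with the lowest index, as a serial scan would.
def first_match_in_threads(items, n = 4)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  lock = Mutex.new
  best = nil                                  # lowest matching index so far

  workers = Array.new(n) do
    Thread.new do
      loop do
        begin
          item, index = queue.pop(true)       # non-blocking pop
        rescue ThreadError
          break
        end
        # Indexes are queued in ascending order, so once an earlier match
        # exists nothing left in the queue can beat it.
        break if lock.synchronize { best && index > best }
        next unless yield(item)
        lock.synchronize { best = index if best.nil? || index < best }
      end
    end
  end

  workers.each(&:join)
  best && items[best]
end

words = %w( apple avocado banana blueberry )
p first_match_in_threads(words, 3) { |w| w.start_with?('b') }
```

Every item costs at least one trip through the mutex, which matches the point above that the synchronization overhead only pays off when each job is slow, as with network requests.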

cheers.

a @ http://codeforpeople.com/

···

On Jul 11, 2008, at 2:38 PM, Charles Oliver Nutter wrote:

I'll have the remaining issues wrapped up shortly, but I'd love to see someone come up with a safe set of Enumerable-like operations that can run in parallel. For example, a detect that uses a cross-thread trigger to stop all iterations (rather than the naive threadification of detect which would not propagate a successful detection out of the thread). Things like that could be very useful.

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Thanks Charlie, I just verified that my script no longer crashes with
my latest pull of JRuby.

jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-12 rev 7146) [i386-java]

Regards,
Michael Guterl

···

On Fri, Jul 11, 2008 at 4:38 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:
