time 'without threadify' do
  uris.each do |uri|
    body = open(uri){|pipe| pipe.read}
  end
end

time 'with threadify' do
  uris.threadify do |uri|
    body = open(uri){|pipe| pipe.read}
  end
end

BEGIN {
  def time label
    a = Time.now.to_f
    yield
  ensure
    b = Time.now.to_f
    y label => (b - a)
  end
}
~ > ruby sample/a.rb
---
without threadify: 7.41900205612183
---
with threadify: 3.69886112213135
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama
➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"
...
real 0m11.889s
user 0m11.733s
sys 0m0.188s
~/NetBeansProjects/jruby ➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.threadify {|i| p fib(i)}"
...
real 0m8.213s
user 0m12.722s
sys 0m0.178s
(One thread on my system consumes roughly 65-70% CPU, which explains why full CPU on both cores doesn't double performance here)
I also found some weird bug where Thread#kill/exit from within the thread interacts weirdly with join happening outside, and never terminates. Fixing that now.
require 'open-uri'
require 'yaml'
require 'rubygems'
require 'threadify'

uris =
  %w(
    http://google.com
    http://yahoo.com
    http://rubyforge.org
    http://ruby-lang.org
    http://kcrw.org
    http://drawohara.com
    http://codeforpeople.com
  )

time 'without threadify' do
  uris.each do |uri|
    body = open(uri){|pipe| pipe.read}
  end
end

time 'with threadify' do
  uris.threadify do |uri|
    body = open(uri){|pipe| pipe.read}
  end
end

BEGIN {
  def time label
    a = Time.now.to_f
    yield
  ensure
    b = Time.now.to_f
    y label => (b - a)
  end
}
➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"
wow that's cool - now that's a seriously easy way to parallelize
I also found some weird bug where Thread#kill/exit from within the thread interacts weirdly with join happening outside, and never terminates. Fixing that now.
glad to have helped
i just pushed out 0.0.2 and it just lets the thread die rather than self-destructing. see how that works...
On Jul 1, 2008, at 3:04 PM, Charles Oliver Nutter wrote:
I was doing some comparison between threadify and peach with JRuby,
when I noticed some interesting behavior with using
Enumerator#to_enum.
Sample code and three different results are posted here: http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.
i just pushed out 0.0.2 and it just lets the thread die rather than self-destructing. see how that works...
I fixed it in JRuby just now. Thread#kill does an implicit join in JRuby to make sure the thread dies, but when the target was the caller it was still trying to join itself in a weird way. Basically, breaking out of the loop instead of calling Thread#exit solved it. Your 0.0.2 change is probably equivalent.
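The pattern described here, a worker that ends by leaving its loop rather than by calling Thread#exit on itself, might look roughly like the sketch below. This is not threadify's source; the names `jobs`, `results`, and `DONE` are invented for the illustration, and only the standard Thread and Queue classes are used.

```ruby
require 'thread'

# Hypothetical names for illustration only; not taken from threadify.
DONE = Object.new      # sentinel telling a worker there is no more work
jobs = Queue.new
results = Queue.new

(1..10).each { |n| jobs << n }

workers = 4.times.map do
  Thread.new do
    loop do
      job = jobs.pop
      break if job.equal?(DONE)   # leave the loop; the thread then ends on its own
      results << job * 2
    end
  end
end

workers.size.times { jobs << DONE }   # one sentinel per worker
workers.each { |t| t.join }           # join returns once each block has finished

doubled = []
doubled << results.pop until results.empty?
doubled.sort!
```

Because every worker simply returns from its block, the outside `join` never has to cooperate with a thread that is killing itself.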
On Tue, Jul 1, 2008 at 5:04 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:
ara howard wrote:
this one's for you charlie
Appears to work just dandy under JRuby:
Thanks for filing the bug. I'm looking into it now.
In general we have inserted synchronization code only where it really appears to be necessary to maintain the integrity of data structures. That means that in some cases, you need to be mindful of code actually running in parallel against e.g. arrays, hashes, strings, and so on. But we do want to reduce the possibility of a Java exception, so I'll investigate a bit.
- Charlie
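As a small illustration of being "mindful" of parallel access to a core collection, the sketch below guards a shared array with a Mutex supplied by the caller. The variable names are made up, and this is only one way to do it, assuming the collection itself performs no synchronization.

```ruby
require 'thread'

# Hypothetical example: several threads append to one shared array.
shared = []
lock = Mutex.new

threads = (1..8).map do |i|
  Thread.new do
    100.times do |j|
      # Only one thread at a time may touch the array.
      lock.synchronize { shared << [i, j] }
    end
  end
end

threads.each { |t| t.join }
```

With the lock in place, all 800 appends should land regardless of how the threads are scheduled.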
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.
Ok, there's good news and bad news. First the good news.
I've found several egregious threading bugs in JRuby's Enumerable implementation that probably caused the bulk of errors you saw. Basically, the runtime information for the main Ruby thread in JRuby was getting reused by the blocks passed into threadify, causing all sorts of wacky errors (multiple threads all sharing runtime thread data...fun!). Fixing that seems to have resolved most of the errors.
Now the bad news...
What you're doing is a bit suspect. In this case, it works out reasonably well, since you're just doing a map and gathering results. There are some remaining bugs in JRuby wrt the temporary data structure used to gather map results (it needs to be made thread-safe) but it can work. However, in general I don't think this use of threadify is going to apply well to Enumera(ble|tor) since so many of the operations depend on the result of the previous iteration.
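To make the distinction concrete, here is a rough sketch of a map whose iterations are independent, so each can run in its own thread and write into its own slot of a result array. `parallel_map` is a hypothetical helper, not something threadify or JRuby provides, and it naively starts one thread per element. An operation like inject could not be split up this way, because each step needs the value produced by the step before it.

```ruby
require 'thread'

# Hypothetical helper, not threadify's or JRuby's implementation.
def parallel_map(items, &block)
  results = Array.new(items.size)
  lock = Mutex.new

  threads = items.each_with_index.map do |item, index|
    Thread.new do
      value = block.call(item)
      # Guard the gathering structure; each thread writes only its own slot.
      lock.synchronize { results[index] = value }
    end
  end

  threads.each { |t| t.join }
  results
end

squares = parallel_map([1, 2, 3, 4]) { |n| n * n }
```

Results are stored by index, so they come back in the original order even if the threads finish out of order.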
I'll have the remaining issues wrapped up shortly, but I'd love to see someone come up with a safe set of Enumerable-like operations that can run in parallel. For example, a detect that uses a cross-thread trigger to stop all iterations (rather than the naive threadification of detect which would not propagate a successful detection out of the thread). Things like that could be very useful.
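One possible shape for such a detect is sketched below: a shared flag acts as the cross-thread trigger, and workers stop claiming new elements once any of them has found a match. `parallel_detect` and its `thread_count` parameter are hypothetical, and unlike a serial detect this version returns some matching element, not necessarily the first one.

```ruby
require 'thread'

# Hypothetical helper, not part of threadify or JRuby.
# Returns some element for which the block is true, or nil if none is.
def parallel_detect(items, thread_count = 4, &block)
  lock = Mutex.new
  next_index = 0
  found = false     # the cross-thread trigger
  result = nil

  workers = (1..thread_count).map do
    Thread.new do
      loop do
        index = nil
        lock.synchronize do
          # Claim the next index unless some thread already found a match.
          unless found || next_index >= items.size
            index = next_index
            next_index += 1
          end
        end
        break if index.nil?

        item = items[index]
        next unless block.call(item)

        lock.synchronize do
          unless found
            found = true
            result = item
          end
        end
        break
      end
    end
  end

  workers.each { |t| t.join }
  result
end

hit = parallel_detect((1..1000).to_a) { |n| n % 97 == 0 }
miss = parallel_detect([1, 2, 3]) { |n| n > 5 }
```

Jobs already in flight when the flag flips still run to completion; the trigger only stops new elements from being claimed.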
I'd also love to see someone come up with a nice installable gem of truly thread-safe wrappers around the core collections, since in general I don't believe the core array and friends should suffer the perf penalty that comes from always synchronizing.
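A very small sketch of what such a wrapper could look like follows. `SynchronizedArray` is a made-up class, not an existing gem, and it only makes individual calls atomic; a check-then-act sequence spanning several calls would still need its own locking.

```ruby
require 'monitor'

# Hypothetical wrapper class; not an existing gem or JRuby API.
class SynchronizedArray
  def initialize(array = [])
    @array = array
    @monitor = Monitor.new   # reentrant, so blocks may call back into the wrapper
  end

  # Forward every call to the wrapped array while holding the monitor.
  def method_missing(name, *args, &block)
    if @array.respond_to?(name)
      @monitor.synchronize { @array.__send__(name, *args, &block) }
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @array.respond_to?(name, include_private) || super
  end
end

list = SynchronizedArray.new
threads = (1..8).map do |i|
  Thread.new { 100.times { |j| list << i * 1000 + j } }
end
threads.each { |t| t.join }
```

Keeping the locking in a separate wrapper means code that never shares its arrays across threads pays nothing for it.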
In general we have inserted synchronization code only where it really appears to be necessary to maintain the integrity of data structures. That means that in some cases, you need to be mindful of code actually running in parallel against e.g. arrays, hashes, strings, and so on. But we do want to reduce the possibility of a Java exception, so I'll investigate a bit.
I did find a few threading bugs in JRuby, and I'm working on them now. Most of them seem specific to Enumerator...
check out 0.0.3, it allows this, but the sync overhead is prohibitive for in-memory stuff - for network scraping it'd be great though. anyhow, 0.0.3 allows one to 'break' from parallel processing and the value broken with will be the same as if the jobs were run serially. damn tricky that.
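The idea of a parallel 'break' that yields the serial answer can be sketched as follows. This is not how threadify 0.0.3 does it; `each_with_serial_break` and the `[:break, value]` return convention are assumptions made for the illustration. Workers claim indexes in increasing order, and the break with the lowest index wins, which is the one a serial run would have hit first.

```ruby
require 'thread'

# Hypothetical helper, not threadify's implementation.
# The block returns [:break, value] to stop early, anything else to continue.
# Returns the value of the lowest-index break, or nil if no job breaks.
def each_with_serial_break(items, thread_count = 4, &block)
  lock = Mutex.new
  next_index = 0
  break_index = nil   # lowest index that has asked to break so far
  break_value = nil

  workers = (1..thread_count).map do
    Thread.new do
      loop do
        index = nil
        lock.synchronize do
          # Jobs past a known break are skipped; earlier ones must still run.
          limit = break_index || items.size
          if next_index < limit
            index = next_index
            next_index += 1
          end
        end
        break if index.nil?

        signal, value = block.call(items[index])
        next unless signal == :break

        lock.synchronize do
          if break_index.nil? || index < break_index
            break_index = index
            break_value = value
          end
        end
      end
    end
  end

  workers.each { |t| t.join }
  break_value
end

first = each_with_serial_break((0...100).to_a) { |n| n % 10 == 7 ? [:break, n] : nil }
none = each_with_serial_break([1, 2, 3]) { |n| nil }
```

Only the returned value matches a serial run; side effects of jobs that started before the break was noticed have still happened.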
On Jul 11, 2008, at 2:38 PM, Charles Oliver Nutter wrote:
On Fri, Jul 11, 2008 at 4:38 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:
Michael Guterl wrote:
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.