Two threads connected by a queue, passing one fixnum at a time, doing trivial calculations. Run time should be linear in number of inputs, right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.
Here's the code, with timing outputs in the DATA section:
require 'thread'
N = Integer(ARGV.shift || 1000)
q = Queue.new
th = Thread.new do
n = 0
while x = q.pop
n += 1
end
puts "n = #{n}"
end
time0 = Process.times
N.times do
100.times do |x|
q.push x
end
end
q.push nil
Adding "Thread.pass" after q.push seems to improve things as it
keeps the queue smaller.
Part of the problem is the Queue in MRI 1.9.x uses an Array internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.
The array was growing larger due to thread scheduling being different
and less likely to switch threads.
I think there was an implementation of Queue for 1.9 in
bugs.ruby-lang.org in C which may improve things, but I think there
are some improvements to the Ruby-only version Queue that can be
fixed.
···
Joel VanderWerf <joelvanderwerf@gmail.com> wrote:
Two threads connected by a queue, passing one fixnum at a time,
doing trivial calculations. Run time should be linear in number of
inputs, right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.
Two threads connected by a queue, passing one fixnum at a time,
doing trivial calculations. Run time should be linear in number of
inputs, right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.
N.times do
100.times do |x|
q.push x
Adding "Thread.pass" after q.push seems to improve things as it
keeps the queue smaller.
Part of the problem is the Queue in MRI 1.9.x uses an Array internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.
I wonder if we can make a more efficient Array by making it a circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don't see?
···
Joel VanderWerf <joelvanderwerf@gmail.com> wrote:
The array was growing larger due to thread scheduling being different
and less likely to switch threads.
I think there was an implementation of Queue for 1.9 in
bugs.ruby-lang.org in C which may improve things, but I think there
are some improvements to the Ruby-only version Queue that can be
fixed.
FWIW, JRuby's Queue is implemented in Java, primarily because we it's
faster to use Java's synchronization primitives directly. And indeed,
it uses a LinkedList.
- Charlie
···
On Fri, Dec 16, 2011 at 4:27 PM, Eric Wong <normalperson@yhbt.net> wrote:
Part of the problem is the Queue in MRI 1.9.x uses an Array internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.
The array was growing larger due to thread scheduling being different
and less likely to switch threads.
I think there was an implementation of Queue for 1.9 in
bugs.ruby-lang.org in C which may improve things, but I think there
are some improvements to the Ruby-only version Queue that can be
fixed.
It would require a lot of changes to existing C code in both Ruby itself
and C extensions that depend on RARRAY_PTR().
Better to start with a new data structure (implemented entirely in Ruby)
for an uncommon case.
···
Peter Zotov <whitequark@whitequark.org> wrote:
Eric Wong писал 17.12.2011 02:27:
>Part of the problem is the Queue in MRI 1.9.x uses an Array
>internally.
>Calling Array#shift is expensive on larger Arrays as it needs to move
>all the unshifted elements. Using a linked list as the insternal
>structure should improve things.
I wonder if we can make a more efficient Array by making it a
circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I
don't see?
I wonder if we can make a more efficient Array by making it a circular
buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don't
see?
Hi,
Is there some C++ STL-like data structure in Ruby? Or is it a good idea
if I (or some other people) try to make one? With STL-like data
structure you can have the best structure for the problem at hand, and
we don't have to use Array and Hash all the time.
Eric Wong писал 17.12.2011 02:27:
>Part of the problem is the Queue in MRI 1.9.x uses an Array
>internally.
>Calling Array#shift is expensive on larger Arrays as it needs to move
>all the unshifted elements. Using a linked list as the insternal
>structure should improve things.
I wonder if we can make a more efficient Array by making it a
circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I
don't see?
It would require a lot of changes to existing C code in both Ruby itself
and C extensions that depend on RARRAY_PTR().
Yes, RARRAY_PTR will break. This is unacceptable.
Better to start with a new data structure (implemented entirely in Ruby)
for an uncommon case.
I'm not sure that a pure Ruby implementation will be even nearly as efficient
as Array on MRI. How would you implement it?
I wonder if we can make a more efficient Array by making it a circular
buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don't
see?
Hi,
Is there some C++ STL-like data structure in Ruby? Or is it a good idea
if I (or some other people) try to make one? With STL-like data
structure you can have the best structure for the problem at hand, and
we don't have to use Array and Hash all the time.
Regards,
Bill
Array _is_ std::vector, and Hash, again, is exactly std::map, if you leave out
the strong typing part. This whole discussion is about implementing a construct
like std::queue (or deque).
I wonder if we can make a more efficient Array by making it a circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push
be
O(n) only if the array grows bigger. Are there any implications I
don't see?
It would require a lot of changes to existing C code in both Ruby
itself
and C extensions that depend on RARRAY_PTR().
Performance sucks, but at least it appears to be linear as it grows:
$ for i in 1000 10000 100000; do ./trunk/ruby -I lib /tmp/zzz.rb $i; done
n = 100000
utime = 0.27999999999999997
n = 1000000
utime = 6.08
in = 10000000
utime = 57.82
(/tmp/zzz.rb is Joel's original script without Thread.pass)
Maybe keeping the array small, swapping Thread#wakeup for Thread#run
and calling Thread#run outside of the Mutex#synchronize block
will be better...
Nope... Throwing Thread.pass/Thread#run in various places didn't seem
to significantly improve over the Ruby linked list implementation.