Ruby 1.9 threading performance goes non-linear

Two threads connected by a queue, passing one fixnum at a time, doing trivial calculations. Run time should be linear in number of inputs, right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.

Here's the code, with timing outputs in the DATA section:

require 'thread'

N = Integer(ARGV.shift || 1000)

q = Queue.new

th = Thread.new do
   n = 0
   while x = q.pop
     n += 1
   end
   puts "n = #{n}"
end

time0 = Process.times

N.times do
   100.times do |x|
     q.push x
   end
end
q.push nil

th.join

time1 = Process.times

puts "utime = #{time1.utime - time0.utime}"

__END__

$ ruby -v
ruby 1.9.3p0 (2011-10-30) [x86_64-linux]
$ ruby th.rb 10
n = 1000
utime = 0.01
$ ruby th.rb 100
n = 10000
utime = 0.01
$ ruby th.rb 1000
n = 100000
utime = 0.33999999999999997
$ ruby th.rb 10000
n = 1000000
utime = 44.65

Memory usage is pretty flat--seems to stay under 15MB.

For comparison, here's the same series on ruby 1.8.7:

$ ruby1.8 -v
ruby 1.8.7 (2010-06-23 patchlevel 299) [x86_64-linux]
$ ruby1.8 th.rb 10
n = 1000
utime = 0.01
$ ruby1.8 th.rb 100
n = 10000
utime = 0.14
$ ruby1.8 th.rb 1000
n = 100000
utime = 0.16
$ ruby1.8 th.rb 10000
n = 1000000
utime = 1.43

And FWIW the same series on jruby, but the comparison is probably muddled by the JIT compiler:

$ jruby -v -e 'p RUBY_VERSION'
jruby 1.6.5 (ruby-1.9.2-p136) (2011-10-25 9dcd388) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_26) [linux-amd64-java]
"1.9.2"
$ jruby th.rb 10
n = 1000
utime = 0.0349998474121094
$ jruby th.rb 100
n = 10000
utime = 0.293999910354614
$ jruby th.rb 1000
n = 100000
utime = 1.52300000190735
$ jruby th.rb 10000
n = 1000000
utime = 2.90700006484985

Adding "Thread.pass" after q.push seems to improve things as it
keeps the queue smaller.

Part of the problem is the Queue in MRI 1.9.x uses an Array internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.

The array was growing larger due to thread scheduling being different
and less likely to switch threads.

I think there was an implementation of Queue for 1.9 in
bugs.ruby-lang.org in C which may improve things, but I think there
are some improvements to the Ruby-only version Queue that can be
fixed.

···

Joel VanderWerf <joelvanderwerf@gmail.com> wrote:

Two threads connected by a queue, passing one fixnum at a time,
doing trivial calculations. Run time should be linear in number of
inputs, right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.

N.times do
  100.times do |x|
    q.push x

Ok, I've updated patch, and now array is a real circular buffer both
with
#push/#shift pattern and #unshift/#pop pattern

patch for 1.9.3 and tests are here https://gist.github.com/2981959
pull request https://github.com/ruby/ruby/pull/133
issue http://bugs.ruby-lang.org/issues/6638

Sokolov Yura aka funny-falcon

···

--
Posted via http://www.ruby-forum.com/.

Eric Wong писал 17.12.2011 02:27:

Two threads connected by a queue, passing one fixnum at a time,
doing trivial calculations. Run time should be linear in number of
inputs, right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.

N.times do
  100.times do |x|
    q.push x

Adding "Thread.pass" after q.push seems to improve things as it
keeps the queue smaller.

Part of the problem is the Queue in MRI 1.9.x uses an Array internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.

I wonder if we can make a more efficient Array by making it a circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don't see?

···

Joel VanderWerf <joelvanderwerf@gmail.com> wrote:

The array was growing larger due to thread scheduling being different
and less likely to switch threads.

I think there was an implementation of Queue for 1.9 in
bugs.ruby-lang.org in C which may improve things, but I think there
are some improvements to the Ruby-only version Queue that can be
fixed.

--
   WBR, Peter Zotov.

FWIW, JRuby's Queue is implemented in Java, primarily because we it's
faster to use Java's synchronization primitives directly. And indeed,
it uses a LinkedList.

- Charlie

···

On Fri, Dec 16, 2011 at 4:27 PM, Eric Wong <normalperson@yhbt.net> wrote:

Part of the problem is the Queue in MRI 1.9.x uses an Array internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.

The array was growing larger due to thread scheduling being different
and less likely to switch threads.

I think there was an implementation of Queue for 1.9 in
bugs.ruby-lang.org in C which may improve things, but I think there
are some improvements to the Ruby-only version Queue that can be
fixed.

It would require a lot of changes to existing C code in both Ruby itself
and C extensions that depend on RARRAY_PTR().

Better to start with a new data structure (implemented entirely in Ruby)
for an uncommon case.

···

Peter Zotov <whitequark@whitequark.org> wrote:

Eric Wong писал 17.12.2011 02:27:
>Part of the problem is the Queue in MRI 1.9.x uses an Array
>internally.
>Calling Array#shift is expensive on larger Arrays as it needs to move
>all the unshifted elements. Using a linked list as the insternal
>structure should improve things.

I wonder if we can make a more efficient Array by making it a
circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I
don't see?

Peter Zotov wrote in post #1037076:

I wonder if we can make a more efficient Array by making it a circular
buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don't
see?

Hi,

Is there some C++ STL-like data structure in Ruby? Or is it a good idea
if I (or some other people) try to make one? With STL-like data
structure you can have the best structure for the problem at hand, and
we don't have to use Array and Hash all the time.

Regards,

Bill

···

--
Posted via http://www.ruby-forum.com/\.

Eric Wong писал 17.12.2011 03:05:

Eric Wong писал 17.12.2011 02:27:
>Part of the problem is the Queue in MRI 1.9.x uses an Array
>internally.
>Calling Array#shift is expensive on larger Arrays as it needs to move
>all the unshifted elements. Using a linked list as the insternal
>structure should improve things.

I wonder if we can make a more efficient Array by making it a
circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I
don't see?

It would require a lot of changes to existing C code in both Ruby itself
and C extensions that depend on RARRAY_PTR().

Yes, RARRAY_PTR will break. This is unacceptable.

Better to start with a new data structure (implemented entirely in Ruby)
for an uncommon case.

I'm not sure that a pure Ruby implementation will be even nearly as efficient
as Array on MRI. How would you implement it?

···

Peter Zotov <whitequark@whitequark.org> wrote:

--
   WBR, Peter Zotov.

Admin Tensor писал 17.12.2011 03:12:

Peter Zotov wrote in post #1037076:

I wonder if we can make a more efficient Array by making it a circular
buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don't
see?

Hi,

Is there some C++ STL-like data structure in Ruby? Or is it a good idea
if I (or some other people) try to make one? With STL-like data
structure you can have the best structure for the problem at hand, and
we don't have to use Array and Hash all the time.

Regards,

Bill

Array _is_ std::vector, and Hash, again, is exactly std::map, if you leave out
the strong typing part. This whole discussion is about implementing a construct
like std::queue (or deque).

···

--
   WBR, Peter Zotov.

Tried a singly linked list, but object allocation overhead sucks :<

Maybe keeping the array small, swapping Thread#wakeup for Thread#run
and calling Thread#run outside of the Mutex#synchronize block
will be better...

···

Peter Zotov <whitequark@whitequark.org> wrote:

Eric Wong писал 17.12.2011 03:05:
>Better to start with a new data structure (implemented entirely in
>Ruby)
>for an uncommon case.

I'm not sure that a pure Ruby implementation will be even nearly as
efficient
as Array on MRI. How would you implement it?

Peter Zotov wrote in post #1037082:

Eric Wong писал 17.12.2011 03:05:

I wonder if we can make a more efficient Array by making it a circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push
be
O(n) only if the array grows bigger. Are there any implications I
don't see?

It would require a lot of changes to existing C code in both Ruby
itself
and C extensions that depend on RARRAY_PTR().

Yes, RARRAY_PTR will break. This is unacceptable.

Well, I actually did it, see array.c: steel shared array's container when ARY_SHARED_NUM == 1 by funny-falcon · Pull Request #133 · ruby/ruby · GitHub
(corresponding issue Feature #6638: Array as queue - Ruby master - Ruby Issue Tracking System )
and see tests here Test patch for increasing "array as queue" speed · GitHub

And it doesn't break RARRAY_PTR() cause Array#shift already moves array
to shared copy, I just used that fact in a full way.

Strictly speaking, it still some times do whole memmove, but very very
rare, so that it could be considered as real circular buffer in
practice.

I've only implemented it for #push/#shift pattern, cause most libraries
actually use it. But it could be extended for #unshift/#pop as well.

Sokolov Yura aka funny-falcon

···

--
Posted via http://www.ruby-forum.com/\.

Tried a singly linked list, but object allocation overhead sucks :<

Btw, it's in the "ll-queue" branch of git://bogomips.org/ruby
(http://bogomips.org/ruby.git/commit/?h=ll-queue\) against trunk.

Performance sucks, but at least it appears to be linear as it grows:

$ for i in 1000 10000 100000; do ./trunk/ruby -I lib /tmp/zzz.rb $i; done
n = 100000
utime = 0.27999999999999997
n = 1000000
utime = 6.08
in = 10000000
utime = 57.82

(/tmp/zzz.rb is Joel's original script without Thread.pass)

Maybe keeping the array small, swapping Thread#wakeup for Thread#run
and calling Thread#run outside of the Mutex#synchronize block
will be better...

Nope... Throwing Thread.pass/Thread#run in various places didn't seem
to significantly improve over the Ruby linked list implementation.

Maybe this is worth revisiting: Feature #3620: Add Queue, SIzedQueue and ConditionVariable implementations in C in addition to ruby ones - Ruby master - Ruby Issue Tracking System

···

Eric Wong <normalperson@yhbt.net> wrote: