Ruby's builtin Datastructures

Hello all,

for some time I’m wondering if there exists a library with datastructures
like deque, queue, list, btree, and similar in ruby, such that I don’t
have to implement them myself?
Until now I have found nothing, so I’m posting here in the hope that I
should not have re-rtfm’d more.

Thanks,

Brian

···


Brian Schröder
http://www.brian-schroeder.de/

for some time I’m wondering if there exists a library with datastructures
like deque, queue, list, btree, and similar in ruby, such that I don’t
have to implement them myself?

deque, queue, list… then ruby’s builtin Array class will fit.

irb
irb(main):001:0> stack =
=>
irb(main):002:0> stack.push(3)
=> [3]
irb(main):003:0> stack.push(9)
=> [3, 9]
irb(main):004:0> stack.push(10)
=> [3, 9, 10]
irb(main):005:0> stack.pop
=> 10
irb(main):006:0> stack.pop
=> 9
irb(main):007:0> stack
=> [3]
irb(main):008:0>

which kind of btree do you want? b+ b* … or

red-black tree
http://raa.ruby-lang.org/project/ruby-rbtree/

avl tree
http://raa.ruby-lang.org/project/ruby-avl/

Many goodies in the library section of RAA, which may interest you
http://raa.ruby-lang.org/cat.rhtml?category_major=Library

···

Brian Schroeder spam0504@bssoftware.de wrote:


Simon Strandgaard

“Simon Strandgaard” neoneye@adslhome.dk schrieb im Newsbeitrag
news:20040520123256.46e2f1c3.neoneye@adslhome.dk…

for some time I’m wondering if there exists a library with
datastructures
like deque, queue, list, btree, and similar in ruby, such that I don’t
have to implement them myself?

deque, queue, list… then ruby’s builtin Array class will fit.

irb
irb(main):001:0> stack =
=>
irb(main):002:0> stack.push(3)
=> [3]
irb(main):003:0> stack.push(9)
=> [3, 9]
irb(main):004:0> stack.push(10)
=> [3, 9, 10]
irb(main):005:0> stack.pop
=> 10
irb(main):006:0> stack.pop
=> 9
irb(main):007:0> stack
=> [3]
irb(main):008:0>

As queue:

irb(main):010:0> queue =
=>
irb(main):011:0> queue.push 1
=> [1]
irb(main):012:0> queue.push 2
=> [1, 2]
irb(main):013:0> queue.push 3
=> [1, 2, 3]
irb(main):014:0> queue.unshift
=> [1, 2, 3]
irb(main):015:0> queue.shift
=> 1
irb(main):016:0> queue.shift
=> 2
irb(main):017:0> queue.shift
=> 3
irb(main):018:0>

You can easily make that a bit more intuitive, if you like:

irb(main):018:0> class Array; alias :enq :push; alias :deq :shift; end
=> nil
irb(main):019:0> queue =
=>
irb(main):020:0> queue.enq 1
=> [1]
irb(main):021:0> queue.enq 2
=> [1, 2]
irb(main):022:0> queue.enq 3
=> [1, 2, 3]
irb(main):023:0> queue.deq
=> 1
irb(main):024:0> queue.deq
=> 2
irb(main):025:0> queue.deq
=> 3
irb(main):026:0> queue.deq
=> nil

And there’s also a thread safe queue:

$ irb -r thread
irb(main):001:0> queue = Queue.new
=> #<Queue:0x1017ad68 @que=, @waiting=>
irb(main):002:0> queue.enq 1
=> nil
irb(main):003:0> queue.enq 2
=> nil
irb(main):004:0> queue.enq 3
=> nil
irb(main):005:0> queue
=> #<Queue:0x1017ad68 @que=[1, 2, 3], @waiting=>
irb(main):006:0> queue.deq
=> 1
irb(main):007:0> queue.deq
=> 2
irb(main):008:0> queue.deq
=> 3

Regards

robert
···

Brian Schroeder spam0504@bssoftware.de wrote:

Thanks for your answer. I should have looked in the raa by myself. Just a
question regarding array as deque. How efficient is it? Is all the data
shifted through the array every time I push something at the front and
remove it from the back, or is it organized more efficiently?

Thanks,

Brian

···

On Thu, 20 May 2004 19:37:19 +0900, Simon Strandgaard wrote:

deque, queue, list… then ruby’s builtin Array class will fit.

irb
irb(main):001:0> stack =
=>
irb(main):002:0> stack.push(3)
=> [3]
irb(main):003:0> stack.push(9)
=> [3, 9]
irb(main):004:0> stack.push(10)
=> [3, 9, 10]
irb(main):005:0> stack.pop
=> 10
irb(main):006:0> stack.pop
=> 9
irb(main):007:0> stack
=> [3]
irb(main):008:0>

which kind of btree do you want? b+ b* … or

red-black tree
http://raa.ruby-lang.org/project/ruby-rbtree/

avl tree
http://raa.ruby-lang.org/project/ruby-avl/

Many goodies in the library section of RAA, which may interest you
http://raa.ruby-lang.org/cat.rhtml?category_major=Library


Brian Schröder
http://www.brian-schroeder.de/

Brian Schroeder wrote:

Thanks for your answer. I should have looked in the raa by myself. Just a
question regarding array as deque. How efficient is it? Is all the data
shifted through the array every time I push something at the front and
remove it from the back, or is it organized more efficiently?

I don’t know how Array is implemented… I think matz (Ruby’s author) has
chosen a datastructure with constant-time head/tail insert/removal.
Not sure though.

It would be nice with an O(x) chart of the most common operations:

my guess … please correct me if im wrong

···

push element O(1)
pop element O(1)
shift element O(?) I hope O(1)
unshift element O(?) I hope O(1)
insert element O(n)
remove element O(n)


Simon Strandgaard

[Brian Schroeder spam0504@bssoftware.de, 2004-05-20 14.23 CEST]

Thanks for your answer. I should have looked in the raa by myself. Just a
question regarding array as deque. How efficient is it? Is all the data
shifted through the array every time I push something at the front and
remove it from the back, or is it organized more efficiently?

If I’m not mistaken, #push (appends at the end), #pop (remove last) and
#shift (remove first) are efficient (#shift moves the pointer). #unshift
(insert at beginning) moves all elements.

But I can be mistaken.

One of the great things about ruby is how easy it is to test things
like this :slight_smile:

mark@imac% ruby
require ‘profile’
def t_pop(ary) 5000.times{|n| ary.pop} end
def t_push(ary) 5000.times{|n| ary.push n} end
def t_shift(ary) 5000.times{|n| ary.shift} end
def t_unshift(ary) 5000.times{|n| ary.unshift n} end

a=
t_push a
t_pop a
t_unshift a
t_shift a
% cumulative self self total
time seconds seconds calls ms/call ms/call name
71.77 11.06 11.06 4 2765.00 3850.00 Integer#times
8.37 12.35 1.29 5000 0.26 0.26 Array#unshift
7.46 13.50 1.15 5000 0.23 0.23 Array#pop
7.40 14.64 1.14 5000 0.23 0.23 Array#push
4.93 15.40 0.76 5000 0.15 0.15 Array#shift
0.84 15.53 0.13 1 130.00 130.00
Profiler__.start_profile
0.06 15.54 0.01 1 10.00 3720.00 Object#t_shift
0.00 15.54 0.00 4 0.00 0.00
Module#method_added
0.00 15.54 0.00 1 0.00 3860.00 Object#t_pop
0.00 15.54 0.00 1 0.00 15410.00 #toplevel
0.00 15.54 0.00 1 0.00 4080.00 Object#t_unshift
0.00 15.54 0.00 1 0.00 3750.00 Object#t_push

Interpretation: it looks like unshift is slightly (~6%) slower than
pop. shift is barely faster than push. Still, they are all comparable
in speed.

I ran this test a few times, with similar results.

cheers,
–Mark

···

On May 20, 2004, at 5:23 AM, Brian Schroeder wrote:

Thanks for your answer. I should have looked in the raa by myself.
Just a
question regarding array as deque. How efficient is it? Is all the data
shifted through the array every time I push something at the front and
remove it from the back, or is it organized more efficiently?

my guess … please correct me if im wrong


push element O(1)
amortized O(1), I think
pop element O(1)
shift element O(?) I hope O(1)
O(1)
unshift element O(?) I hope O(1)
O(n)

···

On Thu, May 20, 2004 at 09:45:12PM +0900, Simon Strandgaard wrote:

insert element O(n)
remove element O(n)


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

  • gb notes that fdisk thinks his cdrom can store one terabyte
    – Seen on #Linux

8< ----- snip -----

I don’t know how Array is implemented… I think matz (Ruby’s author) has
chosen a datastructure with constant-time head/tail insert/removal.
Not sure though.

It would be nice with an O(x) chart of the most common operations:

my guess … please correct me if im wrong


push element O(1)
pop element O(1)
shift element O(?) I hope O(1)
unshift element O(?) I hope O(1)
insert element O(n)
remove element O(n)


Simon Strandgaard

Assuming that the Array is set up like a C/C++ (double) linked list
with pointers, then all should take Theta(1), except insert which can
take Theta(n) if the element is out of the Array boundary and all
intermediate items need to be created first. If insert/remove
specifies inserting elements into a sorted list then it can be done in
Theta(log n) by using a binary search to find the location to insert.

But look what happens for larger data,
% cumulative self self total
time seconds seconds calls ms/call ms/call name
56.80 33.00 33.00 100000 0.33 0.33 Array#unshift
34.77 53.20 20.20 4 5050.00 14525.00 Integer#times
4.87 56.03 2.83 100000 0.03 0.03 Array#pop
1.94 57.16 1.13 100000 0.01 0.01 Array#push
1.62 58.10 0.94 100000 0.01 0.01 Array#shift
0.02 58.11 0.01 1 10.00 10.00 Profiler__.start_profile
0.00 58.11 0.00 4 0.00 0.00 Module#method_added
0.00 58.11 0.00 1 0.00 6520.00 Object#t_pop
0.00 58.11 0.00 1 0.00 6390.00 Object#t_shift
0.00 58.11 0.00 1 0.00 58100.00 #toplevel
0.00 58.11 0.00 1 0.00 39010.00 Object#t_unshift
0.00 58.11 0.00 1 0.00 6180.00 Object#t_push

···

— Mark Hubbart discord@mac.com wrote:

On May 20, 2004, at 5:23 AM, Brian Schroeder wrote:

Thanks for your answer. I should have looked in the raa by myself.
Just a
question regarding array as deque. How efficient is it? Is all the data
shifted through the array every time I push something at the front and
remove it from the back, or is it organized more efficiently?

One of the great things about ruby is how easy it is to test things
like this :slight_smile:

mark@imac% ruby
require ‘profile’
def t_pop(ary) 5000.times{|n| ary.pop} end
def t_push(ary) 5000.times{|n| ary.push n} end
def t_shift(ary) 5000.times{|n| ary.shift} end
def t_unshift(ary) 5000.times{|n| ary.unshift n} end

a=
t_push a
t_pop a
t_unshift a
t_shift a
% cumulative self self total
time seconds seconds calls ms/call ms/call name
71.77 11.06 11.06 4 2765.00 3850.00 Integer#times
8.37 12.35 1.29 5000 0.26 0.26 Array#unshift
7.46 13.50 1.15 5000 0.23 0.23 Array#pop
7.40 14.64 1.14 5000 0.23 0.23 Array#push
4.93 15.40 0.76 5000 0.15 0.15 Array#shift
0.84 15.53 0.13 1 130.00 130.00
Profiler__.start_profile
0.06 15.54 0.01 1 10.00 3720.00 Object#t_shift
0.00 15.54 0.00 4 0.00 0.00
Module#method_added
0.00 15.54 0.00 1 0.00 3860.00 Object#t_pop
0.00 15.54 0.00 1 0.00 15410.00 #toplevel
0.00 15.54 0.00 1 0.00 4080.00 Object#t_unshift
0.00 15.54 0.00 1 0.00 3750.00 Object#t_push

Interpretation: it looks like unshift is slightly (~6%) slower than
pop. shift is barely faster than push. Still, they are all comparable
in speed.

I ran this test a few times, with similar results.


Do you Yahoo!?
Yahoo! Domains – Claim yours for only $14.70/year
http://smallbusiness.promotions.yahoo.com/offer

How much does it cost for an array access in the middle? Why does it cost
order(n) to remove the tail? Why doesn’t it float backwards until it needs
memory ahead of it for instance? I’d be curious to now a bit more of about
this, as well as how the other datastructures are implemented.

Charlie

···

On Thu, 20 May 2004, Mauricio Fernández wrote:

On Thu, May 20, 2004 at 09:45:12PM +0900, Simon Strandgaard wrote:

my guess … please correct me if im wrong


push element O(1)
amortized O(1), I think
pop element O(1)
shift element O(?) I hope O(1)
O(1)
unshift element O(?) I hope O(1)
O(n)
insert element O(n)
remove element O(n)

oops. I messed up - I plead sleep deprivation.

Here’s a different test I ran: create arrays of three different sizes,
time various operations on them. Here’s the results:

Operation Elements Seconds per
in Array Operation

···

On May 20, 2004, at 9:23 AM, Jeff Mitchell wrote:

Interpretation: it looks like unshift is slightly (~6%) slower than
pop. shift is barely faster than push. Still, they are all comparable
in speed.

I ran this test a few times, with similar results.

But look what happens for larger data,


Array#unshift - 10000: 0.00091sec
Array#unshift - 100000: 0.02521sec
Array#unshift - 1000000: 0.14279sec
Array#unshift - 10000000: 1.45725sec
Array#shift - 10000: 0.00011sec
Array#shift - 100000: 0.00010sec
Array#shift - 1000000: 0.00011sec
Array#shift - 10000000: 0.00002sec
Array#push - 10000: 0.00004sec
Array#push - 100000: 0.00011sec
Array#push - 1000000: 0.00011sec
Array#push - 10000000: 0.44526sec
Array#pop - 10000: 0.00003sec
Array#pop - 100000: 0.00003sec
Array#pop - 1000000: 0.00010sec
Array#pop - 10000000: 0.00010sec
Array#delete_at - 10000: 0.00122sec
Array#delete_at - 100000: 0.01302sec
Array#delete_at - 1000000: 0.14309sec
Array#delete_at - 10000000: 1.44485sec
Array#insert - 10000: 0.00036sec
Array#insert - 100000: 0.02693sec
Array#insert - 1000000: 0.12566sec
Array#insert - 10000000: 3.38137sec

I’ll let you draw your own conclusions, but it seems fairly simple.

cheers,
–Mark

How much does it cost for an array access in the middle?

O(1), since an Array is implemented as a C array of VALUEs.

Why does it cost order(n) to remove the tail? Why doesn’t it float
backwards until it needs memory ahead of it for instance?

Array uses internally an array of VALUEs, array->ptr.
When you unshift a VALUE, it has to be inserted at array->ptr[0]; all
elements must be shifted, hence O(n). You cannot just do array->ptr–
since you’d fall outside the malloc()ed area.

I’d be curious to now a bit more of about
this, as well as how the other datastructures are implemented.

I recommend you read the sources, they are very easy to follow. For
instance, to see how unshift is implemented, just look at array.c, and
scroll down to the method definition to find the associated C function.
rb_define_method(rb_cArray, “unshift”, rb_ary_unshift_m, -1);

···

On Fri, May 21, 2004 at 07:23:47AM +0900, Charles Comstock wrote:
================


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

baz bat bamus batis bant.
– James Troup