Good or best way to allocate a large array

Ralph Shnelvar wrote:

Marnen,

Unless "scope" means something different in Ruby than it does in other
languages, it would appear that the garbage collector cannot possibly
free memory when something goes out of scope but, instead, when the
UseCount goes to zero.
  

Except that uses aren't counted in Ruby. Ruby has a mark-and-sweep garbage collector. When Ruby runs out of memory a garbage collection run is triggered. If there aren't any references to the objects in question left, they aren't reachable from the root set and thus not marked in the first phase of GC, in the second phase they will then (too be conservative: most likely) be freed.

···

--
Florian Frank

Is there any way to tell what architecture Ruby thinks I'm using?

> Cheers

> robert
  
Check the size of an integer.

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
$ irb
irb(main):001:0> 1.size
=> 8

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
$ irb
irb(main):001:0> 1.size
=> 4

-Justin

Ralph Shnelvar wrote:

> Ralph Shnelvar wrote:
> [...]

Could someone point me at documentation on narray, please.

> You could have found it in about 5 seconds with Google. Please try
that
> before posting in future.

I am new to Ruby.

But not to Google, surely.

There are/were indications that this is an incmoplete project.

Indeed, as a newbie, I attempted to install/expand the tar file and
could not get the installation to work.

Hence, my request which I had hoped would point me at the most recent
information.

Then you probably would have received more helpful answers if you had
said that -- without that context, most people would just point you to
the obvious docs where you'd already been.

Please include all relevant information when asking for help. You'll
get better answers if you do that.

Best,

···

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org

--
Posted via http://www.ruby-forum.com/\.

Florian,

If there is no use count, how does Ruby know that there are no
references to the object that should be freed?

Ralph

Monday, November 9, 2009, 3:02:37 PM, you wrote:

···

Ralph Shnelvar wrote:

Marnen,

Unless "scope" means something different in Ruby than it does in other
languages, it would appear that the garbage collector cannot possibly
free memory when something goes out of scope but, instead, when the
UseCount goes to zero.
  

Except that uses aren't counted in Ruby. Ruby has a mark-and-sweep
garbage collector. When Ruby runs out of memory a garbage collection run
is triggered. If there aren't any references to the objects in question
left, they aren't reachable from the root set and thus not marked in the
first phase of GC, in the second phase they will then (too be
conservative: most likely) be freed.

--
Best regards,
Ralph mailto:ralphs@dos32.com

Justin,

Is there any way to tell what architecture Ruby thinks I'm using?

> Cheers

> robert
  
Check the size of an integer.

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
$ irb

irb(main):001:0>> 1.size
=>> 8

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
$ irb

irb(main):001:0>> 1.size
=>> 4

What a lovely and elegant solution!

Thanks!

Ralph

> There are/were indications that this is an incmoplete project.

Some of the docs do make it look this way, but I think most who have
used it would agree that narray is quite mature. For what it does, it
seems to be very complete and very robust, and it has been around a
long time and seen much use.

> Indeed, as a newbie, I attempted to install/expand the tar file and
> could not get the installation to work.

I have used it as a gem and built it from source on many different
machines with no trouble at all. It's one of the few gems requiring
compilation I wouldn't hesitate using for any cross-platform project
since it has no dependencies to speak of and seems to compile without
a hitch every time. I'd be happy to help if you want to email me
personally.

As Florian said, MRI ruby uses a mark-sweep GC algorithm, when it
needs to GC it traces out references from a set of root objects, and
marks any objects which are reachable recursively.

Then it frees any objects which aren't marked.

Reference counting might seem simpler, but it is expensive because the
count needs to be maintained everytime a reference changes, and it
can't reclaim cyclic garbage which can't be reached from a root:

       a -> b -> c
       ^ |

···

On Mon, Nov 9, 2009 at 5:46 PM, Ralph Shnelvar <ralphs@dos32.com> wrote:

Florian,

If there is no use count, how does Ruby know that there are no
references to the object that should be freed?

       *------------+

Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Rick,

Ok ... but the point is that the objects are not gc'd when things go
out of scope but, instead, when Ruby needs memory, right?

And if you have a huge array with gobs of objects, the gc is gonna
take a while?

Ralph

Monday, November 9, 2009, 4:06:23 PM, you wrote:

···

On Mon, Nov 9, 2009 at 5:46 PM, Ralph Shnelvar <ralphs@dos32.com> wrote:

Florian,

If there is no use count, how does Ruby know that there are no
references to the object that should be freed?

As Florian said, MRI ruby uses a mark-sweep GC algorithm, when it
needs to GC it traces out references from a set of root objects, and
marks any objects which are reachable recursively.

Then it frees any objects which aren't marked.

Reference counting might seem simpler, but it is expensive because the
count needs to be maintained everytime a reference changes, and it
can't reclaim cyclic garbage which can't be reached from a root:

       a -> b -> c
       ^ |
       *------------+

Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

--
Best regards,
Ralph mailto:ralphs@dos32.com

* Rick DeNatale <rick.denatale@gmail.com> (2009-11-10) schrieb:

Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

mfg, simon .... l

Hi,

You can manually kickstart the GC by using GC.start.

   http://ruby-doc.org/core/classes/GC.html#M003682

Otherwise, AFAIK, the MRI GCs before breaking certain memory barriers, but
don't quote me on that. (I think it does that by counting mallocs and
starting the GC once a certain number is hit)

This actually explains this behavior:

   $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
   real 0m4.091s
   user 0m3.430s
   sys 0m0.361s

   $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {1<<63}'

   real 0m2.489s
   user 0m1.845s
   sys 0m0.565s

But be aware though, that this is not caused by the allocation
of the array:

   $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'

   real 0m0.846s
   user 0m0.735s
   sys 0m0.049s

   $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {x}'

   real 0m0.778s
   user 0m0.710s
   sys 0m0.052s

So, before (knowingly) breaking those limits, it might be an option
to disable the GC. Handle with care, though.

Regards,
Florian

···

On Nov 9, 2009, at 9:05 PM, Ralph Shnelvar wrote:

Rick,

Ok ... but the point is that the objects are not gc'd when things go
out of scope but, instead, when Ruby needs memory, right?

And if you have a huge array with gobs of objects, the gc is gonna
take a while?

Ralph

Ralph Shnelvar wrote:

Rick,

Ok ... but the point is that the objects are not gc'd when things go
out of scope but, instead, when Ruby needs memory, right?

And if you have a huge array with gobs of objects, the gc is gonna
take a while?

Perhaps. But there's no reason to do this. Use a database. Even a
file-based DB like SQLite will make your life easier.

Ralph

Best,

···

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org
--
Posted via http://www.ruby-forum.com/\.

No, a copying GC doesn't substitute for the sweep phase of mark and sweep.

Instead it works by tracking references to new objects from old
objects, and copying live objects to a new generation space, leaving
new objects which are not reachable behind. After an object has
survived a number of such scavenging cycles it gets moved to old space
which is then usually gc'd using another method such as mark and
sweep. A copying GC does a bit of marking via a write barrier which
notes when a reference to a 'young' object has been stored in an 'old'
object, and combines the traversal of the young object space with
collection by moving surviving objects.

It's efficient because the overhead of eliminating young dead objects
is quite a bit lower, and typically there is a lot of infant mortality
in a highly object oriented system.

···

On Thu, Nov 12, 2009 at 10:05 AM, Simon Krahnke <overlord@gmx.li> wrote:

* Rick DeNatale <rick.denatale@gmail.com> (2009-11-10) schrieb:

Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Florian,

Again, I am a newbie but ...

What I _think_ you are saying is that the difference between:

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
real 0m4.091s

and

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'
real 0m0.846s

is that the first initializes 300_000*20 BigNums and the second initializes 300_000*20 references to a BigNum.

Ralph

Monday, November 9, 2009, 7:32:06 PM, you wrote:

···

Hi,

You can manually kickstart the GC by using GC.start.

   module GC - RDoc Documentation

Otherwise, AFAIK, the MRI GCs before breaking certain memory barriers,
but
don't quote me on that. (I think it does that by counting mallocs and
starting the GC once a certain number is hit)

This actually explains this behavior:

   $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
   real 0m4.091s
   user 0m3.430s
   sys 0m0.361s

   $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {1<<63}'

   real 0m2.489s
   user 0m1.845s
   sys 0m0.565s

But be aware though, that this is not caused by the allocation
of the array:

   $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'

   real 0m0.846s
   user 0m0.735s
   sys 0m0.049s

   $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {x}'

   real 0m0.778s
   user 0m0.710s
   sys 0m0.052s

So, before (knowingly) breaking those limits, it might be an option
to disable the GC. Handle with care, though.

Regards,
Florian

On Nov 9, 2009, at 9:05 PM, Ralph Shnelvar wrote:

Rick,

Ok ... but the point is that the objects are not gc'd when things go
out of scope but, instead, when Ruby needs memory, right?

And if you have a huge array with gobs of objects, the gc is gonna
take a while?

Ralph

--
Best regards,
Ralph mailto:ralphs@dos32.com

Marnen Laibow-Koser wrote:

[...] there's no reason to do this. Use a database. Even a
file-based DB like SQLite will make your life easier.

What a great idea. I suggest we email all the image-processing
developers and suggest them to use an RDBM instead of memory arrays.

···

--
Posted via http://www.ruby-forum.com/\.

Ralph,

Well, that's a minor point. I just reused Roberts examples because
I expected that to be rather clear.

The thing is that an Array of size (300_000*20) already has as many
slots for references. So it's not "initializing references". It just
makes every field of the Array referencing that one BigNum. There is
no allocation or initialization happening in this step.

As you correctly stated, the first example allocs (300_000*20 + 1)
objects. The interesting point in this is that it triggers the
GC more than once, although there is nothing to collect.

Regards,
Florian

···

On Nov 10, 2009, at 4:30 AM, Ralph Shnelvar wrote:

Florian,

Again, I am a newbie but ...

What I _think_ you are saying is that the difference between:

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
real 0m4.091s

and

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'
real 0m0.846s

is that the first initializes 300_000*20 BigNums and the second initializes 300_000*20 references to a BigNum.

Ralph

Monday, November 9, 2009, 7:32:06 PM, you wrote:

> Hi,

> You can manually kickstart the GC by using GC.start.

> module GC - RDoc Documentation

> Otherwise, AFAIK, the MRI GCs before breaking certain memory barriers,
> but
> don't quote me on that. (I think it does that by counting mallocs and
> starting the GC once a certain number is hit)

> This actually explains this behavior:

> $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
> real 0m4.091s
> user 0m3.430s
> sys 0m0.361s

> $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {1<<63}'

> real 0m2.489s
> user 0m1.845s
> sys 0m0.565s

> But be aware though, that this is not caused by the allocation
> of the array:

> $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'

> real 0m0.846s
> user 0m0.735s
> sys 0m0.049s

> $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {x}'

> real 0m0.778s
> user 0m0.710s
> sys 0m0.052s

> So, before (knowingly) breaking those limits, it might be an option
> to disable the GC. Handle with care, though.

> Regards,
> Florian

> On Nov 9, 2009, at 9:05 PM, Ralph Shnelvar wrote:

Rick,

Ok ... but the point is that the objects are not gc'd when things go
out of scope but, instead, when Ruby needs memory, right?

And if you have a huge array with gobs of objects, the gc is gonna
take a while?

Ralph

--
Best regards,
Ralph mailto:ralphs@dos32.com

--
Florian Gilcher

smtp: flo@andersground.net
jabber: Skade@jabber.ccc.de
gpg: 533148E2

The thing is that an Array of size (300_000*20) already has as many
slots for references. So it's not "initializing references". It just
makes every field of the Array referencing that one BigNum. There is
no allocation or initialization happening in this step.

Right. Only the single BigNum and the Array need to be allocated.

As you correctly stated, the first example allocs (300_000*20 + 1)
objects. The interesting point in this is that it triggers the
GC more than once, although there is nothing to collect.

Well, but this is expected: the second example has more need for
memory (every time a new BigNum must be created), and if memory is
needed it is reasonable to first start GC before asking the OS for
more memory. GC cannot know that it will not find anything
collectible.

Kind regards

robert

···

2009/11/10 Florian Gilcher <flo@andersground.net>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Florian,

The thing is that an Array of size (300_000*20) already has as many
slots for references. So it's not "initializing references". It just
makes every field of the Array referencing that one BigNum. There is
no allocation or initialization happening in this step.

There has to be initialization, doesn't there?

irb(main):001:0> a=Array.new(10)
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
irb(main):002:0> a[5]
=> nil

···

--
Best regards,
Ralph mailto:ralphs@dos32.com

It was not meant as a complaint, more as an explanation why the behavior
in my first mail happens (~~40% performance gain by switching off the GC
before allocating the array). It is just a sample where the heuristic
applied by the GC somewhat fails.

The GC behavior is reasonable, but not efficient in that case. In most cases,
this one second does not really matter, so I would recommend against
tampering with it. Thats why I wrote that you should handle this with care.

It is, in the end, a nice example illustrating how the GC behaves and
that's what I used it for.

Regards,
Florian

···

On Nov 10, 2009, at 10:39 AM, Robert Klemme wrote:

As you correctly stated, the first example allocs (300_000*20 + 1)
objects. The interesting point in this is that it triggers the
GC more than once, although there is nothing to collect.

Well, but this is expected: the second example has more need for
memory (every time a new BigNum must be created), and if memory is
needed it is reasonable to first start GC before asking the OS for
more memory. GC cannot know that it will not find anything
collectible.

Kind regards

robert

What's about using C or C++ to allocate and deallocate the array? I do not
think Ruby was made to create such Arrays.

Jonathan Schmidt-Dominé - Developer wrote:

What's about using C or C++ to allocate and deallocate the array?

Unnecessary. Just use a database.

I do
not
think Ruby was made to create such Arrays.

What gives you that idea?

Best,

···

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org
--
Posted via http://www.ruby-forum.com/\.