Good or best way to allocate a large array

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

Ralph

Ralph Shnelvar wrote:

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or
64-bit quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

Ralph

I don't know when arrays stop being as performant as one could wish. For
a problem of this size, it seems to me that one may want instead to use
a database, such as sqlite, or a stronger beast? I expect it'll make
your code more readable, too.
Is there a particular reason to not make the jump to a database?

···

--
Posted via http://www.ruby-forum.com/\.

Ralph Shnelvar wrote:

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or
64-bit quantities.

Dude, you want a database. There's little reason in any language to
have an array that big.

Does Ruby support native 64-bit quantities?

I'm not sure, but with transparent Bignum support, it doesn't matter
that much.

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

The answer to both questions: as far as I know, it will be deallocated
when it goes out of scope.

Ralph

Best,

···

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org
--
Posted via http://www.ruby-forum.com/\.

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit quantities.

So we are talking about 300,000 * 20 * 8 ~ 48MB which does not sound too large.

robert@fussel:~$ time ruby1.9 -e 'x=1<<63;Array.new(300_000*20) {x}'

real 0m1.922s
user 0m1.288s
sys 0m0.032s
robert@fussel:~$ time ruby1.9 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'

real 0m7.143s
user 0m6.668s
sys 0m0.204s

Does Ruby support native 64-bit quantities?

Yes. You can try it out

ruby1.9 -e '100.times {|i| x = 1 << i;p x, x.class}'

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

No need to: if there are no more live references to that Array it will get collected eventually.

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

Probably not because of the number of objects in the Array (if they are all Fixnums it will probably a bit faster.

The question is, what do you do with that huge Array? Chances are that you need to allocate more memory during processing. What's your use case? A database might be an option or a flat file - it all depends.

Kind regards

  robert

···

On 11/09/2009 06:25 PM, Ralph Shnelvar wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Ralph Shnelvar wrote:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or
64-bit quantities.

Does Ruby support native 64-bit quantities?

If you want to hold this all in RAM, then I think your most
space-efficient way would be to pack each row of 20 columns into a
String (of 80 or 160 bytes).

You can use Array#pack and String#unpack to convert when required.

val1 = 0x0102030405060708
val2 = 0xf0e0d0c0b0a09080
row = [val1, val2].pack("Q*")
p row.unpack("Q*")

Q* is for unsigned 64-bit. Use q* for signed 64, L* or l* for
unsigned/signed 32 bit. With 32-bit there are also options for using a
fixed byte ordering (little or big endian) rather than native byte
ordering.

Warning: take care if using ruby 1.9, because 1.9 Strings are of
"characters" and not necessarily of "bytes". This may become critical if
you decide to write these structures out to disk and read them back in
again. Details at

···

--
Posted via http://www.ruby-forum.com/\.

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

···

On 11/9/09, Ralph Shnelvar <ralphs@dos32.com> wrote:

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit
quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

Marnen,

I am a Ruby newbie but as far as I know, memory gets freed when the
garbage collector gets around to doing it.

Ralph

···

The answer to both questions: as far as I know, it will be deallocated
when it goes out of scope.

Robert,

Doesn't that just move things into a BiNum ... which is not native 64?

Ralph

Monday, November 9, 2009, 2:35:26 PM, you wrote:

···

Does Ruby support native 64-bit quantities?

Yes. You can try it out

ruby1.9 -e '100.times {|i| x = 1 << i;p x, x.class}'

--
Best regards,
Ralph mailto:ralphs@dos32.com

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit
quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

Actually everybody waited for you to finally come up with your reply. Shame on you that it took you so long. :wink:

I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

Certainly. However, if Bignums work as well, why worry to use another lib? Whether the allocation overhead is OK or not depends of course on the use case. Btw, did we see a more concrete description of the problem? I cannot remember having seen it.

Kind regards

  robert

···

On 12.11.2009 21:24, Caleb Clausen wrote:

On 11/9/09, Ralph Shnelvar <ralphs@dos32.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Could someone point me at documentation on narray, please.

···

On 11/9/09, Ralph Shnelvar <ralphs@dos32.com> wrote:

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit
quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

--
Best regards,
Ralph mailto:ralphs@dos32.com

Ralph Shnelvar wrote:

Marnen,

I am a Ruby newbie but as far as I know, memory gets freed when the
garbage collector gets around to doing it.

Right. And as far as I know, it will do that when the object goes out
of scope.

Ralph

> The answer to both questions: as far as I know, it will be
deallocated
> when it goes out of scope.

Best,

···

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org
--
Posted via http://www.ruby-forum.com/\.

Please do not top post.

···

2009/11/9 Ralph Shnelvar <ralphs@dos32.com>:

Doesn't that just move things into a BiNum ... which is not native 64?

Well, define "native". I chose to interpret it as "does Ruby come
with built in, C implemented support for 64 integers" - which it does.
:slight_smile:

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Ralph Shnelvar wrote:
[...]

Could someone point me at documentation on narray, please.

You could have found it in about 5 seconds with Google. Please try that
before posting in future.

Best,

···

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org
--
Posted via http://www.ruby-forum.com/\.

Could someone point me at documentation on narray, please.

http://narray.rubyforge.org/

Now that I'm looking at it, I see that it says:

Supported element types are 1/2/4-byte Integer

No 8-byte integers... maybe you could use two parallel arrays, one for
low half and one for high half of your data. Depends on what you're
using this for.

···

On 11/13/09, Ralph Shnelvar <ralphs@dos32.com> wrote:

Friday, November 13, 2009, 9:45:10 AM, you wrote:

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit
quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

Actually everybody waited for you to finally come up with your reply.
Shame on you that it took you so long. :wink:

I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

Certainly. However, if Bignums work as well, why worry to use another
lib? Whether the allocation overhead is OK or not depends of course on
the use case. Btw, did we see a more concrete description of the
problem? I cannot remember having seen it.

I am the OP and there really isn't much more to the problem.

Basically, each row of the array is a list of integers. The first
element of the array is an id. The remaining elements of the array are
a list of ids.

Thus, if
MyMatrix = [ [100, [200, 300, 400]],
             [200, [300, 400, 500, 600]],
             [300, [400, 600, 555, 1902, 1708]],
             [400, 101, 200, 100] ]

If I am processing the last vector (i.e. [400, 101, 200, 100]) then I
want to be able to note that

101 has no corresponding element
200 was found and its remaining content is [300, 400, 500, 600]
100 was found and its remaining content is [200, 300, 400]

Speed is critical. The array is more-or-less static. That is, access
to this array swamps any changes to the array.

The order of the rows is irrelevant ... and for speed reasons the
Matrix will be sorted by the first column so that I can do a binary
search.

Of course, I could use a hash ... but I doubt that a hash would beat
my binary search ... I could be wrong about this.

···

On 12.11.2009 21:24, Caleb Clausen wrote:

On 11/9/09, Ralph Shnelvar <ralphs@dos32.com> wrote:

Kind regards

        robert

--
Best regards,
Ralph mailto:ralphs@dos32.com

Marnen,

Unless "scope" means something different in Ruby than it does in other
languages, it would appear that the garbage collector cannot possibly
free memory when something goes out of scope but, instead, when the
UseCount goes to zero.

Ralph

Monday, November 9, 2009, 12:31:21 PM, you wrote:

···

Ralph Shnelvar wrote:

Marnen,

I am a Ruby newbie but as far as I know, memory gets freed when the
garbage collector gets around to doing it.

Right. And as far as I know, it will do that when the object goes out
of scope.

Ralph

> The answer to both questions: as far as I know, it will be
deallocated
> when it goes out of scope.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org

--
Best regards,
Ralph mailto:ralphs@dos32.com

Robert,

Tuesday, November 10, 2009, 1:23:48 AM, you wrote:

Please do not top post.

Yup, you are right. The culture here is to do it this way.

Thanks for pointing it out.

While I'm at it, does this list like/dislike HTML-based posts?

Doesn't that just move things into a BiNum ... which is not native 64?

Well, define "native". I chose to interpret it as "does Ruby come
with built in, C implemented support for 64 integers" - which it does.
:slight_smile:

There is a footnote on page 829 of the 1.9 Pickaxe book that reads:

"Fixnums are stored as 31-bit numbers ... Or 63 bit on wider CPU
architectures."

Is there any way to tell what architecture Ruby thinks I'm using?

···

2009/11/9 Ralph Shnelvar <ralphs@dos32.com>:

Cheers

robert

--
Best regards,
Ralph mailto:ralphs@dos32.com

* Caleb Clausen <vikkous@gmail.com> (07:23) schrieb:

···

On 11/13/09, Ralph Shnelvar <ralphs@dos32.com> wrote:

Could someone point me at documentation on narray, please.

http://narray.rubyforge.org/

Now that I'm looking at it, I see that it says:

Supported element types are 1/2/4-byte Integer

No 8-byte integers... maybe you could use two parallel arrays, one for
low half and one for high half of your data. Depends on what you're
using this for.

Or look into the source, and eventually patch it. It might already
support 8 bytes on 64bit, and just not document that, or it might
be an easy change.

mfg, simon .... l

Ralph Shnelvar wrote:
[...]

Could someone point me at documentation on narray, please.

You could have found it in about 5 seconds with Google. Please try that
before posting in future.

I am new to Ruby.

There are/were indications that this is an incmoplete project.

Indeed, as a newbie, I attempted to install/expand the tar file and
could not get the installation to work.

Hence, my request which I had hoped would point me at the most recent
information.

Ralph Shnelvar wrote:

Of course, I could use a hash ... but I doubt that a hash would beat
my binary search ... I could be wrong about this.

You are almost certainly wrong about this. Ruby Hash implementation is
in C and very efficient. I'm sure this would be the quickest way to find
the row with a given key.

The array is more-or-less static. That is, access
to this array swamps any changes to the array.

Especially good for a hash.

So my advice is: write it the simplest possible way first. Then analyse
(a) the speed, and/or (b) the memory usage, if they are problematic.

There's a difference between "quickest" and "quick enough". If you were
looking for "quickest" you wouldn't be using Ruby in the first place.
But if you value your development time, you could implement this with a
Hash in 10 minutes.

300K rows and 20 columns

If those are values that fit in a Fixnum then that's only 24MB in total
on a 32-bit machine, or 48MB on a 64-bit machine. So just do the
obvious. If a few of those values are Bignums, no big deal.

If you need better packing, you can always just allocate packed strings.
For example, you can pack 20 32-bit integers (signed or unsigned) into
an 80 byte string, and unpack the row again later. Look at Array#pack
and String#unpack

···

--
Posted via http://www.ruby-forum.com/\.