Memory-efficient set of Fixnums

Ryan Davis wrote in post #1061877:

Also... smells like homework.

But if not, it reminds me of this article:

Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory - High Scalability -

Hi,

Whether it is a homework or not, it is a very realistic problem that we
may encounter in our daily programming.

Absolutely! I know all my databases store off tens of millions of unique integers that I iterate over and search against all the time. That's pretty much all I've been doing over my last 22 years of professional development.

The article is about using probabilistic algorithms with some level of
error.

Some level of _precalculated_ error, which is what makes this an interesting solution.

I think we all assumed that the original poster wants an error
of zero. (If the problem could not be solved using several gigabytes of
RAM, or if the computation time took too long, then probabilistic
algorithms ought to be considered.)

Making the assumption that requirements as stated are correct is what dooms software to be bad. Using a 512 megabyte in-memory bitmap is NOT a _good_ solution, it's just that it is one of the simplest solutions that meets all the (ridiculous) requirements.

···

On May 23, 2012, at 4:09 PM, Admin Tensor wrote:

+1

The Pickaxe has a good section about this (it's actually about how to
integrate with a C library, but if you know C and know how to make Ruby
talk to C, then you should be able to get it).

···

On Thu, May 24, 2012 at 1:22 PM, Admin Tensor <lists@ruby-forum.com> wrote:

George Dupre wrote in post #1062009:
>
> I will also look into this google_hash gem, it should also be useful.

Hi,

I think in Ruby speed is not the primary objective. So if you need the
code to run faster (by about the same order of magnitude), probably you
can transform the critical part into C.

Robert Klemme wrote in post #1061860:

Based on your parameter description, I think the bitmap suggested by
Brian Candler is the best choice.

Just for the record: that is the same approach as I suggested earlier.
Basis for a bit set which uses String as storage. Storage is split up into multiple String instances to avoid too large allocations. · GitHub

Cheers

robert

Hi Robert,

I am sorry that I did not look at your github.

Brian Candler gave an explicit space feasibility for the problem (half
gigabyte of RAM) but he does not explicitly show how to do the bitset.
You gave the explicit code how to do it, but the code does not
explicitly say how much RAM will be needed for the problem at hand.

I am glad that the original poster finally could handle the problem.

Regards,

Bill

···

On Wed, May 23, 2012 at 6:44 PM, Admin Tensor <lists@ruby-forum.com> > wrote:

--
Posted via http://www.ruby-forum.com/\.

Ryan Davis wrote in post #1061894:

Making the assumption that requirements as stated are correct is what
dooms software to be bad. Using a 512 megabyte in-memory bitmap is NOT a
_good_ solution, it's just that it is one of the simplest solutions that
meets all the (ridiculous) requirements.

I think we just helped the original poster to handle the software
problem. Whether the requirements are solid or not, I think it is
better to leave it out to the original poster.

Regards,

Bill

···

--
Posted via http://www.ruby-forum.com/\.

I'm not so sure that we helped.

···

On May 23, 2012, at 5:53 PM, Admin Tensor wrote:

Ryan Davis wrote in post #1061894:

Making the assumption that requirements as stated are correct is what
dooms software to be bad. Using a 512 megabyte in-memory bitmap is NOT a
_good_ solution, it's just that it is one of the simplest solutions that
meets all the (ridiculous) requirements.

I think we just helped the original poster to handle the software
problem.