Looking for a Fast Persistent Store

Hi,

I'm looking for a persistent store, where:
  * I can use it from Ruby
  * it is fast
  * it is transactional
  * it can update more than one store in a single transaction
  * simple String -> String mappings in the store are sufficient
  * ideally uses files in the filesystem

These kinds of stores are normally implemented as persistent hashes or BTrees (or both). I know that Sleepycat's Berkeley DB <http://www.sleepycat.com/> does this, and I've used Java based systems that do this. I also know of some C based things but they don't have Ruby wrappers. I can't find anything in the Ruby world, and I don't care too much if it isn't pure Ruby.

I know about Purple <http://purple.rubyforge.org/> and the QDBM Ruby wrapper <http://qdbm.sourceforge.net/>. Neither does the multiple hash/BTree in a transaction thing.

The trick appears to be with transactions.

I know this can be done easily enough in a Relational DB, but I know that, for example, JDBC/MySQL combination is significantly slower at what I need to do than Perst (a pure Java thing that's startlingly fast). I've taken a look at sqlite, which satisfies all requirements but I can't shake this feeling that things can be better.

Does anyone have any ideas?

Thanks,
Bob

···

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

Hi Bob,

If you're the fun and adventurous type you might want to look into
Mongoose. ( http://rubyforge.org/projects/mongoose/ ) It's new, and
claims to be fast. In the past I wrote some small apps that used the
same author's other system, KirbyBase, which was very nice. I
haven't tried Mongoose yet, but I'm happily awaiting a problem
to use it on. Though now that I'm thinking about it, I don't
know if it has built-in "transaction" support. I suppose it depends on
what you want it to do.

Have you tried SQLite and found it to be too slow? It sounds to me
like you might be prematurely optimizing. We have several medium to
medium-large sized apps that make heavy use of SQLite and speed has
been not a problem at all.

What domain are you working in that requires this kind of speed
anyway? Just curious.

Hope that helps,
-Harold

···

On 8/9/06, Bob Hutchison <hutch@recursive.ca> wrote:

What about PStore?

  robert

···

On 09.08.2006 15:30, Bob Hutchison wrote:

Hi,

I'm looking for a persistent store, where:
* I can use it from Ruby
* it is fast
* it is transactional
* it can update more than one store in a single transaction
* simple String -> String mappings in the store are sufficient
* ideally uses files in the filesystem

These kinds of stores are normally implemented as persistent hashes or
BTrees (or both). I know that Sleepycat's Berkeley DB
<http://www.sleepycat.com/> does this, and I've used Java based systems that
do this. I also know of some C based things but they don't have Ruby
wrappers. I can't find anything in the Ruby world, and I don't care too much
if it isn't pure Ruby.

I know about Purple <http://purple.rubyforge.org/> and the QDBM Ruby wrapper <http://qdbm.sourceforge.net/>. Neither does the multiple hash/BTree in a transaction thing.

The trick appears to be with transactions.

check out joel's fsdb - it's very nice if you want to go pure ruby. i've used
it on several projects.

I know this can be done easily enough in a Relational DB, but I know that,
for example, JDBC/MySQL combination is significantly slower at what I need
to do than Perst (a pure Java thing that's startlingly fast). I've taken a
look at sqlite, which satisfies all requirements but I can't shake this
feeling that things can be better.

Does anyone have any ideas?

sqlite is hard to beat - i use it in at least 20 production systems and have
never had a single error. ruby queue uses it under the hood for the cluster
job store and this is used over nfs - we've been through disk failures and
power failures and come through unscathed.

it's also very fast, esp if you use in memory tables, but on linux it's just
as easy to copy the db to /dev/shm and then go...
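
A rough sketch of the /dev/shm approach in Ruby, using only the standard library. The helper names (`stage_in_ram`, `persist`) and the file name are made up for illustration, and the fallback to the ordinary temp directory is an assumption for systems without /dev/shm. Anything left only in /dev/shm is lost on reboot, so the working copy has to be copied back.

```ruby
require "fileutils"
require "tmpdir"

# Copies a database file onto a RAM-backed filesystem and returns the new path.
# Falls back to the ordinary temp dir when /dev/shm is absent (e.g. not Linux).
def stage_in_ram(db_path, ram_dir: "/dev/shm")
  usable = File.directory?(ram_dir) && File.writable?(ram_dir)
  target_dir = usable ? ram_dir : Dir.tmpdir
  target = File.join(target_dir, "#{Process.pid}-#{File.basename(db_path)}")
  FileUtils.cp(db_path, target)
  target
end

# Copies the working file back to disk so the changes survive a reboot.
def persist(ram_path, db_path)
  FileUtils.cp(ram_path, db_path)
end

Dir.mktmpdir do |dir|
  original = File.join(dir, "jobs.db")        # hypothetical database file
  File.write(original, "placeholder bytes")   # stands in for a real SQLite file
  fast = stage_in_ram(original)
  puts fast                                   # open this path with the SQLite binding
  persist(fast, original)
  File.delete(fast)
end
```
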

if you end up using it try with my arrayfields package - james' sqlite binding
detects it automatically and the tuples will come back as arrays with named
field access - it's faster and requires less memory than a hash, and is also
really convenient to code with.

cheers.

-a

···

On Wed, 9 Aug 2006, Bob Hutchison wrote:
--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dali lama

Why not use berkeleydb? There are ruby bindings for it.

···

SQLite has full Ruby support and is free.

http://sqlite-ruby.rubyforge.org/

···

--
Jon Smirl
jonsmirl@gmail.com

Harold Hausman wrote:

Have you tried SQLite and found it to be too slow? It sounds to me
like you might be prematurely optimizing. We have several medium to
medium-large sized apps that make heavy use of SQLite and speed has
been not a problem at all.

On Linux, at any rate, one ought to be able to make SQLite extremely fast by adding RAM and tuning the I/O subsystem and the kernel, and using tricks like memory mapped files, assuming the database in question isn't too large. I'm assuming it's small, otherwise he'd pretty much need a humongous database server and an industrial strength database like Oracle or PostgreSQL.

Hi Bob,

If you're the fun and adventurous type you might want to look into
Mongoose. ( http://rubyforge.org/projects/mongoose/ ) It's new, and
claims to be fast. In the past I wrote some small apps that used the
same author's other system, KirbyBase, which was very nice. I
haven't tried Mongoose yet, but I'm happily awaiting a problem
to use it on. Though now that I'm thinking about it, I don't
know if it has built-in "transaction" support. I suppose it depends on
what you want it to do.

I have had a look at Mongoose, and quite like it. I'm sure to find a use for it. However, for this particular situation Mongoose's transactions are too weak.

Have you tried SQLite and found it to be too slow? It sounds to me
like you might be prematurely optimizing. We have several medium to
medium-large sized apps that make heavy use of SQLite and speed has
been not a problem at all.

In the Java world things like Perst and JDBM were very fast. SQLite was problematic for some reason that is a bit foggy just now (I'll think of it).

What domain are you working in that requires this kind of speed
anyway? Just curious.

I write web based applications. The situations I'm thinking of are either financial or CMS-like applications. In the financial cases I've got situations where there might be a few million (probably not more than ten million, well maybe 20 million) small things tucked away. In the case of CMS-like stuff there are far fewer things, let's say 25k at most and usually more like a thousand or so, but they are bigger things.

The key factor (pardon the pun) is that the *vast* majority of the time (and I know how to make it 100% of the time) queries are as you'd have in a hash table (i.e. a single key where the key is usually a String).

This is for the persistent store associated with xampl (see URL in my signature). Xampl already has a few persistent stores but I now need one a little fancier.

Hope that helps,
-Harold

Thanks Harold.

Cheers,
Bob

···

On Aug 9, 2006, at 9:56 AM, Harold Hausman wrote:

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

Hi,
I'm looking for a persistent store, where:
* I can use it from Ruby
* it is fast
* it is transactional
* it can update more than one store in a single transaction
* simple String -> String mappings in the store are sufficient
* ideally uses files in the filesystem

What about PStore?

Oh. Right. PStore never registered with me before.

I just had a look at it and it seems to be along the lines of QDBM and Purple, transactional on a single store and each store being one hash/tree.
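
For readers who have not used PStore, here is a minimal sketch of the single-store transaction model, using only Ruby's standard library. The method name, file name, and keys are invented for illustration. Each PStore object runs its own transaction, so as far as I can tell there is no way to commit two stores atomically together with it.

```ruby
require "pstore"
require "tmpdir"

# Runs a committed and an aborted transaction against one store,
# then returns what a read-only transaction sees afterwards.
def pstore_demo(path)
  store = PStore.new(path)

  # Every write inside one transaction block is committed together.
  store.transaction do
    store["alice"] = "100"
    store["bob"]   = "50"
  end

  # abort discards every change made inside this transaction.
  store.transaction do
    store["alice"] = "0"
    store.abort
  end

  # Read-only transaction: the aborted write never reached the file.
  store.transaction(true) { [store["alice"], store["bob"]] }
end

Dir.mktmpdir do |dir|
  p pstore_demo(File.join(dir, "accounts.pstore"))  # hypothetical file name
end
```
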

Thanks for pointing that out.

Cheers,
Bob

···

On Aug 9, 2006, at 10:05 AM, Robert Klemme wrote:

On 09.08.2006 15:30, Bob Hutchison wrote:

  robert

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

Hi Ara,

Hi,

I'm looking for a persistent store, where:
* I can use it from Ruby
* it is fast
* it is transactional
* it can update more than one store in a single transaction
* simple String -> String mappings in the store are sufficient
* ideally uses files in the filesystem

These kinds of stores are normally implemented as persistent hashes or
BTrees (or both). I know that Sleepycat's Berkeley DB
<http://www.sleepycat.com/> does this, and I've used Java based systems that
do this. I also know of some C based things but they don't have Ruby
wrappers. I can't find anything in the Ruby world, and I don't care too much
if it isn't pure Ruby.

I know about Purple <http://purple.rubyforge.org/> and the QDBM Ruby wrapper <http://qdbm.sourceforge.net/>. Neither does the multiple hash/BTree in a transaction thing.

The trick appears to be with transactions.

check out joel's fsdb - it's very nice if you want to go pure ruby. i've used
it on several projects.

I'm already using that (version 0.5 -- I can't get to RAA right now for some reason and there are no files on Rubyforge for fsdb so I don't know if there is a more recent version). In version 0.5 the transactions were not sufficient I think (it would be nice if I was wrong).

I know this can be done easily enough in a Relational DB, but I know that,
for example, JDBC/MySQL combination is significantly slower at what I need
to do than Perst (a pure Java thing that's startlingly fast). I've taken a
look at sqlite, which satisfies all requirements but I can't shake this
feeling that things can be better.

Does anyone have any ideas?

sqlite is hard to beat - i use it in at least 20 production systems and have
never had a single error. ruby queue uses it under the hood for the cluster
job store and this is used over nfs - we've been through disk failures and
power failures and come through unscathed.

That's good to hear. Don't misunderstand, I *like* SQLite but I don't know that it is suitable. If I can't find anything else I'll use it.

The thing that makes me nervous is that I can do what I need with just two tables. One that has (key, value-kind, value), and another that has (index-name, index-value, key, value-kind). Each column is a String. The vast majority of queries would be based on key and value-kind on the first table. The remaining queries would be a select/join kind of thing. And I can very easily jam the key and value-kind into the same field (but that would be an optimisation that may not be necessary). The trouble is that the first table might get to be very very large. I don't know how SQLite behaves with a single huge table. I suppose I'm going to have to find out.

it's also very fast, esp if you use in memory tables, but on linux it's just
as easy to copy the db to /dev/shm and then go...

That's interesting.

if you end up using it try with my arrayfields package - james' sqlite binding
detects it automatically and the tuples will come back as arrays with named
field access - it's faster and requires less memory than a hash, and is also
really convenient to code with.

I noticed that the other day. Nice.

Cheers,
Bob

···

On Aug 9, 2006, at 10:08 AM, ara.t.howard@noaa.gov wrote:

On Wed, 9 Aug 2006, Bob Hutchison wrote:

cheers.

-a
--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dali lama

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

The licensing cost of bdb for commercial re-distribution is prohibitive. Other than that, no reason.

Cheers,
Bob

···

On Aug 9, 2006, at 11:32 PM, snacktime wrote:

Why not use berkeleydb? There are ruby bindings for it.

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

A few GB max for one set of applications, far far less for others.

Cheers,
Bob

···

On Aug 9, 2006, at 10:15 AM, M. Edward (Ed) Borasky wrote:

Harold Hausman wrote:

Have you tried SQLite and found it to be too slow? It sounds to me
like you might be prematurely optimizing. We have several medium to
medium-large sized apps that make heavy use of SQLite and speed has
been not a problem at all.

On Linux, at any rate, one ought to be able to make SQLite extremely fast by adding RAM and tuning the I/O subsystem and the kernel, and using tricks like memory mapped files, assuming the database in question isn't too large. I'm assuming it's small, otherwise he'd pretty much need a humongous database server and an industrial strength database like Oracle or PostgreSQL.

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

please post a summary of your findings when you run your tests - i'd be
interested to read them.

cheers.

-a

···

On Thu, 10 Aug 2006, Bob Hutchison wrote:

The thing that makes me nervous is that I can do what I need with just two
tables. One that has (key, value-kind, value), and another that has
(index-name, index-value, key, value-kind). Each column is a String. The
vast majority of queries would be based on key and value-kind on the first
table. The remaining queries would be a select/join kind of thing. And I
can very easily jam the key and value-kind into the same field (but that
would be an optimisation that may not be necessary). The trouble is that the
first table might get to be very very large. I don't know how SQLite behaves
with a single huge table. I suppose I'm going to have to find out.

--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dali lama

From: Bob Hutchison [mailto:hutch@recursive.ca]
Sent: Wednesday, August 09, 2006 4:39 PM

> Have you tried SQLite and found it to be too slow? It sounds to me
> like you might be prematurely optimizing. We have several medium to
> medium-large sized apps that make heavy use of SQLite and speed has
> been not a problem at all.

In the Java world things like Perst and JDBM were very fast. SQLite
was problematic for some reason that is a bit foggy just now (I'll
think of it).

All I can say is SQLite3 was amazingly fast at solving every problem
I threw at it (I don't know about the ruby bindings, but they appear
to be relatively thin).

I have to admit though that I never had more than a few 100 thousand
items in a single table.

Being blessed with the power and speed of complex SQL statements is
invaluable, especially in a scripting language.

cheers

Simon

Bob Hutchison wrote:

Hi Ara,

Hi,

I'm looking for a persistent store, where:
* I can use it from Ruby
* it is fast
* it is transactional
* it can update more than one store in a single transaction
* simple String -> String mappings in the store are sufficient
* ideally uses files in the filesystem

These kinds of stores are normally implemented as persistent hashes or
BTrees (or both). I know that Sleepycat's Berkeley DB
<http://www.sleepycat.com/> does this, and I've used Java based systems that
do this. I also know of some C based things but they don't have Ruby
wrappers. I can't find anything in the Ruby world, and I don't care too much
if it isn't pure Ruby.

I know about Purple <http://purple.rubyforge.org/> and the QDBM Ruby wrapper <http://qdbm.sourceforge.net/>. Neither does the multiple hash/BTree in a transaction thing.

The trick appears to be with transactions.

check out joel's fsdb - it's very nice if you want to go pure ruby. i've used
it on several projects.

I'm already using that (version 0.5 -- I can't get to RAA right now for some reason and there are no files on Rubyforge for fsdb so I don't know if there is a more recent version). In version 0.5 the transactions were not sufficient I think (it would be nice if I was wrong).

I apologize for the lack of fsdb stuff on the RubyForge page. I've never
found a comfortable way to automate gem uploads, so I still use my old
scripts for building this page:

http://redshift.sourceforge.net/

A quick scan there shows that fsdb-0.5 is the latest.

It's on my list to figure out how to automate the RubyForge dance (and I
think Ara or someone did this a while ago, so maybe it's a solved
problem now).

Now, on to your question...

Transactions in fsdb can be nested (and the transactions can be on
different dbs)--is that sufficient? This may not be what you mean by "single transaction", though.

But, like PStore, FSDB isn't very fast--it's pure ruby, it marshals
objects (by default--but you _can_ tell fsdb to write the value strings
directly rather than via marshal), and it pays the cost of thread- and
process-safety. One advantage over pstore is finer granularity.

Anyway, I hope your search is fruitful, whether it lands at bdb or sqlite or ...

···

On Aug 9, 2006, at 10:08 AM, ara.t.howard@noaa.gov wrote:

On Wed, 9 Aug 2006, Bob Hutchison wrote:

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Bob Hutchison wrote:

Harold Hausman wrote:

Have you tried SQLite and found it to be too slow? It sounds to me
like you might be prematurely optimizing. We have several medium to
medium-large sized apps that make heavy use of SQLite and speed has
been not a problem at all.

On Linux, at any rate, one ought to be able to make SQLite extremely fast by adding RAM and tuning the I/O subsystem and the kernel, and using tricks like memory mapped files, assuming the database in question isn't too large. I'm assuming it's small, otherwise he'd pretty much need a humongous database server and an industrial strength database like Oracle or PostgreSQL.

A few GB max for one set of applications, far far less for others.

A few GB ... I'd be looking at a "real database" for that. "Pay me now or pay me later", as the saying goes. :)

···

On Aug 9, 2006, at 10:15 AM, M. Edward (Ed) Borasky wrote:

Austin Ziegler has automated the "Rubyforge dance"- look at
PDF::Writer or post to him on this list.

···

On 8/14/06, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:
> I apologize for the lack of fsdb stuff on the RubyForge page. I've never
> found a comfortable way to automate gem uploads, so I still use my old
> scripts for building this page:

Well, my concern probably comes from the same place that your thinking of a 'real database' comes from :) To make it worse, in a relational DB I've got this urge to stick all that into only two tables... makes me nervous. Anyway, in the Java world, Perst has proven itself very capable, jaw-dropping fast, and reliable.

Cheers,
Bob

···

On Aug 9, 2006, at 11:12 PM, M. Edward (Ed) Borasky wrote:

Bob Hutchison wrote:

On Aug 9, 2006, at 10:15 AM, M. Edward (Ed) Borasky wrote:

Harold Hausman wrote:

Have you tried SQLite and found it to be too slow? It sounds to me
like you might be prematurely optimizing. We have several medium to
medium-large sized apps that make heavy use of SQLite and speed has
been not a problem at all.

On Linux, at any rate, one ought to be able to make SQLite extremely fast by adding RAM and tuning the I/O subsystem and the kernel, and using tricks like memory mapped files, assuming the database in question isn't too large. I'm assuming it's small, otherwise he'd pretty much need a humongous database server and an industrial strength database like Oracle or PostgreSQL.

A few GB max for one set of applications, far far less for others.

A few GB ... I'd be looking at a "real database" for that. "Pay me now or pay me later", as the saying goes. :)

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

Actually, the Net::LDAP Rakefile is the best thing to look at right now. ;)

-austin

···

On 8/14/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:

On 8/14/06, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:
> I apologize for the lack of fsdb stuff on the RubyForge page. I've never
> found a comfortable way to automate gem uploads, so I still use my old
> scripts for building this page:
Austin Ziegler has automated the "Rubyforge dance"- look at
PDF::Writer or post to him on this list.

--
Austin Ziegler * halostatue@gmail.com * http://www.halostatue.ca/
               * austin@halostatue.ca * You are in a maze of twisty little passages, all alike. // halo • statue
               * austin@zieglers.ca

What about a prevalent DB, like Madeleine? You'd need the couple GB to keep
that one big app in-memory, but you said the others require far less.

···

--
Contribute to RubySpec! @ www.headius.com/rubyspec
Charles Oliver Nutter @ headius.blogspot.com
Ruby User @ ruby.mn
JRuby Developer @ www.jruby.org
Application Architect @ www.ventera.com