A memcached-like server in Ruby - feasible?

It doesn't take much to fill up a 160 GB iPod, right?

http://drawohara.tumblr.com/post/17471102

BTW, my wife and I were only able to fit about 3/5ths of our CD
collection on our 40 GB iPod. (I rip at 320 kbps MP3, admittedly.)

So while a 160 GB iPod would be slightly overkill for us, it wouldn't be outrageously so.

Regards,

Bill

···

From: "ara.t.howard" <ara.t.howard@gmail.com>

On Oct 28, 2007, at 12:48 AM, M. Edward (Ed) Borasky wrote:

The other thing you can play with is using sqlite as the local (one
per app server) cache engine.

Thanks, but if I'm already caching at the local process level, I might
as well cache to in-memory Ruby objects; the entire data-set isn't
that huge for a high-end server RAM capacity: about 500 MB all in all.
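A minimal sketch of such an in-process cache (class name and interface are illustrative, nothing from the thread): a Hash behind a Mutex, with a block to compute and memoize a value on a miss.

```ruby
# Minimal in-process object cache: a Hash guarded by a Mutex,
# with a block to compute and store a value on a cache miss.
class LocalCache
  def initialize
    @store = {}
    @mutex = Mutex.new
  end

  # Return the cached value for key, or compute it with the
  # block, store it, and return it.
  def fetch(key)
    @mutex.synchronize do
      return @store[key] if @store.key?(key)
      @store[key] = yield
    end
  end
end

cache = LocalCache.new
calls = 0
cache.fetch(:users) { calls += 1; [:alice, :bob] }  # miss: block runs
cache.fetch(:users) { calls += 1; [:alice, :bob] }  # hit: served from RAM
# calls == 1 -- the second fetch never touched the "database"
```

The trade-off against memcached, of course, is that each app-server process holds its own private copy of the cache.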


-Tom

···

On 10/28/07, Yohanes Santoso <ysantoso-rubytalk@dessyku.is-a-geek.org> wrote:

Thanks, that's true, and we already do that. We have a very large
cache in fact (~500 MB) and it does improve performance, though not
enough.

-Tom

···

On 10/28/07, Andreas S. <x-ruby-forum.com@andreas-s.net> wrote:

Try enabling the MySQL query cache. For many applications even a few MB
can work wonders.
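For reference, the relevant my.cnf knobs look roughly like this (values illustrative; the query cache only pays off for repeated identical SELECTs against rarely-changing tables):

```ini
# my.cnf -- illustrative values
[mysqld]
query_cache_type  = 1     # cache eligible SELECT results
query_cache_size  = 32M   # total memory for the cache
query_cache_limit = 1M    # don't cache results larger than this
```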

I'd like to see you take the approach of extending memcached with ruby.
Leverage what memcached is already doing and already good at.

Based on my limited understanding (please correct me if I'm wrong):
memcached works by accepting a request for a certain 'key' and then returning
all objects that match that key. Right?

For example, you can make memcached store 1000 user records and then ask for
all of them, but you can't ask for them with a 'query' that limits the set
of users.

The way memcached would answer your query for 1000 users is to go to
each node and fetch all the users stored there (this node has 400 of
them, that node has 200 of them), combine them all together, and
return them...
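In client terms, that fetch-everything-then-filter pattern looks something like this (a plain Hash stands in for a memcached client here, since get/set by exact key is essentially the whole interface):

```ruby
# Toy stand-in for a memcached client: an exact-key get/set store.
# There is no query language -- to select a subset you must pull the
# whole value back and filter it on the client side.
store = {}
store["users"] = (1..1000).map { |i| { id: i, active: i.even? } }

# "Query" for active users: fetch all 1000, then filter locally.
all_users    = store["users"]
active_users = all_users.select { |u| u[:active] }
active_users.size  # => 500, but all 1000 had to come back first
```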

So what you want to do, is to be able to define some arbitrary ruby that
gets executed at each Node to trim down the set of users so that the entire
set doesn't need to be returned.

So alter memcached to accept a 'query' in the form of arbitrary ruby (or
perhaps a pre-defined ruby) that a peer-daemon is to execute over the set of
results a particular memcached node contains.

In my understanding, this is sort of the way CouchDB is supposed to work.
(http://theexciter.com/articles/couchdb-views-in-ruby-instead-of-javascript)
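A rough sketch of the idea in plain Ruby (Arrays stand in for memcached nodes, and a Proc stands in for the shipped query; none of this is real memcached API):

```ruby
# Two "nodes", each holding part of the user set.
nodes = [
  (1..400).map   { |i| { id: i, active: i.even? } },  # node A: 400 users
  (401..600).map { |i| { id: i, active: i.even? } },  # node B: 200 users
]

# The query is arbitrary Ruby shipped to each node; it trims the
# per-node result set before anything would cross the wire.
query = ->(users) { users.select { |u| u[:active] } }

# Each node runs the query locally; only the trimmed sets are combined.
result = nodes.flat_map { |node_data| query.call(node_data) }
result.size  # => 300 rows combined, instead of all 600
```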

do you follow?

Tom Machinski wrote:

I might have given you a somewhat inflated view of how large
our data-set is :)

We have about 100K objects, occupying ~5 KB per object. So all in
all, the total weight of our dataset is no more than 500 MB. We might
grow to maybe twice that in the next 2 years. But that's it.

So it's very feasible to keep the entire data-set in *good* RAM for a
reasonable cost.

I was just thinking ... Erlang has an in-RAM database capability called "Mnesia". Perhaps it could be ported to Ruby or one could write an ActiveRecord connector to a Mnesia database.

Good point. Unfortunately, MySQL 5 doesn't appear to be able to take
hints. We've analyzed our queries, and there are some strategies we
could definitely improve with manual hinting, but alas we'd need to
switch to an RDBMS that supports them.

I wonder if you could trick PostgreSQL into putting its database in a RAM disk. :) Seriously, though, if you're on Linux, you could probably tweak PostgreSQL and the Linux page cache to get the whole database in RAM while still having it safely stored on hard drives. I suppose you could also do that for MySQL, but PostgreSQL is simply a better RDBMS.

Yohanes Santoso schrieb:

"Tom Machinski" <tom.machinski@gmail.com> writes:

Long term, my goal is to minimize the number of queries that hit the
database. Some of the queries are more complex than the relatively
simple example I've given here, and I don't think I could optimize
them much below 0.01 secs per query.

I was hoping to alleviate with memcached_improved some of the pains
associated with database scaling, e.g. building a replicating cluster
etc. Basically what memcached does for you, except as demonstrated,
memcached by itself seems insufficient for our needs.

The other thing you can play with is using sqlite as the local (one
per app server) cache engine.

With in-memory tables :)

Regards,

   Michael

Yeah, I thought of writing a Ruby daemon that "wraps" memcached.

But then the wrapper would have to deal with all the performance
challenges that a full replacement for memcached has to deal with,
namely: handling multiple concurrent clients, multiple simultaneous
read/write requests (race conditions etc.), and heavy loads.

A naive implementation of memcached itself would be trivial to write;
memcached's real merits are not its rather limited featureset, but
its performance, stability, and robustness - i.e., its capability to
overcome the above challenges.
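For a sense of scale, a naive in-Ruby stand-in could look like this (the two-command text protocol, class name, and single-Hash store are all illustrative; real memcached's protocol and internals are far more involved, and its value is precisely how it behaves under load where a thread-per-client toy like this falls over):

```ruby
require "socket"

# A deliberately naive memcached-like server: a Hash behind a Mutex,
# one thread per client, and a two-command text protocol
# ("SET key value" / "GET key").
class NaiveCacheServer
  def initialize(port)
    @server = TCPServer.new("127.0.0.1", port)
    @store  = {}
    @mutex  = Mutex.new
  end

  def port
    @server.addr[1]  # actual port (useful when constructed with port 0)
  end

  def start
    @thread = Thread.new do
      loop do
        client = @server.accept
        Thread.new(client) { |c| serve(c) }
      end
    end
  end

  private

  def serve(client)
    while (line = client.gets)
      cmd, key, value = line.chomp.split(" ", 3)
      case cmd
      when "SET"
        @mutex.synchronize { @store[key] = value }
        client.puts "STORED"
      when "GET"
        client.puts(@mutex.synchronize { @store[key] } || "NOT_FOUND")
      else
        client.puts "ERROR"
      end
    end
  ensure
    client.close
  end
end

server = NaiveCacheServer.new(0)  # port 0: let the kernel pick a free port
server.start

sock = TCPSocket.new("127.0.0.1", server.port)
sock.puts "SET greeting hello"
sock.gets          # => "STORED\n"
sock.puts "GET greeting"
sock.gets          # => "hello\n"
sock.close
```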

The only way I could use memcached to do complex queries is by
patching memcached to accept and handle them. Such a patch wouldn't
have anything to do with Ruby itself, would probably be very
non-trivial, and would have to significantly extend memcached's
architecture. I doubt I have the time to roll out something like that.

-Tom

···

On 10/28/07, Jacob Burkhart <igotimac@gmail.com> wrote:

So alter memcached to accept a 'query' in the form of arbitrary ruby (or
perhaps a pre-defined ruby) that a peer-daemon is to execute over the set of
results a particular memcached node contains.

"Tom Machinski" <tom.machinski@gmail.com> writes:

···

On 10/28/07, Yohanes Santoso <ysantoso-rubytalk@dessyku.is-a-geek.org> wrote:

The other thing you can play with is using sqlite as the local (one
per app server) cache engine.

Thanks, but if I'm already caching at the local process level, I might
as well cache to in-memory Ruby objects; the entire data-set isn't
that huge for a high-end server RAM capacity: about 500 MB all in all.

Caching to in-memory Ruby objects does not automatically confer the
smartness you were describing. The sqlite layer is what provides that smartness.

I think Ara T. Howard in the other thread was quite spot-on in
summarising your need.

YS.