Store object on disk / mini database

Hi!

Is there some way of writing e.g. a hash table to the filesystem and reading it back again, without having to parse the input and write the output myself (and rebuild the table all over)?

I need to store some information, which could be stored in a small database (like sqlite - but I can't get ruby-sqlite installed properly if sqlite is not installed in the default location). Is there an interface to Berkeley DB (www.sleepycat.com)?

Best regards,
Kristian Sørensen.

Kristian,

If you're working with small datasets, you can use the built-in
'Marshal' module to persist data.

For example, if the variable 'data' contains your hash to be saved,
you can just do the following:

--
open('myapp.dat', 'wb') do |fh|
  Marshal.dump(data, fh)
end
--

To load your data later, you can use 'Marshal.load', which will
restore an object from either an open filehandle or a string. If you
need transactions, take a look at the 'PStore' library, which is part
of the standard distribution; it wraps a convenient database-like
interface on top of the Marshal methods, complete with transactional
access.
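
A minimal round-trip sketch of the dump/load pair described above. The sample hash and the temp-directory path are made up for illustration; note that Marshal data should only be loaded from files you trust.

```ruby
require 'tmpdir'

# Hypothetical sample data; any object built from core types will do.
data = { 'name' => 'example', 'counts' => [1, 2, 3] }
path = File.join(Dir.tmpdir, 'myapp.dat')

# Save: open in binary mode so no line-ending translation corrupts the dump.
File.open(path, 'wb') { |fh| Marshal.dump(data, fh) }

# Load: Marshal.load accepts an open IO object (or a String).
restored = File.open(path, 'rb') { |fh| Marshal.load(fh) }

puts restored == data
```

The restored object is a deep copy, so it compares equal to the original but is not the same object.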

If you don't want to keep everything in RAM, there are also DBM, GDBM,
and SDBM bindings in the standard distribution.

--
Lennon
rcoder.net

yes. yes.

   jib:~ > cat a.rb
   require 'pstore'

   db = PStore::new 'db'

   this_time = Time::now
   last_time = nil

   db.transaction do
     if db.root? 'time'
       last_time = db['time']
     end
     db['time'] = this_time
   end

   puts "this_time <#{ this_time }>"
   puts "last_time <#{ last_time }>"

   jib:~ > ruby a.rb
   this_time <Mon Sep 20 13:05:29 MDT 2004>
   last_time <>

   jib:~ > ruby a.rb
   this_time <Mon Sep 20 13:05:33 MDT 2004>
   last_time <Mon Sep 20 13:05:29 MDT 2004>

   jib:~ > ruby a.rb
   this_time <Mon Sep 20 13:05:38 MDT 2004>
   last_time <Mon Sep 20 13:05:33 MDT 2004>

   jib:~ > cat b.rb
   require 'bdb'

   db = BDB::Btree.open "bdb", nil, BDB::CREATE, 0644

   this_time = Time::now
   last_time = nil

   last_time = db['time']
   db['time'] = this_time

   puts "this_time <#{ this_time }>"
   puts "last_time <#{ last_time }>"

   db.close

   jib:~ > ruby b.rb
   this_time <Mon Sep 20 13:10:55 MDT 2004>
   last_time <>

   jib:~ > ruby b.rb
   this_time <Mon Sep 20 13:10:56 MDT 2004>
   last_time <Mon Sep 20 13:10:55 MDT 2004>

   jib:~ > ruby b.rb
   this_time <Mon Sep 20 13:11:01 MDT 2004>
   last_time <Mon Sep 20 13:10:56 MDT 2004>

regards.

-a

--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

Thanks for both your suggestions! That was just what I needed! :)

There's also YAML.

  require 'yaml'

  # save
  File.open('myapp.dat', 'w') {|fh| fh << data.to_yaml }

  # retrieve (load_file opens and closes the file for you)
  data = YAML.load_file('myapp.dat')

[Note: This is off the top of my head, so it's untested. But basically like
that.]

Nice thing about YAML is that the file it creates is human readable and
editable!

T.

> Thanks for both your suggestions! That was just what I needed! :)

> There's also YAML.
>
> [...]
>
> Nice thing about YAML is that the file it creates is human readable and
> editable!

Additionally, YAML supports a drop-in PStore equivalent, so if your code
is already structured to use PStore, you can use a YAML::Store the same way.

require 'yaml/store'

ystore = YAML::Store.new("my_datafile.ystore")

# use ystore just as you would a pstore:

my_hash = {"a"=>1, "b"=>2}
my_array = %w(dog cat elephant)

# store stuff in the database

ystore.transaction do
  ystore["my_hash"] = my_hash
  ystore["my_array"] = my_array
end

# print out all keys/values in database

ystore.transaction do
  ystore.roots.each do |key|
    puts ystore[key].inspect
  end
end

# note the above code is untested

Regards,

Bill

trans. (T. Onoma) wrote:

Nice thing about YAML is that the file it creates is human readable and editable!

But you still reparse the data, which the OP wanted to avoid.

James

yes - yaml is very, very cool - i use it a lot for my own code. a couple of
things to be aware of

   - yaml is a lot slower than marshal. if your db has only 10,000 entries or
     so this is no problem

   - flock does not work on nfs filesystems (used by pstore and
     yaml::store)
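
A rough way to check the speed difference on your own machine is to time both serializers on the same data. This is only a sketch: the 10,000-entry hash of short strings is a made-up data set, and the actual numbers depend on the Ruby version and the YAML backend in use.

```ruby
require 'benchmark'
require 'yaml'

# Hypothetical data set: 10,000 small string entries.
data = {}
10_000.times { |i| data["key#{i}"] = "value#{i}" }

marshal_blob = nil
yaml_blob = nil

# Time one serialization pass with each library.
marshal_time = Benchmark.realtime { marshal_blob = Marshal.dump(data) }
yaml_time    = Benchmark.realtime { yaml_blob = data.to_yaml }

puts format('Marshal.dump: %.4fs (%d bytes)', marshal_time, marshal_blob.bytesize)
puts format('to_yaml:      %.4fs (%d bytes)', yaml_time, yaml_blob.bytesize)
```

A single pass like this is noisy; repeating the measurement a few times gives a more reliable comparison.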

-a

Hi!

This sounds VERY cool! I will definitely have a look at this tomorrow!! Thanks! :D

Cheers, KS.

Ah, shucks!! :) Although, I imagine you reparse at some level no matter what.
But certainly Marshal is closer to the metal.

--
( o _ カラチ
// trans.
/ \ transami@runbox.com

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

> Nice thing about YAML is that the file it creates is human readable and
> editable!

> But you still reparse the data, which the OP wanted to avoid.

I'd thought the OP didn't want to manually write the code to parse keys
and values from a text file.... :) (As opposed to behind-the-scenes
parsing going on in a library...)

But IANTOP ;-D

Regards,

Bill

- syck crashes quite often :(

--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Mauricio Fernández wrote:

> - syck crashes quite often :(

Are you referring to the bug you found while working on rpa? [ruby-core:02729] Or are you alluding to other bugs?

_why

Other bugs that look similar (assuming you fixed that one). And I've
had syck-related bugs with rpa-base quite recently (with some 1.8.2
CVS version).

I also have a proof of concept for a versioned FS datastore that
has the very nice property of crashing syck in no time :)
It's been a few weeks since I last tested it, but I hope its magic
still works -- if so, you can expect a copy shortly.

have you seen this?

http://repetae.net/~john/computer/vsdb/

super cool idea - but crashes a lot. i have a little c binding for testing
only if you are interested. what's the concept of your fs db?

cheers.

-a

> have you seen this?
>
> http://repetae.net/~john/computer/vsdb/
>
> super cool idea - but crashes a lot. i have a little c binding for testing

heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to
read the code to make sure but it's 1am)
I believe something like rdbm would be better (http://www.fefe.de/rdbm/).

> only if you are interested. what's the concept of your fs db?

I first learned about this approach via Eivind Eklund when talking about
OVCS. It's the method used by Subversion and monotone (AFAIR): index
data by its digest. A number of interesting things happen when you do so:
* full-tree versioning
* "implicit deltas" and fairly efficient compression of the data
* ...

I implemented a toy version control system on top of that which could host
itself in a couple days and ~500LoCs; it had O(1) branching, could manage
renaming, used implicit deltas and transparent compression of the data.

This can work on top of any structure able to hold key -> value
associations (where both are strings), so you can use any of the dbs
(gdbm, ndbm, sdbm, bdb, etc) or even a full-fledged rdbms if you want
(as done by monotone), but it could also work in-mem with a simple Hash
and serialization via Marshal, etc...
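
To make the "index data by its digest" idea concrete, here is a toy sketch on top of a plain in-memory Hash. The class name `DigestStore` and the choice of SHA-1 are assumptions for illustration only; the post does not say which digest or layout the proof of concept uses.

```ruby
require 'digest/sha1'

# Toy content-addressed store: each value is filed under the SHA-1 of its bytes.
# (Hypothetical class, not the datastore mentioned in the thread.)
class DigestStore
  def initialize(backend = {})
    @backend = backend # anything that maps String keys to String values
  end

  # Store a value and return its key. Identical content maps to the same
  # key, so storing it twice costs nothing extra.
  def put(value)
    key = Digest::SHA1.hexdigest(value)
    @backend[key] ||= value
    key
  end

  def get(key)
    @backend[key]
  end

  def size
    @backend.size
  end
end

store = DigestStore.new
k1 = store.put('hello world')
k2 = store.put('hello world')    # same content, same key
k3 = store.put('something else')

puts k1 == k2
puts store.size
```

Because keys depend only on content, a tree whose nodes list the keys of their children gets a new root key whenever anything below it changes, while unchanged subtrees keep their keys and are shared between versions. The backing Hash could be dumped with Marshal or swapped for one of the dbm-style bindings.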

> heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to read
> the code to make sure but it's 1am)

yes - true.

> I believe something like rdbm would be better (http://www.fefe.de/rdbm/).

perhaps not as nfs safe...

> I first learned about this approach via Eivind Eklund when talking about
> OVCS. It's the method used by Subversion and monotone (AFAIR): index
> data by its digest. A number of interesting things happen when you do so:
> * full-tree versioning
> * "implicit deltas" and fairly efficient compression of the data
> * ...
>
> I implemented a toy version control system on top of that which could host
> itself in a couple days and ~500LoCs; it had O(1) branching, could manage
> renaming, used implicit deltas and transparent compression of the data.

sounds very cool.

> This can work on top of any structure able to hold key -> value
> associations (where both are strings), so you can use any of the dbs (gdbm,
> ndbm, sdbm, bdb, etc) or even a full-fledged rdbms if you want (as done by
> monotone), but it could also work in-mem with a simple Hash and
> serialization via Marshal, etc...

any pointers to read about? sounds like a very interesting concept.

-a

* Mauricio Fernández <batsman.geo@yahoo.com> [0924 00:24]:

> I first learned about this approach via Eivind Eklund when talking about
> OVCS. It's the method used by Subversion and monotone (AFAIR): index
> data by its digest. A number of interesting things happen when you do so:
> * full-tree versioning
> * "implicit deltas" and fairly efficient compression of the data
> * ...

By 'index by digest', do you mean something like Venti:

http://www.cs.bell-labs.com/sys/doc/venti/venti.html

? I tried playing with a ruby-based version of this a while ago, but couldn't
find a good way of chopping up files to store them efficiently.....

--
In order to make an apple pie from scratch, you must first create the
universe.
    -- Carl Sagan, Cosmos
Rasputin :: Jack of All Trades - Master of Nuns

> I first learned about this approach via Eivind Eklund when talking about
> OVCS. It's the method used by Subversion and monotone (AFAIR): index
> data by its digest. A number of interesting things happen when you do so:
> * full-tree versioning
> * "implicit deltas" and fairly efficient compression of the data
> * ...

> By 'index by digest', do you mean something like Venti:
>
> http://www.cs.bell-labs.com/sys/doc/venti/venti.html

Yes, the fundamental idea is the same.

> ? I tried playing with a ruby-based version of this a while ago, but couldn't
> find a good way of chopping up files to store them efficiently.....

A moving CRC will do, e.g.

  if crc(buffer, offset, CRCLEN) % AVERAGE_LENGTH == 1
     chop up to current offset
     insert fragment
  else
     offset += 1
     ... logic if offset >= MAX_FRAGMENT_SIZE ...
  end

that gives you chunks of length averaging AVERAGE_LENGTH, in most
cases. Lower values mean higher P(node reuse) but there's a per-chunk overhead
(key + pointer to it in a list, etc).
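
A runnable sketch of that pseudocode, under a few assumptions: `Zlib.crc32` is recomputed over the last `WINDOW` bytes at every offset instead of being updated incrementally (simple, but slower than a true rolling checksum), and the constants are small made-up values so the behaviour is easy to see.

```ruby
require 'zlib'

WINDOW = 16              # bytes covered by the checksum (CRCLEN above)
AVERAGE_LENGTH = 64      # target average chunk size
MAX_FRAGMENT_SIZE = 256  # hard upper bound on chunk size

# Split buffer into chunks whose boundaries depend on the content itself.
def chunks(buffer)
  result = []
  start = 0
  offset = 0
  while offset < buffer.bytesize
    offset += 1
    length = offset - start

    # Only test for a boundary once a full window is available.
    at_boundary = false
    if length >= WINDOW
      window = buffer.byteslice(offset - WINDOW, WINDOW)
      at_boundary = Zlib.crc32(window) % AVERAGE_LENGTH == 1
    end

    if at_boundary || length >= MAX_FRAGMENT_SIZE
      result << buffer.byteslice(start, length)
      start = offset
    end
  end
  # Whatever is left after the last boundary becomes the final chunk.
  result << buffer.byteslice(start, buffer.bytesize - start) if start < buffer.bytesize
  result
end

data = Random.new(42).bytes(4096)
pieces = chunks(data)
puts pieces.size
puts pieces.join == data
```

Since each boundary decision looks only at the preceding `WINDOW` bytes, inserting data near the start of a file tends to change only the chunks around the insertion point, and later boundaries usually fall in the same places as before.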
