Store object in on disk / mini database

Kristian_Sorensen · 20 September 2004 18:54

Hi!

Is there some way of writing e.g. a hash table to the filesystem and reading it back again, without having to parse input and generate output (and create the table all over)?

I need to store some information, which could be stored in a small database (like sqlite - but I can't get ruby-sqlite installed properly if sqlite is not installed in the default location). Is there an interface to the Berkeley DB (www.sleepycat.com)?

Best regards,
Kristian Sørensen.

Lennon_Day-Reynolds1 · 20 September 2004 19:16

Kristian,

If you're working with small datasets, you can use the built-in
'Marshal' module to persist data.

For example, if the variable 'data' contains your hash to be saved,
you can just do the following:

--
open('myapp.dat', 'wb') do |fh|
  Marshal.dump(data, fh)
end
--

To load your data later, you can use 'Marshal.load', which will
restore an object from either an open filehandle or a string. If you
need transactions, take a look at the 'PStore' library, which is part
of the standard distribution; it wraps a convenient database-like
interface on top of the Marshal methods, complete with transactional
access.
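As a minimal sketch of the full round trip described above (the sample hash and the temp-directory path are made up for illustration; only Marshal.dump and Marshal.load come from the post), something like this should work:

```ruby
require 'tmpdir'

# Hypothetical sample data; any object graph of core types will do.
data = { "name" => "example", "count" => 3, "tags" => %w(a b c) }

# Illustrative location; the post itself uses 'myapp.dat' in the current directory.
path = File.join(Dir.tmpdir, 'myapp.dat')

# Save: binary mode matters because Marshal output is not text.
File.open(path, 'wb') { |fh| Marshal.dump(data, fh) }

# Load: Marshal.load accepts an open filehandle (or a string).
restored = File.open(path, 'rb') { |fh| Marshal.load(fh) }

puts restored.inspect
```

Note that Marshal files are tied to the Marshal format version and should only be loaded from trusted sources.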

If you don't want to keep everything in RAM, there are also DBM, GDBM,
and SDBM bindings in the standard distribution.

--
Lennon
rcoder.net

Ara.T.Howard6 · 20 September 2004 19:25

yes. yes.

jib:~ > cat a.rb
require 'pstore'

db = PStore::new 'db'

this_time = Time::now
last_time = nil

db.transaction do
  if db.root? 'time'
    last_time = db['time']
  end
  db['time'] = this_time
end

puts "this_time <#{ this_time }>"
puts "last_time <#{ last_time }>"

   jib:~ > ruby a.rb
   this_time <Mon Sep 20 13:05:29 MDT 2004>
   last_time <>

   jib:~ > ruby a.rb
   this_time <Mon Sep 20 13:05:33 MDT 2004>
   last_time <Mon Sep 20 13:05:29 MDT 2004>

   jib:~ > ruby a.rb
   this_time <Mon Sep 20 13:05:38 MDT 2004>
   last_time <Mon Sep 20 13:05:33 MDT 2004>

jib:~ > cat b.rb
require 'bdb'

db = BDB::Btree.open "bdb", nil, BDB::CREATE, 0644

this_time = Time::now
last_time = nil

last_time = db['time']
db['time'] = this_time

puts "this_time <#{ this_time }>"
puts "last_time <#{ last_time }>"

db.close

   jib:~ > ruby b.rb
   this_time <Mon Sep 20 13:10:55 MDT 2004>
   last_time <>

   jib:~ > ruby b.rb
   this_time <Mon Sep 20 13:10:56 MDT 2004>
   last_time <Mon Sep 20 13:10:55 MDT 2004>

   jib:~ > ruby b.rb
   this_time <Mon Sep 20 13:11:01 MDT 2004>
   last_time <Mon Sep 20 13:10:56 MDT 2004>

regards.

-a


--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

Kristian_Sorensen · 20 September 2004 20:14

Thanks for both your suggestions! That was just what I needed!


T_Onoma · 20 September 2004 20:37

There's also YAML.

require 'yaml'

# save
open('myapp.dat', 'w') {|fh| fh << data.to_yaml }

# retrieve (load_file opens and closes the file for you)
data = YAML.load_file('myapp.dat')

[Note: This is off the top of my head, so it's untested. But basically like
that.]

Nice thing about YAML is that the file it creates is human readable and
editable!
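A small sketch of that round trip, printing the file so the human-readable part is visible. The sample hash and the temp-directory path are assumptions for illustration, not from the post:

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical sample data, limited to plain core types.
data = { "name" => "widget", "sizes" => [1, 2, 3], "active" => true }

# Illustrative path in the system temp directory.
path = File.join(Dir.tmpdir, 'myapp.yml')

# Save as YAML text.
File.open(path, 'w') { |fh| fh << data.to_yaml }

# The file is plain text, so it can be inspected or edited by hand.
puts File.read(path)

# Retrieve.
restored = YAML.load_file(path)
```

This sticks to hashes, arrays, strings, numbers and booleans; custom classes may need extra options when loading, depending on the YAML library version.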

T.

···

On Monday 20 September 2004 04:14 pm, Kristian Sørensen wrote:

Thanks for both your suggestions! That was just what I needed!

Bill_Kelly · 20 September 2004 20:52

>> Thanks for both your suggestions! That was just what I needed!
>
> There's also YAML.
>
> [...]
>
> Nice thing about YAML is that the file it creates is human readable and
> editable!

Additionally, YAML supports a drop-in PStore equivalent, so if your code
is already structured to use PStore, you can use a YAML::Store the same way.

require 'yaml/store'

ystore = YAML::Store.new("my_datafile.ystore")

# use ystore just as you would a pstore:

my_hash = {"a"=>1, "b"=>2}
my_array = %w(dog cat elephant)

# store stuff in the database

ystore.transaction do
  ystore["my_hash"] = my_hash
  ystore["my_array"] = my_array
end

# print out all keys/values in database

ystore.transaction do
  ystore.roots.each do |key|
    puts ystore[key].inspect
  end
end

# note the above code is untested

Regards,

Bill

···

From: "trans. (T. Onoma)" <transami@runbox.com>

On Monday 20 September 2004 04:14 pm, Kristian Sørensen wrote:

James_Britt6 · 20 September 2004 22:44

trans. (T. Onoma) wrote:

···

On Monday 20 September 2004 04:14 pm, Kristian Sørensen wrote:

Thanks for both your suggestions! That was just what I needed!

There's also YAML.

  require 'yaml'

  # save
  open('myapp.dat', 'w') {|fh| fh << data.to_yaml }

  # retrieve
  data = YAML.load(File.open('myapp.dat'))

[Note: This is off th top of my head, so it's untested. But basically like that.]

Nice thing about YAML is that the file it creates is human readable and editable!

But you still reparse the data, which the OP wanted to avoid.

James

Ara.T.Howard6 · 20 September 2004 21:24

yes - yaml is very, very cool - i use it a lot for my own code. a couple of
things to be aware of:

- yaml is a lot slower than marshal. if your db has only 10,000 entries or
  so this is no problem

- flock does not work on nfs filesystems (used by pstore and
  yaml::store)
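
A rough way to check the speed difference on your own machine is a sketch like the following. The 10,000-entry hash mirrors the figure above, but the shape of each entry is an assumption, and the timings will vary with Ruby version and hardware, so no numbers are claimed here:

```ruby
require 'benchmark'
require 'yaml'

# Hypothetical dataset: 10,000 small entries of plain core types.
data = {}
10_000.times { |i| data["key#{i}"] = { "id" => i, "tags" => %w(a b c) } }

marshal_blob = nil
yaml_blob = nil

# Time a full dump + load cycle with Marshal.
marshal_time = Benchmark.realtime do
  marshal_blob = Marshal.dump(data)
  Marshal.load(marshal_blob)
end

# Time the same cycle with YAML.
yaml_time = Benchmark.realtime do
  yaml_blob = data.to_yaml
  YAML.load(yaml_blob)
end

printf("marshal: %.4fs  yaml: %.4fs\n", marshal_time, yaml_time)
```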

-a


Kristian_Sorensen · 20 September 2004 21:44

Hi!

This sounds VERY cool! I will definitely have a look at this tomorrow!! Thanks!

Cheers, KS.


T_Onoma · 20 September 2004 22:49

Ah, shucks!! Although, I imagine you reparse at some level no matter what.
But certainly Marshal is closer to the metal.

···

On Monday 20 September 2004 06:44 pm, James Britt wrote:

But you still reparse the data, which the OP wanted to avoid.

--
( o _ カラチ
// trans.
/ \ transami@runbox.com

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

Bill_Kelly · 21 September 2004 00:31

>> Nice thing about YAML is that the file it creates is human readable and
>> editable!
>
> But you still reparse the data, which the OP wanted to avoid.

I'd thought the OP didn't want to manually write the code to parse keys
and values from a text file.... (As opposed to behind-the-scenes
parsing going on in a library...)

But IANTOP ;-D

Regards,

Bill

···

From: "James Britt" <jamesUNDERBARb@neurogami.com>

Mauricio_Fernndez · 20 September 2004 21:37

- syck crashes quite often

···

On Tue, Sep 21, 2004 at 06:24:39AM +0900, Ara.T.Howard@noaa.gov wrote:

yes - yaml is very, very cool - i use it alot for my own code. a couple of
things to be aware of

  - yaml is alot slower than marshal. if your db has only 10,000 entries or
    so this no problem

  - flock does not work on nfs filesystems (used by pstore an
    yaml::store)

--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

why_the_lucky_stiff1 · 20 September 2004 21:49

Mauricio Fernández wrote:

- syck crashes quite often

Are you referring to the bug you found while working on rpa? [ruby-core:02729] Or are you alluding to other bugs?

_why

Mauricio_Fernndez · 20 September 2004 22:28

Other bugs that look similar (assuming you fixed that one). And I've
had syck-related bugs with rpa-base quite recently (with some 1.8.2
CVS version).

I also have a proof of concept for a versioned FS datastore that
has the very nice property of crashing syck in no time.
It's been a few weeks since I last tested it, but I hope its magic
still works -- if so, you can expect a copy shortly.

···

On Tue, Sep 21, 2004 at 06:49:41AM +0900, why the lucky stiff wrote:

Mauricio Fernández wrote:

>- syck crashes quite often
>
>
Are you refering to the bug you found while working on rpa?
[ruby-core:02729] Or are you alluding to other bugs?


Ara.T.Howard6 · 20 September 2004 22:44

have you seen this?

http://repetae.net/~john/computer/vsdb/

super cool idea - but crashes a lot. i have a little c binding, for testing
only, if you are interested. what's the concept of your fs db?

cheers.

-a

···

On Tue, 21 Sep 2004, Mauricio [iso-8859-1] Fernández wrote:

I also have a proof of concept for a versioned FS datastore that has the
very nice property of crashing syck in no time It's been a few weeks
since I last tested it, but I hope its magic still works -- if so, you can
expect a copy in short.


Mauricio_Fernndez · 20 September 2004 23:24

> have you seen this?
>
> http://repetae.net/~john/computer/vsdb/
>
> super cool idea - but crashes alot. i have a little c binding for testing

heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to
read the code to make sure but it's 1am)
I believe something like rdbm would be better (http://www.fefe.de/rdbm/).

> only if you are interested. what's the concept of your fs db?

I first learned about this approach via Eivind Eklund when talking about
OVCS. It's the method used by Subversion and monotone (AFAIR): index
data by its digest. A number of interesting things happen when you do so:
* full-tree versioning
* "implicit deltas" and fairly efficient compression of the data
* ...

I implemented a toy version control system on top of that which could host
itself, in a couple of days and ~500 LoC; it had O(1) branching, could manage
renaming, and used implicit deltas and transparent compression of the data.

This can work on top of any structure able to hold key -> value
associations (where both are strings), so you can use any of the dbs
(gdbm, ndbm, sdbm, bdb, etc) or even a full-fledged rdbms if you want
(as done by monotone), but it could also work in-mem with a simple Hash
and serialization via Marshal, etc...
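
To make the "index data by its digest" idea concrete, here is a minimal in-memory sketch using a plain Hash and Marshal, as suggested above. The DigestStore class and its method names are hypothetical, and SHA1 is just one possible digest; this is not the toy VCS described in the post:

```ruby
require 'digest'

# Hypothetical content-addressed store: each value is keyed by its own digest.
class DigestStore
  def initialize
    @blobs = {}
  end

  # Store content and return its key. Identical content always maps to the
  # same key, so it is stored only once.
  def put(content)
    key = Digest::SHA1.hexdigest(content)
    @blobs[key] ||= content.dup.freeze
    key
  end

  # Fetch content by key; raises KeyError if the key is unknown.
  def get(key)
    @blobs.fetch(key)
  end

  def size
    @blobs.size
  end
end

store = DigestStore.new
v1 = store.put("hello world\n")
v2 = store.put("hello world\n")   # same content, same key, no new entry
v3 = store.put("hello ruby\n")

puts "keys equal: #{v1 == v2}, entries: #{store.size}"

# Serialization via Marshal, as mentioned above.
reloaded = Marshal.load(Marshal.dump(store))
puts reloaded.get(v3)
```

Because unchanged content keeps its key, a new version of a tree only adds entries for what actually changed, which is where the "implicit deltas" come from.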

···

On Tue, Sep 21, 2004 at 07:44:39AM +0900, Ara.T.Howard@noaa.gov wrote:


Ara.T.Howard6 · 21 September 2004 03:24

> heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to read
> the code to make sure but it's 1am)

yes - true.

> I believe something like rdbm would be better (http://www.fefe.de/rdbm/).

perhaps not as nfs safe...

> I first learned about this approach via Eivind Eklund when talking about
> OVCS. It's the method used by Subversion and monotone (AFAIR): index
> data by its digest. A number of interesting things happen when you do so:
> * full-tree versioning
> * "implicit deltas" and fairly efficient compression of the data
> * ...
>
> I implemented a toy version control system on top of that which could host
> itself in a couple days and ~500LoCs; it had O(1) branching, could manage
> renaming, used implicit deltas and transparent compression of the data.

sounds very cool.

> This can work on top of any structure able to hold key -> value
> associations (where both are strings), so you can use any of the dbs (gdbm,
> ndbm, sdbm, bdb, etc) or even a full-fledged rdbms if you want (as done by
> monotone), but it could also work in-mem with a simple Hash and
> serialization via Marshal, etc...

any pointers to read about? sounds like a very interesting concept.

-a

···

On Tue, 21 Sep 2004, Mauricio [iso-8859-1] Fernández wrote:

Dick_Davies1 · 21 September 2004 09:24

* Mauricio Fernández <batsman.geo@yahoo.com> [0924 00:24]:

···

On Tue, Sep 21, 2004 at 07:44:39AM +0900, Ara.T.Howard@noaa.gov wrote:
>> have you seen this?
>>
>> http://repetae.net/~john/computer/vsdb/
>>
>> super cool idea - but crashes alot. i have a little c binding for testing
>
> heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to
> read the code to make sure but it's 1am)
> I believe something like rdbm would be better (http://www.fefe.de/rdbm/).
>
>> only if you are interested. what's the concept of your fs db?
>
> I first learned about this approach via Eivind Eklund when talking about
> OVCS. It's the method used by Subversion and monotone (AFAIR): index
> data by its digest. A number of interesting things happen when you do so:
> * full-tree versioning
> * "implicit deltas" and fairly efficient compression of the data
> * ...

By 'index by digest', do you mean something like Venti:

http://www.cs.bell-labs.com/sys/doc/venti/venti.html

? I tried playing with a ruby-based version of this a while ago, but couldn't
find a good way of chopping up files to store them efficiently.....

--
In order to make an apple pie from scratch, you must first create the
universe.
-- Carl Sagan, Cosmos
Rasputin :: Jack of All Trades - Master of Nuns

Mauricio_Fernndez · 21 September 2004 19:25

>> I first learned about this approach via Eivind Eklund when talking about
>> OVCS. It's the method used by Subversion and monotone (AFAIR): index
>> data by its digest. A number of interesting things happen when you do so:
>> * full-tree versioning
>> * "implicit deltas" and fairly efficient compression of the data
>> * ...
>
> By 'index by digest', do you mean something like Venti:
>
> http://www.cs.bell-labs.com/sys/doc/venti/venti.html

Yes, the fundamental idea is the same.

> ? I tried playing with a ruby-based version of this a while ago, but couldn't
> find a good way of chopping up files to store them efficiently.....

A moving CRC will do, e.g.

  if crc(buffer, offset, CRCLEN) % AVERAGE_LENGTH == 1
     chop up to current offset
     insert fragment
  else
     offset += 1
     ... logic if offset >= MAX_FRAGMENT_SIZE ...
  end

that gives you chunks of length averaging AVERAGE_LENGTH, in most
cases. Lower values mean higher P(node reuse) but there's a per-chunk overhead
(key + pointer to it in a list, etc).
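
A runnable sketch of the pseudocode above might look like this. It is only one reading of it: the constant values are arbitrary, the chunk method is hypothetical, and for simplicity the CRC of the window is recomputed at every offset with Zlib.crc32 rather than updated incrementally as a true moving CRC would be:

```ruby
require 'zlib'

CRCLEN            = 16    # bytes covered by the CRC window (assumed value)
AVERAGE_LENGTH    = 64    # a boundary is hit for roughly 1 in 64 windows (assumed value)
MAX_FRAGMENT_SIZE = 1024  # hard upper bound on chunk size (assumed value)

# Split buffer into chunks whose boundaries depend on the content itself.
def chunk(buffer)
  chunks = []
  start = 0
  pos = 0
  while pos < buffer.bytesize
    pos += 1
    length = pos - start
    next if length < CRCLEN            # not enough bytes for a full window yet

    window = buffer.byteslice(pos - CRCLEN, CRCLEN)
    if Zlib.crc32(window) % AVERAGE_LENGTH == 1 || length >= MAX_FRAGMENT_SIZE
      chunks << buffer.byteslice(start, length)   # chop up to current offset
      start = pos
    end
  end
  # Whatever is left after the last boundary becomes the final chunk.
  chunks << buffer.byteslice(start, buffer.bytesize - start) if start < buffer.bytesize
  chunks
end

buffer = Random.new(42).bytes(5000)   # deterministic pseudo-random test data
chunks = chunk(buffer)
puts "#{chunks.size} chunks, largest #{chunks.map(&:bytesize).max} bytes"
```

Each chunk could then be stored under its digest in any key -> value store; since a boundary depends only on the bytes in the window just before it, a local edit tends to disturb only nearby chunks.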

···

On Tue, Sep 21, 2004 at 06:24:52PM +0900, Dick Davies wrote:
