[ANN] ndb Object-Relational Mapper

= n/db README

== Table of Contents

1. Introduction
2. Features
3. Sample Code
4. Technical Information
5. Links
6. Current Version and Status
7. Future
8. Feedback
9. Licence

== Introduction

The aim of this project is to deliver an efficient, yet simple,
Object-Relational Mapper Library. The library manages the
lifecycle of standard Ruby objects. Unlike similar projects, our
approach requires neither xml configuration files (i.e. J2EE), nor
SQL definitions (i.e. ActiveRecord).

By defining the Managed Objects (or Entities) using Ruby, we
leverage Ruby's OOP features to allow for OO definitions of DataBase
schemas. A simple meta-language is provided to allow fine-tuning
of the generated SQL code.

This library is designed to be integrated in a Web Application
Framework. This Framework will be released on a later day.

== Features

The library provides the following features:

+ Object-Relational mapping
+ Deserialize to Ruby Objects or ResultSets
+ Deserialize sql join queries to Ruby Objects
+ Connection pooling
+ Thread safety
+ SQL transactions
+ Callbacks
+ Simple support for cascading deletes

== Sample Code

require "n/db"

# initialize the DB interface
$db = N::Db.new(
:address => "localhost",
:database => "test",
:user => "postgres",
:password => "mypassword",
:connection_count => 20
)

# create a managed object

class MyObject
include N::Entity
manage {
prop String, :name
prop [Fixnum, "smallint default 1"], :level; sql_index :level
}
# non serialized attributes
attr_accessor value
end

class AnotherObject < MyObject
manage {
prop Time, :create_time
}

def initialize
@create_time = Time.now
super
end
end

obj = AnotherObject.new
obj.name = "gmosx"

# insert into db
$db << obj

obj.name = "drak"

# update
$db << obj

# get object id allocated when inserting
oid = obj.oid

# lookup
obj = $db.get(oid, AnotherObject)

# multiple commands

$db.open { |db|
obj = db.get(oid, AnotherObject)
db.delete(obj)
db.select("SELECT * FROM #{AnotherObject::DBTABLE} WHERE oid=1")
}

# graph like objects

class Child
include N::Entity
include N::Child
manage {
prop Time, :create_time
}
end

child = Child.new
child.set_parent(obj)

$db << child

# get children of class Child

$db.children(obj, Child, "ORDER BY create_time")

# sql transactions

$db.transaction {
....
}

== Technical Information

=== Installation

Just uncompress the distribution and add the 'n' directory to
the Ruby path.

tar xvfz ndb-x.x.tar.gz
cd ndb
ruby n/.tc-db.rb

=== How the package is organized:

doc/*:
RDOC generated documentation.

n/db.rb:
The main n/db file.

n/db/*:
The db related files.

n/utils/*:
Various utility files.

=== Testing:

Unit Test files start with the .tc- prefix, i.e. they are hidden
by default (Unix). You can run the db test unit as follows:

ruby n/.tc-db.rb

Before running this test unit, make sure tha the postgreSQL server
is running, and the 'test' database is created.

== Links

* http://www.navel.gr, Navel
* http://www.navel.gr/open-source, Project Home Page

== Current Version and Status

Version 0.2 was released on 07-10-2004. The sofware is actually usable
but not tested in a production environment. Comments from the Ruby
community are critical in order to fix possible bugs and improve the
API. Suggestions for missing features are also welcome. This version
only supports the Postgres Database.

== Future

* Support multiple DataBase backends (Postgres, MySQL, ...)
* Support prepared statements (pgsql)
* Support stored procedures (pgsql)
* Support caching
* Deserialize to OpenStruct
* Better documentation
* Code cleanup, refactoring

== Feedback

If you would like to report a bug, suggest a method for inclusion, or
make
some other suggestion (documentation, package layout, etc.), please
send
an email to ndb-feedback@navel.gr

== Licence

Copyright (c) 2004 Navel, all rights reserved. http://www.navel.gr

n/db (http://www.navel.gr/open-source) is copyrighted free software
created and maintained by George Moschovitis (mailto:gm@navel.gr)
and released under the same license as Ruby.

Standard disclaimer:
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  PURPOSE.

I'd like a database object mapping framework to remember the 'old' values of
properties at the time an object was read, and perform safe atomic updates.

e.g. sample

···

On Thu, Oct 07, 2004 at 08:14:53PM +0900, George Moschovitis wrote:

Comments from the Ruby
community are critical in order to fix possible bugs and improve the
API. Suggestions for missing features are also welcome. This version
only supports the Postgres Database.

+---+-------+------+-------+

id | a | b | c |

+---+-------+------+-------+

1 | hello | Ruby | world |

+---+-------+------+-------+

  obj = $db.get(1) # a="hello", b="Ruby", c="world"
  ...
  obj.a = "goodbye" # still remembers old_a = "hello"
  obj.b = "Perl" # still remembers old_b = "Ruby"
  obj.save

generates:

(1) lax

  UPDATE sample SET a='goodbye',b='Perl' WHERE id=1 AND
    (a='hello' OR a='goodbye') AND (b='Ruby' OR b='Perl');
or
(2) strict

  UPDATE sample SET a='goodbye',b='Perl' WHERE id=1 AND
    a='hello' AND b='Ruby';

and raises an exception if the row processed count = 0. That stops
simultaneous updates stamping on each other. The 'strict' version is useful
for counters, sequences etc.

If this exception occurs, it's then easy enough to query the database to get
a set of (keys,values) where the value in the database is not consistent
with the value you're trying to save, to allow the user to resolve the
conflict:

  # this needs wrapping in a method, iterating across all properties
  obj2 = $db.get(1)
  diff = {}
  diff[:a] = [obj.a, obj2.a] if (obj2.a != obj.old_a AND obj2.a != obj.a)
  diff[:b] = [obj.b, obj2.b] if (obj2.b != obj.old_b AND obj2.b != obj.b)
  obj.old_a = obj2.a
  obj.old_b = obj2.b

These 'old' properties might also be useful for object-level transactions,
without serialisation as ActiveRecord/Transaction::Simple does.

Regards,

Brian.

Hello Brian,

thank you for your suggestion! It took me some time to understand it,
but it
seems interesting. I 'll have to think about it a little bit longer
about how to implement
this in the library.

best regards,
George

I'd like a database object mapping framework to remember the 'old'
values of properties at the time an object was read, and perform
safe atomic updates.

e.g. sample
+---+-------+------+-------+
>id | a | b | c |
+---+-------+------+-------+
> 1 | hello | Ruby | world |
+---+-------+------+-------+

  obj = $db.get(1) # a="hello", b="Ruby", c="world"
  ...
  obj.a = "goodbye" # still remembers old_a = "hello"
  obj.b = "Perl" # still remembers old_b = "Ruby"
  obj.save

generates:

(1) lax

  UPDATE sample SET a='goodbye',b='Perl' WHERE id=1 AND
    (a='hello' OR a='goodbye') AND (b='Ruby' OR b='Perl');
or
(2) strict

  UPDATE sample SET a='goodbye',b='Perl' WHERE id=1 AND
    a='hello' AND b='Ruby';

and raises an exception if the row processed count = 0. That stops
simultaneous updates stamping on each other. The 'strict' version is
useful for counters, sequences etc.

That's an interesting suggestion, as it gives one a defense against some
external process coming in and changing things unexpectedly. Putting this
on my todo for Kansas!

These 'old' properties might also be useful for object-level
transactions, without serialisation as
ActiveRecord/Transaction::Simple does.

In Kansas, it keeps each old value within the context of a transaction, but
after the transaction is committed, the old values go away. I don't provide
any interface, however, for actually access the old values at any time. Can
you elaborate more on ways that you would see the old values being used?
This is quite educational.

As for object-level transactions without serialization, I'm not sure about
AR, but with Kansas one can set a read-only attribute on data objects, and
can direct a query to return the data with the objects already in a read-
only state. While read-only is on, changes to the object will not get
serialized back to the db.

Thanks,

Kirk Haines

···

On Thu, 7 Oct 2004 21:20:37 +0900, Brian Candler wrote

That's an interesting suggestion, as it gives one a defense against some
external process coming in and changing things unexpectedly. Putting this
on my todo for Kansas!

That's another O<->R mapping engine? Looks like there's a big ecosystem to
choose from :slight_smile:

> These 'old' properties might also be useful for object-level
> transactions, without serialisation as
> ActiveRecord/Transaction::Simple does.

In Kansas, it keeps each old value within the context of a transaction, but
after the transaction is committed, the old values go away. I don't provide
any interface, however, for actually access the old values at any time. Can
you elaborate more on ways that you would see the old values being used?
This is quite educational.

I think that once they're committed to the database, the old values are no
longer needed (and indeed, should be reset to the known current values in
the DB). The 'old' values represent a database snapshot at the last instance
that the DB was touched, and I think this is useful even before a
transaction starts.

My main worry is users A and B both pulling up the same record onto a
screen, making changes, and then writing back both; the one who gets there
first risks having their changes overwritten. Checking the old values have
not been changed as part of an atomic update is simple and robust, and
doesn't require record locking.

I wrote a system once which incorporated this concept, but since it had a
web interface, I think I ended up putting the 'old' values in hidden form
variables anyway.

As for object-level transactions without serialization, I'm not sure about
AR, but with Kansas one can set a read-only attribute on data objects, and
can direct a query to return the data with the objects already in a read-
only state. While read-only is on, changes to the object will not get
serialized back to the db.

OK, I think perhaps we're talking about something else. With AR (which I've
only looked at briefly), you can make changes to an object within a
transaction, and if it fails, the objects themselves are rolled back to
their state at the start of the transaction. This is done using Austin
Ziegler's Transaction::Simple library, which just keeps a copy of the object
in marshalled form in an instance variable. It rolls it back using, in
outline:

    r = Marshal.restore(@__foo__)
    self.replace(r) if respond_to?(:replace)
    r.instance_variables.each do |i|
      instance_variable_set(i, r.instance_variable_get(i))
    end

I was just thinking that if you're keeping the properties in a hash, and
have a separate hash for their snapshot values, then you get this capability
for free:

    @props = @oldprops.dup

But it's not clear to me whether the best approach is to have 'obj'
containing both old and new values, or whether you should just have two
separate objects representing then and now:

    obj1 = $db.get(n)
    obj2 = obj1.dup
    ... make changes to obj2
    obj2.save(obj1) # => generates SQL to update obj1 to obj2 atomically

Incidentally, talking about the OR ecosystem, I have some proof-of-concept
code which takes XML and stores it in SQL. It takes a stream of tag_start(),
element(), text(), tag_end() method calls (as generated by an
REXML::StreamParser), and writes rows in a DB. <foo>text</foo> ends up as a
single row. It can then spool out any subtree by doing a single 'select' and
converting this back into method calls; there are helper methods to turn a
method call stream into either XML text or a tree of REXML::Element objects.
The nice thing is that no intermediate representation is built in RAM, so
in principle you could spool gigabytes of XML in and out of SQL.

Dunno if anyone is interested in this... I was going to add a mechanism to
convert XPATH queries into SQL queries, and was starting to think about how
different element types could be mapped to different tables, at which point
it starts to look like an OR mapping solution. At the moment there are
just global 'elements' and 'attributes' tables. I will look at Kansas, AR,
NDB and others and see what good ideas I can steal from them :slight_smile:

But in principle an object with attributes should map quite nicely to
   <class attr1=val1 attr2=val2...>
Having a hierarchy can be useful too, e.g. for access control, where a
user can only "see" objects which are below them in the tree. And XML is
still useful as an export/import tool, even if most people on this list use
YAML anyway :slight_smile:

Regards,

Brian.

···

On Thu, Oct 07, 2004 at 11:31:37PM +0900, Kirk Haines wrote:

That's another O<->R mapping engine? Looks like there's a big
ecosystem to choose from :slight_smile:

In alphabetical order:

Active Record, Criteria, Kansas, Lafcadio, ndb are the five that pop to mind
immediately.

My main worry is users A and B both pulling up the same record onto a
screen, making changes, and then writing back both; the one who gets
there first risks having their changes overwritten. Checking the old
values have not been changed as part of an atomic update is simple
and robust, and doesn't require record locking.

(*nod*) This makes a great deal of sense to me.

OK, I think perhaps we're talking about something else. With AR
(which I've only looked at briefly), you can make changes to an
object within a transaction, and if it fails, the objects themselves
are rolled back to their state at the start of the transaction. This
is done using Austin Ziegler's Transaction::Simple library, which
just keeps a copy of the object in marshalled form in an instance
variable. It rolls it back using, in outline:

    r = Marshal.restore(@__foo__)
    self.replace(r) if respond_to?(:replace)
    r.instance_variables.each do |i|
      instance_variable_set(i, r.instance_variable_get(i))
    end

I was just thinking that if you're keeping the properties in a hash,
and have a separate hash for their snapshot values, then you get
this capability for free:

    @props = @oldprops.dup

But it's not clear to me whether the best approach is to have 'obj'
containing both old and new values, or whether you should just have two
separate objects representing then and now:

    obj1 = $db.get(n)
    obj2 = obj1.dup
    ... make changes to obj2
    obj2.save(obj1) # => generates SQL to update obj1 to obj2 atomically

Ah. Yeah, what I was talking about was when one wants to query data from a
db, and then do things with that data, possibly changing the data in the
objects, but without serializing the changes back to the db, or at least
without serializing them back immediately and without having to do
everything within a transaction.

You are just talking about the rollback implementation. Kansas' rollback
implementation needs some work, actually, but right now it keeps on hand
every value for every field between the start of the transaction and the
commit. There's just no interface for, at the user level, accessing those.

In the case of an object rollback, either done explicitly by one's code, or
because there was an exception thrown in the update, Kansas rolls each of
the fields back to the original values. So it's similar to your first
example. It's simple, but in practice it works well, in my experience.

Dunno if anyone is interested in this... I was going to add a
mechanism to convert XPATH queries into SQL queries, and was
starting to think about how different element types could be mapped
to different tables, at which point it starts to look like an OR
mapping solution. At the moment there are just global 'elements' and
'attributes' tables. I will look at Kansas, AR, NDB and others and
see what good ideas I can steal from them :slight_smile:

But in principle an object with attributes should map quite nicely to
   <class attr1=val1 attr2=val2...>
Having a hierarchy can be useful too, e.g. for access control, where
a user can only "see" objects which are below them in the tree. And
XML is still useful as an export/import tool, even if most people on
this list use YAML anyway :slight_smile:

I'm interested. It's always good to see what other ideas are floating
around that one can learn from or piggy back off of!

Kirk Haines

···

On Fri, 8 Oct 2004 01:36:37 +0900, Brian Candler wrote

I wrote a system once which incorporated this concept, but since it had a
web interface, I think I ended up putting the 'old' values in hidden form
variables anyway.

This is a COOL idea :slight_smile:

Having a hierarchy can be useful too, e.g. for access control, where a
user can only "see" objects which are below them in the tree. And XML is
still useful as an export/import tool, even if most people on this list use
YAML anyway :slight_smile:

well n/Db fully supports a hierarchy. Have a look at the Child and ParentClass and at DbConnection#children etc.

George Moschovitis

···

--
www.navel.gr | tel: +30 2106898050 | fax: +30 2106898437

Navel does not accept liability for any errors, viruses or omissions in the contents of this message. The full corporate policy is available on our site.

have fun: www.joy.gr

Kirk Haines wrote:

Active Record, Criteria, Kansas, Lafcadio, ndb are the five that pop to mind immediately.

There is also Vapor (http://vapor.rubyforge.org). A nice listing of ORM libs can be found on this page:

   http://rpa-base.rubyforge.org/wiki/wiki.cgi?PackageAdvisor

···

--
John

Kirk Haines wrote:

That's another O<->R mapping engine? Looks like there's a big ecosystem to choose from :slight_smile:

In alphabetical order:
Active Record, Criteria, Kansas, Lafcadio, ndb are the five that pop to mind immediately.

There are many O<->R libraries out there, but let me point out where ndb is different:

ActiveRecord:
this seems to me to be an R->O mapping and not vice versa
i think this doesnt use standard ruby objects but something like OpenStruct.

Criteria:
Uses strange OO query objects

Lafcadio:
R->O again, you have to write sql definitnions

Kansas:
the same.

n/db:
- Uses STANDARD ruby objects ie (O->R mapping)

- You write NO SQL table definitions
(you can finetune the generated sql schema using a simple metalanguage in ruby)

- Because the managed objects are standard objects, you can use
inheritance or synthesize your objects using mixins. A number
of usefull mixins are provided (for example Child, SqlTreeTraversable
(graph preorder pattern) etc).

- You can optionally use a unified id space. Ie oids are allocated to
different classes of object from the same space. This allows you for
example code like the following:

class Article
....
end

class User
....
end

class Comment
....
end

a = Article.new
u = User.new

c1 = Comment.new
c1.set_parent(a)
$db << c1

c2 = Comment.new
c2.set_parent(a)
$db << c2

article_comments = $db.children(a, Comment)
user_comments = $db.children(u, Comment)

- you can automatically deserialize to ruby objects or to resultsets,
a future version will support deserialization to OpenStruct objects.

- nice support for SQL JOIN operations.

- moreover there are some optimizations (for example pregenerated insert/update/deserialize code for each managed class) and more
are coming in the next version (prepared statements, stored procedures).

- there are some more nice tricks for you to discover until I find time
to write proper documentation.

I hope more people will have a look at the code, and tell me about problems, and bugs or give suggestions and ideas for improvement.
Any cool ideas will be integrated in a future version.

regards,
George Moschovitis

···

On Fri, 8 Oct 2004 01:36:37 +0900, Brian Candler wrote

--
www.navel.gr | tel: +30 2106898050 | fax: +30 2106898437

Navel does not accept liability for any errors, viruses or omissions in the contents of this message. The full corporate policy is available on our site.

have fun: www.joy.gr

Yep. The trouble with just having a child/parent link is that it's very
inefficient to do a search which is limited to all nodes within a subtree,
and this is the one vaguely clever bit of what I've implemented. Each node
has a 'path' attribute stored in a compacted form. It has the property that

   select * from elements where path like 'ABC%' order by path;

will give you all the elements under node ABC, *and* in the correct order to
spool them out to XML (each parent preceeds its children). It's very easy to
perform searches which are limited to any arbitary subtree.

The path is constructed using these rules:

* root node has path ""
* a child has a path constructed of its parent's path plus a child ID
* the child ID counts from 0, and is encoded into a variable-length string
  using base64, using symbols 0-9 and A-V for numbers 0-31.

    x 0 to 31
    Wxx 32 to 2^10-1
    Xxxxx 2^10 to 2^20-1
    Yxxxxxx 2^20 to 2^30-1
    Zxxxxxxxx 2^30 to 2^40-1

Such paths can be easily decomposed by inspection, e.g.

    4AWVUP -> 4 A WVU P -> /4/10/1022/25

and are short enough to be used as a primary key. The variable length
encoding allows for both deep and wide trees with reasonable efficiency.

Anyway, I'll try to stick it up on the web somewhere over the weekend if
anyone wants to play.

Regards,

Brian.

···

On Fri, Oct 08, 2004 at 06:24:44PM +0900, George Moschovitis wrote:

>Having a hierarchy can be useful too, e.g. for access control, where a
>user can only "see" objects which are below them in the tree. And XML is
>still useful as an export/import tool, even if most people on this list use
>YAML anyway :slight_smile:

well n/Db fully supports a hierarchy. Have a look at the Child and
ParentClass and at DbConnection#children etc.

Yep. The trouble with just having a child/parent link is that it's very
inefficient to do a search which is limited to all nodes within a subtree,

... anyone wants to play.

Have a look at the SqlTreeTraversable Mixin in the file n/db/mixins.rb
If you include this mixin in your managed Object you can efficiently serach in subtrees. This mixin implements the common Preorder Tree Traversal SQL pattern. Is this what you are looking for?

regards,
George Moschovitis

···

--
www.navel.gr | tel: +30 2106898050 | fax: +30 2106898437

Navel does not accept liability for any errors, viruses or omissions in the contents of this message. The full corporate policy is available on our site.

have fun: www.joy.gr

and this is the one vaguely clever bit of what I've implemented. Each node
has a 'path' attribute stored in a compacted form. It has the property that
   select * from elements where path like 'ABC%' order by path;

This is a common SQL pattern, however I believe that the method used for
efficient hierarchical traversal in n/Db is more general.

regards,
George

···

--
www.navel.gr | tel: +30 2106898050 | fax: +30 2106898437

Navel does not accept liability for any errors, viruses or omissions in the contents of this message. The full corporate policy is available on our site.

have fun: www.joy.gr

ActiveRecord:
this seems to me to be an R->O mapping and not vice versa
i think this doesnt use standard ruby objects but something like OpenStruct.

Can you clarify what you mean by R->O vs O<->R?

Lafcadio:
R->O again, you have to write sql definitnions

Actually, Lafcadio comes with a lafcadio_schema script that allows you to generate create table statements from a Ruby domain class file.

F.

···

On Oct 8, 2004, at 5:09 AM, George Moschovitis wrote:

Can you clarify what you mean by R->O vs O<->R?

From what I 've seen Active Record takes an SQL Schema as input
(R) and creates a Ruby structure (O), ie R->O

regards,
George Moschovitis

···

--
www.navel.gr | tel: +30 2106898050 | fax: +30 2106898437

Navel does not accept liability for any errors, viruses or omissions in the contents of this message. The full corporate policy is available on our site.

have fun: www.joy.gr

>Yep. The trouble with just having a child/parent link is that it's very
>inefficient to do a search which is limited to all nodes within a subtree,
... anyone wants to play.
>

Have a look at the SqlTreeTraversable Mixin in the file n/db/mixins.rb
If you include this mixin in your managed Object you can efficiently
serach in subtrees. This mixin implements the common Preorder Tree
Traversal SQL pattern. Is this what you are looking for?

I don't think so, but I'm afraid I couldn't work out what the
"SqlTreeTraversable" module actually does from its rdoc comment or code.

An example of what I want to do is as follows. Say I have a tree of E-mail
accounts, ordered hierarchically:

       . (root)
       >
   resellers
       >
     VISPS
       >
     Users
       >
    Mailboxes

A reseller "owns" a set of VISPs, each of which "own" their users, each of
which "owns" some mailboxes. Let's say we have 10 resellers, each with 100
VISPs, each with 2000 users, each with 5 mailboxes; that makes 10M mailboxes
altogether.

If I'm logged in as a particular reseller, I should only be able to see my
portion of the tree. Then I want to be able to ask efficiently: "find all
E-mail addresses like brianc% which are underneath me", as a single SQL
query.

It can be implemented as a join using the parent-child relationship, but is
unlikely to be efficient (probably will end up enumerating all 1 million
nodes which are underneath this reseller, just to find the ones which start
'brianc%')

Then consider that I make the tree recursive, so a User can have their own
VISPs, which can have Users, which can have VISPs... etc. Now, finding
mailboxes like brianc% under a particular User or VISP is not at all easy.
Oracle has SQL extensions for tree traversal, but it's not efficient if you
end up walking a very large section of tree.

However, if path strings like I described before are allocated to each node,
it becomes easy:

   select .... where e.path like 'A%' and e.address like 'brianc%';

('A' being the point in the tree I am searching below). A good SQL query
optimiser will then be able to estimate for itself whether it is more
efficient to find all nodes under A% and then limit that to those which
match brianc%, or find all nodes which match brianc% and limit them to those
under A%. Typically one or the other will dominate.

For XML-heads, it may be clearer to observe that XPATH queries can be
translated directly into SQL: e.g.

    id(123)//bar # all 'bar' nodes under subtree rooted at id(123)

becomes

    select ... where e.path like '123%' and e.type = 'bar';

Thus you have the possibility of treating your database as one huge XML
document (but without having to load it all into RAM, and with the
concurrent access and transaction safety that a SQL database allows). The
XPATH stuff is still a long way from being implemented though.

Cheers,

Brian.

···

On Fri, Oct 08, 2004 at 09:29:44PM +0900, George Moschovitis wrote:

serach in subtrees. This mixin implements the common Preorder Tree
Traversal SQL pattern. Is this what you are looking for?

>

I don't think so, but I'm afraid I couldn't work out what the
"SqlTreeTraversable" module actually does from its rdoc comment or code.

I do think so :slight_smile:
Have a look at the following link tha explains this technique:

http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html

I 'll try to improve the documentation though :slight_smile:

best regards
George

···

--
www.navel.gr | tel: +30 2106898050 | fax: +30 2106898437

Navel does not accept liability for any errors, viruses or omissions in the contents of this message. The full corporate policy is available on our site.

have fun: www.joy.gr

Have a look at the following link tha explains this technique:

Trees in SQL. Joe Celko

This is a very cute data structure which I hadn't come across before.
However, if I add a new subordinate to 'Bert' (say 'Bob'), don't I end up
having to renumber almost the entire tree?

            Albert
             1,14
         / \
     Bert Chuck
     2,5 6,13
      > / | \
     Bob Donna Eddie Fred
     3,4 7,8 9,10 11,12

Example 5 in that page gives the SQL to do this, but I don't fancy updating
10 million nodes on each insert (and in an ISP, signups happen
continuously). Perhaps there is a workaround: e.g. add the nodes into a
temporary adjacency list representation, and then once a night walk the tree
to insert in bulk. I'd have to think about that.

Regards,

Brian.

···

On Fri, Oct 08, 2004 at 10:39:44PM +0900, George Moschovitis wrote:

Brian Candler wrote:

Have a look at the following link tha explains this technique:

Trees in SQL. Joe Celko

This is a very cute data structure which I hadn't come across before.
However, if I add a new subordinate to 'Bert' (say 'Bob'), don't I end up
having to renumber almost the entire tree?

Yeap this is the tradeoff. Slower inserts give you faster (and constant time) reads. The tradeoff compared to your idea is generality.

The latest version of the web application framework we are using here at Navel further encapsulates this pattern to make it almost transparent. It is very usefull for fora messages, comments and more.

I 'll probably also add a Mixin for the 'materialized paths' method you described (this is the 'formal' name) in the next version of n/Db. Sometime in the next week, I hope.

best regards
George Moschovitis

ps: I would really like to have a look at your xpath->sql translator.

···

On Fri, Oct 08, 2004 at 10:39:44PM +0900, George Moschovitis wrote:

--
www.navel.gr | tel: +30 2106898050 | fax: +30 2106898437

Navel does not accept liability for any errors, viruses or omissions in the contents of this message. The full corporate policy is available on our site.

have fun: www.joy.gr

"Brian Candler" <B.Candler@pobox.com> wrote in message

This is a very cute data structure which I hadn't come across before.

Google for "ait-kaci lattice encoding" or "efficient tree encoding" and you
will see lots of work in this space, covering trees, DAGs, etc. and mapping
them to bit vectors, integers, integer ranges, etc.

However, if I add a new subordinate to 'Bert' (say 'Bob'), don't I end up
having to renumber almost the entire tree?

            Albert
             1,14
         / \
     Bert Chuck
     2,5 6,13
      > / | \
     Bob Donna Eddie Fred
     3,4 7,8 9,10 11,12

Add a couple of zero's (1,14 --> 100, 1400) and you suddenly have a whole
bunch of breathing room before you have to renumber the whole tree :slight_smile:

This guy works (or used to work for Oracle) and came up with a scheme
for storing trees in a relational structure using real numbers (well
that is where he starts). Anyway, interesting solution. the only
thing is that it is only good for hierarchical data.

Kelly Nawrocke

···

On Sat, 9 Oct 2004 02:59:44 +0900, Its Me <itsme213@hotmail.com> wrote:

"Brian Candler" <B.Candler@pobox.com> wrote in message
> This is a very cute data structure which I hadn't come across before.

Google for "ait-kaci lattice encoding" or "efficient tree encoding" and you
will see lots of work in this space, covering trees, DAGs, etc. and mapping
them to bit vectors, integers, integer ranges, etc.

> However, if I add a new subordinate to 'Bert' (say 'Bob'), don't I end up
> having to renumber almost the entire tree?
>
> Albert
> 1,14
> / \
> Bert Chuck
> 2,5 6,13
> > / | \
> Bob Donna Eddie Fred
> 3,4 7,8 9,10 11,12
>

Add a couple of zero's (1,14 --> 100, 1400) and you suddenly have a whole
bunch of breathing room before you have to renumber the whole tree :slight_smile: