The "perfect" ORM?

George Moschovitis wrote:

> I dislike putting the metadata for my objects into the objects
> themselves.

The metadata is stored in the object class not the actual instances.

Yes, I realize that.

> In addition, my memory of Og is that it encourages thinking in
> database terms (like "has_many") -- true or not?

I don't think that "has_many" is a database term; it just describes object
relations and allows Og to automagically generate some useful methods.
This is an abstraction.

I think it was originally a database term which then found its way into
things like UML. In classic (non-UML) discussions of OOP, I have never
seen these terms.

I think that when using Og you almost forget that you are using a
database. In fact, you don't need an RDBMS store.

I like Og, it just feels verbose to me.

But I am the type who would rather do

  arr.mapf(:downcase)

instead of

  arr.map {|x| x.downcase }
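
For readers who have not met it: `mapf` is not a core Ruby method. A minimal sketch of such a helper, assuming it simply calls the named method on each element (this definition is an illustration, not any particular library's code):

```ruby
# Hypothetical helper: map by method name instead of by block.
module Enumerable
  def mapf(method_name)
    map { |item| item.public_send(method_name) }
  end
end

arr = %w[Alpha BETA Gamma]
lowered = arr.mapf(:downcase)   # same result as arr.map { |x| x.downcase }
```

In later Ruby versions the built-in spelling `arr.map(&:downcase)` gives the same brevity without a helper.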

And I feel that with Og, I am coupling my classes to the persistence
mechanism. Rather than "teaching my objects to persist" I would like
to give information to a persistence framework, and then just pass in
pristine objects (undecorated, unannotated, no tattoos or bumper
stickers).
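
One way to picture this, as a sketch only: the mapping lives in a separate registry object, and the domain class is never reopened or annotated. `Mapping`, `describe`, and `to_record` are names made up for this illustration; they do not come from Og or any other framework.

```ruby
# A plain domain class: no persistence code, no annotations.
class Person
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end
end

# Hypothetical registry that holds persistence metadata outside the classes.
class Mapping
  def initialize
    @fields = {}
  end

  # Record which attributes of klass should be stored.
  def describe(klass, *fields)
    @fields[klass] = fields
  end

  # Turn a pristine object into a plain hash that a store could write out.
  def to_record(obj)
    fields = @fields.fetch(obj.class) { raise ArgumentError, "no mapping for #{obj.class}" }
    fields.to_h { |field| [field, obj.public_send(field)] }
  end
end

mapping = Mapping.new
mapping.describe(Person, :name, :age)
record = mapping.to_record(Person.new("Ada", 36))
```

The trade-off is that the framework has to be told about each class somewhere, but `Person` itself stays free of tattoos.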

> 2. To specify the minimum information necessary in order to marshal
> each of my types.

Og supports this.

> 3. To store the metadata separately from my classes/objects so as to
> minimize impact on them. (But probably not in a separate file.)

You can do this in the latest version.

OK, I didn't know that at all. :-)

Og is constantly evolving; stay tuned for even better abstractions.
Even better, you can help us with suggestions and/or patches. Join the
mailing list ;-)

I will try first to produce something of my own that I like. If I fail,
then Og will be my favorite framework. ;-) And then I suppose I would
join the list and so on.

Thanks,
Hal

Hi guys,

class Person
  attr_reader :name, :surname, :day_of_birth

  def initialize(name, surname, day_of_birth)
    @name, @surname, @day_of_birth = name, surname, day_of_birth
  end

  def name=(name)
    @name = name
  end
end

Take this simple class.

Many implementations exist to describe relations, data types, ... but
nothing forbids you from separating the metadata description into another
file, even if it's described in the class.

I'm really bad at writing English explanations, so I hope you got it.
Unlike Java, Ruby allows you to reopen (redefine) classes. I don't say
this because I think you don't know it, but because I see it hasn't come
up in this particular discussion. I think it's an interesting approach
at first sight.
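
A minimal sketch of that idea: the class is defined once, and a second file reopens it to attach the metadata. The `persistent_fields` class method is invented for this example and is not part of any ORM mentioned in this thread.

```ruby
# person.rb: the plain class
class Person
  attr_accessor :name, :surname

  def initialize(name, surname)
    @name, @surname = name, surname
  end
end

# person_meta.rb: the same class, reopened later (possibly in another file)
class Person
  # Hypothetical hook that a persistence layer could read.
  def self.persistent_fields
    { name: String, surname: String }
  end
end

person = Person.new("Ada", "Lovelace")
fields = Person.persistent_fields.keys   # metadata sits on the class, not the instance
```

Reopening adds to the existing class; the accessors and `initialize` from the first definition keep working.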

Cheers,
  zimba-tm

Jeremy Kemper wrote:

This is very different from composing a domain model, coding it up, then devising a mapper to persist your object graph. As far as I know there are no Ruby ORMs that attempt this.

If I understand you right, that is the way I want to do it.

Having done it both ways, I prefer those little hints. The apparent cost of a generic domain mapper is deceptively low due to the "it seems nice" discount, but its true cost is far higher: high conceptual overhead, difficult mapping bugs, and carpal tunnel.

Well, I guess I will work on it and see where it goes. Maybe it will
prove impractical. Maybe no one will like it but me. We'll see.

Thanks,
Hal

This is very different from composing a domain model, coding it up,
then devising a mapper to persist your object graph. As far as I
know there are no Ruby ORMs that attempt this.

Ehm, Og does that.

First you get your domain model:

class User
  attr_accessor :name
end

class Book
  attr_accessor :title
end

Then you can annotate the model as needed (even in another source file):

class User
  ann :name, :klass => String
  has_many :books
end

class Book
  belongs_to :user
end

Of course you can combine the two steps into one if you want (and typically you
want this ;-))

Then you just issue:

user.save to persist your object
or
User.find_by_name(...)
to query, etc.

Og even allows for inheritance and polymorphic relations. You can use
a non RDBMS store if you want, or no store at all (use it just as an
object manager). I suggest you have a look at this.

regards,
George.

···

--

http://www.navel.gr

I just have to say first that I've really enjoyed reading these planning messages of yours Hal. It's clear you have a vision and know what you want. I can't wait to see the results.

That said, I don't understand the full vision, so forgive my dumb question.

The whole time I'm reading your posts though I keep thinking you just want:

   File.open("objects", "w") do |file| Marshal.dump(whatever, file) end

Or YAML. Or PStore if you also want transactions.
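
For reference, a small sketch of the PStore variant using Ruby's standard `pstore` library. The file name and the `:people` key are arbitrary choices for this example.

```ruby
require "pstore"
require "tmpdir"
require "fileutils"

dir = Dir.mktmpdir
store = PStore.new(File.join(dir, "objects.pstore"))

# Everything inside the block is committed together or not at all.
store.transaction do
  store[:people] = [{ name: "Ada" }, { name: "Grace" }]
end

# An aborted transaction leaves the stored data untouched.
store.transaction do
  store[:people] = []
  store.abort
end

# Passing true opens a read-only transaction.
names = store.transaction(true) { store[:people].map { |person| person[:name] } }

FileUtils.remove_entry(dir)
```

Note that PStore reads and writes the whole file on each transaction, so every stored object passes through memory.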

Can you explain how what you want differs from this?

James Edward Gray II

···

On Oct 28, 2005, at 4:47 PM, rubyhacker@gmail.com wrote:

I want to do as little specification as possible to store my objects.

Yes, but that's just giving different names to the same things, isn't
it?

Well, yeah. :-)

As far as I can see, all that's really needed is:
1. Let each class map to a table
2. Let each table have a known unique primary key

Then we don't need all this "relationship" stuff, do we?

In some cases, you can't do without it.

ORM can be approached from two basic directions. It can be database driven,
in that the structure of the database dictates the language structures, or
one can have the language structures dictate the database structure.

For instance, if you have a database that you need to write an application to
interact with, that database structure is going to dictate your Ruby language
structures, and your use of those structures should not change the structure
of the database.

In a case like that, depending on what database you are using, the database
structure might explicitly describe the relationship between tables for you,
or it may not.

If it does, IMHO, the ideal for an ORM, and where I am going with Kansas, is
for the ORM to be able to understand that and act on it, so that one
automatically gets Ruby classes and methods that simply make that database
structure accessible with no effort on the part of the programmer. It's
automatic.

On the other hand, if the database does not provide this information, or the
code is being written so that it can be used on multiple dbs, and one of the
potential targets does not provide this information, then if one wants to
make use of relationships, one must provide this information. There's no way
around it.

And the reason why relationships are useful beyond having a simple mapping of
class to table is because they make the code convenient.

In the MSDS application that my examples come from, one can select a single
chemical and view information on it. One thing that can be done at that
point is to see other chemicals manufactured by the same manufacturer that
the chemical being viewed comes from.

So, if that chemical's record is in @chemical:

@chemical.manufacturer.chemicals

And you have your list. The relationship information provided the necessary
string to tie those together without the programmer having to explicitly write
the code. It is a tremendous timesaver.

Now, the other direction that an ORM can go is to let the language structures
dictate the database structures.

So, for instance, you declare a class:

class Foo
  attr_accessor :a, :b
end

And somehow, that class is mapped to the database, creating or altering the
table definition as necessary. When doing this, if a field is going to store
objects of another class that is also mapped to a table in the database, you
have to tell the ORM about that so that it knows how the tables should look
in order to store your data.

Regardless of the direction that one is approaching the ORM task from, the
annotation still serves the same purpose -- it makes sure that the ORM is
doing what you want it to in cases where interpretation is ambiguous or even
impossible.
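
To make the object-driven direction concrete, here is a rough sketch that derives a table definition from a sample object. The type table, the pluralised table name, and the `create_table_sql` helper are all assumptions made for illustration; no ORM in this thread is claimed to work this way.

```ruby
class Foo
  attr_accessor :a, :b
end

# Assumed mapping from Ruby classes to SQL column types.
SQL_TYPES = { Integer => "INTEGER", String => "TEXT", Float => "REAL" }.freeze

# Build a CREATE TABLE statement from the instance variables of one object.
def create_table_sql(obj)
  table = obj.class.name.downcase + "s"
  columns = obj.instance_variables.map do |ivar|
    value = obj.instance_variable_get(ivar)
    type = SQL_TYPES.fetch(value.class) do
      # This is where interpretation becomes impossible without a hint.
      raise ArgumentError, "no column type for #{value.class}; an annotation is needed"
    end
    "#{ivar.to_s.delete_prefix('@')} #{type}"
  end
  "CREATE TABLE #{table} (id INTEGER PRIMARY KEY, #{columns.join(', ')})"
end

foo = Foo.new
foo.a = 1
foo.b = "text"
sql = create_table_sql(foo)
```

Simple values can be guessed, but a field holding another mapped object (or `nil`) falls into the `ArgumentError` branch, which is the case where an annotation earns its keep.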

As for a "has-many" relationship (in objects, not in DBs) -- isn't that
just what we call an "array"? The difference being that Ruby arrays are
heterogeneous whereas rows of a table all represent the same type?

Yep.

I want to do as little specification as possible to store my objects.
That's where I'm coming from. I want the persistence framework to be as
smart as possible and make as many reasonable assumptions as possible.

I want to spend as little time coding the metadata portion as I can,
and I want it all stuck in the same place in my code, in as few lines
as possible.

Those are all my goals, too. :-)

Let's say that you have two classes:

class Manufacturers
  attr_accessor :idx, :name, :address
end

class Chemicals
  attr_accessor :idx, :name, :manufacturer
end

Consider the manufacturer field in the Chemicals class. An ORM can not look
at this and know that you intend to store Manufacturers objects in it. And
that information is important to determine the structure of the database.

What if you have a Manufacturers object, and you want to know all of the
chemicals that have that manufacturer? You could write a method manually to
do that:

class Manufacturers
  def chemicals
    #query the db and retrieve an array of Chemicals records where
    # manufacturer == self.idx
  end
end

But if all of that typing can be reduced to simply telling the ORM about the
relationship, using some syntax or other, isn't that a win?

class Manufacturers
  relates_to Chemicals.manufacturer
end
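
As written, `relates_to Chemicals.manufacturer` reads as shorthand rather than runnable Ruby. A rough sketch of how such a declaration could generate the `chemicals` method, using an in-memory array in place of a database; the `Relations` module, the two-argument `relates_to` signature, and the `all` class method are assumptions of this sketch, not Kansas's actual API.

```ruby
# Hypothetical macro: defines a reader that looks up related objects.
module Relations
  def relates_to(klass, foreign_field)
    method_name = klass.name.downcase   # Chemicals -> "chemicals"
    define_method(method_name) do
      klass.all.select { |row| row.public_send(foreign_field) == idx }
    end
  end
end

class Chemicals
  attr_accessor :idx, :name, :manufacturer

  def initialize(idx, name, manufacturer)
    @idx, @name, @manufacturer = idx, name, manufacturer
  end

  # Stand-in for a table: the objects stored so far.
  def self.all
    @all ||= []
  end
end

class Manufacturers
  extend Relations
  attr_accessor :idx, :name

  def initialize(idx, name)
    @idx, @name = idx, name
  end

  relates_to Chemicals, :manufacturer
end

acme = Manufacturers.new(1, "Acme")
Chemicals.all << Chemicals.new(10, "Acetone", 1) << Chemicals.new(11, "Ethanol", 2)
related = acme.chemicals.map(&:name)
```

The one-line declaration replaces the hand-written `chemicals` method; a real ORM would issue a query where this sketch filters an array.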

Thanks,

Kirk Haines

···

On Friday 28 October 2005 3:47 pm, rubyhacker@gmail.com wrote:

James Edward Gray II wrote:

I want to do as little specification as possible to store my objects.

I just have to say first that I've really enjoyed reading these planning messages of yours Hal. It's clear you have a vision and know what you want. I can't wait to see the results.

That said, I don't understand the full vision, so forgive my dumb question.

The whole time I'm reading your posts though I keep thinking you just want:

  File.open("objects", "w") do |file| Marshal.dump(whatever, file) end

Or YAML. Or PStore if you also want transactions.

Can you explain how what you want differs from this?

Not a dumb question at all. And I wouldn't dignify my ideas as a "vision"
yet. If it works well, I'll retroactively dub it a vision. ;-)

Basically the only thing missing from a YAML solution or something is
the ability to do sophisticated queries without storing all the objects
in memory at once.

Given that I may have 100,000 objects or so, I don't want to store them
all in a giant array, but I *do* want to be able to find them by the
values of their accessors.

Make sense?
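
A sketch of what querying without holding everything in memory could look like with plain Marshal: records are read back one at a time and tested against a block. `select_records` and the `Item` struct are illustrative names only, and a `StringIO` stands in for a file on disk.

```ruby
require "stringio"

Item = Struct.new(:name, :price)

# Read stored objects one by one; only the current record and the
# matches collected so far are held in memory.
def select_records(io)
  matches = []
  io.rewind
  until io.eof?
    record = Marshal.load(io)
    matches << record if yield(record)
  end
  matches
end

io = StringIO.new("".b)
[Item.new("pen", 2), Item.new("desk", 150), Item.new("lamp", 40)].each do |item|
  Marshal.dump(item, io)
end

# A query by accessor value, evaluated record by record.
cheap = select_records(io) { |item| item.price < 50 }.map(&:name)
```

This keeps memory flat but touches every record on every query, which is the cost an index or a database is meant to remove.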

Hal

···

On Oct 28, 2005, at 4:47 PM, rubyhacker@gmail.com wrote:

Kirk Haines wrote:

In some cases, you can't do without it.

ORM can be approached from two basic directions. It can be database driven, in that the structure of the database dictates the language structures, or one can have the language structures dictate the database structure.

I think I am definitely object-driven.

[snippage]

And the reason why relationships are useful beyond having a simple mapping of class to table is because they make the code convenient.

In the MSDS application that my examples come from, one can select a single chemical and view information on it. One thing that can be done at that point is to see other chemicals manufactured by the same manufacturer that the chemical being viewed comes from.

So, if that chemical's record is in @chemical:

@chemical.manufacturer.chemicals

And you have your list. The relationship information provided the necessary string to tie those together without the programmer having to explicitly write the code. It is a tremendous timesaver.

It is, and I see the usefulness of it, but it doesn't fit my brain.

That is the sort of thing I'd use a query for, rather than just grabbing the
value of what looks to me like an accessor.

When you call chemicals in that way, is it then doing a query (late binding)
or did it get done recursively when you retrieved @chemical?

Now, the other direction that an ORM can go is to let the language structures dictate the database structures.

So, for instance, you declare a class:

class Foo
  attr_accessor :a, :b
end

And somehow, that class is mapped to the database, creating or altering the table definition as necessary. When doing this, if a field is going to store objects of another class that is also mapped to a table in the database, you have to tell the ORM about that so that it knows how the tables should look in order to store your data.

[snip]

Yes, this is my personal preference.

[snip]

What if you have a Manufacturers object, and you want to know all of the chemicals that have that manufacturer? You could write a method manually to do that:

class Manufacturers
  def chemicals
    #query the db and retrieve an array of Chemicals records where
    # manufacturer == self.idx
  end
end

But if all of that typing can be reduced to simply telling the ORM about the relationship, using some syntax or other, isn't that a win?

class Manufacturers
  relates_to Chemicals.manufacturer
end

It's a win if you want to think that way. I'd rather just make the
query syntax easy/flexible and forget about "relates_to" and such.

I understand a simple query. But every time I saw "relates_to" I
would have to stop and ask myself how it worked and what it meant.

Hal

Hal Fulton wrote:

Kirk Haines wrote:
>
> In some cases, you can't do without it.
>
> ORM can be approached from two basic directions. It can be database driven,
> in that the structure of the database dictates the language structures, or
> one can have the language structures dictate the database structure.

I think I am definitely object-driven.

[snippage]

> And the reason why relationships are useful beyond having a simple mapping of
> class to table is because they make the code convenient.
>
> In the MSDS application that my examples come from, one can select a single
> chemical and view information on it. One thing that can be done at that
> point is to see other chemicals manufactured by the same manufacturer that
> the chemical being viewed comes from.
>
> So, if that chemical's record is in @chemical:
>
> @chemical.manufacturer.chemicals
>
> And you have your list. The relationship information provided the necessary
> string to tie those together without the programmer having to explicitly write
> the code. It is a tremendous timesaver.

It is, and I see the usefulness of it, but it doesn't fit my brain.

That is the sort of thing I'd use a query for, rather than just grabbing the
value of what looks to me like an accessor.

I think it's easy to underestimate the value of being able to pass
around views of your data as objects. Queries can quickly clog up your
code (check how clogged up even LINQ queries can become in the upcoming
C# 3.0). I've dramatically reduced the size of some apps using that
approach. Then again, there's no point in condensing your code if it
doesn't read easily for you.

After much resistance my brain now fits in with Kirk's view of things.
I see my model as a database and want to design for a database.
However, I want my database to be a Ruby object. I also want to reduce
the number of queries, which always introduce errors and debugging for
me.

This may sound odd, but I want the purity of design of the relational
model, as well as the purity of design of Ruby OO, without one
polluting the other. So I like having one layer of my app explicitly
for that purpose.

It's a win if you want to think that way. I'd rather just make the
query syntax easy/flexible and forget about "relates_to" and such.

I understand a simple query. But every time I saw "relates_to" I
would have to stop and ask myself how it worked and what it meant.

You wouldn't call relates_to very much apart from when you initialize
the app. I personally find that switching my brain to 'database mode'
in those few necessary cases isn't too expensive in runtime.

DCC

Yes, I think I understand now, finally. It's a very interesting idea. Can't wait to see what you come up with...

James Edward Gray II

···

On Oct 29, 2005, at 1:11 PM, Hal Fulton wrote:

Not a dumb question at all. And I wouldn't dignify my ideas as a "vision"
yet. If it works well, I'll retroactively dub it a vision. ;-)

Basically the only thing missing from a YAML solution or something is
the ability to do sophisticated queries without storing all the objects
in memory at once.

Given that I may have 100,000 objects or so, I don't want to store them
all in a giant array, but I *do* want to be able to find them by the
values of their accessors.

Make sense?

Hal Fulton wrote:

James Edward Gray II wrote:

The whole time I'm reading your posts though I keep thinking you just
want:

  File.open("objects", "w") do |file| Marshal.dump(whatever, file) end

Or YAML. Or PStore if you also want transactions.

Can you explain how what you want differs from this?

Not a dumb question at all. And I wouldn't dignify my ideas as a "vision"
yet. If it works well, I'll retroactively dub it a vision. ;-)

Basically the only thing missing from a YAML solution or something is
the ability to do sophisticated queries without storing all the objects
in memory at once.

Given that I may have 100,000 objects or so, I don't want to store them
all in a giant array, but I *do* want to be able to find them by the
values of their accessors.

So Marshalled objects plus indexes?

Dave

Nice summary, Dave. If this is indeed what Hal is talking about, it seems like a very nice "fit".

From off the top of my head, an ideal data repository has the following qualities:

1. Infinite storage capacity
2. Zero access time
3. Persistent / Failsafe

Current technology (i.e. a hard drive) marries persistence with storage capacity and unfortunately increases access time. In-memory data reverses the advantages and disadvantages--it decreases access time, but it has a smaller storage capacity and it is no longer persistent.

Indexing is a netherworld. By imposing some structure on the data (e.g. "The 'id' attribute will always contain an integer") we can store ordered information about an otherwise haphazard data web. The ordering gives us the ability to predict where to look for information (e.g. sort the 'id' attribute numerically). This is important--we only need structure where we need to predict something. If we don't need to predict where to find information then an index is unnecessary. The "imposed structure" of database tables goes away if we don't need a bird's eye view of the data.

It seems that "Marshalled objects + Indexes" gives us this happy middle ground--most of the time we don't need to predict where to find information (e.g. many array attributes) but in the cases where we do, we could impose that "thread" of structure (aka an index) on a YAML file.
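
A minimal sketch of the "marshalled objects + index" idea, under stated assumptions: each object is appended to a file as a Marshal record, and an in-memory hash maps one attribute value to the byte offsets of the matching records. `IndexedStore` and its methods are hypothetical names; the index here is not itself persisted, and Marshal is used rather than YAML for brevity.

```ruby
require "tempfile"

Person = Struct.new(:id, :name)

class IndexedStore
  def initialize(io)
    @io = io
    @index = Hash.new { |hash, key| hash[key] = [] }   # name => [byte offsets]
  end

  # Append the object and remember where its record starts.
  def add(obj)
    @io.seek(0, IO::SEEK_END)
    offset = @io.pos
    Marshal.dump(obj, @io)
    @index[obj.name] << offset
    obj
  end

  # Load only the records the index points at.
  def find_by_name(name)
    @index.fetch(name, []).map do |offset|
      @io.seek(offset)
      Marshal.load(@io)
    end
  end
end

file = Tempfile.new("objects")
file.binmode
store = IndexedStore.new(file)
store.add(Person.new(1, "Hal"))
store.add(Person.new(2, "Dave"))
store.add(Person.new(3, "Hal"))
ids = store.find_by_name("Hal").map(&:id)
file.close!
```

Only the indexed attribute carries imposed structure; every other attribute of the stored objects can stay as free-form as Ruby allows.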

Duane Johnson
(canadaduane)

···

On Oct 30, 2005, at 5:52 AM, Dave Burt wrote:

Basically the only thing missing from a YAML solution or something is
the ability to do sophisticated queries without storing all the objects
in memory at once.

Given that I may have 100,000 objects or so, I don't want to store them
all in a giant array, but I *do* want to be able to find them by the
values of their accessors.

So Marshalled objects plus indexes?

There is a package called DyBase out there. It's somewhat dated and unmaintained, but it
basically does marshalled objects with indexes, and it has an API for Ruby, PHP, Python
and some other language I've never heard of.

The only reason not to use it is that it stores data in a binary file format; it's
lightning fast for what it does.

-Jeff

···

On Thu, Nov 03, 2005 at 02:52:24AM +0900, Duane Johnson wrote:

On Oct 30, 2005, at 5:52 AM, Dave Burt wrote:
>>Basically the only thing missing from a YAML solution or something is
>>the ability to do sophisticated queries without storing all the
>>objects
>>in memory at once.
>>
>>Given that I may have 100,000 objects or so, I don't want to store
>>them
>>all in a giant array, but I *do* want to be able to find them by the
>>values of their accessors.
>
>So Marshalled objects plus indexes?
>

Nice summary, Dave. If this is indeed what Hal is talking about, it
seems like a very nice "fit".

From off the top of my head, an ideal data repository has the
following qualities:

1. Infinite storage capacity
2. Zero access time
3. Persistent / Failsafe

Current technology (i.e. a hard drive) marries persistence with
storage capacity and unfortunately increases access time. In-memory
data reverses the advantages and disadvantages--it decreases access
time, but it has a smaller storage capacity and it is no longer
persistent.

Indexing is a netherworld. By imposing some structure on the data
(e.g. "The 'id' attribute will always contain an integer") we can
store ordered information about an otherwise haphazard data web. The
ordering gives us the ability to predict where to look for
information (e.g. sort the 'id' attribute numerically). This is
important--we only need structure where we need to predict
something. If we don't need to predict where to find information
then an index is unnecessary. The "imposed structure" of database
tables goes away if we don't need a bird's eye view of the data.

It seems that "Marshalled objects + Indexes" gives us this happy
middle ground--most of the time we don't need to predict where to
find information (e.g. many array attributes) but in the cases where
we do, we could impose that "thread" of structure (aka an index) on a
YAML file.

Duane Johnson
(canadaduane)

Duane Johnson wrote:

Basically the only thing missing from a YAML solution or something is
the ability to do sophisticated queries without storing all the objects
in memory at once.

Given that I may have 100,000 objects or so, I don't want to store them
all in a giant array, but I *do* want to be able to find them by the
values of their accessors.

So Marshalled objects plus indexes?

Nice summary, Dave. If this is indeed what Hal is talking about, it seems like a very nice "fit".

I guess I never replied to this one. I'm not sure that this is the
way I would state it, but it's mostly correct.

After all, complex queries don't depend on indexes. Indexes just make
them faster.

From off the top of my head, an ideal data repository has the following qualities:

1. Infinite storage capacity
2. Zero access time
3. Persistent / Failsafe

I would add transparency with regard to objects. That is, I don't want
to assemble and disassemble my objects from records manually.

> It seems that "Marshalled objects + Indexes" gives us this happy middle ground--most of the time we don't need to predict where to find information (e.g. many array attributes) but in the cases where we do, we could impose that "thread" of structure (aka an index) on a YAML file.

The paradigm of "marshalling + indexes" is an interesting one indeed. But
when I think of queries, I think databases. That is how my interest in
KirbyBase arose.

So for now I will build some kind of solution on top of KB rather than
add my own indexing/querying scheme to YAML or something.

Hal

···

On Oct 30, 2005, at 5:52 AM, Dave Burt wrote: