[OT] RE: help -- persuade my boss to adopt ruby

Ok, I confess: I know nothing about data
modelling – and I assume you don’t just
mean ADT’s. (I don’t even know what you
mean by it, so it may be a terminology / language issue, but most likely I’m just
ignorant.)

It is a confusing term, so I’ll do my best
to explain it as I mean it. I may end up
writing an article about it and putting it
up, so don’t take this as definitive.

Data Modeling can mean one of several things, depending on the context. When I use it, I most often mean it in the sense of database design. (It’s something of a specialty of mine; one of several.) However, it can also mean determining how the business uses the data. Object oriented design is a special case of data modeling, but one where too many OO “designers” focus on the functionality rather than the data (at least IME).

Dave Ensor & Ian Stevenson wrote “Oracle Design” (O’Reilly, 1997); although the book is Oracle-specific, it’s still an excellent reference for real-world data modeling. Chapter 3 is all about data modeling in the large. They say:

"What is data modeling? It is simply a means of formally capturing the data that is of relevance to an organization in conducting its business. It is one of the fundamental analysis techniques in use today, the foundation on which we build relational databases."

I’m of the opinion that it’s foundational to everything – you have to understand the sort of data you’re dealing with so that you’re not handling it only in the context of a single program. Data modeling is often considered part of analysis, but I actually believe that it’s the role of the analyst and the designer together to work out the data model.

Data modeling fundamentally looks at the relationships behind the data – any given piece of data doesn’t stand in isolation.

One also normalizes one’s data model – at least to 3NF (third normal form). Codd defined normalization as the basis for removing unwanted “functional dependencies” from data entities, where an FD exists when the value of one attribute can be known by knowing the value of another attribute in the same entity (the name of a country implies its capital city). It’s also possible to have a “multivalued dependency”, where one attribute determines a set of values of another attribute (the name of a country leads to the names of all of its airports). Normalization lets you model the necessary data in two dimensions for relational databases without imposing too many conditions or compromising the data’s integrity; it also reduces redundancy, and the inconsistency that redundancy invites.

There are six normal forms: first (1NF), second (2NF), third (3NF), Boyce-Codd (BCNF), fourth (4NF), and fifth (5NF). The book that I’ve recommended covers these in detail, and there’s a small sketch after the list below. Normalization does proliferate entities, but it also makes each of those entities atomic – you can change one without improperly affecting unrelated data.

1NF: Only atomic attribute values are allowed. All repeating groups must be removed and placed in a new related entity.

2NF: 1NF + non-key attributes must be fully dependent upon the primary key of the entity.

3NF: 2NF + non-key attributes must depend ONLY on the primary key (they may not depend on other non-key attributes). 3NF is often summarized as “every attribute must depend on the key, the whole key, and nothing but the key (so help me Codd)”. Most folks won’t go beyond 3NF, but the higher forms are good to know.

BCNF: 3NF + every determinant must be a candidate key. (“Table R is in BCNF if, for every nontrivial FD X → A, X is a superkey.”) BCNF may require a bit of added redundancy. IMO, the example that Ensor and Stevenson use for BCNF could have been simplified by the use of a proxy key (even though some folks frown on that, I find it useful).

4NF: 3NF/BCNF + removal of multiple multivalued dependencies. Ensor & Stevenson give a good example.

5NF: 4NF + resolving three or more entities with many-to-many relationships to one another. This problem can show up with a data modeling tool that creates associative entities, resulting in a “join-projection anomaly”, so this form is sometimes called “join-projection normal form” (JPNF). Again, Ensor & Stevenson give a better example than I can.
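
Since this is ruby-talk, here’s a tiny Ruby sketch of what the decomposition buys you. It’s my own made-up example (countries, capitals, airports), not something from the book: the unnormalized shape repeats the capital on every airport row, while the normalized shape stores each fact exactly once and re-derives the wide view as a join.

    # Unnormalized: the capital is repeated on every airport row, so an
    # update or a typo can leave the same country with two capitals.
    unnormalized = [
      { :country => "France", :capital => "Paris", :airport => "CDG" },
      { :country => "France", :capital => "Paris", :airport => "ORY" },
      { :country => "Japan",  :capital => "Tokyo", :airport => "HND" },
    ]

    # A 3NF-style decomposition: every non-key attribute depends on the
    # key, the whole key, and nothing but the key.
    countries = {
      "France" => { :capital => "Paris" },
      "Japan"  => { :capital => "Tokyo" },
    }
    airports = [
      { :country => "France", :code => "CDG" },
      { :country => "France", :code => "ORY" },
      { :country => "Japan",  :code => "HND" },
    ]

    # The capital of a country now lives in exactly one place.
    puts countries["France"][:capital]   # => Paris

    # Re-creating the original "wide" view is a join, not stored redundancy.
    wide = airports.map do |a|
      a.merge(:capital => countries[a[:country]][:capital])
    end
    p wide.first   # => {:country=>"France", :code=>"CDG", :capital=>"Paris"}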

Beyond simple data modeling, most “business” rules can be expressed as data themselves. Ensor & Stevenson point out that “most applications are designed to be code driven”: the rules applied in a particular case are driven by choices within the code itself. I’ve switched largely to data-driven designs (see the state flow code in Bug Traction for an example; the flow is driven entirely by the data within the tables, and the code only supports that data – well, mostly – there’s a little code flow left, but that will disappear when I next work on Bug Traction, as the code flow is wrong).
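
The Bug Traction tables aren’t worth reproducing here, but a throwaway Ruby sketch of what I mean by “the flow is driven by the data” – the states and transitions below are invented, and in the real thing FLOW would be loaded from a table rather than written as a literal:

    # A state-flow table: the allowed transitions are data, the kind of
    # thing that would live in a database table rather than in code.
    FLOW = {
      "new"      => ["assigned", "rejected"],
      "assigned" => ["resolved", "new"],
      "resolved" => ["closed", "assigned"],
      "rejected" => ["closed"],
      "closed"   => [],
    }

    # The code only enforces what the data says; adding a state or a
    # transition means changing FLOW, not the program.
    def transition(current, target, flow = FLOW)
      unless flow.fetch(current, []).include?(target)
        raise ArgumentError, "illegal transition #{current} -> #{target}"
      end
      target
    end

    state = "new"
    state = transition(state, "assigned")
    state = transition(state, "resolved")
    puts state                    # => resolved
    # transition(state, "new")    # raises: illegal transition resolved -> new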

There are times when one should denormalize as well, but that’s as much an art as anything else.

Data modeling is, at least IMO, an under-utilized skill, and it’s not taught nearly enough in CS programs. I had to teach myself, with some good mentors in business – but Ensor & Stevenson helped a lot, too.

-a

···


austin ziegler
Sent from my Treo

It is a confusing term, so I’ll do my best
to explain it as I mean it. I may end up
writing an article about it and putting it
up, so don’t take this as definitive.

Yep, fair play Austin, your knowledge of data modelling is impressive. Thanks
for the info.

It’s funny because having worked with OO over the years, my perception is quite
different to yours!

Object modelling was always intended to be a superset of ER modelling, e.g.
finer grained multiplicity, aggregation, ternary associations, etc. It is natural
for me to choose a UML model over an ER diagram. I have an Oracle book that
provides good examples of how one converts a UML model into a normalised ER
model. (Although funnily enough I think many OO methodologies promote 3NF from
the outset.)

I think the perception that OO designers focus on functionality is probably
true - I do it myself! This is often called “Responsibility-driven design”
(check out Wirfs-Brock’s book). A common problem in OO (and I suspect evident in
ER communities too) is “analysis paralysis”, where a developer gets carried away
in trying to produce the finest model in existence. Martin Fowler pointed out
in his “Analysis Patterns” book that there are infinite models of any problem.
The only concrete driver behind any data requirement is what users actually want
to do with the application. Of course, as experienced designers, we look ahead
and anticipate future requirements and thus the models to support them. That is,
however, a gamble, and I think indicative of over-engineering.

I think the real travesty about OO is that of transparent object persistence, or
the lack of it, possibly from the onslaught of camp Oracle. At the end of the day,
I want to be able to program in a single language, be able to create complex
models and relationships, traverse them and change them (within the bounds of a
transaction), without having to worry about how that information is being
physically stored. Take PL/SQL (which I know you are highly skilled in - respect!).
If I want to implement a complex ring structure in OO, it is easy. To persist
this today in an RDB, I would have to resort to PL/SQL to manage the integrity of
the cyclic dependency. Why should I have to learn another language to do this?
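
To make that concrete, here is a quick Ruby sketch – my own made-up example, nothing to do with any real schema – of a ring structure. It is a few lines of plain OO, with nothing persistence-related in sight:

    # A minimal circular (ring) structure: each node points at the next,
    # and the last node points back to the first.
    class RingNode
      attr_accessor :value, :next_node

      def initialize(value)
        @value = value
        @next_node = self        # a one-node ring points to itself
      end

      # Insert a new node immediately after this one, keeping the ring closed.
      def insert_after(value)
        node = RingNode.new(value)
        node.next_node = @next_node
        @next_node = node
        node
      end

      # Walk the whole ring exactly once, starting from this node.
      def each
        node = self
        loop do
          yield node.value
          node = node.next_node
          break if node.equal?(self)
        end
      end
    end

    ring = RingNode.new("a")
    ring.insert_after("c")
    ring.insert_after("b")          # ring is now a -> b -> c -> a
    ring.each { |v| print v, " " }  # prints: a b c
    puts

Guaranteeing in the database that every node’s next pointer stays inside the ring is exactly the sort of thing I would currently end up writing PL/SQL for.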

I think RDBMS will always be a major part of large businesses because they are
such a simple and accessible tool. I hope however that there remains a niche
for applications that can benefit from the best in OO and transparent
persistence and I think ruby is absolutely ideal for this.

All the best
Russ Freeman

···


“Austin Ziegler” austin@halostatue.ca wrote in message

There are times when one should denormalize as well, but that’s as much an
art as anything else.

Now that you have mentioned this, I would urge you to read some very
enlightening articles on www.dbdebunk.com.

Search the word “denormalize” on the site’s search engine, and you will get
a wealth of material.

DBDebunk is a good thing. Denormalization is definitely an art –
and something that should be approached with much trepidation (if
not abject fear). I haven’t really found any times when it made
a lot of sense – but I have had to do it because of PHBs.

I like what Fabian and Date have to say on DBDebunk, but they are a
bit abrasive, which I find quite … unhelpful to their cause.

I’m fighting a bit of idiotic denormalization at this point
(unnecessary and premature aggregation of data, which is making
extension of the query to support two new fields for aggregation
nearly impossible). The simple rule is that one shouldn’t store
values calculated from other columns or tables.

There are, of course, times when storing of calculated values is
necessary – but they are specific and limited, and should be 100%
repeatable every time.

Specifically, I’m thinking of billing software. A good billing system
will collect the on-line data, perform billing in a separate database
and/or schema, and, after the data has been calculated, insert billing
records. Those records may be calculated from the original data, but
they have now acquired information that was not part of the original
data (e.g., the billing date and a few other items like that). In this
case, the cost of the duplicated data is a known cost – because the
value isn’t necessarily in the duplication, but in the data added.
(And even this can be normalized significantly.)
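
A quick Ruby sketch of the distinction I’m drawing – the order lines and the billing-run identifier are invented, and a real system would obviously do this in the database, not in application code:

    require "date"

    # Live order data: quantities and unit prices are the source of truth.
    order_lines = [
      { :sku => "A100", :qty => 2, :unit_price => 19.99 },
      { :sku => "B205", :qty => 1, :unit_price => 5.00  },
    ]

    # Don't store this: it can always be re-derived, and a stored copy
    # can silently drift out of sync with the lines it came from.
    def order_total(lines)
      lines.inject(0.0) { |sum, l| sum + l[:qty] * l[:unit_price] }
    end

    # A billing record is different: it adds information that did not
    # exist in the source data (the billing date, the billing run), and
    # the amount is a snapshot as of that run -- a known, accepted
    # duplication rather than an accidental one.
    def bill(lines, run_id)
      {
        :billing_run => run_id,
        :billed_on   => Date.today,
        :amount      => order_total(lines),
      }
    end

    p order_total(order_lines)        # => 44.98
    p bill(order_lines, "2002-12-A")  # "2002-12-A" is a made-up run id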

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.12.17 at 20.40.37

···

On Wed, 18 Dec 2002 10:28:25 +0900, Shashank Date wrote:

“Austin Ziegler” austin@halostatue.ca wrote in message

There are times when one should denormalize as well, but that’s
as much an art as anything else.
Now that you have mentioned this, I would urge you to read some
very enlightening articles on www.dbdebunk.com.

Search the word “denormalize” on the site’s search engine, and you
will get a wealth of material.

Yep, fair play Austin, your knowledge of data modelling is
impressive. Thanks for the info.

To be fair, I was also sitting with my reference handy so I could use
the specific terms that I don’t use on a day-to-day basis. I’m not a DM
theorist; I’m a practitioner. I don’t necessarily use the terms daily;
I just look for the results to match my experience. As I also noted, I
will often use proxy keys, which is frowned on by some practitioners.
(I use proxy keys most often when the other candidate keys are likely
to be volatile.)

It’s funny because having worked with OO over the years, my
perception is quite different to yours!

My perception is based on experience – specifically, seeing OO
enthusiasts with no experience in data modeling (specifically, the
capturing of data as related to the customer’s needs) attempting to
create an object model and then forcing the object model directly
onto the database. I’ll also admit that I can’t stand object
databases, far preferring object-relational and pure relational
databases. I find that object databases are suitable only for one
application. In most cases, this also means that there’s a single
access path to the data. This makes it difficult to find all
customers in a single postal code because most people will design a
customer object such that the customer HAS-AN address object.
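
A rough Ruby sketch of the access-path problem – the customers and addresses are made up, and the arrays stand in for an object store and a pair of tables respectively:

    # Object-database style: each customer HAS-AN embedded address, so
    # "everyone in postal code X" means walking every customer object.
    customers = [
      { :name => "Ann", :address => { :city => "Toronto", :postal_code => "M5V 1A1" } },
      { :name => "Bob", :address => { :city => "Ottawa",  :postal_code => "K1A 0A1" } },
    ]
    in_m5v = customers.select { |c| c[:address][:postal_code] == "M5V 1A1" }
    p in_m5v.map { |c| c[:name] }   # => ["Ann"]

    # Relational style: addresses are rows in their own right, keyed back
    # to the customer, so the postal code is directly selectable (and
    # indexable) without going "through" a customer object at all.
    addresses = [
      { :customer_id => 1, :city => "Toronto", :postal_code => "M5V 1A1" },
      { :customer_id => 2, :city => "Ottawa",  :postal_code => "K1A 0A1" },
    ]
    ids = addresses.select { |a| a[:postal_code] == "M5V 1A1" }
    p ids.map { |a| a[:customer_id] }   # => [1]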

Object modelling was always intended to be a superset of ER
modelling, e.g. finer grained multiplicity, aggregation, ternary
associations, etc.

I think it was intended as such, but I think that a lot of people
aren’t taught data modeling first and are dumped right into (bad)
object modeling.

It is natural for me to choose a UML model over an ER diagram. I
have an Oracle book that provides good examples of how one
converts a UML model into a normalised ER model. (Although funnily
enough I think many OO methodologies promote 3NF from the outset.)

Em, well, I find UML useless for database stuff. I will choose a UML
model for a program’s object model, but I will design the ER diagram
to be more flexible – based on the needs of the users, not just the
needs of the program.

I think the perception that OO designers focus on functionality is
probably true - I do it myself! This is often called
“Responsibility-driven design” (check out Wirfs-Brock’s book).

But this is, IMO, a mistake as a first focus. I’ve seen people
design a customer object by saying “a customer will sign up for
something,” and they focus on the customer object’s actions before
they consider what a customer object is. Both parts are necessary.
Absolutely, object designers should focus on functionality – but
only AFTER determining what the initial data model itself is. (The
data model will, of course, be enriched by an iterative design
process and interaction with the functionality.) My problem isn’t
that OO design looks at functionality; it’s that many OO developers
ignore the data model itself at the outset.

A common problem in OO (and I suspect evident in ER communities
too) is “analysis paralysis”, where a developer gets carried away
in trying to produce the finest model in existence. Martin Fowler
pointed out in his “Analysis Patterns” book that there are
infinite models of any problem.

It’s common, though I think that it’s more common in OO than in ER
because ER only represents the data, whereas OO models can vary
based on functionality as well as data.

The only concrete driver behind any data requirement is what users
actually want to do with the application. Of course, as experienced
designers, we look ahead and anticipate future requirements and
thus the models to support them. That is, however, a gamble, and I
think indicative of over-engineering.

You’re right – looking too far in the future is over-engineering.
What I tend to do is to look at what the client is already
collecting vs. what they need to collect for current and near future
projects. It’s a bit more important to make sure your primary
entities are well-defined in a database model than in an object
model, because it’s more expensive to extend a database than an
object…

I think the real travesty about OO is that of transparent object
persistence, or the lack of it, possibly from the onslaught of camp
Oracle.

That’s a tough situation. You’re right, it would be nice to have
such transparency, but as I’ve said before, this would be a problem
because the data model for a particular program may not match that
of another program – and it could be inefficient to force each
program to deal with the extraneous data from the first program.

At the end of the day, I want to be able to program in a single
language, be able to create complex models and relationships,
traverse them and change them (within the bounds of a transaction),
without having to worry about how that information is being
physically stored.

I agree … but still have the problem that I pointed out above.
Additionally, you have the problem of versioning – which isn’t as
big a problem in relational databases as it is in OO databases
(but it is often present nonetheless; I’ve done too many migration
documents to pretend otherwise). The problem of versioning is
significant – you need to upgrade the data “in place”. What you
need to do is read the object in from the OODB with the old
object definition and write it back with the new object definition –
how this is done might be transparent, but it still has to be
done.

Take PL/SQL (which I know you are highly skilled in - respect!). If I
want to implement a complex ring structure in OO, it is easy. To
persist this today in an RDB, I would have to resort to PL/SQL to
manage the integrity of the cyclic dependency. Why should I have
to learn another language to do this?

Actually, you may not need to do that with the latest versions (8i
and higher) of Oracle, as there is a way of implementing a “delayed”
foreign key – that is, a key which still enforces the constraint,
but doesn’t check it until the transaction is committed. The
canonical case for this sort of foreign key is a two-way mandatory
relationship, as in an order and an order line item. In versions
prior to 8i (maybe 8.0), one of the foreign keys would have to be
foregone or implemented with a post-insert trigger (PL/SQL). In
later versions, the delayed foreign key can be placed on one of the
two tables and you can then insert into both the order and order
line item tables without worrying about “which is first.”
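
Roughly, it looks like the sketch below (Oracle calls these deferrable constraints). I’ve wrapped it in Ruby to keep this on-topic: `conn` is only a stand-in for whatever Oracle driver you use, and the table, column, and constraint names are invented.

    # `conn` here is just a stub that prints the SQL it is given; swap in
    # a real connection object from whatever Oracle driver you're using.
    conn = Object.new
    def conn.execute(sql)
      puts sql
    end

    # The second foreign key of the two-way mandatory pair is declared
    # deferrable, so it is only checked when the transaction commits.
    conn.execute(<<-SQL)
      ALTER TABLE orders ADD CONSTRAINT order_first_line_fk
        FOREIGN KEY (first_line_id) REFERENCES order_lines (line_id)
        DEFERRABLE INITIALLY DEFERRED
    SQL

    # Within one transaction the inserts can now go in either order.
    conn.execute("INSERT INTO orders (order_id, first_line_id) VALUES (1, 10)")
    conn.execute("INSERT INTO order_lines (line_id, order_id) VALUES (10, 1)")
    conn.execute("COMMIT")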

I think RDBMS will always be a major part of large businesses
because they are such a simple and accessible tool. I hope however
that there remains a niche for applications that can benefit from
the best in OO and transparent persistence and I think ruby is
absolutely ideal for this.

I dislike the Caché advertisements, which suggest that an OO
database is superior to a relational database. There are (as I noted
above) significant disadvantages to OO databases that aren’t offset
by the advantage of the transparency they offer.

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.12.17 at 20.54.13

···

On Wed, 18 Dec 2002 09:43:51 +0900, Russ Freeman wrote:

Relevant to this thread, but not in reply to anything in particular:

Since relational DBs have advantages over OODBs in your opinion, Austin, what
kind of strategy do you have for accessing them from an OO application?

I ask because, although I also very much like relational DBs, there is
obviously a bridge to be crossed between the DB and the app. I’ve never seen
this problem solved once and for all. In my experience:

  • one company wrote a layer for internal use that abstracted queries
    and updates quite nicely - but I never hear of these used wisely
  • I dislike having SQL code or even relational logic sprayed throughout
    my code, yet writing a DB-access class for each data class is
    cumbersome, uninteresting, error-prone, um … ?

The DB stuff in my current project is very nicely abstracted because the
application affords it (i.e. it’s not mundane “update customer” kind of stuff,
but more abstract), but I don’t like beginning new projects and coming up with
DB access strategies all over again.

Any thoughts?

Gavin

My perception is based on experience – specifically, seeing OO
enthusiasts with no experience in data modeling (specifically, the
capturing of data as related to the customer’s needs) attempting to
create an object model and then forcing the object model directly
onto the database.

This makes it difficult to find all customers in a single postal
code because most people will design a customer object such that
the customer HAS-AN address object.

I think it was intended as such, but I think that a lot of people
aren’t taught data modeling first and are dumped right into (bad)
object modeling.

I find UML useless for database stuff. I will choose a UML model for
a program’s object model, but I will design the ER diagram to be more
flexible – based on the needs of the users, not just the needs of
the program.

I feel that much of your experience is of bad OO design rather than
of OO models as a flawed tool. Many of the situations you state are
about the kinds of models people are coming up with rather than
defects in the notation and technology of OO. Clearly an OO database
is going to make a poor enterprise model even worse.

When I design a system I look at it from both a functional and a data
perspective. It is the intersection of these two views that produces a
layered, flexible OO model, where the lower layers model common business
entities that are reusable across an enterprise, and the higher layers
are specific to applications. The lower layers are likely to correlate
with the models you refer to. However, for me, I would prefer to use a
single modelling notation and language to capture and build these
systems.

It’s common, though I think that it’s more common in OO than in ER
because ER only represents the data, whereas OO models can vary
based on functionality as well as data.

Yes, good point.

Actually, you may not need to do that with the latest versions (8i
and higher) of Oracle, as there is a way of implementing a “delayed”
foreign key.

Interesting, I’ll take a look at those. However, I’m not convinced it
makes me feel any more at one with my core language :)

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.12.17 at 20.54.13

Thanks for the comments
Russ

···


In my experience, Enterprise Objects Framework might not be the solution
that solves the problem once and for all, but it comes pretty close…
I was a WebObjects developer for 2 years (back when it was Objective-C
only) and we did some pretty serious stuff with it - even managed to crash
Oracle once or twice :)

···

On Wed, 18 Dec 2002 19:08:48 +0900, Gavin Sinclair wrote:

Relevant to this thread, but not in reply to anything in particular:

Since relational DBs have advantages over OODBs in your opinion, Austin, what
kind of strategy do you have for accessing them from an OO application?

I ask because, although I also very much like relational DBs, there is
obviously a bridge to be crossed between the DB and the app. I’ve never seen
this problem solved once and for all.


ste

Relevant to this thread, but not in reply to anything in
particular:

Since relational DBs have advantages over OODBs in your opinion,
Austin, what kind of strategy do you have for accessing them from
an OO application?

It depends on the application itself. I have written code which
marshals and unmarshals through an abstract layer; I have written
code that has specific “save” and “load” methods; and I have
written code that sprinkles SQL all through the program.

I ask because, although I also very much like relational DBs,
there is obviously a bridge to be crossed between the DB and the
app. I’ve never seen this problem solved once and for all. In my
experience:

I don’t think it will be solved once and for all. I admit that there
is a fundamental disconnect between object models and data models
precisely because an object model is modeling the objects for use in
a particular program. To use a person-address example, while the
organization may need to keep track of the current and the last
three previous addresses (for some reason), the program you’re
writing only cares about the current address. In this case, you may
wish to “embed” the address information in the person object – but
it won’t be stored that way. If, of course, you may need to update
the address, then you will want to keep it as a separate object
in any case.

  • one company wrote a layer for internal use that abstracted
    queries and updates quite nicely - but I never hear of these
    used wisely
  • I dislike having SQL code or even relational logic sprayed
    throughout my code, yet writing a DB-access class for each data
    class is cumbersome, uninteresting, error-prone, um … ?

The DB stuff in my current project is very nicely abstracted
because the application affords it (i.e. it’s not mundane “update
customer” kind of stuff, but more abstract), but I don’t like
beginning new projects and coming up with DB access strategies all
over again.

Any thoughts?

Unfortunately, I can’t offer any universal advice. In general, I try
to treat each table as a class and each row as an object of that
class for marshal/unmarshal purposes, but then I may have objects
that act as containers (HAS-A relationships) when I need to combine
this information in novel ways.
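
A bare-bones Ruby sketch of that habit – the Customer class and the row data are invented, and the array of hashes stands in for a result set that a real driver would hand you:

    # One class per table, one object per row.
    class Customer
      ATTRS = [:id, :name, :postal_code]
      attr_accessor(*ATTRS)

      # Unmarshal: build an object from a row (a Hash keyed by column).
      def self.from_row(row)
        obj = new
        ATTRS.each { |a| obj.send("#{a}=", row[a]) }
        obj
      end

      # Marshal: turn the object back into a row for an INSERT/UPDATE.
      def to_row
        ATTRS.inject({}) { |row, a| row[a] = send(a); row }
      end
    end

    rows = [
      { :id => 1, :name => "Ann", :postal_code => "M5V 1A1" },
      { :id => 2, :name => "Bob", :postal_code => "K1A 0A1" },
    ]

    customers = rows.map { |r| Customer.from_row(r) }
    puts customers.first.name   # => Ann
    p customers.first.to_row    # => {:id=>1, :name=>"Ann", :postal_code=>"M5V 1A1"}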

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.12.18 at 07.41.01

···

On Wed, 18 Dec 2002 19:08:48 +0900, Gavin Sinclair wrote:

I wrote:

My perception is based on experience […] seeing OO enthusiasts
with no experience in data modeling […] attempting to create an
object model and then forcing the object model directly onto the
database. […] most people will design a customer object such
that the customer HAS-AN address object. […] I think that a lot
of people aren’t taught data modeling first and are dumped
right into (bad) object modeling. […] I find UML useless for
database stuff. […]

Russ Freeman wrote:

I feel that much of your experience is of bad OO design rather
than of OO models as a flawed tool. Many of the situations you state
are about the kinds of models people are coming up with rather
than defects in the notation and technology of OO. Clearly an OO
database is going to make a poor enterprise model even worse.

You’re right – but as much as anything else, I’m also condemning
the state of training for people who try to do OO models. I think
that I’m a decent object modeler, but only because I’m a damn good
data modeler. (Humble? About that? No way.)

UML as it currently stands (version 1.x) is a very poor fit to
database modeling, as it provides little information that a good ER
model provides. Relationships in UML are much more about IS-A and
HAS-A inheritance and containment relationships, whereas foreign
keys in databases are much more than just HAS-A relationships. An FK
can represent a HAS-A relationship (an employee HAS-A manager [also
an employee], represented as a “pig’s ear” FK notation in ERDs).
Alternatively, it can represent an “is-constrained-by” relationship
(an email address stored in one table must map to a valid account).

I personally find the “crow’s feet” notation much easier than the
diamond notation used in UML (the crow’s feet notation can indicate
whether a relationship is mandatory or optional and “to one” or “to
many” – without using the 0..* and 1..* notations required by
UML).

Within UML, I will use a lot of the features available: use
cases[1], class models, state charts, message passing charts (both
forms), etc. It’s really quite useful for expressing how the system
will work, but I find that it’s easier to consider the data model
separate from the object model, although they are definitely
related and there will be a feedback loop between the models.

When I design a system I look at it from both a functional and a
data perspective. It is the intersection of these two views that
produces a layered, flexible OO model, where the lower layers
model common business entities that are reusable across an
enterprise, and the higher layers are specific to applications.
The lower layers are likely to correlate with the models you refer
to. However, for me, I would prefer to use a single modelling
notation and language to capture and build these systems.

If I felt that UML could actually capture the information I need
from a database model, I would use it. So far, it has proven to be
incapable of doing so.

It’s common, though I think that it’s more common in OO than in
ER because ER only represents the data, whereas OO models can
vary based on functionality as well as data.
Yes, good point.

I do find that having the data model and object model separate helps
here – just a little. The data model gives the object modelers a
starting point for their object model and then they can modify their
object model’s data as necessary to support the functionality
required. If they need more data than the data model provides, then
it’s a matter of enhancing the data model – but in a dynamic
system, that’s healthy.

Actually, you may not need to do that with the latest versions (8i
and higher) of Oracle, as there is a way of implementing a
“delayed” foreign key.
Interesting, I’ll take a look at those. However, I’m not convinced
it makes me feel any more at one with my core language :)

It’s a nice feature – but one I haven’t yet used. :) I think it
will solve the problem you mentioned, in that you can then have the
circular dependencies you want without violating transaction scope.

-a
[1] The creation and use of use cases is very poorly explained in
every text and class I’ve had on UML, including a couple of
“Rational University” courses a former employer paid for.
– Austin Ziegler, austin@halostatue.ca on 2002.12.18 at 23.34.35