> Ok, I confess: I know nothing about data modelling – and I assume you
> don’t just mean ADTs. (I don’t even know what you mean by it, so it
> may be a terminology / language issue, but most likely I’m just
> ignorant.)
It is a confusing term, so I’ll do my best to explain it as I mean it. I may end up writing an article about it and putting it up, so don’t take this as definitive.
Data modeling can mean one of several things, depending on the context. When I use it, I most often mean it in the sense of database design. (It’s something of a specialty of mine; one of several.) However, it can also mean determining how the business uses its data. Object-oriented design is a special case of data modeling, but one where too many OO “designers” focus on the functionality rather than the data (at least IME).
Dave Ensor & Ian Stevenson wrote “Oracle Design” (O’Reilly, 1997); although it’s Oracle-specific, it’s still an excellent reference for real-world data modeling. Chapter 3 is all about data modeling in the large. They say:
"What is data modeling? It is simply a means of formally capturing the data that is of relevance to an organization in conducting its business. It is one of the fundamental analysis techniques in use today, the foundation on which we build relational databases."
I’m of the opinion that it’s foundational to everything – you have to understand the sort of data you’re dealing with so that you’re not handling it only in the context of a single program. Data modeling is often considered part of analysis, but I believe it’s the role of the analyst and the designer together to work out the data model.
Data modeling fundamentally looks at the relationships behind the data – any given piece of data doesn’t stand in isolation.
One also normalizes one’s data model – to at least 3NF (third normal form). Codd defined normalization as the basis for removing unwanted “functional dependencies” (FDs) from data entities; an FD exists when the value of one attribute can be known by knowing the value of another attribute in the same entity (the name of a country implies its capital city). It’s also possible to have a “multivalued dependency” (MVD), where one attribute determines a set of values of another attribute (the name of a country leads to the names of all of its airports). Normalization lets you model the necessary data in the two dimensions of relational tables without imposing too many conditions or compromising the data’s integrity; it also reduces redundancy, and the inconsistency that redundancy invites.
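To make the country example concrete, here’s a tiny Python sketch – the rows are invented – showing both kinds of dependency hiding in one flat table:

    # A flat (unnormalized) table of rows: (country, capital, airport).
    # The data is made up, just to show the two kinds of dependency.
    rows = [
        ("France", "Paris", "CDG"),
        ("France", "Paris", "ORY"),
        ("Japan",  "Tokyo", "HND"),
        ("Japan",  "Tokyo", "NRT"),
    ]

    # Functional dependency: country -> capital. Each country has exactly
    # one capital, so repeating it on every row is redundant, and a source
    # of inconsistency if one copy is updated and another isn't.
    capitals = {country: capital for country, capital, _ in rows}
    assert capitals["France"] == "Paris"

    # Multivalued dependency: country ->> airport. A country determines a
    # *set* of airports, independently of its capital.
    airports = {}
    for country, _, airport in rows:
        airports.setdefault(country, set()).add(airport)
    assert airports["Japan"] == {"HND", "NRT"}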
There are six normal forms commonly discussed: first (1NF), second (2NF), third (3NF), Boyce-Codd (BCNF), fourth (4NF), and fifth (5NF). The book that I’ve recommended covers these in detail, and there’s a small worked example after the list below. Normalization does proliferate entities, but it also makes each of those entities atomic – you can change one without improperly affecting unrelated data.
1NF: Only atomic attribute values are allowed. All repeating groups must be removed and placed in a new related entity.
2NF: 1NF + non-key attributes must be fully dependent upon the primary key of the entity. (This bites when the primary key is composite: no attribute may depend on only part of the key.)
3NF: 2NF + non-key attributes must depend ONLY on the primary key (they may not depend on other non-key attributes). 3NF is often summarized as “all attributes of an entity must depend on the key, the whole key, and nothing but the key (so help me Codd)”. Most folks won’t go beyond 3NF, but the higher forms are good to know.
BCNF: 3NF + every determinant must be a candidate key (“Table R is in BCNF if, for every nontrivial FD X → A, X is a superkey.”); it closes a gap 3NF leaves open when there are overlapping composite candidate keys. BCNF may require a bit of added redundancy. IMO, the example that Ensor and Stevenson use for BCNF could have been simplified by the use of a proxy (surrogate) key – even though some folks frown on that, I find it useful.
4NF: BCNF + independent multivalued dependencies separated into their own entities (trouble shows up when one entity carries two or more unrelated MVDs). Ensor & Stevenson give a good example.
5NF: 4NF + resolving three or more entities with many-to-many relationships to one another. The problem can show up with a data modeling tool that creates associative entities, resulting in a “join-projection anomaly”, so this form is sometimes called “join-projection normal form” (JPNF). Again, Ensor & Stevenson give a better example than I can.
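To make the first three forms concrete, here’s the small worked example I promised – a Python/sqlite3 sketch of an invented order-entry schema (the tables and columns are mine, not Ensor & Stevenson’s), with comments showing where each form forced a split:

    import sqlite3

    con = sqlite3.connect(":memory:")

    con.executescript("""
    -- Pre-1NF: a repeating group of item columns, e.g.
    --   orders_flat(order_id, customer, city, item1, item2, item3)
    -- 1NF moves the repeating group into order_lines, one row per item.
    -- 2NF: product_name depended on product_id alone (only part of the
    --   composite key of order_lines), so products becomes its own entity.
    -- 3NF: city depended on customer_id, a non-key attribute of orders,
    --   so customers becomes its own entity.

    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers
    );
    CREATE TABLE order_lines (
        order_id   INTEGER NOT NULL REFERENCES orders,
        product_id INTEGER NOT NULL REFERENCES products,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    INSERT INTO customers VALUES (1, 'Acme', 'Toronto');
    INSERT INTO products  VALUES (10, 'Widget');
    INSERT INTO orders    VALUES (100, 1);
    INSERT INTO order_lines VALUES (100, 10, 3);
    """)

    # Each fact now lives in exactly one place; queries reassemble them.
    print(con.execute("""
        SELECT c.city, p.product_name, l.quantity
        FROM order_lines AS l
        JOIN orders    AS o ON o.order_id    = l.order_id
        JOIN customers AS c ON c.customer_id = o.customer_id
        JOIN products  AS p ON p.product_id  = l.product_id
    """).fetchall())   # [('Toronto', 'Widget', 3)]

The payoff is the atomicity I mentioned: renaming a product or moving a customer is a one-row change, no matter how many orders reference them.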
Beyond simple data modeling, most “business” rules can be expressed as data themselves. Ensor & Stevenson point out that “most applications are designed to be code driven”: the rules applied in a particular case are driven by choices within the code itself. I’ve switched largely to data-driven designs; see the state flow code in Bug Traction for an example. The flow is driven completely by the data within the tables, and the code only supports that data – well, mostly. There’s a little code flow left, but that will disappear when I decide to work on Bug Traction again, as the code flow is wrong.
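I don’t have the Bug Traction code in front of me, so the Python sketch below is just the shape of the idea – the states and actions are invented – but it shows what “the flow lives in the data” means:

    # A minimal table-driven state flow, in the spirit described above.
    # The states and actions are made up; in a real system the
    # transitions would live in database tables, not a literal.
    TRANSITIONS = {
        ("new",      "assign"):  "assigned",
        ("assigned", "resolve"): "resolved",
        ("resolved", "verify"):  "closed",
        ("resolved", "reopen"):  "assigned",
    }

    def advance(state, action):
        """Look the transition up in data; the code adds no flow of its own."""
        try:
            return TRANSITIONS[(state, action)]
        except KeyError:
            raise ValueError("%r is not valid from state %r" % (action, state))

    # Changing the workflow means changing the data, not the code:
    assert advance("new", "assign") == "assigned"

The nice property is that adding a state or a transition is an INSERT, not a new release of the code.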
There are times when one should denormalize as well, but that’s as much an art as anything else.
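For what it’s worth, the usual shape of the trade looks something like this invented sqlite example: duplicate a value to skip a join on a hot read path, and accept that the application now owns keeping the copies consistent:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER REFERENCES customers,
        customer_city TEXT    -- denormalized copy; reads skip a join
    );
    INSERT INTO customers VALUES (1, 'Toronto');
    INSERT INTO orders VALUES (100, 1, 'Toronto');

    -- The price: if the customer moves, *we* must update both tables,
    -- which is exactly the inconsistency normalization protects against.
    UPDATE customers SET city = 'Ottawa' WHERE customer_id = 1;
    UPDATE orders    SET customer_city = 'Ottawa' WHERE customer_id = 1;
    """)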
Data modeling is, IMO at least, an under-utilized skill, and it’s not taught nearly enough in CS programs. I had to teach myself, with some good mentors in business – but Ensor & Stevenson helped a lot, too.
-a
--
austin ziegler
Sent from my Treo