Friday, September 19, 2008

A flexibility contest

The introduction in the Hibernate documentation reports that

A relational database is more flexible than a network programming language, so it doesn't need anything like a navigation direction - data can be viewed and retrieved in any possibile way.

As this is quite a statement, I stopped to consider it a while.

In a programming language you have to consider navigability, so if you have - as in the example - a Person class and an Event class and you want to get all events associated to a person you'll have to ask the person with something like person.getEvents(), but if you also want to get all persons associated to an event you'll have to ask the event with event.getPersons(), i.e. you have to provide accessors to both classes. And both classes hold a reference to a collection of their companion class. Moreover, when you add a person to an event you have to make sure that also the event gets modified, and vice versa.

In an RDBMS you can compose all the queries you like, starting from persons or events; the only thing you'll need is a well defined relation between the tables, which tipically translates into a bridge table. You don't even need to enforce referential integrity, though this is the path to the dark side (or is it?), as long as everybody knows how the tables should be connected. So when you add a person to an event you only have to insert a row in the bridge table.

In this sense databases are more flexible than programming languages. You can write
select *
from person p
left outer join person_event pe
on p.id = pe.person_id
left outer join events e
on pe.event_id = e.id
as well as
select *
from event e
left outer join person_event pe
on e.id = pe.event_id
left outer join person p
on pe.person_id = p.id
but this comes at a cost. You need keys. And you need indexes to manage them (even if this relates to the phisical aspect rather than to the logical one). And a bridge table to manage the many-to-many relationship, which doesn't belong in the domain at all but is a pure fabrication used to represent the model. This clutters things a bit, even if we're so used to it that we don't even notice.

The Java language gives a much clearer representation of the domain (at least, the signatures of the method do):
public class Person {
// stuff
public List getEvents() {...}
public void addEvent(Event event) {...}
}

public class Event {
//stuff
public List getPersons() {...}
public void addPerson(Person person) {...}
}
The language shields you from how the model is represented (that's where the rigidity is hidden), and it is more natural to use.

Databases can use views to simplify data retrieval, but they don't work for inserts and updates. That's where stored procedures come in handy, but... aren't they written in a programming language? Should the latter be considered part if the database or as an external means to get a higher abstraction over SQL (which in turn is a language of its own)? How far should we push this?

My conclusion? Everyone should get their act together to make sure that the quality of what they build is the highest that they can get, either when designing a good database schema or when implementing a bidirectional association. The problem is that "the highest they can get" can sometimes be a low standard, so everyone should always try to improve, no matter how high the standard. I think this rule applies not only to programmers in their own field of competence, but to men in the broader sense.