Saturday, August 30, 2008

Why EJB, Rails, and other clever object-relational frameworks are harmful

Software developers have spent a great deal of time and effort developing new and better ways of dealing with data persistence. Unfortunately, the more clever the framework, the less useful it is when faced with any real-world data problems, and I'll tell you why. (This will be pretty java-centric, since it's what I know, and it's where I perceive the problem as originating, anyway. If you like .NET, I'll bet $1 that what I say below still applies.)

Before Java came along, I don't recall there being a big hoohah about elaborate persistence frameworks. You had your relational database, you had C, you had a library that you used to get your data in and out, and maybe you had some kind of pre-compiler to ease the burden of integrating the library with your code. Then C++ became interesting, and we all struggled to integrate the notion of relational data with the notion of object orientation. At that point, we were off to the races.

When Java came along, JDBC came followed shortly after, and JDBC is a decent general method of getting at your data in a database-independent (we'll come back to this notion later, by the way) manner. The reason it's decent is because it doesn't try to do too much; it lets you execute SQL on a server, and it lets you get at the results of that SQL. What more could you ask?

Well, to developers steeped in object-oriented methods, JDBC does not do enough because it doesn't turn SQL results sets into collections of objects. Further, to lazy developers, such as myself, it requires an awful lot of code just to execute simple queries. What you really want is an easy way of getting your data out of the database (which is really good at storing and querying the data) and into objects, which are easier for pretty much all of your other components to deal with.

So far, so good, mostly. Except, at this point, already the shift of the database to second-class status had begun. The relational database had become more of a necessary evil in the lack of any decent alternatives. With the advent of Enterprise JavaBeans, this became particularly pronounced, as it became fashionable to write as little SQL as possible and "let the framework do the work".

At the time, I even bought into this point of view. When EJB was still in its 1.0 version, at which time it was a neat idea, but not really usable (kind of like Jini and JavaSpaces; remember those?), I wrote a gigantic persistence framework that did all the things I wanted it to: I didn't have to write any SQL, I got to deal with object instances that were all objected-oriented and cool, and I didn't have to think about the database. Well, other than when I had to think about the database; my general-purpose way of modeling the data so it would be easy to map to objects only worked with CRUD operations on single instances. I eventually put in a lot of work to solve most of those problems and had something that worked pretty well, but only until I had to make any changes to way the data was structured to fix design problems.

At a later time, when EJB was in its 2.0 version, I had occasion to use it pretty extensively and, surprisingly at the time, it had many of the same problems I had had to solve in the framework I built. More recently, we have Hibernate and Rails which are essentially similar in nature, and they exhibit the same problems as EJB and everything else.

The overarching problem here is that they all try to abstract the database away. I've heard several arguments for doing so, but the one that crops up the most often is that you want to have database portability. There is this great desire to be able to switch database server technology without it being a Big Deal. Thus, if you abstract the database away, and make your framework stick to "standard" SQL, you should be able to just pick up your application and drop it on a new server and have it Just Work.

This is fantasy. For trivial applications, such as, say, the sample and tutorial apps that all of the frameworks ship with, sure, this can work. For real-world applications that deal in volumes of real data, though? Pure fantasy.

Now, a lot of the fault in this lies at the feet of the "standard" SQL people, and with the people who thought up SQL to begin with. "What, not how" works very well when talking about relational calculus; unfortunately, in the real world, we have to worry about "how" as well as "what". And until optimizers employed by database servers are perfect, we will have to continue to worry about "how". But, because of the promise of SQL to allow us to think about "what" without thinking about "how", other enterprising individuals have decided it's a good thing to abstract the database away.

Every database server's optimizer works differently. Queries that are exactly the same in relational calculus can give wildly different execution plans in practice. It takes an expert in the particular database technology to figure out the right way to write the queries to get the best performance. It takes an expert in the particular database technology to figure out the right way to structure the tables, indexes and constraints to get the best performance.

This brings me to my first rule of enterprise software: The database needs an expert.. Database design and database development is a first-class discipline. No matter how cool the rest of your application is, if it needs a relational database, you need an expert if you want your application to be at all scalable. That expert should be designing your databases, writing your queries, and working closely with your IT/operations people to ensure that the deployment is scalable and reliable. The corollary to this rule is Don't let Java programmers write SQL (unless the are An Expert). The skills required to write good object-oriented code do not automatically transfer to database development, and I've seen more needless bugs in applications that stemmed from Java programmers writing SQL when they don't understand how database servers work. Another corollary is: Don't let Java code write SQL. Java programmers write Java code. Framework writers don't know anything about your application. What makes you think that framework writers know how to write your SQL?

Now, my second rule of enterprise software is: Changing database servers is a Big Deal. Database portability is a myth, for reasons aforementioned, and further, switching database servers almost never happens. Abstracting the database away in the name of portability is useless, because switching database servers just isn't done, and when it is it's a Big Deal anyway.

Now, I'm not opposed to frameworks; anything that makes my job actually easier is a good thing. But, a framework should not get in my way, or force me to do things in a certain way. The framework writers don't know anything about my application, so why would they know how I should write it? So, any framework used for data access should stay out of the way as much as possible. Remember, what I want is a way of getting my data out of the database and into objects that I can use. And, I want Java programmers writing Java and database experts writing SQL, the framework should facilitate that.

And now, for my third rule of enterprise software: Access to the data should be through stored procedures. I want SQL experts writing the SQL. I also want them to have the ability to tune the SQL without having to write Java code. Finally, I don't want them to have to go wading through Java code (or XML or properties files, etc...) to find the SQL. The natural way for a database expert to write database code is via stored procedures. Thus, any framework I use will be capable of calling stored procedures. If your database server doesn't have stored procedures, well, you need to get a new database server because yours is a toy.

This is the point where I put in a plug for the framework that I like: iBatis SQLMaps. It does everything I want and mostly stays the hell out of the way.

In conclusion, let Java programmers write Java code and let database programmers write database code. You will get the highest quality and the most scalability that way, and contrary to popular belief, your applications will be maintainable and understandable, too. Don't worry about database portability.

That there is The Right Way.

3 Comments:

At November 25, 2008 at 1:22 PM, Blogger Kyle said...

Agree on a couple of points -
1) databasing is a first-class problem space
2) database portability is a myth

You also say that databases require database experts. Yes.

The problem with these ORM frameworks is one of expectations managements - they are billed as being the One Solution To Everything. That's simply not what they are for - as you spell out pretty well - data access is a hard problem, and not easily automatable. So, I would re-state that point as databases require thinking human beings, not automatic tools.

Back to expectations management - I do not necessarily agree that ORM frameworks are necessarily bad - just that the problem domain that they exist to satisfy is considerably smaller than what their respective marketing departments push.

If you set your own expectations (and those of your colleagues) a little lower: namely, that ORM packages are designed to model data as objects (but not necessarily to optimize retrieval and modification)....*I think* you will find yourself in a much happier place....

Of course - you might ask yourself - is there any value to ORM then? I think so - i like to think of things in terms of objects - it's how my brain flows - makes maintenance easier *for me*... I understand when people disagree with that, as it's somewhat subjective.

 
At November 26, 2008 at 10:45 AM, Blogger Unknown said...

I should, perhaps, have placed the word "clever" in my title in quotes. You'll notice that I don't actually completely eschew ORMs, I actually use one. But, it doesn't try to be too "clever", and doesn't get in my way, and most importantly, it doesn't have a lot of extra features bloating its code base and diverting developer time. It's as if it was built with people like me in mind.

 
At November 26, 2008 at 3:25 PM, Blogger Kyle said...

Agreed

 

Post a Comment

<< Home