Saturday, August 30, 2008

Why EJB, Rails, and other clever object-relational frameworks are harmful

Software developers have spent a great deal of time and effort developing new and better ways of dealing with data persistence. Unfortunately, the more clever the framework, the less useful it is when faced with any real-world data problems, and I'll tell you why. (This will be pretty java-centric, since it's what I know, and it's where I perceive the problem as originating, anyway. If you like .NET, I'll bet $1 that what I say below still applies.)

Before Java came along, I don't recall there being a big hoohah about elaborate persistence frameworks. You had your relational database, you had C, you had a library that you used to get your data in and out, and maybe you had some kind of pre-compiler to ease the burden of integrating the library with your code. Then C++ became interesting, and we all struggled to integrate the notion of relational data with the notion of object orientation. At that point, we were off to the races.

When Java came along, JDBC came followed shortly after, and JDBC is a decent general method of getting at your data in a database-independent (we'll come back to this notion later, by the way) manner. The reason it's decent is because it doesn't try to do too much; it lets you execute SQL on a server, and it lets you get at the results of that SQL. What more could you ask?

Well, to developers steeped in object-oriented methods, JDBC does not do enough because it doesn't turn SQL results sets into collections of objects. Further, to lazy developers, such as myself, it requires an awful lot of code just to execute simple queries. What you really want is an easy way of getting your data out of the database (which is really good at storing and querying the data) and into objects, which are easier for pretty much all of your other components to deal with.

So far, so good, mostly. Except, at this point, already the shift of the database to second-class status had begun. The relational database had become more of a necessary evil in the lack of any decent alternatives. With the advent of Enterprise JavaBeans, this became particularly pronounced, as it became fashionable to write as little SQL as possible and "let the framework do the work".

At the time, I even bought into this point of view. When EJB was still in its 1.0 version, at which time it was a neat idea, but not really usable (kind of like Jini and JavaSpaces; remember those?), I wrote a gigantic persistence framework that did all the things I wanted it to: I didn't have to write any SQL, I got to deal with object instances that were all objected-oriented and cool, and I didn't have to think about the database. Well, other than when I had to think about the database; my general-purpose way of modeling the data so it would be easy to map to objects only worked with CRUD operations on single instances. I eventually put in a lot of work to solve most of those problems and had something that worked pretty well, but only until I had to make any changes to way the data was structured to fix design problems.

At a later time, when EJB was in its 2.0 version, I had occasion to use it pretty extensively and, surprisingly at the time, it had many of the same problems I had had to solve in the framework I built. More recently, we have Hibernate and Rails which are essentially similar in nature, and they exhibit the same problems as EJB and everything else.

The overarching problem here is that they all try to abstract the database away. I've heard several arguments for doing so, but the one that crops up the most often is that you want to have database portability. There is this great desire to be able to switch database server technology without it being a Big Deal. Thus, if you abstract the database away, and make your framework stick to "standard" SQL, you should be able to just pick up your application and drop it on a new server and have it Just Work.

This is fantasy. For trivial applications, such as, say, the sample and tutorial apps that all of the frameworks ship with, sure, this can work. For real-world applications that deal in volumes of real data, though? Pure fantasy.

Now, a lot of the fault in this lies at the feet of the "standard" SQL people, and with the people who thought up SQL to begin with. "What, not how" works very well when talking about relational calculus; unfortunately, in the real world, we have to worry about "how" as well as "what". And until optimizers employed by database servers are perfect, we will have to continue to worry about "how". But, because of the promise of SQL to allow us to think about "what" without thinking about "how", other enterprising individuals have decided it's a good thing to abstract the database away.

Every database server's optimizer works differently. Queries that are exactly the same in relational calculus can give wildly different execution plans in practice. It takes an expert in the particular database technology to figure out the right way to write the queries to get the best performance. It takes an expert in the particular database technology to figure out the right way to structure the tables, indexes and constraints to get the best performance.

This brings me to my first rule of enterprise software: The database needs an expert.. Database design and database development is a first-class discipline. No matter how cool the rest of your application is, if it needs a relational database, you need an expert if you want your application to be at all scalable. That expert should be designing your databases, writing your queries, and working closely with your IT/operations people to ensure that the deployment is scalable and reliable. The corollary to this rule is Don't let Java programmers write SQL (unless the are An Expert). The skills required to write good object-oriented code do not automatically transfer to database development, and I've seen more needless bugs in applications that stemmed from Java programmers writing SQL when they don't understand how database servers work. Another corollary is: Don't let Java code write SQL. Java programmers write Java code. Framework writers don't know anything about your application. What makes you think that framework writers know how to write your SQL?

Now, my second rule of enterprise software is: Changing database servers is a Big Deal. Database portability is a myth, for reasons aforementioned, and further, switching database servers almost never happens. Abstracting the database away in the name of portability is useless, because switching database servers just isn't done, and when it is it's a Big Deal anyway.

Now, I'm not opposed to frameworks; anything that makes my job actually easier is a good thing. But, a framework should not get in my way, or force me to do things in a certain way. The framework writers don't know anything about my application, so why would they know how I should write it? So, any framework used for data access should stay out of the way as much as possible. Remember, what I want is a way of getting my data out of the database and into objects that I can use. And, I want Java programmers writing Java and database experts writing SQL, the framework should facilitate that.

And now, for my third rule of enterprise software: Access to the data should be through stored procedures. I want SQL experts writing the SQL. I also want them to have the ability to tune the SQL without having to write Java code. Finally, I don't want them to have to go wading through Java code (or XML or properties files, etc...) to find the SQL. The natural way for a database expert to write database code is via stored procedures. Thus, any framework I use will be capable of calling stored procedures. If your database server doesn't have stored procedures, well, you need to get a new database server because yours is a toy.

This is the point where I put in a plug for the framework that I like: iBatis SQLMaps. It does everything I want and mostly stays the hell out of the way.

In conclusion, let Java programmers write Java code and let database programmers write database code. You will get the highest quality and the most scalability that way, and contrary to popular belief, your applications will be maintainable and understandable, too. Don't worry about database portability.

That there is The Right Way.