Agile code

Tony Bowden: Understanding Nothing: Writing Agile Code

The issue of ‘agile code’ has been occupying a lot of my spare cycles recently. I think it’s far more important to an agile process than is generally made out. Sure, the XP books talk about OAOO and refactoring, but somehow the sheer importance of quality code seems to get forgotten on a lot of agile projects. Agile processes target difficult projects and trust in the team’s ability to respond to changes and adapt to fluid situations. If your code isn’t agile, it won’t matter a damn how flexible your process is. Ask yourself in how many places your codebase would have to change if you swapped out your persistence layer. I’m not just talking about changing DB vendor: how about switching to flat files, or XML, or Prevayler? How many packages, classes and methods would be affected? How many tacit assumptions are in the code about how other bits of it work? Having agile code should mean that huge chunks could be replaced without having to touch the rest.

XP talks about discipline. An agile codebase must be far more disciplined than a non-agile one. It has to be able to change in unforeseen directions as the project demands. This requires a level of quality that is frequently underestimated.

Inheritance Sucks!

Not really, of course; it’s just very easy to misuse, and used far too often. I’m talking about object inheritance (‘extends‘) here. Object inheritance should only be used to override behaviour, not state. If the only reason a class extends another is to get all its fields, that’s the wrong thing to do; that’s what object composition is for. If you want a one-to-one mapping to another class, include it as a field and add delegators for its public methods to your interface. In IntelliJ there’s even a menu option for it.

What about polymorphism, you cry? That’s what interface inheritance is for. If you need to pass your composed object polymorphically to methods that expect your contained object, implement a common interface and make your methods accept that instead. Your code will be cleaner, more modular and (most importantly) less coupled. The title of this post should probably have been ‘Coupling Sucks’. Extending a class binds you to it far more tightly than wrapping it as a field. And working through interfaces makes testing a snap: just plug your test case in, or a full-fledged mock object, although I rarely need a full mock when I can just write MyTestCase extends TestCase implements Foo, Bar, Baz {...} and make assertions directly based on which methods were called and in what sequence.
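That test-case-as-mock trick needs no framework at all. Here is a minimal sketch (the Notifier and OrderService names are invented for illustration, and JUnit’s TestCase is left out so the snippet stands alone): the test class itself implements the collaborator’s interface and records the calls it receives, so the assertions can be made directly on what was called and in what order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator interface the code under test talks to.
interface Notifier {
    void notifyUser(String user);
}

// Code under test, written against the interface only.
class OrderService {
    private final Notifier notifier;
    OrderService(Notifier notifier) { this.notifier = notifier; }
    void placeOrder(String user) { notifier.notifyUser(user); }
}

// The test class itself implements the interface and records calls,
// instead of pulling in a full mock-object framework.
public class RecordingTest implements Notifier {
    final List<String> calls = new ArrayList<String>();

    public void notifyUser(String user) { calls.add("notifyUser:" + user); }

    public static void main(String[] args) {
        RecordingTest test = new RecordingTest();
        new OrderService(test).placeOrder("alice");
        System.out.println(test.calls); // prints [notifyUser:alice]
    }
}
```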

Most code uses inheritance more often than object composition, and much of the time composition would have been the better option. I suspect the reason is as prosaic as the fact that object composition involves more typing. This is no longer the case with IDEs like IDEA.
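A small sketch of the composition-plus-interface approach (the Account, BasicAccount and AuditedAccount names are invented for illustration): the wrapper holds the other object as a field, delegates to it, and shares an interface with it, so it can still be passed polymorphically to anything that accepts the interface.

```java
// Common interface: this is what gives us polymorphism, not 'extends'.
interface Account {
    void deposit(int amount);
    int balance();
}

class BasicAccount implements Account {
    private int balance;
    public void deposit(int amount) { balance += amount; }
    public int balance() { return balance; }
}

// Composition: holds a BasicAccount as a field and delegates to it,
// rather than extending it just to inherit its state.
class AuditedAccount implements Account {
    private final Account inner = new BasicAccount();
    private int deposits; // extra behaviour, no inherited fields

    public void deposit(int amount) { deposits++; inner.deposit(amount); }
    public int balance() { return inner.balance(); }
    public int depositCount() { return deposits; }
}

public class CompositionDemo {
    // Accepts the interface, so either implementation can be passed in.
    static int total(Account a) { return a.balance(); }

    public static void main(String[] args) {
        AuditedAccount acct = new AuditedAccount();
        acct.deposit(10);
        acct.deposit(5);
        System.out.println(total(acct));         // prints 15
        System.out.println(acct.depositCount()); // prints 2
    }
}
```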

Prevayler Perturbations

Ara said this:

Prevayler

…Prevayler is better and faster than classic OODBMSs because it’s simpler and aggressively caches everything: there’s no transaction API, no weird querying API, everything is just written to a log file, and so on. Basically the three main optimization techniques are: 1. lazy evaluation, 2. eager evaluation, and 3. caching. Prevayler uses 1 and 3 and wins, but you can apply the same techniques to RDBMSs too. So let’s say we come up with the same in-memory cached system, the same command-logging system like Prevayler, but with RDBMSs for the long term storage instead of the serialization approach of Prevayler. I think that’ll be more attractive. And it’s possible right now: use Hibernate for mapping objects to relational tables, use JCS (or a distributed cache or soon the JCache spec) for caching. The only missing part is the log-now/persist-over-night feature. That shouldn’t be hard to do. So let’s say you have a HibernatePersistenceManager class, you can decorate it with a LogNowPersistOverNightPersistenceManager which just writes a log file like Prevayler and persists them over the night (possibly triggered via a JMS queue or simply via the Quartz job scheduler). The system should be lightning fast like Prevayler and yet RDBMS-based…
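Ara’s LogNowPersistOverNight idea is essentially the decorator pattern. Here is a toy sketch, with invented interface and class names standing in for the real Hibernate-backed classes: the decorator journals each command immediately (the fast, Prevayler-style path) and only replays the journal against the real store when the nightly job calls flush().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; the real thing would wrap Hibernate sessions.
interface PersistenceManager {
    void persist(String record);
}

// Stand-in for the slow, durable store (e.g. an RDBMS via Hibernate).
class InMemoryPersistenceManager implements PersistenceManager {
    final List<String> stored = new ArrayList<String>();
    public void persist(String record) { stored.add(record); }
}

// Decorator: log commands immediately, replay them against the real
// store later, when the overnight batch job runs.
public class LogNowPersistLaterManager implements PersistenceManager {
    private final PersistenceManager delegate;
    private final List<String> journal = new ArrayList<String>();

    public LogNowPersistLaterManager(PersistenceManager delegate) {
        this.delegate = delegate;
    }

    public void persist(String record) {
        journal.add(record); // fast path: just append to the log
    }

    public void flush() { // the nightly job
        for (String r : journal) delegate.persist(r);
        journal.clear();
    }
}
```

In a real system the journal would be a durable log file rather than an in-memory list, and flush() would be triggered by a scheduler rather than called directly.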

Prevayler has been great for getting everyone to question their persistence needs, and for stressing simplicity. However, the tacit assumption it seems to make (someone will correct me if I’m wrong) is that the only client of your application’s data is your application. The reason RDBs are so ubiquitous and entrenched is that there is a standard means of querying and updating them: SQL. It’s rare for a corporate system to have only one user of its data. Even if only one application is updating the data, there are usually lots of reports based on it. If your data is accessible with a SQL statement, ‘anyone’ can write an ad-hoc query against it. If it’s stored in memory, or in a serialized form, then adding a report could mean changing your application code at worst, or at best making some kind of RMI call to it, which seriously raises the technical bar to writing a report.

The other aspect of Prevayler that I am still unsure of is how it would work in distributed applications. Most distributed apps have a single (or clustered) backend datastore, simply because there has to be somewhere that holds the unified picture of your data. The classic example is the airline reservation system where two agents both attempt to book the last seat on a plane. This can be handled because both transactions end up at the same datastore. Without some single point of reference, all concurrency and consistency checks would have to be after the fact.
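The last-seat problem is easy to see in miniature. A toy sketch (the names are invented): because both booking attempts go through the one synchronized datastore, the second can be refused immediately rather than discovered after the fact.

```java
// Toy model of the single point of reference: both agents' transactions
// end up at the same synchronized store, so only one can win the seat.
class SeatStore {
    private int seatsLeft = 1;

    synchronized boolean book() {
        if (seatsLeft == 0) return false; // someone else got there first
        seatsLeft--;
        return true;
    }
}

public class BookingDemo {
    public static void main(String[] args) {
        SeatStore flight = new SeatStore();
        System.out.println(flight.book()); // prints true
        System.out.println(flight.book()); // prints false
    }
}
```

Without that single store, each node would have to accept the booking optimistically and reconcile conflicts later.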

Agile O/R Mapping

Object-relational mapping is hard. It’s usually much easier to change your object model than your database schema, which can quickly lead to a mismatch between your objects and your database. Some kind of mapping metadata layer is almost inevitable, as I have yet to find a tool that can map even moderately complex object models to a database automatically. Some of them are getting close, and if your project is small and agile enough to allow you to generate your database automatically, then you can use your mapping information to drive your schema. I’ve been playing around with Hibernate and XDoclet to do just this, and it can work really well. Not having to worry about maintaining table creation scripts is very handy. It’s difficult to work out how to scale this approach to big projects, but here are some thoughts.
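For the flavour of it, here is roughly what such a mapped class looks like: a plain Java class whose mapping lives in XDoclet javadoc tags, from which the Hibernate mapping file (and hence the schema) is generated at build time. The Currency class, table and column names here are invented; check the hibernate-tags reference in the links for the exact attribute spellings.

```java
// Sketch of a POJO mapped via XDoclet javadoc tags. The build runs
// XDoclet over this source to generate Currency.hbm.xml, and the
// schema can then be generated from that mapping.

/**
 * @hibernate.class table="CURRENCY"
 */
public class Currency {
    private Long id;
    private String code;

    /**
     * @hibernate.id generator-class="native"
     */
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    /**
     * @hibernate.property column="ISO_CODE"
     */
    public String getCode() { return code; }
    public void setCode(String code) { this.code = code; }
}
```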

On a project of any appreciable size, the database will be touched by several other systems, plus ad-hoc reports etc. Dropping and rebuilding the whole thing every iteration probably won’t work too well.

There appear to me to be at least three distinct types of data in a typical application database (in increasing order of ubiquity and variability):

Application metadata
Data used during a build process to code-generate type-safe enum objects. Not frequently seen (except on the project I’m currently on), and only of interest at build time. It could in theory be excluded from a deployed database.

Static reference data
Lookup data that changes rarely, like currency codes, country codes etc. Most applications have data like this.

Application data
The active ‘memory’ of the application. Stuff that is constantly being inserted, updated and deleted during normal operation. If your application doesn’t have this type of data, you probably don’t need a database at all.

The first two types are generally easier to deal with, as they change rarely, and the changes are almost always to the data, not the schema. Generally speaking, normal DBA management practices should take care of these two.
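The code-generated type-safe enums mentioned under application metadata usually follow the classic pre-Java-5 pattern. A generated class (the Status name and values here are invented) might look like:

```java
// The kind of class a build step might generate from rows of
// application metadata: the classic type-safe enum pattern.
public final class Status {
    public static final Status OPEN = new Status("OPEN");
    public static final Status CLOSED = new Status("CLOSED");

    private final String name;

    // Private constructor: the only instances are the ones above,
    // so identity comparison (==) is safe.
    private Status(String name) { this.name = name; }

    public String toString() { return name; }
}
```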

The third is where most of the difficulty arises. It is this data that everyone cares about: the developers are trying to map their classes to it, other systems want to feed data to it or take data from it, and unruly crowds of business users want vast numbers of reports run against it.

Problems arise because there are two opposing forces at work. The developers want a totally flexible schema that perfectly maps to their classes. From their perspective a schema that is regenerated during every build based on their mapping metadata would be perfect. This end of the spectrum allows total agility, at the expense of dropping all the mapped tables every time a build is deployed. At the other end of the spectrum lie all the customers of the database – all the inputs, outputs and reports that interact with it. From their perspective, a totally stable schema (assuming the data model is not fundamentally wrong) is the ideal.

The sweet-spot on the spectrum will depend on where the application is in its lifecycle. A brand new system with no pre-existing customers will benefit from the rapid feedback that comes from being driven by the object model. This allows the developers maximum freedom to refactor as the application design improves. At some point during development, it will probably become necessary to stabilise the schema, in order to allow integration work with other systems to take place.

Coming from a development perspective, my desire is to find ways to maximise the ‘agile period’ for the database schema. One approach could be to further subdivide the schema, to insulate the parts of the database that must interact with other systems from the tables used for object persistence. This would allow the object persistence tables to remain auto-generated, while offering some stability in the other tables. The issue then becomes one of synchronisation. If using a database that allows updatable views, these could be used to map the database’s ‘public interface’ to the object persistence tables. Keeping the views in sync then becomes a maintenance overhead, which could be significant if the object model is changing rapidly.

Note that I haven’t mentioned data migration anywhere, mostly because this is probably even harder to cope with than schema changes. Not being a DBA, I’d probably do the subject an injustice if I tried to address it. With that risk in mind, all I’ll say for now is that if your object model is driving the schema, it will probably be necessary for the acceptance test data to be defined in code, as a big method whose job is just to instantiate objects and send them to the persistence layer. This at least allows the test data to be included in any automated refactorings done in IDEA, for example. More brainpower than I currently have available is needed.
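To make that concrete, here is a minimal sketch of test data defined in code (the Persister, Customer and CollectingPersister names are invented for illustration): because the data is just object construction, an automated rename of Customer or its fields carries the test data along with it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical persistence seam; in practice this would be the
// application's real persistence layer.
interface Persister {
    void save(Object o);
}

class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}

// Simple Persister used here to show the data being loaded.
class CollectingPersister implements Persister {
    final List<Object> saved = new ArrayList<Object>();
    public void save(Object o) { saved.add(o); }
}

public class AcceptanceTestData {
    // One big method whose only job is to build objects and persist
    // them; IDE refactorings see this code like any other.
    static void load(Persister persister) {
        persister.save(new Customer("Acme Ltd"));
        persister.save(new Customer("Initech"));
    }

    public static void main(String[] args) {
        CollectingPersister p = new CollectingPersister();
        load(p);
        System.out.println(p.saved.size()); // prints 2
    }
}
```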

Useful links:

http://www.agiledata.org/
http://www.agiledata.org/essays/mappingObjects.html
http://groups.yahoo.com/group/agileDatabases/
http://xdoclet.sourceforge.net/tags/hibernate-tags.html
http://hibernate.bluemars.net