Software isn’t engineering

In response to Les’ response:

Back in the days when PL/1 was the cool kid on the block, and projects were measured in the 100s of man-years, software was engineered. Flow charts and diagrams were drawn and checked, everything was documented and approved, and only right at the end was anything actually programmed. We know this approach today as Waterfall. It is generally accepted to be a poor fit for most development projects. It is, however, a very good fit for most ‘Engineering’ disciplines. I’m a Chemical Engineer by training, and I’ve been there (albeit briefly). Generally an engineering project starts off with one or more feasibility studies, where you make some broad assumptions and run lots of computer simulations of the system to get a feel for which direction to go in. Then you progressively add detail and refine your assumptions until you have a detailed design that can be used as the basis for procurement and construction. You really don’t want to be making big changes during procurement and construction. Lead times for large equipment like compressors can be several months, and concrete only sets once.

How comparable is this to software development? Not very. The closest relative is probably the type of project (1000s of man-years) talked about in The Mythical Man-Month, which sits right at one end of the spectrum, nowhere near the reality of most software projects. Physical engineering disciplines were developed to tackle very specific physical engineering issues, and trying to shoehorn them onto software projects just isn’t going to work.

What we should be doing is developing a spectrum of practices that are specific to our own industry, instead of trying to force it into a mould that simply won’t fit.

Software, Buildings, Bridges & Testing

Les makes some good points in his comparison of software with buildings and other complex activities here.

Comparing software development with the engineering of complex physical objects is always tricky and fraught with risk. There are many similarities (they’re both hard to do well), but I think the differences are more profound than the similarities. Les appears to disagree with the current trend of TDD replacing static checking and formal methods as a means of ensuring software quality, drawing a parallel with bridge construction. Bridge construction is heavily rooted in engineering maths and physics – calculating the forces involved and the materials needed to withstand them, along with sample-testing the individual components.

There are two major differences between bridge engineering and software development. The first is the cost of testing. Unit tests are essentially free. They are generally run many times each day while development progresses, and act as a quantifiable measurement of progress, as well as a means of ensuring quality. The other major difference is in the economic breakdown. In a bridge, the design cost is probably less than 30% of the total cost, the rest being materials and construction. In a software system, probably over 90% of the cost is in the design (by which I mean the process by which source code gets written). Construction (i.e. compilation) is basically free. There may be costs associated with deployment, but they are marginal compared to the design costs. In a continuous integration environment, deployment is often automated (the cost is then in the development of the automation), happening several times a day. It’s a bit harder to construct and deploy a bridge that often.

All of which means that there is a definite benefit in spending time and effort (i.e. money) formally verifying that your bridge design is correct, as the cost of having to rebuild a collapsed bridge is considerably more than the cost of fixing a broken software build. If construction and testing are more or less free, then using TDD and building as often as possible really is the most cost-effective means to develop a software system.

What if all methods were objects?

What would happen if you wrote code where every method of an object simply delegated to a field? Where every method was essentially a ‘Method Object’, as described in the Refactoring book. What would that be like? Objects would become like mini-packages: loose aggregations of behaviour collaborating to form a whole, like cells in an organism. What would be the advantages and flaws? I haven’t tried it, so I don’t know for sure. One advantage I can see is testability: if all your methods are objects, testing a private method becomes simply a matter of instantiating its object and invoking it. It would also make it harder to write code with side effects. Multiple return values (i.e. methods that modify several fields) might be tricky though. On the other hand this might force more careful thought about where behaviour should be.
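
For the sake of argument, a minimal sketch of the shape this might take (Order, TotalCalculator and LineItem are names I’ve invented for illustration):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Order {
    private final List items = new ArrayList();
    private final TotalCalculator calculator = new TotalCalculator();

    public void add(LineItem item) {
        items.add(item);
    }

    public double total() {
        return calculator.calculate(items); // the method simply delegates to its field
    }
}

// The 'method object': a single piece of behaviour that can be
// instantiated and tested on its own, without building an Order.
class TotalCalculator {
    public double calculate(List items) {
        double sum = 0;
        for (Iterator i = items.iterator(); i.hasNext();) {
            sum += ((LineItem) i.next()).price();
        }
        return sum;
    }
}

class LineItem {
    private final double price;
    public LineItem(double price) { this.price = price; }
    public double price() { return price; }
}

Note that TotalCalculator can be instantiated and exercised without constructing an Order at all, which is the testability advantage described above.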

Food for thought on a Friday…

Putting the Unit in Unit Tests

Some excellent comments to my last post. Please allow me to respond:

Should you test your persistence layer? Yes.
Should you test your system from end to end? Yes.
Should you test complex edge cases? Yes.

If you assumed the answer to any (sensible) question that starts with ‘Should you test…’ is ‘Yes’, you wouldn’t be too far from the truth.

However…

Should you test any of those things in your Unit tests? No. Unit tests (to me) are the things that the developers write and run many times a day, and they should always pass. Testing against DBs is hideously slow. Orders of magnitude too slow for unit tests. Tests against DBs can too easily fail for the wrong reasons. Tests that require masses of setup code and mock objects just to run are too brittle. Lots of mock objects can hurt you when refactoring, as they tend to break when you pull their superclasses apart.

Also note that I didn’t say ‘Don’t use Mocks’. Sometimes they are required, MockPersistenceLayer being a good example. Although if you have interface/impl separation (which you should for your persistence layer), then ‘InMemoryPersistenceLayer’ or ‘HashMapBackedPersistenceLayer’ would be equally good names, and less open to misinterpretation than MockPersistenceLayer.
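
A minimal sketch of what that interface/impl split might look like (the method signatures here are invented):

import java.util.HashMap;
import java.util.Map;

// The interface the production code depends on.
public interface PersistenceLayer {
    void save(String id, Object entity);
    Object load(String id);
}

// A real, working implementation backed by a Map: fast enough for
// unit tests, no database, and almost no setup code.
class HashMapBackedPersistenceLayer implements PersistenceLayer {
    private final Map store = new HashMap();

    public void save(String id, Object entity) {
        store.put(id, entity);
    }

    public Object load(String id) {
        return store.get(id);
    }
}

Because the Map-backed implementation is real code rather than a canned fake, tests that use it exercise genuine behaviour, just without the database.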

The other kind of Mocks, the ones that verify their methods were called in a certain sequence with certain parameters, can be useful, but, as I said, they can be brittle and prone to breakage during refactoring, so should be used with caution. They also make it less clear what’s being tested. Chances are the system would benefit from finer-grained objects that could be tested individually, which should reduce the need for a full Mock.
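
To make the distinction concrete, here is a hand-rolled sketch of that kind of mock (Mailer and the call-recording scheme are invented; mocking libraries generate something similar):

import java.util.ArrayList;
import java.util.List;

interface Mailer {
    void send(String to, String body);
}

// A hand-rolled 'verifying' mock: it records every call so a test
// can assert on the exact sequence and parameters afterwards.
class RecordingMailer implements Mailer {
    private final List calls = new ArrayList();

    public void send(String to, String body) {
        calls.add("send(" + to + ", " + body + ")");
    }

    public List calls() {
        return calls;
    }
}

A test that asserts on the exact contents of calls() breaks the moment the production code reorders or rewords a call, even when the observable behaviour is unchanged, which is exactly the brittleness I mean.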

It’s perfectly OK to have a separate set of JUnit tests that aren’t ‘Unit’ tests, that (for example) test the persistence of each of your objects exactly once. Run them as often as you want, add them to your commit script if you feel the need, just don’t bolt them into the main body of unit tests.
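
In JUnit 3 terms, that separate set might look like this (the class names are invented):

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

// A separate entry point for the slow, database-hitting tests, kept
// out of the suite the team runs every few minutes.
public class PersistenceTestSuite {
    public static Test suite() {
        TestSuite suite = new TestSuite("Persistence tests");
        suite.addTestSuite(CustomerPersistenceTest.class);
        return suite;
    }
}

// Stand-in for a real persistence test; in practice it would
// round-trip a Customer through the database exactly once.
class CustomerPersistenceTest extends TestCase {
    public void testSaveAndLoadCustomer() {
        // ... save a Customer, load it back, assert equality ...
    }
}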

Project guidelines

Just some thoughts, mostly for my own benefit. I reserve the right to change my mind, and even do complete U-turns where necessary.

  • Use CruiseControl, or some other form of continuous integration.
  • Use Clover, and fail check-ins if there is any code not covered by a unit test.
  • Block System.out.println with a CVS commit script.
  • Ban unit tests from hitting the database.
  • Don’t write tests that require more than 10 lines of (non-production) setup code per test method.
  • Don’t write code that requires a (JUnit) setUp method that is bigger than all your test methods combined.
  • Use mock objects very sparingly. Consider them indications of code smell.
  • Don’t use anonymous subclasses in testcases just to loosen method permissions for testing, or to stub out hard-to-test behaviour. Code smell warning.
  • Do put your test code in a parallel directory structure, in the same packages as your production code.
  • Do test protected and package methods (see the sketch after this list).
  • Don’t worry about testing private methods too much – they must be called from a public, package or protected method, or they’re redundant and can be deleted.
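
To illustrate the last three points, a sketch with invented names:

// src/com/example/order/OrderNumber.java
package com.example.order;

public class OrderNumber {
    // Package access: visible to tests in the same package,
    // invisible to the rest of the system.
    static boolean isValid(String candidate) {
        return candidate != null && candidate.startsWith("ORD-");
    }
}

// test/com/example/order/OrderNumberTest.java
// Parallel directory, same package: the package method is directly
// callable, with no reflection or visibility hacks needed.
package com.example.order;

import junit.framework.TestCase;

public class OrderNumberTest extends TestCase {
    public void testRecognisesValidOrderNumber() {
        assertTrue(OrderNumber.isValid("ORD-123"));
    }

    public void testNullIsNotValid() {
        assertFalse(OrderNumber.isValid(null));
    }
}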

Don’t break the law of Demeter

…Or the OO cops will get you. The best summary of the Law of Demeter I’ve seen comes from c2.com. It goes something like this:

You can play with yourself.
You can play with your own toys (but you can’t take them apart).
You can play with toys that were given to you.
And you can play with toys you’ve made yourself.

That’s it. To recast in geek-speak:

An object can call its own methods.
An object can call methods on its own fields (but not on its fields’ fields).
An object can call methods on parameters passed to it.
An object can call methods on objects it has instantiated.

That’s it. You can’t call methods on the fields of other objects. So anything that looks like foo.getBar().getBaz().doSomething() is not allowed. What you can do is give foo a ‘doSomething’ method that delegates to a ‘doSomething’ method on its ‘bar’ field, which in turn delegates to its ‘baz’ field. That doesn’t break the law, because foo’s caller never needs to know about bar and baz, and foo never needs to know about baz.
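
In code, the fix described above looks something like this (Foo, Bar and Baz standing in for real classes):

// Before: the caller reaches through foo's internals.
//     foo.getBar().getBaz().doSomething();

// After: each object only talks to its immediate collaborators.
public class Foo {
    private final Bar bar = new Bar();

    public void doSomething() {
        bar.doSomething(); // Foo knows Bar, but not Baz
    }
}

class Bar {
    private final Baz baz = new Baz();

    public void doSomething() {
        baz.doSomething(); // only Bar knows about Baz
    }
}

class Baz {
    public void doSomething() {
        // ... the actual work happens here ...
    }
}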

It’s all about preserving encapsulation.

If you take it to the extreme you find that getter methods break the Law of Demeter. Ask yourself why other objects need to get at your object’s fields. Can’t they just tell your object what they want it to do directly? Setter methods aren’t much better. They still expose an object’s state to the world.
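
A small sketch of the difference, with an invented Account class:

public class Account {
    private double balance;

    // Asking: getBalance()/setBalance() would expose the state and
    // push the withdrawal logic out into every caller.

    // Telling: the caller states its intent, and the object stays
    // in control of its own state.
    public void withdraw(double amount) {
        if (amount > balance) {
            throw new IllegalArgumentException("Insufficient funds");
        }
        balance -= amount;
    }
}

The withdraw method keeps the balance rules inside Account, where they belong, instead of duplicating them in every caller that could reach the state through a getter/setter pair.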

Something new every day

IDEA is just the coolest IDE since sliced bread. Today I discovered the ‘invert if’ and ‘split if’ refactorings. Just put the text cursor next to an ‘if’ statement and IDEA will ask if you would like to invert it or split it! Consider the following:

if (foo != null && bar != null) {
    // Do some stuff
    return true;
}
return false;

After ‘split if’:

if (foo != null) {
    if (bar != null) {
        // Do some stuff
        return true;
    }
}
return false;

After two ‘invert ifs’:

if (foo == null) {
    return false;
}
if (bar == null) {
    return false;
}

// Do some stuff
return true;

I think this is much clearer and more open to further refactoring, as it clearly separates the guard clauses from what the method is actually supposed to be doing. Cool.