Why a 20 minute build is 18 minutes too long

I’ve just started working on a project team with a healthy obsession with build times. The build had crept up to 20 minutes before some cunning hackery by Simon and Carlos brought it down to 7. That includes running the signed-off acceptance tests and the database. The trick is to keep everything in process. IO is the enemy. But that’s a story for another day.

The point is, why is a fast build so important? It was Mike that said it best: having a screamingly fast build doesn’t make it easier to do the same things, it makes it possible to do entirely new things. When a build takes less time than a trip to the water cooler, you can build (and check in) every time you go to the water cooler. Doing a build becomes part of the rhythm of developing. Test, code, refactor, build, check in. Repeat.

Code Reading

I enjoy reading code. In many ways it’s like reading a mystery novel. Or one of those ‘now turn to page 67’ roleplaying books. Actually very like reading one of those roleplaying books. For those who have no idea what I’m on about, there is a genre of book, usually of the sci-fi or fantasy ilk, that puts the reader in the role of the main character and lets them make choices about what happens next. Like an Infocom text adventure in paper form. So you’d read a page that describes a situation, then be presented with a passage like, ‘You stand at the door to the brooding castle. A path leads around to the rear of the castle. To open the door and enter the castle, turn to page 34. To see where the path leads, turn to page 68.’ And so it goes. But I digress.

Reading code is an adventure of a similar sort. Every polymorphic call or conditional branch is a choice that must be made by the reader. Do I step into the ‘if’ block, or find out where ‘else’ goes? Which of the 4 implementations of ‘isValid’ do I want to look at first? Slowly, after much backtracking and re-reading, a mental map of the system begins to form. I also like to take offline backups to index cards while I’m reading, due to a somewhat limited personal heap size.

Code reading is a learned skill, just as reading in general is. You get better with practice. As with literature, there is much more bad writing than good writing. As with literature, the ability to tell good writing from bad improves the more you read, as indeed does writing skill.

We should all read more code.

Agile Answers

When engaging with a team new to agile methods of the XP flavour at a day-to-day technical level, there are some questions and discussions that I’ve come to expect as inevitable.

Adding methods ‘just for testing’

The approach of test-driven development is a distinct change of mindset for most developers. The idea that the unit tests incrementally drive out functionality by specifying it from the perspective of the calling code can feel like turning your designs inside out. More accurately, it makes you think from the outside in. I have only used TDD in anger in object oriented languages, and it seems particularly appropriate for this style. TDD (also called behaviour driven development) forces you to think about what a class’s responsibility is, and how you will know if it’s living up to that responsibility. The upshot is that yes, ultimately you do end up with methods that you wouldn’t have added if you weren’t doing TDD, and this is actually a good thing. I have heard complaints about polluting the public API, which I believe is a fear-based argument rooted in the concern that other developers may use your non-private methods in an unexpected way that causes bugs. Which leads me to my next point.
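As a rough sketch of what ‘outside in’ looks like in Java: the test below is written from the caller’s side, and it is the test that drives out the itemCount and contains methods. Basket, BasketTest and all of their methods are invented for illustration, and the test is a plain method rather than anything from a real test framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, invented purely for illustration.
final class Basket {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    // Driven out by the test below: the caller needs some way to observe the result of add().
    int itemCount() {
        return items.size();
    }

    // Also driven out by the test, and just as useful to real calling code.
    boolean contains(String item) {
        return items.contains(item);
    }
}

// A framework-free test, written from the calling code's point of view.
final class BasketTest {
    static void addingAnItemMakesItVisibleToTheCaller() {
        Basket basket = new Basket();
        basket.add("tea");
        if (basket.itemCount() != 1 || !basket.contains("tea")) {
            throw new AssertionError("expected the basket to hold exactly one item: tea");
        }
    }
}
```

Neither query method would necessarily exist without the test, but both describe the class’s responsibility in terms a caller can check.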

TDD designs are more complicated

This for me is purely in the eye of the beholder. I’d argue that TDD designs are generally more object oriented than non-TDD designs, which personally I find less complicated. I can think about a system on a more conceptual level and in smaller chunks if I can read through a cohesive group of collaborating classes than I can if I have to read the same functionality in a single 150-line method. TDD codebases tend to grow a larger number of small classes with small methods than other approaches. The discipline of specifying behaviour in tests forces a decoupling, otherwise the test code becomes very hairy and brittle. Part of learning TDD, it seems, is going through that pain.

To clarify the link from my earlier point, TDD done attentively tends to eliminate, or make explicit, a lot of the internal assumptions within a class about how it will be used. Of particular interest is method call sequencing. I see a lot of (non-TDD) code where a public method on a class only works correctly if the object is instantiated in an environment where a singleton factory has been initialized somewhere else in the codebase beforehand, and three or four other methods have been called first with the correct parameters, themselves being complicated objects with brittle state. In this environment, reducing the number of public methods possibly serves to reduce the chances of calling them in the wrong order. Using tests to explicitly assert the behaviour of an object under different scenarios leads to objects that not only behave as expected in situations that have been considered but also behave robustly in situations that have not. Writing a class in such a way that it is usable immediately after construction is considered by many to be a best practice anyway (sometimes termed the Good Citizen approach). TDD just adds an extra impetus.

Duplication in test code can be used as a hint to what reasonable assumptions can be made at construction time, and when in fact explicitly throwing an exception is the right thing to do. Many Java developers are so fed up with unexpected NullPointerExceptions that explicitly throwing one from a constructor seems bizarre. If a class really cannot do anything useful when a field is null, throw as soon as you know. It’s far easier to track down the cause then than to wait until the exception gets thrown by something else the first time something tries to use the null field.
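A minimal sketch of that idea in Java, assuming a made-up PriceConverter that is useless without its RateSource (both names are hypothetical, invented for this example). The only real library call is Objects.requireNonNull, which throws a NullPointerException with the given message when its argument is null.

```java
import java.util.Objects;

// Hypothetical collaborator, invented for this sketch.
interface RateSource {
    double rateFor(String currency);
}

// Hypothetical class that is usable immediately after construction.
final class PriceConverter {
    private final RateSource rates;

    PriceConverter(RateSource rates) {
        // Fail here, where the cause is obvious, rather than on first use of the field.
        this.rates = Objects.requireNonNull(rates, "rates must not be null");
    }

    double convert(double amount, String currency) {
        // No null check needed: the constructor guarantees rates is set.
        return amount * rates.rateFor(currency);
    }
}
```

With this in place, new PriceConverter(null) fails on the line that made the mistake, and every other method can assume a fully constructed object.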

All this test code takes so long to write, we could be finished in half the time without it

For some value of finished, perhaps. For a system of any significance whose production lifetime is quite probably going to be several times (and quite possibly an order of magnitude) longer than the initial development period, I disagree. It might seem slow, but that is usually due to two factors: the team are new to TDD and asking lots of questions like the ones I’m describing here (and arguing, debating, thinking etc.), and the discipline of TDD is pushing lots of small assumptions to the surface that would otherwise remain buried and implicit, lying in wait for the unsuspecting support and maintenance teams. If a project goes on for even a couple of months, TDD, done conscientiously, can give a significant increase in productivity. New features become almost effortless to add to a loosely coupled, strongly cohesive codebase with a solid suite of automated unit tests. I’ve worked on systems that seemed to write themselves towards the end of the project.

Do the simplest thing for the task at hand

Originally a pushback against speculative development (particularly in code frameworks), this guideline is very easy to distort if you want to attack agile methods. Developing incrementally does not (in my opinion) mean taking your brain out before starting. I generally have an idea of where I’m going when developing, based on knowledge and experience. I use that mental roadmap to decide how to write the next test. Sometimes I’ll skim through and try a quick implementation of part of a feature to see if it looks right (quick meaning minutes, not days). Then I’ll write the tests that exercise my intended implementation, which will probably change it slightly to make it more testable. Code is clay, not marble. Or at least it should be.

Scrubbed

Finally deleted the last trackback spam yesterday. All comments and trackbacks are disabled forthwith. Go and bother someone else you bastards.

Not especially impressed with the new version of Movable Type either. The installation was really quite fiddly, and I’m a geek for heaven’s sake. And there was almost nothing that couldn’t have been scripted if they’d made the effort. I know the theory is that it should be possible to install with nothing more than an FTP connection, but there should have been a script provided for those of us who have complete access. The blacklist capability appears to be dreadful. There was no way to scan existing comments and trackbacks for spam; they were all assumed to be fine, it seems. So I had to delete 2000-odd trackbacks via the web interface in batches of 100, each of which took about a minute to complete. Nice. Bring back MT-Blacklist I say.

Hold tight, I’m going in

The machine my hosted account is on suffered a fairly fatal hardware failure a few weeks ago. Thanks to the marvels of RAID and backup schedules nothing important was lost, but my Perl libs (which were not part of the backup) seem to have taken a hit, with the result that some bits of Movable Type have stopped working, namely trackback and comment filtering. I’m taking the opportunity to upgrade MT, so hopefully we’ll be back to spam-free operation soon. Either that or I’m about to take myself offline in a rather unplanned fashion.

Beyond HTML email

Slightly late to the party reading this entry about how we should all switch to HTML email.

That’s so 90s. HTML is a truly shocking format for email, capable of containing malicious code and tracking your actions (thus allowing spammers to know if your email address is still valid). Also, an HTML ‘document’ often isn’t the whole message. HTML often contains references to external content. Consider the <img> tag. I received an email the other day that consisted almost entirely of image tags pointing off to a site that the corporate proxy had tagged as inappropriate content. The message was completely broken and unreadable (probably just as well), and the proxy logs would show that I ‘attempted’ to browse to a blocked site, despite the fact that it was my email application acting entirely beyond my control. HTML? It’s fine for web pages and that’s where it should stay.

What’s more, there are at least 2 open standards far better suited to sending richly formatted emails: Rich Text Format (RTF) and Portable Document Format (PDF). Both do what they say on the tin, and both are viewable on pretty much every platform with a GUI. And neither will cause your machine to go trawling the web or turn into a zombie.