One thread per cpu

Hyperthreading notwithstanding, one thread per cpu will give you the highest application performance. Having more than that means context switching and contention, which means the cpu is spending time on tasks other than running your program.

For this reason, the fastCGI approach to web application servers is definitely a good thing – prefork a fixed number of single-threaded processes to handle requests, one per cpu.

So why spawn more threads or processes than cpus? Blocking. If a thread blocks, its out of action until the thing its waiting for is ready. With 1 thread per cpu, if a thread ever blocks, thats a whole cpu idle. To implement an application that never blocks, especially when it needs to make network calls (eg. to a database), or read files, is more challenging than doing the same thing with blocking calls and many threads.

CyUnit

Bought the 1st series of the remade Battlestar Galactica yesterday. Much darker and edgier than the original. None of the characters appear to be all-bad or all-good, and most of them have ‘issues’ of some kind.

One of the characters was working on a Cylon detector (Cylons are the bad guys, robots that look like people and want to wipe out humanity, although some are programmed to think they’re human), and there’s a scene where we see the UI for the first time: red bar / green bar!

Introducing CyUnit. Red bar, failing test, you’re a Cylon. Green bar, you’re human. Made me laugh, geek that I am.

Cool technology, bad name

The combination of DHTML, Javascript and XMLHttpRequest has been making weblog headlines recently. My experience with it predated the hype by about 6 months (we were porting a rich client system to the web and hit upon using XMLHttpRequest specifically as a solution to some of the responsiveness issues). I was a little slow to realise just how unknown the technology was in the wider world, but I don’t mind that, so much as the appalling name its come to be known by.

Ajax? It sounds like a brand of detergent. Oh wait. It is.

Encapsulate the Database

Picking up the ‘schema as 3rd party code’ idea, I want my database access to be encapsulated behind an insulation layer that decouples the business logic from the table schema.

This is the primary reason I have mostly lost interest in tools such as Hibernate and Neo. Where the documentation says things like ‘Autogenerate your mapping layer and persist your domain objects’, I see ‘Tightly couple your code to your database’.

Relational databases are hard to change. Especially once they’ve gone live. They very quickly foster an ecosystem of reports and queries, and almost always become an ad-hoc integration point for several applications, making schema changes very difficult.

For this reason alone, any tool which works by coupling business logic to to the database causes me concern. On almost every project I’ve seen that had an OO application and a relational DB, the persistence mechanism became a point of pain. Code becomes really hard to change because its constrained by the database schema and there’s usually an additional maintenance burden of keeping an XML configuration file in sync.

Code Insulation

Or, ‘Just because its open source I don’t want to have to compile it every time’.

Dee mentioned using vendor branches to manage 3rd party source.

I choose to avoid them. If I’m using a 3rd party library, I want a released binary. Recompiling someone else’s source as well as my own seems like a waste of time.

With respect to ‘fixing in place’, sure, if I know a solution I’ll submit a patch, but fixing in place means forking the 3rd party codebase, which (depending on the license) might oblige me to release it back to the community (which means time spent not getting work done), and opens a real potential for causing merge hell when I want to use a new version of the library. This happened to me a few weeks ago. Had the project done either of encapsulating access to the 3rd party code, or stuck with a binary version, the upgrade would have been far easier.

Inversion of inversion of control

Followup to Sam’s post, Sam’s other post and Liz’s post about Sam’s posts.

The contextual IOC pattern looks to me like a solution to the ‘problem’ of Dependency Injection based IOC, namely that all an object’s dependencies are explicitly named. But wait a minute, IOC was a solution to the problem of having objects whose dependencies weren’t clear in the first place.

Dependency injection bothers me because it encourages an object to have too much control. To shamelessly steal Liz’s example code:

new GameFactory(UserPreference preference, GuiComponentFactory factory, Scorer scorer, RulesEngine rulesEngine, Timer timer)

The reason GameFactory needs all that stuff is because it is presumably going to pass it down to the Game objects it creates. That is fine, and 5 parameters in a constructor isn’t setting off any code-smell alarms for me. I wouldn’t do it that way however. For a start, I question the need for a factory. Presumably somewhere there is a UI showing a list of possible games waiting for the player to choose one. I see no benefit in having an abstract factory when what I want to do in response to a user clicking ‘Tetris’ is a new TetrisGame(). The indirection of the factory is just getting in the way.

I could see how you might want a more configurable list of games, in which case possibly a GameCollection containing a TetrisFactory, a PacmanFactory etc. might work, if the game objects were too heavyweight to simply pre-instantiate at startup.

That’s not really where I want to go with this post though. The thing that really caught my eye was the Timer parameter. I’m being a bit mean picking on this one as its easy target, and design is subjective, but it demonstrates one of my concerns with IOC quite nicely. Clearly a Game object requires some notion of the passage of time, as many games are based on things happening periodically. In the Tetris example, it has to move a block down the screen and check to see if it’s landed on the pile. However, injecting a Timer is not what I would do. What the game object really wants is for something else to tell it when time has passed. Something external. The code might look like this:

public abstract class Game implements TimeDependent {
}

public interface TimeDependent {
void clockTick(long millisSinceLastTick);
// or possibly, for more OO flexibility
void clockTick(TimeInstant aFrozenMomentInTime);
}

This style satisfies all the requirements of testability, with the added simplification of removing one of the dependencies from the constructor.

You could do the same with Scorer. The game could expose a public TetrisScore basicScore() method, that a collaborator can call, and do its own post-processing on before showing it to the user.

UserPreferences is tougher. I’d agree that the game wants to ask a number of questions of the preferences object, so it should probably be given it.

To conclude, we build object-oriented systems. Its ok for an object to need help from collaborating objects, without having to own them all, or even know who is helping it.

The XP definition of 3rd party code

Continuing along the train of thought from my previous post, I came up with this rather extreme definition of 3rd party code:

3rd party code is any code on a different release cycle to the system under development.

This definition covers everything from 3rd party binaries, code developed by other in-house teams, the operating system, and your database schema.

Third party code considered harmful

Most systems are not built from scratch. Most will use third party libraries, either commercial products or open source utilities.

A good example of this in the Java world is log4J – many projects use it for their logging requirements, and its a good tool. However what most projects will do is use it directly, which means that virtually every class in the system will have a dependency on log4J. If a need arises that log4J can’t handle, or a new version is released that changes some behaviour your project depends on, it can be a huge job to go through all your code making changes.

A useful approach is to wrap all access to 3rd party code yourself, thus giving you a single point of contact between your system’s code and the external library. This wrapper is also a useful place to put any extension functionality that your system needs.