Twitter has totally clobbered my ability to express any thought that requires more than 140 characters. Must get back into the blogging habit.
To the person I overtook and had to squeeze in front of as the overtaking lane merged into the left-hand lane going up that hill, and who was themselves held up by the queue of traffic I was overtaking: sorry. I flashed my hazard lights at you in apology once I was safely in front of you. Hopefully you interpreted that gesture correctly. Now, does my lack of judgement make it okay for you to accelerate into my path and actively try to block my return to the line of traffic? In what way did your actions contribute to the safe and swift progress of all concerned? Would you have preferred it if I hadn't seen you (I admit I then hit my horn long and hard) and had just ploughed into your right-hand side? My car would have been a mess, but yours would have gone into the ditch at 60 miles an hour.
I’ve just spent a week motorcycling in France and I’m left with this impression: French driving in general is significantly better than British driving. Many French people ride scooters as teenagers and it shows in their friendly and considerate attitude towards their two-wheeled friends and road safety in general. We British drivers sometimes act like we own the road and all other road users are an inconvenient obstacle. We also appear to believe ourselves invulnerable and infallible. Personally I think learning to ride a moped or motorcycle before learning to drive a car should be compulsory.
Few things enrage me more than wrestling with blog software. All I want to do is fix the archive links so that outside references to popular posts imported from Movable Type still work. Messing about with mod_rewrite, random plugins and generally wasting my morning are not part of that agenda. All it needs is one extra field that I can manually type my own permalink into, mapping it to a specific post. I don’t NEED an algorithmic solution here; that’s the beauty of URLs. They’re just text! If only this blasted software would get out of the way and do the simplest thing that could possibly work.
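To make "one extra field" concrete, here is a rough sketch of what the simplest thing might look like: a hand-maintained table from old archive paths to new post URLs, looked up verbatim. The class name, method names and the example paths are all made up for illustration; this is not how any particular blog engine does it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: manually typed permalinks mapped straight to posts.
// No rewrite rules, no patterns, just text in and text out.
public class LegacyPermalinks {
    private final Map<String, String> manualLinks = new HashMap<>();

    // Store the old path exactly as outside sites link to it.
    public void add(String oldPath, String newUrl) {
        manualLinks.put(oldPath, newUrl);
    }

    // The redirect target for an old path, if someone typed one in.
    public Optional<String> redirectFor(String requestedPath) {
        return Optional.ofNullable(manualLinks.get(requestedPath));
    }

    public static void main(String[] args) {
        LegacyPermalinks links = new LegacyPermalinks();
        // Made-up example paths.
        links.add("/archives/000123.html", "/2005/03/some-popular-post/");
        System.out.println(links.redirectFor("/archives/000123.html").orElse("404"));
        System.out.println(links.redirectFor("/archives/999999.html").orElse("404"));
    }
}
```

A web front end would then answer a hit on an old path with a permanent (301) redirect to whatever the table returns.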
This year, for some reason (*cough* midlife crisis), I’ve been motivated to do some things I’ve thought about in the past but allowed myself to be talked out of (by myself or others). So I’m learning the piano. I played a few instruments as a child, and each time the timeline went basically like this: start having lessons, breeze through for a few months right up to the point where regular practice was required to make progress, lose interest, stop. Somewhat counterintuitively, because I could initially improve faster than average with little practice, I would hit a wall as soon as progress stopped coming easily: I had become used to it.
Coming into a new instrument after more than a few years, and having mostly lost any music reading ability I may have had, I’ve been struck by how regular music actually is. There are patterns everywhere and things that I struggled to understand as a young’un make perfect sense to me now. For example, scales:
The easiest scale to play on a piano is C-Major. You start at the white key immediately to the left of a group of two black keys (that’s C) and press each white key in turn (the natural notes, eg C, D, E, F etc.) until you reach the next key that looks the same. Other scales, however, involve pressing black keys (the sharps and flats, eg C♯, E♭), and this was one of the things I found confusing whenever I considered it before. But there is a very simple pattern for all major scales, and it goes like this: 0-2-2-1-2-2-2-1. Starting from the first note (0), you press two keys each two semitones above the one before, then a key one semitone higher, then three more keys each two semitones apart, and finally one a single semitone higher, which puts you on your starting note one octave up. This applies to all major scales, regardless of which note you start on.
What’s a semitone? It’s the step between any two adjacent keys on the keyboard, irrespective of colour. Notice how most pairs of neighbouring white keys have a black key between them, but some (E and F, B and C) don’t? That’s why the C-Major scale uses only the white keys: it happens that exactly where the pattern calls for a single semitone, you’re at two white keys with no black key between them. Start somewhere else, however, and you’ll sometimes have to hit a black key in order to maintain the above sequence (eg. if you need to go one semitone higher and there is a black key to the right, or two semitones higher when there is no black key between the key you’re on and the next white key along).
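Since it really is just arithmetic, the pattern can be written down as a few lines of code. This is a small sketch that numbers the twelve keys of an octave and adds 2-2-1-2-2-2-1 semitones, wrapping round into the next octave. To keep it short it only spells notes as sharps, so, for example, F major comes out with A# where a musician would write B♭.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build a major scale by walking the 2-2-1-2-2-2-1 semitone pattern.
// Sharps only, for simplicity.
public class MajorScale {
    private static final String[] NOTES =
        {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};
    private static final int[] STEPS = {2, 2, 1, 2, 2, 2, 1};

    public static List<String> build(String tonic) {
        int index = indexOf(tonic);
        List<String> scale = new ArrayList<>();
        scale.add(NOTES[index]); // the starting note (the "0")
        for (int step : STEPS) {
            index = (index + step) % NOTES.length; // wrap into the next octave
            scale.add(NOTES[index]);
        }
        return scale;
    }

    private static int indexOf(String note) {
        for (int i = 0; i < NOTES.length; i++) {
            if (NOTES[i].equals(note)) return i;
        }
        throw new IllegalArgumentException("Unknown note: " + note);
    }

    public static void main(String[] args) {
        System.out.println(build("C")); // [C, D, E, F, G, A, B, C]
        System.out.println(build("G")); // [G, A, B, C, D, E, F#, G]
    }
}
```

Starting on C never lands on a sharp; starting on G needs exactly one black key, F#.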
So there you go. Music is maths.
Here’s a modelling scenario that crops up on many occasions in software systems.
Today is Wednesday. The friendly neighbourhood HR person is updating the HR system. She needs to record the following information:
Last Thursday, Bob from Treasury received a pay rise, commencing next Friday.
It’s now Monday, almost 2 weeks later: the first business day after the Friday on which the rise commenced. The friendly neighbourhood HR person is told of a mistake with Bob’s pay rise. She needs to record the following information:
The pay rise recorded for Bob 2 Wednesdays ago, which he received 3 Thursdays ago and which commenced last Friday, was wrong. It should have been backdated to the beginning of the month, 5 Fridays ago, not last Friday.
Developers and their software can get into an awful mess if insufficient thought is given to the many dimensions of time. In this example, a robust system would have recorded the following:
- The time the initial entry was made by FNHRPerson.
- The time the pay rise was given to Bob.
- The time the pay rise took economic effect.
- The time the entry was modified.
- The new time the pay rise took economic effect.
It might also need to answer questions like, ‘when does Bob’s pay rise take effect?’ and ‘as of the day before yesterday, when did you think Bob’s pay rise took effect?’.
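One common way to answer both questions is to never overwrite anything: each correction is appended as a new entry carrying the date it was recorded, and "as of" queries pick the latest entry recorded on or before the date asked about. Here's a minimal sketch of that idea, assuming whole dates are precise enough; the class and method names and the specific dates are invented for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a bitemporal pay-rise history. Each entry keeps when
// the rise was granted, when it takes economic effect and when HR recorded it.
// Corrections are appended; earlier entries are never modified.
public class PayRiseHistory {
    static final class Entry {
        final LocalDate grantedOn;     // when Bob was given the rise
        final LocalDate effectiveFrom; // when it takes economic effect
        final LocalDate recordedOn;    // when this version was entered

        Entry(LocalDate grantedOn, LocalDate effectiveFrom, LocalDate recordedOn) {
            this.grantedOn = grantedOn;
            this.effectiveFrom = effectiveFrom;
            this.recordedOn = recordedOn;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    public void recordEntry(LocalDate grantedOn, LocalDate effectiveFrom, LocalDate recordedOn) {
        entries.add(new Entry(grantedOn, effectiveFrom, recordedOn));
    }

    // "As of knownOn, when did we think the rise took effect?"
    // Uses the most recently recorded entry on or before knownOn.
    public Optional<LocalDate> effectiveDateAsOf(LocalDate knownOn) {
        Entry latest = null;
        for (Entry e : entries) {
            boolean known = !e.recordedOn.isAfter(knownOn);
            if (known && (latest == null || !e.recordedOn.isBefore(latest.recordedOn))) {
                latest = e;
            }
        }
        return latest == null ? Optional.empty() : Optional.of(latest.effectiveFrom);
    }

    public static void main(String[] args) {
        PayRiseHistory bob = new PayRiseHistory();
        LocalDate granted = LocalDate.of(2008, 10, 9);
        // Original entry (illustrative dates): recorded 15 Oct, effective 24 Oct.
        bob.recordEntry(granted, LocalDate.of(2008, 10, 24), LocalDate.of(2008, 10, 15));
        // Correction: recorded 27 Oct, backdated to 1 Oct.
        bob.recordEntry(granted, LocalDate.of(2008, 10, 1), LocalDate.of(2008, 10, 27));

        System.out.println(bob.effectiveDateAsOf(LocalDate.of(2008, 10, 27))); // Optional[2008-10-01]
        System.out.println(bob.effectiveDateAsOf(LocalDate.of(2008, 10, 25))); // Optional[2008-10-24]
    }
}
```

"When does Bob's pay rise take effect?" is just `effectiveDateAsOf(today)`. A real system would probably record timestamps rather than dates, so that two corrections on the same day stay ordered.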
There’s a whole other side to this story: calculating how much money has already changed hands, and the correction required in light of the new effective date. But that’s another dimension altogether.
Or, when is performing request-response communication over an asynchronous channel a good idea? When is it not?
If system A needs to ask system B a question and get some data as a response it could do so in a variety of ways, some awful, others less so.
- Take a periodic database extract, transform and load
- Query the other system’s database on the fly
- Use RMI / CORBA binary interop
- Send a message and wait for a reply
- Get some XML from a URL via HTTP
This list is not exhaustive, but represents a reasonable spread of mechanisms. Let's narrow it down to the two least awful.
First, any integration involving a database is right out. Coupling application databases together is extremely awful. Binary integration is bad for the same reason. These integration mechanisms are fragile and make the job of releasing both applications much harder, as well as making it hard to know if a change has broken the integration.
Moving on to the two remaining choices. Messaging can work well if the requesting system can tolerate a large variation in latency and/or you need to scale by having multiple processes sending and receiving the messages. It is also very handy if you need to cope with one of the systems failing periodically, as the messaging infrastructure can hold onto the messages while the systems are unavailable. Request-response semantics can be achieved by passing a correlation ID around to track conversational state. The implementation is more complicated, though, because you don’t really want the requestor sitting blocked waiting for a response that could take an unbounded amount of time to arrive (eg. if the responder is busy or down).
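A minimal sketch of the correlation ID idea, with in-memory `BlockingQueue`s standing in for real messaging infrastructure (the `Message` type and the class name are made up). Each request gets a random ID and a pending future; a listener thread matches replies to futures by ID, and the requestor only waits for a bounded time, so a busy or dead responder can't hang it forever.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

// Hypothetical sketch: request-response over asynchronous queues.
public class CorrelatedRequestor {
    static final class Message {
        final String correlationId;
        final String body;
        Message(String correlationId, String body) {
            this.correlationId = correlationId;
            this.body = body;
        }
    }

    private final BlockingQueue<Message> requests;
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    CorrelatedRequestor(BlockingQueue<Message> requests, BlockingQueue<Message> replies) {
        this.requests = requests;
        Thread listener = new Thread(() -> {
            try {
                while (true) {
                    Message reply = replies.take();
                    // Unknown IDs (eg. replies to requests that already timed out) are dropped.
                    CompletableFuture<String> future = pending.remove(reply.correlationId);
                    if (future != null) future.complete(reply.body);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.setDaemon(true);
        listener.start();
    }

    // Sends a request and waits at most timeoutMillis for the matching reply.
    String ask(String question, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(id, future); // register before sending so a fast reply isn't missed
        requests.put(new Message(id, question));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.remove(id); // give up; a late reply will be ignored
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Message> requests = new LinkedBlockingQueue<>();
        BlockingQueue<Message> replies = new LinkedBlockingQueue<>();

        // Stand-in for system B: answers each request, echoing its correlation ID.
        Thread responder = new Thread(() -> {
            try {
                while (true) {
                    Message m = requests.take();
                    replies.put(new Message(m.correlationId, "answer to " + m.body));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        responder.setDaemon(true);
        responder.start();

        CorrelatedRequestor systemA = new CorrelatedRequestor(requests, replies);
        System.out.println(systemA.ask("balance?", 1000));
    }
}
```

The bounded `get` keeps the example short; a caller that must never block at all could hand the future back instead and react when it completes.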
If the complexity of a message-based approach is not justified, then a simple web service (that is, XML and HTTP GET requests, no SOAP or other pointless nonsense) can work very well. As an added benefit, if some thought is put into how the URLs are designed, it is a relatively simple matter to insert an HTTP proxy cache (such as Squid) into the flow. Sensible use of cache directives in the HTTP headers (eg. Cache-Control with a max-age) can allow even moderately fast-changing data to be cached effectively without going stale. All without having to complicate the code that returns the XML.
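As a rough illustration, here's a tiny XML-over-HTTP GET endpoint using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The `/customers/{id}` URL shape, the XML payload and the 60-second lifetime are assumptions for the example. The only caching-related code is the single `Cache-Control` header; a proxy such as Squid does the rest.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch: a cache-friendly XML resource. URL shape and max-age are made up.
public class CacheableXmlService {
    private static final String PREFIX = "/customers/";

    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(PREFIX, exchange -> {
            if (!"GET".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // -1: no response body
                exchange.close();
                return;
            }
            // The resource identity lives entirely in the URL, so it doubles as the cache key.
            String id = exchange.getRequestURI().getPath().substring(PREFIX.length());
            if (!id.matches("\\d+")) {
                exchange.sendResponseHeaders(404, -1);
                exchange.close();
                return;
            }
            byte[] body = ("<customer id=\"" + id + "\"/>").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/xml; charset=utf-8");
            // "public" lets shared caches store it; max-age bounds how stale it can get.
            exchange.getResponseHeaders().set("Cache-Control", "public, max-age=60");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Try: curl -i http://localhost:8080/customers/42");
    }
}
```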
JDOM’s XPath implementation has (in my opinion) a big glaring bug with respect to its handling of the default namespace. That’s a namespace that looks a bit like this in the XML:
<?xml version="1.0" encoding="utf-8"?>
<myRootElement xmlns="https://darrenhobbs.com/some/namespace/2008/10/15"
               xmlns:foo="https://darrenhobbs.com/some/foo/namespace">
  <myChildElement> ... some stuff ... </myChildElement>
  <foo:aFooElement> ... some foo stuff ... </foo:aFooElement>
</myRootElement>
Note the ‘xmlns=…’ attribute, which declares the default namespace, as opposed to ‘xmlns:foo=…’, which declares the ‘foo’ namespace.
Let’s say I wanted to run an XPath query for: ‘//myChildElement’:
XPath xPath = XPath.newInstance("//myChildElement");
xPath.addNamespace(Namespace.getNamespace("https://darrenhobbs.com/some/namespace/2008/10/15"));
List nodes = xPath.selectNodes(aDocument);
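This will never work. XPath does not play nicely with default namespaces: in XPath 1.0 an unprefixed name in a query only ever matches elements in no namespace, so the document’s default namespace is ignored. The solution is to register the same namespace URI against a made-up prefix and change the XPath like so: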
XPath xPath = XPath.newInstance("//dh:myChildElement");
xPath.addNamespace("dh", "https://darrenhobbs.com/some/namespace/2008/10/15");
List nodes = xPath.selectNodes(aDocument);
The query should then work. This is not a new problem.
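The same behaviour, and the same workaround, applies outside JDOM. This sketch swaps in the JDK's own `javax.xml.xpath` API (not JDOM) and runs both queries against the example document: the unprefixed query finds nothing, while binding the made-up prefix `dh` to the default namespace URI finds the element. The class name is invented for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch using javax.xml.xpath instead of JDOM: the made-up-prefix workaround.
public class DefaultNamespaceXPath {
    static final String NS = "https://darrenhobbs.com/some/namespace/2008/10/15";
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        + "<myRootElement xmlns=\"" + NS + "\""
        + " xmlns:foo=\"https://darrenhobbs.com/some/foo/namespace\">"
        + "<myChildElement>some stuff</myChildElement>"
        + "<foo:aFooElement>some foo stuff</foo:aFooElement>"
        + "</myRootElement>";

    static int count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        // Bind the made-up prefix "dh" to the document's default namespace URI.
        xPath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "dh".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "dh" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri)
                    ? Collections.singletonList("dh").iterator()
                    : Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//myChildElement"));    // 0: unprefixed means "no namespace"
        System.out.println(count("//dh:myChildElement")); // 1
    }
}
```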
As a developer, of course I went off and found WebKit's SunSpider JavaScript benchmark. Results below, in all their ugly unformatted glory:
TEST COMPARISON FROM TO DETAILS
=============================================================================
** TOTAL **: 2.28x as fast 5387.6ms +/- 0.6% 2358.2ms +/- 0.2% significant
=============================================================================
3d: 3.69x as fast 621.6ms +/- 1.1% 168.6ms +/- 4.0% significant
  cube: 5.35x as fast 229.0ms +/- 1.3% 42.8ms +/- 11.5% significant
  morph: 2.88x as fast 205.4ms +/- 1.4% 71.2ms +/- 5.7% significant
  raytrace: 3.43x as fast 187.2ms +/- 2.1% 54.6ms +/- 3.1% significant
access: 6.93x as fast 885.6ms +/- 0.9% 127.8ms +/- 4.3% significant
  binary-trees: 13.7x as fast 112.6ms +/- 0.6% 8.2ms +/- 12.7% significant
  fannkuch: 8.92x as fast 401.2ms +/- 0.1% 45.0ms +/- 2.0% significant
  nbody: 4.86x as fast 210.8ms +/- 2.7% 43.4ms +/- 10.0% significant
  nsieve: 5.16x as fast 161.0ms +/- 1.7% 31.2ms +/- 4.4% significant
bitops: 8.42x as fast 796.8ms +/- 0.2% 94.6ms +/- 5.3% significant
  3bit-bits-in-byte: 26.6x as fast 154.2ms +/- 0.4% 5.8ms +/- 17.9% significant
  bits-in-byte: 17.8x as fast 217.2ms +/- 0.5% 12.2ms +/- 8.5% significant
  bitwise-and: 5.51x as fast 178.6ms +/- 0.9% 32.4ms +/- 4.4% significant
  nsieve-bits: 5.58x as fast 246.8ms +/- 0.2% 44.2ms +/- 7.0% significant
controlflow: 28.3x as fast 113.2ms +/- 0.5% 4.0ms +/- 22.0% significant
  recursive: 28.3x as fast 113.2ms +/- 0.5% 4.0ms +/- 22.0% significant
crypto: 5.29x as fast 405.0ms +/- 0.3% 76.6ms +/- 4.8% significant
  aes: 4.97x as fast 153.2ms +/- 0.7% 30.8ms +/- 6.0% significant
  md5: 5.27x as fast 126.6ms +/- 1.1% 24.0ms +/- 8.2% significant
  sha1: 5.74x as fast 125.2ms +/- 0.8% 21.8ms +/- 2.6% significant
date: 1.07x as fast 416.2ms +/- 1.1% 389.6ms +/- 1.6% significant
  format-tofte: 1.23x as fast 261.4ms +/- 1.3% 212.2ms +/- 2.0% significant
  format-xparb: *1.15x as slow* 154.8ms +/- 1.0% 177.4ms +/- 1.8% significant
math: 3.79x as fast 619.6ms +/- 1.1% 163.4ms +/- 5.6% significant
  cordic: 3.24x as fast 294.2ms +/- 0.9% 90.8ms +/- 6.2% significant
  partial-sums: 3.39x as fast 177.8ms +/- 2.9% 52.4ms +/- 11.3% significant
  spectral-norm: 7.31x as fast 147.6ms +/- 0.5% 20.2ms +/- 2.8% significant
regexp: *1.88x as slow* 305.8ms +/- 10.1% 573.6ms +/- 0.5% significant
  dna: *1.88x as slow* 305.8ms +/- 10.1% 573.6ms +/- 0.5% significant
string: 1.61x as fast 1223.8ms +/- 3.3% 760.0ms +/- 1.4% significant
  base64: 1.81x as fast 154.8ms +/- 2.2% 85.6ms +/- 9.5% significant
  fasta: 3.84x as fast 306.0ms +/- 2.1% 79.6ms +/- 2.6% significant
  tagcloud: - 216.0ms +/- 3.5% 209.0ms +/- 1.5%
  unpack-code: 1.36x as fast 378.0ms +/- 10.1% 278.2ms +/- 2.3% significant
  validate-input: 1.57x as fast 169.0ms +/- 2.9% 107.6ms +/- 2.4% significant
That’s looking a bit more believable. Overall, Chrome/V8 comes out about 2.3 times as fast as Firefox/SpiderMonkey, with individual results ranging from about 28 times faster (controlflow) to almost 2 times slower (regexp). It will be interesting to see how TraceMonkey compares, as it seems to be about 1.8 times faster than SpiderMonkey.
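For anyone puzzling over the COMPARISON column, it appears to be simply the ratio of the two mean times, FROM over TO (Firefox and Chrome respectively, going by the text above), flipped into "as slow" when TO is the larger. This sketch reproduces a few rows under that assumption; it ignores the +/- error margins and the significance test the real harness also applies (which is why tagcloud shows "-").

```java
import java.util.Locale;

// Sketch: derive "Nx as fast" / "Nx as slow" from two mean times,
// printed to three significant figures like the table above.
public class Speedup {
    static String compare(double fromMs, double toMs) {
        if (toMs <= fromMs) {
            return String.format(Locale.ROOT, "%.3gx as fast", fromMs / toMs);
        }
        return String.format(Locale.ROOT, "*%.3gx as slow*", toMs / fromMs);
    }

    public static void main(String[] args) {
        System.out.println(compare(5387.6, 2358.2)); // TOTAL: 2.28x as fast
        System.out.println(compare(113.2, 4.0));     // controlflow: 28.3x as fast
        System.out.println(compare(305.8, 573.6));   // regexp: *1.88x as slow*
    }
}
```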
Ruby is so, like, web 2.0. More than two graduating classes have, er, graduated since Ruby became the next big thing. That makes it nearly your grandad’s social networking application programming language, in these ‘internet speed’ times we live in. This would be the same community that has just realised that network connections that survive beyond a single request are actually quite useful. But that’s my XMPP / HTTP rant, not this one.
- Perl: Larry Wall
- Python: Guido van Rossum
- Ruby: Yukihiro Matsumoto
- Java: Sun
- .NET: Microsoft
Don’t use Double.NaN when you meant to say zero. Zero is, amazingly, a number.
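A quick illustration of why: NaN poisons any arithmetic it touches and never compares equal to anything, not even itself, whereas zero just quietly adds nothing. The class and method names here are made up for the example.

```java
// Sketch: NaN as a stand-in for zero breaks sums and comparisons.
public class NaNVersusZero {
    static double total(double[] amounts) {
        double sum = 0.0;
        for (double a : amounts) sum += a;
        return sum;
    }

    public static void main(String[] args) {
        double[] withZero = {10.0, 0.0, 5.0};
        double[] withNaN = {10.0, Double.NaN, 5.0};

        System.out.println(total(withZero));              // 15.0
        System.out.println(total(withNaN));               // NaN: one NaN spoils the whole sum
        System.out.println(Double.NaN == Double.NaN);     // false
        System.out.println(Double.isNaN(total(withNaN))); // true: use isNaN to test for it
    }
}
```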