JDOM, XPath and the saga of the invisible namespace

JDOM’s XPath implementation has (in my opinion) a big glaring bug with respect to its handling of the default namespace. That’s a namespace that looks a bit like this in the XML:

<?xml version="1.0" encoding="utf-8"?>
<myRootElement xmlns="https://darrenhobbs.com/some/namespace/2008/10/15"
               xmlns:foo="https://darrenhobbs.com/some/foo/namespace">
 <myChildElement>
  ... some stuff ...
 </myChildElement>
 <foo:aFooElement>
  ... some foo stuff ...
 </foo:aFooElement>
</myRootElement>

Note the ‘xmlns=…’, denoting the default namespace, as opposed to ‘xmlns:foo=’, which binds the ‘foo’ prefix to a different namespace.

Let’s say I wanted to run an XPath query for: ‘//myChildElement’:

XPath xPath = XPath.newInstance("//myChildElement");
xPath.addNamespace(Namespace.getNamespace("https://darrenhobbs.com/some/namespace/2008/10/15"));
List nodes = xPath.selectNodes(aDocument);

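This will never work. XPath does not play nicely with default namespaces: in XPath 1.0 an unprefixed name in an expression always means ‘no namespace’, so the query never picks up the default namespace declared in the document. The solution is to register the same namespace URI against a made-up prefix and change the XPath like so:

// "dh" is an arbitrary prefix; only the URI has to match the one in the document
XPath xPath = XPath.newInstance("//dh:myChildElement");
xPath.addNamespace("dh", "https://darrenhobbs.com/some/namespace/2008/10/15");
List nodes = xPath.selectNodes(aDocument);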

The query should then work.  This is not a new problem.
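
For anyone who wants to see the behaviour without JDOM on the classpath, here is a minimal sketch using the JDK’s own javax.xml.xpath API instead (a swapped-in API, not the JDOM calls above). The class name, the countMatches helper and the trimmed-down XML are made up for illustration; the ‘dh’ prefix is as arbitrary as before.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: JDK XPath, not JDOM.
final class DefaultNamespaceXPathDemo {
    static final String NS = "https://darrenhobbs.com/some/namespace/2008/10/15";

    // Cut-down version of the document from the post: one element in the default namespace.
    static final String XML =
        "<myRootElement xmlns=\"" + NS + "\">"
        + "<myChildElement>some stuff</myChildElement>"
        + "</myRootElement>";

    // Counts the nodes matched by expr, with the made-up prefix "dh" bound to NS.
    static int countMatches(String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this the parser ignores namespaces
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "dh".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "dh" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                        ? Collections.singletonList("dh").iterator()
                        : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches("//myChildElement"));    // prints 0
        System.out.println(countMatches("//dh:myChildElement")); // prints 1
    }
}
```

The unprefixed query finds nothing even though the element is plainly there; the prefixed one finds it, whatever prefix you pick, because only the URI is compared.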

Chrome / V8 Javascript performance

According to http://code.google.com/apis/v8/run.html, Chrome’s Javascript engine is 10 times faster than Firefox 3.0: a score of 1152 (Chrome) vs 110 (Firefox), although both take about 8 seconds to run on my machine. This is the danger of benchmarks. You can optimise for any benchmark but the real trick is in choosing which benchmarks to optimise for. And you ignore ‘user time’ at your peril. As a browsing human, I don’t really care how many milliseconds New and Improved Browser shaves off Brand X’s benchmarks. I care about how long I’m waiting for the hourglass to disappear or the spinny thing to stop spinning.

As a developer, of course I went off and found the WebKit benchmark (SunSpider). Results below, in all their ugly unformatted glory, with Firefox in the FROM column and Chrome in the TO column:

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           2.28x as fast     5387.6ms +/- 0.6%   2358.2ms +/- 0.2%     significant

=============================================================================

  3d:                  3.69x as fast      621.6ms +/- 1.1%    168.6ms +/- 4.0%     significant
    cube:              5.35x as fast      229.0ms +/- 1.3%     42.8ms +/- 11.5%     significant
    morph:             2.88x as fast      205.4ms +/- 1.4%     71.2ms +/- 5.7%     significant
    raytrace:          3.43x as fast      187.2ms +/- 2.1%     54.6ms +/- 3.1%     significant

  access:              6.93x as fast      885.6ms +/- 0.9%    127.8ms +/- 4.3%     significant
    binary-trees:      13.7x as fast      112.6ms +/- 0.6%      8.2ms +/- 12.7%     significant
    fannkuch:          8.92x as fast      401.2ms +/- 0.1%     45.0ms +/- 2.0%     significant
    nbody:             4.86x as fast      210.8ms +/- 2.7%     43.4ms +/- 10.0%     significant
    nsieve:            5.16x as fast      161.0ms +/- 1.7%     31.2ms +/- 4.4%     significant

  bitops:              8.42x as fast      796.8ms +/- 0.2%     94.6ms +/- 5.3%     significant
    3bit-bits-in-byte: 26.6x as fast      154.2ms +/- 0.4%      5.8ms +/- 17.9%     significant
    bits-in-byte:      17.8x as fast      217.2ms +/- 0.5%     12.2ms +/- 8.5%     significant
    bitwise-and:       5.51x as fast      178.6ms +/- 0.9%     32.4ms +/- 4.4%     significant
    nsieve-bits:       5.58x as fast      246.8ms +/- 0.2%     44.2ms +/- 7.0%     significant

  controlflow:         28.3x as fast      113.2ms +/- 0.5%      4.0ms +/- 22.0%     significant
    recursive:         28.3x as fast      113.2ms +/- 0.5%      4.0ms +/- 22.0%     significant

  crypto:              5.29x as fast      405.0ms +/- 0.3%     76.6ms +/- 4.8%     significant
    aes:               4.97x as fast      153.2ms +/- 0.7%     30.8ms +/- 6.0%     significant
    md5:               5.27x as fast      126.6ms +/- 1.1%     24.0ms +/- 8.2%     significant
    sha1:              5.74x as fast      125.2ms +/- 0.8%     21.8ms +/- 2.6%     significant

  date:                1.07x as fast      416.2ms +/- 1.1%    389.6ms +/- 1.6%     significant
    format-tofte:      1.23x as fast      261.4ms +/- 1.3%    212.2ms +/- 2.0%     significant
    format-xparb:      *1.15x as slow*    154.8ms +/- 1.0%    177.4ms +/- 1.8%     significant

  math:                3.79x as fast      619.6ms +/- 1.1%    163.4ms +/- 5.6%     significant
    cordic:            3.24x as fast      294.2ms +/- 0.9%     90.8ms +/- 6.2%     significant
    partial-sums:      3.39x as fast      177.8ms +/- 2.9%     52.4ms +/- 11.3%     significant
    spectral-norm:     7.31x as fast      147.6ms +/- 0.5%     20.2ms +/- 2.8%     significant

  regexp:              *1.88x as slow*    305.8ms +/- 10.1%    573.6ms +/- 0.5%     significant
    dna:               *1.88x as slow*    305.8ms +/- 10.1%    573.6ms +/- 0.5%     significant

  string:              1.61x as fast     1223.8ms +/- 3.3%    760.0ms +/- 1.4%     significant
    base64:            1.81x as fast      154.8ms +/- 2.2%     85.6ms +/- 9.5%     significant
    fasta:             3.84x as fast      306.0ms +/- 2.1%     79.6ms +/- 2.6%     significant
    tagcloud:          -                  216.0ms +/- 3.5%    209.0ms +/- 1.5%
    unpack-code:       1.36x as fast      378.0ms +/- 10.1%    278.2ms +/- 2.3%     significant
    validate-input:    1.57x as fast      169.0ms +/- 2.9%    107.6ms +/- 2.4%     significant

That’s looking a bit more believable. Overall, Chrome/V8 comes out a little over twice as fast as Firefox/SpiderMonkey, with individual results varying from nearly 30 times faster to almost 2 times slower. It will be interesting to see how TraceMonkey compares, as TraceMonkey seems to be about 1.8 times faster than SpiderMonkey.
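
As a sanity check on how to read the COMPARISON column, here is a small sketch that assumes ‘N x as fast’ is simply the FROM time divided by the TO time (and ‘as slow’ the other way round). The class and method names are made up; the timings are copied from the table above.

```java
import java.util.Locale;

// Hypothetical helper for reading the benchmark table.
final class SpeedupRatio {
    // "N x as fast" = time before / time after, both in milliseconds.
    static double timesAsFast(double fromMs, double toMs) {
        return fromMs / toMs;
    }

    public static void main(String[] args) {
        // TOTAL row: 5387.6ms -> 2358.2ms
        System.out.printf(Locale.ROOT, "total:  %.2fx as fast%n", timesAsFast(5387.6, 2358.2));
        // regexp row: 305.8ms -> 573.6ms, so invert the ratio to get "as slow"
        System.out.printf(Locale.ROOT, "regexp: %.2fx as slow%n", 1.0 / timesAsFast(305.8, 573.6));
    }
}
```

Under that assumption the two printed figures round to 2.28 and 1.88, which agrees with the TOTAL and regexp rows.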

Javascript is the next Ruby

Ruby is so, like, web 2.0. More than two graduating classes have, er, graduated since Ruby became the next big thing. That makes it nearly your grandad’s social networking application programming language, in these ‘internet speed’ times we live in. This would be the same community that has just realised that network connections that survive beyond a single request are actually quite useful. But that’s my XMPP / HTTP rant, not this one.

I’ve been of the opinion that Javascript is a much underrated and very powerful language that was missing only one key ingredient to become a ‘real’ language and escape the browser: the backing of a big enough commercial entity. Popular programming languages generally have one of two things: either a charismatic and beneficent despot or a powerful company behind them. Examples? Naturally.

  • Perl: Larry Wall
  • Python: Guido van Rossum
  • Ruby: Yukihiro Matsumoto
  • Java: Sun
  • .NET: Microsoft

And now…

  • Javascript: Google

Google are (at the time of writing) on the cusp of releasing their web browser, Chrome. While Chrome has many interesting and cool features, the most interesting is that they’ve written their own Javascript virtual machine, called V8. And the people they’ve got writing it are not short of experience in the realm of VMs. Lars Bak has worked on Self, Strongtalk and the HotSpot Java VM and was (last I heard) working on a Smalltalk VM for embedded devices called OOVM before the company got bought by Esmertec.

Javascript (strictly speaking ECMAScript 4th edition) also seems to pass Steve’s NBL test, and now it has a VM with a large commercial organisation backing it, which was the thing it most obviously needed, in my view, to get traction beyond the browser.

Final piece of evidence? This year JAOO has a Javascript track. It’s time.

Don’t play with a loaded regex, kids

Otherwise you might be buttumed to be buttociated with the consbreastution of the united states of buttinine.

Try the following Google searches:

buttociate

buttinine

buttume

And, my personal favourite: consbreastution

Haven’t laughed so hard all day. But then I’m easily breastillated.

Update: and there’s more! Wouldn’t want to be a buttembly language developer, or worse, get buttbuttinated.

Funny? Laughed my burte off.

Source: http://thedailywtf.com/Articles/The-Clbuttic-Mistake-.aspx

Unsend

The process of migrating from Movable Type involved a bit of skimming over old posts. I was certainly, er, ‘younger’ back then. The me of 2002 wrote a lot of crap nonsense.

Not sure the me of 2008 has significantly improved on that score, except that writing less at least reduces the number of opportunities to open my mouth and put my foot in it.

Comments!

Plunge taken. Comments enabled on new posts. Come on spammers, I’m ready for you.  Or preferably, don’t.

All I need now is a bit of good old-fashioned controversy to kick off some heated ranting.

That seemed easy enough

Huge endorsement to johncompanies hosting for the same-day actioning of my request to switch from an old Red Hat version to a shiny fresh Ubuntu install. On a Sunday. After I decided this morning to impulsively eject Movable Type and switch to WordPress, and change distros while I was at it. Couple of polite emails confirming I’d backed up everything I needed and blam, old server gone, new server running.

There’ll be broken permalinks all over the place, but who uses links to find stuff anymore? It’s all about the search. I’ve maintained the main RSS feed at least, using a bit of mod_rewrite magic.

Just got to get my head around WP now and sort out a theme other than the default.

Update: So it seems that the mod_rewrite rules I used have also ‘preserved’ some of the MT-style archive links but, due to all the posts getting renumbered on import, they all comedically point to different places than they used to. I like that. I think I’ll leave it.

Dang that’s some mighty fine internets you got there

Today felt like the first day I discovered the world wide web. Pure awesome:

Before today I had never even heard of any of these artifacts of amazing. What a couple of idle hours will find. Okay, five. Or so. Roughly. Maybe seven. Anyway.

PS. If it’s not too late to mention, hotforwords might be a bit NSFW. A little. More embarrassing than, you know, dodgy. Maybe shrink the window a bit before clicking. And look over your shoulder. Check for reflections in windows behind you. It’s fine really, just, perhaps, hard to explain if you’re not prepared. She’s got 2 degrees in philology you know. Just explain you only watch it for the etymologies. That excuse always works.