Multicast multithreaded mayhem

Javagroups is cool. Multicast peer-to-peer messaging and remote procedure calls in a nice, easy package. The documentation could be better: figuring out how to make it do stuff invariably involves a short stroll through the source. Good thing it's open source, really.

One thing that did give me trouble was the supplied RpcDispatcher. This handy utility class makes it really easy to do RPC over multicast, upon which can be built such things as grid computing or distributed failover. The flaw is that if you happen to make an RPC call that blocks, all subsequent RPC calls to that node will also block, and probably time out. Javagroups handles client-side timeouts quite well, but the server side can potentially block indefinitely.
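A minimal sketch of one way out: hand each incoming call off to its own thread, so one blocking handler can no longer stall the receiver thread (and every call queued behind it). The Request type and its invoke() and sendResponse() methods are invented here for illustration; this is not JavaGroups' actual API.

// Request is a hypothetical stand-in for whatever the dispatcher
// actually passes around for each incoming RPC.
interface Request {
    Object invoke(Object target) throws Exception; // may block

    void sendResponse(Object result);
}

class ThreadedDispatcher {
    private final Object target; // the object whose methods are invoked remotely

    ThreadedDispatcher(Object target) {
        this.target = target;
    }

    // Called by the single channel receiver thread for each request.
    // Instead of invoking the method inline, hand it off and return
    // immediately so the next request can be received.
    public void handle(final Request req) {
        new Thread(new Runnable() {
            public void run() {
                try {
                    req.sendResponse(req.invoke(target)); // may block, harmlessly
                } catch (Exception e) {
                    req.sendResponse(e); // report the failure to the caller
                }
            }
        }).start();
    }
}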

Fixing this meant I got to play with the deep joys of inter-thread communication and synchronisation. Even deeper joy was found when trying to test it. The only means I've found so far is to liberally sprinkle log statements all over the code, watch the order they appear in, and note when they stop appearing as the code blocks in all the wrong places. Maybe a log4j JUnitAppender class is required, one that can assert that log calls were made with the right messages in the right order? Hmm.
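Such an appender might look a bit like the sketch below. AppenderSkeleton and LoggingEvent are real log4j types, but the class and its assertSequence() method are imagined here, not anything that ships with log4j:

import java.util.ArrayList;
import java.util.List;

import junit.framework.Assert;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

// Records every message it receives, so a test can assert on content
// and ordering after the code under test has run.
public class JUnitAppender extends AppenderSkeleton {
    private final List messages = new ArrayList();

    protected void append(LoggingEvent event) {
        messages.add(event.getRenderedMessage());
    }

    public void close() {
        // nothing to release
    }

    public boolean requiresLayout() {
        return false;
    }

    // Assert that exactly the expected messages arrived, in order.
    public void assertSequence(String[] expected) {
        Assert.assertEquals(expected.length, messages.size());
        for (int i = 0; i < expected.length; i++) {
            Assert.assertEquals(expected[i], messages.get(i));
        }
    }
}

Attach an instance with Logger.getRootLogger().addAppender(...) in the test's setUp(), run the code under test, then call assertSequence().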

O/R Mapping for fun and profit

Back on this old chestnut again…

  • Always use a synthetic primary key.
  • Never use a primary key with business meaning (in case my last point wasn’t clear).
  • Call your primary key something that can’t be mistaken for something with business meaning. ID is usually good. PK works too.
  • Don’t include the table name in any of your column names. They already know which table they belong to.
  • Use Longs as primary keys, and generate them from a sequence.
  • Use unique and ‘NOT NULL’ constraints to identify secondary keys (the ones that DO have business meaning).
  • Use Hibernate.

O/R mapping is a compromise. Bending both your DB and your object layer slightly to acknowledge this fact is far better than bending one of them over backwards to avoid changing the other. Using synthetic primary keys is almost always a good move anyway, and it helps object persistence a lot. Having a consistent primary key name and datatype makes automated persistence (i.e. code generation) much easier.
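For illustration, here's a made-up persistent class that follows those rules; the comments sketch the corresponding Hibernate mapping choices, using Hibernate's hbm.xml vocabulary:

// Customer is invented for illustration. The synthetic key is a Long
// called "id": mapped as <id name="id" type="long"> with a sequence
// generator, so it carries no business meaning.
public class Customer {
    private Long id;

    // A secondary key with business meaning: enforced in the mapping
    // with unique="true" and not-null="true", never used as the PK.
    private String email;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}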

XML is dead, long live… Lisp?

So I had a go at writing a Jython equivalent of an ANT script at the weekend. I reckon it should be possible, but the syntax and structure would have to change, which I wanted to minimise (because if I can keep the changes small, it's conceivable that I could parse ANT's XML and generate Jython automatically, which would be cool). It might be possible to do something with Jython's variable-argument '*args' syntax, but the output could look very odd.

The issue is in the nesting. How to make

<project>
    <target name="dostuff">
        <copy>
            <!-- copy stuff -->
        </copy>
        <javac>
            <!-- javac stuff -->
        </javac>
    </target>
</project>

look like Jython? The closest representation would probably look a bit like:

foo = project(
    target("dostuff",
        copy(srcdir="src", destdir="foo"),
        javac(srcdir="foo", destdir="bar")
    )
)

but it would get hairy with all of ANT’s optional attributes. I’m loath to admit it, but I think the closest direct language representation would be obtained by using Lisp or Scheme. Now that I know about SISC, that might be worth a try.

Alternatives to XML config files

Charles writes here about using SISC to configure Java applications at runtime. SISC is a Java-based implementation of Scheme.

I speculated (at around the same time) about using Jython for the same purpose. Not only do these approaches allow very powerful runtime configuration of your application, but you are also not limited by a schema or DTD: if a method is exposed, you can call it from your script.
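For the Jython version, the embedding can be as small as this sketch. PythonInterpreter is Jython's real embedding class, but the app object and the config.py name are placeholders:

import org.python.util.PythonInterpreter;

// Expose a live object to an embedded interpreter and let a script
// configure it at runtime. Any method on "app" is callable from the
// script; there is no schema or DTD to get in the way.
public class ScriptedConfig {
    public static void configure(Object app) {
        PythonInterpreter interp = new PythonInterpreter();
        interp.set("app", app);       // visible to the script as "app"
        interp.execfile("config.py"); // placeholder script name
    }
}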

Software isn’t engineering

In response to Les’ response:

Back in the days when PL/1 was the cool kid on the block, and projects were measured in hundreds of man-years, software was engineered. Flow charts and diagrams were drawn and checked, everything was documented and approved, and only right at the end was anything actually programmed. We know this approach today as Waterfall. It is generally accepted to be a poor fit for most development projects. It is, however, a very good fit for most 'Engineering' disciplines. I'm a Chemical Engineer by training, and I've been there (albeit briefly). Generally an engineering project starts off with one or more feasibility studies, where you make some broad assumptions and run lots of computer simulations of the system to get a feel for which direction to go in. Then you progressively add detail and refine your assumptions until you have a detailed design that can be used as the basis for procurement and construction. You really don't want to be making big changes during procurement and construction. Lead times for large equipment like compressors can be several months, and concrete only sets once.

How comparable is this to software development? Not very. The closest relative is probably the type of project (thousands of man-years) talked about in The Mythical Man-Month, which sits right at one end of the spectrum, and nowhere near the reality of most software projects. Physical engineering disciplines were developed to tackle very specific physical engineering issues, and trying to shoehorn them onto software projects just isn't going to work.

What we should be doing is developing a spectrum of practices that are specific to our own industry, instead of trying to force it into a mould that simply won’t fit.

Software, Buildings, Bridges & Testing

Les makes some good points in his comparison of software with buildings and other complex activities here.

Comparing software development with the engineering of complex physical objects is always tricky and fraught with risk. There are many similarities (they're both hard to do well), but I think the differences are more profound than the similarities. Les appears to disagree with the current trend of TDD replacing static checking and formal methods as a means of ensuring software quality, drawing a parallel with bridge construction. Bridge construction is heavily rooted in engineering maths and physics: calculating the forces involved and the materials needed to withstand them, along with sample-testing the individual components.

There are two major differences between bridge engineering and software development. The first is the cost of testing. Unit tests are essentially free. They are generally run many times each day while development progresses, and act as a quantifiable measurement of progress, as well as a means of ensuring quality. The other major difference is in the economic breakdown. In a bridge, the design cost is probably less than 30% of the total cost, the rest being materials and construction. In a software system, probably over 90% of the cost is in the design (by which I mean the process by which source code gets written). Construction (i.e. compilation) is basically free. There may be costs associated with deployment, but they are marginal compared to the design costs. In a continuous integration environment, deployment is often automated (the cost is then in the development of the automation), happening several times a day. It's a bit harder to construct and deploy a bridge that often.

All of which means that there is a definite benefit in spending time and effort (i.e. money) formally verifying that your bridge design is correct, as the cost of rebuilding a collapsed bridge is considerably more than the cost of fixing a broken software build. If construction and testing are more or less free, then using TDD and building as often as possible really is the most cost-effective way to develop a software system.