On the nature of time in distributed applications

Tracking the sequence of events in distributed applications is tricky, especially when queues are involved. Messages can back up at certain points, land on dead-letter queues and later be requeued, or arrive out of sequence. As a result, it may not be possible to report accurately on all the in-flight transactions at a given point in time until some period after that time.

Meanwhile, your application may have internal counters and time-based identifiers that roll over periodically (at midnight, for example). So how do you reconcile temporally decoupled events with time-sensitive counters?

One approach is to go back to first principles and use timestamps to tag messages at critical points (for example, the time they are created). Then, when they arrive at their destinations, the timestamp on each message can be reconciled with the internal counter value that was in force at that time. This way, even messages that languish on dead-letter queues and are later requeued can be slotted into the correct place when they finally arrive.
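As a minimal sketch of this idea, assuming a counter that rolls over at midnight UTC, the snippet below stamps a message when it is created and later works out which daily period it belongs to from that stamp rather than from its arrival time. The names Message, new_message and counter_period are hypothetical and only for illustration; a real system would carry the timestamp in whatever header its messaging layer provides.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class Message:
    """A hypothetical message, stamped with its creation time in UTC."""
    payload: str
    created_at: datetime


def new_message(payload, now=None):
    """Tag the message at creation with a timezone-aware UTC timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    return Message(payload=payload, created_at=now)


def counter_period(message):
    """Return the daily period (UTC date) whose counter was in force
    when the message was created, regardless of when it arrives.
    Assumes the counter rolls over at midnight UTC."""
    return message.created_at.astimezone(timezone.utc).date()


# A message created just before midnight, then delayed on a queue
# and delivered a few minutes into the next day.
late = new_message(
    "order-123",
    now=datetime(2024, 3, 1, 23, 59, 58, tzinfo=timezone.utc),
)
arrived_at = datetime(2024, 3, 2, 0, 4, 10, tzinfo=timezone.utc)

period = counter_period(late)
print(period, arrived_at.date())  # prints: 2024-03-01 2024-03-02
```

The message is filed under 1 March even though it turned up on 2 March, which is the behaviour you want for anything that has been sitting on a dead-letter queue.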

This approach only works if your systems agree closely on what time it is. Thankfully, there is a protocol that exists for exactly this purpose, the Network Time Protocol (NTP), and it is built into both Windows and every flavour of Unix-like system. On Windows it's sometimes controlled via the ‘NET TIME’ command, ‘w32tm.exe’ or ‘w32time.exe’, depending on your Windows version.

NTP is a client-server protocol organised into hierarchical tiers of servers (NTP calls them strata), with the authoritative time servers being machines with atomic clocks or other highly accurate time devices. There are many public NTP servers out there, links to which can be found at the site mentioned below.
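To give a feel for how a client works out how far its clock is from a server's, here is a small sketch of the standard offset and round-trip delay calculation described in the NTP specification (RFC 5905). It is not an NTP client, only the arithmetic: the function name ntp_offset_and_delay and the example timings are assumptions made for illustration.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from four timestamps.

    t0: client clock when the request was sent
    t1: server clock when the request was received
    t2: server clock when the reply was sent
    t3: client clock when the reply was received

    The offset estimate assumes the network delay is roughly the same
    in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Illustrative numbers (seconds): the client clock is 0.5 s behind the
# server, each network leg takes 0.1 s, and the server takes 0.01 s
# to turn the request around.
offset, delay = ntp_offset_and_delay(t0=100.0, t1=100.6, t2=100.61, t3=100.21)
print(f"offset ~ {offset:.3f} s, round-trip delay ~ {delay:.3f} s")
```

With these figures the estimate comes out at roughly half a second of offset and a fifth of a second of round-trip delay, matching the scenario in the comments.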

It's really pretty straightforward to set up an internal NTP server group, and because a consistent enterprise-wide picture of time is so useful, it should be done as a matter of course when setting up distributed enterprise systems.

More information: