Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport, CACM v21 n7, 1978

This paper is THE paper that clarified the notion of concurrent operations. It makes clear that the "atoms" of execution (memory changes, instructions, basic blocks, ...) have some mandatory ordering, some incidental ordering, and some that are explicitly unordered. The mandatory ordering corresponds to causation (or at least the cessation of inhibition): control and data dependencies define some operations to "happen before" others. Because this "happens before" relationship is transitive, it defines a partial order on the total set of operations. As this is a "partial" ordering, there are pairs of operations which do not have to happen in one order or the other. For example, the sequential instructions of two different threads might be interleaved in different ways without effect on correctness, and probably will be in different runs of the program. These are the explicitly unordered operations. The incidentally ordered operations are those that could be reordered without affecting program correctness, but are in fact coded with a specific order (control dependence). We will treat these the same as causal dependencies, while wishing for tools that let us treat them as unordered (see years of research in compiler-based program transformations and weakly consistent computer architecture).

Causation dependencies can be quite complex. For example, programs can effect changes in the physical world (actuators) and test changes in the physical world (sensors), so the messaging media (noise, heat conductance, physical position) can be quite complex. These are known as covert channels, in that malicious programs can be expected to resort to them to communicate when administrators deny simpler forms of communication. This treatment assumes that all communication is overt and (at least logically) done with one type of message. It also assumes that the "atoms" of a process are instructions and that there is an instruction-level counter whose value is unique to an instruction and monotonically increasing. It is not an instruction counter per se, as it can take big leaps forward.

A logical clock is an assignment of numbers to operations such that the TOTAL order formed by these numbers is consistent with all mandatory and incidental partial orders. He proposes a specific logical clock, most important as a pedagogical example of the larger points he is making, as follows (sketched in code below):

- a clock value is a machine-local counter concatenated with a machine id
- operations are labelled with the local clock value
- when a message is sent, it is labelled with the current local clock value
- when a message is received, the local clock is reset to a value larger than the maximum of the local clock and the message's clock

Lamport's example, the mutual-exclusion (synchronization) system, is interesting mostly for his assertion that the application logic, other than the logical clock, is simple. It is not. It involves application-specific consensus, a thorny issue in itself. But that is why Lamport also wrote his seminal Byzantine Generals paper.

Lamport then turns to the issue of making progress. The algorithm above works even without any bounds on event times, but with unbounded operation times (message delays especially) one cannot distinguish a failed machine from a slow one, and failure recovery is hard. So Lamport addresses physical clocks based on bounded message times and bounded-drift oscillators.
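A minimal sketch of those clock rules, assuming an asynchronous message-passing system; the class and method names are mine, not the paper's:

```python
class LamportClock:
    """Logical clock: a (counter, machine_id) pair, compared lexicographically."""

    def __init__(self, machine_id):
        self.machine_id = machine_id  # tie-breaker that makes the order total
        self.counter = 0

    def tick(self):
        """Label a local operation; monotonic, but it may leap forward on receives."""
        self.counter += 1
        return (self.counter, self.machine_id)

    def send(self):
        """Stamp an outgoing message with the current local clock value."""
        return self.tick()

    def receive(self, stamp):
        """Reset the local clock past the max of the local and message clocks."""
        msg_counter, _ = stamp
        self.counter = max(self.counter, msg_counter) + 1
        return (self.counter, self.machine_id)


a, b = LamportClock(0), LamportClock(1)
stamp = a.send()          # (1, 0)
print(b.receive(stamp))   # (2, 1): strictly after the send in the total order
```

Since tuples compare lexicographically, the machine id only decides ties between equal counters; any total order built this way is consistent with every "happens before" edge.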
Returning to physical clocks: his approach is to establish a lower bound on the fastest message delivery, so that a message from one machine is guaranteed to arrive at another at a receiver-local time later than the sender-local time at which it was sent. The problem is that this requires clocks to be synchronized more tightly than the minimum message delay, which is rarely practical: minimum message times are often microseconds on fast local area networks, while distributed clock synchronization is rarely better than milliseconds. So in the real world, logical clocks and loosely synchronized distributed clocks co-exist.
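To see the mismatch concretely, a small check with illustrative numbers (mine, not the paper's): the receive-after-send guarantee holds only when the clock skew between machines is smaller than the minimum message delay.

```python
def arrives_later(send_time, min_delay, skew):
    """Does a message arrive at a receiver-local time later than the
    sender-local send time? Assumes drift-free clocks for simplicity.

    send_time : sender's clock reading at the moment of sending
    min_delay : lower bound on message transit time
    skew      : receiver's clock minus sender's clock (may be negative)
    """
    earliest_receipt = send_time + min_delay + skew  # earliest receiver-local arrival
    return earliest_receipt > send_time              # holds iff skew > -min_delay

# Numbers from the note: microsecond LAN delays vs. millisecond clock sync.
print(arrives_later(0.0, min_delay=5e-6, skew=-1e-3))  # False: sync too loose
print(arrives_later(0.0, min_delay=5e-6, skew=-1e-6))  # True: sync tighter than the delay
```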