Distributed Systems: Concepts and Design
To give the student some intuitions and mathematical tools for thinking about the execution of distributed systems, by exploring the notions of physical and logical time and global states.
To give students an appreciation of the utility of synchronized clocks in distributed systems, and of the variability in network delays that stands in the way of accurate synchronization. To understand the key features of Cristian's synchronization algorithm, the Berkeley algorithm and the Network Time Protocol. To understand the utility of logical clocks (Lamport and vector), the rules for updating them and their limitations.
To understand why instantaneous snapshots are impossible and that this is a fundamental property of distributed systems. To understand the practical importance of global states for reasoning about debugging, deadlocks etc. To understand the construction of a snapshot algorithm.
To appreciate the impact of whether we use a synchronous or asynchronous system model on the algorithms we can construct.
The lack of universal time is a fundamental characteristic of distributed systems. However, it is possible to synchronize clocks to reasonable degrees of accuracy, which are adequate for many purposes. Internal synchronization algorithms synchronize clocks together, without reference to UTC; external synchronization algorithms synchronize them to UTC.
Where clock accuracy is an important requirement rather than a convenience (that is, in real-time distributed systems), a more critical analysis is required than that given in the chapter - in particular where clocks may report false values, and where time servers must be assumed to fail occasionally. Questions then arise about the guarantees that can be placed on clock values returned to the user.
Logical clocks enable us to reason about the order in which events occur without reference to real time. The happened-before relation captures events that are related at the level of the application semantics; unfortunately, it can also capture events that are unrelated at this level. The students should appreciate that sometimes hidden causal channels (such as telephones) exist, which do not appear in our system model.
A global state consists of both process and channel states.
A general snapshot algorithm such as Chandy and Lamport's may be replaced by a more efficient, specialized algorithm intended for detecting a specific type of state such as deadlock.
The debugging algorithm that we present is general but extremely inefficient; however it helps us to understand what is involved in distributed debugging.
Students may find it hard to see the point of the material on logical clocks. Concentrate on the happened-before relation and encourage students to think of examples of actions that depend on prior actions.
Many students find the material in 10.5 and 10.6 hard-going. Try formulating the problems more concretely (e.g. a particular debugging example) and setting them the task of finding a solution themselves.
Encourage students to state and prove the properties of the algorithms mathematically (see e.g. [Raynal 1988]).
To help the students understand the notion of global states and consistent global states, try using the metaphor of playing video footage of views of different parts of the same scene, on separate video players. By winding the videos to arbitrary points we can obtain combinations of states, many of which do not make sense (an effect without the cause).