Distributed Systems: Concepts and Design
To extend the ideas of Chapter 12 to deal with distributed transactions. To extend the three methods of concurrency control given in Chapter 12 for use with multiple servers. To study how distributed deadlock may be detected. To study how the all-or-nothing properties of transactions can be ensured in the presence of server failures by means of recovery techniques.
The two phase commit protocol ensures that all the servers in a transaction reach the same decision: to commit or to abort. Timeout actions are included in case servers fail. The protocol has considerable communication costs and can cause severe delays.
Distributed concurrency control is based on local concurrency control at each server. Locks will be held at each server until the coordinator announces the decision and the local commit is completed. As distributed deadlock detection is complex, timeouts on locks are a useful alternative.
A centralized solution to detection of distributed deadlocks is not scalable. Phantom deadlocks are a problem with non-centralized approaches to detection. Edge chasing is a technique for finding cyclic dependencies without storing the entire wait-for graph.
Recovery is concerned with ensuring the properties of failure atomicity and durability. The recovery file consists of a log of all the transactions at a server and can be used to restore its data items. Checkpoints are used to reduce the size of the recovery file. If the two-phase commit protocol has reached a decision to commit, the servers involved must complete it even if they fail repeatedly. The recovery file includes entries to ensure that this is possible.
Students find it hard to remember that servers cannot change their mind after having agreed to commit in the two-phase commit protocol.
The distributed deadlock algorithms are very complicated - they could be omitted.
The approach is to consider an architecture for a distributed transaction in which the servers operate independently until the transaction is ready to commit, at which point a coordinator is required. Students could be asked to say why atomic commitment is difficult and to suggest a solution. They can be asked to consider each of the places in Figure 13.6 where a timeout might be required and to consider the problem of coordinator failure. Discussions of alternative protocols can be based on Exercises 13.1-13.3.
The idea of independent servers can be extended to the discussion of distributed concurrency control. Locking, optimistic concurrency control and timestamp ordering should be revised before asking students to say how serial equivalence between independent servers can break down in each method. Discussion may be based on Exercises 13.6-13.9. This discussion can be extended to discuss how distributed deadlock arises.
The approach to Section 13.6 is that transaction recovery is well understood and has a well-defined fault model, but is not suitable for applications with strict requirements for performance and reliability in the presence of arbitrary faults. Revise Section 12.2 on the all-or-nothing properties of transactions and motivate the need for an intentions list. To explain the logging technique, discuss the transfer of the intentions list and associated data items to the recovery file. Emphasize that entries from different transactions can be interleaved - see Exercises 13.12- 13.16 which explore the interaction of recovery with timestamp ordering and optimistic concurrency control.
Revise the two-phase commit protocol and ask students to suggest what must be written to permanent storage at each step in Figure 13.6. Then compare the results with the table in Figure 13.19.