Distributed Systems: Concepts and Design
This chapter aims to provide students with conceptual models to support their study of distributed systems. The models will motivate the study of many of the design problems and solutions described throughout the book. They are of two types:
These provide a high-level view of the distribution of functionality between components and the relationships between them. They will assist readers to organize their knowledge of system components and the patterns of communication (or interaction) between them. Architectural models determine the distribution of data and computational tasks amongst the physical nodes of the system and are helpful when evaluating the performance, reliability, scalability and other properties of distributed systems.
These are vertical views or slices, representing some key aspects of distributed systems. Each fundamental model represents a set of issues that must be addressed in the design of distributed systems. Of course, the three fundamental models presented in this chapter do not encompass all of the design issues for distributed systems. Rather, they represent key areas and provide examples af a design approach by which related sets of design issues are extracted and reasoned about independently of the rest.
Middleware, as represented for example by CORBA and Java/Jini, provides essential support for distributed application development and is likely to offer greater support in the future. But the end-to-end argument implies that some aspects of communication support cannot always be abstracted away from applications.
The client-server architecture is predominant, but there are many variations, some of which differ markedly from the client-server model. All of those mentioned in Section 2.2 are important.
In an asynchronous system, for example, the Internet, no bounds can be set on process execution speeds, message delays or clock drift rates. In these conditions, it is impossible to synchronize computer clocks. Synchronous systems can be built, by supplying processors with clocks with bounded drift rates and providing sufficient processor and network capacity to satisfy known resource requirements.
Processes and communication channels can fail. The classificaton of their failures is useful for the analysis of failures of protocols. Components that exhibit byzantine or arbitrary failures may do anything at any time. Timing failures occur only in synchronous systems. Most failures in distributed systems are benign (e.g. omission but not byzantine failures). A service may mask the failures of the components from which it is constructed, for example, reliable one-to-one communication may be built by masking omission failures.
The security model discusses the possible threats to processes and communication channels. It introduces the concept of a secure channel, which is secure against those threats.
The end-to-end argument (p. 33) may seem counter-intuitive for those who take a bottom-up approach to system engineering. As Dave Reed points out in [ www.reed.com ], it was the subject of much debate between experienced designers and engineers during the early development of the Internet; the outcome was firmly based on end-to-end primciples (e.g. for TCP, DNS and the Web) and the current scale of the Internet would have been inconceivable had it not been so.
The system architectures of Section 2.2.2 can be related to the Internet. For example, Figure 2.2 could be an illustration of browsers and a web server which uses a local file server. Or Figure 2.3 could illustrate a replicated web service. Figures 2.4 and 2.6 already refer to web examples.
In discussing the synchronous/asynchronous models of distributed systems, try to bring out what is really needed for the former. Exercise 2.11-12 address some of the issues.
Before embarking on the failure model, try asking students to suggest all the possible things that can go wrong in a distributed system. Exercise 2.10 could be used to relate failures to Section 2.2.5 on the use of caching and replication.
When discussing the classification of their failures, mention that the classes will be used again in the discussion of the characteristics of the Internet protocols (UDP and TCP in Chapter 4). Use Exercise 2.14-15 to illustrate theuse of the classification.
To help with the idea of a service masking the failures of the components from which it is constructed, see Exercise 2.16, or discuss the use of checksums in network protocols to mask wrong values (byzantine failures) and replace them with omission failures.
In discussing the integrity property of reliable communication, point out that integrity can be broken both by failures and by security breaches.