Distributed Operating Systems
Total Page:16
File Type:pdf, Size:1020Kb
Distributed Operating Systems ANDREW S. TANENBAUM and ROBBERT VAN RENESSE Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways. This paper is intended as an introduction to distributed operating systems, and especially to current university research about them. After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed. Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden. Categories and Subject Descriptors: C.2.4 [Computer-Communications Networks]: Distributed Systems-network operating system; D.4.3 [Operating Systems]: File Systems Management-distributed file systems; D.4.5 [Operating Systems]: Reliability-fault tolerance; D.4.6 [Operating Systems]: Security and Protection-access controls; D.4.7 [Operating Systems]: Organization and Design-distributed systems General Terms: Algorithms, Design, Experimentation, Reliability, Security Additional Key Words and Phrases: File server INTRODUCTION more convenient to use than the bare ma- chine. Examples of well-known centralized Everyone agrees that distributed systems (i.e., not distributed) operating systems are are going to be very important in the future. CP/M,’ MS-DOS,’ and UNIX.3 Unfortunately, not everyone agrees on A distributed operating system is one that what they mean by the term “distributed looks to its users like an ordinary central- system.” In this paper we present a view- ized operating system but runs on multi- point widely held within academia about ple, independent central processing units what is and is not a distributed system, we (CPUs). The key concept here is transpar- discuss numerous interesting design issues ency. In other words, the use of multiple concerning them, and finally we conclude processors should be invisible (transparent) with a fairly close look at some experimen- to the user. Another way of expressing the tal distributed systems that are the subject same idea is to say that the user views of ongoing research at universities. the system as a “virtual uniprocessor,” not To begin with, we use the term “distrib- as a collection of distinct machines. This uted system” to mean a distributed operat- is easier said than done. ing system as opposed to a database system Many multimachine systems that do not or some distributed applications system, fulfill this requirement have been built. For such as a banking system. An operating system is a program that controls the re- ’ CP/M is a trademark of Digital Research, Inc. sources of a computer and provides its users ’ MS-DOS is a trademark of Microsoft. with an interface or virtual machine that is 3 UNIX is a trademark of AT&T Bell Laboratories. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1986 ACM 0360-0300/85/1200-0419 $00.75 ComputingSurveys, Vol. 17, No. 4, December 1985 420 l A. S. Tanenbaum and R. van Renesse CONTENTS To make the contrast with distributed operating systems stronger, let us briefly look at another kind of system, which we call a “network operating system.” A typical INTRODUCTION configuration for a network operating sys- Goals and Problems tem would be a collection of personal com- System Models puters along with a common printer server 1. NETWORK OPERATING SYSTEMS 1.1 File System and file server for archival storage, all tied 1.2 Protection together by a local network. Generally 1.3 Execution Location speaking, such a system will have most of 1.4 An Example: The Sun Network the following characteristics that distin- File System guish it from a distributed system: 2. DESIGN ISSUES 2.1 Communication Primitives l Each computer has its own private oper- 2.2 Naming and Protection 2.3 Resource Management ating system, instead of running part of 2.4 Fault Tolerance a global, systemwide operating system. 2.5 Services l Each user normally works on his or her 3. EXAMPLES OF DISTRIBUTED own machine; using a different machine OPERATING SYSTEMS invariably requires some kind of “remote 3.1 The Cambridge Distributed Computing System login,” instead of having the operating 3.2 Amoeba system dynamically allocate processes to 3.3 The V Kernel CPUS. 3.4 The Eden Project l Users are typically aware of where each 3.5 Comparison of the Cambridge, Amoeba, V, and Eden Systems of their files are kept and must move files 4. SUMMARY between machines with explicit “file ACKNOWLEDGMENTS transfer” commands, instead of having REFERENCES file placement managed by the operating system. l The system has little or no fault toler- ance; if 1 percent of the personal com- puters crash, 1 percent of the users are example, the ARPANET contains a sub- out of business, instead of everyone sim- stantial number of computers, but by this ply being able to continue normal work, definition it is not a distributed system. albeit with 1 percent worse performance. Neither is a local network consisting of personal computers with minicomputers and explicit commands to log in here or Goals and Problems copy a file from there. In both cases we The driving force behind the current inter- have a computer network but not a distrib- est in distributed systems is the enormous uted operating system. Thus it is the soft- rate of technological change in micropro- ware, not the hardware, that determines cessor technology. Microprocessors have whether a system is distributed or not. become very powerful and cheap, compared As a rule of thumb, if you can tell which with mainframes and minicomputers, so it computer you are using, you are not using has become attractive to think about de- a distributed system. The users of a true signing large systems composed of many distributed system should not know (or small processors. These distributed sys- care) on which machine (or machines) their tems clearly have a price/performance ad- programs are running, where their files vantage over more traditional systems. are stored, and so on. It should be clear by Another advantage often cited is the rela- now that very few distributed systems are tive simplicity of the software-each pro- currently used in a production environ- cessor has a dedicated function-although ment. However, several promising research this advantage is more often listed by projects are in progress. people who have never tried to write a Computing Surveys, Vol. 17, No. 4, December 1985 Distributed Operating Systems l 421 distributed operating system than by those VAXs), each with multiple users. Each user who have. is logged onto one specific machine, with Incremental growth is another plus; if remote access to the other machines. This you need 10 percent more computing model is a simple outgrowth of the central power, you just add 10 percent more pro- time-sharing machine. cessors. System architecture is crucial to In the workstation model, each user has this type of system growth, however, since a personal workstation, usually equipped it is hard to give each user of a personal with a powerful processor, memory, a bit- computer another 10 percent of a personal mapped display, and sometimes a disk. computer. Reliability and availability can Nearly all the work is done on the work- also be a big advantage; a few parts of the stations. Such a system begins to look dis- system can be down without disturbing tributed when it supports a single, global people using the other parts. On the minus file system, so that data can be accessed side, unless one is very careful, it is easy without regard to their location. for the communication protocol overhead The processor pool model is the next to become a major source of inefficiency. evolutionary step after the workstation There has been built more than one system model. In a time-sharing system, whether requiring the full computing power of its with one or more processors, the ratio of machines just to run the protocols, leaving CPUs to logged-in users is normally much nothing over to do the work. The occasional less than 1; with the workstation model it lack of simplicity cited above is a real prob- is approximately 1; with the processor pool lem, although in all fairness, this problem model it is much greater than 1. As CPUs comes from inflated goals: With a central- get cheaper and cheaper, this model will ized system no one expects the computer to become more and more widespread. The function almost normally when half the idea here is that whenever a user needs memory is sick. With a distributed system, computing power, one or more CPUs are a high degree of fault tolerance is often, at temporarily allocated to that user; when least, an implicit goal. the job is completed, the CPUs go back into A more fundamental problem in distrib- the pool to await the next request. As an uted systems is the lack of global state example, when ten procedures (each on a information. It is generally a bad idea to separate file) must be recompiled, ten pro- even try to collect complete information cessors could be allocated to run in parallel about any aspect of the system in one table. for a few seconds and then be returned to Lack of up-to-date information makes the pool of available processors.