LOCUS Distributed Operating System I
Total Page:16
File Type:pdf, Size:1020Kb
The LOCUS Distributed Operating System I Bruce Walker, Gerald Popek, Robert English, Charles Kline and Greg Thiel 2 University of California at Los Angeles Abstract system supports a very high degree of network tran- sparency, i.e. it makes the network of machines ap- LOCUS Is a distributed operating system pear to users and programs as a single computer; which supports transparent access to data machine boundaries are completely hidden during through a network wide fllesystem, permits normal operation. Both files and programs can be automatic replication of storaget supports moved dynamically with no effect on naming or transparent distributed process execution, correct operation. Remote resources are accessed in supplies a number of high reliability functions the same manner as local ones. Processes can be such as nested transactions, and is upward created locally and remotely in the same manner, compatible with Unix. Partitioned operation and process interaction is the same, independent of of subnetl and their dynamic merge is also location. Many of these functions operate tran- supported. sparently even across heterogeneous cpus. The system has been operational for about LOCUS also provides a number of high reliability two years at UCLA and extensive experience facilities, including flexible and automatic replication In its use has been obtained. The complete of storage at a file level, a full implementation of system architecture is outlined in this paper, nested transactions[MEUL 83], uad a substantially and that experience is summarized. more robust data storage facility than conventional Unix systems. All of the functions reported here 1 Introduction have been implemented, and most are in routine use. LOCUS is a Unix compatible, distributed operat- This paper provides an overview of the basic ing system in operational use at UCLA on a set of 17 LOCUS system architecture. The file system, espe- Vax/750's connected by a standard Ethernet s The cially its distributed naming catalog, plays a central role in the system structure, both because file system activity typically predominates in most operating 1 This research was supported by the Advanced Research systems and so high performance is critical, and be- Projects Agency under research contract DSS-MDA-903- cause the generalized name service provided is used 82-C-0189. by so many other parts of the system. Therefore, Current address of Bruce Walker, Charles Kline and the file system is described first. Remote processes Greg Thiel: LOCUS Computing Corporation, 3330 Ocean are discussed next, including discussions of process Park Blvd., Santa Monica, Ca. 90405. creation, inter-process functions and error handling. Initial work was done on DEC PDP-11/45's and VAX An important part of the LOCUS research con- 750's using both 1 and 10 megabit ring networks. cerns recovery from failures of parts of the system, including partition of a LOCUS system into separated Permission to copy without fee all or part of this material is granted but functioning subnetworks. The next sections of provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the this paper discuss the several LOCUS facilities dedi- publication and its date appear, and notice is given that copying is by cated to recovery. First is the merging of the nam- permission of the Association for Computing Machinery. To copy ing catalog; the hierarchical directory system. The otherwise, or to republish, requires a fee and/or specific permission. handling of other object types in the file system is also briefly considered. These recovery algorithms '©1983 ACM 0-89791-115-6/83/010/0049 $00.75 49 are designed to permit normal operation while A substantial amount of the LOCUS filesystem resources are arriving and departing. Last, the pro- design, as well as implementation, has been devoted tocols which LOCUS sites execute in order to main- to appropriate forms of error and failure manage- tain and define the accessible members of a network, ment. These issues will be discussed throughout this i.e. the network topology, are discussed. These pro- paper. Further, high performance has always been a tocols are designed to assure that all sites converge critical goal. In our view, solutions to all the other on the same answer in a rapid manner. problems being addressed are really not solutions at all unless their performance is suitable. In LOCUS, The paper concludes with a set of observations when resources are local, access is no more expensive regarding the use of LOCUS in production settings, than on a conventional Unix system. When especially the value of its network transparent inter- resources are remote, access cost is higher, but face. dramatically better than traditional layered file transfer and remote terminal protocols permit. 2 Distributed Fllesystem Measured performance results are presented in [GOLD 83]. S.I Filesystem overview The LOCUS filesystem presents a single tree 2.2 File Replication structured naming hierarchy to users and applica- tions. It is functionally a superset of the Unix tree 2.2.1 Motivation for Replication structured naming system. There are three major Replication of storage in a distributed filesystem areas of extension. First, the single tree structure in serves multiple purposes. First, from the users' point LOCUS covers all objects in the filesystem on all of view, multiple copies of data resources provide the machines. LOCUS names are fully transparent; it is opportunity for substantially increased availability. not possible from the name of a resource to discern This improvement is clearly the ease for read access, its location in the netwol'k. Such location tran- although the situation is more complex when update sparency is critical for allowing data and programs in is desired, since if some of the copies are not accessi- general to move or even be executed from different ble at a given instant, potential inconsistency prob- sites. The second direction of extension concerns re- lems may preclude update, thereby decreasing avai- plication. Files in LOCUS can be replicated to vary- lability as the level of replication is increased. ing degrees, and it is the LOCUS system's responsibil- ity to keep all copies up to date, assure that access The second advantage, from the user viewpoint, requests are served by the most recent available ver- concerns performance. If users of the file exist on sion, and support partitioned operation. different machines, and copies are available near those machines, then read access can be substantially To a first approximation, the pathname tree is faster compared to the necessity to have one of the made up of a collection of filegroups, as in a conven- users always make remote accesses. This difference tional Unix environment 1. Each group is a wholly can be substantial; in a slow network, it is self contained subtree of the naming hierarchy, in- overwhelming, but in a high speed local network it is cluding storage for all files and directories contained still significant 1. in the subtree. Gluing together a collection of file- groups to construct the uniform naming tree is done In a general purpose distributed computing en- via the mount mechanism. Logically mounting a file- vironment, such as LOCUS, some degree of replica- group attaches one tree (the filegroup being mount- tion is essential in order for the user to be able to ed} as a subtree within an already mounted tree. work at all. Certain files used to set up the user's The glue which allows smooth path traversals up and environment must be available even when various down the expanded naming tree is kept as operating machines have failed or are inaccessible. The start- system state information. Currently this state infor- up files in Multics, or the various Unix shells, are ob- mation is replicated at all sites. To scale a LOCUS network to hundreds or thousands of sites, this "mount" information would be cached. 1 In the LOCUS system, which is highly optimized for remote access, the cpu overhead of accessing a remote page is twice local access, and the cost of a remote open is 1 The term filegroup in this paper corresponds directly to significantly more than the case when the entire open can the Unix term filesystem. be done locally. 50 vious examples. Mail aliases and routing information greater, so one would have less directory replication are others. Of course, these cases can generally be to improve performance. handled by read-only replication, which in general The performance tradeoffs between update/read imposes fewer problems 1. rates and degree of replication are well known, and From the system point of view, some form of re- we have already discussed them. However, there are plication is more than convenient; it is absolutely other costs as well. For example, concurrency con- essential for system data structures, both for availa- trol becomes more expensive. Without replication bility and performance. Consider a file directory. A the storage site can provide concurrency control for hierarchical name space in a distributed environment the object since it will know about all activity. implies that some directories will have entries which With replication some more complex algorithm must refer to files on different machines. There is strong be supported. In a similar way, with replication, a motivation for storing a copy of all the directory en- choice must be made as to which copy of an object tries in the backward path from a file at the site will supply service when there is activity on the ob- where the file is stored, or at least "nearby". The ject. This degree of freedom is not available without principal reason is availability. If a directory entry replication. If objects move, then, in the no replica- in the naming path to a file is not accessible because tion case, the mapping mechanism must be more of network partition or site failure, then that file general.