Proceedings of the 1985 ACM Computer Science Conference-Agenda for Computing Research: The Challenge for Creativity, 1985 March 12-14

The File System of an Integrated Local Network

Paul J. Leach, Paul H. Levine, James A. Hamilton, and Bernard L. Stumpf
Apollo Computer, Inc.
16 Elizabeth Drive, Chelmsford, MA 01824

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1985 ACM 0-89791-150-4/85/003/0309 $00.75

Abstract

The distributed file system component of the DOMAIN system is described. The DOMAIN system is an architecture for networks of personal workstations and servers which creates an integrated distributed computing environment. The distinctive features of the file system include: objects addressed by unique identifiers (UIDs); transparent access to objects, regardless of their location in the network; the abstraction of a single level store for accessing all objects; and the layering of a network wide hierarchical name space on top of the UID based flat name space. The design of the facilities is described, with emphasis on techniques used to achieve performance for access to objects over the network.

1. Introduction

This paper describes the design of the distributed file system for the Apollo DOMAIN operating system. DOMAIN is an integrated local network of powerful personal workstations and server computers ([APOL 81], [NELS 81]); both of which are called nodes. A DOMAIN system is intended to provide a substrate on which to build and execute complex professional, engineering and scientific applications ([NELS 83]). Other systems built following the integrated model of distributed computing include EDEN [LAZO 81] and LOCUS [POPE 81].

Within the DOMAIN system, the network and the distributed file system contribute to this goal by allowing the professional to share programs, data, and expensive peripherals, and to cooperate via electronic mail, with colleagues in much the same manner as on larger shared machines, but without the attendant disadvantage of sharing processing power. Cooperation and sharing are facilitated by being able to name and access all objects in the same way regardless of their location in the network.

Thus, when we say that DOMAIN is an integrated local network, we mean that all users and applications programs have the same view of the system, so that they see it as a single integrated whole, not a collection of individual nodes. However, we do not sacrifice the autonomy of personal workstations to achieve integration: each personal workstation is able to stand alone, but the system provides mechanisms which the user can select that permit a high degree of cooperation and sharing when so desired.

Another reason we say that DOMAIN is an integrated local network is that each machine runs a complete (but highly configurable) set of standard software, which (potentially) provides it with all the facilities it normally needs: file storage, name resolution, and so forth. In contrast are server-based distributed systems, wherein network wide services are provided by designated machines (“servers”) which run special purpose software tailored to providing some single service or small number of services (e.g. Grapevine [BIRR 82], WFS [SWIN 79], and DFS [STUR 80]). DOMAIN has server nodes; however, they are created by configuring the standard hardware and software for a special purpose. A “file server” node, say, is created using a machine with several large disks and system software configured with the appropriate device drivers.

1.1. Organization

The rest of this paper is organized as follows. The remainder of this introduction briefly describes the hardware environment on which the system runs. Section 2 provides an overview of the file system, and breaks it into four major component groups. Section 3 gives a block diagram of the file system structure, and a brief description of each module, locating it within one of the component groups. Sections 4, 5, 6, and 7 each describe one of these component groups. Finally, section 8 focuses on those aspects of the design which we believe have contributed most to the efficiency of the system.

1.2. Hardware Environment

A DOMAIN system consists of a collection of powerful personal workstations and server computers (generically, nodes) interconnected by a high speed local network.

1.2.1. User Interface

Users interact with their personal nodes via a display subsystem, which includes a high resolution raster graphics display, a keyboard and a locating device (mouse, touch pad, or tablet). A typical display has 800 by 1024 pixels, and bit BLT (bit block transfer) hardware to move arbitrary rectangular areas at high speed. Server nodes have no display, and are controlled over the network. More information on the user environment can be found in [NELS 84].

1.2.2. CPU

There are several models of both personal and server nodes. Their ‘tick’ times [LAMP 80] range from .4 to 1.25 microseconds; their maximum main memory ranges from 3.5 megabytes to 8 megabytes. Most personal nodes have 33 to 154 megabytes of disk storage and a 1 megabyte floppy disk, but no disk storage is required for a node to operate. Server nodes configured as file servers can have 300-1000 megabytes or more of disk storage; those configured as peripheral servers can have printers, magnetic tape drives, plotters, and so forth.

All nodes have dynamic address translation (DAT) hardware which supports up to 128 processes, with each process able to address 16 or 256 megabytes of demand paged virtual memory (depending on CPU model). The DAT hardware on some models uses a reverse mapping scheme, similar to that used in the IBM System/38 [HOUD 78]; it is a large, hardware hash table keyed by virtual address, with the physical address given by the hash table slot number in which a translation entry is stored. Other models use a forward mapping scheme, similar to the VAX [DEC 79] or System/370 [IBM 76]. The DAT also maintains used and modified statistics on a per page basis for the use of page replacement software, and access protection controlling read, write and execute access. The differences between the DATs of the different models are abstracted away by an MMU (memory management unit) module.

1.2.3. Network

The network is a 12 megabit per second baseband token passing ring (other ring implementations are described in [WILK 79], [GORD 79]; and reasons for preferring a ring network in [SALT 79], [SALT 81]). Each node’s ring controller provides the node with a unique node ID, which is assigned at the factory and contained in the controller’s microcode PROMs. The maximum packet size is 2048 bytes. The controller has a broadcast capability.

We will not discuss the network further here; for purposes of the file system, all that is required is that it deliver messages with high probability and low CPU overhead. For more information on the ring controller and data link layer protocols see [LEAC 83].

2. File System Overview

The DOMAIN file system is actually made of four distinct components: an object storage system (OSS), the single level store (SLS), the lock manager, and the naming server. (See figure 1 for a block diagram.)

The OSS provides a flat space of objects (storage containers) addressed by unique identifiers (UIDs). Objects are typed, protected, abstract information containers: associated with each object is the UID of a type descriptor, the UID of an access control list (ACL) object, a disk storage descriptor, and some other attributes: length; date and time created, used and modified; reference count; and so forth. Object types include: alphanumeric text, record structured data, IPC mailboxes, DBMS objects, executable modules, directories, access control lists, serial I/O ports, magnetic tape drives, and display bit maps. (Other objects which are not information containers also exist. UIDs are used to identify processes; and to identify persons, projects, organizations, and protected subsystems for authentication and protection purposes.) The distributed OSS makes the objects on each node accessible throughout the network (if the objects’ owners so choose by setting the objects’ ACLs appropriately). The operations provided by the OSS on storage objects include: creating, deleting, extending, and truncating an object; reading or writing a page of an object; getting and setting attributes of an object such as the ACL UID, type UID, and length; and locating the home node of an object. The OSS automatically uses a node’s main memory as a cache of recently used pages, attributes, and locations of objects, including remote ones. It does nothing to guarantee cache consistency between nodes; however, it does provide mechanisms that the lock manager can

3. File System Structure

Figure 1 shows a block diagram of the file system. Each of the major component groups is indicated by a different form of shading. The arrows between blocks indicate call dependencies; in addition, all modules above the “pageable” boundary have an implicit dependency on the SLS.
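
As a rough illustration of the OSS operations listed in Section 2 (create, delete, extend, truncate, read or write a page, get and set attributes, locate), the following Python sketch models a flat, UID-addressed object space for a single node. It is not the DOMAIN implementation: the class and method names, the page size, and the UID format (node ID plus a per-node counter) are all assumptions made for the example, and caching, protection checks, and network access are omitted.

```python
# Hypothetical sketch of a flat, UID-addressed object store.
# All names, PAGE_SIZE, and the UID format are illustrative assumptions.
import itertools
from dataclasses import dataclass, field

PAGE_SIZE = 1024  # assumed page size, chosen only for this example


@dataclass
class StoredObject:
    type_uid: int                               # UID of the type descriptor
    acl_uid: int                                # UID of the ACL object
    length: int = 0                             # length in bytes
    pages: dict = field(default_factory=dict)   # page number -> bytes


class ObjectStore:
    """Flat object space for one node; objects are named only by UID."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._objects = {}
        self._counter = itertools.count(1)

    def create(self, type_uid, acl_uid):
        # Assumed UID scheme: (node ID, per-node counter).
        uid = (self.node_id, next(self._counter))
        self._objects[uid] = StoredObject(type_uid, acl_uid)
        return uid

    def delete(self, uid):
        del self._objects[uid]

    def extend(self, uid, new_length):
        obj = self._objects[uid]
        obj.length = max(obj.length, new_length)

    def truncate(self, uid, new_length):
        obj = self._objects[uid]
        obj.length = min(obj.length, new_length)
        # Drop pages that lie wholly beyond the new length.
        keep = (new_length + PAGE_SIZE - 1) // PAGE_SIZE
        obj.pages = {n: p for n, p in obj.pages.items() if n < keep}

    def write_page(self, uid, page_no, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("data must be exactly one page")
        obj = self._objects[uid]
        obj.pages[page_no] = bytes(data)
        obj.length = max(obj.length, (page_no + 1) * PAGE_SIZE)

    def read_page(self, uid, page_no):
        # Pages that were never written read back as zeros.
        return self._objects[uid].pages.get(page_no, bytes(PAGE_SIZE))

    def get_attributes(self, uid):
        obj = self._objects[uid]
        return {"type_uid": obj.type_uid, "acl_uid": obj.acl_uid,
                "length": obj.length}

    def set_acl(self, uid, acl_uid):
        self._objects[uid].acl_uid = acl_uid

    def locate(self, uid):
        # Single-node sketch: the home node is this node, or unknown.
        return self.node_id if uid in self._objects else None


store = ObjectStore(node_id=7)
uid = store.create(type_uid=1, acl_uid=2)
store.write_page(uid, 0, b"a" * PAGE_SIZE)
print(store.get_attributes(uid), store.locate(uid))
```

A distributed version would forward each operation to the object's home node when `locate` reports that the object is remote; that forwarding, and the per-node cache of pages, attributes, and locations described above, are left out of this sketch.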