Robert L. Brown and Peter J. Denning, Research Institute for Advanced Computer Science Walter F. Tichy, Purdue University

Operating systems of 1955 were ing. Local networks, such as Ethernet, control programs a few thou- ring nets, and wideband nets, and net- sand bytes long that scheduled jobs, work protocols, such as X.25, PUP, drove peripheral devices, and billed and TCP/IP, allow large systems to be users. Operating systems of 1984 are constructed from many small ones. much larger both in size and respon- The available hardware has grown sibility. The largest ones, such as rapidly in power and sophistication. Honeywell's or IBM's MVS, In view of these rapid hardware ad- are tens of millions of bytes long. In- vances, we feel compelled to ask termediate ones, such as from whether hardware will eventually ob- AT&T Bell Laboratories or VMS viate software control programs. Is Far from fading out of from Digital Equipment Corporation, the intellectual core recorded in are several hundreds of thousands of texts outmoded? Are the picture, operating bytes long. Even the smallest, most operating systems an outmoded tech- systems continue to pared-down systems for personal nology? On the contrary-we believe meet the challenges of computers are tens of thousands of that the power and complexity of the new hardware develop- bytes long. new hardware intensifies the need for ments. Their success The intellectual value of operating operating systems, that the intellectual lies in abstraction system research was recognized in the core contains the concepts needed for levels that hide early 1970's. Virtually every cur- today's computer systems, and that riculum in computer science and en- operating systems are and will remain nonessential details gineering now includes a course on essential. from the user. operating systems, and texts are nu- merous. The continuing debates- over the set of concepts that should be What is an operating system? taught and over the proper mix be- tween concepts and implementation Before we can explain our view on projects-are signs of the field's vi- the future utility of operating systems, tality. we need to agree on what they are. The Since 1975, personal computers for oldest definition, which says an home and business have grown into a operating system is "a control pro- multibillion-dollar industry. Ad- gram for allocating resources among vanced graphics workstations and competing tasks," describes only a microcomputers have been proliferat- small portion of a modem operating

October 1984 0018-9162/O4/10000/173S01.00 © 1984 IEEE 173 sy.stem's responsibilities, and is hence sharing systems and test the many new than Multics, Unix retains most of its inadequate. operating system concepts. These in- predecessor's useful characteristics, Among the great problems faced by cluded MIT's Compatible Timeshar- such as processes, hierarchical file operating system designers is how to ing System, the University of Man- system, device independence, I/0 re- manage the comnplexity of operations chester Atlas, the University of Cam- direction, and a high-level language at manr levels of detail, from hard- bridge Multiple Access System, IBM . Unix dispensed with virtual wsare opeirations that take one billionth TSS/ 360, and RCA Spectra/70. The memory and the detailed protection of a second to softw are operations most ambitious project of all was system and introduced the pipe. It of- that take tenas of seconds. An early Multics (short for Multiplexed Infor- fered a large library of utility pro- stratagemi w as infortination hiding mation and Computing Service) for grams that were well integrated with confining the details of managing a the General Electric 645 (later re- the command language. Most of Unix class of "objects" within a module is written in a high-level language (C), that has a good interface with its users. allowing it to be transported to a wide With information hiding, designers Of great concern to OS designers variety of processors, from main- can protect themselves from extensive is how to manage complex frames to personal computers. 6,7 reprogramming if the hardware or operations at many levels of In systems with multiple Unix ma- some part of the software changes: the detail. chines connected by a high-speed local change affects only the small portion network, it is desirable to hide the of the software interfacing directly locations of files, users, and devices with that system component. This namled Honeywell 6180) processor. 3 from those who do not wish to deal principle has been extended from Multics simultaneously tested new with those details. Locus, a distributed isolated subsystems to an entire concepts of processes, interprocess version of Unix, satisfies this need operating system. The basic idea is to comiimunication, segmented virtual through a directory hierarchy that create a hierarchy of abstraction levels memory, page replacemeint, linking spans the entire network.8 MOS, so that at any level we can ignore the new segments to a computation on de- another distributed Unix, also uses a details of what is going on at all lower imiand, automatic mtiltiprogrammed global directory hierarchy as well as levels. At the highest level are system load control, access control and pro- automatically migrating processes users, who, ideally, are insulated from tection, hierarchical , device among machines to balance loads. 9 everything except what they want to independence, I/0 redirection, and a In recent years, a large family of acconmplish. Thus, a better definition high-lesel language shell. operating systems has been developed of an operating system is "a set of Another important concept of for personal computers, including softsvare extensions of primitive hard- third-generation systems was the vir- MS-DOS, PC-DOS, Apple-DOS, wvare, culminating in a virtual machine tual machine, a simulated copy of the CP/M, Coherent, and Xenix. All that serves as a high-level program- hiost. Virtual machines were first these systems have limited function, ming environment." tested around 1966 on the M44/44X being designed for 8- and 16-bit micro- Operating systems of this type can project at the IBM T. J. Watson processor chips with small memories. support diverse environments: pro- Research Center. In the early 1970's In many respects, the growth pattern gramming, text processing, real-time sirtual machines were used in IBM's of personal computers is repeating processing, office automation, data- CP-67 system, a timesharing system that of mainframes in the early 1960's base, and hobbyist. that assigned each user's to its for example, multiprocess oper- own virtual copy of the IBM 360/67 ating systems for microcomputers machine. This system, which has been have appeared only recently in the Current systems moxed to the IBM 370 machine, is form of pared-down, Unix-like sys- now called VM/370.4,' Because each tems such as Coherent and Xenix. Be- Most operatiing systems for large sirtuLal machine can run a different cause only large firms can sell enough mainframes are direct descendants of copy of the operating system, VM/370 machines to make their operating third-generation systems, such as is effective for developing new operat- systems viable, there is strong pressure Honeywell Multics, IBM MVS, VM/ ing systeim.s within the current oper- for standard operating systems. The 370, and CDC Scope. These systems ating systemn. However, because vir- emerging standards are PC-DOS, CP/ introduced important concepts such as tial machines are well isolated, com- M, and Unix. timesharing, multiprogramming, vir- municatioin among them is expensive Research on operating systems con- tual memory, sequential processes and awkward. tinues. Numerous experimental sys- cooperating via semaphores, hierar- Perhaps the most influential current tems are exploring new concepts of chical file systems, and device-in- operating system is Unix, a complete system structure and distributed com- dependent I /O. reengineering of Multics, originally putation. The operating system for the Duritng the 1960's, many projects for the DEC PDP computers. Al- Cambridge CAP machine exploits the x crc established to construct time- thotugh an order of magnitude smaller hardware's microcode support for

174 COMPUTER capability addressing to implement a problems for OS designers to solve. In ticular operating system but rather in- large number of processes in separate- other words, the need for operating corporates ideas from several systems, ly protected domains. Data abstrac- systems is stronger than ever. including facilities for distributed pro- tion is easy to implement on this cessing. machine. 10 Each level is the manager of a set of Star OS, an operating system for A model operating system objects, either hardware or software, the Cm* machine, supports the "task the nature of which varies greatly force," a group of concurrent pro- Overview. In the hierarchical struc- from level to level. Each level also cesses cooperating in a computation. ture of a model operating system, deflnes operations that can be carried Star OS also uses capabilities to con- functions are separated according to out on those objects, obeying two trol access to objects. Another oper- their characteristic time scales and general rules: ating system for the Cm* machine, their levels of abstraction. Table 1 * Hierarchy. Each level adds new called Medusa, is composed of several shows an organization spanning 15 operations to the machine and "utilities," each of which implements levels. It is not a model of any par- hides selected operations at lower a particular abstraction such as a file system. Each utility can include several parallel processes running on separate processors. There is no cen- tral control. II Table 1. An operating system design hierarchy. Grapevine, a distributed database LEVEL NAME OBJECTS EXAMPLE OPERATIONS and message delivery system used 15 Shell User programming Statements in shell widely within the Xerox Corporation, environment scalar data, language contains special nameservers that can array data find the locations of users, groups, and other services when given their 14 Directories Directories Create, destroy, attach, detach, symbolic names. Because Grapevine search, list has no central control, it can survive 13 User Processes User Process Fork, quit, kill, suspend, resume the failures of the nameserver ma- chines. 12 It does not provide all the 12 Stream I/O Streams Open, close, read, write services available in a high-level pro- gramming environment, however, so 1 1 Devices External devices and Create, destroy, open, close, peripherals such as read, write it is not considered a true operating printer, display, keyboard system. The kernel is an experimental 10 File System Files Create, destroy. open, close, system aiming for efficient, uniform read, write interfaces between system compo- 9 Communications Pipes Create, destroy, open, close, nents. A complete copy of the kernel read, write runs on each machine of the network and hides the locations of files, devices, and users. V is a descendant 8 Capabilities Capabilities Create. validate, attenuate of Thoth, an earlier system worked on by the author of V. 13,14 7 Segments Read, write, fetch The Provably Secure Operating 6 Local Secondary Blocks of data. device Read, write, allocate, free System, a level-structured system, has Store channels high-level code that has demonstrated success in the context of a rigorous, 5 Primitive Pro- Primitive process. Suspend, resume, hierarchical design methodology de- cesses semaphores, ready list wait, signal veloped at SRI International. '5 Al- 4 Fault-handler programs Invoke, mask, unmask, retry though it was intended for secure computing, PSOS explored many 3 Procedures Procedure segments, Call Mark stack. call, principles that can help any operating stack, display return toward the of system goal provable 2 Instruction Set Evaluation stack, micro- Load, store, un_op, correctness. program interpreter bin op, branch, array ref, These examples clearly illustrate etc. that the new technology, far from diverting interest from operating 1 Electronic Circuits Registers. gates, buses. Clear, transfer, complement, systems, has created new control etc. activate, etc. October 1984 175 levels. The operations visible at a The first four levels correspond mer the illusion of having a main giveen level form the instruction roughly to the basic machine as it is memory space large enough to hold set of an abstract machine. received from the manufacturer, al- the program and all its data even if the Hence, a program written at a though there is some interaction with available main memory is much c_isvc level can invoke visible the operating system. For example, smaller. Software at this level han- operations of lower levels but no interrupts are generated by hardware, dles the interrupts generated by the opei-ations of higher levels. but the -handler routines are hardware when a block of data is ad- Inl/orniai(ition hiditng. The details part of the operating system. dressed that is not in the main mem- of how an object is represented or Level 5 adds primitive processes, ory; this software locates the missing xi here it is stored are hidden which are simply single programs in block in the secondary store, frees w ithin the level responsible for space for it in the main store, and re- that type. Hence, io part oh anl quests level 6 to read in the missing object cani be chainged except by The principle of data abstraction block. applyingc an authiorized operation embodied in the levels model Level 8 implements capabilities that dates back to 1966. are unique internal addresses for soft- at higher levels. The principle of data abstractioin ware objects definable capabilities are read but in the levels mnodel traces At this level, emlbodied A validate back to Dennis and Van Horn's 1966 not altered. operation of execution. The informa- of higher level wvhich emphasized a simple the course enables the progammer paper-, to a actual pa- interface between users and the tion required specify primitive procedures to verify that is its stateword, a data struc- are capabilities of the ex- kerinel. "I The first instance of a work- process rameters the values of the ing operating system with a kernel tore that can hold pected types. This level level 8, the operating spannin, several levels was reported recisters in the processor. Through a deals exclusively with the bN Dijkstra in 1968. The idea has provides operation, system v the processor's atten- a single machine. At the been extenided to geinerate families of fhich transfers resources of process to another by however, the operating oper ating systems for related tion from one next level, of the first and a larger imachines'5 and to increase the por- saxing the stateword system begins to encompass the second. devices tabilitv ot anl operating system loading the stateword of world including peripheral a scheduler that and kernel. 'lTe PSOS is the first com- This level contains such as terminals and printers list" of avail- to the net- plete level-structured system reported selects, from a "ready other computers attached next to run world, pipes, files, anid tormally pioved correct in open able processes, the process work. In this liter altlore after the current process is switched devices, user processes, and directories off the processor. This level also pro- can be shared among all the machines. Single-machine levels: 1-8. l evels sides semaphores, the special vari- I iYoUch 8 in Table I are called sin- ables used to cause oine process to Multimachine levels: 9-14. Every olem-achine lev els because their stop and wait until another process object in the system has two names: its operations are well understood firom has signalled the completion of a external name, a string of characters pirimitiv e machines Land require little task. Another characteristic of this having some meaning to users, and its imiocil'ication t'or- advanced operating level is that hardware is easy to imple- internal namne, a binary code used by AN stems. nment. 1' Primitive processes are anal- the system to locate the object. The The lowe t levels include the haid- ooous to the system processes in user controls mapping from external \arc ot the sstcmll. Level 1 is theelec- PSOS and the lightweight processes to internal names by means of direc- tirollicc oi-Ct1itr' where objects are irr V octs. tories. The operating system controls IdCcisters, cates, memory cells, anid Level 6 handles access to the sec- mapping from internal names to phys- tthe like, andicl operations are clcarilgL ondary-storage devices of a particular ical locations and can move objects rIcisiters, r eadilne2 ImeCImlOrF cells, and machine. The programs at this level among several machines without af- tihe likc. tCVelI 2 adds the piocessor 's are responsible for operations such as fecting any user's ability to use those illStl-LlctiOnl Set, which carn deal with positioning the head of a disk drive objects. This principle, called delayed somcwhhat Imore absti act entitis Such anid transferring a block of data. Soft- binding, was important in third-gener- sS anl nc aloLation stack aInd an array of ware at a higher level determines the ation operating systems and is even meimicor V locations. level 3 adds the address of the data on the disk and more important today. I concept of a pr ocedoure and the places a request for it in the device's To hide the locations of all sharable OCperatioils ot call and ret urni. L evel 4 queue of pending work; the requesting objects, both external and internal ifit todcLies interr opts arnd a mecha- pFrocess then waits at a semaphore un- names must be global, that is be inter- ii oior ir1voking special procedures til the transfer has been comiipleted. pretable on any machine. Unique ex- I-!lc't1the iprOCe OI - rCCcix'(.IteIs inter- Leel 7 is a standard virtual mem- ternal names can be constructed as I'll pt Ioa't11 ors, a scher1ie that gives the program- path names in the directory hierarchy

1 76 COMPUTER (defined at level 14), while unique in- mation contained in pipes, files, and objects themselves. Thus, when a ternal names are provided by capabil- devices is regarded simply as streams directory of devices is searched for the ities (level 8). If the local network of bytes; requests for reading or string "laser," the result returned is communication system (level 9) is effi- writing move segments of data be- merely a capability for the laser cient, software at the higher levels can tween streams and a user process. Sec- printer. The capability must be passed obtain access to a remote object with ond, a user process is programmed to to a program at level 11, which little penalty. request all input and output via ports, handles the actual transmission to that Level 9 is explicitly concerned with attached by the OPEN operation at printer. communication between processes, runtime to specific pipes, files, or Level 15 is the shell, so called which can be arranged through a devices. because it is the level that separates the single mechanism called a pipe. A pipe user from the rest of the operating is a one-way channel: a stream of data system. The shell interprets a high- flows in one end and out the other. A The level structure is designed to level command language through request to read items is delayed until enable software verification, which the user gives instructions to the they are actually in the pipe. A pipe installation, and testing. system. Incorporating a listener pro- can connect two processes on the same gram that responds to a terminal's machine or on different machines keyboard, it parses each line of input equally well. A set of pipes linking Level 13 implements user processes, to identify program names and pa- levels in all the machines can serve as a which are virtual machines executing rameters, creates and invokes a user broadcast facility, which is useful for programs. It is important to distin- process for each program, and con- finding resources anywhere in the net- guish the user process from the prim- nects it as needed to pipes, files, and work.21 Pipes are implemented in itive process of level 5. All informa- devices. Unix6 and have been copied in recent tion required to define a primitive pro- systems such as i Max22 and Xinu.23 cess can be expressed in the stateword General comments on structure. Level 10 provides for long-term that records the contents of the The level structure is a hierarchy of storage of named files. While level 6 registers in the processor. A user pro- functional specifications designed to deals with disk storage in terms of cess includes not only a primitive pro- impose a high degree of modularity tracks and sectors-the physical units cess, but also a virtual memory con- and enable incremental software veri- of the hardware-level 10 deals with taining the program and its work- fication, installation, and testing. more abstract entities of variable space, a list of arguments supplied as In a functional hierarchy, a pro- length. Indeed, a file may be scattered parameters when the process was gram at one level may directly call any over many noncontiguous tracks and started, a list of objects with which the visible operation of a lower level. No sectors. To be examined or updated, a process can communicate, and certain information flows through any inter- file's contents must be copied between other information about the context in mediate level. The level structure can virtual memory and the secondary which the process operates. A user be completely enforced by a , storage system. If a file is kept on a process is much more powerful than a which can insert procedure calls or ex- different machine, level 10 software primitive process. pand functions in-line. 18 A recent ex- can create a pipe to level 10 on the Level 14 manages a hierarchy of ample of its use is Xinu, an operating file's home machine. directories that catalogs the hardware system for a distributed system based Level 11 provides access to external and software objects to which access on LSI 11/02 machines.23 input and output devices such as must be controlled throughout the It is important to distinguish the printers, plotters, and the keyboards network: pipes, files, devices, user level structure discussed here from the and display screens of terminals. processes, and the directories them- layer structure of the International There is a standard interface with all selves. The central concept of a direc- Standards Organization model of these devices, and a pipe can again be tory is a table that matches external long-haul network protocols.24 In the used to gain access to a device attached names of objects with capabilities con- ISO model, information is passed to another machine. taining their internal names. A hierar- down through all layers on the sending Level 12 provides a means of at- chy arises because a directory can in- machine and back up through all taching user processes interchangeably clude among its entries the names of layers on the receiving machine. Since to pipes, files, or 1/0 devices. The idea subordinate directories. Level 14 en- each layer adds overhead to a data is to make each fundamental opera- sures that subhierarchies encached at transmission, whether or not that tion of levels 9, 10, and 11 (OPEN, each machine are consistent with one overhead is required, models for long- CLOSE, READ, and WRiTE) look another. haul network protocol structure may the same so that the author of a pro- The directory level is responsible not be efficient in a local network. 8 gram need not be concerned with the only for recording the associations A significant advantage of func- differences in these objects. This between the external names and tional levels over information-trans- strategy has two steps. First, the infor- capabilities; other levels manage the ferring layers is efficiency: a program

October 1984 177 that does not use a given function will without having to rely on a central tions that cannot easily be verified a experience no overhead from that mapping mechanism-a strategy that priori. System procedures must check function's presence in the system. For enhances not only reliability in a at runtime only for the presence of ex- example, procedure calls will validate distributed network, but also efficien- pected capabilities as parameters capabilities only when they are ex- cy because central mechanisms are because the calling programs may be pected. Common objects (such as prone to be bottlenecks. unverified; other aspects of parameter pipes, files, devices, directories, and Operating system designers ought checking can be performed by a com- user processes) are implemented by to take reasonable steps to verify that piler. their own levels rather than as new each level of the operating system "types" within a general type-exten- meets its specifications. This goal also Distributed capabilities: level 8. The sion scheme. 25 serves efficiency as well as reliability, external names of sharable objects are Each level should be able to locate since runtime checks can be omitted character strings of arbitrary length local objects by their internal names from system code except for condi- that have meaning to users. Because these strings are difficult to manip- ulate efficiently, the operating system provides internal names for quick ac- cess to objects. One purpose of level 8 is to provide a standard way of repre- senting and interpreting internal names for objects. To prevent a process from applying invalid operations to an object with a known internal name, the operating system can attach a type code and an access code to an internal name. The combination ofcodes (type, access, in- ternal name) is called a capability. All processes are prevented from altering capabilities. The system assumes that because a process holds a capability for an object it is authorized to use that object. Processes are thus respon- sible for controlling the capabilities they hold. The simplest way to protect capabil- ities from alteration is to tag the memory words containing them with a special bit and to permit only one in- struction, "create-capability," to set that bit. 10,26 The IBM System 38 is a recent example of an efficient system using tags to distinguish capabilities from other objects in memory.27 Ca- pabilities were first proposed as an ef- ficient method ofimplementing an ob- ject-oriented operating system 16 and have continued to be used primarily for this reason'22,25,27 All existing implementations of capabilities are based on a central mechanism for mapping the internal Figure 1. The storage structure for representing objects consists of a chain name to an object. These mappings starting with a capability, through a local map under the control of the are direct extensions of virtual mem- object's level, through a descriptor block, to the object itself. Changing the object's location requires no change in any capability. The level generates in- ory addressing schemes. 2,26 Unfor- dex numbers locally when it creates objects. In this example, the process tunately, a central mapping scheme holds two T capabilities; one object is in a segment in virtual memory and the cannot be used with a distributed other is in a block of secondary storage. system because the component ma-

178 COMPUTER chines may fail. Consequently, the and stores in it the given initial value, must be applied to the object denoted responsibility for mapping must be sets up a descriptor block, finds an by T_cap. The compiler can validate distributed by allowing each pro- unused index and sets up the entry in that the first actual parameter on any cedure of levels 9 through 14 on each the local map, and finally creates call to APPLY_OP is indeed of type T machine to read and interpret locally a capability of type T (denoted by using another operation of level 8, the fields of validated parameter "T_cap"). called VALIDATE, which checks that capabilities. Creating a capability is a critical this parameter is a capability whose The storage structure and mapping operation. Level 8 implements a type code is T and whose access code scheme for capabilities are illustrated special operation for this purpose: enables operation OP. VALIDATE in Figure 1. The name field of capa- T_cap: = CREATE_CAP(I) can also be used to verify the presence bility type T consists of a code M for where I is the local index number of other capabilities among the other the machine on which the capability chosen by level L. When used inside parameters. was created and index number I. The the CREATE_T operation on ma- A procedure may reduce the access machine number is needed because chine M, CREATE-CAP constructs a rights of a capability it passes to some capabilities (those for open capability (T, A, M, I), sets to 1 the another procedure by using the level 8 pipes, files, and devices) can be used capability bit of the memory word operation ATTENUATE. (Table 3 only on the issuing machine. The ac- containing it, and returns the result. summarizes the operations imple- cess code specifies which Toperations The code for M comes from a register mented at level 8.) can be applied to the object. Index in the processor. The code for A is the number I is used by the level in charge one denoting maximum access. The of Tobjects on machine Mto address code for T comes Table 2: Capability type marks and from a field in the their abbreviations. a descriptor block for the given object. program status word, a processor The descriptor block records control register that also contains the program TYPE information about an object, time and counter of the current procedure. The LEVEL MARK ABBREVIATION date created and last updated, and must be set up to generate current size and 14 Directory dir attributes of the ob- PSW = T only for the CREATE_T ject. The location of the descriptor procedure and PSW = null for all 13 User process up block denotes the location of the ob- other procedures. CREATE-CAP ject-moving the descriptor block 11 Device dev fails if executed when PSW = null. Open device op dev from one machine to another effec- The procedures for applying opera- tively moves the object. tions to a given object have the generic 10 File file Procedures implementing opera- form Open file op file tions at levels 9 through 15 must con- form to certain standards that ensure APPLY_OP(T_cap, parameters) 9 Pipe pipe the proper use of capabilities. One is which means that OP (parameters) Open pipe op pipe an agreement on the codes for the ob- ject types, eight of which are listed in Table 2. We use the notation T_cap to denote a T capability, for example, Table 3. Specification of capability operations (level 8). file_cap. FORM OF CALL The remaining standards concern EFFECT the creation of capabilities pointing to T cap. = CREATE CAP (I ) If the type-mark in the current PSW is new objects and the application of non-null, create a new capability with type field set to that mark, access code specific operations to those objects. maximum, machine field the local Suppose level L (L > 8) is the manager machine identifier. and index 1. of This level contains a Tobjects. pro- VALIDATE (p. n. (Ti, al ), ., (Tn, an)) Verify the capability at the caller's vir- cedure to create new objects of type T tual address p. For at least one and one or more procedures to apply i=1, ., n the following must be given functions to T objects. The true: the capability contains Ti in its CREATE operation must use a call of type field and permits access ai. If Ti denotes op pipe. op file, or op dev, the form the machine field must match the iden- = tifier of the local machine. (Fails if these T_cap: CREATE_T(initial-value) conditions are not met.) This procedure performs all the steps cap : = ATTENUATE (cap, mask) Returns a copy of the given capability required to set up the storage for a with the access field replaced by the new of type T: it obtains bitwise AND of mask and the access object space field from in secondary storage for the object cap.

October 1984 179 Communications: level 9. The com- operations are implemented in the until the reader opens its end? What munications level provides a single same way as SEND and RECEIVE happens if either the reader or writer mechanism, the pipe, for moving in- operations for message queues.28 breaks its connection? Questions like formation from a writer process to a When the two processes are on dif- these are dealt with by a connection reader process on the same or dif- ferent machines, the communications protocol. A simple connection pro- ferent machines. The most important level must implement the network pro- tocol, called rendezvous on open and property of the pipe is that a reader tocols required to move information close, has the following properties: must stop and wait until a writer has reliably between machines (Figure 2). * The open-for-reading and the put enough data into the pipe to fill These protocols are much simpler open-for-writing requests may be the request. Level 9 gives the higher than long-haul protocols because con- called at different times, but both levels the ability to move objects gestion and routing control are not returns are simultaneous. among the nodes of the network. needed, packets cannot be received out * The CLOSE operation, executed The external interface presented by of order, fewer error types are possible, by the reader, shuts both ends of the comunications level consists of the and errors are less common. 8 the pipe; when executed by the commands in Table 4. When two The READ and WRITE operations writer, the operation is deferred communicating processes are on the become ambiguous unless both a until the reader empties the pipe. same machine, a pipe between them reader process and a writer process are can be stored in shared memory and connected to a pipe. Should a writer A pipe capability can be stored in a the READ_PIPE and WRITE PIPE be blocked from entering information file or a message and passed to an- other machine over an existing open pipe or by broadcast. A pipe capabili- ty can also be listed in a directory mak- Table 4. Specification of communication level interface (level 9). ing the pipe a global object. In this case, it is like a FIFO file in System-5 FORM OF CALL EFFECT Unix. 29 pipe cap CREATE PIPE () Creates a new empty pipe and returns a The communications level also con- capability for it. (If the caller is a user tains a broadcast operation to permit process! it can store this capability in a 12 to directory entry and make the pipe levels 10, 11, and request map- available throughout the system.) ping information from their counter- DESTROY PIPE (pipe cap) Destroys the given pipe (undoes a parts on other machines. For example, create-pipe operation). if the file level on one machine cannot locally open the file named by a given op pipe cap I= OPEN PIPE (pipe cap, rw) Opens the pipe named by the pipe capability by allocating storage and set- capability, it can broadcast that capa- ting up a descriptor block. Initially, the bility to the file levels of other ma- pipe is empty. If rw =write (rw can be chines; the machine actually holding read or write) the open-pipe capability the file responds with enough infor- has its write permission set and can be used only by the process at the input mation to allow the broadcaster to end of the pipe. If rw =read, the open- complete its pending OPEN opera- pipe capability has its read permission tion. set and can be used only by the process at the output end of the pipe. Does not Files: level 10. Level 10 implements return until both reader and writer have a long-term store for files, named requested connections. (Fails if the strings of bits of known, but arbitrary pipe is already open for writing when length that are potentially accessible rw write or for reading when rw = read ) If both sender and receiver from all machines in the network. are on the same machine, the open- Table 5 summarizes file operations. pipe descriptor block will indicate that To establish a connection with a shared memory can be used for the file, a process must present a file pipe" otherwise a network protocol to the opera- must be used. capability OPEN-FILE tion, which will find the file in second- READ PIPE These have the same effects as the ary storage and allocate buffers for WRITE PIPE READ, WRITE. and CLOSE operations CLOSE PIPE (described later in the section on transmissions between the file and the stream 1/0). caller. The transmissions themselves op pipe cap BROADCAST (msg) Broadcasts a message to all type are requested by READ-FILE and managers in the network that manage WRITE-FILE operations. Each objects of the same type as the calling READ operation copies a segment of local type manager. Returns an open information from the file to the pipe capability for reading responses. caller's virtual memory and advances

180 COMPUTER a read pointer by the length of the seg- ment. Each WRITE operation ap- pends a segment from the caller's vir- tual memory to the end of the file. In a multimachine system, the file level must deal with the problem of nonlocal files. When a process on one machine requests to open a file stored on another machine, there are two feasible alternatives: * Remote Open: Open a pair of pipes to level 10 on the file's home machine; READ and WRITE re- quests are relayed via the forward pipe for remote execution; results are passed back over the reverse pipe. An example is the Berkeley Cocanet system. 30 * File Migration: Move the file from its current machine to the machine on which the file is being opened; thereafter all READ and WRITE operations are local. An example is the Purdue Stork file system. 31 Figure 2. A network protocol must be used when two processes connected by a pipe are on different machines. The WRITE requests of the sender ap- pend segments to a stream awaiting transmission. The sender process The open-connection descriptor transmits the stream as a sequence of packets, which are converted back block for a file, which is addressed by into a stream and placed in the receiving buffer. Each READ request of the an open-file capability, indicates receiver waits until the requested amount of data is in the buffer, then whether READ and WRITE opera- returns it. tions can be performed locally or must interact with a surrogate process on another machine. In the latter case, Table 5. Specification of the files level interface (level 10). the required open-pipe capability is FORM OF CALL EFFECT implanted in the descriptor block by the command. file-cap := CREATE-FILE ( Creates a new empty file and returns a open-file capability for it. (If the caller is a user Figure 3 illustrates the types of process, it can store this capability in a capabilities generated and used during directory entry and make the file a typical file-editing session. available throughout the system.) One important improvement to the DESTROY-FILE (file-cap) Destroys the given file (undoes a basic file system is to allow multiple create-file operation). readers and writers by building into op_file_cap := OPEN-FILE (file-cap, rw) Opens the file named by the file READ and WRITE operations a solu- capability by allocating storage for buf- tion to the "readers and writers" syn- fers and setting up a descriptor block. 32 is to The value of rw (read, write, or both) is chronization problem. Another put in the access field of the open-file use a version control system to auto- capability. The read pointer is set to matically retain different revisions of a zero and the write pointer to the file's file; the file system can then provide length. (Fails if the file is already open.) access to the older versions when READ-FILE These have the same effects as the needed. 33 WRITE FILE READ, WRITE, and CLOSE operations CLOSE-FILE (described later in the section on stream 1/0). Devices: level 11. The devices level implements a common interface to a REWIND (op_iile_cap) Resets read pointer to zero. wide range of external I/O devices, in- ERASE (op_file_cap) Sets file length and write pointer to zero; releases secondary storage cluding terminal displays and key- blocks occupied by the file. boards, printers, plotters, time-of-day

October 1984 181 clock, and optical readers. The inter- face attempts to hide differences in devices by making input devices ap- pear as sources of data streams and output devices as sinks. Obviously, the differences cannot be completely hid- den-cursor-positioning commands must be embedded in the data stream sent to a graphics display, for ex- ample-but a surprising degree of uniformity is possible. Corresponding to each device is a program that translates commands at the interface into in- structions for operating that device. A considerable amount of effort may be required to construct a reliable, robust device driver. When a new device is at- tached to the system, its physical ad- dress is stored in a special file acces- sible to device drivers. Table 6 summarizes the interface for external devices. Stream 1/0: level 12. An important principle adopted in the hypothetical operating system described here is 1/0 independence. At levels 9, 10, and 11, the same fundamental operations (namely OPEN, CLOSE, READ, and WRITE) are defined for pipes, files, and devices. Although writing a block of data to a disk calls for a sequence of Figure 3. The steps of an editing session generate and use various events quite different from that capabilities. (1) Convert external name string to capability cl by a directory- search command. (2) Open file for reading and writing by command c2: = needed to supply the same data to the OPEN-FILE (c1, rw). (3) Copy file into buffer by the command READ-FILE laser printer or to the input of another (c2, , all). (4) Edit contents of buffer.(5) Replace olderversion of file bypairof program, the author of a program commands "ERASE (c2); WRITE-FILE (c2, b, 1)," where I is the length of buf- does not need to be concerned with fer. (6) Close file by command CLOSE-FILE (c2). those differences. All READ and WRITE statements in a program can Table 6. Specification of devices level interface (level 11). refer to I/O ports, which are attached to particular files, pipes, or devices on- FORM OF CALL EFFECT ly when the program is executed. dev-cap := Returns a capability for a device of the given type at the This strategy, an instance of de- CREATE_DEV (type, address) given address. The access code of the returned layed binding, can greatly increase the capability will not include "W" if the device is read- A library only or " R" if the device is write-only. versatility of a program. program (such as the pattern-finding DESTROY_DEV (dev_cap) Detaches the given device from the system (undoes a create-device operation). "grep" program in Unix) can take its input from a file or directly from a ter- opEdev cap:= r Opens the device named by the device capability by OPEN-DEV (dev-cap, rw) allocating storage for buffers and setting up a descriptor minal and can send its output to block. The value in the access field of the open-device another file, to a terminal, or to a capability is the logical AND of rw and the access code printer. Without delayed binding, of the device capability. (Fails if the device is already each program would have to be writ- open.) ten to handle each possible combina- READ_DEV These have the same effects as the READ, WRITE, and tion of source and destination. WRITE_DEV CLOSE operations (described in the section on stream CLOSE_DEV A common model of data must be 1/0). used for pipes, files, and devices. The

182 COMPUTER Abstraction levels in practical operating systems

The level structure of the hypothetical operating system Level 11: Devices. A wide variety of IIO devices is described in this article reflects the "uses-relation": All the available. Most operating systems are equipped with a set functions that a given level of software uses must be on the of interface programs, called "drivers," for standard same or lower levels. The resulting hierarchy of functions devices like keyboards, displays, printers, and tape drives. aids in understanding the software, verifying it, limiting the Most systems allow programmers of special subsystems effects of modifications to it, and incrementally testing it. to add drivers for the more esoteric devices such as TV The first operating system explicitly incorporating levels cameras or robot arms. was Dijkstra's T.H.E. operating system, built around 1968. Levels can be a powerful descriptive tool even for operating systems with code not strictly structured by Level 12: Stream l/O. If all standard devices and programs levels. The following paragraphs illustrate this with sample transmit data in the form of byte streams, considerable flex- functions in existing operating systems of levels 8 through ibility can be achieved in forming interconnections among 15 (listed in Table 1 in the article). programs. These ideas were first tested in Multics in 1968 and are an important aspect of the Unix operating system Level 8: Capabilities. The idea of a hardware-recogniz- (1974). able, unique virtual address, or capability, proposed by Den- nis and Van Horn in 1966, was implemented with special hardware in the Plessey System 250 in 1972, the Carnegie- Level 13: User processes. Most modern operating Mellon in 1975 and Star OS in 1979, the Cambridge systems allow users to construct commands composed of CAP system in 1980, and the Intel Max system for the 432 several programs. The operating system implements such microcomputer in 1981. It was implemented with commands by spawning a process for each component microcode extensions in IBM's System 38 in 1980. program, even for single-user machines such as personal Although most other operating systems are not computers. A process is a simulated virtual machine con- capability-based, they implement many ideas of capability taining an executable program. addressing in the levels that interact with user programs. For example, a system call for opening a file usually returns a to a the be pointer file descriptor block; pointer may Level 14: Directories. Directories were used as catalogs passed to routines for reading and writing the file. The of files in the MIT Compatible Time Sharing System around routine for writing, for instance, checks whether the pointer 1963 with one directory per user. In 1965, CTSS was passed to it is indeed for a file opened for writing. In a modified to permit subdirectories. Many later systems also capability-based system, this check would be performed used hierarchical directories-for example, Multics in automatically by hardware. 1968, Unix in 1972, and VMS in 1978. In Unix, directories can contain entries not only for other directories and files, but in Level 9: Pipes. Advances communications research for devices and pipes. In capability-based systems, such as have spawned a large number of networks that provide the Cambridge CAP, directories can contain entries for any reliable, low-cost, high-speed computer communication object. In Locus, the Unix directory structure is made to in use to- over a wide variety of carriers. Sample networks span all the machines in the network, which makes files ad- and day are long-haul networks, such as DARPA's Arpanet dressable in the same way on every machine and provides GTE's Telenet, and local networks such as Digital Equip- "location transparency," which means that a file's location and Xerox's ment Corporation's Decnet, IBM's SNA, cannot be determined from its name. Ethernet. Many commercial and hobby systems today have the Networks and communication channels are relatively re- primitive, one-level directory structure type used in CTSS. cent additions to operating systems. They have usually been added as utility programs outside the kernel, resulting in a set of loosely coupled machines and the network's be- Level 15: Shell. The shell is the interactive command in- ing visible. When the network functions are integrated into terpreter program that listens to the user's terminal. The the kernel, they can be made to look like interprocess term "shell" was first used in Multics, but the idea of pipes, and the network becomes invisible. treating commands as statements in a very high level pro- gramming language goes back to the early timesharing Level 10: Files. All operating systems offer files for long- systems like the Manchester Atlas in 1959. The command term storage. In addition to the simple sequential file interpreters of most batch-processing systems implement organization presented here, most commercial systems primitive "job control languages" that are often difficult to provide record-oriented files. use.

October 1984 183~~~~~~~~~~~~~~~~~~~~~~~~~~ October 1984 183 simplest possibility is the stream The OPEN operation of level 12 WRITE operations for the three kinds model in which these objects are returns an op T_cap, corresponding of I/0 objects. media for holding streams of bits. to a given T_cap for T, a pipe, file, or The stream model is not used in Corresponding to each of these ob- device. The op_T_cap represents an every operating system. For example, jects is a pair of pointers, r for reading active connection through which data in Multics, because segments are ex- and w for wvriting; r counts the number may be passed efficiently to and from plicit components of virtual memory, of bytes read thus far and w counts the object. The request segment in a separate concept of file is not needed those written thus far. Each READ re- READ and WRIl E operations moves because segments are retained in- quest begins at position r and ad- across such a connection. The CLOSE definitely until deleted by their own- vances r bv the number of bytes read. operation breaks the connection. ers.3 In Multics, the four operations Similarly, each WRITE request begins Because the stream model has of Table 7 are implicit. The first time a at position w and advances w by the already been incorporated into the process refers to a segment, a "miss- number of bytes written. pipes, files, and devices levels, the only ing-binding" interrupt causes the The blocks of data moved by ne", mechanism involves a method of operating system to load and bind that READ or WRITE requests are called switching from a level 12 operation to segment to the process. The process segments; seg(x,n) denotes a con- its counterpart in the level for the type can thereafter read or write the seg- tiguous sequence of n bvtes beginning of object connected to a port. For ex- ment using the ordinary virtual- at position x- in a given data stream. ample, OPEN (T_cap,rw) means addressing mechanism. Certain seg- The exact interpretation of a READ CASE T OF ments of the address space are per- (or WRITE) request depends on pipe: RETURN manently bound to devices, so reading whether the segment comes from a OPEN-PIPE (T_cap, rw); or writing those segments is equivalent pipe, file, or uevice. For example, a file: RETURN to reading or writing the device. The READ request can be applied only at OPENJFILE(T-cap, rw); concept of pipe is missing, but the in- the output end of a pipe, and the dev: RETURN terprocess communication mechanism reader must wait until the writer has OPEN_DEV (T_cap, rw); allows a data stream to be transmitted supplied enough data to fill the re- ELSE: error; from one process to another. quest. An output-only device, such as END CASE a laser printer, cannot be read and an User processes: level 13. A user pro- input-only device, such as a terminal Table 7 summarizes the interpreta- cess is a virtual machine containing a keyboard, cannot be written. tion of OPEN, CLOSE, READ, and program in execution. It consists of a

Table 7. Semantics of l/O operations on objects (level 12). FORM OF CALL PIPE FILE DEVICE op T cap - Verify that T cap refers to an Verify that T cap refers to an Verify that T cap refers to an OPEN (T cap. rw) unopened pipe. Use OPEN PIPE unopened file. Use OPEN FILE unopened device. Use OPEN DEV (T cap, rw) to initialize an open- (T cap, rw) to initialize an open- (T cap, rw) to initialize an open- pipe descriptor block in which file descriptor block in which device descriptor block in which r=w =0 return the op pipe r0= and w= (file length); r0= or w =0, according to whether cap. return the op file cap. the device is input or output" return the op dev cap. READ (op T cap, a. n) Wait until r+n

184 COMPUTER primitive process, a virtual mefnory, a list of arguments passed as param- eters, a list of ports, and a context. Each port represents a capability for an open pipe, file, or device. Context, a set ofvariables characterizing the en- vironment in which the process oper- ates, includes the current working directory, the command directory, a link to the parent process, a linked list of spawned processes, and a signal variable that counts the number -of spawned processes with an incomplete execution. Figure 4 illustrates the for- mat of a user-process descriptor block. A new user process is created by a FORK operation. The creator is called the "parent" and the new process a "child." A parent can exercise control over its children by resuming, sus- pending, or killing them. A parent can stop and wait for its children to com- plete their tasks by a join operation, and a child can signal its completion Figure 4. A user process is a virtual machine created to execute a given pro- gram. It contains a primitive process, a virtual memory holding the given pro- by an exit operation (Table 8). gram, a list of arguments supplied at the time of call, a list of ports, and a set The OPEN operation that appears of context variables. By convention, PORTS[O] is the default input and in Table 8 hides the level 12 OPEN PORTS[1] is the default output; these two ports are bound to pipes, files, or from higher levels and allows level 13 devices when the process is created. The process can open other ports as to store copies of all open-object well after it begins execution. capabilities in the PORTS table. When a process terminates, level 13 8. of user-process operations (level 13). can assure that all open objects are Table Specification closed by invoking the level 12 FORM OF CALL EFFECT CLOSE operation for each entry in up cap:= Allocates a user process descriptor block. Creates a suspended the PORTS table. (FORK(file cap, primitive process and stores its index in a new user-process capabili- params, in, out) ty. Creates a virtual memory and loads the executable file denoted by Directories: level 14. Level 14 is file_cap. Copies the parameters into the ARGS list. Verifies that in responsible for managing a hierarchy and out are capabilities for pipes, files, or devices; if so, opens in for of directories containing capabilities reading and puts the open-capability in PORTS [0], and opens out for for sharable objects. In our hypothet- writing and puts the open-capability in PORTS [1]. ical system, these capabilities are JOIN (A) Waits until caller's context-variable signal is A, then returns. pipes, files, devices, directories, and KILL (up-cap) Terminates the designated user process, but only if it is a child of the user processes; capabilities for open caller. This entails destroying the primitive process and virtual memory, closing open pipes, files, or devices connected to ports, pipes, files, and devices are not releasing the storage held by the descriptor block, and removing the sharable and cannot appear in direc- deleted process from the list of the caller's children. tories. A hierarchy arises because a EXIT Terminates the caller process and adds i to the signal variable of the directory can contain capabilities for parent process. subordinate directories. SUSPEND (up-cap) Puts the primitive process contained within the user process A directory is a table that matches "up-cap'' into the suspended state, but only if the caller is the an external name, stored as a string of parent of process "up-cap." characters, with an access code and a RESUME (up-cap) Puts the primitive process contained within the user process capability. In a tree of directories (Fig- ''up-cap" into the ready state, but only it the caller is the parent of ure 5), the concatenated sequence of process ''up-cap." external names from the root to a op_T_cap := Invokes the OPEN command in level 12, stores a copy of the result in given object serves as a unique, sys- OPEN(T_cap, rw) the next available position in the PORTS table, and returns the result temwide external name for that ob- to the caller (T is pipe, file, or device).

October 1984 185 ject. A directory system of this kind to a given external name. Thus, the capabilities must be presented to their has been implemented on the Cam- directory level is merely a mechanism respective levels for interpretation. In- bridge CAP machine.'0 for mapping external names to inter- formation about object attributes, The principal operation of level 14 nal ones. Only one type of capability such as ownership or time of last use, is a search command that locates and can be mapped to an object at this is not kept in directories but rather in returns the capability corresponding level: a directory capability. All other the object descriptor blocks within the object manager levels. The requirement for systemwide unique names implies that the direc- tory level must also ensure that por- tions of the directory hierarchy resi- dent on each machine are consistent. The methods for replication in a distributed database system are a good way to guarantee this consistency. 34 To control the number ofupdate mes- sages in a large system, the full direc- tory database may be kept on only a Figure 5. A directory hierarchy can be depicted as an inverted tree whose top- small subset of machines (for exam- most node is called "root." Some directories are permanently reserved for ple, two or three) implementing a specific purposes. For example, the dev directory lists all the external devices of the system. The lib directory lists the library of all executable pro- stable store. A workstation or other grams maintained by the system's administration. A user directory contains local system can store copies of the a subdirectory for each authorized user; that subdirectory is the root of a sub- views of the directory database being tree belonging to that user. In Unix, the unique external name of an object is accessed after the accessing user logs formed by concatenating the external names along the path from the root, an separated by an "I" and omitting root. Thus, the laser printer's external name in. Operations that modify entry in is "Idevllaser." a directory must send updates to the stable-store machines, which relay them to affected workstations. Table 9: Specification of a directory manager interface (level 14). Specifications of the directory FORM OF CALL EFFECT level's principal operations are given in Table 9. These operations allow dir_cap := CREATE_DIR (access) Allocates an empty directory. Returns a capability with its permission bits set to the given access code. higher level programs to create objects (This directory is not attached to the directory tree.) and store capabilities for them in DESTROY_DIR (dir_cap) Destroys (removes) the given directory. (Fails it the directories. The table is not a complete directory is not empty.) specification of a directory manager, ATTACH Makes an entry called name in the given directory however; for example, it contains no (obj_cap, dir_cap, name, access) (dir_cap); stores in it the given object-capability command to change the name and ac- (obj_cap) and the given access code. If obj_cap cess fields of a directory entry. denotes a directory, sets its parent entry from the The ATTACH operation is used to self entry of the directory dir_cap. Notifies the create a new entry in a directory. The directory stable store of the change. (Fails if the name already exists in the directory dir_cap, if the access field of the capability returned directory dir_cap is not attached, or if obj_cap by a search operation will be the con- denotes an already attached directory.) junction of the entry's access code DETACH (dir_cap, name) Removes the entry of the given name from the given and the access field already in the directory. Notifies the directory stable store of the capability. change. (Fails if the name does not exist in the given When a directory needs to be at- directory or if the given directory is not empty.) tached to another directory, the AT- obj[cap := SEARCH Finds the entry of the given name in the given direc- TACH operation must also define the (dir_cap, name) tory and returns a copy of the associated capability. parent of the newly attached direc- Sets the access field in the returned capability to the minimum privilege enabled by the access fields of tory. The operation fails if a parent is the directory entry and of the capability. (Fails if the already defined (Figure 6). The DE- name does not exist in the given directory.) TACH operation only removes entries seg := LIST (dir_cap) In a segment of the caller's virtual memory, returns from directories but has no effect on a copy of the contents of the directory. (A user-level the object to which a capability points. program can interrogate the other levels for other in- To destroy an object, the DESTROY formation about the objects listed in the directory, operation of the appropriate level such as the date of last change. must be used. To minimize inadver-

186 COMPUTER tent deletions, the operation to destroy directed by a pipe (the " " symbol) to c7: = CREATE_FILEO; a directory fails if applied to a the input of eqn, which replaces ATTACH(c7, WD, "output", all); nonempty directory. descriptions of equations with the The variable CD holds a capability for The ATTACH and DETACH op- necessary formatting commands. The a commands directory and WD holds erations must notify the stable store output of eqn is then piped to lptroff, a capability for the current working so that changes become effective which generates the commands for the directory. Both CD and WD are part throughout the system. By maintain- laser printer. Finally, > indicates that of the shell's context (Figure 4). ing two conditions, this process can be the output of lptroffis to be placed in The shell then creates and resumes made simple: (1) an empty directory a file named output. If " > output" is user processes that execute the three must first be attached to the global replaced with " laser," the data is in- components ofthe pipeline and awaits directory tree before entries are made stead sent directly to the laser printer. their completion: in it, and (2) a directory must be empty After the components of a com- before being detached. A more com- mand line are identified, the shell ob- RESUME (FORK (cl, -, c2, c3)); plicated notification mechanism will tains capabilities for them by a series RESUME (FORK (c4, -, c3, c5)); be needed if a process is allowed to of commands: RESUME (FORK (c6, -, c5, c7)); JOIN (3); construct a directory subtree before its cl: = SEARCH(CD, "tbl"); root is attached to the global directory c2: = SEARCH(WD, "text"); After the JOIN returns, the shell can tree. c3: = CREATE_PIPEO; kill these processes and acknowledge c4: = SEARCH(CD, "eqn"); completion of the entire command to Shell: level 15. Most system users c5: = CREATE_PIPEO; the user through a "prompt" char- spend a great deal of time executing c6: = SEARCH(CD, "lptroff"); acter. existing programs, not writing new ones. When a user logs in, the oper- ating system creates a user process containing a copy of the shell program with its default input connected to the user's keyboard and its default output connected to the user's display. The shell is the program that listens to the user's terminal and interprets the in- put as commands to invoke existing programs in specified combinations and with specified inputs. The shell scans each complete line of input to pick out the names of pro- grams to be invoked and the values of arguments to be passed to them. For each program called in this way, the shell creates a user process. The user processes are connected according to the data flow specified in the com- mand line. Operations of substantial complex- ity can be programmed in the com- mand language of the Unix shell. For example, the operations that format and then print a file named "text" can be set in motion by the command line: tbl < text eqn lptroff > output The first program is tbl, which scans the data on its input stream and Figure 6. A directory is a table matching an external name string with an ac- replaces descriptions of tables of in- cess code and a capability. Every directory contains a capability pointing to with the necessary format- its immediate parent and a capability pointing to itself; the self capability formation can be used to fill in the parent entry in a new subordinate directory. Because ting commands. The " < " symbol in- directories are at a higher level than files, the file system can be used to store dicates that tbl is to take its input from directories. A directory containing only the self and parent entries is con- the file text. The output of tbl is sidered empty. October1984 187 If the specificatioin "< text" is the command. For example, typing cess will create a shell process con- omitted, the shell connects tbl to the Ip text output nected to the same terminal. When the deefault input, which is the same as its would substitute text for S1 and out- user types a logout command, the shell own, namely the terminal keyboard. put for $2 and so would have exactly process will exit and the login process In this case, the second search com- the same effect as the original com- will resume. miiaind is oumitted and the first fork mand line. opei-ationi is e have used the levels model to (cI, -, PORTS[O], c3) FORK System initialization describe the functions of con- Simnilarly, it ''> output' is omitted, temporary multimachine operating the shell connects lptroffto the default One small but essential piece of an systems and how it is possible to output, the shell's PORTS[l]. operating system has not been dis- systematically hide the physical loca- 11 anl claborate command line is to cussed the method of starting up the tions of all sharable objects, yet be be performiied ofteni, typing it can be- system. The start-up procedure, called able to locate them quickly when given come tedious. Unix encourages users a bootstrap sequence, begins with a a name in the directory hierarchy. to store complicated commands in ex- verv short program copied into the The directory function can be ccutable files called shell-sc ripts that low end of main memory from a per- generalized from its traditional role by become simipler commands. A file manent ROM. This program loads a storing capabilities, rather than file named Ip might be created with the longer program from the disk, which identifiers, in directory entries. No contents then takes control and loads the user machine has to have a full, local tbl < SI eqn lptroff >S2 operating system itself. Finally, the copy of the directory structure; it where the n1aimies of inlput and output operating system creates a special needs only to encache the view with files have beenl replaced by variables login process connected to each ter- which it is currently working. The full $1 aind $2. W"hen the command Ip is minal of the system. structure is maintained by a small inivoked, the variables $1 and $2 are When a user correctly types an iden- group of machines implementing a replaced by the arguments following tifier and a password, the login pro- stable store. special features special features exclusively available to COMPMAIL+ users ... your rofessional library at your fingertips. Type ">(-cspibs"on youLrCOMPMA L+ electronic imail system and ... * scan the index of over 400 titles * search hy title or topic * place a credit card order Your IEEE COMPUTER SOCIETY books will he shipped within 48 houers ot downloading!

Or . . . write for more information or a free catalog to: IEEE Computer Society, 1109 Spring Street, Suite 300, Dept. CM884, Silver Spring, MD 20910.

EIEEE COMPUTER SOCIETY T ._90wM8 THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, INC.

188COMPUTER188 CO MPUTER The model can deal with heteroge- RIACS, and to the National Science 13. D. R. Cheriton, "The V Kernel: A neous systems consisting of general- Foundation, which supported part of Software Base for Distributed Sys- purpose user machines, such as work- this work through grant MCS-8109513 tems," IEEE Software, Vol. 1, No. stations, and special-purpose ma- at Purdue University. 2, Apr. 1984, pp. 19-42. chines, such as stable stores, file 14. D. R. Cheriton, The Thoth System: servers, and supercomputers. Only the Multi-process Structuring and Por- user machines need a full operating tability, Elsevier Science, New York, system; the special-purpose machines 1982. require only a simple operating system 15. P. G. Neumann et al., A Provably capable of managing local tasks and References Secure Operating System, Its Ap- communicating on the network. plications, and Proofs, tech. report The levels model is based on the 1. P. Denning, "Third Generation Com- CSL-1 16, 2nd ed., SRI International, puter Surveys, same principle found in nature to or- Systems," Comnputing Menlo Park, Calif., May 7, 1980. Vol. 3, No. 4, Dec. 1971, pp. 175-212. ganize many scales of space and time. 2. P. Denning, "Fault-Tolerant Oper- 16. J. B. Dennis and E. C. Van Horn, At each level of abstraction are well- "Programming Semantics for Multi- defined rules of interaction for the ob- ating Systems," Comiputing Surveys, Vol. 8, No. 4, Dec. 1976, pp. 359-389. programmed Computations," Comm. jects visible at that level; the rules can ACM, Vol. 9, No. 3, Mar. 1966, pp. be understood without detailed 3. E. I. Organick, The Multics System: 143-155. knowledge of the smaller elements An Examination of Its Structure, The MIT Press, Cambridge, Mass., 1972. 17. E. W. Dijkstra, "The Structure of the making up those objects. The many THE-Multiprogramming System," parts of an operating system cannot be 4. R. P. Goldberg, "Survey of Virtual Comm. ACM, Vol. 11, No. 5, May fully understood without keeping this Machine Research," Computer, Vol. 1968, pp. 341-346. principle in mind. 7, No. 6, June 1974, pp. 3446. 18. A. Habermann, A. L. F. Nico, and 5. IBM Virtual Machine Facility/370: L. W. Cooprider, "Modularization Introduction, tech. report GC20- and Hierarchy in a Family of Oper- 1800-1, IBM, Aug. 1973. ating Systems," Comm. ACM, Vol. Acknowledgments 6. D. M. Ritchie and K. L. Thompson, 19, No. 5, May 1976, pp. 266-272. "The Unix Time-Sharing System," We are grateful to many colleagues Comm. ACM, Vol. 17, No. 7, July 19. P. J. Denning, T. D. Dennis, and J. A. for their advice and counsel on the 1974, pp. 365-375. Brumfield, "Low Contention Sema- and this phores and Ready Lists," Comm. formulation refinement of 7. B. W. Kernighan and R. Pike, The model: namely, J. Dennis, E. Dijkstra, ACM, Vol. 24, No. 10, Oct. 1981, pp. UNIX Programming Environment, 687-699. N. Habermann, B. Randell, and Prentice-Hall, Englewood Cliffs, N. M. Wilkes for early inspirations about J., 1984. 20. P. J. Denning, "Virtual Memory," hierarchical structure; K. Levitt, M. Computing Surveys, Vol. 2, No. 3, 8. G. B. Popek et al., "LOCUS: A Net- Sept. 1970, pp. 154-216. Meliar-Smith, and P. Neumann for work Transparent, High Reliability numerous discussions about PSOS; Distributed System," Proc. Eighth 21. D. R. Boggs, Internet Broadcasting, D. Cheriton, for demonstrating these Symp. Operating Systems Principles, tech. report CSL-83-3, Xerox, Palo ideas in the Thoth system, D. Comer, Dec. 1981, pp. 169-177. Alto Research Center, Palo Alto, for demonstrating these ideas in the 9. A. Barak and A. Litman, "MOS: A Calif. Oct. 1983. Xinu system, and G. Popek, for Multicomputer Distributed Oper- 22. E. Organick, A Programmer's View demonstrating these ideas in the Locus ating System," Software Practice ofthe Intel 432 System, McGraw-Hill, system; D. Schrader for advice on and Experience (to be published). New York, 1983. distributed databases and with help 10. M. V. Wilkes and R. M. Needham, 23. D. Comer, Operating System Design: the document; D. Denning for advice The Cambridge CAP Computer and The XINU Approach, Prentice-Hall, on access controls, capability systems, its Operating System, Elsevier/ Englewood Cliffs, N. J., 1984. and verification; A. Birrell, B. Lamp- North-Holland Publishing Co., New son, R. Needham, and M. Schroeder York, 1979. 24. A. S. Tanenbaum, Computer Net- works, Prentice-Hall, Englewood for advice on "server models" of 11. J. K. Ousterhout et al., "Medusa: distributed systems; and D. Farber, A. An Experiment in Distributed Oper- Cliffs, N. J., 1981. Hearn, T. Korb, and L. Landweber ating System Structure," Comm. 25. W. A. Wulf, R. Levin, and S. P. Har- for advice on networks through the ACM, Vol. 23, No. 2, Feb. 1980, pp. bison, HYDRA/C.mmp, An Experi- CSNET Project. 92-105. mental Computer System, McGraw- 1981. We are also grateful to the National 12. A. D. Birrell et al., "Grapevine: An Hill, Aeronautics and Space Administra- Exercise in Distributed Computing," 26. R. S. Fabry, "Capability-Based Ad- tion, which supported part of this Comm. ACM, Vol. 25, No. 4, Apr. dressing, " Comm. A CM, Vol. 17, No. work under contract NAS2-11530 at 1982, pp. 260-274. 7, July 1974, pp. 403-412. October 1984 189 27. H. M. Levy, Capability-Based Com- 34. P. G. Selinger, "Replicated Data," in puter Systems, Digital Press, Bedford, Distributed Data Bases, F. Poole, ed., Mass., 1984. Cambridge University Press, Cam- 28. H. P. Brinch, Operating System Prin- bridge, 1980, pp. 223-231. ciples, Prentice-Hall, Englewood Cliffs, N. J., 1973. 29. "UNIX, System User's Manual: System V," Issue 1, Bell Laboratories, Western Electric, Jan. 1983, pp. Peter J. Denning is the director of the 301-905. Research Institute for Advanced Computer Science (RIACS) at the NASA Ames 30. L. A. Rowe and K. P. Birman, "A Research Center in Mountain View, Based UNIX Local Network on the California. Previously, Denning was head Operating System," IEEE Trans. of the Computer Sciences Department at Software Engineering, Vol. SE-8, No. Purdue University and assistant professor 2, Mar. 1982, pp. 137-146. of electrical engineering at Princeton 31. J. F. W. F. Robert L. Brown is a staff scientist at the Paris and Tichy, Stork: Research Institute for Advanced Computer University. He received a PhD and an SM An Experimental Migrating File from MIT's Electrical Engineering Depart- tech. Science at the NASA Ames Research Cen- Systemfor ComputerNetworks, ter. He received a bachelor's degree in ment in 1968 and 1965, respectively, and a report CSD-TR-411, Purdue Univer- BEE from Manhattan College in 1964. sity, West Ind., Feb. mathematics from the Ohio Wesleyan Uni- Lafayette, 1983. versity in 1975 and is close to completing Denning's primary research interests are 32. R. C. Holt et al., Structured Concur- work for a PhD in computer science from computer systems architecture, operating rent Programming with Operating Purdue University. His work emphasizes systems, and performance modeling. He Systems Applications, Addison- the modeling of a coherent distributed has published over 120 papers and articles Wesley, Reading, Mass., 1978. system. in these areas since 1967. Denning is a past president of the ACM 33. W. F. Tichy, "Design, Implementa- and has held many editorial positions in- tion, and Evaluation of a Revision Questions about this article can be cluding editor-in-chief of Communications Control System," in Proc. Sixth Intl directed to Peter Denning, RIACS, NASA ofthe ACM and editor-in-chief of ACM's Conf. Software Engineering, Sept. Ames Research Center, MS 230-5, Moffet Computing Surveys. He is a member ofthe 1983, pp. 58-67. Field, CA 94035. New York Academy ofSciences, Sigma Xi, and IFIP Working Group 7.3 on Com- Al puter Performance Modeling. EMPLOYMENT OPPORTUNITIES Advanced Information & Decision Systems (AI&DS) invites qualified candidates to apply for research and programming positions in artificial intelligence R&D. Al&DS is a growing employee-owned company of 63 highly talented people working to develop and implement state-of- the-art solutions to real-world problems in areas such as artificial intelligence, software engineering, image understanding, information management, estimation and control, decision theory, signal processing, and cognitive science. Throughout 1984, positions will be available on projects involving: * Knowledge-Based and Expert * Robotics Walter F. Tichy is assistant professor in the Systems * Distributed Al General Architectures * Intelligent User Interfaces Department of Computer Science at Pur- Knowledge Acquisition * Natural Language Understanding due University. His research interests in- * Hypothesis Formation * Software Development Aids clude operating systems and programming Image and Signal * Information Management Understanding Information Retrieval environments, and he is currently in- Image Processing Database Management vestigating distributed, location-transpar- Information Integration ent, and migrating file systems for hetero- * Planning and Control UNIX Systems Programming * Decision-Making Aids LISP Programming geneous computer networks. In program- ming environments, he is studying version Located in the beautiful San Francisco Bay Area, AI&DS offers a pleasant control tools, delta methods, and module and supportive work environment, with expanding hardware facilities interconnection languages. that include VAXs, Symbolics 3600s, SUNs, and a Vicom image processor linked by Ethernets and connected to the ARPANET. Tichy received his BS in mathematics To find out more about AI&DS and the positions listed above, please and computer science from the Technical phone Mary Margaret Morton or Clif McCormick at (415) 941-3912, or University in Munich, Germany, in 1974, send your resume to: and his MS and PhD in computer science Advanced Information & Decision Systems from Carnegie-Mellon University in Pitts- 201 San Antonio Circle, Suite 286 AI&D burgh, Pennsylvania, in 1976 and 1980, Mountain View, California 94040-1270 E respectively. Tichy is a member of the An equal opportunity employer IEEE, ACM, and Sigma Xi. U.S. citizenship required for many projects COM PUTER