The Distributed V Kernel and Its Performance for Diskless Workstations
July 1983
Report No. STAN-CS-83-973 (also numbered CSL 246)

David R. Cheriton and Willy Zwaenepoel
Computer Systems Laboratory
Departments of Computer Science and Electrical Engineering
Stanford University, Stanford, CA 94305

Abstract

The distributed V kernel is a message-oriented kernel that provides uniform local and network interprocess communication. It is primarily being used in an environment of diskless workstations connected by a high-speed local network to a set of file servers. We describe a performance evaluation of the kernel, with particular emphasis on the cost of network file access. Our results show that over a local network:
1. Diskless workstations can access remote files with minimal performance penalty.
2. The V message facility can be used to access remote files at comparable cost to any well-tuned specialized file access protocol.
We conclude that it is feasible to build a distributed system with all network communication using the V message facility, even when most of the network nodes have no secondary storage.

1. Introduction

The distributed V kernel is a message-oriented kernel that provides uniform local and network interprocess communication. The kernel interface is modeled after the Thoth [3, 5] and Verex [4, 5] kernels, with some modifications to facilitate efficient local network operation. It is in active use at Stanford and at other research and commercial establishments. The system is implemented on a collection of MC68000-based SUN workstations [2] interconnected by a 3 Mb Ethernet [9] or 10 Mb Ethernet [7]. Network interprocess communication is predominantly used for remote file access, since most SUN workstations at Stanford are configured without a local disk.

This paper reports our experience with the implementation and use of the V kernel. Of particular interest are the controversial aspects of our approach, namely:
1. The use of diskless workstations with all secondary storage provided by backend file servers.
2. The use of a general-purpose network interprocess communication facility (as opposed to special-purpose file access protocols) and, in particular, the use of a Thoth-like interprocess communication mechanism.

The more conventional approach is to configure workstations with a small local disk, using network-based file servers for archival storage. Diskless workstations, however, have a number of advantages, including:
1. Lower hardware cost per workstation.
2. Simpler maintenance and economies of scale with shared file servers.
3. Little or no memory or processing overhead on the workstation for file system and disk handling.
4. Fewer problems with replication, consistency and distribution of files.

The major disadvantage is the overhead of performing all file access over the network. One might therefore expect that we use a carefully tuned specialized file-access protocol integrated into the transport protocol layer, as done in LOCUS [11]. Instead, our file access is built on top of a general-purpose interprocess communication (IPC) facility that serves as the transport layer. While this approach has the advantage of supporting a variety of different types of network communication, its generality has the potential of introducing a significant performance penalty over the "problem-oriented" approach used in LOCUS.

Furthermore, because sequential file access is so common, it is conventional to use streaming protocols to minimize the effect of network latency on performance. Instead, we adopted a synchronous "request-response" model of message-passing and data transfer which, while simple and efficient to implement as well as relatively easy to use, does not support application-level use of streaming.

These potential problems prompted a performance evaluation of our methods, with particular emphasis on the efficiency of file access. This emphasis on file access distinguishes our work from similar studies [10, 13]. The results of our study strongly support the idea of building a distributed system using diskless workstations connected by a high-speed local network to one or more file servers. Furthermore, we show that remote file access using the V kernel IPC facility is only slightly more expensive than a lower bound imposed by the basic cost of network communication. From this we conclude that relatively little improvement in performance can be achieved using protocols further specialized to file access.

This work was sponsored in part by the Defense Advanced Research Projects Agency under contracts MDA903-80-C-0102 and N00039-83-K-0431.
2. V Kernel Interprocess Communication

The basic model provided by the V kernel is that of many small processes communicating by messages. A process is identified by a 32-bit globally unique process identifier, or pid. Communication between processes is provided in the form of short fixed-length messages, each with an associated reply message, plus a data transfer operation for moving larger amounts of data between processes. In particular, all messages are a fixed 32 bytes in length.

The common communication scenario is as follows: A client process executes a Send to a server process, which then completes execution of a Receive to receive the message and eventually executes a Reply to respond with a reply message back to the client. We refer to this sequence as a message exchange. The receiver may execute one or more MoveTo or MoveFrom data transfer operations between the time the message is received and the time the reply message is sent.

The following sections describe the primitives relevant to this paper. The interested reader is referred to the V kernel manual [6] for a complete description of the kernel facilities.

2.1. Primitives

Send( message, pid )
    Send the 32-byte message specified by message to the process specified by pid. The sender blocks until the receiver has received the message and has sent back a 32-byte reply using Reply. The reply message overwrites the original message area. Using the kernel message format conventions, a process specifies in the message the segment of its address space that the message recipient may access, and whether the recipient may read or write that segment. A segment is specified by the last two words of a message, giving its start address and its length respectively. Reserved flag bits at the beginning of the message indicate whether a segment is specified and, if so, its access permissions.

pid = Receive( message )
    Block the invoking process, if necessary, to receive a 32-byte message in its message vector.

( pid, count ) = ReceiveWithSegment( message, segptr, segsize )
    Block the invoking process to receive a message as with Receive except, if a segment is specified in the message with read access, up to the first segsize bytes of the segment may be transferred to the array starting at segptr, with count specifying the actual number of bytes received.

Reply( message, pid )
    Send a 32-byte reply contained in the message buffer to the specified process, providing it is awaiting a reply from the replier. The sending process is readied upon receiving the reply; the replying process does not block.

ReplyWithSegment( message, pid, destptr, segptr, segsize )
    Send a reply message as done by Reply, but also transmit the short segment specified by segptr and segsize to destptr in the destination process' address space.

MoveFrom( srcpid, dest, src, count )
    Copy count bytes from the segment starting at src in the address space of srcpid to the segment starting at dest in the active process's space. The srcpid process must be awaiting reply from the active process and must have provided read access to the segment of memory in its address space, using the message conventions described under Send.

MoveTo( destpid, dest, src, count )
    Copy count bytes from the segment starting at src in the active process's space to the segment starting at dest in the address space of the destpid process. The destpid process must be awaiting reply from the active process and must have provided write access to the segment of memory in its address space, using the message conventions described under Send.

SetPid( logicalid, pid, scope )
    Associate pid with the specified logicalid in the specified scope, which is one of local, remote or both. Example logicalid's are fileserver, nameserver, etc.

pid = GetPid( logicalid, scope )
    Return the process identifier associated with logicalid in the specified scope if any, else 0.
2.2. Discussion

The V kernel's interprocess communication is modeled after that of the Thoth and Verex kernels, which have been used in multi-user systems and real-time applications for several years. An extensive discussion of this design and its motivations is available [5], although mainly in the scope of a single-machine system. We summarize here the highlights of the discussion.
1. Synchronous request-response message communication makes programming easy because of the similarity to procedure calls.
2. The distinction between small messages and a separate data transfer facility ties in well with a frequently observed usage pattern: a vast amount of interprocess communication is transfer of small amounts of control information (e.g. device completion), while occasionally there is bulk data transfer (e.g. program loading).
3. Finally, synchronous communication and small, fixed-size