Distributed Operating Systems

ANDREW S. TANENBAUM and ROBBERT VAN RENESSE
Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands

Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways. This paper is intended as an introduction to distributed operating systems, and especially to current university research about them. After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed. Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden.

Categories and Subject Descriptors: C.2.4 [Computer-Communications Networks]: Distributed Systems - network operating system; D.4.3 [Operating Systems]: File Systems Management - distributed file systems; D.4.5 [Operating Systems]: Reliability - fault tolerance; D.4.6 [Operating Systems]: Security and Protection - access controls; D.4.7 [Operating Systems]: Organization and Design - distributed systems

General Terms: Algorithms, Design, Experimentation, Reliability, Security

Additional Key Words and Phrases: File server

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1986 ACM 0360-0300/85/1200-0419 $00.75
Computing Surveys, Vol. 17, No. 4, December 1985

CONTENTS

INTRODUCTION
  Goals and Problems
  System Models
1. NETWORK OPERATING SYSTEMS
  1.1 File System
  1.2 Protection
  1.3 Execution Location
  1.4 An Example: The Sun Network File System
2. DESIGN ISSUES
  2.1 Communication Primitives
  2.2 Naming and Protection
  2.3 Resource Management
  2.4 Fault Tolerance
  2.5 Services
3. EXAMPLES OF DISTRIBUTED OPERATING SYSTEMS
  3.1 The Cambridge Distributed Computing System
  3.2 Amoeba
  3.3 The V Kernel
  3.4 The Eden Project
  3.5 Comparison of the Cambridge, Amoeba, V, and Eden Systems
4. SUMMARY
ACKNOWLEDGMENTS
REFERENCES

INTRODUCTION

Everyone agrees that distributed systems are going to be very important in the future. Unfortunately, not everyone agrees on what they mean by the term “distributed system.” In this paper we present a viewpoint widely held within academia about what is and is not a distributed system, we discuss numerous interesting design issues concerning them, and finally we conclude with a fairly close look at some experimental distributed systems that are the subject of ongoing research at universities.

To begin with, we use the term “distributed system” to mean a distributed operating system as opposed to a database system or some distributed applications system, such as a banking system. An operating system is a program that controls the resources of a computer and provides its users with an interface or virtual machine that is more convenient to use than the bare machine. Examples of well-known centralized (i.e., not distributed) operating systems are CP/M,¹ MS-DOS,² and UNIX.³

¹ CP/M is a trademark of Digital Research, Inc.
² MS-DOS is a trademark of Microsoft.
³ UNIX is a trademark of AT&T Bell Laboratories.

A distributed operating system is one that looks to its users like an ordinary centralized operating system but runs on multiple, independent central processing units (CPUs). The key concept here is transparency. In other words, the use of multiple processors should be invisible (transparent) to the user. Another way of expressing the same idea is to say that the user views the system as a “virtual uniprocessor,” not as a collection of distinct machines. This is easier said than done.

Many multimachine systems that do not fulfill this requirement have been built. For example, the ARPANET contains a substantial number of computers, but by this definition it is not a distributed system. Neither is a local network consisting of personal computers with minicomputers and explicit commands to log in here or copy a file from there. In both cases we have a computer network but not a distributed operating system. Thus it is the software, not the hardware, that determines whether a system is distributed or not.

As a rule of thumb, if you can tell which computer you are using, you are not using a distributed system. The users of a true distributed system should not know (or care) on which machine (or machines) their programs are running, where their files are stored, and so on. It should be clear by now that very few distributed systems are currently used in a production environment. However, several promising research projects are in progress.

To make the contrast with distributed operating systems stronger, let us briefly look at another kind of system, which we call a “network operating system.” A typical configuration for a network operating system would be a collection of personal computers along with a common printer server and file server for archival storage, all tied together by a local network. Generally speaking, such a system will have most of the following characteristics that distinguish it from a distributed system:

  • Each computer has its own private operating system, instead of running part of a global, systemwide operating system.
  • Each user normally works on his or her own machine; using a different machine invariably requires some kind of “remote login,” instead of having the operating system dynamically allocate processes to CPUs.
  • Users are typically aware of where each of their files are kept and must move files between machines with explicit “file transfer” commands, instead of having file placement managed by the operating system.
  • The system has little or no fault tolerance; if 1 percent of the personal computers crash, 1 percent of the users are out of business, instead of everyone simply being able to continue normal work, albeit with 1 percent worse performance.

Goals and Problems

The driving force behind the current interest in distributed systems is the enormous rate of technological change in microprocessor technology. Microprocessors have become very powerful and cheap, compared with mainframes and minicomputers, so it has become attractive to think about designing large systems composed of many small processors. These distributed systems clearly have a price/performance advantage over more traditional systems. Another advantage often cited is the relative simplicity of the software (each processor has a dedicated function), although this advantage is more often listed by people who have never tried to write a distributed operating system than by those who have.

Incremental growth is another plus; if you need 10 percent more computing power, you just add 10 percent more processors. System architecture is crucial to this type of system growth, however, since it is hard to give each user of a personal computer another 10 percent of a personal computer. Reliability and availability can also be a big advantage; a few parts of the system can be down without disturbing people using the other parts. On the minus side, unless one is very careful, it is easy for the communication protocol overhead to become a major source of inefficiency. There has been built more than one system requiring the full computing power of its machines just to run the protocols, leaving nothing over to do the work. The occasional lack of simplicity cited above is a real problem, although in all fairness, this problem comes from inflated goals: With a centralized system no one expects the computer to function almost normally when half the memory is sick. With a distributed system, a high degree of fault tolerance is often, at least, an implicit goal.

A more fundamental problem in distributed systems is the lack of global state information. It is generally a bad idea to even try to collect complete information about any aspect of the system in one table. Lack of up-to-date information makes [...]

[...] VAXs), each with multiple users. Each user is logged onto one specific machine, with remote access to the other machines. This model is a simple outgrowth of the central time-sharing machine.

In the workstation model, each user has a personal workstation, usually equipped with a powerful processor, memory, a bitmapped display, and sometimes a disk. Nearly all the work is done on the workstations. Such a system begins to look distributed when it supports a single, global file system, so that data can be accessed without regard to their location.

The processor pool model is the next evolutionary step after the workstation model. In a time-sharing system, whether with one or more processors, the ratio of CPUs to logged-in users is normally much less than 1; with the workstation model it is approximately 1; with the processor pool model it is much greater than 1. As CPUs get cheaper and cheaper, this model will become more and more widespread. The idea here is that whenever a user needs computing power, one or more CPUs are temporarily allocated to that user; when the job is completed, the CPUs go back into the pool to await the next request. As an example, when ten procedures (each on a separate file) must be recompiled, ten processors could be allocated to run in parallel for a few seconds and then be returned to the pool of available processors.