Task Management Issues in Distributed Systems

Ahilan Anantha, Maki Sugimoto, Andreas Suryawan, Peter Tran

University of California, San Diego

November 21, 1998

Abstract

One of the main goals of distributed systems is allowing idle processing resources to be utilized. To accomplish this, there must be mechanisms to distribute tasks across machines. We examine the task management mechanisms provided by several distributed operating systems, and analyze their effectiveness.

1 Introduction

A major motivation for constructing a distributed operating system is to perform coordination of decentralized resources in order to raise the utilization of the system as a whole. It is through the management of tasks that a system is able to optimize the parallelism being offered, thereby increasing utilization.

There are quite a few interesting attributes of distributed operating systems, and notable techniques used in handling these. We examine the following attributes and techniques of task management:

1. Ownership of CPU resources

2. Homogeneous vs heterogeneous environment

3. Remote execution/process migration

4. Namespace transparency

5. Load information and control manager

Design choices made in the systems surveyed reflect a set of tradeoffs considered by the system architect: complexity, residual dependencies, performance, and transparency. [7] We analyze how each operating system copes with these conflicting factors to provide efficiency and maintainability. Finally, we consider improvements to these systems.

2 Techniques and Attributes of Task Management

2.1 Ownership of CPU Resources

The operating systems we discuss in this paper fall into two basic classes of environments: (1) those where machines are "owned" by particular users, such that a machine's processing may be used only when the owner is not using it, and (2) those where there is no notion of ownership of machines; all processors are available for use by all users. Operating systems of the first class are typically designed for environments of graphical workstations.

The Sprite and Condor operating systems are designed for the lab graphical workstation environment. There is no separate CPU server cluster; the workstations themselves make up the distributed system. A user is expected to interact with the operating system by way of a windowing system, which is a highly CPU and memory intensive interactive process. Interactive processes have randomly fluctuating loads because they act in response to user activity, which is itself randomly fluctuating. At the same time, interactive processes have minimum delay requirements because users require real-time response. Windowing systems pose an even greater problem because they require a large percentage of processing power, while traditional text mode interaction is fairly lightweight. In order to satisfy these delay requirements, it becomes necessary to reserve the maximum amount of CPU resources that a windowing system would require.

In non-distributed systems, the user of a graphical workstation is expected to actively control which processes may run on the system to satisfy his delay requirements. If the user of a graphical workstation has ownership of all the user processes that can hog the system resources, he can suspend or terminate the processes that prevent the usability of the console. However, if other users were permitted to easily run processes on remote systems, the console user would lose the ability to control the interactive response time.

For this reason, these operating systems give a second-class status to remote processes. Remote processes are only allowed to utilize the resources of a workstation if the workstation is not already busy serving its console user. The CPU resources can be taken back from remote processes if the console user desires them. As such, the console user can be considered the owner of a workstation's CPU resources. Sprite and Condor will only permit remote processes to run on a system when the system is idle, and will evict remote processes once a user starts using the console.

The other class of distributed systems consists of environments of dedicated CPU servers, data servers, and graphical terminals. The bulk of the processing power in these environments is contained in the CPU server. The computers with graphical displays are essentially graphical terminals: they have sufficient processing ability to run the windowing system processes but require no more. All other CPU intensive processes are executed remotely on a CPU server. No user "owns" the CPU server; every user gets a guaranteed share of its resources. Conversely, no remote processes would be allowed on a graphical terminal. The work of trying to determine whether a graphical workstation is idle is unnecessary, since the graphical terminals would have minimal processing resources to offer. All the resources of a graphical terminal can thus be reserved for the windowing system.

Clouds, Alpha, MOSIX, Plan 9, and Solaris MC fall under this category. Solaris MC provides a process migration mechanism for server machines, allowing processes to be migrated across server machines in cases where one server must be brought down for maintenance. In these cases, migrated processes can be permitted to be inefficient and to tax the resources of the hosting system, but the necessity of maintaining services across server disconnections outweighs these factors. Solaris MC therefore suggests the need for a distinction between servers, which are prepared to offer the resources to accept migrated processes, and user workstations, which are not willing to accept the burden of migrated processes. Clouds, Alpha, and Plan 9 are alike in that they consist of separate high-end CPU and data servers. A MOSIX system consists of a large number of commodity workstations, all of which may play an equal part in serving data and processing.

2.2 Remote Execution

Remote execution and process migration are the techniques used in distributed systems to share CPU resources. Remote execution is the ability to create processes on remote machines. Process migration is the ability to relocate processes between nodes in mid-execution.

Plan 9 supports remote execution on CPU servers explicitly specified by the user. Processes cannot be migrated; therefore, remotely executed processes spend their entire lifespan on the remote CPU server, from creation to termination. Condor also supports remote execution only.

Sprite supports remote execution through the mechanism of process migration. A request for remote execution of a new process on Sprite would need to be reduced to creating the process locally and attempting to migrate the process soon after. In the case of an immediate remote execution, local memory need not be allocated for the code and data if an idle computer is available to start with. Sprite, however, doesn't provide this optimization: all execution begins locally.
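Sprite's reduction of remote execution to "create locally, then migrate" can be sketched with a toy model in which each node is just a list of running processes. All function and node names here are invented for illustration; they are not Sprite's actual interfaces.

```python
# Toy model: each node is a list of the processes it currently runs.
# Names (migrate, sprite_remote_exec) are invented for illustration;
# they are not Sprite's actual API.

def migrate(nodes, src, dst, proc):
    """Relocate an already-created process from one node to another."""
    nodes[src].remove(proc)
    nodes[dst].append(proc)

def sprite_remote_exec(nodes, home, proc, idle_node=None):
    """Sprite-style remote execution: create locally, migrate soon after."""
    nodes[home].append(proc)                   # all execution begins locally
    if idle_node is not None:                  # migration is only a request;
        migrate(nodes, home, idle_node, proc)  # it is denied if no host is idle

nodes = {"home": [], "idle-host": []}
sprite_remote_exec(nodes, "home", "simulation", idle_node="idle-host")
print(nodes)  # {'home': [], 'idle-host': ['simulation']}
sprite_remote_exec(nodes, "home", "editor")  # no idle host: stays home
print(nodes)  # {'home': ['editor'], 'idle-host': ['simulation']}
```

The point of the sketch is the ordering: the process always exists on the home machine first, which is why Sprite forgoes the optimization of allocating memory only on the destination.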

Process migration is only a request that an application can make. Processes are only permitted to execute on remote workstations that are idle. If there are no idle machines, then the request may be denied and the process would continue to execute on the local machine. Therefore, a process cannot be expected to begin execution on a remote system either.

When the computer ceases to be idle, all remote processes must be evicted. Therefore, every process must have the notion of a home machine, which is the machine from which the user invoked the process.

Solaris MC provides remote execution and process migration mechanisms. Users must explicitly call the rexec system call to carry out remote execution. Unlike Plan 9, the destination node need not be specified. Process migration is assumed to be used mainly for off-loading processes from a node being shut down for maintenance.

MOSIX is the only operating system that makes use of process migration for the purpose of load balancing. Any user process can be migrated at any time to any available node transparently.

2.2.1 Thread Migration in Object-Based Systems

Object-based distributed systems have a different way of organizing resources: they represent them with passive objects. Objects encapsulate code and data. An object's code is executed using a procedural interface called invocation. Objects are large-grained in that they have their own virtual-address space, and there is relatively large overhead associated with the invocation and storage of an object. For these reasons, objects generally implement storage and execution of large-grained data and programs.

Clouds is an example of an object-based distributed operating system. A thread in Clouds is a path of execution made up of a series of calls to object methods. Each call is referred to as an invocation that the object responds to. An object by itself is passive. When a thread invokes an object's method, the thread enters that object's virtual-address space and begins execution. Each invocation of a method is called a segment of the thread that invokes it. This segmentation of threads is how Clouds provides distributed execution.

Note how this is different from the traditional model of processes and process migration. There, a process executes within one virtual-address space unless it is migrated to another node. Migration is expensive, and it is expected not to occur more than once or twice.

In Clouds, migration takes place with object granularity. That is, as a thread proceeds, it may invoke objects on different nodes. A thread's path of execution necessarily crosses through all of these nodes. It is the strategic placement of objects that would be used as a mechanism for load balancing.

Since there is no address space associated with each of the threads, objects on remote or local nodes can be invoked with the same semantics. Threads can cross node boundaries with the minimum penalty of network overheads.

2.3 Namespace Transparency

A required feature of distributed systems is hiding from a process the fact that it is executing remotely or locally. This transparency should also be maintained with regard to the user. The user should be able to interact with the process in the same way as in the local case, regardless of where the code is executing.

Maintaining this transparency requires changes to the traditional operating system model. Transparency refers to the distributed operating system giving each process a single, uniform view of resources, including the filesystem and I/O devices, regardless of which computer it is running on. The design and implementation of the space of accessible resources, or namespace, directly affects the management of the distribution of processes.

Many operating systems achieve this transparency by the enforcement of a uniform, global namespace. The filesystem will appear the same to every process on every node. One solution follows from mounted filesystems in traditional UNIX: a namespace is constructed as a union of mounted file systems.

Object-based distributed systems provide a different solution. There is no notion of a traditional filesystem, only objects. Resources are encapsulated in objects, and naming takes place with object granularity. This provides a flat namespace. At the system level, all objects are identified by a globally unique bit string. A user-level name service is provided to translate user-registered names to system-level names.

Sprite employs the uniform global namespace model. File servers provide domains, similar to UNIX filesystems, that are mounted as subdomains of each other, with one domain selected as the topmost, root, domain. This view of the domain hierarchy is the same for every computer in the cluster. This can be contrasted with Sun's NFS, where every client may choose the local mount point of a remote filesystem. In Sprite, the remote file server decides the mount point all the clients must use. Among other advantages, this guarantees that every file in the distributed filesystem has a single globally defined pathname. This makes it possible to migrate programs that attempt to manipulate files. [7]

Solaris MC also employs the uniform global namespace model. The Solaris MC file system, which is built on top of the existing Solaris file system, interposes all file operations and forwards them to the server where the file actually resides. Any process can open a file located anywhere in the system using the same pathname, thus allowing programs to be located on arbitrary nodes. [4]

The object-oriented semantics of Clouds provide a flat namespace along with global accessibility: any thread can reference any object. This is also essentially Clouds' mechanism for distributed shared memory. Only through invocation is access to an object's data allowed; input and output parameters are pass-by-value only. This protects the internal environment of an object. Capabilities-based protection is provided for controlling global accesses to objects. [1]

Plan 9 has an interesting policy for managing name spaces. Every client process can have a local namespace, which has the same semantics as a localized filesystem interface. User-level servers in Plan 9 have the ability to "export" filesystem interfaces to their clients. In fact, these exported filesystem interfaces are the primary means by which Plan 9 servers export all their resources. Some of the objects in these name spaces may refer to globally distinct files in the distributed filesystem, but some may refer to a local copy of a global resource.

For example, the same global Plan 9 filesystem can be used by clients of different processor architectures. A user on different systems may refer to a binary executable using a common pathname, such as /bin/date, but the actual binary file that is utilized will depend on the processor architecture. Devices stored in /dev will refer to devices in the local name space. Some of these devices may refer to actual kernel-recognized devices, or they may refer to pseudo-device interfaces which user-level servers export. For example, a window in the Plan 9 windowing system exports the devices /dev/mouse, /dev/bitblt, and /dev/cons, which refer to the mouse, bitmapped display interface, and character mode console interface. Each window will export the same devices in its name space, but the actual device files are local copies of the pseudo devices exported by the Plan 9 windowing system. The windowing system will multiplex accesses to the actual physical devices.

To support the ability to run processes on remote servers, and have them appear to be running locally, Plan 9 provides the ability to export the local name space to a remotely executing process. The remotely executing process will then have the same view of the filesystem as it would if it had been executing locally. And it would have access to the same devices (real or virtual) as on the local system, because these would also be exported as part of the local name space. [6]

2.4 Homogeneous vs Heterogeneous Environments

Many distributed operating systems can be run in a heterogeneous environment. However, all of the operating systems that allow process migration have imposed the requirement that all computers involved in process migration have the same processor architecture.

The primary obstacle to heterogeneity is that the execution state of a process is highly architecture-dependent. When the source and destination systems are of the same architecture, the code and data segments, registers, stack, and heap can simply be copied without any changes. With differing processor architectures, all of these might need to be significantly modified. Such modification is likely to be expensive and will add significant complexity to the system.

The operating systems we've discussed that support process migration (MOSIX, Sprite, Solaris MC) have the requirement that all machines accepting migrated processes be of the same processor type.

Clouds' distributed execution model also does not explicitly support heterogeneity. Objects on the data servers are stored in a single machine language, so heterogeneous CPU servers would require the machine code to be converted from one language to another. This is a complication that would break the symmetry of the Clouds system, and it would be expensive to carry out.

Many of these operating systems will permit a data server to be of a different processor architecture, since migration would never take place there.

Plan 9's CPU servers can be heterogeneous. Each program is compiled beforehand for the architecture it is intended to be executed on. This prohibits the implementation of process migration in Plan 9.

2.5 Load Information and Control Manager

Distributed operating systems, by their nature, pool processing resources together. Access to common processing resources must be mediated by some entity or entities. The determination of which process will execute on which processor we term task distribution management, and the entities that make this determination we term task distribution managers. Task distribution decision making may be centralized onto one manager or decentralized onto many managers.

One disadvantage of centralized management is its inherent lack of scalability. The overhead associated with maintaining all the load information and making choices among all the nodes grows with the number of nodes. Another disadvantage is that the failure of the central node brings down the whole mechanism.

We can decentralize this decision making by giving a number of nodes the ability to act as task distribution managers. Each manager would control a partition of the nodes in the system. In this configuration, each managing node essentially becomes the central manager for a smaller distributed system [9]. It can make its own decisions to utilize processors in its partition of participating machines.

A task distribution manager accumulates the load information of the nodes in the partition it controls, and uses this information to choose the processor where a task should run.

In the Sprite system, every Sprite machine runs a background process called the "load-average daemon", which monitors the usage of the machine. When the machine appears idle, the daemon notifies the "central migration server" that the machine is prepared to accept migrated processes. User processes that invoke migration call a standard library routine, Mig_RequestIdleHosts, to obtain a list of idle hosts, and then reference the host identifier in the migrate process system call. The central migration server maintains the database in virtual memory, to avoid the overhead of remote filesystem operations. The load-average daemons and the library routine Mig_RequestIdleHosts communicate with the server using a message protocol. Sprite decides that a machine is idle if and only if (a) it has had no keyboard or mouse input for at least 30 seconds, and (b) there are, on average, fewer runnable processes than processors. This decision was made purely heuristically; originally the input threshold was 5 minutes. The Sprite designers chose not to determine the most efficient utilization of idle hosts, because there were plenty of idle hosts available. [7]
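Sprite's two-part idle test can be stated compactly. The function below is a sketch of that heuristic with illustrative names, not Sprite's actual daemon code; the thresholds are the ones quoted above.

```python
# Sketch of the Sprite load-average daemon's idle test: a machine is idle
# iff (a) no keyboard/mouse input for at least 30 seconds and (b) there
# are, on average, fewer runnable processes than processors.

INPUT_IDLE_THRESHOLD = 30.0  # seconds; originally 5 minutes in Sprite

def machine_is_idle(seconds_since_input, avg_runnable, n_processors):
    return (seconds_since_input >= INPUT_IDLE_THRESHOLD
            and avg_runnable < n_processors)

# A workstation whose user just typed is never offered to remote processes.
print(machine_is_idle(seconds_since_input=2.0, avg_runnable=0.1, n_processors=1))
# An untouched workstation with a near-empty run queue is offered.
print(machine_is_idle(seconds_since_input=600.0, avg_runnable=0.4, n_processors=1))
```

Both conditions must hold: a machine left alone but already saturated with batch work fails test (b) and is not reported to the central migration server.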

MOSIX is fully decentralized; every node acts as a task distribution manager. At regular intervals, each node sends information about its available resources to a randomly chosen partition of nodes [9]. Each node therefore only maintains load information for a random partition of nodes, and will choose nodes among this set as the destination of a process migration. The use of randomness supports scaling and dynamic configuration [9].

3 Tradeoff Comparisons

The design of distributed operating systems involves making tradeoffs among four factors: transparency, residual dependencies, performance, and complexity. Perfect transparency would mean that both the user and the process act the same way toward a remotely executing process as toward a local one; neither the user nor the process need be aware of the fact that a process has been migrated. If remote execution leaves residual dependencies, that means the source machine must continue to provide services to the remotely executing process. By performance, we mean that the remote execution mechanism should induce only minimal overheads in processing and allocation: the delay associated with initiating remote execution, or migrating a process, should be low, and remotely executing processes should perform as efficiently as locally executing ones. The complexity of the remote execution mechanism becomes important because it could potentially affect every piece of the operating system kernel. Depending on the relative importance of the remote execution mechanism to the designers of these operating systems, complexity may be limited for maintainability. [7]

These factors conflict with each other. High transparency is likely to require more complexity and residual dependencies. Residual dependencies affect performance because of the high delays associated with forwarding. A fast migration process may involve the use of residual dependencies to avoid the transfer of state; this can reduce the performance of the execution of the remote process. [7]

3.1 Sprite

The Sprite operating system guarantees transparency to remotely executing processes. The user can interact with a migrated process in the same manner as before migration took place. The user can continue to provide input to a process and receive output from it in an identical way. The user can also control the execution of the process using the same job control mechanisms provided for controlling local processes. No distinction is made between locally executing and migrated processes when using these job control mechanisms. However, Sprite requires the user-level application to initiate process migration. So for an application to take advantage of process migration, it not only must be aware of migration, but it must also determine when to request migration of subprocesses. Sprite does not automatically migrate processes except for eviction.

Sprite transfers most of the state associated with a process, but still retains some residual dependencies. Sprite transfers virtual memory, open file handles, and execution state. Accesses to files and memory are the most intensive operations, so the elimination of residual dependencies in these areas tremendously improves performance. By restricting migration to the case of homogeneous processor architectures, the execution state transfer becomes simple. Forwarding is required for access to local I/O devices. For message channels between processes, the source machine must arrange to route messages for the migrated process. All signals are forwarded from the source machine. The state transfer and state forwarding mechanisms are implemented transparently. The only visible effect would be a reduction of performance when state forwarding is used instead of state transfer. In one case Sprite is not able to provide transparency, and that is for access to memory-mapped I/O devices. Sprite simply forbids the migration of processes that use memory-mapped I/O. [7]

3.2 MOSIX

MOSIX's process migration mechanism is very similar to Sprite's. Both rely on a common file system to avoid the need to forward file operations. The virtual address space and execution state are transferred. However, unlike Sprite, only active pages are offloaded from the source machine. MOSIX doesn't store the backing store on the distributed file system, so the source system must be consulted when bringing in pages from the backing store. This is different from what occurs in Sprite: Sprite flushes all dirty pages back to the file server, and the migrated process will page fault on every page it accesses, loading all pages from the file server. MOSIX's mechanism thus involves a residual dependency that Sprite's does not, since the source machine must serve requests for virtual memory pages throughout the execution of the migrated process. However, in Sprite those pages would need to be demand-loaded from the file server anyway. And MOSIX reduces the number of page faults during the initial stages of the migrated program's execution, because the active pages are transferred before execution begins. [9]

Unlike Sprite, the MOSIX kernel actively migrates processes using a load balancing algorithm rather than forcing the user-level application to make the request to migrate. This adds more transparency to the process migration mechanism: not only are processes unaware of being migrated, the user also does not need to be aware of the need to migrate. This should also result in greater performance. Since the kernel gathers load information, it will automatically know when a processor becomes idle. In Sprite, user-level programs can only ask which processors are idle at a given time; they cannot arrange to be notified when processors become idle.

3.3 Plan 9

Plan 9 supports transparency for remotely executing processes with the ability to export the local name space to the remote process. Since Plan 9 doesn't support process migration, state transfer is not required; virtual memory is only allocated on the remote system. However, all references to files and devices are forwarded back to the local system using a network RPC protocol called 9P. The local system runs a program called exportfs, which translates 9P calls into system calls on the local machine. A forwarding mechanism is unavoidable for access to local devices. But other distributed systems do not forward accesses to files on distributed file systems to the source system; the data server can be referenced directly. Plan 9's mechanism sacrifices performance for the simplicity of its residual dependencies. [6]

3.4 Clouds

The object-based paradigm of Clouds has many advantages over traditional systems. The view of the system presented to the user, a uniform, flat namespace of objects, is conceptually simple. Object method invocation also provides the simplicity of procedural semantics.

The overhead associated with invocation, however, can incur a substantial performance penalty. There is no shared memory between objects; input and output parameters are pass-by-value. In a thread which produces many invocations, the execution time can become dominated by the copying of parameters necessary for each invocation.

The manner in which a thread invokes a method, enters an object's virtual address space, and then continues execution in the invoked object has the benefit of not creating residual dependencies. The only information needed by the invokee is the set of input parameters from the originating object. [1]
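The pass-by-value invocation cost described above can be made concrete with a toy sketch in which every invocation copies its parameters into, and its result out of, the object's private space (simulated here with `deepcopy`). The class and method names are invented for illustration and are not the real Clouds interface.

```python
import copy

# Toy sketch of Clouds-style invocation: objects share no memory, so every
# invocation copies input and output parameters by value. `CloudsObject`
# and `invoke` are illustrative names, not the real Clouds interface.

class CloudsObject:
    def __init__(self, methods):
        self._methods = methods  # name -> callable run "inside" the object

    def invoke(self, method, *args):
        # Copy inputs in, run the thread segment, copy the result back out.
        local_args = copy.deepcopy(args)
        result = self._methods[method](*local_args)
        return copy.deepcopy(result)

counter = CloudsObject({"add": lambda xs: sum(xs)})
data = [1, 2, 3]
print(counter.invoke("add", data))  # 6; `data` itself is never shared
```

Each `invoke` pays two copies; a thread making many invocations with large parameters spends its time in exactly this copying, which is the performance penalty noted above.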

3.5 Condor

Condor is a software system that runs on top of a UNIX kernel. This provides ease of portability and simplifies the task of operating system design. Complicated features of the operating system not directly related to the distributed aspect, such as device driver support, can be borrowed from the underlying operating system. This reduction in implementation complexity comes at the expense of system performance: placing the distributed mechanisms outside the kernel incurs execution overhead and delay in passing load statistics and in load sharing decisions [11].

When a job is submitted to a remote machine, the user is not required to have an account on that machine. The participating machines agree to allow other users to gain access and use the machines whenever the machines are idle. Since the users have no accounts on the remote machines, they cannot gain access to the remote machines' filesystems. However, when executing a process, it is possible that the process needs to read from and write to a local file. Therefore, Condor needs to provide access to the local filesystem. Condor, then, remotely executes system calls on the home machine when the running process needs access to the filesystem. This residual dependency on the home machine induces communication overhead for file operations.

3.6 Solaris MC

One of the primary goals of Solaris MC is to integrate distributed features into an existing operating system, namely Solaris, with maximum compatibility. The distributed structure of the system is transparent to users and applications. Byte-level compatibility is guaranteed for existing applications. Facilities to utilize remote CPU resources are provided. To minimize the increase in system complexity, modifications to the Solaris kernel are kept to a minimum. In exchange for achieving transparency and minimum complexity, performance is sacrificed. For example, since all file operations and system calls are interposed by the Solaris MC layer, the performance of local file operations and local system calls will be lower. [4]

4 Further Improvements

4.1 Load Balancing Algorithms

An important goal of a distributed system goes beyond the ability to simply share resources. System designers are faced with doing this sharing efficiently and in a way that maximally utilizes resources. Response time and total throughput are the driving forces behind this work. The question then is how to balance the work load across all nodes in the system. Load balancing is also referred to as global scheduling.

Utilizing idle CPU resources via process migration has been discussed in an earlier section. With the exception of MOSIX, the operating systems surveyed generally only provide the migration mechanism, and leave the policy implementation up to the application layer. The operating systems that provide this policy use algorithms that process load information gathered from each node to determine the destination when migrating a process. A standard load metric is the average number of tasks in the ready queue of a processor.

Gathering accurate load information from each node is the primary problem in load balancing. There are two basic models for storing this information: centralized and decentralized. In the centralized model, a central server is used as storage for the load information, and is therefore given the responsibility for scheduling decisions. A fully decentralized system distributes this responsibility among the individual nodes.

Polling to gather load information leads to a great number of messages being transmitted as requests and responses. The problem of high message traffic also occurs when having nodes broadcast their load information. This approach is not scalable. [2]

Lau et al. proposed a solution to the problem of messaging overhead for load information transmission in the decentralized scheduler model. This solution involves the use of anti-tasks and load state vectors. An anti-task is a special type of message that is passed among the computational nodes. The path of an anti-task is determined by the load state vector. An anti-task contains a table in which the entries are the load state values of the nodes that the anti-task has visited. Each of these entries is time-stamped and contains a visited flag. Each node has a table with the same structure, minus the visited flag. The table that the node maintains is called the load state vector, and the table on the anti-task side is called the anti-task's trajectory. When an anti-task visits a node, the information in the two tables is shared to make sure that each contains the most up-to-date information.

Using minimum and maximum threshold load values, a node is categorized as being in a light, normal, or heavy workload state. Lau et al. devised an algorithm, taking into account the information in the trajectory plus the visited flags, which causes anti-tasks to travel spontaneously towards the most heavily loaded nodes. The total information presented by arriving anti-tasks gives the heavily loaded node a highly accurate view of the global state, increasing the chance that the node makes a good load balancing decision. [3]

4.2 Heterogeneous Process Migration

Marvin M. Theimer and Barry Hayes discuss an approach to migrating processes across heterogeneous processor architectures in their paper "Heterogeneous Process Migration by Recompilation". Since it is not possible to migrate the actual execution state of a process, which is machine dependent, the authors propose a method of constructing an equivalent machine-independent state, which can be migrated. However, the approach can only work if the program is itself machine independent.

The technique requires that the compiler generate machine-independent intermediate code along with the machine language code. The machine-independent code describes operations on an abstract machine, while the machine language code describes operations for a physical machine. Compilers can be expected to optimize the machine code for a particular processor type, such that the internal states of the machine-independent code and the machine language code will correspond only at a subset of execution points of a program. Such points are called migration points. When a migration is requested, the program will continue to execute until the next migration point. To keep delays in migration small, we would like to have as many migration points as possible. If we allow a migration delay in the range of seconds, there is room for millions of machine instructions to execute before we would need a migration point. It is necessary for all procedures in the call stack to have reached a state corresponding to an abstract state for the execution point to be considered a migration point.

Once we have reached such a migration point, we must generate an abstract program state. Compilers must generate source-level symbol tables describing the locations of every global and procedure-local variable; this is essentially what debugging features describe. We would use the same technique that a source-level debugger uses to gather the state of all global and procedure-local variables. Now that global and call stack data have been accounted for, we must find the state of the heap. The heap needs to be traced, following each pointer variable in the global and call stack data to find the transitive closure of the objects they point to. We must also be able to interpret every field of a heap object, because data representation conversions may be necessary across platforms; the size and representations of integers and floating point numbers often differ.
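The heap trace described above is a reachability computation. A minimal sketch, using nested dictionaries as a stand-in for heap objects (the representation is ours, not the paper's):

```python
# Sketch: tracing the heap as the transitive closure of objects reachable
# from the global and call-stack pointer variables. Each toy "object" maps
# field names to either plain values or other objects (i.e., pointers).

def trace_heap(roots):
    """Return every heap object reachable from the root pointers."""
    reachable, worklist = [], list(roots)
    while worklist:
        obj = worklist.pop()
        if any(o is obj for o in reachable):   # already traced this object
            continue
        reachable.append(obj)
        for field_value in obj.values():       # interpret every field
            if isinstance(field_value, dict):  # a pointer to another object
                worklist.append(field_value)
    return reachable

# a and b point to each other; c is unreachable from a
a, b, c = {}, {}, {"x": 1}
a["next"] = b
b["prev"] = a
assert len(trace_heap([a])) == 2
```

In a real migrator, the per-field walk is also where the data-representation conversions mentioned above would be applied.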

After accumulating this abstract state, we construct a "migration program" which initializes itself with the machine-independent state and proceeds to execute the rest of the code. This program is recompiled for the target system, and then migrated.

This approach can only be guaranteed to work, in the general case, for languages that do not themselves allow machine-dependent code to be written. The authors believe their approach will work for Modula-2, type-safe Cedar, and Lisp. [8]

This paper preceded the development of Java. Java is designed such that source code is converted into a machine-independent byte code. This byte code runs on top of a Java virtual machine process. Remote execution of Java bytecode will of course require no recompilation, but migrating a Java process requires transferring the state of the virtual machine. The Java virtual machine satisfies exactly the requirements of the abstract machine described by the authors. It is no longer necessary to determine migration points; every execution point in Java bytecode is a migration point. The same mechanism for locating and recording the values of all global, procedure-local, and heap data can be applied to the Java virtual machine.
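As a toy illustration of the migration-program idea (our sketch, not Theimer and Hayes's system), a computation can keep its machine-independent state in a serializable structure, stop at a migration point, and be resumed from that state elsewhere:

```python
import json

# Sketch: the loop's machine-independent state lives in a plain dict,
# is serialized portably (JSON here) at a "migration point", and a fresh
# run on the target initializes itself from that state and continues.

def sum_squares(n, state=None):
    state = state or {"i": 0, "total": 0}
    while state["i"] < n:
        state["total"] += state["i"] ** 2
        state["i"] += 1
        if state["i"] == n // 2:                 # a migration point
            return {"migrated": json.dumps(state)}
    return state["total"]

checkpoint = sum_squares(10)                     # runs until the migration point
resumed = json.loads(checkpoint["migrated"])     # "arrives" at the target machine
assert sum_squares(10, resumed) == sum(i * i for i in range(10))
```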

5 Conclusion

In this paper, we have analyzed distributed operating systems that offer widely varied approaches. On one end of the spectrum, we have Clouds: an operating system designed from the ground up to be distributed. Its programming model is drastically different from the standard methodologies used today, because the programming model has been reinvented to support the idea of disjoint distributed resources. Clouds requires a total rethinking of program design, but at the same time provides the most simple and efficient distributed computing model. Clouds avoids the complexities and overhead associated with transferring state; Clouds only transfers procedure arguments across distributed objects. If distributed computing is the primary feature a programmer is seeking, then the opportunity cost of adapting programs from the traditional and familiar model could be justified. However, a general-purpose programmer who doesn't depend on distributed computing abilities would see Clouds' peculiarities as a nuisance.

On the other end of the spectrum, we have distributed systems like Solaris MC and Condor. Both of these provide distributed computing through a user-level layer that sits above a UNIX operating system. Very minimal changes to the kernel, if any, need be made. The layer provides transparent distributed task management by using existing features of the operating system. The result is a large amount of overhead and a significant reduction in performance. However, maintaining the distributed operating system is simplified. These layered systems can rely on the vendor of the general-purpose operating system to maintain the most volatile components of operating systems, such as support for new hardware devices. Since the underlying operating system caters to a more general user base, the burden of providing operating system support for the small community of distributed computer system users is greatly alleviated.

However, these layer-based distributed systems do have severe performance issues. It would be convenient if there were a way to implement these distributed system features within a popular kernel and still be able to manage this code separately from the rest of the kernel. Sprite attempted to provide an efficient process migration mechanism in its kernel, but chose not to automate it. The main reason for this was that there were many different goals of the Sprite project, only one of which was distributed task management. The Sprite designers wanted to minimize the effect process migration would have on other developing parts of the operating system kernel. So they prevented the kernel from actively migrating processes, allowing developers to test parts of the operating system independently of the effects of process migration.

MOSIX attempts to provide distributed system features by designing kernel extensions to popular operating systems, such as BSD/OS and Linux. The MOSIX developers produce source code patches for particular versions of the Linux kernel, thereby allowing MOSIX to be built as a kernel module. The efficiency of kernel-mode distributed task management support is coupled with the advantage of integration with a widely used and maintained mainstream operating system.

In conclusion, we predict that the approach that MOSIX takes is the most likely to be successful in integrating distributed task management features into mainstream operating systems.

References

[1] Partha Dasgupta, Richard J. LeBlanc, Mustaque Ahamad, Umakishore Ramachandran, "The Clouds Distributed Operating System," IEEE Computer, Volume 24, 1991.

[2] Marvin M. Theimer, Keith A. Lantz, "Finding Idle Machines in a Workstation-based Distributed System," IEEE Trans. on Parallel and Distributed Systems, 1988.

[3] Sau-Ming Lau, Qin Lu, Kwong-Sak Leung, "Dynamic Load Distribution Using Anti-Tasks and Load State Vectors," IEEE Trans. on Parallel and Distributed Systems, 1988.

[4] Yousef A. Khalidi, Jose Bernabeu, Vlada Matena, Ken Shirriff, and Moti Thadani, "Solaris MC: A Multicomputer OS," Proceedings of the 1996 USENIX Conference, January 1996.

[5] Ken Shirriff, "Building Distributed Process Management on an Object-Oriented Framework," USENIX 1997.

[6] Rob Pike, Dave Presotto, Sean Dorward, Bob Flandrena, Ken Thompson, Howard Trickey, and Phil Winterbottom, "Plan 9 from Bell Labs," 1995.

[7] John K. Ousterhout, Frederick Douglis, "Transparent Process Migration: Design Alternatives and the Sprite Implementation," Software: Practice and Experience, August 1991.

[8] Marvin M. Theimer, Barry Hayes, "Heterogeneous Process Migration by Recompilation," IEEE 11th Int'l Conference on Distributed Computing Systems, 1991.

[9] Amnon Barak, Oren La'adan, "The MOSIX Multicomputer Operating System for High Performance Cluster Computing," 1997.

[10] K.G. Shin and C.-J. Hou, "Design and Evaluation of Effective Load Sharing in Distributed Real-Time Systems," IEEE Trans. on Parallel and Distributed Systems, vol. 5, no. 7, July 1994.

[11] Chao-Ju Hou, Kang G. Shin, "Implementation of Decentralized Load Sharing in Networked Workstations Using the Condor Package," 1994.