Process Migration

DEJAN S. MILOJIČIĆ HP Labs

FRED DOUGLIS AT&T Labs–Research

YVES PAINDAVEINE TOG Research Institute

RICHARD WHEELER EMC

AND

SONGNIAN ZHOU University of Toronto and Platform Computing

Process migration is the act of transferring a process between two machines. It enables dynamic load distribution, fault resilience, eased system administration, and data access locality. Despite these goals and ongoing research efforts, migration has not achieved widespread use. With the increasing deployment of distributed systems in general, and distributed operating systems in particular, process migration is again receiving more attention in both research and product development. As high-performance facilities shift from supercomputers to networks of workstations, and with the ever-increasing role of the World Wide Web, we expect migration to play a more important role and eventually to be widely adopted. This survey reviews the field of process migration by summarizing the key concepts and giving an overview of the most important implementations. Design and implementation issues of process migration are analyzed in general, and then revisited for each of the case studies described: MOSIX, Sprite, Mach, and Load Sharing Facility. The benefits and drawbacks of process migration depend on the details of implementation and, therefore, this paper focuses on practical matters. This survey will help in understanding the potential of process migration and why it has not caught on.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems—network operating systems; D.4.7 [Operating Systems]: Organization and Design—distributed systems; D.4.8 [Operating Systems]: Performance—measurements; D.4.2 [Operating Systems]: Storage Management—distributed memories

General Terms: Design, Experimentation

Additional Key Words and Phrases: Process migration, distributed systems, distributed operating systems, load distribution



CONTENTS

1. INTRODUCTION
   Organization of the Paper
2. BACKGROUND
   2.1. Terminology
   2.2. Target Architectures
   2.3. Goals
   2.4. Application Taxonomy
   2.5. Migration Algorithm
   2.6. System Requirements for Migration
   2.7. Load Information Management
   2.8. Distributed Scheduling
   2.9. Alternatives to Process Migration
3. CHARACTERISTICS
   3.1. Complexity and Operating System Support
   3.2. Performance
   3.3. Transparency
   3.4. Fault Resilience
   3.5. Scalability
   3.6. Heterogeneity
   3.7. Summary
4. EXAMPLES
   4.1. Early Work
   4.2. Transparent Migration in UNIX-like Systems
   4.3. OS with Message-Passing Interface
   4.4. Microkernels
   4.5. User-space Migrations
   4.6. Application-specific Migration
   4.7. Mobile Objects
   4.8. Mobile Agents
5. CASE STUDIES
   5.1. MOSIX
   5.2. Sprite
   5.3. Mach
   5.4. LSF
6. COMPARISON
7. WHY PROCESS MIGRATION HAS NOT CAUGHT ON
   7.1. Case Analysis
   7.2. Misconceptions
   7.3. True Barriers to Migration Adoption
   7.4. How these Barriers Might be Overcome
8. SUMMARY AND FURTHER RESEARCH
ACKNOWLEDGMENTS
REFERENCES

1. INTRODUCTION

A process is an abstraction representing an instance of a running computer program. Process migration is the act of transferring a process between two machines during its execution. Several implementations have been built for different operating systems, including MOSIX [Barak and Litman, 1985], V [Cheriton, 1988], Accent [Rashid and Robertson, 1981], Sprite [Ousterhout et al., 1988], Mach [Accetta et al., 1986], and OSF/1 AD TNC [Zajcew et al., 1993]. In addition, some systems provide mechanisms that checkpoint active processes and resume their execution in essentially the same state on another machine, including Condor [Litzkow et al., 1988] and Load Sharing Facility (LSF) [Zhou et al., 1994]. Process migration enables:

- dynamic load distribution, by migrating processes from overloaded nodes to less loaded ones,
- fault resilience, by migrating processes from nodes that may have experienced a partial failure,
- improved system administration, by migrating processes from the nodes that are about to be shut down or otherwise made unavailable, and
- data access locality, by migrating processes closer to the source of some data.

Despite these goals and ongoing research efforts, migration has not achieved widespread use. One reason for this is the complexity of adding transparent migration to systems originally designed to run stand-alone, since designing new systems with migration in mind from the beginning is not a realistic option anymore. Another reason is that there has not been a compelling commercial argument for operating system vendors to support process migration. Checkpoint-restart approaches offer a compromise here, since they can run on more loosely-coupled systems by restricting the types of processes that can migrate.

In spite of these barriers, process migration continues to attract research. We believe that the main reason is the potential offered by mobility, as well as the attraction to hard problems so inherent to the research community. There have been many different goals and approaches to process migration because of the potential migration can offer to different applications (see Section 2.3 on goals, Section 4 on approaches, and Section 2.4 on applications).

With the increasing deployment of distributed systems in general, and distributed operating systems in particular, the interest in process migration is again on the rise both in research and in product development. As high-performance facilities shift from supercomputers to Networks of Workstations (NOW) [Anderson et al., 1995] and large-scale distributed systems, we expect migration to play a more important role and eventually gain wider acceptance.

Operating systems developers in industry have considered supporting process migration, for example Solaris MC [Khalidi et al., 1996], but thus far the availability of process migration in commercial systems is non-existent, as we describe below. Checkpoint-restart systems are becoming increasingly deployed for long-running jobs. Finally, techniques originally developed for process migration have been employed in developing mobile agents on the World Wide Web. Recent interpreted programming languages, such as Java [Gosling et al., 1996], Telescript [White, 1996] and Tcl/Tk [Ousterhout, 1994], provide additional support for agent mobility.

There exist a few books that discuss process migration [Goscinski, 1991; Barak et al., 1993; Singhal and Shivaratri, 1994; Milojičić et al., 1999]; a number of surveys [Smith, 1988; Eskicioglu, 1990; Nuttal, 1994], though none as detailed as this survey; and Ph.D. theses that deal directly with migration [Theimer et al., 1985; Zayas, 1987a; Lu, 1988; Douglis, 1990; Philippe, 1993; Milojičić, 1993c; Zhu, 1992; Roush, 1995], or that are related to migration [Dannenberg, 1982; Nichols, 1990; Tracey, 1991; Chapin, 1993; Knabe, 1995; Jacqmot, 1996].

This survey reviews the field of process migration by summarizing the key concepts and describing the most important implementations. Design and implementation issues of process migration are analyzed in general and then revisited for each of the case studies described: MOSIX, Sprite, Mach, and LSF. The benefits and drawbacks of process migration depend on the details of implementation and therefore this paper focuses on practical matters. In this paper we address mainly process migration mechanisms. Process migration policies, such as load information management and distributed scheduling, are mentioned to the extent that they affect the systems being discussed. More detailed descriptions of policies have been reported elsewhere (e.g., Chapin's survey [1996]).

This survey will help in understanding the potential of process migration. It attempts to demonstrate how and why migration may be widely deployed. We assume that the reader has a general knowledge of operating systems.

Organization of the Paper

The paper is organized as follows. Section 2 provides background on process migration. Section 3 describes process migration by surveying its main characteristics: complexity, performance, transparency, fault resilience, scalability and heterogeneity. Section 4 classifies various implementations of process migration mechanisms and then describes a couple of representatives for each class. Section 5 describes four case studies of process migration in more detail. In Section 6 we compare the process migration implementations presented earlier. In Section 7 we discuss why we believe that process migration has not caught on so far. In the last section we summarize the paper and describe opportunities for further research.

2. BACKGROUND

This section gives some background on process migration by providing an overview of process migration terminology, target architectures, goals, application taxonomy, migration algorithms, system requirements, load information management, distributed scheduling, and alternatives to migration.


Fig. 1. High Level View of Process Migration. Process migration consists of extracting the state of the process on the source node, transferring it to the destination node where a new instance of the process is created, and updating the connections with other processes on communicating nodes.

Fig. 2. Taxonomy of Mobility.

2.1. Terminology

A process is a key concept in operating systems [Tanenbaum, 1992]. It consists of data, a stack, register contents, and the state specific to the underlying Operating System (OS), such as parameters related to process, memory, and file management. A process can have one or more threads of control. Threads, also called lightweight processes, consist of their own stack and register contents, but share a process's address space and some of the operating-system-specific state, such as signals. The task concept was introduced as a generalization of the process concept, whereby a process is decoupled into a task and a number of threads. A traditional process is represented by a task with one thread of control.

Process migration is the act of transferring a process between two machines (the source and the destination node) during its execution. Some architectures also define a host or home node, which is the node where the process logically runs. A high-level view of process migration is shown in Figure 1. The transferred state includes the process's address space, execution point (register contents), communication state (e.g., open files and message channels) and other operating system dependent state. Task migration represents transferring a task between two machines during execution of its threads.

During migration, two instances of the migrating process exist: the source instance is the original process, and the destination instance is the new process created on the destination node. After migration, the destination instance becomes a migrated process. In systems with a home node, a process that is running on other machines may be called a remote process (from the perspective of the home node) or a foreign process (from the perspective of the hosting node).

Remote invocation is the creation of a process on a remote node. Remote invocation is usually a less "expensive" operation than process migration. Although the operation can involve the transfer of some state, such as code or open files, the contents of the address space need not be transferred.

Generally speaking, mobility can be classified into hardware and software mobility, as described in Figure 2. Hardware mobility deals with mobile computing, such as with limitations on the connectivity of mobile computers and mobile IP (see [Milojičić et al., 1999] for more details). A few techniques in mobile computing have an analogy in software mobility, such as security, locating, naming, and communication forwarding. Software mobility can be classified into the mobility of passive data and active data. Passive data represents traditional means of transferring data between computers; it has been employed ever since the first two computers were connected. Active data can be further classified into mobile code, process migration and mobile agents. These three classes represent incremental evolution of state transfer. Mobile code, such as Java applets, transfers only code between nodes. Process migration, which is the main theme of this paper, deals primarily with code and data transfer. It also deals with the transfer of authority, for instance access to a shared file system, but in a limited way: authority is under the control of a single administrative domain. Finally, mobile agents transfer code, data, and especially authority to act on the owner's behalf on a wide scale, such as within the entire Internet.

2.2. Target Architectures

Process migration research started with the appearance of distributed processing among multiple processors. Process migration introduces opportunities for sharing processing power and other resources, such as memory and communication channels. It is addressed in early multiprocessor systems [Stone, 1978; Bokhari, 1979]. Current multiprocessor systems, especially symmetric multiprocessors, are scheduled using traditional scheduling methods. They are not used as an environment for process migration research.

Process migration in NUMA (Non-Uniform Memory Access) multiprocessor architectures is still an active area of research [Gait, 1990; Squillante and Nelson, 1991; Vaswani and Zahorjan, 1991; Nelson and Squillante, 1995]. The NUMA architectures have a different access time to the memory of the local processor, compared to the memory of a remote processor, or to a global memory. The access time to the memory of a remote processor can be variable, depending on the type of interconnect and the distance to the remote processor. Migration in NUMA architectures is heavily dependent on the memory footprint that processes have, both in memory and in caches. Recent research on virtual machines on scalable shared memory multiprocessors [Bugnion et al., 1997] represents another potential for migration. Migration of whole virtual machines between processors of a multiprocessor abstracts away most of the complexities of operating systems, reducing the migrateable state only to memory and to the state contained in a virtual machine monitor [Teodosiu, 2000]. Therefore, migration is easier to implement if there is a notion of a virtual machine.

Massively Parallel Processors (MPP) are another type of architecture used for migration research [Tritscher and Bemmerl, 1992; Zajcew et al., 1993]. MPP machines have a large number of processors that are usually shared between multiple users by providing each of them with a subset, or partition, of the processors. After a user relinquishes a partition, it can be reused by another user. MPP computers are typically of a NORMA (NO Remote Memory Access) type, i.e., there is no remote memory access. In that respect they are similar to network clusters, except that they have a much faster interconnect. Migration represents a convenient tool to achieve repartitioning. Since MPP machines have a large number of processors, the probability of failure is also larger. Migrating a running process from a partially failed node, for example after a bank of memory unrelated to the process fails, allows the process to continue running safely. MPP machines also use migration for load distribution, such as the psched daemon on the Cray T3E, or Loadleveler on IBM SP2 machines.

Since its inception, a Local Area Network (LAN) of computers has been the most frequently used architecture for process migration. The bulk of the systems described in this paper, including all of the case studies, are implemented on LANs. Systems such as NOW [Anderson et al., 1995] or Solaris [Khalidi et al., 1996] have recently investigated process migration using clusters of workstations on LANs. It was observed that at any point in time many autonomous workstations on a LAN are unused, offering potential for other users based on process migration [Mutka and Livny, 1987]. There is, however, a sociological aspect to the autonomous workstation model. Users are not willing to share their computers with others if this means affecting their own performance [Douglis and Ousterhout, 1991]. The priority of the incoming processes (processing, VM, IPC priorities) may be reduced in order to allow for minimal impact on the workstation's owner [Douglis and Ousterhout, 1991; Krueger and Chawla, 1991].


Most recently, wide-area networks have presented a huge potential for migration. The evolution of the Web has significantly improved the relevance and the opportunities for using a wide-area network for migration. This has resulted in the appearance of mobile agents, entities that freely roam the network and represent the user in conducting his tasks. Mobile agents can either appear on the Internet [Johansen et al., 1995] or in closed networks, as in the original version of Telescript [White, 1996].

2.3. Goals

The goals of process migration are closely tied with the type of applications that use migration, as described in the next section. The goals of process migration include:

Accessing more processing power is a goal of migration when it is used for load distribution. Migration is particularly important in the receiver-initiated distributed scheduling algorithms, where a lightly loaded node announces its availability and initiates process migration from an overloaded node. This was the goal of many systems described in this survey, such as Locus [Walker et al., 1983], MOSIX [Barak and Shiloh, 1985], and Mach [Milojičić et al., 1993a]. Load distribution also depends on load information management and distributed scheduling (see Sections 2.7 and 2.8). A variation of this goal is harnessing the computing power of temporarily free workstations in large clusters. In this case, process migration is used to evict processes upon the owner's return, such as in the case of Sprite (see Section 5.2).

Exploitation of resource locality is a goal of migration in cases when it is more efficient to access resources locally than remotely. Moving a process to another end of a communication channel transforms remote communication to local and thereby significantly improves performance. It is also possible that the resource is not remotely accessible, as in the case when there are different semantics for local and remote accesses. Examples include work by Jul [1989], Milojičić et al. [1993], and Miller and Presotto [1981].

Resource sharing is enabled by migration to a specific node with a special hardware device, large amounts of free memory, or some other unique resource. Examples include NOW [Anderson et al., 1995] for utilizing memory of remote nodes, and the use of parallel make in Sprite [Douglis and Ousterhout, 1991] and work by Skordos [1995] for utilizing unused workstations.

Fault resilience is improved by migration from a partially failed node, or in the case of long-running applications when failures of different kinds (network, devices) are probable [Chu et al., 1980]. In this context, migration can be used in combination with checkpointing, such as in Condor [Litzkow and Solomon, 1992] or Utopia [Zhou et al., 1994]. Large-scale systems where there is a likelihood that some of the systems can fail can also benefit from migration, such as in Hive [Chapin et al., 1995] and OSF/1 AD TNC [Zajcew et al., 1993].

System administration is simplified if long-running computations can be temporarily transferred to other machines. For example, an application could migrate from a node that will be shut down, and then migrate back after the node is brought back up. Another example is the repartitioning of large machines, such as in the OSF/1 AD TNC Paragon configuration [Zajcew et al., 1993].

Mobile computing also increases the demand for migration. Users may want to migrate running applications from a host to their mobile computer as they connect to a network at their current location or back again when they disconnect [Bharat and Cardelli, 1995].

2.4. Application Taxonomy

The types of applications that can benefit from process migration include:

Parallelizable applications can be started on certain nodes, and then migrated at the application level or by a system-wide migration facility in response to things like load balancing considerations. Parallel Virtual Machine (PVM) [Beguelin et al., 1993] is an example of application-level support for parallel invocation and interprocess communication, while Migratory PVM (MPVM) [Casas et al., 1995] extends PVM to allow instances of a parallel application to migrate among nodes.

Some other applications are inherently parallelizable, such as the make tool [Baalbergen, 1988]. For example, Sprite provides a migration-aware parallel make utility that distributes a compilation across several nodes [Douglis and Ousterhout, 1991]. Certain processor-bound applications, such as scientific computations, can be parallelized and executed on multiple nodes. An example includes work by Skordos [1995], where an acoustic application is parallelized and executed on a cluster of workstations. Applications that perform I/O and other nonidempotent operations are better suited to a system-wide remote execution facility that provides location transparency and, if possible, preemptive migration.

Long-running applications, which can run for days or even weeks, can suffer various interruptions, for example partial node failures or administrative shutdowns. Process migration can relocate these applications transparently to prevent interruption. Examples of such systems include work by Freedman [1991] and MPVM [Casas et al., 1995]. Migration can also be supported at the application level [Zhou et al., 1994] by providing a checkpoint/restart mechanism which the application can invoke periodically or upon notification of an impending interruption.

Generic multiuser workloads, for example the random job mix that an undergraduate computer laboratory produces, can benefit greatly from process migration. As users come and go, the load on individual nodes varies widely. Dynamic process migration [Barak and Wheeler, 1989; Douglis and Ousterhout, 1991] can automatically spread processes across all nodes, including those applications that are not enhanced to exploit the migration mechanism.

An individual generic application, which is preemptable, can be used with various goals in mind (see Section 2.3). Such an application can either migrate itself, or it can be migrated by another authority. This type of application is most common in various systems described in Section 4 and in the case studies described in Section 5. Note that it is difficult to select such applications without detailed knowledge of past behavior, since many applications are short-lived and do not execute long enough to justify the overhead of migration (see Section 2.7).

Migration-aware applications are applications that have been coded to explicitly take advantage of process migration. Dynamic process migration can automatically redistribute these related processes if the load becomes uneven on different nodes, e.g. if processes are dynamically created, or there are many more processes than nodes. Work by Skordos [1995], Freedman [1991] and Cardelli [1995] represent this class of application. They are described in more detail in Section 4.6.

Network applications are the most recent example of the potential use of migration: for instance, mobile agents and mobile objects (see Sections 4.7 and 4.8). These applications are designed with mobility in mind. Although this mobility differs significantly from the kinds of "process migration" considered elsewhere in this paper, it uses some of the same techniques: location policies, checkpointing, transparency, and locating and communicating with a mobile entity.

2.5. Migration Algorithm

Although there are many different migration implementations and designs, most of them can be summarized in the following steps (see also Figure 3; a pseudocode sketch follows the steps):

1. A migration request is issued to a remote node. After negotiation, migration has been accepted.

2. A process is detached from its source node by suspending its execution, declaring it to be in a migrating state, and temporarily redirecting communication as described in the following step.


Fig. 3. Migration Algorithm. Many details have been simplified, such as user v. kernel migration, when the process is actually suspended, when the state is transferred, how messages are transferred, etc. These details vary with the particular implementation.

3. Communication is temporarily redirected by queuing up arriving messages directed to the migrated process, and by delivering them to the process after migration. This step continues in parallel with steps 4, 5, and 6, as long as there are additional incoming messages. Once the communication channels are enabled after migration (as a result of step 7), the migrated process is known to the external world.

4. The process state is extracted, including memory contents; processor state (register contents); communication state (e.g., opened files and message channels); and relevant kernel context. The communication state and kernel context are OS-dependent. Some of the local OS internal state is not transferable. The process state is typically retained on the source node until the end of migration, and in some systems it remains there even after migration completes. Processor dependencies, such as register and stack contents, have to be eliminated in the case of heterogeneous migration.

5. A destination process instance is created into which the transferred state will be imported. A destination instance is not activated until a sufficient amount of state has been transferred from the source process instance. After that, the destination instance will be promoted into a regular process.


6. State is transferred and imported into a new instance on the remote node. Not all of the state needs to be transferred; some of the state could be lazily brought over after migration is completed (see lazy evaluation in Section 3.2).

7. Some means of forwarding references to the migrated process must be maintained. This is required in order to communicate with the process or to control it. It can be achieved by registering the current location at the home node (e.g. in Sprite), by searching for the migrated process (e.g. in the V Kernel, at the communication protocol level), or by forwarding messages across all visited nodes (e.g. in Charlotte). This step also enables migrated communication channels at the destination and it ends step 3 as communication is permanently redirected.

8. The new instance is resumed when sufficient state has been transferred and imported. With this step, process migration completes. Once all of the state has been transferred from the original instance, it may be deleted on the source node.
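The following pseudocode sketches these eight steps from the point of view of the source node. It is only an illustration of the control flow described above, not the interface of any particular system; the node, process, and channel operations (negotiate, extract_state, create_instance, and so on) are hypothetical names introduced here for clarity.

```python
# Illustrative sketch of the generic migration algorithm (Section 2.5).
# All helper calls are hypothetical; real systems differ in where these
# steps run (kernel vs. user space) and in how much state moves eagerly.

def migrate(process, source, destination):
    # Step 1: issue a migration request and negotiate acceptance.
    if not destination.negotiate(process.resource_requirements()):
        return False

    # Step 2: detach the process from the source node.
    source.suspend(process)
    process.state_flag = "MIGRATING"

    # Step 3: queue messages that arrive during migration.
    pending = source.redirect_messages(process)

    # Step 4: extract address space, registers, communication state,
    # and the transferable part of the kernel context.
    state = source.extract_state(process)

    # Step 5: create a (not yet activated) destination instance.
    instance = destination.create_instance(process.descriptor())

    # Step 6: transfer the state; part of it may be fetched lazily later.
    destination.import_state(instance, state.eager_part())

    # Step 7: leave forwarding information and flush queued messages.
    source.install_forwarding_pointer(process, destination)
    destination.deliver(instance, pending.drain())

    # Step 8: resume the new instance and reclaim the original.
    destination.resume(instance)
    source.cleanup(process)
    return True
```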
2.6. System Requirements for Migration

To support migration effectively, a system should provide the following types of functionality:

- Exporting/importing the process state. The system must provide some type of export/import interfaces that allow the process migration mechanism to extract a process's state from the source node and import this state on the destination node. These interfaces may be provided by the underlying operating system, the programming language, or other elements of the programming environment that the process has access to. State includes processor registers, process address space and communication state, such as open message channels in the case of message-based systems, or open files and signal masks in the case of UNIX-like systems. (A sketch of such an interface follows this list.)

- Naming/accessing the process and its resources. After migration, the migrated process should be accessible by the same name and mechanisms as if migration never occurred. The same applies to the process's resources, such as threads, communication channels, files and devices. During migration, access to a process and/or some of its resources can be temporarily suspended. Varying degrees of transparency are achieved in naming and accessing resources during and after migration (see Section 3.3).

- Cleaning up the process's non-migratable state. Frequently, the migrated process has associated system state that is not migratable (examples include a local process identifier, pid, and the local time; a local pid is relevant only to the local OS, and every host may have a slightly different value for the local time, something that may or may not matter to a migrating process). Migration must wait until the process finishes or aborts any pending system operation. If the operation can be arbitrarily long, it is typically aborted and restarted on the destination node. For example, migration can wait for the completion of local file operations or local device requests that are guaranteed to return in a limited time frame. Waiting for a message or accessing a remote device are examples of operations that need to be aborted and restarted on the remote node. Processes that cannot have their non-migratable state cleaned cannot be considered for migration.
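As an illustration of the first requirement, the fragment below sketches a minimal export/import interface for process state. The ProcessState container and the export_state/import_state functions are hypothetical and intentionally simplified; real systems expose such interfaces in the kernel, the language runtime, or a checkpoint library, and must also handle the communication-state and cleanup issues listed above.

```python
# Minimal sketch of an export/import interface for process state.
# The fields and functions are illustrative, not taken from any system.
from dataclasses import dataclass, field

@dataclass
class ProcessState:
    registers: dict          # processor state (program counter, stack pointer, ...)
    address_space: bytes     # memory image, possibly transferred lazily
    open_files: list = field(default_factory=list)   # names + offsets, not local fds
    channels: list = field(default_factory=list)     # message channels to re-bind
    signal_mask: int = 0     # UNIX-like systems also carry signal state

def export_state(process) -> ProcessState:
    """Extract the transferable state of a process on the source node."""
    return ProcessState(
        registers=process.read_registers(),
        address_space=process.dump_memory(),
        open_files=[(f.path, f.offset) for f in process.files()],
        channels=[c.endpoint() for c in process.channels()],
        signal_mask=process.signal_mask(),
    )

def import_state(node, state: ProcessState):
    """Recreate an equivalent process instance on the destination node."""
    instance = node.allocate_process()
    instance.load_memory(state.address_space)
    instance.write_registers(state.registers)
    for path, offset in state.open_files:
        instance.reopen(path, offset)      # re-acquire resources by name
    for endpoint in state.channels:
        instance.rebind_channel(endpoint)  # naming/accessing requirement
    instance.set_signal_mask(state.signal_mask)
    return instance
```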


2.7. Load Information Management

The local processes and the resources of local and remote nodes have to be characterized, in order to select a process for migration and a destination node, as well as to justify migration. This task is commonly known as load information management. Load information is collected and passed to a distributed scheduling policy (see Figure 4).

Fig. 4. Load Information Management Module collects load information on the local node and disseminates it among the nodes. Distributed Scheduling instructs the migration mechanism when, where, and which process to migrate.

Load information management is concerned with the following three questions:

What is load information and how is it represented? The node load is typically represented by one or more of the following load indices: utilization of the CPU, the length of the queue of processes waiting to be executed, the stretch factor (the ratio between turnaround time and execution time, i.e., submission-to-completion versus start-to-completion) [Ferrari and Zhou, 1986], the number of running processes, the number of background processes, paging, communication [Milojičić, 1993c], disk utilization, and the interrupt rate [Hwang et al., 1982]. A process load is typically characterized by process lifetime, CPU usage, memory consumption (virtual and physical), file usage [Hac, 1989a], communication [Lo, 1989], and paging [Milojičić, 1993c]. Kunz uses a combination of workload descriptions for distributed scheduling [Kunz, 1991]. The application type is considered in Cedar [Hagmann, 1986].

When are load information collection and dissemination activated? These can be periodic or event-based. A typical period is in the range of 1 second or longer, while typical events are process creation, termination, or migration. The frequency of information dissemination is usually lower than the frequency of information collection, i.e. it is averaged over time in order to prevent instability [Casavant and Kuhl, 1988b]. It also depends on the costs involved with dissemination and the costs of process migration. The lower the costs, the shorter the period can be; the higher the costs, the less frequently load information is disseminated.

How much information should be transferred? It can be the entire state, but typically only a subset is transferred in order to minimize the transfer costs and have a scalable solution. In large systems, approximations are applied. For example, only a subset of the information might be transferred, or it might be derived from the subset of all nodes [Barak and Shiloh, 1985; Alon et al., 1987; Han and Finkel, 1988; Chapin and Spafford, 1994].
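The small module below sketches how such a load information manager might look: it collects a handful of local load indices periodically and disseminates only a compact, averaged subset of them to a few peer nodes. The class and its methods are hypothetical, intended only to make the collection/dissemination split and the "small amount of information" point concrete.

```python
# Sketch of a load information management module (Section 2.7).
# Names and the dissemination scheme are illustrative only.
import os
import random
import time

class LoadInfoManager:
    def __init__(self, peers, collect_period=1.0, disseminate_every=5):
        self.peers = peers                    # addresses of other nodes
        self.collect_period = collect_period  # typically ~1 second or longer
        self.disseminate_every = disseminate_every
        self.samples = []

    def collect(self):
        """Collect a few cheap load indices on the local node."""
        load_avg, _, _ = os.getloadavg()      # run-queue length proxy on UNIX
        self.samples.append({"time": time.time(), "run_queue": load_avg})

    def disseminate(self, send):
        """Send an averaged, compact load vector to a small random subset of
        peers; averaging and subsetting keep the policy stable and scalable."""
        if not self.samples:
            return
        avg = sum(s["run_queue"] for s in self.samples) / len(self.samples)
        for peer in random.sample(self.peers, min(3, len(self.peers))):
            send(peer, {"run_queue": avg})
        self.samples.clear()

    def run(self, send, rounds):
        for i in range(rounds):
            self.collect()
            if (i + 1) % self.disseminate_every == 0:
                self.disseminate(send)
            time.sleep(self.collect_period)
```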

There are two important observations derived from the research in load information management. The first one is that just a small amount of information can lead to substantial performance improvements. This observation is related to load distribution in general, but it also applies to process migration. Eager et al. were among the first to argue that load sharing using minimal load information can gain dramatic improvements in performance over the non-load-sharing case, and perform nearly as well as more complex policies using more information [Eager et al., 1986b]. The minimal load information they use consists of the process queue length of a small number of successively probed remote nodes. A small amount of state also reduces communication overhead. Kunz comes to the same conclusion using the concept of stochastic learning automata to implement a task scheduler [Kunz, 1991].

The second observation is that the current lifetime of a process can be used for load distribution purposes. The issue is to find how old a process needs to be before it is worth migrating it. Costs involved with migrating short-lived processes can outweigh the benefits. Leland and Ott were the first to account for the process age in the balancing policy [1986]. Cabrera finds that it is possible to predict a process's expected lifetime from how long it has already lived [Cabrera, 1986]. This justifies migrating processes that manage to live to a certain age. In particular, he finds that over 40% of processes doubled their age. He also finds that most UNIX processes are short-lived: more than 78% of the observed processes have a lifetime shorter than 1s and 97% shorter than 4s.

Harchol-Balter and Downey explore the correlation between process lifetime and acceptable migration costs [Harchol-Balter and Downey, 1997]. They derive a more accurate form of the process lifetime distribution that allows them to predict the lifetime correlated to the process age and to derive a cost criterion for migration. Svensson filters out short-running processes by relying on statistics [Svensson, 1990], whereas Wang et al. deploy AI theory for the same purpose [Wang et al., 1993].

2.8. Distributed Scheduling

This section addresses distributed scheduling closely related to process migration mechanisms. General surveys are presented elsewhere [Wang and Morris, 1985; Casavant and Kuhl, 1988a; Hac, 1989b; Goscinski, 1991; Chapin, 1996].

Distributed scheduling uses the information provided by the load information management module to make migration decisions, as described in Figure 4. The main goal is to determine when to migrate which process where. The activation policy provides the answer to the question when to migrate. Scheduling is activated periodically or it is event-driven. After activation, the load is inspected, and if it is above/below a threshold, actions are undertaken according to the selected strategy. The selection policy answers the question which process to migrate. The processes are inspected and some of them are selected for migration according to the specified criteria. Where to migrate depends on the location policy algorithm, which chooses a remote node based on the available information.

There are a few well-known classes of distributed scheduling policies (the first two are sketched in code after this list):

- A sender-initiated policy is activated on the node that is overloaded and that wishes to off-load to other nodes. A sender-initiated policy is preferable for low and medium loaded systems, which have a few overloaded nodes. This strategy is convenient for remote invocation strategies [Eager et al., 1986a; Krueger and Livny, 1987b; Agrawal and Ezzat, 1987].

- A receiver-initiated policy is activated on underloaded nodes willing to accept the load from overloaded ones. A receiver-initiated policy is preferable for high load systems, with many overloaded nodes and few underloaded ones. Process migration is particularly well-suited for this strategy, since only with migration can one initiate process transfer at an arbitrary point in time [Bryant and Finkel, 1981; Eager et al., 1986a; Krueger and Livny, 1988].

- A symmetric policy is the combination of the previous two policies, in an attempt to take advantage of the good characteristics of both of them. It is suitable for a broader range of conditions than either receiver-initiated or sender-initiated strategies alone [Krueger and Livny, 1987b; Shivaratri et al., 1992].

- A random policy chooses the destination node randomly from all nodes in a distributed system. This simple strategy can result in a significant performance improvement [Alon et al., 1987; Eager et al., 1986b; Kunz, 1991].
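The fragment below sketches the sender-initiated and receiver-initiated activation policies in their simplest threshold form. The thresholds, the probe limit, and the function names are illustrative assumptions, not parameters of any of the systems cited above.

```python
# Threshold sketches of sender- and receiver-initiated activation policies.
# All values and helpers are illustrative.
HIGH_WATERMARK = 2.0   # local run-queue length above which a node is "overloaded"
LOW_WATERMARK = 0.5    # below which a node is "underloaded"
PROBE_LIMIT = 3        # probe only a few nodes (minimal load information)

def sender_initiated(local_load, probe, transfer):
    """Overloaded node tries to push work to a lightly loaded peer."""
    if local_load <= HIGH_WATERMARK:
        return False
    for peer, remote_load in probe(PROBE_LIMIT):   # yields (node, load) pairs
        if remote_load < LOW_WATERMARK:
            transfer(peer)                         # remote invocation or migration
            return True
    return False

def receiver_initiated(local_load, probe, request_migration):
    """Underloaded node advertises itself and pulls work from a busy peer.
    Only preemptive migration lets the busy peer hand over a running process."""
    if local_load >= LOW_WATERMARK:
        return False
    for peer, remote_load in probe(PROBE_LIMIT):
        if remote_load > HIGH_WATERMARK:
            request_migration(peer)
            return True
    return False
```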


The following are some of the issues in distributed scheduling related to the process migration mechanism:

- Adaptability is concerned with the scheduling impact on system behavior [Stankovic, 1984]. Based on the current host and network load, the relative importance of load parameters may change. The policy should adapt to these changes. Process migration is inherently adaptable because it allows processes to run prior to dispatching them to other nodes, giving them a chance to adapt. Migration can happen at any time (thereby adapting to sudden load changes), whereas initial placement happens only prior to starting a process. Examples of adaptive load distribution include work by Agrawal and Ezzat [1987], Krueger and Livny [1988], Concepcion and Eleazar [1988], Efe and Groselj [1989], Venkatesh and Dattatreya [1990], Shivaratri and Krueger [1990], and Mehra and Wah [1992].

- Stability is defined as the ability to detect when the effects of further actions (e.g. load scheduling or paging) will not improve the system state as defined by a user's objective [Casavant and Kuhl, 1988b]. Due to the distributed state, some instability is inevitable, since it is impossible to transfer state changes across the system instantly. However, high levels of instability should be avoided. In some cases it is advisable not to perform any action, e.g. under extremely high loads it is better to abandon load distribution entirely. Process migration can negatively affect stability if processes are migrated back and forth among the nodes, similar to the thrashing introduced by paging [Denning, 1980]. To prevent such behavior a limit on the number of migrations can be imposed. Bryant and Finkel demonstrate how process migration can improve stability [Bryant and Finkel, 1981].

- Approximate and heuristic scheduling is necessary since optimal solutions are hard to achieve. Suboptimal solutions are reached either by approximating the search space with its subset or by using heuristics. Some of the examples of approximate and heuristic scheduling include work by Efe [1982], Leland and Ott [1986], Lo [1988], Casavant and Kuhl [1988a], and Xu and Hwang [1990]. Deploying process migration introduces more determinism and requires fewer heuristics than alternative load distribution mechanisms. Even when incorrect migration decisions are made, they can be alleviated by subsequent migrations, which is not the case with initial process placement, where processes have to execute on the same node until the end of their lifetimes.

- Hierarchical scheduling integrates distributed and centralized scheduling. It supports distributed scheduling within a group of nodes and centralized scheduling among the groups. This area has attracted much research [Bowen et al., 1988; Bonomi and Kumar, 1988; Feitelson and Rudolph, 1990; Gupta and Gopinath, 1990; Gopinath and Gupta, 1991; Chapin, 1995]. A process migration mechanism is a good fit for hierarchical scheduling since processes are typically migrated within a LAN or other smaller domain. Only in the case of large load discrepancies are processes migrated between domains, i.e. between peers at higher levels of the hierarchy.

The most important question that distributed scheduling studies address related to process migration is whether migration pays off. The answer depends heavily on the assumptions made. For example, Eager et al. compare the receiver- and sender-initiated policies [Eager et al., 1986a], and show that the sender-initiated policies outperform the receiver-initiated policies for light and moderate system loads. The receiver-initiated policy is better for higher loads, assuming that transfer costs are the same. They argue that the transfer costs for the receiver policy, which requires some kind of migration, are much higher than the costs for mechanisms for the sender-initiated strategies, where initial placement suffices. They finally conclude that under no condition could migration provide significantly better performance than initial placement [Eager et al., 1988].

Krueger and Livny investigate the relationship between load balancing and load sharing [Krueger and Livny, 1988]. They argue that load balancing and load sharing represent various points in a continuum defined by a set of goals and load conditions [Krueger and Livny, 1987]. They claim that the work of Eager et al. [1988] is only valid for a part of the continuum, but it cannot be adopted generally. Based on better job distributions than those used by Eager et al., their simulation results show that migration can improve performance.


Harchol-Balter and Downey present the most recent results on the benefits of using process migration [Harchol-Balter and Downey, 1997]. They use the measured distribution of process lifetimes for a variety of workloads in an academic environment. The crucial point of their work is understanding the correct lifetime distribution, which they find to be Pareto (heavy-tailed). Based on the trace-driven simulation, they demonstrate a 35-50% improvement in the mean delay when using process migration instead of remote execution (preemptive v. non-preemptive scheduling) even when the costs of migration are high.

Their work differs from Eager et al. [1988] in system model and workload description. Eager et al. model server farms, where the benefits of remote execution are overestimated: there are no associated costs and no affinity toward a particular node. Harchol-Balter and Downey model a network of workstations where remote execution entails costs, and there exists an affinity toward some of the nodes in a distributed system. The workload that Eager et al. use contains few jobs with non-zero lifetimes, resulting in a system with little imbalance and little need for process migration.

2.9. Alternatives to Process Migration

Given the relative complexity of implementation, and the expense incurred when process migration is invoked, researchers often choose to implement alternative mechanisms [Shivaratri et al., 1992; Kremien and Kramer, 1992].

Remote execution is the most frequently used alternative to process migration. Remote execution can be as simple as the invocation of some code on a remote node, or it can involve transferring the code to the remote node and inheriting some of the process environment, such as variables and opened files. Remote execution is usually faster than migration because it does not incur the cost of transferring a potentially large process state (such as the address space, which is created anew in the case of remote execution). For small address spaces, the costs for remote execution and migration can be similar. Remote execution is used in many systems such as COCANET [Rowe and Birman, 1982], Nest [Agrawal and Ezzat, 1987], Sprite [Ousterhout et al., 1988], Plan 9 [Pike et al., 1990], Amoeba [Mullender et al., 1990], Drums [Bond, 1993], Utopia [Zhou et al., 1994], and Hive [Chapin et al., 1995].

Remote execution has disadvantages as well. It allows creation of the remote instance only at the time of process creation, as opposed to process migration which allows moving the process at an arbitrary time. Allowing a process to run on the source node for some period of time is advantageous in some respects. This way, short-lived processes that are not worth migrating are naturally filtered out. Also, the longer a process runs, the more information about its behavior is available, such as whether and with whom it communicates. Based on this additional information, scheduling policies can make more appropriate decisions.

Cloning processes is useful in cases where the child process inherits state from the parent process. Cloning is typically achieved using a remote fork mechanism. A remote fork, followed by the termination of the parent, resembles process migration. The complexity of cloning processes is similar to migration, because the same amount of the process state is inherited (e.g. open files and address space). In the case of migration, the parent is terminated. In the case of cloning, both parent and child may continue to access the same state, introducing distributed shared state, which is typically complex and costly to maintain. Many systems use remote forking [Goldberg and Jefferson, 1987; Smith and Ioannidis, 1989; Zajcew et al., 1993].

Programming language support for mobility enables a wide variety of options, since such systems have almost complete control over the runtime implementation of an application. Such systems can enable self-checkpointing (and hence migratable) applications. They are suitable for entire processes, but also for objects as small as a few bytes, such as in Emerald [Jul et al., 1988; Jul, 1989] or Ellie [Andersen, 1992]. Finer granularity incurs lower transfer costs. The complexity of maintaining communication channels poses different kinds of problems. In Emerald, for example, the pointers have to be updated to the source object. Programming language support allows a programmer to introduce more information on object behavior, such as hints about communication and concurrency patterns.

Object migration at the middleware level is also possible. Because of the increasing costs of operating system development and the lack of standard solutions for distributed systems and heterogeneity, middleware level solutions have become of more interest [Bernstein, 1996]. Distributed objects are supported in middleware systems such as DCE [Rosenberry et al., 1992] and CORBA [OMG, 1996]. Object migration at the middleware level has not attracted as much research as process migration in operating systems. One of the reasons is that the early heterogeneity of these systems did not adequately support mobility. Nevertheless, a couple of systems do support mobility at the middleware level, such as DC++ [Schill and Mock, 1993] and the OMG MASIF specification for mobile agents [Milojičić et al., 1998b] based on OMG CORBA.

Mobile agents are becoming increasingly popular. The mobility of agents on the Web emphasizes safety and security issues more than complexity, performance, transparency and heterogeneity. Mobile agents are implemented on top of safe languages, such as Java [Gosling et al., 1996], Telescript [White, 1996] and Tcl/Tk [Ousterhout, 1994]. Compared to process migration, mobile agents have reduced implementation complexity because they do not have to support OS semantics. Performance requirements are different due to the wide-area network communication cost, which is the dominant factor. Heterogeneity is abstracted away at the language level. The early results and opportunities for deployment, as well as the wide interest in the area of mobile agents, indicate a promising future for this form of mobility. However, the issues of security, social acceptance, and commercially viable applications have become significantly more important, and they represent the main focus of research in the mobile agent community. Mobile agents are described in more detail in Section 4.8.

3. CHARACTERISTICS

This section addresses issues in process migration, such as complexity, performance, transparency, fault resilience, scalability and heterogeneity. These characteristics have a major impact on the effectiveness and deployment of process migration.

3.1. Complexity and Operating System Support

The complexity of implementation and dependency on an operating system are among the obstacles to the wider use of process migration. This is especially true for fully-transparent migration implementations. Migration can be classified according to the level at which it is applied. It can be applied as part of the operating system kernel, in user space, as part of a system environment, or as a part of the application (see Figure 5). Implementations at different levels result in different performance, complexity, transparency and reusability.

Fig. 5. Migration levels differ in implementation complexity, performance, transparency, and reusability.

User-level migration typically yields simpler implementations, but suffers too much from reduced performance and transparency to be of general use for load distribution. User-space implementations are usually provided for the support of long-running computations [Litzkow and Solomon, 1992]. Migration implemented as part of an application can have poor reusability if modifications are required to the application, as was done in the work by Freedman [1991] and Skordos [1995]. This requires familiarity with applications and duplicating some of the mechanisms for each subsequent application, frequently involving effort beyond re-linking the migration part with the application code. It could be somewhat improved if parts of the migration support were organized in a reusable run-time library. Lower-level migration is more complex to implement, but has better performance, transparency and reusability.

Despite high migration costs, user-level implementations have some benefits with regard to policy. The layers closer to an application typically have more knowledge about its behavior. This knowledge can be used to derive better policies and, hence, better overall performance. Similar motivations led to the development of microkernels, such as Mach [Accetta et al., 1986], Chorus [Rozier, 1992], and Amoeba [Tanenbaum, 1990], which have moved much of their functionality from the kernel into user space. For example, file servers and networking may be implemented in user space, leaving only a minimal subset of functionality provided in the microkernel, such as virtual memory management, scheduling and interprocess communication.

Extensible kernels, such as Spin [Bershad et al., 1995], Exokernel [Engler et al., 1995], and Synthetix [Pu et al., 1995], have taken an alternative approach by allowing user-implemented parts to be imported into the kernel. Both microkernels and extensible kernels provide opportunities for extracting a process's state from the operating system.

There have been many implementations of migration for various operating systems and hardware architectures; many of them required a significant implementation effort and modifications to the underlying kernel [Barak and Shiloh, 1985; Theimer et al., 1985; Zayas, 1987a; Douglis and Ousterhout, 1991]. This complexity is due to the underlying operating system architecture and specifically its lack of support for the complex interactions resulting from process migration. In the early days, migration required additional OS support, such as extensions for communications forwarding [Artsy et al., 1987], or for data transfer strategies [Theimer et al., 1985; Zayas, 1987a]. In the case of some subsequent migration implementations, this support already existed in the OS, such as in the case of Mach [Milojičić et al., 1993a].

In UNIX-like operating systems, support for opened files and signals requires significant interaction with various kernel subsystems [Douglis, 1989; Welch, 1990]. Process migration in message-passing kernels requires significant effort to support message handling [Theimer et al., 1985; Artsy et al., 1987; Artsy and Finkel, 1989]. Recent operating systems provide much of this support, such as transparent distributed IPC with message forwarding, and external distributed pagers, which allow easier optimizations and customizing [Black et al., 1992; Rozier, 1992]. Nevertheless, migration still challenges these mechanisms and frequently breaks them [Douglis and Ousterhout, 1991; Milojičić, 1993c].

3.2. Performance

Performance is the second important factor that affects the deployment of process migration. Migration performance depends on initial and run-time costs introduced by the act of migration. The initial costs stem from state transfer. Instead of at migration time, some of the state may be transferred lazily (on-demand), thereby incurring run-time costs. Both types of cost may be significant, depending on the application characteristics, as well as on the ratio of state transferred eagerly vs. lazily.

If only part of the task state is transferred to another node, the task can start executing sooner, and the initial migration costs are lower. This principle is called lazy evaluation: actions are not taken before they are really needed, with the hope that they will never be needed. However, when this is not true, penalties are paid for postponed access. For example, it is convenient to migrate a huge address space on demand instead of eagerly. In the lazy case, part of the space may never be transferred if it is not accessed. However, the source node needs to retain lazily evaluated state throughout the lifetime of the migrated process.

For example, it is convenient to migrate a huge address space on demand instead of eagerly. In the lazy case, part of the space may never be transferred if it is not accessed. However, the source node needs to retain lazily evaluated state throughout the lifetime of the migrated process.

A process's address space usually constitutes by far the largest unit of process state; not surprisingly, the performance of process migration largely depends on the performance of the address space transfer. Various data transfer strategies have been invented in order to avoid the high cost of address space transfer.

- The eager (all) strategy copies all of the address space at migration time. Initial costs may be in the range of minutes. Checkpoint/restart implementations typically use this strategy, such as Condor [Litzkow and Solomon, 1992] or LSF [Zhou et al., 1994].

- The eager (dirty) strategy can be deployed if there is remote paging support. This is a variant of the eager (all) strategy that transfers only modified (dirty) pages. Unmodified pages are paged in on request from a backing store. Eager (dirty) significantly reduces the initial transfer costs when a process has a large address space. Systems supporting the eager (dirty) strategy include MOSIX [Barak and Litman, 1985] and Locus [Popek and Walker, 1985].

- The Copy-On-Reference (COR) strategy is a network version of demand paging: pages are transferred only upon reference. While dirty pages are brought from the source node, clean pages can be brought either from the source node or from the backing store. The COR strategy has the lowest initial costs, ranging from a few tens to a few hundred microseconds. However, it increases the run-time costs, and it also requires substantial changes to the underlying operating system and to the paging support [Zayas, 1987a].

- The flushing strategy consists of flushing dirty pages to disk and then accessing them on demand from disk instead of from memory on the source node as in copy-on-reference [Douglis and Ousterhout, 1991]. The flushing strategy is like the eager (dirty) transfer strategy from the perspective of the source, and like copy-on-reference from the target's viewpoint. It leaves dependencies on the server, but not on the source node.

- The precopy strategy reduces the "freeze" time of the process, the time that the process is neither executed on the source nor on the destination node. While the process is executed on the source node, the address space is being transferred to the remote node until the number of dirty pages is smaller than a fixed limit. Pages dirtied during precopy have to be copied a second time. The precopy strategy cuts down the freeze time below the costs of the COR technique [Theimer et al., 1985].

There are also variations of the above strategies. The most notable example is migration in the Choices operating system [Roush and Campbell, 1996]. It uses a variation of the eager (dirty) strategy which transfers minimal state to the remote node at the time of migration. The remote instance is started while the remainder of the state is transferred in parallel. The initial migration time is reduced to 13.9 ms running on a SparcStation II connected by a 10 Mb Ethernet, which is an order of magnitude better than all other reported results, even if results are normalized (see work by Roush [1995] for more details on normalized performance results).
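To make the precopy strategy concrete, the following minimal sketch shows the loop described above: the address space is transferred while the process keeps running, pages dirtied in the meantime are re-copied, and the process is frozen only when the dirty set falls below a fixed limit. All names and parameters here are illustrative; none of them come from the systems surveyed.

```python
# Illustrative sketch of a precopy transfer loop; page tracking and transport
# are abstracted behind caller-supplied functions.

def precopy_migrate(address_space, get_dirty_pages, send_pages,
                    freeze, resume_on_target, dirty_limit=16, max_rounds=5):
    """Copy the address space while the process runs, then freeze it briefly
    and copy only the pages that are still dirty."""
    send_pages(address_space)              # round 0: copy everything eagerly
    for _ in range(max_rounds):
        dirty = get_dirty_pages()          # pages modified since the last round
        if len(dirty) <= dirty_limit:
            break
        send_pages(dirty)                  # pages dirtied during precopy are copied again
    freeze()                               # short freeze: only the small residue remains
    send_pages(get_dirty_pages())          # final copy of the remaining dirty pages
    resume_on_target()
```

The trade-off visible in the sketch matches the text: freeze time shrinks, but total work grows because some pages are transferred more than once.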

Leaving some part of the process state on the source or intermediate nodes of the migrated instance results in a residual dependency. Residual dependencies typically occur as a consequence of two implementation techniques: either using lazy evaluation (see the definition above), or as a means for achieving transparency in communication, by forwarding subsequent messages to a migrated process.

A particular case of residual dependency is the home dependency, which is a dependency on the (home) node where a process was created [Douglis and Ousterhout, 1991]. An example of a home dependency is redirecting system calls to the home node: for example, local host-dependent calls, calls related to the file system (in the absence of a distributed file system), or operations on local devices. A home dependency can simplify migration, because it is easier to redirect requests to the home node than to support services on all nodes. However, it also adversely affects reliability, because a migrated foreign process will always depend on its home node. The notion of the home dependency is further elaborated upon below in Section 5.1 (MOSIX) and Section 5.2 (Sprite).

Redirecting communication through the previously established links represents another kind of residual dependency. In general, dependencies left at multiple nodes should be avoided, since they require complex support, and degrade performance and fault resilience. Therefore, some form of periodic or lazy removal of residual dependencies is desirable. For example, the system could flush remaining pages to the backing store, or update residual information on migrated communication channels.

3.3. Transparency

Transparency requires that neither the migrated task nor other tasks in the system can notice migration, with the possible exception of performance effects. Communication with a migrated process could be delayed during migration, but no message can be lost. After migration, the process should continue to communicate through previously opened I/O channels, for example printing to the same console or reading from the same files.

Transparency is supported in a variety of ways, depending on the underlying operating system. Sprite and NOW MOSIX maintain a notion of a home machine that executes all host-specific code [Douglis and Ousterhout, 1991; Barak et al., 1995]. Charlotte supports IPC through links, which provide for remapping after migration [Finkel et al., 1989].

Transparency also assumes that the migrated instance can execute all system calls as if it were not migrated. Some user-space migrations do not allow system calls that generate internode signals or file access [Mandelberg and Sunderam, 1988; Freedman, 1991].

Single system image (SSI) represents a complete form of transparency. It provides a unique view of a system composed of a number of nodes as if there were just one node. A process can be started and communicated with without knowing where it is physically executing. Resources can be transparently accessed from any node in the system as if they were attached to the local node. The underlying system typically decides where to instantiate new processes or where to allocate and access resources.

SSI can be applied at different levels of the system. At the user level, SSI consists of providing transparent access to objects and resources that comprise a particular programming environment. Examples include Amber [Chase et al., 1989] and Emerald [Jul, 1989]. At the traditional operating system level, SSI typically consists of a distributed file system and distributed process management, such as in MOSIX [Barak and Litman, 1985], Sprite [Ousterhout et al., 1988] and OSF/1 AD TNC [Zajcew et al., 1993]. At the microkernel level, SSI is comprised of mechanisms, such as distributed IPC, distributed memory management, and remote tasking. A near-SSI is implemented for Mach [Black et al., 1992] based on these transparent mechanisms, but the policies are supported at the OSF/1 AD server running on top of it. At the microkernel level the programmer needs to specify where to create remote tasks.

SSI supports transparent access to a process, as well as to its resources, which simplifies migration. On the other hand, the migration mechanism exercises functionality provided at the SSI level, posing a more stressful workload than normally experienced in systems without migration [Milojičić et al., 1993a]. Therefore, although a migration implementation on top of SSI may seem less complex, this complexity is pushed down into the SSI implementation.

Some location dependencies on another host may be inevitable, such as accessing local devices or accessing kernel-dependent state that is managed by the other host. It is not possible to transparently support such dependencies on the newly visited nodes, other than by forwarding the calls back to the home node, as was done in Sprite [Douglis and Ousterhout, 1991].

3.4. Fault Resilience

Fault resilience is frequently mentioned as a benefit of process migration. However, this claim has never been substantiated with a practical implementation, although some projects have specifically addressed fault resilience [Chou and Abraham, 1983; Lu et al., 1987]. So far the major contribution of process migration for fault resilience is through combination with checkpointing, such as in Condor [Litzkow and Solomon, 1992], LSF [Zhou et al., 1994] and in work by Skordos [1995]. Migration was also suggested as a means of fault containment [Chapin et al., 1995].

Failures play an important role in the implementation of process migration. They can happen on a source or target machine or on the communication medium. Various migration schemes are more or less sensitive to each type of failure. Residual dependencies have a particularly negative impact on fault resilience. Using them is a trade-off between efficiency and reliability.

Fault resilience can be improved in several ways. The impact of failures during migration can be reduced by maintaining process state on both the source and destination sites until the destination site instance is successfully promoted to a regular process and the source node is informed about this. A source node failure can be overcome by completely detaching the instance from the source node once it is migrated, though this prevents lazy evaluation techniques from being employed. One way to remove communication residual dependencies is to deploy locating techniques, such as multicasting (as used in the V kernel [Theimer et al., 1985]), reliance on the home node (as used in Sprite [Douglis and Ousterhout, 1991] and MOSIX [Barak and Litman, 1985]), or on a forwarding name server (as used in most distributed name services, such as DCE, as well as in mobile agents, such as MOA [Milojičić et al., 1999]). This way dependencies are singled out on dedicated nodes, as opposed to being scattered throughout all the nodes visited, as is the case for Charlotte [Artsy et al., 1987]. Shapiro et al. [1992] propose so-called SSP Chains for periodically collapsing forwarding pointers (and thereby reducing residual dependencies) in the case of garbage collection.
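The idea behind periodically collapsing forwarding pointers can be shown with a small sketch. This is not the SSP-chain protocol itself; the table and function names are invented for the example. Each node a process leaves keeps a forwarding pointer, and a lookup both follows the chain and shortens it so that later messages no longer traverse every node the process has visited.

```python
# Toy forwarding table: forward[node] names the node a process moved to.
forward = {}

def migrate(src, dst):
    forward[src] = dst            # leaving a residual forwarding pointer behind

def locate(start):
    node, visited = start, []
    while node in forward:        # follow the chain of forwarding pointers
        visited.append(node)
        node = forward[node]
    for n in visited:             # collapse: point every hop at the current node
        forward[n] = node
    return node

migrate("A", "B"); migrate("B", "C"); migrate("C", "D")
assert locate("A") == "D"         # first lookup walks A -> B -> C -> D ...
assert forward["A"] == "D"        # ... and leaves a direct pointer for later messages
```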

3.5. Scalability

The scalability of a process migration mechanism is related to the scalability of its underlying environment. It can be measured with respect to the number of nodes in the system, to the number of migrations a process can perform during its lifetime, and to the type and complexity of the processes, such as the number of open channels or files, and memory size or fragmentation.

The number of nodes in the system affects the organization and management of structures that maintain residual process state and the naming of migrated processes. If these structures are not part of the existing operating system, then they need to be added.

Depending on the migration algorithm and the techniques employed, some systems are not scalable in the number of migrations a process may perform. As we shall see in the case study on Mach (see Section 5.3), sometimes process state can grow with the number of migrations. This is acceptable for a small number of migrations, but in other cases the additional state can dominate migration costs and render the migration mechanism useless.

Migration algorithms should avoid linear dependencies on the amount of state to be transferred. For example, the eager data transfer strategy has costs proportional to the address space size, incurring significant costs for large address spaces. The costs for a lazily copied process are independent of the address space size, but they can depend on the granularity and type of the address space. For example, the transfer of a large sparse address space can have costs proportional to the number of contiguous address space regions, because each such region has metadata associated with it that must be transferred at migration time. This overhead can be exacerbated if the metadata for each region is transferred as a separate operation, as was done in the initial implementation of Mach task migration [Milojičić et al., 1993b].

Communication channels can also affect scalability. Forwarding communication to a migrated process is acceptable after a small number of sequential migrations, but after a large number of migrations the forwarding costs can be significant. In that case, some other technique, such as updating communication links, must be employed.

3.6. Heterogeneity

Heterogeneity has not been addressed in most early migration implementations. Instead, homogeneity is considered as a requirement; migration is allowed only among the nodes with a compatible architecture and processor instruction set. This was not a significant limitation at the time since most of the work was conducted on clusters of homogeneous machines.

Some earlier work indicated the need as well as possible solutions for solving the heterogeneity problem, but no mature implementations resulted [Maguire and Smith, 1988; Dubach, 1989; Shub, 1990; Theimer and Hayes, 1991].

The deployment of world-wide computing has increased the interest in heterogeneous migration. In order to achieve heterogeneity, process state needs to be saved in a machine-independent representation. This permits the process to resume on nodes with different architectures. An application is usually compiled in advance on each architecture, instrumenting the code to know what procedures and variables exist at any time, and identifying points at which the application can be safely preempted and checkpointed. The checkpointing program sets a breakpoint at each preemption point and examines the state of the process when a breakpoint is encountered. Smith and Hutchinson note that not all programs can be safely checkpointed in this fashion, largely depending on what features of the language are used [Smith and Hutchinson, 1998]. Emerald [Steensgaard and Jul, 1995] is another example of a heterogeneous system.

In the most recent systems, heterogeneity is provided at the language level, as by using intermediate byte code representation in Java [Gosling et al., 1996], or by relying on scripting languages such as Telescript [White, 1996] or Tcl/Tk [Ousterhout, 1994].
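The flavor of a machine-independent checkpoint can be sketched as follows. The format, file name, and functions below are invented for illustration and are not taken from Smith and Hutchinson's system: the point is that logical state (procedure name and variable values) is recorded in an architecture-neutral encoding rather than as raw memory, so it can be reloaded on a machine with a different instruction set.

```python
import json

# Illustrative sketch: at a designated preemption point the program saves its
# logical state in a text encoding that does not depend on word size or endianness.
def checkpoint(path, procedure, variables):
    with open(path, "w") as f:
        json.dump({"procedure": procedure, "variables": variables}, f)

def restore(path):
    with open(path) as f:
        return json.load(f)

# A program instrumented with preemption points might call:
checkpoint("ckpt.json", "solve_step", {"iteration": 42, "residual": 0.003})
state = restore("ckpt.json")   # usable by a binary compiled for another architecture
```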

3.7. Summary

This subsection evaluates the trade-offs between various characteristics of process migration, and who should be concerned with them.

Complexity is much more of a concern to the implementors of a process migration facility than to its users. Complexity depends on the level where migration is implemented. Kernel-level implementations require significantly more complexity than user-level implementations. Users of process migration are impacted only in the case of user-level implementations where certain modifications of the application code are required or where migration is not fully transparent.

Long-running applications are not concerned with performance as are those applications whose lifetimes are comparable to their migration time. Short-running applications are generally not good candidates for migration. Migration-time performance can be traded off against execution-time performance (by leaving residual dependencies, or by lazily resolving communication channels). Residual dependencies are of concern for long-running applications and for network applications. Applications with real-time requirements generally are not suitable candidates for residual dependencies because of the unpredictable costs of bringing in additional state. On the other hand, real-time requirements can be more easily fulfilled with strategies such as precopy.

Legacy applications are concerned with transparency in order to avoid any changes to existing code. Scientific applications typically do not have transparency requirements. Frequently, one is allowed to make modifications to the code of these applications, and even support migration at the application level (e.g. by checkpointing state at the application level). Transparency typically incurs complexity. However, transparency is not related to migration exclusively; rather, it is inherent to remote access. Transparent remote execution can require support that is as complex as transparent process migration [Douglis and Ousterhout, 1991].

Scientific applications (typically long-running), as well as network applications, are concerned with failure tolerance. In most cases periodic checkpointing of the state suffices.

Scalability requires additional complexity for efficient support. It is of concern for scientific applications because they may require a large number of processes, large address spaces, and a large number of communication channels. It is also important for network applications, especially those at the Internet scale.

Heterogeneity introduces performance penalties and additional complexity. It is of most concern to network applications, which typically run on inhomogeneous systems.

4. EXAMPLES

This section classifies process migration implementations in the following categories: early work; UNIX-like systems supporting transparent migration; systems with message-passing interfaces; microkernels; user-space migration; and application-specific migration. In addition, we also give an overview of mobile objects and mobile agents. These last two classes do not represent process migration in the classic sense, but they are similar in sufficiently many ways to warrant their inclusion [Milojičić et al., 1998a]. For each class, an overview and some examples are presented. Finally, in the last subsection, we draw some conclusions. The next section expands upon four of these systems in substantial detail.

There are also other examples of process migration that can fit into one or more classes presented in this section. Examples include object migration in Eden [Lazowska et al., 1981]; MINIX [Louboutin, 1991]; Galaxy [Sinha et al., 1991]; work by Dediu [1992]; EMPS [van Dijk and van Gils, 1992]; object migration for OSF DCE, DC++ [Schill and Mock, 1993]; work by Petri and Langendorfer [1995]; MDX [Schrimpf, 1995]; and many more. A description of these systems is beyond the scope of this paper. In addition to other surveys of process migration already mentioned in the introduction [Smith, 1988; Eskicioglu, 1990; Nuttal, 1994], Borghoff provides a catalogue of distributed operating systems with many examples of migration mechanisms [Borghoff, 1991].

4.1. Early Work

Early work is characterized by specialized, ad hoc solutions, often optimized for the underlying hardware architecture. In this subsection we briefly mention XOS, Worm, DEMOS/MP and Butler.

Migration in XOS is intended as a tool for minimizing the communication between the nodes in an experimental multiprocessor system, organized in a tree fashion [Miller and Presotto, 1981]. The representation of the process and its state are designed in such a way as to facilitate migration. The Process Work Object (PWO) encapsulates process-related state including stack pointers and registers. Migration is achieved by moving PWO objects between the XOS nodes. The process location is treated as a hint, and the current location is found by following hints.

The Worm idea has its background in the nature of real worms [Shoch and Hupp, 1982]. A worm is a computation that can live on one or more machines. Parts of the worm residing on a single
machine are called segments. If a segment fails, other segments cooperatively reinstantiate it by locating a free machine, rebooting it from the network, and migrating the failed worm segment to it. A worm can move from one machine to another, occupying needed resources, and replicating itself. As opposed to other migration systems, a worm is aware of the underlying network topology. Communication among worm segments is maintained through multicasting.

The original Butler system supports remote execution and process migration [Dannenberg, 1982]. Migration occurs when the guest process needs to be "deported" from the remote node, e.g. in the case when it exceeds the resources it negotiated before arrival. In such a case, the complete state of the guest process is packaged and transferred to a new node. The state consists of the address space, registers, as well as the state contained in the servers collocated at the same node. Migration does not break the communication paths because the underlying operating system (Accent [Rashid and Robertson, 1981]) allows for port migration. The Butler design also deals with the issues of protection, security, and autonomy [Dannenberg and Hibbard, 1985]. In particular, the system protects the client program, the Butler daemons on the source and destination nodes, the visiting process, and the remote node. In its later incarnation, Butler supports only remote invocation [Nichols, 1987].

DEMOS/MP [Miller et al., 1987] is a successor of the earlier version of the DEMOS operating system [Baskett et al., 1977]. Process migration is fully transparent: a process can be migrated during execution without limitations on resource access. The implementation of migration has been simplified and its impact on other services limited by the message-passing, location-independent communication, and by the fact that the kernel can participate in the communication in the same manner as any process [Powell and Miller, 1983]. Most of the support for process migration already existed in the DEMOS kernel. Extending it with migration required mechanisms for forwarding messages and updating links. The transferred state includes program code and data (most of the state), swappable and nonswappable state, and messages in the incoming queue of the process.

4.2. Transparent Migration in UNIX-like Systems

UNIX-like systems have proven to be relatively hard to extend for transparent migration and have required significant modifications and extensions to the underlying kernel (see Subsections 4.3 and 4.4 for comparisons with other types of OSes). There are two approaches to addressing distribution and migration for these systems. One is to provide for distribution at the lower levels of a system, as in MOSIX or Sprite, and the other is by providing distribution at a higher level, as in Locus and its derivatives. In this section, we shall describe process migration for Locus, MOSIX and Sprite. All of these systems also happened to be RPC-based, as opposed to the message-passing systems described in Section 4.3.

Locus is a UNIX-compatible operating system that provides transparent access to remote resources, and enhanced reliability and availability [Popek et al., 1981; Popek and Walker, 1985]. It supports process migration [Walker et al., 1983] and initial placement [Butterfield and Popek, 1984]. Locus is one of the rare systems that achieved product stage. It has been ported to the AIX operating system on the IBM 370 and PS/2 computers under the name of the Transparent Computing Facility (TCF) [Walker and Mathews, 1989]. Locus migration has a high level of functionality and transparency. However, this has required significant kernel modifications.

Locus has subsequently been ported to the OSF/1 AD operating system, under the name of TNC [Zajcew et al., 1993]. OSF/1 AD is a distributed operating system running on top of the Mach microkernel on Intel and Paragon architectures (see Section 5.3). TNC is only partially concerned with task migration issues of the underlying Mach microkernel, because in
the OSF/1 AD environment the Mach interface is not exposed to the user, and therefore the atomicity of process migration is not affected. Locus was also used as a testbed for a distributed shared memory implementation, Mirage [Fleisch and Popek, 1989]. Distributed shared memory was not combined with process migration as was done in the case of Mach (see Section 5.3).

The MOSIX distributed operating system is an ongoing project that began in 1981. It supports process migration on top of a single system image base [Barak and Litman, 1985] and in a Network of Workstations environment [Barak et al., 1995]. The process migration mechanism is used to support dynamic load balancing. MOSIX employs a probabilistic algorithm in its load information management that allows it to transmit partial load information between pairs of nodes [Barak and Shiloh, 1985; Barak and Wheeler, 1989]. A case study of the MOSIX system is presented in Section 5.1.

The Sprite network operating system [Ousterhout et al., 1988] was developed from 1984 to 1994. Its process migration facility [Douglis and Ousterhout, 1991] was transparent both to users and to applications, by making processes appear to execute on one host throughout their execution. Processes could access remote resources, including files, devices, and network connections, from different locations over time. When a user returned to a workstation onto which processes had been offloaded, the processes were immediately migrated back to their home machines and could execute there, migrate elsewhere, or suspend execution. A case study of the Sprite system is presented in Section 5.2.

4.3. OS with Message-Passing Interface

Process migration for message-passing operating systems seems easier to design and implement. Message passing is convenient for interposing, forwarding and encapsulating state. For example, a new receiver may be interposed between the existing receiver and the sender, without the knowledge of the latter, and messages sent to a migrated process can be forwarded after its migration to a new destination. However, much of the simplicity that seems to be inherent for message-passing systems is hidden inside the complex message-passing mechanisms [Douglis and Ousterhout, 1991].

In this section we describe Charlotte, Accent and the V kernel. The V kernel can be classified both as a microkernel and as a message-passing kernel; we chose to present it in the message-passing section.

Charlotte is a message-passing operating system designed for the Crystal multicomputer composed of 20 VAX-11/750 computers [Artsy and Finkel, 1989]. The Charlotte migration mechanism extensively relies on the underlying operating system and its communication mechanisms, which were modified in order to support transparent network communication [Artsy et al., 1987]. Its process migration is well insulated from other system modules. Migration is designed to be fault resilient: processes leave no residual dependency on the source machine. The act of migration is committed in the final phase of the state transfer; it is possible to undo the migration before committing it.

Accent is a distributed operating system developed at CMU [Rashid and Robertson, 1981; Rashid, 1986]. Its process migration scheme was the first one to use the "Copy-On-Reference" (COR) technique to lazily copy pages [Zayas, 1987a]. Instead of eagerly copying pages, virtual segments are created on the destination node. When a page fault occurs, the virtual segment provides a link to the page on the source node. The duration of the initial address space transfer is independent of the address space size, but rather depends on the number of contiguous memory regions. The subsequent costs for lazily copied pages are proportional to the number of pages referenced. The basic assumption is that the program would not access all of its address space, thereby saving the cost of a useless transfer. Besides failure vulnerability, the drawback of lazy evaluation is the increased complexity of in-kernel memory management [Zayas, 1987b].
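The effect of copy-on-reference can be sketched in a few lines of user-level code. This is only an analogy to Accent's mechanism, which lives inside kernel memory management; the class and callback below are invented for the example. The destination holds an initially empty "segment" and fetches a page from the source only when it is first touched.

```python
# Illustrative sketch of copy-on-reference: a page crosses the network only
# on first access, then stays cached on the destination.
class LazySegment:
    def __init__(self, fetch_page):
        self.fetch_page = fetch_page     # callback that asks the source node for a page
        self.local_pages = {}            # pages already copied to the destination

    def read(self, page_no):
        if page_no not in self.local_pages:          # the "page fault"
            self.local_pages[page_no] = self.fetch_page(page_no)
        return self.local_pages[page_no]

# Simulated source node holding the original address space.
source = {0: b"code", 1: b"heap", 2: b"stack"}
seg = LazySegment(lambda n: source[n])
seg.read(1)                              # only page 1 is transferred
assert set(seg.local_pages) == {1}
```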


The V Kernel is a microkernel developed at Stanford University [Cheriton, 1988]. It introduces a "precopying" technique for the process address space transfer [Theimer et al., 1985]. The address space of the process to be migrated is copied to the remote node prior to its migration, while the process is still executing on the source node. Dirty pages referenced during the precopying phase are copied again. It has been shown that only two or three iterations generally suffice to reach an acceptably small number of dirty pages. At that point in time the process is frozen and migrated. This technique shortens the process freeze time, but otherwise negatively influences the execution time, since overhead is incurred in iterative copying. Migration benefits from a communications protocol that dynamically rebinds to alternate destination hosts as part of its implementation of reliable message delivery. Instead of maintaining process communication endpoints after migration, V relies on multicast to find the new process location.

4.4. Microkernels

The microkernel approach separates the classical notion of a monolithic kernel into a microkernel and an operating system personality running on top of it in a separate module. A microkernel supports tasks, threads, IPC and VM management, while other functionality, such as networking, file system and process management, is implemented in the OS personality. Various OS personalities have been implemented, such as BSD UNIX [Golub et al., 1990], AT&T UNIX System V [Rozier, 1992; Cheriton, 1990], MS DOS [Malan et al., 1991], VMS [Wiecek, 1992], OS/2 [Phelan and Arendt, 1993] and Linux [Barbou des Places et al., 1996].

In the late eighties and early nineties, there was a flurry of research into microkernels, including systems such as Mach [Accetta et al., 1986], Chorus [Rozier, 1992], Amoeba [Mullender et al., 1990], QNX [Hildebrand, 1992], Spring [Hamilton and Kougiouris, 1993] and L3 [Liedtke, 1993], which eventually reached commercial implementations, and many more research microkernels, such as Arcade [Cohn et al., 1989], Birlix [Haertig et al., 1993], KeyKOS [Bomberger et al., 1992] and RHODOS [Gerrity et al., 1991].

The microkernel approach, combined with message passing, allows for transparent, straightforward extensions to distributed systems. Not surprisingly, microkernels are a suitable environment for various migration experiments. The task migration mechanism can be reused by different OS personalities, as a common denominator for different OS-specific process migration mechanisms. In this subsection we describe process migrations for RHODOS, Arcade, Chorus, Amoeba, Birlix and Mach.

RHODOS consists of a nucleus that supports trap and interrupt handling, context switching, and local message passing. The kernel runs on top of the nucleus and supports IPC, memory, process, and migration managers [Gerrity et al., 1991]. The migration mechanism is similar to that in Sprite, with some modifications specific to the RHODOS kernel [Zhu, 1992].

Arcade considers groups of tasks for migration [Cohn et al., 1989]. It is used as a framework for investigating sharing policies related to task grouping [Tracey, 1991]. The group management software ensures that members of the group execute on different machines, thereby exploiting parallelism.

The Chorus microkernel was extended to support process migration [Philippe, 1993]. The migration mechanism is similar to task migration on top of Mach (cf. Section 5.3), however it is applied at the process level, instead of the Actor level. Actors in Chorus correspond to Mach tasks. Chorus migration is biased toward the hypercube implementation (fast and reliable links). Some limitations were introduced because Chorus did not support port migration.

Steketee et al. implemented process migration for the Amoeba operating system [Steketee et al., 1994]. Communication transparency relies on the location independence of the FLIP protocol [Kaashoek
et al., 1993]. Since Amoeba does not support virtual memory, the memory transfer for process migration is achieved by physical copying [Zhu et al., 1995].

Birlix supports adaptable object migration [Lux, 1995]. It is possible to specify a migration policy on a per-object basis. A meta-object encapsulates data for the migration mechanism and information collection. An example of the use of an adaptable migration mechanism is to extend migration for improved reliability or performance [Lux et al., 1993].

Mach [Accetta et al., 1986] was used as a base for supporting task migration [Milojičić et al., 1993b], developed at the University of Kaiserslautern. The goals were to demonstrate that microkernels are a suitable substrate for migration mechanisms and for load distribution in general. The task migration implementation significantly benefited from the near SSI provided by Mach, in particular from distributed IPC and distributed memory management. Process migration was built for the OSF/1 AD 1 server using Mach task migration [Paindaveine and Milojičić, 1996]. Task and process migration on top of Mach are discussed in more detail in Section 5.3.

4.5. User-space Migrations

While it is relatively straightforward to provide process migration for distributed operating systems, such as the V kernel, Accent, or Sprite, it is much harder to support transparent process migration on industry-standard operating systems, which are typically non-distributed. Most workstations in the 1980s and 1990s ran proprietary versions of UNIX, which makes them a more challenging base for process migration than distributed operating systems. Source code is not widely available for a proprietary OS; therefore, the only way to achieve a viable and widespread migration is to implement it in user space.

User-space migration is targeted to long-running processes that do not pose significant OS requirements, do not need transparency, and use only a limited set of system calls. The migration time is typically a function of the address space size, since the eager (all) data transfer scheme is deployed. This subsection presents a few such implementations: Condor, the work by Alonso and Kyrimis, the work by Mandelberg and Sunderam, the work by Petri and Langendoerfer, MPVM, and LSF.

Condor is a software package that supports user-space checkpointing and process migration in locally distributed systems [Litzkow, 1987; Litzkow et al., 1988; Litzkow and Solomon, 1992]. Its checkpointing support is particularly useful for long-running computations, but is too expensive for short processes. Migration involves generating a core file for a process, combining this file with the executable and then sending this on to the target machine. System calls are redirected to a "shadow" process on the source machine. This requires a special version of the C library to be linked with the migrated programs.

Condor does not support processes that use signals, memory mapped files, timers, shared libraries, or IPC. The scheduler activation period is 10 minutes, which demonstrates the "heaviness" of migration. Nevertheless, Condor is often used for long-running computations. It has been ported to a variety of operating systems. Condor was a starting point for a few industry products, such as LSF from Platform Computing [Zhou et al., 1994] and Loadleveler from IBM.
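The general shape of user-space checkpoint/restart can be illustrated with the following sketch. It captures only application-level state chosen by the program; Condor itself works with a core image of the whole process and a special C library, and every name here (file name, signal choice, state dictionary) is invented for the example.

```python
import os
import pickle
import signal

# Illustrative sketch of user-space checkpointing: on a signal, the process
# serializes its own state to a file; a restarted instance reloads that file.
state = {"step": 0, "partial_result": []}

def take_checkpoint(signum, frame):
    with open("app.ckpt", "wb") as f:
        pickle.dump(state, f)

signal.signal(signal.SIGUSR1, take_checkpoint)   # checkpoint on demand (UNIX)

if os.path.exists("app.ckpt"):                   # restart path, possibly on another node
    with open("app.ckpt", "rb") as f:
        state = pickle.load(f)

while state["step"] < 1000:                      # stand-in for a long-running computation
    state["step"] += 1
```

As in the systems above, nothing outside the process (open sockets, child processes, device state) is captured, which is exactly why such schemes restrict the set of supported applications.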

Alonso and Kyrimis perform minor modifications to the UNIX kernel in order to support process migration in user space [Alonso and Kyrimis, 1988]. A new signal for dumping process state and a new system call for restarting a process are introduced. This implementation is limited to processes that do not communicate and are not location- or process-dependent. The work by Alonso and Kyrimis was done in parallel with the early Condor system.

Mandelberg and Sunderam present a process migration scheme for UNIX that does not support tasks that perform I/O on non-NFS files, spawn subprocesses, or utilize pipes and sockets [Mandelberg and Sunderam, 1988]. A new terminal interface supports detaching a process from its terminal and monitors requests for I/O on the process migration port.

Migratory Parallel Virtual Machine (MPVM) extends the PVM system [Beguelin et al., 1993] to support process migration among homogeneous machines [Casas et al., 1995]. Its primary goals are transparency, compatibility with PVM, and portability. It is implemented entirely as a user-level mechanism. It supports communication among migrating processes by limiting TCP communication to other MPVM processes.

Load Sharing Facility (LSF) supports migration indirectly through process checkpointing and restart [Platform Computing, 1996]. LSF can work with checkpointing at three possible levels: kernel, user, and application. The technique used for user-level checkpointing is based on the Condor approach [Litzkow and Solomon, 1992], but no core file is required, thereby improving performance, and signals can be used across checkpoints, thereby improving transparency. LSF is described in more detail in Section 5.4.

4.6. Application-specific Migration

Migration can also be implemented as a part of an application. Such an approach deliberately sacrifices transparency and reusability. A migrating process is typically limited in functionality and migration has to be adjusted for each new application. Nevertheless, the implementation can be significantly simplified and optimized for one particular application. In this subsection we describe work by Freedman, Skordos, and Bharat and Cardelli.

Freedman reports a process migration scheme involving cooperation between the migrated process and the migration module [Freedman, 1991]. The author observes that long-running computations typically use operating system services in the beginning and ending phases of execution, while most of their time is spent in number-crunching. Therefore, little attention is paid to supporting files, sockets, and devices, since it is not expected that they will be used in the predominant phase of execution. This ad hoc process migration considers only memory contents.

Skordos integrates migration with parallel simulation of subsonic fluid dynamics on a cluster of workstations [Skordos, 1995]. Skordos statically allocates problem sizes and uses migration when a workstation becomes overloaded. Upon migration, the process is restarted after synchronization with processes participating in the application on other nodes. At the same time, it is possible to conduct multiple migrations. On a cluster of 20 HP-Apollo workstations connected by 10 Mbps Ethernet, Skordos notices approximately one migration every 45 minutes. Each migration lasts 30 seconds on average. Despite the high costs, its relative impact is very low. Migrations happen infrequently, and do not last long relative to the overall execution time.

Bharat and Cardelli describe Migratory Applications, an environment for migrating applications along with the user interface and the application context, thereby retaining the same "look and feel" across different platforms [Bharat and Cardelli, 1995]. This type of migration is particularly suitable for mobile applications, where a user may be travelling from one environment to another. Migratory applications are closely related to the underlying programming language Obliq [Cardelli, 1995].

4.7. Mobile Objects

In this paper we are primarily concerned with process and task migration. Object migration and mobile agents are two other forms of migration that we mention briefly in this and the following subsection. Although used in different settings, these forms of migration serve a similar purpose and solve some of the same problems as process migration does. In this subsection, we give an overview of object migration for Emerald, SOS and COOL.

Emerald is a programming language and environment for the support of distributed systems [Black et al., 1987]. It supports mobile objects as small as a
couple of bytes, or as large as a UNIX process [Jul, 1988; Jul et al., 1988]. Objects have a global, single name space. In addition to traditional process migration benefits, Emerald improves data movement, object invocation and garbage collection.

In Emerald, communication links are pointers to other objects. Upon each object migration, all object pointers need to be updated. The Emerald compiler generates the templates that are associated with the object data areas describing its layout. The templates contain information about which objects are associated with the given object, including the pointers to other objects. These pointers are changed if the referenced object moves. Pointers are optimized for local invocation because mobility is a relatively infrequent case compared to local invocation. Objects that become unreachable are garbage collected. Moving a small passive object on a cluster of 4 MicroVax II workstations connected by a 10 megabit/second Ethernet takes about 12 ms while moving a small process takes about 40 ms. Some modest experiments demonstrated the benefits of Emerald for load distribution [Jul, 1989].

Shapiro investigates object migration and persistence in SOS [Shapiro et al., 1989]. The objects under consideration are small to medium size (a few hundred bytes). Of particular concern are intra-object references and how they are preserved across object migrations. References are expressed through a new type, called a permanent pointer. After migration, permanent pointers are lazily evaluated, based on the proxy principle [Shapiro, 1986]. A proxy is a new object that represents the original object, maintains a reference to it at the new location, and provides a way to access it. Proxies, and the term proxy principle describing its use, are extensively used in distributed systems with or without migration (e.g. for distributed IPC [Barrera, 1991], distributed memory management [Black et al. 1998], and proxy servers on the Web [Brooks, et al. 1995]). Functionality can be arbitrarily distributed between a proxy and its principal object.

COOL provides an object-oriented layer on top of Chorus [Amaral et al., 1992]. It supports DSM-based object sharing, persistent store, and object clustering. Transparent remote invocation is achieved with a simple communication model using the COOL base primitives. When remapped onto a new node, all internal references are updated depending on the new location by pointer swizzling [Lea et al., 1993], which is a technique for converting the persistent pointers or object identifiers into main memory pointers (addresses). Conversion can be activated upon an access to the object (swizzling on discovery) or eagerly (all objects at once upon the discovery of the first persistent pointer). Pointer swizzling can also be used for supporting large and persistent address spaces [Dearle, et al., 1994] and in very large data bases [Kemper and Kossmann, 1995].

4.8. Mobile Agents

In the recent past, mobile agents have received significant attention. A number of products have appeared and many successful research systems have been developed (see the description of these systems below). A patent has been approved for one of the first mobile agent systems, Telescript [White, et al. 1997], and a standard was adopted by OMG [Milojičić et al., 1998b].

Mobile agents derive from two fields: agents, as defined in the artificial intelligence community [Shoham, 1997], and distributed systems, including mobile objects and process migration [Milojičić et al., 1999]. However, their popularity started with the appearance of the Web and Java. The former opened vast opportunities for applications suited for mobile agents and the latter became a driving programming language for mobile agents.

In a Web environment, programming languages focus on platform independence and safety. Innovations in OS services take place at the middleware level rather than in kernels [Bernstein, 1996]. Research in distributed systems has largely refocused from local to wide-area networks. Security is a dominant requirement for applications and systems connected to the Web. In this environment, mobile agents are a very promising mechanism. Typical uses include electronic commerce and support for mobile, sporadically-connected computing for which agents overcome limitations posed by short on-line time, reduced bandwidth, and limited storage.

Java has proven to be a suitable programming language for mobile agents because it supports mobile code and mobile objects, a remote object model and language and run-time safety, and it is operating system independent.

While a large amount of the OS-level support for migration concentrated on transparency issues, the agent approach has demonstrated less concern for transparency.

We provide an overview of a few commercial mobile agent systems, such as Telescript, IBM Aglets, and Concordia, and a few academic systems, such as Agent Tcl, TACOMA and Mole.

Telescript first introduced the mobile agent concepts [White, 1996]. It is targeted for the MagicCap, a small hand-held device. Telescript introduced the mobile agent concepts of place and permit and the mechanisms meet and go. IBM Aglets is one of the first commercially available mobile agent systems based on Java [Lange and Oshima, 1998]. It is developed by the IBM Tokyo Research Lab. Aglets has a large community of users and applications, even a few commercial ones. Concordia is a mobile agent system developed at the Mitsubishi Electric ITA Laboratory [Wong, et al., 1997]. It is a Java-based system that addresses security (by extending the Java security manager) and reliability (using message queuing based on a two-phase commit protocol). Concordia is used for many in-house applications. Agent Tcl started as a Tcl/Tk-based transportable agent, but it has been extended to support Java, Scheme and C/C++ [Kotz, et al., 1997]. It is used for the development of the DAIS system for information retrieval and dissemination in military intelligence [Hoffman, et al., 1998]. Agent Tcl is optimized for mobile computers, e.g. by minimizing connection time and communication. The TACOMA project is a joint effort by Tromso and Cornell Universities [Johansen et al., 1995]. Compared to other mobile agent research, which addresses programming language aspects, TACOMA addresses operating system aspects. The main research topics include security and reliability. Mole is one of the first academic agent systems written in Java [Baumann, et al., 1998]. It has been used by industry (Siemens, Tandem, and Daimler Benz) and academia (University of Geneva). Mole addresses groups of agents, agent termination, and security for protecting agents against malicious hosts.

There are also many other mobile agent systems, such as Ara [Peine and Stolpmann, 1997], Messenger [Tschudin, 1997], MOA [Milojičić et al., 1998a], and Sumatra [Ranganathan, et al., 1997]. A lot of effort has been invested in the security of mobile agents, such as in the work by Farmer, et al. [1996], Hohl [1998], Tardo and Valente [1996], Vigna [1998], and Vitek, et al. [1997]. A paper by Chess et al. [1995] is a good introduction to mobile agents.

5. CASE STUDIES

This section presents four case studies of process migration: MOSIX, Sprite, Mach, and LSF. At least one of the authors of this survey directly participated in the design and implementation of each of these systems. Because it is difficult to choose a representative set of case studies, the selection of systems was guided by the authors' personal experience with the chosen systems.

5.1. MOSIX

MOSIX is a distributed operating system from the Hebrew University of Jerusalem. MOSIX is an ongoing project which began in 1981 and released its most recent version in 1996. Automatic load balancing between MOSIX nodes is done by process migration.


Other interesting features include full autonomy of each node in the system, fully-decentralized control, single system image, dynamic configuration and scalability.

Various versions of MOSIX have been in active use at the Hebrew University since 1983. The original version of MOSIX was derived from UNIX Version 7 and ran on a cluster of PDP-11/45 nodes connected by a token passing ring [Barak and Litman, 1985]. The version of MOSIX documented in the MOSIX book is a cluster of multiprocessor workstations which used a UNIX System V.2 code base [Barak and Wheeler, 1989; Barak et al., 1993]. The most recent version, developed in 1993, is called NOW MOSIX [Barak et al., 1995]. This version enhances BSDI UNIX by providing process migration on a cluster of Intel Pentium processor based workstations.

Goals of the MOSIX system include:

- Dynamic process migration. At context switch time, a MOSIX node may elect to migrate any process to another node. The migrated process is not aware of the migration.

- Single system image. MOSIX presents a process with a uniform view of the file system, devices and networking facilities regardless of the process's current location.

- Autonomy of each node. Each node in the system is independent of all other nodes and may selectively participate in the MOSIX cluster or deny services to other nodes. Diskless nodes in MOSIX rely on a specific node for file services.

- Dynamic configuration. MOSIX nodes may join or leave a MOSIX cluster at any time. Processes that are not running on a node or using some node-specific resource are not affected by the loss of that node.

- Scalability. System algorithms avoid using any global state. By avoiding dependence on global state or centralized control, the system enhances its ability to scale to a large number of nodes.

Fig. 6. The MOSIX Architecture.

Design. The system architecture separates the UNIX kernel into a lower and an upper kernel. Each object in MOSIX, like an open file, has a universal object pointer that is unique across the MOSIX domain. Universal objects in MOSIX are kernel objects (e.g. a file descriptor entry) that can reference an object anywhere in the cluster. For example, the upper kernel holds a universal object for an open file; the universal object migrates with the process while only the host of the file has the local, non-universal file information. The upper kernel provides a traditional UNIX system interface. It runs on each node and handles only universal objects. The lower kernel provides normal services, such as device drivers, context switching, and so on, without having any knowledge or dependence on other nodes. The third component of the MOSIX system is the linker, which maps universal objects into local objects on a specific node, and which provides internode communication, data transfer, process migration and load balancing algorithms. When the upper kernel needs to perform an operation on one of the universal objects that it is handling, it uses the linker to perform a remote kernel procedure call on the object's host node.
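The role of the universal object pointer and the linker can be condensed into the following sketch. It is purely illustrative: MOSIX implements this split inside the kernel, and the class, fields, and remote-call stub below are invented for the example.

```python
# Illustrative sketch: a universal object records its home node, and the upper
# kernel either operates locally or asks the linker for a remote kernel call.
class UniversalObject:
    def __init__(self, home_node, local_handle):
        self.home_node = home_node        # node holding the real (local) object
        self.local_handle = local_handle  # meaningful only on the home node

def upper_kernel_op(obj, op, this_node, remote_call):
    if obj.home_node == this_node:
        return f"local {op} on handle {obj.local_handle}"
    # The linker ships the operation to the object's home node.
    return remote_call(obj.home_node, op, obj.local_handle)

fd = UniversalObject(home_node="n1", local_handle=7)   # e.g. an open-file entry
print(upper_kernel_op(fd, "read", this_node="n3",
                      remote_call=lambda node, op, h: f"RPC to {node}: {op}({h})"))
```

Because the universal object travels with the process, the same call works unchanged before and after migration; only the answer to "is this object local?" changes.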

MOSIX transfers only the dirty pages and user area of the migrating process at the time of the migration, an eager (dirty) transfer strategy. Text and other clean pages are faulted in as needed once the process resumes execution on the target node.

Process migration in MOSIX is a common activity. A process has no explicit knowledge about what node it is actually running on or any guarantees that it will continue to run on its current node. The migration algorithm is cooperative: for a process to migrate to a node, the target node must be willing to accept it. This allows individual nodes control over the extent of their own contribution to the MOSIX system. Individual nodes can also force all active processes to migrate away, a procedure that is used when shutting down an individual node.

Process migration in MOSIX relies on the fact that the upper kernel context of each process is site-independent: regardless of where the process physically runs, its local upper kernel and linker route each system call to the appropriate node. If the process decides to migrate to a new node, the migration algorithm queries the new node to ensure that it is willing to accept a new process. If so, the upper kernel invokes a series of remote kernel procedure calls that create an empty process frame on the new node, moves the upper kernel context and any dirty pages associated with the process and then resumes the process on the new node.

Fault Resilience. Failed nodes in MOSIX affect only processes running on the failed node or directly using resources provided by the node. Nodes dynamically join and leave a MOSIX cluster at will. Detection of stale objects—those that survive past the reboot of the object's server—is done by maintaining per-object version numbers. (As an example of a stale object, a universal pointer to a file object must be reclaimed after the home node for the file reboots.) Migrated processes leave no traces on other nodes.

Transparency. Migration is completely transparent in MOSIX, except for processes that use shared memory and are not eligible for migration. Full single system image semantics are presented by MOSIX, making processes unaware of their actual physical node. A new system call, migrate(), was added to allow processes to determine the current location or to request migration to a specified node.

Scalability. MOSIX was designed as a scalable system. The system relies on no centralized servers and maintains no global information about the system state. Each MOSIX node is autonomous and can dynamically join or withdraw from the MOSIX system. No remote system operations involve more than two nodes: the initiating node and the node providing the service. The process migration and load balancing algorithms also support scalability: load information is totally decentralized. Currently, an 80-node MOSIX system is running at Hebrew University.

Load Information Management and Distributed Scheduling. Several types of information are managed by MOSIX in order to implement its dynamic load balancing policy: the load at each node, individual process profiling, and load information about other nodes in the system.

Each node computes a local load estimate that reflects the average length of its ready queue over a fixed time period. By selecting an appropriate interval, the impact of temporary local load fluctuations is reduced without presenting obsolete information.

For each process in the system, an execution profile is maintained which reflects its usage of remote resources like files or remote devices, communication patterns with other nodes, how long this process has run and how often it has created new child processes via the fork() system call. This information is useful in determining where a process should migrate to when selected for migration. For example, a small process that is making heavy use of a network interface or file on a specific node would be considered for migration to that node. This profiling information is discarded when a process terminates.

The MOSIX load balancing algorithm is decentralized. Each node in the system maintains a small load information vector about the load of a small subset of other nodes in the system [Barak et al., 1989]. On each iteration of the algorithm, each node randomly selects two other nodes, of which at least one node is known to have been recently alive. Each of the selected nodes is sent the most recent half of the local load vector information. In addition, when a load information message is received, the receiving node acknowledges receipt of the message by returning its own load information back to the sending node.

During each iteration of the algorithm, the local load vector is updated by incorporating newly received information and by aging or replacing older load information. To discourage migration between nodes with small load variations, each node adjusts its exported local load information by a stability factor. For migration to take place, the difference in load values between two nodes must exceed this stability value.

The load balancing algorithm decides to migrate processes when it finds another node with a significantly reduced load. It selects a local process that has accumulated a certain minimum amount of runtime, giving preference to processes which have a history of forking off new subprocesses or have a history of communication with the selected node. This prevents short-lived processes from migrating.
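The dissemination and migration test just described can be condensed into a short sketch. The data structures, parameter values, and function names are illustrative only and do not come from the MOSIX implementation; in particular, the sketch omits aging of entries and the "recently alive" check.

```python
import random

# Illustrative sketch of decentralized load dissemination and the stability test.
def dissemination_round(load_vectors, nodes):
    for node in nodes:
        for peer in random.sample([n for n in nodes if n != node], 2):
            vec = load_vectors[node]
            newest_half = dict(list(vec.items())[: max(1, len(vec) // 2)])
            load_vectors[peer].update(newest_half)               # send half of the load vector
            load_vectors[node][peer] = load_vectors[peer][peer]  # peer acknowledges with its own load

def should_migrate(local_load, remote_load, stability=1.5):
    # Migrate only when the load difference exceeds the stability value,
    # so small fluctuations do not make processes bounce between nodes.
    return local_load - remote_load > stability

nodes = ["n1", "n2", "n3", "n4"]
load_vectors = {n: {n: random.uniform(0.0, 5.0)} for n in nodes}   # each node knows its own load
dissemination_round(load_vectors, nodes)
```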

ACM Computing Surveys, Vol. 32, No. 3, September 2000. 270 D. S. Milojiˇci´cetal. local load vector information. In addition, Table 1. MOSIX System Call Performance when a load information message is re- ceived, the receiving node acknowledges System Call Local Remote Slowdown receipt of the message by returning its read (1K) 0.34 1.36 4.00 own load information back to the sending write (1K) 0.68 1.65 2.43 open/close 2.06 4.31 2.09 node. fork (256 Kb) 7.8 21.60 2.77 During each iteration of the algorithm, exec (256 KB) 25.30 51.50 2.04 the local load vector is updated by incorpo- rating newly received information and by aging or replacing older load information. hout [1987], system calls were 2.8 times To discourage migration between nodes slower when executed on a remote 33MHz with small load variations, each node ad- MOSIX node [Barak et al., 1989]. Table 1 justs its exported local load information shows the measured performance and by a stability factor. For migration to take slowdown of several commonly used sys- place, the difference in load values be- tem calls. Many system calls, for example tween two nodes must exceed this stability getpid (), are always performed on the pro- value. cesse’s current node and have no remote The load balancing algorithm decides to performance degradation. migrate processes when it finds another The performance of the MOSIX mi- node with a significantly reduced load. It gration algorithm depends directly on selects a local process that has accumu- the performance of the linker’s data lated a certain minimum amount of run- transfer mechanism on a given network time, giving preference to processes which and the size of the dirty address space have a history of forking off new subpro- and user area of the migrating process. cesses or have a history of communica- The measured performance of the VME- tion with the selected node. This prevents based MOSIX migration, from one node short-lived processes from migrating. of the cluster to the bus master, was Implementation and Performance. 1.2 MB/second. The maximum data trans- Porting the original version of MOSIX to a fer speed of the system’s VME bus was new operating system base required sub- 3 MB/second. stantial modifications to the OS kernel in Some applications benefit significantly order to layer the code base into the three from executing in parallel on multiple MOSIX components (linker, lower and up- nodes. In order to allow such applications per kernels). Few changes took place at to run on a system without negatively im- the low level operating system code [Barak pacting everyone else, one needs process and Wheeler, 1989]. migration in order to be able to rebalance In order to reduce the invasiveness of loads when necessary. Arguably the most the porting effort, a completely redesigned important performance measurement is version of NOW MOSIX was developed for the measurement of an actual user-level the BSDI version of UNIX [Barak et al., application. Specific applications, for ex- 1995]. The NOW MOSIX provides process ample an implementation of a graph color- migration and load balancing. without a ing algorithm, show a near-linear speedup single system image. As in Sprite, system with increasing number of nodes [Barak calls that are location sensitive are for- et al., 1993]. Of course, this speedup does warded to the home node of a migrated not apply to other types of applications process as required (cf. Section 5.2). 
Implementation and Performance. Porting the original version of MOSIX to a new operating system base required substantial modifications to the OS kernel in order to layer the code base into the three MOSIX components (linker, lower and upper kernels). Few changes took place in the low-level operating system code [Barak and Wheeler, 1989].

In order to reduce the invasiveness of the porting effort, a completely redesigned version of NOW MOSIX was developed for the BSDI version of UNIX [Barak et al., 1995]. The NOW MOSIX provides process migration and load balancing without a single system image. As in Sprite, location-sensitive system calls are forwarded to the home node of a migrated process as required (cf. Section 5.2).

The performance of a migrated process in MOSIX depends on the nature of the process. One measurement of the effect that migration has on a process is the slower performance of remote system calls. Using the frequencies of system calls measured by Douglis and Ousterhout [1987], system calls were 2.8 times slower when executed on a remote 33MHz MOSIX node [Barak et al., 1989]. Table 1 shows the measured performance and slowdown of several commonly used system calls. Many system calls, for example getpid(), are always performed on the process's current node and have no remote performance degradation.

Table 1. MOSIX System Call Performance

System Call      Local    Remote    Slowdown
read (1 KB)       0.34      1.36      4.00
write (1 KB)      0.68      1.65      2.43
open/close        2.06      4.31      2.09
fork (256 KB)     7.8      21.60      2.77
exec (256 KB)    25.30     51.50      2.04

The performance of the MOSIX migration algorithm depends directly on the performance of the linker's data transfer mechanism on a given network and on the size of the dirty address space and user area of the migrating process. The measured performance of the VME-based MOSIX migration, from one node of the cluster to the bus master, was 1.2 MB/second. The maximum data transfer speed of the system's VME bus was 3 MB/second.

Some applications benefit significantly from executing in parallel on multiple nodes. In order to allow such applications to run on a system without negatively impacting everyone else, one needs process migration to be able to rebalance loads when necessary. Arguably the most important performance measurement is that of an actual user-level application. Specific applications, for example an implementation of a graph coloring algorithm, show a near-linear speedup with an increasing number of nodes [Barak et al., 1993]. Of course, this speedup does not apply to other types of applications (non-CPU-bound, such as network- or I/O-bound jobs), which may experience different speedups. No attempt has been made to measure an average speedup for such types of applications.

Lessons Learned. The MOSIX system demonstrated that dynamic load balancing implemented via dynamic process migration is a viable technology for a cluster of workstations. The earlier MOSIX implementations required too many changes to the structure of the base operating system code in order to maintain the single system image nature of the system. Giving up the single system image while preserving process migration delivers most of the benefits of the earlier MOSIX systems without requiring invasive kernel changes.

5.2. Sprite

The Sprite Network Operating System was developed at U.C. Berkeley between 1984 and 1994 [Ousterhout et al., 1988]. Its primary goal was to treat a network of personal workstations as a time-shared computer, from the standpoint of sharing resources, but with the performance guarantees of individual workstations. It provided a shared network file system with a single-system image and a fully-consistent cache that ensured that all machines always read the most recently written data [Nelson et al., 1988]. The kernel implemented a UNIX-like procedural interface to applications; internally, kernels communicated with each other via a kernel-to-kernel RPC. User-level IPC was supported using the file system, with either pipes or a more general mechanism called pseudo-devices [Welch and Ousterhout, 1988]. Virtual memory was supported by paging a process's heap and stack segments to a file on the local disk or a file server.

An early implementation of migration in Sprite [Douglis and Ousterhout, 1987] suffered from some deficiencies [Douglis, 1989]:
- processes accessing some types of files, such as pseudo-devices, could not be migrated;
- there was no automatic host selection; and
- there was no automatic failure recovery.

After substantial modifications to the shared file system to support increased transparency and failure recovery [Welch, 1990], migration was ported to Sun-3 workstations, and later SPARCstation and DECstation machines. Automatic host selection went through multiple iterations as well, moving from a shared file to a server-based architecture. Migration was used regularly starting in the fall of 1988.

Goals:
- Workstation autonomy. Local users had priority over their workstation. Dynamic process migration, as opposed to merely remote invocation, was viewed primarily as a mechanism to evict other users' processes from a personal workstation when the owner returned. In fact, without the assurance of local autonomy through process migration, many users would not have allowed remote processes to start on their workstation in the first place.
- Location transparency. A process would appear to run on a single workstation throughout its lifetime.
- Using idle cycles. Migration was meant to take advantage of idle workstations, but not to support full load balancing.
- Simplicity. The migration system tried to reuse other support within the Sprite kernel, such as demand paging, even at the cost of some performance. For example, migrating an active process from one workstation to another would require modified pages in its address space to be written to a file server and faulted in on the destination, rather than sent directly to the destination.

Design. Transparent migration in Sprite was based on the concept of a home machine. A foreign process was one that was not executing on its home machine. Every process appeared to run on its home machine throughout its lifetime, and that machine was inherited by descendants of a foreign process as well. Some location-dependent system calls by a foreign process would be forwarded automatically, via kernel-to-kernel RPC, to its home; examples include calls dealing with the time-of-day clock and process groups. Numerous other calls, such as fork and exec, required cooperation between the remote and home machines. Finally, location-independent calls, which included file system operations, could be handled locally or sent directly to the machine responsible for them, such as a file server.
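This division of calls into home-forwarded, cooperative, and location-independent groups can be pictured roughly as a dispatch on the call type, as in the sketch below. The call numbers, the classification table, and the RPC helpers are invented for the illustration and do not correspond to Sprite's actual kernel interfaces.

/* Rough sketch of home-node forwarding for location-dependent calls.
 * Call numbers, classification and helpers are assumptions, not Sprite code. */
enum { SYS_GETTIMEOFDAY = 1, SYS_SETPGRP, SYS_FORK, SYS_EXEC };  /* assumed */

struct proc { int is_foreign; int home_node; };

extern long rpc_to_home(int node, int syscall_no, void *args);
extern long cooperate_with_home(struct proc *p, int syscall_no, void *args);
extern long do_local_syscall(struct proc *p, int syscall_no, void *args);

enum call_kind { LOCAL_OK, FORWARD_HOME, COOPERATE };

static enum call_kind classify(int syscall_no)
{
    switch (syscall_no) {
    case SYS_GETTIMEOFDAY:
    case SYS_SETPGRP:   return FORWARD_HOME;  /* time of day, process groups */
    case SYS_FORK:
    case SYS_EXEC:      return COOPERATE;     /* both machines involved      */
    default:            return LOCAL_OK;      /* e.g. file system operations */
    }
}

long do_syscall(struct proc *p, int syscall_no, void *args)
{
    if (p->is_foreign && classify(syscall_no) == FORWARD_HOME)
        return rpc_to_home(p->home_node, syscall_no, args);
    if (p->is_foreign && classify(syscall_no) == COOPERATE)
        return cooperate_with_home(p, syscall_no, args);
    return do_local_syscall(p, syscall_no, args);
}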

Foreign processes were subject to eviction, being migrated back to their home machine, should a local user return to a previously idle machine. When a foreign process migrated home, it left no residual dependencies on its former host. When a process migrated away from its home, it left a shadow process there with some state that would be used to support transparency. This state included such things as process identifiers and the parent-child relationships involved in the UNIX wait call.

As a performance optimization, Sprite supported both full process migration, in which an entire executing process would migrate, and remote invocation, in which a new process would be created on a different host, as though a fork and exec were done together (like the Locus run call [Walker et al., 1983]). In the latter case, state that persists across an exec call, such as open files, would be encapsulated and transferred, but other state such as virtual memory would be created from an executable.

When migrating an active process, Sprite writes dirty pages and cached file blocks to their respective file server(s). The address space, including the executable, is paged in as necessary. Migration in the form of remote invocation would result in dirty cached file blocks being written, but would not require an address space to be flushed, since the old address space is being discarded.

The migration algorithm consists of the following steps [Douglis, 1989]:

1. The process is signaled, to cause it to trap into the kernel.
2. If the process is migrating away from its home machine, the source contacts the target to confirm its availability and suitability for migration.
3. A "pre-migration" procedure is invoked for each kernel module. This returns the size of the state that will be transferred and can also have side effects, such as queuing VM pages to be flushed to the file system.
4. The source kernel allocates a buffer and calls encapsulation routines for each module. These too can have side effects.
5. The source kernel sends the buffer via RPC, and on the receiving machine each module de-encapsulates its own state. The target may perform other operations as a side effect, such as communicating with file servers to arrange for the transfer of open files.
6. Each kernel module can execute a "post-migration" procedure to clean up state, such as freeing page tables.
7. The source sends an RPC to tell the target to resume the process, and frees the buffer.
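The per-module structure of steps 3 through 7 can be sketched as a table of callbacks driven by the source kernel, as below. The module table, buffer layout, and helper names are assumptions for the illustration; they are not Sprite's actual interfaces.

/* Sketch of the per-kernel-module encapsulation protocol used by the
 * migration steps above.  Names and signatures are assumptions only.  */
#include <stdlib.h>

struct proc;

struct migration_module {
    const char *name;
    size_t (*pre_migrate)(struct proc *p);               /* step 3: size + side effects */
    void   (*encapsulate)(struct proc *p, void *buf);    /* step 4 */
    void   (*de_encapsulate)(struct proc *p, void *buf); /* step 5, run on the target */
    void   (*post_migrate)(struct proc *p);              /* step 6: clean up state */
};

extern struct migration_module modules[];    /* VM, open files, signals, ... */
extern int nmodules;
extern void rpc_send_state(int target, void *buf, size_t len);  /* step 5 */
extern void rpc_resume(int target);                              /* step 7 */

void migrate_out(struct proc *p, int target)
{
    size_t sizes[32];                         /* assumed upper bound on modules */
    size_t total = 0;

    for (int i = 0; i < nmodules; i++)        /* step 3 */
        total += sizes[i] = modules[i].pre_migrate(p);

    char *buf = malloc(total), *cur = buf;    /* step 4 */
    for (int i = 0; i < nmodules; i++) {
        modules[i].encapsulate(p, cur);
        cur += sizes[i];
    }

    rpc_send_state(target, buf, total);       /* step 5 */

    for (int i = 0; i < nmodules; i++)        /* step 6 */
        modules[i].post_migrate(p);

    rpc_resume(target);                       /* step 7 */
    free(buf);
}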
Fault Resilience. Sprite process migration was rather intolerant of faults. During migration, the failure of the target anytime after step 5 could result in the termination of the migrating process, for example, once its open files have been moved to the target. After migration, the failure of either the home machine or the process's current host would result in the termination of the process. There was no facility to migrate away from a home machine that was about to be shut down, since there would always be some residual dependencies on that machine.

Transparency was achieved through a conspiracy between a foreign process's current and home workstations. Operations on the home machine that involved a foreign process, such as a ps listing of CPU time consumed, would contact its current machine via RPC. Operations on the current host involving transparency, including all process creations and terminations, contacted the home machine. Waiting for a child, even one co-resident on the foreign machine, would be handled on the home machine for simplicity.

All IPC in Sprite was through the file system, even TCP connections. (TCP was served through user-level daemons contacted via pseudo-devices.)

The shared network file system provided transparent access to files or processes from different locations over time.

As in MOSIX, processes that share memory could not be migrated. Also, processes that map hardware devices directly into memory, such as the X server, could not migrate.

Scalability. Sprite was designed for a cluster of workstations on a local area network and did not particularly address the issue of scalability. As a result, neither did the migration system. The centralized load information management system, discussed next, could potentially be a bottleneck, although a variant based on the MOSIX probabilistic load dissemination algorithm was also implemented. In practice, the shared file servers proved to be the bottleneck for file-intensive operations such as kernel compilations with as few as 4-5 hosts, while CPU-intensive simulations scaled linearly with over ten hosts [Douglis, 1990].

Load Information Management. A separate, user-level process (migd) was responsible for maintaining the state of each host and allocating idle hosts to applications. This daemon would be started on a new host if it, or its host, should crash. It allocated idle hosts to requesting processes, up to one foreign "job" per available processor. (A "job" consisted of a foreign process and its descendants.) It supported a notion of fairness, in that one application could use all idle hosts of the same architecture but would have some of them reclaimed if another application requested hosts as well. Reclaiming due to fairness would look to the application just like reclaiming due to a workstation's local user returning: the foreign processes would be migrated home and either run locally, migrated elsewhere, or suspended, depending on their controlling task's behavior and host availability.

Migration was typically performed by pmake, a parallel make program like many others that eventually became commonplace (e.g., [Baalbergen, 1988]). Pmake would use remote invocation and then remigrate processes if migd notified it that any of its children were evicted. It would suspend any process that could not be remigrated.

Implementation and Performance. Sprite ran on Sun (Sun 2, Sun 3, Sun 4, SPARCstation 1, SPARCstation 2) and Digital (DECstation 3100 and 5100) workstations. The entire kernel consisted of approximately 200,000 lines of heavily commented code, of which approximately 10,000 dealt with migration.

The performance of migration in Sprite can be measured in three respects. All measurements in this subsection were taken on SPARCstation 1 workstations on a 10-Mbps Ethernet, as reported in [Douglis and Ousterhout, 1991].

1. The time to migrate a process was a function of the overhead of host selection (36ms to select a single host, amortized over multiple selections when migration is performed in parallel); the state for each open file (9.4ms/file); dirty file and VM blocks that must be flushed (480-660 Kbytes/second depending on whether they are flushed in parallel); process state such as exec arguments and environment variables during remote invocation (also 480 Kbytes/second); and a basic overhead of process creation and message traffic (76ms for the null process).
2. A process that had migrated away from its home machine incurred run-time overhead from forwarding location-dependent system calls. Applications of the sort that were typically migrated in Sprite, such as parallel compilation and LaTeX text processing, incurred only 1-3% degradation from running remotely, while other applications that invoked a higher fraction of location-dependent operations (such as accessing the TCP daemon on the home machine, or forking children repeatedly) incurred substantial overhead.
3. Since the purpose of migration in Sprite was to enable parallel use of many workstations, application speedup is an important metric. Speedup is affected by a number of factors, including the degree of parallelism, the load on central resources such as the migd daemon, and inherently non-parallelizable operations.


By comparing the parallel compilation of several source directories, ranging from 24 to 276 files and 1 to 3 independent link steps, one found that the speedup compared to the sequential case ranged from about 3 to 5.4 using up to 12 hosts, considerably below linear speedup. During a 12-way pmake, the processors on both the server storing the files being read and written, and the workstation running pmake, were saturated. Network utilization was not a significant problem, however.

Lessons Learned. Here we summarize the two most important lessons and experiences in Sprite process migration [Douglis, 1990; Douglis and Ousterhout, 1991].
- Migration provided a considerable source of processor cycles to the Sprite community. Over a one-month period, 30% of user processor activity came from migrated (foreign) processes. The host that accounted for the greatest total usage (nearly twice as many cpu-seconds as the next greatest) ran over 70% of its cycles on other hosts.
- Evictions accounted for 6% of all migrations, with about half of these evictions due to fairness considerations and the other half due to users reclaiming their machines. About 1% of all host allocations were revoked for one of these two reasons. (Evictions counted for a relatively higher fraction of all migrations because one host revocation could result in many processes being migrated.)

5.3. Mach

Mach is a microkernel developed at Carnegie Mellon University [Accetta et al., 1986; Black et al., 1992], and later at the OSF Research Institute [Bryant, 1995]. A migration mechanism on top of the Mach microkernel was developed at the University of Kaiserslautern, from 1991 to 1993 [Milojičić et al., 1993b]. Task migration was used for experiments with load distribution. In this phase, only tasks were addressed, while UNIX processes were left on the source machine, as described in Figure 7. This means that only Mach task state was migrated, whereas the UNIX process state that was not already migrated as a part of the Mach task state (e.g., state in the UNIX "personality server" emulating UNIX on top of the Mach microkernel) remained on the source machine. Therefore, most of the UNIX system calls were forwarded back to the source machine, while only Mach system calls were executed on the destination machine.

Fig. 7. Task Migration Design. Only the task abstraction is migrated, while the process abstraction remains on the source node.

Process migration for the OSF/1 AD 1 server [Paindaveine and Milojičić, 1996] was developed during 1994 at the Université Catholique de Louvain, Belgium, as part of a project on load-leveling policies in a distributed system [Jacqmot, 1996]. OSF/1 AD 1 is a version of the OSF/1 operating system which provides a scalable, high-performance single-system image version of UNIX. It is composed of servers distributed across the different nodes running the Mach microkernel. Process migration relies on the Mach task migration to migrate microkernel-dependent process state between nodes.

Mach task migration was also used at the University of Utah, for the Schizo project [Swanson et al., 1993]. Task and process migration on top of Mach were designed and implemented for clusters of workstations.

Goals. The first goal was to provide transparent task migration at user level with minimal changes to the microkernel.

This was possible by relying on Mach OS mechanisms, such as (distributed) memory management and (distributed) IPC. The second goal was to demonstrate that it is possible to perform load distribution at the microkernel level, based on the three distinct parameters that characterize microkernels: processing, VM and IPC.

Design. The design of task migration is affected by the underlying Mach microkernel. Mach supported various powerful OS mechanisms for purposes other than task and process migration. Examples include Distributed Memory Management (DMM) and Distributed IPC (DIPC). DIPC and DMM simplified the design and implementation of task migration. DIPC takes care of forwarding messages to migrated processes, and DMM supports remote paging and distributed shared memory. The underlying complexity of message redirection and distributed memory management is heavily exercised by task migration, exposing problems otherwise not encountered. This is in accordance with earlier observations about message passing [Douglis and Ousterhout, 1991].

In order to improve the robustness and performance of DIPC, it was subsequently redesigned and reimplemented [Milojičić et al., 1997]. Migration experiments have not been performed with the improved DIPC. However, extensive experiments have been conducted with Concurrent Remote Task Creation (CRTC), an in-kernel service for concurrent creation of remote tasks in a hierarchical fashion [Milojičić et al., 1997]. The CRTC experiments are similar to task migration, because a remote fork of a task address space is performed.

DMM enables cross-node transparency at the Mach VM interface in support of a distributed file system, distributed processes, and distributed shared memory [Black et al., 1998]. The DMM support resulted in simplified design and implementation of the functionality built on top of it, such as SSI UNIX and remote tasking, and it avoided pager modifications by interposing between the VM system and the pager. However, the DMM became too complex, and had performance and scalability problems. The particular design mistakes include the interactions between DSM support and virtual copies in a distributed system; transparent extension of the Mach copy-on-write VM optimization to distributed systems; and limitations imposed by Mach's external memory management while transparently extending it to distributed systems. (Copy-on-write is an optimization introduced to avoid copying pages until it is absolutely needed, and otherwise sharing the same copy. It has also been used in Chorus [Rozier, 1992] and Sprite [Nelson and Ousterhout, 1988].)

DMM had too many goals to be successful; it failed on many general principles, such as "do one thing, but do it right," and "optimize the common case" [Lampson, 1983]. Some of the experiments with task migration reflect these problems. Variations of forking an address space and migrating a task suffered significantly in performance. While some of these cases could be improved by optimizing the algorithm (as was done in the case of CRTC [Milojičić et al., 1997]), it would only add to an already complex and fragile XMM design and implementation. Some of the DMM features are not useful for task migration, even though they were motivated by task migration support. Examples include DSM and distributed copy-on-write optimizations. DSM is introduced in order to support the transparent remote forking of address spaces (as a consequence of remote fork or migration) that locally share memory. Distributed copy-on-write is motivated by transparently forking address spaces that are already created as a consequence of local copy-on-write, as well as in order to support caching in the distributed case.

Even though the DIPC and DMM interfaces support an implementation of user-level task migration, there are two exceptions. Most of the task state is accessible from user space except for the capabilities that represent tasks and threads and capabilities for internal memory state. Two new interfaces are provided for exporting the aforementioned capabilities into user space.

A goal of one of the user-space migration servers is to demonstrate different data transfer strategies. An external memory manager was used for the implementation of this task migration server. The following strategies were implemented: eager copy, flushing, copy-on-reference, precopy and read-ahead [Milojičić et al., 1993b]. For most of the experiments, a simplified migration server was used that relied on the default in-kernel data transfer strategy, copy-on-reference.

The task migration algorithm steps are:

1. Suspend the task and abort the threads (see the note at the end of this subsection) in order to clean the kernel state.
2. Interpose task/thread kernel ports on the source node.
3. Transfer the address space, capabilities, threads and the other task/thread state.
4. Interpose back task/thread kernel ports on the destination node.
5. Resume the task on the destination node.

Process state is divided into several categories: the Mach task state; the UNIX process local state; and the process-relationship state. The local process state corresponds to the typical UNIX proc and user structures. Open file descriptors, although part of the UNIX process state, are migrated as part of the Mach task state.

Fault Resilience of Mach task migration was limited by the default transfer strategy, but even more by the DIPC and DMM modules. Both modules heavily employ the lazy evaluation principle, leaving residual dependencies throughout the nodes of a distributed system. For example, in the case of DIPC, proxies of the receive capabilities remain on the source node after the receive capability is migrated to a remote node. In the case of DMM, the established paging paths remain bound to the source node even after eager copying of pages is performed to the destination node.

Transparency was achieved by delaying access or providing concurrent access to a migrating task and its state during migration. The other tasks in the system can access the migrating task either by sending messages to the task kernel port or by accessing its memory. Sending messages is delayed by interposing the task kernel port with an interpose port. The messages sent to the interpose port are queued on the source node and then restarted on the destination node. The messages sent to other task ports are transferred as part of migration of the receive capabilities for these ports. Access to the task address space is supported by DMM even during migration. Locally shared memory between two tasks becomes distributed shared memory after migration of either task.

In OSF/1 AD, a virtual process (vprocs) framework supports transparent operations on processes independently of the actual process's location [Zajcew et al., 1993]. By analogy, vprocs are to processes what vnodes are to files, both providing location and heterogeneity transparency at the system call interface. Distributed process management and the single system image of Mach and OSF/1 AD eased the process migration implementation.

A single system image is preserved by retaining the process identifier and by providing transparent access to all UNIX resources. There are no forwarding stub processes or chains. No restrictions are imposed on the processes considered for migration: for example, using pipes or signals does not prevent a process from being migrated.

Scalability. The largest system that Mach task migration ran on at the University of Kaiserslautern consisted of five nodes. However, it would have been possible to scale it closer towards the limits of the scalability of the underlying Mach microkernel, which is up to a couple of thousand nodes on the Intel Paragon supercomputer.

Note: Aborting is necessary for threads that can wait in the kernel arbitrarily long, such as in the case of waiting for a message to arrive. The wait operation is restartable on the destination node.
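A rough sketch of the five steps above, in the form of a hypothetical user-level migration server, is given below. task_suspend, task_resume and thread_abort are real Mach primitives (used here in a simplified way); the interposition and transfer helpers, node_t, and task_on_node are invented placeholders, not the actual server code.

/* Sketch of the five-step task migration above as a user-level server. */
#include <mach/mach.h>

typedef int node_t;                               /* assumed node identifier */
extern void   abort_all_threads(task_t t);        /* wraps thread_abort() per thread */
extern void   interpose_kernel_ports(task_t t, node_t src);
extern void   transfer_task_state(task_t t, node_t dst);
extern void   interpose_back_kernel_ports(task_t t, node_t dst);
extern task_t task_on_node(task_t t, node_t dst);

kern_return_t migrate_task(task_t task, node_t src, node_t dst)
{
    kern_return_t kr;

    /* 1. Suspend the task and abort its threads to clean kernel state. */
    if ((kr = task_suspend(task)) != KERN_SUCCESS)
        return kr;
    abort_all_threads(task);

    /* 2. Interpose the task/thread kernel ports on the source node, so
     *    that messages are queued instead of delivered during migration. */
    interpose_kernel_ports(task, src);

    /* 3. Transfer the address space, capabilities, threads and the rest
     *    of the task/thread state (default strategy: copy-on-reference). */
    transfer_task_state(task, dst);

    /* 4. Interpose the kernel ports back on the destination node and
     *    restart the queued messages there.                              */
    interpose_back_kernel_ports(task, dst);

    /* 5. Resume the task on the destination node. */
    return task_resume(task_on_node(task, dst));
}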


Table 2. Processing, IPC and VM Intensive Applications

Type         Application    User/Total Time    IPC (msg/s)    VM ((page-in + page-out)/s)
Processing   Dhrystone           1.00               3.49            0.35 + 0
IPC          find                0.03             512.3             2.75 + 0
VM           WPI Jigsaw          0.09               2.46           28.5 + 38.2

Migration of the address space relies heavily on the Mach copy-on-write VM optimization, which linearly grows the internal VM state for each migration [Milojičić et al., 1997]. In practice, when there are just a few migrations, this anomaly is not noticeable. However, for many consecutive migrations it can reduce performance.

Load Information and Distributed Scheduling. Mach was profiled to reflect remote IPC and remote paging activity in addition to processing information. This information was used to improve load distribution decisions [Milojičić, 1993c]. Profiling was performed inside of the microkernel, by collecting statistics for remote communication and for remote paging, and in user space, by interposing application ports with profiler ports.

A number of applications were profiled and classified in three categories: processing, communication and paging intensive. Table 2 gives representatives of each class.

Extended load information is used for applying more appropriate distributed scheduling decisions [Milojičić et al., 1993a]. An application that causes a significant amount of remote paging, or communicates with another node, is considered for migration to the appropriate node. CPU-bound applications have no such preference and can be migrated based only on the processing load criteria. For applications consisting of a number of processes that communicate among themselves, the improvement achieved by considering IPC/VM information in addition to CPU load is proportional to the load, and it can reach up to 20-50% for distributed scheduling strategies [Milojičić et al., 1993a]. Improvements of the performance of a simple application due to locality of reference can be multifold [Milojičić et al., 1993b].

Implementation and Performance. Milojičić et al. built three implementations: two user-level migrations (an optimized and a simple migration server); and a kernel implementation. The size of the simplified migration server is approximately 400 lines of code that took about 3 months to design and implement. A lot of this time was spent in debugging the DIPC parts of code that were never before exercised. Task migration, especially load distribution experiments using task migration, turned out to be a very stressful test for DIPC.

The size of the in-kernel version is close to the simplified migration server, from which it was derived. These two implementations relied on the in-kernel support for address space transfer. However, the size of the DIPC and DMM modules was significantly higher. One of the latest versions of optimized DIPC (nmk21b1) consisted of over 32,000 lines of code. It took over 10 engineer-years to release the second version of DIPC. The DMM, which was never optimized, consists of 24,000 lines of code.

The optimized migration server is largest in size with a few thousand lines of code. Most of this implemented a pager supporting different data transfer strategies. The optimized migration server did not rely on the in-kernel data transfer strategy, except for the support of distributed shared memory.

Although there is an underlying distributed state in the microkernel, no distributed state is involved in the process migration facility at the server level, rendering the design of the migration mechanism simple. The process migration code consists of approximately 800 lines of code. However, adding distributed process management requires about 5000 lines of additional code.
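To make the profiling-based scheduling described above concrete, the sketch below shows how per-task counters like those in Table 2 might be folded into a migration preference. The thresholds, field names and scoring rule are assumptions for illustration, not the algorithm used in Mach.

/* Sketch: classify a task from its profile and pick a destination.
 * All thresholds and names are assumed for illustration only. */
struct profile { double cpu_share; double ipc_msgs_per_s; double pages_per_s; };

enum app_class { CPU_BOUND, IPC_BOUND, VM_BOUND };

enum app_class classify_app(const struct profile *p)
{
    if (p->ipc_msgs_per_s > 100.0) return IPC_BOUND;   /* assumed threshold */
    if (p->pages_per_s    >  20.0) return VM_BOUND;    /* assumed threshold */
    return CPU_BOUND;
}

/* IPC- and VM-bound tasks prefer the node they communicate or page with;
 * CPU-bound tasks go wherever the processing load is lowest. */
int pick_destination(enum app_class c, int peer_node, int least_loaded_node)
{
    return (c == CPU_BOUND) ? least_loaded_node : peer_node;
}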


Fig. 8. Task migration performance as a function of VM size: initial costs are independent of task address space size (aside from variations due to other side effects).

The main (initial and runtime) costs of migration are due to task migration. Process migration has very little overhead in addition to task migration. Performance measurements were conducted on a testbed consisting of three Intel 33MHz 80486 PCs with 8MB RAM. The NORMA14 Mach and the UNIX server UX28 were used. Performance is independent of the address space size (see Figure 8), and is a linear function of the number of capabilities. It was significantly improved in subsequent work [Milojičić et al., 1997].

Lessons Learned
- Relying on DIPC and DMM is crucial for the easy design and implementation of transparent task migration, but these modules also entail most of the complexity and they limit performance and fault resilience.
- Task migration is sufficient for microkernel applications. In contrast, as mentioned above, UNIX applications would forward most system calls back to the source node, resulting in an order-of-magnitude performance degradation. Migrating the full UNIX process state would presumably have alleviated this overhead, similar to the evolution in Sprite toward distinguishing between location-dependent and location-independent calls [Douglis, 1989].
- Applications on microkernels can be profiled as a function of processing, IPC and VM, and this information can be used for improved load distribution. Improvement ranges from 20-55% for collaborative types of applications.

5.4. LSF

LSF (Load Sharing Facility) is a load sharing and batch scheduling product from Platform Computing Corporation [Platform Computing, 1996]. LSF is based on the Utopia system developed at the University of Toronto [Zhou et al., 1994], which is in turn based on the earlier Ph.D. thesis work of Zhou at UC Berkeley [Zhou, 1987; Zhou and Ferrari, 1988].

LSF provides some distributed operating system facilities, such as distributed process scheduling and transparent remote execution, on top of various operating system kernels without change.

LSF primarily relies on initial process placement to achieve load balancing, but also uses process migration via checkpointing as a complement. LSF currently runs on most UNIX-based operating systems.

Checkpointing and Migration Mechanisms. LSF's support for user-level process migration is based on Condor's approach [Litzkow and Solomon, 1992]. A checkpoint library is provided that must be linked with application code. Part of this library is a signal handler that can create a checkpoint file of the process so that it can be restarted on a machine of compatible architecture and operating system. Several improvements have been made to the original Condor checkpoint design, such as:
- No core dump is required in order to generate the checkpoint file. The running state of the process is directly extracted and saved in the checkpoint file together with the executable in a format that can be used to restart the process. This not only is more efficient, but also preserves the original process and its ID across the checkpoint.
- UNIX signals can be used by the checkpointed process. The state information concerning the signals used by the process is recorded in the checkpoint file and restored at restart time.

In addition to user-level transparent process checkpointing, LSF can also take advantage of checkpointing already supported in the OS kernel (such as in Cray Unicos and ConvexOS), and application-level checkpointing. The latter is achievable in classes of applications by the programmer writing additional code to save the data structures and execution state information in a file that can be interpreted by the user program at restart time in order to restore its state. This approach, when feasible, often has the advantage of a much smaller checkpoint file because it is often unnecessary to save all the dirty virtual memory pages, as must be done in user-level transparent checkpointing. Application-level checkpointing may also allow migration to work across heterogeneous nodes.

The checkpoint file is stored in a user-specified directory and, if the directory is shared among the nodes, the process may be restarted on another node by accessing this file.

Load Information Exchange. Similar to Sprite, LSF employs a centralized algorithm for collecting load information. One of the nodes acts as the master, and every other node reports its local load to the master periodically. If the master node fails, another node immediately assumes the role of the master. The scheduling requests are directed to the master node, which uses the load information of all the nodes to select the one that is likely to provide the best performance.

Although many of the load information updates may be wasted if no process needs to be scheduled between load information updates, this algorithm has the advantage of making (reasonably up-to-date) load information of all nodes readily available, thus reducing the scheduling delay and considering all nodes in scheduling. Zhou et al. [1994] argue that the network and CPU overhead of this approach is negligible in modern computers and networks. Measurements and operational experience in clusters of several hundred hosts confirm this observation. Such a centralized algorithm also makes it possible to coordinate all process placements: once a process is scheduled to run on a node, this node is less likely to be considered for other processes for a while, to avoid overloading it. For systems with thousands of nodes, several clusters can be formed, with selective load information exchange among them.
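The centralized scheme just described can be sketched as a periodic reporting loop on every node plus a placement routine on the master, as below. The message format, the reporting period, and the helper functions are assumptions for the sketch, not LSF's implementation.

/* Sketch of master-based load collection with simple failover. */
#include <stdbool.h>

#define REPORT_PERIOD_S 5            /* assumed reporting interval */

struct host_load { double run_queue; double free_mem_mb; long last_update; };

extern struct host_load sample_local_load(void);          /* assumed helpers */
extern bool  send_to_master(const struct host_load *l);
extern void  elect_new_master(void);
extern void  sleep_seconds(int s);
extern void  mark_recently_chosen(int host);

/* On every node: periodically send the local load to the current master. */
void report_loop(void)
{
    for (;;) {
        struct host_load l = sample_local_load();
        if (!send_to_master(&l))     /* master unreachable?            */
            elect_new_master();      /* another node assumes the role  */
        sleep_seconds(REPORT_PERIOD_S);
    }
}

/* On the master: placement requests pick the lightest eligible host. */
int place_process(const struct host_load *hosts, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (hosts[i].run_queue < hosts[best].run_queue)
            best = i;
    mark_recently_chosen(best);      /* avoid piling work onto one host */
    return best;
}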
Scheduling Algorithms. LSF uses checkpoint and restart to achieve process migration, which in turn is used to achieve load balancing. If a node is overloaded or needed by some higher-priority processes, a process running on it may be migrated to another node. The load conditions that trigger process migration can be configured to be different for various types of jobs.
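The checkpoint mechanism that this scheduling relies on might look roughly like the sketch below: a library-installed signal handler writes the process state to a file in a shared directory, from which the process can be restarted on another node. The signal choice, file format and helpers are hypothetical; this is not the Condor or LSF checkpoint library.

/* Sketch of signal-driven, user-level checkpointing. */
#include <signal.h>
#include <stdlib.h>
#include <string.h>

extern int  open_checkpoint_file(const char *dir);        /* assumed helpers */
extern void write_executable_and_regions(int fd);
extern void write_signal_state(int fd);
extern void close_checkpoint_file(int fd);

static void checkpoint_handler(int sig)
{
    (void)sig;
    const char *dir = getenv("CKPT_DIR");        /* user-specified directory  */
    int fd = open_checkpoint_file(dir);
    write_executable_and_regions(fd);            /* no core dump is needed    */
    write_signal_state(fd);                      /* signals remain usable     */
    close_checkpoint_file(fd);
    /* The process keeps its ID and continues, or exits so the scheduler
     * can restart it from the checkpoint file on another node. */
}

void install_checkpoint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = checkpoint_handler;
    sigaction(SIGUSR1, &sa, NULL);               /* assumed checkpoint signal */
}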

To avoid undesirable migration due to temporary load spikes and to control migration frequency, LSF allows users to specify a time period for which a process is suspended on its execution node. Only if the local load conditions remain unfavorable after this period would the suspended process be migrated to another node.

The target node is selected based on the dynamic load conditions and the resource requirements of the process. Recognizing that different processes may require different types of resources, LSF collects a variety of load information for each node, such as average CPU run queue length, available memory and swap space, disk paging and I/O rate, and the duration of the idle period with no keyboard and mouse activities. Correspondingly, a process may be associated with resource requirement expressions such as

select[sparc && swap >= 120 && mem >= 64] order[cpu : mem]

which indicates that the selected node should have a resource called "sparc," and should have at least 120 MB of swap space and 64 MB of main memory. Among the eligible nodes, the one with the fastest, lightly loaded CPU, as well as large memory space, should be selected. A heuristic sorting algorithm is employed by LSF to consider all the (potentially conflicting) resource preferences and select a suitable host. Clearly, good host allocation can only be achieved if the load condition of all nodes is known to the scheduler.

The resource requirements of a process may be specified by the user when submitting the process to LSF, or may be configured in a system process file along with the process name. This process file is automatically consulted by LSF to determine the resource requirement of each type of process. This process file also stores information on the eligibility of each type of process for remote execution and migration. If the name of a process is not found in this file, either it is excluded from migration consideration, or only nodes of the same type as the local node would be considered.
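Evaluating such an expression against per-host load records amounts to a filter followed by a sort, as in the simplified sketch below. The structure, field names and matching rules are assumptions for illustration; LSF's actual resource requirement language is considerably richer.

/* Simplified evaluation of select[sparc && swap >= 120 && mem >= 64]
 * with order[cpu:mem] tie-breaking. */
#include <string.h>

struct host_info {
    const char *arch;        /* e.g. "sparc"                    */
    int swap_mb, mem_mb;     /* available swap and memory in MB */
    double cpu_load;         /* run-queue length                */
};

static int selectable(const struct host_info *h)
{
    return strcmp(h->arch, "sparc") == 0 && h->swap_mb >= 120 && h->mem_mb >= 64;
}

/* order[cpu:mem]: prefer the lightest CPU, break ties on free memory. */
int choose_host(const struct host_info *hosts, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!selectable(&hosts[i]))
            continue;
        if (best < 0 ||
            hosts[i].cpu_load < hosts[best].cpu_load ||
            (hosts[i].cpu_load == hosts[best].cpu_load &&
             hosts[i].mem_mb > hosts[best].mem_mb))
            best = i;
    }
    return best;              /* -1 if no host satisfies the requirement */
}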
Process Migration vs. Initial Placement. Although LSF makes use of process migration to balance the load, it is used more as an exception than the rule, for three reasons. First, transparent user-level checkpointing and migration are usable only by those processes linked with the checkpoint library, unless the OS kernel can be modified; in either case, their applicability is limited. Secondly, intelligent initial process placement has been found to be effective in balancing the load in many cases, reducing the need for migration [Eager et al., 1988]. Finally, and perhaps most importantly, the same load balancing effect can often be achieved by process placement with much less overhead. The remote process execution mechanism in LSF maintains the connection between the application and the Remote Execution Server on the execution node and caches the application's execution context for the duration of the application execution, so that repeated remote process executions incur low overhead (0.1 seconds as measured by Zhou et al. on a network of UNIX workstations [1994]). In contrast, it is not desirable to maintain per-application connections in a kernel implementation of process migration, to keep the kernel simple; thus every process migration to a remote node is "cold." Per-application connections and cached application state are rather "heavyweight" for kernel-level migration mechanisms, and the kernel-level systems surveyed in this paper treat each migration separately (though the underlying communication systems, such as kernel-to-kernel RPC, may cache connection state). The benefits of optimizing remote execution are evident by comparing LSF to an earlier system such as Sprite. In the case of Sprite, the overhead of exec-time migration was measured to be approximately 330ms on SPARCstation 1 workstations over the course of one month [Douglis and Ousterhout, 1991]. Even taking differences in processor speed into account, as well as underlying overheads such as file system cache flushing, LSF shows a marked improvement in remote invocation performance.


6. COMPARISON

In this section, we compare the various migration implementations described in the paper. We cover the case studies, as well as some other systems mentioned in Section 4.

Table 3 summarizes the process migration classification provided in Section 4. We mention examples of each class of migration, followed by the main characteristic of each class. These columns are self-explanatory. The OS v. Application Modification column describes where the majority of modifications to support migration are performed. Migration in the early work, UNIX-like systems, message passing and microkernels requires modifications to the underlying OS. User-space and application-specific systems require modifications to the application, typically relinking and in certain cases also recompiling. Mobile objects and agents require modifications to the underlying programming environment. However, they also have the least transparency, as described below.

The Migration Complexity column describes the amount of effort required to design and implement migration. Complexity is high for kernel implementations. Exceptions are message-passing kernels, which already provide much of the required functionality in their support for message passing. This results in a simpler migration mechanism. Microkernels also support migration more easily because of simpler abstractions and reduced functionality (for example, no UNIX compatibility). However, extensive complexity is introduced for supporting distributed IPC and distributed memory management. The least complex implementations are those done at user level and those done as part of an application.

The "other complexity" subfield describes where the complexity in the system exists. Early work incurred complexity in infrastructure support for the underlying hardware and software, such as Alto computers in the case of Worms, and the X-Tree architecture in the case of XOS. Transparent migration on UNIX-like systems incurs a lot of complexity for the support of a Single System Image and extending UNIX semantics to a distributed system. As already pointed out, message passing typically requires a lot of complexity; examples include Charlotte and the V kernel, as well as some of the microkernels, such as Mach and Chorus. In addition, some of the microkernels (e.g., Mach) also support distributed memory management, which is even harder to support. User-space migrations trade off the simplicity of the underlying support for redirecting system calls or imposing limits on them. Application-specific migrations require knowledge of the application semantics in order to integrate migration calls at appropriate places.

Extensibility describes how easy it is to extend a process migration implementation. Examples include support for multiple data transfer and location strategies. In most cases, extensibility is inversely proportional to complexity. An exception to this rule are message-passing kernels, which have simple migration implementations but are not as extensible. Extensions to a migration mechanism for performance and improved fault resilience typically require complex changes to the underlying mechanism for message passing. Portability describes how easy it is to port the migration mechanism to another operating system or computer. User-space and application-specific implementations have superior portability. Condor and LSF run on numerous versions of operating systems and computers. Kernel-level implementations are typically closely related to the underlying system and consequently their portability is limited to the portability of the operating system. For example, Mach and MOSIX were ported to a number of computer architectures.

It is hard to compare the performance of various migration mechanisms because the implementations were done on a number of different architectures. It is also hard and inaccurate to normalize performance (some attempts toward normalization were done by Roush [1995]). Therefore, we have not provided a column describing performance.

Table 3. Summary of the Different Migration Implementations

(Columns: Examples; Main Characteristics; OS v. Application Modification; Migration Complexity (Other Complexity); Extensibility and Portability; Transparency.)

Early Work: XOS, Worm, DEMOS, Butler
Transparent Migration in UNIX-like OS: Locus, MOSIX, Sprite
Message-Passing OS: Charlotte, Accent, V Kernel
Microkernels: Amoeba, Arcade, BirliX, Chorus, Mach, RHODOS
User Space: Condor, Alonso & Kyrimis, Mandelberg, LSF
Application: Freedman, Skordos, Bharat & Cardelli
Mobile Objects: Emerald, SOS, COOL
Mobile Agents: TACOMA, Telescript, Agent-TCL, Aglets


Table 4. Transparency “Checklist”

Migration/Supported   Open Files   Fork Children   Communication Channels   Need to Relink Application   Changes to Kernel   Shared Memory
MOSIX                 yes          yes             yes                       no                           yes                 no
Sprite                yes          yes             yes                       no                           yes                 no
Mach & OSF/1 AD       yes          yes             yes                       no                           yes                 yes
LSF                   some         no              no                        yes                          no                  no

Nevertheless, we note that the performance of user- and application-level migrations typically falls in the range of seconds, even minutes, when migrating processes with large address spaces. The kernel-supported migrations, especially the newer implementations, fall in the range of tens of milliseconds. The most optimized kernel-implemented migration (Choices) has initial costs of only 14ms [Roush and Campbell, 1996], and it is better even if some rough normalization is accounted for (see [Roush, 1995]).

As mentioned earlier, the dominant performance element is the cost to transfer the address space. Kernel-level optimizations can cut down this cost, whereas user-level implementations do not have access to the relevant data structures and cannot apply these optimizations.

Recently, trends are emerging that allow users more access to kernel data, mechanisms, and policies [Bomberger et al., 1992]. For example, microkernels export most of the kernel state needed for user-level implementations of migration [Milojičić, 1993c]. Extensible kernels provide even more support in this direction [Bershad et al., 1995; Engler et al., 1995]. These trends decrease the relevance of user versus kernel implementations.

Transparency describes the extent to which a migrated process can continue execution after migration as if migration had not happened. It also determines whether a migrated process is allowed to invoke all system functions. Many user- or application-level implementations do not allow a process to invoke all system calls. Migration that is implemented inside the kernel typically supports full functionality. In general, the higher the level of the implementation, the less transparency is provided. User-space implementations are aware of migration and they can invoke migration only at predefined places in the code. Kernel-supported implementations typically have higher levels of transparency. A single system image supports transparent migration at any point of application code; migration can transparently be initiated either by the migrating process or by another process. Most mobile agent implementations do not allow transparent migration invocation by other applications; only the migrating agent can initiate it. Even though less transparent, this approach simplifies implementation.

More specifics on transparency in the case studies are presented in Table 4. Migration for each case study is categorized by whether it transparently supports open files, forking children, communication channels, and shared memory. If migration requires changes to the kernel or relinking the application, that is also listed.

Support for shared memory of migrated tasks in Mach is unique. In practice, it was problematic due to a number of design and implementation issues [Black et al., 1998]. Other systems that supported both shared memory and migration either chose not to provide transparent access to shared memory after migration (e.g., Locus [Walker and Mathews, 1989; Fleisch and Popek, 1989]), or disallowed migration of processes using shared memory (e.g., Sprite [Ousterhout et al., 1988]).

Kernel-level migration typically supports all features transparently, whereas user-level migrations may limit access to NFS files and may not support communication channels or interprocess communication. In addition, a user-level migration typically requires relinking applications with special libraries. Migration done as part of an application requires additional recompilation.


Table 5. Summary of Various Data Transfer Strategies

Data Transfer Strategy   Example                                Freeze Time   Freeze Costs   Residual Time & Costs   Residual Dependency   Initial Migration Time
eager (all)              most user-level and early migrations   high          high           none                    none                  high
eager (dirty)            MOSIX, Locus                           moderate      moderate       none                    none                  moderate
precopy                  V kernel                               very low      high           none                    none                  high
copy-on-reference        Accent, Mach                           low           small          high                    yes                   low
flushing                 Sprite                                 moderate      moderate       moderate                none                  moderate

In Table 5, we compare different data transfer strategies with respect to freeze time, freeze costs, residual time and costs, residual dependencies, and initial migration time (the time passed from the request for migration until the process is started on the remote node).

We can see that different strategies have different goals and introduce different costs. At one end of the spectrum, systems that implement an eager (all) strategy in user space eliminate residual dependencies and residual costs, but suffer from high freeze time and freeze costs. Modifying the operating system allows an eager (dirty) strategy to reduce the amount of the address space that needs to be copied to the subset of its dirty pages. This increases residual costs and dependencies while reducing freeze time and costs.

Using a precopy strategy further improves freeze time, but has higher freeze costs than other strategies. Applications with real-time requirements can benefit from this. However, it has very high migration time because it may require additional copying of already transferred pages.

Copy-on-reference requires the most kernel changes in order to provide sophisticated virtual mappings between nodes. It also has more residual dependencies than other strategies, but it has the lowest freeze time and costs, and migration time is low, because processes can promptly start on the remote node.

Finally, the flushing strategy also requires some amount of change to the kernel, and has somewhat higher freeze time than copy-on-reference, but improves residual time and costs by leaving residual dependencies only on a server, and not on the source node. Process migration in the Choices system, not listed in the table, represents a highly optimized version of the eager (dirty) strategy.

The data transfer strategy dominates process migration characteristics such as performance, complexity, and fault resilience. The costs, implementation details and residual dependencies of other process elements (e.g., communication channels and naming) are also important but have less impact on process migration. In the Mach case study, we saw that most strategies can be implemented in user space. However, this requires a pager-like architecture that increases the complexity of OS and migration design and implementation.
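The practical difference between two of these strategies comes down to when pages are moved, as in the toy contrast below. The address-space layout and the helpers (send_page, fetch_remote_page, map_page, page_is_dirty) are invented; real systems do this inside the VM subsystem.

/* Toy contrast of eager (dirty) and copy-on-reference transfer. */
struct address_space { int npages; /* ... page tables, backing store ... */ };

extern int   page_is_dirty(struct address_space *as, int i);
extern void  send_page(int dst_node, struct address_space *as, int i);
extern void *fetch_remote_page(int src_node, struct address_space *as, unsigned long addr);
extern void  map_page(struct address_space *as, unsigned long addr, void *page);

/* Eager (dirty): at migration time copy only the modified pages; clean
 * pages are later demand-paged from the backing store.  Freeze costs are
 * paid up front, and no dependency on the source node remains. */
void transfer_eager_dirty(struct address_space *as, int dst_node)
{
    for (int i = 0; i < as->npages; i++)
        if (page_is_dirty(as, i))
            send_page(dst_node, as, i);
}

/* Copy-on-reference: transfer nothing up front; the destination faults
 * pages in from the source as they are touched, giving the lowest freeze
 * time at the price of a residual dependency on the source node. */
void on_destination_page_fault(struct address_space *as, unsigned long addr, int src_node)
{
    map_page(as, addr, fetch_remote_page(src_node, as, addr));
}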


Table 6. Load Information Database

Migration/Charact.   Database Type   Maximum Nodes Deployed   Database Scope   Fault Tolerance   Knowledge Relevance
MOSIX                distributed     64                       partial          yes               aging
Sprite               centralized     30                       global           limited           verification, update on state change or periodic
Mach                 distributed     5                        global           no                negotiation
LSF                  centralized     500                      global           yes               none

Table 6 summarizes load information database characteristics. Database type indicates whether the information is maintained as a distributed or a centralized database. Centralized databases have shown surprising scalability for some systems, in particular LSF. Nevertheless, achieving the highest level of scalability requires distributed information management.

Maximum nodes deployed is defined as the number of nodes that were actually used. It is hard to make predictions about the scalability of migration and load information management. An approximate prediction is that centralized load information management could scale up to 500 nodes without hierarchical organization, such as in Sprite. With hierarchical organization, such as in LSF, it could scale beyond 2000 nodes. Decentralized information management, such as in MOSIX, can scale to an even larger number of nodes. Even though Mach task migration has not been used on systems larger than a 5-node Ethernet cluster, most of its components that can impact scalability (distributed IPC, distributed memory management, and remote address space creation) have been demonstrated to scale well. The Intel Paragon computer, the largest MMP machine that runs Mach, has over 2000 nodes [Zajcew et al., 1993]. However, in order to use migration for load distribution, some decentralized information management algorithm would have to be deployed, similar to the one used for TNC.

Database scope defines the amount of information that is considered. Some systems, like MOSIX, maintain partial system information in order to enhance scalability. Large systems need to address fault tolerance. One drawback of centralized databases is that storing the data on one node introduces a single point of failure. This problem can be alleviated by replicating the data.

Once knowledge about the state of a system is collected and disseminated, it starts to lose its relevance as the state of the system changes. This is an intrinsic characteristic of distributed systems. The last column, knowledge relevance, lists the methods used by the load information management modules to account for this.

Table 7 describes the type of information managed by load information collection and dissemination. Load information is maintained for each process in a system, as well as for each machine in a system. Process parameters lists the information gathered about individual processes, while system parameters shows the information gathered per node. All systems use processor utilization information, while some of them also consider system communication patterns and access to off-node devices.

Disseminated parameters describes how much of the information is passed on to other nodes in a system. In most cases, only system information is disseminated, and the average ready queue length is of most interest. In some systems, not all available information is retained, as described in the retained information column. For example, MOSIX retains only a subset of collected information during the dissemination phase. Negotiation parameters details the information exchanged at the time of an attempted migration. Process parameters are used during the negotiation phase. Finally, the collection and dissemination columns detail the frequency of collection and dissemination in the four case studies. In all cases both collection and dissemination are periodic, with the exception of Sprite, which also disseminates upon a state change.

Table 8 summarizes the characteristics of distributed scheduling. The migration class column indicates the type of migration mechanism employed. The considered costs column indicates whether and how systems weigh the actual cost of migration. The migration costs are considered in the case of MOSIX and LSF, and not in the case of Sprite and Mach. In addition to CPU costs, MOSIX also accounts for communication costs.

Migration trigger summarizes the reasons for migration activation.

Table 7. Load Information Collection and Dissemination

(Columns: Migration/Characteristics; Per-Process Parameters; System Parameters (also disseminated); Retained Information; Negotiation Parameters; Collection; Dissemination. Rows: MOSIX, Sprite, Mach, LSF.)

Table 8. Distributed Scheduling

(Columns: System/Characteristics; Migration Class; Considered Costs; Migration Trigger; A Priori Knowledge; Learning from the Past; Stability. Rows: MOSIX, Sprite, Mach, LSF.)


Examples include crossing a load threshold on a single node, on demand after an application-specific request, or only for specific events like process eviction. Sprite process migration can be initiated as part of the pmake program or a migratory shell, or as a consequence of the eviction of a remotely executed process.

Some of the systems use a priori knowledge, typically in the form of specifying which processes are not allowed to migrate. These are, for example, well-known system processes, as in the case of MOSIX, Sprite, and Mach, or commands in the case of LSF. The learning from the past column indicates how some systems adapt to changing loads. Examples include aging load vectors and process residency in MOSIX, and limiting consecutive migrations in Mach. Stability is achieved by requiring a minimum residency for migrated processes after a migration (such as in MOSIX), by introducing a high threshold per node (such as in Mach and LSF), or by favoring long-idle machines (such as in Sprite). It can also be achieved by manipulating load information, as was investigated in MOSIX: for example, dissemination policies can be changed, information can be weighed subject to current load, and processes can be refused.
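The trigger and stability controls just described can be read as a small policy function. The Python sketch below is illustrative only; the threshold, residency time, and migration cap are hypothetical values rather than the policies of MOSIX, Sprite, Mach, or LSF. It combines a load-threshold trigger, an eligibility check encoding a priori knowledge, a minimum residency after arrival, and a limit on consecutive migrations.

import time

HIGH_THRESHOLD = 2.0      # local load that triggers consideration of migration
MIN_RESIDENCY_S = 5.0     # stability: stay put for a while after arriving
MAX_CONSECUTIVE = 3       # stability: cap consecutive migrations of one process

class ProcessRecord:
    def __init__(self, pid, system_process=False):
        self.pid = pid
        self.system_process = system_process   # a priori knowledge: never migrate
        self.arrived_at = time.time()
        self.consecutive_migrations = 0

def should_migrate(proc, local_load, best_remote_load):
    # Return True if migrating `proc` is both allowed and likely worthwhile.
    if proc.system_process:
        return False                                   # ineligible by policy
    if time.time() - proc.arrived_at < MIN_RESIDENCY_S:
        return False                                   # minimum residency not met
    if proc.consecutive_migrations >= MAX_CONSECUTIVE:
        return False                                   # avoid thrashing
    if local_load < HIGH_THRESHOLD:
        return False                                   # no trigger crossed
    return best_remote_load + 1.0 < local_load         # require a clear benefit

def record_migration(proc):
    # Bookkeeping a migration mechanism would perform after a successful move.
    proc.consecutive_migrations += 1
    proc.arrived_at = time.time()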
7. WHY PROCESS MIGRATION HAS NOT CAUGHT ON

In this section, we attempt to identify the barriers that have prevented a wider adoption of process migration and to explain how it may be possible to overcome them. We start with an analysis of each case study; we identify misconceptions; we identify those barriers that we consider the true impediments to the adoption of migration; and we conclude by outlining the likelihood of overcoming these barriers.

7.1. Case Analysis

MOSIX. The MOSIX distributed operating system is an exception to most other systems supporting transparent process migration in that it is still in general use. Several things worked against the wider adoption of the MOSIX system: the implementation was done on a commercial operating system, which prevented wide-spread distribution of the code, and one commercial backer of MOSIX withdrew from the operating system business. The current outlook is much brighter. The latest versions of MOSIX support process migration on BSDI's version of UNIX and Linux. The Linux port eliminates the legal barriers that prevented the distribution of early versions of the system.

Sprite. Sprite as a whole did not achieve a long-lasting success, so its process migration facility suffered with it. Sprite's failure to expand significantly beyond U.C. Berkeley was due to a conscious decision among its designers not to invest the enormous effort that would have been required to support a large external community. Instead, individual ideas from Sprite, particularly in the areas of file systems and virtual memory, have found their way into commercial systems over time. The failure of Sprite's process migration facility to similarly influence the commercial marketplace has come as a surprise. Ten years ago we would have predicted that process migration in UNIX would be commonplace today, despite the difficulties in supporting it. Instead, user-level load distribution is commonplace, but it is commonly limited to applications that can run on different hosts without ill effects, and relies either on explicit checkpointing or on the ability to run to completion.

Mach and OSF/1. Compared to other systems, Mach has gone the furthest in technology transfer. Digital UNIX has been directly derived from OSF/1, NT internals resemble the Mach design, and a lot of research was impacted by Mach. However, almost no distributed support was transferred elsewhere. The distributed memory management and distributed IPC were extremely complex, requiring significant effort to develop and to maintain. The redesign of its distributed IPC was accomplished within the OSF RI [Milojičić et al., 1997], but distributed memory management has never been redesigned and was instead abandoned [Black et al., 1998]. Consequently, task and process migration have never been transferred elsewhere except to universities and labs.


LSF. Platform Computing has not aggressively addressed process migration because the broad market is still not ready, partially due to an immature distributed system structure and partially due to a lack of cooperation from OS and application vendors. But most importantly, there was no significant customer demand.

Since the vast majority of users run Unix and Windows NT, for which dynamic process migration is not supported by the OS kernel, Platform Computing has been using user-level job checkpointing and migration as an indirect way to achieve process migration for the users of LSF. A checkpoint library based on that of Condor is provided that can be linked with Unix application programs to enable transparent process migration. This has been integrated into a number of important commercial applications. For example, a leading circuit simulation tool from Cadence, called Verilog, can be checkpointed on one workstation and resumed on another.

It is often advantageous to have checkpointing built into the applications and have LSF manage the migration process. The checkpoint file is usually smaller than with user-level checkpointing, because only certain data structures need to be saved, rather than all dirty memory pages. With more wide-spread use of workstations and servers on the network, Platform Computing is experiencing rapidly increasing demand for process migration.
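As an illustration of why application-assisted checkpoints can stay small, the following Python sketch uses a hypothetical interface of our own, not the Condor- or LSF-provided library: the application registers only the data structures it needs to restart, so the checkpoint holds a few objects rather than every dirty memory page.

import pickle

class Checkpointable:
    def __init__(self, path):
        self.path = path
        self.state = {}           # only the structures the application registers

    def register(self, name, value):
        self.state[name] = value

    def checkpoint(self):
        # Write registered state to a file a scheduler could copy to another host.
        with open(self.path, "wb") as f:
            pickle.dump(self.state, f)

    def restart(self):
        # Reload registered state on the (possibly different) host where we resume.
        with open(self.path, "rb") as f:
            self.state = pickle.load(f)
        return self.state

# Example: a long-running computation saves just its iteration counter and results.
if __name__ == "__main__":
    ckpt = Checkpointable("sim.ckpt")
    ckpt.register("iteration", 42)
    ckpt.register("partial_results", [0.1, 0.2, 0.3])
    ckpt.checkpoint()                                 # before migration
    resumed = Checkpointable("sim.ckpt").restart()    # after restart elsewhere
    print(resumed["iteration"])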
7.2. Misconceptions

Frequently, process migration has been dismissed as an academic exercise with little chance for wide deployment [Eager et al., 1988; Kremien and Kramer, 1992; Shivaratri et al., 1992]. Many rationales have been presented for this position, such as:
• significant complexity,
• unacceptable costs,
• the lack of support for transparency, and
• the lack of support for heterogeneity.

Some implementations, even successful ones, indeed have reinforced such beliefs. Despite the absence of widespread deployment, work on process migration has persisted. In fact, recently we have seen more and more attempts to provide migration and other forms of mobility [Steensgaard and Jul, 1995; Roush and Campbell, 1996; Smith and Hutchinson, 1998]. Checkpoint/restart systems are being deployed for the support of long-running processes [Platform Computing, 1996]. Finally, mobile agents are being investigated on the Web.

If we analyze implementations, we see that technical solutions exist for each of these problems (complexity, cost, non-transparency, and homogeneity). Migration has been supported with various degrees of complexity: as part of kernel mechanisms; as user-level mechanisms; and even as a part of an application (see Sections 4.2–4.6). The time needed to migrate has been reduced from the range of seconds or minutes [Mandelberg and Sunderam, 1988; Litzkow and Solomon, 1992] to as low as 14 ms [Roush and Campbell, 1996]. Various techniques have been introduced to optimize state transfer [Theimer et al., 1985; Zayas, 1987a; Roush and Campbell, 1996] (see Section 3.2). Transparency has been achieved to different degrees, from limited to complete (see Section 3.3). Finally, recent work demonstrates improvements in supporting heterogeneous systems, as done in Emerald [Steensgaard and Jul, 1995], Tui [Smith and Hutchinson, 1998], and Legion [Grimshaw and Wulf, 1996] (see Section 3.6).

7.3. True Barriers to Migration Adoption

We believe that the true impediments to deploying migration include the following:

• A lack of applications. Scientific applications and academic loads (e.g. pmake and simulations) represent a small percentage of today's applications. The largest percentage of applications today are standard PC applications, such as word processing and desktop publishing. Such applications do not significantly benefit from migration.


• A lack of infrastructure. There has not been a widely-used distributed operating system. Few of the distributed features of academically successful research operating systems, such as Mach, Sprite, or the V kernel, have been transferred to the marketplace despite initial enthusiasm. This lack increases the effort needed to implement process migration.

• Migration is not a requirement for users. Viable alternatives, such as remote invocation and remote data access, might not perform as uniformly as process migration, but they are able to meet user expectations with a simpler and well-understood approach [Eager et al., 1986a; Kremien and Kramer, 1992].

• Sociological factors have been important in limiting the deployment of process migration. In the workstation model, each node belongs to a user. Users are not inclined to allow remote processes to visit their machines. A lot of research has addressed this problem, such as process eviction in Sprite [Douglis and Ousterhout, 1991], or lowering the priority of foreign processes in the Stealth scheduler [Krueger and Chawla, 1991]. However, the psychological effects of workstation ownership still play a role today.

7.4. How these Barriers Might be Overcome

It often takes a long time for good research ideas to become widely adopted in the commercial arena. Examples include object-orientation, multi-threading, and the Internet. It may be the case that process mobility is not yet ripe enough to be adopted by the commercial market. We address each of the barriers identified in the previous section and try to predict how migration might fit future needs. The rest of the section is highly speculative because it attempts to extrapolate market needs and technology.

Applications. To become popular in the marketplace, migration needs a "killer application" that will provide a compelling reason for commercial operating system vendors to devote the resources needed to implement and support process migration. The types of application that are well-suited for process migration include processor-intensive tasks, such as parallel compilation and simulation, and I/O-intensive tasks that would benefit from the movement of a process closer to some data or another process (see also Section 2.4). These applications are exceedingly rare by comparison to the typical uses of today's computers in the home and workplace, such as word processing, spreadsheets, and games. However, applications are becoming more distributed, modular, and dependent on external data. In the near future, because of the growing differences in network performance, it will be more and more relevant to execute (migrate) applications close to the source of data. Modularity will make parallelization easier (e.g. various component models, such as Java Beans and Microsoft DCOM).

Infrastructure. The NT operating system is becoming a de facto standard, leading to a common environment. UNIX is also consolidating into fewer versions. All these systems are starting to address the needs of clustering and large-scale multicomputers, and both environments are suitable for process migration. These operating systems are becoming more and more distributed, and a lot of the missing infrastructure is becoming part of standard commercial operating systems or their programming environments.

Convenience vs. requirement (impact of hardware technology). The following hardware technology trends may impact process migration in the future: high speed networks, large scale systems, and the popularity of mobile hardware gadgets.
With the increasing difference in network speeds (e.g. between a mobile computer and a fiber channel), the difference between remote execution and migration becomes greater. Being able to move processes during execution (e.g. because it was realized that there is a lot of remote communication) can improve performance significantly. Secondly, with the larger scale of systems, failures are more frequent, thereby increasing the relevance of being able to continue program execution at another node. For long-running or critical applications (those that should not stop executing), migration becomes a more attractive solution. Finally, the increasing popularity of mobile hardware gadgets will require mobility support in software. Examples include migrating applications from a desktop to a laptop, and eventually to a gadget (e.g. future versions of cellular phones or palmtops).

Sociology. There are a few factors related to sociology. The meaning and relevance of someone's own workstation is blurring. There are so many computers in use today that the issue of computing cycles becomes less relevant. Many computers are simply servers that do not belong to any single user, and at the same time processing power is becoming increasingly cheap. A second aspect is that as the world becomes more and more connected, the idea of someone else's code arriving on one's workstation is not unfamiliar anymore. Many security issues remain, but they are being actively addressed by the mobile code and agents community.

In summary, we do not believe that there is a need for any revolutionary development in process migration to make it widely used. We believe that it is a matter of time, technology development, and the changing needs of users that will trigger a wider use of process migration.

8. SUMMARY AND FURTHER RESEARCH

In this paper we have surveyed process migration mechanisms. We have classified and presented an overview of a number of systems, and then discussed four case studies in more detail. Based on this material, we have summarized various migration characteristics. Throughout the text we tried to assess some misconceptions about process migration, as well as to discover the true reasons for the lack of its wide acceptance.

We believe there is a future for process migration. Different streams of development may well lead to a wider deployment of process migration. Below we include some possible paths.

One path is in the direction of LSF, a user-level facility that provides much of the functionality of full-fledged process migration systems, but with fewer headaches and complications. The checkpoint/restart model of process migration has already been relatively widely deployed. Packages such as Condor, LSF, and LoadLeveler are used for scientific and batch applications in production environments. Those environments have high demands on their computer resources and can take advantage of load sharing in a simple manner.

A second path concerns clusters of workstations. Recent advances in high speed networking (e.g. ATM [Partridge, 1994] and Myrinet [Boden et al., 1995]) have reduced the cost of migrating processes, allowing even costly migration implementations to be deployed.

A third path, one closer to the consumers of the vast majority of today's computers (Windows systems on Intel-based platforms), would put process migration right in the home or office. Sun recently announced its Jini architecture for home electronics [Sun Microsystems, 1998], and other similar systems are sure to follow. One can imagine a process starting on a personal computer and migrating its flow of control into another device in the same domain. Such activity would be similar to the migratory agents approach currently being developed for the Web [Rothermel and Hohl, 1998].

Still another possible argument for process migration, or another Worm-like facility for using vast processing capability across a wide range of machines, would be any sort of national or international computational effort. Several years ago, Quisquater and Desmedt [1991] suggested that the Chinese government could solve complex problems (such as factoring large numbers) by permitting people to use the processing power in their television sets, and offering a prize for a correct answer as an incentive to encourage television owners to participate. In case of extraordinary need, process migration could provide the underlying mechanism for large-scale computation across an ever-changing set of computers.

Finally, the most promising new opportunity is the use of mobile agents in the Web. In this setting, both technical and sociological conditions differ from the typical distributed system where process migration has been deployed (see the analysis in Section 7.2). Instead of the processor pool and workstation models, the Web environment connects computers as interfaces to the "network-is-computer" model. The requirements for transparency are relaxed, and user-specific solutions are preferred. Performance is dominated by network latency, and therefore state transfer is not as dominant as it is on a local area network; remote access and remote invocation are not competitive with solutions based on mobility. Users are ready to allow foreign code to be downloaded to their computers if this code is executed within a safe environment. In addition, there are plenty of dedicated servers where foreign code can execute. Heterogeneity is supported at the language level. Generally speaking, the use of mobile agents in a Web environment overcomes each of the real impediments to deploying process migration, and will be a growing application of the technology (albeit with new problems, such as security, that are currently being addressed by the mobile agents community). Mobile agents bear a lot of similarity to, and deploy similar techniques as, process migration.

Process migration will continue to attract research independently of its success in market deployment. It is deemed an interesting, hard, and unsolved problem, and as such is ideal for research. However, reducing the amount of transparency and the OS-level emphasis is common to each scenario we outlined above. Eventually this may result in less transparent OS support for migration, reflecting the lack of transparency at the application level while still providing certain guarantees about connectivity.

ACKNOWLEDGMENTS

We would like to thank Amnon Barak, David Black, Don Bolinger, Bill Bryant, Luca Cardelli, Steve Chapin, Alan Downey, Shai Guday, Mor Harchol-Balter, Dag Johansen, and Dan Teodosiu for providing many useful suggestions that significantly improved the paper. The anonymous reviewers provided an extensive list of general, as well as very detailed, suggestions that have strengthened our focus, presentation, and correctness of the paper. We are indebted to them and to the ACM Computing Surveys editor, Fred Schneider.

REFERENCES

ACCETTA, M., BARON, R., BOLOSKY, W., GOLUB, D., RASHID, R., TEVANIAN, A., AND YOUNG, M. 1986. Mach: A New Kernel Foundation for UNIX Development. Proceedings of the Summer USENIX Conference, 93–112.
AGRAWAL, R. AND EZZAT, A. 1987. Location Independent Remote Execution in NEST. IEEE Transactions on Software Engineering 13, 8, 905–912.
ALON, N., BARAK, A., AND MANBER, U. 1987. On Disseminating Information Reliably without Broadcasting. Proceedings of the 7th International Conference on Distributed Computing Systems, 74–81.
ALONSO, R. AND KYRIMIS, K. 1988. A Process Migration Implementation for a UNIX System. Proceedings of the USENIX Winter Conference, 365–372.
AMARAL, P., JACQEMOT, C., JENSEN, P., LEA, R., AND MIROWSKI, A. 1992. Transparent Object Migration in COOL-2. Proceedings of the ECOOP.
ANDERSEN, B. 1992. Load Balancing in the Fine-Grained Object-Oriented Language Ellie. Proceedings of the Workshop on Dynamic Object Placement and Load Balancing in Parallel and Distributed Systems Programs, 97–102.
ANDERSON, T. E., CULLER, D. E., AND PATTERSON, D. A. 1995. A Case for NOW (Networks of Workstations). IEEE Micro 15, 1, 54–64.
ARTSY, Y., CHANG, Y., AND FINKEL, R. 1987. Interprocess Communication in Charlotte. IEEE Software, 22–28.
ARTSY, Y. AND FINKEL, R. 1989. Designing a Process Migration Facility: The Charlotte Experience. IEEE Computer, 47–56.
BAALBERGEN, E. H. 1988. Design and Implementation of Parallel Make. Computing Systems 1, 135–158.
BANAWAN, S. A. AND ZAHORJAN, J. 1989. Load Sharing in Hierarchical Distributed Systems. Proceedings of the 1989 Winter Simulation Conference, 963–970.
BARAK, A. AND LITMAN, A. 1985. MOS: a Multicomputer Distributed Operating System. Software-Practice and Experience 15, 8, 725–737.


BARAK,A.AND SHILOH, A. 1985. A Distributed Load- Memory Management (XMM): Lessons Learned. Balancing Policy for a Multicomputer. Software- Software-Practice and Experience 28, 9, 1011– Practice and Experience 15, 9, 901–913. 1031. BARAK,A.AND WHEELER, R. 1989. MOSIX: An BODEN, N., COHEN, D., FELDERMAN, R. E., KULAWIK,A. Inte-grated Multiprocessor UNIX. Proceedings E., SEITZ, C. L., SEIZOVIC,J.N.,AND SU, W.-K. 1995. of the Winter 1989 USENIX Conference, 101– Myrinet: A Gigabit-per-Second Local Area Net- 112. work. IEEE Micro 15, 1, 29–38. BARAK, A., SHILOH, A., AND WHEELER, R. 1989. BOKHARI, S. H. 1979. Dual Processor Scheduling Flood Prevention in the MOSIX Load-Balancing with Dynamic Reassignment. IEEE Transac- Scheme. IEEE Technical Committee on Operat- tions on Software Engineering, SE-5, 4, 326–334. ing Systems Newsletter 3, 1, 24–27. BOMBERGER,A.C.,FRANTZ,W.S.,HARDY,A.C., BARAK, A., GUDAY,S.,AND WHEELER, R. G. 1993. The HARDY, N., LANDAU,C.R.,AND SHAPIRO,J.S. MOSIX Distributed Operating System. Springer 1992. The Key-KOS (R) Nanokernel Architec- Verlag. ture. USENIX Workshop on Micro-Kernels and BARAK, A., LADEN,O.,AND BRAVERMAN, A. 1995. The Other Kernel Architectures, 95–112. NOW MOSIX and its Preemptive Process Mi- BOND, A. M. 1993. Adaptive Task Allocation in gration Scheme. Bulletin of the IEEE Technical a Distributed Workstation Environment. Ph.D. Committee on Operating Systems and Applica- Thesis, Victoria University at Wellington. tion Environments 7, 2, 5–11. BONOMI,F.AND KUMAR, A. 1988. Adaptive Optimal BARBOU DES PLACES, F. B., STEPHEN,N.,AND REYNOLDS, Load Balancing in a Heterogeneous Multiserver F. D. 1996. Linux on the OSF Mach3 Micro- System with a Central Job Scheduler. Proceed- kernel. Proceedings of the First Conference on ings of the 8th International Conference on Dis- Freely Redistributable Software, 33–46. tributed Computing Systems, 500–508. BARRERA, J. 1991. A Fast Mach Network IPC Imple- BORGHOFF U. M. 1991. Catalogue of Distributed mentation. Proceedings of the Second USENIX File/Operating Systems. Springer Verlag. Mach Symposium, 1–12. BOWEN,N.S.,NIKOLAOU,C.N.,AND GHAFOOR,A. BASKETT, F.,HOWARD,J.,AND MONTAGUE, T.1977. Task 1988. Hierarchical Workload Allocation for Communication in DEMOS. Proceedings of the Distributed Systems. Proceedings of the 1988 In- 6th Symposium on OS Principles, 23–31. ternational Conference on Parallel Processing, BAUMANN, J., HOHL, F., ROTHERMEL, K., AND STRABER, II:102–109. M. 1998. Mole—Concepts of a Mobile Agent BROOKS, C., MAZER, M. S., MEEKS,S.,AND MILLER, System. World Wide Web 1, 3, 123–137. J. 1995. Application-Specific Proxy Servers as BEGUELIN, A., DONGARRA, J., GEIST, A., MANCHEK, R., HTTP Stream Transducers. Proceedings of the OTTO,S.,AND WALPOLE, J. 1993. PVM: Experi- Fourth International World Wide Web Confer- ences, Current Status and Future Directions. ence, 539–548. Proceedings of Supercomputing 1993, 765–766. BRYANT, B. 1995. Design of AD 2, a Distributed BERNSTEIN, P. A. 1996. Middleware: A Model for Dis- UNIX Operating System. OSF Research Insti- tributed System Services. Communications of tute. the ACM 39, 2, 86–98. BRYANT,R.M.AND FINKEL, R. A. 1981. A Stable Dis- BERSHAD, B., SAVAGE, S., PARDYAK, P., SIRER,E.G., tributed Scheduling Algorithm. Proceedings of FIUCZINSKI, M., BECKER, D., CHAMBERS,C.,AND the 2nd International Conference on Distributed EGGERS, S. 1995. Extensibility, Safety and Per- Computing Systems, 314–323. formance in the SPIN Operating System. 
Pro- BUGNION, E., DEVINE, S., GOVIL, K., AND ROSENBLUM,M. ceedings of the 15th Symposium on Operating 1997. Disco: running commodity operating sys- Systems Principles, 267–284. tems on scalable multiprocessors. ACM Transac- BHARAT,K.A.AND CARDELLI, L. 1995. Migratory Ap- tions on Computer Systems 15, 4, 412–447. plications. Proceedings of the Eight Annual ACM BUTTERFIELD,D.A.AND POPEK, G. J. 1984. Network Symposium on User Interface Software Technol- Tasking in the Locus Distributed UNIX System. ogy. Proceedings of the Summer USENIX Conference, BLACK, A., HUTCHINSON, N., JUL, E., LEVY, H., 62–71. AND CARTER, L. 1987. Distributed and Abstract CABRERA, L. 1986. The Influence of Workload on Types in Emerald. IEEE Transactions on Soft- Load Balancing Strategies. Proceedings of the ware Engineering, SE-13, 1, 65–76. Winter USENIX Conference, 446–458. BLACK, D., GOLUB, D., JULIN, D., RASHID, R., DRAVES, CARDELLI, L. 1995. A Language with Distributed R., DEAN, R., FORIN, A., BARRERA, J., TOKUDA, H., Scope. Proceedings of the 22nd Annual ACM MALAN,G.,AND BOHMAN, D. 1992. Microkernel Symposium on the Principles of Programming Operating System Architecture and Mach. Pro- Languages, 286–297. ceedings of the USENIX Workshop on Micro- CASAS, J., CLARK, D. L., CONURU, R., OTTO,S.W.,PROUTY, Kernels and Other Kernel Architectures, 11–30. R. M., AND WALPOLE, J. 1995. MPVM: A Migra- BLACK, D., MILOJICIˇ C´, D., DEAN, R., DOMINIJANNI, tion Transparent Version of PVM. Computing M., LANGERMAN, A., SEARS, S. 1998. Extended Systems 8, 2, 171–216.


CASAVANT,T.L.AND KUHL, J. 1988a. A Taxonomy CONCEPCION,A.I.AND ELEAZAR, W. M. 1988. A of Scheduling in General-Purpose Distributed Testbed for Comparative Studies of Adap- Computing Systems. IEEE Transactions on Soft- tive Load Balancing Algorithms. Proceedings ware Engineering, SE-14(2), 141–152. of the Distributed Simulation Conference, 131– CASAVANT,T.L.AND KUHL, J. 1988b. Effects of Re- 135. sponse and Stability on Scheduling in Dis- DANNENBERG, R. B. 1982. Resource Sharing in a Net- tributed Computing systems. IEEE Transac- work of Personal Computers. Ph.D. Thesis, Tech- tions on Software Engineering, SE-14(11), 1578– nical Report CMU-CS-82-152, Carnegie Mellon 1588. University. CHAPIN, J., ROSENBLUM, M., DEVINE, S., LAHIRI, T., TEO- DANNENBERG,R.B.AND HIBBARD, P. G. 1985. A But- DOSIU,D.,AND GUPTA, A. 1995. Hive: Fault Con- ler Process for Resource Sharing on a Spice Ma- tainment for Shared-Memory Multiprocessors. chine. IEEE Transactions on Office Information Proceedings of the Fifteenth ACM Symposium on Systems 3, 3, 234–252. Operating Systems Principles, 12–25. DEARLE A. DI BONA R., FARROW J., HENSKENS F., CHAPIN, S. J. 1995. Distributed Scheduling Support L INDSTROM A., ROSENBERG J. AND VAUGHAN F. 1994. in the Presence of Autonomy. Proceedings of the Grasshopper: An Orthogonally Persistent Oper- 4th Heterogeneous Computing Workshop, IPPS, ating System. Computer Systems 7, 3, 289–312. 22–29. DEDIU, H., CHANG,C.H.,AND AZZAM, H. 1992. Heavy- CHAPIN, S. J. 1993. Scheduling Support Mecha- weight Process Migration. Proceedings of the nisms for Autonomous, Heterogeneous, Dis- Third Workshop on Future Trends of Distributed tributed Systems. Ph.D. Thesis, Technical Report Computing Systems, 221–225. CSD-TR-93-087, Purdue University. DENNING, P.J. 1980. Working Sets Past and Present. CHAPIN,S.J.AND SPAFFORD, E. H. 1994. Support IEEE Transactions on Software Engineering, for Implementing Scheduling Algorithms Using SE-6, 1, 64–84. MESSIAHS. Scientific Programming, 3, 325– DIKSHIT, P., TRIPATHI,S.K.,AND JALOTE, P. 1989. SA- 340. HAYOG: A Test Bed for Evaluating Dynamic CHAPIN, S. J. 1996. Distributed and Multiprocessor Load-Sharing Policies. Software-Practice and Scheduling. ACM Computing Surveys 28, 1, 233– Experience, 19, 411–435. 235. DOUGLIS,F.AND OUSTERHOUT, J. 1987. Process Migra- CHASE,J.S.,AMADOR,F.G.,LAZOWSKA,E.D.,LEVY, tion in the Sprite Operating System. Proceedings H. M., AND LITTLEFIELD, R. J. 1989. The Am- of the Seventh International Conference on Dis- ber System: Parallel Programming on a Net- tributed Computing Systems, 18–25. work of Multiprocessors. Proceedings of the 12th DOUGLIS, F. 1989. Experience with Process Migra- ACM Symposium on Operating Systems Princi- tion in Sprite. Proceedings of the USENIX Work- ples, 147–158. shop on Experiences with Distributed and Mul- CHERITON, D. R. 1988. The V Distributed System. tiprocessor Systems (WEBDMS), 59–72. Communications of the ACM 31, 3, 314–333. DOUGLIS, F. 1990. Transparent Process Migration in CHERITON, D. 1990. Binary Emulation of UNIX Us- the Sprite Operating System. Ph.D. Thesis, Tech- ing the V Kernel. Proceedings of the Summer nical Report UCB/CSD 90/598, CSD (EECS), USENIX Conference, 73–86. University of California, Berkeley. CHESS, D., B., G., HARRISON, C., LEVINE, D., PARRIS, DOUGLIS,F.AND OUSTERHOUT, J. 1991. Transparent C., AND TSUDIK, G. 1995. Itinerant Agents for Process Migration: Design Alternatives and the Mobile Computing. IEEE Personal Communica- Sprite Implementation. 
Software-Practice and tions Magazine. Experience 21, 8, 757–785. CHOU,T.C.K.AND ABRAHAM, J. A. 1982. Load Balanc- DUBACH, B. 1989. Process-Originated Migration in a ing in Distributed Systems. IEEE Transactions Heterogeneous Environment. Proceedings of the on Software Engineering, SE-8, 4, 401–419. 17th ACM Annual Computer Science Conference, CHOU,T.C.K.AND ABRAHAM, J. A. 1983. Load Redis- 98–102. tribution under Failure in Distributed Systems. EAGER, D., LAZOWSKA, E., AND ZAHORJAN, J. 1986a. A IEEE Transactions on Computers, C-32, 9, 799– Comparison of Receiver-Initiated and Sender- 808. Initiated Adaptive Load Sharing. Performance CHOW, Y.-C. AND KOHLER, W. H. 1979. Models for Dy- Evaluation 6, 1, 53–68. namic Load Balancing in a Heterogeneous Mul- EAGER, D., LAZOWSKA, E., AND ZAHORJAN, J. 1986b. tiple Processor System. IEEE Transactions on Dynamic Load Sharing in Homogeneous Dis- Computers, C-28, 5, 354–361. tributed Systems. IEEE Transactions on Soft- COHN, D. L., DELANEY,W.P.,AND TRACEY,K.M. ware Engineering 12, 5, 662–675. 1989. Arcade: A Platform for Distributed Op- EAGER, D., LAZOWSKA, E., AND ZAHORJAN, J. 1988. erating Systems. Proceedings of the USENIX The Limited Performance Benefits of Migrat- Workshop on Experiences with Distributed ing Active Processes for Load Sharing. Proceed- and Multiprocessor Systems (WEBDMS), 373– ings of the 1988 ACM SIGMETRICS Confer- 390. ence on Measurement and Modeling of Computer


Systems, Performance Evaluation Review 16,1, of Distributed Systems. Proceedings of the 8th 63–72. International Conference on Parallel Processing, EFE, K. 1982. Heuristic Models of Task Assignment 728–734. Scheduling in Distributed Systems. IEEE Com- GOLUB, D., DEAN, R., FORIN, A., AND RASHID, R. 1990. puter, 15, 6, 50–56. UNIX as an Application Program. Proceedings EFE,K.AND GROSELJ, B. 1989. Minimizing Control of the Summer USENIX Conference, 87–95. Overheads in Adaptive Load Sharing. Proceed- GOPINATH,P.AND GUPTA, R. 1991. A Hybrid Ap- ings of the 9th International Conference on Dis- proach to Load Balancing in Distributed Sys- tributed Computing Systems, 307–315. tems. Proceedings of the USENIX Symposium ENGLER, D. R., KAASHOEK,M.F.,AND O’TOOLE,J.J. on Experiences with Distributed and Multipro- 1995. Exokernel: An Operating System Archi- cessor Systems, 133–148. tecture for Application-Level Resource Manage- GOSCINSKI, A. 1991. Distributed Operating Systems: ment. Proceedings of the 15th Symposium on Op- The Logical Design. Addison Wesley. erating Systems Principles, 267–284. GOSLING, J., JOY,B.,AND STEELE, G. 1996. The Java ESKICIOGLU, M. R. 1990. Design Issues of Process Language Specification. Addison Wesley. Migration Facilities in Distributed Systems. GRAY, R. 1995. Agent Tcl: A flexible and secure IEEE Technical Committee on Operating Sys- mobileagent system. Ph.D. thesis, Technical Re- tems Newsletter 4, 2, 3–13. port TR98-327, Department of Computer Sci- EZZAT, A., BERGERON,D.,AND POKOSKI, J. 1986. Task ence, Dartmouth College, June 1997. Allocation Heuristics for Distributed Comput- GRIMSHAW,A.AND WULF, W. 1997. The Legion Vision ing Systems. Proceedings of the 6th International of a Worldwide Virtual Computer. Communica- Conference on Distributed Computing Systems. tions of the ACM 40, 1, 39–45. FARMER, W. M., GUTTMAN,J.D.,AND SWARUP, V. 1996. GUPTA,R.AND GOPINATH, P. 1990. A Hierarchical Ap- Security for Mobile Agents: Issues and Require- proach to Load Balancing in Distributed Sys- ments. Proceedings of the National Information tems. Proceedings of the Fifth Distributed Mem- Systems Security Conference, 591–597. ory Computing Conference, II, 1000–1005. FEITELSON,D.G.AND RUDOLPH, L. 1990. Mapping HAC, A. 1989a. A Distributed Algorithm for Perfor- and Scheduling in a Shared Parallel Environ- mance Improvement Through File Replication, ment Using Distributed Hierarchical Control. File Migration, and Process Migration. IEEE Proceedings of the 1990 International Conference Transactions on Software Engineering 15, 11, on Parallel Processing, I: 1–8. 1459–1470. FERRARI,D.AND ZHOU., S. 1986. A Load Index for Dy- HAC, A. 1989b. Load Balancing in Distributed Sys- namic Load Balancing. Proceedings of the 1986 tems: A Summary. Performance Evaluation Re- Fall Joint Computer Conference, 684–690. view, 16, 17–25. FINKEL, R., SCOTT, M., ARTSY,Y.,AND CHANG, H. 1989. HAERTIG, H., KOWALSKI,O.C.,AND KUEHNHAUSER,W.E. Experience with Charlotte: Simplicity and Func- 1993. The BirliX Security Architecture. tion in a Distributed Operating system. IEEE HAGMANN, R. 1986. Process Server: Sharing Pro- Transactions on Software Engineering, SE-15,6, cessing Power in a Workstation Environment. 676–685. Proceedings of the 6th International Conference FLEISCH,B.D.AND POPEK, G. J. 1989. Mirage: A Co- on Distributed Computing Systems, 260–267. herent Distributed Shared Memory Design. Pro- HAMILTON,G.AND KOUGIOURIS, P. 1993. 
The Spring ceedings of the 12th ACM Symposium on Oper- Nucleus: A Microkernel for Objects. Proceedings ating System Principles, 211–223. of the 1993 Summer USENIX Conference, 147– FREEDMAN, D. 1991. Experience Building a Process 160. Migration Subsystem for UNIX. Proceedings of HAN,Y.AND FINKEL, R. 1988. An Optimal Scheme for the Winter USENIX Conference, 349–355. Disseminating Information. Proceedings of the GAIT, J. 1990. Scheduling and Process Migration in 1988 International Conference on Parallel Pro- Partitioned Multiprocessors. Journal of Parallel cessing, II, 198–203. and Distributed Computing 8, 3, 274–279. HARCHOL-BALTER,M.AND DOWNEY, A. 1997. Exploit- GAO, C., LIU,J.W.S.,AND RAILEY, M. 1984. Load Bal- ing Process Lifetime Distributions for Dynamic ancing Algorithms in Homogeneous Distributed Load Balancing. ACM Transactions on Com- Systems. Proceedings of the 1984 International puter Systems 15, 3, 253–285. Previously ap- Conference on Parallel Processing, 302–306. peared in the Proceedings of ACM Sigmetrics GERRITY,G.W.,GOSCINSKI, A., INDULSKA, J., TOOMEY,W., 1996 Conference on Measurement and Modeling AND ZHU, W. 1991. Can We Study Design Issues of Computer Systems, 13–24, May 1996. of Distributed Operating Systems in a Gener- HILDEBRAND, D. 1992. An Architectural Overview of alized Way? Proceedings of the Second USENIX QNX. Proceedings of the USENIX Workshop on Symposium on Experiences with Distributed and Micro-Kernels and Other Kernel Architectures, Multiprocessor Systems, 301–320. 113–126. GOLDBERG,A.AND JEFFERSON, D. 1987. Transparent HOFMANN, M.O., MCGOVERN, A., AND WHITEBREAD,K. Process Cloning: A Tool for Load Management 1998. Mobile Agents on the Digital Battlefield.


Proceedings of the Autonomous Agents ’98, 219– KRUEGER,P.AND LIVNY, M. 1987. The Diverse 225. Objectives of Distributed Scheduling Policies. HOHL, F. 1998. A Model of Attacks of Malicious Proceedings of the 7th International Conference Hosts Against Mobile Agents. Proceedings of the on Distributed Computing Systems, 242–249. 4th Workshop on Mobile Objects Systems, INRIA KRUEGER,P.AND LIVNY, M. 1988. A Comparison of Technical Report, 105–120. Preemptive and Non-Preemptive Load Balanc- HWANG, K., CROFT, W., WAH, B., BRIGGS, F., SIMONS, ing. Proceedings of the 8th International Con- W., AND COATES, C. 1982. A UNIX-Based Local ference on Distributed Computing Systems, 123– Computer Network with Load Balancing. IEEE 130. Computer, 15, 55–66. KRUEGER,P.AND CHAWLA, R. 1991. The Stealth Dis- JACQMOT, C. 1996. Load Management in Dis- tributed Scheduler. Proceedings of the 11th Inter- tributed Computing Systems: Towards Adaptive national Conference on Distributed Computing Strategies. Technical Report, Ph.D. Thesis, De- Systems, 336–343. partement d’Ingenierie Informatique, Universite KUNZ, T. 1991. The Influence of Different Work- catholique de Louvain. load Descriptions on a Heuristic Load Balancing Scheme. IEEE Transactions on Software Engi- JOHANSEN,D.,VAN R ENESSE, R., AND SCHNEIDER, F. 1995. Operating System Support for Mobile Agents. neering 17, 7, 725–730. Proceedings of the 5th Workshop on Hot Topics LAMPSON, B. 1983. Hints for Computer System De- in Operating Systems, 42–45. sign. Proceedings of the Ninth Symposium on Operating System Principles, 33–48. JUL, E., LEVY, H., HUTCHINSON,N.,AND BLACK, A. 1988. Fine-Grained Mobility in the Emerald System. LANGE,D.AND OSHIMA, M. 1998. Programming Mo- ACM Transactions on Computer Systems 6,1, bile Agents in JavaTM -With the Java Aglet API. 109–133. Addison Wesley Longman. JUL, E. 1988. Object Mobility in a Distributed LAZOWSKA,E.D.,LEVY, H. M., ALMES,G.T.,FISHER, Object-Oriented System. Technical Report 88- M. J., FOWLER,R.J.,AND VESTAL, S. C. 1981. The 12-06, Ph.D. Thesis, Department of Computer Architecture of the Eden System. Proceedings of Science, University of Washington, Also Techni- the 8th ACM Symposium on Operating Systems cal Report no. 98/1, University of Copenhagen Principles, 148–159. DIKU. LEA, R., JACQUEMOT,C.,AND PILLVESSE, E. 1993. JUL, E. 1989. Migration of Light-weight Processes COOL: System Support for Distributed Pro- in Emerald. IEEE Technical Committee on Op- gramming. Communications of the ACM 36,9, erating Systems Newsletter, 3(1)(1):20–23. 37–47. KAASHOEK,M.F.,VAN R ENESSE, R., VAN S TAVEREN, H., LELAND,W.AND OTT, T. 1986. Load Balancing AND TANENBAUM, A. S. 1993. FLIP: An Internet- Heuristics and Process Behavior. Proceedings of work Protocol for Supporting Distributed Sys- the SIGMETRICS Conference, 54–69. tems. ACM Transactions on Computer Systems, LIEDTKE, J. 1993. Improving IPC by Kernel Design. 11(1). Proceedings of the Fourteenth Symposium on Op- KEMPER, A., KOSSMANN, D. 1995. Adaptable Pointer erating Systems Principles, 175–188. Swizzling Strategies in Object Bases: Design, LITZKOW, M 1987. Remote UNIX-Turning Idle Realization, and Quantitative Analysis. VLDB Work-stations into Cycle Servers. Proceedings Journal 4(3): 519-566(1995). of the Summer USENIX Conference, 381– KHALIDI, Y. A., BERNABEU, J. M., MATENA, V., SHIRIFF, 384. K., AND THADANI, M. 1996. Solaris MC: A Multi- LITZKOW, M., LIVNY, M., AND MUTKA, M. 1988. Condor Computer OS. Proceedings of the USENIX 1996 A Hunter of Idle Workstations. 
Proceedings of Annual Technical Conference, 191–204. the 8th International Conference on Distributed KLEINROCK, L. 1976. Queueing Systems vol. 2: Com- Computing Systems, 104–111. puter Applications. Wiley, New York. LITZKOW,M.AND SOLOMON, M. 1992. Supporting KNABE, F. C. 1995. Language Support for Mo- Checkpointing and Process Migration outside bile Agents. Technical Report CMU-CS-95-223, the UNIX Kernel. Proceedings of the USENIX Ph.D. Thesis, School of Computer Science, Winter Conference, 283–290. Carnegie Mellon University, Also Technical Re- LIVNY,M.AND MELMAN, M. 1982. Load Balancing in port ECRC-95-36, European Computer Industry Homogeneous Broadcast Distributed Systems. Research Centre. Proceedings of the ACM Computer Network Per- KOTZ, D., GRAY, R., NOG, S., RUS, D., CHAWLA,S.,AND formance Symposium, 47–55. CYBENKO., G. 1997. Agent Tcl: Targeting the LO, V. 1984. Heuristic Algorithms for Task Assign- needs of mobile computers. IEEE Internet Com- ments in Distributed Systems. Proceedings of puting 1, 4, 58–67. the 4th International Conference on Distributed KREMIEN,O.AND KRAMER, J. 1992. Methodical Anal- Computing Systems, 30–39. ysis of Adaptive Load Sharing Algorithms. IEEE LO, V. 1989. Process Migration for Communication Transactions on Parallel and Distributed Sys- Performance. IEEE Technical Committee on Op- tems 3, 6, 747–760. erating Systems Newsletter 3, 1, 28–30.


LO, V. 1988. Algorithms for Task Assignment and 194. Also appeared in IEE Proceedings— Contraction in Distributed Computing Systems. Distributed Systems Engineering, 5, 1–14, 1998. Proceedings of the 1988 International Conference MILOJICIˇ C´, D., DOUGLIS, F., WHEELER, R. 1999. Mobil- on Parallel Processing, II, 239–244. ity: Processes, Computers, and Agents. Addison- LOUBOUTIN, S. 1991. An Implementation of a Pro- Wesley Longman and ACM Press. cess Migration Mechanism using Minix. Pro- MILOJICIˇ C´, D., GIESE,P.,AND ZINT, W. 1993a. Expe- ceedings of 1991 European Autumn Conference, riences with Load Distribution on Top of the Budapest, Hungary, 213–224. Mach Microkernel. Proceedings of the USENIX LU, C., CHEN, A., AND LIU, J. 1987. Protocols for Re- Symposium on Experiences with Distributed and liable Process Migration. INFOCOM 1987, The Multiprocessor Systems. 6th Annual Joint Conference of IEEE Computer MILOJICIˇ C´, D., ZINT, W., DANGEL, A., AND GIESE,P. and Communication Societies. 1993b. Task Migration on the top of the Mach LU, C. 1988. Process Migration in Distributed Sys- Microkernel. Proceedings of the third USENIX tems. Ph.D. Thesis, Technical Report, University Mach Symposium, 273–290.fc of Illinois at Urbana-Champaign. MILOJICIˇ C´, D. 1993c. Load Distribution, Implemen- LUX, W., HAERTIG, H., AND KUEHNHAUSER, W. E. 1993. tation for the Mach Microkernel. Ph.D. Thesis, Migrating Multi-Threaded, Shared Objects. Technical Report, University of Kaiserslautern. Proceedings of 26th Hawaii International Con- Also Vieweg, Wiesbaden, 1994. ference on Systems Sciences, II, 642–649. MILOJICIˇ C´, D., LANGERMAN, A., BLACK, D., SEARS,S., LUX, W. 1995. Adaptable Object Migration: Concept DOMINIJANNI, M., AND DEAN, D. 1997. Concur- and Implementation. Operating Systems Review rency, a Case Study in Remote Tasking and Dis- 29, 2, 54–69. tributed IPC. IEEE Concurrency 5, 2, 39–49. MA,P.AND LEE, E. 1982. A Task Allocation Model for MIRCHANDANEY, R., TOWSLEY,D.,AND STANKOVIC,J. Distributed Computing Systems. IEEE Transac- 1989. Analysis of the Effects of Delays on Load tions on Computers, C-31, 1, 41–47. Sharing. IEEE Transactions on Computers 38, 11, 1513–1525. MAGUIRE,G.AND SMITH, J. 1988. Process Migra- tions: Effects on Scientific Computation. ACM MIRCHANDANEY, R., TOWSLEY,D.,AND STANKOVIC,J. SIGPLAN Notices, 23, 2, 102–106. 1990. Adaptive Load Sharing in Heteroge- neous Distributed Systems. Journal of Parallel MALAN, G., RASHID, R., GOLUB,D.,AND BARON, R. 1991. DOS as a Mach 3.0 Application. Proceedings and Distributed Computing, 331–346. of the Second USENIX Mach Symposium, 27– MULLENDER,S.J.,VAN R OSSUM, G., TANENBAUM,A. 40. S., VAN R ENESSE, R., AND VAN STAVEREN,H. 1990. Amoeba—A Distributed Operating Sys- MANDELBERG,K.AND SUNDERAM, V. 1988. Process Migration in UNIX Networks. Proceedings of tem for the 1990s. IEEE Computer 23,5,44– USENIX Winter Conference, 357–363. 53. MUTKA,M.AND LIVNY, M. 1987. Scheduling Remote MEHRA,P.AND WAH, B. W. 1992. Physical Level Syn- thetic Workload Generation for Load-Balancing Processing Capacity in a Workstation Processor Experiments. Proceedings of the First Sympo- Bank Computing System. Proceedings of the 7th sium on High Performance Distributed Comput- International Conference on Distributed Com- ing, 208–217. puting Systems, 2–7. NELSON,M.N.,AND OUSTERHOUT, J. K. 1988. Copy- MILLER,B.AND PRESOTTO, D. 1981. XOS: an Operat- ing System for the XTREE Architecture. Oper- on-Write for Sprite. Proceedings of the Summer ating Systems Review, 2, 15, 21–32. 
1988 USENIX Conference, 187–201. NELSON,M.N.,WELCH,B.B.,AND OUSTERHOUT,J.K. MILLER, B., PRESOTTO,D.,AND POWELL, M. 1987. DE- MOS/ MP: The Development of a Distributed Op- 1988. Caching in the Sprite Network File Sys- erating System. Software-Practice and Experi- tem. ACM Transaction on Computer Systems 6, ence 17, 4, 277–290. 1, 134–54. NELSON,R.AND SQUILLANTE, M. 1995. Stochastic MILOJICIˇ C´,D.S.,BREUGST, B., BUSSE, I., CAMPBELL,J., Analysis of Affinity Scheduling and Load Bal- COVACI, S., FRIEDMAN, B., KOSAKA, K., LANGE,D., ancing in Parallel Processing Systems. IBM Re- ONO, K., OSHIMA, M., THAM, C., VIRDHAGRISWARAN, search Report RC 20145. S., AND WHITE, J. 1998b. MASIF, The OMG Mo- bile Agent System Interoperability Facility. Pro- NI,L.M.AND HWANG, K. 1985. Optimal Load Bal- ceedings of the Second International Workshop ancing in a Multiple Processor System with on Mobile Agents, pages 50–67. Also appeared Many Job Classes. IEEE Transactions on Soft- in Springer Journal on Personal Technologies, 2, ware Engineering, SE-11, 5, 491–496. 117–129, 1998. NICHOLS, D. 1987. Using Idle Workstations in a MILOJICIˇ C´,D.S.,CHAUHAN,D.,AND LAFORGE,W. Shared Computing Environment. Proceedings of 1998a. Mobile Objects and Agents (MOA), De- the 11th Symposium on OS Principles, 5–12. sign, Implementation and Lessons Learned. NICHOLS, D. 1990. Multiprocessing in a Network Proceedings of the 4th USENIX Conference on of Workstations. Ph.D. Thesis, Technical Report Object-Oriented Technologies (COOTS), 179– CMU-CS-90-107, Carnegie Mellon University.


NUTTAL, M. 1994. Survey of Systems Providing Pro- RANGANATHAN, M., ACHARYA, A., SHARMA,S.D., cess or Object Migration. Operating System Re- AND SALTZ, J. 1997. Networkaware Mobile Pro- view, 28, 4, 64–79. grams. Proceedings of the USENIX 1997 Annual OMG, 1996. Common Object Request Broker Archi- Technical Conference, 91–103. tecture and Specification. Object Management RASHID,R.AND ROBERTSON, G. 1981. Accent: a Com- Group Document Number 96.03.04. munication Oriented Network Operating Sys- OUSTERHOUT, J., CHERENSON, A., DOUGLIS, F., NELSON, tem Kernel. Proceedings of the 8th Symposium M., AND WELCH, B. 1988. The Sprite Network on Operating System Principles, 64–75. Operating System. IEEE Computer, 23–26. RASHID, R. 1986. From RIG to Accent to Mach: The OUSTERHOUT, J. 1994. TcL and the Tk Toolkit. Evolution of a Network Operating System. Pro- Addison-Wesley Longman. ceedings of the ACM/IEEE Computer Society Fall Joint Computer Conference, 1128–1137. PAINDAVEINE,Y.AND MILOJICIˇ C´, D. 1996. Process v. Task Migration. Proceedings of the 29th Annual ROSENBERRY, W., KENNEY,D.,AND FISHER, G. 1992. Hawaii International Conference on System Sci- Understanding DCE. O’Reilly & Associates, Inc. ences, 636–645. ROTHERMEL, K., AND HOHL, F. 1998. Mobile Agents. Proceedings of the Second International Work- PARTRIDGE, C. 1994. Gigabit Networking. Addison Wesley. shop, MA’98, Springer Verlag. ROUSH, E. T. 1995. The Freeze Free Algorithm PEINE,H.AND STOLPMANN, T. 1997. The Architecture for process Migration. Ph.D. Thesis, Techni- of the Ara Platform for Mobile Agents. Proceed- cal Report, University of Illinois at Urbana- ings of the First International Workshop on Mo- Champaign. bile Agents (MA’97). LNCS 1219, Springer Ver- lag, 50–61. ROUSH,E.T.AND CAMPBELL, R. 1996. Fast Dynamic Process Migration. Proceedings of the Interna- PETRI,S.AND LANGENDORFER, H. 1995. Load Balanc- tional Conference on Distributed Computing Sys- ing and Fault Tolerance in Workstation Clusters tems, 637–645. Migrating Groups of Communicating Processes. Operating Systems Review 29 4, 25–36. ROWE,L.AND BIRMAN, K. 1982. A Local Network Based on the UNIX Operating System. IEEE PHELAN,J.M.AND ARENDT, J. W. 1993. An OS/2 Transactions on Software Engineering, SE-8,2, Personality on Mach. Proceedings of the third 137–146. USENIX Mach Symposium, 191–202. ROZIER, M. 1992. Chorus (Overview of the Chorus PHILIPPE, L. 1993. Contribution al’` etude´ et la Distributed Operating System). USENIX Work- realisation´ d’un systeme` d’exploitation a` im- shop on Micro Kernels and Other Kernel Archi- age unique pour multicalculateur. Ph.D. Thesis, tectures, 39–70. Technical Report 308, Universite´ de Franche- SCHILL AND OCK ++ comte.´ ,A. M , M. 1993. DC : Distributed Object Oriented System Support on top of OSF PIKE, R., PRESOTTO, D., THOMPSON, K., AND TRICKEY,H. DCE. Distributed Systems Engineering 1,2, 1990. . Proceedings of the 112–125. UKUUG Summer 1990 Conference, 1–9. SCHRIMPF, H. 1995. Migration of Processes, Files PLATFORM COMPUTING. 1996. LSF User’s and Admin- and Virtual Devices in the MDX Operating Sys- istrator’s Guides, Version 2.2, Platform Comput- tem. Operating Systems Review 29, 2, 70–81. ing Corporation. SHAMIR,E.AND UPFAL, E. 1987. A Probabilistic Ap- POPEK, G., WALKER,B.J.,CHOW, J., EDWARDS,D., proach to the Load Sharing Problem in Dis- KLINE, C., RUDISIN,G.,AND THIEL, G. 1981. Lo- tributed Systems. Journal of Parallel and Dis- cus: a Network-Transparent, High Reliability tributed Computing, 4, 5, 521–530. Distributed System. 
Proceedings of the 8th Sym- SHAPIRO, M. 1986. Structure and Encapsulation posium on Operating System Principles, 169– in Distributed Systems: The PROXY Princi- 177. ple. Proceedings of the 6th International Con- POPEK,G.AND WALKER, B. 1985. The Locus Dis- ference on Distributed Computing Systems, 198– tributed System Architecture. MIT Press. 204. POWELL,M.AND MILLER, B. 1983. Process Migra- SHAPIRO, M., DICKMAN,P.,AND PLAINFOSSE´, D. 1992. tion in DEMOS/MP. Proceedings of the 9th Sym- Robust, Distributed References and Acyclic posium on Operating Systems Principles, 110– Garbage Collection. Proceedings of the Sympo- 119. sium on Principles of Distributed Computing, PU, C., AUTREY, T., BLACK, A., CONSEL, C., COWAN, C., IN- 135–146. OUYE, J., KETHANA, L., WALPOLE,J.,AND ZHANG,K. SHAPIRO, M., GAUTRON,P.,AND MOSSERI, L. 1989. Per- 1995. Optimistic Incremental Specialization. sistence and Migration for C++ Objects. Pro- Proceedings of the 15th Symposium on Operat- ceedings of the ECOOP 1989-European Confer- ing Systems Principles, 314–324. ence on Object-Oriented Programming. QUISQUATER, J.-J. AND DESMEDT, Y. G. 1991. Chinese SHIVARATRI,N.G.AND KRUEGER, P. 1990. Two Adap- Lotto as an Exhaustive Code-Breaking Machine. tive Location Policies for Global Scheduling Al- IEEE Computer 24, 11, 14–22. gorithms. Proceedings of the 10th International


Conference on Distributed Computing Systems, STUMM, M. 1988. The Design and Implementation 502–509. of a Decentralized Scheduling Facility for a SHIVARATRI, N., KRUEGER,P.,AND SINGHAL, M. 1992. Workstation Cluster. Proceedings of the Sec- Load Distributing for Locally Distributed Sys- ond Conference on Computer Workstations, 12– tems. IEEE Computer, 33–44. 22. TM SHOHAM, Y. 1997. An Overview of Agent-oriented SUN MICROSYSTEMS. 1998. Jini Software Simpli- Programming. in J. M. Bradshaw, editor, Soft- fies Network Computing. http://www.sun.com/ ware Agents, 271–290. MIT Press. 980713/jini/feature.jhtml. SHOCH,J.AND HUPP, J. 1982. The Worm Programs— SVENSSON, A. 1990. History, an Intelligent Load Early Experience with Distributed Computing. Sharing Filter. Proceedings of the 10th Interna- Communications of the ACM 25, 3, 172–180. tional Conference on Distributed Computing Sys- tems, 546–553. SHUB, C. 1990. Native Code Process-Originated Mi- gration in a Heterogeneous Environment. Pro- SWANSON, M., STOLLER, L., CRITCHLOW,T.,AND KESSLER, ceedings of the 18th ACM Annual Computer Sci- R. 1993. The Design of the Schizophrenic ence Conference, 266–270. Workstation System. Proceedings of the third USENIX Mach Symposium, 291–306. SINGHAL,M.AND SHIVARATRI, N. G. 1994. Advanced Concepts in Operating Systems. McGraw-Hill. TANENBAUM, A. S., RENESSE,R.VAN,STAVEREN,H.VAN., SHARP,G.J.,MULLENDER,S.J.,JANSEN,A.J., SINHA, P., MAEKAWA, M., SHIMUZU, K., JIA, X., ASHIHARA, AND VAN ROSSUM, G. 1990. Experiences with the TSUNOMIYA, N., PARK, AND NAKANO, H. 1991. U Amoeba Distributed Operating System. Com- The Galaxy Distributed Operating System. munications of the ACM, 33, 12, 46–63. IEEE Computer 24, 8, 34–40. TANENBAUM, A. 1992. Modern Operating Systems. SKORDOS, P. 1995. Parallel Simulation of Subsonic Prentice Hall, Englewood Cliffs, New Jersey. Fluid Dynamics on a Cluster of Workstations. Proceedings of the Fourth IEEE International TARDO,J.AND VALENTE, L. 1996. Mobile Agent Symposium on High Performance Distributed Security and Telescript. Proceedings of Computing. COMPCON’96, 52–63. TEODOSIU, D. 2000. End-to-End Fault Containment SMITH, J. M. 1988. A Survey of Process Migration Mechanisms. Operating Systems Review 22,3, in Scalable Shared-Memory Multiprocessors. 28–40. Ph.D. Thesis, Technical Report, Stanford Univer- sity. SMITH,J.M.AND IOANNIDIS, J. 1989. Implement- ing Remote fork() with Checkpoint-Restart. THEIMER,M.H.AND HAYES, B. 1991. Heterogeneous IEEE Technical Committee on Operating Sys- Process Migration by Recompilation. Proceed- tems Newsletter 3, 1, 15–19. ings of the 11th International Conference on Dis- tributed Computer Systems, 18–25. SMITH,P.AND HUTCHINSON, N. 1998. Heterogeneous THEIMER,M.AND LANTZ, K. 1988. Finding Idle Ma- Process Migration: The Tui System. Software- chines in a Workstation-Based Distributed Sys- Practice and Experience 28, 6, 611–639. tem. IEEE Transactions on Software Engineer- SOH,J.AND THOMAS, V. 1987. Process Migration for ing, SE-15, 11, 1444–1458. Load Balancing in Distributed Systems. TEN- THEIMER, M., LANTZ, K., AND CHERITON, D. 1985. Pre- CON, 888–892. emptable Remote Execution Facilities for the V SQUILLANTE,M.S.AND NELSON, R. D. 1991. Analysis System. Proceedings of the 10th ACM Sympo- of Task Migration in Shared-Memory Multipro- sium on OS Principles, 2–12. cessor Scheduling. Proceedings of the ACM SIG- TRACEY, K. M. 1991. Processor Sharing for Coopera- METRICS Conference 19, 1, 143–155. tive Multi-task Applications. Ph.D. Thesis, Tech- STANKOVIC, J. A. 1984. 
Simulation of the three Adap- nical Report, Department of Electrical Engineer- tive Decentralized Controlled Job Scheduling al- ing, Notre Dame, Indiana. gorithms. Computer Networks, 199–217. TRITSCHER,S.AND BEMMERL, T. 1992. Seitenorien- STEENSGAARD,B.AND JUL, E. 1995. Object and Na- tierte Prozessmigration als Basis fuer Dynamis- tive Code Thread Mobility. Proceedings of the chen Lastausgleich. GI/ITG Pars Mitteilungen, 15th Symposium on Operating Systems Princi- no 9, 58–62. ples, 68–78. TSCHUDIN, C. 1997. The Messenger Environment STEKETEE, C., ZHU,W.,AND MOSELEY, P. 1994. Imple- M0 - A condensed description. In Mobile Object mentation of Process Migration in Amoeba. Pro- Systems: Towards the Programmable Internet, ceedings of the 14th International Conference on LNCS 1222, Springer Verlag, 149–156. Distributed Computer Systems, 194–203. VAN D IJK,G.J.W.AND VAN GILS, M. J. 1992. Efficient STONE, H. 1978. Critical Load Factors in Two- process migration in the EMPS multiprocessor Processor Distributed Systems. IEEE Transac- system. Proceedings 6th International Parallel tions on Software Engineering, SE-4, 3, 254–258. Processing Symposium, 58–66. STONE,H.S.AND BOKHARI, S. H. 1978. Control of Dis- VAN R ENESSE, R., BIRMAN,K.P.,AND MAFFEIS, S. 1996. tributed Processes. IEEE Computer 11,7,97– Horus: A Flexible Group Communication Sys- 106. tem. Communication of the ACM 39, 4, 76–85.


VASWANI,R.AND ZAHORJAN, J. 1991. The implications WIECEK, C. A. 1992. A Model and Prototype of VMS of Cache Affinity on Processor Scheduling for Using the Mach 3.0 Kernel. Proceedings of the Multiprogrammed Shared Memory Multiproces- USENIX Workshop on Micro-Kernels and Other sors. Proceedings of the Thirteenth Symposium Kernel Architectures, 187–204. on Operating Systems Principles, 26–40. WONG, R., WALSH,T.,AND PACIOREK, N. 1997. Con- VENKATESH,R.AND DATTATREYA, G. R. 1990. Adap- cordia: An Infrastructure for Collaborating tive Optimal Load Balancing of Loosely Cou- Mobile Agents. Proceedings of the First Interna- pled Processors with Arbitrary Service Time tional Workshop on Mobile Agents, LNCS 1219, Distributions. Proceedings of the 1990 Interna- Springer Verlag, 86–97. tional Conference on Parallel Processing, I, 22– XU,J.AND HWANG, K. 1990. Heuristic Methods for 25. Dynamic Load Balancing in a Message-Passing VIGNA, G. 1998. Mobile Agents Security, LNCS . Proceedings of the Supercom- 1419, Springer Verlag. puting’90, 888–897. VITEK, I., SERRANO, M., AND THANOS, D. 1997. Secu- ZAJCEW, R., ROY, P., BLACK, D., PEAK, C., GUEDES, rity and Communication in Mobile Object Sys- P., K EMP, B., LOVERSO, J., LEIBENSPERGER, M., tems. In Mobile Object Systems: Towards the BARNETT, M., RABII,F.,AND NETTERWALA, D. 1993. Programmable Internet, LNCS 1222, Springer An OSF/1 UNIX for Massively Parallel Multi- Verlag, 177–200. computers. Proceedings of the Winter USENIX WALKER, B., POPEK, G., ENGLISH, R., KLINE,C.,AND Conference, 449–468. THIEL, G. 1983. The LOCUS Distributed Oper- ZAYAS, E. 1987a. Attacking the Process Migration ating System. Proceedings of the 9th Symposium Bottleneck. Proceedings of the 11th Symposium on Operating Systems Principles 17, 5, 49–70. on Operating Systems Principles, 13–24. WALKER,B.J.AND MATHEWS, R. M. 1989. Process Mi- ZAYAS, E. 1987b. The Use of Copy-on-Reference in gration in AIX’s Transparent Computing Facil- a Process Migration System. Ph.D. Thesis, Tech- ity (TCF). IEEE Technical Committee on Oper- nical Report CMU-CS-87-121, Carnegie Mellon ating Systems Newsletter, 3, 1, (1) 5–7. University. WANG, Y.-T. AND MORRIS, R. J. T. 1985. Load Sharing ZHOU, D. 1987. A Trace-Driven Simulation Study of in Distributed Systems. IEEE Transactions on Dynamic Load Balancing. Ph.D. Thesis, Techni- Computers, C-34, 3, 204–217. cal Report UCB/CSD 87/305, CSD (EECS), Uni- WANG, C.-J., KRUEGER,P.,AND LIU, M. T. 1993. In- versity of California, Berkeley. telligent Job Selection for Distributed Schedul- ZHOU,S.AND FERRARI, D. 1987. An Experimen- ing. Proceedings of the 13th International Con- tal Study of Load Balancing Performance. Pro- ference on Distributed Computing Systems, 288– ceedings of the 7th IEEE International Confer- 295. ence on Distributed Computing Systems, 490– WELCH,B.B.AND OUSTERHOUT, J. K. 1988. Pseudo- 497. Devices: User-Level Extensions to the Sprite File ZHOU,S.AND FERRARI, D. 1988. A Trace-Driven Sim- System. Proceedings of the USENIX Summer ulation Study of Dynamic Load Balancing. IEEE Conference, 7–49. Transactions on Software Engineering 14,9, WELCH, B. 1990. Naming, State Management and 1327–1341. User-Level Extensions in the Sprite Distributed ZHOU, S., ZHENG, X., WANG,J.,AND DELISLE, P. 1994. File System. Ph.D. Thesis, Technical Report Utopia: A Load Sharing Facility for Large, UCB/CSD 90/567, CSD (EECS), University of Heterogeneous Distributed Computer Systems. California, Berkeley. Software-Practice and Experience. WHITE, J. 1997. 
Telescript Technology: An Intro- ZHU, W. 1992. The Development of an Environment duction to the Language. White Paper, General to Study Load Balancing Algorithms, Process Magic, Inc., Sunnyvale, CA. Appeared in Brad- migration and load data collection. Ph.D. The- shaw, J., Software Agents, AAAI/MIT Press. sis, Technical Report, University of New South WHITE, J. E., HELGESON,S.,AND STEEDMAN, D. A. 1997. Wales. System and Method for Distributed Computa- ZHU, W., STEKETEE,C.,AND MUILWIJK, B. 1995. tion Based upon the Movement, Execution, and Load Balancing and Workstation Autonomy on Interaction of Processes in a Network. United Amoeba. Australian Computer Science Commu- States Patent no. 5603031. nications (ACSC’95) 17, 1, 588–597.

Received October 1996; revised December 1998 and July 1999; accepted August 1999
