Mechanisms for Just-in-Time Allocation of Resources to Adaptive Parallel Programs
Arash Baratloo Ayal Itzkovitz Zvi M. Kedem Yuanyuan Zhao
baratloo,ayali,kedem,yuanyuan¡ @cs.nyu.edu Department of Computer Science Courant Institute of Mathematical Sciences New York University
Abstract quired at any single instance during their execution. Even in computing environments of adaptive jobs Adaptive parallel computations—computations that with excessive resources, significant load-unbalancing can adapt to changes in resource availability and could still occur, resulting in slow turnaround and non- requirement—can effectively use networked machines be- responsiveness. This indicates that intelligent allocation cause they dynamically expand as machines become avail- of resources is necessary, especially one that could full- able and dynamically acquire machines as needed. While fil the potential imposed by the existence of adaptive pro- most parallel programming systems provide the means to grams. We call the entity performing this functionality a develop adaptive programs, they do not provide any func- resource manager. Ideally, the service provided by a re- tional interface to external resource management systems. source manager should be available to any compiled exe- Thus, no existing resource management system has the ca- cutable without requiring explicit programming. For effi- pability to manage resources on commodity system soft- cient utilization of resources, machines should be allocated ware, arbitrating the demands of multiple adaptive compu- to computations only when requested and not pre-allocated tations written using diverse programming environments. and reserved. We refer to this ability as just-in-time alloca- This paper presents a set of novel mechanisms that facil- tion of resources. Furthermore, a resource manager should itate dynamic allocation of resources to adaptive parallel be able to manage programs developed using different pro- computations. The mechanisms are built on low-level fea- gramming systems. However, as discussed later, none of tures common to many programming systems, and unique the existing systems achieves this, which severely limits in their ability to transparently manage multiple adaptive their applicability. parallel programs that were not developed to have their Parallel programming systems like PVM, MPI, Ca- resources managed by external systems. We also describe lypso, and Linda have a built-in resource manager with the design and the implementation of the initial prototype limited functionality. It is common for this type of re- of ResourceBroker, a resource management system built to source manager to (1) schedule parallel tasks among the validate these mechanisms. (already) allocated machines, and to (2) assist in adding explicitly-named machines to a computation. This type of resource manager is refered to as an intra-job resource 1. Introduction manager, since its primary responsibility is to manage re- sources within one job. To differentiate from the resource Adaptive programs are those that can adapt to exter- manager built into parallel programming systems, inter-job nal changes in resource availability and internal changes in resource manager is used to refer to the system that man- resource requirements. Most master-slave PVM [9] pro- ages all the machines among all the executing jobs. The grams, self-scheduling MPI [10] programs, bag-of-tasks concept of separating these two types of managers was first Linda [3] programs, and all Calypso [1] programs are adap- discussed in the work in Benevolent Bandit Laboratory [7]. tive. For PVM, MPI, and Linda, programs must be written Dynamic allocation and reallocation of machines to jobs so that they are able to tolerate machine removals; whereas requires communication between inter-job manager and for Calypso, this service is provided by the runtime layer. multiple intra-job managers. The lack of a common inter- Adaptive computations can efficiently utilize networked face introduces a challenge to inter-job resource manager machines for two reasons. First, they can expand to ex- developers: how can their software system communicate ecute on machines that, otherwise, would be left unused. with a variety of parallel systems, especially with systems Second, they can acquire machines as their requirements that do not provide an interface? This is the reason exist- grow, rather than reserving the largest pool of machines re- ing resource managers for adaptive programs restrict them- £ resource manager resource manager resource management layer
agent agent scheduling
dynamic resource ¢
¢ agent layer service sub- sub- sub- sub- sub- availability agent agent agent agent agent runtime request applications for resources resource
layer requirements
¤ calypso computation pvm computation ¦ input/ parallel job ¥ output Figure 1. Components of ResourceBroker. user Figure 2. The three entities in job execution. selves to supporting only a single programming system. ResourceBroker is the first resource manager to use low- the resource-management layer and the agent layer. Fig- level features common to popular parallel programming ure 1 depicts the software layers during the execution of systems for its communication, hence, it is able to manage two parallel jobs. unmodified PVM, MPI, Calypso, and PLinda programs. The resource management layer consists of a single network-wide resource manager process and a single dae- 2. Design Goals mon process on each machine. It is possible to run the resource manager process with non-privilege user access rights, as opposed to administrator access rights. The re- The goal of ResourceBroker is to provide a comprehen- source manager process spawns the daemon processes at sive resource management service that is able to dynami- startup and restarts them if they fail. Daemons are respon- cally allocate machines among multiple competing compu- sible for monitoring resources such as the CPU status, the tations written in different parallel programming systems. users who are logged on, the number of running jobs, and The following is the key objectives met in this system: the keyboard- and the mouse-status on the machine. This 1. Parallel programming systems are treated as com- information is periodically reported to the resource man- modity, so that the executables built for environments with ager process. The resource manager process is responsible no global resource management service are not expected to for deciding which job can use which machine. be modified to take advantage of the service; 2. Operating systems are treated as commodity, and the The agent layer consists of dynamically changing sets only privilege required is user level, thus ResourceBroker of processes. A user who wants to use ResourceBroker’s doesn’t comprise the security of the network; services first starts an agent process to submit the job for 3. The use of the resource manager is optional, and it is execution. As the job extends to remote machines, sub- sometimes referred to as a “service” to emphasize its un- agent process are automatically started to monitor the new obtrusive availability; processes. This is illustrated in Figure 1. The combination 4. The service can dynamically reallocate resources of agent and subagent processes forms the agent layer. The agent layer provides the means for the resource manager among adaptive jobs as resource availability changes; to monitor and actively intervene in the execution of adap- 5. Mechanism and policy are separated, making the later tive jobs. In a broader sense, it acts as a broker between an easily plugin module. the resource management layer and the running jobs, and Compared with mechanism, policy is relatively dy- is able to coerce programs to achieve the allocation policy namic. Different user requires different policy, and same determined by the resource management layer. policy evolves over time, it’s desirable to make the inte- gration of new policy into the system as easily as possible. Many existing resource managers [12, 16, 14, 11] em- When the system was first implemented, the following has ploy a single-level architecture, where the monitoring dae- been used. User jobs are divided into two classes: adap- mon processes also carry out the responsibilities of agent tive, and others. Similarly, machines are divided into two and subagent processes. The purpose behind a two-level classes: private and public. Private machines belong to in- architecture is to allow ResourceBroker to run with user- dividuals and the owner has absolute priority over its use, level privileges only. Although resource-management layer where as public machines are available to all users and typi- processes run with user privilege, it is able to manage other cally reside in a public laboratory. The policy implemented user’s jobs. These mechanisms are described in Section 5. by ResourceBroker is to allocate private machines only to adaptive jobs. Hence, adaptive jobs running on a privately 4. User, Job, and ResourceBroker Interactions owned machine can be deallocated once the owner of the machine returns. In other cases, ResourceBroker tries to In the presence of a resource manager, there are three evenly partition machines among jobs. entities in every job invocation: (1) the resource manager, (2) the user, and (3) the job submitted for execution. Fig- 3. Architecture ure 2 depicts them and their interactions. The interaction between a user and the resource manager and the interac- ResourceBroker consists of two weakly coupled layers: tion between a running job and the resource manager are
2
of particular interest and are briefly discussed next, more §
details is available at [2]. #! § bin bash §
echo add $1 ¨ $HOME .pvmrc
¨ §
echo quit ¨ $HOME .pvmrc
§ §
4.1. Users' Interaction with ResourceBroker pvm ¨ dev null
rm $HOME § .pvmrc Users communicate with ResourceBroker to query machine availability, to learn the status of queued jobs, Figure 3. Module to grow PVM. to submit a job for execution and specify its resource
requirements. ResourceBroker adopted the Resource rsh daemon
©app 3 Specification Language of Globus [8], and extended it 4
1
adaptive client to support adaptive programs. Specifically, , 2 start script module © app , and parameters were added to rsh node01