A Decentralized and Cooperative Workflow Scheduling Algorithm
Eighth IEEE International Symposium on Cluster Computing and the Grid

Rajiv Ranjan, Mustafizur Rahman, and Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Laboratory
Department of Computer Science and Software Engineering
The University of Melbourne, Australia
{rranjan, mmrahman, raj}@csse.unimelb.edu.au

Abstract: In the current approaches to workflow scheduling, there is no cooperation between the distributed workflow brokers and, as a result, the problem of conflicting schedules occurs. To overcome this problem, in this paper we propose a decentralized and cooperative workflow scheduling algorithm. The proposed approach utilizes a Peer-to-Peer (P2P) coordination space to coordinate the application schedules among the Grid-wide distributed workflow brokers. The proposed algorithm is completely decentralized in the sense that there is no central point of contact in the system; responsibility for key functionalities such as resource discovery and scheduling coordination is delegated to the P2P coordination space. With the implementation of our approach, not only are the performance bottlenecks likely to be eliminated, but efficient scheduling with enhanced scalability and better autonomy for the users is also likely to be achieved. We demonstrate the feasibility of our approach through an extensive trace-driven simulation study.

I. INTRODUCTION

Fig. 1. Existing workflow scheduling approach. (Figure elements: two workflows with tasks T1 to T4; Workflow Brokers 1, 2, ..., m, shown with identical task-to-resource tables T1:R1, T2:R1, T3:R2, T4:R3; a centralised GIS with Query, Reply and Update links; Resources 1, 2, ..., n.)
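The conflict that Fig. 1 depicts can be reproduced in a few lines. What follows is a minimal sketch under stated assumptions, not the brokers' actual scheduling logic: the slot-based capacity table, the greedy "most free slots" heuristic, and the names `GIS_FREE_SLOTS`, `query_gis`, and `greedy_schedule` are all hypothetical and chosen only for illustration. Two uncoordinated brokers read the same snapshot from a central information service and independently compute the same mapping.

```python
from collections import Counter

# Hypothetical GIS table: free execution slots per resource, as last published.
GIS_FREE_SLOTS = {"R1": 2, "R2": 1, "R3": 1}
TASKS = ["T1", "T2", "T3", "T4"]


def query_gis(gis):
    """Return a private copy of the centrally published resource table."""
    return dict(gis)


def greedy_schedule(tasks, snapshot):
    """Assign each task to the resource with the most free slots in the
    broker's own snapshot (ties broken by resource name)."""
    free = dict(snapshot)
    mapping = {}
    for task in tasks:
        best = max(sorted(free), key=free.get)
        mapping[task] = best
        free[best] -= 1  # only the local copy changes; the GIS is not updated
    return mapping


# Brokers 1 and 2 query the GIS at the same time, so both see the same table.
schedule_1 = greedy_schedule(TASKS, query_gis(GIS_FREE_SLOTS))
schedule_2 = greedy_schedule(TASKS, query_gis(GIS_FREE_SLOTS))

# Combined demand versus what the resources can actually serve.
demand = Counter(schedule_1.values()) + Counter(schedule_2.values())
oversubscribed = {
    name: demand[name] - slots
    for name, slots in GIS_FREE_SLOTS.items()
    if demand[name] > slots
}

print(schedule_1 == schedule_2)
print(oversubscribed)
```

With these assumed numbers, both brokers arrive at the same table that appears in Fig. 1 (T1 and T2 on R1, T3 on R2, T4 on R3), and every resource receives twice the load it advertised. Neither broker made a locally wrong decision; the conflict arises only because the two decisions were never coordinated.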
Workflow scheduling is the process of finding an efficient mapping of the tasks in a workflow to suitable resources so that execution completes while satisfying objective functions, such as execution time minimization, specified by Grid users. The efficiency of the workflow scheduling algorithm therefore directly affects the system in terms of delivered Quality of Service (QoS), utilization, and overall performance.

In the current approaches to workflow scheduling, there is no cooperation between the distributed workflow brokers and, as a result, conflicting schedules can occur. For example, consider a Grid environment (as shown in Fig. 1) that consists of n resources and m workflow brokers. Users submit their scientific applications to the workflow brokers. These brokers generate schedules based on the resource information obtained from the Grid Information Services (GIS). A schedule is effectively a mapping of the set of tasks in the workflow to the set of available resources. However, if workflow broker 1 and workflow broker 2 query the GIS at the same time, they receive similar information about the resource availability pattern. Based on this information, workflow broker 1 and workflow broker 2 will generate the same mapping for the tasks in their locally submitted workflows, which leads to conflicting schedules. Hence, both the system and the applications suffer from degraded performance.

Another major drawback of current workflow scheduling is that existing workflow brokers rely on centralized (refer to Fig. 1) or semi-centralized hierarchical resource information services such as MDS-2,3,4 [3]. Studies have shown [19] that the existing centralized model for information services does not scale well as the number of users, brokers and providers in the system increases. Moreover, if the centralized links leading to these services fail, no broker in the system can undertake scheduling-related activities, owing to the lack of up-to-date resource information. In addition, considering the sheer dynamism of the Grid computing environment, any scheduling decision that is based on static resource information would certainly be sub-optimal.

Further, Grids [4] are heterogeneous and dynamic environments consisting of computing, storage and network resources with different capabilities and availability. In [10], it is shown that dynamic scheduling algorithms based on heuristics adapt to the changing resource conditions of Grids by performing just-in-time scheduling (generating schedules that map the tasks dynamically). However, the workflow brokers running these dynamic algorithms still need to be coordinated in order to avoid conflicts and to generate schedules globally.

To overcome the limitations of existing approaches, we propose a fully decentralized and cooperative workflow scheduling algorithm based on a Peer-to-Peer (P2P) coordination space. A P2P coordination space [8], [11] provides a global virtual shared space that can be concurrently and associatively accessed by all participants in the system, and access is independent of the actual physical or topological proximity of the objects or hosts. New-generation routing algorithms, more commonly known as Distributed Hash Tables (DHTs) [15], [13], form the basis for organizing the P2P coordination space.

In the proposed approach, workflow brokers post their resource demands by injecting a Resource Claim object into the DHT-based decentralized coordination space, while resource providers update the resource information by injecting a Resource Ticket object. These objects are mapped to the DHT-based coordination space using a spatial hashing technique [16]. Once a resource ticket matches one or more resource claims, the coordination space sends notification messages to the resource claimers in such a way that the resource that issued the ticket is not overloaded. Thus, this mechanism prevents the workflow brokers from overloading the same resource.

The main contributions of this work are: (i) a novel decentralized and cooperative workflow scheduling algorithm and (ii) an extensive simulation-based study to demonstrate the effectiveness of the proposed approach.

The rest of this paper is organized as follows. In the next section, we describe related work on decentralized workflow scheduling. Section 3 provides a brief description of the system models used in our scheduling approach. In Section 4, we present the details of our workflow scheduling algorithm. The simulation model, experimental setup, and findings of the experiments are discussed in Section 5. Finally, we conclude the paper with directions for future work.

978-0-7695-3156-4/08 $25.00 © 2008 IEEE. DOI 10.1109/CCGRID.2008.94

II. RELATED WORK

The main focus of this section is to compare the novelty of the proposed work with existing decentralized scientific workflow scheduling strategies. In [18], a workflow enactment engine with a just-in-time scheduling system using tuple space is proposed to manage the execution of scientific workflow applications. In this system, every task has its own scheduler, called a Task Manager (TM), which implements a scheduling algorithm and handles the processing of the task. The TMs are controlled by a Workflow Coordinator (WC). In addition, an event-driven mechanism with subscription-notification methods supported by the tuple space model is used to control and manage scheduling activities. In this work, although the task managers operate in a distributed fashion, they communicate with each other through a tuple space that is implemented using a centralized, client-server technology. Further, the WC in this system [...]

[...] perform the Decentralized Dynamic Workflow Scheduling using Reinforcement Learning (DDWS-RL) algorithm. DDWS-RL enabled TMs query information from the RLA and make decisions on resource selection at the time of task execution. Thus, the RLA is used in the tuple space to make the scheduling algorithm more efficient. However, this approach also involves the same architecture as the other approach stated above, which is based on a centralized client-server model with respect to resource discovery and coordination space management. In contrast, in our approach there is no central component that can prove to be a bottleneck.

Fig. 2. Proposed workflow scheduling approach. (Figure elements: Resource sites 1 to 4, each with a user, a GFA and a resource; a P2P Tuple Space with an index cell reached through a spatial hash; labelled operations publish (ticket), subscribe (claim), match(), notify() and submit (job).)

III. SYSTEM MODELS

A. Grid Model

The proposed workflow scheduling algorithm utilizes the Grid-Federation [12] model for resource organization and Grid networking. Grid-Federation aggregates distributed resource brokering and allocation services as part of a cooperative resource sharing environment. The Grid-Federation, GF = {R_1, R_2, ..., R_n}, consists of n sites, with each site contributing its resource to the federation. Every site in the federation has its own resource description R_i, which contains the definition of the resource that it is willing to contribute. R_i can include information about the CPU architecture, number of processors, memory size, secondary storage size, operating system type, etc. In this work, R_i = (p_i, x_i, µ_i, ø_i), which includes the number of processors p_i, the processor architecture x_i, their speed µ_i, and the installed operating system type ø_i.

Resource brokering, indexing and allocation in Grid-Federation are facilitated by a Resource Management Sys-