Embarrassingly Parallel Jobs Are Not Embarrassingly Easy to Schedule on the Grid

Enis Afgan, Purushotham Bangalore
Department of Computer and Information Sciences, University of Alabama at Birmingham
1300 University Blvd., Birmingham AL 35294-1170
{afgane, puri}@cis.uab.edu

Abstract

Embarrassingly parallel applications represent an important workload in today's grid environments. Scheduling and execution of this class of applications is considered a mostly trivial and well-understood process on homogeneous clusters. However, while grid environments provide the necessary computational resources, the associated resource heterogeneity represents a new challenge for efficient task execution for these types of applications across multiple resources. This paper presents a set of examples illustrating how the execution characteristics of individual tasks, and consequently of a job, are affected by the choice of task execution resources, task invocation parameters, and task input data attributes. It is the aim of this work to highlight this relationship between an application and an execution resource in order to promote the development of better metascheduling techniques for the grid. By exploiting this relationship, application throughput can be maximized, also resulting in higher resource utilization. In order to achieve such benefits, a set of job scheduling and execution concerns is derived, leading toward a computational framework for scheduling embarrassingly parallel applications in grid environments.

1. Introduction

The embarrassingly parallel (EP) class of applications is likely to represent the most widely deployed class of applications in the world [1]. EP applications, in nature very similar to SPMD (Single Process, Multiple Data) or parameter sweep applications, are characterized by independent, coarsely grained, and indivisible tasks. The goal of EP applications is to introduce parallelism into application execution without any application code modification and its associated cost. This is realized through multiple invocations of the same application, where each instance is invoked using a different input data set. The number of tasks being instantiated can range greatly, from only a few instances to several hundred, and each instance can execute from several seconds or minutes to many hours. The end result is a speedup of the application's execution that is limited only by the number of resources available.

Meanwhile, grid computing has emerged as the upcoming distributed computational infrastructure that enables virtualization and aggregation of geographically distributed compute resources into a seamless compute platform [2]. Characterized by heterogeneous resources with a wide range of network speeds between participating sites, high network latency, site and resource instability, and a sea of local management policies, the grid seems to present as many problems as it solves. Nonetheless, from the early days of the grid, the EP class of applications has been categorized as the killer application for the grid [3]. This is due to the ability of EP applications to consume large numbers of resources accompanied with minimal requirements and dependencies on network status, as well as their high tolerance for partial failure. Furthermore, these applications are relatively easy to deploy and to start executing.

This suitability has materialized through several projects that have explored, exploited, and advanced the capabilities of EP applications in grid environments, for example AppLeS [4] and Nimrod/G [5]. However, because of the lack of communication between individual tasks, a common assumption is that execution of EP applications in grid environments is easy, or at least significantly easier than execution of tightly coupled MPI applications. On the contrary, execution of EP applications in grid environments is confined by resource availability, optimized through task parameterization, hindered by the simultaneous use and management of heterogeneous resources belonging to different administrative domains, and dependent on user requirements. As such, the act of application execution includes application scheduling, or the assignment of tasks to appropriate resources. Because of the sheer number of influencing components, scheduling becomes a major component

978-1-4244-2872-4/08/$25.00 ©2008 IEEE

for the success of EP applications in the grid and should be handled comprehensively with respect to the application execution environment variables.

As the grid evolves into independently managed clouds relying heavily on commercial aspects, resource owners will try to maximize resource utilization while satisfying users' requirements. Simultaneously, users will impose stringent requirements on their application executions in terms of cost, execution time, reliability, or data cost. In order to achieve these objectives, an understanding of the components influencing application execution is necessary. In this paper, we offer a discussion of execution variables for EP applications in grid environments and their effects on applications' runtimes. We have used the NCBI BLAST [6] application as an example to highlight many of these issues and to illustrate the need for advanced resource- and application-specific metascheduling techniques.

Figure 1. Sample task runtimes of a job across heterogeneous resources (task number vs. time taken, in seconds).

2. Application Model

As briefly discussed, the embarrassingly parallel (EP) class of applications is composed of applications whose computational load consists of executing the same application multiple times, with each instance operating on a varying input data set and perhaps varying parameters [7]. The most significant characteristic of EP applications is that, once started, there is no communication between individual tasks. This model can be seen as a modification of Flynn's original SIMD (Single Instruction, Multiple Data) taxonomy and of the more general SPMD (Single Process, Multiple Data) model, where individual processes can execute independent instructions at the same point in time but are, overall, executing the same code [7]. As such, an application execution with all of its instances is referred to as a job. A job is composed of a set of tasks or instances, where each instance is assigned a different input data set. More formally, a job J composed of a set of tasks ti can be represented as follows:

J = {t1, t2, ..., tn} (1)

In this section we present some of the key factors that affect metascheduling of the EP class of applications in a grid, along with potential solutions to address some of these issues.

2.1. Task heterogeneity

In more traditional cluster environments, where individual compute nodes are homogeneous, the execution times of individual tasks within a job can be expected to be largely homogeneous. In grid environments, where individual resources are highly heterogeneous, task execution times can vary greatly [8]. Figure 1 shows an example of task runtime variability resulting from resource heterogeneity. The tasks shown were all fed equally-sized input files and executed across a range of heterogeneous resources, resulting in a significant level of load imbalance. Tasks that experienced approximately equivalent runtimes were assigned to multiple nodes within a single cluster by the high-level scheduler. In the described environment, the runtime of the entire job is determined by the longest running task, making efficient load balancing a strong requirement.

2.2. Task parameterization

In order to better understand the factors influencing the runtime of a task, a task ti can be further decomposed and represented as follows:

ti = (di, ri, pi) (2)

where di represents the task input data, ri represents the execution resource for task ti, and pi represents the parameter set used when invoking the given task.

Individually, the input data set is the single most influential factor determining the runtime of a task. However, because the input file or input parameters are the purpose of the computation, the job is constrained by the characteristics of the input. Nonetheless, at the task level, existing file characteristics may be exploited to better meet resource capabilities. For example, Figure 2 shows the runtimes of two different NCBI BLAST [6] jobs, where a large number of only short search queries were fed to the program, as opposed to a small number of long queries. Both files were of approximately the same physical size. The figure shows a difference in execution speed of several orders of magnitude. Such observations are application specific, but they offer a new level of application-specific scheduling that can be exploited in grid environments. For example, in the case of BLAST, the variation in task execution time between long and short queries is caused by the algorithm's requirement to keep track of various segments of the search query. For the long queries, this process consumes significantly more time than for the short queries. As such, this characteristic can be exploited to divide the input data into sets of only long and only short search queries. Then, long queries can be submitted to fast resources while short queries can be targeted at the slower, cheaper, and less loaded resources, achieving overall job load balance.

Figure 2. Task runtime affected by input file properties (many short queries vs. few large queries; time in seconds).

Because of grid resource heterogeneity, the factor offering the most flexibility regarding task runtime is the resource selected for task execution. Figure 3 shows sample runtimes for a set of jobs on heterogeneous resources. From the figure, it is obvious that even small variations in the underlying resources can cause a noticeable change in task, and therefore job, runtime.

Lastly, the parameter set used when invoking a task can have a significant impact on resource utilization and task runtime (see Figure 4). For example, a sample parameter that the user may have control over, but that does not influence the results of the computation, is the number of threads employed within a process. Other such parameters may be application dependent and can include transitional probabilities in Hidden Markov Model (HMM) applications or step size in Monte Carlo simulations. Figure 4 shows the runtime influence of using a varying number of threads to complete the same task. For the shown application (NCBI BLAST), such functionality is supported directly within the application and requires only an additional parameter when invoking the task. What can also be noticed from the figure is that the presented approach has its limits and, more so, that those limits are resource dependent. As can be seen, the speedup of employing two threads as opposed to one is nearly linear for all of the resources, but, when using four threads as opposed to two, only the dual Opteron resource shows significant speedup. Furthermore, when invoking the task with eight threads, all the machines seem to have reached their scalability maximum with respect to this parameter. This limit is caused by the total number of processing cores available on the underlying resources, i.e., the mentioned parameter scales well until the number of threads is equivalent to the number of processing cores available on the compute node. As an additional note on application-resource dependency, if compute nodes with multiple CPUs are considered, one may be tempted to start multiple BLAST processes so that the number of processes corresponds to the number of CPUs on the node. As shown in Figure 5 though, such parameterization results in an approximately 7% slowdown when compared to the single-process, multiple-threads parameterization. These observations apply to NCBI BLAST in particular and have been derived through analysis of the application's execution characteristics. Nonetheless, similar observations can be derived for other applications.
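The input-data-aware scheduling described in Section 2.2, splitting a BLAST-style input into long and short queries and steering each set to an appropriate class of resources, can be sketched in a few lines. This is an illustrative sketch only: the length threshold, the resource names, and the relative speeds are invented for the example and are not taken from the paper.

```python
# Illustrative sketch of input-data-aware task placement: long queries go to
# the faster resources, short queries to the slower ones. The threshold and
# resource speeds are assumptions made up for this example.

def partition_queries(queries, length_threshold=200):
    """Split queries into (long, short) lists by sequence length."""
    long_q = [q for q in queries if len(q) >= length_threshold]
    short_q = [q for q in queries if len(q) < length_threshold]
    return long_q, short_q

def assign_by_speed(long_q, short_q, resources):
    """Map long queries to the faster half of `resources` and short queries
    to the rest; `resources` is a list of (name, relative_speed) pairs."""
    ranked = sorted(resources, key=lambda r: r[1], reverse=True)
    half = max(1, len(ranked) // 2)
    # With a single resource, both sets fall back to that resource.
    fast, slow = ranked[:half], ranked[half:] or ranked[:half]
    return {
        "fast": {"resources": [r[0] for r in fast], "queries": long_q},
        "slow": {"resources": [r[0] for r in slow], "queries": short_q},
    }
```

With two resources of relative speeds 3.0 and 1.0, all long queries land on the faster one, mirroring the load-balance argument above.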
With the provided examples in mind, the pi factor may need to be derived individually for each application and can represent a complex structure corresponding closely to the application parameters. In general, because of the effects all of the components have on the task computation time, a variable ei can be introduced to represent the expected computation time of a task ti:

ei = f(di, ri, pi) (3)

The aim of the variable ei is not necessarily an absolute and definite representation of task runtime, as pursued by work dealing with job runtime models and job runtime estimations. Rather, it provides a relative reference among the individual task options that exist when submitting a job, and it provides insight into the components influencing task execution time. Finally, the overall job execution time can be defined as max(ei) for all i.

Figure 3. Execution times of a set of jobs across heterogeneous architectures showing runtime influence.

Figure 4. Task runtime dependencies on invocation parameters across architectures (1, 2, 4, and 8 threads on Pentium D 3.0 GHz, Core 2 Duo 2.33 GHz, and dual Opteron 265 1.8 GHz machines; runtimes in seconds).

2.3. Job parameterization

Following, the aim of a job submission effort and of an accompanying scheduler dealing with EP jobs is to consider the available options and the factors influencing task execution with the goal of minimizing load imbalance between individual tasks ti. By minimizing load imbalance among the resources selected for execution, the overall job runtime is minimized. The presented formulation leads to a definition of job parameterization, which can be defined as an understanding and selection of job and task parameters that are algorithm, input data, and resource dependent.

Because of the broad encompassment of the EP class of applications and the versatility of the model describing them, other classes of applications can be encapsulated within the EP class. This derivation can be used to make a case for the difficulty involved in scheduling and coordinating execution of the EP class of applications. Obviously, a sequential application can be represented by a single task that would also define the entire job. Similarly, a tightly coupled MPI application can be encapsulated within a single task. The value for the parameter factor pi from Eq. 2 would include the needed parallelism information. Because of the general inability of MPI applications to cross individual cluster boundaries, there has currently not been such a need for multiple-task coordination. This greatly simplifies the scheduling and job submission process because load balance at the job level does not have to be a concern. Master-worker and all-worker classes of applications can be seen as equivalent to the EP class, where the master process would be handling the resource selection and job parameterization rather than a higher-level scheduler.

3. Scheduling Considerations

Execution of EP jobs in grid environments enables a large reduction in overall job execution time, an increase in resource utilization, and a reduction in resource operating cost. This job execution, though, is perplexed with application and resource dependencies that, if not understood, can easily invalidate many of the available benefits.
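The task model above (Eqs. 1-3) can be made concrete with a short sketch: a job is a set of tasks ti = (di, ri, pi), each task has an expected runtime ei, and the job finishes with its slowest task. The runtime formula below is a toy stand-in invented for illustration; in the paper's terms, ei would come from application-resource benchmarks.

```python
# Sketch of the job model: t_i = (d_i, r_i, p_i), e_i = f(d_i, r_i, p_i), and
# job runtime = max over e_i. The estimator is a toy model, not the paper's.
from dataclasses import dataclass

@dataclass
class Task:
    data_size: float       # d_i: input data attribute (e.g., MB of queries)
    resource_speed: float  # r_i: benchmarked relative speed of the resource
    threads: int           # p_i: an invocation parameter (thread count)

    def expected_runtime(self) -> float:
        """e_i as a simple function of d_i, r_i, p_i (assumed linear)."""
        return self.data_size / (self.resource_speed * self.threads)

def job_runtime(tasks):
    """Overall job runtime: determined by the longest-running task."""
    return max(t.expected_runtime() for t in tasks)

def load_imbalance(tasks):
    """Spread between slowest and fastest task; the quantity to minimize."""
    times = [t.expected_runtime() for t in tasks]
    return max(times) - min(times)
```

Two equally sized tasks placed on resources of speed 1.0 and 2.0 immediately show the load imbalance that the scheduling considerations below aim to remove.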
Scheduling of grid jobs, and EP jobs in particular, is thus the most relevant activity that can realize these benefits. The first step toward efficient job execution is an understanding of the existing application-resource relationship. Once understood, these relationships can be leveraged to deliver the sought goals. Such an understanding of an application's relationship to the numerous grid resources may seem a far-fetched goal requiring disproportional effort for the benefit received. If decomposed, this process can be classified into several main categories that lend themselves as a set of guidelines for deepening the understanding of the application-resource relationship. Furthermore, depending on the desired results, the level of understanding of the existing relationship can range from rudimentary, where merely a small number of the most obvious application characteristics are recognized, all the way to a well-understood and well-developed mathematical model of the application's behavior that can later be mapped to individual resources.

In the context of EP applications, efficient job execution can be warranted by minimization of the load imbalance across instantiated tasks. This load imbalance is minimized through coordination of the employed parameters, in the context of Eq. 2 above, across the job spectrum. Eq. 2 alone provides a baseline for understanding the individual components that affect a task's runtime. As stated earlier in Section 2, advancing the understanding of the effects each individual component has on an application's execution is the first step in achieving efficient task execution. The next step is the derivation and coordination of the multiple tasks that make up the current job. This is also the most demanding step. With the ultimate goal of achieving adequate load balance across tasks, the derivation of the tasks must be done simultaneously. Thus, this problem can easily be described as a multi-constraint, multi-objective optimization problem. Fortunately, because the grid is such a heterogeneous and dynamic system, a perfect solution to the problem is rarely necessary and, even if attained, questionably realizable. An approximate solution that recognizes the stated attributes and employs heuristics and best-effort practices will often suffice. As such, the following is a set of concerns and suggestions that should be kept in mind during scheduling and execution of the EP class of applications:

• Job-level coordination – it is beneficial to consider a job in its entirety prior to its execution, rather than, for example, simply submitting jobs on a first come, first served basis. A global overview and comprehension of the job with respect to input file size, input file format and characteristics, data locality, available resources, and user requirements is desirable. Through acknowledgement of these characteristics, the scheduling objectives, and thus the scheduling algorithm, can be adjusted more adequately. By considering a job as a unit, a global plan can be developed a priori, eliminating many uncertainties as job execution gets under way (e.g., job budget, input file distribution, data accuracy, expected task failure rate). At the same time, this job plan cannot be considered definitive, due to the core characteristics of the grid (e.g., unreliability, dynamic state), requiring provisions to be made that enable in-execution plan adjustments. An effective way to cope with the uncertainty is to adopt dynamic task distribution as opposed to static distribution. Dynamic task distribution offers higher tolerance for individual task failure as well as for dynamic resource availability. The major concern regarding dynamic task distribution is job granularity. Due to the overhead associated with task instantiation, the balance between the number of tasks created and an individual task's computation time needs to be adequately tuned.

• Task parameterization – once created, each task can be seen as an individual job. This is especially true from the resource perspective. Depending on the characteristics of the execution resource, the parameterization of a task should be independently handled and optimized for the resource at hand. Based on the amount of data available for the current application and resource, as well as the level of understanding of an application performance model, this process can require a greatly variable amount of effort. The effects of task parameterization can turn a poorly behaving resource into a competitive one. An example is shown in Figure 6, where a properly parameterized task (Parameter set 2) executing on an old Sun Sparc machine is comparable in performance to an improperly parameterized task (Parameter set 1) on a much newer Intel Xeon based resource.

• Load balancing – although each task of an EP application executes independently, maximization of job performance is achieved through minimization of load imbalance across tasks. At the job level, aiming at achieving load balance results in task dependencies, making efficient scheduling of this class of applications significantly more difficult. This is further complicated by the suggested task-level parameterization, where significant variability among task execution rates can be observed. An approach to managing this issue can be seen in job plan management that is aware of individual task-level optimizations and can thus guide the computation toward the set goals. Other possible directions for achieving load balance in grid applications include the adaptation of techniques used in parallel and distributed environments (e.g., [9, 10, 11]).

Figure 5. Effects of task parameterization - invoking a task using a single process per node with multiple threads as opposed to a single process per CPU (times in seconds).

Figure 6. Significance of task-level parameterization - an improperly parameterized task can result in a resource behaving very poorly (Intel Xeon, 2.66 GHz, 32 bit, 512 Kb L1, vs. Sun Sparc E450, 400 MHz, 4 GB RAM).
• Task failure – superficially, execution of the EP class of applications, with no task communication, seems like a perfect platform for easy handling of partial task failures. However, under the goals discussed throughout this paper, task failure introduces a significant issue. The aim of developing a job execution plan is to provide additional insight into job execution and to try to satisfy user objectives. Once a task fails, the job plan is invalidated and, depending on the failure situation, the failure could completely disrupt the outlook of successful job completion. As a general observation, tasks executing on grid resources have the highest failure rate during their initialization and startup process. This can be due to resource misconfigurations, task misconfiguration, policies or events local to a resource, as well as any of a host of other variables. As such, a much higher probability of success can be assigned to a task that has reported as being under execution. Alternatively, multiple submissions of a task can be started, albeit resulting in wastage of resources. If the balance between computation time and number of tasks is well tuned, using dynamic task distribution can offer the highest tolerance for task failure [12]. However, other issues, such as consistent resource availability, may then pose alternate problems. While there is no safeguard against task failure, observations and models such as these can be incorporated into the job planning process, leading to a more robust execution.

• Cost – considering the general direction of grid computing toward enterprise and cloud computing, where economical aspects are increasingly prevalent [13], the cost associated with job execution is becoming a major concern. Because job cost is obtained from the cumulative task cost, the utilization realized by each task is directly proportional to the end cost. From the perspective of job scheduling, it is thus important to generate an efficient job execution plan that satisfies the user's requirements while yielding profit to the resource owner. Thus, cost is another major variable when scheduling EP jobs. Adopting the job plan model discussed leads toward the desired estimations.

• User requirements – in the context of enterprise and cloud computing, user requirements will become the major driving force behind job scheduling policies. Unlike the more traditional cluster computing environments, where resource utilization was the main objective, the service oriented architecture promoted by the grid infrastructure advocates a user centric orientation. Here, the quality of a system is realized by user satisfaction and measured through user utility. These notions are realized through the Quality of Service (QoS) requirements imposed by the users and agreed upon by the resource provider through Service Level Agreements (SLAs). With future advancements of pervasive computing, users will require detailed insight into their consumption of computational power, further complicated with imposed requirements on alternative job execution plans. While current efforts in scheduling EP applications primarily focus on runtime minimization, largely because runtime is equated with cost, in future systems cost will take on different forms (e.g., result accuracy, system responsiveness, power consumption, availability) and will become a primary scheduling consideration.

4. EP Application Scheduling Framework

Considering the described components that affect execution of EP applications in grid environments, scheduling of this class of applications needs to move beyond using a generic resource comparison as a guideline for task distribution and submission. Scheduling needs to involve coordination of resource capabilities, matching those to the application's observed potential, coupled with adequate data distribution, and finalized through individual task parameterization. A single scheduling action must coordinate the execution parameters of such heterogeneous tasks with the goal of meeting user requirements. As such, an EP application with its derived tasks becomes a heterogeneous unit whose interactions and execution characteristics need to be simultaneously balanced and coordinated.

A diagram of the framework outlining the general process of effective EP application scheduling is provided in Figure 7. As can be seen in the figure, effective EP application scheduling is a two-step process. Initially, the application needs to be analyzed, yielding the needed application-specific information. Following, as user jobs arrive, the derived information is exploited to account for the application specifics and, in turn, to provide a job execution plan that aims at optimizing the job's performance. Once a job plan has been derived, control is transferred to a job submission engine for execution on grid resources.

Figure 7. A sample EP application scheduling framework that accounts for application-specific behavior across grid resources.

The analyzer and the planner are the two main components of the described framework. The analyzer operates at the application level by performing a semi-permanent analysis of an application. Such analysis attempts at deriving application-specific relationships between input data, input parameters, runtime modes, and execution resources. Unfortunately, application analysis can often be a non-automated process that requires acquiring an individual's familiarity with the application's execution patterns. Tools and services such as AIS [14] can help in guiding and performing the analysis, as well as in storing the derived conclusions for later retrieval and use. In the simplest form, application analysis involves benchmarking of various resources, enabling a relative comparison of those resources for later job execution (sample benchmarking tools: [15, 16]). In the most complicated form, application analysis involves the development of a mathematical model that describes application execution patterns and dependencies. Once an application has been analyzed, user jobs can exploit the derived information to improve their performance. The job planner is the component that operates at the job level; it integrates application analysis information along with resource availability information to produce a job plan. The job planner may require an application-specific module that can perform application data distribution or input data re-formatting in order for the job to maximize its performance. Once a job plan has been created, it is handed to the job manager and a job submission engine. The job manager is an optional component that can implement application-specific logic for partial task failure and changes in resource availability.

5. Scheduling Examples

This section serves as a brief outlook into application execution characteristics and benefits as a result of applying the considerations presented earlier. Two EP applications were used to perform the needed analysis followed by job execution. Runtime results show an order of magnitude improvement in job execution characteristics. As throughout the rest of this paper, the NCBI BLAST application was chosen as one of the applications for execution. Figure 9 shows the runtimes of job segments as they were distributed across individual resources. A segment is defined as a set of tasks that executes on any one resource. Analyzing job turnaround time at the segment level is semantically equivalent to analyzing job turnaround time at the individual task level, with the benefit of reducing the number of entities to display and keep track of, and it has thus been used as the method of choice for result display.

Application analysis included benchmarking of each machine using BLAST while systematically changing input parameters. Following, based on the benchmark data, application-specific modules for data distribution were developed. Given an input file, the resource-application benchmarks were used to distribute and allocate appropriate chunks of data to the selected resources with the goal of reducing job load imbalance. The initial and adjusted job data distributions can be seen in Figure 8. Furthermore, in order to exploit some of the observed dependencies [8] on input file format, an additional application-specific module was developed and used to rearrange data in the input files, enabling a finer level of control over reducing load imbalance. This module enables a key step in optimizing job execution parameters. Lastly, each task was parameterized to use the appropriate number of threads to meet resource capabilities (i.e., match the number of cores on a node) and thus minimize the tasks' runtime.

Figure 8. BLAST job data distribution across resources (Resources 1-3) between the initial and optimized job executions.

In short, the following components were addressed to obtain optimized job execution: resources were benchmarked, results of the application benchmarks were used to perform data distribution for optimized tasks, the input file for each task was re-formatted, providing a finer level of load balance control, and, lastly, each task was parameterized to use the number of threads suitable to the execution host.

From Figure 9, it can be seen that the reduction in job turnaround time of the optimized case is on the order of 50%. In all of the test cases, the same number of tasks and the same number of compute nodes was used as in the initial case; thus, the increase in resource utilization is obtained purely through more appropriate job parameterization. Furthermore, it is obvious that the load imbalance across the machines has been drastically reduced and almost eliminated in the optimized case. The only difference in execution parameters between the adjusted and optimized cases is the use of the input file re-formatting module. From the results, it can be concluded that for true optimization of an application's execution, a fine-level understanding and analysis of the application is needed. Obviously, such understanding may be quite complicated and time consuming to obtain. Therefore, depending on the expected application execution benefits and needs, highly detailed optimization of an application's execution patterns may or may not be needed.

The presented methods have also been applied during job submission of statistical R code. Similar to the execution of the BLAST jobs, executing the R code targeted exploiting knowledge about individual resources and the application's suitability to those. Unlike BLAST, the R code does not support multiple threads, and thus the majority of the speedup was achieved through data distribution that would match resource capabilities more closely. As was the case with BLAST, the same number of compute nodes was used for all jobs; the only difference was in how individual tasks were parameterized. In the case of the selected R code, this required heterogeneous parameterization of tasks that corresponded to the benchmarked relative resource capabilities. Furthermore, despite the lack of multi-threaded support in the application, but aided by the publicized scheduling policies of the available resources, resource-specific knowledge was exploited to create multiple processes within a single assigned compute node, resulting in higher overall throughput. After the initial resource benchmarks, a targeted job plan was created by developing an R-specific data distribution module. The first iteration of the job's execution resulted in a significant reduction of job turnaround time ('adjusted' column in Figure 10). However, considerable load imbalance was still observed. In order to remove such load imbalance, similar to the heterogeneous parameterization of tasks assigned to individual resources, tasks within a resource were then parameterized in a heterogeneous fashion (based on runtime data observed during the 'adjusted' run). The final runtime results are shown under the 'optimized' column in Figure 10, where it can be seen that load imbalance was further reduced, leading to a 75% reduction in runtime when compared to the initial case.

Figure 10. Job runtime reduction as a result of exploiting application-resource relationship and knowledge when executing R code (Resource 1: 10 nodes, Resource 2: 5 nodes, Resource 3: 25 nodes; times in hours).

Obviously, the customizations and optimizations taking place during the shown job submissions can be very application specific. It can thus be concluded that there is no single silver bullet that can be applied to any and all applications resulting in the shown benefits.
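The benchmark-proportional data distribution used in both examples can be sketched as follows: each resource's chunk of the input is sized in proportion to its benchmarked relative speed. The resource names and benchmark scores are illustrative assumptions, not the paper's measurements.

```python
# Sketch of benchmark-driven data distribution: each resource receives a share
# of the input proportional to its benchmarked speed, so that all segments
# should finish at roughly the same time. Scores here are illustrative.

def proportional_split(n_records, benchmarks):
    """Allocate n_records across resources proportionally to their scores.

    benchmarks: dict mapping resource name -> relative speed. Any remainder
    left by integer truncation is given to the fastest resources so the
    shares always sum to n_records.
    """
    total = sum(benchmarks.values())
    shares = {r: int(n_records * s / total) for r, s in benchmarks.items()}
    leftover = n_records - sum(shares.values())
    for r in sorted(benchmarks, key=benchmarks.get, reverse=True):
        if leftover == 0:
            break
        shares[r] += 1
        leftover -= 1
    return shares
```

For instance, a resource benchmarked three times faster than its peer receives three quarters of the records, which is the kind of 'adjusted' distribution shown in Figure 8.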
Rather, and as indicated in the introduction, the purpose of this paper is to provide a set of insights and tentative guidelines that can be considered when developing metaschedulers or executing jobs in grid environments. Currently, the provided framework architecture can be used as a performance optimization model rather than as a concrete implementation. Obtaining the desired results automatically requires an application-specific module or scheduler that understands and exploits the existing relationships. The major drawback of this model is the need to understand and learn the existing application-resource relationships on an application-by-application basis, followed by scheduler development. However, the presented guidelines offer a broad framework and a directed path toward achieving the sought result.

[Figure 9. Difference in job runtime and load balance between a naïve job submission and an optimized job submission of a BLAST job. Bar chart of runtime in seconds (0-1500) for Resource 1 (24 nodes), Resource 2 (48 nodes), and Resource 3 (128 nodes) under the Initial, Adjusted, and Optimized data distribution variations.]

6. Future Trends

Existing EP scheduling approaches and algorithms focus on minimizing job turnaround time, cost, or both. Furthermore, current models require the user to specify job requirements and select most of the job execution parameters. Examples of options requested from the user include specification of a preferred execution resource, the number of processors to use, the required amount of memory, and even creation of a virtual machine with all the necessary components and invocation parameters. Consequently, the user is still deeply involved in the job submission and scheduling process, where the scheduler primarily performs the actions requested. Even though today's schedulers advertise cost and runtime optimality, because they operate under constraints specified by the user, they are likely not exploring the entire application execution space. An analogy can be drawn here between searching for a local optimum versus a global optimum that actually exists in the application-resource space but is not even being explored due to the constraints imposed by an uninformed user.

Future systems should advance the scheduling environment and alleviate the user of otherwise required effort by providing options and solutions to the user. The relationship between an application and a resource should be understood by the scheduler to the point where advice can be provided to the user automatically and prior to the job's execution. Such advice can come in the form of concrete values such as compute time, cost, or data accuracy (for sample methods of estimating the needed values, the interested reader can refer to [17]), or more abstractly, where only a window of various execution alternatives is presented. The latter alternatives may not be mapped to concrete values but could be related only to each other, thus providing insight to the user about the existing job execution alternatives. Without such support, the user may not even be aware of the existing options and is thus less likely to realize attainable goals. By enabling the generation and presentation of the described alternatives, job submission can be tailored to each individual user and each individual job to better match the user's requirements. This is the goal of service-oriented technology.

An example of what the user may be presented with in the future is provided in Figure 11, where a sample of an application's execution space is made available to the user. Such an execution space should be derived automatically, following the guidelines presented above, and merged with the user's job requirements (e.g., job file size). Upon collection of the needed information, job execution options are generated, providing the desired insight to the user. The user can then simply select an execution option based on their current situation, thus satisfying their needs more adequately.

[Figure 11. A sample job execution option space presented to the user showing job execution tradeoffs.]

In order to achieve the described functionality, a deep understanding of an application needs to be attained. Initial works on understanding and exploiting this functionality have already begun (e.g., [18, 19, 20]), with more coming. Furthermore, each application needs to be analyzed and described independently. Examples of application-specific characteristics include scalability levels, input data characteristics, and resource compatibility.
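As a sketch of how such an option window might be generated, the snippet below enumerates candidate (resource, node-count) configurations with estimated time and cost, and keeps only the non-dominated tradeoffs for presentation to the user. The candidate list and all of its numbers are invented for illustration; producing realistic estimates is exactly the prediction problem referenced in [17].

```python
# Sketch: reduce a set of candidate execution options to the
# time/cost tradeoff frontier that would be shown to the user.

def pareto_front(options):
    """Keep options that no other option beats on both time and cost."""
    return [o for o in options
            if not any(p["time"] <= o["time"] and p["cost"] <= o["cost"]
                       and p is not o for p in options)]

# Hypothetical candidates with made-up time (hours) and cost estimates.
candidates = [
    {"resource": "R1", "nodes": 10, "time": 6.0, "cost": 60.0},
    {"resource": "R2", "nodes": 5,  "time": 9.0, "cost": 30.0},
    {"resource": "R3", "nodes": 25, "time": 3.0, "cost": 150.0},
    {"resource": "R3", "nodes": 10, "time": 7.0, "cost": 70.0},  # dominated by R1
]
for option in pareto_front(candidates):
    print(option["resource"], option["nodes"], option["time"], option["cost"])
```

Only the three non-dominated options survive; the fourth is both slower and more expensive than the first and would only clutter the window presented to the user.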
Added difficulty stems from the dynamics of the grid, where resources fail, come online during a job's execution, or vary their performance based on current load. Some of these issues may be alleviated through existing and developing systems (e.g., advance reservation), while others cannot be remedied but only dealt with as they arise. Additional technologies, such as machine learning, may need to be employed to enable and automate aspirations such as these. Altogether, this is what makes scheduling a complex issue, but also one that is a cornerstone of a successful paradigm.

7. Conclusions

The EP class of applications represents an ever-growing workload on today's compute resources. This workload is generated by larger problems as well as a desire for more accurate results.
Grid computing presents itself as a viable solution to the resource shortage problem because independent institutions can now share their resources, thus increasing the resources' overall utilization. Integration of the independent resources was the first step towards achieving the goals set by grid computing ideals. The second step is efficient usage of those resources. This means that resources should not only be used continuously (i.e., increasing their utilization) but also effectively (i.e., increasing their throughput). In order to achieve such execution, the relationship between an application and a resource needs to be understood and exploited by the metascheduler. Then the third step, customizing the execution of an application to the user's current needs, becomes a realizable goal. At that point, the user is abstracted from the underlying application execution details and is able to focus on the original problem. The work presented in this paper aims at promoting the existence and importance of the application-resource relationship, followed by a set of general concerns that can be used to develop more effective metascheduling techniques and solutions.

8. References

[1] A. Iosup, C. Dumitrescu, D. H. Epema, H. Li, and L. Wolters, "How are real grids used? The analysis of four grid traces and its implications," in International Conference on Grid Computing 2006, Barcelona, Spain, 2006, pp. 262-269.
[2] I. Foster and C. Kesselman, Eds., The Grid 2, Second ed., New York: Morgan Kaufmann, 2004.
[3] D. Abramson, J. Giddy, and L. Kotler, "High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid," in International Parallel and Distributed Processing Symposium (IPDPS), Cancun, Mexico, 2000, pp. 520-528.
[4] H. Casanova, G. Obertelli, F. Berman, and R. Wolski, "The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid," in Supercomputing 2000, Dallas, TX, 2000.
[5] R. Buyya, D. Abramson, and J. Giddy, "Nimrod-G Resource Broker for Service-Oriented Grid Computing," IEEE Distributed Systems Online, 2(7), November 2001.
[6] T. Madden, "The BLAST Sequence Analysis Tool," NCBI, August 13, 2003.
[7] V. Kumar, Introduction to Parallel Computing: Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
[8] E. Afgan and P. Bangalore, "Performance Characterization of BLAST for the Grid," in IEEE 7th International Symposium on Bioinformatics & Bioengineering (IEEE BIBE 2007), Boston, MA, 2007, pp. 1394-1398.
[9] I. Banicescu and Z. Liu, "Adaptive Factoring: A Dynamic Scheduling Method Tuned to the Rate of Weight Changes," in High Performance Computing Symposium (HPC 2000), Washington, D.C., 2000, pp. 122-129.
[10] J. Cao, D. P. Spooner, S. A. Jarvis, S. Saini, and G. R. Nudd, "Agent-Based Grid Load Balancing Using Performance-Driven Task Scheduling," in 17th International Symposium on Parallel and Distributed Processing (IPDPS) 2003, Nice, France, 2003, p. 49.2.
[11] R. U. Payli, E. Yilmaz, A. Ecer, H. U. Akay, and S. Chien, "DLB – A Dynamic Load Balancing Tool for Grid Computing," in Parallel CFD Conference, Grand Canaria, Canary Islands, Spain, 2004, p. 8.
[12] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, 2004, p. 13.
[13] J. Broberg, S. Venugopal, and R. Buyya, "Market-oriented Grids and Utility Computing: The State-of-the-art and Future Directions," Journal of Grid Computing, 5(4), December 2007.
[14] E. Afgan and P. Bangalore, "Application Specification Language (ASL) – A Language for Describing Applications in Grid Computing," in The 4th International Conference on Grid Services Engineering and Management (GSEM 2007), Leipzig, Germany, 2007, pp. 24-38.
[15] G. Tsouloupas and M. D. Dikaiakos, "Grid Resource Ranking Using Low-Level Performance Measurements," in 13th International Euro-Par Conference 2007 on Parallel Processing, Rennes, France, 2007, pp. 467-476.
[16] G. Tsouloupas and M. Dikaiakos, "GridBench: A Tool for Benchmarking Grids," in 4th International Workshop on Grid Computing (Grid2003), Phoenix, AZ, 2003, pp. 60-67.
[17] E. Elmroth, J. Tordsson, T. Fahringer, F. Nadeem, R. Gruber, and V. Keller, "Three Complementary Performance Prediction Methods for Grid Applications," in CoreGRID Integration Workshop 2008, Heraklion, Greece, 2008.
[18] G. Tan, L. Xu, Z. Dai, S. Feng, and N. Sun, "A Study of Architectural Optimization Methods in Bioinformatics Applications," International Journal of High Performance Computing Applications, 21(3), August 2007, pp. 371-384.
[19] H. Stockinger, M. Pagni, L. Cerutti, and L. Falquet, "Grid Approach to Embarrassingly Parallel CPU-Intensive Bioinformatics Problems," in IEEE International Conference on e-Science and Grid Computing, Amsterdam, Netherlands: IEEE, 2006, pp. 58-68.
[20] A. Tirado-Ramos, G. Tsouloupas, M. Dikaiakos, and P. Sloot, "Grid Resource Selection by Application Benchmarking for Computational Haemodynamics Applications," in International Conference on Computational Science (ICCS) 2005, Kassel, Germany, 2005, pp. 534-543.