Resource Allocation Solutions for Reducing Delay in Distributed Computing Systems
(Thesis Proposal)

Takayuki Osogami
Department of Computer Science, Carnegie Mellon University
[email protected]
April, 2004

Committee members:
Mor Harchol-Balter (Chair)
Hui Zhang
Bruce Maggs
Alan Scheller-Wolf (Tepper School of Business)
Mark Squillante (IBM Research)

1 Introduction

Waiting time (delay) is a source of frustration for users who receive service via computer or communication systems. This frustration can result in lost revenue, e.g., when a customer leaves a commercial web site to shop at a competitor's site. One obvious way to decrease delay is simply to buy (more expensive) faster machines. However, we can also decrease delay for free with given resources, by making more efficient use of resources and by better scheduling of jobs (i.e., by changing the order in which jobs are processed).

For single-server systems, it is well understood how to minimize mean delay, namely by the shortest remaining processing time first (SRPT) scheduling policy. SRPT can provide mean delay an order of magnitude smaller than a naive first come first serve (FCFS) scheduling policy (see the illustrative simulation sketch at the end of Section 1.1). Also, the mean delay under various scheduling policies, including SRPT and FCFS, can be easily analyzed for a relatively broad class of single-server systems (M/GI/1 queues). However, utilizing the full potential computing power of multiserver systems and analyzing their performance are much harder problems than in the single-server case. Despite the ubiquity of multiserver systems, it is not known how we should assign jobs to servers and how we should schedule jobs within each server to minimize the mean delay in multiserver systems. Also, it is not well understood how we can evaluate various assignment and scheduling policies for multiserver systems. In this thesis, we provide partial answers to these questions.

1.1 Multiserver architectures

In this thesis, we seek to minimize delay in distributed computing systems (multiserver architectures). Figure 1 shows four common models of multiserver architectures that we consider in this thesis.

Figure 1: Four models of distributed computing systems that we consider in this thesis: (a) NOW, (b) server farm with distributed queues, (c) server farm with a central queue, (d) servers with affinities.

(a) Network of workstations (NOW): There are n workstations, and each workstation owns a queue of jobs. Each workstation usually processes its own jobs, but we also allow some workstations to help others (i.e., some workstations can process jobs from other workstations' queues). Examples of NOWs include local area networks in universities and companies.

(b) Server farm with distributed queues: There are n servers, and jobs arriving from outside the server farm are immediately dispatched to one of the servers. Examples of a server farm with distributed queues include high-volume web servers.

(c) Server farm with a central queue: There are n servers and one central queue. Here, jobs arriving from outside the server farm wait in the central queue, and when one of the servers becomes available, a job is dispatched from the central queue to the available server. Examples of a server farm with a central queue include supercomputing centers.

(d) Servers with affinities: There are n servers and m classes of jobs. Here, jobs typically have different affinities with different servers, i.e., a job may be processed more quickly on one server than on another. Examples of servers with affinities include multiprocessor systems, where cache affinity can significantly affect processing speed, and call centers, where people with different abilities serve different types of requests.

For simplicity, we set n = 2 and m = 2 in the figure. These models are not exhaustive, but they cover a wide range of distributed computing systems.
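As a concrete illustration of the single-server claim in the Introduction, the following minimal simulation sketch compares the mean response time of FCFS and SRPT in an M/G/1 queue. It is not taken from the proposal; the arrival rate, the two-branch hyperexponential job-size distribution (chosen to make service demands highly variable), and all function names are assumptions made purely for illustration.

```python
import random

def mean_response_time(policy, lam=0.8, n_jobs=20000, seed=1):
    """Simulate an M/G/1 queue under FCFS or SRPT and return the mean response time.

    Job sizes follow an assumed two-branch hyperexponential with mean 1, so the
    offered load is roughly lam; this is an illustrative workload, not the thesis model.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(lam)                       # Poisson arrivals
        short = rng.random() < 0.9                      # 90% short jobs, 10% long jobs
        size = rng.expovariate(2.0) if short else rng.expovariate(1.0 / 5.5)
        arrivals.append((t, size))                      # mean size = 0.9*0.5 + 0.1*5.5 = 1.0

    srpt = (policy == "SRPT")
    clock, i = 0.0, 0                # current time, index of the next arrival
    jobs = []                        # jobs in system: [remaining size, arrival time]
    response = []
    while len(response) < n_jobs:
        if jobs:
            # SRPT serves the job with the least remaining work; FCFS the earliest arrival.
            job = min(jobs, key=lambda j: j[0] if srpt else j[1])
            finish = clock + job[0]
        else:
            finish = float("inf")
        next_arrival = arrivals[i][0] if i < n_jobs else float("inf")
        if next_arrival < finish:
            if jobs:
                job[0] -= next_arrival - clock           # work done before the new arrival
            clock = next_arrival
            jobs.append([arrivals[i][1], arrivals[i][0]])
            i += 1
        else:
            clock = finish                               # current job completes
            response.append(clock - job[1])
            jobs.remove(job)
    return sum(response) / n_jobs

if __name__ == "__main__":
    for policy in ("FCFS", "SRPT"):
        print(policy, round(mean_response_time(policy), 2))
```

On typical runs with this assumed workload, the SRPT mean response time is several times smaller than the FCFS mean; the exact ratio depends on the load and on how variable the job sizes are.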
1.2 Where does delay come from

We start by asking where delay comes from. Long waiting times are experienced when the system load is high, i.e., when the average arrival rate (jobs per second), λ, is high relative to the average service rate (jobs per second), µ (see Figure 2).

Figure 2: The mean waiting time in an M/M/1/FCFS queue as a function of system load, ρ = λ/µ, where λ is the average arrival rate (jobs per second) and µ = 1.0 is the average service rate (jobs per second). Panel (a) shows an M/M/1 queue; panel (b) shows its mean waiting time.

Long waiting times are also experienced when utilization of system resources is poor. When system resources cannot be fully utilized, the effective average service rate (jobs per second), µ′, becomes smaller than the potential average service rate, µ. (For example, potential service capacity is eaten up by context switching time when switching from one type of job to another and by migration time to transfer jobs from one server to another.) As a result, λ can be high relative to µ′, which has the same effect as high load (λ high relative to µ), causing long waiting times.

In fact, maximizing utilization does not necessarily minimize delay in distributed computing systems, and this makes the design of resource allocation mechanisms for distributed computing systems difficult. We will see later that there are situations where we want to keep some servers idle even in the presence of jobs in the queue, so that more important (e.g., short processing time) future arrivals can receive service immediately upon arrival.

High load and poor utilization are not the only causes of long waiting times; we can experience long waiting times even when the average system load is low (see Figure 3). The long waiting time at low load is primarily due to variability in service demands and/or interarrival times, but other factors such as higher moments and correlation of service demands and interarrival times can also increase the waiting time. Even if the long-run average load is not too high, fluctuations in the load can cause significant delay, i.e., high instantaneous load can be problematic. In particular, variability and autocorrelation in interarrival times often cause fluctuations in load, and peak load and average load can differ by an order of magnitude.

Figure 3: The mean waiting time in an M/G/1/FCFS queue (panel (a)) and in a G/M/1/FCFS queue (panel (b)) as a function of system load, ρ = λ/µ, where λ is the average arrival rate (jobs per second) and µ = 1.0 is the average service rate (jobs per second). The variability of the service demand is represented by the coefficient of variability, C_S = σ(S)/E[S], where E[S] denotes the mean service demand and σ(S) denotes the standard deviation of the service demand. The variability of the interarrival time is represented by C_A, which is defined analogously; curves are shown for C_S^2 = 1, 8, and 64 in (a) and C_A^2 = 1, 8, and 64 in (b). (In (b), the arrival process is assumed to be a batch Poisson process with geometric batch size.)
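For reference, the curves in Figure 2 and Figure 3(a) correspond to two standard queueing formulas (stated here for illustration; they are not text from the proposal): the M/M/1/FCFS mean waiting time and the M/G/1/FCFS (Pollaczek-Khinchine) mean waiting time. The latter makes explicit how service-demand variability, C_S^2, inflates delay at any fixed load:

```latex
% Mean waiting time in an M/M/1/FCFS queue (Figure 2), with load \rho = \lambda/\mu:
E[W] = \frac{\rho}{\mu(1-\rho)}

% Mean waiting time in an M/G/1/FCFS queue (Pollaczek-Khinchine, Figure 3(a)):
E[W] = \frac{\lambda\, E[S^2]}{2(1-\rho)}
     = \frac{\rho\, E[S]\,\bigl(1 + C_S^2\bigr)}{2(1-\rho)}
```

Since the M/G/1 mean waiting time is proportional to 1 + C_S^2, the C_S^2 = 64 curve in Figure 3(a) lies far above the C_S^2 = 1 curve even at low ρ.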
1.3 Brief summary of prior work on minimizing delay

We briefly summarize prior work on minimizing delay by reducing the impact of high load and service demand variability, which are the primary sources of delay. A more detailed literature review will be provided in later sections as needed.

1.3.1 Minimizing delay by combating high load

When a system is overloaded, that is, when the arrival rate is higher than the service rate, we need to either decrease the arrival rate, degrade the service quality, or increase the service rate to keep the mean waiting time low. Below we classify common approaches to combating high load into three types: decreasing the arrival rate, degrading the service quality, and increasing the service rate.

One way to decrease the arrival rate is to reject some arrivals into the system; this approach is known as admission control and has been applied to various computer systems such as web servers [29, 30, 31, 103, 155, 158] and packet networks [81, 18, 19]. Admission control may be combined with a scheduling policy so that, rather than just dropping arrivals at random, the scheduling policy determines which arrivals to drop [28]. Degrading the service quality during overload periods (e.g., by omitting pictures from web pages) is also popular at web servers and has been studied as content adaptation [1, 25]. An advantage of admission control and content adaptation is that they can be exercised within a single system.

When multiple systems are available, we can increase the service rate of an overloaded system by utilizing the resources of other systems. Load balancing and cycle stealing are two popular approaches that make use of multiple systems to mitigate the impact of overload. Load balancing mitigates the impact of overload in a system by sharing the load among many systems. Load balancing has been popular in networks of workstations (NOW) [21, 60] and implemented in systems such as MOSIX [11] and Utopia [173]. A disadvantage of load balancing is that load from an overloaded system can slow down other lightly loaded systems.
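To make the load-balancing discussion concrete, the following minimal sketch (not from the proposal) models a server farm with distributed queues, model (b) in Figure 1, and compares two simple dispatching rules: random assignment and assignment to the server with the least unfinished work. The arrival rate, the exponential job sizes, and all function names are assumptions chosen only for illustration.

```python
import random

def mean_waiting_time(dispatch, lam=1.8, n_servers=2, n_jobs=200_000, seed=1):
    """Server farm with one FCFS queue per server (model (b) in Figure 1).

    Each server works at rate 1; under FCFS, an arriving job's waiting time equals
    the unfinished work already queued at the server it is dispatched to.
    Arrival rate and exponential job sizes (mean 1) are illustrative assumptions.
    """
    rng = random.Random(seed)
    work = [0.0] * n_servers                      # unfinished work at each server
    total_wait = 0.0
    for _ in range(n_jobs):
        dt = rng.expovariate(lam)                 # time until the next arrival
        work = [max(0.0, w - dt) for w in work]   # each server drains its own queue
        size = rng.expovariate(1.0)               # job size, mean 1
        s = dispatch(work, rng)                   # choose a server for the new job
        total_wait += work[s]                     # FCFS wait = work ahead of the job
        work[s] += size
    return total_wait / n_jobs

def random_dispatch(work, rng):
    return rng.randrange(len(work))

def least_work_dispatch(work, rng):
    return min(range(len(work)), key=lambda s: work[s])

if __name__ == "__main__":
    print("random dispatch     :", round(mean_waiting_time(random_dispatch), 2))
    print("least-work dispatch :", round(mean_waiting_time(least_work_dispatch), 2))
```

At a system load of 0.9 (two servers, total arrival rate 1.8), the least-work rule typically yields a mean waiting time substantially smaller than random dispatch (roughly half in typical runs at these assumed parameters), illustrating that the assignment policy matters, not just raw capacity. The sketch deliberately ignores the migration and context-switching costs mentioned in Section 1.2.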