Stride Scheduling: Deterministic Proportional-Share Resource Management

Carl A. Waldspurger
William E. Weihl

Technical Memorandum MIT/LCS/TM-528
MIT Laboratory for Computer Science
Cambridge, MA 02139
June 22, 1995

Authors' footnote: E-mail: {carl, weihl}@lcs.mit.edu. World Wide Web: http://www.psg.lcs.mit.edu/. Prof. Weihl is currently supported by DEC while on sabbatical at DEC SRC. This research was also supported by ARPA under contract N00014-94-1-0985, by grants from AT&T and IBM, and by an equipment grant from DEC. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.

Abstract

This paper presents stride scheduling, a deterministic scheduling technique that efficiently supports the same flexible resource management abstractions introduced by lottery scheduling. Compared to lottery scheduling, stride scheduling achieves significantly improved accuracy over relative throughput rates, with significantly lower response time variability. Stride scheduling implements proportional-share control over processor time and other resources by cross-applying elements of rate-based flow control algorithms designed for networks. We introduce new techniques to support dynamic changes and higher-level resource management abstractions. We also introduce a novel hierarchical stride scheduling algorithm that achieves better throughput accuracy and lower response time variability than prior schemes. Stride scheduling is evaluated using both simulations and prototypes implemented for the Linux kernel.

Keywords: dynamic scheduling, proportional-share resource allocation, rate-based service, service rate objectives

1 Introduction

Schedulers for multithreaded systems must multiplex scarce resources in order to service requests of varying importance. Accurate control over relative computation rates is required to achieve service rate objectives for users and applications. Such control is desirable across a broad spectrum of systems, including databases, media-based applications, and networks. Motivating examples include control over frame rates for competing video viewers, query rates for concurrent clients by databases and Web servers, and the consumption of shared resources by long-running computations.

Few general-purpose approaches have been proposed to support flexible, responsive control over service rates. We recently introduced lottery scheduling, a randomized resource allocation mechanism that provides efficient, responsive control over relative computation rates [Wal94]. Lottery scheduling implements proportional-share resource management: the resource consumption rates of active clients are proportional to the relative shares that they are allocated. Higher-level abstractions for flexible, modular resource management were also introduced with lottery scheduling, but they do not depend on the randomized implementation of proportional sharing.

In this paper we introduce stride scheduling, a deterministic scheduling technique that efficiently supports the same flexible resource management abstractions introduced by lottery scheduling. One contribution of our work is a cross-application and generalization of rate-based flow control algorithms designed for networks [Dem90, Zha91, ZhK91, Par93] to schedule other resources such as processor time. We present new techniques to support dynamic operations such as the modification of relative allocations and the transfer of resource rights between clients. We also introduce a novel hierarchical stride scheduling algorithm.
Hierarchical stride scheduling is a recursive application of the basic technique that achieves better throughput accuracy and lower response time variability than previous schemes.

Simulation results demonstrate that, compared to lottery scheduling, stride scheduling achieves significantly improved accuracy over relative throughput rates, with significantly lower response time variability. In contrast to other deterministic schemes, stride scheduling efficiently supports operations that dynamically modify relative allocations and the number of clients competing for a resource. We have also implemented prototype stride schedulers for the Linux kernel, and found that they provide accurate control over both processor time and the relative network transmission rates of competing sockets.

In the next section, we present the core stride-scheduling mechanism. Section 3 describes extensions that support the resource management abstractions introduced with lottery scheduling. Section 4 introduces hierarchical stride scheduling. Simulation results with quantitative comparisons to lottery scheduling appear in Section 5. A discussion of our Linux prototypes and related implementation issues is presented in Section 6. In Section 7, we examine related work. Finally, we summarize our conclusions in Section 8.

2 Stride Scheduling

Stride scheduling is a deterministic allocation mechanism for time-shared resources. Resources are allocated in discrete time slices; we refer to the duration of a standard time slice as a quantum. Resource rights are represented by tickets: abstract, first-class objects that can be issued in different amounts and passed between clients. Throughput rates for active clients are directly proportional to their ticket allocations. Thus, a client with twice as many tickets as another will receive twice as much of a resource in a given time interval. Client response times are inversely proportional to ticket allocations. Therefore a client with twice as many tickets as another will wait only half as long before acquiring a resource.

Footnote: In this paper we use the same terminology (e.g., tickets and currencies) that we introduced for lottery scheduling [Wal94].

The throughput accuracy of a proportional-share scheduler can be characterized by measuring the difference between the specified and actual number of allocations that a client receives during a series of allocations. If a client has $t$ tickets in a system with a total of $T$ tickets, then its specified allocation after $n_a$ consecutive allocations is $n_a \cdot t / T$. Due to quantization, it is typically impossible to achieve this ideal exactly. We define a client's absolute error as the absolute value of the difference between its specified and actual number of allocations. We define the pairwise relative error between clients $c_i$ and $c_j$ as the absolute error for the subsystem containing only $c_i$ and $c_j$, where $T = t_i + t_j$, and $n_a$ is the total number of allocations received by both clients.

While lottery scheduling offers probabilistic guarantees about throughput and response time, stride scheduling provides stronger deterministic guarantees. For lottery scheduling, after a series of $n_a$ allocations, a client's expected relative error and expected absolute error are both $O(\sqrt{n_a})$. For stride scheduling, the relative error for any pair of clients is never greater than one, independent of $n_a$. However, for skewed ticket distributions it is still possible for a client to have $O(n_c)$ absolute error, where $n_c$ is the number of clients. Nevertheless, stride scheduling is considerably more accurate than lottery scheduling, since its error does not grow with the number of allocations. In Section 4, we introduce a hierarchical variant of stride scheduling that provides a tighter $O(\lg n_c)$ bound on each client's absolute error.

This section first presents the basic stride-scheduling algorithm, and then introduces extensions that support dynamic client participation, dynamic modifications to ticket allocations, and nonuniform quanta.

2.1 Basic Algorithm

The core stride scheduling idea is to compute a representation of the time interval, or stride, that a client must wait between successive allocations. The client with the smallest stride will be scheduled most frequently. A client with half the stride of another will execute twice as quickly; a client with double the stride of another will execute twice as slowly. Strides are represented in virtual time units called passes, instead of units of real time such as seconds.

Three state variables are associated with each client: tickets, stride, and pass. The tickets field specifies the client's resource allocation, relative to other clients. The stride field is inversely proportional to tickets, and represents the interval between selections, measured in passes. The pass field represents the virtual time index for the client's next selection. Performing a resource allocation is very simple: the client with the minimum pass is selected, and its pass is advanced by its stride. If more than one client has the same minimum pass value, then any of them may be selected. A reasonable deterministic approach is to use a consistent ordering to break ties, such as one defined by unique client identifiers.

    /* per-client state */
    typedef struct {
        ...
        int tickets, stride, pass;
    } *client_t;

    /* large integer stride constant (e.g. 1M) */
    const int stride1 = (1 << 20);

    /* current resource owner */
    client_t current;

    /* initialize client with specified allocation */
    void client_init(client_t c, queue_t q, int tickets)

Figure 1 lists ANSI C code for the basic stride scheduling algorithm. For simplicity, we assume a static set of clients with fixed ticket assignments. The stride scheduling state for each client must be initialized via client_init() before any allocations are performed by allocate().
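The selection rule described in Section 2.1 can be illustrated with a minimal, self-contained C sketch. This is not the paper's Figure 1: a linear scan over an array stands in for whatever queue structure the figure relies on, and the names `struct client`, `allocations`, and `STRIDE1` are hypothetical, chosen here for illustration. The sketch also assumes that the stride is computed as the stride constant divided by the ticket count, and that a client's initial pass equals one stride.

```c
#include <assert.h>

/* Large integer stride constant (1 << 20), matching the value in the text. */
#define STRIDE1 (1 << 20)

/* Per-client state: the three variables named in the text, plus a
 * hypothetical counter used only to observe the resulting shares. */
struct client {
    int tickets;      /* relative resource allocation */
    int stride;       /* interval between selections, in passes */
    int pass;         /* virtual time index of the next selection */
    int allocations;  /* illustration only: quanta received so far */
};

/* Initialize a client. Assumption: stride = STRIDE1 / tickets (integer
 * division), and the first selection is due one stride from time zero. */
static void client_init(struct client *c, int tickets)
{
    assert(tickets > 0);
    c->tickets = tickets;
    c->stride = STRIDE1 / tickets;
    c->pass = c->stride;
    c->allocations = 0;
}

/* Perform one allocation: select the client with the minimum pass and
 * advance its pass by its stride. Ties are broken by the lowest array
 * index, which plays the role of a unique client identifier.
 * Returns the index of the selected client. */
static int allocate(struct client *clients, int n)
{
    int best = 0;
    assert(n > 0);
    for (int i = 1; i < n; i++) {
        if (clients[i].pass < clients[best].pass)
            best = i;
    }
    clients[best].pass += clients[best].stride;
    clients[best].allocations++;
    return best;
}
```

With three clients holding 3, 2, and 1 tickets, six calls to `allocate()` give the clients 3, 2, and 1 quanta respectively, and the same 3:2:1 split holds after 600 calls. A linear scan costs time proportional to the number of clients per allocation; it was chosen here only for brevity. Like the text, the sketch assumes a static set of clients with fixed ticket assignments.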
These restrictions will be relaxed in subsequent sections