Group Ratio Round-Robin: O(1) Proportional Share for Uniprocessor and Multiprocessor Systems Bogdan Caprita, Wong Chun Chan, Jason Nieh, Clifford Stein,∗ and Haoqiang Zheng Department of Computer Science Columbia University Email: {bc2008, wc164, nieh, cliff, hzheng}@cs.columbia.edu

Abstract

We present Group Ratio Round-Robin (GR3), the first pro- work has been done to provide proportional share schedul- portional share scheduler that combines accurate propor- ing on multiprocessor systems, which are increasingly com- tional fairness scheduling behavior with O(1) scheduling mon especially in small-scale configurations with two or overhead on both uniprocessor and multiprocessor systems. four processors. Over the years, a number of scheduling GR3 uses a simple grouping strategy to organize clients mechanisms have been proposed, and much progress has into groups of similar processor allocations which can be been made. However, previous mechanisms have either su- more easily scheduled. Using this strategy, GR3 combines perconstant overhead or less-than-ideal fairness properties. the benefits of low overhead round-robin execution with a We introduce Group Ratio Round-Robin (GR3), the 3 novel ratio-based scheduling algorithm. GR introduces a first proportional share scheduler that provides constant novel frontlog mechanism and weight readjustment algo- fairness bounds on proportional sharing accuracy with O(1) 3 rithm to operate effectively on multiprocessors. GR pro- scheduling overhead for both uniprocessor and small-scale vides fairness within a constant factor of the ideal general- multiprocessor systems. In designing GR3, we observed ized processor sharing model for client weights with a fixed that accurate, low-overhead proportional sharing is easy to upper bound and preserves its fairness properties on multi- achieve when scheduling a set of clients with equal pro- 3 processor systems. We have implemented GR in Linux cessor allocations, but is harder to do when clients require and measured its performance. Our experimental results very different allocations. Based on this observation, GR3 3 show that GR provides much lower scheduling overhead uses a simple client grouping strategy to organize clients and much better scheduling accuracy than other schedulers into groups of similar processor allocations which can be commonly used in research and practice. more easily scheduled. Using this grouping strategy, GR3 combines the benefits of low overhead round-robin execu- 1 Introduction tion with a novel ratio-based scheduling algorithm. GR3 uses the same basic uniprocessor scheduling al- Proportional share resource management provides a flexible gorithm for multiprocessor scheduling by introducing the and useful abstraction for multiplexing processor resources notion of a frontlog. On a multiprocessor system, a client among a set of clients. Proportional share scheduling has a may not be able to be scheduled to run on a processor be- clear colloquial meaning: given a set of clients with asso- cause it is currently running on another processor. To pre- ciated weights, a proportional share scheduler should allo- serve its fairness properties, GR3 keeps track of a frontlog cate resources to each client in proportion to its respective per client to indicate when the client was already running weight. However, developing processor scheduling mech- but could have been scheduled to run on another processor. anisms that combine good proportional fairness scheduling It then assigns the client a time quantum that is added to its behavior with low scheduling overhead has been difficult to allocation on the processor it is running on. The frontlog achieve in practice. 
For many proportional share scheduling ensures that a client receives its proportional share alloca- mechanisms, the time to select a client for execution grows tion while also taking advantage of any cache affinity by with the number of clients. For server systems which may continuing to run the client on the same processor. service large numbers of clients, the scheduling overhead of 3 algorithms whose complexity grows linearly with the num- GR provides a simple weight readjustment algo- ber of clients can waste more than 20 percent of system re- rithm that takes advantage of its grouping strategy. On sources [3] for large numbers of clients. Furthermore, little a multiprocessor system, proportional sharing is not fea- sible for some client weight assignments, such as having ∗also in Department of IEOR one client with weight 1 and another with weight 2 on a

USENIX Association 2005 USENIX Annual Technical Conference 337 two-processor system. By organizing clients with similar on some metric of fairness, and has a time complexity that weights into groups, GR3 adjusts for infeasible weight as- grows with the number of clients. The other way is to ad- signments without the need to order clients, resulting in just the size of a client’s time quantum so that it runs longer lower scheduling complexity than previous approaches [7]. for a given allocation, as is done in weighted round-robin We have analyzed GR3 and show that with only O(1) (WRR). This is fast, providing constant time complexity overhead, GR3 provides fairness within O(g2) of the ideal scheduling overhead. However, allowing a client to mo- Generalized Processor Sharing (GPS) model [16], where g, nopolize the resource for a long period of time results in ex- the number of groups, grows at worst logarithmically with tended periods of unfairness to other clients which receive the largest client weight. Since g is in practice a small con- no service during those times. The unfairness is worse with stant, GR3 effectively provides constant fairness bounds skewed weight distributions. with only O(1) overhead. Moreover, we show that GR3 GR3 is a proportional share scheduler that matches uniquely preserves its worst-case time complexity and fair- with O(1) time complexity of round-robin scheduling but ness properties for multiprocessor systems. provides much better proportional fairness guarantees in We have implemented a prototype GR3 processor practice. At a high-level, the GR3 scheduling algorithm scheduler in Linux, and compared it against uniproces- can be briefly described in three parts: sor and multiprocessor schedulers commonly used in prac- tice and research, including the standard Linux sched- 1. Client grouping strategy: Clients are separated into uler [2], (WFQ) [11], Virtual-Time groups of clients with similar weight values. The k Round-Robin (VTRR) [17], and Smoothed Round-Robin group of order is assigned all clients with weights 2k 2k+1 − 1 k ≥ 0 (SRR) [9]. We have conducted both simulation studies and between to ,where . kernel measurements on micro-benchmarks and real ap- 2. Intergroup scheduling: Groups are ordered in a list 3 plications. Our results show that GR can provide more from largest to smallest group weight, where the group than an order of magnitude better proportional sharing ac- weight of a group is the sum of the weights of all curacy than these other schedulers, in some cases with more clients in the group. Groups are selected in a round- than an order of magnitude less overhead. These results robin manner based on the ratio of their group weights. 3 demonstrate that GR can in practice deliver better propor- If a group has already been selected more than its pro- tional share control with lower scheduling overhead than portional share of the time, move on to the next group 3 these other approaches. Furthermore, GR is simple to im- in the list. Otherwise, skip the remaining groups in plement and easy to incorporate into existing scheduling the group list and start selecting groups from the be- frameworks in commodity operating systems. ginning of the group list again. Since the groups with This paper presents the design, analysis, and evalua- larger weights are placed first in the list, this allows 3 tion of GR . Section 2 describes the uniprocessor schedul- them to get more service than the lower-weight groups ing algorithm. 
Section 3 describes extensions for multipro- at the end of the list. cessor scheduling, which we refer to as GR3MP. Section 4 analyzes the fairness and complexity of GR3. Section 5 3. Intragroup scheduling: From the selected group, a presents experimental results. Section 6 discusses related client is selected to run in a round-robin manner that work. Finally, we present some concluding remarks and accounts for its weight and previous execution history. directions for future work. Using this client grouping strategy, GR3 separates scheduling in a way that reduces the need to schedule enti- 2 GR3 Uniprocessor Scheduling ties with skewed weight distributions. The client grouping Uniprocessor scheduling, the process of scheduling a time- strategy limits the number of groups that need to be sched- multiplexed resource among a set of clients, has two basic uled since the number of groups grows at worst logarithmi- steps: 1) order the clients in a queue, 2) run the first client in cally with the largest client weight. Even a very large 32-bit the queue for its time quantum, which is the maximum time client weight would limit the number of groups to no more interval the client is allowed to run before another schedul- than 32. The client grouping strategy also ensures that all ing decision is made. We refer to the units of time quanta clients within a group have weight within a factor of two. As a result, the intragroup scheduler never needs to sched- as time units (tu) rather than an absolute time measure such 3 as seconds. A scheduler can therefore achieve proportional ule clients with skewed weight distributions. GR groups sharing in one of two ways. One way, often called fair are simple lists that do not need to be balanced; they do not queueing [11, 18, 28, 13, 24, 10] is to adjust the frequency require any use of more complex balanced tree structures. that a client is selected to run by adjusting the position of 3 the client in the queue so that it ends up at the front of the 2.1 GR Definitions queue more or less often. However, adjusting the client’s We now define the state GR3 associates with each position in the queue typically requires sorting clients based client and group, and describe in detail how GR3 uses

338 2005 USENIX Annual Technical Conference USENIX Association Cj Client j. (also called ’task’ j) cuss dynamic changes in a client’s run state in Section 2.3. 3 φC The weight assigned to client C. We first focus on the GR intergroup scheduling algorithm, 3 φj Shorthand notation for φCj . then discuss the GR intragroup scheduling algorithm. 3 DC The deficit of C. The GR intergroup scheduling algorithm uses the N The number of runnable clients. ratio of the group weights of successive groups to deter- g The number of groups. mine which group to select. The next group to schedule Gi i’th group in the list ordered by weight. is selected using only the state of successive groups in the |G| The number of clients in group G. group list. Given a group Gi whose weight is x times larger G(C) The group to which C belongs. than the group weight of the next group Gi+1 in the group 3 list, GR will select group Gi x times for every time that it ΦG The group weight of G: PC∈G φC . selects Gi+1 in the group list to provide proportional share Φi Shorthand notation for ΦGi . allocation among groups. σG The order of group G. 3 G σG To implement the algorithm, GR maintains the to- φmin Lower bound for client weights in G: 2 . tal work done by group Gi in a variable Wi. An index i to wC The work of client C. tracks the current group and is initialized to 1. The schedul- wj Shorthand notation for wC . j ing algorithm then executes the following simple routine: WG The group work of group G.

Wi Shorthand notation for WGi . N g INTERGROUP-SCHEDULE() ΦT Total weight: Pj=1 φj = Pi=1 Φi. N g 1 C ← INTRAGROUP-SCHEDULE(Gi) WT Total work: P wj = P Wi. j=1 i=1 2 W ← W +1 φC i i eC Service error of client C: wC − WT W +1 Φ ΦT 3 if i i (1) ΦG Wi+1+1 Φi+1 EG Group service error of G: WG − WT ΦT 4 then i ← i +1 eC,G Group-relative service error of client C with 5 else i ← 1 φC respect to group G: wC − WG ΦG 6 return C

3 Table 1: GR terminology Let us negate (1) under the form:

that state to schedule clients. Table 1 lists terminology Wi +1 Wi+1 +1 3 ≤ (2) we use. For each client, GR maintains the following Φi Φi+1 three values: weight, deficit, and run state. Each client re- ceives a resource allocation that is directly proportional to We will call this relation the well-ordering condition of two 3 its weight. A client’s deficit tracks the number of remaining consecutive groups. GR works to maintain this condition time quanta the client has not received from previous allo- true at all times. The intuition behind (2) is that we would cations. A client’s run state indicates whether or not it can like the ratio of the work of Gi and Gi+1 to match the ratio be executed. A client is runnable if it can be executed. of their respective group weights after GR3 has finished se- For each group, GR3 maintains the following four lecting both groups. Recall, Φi ≥ Φi+1. Each time a client 3 values: group weight, group order, group work, and current from Gi+1 is run, GR would like to have run Φi/Φi+1 3 client. The group weight is the sum of the corresponding worth of clients from Gi. (1) says that GR should not run weights of the clients in the group run queue. A group with a client from Gi and increment Gi’s group work if it will group order k contains the clients with weights between 2k make it impossible for Gi+1 to catch up to its proportional to 2k+1 − 1.Thegroup work is the total execution time share allocation by running one of its clients once. clients in the group have received. The current client is the To illustrate how intergroup scheduling works, Fig- most recently scheduled client in the group’s run queue. ure 1 shows an example with three clients C1, C2,and 3 C3, which have weights of 5, 2, and 1, respectively. The GR also maintains the following scheduler state: 3 time quantum, group list, total weight, and current group. GR grouping strategy would place each Ci in group Gi, The group list is a sorted list of all groups containing ordering the groups by weight: G1, G2,andG3 have or- runnable clients ordered from largest to smallest group ders 2, 1 and 0 and weights of 5, 2, and 1 respectively. In this example, each group has only one client so there weight, with ties broken by group order. The total weight is 3 the sum of the weights of all runnable clients. The current is no intragroup scheduling. GR would start by selecting group is the most recently selected group in the group list. group G1, running client C1, and incrementing W1.Based on (1), W1+1 =2< Φ1 =2.5,soGR3 would select W2+1 Φ2 G again and run client C . After running C , G ’s work 2.2 Basic GR3 Algorithm 1 1 1 1 would be 2 so that the inequality in (1) would hold and GR3 We initially only consider runnable clients in our dis- would then move on to the next group G2 and run client C2. cussion of the basic GR3 scheduling algorithm. We dis- Based on (1), W2+1 =2≤ Φ2 =2,soGR3 would reset W3+1 Φ3

USENIX Association 2005 USENIX Annual Technical Conference 339 G 5 | 1 5 | 2 5 | 3 5 | 35 | 4 5 | 5 5 | 6 5 | 6 φC φC 1 1)−b G +DC (r−1)c, with DC (0) = G . Thus, in each φmin φmin G2 2 | 1 2 | 1 2 | 12 | 2 2 | 2 2 | 2 2 | 2 2 | 3 round, C is allotted one time quantum plus any additional G 1 | 1 1 | 1 1 | 1 1 | 1 1 | 1 1 | 1 1 | 1 1 | 1 leftover from the previous round, and DC (r) keeps track of 3 the amount of service that C missed because of rounding C C C C C C C C down its allocation to whole time quanta. We observe that 1 1 2 1 1 1 2 3 t 0 ≤ DC (r) < 1 after any round r so that any client C will be allotted one or two time quanta. Note that if a client Figure 1: GR3 intergroup scheduling. At each time step, is allotted two time quanta, it first executes for one time the shaded box contains the pair ΦG | WG +1for the group quantum and then executes for the second time quantum G before it is selected. the next time the intergroup scheduler selects its respective group again (in general, following a timespan when clients

the current group to the largest weight group G1 and run belonging to other groups get to run). 3 client C1. Based on (1), C1 would be run for three time To illustrate how GR works with intragroup quanta before selecting G2 again to run client C2.After scheduling, Figure 2 shows an example with six clients running C2 the second time, W2 would increase such that C1 through C6 with weights 12, 3, 3, 2, 2, and 2, re- W +1 Φ 3 2 =3> 2 =2,soGR would then move on to spectively. The six clients will be put in two groups G1 W3+1 Φ3 the last group G3 and run client C3. The resulting schedule and G2 with respective group order 1 and 3 as follows: would then be: G1, G1, G2, G1, G1, G1, G2, G3. Each G1 = {C2,C3,C4,C5,C6} and G2 = {C1}. The weight 3 group therefore receives its proportional allocation in ac- of the groups are Φ1 =Φ2 =12. GR intergroup schedul- cordance with its respective group weight. ing will consider the groups in this order: G1, G2, G1, 3 The GR intragroup scheduling algorithm selects a G2, G1, G2, G1, G2, G1, G2, G1, G2. G2 will sched- client from the selected group. All clients within a group ule client C1 every time G2 is considered for service since G1 have weights within a factor of two, and all client weights it has only one client. Since φmin =2, the normalized in a group G are normalized with respect to the minimum weights of clients C2, C3, C4, C5,andC6 are 1.5, 1.5, 1, G σG possible weight, φmin =2 , for any client in the group. 1, and 1, respectively. In the beginning of round 1 in G1, GR3 then effectively traverses through a group’s queue each client starts with 0 deficit. As a result, the intragroup in round-robin order, allocating each client its normalized scheduler will run each client in G1 for one time quantum 3 weight worth of time quanta. GR keeps track of subuni- during round 1. After the first round, the deficit for C2, C3, tary fractional time quanta that cannot be used and accumu- C4, C5,andC6 are 0.5, 0.5, 0, 0, and 0. In the beginning of G1 lates them in a deficit value for each client. Hence, each round 2, each client gets another φi/φmin allocation, plus client is assigned either one or two time quanta, based on any deficit from the first round. As a result, the intragroup the client’s normalized weight and its previous allocation. scheduler will select clients C2, C3, C4, C5,andC6 to run More specifically, the GR3 intragroup scheduler con- in order for 2, 2, 1, 1, and 1 time quanta, respectively, dur- siders the scheduling of clients in rounds. A round is one ing round 2. The resulting schedule would then be: C2, C1, pass through a group G’s run queue of clients from begin- C3, C1, C4, C1, C5, C1, C6, C1, C2, C1, C2, C1, C3, C1, ning to end. The group run queue does not need to be sorted C3, C1, C4, C1, C5, C1, C6, C1. in any manner. During each round, the GR3 intragroup al- gorithm considers the clients in round-robin order and exe- 2.3 GR3 Dynamic Considerations cutes the following simple routine: We now discuss how GR3 allows clients to be dynamically created, terminated, or change run state. INTRAGROUP-SCHEDULE(G) Runnable clients can be selected for execution by the sched- 1 C ← G[k] £ k is the current position in the round uler, while clients that are not runnable cannot. With no loss 2 if DC < 1 of generality, we assume that a client is created before it can 3 then k ← (k +1)mod |G| become runnable, and a client becomes not runnable before 4 C ← G[k] it is terminated. 
As a result, client creation and termination G 3 5 DC ← DC + φC /φmin have no effect on the GR run queues. 6 D ← D − 1 C C When a client C with weight φC becomes runnable, 7 return C it is inserted into group G = G(C) such that φC is between 2σG and 2σG+1 − 1. If the group was previously empty, a For each runnable client C, the scheduler determines new group is created, the client becomes the current client the maximum number of time quanta that the client can be of the group, and g, the number of groups, is incremented. φC selected to run in this round as b G +DC(r−1)c. DC (r), 3 φmin If the group was not previously empty, GR inserts the the deficit of client C after round r, is the time quantum client into the respective group’s run queue right before the φC fraction left over after round r: DC(r)= G + DC (r − current client; it will be serviced after all of the other clients φmin

340 2005 USENIX Annual Technical Conference USENIX Association G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 G 1 G2 0 0 0.5 000000000000 0.5 0 0.5 0 0.5 0 0.5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0.5 0.5 0.5 0.5 0.5 0.5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

C C C C C C C C C C C C C C C C C C C C C C C C 2 1 3 1 4 1 5 1 6 1 2 1 2 1 3 1 3 1 4 1 5 1 6 1 t

Figure 2: GR3 intragroup scheduling. At each time step, the shaded box contains the deficit of the client before it is run.

in the group have first been considered for scheduling. The by the same rules as for client insertion, depending on the initial deficit DC will be initialized to 0. new position of the group and its next neighbor. After per- When a newly runnable client C is inserted into its forming these removal operations, GR3 resumes schedul- respective group G, the group needs to be moved to its new ing from the largest weight group in the system. position on the ordered group list based on its new group Whenever a client C blocks during round r,weset k G(C) weight. Let this new position be . The corresponding DC (r)=min(DC (r − 1)+ φC /φmin −dwe, 1),wherew group work and group weight of G need to be updated and is the service that the client received during round r until it the client’s deficit needs to be initialized. The group weight blocked. This preserves the client’s credit in case it returns is simply incremented by the client’s weight. We also want by the next round, while also limiting the deficit to 1 so that to scale the group work of G such that the work ratio of a client cannot gain credit by blocking. However, the group consecutive groups will continue to be proportional to their consumes 1 tu (its work is incremented) no matter how long weight ratio: the client runs. Therefore, the client forfeits its extra credit

ΦG whenever it is unable to consume its allocation. (Wk+1 +1) − 1 if k ΦG (Wg−1 +1) − 1 if k = g remove it. Having kept the weight of the group to the old l Φg−1 m :> value for an extra round has no adverse effects on fairness, We will motivate these equations when analyzing the fair- despite the slight increase in service seen by the group dur- ness of the algorithm in Section 4, but intuitively, we want ing the last round. By scaling the work of the group and to preserve the invariants that result from (2). rounding up, we determine its future allocation and thus make sure the group will not have received undue service. When a client C with weight φC becomes not runnable, we need to remove it from the group’s run queue. We also immediately resume the scheduler from the first This requires updating the group’s weight, which poten- (largest) group in the readjusted group list, so that any mi- tially includes moving the group in the ordered group list, as nor discrepancies caused by rounding may be smoothed out well as adjusting the measure of work received according to by a first pass through the group list. the new processor share of the group. This can be achieved in several ways. GR3 is optimized to efficiently deal with 3 GR3 Multiprocessor Extensions (GR3MP) the common situation when a blocked client may rapidly We now present extensions to GR3 for scheduling a P -way switch back to the runnable state again. This approach is multiprocessor system from a single, centralized queue. based on “lazy” removal, which minimizes overhead asso- This simple scheme, which we refer to as GR3MP,pre- ciated with adding and removing a client, while at the same serves the good fairness and time complexity properties of time preserving the service rights and service order of the GR3 in small-scale multiprocessor systems, which are in- runnable clients. Since a client blocks when it is running, creasingly common today, even in the form of multi-core we know that it will take another full intragroup round be- processors. We first describe the basic GR3MP schedul- fore the client will be considered again. The only action ing algorithm, then discuss dynamic considerations. Table when a client blocks is to set a flag on the client, marking it 2 lists terminology we use. To deal with the problem of in- for removal. If the client becomes runnable by the next time feasible client weights, we then show how GR3MP uses its it is selected, we reset the flag and run the client as usual. grouping strategy in a novel weight readjustment algorithm. Otherwise, we remove the client from G(C). In the lat- ter situation, as in the case of client arrivals, the group may 3 need to be moved to a new position on the ordered group list 3.1 Basic GR MP Algorithm based on its new group weight. The corresponding group GR3MP uses the same GR3 data structure, namely weight is updated by subtracting the client’s weight from an ordered list of groups, each containing clients whose the group weight. The corresponding group work is scaled weights are within a factor of two from each other. When a

USENIX Association 2005 USENIX Annual Technical Conference 341 P Number of processors. C C C C C C t p 1 2 2 3 2 2 3 ℘k Processor k. F =1 F =1 C(℘) Client running on processor ℘. C1 C1 F C C Frontlog for client . C 2 C 1 C 2 C 1 C 1 C 3 C 2 C 1 C 2 C 1 C 1 C 3 Table 2: GR3MP terminology C C C C C C t p 2 1 1 1 1 1 1

3 F =0 F =0 processor needs to be scheduled, GR MP selects the client C1 C1 that would run next under GR3, essentially scheduling mul- tiple processors from its central run queue as GR3 sched- Figure 3: GR3 multiprocessor scheduling. The two pro- ules a single processor. However, there is one obstacle to cessors schedule either from the central queue, or use the simply applying a uniprocessor algorithm on a multipro- frontlog mechanism when the task is already running. cessor system. Each client can only run on one processor at any given time. As a result, GR3MP cannot select a CHEDULE(℘k) client to run that is already running on another processor MP-S even if GR3 would schedule that client in the uniproces- 1 C ← C(℘k) £ Client just run sor case. For example, if GR3 would schedule the same 2 if C = NIL client consecutively, GR3MP cannot schedule that client 3 then if N

0 GR3MP introduces the notion of a frontlog. The front- 6 then FC ← FC − 1 k 7 return C log FC for some client C running on a processor ℘ (C = C(℘k)) is defined as the number of time quanta for C accu- 8 C ← INTERGROUP-SCHEDULE() mulated as C gets selected by GR3 and cannot run because 9 while ∃℘ s.t. C = C(℘) k 10 do F ← F +1 it is already running on ℘ . The frontlog FC is then queued C C up on ℘k. 11 C ← INTERGROUP-SCHEDULE() 12 return C Given a client that would be scheduled by GR3 but is already running on another processor, GR3MP uses the 3 frontlog to assign the client a time quantum now but de- To illustrate GR MP scheduling, Figure 3 shows an fer the client’s use of it until later. Whenever a proces- example on a dual-processor system with three clients C1, sor finishes running a client for a time quantum, GR3MP C2,andC3 of weights 3, 2, and 1, respectively. C1 and checks whether the client has a non-zero frontlog, and, if C2 will then be part of the order 1 group (assume C2 is so, continues running the client for another time quantum before C1 in the round-robin queue of this group), whereas 3 and decrements its frontlog by one, without consulting the C3 is part of the order 0 group. The GR schedule is C2, 1 2 central queue. The frontlog mechanism not only ensures C1, C2, C1, C1, C3. ℘ will then select C2 to run, and ℘ 1 3 that a client receives its proportional share allocation, it also selects C1.When℘ finishes, according to GR , it will 2 takes advantage of any cache affinity by continuing to run select C2 once more, whereas ℘ selects C1 again. When 1 3 the client on the same processor. ℘ again selects the next GR client, which is C1, it finds that it is already running on ℘2 and thus we set F =1 When a processor finishes running a client for a time C1 and select the next client, which is C , to run on ℘1.When quantum and its frontlog is zero, we call the processor idle. 3 ℘2 finishes running C for its second time quantum, it finds GR3MP schedules a client to run on the idle processor 1 F =1,setsF =0and continues running C without by performing a GR3 scheduling decision on the central C1 C1 1 any scheduling decision on the GR3 queue. queue. If the selected client is already running on some other processor, we increase its frontlog and repeat the GR3 3.2 GR3MP Dynamic Considerations scheduling, each time incrementing the frontlog of the se- lected client, until we find a client that is not currently run- GR3MP basically does the same thing as the GR3 ning. We assign this client to the idle processor for one time algorithm under dynamic considerations. However, the quantum. This description assumes that there are least P +1 frontlogs used in GR3MP need to be accounted for ap- clients in the system. Otherwise, scheduling is easy: an idle propriately. If some processors have long frontlogs for their processor will either run the client it just ran, or idles un- currently running clients, newly arriving clients may not be til more clients arrive. In effect, each client will simply be run by those processors until their frontlogs are processed, assigned its own processor. Whenever a processor needs to resulting in bad responsiveness for the new clients. Al- perform a scheduling decision, it thus executes the follow- though in between any two client arrivals or departures, ing routine: some processors must have no frontlog, the set of such

342 2005 USENIX Annual Technical Conference USENIX Association processors can be as small as a single processor. In this S1,N = C1,C2,...,CN with φ1 ≥ φ2 ≥ ... ≥ φN . case, newly arrived clients will end up competing with other We call the subsequence Sk,N = Ck,Ck+1,...,.CN Q- clients already in the run queue only for those few proces- ≤ 1 N feasible,ifφk Q Pj=k φj . sors, until the frontlog on the other processors is exhausted. GR3MP provides fair and responsive allocations by Lemma 1. The client mix in the system is feasible if and creating frontlogs for newly arriving clients. Each new only if S1,N is P -feasible. client is assigned a frontlog equal to a fraction of the to- ΦT tal current frontlog in the system based on its proportional Proof. If φ1 > P , C1 is infeasible, so the mix is infea- ≤ ΦT share. Each processor now maintains a queue of frontlog sible. Conversely, if φ1 P , then for any client Cl, ≤ ≤ ΦT clients and a new client with a frontlog is immediately as- φl φ1 P , implying all clients are feasible. The mix signed to one of the processor frontlog queues. Rather than ΦT 1 N is then feasible ⇐⇒ φ1 ≤ = Pj=1 φj , or, equiva- running its currently running client until it completes its P P lently, S1,N is P -feasible. frontlog, each processor now round robins among clients in its frontlog queue. Given that frontlogs are small in prac- Lemma 2. S is Q-feasible ⇒ S is (Q − 1)- tice, round-robin scheduling is used for frontlog clients for k,N k+1,N feasible. its simplicity and fairness. GR3MP balances the frontlog load on the processors by placing new frontlog clients on 1 N Proof. φk ≤ P φj ⇐⇒ Qφk ≤ φk + the processor with the smallest frontlog summed across all Q j=k N φ ⇐⇒ φ ≤ 1 N φ φ ≤ its frontlog clients. Pj=k+1 j k Q−1 Pj=k+1 j.Since k+1 φ , the lemma follows. More precisely, whenever a client C arrives, and k it belongs in group G(C), GR3MP performs the same group operations as in the single processor GR3 algorithm. The feasibility problem is then to identify the least GR3MP finds the processor ℘k with the smallest front- k (denoted the feasibility threshold, f) such that Sk,N is log, then creates a frontlog for client C on ℘k of length (P − k +1)-feasible. If f =1, then the client mix is feasi- φ F = F C ,whereF is the total frontlog on all the ble. Otherwise, the infeasible set S1,f−1 = C1,...,Cf−1 C T ΦT T processors. Let C0 = C(℘k). Then, assuming no further contains the infeasible clients, whose weight needs to be clients arrive, ℘k will round-robin between C and C0 and scaled down to 1/P of the resulting total weight. The car- 0 dinality f − 1 of the infeasible set is less than P . However, run C for FC and C for FC0 time quanta. S When a client becomes not runnable, GR3MP uses the sorted sequence 1,N is expensive to maintain, such that the same lazy removal mechanism used in GR3. If it is traversing it and identifying the feasibility threshold is not removed from the run queue and has a frontlog, GR3MP an efficient solution. GR3MP leverages its grouping strategy to perform simply discards it since each client is assigned a frontlog 3 based on the current state of the system when it becomes fast weight readjustment. GR MP starts with the unmod- runnable again. ified client weights, finds the set I of infeasible clients, and adjust their weights to be feasible. 
To construct I, the al- 3.3 GR3MP Weight Readjustment gorithm traverses the list of groups in decreasing order of their group order σG, until it finds a group not all of whose Since no client can run on more than one processor clients are infeasible. We denote by |I| the cardinality of I at a time, no client can consume more than a 1/P fraction and by ΦI the sum of weights of the clients in I, PC∈I φC . of the processing in a multiprocessor system. A client C The GR3MP weight readjustment algorithm is as follows: with weight φC greater than ΦT /P is considered infeasi- ble since it cannot receive its proportional share allocation WEIGHT-READJUSTMENT() φC /ΦT without using more than one processor simultane- ously. GR3MP should then give the client its maximum 1RESTORE-ORIGINAL-WEIGHTS ←∅ possible service, and simply assign such a client its own 2 I 3 G ← greatest order group processor to run on. However, since the scheduler uses − − |G| ΦT ΦI ΦG client weights to determine which client to run, an infea- 4 while and P −|I|−|G| sible client’s weight must be adjusted so that it is feasi- 5 do I ← I ∪ G ble to ensure that the scheduling algorithm runs correctly 6 G ← next(G) £ by group order to preserve fairness (assuming there are at least P clients). 7 if |G| < 2(P −|I|) GR3MP potentially needs to perform weight readjustment 8 then I ← I ∪ INFEASIBLE(G, P −|I|, ΦT − ΦI ) f whenever a client is inserted or removed from the run queue 9 ΦT ← ΦT − ΦI ← P f to make sure that all weights are feasible. 10 ΦT P −|I| ΦT To understand the problem of weight readjustment, 11 for each C ∈ I consider the sequence of all clients, ordered by weight: ← ΦT 12 do φC P

USENIX Association 2005 USENIX Annual Technical Conference 343 f The correctness of the algorithm is based on Lemma weights of all feasible clients, ΦT =ΦT − ΦI . We can 2. Let some group G span the subsequence Si,j of the se- now compute the new total weight in the system as ΦT = f f σG+1 − ≥ ≥ P | | x quence of ordered clients S1,N .Then2 1 φi P −|I| ΦT , namely the solution to the equation ΦT + I P = σG . ..≥ φj ≥ 2 and it is easy to show: x. Once we have the adjusted ΦT , we change all the weights ΦT − − for the infeasible clients in I to . Lemma 6 in Section 4.2 • 2σG > ΦT ΦI ΦG ⇒ j 1 (Φ − Φ ) C Q−|I| I fairness, some algorithms stay closer to perfect fairness than 4 then I ← I ∪{C} others and therefore have less service error. We quantify 5 else return I how close an algorithm gets to perfect fairness using the 6 return I client service time error, which is the difference between

3 the service received by client C and its share of the total GR MP can alternatively use a more complicated φC work done by the processor: eC = wC − WT . A pos- but lower time complexity divide-and-conquer algorithm to ΦT itive service time error indicates that a client has received find the infeasible clients in G. In this case, GR3MP par- more than its ideal share over a time interval; a negative titions G around its median C¯ into G ,thesetofG clients S error indicates that it has received less. To be precise, the that have weight less than φC¯ and GB, the set of G clients error eC measures how much time a client C has received that have weight larger than φ ¯ . By Lemma 2, if C¯ is fea- C beyond its ideal allocation. A proportional share scheduler sible, G ∪{C¯} is feasible, and we recurse on G . Other- S B should minimize the absolute value of the allocation error wise, all clients in G ∪{C¯} are infeasible, and we recurse B of all clients with minimal scheduling overhead. on G to find all infeasible clients. The algorithm finishes S We provide bounds on the service error of GR3 and when the set we need to recurse on is empty: GR3MP. To do this, we define two other measures of ser- vice error. The group service time error is a similar mea- INFEASIBLE(G, Q, Φ) sure for groups that quantifies the fairness of allocating the 1 if G = ∅ processor among groups: E = W − W ΦG .Thegroup- G G T ΦT 2 then return ∅ relative service time error represents the service time error ¯ 3 C ← MEDIAN(G) of client C if there were only a single group G = G(C) 4 (GS,GB ) ← PARTITION(G, φC¯ ) in the scheduler and is a measure of the service error of a Φ−ΦG 5 if φ ¯ > B client with respect to the work done on behalf of its group: C Q−|GB | φC ∪{¯}∪ eC,G = wC − WG . We first show bounds on the group 6 then return GB C ΦG service error of the intergroup scheduling algorithm. We INFEASIBLE(GS ,Q−|GB|−1, Φ − ΦGB − φC¯ ) 7 else return INFEASIBLE(GB ,Q,Φ) then show bounds on the group-relative service error of the intragroup scheduling algorithm. We combine these results Once all infeasible clients have been identified, to obtain the overall client service error bounds. We also WEIGHT-READJUSTMENT() determines the sum of the discuss the scheduling overhead of GR3 and GR3MP in

344 2005 USENIX Annual Technical Conference USENIX Association terms of their time complexity. We show that both algo- Dynamic Fairness of GR3 We can consider a client ar- rithms can make scheduling decisions in O(1) time with rival or removal as an operation where a group is first re- O(1) service error given a constant number of groups. Due moved from the group list and added in a different place to space constraints, most of the proofs are omitted. Further with a different weight. We argue that fairness is pre- proof details are available in [5]. served by these operations: when group Gk is removed, then Gk−1, Gk,andGk+1 were well-ordered as defined 3 4.1 Analysis of GR in (2), so after the removal, Gk−1 and Gk+1, now neigh- bors, will be well-ordered by transitivity. When a group, Intergroup Fairness For the case when the weight ratios call it G , is inserted between G and G ,itcanbe of consecutive groups in the group list are integers, we get i+(1/2) i i+1 proven that the work readjustment formula in Section 2.3 the following: ensures Gi+(1/2) and Gi+1 are well-ordered. In the case of Φj Gi and Gi+(1/2), we can show that we can achieve well- Lemma 3. If ∈ N, 1 ≤ jk, Wj+1 ≤ Wj ≤ Wj+1 +1 − 1 as a Φj+1 Φj Φj+1 Φj fected. Since the intragroup algorithm has constant fairness consequence of the well-ordering condition (2). After some bounds, the disruption for the work received by clients in- rearrangements, we can sum over all j and bound Wk,and side the adjusted group is only O(1). thus E above and below. The additive 1 will cause the Gk Φj − g 1 upper bound. Time Complexity GR3 manages to bound its service er- In the general case, we get similar, but slightly weaker ror by O(g2) while maintaining a strict O(1) scheduling bounds. overhead. The intergroup scheduler either selects the next

(g−k)(g−k−1) Φk group in the list, or reverts to the first one, which takes con- Lemma 4. For any group Gk, − − 1 < 2 ΦT stant time. The intragroup scheduler is even simpler, as it E

3 (g−1)(g−2) φC Overall Fairness of GR Based on the identity eC = Theorem 2. − − 4

USENIX Association 2005 USENIX Annual Technical Conference 345 that is not running on any other processor. Since for each SRR [9]. The simulator enabled us to isolate the impact such client, we increase its allocation on the processor it of the scheduling algorithms themselves and examine the runs, the amortized time complexity remains O(1). The up- scheduling behavior of these different algorithms across per bound on the time that any single scheduling decision hundreds of thousands of different combinations of clients takes is given by the maximum length of any scheduling with different weight values. sequence of GR3 consisting of only some fixed subset of Section 5.2 presents detailed measurements of real P − 1 clients. kernel scheduler performance by comparing our prototype GR3 Linux implementation against the standard Linux Theorem 3. The time complexity per scheduling decision scheduler, a WFQ scheduler, and a VTRR scheduler. The in GR3MP is bounded above by (g−k)(g−k+1) +(k + 2 experiments we have done quantify the scheduling over- 1)(g − k +1)P +1 1 ≤ k ≤ g where . head and proportional share allocation accuracy of these Thus, the length of any schedule consisting of at most schedulers in a real operating system environment under a P − 1 clients is O(g2P ). Even when a processor has front- number of different workloads. logs for several clients queued up on it, it will schedule in All our kernel scheduler measurements were per- O(1) time, since it performs round-robin among the front- formed on an IBM Netfinity 4500 system with one or two logged clients. Client arrivals and departures take O(g) 933 MHz Intel Pentium III CPUs, 512 MB RAM, and 9 time because of the need to readjust group weights in the GB hard drive. The system was installed with the Debian saved list of groups. Moreover, if we also need to use GNU/Linux distribution version 3.0 and all schedulers were the weight readjustment algorithm, we incur an additional implemented using Linux kernel version 2.4.19. The mea- O(P ) overhead on client arrivals and departures. surements were done by using a minimally intrusive trac- ing facility that writes timestamped event identifiers into Lemma 6. The complexity of the weight readjustment al- a memory log and takes advantage of the high-resolution gorithm is O(P ). clock cycle counter available with the Intel CPU, providing Proof. Restoring the original weights will worst case touch measurement resolution at the granularity of a few nanosec- a number of groups equal to the number of previously in- onds. Getting a timestamp simply involved reading the feasible clients, which is O(P ). Identifying the infeasible hardware cycle counter register. We measured the times- clients involves iterating over at most P groups in decreas- tamp overhead to be roughly 35 ns per event. ing sequence based on group order, as described in Sec- The kernel scheduler measurements were performed tion 3.3. For the last group considered, we only attempt on a fully functional system. All experiments were per- to partition it into feasible and infeasible clients of its size formed with all system functions running and the system is less than 2P . Since partitioning of a set can be done in connected to the network. At the same time, an effort was linear time, and we recurse on a subset half the size, this made to eliminate variations in the test environment to make operation is O(P ) as well. the experiments repeatable.

For small P ,theO(P log(P )) sorting approach to 5.1 Simulation Studies determine infeasible clients in the last group considered is simpler and in practice performs better than the O(P ) re- We built a scheduling simulator that measures the cursive partitioning. Finally, altering the active group struc- service time error, described in Section 4, of a scheduler ture to reflect the new weights is a O(P + g) operation, as on a set of clients. The simulator takes four inputs, the two groups may need to be re-inserted in the ordered list of scheduling algorithm, the number of clients N, the total groups. sum of weights ΦT , and the number of client-weight combi- nations. The simulator randomly assigns weights to clients and scales the weights to ensure that they add up to ΦT . 5 Measurements and Results It then schedules the clients using the specified algorithm We have implemented GR3 uniprocessor and multiproces- as a real scheduler would, assuming no client blocks, and sor schedulers in the Linux operating system and measured tracks the resulting service time error. The simulator runs their performance. We present some experimental data the scheduler until the resulting schedule repeats, then com- quantitatively comparing GR3 performance against other putes the maximum (most positive) and minimum (most popular scheduling approaches from both industrial prac- negative) service time error across the nonrepeating portion tice and research. We have conducted both extensive sim- of the schedule for the given set of clients and weight as- ulation studies and detailed measurements of real kernel signments. This process is repeated for the specified num- scheduler performance on real applications. ber of client-weight combinations. We then compute the Section 5.1 presents simulation results comparing maximum service time error and minimum service time er- the proportional sharing accuracy of GR3 and GR3MP ror for the specified number of client-weight combinations against WRR, WFQ [18], SFQ [13], VTRR [17], and to obtain a “worst-case” error range.

346 2005 USENIX Annual Technical Conference USENIX Association Service Error Service Error Service Error Service Error 800 0 20000 9000 600 -200 10000 6000 400 -400 0 3000 200 -600 0 -10000 0 -800 -3000

65000 65000 65000 65000 130000 130000 130000 130000 195000 195000 195000 195000 260000 4000 2000 260000 4000 2000 260000 4000 2000 260000 4000 2000 8000 6000 8000 6000 8000 6000 8000 6000 Sum of weights Number of clients Sum of weights Number of clients Sum of weights Number of clients Sum of weights Number of clients Figure 4: WRR error Figure 5: WFQ error Figure 6: SFQ error Figure 7: VTRR error

Service Error Service Error Service Error Scheduling Decisions per Task Selection 400 10 200 2 2 8 6 0 0 0 4 -200 -2 -2 2 -400 0

65000 65000 65000 65000 130000 130000 130000 130000 195000 195000 195000 195000 260000 4000 2000 260000 4000 2000 260000 4000 2000 260000 4000 2000 8000 6000 8000 6000 8000 6000 8000 6000 Sum of weights Number of clients Sum of weights Number of clients Sum of weights Number of clients Sum of weights Number of clients Figure 8: SRR error Figure 9: GR3 error Figure 10: GR3MP error Figure 11: GR3MP over- head

To measure proportional fairness accuracy, we ran time error since it is bounded below at −1. Similarly, GR3 simulations for each scheduling algorithm on 45 different has a much smaller negative service error than SFQ, though combinations of N and ΦT (32 up to 8192 clients and SFQ’s positive error is less since it is bounded above at 1. 16384 up to 262144 total weight, respectively). Since the Considering the total service error range of each scheduler, proportional sharing accuracy of a scheduler is often most GR3 provides well over two orders of magnitude better pro- clearly illustrated with skewed weight distributions, one portional sharing accuracy than WRR, WFQ, SFQ, VTRR, of the clients was given a weight equal to 10 percent of and SRR. Unlike the other schedulers, these results show 3 ΦT . All of the other clients were then randomly assigned that GR combines the benefits of low service time errors weights to sum to the remaining 90 percent of ΦT . For each with its ability to schedule in O(1) time. pair (N,ΦT ), we ran 2500 client-weight combinations and Note that as the weight skew becomes more accentu- determined the resulting worst-case error range. ated, the service error can grow dramatically. Thus, increas- The worst-case service time error ranges for WRR, ing the skew from 10 to 50 percent results in more than WFQ, SFQ, VTRR, SRR, and GR3 with these skewed a fivefold increase in the error magnitude for SRR, WFQ, weight distributions are in Figures 4 to 9. Due to space and SFQ, and also significantly worse errors for WRR and constraints, WF2Q error is not shown since the results sim- VTRR. In contrast, the error of GR3 is still bounded by ply verify its known mathematical error bounds of −1 and small constants: −2.3 and 4.6. 1 tu. Each figure consists of a graph of the error range for We also measured the service error of GR3MP using the respective scheduling algorithm. Each graph shows two this simulator configured for an 8 processor system, where surfaces representing the maximum and minimum service the weight distribution was the same as for the uniprocessor time error as a function of N and ΦT for the same range simulations above. Note that the client given 0.1 of the total of values of N and ΦT . Figure 4 shows WRR’s service 1 weight was feasible, since 0.1 < 8 =0.125. Figure 10 time error is between −12067 tu and 23593 tu. Figure 5 shows GR3MP’s service error is between −2.5 tu and 2.8 shows WFQ’s service time error is between −1 tu and 819 tu, slightly better than for the uniprocessor case, a benefit tu, which is much less than WRR. Figure 6 shows SFQ’s of being able to run multiple clients in parallel. Figure 11 service time error is between −819 tu and 1 tu, which is shows the maximum number of scheduling decisions that almost a mirror image of WFQ. Figure 7 shows VTRR’s an idle processor needs to perform until it finds a client that service error is between −2129 tu and 10079 tu. Figure 8 is not running. This did not exceed seven, indicating that shows SRR’s service error is between −369 tu and 369 tu. the number of decisions needed in practice is well below In comparison, Figure 9 shows the service time er- the worst-case bounds shown in Theorem 3. ror for GR3 only ranges from −2.5 to 3.0 tu. GR3 has a smaller error range than all of the other schedulers mea- 5.2 Linux Kernel Measurements sured except WF2Q. GR3 has both a smaller negative and smaller positive service time error than WRR, VTRR, and To evaluate the scheduling overhead of GR3,we SRR. 
While GR3 has a much smaller positive service er- compare it against the standard Linux 2.4 scheduler, a WFQ ror than WFQ, WFQ does have a smaller negative service scheduler, and a VTRR scheduler. Since WF2Q has the-

USENIX Association 2005 USENIX Annual Technical Conference 347 100 As shown in Figure 12, the increase in scheduling GR3 overhead as the number of clients increases varies a great

) VTRR 3 s WFQ [O(log N)] deal between different schedulers. GR has the smallest u (

t WFQ [O(N)] scheduling overhead. It requires roughly 300 ns to select a s

o Linux

c client to execute and the scheduling overhead is essentially

10 GR3 MP g

n Linux MP constant for all numbers of clients. While VTRR schedul- i l 3 u ing overhead is also constant, GR has less overhead be- d e

h cause its computations are simpler to perform than the vir- c s tual time calculations required by VTRR. In contrast, the

e 1 g

a overhead for Linux and for O(N) WFQ scheduling grows r e

v linearly with the number of clients. Both of these sched- A ulers impose more than 200 times more overhead than GR3 0.1 when scheduling a mix of 400 clients. O(log N) WFQ has 0 50 100 150 200 250 300 350 400 much smaller overhead than Linux or O(N) WFQ, but it Number of clients still imposes significantly more overhead than GR3, with 8 times more overhead than GR3 when scheduling a mix of Figure 12: Average scheduling overhead 400 clients. Figure 12 also shows that GR3MP provides the same O(1) scheduling overhead on a multiprocessor, oretically the same time complexity as WFQ (but with although the absolute time to schedule is somewhat higher larger constants, because of the complexity of its steps), we 2 due to additional costs associated with scheduling in mul- present WFQ as a lower bound for the overhead of WF Q. tiprocessor systems. The results show that GR3MP pro- We present results from several experiments that quantify vides substantially lower overhead than the standard Linux how scheduling overhead varies as the number of clients scheduler, which suffers from complexity that grows lin- increases. For the first experiment, we measure scheduling early with the number of clients. Because of the impor- overhead for running a set of clients, each of which exe- tance of constant scheduling overhead in server systems, cuted a simple micro-benchmark which performed a few Linux has switched to Ingo Molnar’s O(1) scheduler in the operations in a while loop. A control program was used to Linux 2.6 kernel. As a comparison, we also repeated this fork a specified number of clients. Once all clients were microbenchmark experiment with that scheduler and found runnable, we measured the execution time of each schedul- that GR3 still runs over 30 percent faster. ing operation that occurred during a fixed time duration of 30 seconds. The measurements required two timestamps As another experiment, we measured the scheduling for each scheduling decision, so measurement error of 70 ns overhead of the various schedulers for hackbench [21], are possible due to measurement overhead. We performed a Linux benchmark used for measuring scheduler perfor- these experiments on the standard Linux scheduler, WFQ, mance with large numbers of processes entering and leav- VTRR, and GR3 for 1 to 400 clients. ing the run queue at all times. It creates groups of readers Figure 12 shows the average execution time required and writers, each group having 20 reader tasks and 20 writer by each scheduler to select a client to execute. Results for tasks, and each writer writes 100 small messages to each of GR3, VTRR, WFQ, and Linux were obtained on uniproces- the other 20 readers. This is a total of 2000 messages sent sor system, and results for GR3MP and LinuxMP were ob- per writer, per group, or 40000 messages per group. We tained running on a dual-processor system. Dual-processor ran a modified version of hackbench to give each reader results for WFQ and VTRR are not shown since MP-ready and each writer a random weight between 1 and 40. We implementations of them were not available. performed these tests on the same set of schedulers for 1 group up to 100 groups. Using 100 groups results in up to For this experiment, the particular implementation 8000 processes running. 
Because hackbench frequently in- details of the WFQ scheduler affect the overhead, so serts and removes clients from the run queue, the cost of we include results from two different implementations of client insertion and removal is a more significant factor for WFQ. In the first, labeled “WFQ [O(N)]”, the run queue this benchmark. The results show that the simple dynamic is implemented as a simple linked list which must be group adjustments described in Section 2.3 have low over- searched on every scheduling decision. The second, labeled head, since O(g) can be considered constant in practice. “WFQ [O(log N)]”, uses a heap-based priority queue with O(log N) insertion time. To maintain the heap-based pri- Figure 13 shows the average scheduling overhead for ority queue, we used a fixed-length array. If the number each scheduler. The average overhead is the sum of the of clients ever exceeds the length of the array, a costly ar- times spent on all scheduling events, selecting clients to run ray reallocation must be performed. Our initial array size and inserting and removing clients from the run queue, di- was large enough to contain more than 400 clients, so this vided by the number of times the scheduler selected a client additional cost is not reflected in our measurements. to run. The overhead in Figure 13 is higher than the av-

Figure 13: Hackbench weighted scheduling overhead

Figure 13 shows the average scheduling overhead for each scheduler. The average overhead is the sum of the times spent on all scheduling events, selecting clients to run and inserting and removing clients from the run queue, divided by the number of times the scheduler selected a client to run. The overhead in Figure 13 is higher than the average cost per schedule in Figure 12 for all the schedulers measured since Figure 13 includes a significant component of time due to client insertion and removal from the run queue. GR3 still has by far the smallest scheduling overhead among all the schedulers measured. The overhead for GR3 remains constant while the overhead for O(log N) WFQ, O(N) WFQ, VTRR, and Linux grows with the number of clients. Client insertion, removal, and selection to run in GR3 are independent of the number of clients. The cost for GR3 is 3 times higher than before, with client selection to run, insertion, and removal each taking approximately 300 to 400 ns. For VTRR, although selecting a client to run is also independent of the number of clients, insertion overhead grows with the number of clients, resulting in much higher VTRR overhead for this benchmark.

To demonstrate GR3's efficient proportional sharing of resources on real applications, we briefly describe three simple experiments running web server workloads using the same set of schedulers: GR3 and GR3MP, Linux 2.4 uniprocessor and multiprocessor schedulers, WFQ, and VTRR. The web server workload emulates a number of virtual web servers running on a single system. Each virtual server runs the guitar music search engine used at guitarnotes.com, a popular musician resource web site with over 800,000 monthly users. The search engine is a perl script executed from an Apache mod-perl module that searches for guitar music by title or author and returns a list of results. The web server workload configured each server to pre-fork 100 processes, each running consecutive searches simultaneously.

We ran multiple virtual servers with each one having different weights for its processes. In the first experiment, we used six virtual servers, with one server having all its processes assigned weight 10 while all other servers had processes assigned weight 1. In the second experiment, we used five virtual servers and processes assigned to each server had respective weights of 1, 2, 3, 4, and 5. In the third experiment, we ran five virtual servers which assigned a random weight between 1 and 10 to each process. For the Linux scheduler, weights were assigned by selecting nice values appropriately.

Figures 14 to 19 present the results from the first experiment with one server with weight 10 processes and all other servers with weight 1 processes. The total load on the system for this experiment consisted of 600 processes running simultaneously. For illustration purposes, only one process from each server is shown in the figures. Conclusions drawn from the other experiments are the same; those results are omitted due to space constraints.

GR3 and GR3MP provided the best overall proportional fairness for these experiments while Linux provided the worst overall proportional fairness. Figures 14 to 17 show the amount of processor time allocated to each client over time for the Linux scheduler, WFQ, VTRR, and GR3. All of the schedulers except GR3 and GR3MP have a pronounced “staircase” effect for the search engine process with weight 10, indicating that CPU resources are provided in irregular bursts over a short time interval. For the applications which need to provide interactive responsiveness to web users, this can result in extra delays in system response time. It can be inferred from the smoother curves of Figure 17 that GR3 and GR3MP provide fair resource allocation at a finer granularity than the other schedulers.

Figures 14 to 19: CPU allocation (s) over time (s), one process shown per virtual server. Figure 14: Linux uniprocessor; Figure 15: WFQ uniprocessor; Figure 16: VTRR uniprocessor; Figure 17: GR3 uniprocessor; Figure 18: Linux multiprocessor; Figure 19: GR3MP multiprocessor.

6 Related Work

Round robin is one of the oldest, simplest and most widely used proportional share scheduling algorithms. Weighted round-robin (WRR) supports non-uniform client weights by running all clients with the same frequency but adjusting the size of their time quanta in proportion to their respective weights. Deficit round-robin (DRR) [22] was developed to support non-uniform service allocations in packet scheduling. These algorithms have low O(1) complexity but poor short-term fairness, with service errors that can be on the order of the largest client weight in the system. GR3 uses a novel variant of DRR for intragroup scheduling with O(1) complexity, but also provides O(1) service error by using its grouping mechanism to limit the effective range of client weights considered by the intragroup scheduler.

Fair-share schedulers [12, 14, 15] provide proportional sharing among users in a way compatible with a UNIX-style time-sharing framework based on multi-level feedback with a set of priority queues. These schedulers typically had low O(1) complexity, but were often ad-hoc and could not provide any proportional fairness guarantees. Empirical measurements show that these approaches only provide reasonable proportional fairness over relatively large time intervals [12].

Lottery scheduling [26] gives each client a number of tickets proportional to its weight, then randomly selects a ticket. Lottery scheduling takes O(log N) time and relies on the law of large numbers for providing proportional fairness. Thus, its allocation errors can be very large, typically much worse than WRR for clients with smaller weights.
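To make the ticket-based selection concrete, the following minimal sketch draws a random ticket and scans the client list. The client names and ticket counts are invented for the example; a tree over cumulative ticket counts would give the O(log N) selection time noted above.

    /*
     * Minimal sketch of lottery selection: each client holds tickets in
     * proportion to its weight, and the winner is the client whose ticket
     * range contains a random draw.  Linear scan for clarity only.
     */
    #include <stdio.h>
    #include <stdlib.h>

    struct client { const char *name; unsigned int tickets; };

    static struct client *lottery_pick(struct client *c, size_t n)
    {
        unsigned long total = 0, winner, sum = 0;
        for (size_t i = 0; i < n; i++)
            total += c[i].tickets;
        winner = (unsigned long)(rand() % total);   /* random ticket */
        for (size_t i = 0; i < n; i++) {
            sum += c[i].tickets;
            if (winner < sum)
                return &c[i];
        }
        return &c[n - 1];                           /* not reached */
    }

    int main(void)
    {
        struct client cs[] = { { "A", 10 }, { "B", 1 }, { "C", 1 } };
        int wins[3] = { 0 };
        for (int i = 0; i < 12000; i++)
            wins[lottery_pick(cs, 3) - cs]++;
        /* expect roughly 10:1:1 over many draws (law of large numbers) */
        printf("A=%d B=%d C=%d\n", wins[0], wins[1], wins[2]);
        return 0;
    }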

Weighted Fair Queueing (WFQ) [11, 18] was first developed for network packet scheduling, and later applied to uniprocessor scheduling [26]. It assigns each client a virtual time and schedules the client with the earliest virtual time. Other fair queueing variants such as Virtual-clock [28], SFQ [13], SPFQ [24], and Time-shift FQ [10] have also been proposed. These approaches all have O(log N) time complexity, where N is the number of clients, because the clients must be ordered by virtual time. It has been shown that WFQ guarantees that the service time error for any client never falls below −1 [18]. However, WFQ can allow a client to get far ahead of its ideal allocation and accumulate a large positive service time error of O(N), especially with skewed weight distributions.

Several fair queueing approaches have been proposed for reducing this O(N) service time error. A hierarchical scheduling approach [26] reduces service time error to O(log N). Worst-Case Weighted Fair Queueing (WF2Q) [1] introduced eligible virtual times and can guarantee both a lower and upper bound on error of −1 and +1, respectively, for network packet scheduling. It has also been applied to uniprocessor scheduling as Eligible Virtual Deadline First (EEVDF) [25]. These algorithms provide stronger proportional fairness guarantees than other approaches, but are more difficult to implement and still require at least O(log N) time.
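The virtual-time mechanism shared by these schedulers can be illustrated with a minimal sketch: each client's virtual time advances in inverse proportion to its weight, and the scheduler always runs the client with the earliest virtual time. The linear scan below is for clarity only; keeping clients ordered by virtual time in a heap is what yields the O(log N) cost discussed above, and the client names and quantum are illustrative.

    /*
     * Sketch of virtual-time (WFQ-style) selection adapted to CPU
     * scheduling: run the client with the earliest virtual time, then
     * advance its virtual time by quantum/weight.
     */
    #include <stdio.h>

    struct client { const char *name; double weight; double vtime; };

    static struct client *pick_earliest(struct client *c, int n)
    {
        struct client *best = &c[0];
        for (int i = 1; i < n; i++)
            if (c[i].vtime < best->vtime)
                best = &c[i];
        return best;
    }

    int main(void)
    {
        struct client cs[] = { { "A", 3.0, 0.0 }, { "B", 1.0, 0.0 } };
        for (int slot = 0; slot < 8; slot++) {
            struct client *cur = pick_earliest(cs, 2);
            cur->vtime += 1.0 / cur->weight;      /* charge one quantum */
            printf("%s ", cur->name);
        }
        printf("\n");                             /* A runs ~3x as often as B */
        return 0;
    }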
Motivated by the need for faster schedulers with good fairness guarantees, one of the authors developed Virtual-Time Round-Robin (VTRR) [17]. VTRR first introduced the simple idea of going round-robin through clients but skipping some of them at different frequencies without having to reorder clients on each schedule. This is done by combining round-robin scheduling with a virtual time mechanism. GR3's intergroup scheduler builds on VTRR but uses weight ratios instead of virtual times to provide better fairness. Smoothed Round Robin (SRR) [9] uses a different mechanism for skipping clients using a Weight Matrix and Weight Spread Sequence (WSS) to run clients by simulating a binary counter. VTRR and SRR provide proportional sharing with O(1) time complexity for selecting a client to run, though inserting and removing clients from the run queue incur higher overhead: O(log N) for VTRR and O(k) for SRR, where k = log φmax and φmax is the maximum client weight allowed. However, unlike GR3, both algorithms can suffer from large service time errors especially for skewed weight distributions. For example, we can show that the service error of SRR is worst-case O(kN).

Grouping clients to reduce scheduling complexity has been used by [20], [8] and [23]. These fair queueing approaches group clients into buckets based on client virtual timestamps. With the exception of [23], which uses exponential grouping, the fairness of these virtual time bin sorting schemes depends on the granularity of the buckets and is adversely affected by skewed client weight distributions. On the other hand, GR3 groups based on client weights, which are relatively static, and uses groups as schedulable entities in a two-level scheduling hierarchy.

The grouping strategy used in GR3 was first introduced by two of the authors for uniprocessor scheduling [6] and generalized by three of the authors to network packet scheduling [4]. A similar grouping strategy was independently developed in Stratified Round Robin (StRR) [19] for network packet scheduling.

StRR distributes all clients with weights between 2^-k and 2^-(k-1) into class Fk (F here not to be confused with our frontlog). StRR splits time into scheduling slots and then makes sure to assign all the clients in class Fk one slot every scheduling interval, using a credit and deficit scheme within a class. This is also similar to GR3, with the key difference that a client can run for up to two consecutive time units, while in GR3, a client is allowed to run only once every time its group is selected, regardless of its deficit.

StRR has weaker fairness guarantees and higher scheduling complexity than GR3. StRR assigns each client weight as a fraction of the total processing capacity of the system. This results in weaker fairness guarantees when the sum of these fractions is not close to the limit of 1. For example, if we have N = 2^k + 1 clients, one of weight 0.5 and the rest of weight 2^-(k+2) (total weight = 0.75), StRR will run the clients in such a way that after 2^(k+1) slots, the error of the large client is −N/3, such that this client will then run uninterruptedly for N time units to regain its due service. Client weights could be scaled to reduce this error, but with additional O(N) complexity. StRR requires O(g) worst-case time to determine the next class that should be selected, where g is the number of groups. Hardware support can hide this complexity assuming a small, predefined maximum number of groups [19], but running an StRR processor scheduler in software still requires O(g) complexity.

GR3 also differs from StRR and other deficit round-robin variants in its distribution of deficit. In DRR, SRR, and StRR, the variation in the deficit of all the clients affects the fairness in the system. To illustrate this, consider N+1 clients, all having the same weight except the first one, whose weight is N times larger. If the deficit of all the clients except the first one is close to 1, the error of the first client will be about N/2 = O(N). Therefore, the deficit mechanism as employed in round-robin schemes does not allow for better than O(N) error. In contrast, GR3 ensures that a group consumes all the work assigned to it, so that the deficit is a tool used only in distributing work within a certain group, and not within the system. Thus, groups effectively isolate the impact of unfortunate distributions of deficit in the scheduler. This allows the error bounds in GR3 to depend only on the number of groups instead of the much larger number of clients.
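To make the deficit mechanism under discussion concrete, the following is a minimal, generic deficit round-robin loop in the style of DRR [22], not GR3's intragroup variant: each round a client earns credit in proportion to its weight, runs in whole time units, and carries the remainder forward as its deficit. It is this per-client carried deficit that the argument above shows can accumulate into O(N) error when it is spread across the whole run queue.

    /*
     * Generic deficit round-robin loop in the style of DRR [22]; this is
     * an illustration, not GR3's intragroup scheduler.
     */
    #include <stdio.h>

    #define NCLIENTS 3

    struct client { int weight; double deficit; };

    static void drr_round(struct client *c, int n, double credit_per_weight)
    {
        for (int i = 0; i < n; i++) {
            c[i].deficit += c[i].weight * credit_per_weight;
            while (c[i].deficit >= 1.0) {
                printf("run client %d for one time unit\n", i);
                c[i].deficit -= 1.0;          /* remainder is carried over */
            }
        }
    }

    int main(void)
    {
        struct client cs[NCLIENTS] = { { 5, 0.0 }, { 1, 0.0 }, { 1, 0.0 } };
        for (int round = 0; round < 4; round++)
            drr_round(cs, NCLIENTS, 0.5);     /* half a unit of credit per weight */
        /* over the 4 rounds: client 0 runs 10 units, clients 1 and 2 run 2 each (5:1:1) */
        return 0;
    }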
A rigorous analysis of network packet scheduling [27] suggests that O(N) delay bounds are unavoidable with packet scheduling algorithms of less than O(log N) time complexity. GR3's O(g^2) error bound and O(1) time complexity are consistent with this analysis, since delay and service error are not equivalent concepts. Thus, if adapted to packet scheduling, GR3 would worst-case incur O(N) delay while preserving an O(g^2) service error.

Previous work in proportional share scheduling has focused on scheduling a single resource, and little work has been done in proportional share multiprocessor scheduling. WRR and fair-share multiprocessor schedulers have been developed, but have the fairness problems inherent in those approaches. The only multiprocessor fair queueing algorithm that has been proposed is Surplus Fair Scheduling (SFS) [7]. SFS also adapts a uniprocessor algorithm, SFQ, to multiple processors using a centralized run queue. No theoretical fairness bounds are provided. If a selected client is already running on another processor, it is removed from the run queue. This operation may introduce unfairness if used in low-overhead, round-robin variant algorithms. In contrast, GR3MP provides strong fairness bounds with lower scheduling overhead.

SFS introduced the notion of feasible clients along with an O(P)-time weight readjustment algorithm, which requires, however, that the clients be sorted by their original weight. By using its grouping strategy, GR3MP performs the same weight readjustment in O(P) time without the need to order clients, thus avoiding SFS's O(log N) overhead per maintenance operation. The optimality of SFS's and our weight readjustment algorithms rests in the preservation of the ordering of clients by weight and of the weight proportions among feasible clients, and not in minimal overall weight change, as [7] claims.
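The feasibility idea can be illustrated briefly: with P processors, no client can usefully receive more than 1/P of the total capacity, so overweight clients are capped at exactly one processor's share. The sketch below assumes clients are already sorted by decreasing weight, which is the ordering requirement that GR3MP's grouping removes; it illustrates the capping rule rather than reproducing the SFS or GR3MP code.

    /*
     * Sketch of feasible-client weight readjustment, assuming clients
     * are already sorted by decreasing weight.  A client is infeasible
     * if its share w_i / total would exceed 1/P; its weight is capped so
     * that it receives exactly one processor.  Illustration only.
     */
    #include <stdio.h>

    static void readjust(double *w, int n, int P)
    {
        double rest = 0.0;
        int m;
        for (int i = 0; i < n; i++)
            rest += w[i];
        /* Mark the heaviest clients infeasible while they would still
         * exceed 1/P of the capacity left to distribute. */
        for (m = 0; m < n && m < P - 1 && w[m] * (P - m) > rest; m++)
            rest -= w[m];
        for (int i = 0; i < m; i++)
            w[i] = rest / (P - m);            /* exactly one processor's share */
    }

    int main(void)
    {
        double w[] = { 10.0, 1.0, 1.0, 1.0 };  /* one very heavy client */
        readjust(w, 4, 2);                     /* P = 2 processors      */
        for (int i = 0; i < 4; i++)
            printf("%.2f ", w[i]);             /* 3.00 1.00 1.00 1.00   */
        printf("\n");
        return 0;
    }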

7 Conclusions and Future Work

We have designed, implemented, and evaluated Group Ratio Round-Robin scheduling in the Linux operating system. We prove that GR3 is the first and only O(1) uniprocessor and multiprocessor scheduling algorithm that guarantees a service error bound of less than O(N) compared to an idealized processor sharing model, where N is the number of runnable clients. In spite of its low complexity, GR3 offers better fairness than the O(N) service error bounds of most algorithms that need O(log N) time for their operation. GR3 achieves these benefits due to its grouping strategy, ratio-based intergroup scheduling, and highly efficient intragroup round-robin scheme with good fairness bounds. GR3 introduces a novel frontlog mechanism and weight readjustment algorithm to schedule small-scale multiprocessor systems while preserving its good bounds on fairness and time complexity.

Our experiences with GR3 show that it is simple to implement and easy to integrate into existing commodity operating systems. We have measured the performance of GR3 using both simulations and kernel measurements of real system performance using a prototype Linux implementation. Our simulation results show that GR3 can provide more than two orders of magnitude better proportional fairness behavior than other popular proportional share scheduling algorithms, including WRR, WFQ, SFQ, VTRR, and SRR. Our experimental results using our GR3 Linux implementation further demonstrate that GR3 provides accurate proportional fairness behavior on real applications with much lower scheduling overhead than other Linux schedulers, especially for larger workloads.

While small-scale multiprocessors are the most widely available multiprocessor configurations today, the use of large-scale multiprocessor systems is growing given the benefits of server consolidation. Developing accurate, low-overhead proportional share schedulers that scale effectively to manage these large-scale multiprocessor systems remains an important area of future work.

Acknowledgements

Chris Vaill developed the scheduling simulator and helped implement the kernel instrumentation used for our experiments. This work was supported in part by NSF grants EIA-0071954 and DMI-9970063, an NSF CAREER Award, and an IBM SUR Award.

References

[1] J. C. R. Bennett and H. Zhang. WF2Q: Worst-Case Fair Weighted Fair Queueing. In Proceedings of IEEE INFOCOM '96, pages 120–128, San Francisco, CA, Mar. 1996.
[2] D. Bovet and M. Cesati. Understanding the Linux Kernel. O'Reilly, Sebastopol, CA, second edition, 2002.
[3] R. Bryant and B. Hartner. Java Technology, Threads, and Scheduling in Linux. IBM developerWorks Library Paper, IBM Linux Technology Center, Jan. 2000.
[4] B. Caprita, W. C. Chan, and J. Nieh. Group Round-Robin: Improving the Fairness and Complexity of Packet Scheduling. Technical Report CUCS-018-03, Columbia University, June 2003.
[5] W. C. Chan. Group Ratio Round-Robin: An O(1) Proportional Share Scheduler. Master's thesis, Columbia University, June 2004.
[6] W. C. Chan and J. Nieh. Group Ratio Round-Robin: An O(1) Proportional Share Scheduler. Technical Report CUCS-012-03, Columbia University, Apr. 2003.
[7] A. Chandra, M. Adler, P. Goyal, and P. J. Shenoy. Surplus Fair Scheduling: A Proportional-Share CPU Scheduling Algorithm for Symmetric Multiprocessors. In Proceedings of the 4th Symposium on Operating System Design & Implementation, pages 45–58, San Diego, CA, Oct. 2000.
[8] S. Y. Cheung and C. S. Pencea. BSFQ: Bin-Sort Fair Queueing. In Proceedings of IEEE INFOCOM '02, pages 1640–1649, New York, NY, June 2002.
[9] G. Chuanxiong. SRR: An O(1) Time Complexity Packet Scheduler for Flows in Multi-Service Packet Networks. In Proceedings of ACM SIGCOMM '01, pages 211–222, San Diego, CA, Aug. 2001.
[10] J. Cobb, M. Gouda, and A. El-Nahas. Time-Shift Scheduling – Fair Scheduling of Flows in High-Speed Networks. IEEE/ACM Transactions on Networking, pages 274–285, June 1998.
[11] A. Demers, S. Keshav, and S. Shenker. Analysis and Simulation of a Fair Queueing Algorithm. In Proceedings of ACM SIGCOMM '89, pages 1–12, Austin, TX, Sept. 1989.
[12] R. Essick. An Event-Based Fair Share Scheduler. In Proceedings of the Winter 1990 USENIX Conference, pages 147–162, Berkeley, CA, Jan. 1990.
[13] P. Goyal, H. Vin, and H. Cheng. Start-Time Fair Queueing: A Scheduling Algorithm for Integrated Services Packet Switching Networks. IEEE/ACM Transactions on Networking, pages 690–704, Oct. 1997.
[14] G. Henry. The Fair Share Scheduler. AT&T Bell Laboratories Technical Journal, 63(8):1845–1857, Oct. 1984.
[15] J. Kay and P. Lauder. A Fair Share Scheduler. Communications of the ACM, 31(1):44–55, 1988.
[16] L. Kleinrock. Computer Applications, volume II of Queueing Systems. John Wiley & Sons, New York, NY, 1976.
[17] J. Nieh, C. Vaill, and H. Zhong. Virtual-Time Round-Robin: An O(1) Proportional Share Scheduler. In Proceedings of the 2001 USENIX Annual Technical Conference, pages 245–259, Berkeley, CA, June 2001.
[18] A. Parekh and R. Gallager. A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case. IEEE/ACM Transactions on Networking, 1(3):344–357, June 1993.
[19] S. Ramabhadran and J. Pasquale. Stratified Round Robin: A Low Complexity Packet Scheduler with Bandwidth Fairness and Bounded Delay. In Proceedings of ACM SIGCOMM '03, pages 239–250, Karlsruhe, Germany, Aug. 2003.
[20] J. Rexford, A. G. Greenberg, and F. Bonomi. Hardware-Efficient Fair Queueing Architectures for High-Speed Networks. In Proceedings of IEEE INFOCOM '96, pages 638–646, Mar. 1996.
[21] R. Russell. Hackbench: A New Multiqueue Scheduler Benchmark. Message to the Linux Kernel Mailing List, http://www.lkml.org/archive/2001/12/11/19/index.html, Dec. 2001.
[22] M. Shreedhar and G. Varghese. Efficient Fair Queueing using Deficit Round Robin. In Proceedings of ACM SIGCOMM '95, pages 231–242, Sept. 1995.
[23] D. Stephens, J. Bennett, and H. Zhang. Implementing Scheduling Algorithms in High Speed Networks. IEEE JSAC Special Issue on High Performance Switches/Routers, Sept. 1999.
[24] D. Stiliadis and A. Varma. Efficient Fair Queueing Algorithms for Packet-Switched Networks. IEEE/ACM Transactions on Networking, pages 175–185, Apr. 1998.
[25] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. Baruah, J. Gehrke, and G. Plaxton. A Proportional Share Resource Allocation Algorithm for Real-Time, Time-Shared Systems. In Proceedings of the 17th IEEE Real-Time Systems Symposium, pages 288–289, Dec. 1996.
[26] C. Waldspurger. Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management. PhD thesis, Dept. of EECS, MIT, Sept. 1995.
[27] J. Xu and R. J. Lipton. On Fundamental Tradeoffs between Delay Bounds and Computational Complexity in Packet Scheduling Algorithms. In Proceedings of ACM SIGCOMM '02, pages 279–292, Aug. 2002.
[28] L. Zhang. VirtualClock: A New Traffic Control Algorithm for Packet-Switched Networks. ACM Transactions on Computer Systems, 9(2):101–124, May 1991.
