Multidimensional Load Balancing and Finer Grained Resource Allocation Employing Online Performance Monitoring Capabilities

A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University

In partial fulfillment of the requirements for the degree Master of Science

Jacob A. Cooper August 2015

© 2015 Jacob A. Cooper. All Rights Reserved.

This thesis titled Multidimensional Load Balancing and Finer Grained Resource Allocation Employing Online Performance Monitoring Capabilities

by JACOB A. COOPER

has been approved for the School of Electrical Engineering and Computer Science and the Russ College of Engineering and Technology by

Frank Drews
Assistant Professor of Electrical Engineering and Computer Science

Dennis Irwin
Dean, Russ College of Engineering and Technology

Abstract

COOPER, JACOB A., M.S., August 2015, Computer Science

Multidimensional Load Balancing and Finer Grained Resource Allocation Employing Online Performance Monitoring Capabilities (99 pp.)

Director of Thesis: Frank Drews

The development of increasingly accurate Proportional Share Schedulers (PSS) over recent years has allowed for reduced jitter and improved Quality of Service (QoS) for clients competing for computing system resources. Having originated in packet-switching networks, PSS constructs have been extended to the operating system's time sharing of system CPUs between competing system tasks. A great deal of research has concentrated on fairness with respect to tasks local to the same run queue, and has been successful in providing bounds limiting the discrepancy between ideal and actualized CPU utilization. Providing fairness in multiprocessor systems introduces added complexity and inherently weakens the fairness guarantees available on uniprocessor counterparts. System developers and hardware manufacturers alike have long strived to provide Symmetric Multiprocessing (SMP) computing with equivalent performance across Processing Elements (PE), though they are mathematically constrained by the problem construct. In order to provide equivalent performance, each PE must receive equal work, even though computing work is not infinitesimally divisible. The problem therefore relies on optimizing the potentially infeasible Partition Problem, a variant of the NP-Complete subset sum problem. Task processing requirements extend beyond CPU utilization, and include use of execution units, caches, and buses within the processor. These resources are generally allocated indirectly, their use following from scheduled CPU time. This work focuses on addressing this additional complexity in the fair allocation of multiprocessor systems by describing a method which provides online profiling of some of these resources, and a multidimensional load balancing technique aimed at increasing fairness by reducing contention on one type of finer grained resource.

Table of Contents


Abstract

List of Tables

List of Figures

List of Acronyms

1 Introduction
  1.1 Motivating Practical Example
  1.2 Contributions

2 Background and Related Work
  2.1 Generalized Processor Sharing
  2.2 Load Balancing
    2.2.1 Subset Sum Problem
    2.2.2 Partition Problem
    2.2.3 Subset Sum and Partition Problem Literature
    2.2.4 Infeasible Task Weights
    2.2.5 Load Balancing in the 3.2 Linux Kernel
  2.3 Task Scheduling
    2.3.1 Task Classes
      2.3.1.1 Batch Tasks
      2.3.1.2 Interactive Tasks
      2.3.1.3 Real-Time Tasks
    2.3.2 Dynamic System Events
  2.4 Proportional and Fair Share Scheduling
    2.4.1 Earliest Eligible Virtual Deadline First
    2.4.2 Completely Fair Scheduler
    2.4.3 Red Black Binary Search Tree
    2.4.4 Distributed Weighted Round-Robin
    2.4.5 Additional Proportional Share Algorithms
  2.5 Performance Monitoring Counters

3 Methodology
  3.1 Motivation
  3.2 Synopsis
    3.2.1 Partition Optimization Problem
    3.2.2 Partition Load Range
    3.2.3 Mean Deviation
    3.2.4 Variance
  3.3 Finer Grained Resources
    3.3.1 Multidimensional Load Balancing and Partition Problem
    3.3.2 Interdependent Multidimensional Load Balancing

4 Testing Framework and Experiments
  4.1 Experimental Environment
  4.2 Initial Considerations and Experiments
  4.3 Performance Monitoring Overview
    4.3.1 Performance Monitoring Profiling Overhead
    4.3.2 Performance Monitoring in Linux
  4.4 Linux Scheduling Classes
  4.5 Performance Scheduler Class
  4.6 Performance Monitoring Kernel Modules
    4.6.1 Controlling Performance Event Counters
    4.6.2 Reading Performance Event Data
    4.6.3 Performance Scheduler Debug Module
  4.7 Performance Scheduling Class Tests
    4.7.1 Performance Monitoring Accuracy
    4.7.2 Performance Monitoring Overhead
  4.8 Simulation Tests
    4.8.1 Simulator Design
      4.8.1.1 Experimental Resource Dependent Balanced Load Partitioning Algorithm
      4.8.1.2 Scalability of Finely Grained Resource Partitioning
      4.8.1.3 Dynamics of Finely Grained Resource Partitioning
      4.8.1.4 Simulator Experimentation Evaluation
    4.8.2 Simulator Experimental Results - Dynamic Load
    4.8.3 Multidimensional Load Balancing Effects on Fairness

5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work

References

Appendix A: Algorithms

Appendix B: Additional Test Results and Figures

List of Tables


4.1 Baseline Contrived Load
4.2 Imbalanced Contrived Load
4.3 Balanced Contrived Load
4.4 Interbench Result: Audio Load
4.5 Work Factor Table
4.6 Simulator Test Results - Multiple Test Instances
4.7 Simulator Test Results - Ratio Range Test
4.8 Simulator Test Results - Repeated Consistency Test

B.1 Video Load Interbench: Vanilla Scheduler vs Performance Monitoring Delta
B.2 X Load Interbench: Vanilla Scheduler vs Performance Monitoring Delta
B.3 Gaming Load Interbench: Vanilla Scheduler vs Performance Monitoring Delta

List of Figures


2.1 Red-Black Binary Search Tree

4.1 PERFEVTSELx Bitfields
4.2 PERF_GLOBAL_CTRL Bitfields
4.3 Performance Monitoring Counters (PMC) Selection Pseudocode
4.4 PMC Collection Pseudocode
4.5 Linux Scheduling Class Priority List - Modified Priority Marked by Dashed Arrows
4.6 PMC Accuracy Test Pseudocode
4.7 Baseline Balancing Dynamic Load Generation - Seed 0 Number of Migrations and Tasks
4.8 Simulator Test Results Repeated Consistency Normal Distribution Probability Density
4.9 Standard Deviation Percent Imbalance - 4 Queues Balancing through Steady State Static Load Generation - Seed 893756

A.1 Distributed Partitioning Dynamic Programming Algorithm
A.2 Distributed Partitioning Algorithm - Priority Dimension - Part 1
A.3 Distributed Partitioning Algorithm - Priority Dimension - Part 2
A.4 Distributed Partitioning Algorithm - Second Dimension - Part 1
A.5 Distributed Partitioning Algorithm - Second Dimension - Part 2
A.6 Baseline Balancing Algorithm

B.1 cpuid Command Results
B.2 Baseline Balancing to Steady State Static Load Generation Efficiency Over Time
B.3 Baseline Balancing to Steady State Static Load Generation - Seed 893756 Number of Migrations and Tasks
B.4 Multidimensional Balancing to Steady State Static Load Generation Efficiency Over Time
B.5 Multidimensional Balancing to Steady State Static Load Generation - Seed 893756 Number of Migrations and Tasks
B.6 Standard Deviation Percent Imbalance - 8 Queues Balancing through Steady State Static Load Generation - Seed 893756
B.7 Standard Deviation Percent Imbalance - 16 Queues Balancing through Steady State Static Load Generation - Seed 893756
B.8 Standard Deviation Percent Imbalance - 32 Queues Balancing through Steady State Static Load Generation - Seed 893756
B.9 Dynamic Efficiency Time Plot Seed 0
B.10 Dynamic Efficiency Time Plot Seed 478192

B.11 Dynamic Efficiency Time Plot Seed 584371
B.12 Dynamic Efficiency Time Plot Seed 26553280
B.13 Dynamic Efficiency Time Plot Seed 2715362
B.14 Dynamic Efficiency Time Plot Seed 2910919
B.15 Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 0
B.16 Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 478192
B.17 Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 584371
B.18 Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 26553280
B.19 Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 2715362
B.20 Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 2910919

List of Acronyms

SMP Symmetric Multiprocessing
PE Processing Element
PSS Proportional Share Scheduler
QoS Quality of Service
PMC Performance Monitoring Counters [12, 14]
EEVDF Earliest Eligible Virtual Deadline First [29]
CFS Completely Fair Scheduler [22]
DWRR Distributed Weighted Round-Robin [21]
LLC Last Level Cache
ALU Arithmetic Logic Unit
FPU Floating Point Unit
GPS Generalized Processor Sharing [26]
SSP Subset Sum Problem
SSP-D Subset Sum Decision Problem
PP Partition Problem
PP-D Partition Decision Problem
SSP-OPT Subset Sum Optimization Problem
PTAS Polynomial-Time Approximation Scheme
PP-OPT Partition Optimization Problem
BKL Big Kernel Lock [4, 5, 22]
BST Binary Search Tree
SFQ Start-time Fair Queuing
BBRR Bit-by-bit Round-Robin
WFQ Weighted Fair Queuing
MSR Model Specific Registers [12, 14]
ISA Instruction Set Architecture [12]
SFS Surplus Fair Scheduling
NUMA Non-Uniform Memory Access
MPP-D Multidimensional Partition Decision Problem
MPP Multidimensional Partition Problem

MMU Memory Management Unit
API Application Programming Interface

1 Introduction

Modern computing systems operate by allocating finite resources to a competing set of tasks. These resources include computational, storage, and communicative circuits. The allocation of computing resources to competing workloads is accomplished in modern systems by first partitioning workloads among redundant processing cores, and then time multiplexing, or sharing, their execution capabilities. These resource allocation schemes are referred to as load balancing and task scheduling respectively, both of which are commonplace in multitasking Operating Systems (OS) capable of executing dynamic task sets on multiprocessor systems. The end goal of computational resource allocation within this scope is to dynamically allocate the system's computational capacity to all competing tasks with respect to their computational requirements. Systems must be devised in such a way as to quantify the computational resources each task will receive. With the exception of systems dedicated to executing static task sets, the OS is without the ability to predict future resource requirements. Without execution deadlines, tasks described as non-real-time must be assigned a relative importance to distinguish their resource requirements. This quantity is synonymously referred to as a priority or weight, and is equivalent in UNIX-like systems to the nice value. The OS may also predict resource requirements by profiling tasks during runtime. One such example is assigning a resource premium to tasks identified as interactive in nature, with respect to execution dependencies on user input and output, in order to provide lower latency. Profiling duties are most naturally performed during task scheduling, for the context switch presents the unique opportunity to attribute recent system events, as is done with the accounting of tasks' resource shares. The task scheduler has the responsibility of time sharing individual Processing Elements in an OS which supports multitasking. Scheduling decisions may make use of task profiling in the attempt to optimize performance in terms of latency, jitter, or throughput. Task scheduling has been well researched by both academics and operating system developers. Much of the recent research discussed in the following chapter deals with providing guaranteed fair shares to tasks with respect to their relative priority. This provides the motivation for Proportional Share Schedulers. PSS have been shown to provide bounds limiting resource allocation error while providing flexible scheduling notwithstanding a multitude of dynamic system events. Assuming the system has the ability to assign the relative resource dependency of each task in the form of weights, modern multiprocessing systems rely on load balancing schemes to proportionally partition the workload, quantified by the summation of task weights, among resources according to their computational requirements and abilities respectively. Unfortunately the task partitioning decision problem is NP-Complete, giving no guarantee that an optimal solution exists which will result in equilibrium. Many load balancing schemes assume the system operates according to the SMP specification, in which each processor behaves equivalently with respect to computational and communicative ability. Unfortunately, modern systems are designed with increasingly complex technologies which behave asymmetrically.
Such technologies which result in asymmetric processing include multiprocessors with independently dynamic frequency scaling, clusters without point-to-point interconnects, and architectures designed to integrate multiple processing elements with different computational performance. Resource allocation within such asymmetric multiprocessing systems was the focus of previous work by Dunn [10], which attempts to proportionally partition task loads between processors dependent upon their relative performance. As tasks are generally partitioned along a single dimension, specifically their relative computational requirements, the resulting partitioning is likely to ignore any overhead incurred due to interdependencies between concurrently executing software threads running on different hardware threads that share the same physical computational circuitry. Such a hardware layout is commonly referred to as multithreading, or hyperthreading as termed by Intel. Since at least version 3.2, the Linux kernel has made a modest attempt to distinguish between physical and logical processing elements when balancing system load. By ordering the CPU identifiers with respect to physical and logical hardware threads, and migrating tasks during active load balancing accordingly, smaller loads are balanced while distinguishing between physical cores and hardware threads. An example of the ordering is discussed in subsequent chapters and provided as Figure B.1. Due to interdependencies in resources at a finer grain than the CPU, which may introduce variable processing rates, current models and algorithms lack the ability to provide fair allocations of resources while solely considering time shares. To provide for the graceful degradation of task performance, it is imperative to consider resource interdependencies and contention. One favorable method to measure resource allocation fairness is that of lag. Lag is defined as the difference between some ideal resource allocation time and its actualized utilization time. The argument could be made that, due to interdependencies which cause varying processing throughput, measurements based on time alone are not truly fair unless processing rates are considered. By reducing the possible variation in processing rates through consideration of contended resources, utilization time becomes more meaningful and results in fairer systems.

1.1 Motivating Practical Example

In recent years, a momentous shift arrived in computing which saw the end of ever-increasing clock rates in favor of multiple processing cores being manufactured on a single silicon chip. Processing throughput is now dependent upon the operation of multiple redundant execution units; it is therefore crucial for software developers to identify strengths and weaknesses in hardware and to take into consideration the interdependent relationships of their programs. In addition, users have steadily demanded systems capable of an ever-increasing set of functionality. It is not uncommon for a user to be requesting their hardware resources through dozens of processes, frequently without their knowledge. A single web page may well embed a variety of elements with various resource requirements. On the other end, a web server must serve that same variety of elements to the user. These elements may include computationally expensive visual or video elements, requiring database accesses, decryption, or decoding, as well as the handling of service requests in the form of network traffic. Both systems are similar in that the tasks required of them are vast, frequently consisting of small jobs with narrow resource requirements. These jobs, or tasks, are broken down by software application developers for a number of reasons, including simplified development and improved performance. As these tasks have various resource requirements, if we are able to distribute their system load in such a way that tasks with like dependencies execute on different hardware or at different times, contention for resources will be reduced and a higher execution throughput will likely follow. This work studies this idea and attempts to make a case for profiling and reducing contention along a single resource, namely the Floating Point Unit (FPU). Many tasks either require heavy use of floating point instructions, as in the case of multimedia and gaming, or do not, as in the case of compression and compilation. Therefore there exists a sound motivation to investigate multidimensional load balancing algorithms and their implications.

1.2 Contributions

This work focuses on the allocation of resources denoted as fine grained due to their lack of one-to-one correspondence with respect to hardware threads. Hardware threads are either physical or logical, and are capable of executing a single software thread as well as being individually scheduled by the OS. Tasks heavily dependent upon shared fine grained resources are, under various degrees of competition, very likely to vary in performance, most notably throughput. Unnecessary competition results from poor balancing of such tasks among these resources, negatively affecting performance. Finer grained resources are dependent upon system architecture, and therefore their consideration in resource allocation must be system dependent. Examples of finer grained resources include those shared among hardware threads, including the Last Level Cache (LLC), and execution units such as the Arithmetic Logic Unit (ALU) and FPU. Presented is a method to utilize a feature currently available in commercial hardware to perform online task profiling by the OS. Performance Monitoring Counters (PMC) allow real time analysis of finer grained resource utilization with trivial overhead, which is ideal for use in kernel space. Their ease of use was the deciding factor in their inclusion in this work, though any low overhead method of profiling may be substituted and is left for future work. As the scope of this work considers the fair allocation of system resources, Proportional Share Schedulers (PSS) are discussed. Academic research on the Earliest Eligible Virtual Deadline First (EEVDF) and Distributed Weighted Round-Robin (DWRR) algorithms is discussed along with the Completely Fair Scheduler (CFS) algorithm used within the Linux kernel since version 2.6.23. The accounting of fine grained resource utilization was implemented and tested by creating a scheduling class within the Linux kernel version 3.2 under Ubuntu 12.04 LTS. Additional modules and system calls were implemented to allow for debugging and data collection. The modifications to the kernel intend to demonstrate the effectiveness and viability of utilizing PMC registers while incurring limited overhead.

Load balancing analysis took place utilizing a resource allocation simulator, which was designed to incorporate various load patterns to demonstrate the effectiveness of load balancing while considering a finer grained resource. As a baseline, task sets were also evaluated while load balancing ignored the finer grained resource. Synthetic loads were generated to perform testing under both static and dynamic conditions, which allows the examination of a system having the ability to reach a steady state or being forced to perpetually modify its load partitioning. The overall goal of this work is to profile the utilization of a finer grained resource under contention, and to distribute system load accordingly to reduce that contention. By reducing contention, the stabilization of processing rates allows for fairer allocation under models which account for resource time utilization. In addition, reduced contention typically correlates with greater throughput. Details are provided in following sections for the major contributions, which include:

• Show that tasks with a high degree of utilization of some finely grained resource, such as those with intensive ALU or FPU dependencies, perform best without competition from tasks with similar characteristics.

• Demonstrate that Performance Monitoring Counters may be used to obtain finely grained resource metrics with nominal overhead.

• Introduce a multidimensional load balancing algorithm which attempts to improve throughput while only modestly affecting fairness.

• Demonstrate, under simulated loads, that multidimensional load balancing is feasible and may provide increased throughput over standard load balancing techniques, using metrics obtained by actual, though contrived, tests.

• Implement a Linux scheduling class to monitor finely grained resource utilization and schedule tasks specified by the system administrator's invocation of supporting implemented modules, based on Ubuntu 12.04 LTS and the 3.2 Linux kernel. Load balancing has as yet been left for future work.

The remainder of the thesis is laid out as follows. Chapter 2 discusses previous and related work, including load balancing and its impact on Proportional Share Schedulers based on the Generalized Processor Sharing model, and introduces Performance Monitoring Counters. Chapter 3 details the motivation for the work along with the formulation of a methodology which expands previous work on the fair allocation of resources by introducing a method to consider the profiling of tasks by finer grained resources. Chapter 4 describes the framework developed to perform a number of experiments and their results. Chapter 5 contains conclusions and proposes future work.

2 Background and Related Work

Multitasking on multiprocessing systems has become the norm in meeting modern demands of computational throughput. This is true from the largest supercomputing clusters to handheld devices such as smartphones and tablets. Computing resources are allocated within these systems by time sharing each resource among a subset of competing tasks, or discrete units of work.

2.1 Generalized Processor Sharing

Fairness with respect to resource allocation algorithms considers the deviation from an ideal allocation. Generalized Processor Sharing (GPS) [26] is a scheme which models the ideal allocation of multiple resources to competing clients. In keeping with the scope of this work, we will be considering multiprocessors and tasks, although GPS was originally introduced with respect to networking [26]. We begin our discussion by presenting a unifying notation for clarity. Given a system wide set of tasks $T = \{0, 1, \ldots, n-1\}$, each task $i$ has an associated weight denoted $w_i$. As the scope of this work considers multitasking resource allocation within a multiprocessor system, we denote by $K$ the number of system hardware threads. We then define the load $L(S)$ of some general task set $S \subseteq T$ as

$$L(S) = \sum_{i \in S} w_i. \tag{2.1}$$

A GPS system is characterized in general for a system task set $T$ as:

$$\frac{S_i(a, b)}{S_j(a, b)} = \frac{w_i}{w_j} \quad \forall i, j \in T \tag{2.2}$$

$$S_i(a, b) = \frac{w_i}{L(T)} \cdot (b - a) \cdot K \quad \forall i \in T,$$

where $S_i$ and $S_j$ represent the ideal resource allocation for tasks $i$ and $j$ between time points $a$ and $b$, and $K$ denotes the number of processors. Equation (2.2) defines proportional shares with respect to given task weights and provides an ideal against which resource allocation algorithms may be measured. It should be noted that (2.2) assumes the task set $T$ is static over the interval $(a, b)$. A task set is said to be static if no new tasks are created, no task exits the system, and no task has a change in weight. Unfortunately resource allocation must anticipate these dynamic system events, therefore modified notation will be introduced in later sections that discuss resource allocation under dynamic events.
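As a brief worked illustration of Equation (2.2) (the numbers here are chosen purely for exposition and appear nowhere else in this work), consider $K = 2$ hardware threads and three tasks with weights $w_0 = 1$, $w_1 = 2$, and $w_2 = 3$, so that $L(T) = 6$. Over the interval $(a, b) = (0, 6)$ the ideal GPS shares are

$$S_i(0, 6) = \frac{w_i}{6} \cdot 6 \cdot 2 = 2 w_i,$$

yielding shares of $2$, $4$, and $6$ time units respectively, which together exhaust the $K \cdot (b - a) = 12$ units of capacity supplied by the two processors over the interval.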

2.2 Load Balancing

Load balancing refers to determining and assigning disjoint subsets of competing clients to resources. In this context, disjoint task subsets form partitions, each of which is assigned to a single hardware thread. Partitions may be well defined in separate data structures or stipulated in an impromptu manner. The partitioning of tasks may also be determined centrally or by a distributed algorithm. Examples of each have been well documented in the Linux kernel. Multitasking support on multiprocessors was originally provided by Linux with an impromptu approach which centrally distributed workload across hardware threads [5]. A single non-concurrent scheduling thread was capable of running on each hardware thread. Each hardware thread accessed a single data structure, called a run queue, which contained all tasks within the system. This approach was eventually deemed inefficient, as a system wide lock was required to maintain non-concurrent operation and protect the run queue from race conditions. A single run queue is also inefficient in terms of caching. As tasks execute, their code and data are cached for speedier retrieval. Thus if a task migrates between hardware threads that do not share caches, cached data is invalidated and further execution may not take advantage of faster caches, but must resort to retrieving data from slower memory locations. Therefore, tasks said to be cache hot, having a large amount of their data cached, were identified with additional heuristics in an attempt to prevent their migration. To eliminate the problems of the impromptu central load distribution under early Linux implementations, a distributed algorithm was developed to explicitly partition tasks across hardware threads. Capable of concurrent operation, hardware threads individually maintain run queues from which tasks are scheduled [5, 22]. Run queues are balanced by migrating tasks between pairs of run queues, from the overly burdened to the under burdened. Balancing run queues is of particular interest as the problem is an extension of the subset sum problem.

2.2.1 Subset Sum Problem

The Subset Sum Problem (SSP) is a classical example of the NP-Complete set of problems in terms of computational complexity. Featuring a wide number of practical uses, SSP has been widely researched and is considered a variant of the popular knapsack problem. SSP attempts to create a subset of integers the sum of which equals some target capacity $c$. Utilizing the presented notation in the context of task weights and loads, the decision form, denoted SSP-D, is defined formally for target capacity $c$ as:

$$\mathrm{SSP\text{-}D}(T, c) = \begin{cases} \text{true} & \text{if } \exists S \subseteq T : L(S) = c \\ \text{false} & \text{otherwise.} \end{cases} \tag{2.3}$$

Equation (2.3) considers a set of tasks $T$ and a target capacity $c$, and decides whether there exists some subset $S \subseteq T$ whose load equals the target capacity, i.e. $L(S) = c$. The Subset Sum Decision Problem (SSP-D) is useful in determining the computational complexity of load balancing, particularly in the case $K = 2$. Practical solutions, though, must consider the optimization version in order to provide for the case where $\mathrm{SSP\text{-}D}(T, c) = \text{false}$. We define the optimization variant of the subset sum problem for target capacity $c$ as

$$\mathrm{SSP\text{-}OPT}(T, c) = S \subseteq T \text{ such that } L(S) = \max\left(\{L(V) : V \subseteq T \wedge L(V) \le c\}\right). \tag{2.4}$$

Equation (2.4) considers a set of tasks $T$ and a target capacity $c$, and optimally identifies the subset $S \subseteq T$ with the maximal load $L(S)$ bounded by $c$. A consequence of SSP-D being within the class of NP-Complete decision problems is that its optimization variant, the Subset Sum Optimization Problem (SSP-OPT), falls within the NP-Hard class of problems.

2.2.2 Partition Problem

The Partition Problem is closely related to SSP and considers a $K$-partitioning, or set of disjoint subsets, which is defined formally [20] for a partitioning $P$ as:

$$P = \{P_0, P_1, \ldots, P_{K-1}\} \text{ such that } \bigcup_{P_i \in P} P_i = T \text{ and } P_i \cap P_j = \emptyset \quad \forall P_i, P_j \in P,\ i \neq j. \tag{2.5}$$

Equation (2.5) defines a $K$-partitioning as a set of disjoint subsets of the original task set $T$ whose union equals $T$. The Partition Problem then considers an equilateral partitioning $P$ of the system task set $T$. Formally, the decision variant PP-D is given as:

$$\mathrm{PP\text{-}D}(T, K) = \begin{cases} \text{true} & \text{if } \exists P : \forall P_i \in P,\ L(P_i) = \frac{L(T)}{K} \\ \text{false} & \text{otherwise.} \end{cases} \tag{2.6}$$

Equation (2.6) considers a task set $T$ and determines if there exists some partitioning $P$, as defined by Equation (2.5), containing partitions with equivalent load.

Extending the decision version SSP-D to the Partition Decision Problem (PP-D) is trivial for $K = 2$ by defining the capacity as $c = \frac{L(T)}{2}$. Therefore we can state that PP-D is at least as hard as SSP-D. As was said concerning $\mathrm{SSP\text{-}OPT} \in$ NP-Hard, the same can be said of any optimization variant of the Partition Problem (PP), denoted the Partition Optimization Problem (PP-OPT). Therefore $\mathrm{PP\text{-}D} \in$ NP-Complete implies $\mathrm{PP\text{-}OPT} \in$ NP-Hard. Defining a practical optimization of PP is less trivial than SSP-OPT and requires consideration of fairness in the case $\mathrm{PP\text{-}D}(T, K) = \text{false}$; it will thus be considered in following chapters.

2.2.3 Subset Sum and Partition Problem Literature

Dynamic programming algorithms exist which solve the SSP in pseudo-polynomial time with respect to set cardinality and element range [16, 25]. The pseudocode of such an example, used in the early stages of the resource allocation simulator, is presented in Figure A.1 in Appendix A; a compact sketch of the classical dynamic program is also given below. In addition, there exist Polynomial-Time Approximation Scheme (PTAS) algorithms to solve both SSP-OPT and PP-OPT [6]. Although these algorithms are not generally considered computationally complex for a problem in the class NP, they exceed acceptable limits within kernel context by exceeding sub-linear computational complexity. A great deal of research in this area is focused on global balancing, which incurs an unacceptable task migration overhead with respect to invalidating cached program data and updating run queue data structures. Distributed set partitioning optimization algorithms allow for the elimination of overheads exacerbated by centralized set partitioning while benefiting from parallelism.
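The following is a minimal C sketch of the classical pseudo-polynomial dynamic program for SSP-OPT, in the spirit of the algorithm shown in Figure A.1; the function and variable names are illustrative and are not those used by the simulator.

#include <stdbool.h>
#include <stdlib.h>

/*
 * Pseudo-polynomial dynamic program for SSP-OPT: given task weights
 * w[0..n-1] and a target capacity c, return the maximal achievable load
 * L(S) <= c.  reachable[v] records whether some subset of the weights
 * sums exactly to v.  Runs in O(n * c) time and O(c) space.
 */
static long ssp_opt(const long *w, int n, long c)
{
    bool *reachable = calloc(c + 1, sizeof(bool));
    long best = 0;

    if (!reachable)
        return -1;                  /* allocation failure */

    reachable[0] = true;            /* the empty subset has load 0 */
    for (int i = 0; i < n; i++) {
        /* iterate downward so each weight is used at most once */
        for (long v = c; v >= w[i]; v--) {
            if (reachable[v - w[i]])
                reachable[v] = true;
        }
    }
    for (long v = c; v >= 0; v--) {
        if (reachable[v]) {         /* largest reachable sum <= c */
            best = v;
            break;
        }
    }
    free(reachable);
    return best;
}

The downward inner loop is what distinguishes the 0/1 (each task used at most once) variant from the unbounded one, and the O(n · c) bound is exactly the pseudo-polynomial complexity that makes this approach unattractive in kernel context.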

2.2.4 Infeasible Task Weights

Within the subset of task sets which contain no ideal partitioning, infeasibly weighted task sets exist in which every possible partitioning leaves the system unable to provide a proportional share of computational capability to a subset of system tasks. A task $i \in T$ with associated weight $w_i$ is considered infeasible in SMP systems if the following condition holds:

$$\exists i \in T \text{ such that } \frac{w_i}{L(T)} > \frac{1}{K}. \tag{2.7}$$

An infeasible weight assignment symbolizes an idealized resource share which is impossible to supply given the number of hardware threads. Chandra et al. [7] presented an algorithm that reweighs tasks, up to $(K - 1)$ tasks in total, to feasible weights. This process is costly as it requires knowledge of the $(K - 1)$ highest weighted tasks within the entire system, and must be reconsidered upon any dynamic event which alters the system workload $L(T)$ [21]. The algorithm presented by Chandra et al. [7] has a single major drawback for any practical resource allocation implementation. The requirement for the online reweighing of the $(K - 1)$ possibly infeasible tasks, which must be considered upon any change to task weights, task creation, and exiting, is extremely costly. The system must first obtain the number of infeasible tasks and their weights, calculate the remaining system load, and reweigh each such task executing on separate processors. Clearly on Non-Uniform Memory Access (NUMA) systems, the cost of this synchronization outweighs the benefits of obtaining information which solely aids in accounting a fair share under the GPS model. A novel alternative would be to simply modify any resource accounting method so that an infeasibly weighted task is recorded as having received its proper share, equivalent to the full proportion of a single one of the system's hardware threads. This method is appropriate for many if not all PSS, and certainly any included in this chapter. In such a system, the remaining feasibly weighted tasks may be allocated accordingly with knowledge of simply the number, and the original weights, of the infeasible tasks. This information may be transmitted upon load balancing triggers, which would occur naturally in the case of an arriving or departing infeasibly weighted task. Such an event would require task migrations to or from run queues composed of feasibly weighted tasks. Additionally, any effective load balancing implementation would isolate infeasibly weighted tasks, individually mapped to a run queue and hardware thread; thus a task scheduler would readily have knowledge of the task weight infeasibility. Although the implementation of task reweighing is impractical, its use for verifying fairness bounds for PSS is surely useful in the consideration of boundary task set cases.
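As a small illustration of the feasibility test in Equation (2.7), the following C sketch flags infeasible tasks and caps their effective weight at a full hardware thread's proportion of the total load, in the spirit of the accounting alternative described above. The struct layout and names are hypothetical and do not come from this work's implementation.

/*
 * Mark tasks whose ideal share w_i / L(T) exceeds 1/K as infeasible and
 * account for them as if granted exactly one hardware thread, i.e. an
 * effective weight of L(T)/K.  Hypothetical sketch, not kernel code.
 */
struct sim_task {
    long weight;       /* assigned weight w_i           */
    long eff_weight;   /* weight used for accounting    */
    int  infeasible;   /* set when Equation (2.7) holds */
};

static void cap_infeasible(struct sim_task *tasks, int n, int nr_threads)
{
    long total = 0;

    for (int i = 0; i < n; i++)
        total += tasks[i].weight;          /* L(T) */

    for (int i = 0; i < n; i++) {
        /* w_i / L(T) > 1/K  <=>  w_i * K > L(T), avoiding division */
        tasks[i].infeasible = (tasks[i].weight * nr_threads > total);
        tasks[i].eff_weight = tasks[i].infeasible
                              ? total / nr_threads
                              : tasks[i].weight;
    }
}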

2.2.5 Load Balancing in the 3.2 Linux Kernel

The Linux kernel has relied on a distributed load balancing implementation since developers determined the need to eliminate the Big Kernel Lock (BKL). The BKL was originally implemented in early kernels to protect kernel data against concurrent access to critical sections of code [22, p. 223]. As of Linux version 2.6.39 the BKL has been removed in favor of finer grained locking mechanisms [8], which allow for greater multiprocessing performance. The load balancing algorithm in the Linux kernel is essentially a distributed approximation algorithm. Run queues are balanced upon executing code which migrates task subsets between pairs of run queues. In this fashion, the PP is approximated by potentially concurrent executions of an SSP-OPT implementation, under the condition that tasks may only migrate in a single direction. The approximation based on local run queues eliminates the additional calculation incurred by finding an optimal global solution and reduces overhead by limiting task migrations. The implementation essentially propagates load balancing duties by distributing the imbalance recursively. The load balancing implementation has multiple triggers, though it is largely implemented in the run_rebalance_domains function invoked periodically by the task scheduler upon software interrupt [4, p. 287]. In both cases, the load_balance function attempts to find the busiest run queue with the find_busiest_group function. If such a run queue is non-local, both run queues' data structures are protected by spin locks. If the locks are obtained successfully, a suitable subset of tasks is migrated from the overloaded run queue by the pull_task function. Similarly, during scheduling, any idle processor will attempt to pull tasks to schedule from non-local run queues in an effort to balance load and maintain work conserving scheduling. This may require active load balancing, which is triggered from one kernel thread to be performed on another CPU, preempting its current operation in an effort to migrate tasks to avoid physical and logical imbalances.¹ Upon the creation of a new task with either the fork or execve system calls, the operating system has a unique and optimal opportunity to balance the newly created system load. Similarly, tasks waking with the try_to_wake_up function, after waiting for I/O for example, must be placed on a run queue. In both cases, these tasks are not yet assigned to a run queue, thus the opportunity to select the optimal run queue is seized by invoking the select_task_rq function prior to placing the task on the run queue with the enqueue_task function. A simplified sketch of this pull-based flow appears below.

¹ Ignoring the features of scheduling domains, groups, and CPU hotplug was intentional to avoid unnecessary complexity in this discussion, as they fall outside the scope of this research.
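The following C sketch condenses the pull-based flow just described into a self-contained toy. It is a simplification for exposition, not kernel source: the structures are minimal stand-ins, and the locking, scheduling-domain logic, find_busiest_group heuristics, and cache-hot checks of the real kernel are all omitted.

#include <stddef.h>

/* Minimal stand-ins for illustration; the real kernel structures differ. */
struct task { long weight; struct task *next; };
struct runqueue { long load; struct task *tasks; };

/* Pull roughly half of the pairwise imbalance from the busiest queue,
 * migrating tasks in a single direction only. */
static void pull_imbalance(struct runqueue *local, struct runqueue *busiest)
{
    long imbalance = (busiest->load - local->load) / 2;

    while (imbalance > 0 && busiest->tasks) {
        struct task *t = busiest->tasks;  /* head task; the kernel applies
                                             cache-hot heuristics here    */
        busiest->tasks = t->next;
        t->next = local->tasks;
        local->tasks = t;                 /* one-way migration            */
        busiest->load -= t->weight;
        local->load   += t->weight;
        imbalance     -= t->weight;
    }
}

Stopping at roughly half the pairwise difference is what lets the imbalance propagate recursively: the next balancing pass, possibly on another CPU, distributes the remainder.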

2.3 Task Scheduling

Task schedulers attempt to arrange hardware threads' computing capabilities into time shares, during which a single competing task is granted the exclusive right to execute. Tasks in this context are the finest grained sequence of instructions within a computer process which contain a single software thread. Run queues refer to data structures which contain references to tasks in competition for resources. Scheduling must provide for adequate latency and resource share while allowing for optimal throughput, and therefore must determine a proper execution duration and frequency for each task.

2.3.1 Task Classes

Multitasking operating systems must support a wide range of applications, requiring knowledge of tasks' resource requirements in order to provide a stable and secure system. Incorrectly assuming task requirements will likely result in system instability, thus failing a core OS responsibility. The responsiveness to user commands and the ability to complete operations within a reasonable period are useful in describing the stability of a system [28]. Although it is impossible to ascertain user performance expectations in all cases, in specific cases tasks present clues to the OS allowing for task classification.

2.3.1.1 Batch Tasks

From the perspective of a system developer, batch tasks present the easiest scenario. Batch tasks' user interaction is limited, usually to initialization parameters, and thus they are able to complete work at a rate dependent solely on provided resources. Interrupting and later resuming batch tasks in order to serve more immediate requests or requirements provides the system scheduler flexibility [5, 22, 28]. In the absence of conflicting tasks with immediate requirements, the set of these tasks may monopolize resources. Therefore, it is adequate to guarantee these operations complete in a reasonable period if the system scheduling routine is work conserving. The identification of these tasks by the OS is trivial, as batch tasks lack certain system calls corresponding to user input.

2.3.1.2 Interactive Tasks

Interactive tasks contain the same system calls and callback mechanisms batch tasks lack, prompting the user for various inputs. The priority requirement for an interactive task is low latency, to provide responsiveness to user inputs [5, 22, 28]. Increasing the frequency at which the task scheduler grants resources to interactive tasks allows for higher responsiveness. Simultaneously, interactive tasks voluntarily surrender granted resources until the user provides the requested input. Thus, the OS may allow these tasks to aggregate the surrendered allocations of resources for later use, to provide for reasonable completion times of user prompted requests. This aggregation must be limited, however, to prevent indefinite accumulation of these resource allocations in the case that the task must wait for user interaction for an extended period. Such a situation would likely result in unacceptable resource monopolization.

2.3.1.3 Real-Time Tasks

Tasks with defined throughput requirements are real-time tasks, and include all those with multimedia playback or streaming duties. Expectations exist for such tasks to perform a set amount of work over a duration defined by the application. In the case of multimedia applications, the media source provides a baseline amount of work to carry out over a period, usually measured in units of bit rate. Media applications may attempt to modify these requirements in order to optimize efficiency and stability, or even to provide for graceful degradation if resources are not sufficiently available. Video playback applications may perform optimizations by reducing video resolution during rendering to match the window or display size. Likewise, a video streaming server and or client may, however annoyingly, decrease video quality due to network or server load in order to provide graceful degradation of service. The ability to reconcile insufficient access to the resources required for ideal execution of real-time tasks implies the system implements soft real-time tasks. Hard real-time tasks are those with stiff performance requirements in which the failure to meet deadlines results in failure of the system as a whole. Commonly found in embedded system controllers, such as antilock braking systems and Engine Control Modules (ECM) in automobiles, their workload must be bound by system developers to ensure system resources are not overloaded. The ECM is an ideal example, as its workload may be dynamic, such as with variable engine controls. As a hard real-time system, a failure of the ECM to meet performance requirements results directly in system failure. Such a failure would likely result in the engine stalling or failing catastrophically. However, the engine inputs and components controlled are finite, bounding total workload.

2.3.2 Dynamic System Events

Regardless of the method responsible for properly allocating resource time shares to tasks, accounting for dynamic system events is required to ensure fairness under GPS, and must therefore first consider the amount of system resources tasks require. Dynamic events such as tasks entering or leaving the system, as well as modifications to a task's priority or classification, require a recalculation of requested resources [3, 10, 15, 29]. Further complicating matters, the OS considers tasks inactive while they wait for sluggish resources, such as hard disks. These tasks are not considered to be competing for computational resources, and therefore are seen to have left the system until the resource has responded; they are said to be blocked by I/O.

2.4 Proportional and Fair Share Scheduling

Proportional Share Schedulers are those which attempt to allocate resources according to the relative importance of each task. Tasks are assigned a value, or weight, indicating their relative importance, and the scheduler accounts for resource utilization in comparison to their ideal proportional share. The earliest attempts to proportionally share resources among competing weighted clients simply scheduled resource access in a round robin fashion; this was in fact the primary principle of the 2.5 Linux kernel scheduling algorithm. That scheduler attempts to grant proportional resource access by simply allowing each task to execute for a period representative of its priority. Fairness is ensured for static task sets, though dynamic system events complicate the system's ability to ensure fairness. In order to correct errors resulting from dynamic events with respect to proportional resource access, true proportional share schedulers follow the ideal fluid flow model. Ideal fluid flow refers to the physics of fluids flowing through a constrained cross sectional area with respect to velocity and pressure; specifically, fluid velocity is inversely related to its pressure. The physical model is analogous in that the speed at which tasks may progress has the equivalent property in any system with finite resources and thus limited execution throughput. This concept is directly realized with the definition of virtual time $V$ as

$$V(t) = \int_0^t \frac{1}{L(T(t))}\, dt \tag{2.8}$$

where at any moment $t$, relative to system start time $0$, virtual time $V$ is dependent upon the load $L$ of the dynamic task set $T(t)$. As mentioned previously, resource allocation in real systems must consider dynamic events such as the arrival of new tasks, task completion, and task weight changes; therefore, to denote the task set at some time $t$ we use the notation $T(t)$. Equation (2.8) clearly illustrates that the rate of progress of virtual time is inversely related to the system load. The ideal resource share $S_i$ of some task $i$ between time points $a$ and $b$ with respect to its weight $w_i$ is derived directly from the definition of virtual time $V$ as²

$$S_i(a, b) = w_i \cdot (V(b) - V(a)) \tag{2.9}$$

since share is proportional with respect to task weight in relation to total load at any given time.

² Note that Equation (2.9) is equivalent to (2.2) for a static task set and a uni-processor system.

Dynamic system events, such as the exiting and creation of tasks and alterations to their weights, may be performed while maintaining fair shares. Scheduling may occur with respect to task utilization deviations from the ideal; thus we define $\mathrm{lag}_i$ for some task $i$ as the difference between the ideal allocation $S_i$ and the actualized utilization $s_i$:

$$\mathrm{lag}_i(a, b) = S_i(a, b) - s_i(a, b). \tag{2.10}$$

Note that over any period in which a single task monopolizes resource access, all other tasks' lag values increase while the serviced task's lag value decreases. Additionally, note that the lags of all tasks sum to zero, $\sum_{i \in T} \mathrm{lag}_i(a, b) = 0$, which is a trivial proof exercise: over any interval a work-conserving system delivers exactly as much service as the ideal model allocates, so the ideal shares $S_i$ and the actualized utilizations $s_i$ have equal sums.

2.4.1 Earliest Eligible Virtual Deadline First

Stoica and Abdel-Wahab [29] introduced the Earliest Eligible Virtual Deadline First (EEVDF) algorithm, which utilizes virtual time and the fluid flow model to account for resource allocation. The algorithm makes use of two additional values, virtual eligibility and virtual deadline, to make scheduling decisions and limit lag, as well as to define the period in which a single resource request $r_i$ may be made by any single task. A task $i$ with eligibility time $e_i$ and service request time $r_i$ is related to its deadline $d_i$ by

$$V(d_i) = V(e_i) + \frac{r_i}{w_i}. \tag{2.11}$$

To see why the math works, assume the simplest case in which each task makes a resource request for a period equal to its weight, $r_i = w_i$, in some unit of actual time. The virtual deadline calculation then results in a one virtual time unit difference from the task's virtual eligibility. Over any single virtual time unit, the system would be capable of satisfying a proportional share, assuming it may divide time infinitesimally. Each task would be assigned a virtual deadline one virtual time unit away, which is equal to the quantity of system load in actual time units.

The degree to which tasks request resources with respect to their weight determines their deadline, as would be expected. Note that a task's deadline is in terms of virtual time, since a wall time equivalent is unknown, being dependent upon future resource competition. As a practical note, resources may not support infinitesimal time shares; therefore tasks will not relinquish resource access at the moment their request has been met. To account for these resource deficits and excesses, a task may not be eligible for another time slice until its actualized resource access equals its ideal. Therefore some task $i$ is said to have the eligibility time $e_i$, in corresponding virtual time as defined in Equation (2.8), by the relation

$$V(e_i) = V(t_i^0) + \frac{s_i(t_i^0, t)}{w_i} \tag{2.12}$$

where $s_i$ denotes the actualized service utilization from the time task $i$ began execution, denoted by $t_i^0$, to the current system time $t$. The remainder of the algorithm is straightforward from its name: tasks are scheduled and granted resource access in order of their virtual deadline times, given that the system's virtual time has surpassed the task's virtual eligibility. EEVDF has been demonstrated to be optimal in terms of providing bounds for task lag among proportional share algorithms, and thus is appropriate in systems with real time computation requirements. Lag is bounded by the inequality relation [29]

$$-r_i < \mathrm{lag}_i(t_i^0, t) < \max_{\forall k \in T}(r_k, q) \quad \forall i \in T \tag{2.13}$$

where $q$ denotes the scheduling quantum, or how often scheduling decisions are made.
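As a compact illustration of Equations (2.11) and (2.12), the following C sketch computes a task's virtual eligibility and deadline from its accounting state. This is a simplification under the stated model: the field and function names are illustrative, and virtual times are kept as doubles for clarity rather than the fixed-point arithmetic a kernel implementation would use.

/*
 * Illustrative EEVDF bookkeeping per Equations (2.11) and (2.12).
 * v_start - virtual time V(t_i^0) when the task began execution
 * service - actualized service s_i(t_i^0, t) received so far
 * request - outstanding request length r_i
 * weight  - task weight w_i
 */
struct eevdf_task {
    double v_start;
    double service;
    double request;
    double weight;
};

/* Equation (2.12): eligibility advances as service is consumed. */
static double virtual_eligibility(const struct eevdf_task *t)
{
    return t->v_start + t->service / t->weight;
}

/* Equation (2.11): deadline follows eligibility by r_i / w_i. */
static double virtual_deadline(const struct eevdf_task *t)
{
    return virtual_eligibility(t) + t->request / t->weight;
}

/*
 * Scheduling rule: among tasks whose eligibility does not exceed the
 * current virtual time v_now, run the one with the earliest deadline.
 */
static int runnable_before(const struct eevdf_task *a,
                           const struct eevdf_task *b, double v_now)
{
    if (virtual_eligibility(a) > v_now)
        return 0;
    if (virtual_eligibility(b) > v_now)
        return 1;
    return virtual_deadline(a) < virtual_deadline(b);
}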

2.4.2 Completely Fair Scheduler

Along with the 2.6.23 Linux kernel, the Completely Fair Scheduler (CFS) was introduced, aimed at being fair with respect to bounding the difference between an ideal proportional share and actualized resource utilization. As an alternative to task weights, UNIX-like systems such as Linux have historically preferred the nice terminology. The term refers to how nice a task is to the system, normalized by neutral tasks with a nice value of zero; nicer tasks have a greater positive value, while negatively nice tasks require additional system resources. Thus tasks with lower priority have a higher nice value and vice versa. The weight of some task $i$ with nice value $nice_i$ is determined by a static look-up table, having the approximate relation

$$w_i \approx \frac{1024}{1.25^{\,nice_i}}. \tag{2.14}$$

Instead of maintaining a virtual time value directly, CFS accounts for the runtime of each task, or actualized resource utilization, in virtual time. In this way the system is modeled similarly to GPS, or the ideal fluid flow model, in that each task progresses at the same rate proportional to its priority. The virtual runtime variable vruntime is incremented for task $i$ by

$$vruntime_i \mathrel{+}= \Delta t \cdot \frac{1024}{w_i} \tag{2.15}$$

where $\Delta t$ refers to the actualized system resource runtime since the previous instance of accounting, and the weight is a function of the nice value. Notice that the value 1024 reflects the weight of a task with a nice value of zero. Since runtime is accounted for in virtual terms rather than system time, accounting may be accomplished for each task during scheduling with no knowledge other than the duration the task was scheduled and its nice value. This allows CFS to provide fairness with low computational overhead.
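A minimal C sketch of the accounting in Equations (2.14) and (2.15) follows. The names are illustrative and the arithmetic is simplified: the kernel itself uses a precomputed nice-to-weight look-up table and fixed-point scaling rather than floating point.

#include <math.h>

#define NICE_0_WEIGHT 1024.0   /* weight of a nice-0 task */

/* Equation (2.14): approximate nice-to-weight mapping. */
static double nice_to_weight(int nice)
{
    return NICE_0_WEIGHT / pow(1.25, nice);
}

/*
 * Equation (2.15): charge a task for delta_ns nanoseconds of CPU time,
 * scaled inversely by its weight so that higher-priority (heavier)
 * tasks accumulate vruntime more slowly and are picked again sooner.
 */
static void update_vruntime(double *vruntime, int nice, double delta_ns)
{
    *vruntime += delta_ns * (NICE_0_WEIGHT / nice_to_weight(nice));
}

For a nice-0 task the scaling factor is exactly 1, so its vruntime advances at wall-clock rate; this is the normalization the constant 1024 provides.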

2.4.3 Red Black Binary Search Tree

The CFS algorithm selects tasks to run based on the lowest virtual runtime by maintaining a sorted data structure. The data structure of choice is a self-balancing red-black Binary Search Tree (BST), used widely in the kernel and thus implemented in the library lib/rbtree.c, whose source is accompanied by documentation in doc/rbtree.txt [19]. Red-black trees are similar to other self-balancing BSTs such as AVL trees, though they trade off slightly slower look-up times due to larger bounds on root-to-leaf path lengths. Relaxing the bounds slightly allows for fewer rotations during balancing upon node insertion and removal. The use of a balanced BST ensures CFS is computationally efficient, having complexity O(lg n) with respect to the number of tasks when modifying the run queue with insertion and removal of tasks, as well as for searching operations. Red-black trees operate on the following invariant conditions:

• Nodes are colored either red or black.
• Root and leaf nodes are colored black.
• Each red node must contain two child nodes colored black.
• Each simple path from the root to a leaf contains an equal number of black nodes.
• Null or empty leaf nodes may be used or implied to comply with the preceding rules.


Figure 2.1: Red-Black Binary Search Tree

The use of the Linux rbtree within the kernel is fairly straightforward. Rather than inserting objects, the library inserts each object's member rb_node structure by managing its pointers. It is required of the developer to determine the sorted key value of the tree's objects and to ensure a stable comparison, to avoid collisions and ensure proper searching if required. Key values which sort objects are defined by the developer while creating the insertion function, which determines the correct placement and calls the functions rb_link_node and rb_insert_color to insert the object's rb_node member. Searching may be accomplished without use of the library with the usual BST algorithm and the rb_entry³ macro, while considering any stable comparison operation. Likewise, insertion of nodes into the tree requires the location to be determined with respect to a pointer to the parent node and a pointer to the parent node's associated link, either rb_right or rb_left. These parameters, along with the tree root rb_root, are passed to rb_link_node. The coloring, and balancing, of the tree occurs upon a call to the rb_insert_color function. The library provides for the easy removal of nodes by simply passing an object's rb_node pointer and the tree's rb_root to the rb_erase function; an insertion sketch in this style is given at the end of this section. Due to the ease of implementation of CFS, and in order to provide a reasonable baseline, CFS was selected as the scheduling algorithm implemented in the simulation further detailed in subsequent chapters. Additionally, a user space rbtree implementation provided under the GNU GPL, itself based on the kernel implementation, was sourced [11] and modified for use with the simulator.

³ The rb_entry macro operates equivalently to the container_of macro, and is used to obtain the object which contains some referenced member.
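The following sketch illustrates the insertion pattern just described using the kernel's rbtree API, keyed here by a vruntime field as CFS does. The surrounding struct and function are illustrative, not kernel source.

#include <linux/rbtree.h>

/* Illustrative queued entity sorted by virtual runtime, as in CFS. */
struct sched_entity_demo {
    struct rb_node node;              /* embedded rbtree linkage */
    unsigned long long vruntime;      /* sort key                */
};

/* Walk to the correct leaf position, then link and rebalance. */
static void demo_enqueue(struct rb_root *root, struct sched_entity_demo *se)
{
    struct rb_node **link = &root->rb_node;
    struct rb_node *parent = NULL;

    while (*link) {
        struct sched_entity_demo *entry;

        parent = *link;
        entry = rb_entry(parent, struct sched_entity_demo, node);
        /* Stable comparison on the key; ties fall to the right. */
        if (se->vruntime < entry->vruntime)
            link = &parent->rb_left;
        else
            link = &parent->rb_right;
    }
    rb_link_node(&se->node, parent, link);   /* place the node      */
    rb_insert_color(&se->node, root);        /* recolor and balance */
}

/* Removal is a single call: rb_erase(&se->node, root); */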

2.4.4 Distributed Weighted Round-Robin

A number of round robin algorithms have been utilized in the task scheduling literature and perform well in terms of computational complexity, though generally less so in terms of fairness. As an example, the Linux 2.5 kernel implementation maintained two run queues, active and inactive, implemented as sets of lists, each associated with a given priority. Scheduling occurs by simply scanning the active queue in decreasing order of priority for the next available active task, in constant time. Tasks concluding their allotted time slice are placed upon the inactive queue, until the queues switch places when no active tasks remain [27, p. 179] [4, p. 268]. Heuristics allowed the system to shift task priorities to provide for task interactivity, though they also opened up vulnerabilities which allowed a task privy to the workings of the scheduler to overuse its share of processor time [30]. Fairness was achieved in static systems, though waned upon dynamic system events. DWRR [21] is based on instituting local and global fairness separately. Implemented with the 2.5 scheduler as described above, DWRR provides local fairness in terms of round slicing. Rounds are defined as the total time required to service each client in a distributed run queue with a proportional share. Formally, a round concludes once all tasks have executed and been relocated from the active to the inactive run queue. Global fairness is achieved by performing round balancing, essentially pulling tasks from lagging run queues to those having completed the current round more quickly. Global fairness is bounded by ensuring that no two run queues may be separated by more than a single completed round. DWRR maintains O(1) complexity, provides low migration overhead, and allows tasks to run continuously with bounded lag. Round balancing may also be implemented independently of round slicing, as was presented, and demonstrated good performance, with CFS [21]. A minimal sketch of the round-balancing rule follows.
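The sketch below captures only the round-balancing invariant described above; the data structures and names are hypothetical, and the published DWRR design differs in detail.

/*
 * Hypothetical DWRR-style round balancing: a CPU whose queue has
 * finished its round pulls work from any peer still in an older round,
 * keeping all queues within one completed round of each other.
 */
struct dwrr_queue {
    unsigned long round;     /* rounds completed so far      */
    int           nr_tasks;  /* runnable tasks in this queue */
};

static int dwrr_should_pull(const struct dwrr_queue *self,
                            const struct dwrr_queue *peer)
{
    /* Pull only from peers lagging behind in round number, and only
     * if they still have tasks to give up. */
    return peer->nr_tasks > 0 && peer->round < self->round;
}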

2.4.5 Additional Proportional Share Algorithms

The concept of virtual time is similar to that of the traffic control algorithms designed for packet switching networks by Zhang [32]. Data transmission flows are compared to their requested average transmission rate by advancing a Virtual Clock value by the reciprocal of the rate. The virtual clock value is then compared to the real time and, if greater than some threshold, indicates the requirement to trigger control mechanisms. Packets are time stamped with respect to the virtual clock of their associated flow, and transmitted in increasing order of virtual clock time stamps. Similar to the work of Zhang [32], Weighted Fair Queuing (WFQ) was introduced alongside a GPS-equivalent model referred to as Bit-by-bit Round-Robin (BBRR) [9]. As GPS models time multiplexing of resources with respect to infinitesimal time units, BBRR models packet transmission with the infeasible granularity of a single bit [9]. The model does provide the ability to calculate transmission completion times, which are used in WFQ to order packet transmission. As an alternative to scheduling according to the GPS-modeled completion time as in WFQ, Start-time Fair Queuing (SFQ) is a CPU scheduling algorithm which grants resource access to clients according to, and in increasing order of, start times. Start times are defined to be either the completion time of the previously requested transmission or the current time, whichever is latest. Surplus Fair Scheduling (SFS) generalizes the SFQ algorithm to the multiprocessor case by reassigning weight values as described in the previous Section 2.2.4, symbolized by $\phi$ [7]. Tasks are then sorted based on the surplus service time $\alpha$, as calculated for some task $i$ with respect to its execution start time $t_i^0$ by

 0  0  αi = φ · ti − min {t j : j ∈ T} (2.16)

The SFS implementation was presented for use in the now vastly outdated 2.2.14 Linux kernel. Due to its sorting complexity of O(n log n) is not a practical consideration and is included solely for historical context of an early PSS. An additional class of network fair queuing algorithms named Stochastic Fair Queuing was presented by McKenney[23]. A variant is provided in the Linux kernel and documented in the manual pages with the tc command. Stochastic Fair Queuing uses a periodically perturbing hash function on source-destination based conversations to discourage collisions and provide a statistically fair servicing of communications. The 38 resulting hash calculation is used to assign packets to a number of queues, which are serviced in a round robin fashion in constant time.

2.5 Performance Monitoring Counters

Performance Monitoring Counters (PMC) are special processor registers which accumulate system events⁴. For the x86 architecture, these registers fall within the set of Model Specific Registers (MSR). MSR are special registers whose use falls outside of any Instruction Set Architecture (ISA); rather, their use is dependent upon the processor model. Their capabilities must be identified by probing the processor for Central Processing Unit Identification [12] (CPUID) data and cross referencing supported features documented within the manufacturer's manuals. PMC registers are used to count system events. These events may include those related to fine-grained resources as described previously, including FPU and ALU operations, LLC misses, etc. PMC registers have been widely used by software developers to profile code and identify resource utilization patterns and requirements. To date, the absence of previous work utilizing PMC registers to profile tasks at run time for use in resource allocation, and specifically in load balancing, demonstrates a research gap. The use of PMC registers for Intel x86 processors is well documented in Intel publications [12, 14]. Events are counted by writing values to a special MSR called the performance event selector, whose bit fields indicate counted events and options. Each selector register is associated with a counter register, from which both reading and writing are accomplished by a single instruction. Thus the overhead incurred during context switching is modest and fixed. Specifics on their use in this work are provided in the following chapters.
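To illustrate the access convention, the following is a minimal C sketch of wrappers around the RDMSR and WRMSR instructions; the function names are illustrative, and such code must execute at privilege level 0 (i.e., within the kernel).

#include <stdint.h>

/* ECX selects the MSR; EDX:EAX carries the 64-bit value. */
static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ __volatile__("wrmsr" : :
                         "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}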

4 Though PMC registers exist across a wide range of architectures, this work focuses on the Intel x86 architecture, the implementation of which is architecture dependent. [12]

3 Methodology

3.1 Motivation

The literature on fair resource allocation has been directed towards proportional scheduling within both theoretical and applied realms. These include the presentation of EEVDF and the inclusion of CFS within the Linux kernel, as discussed in the previous chapters. Proportional fairness is achieved by considering the relative flow of execution throughput, defined as the reciprocal of system load with respect to time [29]. This concept, commonly referred to as virtual time, originated in packet switching network research presented by Zhang [32] for "rate-based traffic control algorithms". Unfortunately, unlike network packets, which may be processed in constant time due to network protocol standardizations such as Ethernet frames, task processing rates are infrequently constant. Previous work by Dunn [10] considers resource allocation within systems with inconsistent processing rates, specifically load balancing in asymmetric multiprocessor systems where inconsistent processing rates are the result of dynamic frequency scaling. Loads are balanced by an algorithm utilizing a minimal objective function to calculate the relative error between allowable task set migrations between pairs of processors. While the work by Dunn considers the bogoMIPS⁵ value as a representative metric of processor performance, he notes that identifying an ideal metric is outside the scope of his work. This work extends the work of Dunn [10] in that an attempt is made to identify PMC registers as a useful metric in determining processor performance, and specifically in the profiling of tasks. A number of other technologies result in inconsistent processing rates. Examples include systems designed with NUMA, due to variance in memory access latency.

5 Linux determines the bogoMIPS value at boot time by executing a busy-loop and recording the millions of instructions per second the processor may achieve.

Hardware optimizations, such as out-of-order execution and branch prediction, which may or may not succeed, may also vary throughput. Hardware multithreading incurs inconsistent processing rates due to super-scalar and super-pipelining technologies which attempt to fully utilize execution units. Different sets of concurrently executing software threads, which are multiplexed on the same physical circuit under multithreading, may have different contention patterns for execution units than others. Multiprocessing technologies may introduce inconsistent processing rates due to resource interdependencies such as cache and bus usage. Consequently, as hardware complexity improves overall throughput, it also complicates the system's attempt to manage resources and ensure fair resource utilization. These considerations led to the initial test, which aimed to verify the assumption that certain resource loads may incur additional overhead due to contention, overhead which may be alleviated with proper task profiling and load balancing. This test is described in detail in section 4.2.

3.2 Synopsis

This chapter presents the approaches considered for balancing system load with respect to the added dimension of a finer grained resource, eliminating contention in order to improve computational throughput.

3.2.1 Partition Optimization Problem

Revisiting the partitioning problem from subsection 2.2.2, the partitioning valuation functions must be considered to complete the formulation of PP into an optimization problem. The optimization version of PP is combinatorial in nature, and considers the valuation of each instance of a valid partitioning given a set of tasks. Fairness must drive the definition of such a valuation function, and so summary statistics are used that allow us to quantify the partitioning imbalance.

3.2.2 Partition Load Range

Given a task set T, we may decide to evaluate a partitioning P with respect to the range of its partitions' loads, as calculated by

Range(P, T) = max_{Pi ∈ P} L(Pi) − min_{Pi ∈ P} L(Pi)    (3.1)

for a K-partitioning P of task set T as defined by Equation (2.5). The range valuation of a partitioning provides developers the simplest load balancing scheme to calculate and implement, given its online algorithmic nature. Local changes to a run queue require verifying or altering only the minimally and maximally loaded run queues. Load balancing may be accomplished, as is done in Linux, by pulling tasks from the maximally loaded run queue to the minimally loaded. Valuation results are bound in the worst case by the maximum task weight, and allow a simple greedy implementation that is further optimized by sorting tasks with respect to weight.

3.2.3 Mean Deviation

A more accurate partitioning evaluation would consider the mean deviation of the partitions' loads with respect to the optimal load, which can be calculated by

MeanDeviation(P, T, K) = ( Σ_{Pi ∈ P} | L(Pi) − L(T)/K | ) / K    (3.2)

for a K-partitioning P of a task set T as defined by Equation (2.5). The benefit mean deviation has over range valuation is increased fairness for tasks across the entire system, rather than only for those incurring the greatest deviation from a fair allocation. From a developer's standpoint, partitioning evaluation based on mean deviation may still be considered locally with respect to the run queues whose loads have changed, allowing for a distributed implementation similar to that of Linux. This is due to the fact that if any subset of partitions reduces its mean deviation from an ideal partitioning, the global accumulation of mean deviation is reduced accordingly.

3.2.4 Variance

Mean deviation as a valuation function may be improved upon by squaring the deviation of each partition load, resulting in the variance. The variance valuation equation is defined by

Variance(P, T) = ( Σ_{Pi ∈ P} ( L(Pi) − L(T)/|P| )² ) / |P|    (3.3)

for a K-partitioning P of a task set T as defined by Equation (2.5). This incurs a slight increase in computational overhead, but results in additional fairness by imposing a penalty upon partitionings with outliers with respect to partition load.
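The three valuations lend themselves to direct implementation. The following is a minimal C sketch, assuming load[i] holds L(Pi) for each of the K partitions and total holds L(T); all names are illustrative rather than taken from any implementation in this work.

#include <math.h>

/* Equation (3.1): spread between the most and least loaded partitions. */
double range_valuation(const double *load, int K)
{
    double min = load[0], max = load[0];
    for (int i = 1; i < K; ++i) {
        if (load[i] < min) min = load[i];
        if (load[i] > max) max = load[i];
    }
    return max - min;
}

/* Equation (3.2): mean absolute deviation from the ideal load L(T)/K. */
double mean_deviation(const double *load, int K, double total)
{
    double ideal = total / K, sum = 0.0;
    for (int i = 0; i < K; ++i)
        sum += fabs(load[i] - ideal);
    return sum / K;
}

/* Equation (3.3): squared deviations penalize outlier partitions. */
double variance(const double *load, int K, double total)
{
    double ideal = total / K, sum = 0.0;
    for (int i = 0; i < K; ++i)
        sum += (load[i] - ideal) * (load[i] - ideal);
    return sum / K;
}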

In each of the preceding subsections, summary statistics are used to evaluate partitionings in terms of error. Clearly, with these valuation functions we may formulate the partitioning problem optimization function PP-OPT. The optimal partitioning P for some task set T executing on a K processor system is given by

PP-OPT(T, K) = P ∈ U : valuation(P, T) = min { valuation(Pi, T) : Pi ∈ U }    (3.4)

for the universal set U = { Pi : Pi is a valid partitioning of T }, where valuation(P, T) is some metric of partitioning fairness, for example any of the Equations (3.1), (3.2), or (3.3). In terms of fairness, the partitioning valuations presented only ensure tasks a limited rate of unfairness accumulation. Care must be given to ensure no run queue is overloaded for an extended duration, regardless of how a different partitioning may affect short term fairness. Without such consideration, graceful degradation and QoS may not be guaranteed. By imposing an additional trigger to load balancing routines to implement a round balancing algorithm, as done in DWRR [21], task lag may be bound. Round balancing may be imposed within PSS algorithms by defining a maximum deviation for run queues' virtual time, in the case of EEVDF and others, or for tasks' virtual runtime, as in the case of CFS. As the main topic of this work considers multidimensional load balancing, the implementation of round balancing has been left for future work.

3.3 Finer Grained Resources

As defined previously, finer grained resources refer to system resources which are not associated with a single hardware thread in a one-to-one correspondence. The term finer grained refers to resources which are utilized indirectly as a result of scheduling CPU time shares. Examples consist of execution units within the hardware thread processing pipeline and storage and communication circuits such as caches and buses. As mentioned previously, and demonstrated in the following chapter, imbalances in finer grained resource utilization reduce throughput by imposing additional competition. To reduce competition and optimize throughput, the system must first profile tasks to quantify resource requirements. Tasks may be profiled across multiple resources, though each additional resource incurs a higher order of complexity, resulting from the higher dimensionality of the load balancing problem. For this reason, this work considers a single finer grained resource in addition to weight based load balancing, resulting in the consideration of two dimensions.

3.3.1 Multidimensional Load Balancing and Partition Problem

To consider the allocation of finer grained resources, it is useful to formulate a multidimensional partitioning problem which is dependent upon a set of resources.

Therefore, given a set of redundant resource types R = {R0, R1, ...}, across which a set of competing tasks must be allocated, resource load may be redefined across a single resource type Rj ∈ R as L(S, Rj). Further, we say that each resource type Ri ∈ R may have multiple units; a multiprocessor system is one such example, having multiple processing units. To denote the number of units of a resource, we use the notation |Ri|. PP-D is then extended to the Multidimensional Partition Decision Problem (MPP-D) across multiple dimensions, or resource set R, by

MPP-D(T, R) = true, if ∃P : ∀Pi ∈ P, ∀Rj ∈ R, L(Pi, Rj) = L(T, Rj)/|Rj|; false, otherwise    (3.5)

where L(T, Rj) denotes the utilization of resource Rj by task set T. We can use Equation (3.5) to decide whether there exists a partitioning such that, for each resource Rj ∈ R in the set, an equally balanced partitioning exists.

Unfortunately (3.5) is insufficiently general for the case of most resource sets within modern computer systems, as there are likely hierarchical constraints. For example, if the set of resources was defined as R = {RDISK, RRAM, RCPU} for a system's disks, memory, and processors respectively, it is very likely that |RDISK| < |RRAM| < |RCPU|. Many systems are designed hierarchically, where resource access is dependent upon a subset of other resources. Therefore, partitionings must be created which take such hierarchical system designs into consideration.

3.3.2 Interdependent Multidimensional Load Balancing

Certain resources have interdependencies between processing elements which further complicate the Multidimensional Partition Problem (MPP). Hardware multithreading, a technology which allows for the multiplexing of the processing pipeline between multiple software threads, may demonstrate a wide discrepancy in execution throughput for various task sets. Generally, symmetric multithreading behaves optimally with task sets having non-conflicting resource requirements. For example, a mixture of tasks with high dependencies on either memory access, the FPU, the ALU, or caches allows super-pipelined, super-scalar multithreaded processors to more fully utilize the processor's pipeline and improve throughput. These resources are the focus of this work, and are further referred to as finer grained resources.

Unfortunately, it is insufficient to simply balance task sets along hardware threads with independent task scheduling while considering finer-grained resources⁶, since many finer-grained resources are shared across multiple hardware threads. Such an implementation would balance heavily fine grained resource dependent tasks upon multiple run queues, each mapped to a hardware thread, which may have to share the resource. For example, consider a system with hardware multithreading support which executes a CPU bound task set, equally dependent upon the ALU and FPU, though only a portion uses the FPU. Modern operating systems would refer to each hardware thread as an independent logical processor, between which tasks would be assigned in a balanced fashion. The tasks would likely be partitioned independently of FPU dependency, thus resulting in a portion of FPU dependent tasks occupying each run queue. Since scheduling is an independent operation, multiple tasks may compete for a single finer-grained resource during hardware multithreading's multiplexing of the processor pipeline. If these tasks were instead required to occupy a single run queue, no additional competition would occur due to hardware multithreading. Corresponding examples may be constructed for each finer-grained resource if the system's architecture is examined.

6 We ignore run-queue interdependent task scheduling due to concurrency concerns and overheads.

4 Testing Framework and Experiments

4.1 Experimental Environment

Experiments are performed on a system utilizing an Intel Core i7-920 hyperthreaded quadcore processor with 12 GB of DDR3 main memory across 3 channels. Ubuntu 12.04 LTS was chosen as the test operating system, as the release carries long-term support under which software packages are maintained [2]. The 12.04 Ubuntu LTS version also includes the Linux 3.2 kernel, which is considered a stable kernel with long-term support [1].

4.2 Initial Considerations and Experiments

As discussed in the previous chapter, increasingly complex computational system designs introduce varying computation rates depending on the resources required. With this consideration, the initial test to verify the assumption that additional overheads may be incurred in task loads considers a very simple CPU bound task set. The contrived task set consisted of two identical loops which executed arithmetic operations extensively on either floating point or integer data types. The number of instances of each was equivalent, and each was made to execute for a set number of iterations over an equivalent duration, such that if each were executed independently without contention, their completion times would fall within a tenth of a second of one another; they would thus be executing concurrently for at least 98% of their duration. The contrived task set is then created and executed, and displays the computational throughput with respect to the time required to complete execution under various resource dependency patterns. As a baseline, both the ALU dependent and FPU dependent tasks are executed independently to verify their time requirements under no resource contention. In addition, each CPU hardware thread is made to concurrently execute an instance of the ALU dependent task; this is followed by each concurrently executing an instance of the FPU dependent task. This provides an upper bound for resource contention and verifies conditions under improperly balanced task sets.

Table 4.1: Baseline Contrived Load

pid    comm      cpu   time    %diff
8966   do_iops   6     5.419    6.78
8964   do_iops   4     5.451    7.39
8965   do_fops   5     5.507   15.16
8967   do_fops   7     5.566   16.41
8963   do_fops   3     5.918   23.77
8961   do_fops   1     5.939   24.21
8960   do_iops   0     6.651   31.05
8962   do_iops   2     6.700   32.01
max: 6.700    std dev: 9.76

No Set Affinity; Balanced by Unmodified Linux Kernel

Task sets and resource dependency patterns were contrived for the two remaining tests to verify our assumption that multithreaded processing may introduce overhead which may be alleviated by properly balancing loads according to finer grained resource requirements. In order to contrive a resource dependency pattern, tasks must first have the ability to be mapped to a single hardware thread. Processor core and hardware thread topology are identified within the Linux operating system with the cpuid command, which assigns each hardware thread a processor identifier. Those hardware threads sharing a physical processing core are identified by equivalent core id values. With this information we may assign task loads to cores such that an equivalent amount of computation is performed, where pairings of hardware threads sharing a core id contend with tasks requiring equivalent or differing resources. The output of executing the cpuid command on the i7-920 in the machine used in testing is provided in Figure B.1 in Appendix B. With this information, tests were performed which set affinities for either FPU or ALU dependent tasks, named do_fops and do_iops respectively. Tasks were made to run exclusively on a specific hardware thread by invoking the sched_setaffinity Linux system call. A representative sample of the results is included below in Tables 4.2 and 4.3. For each, the CPU identifier to which the task's affinity was set is listed, as well as the process identifier and the duration required to execute each task. The percent column refers to the overhead incurred over the same task executing independently of any resource contention.
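A minimal sketch of the affinity assignment follows, assuming the calling task pins itself to a hardware thread whose processor identifier was obtained from the cpuid output; the helper name is illustrative.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling task to one hardware thread; cpu is the processor
   identifier reported by cpuid for the desired hardware thread. */
int pin_to_cpu(int cpu)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {  /* 0 = calling task */
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}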

Table 4.2: Imbalanced Contrived Load

pid    comm      cpu   time    %diff
8975   do_fops   7     5.497   14.96
8971   do_fops   3     5.505   15.12
8973   do_fops   5     5.505   15.12
8969   do_fops   1     5.537   15.80
8972   do_iops   4     7.050   38.91
8968   do_iops   0     7.073   39.36
8974   do_iops   6     7.083   39.56
8970   do_iops   2     7.146   40.79
max: 7.146    std dev: 13.06

Finer Grained Resource Imbalance; Physical Cores Assigned Either ALU or FPU Dependent Tasks

As Tables 4.2 and 4.3 display when cross referenced with Figure B.1, both the do_iops and do_fops tasks benefit from a contrived affinity in which pairs of ALU and FPU heavily dependent tasks execute on hardware threads within the same physical processing core. When one studies the advantages of hardware multithreading utilizing superpipelining and superscalar technologies, this is the expected result.

Table 4.3: Balanced Contrived Load

pid    comm      cpu   time    %diff
8978   do_iops   2     5.369    5.78
8979   do_iops   3     5.408    6.56
8976   do_iops   0     5.433    7.04
8977   do_iops   1     5.729   12.87
8982   do_fops   6     5.842   15.11
8980   do_fops   4     5.868   15.61
8983   do_fops   7     5.897   16.19
8981   do_fops   5     6.001   18.24
max: 6.002    std dev: 4.97

Finer Grained Resource Forced Balance; Each Physical Core Assigned Both ALU and FPU Dependent Tasks

A number of other load patterns were tested, though for brevity only the boundary cases are presented, as well as an example of the same task set executing without CPU affinity in Table 4.1⁷. The code executed to generate both FPU and ALU dependent tasks⁸ was designed to generate a heavy load upon the respective resource while eliminating the overhead of cache misses by residing entirely in registers during execution.
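As a rough illustration of what such loads look like, the following C sketch mirrors the single-loop, single-arithmetic-instruction structure described above; the constants and iteration counts are illustrative, and in practice the results must be consumed so an optimizing compiler does not eliminate the loops.

/* FPU dependent load: repeated floating point multiplies on a register. */
double do_fops(long iterations)
{
    double x = 1.000001;
    for (long i = 0; i < iterations; ++i)
        x *= 1.000001;
    return x;   /* returned so the loop is not optimized away */
}

/* ALU dependent load: repeated integer multiplies on a register. */
long do_iops(long iterations)
{
    long x = 3;
    for (long i = 0; i < iterations; ++i)
        x *= 3;
    return x;
}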

4.3 Performance Monitoring Overview

PMC are Model Specific Registers introduced to allow developers to measure low level hardware performance events. Although PMC are model specific, Intel has provided for consistent performance monitoring across certain architectures, referred to as

7 This example was included to demonstrate a single possible load partitioning determined by the Linux load balancing algorithm. It is possible, as was experienced upon running this test, that a balanced partitioning may be decided upon, but as shown this is not guaranteed.

8 The pseudocode has been omitted due to its simplicity, consisting of a single for-loop and a single arithmetic instruction.

Architectural Performance Monitoring. Performance monitoring capabilities are enumerated by executing the CPUID instruction, as described in full detail in the reference manuals provided by Intel [12, 13]⁹. Once a developer ascertains the supported PMC features, their use is relatively straightforward. Chapter 18 of the Intel Architecture Software Developer Manual [12] lists general performance events supported across individual architectures, such as instructions retired, branch prediction, and caching efficiency. Supported features based on CPU model are more specific and are listed separately [12, ch 19]; they include counting events occurring within execution units such as the FPU, ALU, Memory Management Unit (MMU), and various cache levels. Supported events may be counted by programming the performance event select registers, denoted PERFEVTSELx, where x refers to the associated PMCx register. The number of PMC registers is limited based on architecture; for example, the test system's Intel i7-920 includes four PMC registers per core. Once an event is selected, the PMC register must be cleared to expunge any old or undefined data. The PERFEVTSELx registers also include bit fields to enable additional features, such as counting events occurring within OS and or user modes as specified by the processor privilege level, as shown in Figure 4.1. The counter must then be enabled by setting a bit in both the desired PERFEVTSELx register and a global performance control register, denoted PERF_GLOBAL_CTRL in Figure 4.2. Writing and reading these registers are protected instructions requiring privilege level 0, accessible only by the OS. The RDMSR and WRMSR instructions are used to read and write respectively; they use the ECX register to refer to a specific register, and the EDX:EAX registers to pass data to and from the registers¹⁰.

9 Not all x86 processors support all functions of the CPUID instruction nor PMC functionality, and thus their use must comply with documentation provided by Intel.

10 Care must be given when distributing software attempting to utilize PMC registers, as their support is relatively new and not guaranteed to exist in subsequent hardware.

10 Figures 4.1 and 4.2 display only the lower 32 bits; the registers are actually 64 bits, with the higher ordered bits reserved and inactive.

Figure 4.1: PERFEVTSELx Bit Fields (event select, unit mask (UMASK), counter mask (CMASK), and the USR, OS, E, PC, INT, ANY, EN, and INV flags)

Figure 4.2: PERF_GLOBAL_CTRL Bit Fields (PMC0 through PMC3 enable bits)

4.3.1 Performance Monitoring Profiling Overhead

Performance monitoring counters provide developers with a near perfect cost to benefit ratio, incurring as overhead as few as three instructions to either set up counters or record counter values. Values may be associated with tasks by simply storing values from the PMC into a new member of the task structure upon context switch. Since we are concerned with resource utilization over time, we store the rate at which events occur as the task_struct member perf_val_rate. Additionally, an exponential moving average is maintained to account for short term variations and emphasize long term tendencies. The EXP_COEFF value allows the exponential average to be tuned to favor more recent events with a larger value, but must be bound by EXP_BOUND.

Included below is the pseudocode for performing PMC operations, including setting up the counters by writing a value PERFEVTSEL_VAL, which is performed at run queue construction, and collecting PMC values during context switch.¹¹

wrmsr(PERFEVTSEL, PERFEVTSEL_VAL);                /* select the event to count */
wrmsr(PMC, 0);                                    /* clear any stale counter data */
wrmsr(PERF_GLOBAL_CTRL, SET_PERF_ENABLE_FIELDS);  /* globally enable the counter */

Figure 4.3: PMC Selection Pseudocode

pmc_value = rdmsr(PMC);                          /* read the accumulated event count */
wrmsr(PMC, 0);                                   /* reset the counter for the next task */
task->perf_val_rate = pmc_value / delta_time;    /* event rate over the elapsed slice */
task->perf_exp_ave = (EXP_COEFF * task->perf_val_rate +
                      (EXP_BOUND - EXP_COEFF) * task->perf_exp_ave) / EXP_BOUND;

Figure 4.4: PMC Collection Pseudocode

4.3.2 Performance Monitoring in Linux

PMC registers have been in use in the Linux kernel for some time, within drivers and in an optional package called linux-tools. This package provides user space control of PMC registers with the perf command, which is useful for developers profiling their code. The perf framework is extensive, and includes a great deal of functionality across system architectures. As the perf framework includes a large amount of code to create a cross-platform Application Programming Interface (API), its use was omitted in this work to eliminate all possible overheads within task scheduling code. Additionally, a slight modification to the perf-tools framework was required to ensure no contention existed

between it and our perf_sched_class with respect to setting values in the various PMC registers. This was accomplished by simply reserving the PMC3 register, by modifying how many PMC registers the perf tool sees when scanning the system upon start-up.

11 The wrmsr and rdmsr functions referenced are simple x86 in-line assembler function wrappers, which include a write to the EDX:EAX registers and the wrmsr and rdmsr instructions as documented [12].

4.4 Linux Scheduling Classes

Linux introduced a modular scheduler in kernel version 2.6.23 [18], which allows for multiple scheduling algorithms to operate concurrently. Scheduler modules are maintained within their own source code files, and each must implement a set of functions. The sched_class struct is defined in the /include/linux/sched.h file of the Linux source code; it includes function pointer prototypes which are populated by each scheduling class and referenced by the larger scheduler algorithm. Each scheduling class must statically define a sched_class structure, and place itself within a linked list in order of priority, as displayed in Figure 4.5.

stop_sched_class → rt_sched_class → fair_sched_class → idle_sched_class, with perf_sched_class inserted

Figure 4.5: Linux Scheduling Class Priority List
Modified Priority Marked by Dashed Arrows

With the Linux modular scheduler framework, implementing a new scheduling class is fairly straightforward. Since each scheduling class resides within its own source file, the new perf_sched_class scheduler was created and statically linked within the list from Figure 4.5. A set of callback functions is defined and statically assigned to the perf_sched_class members. Of these callback functions, those with the greatest importance are

• enqueue_task: passes a pointer to a task_struct object which has become able to run and must be placed in the run queue.

• dequeue_task: passes a pointer to a task_struct object which is no longer able to run and must be removed from the run queue.

• pick_next_task: called by the schedule() function; either returns a pointer to the next task_struct object dictated by the scheduling class algorithm to run next, or returns NULL if no such task exists. In order for a scheduler to be work conserving, it must return a task whenever there exist tasks which are able to run.

• put_prev_task: called by the schedule() function to preempt the currently running task and return it to the run queue (if pick_next_task removed it).

• task_tick: must determine if the currently running task has expended its allotted running time and must be preempted.

In addition to these callback functions, there exists a set of callbacks which are optional, depending on the scheduling algorithm implementation and desired features such as group scheduling and priority preemption.
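To make the registration concrete, the following is a minimal sketch of how such a class might be declared; the field names follow the Linux 3.2 struct sched_class convention, but the callback implementations and list position are illustrative rather than the exact code of this work.

static const struct sched_class perf_sched_class = {
    .next           = &idle_sched_class,    /* next lower class in the priority list */
    .enqueue_task   = enqueue_task_perf,    /* task becomes runnable */
    .dequeue_task   = dequeue_task_perf,    /* task ceases to be runnable */
    .pick_next_task = pick_next_task_perf,  /* choose the next task to run */
    .put_prev_task  = put_prev_task_perf,   /* replace the preempted task */
    .task_tick      = task_tick_perf,       /* periodic time slice accounting */
};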

4.5 Performance Scheduler Class

The implementation of a custom scheduling class allows for tests to determine the overhead of performing performance monitoring counter operations in kernel space and to determine their accuracy. Alternative performance monitoring tools exist, such as perf and the Intel Performance Counter Monitor tools [31], though these are targeted more for use by developers creating user space software or monitoring servers. A utility by I. Molnar [24] was released for kernel developers to use these performance monitoring capabilities to profile the scheduler itself, but not the tasks which are scheduled. In comparison with these tools, which provide for the stochastic sampling of the PMC registers, by placing PMC sampling instructions within the scheduler upon context switches we may obtain more accurate metrics. The only events which would introduce errors in the events counted and attributed to a task are those performed by the operating system. For this reason we focus on a single event known not to occur within the kernel, specifically floating point operations.

The perf_sched_class provides for task profiling in the put_prev_task_perf function, as well as the usual accounting of CPU time utilization. Tasks must therefore migrate to this class in order to be profiled, by utilizing the sched_setscheduler system call. The ability to obtain and control performance monitoring data has been provided by implementing loadable kernel modules.

To demonstrate the low overhead of placing PMC sampling instructions within context switches, the scheduling algorithm used by the perf_sched_class scheduling class is a bare bones implementation of CFS, without the additional features of group or domain scheduling. The choice of CFS as the task scheduler was due to the simplicity and efficiency of the algorithm rather than the fairness bounds provided for tasks' resource utilization. A more detailed discussion of the Performance Scheduling Class implementation follows.

As mentioned earlier, newly created or arriving tasks are placed within the run queue by the enqueue_task function. The enqueue_task_perf function places a task_struct object upon its class run queue, which is maintained by the Linux rbtree library discussed in section 2.4.3. In order to maintain accurate runtime accounting, if the currently running task is within the perf_sched_class, its CPU utilization must be accounted for at this time, since a newly arrived task changes the local run queue load and thus the rate at which vruntime increments. The inverse operation occurs when a task ceases to be able to run and dequeue_task_perf is called. The run queue is also referenced when the next process to run must be selected in the pick_next_task_perf function. The rbtree provided macro rb_first provides access to the task having been serviced least with respect to virtual runtime as determined by its proportional share.

The accounting of CPU utilization must occur within a number of the scheduling class API callback functions, including when tasks fork, the CPU changes context, or when tasks change weight, arrive, or leave a run queue. During the accounting of CPU utilization, as described by Equation 2.15 in Section 2.4.2, the PMC values are recorded and calculated as described in Figures 4.3 and 4.4. The performance monitored scheduling class performs a basic amount of load balancing when tasks are created. The select_task_rq_perf callback function determines the run queue new tasks are placed upon.
This is accomplished by simply scanning all run queues operating across the system and selecting the associated CPU index with the least load.
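A minimal sketch of that placement policy follows; the signature mirrors the Linux 3.2 select_task_rq callback, while perf_rq_load() is a hypothetical accessor for a run queue's load.

static int select_task_rq_perf(struct task_struct *p, int sd_flag, int flags)
{
    int cpu, best_cpu = 0;
    unsigned long best_load = ULONG_MAX;

    /* Scan every online CPU's run queue and pick the least loaded one. */
    for_each_online_cpu(cpu) {
        unsigned long load = perf_rq_load(cpu);  /* hypothetical accessor */
        if (load < best_load) {
            best_load = load;
            best_cpu = cpu;
        }
    }
    return best_cpu;
}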

4.6 Performance Monitoring Kernel Modules

Within loadable modules, in keeping with the separation of mechanism and policy, access was given to users with administrative access to monitor and control certain policies with respect to the Performance Scheduling Class. These modules are described in the following subsections.

4.6.1 Controlling Performance Event Counters

The x86 instructions rdmsr and wrmsr allow values to be read from and written to MSR registers such as the PERFEVTSELx. Both instructions are protected, and must be invoked while the CPU is in privileged mode executing kernel code. This limitation requires an interface to modify the performance events selected to be counted. Ideally, in production software such functionality would be provided by a system call; the downside of this approach is that it would require a lengthy recompilation of the kernel upon modification. For the purpose of this work, a loadable kernel module is sufficient to provide control of the performance event selector. This provides for much faster development and testing, and far fewer modified source files. The perf_sched_class provides for the PERFEVTSELx to be modified upon each context switch. As each PMC value is recorded, the scheduler verifies that the active selector corresponds to the msr_event kernel variable; a difference indicates the need to update the performance selector. The kernel module sched_perf_msr provides the direct ability to read and write the msr_event kernel variable by utilizing the kernel macro DEFINE_SIMPLE_ATTRIBUTE. This macro links a proc file operation to callback functions for formatted read and write access. The ability to modify the msr_event performance selector value was also made available to user space by implementing the system call setmsrevent; getmsrevent likewise allows for reading the variable. The implementation allowed the value to be modified either system wide or for a particular process identification number, or pid.
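A sketch of how such an attribute file can be wired up is shown below, assuming illustrative getter and setter names; DEFINE_SIMPLE_ATTRIBUTE generates the file operations from this pair.

static u64 msr_event;   /* performance event selector value used by the scheduler */

static int msr_event_get(void *data, u64 *val)
{
    *val = msr_event;
    return 0;
}

static int msr_event_set(void *data, u64 val)
{
    msr_event = val;    /* picked up at the next context switch */
    return 0;
}

/* Generates msr_event_fops with formatted (hexadecimal) read and write access. */
DEFINE_SIMPLE_ATTRIBUTE(msr_event_fops, msr_event_get, msr_event_set, "0x%llx\n");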

4.6.2 Reading Performance Event Data

Performance event data is stored within the Linux task_struct structure. Similar to the sched_perf_msr module, access to the task_struct must be handled by the kernel, and thus is implemented in a kernel module for convenience. To list per task performance counter data, another proc file is created, named sched_perf_ps to mirror the Linux ps command. Unlike the sched_perf_msr file, sched_perf_ps is read only. The data returned from invoking the proc file system callback function provides details specific to tasks within the perf_sched_class, as well as system wide accumulated statistics, formatted to resemble the Linux ps command. Real time examination of tasks' PMC values as they are updated is enabled by utilizing the Linux watch command, which essentially allows the user to display updated per task data analogous to such commands as top and htop.

4.6.3 Performance Scheduler Debug Module

A final kernel module, named sched_perf_mod, was implemented to provide debugging information and stack tracing. If loaded, certain scheduling functions within perf_sched_class store data concerning scheduling decisions. The module creates a like named read-only proc file that allows user space to read this data.

4.7 Performance Scheduling Class Tests

4.7.1 Performance Monitoring Accuracy

To test the accuracy of the performance monitoring event counters under the perf_sched_class, a number of user space applications performed a set number of counted events. Again, floating point events were used, as their use may be considered exclusively attributable to user space processes. The pseudocode for the test is contained within Figure 4.6.

float PI = 3.14159265359, E = 2.718281828459;
float x[size] = { ... };   // random data
float y[size] = { ... };   // random data
float z[size];

write(PERFEVTSEL_VAL, "/proc/sched_perf_msr");   // select the FP event
sched_setscheduler(getpid(), PERF_SCHED);        // migrate to perf_sched_class

for (int i = 0; i < size; ++i)
    z[i] = PI * x[i] + E * y[i];                 // do work

printf("Size = %d", size);
sched_yield();   // allows perf_sched_class to obtain the final PMC value

print("/proc/sched_perf_ps");   // prints per-task PMC data

Figure 4.6: PMC Accuracy Test Pseudocode

Clearly, the number of executed floating point instructions is dependent upon the variable size. The test confirmed the accuracy of the events counted, within a reasonable degree, across a number of runs carried out upon a system in various states and executing various loads. The PERFEVTSEL_VAL selected for this test is described as FP_COMP_OPS_EXE.X87 in the Intel Software Developers Manual [12, p. 19-39]. Instructions which prompt a count include floating point addition, subtraction, multiplication, division, and square root calculations.

4.7.2 Performance Monitoring Overhead

The Interbench [17] benchmark was used to demonstrate the overhead of accumulating performance monitoring events within kernel context switches. Interbench tests simulated interactive tasks within a system under some simulated background load. For the experimental tests, the source code was modified to utilize the performance scheduling class as described in section 4.5, while the control ran the benchmark unmodified, utilizing the vanilla CFS scheduling class. Table 4.4 below displays a portion of the results obtained from the Interbench Linux interactive benchmark. In this example, the benchmark simulates a number of background loads, during which the interactivity of a simulated audio load is measured in terms of latency and the ability of the scheduler to provide sufficient CPU utilization within real time deadlines. Table 4.4 displays the results of running the benchmark on the vanilla scheduler, as compared to the difference when run utilizing the performance monitoring scheduling class, as denoted by the deltas. Negative delta values in latency columns indicate improved performance, positive values indicate decreased performance, and dash marks indicate no change. The simulated audio load has real time requirements, as noted by the final column, though it is simulated as having very low resource requirements, as one would expect.

Table 4.4: Interbench Result: Audio Load

Simulated   Latency       Latency       Max            % Desired   % Deadlines
Load        Ave (ms)      Std Dev       Latency (ms)   CPU         Met
None        0.1  —        0.1  —        0.2  —         100  —      100  —
Video       0.1  ∆-0.1    0.1  ∆-0.1    0.2  ∆-0.1     100  —      100  —
X           0.1  —        0.1  —        0.2  —         100  —      100  —
Burn        0.0  —        0.0  —        0.2  ∆-0.2     100  —      100  —
Write       0.1  ∆0.2     0.1  ∆1.6     0.2  ∆17       100  —      100  —
Read        0.1  —        0.1  ∆0.2     0.3  ∆7.6      100  —      100  —
Compile     0.0  —        0.2  ∆-0.2    2.3  ∆-2.2     100  —      100  —
Memload     0.1  ∆-0.1    0.1  ∆-0.1    0.2  ∆-0.1     100  —      100  —

Unmodified Linux 3.2 Scheduler vs Performance Scheduling Class Deltas

The values indicate the ability of the scheduler to service the simulated audio load. The simulated audio load performs uncached reads from RAM at 50 millisecond intervals and simulates decoding with a 5% CPU load. The video decoding simulation is similar to the audio load simulation, though it requires additional resources. Read and Write simulate loads which perform disk operations, whereas the Burn load is a CPU bound task. As a compilation task requires both disk reads and writes as well as CPU bound work, the Compile load simulation performs each of these operations. The X simulation requests between zero and complete utilization of the CPU, to simulate a user performing operations within a GUI environment. Memload simulates a task which overwhelms the memory capacity, requiring memory pages to be swapped to disk.

As Table 4.4 depicts, an increase in scheduling latency for the audio simulation was present during simulated background loads performing disk read and write operations. Nevertheless, real-time deadlines were met, and in the cases of the video decoding, compilation, and memory swapping loads, latency was improved somewhat. A full description of the Interbench benchmark is provided by its documentation [17]. A number of other loads were simulated with the same background loads, including Video, X, and Gaming simulations. For brevity, the remaining Interbench result tables are provided in Section B.1 of Appendix B.¹²

4.8 Simulation Tests

4.8.1 Simulator Design

To evaluate the effectiveness and fairness of a proposed load balancing algorithm which considers finer grained resource utilization in allocation decisions, a user space simulator was implemented. Simulation of multiprocessor run queues¹³, scheduling, task utilization, and load balancing was implemented. The simulator was fully implemented in C++, utilizing POSIX threads for each run queue to fully simulate resource requirements under load balancing, including concurrency locking mechanisms. The goal of the simulator is to measure the increase in realized throughput obtained by balancing tasks along a finer grained resource in addition to system load. Similar to the kernel modifications, the simulator allows for tasks to be identified as heavily dependent upon a resource, and balances loads accordingly. Tasks were scheduled within each run queue utilizing the CFS algorithm as described in Section 2.4.2. The choice of CFS over other algorithms was made due to the simplicity of the algorithm, at the expense of the fairness bounds of task time shares. Likewise, the task balancing algorithm used as a baseline was modeled after the Linux implementation which, after ignoring scheduling domains, groups, and additional complexities, is effectively a distributed optimization algorithm utilizing Equation 3.1 that attempts to eliminate load range.

12 It is believed that the results from disk read and write operations may be caused by a delay in scheduling tasks after waking upon completion of a disk operation. Improving performance within the performance scheduling class for waking tasks is beyond the scope of this work.

13 The number of run queues simulated was fixed at run time, though the framework allows for the simulation of any number of run queues.

The pseudocode for the simplified Baseline Algorithm is presented in Appendix A as Figure A.6. The algorithm essentially finds the queue which is most overloaded, ensures data protection by locking data structures, and attempts to pull tasks to reduce the difference in run queue loads. Workloads were generated using an algorithm which used pseudo-random number generation to determine both the number of tasks as well as their weights. The number of tasks to execute was determined utilizing the common random number generator provided by the C standard library. Each test simulated a random sequence of scheduling and balancing of the generated task sets. The pseudo-random number generator determined a sequence of numbers (a0, a1, ..., an) such that

a_i = rand() % MOD_VAL,             if i = 0
a_i = rand() % a_{i−1},             if i mod 2 = 1 and a_{i−1} ≠ 0
a_i = rand() % MOD_VAL + a_{i−1},   otherwise    (4.1)

Likewise, the task weights, and thus system loads, were pseudo-randomly generated. Unlike the sequence of generated task counts, the weight values were generated according to a normal distribution produced by the newer C++11 standard random number generator. According to reference documentation, random numbers are generated according to the probability density function

p(x | µ, σ) = (1 / (σ√(2π))) e^{ −(x−µ)² / (2σ²) }

for the mean value µ with a standard deviation of σ. According to the CFS algorithm, tasks are weighted according to how nice they are to the system, with a value of zero being the default. Accordingly, µ was set to zero, and upon brief experimentation a standard deviation of σ = 5 was chosen, as the resulting values appeared within the Linux niceness limits of [−20, 19] as defined in the source file sched.h. To ensure that randomness did not play a role in the results of the tests, equivalent seed values were passed as a program argument in both the baseline and experimental tests for comparison.

To simulate load balancing under both dynamic and steady states, two load generation methods were evaluated. The first load generation method created a number of tasks according to Equation 4.1. Upon inspection, it is clear that the sequence of numbers generated by Equation 4.1 oscillates. Guided by these oscillating numbers, the load generator either creates new tasks to match the next element in the sequence, or waits until sufficient tasks complete and exit the system so that the number of tasks remaining in the system equates to the next sequence element. The purpose was to stress the load balancing algorithm and require migration. An example of the resulting migration and task creation is provided below in Figure 4.7.

Figure 4.7: Baseline and Multidimensional Balancing, Dynamic Load Generation - Seed 0
Number of Migrations and Tasks versus Processor Time (·10⁶)

The second load generation method utilizes the same sequence of numbers from Equation 4.1, though during each phase it creates a number of new tasks equal to the next sequence number, in addition to the tasks already existing in the system. Each load generation phase is halted until a steady state exists with no remaining load balancing operations possible. The number of migrations with respect to tasks is intended to display the amount of work required by the load balancing algorithm. Rather than providing steady state times, as the simulator operates in user space without timing equivalent to that of an operating system, the number of migrations is the best metric for displaying the operations required to reach steady state operation. Because these tests perform all load balancing under a static state, they are referred to as Static Load Generation in text and figures, as in Figure B.3.
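For illustration, the oscillating sequence from Equation (4.1) can be generated as in the following C sketch; MOD_VAL is whatever bound the simulator used, and the value here is an assumed placeholder.

#include <stdlib.h>

#define MOD_VAL 50   /* assumed bound on generated task counts */

/* Returns a_i given the index i and the previous element prev. */
int next_task_count(int i, int prev)
{
    if (i == 0)
        return rand() % MOD_VAL;
    if (i % 2 == 1 && prev != 0)
        return rand() % prev;            /* odd index: drop below prev */
    return rand() % MOD_VAL + prev;      /* even index: rise above prev */
}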

4.8.1.1 Experimental Resource Dependent Balanced Load Partitioning Algorithm

As mentioned previously, the Linux load balancing algorithm is based on eliminating differences between the loads assigned to pairs of run queues. This algorithm was implemented to migrate a best first fit of tasks to reduce load ranges, and was selected to constitute a baseline measurement for algorithms which ignore finer grained resource dependency, as is the norm. The algorithm pseudocode is presented as Figure A.6 in Appendix A. In designing a resource allocation scheme to reduce contention on finer grained resources, the decision must be made whether to attempt to reduce contention in the task scheduler, the load balancing algorithm, or both. For the purpose of reducing interdependent scheduling decisions between hardware threads sharing a finer grained resource, the decision was made to allocate finer grained resources solely within the load balancing algorithm. For the case of an FPU in a multithreaded architecture, the consequence of this decision requires tasks to fall into one of two classes: either they highly utilize the finer grained resource under consideration, or they do not. This ensures that the independent scheduling of tasks on each hardware thread incurs a reduced amount of contention with respect to the FPU.

A number of load balancing algorithms were developed, though many incurred far too much complexity in terms of either computational requirements or design. Rudimentary experiments included comparison against an optimal dynamic programming algorithm given in Figure A.1. After testing these rudimentary balancing designs, all but one were eliminated from consideration. The remaining algorithm builds upon the distributed framework of the Linux load balancing algorithm. Like the Linux scheduler and the baseline algorithm, tasks are migrated periodically between pairs of run queues, selected by an algorithm executed periodically on behalf of each run queue, which serves as the migration destination and selects a run queue from which it may pull tasks. Unlike the baseline, however, the algorithm may not perform a simplistic equalizing of run queue loads, due to the added dimensionality of considering finer grained resource utilization. The implication of such a decision is reduced parallelization of balancing duties, as load imbalances no longer propagate to nearly the same degree. Rather than equalizing run queue loads, an optimal load partitioning is determined with respect to which tasks, and along which dimension, should be acquired. For brevity, tasks requiring the finer grained resource will be denoted as priority tasks for the remainder of the section. To simplify the implementation, a single hardware thread per physical core was designated to execute the lion's share of tasks with finer grained resource dependency, accordingly denoted as a priority run queue. In general, run queues are

balanced with respect to an ideal partitioning, as defined by the equation I = L(T)/K, and migrate tasks only if such an action reduces the difference between the actual load and its calculated

ideal. To determine the ideal load incurred by priority tasks, we first denote T^p as the set of all priority tasks. Then we define the ideal load of priority tasks, I^p, for run queues associated with hardware threads designated to execute them as

I^p = min( I, 2L(T^p)/K )    (4.2)

Equation 4.2 defines the ideal priority load a priority run queue should receive. The ideal priority load I^p should not exceed the ideal partitioning load I, though the priority run queues may not receive more than their proportion of the system wide priority load. As this work considers the FPU and dependencies between pairs of multithreaded processors, the number of priority queues is K/2. Thus the ideal priority load is bound by the system wide priority load L(T^p) factored by the proportion of priority queues, 2/K.

For either type of run queue, task loads are determined to be sourced if migrating them will reduce the local difference between the ideal load and that which is realized. For example, as a destination priority queue has as its priority reducing the absolute difference |I^p − L(P_destination ∩ T^p)|, the initial goal is to source priority tasks from run queues which will not incur additional imbalance. Accordingly, the search originates in non priority run queues for priority tasks. Such tasks are pulled even if doing so increases the difference |I − L(P_source)|, as non priority queues are not intended to retain priority tasks. If such tasks are not available to reduce the priority ideal difference, the following step searches along neighboring priority run queues which are overloaded with respect to their priority ideal. Lastly, a search is made to acquire non priority tasks to reduce overall load imbalances, if and only if doing so improves both the source's and the destination's overall ideal difference |I − L(P_destination)|.

Similar to priority run queues, the remaining run queues attempt to pull tasks when they are underutilized in terms of overall load. The search for tasks to reduce |I − L(P_destination)| for non priority destination run queues begins with priority queues, followed by neighboring non priority queues. This essentially mirrors the priority run queue's algorithm. Lastly, if the load of priority tasks exceeds that which would balance along the priority queues, an attempt to source these tasks is made last, and only if doing so will not result in the situation I^p < L(P_source ∩ T^p). This added condition requires that the majority of priority tasks reside on run queues which would be their ideal destinations under more favorable conditions. This reduces migrations in the event that the system normalizes to a more ideal state.

To simplify the search for tasks in the run queues in terms of whether they are denoted with a priority flag or not, a total of four data structures are utilized to store tasks. Each data structure is implemented using a user space implementation of the Linux kernel Red-Black binary search tree as described in section 2.4.3. The first rbtree is sorted by a key value pertaining to the virtual runtime as defined by the CFS algorithm in Equation 2.15. This rbtree is denoted as the queue, and is a proper subset of the second rbtree, whose key value is primarily task weight, sorted secondly by process identification number to arbitrate collisions. The single task contained within the weight sorted rbtree but not contained within the queue rbtree is the task denoted as current, which has been scheduled to accumulate CPU resource utilization. This implementation detail allows the current task to migrate among run queues once its scheduling duration has expired. The weight queue is further partitioned into two disjoint sets: tasks denoted as priority tasks, and those which are not. Again, this implementation feature allows for a reduction in the search for tasks to migrate. Rather than iterating in sorted order, the algorithm may search for an appropriate task weight using the binary nature of the sorted rbtree. Additionally, priority queues attempting to pull priority tasks from non priority queues are likely to search a dramatically reduced number of tasks, as are non priority queues attempting to pull non priority tasks from priority queues. The underlying algorithm is presented in Section A.2.
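One plausible arrangement of those containers is sketched below in C, mirroring the kernel's rb_root type; the actual simulator is C++, and the field names here are illustrative.

/* Per run queue task containers as described above. */
struct sim_rq {
    struct rb_root queue;         /* runnable tasks keyed by virtual runtime  */
    struct rb_root by_weight;     /* all tasks keyed by (weight, pid)         */
    struct rb_root weight_prio;   /* weight sorted subset: priority tasks     */
    struct rb_root weight_norm;   /* weight sorted subset: remaining tasks    */
    struct sim_task *current;     /* running task, absent from queue          */
};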

4.8.1.2 Scalability of Finely Grained Resource Partitioning

As the design of the balanced load partitioning algorithm presented in the previous subsection 4.8.1.1 borrowed greatly from the Linux load balancing algorithm in terms of the distribution of task migration decisions, we can analyze the scalability of the presented algorithm in relative terms. In the worst case under the Linux implementation, a single run queue would suddenly become idle, leaving an imbalance between a single run queue and each of its neighbors. If, prior to such an event, the system was completely balanced, a task migration would occur between a pair of run queues, effectively splitting 1/(K−1)th of the load such that both would remain underutilized. This would effectively distribute the imbalance between a pair of run queues. This recursive ripple effect would propagate exponentially, requiring the migration of tasks to or from each run queue and resulting in O(log K) load balancing triggers. Alternatively, the presented algorithm performs task migrations only if such an action reduces the difference from an ideal partition. Rather than propagating, a single run queue would be required to visit each neighboring run queue and attempt to pull tasks equivalent to L(T)(K − 1)/K².

While complexity alone may indicate that attempting to distribute task migration is more effective, this assumes that each run queue consists of infinitesimally divisible load, which of course is not the case. Additionally, the number of migrations required under optimal conditions is equivalent, being K − 1. The main issue with the propagating nature of the Linux implementation is that tasks essentially travel among multiple run queues, since the task sets are not able to be evenly partitioned in any efficient manner. Once the algorithm is triggered, the propagation also incurs the overhead of synchronization requirements, effectively locking down each pair of run queues. Even in the optimal condition where K − 1 migrations take place, O(K log K) lock and unlock sequences are required, which can often become contended upon propagation. On the other hand, by attempting to pull tasks to a single run queue, no contention is required, and the number of run queue locks is linear, as each queue is visited once. In terms of scalability with respect to workload measured in number of tasks, it can be said that, in the intended case with respect to the scope of this work, the proposed algorithm performs better due to the reduction of the search space. The scope of this work intends to properly distribute task sets which include a fairly even distribution of tasks which are or are not heavily dependent upon a finer grained resource. In the worst case, either set of tasks is no greater than their union, and therefore a search for tasks to migrate is bound similarly to the Linux implementation.

4.8.1.3 Dynamics of Finely Grained Resource Partitioning

In considering the dynamic effects of the proposed algorithm and methodology, it is important to note that under a steady state, every load balancing algorithm performs best when minimal migrations occur. While fairness may be compromised to an extent, excessive migrations incur system overheads well in excess of the calculations required to perform them. Memory cache invalidation and potential memory migration in large scale NUMA systems are costly and must be considered. For this reason, the decision was made to avoid migration propagation in favor of reducing pair-wise load differences, thereby reducing the number of task migrations. Additionally, fairness is only affected in any meaningful way when the number of system tasks is low and a balanced partition is not possible, which is not the norm in many systems in use today.

4.8.1.4 Simulator Experiment Evaluation

In order to evaluate the effectiveness of partitioning a load along finer grained resources, specifically the FPU, the simulation was made to account for task performance with regard to contention with competing tasks sharing a single simulated physical processor. During each scheduling tick, the simulator records the task's efficiency by considering the nature of the competing task. The accounting of work completed for each task was simulated using the average results from the tests previously detailed in section 4.2. Individual tasks were said to complete work with respect to a factor detailed in Table 4.5.

Table 4.5: Work Factor Table

This Task       Opposing Task   Efficiency Factor
non priority    non priority     69.6%
priority        priority         89.9%
priority        non priority     84.8%
non priority    priority         93.2%
(any)           none            100.0%

Each task was assigned a pseudo random amount of work to complete. The control and experiment load generation sequences were controlled by the random number seed. A task was said to complete once its completed work met or exceeded the set amount of work; no preemption was simulated to force the completion of a task prior to the next scheduling tick. Each task's efficiency value was accumulated at scheduling ticks by evaluating the ratio of work done to service time, and both values were also accumulated for each run queue. Once the simulation concluded, the system wide efficiency values were obtained.
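As an illustration of this per-tick accounting, the following minimal C sketch applies the Table 4.5 factors each tick; the names and structure are assumptions for exposition, not the simulator's actual code.

```c
/* Work factors from Table 4.5, indexed [this task][opposing task],
 * where 0 = non priority and 1 = priority (FPU-dependent). */
static const double work_factor[2][2] = {
    { 0.696, 0.932 },  /* this non priority vs. {non priority, priority} */
    { 0.848, 0.899 },  /* this priority     vs. {non priority, priority} */
};

struct sim_task {
    int    priority_flag;  /* 1 if modeled as FPU-dependent      */
    double service_time;   /* CPU ticks received                 */
    double work_done;      /* contention-adjusted work completed */
    double work_target;    /* pseudo random amount of work to do */
};

/* Advance one scheduling tick for task t; 'other' is the task sharing the
 * simulated physical processor, or NULL when the sibling thread is idle.
 * Returns nonzero once the task's assigned work is complete. */
static int tick(struct sim_task *t, const struct sim_task *other)
{
    double f = other ? work_factor[t->priority_flag][other->priority_flag]
                     : 1.0;  /* "none" row of Table 4.5 */

    t->service_time += 1.0;
    t->work_done    += f;   /* efficiency = work_done / service_time */
    return t->work_done >= t->work_target;
}
```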

4.8.2 Simulator Experimental Results - Dynamic Load

Table 4.6 displays the results from a number of instances of the simulator under dynamic load generation. Each instance was run simulating a system with two multithreaded cores, for a total of four symmetric multithreaded processors. The results clearly show an improved efficiency, indicating that the Floating Point Unit would be under lower contention and thus allow for greater throughput given a task set executing similarly to those described in section 4.2. For these tests, tasks were generated under the condition that every other task would be highly dependent upon the FPU.

Table 4.6: Simulator Test Results - Multiple Test Instances

Run       Seed      Baseline Load   Migrations   2D Load        Migrations   Efficiency     Num Migrations
                    Balancing       Per Task     Balancing      Per Task     % Difference   % Difference
                    Efficiency                   Efficiency
1         0         85.57 %         2.55         88.98 %        1.77         3.91 %         -36.11 %
2         6291625   85.61 %         2.01         89.68 %        1.63         4.64 %         -20.88 %
3         478192    85.87 %         2.06         89.09 %        1.61         3.68 %         -24.52 %
4         2910919   85.67 %         2.4          89.76 %        1.96         4.66 %         -20.18 %
5         2715362   85.76 %         2.27         89.32 %        1.53         4.07 %         -38.95 %
6         7822987   85.82 %         2.33         88.93 %        1.58         3.56 %         -38.36 %
7         8202285   85.93 %         2.4          90.28 %        1.71         4.94 %         -33.58 %
8         6853288   87.37 %         2.05         89.99 %        1.68         2.95 %         -19.84 %
9         7948478   86.6 %          2.8          88.25 %        1.57         1.89 %         -56.29 %
10        3502578   85.61 %         2.11         89.64 %        1.49         4.60 %         -34.44 %
11        3228095   85.34 %         2.23         90.09 %        1.66         5.42 %         -29.31 %
12        4118480   85.83 %         2.41         89.94 %        1.72         4.68 %         -33.41 %
13        2653280   85.17 %         2.06         89.31 %        1.39         4.75 %         -38.84 %
14        584371    86.65 %         2.65         89.68 %        1.56         3.44 %         -51.78 %
15        5513999   84.78 %         2.35         89.67 %        1.59         5.61 %         -38.58 %
16        3014600   85.88 %         2.4          89.79 %        1.57         4.45 %         -41.81 %
17        6887348   86.09 %         2.31         89.4 %         1.88         3.77 %         -20.53 %
18        8606400   85.47 %         2.35         87.58 %        1.72         2.44 %         -30.96 %
19        8467957   85.34 %         2.12         88.95 %        1.43         4.14 %         -38.87 %
Average             85.81 %         2.31         89.39 %        1.63         4.08 %         -34.07 %
Std Dev             0.58            0.21         0.66           0.14         0.96           10.16

In addition to higher efficiency, the two-dimensional load balancing algorithm performed with a lower number of migrations per task, averaging roughly 35% fewer migrations, for a percent difference of 40.50%. For the measurements in this test, both the average number of migrations and the load efficiencies, as well as the percent difference of efficiencies, fall well within a single standard deviation, indicating that under similar conditions similar results would likely follow. For the number of migrations, less consistent results were obtained, though each was seen to be much lower than the baseline, indicating that additional rules might be created to further reduce any error in load imbalance.

As mentioned previously, the results displayed in Table 4.6 considered an even ratio of tasks modeled as highly dependent upon the FPU to those not. To establish the scope of this work and to evaluate the range of ratios for which an improvement in throughput may be seen in actual systems, a number of additional tests were performed under the same conditions as Table 4.6 with two exceptions. First, the seed of each run was held constant at an arbitrary 893756. Second, the ratio of tasks modeled to be highly dependent upon the FPU was varied. Table 4.7 contains the results for the ranges of ratios for which this work demonstrates promise. As would be expected considering the constraints of the simulation, as fewer FPU dependent tasks exist in the system, performing load balancing as if those tasks were relevant in securing higher throughput is less effective. The same is true for higher rates of FPU dependent tasks.

To give a clearer indication of the ranges in which task dependency ratios impact the effectiveness of multidimensional load balancing, the next test verifies the degree to which varying results are likely to occur. Table 4.8 demonstrates the variability of the load balancing simulation against the results predicted from Table 4.6; it repeats the 1:1 ratio test performed in Table 4.7 over 19 iterations. Tables 4.8 and 4.7 indicate that higher throughput is likely realized utilizing the presented 2D Balancing Algorithm for FPU:ALU task ratios of 1:1 to 2:1, and quite possibly for ratios 1:4 to 2:1.

Table 4.7: Simulator Test Results - Ratio Range Test

FPU:ALU   Baseline      Baseline          2D Balancing   2D Balancing      %Efficiency
Ratio     %Efficiency   Migrations/Task   %Efficiency    Migrations/Task   Difference
1:8       75.58 %       2.41              76.26 %        1.08              0.68 %
1:7       75.75 %       2.49              76.72 %        1.01              0.97 %
1:6       77.27 %       2.50              77.76 %        1.03              0.49 %
1:5       77.50 %       2.20              78.26 %        1.17              0.76 %
1:4       78.08 %       2.22              81.05 %        1.16              2.97 %
1:3       80.10 %       2.50              81.68 %        1.35              1.58 %
1:2       81.82 %       2.20              83.20 %        1.29              1.38 %
1:1       84.89 %       1.93              88.59 %        1.52              3.70 %
2:1       88.30 %       2.49              90.10 %        1.40              1.80 %
3:1       89.40 %       2.14              90.28 %        1.51              0.88 %
4:1       89.92 %       2.35              90.40 %        1.42              0.48 %
5:1       90.14 %       2.59              90.23 %        1.49              0.09 %
6:1       90.00 %       2.30              90.33 %        1.37              0.33 %
7:1       90.30 %       2.72              90.60 %        1.53              0.30 %

Additionally, the multidimensional load balancing algorithm cannot be said to perform consistently worse in terms of efficiency for any of the ratio ranges presented. To visualize the probability of efficiency improvement, Figure 4.8 displays the normal distribution probability density functions f for the Baseline and Multidimensional Balancing according to

\[
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{4.3}
\]

where the standard deviation is denoted by σ and the mean by µ.

To demonstrate the effectiveness of the proposed load balancing algorithm under optimal conditions, consider the average efficiency values provided in Table 4.8 and the potential work factors from Table 4.5. It is logically assumed that load balancing which does not consider finer grained resource contention would incur task efficiencies randomly distributed between the efficiency values in Table 4.5.

Table 4.8: Simulator Test Results - Repeated Consistency Test

Run       Baseline      Baseline          2D Balancing   2D Balancing
          %Efficiency   Migrations/Task   %Efficiency    Migrations/Task
1         85.83 %       2.29              89.58 %        1.53
2         85.72 %       2.47              89.26 %        1.53
3         85.58 %       2.29              89.72 %        1.55
4         85.86 %       2.49              89.12 %        1.47
5         84.98 %       2.43              89.34 %        1.46
6         85.58 %       2.36              89.11 %        1.44
7         85.76 %       2.02              89.20 %        1.46
8         86.35 %       2.26              89.01 %        1.67
9         85.55 %       2.53              88.85 %        1.40
10        85.22 %       2.37              88.64 %        1.48
11        86.43 %       2.26              89.14 %        1.40
12        85.69 %       2.12              89.82 %        1.55
13        85.82 %       2.53              89.11 %        1.54
14        85.06 %       2.63              88.93 %        1.63
15        84.96 %       2.57              89.62 %        1.63
16        85.43 %       2.41              89.46 %        1.46
17        85.45 %       2.19              89.17 %        1.57
18        85.84 %       2.15              89.39 %        1.64
19        86.10 %       2.29              89.08 %        1.52
Average   85.64 %       2.35              89.24 %        1.52
Std Dev   0.41          0.17              0.30           0.08
Range     1.47 %        0.61              1.18 %         0.27

Therefore, comparing the average of the possible efficiency factors, 84.375%, to the realized simulation value of 85.64%, we can see that the baseline load balancing algorithm does somewhat better than expected. On the other hand, averaging the balanced finer grained resource factors from Table 4.5 yields 89.0%; compared to the actualized simulation result of 89.24%, the proposed algorithm

Figure 4.8: Simulator Test Results Repeated Consistency Normal Distribution Probability Density (Baseline Efficiency Probability Density: µ = 85.64, σ = 0.41; 2D Efficiency Probability Density: µ = 89.24, σ = 0.3; x-axis: Efficiency %, 84 to 91)

also performs above an expected upper limit. Both outcomes are likely due to the dynamic load generation resulting in lower contention as tasks complete. Regardless, it is sufficient to say that the performance of the proposed algorithm is near an upper bound of the possible throughput improvement.
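The expected bounds quoted above follow directly from Table 4.5: balancing that ignores the FPU dimension is assumed to make all four contended pairings equally likely, while ideal multidimensional balancing leaves only the two mixed pairings:

\[
\frac{69.6 + 89.9 + 84.8 + 93.2}{4} = 84.375\%,
\qquad
\frac{84.8 + 93.2}{2} = 89.0\%.
\]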

4.8.3 Multidimensional Load Balancing Effects on Fairness

A limitation of the Multidimensional Load Balancing Algorithm not yet discussed concerns fairness as defined by GPS and other PSS research. In such research, fairness is a measure of the system's ability to dedicate a proportional share of a resource to a task. Clearly, in multicore systems, fairness is dependent upon load partitioning, if only in the short term. As previous work elaborates, over the long term tasks may be migrated in an attempt to bound deviations from fairness [21]. As tasks migrate, care must also be given to normalize scheduling parameters to ensure that doing so will not cause deviations in lag as defined by equation 2.10. As this work primarily considers load balancing, discussion of how to normalize scheduling parameters falls outside its scope. Instead we evaluate the degree to which fairness is affected by load imbalances.

Figure 4.9: Standard Deviation Percent Imbalance - 4 Queues Balancing through Steady State Static Load Generation - Seed 893756 (Baseline Balancing vs. Multidimensional Balancing; y-axis: Imbalance % Standard Deviation; x-axis: Processor Time ·10⁷)

Figure 4.9 displays a measurement of fairness over time under the Static Load Generation described previously; the steady states are easily seen as horizontal lines. Fairness was measured as a percentage relative to the ideal load on each run queue. Formally, the standard deviation in the presented notation is

\[
\sigma = \sqrt{\frac{1}{K} \sum_{\forall P_i \in P} \left( 100\,\frac{L(P_i)}{I} - 100 \right)^{2}} . \tag{4.4}
\]

As seen in Figure 4.9 and in the additional plots in Section B.5 of the Appendix, there is room for improvement in terms of fairness for the Multidimensional Load Balancing Algorithm over the Baseline considered. Along with fairness, these tests also demonstrate scalability with respect to the number of run queues.14 Similar results were seen in each of these tests, with one notable exception. As the number of cores increases, infeasible task sets occur at a higher rate for a constant number of tasks. Thus, when the number of tasks is low, as at the beginning of these tests, imbalances are much higher. In the tests concerning 16 and 32 run queues, for the first couple of steady states there are fewer than two tasks per run queue on average, resulting in widely variable loads among run queues.15
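As a concrete rendering, the following minimal C sketch computes equation 4.4 from an array of run queue loads; the parameter names are illustrative.

```c
#include <math.h>

/* Equation 4.4: standard deviation of per-run-queue load, expressed in
 * percent relative to the ideal per-queue load I = L(T)/K.
 * load[k] holds L(P_k) for each of the K run queues. */
static double imbalance_pct_stddev(const double *load, int K, double total)
{
    double I = total / K;  /* ideal load per run queue */
    double sum_sq = 0.0;

    for (int k = 0; k < K; k++) {
        double dev = 100.0 * load[k] / I - 100.0;  /* % deviation */
        sum_sq += dev * dev;
    }
    return sqrt(sum_sq / K);
}
```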

14 In terms of Percent Efficiency and Number of Tasks Migrated, the values in each of these tests were in accordance with the values described previously in this chapter and have thus been omitted.
15 With consideration of fairness and of infeasible task sets with widely variable task weights, the random task weight generator was revised to narrow the range around the mean, with a standard deviation of 4 as opposed to 5 in the previous tests.

5 Conclusions and Future Work

5.1 Conclusions

This work introduces the online profiling of tasks using Performance Monitoring Counters and demonstrates their accuracy and usefulness in resource allocation decisions. Experimental tests clearly demonstrate the need to consider dependency on finer grained resources, as the contention caused by ignoring such behavior introduces varying processing rates. The advances of proportional share schedulers were discussed, as well as their success in providing guarantees of quality and graceful degradation of service. These guarantees, however, assume consistent processing rates, as guarantees of service are defined by resource allocation time shares; with variable processing rates, these bounds are relaxed in real system environments. By reducing unneeded contention, thereby increasing overall throughput and stabilizing processing rates, it is argued that increased fairness may be achieved.

The ability to partition task sets across multiple dimensions without incurring additional computational complexity was demonstrated by the introduction of the distributed multidimensional load balancing algorithm. The simulation results from both static and dynamic workloads were evaluated and indicate that such an implementation is feasible. By partitioning tasks by finer grained resource dependency, smaller task sets are considered for migration, and the total number of migrations required to reach a steady state is reduced. As fewer migrations are required, there is room in future work to increase fairness by migrating additional tasks.

5.2 Future Work

It was assumed in this work that task resource requirements are static. Unfortunately, real world applications execute with a variety of resource requirement patterns, making their behavior highly dynamic. Although this behavior was not modeled with the simulator, such dynamic behavior may be modeled under the presented work as if a task exits with its old behavior and reenters competition for resources with its new behavior. Only once the proposed profiling scheduling class is extended to support load balancing may such situations be fully evaluated and considered; due to the complexity of modern preemptive Linux kernels, this feature was left unimplemented in this work.

One may also reconsider the conditions which qualify tasks as highly dependent upon a resource, and whether such considerations may utilize continuously variable metrics as opposed to the binary classification assumed in this work. Further methods of task profiling in place of Performance Monitoring Counters, as well as alternative resources to profile, may also be considered. As task fairness is one metric which suffers from considering multiple dimensions in load balancing decisions, one may investigate the positive effects that round balancing may contribute by shifting load imbalances among run queues. It is likely that a bound may be derived with respect to the scheduling and load balancing quanta, in addition to the range of task weights.

References

[1] “The linux kernel archives.” [Online]. Available: https://www.kernel.org/

[2] “Lts ubuntu wiki.” [Online]. Available: https://wiki.ubuntu.com/LTS/

[3] R. R. Al-Ouran, “Linux implementation of a new model for handling task dynamics in proportional share based scheduling systems,” Master’s thesis, Ohio University, 2010.

[4] D. P. Bovet and M. Cesati, Understanding the Linux Kernel.

[5] ——, Understanding the Linux Kernel, 2nd ed., A. Oram, Ed. O’Reilly Media, 2002.

[6] A. Caprara, H. Kellerer, and U. Pferschy, “The multiple subset sum problem,” SIAM Journal on Optimization, vol. 11, no. 2, pp. 308–319, 2000.

[7] A. Chandra, M. Adler, P. Goyal, and P. Shenoy, “Surplus fair scheduling: A proportional-share cpu scheduling algorithm for symmetric multiprocessors,” in Proceedings of the 4th conference on Symposium on Operating System Design & Implementation-Volume 4. USENIX Association, 2000, pp. 45–58.

[8] J. Corbet. (2011, January) The real bkl end game. [Online]. Available: https://lwn.net/Articles/424657/

[9] A. Demers, S. Keshav, and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” in ACM SIGCOMM Computer Communication Review, vol. 19, no. 4. ACM, 1989, pp. 1–12.

[10] M. S. Dunn, “Asymmetric non-uniform proportional share scheduling,” Master’s thesis, Ohio University, 2010.

[11] F. Haiping, A. Arcangeli, and D. Woodhouse, “rbtree,” https://github.com/forhappy/rbtree/, 2012.

[12] Intel® 64 and IA-32 Architectures Software Developer's Manual, Intel, 2013, Volume 3B: System Programming Guide (Part 2). [Online]. Available: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

[13] Intel Processor Identification and the CPUID Instruction, Intel, 2009, Application Note 485.

[14] P. Irelan and S. Kuo, Performance Monitoring Unit Sharing Guide, Intel. [Online]. Available: https://software.intel.com/sites/default/files/m/0/f/6/5/e/20476-EPS05_PMU_Sharing_Guide_v2_5_final.pdf

[15] D. Jovanovska, “Scheduling time-sensitive tasks using a combination of proportional-share and priority scheduling algorithms,” Master’s thesis, Ohio University, 2011.

[16] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack problems. Springer Science & Business Media, 2004.

[17] C. Kolivas. (2006, March) The homepage of interbench, the linux interactivity benchmark. [Online]. Available: http://users.on.net/~ckolivas/interbench/

[18] A. Kumar, “Multiprocessing with the completely fair scheduler,” IBM developerWorks, 2008.

[19] R. Landley. (2007, January) Red-black trees (rbtree) in linux. [Online]. Available: https://www.kernel.org/doc/Documentation/rbtree.txt

[20] D. Levine, “A parallel genetic algorithm for the set partitioning problem,” Argonne National Laboratory, 1994.

[21] T. Li, D. Baumberger, and S. Hahn, “Efficient and scalable multiprocessor fair scheduling using distributed weighted round-robin,” in Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP ’09. New York, NY, USA: ACM, 2009, pp. 65–74. [Online]. Available: http://doi.acm.org/10.1145/1504176.1504188

[22] R. Love, Linux kernel development, 3rd ed. Pearson Education, 2010.

[23] P. E. McKenney, “Stochastic fairness queuing,” in INFOCOM’90, Ninth Annual Joint Conference of the IEEE Computer and Communication Societies. The Multiple Facets of Integration. Proceedings, IEEE. IEEE, 1990, pp. 733–740.

[24] I. Molnar. (2009, September) ’perf sched’: Utility to capture, measure and analyze scheduler latencies and behavior. [Online]. Available: http://lwn.net/Articles/353295/

[25] C. H. Papadimitriou, “On the complexity of integer programming,” J. ACM, vol. 28, no. 4, pp. 765–768, October 1981. [Online]. Available: http://doi.acm.org/10.1145/322276.322287

[26] A. K. Parekh and R. G. Gallager, “A generalized processor sharing approach to flow control in integrated services networks: the single-node case,” IEEE/ACM Transactions on Networking (ToN), vol. 1, no. 3, pp. 344–357, 1993.

[27] A. Silberschatz, P. B. Galvin, and G. Gagne, Operating System Concepts, 7th ed. John Wiley & Sons, 2004.

[28] ——, Operating System Concepts, 8th ed. Wiley Publishing, 2008.

[29] I. Stoica and H. Abdel-Wahab, “Earliest eligible virtual deadline first: A flexible and accurate mechanism for proportional share resource allocation,” Old Dominion University, Norfolk, VA, 1995.

[30] D. Tsafrir, Y. Etsion, and D. G. Feitelson, “Secretly monopolizing the cpu without superuser privileges,” in 16th USENIX Security Symposium, vol. 7. USENIX Association, 2007, pp. 1–18.

[31] T. Willhalm, R. Dementiev, and P. Fay. (2014, December) Intel performance counter monitor - a better way to measure cpu utilization. Intel. [Online]. Available: https://software.intel.com/en-us/articles/intel-performance-counter-monitor

[32] L. Zhang, “Virtual clock: A new traffic control algorithm for packet switching networks,” in ACM SIGCOMM Computer Communication Review, vol. 20, no. 4. ACM, 1990, pp. 19–29.

Appendix A: Algorithms

A.1 Distributed Partitioning Dynamic Programming Algorithm

Require: P_i, P, L(T), K
 1: if L(P_i) < L(T)/K then
 2:     let P_o be such that L(P_o) = max over ∀P_k ∈ P of L(P_k)   // Find overloaded partition
 3:     if P_i ≠ P_o then
 4:         optimal ← L(T)/K                                        // Calculate optimal partition load
 5:         for i := 0 to optimal do tasks[i] := NULL
 6:         for i := 0 to optimal do back[i] := 0
 7:         for ∀t_i ∈ P_o ∪ P_i do
 8:             for j := optimal to 0 do
 9:                 if (j = 0 OR tasks[j] ≠ NULL) AND j + w_i ≤ optimal then
10:                     tasks[w_i + j] ← t_i
11:                     back[w_i + j] ← j
12:                 end if
13:             end for
14:         end for
15:         while tasks[optimal] = NULL do
16:             optimal ← optimal − 1
17:         end while
18:         index := optimal − L(P_i)
19:         P_x := ∅                                                // Build optimal partition
20:         while index ≠ 0 do
21:             P_x ← P_x ∪ {tasks[index]}
22:             index ← back[index]
23:         end while
24:         P_y := (P_i ∪ P_o) \ P_x                                // Build second partition
25:         P_i ← P_x
26:         P_o ← P_y
27:         return
28:     end if
29: end if
30: return

Figure A.1: Distributed Partitioning Dynamic Programming Algorithm

A number of short circuit evaluation optimizations may be made to the algorithm above, notably checking whether the optimal load partition has been found after line 11. Additionally, rather than iterating over the union of the overloaded and underloaded run queues, one may first iterate over the overloaded run queue searching for a subset with load equal to the

difference L(T)/K − L(P_i), and simply migrate that subset from the overloaded run queue to the underloaded run queue. Additionally, the use of a more efficient data structure to contain the memoization data, such as an iterably accessible map, would reduce the need

for the tasks[j] ≠ NULL test at line 9.

In addition to the algorithm above, which is based on the Bellman recursion, there exists an algorithm which performs in O(n·w_max) [16, p. 83] under the condition

w_max < L(T)/K. Clearly, if such a condition cannot be met, the partitioning is trivial, as an infeasible weight situation exists as described by Chandra et al. [7]. The algorithm, named Balsub, is based on adding and removing items from a split solution calculated as

$\hat{x} = \sum_{i=0}^{s} w_i$ for the split item, defined as $s = \min\{\, j : \sum_{i=0}^{j} w_i > L(T)/K \,\}$.16
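For reference, a compact C sketch of the Bellman-style table fill underlying Figure A.1 is given below; the dense arrays and names are assumptions for illustration, and in practice the short circuit and data structure optimizations noted above would apply.

```c
/* Fill the subset-sum reachability table over task weights w[0..n-1]
 * for target = floor(L(T)/K).  After the call, last_task[s] >= 0 iff
 * some subset of tasks has total load s, in which case last_task[s]
 * is the final task of one such subset and back[s] is the load reached
 * before it was added.  O(n * target) time and space. */
static void fill_partition_table(const int *w, int n, int target,
                                 int *last_task, int *back)
{
    for (int s = 0; s <= target; s++) {
        last_task[s] = -1;  /* unreachable so far; load 0 corresponds */
        back[s] = 0;        /* to the trivially reachable empty set   */
    }

    for (int i = 0; i < n; i++)
        /* iterate downward so each task is used at most once */
        for (int s = target - w[i]; s >= 0; s--)
            if ((s == 0 || last_task[s] >= 0) && last_task[s + w[i]] < 0) {
                last_task[s + w[i]] = i;  /* task i completes load s+w[i] */
                back[s + w[i]] = s;       /* backtracking pointer         */
            }
}
```

To recover a migration set, one scans down from target to the largest reachable load and walks back[] collecting last_task[] entries, mirroring the scan and backtrack steps of Figure A.1.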

A.2 Multidimensional Load Balancing Algorithm

The algorithm works by identifying the dimension along which the load differs from the ideal. From there, the selection of which partition to pull tasks from is optimized by prioritizing the run queues from which to source the difference from the ideal load.

16 Here the ordering of elements is arbitrary.

Require: P_cpu, P, L(T), L(T^p), K
 1: let I = L(T)/K
 2: let I^p = min(I, 2·L(T^p)/K)                                              // 17
 3: // Ensure destination rq is priority dimension
 4: if is_priority_rq(cpu) then
 5:     if L(P_cpu ∩ T^p) < I^p then
 6:         // attempt to pull priority dimension tasks from other dimension rqs
 7:         for ∀P_source : !is_priority_rq(source) do
 8:             while |P_source ∩ T^p| > 0 and L(P_cpu ∩ T^p) < I^p do
 9:                 let t_pull ∈ P_source ∩ T^p : min |I^p − L((P_cpu ∩ T^p) ∪ {t_pull})|
10:                 if |I^p − L((P_cpu ∩ T^p) ∪ {t_pull})| < I^p − L(P_cpu ∩ T^p) then
11:                     P_cpu ← P_cpu ∪ {t_pull}
12:                     P_source ← P_source \ {t_pull}
13:                 else break
14:                 end if
15:             end while
16:         end for
17:         // attempt to balance along same dimension
18:         for ∀P_source : is_priority_rq(source) do
19:             while L(P_source ∩ T^p) > I^p and L(P_cpu ∩ T^p) < I^p do
20:                 let t_i ∈ P_source ∩ T^p : min |I^p − L((P_cpu ∩ T^p) ∪ {t_i})|
21:                 let t_j ∈ P_source ∩ T^p : min |I^p − L((P_source ∩ T^p) \ {t_j})|
22:                 if w_i < w_j then let t_pull := t_i
23:                 else let t_pull := t_j
24:                 end if
25:                 if |I^p − L((P_cpu ∩ T^p) ∪ {t_pull})| < I^p − L(P_cpu ∩ T^p) then
26:                     if |I^p − L((P_source ∩ T^p) \ {t_pull})| < L(P_source ∩ T^p) − I^p then
27:                         P_cpu ← P_cpu ∪ {t_pull}
28:                         P_source ← P_source \ {t_pull}
29:                     else break
30:                     end if
31:                 else break
32:                 end if
33:             end while
34:         end for
35:     end if

Figure A.2: Distributed Partitioning Algorithm - Priority Dimension - Part 1

17 Where we assume 2 = |{P_i : is_priority_rq(i), ∀P_i ∈ P}|.

36:     if L(P_cpu) < I then
37:         // attempt to pull other dimension tasks from non priority rqs    // 18
38:         for ∀P_source : !is_priority_rq(source) do
39:             while L(P_source) > I do
40:                 let t_i ∈ P_source : min |I − L(P_cpu ∪ {t_i})|
41:                 let t_j ∈ P_source : min |I − L(P_source \ {t_j})|
42:                                      and I < L(P_source \ {t_j})
43:                 if w_i < w_j then
44:                     let t_pull := t_i
45:                 else
46:                     let t_pull := t_j
47:                 end if
48:                 if I < L(P_source \ {t_pull}) then
49:                     if |I − L(P_cpu ∪ {t_pull})| < I − L(P_cpu) then
50:                         if |I − L(P_source \ {t_pull})| < L(P_source) − I then
51:                             P_cpu ← P_cpu ∪ {t_pull}
52:                             P_source ← P_source \ {t_pull}
53:                         else break
54:                         end if
55:                     else break
56:                     end if
57:                 else break
58:                 end if
59:             end while
60:         end for
61:     end if
62: end if

Figure A.3: Distributed Partitioning Algorithm - Priority Dimension - Part 2

The algorithm for the first and second dimensions is largely mirrored, though an added priority to pull tasks toward the first dimension was arbitrarily assigned according to the metrics used in Table 4.5.

18 The notation ignores other priority tasks, as those have already been searched for; the presentation is abbreviated for brevity.

Require: P_cpu, P, L(T), L(T^p), K
 1: let I = L(T)/K
 2: let T_p = T \ T^p
 3: // Ensure destination is not priority dimension
 4: if !is_priority_rq(cpu) then
 5:     if L(P_cpu ∩ T_p) < I then
 6:         // attempt to pull non priority dimension tasks from other dimension rqs
 7:         for ∀P_source : is_priority_rq(source) do
 8:             while |P_source ∩ T_p| > 0 and L(P_source) > I do
 9:                 let t_pull ∈ P_source ∩ T_p : min |I − L((P_cpu ∩ T_p) ∪ {t_pull})|
10:                 if |I − L((P_cpu ∩ T_p) ∪ {t_pull})| < I − L(P_cpu) then
11:                     P_cpu ← P_cpu ∪ {t_pull}
12:                     P_source ← P_source \ {t_pull}
13:                 else break
14:                 end if
15:             end while
16:         end for
17:         // attempt to balance along same dimension
18:         for ∀P_source : !is_priority_rq(source) and L(P_source) > I do
19:             while |P_source| > 1 and L(P_source) > I do
20:                 let t_i ∈ P_source : min |I − L(P_cpu ∪ {t_i})|
21:                 let t_j ∈ P_source : min |I − L(P_source \ {t_j})|
22:                 if w_i < w_j then
23:                     let t_pull := t_i
24:                 else
25:                     let t_pull := t_j
26:                 end if
27:                 if |I − L(P_cpu ∪ {t_pull})| < I − L(P_cpu) then
28:                     if |I − L(P_source \ {t_pull})| < L(P_source) − I then
29:                         P_cpu ← P_cpu ∪ {t_pull}
30:                         P_source ← P_source \ {t_pull}
31:                     else break
32:                     end if
33:                 else break
34:                 end if
35:             end while
36:         end for

Figure A.4: Distributed Partitioning Algorithm - Second Dimension - Part 1

37:         // attempt to pull other dimension tasks from priority rqs
38:         for ∀P_source : is_priority_rq(source) and L(P_source ∩ T^p) > I^p do
39:             while |P_source ∩ T^p| > 1 and L(P_source ∩ T^p) > I^p do
40:                 let t_i ∈ P_source ∩ T^p : min |I − L(P_cpu ∪ {t_i})|
41:                 let t_j ∈ P_source ∩ T^p : min |I − L((P_source ∩ T^p) \ {t_j})|
42:                                            and I^p < L((P_source ∩ T^p) \ {t_j})
43:                 if w_i < w_j then
44:                     let t_pull := t_i
45:                 else
46:                     let t_pull := t_j
47:                 end if
48:                 if I^p < L((P_source ∩ T^p) \ {t_pull}) then
49:                     if |I − L(P_cpu ∪ {t_pull})| < I − L(P_cpu) then
50:                         if |I^p − L((P_source ∩ T^p) \ {t_pull})| < L(P_source ∩ T^p) − I^p then
51:                             P_cpu ← P_cpu ∪ {t_pull}
52:                             P_source ← P_source \ {t_pull}
53:                         else break
54:                         end if
55:                     else break
56:                     end if
57:                 else break
58:                 end if
59:             end while
60:         end for
61:     end if
62: end if

Figure A.5: Distributed Partitioning Algorithm - Second Dimension - Part 2

 1: let P_source = P_i : L(P_i) = max over ∀P_j ∈ P of L(P_j)
 2: if P_source ≠ P_cpu then
 3:     while L(P_source) > L(P_cpu) do
 4:         let t_pull ∈ P_source : min over ∀t ∈ P_source of |L(P_source \ {t_pull}) − L(P_cpu ∪ {t_pull})|
 5:         if L(P_source) − L(P_cpu) > L(P_source \ {t_pull}) − L(P_cpu ∪ {t_pull}) then
 6:             P_cpu ← P_cpu ∪ {t_pull}
 7:             P_source ← P_source \ {t_pull}
 8:         else break
 9:         end if
10:     end while
11: end if

Figure A.6: Baseline Balancing Algorithm

Appendix B: Additional Test Results and Figures

B.1 Additional Interbench Results

Table B.1: Video Load Interbench: Vanilla Scheduler vs Performance Monitoring Delta

Simulated   Latency     Latency      Max             % Desired    % Deadlines
Load        Ave (ms)    Std Dev      Latency (ms)    CPU          Met
None        0.1 —       0.1 ∆0.2     0.2 ∆7.0        100 —        100 —
X           0.1 ∆0.1    0.5 ∆2.9     18.9 ∆93.2      100 ∆-0.3    99.9 ∆-0.2
Burn        0.0 —       0.0 —        0.1 ∆1.5        100 —        100 —
Write       0.1 ∆0.4    0.1 ∆3.2     0.2 ∆36.5       100 ∆-0.4    100 ∆-0.9
Read        0.1 —       0.1 —        0.3 —           100 —        100 —
Compile     0.0 ∆0.1    0.4 ∆0.4     16.7 ∆18.4      100 ∆-0.1    99.9 —
Memload     0.1 ∆-0.1   0.1 ∆-0.1    0.2 ∆-0.1       100 —        100 —

Table B.2: X Load Interbench: Vanilla Scheduler vs Performance Monitoring Delta

Simulated   Latency     Latency      Max             % Desired    % Deadlines
Load        Ave (ms)    Std Dev      Latency (ms)    CPU          Met
None        0.0 —       0.0 —        0.1 —           100 —        100 —
Video       0.0 —       0.0 ∆0.2     0.1 ∆3.8        100 —        100 —
Burn        1.3 ∆-1.2   3.9 ∆-2.5    16 ∆8.0         80.3 ∆19.7   74.2 ∆25.8
Write       0.0 —       0.0 ∆0.4     0.1 ∆6.2        100 —        100 —
Read        0.0 —       0.0 —        0.2 ∆-0.1       100 —        100 —
Compile     0.8 ∆4.3    3.1 ∆8.7     15 ∆23          88.7 ∆-29    85.1 ∆-34.8
Memload     0.0 —       0.0 —        0.1 ∆-0.1       100 —        100 —

Table B.3: Gaming Load Interbench: Vanilla Scheduler vs Performance Monitoring Delta

Simulated   Latency     Latency       Max             % Desired
Load        Ave (ms)    Std Dev       Latency (ms)    CPU
None        0.0 —       0.0 —         0.0 —           100 —
Video       0.0 —       0.0 —         0.0 —           100 —
X           0.0 —       0.0 —         0.0 —           100 —
Burn        7.5 ∆-7.5   10.5 ∆-10.5   17.8 ∆-17.8     93 ∆7
Write       0.0 —       0.0 —         0.0 —           100 —
Read        0.0 —       0.0 —         0.0 —           100 —
Compile     4.7 ∆-4.7   9.3 ∆-9.3     37.8 ∆-37.5     95.5 ∆4.5
Memload     0.0 —       0.0 —         0.0 —           100 —

B.2 CPU Identifier Configuration in Linux Kernel

processor : 0    core id : 0
processor : 1    core id : 1
processor : 2    core id : 2
processor : 3    core id : 3
processor : 4    core id : 0
processor : 5    core id : 1
processor : 6    core id : 2
processor : 7    core id : 3

Figure B.1: cpuid Command Results
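The sibling pairing visible above (logical CPUs 0/4, 1/5, 2/6, and 3/7 sharing core ids 0 through 3) can be recovered programmatically; the following small C sketch scans a /proc/cpuinfo in the format shown, and is an illustrative assumption rather than the tooling used in this work.

```c
#include <stdio.h>

/* Print which physical core each logical processor maps to, by scanning
 * /proc/cpuinfo for "processor" and "core id" lines as in Figure B.1.
 * Logical CPUs printing the same core number are SMT siblings. */
int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];
    int processor = -1, value;

    if (!f)
        return 1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "processor : %d", &value) == 1)
            processor = value;           /* remember current logical CPU */
        else if (sscanf(line, "core id : %d", &value) == 1)
            printf("logical cpu %d -> core %d\n", processor, value);
    }
    fclose(f);
    return 0;
}
```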

Figure B.1: cpuid Command Results 92

B.3 Simulated Baseline Load Balancing Results

Figure B.2: Baseline Balancing to Steady State Static Load Generation Efficiency Over Time (y-axis: Efficiency %, 80 to 85; x-axis: Processor Time ·10⁸)

Figure B.3: Baseline Balancing to Steady State Static Load Generation - Seed 893756 Number of Migrations and Tasks (bars: # Migrations; line: # Tasks; x-axis: Processor Time ·10⁸)

B.4 Simulated Multidimensional Load Balancing Results

Figure B.4: Multidimensional Balancing to Steady State Static Load Generation Efficiency Over Time (y-axis: Efficiency %, 86 to 89; x-axis: Processor Time ·10⁸)

Figure B.5: Multidimensional Balancing to Steady State Static Load Generation - Seed 893756 Number of Migrations and Tasks (bars: # Migrations; line: # Tasks; x-axis: Processor Time ·10⁸)

B.5 Standard Deviation Percent Imbalance and Scalability

Figure B.6: Standard Deviation Percent Imbalance - 8 Queues Balancing through Steady State Static Load Generation - Seed 893756 (Baseline Balancing vs. Multidimensional Balancing; y-axis: Imbalance % Standard Deviation; x-axis: Processor Time ·10⁷)

Figure B.7: Standard Deviation Percent Imbalance - 16 Queues Balancing through Steady State Static Load Generation - Seed 893756 (Baseline Balancing vs. Multidimensional Balancing)

Figure B.8: Standard Deviation Percent Imbalance - 32 Queues Balancing through Steady State Static Load Generation - Seed 893756 (Baseline Balancing vs. Multidimensional Balancing)

B.6 Simulated Baseline Load Balancing Efficiency Results

Figure B.9: Dynamic Efficiency Time Plot Seed 0
Figure B.10: Dynamic Efficiency Time Plot Seed 478192
Figure B.11: Dynamic Efficiency Time Plot Seed 584371
Figure B.12: Dynamic Efficiency Time Plot Seed 26553280
Figure B.13: Dynamic Efficiency Time Plot Seed 2715362
Figure B.14: Dynamic Efficiency Time Plot Seed 2910919
(Each plot shows Efficiency %, 80 to 85, versus Processor Time ·10⁶.)

B.7 Simulated Multidimensional Load Balancing Efficiency Results

Figure B.15: Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 0
Figure B.16: Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 478192
Figure B.17: Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 584371
Figure B.18: Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 26553280
Figure B.19: Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 2715362
Figure B.20: Dynamic Efficiency Time Plot Multidimensional Balancing - Seed 2910919
(Each plot shows Efficiency % versus Processor Time ·10⁶.)
