UNIVERSITY OF SOUTHAMPTON

System-Level Power Management using Online Machine Learning for Prediction and Adaptation

by Luis Alfonso Maeda-Nunez

A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy

in the Faculty of Physical Sciences and Engineering, School of Electronics and Computer Science

2016-07-22

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF PHYSICAL SCIENCES AND ENGINEERING
SCHOOL OF ELECTRONICS AND COMPUTER SCIENCE

Doctor of Philosophy

by Luis Alfonso Maeda-Nunez

Embedded devices today need to be portable, battery powered and high performance. This need for high performance makes power management a matter of critical priority. Power management algorithms exist, but most approaches focus on an energy-performance trade-off that is oblivious to the applications running on the system; others are application-specific, and their solutions cannot be applied to other applications.

This work proposes Shepherd, a cross-layer runtime management system for reducing energy consumption whilst offering soft real-time performance. It is cross-layer because it takes the performance requirements from the application and learns to adjust the power management knobs to provide the expected performance at the minimum cost in energy. Shepherd is implemented as a Linux governor running at OS level; this layer offers a low-overhead interface to change the CPU voltage and frequency dynamically.

As opposed to the reactive behaviour of Linux governors, Shepherd adapts to application-specific performance requirements dynamically, and proactively selects the power state that fulfils these requirements while consuming the least power. Proactiveness is achieved by using AEWMA to adapt to the upcoming workload. These adaptations are facilitated by a model-free reinforcement learning algorithm which, once it has learned the optimal decisions, starts exploiting them. This work enables Shepherd to work with different applications. A programming framework was designed to allow programmers to make their applications power-aware, by enabling them to send their performance requirements and annotations to Shepherd and obtain the cross-layer soft real-time performance desired.

Shepherd is implemented within the Linux Kernel 3.7.10, interfacing with the application and hardware to select an appropriate voltage-frequency setting for the executing application. The performance of Shepherd is demonstrated on an ARM Cortex-A8 processor. Experiments conducted with multimedia applications demonstrate that Shepherd reduces energy consumption by up to 30% compared with existing governors. In addition, the framework has been used to adapt example applications to work with Shepherd, achieving 60% energy savings compared to the existing approaches.

Contents

Acknowledgements

1 Introduction
  1.1 Research Justification
    1.1.1 Power Consumption
    1.1.2 Low Power Design Techniques
      1.1.2.1 Power Gating
      1.1.2.2 Dynamic Voltage and Frequency Scaling
    1.1.3 Challenges of System Level Power Management
  1.2 Research Questions
  1.3 Research Contributions
  1.4 Research Outputs
  1.5 Document Outline

2 Power Management in Microprocessors
  2.1 Power Consumption in Microprocessors
    2.1.1 Dynamic Power
      2.1.1.1 Switching Power
      2.1.1.2 Internal Power
    2.1.2 Static Power
  2.2 Power Management Techniques: Knobs
    2.2.1 Multi-Vdd
    2.2.2 Dynamic Voltage and Frequency Scaling (DVFS)
    2.2.3 Power Gating (PG)
  2.3 Applications and OS: Requirements
    2.3.1 Real-Time Systems
    2.3.2 Hard Real-Time versus Soft Real-Time
  2.4 Power Management: Control
    2.4.1 Dynamic Power Management (DPM)
      2.4.1.1 Predictive Techniques
      2.4.1.2 Stochastic Techniques
      2.4.1.3 Machine Learning Techniques
      2.4.1.4 Reinforcement Learning for DPM
      2.4.1.5 Other Predictive Techniques
    2.4.2 DVFS
      2.4.2.1 Workload Detection
      2.4.2.2 Online Learning
      2.4.2.3 Offline Learning
      2.4.2.4 Workload Prediction
      2.4.2.5 Deadline Prediction
      2.4.2.6 Application-specific DVFS
      2.4.2.7 General Purpose DVFS
    2.4.3 The Interplay of DPM and DVFS
    2.4.4 Reinforcement Learning
  2.5 Discussion

3 System-Level Power Management
  3.1 Run-Time Management
    3.1.1 Requirements
      3.1.1.1 Application adaptation for use with RTM
    3.1.2 Power Knobs
    3.1.3 Control
      3.1.3.1 Monitor Feedback
    3.1.4 Run-Time Management Algorithm
  3.2 Prediction Unit
    3.2.1 Motivation
    3.2.2 EWMA
      3.2.2.1 Optimisation and Results
    3.2.3 AEWMA
  3.3 Decision Unit
    3.3.1 Cost Function
    3.3.2 Learning Phase in Shepherd
  3.4 Discussion and Summary

4 Implementation of Run-time Manager
  4.1 Implementation as Linux Governor
    4.1.1 Restrictions and optimisations for Linux Governor Implementation
  4.2 Implementation of Prediction Unit
  4.3 Implementation of Decision Unit
    4.3.1 Implementation of Learning Phase in Shepherd
    4.3.2 Action Suitability for Exploiting Learning
  4.4 Performance Counters Module
  4.5 An Application Programming Interface (API) for Shepherd
    4.5.1 Design flow for application adapted to use Shepherd Governor
  4.6 Discussion

5 Results
  5.1 Experimental Setup
  5.2 Case Study: Run-Time Manager for Video Decoding
    5.2.1 Real time behaviour of Shepherd
      5.2.1.1 Shepherd on H.264 VGA
      5.2.1.2 Shepherd on H.264 QVGA
      5.2.1.3 Run-time Manager (RTM) on H.263
      5.2.1.4 Shepherd on MPEG2
      5.2.1.5 Shepherd on MPEG4
    5.2.2 Governor comparison
      5.2.2.1 Comparison versus Dynamic Governors
      5.2.2.2 Comparison versus Static Governors
  5.3 Shepherd API results
  5.4 Overheads
  5.5 Discussion and Summary

6 Conclusions
  6.1 Future Work

Bibliography

Appendix A - Sample Shepherd Output Files

Appendix B - Publications

List of Figures

1.1 Dynamic and Leakage Power trend [1]
1.2 Power vs Energy. Reproduced from [2]

2.1 Flowchart of an embedded system in a cross-layer approach. Analytical approach (left) and practical approach (right). Sections 2.2, 2.3 and 2.4 are described as per the figure
2.2 A CMOS Inverter showing the charging and discharging of the capacitor Cload. Based on [3]
2.3 A CMOS Inverter showing the short circuit current flowing from VDD to VSS. Based on [4]
2.4 Leakage Currents in a CMOS circuit (a). Taken from [2]. ITRS trends for leakage power dissipation (b). Based on [5]
2.5 Definitions of power and times for a device with IDLE and SLEEP modes
2.6 Markov chain and Markov Decision model for DPM of StrongARM SA-1100, as presented by Benini et al. [6]
2.7 Decision Tree example for selecting V-F settings. Taken from [7]
2.8 Reinforcement Learning simple diagram. Taken from [8]

3.1 Generic cross-layer RTM, where the arrows show the communication between the layers
3.2 Run-Time Management Unit in the cross-layer approach
3.3 Comparison of real workload vs. predicted on a sample video, from frames 430 to 615
3.4 Effect of weight for Exponential Weighted Moving Average (EWMA) on prediction error using dynamic workloads
3.5 Effect of weight for EWMA on prediction error using static workloads
3.6 Sample of video with frame types 1, 2 and 3. Red line represents the group of frames of type 2 (cross) and 3 (green filled circles) between frames type 1 (blue hollow circles) used to calculate local standard deviation
3.7 AEWMA λ parameter change at transitions
3.8 Cost function of Q-Learning algorithm for Shepherd. This graph shows the level of reward/punishment obtained for finishing the workload too early (A), very close to the deadline without surpassing it (B), finishing the workload shortly after the deadline (C) and finishing very late (D). Time left for deadline is calculated as 100 · (tdeadline − tfinished)/tdeadline
3.9 Q-Table during A) exploration and B) exploitation phases. The red boxes represent the best Action for each State
3.10 Suitability of actions for a particular state (state 4), defined by the Q Values of the different actions

4.1 Shepherd governor implementation
4.2 Random number distribution generated from taking the 8 Least Significant Bits (LSBs) from the CPU cycles measured per frame, over 5760 frames
4.3 Adaptation of program code for utilisation of the Shepherd Run-Time Manager

5.1 BeagleBoard-xM [9] development board used for running the experiments of Shepherd
5.2 Video Decoding experiment using Shepherd Governor running on BeagleBoard-xM
5.3 Performance and Power Consumption of the governors Shepherd and Ondemand for an H.264 Video
5.4 Pareto graph comparing different governors vs. Shepherd by means of energy and performance running H264 VGA Video
5.5 Pareto graph comparing different governors vs. Shepherd by means of energy and performance running H264 QVGA Video
5.6 Pareto graph comparing different governors vs. Shepherd by means of energy and performance running H263 Video
5.7 Pareto graph comparing different governors vs. Shepherd by means of energy and performance running MPEG2 Video
5.8 Pareto graph comparing different governors vs. Shepherd by means of energy and performance running MPEG4 Video
5.9 Performance and Power Consumption of the governors Shepherd and Ondemand for iFFT benchmark at 10 frames-per-second (fps) constraint

List of Tables

1.1 Common Low Power Design Techniques for Complementary Metal Oxide Semiconductor (CMOS)

2.1 Hard Real Time versus Soft Real Time comparison, as explained by Sally [10]
2.2 Symbols of Q-Learning algorithm

3.1 Comparison of variation of the workload with and without grouping into frame types
3.2 Comparison of variation of the workload with and without grouping into frame types

4.1 Q-Values of State 3 of Q-Table from Listing 4.2

5.1 DM3730 Specifications (ARM Cortex-A8) [11]
5.2 Comparative table showing the paretos of energy and performance averages of every Linux governor compared to Shepherd. Numbers are normalised to 100. Ideal performance is 100, and ideal energy is 0
5.3 FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective
5.4 FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective
5.5 Inverse FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective
5.6 Inverse FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective


List of Acronyms

CPU Central Processing Unit

GPU Graphics Processing Unit

DSP Digital Signal Processor

FPGA Field-Programmable Gate Array

RF Radio Frequency

CMOS Complementary Metal Oxide Semiconductor

HW Hardware

OS Operating System

IPS Instructions-Per-Second

WCET Worst-Case Execution Time

TI Texas Instruments

BBxM BeagleBoard-xM

SoC System-on-Chip

SotA State-of-the-Art

HDD Hard Disk Drive

RTOS Real-Time Operating System

GPOS General Purpose Operating System

RTS Real-Time System

RT Real-Time

API Application Programming Interface

FPU Floating-Point Unit


SR Service Requester

SQ Service Queue

SP Service Provider

MDP Markov Decision Process

SIMD Single-Instruction-Multiple-Data

LSB Least Significant Bit

PMU Performance Monitoring Unit

JTAG Joint Test Action Group

FD Frame-Dependent

FI Frame-Independent

V-F Voltage and Frequency

PM Power Management

DVFS Dynamic Voltage and Frequency Scaling

DVS Dynamic Voltage Scaling

DPM Dynamic Power Management

RTM Run-time Manager

RTPM Runtime Power Manager

PG Power Gating

ACPI Advanced Configuration and Power Interface

fps frames-per-second

ML Machine Learning

RL Reinforcement Learning

W Weight

EWMA Exponential Weighted Moving Average

AEWMA Adaptive Exponential Weighted Moving Average

MAPE Mean Absolute Percentage Error

ANN Artificial Neural Network

FM Frequency Mapper

α Learning Rate

S State

A Action

PC Program Counter

CSV Comma-Separated Values

Acknowledgements

I would like to thank first and foremost my parents, Alfonso and Aida, and my brothers Juan Carlos and Roberto, for their lifelong support in the good and the hardest times. You have taught me the most important things in life, and I would not be where I am without you. Not only am I proud to have you close, but deeply honoured to be a part of this family. I would also like to thank my supervisors Geoff Merrett and Bashir Al-Hashimi, as they have guided me through the long and winding road that is the PhD. I would like to thank my sponsor, the National Council of Science and Technology (CONACYT) in Mexico, without whom I would not have been able to undertake a postgraduate programme here. I would like to thank my friends who helped me through the difficult times, Isaac, Mauricio, Eryx, Domenico, Jean-Marie, Vasilis, Jaime. Finally, I would like to thank my colleagues who taught me valuable lessons on the PhD and life itself, Andy, Shida, Sheng, Anup, Rishad, Alex Wood, Matt, Davide, Asieh and Biswas. Thank you all for this wonderful experience.


Chapter 1

Introduction

In recent years the demand for portable devices has increased, with requirements of very high performance and multi-functionality contained in relatively small devices. These devices include consumer electronics such as smartphones, tablets and cameras, as well as reliable pervasive medical systems, whose portability and mobility mean that they rely on a constrained energy source such as a battery. For example, security cameras not only record video, but also perform object recognition, face detection and other compute-intensive algorithms, while streaming the data to a server.

Besides the normal function of being able to make and receive calls, nowadays smartphones are expected to play and record HD video, surf the web, connect to social networks, transfer files wirelessly, provide location services and maps, play music and take pictures in a fast and reliable way, without draining the battery too quickly. The studies by Carroll and Heiser [13, 14] on smartphone power distribution suggest that, apart from the antennas and the display, the Central Processing Unit (CPU) contributes significantly to power consumption. As smartphone technology evolves, average power does not increase substantially but maximum power does [14]. Therefore, not only does battery life become a concern, but also temperature increase, which potentially reduces the overall lifetime of the devices. Mahesri and Vardhan [15] performed a study on energy consumption, demonstrating that with intensive workloads the CPU is the main source of power consumption. In addition, energy costs, which come from powering and cooling, are an increasing concern for data centres [16]. Hence, the combination of the high performance needed in battery-powered devices, their limited energy supply, and the cost of powering and cooling servers makes energy efficiency a very important subject to address.

Addressing the energy problem, components of embedded devices have been improved. The efficiency of batteries has increased, providing more capacity and longer lifetime. Modern antennas can be powered down when idle. The introduction of multicore technology has reduced the load on a single CPU, allowing several cores to perform

different tasks at the same time. Another technique for increasing the efficiency of embedded devices has been to distribute computation across the cores, and to add application-specific modules, or hardware accelerators, to the system. These accelerators include Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), Floating-Point Units (FPUs), Field-Programmable Gate Arrays (FPGAs), hardware video codecs, camera interfaces, Radio Frequency (RF) transceivers, etc. The traditional CPU is then part of a larger System-on-Chip (SoC) with several internal modules. As mentioned by Carroll and Heiser [14], the use of hardware accelerators such as video decoders and GPUs has helped reduce the load on the CPU. But these technologies alone are not enough to fulfil the performance requirements with an acceptable battery life.

CPUs have been identified as one of the main sources of power consumption. Several technologies have been developed to make the energy consumption of semiconductors more efficient. Some of these include the optimisation of the fabrication technology and the implementation of low power design techniques, the design of power-efficient memory devices, and the reduction and variation of the power supply voltage and frequency. Section 1.1 provides an insight into these technologies, identifying the need for better optimisation.

1.1 Research Justification

1.1.1 Power Consumption

The CPU of an embedded device is a digital Complementary Metal Oxide Semiconductor (CMOS) circuit that consumes energy to stay active. Modern CPUs are part of a larger SoC, which also includes other components such as GPUs and Hardware (HW) accelerators, each of these components consuming its share of energy. Energy can be defined as the power consumption of the component integrated over time¹ [3]. This dissipated power comes from two sources, Dynamic Power (or switching power) and Leakage Power (or static power). Dynamic Power dissipation happens when the transistors inside a CMOS circuit are switched, allowing internal capacitances to charge/discharge and short-circuit currents to flow through the circuit. The voltage, the load capacitance and the switching frequency determine the amount of Dynamic Power. Leakage Power dissipation does not depend on the switching frequency but rather on the fabrication technology used, as it arises from the intrinsic parasitic currents of the transistors. As fabrication technology has reduced the size of transistors, Leakage Power has increased, exceeding switching power, as seen in Figure 1.1. Until recently, CPU design was more concerned with performance improvement and silicon area reduction for lower manufacturing costs, but power is now one of the major design constraints.

¹A more detailed explanation of power consumption in CMOS is given in Section 2.1.

Figure 1.1: Dynamic and Leakage Power trend [1]

\[ P(t) = I(t)\,V(t) \tag{1.1} \]

\[ E = \int_{0}^{T} P(t)\,dt \tag{1.2} \]

1.1.2 Low Power Design Techniques

Power minimisation is a major concern which has given rise to the development of power saving techniques. These techniques can be classified by circuit abstraction level, from the bottom up: Logic and Technology level, Architecture and Circuit level, and System level. They can also be classified depending on whether they primarily target Dynamic or Leakage Power savings. Table 1.1 shows this classification with their targets. It is fair to say that the techniques are not exclusive to Dynamic or Leakage Power, but they tend to save more power on their targets. A thorough explanation of these techniques is given in Section 2.1.

Abstraction Level                 Dynamic Power                                 Leakage Power
System level                      DVFS, Multi-Vdd                               Power Gating
Architecture and Circuit level    Clock Tree Optimisation, Operand Isolation    Adaptive Voltage Scaling
Logic and Technology level        Logic Resizing, Logic Restructuring           Multi-Vth, Adaptive Body-Bias

Table 1.1: Common Low Power Design Techniques for CMOS

Architecture, Circuit, Logic and Technology level techniques are autonomous in the sense that no software control is needed for them to be triggered. System level techniques, however, require a controller for their optimisations to yield higher power savings. This control can be embedded in the Operating System (OS) (e.g. in Real-Time OSs), be developed as a driver (e.g. Linux Governors [17]), or be set manually by the user. In all of these cases the control is software driven, and it affects the performance of the SoC. For example, using Power Gating (PG) in a CPU effectively turns off its clock and voltage, leaving it unable to do any processing whilst powered down. Using Dynamic Voltage and Frequency Scaling (DVFS), the CPU clock and voltage are reduced, decreasing both the power consumption and the Instructions-Per-Second (IPS); the latter can be seen as a reduction in performance. Multi-Vdd [1] is a high level design technique that requires the system to be divided into several voltage domains, each of which can be a CPU or a hardware accelerator with its own voltage supply and clock. Depending on the application, the different domains can have their power reduced when full performance is not needed from them. A further description of Multi-Vdd is given in Section 2.2.1. As seen in these last three examples, performance and power are traded off against each other; this trade-off has been a field of study ever since system level techniques were developed.

1.1.2.1 Power Gating

Power Gating (PG) allows an SoC to effectively remove the voltage and clock of blocks that are not being used [2]. The strategy is to provide power modes to the blocks: a low power mode and an active mode. This is a very effective technique because it reduces both dynamic and leakage power. The implementation of PG includes the functional block to be gated, the controller, a power switching fabric, isolating logic and other components that are always on. The controller triggers a sequence that allows the state of the functional block to be saved (for when it is powered up again), as well as the stopping of the clocks together with the switching of the voltage supply. The saved state (or State Retention [2]) uses registers with reduced leakage power, so that when the functional block is off only the always-on logic consumes power, which is considerably low.

Using PG involves time and energy overheads to power the functional block down and up, making it crucial to determine the situations in which triggering PG still allows the system to provide the desired performance. The strategies developed to determine when to use PG are known as Dynamic Power Management (DPM). These strategies include not only the power management of the functional blocks inside the SoC, but also the control of the low power modes of external peripherals (all referred to in this text as DPM). The main concern of DPM is the overhead of powering on/off. This overhead is composed of the higher power needed while the device is powering on/off, and the time this takes. The problem is therefore to determine whether this overhead is worthwhile or not. The break-even time is defined as the minimum time the device needs to stay in the low power mode for the energy consumed to be equal to that of remaining idle for the same time. Some functional blocks/peripherals have several power states, and the deeper sleep states generally have greater overheads [18]. Substantial research [6, 19–28] has been done on DPM to make more efficient decisions with fewer energy/time penalties. A survey of DPM strategies is presented in Section 2.4.1.

1.1.2.2 Dynamic Voltage and Frequency Scaling

Dynamic Voltage and Frequency Scaling (DVFS) is a low power design technique that allows CPUs and other processing elements (DSPs, GPUs) to reduce their performance by lowering their voltage and frequency, therefore decreasing power consumption. The CPU is not shut down by DVFS, so the power reduction is not as drastic as with PG. Nonetheless, power savings are achievable when the task at hand does not require the full speed of the CPU. The processing elements that use DVFS are powered by a programmable power supply, together with a System Clock (SysClock) Generator that provides a clock signal to the element. Running the CPU at a given frequency requires a minimum supply voltage, and higher frequencies need higher supply voltages. Using DVFS requires programming both the power supply and the clock generator at runtime [2]. This is further explained in Section 2.2.2.

As DVFS reduces performance when using lower frequencies, its use has to be controlled to allow demanding tasks to execute faster if the task requires a given performance. DVFS is also used for thermal concerns, as high power consumption yields more heat. The main concern with DVFS is that reducing power consumption does not necessarily reduce energy consumption. Figure 1.2 shows how power is reduced from Approach 1 to Approach 2, but the energy consumption of the two approaches (grey area below the curve) may not necessarily be reduced for the same workload. Care should therefore be taken to ensure that DVFS is used to target energy reduction, not only power. DVFS has also been used for thermal optimisation [29], because a reduction in power consumption directly correlates with a temperature reduction, which enhances the lifetime reliability of the system. Many modern commercial SoCs provide support for a discrete number of voltages and frequencies [2], usually paired together. These will be referred to in this document as Voltage and Frequency (V-F) pairs. The V-F pairs supported are SoC specific.

In order to have access to these system-level techniques, an industry specification called the Advanced Configuration and Power Interface (ACPI) has been created [30]. ACPI sets the standard interface needed to facilitate communication between the hardware, the software and the OS. This interface allows the OS to configure the power states of the SoC, as well as the SoC to notify its available power levels, both sleep states for DPM and voltage/frequency levels for DVFS. The next section analyses the challenges faced by industry and academia in achieving better energy efficiency.

Figure 1.2: Power vs Energy. Reproduced from [2]

1.1.3 Challenges of System Level Power Management

As mentioned before, System Level Power Management techniques control the power state of the processing elements. Therefore, the main challenges of using these techniques lie in how to efficiently reduce energy consumption without making a significant impact on performance. DPM and DVFS are used differently, as their strategies for optimising energy consumption contrast. The common strategy for DPM is to exploit large idle times, allowing the processing element to sleep more. On the contrary, the common strategy for DVFS is to exploit short idle times by reducing performance, keeping the processor active at a lower power consumption.

Operating Systems (OSs) play a major role in the control and efficiency of power management. Modern OSs are equipped with power controllers that take advantage of these idle times to provide lower energy consumption [17, 18]. Moreover, the scheduler implicitly impacts power efficiency, as scheduling decisions affect the amount of idle time between processes. Real-Time Operating Systems (RTOSs) are used for applications that have a strict timing constraint and therefore need to yield a defined amount of performance within a specific time frame. This creates the need for deterministic behaviour in which the workload and deadline are known beforehand, so that tasks can be scheduled accordingly. Power management for RTOSs has been studied before [27, 31], where not only have DPM and DVFS been analysed separately but also their interplay, allowing the system to decide whether to run as quickly as possible and then sleep, or to run at lower frequencies.

The DPM - DVFS interplay can exist inside the same processing element, e.g. a CPU with sleep states (DPM) and DVFS, for which the energy saving strategy should be a mix of both. The main challenge is to decide whether there are more energy savings in running at maximum speed to reach an idle period or in reducing speed given the current workload. The current workload has to be known a priori. For example, if the processing element were a CPU required to do intensive video decoding, idle times may not make DPM feasible, whereas the DVFS strategies could work better. On the contrary, if the application has long idle times, e.g. word processing, the DPM strategies may yield greater energy savings. An example of the interplay of DPM - DVFS is presented by Das et al. [32], where the approach is to decide whether to slow down (using DVFS) or to run-to-idle (DPM), characterising the applications based on their break-even points (defined in Section 2.2.3).

General Purpose Operating Systems (GPOSs) (non-RTOSs) [33] do not explicitly have RTOS timing constraints, and therefore power management is targeted differently. The objective most commonly followed is to preserve a performance throughput, which is measured with different metrics, yielding different results. The Linux kernel includes options for DVFS called governors, in which the performance objective is given as a target CPU idle-time percentage over periodic intervals, and frequency decisions are made to reach this target [18]: if the measured idle time is lower than the parameter, the frequency is increased, and vice versa. Some state-of-the-art approaches use performance monitors to characterise the workload, and adjust the frequency based on a usage factor calculated from this characterisation [22].
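
To make this reactive idle-time heuristic concrete, the sketch below shows the kind of decision rule described above. It is a simplified illustration only, not the actual Linux ondemand governor code; the function name, threshold handling and frequency table are hypothetical.

#include <stddef.h>

/* Hypothetical frequency table (kHz), lowest to highest. */
static const unsigned int freq_table_khz[] = { 300000, 600000, 800000, 1000000 };
#define NUM_FREQS (sizeof(freq_table_khz) / sizeof(freq_table_khz[0]))

/*
 * Reactive, ondemand-style decision: if the CPU was idle for less than the
 * target percentage of the last interval, step the frequency up; if it was
 * idle for more, step it down. Returns the frequency to use next (kHz).
 */
static unsigned int reactive_governor_step(size_t *cur_idx, unsigned int idle_pct,
                                           unsigned int idle_target_pct)
{
    if (idle_pct < idle_target_pct && *cur_idx + 1 < NUM_FREQS)
        (*cur_idx)++;                 /* busy: raise frequency      */
    else if (idle_pct > idle_target_pct && *cur_idx > 0)
        (*cur_idx)--;                 /* mostly idle: lower it      */
    return freq_table_khz[*cur_idx];  /* on target: keep current V-F */
}

Because the decision only looks at the idle time measured over the previous interval, the frequency is corrected after the workload has already changed; this reactive behaviour is what the proactive, prediction-based approach proposed in this thesis aims to improve upon.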

While the previous approaches are efficient for overall performance, they are oblivious to the applications running on the OS, potentially consuming more energy than required or providing lower performance than needed. Some approaches have created application-specific algorithms to attack these shortcomings, for example focused on video decoding [34, 35] and video games [36]. These approaches usually target real-time performance, resembling the behaviour of RTOSs. They obtain performance constraints from the running application and, based on the workload, select the V-F pairs that keep that constraint. The energy efficiency of these algorithms is increased, but they remain application specific and are unable to run across different applications.

Moreover, the power control system of these approaches is embedded in the specific application, so replicating the code for another application is not trivial and may be time consuming. Added to this complexity, a recent survey [37] demonstrated that software developers have limited knowledge about energy efficiency, lack knowledge about techniques to reduce the energy consumption of software, and do not know how their software uses energy. Therefore, a framework for programming different applications must exist to allow a power controller to provide energy efficiency across these applications. This framework should allow high level software developers to provide energy efficiency to their applications without the need to understand the complexity of the power management algorithms.

As mentioned before, modern SoCs integrate several modules (or components), some of which implement DVFS and others PG. Therefore, the strategies for more efficient energy savings in the SoC depend on the low power hardware of each module. The OMAP processors by Texas Instruments [38] present an environment with DVFS for some modules and PG for others, and the strategy for the interplay of DPM (in Power Gating) and DVFS is different for each module. The PG modules, e.g. Wi-Fi antennas, peripherals, memories, etc., can each be managed by analysing their usage rate and deciding whether or not a complex DPM algorithm is required to control their power states. The DVFS modules, e.g. CPU, DSP, GPU, may be controlled using complex DVFS algorithms, with the power states of both PG and DVFS modules controlled by a central power management system. The development of efficient DPM and DVFS algorithms is imperative to provide significant power optimisations.

The challenges for system-level power management can be summarised then as:

• Power management techniques exist (PG, DVFS), and the algorithms to control these need to achieve a target performance.

• The power interfaces for communication between the hardware and the software exist (ACPI), and GPOSs have implemented algorithms to use them.

• The State-of-the-Art in power management for GPOSs is oblivious to the applications running, while the State-of-the-Art in application-specific power management is constrained to a single application.

• Software developers need a framework to take advantage of the power management algorithms running at OS level without the need for knowledge of the complexity of the algorithms.

1.2 Research Questions

1. How can embedded devices save energy at system level in an unknown environment where workloads and applications vary, without losing performance?

2. How can this energy saving technique interact with its environment? How can it be used in real life?

3. How does the technique developed perform compared to existing State-of-the-Art?

1.3 Research Contributions

The overall contribution of this research is the proposal for a new framework which optimises power consumption on embedded processors, enabling software developers to create power-aware applications. This contribution is divided into:

1. The design of a cross-layer Runtime Power Manager to optimise power consumption using DVFS on embedded processors with an acceptable impact on performance. This Run-time Manager (RTM) uses Machine Learning (ML) optimisation algorithms for the prediction and adaptation of power states.

2. The design of a power-aware programming framework for:

• Converting general purpose applications into soft Real-Time (RT)
• Measuring performance as part of the Run-time Manager
• Allowing applications to become power-aware by communicating their performance constraints to the Run-time Manager

3. Evaluating the effectiveness of the Run-time Manager on DVFS. This includes the measurement of performance loss, power consumption, overheads, and comparisons with the State-of-the-Art.

1.4 Research Outputs

This research has yielded three publications:

• “PoGo : An Application-Specific Adaptive Energy Minimisation Approach for Embedded Systems” published in HiPEAC 2015 [39]. I was the first author of this paper, doing all theoretical and experimental work and writing the majority of the paper.

• “Towards automatic code generation of run-time power management for embedded systems using formal methods” published in MCSoC 2015 [40]. I was the second author of this paper, taking the implementation from the PoGo paper, executing experiments on the real platform, and performing data analysis.

• “Learning transfer-based adaptive energy minimization in embedded systems” published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, August 2015 [41]. My contributions to this paper include assisting with the literature review and the conceptualisation of the algorithm used.

1.5 Document Outline

An overview of Power Management (PM) technologies has been given in this chapter, with the motivation on the role OSs play in the power saving of devices. This thesis is divided into 6 chapters. Chapter 2 gives a description of System-Level Power Management, providing an overview of the existing technologies, as well as the State-of-the-Art techniques for optimising them. Chapter 3 proposes a global Run-time Manager (RTM) for Power Management (PM), with the theory behind efficient V-F selections. Chapter 4 describes how the design was developed for a real platform running Linux OS. Chapter 5 analyses the experiments run, highlighting the achievements of the proposed algorithm, and Chapter 6 provides Conclusions and Future Work.

Chapter 2

Power Management in Microprocessors

Power consumption in Complementary Metal Oxide Semiconductor (CMOS) circuits is a major concern nowadays, especially in embedded devices. For this reason, power management techniques have been developed to reduce power consumption by reducing the components’ voltage. These techniques are known as power knobs, as they can be tuned, either automatically or manually. These power knobs include Power Gating (PG) and Dynamic Voltage and Frequency Scaling (DVFS), where the former completely shuts down a segment of hardware, and the latter reduces a processor’s performance by reducing its frequency.

General purpose systems normally run an Operating System (OS), on top of which the applications run. Depending on the context, these applications and/or OSs may have performance requirements that the system must meet, whether it is a particular frame rate for video or an amount of computations per second. In other cases the system does not have these constraints, so any performance the system yields will suffice. This may seem fine for the system, but there may be an impact on user experience [42]. Therefore, it is logical to take advantage of the opportunities that arise from knowing the performance constraints and the power knobs existing on the system.

It helps to see an embedded system as a cross-layered system. Let the system be a modern mobile phone. Analytically it can be divided into three layers:

• The physical or Hardware layer, where the processor resides, together with all of the electronic components and the battery. This layer includes the embedded power management hardware for reducing power consumption. In a modern mobile phone the hardware may include a quad-core ARM processor, with 3G and Wi-Fi


antennas, battery, touchscreen, hardware accelerators and other peripherals. Each of these components consumes energy, which is provided by the battery (with limited energy).

• The Operating System layer, which is the overall manager of the system, control- ling execution, scheduling tasks, memory management. It also controls the power states of the hardware. Depending on the system, the OS may be General Pur- pose or Real-Time. In the mobile phone example, the OS running may be general purpose e.g. Android, Linux, iOS, Windows Phone.

• The Application layer, where the user applications run. Applications ask the OS for permission to run, and vary widely in their purpose and therefore in their resource use, whether requiring antennas for communication, intensive use of the processor for calculations, etc. Examples include text processors, video and audio applications, instant messaging, video games, etc.

Given that embedded devices are normally energy constrained, it is necessary to develop strategies to optimise energy consumption in order to extend the devices' battery life. There are many components consuming energy in an embedded device (processor, screen, antennas, hardware accelerators, etc.). This work focuses on the power consumption of semiconductors, specifically of processors, the brain of the system. First, it is imperative to define what power and energy consumption are in a processor. Section 2.1 describes power consumption in semiconductors, setting up the need for better power saving techniques.

Application Layer – Requirements (Section 2.3); OS Layer – Control (Section 2.4); Hardware Layer – Knobs (Section 2.2)

Figure 2.1: Flowchart of an embedded system in a cross-layer approach. Analytical approach (left) and practical approach (right). Sections 2.2, 2.3 and 2.4 are described as per the figure

Figure 2.1 presents the analytic division of an embedded system into cross layers as defined before. Seen from a Power Management perspective, a Power Manager requires three things:

• The Power Knobs it will be manipulating. These knobs control the energy consumption of the processor, whether by turning cores and modules on/off, reducing performance by reducing Voltage and Frequency (V-F), etc.

• The Control algorithm, which is the brain of the Power Manager. This algorithm takes decisions to modify the state of the Power Knobs. Depending on the application, the main objective of the Control algorithm is normally to reduce the energy consumption of the processor with the least amount of performance penalty. This trade-off between energy and performance is regulated by the optimisation target.

• The Requirements provide the optimisation target to the Control. The Power Manager must aim to match the requirements which usually come in terms of a minimal performance requirement. In Real-Time Operating Systems (RTOSs) the application/task provides the performance requirement explicitly.

The elements required for the Power Manager can be mapped to the cross-layer approach presented before, although the relationship is not necessarily one-to-one. The Control algorithm may be designed at any of the three layers: as an Application, as part of the OS, or as a Hardware module. In the same way, the requirements may come from the running Application or from the OS itself; a minimal sketch of this three-part structure is given below. This chapter analyses all three elements of the Power Manager: the Knobs available to the processor in Section 2.2, the Requirements and an overview of OSs in Section 2.3, and finally the Control algorithms for managing the knobs in Section 2.4, with a review of the state-of-the-art algorithms available. Finally, the Discussion analyses the state-of-the-art approaches with their advantages and shortcomings, and addresses which improvements should be made to improve power optimisation.
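
As an illustration of this separation of concerns, the following sketch models the three elements as a minimal C interface. It is purely illustrative: the structure names, fields and callback signatures are assumptions made for this example and do not describe Shepherd's actual implementation.

#include <stddef.h>

/* Requirements: what the application (or OS) asks for. */
struct pm_requirements {
    unsigned int target_fps;        /* e.g. frames-per-second constraint    */
    unsigned int deadline_us;       /* per-work-item deadline, microseconds */
};

/* Knobs: what the hardware exposes, e.g. a table of V-F pairs. */
struct pm_knobs {
    const unsigned int *freq_khz;   /* available frequencies                */
    const unsigned int *volt_mv;    /* matching voltages                    */
    size_t num_states;
    int (*set_state)(size_t index); /* apply a V-F pair, 0 on success       */
};

/* Control: the policy that maps monitored feedback onto a knob setting. */
struct pm_control {
    /* Given the last observed workload (cycles) and the requirements,
     * choose which V-F index to apply next. */
    size_t (*decide)(const struct pm_requirements *req,
                     const struct pm_knobs *knobs,
                     unsigned long last_workload_cycles);
};

Under this view, a reactive governor, an application-specific controller and a learning-based RTM differ only in how decide() is implemented; the knob and requirement interfaces stay the same.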

2.1 Power consumption in Microprocessors

One of the major topics of concern when designing Integrated Circuits (ICs) for battery-powered devices (e.g. Wireless Sensor Networks (WSNs)) is power consumption, as this determines the active lifetime of the device before the energy source needs to be replaced or recharged. As technology has developed, better energy storing cells have been designed, but this is not enough to fulfil the performance requirements. Therefore, part of the effort is directed at reducing power consumption within the chip without compromising the performance constraint. The first step is to define power inside an IC and then the techniques that can be used to reduce its consumption.

The definitions of Power, Dynamic Power and Static Power are based on [3]. The general definition of power is the energy consumed per unit of time. More specifically, the energy E consumed is equal to the integral of the instantaneous power P(t) over an interval T (Equation 2.1). The average power is then the energy E over the interval T (Equation 2.2). In electric circuits, the instantaneous power is defined as the product of the instantaneous current I(t) and the instantaneous voltage V(t) (Equation 2.3).

\[ E = \int_{0}^{T} P(t)\,dt \tag{2.1} \]

\[ P_{avg} = \frac{E}{T} = \frac{1}{T}\int_{0}^{T} P(t)\,dt \tag{2.2} \]

\[ P(t) = I(t)\,V(t) \tag{2.3} \]
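
As a simple numerical illustration of Equations 2.1 and 2.2, the short program below approximates the integral with a sampled power trace and derives the average power. The sample values and sampling period are invented for this example.

#include <stdio.h>

/*
 * Discrete approximation of Equations 2.1 and 2.2: energy is the integral of
 * power over time, approximated here as a sum of samples taken every dt
 * seconds; average power is that energy divided by the total interval.
 */
int main(void)
{
    const double power_w[] = { 0.45, 0.52, 0.60, 0.38, 0.41 }; /* P(t) samples in watts */
    const double dt = 0.010;                                    /* 10 ms sampling period */
    const int n = sizeof(power_w) / sizeof(power_w[0]);

    double energy_j = 0.0;
    for (int i = 0; i < n; i++)
        energy_j += power_w[i] * dt;        /* E ~ sum of P(t_i) * dt */

    double p_avg = energy_j / (n * dt);     /* P_avg = E / T          */
    printf("E = %.4f J, Pavg = %.3f W\n", energy_j, p_avg);
    return 0;
}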

For the architecture used in this project, the voltage is constant (set to 1.08V in the constraints). Therefore the instantaneous power Ptotal(t) will be proportional to the sum of the currents of the transistors Itotal(t) while maintaining a constant VDD.

\[ P_{total}(t) = V_{DD} \cdot I_{total}(t) \tag{2.4} \]

The equation for the energy in the capacitor is given in Equation 2.5:

\[ E_C = \frac{1}{2} C V_C^{2} \tag{2.5} \]

Given these equations, the total power consumed in a CMOS based IC is divided into the Dynamic Power and the Static Power (Equation 2.6). The former refers to the energy drained during switching of the CMOS gates, whereas Static Power is consumed regardless of the switching frequency.

\[ P_{total} = P_{dynamic} + P_{static} \tag{2.6} \]

2.1.1 Dynamic Power

Ideal CMOS gates consume no power, as their output is connected to the input of another CMOS gate (the gate of a transistor, which is activated by voltage, not current). However, real CMOS gates contain a parasitic load capacitance at the output of the gate due to small capacitances intrinsic to the MOS transistors. Another “real world” peculiarity is that the input voltage switching is not instantaneous, so short circuit currents emerge due to this factor. Power dissipation coming from these non-ideal characteristics is called Dynamic Power. The two forms of Dynamic Power are (Equation 2.7):

• Switching Power
• Internal Power, dissipated by short-circuit currents (as defined by Keating et al. [2] and in the Synopsys Manuals [4])

\[ P_{dynamic} = P_{switching} + P_{short\text{-}circuit} \tag{2.7} \]

2.1.1.1 Switching Power

Switching Power results from the charging and discharging of the load capacitance Cload. Using Figure 2.2 as a reference of a CMOS inverter, when the input is 0 the PMOS transistor is ON¹ and the capacitor Cload starts charging to VDD (Charging in Figure 2.2). The output is 1. At this stage the NMOS transistor is OFF, so no current is flowing through it to VSS. When the input is 1 (Discharging in Figure 2.2), the PMOS transistor is OFF

(cutting VDD), the NMOS transistor is ON and the capacitor Cload starts discharging through it. Therefore, using Equation 2.5 the energy of the load capacitance is then:

\[ E_{C_{load}} = \frac{1}{2} C_{load} V_{DD}^{2} \tag{2.8} \]

As VDD is constant, the integral of the voltage yields VDD. The total energy delivered from the power supply is then shown in Equation 2.9 [3]. Half of this energy is stored in the capacitor. The other half is dissipated as heat in the PMOS, because while charging the capacitor the transistor has a voltage across it and a current flowing through it [3].

\[ E_C = \int_{0}^{\infty} I(t)\,V_{DD}\,dt = \int_{0}^{\infty} C \frac{dV}{dt} V_{DD}\,dt = C V_{DD} \int_{0}^{V_{DD}} dV = C V_{DD}^{2} \tag{2.9} \]

¹A transistor is considered ON when it is in its saturation region, in which a channel has been produced from Source to Drain. It is considered OFF when the voltage at the Gate VG is the same as at the Source VS, so no path is created [3].

Figure 2.2: A CMOS Inverter showing the charging and discharging of the capacitor Cload. Based on [3]

Current is then driven through the PMOS only during the charging of Cload, so for this model, when Cload's voltage is equal to VDD the current is 0 (discarding leakage power). In the same manner, after the capacitor has completely discharged following the activation of the NMOS transistor, the current flowing through the transistor is 0. Therefore, Switching Power is dissipated only during capacitor charge. To calculate the switching power in a CMOS circuit, the number of times it has switched has to be counted. Supposing the switching frequency is fsw and T is the time over which the average power is to be calculated, the number of switching events is T · fsw. Using this, (2.2) and (2.9), the switching power Pswitching is then [3]

\[ P_{switching} = \frac{E}{T} = (T \cdot f_{sw}) \cdot \frac{C V_{DD}^{2}}{T} = f_{sw} C V_{DD}^{2} \tag{2.10} \]

The previous calculation is for a CMOS gate that keeps switching every clock cycle. Real gate inputs do not toggle that often (except for clock inputs). Therefore, an activity factor α can be introduced based on switching probability for the device changing from

0 to 1 while the clock frequency fclk is kept constant. The rearranged equation is then

\[ P_{switching} = \alpha f_{clk} C V_{DD}^{2} \tag{2.11} \]
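
As a quick numerical check of Equation 2.11, the snippet below evaluates the switching power for one set of made-up parameter values (activity factor, clock frequency, switched capacitance and supply voltage); none of them correspond to a real design.

#include <stdio.h>

/* Equation 2.11: P_switching = alpha * f_clk * C * VDD^2 (illustrative values only). */
int main(void)
{
    double alpha  = 0.1;      /* activity factor (10% of gates toggle per cycle) */
    double f_clk  = 800e6;    /* clock frequency: 800 MHz                        */
    double c_load = 1.0e-9;   /* total switched capacitance: 1 nF                */
    double vdd    = 1.2;      /* supply voltage: 1.2 V                           */

    printf("P_switching = %.3f W\n", alpha * f_clk * c_load * vdd * vdd);

    /* Halving VDD cuts switching power by 4x (quadratic dependence), which is
     * why DVFS scales the voltage together with the frequency. */
    printf("at VDD/2    = %.3f W\n", alpha * f_clk * c_load * (vdd / 2) * (vdd / 2));
    return 0;
}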

2.1.1.2 Internal Power

Internal Power is dissipated when a short circuit current is driven from VDD to VSS, passing through both the PMOS and NMOS transistors. As mentioned before, non-ideal switching delays exist during transitions in the CMOS gates. During switching from “0” to “1” and vice versa, there is a moment in which both transistors are in the linear region, where neither transistor is saturated nor shut down, so there is a direct path from VDD to VSS, effectively creating a short circuit. The name Internal Power is given based on [4] and [43]. Figure 2.3 shows the short circuit current. This current is considerably smaller than the switching current because it only flows during the short periods in which both transistors are in the linear region. In this case, it is also obvious that power will be dissipated depending on the switching activity (αfclk).

Figure 2.3: A CMOS Inverter showing the short circuit current flowing from VDD to VSS. Based on [4]

Also, as the transistors are in the linear

(ohmic) region, they behave as voltage dependent resistors. The current I(t) will depend on the voltage VDD and the impedance dependent on VGS.

As can be seen, both types of Dynamic Power depend on the switching frequency of the gate and the supply voltage VDD. One approach could be to reduce VDD (which in this case has been done, to a minimum of 1.08 V), but this increases propagation delay. Therefore two options are possible: reduce the overall clock frequency, or reduce the individual activity factor of the gates. The problem with reducing the clock frequency is that although switching power is reduced, the energy required to complete a specific task remains the same, as it takes the same number of clock cycles to complete.

Mathematically, using TX as the time it takes to realise a particular task X, the switching count will be TX · fclk. Using 2.2 and 2.11, the switching energy will then be

\[ E_{switching} = T_X \cdot P_{switching} = T_X \cdot \alpha f_{clk} C V_{DD}^{2} = (T_X \cdot f_{clk}) \cdot \alpha C V_{DD}^{2} \tag{2.12} \]

\[ T_X \cdot f_{clk} = k \quad \text{where } k \text{ is a dimensionless constant} \tag{2.13} \]

\[ \therefore E_{switching} = \alpha k C V_{DD}^{2} \tag{2.14} \]

where it is clearly seen that the energy does not depend on frequency.

2.1.2 Static Power

Static Power, on the contrary, does not depend on switching frequency but on the technology used. As fabrication technology has been scaled down, Static (or Leakage) Power has been increasing to the point that it equals, and in some cases exceeds, Dynamic Power [2], as shown in Figure 2.4(b). This means that even without switching, transistors dissipate energy simply by being powered. Figure 2.4(a) shows three of the four leakage currents, which are (as stated in [2]):

Sub-threshold Leakage: Current flowing from drain to source when the transistor is in the weak inversion region. This is the most important source of leakage.

Gate Leakage: Current produced by the effect of tunnelling, where electrons flow through the Gate Oxide to the substrate, due to the reduced thickness of the oxide layer.

Gate Induced Drain Leakage: Current produced by the high field effect in the drain

because of VDG being high, which flows from drain to substrate.

Reverse Bias Junction Leakage: current produced by the creation of carrier pairs in the depletion regions.

Figure 2.4: Leakage Currents in a CMOS circuit (a). Taken from [2]. ITRS trends for leakage power dissipation (b). Based on [5].

As can be seen from (2.15) (taken from [5]), where K and n are constants, W is the width of the transistor, Vth is the threshold voltage, VDD is the supply voltage and T is the temperature, if Vth is reduced the subthreshold current increases exponentially. This happens with newer technologies, as the reduction in feature size effectively reduces Vth [2]. As stated in the reference, current increases the temperature, and temperature increases Isub exponentially; this problem is called thermal runaway. One way to reduce leakage power (as well as dynamic power) would be to reduce VDD. As mentioned in Section 2.1.1.2, reducing VDD would increase propagation delay.

\[ I_{sub} = K W e^{\frac{-V_{th}}{n T}} \left(1 - e^{\frac{-V_{DD}}{T}}\right) \tag{2.15} \]
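
The exponential sensitivity of Equation 2.15 to the threshold voltage can be illustrated numerically. The snippet below simply evaluates the formula for two values of Vth; all constants are arbitrary illustrative choices (the temperature term is treated here as a thermal-voltage-like quantity), so only the ratio between the two results is meaningful.

#include <stdio.h>
#include <math.h>

/* Equation 2.15 (simplified subthreshold leakage model), illustrative constants only. */
static double i_sub(double k, double w, double vth, double n, double t, double vdd)
{
    return k * w * exp(-vth / (n * t)) * (1.0 - exp(-vdd / t));
}

int main(void)
{
    const double k = 1.0, w = 1.0, n = 1.5, t = 0.026, vdd = 1.0;
    double i_hi_vth = i_sub(k, w, 0.40, n, t, vdd);  /* higher threshold */
    double i_lo_vth = i_sub(k, w, 0.30, n, t, vdd);  /* scaled-down Vth  */

    /* A 100 mV reduction in Vth increases leakage by roughly e^(0.1/(n*t)), ~13x here. */
    printf("leakage ratio = %.1f\n", i_lo_vth / i_hi_vth);
    return 0;
}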

Therefore, it is imperative that low-power design techniques are implemented in future processor designs to counter the increased power consumption due to Dynamic and Static Power.

2.2 Power Management Techniques: Knobs

2.2.1 Multi-Vdd

This technique is based on the premise that not all parts of the chip need to work at the same speed [44]. As mentioned in the introduction, the System-on-Chip (SoC) may contain several peripherals, apart from the Central Processing Unit (CPU) cores, that can run at different speeds. Their supply voltage Vdd could therefore be lowered while still meeting timing requirements, lowering their Dynamic (and Leakage) power.

2.2.2 DVFS

Another widely used technique for reducing power consumption is DVFS, also commonly found in the literature as Dynamic Voltage Scaling (DVS). This technique aims to reduce power by reducing the voltage supply of the device (or the core), at the expense of reducing the frequency of the clock signal. The reduction of frequency is required because of gate switching latency. By reducing the voltage, as seen in Section 2.1, the overall power consumption is reduced. By reducing the clock frequency the device reduces its performance, as computations are directly dependent on the operating frequency, so this technique is most effective when maximum performance is not needed.

In order to use DVFS effectively, techniques have been designed to reduce power while penalising performance as little as possible. These range from simple detection of the current workload to prediction of what the workload will be in the future. Other techniques are aimed at particular types of applications, such as multimedia and video games, where performance constraints are fixed in order to provide a desired quality.

Depending on the device, the DVFS implementation may be defined either as a discrete number of voltage-frequency pairs, e.g. Intel's XScale PXA270 processor [45], or as an adaptable range of frequencies with an adaptable range of voltages, as the one implemented by Nakai et al. [46]. This review focuses more on the former, as the implementations of the algorithms that follow in this document are based on voltage-frequency pairs (from now on called V-F pairs or V-F settings).
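
As an illustration of how a discrete V-F table is typically used, the sketch below picks the slowest pair whose frequency can still complete a predicted workload within a period. The table values and the selection rule are hypothetical and are not the policy proposed in this thesis; they only show the kind of decision a DVFS controller makes over V-F pairs.

#include <stddef.h>

/* A hypothetical table of discrete V-F pairs; the values are illustrative
 * only and do not correspond to any particular SoC. */
struct vf_pair {
    unsigned int freq_mhz;
    unsigned int volt_mv;
};

static const struct vf_pair vf_table[] = {
    {  300,  950 },
    {  600, 1100 },
    {  800, 1260 },
    { 1000, 1350 },
};
#define VF_COUNT (sizeof(vf_table) / sizeof(vf_table[0]))

/* Pick the lowest V-F pair whose frequency still covers the predicted
 * workload (cycles that must finish within period_us microseconds). */
static const struct vf_pair *select_vf(unsigned long predicted_cycles,
                                       unsigned long period_us)
{
    for (size_t i = 0; i < VF_COUNT; i++) {
        /* frequency in MHz times period in microseconds gives available cycles */
        unsigned long long capacity = (unsigned long long)vf_table[i].freq_mhz * period_us;
        if (capacity >= predicted_cycles)
            return &vf_table[i];     /* slowest pair that meets the deadline   */
    }
    return &vf_table[VF_COUNT - 1];  /* workload exceeds capacity: run flat out */
}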

2.2.3 Power Gating (PG)

Dynamic Power Management (DPM) is a technique widely employed in microprocessors and other electronic devices, such as Hard Disk Drives (HDDs) and other peripherals. The process of DPM consists of taking advantage of idle times of the device (e.g. processor, HDD) between workloads, by setting the device to an available low power mode. Low power modes vary among devices; e.g. IBM's Travelstar HDD has 5 low power modes (plus active power modes) [6], while the microprocessor StrongARM SA-1100 [6] has 3 power modes: ACTIVE, IDLE and SLEEP.

Normally there is an important trade-off between the power consumption (or power saving) in a particular power mode and the time and power it takes to transition to/from that power mode, meaning that the deeper the sleep level, i.e. the greater the power savings, the longer the transition time. Therefore the decision to go to a low power mode should be based on the idle time available to the device before a workload arrives. In most cases (but not all) the power consumption during the transition to a low power state is higher than the power consumed when idle. A term is then defined, the Break-Even Time TBE, which is the amount of time the device should spend in the low power mode, plus the transition time, for the energy consumption to be the same as that of the device in IDLE mode [6]. For a simple explanation of the terms it will be assumed that a device has 3 power modes: ACTIVE, IDLE and SLEEP.

In microprocessors, the SLEEP mode reduces power consumption compared to just being IDLE, as several modules and clocks of the processor are disabled when sleeping. Ideally, as soon as a workload is finished and the processor stops being ACTIVE, it would enter SLEEP mode. In reality, there is a transition period between ACTIVE/IDLE and SLEEP, as well as from SLEEP back to ACTIVE. Some processors have several sleep modes. For example, Texas Instruments’ DM3730 processor [47] has 5 power states: Active, Inactive, Retention with logic on in low-voltage, Retention with logic off, and Off. Normally, the deeper the sleep level in a processor, the higher the overhead, both in terms of time and consequently energy. These overheads are important for deciding whether it is worth going to SLEEP mode. The Break-Even Time is the minimum time the processor needs to be asleep for it to be worth going into sleep mode.

Figure 2.5: Definitions of power and times for a device with IDLE and SLEEP modes (showing P_{ACTIVE}, P_{IDLE}, P_{SLEEP}, the transition powers P_{OFF} and P_{ON}, and the corresponding time intervals)

In order to explain better what the Break-Even Time T_{BE} is, Figure 2.5 shows the different powers and times under the different power modes and the transitions. Let P_{ACTIVE} be the power consumption when the processor is ACTIVE and the time at this state T_{ACTIVE}. In the same manner, P_{IDLE} is the power when the processor is IDLE, and the time at this state is T_{IDLE}. The energies consumed during the ACTIVE and IDLE periods are E_{ACTIVE} and E_{IDLE} respectively:

E_{ACTIVE} = P_{ACTIVE} · T_{ACTIVE}        E_{IDLE} = P_{IDLE} · T_{IDLE}        (2.16)

For SLEEP mode there is a transition period T_{TR}, which is the sum of the time it takes to go to sleep, T_{OFF}, and the time it takes to wake up, T_{ON}; these transitions in turn dissipate power, P_{OFF} and P_{ON} respectively. As the sleep and wake-up transitions are constant in terms of time and power, the transition energy E_{TR} is fixed:

T_{TR} = T_{OFF} + T_{ON}        E_{TR} = T_{OFF} · P_{OFF} + T_{ON} · P_{ON}        (2.17)

After the processor has gone to sleep, it consumes power P_{SLEEP} for the time T_{SLEEP}, giving the energy consumed while asleep, E_{SLEEP}:

E_{SLEEP} = T_{SLEEP} · P_{SLEEP}        (2.18)

Therefore, in order for the transition to the SLEEP mode to be worthwhile, it is necessary that the energy spent sleeping, E_{SLEEP}, plus the transition energy E_{TR} be less than the energy of not going to sleep, i.e. of remaining IDLE, which is E_{IDLE}:

E_{IDLE} ≥ E_{TR} + E_{SLEEP}        (2.19)

The Break Even Time TBE then substitutes the TIDLE, so now TBE needs to be calculated: ETR PIDLETBE = TTR + PSLEEP TSLEEP (2.20) TTR

And as T_{IDLE} is now the time used for sleeping together with the transitions, T_{SLEEP} becomes:

T_{IDLE} = T_{BE} = T_{TR} + T_{SLEEP}   →   T_{SLEEP} = T_{BE} − T_{TR}        (2.21)

Substituting Equation 2.21 into Equation 2.20:

P_{IDLE} · T_{BE} = E_{TR} + P_{SLEEP} · (T_{BE} − T_{TR})        (2.22)

Solving for T_{BE}:

T_{BE} = (E_{TR} − P_{SLEEP} · T_{TR}) / (P_{IDLE} − P_{SLEEP})        (2.23)

Based on this concept, the determining factor for deciding to go to SLEEP is then whether T_{IDLE} > T_{BE}. As in a microprocessor this value is not known a priori, different techniques have been developed to estimate T_{IDLE} before it is actually known.
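To make the preceding derivation concrete, the following sketch (in C) computes T_{BE} from Equation 2.23 and applies the T_{IDLE} > T_{BE} test to a predicted idle time. The power and timing values are purely illustrative and are not taken from any particular device.

#include <stdio.h>

/* Illustrative DPM parameters (hypothetical values, not from a real device) */
struct dpm_params {
    double p_idle;   /* power when IDLE (W)         */
    double p_sleep;  /* power when SLEEP (W)        */
    double t_tr;     /* total transition time (s)   */
    double e_tr;     /* total transition energy (J) */
};

/* Break-even time from Equation 2.23 */
static double break_even_time(const struct dpm_params *p)
{
    return (p->e_tr - p->p_sleep * p->t_tr) / (p->p_idle - p->p_sleep);
}

/* Sleep only if the predicted idle time exceeds the break-even time */
static int should_sleep(const struct dpm_params *p, double t_idle_pred)
{
    return t_idle_pred > break_even_time(p);
}

int main(void)
{
    struct dpm_params p = { .p_idle = 0.40, .p_sleep = 0.05,
                            .t_tr = 0.010, .e_tr = 0.008 };
    double t_idle_pred = 0.050; /* predicted idle time (s), made up */

    printf("T_BE = %.4f s, sleep: %s\n", break_even_time(&p),
           should_sleep(&p, t_idle_pred) ? "yes" : "no");
    return 0;
}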

Benini et al. [6] provide a thorough review of different implementations of DPM techniques and their advantages. To provide some context for the more complex techniques explained later, some of the basic DPM techniques from that reference are explained here. DPM techniques for microprocessors are mainly divided into Predictive and Stochastic Techniques.

2.3 Applications and OS: Requirements

Nowadays, high performance embedded systems are able to run applications that demand plenty of resources: intensive processing power, dedicated accelerators, internet connectivity for streaming, memory for running simultaneous applications, etc. It is the responsibility of the system not only to run these applications, but to provide decent performance for each of them whilst doing so, with a finite power supply (battery) powering the system.

The simplest strategy to save energy would be to reduce the V-F using DVFS. The problem is the reduction in performance, which in turn impacts user experience. Therefore, in order to reduce energy consumption whilst preserving performance, a Power Manager is required to control the processor’s power states. As opposed to setting a V-F manually, a dynamic Power Manager can control the V-F at runtime, based on the OS, application and user needs. These needs, or requirements, allow the Power Manager to adapt and compensate for the situation.

The requirements needed by a Power Manager can come from a particular application or from the OS. There are two types of OSs: General Purpose Operating Systems (GPOSs) and RTOSs. In a GPOS, e.g. Windows, Linux or iOS, these requirements come from the OS, where the main optimisation constraint is the idle time, sampled at regular intervals. This approach allows energy to be saved, but as it is oblivious of the task or application being run, the Power Manager is not concerned with the user-perceived performance. As opposed to a GPOS, the main objective of an RTOS is to provide a required performance to given tasks. In order to provide performance that reduces energy consumption but does not compromise user experience, it is possible to take some elements from RTOSs for a Power Manager. Section 2.3.1 provides the definition of an RTOS, an explanation, as well as the types of existing RTOSs.

2.3.1 Real-Time Systems

The main objective of a Real-Time System (RTS) is to execute tasks within a given time. The system’s tasks must not only deliver a result but deliver it on time [48]. The two main aims of RTSs are to meet the deadlines and to behave in a predictable manner. In order to cope with several applications that require real-time scheduling, complex RTSs run on an RTOS.

The most important concepts in a Real-Time System are Deadlines, Latency, Jitter, Predictability, Worst Case, and Periodic Task. Taken from the book by Sally [10], they are described:

Deadline The time at which a task must be finished. This is the most important component of the real-time system; it is what makes it real-time. In hard real-time systems failing to meet the deadline can result in disastrous consequences. In soft real-time systems the value of the task decreases when completed after the deadline.

Latency The time between what should happen and when it does. Ideally latency should be zero, but in reality, hardware and software components take time (even if infinitesimal) to move electric signals from one place to another. For example, a hardware interrupt caught on a pin of a processor activates a software interrupt. Even though this is almost instantaneous, there is a time taken between these two events, i.e. latency.

Jitter The variation of latency. In reality, latency between events varies over time. A non-deterministic task scheduler in the OS (e.g. Linux) cannot ensure that a task will start running at an exact given time with a given latency.

Predictability The ability to know how much time a task will take to complete. A completely predictable system can repeat a process with no variation in the results nor in the time taken. In reality, in order to make a real-time system predictable, jitter and latency are included when calculating the (worst case) execution time.

Worst Case Latency and jitter are factors in real-time systems. In order to make the system predictable, when determining the execution time of a task a worst case scenario is calculated, which includes latency and jitter. The task scheduler therefore knows the longest time the task will take, in order to decide how to schedule it to meet its deadline.

Periodic Task A task that executes at periodic (regular) intervals, and the interval of the task is always the same.

Priority Inversion This happens when a task with low priority holds a lock on a resource needed by a high-priority task, keeping the high-priority task from executing.

2.3.2 Hard Real-Time versus Soft Real-Time

The two main types of RTSs are Hard and Soft RTSs. The former require all of the deadlines to be met; if not, serious consequences may follow, e.g. a lagging heart rate monitor. Soft RTSs, on the other hand, are more relaxed and can tolerate deadline misses. At a deadline miss, the value of the result at the end of the task is reduced, but there are no severe consequences. Examples of this are media players, where a loss in performance translates into a reduced frame rate. Table 2.1 shows a comparison between Hard and Soft Real-Time.

As mentioned, Hard and Soft RTSs are aimed at different situations. Observing a general purpose system, in order to increase power savings whilst maintaining performance, it may be feasible to take elements from a Soft Real-Time perspective. The complexity of implementation may arise from the fact that the scheduler in a general purpose system is not optimised for real-time. Also, GPOS Power Managers usually sit at the OS layer and have no interaction with the applications.

                         Hard Real-Time                        Soft Real-Time
Worst case performance   Same as average performance           Some multiple of average performance
Missed deadline          Work not completed by the deadline    Output of the system after the deadline
                         has no value                          is less valuable
Consequences             Adverse real-world consequences:      Reduction in system quality that is
                         mechanical damage, loss of life       tolerable or recoverable, for example
                         or limbs                              lagging video
Typical use cases        Avionics, medical devices,            Networking, telecommunications,
                         transportation control,               entertainment
                         safety-critical industrial control

Table 2.1: Hard Real-Time versus Soft Real-Time comparison, as explained by Sally [10]

2.4 Power Management: Control

2.4.1 DPM

2.4.1.1 Predictive Techniques

These techniques base their decision to go to SLEEP mode on the observation of previous idle times, in order to predict what the near-future idle time will be. The terms underprediction/overprediction are defined as the situations in which the predicted idle time is shorter/longer than the real idle time, respectively.

Under Static Predictive Techniques, one of the simplest algorithms is the Fixed Timeout. In this case the assumption is that once a device has been idle for a timeout T_{TO}, it will remain idle for a long time. For this algorithm, when a workload is finished a timer is started. When it reaches T_{TO} the device is put in SLEEP mode. The efficiency of the algorithm is therefore dependent on the selection of T_{TO}.

Some other Static Techniques include Predictive Shutdown (mentioned by Benini et al. [6]) and Predictive Wakeup [49], in which one of the techniques is to compute a non-linear regression equation based on history to predict future idle times. The problem with Fixed Timeout and Predictive Shutdown is the performance penalty paid, as both algorithms base their decisions on whether or not the device should SLEEP, not really on when the device should wake up.

Abbasian et al. [50] show a prediction method using Wavelet Forecasting in an Event-Driven environment. This algorithm uses the Wavelet Transform and intends to predict the idle time between workloads to decide whether the device should go to sleep or not. Similar to the previous citation, this is a Predictive Shutdown method, which in any case that the device goes to sleep will incur a loss of performance.

Predictive Wakeup [49] attacks the problem of the performance penalty by predicting what the idle time is going to be, so that the device stays in SLEEP mode for a particular time and then wakes up. In cases of underprediction power will be wasted, but overall this provides smaller performance penalties.

These Static Techniques by themselves may not produce good results, as they depend on offline information as well as on the device's workload being stationary to be somewhat effective. They can be improved by being Adaptive, so that the control parameters can be modified at runtime. Some techniques include Adaptive Timeouts, in which T_{TO} is modified based on recent history, as mentioned by Benini et al. [6].

The implementation of Hwang and Wu [49] is complemented with an adaptive algorithm called the Exponential Weighted Moving Average (EWMA), which predicts T_{IDLE} as a weighted average of the past history. It is a recursive algorithm, where the prediction is saved as the new average each time new data arrives. The average is multiplied by a parameter (λ) less than one, which produces an exponential decay. Let w(n) be the last real idle time period, w̄ the last predicted idle time period, W(n) the predicted idle period, and λ the attenuation factor. Equation 2.24 shows how the predicted idle period is calculated. After finishing the workload, if W(n) > T_{BE} the device goes to SLEEP for the predicted time.

W(n) = λ · w(n) + (1 − λ) · w̄        (2.24)

The factor λ determines the relative importance of recent and old history. A high λ gives more significance to recent events, whereas with a low λ recent history has little effect. Upgrades made to this algorithm include a timeout policy for underprediction cases, e.g. waking up a lot earlier than needed. The drawback of the Hwang and Wu [49] approach is that it does not perform well when idle times change at high frequency, due to its absolute dependence on history.
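As an illustration of Equation 2.24, the sketch below (in C) updates the EWMA prediction after each observed idle period and triggers SLEEP when the prediction exceeds T_{BE}. The λ value, T_{BE} and the idle-time trace are hypothetical, chosen only to demonstrate the recursion.

#include <stdio.h>

#define LAMBDA 0.5   /* attenuation factor (hypothetical choice)   */
#define T_BE   20.0  /* break-even time in ms (hypothetical value) */

/* Equation 2.24: W(n) = lambda * w(n) + (1 - lambda) * w_bar */
static double ewma_predict(double w_last, double w_bar_prev)
{
    return LAMBDA * w_last + (1.0 - LAMBDA) * w_bar_prev;
}

int main(void)
{
    /* Observed idle times in ms (made-up trace) */
    double idle[] = { 30.0, 10.0, 50.0, 45.0, 5.0 };
    double pred = 25.0; /* initial prediction */

    for (int n = 0; n < 5; n++) {
        pred = ewma_predict(idle[n], pred);
        printf("after idle %5.1f ms -> predicted %5.1f ms, %s\n",
               idle[n], pred, pred > T_BE ? "SLEEP" : "stay IDLE");
    }
    return 0;
}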

2.4.1.2 Stochastic Techniques

Predictive Techniques assume that workload arrival is deterministic. Stochastic Control, on the other hand, models the system as an optimisation problem with uncertainty [6]. Policy optimisation intends to map states to decisions based on a stochastic model. This allows a more complex system with different power states (other than ACTIVE, IDLE and SLEEP). Implementations of these techniques include modelling the device or system as the Service Provider (SP), which processes workload requests issued by the Service Requester (SR).

Benini et al. [6] present an approach to DPM using Markov chains and Markov Decision Processes (MDPs). The service requests, or workload arrivals to the processor, are modelled as a Markov chain, called the Service Requester (SR). A Markov chain is a stochastic process formed by states, in which transitions from one state to another are defined by a probability. These transitions follow the Markov property, which states that the transition to the next state only depends on the current state, and not on the history of previous states. This makes the system memoryless, as the previous states do not influence the probability of the next state. The SR is represented in Figure 2.6 a), where the two possible states R are 0, meaning no new requests, and 1, meaning a new request to the provider. The Service Provider (SP) is then the processor. The authors of the paper used a StrongARM SA-1100, which has one sleep mode. This SP is modelled as an MDP, which is an extension of a Markov chain, with the difference that this system has Actions and Rewards. The states S are on and off. The DPM power manager is in charge of taking the actions A, namely switching from on to off with s_off and from off to on with s_on, as well as preserving the current state. This is represented in Figure 2.6 b). The function for taking the actions is then f : S × R → A, which is the basis of the policy to be optimised, with the objective of reducing power consumption at the minimum performance penalty.

Figure 2.6: Markov chain and Markov Decision Process model for DPM of StrongARM SA-1100, as presented by Benini et al. [6]

For this type of model, time is divided into discrete periods in which the probability of having a request in period n + 1 depends on the state at period n. The PM policy is designed to take decisions based on both the state sets and the action set. As mentioned in [6], the Markov model has to be known a priori; suggested techniques include Linear Programming for optimisation of the model. An important aspect to notice of Markov models is that there is no guarantee that the expected values for power and performance will be optimal. If the model is not accurately designed, the policy will give only approximately optimal solutions.

That approach is also limited to stationary workloads, which are unlikely to occur in real life. An adaptive approach reviewed by Benini et al. [6] overcomes the stationarity limitation by having policies for different workloads. In this scheme, the Power Manager observes the environment (workload requests) and selects the most suitable policy for that workload profile. As pointed out by Shih and Wang [21], this approach adapts well to the workload at hand, but the drawback is the amount of memory and computation required.

Other stochastic techniques in the literature using Markov chains include the Continuous Time Markov Decision Process (CTMDP) and the Time-Indexed semi-Markov Decision Process (TISMDP), both mentioned by Simunic et al. [19].

2.4.1.3 Machine Learning Techniques

Dhiman and Rosing [51] present a design of a PM that includes a Machine Learning algorithm for selecting among a range of DPM policies. This algorithm, instead of presenting a new DPM technique, learns online which technique is best suited to the current performance constraint. The DPM techniques in this implementation are called experts, and for this particular paper the experts used are: Fixed Timeout, Adaptive Timeout, Exponential Predictive and TISMDP, all explained before.

The algorithm is based on updating a weight vector w composed of N experts, in which each w_i (with i = 1, 2, 3, . . . , N) represents the weight of an expert. The weight of each expert represents the performance of that expert, a higher weight representing higher performance. At the start all weights are set to sum to one, and from there each expert may increase or decrease its value depending on its performance.

When an idle period starts, the expert with the highest probability in the probability vector r is selected, where r is calculated as:

r = w / Σ_{i=1}^{N} w_i        (2.25)

Each expert’s probability then satisfies 0 ≤ r_i ≤ 1 and the sum of all r_i is 1. The selected expert is called the operational expert and all others are dormant experts. After the selection, the device is controlled by the operational expert. When the idle period ends all experts are evaluated: the operational expert with its actual performance, and the dormant experts with how they would have performed had they been selected. This speeds up the learning process.

The evaluation of the experts is done taking into account both performance and energy savings. The loss vector l holds all the loss factors l_i, where each loss is calculated as in Equation 2.26. l_{ie} represents the loss in energy and l_{ip} the loss in performance, and the factor α (0 ≤ α ≤ 1) is the performance constraint, or the relative importance of energy savings vs. performance.

l_i = α · l_{ie} + (1 − α) · l_{ip}        (2.26)

In this implementation the energy loss l_{ie} depends on whether the device went to SLEEP mode: if T_{idle} < T_{BE} (defined at the beginning of Section 2.4.1) there was no SLEEP mode, so l_{ie} = 1. If the device went to SLEEP mode, l_{ie} is calculated as in Equation 2.27, where T_{sleep_i} is the sleep time of expert i greater than T_{BE}.

l_{ie} = 1 − T_{sleep_i} / T_{idle}        (2.27)

For the performance loss l_{ip} there is no implementation of predictive wakeup, so the device will wake up when a new request comes. Therefore, if the device went to SLEEP mode l_{ip} = 1, and if not, l_{ip} = 0. At the end of the algorithm the weight factors are updated as in Equation 2.28, where β is a constant between 0 and 1. This means that a higher loss l_i decreases the weight and a lower loss l_i increases it. Algorithm 1 shows the flow of the DPM implementation.

w_i ← w_i · β^{l_i}        (2.28)

Algorithm 1 Algorithm for expert based machine learning. Taken from [51]
1: β ∈ [0, 1]
2: w ∈ [0, 1]^N such that Σ_{i=1}^{N} w_i = 1
3: loop (for each T_idle):
4:    Choose expert with highest probability factor in r = w / Σ_{i=1}^{N} w_i
5:    Idle period starts → operational expert performs DPM
6:    Idle period ends → evaluate performance of experts
7:    Set new weight vector w to be: w_i ← w_i · β^{l_i}
8: end loop

This paper holds that the selection and evaluation of experts is done during the ACTIVE period, so no overhead is incurred when reaching the IDLE period. It is stated that the performance of the algorithm will be that of the best performing expert in the expert set, so care must be taken when selecting the range of experts. As stated before, the performance constraint is controlled by the α factor, in which a higher α will save more energy at the cost of greater performance penalties, whereas a low α will improve performance at a higher energy consumption.
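The following sketch (in C) illustrates the weight and probability bookkeeping of Equations 2.25–2.28 and Algorithm 1. It is not Dhiman and Rosing's implementation: the number of experts, the α and β values, and the losses fed in are all invented for illustration.

#include <math.h>
#include <stdio.h>

#define N_EXPERTS 4
#define ALPHA 0.5  /* energy-vs-performance constraint (hypothetical) */
#define BETA  0.8  /* weight decay constant (hypothetical)            */

static double w[N_EXPERTS] = { 0.25, 0.25, 0.25, 0.25 };

/* Equation 2.25: probability vector r = w / sum(w); returns the operational expert */
static int select_operational_expert(double r[N_EXPERTS])
{
    double sum = 0.0;
    int best = 0;
    for (int i = 0; i < N_EXPERTS; i++) sum += w[i];
    for (int i = 0; i < N_EXPERTS; i++) {
        r[i] = w[i] / sum;
        if (r[i] > r[best]) best = i;
    }
    return best;
}

/* Equations 2.26 and 2.28: combine energy/performance losses and decay the weights */
static void update_weights(const double le[N_EXPERTS], const double lp[N_EXPERTS])
{
    for (int i = 0; i < N_EXPERTS; i++) {
        double li = ALPHA * le[i] + (1.0 - ALPHA) * lp[i];
        w[i] *= pow(BETA, li);
    }
}

int main(void)
{
    double r[N_EXPERTS];
    printf("operational expert: %d\n", select_operational_expert(r));

    /* Invented energy and performance losses for one idle period */
    double le[N_EXPERTS] = { 1.0, 0.4, 0.7, 1.0 };
    double lp[N_EXPERTS] = { 0.0, 1.0, 1.0, 0.0 };
    update_weights(le, lp);

    printf("operational expert after update: %d\n", select_operational_expert(r));
    return 0;
}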

This algorithm was also implemented with DVFS policies, as well as with DPM and DVFS policies together; these are covered as part of the DVFS techniques in Section 2.4.2 and the interplay of DVFS and DPM in Section 2.4.3 respectively, both by Dhiman and Rosing.

2.4.1.4 Reinforcement Learning for DPM

Reinforcement Learning is a widely studied field of Machine Learning which has gained popularity because of its simplicity and its adaptiveness to the environment. This method is inspired by learning in nature, as the process works on a trial-and-error basis in an unknown and possibly ever-changing environment, accumulating experience from past history. The Run-time Manager (RTM) implementation explained in Chapter 3 is loosely derived from a Reinforcement Learning method called Q-Learning. A brief description of Reinforcement Learning and Q-Learning is given in Section 2.4.4.

Some implementations of Reinforcement Learning have been done in the field of DPM and DVFS [20, 24, 25, 29]. The reasons for this are the adaptiveness of the system under changing conditions of the environment, so it may adapt to non-stationary situations; as well as the simplicity of the implementation.

Tan et al. [20] show a DPM technique that finds the best policy for going to SLEEP states based on experience. The system environment is modelled as an SR, a Service Queue (SQ) and an SP, in which the Reinforcement Learning (RL) states are given by all three modules. Actions are executed by the Power Manager (PM) on the SP in the form of transitions to the power modes SLEEP and ACTIVE, and the reward function (in this case a cost function) is modified to take into account not only energy savings but also performance delay, using a Lagrangian multiplier λ as the energy-performance trade-off parameter.

The system may not be Markovian, but the RL algorithm is still implemented, at the cost of slower convergence. To speed up convergence the PM is designed to evaluate not only the visited state but also the states that could have been visited by the SP had the actions been different, similar to [51]. Another improvement for faster convergence is a variable learning rate α that is decreased over time.

The work in [25] continues the implementation of [20], in which the cost function energy-performance trade-off λ is tuned online based on a performance constraint. This performance constraint is fixed in the paper to a device latency of 5%, below which the algorithm is said to have converged. If the latency increases again (to over 10%), the λ parameter is further tuned.

Another RL implementation is presented in [24], in which a hardware architecture acts as the PM. In this case an algorithm named SARSA (State-Action-Reward-State-Action), similar to Q-Learning, is implemented. This approach is similar to the one in [51] in the sense that the decisions taken select which DPM technique is going to be used in the next idle time, similar to the experts of [51]. The policies (experts) implemented are Always On, Greedy, Timeout and Stochastic. Depending on the workload, the PM adjusts itself to use the policy that suits best. The size and power consumption of the PM are not documented, so it is not possible to compare it to the implementation done in this report.

2.4.1.5 Other Predictive Techniques

Another DPM technique developed in the literature is shown in [21], with an algorithm called Adaptive Hybrid Dynamic Power Management (AH-DPM), which predicts the duration of the idle time in both bursty and non-bursty periods. This technique adapts the timeout explained in Section 2.4.1.1. The PM then determines whether the device should be shut down and what its timeout period should be. Bursty and non-bursty periods are monitored using an exponential average over history. Results in this paper show that the power savings are greater and the performance penalties smaller than for the other algorithms analysed, such as stochastic [6], static and adaptive timeouts [6], predictive [49] and machine learning [51].

A different approach was taken by Gniady et al. [26] and Hwang et al. [23], in which prediction is done by looking at the Program Counter (PC) of the processor to identify device calls to particular peripherals. This is done by monitoring and analysing the patterns of access to I/O devices at runtime. By looking at PC patterns the PM may be able to predict when the I/O devices will be accessed, and in this way the device is woken up in advance so that performance penalties are minimal.

Following the predictive algorithm of Hwang and Wu [49] for exponential-average idle time prediction, explained in Section 2.4.1.1, some adjustments were made by Chen et al. [28]. This work addresses the issue of the distribution of idle periods changing sharply. As the conventional exponential average algorithm gives more importance to recent events (idle periods), if the next event changes drastically the algorithm will not respond accordingly. Therefore the proposal is to enhance Equation 2.24. As in [49], the idle time prediction is only used to determine whether I_{n+1} > T_{BE}, so that the device goes to SLEEP mode. As there is no predictive wakeup, every time the device goes to SLEEP there will be a performance penalty when returning to ACTIVE.

Remembering the notation of Hwang and Wu [49], let i_n be the last real idle time period, I_n the last predicted idle time period, I_{n+1} the predicted idle period, and a the attenuation factor. The enhanced Equation 2.29 introduces the b factor. This is included to optimise the algorithm by adjusting the predicted values to the changed distributions. Equation 2.30 shows how the b factor is calculated: as stated by Chen et al. [28], if the ratio of the two last idle periods i_n and i_{n−1} is within the interval [U_min, U_max], the change is mild, so no adjustment is made to the prediction. If the ratio is outside the interval [U_min, U_max], there was a large change between the two past idle periods, so the ratio is included as part of the prediction.

I_{n+1} = b · (a · i_n + (1 − a) · I_n)        (2.29)

b = 1                 if i_n / i_{n−1} ∈ [U_min, U_max]
b = i_n / i_{n−1}     if i_n / i_{n−1} ∉ [U_min, U_max],     where U_min < 1 < U_max        (2.30)

The algorithm performs notably better than that presented by Hwang and Wu [49], although it does not address the performance penalty at wakeup.
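A small sketch of the enhanced predictor of Equations 2.29 and 2.30 is given below (in C). The a, U_min and U_max values and the idle-time trace are illustrative and are not those used by Chen et al. [28].

#include <stdio.h>

#define A      0.5   /* attenuation factor (illustrative) */
#define U_MIN  0.8   /* lower ratio bound (illustrative)  */
#define U_MAX  1.2   /* upper ratio bound (illustrative)  */

/* Equations 2.29 and 2.30: EWMA prediction scaled by the b factor */
static double predict_next_idle(double i_n, double i_prev, double I_n)
{
    double ratio = i_n / i_prev;
    double b = (ratio >= U_MIN && ratio <= U_MAX) ? 1.0 : ratio;
    return b * (A * i_n + (1.0 - A) * I_n);
}

int main(void)
{
    /* Made-up idle-time trace in ms */
    double idle[] = { 20.0, 22.0, 80.0, 75.0 };
    double pred = 20.0;

    for (int n = 1; n < 4; n++) {
        pred = predict_next_idle(idle[n], idle[n - 1], pred);
        printf("i_n = %5.1f ms -> I_{n+1} = %6.1f ms\n", idle[n], pred);
    }
    return 0;
}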

It is important to note that the techniques explained here are implemented mostly on HDDs [20, 21, 23, 26, 51] and network cards (WLAN NICs), whose behaviour may not represent a processor's in terms of workload input or transition times, but the principles behind these adaptive, predictive and stochastic techniques can be applied to other devices.

2.4.2 DVFS

2.4.2.1 Workload Detection

To select an appropriate DVFS setting without penalising performance it is fundamental to know what the workload is, regardless of how it is represented. The more information there is describing the workload, the better the decisions that can be taken on the V-F setting. Dhiman and Rosing [45] divide DVFS techniques into three areas. 1) Some proposals assume that the deadlines, task arrival times and workload are known in advance. 2) Other techniques need some sort of compiler support, in which sections of the program/code/task can be characterised offline, so that when section x of the code is reached a particular V-F setting is selected. 3) No information from the software level is known in advance, so the PM needs to adapt to the incoming workloads and model the task behaviour. In addition to these techniques, an important area of development is the use of machine learning not only to adapt but to predict what the workloads and deadlines will be in the future.

2.4.2.2 Online Learning

Dhiman and Rosing [45] use the same algorithm as the one proposed for DPM in Section 2.4.1.3, but this time for DVFS. In this case, the V-F setting is selected based on the workload of the processor. Performance counters are used to determine what the workload is, where a high workload means that the instructions are focused more on CPU operations, and a low workload is oriented more towards memory access operations, so the processor could run at a lower frequency. The XScale PXA270 processor is used in this paper, which allows 4 different performance counters to be read simultaneously. Based on these performance counters, the variable µ is calculated, representing the workload percentage (0 ≤ µ ≤ 1), where µ → 0 means more memory-oriented instructions and µ → 1 means more CPU-intensive instructions.

Each V-F setting represents an expert (recalling Section 2.4.1.3), and each expert is bound to a particular µ range, so a higher µ would require the device to be at a higher frequency. The algorithm adapts by penalising experts that incur performance and energy losses, updating the weight and probability vectors to select the expert that may perform better for the given workload. As in the implementation by Dhiman and Rosing [51], the factor α is defined by the user, giving more importance to either energy saving or performance.
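As a simple illustration of binding experts to µ ranges, the sketch below (in C) maps the workload percentage onto one of a small set of V-F experts. The number of settings, the frequencies and the equal-width ranges are invented for illustration and are not those of the PXA270 or of [45].

#include <stdio.h>

#define N_VF 4

/* Hypothetical V-F experts in MHz, sorted ascending */
static const unsigned vf_mhz[N_VF] = { 208, 312, 416, 624 };

/* Bind each expert to an equal-width range of the workload percentage mu:
 * a more CPU-intensive workload (higher mu) maps to a higher frequency. */
static int expert_for_mu(double mu)
{
    int i = (int)(mu * N_VF);
    return (i >= N_VF) ? N_VF - 1 : i;
}

int main(void)
{
    double samples[] = { 0.15, 0.55, 0.95 }; /* example mu values (made up) */
    for (int k = 0; k < 3; k++) {
        int e = expert_for_mu(samples[k]);
        printf("mu = %.2f -> expert %d (%u MHz)\n", samples[k], e, vf_mhz[e]);
    }
    return 0;
}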

One of the most important contributions of this paper is that it quantises workload as a single percentage number by combining different performance counters, which can help simplify task characterisations in future work.

An important note by Dhiman and Rosing [45] is that the performance and energy losses shown in the results are relative to the workload and the selection of the expert, and may not map directly to real performance and energy losses; an energy loss affecting the cost function of the implementation does not represent real energy lost, but an estimation based on the assumptions of the energy model.

2.4.2.3 Offline Learning

Another approach, from Moeng and Melhem [7], uses offline training and characterisation of the workload. V-F settings are tuned offline: performance metrics are monitored, and goal metrics, in this case the energy per (user-instruction)² (epui²), are measured. The workload is characterised and mapped, so that a particular combination of metrics is mapped to a V-F setting. Performance metrics are monitored at runtime, and for a particular combination of performance metrics the V-F setting with the lowest epui² is selected.

Several performance counters can be found in the processor and are available for analysis. From these counters the performance metrics are derived. In the work of Moeng and Melhem [7] the different performance counters (and combinations of them) are analysed to find a correlation between them and the epui², and the three most significant ones are selected as performance metrics.

Machine Learning is used to create a Decision Tree, because enumerating all permutations of the metrics would be very large and impractical at run-time. The Decision Tree is created by extracting features that can represent the whole range of possible metrics in just a few nodes (leaves of the tree). The end result is a lookup table (LUT) in which the addresses are generated by combinations of the metrics, so that a particular combination of the metrics has the V-F setting at which epui² is the smallest. Figure 2.7 shows an example of a Decision Tree in which 3 metrics (as selected by Moeng and Melhem [7]) are mapped to a particular frequency (which would correspond to a V-F setting).

Figure 2.7: Decision Tree example for selecting V-F settings, in which three metrics (m1, m2, m3) are tested at the internal nodes and the leaf nodes give frequencies between 0.6 GHz and 1.4 GHz. Taken from [7].

This paper’s approach makes the run-time computation relatively simple, and as the implementation was done in hardware, the overhead is not significant. The downside is that the efficiency of the savings and the performance depend on the workloads used for the offline training, so if a particular type of workload was not present at the time, or if the granularity of the mapping of metrics to V-F settings was not optimal, the solution will not be optimal. It is mentioned in the reference that for unknown workloads the implementation should still work thanks to the metrics classification, but there is no guarantee of this.
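The sketch below (in C) mirrors the kind of lookup that a decision tree such as the one in Figure 2.7 reduces to at run-time: the sampled metrics are quantised into an index into a frequency LUT. The thresholds and the frequency table are invented for illustration and are not those derived by Moeng and Melhem [7].

#include <stdio.h>

/* Hypothetical quantisation of three runtime metrics into one LUT index */
static unsigned lut_index(double m1, double m2, double m3)
{
    unsigned b0 = m1 > 3.0;   /* thresholds are illustrative only */
    unsigned b1 = m2 > 1.5;
    unsigned b2 = m3 > 1.0;
    return (b2 << 2) | (b1 << 1) | b0;
}

int main(void)
{
    /* One frequency entry per metric combination, in MHz (invented values) */
    static const unsigned freq_lut[8] = {
        600, 800, 800, 1000, 1000, 1200, 1200, 1400
    };

    double m1 = 4.2, m2 = 1.1, m3 = 0.7; /* sampled metrics (made up) */
    printf("selected frequency: %u MHz\n", freq_lut[lut_index(m1, m2, m3)]);
    return 0;
}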

Another important contribution of this paper is that the design is aimed at multicore processors, and by doing the training offline the V-F selection remains simple regardless of the number of cores.

AbouGhazaleh et al. [52] present a supervised learning approach to generate a DVFS policy based on workload. This implementation not only sets the frequency of the processor (CPU) but of the memory as well. A special-purpose compiler (called PACSL) is proposed, which runs sample applications and generates a policy to map workload states to V-F settings. The workload is characterised by 5 parameters, so in a similar approach to Moeng and Melhem [7], decisions on a new DVFS setting are taken when certain conditions are present in the parameters. Also similar to Moeng and Melhem [7], the learning is carried out in two phases:

1. The training to obtain the performance counters’ measurements and identify what they call “states” of the system (combination of different performance counters).

2. Develop rules (like the Decision Tree [7]) to map the states to V-F settings, so that the Energy-Delay product is the least.

2.4.2.4 Workload Prediction

The works in [53–55] implement a “neuromorphic controller” that predicts the workload percentage. The algorithm was implemented in hardware, and the function performed by the neuromorphic controller is an exponential weighted moving average. It is unclear how the references quantise performance loss, delay, or performance hit, i.e. how they know the selected V-F setting is the appropriate one for the particular workload. The performance loss mentioned is due to a misprediction of the workload, taking into account the transition time from one V-F setting to another; in the case of a misprediction the V-F setting would have to be changed twice (once for the predicted workload and once to reach the real workload).

Sinha and Chandrakasan [56] proposed a DVFS (DVS in the paper) scheme that predicts workloads using different adaptive filters. The workload is characterised and predicted by updating the coefficients of a series of adaptive filters. The ones implemented are Moving Average Workload, Exponential Weighted Averaging (EWMA), Least Mean Squares and Expected Workload State. It is mentioned in the paper that the filter with the lowest prediction RMS error is the Least Mean Squares filter using 3 taps (past workload states).

2.4.2.5 Deadline Prediction

One of the most important ways of quantising performance loss is to compare the execution time of a particular load to a timing reference. If the execution time is larger than the timing reference, or deadline, it can be said that there was a performance loss. This deadline can either be assigned explicitly by the OS, in the case of an RTOS as explained in Section 2.3.1, or created synthetically by using information from the system, such as the frame rate in a video player or the arrival of a request for a new workload.

There are cases in the literature where deadlines are created synthetically so that the system can be treated as a soft RTOS, for which the term “effective deadline” is defined [57, 58]. For applications without an explicit deadline, it is required that the workload is completed in an acceptable amount of time. When a new workload arrives the system assumes that the previous workload has already been finished. If this is not the case, workload requests accumulate and a performance loss is observed. Therefore, to ensure this accumulation does not happen, the effective deadline is set to be the arrival of the next workload. By knowing this effective deadline the PM can select the lowest V-F setting that still meets the deadline. As sometimes the arrival of the workload is not known in advance, the effective deadline is unknown; some algorithms in the literature intend to predict this arrival in order to know the remaining time available for processing the workload.
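As a concrete illustration of this idea, the sketch below (in C) picks the lowest V-F setting whose execution time for a predicted workload still fits before the effective deadline. The frequency table, the predicted cycle count and the deadline are invented for illustration only.

#include <stdio.h>

#define N_VF 4

/* Available frequencies in MHz, sorted ascending (hypothetical table) */
static const double freq_mhz[N_VF] = { 300, 600, 800, 1000 };

/* Lowest frequency that finishes `cycles` before `deadline_ms`,
 * falling back to the highest frequency if none can meet it. */
static int pick_vf(double cycles, double deadline_ms)
{
    for (int i = 0; i < N_VF; i++) {
        double exec_ms = cycles / (freq_mhz[i] * 1e3); /* MHz = 1e3 cycles per ms */
        if (exec_ms <= deadline_ms)
            return i;
    }
    return N_VF - 1;
}

int main(void)
{
    double predicted_cycles = 12e6;   /* predicted workload (made up)            */
    double effective_deadline = 33.3; /* e.g. time to the next frame arrival, ms */

    int i = pick_vf(predicted_cycles, effective_deadline);
    printf("selected V-F setting: %.0f MHz\n", freq_mhz[i]);
    return 0;
}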

2.4.2.6 Application-specific DVFS

Multimedia applications present an attractive field for Power Management for two reasons: first, workloads have very similar characteristics throughout the whole of the multimedia processing; and second, video/audio/games are required to sustain a particular frame rate. A frame rate lower than the expected one will result in a performance loss noticeable to the user.

Ge and Qiu [29] present an interesting approach to managing multimedia applications. In this case, instead of Power Management the focus is on what is called Dynamic Thermal Management, in which the temperature throughout the chip is monitored and fed to the Thermal Manager for it to make decisions. The Q-Learning algorithm (Section 2.4.4) is implemented in this paper, in which the Q-Learning states are given by the current workload plus the temperature of the chip, and the possible actions to take are the V-F settings.

The paper by Gu and Chakraborty [36] shows a control-based approach aimed at videogames. They implemented a PID (Proportional-Integral-Derivative) controller which, at each frame of the videogame, decides which V-F setting will be best for the predicted workload. As the frame rate is a major performance constraint, if the workload is predicted accurately it is possible to know exactly which V-F setting will work best without a performance loss. The computation cost of the PID is negligible compared to the time per frame. The coefficients of the PID are fixed, but there are some tunable parameters for the controller.

Also in the area of application-specific DVFS, the work of Choi et al. [34] has focused on prediction techniques for energy optimisation. They proposed an energy saving technique aimed towards MPEG video decoding using DVFS. The technique is based on three factors:

• Each MPEG video frame is divided into Frame-Independent (FI) and Frame-Dependent (FD) workloads. The FI part is constant for each frame type, whereas the FD part may be variable.

• The FD part is predicted using the EWMA algorithm. After the FD workload has been predicted, the V-F setting for that frame is selected accordingly.

• The use of intra-frame compensation: errors in the prediction of the FD workload are compensated with the FI part, so if the FD was “over-predicted” and its workload was finished earlier, the V-F setting for FI is decreased to save energy whilst still meeting the deadline. On the contrary, if FD was “under-predicted”, the V-F setting for the FI part is increased to still meet the deadline.

This is an interesting application-specific approach. The EWMA results provide low prediction errors, as the prediction was done per frame type. It is important to mention that, for this approach to be applied to a real-world situation, it may be necessary to analyse the DVFS costs in terms of timing overheads and energy consumption. Also, each DVFS change implies a context change from user level to kernel level, which generates an extra timing overhead. The video decoding was run at 2 frames per second (fps), so it may not be realistic in terms of timing.

Bang et al. [35] present another application-specific DVFS algorithm, a follow-up to Choi et al. [34]. Their approach also performs workload prediction per frame type, using Kalman Filters as the prediction algorithm. This is an interesting approach because uncertainty is included as part of the prediction. The results show higher prediction accuracy than control algorithms (PID controllers), but the energy savings are not significant.

2.4.2.7 General Purpose DVFS

Current popular OSs have power management systems to increase battery life. Some of the techniques are static, in the sense that they reduce power consumption at the expense of compromising performance; an example of this is Windows’ ‘Power Saver’ mode. In Linux the Power Managers are called Governors. These Governors are Linux Kernel Modules [59–61] which operate in kernel space. The static governors are set to a fixed V-F: for example, the ‘powersave’ governor is fixed to the lowest V-F available, whereas the ‘performance’ governor is set to the maximum V-F. The ‘userspace’ governor allows any available frequency to be selected. Dynamic governors, on the other hand, control the V-F settings at run-time, varying them to adjust for the load. The most common dynamic governors are ‘ondemand’ and ‘conservative’ [17]. The principle behind these governors is to reduce energy consumption within a performance constraint, the constraint being a minimum idle time percentage. The governor checks the workload percentage at periodic intervals. If the idle time falls below the minimum threshold, a high workload has arrived, so the governor raises the V-F setting accordingly. After the change, the governor gradually decreases the V-F until it reaches the minimum idle time again (a simplified sketch of this reactive loop is given after the list below). This is a clever method for reducing energy consumption, although its two main drawbacks are:

• The ondemand and conservative algorithms are not predictive but reactive, so they assume that the next interval will have the same workload as the current interval. The governors therefore also depend on how frequently the workload percentage is sampled: if sampling is too infrequent the system may waste too much energy (running at high frequencies) or it may provide low performance; if sampling is too frequent the system may change V-F too often, possibly increasing the DVFS overhead.

• The governors are oblivious of the applications running, and the only constraint is the measured idle time, so even though energy may be saved, this does not mean that the application is delivering the desired user experience.
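A minimal sketch of the reactive loop described above is given here in plain C. It is not the actual ondemand implementation: the thresholds, the frequency table and the sampled load trace are invented, and the real governor works on per-CPU load statistics inside the kernel.

#include <stdio.h>

#define N_FREQ        5
#define UP_THRESHOLD  80.0   /* load percentage above which we jump to max (illustrative) */
#define DOWN_MARGIN   30.0   /* load percentage below which we step down (illustrative)   */

static const unsigned freq_mhz[N_FREQ] = { 300, 600, 800, 1000, 1200 };

/* One sampling-interval step of an ondemand-like reactive policy:
 * jump to the maximum frequency on high load, otherwise step down slowly. */
static int governor_step(int cur, double load_pct)
{
    if (load_pct > UP_THRESHOLD)
        return N_FREQ - 1;
    if (load_pct < DOWN_MARGIN && cur > 0)
        return cur - 1;
    return cur;
}

int main(void)
{
    double load_trace[] = { 20, 90, 85, 40, 10, 10 }; /* sampled load percentages (made up) */
    int cur = 0;

    for (int t = 0; t < 6; t++) {
        cur = governor_step(cur, load_trace[t]);
        printf("t=%d load=%.0f%% -> %u MHz\n", t, load_trace[t], freq_mhz[cur]);
    }
    return 0;
}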

2.4.3 The Interplay of DPM and DVFS

It is argued that both DPM and DVFS techniques can work together complementing each other to obtain the maximum power reduction with a performance constraint [22, 62].

Dhiman and Rosing [22] propose to merge their DPM [51] and DVFS [45] online learning algorithms to obtain an interplay of both. Experts for both DPM and DVFS are implemented and selected based on the weight vector mentioned in Section 2.4.1.3 and Section 2.4.2.2.

Their experiments show that for idle-dominated workloads the DPM policies prove to be more effective, whereas for computationally intensive workloads DVFS expert selection proves to be the best choice. Therefore a much larger range of workloads can be handled. As with the separate implementations of DPM and DVFS, the performance is constrained by the α factor, in which a higher α produces higher savings but higher delay, and a low α produces higher performance at the cost of lower energy savings.

Bhatti et al. [62] show a similar approach (in fact based on Dhiman and Rosing [22]) with a weight vector, but aimed at Real-Time Systems. Their implementation, called HyPowMan, uses a very similar algorithm to the one by Dhiman and Rosing [22], with the main difference that the experts are aimed at Real-Time Systems, so the DVFS experts are scheduled with deadline-oriented policies. Tasks are profiled with 4 elements: release time, Worst-Case Execution Time (WCET), relative deadline, and periodicity of the task. These elements help the algorithm decide which combination of DPM and DVFS experts to select.

2.4.4 Reinforcement Learning

Reinforcement Learning (RL) sits on a spectrum between Supervised and Unsupervised Learning, in which feedback is somewhat informative. The principle of RL is the ability of an agent (controller) to make decisions that affect its environment and, from that, to obtain useful information and feedback so that it can learn to make better decisions in the future. It is inspired by human and animal learning [8], in which one learns from one’s mistakes. A simple diagram of RL is shown in Figure 2.8. Some works on RL are those by Busoniu et al. [8], Sutton and Barto [63], Sutton [64], Ribeiro [65], and Szepesvári [66].

Figure 2.8: Reinforcement Learning simple diagram, in which the controller sends an action to the process, observes the resulting state, and receives a reward from the reward function. Taken from [8]

As RL is centred on making decisions, it concerns the control of a process in an environment by finding the optimal behaviour. The elements in RL are:

• The Agent

• State Space S

• Action Space A

• A reward function R : S × A → R

The objective of RL methods is to maximise the long-term accumulated reward. Therefore a policy π, which maps the states to actions, must be learned based on experience. By states it is meant that the environment is in a particular configuration or has particular identifiable properties. The actions represent the possible decisions that could be taken when in a particular state. The reward function evaluates how good or bad the action taken by the agent was.

RL is designed to be implemented in a model-free environment, so it must adapt without previous knowledge of that environment. In order to reduce the complexity of the algorithms, the state space is designed to be discrete, finite and relatively small, although there are cases where function approximation is needed because the state space is very large (or infinite).

Symbol      Description
s and s'    Current State and Next State
a and a'    Current Action and Next Action
Q(s, a)     Q-Value at State s and Action a
α           Learning Rate
r           Reward
γ           Discount Factor

Table 2.2: Symbols of the Q-Learning algorithm

Another approach (actually used in the implementation in this report) is to map a large state space into a small set of discrete states.

One of the most popular algorithms for RL is Q-Learning, which aims to find the optimal policy based on previous rewards. The Q-Learning algorithm is shown in Algorithm 2, and the symbols used are listed in Table 2.2. An important requirement for the convergence of the Q-Learning algorithm is the Markov Property, which states that the state at t + 1 does not depend on the past states at t − 1, t − 2, t − 3, . . . but only on the immediate state (and action) at t. This memoryless property allows the algorithm to be compact. In any case, Q-Learning may be implemented in systems that are not strictly Markovian, but are assumed to be.

The Q-Learning algorithm works by observing the environment (obtaining s), taking an action a and reaching another state. The new state gives a positive or negative reward depending on whether the action taken was good or bad. The “learning memory” is kept in a Q-Table which contains the Q-Values of each state-action pair. The Q-Value Q^π(s, a) represents the long-term expected reward if the policy π is followed after action a is taken in state s.

Algorithm 2 Q-Learning algorithm. Taken from [63]
1: initialise Q(s, a) arbitrarily
2: loop (for each episode):
3:    initialise s
4:    repeat (for each step of the episode):
5:       choose a from s using the policy derived from Q (e.g. ε-greedy)
6:       take action a, observe r, s'
7:       Q(s, a) ← Q(s, a) + α[r + γ max_{a'} Q(s', a') − Q(s, a)]
8:       s ← s'
9:    until s is terminal
10: end loop

At the start the Q-Table should be initialised. If previous knowledge of the system exists (in the form of an already filled Q-Table), the working Q-Table can be initialised to the known values; otherwise it is initialised to arbitrary values. At every episode (as named in Algorithm 2) the state s is identified and an action a is taken following the policy constructed from the Q-Table. For taking the action there are several options to be followed; a simple one is called ε-greedy, shown in Algorithm 3. As there is no previous knowledge of the system at the beginning, the decisions should be exploratory. When the agent is confident in its knowledge of the system, the decisions can be taken towards exploiting the action a with the highest Q-Value.

After the action has been taken, the environment is observed to obtain the reward r, which shows how good or bad the action taken was. The Q-Value of the action a taken at state s is updated using the equation in line 7 of Algorithm 2. In this equation, the learning rate α (0 ≤ α ≤ 1) determines how the discounted reward affects the current value of Q(s, a). The discount factor γ (0 ≤ γ ≤ 1) introduces a pseudo-horizon for the policy to follow.

Convergence of the algorithm is proven in Busoniu et al. [8] and Sutton and Barto [63], provided that the value iteration occurs an infinite number of times.

Algorithm 3 ε-greedy strategy. Based on [8]
1: function Choose a(ε, s)
2:    generate random x
3:    if x < ε then            ▷ Exploitation
4:       a ← arg max_a Q(s, a)
5:    else                     ▷ Exploration
6:       a ← random action
7:    end if
8:    if ε < 1 then
9:       increment ε
10:   end if
      return a, ε
11: end function
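The following sketch (in C) combines the update rule of Algorithm 2 with the ε-greedy selection of Algorithm 3, keeping the thesis's convention that ε grows towards 1 so that exploitation becomes more likely over time. The state and action space sizes, the α, γ and ε values, and the single transition used are illustrative only.

#include <stdio.h>
#include <stdlib.h>

#define N_STATES  4
#define N_ACTIONS 3
#define ALPHA     0.1   /* learning rate (illustrative)   */
#define GAMMA     0.9   /* discount factor (illustrative) */

static double Q[N_STATES][N_ACTIONS];

/* Action with the highest Q-Value in state s */
static int argmax_action(int s)
{
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[s][a] > Q[s][best]) best = a;
    return best;
}

/* epsilon-greedy as in Algorithm 3: exploit with probability epsilon,
 * explore otherwise, and let epsilon grow towards 1 over time */
static int choose_action(int s, double *epsilon)
{
    double x = (double)rand() / RAND_MAX;
    int a = (x < *epsilon) ? argmax_action(s) : rand() % N_ACTIONS;
    if (*epsilon < 1.0) *epsilon += 0.01;
    return a;
}

/* One Q-Learning update (Algorithm 2, line 7) */
static void q_update(int s, int a, double r, int s_next)
{
    double target = r + GAMMA * Q[s_next][argmax_action(s_next)];
    Q[s][a] += ALPHA * (target - Q[s][a]);
}

int main(void)
{
    double epsilon = 0.1;

    /* One illustrative interaction step with a made-up reward and transition */
    int s = 0;
    int a = choose_action(s, &epsilon);
    double r = -0.5;   /* reward observed from the environment (invented) */
    int s_next = 2;    /* next state observed (invented)                  */
    q_update(s, a, r, s_next);

    printf("Q[%d][%d] = %.3f, epsilon = %.2f\n", s, a, Q[s][a], epsilon);
    return 0;
}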

2.5 Discussion

High performance embedded systems have been presented, where one of the main concerns is energy consumption. Of the several power management areas available, such as peripheral, memory, and processor power management, the latter has been selected as the topic of study. For this, the cross-layer approach was introduced, where it was proposed that power managers are composed of three aspects: the low power techniques available, the requirements for performance, and the intelligent control of the low power techniques based on the requirements and observations of the environment.

RTSs were presented, which provide a requirement to the system for finishing the tasks in a given time. The Soft Real-Time (RT) behaviour presents an attractive paradigm to be used in modern devices running GPOSs. Nowadays there are implicit performance expectations from a device, and the Soft RT approach provides a quantifiable method to deliver that performance. Energy adjustments can be done knowing that a deadline needs to be met. Therefore, the requirements for energy optimisation and performance could be provided by the application to the Power Manager, and it could adopt a Soft RT behaviour.

An exhaustive review of the state-of-the-art power management algorithms has been done. In terms of DPM using PG, the approaches presented include the use of predictive techniques such as EWMA, stochastic techniques using Markov chains, machine learning with expert-based learning, and Reinforcement Learning. The main concern observed with PG and DPM is that in reality it may not be feasible to offer real-time performance using DPM; for long periods of idle time, however, DPM may be the best solution.

Using DVFS for power management has also been presented. The six types of DVFS control techniques analysed are:

Offline Learning The advantage of this technique is the task profiling using performance counters. The main drawback of this approach is the fact that it is offline profiling, so the algorithm does not adapt to new situations. Also, it is oblivious of the applications running, as the profiles are based solely on the performance counter metrics.

Online Learning The main advantage is the adaptiveness to the arriving workloads. The main drawback, as with its DPM counterpart, is the reliance on the energy-performance trade-off parameter, which does not ensure real-time performance. There was a combined DVFS-DPM version of the expert-based algorithm, with the same advantages and drawbacks.

Workload Prediction These algorithms raise an optimisation question: how frequently should the workload be sampled. Similar to online learning, the energy savings were determined by the energy-performance parameter.

Deadline Prediction The approaches taken here treat the system as Soft RT, where the algorithm aims to provide real-time performance by creating “effective deadlines”. Even though the aim was to offer real-time performance, the algorithm had no knowledge of the application, and therefore the performance for the applications could not be quantised.

Application-specific DVFS These approaches have provided substantial energy savings with acceptable performance. They are application-specific, so there is no flexibility for applying the algorithm to different applications.

General Purpose DVFS The Linux Governors were analysed. Their main limitation, as mentioned before, is that there is no contact with the applications, so there is no knowledge of the application performance.

After observing the different proposals, it is clear that there are limitations in the current control algorithms, and there is therefore the need for a new approach to Power Management. One of the main limitations seen is the lack of communication with the applications, so a cross-layer approach is key to creating synergy between application, OS and hardware. Another key element observed is that, in order to provide a decent user experience, it is necessary for a Power Manager to quantise its performance against the application. This can be achieved by the aforementioned Soft RT approach. Given that offline profiling may not be feasible for multiple applications, it may be necessary to create an application model at runtime, so prediction and decision algorithms are required: prediction to know a priori what kind of workload is going to be processed, and decision to know which V-F setting to take, even in the presence of noise and uncertainty. An overview of Reinforcement Learning was presented as an approach for learning to make decisions at runtime. Finally, the main limitation of the application-specific algorithms can be fixed by enabling the runtime manager to work with different applications, allowing them to provide their requirements to the manager. This may be done by creating a common programming framework where developers are able to write their applications to be power-aware.

Chapter 3 then shows the theoretical proposal for a runtime manager to cope with the limitations of current approaches. Chapter 4 presents the implementation of said runtime manager, and Chapter 5 lists the results obtained from experimenting with the proposed design.

Chapter 3

System-Level Power Management

3.1 Run-Time Management

Computing systems nowadays can be represented as a system of three layers, namely the Applications, the Operating System (OS) and the Hardware. The purpose of the system is to provide functionality to the user by means of running the applications, e.g. document writing, sending e-mail, watching a video. The OS is responsible for managing the runtime of the applications: scheduling them, allocating running time to each of them when multiple applications run simultaneously, allocating resources to them, etc. It is also responsible for managing the hardware resources for processing the applications, for communication, and for control of the hardware peripherals. This includes the control of the power management hardware available, enabling/disabling the different power levels and sleep states.

The control algorithms for power management can reside at any of the three layers. The power management algorithms can be implemented as hardware modules that trigger the power states[24, 53, 54]. The OSs provide interfaces to add extra functionality to them and control the power states at the OS level e.g. the power governors in Linux[17, 18]. The algorithms are also implemented as small applications that can request power state changes to the OS, which reduces the complexity of implementation[27, 34, 35].

Independent of the implementation layer, the power management algorithm is called the Run-time Manager (RTM). The RTM is responsible for making energy-efficient decisions based on the constraints it has, whether the priority is to increase performance, reduce power consumption, or reduce temperature. The constraint will be called the Requirement (Section 3.1.1). The decisions are executed by using the low power techniques, namely Power Gating (PG) and Dynamic Voltage and Frequency Scaling (DVFS). These techniques are called Power Knobs (Section 3.1.2). The decisions for using the Power Knobs are defined as Control (Section 3.1.3). In order to make efficient energy decisions it is necessary to observe the environment to measure the performance yielded, the energy consumed, the battery life, or the temperature increase. These observations are called the Monitor Feedback (Section 3.1.3.1). The different elements of the RTM need to communicate for the power management to work; therefore the RTM uses a cross-layer integration. A graphical description of the cross-layer RTM is shown in Figure 3.1 as an example generic diagram. The Requirements are sent to the RTM. The Power Knobs control the power states of the Central Processing Unit (CPU) by order of the RTM sending Voltage and Frequency settings. The Monitor Feedback is provided by the Performance Monitors.

The work presented addresses the need for a cross-layered RTM that optimises energy consumption for an application while aiming to achieve the performance requirements given by that application. Therefore, the applications targeted by the RTM must have a particular quality: an inherent performance requirement. The Power Knob studied in this research is DVFS. Given that DVFS controls the speed at which the workload clock cycles are executed, the performance requirement from the application must come in the form of a timing requirement. The next three subsections explain the three main components of the RTM: Requirements (Section 3.1.1), Power Knobs (Section 3.1.2) and Control (Section 3.1.3).

Figure 3.1: Generic cross-layer RTM, where the arrows show the communication between the layers.

3.1.1 Requirements

The aim of this research is to provide higher energy efficiency to platforms running General Purpose Operating Systems (GPOSs). Hard Real-Time Operating Systems (RTOSs), as explained in Section 2.3, which exhibit deterministic timing behaviour and have hard deadlines, are out of the scope of this work. The RTM is then designed for a cross-layered GPOS environment, analysing the non-deterministic timing nature of these OSs. Examples of this environment are Linux, Windows and Android. The aim of this work is to give GPOSs a Soft Real-Time (RT) behaviour. In GPOSs the applications are not explicitly performance constrained as they are in an RTOS, therefore no effort is made by the OS scheduler to achieve a specific performance for the applications.

Applications running in an RTOS provide a performance requirement in the form of deadlines for the workloads to be executed, therefore allowing the RTM to obtain energy savings by reducing the performance of the workload whilst achieving the deadline. The RTM proposed is aimed to sit in the GPOS spectrum but to provide an RTOS-like behaviour. This means that the target applications must have a performance requirement, not necessarily explicit, for it to be the optimisation target. A not-necessarily-explicit performance requirement means that the scheduler is not notified of the requirement. The timing requirement can be seen as the maximum time allowed for the workload to be finished. The unit of workload is then defined as a frame. Therefore, the performance requirement is defined as the number of workload frames that need to be executed per second, or frames-per-second.

The other major component of a Real-Time System (RTS) is the Worst-Case Execution Time (WCET), which allows the scheduler to decide how to run the task. In a Hard RTOS the WCET must be provided, whereas in GPOSs it is not given. This means that in a GPOS the WCET of the workload to be processed needs to be calculated. The control algorithm is then responsible for this calculation, given that the application can provide annotations to help make this calculation more precise.

The environment for the RTM is then a system in which the workload for an application is divided in frames, and this application has an implicit requirement for the frames to be executed in a specific time, defined as a deadline. These frames can be of different workload sizes, i.e. different amounts of CPU cycles per frame. An example of an application working in this environment is video decoding. In video decoding, the video frames represent the workload. The implicit performance requirement is the frames-per-second (fps) at which the video must run, which is usually contained in the video file itself. The deadline can be calculated as the reciprocal of fps, i.e. deadline = 1/fps, as the time allowed for each frame. Also, in this environment resources are limited, including memory. It is assumed that frames are processed at the fps speed, i.e. if the frame is finished processing before the deadline, the system is idle until the deadline is reached, when a new frame is brought to be processed. Therefore, idle time cannot be accumulated. The reason behind this is that in the systems targeted there is not enough memory to use as a processing buffer, so only one frame can be processed and buffered at a time.

In this environment, the power management knob available is DVFS, so no PG is taken into consideration as part of the RTM. The race-to-idle[32] approach, in which the strategy is to process the workload as fast as possible and reduce energy consumption during the idle time, cannot be applied to this environment.

The cross-layer RTM allows applications to send their requirements (or constraints) to it, which then adjusts its behaviour to achieve those constraints. The constraints analysed in this research are time based, meaning that the RTM must achieve the highest energy savings whilst providing a performance measured in units per second. As the RTM works in a non-deterministic timing environment, it is impossible to ensure Hard RT behaviour, as some deadlines could be missed. The objective is then to achieve Soft RT performance. The main difference between Hard and Soft RT performance is the criticality of not achieving the desired performance when executing the tasks. Soft RT is addressed then, because the task workload is not known a priori. Therefore, without offline profiling of the tasks, the task completion time cannot be known before execution. The RTM must then attempt to finish the workloads before their deadlines by collecting information at run time, but some may be missed.

Efficient RTMs need requirements in order to have a target to aim for, whether it is to reduce energy consumption, control temperature or provide an amount of performance. The goal of this work is to provide energy savings whilst achieving Soft RT performance on GPOSs. Soft RT behaviour means completing tasks before reaching their given deadlines, and where missing a deadline is not critical but seen as a performance penalty. These performance penalties (or lack thereof) have an impact in user experience.

The intention is then to target applications that have an implicit minimum performance requirement. It is implicit because the requirement is not given to the OS and therefore is not considered when scheduling the application. A distinction is made between general purpose applications and Soft RT applications. The former do not have implicit requirements, so lower performance of these applications does not have a significant impact on user experience; they include text processors, email clients, etc. Soft RT applications have an implicit constraint, whether this is frame rate or throughput (bits per second). Examples include video processing, rendering, video games, music players, video chat, etc. Watching a video at a lower frame rate than intended can have a negative impact on user experience.

For the RTM to provide Soft RT behaviour, the implicit performance requirements need to be passed to it. These performance requirements are also called annotations, so the application requires extra code to include these annotations. The RTM needs these implicit requirements and annotations to be made explicit so it can use them as performance targets.

3.1.1.1 Application adaptation for use with RTM

The RTM requires the target applications to provide annotations to it in order to have an optimisation target, as well as to provide information regarding the structure of the application. The former represents the performance requirement in fps. The latter means that the RTM needs to be notified when a new frame of the application is to be executed. The application then needs to be divided into frames. In order to improve energy optimisation even further, the new frame notification should include a type in order to classify frames. This is further explained in Section 3.2.

For the sake of example a video decoder is presented as the target application. The video decoder opens a video file, which contains the fps embedded in the video header. The application code needs to be modified to send the fps requirement to the RTM as soon as it has been read from the file. The RTM now has its optimisation target. The video decoder starts playing the video, decoding each frame to be displayed. The video decoder code needs to be modified to send the frame start notification to the RTM as soon as it is reached in the application. The RTM notification from the application includes the frame type. After all frames have been decoded, the application finishes execution. It sends a notification to the RTM of the end of the application.

In order to simplify the adaptation of the application code to include the annotations, an Application Programming Interface (API) was designed. This API consists of C functions that are inserted inside the code at design time. Section 4.5 provides a description of the implementation of the API framework.
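To make this concrete, the sketch below shows how a frame-based application might call the API. The four calls are those listed later in Figure 4.1; everything else (the prototypes, the fixed 24 fps requirement, the dummy frame loop and type pattern) is purely illustrative and not taken from the thesis.

/* Hypothetical sketch of an application annotated for Shepherd.
 * Only configure_governor/start_governor/new_frame/stop_governor
 * come from Figure 4.1; the rest is illustrative. */
void configure_governor(int fps, int num_workload_types);
void start_governor(void);
void new_frame(int workload_type);
void stop_governor(void);

static void process_frame(int type) { /* decode/render one frame here */ }

int main(void)
{
    int i;

    configure_governor(24, 3);     /* requirement: 24 fps, 3 workload types (I, P, B) */
    start_governor();

    for (i = 0; i < 715; i++) {    /* 715 frames, as in the videos of Section 3.2.2.1 */
        int type = (i % 30 == 0) ? 1 : (i % 2 ? 2 : 3);   /* dummy type pattern */
        new_frame(type);           /* annotation: a new frame of this type starts */
        process_frame(type);
    }

    stop_governor();               /* annotation: the application has finished */
    return 0;
}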

3.1.2 Power Knobs

The RTM for this work uses DVFS as its Power Knob. The use of processors with long sleep transitions, added to the unpredictable behaviour of the scheduler in a GPOS, can make using PG impractical. As there are several processes being managed by the scheduler, added to the unpredictability of system interrupts, the sleep transitions and the energy overhead associated with them may prevent PG from providing actual savings. PG is currently not included as part of the RTM.

As mentioned before, DVFS implementations in System-on-Chips (SoCs) use a discrete number of Voltage and Frequency (V-F) pairs. The RTM therefore needs to learn the list of available V-F pairs to make the proper selection.


3.1.3 Control

Control is the main decision element of the RTM, where the algorithm executes. The Control algorithm can be placed in any of the three layers of the system (Hardware, OS or Application); Chapter 4 describes where the RTM has been implemented. As mentioned before, in order to provide Soft RT performance the algorithm needs the constraints given by the Requirements. The algorithm decides, based on its learning experience, what the most suitable power state is by adjusting the Power Knobs.

3.1.3.1 Monitor Feedback

A key element needed for the RTM to learn is the Monitor Feedback. It provides the performance data to adjust the algorithm in real time. The data comes from the Performance Counters (or Performance Monitors), which are directly connected to the processor and its components. The data is read from the counters upon request.

3.1.4 Run-Time Management Algorithm

The RTM then has a power minimisation objective for the applications, which translates into solving a constrained optimisation problem. An effective solution to the problem has two requirements:

1. The workload needs to be known before being processed so that it can be performed at the lowest V-F setting.

2. The decision taken once the workload is known (or estimated) needs to fulfill the time-based constraint achieving the deadline for the current frame, taking into account the application’s variations.

The RTM was then designed to address these requirements. The algorithm of the RTM was named Shepherd. The first objective is achieved by predicting the workload of the frame and the second objective by learning how to make the V-F decisions based on the predicted workload of the frame. Therefore, the original generic design presented in Figure 3.1 is then modified to incorporate the needs of the Shepherd algorithm. The new RTM is shown in Figure 3.2. To facilitate the design of the RTM algorithm, it was split into two modules (or units), these are the Prediction Unit and the Decision Unit, respectively. This simplifies the problem, so that the algorithms inside each unit can be modified, upgraded and tested in isolation. The Prediction Unit is explained in Section 3.2. The Decision Unit is seen in Section 3.3. The steps followed by Shepherd are explained in Algorithm 4.

Figure 3.2: Run-Time Management Unit in the cross-layer approach

The RTM runs together with the application and is triggered by it. When the application starts, the RTM is dormant, waiting for a request. When this request comes, it provides the requirements to the RTM. The application then runs. At the start of a new frame Shepherd predicts the workload for that particular frame; based on this prediction it then decides the suitable V-F pair to achieve the deadline, and then the frame is executed. After finishing the frame, the RTM collects performance data from the performance monitors and then it ‘learns’ by updating the Prediction and Decision Units.

Algorithm 4 Shepherd Power Management
 1: Prediction Unit.Initialise(n_WorkloadTypes)
 2: Decision Unit.Initialise(WorkloadRequirement)
 3: for every New Epoch do
 4:     Prediction Unit.PredictWorkload(WorkloadType)
 5:     Decision Unit.MapWorkload(PredictedWorkload)
 6:     Decision Unit.SelectPowerState(V-F)
 7:     Wait until end of frame
 8:     Prediction Unit.UpdatePrediction(PredictionError)
 9:     Decision Unit.UpdateQ-Table(TimingError)
10: end for

3.2 Prediction Unit

3.2.1 Motivation

The scenario for generic Real-Time Systems includes knowing the Worst-Case Execution Time for each task to be run. In General Purpose Operating Systems this cannot be known a priori, which gives the RTM the need to estimate it. The Prediction Unit is in charge of providing an acceptable estimation of the workload for the next frame.

A possible technique could be offline profiling of the application prior to running it, obtaining the workload to be run at each frame and creating a complete workload profile of the application. The implications of this technique are that it depends on profiling each application before running, added to the memory required to store the profile values. Also, the complexity of the profiling is increased in a realistic setting. This is because of the overheads of the scheduler, OS and other background processes running concurrently in the same system. After profiling the workload of an application, reproducing the conditions in which the workloads were obtained may present a problem of its own.

Therefore, an estimation or prediction of the upcoming workload must be obtained. There is a range of prediction algorithms that can provide a decent estimation of the workload. From this range, one of the main constraints for selecting the algorithm is the time it requires to process, or the timing overhead, as the algorithm itself should not impact the performance of the processor significantly. Another aspect of the algorithm is the precision of the predictions. Given that the SoCs studied exhibit a discrete and relatively small number of V-F pairs, the precision of the prediction is less important, because a marginal difference in the predicted performance does not change which power state is selected. For example, the DM3730 [11] SoC clock frequencies are 300MHz, 600MHz, 800MHz and 1GHz, so the selection of frequency is done based on the more significant figures of the predicted performance.

3.2.2 EWMA

The Exponential Weighted Moving Average (EWMA) algorithm was selected to perform workload prediction. The EWMA algorithm is widely used in the literature [34, 53, 56] for workload prediction because of its lightweight implementation and acceptable performance. The predictor works as an infinite impulse response filter that generates a prediction of the future value based on the average of the previous values weighted exponentially, where the most recent values have greater weights than the older ones. This is shown as:

W(n) = w(n) · λ + w̄ · (1 − λ),   where 0 ≤ λ ≤ 1        (3.1)

where w(n) is the measured workload at time instance n, measured from the hardware, w̄ is the average workload in the time interval 0 to n, W(n) is the predicted workload at time n + 1, and λ is the weighting factor. After the prediction has been set, the mean w̄ is updated with the prediction according to:

w̄ = W(n)        (3.2)

A modification to this filter is performed, where frames (workloads) of the same type are grouped together, so predictions are performed on workloads of the same type4, e.g., for type 1, W₁(n) is predicted with w₁(n) and w̄₁. Thus, for M different frame types, there are M different predictions, implying that in order to predict the next workload, the predictor requires information of the workload type and the last workload (of the same type). The workload prediction error is dependent on the choice of λ, which is dependent on the application. To improve the prediction accuracy, the prediction unit reads back the hardware performance counters to adjust the prediction weight λ. Note that this filter is very lightweight not only in number of operations per epoch, but in its memory usage, as w̄ contains the previous information for that particular workload type. Figure 3.3 shows the timeline of a sample video (frames 430 to 615), where the real workload of the running video (red) and the predicted workload (green) can be seen. The workload is measured in CPU cycles. Each point on the X axis is a frame to be decoded, and its point on the Y axis shows the amount of CPU cycles taken to decode that frame. The green line shows the predicted CPU cycles for each frame.

4 For video processing, workload type translates to the frame type, i.e., I, P and B frames.

Figure 3.3: Comparison of real workload vs. predicted on a sample video, from frames 430 to 615

3.2.2.1 Optimisation and Results

As shown in the literature[35], variations in the workload of an application are dependent on the workload “type”, i.e. for video processing, frames of the same type present low variations. Therefore, for the Governor to have more precise predictions, it requires the frame type (workload type) as one of its parameters every new epoch. In order to further demonstrate the need to group the frame types, an experiment has been carried out to check for variations in the workload. The workload was measured for 3 videos of h.264 VGA resolution (640x480) run at 24 fps, for 30 seconds, run 100 times each. Each video has 3 frame types5 (I, P and B) and has a total of 715 frames. On average across the 3 videos there are 22 I frames, 411 P frames and 282 B frames. The analysis done has been obtaining the standard deviation of the workloads of each video to illustrate the variation of the workloads. On one side, the standard deviation was calculated for all the workload frames of each video. On the other side, the workload frames were grouped by frame type, and the standard deviation was calculated per group. The average was calculated for each standard deviation over the 3 videos and 100 runs. Table 3.1 shows the comparison of the workload frames grouped and ungrouped. Frame grouping by type yields lower variations in the workload, particularly for I type frames. Splitting the workload into frame types can then take advantage of this, so that the EWMA can be calculated per frame type.

The implementation of the EWMA filter includes only one parameter, λ, which determines the importance of previous data.

5 For sake of uniformity, frame type names are mapped from video, i.e. I, P, B, to 1, 2 and 3 respectively.

                            Workload standard deviation (CPU Cycles)
No grouping                 2.23 × 10⁷
Frame types     Type 1      1.68 × 10⁶
                Type 2      1.47 × 10⁷
                Type 3      1.93 × 10⁷

Table 3.1: Comparison of variation of the workload with and without grouping into frame types

In order to use the EWMA filter, the optimal parameters were obtained by analysing different workloads, as shown in Figure 3.4. Figure 3.4 shows the optimisation of the parameter λ run with different videos. The videos represent Dynamic Workloads, as each frame presents variations in its workload. It can be seen that beyond a value of 30% for λ, the Mean Absolute Percentage Error (MAPE) is reduced below 4%. The duration of the videos was 720 frames (30 seconds for a 24 fps video). The same analysis was carried out using a Static Workload (FFT) (Figure 3.5), meaning that the application was running the same workload, in this case for 100 frames. As there are no variations in the workload, the prediction error (MAPE) is around 1%. It is safe to say that a value above 30% for λ gives a good prediction.

Figure 3.4: Effect of weight for EWMA on prediction error using dynamic workloads

Figure 3.5: Effect of weight for EWMA on prediction error using static workloads

3.2.3 AEWMA

The parameter that controls the relevance of past history is the prediction weight λ. At a high λ, recent history data is weighted more heavily than older history, and this helps EWMA to react quickly to changes, but it becomes volatile under random fluctuations. As the parameter λ decreases, the older history data becomes more relevant, smoothing local variations but reacting more slowly to changes [67].

In multimedia and other dynamic applications substantial transitions can be observed [68]. The prediction error increases at the beginning of these transitions. The traditional EWMA algorithm has been modified to take these transitions into account. This new prediction approach is called the Adaptive Exponential Weighted Moving Average (AEWMA). Based on the work by Nembhard [68], once a transition in the workload is detected, the parameter λ is increased to make recent history more relevant. λ is subsequently adjusted back to its initial value using an exponential decay function.

In order to determine where these transitions happen, the analysis from Table 3.1 was repeated. It was found that specifically for video applications, these happened on the I frames 6. The analysis was run on the same videos from Section 3.2.2.1 i.e. 3 videos of h.264 VGA resolution (640x480) run at 24 fps, for 30 seconds, run 100 times each. The process for doing the analysis of variation between grouped frames is shown in Figure 3.6, where a section of a sample video was taken to illustrate this. The steps are:

6 For sake of uniformity, frame type names are mapped from video, i.e. I, P, B, to 1, 2 and 3 respectively.

Figure 3.6: Sample of video with frame types 1, 2 and 3. Red line represents the group of frames of type 2 (cross) and 3 (green filled circles) between frames type 1 (blue hollow circles) used to calculate local standard deviation.

1. Identify frames of type 1. In Figure 3.6, type 1 frames are the white circles.

2. Group the frames between two type 1 frames. They are represented by the red line.

3. Group the frames inside this range by frame type. Type 2 frames are black crosses. Type 3 frames are green circles.

4. Calculate the standard deviation of the grouped type 2 and type 3 frames.

5. Repeat until the video has been completed. Calculate the average of the standard deviations for types 2 and 3. Calculate the standard deviation of type 1.

                            Workload standard deviation   Workload standard deviation grouped
                            (CPU Cycles)                  between frame type 1 (CPU Cycles)
No grouping                 2.23 × 10⁷                    –
Frame types     Type 1      1.68 × 10⁶                    1.68 × 10⁶
                Type 2      1.47 × 10⁷                    3.58 × 10⁶
                Type 3      1.93 × 10⁷                    4.25 × 10⁶

Table 3.2: Comparison of variation of the workload with and without grouping into frame types

The comparison between variations is shown in Table 3.2. On one side, the frames are grouped by frame type and their standard deviation calculated (as done in Section 3.2.2.1); on the other side, the procedure explained for Figure 3.6 is followed. This table shows the advantage of grouping frames between transition frames (frame type 1). With lower variations, predictions become more accurate. The transitions were found to be best placed at every type 1 frame, given the lower variations (based on standard deviation) that result. Therefore, the need to use AEWMA was determined based on Table 3.2, which shows a clear reduction in the prediction variation, which in turn accounts for more accurate predictions. Figure 3.7 shows the modification of λ in an application with 4 transitions.

Figure 3.7: AEWMA λ parameter change at transitions

3.3 Decision Unit

Once the workload for the future frame is predicted, the Decision Unit selects a V-F pair to execute it. This selection is based on the performance constraint given by the application. The decision unit uses Q-Learning (Reinforcement Learning), and builds the model of the system online. The predicted workload corresponds exclusively to the application that communicates with the RTM, and does not include the system-software overheads and other application loads in the prediction. Thus, V-F pairs cannot be directly mapped to a predicted workload using a deterministic algorithm.

The objective of Reinforcement Learning (RL) is to learn to make better decisions under variations. Decisions in reinforcement learning terminology are known as Actions, and the environment is known as the State. Algorithm 2 in Section 2.4.4 was modified to integrate with Shepherd. The modified algorithm is shown in Algorithm 5. The modifications to the algorithm are:

• Determine the State based on the predicted workload coming from the Prediction Unit

• Given that the new State does not depend on the previous State, in Algorithm 2 line 7 the importance of future rewards is 0, so γ = 0, and therefore the term γ max_{a′} Q(s′, a′) is removed.

Algorithm 5 Q-Learning algorithm, modified for Shepherd. Based also on Algorithm 4. Taken from [63]
 1: initialise Q-Table
 2: loop (for each epoch):
 3:     obtain predicted workload
 4:     map predicted workload to s
 5:     choose a (V-F) from s using ε-greedy policy (Algorithm 6)
 6:     take action a
 7:     ...wait for frame to finish...
 8:     obtain r based on cost function (Figure 3.8)
 9:     Q(s, a) ← Q(s, a) + α[r − Q(s, a)] (Figure 3.9)
10: end loop

As explained before (Section 2.4.4), Q-Learning creates a lookup table where knowledge is stored. The main objective of Q-Learning is to learn what to do (action a) when a situation arises (state s). The lookup table is then a collection of the possible states with the possible actions to take for each state, and the accumulated reward for each state-action pair (Q(s, a)). These accumulated rewards represent the knowledge of the system. In this Q-Learning implementation, both the state and action spaces are discrete and finite. Given that the Q-Table is stored in memory, the resources are limited, therefore having a large state space is impractical. As the predicted workload arriving is a 32-bit number (4,294,967,296 possible numbers), it has to be mapped to a smaller state-space (16 states were proposed). Therefore the first step after receiving the predicted workload is to map it to a discrete state, the workload-to-state mapping. After the state has been obtained, the Q-Learning starts to be executed.

Originally the Decision Unit has no knowledge of the system, therefore it requires building the system model at run-time. The ε-greedy algorithm (Algorithm 6) provides a suitable way to build this model. The decisions gradually move from exploring the possible choices to exploiting the optimal choice for a particular State. The Decision Unit must start exploring decisions in different States to find the optimal (or most suitable) Action for a particular given State. This is called the Exploration phase. Exploration is done by taking a random action for a selected state. Good actions are rewarded and bad actions are penalised. Actions, in this context, are the V-F pairs, and states are the different amounts of workload the system may have. It is important to note that the V-F pairs are discrete, so the best decision may not be optimal, but it is the best among the V-F pairs available. As an example, let the optimal frequency for a given workload be 533.35MHz; if the CPU supports only 300MHz, 600MHz, 800MHz and 1GHz, the best decision is to execute the workload at 600MHz. The ‘best’ in the context of this work is defined as the lowest V-F pair that fulfils the performance requirement.

Algorithm 6 ε-greedy strategy. Based on [8]
 1: function Choose_a(ε, s)
 2:     generate random x
 3:     if x < ε then                ▷ Exploitation
 4:         a ← arg max_a Q(s, a)
 5:     else                         ▷ Exploration
 6:         a ← random action
 7:     end if
 8:     if ε < 1 then
 9:         increment ε
10:     end if
        return a, ε
11: end function

3.3.1 Cost Function

After the decision on the V-F setting has been taken, Shepherd sends the signal to the CPU frequency driver to change the Voltage and Frequency accordingly. The workload is then executed. After the workload has finished execution, it is necessary to review the V-F choice to learn how favourable it was for the application. This is done by gathering feedback from the system, which is then evaluated in a cost function. In Shepherd, the feedback comes from the hardware in the form of the executed workload, which is translated into the time it took to finish execution. This is then compared to the deadline to see how energy efficient the decision was and how it affected performance. The main constraints and considerations for the design of the cost function of Shepherd are:

• Meet the deadline. As Shepherd intends to treat the environment as a Soft Real-Time system, this is the first priority. Given that it is impossible to ensure the deadline will always be met, it is tolerable to miss the deadline by a small margin.

• Save as much energy as possible, whilst meeting the deadline.

• As described in the environment where Shepherd runs (Section 3.1.1), the idle time cannot be accumulated, so it cannot be utilised to process the next frame.

• There is no Power Gating in the system, so, as also mentioned in Section 3.1.1, the race-to-idle strategy cannot be applied.

Based on these considerations, the ideal situation is for the workload finish time t_finished to be equal to the deadline t_deadline, so t_finished = t_deadline. Therefore, the cost function designed for Shepherd is:

    r = (t_finished − t_deadline)/t_deadline + 1    if t_finished ≤ t_deadline
    r = (t_deadline − t_finished)/t_deadline        if t_finished > t_deadline        (3.3)

The cost function is represented in Figure 3.8. As an example, maximum reward is capped at 1. Four scenarios arise in this cost function (shown in Figure 3.8):

(A) t_finished ≪ t_deadline. The deadline was met but the system wasted energy being idle (there is no Power Gating). Therefore the reward is positive but relatively low.

(B) t_finished ≤ t_deadline. The workload was finished close to the deadline, with low idle time. The reward is positive and high. The highest reward is when t_finished = t_deadline.

(C) t_finished > t_deadline. The workload was finished after the deadline, but only just missed it. The reward is negative but small, becoming more negative depending on the margin of failure to achieve the deadline.

(D) t_finished ≫ t_deadline. The deadline was missed by a large margin. The reward is considerably negative.

Figure 3.8: Cost function of Q-Learning algorithm for Shepherd. This graph shows the level of reward/punishment obtained for finishing the workload too early (A), very close to the deadline without surpassing it (B), finishing the workload shortly after the deadline (C) and finishing very late (D). Time left for deadline is calculated as 100 · (t_deadline − t_finished)/t_deadline.
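For illustration only, the piecewise function of Equation 3.3 could be written as the floating-point sketch below; the function name is an assumption, and the actual kernel implementation replaces this form with integer arithmetic, as discussed in Section 4.3.1.

/* Sketch of the cost function of Equation 3.3 (floating point, for
 * illustration only; the kernel module avoids floating point). */
double shepherd_reward(double t_finished, double t_deadline)
{
    if (t_finished <= t_deadline)
        /* Deadline met: reward approaches 1 as the finish time approaches the deadline. */
        return (t_finished - t_deadline) / t_deadline + 1.0;
    else
        /* Deadline missed: reward is negative, more so the larger the overshoot. */
        return (t_deadline - t_finished) / t_deadline;
}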

3.3.2 Learning Phase in Shepherd

Learning is stored as values in a Q-Table, which in practice is a lookup table with values corresponding to all State-Action pairs. At each decision epoch7, the decision taken for the last frame is evaluated; the reward or penalty calculated from the cost function is added to the corresponding Q-Table entry, thereby gaining experience on the decision. This is shown in Algorithm 5 line 9. The Q-Table is then updated at the entry Q(s, a), with the previous value plus an amount of the reward:

Q(s, a) ← Q(s, a) + α[r − Q(s, a)] (3.4)

The rate at which actions are rewarded in the Q-Table is determined by the Learning Rate, α, which determines the relative importance of older decisions compared to the newer ones. Initially, the decisions of the algorithm are not optimal. However, with time (after several epochs), the confidence in the selected action improves and the algorithm always selects the best action in a given state. In this case, the best action to take at a given epoch is the action with the highest accumulated reward. This phase of the algorithm is called the Exploitation phase. Figure 3.9 shows the evolution of the Q-Table.

7 In reinforcement learning terminology, the interval at which the algorithm is triggered is known as the decision epoch.

Initially, the values in the Q-Table are all zeros (Figure 3.9(A)); subsequently, in the exploitation phase, the best actions are determined (highlighted in red in Figure 3.9(B)).

Figure 3.9: Q-Table during A) exploration and B) exploitation phases. The red boxes represent the best Action for each State.

The transition from exploration to exploitation is not immediate, but is a gradual change, defined as the ε-greedy strategy, in which the exploration-exploitation ratio (ε) is gradually increased to reduce the random decisions in favour of appropriate decisions8. The availability of ε makes ‘re-learning’ a feasible operation, especially for dynamic systems in which the best Action for a particular State may change gradually. If relearning is needed, ε may be reduced to allow for more exploration to take place.

Figure 3.10 shows an example of the evolution of the suitabilities of 4 different decisions. In this case the 4 different decisions are 4 V-F settings. They are the suitabilities across a particular workload state (State 4) of the Q-Table. In this example, the Q-Learning finds that the V-F pair of 800MHz is the most suitable for State 4 when running this particular application.

3.4 Discussion and Summary

This chapter presented the design of Shepherd, the RTM to be run together with the OS, learning to make decisions on future workloads by predicting them. After explaining the need for a more optimised approach to Run-Time Management in Chapter 2, Shepherd has been presented. The RTM is a cross-layer system that interfaces applications with the Power Management hardware, thanks to a modular algorithm with two major components, the Prediction and the Decision Unit. The environment in which the RTM operates has been described. This included the description of the environment as cross-layer, with an Application layer, OS layer and Hardware layer. The environment targets frame-based applications that have an implicit performance requirement and takes advantage of it, transforming the environment into a Soft RT system.

8 Appropriate decisions are those that reduce the energy consumption, while satisfying the performance.

Figure 3.10: Suitability of actions for a particular state (state 4), defined by the Q Values of the different actions

Shepherd is presented as a system of two interconnected modules, the Prediction and the Decision Unit. It receives orders from the application, as well as annotations. The orders include the configuration instructions, the fps performance requirement and the number of workload types to be processed. The annotations are sent to trigger a frame end and a new frame start, as well as the frame/workload type. The Prediction and Decision Units then start working.

The purpose of the Prediction Unit is to estimate the workload of the upcoming frames based on the workload history, using the EWMA prediction algorithm. The need for grouping the workload was explained as well, demonstrating less variation (standard deviation) within the groups than without them. Also, the optimisation of the parameters of EWMA was described. Having identified the shortcomings of having static parameters, AEWMA was introduced. As opposed to classic EWMA, its prediction parameters are adjusted at run-time, adapting to higher frequency variations in the workload.

After the workload has been predicted, the Decision Unit starts working. The two main components of the Decision Unit are the workload-to-state mapping and the Q-Learning decision algorithm. The ε-greedy algorithm was used to allow for exploration and exploitation of the decisions to change the V-F setting. Exploration is used when the effectiveness of the decisions is not yet known, and exploitation when the Decision Unit has enough experience to know which decision to take in which situation (state). After the workload is finished, the algorithm proceeds to gather experience by evaluating what the performance of the decision was.

Now that the RTM has been designed and optimised, it is necessary to implement it on a real device. Chapter 4 shows the implementation of the algorithm, as well as the presentation of the Shepherd API framework, which allows applications to be transformed into soft real-time, power-aware applications with power optimisation. Chapter 5 provides the results of the implementation, as well as the demonstration of the framework in some applications.

Chapter 4

Implementation of Run-time Manager

With Shepherd designed, it was implemented for testing. This chapter explains the implementation decisions taken and the justification for the implementation as a Linux Governor, as well as the implementation challenges for the algorithm to perform at Operating System (OS) level. The use of an additional Linux kernel module is explained. Finally, the design of a framework for integration in other applications is explored.

4.1 Implementation as Linux Governor

The placement in each layer has advantages and disadvantages:

Hardware The algorithm has no execution overhead for the system, so it can be expanded as needed. It also has fewer memory and execution limitations (e.g. Floating-Point Unit (FPU) operations), so the algorithm can be implemented as a hardware module, collecting data directly from the performance counters. This would also enable the Run-time Manager (RTM) to be awake only when needed (with timers), using Power Gating (PG) for the rest of the idle time. The main disadvantage is the increase in implementation and testing time, including the complexity of making the algorithm upgradable. The other obvious disadvantage is that legacy devices and System-on-Chips (SoCs) cannot benefit from the algorithm.

Application The algorithm is relatively easier and faster to implement, as there are no memory/execution limitations, and there are inter-process communication tools that allow the annotations to be passed to the RTM. One disadvantage is that the activation of the RTM depends on the task scheduler of the OS, and it is therefore unable to ensure a Real-Time response. Moreover, the inter-process communication may not be able to give the timing guarantees needed for the algorithm to perform as expected.

OS The algorithm module provides better timing guarantees than at Application level, as the module is activated whenever the application requires it, and the use of other OS level modules (Dynamic Voltage and Frequency Scaling (DVFS) and performance counter modules) is faster because there is no change of context (between Application and OS).

It was decided that the RTM would be implemented at OS level. This was done because Linux already provides DVFS controllers at OS level called Linux Governors. A Governor is a Loadable Kernel Module[69], a piece of code that extends the functionality of the OS. It has access to hardware functions, i.e. changing the power state, or Voltage and Frequency (V-F) pair. The Governor provides an interface for the user to manipulate the Governor’s parameters.

The Shepherd Governor was implemented in Linux Kernel 3.7.10, which also includes other Governors such as Ondemand, Conservative and Performance. Apart from its intrinsic functionality, Shepherd provides a distinct difference compared to other Linux Governors: the interface that connects applications at user level with the kernel module itself. The Governor was implemented on the BeagleBoard-xM (BBxM) [9], which contains the DM3730 SoC made by Texas Instruments [11]. The SoC is a single core 32-bit ARM Cortex-A8 architecture.
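The thesis does not reproduce the registration code, but a custom governor in a 3.x-era kernel is typically declared and registered roughly as sketched below; the callback body and naming here are assumptions, not Shepherd's actual sources.

#include <linux/cpufreq.h>
#include <linux/module.h>

/* Rough sketch of how a cpufreq governor is registered in kernels of the
 * 3.x era; the switch body is only a placeholder for Shepherd's logic. */
static int cpufreq_governor_shepherd(struct cpufreq_policy *policy,
                                     unsigned int event)
{
    switch (event) {
    case CPUFREQ_GOV_START:
        /* initialise the Prediction and Decision Units, wait for the application */
        break;
    case CPUFREQ_GOV_STOP:
        /* release resources */
        break;
    case CPUFREQ_GOV_LIMITS:
        /* clamp the current V-F setting to the new policy limits */
        break;
    }
    return 0;
}

static struct cpufreq_governor cpufreq_gov_shepherd = {
    .name     = "shepherd",
    .governor = cpufreq_governor_shepherd,
    .owner    = THIS_MODULE,
};

static int __init shepherd_init(void)
{
    return cpufreq_register_governor(&cpufreq_gov_shepherd);
}

static void __exit shepherd_exit(void)
{
    cpufreq_unregister_governor(&cpufreq_gov_shepherd);
}

module_init(shepherd_init);
module_exit(shepherd_exit);
MODULE_LICENSE("GPL");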

Figure 4.1 shows the implementation of the Governor as a group of kernel modules, with the arrows pointing out the communication with User space and the hardware layer. It shows the most relevant functions implemented in the kernel modules. It is important to mention that the decision to implement Shepherd as a Linux Governor provides an analogy of the cross-layer design of the algorithm in Figure 3.2; in this way, the implementation maps directly to the RTM design. The main difference from Figure 3.2 is the inclusion of the Performance Counters Kernel module, which translates the Performance Counts from the CPU registers into the Real Workload required by Shepherd. The functions performed by the Shepherd Kernel module and the Performance Counters Kernel module are listed, as well as the Application Programming Interface (API) functions required to be included in the application. This Linux Governor is composed of two kernel modules, the Governor itself and the Performance Counters module (Section 4.4). The Governor module is composed of three activities: the algorithm that decides the V-F settings, the communication with the application in User space and the communication with the Performance Counters module.

Application Layer (Application) — API functions:
    void configure_governor(fps, num_workload_types);
    void start_governor();
    void new_frame(workload_type);
    void stop_governor();

OS Layer — Shepherd Kernel Module (receives the Workload Requirement and Workload Type):
    Prediction Unit:
        long predict_workload(workload_type);
        void update_predictions(real_workload);
    Decision Unit:
        unsigned int map_state(predicted_workload);
        unsigned int take_action(state);
        int calculate_reward(real_workload);
        void update_Qtable(past_state, past_action, reward);

OS Layer — Performance Counter Kernel Module (provides the Real Workload):
    void select_counter(counter_id, new_event);
    void start_counters();
    void read_diff_ccnt(*count);
    void read_counter(counter_id, *count);
    void stop_counters();

Hardware Layer (CPU, CPU Frequency Knobs, Performance Counters): receives the Voltage-Frequency Setting and returns the Performance Counts.

Figure 4.1: Shepherd governor implementation

4.1.1 Restrictions and optimisations for Linux Governor Implementation

As stated by Bovet and Cesati [70] and Mauerer [59], in Linux kernel mode the use of the FPU and other units (e.g. Single-Instruction-Multiple-Data (SIMD), NEON, MMX, etc.) is quite restricted and rarely done. The main reason is that the FPU (and the other units) may be in use by the applications (processes). In order to use the coprocessors, these must store their user-mode contents in temporary registers, get the kernel processing data loaded into them, get cleared and get their user-mode contents restored. In kernel mode, this is a time-demanding process that may outweigh the benefits of using the coprocessors in the first place. Therefore it was necessary to optimise the mathematical operations used in the Shepherd algorithm.

The most significant optimisations done to the implementation as a kernel module are:

• No use of the decimal point. The algorithm requires some integers to be multiplied by weights, e.g. the prediction weight, which is 0 ≤ λ ≤ 1. Instead, the weights are handled as integers in the range 0 ≤ λ ≤ 100. To avoid overflow, the integers are shifted right by 4 bits before multiplication, then shifted back. The error produced by the right shifts is minimal compared to the values being handled (usually greater than 1 million). This is seen in Listing 4.1.

• Less use of division and multiplication. Right/left shifts were used where possible to avoid divisions and multiplications.

• Simplified random number generation. Random numbers are a key element of the Decision Unit. They are used in the exploration/exploitation decision and for exploration itself. Every run of Shepherd requires at least one random number. Instead of using random engines, the random numbers are obtained each frame by taking the 8 Least Significant Bits (LSBs) of the CPU cycles measured. Figure 4.2 presents the distribution of these random numbers, with a mean of 127.4071 (minimum 0 and maximum 255), using a 4 minute video of 5760 frames. This provides enough evidence to use them as the simple random engine.

Figure 4.2: Random number distribution generated from taking the 8 LSBs from the CPU cycles measured per frame, over 5760 frames.

4.2 Implementation of Prediction Unit

The Prediction Unit was implemented in both the normal Exponential Weighted Moving Average (EWMA) and the upgraded Adaptive Exponential Weighted Moving Average (AEWMA) forms. As demonstrated in Section 3.2, the EWMA produces more accurate predictions when grouping frame types together, i.e. doing workload prediction on similar frame types. Therefore, an array of weighted averages was allocated, with as many elements as frame types exist in the application. The frame types are also called workload types in this context. The EWMA implementation has the following main functions:

• reset_predictions() Cleans the weighted average array.

• configure_prediction_unit(num_workload_types) The Prediction Unit allocates the necessary number of arrays based on the number of workload (frame) types for that particular application.

• predict_next_workload(workload_type) The algorithm executes Equation 3.1 for the given workload type, so the prediction for the next frame is the weighted average of the previous frames of that type.

• update_prediction(new_workload) The new weighted moving average is updated with the workload just gathered for that workload type, and the update is done based on Equation 3.2.
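A minimal sketch of the last two functions is given below, assuming one weighted average per workload type and λ stored as an integer percentage with the 4-bit shifts of Section 4.1.1; the array bound and the exact split of Equations 3.1 and 3.2 between the two functions are assumptions.

#define MAX_WORKLOAD_TYPES 8   /* illustrative upper bound, not from the thesis */

static unsigned long w_bar[MAX_WORKLOAD_TYPES]; /* per-type weighted averages */
static unsigned int  lambda = 60;               /* prediction weight, in percent */

/* The prediction for the next frame is the stored weighted average of
 * previous frames of the same type. */
unsigned long predict_next_workload(unsigned int workload_type)
{
    return w_bar[workload_type];
}

/* Fold the newly measured workload into the per-type average:
 * w_bar <- lambda*w(n) + (1-lambda)*w_bar, using the fixed-point shifts
 * of Listing 4.1 to avoid overflow. */
void update_prediction(unsigned int workload_type, unsigned long new_workload)
{
    unsigned long recent = (new_workload >> 4) * lambda;
    unsigned long old    = (w_bar[workload_type] >> 4) * (100 - lambda);

    w_bar[workload_type] = ((recent + old) / 100) << 4;
}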

The main difference between EWMA and AEWMA is the prediction weight λ, which is kept constant in the former and used as a variable in the latter. Therefore, the AEWMA needs an extra function:

• update_lambda_weight(workload_type)

In this function the prediction weight λ is modified if certain conditions are met. The function models the behaviour explained in Section 3.2.3. When a transition occurs, the variable λ is set to a top value λ_TOP, and it then decays back to its original value λ_0:

1. If the workload type is the transition type, the prediction weight is ramped up, i.e. λ = λ_TOP, to give more importance to recent workloads.

2. If the workload is not a transition and λ > λ_0, λ is decreased with an exponential behaviour. To make this exponential decrease computationally efficient, it is done by dividing by 2, simply shifting λ right by one bit. If λ < λ_0 then λ = λ_0.

3. If the workload is not a transition and λ = λ_0, do nothing.

The parameters of the Prediction Unit were selected from Section 3.2, with the values of λ_TOP and λ_0 being 100% and 60% respectively.
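A sketch of update_lambda_weight() following the three cases above is shown below, with λ kept as an integer percentage; the name of the transition-type constant and the variable names are assumptions.

#define LAMBDA_TOP      100   /* lambda value at a transition (percent) */
#define LAMBDA_0         60   /* steady-state lambda (percent) */
#define TRANSITION_TYPE   1   /* assumption: workload type 1 marks a transition */

static unsigned int aewma_lambda = LAMBDA_0;

/* Sketch of the AEWMA weight adjustment described in Section 3.2.3. */
void update_lambda_weight(unsigned int workload_type)
{
    if (workload_type == TRANSITION_TYPE) {
        /* Case 1: ramp the weight up so recent workloads dominate. */
        aewma_lambda = LAMBDA_TOP;
    } else if (aewma_lambda > LAMBDA_0) {
        /* Case 2: exponential decay implemented as a right shift (divide by 2). */
        aewma_lambda >>= 1;
        if (aewma_lambda < LAMBDA_0)
            aewma_lambda = LAMBDA_0;
    }
    /* Case 3: lambda already at LAMBDA_0, nothing to do. */
}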

4.3 Implementation of Decision Unit

The Decision Unit was implemented using the modified Q-Learning algorithm discussed in Section 3.3.

As mentioned before, the Q-Learning knowledge is stored in the Q-Table, made of States (rows) and Actions (columns). In this implementation, the States are defined as the amount of workload to be processed. The Actions are the possible choices of the algorithm, which in this context are the V-F pairs. The number of Actions (V-F options) for this platform is 4. Given that the performance counters of the platform analysed have 32-bit registers, the maximum number of Central Processing Unit (CPU) cycles countable is 2³² − 1, or 4,294,967,295. The number of States was decided to be 16, with each State representing a range of workloads, e.g. State 0 is from 0 to 1×10⁸ cycles, State 1 from 1×10⁸ to 1.5×10⁸ cycles, etc. The values for each limit were selected based on the occurrence observed over several runs of the Governor. The Q-Table is declared as a 2-D array of signed 16-bit integers of size 16×4. The number of States and the workload ranges can be optimised to reduce memory overheads and improve power consumption. Listing 4.2 shows an auto-generated Q-Table after running the governor for 4 minutes doing video decoding. The optimisation of these parameters (number of states, ranges of each state) is left as future work.
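A sketch of the Q-Table declaration and the workload-to-state mapping is given below; only the first two state boundaries (1×10⁸ and 1.5×10⁸ cycles) come from the text, so the remaining thresholds here are illustrative placeholders.

#include <linux/types.h>

#define NUM_STATES  16
#define NUM_ACTIONS  4   /* number of V-F pairs on the DM3730 */

/* Q-Table: 16 states x 4 actions, signed 16-bit entries. */
s16 QTable[NUM_STATES][NUM_ACTIONS];

/* Upper bound (in CPU cycles) of each state. Only the first two boundaries
 * are given in the text; the rest are illustrative placeholders. */
static const u32 state_limit[NUM_STATES - 1] = {
    100000000, 150000000, 200000000, 250000000, 300000000,
    350000000, 400000000, 450000000, 500000000, 550000000,
    600000000, 650000000, 700000000, 750000000, 800000000,
};

/* Map a 32-bit predicted workload to one of the 16 discrete states. */
unsigned int map_state(u32 predicted_workload)
{
    unsigned int s;

    for (s = 0; s < NUM_STATES - 1; s++)
        if (predicted_workload < state_limit[s])
            return s;
    return NUM_STATES - 1;   /* anything larger falls in the last state */
}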

The main functions executed by the Decision Unit are:

• initialise_decision_unit(fps_requirement) Clear the Q-Table. Add the performance constraint to the cost function.

• map_state(predicted_workload) Based on the predicted workload, identify the current Q-Table State. Returns the State number.

• take_action(state) Based on the Q-Table and the current State, select the Action to be taken, which is a V-F pair. The exploration/exploitation algorithm (Section 3.3) is used in this function.

• calculate_reward(real_workload) This function is called after the frame has been processed. It uses the cost function to calculate the reward or penalty for the decision taken.

• update_qtable(reward) The Q-Table is updated, accumulating the reward for the State and Action taken.

When a new frame is about to start, the Decision Unit receives the predicted workload for that frame. The predicted workload is a 32-bit integer that represents the CPU cycles the next frame is going to consume. As mentioned before, the Q-Table of the Decision Unit consists of 16 States, each State representing a range of CPU cycles. The predicted workload is then mapped to one of the 16 States, defining which row of the Q-Table will be used.

Given the restrictions for implementing algorithms in a Linux kernel module (Section 4.1.1), the exploration/exploitation algorithm was designed using the simple random number. The exploration/exploitation ratio ε is defined as an 8-bit integer. The higher the value of ε, the more probable exploitation is going to happen, and vice versa. Therefore, at the start of the application, ε is set to 0. The maximum value of ε is 255. Every frame, ε is incremented, gradually reducing exploration in favour of exploitation. The determination to explore/exploit is done by obtaining the simple random number r(n) and comparing it to ε: if ε ≤ rand_8bit then explore, else exploit.

When exploring, another simple random number rand_2bit is generated to take the Action. Given that only 4 Actions (4 V-F pairs) are possible in this particular platform, only 2 bits of the measured cycle count are used, giving a range of 0 to 3. The 2-bit random number is then used as the exploring decision.

When exploiting, the knowledge of the Q-Table is used to decide the next Action. Given the identified State based on the predicted workload, the Action is selected on that particular State. This Action is the most suitable one, based on the experience of the Decision Unit itself. The implementation of the selection of the most suitable Action is explained in Section 4.3.2.
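Putting the pieces together, the exploration/exploitation choice might look like the sketch below, which uses the 8 LSBs of the measured CPU cycles as the random source (Section 4.1.1) and the ‘lowest positive Q-value’ rule of Section 4.3.2; the extra cpu_cycles argument, variable names and threshold handling are assumptions.

#include <linux/types.h>

#define NUM_ACTIONS 4
extern s16 QTable[16][NUM_ACTIONS];   /* as sketched earlier in this chapter */

static unsigned int epsilon = 0;      /* exploration/exploitation ratio, 0..255 */

/* Sketch of take_action(): epsilon-greedy selection over the 4 V-F pairs. */
unsigned int take_action(unsigned int state, u32 cpu_cycles)
{
    unsigned int rand8 = cpu_cycles & 0xFF;   /* 8-bit pseudo-random number */
    unsigned int action;

    if (epsilon <= rand8) {
        /* Explore: pick one of the 4 actions from 2 random bits. */
        action = cpu_cycles & 0x3;
    } else {
        /* Exploit: pick the action with the lowest positive accumulated
         * reward in this state (the 'most suitable' action, Section 4.3.2). */
        unsigned int a, best = 0;
        s16 best_q = 0x7FFF;

        for (a = 0; a < NUM_ACTIONS; a++) {
            s16 q = QTable[state][a];
            if (q >= 0 && q < best_q) {
                best_q = q;
                best = a;
            }
        }
        action = best;
    }

    if (epsilon < 255)
        epsilon++;   /* gradually favour exploitation */

    return action;
}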

After the decision has been taken for either exploring or exploiting, the Governor sends a cpufreq_change() request to the CPU. After the V-F has been changed, the Governor becomes dormant, and control is returned to the application, where the workload will be processed. After the frame has been executed and a new frame is ready to be executed, the Shepherd Governor is called again to learn from the previous decision. The implementation of the learning is explained in the next section (Section 4.3.1).

4.3.1 Implementation of Learning Phase in Shepherd

The new State has been identified and the Action has been taken. After the frame is executed, the Decision Unit must learn about its latest choice. The Governor emits a request to the Performance Counters Module (Section 4.4) to obtain the CPU cycles of that particular frame. The real deadline time t_deadline is calculated based on the performance requirement pr and the time overhead t_overhead it takes for the Shepherd algorithm to process (Equation 4.1). The time it took for the workload to process, t_workload, is calculated with the processor cycles CPU_CYCLES used for the workload, retrieved from the Performance Counter Module, and the frequency f at which the workload was executed (Equation 4.2). The reward r is calculated as the difference of t_deadline and t_workload (Equation 4.3). A t_workload greater than t_deadline yields a negative reward. In order to improve the execution time of the Governor and reduce overhead, the two major differences with the originally designed algorithm (shown in Figure 3.8 and Equation 3.3) are:

• In the design the reward is normalised to 1, with rewards being fractions and the highest reward being 1. The implementation differs for two reasons: fractions imply the use of floating-point numbers, which cannot be used naturally in a kernel module; and division is a costly operation in CPU cycles [71, 72], so it was avoided. Therefore, unsigned integers were used for calculating the reward.

• Given that the reward is not normalised to 1, and to reduce the use of arithmetic operations, the larger the yielded $r$ (i.e. $t_{deadline} \gg t_{workload}$), the lower the effective reward is in reality. Therefore, the maximum reward was implemented to be 0, i.e. $t_{deadline} = t_{workload}$. When exploiting, for the current state $s_{current}$ the take_action(state) function takes the current action $a_{current}$ to be the lowest non-negative accumulated reward $Q(s_{current}, a)$, so $a_{current} \leftarrow \arg\min_{a} Q(s_{current}, a)$ subject to $Q(s_{current}, a) \geq 0$.

$$ t_{deadline} = \frac{1}{pr} - t_{overhead} \qquad (4.1) $$
$$ t_{workload} = \frac{CPU_{CYCLES}}{f} \qquad (4.2) $$

$$ r = t_{deadline} - t_{workload} \qquad (4.3) $$

As an example, the performance requirement $pr$ is set to 23.976 fps, equivalent to 41708 µs. Giving a worst-case estimate for the Shepherd Governor execution time, the overhead $t_{overhead}$ is 0.5 ms, larger than the overhead recorded (presented in Section 5.4). The real deadline is then 41208 µs (Equations 4.4 and 4.5).

Receiving $CPU_{CYCLES} = 25659200$ from the Performance Counters Module, the time $t_{workload}$ is calculated in microseconds using the frequency $f$ in MHz, as in Equation 4.6. The reward $r$ is then computed as the difference $t_{deadline} - t_{workload}$, yielding a reward of $r = 9134$; the time units are dropped.

$$ t_{deadline} = \frac{1}{pr} - t_{overhead} = \frac{1}{23.976\,\mathrm{fps}} - t_{overhead} \qquad (4.4) $$

$$ t_{deadline} = 41708\,\mu\mathrm{s} - 500\,\mu\mathrm{s} = 41208\,\mu\mathrm{s} \qquad (4.5) $$
$$ t_{workload} = \frac{CPU_{CYCLES}}{f} = \frac{25659200\,\mathrm{cycles}}{800\,\mathrm{MHz}} = 32074\,\mu\mathrm{s} \qquad (4.6) $$

$$ r = t_{deadline} - t_{workload} = 41208\,\mu\mathrm{s} - 32074\,\mu\mathrm{s} \qquad (4.7) $$

$$ r = 9134 \qquad (4.8) $$
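The same computation, using only the kernel-friendly integer arithmetic described above, can be sketched as follows; the function name and the assumption that the frequency is passed in MHz (so that cycles divided by MHz yields microseconds) follow the worked example, not necessarily the exact Shepherd source.

#include <stdint.h>

/* Compute the reward for the last frame, in microseconds, using integer
 * arithmetic only (no floating point, as required in a kernel module).
 *   deadline_us : real deadline, i.e. 1/pr minus the governor overhead
 *   cpu_cycles  : cycles spent on the frame (Performance Counters Module)
 *   freq_mhz    : frequency the frame ran at, in MHz (cycles/MHz = us)
 */
static int32_t compute_reward(uint32_t deadline_us, uint32_t cpu_cycles,
                              uint32_t freq_mhz)
{
        uint32_t workload_us = cpu_cycles / freq_mhz;

        /* Negative when the frame missed its deadline. */
        return (int32_t)deadline_us - (int32_t)workload_us;
}

With the values of the worked example (deadline_us = 41208, cpu_cycles = 25659200, freq_mhz = 800) the function returns 41208 − 32074 = 9134, matching Equation 4.8.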

The calculated reward is then sent to update the Q-Table. The Q-Table is updated as in Section 3.3. The implementation is shown in Listing 4.1. The Learning Rate α (ALPHA) is set to 40%. The shifts are done to avoid overflow in the variables.

temp1 = ALPHA * (reward >> 4);                          /* ALPHA = 40: weight of the new reward   */
temp2 = (100 - ALPHA) * (QTable[previous_state][previous_action] >> 4);
temp3 = temp1 + temp2;                                  /* weighted sum of new and old values     */
QTable[previous_state][previous_action] = (temp3 / 100) << 4;   /* scale back and store           */

Listing 4.1: Q-Table learning code

4.3.2 Action Suitability for Exploiting Learning

As explained before, the highest accumulated reward is the lowest non-negative value in that particular State, which is selected as the 'best' Action. As an example, Listing 4.2 shows a Q-Table generated from a run of Shepherd on the BBxM. As a reference, let the new State be 3. Table 4.1 shows the Q-Values of State 3. The most suitable Q-Value is Q(3, 1): for State 3, use Action 1 (Table 4.1).

QTABLE          ACTIONS
STATES            0        1        2        3
   0          15872    15856        0        0
   1              0        0        0        0
   2              0    -2528     6272     6688
   3         -15520     3712    11856    11216
   4         -27728    -3584     8480    13104
   5         -19808    -7440     8560    12832
   6         -16304    -7040     4944     9792
   7         -15232    -3024     4736    13312
   8              0        0        0        0
   9          -8400        0        0        0
  10              0        0        0        0
  11              0        0        0        0
  12         -16640        0        0        0
  13           -352        0        0        0
  14              0        0        0        0
  15          -8240        0        0        0

Listing 4.2: Auto-generated Q-Table for a video decoding application.

            Actions
State       0        1       2       3
  3      -15520    3712   11856   11216

Table 4.1: Q-Values of State 3 of Q-Table from Listing 4.2
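A sketch of this selection rule is shown below. The 16-State, 4-Action dimensions follow the Q-Table described above; the function and variable names, and the fallback used when a State has no non-negative entry, are assumptions.

#define NUM_STATES   16
#define NUM_ACTIONS   4

static int QTable[NUM_STATES][NUM_ACTIONS];

/* Return the most suitable Action for a State: the lowest accumulated
 * reward that is still non-negative, i.e. the V-F pair expected to meet
 * the deadline with the least slack. If no non-negative entry exists,
 * fall back to the least negative one. */
static int take_action(int state)
{
        int a, best_action = 0;
        int best_q = QTable[state][0];

        for (a = 1; a < NUM_ACTIONS; a++) {
                int q = QTable[state][a];

                if (q >= 0 && (best_q < 0 || q < best_q)) {
                        best_q = q;
                        best_action = a;
                } else if (best_q < 0 && q > best_q) {
                        /* no non-negative entry seen yet: keep the least bad */
                        best_q = q;
                        best_action = a;
                }
        }
        return best_action;
}

With the Q-Values of State 3 from Table 4.1, this returns Action 1 (Q = 3712), matching the example above.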

4.4 Performance Counters Module

Some ARM Cortex processors provide Performance Counters as coprocessors for monitoring the cores inside the SoC. In order to obtain information from the Performance Counters for calculating the workload, the ARM cores provide registers to control and access the data [73]. The ARM Cortex-A8 CPU (used in this research) contains a performance counter unit called the Performance Monitoring Unit (PMU). This Unit contains memory-mapped registers accessed by writing directly to them in assembly language. The BBxM used in this research provides direct access to the performance counters, whereas other platforms require the use of debug tools such as JTAG to enable them.

The Linux kernel provides an interface called perf [74] to access these performance counters at user level, but accessing the counters at kernel level proved more challenging: the interface was not trivial to control and no reliable way to run it without errors was found. Approaches exist that provide user-level APIs, one called PAPI [75, 76] and another perfmon2 [77]. The drawback is that these functions are user level only, defeating the purpose of having a functional kernel-level module.

For control and use of the performance counters at kernel level, the two options available were to use assembly code inline with the governor code, allowing direct access whenever data is needed from the registers; or to implement a custom Performance Counters Module to take care of the monitoring tasks, leaving the governor exclusively to do the run-time management. The latter was chosen, appealing to a more modular design approach, which enables scalability and portability. For example, having a separate kernel module for performance monitoring reduces the complexity of porting the governor to another platform with different hardware specifications and counters. The Cortex-A8 uses coprocessor C9 and can simultaneously count 4 events from a list of 50 events, plus the CPU cycle counter. The Cortex-A9 [78] provides 6 counters for any of the 58 events available.

The functions implemented by the performance counters module are shown in Figure 4.1. They were implemented using the assembly code required for writing directly to the performance counter registers. The functions are:

• select_counter(counter_id, new_event) Assign a new event (from the 50) to one of the 4 available counters.

• start_counters() The counters are reset and started.

• stop_counters() Stop the counters.

• read_counter(counter_id, *count_variable) Read one of the 4 available counters and store the value in count_variable.

• read_diff_ccnt(*count_variable) Read the CPU cycle counter, calculate the difference with its value from the last time it was read and store the result in count_variable.

Given that the task performed most often by the governor regarding performance counters is reading the CPU cycles (done every frame), the difference since the last frame is computed inside the module and then retrieved with read_diff_ccnt().
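A minimal sketch of such a read is shown below. The coprocessor access (MRC p15, 0, Rt, c9, c13, 0 reads the ARMv7 cycle count register PMCCNTR on the Cortex-A8) follows the ARM documentation [73]; the function and variable names are assumptions, and counter enabling and event selection are omitted.

static unsigned int last_ccnt;

/* Read the ARMv7 PMU cycle count register (PMCCNTR) on the Cortex-A8. */
static inline unsigned int read_ccnt(void)
{
        unsigned int value;

        asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (value));
        return value;
}

/* Store the number of CPU cycles elapsed since the previous call
 * (i.e. since the previous frame) in *count_variable. */
void read_diff_ccnt(unsigned int *count_variable)
{
        unsigned int now = read_ccnt();

        *count_variable = now - last_ccnt;   /* unsigned arithmetic handles wrap-around */
        last_ccnt = now;
}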

In the Linux kernel, functions and variables can be exported as symbols [59, 61, 69]. This is done to enable kernel modules to access functions and variables from another module. To create the symbol, the macro EXPORT_SYMBOL is used on the function/variable to be exported. When the kernel module is loaded, its address is added to a table that holds the addresses of all the global kernel symbols [59]. In this case, the governor code had to import the header file of the counters module to access these functions, which creates a kernel module dependency. To debug the designed module it was also exposed using sysfs [61], which enables its use from user level by exporting some parameters as a filesystem.
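As an illustration of the mechanism (a fragment spanning the two modules, not the exact Shepherd source, and assuming the function name from the sketch above):

#include <linux/module.h>

/* In the Performance Counters Module: the definition is followed by the export. */
void read_diff_ccnt(unsigned int *count_variable)
{
        /* body as in the sketch above */
}
EXPORT_SYMBOL(read_diff_ccnt);

/* In the Shepherd governor: the counters module's header, imported by the
 * governor, declares the symbol, which can then be called directly. */
void read_diff_ccnt(unsigned int *count_variable);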

4.5 An API for Shepherd

As mentioned in the Introduction, Pang et al. [37] present a survey of software developers showing a lack of knowledge of best practices and tools to produce energy-efficient software. Moreover, only around 10% of the respondents measure the energy consumption of their project. There is an opportunity to allow the other 90% to make their code energy efficient without the need to analyse or measure consumption.

Given the availability of a power-efficient governor that takes the application into account, it was decided to enhance the governor to allow developers to produce energy-efficient applications. As explained in Section 3.1, Shepherd takes its decisions based on the workload and the performance constraints and annotations coming from the application. Therefore, it is necessary for the application to pass these constraints and annotations via an interface. An API was designed together with the Shepherd governor to facilitate this communication.

As explained before, the three signals that Shepherd requires from the application are the performance requirement, the task annotations and the Start/Stop signals. In practical terms, these are defined as:

Performance requirement The objective of Shepherd is to get the application to behave as Real-Time (RT), so its performance requirement is the deadline, in terms of a particular deadline per application segment, or a constant deadline defined as a frame rate.

Task annotations In order to make better predictions, the programmer may be able to separate different application segments or to define a particular workload as a ‘workload type’.

Start/Stop signals Signals that alert the RTM when the application has started its main loop, and when it has finished.

These three pieces of information need to be sent to the RTM to work efficiently. These signals are implemented in the framework. The API therefore provides the following functions:

• configure_governor(fps, num_workload_types) This provides the performance requirement (fps) and information for the governor (num_workload_types) to annotate the workload to be processed and improve the governor's accuracy. This is the governor setup.

• new_frame(workload_type) Every new frame, tells the governor which type of workload is going to be processed.

• start_governor() This starts the governor.

• stop_governor() This stops the governor.

These functions are shown in Figure 4.1. In the figure, the green arrow from the API functions connects to the Governor using ioctl. The ioctl [59] is a system call used in device drivers to access hardware devices from user level. This is done by opening a virtual I/O file that exposes the interface. The procedure is then to write to this virtual file the command to be executed in the kernel, possibly together with data, in this case the parameters of the API function, i.e. frames-per-second (fps), etc. The Governor is ready to accept these system calls whenever they happen.
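A user-level sketch of this mechanism is shown below; the device node path, the command number and the configuration structure are illustrative assumptions, since the real values are defined by the Shepherd kernel code (typically built with the kernel's ioctl command macros).

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical device node and command number. */
#define SHEPHERD_DEV            "/dev/shepherd"
#define SHEPHERD_IOC_CONFIGURE  0x5301

struct shepherd_config {
        unsigned int fps;                /* performance requirement         */
        unsigned int num_workload_types; /* number of workload annotations  */
};

int configure_governor(unsigned int fps, unsigned int num_workload_types)
{
        struct shepherd_config cfg = { fps, num_workload_types };
        int fd = open(SHEPHERD_DEV, O_RDWR);   /* open the virtual I/O file */

        if (fd < 0) {
                perror("open");
                return -1;
        }
        /* Write the command and its data to the kernel. */
        if (ioctl(fd, SHEPHERD_IOC_CONFIGURE, &cfg) < 0) {
                perror("ioctl");
                close(fd);
                return -1;
        }
        return close(fd);
}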

The next subsection shows the general design flow for changing an application to use the RTM.

4.5.1 Design flow for application adapted to use Shepherd Governor

The general approach to integrate an application with the Shepherd governor is presented here:

1. Identify an application suitable for Shepherd The main objective of Shepherd is to provide Real-Time performance for an application. This means that the application needs to have a timing requirement to be optimised. Examples of this are multimedia applications, including video decoding/encoding, audio decoding/encoding and image processing.

2. Identify segments of the application The Governor works best when the application is divided into segments. These segments include the beginning and end of the application. As the real-time constraint is applied by processing each segment as a frame, ideally the main processing task should run in a loop where each iteration represents a frame. Video decoding and image/audio encoding present this behaviour.

3. Treat segments as frames and annotate them Frames are divided by workload types. If the variation in workload among the frames is high, it is suggested to split them into groups of similar workload amounts. These constitute workload types.

4. Set performance constraints and annotations Decide on the target fps for the application; this will be the performance constraint for the Governor. Also, count the different workload types of the application.

5. Annotate the application Add the API functions to the application. The functions configure_governor() and start_governor() should be placed at the beginning of the application (before the main loop). new_frame() should be added at the beginning of each iteration of the main loop. stop_governor() should be added at the end of the application.

Consider a generic application, e.g. the FFT benchmark [12], which is to be modified to use Shepherd. The application is run 100 times in a loop, switching between three different window sizes (therefore three workload types). The application contains a Program Header, which represents the preamble before running the main demanding task; this includes flag parsing, parameter definitions, memory allocations, etc. The FFT itself is the Program Main Loop, and at the end of the program the Program Footer represents memory freeing, file saving, etc. This is represented in Figure 4.3 A).

In order to use Shepherd, code using the Shepherd framework must be added, which enables communication with it. Figure 4.3 B) shows the sections of the application that need the addition of the interfacing code to talk to the RTM. In the Program Header the performance constraints are sent together with the Start signal.

[Flowchart: A) original program structure (Program Header, Program Main Loop, Program Footer); B) the same program with the added calls configure_governor, start_governor, new_frame and stop_governor]

Figure 4.3: Adaptation of program code for utilisation of the Shepherd Run-Time Manager

In the Program Main Loop, the ‘New Frame’ signal is sent to Shepherd, which includes the annotation of which ‘workload type’ is to be processed. After the Main Loop has finished, the Stop signal is sent to alert Shepherd to stop listening for new frames. After these code additions, Shepherd starts to learn the different behaviours of the workload and adjusts the V-F pairs accordingly. The framework code to be added is designed to be minimal and completely unobtrusive to the rest of the application code.
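Following this design flow, an annotated application skeleton might look like the sketch below; the API calls are those of Section 4.5, while the header name, the FFT placeholder function and the three window sizes are assumptions based on the example in the text.

#include "shepherd_api.h"   /* assumed header exposing the framework calls */

#define NUM_FRAMES            100
#define NUM_WORKLOAD_TYPES      3
#define TARGET_FPS             10

/* Placeholder for the benchmark's main processing task. */
static void run_fft(int window_size)
{
        (void)window_size;
}

int main(void)
{
        /* Program Header: flag parsing, memory allocation, etc. (omitted) */
        static const int window_sizes[NUM_WORKLOAD_TYPES] = { 256, 1024, 4096 };
        int frame;

        configure_governor(TARGET_FPS, NUM_WORKLOAD_TYPES);
        start_governor();

        /* Program Main Loop: each iteration is treated as one frame. */
        for (frame = 0; frame < NUM_FRAMES; frame++) {
                int workload_type = frame % NUM_WORKLOAD_TYPES;

                new_frame(workload_type);              /* annotate the workload */
                run_fft(window_sizes[workload_type]);  /* the demanding task    */
        }

        stop_governor();

        /* Program Footer: free memory, save files, etc. (omitted) */
        return 0;
}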

4.6 Discussion

This chapter provided a thorough description of the challenges faced when implementing the RTM Shepherd, and the solutions used to address them. The overview of the RTM was explored, together with the implementation of the algorithm and the components needed for it to function. It was designed as a Linux Governor, which runs at kernel level, providing a fast response at the cost of added programming complexity. The Prediction and Decision Units were implemented, with techniques for efficient code resulting in smaller processing overheads. The Performance Counters Module was explained, which was required to be a separate module to allow for portability of the Governor. The Shepherd API framework expands the possibilities for software developers to improve the energy efficiency of their applications; moreover, it provides a black box so the programmer does not have to deal with the complexity of the power management itself. The next chapter explores the validation of the Shepherd Governor and the API.

Chapter 5

Results

This chapter presents the results generated by running the Shepherd Governor in the BeagleBoardxM (BBxM) platform.

5.1 Experimental Setup

The experiments on the Run-time Manager (RTM) were carried out using the BBxM [79], which has the DM3730 [11] System-on-Chip (SoC) from Texas Instruments (TI). This chip contains a single-core ARM Cortex-A8 processor (together with a DSP and a GPU). The specifications of the CPU are listed in Table 5.1; the CPU supports 4 frequencies. The results show that the proposed governor adapts to varying workloads and achieves significant energy savings when compared to the other available governors, whilst providing the desired performance.

Figure 5.1 shows the BeagleBoardxM platform with the peripherals connected: HDMI to DVI cable connected to a monitor for video decoding, power connector, serial and ethernet connection for remote control using Secure Shell (SSH). The BBxM runs Linux 3.7.10+x10 kernel, with the Ubuntu 12.04 distribution on top. No GUI (like GNOME or GTK+) was running with Ubuntu, to reduce the memory consumption and isolate the experiments as much as possible. The procedure for every experiment was as follows:

Power Mode    Frequency    Voltage (V)    Current (mA)    Power (mW)
OPP50         300 MHz      0.93           151.62          141.01
OPP100        600 MHz      1.10           328.79          361.67
OPP130        800 MHz      1.26           490.61          618.17
OPP1G         1 GHz        1.35           649.64          877.01

Table 5.1: DM3730 Specifications (ARM Cortex-A8 processor) [11]


Figure 5.1: BeagleBoard-xM[9] development board used for running the experiments of Shepherd

1. Verify that the Power connector is providing power to the board.

2. Press the Reset button on the board. Wait for the board to boot. It finishes booting up when the login screen appears.

3. Enter login and password and wait for the cursor to appear.

4. Run the governor setup script. This was necessary to configure the governor to use for a particular experiment. Verify that it finishes setting up the desired governor.

5. Run the experiment script. This script sets up the experiment, including the file store for the output files (with performance and energy measurements), and then starts the experimental run, launching the application under test. Specifically for Shepherd, the application has been modified to configure the governor and trigger the start of governor processing.

6. Wait for the application to finish its run. After it finishes running, the data generated should be at the file store.

Figure 5.2 shows an example of the BBxM running a video decoding experiment. In order to compare the efficiency and performance of Shepherd, it was tested against the other Linux Governors. The 5 other available governors are: performance, powersave, ondemand, conservative and userspace (at 600 and 800 MHz). The comparison is done with the following Linux Governors:

powersave This is a static governor (it does not change frequency over time) that is set at the lowest possible frequency. For the BBxM, this frequency is 300MHz.

userspace600 This static governor runs at 600MHz.

userspace800 This static governor runs at 800MHz.

performance This static governor runs at the maximum frequency (1GHz). It is the most power-consuming governor.

ondemand This is a dynamic governor that varies its frequency over time by looking at the processor demand. The governor looks at the idle time over a time window, and if it goes below a threshold the frequency is ramped up to the highest setting, gradually going down after every sample time to stay between a low and a high threshold.

conservative This is a dynamic governor similar to ondemand, with the difference that instead of ramping to the highest frequency, the governor gradually increases the frequency setting until it reaches the desired idleness per sample time.

Figure 5.2: Video Decoding experiment using Shepherd Governor running on BeagleBoard-xM

5.2 Case Study: Run-Time Manager for Video Decoding

In order to test Shepherd, MPlayer was selected, which uses the FFmpeg [80] library of video codecs. The application was slightly modified in order to send its performance constraint as well as the task annotations to the governor. As mentioned before, these performance constraints are sent using ioctl. It is important to mention that the governor is not exclusively designed for video applications, but as video applications present an inherent performance constraint (frames-per-second), they are suitable for these tests.

Different video codecs and video sizes have been tested. The video codecs are:

• h264 vga resolution (640x480)

• h264 qvga resolution (320x240)

• h263

• MPEG-4

• MPEG-2

In total 5 videos per codec have been run. Each video runs for 720 frames, which is equivalent to 30 seconds running at 24 frames-per-second (fps). The videos vary in their workloads. In the 5 different video types there exist 3 different frame types, I, P and B, which represent different workload types in the Governor. This approach is similar to Choi et al. [34], in the sense that they also distinguish the different frame types. Energy and power consumption were calculated using the values from Table 5.1, which show the processor at normal runtime. For example, if the processor was running at 1GHz for frame x, then its power consumption for that frame would be 877.01mW.
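As a rough worked example of this calculation (assuming, for illustration, that a frame at the OPP1G setting occupies the full 24 fps frame period of about 41.7 ms), the per-frame energy follows directly from the tabulated power:

$$ E_{frame} = P_{OPP} \times t_{frame} \approx 877.01\,\mathrm{mW} \times 41.7\,\mathrm{ms} \approx 36.6\,\mathrm{mJ} $$

In the actual analysis the measured per-frame time is used rather than the full frame period.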

The data of the video runs was stored in Comma-Separated Values (CSV) files at run-time, and then imported into MATLAB for analysis to obtain the results. Appendix A shows an auto-generated data file from a video run. The data fields included are: frame number, frame type (and its numeric value), frame time in microseconds, deadline pass/fail, predicted and real workload, and the frequency at which the frame was run.

5.2.1 Real time behaviour of Shepherd

Figure 5.3 shows the comparison over time of both Shepherd and the ondemand Governor, displaying the performance and the power consumption of both governors. Performance in this case means closeness to the target fps stated by the video, which in this particular case is 23.98 fps. The nature of the video decoder (for memory purposes) is to decode frames in real time with a one-frame buffer, therefore only one frame is decoded and displayed in an epoch. This means that slack is not accumulative: there may only be slack from the decoded frame. So, the maximum achievable frame rate for this application is the target frames-per-second, i.e. 23.98 fps. If a frame is decoded in more time than aimed for, there is a performance loss. Power consumption is measured in Watts (W), representing the amount of instantaneous power used during a particular frame decoding. Hence, the aim of Shepherd is to reduce the area below the green curve (of power consumption) with minimal reduction of the performance target (23.98 fps).

It can be seen from Figure 5.3 that in terms of power consumption Shepherd always stays below ondemand (except for the periods between frames 13-22 and 118-124), meaning a consistent saving in energy. It can also be seen that these energy savings produce some performance losses, one clearly attributable to the learning phase and the others to occasional exploration decisions. The learning phase (in orange) shows the period where Shepherd is learning the optimal power states for these workloads by taking random decisions. After 115 frames the Run-time Manager has found the optimal states and therefore starts taking better decisions. After this learning phase is finished, the system goes into exploitation mode, "exploiting" the most adequate decision for every situation. Because the workloads are dynamic, Shepherd still explores during the exploitation phase, with a more sporadic occurrence, costing small performance losses.

[Plot: Performance (fps) and Power Consumption (W) versus Frame Number for the ondemand and Shepherd governors, with the exploration phase marked]

Figure 5.3: Performance and Power Consumption of the governors Shepherd and Ondemand for an H.264 Video

5.2.1.1 Shepherd on H.264 VGA

[Pareto plot: Normalised Performance versus Normalised Energy for powersave, conservative, ondemand, userspace 600, userspace 800 and shepherd]

Figure 5.4: Pareto graph comparing different governors vs. Shepherd by means of energy and performance running H264 VGA Video

Figure 5.4 shows a Pareto graph that compares the energy consumption against the performance. To facilitate reading the figure, both axes were normalised from 0 to 100. The 100% for energy represents the maximum energy consumed by the Performance governor; it is clear that the Performance governor should have the maximum consumption, given that it is always running at 1GHz. The 100% in normalised performance represents the maximum performance in terms of the number of frames in which the deadline was achieved; a point at 100% normalised performance means that the governor achieved 100% of the deadlines. Each point represents the average frame rate achieved per video. There are 5 points per governor, representing the 5 videos run. The ideal point is the top-left corner of the figure, with 0% energy consumption and 100% performance, and this is therefore the target of Shepherd.

It can be seen clearly that the Powersave governor provides the lowest power consumption at the expense of around 60% performance. On the opposite side, the Performance governor achieves 100% of the deadlines, but consumes the largest amount of energy. Comparing the static governors, Userspace 600 provides the best compromise of energy and performance, with more than 90% performance and significant power savings. It could be argued that Userspace 600 is the optimal governor to use, but that decision was not known a priori. Among the dynamic governors, Shepherd clearly provides better power savings at a small penalty on performance when compared to Conservative and

Ondemand. The power savings obtained by Shepherd surpass 20% compared to the other dynamic governors.

5.2.1.2 Shepherd on H.264 QVGA

[Pareto plot: Normalised Performance versus Normalised Energy for powersave, conservative, ondemand, userspace 600, userspace 800 and shepherd]

Figure 5.5: Pareto graph comparing different governors vs. Shepherd by means of energy and performance running H264 QVGA Video

From the H.264 QVGA experiment shown in Figure 5.5 it is clear that the processing demand is reduced compared to VGA. This is logical, as the QVGA resolution (320x240) is lower than VGA (640x480), so the decoding effort is reduced. The results show that conservative and ondemand provide more power savings than Shepherd, because their algorithms only react when the processing demand is very high (less than 20% idle time), and given the reduced workload this never happens. There are few Voltage and Frequency (V-F) changes in Conservative and Ondemand, so these governors stay at the 300MHz setting for most of the time.

Instead, Shepherd consumes more energy because it has an exploration phase, so the high frequencies are also visited during this period.

5.2.1.3 RTM on H.263

[Pareto plot (zoomed): Normalised Performance versus Normalised Energy for powersave, conservative, ondemand, userspace 600, userspace 800 and shepherd]

Figure 5.6: Pareto graph comparing different governors vs. Shepherd by means of energy and performance running H263 Video

Figure 5.6 provides a closer view compared to the two previous Pareto figures, where it can be observed that every dynamic governor performs in a similar manner. Overall, the average energy consumption is slightly higher for Shepherd.

5.2.1.4 Shepherd on MPEG2

[Pareto plot: Normalised Performance versus Normalised Energy for powersave, conservative, ondemand, userspace 600, userspace 800 and shepherd]

Figure 5.7: Pareto graph comparing different governors vs. Shepherd by means of energy and performance running MPEG2 Video

Figure 5.7 shows a similar behaviour to that of Figure 5.5, given that the workload demand is reduced.

5.2.1.5 Shepherd on MPEG4

[Pareto plot: Normalised Performance versus Normalised Energy for powersave, conservative, ondemand, userspace 600, userspace 800 and shepherd]

Figure 5.8: Pareto graph comparing different governors vs. Shepherd by means of energy and performance running MPEG4 Video

In Figure 5.8 it can be seen that Shepherd found more energy savings, whereas ondemand and conservative spent most of the time at the 600MHz level.

5.2.2 Governor comparison

A comparison of overall power efficiency has been done between Shepherd and the other Linux governors. Table 5.2 provides the comparison, in which the numbers are the averages of the normalised performance and energy consumption of every video, using the different decoders. An extra comparison is done with a longer video, run for 4 minutes. VGA and QVGA refer to the H.264 decoder with VGA (640x480) and QVGA (320x240) resolutions respectively, and VGA 4mins refers to an H.264 VGA video run for 4 minutes, as opposed to 30 seconds (for the rest of the videos). Every video codec was run in VGA resolution (except for H.264 QVGA as previously stated).

5.2.2.1 Comparison versus Dynamic Governors

Overall it is shown that Shepherd provides a performance similar to the dynamic governors (at most 1.9% lower, in H.264 VGA), but provides considerable energy savings in H.264 VGA and H.264 VGA 4mins, with 29.5% and 11.7% less energy consumption than Conservative and Ondemand respectively for VGA 4mins.

In MPEG4, Shepherd provides 11% and 13% less energy consumption than Conservative and Ondemand respectively. In H.263 energy consumption is 5% and 3.5% lower, respectively.

                             Static Governors (Frequency in MHz)    Dynamic Governors           Shepherd
Video Decoder                300     600     800     1000           Ondemand   Conservative
VGA 4mins     Energy         16.1    41.2    70.5    100.0          81.5       99.3             69.8
              Performance    60.1    93.3    98.5    99.7           99.6       99.6             98.4
VGA           Energy         16.1    41.2    70.5    100.0          84.8       95.0             68.7
              Performance    60.9    96.1    99.7    99.8           98.9       99.6             97.7
MPEG4         Energy         16.1    41.2    70.5    100.0          42.0       40.0             29.0
              Performance    99.7    99.9    99.9    99.9           99.6       99.8             99.5
H.263         Energy         16.1    41.2    70.5    100.0          45.5       46.8             41.8
              Performance    97.1    99.6    99.3    99.9           99.5       99.8             99.3
MPEG2         Energy         16.1    41.2    70.5    100.0          25.0       39.8             23.9
              Performance    99.9    99.9    99.9    99.9           99.9       99.9             99.8
QVGA          Energy         16.1    41.2    70.5    100.0          17.8       16.1             24.4
              Performance    99.9    99.9    99.9    99.8           99.8       99.9             99.5

Table 5.2: Comparative table showing the Pareto values of energy and performance averages of every Linux governor compared to Shepherd. Numbers are normalised to 100. Ideal performance is 100, and ideal energy is 0.

In MPEG2 the energy consumption of Shepherd is 15.9% and 1.1% lower, respectively. In H.264 QVGA, however, Shepherd consumes more energy than Conservative and Ondemand, 8.3% and 6.6% more. This happens because of the nature of the Conservative and Ondemand algorithms, which only start scaling the V-F setting when the minimum idle time threshold is reached. The workload of H.264 QVGA decoding is significantly lower compared to higher resolutions: on average there are $3.2956 \times 10^7$ cycles per frame in H.264 VGA versus $1.5943 \times 10^7$ in H.264 QVGA. Therefore the minimum idle time threshold is crossed only a few times per video in Ondemand and never in Conservative, so the V-F setting is incremented on average 1.2 times per video in Ondemand and 0 times in Conservative. As mentioned in Section 5.2.1.2, the increase in energy consumption of Shepherd happens due to the exploration phase of the Decision Unit, where Shepherd selects V-F settings at random to find the optimal one.

5.2.2.2 Comparison versus Static Governors

Static Governors provide a fixed V-F setting for the whole workload. For this reason, the energy consumption of each Static Governor remains constant (100% for the Performance Governor, 70.5% for Userspace800, 41.2% for Userspace600, 16.1% for Powersave)1. This makes them impractical when the workload is unknown. If the workload could be profiled offline, then the V-F setting could be selected, either per frame or one for the whole application. For this reason, in some applications a Static Governor may perform better than Shepherd, e.g. in MPEG2 Shepherd consumes 7.8% more energy than Powersave (300 MHz), but this does not apply to every application, e.g. in H.264 VGA the

1 The Static Governors and their frequencies are: Performance = 1GHz, Userspace800 = 800MHz, Userspace600 = 600MHz, Powersave = 300MHz.

Powersave Governor (300 MHz) gives a performance of 60.9% whereas Shepherd gives 97.7%. Therefore, it is unrealistic to consider Static Governors as the ideal choice given their lack of flexibility and offline profiling requirements to work. It is then shown that Shepherd gives an overall better Energy-Performance trade-off than both the Dynamic and Static Governors available.

5.3 Shepherd Application Programming Interface (API) results

In order to demonstrate its versatility, the Shepherd API was used on different benchmarks, setting arbitrary workloads for two MiBench benchmarks [12]: "FFT" and "inverse FFT" (iFFT). The applications themselves were modified using the API to notify the governor. In order to provide a frame-oriented approach, the applications are run in a loop, with each run of the application constituting a cycle. These experiments were run for 700 frames (cycles). In order to test the adaptivity of Shepherd to changes in the performance constraints, the workload was kept constant and the performance target (objective) was changed for every experiment. Both "FFT" and "iFFT" were run with four different constraints: 8, 10, 12 and 16 fps. Tables 5.3 and 5.4 show the performance results of Shepherd and the other governors running the FFT benchmark with the different performance constraints. Tables 5.5 and 5.6 show the results of performance versus energy for each performance constraint for the iFFT benchmark.

It is clear that the larger the performance constraint, the faster the workloads need to be finished, so higher frequencies are expected to be required. This can be seen with the static governors, and particularly userspace 800, which reaches the first three targets but fails to reach 16 fps in Table 5.4. Upon further inspection it can be seen that, for static workloads, a static frequency proves to be optimal for each target; the problem is finding which frequency is the optimal one. Figure 5.9 shows the real-time behaviour of Shepherd, which again has a learning phase (shortened to 36 frames) and then starts exploiting, with a little exploration. These exploration decisions plus the learning phase contribute to an expected loss in performance and slightly more power consumption than ideal, penalties paid for learning online.

Going back to Tables 5.3, 5.4, 5.5 and 5.6, the performance loss discussed before can be seen: Shepherd averages slightly below the target, while achieving a large improvement in energy consumption. It is fair to say that, as the dynamic governors have no intrinsic performance constraints and FFT and iFFT are computationally demanding processes, those governors run mostly at the highest frequencies, consuming large amounts of energy. Looking at both sets of tables, it is clear that iFFT has a lower workload than FFT, and Shepherd takes advantage of this, reducing the energy consumption further, e.g. Shepherd at 12 fps. Once the optimal frequency is known, the governor should ideally stay at that particular frequency; however, Shepherd always assumes that workloads are dynamic, therefore exploration is necessary even after enough confidence has been reached in the decisions.

                     Objective: 8fps                 Objective: 10fps
Governor             Performance   Normalised       Performance   Normalised
                     (fps)         Energy           (fps)         Energy
powersave            4.5           16               4.5           16
userspace 600        8.0           41               9.1           41
userspace 800        8.0           70               10.0          70
performance          8.0           100              10.0          100
conservative         8.0           98               10.0          98
ondemand             8.0           100              10.0          100
shepherd             7.5           40               9.7           67

Table 5.3: FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective

                     Objective: 12fps                Objective: 16fps
Governor             Performance   Normalised       Performance   Normalised
                     (fps)         Energy           (fps)         Energy
powersave            4.5           16               4.5           16
userspace 600        9.1           41               9.1           41
userspace 800        12.0          70               12.1          70
performance          12.0          100              15.1          100
conservative         11.9          98               14.8          98
ondemand             12.0          100              15.1          100
shepherd             11.9          98               14.5          97

Table 5.4: FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective

                     Objective: 8fps                 Objective: 10fps
Governor             Performance   Normalised       Performance   Normalised
                     (fps)         Energy           (fps)         Energy
powersave            6.2           16               6.2           16
userspace 600        8.0           41               10.0          41
userspace 800        8.0           70               10.0          70
performance          8.0           100              10.0          100
conservative         8.0           97               10.0          97
ondemand             8.0           100              10.0          100
shepherd             7.9           40               9.5           41

Table 5.5: Inverse FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective

                     Objective: 12fps                Objective: 16fps
Governor             Performance   Normalised       Performance   Normalised
                     (fps)         Energy           (fps)         Energy
powersave            6.2           16               6.2           16
userspace 600        11.9          41               12.4          41
userspace 800        12.0          70               16.0          70
performance          12.0          100              16.0          100
conservative         11.9          97               15.8          97
ondemand             12.0          100              16.0          100
shepherd             11.4          68               14.3          96

Table 5.6: Inverse FFT benchmark [12] performance vs. energy results. The governors aim to fulfil the objective

[Plot: Performance (fps) and Power Consumption (W) versus Frame Number for the ondemand and Shepherd governors]

Figure 5.9: Performance and Power Consumption of the governors Shepherd and Ondemand for iFFT benchmark at 10 fps constraint

5.4 Overheads

The implementation of the algorithm as a Linux governor has negligible overhead. The BBxM's lowest frequency is 300MHz, and the algorithm takes 11.5µs to compute (from the time of the system call to the end of the decision epoch). The overhead including the frequency change is recorded as 0.42ms, with a mean overhead of 0.25ms. This constitutes ≈ 0.6% of the frame decoding time, assuming a 24fps video. In terms of memory usage, the Shepherd governor uses 14kB of memory and does not use dynamic memory allocation for the Q-Table.

5.5 Discussion and Summary

This chapter has shown how Shepherd was implemented as a Linux governor, and also explained the implementation of the framework as an API, based on the design presented in Chapter 3. The implementation included the validation of the prediction and adaptation algorithms, furthermore analysing the optimal values of the weight factor for the predictor. The Run-time Manager was first validated to work with dynamic workloads such as videos, and then compared with the other available Linux Governors. Results show that Shepherd is able to adapt and, with slight performance losses, save large amounts of energy. It was also shown that for a longer video the energy savings increase, reaching around 15% savings versus the ondemand governor and around 30% savings against maximum frequency, with only 1.5% performance loss.

After comparing different video codecs, the Run-time Manager was shown to work for newly created applications, presenting the case study of modifying an application to use the governor's API. In addition to the benefits mentioned, the performance overhead of running the governor is negligible. The next chapter presents the final conclusions and details the future work.

Chapter 6

Conclusions

Given the current need for more power-efficient devices in the presence of highly demanding workloads, research has been driven towards minimising the energy costs of these workloads. This work set out to design a cross-layer Runtime Power Manager to address these required minimisations. Shepherd was proposed as a Runtime Power Manager that takes advantage of the opportunities presented by the applications to save energy. The energy savings are achieved by reducing the system frequency (and voltage) when possible, analysing the application at run-time to detect periods of lower workload.

Shepherd shows the need for integrating predictive and adaptive Machine Learning techniques to optimise power consumption. The technique targets single-core Dynamic Voltage and Frequency Scaling (DVFS), keeping the impact on performance acceptable. The predictive techniques aim to foresee the workload arriving in the next frame; the Exponential Weighted Moving Average (EWMA) filter was implemented for its simplicity in terms of operation and memory usage, and its high precision. Experimental results show a prediction error of around 3% for dynamic workloads such as video decoding.
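For reference, a standard EWMA predictor takes the form below; this is the textbook formulation, and the exact variant and the value of the weight factor $\lambda$ used by Shepherd are those defined in Chapter 3:

$$ \hat{w}(n+1) = \lambda\, w(n) + (1 - \lambda)\, \hat{w}(n) $$

where $w(n)$ is the workload observed for frame $n$ and $\hat{w}(n+1)$ is the prediction for the next frame.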

The adaptive Machine Learning technique implemented is the Q-Learning algorithm, which starts by learning the model of the system through exploring different decisions, rewarding good decisions and penalising bad ones, and afterwards starts exploiting the most convenient decisions. The Q-Learning algorithm was used because it provides an online approach to making optimal decisions, added to its simplicity in terms of operation and memory usage. This means that no previous offline analysis is needed in order to determine the optimal decisions.

The cross-layer approach was presented as a standard framework to achieve better power savings, emphasising the feedback needed from hardware. The advantages of using this Runtime Power Manager are that it is always trying to optimise for the best power

consumption, and the penalty paid is reduced. It is also important to note that the memory and time overheads of the normal operation of the Governor are negligible, therefore ensuring that the Operating System (OS) is oblivious to the running of the Governor. Experimental results show that Shepherd takes around 12µs at the lowest frequency setting to execute, whilst the time normally taken by a video application per frame is around 40ms.

The framework was designed to provide a platform for creating or modifying applications that may require performance constraints, in order to achieve power savings by means of the OS. The framework is designed so the application developer can control the Run-time Manager and advise it of further changes in the system. The design flow was presented, showing how an application can be adjusted to provide the annotations needed to select optimal settings for the system.

Based on Section 5.2, the implementation of the Run-time Manager shows that although sometimes selecting a particular Voltage and Frequency (V-F) pair may seem a more convenient choice, the user (and the other Governors) is unable to know beforehand which setting is the more appropriate to take. It is important to note that most of the performance losses presented by Shepherd are due to the exploration phase the Run-time Manager undertakes. For video applications Shepherd was shown to be a more effective choice than the dynamic governors ondemand and conservative in most cases: H.264 VGA, H.263, MPEG2 (better than conservative) and MPEG4. Energy savings reach around 30% with a 1.5% performance loss for the 4-minute H.264 video. For applications modified to use the framework (FFT and inverse FFT), which constitute a static workload, Shepherd presents energy savings of around 60% for low-demand workloads. For high-demand workloads Shepherd aims to reach the objective, missing some deadlines due to the learning phase.

Overall, the Run-time Manager provides a more robust technique for embedded systems powered by an Operating System that demand high efficiency with high performance requirements.

6.1 Future Work

The work presented in this document is further driven into several directions:

Optimisation of predictions As seen in Figure 3.4, depending on the video and the experiment, the λ parameter may be more convenient at certain settings. This presents an opportunity to adjust this parameter online to generate more accurate predictions independently of the application. Artificial Neural Networks (ANNs) present a robust and efficient alternative for regression and time series predictions, and given their reduced complexity they may be suitable as the predictor.

A review of further optimisation algorithms is needed to determine which alternative provides better online parameter adjustment.

DPM - DVFS Interplay Shepherd currently does DVFS on the main CPU core, assuming only high-demand workloads arrive. When the device is not in use, Dynamic Power Management (DPM) opportunities arise that enable the CPU to go into deep sleep modes. Also, as shown by Bhatti et al. [62], DPM and DVFS may be used together on a particular workload when the Worst-Case Execution Time (WCET) and the deadline are known. This presents an opportunity to enhance the Run-time Manager to use DPM to further reduce power consumption when running general purpose applications.

Heterogeneous Power Management The Run-time Manager currently controls the frequency power knobs of a single-core CPU. Nowadays Systems-on-Chip (SoCs) are equipped with several cores, including both CPU cores and hardware accelerators such as DSPs, GPUs, media codecs and transceivers. This provides an opportunity to enhance the Run-time Manager to take full control of the whole SoC by controlling the power states of all the internal modules. Power-manageable accelerators are normally equipped with Power Gating features [47], so the Run-time Manager may be able to use DPM techniques on these accelerators.

Given that Shepherd works in a single core environment with a single application, three scenarios arise as the expansion of the idea: multiple applications running on single core, one application running on multiple cores, and multiple applications running on multiple cores. The scenarios and their needs for new research to be conducted are described as follows:

Multiple applications running on a single core Currently Shepherd works with a single running application, using a real-time constraint obtained from the application. The Frequency Mapper (FM) implemented takes into account the uncertainty of the environment, such as OS overheads and other applications running in the background, and interprets them as noise. The Run-time Manager may be modified to provide a more sophisticated power management alternative that sees these background processes, or even other applications running simultaneously, in order to take power management decisions in a more realistic environment. The main component required for this task is an adaptive scheduler with an algorithm similar to Shepherd, that obtains the deadlines of every application running on it. The scheduler should be Soft Real-Time (RT) oriented to accommodate every deadline. Also, given the uncertainty of the workload to be processed, the scheduler must

have a predictive algorithm, e.g. Adaptive Exponential Weighted Moving Average (AEWMA). The resulting design is a Soft RT scheduler with the Shepherd algorithm controlling DVFS.

Single application running on a multicore environment Nowadays SoCs consist of several CPU cores to run tasks more efficiently. This creates several power management opportunities, where the Run-time Manager may decide whether to run the different cores at a particular speed, or to shut down unused ones, also trying to balance the load. The Run-time Manager may be enhanced to exploit these opportunities. The use of multicore devices is tightly coupled with integration with the task scheduler, as the Run-time Manager must have control of where the tasks are allocated to do efficient power management on the cores. The approach for Shepherd on multicore would have a new key element: the prediction/decision arrays are defined as the application energy saving profile, saved in RAM. This means that Shepherd creates a profile for the application to be run, which essentially contains the prediction and decision models of the application. Normally applications running on multicore are moved around the different cores for execution, being allocated by the scheduler. Therefore, once the application model has been created, it can be used on each processor where the application is running, given that it is a similar core type. In case the new core running the application is of a different type, the application profile should be rebuilt. Furthermore, the application profile may be saved in non-volatile memory for use the next time the application is run.

Multiple applications running on multicore Similar to the previous approach, an application energy saving profile should be created per application, and loaded whenever each application starts or continues running. In this approach, the scheduler must be Soft RT, as proposed for running multiple applications, where each of them has its own deadline.

Overall, the future work for the three scenarios described can be summarised in two needs: a Soft RT scheduler coupled with the Shepherd algorithm, and an application energy saving profile per running application, saved in RAM or in non-volatile memory.

Bibliography

[1] Jprice. Low Power Design Guide, 2009. URL http://www.powerforward.org/ media/g/completeguide/default.aspx.

[2] Michael Keating, David Flynn, Robert Aitken, Alan Gibbons, and Kaijian Shi. Low power methodology manual: For system-on-chip design. Springer Verlag, 2007. ISBN 9780387718187. doi: 10.1007/978-0-387-71819-4.

[3] N H E Weste and D M Harris. CMOS VLSI Design: A Circuits and Systems Perspective. Pearson Education India. ISBN 9788131762653. URL https://books. google.com/books?id=XumCXbbaUJ0C.

[4] Primetime px user guide, June 2009.

[5] Vasanth Venkatachalam and Michael Franz. Power reduction techniques for mi- croprocessor systems. ACM Comput. Surv., 37:195–237, September 2005. ISSN 0360-0300. doi: http://doi.acm.org/10.1145/1108956.1108957. URL http://doi. acm.org/10.1145/1108956.1108957.

[6] Luca Benini, Alessandro Bogliolo, and G. De Micheli. A survey of design tech- niques for system-level dynamic power management. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 8(3):299–316, June 2000. ISSN 1063-8210. doi: 10.1109/92.845896. URL http://ieeexplore.ieee.org/lpdocs/epic03/ wrapper.htm?arnumber=845896.

[7] Michael Moeng and Rami Melhem. Applying statistical machine learn- ing to multicore voltage & frequency scaling. In Proceedings of the 7th ACM international conference on Computing frontiers, pages 277–286, New York, New York, USA, 2010. ACM. ISBN 9781450300445. doi: 10.1145/ 1787275.1787336. URL http://portal.acm.org/citation.cfm?doid=1787275. 1787336http://dl.acm.org/citation.cfm?id=1787336.

[8] Lucian Busoniu, Jelmer van Ast, and Robert Babuska. Reinforcement Learning, 2010. 97 98 BIBLIOGRAPHY

[9] BeagleBoard. BeagleBoard-xM Rev C System Reference Manual, 2010. URL http: //beagleboard.org/static/BBxMSRM_latest.pdf.

[10] Gene Sally. Pro Linux Embedded Systems. Apress, 2010. ISBN 9781430272274. URL http://ebooks.cambridge.org/ref/id/CBO9781107415324A009.

[11] Texas Instruments. DM3730, DM3725 Digital Media Processors Datasheet, 2011. URL http://www.ti.com/lit/ds/sprs685d/sprs685d.pdf.

[12] Matthew R Guthaus, Jeffrey S Ringenberg, Dan Ernst, Todd M Austin, Trevor Mudge, and Richard B Brown. MiBench : A free , commercially representative embedded benchmark suite The University of Michigan Electrical Engineering and Computer Science. In IEEE 4th Annual Workshop on Workload Characterization, 2001. doi: 10.1109/WWC.2001.15.

[13] A Carroll and G Heiser. An analysis of power consumption in a smartphone. In Proceedings of the 2010 USENIX conference on . . . , 2010.

[14] Aaron Carroll and Gernot Heiser. The Systems Hacker’s Guide to the Galaxy Energy Usage in a Modern Smartphone. In Proceedings of the 4th Asia-Pacific Workshop on Systems, volume 9300, pages 5:1—-5:7, 2013. ISBN 978-1-4503-2316- 1. doi: 10.1145/2500727.2500734.

[15] Aqeel Mahesri and Vibhore Vardhan. Power consumption breakdown on a modern . In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 3471 LNCS, pages 165–180, 2005. ISBN 3540297901. doi: 10.1007/11574859\ 12.

[16] Michelle Bailey. Infrastructure Convergence: The Integration of Technol- ogy to Lower Cost and Improve Business Response IDC OPINION, November 2009. URL http://ftp.hp.com/pub/c-products/solutions/IDC_Converged_ Infrastructure_White_Paper.pdf.

[17] Venkatesh Pallipadi and Alexey Starikovskiy. The ondemand governor. In Pro- ceedings of the Linux Symposium, 2006. URL http://scourge.fr/mathdesc/ documents/kernel/linuxsymposium_procv2.pdf#page=223.

[18] Venkatesh Pallipadi, Shaohua Li, and Adam Belay. cpuidle: Do Noth- ing, Efficiently. Proceedings of the Linux Symposium, pages 119–125, 2007. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q= intitle:cpuidle+-+Do+nothing,+efficiently...#0.

[19] Tajana Simunic, Luca Benini, Peter Glynn, and Giovanni De Micheli. Dy- namic power management for portable systems. In Proceedings of the 6th an- nual international conference on Mobile computing and networking - MobiCom ’00, pages 11–19, New York, New York, USA, 2000. ACM Press. ISBN 1581131976. BIBLIOGRAPHY 99

doi: 10.1145/345910.345914. URL http://portal.acm.org/citation.cfm?doid= 345910.345914.

[20] Ying Tan, Wei Liu, and Qinru Qiu. Adaptive power management using reinforce- ment learning. Proceedings of the 2009 International Conference on Computer- Aided Design - ICCAD ’09, page 461, 2009. doi: 10.1145/1687399.1687486. URL http://portal.acm.org/citation.cfm?doid=1687399.1687486.

[21] Hung-Cheng Shih and Kuochen Wang. An adaptive hybrid dynamic power man- agement algorithm for mobile devices. Computer Networks, 56(2):548–565, Febru- ary 2012. ISSN 13891286. doi: 10.1016/j.comnet.2011.10.005. URL http: //linkinghub.elsevier.com/retrieve/pii/S1389128611003719.

[22] Gaurav Dhiman and T.S. Rosing. System-level power management using online learning. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 28(5):676–689, May 2009. ISSN 0278-0070. doi: 10.1109/TCAD.2009.2015740. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=4838819http://ieeexplore.ieee.org/xpls/ abs_all.jsp?arnumber=4838819.

[23] Young-si Hwang, Sung-kwan Ku, and Ki-seok Chung. A predictive dynamic power management technique for embedded mobile devices. IEEE Transactions on Consumer Electronics, 56(2):713–719, May 2010. ISSN 0098-3063. doi: 10. 1109/TCE.2010.5505992. URL http://ieeexplore.ieee.org/lpdocs/epic03/ wrapper.htm?arnumber=5505992.

[24] Viswanathan Lakshmi Prabha and Elwin Chandra Monie. Hardware Architecture of Reinforcement Learning Scheme for Dynamic Power Management in Embedded Systems. EURASIP Journal on Embedded Systems, 2007:1–6, 2007. ISSN 1687- 3955. doi: 10.1155/2007/65478. URL http://www.hindawi.com/journals/es/ 2007/065478/abs/.

[25] Wei Liu, Ying Tan, and Qinru Qiu. Enhanced Q-learning algorithm for dynamic power management with performance constraint. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 602–605. European Design and Automation Association, 2010. ISBN 9783981080162. URL http://dl.acm.org/ citation.cfm?id=1871068.

[26] C. Gniady, A.R. Butt, and Y.C. Hu. Program counter-based prediction techniques for dynamic power management. IEEE Transactions on , 55(6):641–658, June 2006. ISSN 0018-9340. doi: 10.1109/TC.2006.87. URL http://ieeexplore. ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1628954.

[27] Khurram Bhatti, Cecile Belleudy, and Michel Auguin. Power Management in Real Time Embedded Systems through Online and Adaptive Interplay of DPM 100 BIBLIOGRAPHY

and DVFS Policies. 2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, pages 184–191, December 2010. doi: 10.1109/ EUC.2010.35. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper. htm?arnumber=5703515.

[28] Jie Chen, Deyuan Gao, and Qiaoshi Zheng. A research on an optimized adap- tive dynamic power management. In 2009 2nd IEEE International Conference on Computer Science and Information Technology, pages 52–55. Ieee, 2009. ISBN 978- 1-4244-4519-6. doi: 10.1109/ICCSIT.2009.5234610. URL http://ieeexplore. ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5234610.

[29] Yang Ge and Qinru Qiu. Dynamic thermal management for multimedia ap- plications using machine learning. In Proceedings of the 48th Design Automa- tion Conference on - DAC ’11, page 95, New York, New York, USA, 2011. ACM Press. ISBN 9781450306362. doi: 10.1145/2024724.2024746. URL http: //dl.acm.org/citation.cfm?doid=2024724.2024746.

[30] Advanced Configuration and Power Interface, 2015. URL http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf.

[31] Amit Kumar Singh, Anup Das, and Akash Kumar. Energy optimization by exploiting execution slacks in streaming applications on multiprocessor systems. In Proceedings of the 50th Annual Design Automation Conference, DAC '13, pages 115:1–115:7, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2071-9. doi: 10.1145/2463209.2488875. URL http://doi.acm.org/10.1145/2463209.2488875.

[32] Anup Das, Geoff V Merrett, and Bashir M Al-Hashimi. The Slowdown or Race-to-Idle Question: Workload-Aware Energy Optimization of SMT Multicore Platforms under Process Variation. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 535–538, 2016.

[33] H. Posadas, E. Villar, D. Ragot, and M. Martinez. Early modeling of Linux-based RTOS platforms in a SystemC time-approximate co-simulation environment. In Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC), 2010 13th IEEE International Symposium on, pages 238–244, May 2010. doi: 10.1109/ISORC.2010.18.

[34] Kihwan Choi, Wei-Chung Cheng, and Massoud Pedram. Frame-Based Dynamic Voltage and Frequency Scaling for an MPEG Player. Journal of Low Power Electronics, 1(1):27–43, April 2005. ISSN 1546-1998. doi: 10.1166/jolpe.2005.005. URL http://openurl.ingenta.com/content/xref?genre=article&issn=1546-1998&volume=1&issue=1&spage=27.

[35] Sung-yong Bang, Kwanhu Bang, Sungroh Yoon, and Eui-young Chung. Run-Time Adaptive Workload Estimation for Dynamic Voltage Scaling. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(9):1334–1347, September 2009. ISSN 0278-0070. doi: 10.1109/TCAD.2009.2024706. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5208582.

[36] Yan Gu and Samarjit Chakraborty. Control theory-based DVS for interactive 3D games. In Proceedings - Design Automation Conference, pages 740–745, New York, New York, USA, 2008. ACM Press. ISBN 9781605581156. doi: 10.1109/DAC.2008.4555917. URL http://portal.acm.org/citation.cfm?doid=1391469.1391659.

[37] Candy Pang, Abram Hindle, Bram Adams, and Ahmed Hassan. What do programmers know about software energy consumption? IEEE Software, pages 1–1, 2015. ISSN 0740-7459. doi: 10.1109/MS.2015.83. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7155416.

[38] Brian Carlson and Bill Giolma. SmartReflex Power and Performance Management Technologies: reduced power consumption, optimized performance. 2008. URL http://focus.ti.com/pdfs/wtbu/smartreflex_whitepaper.pdf.

[39] Luis Alfonso Maeda-Nunez, Anup Das, Rishad A Shafik, Geoff V Merrett, and Bashir M Al-Hashimi. PoGo: An Application-Specific Adaptive Energy Minimisation Approach for Embedded Systems. In EEHCO, pages 1–6, 2014.

[40] Asieh Salehi Fathabadi, Luis Alfonso Maeda-Nunez, Michael Butler, Bashir Al-Hashimi, and Geoff Merrett. Towards automatic code generation of run-time power management for embedded systems using formal methods. In 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-15), September 2015. URL http://eprints.soton.ac.uk/379127/.

[41] Rishad Ahmed Shafik, Anup K. Das, Luis Alfonso Maeda-Nunez, Sheng Yang, Geoff V. Merrett, and Bashir Al-Hashimi. Learning transfer-based adaptive energy minimization in embedded systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 1–14, August 2015. URL http://eprints.soton.ac.uk/374893/.

[42] Sascha Bischoff, Andreas Hansson, and Bashir M. Al-Hashimi. Applying of Quality of Experience to system optimisation. In 2013 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), pages 91–98. IEEE, September 2013. ISBN 978-1-4799-1170-7. doi: 10.1109/PATMOS.2013.6662160. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6662160.

[43] Michael Keating, David Flynn, Robert Aitken, Alan Gibbons, and Kaijian Shi. Low Power Methodology Manual For System-on-Chip Design. Springer, 2007.

[44] Low-power flow user guide, December 2010.

[45] Gaurav Dhiman and Tajana Simunic Rosing. Dynamic voltage frequency scaling for multi-tasking systems using online learning. In Proceedings of the 2007 international symposium on Low power electronics and design - ISLPED '07, pages 207–212, New York, New York, USA, 2007. ACM Press. ISBN 9781595937094. doi: 10.1145/1283780.1283825. URL http://portal.acm.org/citation.cfm?doid=1283780.1283825.

[46] Masakatsu Nakai, Satoshi Akui, Katsunori Seno, Tetsumasa Meguro, Takahiro Seki, Tetsuo Kondo, A. Hashiguchi, H. Kawahara, K. Kumano, and M. Shimura. Dynamic voltage and frequency management for a low-power embedded microprocessor. IEEE Journal of Solid-State Circuits, 40(1):28–35, January 2005. ISSN 0018-9200. doi: 10.1109/JSSC.2004.838021. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1374987.

[47] Texas Instruments. AM-DM37x Technical Reference, 2012. URL http://www.ti.com/litv/pdf/sprugn4r.

[48] Robert Oshana. DSP Software Development Techniques for Embedded and Real-Time Systems. Elsevier Inc., 2006. ISBN 9780750677592.

[49] Chi-Hong Hwang and Allen C.-H. Wu. A predictive system shutdown method for energy saving of event-driven computation. ACM Transactions on Design Automation of Electronic Systems, 5(2):226–241, April 2000. ISSN 10844309. doi: 10.1145/335043.335046. URL http://portal.acm.org/citation.cfm?doid=335043.335046.

[50] Ali Abbasian, Safar Hatami, A. Afzali-Kusha, Mehrdad Nourani, and Caro Lucas. Event-driven dynamic power management based on wavelet forecasting theory. In 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), number c, pages V–325–V–328. IEEE, 2004. ISBN 0-7803-8251-X. doi: 10.1109/ISCAS.2004.1329528. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1329528.

[51] Gaurav Dhiman and Tajana Rosing. Dynamic Power Management Using Machine Learning. In 2006 IEEE/ACM International Conference on Computer Aided Design, pages 747–754. IEEE, November 2006. ISBN 1-59593-389-1. doi: 10.1109/ICCAD.2006.320115. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4110117.

[52] Nevine AbouGhazaleh, Alexandre Ferreira, Cosmin Rusu, Ruibin Xu, Frank Liberato, Bruce Childers, Daniel Mosse, and Rami Melhem. Integrated CPU and L2 cache voltage scaling using machine learning. ACM SIGPLAN Notices, 42(7):41, July 2007. ISSN 03621340. doi: 10.1145/1273444.1254773. URL http://portal.acm.org/citation.cfm?doid=1273444.1254773.

[53] Saurabh Sinha, Jounghyuk Suh, Bertan Bakkaloglu, and Yu Cao. Workload-Aware Neuromorphic Design of the Power Controller. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 1(3):381–390, September 2011. ISSN 2156-3357. doi: 10.1109/JETCAS.2011.2165233. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6035994.

[54] Saurabh Sinha, Jounghyuk Suh, Bertan Bakkaloglu, and Yu Cao. Workload-aware neuromorphic design of low-power supply voltage controller. Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design - ISLPED '10, 1:241, 2010. doi: 10.1145/1840845.1840896. URL http://portal.acm.org/citation.cfm?doid=1840845.1840896.

[55] Saurabh Sinha, Jounghyuk Suh, Bertan Bakkaloglu, and Yu Cao. A workload-aware neuromorphic controller for dynamic power and thermal management. In 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pages 200–207. IEEE, June 2011. ISBN 978-1-4577-0598-4. doi: 10.1109/AHS.2011.5963936. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5963936.

[56] Amit Sinha and A.P. Chandrakasan. Dynamic voltage scheduling using adaptive filtering of workload traces. In VLSI Design 2001. Fourteenth International Conference on VLSI Design, pages 221–226. IEEE Comput. Soc, 2001. ISBN 0-7695-0831-6. doi: 10.1109/ICVD.2001.902664. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=902664.

[57] M. Najibi, M Salehi, A. Kusha, M. Pedram, S. Fakhraie, and H. Pedram. Dynamic Voltage and Frequency Management Based on Variable Update Intervals for Frequency Setting. In 2006 IEEE/ACM International Conference on Computer Aided Design, pages 755–760. IEEE, November 2006. ISBN 1-59593-389-1. doi: 10.1109/ICCAD.2006.320116. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4110118.

[58] Mostafa E. Salehi, Mehrzad Samadi, Mehrdad Najibi, Ali Afzali-Kusha, Masoud Pedram, and Sied Mehdi Fakhraie. Dynamic Voltage and Frequency Scheduling for Embedded Processors Considering Power/Performance Tradeoffs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 19(10):1931–1935, October 2011. ISSN 1063-8210. doi: 10.1109/TVLSI.2010.2057520. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5545490.

[59] Wolfgang Mauerer. Professional Linux kernel architecture. Wiley Publishing, Inc., 2010. ISBN 9780470343432. URL http://books.google.com/books?hl=en&lr=&id=-6zvRFEfQ24C&oi=fnd&pg=PT9&dq=Professional+Linux+Kernel+Architecture&ots=8DnACkWaUL&sig=6r3rAzOgTN5kZ36ElOgUaUDv2G4.

[60] Greg Kroah-Hartman. LINUX KERNEL in a Nutshell. O’REILLY, 2006.

[61] Peter Jay Salzman. The Linux Kernel Module Programming Guide. 2007.

[62] Muhammad Khurram Bhatti, Cécile Belleudy, and Michel Auguin. Hybrid power management in real time embedded systems: An interplay of DVFS and DPM techniques. Real-Time Systems, 47(2):143–162, January 2011. ISSN 09226443. doi: 10.1007/s11241-011-9116-y. URL http://www.springerlink.com/index/10.1007/s11241-011-9116-y.

[63] R.S. Sutton and A.G. Barto. Reinforcement learning: An introduction, volume 28. Cambridge Univ Press, 1998. URL http://journals.cambridge.org/production/action/cjoGetFulltext?fulltextid=34656.

[64] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, August 1988. ISSN 0885-6125. doi: 10.1007/BF00115009. URL http://www.springerlink.com/index/10.1007/BF00115009.

[65] C.H.C. Ribeiro. A tutorial on reinforcement learning techniques. In Supervised Learning track tutorials of the 1999 International Joint Conference on Neural Networks, 1999. URL http://www.ppgia.pucpr.br/~fabricio/ftp/PIBIC/Artigos/Ag_Aprendizagem/TAIC-tutorial_RL.pdf.

[66] Csaba Szepesvári. Algorithms for reinforcement learning. 2010. URL http://www.morganclaypool.com/doi/pdf/10.2200/S00268ED1V01Y201005AIM009.

[67] G. E. P. Box. Understanding Exponential Smoothing – A Simple Way to Forecast Sales and Inventory. Quality Engineering, 1990.

[68] Harriet Black Nembhard and Ming Shu Kao. Adaptive Forecast-Based Monitoring for Dynamic Systems. Technometrics, 45(3):208–219, August 2003. ISSN 0040-1706. doi: 10.1198/004017003000000032.

[69] Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman. Linux Device Drivers. O'REILLY, 3rd edition, 2005. ISBN 0596555385. URL http://books.google.com/books?id=M7RHMACEkg4C&pgis=1.

[70] Daniel P. Bovet and Marco Cesati. Understanding the Linux Kernel. O'REILLY, 3rd edition, 2005.

[71] Agner Fog. Optimizing software in C++, 2015. URL http://www.agner.org/optimize/optimizing_cpp.pdf.

[72] Agner Fog. Optimizing subroutines in assembly language, 2015. URL http://www.agner.org/optimize/optimizing_assembly.pdf.

[73] ARM. ARM Cortex-A8 Reference Manual, 2010.

[74] Arnaldo Carvalho de Melo. The new Linux 'perf' tools. 2010.

[75] Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. Collecting performance data with PAPI-C. In Tools for High Performance Computing 2009, pages 157–173. Springer, 2010.

[76] Heike McCraw, Joseph Ralph, Anthony Danalis, and Jack Dongarra. Power monitoring with PAPI for extreme scale architectures and dataflow-based programming models. In Cluster Computing (CLUSTER), 2014 IEEE International Conference on, pages 385–391. IEEE, 2014.

[77] perfmon2: improving performance monitoring on Linux, 2008. URL http://perfmon2.sourceforge.net/.

[78] ARM Limited. Cortex-A9 Technical Reference Manual, 2010.

[79] Robert C. Nelson. BeagleBoardUbuntu, 2013. URL http://elinux.org/BeagleBoardUbuntu.

[80] FFMPEG. URL http://www.ffmpeg.org.

Appendix A - Sample Shepherd Output Files

The following is a sample output file, generated automatically to track the performance of Shepherd. It logs every frame processed by the application, recording the frame number, the frame type, the completion time (in microseconds), whether the deadline was met, the predicted workload for the frame, the actual workload for the frame, and the frequency at which the frame was run. This Comma-Separated Values (CSV) file is created when the application starts and a new record is appended for every frame. A short parsing sketch is given below, followed by the sample log itself.
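As an illustration only, the following C program is a minimal sketch written for this appendix (it is not part of the Shepherd sources): it reads a log in the format described above from a file named on the command line and prints the number of frames, the percentage of frames that met their deadline, and the mean frame completion time. It assumes that DEADLINE_PASSED is 1 when the frame met its deadline, as in the sample data below.

#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <shepherd_log.csv>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char line[256];
    long frames = 0, met = 0;
    double total_time_us = 0.0;

    /* Skip the file-description line and the column-header line. */
    if (!fgets(line, sizeof(line), f) || !fgets(line, sizeof(line), f)) {
        fclose(f);
        return 1;
    }

    while (fgets(line, sizeof(line), f)) {
        long num, type_num, time_us, deadline_met, predicted, actual, freq;
        char type;

        /* One record per frame, matching the columns listed above. */
        if (sscanf(line, "%ld,%c,%ld,%ld,%ld,%ld,%ld,%ld",
                   &num, &type, &type_num, &time_us, &deadline_met,
                   &predicted, &actual, &freq) == 8) {
            frames++;
            met += deadline_met;           /* 1 when the deadline was met */
            total_time_us += (double)time_us;
        }
    }
    fclose(f);

    if (frames > 0)
        printf("frames: %ld, deadlines met: %.1f%%, mean frame time: %.0f us\n",
               frames, 100.0 * met / frames, total_time_us / frames);

    return 0;
}

Compiled with a standard C compiler (e.g. gcc -o shepherd_summary shepherd_summary.c), the program summarises one log file per run; the file name shepherd_log.csv used above is a placeholder for an actual Shepherd output file.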

VIDEO DATA. PID:4092. FILE: videos_paper//h264/vga/video5_h264_vga.mp4
FRAME_NUM,FRAME_TYPE,FRAME_TYPE_NUM,FRAME_TIME_MICROSECONDS,DEADLINE_PASSED,PREDICTED_WORKLOAD,ACTUAL_WORKLOAD,FREQUENCY
1,I,1,0,1,0,0,300000
2,I,1,0,1,0,5433316,600000
3,P,2,0,0,3259984,30343512,300000
4,B,3,31708,1,0,116255172,600000
5,B,3,23254,1,0,22489112,1000000
6,B,3,21026,1,13493456,41029912,1000000
7,P,2,69061,0,30015312,41796068,300000
8,B,3,26398,1,69753088,23360374,1000000
9,B,3,70557,0,37083760,31525938,300000
10,B,3,26581,1,33749056,23846062,1000000
11,P,2,27679,1,27807248,31855986,1000000
12,B,3,35522,1,41917456,33261682,600000
13,B,3,29114,1,30236480,24830506,800000
14,B,3,33997,1,26992880,27716538,600000
15,P,2,32959,1,27427056,24017628,800000
16,B,3,59387,0,36723984,30777774,300000
17,B,3,25726,1,25381392,20356236,1000000
18,B,3,34576,1,22366288,30800898,600000
19,P,2,37048,1,27427040,24460908,800000
20,B,3,65002,0,33156240,33990222,300000
21,B,3,31494,1,25647344,22055206,800000
22,B,3,37170,1,23492048,29392808,600000
23,P,2,36225,1,27032496,25958090,800000
24,B,3,36865,1,33656608,33173566,600000
25,B,3,36927,1,26387840,25767028,600000
26,P,2,45960,0,26015344,25742000,600000
27,B,3,37781,1,33366768,31294992,600000
28,B,3,37780,1,25851328,26350620,600000
29,B,3,35096,1,26150896,26459676,600000
30,P,2,34546,1,26336144,24790768,800000
31,B,3,62226,0,32123696,32065024,300000
32,B,3,62561,0,25408912,21212506,300000
33,B,3,33905,1,22891056,21322526,600000
34,P,2,35400,1,21949920,23921540,800000
35,B,3,34973,1,32088480,32450820,600000
36,B,3,36102,1,23132880,24501138,600000
37,B,3,29236,1,23953824,25309106,800000
38,P,2,35736,1,24766992,27563464,800000
39,B,3,37354,1,32305872,33369426,600000
40,B,3,67841,0,26444864,25916356,300000
41,B,3,36926,1,26127744,22912446,600000
42,P,2,30395,1,24198544,25623562,800000
43,P,2,41565,1,32944000,28683138,600000
44,B,3,36102,1,30387472,28492234,600000
45,B,3,35675,1,25053536,25350366,600000
46,B,3,33295,1,25231616,25088334,600000
47,P,2,29846,1,25145632,23632122,800000
48,B,3,35035,1,29250320,28179250,600000
49,B,3,61340,0,24237520,24634166,300000
50,B,3,33539,1,24475504,20947184,600000
51,I,1,138337,0,22358512,25332212,300000
52,P,2,38788,1,19510096,44081364,600000
...
690,B,3,26947,1,40763968,41778218,1000000
691,B,3,28137,1,41559568,41578226,1000000
692,B,3,26794,1,41570752,41623446,1000000
693,P,2,31005,1,41602352,42026928,1000000
694,B,3,27985,1,41372512,41432444,1000000
695,B,3,28107,1,41857088,41821292,1000000
696,B,3,26306,1,41835600,42107340,1000000
697,P,2,30243,1,41998624,41234106,1000000
698,B,3,27618,1,41408464,42026756,1000000
699,B,3,28686,1,41539904,41481830,1000000
700,B,3,26825,1,41505056,41422440,1000000
701,P,2,27221,1,41455472,41912570,1000000
702,B,3,27557,1,41779424,41742816,1000000
703,B,3,27130,1,41729712,41572410,1000000
704,I,1,55694,0,41635312,42755964,800000
705,P,2,27741,1,42532640,49156684,1000000
706,B,3,24658,1,41757456,33043368,1000000
707,B,3,25086,1,42307696,30209394,1000000
708,P,2,33661,1,35048704,40929476,800000
709,B,3,27954,1,36528992,33230516,800000
710,B,3,29633,1,38577152,33492902,800000
711,P,2,27618,1,35526592,33529458,1000000
712,B,3,24750,1,34549904,41123950,1000000
713,P,2,34332,1,34328304,41862642,800000
714,B,3,31097,1,38494320,33612010,800000
715,P,2,33143,1,38848896,32946288,800000
716,B,3,28412,1,35564928,33635596,800000
717,P,2,30914,1,35307328,33337320,800000
718,B,3,29664,1,34407312,33045544,800000
719,P,2,31250,1,34125312,33647502,800000
720,B,3,30060,1,33590240,33442002,800000
721,P,2,29327,1,33838608,33353570,800000

Appendix B - Publications

The following appendix presents the publications generated from this research.

Three publications have been generated:

• “PoGo: An Application-Specific Adaptive Energy Minimisation Approach for Embedded Systems”, published at HiPEAC 2015 [39]

• “Towards automatic code generation of run-time power management for embedded systems using formal methods”, published at MCSoC 2015 [40]

• “Learning transfer-based adaptive energy minimization in embedded systems”, published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, August 2015 [41]

PoGo: An Application-Specific Adaptive Energy Minimisation Approach for Embedded Systems

Luis Alfonso Maeda-Nunez, Anup Das, Rishad A. Shafik, Geoff V. Merrett, Bashir M. Al-Hashimi
School of Electronics and Computer Science, University of Southampton, UK, SO17 1BJ
{lm15g10,a.k.das,rishad.shafik,gvm,bmah}@ecs.soton.ac.uk

ABSTRACT Different control mechanisms for power management have been proposed in the literature. These studies can be clas- High performance demand coupled with the need for real- sified into two categories: performance requirement-agnostic time support, have proliferated the widespread use of battery- and performance requirement-aware. The former approaches operated embedded devices, comprising of one or more pro- are driven solely by energy savings achieved and hardware cessors, across consumer, automotive and commercial ap- utilization, so the application’s performance requirements plications. System software (such as the operating system) are not incorporated in the optimization algorithms; whereas for these devices offers a low-overhead interface to change in the latter, the application provides its timing constraints. the CPU voltage and frequency dynamically, satisfying a Dhiman et al. [7, 8] propose a requirement-agnostic power given performance requirement. This paper proposes PoGo, management algorithm that reads the hardware performance an approach for energy minimization of embedded systems. monitors (Clock Cycles, Cache Misses, etc.) to measure Contrary to existing approaches, which are performance re- “CPU intensiveness”. Based on an energy-performance trade- quirement-agnostic, PoGo adapts to application-specific per- off set by the user, the underlying control algorithm ad- formance requirements dynamically, and proactively selects justs the power management experts, or Voltage and Fre- the state that fulfils these requirements while consuming the quency (V-F) pairs accordingly. The Ondemand governor [13] least power. Proactiveness is achieved by using an Adap- is a popular Linux governor that reacts to current proces- tive Exponential Weighted Moving Average (AEWMA) al- sor workload to adjust V-F. This governor optimises energy gorithm that adapts to the selected power state. These to achieve a target idleness; however, if the performance of adaptations are facilitated using a model-free reinforcement an application changes within the current execution, either learning algorithm. For demonstration purposes PoGo is energy or performance are compromised. implemented as a Linux Governor, interfacing with the ap- plication and hardware to select an appropriate voltage- Another limitation of these approaches is that they are re- frequency control for the executing application. The per- active; decisions are taken after the change in workload has formance of PoGo is demonstrated on the BeagleBoard-xM, been detected, and thus more cycles are spent in a power which contains a Texas Instruments’ SoC featuring an ARM state that is not necessarily optimal. A common approach Cortex-A8 processor. Experiments conducted with multime- for system-level power management to overcome this limita- dia applications demonstrate that PoGo minimizes energy tion is to use workload prediction [2, 6, 9, 14, 15]. A survey consumption by up to 30% for dynamic workloads and 60% of different workload prediction schemes is presented in [14], for static workloads as compared to the existing approaches. highlighting the benefits of using adaptive filters. Work- load predictions are performed at a coarse-grained interval 1. INTRODUCTION (5 seconds). Although this approach is an example of proac- tive power management, it cannot be used for fine-grained In recent years the demand for portable battery-operated prediction due to the lag in the filter technique. There- devices has increased. 
The high performance requirement fore, this limits its use in video and other dynamic applica- for these devices, added to the limited energy supply, makes tions, where workload changes occur in a much shorter time performance-aware energy optimization a challenging design span [6, 9, 15]. An Exponential Weighted Moving Aver- objective [3]. Of the different components of an embedded age (EWMA) neuromorphic controller for workload predic- system, the microprocessor (CPU) consumes a significant tion is presented in [15], implemented as a hardware module, fraction of the total energy and, therefore, lends itself as a collecting performance readings every 0.2s. This technique primary target for energy optimization. Dynamic Voltage provides higher accuracy for energy efficiency, but suffers and Frequency Scaling (DVFS) and Dynamic Power Man- from the application performance requirement agnostic na- agement (DPM) are two hardware techniques for reducing ture as discussed before. power consumption in the CPU, the former reducing perfor- Recently, significant studies have been conducted for per- mance together with frequency and voltage, and the latter formance requirement-aware applications. These studies show shutting down unused cores. These techniques rely on con- that frame-based applications allow easy integration of per- trol by the Operating System (OS) or the software. OS ven- formance constraints to the power management algorithm, dors provide interfaces, enabling developers to control DVFS achieving real-time performance. Frame processing time is and DPM; an example is the Advanced Configuration and specified as a deadline, the inverse of which gives the frames Power Interface (ACPI). per second. The approach of [4] uses experts (similar to [7]), but applied to real-time systems. The technique uses DVFS and DPM together to consider a task’s worst-case execution time and deadline, making the algorithm a deter- Permission to make digital or hard copies of all or part of this work for ministic selection of the power states. Soft real-time sys- personal or classroom use is granted without fee provided that copies are tems, however, need a deadline to adjust and schedule their not made or distributed for profit or commercial advantage and that copies workload, but this can be occasionally missed, degrading bear this notice and the full citation on the first page. To copy otherwise, to the quality of experience (non catastrophic). Frame-based republish, to post on servers or to redistribute to lists, requires prior specific applications, for example multimedia processing, are con- permission and/or a fee. sidered as soft real-time, as the loss in performance results Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00. in a lower frame rate. Choi et al. [6] presents a workload prediction-based power manager for video processors using Workload Workload Application Requirement Type EWMA. This approach provides high energy efficiency, but Application is specific only to video decoding. Gu et al. [9] presents a Layer control algorithm for DVFS using frame workload prediction Workload Workload Prediction Unit for video games. This is implemented as a Windows Appli- Requirement Type cation Programming Interface (API). The approach in [2] OS Predicted Workload improves [6] by using Kalman Filters. Thus, all existing Run-Time Manager approaches lack a general framework that works uniformly Layer across applications. 
To address this limitation, we present Decision Unit Timing Prediction Voltage/Frequency PoGo, a unified power management scheme that supports Error Error Setting multiple applications whilst providing energy efficiency and delivering the required performance. Prediction Timing Voltage/Frequency Hardware CPU Error Error Setting The contributions of this paper are: Layer Decision Phase Learning Phase • a reinforcement learning-based approach to control the voltage and frequency of the processing cores, specific Figure 1. Run-Time Management Unit in the cross-layer approach to the application; 100 • an AEWMA-based prediction algorithm to forecast e

workload variations; and valu 80

• a thorough validation of the approach through its im- 60 Parameter

plementation as a Linux governor, together with an λ λ Transition API allowing applications to pass performance require- 40 0 50 100 150 200 250 ments and annotations to the governor. Frame number Figure 2. AEWMA λ parameter change at transitions 2. DESIGN 2.2 Prediction Unit In this section the design of PoGo the Run-time Manager (RTM) is described, which involves workload characteriza- The Prediction Unit estimates the workload for the next tion together with the appropriate V-F selection. Figure 1 frame using a modified form of EWMA. The EWMA algo- shows PoGo as a cross-layer framework, interacting with the rithm is widely used in the literature [6, 14, 15] because of application, OS and hardware. The communication between its lightweight implementation. The predictor works as an layers is indicated by arrows. infinite impulse response filter that generates a prediction of the future value based on the average of the previous values weighted exponentially, where the most recent values have 2.1 Run-Time Manager greater weights than the older ones. This is shown as: As discussed in Section 1, real-time applications need to complete the execution of a workload (CPU Cycles) within w(n + 1) = w(n) · λ +w ¯ · (1 − λ) where 0 ≤ λ ≤ 1 (1) a predefined deadline. The power minimization objective where w(n) is the current workload at time instance n mea- for these applications translates into the solution to a con- sured from the hardware,w ¯ is the average workload in the strained optimization problem. To provide an effective so- time interval 0 to n, w(n + 1) is the predicted workload at lution to this problem, there are two requirements to be time n + 1, and λ is the weighting factor. After the pre- fulfilled: 1) the workload needs to be known prior to its diction has been set, the meanw ¯(n) is updated with the processing such that it can be performed at the lowest V-F prediction according to: value, and 2) once the workload is known (to a certain ex- tent), decisions on the power state have to be taken in such w¯ = w(n + 1) (2) a way that they fulfil the constraint but take into account performance variation of the application. The parameter that controls the relevance of the past his- To address these requirements, we present PoGo, a RTM tory is the prediction weight λ. At a high λ, recent his- that resides in a general purpose OS. To achieve the first tory data is weighted more heavily than older history, and objective, PoGo predicts the next workload for a frame us- this helps EWMA to react quickly to changes, but it be- ing AEWMA, while for the second objective, PoGo uses comes volatile for random fluctuations. So as the parameter Q-learning, an algorithm of reinforcement learning. λ decreases, the older history data becomes more relevant, smoothing local variations, reacting slower to changes [5]. The algorithm for PoGo is shown in Algorithm 1. For In multimedia and other dynamic applications a substan- every new frame, PoGo first predicts the workload, based tial transition can be observed [12]. The prediction error on this it selects a V-F value. After processing the frame, increases at the beginning of these transitions. The tradi- the performance is determined to fine tune the prediction tional EWMA algorithm is modified to take these transi- and the decision algorithms (in their respective Units). The tions into account. This new prediction approach is called two key steps of PoGo, prediction and decision making are the AEWMA. Based on the work by Nembhard [12], once discussed next. 
a transition in the workload is detected, the parameter λ is increased to make recent history more relevant. λ is sub- Algorithm 1 PoGo Power Manager sequently adjusted to its initial value using an exponential 1: Prediction Unit.Initialise(n WorkloadTypes) decay function. Figure 2 shows the modification of λ on an 2: Decision Unit.Initialise(WorkloadRequirement) application with 4 transitions. The selection of the initial 3: for every New Epoch do value for λ is explored and justified in Section 4.1. 4: Prediction Unit.PredictWorkload(WorkloadType) Another modification to this filter is performed, where 5: Decision Unit.MapWorkload(PredictedWorkload) frames (workloads) of the same type are grouped together, 1 6: Decision Unit.SelectPowerState(V-F) so predictions are performed on workloads of the same type 7: Wait until end of frame e.g., for type 1, w1(n + 1) is predicted with w1(n) andw ¯1. 8: Prediction Unit.UpdatePrediction(PredictionError) Thus, for M different frame types, there are M different pre- 9: Decision Unit.UpdateQ-Table(TimingError) dictions, implying that in order to predict the next workload, 10: end for 1For a video processing, workload type translates to the frame type i.e., I, P and B frames. the predictor requires information of the workload type and A) B) the last workload (of the same type). The workload predic- Explorationphase Exploitationphase tion error is dependent on the choice of λ, which is dependent on the application. To improve the prediction accuracy, the prediction unit reads back the hardware performance coun- ters to adjust the prediction weight λ. Note that this filter is very lightweight not only in number of operations per Several epoch, but in its memory usage, asw ¯ contains the previous epochs information for that particular workload type. later

2.3 Decision Unit Figure 3. Q-Table during A) exploration and B) exploitation phases. Once the workload for the future frame is predicted, the De- The red boxes represent the best Action for each State. cision Unit selects a V-F pair to execute it. This selection is based on the performance constraint given by the appli- 2.4 Application Programming Interface cation. The decision unit uses Q-Learning (Reinforcement (API) Design Learning), and builds the model of the system online. The predicted workload corresponds exclusively to the applica- We implemented PoGo as a low-overhead API that enables tion that communicates with the RTM, and does not include the programmer to take control of the RTM from the ap- the system-software overheads and other application loads in plication. The three signals that PoGo requires from the the prediction. Thus, V-F pairs cannot be directly mapped application are the performance requirement, the task an- to a predicted workload using a deterministic algorithm. notations and the Start/Stop signals. In practical terms, these are defined as: The objective of Reinforcement Learning (RL) is to learn to make better decisions under variations. Decisions in re- • Performance requirement: PoGo requires a dead- inforcement learning terminology are known as an actions, line per application execution segment, or a constant and the environment is known as states. Originally there deadline defined as a frame rate. is no knowledge of the system, so the decision unit must start exploring decisions in different states to find the opti- • Task annotations: In order to make better predic- mal (or most suitable) action for a particular chosen state. tions, the programmer may be able to separate dif- This is called the Exploration phase. Exploration is done ferent application segments or to define a particular by taking a random action for a selected state. Good ac- workload as a ‘workload type’. tions are rewarded and bad actions are penalized. Actions in this context, are the V-F pairs, and states are the different • Start/Stop signals: These are signals that alert the amounts of workload the system may have. It is important RTM when the application has started its main loop, to note that the V-F pairs are discrete, so the best decision and when it finishes. may not be optimal, but it is the best among the V-F pairs available. As an example, let the optimal frequency for a given workload be 533.35MHz; if the CPU supports only 2.5 FFT Case study: Changes to inte- 300MHz, 600MHz, 800MHz and 1GHz, the best decision is grate PoGo API to execute the workload 600MHz. The ‘best’ in the context of this paper is defined as the lowest V-F pair that fulfills Let us consider the fft application [10] to highlight the changes the performance requirement. needed in the application for use with PoGo. The main loop of the application executes 100 times, changing to three Learning is stored as values in a Q-Table, which is a lookup different window sizes. The three different window sizes table with values corresponding to all State-Action pairs. At correspond to three different workload types. The appli- each decision epoch2, the decision taken for the last frame is cation contains a Program Header, which is responsible for evaluated; the reward or penalty computed is added to the flag parsing, parameter definitions, memory allocations, etc. corresponding Q-Table entry, thereby gaining experience on This is followed by the fft Program Main Loop executing the decision. 
The rate at which actions are rewarded in the the fft computation. Finally, there is the Program Footer, Q-Table is determined by the Learning Rate, α, which deter- responsible for memory freeing, file saving, etc. This is rep- mines the relative importance of older decisions compared to resented in Figure 4(A). the newer ones. Initially, the decisions of the algorithm are In order to use PoGo, different sections of the code are an- not optimal. However, with time (after several epochs), the notated, enabling the application to communicate with the confidence in the selected action improves and the algorithm RTM. Figure 4(B) shows these annotations. The Program always selects the best action in a given state. This phase Header is modified to send the performance constraint to- of the algorithm is called the Exploitation phase. Figure 3 gether with the Start signal. In the Program Main Loop, shows the evolution of the Q-Table. Initially, the values in the ‘New Frame’ signal is sent to PoGo, which includes the the Q-Table are all zeros (Figure 3(A)); subsequently, in the annotation of which ‘workload type’ is to be processed. Af- exploitation phase, the ‘best’ actions are determined (high- ter the Main Loop finishes, the Stop signal is sent to alert lighted in red in Figure 3(B)). PoGo to end listening for new frames. The annotations are designed to be minimal and completely unobtrusive to the The transition from exploration to exploitation is not im- rest of the application code. mediate, but is a gradual change, defined as the -greedy strategy, in which the exploration-exploitation ratio () is gradually increased to reduce the random decisions in favor 3. IMPLEMENTATION of appropriate decisions3. The availability of  makes ‘re- learning’ a feasible operation, especially for dynamic systems The implementation of the RTM as a Linux Governor along in which the best Action for a particular State may change with the API is highlighted here. gradually. If relearning is needed, the  may be reduced to allow for more exploration to take place. 3.1 Run-Time Manager Implementation PoGo is designed as a POwer GOvernor for Linux, which is a 2In reinforcement learning terminology, the interval at which the Loadable Kernel Module, a section of code that extends the algorithm is triggered is known as decision epoch. functionality of the OS. PoGo can be enabled in a similar 3Appropriate decisions are those that reduce the energy consump- manner as other Linux governors e.g. Ondemand, Conserva- tion, while satisfying the performance. tive and Performance; for version 3.7.10 of the Linux Kernel. A)B) Power Mode Frequency Voltage (V ) Current (mA) Power (mW ) OPP50 300MHz 0.93 151.62 141.01 Program Header Program Header OPP100 600MHz 1.10 328.79 361.67

Setup and Start OPP130 800MHz 1.26 490.61 618.17 PoGo OPP1G 1GHz 1.35 649.64 877.01 Table 1. DM3730 Specifications (ARM Cortex-A8) [16] New frame

Adapting code for Program Main loop No Program Main loop No module uses a system call directly to hardware. Physically PoGo utilisation this call sends a request to the external Power Management (PMIC) to change the Voltage and Fre- Finished Finished quency (V-F) pairs. In Figure 5, the arrow towards the looping? looping? CPU Frequency knobs is bidirectional, because the PMIC

Yes responds with a success/failure signal, which in turn alerts Yes the module with the status of the frequency change com- Stop PoGo = Added code mand. The Sysfs interface shown on Figure 5 is used for Program Footer Program Footer parameter tuning (similar to other Linux governors). Note that in order to reduce the governor execution overhead, the Figure 4. Adaptation of program code for utilisation of the PoGo use of floating point operations is avoided, as context switch- Run-Time Manager ing for the Floating-Point Unit (FPU) (between User-space and Kernel mode) is time consuming. Integers are used, and User-space multiplication/divisions are realized using shift operations. API functions void configure_governor(fps, num_workload_types); void start_governor(); void new_frame(workload_type); 4. EXPERIMENTAL RESULTS void stop_governor();

Sysfs Interface Experiments are conducted on the BeagleBoard-xM (BBxM) Parameter tuning: Ioctl Interface fps, learning rate, User-space control embedded platform, which contains a TI OMAP DM3730 [16] frequency functions SoC with an ARM Cortex-A8 processor. The platform runs Kernel mode Linux Operating System 3.7.10 together with the Ubuntu Governor Kernel Module Performance Counters Kernel Module 12.04 distribution [11]. Table 1 lists the CPU specifications. Workload Prediction functions Communication via Performance Counter functions (Kernel Symbols) long predict_workload(frame_type); KERNEL SYMBOLS void select_counter( counter_id, new_event); void update_predictions(real_workload); void start_counters( ); void read_diff_ccnt( *count); 4.1 Prediction Unit void read_counter( counter_id, *count); Reinforcement Learning functions void stop_counters( ); unsigned int map_state( predicted_WL, current_slack); As shown in [2], variation in the workload of an application is unisgned int take_action( state); int calculate_reward( slack_ratio, prev_slack_ratio); dependent on the workload “type” i.e., for video processing, void update_Qtable( past_state, past_action, reward); frames of the same type present low variations. Therefore,

Communication with registers for accurate predictions, the PoGo governor requires the workload type as one of its parameters every new epoch. The Hardware AEWMA filter (refer to Section 2.2) starts with a steady- state weight (λ). This is shown in Figure 2 by the blue line CPU Frequency knobs Performance Counter registers starting at the value of 60%. At every transition (indicated Figure 5. PoGo governor implementation in the figure by the red solid lines) , λ is increased to 100% to give all the weight to the current frame only and ignore previous history. Subsequently, the λ value is restored back Apart from the intrinsic differences of PoGo with the other i governors regarding its own functionality, one of the main to its steady-state using an exponential decay of 2 . In or- features of PoGo is its ability of communicating with user der to use the AEWMA filter, the optimal parameter for mode via an Application Programming Interface (API). This the steady-state λ (60% in the figure) is obtained by ana- enables an application developer to have control of the gov- lyzing different workloads, as shown in Figure 6. The Mean ernor from the application. Communication using the API Absolute Percentage Error (MAPE) is plotted for different is done using a system call named ioctl, which enables a link steady-state values of the parameter λ run with 5 different between User-space and Kernel mode. Figure 5 shows the VGA (640x480) resolution videos of H.264 encoding. The complete implementation of the PoGo governor. The imple- videos represent dynamic workloads, as each frame presents mentation consists of three sections – the governor module, variations of its own. It can be seen that, beyond steady- the API and the Performance Counters module. state λ = 40%, the MAPE is reduced below 4%. All five videos are executed for 720 frames (30 seconds for a 24 fps The governor module selects action based on predicted workload. The two main task of the governor are work- video). For these training sets, a value of λ ≥ 60% gives the load prediction (using AEWMA) and decision making (us- best result in terms of MAPE. Finally, using this steady- state λ = 60%, Figure 7 shows the real workload (red) and ing Reinforcement Learning). As discussed in Sections 2.2, the predicted workload for the next frame requires the real the predicted workload (green) for a segment of VGA video. workload for the previous frame, which comes in the form of performance counters. The Performance Counters module 4.2 Decision Unit implements an interface accessible from the governor (and User-space) in order to be able to use the available Perfor- Figure 8 shows the evolution of the Q-values corresponding mance Counter hardware. This hardware normally comes as to four different actions of one of the Q-Table states (state a coprocessor adjacent to the CPU cores [1]. On the ARM 4). Results are shown for a VGA video executed for 800 Cortex A8 [1], the System Control Coprocessor contains the frames. As can be seen from the figure, the Q-value for System Performance Monitor, which can detect up to 5 dif- an action changes as the number of frames up to around ferent events simultaneously (including a Cycle Counter). 500 frames for most actions. This duration is referred to as the Exploration Phase of the algorithm, where Q-values are To collect the workload of the currently processed frame, modified by applying a reward/penalty. 
Beyond this point, the governor communicates using Kernel Symbols with the the algorithm transits to the Exploitation Phase, where no Performance Counter module. The governor has access to further update of Q-values takes place. It is worth noting all functions available on the Performance Counters module, that the Q-Learning algorithm works by selecting the high- in order to configure which counters to use, and to request est action for a state and therefore, in state 4 of the Q-Table, a counter reading. In order to change V-F the governor action 2 (for 800 MHz) is selected in the exploitation phase. 10 25 video1 s) 9 video2 (fp 20 video3 8 video4 video5 15 Ondemand 7  !"#$%&"' ($)* PoGo Performance 10 6 0 100 200 300 400 500 600 700 Frame Number 1 5 (W) tion 4 0.5 3 Consump

2 0 Power 0 100 200 300 400 500 600 700 MAPE (Mean Absolute Percentage Error) 1 Frame Number

0 Figure 9. Performance and Power Consumption of the Governors 20 30 40 50 60 70 80 PoGo and Ondemand for an H.264 Video Steady−state Lambda λ Figure 6. Effect of steady-state weight of λ for AEWMA on predic- • Userspace600: static, sets the CPU at 600MHz. tion error using dynamic workloads • Userspace800: static, sets the CPU at 800MHz. x 10 7 • Performance: static, sets the CPU at maximum fre- 4.5 Real Workload quency (1GHz). Performance consumes the most power. Predicted Workload

4 • Ondemand/Conservative: dynamic governors that vary CPU frequency based on CPU idleness. The two governors differ from one another in the selection of 3.5 the frequency steps.

To demonstrate the power and performance trade-off of 3 PoGo, the application Mplayer is selected, which uses the FFmpeg library of video codecs. The application is modified Workload (CPU Cycles) (CPU Workload to include the performance constraint as well as the task an- 2.5 notations as discussed in Section 2.5. For this experiment two codecs are tested – H.264 and MPEG4 decoders. Re- sults are presented for five videos per codec, with each video 440 460 480 500 520 540 560 580 600 decoded for 720 frames. This corresponds to 30 seconds of Frame Number video playback at 23.98 frames-per-second (fps). Each video Figure 7. Comparison of real workload vs. predicted is composed of 3 different frame types – I, P and B, which represent different workload types for the governor. Energy and power consumption are estimated using values from the 1 BBxM datasheet, summarised in Table 1. The power values 0.8 for the A8 core are reported by running the Dhrystone work- loads. An example computation is provided: for a frame 0.6 with the CPU frequency set at 1GHz, the power consump- 0.4 tion for that frame is 877.01mW (row 5, column 5). Figure 9 compares the PoGo and Ondemand governors in 0.2 terms of performance and power consumption. Good per- 0 formance, in this case can be considered by its ability to de- liver the target fps for a particular video, which in this case −0.2 is 23.98 fps. The video decoder decodes frames in real time −0.4 with a one frame buffer, therefore only one frame is decoded 0 − 300MHz and displayed in a single epoch. This implies that the slack is Q-Value(Suitabilityofaction) −0.6 1 − 600MHz not accumulative, and is from the decoded frame only i.e., if 2 − 800MHz a frame is decoded in more than 41.7ms (1/23.98), there is a −0.8 3 − 1GHz glitch in the video (giving rise to performance loss). Power −1 0 100 200 300 400 500 600 700 800 consumption is measured in Watts (W ), which represents FrameNumber the amount of instantaneous power used during a particular frame decoding, therefore total energy consumption is the Figure 8. Q-values for different actions in state 4 area below the curves. It can be seen from the figure that the total energy consumption of PoGo is lower than Ondemand, 4.3 Case study: Run-Time Manager For except during the frame intervals 13-22 and 118-124 (both Video Decoding these intervals are part of the learning phase). The learning phase (in orange) shows the period where PoGo is learn- As mentioned on Section 3.1, PoGo is implemented as a CPU ing the optimal power states for these workloads, by taking power governor, alongside the other 5 available governors random decisions. At around 125 frames, PoGo is able to – Performance, Powersave, Ondemand, Conservative and identify optimal actions and therefore starts taking better Userspace (600 and 800) as described below. decisions. After this learning phase, PoGo enters into ex- ploitation mode, “exploiting” the most adequate decision for • Powersave: a static governor (does not change fre- every situation. As the workload is dynamic in nature, PoGo quency over time) that sets the CPU at lowest possible continues to explore (sporadically) even in the exploitation frequency (300MHz for the BeagleBoard). phase, resulting in a small performance penalisation. 
100 Objective: 8fps Objective: 10fps Governor Performance Normalised Performance Normalised (fps) Energy (fps) Energy 90 powersave 4.5 16 4.5 16 userspace 600 8.0 41 9.1 41 80 Powersave userspace 800 8.0 70 10.0 70 Userspace 600 performance 8.0 100 10.0 100 70 Userspace 800 conservative 8.0 98 10.0 98 Performance ondemand 8.0 100 10.0 100 PoGo 7.5 40 9.7 67 60 Conservative Ondemand Table 2. FFT benchmark [10] performance vs. energy results. Normalised Performance 50 PoGo decision epoch). For this particular board, the maximum 20 40 60 80 100 frequency change overhead is recorded as 0.42ms, with a Normalised Energy mean overhead of 0.25ms. This constitutes ≈ 0.6% of the frame decoding time, assuming a 24fps video. In terms of Figure 10. Pareto graph comparing different Governors vs. PoGo by memory usage, the PoGo governor uses 14kB of memory and means of energy and performance for H.264 workloads does not use dynamic memory allocation for the Q-Table. The Figure 10 plots the Pareto graph of the H.264 codec 5. CONCLUSION running 5 videos (represented as a point in the figure). En- We have presented PoGo, a Run-time Manager for application- ergy and performance are both normalized: a performance specific dynamic energy minimization of embedded systems. of 100% implies the the video is running at maximum frame Energy savings of 30% are achieved by predicting the cor- rate. Energy is calculated as total energy consumption of rect workload using AEWMA and reducing the system volt- the video normalized to the highest energy (corresponding age and frequency using reinforcement learning to adapt to maximum V-F pair). As can be seen, the static gover- to workload and performance variations. Experiments con- nors have constant energy consumption regardless of their ducted with static and dynamic workloads demonstrate the performance. The ideal sector in the figure is the top left advantage of PoGo as compared to the existing governors. corner of the Pareto graphs, which corresponds to the low- An API for utilizing PoGo is introduced, allowing applica- est energy consumption with highest performance. In Fig- tions to be modified to work with performance constraints ure 10 it can be seen that PoGo (shown in green circles) for power savings. For heterogeneous behaviour, integration performs better than the other dynamic governors (Conser- of DSP acceleration with PoGo is work in progress. vative and Ondemand), as the performance loss is lower. The energy consumption of PoGo is significantly lower as Acknowledgments compared to Ondemand (shown in blue squares) and Con- This work was supported in parts by Centro Nacional de servative (shown in magenta stars). It can be noted that a Ciencia y Tecnolog´ıa (CONACYT) of Mexico and by the minimum frequency of 800MHz ensures 100% performance Engineering and Physical Sciences Research Council Pro- and is only achieved by PoGo. gramme Grant, EP/K034448/1. See www.prime-project.org for more information about the PRiME programme. 4.4 PoGo on Static Workloads References To demonstrate the adaptability of PoGo to static work- load applications, the “fft” workload is considered from the [1] ARM. ARM Cortex-A8 Reference Manual, 2010. [2] S.-y. Bang, K. Bang, S. Yoon, and E.-y. Chung. Run-Time Adap- MiBench benchmark[10]. This application is modified us- tive Workload Estimation for Dynamic Voltage Scaling. IEEE ing the API for notifying the governor with the performance TCAD, 28(9):1334–1347, Sept. 2009. constraint and start of frames as illustrated in Section 2.4. 
[3] L. Benini, A. Bogliolo, and G. De Micheli. A survey of design In order to provide a frame-oriented approach, the appli- techniques for system-level dynamic power management. IEEE cation is executed multiple times in a loop, with each loop TVLSI, 8(3):299–316, June 2000. representing a frame. A total of 700 frames (loops) are exe- [4] M. K. Bhatti, C. Belleudy, and M. Auguin. Hybrid power man- agement in real time embedded systems: an interplay of DVFS cuted. In order to test the adaptability of PoGo to changes and DPM techniques. RTS, 47(2):143–162, Jan. 2011. in the performance constraints, the workload is kept con- [5] G. E. P. Box. Understanding Exponential Smoothing – A Simple stant and the performance target (objective) is varied by Way to Forecast Sales and Inventory. Quality Engineering, 1990. selecting between 8, 10, 12, and 16 fps. Table 2 shows the [6] K. Choi, W.-C. Cheng, and M. Pedram. Frame-Based Dynamic results of the performance-energy trade-off corresponding to Voltage and Frequency Scaling for an MPEG Player. JOLPE, 1(1):27–43, Apr. 2005. these performance constraints. [7] G. Dhiman and T. Rosing. System-level power management us- The performance constraints can be interpreted as follows: ing online learning. IEEE TCAD, 28(5):676–689, May 2009. the larger the performance constraint, the stricter the dead- [8] G. Dhiman and T. S. Rosing. Dynamic voltage frequency scal- line, i.e., the workloads need to be processed in a smaller ing for multi-tasking systems using online learning. In ISLPED, pages 207–212, New York, New York, USA, 2007. ACM Press. time. Intuitively, a high performance constraint requires [9] Y. Gu and S. Chakraborty. Control theory-based DVS for inter- higher frequencies. This can be seen with the static gov- active 3D games. In DAC, page 740, New York, New York, USA, ernors, particularly userspace 800, which satisfies the first 2008. ACM Press. three performance targets, but fails to achieve 16 fps (refer [10] M. R. Guthaus et al. MiBench: A free, commercially represen- tative embedded benchmark suite The University of Michigan to Table 2). It can also be seen that for static workloads, a Electrical Engineering and Computer Science. In Workshop on static frequency proves to be optimal for each target. How- Workload Characterization, 2001. ever, the static workload value cannot be known beforehand [11] R. C. Nelson. BeagleBoardUbuntu, 2013. and therefore, the desired static governor cannot be set prior [12] H. B. Nembhard and M. S. Kao. Adaptive Forecast-Based Mon- to fft execution. PoGo, on the other end, identifies the op- itoring for Dynamic Systems. Technometrics, 45(3):208–219, Aug. 2003. timal frequency during the exploration phase. [13] V. Pallipadi and A. Starikovskiy. The ondemand governor. In Proceedings of the Linux Symposium, 2006. [14] A. Sinha and A. Chandrakasan. Dynamic voltage scheduling us- 4.5 Overheads ing adaptive filtering of workload traces. In VLSI Design, pages 221–226. IEEE Comput. Soc, 2001. The implementation of the algorithm as a Linux governor [15] S. Sinha, J. Suh, B. Bakkaloglu, and Y. Cao. Workload-Aware has negligible overhead. Running on lowest frequency on Neuromorphic Design of the Power Controller. IEEE JETCAS, 1(3):381–390, Sept. 2011. the BBxM (300MHz), the algorithm takes 11.5µs to com- [16] Texas Instruments. AM/DM37x Power Estimation Spreadsheet, pute (from the time of the system call to the end of the 2011. Towards Automatic Code Generation of Run-Time Power Management for Embedded Systems using Formal Methods

Asieh Salehi Fathabadi, Luis Alfonso Maeda-Nunez, Michael J Butler, Bashir M Al-Hashimi, and Geoff V Merrett Electronics and Computer Science University of Southampton asf08r, lm15g10, mjb, bmah, gvm @ecs.soton.ac.uk { } Abstract—Run-Time Management (RTM) systems are used can be used to ensure the correctness and consistency of the to control energy hooks at run-time to minimise the energy model. We use the Event-B formal method [6] to model and consumption of embedded systems with single and many-core verify a RTM system. The use of formal methods [7] helps to processors. Typically, such RTM systems are aware of application reduce costs by identifying specification and design errors at requirements and utilise workload prediction and machine learn- early development stages before implementation when they are ing algorithms to estimate the optimal configuration. An RTM cheaper to fix. The verified Event-B model of the RTM system mechanism should not compromise the reliability or performance of the platform it is managing. Because of the potential complexity can be translated into executable code which can be executed and interaction with the platform and its applications, we are on the hardware. Moreover, representing runtime algorithms using rigorous design methods that allow us to master the more abstractly allows us to target different architectures and complexity and verify the correctness of our designs in a formal OS through code generation. This has the potential for future way. The formal RTM design can be verified earlier in the devel- savings allowing the same algorithm to be portable across opment process before implementation, which early verification different architectures and OS by tailoring the code generation. can reduce the cost of fixing potential failures which can be very demanding in testing the system after implementation. In This paper reports on our initial experimentation of au- addition, the formal model of a RTM system can be automatically tomatic code generation of a RTM system from Event-B translated into executable code to be executed on the hardware. models. The RTM system performs DVFS for a video decoder Automatic code generation reduces the efforts of hand-coded application. The RTM system is modelled by Event-B, and is implementation and is portable across different architectures and verified by theorem proving and model checking techniques. Operating Systems (OSs). In this paper we propose a formal The main contribution of this work is experimenting with approach toward automatic generation of RTM system code, for the automatic generation of RTM code through a reliable a video decoder application, from a verified formal model of a RTM. The formal model of the RTM system is developed using the verification approach. While the current work has considered Event-B formal modelling language and is verified using theorem a single-core platform for a proof of concept, modelling and proving and model checking. The automatically generated RTM verifying a RTM for many-core hardware is considered as a system has been integrated in an embedded platform as a Linux future work. governor, and provides up to 4% improvement over Linux’s default Ondemand governor. The paper starts with related works in Section II followed by description of RTM in Section III. Then formal design and the Event-B model are presented in Section IV. Section V is I. INTRODUCTION about verifying the formal models. 
The implementation includ- Dynamic Voltage and Frequency Scaling (DVFS) has been ing the code generation technique is explained in Section VI. widely used to reduce the energy consumption of mobile and Finally the experimental results are shown in Section VII embedded systems at run-time, while maintaining a required followed by the conclusion section. Quality of Service (QoS) [1–5]. To manage DVFS, a Run- Time Management (RTM) system is an essential unit in a II. RELATED WORKS many-core architecture, and it needs to interact with both the application layer (to ensure that QoS requirements are met) and Power management hardware has been designed and im- the hardware layer (to control and monitor core activities). In plemented in embedded systems to provide energy savings addition, the RTM typically includes workload prediction and and temperature stability, with Dynamic Power Management machine learning algorithms. Therefore, ensuring an integrated (DPM) and DVFS being two techniques controllable from and reliable RTM system is not trivial to achieve using a software.The hardware implementation of these techniques has manual hand-coded implementation. Hand-coded RTM system been described by Keating et al. [8] (DPM being referred implementation can be error-prone, and is not portable across to as Power Gating). Power management mechanisms for different architectures and Operating Systems (OSs). control of DPM and DVFS have been widely studied in the literature. For the case of DVFS power management, In this paper, we address the integrity and reliability of the studies can be divided into performance requirement-agnostic RTM through deployment of formal design and verification and performance requirement-aware. The first are focused on methods. Once a system is modelled mathematically, proof optimising the energy consumption without knowledge of the application requirements. The Ondemand governor [9] is a Application Video Decoder (Mplayer) DVFS controller in Linux that reacts to current processor Layer workload, adjusting Voltage and Frequency (V-F) to maintain FPS an idle time setting. The work of Dhiman et al. [10, 11] presents a control algorithm that characterises workload based on the performance monitors, adjusting the V-F according to OS Layer Run-Time Manager an energy-performance trade-off. The works of [12, 13] use DVFS by predicting the workload of a time period, aiming to reduce idle time by changing the V-F. Multi-core approaches Voltage/Frequency Setting CPU Cycles for performance requirement-agnostic DVFS include [5, 14]. Hardware Moeng et al. [5] do offline workload characterisation based CPU on performance counters, producing a decision tree for run- Layer time for optimising the energy per user instruction. Bose et Decision Phase Learning Phase al. [14] provide a summary of multi-core DVFS, with further guidelines for designing system-level RTM. Fig. 1. Run-Time Management system for a video decoder application: a Performance requirement-aware studies reduce energy con- cross-layer approach sumption whilst aiming to preserve a performance requirement, either provided by the application or the scheduler. Real- time systems have this requirement, as their real-time tasks the completion of the task before the deadline cannot be are executed to meet their intrinsic deadline. The work of guaranteed. Video decoders present this behaviour as the frame Bhatti [4] focuses on RTM for real-time system. 
Soft real- can be seen as a task, and the workload information per frame time systems have this performance requirement but with a may not be known before its decoding. The deadline can be less strict directive, thus it is not critical if the deadline is obtained from the FPS set by the video application. The power not met. Video decoding applications present this soft real- minimization objective for these applications then translates in time property, as the requirement is set by the frames-per- the solution to a constrained optimization problem. The frame second (FPS) required by the video, and missing the FPS workload needs to be known (to a certain extent) prior to its is not system critical. Work from [1–3, 15] focuses on these processing, then decisions on the power state (V-F) have to be applications, using workload prediction to determine the V-F taken so that they fulfil the constraint but take into account setting before the frame is computed. As a starting point for performance variations of the application. the development of the RTM Event-B model in this paper, To achieve this soft real-time behaviour, our RTM algo- the algorithm is focused on performance requirement-aware rithm is based on [15] and works in two phases, Prediction applications, namely video decoding. The algorithm is based and Decision Making. For each frame, the RTM first predicts on the work of [15], which uses prediction for estimating the the workload to be executed, and then it decides the V-F workload, and reinforcement learning to select the V-F setting. setting so that the predicted workload can finish execution Salehi et al. [16] present several Event-B modelling and before the frame deadline, set by the FPS. After the frame verification techniques used in the development of the RTM has been executed, the RTM learns by using feedback for up- of the video decoder system. While this includes more details dating its parameters for computing future frames. To achieve about the application of different modelling/verification tech- the first objective, predictions of the workload for the next niques, in this paper we present more detailed results about frame are performed using an Exponential Weighted Moving the automatic generation of the RTM implementation code Average (EWMA)[18]. The EWMA algorithm is explained from an Event-B model. Code generation technique has been in Section IV-D1. The work of [1, 12, 13] use EWMA for introduced in the Event-B formal method to bridge the gap their workload prediction schemes, as it is easy to implement, between abstract specifications and implementation using an lightweight at runtime and presents good performance. For the implementation level specification notion [17]. Decision Making, Reinforcement Learning (RL) is used [19], using the Q-Learning algorithm. The algorithm is further III. RUN-TIME MANAGEMENT (RTM) explained in Section IV-E2. The methodology to design the algorithm using formal methods is explained next. The design of the RTM is described in this section, which involves workload characterization together with the IV. FORMAL DESIGN appropriate V-F selection. Figure 1 shows the RTM adopting a cross-layer approach, interacting with the application, OS A. Formal Methods and Event-B and hardware. Communication between layers is indicated by In computer science, formal methods [7] are mathemati- arrows. cally based techniques for the specification and development The objective of real-time applications is to complete the of complex systems. 
By building a mathematically rigorous execution of their workload before a predefined deadline. In model of a system, it is possible to verify the system’s hard real-time systems, both the workload and the deadline of properties in a more thorough fashion than empirical testing, the task must be known before starting processing it, so that and therefore it improves reliability and robustness of design. the scheduler can allocate the task for execution. In soft real- Event-B [6] is a formal method for system-level modelling and time systems, tasks may or may not provide this information, analysis. Key features of Event-B are the use of set theory and so it needs to be calculated if performance (or power) is first order logic as a modelling notation, the use of refinement to be optimised. Because scheduling cannot be deterministic, to represent systems at different abstraction levels, and the use of mathematical proof to verify correctness of models and Event&B( Abstract(Model( consistency between refinement levels. The behaviour of an Event-B model is defined by a collection of variables together refines( with a collection of guarded atomic events that modify the Event&B( variables. Formal properties are specified using invariants and Refinement:(Predic@on( preservation of invariants by events is verified using proof refines( techniques. Event&B( Refinement(:(ML( The Rodin [20] platform is an Eclipse-based IDE for Event-B that provides effective support for refinement and decompose( mathematical proof. Controller(( Environment(( Sub&model( Sub&model( B. Formal Design Architecture refines( refines( Figure 2 illustrates our design architecture for Event-B Controller(Tasking( Environment(Tasking( modelling of the RTM for a video decoder system. Event- Refinement( Refinement(

B refinement allows a model to be built gradually, starting automa@c(transla@on( with an abstract model and then introducing successive, more concrete refinements. As shown in Figure 2, the Event-B model RTM(( of the RTM system comprises an abstraction level, where Executable(Code( we focus on the main functionalities of the system, and two levels of refinements, where the workload prediction and the Fig. 2. Event-B formal design of the Run-Time Management system for a reinforcement learning algorithms are introduced respectively. video decoder application To manage the complexity of the final refinement and also to prepare the model for code generation, the model is de- Abstract$Level$ composed into two sub-models: Controller and Environment. RTM$ The Controller sub-model consists of properties of the RTM $ algorithms and the environment sub-model represents the in- terfaces between the RTM and the application and hardware set_fps$ select_vf$ execute_frame$ monitor_actwl$ layers (see Figure 1). By separating controller behaviour and First$Refinement$ environment behaviour, the representation of the RTM layer predict_wl$ select_vf$ monitor_actwl$ update_avewl$ and the application and hardware layers are divided. This structure is used for code generation configuration, where the Second$Refinement$ controller translation consists of RTM algorithms, and environ- xor$ $ ment translation represents the interfaces to the application and ran$>$ ε" ran$≤$ε$ $ hardware layers. Details of this implementation are explained $ in Section VI. The sub-models need some preparation before ranGenerator$ VFGenerator$ explore$ exploit$ updateE$ monitor_actwl$ update_qTable$ the final step of being translated to the executable code. These preparations are included in refinement levels of sub-models: Controller tasking refinement and Environment tasking re- finement. In these refinements, the control flows, sequencing Fig. 3. Event refinement structure of the Run-Time Management system for a video decoder application and branching of Event-B actions are defined. Also additional translation rules are defined before attempting code generation. These translation rules specify the translation of the Event-B In the top level, the value of the V-F is decided nonde- mathematical operators to corresponding C operators. Finally terministically from the constant set VF. Below is the Event- the C code is generated automatically from the final refinement B specification of select vf event. act1 (action1) indicates the models of the controller and the environment. body of the event where the value of VF is nondeterministically C. Abstraction assigned to a value from the set VF. To present details of the Event-B abstract/refinement levels, Event select vf = we benefit from a visualisation approach, called Event Re- begin finement Structures (ERS) [21]. The ERS of the RTM system act1 : freq : VF is presented in Figure 3. The blue region shows the abstract end b2 level including four actions. Each node indicates an action in the Event-B model and the oval is the name of the system. D. First Refinement: Prediction Phase The nodes are read from left to right indicating the order- ing between them. First the set fps event executes followed 1) The Algorithm (EWMA): The prediction algorithm es- by execution of select vf, execute frame and monitor actwl. timates the workload for the next frame using a modified According to the specification of the system described in form of EWMA. 
The EWMA algorithm is widely used in the Section III, first the value of FPS is provided by the application literature [1, 12, 13] because of its lightweight implementation. layer and saved in the RTM, then the optimal value of V-F is The predictor works as an infinite impulse response filter that decided by the RTM, the frame is executed in the hardware generates a prediction of the future value based on the average and the actual value of workload actwl is monitored. of the previous values weighted exponentially, where the most recent values have greater weights than the older ones. This is Learning). The predicted workload corresponds exclusively shown as: to the application that communicates with the RTM, and does not include the system-software overheads and other w(n + 1) = w(n) +¯w (1 ) where 0 1 (1) · ·   application loads in the prediction. Thus, V-F pairs cannot be where w(n) is the current workload at time instance n mea- directly mapped to a predicted workload using a deterministic sured from the hardware, w¯ is the average workload in the algorithm. time interval 0 to n, w(n + 1) is the predicted workload at The objective of reinforcement learning is to learn how to time n +1, and is the weighting factor. After the prediction make better decisions under variations. Decisions in reinforce- has been set, the mean w¯(n) is updated with the prediction ment learning terminology are known as an actions, and the according to: environment is represented as states. At the beginning there is w¯ = w(n + 1) (2) no knowledge of the system, so the decision algorithm must start exploring decisions in different states to find the optimal The parameter that controls the relevance of the past history (or most suitable) action for a particular chosen state. This is is the prediction weight . At a high , recent history data called the Exploration phase. Exploration is done by taking a is weighted more heavily than older history, and this helps random action for a selected state. Good actions are rewarded EWMA to react quickly to changes, but it becomes volatile and bad actions are penalized. Actions in this context, are the for random fluctuations. So as the parameter decreases, the V-F pairs, and states are the different amounts of workload older history data becomes more relevant, smoothing local the system may have. It is important to note that the V-F pairs variations, reacting slower to changes [18]. are discrete, so the best decision may not be optimal, but it is the best among the V-F pairs available. As an example, let 2) The Event-B Model: In the abstract level, we do not the optimal frequency for a given workload be 533.35MHz; model detailed workload prediction or the decision making. if the CPU supports only 300MHz, 600MHz, 800MHz and Later, in the first refinement (the green region of Figure 3), 1GHz, the best decision is to execute the workload 600MHz. the details of the prediction algorithm are added to the abstract The ‘best’ in the context of this paper is defined as the lowest events: select vf and monitor actwl. The select vf event is V-F pair that fulfills the performance requirement. refined into two concrete events: predict wl, where the work- load is predicted and select vf, where the value of V-F is Learning is stored as values in a Q-Table, which is a lookup decided based on our prediction. monitor actwl event is also table with values corresponding to all State-Action pairs. 
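The weighting factor lambda was lost from the rendering of Equations (1) and (2) above; consistent with the EWMA theory operator defined later in Section VI (lambda * nextWeight + (1 - lambda) * avgWeight), the predictor computes w(n+1) = lambda * w(n) + (1 - lambda) * w_bar and then sets w_bar to the new prediction. A minimal C sketch of this predictor is given below; the identifiers, the fixed-point scaling and the value of lambda are illustrative assumptions, not the generated code.

    /* Minimal sketch of the EWMA predictor of Section IV-D1 (Equations 1 and 2).
     * The weighting factor is shown as an assumed fixed-point fraction, since a
     * kernel-level governor cannot use floating point. */
    static unsigned long avgwl;               /* running average workload, w_bar */
    #define LAMBDA_NUM 7                      /* assumed lambda = 0.7            */
    #define LAMBDA_DEN 10

    /* Predict the workload of the next frame from the workload just measured. */
    static unsigned long predict_next_workload(unsigned long actwl)
    {
        unsigned long pwl = (LAMBDA_NUM * actwl
                             + (LAMBDA_DEN - LAMBDA_NUM) * avgwl) / LAMBDA_DEN;  /* Eq. (1) */
        avgwl = pwl;                                                             /* Eq. (2) */
        return pwl;
    }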
At refined into two events: monitor actwl (monitoring the actual each decision epoch1, the decision taken for the last frame workload) and update avgwl (updating the average workload is evaluated; the reward or penalty computed is added to the according to the equation 2). corresponding Q-Table entry, thereby gaining experience on the decision. This reward/penalty is calculated with a cost In ERS, the line types indicated that whether the corre- function, which in this RTM context is defined as: sponding event is a refining event (solid line) or a new event 100t if d t (dashed line). In refining the select vf event, predict wl is a r = d  (3) new event and the concerete select vf event is refining the 100(t d) ( 3d if t>d abstract select vf. The description of these events are as below: where r is the reward, t is the workload time and d is the Event predict wl = deadline. The rate at which actions are rewarded in the Q- begin Table is determined by the Learning Rate, ↵, which determines the relative importance of older decisions compared to the act1 : pwl := bpredict(avgwl) end newer ones. Initially, the decisions of the algorithm are not optimal. However, after several epochs the confidence in the Event select vf = selected action improves and the algorithm always selects the refines select vf best action in a given state. This phase of the algorithm is begin b called the Exploitation phase. Figure 4 shows the evolution act1 : freq := pwl fps of the Q-Table. Initially, the values in the Q-Table are all end ⇤ zeros (Figure 4(A)); subsequently, in the exploitation phase, In the predict wl event, the predict workload variable (pwl) the ‘best’ actions are determined (in red in Figure 4(B)). is assigned to the predicted value through the predict operator The transition from exploration to exploitation is not im- from the EWMA theory. A theory is an Event-B component mediate, but is a gradual change, defined as the ✏-greedy where we can introduce new operators. In this development, strategy, in which the exploration-exploitation ratio (✏) is we have defined a theory of EWMA where the prediction gradually increased to reduce the random decisions in favour algorithm operators are defined. Later the value of freq is of appropriate decisions2. The availability of ✏ makes ‘re- calculated based on the predicted workload in select vf event. learning’ a feasible operation, especially for dynamic systems in which the best Action for a particular State may change E. Second Refinement: Decision Making gradually. If relearning is needed, the ✏ may be reduced to 1) The Algorithm (Reinforcement Learning): Once the allow for more exploration to take place. workload for the future frame is predicted, the decision algo- 1In reinforcement learning terminology, the interval at which the algorithm rithm selects a V-F pair to execute it. This selection is based on is triggered is known as the decision epoch. the performance constraint in FPS given by the video applica- 2Appropriate decisions are those that reduce the energy consumption, while tion. The decision algorithm uses Q-Learning (Reinforcement satisfying the performance. A) B) event in the model must be shown to preserve this invariant. Explorationphase Exploitationphase This verification requirement is expressed in a number of proof obligations (POs). In practice this verification is performed either by model checking or theorem proving (or both). 
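The decision-making step described above can be sketched as follows: the cost function of Equation (3) rewards a frame that finishes within the deadline with 100t/d and penalises a miss with -100(t-d)/(3d), the result is blended into the Q-Table entry of the last (state, action) pair at the rate given by the learning rate, and an epsilon-greedy choice picks a random V-F pair while exploring and the highest-valued V-F pair while exploiting. The C sketch below is illustrative only; the table sizes, the learning-rate value and the use of a standard exponential blend for the Q-value update are assumptions, not the exact implementation.

    /* Illustrative sketch of the Q-learning decision step: state = workload bin,
     * action = V-F index, Q is a small lookup table of scaled integer values. */
    #define N_STATES  8
    #define N_ACTIONS 4                        /* e.g. 300, 600, 800 MHz, 1 GHz  */

    static int q_table[N_STATES][N_ACTIONS];   /* scaled Q-values                */
    static int epsilon;                        /* exploitation ratio, 0..100     */
    static const int alpha_num = 1, alpha_den = 4;   /* assumed learning rate    */

    /* Reward/penalty of Equation (3): t = workload time, d = deadline. */
    static int cost_function(int t, int d)
    {
        if (t <= d)
            return (100 * t) / d;              /* deadline met: reward           */
        return -(100 * (t - d)) / (3 * d);     /* deadline missed: penalty       */
    }

    /* Blend the new reward into the Q-Table entry of the last decision. */
    static void update_q(int state, int action, int t, int d)
    {
        int r = cost_function(t, d);
        q_table[state][action] +=
            alpha_num * (r - q_table[state][action]) / alpha_den;
    }

    /* Epsilon-greedy selection: random V-F while exploring, best V-F otherwise. */
    static int select_action(int state, int random_0_99)
    {
        int a, best = 0;
        if (random_0_99 > epsilon)             /* explore                        */
            return random_0_99 % N_ACTIONS;
        for (a = 1; a < N_ACTIONS; a++)        /* exploit                        */
            if (q_table[state][a] > q_table[state][best])
                best = a;
        return best;
    }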
In Sever al epochs addition to correctness, the consistency of the refinement levels later are proved by a number of proof obligations. The Rodin toolset provides an environment for both the- orem proving and model checking. PO generation, automatic Fig. 4. Q-Table during A) exploration and B) exploitation phases. The red proof and interactive proof are incorporated into Rodin. A user boxes represent the best Action for each State. can prove a non-discharged proof obligation manually using the interactive proving feature of the Rodin. 2) The Event-B Model: The purple region (Figure 3) shows Theorem proving There are different proof obligations the second refinement, where details of reinforcement learning which are generated by the Rodin, during development of are included. The select vf event is refined to include the a system [22]. The most important two of these are the details of the decision making algorithm. Based on comparing Invariant Preservation (INV) proof obligation and the Guard a random number, generated in ranGenerator event with the Strengthening (GRD) proof obligation. The INV PO ensures exploration-exploitation ratio (✏), either the explore or exploit that each invariant is preserved by each event; and the GRD PO event are executed and it is followed by updating the ✏. The ensures refinement consistency by ensuring that each abstract oval including xor presented an exclusive choice between its guard is no stronger than the concrete ones in the refining branches. Also the monitor actwl event is refined to update the event. As a result, when a concrete event is enabled the Q-Table: update qTable event, where the workload is rewarded corresponding abstract one is also enabled. or penalized. Model Checking ProB 3 is an animator and model checker Below is the Event-B description of the explore and exploit for Event-B. ProB allows fully automatic exploration of Event- events. These events are guarded based on the value of the B models, and can be used to systematically check a specifi- random variable (generated in the ranGenerator event). If cation for range of errors. random is greater than the exploration-exploitation ratio (✏), explore executes, otherwise exploit executes. In the body of The Event-B model of the RTM was verified using Rodin the explore event, the freq is assigned to a random value (was theorem proving. In the last refinement before model decom- generated in the VFGenerator). The explore event assigns freq position, 76 POs were generated, of which %96 are proved value into the optimal value of V-F according to the predicted automatically. A manually proved PO is presented here as an workload (pwl). optimalVF is an operation defined in a theory example of verification. where all of the necessary reinforcement learning operators are The prediction refinement (Section IV-D) consist of two defined. levels: in the first refinement an abstract representation of prediction (Section IV-D2) is modelled and in the second Event explore = refines select vf refinement it is proved that the abstract form is equivalent to when the refining prediction formula presented in Equation (1). The b abstract definition of prediction is defined in terms of the full grd : random > epsilon begin history of measured workloads as follows: act1 : freq := randomV F n 1 n i n end pwl(n)= actwlhst(i)(1 ) +actwlhst(0)(1 ) i=1 Event exploit = X (4) refines select vf Here pwl(n) is the predicted workload and actwlhst(n) is the when th b actual work load (for the n frame). 
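With the weighting factor restored, the abstract prediction definition (4) is the fully unrolled form of Equation (1) over the history of measured workloads; a plausible reconstruction is:

    \[
      pwl(n) \;=\; \sum_{i=1}^{n-1} \lambda\, actwlhst(i)\, (1-\lambda)^{\,n-i}
               \;+\; actwlhst(0)\, (1-\lambda)^{\,n}
    \]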
grd : random epsilon begin  An invariant is defined in the refined model to specify that this abstract prediction is equivalent to Equation (1), i.e, act1 : freq := optimalV F (QT able, pwl) end the implementation of the algorithm is a correct refinement of the abstract prediction specification. This invariant is proved As shown in Figure 2, the final refinement is divided into interactively with the Rodin theorem prover. Note that the two smaller sub-models. The controller sub-model includes abstract variable actwlhst (actual work load history) is not part the RTM actions: predict wl, ranGenerator, explore, exploit, of the refined model, it is only used for specification purposes. VFGenerator, updateE and update qTable. The environment sub-model includes the actions to interact with the application We also analysed our model using ProB to ensure that and hardware: set fps, execute frame and monitor actwl. the model is deadlock free. For each new event added in the refinements (events with dashed lines in Figure 3), we have V. V ERIFICATION verified that it would not introduce a deadlock using ProB. Also INV POs ensure that the new events keep the existing The correctness of an Event-B model is defined by invariant properties. An invariant is a predicate or constraint, which 3The ProB Animator and Model Checker: http://www.stups.uni- every state in the model must satisfy. More practically, every duesseldorf.de/ProB/index.php5 Application ordering constrains between the abstract events (ordering from Video Decoder (Mplayer) left to right in Figure 3). The ordering between events are Layer specified as invariants, the PO associated with each invariant FPS ensures that its condition is preserved by each event. App. Annotations

OS VI. IMPLEMENTATION AND CODE GENERATION RESULTS Code Generated Run-Time Manager Layer Run-Time Manager A. RTM Interface The model of the RTM is automatically translated into Frequency Changer Perf. Monitor C for its implementation. To provide genericity to the RTM Voltage/Frequency Setting CPU Cycles model, the Controller sub-model does not take into account the Hardware CPU hardware/application-specific calls needed to interact with the Layer hardware and application layers (included in the Environment sub-model). Therefore, an interface to provide these functions Decision Phase Learning Phase has been designed. Figure 5 shows the modified RTM diagram from Figure 1, where the black box represents the generic Fig. 5. Code Generated RTM for Video decoding RTM auto generated code, and the orange boxes provide the interactions with the hardware. The translated environmental functions are replaced by these interaction interfaces. The later events are defined in the second refinement to represent the action of calculating the reward/penalty value For this case study, in order for the generated RTM to (cost function) as part of updating the Q-Table. sit at the OS layer, it has been implemented as a Linux Governor [9], which provides the interface and drivers to make The Event-B specification for update avgwl is as follows: the V-F changes. This Governor is composed of the three Event update avgwl = interfaces needed for the algorithm: the Frequency Changer, refines update avgwl the Performance Monitor and the Application Annotations. begin b The Frequency Changer provides the CPUFreq drivers to act1 : avgwl := update(lambda, awl, avwl) change the V-F setting at the CPU. The Performance Monitor end interface allows the system to recollect the CPU Cycles information from the hardware monitors. For the current case In the body of the event, action act updates the value of the study, the architecture used is the ARM Cortex-A8, which variable average workload. The definition of update (according provides Performance Monitoring registers [23]. A Loadable to Equation 1) is specified as an operator with three arguments Kernel Module (LKM) was designed to access these monitors. as below: The Application Annotations interface provides a library for Theory EWMA the application (video decoder) to send its performance re- operator update quirement (FPS) to the RTM. It also provides function calls arguments to trigger the Governor to start and to finish working. It also notifies the RTM of a new frame start. This communication is : lambda Z : avgW eight2 Z done through ioctl calls. The interface uses an API [15] with : nextW eight2 Z the functions to be included in the application. After the RTM 2 C code is generated, it is cross-compiled with the Governor Formula: : (lambda nextW eight + interface to create the respective LKMs. (1 lambda) avgW⇤ eight) ⇤ When the LKM is loaded, it waits for the set fps (from The Event-B notation for the costFun reward assign is as the auto generated RTM) and the start governor calls to start follows: working. The new epoch call at every new frame triggers the RTM algorithm for both learning (from previous frame) Event costFun reward assign = and deciding the new V-F. At the end of the application, the refines costFun reward assign stop governor call ends the RTM execution. when b B. Code Generated RTM grd : (actwl/freq) dl then  According to Figure 2, after decomposition, the sub-models act : costF un reward value := are refined to be prepared for translation into C code. 
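The Performance Monitor interface collects the CPU cycle count from the Cortex-A8 performance monitoring registers [23] through a loadable kernel module. A minimal sketch of the core of such an interface is shown below; it assumes the PMU counters have already been enabled and reset by the module, and it is not the LKM code used in this work.

    /* Read the Cortex-A8 cycle counter (PMCCNTR) through CP15.
     * MRC p15, 0, <Rt>, c9, c13, 0 returns the current cycle count. */
    static inline unsigned int read_cpu_cycles(void)
    {
        unsigned int cycles;
        asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
        return cycles;
    }

    /* Cycles spent on the current frame = difference of two readings;
     * unsigned arithmetic tolerates a single 32-bit wrap-around. */
    static unsigned int frame_cycles(unsigned int start, unsigned int end)
    {
        return end - start;
    }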
Tasking min(1,costFun reward(actwl, freq, dl)) Event-B sub-models define the control flows between events. Part of the controller task is as follows: The costFun reward assign can executed only when its guard monitor_actwl; (grd) holds. grd condition specifies when the finish time is less update_avgwl; than or equal to the deadline, means the deadline is achieved if costFun_reward_assign and the Q-Table needs to be rewarded. The costFun reward is else costFun_penalty_assign; defined as an operator in the Machine Learning (ML) theory similar to what is described above for the update operation. It indicates the ordering between events monitor actwl and update avgwl followed by a branching between cost- Below is part of the result of automatic code generation Fun reward assign event and costFun penalty assign event. corresponding to the presented controller tasking part above: Env_monitor_actwl(&actwl); presented in [15], where exploration and exploitation phases avgwl =(lambda * actwl + (1 - lambda) * avgwl); are present, with more variations at early stages of the runtime. if (actwl / freq <= dl){ Power consumption has been estimated using Table I, which costFun_reward_value = min(1, (100 * actwl)/(freq * dl)); lists the CPU power specifications for this architecture. Four } else { frequencies are available: 300MHz, 600MHz, 800MHz, 1GHz. costFun_penalty_value = max(-1, Energy consumption for this code generated architecture is -((actwl / freq - dl) * 100) / (3* dl)); estimated to be 16µJ. Energy consumption for the code } generated RTM is lower than that of the Ondemand governor by 4%. This is the first step for the RTM code generation, First the monitor actwl is translated to a call to the envi- which is continuing work to optimise its performance. ronment function since monitor actwl is an environment event.
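For readability, the generated fragment above is restated below with its layout restored and with comments mapping each line to the Event-B event it was translated from; min() and max() are the usual kernel helper macros.

    Env_monitor_actwl(&actwl);                         /* monitor_actwl (environment call)  */
    avgwl = (lambda * actwl + (1 - lambda) * avgwl);   /* update_avgwl, the update operator */
    if (actwl / freq <= dl) {                          /* guard: deadline met               */
        costFun_reward_value = min(1, (100 * actwl) / (freq * dl));                /* costFun_reward_assign  */
    } else {                                           /* guard: deadline missed            */
        costFun_penalty_value = max(-1, -((actwl / freq - dl) * 100) / (3 * dl));  /* costFun_penalty_assign */
    }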

Then update_avgwl is translated into the second and third lines according to the operator definition for the update. Finally a branching is generated on the costFun_reward_assign and costFun_penalty_assign depending on the event guards. The Event-B guard of the event is translated into the branching condition in C. costFun_reward_value and costFun_penalty_value are assigned according to the definition of the cost function operators (Equation 3).

VII. EXPERIMENTAL RESULTS

Experiments are conducted on the BeagleBoard-xM (BBxM) embedded platform, which contains a TI OMAP DM3730 [24] SoC with an ARM Cortex-A8 processor.

Power Consumption (W) 0 The platform runs Linux Operating System 3.7.10 to- 0 100 200 300 400 500 600 700 gether with the Ubuntu 12.04 distribution4. For the video Frame Number decoder case study, we used FFMPEG5 libraries, run- ning H.264 video6 of VGA resolution (640x480) for 720 Fig. 6. Performance and Power Consumption of the generated RTM Governor frames, which at 23.976 FPS is 30.03 seconds. In or- for an H.264 Video der to send the Application Annotations, the H.264 de- B. Overheads coder code was modified to include the API functions: config_governor(int fps), start_governor(), The RTM Linux Governor implementation uses a total of new_epoch() and stop_governor(), which use ioctl 13.5kB of RAM when loaded, this including the auto generated to trigger the Governor (Section VI-A). code and the LKM interfaces (Freq. Changer, Perf. Monitor). The algorithm takes on average 39K clock cycles to run, Power Mode Frequency Voltage (V ) Current (mA) Power (mW ) which, at lowest frequency (300MHz) would be 0.13ms of OPP50 300MHz 0.93 151.62 141.01 timing overhead. Including the frequency change overheads, OPP100 600MHz 1.10 328.79 361.67 the algorithm takes 188K clock cycles, which would take OPP130 800MHz 1.26 490.61 618.17 0.63ms. Comparing this overhead with the video decoder, its OPP1G 1GHz 1.35 649.64 877.01 FPS are of 23.976 which translates to 41.7ms, so the algorithm TABLE I. DM3730 SPECIFICATIONS (ARM CORTEX-A8) [25] overhead (including frequency changes) is 1.51%. C. Discussion A. Performance and Power Consumption Comparing this work with our first experiment of develop- Figure 6 shows the runtime performance of the code ing a hand-coded RTM system for a video decoder application generated RTM, where the first plot shows the performance in [15], we found out how formal modelling and automatic throughout each frame. Maximum performance is capped at code generation can reduce the effort needed at the implemen- 23.976 FPS because the platform starts decoding the next tation level. The development of the hand-coded RTM [15] frame keeping up with the given FPS. This means that a at kernel level was not trivial for obtaining the knowledge of frame decoded under 1/23.976fps = 41.7ms will produce a kernel drivers, interfaces and user space communication for the positive reward. The second plot shows the V-F decisions and platform used, plus the time needed for the development of the the power consumption in turn for each frame. This shows the algorithm. In this work we tackle this problem by separating exploration phase of the RTM at the beginning of the video. the RTM governor interface and the RTM algorithms, since It can be seen during the exploration phase, low V-F settings the RTM governor interface is developed once and can be tend to cause performance losses. A second exploration phase used for different RTM algorithms. Moreover, debugging at is carried out at roughly the 330th frame, so the algorithm can kernel level is a challenging task. We overcome these issues take better decisions for the second half of the video, with less by formal verification at the Event-B abstraction level before performance penalties. This behaviour is comparable to the one implementation, ensuring that the RTM algorithms behaviour is correct independent of the platform. 
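As an illustration of the Application Annotations interface, the modified decoder wraps its frame loop in the API calls named above; decode_one_frame() and the loop structure below are illustrative stand-ins for the FFMPEG decoding path, not the code used in the experiments.

    /* Prototypes provided by the RTM annotation API; decode_one_frame() stands
     * in for the FFMPEG decode call. */
    extern void config_governor(int fps);
    extern void start_governor(void);
    extern void new_epoch(void);
    extern void stop_governor(void);
    extern void decode_one_frame(void);

    void decode_video(int fps, int num_frames)
    {
        int i;
        config_governor(fps);        /* send the FPS requirement to the RTM        */
        start_governor();            /* begin learning and decision making         */
        for (i = 0; i < num_frames; i++) {
            new_epoch();             /* new frame: learn from the previous one and
                                        decide the V-F setting for this one        */
            decode_one_frame();      /* application work for this epoch            */
        }
        stop_governor();             /* end of the application                     */
    }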
For further deployment 4Nelson: https://eewiki.net/display/linuxonarm/BeagleBoard in a different platform, the governor interface will be modified 5FFMPEG: http://www.ffmpeg.org for that specific platform, whilst the code generated RTM 6Big Buck Bunny: https://peach.blender.org/ algorithms are unchanged. VIII. CONCLUSIONS AND FUTURE WORK Proceedings of the 7th ACM international conference on Computing frontiers, (New York, New York, USA), We have proposed a formal approach toward automatic pp. 277–286, ACM, 2010. code generation of RTM systems from a verified Event-B [6] J. Abrial, Modeling in Event-B - System and Software model. This work has several contributions, first, this is a novel Engineering. Cambridge University Press, 2010. model-based method for developing RTM algorithms automat- [7] J. Abrial, “Formal Methods: Theory Becoming Practice,” ically. Second, the models are formal and thus amenable to J. UCS, vol. 13, no. 5, pp. 619–628, 2007. formal verification, addressing reliability and correctness of [8] M. Keating, D. Flynn, R. Aitken, and K. Shi, Low the models. power methodology manual: for system-on-chip design. We have modelled a RTM system for a video decoder Springer Verlag, 2007. application and we experiment executing our automatically [9] V. Pallipadi and A. Starikovskiy, “The ondemand gover- generated codes as a Linux governor. The experimental results nor,” in Proceedings of the Linux Symposium, 2006. indicate that our model-based approach produces code with an [10] G. Dhiman and T. S. Rosing, “Dynamic voltage fre- acceptable performance overhead. In implementation, we have quency scaling for multi-tasking systems using online proposed a generic approach by separating the RTM interface learning,” in ISLPED, (New York, New York, USA), (includes interfaces to interact with the application and hard- pp. 207–212, ACM Press, 2007. ware) and code generated RTM (includes the algorithms). Our [11] G. Dhiman and T. Rosing, “System-level power man- approach will make it easier to re-target at other architectures agement using online learning,” IEEE TCAD, vol. 28, and applications, since it is easier to manage and verify model pp. 676–689, May 2009. evolution than code evolution. Moreover, decomposing our [12] A. Sinha and A. Chandrakasan, “Dynamic voltage model into the controller and environment sub-models will scheduling using adaptive filtering of workload traces,” increase reusability by separating RTM algorithms (controller) in VLSI Design, pp. 221–226, IEEE Comput. Soc, 2001. from hardware and application interfaces (environment). As a [13] S. Sinha, J. Suh, B. Bakkaloglu, and Y. Cao, “Workload- future work, the objective is to deploy the code generated RTM Aware Neuromorphic Design of the Power Controller,” system into different architectures. This will be done by mod- IEEE JETCAS, vol. 1, pp. 381–390, Sept. 2011. ifying the underlying governor interface for each architecture, [14] P. Bose, a. Buyuktosunoglu, J. a. Darringer, M. S. Gupta, whilst the code generated RTM algorithms remain unchanged. M. B. Healy, H. Jacobson, I. Nair, J. a. Rivers, J. Shin, a. Vega, and a. J. Weger, “Power management of multi- In order to gain a better energy saving, optimisation of the core chips: Challenges and pitfalls,” in DATE, pp. 977– algorithm parameters is required. This optimisation includes 982, 2012. the comparison of different prediction algorithms and selecting [15] L. A. Maeda-Nunez, A. K. Das, R. A. Shafik, G. V. Mer- the most efficient one. 
A distinct difference between the rett, and B. Al-Hashimi, “Pogo: an application-specific algorithm presented here and the one in [15], is that the latter adaptive energy minimisation approach for embedded does prediction by separating workload types, which reduces systems,” in HIPEAC Workshop on Energy Efficiency prediction error. This feature will be addressed in further with Heterogeneous Computing, 2015. refinement of the Event-B model. Finally, the last refinement [16] A. S. Fathabadi, C. F. Snook, and M. J. Butler, “Applying model can be compared with the State-of-the-Art available. an Integrated Modelling Process to Run-time Manage- ment of Many-Core Systems,” in IFM 2014, Bertinoro, ACKNOWLEDGMENT Italy, September 9-11, Proceedings, pp. 120–135, 2014. This work was supported in part by an Engineering [17] A. Edmunds and M. J. Butler, “Linking event-b and con- and Physical Sciences Research Council Programme Grant, current object-oriented programs,” Electr. Notes Theor. EP/K034448/1 (See www.prime-project.org for more infor- Comput. Sci., vol. 214, pp. 159–182, 2008. mation) and by Consejo Nacional de Ciencia y Tecnolog´ıa [18] G. E. P. Box, “Understanding Exponential Smoothing – (CONACYT) of Mexico, grant 213977. A Simple Way to Forecast Sales and Inventory,” Quality Engineering, 1990. [19] R. Sutton and A. Barto, Reinforcement learning: An REFERENCES introduction, vol. 28. Cambridge Univ Press, 1998. [1] K. Choi, W.-C. Cheng, and M. Pedram, “Frame-Based [20] J. Abrial, M. J. Butler, S. Hallerstede, and L. Voisin, Dynamic Voltage and Frequency Scaling for an MPEG “An Open Extensible Tool Environment for Event-B,” Player,” JOLPE, vol. 1, pp. 27–43, Apr. 2005. in Formal Methods and Software Engineering, ICFEM [2] S.-y. Bang, K. Bang, S. Yoon, and E.-y. Chung, “Run- 2006, Macao, China, November 1-3, 2006, Proceedings, Time Adaptive Workload Estimation for Dynamic Volt- pp. 588–605, 2006. age Scaling,” IEEE TCAD, vol. 28, Sept. 2009. [21] A. Salehi Fathabadi, M. Butler, and A. Rezazadeh, “Lan- [3] Y. Gu and S. Chakraborty, “Control theory-based DVS for guage and tool support for event refinement structures in interactive 3D games,” in DAC, (New York, New York, Event-B,” Formal Aspects of Computing, pp. 1–25, 2014. USA), p. 740, ACM Press, 2008. [22] S. Hallerstede, “On the Purpose of Event-B Proof Obli- [4] M. K. Bhatti, C. Belleudy, and M. Auguin, “Hybrid gations,” in ABZ, London, UK, pp. 125–138, 2008. power management in real time embedded systems: an [23] ARM, “ARM Cortex-A8 Reference Manual,” 2010. interplay of DVFS and DPM techniques,” RTS, vol. 47, [24] Texas Instruments, “DM3730, DM3725 Digital Media pp. 143–162, Jan. 2011. Processors Datasheet,” 2011. [5] M. Moeng and R. Melhem, “Applying statistical machine [25] Texas Instruments, “AM/DM37x Power Estimation learning to multicore voltage & frequency scaling,” in Spreadsheet,” 2011. 1 Learning Transfer-based Adaptive Energy Minimization in Embedded Systems

Rishad A. Shafik, Anup Das, Luis A. Maeda-Nunez, Sheng Yang, Geoff V. Merrett & Bashir M. Al-Hashimi

Abstract—Embedded systems execute applications with differ- In the reactive approach the VFS is controlled based on the ent performance requirements. These applications exercise the history of hardware CPU usage. When the CPU usage is hardware differently depending on the types of computation higher/lower than a pre-defined value, an increased/decreased being carried out, generating varying workloads with time. We will demonstrate that energy minimization with such workload V/F is used. Linux’s ondemand power governor [11] is a and performance variations within (intra) and across (inter) typical example of such an approach. In the proactive approach applications is particularly challenging. To address this challenge predicted workloads are used to manage the hardware power we propose an online energy minimization approach, capable of control levers [10], [12]. The impact of such control is then minimizing energy through adaptation to these variations. At the observed through hardware performance monitors to evaluate core of the approach is an initial learning through reinforcement learning algorithm that suitably selects the appropriate voltage/ the actual processor workload and adjust the future controls to frequency scalings (VFS) based on workload predictions to meet account for any mispredictions. Jung et al. [13] proposed one the applications’ performance requirements. The adaptation is such approach using an initial value problem based processor then facilitated and expedited through learning transfer, which workload prediction and classification. The predicted workload uses the interaction between the system application, runtime and is then used for continuous V/F adjustment to achieve energy hardware layers to adjust the power control levers. The proposed approach is implemented as a power governor in Linux and minimization. To observe the impact of such V/F adjustment validated on an ARM Cortex-A8 running different benchmark on performance, information from the processor’s performance applications. We show that with intra- and inter-application monitoring unit are used. Ramakrishna et al [14] showed an variations, our proposed approach can effectively minimize energy online prediction-based approach to classify the task workloads consumption by up to 33% compared to existing approaches. and apply VFS suitably using the feedback from the CPU Scaling the approach further to multi-core systems, we also show that it can minimize energy by up to 18% with 2X reduction in the performance monitors. learning time when compared with a recently reported approach. Since processor workloads are exercised differently depending on application tasks, Siyu et al. [15] and Shen et al. [17] have proposed online prediction-based control approaches of hardware power levers using machine learning algorithm. Their approaches I.INTRODUCTION have shown methods of workload classification for suitably Energy minimization is a prime design objective for embedded learning the VFS required for an application to achieve energy systems. To enable energy minimization these systems are minimization in the presence of performance variations due to ap- equipped with processors with dynamic voltage and frequency plication-generated CPU workloads. 
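As a point of reference for the reactive approach described above, a threshold-based governor of the ondemand type reduces to the following decision rule; the thresholds and the code are purely illustrative and are not the ondemand implementation.

    /* Toy illustration of reactive V/F control: step the frequency up when recent
     * CPU utilisation exceeds a threshold, step it down when it falls below another. */
    #define UP_THRESHOLD    80   /* percent busy that triggers a step up   */
    #define DOWN_THRESHOLD  30   /* percent busy that triggers a step down */

    int reactive_next_freq_index(int cur_index, int max_index, int cpu_busy_percent)
    {
        if (cpu_busy_percent > UP_THRESHOLD && cur_index < max_index)
            return cur_index + 1;
        if (cpu_busy_percent < DOWN_THRESHOLD && cur_index > 0)
            return cur_index - 1;
        return cur_index;
    }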
However, these approaches scaling (DVFS) capabilities, controlled by the operating system do not consider the variation of application performances, such (OS) and system firmware; examples include Linux’s power as frame rate for video decoders, page loading rate for browsers governors and ARM’s VFS firmware [1]. The basic principle is etc. Among others, learning-based idle-time manipulation has to reduce the operating voltage/frequency (V/F) dynamically at been proposed in [18] to reduce energy in multi-core systems. runtime, resulting in a cubic decrease in power consumption at Modern embedded systems feature workload and performance the expense of a linear performance degradation [2], [3], [4]. variations both within and across applications [16]. Such varia- Dynamic energy minimization approaches can be broadly clas- tions can arise due to the types of computation being carried. sified into two types – offline and online. The offline approach Energy minimization with such variations is challenging using characterizes the workloads of a given application exploiting the existing online approaches (see Section II). This is because application-specific knowledge. The profiled workloads are then the online approaches, such as [11], [14], [17] do not interact used during runtime to adjust the power control levers at regular with the applications for their changing performance needs, intervals for achieving energy minimization. Examples include which leads to either over-performance or failure in meeting workload characterization and energy minimization of video their performance requirements. Moreover, existing approaches decoders [5], [6] and control-theoretic formulation of energy using machine learning [15], [17], [18] use a single runtime consumption of multimedia workloads [7]. formulation of V/F scaling for a given performance requirement, Energy minimization using online approach has the basic which cannot adapt to intra- and inter-application workload principle of controlling the hardware power levers based on and performance variations. To effectively minimize energy the processor workloads [8], [9]. Depending on the control consumption in the presence of such workload and performance mechanism, the online approach can be reactive or proactive. variations, this paper makes the following specific contributions: • an adaptive energy minimization approach to effectively de- R.A. Shafik, A. Das, L.A. Maeda-Nunez, S. Yang, G.V. Merrett and B.M. tect and deal with the intra- and inter-application workload Al-Hashimi are with the School of ECS, University of Southampton, UK e-mail: {ras1n09,akd1g13,lm15g10,gvm,bmah}@ecs.soton.ac.uk. and performance variations, Manuscript received on August 19, 2015 • a reinforcement learning (RL) algorithm to suitably select 2

the appropriate VFS for a given performance requirement, the above-mentioned variations. Because the processor power followed by a transfer learning algorithm to adapt to such control levers will need to be continually adjusted to suitably workload and performance variations, and select the V/F scaling in the presence of these variations. • a Linux power governor implementation of the approach for To achieve such adjustment effectively through online energy extensive validation using different benchmark applications. minimization the following key aspects need to be addressed: To the best of authors’ knowledge, this is the first complete • Learning: The relationship between hardware power control work that shows transfer learning-based adaptive energy min- levers and applications’ performance requirements need to imization and its implementations on single and multi-core be learnt with an aim to suitably select the appropriate VFS, embedded systems. The remainder of this paper is organized as while meeting the application performance requirements. follows. Section II motivates the proposed approach, Sections III • Adaptation: When the CPU workload or the application describes the approach and its implementation. Sections IV performance requirement changes the power control levers and V reports the experimental results and analyzes the learning must be adapted accordingly. To enable such adaptation, overheads, Section VI demonstrates scaling of the approach to the changes in performance requirements must be commu- multi-core systems. Finally, Section VII concludes the paper. nicated from the application to the runtime (i.e. the system (1A): Intra-application (1B): Intra-application (2): Inter-application software including OS scheduler and VFS controller), and workload variation performance variation variation 8.E+7 the workload variations must be monitored through the 7.E+7 hardware performance counters. 6.E+7 Linux Ondemand Learning Proposed

[Fig. 1 (graphic): observed CPU workload in cycles against frame number for the three scenarios labelled (1A) intra-application workload variation and (1B) intra-application performance variation (MPEG4: 24 fps SVGA, MPEG4: 15 fps QVGA), and (2) inter-application variation (FFT: 16 wave fps, Browser: small pages).]

[Fig. 2 (graphic): Application/Runtime/Hardware layers under (a) Linux Ondemand, which controls VFS from CPU usage history, (b) learning, which learns VFS control, and (c) the proposed approach, which learns VFS control and adapts to variations.]

3344 3352 3360 3368 3376 3384 1158 Frames Requirement Energy Avg. Performance Energy Avg. Performance Energy Avg. Performance SVGA: 41 ms/frame 229 J 35 ms/frame 189 J 37 ms/frame 178 J 40 ms/frame Fig. 1. Workload and performance variations within and across applications QVGA: 67 ms/frame 108 J 53 ms/frame 126 J 40 ms/frame 97 J 66 ms/frame FFT: 62 ms/frame 164 J 50 ms/frame 123 J 67 ms/frame 131 J 61 ms/frame II.MOTIVATION Browser: 100 ms/page 119 J 79 ms/frame 89 J 112 ms/page 101 J 98 ms/page (a) (b) (c) Fig. 1 shows an example of workload and performance varia- tions within (intra) and across (inter) applications. It highlights Fig. 2. Comparison between energy minimization approaches: (a) Linux’s the application performances and observed CPU workloads ondemand [11], (b) learning-based [17], and (c) proposed (in CPU cycles) of three different applications: MPEG4 [20] This paper proposes a learning transfer-based adaptive energy followed by FFT [21] and browser (based on [22]) rendering minimization approach, addressing the above-mentioned aspects. small html pages (less than 20kb in size) from bbench [23]. To highlight the importance of such adaptation, Fig. 2 shows The CPU workloads were recorded on a DM3730 system-on- three energy minimization approaches applied to the application chip from Texas Instruments, integrated on the BeagleBoard-xM scenarios in Fig. 1: MPEG4 decoding at 24 SVGA fps (41 ms/ (BBxM) platform [19], which incorporates an ARM Cortex-A8 frame) and 15 QVGA fps (i.e. 67 ms/frame), FFT processing CPU core running at 800MHz CPU clock speed. From Fig. 1 16 wave fps (i.e. 62 ms/frame) and browser rendering at 100 the following two observations are made: ms/page. The energy consumption incurred by the applications Observation 1: The CPU workload and performance require- are measured using an Agilent DC Power Analyzer (N6705B) ments vary within an application. For example, MPEG4 decoding (see Section IV for details). at 24 SVGA frames per second (fps) exhibits up to 7x workload The learning approach (Fig. 2.(b)) carries out a single formu- variation, Fig. 1 (1A). Such variation arises due to decoding lation of the VFS controls based on the predicted workloads of intra-coded (I) SVGA frames with higher computations, using machine learning [17]. However, as the VFS controls followed by a number of predictive-coded (P) frames with lower resulting from such learning approach do not adapt to workload computations [27]. The MPEG4 also experiences a performance and performance variations across the applications, it cannot change from 24 SVGA fps to 15 QVGA fps for the given achieve effective energy minimization. For example, after workload profile, Fig. 1 (1B). initially learning VFS controls for the MPEG4 decoding at Observation 2: The CPU workload changes when the system 24 fps, the learning approach over-performs and incurs higher between applications. Referring to Fig. 1(2), when the energy consumption for the MPEG4 decoding 15 QVGA fps. system switches from MPEG4 to FFT, the workload increases Conversely, in the case of FFT and browser applications it under- by 3x on average, since FFT wave frames are computationally performs and violates the specified performance requirements intensive. Also, when the system switches from FFT to browser, (highlighted in red). Due to performance-agnostic nature of VFS the workload decreases by 2x due to less computations required controls the Linux ondemand over-performs for most of the per small page rendering. 
Due to such switches the performance applications, incurring higher energy consumptions (Fig. 2.(a)). requirements also change from 67 ms per MPEG4 frame to 62 The proposed approach provides the lowest energy consumption ms per FFT wave frame and then to 100 ms per browser page. for all intra- and inter-application variations. This is because it Minimizing energy consumption while meeting application learns the appropriate VFS controls and adapts to performance performance requirements can be particularly challenging with and workload variations across all applications (Fig. 2.(c)). 3

Application 1) State Prediction and Q-table Formation: State prediction tasks is a required phase in the reinforcement learning (RL) step to t t t t t ... t 1 2 3 4 5 n identify the Q-value of the future system state. In our proposed Task performance req. through API approach, we also use the same predictor to classify the expected Runtime workload to a system state at the beginning of each decision epoch, as also used in [14], [15], [17]. This predictor estimates OS Scheduler Power Governor the current CPU workload based on the history of the past

workloads and maps this CPU workload to a system state based on the current performance. The CPU cycles' count is preferred as the workload parameter over the other parameters, such as memory accesses, cache misses and instruction rate, etc., as it directly defines the CPU activity due to the processor executing instructions of a given computing task. To predict the workload, an exponential weighted moving average (EWMA) scheme

h h decision performance changed? is used, similar to [5], [8]. Using this scheme, the predicted epoch? variation t1 t2 th Scale current Q- YES workload for the t + 1 decision epoch, Ctˆ+1 is values t−D YES X i Update Exploit V/F Monitor Ctˆ+1 = ωCt + (1 − ω) Ci , (1) history of Minimal re- scaling from performance and i=t−2 past WLs exploration Q-table update Q-table where Ct and Ci are the previous observed workloads (in CPU NO th th Predict Map WL & cycles) at the t and i decision epochs, (t−D) ≤ i ≤ (t−1), Explore V/F current WL current perf. to If exploring? scaling in ω is the moving average coefficient and D is the window size of using system states in YES Q-table EWMA Q-table past observed workloads. The ω and D values are chosen to give Step 1: Initial Reinforement Learning higher prediction accuracy for the given application workloads, (b) similar to [14]. However, the workload prediction through (1) still undergoes mispredictions during runtime due to variations Fig. 3. (a) Proposed adaptive approach showing interactions between layers, in workloads. The impact of such mispredictions on the state (b) flowchart of the proposed adaptive energy minimization approach prediction and corresponding VFS controls are discussed in III.PROPOSED ENERGY MINIMIZATION APPROACH Section IV-A. The proposed approach is shown in Fig. 3.(a) highlighting the The predicted system state is determined by using the esti- interactions between application, runtime and hardware layers. mated workload through (1) and the current performance as a The application layer consists of a series of computation tasks pair. Hence, the system state space S is comprised of the com- being executed at time intervals (we refer to these as decision binations of current performance state in terms of the average epochs), each with a specified performance requirement. The slack ratios (L) and the predicted CPU workloads (Cˆ), denoted performance requirement is communicated to the runtime layer as S{C, L}. For each state (st), the average slack ratio at tth through an application programming interface (API) as below decision epoch (Lt) is divided in five bins (as an example) as  rts.set_perf(41); L > 0.15  t where rts is a -safe runtime variable within the API (see  0.05 < Lt ≤ 0.15 Appendix A), set perf sets the performance requirement as  ∀ : |L | ≤ 0.05 (2) 41 ms per decision epoch. The runtime layer consists of the st t  power governor implementing the proposed adaptive energy −0.15 < Lt ≤ −0.05, &  minimization and the OS scheduler (not implemented, shown Lt ≤ −0.15, for completeness). With the given performance requirement, the Similarly, for each state the CPU workloads are divided in power governor controls and adapts the appropriate hardware several workload bins, each with ranges (i.e. ∆C) around a base power levers (i.e. VFS) at regular decision epochs, while meeting workload (Cb). As an example illustration, workload states with the specified application performance requirement. six bins is as follows:  ˆ Fig. 3.(b) shows a flowchart of the proposed energy mini- Ct+1 > (Cb + 2∆C),  ˆ mization approach, organized in two major steps: reinforcement (Cb + 2∆C) > Ct+1 ≥ (Cb + ∆C),  learning and learning transfer. The reinforcement learning sets (C + ∆C) > Cˆ ≥ C , ∀ : b t+1 b (3) up proactive VFS controls at each decision epoch through state st ˆ Cb > Ct+1 ≥ (Cb − ∆C), prediction. 
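Restoring Equation (1) from the text around it, the predicted workload for decision epoch t+1 blends the last observation with the window of older samples, Ĉ(t+1) = ω·C(t) + (1−ω)·Σ C(i) for i = t−D to t−2. The C sketch below assumes the history term is taken as the mean of that window, and uses illustrative values for ω and D; it is not the governor's code.

    /* Sketch of the windowed EWMA predictor of Equation (1). */
    #define D 8                                /* window size (assumed)              */
    static unsigned long history[D];           /* previous workloads C(t-D)..C(t-1)  */
    #define OMEGA_NUM 6                        /* assumed omega = 0.6                */
    #define OMEGA_DEN 10

    unsigned long predict_workload(unsigned long c_t /* cycles observed at epoch t */)
    {
        unsigned long sum = 0, mean, i;
        for (i = 0; i < D - 1; i++)            /* older samples C(t-D)..C(t-2)       */
            sum += history[i];
        mean = sum / (D - 1);
        for (i = 0; i < D - 1; i++)            /* shift the window by one epoch      */
            history[i] = history[i + 1];
        history[D - 1] = c_t;
        return (OMEGA_NUM * c_t + (OMEGA_DEN - OMEGA_NUM) * mean) / OMEGA_DEN;
    }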
Based on the predicted states, the VFS controls are  (C − ∆C) > Cˆ ≥ (C − 2∆C), learnt to meet the specified performance requirement. When  b t+1 b  ˆ performance or workload variations are detected, these controls Ct+1 < (Cb − 2∆C). ˆ are adapted through a learning transfer algorithm with an aim of With the given combinations between Lt and Ct+1 in (2) and ensuring energy minimization in the presence of such variations. (3), state entries for the Q-table are set up. Thus, for each ˆ These steps are detailed in the following. predicted workload (Ct+1) and the current performance (Lt) pair the system state is mapped using (2) and (3). The state space, and the action space (formed of the VFS A. Step 1: Reinforcement Learning control options, denoted as A{V dd, f}) define the size of Q- The initial learning through reinforcement learning algorithm table for the RL step. The size of the Q-table in terms of the evolves in three phases, as follows. total number of state-action pairs (|A{V dd, f}| × |S{C, L}|) is 4
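As a concrete illustration of this state-prediction step, the sketch below implements (1)-(3) in plain C. It is only a sketch under stated assumptions: the kernel governor works in fixed-point rather than with doubles, the window size D, the averaging of the history term in (1) (the extracted equation does not make clear whether the trailing sum is additionally divided by the window length) and the helper names predict_workload, slack_bin, wl_bin and map_state are illustrative choices rather than the implementation described in Appendix A.

    #include <stddef.h>

    #define D_WIN   8   /* EWMA window size D (assumed value)           */
    #define N_WL    6   /* workload bins of (3)                         */

    /* Workload prediction following (1): C_hat = w*C_t + (1-w)*history.
     * c_hist[0] holds C_t, c_hist[1..D_WIN-1] the older observations;
     * the history term is taken here as their mean (assumption).       */
    static double predict_workload(const double c_hist[D_WIN], double omega)
    {
        double sum = 0.0;
        for (size_t i = 1; i < D_WIN; i++)
            sum += c_hist[i];
        return omega * c_hist[0] + (1.0 - omega) * sum / (D_WIN - 1);
    }

    /* Slack-ratio bins of (2). */
    static int slack_bin(double L)
    {
        if (L > 0.15)   return 0;
        if (L > 0.05)   return 1;
        if (L >= -0.05) return 2;   /* |L| <= 0.05: near-zero slack */
        if (L > -0.15)  return 3;
        return 4;
    }

    /* Workload bins of (3), around base workload c_b with range dc. */
    static int wl_bin(double c_hat, double c_b, double dc)
    {
        if (c_hat > c_b + 2 * dc)  return 0;
        if (c_hat >= c_b + dc)     return 1;
        if (c_hat >= c_b)          return 2;
        if (c_hat >= c_b - dc)     return 3;
        if (c_hat >= c_b - 2 * dc) return 4;
        return 5;
    }

    /* Q-table state index for the (L_t, C_hat_{t+1}) pair. */
    static int map_state(double L, double c_hat, double c_b, double dc)
    {
        return slack_bin(L) * N_WL + wl_bin(c_hat, c_b, dc);
    }

Indexing the states row-major by slack bin and then workload bin matches the (30x4) table size quoted later for five slack states and six workload states.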

The state space and the action space (formed of the VFS control options, denoted as A{Vdd, f}) define the size of the Q-table for the RL step. The size of the Q-table in terms of the total number of state-action pairs (|A{Vdd, f}| × |S{C, L}|) is important for the RL algorithm as it influences the trade-off between learning overhead and the energy minimization achieved (see Sections III-B and IV for detailed trade-offs). In this work, the size of the Q-table is carefully chosen to ensure a good trade-off between learning overhead and energy minimization (see Section V). With the given state prediction and Q-table formation, the RL algorithm carries out exploration and evaluation of the VFS control actions as discussed next.

2) Exploration: Traditionally, exploration in a Q-table is carried out using a random action selection strategy from the pool of actions, each with a uniform probability distribution. However, such exploration is inefficient as it does not use the intuitive relationships that often exist between the state-action pairs [24]. For example, when the system is currently in a state with a higher positive average slack ratio (L), lower operating frequency actions are likely to reduce it. Similarly, when the system is in a state with a higher negative L, higher operating frequency actions are needed to converge close to zero L values. To reflect such relationships during exploration, we use the following discrete exponential probability distribution function for the selection of action (a_t):

    p(a_t)_{a_t ∈ A{Vdd,f}} = λ exp[−λ f(a) β L],    (4)

where λ is the uniform probability of actions (i.e. λ = 1/|A{Vdd, f}|), f(a) is the operating frequency in action a and β is a constant. According to (4), when L is close to zero the exploration probabilities are almost uniform, guided by λ. However, positive and negative L prioritize the selection of lower and higher frequencies, respectively. The discrete exponential probability distribution, given by (4), has an advantage in terms of quicker learning, which is stated and proven in Lemma I (see Appendix B). Upon selection of the action, the resulting performance impact is evaluated in terms of the Q-value. At the beginning of the (t+1)-th decision epoch, the Q-value corresponding to a selected VFS action is updated as [15]:

    Q(s_{t+1}, a_t) = Q(s_t, a_t)(1 − α) + α[ r_t + γ max_{a_t} Q(s_{t+1}, a_t) ],    (5)

where α is the learning rate, γ is the discount factor used to de-scale the current maximum Q-value in the row (0 ≤ α, γ ≤ 1), s_t is the observed state in the t-th decision epoch and s_{t+1} is the predicted state in the (t+1)-th decision epoch, determined by the estimated workload and current performance (see Section III-A1). The reward function r_t in (5) is computed as a function of the resulting performance in terms of the average slack ratio at the t-th decision epoch (L_t) and the change in the average slack ratio since the last decision epoch (∆L), given by

    r_t = K_1 |L_t| + K_2 ∆L ,    (6)

where K_1 and K_2 denote constant values for the different bins of average slack ratios in (2). These values are pre-determined such that higher rewards are offered to actions that reduce L_t to near-zero values (i.e. within ±5%, chosen as the range for soft deadlines of the applications) or to actions that show an improved ∆L (i.e. from a higher to a lower slack ratio). The K_1 values also ensure that negative rewards are awarded to actions that result in an increased average slack ratio (∆L). The ∆L values in (6) are estimated at each decision epoch by

    L_t = (1 / (N T_ref)) Σ_{i=0}^{t} (T_ref − T_i − T_OVH) ,    (7)

where T_ref is the real-time reference for the execution time per decision epoch, T_i is the application task execution time incurred due to the choice of VFS actions at every i-th decision epoch (0 ≤ i ≤ t), N is the number of decision epochs elapsed since the application started for a given T_ref, T_OVH is the sum of the overheads caused by VFS and reinforcement learning (evaluated empirically through offline profiling, see Section V) and ∆L_t is the average slack ratio difference since the last observation, given as ∆L_t = L_{t−1} − L_t. The T_i in (7) can be estimated at every decision epoch as the ratio between the observed processor CPU cycles (C_i), obtained from its performance monitors, and the operating frequency chosen (f_i) at the i-th decision epoch, as

    T_i = C_i / f_i .    (8)

Equations (5) and (6) set up the reinforcement learning-based exploration of the VFS control actions.

3) Exploitation: After the VFS control actions for the Q-table states have been explored and evaluated, the RL algorithm enters an exploitation phase. The transition from the exploration to the exploitation phase is controlled through the exploration probability (EP), denoted by ε (0 ≤ ε ≤ 1). To accelerate exploitation, ε_t at the t-th decision epoch is updated as

    ε_t = ε_{t−1} exp[−(1 − α)] ,    (9)

where α is the learning factor per decision epoch. Based on the ε_t value, exploration or exploitation is carried out to find the best policy sub-set (π*(s_t, a_t)) from a set of exploration policies (Π(s_t, a_t), π ∈ Π) as follows:

    π*(s_t, a_t) = { a_t : max_a Q(s_t, a),  if p > ε_t ;   a_k : p(a_k) given by (4),  otherwise } ,    (10)

where p is a random value uniformly distributed over [0, 1]. As can be seen, when the ε value decreases in (9), the probability of exploitation increases in (10).
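The following sketch, continuing the plain-C, floating-point conventions of the previous one, collects the exploration and exploitation rules of this step: the action-selection distribution of (4), the Q-value update of (5) with the reward of (6), the exploration-probability decay of (9) and the policy choice of (10). The Q-value update is written in the conventional Q-learning form (updating the entry of the previously taken action); the frequency values, the flat 30x4 table and the helper names are assumptions for illustration, not the governor's code.

    #include <stdlib.h>
    #include <math.h>

    #define N_STATES  30
    #define N_ACTIONS 4

    static double Q[N_STATES][N_ACTIONS];                 /* V/F Q-table  */
    static const double freq_ghz[N_ACTIONS] = { 0.3, 0.6, 0.8, 1.0 };

    static double uniform01(void) { return (double)rand() / RAND_MAX; }

    /* Exploration: sample an action from the discrete exponential
     * distribution of (4); lambda = 1/|A|, beta a constant, L the
     * current average slack ratio.                                    */
    static int explore_action(double L, double beta)
    {
        const double lambda = 1.0 / N_ACTIONS;
        double p[N_ACTIONS], sum = 0.0;

        for (int a = 0; a < N_ACTIONS; a++) {
            p[a] = lambda * exp(-lambda * freq_ghz[a] * beta * L);
            sum += p[a];
        }
        double r = uniform01() * sum, acc = 0.0;           /* renormalized */
        for (int a = 0; a < N_ACTIONS; a++) {
            acc += p[a];
            if (r <= acc)
                return a;
        }
        return N_ACTIONS - 1;
    }

    /* Greedy action used for exploitation in (10). */
    static int greedy_action(int s)
    {
        int best = 0;
        for (int a = 1; a < N_ACTIONS; a++)
            if (Q[s][a] > Q[s][best])
                best = a;
        return best;
    }

    /* One decision epoch: update the Q-value of the last action with the
     * reward of (6) and the rule of (5), decay epsilon as in (9), then
     * pick the next action as in (10).                                  */
    static int rl_step(int s, int s_next, int a_prev,
                       double L, double dL, double K1, double K2,
                       double alpha, double gamma, double beta, double *eps)
    {
        double r = K1 * fabs(L) + K2 * dL;                         /* (6) */

        Q[s][a_prev] = (1.0 - alpha) * Q[s][a_prev]
                     + alpha * (r + gamma * Q[s_next][greedy_action(s_next)]); /* (5) */

        *eps *= exp(-(1.0 - alpha));                               /* (9) */

        return (uniform01() > *eps) ? greedy_action(s_next)        /* (10) */
                                    : explore_action(L, beta);
    }

Choosing the K1 value per slack bin (negative for the outer bins, for instance) reproduces the behaviour described above, where actions that enlarge the slack receive negative rewards.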

B. Step 2: Learning Transfer-based Adaptation

Using a single Q-table in the RL algorithm step to cover the dynamic ranges of workload and performance variations can expand the learning space substantially. Table I shows example illustrations of the impact of using a single Q-table approach, similar to [9]; columns 1 and 2 show the number of actions and the size of the workload bins, while column 3 shows the number of state-action pairs considering a workload coverage from 0 to 10^10 cycles. As can be seen, to cover such a dynamic workload change a single Q-table will have 4000 state-action pairs with 4 actions and a workload bin size of 10^7 each. The size of the Q-table can expand further to 6000 if the number of actions increases to 6.

TABLE I
COMPARISON BETWEEN THE SINGLE Q-TABLE APPROACH [9] AND THE LEARNING TRANSFER APPROACH

|A|  ∆C       Q-table size |A|x|S| (single Q-table)   Q-table size |A|x|S| (learning transfer)
4    1x10^7   4000                                     120
4    2x10^7   2000                                     120
4    4x10^7   1000                                     120
6    1x10^7   6000                                     180
6    2x10^7   3000                                     180
6    4x10^7   1500                                     180

To ensure quicker learning and adaptation to dynamic workload or performance variations, a smaller Q-table is used in this work together with learning transfer. The last two columns of Table I show the motivation for using such learning transfer-based adaptation. As expected, with a smaller state space of 30 the number of state-action pairs is 120 for 4 actions, and 180 for 6 actions, around a base workload. With such smaller Q-tables the learning transfer also benefits from quicker convergence and learning of the Q-table. Table II compares the worst-case convergence times between the single Q-table approach and the smaller Q-tables in the learning transfer-based approach, considering both workload and performance variations in the case of ∆C = 10^7 and |A| = 4. Columns 1-3 show the number of workload variations and the corresponding number of decision epochs required for full convergence of both approaches. As can be seen, the single Q-table approach invariably takes 4000 decision epochs for the full learning of the Q-table, regardless of the workload variations. The learning transfer-based approach takes a significantly lower number of decision epochs for the same due to its smaller Q-tables (Table I). The convergence time in this approach, however, depends on the number of workload variations. As can be seen, when the number of workload variations increases from 1 to 8 within a given time frame, the convergence time increases from 160 to 440 decision epochs, which is still significantly lower than that of the single Q-table approach.

Table II also compares the worst-case convergence times of both approaches for different performance variations (columns 4-6). As can be seen, the single Q-table approach requires a significantly high number of decision epochs when performance variations (i.e. changes in the application task deadline, T_ref) are encountered. This is because the original Q-table can no longer provide the optimized VFS scaling options with such variation in T_ref, which necessitates re-learning from scratch again. Unlike the single Q-table approach, the learning transfer can continue to exploit the learning from the previous T_ref and converge quickly to give the optimized VFS options for the given system states. The convergence times of the proposed learning transfer-based approach in the event of workload and performance variations can be validated through Lemma I (see Appendix B). Since the learning transfer-based approach requires simpler re-definitions of the state space and minimal re-learning, its overhead is smaller compared to the RL step (see Fig. 13).

TABLE II
COMPARATIVE CONVERGENCE OF THE SINGLE Q-TABLE APPROACH [9] AND THE PROPOSED LEARNING TRANSFER-BASED APPROACH

Workload variation                          Performance variation
No. of variations  single  transfer         No. of variations  single  transfer
0                  4000    120              0                  4000    120
1                  4000    160              1                  8000    160
2                  4000    200              2                  12000   200
4                  4000    280              4                  20000   280
8                  4000    440              8                  36000   440

When the workload profile changes within an application beyond the current base workload (observation 1), the current Q-table fails to minimize energy consumption effectively. The same is also true when the performance requirement changes within and across applications (observation 2). To enable adaptation to these variations, the proposed approach first detects the variations through inter-layer interactions (Fig. 4). This is then followed by a Q-table update through the learning transfer algorithm (Algorithm 1). The detection of workload and performance variations and the adaptation to them are further detailed next.

1) Adaptation to Workload Variation: The workload variation is detected through the interaction between the runtime and the hardware layers in the following phases, as shown in Fig. 4. Initially, with a given performance requirement of T_ref per application task (Fig. 3.(a)), the runtime learns the VFS control actions (phases 1-2). During this time the runtime also computes the short-term average workload, C_n, over the last n decision epochs (the impact of varying n is studied in Section V). The mean workload, C_n, is then compared with the current base workload (C_b) of the Q-table (phase 3). If C_n is confirmed to be beyond the current table limits (i.e. C_n > (C_b + 4∆C) or C_n < (C_b − 4∆C)), the Q-table states are updated and scaled with a new base workload (C_b) nearest to C_n (phase 4). The scaling from the old Q-table is carried out as follows:

    ∀t : Q(s_t, a_t)_scaled = Q(s_t, a_t)_old exp(−1 / (L ρ_WL)) ,    (11)

where ρ_WL is the scaling ratio, proportional to C_b,new / C_b,old. Note that the Q-value scaling in (11) ensures that the Q-values are downscaled according to the L states. At higher L values the Q(s_t, a_t)_scaled values are less downscaled, while at lower L values the Q(s_t, a_t)_scaled values are more downscaled, to ensure that further exploration of the VFS control actions can update the optimal policy π* quickly. To minimize such exploration, the current best policy (π*(s_t, a_t)) in the scaled Q-table is then updated in the next phase using the learning transfer algorithm, as shown in Algorithm 1 (phase 5).

Fig. 4. Learning transfer-based adaptation to workload/performance variations (phase 1: application initiates execution with a performance requirement T_ref; phase 2: explore/exploit V/F controls and compute the average workload C_n; phase 3: check whether C_n exceeds the current table limits; phase 4: update C_b and scale the Q-values; phase 5: learning transfer (Algorithm 1); phase 6: application switch).

Algorithm 1 Learning transfer algorithm
Require: ρ_WL, Q(s_t, a_t)_scaled, π*(s_t, a_t)_old
 1: for each s_t in Q(s_t, a_t)_new do
 2:   if s_t is explored then
 3:     if L(s_t) is near-zero (i.e. ±5%) then
 4:       set: f(π'(s_t, a_t)_new) = ρ_WL × f(π*(s_t, a_t)_old)
 5:     else
 6:       set: f(π'(s_t, a_t)_new) = f(π*(s_t, a_t)_old)
 7:     end if
 8:     map: f(π'(s_t, a_t)_new) to the nearest action a_t' in Q(s_t, a_t)_new
 9:     for every action in Q(s_t, a_t)_new do
10:       if action is a_t' then
11:         swap Q(s_t, a_t)_new with Q(s_t, a_t')_scaled
12:       else
13:         set: Q(s_t, a_t)_new = α Q(s_t, a_t)_scaled
14:       end if
15:     end for
16:   else
17:     set: Q(s_t, a_t)_new = Q(s_t, a_t)_scaled
18:   end if
19: end for
20: return π'(s_t, a_t)_new and Q(s_t, a_t)_new

As can be seen, for each already explored state (s_t) the chosen frequency in the new policy (f(π'(s_t, a_t)_new)) is obtained through scaling by a factor of ρ_WL (lines 2-6). For a state with near-zero slack (i.e. ±5%), such scaling requires multiplying the old chosen frequency by the scaling ratio (ρ_WL), while for a state with a higher positive or negative slack the old chosen frequency is retained as the new chosen frequency. After such scaling of the chosen frequencies, their corresponding actions are mapped in the new Q-table and the new Q-values are updated through transfer of the scaled Q-values (lines 9-15). The Q-value of the new chosen action is set to the Q-value of the old chosen action (line 11), while the other values are retained from the scaled Q-values through (11). For an un-explored or partially explored state (s_t), the scaled Q-value is retained in the new Q-table (line 17). The resulting Q-table (line 20) with transferred learning is then used with a reduced exploration probability (ε_t) for accelerated re-exploration, instead of learning from scratch.
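A sketch of this adaptation step, under the same illustrative conventions, is given below. It covers the table re-basing of (11) and the transfer of Algorithm 1; the treatment of L in the exponent (taken as the magnitude of the state's representative slack value, with a small guard against division by zero), the explored[] bookkeeping and the helper nearest_action() are assumptions made for the sketch, as is re-using exactly the same routine for a performance change by passing rho = T_ref,old/T_ref,new instead of the workload ratio.

    #include <math.h>
    #include <stdbool.h>

    #define N_STATES  30
    #define N_ACTIONS 4

    extern double Q[N_STATES][N_ACTIONS];          /* as in the earlier sketch */
    extern const double freq_ghz[N_ACTIONS];       /* ascending                */
    extern bool   explored[N_STATES];              /* sufficiently visited?    */
    extern double slack_of_state[N_STATES];        /* representative L per bin */

    /* Phase 4: scale every Q-value as in (11); rho is C_b,new/C_b,old for a
     * workload change (or T_ref,old/T_ref,new for a performance change).   */
    static void rebase_table(double rho)
    {
        for (int s = 0; s < N_STATES; s++) {
            double L = fabs(slack_of_state[s]) + 1e-9;   /* guard L ~ 0 */
            for (int a = 0; a < N_ACTIONS; a++)
                Q[s][a] *= exp(-1.0 / (L * rho));
        }
    }

    static int nearest_action(double f)
    {
        int best = 0;
        for (int a = 1; a < N_ACTIONS; a++)
            if (fabs(freq_ghz[a] - f) < fabs(freq_ghz[best] - f))
                best = a;
        return best;
    }

    /* Phase 5: learning transfer (Algorithm 1) on the already scaled table;
     * old_policy[s] is the action chosen for state s before the change.    */
    static void learning_transfer(double rho, double alpha,
                                  const int old_policy[N_STATES])
    {
        for (int s = 0; s < N_STATES; s++) {
            if (!explored[s])
                continue;                  /* line 17: keep the scaled value */

            /* lines 3-7: rescale the chosen frequency only near zero slack */
            double f_old = freq_ghz[old_policy[s]];
            double f_new = (fabs(slack_of_state[s]) <= 0.05) ? rho * f_old
                                                             : f_old;
            int a_new = nearest_action(f_new);             /* line 8        */

            /* lines 9-15: move the old chosen Q-value to the new action,
             * de-scale the remaining entries by alpha.                     */
            double q_keep = Q[s][old_policy[s]];
            for (int a = 0; a < N_ACTIONS; a++)
                Q[s][a] = (a == a_new) ? q_keep : alpha * Q[s][a];
        }
    }

Calling rebase_table() followed by learning_transfer() with rho = 2, for instance, mirrors the example illustration given later (Section III-B3), where the base workload doubles from 1.5x10^7 to 3.0x10^7 cycles.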

2) Adaptation to Performance Variation: When the application performance requirement changes due to an intra- or inter-application switch, the adaptation is enabled through the API-based interaction between the application and runtime layers (phase 6, Fig. 4). Upon such interaction, the runtime layer learns the new T_ref with the old C_b and carries out Q-value scaling and learning transfer (phases 4 and 5). Similar to (11), the Q-value scaling is carried out as

    ∀t : Q(s_t, a_t)_scaled = Q(s_t, a_t)_old exp(−1 / (L ρ_T)) ,    (12)

where ρ_T is the scaling ratio, proportional to T_ref,old / T_ref,new. Similar to (11), Equation (12) also scales the Q(s_t, a_t)_old values based on the L values. Following the Q-value scaling in (12), the Q-values are transferred between action pairs using Algorithm 1 with the ρ_WL values replaced by ρ_T. The transferred Q-values are then used for further minimal explorations (Fig. 3.(b)). The learning transfer-based adaptation has the advantage of requiring a lower number of re-explorations when compared with the re-learning based approach. The reduction in the number of explorations is described through Proposition I (see Appendix B).

3) Example Illustration: To illustrate how the proposed approach adapts to an inter-application switch (from Application 1 to Application 2) as an example, Fig. 5.(a) plots the predicted (Ĉ_t) and observed mean workload (C_n) values, with n = 300 decision epochs, while Fig. 5.(b)-(d) show Q-tables for ten states covering two workload bins around the mean workload and the five slack state variations of (2), out of a total of 30 states (for demonstration purposes), and four actions for both applications. As can be seen in Fig. 5.(b), after the initial learning and exploration the Q-table is populated with two different kinds of states: explored states and partially explored or un-explored states. The explored states are more frequently visited due to the workloads exercised by the hardware for Application 1. As a result, most of the actions are evaluated and the best actions are chosen as the highest Q-value in the table (highlighted in blue). The un-explored or partially explored states are not visited as often and hence not all actions are evaluated (un-explored actions are marked by zero values). The proposed approach continues to exploit the learnt DVFS control decisions from the Q-table for Application 1, which has a Q-table base workload of C_b = 1.5 × 10^7.

After the 4423rd decision epoch an application switch happens from Application 1 to Application 2. Such inter-application variation causes both workload and performance variations. The workload variation is observed through the mean workload C_n, while the performance variation is directly communicated by the application to the runtime (Fig. 4). To adapt to such variations, the proposed approach carries out learning transfer in two stages. First, the Q-values are scaled through (11) using the new base workload C_b = 3.0 × 10^7 near the current mean workload (C_n). The resulting scaled Q-values are shown in Fig. 5.(c). The scaled Q-values are then used to carry out further action transfer of the explored and un-explored states through Algorithm 1. For explored states, the new actions at near-zero slack values (±5%) are further minimally re-explored. The resulting Q-values and the chosen actions after such exploration are shown in Fig. 5.(d). Due to such minimal re-exploration, the adaptation to workload and performance variations is much faster (see Tables I and II, and Section V). It is to be noted that after learning transfer and re-exploration only 1 out of 10 states is updated, compared to the re-learning based approach which would require updating all 10 states.

Fig. 5. Example illustration of learning transfer-based adaptation: (a) predicted and observed mean workloads (C_n) across the switch from Application 1 to Application 2 at the 4423rd decision epoch; (b) Q-table during exploration in the initial learning step with base workload C_b = 1.5x10^7 cycles; (c) Q-table after Q-value and action scaling in the learning transfer step with base workload C_b = 3.0x10^7 cycles; (d) Q-table after minimal re-exploration in the learning transfer step with C_b = 3.0x10^7 cycles.

IV. EXPERIMENTAL RESULTS

The proposed adaptive energy minimization approach is implemented as a power governor in Linux kernel revision 3.7.10 (see Appendix A) running on a DM3730 system-on-chip, integrated on the BeagleBoard-xM (BBxM) platform [19]. The platform consists of, among others, a single-core ARM Cortex-A8 CPU, which supports four V/F levels: 300MHz at 0.93V, 600MHz at 1.10V, 800MHz at 1.26V and 1GHz at 1.35V [19]. To evaluate the effectiveness of the proposed governor, ffmpeg-based multimedia [20], MiBench benchmark [21] and browser [22] applications are executed. The energy consumption of the ARM Cortex-A8 core is measured by taking an inductor off the board and re-routing the signal through the same inductor and the power analyzer, while the voltage supplied across the Cortex-A8 is measured directly across the core supply. All experiments are carried out using a Q-table size of (30x4), consisting of 5 slack states and 6 workload states, as such a table size gives the best trade-off between energy minimization and learning overheads, as discussed in Section V.

4.E+07 MPEG4 (30 fps) 81 144 MPEG4 workloads 3.E+07 H.264 (15 fps) 91 149

2.E+07 mad (22k) 86 149

CPU workload, Cycles workload, CPU 1.E+07 susan (384x288) 78 135

0 5

70 10 15 20 25 30 35 40 45 50 55 60 65 75 80 85 90 95

100

5734 5709 5714 5719 5724 5729 5739 5744 5749 5754 5759 5764 5769 5774 5779 5784 5789 5794 5704 ispell (largespell) 82 139 Frames The impact of mispredictions during initial stage FFT (32 fps) 75 119 (a) application switch triggers mispredictions and and learning is mitigated through average slack browser (small) 89 141 considerations in the Q-table under-performance, which is corrected after D frames in the EWMA algorithm; the impact mispredictions is mitigated through learning transfer approach benefits from reduced number of explorations due to 60 the relationship between current performance and VFS action 40 in (4) when compared with the exploration using a uniform 20

0 probability distribution in [17] (see Lemma I, Appendix B).

0 5

10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95

100

5709 5714 5719 5724 5729 5734 5739 5744 5749 5754 5759 5764 5769 5774 5779 5784 5789 5794 -20 5704 The applications FFT and susan were found to have the lowest Frames Average slack, % slack, Average -40 number of explorations. This is because these applications ex- -60 (b) hibit less workload variation during the RL algorithm step. Due to less variation the Q-learning visits similar states repeatedly Fig. 6. (a) CPU workload predictions of MPEG4, followed by FFT, (b) impact and learns the VFS controls faster. The applications MPEG4, of learning on the average slack ratio (in %) H.264 and browser showed the highest number of explorations due to higher workload variations. Such variations cause higher A. Impact of State Prediction and Exploration number of states to be visited during RL algorithm, requiring more number of explorations as expected. To investigate into the impact of state prediction on learning, Fig. 6.(a) shows the predicted and observed workloads (in B. Intra-Application Energy Minimization CPU cycles), while Fig. 6.(b) shows the corresponding average A number of experiments are carried out with intra-application slack ratios (L) caused by the RL algorithm (Section III-A) workload and performance variations showing comparative for MPEG4 decoding at 24 SVGA fps, followed by browser evaluation of the proposed adaptive approach. application. As can be seen, the EWMA based workload 1) Workload Variations: Fig. 7 shows the experimental results prediction (given by (1)) incurs occasional mispredictions: during of an H.264 decoder decoding at 24 VGA fps, used as a case the exploration (the first 25 frames) and exploitation phases study, highlighting the adaptation to intra-application workload (after 90 frames) of MPEG4 and also briefly when the system variations. Fig. 7.(a) shows the predicted workload together with switches from MPEG4 to browser (after 5750 frames, Fig. 6.(a)). the observed mean CPU workload over a moving window of The highest average misprediction of about 8% on average n=200 decision epochs, while Fig. 7.(b)-(d) show the resulting with respect to the mean observed workload was observed average slack ratios caused by RL algorithm and adaptation during the initial 100 frames in the MPEG4 decoding 24 steps, the Q-table states and the VFS control actions chosen SVGA fps, while the lowest misprediction of only 3% was over the decision epochs (in terms of frames). As can be seen, observed for the following frames. To mitigate the impact of initially the governor starts to learn the VFS control actions such mispredictions the RL algorithm (Section III-A) considers (Fig. 7.(d)), which results in performance offset in terms of L the current performance offset in terms of average slack ratio (Fig. 7.(b)). As the governor initiates exploiting some of these (Lt) together with the currently predicted workload during state learnt VFS controls, L starts to settle to near-zero values. During mapping of the next state. In the event of performance offset this phase the Q-table states vary depending on the predicted (showing positive or negative high Lt) caused by mispredictions, workloads, while the VFS actions are chosen from the Q-table the RL algorithm learns and applies the appropriate VFS controls (Fig. 7.(c)-(d)). to minimize it through the action rewarding mechanism (see Note that after the 1271th frame the workload profile changes Section III-A2). 
Similar to MPEG4, workload misprediction is within the application (observation 1, Section II), which is also observed when the system switches to the FFT application detected through comparison of the observed mean workload, th after the 5754 frame. At this time the learning transfer and Cn with the base workload (Cb) of the Q-table (Fig. 4). Due to further explorations take place, which causes under-performance such variation in the workload, the proposed governor carries 7 initially for about 45 FFT frames. However, after further out learning transfer from the old Q-table with Cb=2.5 × 10 7 explorations during the next 50 frames, the under-performance is to the new Q-table with Cb=1.5 × 10 (Section III-B1). The offset by updating the learning of the appropriate VFS controls. learning transfer is then followed by further exploration of the To highlight the advantages of the explorations using ex- Q-table to update the VFS controls to achieve near-zero L values. poneitial probability distribution during the initial learning Due to such re-exploration, the L values are perturbed again step (i.e. RL step), Table III compares the average number with over-performance at reduced Cn (Fig. 7.(b)). Since the of explorations required by the proposed approach and the proposed governor adapts the VFS controls in the presence of that of the learing-based approach [17]. Column 1 shows the intra-application workload variations, it can effectively minimize applications with the input sizes used, while Columns 2 and energy, while meeting the application performance requirement. 3 show the number of explorations recorded for the proposed Fig. 8 plots the normalized energy and performance values and learning-based approaches. As can be seen, the proposed of the proposed energy minimization approach with different 8

Fig. 7. (a) Predicted and observed mean workloads, (b) average slack ratios (in %), (c) VFS controls, and (d) corresponding Q-table states of an H.264 decoder.

Fig. 8. Comparative evaluation of the energy minimization of the H.264 video decoder with different performance requirements (a normalized performance of 1 means on-par performance, > 1 means under-performance and < 1 means over-performance; a normalized energy of > 1 means higher energy consumption, < 1 means lower energy consumption; ≈1 means effective energy minimization provided that performance is also on par).

Normalization is carried out to give comparative figures between the different approaches, covering the dynamic range of performance and workload variations for various applications. Figures 8.(a)-(c) show the results of decoding QVGA resolution at 15 fps, 24 fps and 29.97 fps, while Figures 8.(d)-(f) show the same for decoding SVGA resolutions. The performance is normalized with respect to the required performance per frame (T_ref) and the energy normalization is carried out with respect to an Oracle, which is found through offline determination of the optimized VFS controls for the observed CPU workloads. With such normalization, a normalized performance lower than 1 means over-performance, higher than 1 means under-performance and close to 1 means on-par performance. Similarly, a normalized energy lower than 1 means less energy consumption and higher than 1 means more energy compared to the Oracle; close to 1 means more effective energy minimization, provided the performance is also on par. The normalized energy and performance results are obtained by averaging the decoder results of three different video clips (football, flower and foreman) from the xiph video repository (http://media.xiph.org/video/derf/). The results of the proposed approach are compared with Linux's ondemand governor [11], and with predictive and learning-based approaches. The predictive approach is implemented using [14] without any explicit performance information from the application; for a higher predicted workload a higher VFS level is applied, while for a lower predicted workload a lower VFS level is applied. The learning-based approach is implemented using the single Q-table based learning proposed in [17] with an application-specific controller, configured with the application performance requirement.

As can be seen, the ondemand governor consistently over-performs when compared with the other approaches. This is because it does not interact with the applications' performance requirements. As a result, it generates the highest energy consumption among all approaches. The predictive energy minimization approach is also oblivious to the applications' performance requirements, and hence it fails to minimize energy consumption effectively while meeting them. For example, in the case of QVGA the predictive approach over-performs and incurs higher energy consumption (Fig. 8.(a)-(c)). However, for SVGA at 24 and 29.97 fps it under-performs, which makes the energy savings achieved through the predictive approach ineffective (Fig. 8.(e)-(f)). The learning-based approach [17] performs better than the ondemand and predictive approaches as it learns the VFS controls based on the performance requirements. However, since it uses a single Q-table formulation it adapts poorly to intra-application workload variations. Our proposed approach can reduce energy consumption through detection of and adaptation to such variations. However, the reduction in energy consumption that can be achieved using the proposed approach depends on the number of intra-application workload variations detected. For example, in the case of Fig. 8.(c)-(f), the proposed approach does not offer much of an energy saving when compared to the learning-based approach [17] due to fewer workload variations (a maximum of 1 in the case of decoding SVGA frames at 24 fps). However, in the case of Fig. 8.(a)-(b), there is an energy reduction of up to 20% when compared to the learning-based approach, as the number of workload variations is much higher (4 for both cases: decoding QVGA frames at 15 and 24 fps).

To give further insight into the energy minimization of the different approaches compared (Fig. 8), Fig. 9 plots the histograms of H.264 decoding of QVGA and SVGA resolutions at 24 fps each. As can be seen from Fig. 9.(a), for the low resolution QVGA video the learning-based, predictive and proposed approaches execute most of the frames at 300MHz or 600MHz. The ondemand governor, however, executes more than 45% of the frames at 1 GHz due to its CPU utilization-based VFS control. For decoding videos with different resolutions (Fig. 9.(b)), the proposed approach generates a balanced frequency utilization depending on the applications' performance requirements. As a result, it can effectively minimize energy compared to the other approaches.

Fig. 9. Comparative histogram of operating frequencies applied in H.264 decoding of (a) QVGA, and (b) SVGA frame resolutions, at 24 fps each.

2) Performance Variations: Fig. 10 depicts the normalized energy and performance values (with respect to the Oracle) of different benchmark applications with varied performance requirements within the applications. Figures 10.(a)-(c) show the variation from a lower performance to a higher performance for the ispell (MiBench), H.264 (ffmpeg) and FFT (MiBench) applications, while Figures 10.(d)-(f) show the variation from a higher performance to a lower performance for the susan (MiBench), MPEG4 (ffmpeg) and mad (MiBench) applications. The normalized energy and performance results are obtained by averaging three observations, each with an input sequence of 3000 decision epochs (i.e. frames for MPEG4, H.264, FFT and susan, spelling tasks for ispell and audio packets for mad).

As can be seen, when the performance requirement increases from low to high, both the predictive and learning-based approaches under-perform (Fig. 10.(a)-(c)). On the other hand, when the performance requirement decreases, these approaches over-perform and incur higher energy consumption (Fig. 10.(d)-(f)). This is because both approaches cannot adapt to performance variations due to the lack of interactions between the layers. The ondemand governor, however, shows trends of scaling with the performance requirements, but it consistently over-performs. The proposed approach can effectively scale with the variation in the performance requirement and adapt through learning transfer (Section III-B2). As a result, it shows effective energy minimization across all experiments, saving on average 33% and 24% when compared with the predictive and learning-based approaches in the case of the MPEG4 performance variation (Fig. 10.(e)), and 30% when compared with the ondemand governor in the case of the FFT performance variation (Fig. 10.(c)).

C. Inter-Application Energy Minimization

Fig. 10. Comparative energy and performance trade-offs with intra-application performance variations for different benchmark applications, normalized similarly to Fig. 8: (a) ispell, smallspell -> largespell; (b) H.264, VGA 24 fps -> SVGA 29.97 fps; (c) FFT, 16 fps -> 32 fps; (d) susan, large -> small pgm; (e) MPEG4, SVGA 24 fps -> QVGA 24 fps; (f) mad, 44k -> 22k.

Fig. 11 plots the comparative normalized energy consumptions of the energy minimization approaches for six inter-application scenarios. Fig. 11.(a) shows the switch from MPEG4 decoding at 24 VGA fps to H.264 decoding at 15 VGA fps (i.e. a high performance to low performance inter-application switch), Fig. 11.(b) shows the switch from ispell with a small text task to mad with 44k audio packets (i.e. a high performance to low performance switch), Fig. 11.(c) shows the switch from susan large (image size of 384x288 pixels) to FFT at 32 wave fps (high performance to high performance), Fig. 11.(d) shows the switch from the browser (small pages) to MPEG4 decoding at 30 fps (low performance to high performance), Fig. 11.(e) shows the switch from the mad (44k) audio decoder to susan large (high performance to high performance), and finally, Fig. 11.(f) shows the switch from FFT (16 fps) to the browser with medium sized pages (page size up to 35k, demonstrating low performance to high performance). For each inter-application variation, two different switching intervals of 1000 and 2000 decision epochs are used. The energy values are normalized with respect to the proposed approach.

Fig. 11. Comparative normalized energy with inter-application variations.

From Fig. 11 two observations can be made. The first observation is related to inter-application energy minimization for a given switching interval; as can be seen, for an application switch from a higher performance to a lower performance requirement the proposed approach saves up to 38% and 22% for a switching interval of 2000 decision epochs compared to the predictive and learning-based approaches, respectively (Fig. 11.(a)). However, when the application switches from a lower to a higher performance requirement the proposed approach consumes more energy, while meeting the new application performance requirement, when compared with the predictive and learning-based approaches. This is because the proposed adaptive approach adapts to higher VFS control actions to meet the increased performance requirement. Both the predictive and learning-based approaches fail to meet the application performance requirement despite their energy savings. Similar adaptations are also observed for the other inter-application switches. As expected, the ondemand governor consistently over-performs compared to the proposed approach. The second observation is related to the change of switching interval; as can be seen, with a higher switching interval the proposed approach exploits the learning and adaptation for a longer time. This leads to higher energy savings compared with the other approaches. For example, for a change of switching interval from 1000 to 2000, the proposed approach saves up to 6% more energy compared to the predictive approach.

V. LEARNING OVERHEADS

Adaptive energy minimization in the proposed approach is achieved through inter-layer interactions and learning algorithms that have the following two impacts. Firstly, when exploring during the RL or learning transfer steps, the under-performance can cause real-time deadlines to be missed. Secondly, due to the additional computation and storage required by the learning algorithms (Section III), overheads (time and energy) are also caused per decision epoch. To demonstrate the impact of learning through real-time deadline misses and overheads, Fig. 12.(a)-(b) show the worst-case number of deadline misses for intra-application scenarios, while Fig. 12.(c)-(d) show the same for inter-application scenarios, using the proposed and learning [17] approaches, respectively. For all application scenarios, the worst-case number of deadline misses was recorded from five consecutive runs using the input sizes specified, each application running for 3000 decision epochs.

Fig. 12. Worst-case number of deadline misses for (a) intra-application variations using the proposed approach, (b) intra-application variations using the learning approach [17] applied to each variation, (c) inter-application variations using the proposed approach, and (d) inter-application variations using the learning approach [17] applied to each variation.

For the intra-application scenarios (Fig. 12.(a)-(b)), the following observation can be made. As can be seen, the number of worst-case deadline misses depends on the intra-application workload variations and the initial random explorations. The mad, MPEG4 and H.264 applications exhibit the highest number of deadline misses as these applications go through one initial RL step and two intermediate learning transfers (LTs) each. The other applications, such as ispell and susan, only go through one initial RL step and one LT, while FFT goes through one initial RL step without any LT. As a result, the number of deadline misses in these applications is lower. Similar observations can also be made from the inter-application scenarios in Fig. 12.(c)-(d), as the worst-case number of deadline misses depends on the number of workload and performance variations encountered. As can be seen, the inter-application scenarios with MPEG4 (24 fps) to H.264 (15 fps) and browser (small) to MPEG4 (30 fps) incur a higher number of deadline misses as they go through one initial RL step and four LTs. On the other hand, the scenarios with susan (small) to FFT (32 fps) and FFT (16 fps) to browser (medium) incur a lower number of deadline misses as these go through one initial RL step and two LTs each. It is to be noted that over the 3000 decision epochs of the observation period for each application, the worst-case number of deadline misses accounts for only 4.8% for the intra-application scenarios and 3.8% for the inter-application scenarios using the proposed adaptive approach. When the learning approach [17] is used for adapting to the workload or performance variations, the worst-case number of deadline misses increases to 18% in the presence of inter-application variations and 13% in the presence of intra-application variations. This increase in overheads is caused by the increased number of explorations during re-learning as opposed to the learning transfer-based explorations, as explained in Proposition I (see Appendix B).
To demonstrate the impact of learning due to the additional computation and storage, Fig. 13 plots the average learning overheads of the proposed approach per decision epoch, compared with the existing approaches: ondemand [11], learning [17] and predictive [14]. The time overheads are evaluated by averaging the differences of the per-frame execution times of an ffmpeg video decoder decoding three sequences (T_ref = 31ms) with and without the energy minimization approaches. As can be seen, the measured overheads have the following three component delays:

1) VFS Transition Delay: a less variable delay due to the transition of CPU frequencies. The ARM Cortex-A8 has a transition delay of about 300µs (0.3 ms) [30].
2) Performance Monitor Delay: this delay includes the time taken to read the system clock for evaluating the performance (typically a few microseconds to up to 0.3ms, depending on the number of times the clock is read [31]) and to access the performance counter registers (each access has a typical measured delay of ≈20µs, including the overheads of the C-wrapped assembly instructions).
3) Control Decision Delay: this delay includes the time taken by the various control steps and depends on the complexity of the control.

Fig. 13. Comparative time overheads (T_OVH) of the different approaches, broken down into VFS transition, performance monitor and control decision components.

As expected, the control decision dominates the overheads in all the approaches, as these require a number of computation steps (such as the reinforcement learning and learning transfer algorithms with multiple fixed-point calculations). The proposed approach exhibits the highest time overhead of 2.1ms, with up to 8% deviation of the control decision time depending on the random explorations during the initial RL and intermediate LTs. Compared with the learning approach [17], the proposed approach uses additional interactions between the layers and learning transfer-based adaptation steps to minimize energy further in the presence of intra- and inter-application variations, with up to 1.22 ms and 0.23 ms respectively for the RL and LT steps. The predictive and ondemand approaches have lower overheads due to their simpler control decisions and fewer performance counter accesses. The higher overhead of the proposed approach highlights one of its limitations. However, when such overhead is high compared to the application performance requirement, T_ref can be re-defined as a multiple of decision epochs to effectively minimize energy consumption.

To demonstrate the impact of the learning choices made in terms of the size of the Q-tables in the RL algorithm, Fig. 14 plots the average energy (in %) and time overheads (in ms) for different Q-table sizes with the following state-action entries: 15 states and 4 actions (i.e. 15x4), 30 states and 4 actions (i.e. 30x4) and 40 states and 4 actions (i.e. 40x4). The energy overheads are evaluated by comparing the average energy consumption of the proposed approach with the offline profile-generated energy consumption of the Oracle for similar performance levels (Fig. 14.(a)), while the time overheads are measured by observing the CPU times elapsed during learning and VFS action (T_OVH), averaged per decision epoch (Fig. 14.(b)). Both measurements are obtained through two ffmpeg (H.264 and MPEG4) and four MiBench (FFT, susan, ispell and mad) benchmark applications, running over 3000 decision epochs each with the corresponding input sequences. As can be seen, the energy overheads increase slightly with increased Q-table sizes (Fig. 14.(a)). This is because of the following two reasons. Firstly, with a higher number of states, the Q-tables have higher complexity and require longer exploration times, which results in slower convergence over time. The impact of this can be observed in Fig. 14.(b), as the time overhead per decision epoch marginally increases with increased Q-table sizes. Secondly, due to the increased time overheads, the effective deadline per decision epoch (T_ref − T_OVH) reduces, which requires a slightly higher VFS control to be applied to meet the performance requirements, resulting in higher energy consumption. It can be noted that for the lowest Q-table size of (15x4), the proposed approach also incurs slightly higher energy consumption. Due to the lower workload variations covered within the Q-table, the proposed approach goes through more LTs, requiring further explorations (see Fig. 15).

Fig. 14. (a) Energy, and (b) time overheads for different Q-table sizes.

Fig. 15. (a) Energy, and (b) time overheads for varying workload bins (∆C).

To investigate the impact of the workload bin (∆C) and the window size of the average workload (C_n) computation, Fig. 15.(a) and (b) show 3D bar plots of the energy and time overheads incurred by the proposed approach for the following ∆C values: 0.05 × C_b, 0.1 × C_b, 0.2 × C_b and 0.25 × C_b. For each ∆C value, the window size n is varied between 100, 200, 300 and 1000 decision epochs. The energy and time overheads are evaluated by repeating the experiments reported in Fig. 14 with a Q-table size of (30x4). Fig. 15.(a)-(b) demonstrate the energy consumption and time overhead trade-offs with the ∆C values for a given moving average window size. As can be seen, at lower ∆C values the energy and time overheads are higher. This is because the workload variation covered by the Q-table is lower, which increases the number of LTs needed to adapt to workload variations for a given application.

When the ∆C values are higher, coarser workload variations are covered by the Q-table, which gradually reduces the learning time overheads due to a lower number of LTs. However, at increased ∆C values the normalized energy overhead increases marginally. This is because coarser workload bins cause a loss of precision in the VFS controls needed for effective energy minimization.

The moving average window size for C_n also demonstrates energy and time overhead trade-offs (Fig. 15.(a)-(b)). At a lower window size, the proposed approach experiences higher workload variations in the short term, causing an increase in the LTs needed to adapt to workload variations. This causes the energy and time overheads to increase slightly. As the window size is gradually increased, the number of intra-application variations decreases, reducing the energy and time overheads. However, when the window size is too large, it causes higher energy and time overheads due to the following two reasons. Firstly, such a large window size poorly represents the workload variations through C_n, which causes a loss of control precision. Secondly, due to such a large window size the computation and storage during runtime also increase, which directly increases the computation time overheads. Such increased time overheads, in turn, reduce the opportunity to reduce energy effectively, as demonstrated by the highest energy overheads (≈10%) for a given ∆C value of 0.05 × C_b (Fig. 15.(a)). The best trade-off between energy and time overheads is obtained at a ∆C value of 0.1 × C_b and a C_n window size of 200.

VI. SCALING TO MULTI-CORE SYSTEMS

The proposed approach is also implemented and validated in multi-core systems. The implementation requires simple modifications to the approach presented in Section III. First, the predicted workload per core is normalized with respect to the total system workload as

    C^j_{t+1} = Ĉ^j_{t+1} / Σ_{j=1}^{J} Ĉ^j_{t+1} ,    (13)

where Ĉ^j_{t+1} is the predicted workload in CPU cycles and C^j_{t+1} is the normalized workload for the j-th processor core (j = 1 to J, where J is the total number of cores in the system). With the given normalized workload (C^j_{t+1}) and the average slack ratio (L_t) bins, a number of Q-table states are defined and organized in rows (similar to (2) and (3)). For each state, the available VFS control options are used in the action space, organized in the columns to form the Q-table. This Q-table is then shared among the processor cores to allow reinforcement learning through one core action update per decision epoch (controlled in a round-robin fashion). Such VFS control per decision epoch has two distinct advantages: (a) automatic learning transfer between cores when a similar workload is predicted (see Fig. 17), and (b) the Q-table complexity is reduced significantly as opposed to controlling multiple cores per decision epoch, which requires combinations of the VFS controls of all cores in the Q-table [29]. To ensure that the Q-table accesses are mutually exclusive, mutex-based Q-table access is implemented in the governor (see Appendix A). After the initial learning, when an intra- or inter-application workload or performance variation is detected, learning transfer is carried out as shown in Section III-B.
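A brief sketch of this multi-core modification follows, again as illustrative user-space C: the per-core workload normalization of (13) and a shared, mutex-protected Q-table updated by one core per decision epoch in round-robin order. The pthread mutex stands in for the kernel governor's actual locking, and multicore_epoch() only marks where the single-core RL/learning-transfer step of Section III would be invoked.

    #include <pthread.h>

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_core;                      /* round-robin pointer       */

    /* Normalized per-core workload of (13). */
    static double normalize_workload(const double c_hat[], int j, int n_cores)
    {
        double total = 0.0;
        for (int k = 0; k < n_cores; k++)
            total += c_hat[k];
        return (total > 0.0) ? c_hat[j] / total : 0.0;
    }

    /* One decision epoch: only the core whose turn it is updates the shared
     * Q-table, so learning carries over between cores that map to the same
     * (normalized workload, slack) state.                                   */
    static void multicore_epoch(const double c_hat[], double L, int n_cores)
    {
        int j = next_core;
        next_core = (next_core + 1) % n_cores;

        double c_norm = normalize_workload(c_hat, j, n_cores);

        pthread_mutex_lock(&q_lock);
        /* map (c_norm, L) to a Q-table state and run the single-core
         * exploration/exploitation or learning-transfer step for core j */
        (void)c_norm;
        pthread_mutex_unlock(&q_lock);
    }

Updating one core per epoch keeps the action space at the single-core |A| rather than the |A|^J combinations needed when all cores are controlled jointly, which is the complexity reduction referred to in advantage (b) above.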

[Fig. 17. Comparative worst-case learning overheads (worst-case learning time in decision epochs) of the multi-core learning approach [29] and the proposed approach for 2 and 4 cores.]

Using such a learning transfer-based approach in multi-core systems has the added advantage of controlled learning overheads, as shown in Fig. 17. As can be seen, the proposed approach continues to learn effectively using a similar learning time to the single-core case (see Table III) in terms of the number of decision epochs taken in the worst case. This is because each core carries out exploration of the DVFS controls in each decision epoch using its normalized predicted workload and the overall application performance as the state-space pair, which makes the learning evaluations between cores automatically transferrable. However, when DVFS controls of multiple cores are considered in each decision epoch, the Q-table requires combinations of all DVFS options. Due to such a large action space, the worst-case learning time increases by more than 2X when the number of cores increases from 2 to 4.
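The action-space argument can be made concrete with a small calculation. Assuming six VFS points per core (an illustrative figure), joint control of all cores per epoch needs |A|^J actions, while the shared-table scheme keeps |A| actions regardless of core count; the snippet below simply prints both.

    #include <stdio.h>

    int main(void)
    {
        const unsigned long vfs_points = 6;    /* |A| per core (assumed)  */
        unsigned int cores;

        for (cores = 2; cores <= 4; cores += 2) {
            unsigned long joint = 1;           /* |A|^J for joint control */
            unsigned int j;

            for (j = 0; j < cores; j++)
                joint *= vfs_points;
            printf("%u cores: shared-table actions = %lu, joint actions = %lu\n",
                   cores, vfs_points, joint);
        }
        return 0;
    }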
VII. CONCLUSIONS

An adaptive energy minimization approach for embedded systems has been proposed, capable of adjusting to workload and performance variations within and across applications. The energy minimization is enabled through a reinforcement learning algorithm that identifies suitable VFS controls based on predicted workloads for a given application performance requirement. To ensure that the VFS controls are adjusted when workload or performance variations take place, learning transfer-based adaptation is carried out, guided by feedback from the CPU performance counters. The proposed approach is implemented as a power governor in the Linux OS and extensively validated through experiments using different benchmark applications and numbers of cores. The approach is expected to provide effective energy minimization for current and future generations of embedded systems that typically execute multiple applications.

REFERENCES

[1] D. Flynn. An ARM perspective on addressing low-power energy-efficient SoC designs. In Proc. of the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'12), pp. 73-78, 2012.
[2] Y. Tan, W. Liu, and Q. Qiu. Adaptive power management using reinforcement learning. In Proc. of the Intl. Conference on Computer-Aided Design (ICCAD), ACM, New York, NY, USA, pp. 461-467, 2009.
[3] J. Pouwelse, K. Langendoen and H.J. Sips. Application-directed voltage scaling. IEEE Trans. on Very Large Scale Integration Systems (TVLSI), vol. 11, no. 5, pp. 812-826, Oct. 2003.
[4] L. Benini, A. Bogliolo, and G. De Micheli. A survey of design techniques for system-level dynamic power management. IEEE TVLSI, vol. 8, no. 3, pp. 299-316, June 2000.
[5] K. Choi, W.-C. Cheng, and M. Pedram. Frame-based dynamic voltage and frequency scaling for an MPEG player. Journal of Low Power Electronics, vol. 1, no. 1, pp. 27-43, Apr. 2005.
[6] W. Yuan and K. Nahrstedt. Practical voltage scaling for mobile multimedia devices. In Proc. of the 12th Annual ACM International Conference on Multimedia, ACM, pp. 924-931, NY, USA, 2004.
[7] Y. Gu and S. Chakraborty. Control theory-based DVS for interactive 3D games. In Proc. of the 45th Annual Conference on Design Automation (DAC), ACM Press, New York, USA, pp. 740-745, 2008.
[8] S. Sinha, J. Suh, B. Bakkaloglu and Y. Cao. Workload-aware neuromorphic design of the power controller. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 1, no. 3, pp. 381-390, Sep. 2011.
[9] G. Dhiman and T.S. Rosing. System-level power management using online learning. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 28, no. 5, pp. 676-689, May 2009.
[10] M. Pedram. Power optimization and management in embedded systems. In Proc. of the Asia and South Pacific Design Automation Conference (ASP-DAC'01), ACM, pp. 239-244, Yokohama, Japan, 2001.
[11] V. Pallipadi and A. Starikovskiy. The ondemand governor. In Proc. of the Linux Symposium, 2006.
[12] R. Jejurikar and R. Gupta. Dynamic voltage scaling for systemwide energy minimization in real-time embedded systems. In Proc. of ISLPED'04, pp. 78-81, August 2004.
[13] H. Jung and M. Pedram. Continuous frequency adjustment technique based on dynamic workload prediction. In 21st IEEE Intl. Conference on VLSI Design (VLSID), IEEE, pp. 249-254, 2008.
[14] K. Choi, R. Soma and M. Pedram. Dynamic voltage and frequency scaling based on workload decomposition. In Proc. of ISLPED'04, pp. 174-179, 2004.
[15] S. Yue, D. Zhu, Y. Wang, and M. Pedram. Reinforcement learning based dynamic power management with a hybrid power supply. In IEEE 30th Intl. Conference on Computer Design (ICCD), pp. 81-86, 2012.
[16] A.K. Das, R.A. Shafik, G.V. Merrett, B.M. Al-Hashimi, A. Kumar and B. Veeravalli. Reinforcement learning-based inter- and intra-application thermal optimization for lifetime improvement of multicore systems. In Proc. of DAC'14, pp. 1-6, June 2014.
[17] H. Shen, Y. Tan, J. Lu, Q. Wu, and Q. Qiu. Achieving autonomous power management using reinforcement learning. ACM Trans. Des. Autom. Electron. Syst., vol. 18, no. 2, pp. 24:1-24:32, Apr. 2013.
[18] R. Ye and Q. Xu. Learning-based power management for multi-core processors via idle period manipulation. IEEE TCAD, vol. 33, no. 7, pp. 1043-1055, 2014.
[19] BeagleBoard. BeagleBoard-xM Rev C System Reference Manual, 2010. [Online]. Available: http://beagleboard.org
[20] FFmpeg. A complete, cross-platform solution to record, convert and stream audio and video. [Online]. Available: https://www.ffmpeg.org/
[21] M.R. Guthaus, J.S. Ringenberg, D. Ernst, T.M. Austin, T. Mudge, and R.B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In IEEE International Workshop on Workload Characterization (WWC), IEEE, pp. 3-14, Dec. 2001. [Online]. Available: http://www.eecs.umich.edu/mibench/
[22] Mozilla. Open-source Mozilla Firefox browser. [Online]. Available: https://www.mozilla.org/
[23] BBench. Browser benchmarking tool. [Online]. Available: http://bbench.eecs.umich.edu/
[24] T. Jiang, D. Grace and P.D. Mitchell. Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing. IET Communications, vol. 5, iss. 10, pp. 1309-1317, 2010.
[25] O.S. Unsal and I. Koren. System-level power-aware design techniques in real-time systems. Proc. of the IEEE, vol. 91, no. 7, pp. 1055-1069, July 2003.
[26] Xilinx Inc. Zynq-7000 All Programmable SoC: ZC702 Evaluation Kit and Video and Imaging Kit (ISE Design Suite 14.2), 2012.
[27] W. Yuan, K. Nahrstedt, S. Adve, D.L. Jones and R.H. Kravets. Design and evaluation of a cross-layer adaptation framework for mobile multimedia systems. In Proc. SPIE 5019, Multimedia Computing and Networking, pp. 1-13, Jan. 2003.
[28] Hardkernel. Odroid-XU by Hardkernel. [Online]. Available: http://www.hardkernel.com. Last accessed 10 Dec. 2014.
[29] Y. Ge and Q. Qiu. Dynamic thermal management for multimedia applications using machine learning. In Proc. of the 48th Design Automation Conference (DAC), New York, USA, pp. 95-100, 2011.
[30] S. Park, J. Park, D. Shin, Y. Wang, Q. Xie, M. Pedram and N. Chang. Accurate modeling of the delay and energy overhead of dynamic voltage and frequency scaling in modern microprocessors. IEEE TCAD, vol. 32, no. 5, pp. 695-708, May 2013.
[31] ARM. Cortex-A8 Technical Reference Manual. [Online]. Available: http://www.arm.com. Last accessed 7 April 2015.

[Fig. 18. Implementation of the proposed approach showing the different interfaces. Kernel-level library interfaces: (A) OS scheduler interfaces (CPU ID, timer, scheduler); (B) CPUfreq interface (up/down thresholds and differentials); (C) SYSFS interface (kernel system file system variables). Governor library interfaces: (D) governor settings - curQtable[][], prevQtable[][], currentC_b, slack_states, action_states; (E) application programming interface - void set/get_appid(AppID), int set/get_perf(void *app()) [Section 3.3]; (F) RL functions - float calculate_reward(slack_ratio, prev_slack_ratio), float calculate_slack(prev_slack_ratio), void map_state(curWL, slack_ratio), void update_Qtable(state_id, action_id) [Section 3.2]; (G) transfer-learning-based adaptation - unsigned int mean_WL(cur_WL), int check_mean_WL(cur_mean), void transferQtable(..) [Section 3.3]; (H) HW interface functions - uint32 predict_WL(uint32 cur_avg), uint32 observe_WL(void), void lbg_set_cpufreq(..), void start/stop_counter(counter_id*) [Sections 3.1-3.2].]

APPENDIX A: LINUX GOVERNOR IMPLEMENTATION

Fig. 18 shows the block diagram of the Linux power governor implementation, consisting of the following interfaces:

A. Kernel-level Library Interfaces

These interfaces comprise the scheduler, CPUfreq and SYSFS interfaces. The scheduler interface (block A) defines the CPU IDs, the system timer and the functions to establish governor control of a specific CPU. The CPUfreq interface (block B) retrieves information related to the processor frequency steps. The SYSFS interface (block C) defines file system variables/constants.

B. Governor Library Interfaces

These interfaces include the following:

1) Governor settings (block D): define the Q-tables (curQTable and prevQTable), each with its base workload Cb, performance requirement Tref and the state-action pairs.

2) Application programming interface (block E): sets up the inter-layer interactions between the application and runtime layers and enables the detection of intra- or inter-application performance variations. It is implemented through a thread-safe handshake signal (called rts→handshake) between the application and the runtime system, and a SYSFS variable storing the current application performance requirement. The kernel initially starts a notification process with this signal, which is updated and notified by the application layer at each decision epoch. This notification is then followed by the set_perf(..) function to set the performance requirement (Tref) in the SYSFS variable.

3) Hardware interface (block H): defines the interaction between the runtime and hardware layers and enables the detection of workload variations within and across applications. This is carried out by observing the average workload from the CPU performance counters. The interface also defines the workload prediction and observation functions predict_WL(..) and observe_WL(..). During observation of the workload, the start_counter(..) and stop_counter(..) IO functions are used to start and stop the CPU performance counters using the ioctl interface in Linux. The frequency is then set for the system using the lbg_set_cpufreq(..) function.

4) RL functions (block F): set up the learning functions in the Q-table (Section III-A2). Based on the predicted workload, map_state(..) maps the system's current state, and the calculate_reward(..) function calculates the reward of a selected VFS action. The calculated reward is then updated through the update_Qtable(..) function using the Q-values given by (5).

5) Learning transfer-based adaptation (block G): is implemented through a set of functions. For detecting workload variations, the short-term mean workload is calculated using mean_WL(..); performance variations are communicated through the API. When a variation is detected, the Q-table is updated and learning is transferred with a new base workload value (Cb) using transferQtable(..). A total of eight Cb values are pre-defined for Q-table update and learning transfer.

The above kernel- and user-level functions and interfaces were implemented as a Linux governor module within CPUfreq. Since floating-point calculations are limited at kernel level, appropriate scaling was applied to the different variables and constants discussed in Section III. The governor can be applied from the Linux command line through the CPUfreq utility as:

    cpufreq_set --cpu 0 --governor tlbg

cpufreq_set is a command-line utility to administer the CPUfreq governor settings; the --cpu option is followed by the CPU ID (e.g. '0') and the --governor option specifies the governor name (e.g. 'tlbg': transfer learning-based governor).
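To give a flavour of the ioctl-controlled counter pattern that start_counter(..)/stop_counter(..) rely on, the user-space sketch below counts CPU cycles through the Linux perf_event interface. It is an analogue for illustration only (the governor performs counter control at kernel level), and the event choice and error handling are simplified assumptions.

    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        uint64_t cycles = 0;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.disabled = 1;

        fd = perf_event_open(&attr, 0, -1, -1, 0);   /* this process, any CPU */
        if (fd < 0)
            return 1;

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);         /* start_counter(..) */
        /* ... workload of one decision epoch executes here ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);        /* stop_counter(..)  */

        if (read(fd, &cycles, sizeof(cycles)) == sizeof(cycles))
            printf("observed cycles: %llu\n", (unsigned long long)cycles);

        close(fd);
        return 0;
    }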
APPENDIX B

LEMMA I: For exploration of all |S| states in a Q-table, each with |A| actions, the minimum number of explorations required by an exponential probability distribution-based exploration is given by (3/5)|S||A|, which is 40% less than that required by a uniform probability distribution-based exploration: |S||A|.

Proof: The exploration of Q-table states can be divided into two categories, considering the slack distribution in the Q-table (given by (2)): exploration of the states with near-zero slack (i.e. ±5% slack values) and exploration of the other slack states. The former category consists of 1/5th of |S| and requires exploration of all |A| actions in the Q-table due to (4). The total number of explorations required by the states in this category amounts to (1/5)|S||A|. The latter category consists of 4/5ths of |S| and minimally requires exploration of only half of the |A| actions in the Q-table, due to the relationship between their slacks and actions given by (4). Hence, the total number of explorations required by the states in this category sums up to (4/5)(1/2)|S||A| = (2/5)|S||A|. As a result, the minimum number of explorations required for |S| states by an exponential probability distribution-based exploration is given by (1/5)|S||A| + (2/5)|S||A| = (3/5)|S||A|. This proves the first part.

The exploration of |S| states using a uniform probability distribution does not exploit the slack and action relationship described in (4). As a result, all slack states with their action states need to be explored, requiring a total of |S||A| explorations. This can be reduced by 40% using the exponential distribution-based exploration. This proves the second part.

PROPOSITION I: (a) At higher slack states (i.e. |Lt| > 5%), the learning transfer algorithm will retain the performance and VFS action relationships for most of the states. (b) If x% of the higher slack states (i.e. 15% > |Lt| > 5%) retain their relationships with the chosen actions, the learning transfer algorithm (Algorithm 1) will require a minimum of (1/5)|S||A| + (2/5)(1 − x%)|S||A| further explorations.

Proof: At higher slack states (i.e. 15% > |Lt| > 5%), the choice of actions is governed by the current performance level. If the system is currently over-performing, a lower frequency is chosen; however, if the system is currently under-performing, a higher frequency is chosen. When the workload or performance varies within and across applications, these relationships are still governed by the current performance level in terms of the average slack ratio. The changed workload or performance requirement due to such a variation does not affect this relationship (see Section III-B). Hence, most of the states retain the performance and VFS action relationships. This proves part (a).

From Lemma I, the minimum number of explorations required for the lower slack states (i.e. ±5% slack values) is given by (1/5)|S||A|. When x% of the higher slack states retain their relationships with the scaled actions, the minimum number of explorations required by the learning transfer algorithm is given by the fraction of the remaining higher slack states multiplied by half of the total number of possible actions, i.e. (2/5)(1 − x%)|S||A| (Lemma I). For all slack states, the learning transfer algorithm (Algorithm 1) therefore requires a minimum of (1/5)|S||A| + (2/5)(1 − x%)|S||A| further explorations to expedite learning. This proves part (b).
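As a worked illustration of Lemma I and Proposition I (the numbers are assumptions chosen only for the arithmetic): with |S| = 55 states and |A| = 6 actions, a uniform exploration needs |S||A| = 330 explorations, whereas the exponential exploration needs at least (3/5)|S||A| = 198, i.e. 40% fewer. If x = 50% of the higher slack states retain their action relationships after a variation, the learning transfer algorithm needs at least (1/5)(330) + (2/5)(1 − 0.5)(330) = 66 + 66 = 132 further explorations.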