Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning

A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University

In partial fulfillment of the requirements for the degree Master of Science

Mark A. Clark
May 2019

© 2019 Mark A. Clark. All Rights Reserved.

This thesis titled Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning by MARK A. CLARK has been approved for the School of Electrical Engineering and Computer Science and the Russ College of Engineering and Technology by

Avinash Karanth
Professor of Electrical Engineering and Computer Science

Dennis Irwin
Dean, Russ College of Engineering and Technology

Abstract

CLARK, MARK A., M.S., May 2019, Electrical Engineering
Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning (89 pp.)
Director of Thesis: Avinash Karanth

Network-on-chip (NoC) continues to be the preferred communication fabric in multicore and manycore architectures as the NoC seamlessly blends the resource efficiency of the bus with the parallelization of the crossbar. However, without adaptable power management the NoC suffers from excessive static power consumption at higher core counts. Static power consumption will increase proportionally as the size of the NoC increases to accommodate higher core counts in the future. NoC also suffers from excessive dynamic energy as traffic loads fluctuate throughout the execution of an application. Power-gating (PG) and Dynamic Voltage and Frequency Scaling (DVFS) are two highly effective techniques proposed in literature to reduce static power and dynamic energy in the NoC respectively. DVFS is a popular technique that allows dynamic energy to be saved but may potentially lead to a loss in throughput. Power-gating allows static power to be saved but can introduce new problems incurred by isolating network routers.
Further complications include the introduction of long wake-up delays and break-even times. However, both DVFS and power-gating are critical for realizing energy-proportional computing as core counts race into the hundreds for multi-cores.

In this thesis, we propose two distinct but related techniques that enable energy-proportional computing for NoC. We first propose LEAD - Learning-enabled Energy-Aware Dynamic voltage/frequency scaling for NoC architectures. LEAD applies machine learning (ML) techniques to enable improvements in both energy and performance with reduced overhead cost. This allows LEAD to enact a proactive energy management strategy that relies on an offline-trained regression model while also providing a wide variety of voltage/frequency (VF) pairs. In this work, we will refer to various VF pairs as modes. LEAD groups each router and the router’s outgoing links locally into the same V/F domain, allowing energy management at a finer granularity without additional timing complications and overhead.

We then build on LEAD and propose DozzNoC, an adaptable power management technique that effectively combines LEAD with a partially non-blocking power-gating technique. This allows DozzNoC to target both static power and dynamic energy simultaneously, thereby enabling energy-proportional computing. Our ML DVFS techniques from LEAD are applied on top of a partially non-blocking power-gated scheme that uses real-valued wake-up/switching delays. DozzNoC also allows independently power-gated or voltage-scaled routers such that each router and its outgoing links share the same voltage/frequency domain. We evaluate both LEAD and DozzNoC using trace files generated from the PARSEC 2.1 and Splash-2 benchmark suites. Trace files are gathered at various network sizes and across two different network topologies.
For a 64-core 4 × 4 concentrated mesh (CMesh) network, simulation results show that LEAD can achieve an average of 17% dynamic energy savings for an average loss of only 4% throughput. Our simulation results for DozzNoC on an 8 × 8 mesh network show that for an average decrease of 7% in throughput, we can achieve an average dynamic energy savings of 25% and an average static power reduction of 53%.

Acknowledgments

I thank my advisor, Dr. Avinash Karanth, for the support, guidance, and motivation he provided. I also want to thank the many wonderful friends I made throughout my time at Ohio University, even if we have since gone our own ways in life.

Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures
List of Acronyms
1 Introduction
  1.1 Integrated Circuits to Multicores
  1.2 Energy Proportional Computing and NoC
  1.3 Dynamic Voltage and Frequency Scaling for Multicores
  1.4 Power-gating for Multicores
  1.5 Benefits of Machine Learning
  1.6 Major Contributions
  1.7 Thesis Organization
2 LEAD: Offline Trained Proactive DVFS for NoC
  2.1 Related Works
  2.2 LEAD Architecture
    2.2.1 Operating V/F Modes
  2.3 DVFS Models and Implementation
    2.3.1 DVFS Implementation
  2.4 Machine Learning for DVFS
3 DozzNoC: Combination of ML based DVFS and Power-Gating for NoC
  3.1 Related Works
  3.2 DozzNoC Architecture
    3.2.1 Operational States
  3.3 Power-Gated DVFS Models
  3.4 Machine Learning for PG-DVFS
4 Performance Evaluation
  4.1 LEAD Simulation Methodology
    4.1.1 LEAD Model Variants
    4.1.2 LEAD Mode Breakdown
  4.2 LEAD ML Simulation Methodology
    4.2.1 LEAD Feature Engineering
    4.2.2 LEAD ML Accuracy
  4.3 DozzNoC Simulation Methodology
    4.3.1 DozzNoC Model Variants
    4.3.2 DozzNoC Mode Breakdown
  4.4 DozzNoC ML Simulation Methodology
    4.4.1 DozzNoC Feature Engineering
  4.5 LEAD Results
    4.5.1 LEAD Energy and Throughput
  4.6 DozzNoC Results
    4.6.1 DozzNoC Throughput, Static Power, and Dynamic Energy
5 Conclusions and Future Work
References

List of Tables

3.1 DozzNoC’s Reduced Feature Set [16]
4.1 LEAD Benchmarks
4.2 Multi2sim Parameters
4.3 Dynamic Energy Per Hop (Modes 1-5) [17] ©2018 ACM
4.4 Full LEAD Feature Set
4.5 Full LEAD Feature Set (Cont.)
4.6 LEAD-τ Mode Selection Accuracy [17] ©2018 ACM
4.7 DozzNoC Benchmarks
4.8 Static Power and Dynamic Energy Per Hop for Active State Operational Modes [16]

List of Figures

1.1 Rapid growth of processor performance from increased clock speed to multicore processors [46].
1.2 Various network topologies ranging from the bus to a hypercube.
1.3 Depiction of static power becoming the majority of power consumption in the NoC as technology size decreases [9].
1.4 Depiction of DVFS being applied at various granularities ranging from per network to per element.
1.5 An example of power-gating applied to the NoC where the router modification, handshaking, and router pipeline are shown from Power-Punch [13].
2.1 An example DVS link is shown in part (a), while a history-based DVS algorithm is shown in part (b) [51].
2.2 A Threshold and PI controller Finite State Machine (FSM) are shown in (a), while a Greedy controller FSM is shown in (b) [30].
2.3 An example of simultaneous power, temperature, and performance management using Q-learning [52].
2.4 We apply LEAD to a CMesh with 16 routers and 64 cores. We use on-chip voltage regulators that can adjust the supply voltage between 0.8V and 1.2V, allowing us to apply DVFS to individual routers and their corresponding links [17] ©2018 ACM.
2.5 The architecture as well as all additional units required for reactive or proactive mode selection are shown in (a). A simple voltage regulator setup that allows the selection of voltage levels in the range of 0.8V to 1.2V for every router and its associated outgoing links is shown in (b) [17] ©2018 ACM.
2.6 LEAD-τ uses a predicted input buffer utilization to select the optimal mode per epoch. LEAD-∆ uses a predicted change in input buffer utilization to move in the direction of the optimal mode per epoch. LEAD-G incorporates both energy and throughput into the label and
