THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Measurement, Modeling, and Characterization for Energy-Efficient Computing

BHAVISHYA GOEL

Division of Computer Engineering
Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
Göteborg, Sweden 2016

Measurement, Modeling, and Characterization for Energy-Efficient Computing
Bhavishya Goel
ISBN 978-91-7597-411-8

Copyright © Bhavishya Goel, 2016.

Doktorsavhandlingar vid Chalmers tekniska högskola Ny serie Nr 4092 ISSN 0346-718X

Technical Report 129D Department of Computer Science and Engineering Computer Architecture Research Group

Department of Computer Science and Engineering Chalmers University of Technology SE-412 96 GÖTEBORG, Sweden Phone: +46 (0)31-772 10 00

Author e-mail: [email protected]

Chalmers Reposervice
Göteborg, Sweden 2016

Measurement, Modeling, and Characterization for Energy-Efficient Computing

Bhavishya Goel
Division of Computer Engineering, Chalmers University of Technology

ABSTRACT

The ever-increasing ecological footprint of the Information Technology (IT) sector, coupled with the adverse effects of high power consumption on electronic circuits, has increased the significance of energy-efficient computing in the last decade. Making energy-efficient computing the norm rather than the exception requires that system designers and programmers understand the energy implications of their design and implementation choices. This necessitates a detailed view of the system's energy expenditure and/or power consumption. We explore this aspect of energy-efficient computing in this thesis through power measurement, power modeling, and energy characterization.

First, we present a quantitative comparison of power measurement data collected for computer systems using four techniques: a power meter at the wall outlet, current transducers at the ATX power rails, the CPU voltage regulator's current monitor, and Intel's proprietary RAPL (Running Average Power Limit) interface. We compare them for accuracy, sensitivity, and accessibility.

Second, we present two different methodologies to model processor power consumption. The first model estimates power consumption at the granularity of individual cores using per-core performance events and temperature sensors. We validate the methodology on six different platforms and show that our model estimates power consumption with consistently high accuracy across all platforms. To understand energy expenditure trends across different frequencies and different degrees of parallelism, we need to model power at a much finer granularity. The second power model addresses this issue by estimating static and dynamic power consumption for individual cores and the uncore. We validate this model on Intel's Haswell platform for single-threaded and multi-threaded benchmarks. We use this power model to characterize the energy efficiency of frequency scaling on Haswell and use the insights to implement a low-overhead DVFS scheduler. We also characterize the energy efficiency of thread scaling using the power model and demonstrate how different communication parameters and microarchitectural traits affect an application's energy expenditure when it scales.

Finally, we perform a detailed performance and energy characterization of Intel's Restricted Transactional Memory (RTM). We use the TinySTM software transactional memory (STM) system to benchmark RTM's performance against competing STM alternatives. We use microbenchmarks and the STAMP benchmark suite to compare RTM and STM performance and energy behavior. We quantify the RTM hardware limitations and identify conditions required for RTM to outperform STM.

Keywords: power estimation, energy characterization, power-aware scheduling, power management, transactional memory, power measurement

Preface

Parts of the contributions presented in this thesis have previously been published in the following manuscripts.

• Bhavishya Goel, Sally A. McKee, Roberto Gioiosa, Karan Singh, Major Bhadauria, Marco Cesati, “Portable, Scalable, Per-Core Power Estimation for Intelligent Resource Management,” in Proceedings of the 1st International Green Computing Conference, Chicago, USA, August 2010, pp. 135-146.

• Bhavishya Goel, Sally A. McKee, Magnus Själander, “Techniques to Measure, Model, and Manage Power,” in Advances in Computers 87, 2012, pp. 7-54.

• Bhavishya Goel, Ruben Titos-Gil, Anurag Negi, Sally A. McKee, Per Stenström, “Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell,” in Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, Phoenix, USA, May 2014, pp. 615-624.

The following manuscript has been accepted but is yet to be published.

• Bhavishya Goel, Sally A. McKee, “A Methodology for Modeling Dynamic and Static Power Consumption for Multicore Processors,” in Proceedings of the 30th IEEE International Parallel and Distributed Processing Symposium, Chicago, USA, May 2016.

The following manuscripts have been published but are not included in this work.


• Bhavishya Goel, Magnus Själander, Sally A. McKee, “RTL Model for Dynamically Resizable L1 Data Cache,” in Swedish System-on-Chip Conference, Ystad, Sweden, 2013.

• Magnus Själander, Sally A. McKee, Bhavishya Goel, Peter Brauer, David Engdal, Andras Vajda, “Power-Aware Resource Scheduling in Base Stations,” in 19th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Singapore, Singapore, July 2012, pp. 462-465.

Acknowledgments

I would like to thank the following people for supporting me during my PhD and for contributing to this thesis, directly or indirectly.

• Professor Sally A. McKee for giving me the opportunity to work with her, for giving me the confidence boost during the lows, and for celebrating my highs. She always trusted me to choose my own research path but was always there whenever I needed her for counseling. She is also an excellent cook, and her brownies and palak paneers have got me through various deadlines.

• Magnus Själander for helping me with my research during the early years of my PhD and letting me bounce ideas off him. His unabated enthusiasm for research is very contagious.

• Madhavan Manivannan for being a great friend and colleague, and for spending loads of his precious time to review my papers and thesis.

• Professor Lars Svensson for helping me with practical issues during his time as division head and for giving me critical feedback to improve the technical and non-technical aspects of this thesis.

• Professor Per Larsson-Edefors for teaching some of the most entertaining courses I have taken at Chalmers and for his valuable feedback from time to time.

• Professor Per Stenström for being an inspiration in the field of computer architecture.

• Professor Lieven Eeckhout for evaluating my licentiate thesis and for his insightful comments on my research.


• Anurag Negi for being an accommodating roommate and a helpful co-author.

• Ruben Titos for helping me get started with the basics of transactional memory, for doing his part in the paper we co-authored, and for cheering me up with his positive vibes.

• Karan Singh, Vincent M. Weaver, and Major Bhadauria for getting me started with the power modeling infrastructure at Cornell and answering my naïve questions patiently.

• Roberto Gioiosa and Marco Cesati for helping me run experiments for my first paper.

• Jacob Lidman for being an amazing study partner, for sharing office space with me, and for putting up with my rants and bad jokes.

• Alen, Angelos, Bapi, Dmitry, Kasyab, Mafijul, Waliullah, Gabriele, Vinay, Jochen, Fatemeh, Behrooz, Chloe, Petros, Risat, Stavros, Alirad, Ahsen, Viktor, Miquel, Vassilis, Yiannis, and Ivan for being wonderful colleagues and friends.

• Peter, Rune, and Arne for providing invaluable IT support.

• Rolf Snedsböl for managing my teaching duties.

• Eva, Lotta, Tiina, Jonna, and Marianne for providing excellent administrative support during my time at Chalmers.

• All the colleagues and staff at the Computer Science and Engineering department for creating a healthy work environment.

• Anthony Brandon for giving me valuable support when I was working with the ρVEX processor for the ERA project.

• The European Union for funding my research.

• My family for motivating me to push for my goals and being there as a backup whenever I needed them.

Bhavishya Goel
Göteborg, May 2016

Contents

Abstract

Preface

Acknowledgments

1 Introduction
  1.1 Power Measurement
  1.2 Power Modeling
  1.3 Energy Characterization
  1.4 Contributions
  1.5 Thesis Organization

2 Power Measurement Techniques
  2.1 Power Measurement Techniques
    2.1.1 At the Wall Outlet
    2.1.2 At the ATX Power Rails
    2.1.3 At the Processor Voltage Regulator
    2.1.4 Experimental Results
  2.2 RAPL power estimations
    2.2.1 Overview
    2.2.2 Experimental Results
  2.3 Related Work
  2.4 Conclusions

3 Per-core Power Estimation Model
  3.1 Modeling Approach
  3.2 Methodology
    3.2.1 Counter Selection
    3.2.2 Model Formation
  3.3 Validation
    3.3.1 Computation Overhead
    3.3.2 Estimation Error
  3.4 Management
    3.4.1 Sample Policies
    3.4.2 Experimental Setup
    3.4.3 Results
  3.5 Related Work
  3.6 Conclusions

4 Energy Characterization for Multicore Processors Using Dynamic and Static Power Models
  4.1 Power Modeling Methodology
    4.1.1 Uncore Dynamic Power Model
    4.1.2 Core Dynamic Power Model
    4.1.3 Uncore Static Power Model
    4.1.4 Core Static Power Model
    4.1.5 Total Chip Power Model
  4.2 Validation
  4.3 Energy Characterization of Frequency Scaling
    4.3.1 Energy Effects of DVFS
    4.3.2 DVFS Prediction
  4.4 Energy Characterization of Thread Scaling
    4.4.1 Energy Effects of Serial Fraction
    4.4.2 Energy Effects of Core-to-core Communication Overhead
    4.4.3 Energy Effects of Bandwidth Contention
    4.4.4 Energy Effects of Locking-Mechanism Overhead
  4.5 Related Work
  4.6 Conclusions

5 Characterization of Intel's Restricted Transactional Memory
  5.1 Experimental Setup
  5.2 Microbenchmark analysis
    5.2.1 Basic RTM Evaluation
    5.2.2 Eigenbench Characterization
  5.3 HTM versus STM using STAMP
  5.4 Related Work
  5.5 Conclusions

6 Conclusion

List of Figures

2.1 Power Measurement Setup
2.2 Measurement Setup on the ATX Power Rails
2.3 Our Custom Measurement Board
2.4 Measurement Setup on CPU Voltage Regulator
2.5 Power Measurement Comparison When Varying the Number of Active Cores
2.6 Power Measurement Comparison When Varying Core Frequency
2.7 Power Measurement Comparison When Varying Core Frequency Together with Throttling Level
2.8 Efficiency Curve of CPU Voltage Regulator
2.9 Efficiency Curve of the PSU
2.10 Power Measurement Comparison for the CPU and DIMM (Running gcc)
2.11 Power Overhead Incurred While Reading the RAPL Energy Counter
2.12 Power Measurement Comparison for the ATX and RAPL at the RAPL Sampling Rate of 1000 Sa/s
2.13 Coarse-grain Sensitivity Test for ATX and RAPL: Frequency Change Test
2.14 Fine-grain Sensitivity Test for ATX and RAPL at 3.4 GHz
2.15 Fine-grain Sensitivity Test for ATX and RAPL at 800 MHz
2.16 Power Measurement Comparison for ATX and RAPL with Varying Temperature
2.17 Power Measurement Comparison for ATX and RAPL with Varying Real-time Application Period
2.18 RAPL Accuracy Test for Test Benchmark
3.1 Temperature Effects on Power Consumption
3.2 Microbenchmark Pseudo-Code
3.3 Median Estimation Error for the Intel Q6600 System
3.4 Median Estimation Error for the Intel E5430 System
3.5 Median Estimation Error for the AMD Phenom 9500
3.6 Median Estimation Error for the AMD Opteron 8212
3.7 Median Estimation Error for the Intel Core i7
3.8 Standard Deviation of Error for the Intel Q6600 System
3.9 Standard Deviation of Error for the Intel E5430 System
3.10 Standard Deviation of Error for the AMD Phenom 9500
3.11 Standard Deviation of Error for the AMD Opteron 8212
3.12 Standard Deviation of Error for the Intel Core i7
3.13 Cumulative Distribution Function (CDF) Plots Showing Fraction of Space Predicted (y axis) under a Given Error (x axis) for Each System
3.14 Estimated versus Measured Error for the Intel Q6600 System
3.15 Estimated versus Measured Error for the Intel E5430 System
3.16 Estimated versus Measured Error for the AMD Phenom 9500
3.17 Estimated versus Measured Error for the AMD Opteron 8212
3.18 Estimated versus Measured Error for the Intel Core i7-870
3.19 Flow Diagram for the Meta-Scheduler
3.20 Runtimes for Workloads on the Intel Core i7-870 (without DVFS)
3.21 Runtimes for Workloads on the Intel Core i7-870 (with DVFS)
4.1 Model Fitness for Power Consumed due to L2 Misses at F=800 MHz and V=0.7 V
4.2 Model Fitness for Power Consumed due to Non-Memory Execution Units at F=800 MHz and V=0.7 V
4.3 Model Fitness for Power Consumed due to L1 Accesses at F=800 MHz and V=0.7 V
4.4 Model Fitness for Power Consumed due to L2 Accesses at F=800 MHz and V=0.7 V
4.5 Uncore Static Model Fitness
4.6 Core Static Model Fitness
4.7 Abrupt Jump in Power Consumption at Higher Frequency
4.8 Abrupt Jump in Power Consumption at Higher Temperature
4.9 Validation of Total Chip Power
4.10 Energy and EDP Generation
4.11 Effects of DVFS on Total Energy and Performance
4.12 Actual versus Predicted Frequency at which Least Energy is Expended for NAS Benchmarks
4.13 Energy Expenditure at P(3.4)E and P(0.8)E Normalized to FE for NAS Benchmarks
4.14 Actual versus Predicted Frequency at which the Least Energy is Expended for SPEC OMP2001 Benchmarks
4.15 Energy Expenditure at P(3.4)E and P(0.8)E Normalized to FE for SPEC OMP2001 Benchmarks
4.16 Actual versus Predicted Frequency at which Least Energy is Expended for SPEC 2006 Benchmarks
4.17 Energy Expenditure at P(3.4)E and P(0.8)E Normalized to FE for SPEC 2006 Benchmarks
4.18 Energy Consumed During DVFS versus Best Static Frequency for NAS Benchmarks
4.19 Energy Consumed During DVFS versus Best Static Frequency for SPEC OMP2001 Benchmarks
4.20 Energy Consumed During DVFS versus Best Static Frequency for SPEC 2006 Benchmarks
4.21 Pseudo-Code for Serial Fraction Analysis
4.22 Effects of Serial Fraction on the Energy Efficiency of Thread Scaling
4.23 Pseudo-Code for Core-to-core Communication Overhead Analysis
4.24 Effects of Core-to-core Communication on Energy Efficiency of Thread Scaling
4.25 Effects of Bandwidth Contention on Energy Efficiency of Thread Scaling
4.26 Pseudo-Code for Locking Mechanism Overhead Analysis
4.27 Effects of Locking-Mechanism Overhead on the Energy Efficiency of Thread Scaling
5.1 RTM Read-Set and Write-Set Capacity Test
5.2 RTM Abort Rate versus Transaction Duration
5.3 Eigenbench Working-Set Size
5.4 Eigenbench Transaction Length
5.5 Eigenbench Pollution
5.6 Eigenbench Temporal Locality
5.7 Eigenbench Contention
5.8 Eigenbench Predominance
5.9 Eigenbench Concurrency
5.10 RTM versus TinySTM Performance for STAMP Benchmarks
5.11 RTM versus TinySTM Energy Expenditure for STAMP Benchmarks
5.12 RTM Abort Distributions for STAMP Benchmarks

List of Tables

2.1 ATX Connector Pinout
3.1 Intel Core i7-870 Counter Correlation
3.2 Counter-Counter Correlation
3.3 Machine Configuration Parameters
3.4 PMCs Selected for Power-Estimation Model
3.5 Scheduler Benchmark Times for Sample NAS Applications on the AMD Opteron 8212 (sec)
3.6 Estimation Error Summary
3.7 Workloads for Scheduler Evaluation
4.1 Benchmarks Used for Validation
4.2 Performance Events Required to Calculate Memory Bound
4.3 Parameter Values Used for Generating Energy Numbers
4.4 Microbenchmark Parameters for Serial Fraction Analysis
4.5 Microbenchmark Parameters for Core-to-core Communication Overhead Analysis
4.6 Microbenchmark Parameters for Bandwidth Contention Analysis
4.7 Microbenchmark Parameters for Lock-Overhead Analysis
5.1 Relative Overhead of RTM versus Locks and CAS
5.2 Eigenbench TM Characteristics
5.3 Intel RTM Abort Types

1 Introduction

In the last decade, the Information Technology (IT) sector has aggressively pursued the goal of making computers and peripherals more environment friendly. This has resulted in the term Green Computing becoming an increasingly common buzzword in the IT community. It can be defined as the practice of putting greater emphasis on reducing the ecological footprint of computing devices at all stages of the product life cycle: manufacturing, operation, and disposal. In this thesis, we mainly focus on the operational aspect of green computing.

Various contributing factors have encouraged system designers and researchers to focus on the operational aspect of green computing. The list below is an attempt at identifying these factors and explaining their significance.

• Global Warming: One of the biggest reasons that has accelerated the push towards green computing is the increasing consensus among the scientific community [1] that global average temperatures are rising and that this climate change is anthropogenic in nature. This realization calls upon all industries to introspect the ecological footprint of their respective sectors. The increasing use of IT in people's everyday lives means that the ecological footprint of the IT sector is significant and growing. As per a European Commission press release in 2013 [2], IT products and services are responsible for 8-10% of the European Union's electricity consumption and up to 4% of its carbon emissions. Murugesan [3] notes that each personal computer in use in 2008 was responsible for generating about a ton of carbon dioxide per year. Data centers consumed 1.5-2% of global electricity in 2011, and this is growing at a rate of 12% per year [4]. These statistics underscore the importance of reducing the energy we expend to avail IT services.

• Electricity Bills: Another reason for treating energy expenditure as a first-order design constraint is the electricity bill for operating computers. As per a study published by the Natural Resources Defense Council [5], data center electricity consumption cost US businesses almost 13 billion dollars in the year 2013. Combined with the electricity bills incurred by private consumers for operating personal computers and electronic goods at home, this is a strong reason to design energy-efficient computers.

• Maintenance Costs: The power consumed by computing resources is dissipated as heat. This heat dissipation can result in heat build-up around the electronic circuitry, which can in turn raise the temperature of the electronic circuits beyond their operating range and cause performance degradation or critical failures. To avoid this scenario, computers need a mechanism to dissipate this heat efficiently. This can be done by ensuring adequate natural air flow around the circuits, or by utilizing heat sinks, fans, liquid coolants, or a combination thereof. All these heat dissipation mechanisms cost resources in terms of real estate and/or equipment expenditure. Moreover, high temperatures also shorten the expected lifespan of the products. The best way to combat these maintenance costs is to design systems that are more energy-efficient.

• Battery Life: There has been a marked increase in the usage of handheld and portable electronic goods like smartphones, cameras, fitness trackers, electronic watches, and other wearable electronics. Consumers expect every new generation of these devices to provide more features and last longer than before. This makes battery life a critical parameter for any portable consumer device. Unfortunately, battery technology has not kept up with the pace of feature enhancements in these energy-hungry consumer products, increasing the importance of designing energy-efficient hardware and writing energy-efficient software.

Apart from energy efficiency, the practice of green computing includes other operational goals with subtle distinctions. Below we attempt to distinguish between these goals:

• Minimize total energy expenditure.
• Minimize average power.
• Minimize peak power.

• Minimize energy-delay product (EDP) or energy-delay-squared product (ED²P).

It is important to note that since energy is defined as the product of average power and execution time, a reduction in average power may be offset by an increase in execution time, resulting in an overall increase in total energy expenditure. Systems whose design and/or operation is constrained by total energy expenditure are termed energy-aware systems. Systems that are constrained by either average or instantaneous power consumption values are called power-aware systems. In this thesis, we use the term energy efficiency as an umbrella term for the above listed goals unless otherwise specified.
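To make this distinction concrete, the following worked example uses invented numbers (not measurements from this thesis) to show how a setting with lower average power can still expend more total energy and a much higher EDP once the longer execution time is accounted for:

\[ E = \bar{P}\,t, \qquad \mathrm{EDP} = E\,t \]
\[ \text{Setting A: } \bar{P} = 40\,\mathrm{W},\ t = 10\,\mathrm{s} \;\Rightarrow\; E = 400\,\mathrm{J},\ \mathrm{EDP} = 4000\,\mathrm{J\,s} \]
\[ \text{Setting B: } \bar{P} = 25\,\mathrm{W},\ t = 18\,\mathrm{s} \;\Rightarrow\; E = 450\,\mathrm{J},\ \mathrm{EDP} = 8100\,\mathrm{J\,s} \]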

The challenge of improving the energy efficiency of computing resources needs to be tackled at various abstraction layers. Below, we list these layers of abstraction and discuss energy/power-saving techniques that can be employed at each of them. Some of these techniques can be isolated within a single layer of abstraction, while others require information exchange across multiple layers. The techniques discussed here serve only as examples and are by no means exhaustive.

• CMOS Device technology is the lowest layer of abstraction at which power-saving techniques can be implemented. These techniques target either dynamic power consumption, which results from transistor switching activity, or static power consumption, which results from leakage current. This discussion highlights some of the prominent low-power techniques for each of them.

A common technique used to lower dynamic power is clock gating, whereby an additional enable signal is added to the clock tree of CMOS circuits that can be pulled down when the circuit is not in use. This ensures that the flip-flops do not switch states when they are idle and, consequently, do not consume dynamic power. Another common methodology to reduce dynamic power consumption is dynamic voltage and frequency scaling (DVFS). Dynamic power consumption depends on the product C × V² × F, where C is the switching capacitance, F is the frequency, and V is the supply voltage. A reduction in voltage results in a quadratic reduction in dynamic power consumption. But voltage scaling is coupled with frequency scaling, since a reduced supply voltage limits the maximum frequency at which the device can operate. This can potentially result in a linear increase in execution time but a reduction in overall dynamic energy expenditure, owing to the quadratic relationship between dynamic power and voltage.
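The sketch below illustrates the C × V² × F relation just described; the capacitance, voltage, frequency, and cycle-count values are invented for illustration and are not measurements from this thesis.

/* Illustrative sketch: dynamic power and energy under DVFS for a
 * CPU-bound task, using P_dyn = C * V^2 * f.  All numbers are made up. */
#include <stdio.h>

static double dynamic_power(double c_eff, double volt, double freq_hz) {
    return c_eff * volt * volt * freq_hz;   /* P = C * V^2 * f */
}

int main(void) {
    double c_eff  = 1.0e-9;   /* effective switched capacitance (F), illustrative */
    double cycles = 4.0e9;    /* work expressed as a fixed number of CPU cycles    */

    /* Two hypothetical (V, f) operating points. */
    double v_hi = 1.10, f_hi = 3.4e9;
    double v_lo = 0.80, f_lo = 1.6e9;

    double p_hi = dynamic_power(c_eff, v_hi, f_hi);
    double p_lo = dynamic_power(c_eff, v_lo, f_lo);

    /* For a purely CPU-bound task, runtime scales with 1/f, so the lower
     * operating point trades longer runtime for quadratically lower power. */
    printf("high: P=%.2f W  t=%.2f s  E=%.2f J\n", p_hi, cycles / f_hi, p_hi * cycles / f_hi);
    printf("low : P=%.2f W  t=%.2f s  E=%.2f J\n", p_lo, cycles / f_lo, p_lo * cycles / f_lo);
    return 0;
}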

The shrinking transistor feature size has resulted in increased subthreshold leakage current, which has increased static power consumption. As a result, with each subsequent adoption of a smaller technology node, static power consumption, if left unchecked, is bound to become a more prominent fraction of total chip power consumption. Various techniques have been proposed and implemented to reduce the static power consumption of the chip. Sleep transistors [6] have been widely used to power gate a block of circuits, execution units, or an entire compute core. Dual-threshold-voltage techniques [7] are also an effective methodology to reduce standby power consumption. Other low-power techniques like dynamic body biasing [8] and input vector control [9] have also proven highly effective at reducing leakage current.

• Computer Architecture is the next level of abstraction at which the energy efficiency of computers can be targeted for optimization. It spans three aspects of computer design — instruction set architecture (ISA), computer organization, and microarchitecture design [10]. Energy-aware computing techniques can be realized for any of these aspects individually or in combination.

The ISA can be simplified to reduce the total number of supported instructions (Reduced Instruction Set Computing — RISC). This facilitates a simpler hardware design that can trade off performance for lower power consumption.

The computer organization refers to high-level aspects of the processor design. This includes decisions about the number and type of cores on chip, the processor pipeline, the cache size and hierarchy, the interconnects, etc.

Until the early 2000s, commodity and server processors mainly relied on instruction-level parallelism (ILP) and higher frequencies for achieving performance gains with each new generation of processors. This, however, resulted in higher heat dissipation per unit of die area because of the quadratic relationship between processor frequency and dynamic power consumption. Eventually, processor design hit the power wall — it was no longer sustainable to keep increasing processor frequency because of the heat limitations on the reliability of semiconductor devices. As a result, the industry moved towards chip multiprocessors (CMP) in order to continue reaping the performance gains made available by shrinking transistor sizes. Multi-core processors can be further optimized to save energy by incorporating cores of different sizes on the same die [11]. More CPU-intensive tasks can then be run on bigger cores, while tasks which do not require high processing capability can be run on smaller, more energy-efficient cores.

Within a processor core, various trade-offs can be made between performance and power. For example, in an out-of-order processor, the size of various microarchitectural features like the reservation station, re-order buffer, load/store queue, and integer/floating-point register file can be increased to improve performance or decreased to save power at the cost of performance.

These decisions also depend on the characteristics of the expected workload, since an increase in execution time can also lead to an increase in total energy expenditure. Another area where the power-performance trade-off comes into play is the cache hierarchy. The on-chip caches are critical in hiding memory latency, but they are also one of the biggest consumers of power among all on-chip microarchitectural features. The on-chip cache size can be increased to improve workload performance, but this will increase both static and dynamic power. Cache associativity is another cache feature that can be increased to maximize performance gains but, if not chosen properly, can result in an overall increase in power consumption.

The dynamic power consumed by on-chip interconnects is growing in significance [12] due to the greater number of on-chip compute units and the high level of switching activity in the interconnects. System architects need to consider the impact of interconnect power consumption when making decisions about increasing on-chip bandwidth.

For a given computer organization, a plethora of power-saving techniques can be implemented at lower levels of hardware design. For example, hardware designers need to decide the level of clock gating and power gating to balance energy efficiency with increased latency. Most modern processors implement some kind of sleep states, which are activated when the processor is idle. The sleep states reduce power consumption by turning off clocks, scaling down voltage, and shutting down parts of the processor through power gating. There are multiple levels of sleep states. The shallower levels are less aggressive in saving power but require fewer cycles to wake up from. The deeper levels target more aggressive power savings but require more cycles when waking up. The selection of an appropriate sleep state for the current idle period can be made either fully in hardware or by a mechanism that combines hardware and software.

Apart from the techniques to save power when the CPU is idle, a plethora of techniques exist to save power when the CPU is actively executing a workload. Significant power savings can be achieved by using the DVFS technique described above. DVFS implementation details, like the number of distinct voltage-frequency operating points and the number of separate voltage planes, are decided at the level of microarchitectural design. These decisions need to consider the trade-off between design complexity and potential power savings. The cache hierarchy can also be optimized for saving power during active states by techniques like dynamic cache reconfiguration or dynamic cache resizing. In the first technique, the caches are reconfigured at runtime to use the available cache blocks more efficiently [13–16]. In the second technique, parts of the cache are either fully turned off [17, 18] or put in a low-power state [19] during periods of low cache occupancy to save power.

• System components like the CPU chip, fan, disk, graphics card, etc. have their individual power states, and hence a system-level power management scheme is required to handle the complex interface between these components. An example of such a scheme is the Advanced Configuration and Power Interface (ACPI) [20], a standardized interface that can be used by operating systems to manage the power states of computer peripherals. For large-scale system installations like datacenters, the systems need to be energy proportional to ensure energy efficiency at all levels of system utilization [21]. This requires designing system components like power supply units, memory, disks, etc. with a flat efficiency curve. Alternatively, installation of heterogeneous nodes can improve energy proportionality, such that the job scheduler can schedule jobs to appropriately sized nodes based upon the compute requirements of the job [22].

• Software has a major impact on the overall energy expenditure of the system. Software power optimization strategies can be implemented at compile time or at runtime. Most of the energy savings from compiler optimizations are a side effect of performance and memory optimizations. The compiler optimizations that have a significant impact on energy expenditure include instruction scheduling, loop optimizations, reducing cache miss rates, and efficient register assignment [23]. However, research on energy-aware compilers is still in its infancy.

At the operating system level, DVFS management policies play a big role in saving energy. DVFS schedulers can detect the slack in tasks that are executed periodically and finish before their deadlines. These slacks can be exploited to eliminate idle periods by reducing the system frequency, and hence to save energy without degrading performance. Alternatively, the operating system can take advantage of the discrepancy between CPU and memory frequencies by scaling down the CPU frequency during the memory-bound phases of an application. The operating system can also implement power capping: the OS can monitor system power consumption in real time [24, 25] to make sure that it does not breach a pre-defined power envelope. If a breach is detected, the OS can reduce system power consumption by throttling the performance of individual tasks or of the entire system. Energy-aware schedulers can be used for scheduling tasks within a system [26, 27] or for scheduling jobs across nodes in datacenters [28–31].
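As one concrete (and deliberately simplified) example of such an OS-level mechanism, the sketch below caps a core's frequency through Linux's standard cpufreq sysfs interface. The policy that decides when and how far to scale is omitted, and the example assumes root privileges and a cpufreq-enabled kernel; it is not a mechanism described in this thesis.

/* Sketch: cap a CPU's frequency (in kHz) via the cpufreq sysfs interface. */
#include <stdio.h>

static int cap_cpu_freq(int cpu, long khz) {
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_max_freq", cpu);
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%ld\n", khz);        /* kernel expects the value in kHz */
    fclose(f);
    return 0;
}

int main(void) {
    /* Example: limit core 0 to 1.6 GHz, e.g. during a memory-bound phase. */
    return cap_cpu_freq(0, 1600000) ? 1 : 0;
}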

Both power-aware and energy-aware systems generally require some kind of feedback from the system about its power and/or energy usage, either online or offline. The focus of this thesis is how to obtain this information through power measurement, power modeling, and energy characterization.

1.1 Power Measurement

The first step towards power-aware systems is to measure power consumption. Readings from a power measurement infrastructure can be used to identify power inefficiencies in hardware and software, which in turn can expose energy inefficiencies. A setup for measuring system/processor power consumption should have the following desirable attributes:

• High accuracy;
• High sampling rate;
• High sensitivity;
• Non-interference with the system under test;
• Low cost; and
• Ease of setup and use.

To measure the power consumption of the entire system, an off-the-shelf power meter can be employed, connected between the wall outlet and the input of the system's power supply unit. High-end server systems (like the ones used in datacenters) sometimes have built-in power measurement mechanisms [32] that can report the system power consumption over the network.

Apart from the wall outlet, power measurement can also be done on the ATX power rails that supply power from the power supply unit to the motherboard and to other system components like fans, disks, and optical drives. Power measurements done on the ATX rails hide the energy wasted in AC-to-DC conversion in the power supply unit, but they are useful for isolating the power consumption of individual system components. ATX power measurements can be done using equipment like clamp ammeters [33], sense resistors [34, 35], and current transducers.

To get the most accurate power measurement of the processor chip, the measurement needs to be done directly at the processor supply voltage to hide the inefficiency of the on-board voltage regulator's DC-DC conversion. Most modern voltage regulators have a current monitor pin which outputs a voltage proportional to the instantaneous current consumed by the processor chip. The signal from this output can be sampled using a digital multimeter [36] or a data acquisition unit to measure the processor power consumption.

Research studies have made use of live power measurement to schedule tasks and allocate resources at the level of individual processors [34, 36–40] and large data centers [28–31, 41–43].

1.2 Power Modeling

Depending on the requirements and costs, it may not be possible or desirable to use actual power measurements. For example, on-chip power management software may require a detailed breakdown of the power consumption of various components of the chip. Moreover, it may be too costly to deploy the power measurement setup on multiple machines, or the response time of the power measurement setup may not be fast enough for a particular power-aware technique. An alternative to actual power measurement is to estimate power consumption using models that provide power estimates based on performance events in the system. Such models can be implemented in either hardware or software. Power models should have the following properties:

• Small delay between input and output;
• Low overhead — low CPU usage for a software model and low hardware area for a hardware model;
• High accuracy;
• High sensitivity;
• The desired power consumption breakdown for individual components (like cores or microarchitectural components).

Based on the requirements of the power-aware technique and the trade-offs among cost, accuracy, and overhead, the system designer must decide whether to implement a hardware or a software power model. Intel introduced a hardware power model in its Core i7 processors starting with the Sandy Bridge microarchitecture [44]. The values from this power model are available to software using Model Specific Registers (MSRs) through the Running Average Power Limit (RAPL) interface. Similar model-based power estimates are available for AMD processors through their Application Power Management (APM) interface [45], and on NVIDIA GPU processors through the NVIDIA Management Library (NVML) [46]. In addition to these relatively new power models implemented in hardware by chip manufacturers, researchers have proposed software power models derived from performance events [34, 42, 47–54].

The information gained by power modeling can be used by resource management software to make power-aware decisions. Power/energy models have been used in many research studies to propose power-aware strategies for controlling DVFS policies [36, 37], task scheduling in chip multiprocessors [41] and data centers [28, 29], power budgeting [34, 39], energy-aware accounting and billing [55], and avoiding thermal hotspots in chip multiprocessors [56, 57].
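As an illustration of how such model-based estimates reach software, the minimal sketch below reads the RAPL package-energy counter through Linux's msr driver and converts a one-second delta to joules. It assumes an Intel CPU with RAPL, the msr kernel module loaded, and root privileges; it is not the measurement tooling used in this thesis.

/* Minimal sketch: read Intel's RAPL package energy counter via /dev/cpu/0/msr.
 * MSR 0x606 (MSR_RAPL_POWER_UNIT) gives the energy unit; MSR 0x611
 * (MSR_PKG_ENERGY_STATUS) is a 32-bit wrapping energy counter. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static uint64_t read_msr(int fd, off_t reg) {
    uint64_t val = 0;
    if (pread(fd, &val, sizeof(val), reg) != sizeof(val))
        perror("pread");
    return val;
}

int main(void) {
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

    /* Bits 12:8 of MSR_RAPL_POWER_UNIT encode the energy unit as 1/2^ESU joules. */
    uint64_t units = read_msr(fd, 0x606);
    double joules_per_tick = 1.0 / (double)(1u << ((units >> 8) & 0x1f));

    uint32_t before = (uint32_t)read_msr(fd, 0x611);
    sleep(1);
    uint32_t after  = (uint32_t)read_msr(fd, 0x611);

    /* Unsigned subtraction handles counter wrap-around. */
    double joules = (double)(uint32_t)(after - before) * joules_per_tick;
    printf("package energy over 1 s: %.3f J (~%.3f W)\n", joules, joules);

    close(fd);
    return 0;
}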

1.3 Energy Characterization

Energy characterization analyzes the system's energy expenditure as workload characteristics are varied. It can be used to devise software and hybrid power-aware strategies, optimize workloads for energy efficiency, and optimize system design. Some of the challenges for energy characterization are:

• Choosing representative workloads. Workloads selected for characterization studies should be representative of those that are likely to be run on the system. The workloads should also stress the full spectrum of identified system characteristics to cover all possible corner cases.

• Choosing good metrics. The selection of the metrics used to characterize and compare system energy expenditure depends on the type of characterization study and the emphasis that the researcher wants to put on delay versus energy expenditure. Possible metrics include total energy, average power, peak power, dynamic power, energy-delay product, energy-delay-squared product, power density, MIPS/watt (million instructions per second per watt), and FLOPS/watt (floating-point operations per second per watt).

• Setting up appropriate power metering/modeling infrastructure. Any energy characterization study requires a means to measure or estimate and log system power consumption. Depending on the type of study, measurement factors like accuracy and temporal granularity must be considered. It may be useful to be able to decompose power consumption figures for different system components, but that support may or may not exist. Researchers need to consider these factors and decide between measuring actual power and modeling power estimates.

Researchers have used energy characterization to understand the energy efficiency of mobile platforms [58–60] and desktop/server platforms [42, 61–63]. Energy characterization can also be used to analyze the energy efficiency of specific features of the system [64, 65]. The energy behaviors thus characterized can be used, for example, to identify software inefficiencies [58, 59, 61], manage power [42], analyze power/performance trade-offs [66], and compare the energy efficiency of competing technologies [64]. Apart from actual power measurement, power models can prove useful for energy characterization [64, 66, 67].

1.4 Contributions

This thesis makes the following contributions:

Power Measurement (Chapter 2). We develop an infrastructure to measure power consumption at three points in the voltage supply path of the processor: at the wall outlet, at the ATX power rails, and at the CPU voltage regulator. We perform a qualitative comparison of the measurements sampled at the three points in terms of accuracy and sensitivity, and we discuss the advantages and disadvantages of each sampling point. We show that sampling power at the ATX power rails provides the best trade-off between accuracy and accessibility. We test Intel's digital power model (available to software through the Running Average Power Limit (RAPL) interface) for accuracy, overhead, and temporal granularity. We demonstrate that the RAPL model estimates power with good mean accuracy but suffers from infrequent high deviations from the mean.

Per-core Power Modeling (Chapter 3). We build upon the power model developed by Singh et al. [52, 68] that estimates the power consumption of individual cores in chip multiprocessors. We port and validate the model across multiple platforms, improve model accuracy by augmenting the microbenchmark and counter selection methodology, provide an analysis of model estimation errors, and show the effectiveness of the model for a meta-scheduler that uses multiple DVFS operating points and process suspension to enforce a power budget.

Static and Dynamic Power Modeling (Chapter 4). We present a methodology to estimate the static and dynamic power consumption of individual cores and of the processor uncore. We identify the contributions of activity in the private L2 caches and the shared L3 cache to the dynamic power consumption of the chip.

Energy Characterization of Frequency and Thread Scaling (Chapter 4). We use our power models to characterize the energy efficiency of dynamic frequency scaling and thread scaling on the Haswell microarchitecture. We show that lowering the system frequency does not guarantee a reduction in energy expenditure and that the frequency at which the lowest energy expenditure is achieved depends on the memory-bound fraction of the application. We identify communication parameters that affect the thread scaling energy trends, namely serial fraction, core-to-core communication overhead, bandwidth contention, and locking-mechanism overhead, and demonstrate the effect of each of these parameters on energy expenditure. We identify that uncore energy, especially uncore static energy, is a major source of energy inefficiency on the Haswell microarchitecture.

Characterization of Intel's Restricted Transactional Memory (Chapter 5). We present a detailed evaluation of the performance and energy expenditure of Intel's Restricted Transactional Memory (RTM). We compare RTM behavior to that of TinySTM, a software transactional memory system, first by running microbenchmarks, and

then by running the STAMP benchmark suite. We quantify the RTM hardware limitations concerning its read/write-set size, duration, and overhead. We find that RTM performs well when the transaction working set fits inside the cache. RTM also handles high-contention workloads better than TinySTM.

1.5 Thesis Organization

The rest of this thesis is organized as follows:

• In Chapter 2, we explain different techniques to measure the power consumption of the system, compare their intrusiveness, and give experimental results to show their accuracy and sensitivity. We also test Intel's hardware power model for accuracy, sensitivity, and update granularity and discuss the results.

• In Chapter 3, we present a per-core, portable, scalable power model and show validation results across multiple platforms. We show the effectiveness of the model by implementing it in an experimental meta-scheduler for power-aware scheduling.

• In Chapter 4, we present a methodology to model the static and dynamic power consumption of the core and uncore components of multi-core processors. We use the power model for energy characterization of frequency scaling on a Haswell processor and use the characterization data to implement a low-overhead online DVFS scheduler. Similarly, we use the model for energy characterization of thread scaling and show how different communication parameters affect the energy expenditure trends for the core and the uncore.

• In Chapter 5, we present performance and energy expenditure characterization results for the Restricted Transactional Memory (RTM) implementation on the Intel Haswell microarchitecture. We use microbenchmarks and the STAMP benchmark suite to compare the performance and energy efficiency of RTM to TinySTM — a software transactional memory implementation.

• In Chapter 6, we present our concluding remarks for the work presented in this thesis.

2 Power Measurement Techniques

Designing intelligent power-aware computing technologies requires an infrastructure that can accurately measure and log the system power consumption, and preferably that of the system's individual resources. Resource managers can use this information to identify power consumption problems in both hardware (e.g., hotspots) and software (e.g., power-hungry tasks) and to address those problems (e.g., by scheduling tasks to even out power or temperature across the chip) [49, 69, 70]. A measurement infrastructure can also be used for power benchmarking [71, 72], power modeling [42, 49, 50, 69], and power characterization [61, 73, 74]. In this chapter, we compare different approaches to measuring actual power consumption on the Intel Core i7 platform. We discuss these techniques in terms of their invasiveness, ease of use, accuracy, timing resolution, sensitivity, and overhead. The measurement techniques demonstrated in this chapter can be applied to other platforms, subject to some hardware support.

In the absence of techniques to measure actual power consumption, model-based power consumption estimation is a viable alternative. Intel's Running Average Power Limit (RAPL) interface [44], AMD's Application Power Management (APM) interface [45], and NVIDIA's Management Library (NVML) [46] make model-based energy estimates available to the operating system and user applications through model-specific registers, thereby enabling the software to make power-aware decisions.


[Figure 2.1 (diagram): the machine under test draws AC power from the wall outlet through a Watts up? Pro meter; the power supply unit feeds the motherboard's +3.3V, +5V, +12V1/2, and +12V3 rails through custom sense hardware; the CPU buck controller sits on the motherboard; a data acquisition device forwards samples to a separate data-logging machine. Measurement points 1-3 are marked.]

Figure 2.1: Power Measurement Setup

In the rest of this chapter, we first describe the methodology of the three techniques for measuring actual power consumption, discuss their advantages and disadvantages, and collect experimental results on an Intel Core i7-870 platform to compare their accuracy and sensitivity. We then compare one of these measurement techniques — reading power measurements from the ATX (Advanced Technology eXtended) power rails — to Intel's RAPL implementation on a Core i7-4770 (Haswell).

2.1 Power Measurement Techniques

Power consumption can be measured at various points in a system. We measure power consumption at three points, as shown in Fig. 2.1:

1. The first and least invasive method for measuring the power of an entire system is to use a power meter, like the Watts up? Pro [75], plugged directly into the wall outlet;

2. The second method uses custom sense hardware to measure the current on individual ATX power rails; and

3. The third and most invasive method measures the voltage and current directly at the CPU voltage regulator.

2.1.1 At the Wall Outlet

The first method uses an off-the-shelf (Watts up? Pro) power meter that sits between the system under test and the power outlet. Note that to prevent data-logging activity from disturbing the system under test, we use a separate machine to collect measurements for all three techniques, as shown in Fig. 2.1. Although easy to deploy and non-invasive, this meter delivers only a single system-wide measurement, making it difficult to separate the power consumption of different system components. Moreover, the measured power values are inflated compared to the actual power consumption due to inefficiencies in the system power supply unit (PSU) and the on-board voltage regulators. The acuity of the measurements is also limited by the (low) sampling frequency of the power meter: one sample per second (hereafter referred to as Sa/s) for the Watts up? Pro. The accuracy of the system power readings depends on the accuracy specifications provided by the manufacturer (±1.5% in our case). The overall accuracy of measurements at the wall outlet is also affected by the mechanism converting alternating current (AC) to direct current (DC) in the power supply unit. When we discuss measurement results below, we examine the accuracy effects of the AC-DC conversion done by the PSU.

This approach is suitable for studies of total system power consumption rather than of individual components like the CPU, memory, or graphics card [76, 77]. It is also useful in power modeling research, where the absolute value of the CPU and/or memory power consumption is less important than the trends [69].

2.1.2 At the ATX Power Rails

The second methodology measures current on the supply rails of the ATX motherboard's power supply connectors. As per the ATX power supply design specifications [78], the power supply unit delivers power to the motherboard through two connectors: a 24-pin connector that delivers +5V, +3.3V, and +12V, and an 8-pin connector that delivers +12V used exclusively by the CPU. Table 2.1 shows the pinouts of these connectors. Depending on the system under test, pins belonging to the same power region may be connected together on the motherboard. In our case, all +3.3 VDC pins are connected together, as are all +5 VDC pins and all +12V3 pins. Apart from that, the +12V1 and +12V2 pins are connected together to supply current to the CPU. Hence, to measure the total power consumption of the motherboard, we can treat these connections as four logically distinct power rails: +3.3V, +5V, +12V3, and +12V1/2.

For our experiments, we develop custom measurement hardware using current transducers from LEM [79]. These transducers use the Hall effect to generate an output voltage in accordance with the changing current flow. The top-level schematic of the hardware is shown in Fig. 2.2, and Fig. 2.3 shows the manufactured board. Note that when designing such a printed circuit board (PCB), care must be taken to ensure that the current capacity of the PCB traces carrying the combined current for the ATX power rails is sufficiently high and that the on-board resistance is as low as possible. We use a PCB with 105-micron copper instead of the more widely used thickness of 35 microns. Traces carrying high current are at least 1 cm wide and are backed by thick stranded-wire connections where required. The current transducers need a +5V supply voltage, which is provided by the +5VSB (standby) rail from the ATX connector. Using +5VSB for the transducers' supply serves two purposes.

(a) 24-pin ATX Connector Pinout

Pin  Signal      Pin  Signal
 1   +3.3 VDC    13   +3.3 VDC
 2   +3.3 VDC    14   -12 VDC
 3   COM         15   COM
 4   +5 VDC      16   PS_ON
 5   COM         17   COM
 6   +5 VDC      18   COM
 7   COM         19   COM
 8   PWR OK      20   Reserved
 9   5 VSB       21   +5 VDC
10   +12 V3      22   +5 VDC
11   +12 V3      23   +5 VDC
12   +3.3 VDC    24   COM

(b) 8-pin ATX Connector Pinout

Pin  Signal      Pin  Signal
 1   COM          5   +12 V1
 2   COM          6   +12 V1
 3   COM          7   +12 V2
 4   COM          8   +12 V2

Table 2.1: ATX Connector Pinout

First, because the +5VSB voltage is available even when the machine is powered off, we can measure the base output voltage from the current transducers for calibration purposes. Second, because the current consumed by the transducers themselves (∼28mA) is drawn from +5VSB, it does not interfere with our power measurements. We sample and log the analog voltage output from the current transducers using a data acquisition (DAQ) unit from National Instruments (NI USB-6210 [80]). As per the LEM datasheet, the base voltage of the current transducer is 2.5V. Our experiments indicate that the current transducer produces an output voltage of 2.494V when zero current is passed through its primary turns. The sensitivity of the current transducer is 25mV/A; hence the current can be calculated as in Eq. 2.1.

\[ I_{out} = \frac{V_{out} - \mathrm{BASE\_VOLTAGE}}{0.025} \tag{2.1} \]

We verify our current measurements by comparing against the output from a digital multimeter. The power consumption can then be calculated by simply multiplying the current with the respective voltage. Apart from the ATX power rails, the PSU also provides separate power connections to the hard drive, CD-ROM, and cabinet fan. To calculate the total PSU load without adding extra hardware, we disconnect the I/O devices and fan, and we boot our system from a USB memory device powered by the motherboard. The total power consumption of the motherboard can then be calculated as in Eq. 2.2.

[Figure 2.2 (schematic): the 24-pin and 8-pin ATX connectors from the PSU pass through LTS 25-NP current transducers on the custom PCB (for the 3.3V, 5V, and two 12V rails) before reaching the motherboard; the transducer outputs feed the NI USB-6210.]

Figure 2.2: Measurement Setup on the ATX Power Rails

\[ P = I_{3.3V} \cdot V_{3.3V} + I_{12V3} \cdot V_{12V3} + I_{5V} \cdot V_{5V} + I_{12V1/2} \cdot V_{12V1/2} \tag{2.2} \]
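For concreteness, the sketch below shows how Eqs. 2.1 and 2.2 translate into code: each transducer output voltage becomes a rail current, which is multiplied by the nominal rail voltage and summed. The sample readings are invented, and this is only an illustration of the conversion, not the acquisition software used for the experiments.

/* Illustrative conversion from LTS 25-NP output voltages to motherboard power. */
#include <stdio.h>

#define BASE_VOLTAGE 2.494   /* transducer output at zero current (V)  */
#define SENSITIVITY  0.025   /* transducer gain: 25 mV per ampere      */

static double rail_current(double v_out) {
    return (v_out - BASE_VOLTAGE) / SENSITIVITY;           /* Eq. 2.1 */
}

int main(void) {
    /* Hypothetical DAQ samples for the four logical rails (volts). */
    double v33 = 2.55, v5 = 2.60, v12v3 = 2.70, v12v12 = 3.10;

    double p = rail_current(v33)    * 3.3 +                /* Eq. 2.2 */
               rail_current(v5)     * 5.0 +
               rail_current(v12v3)  * 12.0 +
               rail_current(v12v12) * 12.0;

    printf("motherboard power: %.2f W\n", p);
    return 0;
}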

The theoretical current sensitivity of this measurement infrastructure can be calculated by dividing the voltage sensitivity of the DAQ unit (47µV) by the current sensitivity of the LTS-25NP current transducers from LEM (25mV/A). This yields a current sensitivity of 2mA.

This approach improves accuracy by eliminating the complexity of measuring AC power. Furthermore, the approach enjoys greater sensitivity to current changes (2mA) and higher acquisition-unit sampling frequencies (up to 250000 Sa/s). Since most modern motherboards have separate supply connectors for the CPU(s), this approach facilitates distinguishing CPU power consumption from that of other motherboard components. Again, this improvement comes with increased cost and complexity: the sophisticated DAQ unit is priced an order of magnitude higher than the power meter, and we had to build a custom board to house the current transducer infrastructure.
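Written out, the sensitivity figure quoted above follows from:

\[ \frac{47\,\mu\mathrm{V}}{25\,\mathrm{mV/A}} = \frac{47\times10^{-6}\,\mathrm{V}}{25\times10^{-3}\,\mathrm{V/A}} \approx 1.9\,\mathrm{mA} \approx 2\,\mathrm{mA} \]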

2.1.3 At the Processor Voltage Regulator

Although measurements taken at the motherboard supply rails factor out the power supply unit's efficiency curve, they are still affected by the efficiency curve of the on-board voltage regulators. To eliminate this source of inaccuracy, we investigate a third approach. Motherboards that follow Intel's processor power delivery guidelines (Voltage Regulator-Down (VRD) 11.1 [81]) provide a load indicator output (IMON/Iout) from the processor voltage regulator. This load indicator is connected to the processor for use by the processor's power management features.

[Photograph of the custom measurement board, showing the LTS-25NP current transducers, the PSU-side and motherboard-side connections, and the output header to the NI USB-6210.]

Figure 2.3: Our Custom Measurement Board

This signal provides an analog voltage linearly proportional to the total load current of the processor. We use this current-sensing pin of the processor's voltage regulator chip (a CHL8316 [82], in our case) to acquire real-time information about the total current delivered to the processor. We also use the voltage output at the V_CPU pin of the voltage regulator, which is directly connected to the CPU voltage supply input of the processor. We locate these two signals on the motherboard and solder wires at the respective connection points (the resistor/capacitor pads connected to these signals). We connect these two signals and the ground point to our DAQ unit, logging the values read on the separate test machine. This current measurement setup is shown in Fig. 2.4.

The full voltage swing of the IMON output is 900mV for the full-scale current of 140A (for the motherboard under test). Hence, the current sensitivity of the IMON output comes to about 6.42mV/A. The theoretical sensitivity of this infrastructure depends on the voltage sensitivity of the DAQ unit (47µV), and its overall sensitivity to current changes comes to about 7mA. This sensitivity is lower than that achieved when measuring current at the ATX power rails, and it may vary for different voltage regulators on different motherboards. This method provides the most accurate measurements of the absolute current feeding the processor, but it is also the most invasive, as it requires soldering wires on the motherboard. Moreover, these power measurements are limited to processor power consumption (we get no information about other system components). For example, for memory-intensive applications, we can account for the power consumption effects of the external bus transactions triggered by off-chip memory accesses, but this method provides no means of measuring the power consumed in the DRAMs.
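For reference, the two sensitivity figures quoted above follow directly from the numbers in the text:

\[ \frac{900\,\mathrm{mV}}{140\,\mathrm{A}} \approx 6.42\,\mathrm{mV/A}, \qquad \frac{47\,\mu\mathrm{V}}{6.42\,\mathrm{mV/A}} \approx 7\,\mathrm{mA} \]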

Figure 2.4: Measurement Setup on CPU Voltage Regulator

The accuracy of the IMON output is specified by the CHL8316 datasheet to be within ±7%. This falls far short of the 0.7% accuracy of the current transducers at the ATX power rails1.
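As a rough sketch (an illustration only, using the full-scale figures quoted above rather than our logging code), the two recorded signals can be combined into CPU power as follows:

/* Sketch: derive CPU power from the two signals logged by the DAQ:
   the IMON load indicator (about 6.42 mV/A on this board) and V_CPU. */
#define IMON_MV_PER_A (900.0 / 140.0)   /* 900 mV full scale at 140 A */

static double cpu_power(double v_imon_mv, double v_cpu)
{
    double i_load = v_imon_mv / IMON_MV_PER_A;  /* total load current, A */
    return i_load * v_cpu;                      /* watts delivered to the CPU */
}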

2.1.4 Experimental Results

We use an Intel CoreTMi7-870 processor to compare power measurement readings at the wall outlet, at the ATX power rails, and directly on the processor's voltage regulator. The Watts Up? Pro measures power consumption of the entire system at the rate of 1 Sa/s, whereas the data acquisition unit is configured to capture samples at the rate of 40000 Sa/s from the four effective ATX voltage rails (+12V1/2, +12V3, +5V and +3.3V) and the V_CPU and IMON outputs of the CPU voltage regulator. We choose this rate because the combined sampling rate of the six channels adds up to 240K Sa/s, and the maximum sampling rate supported by the DAQ is 250K Sa/s. To remove background noise, we average the DAQ samples over a period of 40 samples, which effectively gives 1000 Sa/s.

1The accuracy specifications of the processor's voltage regulator may differ for different manufacturers.


Figure 2.5: Power Measurement Comparison When Varying the Number of Active Cores

We use a CPU-bound test workload consisting of a 32×32 matrix multiplication in an infinite loop.

Coarse-grain Power Variance. Fig. 2.5 shows power measurement results across the three different points as we vary the number of active cores. Steps in the power consumption are captured by all measurement setups. The low sampling frequency of the wall-socket power meter prevents it from capturing short and sharp peaks in power (probably caused by background OS activity). The power consumption changes we observe at the wall outlet are at least 13 watts from one activity level to another. At such coarse-grained power variation, the power readings at the wall outlet are strongly correlated with the power readings at the ATX power rails and the CPU voltage regulator.

Fine-grain Power Variance. Fig. 2.6 depicts measurement results when we vary the CPU frequency every five seconds from 2.93 GHz to 1.33 GHz in steps of 0.133 GHz. The power measurement setups at the ATX power rails and the CPU voltage regulator capture the changes in power consumption accurately, and apart from the differences in absolute values and the effects of the CPU voltage regulator efficiency curve, there is not much to differentiate measurements at the two points. However, the power measurements taken by the power meter at the wall outlet fail to capture the changes faithfully, even though its one-second sampling rate is enough to capture steps that last five seconds. This effect is even more visible when we introduce throttling (at eight different levels for each CPU frequency), as shown in Fig. 2.7. Here, each combination of CPU frequency and throttling level lasts for two seconds, which should be long enough for the power meter to capture steps in the power consumption. But the power meter performs worse as power consumption decreases. We suspect that this effect is due to the AC to DC conversion circuit of the power supply unit, probably because of the smoothing effect of the capacitor in the PSU. These effects are not visible between the measurement points at the ATX power rails and the CPU voltage regulator.


Figure 2.6: Power Measurement Comparison When Varying Core Frequency

Fig. 2.8 shows the efficiency curve of the CPU voltage regulator at various load levels. The voltage regulator on the test system employs dynamic phase control [82] to adjust the number of phases with varying load current and thereby optimize the efficiency over a wide range of loads. The voltage regulator switches to one-phase or two-phase operation to increase the efficiency at light loads. When the load increases, the regulator switches to four-phase operation at medium loads and six-phase operation at high loads. The sharp change in efficiency visible in Fig. 2.8 is likely due to this adaptation in phase control. Fig. 2.9 shows the obtained efficiency curve of the cabinet PSU against the total power consumption calculated on the ATX power rails. The total system power never goes below 30W, and the efficiency of the PSU varies from 60% to around 80% in the output power range from 30W to 100W. Fig. 2.10 shows the changes in CPU and main memory power consumption while running gcc from SPEC CPU2006. Power consumption of the main memory varies from around 7.5W to 22W across various phases of the gcc run. Researchers and practitioners who wish to assess main memory power consumption will at least want to measure power at the ATX power rails.


Figure 2.7: Power Measurement Comparison When Varying Core Frequency Together with Throttling Level


Figure 2.8: Efficiency Curve of CPU Voltage Regulator


Figure 2.9: Efficiency Curve of the PSU


Figure 2.10: Power Measurement Comparison for the CPU and DIMM (Running gcc)

2.2 RAPL Power Estimations

2.2.1 Overview

Intel introduced the Running Average Power Limit (RAPL) interface with the Sandy Bridge microarchitecture. Programmers can use the RAPL interface to enforce power consumption constraints on the processor. The interface includes non-architectural Model-Specific Registers (MSRs) for setting the power limit and reading the energy consumption status of processor power domains such as the processor die and DRAM2. The energy expenditure information provided by the initial implementations of the RAPL interface was not based on actual current measurements from physical sensors, but on a power model driven by performance events and probably other inputs such as temperature and supply voltage [44]. Later RAPL implementations, especially in server-grade systems, use actual measurements to report energy expenditure [83]. However, this distinction between modeling and measurement is not publicly documented for each individual processor model. In this section, we test the accuracy of the RAPL values without assuming one way or the other. In the following sections, we compare power measurement readings from RAPL and the ATX power rails.
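For illustration, the sketch below reads the package energy counter through the Linux msr driver and converts two successive readings into average power. The MSR addresses and the energy-unit field follow Intel's public documentation (MSR_RAPL_POWER_UNIT at 0x606, MSR_PKG_ENERGY_STATUS at 0x611); the code is a minimal example that assumes the msr kernel module is loaded and root privileges, not the exact tool used in this thesis.

/* Sketch: read the RAPL package energy counter via the Linux msr driver and
   convert two successive readings into average power over the interval. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_RAPL_POWER_UNIT    0x606
#define MSR_PKG_ENERGY_STATUS  0x611

static uint64_t read_msr(int fd, uint32_t reg)
{
    uint64_t val = 0;
    pread(fd, &val, sizeof(val), reg);   /* the file offset selects the MSR */
    return val;
}

int main(void)
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);   /* requires the msr module and root */
    if (fd < 0) { perror("open msr"); return 1; }

    /* Bits 12:8 give the energy status unit: energy is counted in 1/2^ESU joules. */
    uint64_t units = read_msr(fd, MSR_RAPL_POWER_UNIT);
    double joules_per_tick = 1.0 / (1 << ((units >> 8) & 0x1f));

    double interval_s = 1.0;                     /* sampling interval */
    uint32_t e0 = (uint32_t)read_msr(fd, MSR_PKG_ENERGY_STATUS);
    sleep(1);
    uint32_t e1 = (uint32_t)read_msr(fd, MSR_PKG_ENERGY_STATUS);

    /* The counter is 32 bits wide and wraps; unsigned subtraction handles the wrap. */
    double energy = (uint32_t)(e1 - e0) * joules_per_tick;
    printf("package power: %.2f W\n", energy / interval_s);
    close(fd);
    return 0;
}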

2.2.2 Experimental Results

We use an Intel CoreTMi7-4770 processor to compare power measurement results at the ATX power rails and Intel's RAPL energy counter. To read the RAPL energy counter, we use the x86-MSR driver to access the MSR called MSR_PKG_ENERGY_STATUS. We set up a real-time timer that raises SIGALRM at configured intervals to read the MSR. We divide the RAPL energy values by the sampling interval in seconds to report average power consumption over the sampling interval. The ATX power measurement infrastructure is the same as described for the Intel CoreTMi7-870 processor. Below we describe our experiments to validate the RAPL energy counter readings.

RAPL Overhead. One of the major differences between measuring power using RAPL and the other techniques shown in Figure 2.1 is that the RAPL measurements are done on the same machine that is being tested. As a result, reading the RAPL energy counter at frequent intervals adds some overhead to the system power. To test this, we run our RAPL counter reading tool at sampling frequencies of 1 Sa/s, 10 Sa/s and 1000 Sa/s and monitor the change in system power consumption on the ATX CPU power rails. The results from this experiment are shown in Fig. 2.11.

2The DRAM energy status counter is only available on server platforms.


Figure 2.11: Power Overhead Incurred While Reading the RAPL Energy Counter

The spikes in power consumption visible in the figure indicate the start of the RAPL reading utility. These spikes mainly result from loading dynamic libraries and can be ignored, assuming that the RAPL utility is started before running any benchmark under test. The spikes do, however, act as useful synchronization points for overlapping readings from RAPL and the ATX power rails. For calculating the RAPL power overhead, we concentrate on the CPU power consumption after the initial power spike. As seen from the figure, reading the RAPL MSR every second and every 100ms adds no discernible power consumption to the idle system power, but reading the RAPL counter every 1ms adds 0.1W of power overhead. However, this small power overhead comes mainly from generating SIGALRM every millisecond and not from reading the MSR. Hence, if the RAPL energy counter is read as part of existing software infrastructure, such as a kernel scheduler, it will not add discernible overhead to CPU power consumption.

RAPL Resolution. As per the Intel specification manual [84], the RAPL energy counter is updated at intervals of approximately 1ms. To test this update frequency, we run the matrix multiplication application described in Section 2.1.4 while reading the RAPL energy status counter every 1ms. We sample the ATX power rails every 10µs and average over 100 samples. The results from this experiment are shown in Fig. 2.12. Although the RAPL readings follow the same curve as those from the ATX power rails, two things are of note here. First, the RAPL readings show more deviation from the mean than the ATX readings. Second, there are huge deviations from the mean at periodic intervals of around 1.6-1.7 seconds. Hähnel et al. [74] observe that the RAPL counter is not updated exactly every 1ms. As per their experiments, there can be a deviation of a few tens of thousands of cycles in the periodic interval at which the RAPL counter is updated on a given platform. This explains the small deviations from the mean that we see in our RAPL readings.


Figure 2.12: Power Measurement Comparison for the ATX and RAPL at the RAPL Sampling Rate of 1000 Sa/s

They also observe that when the CPU switches to System Management Mode (SMM), the RAPL update is delayed until after the CPU comes out of SMM. This results in the RAPL energy counter showing no increment between two successive readings, creating a negative deviation. The next update to the counter then increments the energy value by the amount expended by the processor in the last 2ms instead of 1ms, creating a positive deviation. On our system, the CPU switches to SMM every 1.6-1.7 seconds, which causes inaccurate energy readings.

RAPL Sensitivity. We test the sensitivity of the RAPL readings to coarse-grained and fine-grained changes in CPU power consumption. We first repeat the DVFS test from Fig. 2.6 in Section 2.1.4 and take power measurement readings from the ATX power rail and RAPL. Each DVFS operating point is maintained for five seconds. The ATX power is sampled at 10000 Sa/s while the RAPL counter is sampled at 10 Sa/s. The results from this test are shown in Fig. 2.13. The RAPL measurement curve follows the ATX measurement curve faithfully, although their y-axis scales are necessarily different. This test verifies the RAPL sensitivity to coarse-grain changes in power consumption. To test the sensitivity of RAPL to fine-grain changes, we use a microbenchmark which writes to a buffer in a loop; every 20 seconds, the buffer size is increased to increase the cache miss rate. This results in a gradual increase in power consumption every 20 seconds. We run this microbenchmark at two fixed frequencies, 3.4 GHz and 800 MHz, and compare the power consumption values from the ATX CPU rail and RAPL. The results are shown in Figure 2.14 and Figure 2.15 for 3.4 GHz and 800 MHz respectively. At 3.4 GHz, the step increment in cache miss rate results in a power consumption increment of 1W–1.2W as measured at the ATX CPU power rail. As the results show, RAPL sensitivity is high enough to track power consumption changes in that range.


Figure 2.13: Coarse-grain Sensitivity Test for ATX and RAPL: Frequency Change Test


Figure 2.14: Fine-grain Sensitivity Test for ATX and RAPL at 3.4 GHz

However, when we run the same experiment at 800 MHz, the power consumption increment is only 0.1W–0.2W. The total power consumption increment at the ATX rail from the start to the end of the experiment is 0.99W, which is not detected by RAPL. This highlights the limitation of RAPL in detecting changes in power consumption smaller than 1W.

Next, we test the sensitivity of the RAPL power measurements to changes in package temperature. The static power consumption of a CMOS circuit depends on supply voltage and temperature. If the voltage and the core activity are fixed, the change in package temperature should be directly correlated with the change in power consumption. To conduct this experiment, we fix the core and uncore voltage and frequency from the BIOS and put all the cores in the active C-state (C0, the power management state in which the CPU is fully turned on). We then use a hot air gun to increase the package temperature and sample the power measurement values at ATX and RAPL every second.


Figure 2.15: Fine-grain Sensitivity Test for ATX and RAPL at 800 MHz


Figure 2.16: Power Measurement Comparison for ATX and RAPL with Varying Temperature

The results from this experiment are depicted in Figure 2.16. As we can see from the results, because of the lower sensitivity of RAPL, the correlation between the power measured at the ATX rails and temperature is much stronger than the correlation between the RAPL and temperature values. Statistically, the correlation coefficient ρ between the ATX power values and temperature was measured to be 0.9857, while that between RAPL and temperature was 0.9127. These results highlight the limitation of RAPL in accurately detecting changes in power consumption due to temperature change.

Next, we test the ability of RAPL to detect changes in power consumption at various levels of temporal granularity. We simulate a real-time application in which a task is started at fixed periodic intervals. We compare power measurements from RAPL and ATX while varying the task period. We set up a timer to raise the SIGALRM signal, and a matrix multiplication task is started at every timer interrupt. The matrix multiplication loop is configured to occupy about 66% of the time period. We run this experiment for three different periods: 100ms, 10ms, and 1ms. We gather samples between two consecutive SMM switches. We read the RAPL counter every 1ms. We sample the ATX power rails every 5µs and average over 20 samples. The results from this experiment are shown in Fig. 2.17. For the 100ms and 10ms periods, the RAPL readings closely follow the ATX readings, although they are off by 1ms when the power consumption changes suddenly. As expected, for the 1ms period test, RAPL is unable to capture the power consumption curve but accurately estimates the average power consumption.

RAPL Accuracy. We perform an experiment to test the ability of RAPL to report accurate power measurement values while running a test benchmark that consists of integer operations, floating-point operations and memory operations in different phases. For this experiment, we sample the RAPL energy counter at 100 Sa/s. We sample the ATX CPU power rails at 10000 Sa/s and then average over 100 samples. We also sample the ATX 5V power rail, which on our system mainly supplies the DRAM memory. We run this experiment at 3.4 GHz. The results are shown in Fig. 2.18. The deviations that we see in Fig. 2.12 due to the CPU switching to SMM are visible in Fig. 2.18 as well. Because we sample the RAPL counter every 10ms instead of 1ms, we only see a 10% deviation from the mean instead of 100%. Apart from these deviations, which occur every 1.7 seconds (effectively one out of 170 samples), the RAPL energy readings are accurate across the integer, floating-point and memory-intensive phases. To quantify the RAPL power estimation accuracy, we calculate Spearman's rank correlation coefficient for the RAPL and ATX CPU readings for the microbenchmark run. The correlation coefficient is ρ=0.9858.

Sample values from the ATX show that DRAM power is significant, and it is not captured by the RAPL package energy counter. Power-aware computing methodologies must take this into consideration. The RAPL interfaces on Intel processors for server platforms include an energy counter for the DRAM domain called MSR_DRAM_ENERGY_STATUS. This can be useful for assessing the power consumption of the DRAM memory.
Based upon our experimental results, we conclude that the RAPL energy counter is good for estimating energy over a duration of more than a few milliseconds. However, when the researcher is interested in instantaneous power trends, especially peak power trends, sampling the actual power consumption using an accurate measurement infrastructure (like ours) is a better choice.

Figure 2.17: Power Measurement Comparison for ATX and RAPL with Varying Real-time Application Period ((a) 100ms period test, (b) 10ms period test, (c) 1ms period test)


Figure 2.18: RAPL Accuracy Test for Test Benchmark

2.3 Related Work

Hackenberg et al. [85] compare several measurement techniques for signal quality, accuracy, timing resolution, and overhead. They use both Intel and AMD systems in their study. Their experimental results complement our results. There have been many interesting studies on power modeling and power-aware resource management. These employ various means to measure empirical power. Rajamani et al. [34] use on-board sense resistors located between the processor and voltage regulators to measure power consumed by the processor. They use a National Instruments isolation amplifier and data acquisition unit to filter, amplify, and digitize their measurements. Isci and Martonosi [33] measure current on the 12V ATX power lines using clamp ammeters that are hooked to a digital multimeter (DMM) for data collection. The DMM is connected to a data logging machine via an RS232 serial port. Contreras and Martonosi [51] use jumpers on their Intel XScaleTM development board to measure the power consumption of the CPU and memory separately. They feed the measurements to a LeCroy oscilloscope for sampling. Cui et al. [35] also measure the power consumption at the ATX power rails. They use current-sense resistors and amplifiers to generate sense voltages (instead of using current transducers), and they log their measurements using a digital multimeter. Bedard et al. [77] build their own hardware that combines the voltage and current measurements and host interface into one solution. They use an Analog Devices ADM1191 digital power monitor to sense voltage and current values and an Atmel microcontroller to send the measured values to a host USB port. Hähnel et al. [74] experiment with the update granularity of the RAPL interface and report practical considerations for using the RAPL energy counter for power measurement. We find their results useful in explaining our experiments. Ge et al. [86] propose a framework to provide in-depth analysis of the energy expenditure of individual components of the system, such as fans, disks, processors, DRAM, and motherboard. Their framework uses a combination of hardware resources (sensors, power meter, data acquisition unit) and software (drivers, instrumentation APIs and analysis tools).

2.4 Conclusions

In this chapter, we demonstrate different techniques that can be employed to measure power consumption: using an off-the-shelf power meter at the wall outlet, using current transducers at the ATX power rails, and sampling the current monitor pin of the CPU's on-board voltage regulator. We compare the different techniques in terms of accuracy, sensitivity and temporal granularity and discuss their advantages and disadvantages. We conclude that power measurement at the ATX power rails provides the best balance between accuracy and accessibility. We test Intel's RAPL energy counter results for overhead, accuracy, and update granularity. We compare ATX power measurements with the values reported by RAPL. We conclude that the RAPL package energy counter incurs almost no power overhead and is fairly accurate for estimating energy over periods of more than a few tens of milliseconds. However, because of irregularities in the frequency at which the RAPL counter is updated, it shows deviations from the average power when it is sampled at a faster rate.

3 Per-core Power Estimation Model

Power measurement techniques described in Chapter 2 are essential for analyzing the power consumption of systems under test. However, these measurement techniques do not provide detailed information on the power consumption of individual processor cores or microarchitectural units (e.g., caches, floating point units, integer execution units). To develop resource-management mechanisms for an individual processor, system designers need to analyze power consumption at the granularity of processor cores or even the components within a processor core. This information can be provided by placing on-die digital power meters, but that increases the chip's hardware cost. Hence, support for such fine-grained digital power meters from chip manufacturers is limited. Even when measurement facilities exist, their viability for use in OS or user-level power-aware techniques is limited, either because the measured values are not exposed to the software or because the power/energy breakdown is not detailed enough. Indeed, power sensing, actuation, and management support is more often implemented at the blade level with a separate computer monitoring output [87, 88]. An alternative to hardware support for fine-grained digital power meters is to estimate the power consumption at the desired granularity using software power models. Such models can identify various power-relevant events in the targeted microarchitecture and then track those events to generate a representative power consumption value.

We can characterize software power estimation models by the following attributes:

• Portability. The model should be easy to port from one platform to another;

• Scalability. The model should be easy to scale across a varying number of active cores and across different CPU voltage-frequency points;

• CPU Usage. The model's CPU footprint should be negligible, so as not to pollute the power consumption values of the system under test;

• Accuracy. The model's estimated values should closely follow the empirically measured power of the device that is modeled;

• Granularity. The model should provide power consumption estimates at the granularity desired for the problem description (per core, per microarchitectural module, etc.); and

• Speed. The model should supply power estimation values to the software at minimal latency (preferably within microseconds).

In this thesis, we develop a power model that uses performance counters and temperature to generate accurate per-core power estimates in real time, with no off-line profiling or tracing. We build upon the power model developed by Singh et al. [52, 68] and include some of their data for comparison. We validate their model on AMD K8 and Intel CoreTMi7 architectures. Below we explain our choice of using performance counters and temperature to develop the model.

Performance Counters. We use performance counters for model formation because the power models need a mechanism to track CPU activities with low overhead. Most modern processors are equipped with a Performance Monitoring Unit (PMU) providing the ability to count the microarchitectural events of the processor. PMCs are available individually for each core and hence can be used to create core-specific models. This allows programmers to analyze core performance, including the interaction between programs and the microarchitecture, in real time on real hardware, rather than relying on simplified and less accurate performance results from simulations. The PMUs provide a wide variety of performance events. These events can be counted by mapping them to a limited set of Performance Monitoring Counter (PMC) registers. For example, on Intel and AMD platforms, these performance counter registers are accessible as Model Specific Registers (MSRs). Also called Machine Specific Registers, these are not necessarily compatible across processor families. Software can configure the performance counters to select which events to count. The PMCs can be used to count events like cache misses, micro-operations retired, stalls at various stages of an out-of-order pipeline, floating point/memory/branch operations executed, and many more. Although the counter values are not error-free [89,90] or even deterministic [91], if used correctly, the errors are small enough to make PMCs suitable candidates for estimating power consumption. The number and variety of performance monitoring events available for modern processors are increasing with each new architecture. For example, the number of traceable performance events available in the Intel CoreTMi7-870 processor is about ten times the number available in the Intel Core Duo processor [92].
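This thesis accesses the counters through pfmon and the perfmon2 library (Section 3.3). Purely as an illustration of how a single PMC can be programmed and read from user space on a recent Linux kernel, the sketch below uses the perf_event_open interface to count retired instructions on one core; the event choice and the one-second sampling period are arbitrary.

/* Sketch: count retired instructions on one core via Linux perf_event_open
   (shown only as an example; the thesis itself uses pfmon/perfmon2). */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;   /* example event */
    attr.disabled = 1;

    /* pid = -1, cpu = 0: count everything running on core 0
       (requires root or a permissive perf_event_paranoid setting). */
    int fd = perf_event_open(&attr, -1, 0, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    sleep(1);                                   /* sampling period */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    read(fd, &count, sizeof(count));
    printf("instructions retired on core 0: %llu\n", (unsigned long long)count);
    close(fd);
    return 0;
}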
This comprehensive coverage of event information increases the probability that the available performance monitoring events will be representative of overall microarchitectural activity for the purposes of performance and power analysis. This makes the use of performance counters to develop power models for hardware platforms very popular among researchers.

Temperature. Processor power consumption consists of both dynamic and static elements. Among these, the static power consumption is dependent on the core temperature. Eq. 3.1 shows that the static power consumption of a processor is a function of both leakage current and supply voltage. The processor leakage current is the sum of subthreshold leakage and gate-oxide leakage current: Ileakage = Isub + Iox [93]. The subthreshold leakage current can be derived using Eq. 3.2 [93]. The Vθ component in the equation is the thermal voltage, and it increases linearly with temperature. Since Vθ appears in the exponents, subthreshold leakage current has an exponential dependence on temperature. As processor power consumption increases, processor temperature increases. This increase in temperature increases the leakage current, which, in turn, increases static power consumption. To study the effects of temperature on power consumption, we ran a multithreaded program executing MOV operations in a loop on our Intel CoreTMi7-870 machine. The rate of instructions retired, instructions executed, pipeline stalls and memory operations remains constant over the entire run of the program. We also keep the CPU operating frequency constant and observe that the CPU voltage remains constant as well. This indicates that the dynamic power consumption of the processor does not change over the run of the program. Fig. 3.1(a) shows that the total power consumption of the machine increases during the program's runtime, and that the increase coincides with the increase in chip temperature. The total power consumption increases by almost 10%. To account for this increase in static power, it is necessary to include the temperature in power models.

$P_{static} = \sum I_{leakage} \times V_{core}$   (3.1)

$I_{sub} = K_1 W e^{-V_{th}/(n V_\theta)} \left(1 - e^{-V/V_\theta}\right)$   (3.2)

Figure 3.1: Temperature Effects on Power Consumption ((a) Temperature versus Power on Intel CoreTMi7-870, (b) Static Power Curve, (c) Temperature versus Static Power on Intel CoreTMi7-870)

where
$I_{sub}$ = Subthreshold leakage current
$W$ = Gate width
$V_\theta$ = Thermal voltage
$V_{th}$ = Threshold voltage
$V$ = Supply voltage
$K_1$, $n$ = Experimentally derived constants

As per Eqs. 3.1 and 3.2, the static power consumption increases exponentially with temperature. We confirm this empirically by plotting the net increase in power consumption when the program executes at the higher temperature, as shown in Fig. 3.1(b). The non-linear regression analysis gives us Eq. 3.3, and the curve fit shown in Fig. 3.1(b) closely follows the empirical data points with determination coefficient R2 = 0.995.

$P_{staticInc} = 1.4356 \times 1.034^{T}$, when $V_{core} = 1.09V$   (3.3)

Plotting these estimates of the increments in static power consumption, as in Fig. 3.1(c), explains the gradual rise in total power consumption when the dynamic behavior of a program remains constant. Instead of using a non-linear function, we approximate the static power increase as a linear function of temperature. This is a fair approximation considering that the non-linear equation given in Eq. 3.3 can be closely approximated with the linear equation given in Eq. 3.4 with determination coefficient R2 = 0.989 (for the range in which die temperature changes occur). This linear approximation avoids the added cost of introducing an exponential term in the power model computation.

$P_{staticInc} = 0.359 \times T - 16.566$, when $V_{core} = 1.09V$   (3.4)

Modern processors allow programmers to read temperature information for each core from on-die thermal diodes. For example, Intel platforms report relative core temperatures on-die via Digital Thermal Sensors (DTS), which can be read by software through the MSRs or the Platform Environment Control Interface (PECI) [94]. This data is used by the system to regulate CPU fan speed or to throttle the processor in case of overheating. Third-party tools like RealTemp and CoreTemp on Windows and open-source software like lm-sensors on Linux can be used to read data from the thermal sensors. As Intel documents indicate [94], the accuracy of temperature readings provided by the thermal sensors varies, and the values reported may not always match the actual core temperatures. Because of factory variation and individual DTS calibration, reading accuracy varies from chip to chip. The DTS equipment also suffers from slope errors, which means that temperature readings are more accurate near the T-junction max (the maximum temperature that cores can reach before thermal throttling is activated) than at lower temperatures. DTS circuits are designed to be read over reasonable operating temperature ranges, and the readings may not report values lower than 20◦C even if the actual core temperature is lower. Since the DTS is primarily created as a thermal protection mechanism, reasonable accuracy at high temperatures is acceptable. This affects the accuracy of power models that use core temperature. Researchers and practitioners should read the processor model datasheet, design guidelines, and errata to understand the limitations of their respective thermal monitoring circuits, and they should take corrective measures when deriving their power models, if required.
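As an illustration (this is not the measurement code used in this thesis, and the sysfs path is hypothetical, since the hwmon index and temperature channel differ from system to system), per-core temperatures exposed by the Linux coretemp driver can be read from sysfs:

/* Sketch: read one core's temperature as exposed by the Linux coretemp
   driver through sysfs. Values are reported in millidegrees Celsius. */
#include <stdio.h>

int main(void)
{
    /* Hypothetical path; check /sys/class/hwmon/hwmon* /name and temp*_label
       on the target machine to find the right channel. */
    const char *path = "/sys/class/hwmon/hwmon1/temp2_input";
    FILE *f = fopen(path, "r");
    if (!f) { perror("open hwmon"); return 1; }

    long millideg = 0;
    if (fscanf(f, "%ld", &millideg) == 1)
        printf("core temperature: %.1f C\n", millideg / 1000.0);
    fclose(f);
    return 0;
}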

3.1 Modeling Approach

Our approach to power modeling is workload-independent and does not require application modification. To show the effectiveness of our models, we perform two types of studies:

• We demonstrate accuracy and portability on five CMP platforms;

• We use the model in a power-aware scheduler to maintain a power budget.

Studying sources of model error highlights the need for better hardware support for power-aware resource management, such as fine-grained power sensors across the chip and more accurate temperature information. Our approach nonetheless shows promise for balancing performance, power, and thermal requirements for platforms ranging from embedded real-time systems to consolidated data centers, and even to supercomputers. In the rest of the chapter, we first present the details of how we build our model, and then we discuss how we validate it. Our evaluation analyzes the model's computation overhead (Section 3.3.1) and accuracy (Section 3.3.2). We then explain the meta-scheduler that we use as a proof of concept for showing the effectiveness of our model in Section 3.4.

3.2 Methodology

3.2.1 Counter Selection

Selecting appropriate PMCs is extremely important with respect to the accuracy of the power model. We choose counters that are most highly correlated with measured power consumption. The chosen counters must also cover a sufficiently large set of events to ensure that they capture general application activity. If the chosen counters do not meet these criteria, the model will be prone to error. The problem of choosing appropriate counters for power modeling has been handled in different ways by previous researchers. Research studies that estimate power for an entire core or processor [34, 42, 51] use a small number of PMCs. Those that aim to construct decomposed power models to estimate the power consumption of sub-units of a core [48–50] tend to monitor more PMCs. The number of counters needed depends on the model granularity and the acceptable level of complexity. Also, most modern processors allow simultaneous counting of only a limited number (two/four/eight) of microarchitectural events. Hence, using more counters in the model requires interleaving the counting of events and extrapolating the counter values over the total sampling period. This reduces the accuracy of absolute counter values but allows researchers to track more counters. To choose appropriate performance counters for developing our power-estimation model, we divide the available counters into four categories and then choose one counter from each category based upon statistical correlation [52,68]. This ensures that the chosen counters are comprehensive representations of the entire microarchitecture and are not biased towards any particular section. Caches and floating point units form a large part of the chip real estate, and thus PMCs that keep track of their activity factors would be useful additions to the total power consumption information. Depending on the platform, multiple counters will be available in both these categories. For example, we can count the total number of cache references as well as the number of cache misses for various cache levels. For floating point operations, depending upon the processor model, we can count (separately or in combination) the number of multiplication, addition, and division operations. Because of the deep pipelining of modern processors, we can also expect out-of-order logic to account for a significant amount of power. Stalls due to branch mispredictions or an empty instruction decoder may reduce average power consumption over a fixed period of time. On the other hand, pipeline stalls caused by full reservation stations and reorder buffers will be positively correlated with power, because these indicate that the processor has extracted enough instruction-level parallelism to keep the execution units busy. Hence, pipeline stalls indicate not just the power usage of out-of-order logic but of the execution units as well. In addition, we would like to use a counter that can cover all the microarchitectural components not covered by the above three categories. This includes, for example, integer execution units, branch prediction units, and Single Instruction Multiple Data (SIMD) units. These events can be monitored using the specific PMCs tied to them or by a generalized counter like total instructions/micro-operations (UOPS) retired/executed/issued/decoded. To construct a power model for individual sub-units, we need to identify the respective PMCs that represent each sub-unit's utilization factors.
To choose counters via statistical correlation, we run a training application while sampling the performance counters and collecting empirical power measurement values. Fig. 3.2 shows simplified pseudo-code for the microbenchmark [52] that we use to develop our power model. Here, different phases of the microbenchmark exercise different parts of the microarchitecture. Since the number of relevant PMCs will most likely be greater than the limit on simultaneously monitored counters, multiple training runs are required to gather data for all desired counters. Once the performance counter values and the respective empirical power consumption values are collected, we use statistical correlation to establish the relationship between performance events (counter values normalized to the number of instructions executed) and power. This guides our selection of the most suitable events for use in the power model. Obviously, the type of correlation method used can affect the model accuracy. We use Spearman's rank correlation [95] to measure the relationship between each counter and power. The relationship between the performance counter values and power can be linear or non-linear. Using the rank correlation, as opposed to correlation methods like Pearson's, ensures that any non-linearity does not affect the correlation coefficient. For example, Table 3.1 shows the most power-relevant counters, divided categorically according to the correlation coefficients obtained from running the microbenchmark on the Intel CoreTMi7-870 platform.

Figure 3.2: Microbenchmark Pseudo-Code
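As a rough, hypothetical sketch of such a phased training loop (array sizes, iteration counts, and the operation mix are illustrative assumptions rather than the actual microbenchmark), each phase stresses a different part of the microarchitecture:

/* Sketch: a phased training loop in the spirit of Fig. 3.2. Each phase
   stresses a different microarchitectural unit so that the corresponding
   performance counters and power levels are exercised during training. */
#include <stdlib.h>

#define N (16 * 1024 * 1024)
#define ITERS 100000000L

int main(void)
{
    double *buf = calloc(N, sizeof(double));
    double fp = 1.0;
    unsigned long acc = 1;

    /* Phase 1: integer/ALU-bound work (instructions issued, few stalls) */
    for (long i = 0; i < ITERS; i++)
        acc += i ^ (acc << 1);

    /* Phase 2: floating-point-bound work (FP operation counters) */
    for (long i = 0; i < ITERS; i++)
        fp = fp * 1.000001 + 0.5;

    /* Phase 3: memory-bound work with a stride that defeats the caches
       (last-level cache misses, memory power) */
    for (long i = 0; i < ITERS; i++)
        acc += (unsigned long)buf[(i * 1021) % N];

    volatile double sink = fp + (double)acc;  /* keep the loops from being optimized away */
    (void)sink;
    free(buf);
    return 0;
}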

(a) FP Operations
Counter                       ρ
FP_COMP_OPS_EXE:X87           0.65
FP_COMP_OPS_EXE:SSE_FP        0.04

(b) Total Instructions
Counter                       ρ
UOPS_EXECUTED:PORT1           0.84
UOPS_ISSUED:ANY               0.81
UOPS_EXECUTED:PORT015         0.81
INSTRUCTIONS_RETIRED          0.81
UOPS_EXECUTED:PORT0           0.81
UOPS_RETIRED:ANY              0.78

(c) Memory Operations
Counter                       ρ
MEM_INST_RETIRED:LOADS        0.81
UOPS_EXECUTED:PORT2_CORE      0.81
UOPS_EXECUTED:PORT234_CORE    0.74
MEM_INST_RETIRED:STORES       0.74
LAST_LEVEL_CACHE_MISSES       0.41
LAST_LEVEL_CACHE_REFERENCES   0.36

(d) Stalls
Counter                       ρ
ILD_STALL:ANY                 0.45
RESOURCE_STALLS:ANY           0.44
RAT_STALLS:ANY                0.40
UOPS_DECODED:STALL_CYCLES     0.25

Table 3.1: Intel CoreTMi7-870 Counter Correlation

Table 3.1(a) shows that only FP_COMP_OPS_EXE:X87 is a suitable candidate from the floating point (FP) category. Ideally, to get total FP operations executed, we should count both x87 FP operations (FP_COMP_OPS_EXE:X87) and SIMD (FP_COMP_OPS_EXE:SSE_FP) operations. The microbenchmark does not use SIMD floating point operations, and hence we see high correlation for the x87 counter but not for the SSE (Streaming SIMD Extensions) counter. Because of the limit on the number of counters that can be sampled simultaneously, we have to choose between the two. Ideally, chip manufacturers would provide a counter reflecting both x87 and SSE FP instructions, obviating the need to choose one. In Table 3.1(b), the correlation values in the total instructions category are almost equal, and thus these counters need further analysis. The same is true for the top three counters in the stalls category, shown in Table 3.1(d). Since we are looking for counters providing insight into out-of-order logic usage, the RESOURCE_STALLS:ANY counter is our best option. As for memory operations, choosing either MEM_INST_RETIRED:LOADS or MEM_INST_RETIRED:STORES will bias the model towards load- or store-intensive applications. Similarly, choosing UOPS_EXECUTED:PORT1 or UOPS_EXECUTED:PORT0 in the total instructions category will bias the model towards addition- or multiplication-intensive applications. We therefore omit these counters from further consideration. Table 3.1 shows that correlation analysis may find counters from the same category with very similar correlation numbers. Our aim is to make a comprehensive power model using only four counters, and thus we must make sure that the counters chosen convey as little redundant information as possible. We therefore analyze the correlation among all the counters.

(a) MEM versus INSTR Correlation
                          UOPS_EXECUTED:PORT234   LAST_LEVEL_CACHE_MISSES
UOPS_ISSUED:ANY           0.97                    0.14
UOPS_EXECUTED:PORT015     0.88                    0.2
INSTRUCTIONS_RETIRED      0.91                    0.12
UOPS_RETIRED:ANY          0.98                    0.08

(b) FP versus INSTR Correlation
                          FP_COMP_OPS_EXE:X87
UOPS_ISSUED:ANY           0.44
UOPS_EXECUTED:PORT015     0.41
INSTRUCTIONS_RETIRED      0.49
UOPS_RETIRED:ANY          0.43

(c) STALL versus INSTR Correlation
                          RESOURCE_STALLS:ANY
UOPS_ISSUED:ANY           0.25
UOPS_EXECUTED:PORT015     0.30
INSTRUCTIONS_RETIRED      0.23
UOPS_RETIRED:ANY          0.21

Table 3.2: Counter-Counter Correlation

To select a counter from the memory operations category, we analyze the correlation of UOPS_EXECUTED:PORT234_CORE and LAST_LEVEL_CACHE_MISSES with the counters from the total instructions category, as shown in Table 3.2(a). From this table, it is evident that UOPS_EXECUTED:PORT234_CORE is highly correlated with the instruction counters, and hence LAST_LEVEL_CACHE_MISSES is the better choice. To choose a counter from the total-instructions category, we analyze the correlation of these counters with the FP and stalls counters (in Table 3.2(b) and Table 3.2(c), respectively). These correlations do not clearly recommend any particular choice. In such cases, we can either choose one counter arbitrarily or choose a counter intuitively. UOPS_EXECUTED:PORT015 does not cover memory operations that are satisfied by cache accesses instead of main memory. The UOPS_RETIRED:ANY and INSTRUCTIONS_RETIRED counters cover only retired instructions and not those that are executed but not retired, e.g., due to branch mispredictions. A UOPS_EXECUTED:ANY counter would be appropriate, but since such a counter does not exist, the next best option is UOPS_ISSUED:ANY. This counter covers all instructions issued, so it also covers the instructions issued but not executed (and thus not retired). We use the same methodology to select representative performance counters for all the machines that we evaluate. Table 3.4 shows the counters we select for the different platforms.
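To make the selection step concrete, the sketch below computes Spearman's rank correlation between one counter's event-rate trace and the corresponding power samples. It is a minimal illustration (the input arrays are made up, and ties are given arbitrary consecutive ranks; a complete implementation would assign average ranks to ties), not the statistics code used for the tables above.

/* Sketch: Spearman's rank correlation between a counter trace and the
   corresponding power samples. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

struct item { double value; int index; };

static int cmp(const void *a, const void *b)
{
    double d = ((const struct item *)a)->value - ((const struct item *)b)->value;
    return (d > 0) - (d < 0);
}

/* Replace each value by its rank (1..n). */
static void to_ranks(const double *x, double *rank, int n)
{
    struct item *tmp = malloc(n * sizeof(*tmp));
    for (int i = 0; i < n; i++) { tmp[i].value = x[i]; tmp[i].index = i; }
    qsort(tmp, n, sizeof(*tmp), cmp);
    for (int i = 0; i < n; i++) rank[tmp[i].index] = i + 1;
    free(tmp);
}

/* Pearson correlation applied to the ranks gives Spearman's rho. */
static double spearman(const double *x, const double *y, int n)
{
    double *rx = malloc(n * sizeof(double)), *ry = malloc(n * sizeof(double));
    to_ranks(x, rx, n);
    to_ranks(y, ry, n);
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sx += rx[i]; sy += ry[i];
        sxx += rx[i] * rx[i]; syy += ry[i] * ry[i]; sxy += rx[i] * ry[i];
    }
    double cov = sxy - sx * sy / n;
    double var = sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
    free(rx); free(ry);
    return cov / var;
}

int main(void)
{
    /* Illustrative traces: normalized counter rates and measured power (W). */
    double rate[]  = { 0.02, 0.10, 0.25, 0.40, 0.55, 0.70 };
    double power[] = { 21.0, 24.5, 30.2, 36.8, 41.0, 47.3 };
    printf("rho = %.3f\n", spearman(rate, power, 6));
    return 0;
}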

3.2.2 Model Formation

Having identified events that contribute significantly to consumed power, we create a formalism to map observed microarchitectural activity and measured temperatures to measured power draw. We re-run the microbenchmark sampling just the chosen PMCs, collecting power and temperature data at each sampling interval. We normalize each time-sampled PMC value, ei, to the elapsed cycle count to generate an event rate, ri.

We then map the rise in core temperature, T, and the observed event rates, ri, to core power, Pcore, via a piece-wise model based on multiple linear regression, as in Equation 3.5. We apply non-linear transformations to normalized counter values to account for non-linearity, as in Equation 3.6. The normalization ensures that changing the sampling period of the readings does not affect the weights of the respective predictors. We develop a piecewise power model that achieves a better fit by separating the collected samples into two bins based on the values of either the memory counter or the FP counter. Breaking the data using the memory counter value helps in separating memory-bound phases from CPU-bound phases. Using the FP counter instead of the memory counter to divide the data helps in separating FP-intensive phases. The selection of a candidate for breaking the model is machine-specific and depends on what gives a better fit. Regardless, we agree with Singh et al. [52, 68] that piecewise linear models better capture processor behavior.

$\hat{P}_{core} = \begin{cases} F_1(g_1(r_1), \cdots, g_n(r_n), T), & \text{if condition} \\ F_2(g_1(r_1), \cdots, g_n(r_n), T), & \text{else} \end{cases}$   (3.5)

where $r_i = e_i/(\text{cycle count})$, $T = T_{current} - T_{idle}$

$F_n = p_0 + p_1 \cdot g_1(r_1) + \ldots + p_n \cdot g_n(r_n) + p_{n+1} \cdot T$   (3.6)

As an example, the piecewise linear regression model for the Intel CoreTMi7-870 is shown in Eq. 3.7. Here, rMEM refers to the counter LAST_LEVEL_CACHE_MISSES, rINSTR to UOPS_ISSUED, rFP to FP_COMP_OPS_EXE:X87, and rSTALL to RESOURCE_STALLS:ANY. The piecewise model is broken based on the value of the memory counter. For the first part of the piece-wise model, the coefficient for the memory counter is zero (due to the very low number of memory operations we sampled).

$\hat{P}_{core} = \begin{cases} 10.9246 + 0 \cdot r_{MEM} + 5.8097 \cdot r_{INSTR} + 0.0529 \cdot r_{FP} + 6.6041 \cdot r_{STALL} + 0.1580 \cdot T, & \text{if } r_{MEM} < 10^{-6} \\ 19.9097 + 556.6985 \cdot r_{MEM} + 1.5040 \cdot r_{INSTR} + 0.1089 \cdot r_{FP} - 2.9897 \cdot r_{STALL} + 0.2802 \cdot T, & \text{if } r_{MEM} \geq 10^{-6} \end{cases}$   (3.7)

3.3 Validation

We evaluate our models by estimating per-core power for the SPEC 2006 [96], SPEC-OMP [97, 98], and NAS [99] benchmark suites. In our experiments, we run multi-threaded benchmarks with one thread per core, and single-threaded benchmarks with an instance on each core. We use gcc 4.2 to compile our benchmarks for a 64-bit architecture with optimization flags enabled, and we run all benchmarks to completion. We use the pfmon utility from the perfmon2 library [100] to access the hardware performance counters. Table 3.3 gives system details of the machines on which we validated our power model. System power is based on measuring the power supply's current draw from the power outlet when the machine is idle. When we cannot find published values for idle processor power, we sum the power draw when powered off and the power saved by turning off cdrom and hard disk drives, removing all but one memory DIMM, and disconnecting fans. We subtract idle core power from idle system power to get uncore (baseline without processor) power. Change in the uncore power itself (due to DRAM or hard drive accesses, for instance) is included in the model estimations. Including temperature as a model input accounts for variation in uncore static power. We always run in the active power state (C0). We use the sensors utility from the lm-sensors library to obtain core temperatures, and we use the Watts Up? Pro power meter [75] described in Chapter 2 to gather power data. This meter is accurate to within 0.1W, and it updates once per second. Although measuring power at the wall outlet has limitations in terms of sensitivity to fine-grained changes in power consumption, we use it for our experiments because of its low cost, non-intrusive use, and easy portability. Moreover, we observe that for the benchmarks in our experiments, the power consumption of the system does not change at a granularity of less than one second, and hence sampling power consumption at the rate of 1 sample/second meets our requirements for this modeling approach. Our modeling approach can easily be adapted to other power measurement techniques, such as those described in Chapter 2.

We incorporate the power model into a proof-of-concept, power-aware resource manager (the user-level meta-scheduler of Singh et al. [52, 68]) designed to maintain a specified power envelope. The meta-scheduler manages processes non-invasively, requiring no modifications to the applications or OS. It does so by suspending/resuming processes and, where supported, applying dynamic voltage/frequency scaling (DVFS) to alter core clock rates. For these experiments, we degrade the system power envelope by 10%, 20%, and 30% for the CoreTMi7-870 platform. Lower envelopes render cores inactive, and thus we do not consider them. Finally, we incorporate the model in a kernel scheduler, implementing a pseudo power sensor per core. The device does not exist in hardware but is simulated by the power model module. Reads to the pseudo-device retrieve the power estimate computed most recently.

(a) Configuration Parameters for Intel Platforms

                          Intel Q6600            Intel Xeon E5430       Intel CoreTMi7-870
Cores/Chips               4, dual dual-core                             4
Frequency (GHz)           2.4                    2.0, 2.66              2.93
Process (nm)              65                     45                     45
L1 Instruction            32 KB 8-way            32 KB 8-way            32 KB 4-way
L1 Data                   32 KB 8-way            32 KB 8-way            32 KB 8-way
L2 Cache                  4 MB 16-way shared     6 MB 16-way shared     256 KB 8-way exclusive
L3 Cache                  N/A                    N/A                    8 MB 16-way shared
Memory Controller         off-chip, 2 channel    off-chip, 4 channel    on-chip, 2 channel
Main Memory               4 GB DDR2-800          8 GB DDR2-800          16 GB DDR3-1333
Bus (MHz)                 1066                   1333                   1333
Max TDP (W)               105                    80                     95
Linux Kernel              2.6.27                 2.6.27                 2.6.31
Idle System Power (W)     141.0                  180.0                  54.0
Idle Processor Power (W)  38.0                   27.0                   10.0
Idle Temperature (◦C)     36                     45                     30

(b) Configuration Parameters for AMD Platforms

                          AMD PhenomTM9500         AMD OpteronTM8212
Cores/Chips               4                        8, quad dual-core
Frequency (GHz)           1.1, 2.2                 2.0
Process (nm)              65                       90
L1 Instruction            64 KB 2-way              64 KB 2-way
L1 Data                   64 KB 2-way              64 KB 2-way
L2 Cache                  512 KB 8-way exclusive   1024 KB 16-way exclusive
L3 Cache                  2 MB 32-way shared       N/A
Memory Controller         on-chip, 2 channel       on-chip, 2 channel
Main Memory               4 GB DDR2-800            16 GB DDR2-667
Bus (MHz)                 1100, 2200               1000
Max TDP (W)               95                       95
Linux Kernel              2.6.25                   2.6.31
Idle System Power (W)     84.1                     302.6
Idle Processor Power (W)  20.1                     53.6
Idle Temperature (◦C)     36                       33

Table 3.3: Machine Configuration Parameters

(a) PMCs Selected for Intel Platforms

Category               Intel Q6600        Intel E5430                Intel CoreTMi7-870
Memory                 L2_LINES_IN        LAST_LEVEL_CACHE_MISSES    LAST_LEVEL_CACHE_MISSES
Instructions Executed  UOPS_RETIRED       UOPS_RETIRED               UOPS_ISSUED
Floating Point         X87_OPS_RETIRED    X87_OPS_RETIRED            FP_COMP_OPS_EXE:X87
Stalls                 RESOURCE_STALLS    RESOURCE_STALLS            RESOURCE_STALLS:ANY

(b) PMCs Selected for AMD Platforms

Category               AMD PhenomTM9500                   AMD OpteronTM8212
Memory                 L2_CACHE_MISS                      DATA_CACHE_ACCESSES
Instructions Executed  RETIRED_UOPS                       RETIRED_INSTRUCTIONS
Floating Point         RETIRED_MMX_AND_FP_INSTRUCTIONS    DISPATCHED_FPU:OPS_MULTIPLY
Stalls                 DISPATCH_STALLS                    DECODER_EMPTY

Table 3.4: PMCs Selected for Power-Estimation Model

Benchmark     baseline   model (10msec)   model (100msec)   model (1sec)
ep.A serial   35.68      35.57            36.04             35.59
ep.A OMP      4.84       4.89             4.77              4.74
ep.A MPI      4.77       4.72             4.73              4.75
cg.A serial   5.82       5.83             5.83              5.83
cg.A OMP      1.95       1.95             1.95              1.95
cg.A MPI      2.19       2.20             2.20              2.20
ep.B serial   146.53     146.52           145.52            146.77
ep.B OMP      19.45      19.33            19.35             19.54
ep.B MPI      18.95      19.41            19.12             19.18
cg.B serial   559.58     560.50           560.11            560.10
cg.B OMP      91.29      92.64            96.90             89.92
cg.B MPI      79.11      79.18            79.18             79.05

Table 3.5: Scheduler Benchmark Times for Sample NAS Applications on the AMD OpteronTM8212 (sec)

3.3.1 Computation Overhead

If computing the model is expensive, its use becomes limited to coarse timeslices. In this experiment we study the overhead of our power model to evaluate its use in an OS task scheduler. We use the scheduler developed by Boneti et al. [101] that is specifically tailored to High Performance Computing (HPC) applications that require that the OS introduce little or no overhead. In most cases, this scheduler delivers better performance with better predictability, and it reduces OS noise [101, 102]. The model's overhead depends on 1) the frequency with which the per-core power is updated, and 2) the complexity of the operations required to calculate the model. We calculate overhead by measuring execution time for our kernel scheduler running with and without reading the PMCs and temperature sensors. We vary the sample period from one minute to 10 msec. We time applications for five runs and take the average (differences among runs are within normal execution time variation for real systems). Table 3.5 gives measured times for the scheduler with and without evaluating the model. These data show that computing the model adds no measurable overhead, even at 10ms timeslices.
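A straightforward way to quantify such overhead in isolation (shown only as an illustration; this is not the exact harness used with the HPC scheduler, and the model() stub stands in for the piecewise model of Eq. 3.7) is to time many back-to-back model evaluations with a monotonic clock:

/* Sketch: measure the per-invocation cost of evaluating a power model by
   timing many back-to-back calls with a monotonic clock. */
#include <stdio.h>
#include <time.h>

static double model(double mem, double instr, double fp, double stall, double t)
{
    return 19.9097 + 556.6985 * mem + 1.5040 * instr +
           0.1089 * fp - 2.9897 * stall + 0.2802 * t;
}

int main(void)
{
    struct timespec t0, t1;
    const long iters = 10000000;
    volatile double sink = 0.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink += model(1e-4, 0.8, 0.01, 0.1, 10.0);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("model evaluation: %.2f ns per call (sink=%g)\n", ns / iters, sink);
    return 0;
}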

3.3.2 Estimation Error

We assess model error by comparing our system power estimates to power meter output (which our power meter limits to a one-second granularity). We estimate the power consumption per core, and then sum the power consumption of all cores with the uncore power to compare the estimated power consumption with measured values. Figure 3.3 through Figure 3.7 1 show percentage median error for the NAS, SPEC-OMP, and SPEC 2006 applications on all systems.

1Data for Intel Q6600


Figure 3.3: Median Estimation Error for the Intel Q6600 system ((a) NAS, (b) SPEC-OMP, (c) SPEC 2006)

Figure 3.8 through Figure 3.12 show the standard deviation of error for each benchmark suite. The occasional high standard deviations illustrate the main problem with our current infrastructure: instantaneous power measurements once per second do not reflect continuous performance counter activity since the last meter reading. Estimation error ranges from 0.3% (leslie3d) to 7.1% (bzip2) for the Intel Q6600 system, from 0.3% (ua) to 7.0% (hmmer) for the Intel E5430 system, from 1.02% (bt) to 9.3% (xalancbmk) for the AMD PhenomTM system, from 1.0% (bt.B) to 10.7% (soplex) for the AMD OpteronTM8212 system, and from 0.17% (art) to 9.07% (libquantum) for the Intel CoreTMi7-870 system. On the Intel Q6600, only five (out of 45) applications exhibit median error exceeding 5%; on the Intel E5430, only six exhibit error exceeding 5%; on the AMD PhenomTM, eighteen exhibit error exceeding 5%; and on the AMD OpteronTM8212, thirteen exhibit error exceeding 5%. For the Intel CoreTMi7-870, median estimations for only seven applications exceed 5% error. Table 3.6 shows the summary of power estimation errors for our model across all platforms. These data suggest that our model works better on the Intel machines than on the AMD machines.

Figure 3.13 shows model coverage via Cumulative Distribution Function (CDF) plots for the suites. On the Q6600, 85% of estimates have less than 5% error, and 98% have less than 10%. On the E5430, 73% have less than 5% error, and 99% less than 10%. On the PhenomTM, 59% have less than 5% error, and 92% less than 10%. On the 8212, 37% have less than 5% error, and 76% have less than 10%. For the CoreTMi7-870, 82% of estimates have less than 5% error and 96% have less than 10% error. The vast majority of estimates exhibit very small error.


Figure 3.4: Median Estimation Error for Intel E5430 system


Figure 3.5: Median Estimation Error for the AMD PhenomTM9500 CHAPTER 3. PER-CORE POWER ESTIMATION MODEL 49

Figure 3.6: Median Estimation Error for the AMD Opteron™ 8212: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.7: Median Estimation Error for the Intel Core™ i7: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.8: Standard Deviation of Error for the Intel Q6600 system: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006 (y-axis: % standard deviation)

Figure 3.9: Standard Deviation of Error for the Intel E5430 system: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.10: Standard Deviation of Error for the AMD Phenom™ 9500: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.11: Standard Deviation of Error for the AMD Opteron™ 8212: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.12: Standard Deviation of Error for the Intel Core™ i7: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Benchmark    Intel Q6600   Intel E5430   Intel Core i7-870   AMD Phenom   AMD 8212
SPEC 2006    1.1           2.8           2.22                3.5          4.80
NAS          1.6           3.9           3.11                4.5          2.55
SPEC OMP     1.6           3.5           2.02                5.2          3.35
Overall      1.2           3.6           2.07                3.8          4.38

Table 3.6: Estimation Error Summary (% median error)

These errors are not excessive, but lower is better. Prediction errors are not clustered, but are spread throughout application execution. Model accuracy depends on the particular PMCs available on a given platform. If the available PMCs do not sufficiently represent the microarchitecture, model accuracy will suffer. For example, the AMD Opteron™ 8212 processor supports no single counter giving total floating point operations; instead, separate PMCs track different types of floating point operations. We therefore choose the one most highly correlated with power. Model accuracy would likely improve if a single PMC reflecting all floating point operations were available. For processor models supporting only two Model Specific Registers for reading PMC values, capturing the activity of four counters requires multiplexing the counting of PMC-related events. This means that events are counted for only half a second (or half the total sampling period) and are doubled to estimate the value over the entire period. This approximation can introduce inaccuracies when program behavior changes rapidly.

Model estimation accuracy is also affected by the accuracy of hardware sensors. For instance, the Opteron™ 8212 temperature sensor data suffers from low accuracy [103]. Similarly, even though the microbenchmark tries to cover all scenarios of power consumption, the resulting regression model represents a generalized case. This is especially true for a model that tries to estimate power for a complex microarchitecture using a limited number of counters. For example, floating point operations can consist of add, multiply, or divide operations, all of which use different execution units and hence consume different amounts of power. If the application being studied uses operations not covered by the training microbenchmarks, model accuracy can also suffer.

A model can be no more accurate than the information used to build it. Performance counter implementations display nondeterminism [91] and error [89]. All of these impact model accuracy. Given all these sources of inaccuracy, there seems little need for more complex, sophisticated mathematics when building a model.
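For reference, the error statistics used throughout this section (median error, standard deviation, and the CDF coverage points of Figure 3.13) can be reproduced from per-sample estimate/measurement pairs. The sketch below is illustrative only; the function name is ours and the example samples are made up.

import numpy as np

def error_stats(estimated, measured):
    # Per-sample percentage error for one benchmark run.
    est = np.asarray(estimated, dtype=float)
    meas = np.asarray(measured, dtype=float)
    pct_err = 100.0 * np.abs(est - meas) / meas
    return {
        "median_error": float(np.median(pct_err)),
        "std_dev": float(np.std(pct_err)),
        "under_5pct": float(np.mean(pct_err < 5.0)),    # CDF point at 5% error
        "under_10pct": float(np.mean(pct_err < 10.0)),  # CDF point at 10% error
    }

# Hypothetical one-second power samples (watts):
print(error_stats([24.8, 25.3, 26.1], [25.0, 25.0, 26.5]))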

3.4 Management

In the previous section, we discussed our approach to estimating the power consumption of processor resources using power modeling. In this section, we discuss the use of our power model by resource managers that perform task scheduling. To demonstrate one use of our on-line power model, we experiment with the user-level meta-scheduler from Singh et al. [52, 68]. This live power management application maintains a user-defined system power budget by power-aware scheduling of tasks and/or by using DVFS.

Figure 3.13: Cumulative Distribution Function (CDF) Plots Showing Fraction of Space Predicted (y axis) under a Given Error (x axis) for Each System: (a) Intel Q6600, (b) Intel E5430, (c) AMD 9500, (d) AMD 8212, (e) Intel Core™ i7

Figure 3.14: Estimated versus Measured Error for the Intel Q6600 system: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006 (actual versus predicted power in watts)

Figure 3.15: Estimated versus Measured Error for the Intel E5430 system: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.16: Estimated versus Measured Error for the AMD Phenom™ 9500: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.17: Estimated versus Measured Error for the AMD Opteron™ 8212: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

Figure 3.18: Estimated versus Measured Error for the Intel Core™ i7-870: (a) NAS, (b) SPEC-OMP, (c) SPEC 2006

We use the power model to compute core power consumption dynamically. The application spawns one process per core. The scheduler reads PMC values via pfmon and feeds the sampled PMC values to the power model to estimate core power consumption. The meta-scheduler binds the affinity of each process to a particular core to simplify task management and power estimation. It dynamically calculates power values at a set interval (one second, in our case) and compares the system power envelope with the sum of the per-core power values and the uncore power. When the scheduler detects a breach of the power envelope, it takes steps to control power consumption.

The scheduler employs two knobs to control system power consumption: dynamic voltage-frequency scaling as a fine knob, and process suspension as a coarse knob. When the envelope is breached, the scheduler first tries to lower power consumption by scaling down the frequency. If power consumption remains too high, the scheduler starts suspending processes to meet the envelope's demands. The scheduler maintains a record of the power being consumed by each process at the time of suspension. When the estimated power consumption is less than the target power envelope, the scheduler checks whether any of the suspended processes can be resumed, based on the power they were consuming at the time of suspension. If the gap between the current power consumption and the target power budget is not large enough to resume a suspended process, and if the processor is operating at a frequency lower than the maximum, the scheduler scales up the voltage-frequency. Fig. 3.19 shows the flow diagram of the meta-scheduler.
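The control loop just described can be summarized in a short sketch. This is not the actual scheduler implementation: all hardware interactions (power estimation, DVFS control, suspension and resumption) are injected as placeholder callables, and the pick_victim/pick_resume callables stand for the policies of Section 3.4.1.

from dataclasses import dataclass
import time

@dataclass
class Task:
    pid: int
    core: int
    suspended: bool = False
    power_at_suspension: float = 0.0

def enforce_envelope(envelope_w, tasks, estimate_core_power, uncore_power,
                     lower_freq_one_step, raise_freq_one_step,
                     at_min_freq, at_max_freq, suspend, resume,
                     pick_victim, pick_resume, interval_s=1.0, iterations=1):
    # One control step per interval: DVFS is the fine knob,
    # process suspension is the coarse knob.
    for _ in range(iterations):
        time.sleep(interval_s)
        running = [t for t in tasks if not t.suspended]
        per_core = {t: estimate_core_power(t.core) for t in running}
        total = sum(per_core.values()) + uncore_power()
        if total > envelope_w:
            if not at_min_freq():
                lower_freq_one_step()            # fine knob first
            elif per_core:
                victim = pick_victim(per_core)   # policy decision (Sec. 3.4.1)
                victim.power_at_suspension = per_core[victim]
                victim.suspended = True
                suspend(victim)                  # coarse knob
        else:
            slack = envelope_w - total
            paused = [t for t in tasks if t.suspended]
            cand = pick_resume(paused) if paused else None
            if cand is not None and cand.power_at_suspension <= slack:
                cand.suspended = False
                resume(cand)
            elif not at_max_freq():
                raise_freq_one_step()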

3.4.1 Sample Policies

When the scheduler suspends a process, it tries to choose the process that will have the least impact on the completion time of all the processes. We explore the use of our power model in a scheduler via two sample policies for process suspension.

The Throughput policy targets maximum power efficiency (maximum instructions/watt) under a given power envelope. When the envelope is breached, the scheduler calculates the ratio of instructions/UOPS retired to the power consumed for each core and suspends the process having committed the fewest instructions per watt of power consumed. When resuming a process, it selects the process (if there is more than one suspended process) that had committed the most instructions per watt at the time of suspension. This policy favors processes that are less often stalled waiting for load operations to complete, and thus favors CPU-bound applications.

The Fairness policy divides the available power budget equally among all processes. Under this policy, the scheduler maintains a running average of the power consumed by each core. When the scheduler must choose a process for suspension, it chooses the process having consumed the most average power. For resumption, it chooses the process that had consumed the least average power at the time of suspension. This policy strives to regulate core temperature, since it throttles the cores consuming the most average power. Because core power consumption correlates strongly with core temperature, this ensures that the core with the highest temperature receives time to cool down while cores with lower temperatures continue working. Since memory-bound applications are stalled more often, they consume less average power, and so this policy favors memory-bound applications.
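A minimal sketch of the two policies as victim/resume selection functions, matching the descriptions above. The dictionaries are hypothetical inputs keyed by task: per-interval instructions retired, per-interval power, and running-average power.

def throughput_victim(per_core_power, instructions_retired):
    # Suspend the task with the fewest instructions committed per watt.
    return min(per_core_power,
               key=lambda t: instructions_retired[t] / per_core_power[t])

def throughput_resume(suspended, inst_per_watt_at_suspension):
    # Resume the task that had the highest instructions/watt when suspended.
    return max(suspended, key=lambda t: inst_per_watt_at_suspension[t])

def fairness_victim(running_avg_power):
    # Suspend the task that has consumed the most average power.
    return max(running_avg_power, key=running_avg_power.get)

def fairness_resume(suspended, avg_power_at_suspension):
    # Resume the task that had consumed the least average power when suspended.
    return min(suspended, key=lambda t: avg_power_at_suspension[t])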

3.4.2 Experimental Setup

For the scheduler experiments, we use an Intel Core™ i7-870 system, using DVFS to vary the frequency among 2.926, 2.66, 2.394, 2.128, 1.862, 1.596, and 1.33 GHz. We form separate power models for the different frequencies; the power manager can thus better implement policy decisions by estimating power at each frequency. If all core frequencies have been reduced but the power envelope is still breached, we suspend processes to reduce power. We compare against runtimes with no enforced power envelope. We divide our workloads into three sets based on CPU intensity, which we define as the ratio of instructions retired to last-level cache misses.

Figure 3.19: Flow diagram for the Meta-Scheduler

Benchmark Category   Benchmark Applications         Peak System Power (W)
CPU-Bound            ep, gamess, namd, povray       130
Moderate             art, lu, wupwise, xalancbmk    135
Memory-Bound         astar, mcf, milc, soplex       130

Table 3.7: Workloads for Scheduler Evaluation

Figure 3.20: Runtimes for Workloads on the Intel Core™ i7-870 (without DVFS): (a) CPU-bound, (b) Memory-bound, (c) Moderate (normalized run time versus % power budget for the Throughput and Fairness policies)

The three sets are designated as CPU-Bound, Moderate, and Memory-Bound workloads (in decreasing order of CPU intensity). The workloads categorized into these sets are listed in Table 3.7.
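For illustration, the classification by CPU intensity follows directly from the definition above. The threshold values below are hypothetical; the text does not state the cut-offs used to form the sets in Table 3.7.

def cpu_intensity(instructions_retired, llc_misses):
    # CPU intensity: retired instructions per last-level cache miss.
    return instructions_retired / max(llc_misses, 1)

def classify_workload(intensity, high=10_000, low=1_000):
    # high/low are placeholder thresholds, not values from this work.
    if intensity >= high:
        return "CPU-Bound"
    if intensity <= low:
        return "Memory-Bound"
    return "Moderate"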

3.4.3 Results

Fig. 3.20 shows the normalized runtimes for all the workloads when only process suspension is used to maintain the power envelope. The Throughput policy should favor CPU-bound workloads, and the Fairness policy should favor memory-bound workloads, but this distinction is not always visible because of the differences in the runtimes of the various workloads. This can be seen from the execution times of the CPU-bound benchmarks under the Throughput policy: among the CPU-bound applications, ep and gamess have the lowest computational intensities and execution times. As a result, these two applications are suspended most frequently, which does not affect the total execution time even when the power envelope is set to 80% of peak usage. Fig. 3.21 shows the results when the scheduler uses both DVFS and process suspension to maintain the given power envelope.

Figure 3.21: Runtimes for Workloads on the Intel Core™ i7-870 (with DVFS): (a) CPU-bound, (b) Memory-bound, (c) Moderate

As noted, the scheduler uses DVFS as a fine knob and process suspension as a coarse knob in maintaining the envelope. The Intel Core™ i7-870 processor that we use for our experiments supports fourteen voltage-frequency points, ranging from 2.926 GHz down to 1.197 GHz. For our experiments, we build models for seven of these points (2.926, 2.66, 2.394, 2.128, 1.862, 1.596, and 1.33 GHz), and we adjust the processor frequency across them. Our experimental results show that for the CPU-bound and moderate benchmarks, there is little difference in execution time under the different suspension policies. This suggests that for these applications the scheduler rarely needs to suspend processes; regulating the DVFS point proves sufficient to maintain the power envelope. Performance with DVFS degrades compared to the cases where no DVFS is used. The explanation lies in the differences between the runtimes of the applications within each workload set. When no DVFS is used, all processes run at full speed, and even when one of the processes is suspended, if that process is not critical it still runs at full speed later, in parallel with the critical process. But when DVFS is given higher priority over process suspension, a breach of the envelope slows all processes, and this affects total execution time.

3.5 Related Work

The difficulty of obtaining accurate, real-time power measurements, together with the growing emphasis on green computing, has sparked much research on power-estimation models. An early effort by Tiwari et al. [104] develops a measurement-based approach to model instruction-level power effects. They associate an energy cost with each type of instruction, with instruction pairs, and with inter-instruction effects like pipeline stalls and cache misses. They do not distinguish between static and dynamic power components, and their methodology requires forming the models separately at each processor frequency.

Russell and Jacome [105] develop simpler models for two members of the Intel i960 processor family, finding that detailed per-instruction power information is unnecessary for these processors. Their models are based on measured median average power and an estimated power constant.

Like Tiwari et al., Cignetti et al. [106] develop models to provide software developers with information on the energy implications of their design decisions. They target PalmOS devices, choosing device power states as their level of abstraction and assuming that transitions between states result from system calls. They build system models on the measured steady-state power of each power state and the power cost of transitioning between states.

As the availability and accessibility of hardware performance counters on commodity processors increased, more recent power models started using them to model processor power.

Joseph and Martonosi [48] model the power consumption of individual microarchitectural components by choosing performance counters that intuitively correlate with component utilization and deriving weights for each component's activity factor. They run their experiments at a single frequency (since DVFS was not common in 2001) and make no distinction between static and dynamic power. They cannot estimate power for 24% of the chip, and so they assume peak power for those structures.

Contreras and Martonosi [51] use five PMCs to estimate power at different frequencies on an XScale system (with an in-order uniprocessor). They gather data from multiple benchmark runs because the number of performance events in their model is greater than the number of available counters. They derive power weights for frequency-voltage pairs and form a parameterized linear model. Their model exhibits low percentage error, like ours, but they test on only seven applications. Unlike our approach, this methodology is not intended for on-line use.

Economou et al. [42] use PMCs to predict power on a blade server. They profile the system using application-independent microbenchmarks. The resulting model estimates power for the CPU, memory, and hard drive with 10% average error. Their model provides a power breakdown between system components but not for on-chip components.

Lee and Brooks [54] predict power via statistical inference models. They build correlations based on hardware parameters, using the most significant parameters to train their model. They profile their design space a priori and then estimate power on random samples. This methodology requires training on the same applications for which they want to estimate power, and so their approach depends on having already sampled all applications of interest.
Merkel and Bellosa [107] use PMCs to estimate power per processor in an eight-way SMP, shuffling processes to reduce overheating of individual processors. They do not state which PMCs they use, nor how they are chosen. Their goal is not to reduce power but to distribute it. Their estimation method suffers less than 10% error in a Linux kernel implementation.

The power model presented in this chapter is based on the approach of Singh et al. [52, 68]; we augment that work by porting and validating the model on many platforms, improving the accuracy of the model by augmenting the microbenchmarks and counter-selection methodology, analyzing estimation errors, and exploiting multiple DVFS operating points for scheduler validation.

Bertran et al. [50] develop a decomposable model to estimate the power consumed by eight microarchitectural components. Their model relies on 13 performance events. Since most processors support only a small number of counters (two to eight are common), tracking more events than the number of available counters requires time-multiplexing the counters. This reduces observational acuity when sampling, which can contribute to model error. In contrast, both of our modeling approaches use no more performance events than the number of available performance counters.

Spiliopoulos et al. [36] use the instructions-executed and instructions-retired counters to develop a power model. Their model shows high errors for memory-bound benchmarks because they do not consider memory operations in their model.

The use of power-estimation models for power-aware or energy-aware scheduling has been explored in a variety of research efforts. Isci et al. [39] analyze global power management policies that enforce a given power budget and minimize power consumption for a given performance target. They conduct their experiments on the Turandot [108] simulator. They assume the presence of on-core current sensors to acquire core power information, while they use performance counters to gather core performance information. They develop a global power manager that periodically monitors the power and performance of each core and sets the operating mode (akin to DVFS performance states) of the core for the next monitoring interval, assuming that the power mode can be set independently for each core. They experiment with three policies to evaluate their global power manager. The Priority policy assigns different priorities to the cores of a multi-core processor and tries to speed up the core with the highest priority while slowing down the lowest-priority core when power consumption overshoots the assigned budget. The pullHipushLo policy is similar to our Fairness policy from the case study above; it tries to balance the power consumption of the cores by slowing down the core with the highest power consumption when the power budget is exceeded and speeding up the core with the lowest power consumption when there is a power surplus. MaxBIPS tries to maximize system throughput by choosing the combination of power modes across cores that is predicted to provide the maximum overall BIPS (Billion Instructions Per Second).

Rajamani et al. [34] use their power estimation model to drive a power management policy called PerformanceMaximizer. For a given power budget, they exploit the DVFS levels of the processor to try to maximize processor performance.
For each performance state (P-state), they apply their power model to estimate power consumption at the current P-state. They use the calculated power value to estimate the power consumption at other P-states by linearly scaling the current power value with frequency. The scheduler then raises the performance state to the highest level that it estimates will remain safely below the power budget. Su et al. [109] use their power model to enforce power capping; they use only DVFS to enforce the power budget, while we explore the combination of DVFS and task suspension to meet power-capping requirements.

Banikazemi et al. [38] use a power-aware meta-scheduler. Their meta-scheduler monitors the performance, power, and energy of the system using performance counters and built-in power monitoring hardware. It uses this information to dynamically remap software threads on multi-core servers for higher performance and lower energy usage. Their framework is flexible enough to substitute a performance-counter-based model for the hardware power monitor.

Meng et al. [40] apply a multi-optimization power-saving strategy to meet the constraints of a chip-wide power budget on reconfigurable processors. They run a global power manager that configures the CPU frequency and/or cache size of individual cores. They use risk analysis to evaluate the trade-offs between power-saving optimizations and potential performance loss. They select the power-saving strategies at design time to create a static pool of candidate optimizations. They build an analytic power and performance model using performance counters and sensors that allows them to quickly evaluate many power modes and enables their power manager to choose, at periodic intervals, a global power mode that obeys the processor-wide power budget while maximizing throughput.

Apart from work at the level of an individual processor, power-aware strategies have also been explored in depth for scheduling jobs across multiple nodes in high-performance clusters. Below we discuss some of this related work. Kim et al. [41] propose scheduling algorithms for bag-of-tasks applications with the simultaneous aims of meeting deadline constraints and minimizing energy expenditure. They leverage cluster-wide DVFS to implement power-aware scheduling algorithms on their simulator. Ge et al. [110] propose a power-aware runtime system called CPU MISER that implements system-wide power management using DVFS. Their system uses a performance prediction model to achieve a user-specified performance-loss constraint while attempting to reduce energy expenditure. Marathe et al. [111] develop a power-aware runtime system that aims to maximize performance while distributing available power to nodes and cores. Their runtime system dynamically explores optimal thread concurrency levels and DVFS states to enforce a power budget while simultaneously identifying the critical paths that can be prioritized during power budget allocation. Patki et al. [112] propose a low-overhead resource manager for power-constrained clusters. They aim to maximize system power utilization by using power-aware scheduling algorithms and to reduce application turnaround time.

3.6 Conclusions

We derive statistical, piece-wise multiple linear regression power models mapping PMC values and temperature readings to power, and demonstrate their accuracy for the SPEC 2006, SPEC-OMP, and NAS benchmark suites. We write microbenchmarks to exercise the PMCs in order to characterize the machine and run those microbenchmarks with a power meter plugged in to generate data for building the models. For our regression models, we select the PMCs that correlate most strongly with measured power. Because they are built on microbenchmark data, and not actual workload data, the resulting models are application independent. We apply the models to 45 benchmarks (including multithreaded applications) on five CMPs containing dual- or quad-core chips totaling four or eight cores. In spite of this generality, estimation errors are consistently low across the five systems: we observe overall median errors per machine between 1.2% and 4.4%.

We then show the effectiveness of our power model for live power management on an Intel Core™ system with and without DVFS. We suspend and resume processes based on per-core power usage, ensuring that a given power envelope is not breached. We also scale core frequencies to remain under the power envelope.

As numbers of cores and power demands continue to grow, efficient computing requires better methods for managing individual resources. The per-core power estimation methodology presented here extends previously published models in both breadth and depth, and represents a promising tool for helping meet those challenges, both by providing useful information to resource managers and by highlighting opportunities for improving hardware support for energy-aware resource management. Such support is essential for fine-grained, power-aware resource management.

4 Energy Characterization for Multicore Processors Using Dynamic and Static Power Models

The power modeling methodology presented in Chapter 3 estimates power consumption at the granularity of individual cores. That technique is fairly accurate, portable across multiple platforms, and effective for implementing power budgeting. A major limitation, however, is that the model must be trained separately for each voltage-frequency operating point, which involves considerable initial investment when the number of operating points is in the double digits, as in modern Intel processors. Moreover, the model does not distinguish between the core and uncore components of the processor: uncore power consumption is distributed equally among the per-core power estimates. For the purpose of energy characterization of the microarchitecture, a power model that provides estimates at a finer granularity is more useful.

Mair et al. [113] emphasize the importance of distinguishing between static and dynamic power because of the fundamental differences in their impact and their parameters. Both depend on supply voltage, but static power depends on temperature while dynamic power depends on frequency. These power components scale differently from one voltage-frequency step to another, and accurate information on the breakdown between them is critical for understanding energy expenditure trends when applying DVFS.

Another reason to model static and dynamic power separately is to characterize system energy efficiency. Static power consumption is due to leakage current in the transistors, and hence it does not contribute towards performing useful work. Static energy thus represents "wasted energy" in the system. The ratio of static to total system energy represents the energy inefficiency of the system and can be useful in making qualitative architectural comparisons.

We present a systematic methodology for deriving models that calculate static and dynamic power consumption for the uncore and the cores. We show how to isolate and quantify the power consumption of different processor components. We validate our power models at four different voltage-frequency steps for single-threaded and parallel benchmarks, and show that our models estimate power with good accuracy (mean absolute error of 3.14%) across all benchmarks. We use our models to characterize the energy efficiency of scaling clock frequency on the Haswell processor. We find that uncore static energy represents a significant portion of total system energy (up to 60%), making it a major source of energy inefficiency. We also show that running an application at lower frequencies does not always expend less energy: the degree to which an application is memory-bound must be considered when choosing energy-efficient DVFS policies. We use the results of the frequency-scaling energy characterization to implement a low-overhead online DVFS manager that can switch the system frequency at runtime to achieve energy efficiency for the currently running application. We show the efficacy of our DVFS implementation by comparing the energy expenditure and EDP values from our DVFS runs with those from runs at the best static frequency (chosen after exhaustive search).

Finally, we use our power models to characterize the energy efficiency of thread scaling on the Haswell processor. We identify four orthogonal parameters that affect the performance and energy efficiency of thread scaling, namely serial fraction, core-to-core communication overhead, bandwidth contention, and locking mechanism overhead. We leverage work done previously for characterizing TM systems by Hong et al. [114] to create a set of microbenchmarks that vary these four parameters orthogonally, and we analyze their effects on energy expenditure while varying the number of parallel threads. We make several observations and show that the level of concurrency at which the lowest energy expenditure is achieved varies with the values of these parameters.

4.1 Power Modeling Methodology

We run experiments on an Intel Core™ i7-4770 (Haswell) processor that has four physical cores with a maximum clock frequency of 3.4 GHz. Each core has two eight-way 32 KB private L1 caches (separate I and D) and a 256 KB private L2 cache (combined I and D); the cores share an 8 MB L3 cache, and the system has 16 GB of physical memory. Since we want to focus on the CPU portion of the chip, our experiments do not use the on-chip GPU and eDRAM (which are power-gated and cannot affect results). Power models for these components would be interesting and useful, but deriving them is beyond the scope of the work presented here. We disable hyper-threading and Turbo Boost for all experiments. In the rest of our discussion, we use chip and CPU interchangeably.

We measure power consumption at the ATX CPU power rails using infrastructure that we built for previous work [115]. This infrastructure uses current transducers to convert measured current to voltage, which is logged using a data acquisition unit. We collect performance counter values using the pfmon tool provided by the libpfm4 library. We use the Linux kernel driver coretemp to read package temperatures via the on-die Digital Temperature Sensor (DTS) [94]. Our models break total power into four components:

1. uncore dynamic power,
2. core dynamic power,
3. uncore static power, and
4. core static power.

We strive to develop models for each component in isolation in order to prevent model estimation error of any component from affecting the estimation accuracy of other components.

4.1.1 Uncore Dynamic Power Model

We break the uncore dynamic power into uncore idle dynamic power and uncore active dynamic power. Our experiments indicate that the uncore is neither clock-gated nor power-gated when idle. Eq. 4.1 gives the formula for idle uncore dynamic power consumption.

P(uncore)_idle_dynamic = α × C × V^2 × F        (4.1)

where
  α = idle uncore activity factor
  C = uncore capacitance
  V = uncore supply voltage
  F = uncore frequency

When idle, the uncore effective capacitance (αC in Eq. 4.1) remains virtually constant across frequency changes (we disregard the insignificant uncore activity caused by OS housekeeping threads). Recall that uncore static power depends on uncore voltage and package temperature. We fix the uncore voltage from the BIOS and measure idle chip power at the same package temperature at different uncore frequencies. We can thus safely assume that static power remains constant across frequency changes. Any differences in measured power must therefore be due to changes in uncore dynamic power.

We measure idle CPU power consumption at frequencies F1 and F2 while making sure that the package temperature is the same across the two readings. The idle CPU power at uncore frequency Fn can then be calculated as Pn = α × C × V^2 × Fn + P_static. We then calculate αC for the idle uncore using Eq. 4.2.

α × C = (P2 − P1) / (V^2 × F2 − V^2 × F1)        (4.2)

Eq. 4.3 shows the model we generate for uncore idle dynamic power. We verify the value of αC by repeating our measurements for different values of F1 and F2.

P(uncore)_idle_dynamic = 1.41 × 10^-9 × V^2 × F        (4.3)

where
  V = uncore supply voltage
  F = uncore frequency
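As a concrete illustration of Eq. 4.1 and Eq. 4.2, αC can be computed from two idle power readings taken at the same voltage and package temperature. The sketch below is illustrative only; the function names are ours, and the example readings are invented values chosen to be roughly consistent with the fitted constant in Eq. 4.3, not measured data.

def alpha_c_from_idle_power(p1_watts, p2_watts, volts, f1_hz, f2_hz):
    # Eq. 4.2: alpha*C from two idle-power readings at frequencies F1 and F2.
    return (p2_watts - p1_watts) / (volts**2 * (f2_hz - f1_hz))

def uncore_idle_dynamic_power(alpha_c, volts, f_hz):
    # Eq. 4.1: P(uncore)_idle_dynamic = alpha*C * V^2 * F.
    return alpha_c * volts**2 * f_hz

# Illustrative readings at 0.7 V: 1.20 W at 800 MHz, 1.75 W at 1600 MHz.
ac = alpha_c_from_idle_power(1.20, 1.75, 0.7, 800e6, 1600e6)  # about 1.4e-9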

We next analyze the effects of L2 cache misses on uncore dynamic power consumption. An L2 miss causes activity on the ring interconnect and in the L3 last-level cache. To measure the effects of L2 misses on CPU power consumption, we create a microbenchmark that interleaves memory accesses with enough integer and floating point operations to hide L3 latency. We measure CPU power consumption as we gradually increase the benchmark's working-set size such that core activity (micro-operations executed per cycle) and L2 activity (L2 requests per cycle) remain constant but the ratio of L2 misses to L2 requests increases. We attribute differences in consumed power to increases in uncore activity from the increased L2 misses. We run regression analysis on the L2 miss rate and the increase in power consumption to generate a quadratic model for the L2-miss contribution to uncore dynamic power. Eq. 4.4 shows this model, and Figure 4.1 shows its accuracy with respect to our measured values. The R^2 coefficient measuring the goodness of fit for the estimation model is 0.999.

P(uncore)_L2_miss_dynamic = (3.043 × 10^-9 × x^2 + 2.881 × 10^-9 × x) × V^2 × F        (4.4)

where
  x = chip-wide L2 misses per uncore cycle
  V = uncore supply voltage
  F = uncore frequency


Figure 4.1: Model Fitness for Power Consumed due to L2 Misses at F=800 MHz and V=0.7V

We perform a similar experiment to measure the effects of L3 misses on uncore power. For realistic L3 miss rates (≤ 50 MPKI), CPU power consumption increases by less than 0.5 W even though DRAM power consumption increases significantly. Our models thus omit L3 misses, and we compute total uncore dynamic power consumption as the sum of uncore idle dynamic power and uncore power due to L2 misses.
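Combining Eq. 4.3 and Eq. 4.4, total uncore dynamic power can be evaluated as below. The constants are the fitted values reported above (the quadratic term follows the reconstruction of Eq. 4.4); the function and argument names are ours.

def uncore_dynamic_power(v_uncore, f_uncore_hz, l2_misses_per_uncore_cycle):
    # Idle component (Eq. 4.3) plus the L2-miss contribution (Eq. 4.4).
    idle = 1.41e-9 * v_uncore**2 * f_uncore_hz
    x = l2_misses_per_uncore_cycle
    l2_miss = (3.043e-9 * x**2 + 2.881e-9 * x) * v_uncore**2 * f_uncore_hz
    return idle + l2_miss

# Example at the Figure 4.1 operating point (V = 0.7 V, F = 800 MHz,
# 0.3 misses/cycle): roughly 0.55 W idle plus 0.45 W from L2 misses.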

4.1.2 Core Dynamic Power Model

To model core dynamic power consumption, we study the effects of microarchitectural events like micro-operations (uops) executed, L1 accesses, and L2 accesses. We start by quantifying the effects of non-memory operations on core dynamic power. We would prefer to analyze integer and floating point operations separately, but the Haswell microarchitecture provides no performance counter for floating point operations. We thus average over both instruction types. We create microbenchmarks that loop over the following instructions, alone and in combination:

• x87 floating point multiplications,
• x87 floating point additions,
• integer multiplications,
• integer additions,
• SIMD floating-point multiplications, and
• SIMD floating-point additions.

For each microbenchmark, we calculate αC using the same technique as for calculating idle uncore power: we fix the uncore voltage, uncore frequency, and core voltage from the BIOS. We then initiate microbenchmark execution on all cores, measure power consumption, switch the cores to a higher frequency, and measure again within 10 ms. Since the uncore voltage, uncore frequency, and package temperature are stable across the readings (assuming that the temperature change is insignificant within the very small sampling period), the difference in measured power must be due to changes in core dynamic power consumption. αC can now be calculated using Eq. 4.2. We calculate αC for multiple frequency pairs and take the average to reduce estimation error. We follow this approach to calculate αC for all microbenchmarks. We use linear regression to find the best fit for estimating the impact of non-memory instructions on core dynamic power. Eq. 4.5 shows the derived model, and Figure 4.2 shows its fitness with respect to our measured values. The R^2 fitness coefficient is 0.801.

P(core)_non-mem_dynamic = (2.448 × 10^-10 × x + 1.181 × 10^-9) × V^2 × F        (4.5)

where
  x = non-memory instructions per cycle
  V = core supply voltage
  F = core frequency


Figure 4.2: Model Fitness for Power Consumed due to Non-Memory Execution Units at F=800 MHz and V=0.7V


Figure 4.3: Model Fitness for Power consumed due to L1 Accesses at F=800 MHz and V=0.7V

We next study the effects of memory operations on core dynamic power. We maintain constant activity on the non-memory execution ports (ports 0, 1, 5, and 6) and gradually increase it on the load/store units (ports 2, 3, 4, and 7). We correlate increases in memory operations with growth in core dynamic power to generate the model in Eq. 4.6. Figure 4.3 shows the model fitness. The R^2 coefficient is 0.999.

P(core)_mem_dynamic = (2.780 × 10^-10 × x + 1.497 × 10^-10) × V^2 × F        (4.6)

where
  x = L1 accesses per cycle
  V = core supply voltage
  F = core frequency

To study the effects of L2 accesses on core dynamic power consumption, we follow the same approach as for studying the effects of L3 accesses. We again use our microbenchmark to increase the number of L2 accesses while keeping core activity constant. We measure the increase in CPU power consumption as we increase the L2 access rate. We then run linear regression on the L2 access rate and the increase in power consumption to generate the model for the L2-access contribution to core dynamic power consumption shown in Eq. 4.7. Figure 4.4 shows the model fitness with respect to our measured values. The R^2 coefficient that measures the goodness of fit is 0.997.

P(core)_L2_dynamic = 2.829 × 10^-9 × x × V^2 × F        (4.7)

where
  x = per-core L2 accesses per uncore cycle
  V = uncore supply voltage
  F = uncore frequency


Figure 4.4: Model Fitness for Power Consumed due to L2 Accesses at F=800 MHz and V=0.7V

The total core dynamic power consumption can now be expressed by Eq. 4.8.

P(core)_dynamic = P(core)_mem_dynamic + P(core)_non-mem_dynamic + P(core)_L2_dynamic        (4.8)
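For completeness, the per-core dynamic power of Eq. 4.8 can be evaluated by summing the three fitted components (Eqs. 4.5 to 4.7). The sketch below uses the constants reported above; the function and argument names are ours.

def core_dynamic_power(v_core, f_core_hz, nonmem_ops_per_cycle,
                       l1_accesses_per_cycle,
                       v_uncore, f_uncore_hz, l2_accesses_per_uncore_cycle):
    # Eq. 4.5: non-memory execution units.
    non_mem = (2.448e-10 * nonmem_ops_per_cycle + 1.181e-9) * v_core**2 * f_core_hz
    # Eq. 4.6: load/store (L1) activity.
    mem = (2.780e-10 * l1_accesses_per_cycle + 1.497e-10) * v_core**2 * f_core_hz
    # Eq. 4.7: L2 accesses (scaled by uncore voltage and frequency, as in the model).
    l2 = 2.829e-9 * l2_accesses_per_uncore_cycle * v_uncore**2 * f_uncore_hz
    return non_mem + mem + l2   # Eq. 4.8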

4.1.3 Uncore Static Power Model

The Haswell microarchitecture uses a deep-sleep C-state, C7, when a core is idle. In the C7 state, the cores are power-gated, and so they consume negligible power. In contrast, our experiments show that the Haswell uncore is neither power-gated nor clock-gated when the chip is idle. The idle power consumption of the chip is thus almost entirely due to the uncore. Chip idle power can be expressed by Eq. 4.9.

P(CPU)_idle = P(uncore)_idle_dynamic + P(uncore)_static        (4.9)

We use the model for P(uncore)_idle_dynamic derived in Section 4.1.1 to isolate P(uncore)_static. Recall that static power consumption depends on supply voltage and temperature. We use the CPU package temperature to approximate the average temperature across the uncore. To measure the effects of uncore voltage and temperature on uncore static power, we set the uncore voltage to values ranging from 0.7V to 1.0V in increments of 0.05V. At each voltage, we vary the CPU temperature using a hot-air gun, and we measure power consumption at a granularity of one sample per second. We subtract uncore idle dynamic power from the measured values and run non-linear regression on this difference, the voltage, and the temperature to create the uncore static power model in Eq. 4.10. This equation is based upon the Poole-Frenkel effect [116], which provides a generic equation for current density through an electrical insulator. We use this equation simply as a base for curve-fitting and do not claim to estimate technology constants from the values acquired through non-linear regression. Figure 4.5 shows measured versus estimated uncore static values at three voltage points. We ran each experiment until the temperature stopped dropping (the figure shows time windows of 200 seconds for ease of comparison; experiments that ran longer continued to show high estimation accuracy during the time not shown). The R^2 coefficient for the estimation model is 0.999 across all sample points.

P(uncore)_static = A × e^(−(−169.083 + 1202.02 × √V) / (273.15 + T)) × V^2        (4.10)

where
  V = uncore supply voltage
  T = package temperature

4.1.4 Core Static Power Model

To generate a model for core static power, we force the cores to remain in state C0. In this state, the Linux kernel runs a polling idle loop that consumes constant dynamic power. We calculate this power with the same strategy we used to calculate uncore idle dynamic power.

Figure 4.5: Uncore Static Model Fitness: (a) V=0.70, (b) V=0.85, (c) V=1.0 (measured and estimated power, with package temperature, over time)

We calculate αC for idle cores in state C0 using Eq. 4.2. Eq. 4.11 shows the model for per-core dynamic power in state C0. Eq. 4.12 uses Eq. 4.11 to isolate per-core static power consumption.

P(core)_C0_dynamic = 1.28 × 10^-9 × V^2 × F        (4.11)

where
  V = core supply voltage
  F = core frequency

P(core)_static = (1/4) × (P(CPU)_C0 − P(uncore)_idle_dynamic − P(uncore)_static − 4 × P(core)_C0_dynamic)        (4.12)

To create the core static power model, we first measure the effects of core voltage and temperature. We use package temperature as an approximation of core temperature. We fix the cache voltage at 0.7V and the frequency at 800 MHz from the BIOS. We then incrementally increase the core voltage from 0.7V to 1.05V in steps of 0.05V.

Figure 4.6: Core Static Model Fitness: (a) V=0.70, (b) V=0.85, (c) V=1.0 (measured and estimated power, with package temperature, over time)

At each voltage, we vary the package temperature using a hot-air gun, and we measure the changes in power consumption and temperature. We run non-linear regression on the collected data to create the core static model in Eq. 4.13. Figure 4.6 shows measured versus estimated core static values at three voltage points. The estimated model fits measured values with an R^2 coefficient of 0.996.

P(core)_static = 1525.07 × e^(−(1884.1 + 525.556 × √V) / (273.15 + T)) × V^2        (4.13)

where
  V = core supply voltage
  T = package temperature
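Both static components share the same curve-fit form. The sketch below evaluates Eq. 4.10 and Eq. 4.13; the uncore pre-factor A is left as a parameter because its fitted value is not repeated in the text above. The function names are ours.

import math

def uncore_static_power(a_fit, v_uncore, temp_c):
    # Eq. 4.10 (Poole-Frenkel-style fit); a_fit is the fitted pre-factor A.
    exponent = -(-169.083 + 1202.02 * math.sqrt(v_uncore)) / (273.15 + temp_c)
    return a_fit * math.exp(exponent) * v_uncore**2

def core_static_power(v_core, temp_c):
    # Eq. 4.13: per-core static power from core voltage and package temperature.
    exponent = -(1884.1 + 525.556 * math.sqrt(v_core)) / (273.15 + temp_c)
    return 1525.07 * math.exp(exponent) * v_core**2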

4.1.5 Total Chip Power Model

Based on our findings, we must consider two more components in constructing a full-chip power model.

First, our measurements show that when the core frequency is increased, chip power consumption increases more than expected at certain frequency steps. These increases appear to depend on the number of active cores but not on core activity. We believe this phenomenon is due to the core PLL switching to higher supply voltages above certain frequencies. An example of this unexpected increase in power is shown in Figure 4.7. We take the following steps to conduct this experiment:

1. Fix the core and uncore voltage to 1.0V (from the BIOS).
2. Fix the uncore frequency to 800 MHz (from the BIOS).
3. Fix the C-state for all cores to C0 (active).
4. Set the power measurement equipment to take samples every millisecond.
5. Set the core frequency to 800 MHz.
6. Keep the cores idle for 50 ms.
7. Switch to the next higher frequency for 10 ms.
8. Repeat steps 5 to 7 for all available frequencies.

Figure 4.7: Abrupt Jump in Power Consumption at Higher Frequency

When the core frequency is switched, the temperature is not expected to change significantly because of the small time duration (10 ms) during which the higher frequency is enabled. Moreover, the uncore voltage and frequency were fixed for the duration of the experiment. Hence, the change in power consumption observed in the experiment is almost entirely due to the change in core dynamic power consumption. Since the core voltage was also fixed from the BIOS, we expect to see an almost linear rise in power consumption as the frequency is increased. But as seen in Figure 4.7, this holds only up to a certain frequency point, at which there is an abrupt increase in the power value, after which the power consumption again rises linearly as expected.

A similarly unexpected increase in power consumption, by up to 0.5 W per active core, is observed at higher temperatures, but our experiments and research have been unable to determine the source of this phenomenon. An example of this observation is shown in Figure 4.8. These data were collected using the following steps:

1. Fix the uncore voltage at 1.0V and the core voltage at 0.85V (from the BIOS).
2. Fix the core and uncore frequency to 800 MHz.
3. Fix the C-state for all cores to C0 (active).
4. Set the power measurement equipment to take samples every second.
5. Use a hot-air gun to increase the chip temperature up to 80°C and then let it cool down.

As seen in the figure, the power consumption of the chip increases gradually in correlation with the temperature increase up to around 74°C, at which point the power consumption abruptly increases by 2 W. When the temperature is decreasing, the jump occurs at around 68°C. These temperature values change as the voltage and frequency are varied, but the change in power consumption of 0.5 W per active core is consistent.


Figure 4.8: Abrupt Jump in Power Consumption at Higher Temperature

These two power-consumption components are difficult to model without more information about their causes. We therefore construct an offline table of empirically determined increases in power consumption due to these factors, acknowledging that a deeper understanding is required to accurately predict their behavior. We represent these two components collectively with P(misc). Our experiments indicate that P(misc) can amount to as much as 10% of total chip power consumption, depending on the level of core and uncore activity.

Suite           Benchmarks
NAS             SP, EP, BT, MG, DC, UA, CG
SPEC OMP2001    quake, swim, wupwise, ammp, apsi, applu, mgrid
SPEC CPU2006    cactusADM, calculix, dealII, gamess, GemsFDTD, gromacs, lbm, leslie3d, milc, namd, povray, soplex, zeusmp, astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk

Table 4.1: Benchmarks Used for Validation

Total chip power consumption is then given as:

P(CPU) = P(uncore)_dynamic + P(uncore)_static + P(core)_dynamic + P(core)_static + P(misc)        (4.14)
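Eq. 4.14 then sums the component models. The sketch below assumes the component functions sketched earlier in this chapter; the P(misc) lookup is shown only as a structure, with a placeholder entry rather than our measured table.

def total_chip_power(core_dyn_w, core_stat_w, uncore_dyn_w, uncore_stat_w, p_misc_w):
    # Eq. 4.14: dynamic and static power of the cores and the uncore, plus P(misc).
    return uncore_dyn_w + uncore_stat_w + core_dyn_w + core_stat_w + p_misc_w

# Offline P(misc) table keyed by (core frequency in MHz, number of active cores).
# The entry below is a placeholder, not a measured value.
P_MISC_TABLE = {(3400, 4): 2.0}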

4.2 Validation

We validate our power models on the single-threaded SPEC CPU2006 [96] benchmarks, the NAS Parallel Benchmarks [99], and the multithreaded SPEC OMP2001 [97] applications in Table 4.1. We omit bwaves from SPEC CPU2006 and art and fma3d from SPEC OMP2001 because they do not run on our system. We use the NAS class B inputs, and we omit IS because it does not run for long enough to gather reliable measurements. We run the parallel applications with one, two, and four threads. Similarly, we run one, two, and four concurrent instances of the sequential applications. Once per second we read the package temperature and core voltage and collect performance counter values as described in Section 4.1. We measure processor power every 1 ms and average the values over one second to synchronize the power values with the other parameters. To verify that our model estimates power accurately across different voltage-frequency steps, we validate it at four DVFS points: 800 MHz at 0.7V, 1500 MHz at 0.78V, 2400 MHz at 0.88V, and 3400 MHz at 1.01V.

Figure 4.9 shows estimation error for our benchmark suites. Mean absolute error (MAE) for all NAS benchmarks across the four DVFS points is 3.19% for a single thread, 1.89% for two threads, and 2.50% for four threads. MAE for SPEC CPU2006 is 3.88% for single-instance runs, 2.74% for double-instance runs, and 2.60% for quad-instance runs. MAE for SPEC OMP2001 is 3.79% for a single thread, 2.87% for two threads, and 3.42% for four threads. Mean absolute error across all sample points for all benchmarks and voltage-frequency states is 3.14%, and the standard deviation is 2.87%. Our model estimates power with good accuracy for most benchmarks.

Figure 4.9: Validation of Total Chip Power: (a) NAS Parallel Benchmarks, (b) SPEC CPU2006, (c) SPEC OMP2001 (% mean error and standard deviation at each DVFS point for one, two, and four threads)

For the benchmarks with rapid phase changes (DC in NAS and ammp in SPEC OMP2001), we had to re-run the validation experiments at a higher measurement granularity of 100 ms to accurately capture their rapidly changing power consumption.

We know that model error arises from at least two sources. First, the Haswell microarchitecture provides no per-core event to track executed floating point operations, which prevents us from creating separate models for floating point and integer instruction executions. This creates slight model inaccuracies for compute-intensive phases. Second, values in the offline table we create to account for P(misc) are not always accurate, especially at higher frequencies during periods of rapid phase change. Despite these known sources of error, our model accurately predicts workload energy trends when scaling both frequency and the number of threads.

4.3 Energy Characterization of Frequency Scaling

Dynamic voltage-frequency scaling permits hardware or software to adjust clock speed (and voltage) in an attempt to trade off performance for energy. However, finding the frequencies at which an application meets performance goals while maintaining a given energy budget is not necessarily straightforward. We use our power models to characterize energy efficiency by showing how static and dynamic energy from the core and uncore components scale in conjunction with DVFS.

We perform sensitivity analyses of energy consumption at different voltage-frequency points. Figure 4.10 depicts the mechanism we use for these sensitivity analyses. We simulate a hypothetical workload that is parameterized such that workload characteristics like the memory-bound fraction can be varied individually without affecting the total work done. We define the memory-bound fraction as the fraction of the application whose execution time does not scale with CPU frequency. The workload characteristics are fed to the performance model along with the operating frequency and the number of parallel threads.

Figure 4.10: Energy and EDP generation (block diagram: workload characteristics, frequency, and thread count feed the power and performance models; power multiplied by execution time gives energy, and energy multiplied by execution time gives EDP)

The performance model used for frequency scaling experiments calculates execution time based on Equation 4.15.

$T_{F_2} = T_{F_1} \times \big( M + (1 - M) \times \tfrac{F_1}{F_2} \big)$    (4.15)

where $T_{F_x}$ is the execution time at frequency $F_x$ and $M$ is the memory-bound fraction measured at $F_1$.

The power model block in Figure 4.10 uses the workload instruction count (both memory and non-memory), working-set size, frequency, number of parallel threads, and execution time to generate the event rates (events/cycle) required to calculate core and uncore dynamic power. For calculating static power, we use the average temperature empirically observed across all benchmarks at each combination of DVFS operating point and number of active cores. The power value generated by the power model is multiplied by the execution time generated by the performance model to calculate energy, which in turn is multiplied by the execution time to generate the energy-delay product (EDP).
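For illustration, a minimal sketch of this energy/EDP generation flow is given below; the power_model() helper and its coefficients are illustrative placeholders rather than the fitted models from this chapter.

#include <stdio.h>

/* Equation 4.15: predicted execution time at frequency f2, given the time
 * and memory-bound fraction m measured at frequency f1. */
double predict_time(double t_f1, double m, double f1, double f2)
{
    return t_f1 * (m + (1.0 - m) * (f1 / f2));
}

/* Stand-in for the core+uncore power model; the coefficients below are
 * placeholders, not fitted values. */
double power_model(double f_ghz, int threads, double mem_bound)
{
    return 10.0 + 5.0 * threads * f_ghz * (1.0 - mem_bound);
}

int main(void)
{
    double t_base = 10.0;          /* seconds at the 800 MHz base point */
    double m = 0.3;                /* memory-bound fraction at 800 MHz  */
    int threads = 4;

    for (double f = 0.8; f <= 3.4; f += 0.2) {
        double t = predict_time(t_base, m, 0.8, f);
        double p = power_model(f, threads, m);
        double energy = p * t;     /* joules = watts x seconds */
        double edp = energy * t;   /* joule-seconds            */
        printf("%.1f GHz: time %.2f s, energy %.1f J, EDP %.1f J*s\n",
               f, t, energy, edp);
    }
    return 0;
}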

4.3.1 Energy Effects of DVFS

We first examine how voltage-frequency scaling affects energy consumption. Conventional wisdom says that running at higher frequencies boosts performance at the expense of expending more energy [109] because dynamic power scales quadratically with voltage (Eq. 4.1). But when we take into account the energy consumed by the uncore, we find that running at a lower voltage-frequency step can sometimes expend more energy. Figure 4.11 shows how energy scales for our synthetic workload as we vary its memory-bound fraction and the frequency at which we execute. Energy numbers are normalized to the energy expenditure at the lowest frequency (800 MHz). We make the following observations from these results.

Observation 1: The frequency at which the lowest energy is expended depends on the memory-bound fraction of the application.

Explanation: At the lowest frequency for a single-threaded run, the uncore energy accounts for 74% of the total. Specifically, uncore static energy constitutes 61% of the total energy expenditure. Increasing the CPU frequency reduces execution time, and this results in a reduction of uncore static energy. But increasing frequency increases core and uncore dynamic energy because of the quadratic relationship between dynamic power and voltage. The extent to which performance scales with frequency depends on the memory-bound fraction of the application as per Equation 4.15. If the performance gain achieved by increasing frequency is large enough, the reduction in uncore static energy offsets the gain in dynamic energy, resulting in an overall reduction in total energy. This is what we see in Figure 4.11(a) when the frequency is increased from 800 MHz to 1500 MHz while the memory-bound fraction is less than 40%. After a certain frequency, increases in (core and uncore) dynamic energy dwarf the reduction in uncore static energy, causing total energy expenditure to rise again. The more memory-bound an application, the less performance it gains from running at higher frequencies, and so reductions in uncore static energy become less significant with increasing frequency. When the memory-bound portion of a single-threaded application exceeds 40%, we see the expected trend: increasing the frequency increases energy expenditure.

Observation 2: There is a significant difference between the energy trends for single-threaded and four-threaded runs. For example, at 0% memory-boundedness, the least energy is expended at 2.4 GHz for single-threaded runs and 1.5 GHz for four-threaded runs.

Explanation: When more cores are active on the processor instead of one, the ratio of core power to total chip power at any given instant is higher than when only one core is active. As a result, the uncore static energy is less significant for multi-threaded, CPU-bound applications (we measure 49% at two threads and 35% at four threads for 0% memory-bound). Consequently, when the frequency is increased for multi-threaded runs, the increase in core dynamic energy offsets the decrease in uncore static energy at lower memory-bound fractions compared to single-threaded runs. The energy scaling correlates with frequency scaling when a two-thread application is 30% memory-bound and a four-thread application is 20% memory-bound (compared to 40% for a single-threaded application).

Figure 4.11: Effects of DVFS on Total Energy and Performance. Panels: (a) 1 Thread, (b) 4 Threads. Stacked bars show uncore dynamic, uncore static, core dynamic, and core static energy (with execution time) at 800 MHz, 1.5 GHz, 2.4 GHz, and 3.4 GHz for memory boundedness from 0% to 100%, normalized to 800 MHz.

This analysis shows that the extent to which an application is memory bound must be taken into account when choosing the frequency at which to run the application such that it expends the least energy. We repeated our frequency scaling experiments while varying workload characteristics like instruction mix and working-set size and came to the conclusion that the relationship between frequency and energy efficiency for any workload is only affected by the memory-bound fraction of the workload and the number of parallel threads.

4.3.2 DVFS Prediction

The energy characterization of frequency scaling discussed in Section 4.3.1 underscores the importance of running either entire applications or their individual phases at appropriate frequencies to reduce the system's energy expenditure. In this section, we leverage that energy characterization to propose an online DVFS manager for energy-efficient frequency scaling. Prior works that leverage power modeling to implement an online DVFS manager [36, 109] propose the following approach:

1. Define a fixed scheduling quantum (for example, 10 ms).
2. Fix the system frequency for the current quantum.
3. Track the performance events required to calculate power consumption at the currently chosen frequency.
4. At the end of the scheduling quantum, feed the collected statistics to the chosen energy model to predict energy at all the DVFS operating points of the system, assuming that the application maintains similar behavior during the next quantum.
5. Set the system frequency to the value that is predicted to expend the least energy/EDP in the above step.
6. Repeat steps 3 to 5 at the end of every scheduling quantum.

This approach is shown to work well, but it requires power-model calculations for each available system frequency at every scheduling quantum, incurring energy overheads of its own. For example, our Haswell processor has sixteen available frequency points, so making a frequency prediction for DVFS management requires sixteen energy calculations at every scheduling interval. This constrains how small the scheduling interval can be without incurring significant power and/or performance overhead from the power calculations. In this section we show a much simpler approach to selecting the energy-efficient frequency, one that does not require such extensive calculations and is equally effective.

Figure 4.11 shows that the DVFS operating point at which a particular application expends the lowest energy is a function of the application's memory boundedness and the number of active cores. The memory boundedness of an application is a function of the number of LLC misses, the LLC miss latency in terms of core cycles, and its effect on the application's execution time due to stalls in the processor pipeline caused by data dependencies on LLC misses. The memory-bound fraction of the application can be formulated as the ratio of cycles during which execution was stalled due to LLC misses to the total execution time in cycles. In an out-of-order processor, it is difficult to isolate execution stalls caused specifically by LLC misses from other overlapping stalls from events like branch mispredictions, the load/store queue filling up, or miss penalties from data accesses to lower-level caches. Due to this limitation, we must approximate the number of pipeline stalls caused by LLC misses. The Haswell microarchitecture supports a performance event named CYCLE_ACTIVITY:STALLS_L2_PENDING, which counts the number of core cycles during which execution is stalled while an L2 miss is waiting to be served. It should be noted that this counter may not provide an exact count of execution stall cycles due to L2 misses, since some of the stall cycles during a pending L2 miss may be due to branch mispredictions. But given the high accuracy of modern branch predictors, we consider this event to be a good approximation of L2 miss stall cycles.
From this event, we can estimate the L3 miss stall cycles using Equation 4.16. The L3 miss ratio in this equation is the ratio of memory loads that miss in L3 to total memory load operations retired. We consider only loads in this equation because store misses are not in the critical path of execution in modern out-of-order processors.

$\text{L3 Miss Stall Cycles} = \text{CYCLE\_ACTIVITY:STALLS\_L2\_PENDING} \times \text{L3 Miss Ratio}$    (4.16)

The memory-bound fraction can then be calculated using Equation 4.17.

$\text{Mem Bound Fraction} = \dfrac{\text{L3 Miss Stall Cycles}}{\text{Total Cycles}}$    (4.17)

$\text{Cycles}_{F_2} = \text{Cycles}_{F_1} \times \big( (1 - M) + M \times \tfrac{F_2}{F_1} \big)$    (4.18)

where $\text{Cycles}_{F_x}$ is the execution time in cycles at frequency $F_x$ and $M$ is the memory-bound fraction measured at $F_1$.
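A minimal sketch of Equations 4.16-4.18 in C is shown below; the struct and function names are illustrative, and the collection of the raw counter values (e.g., via libpfm/perf_event) is omitted.

typedef struct {
    unsigned long long core_cycles;        /* UNHALTED_CORE_CYCLES             */
    unsigned long long l3_hits;            /* MEM_LOAD_UOPS_RETIRED:L3_HIT     */
    unsigned long long l3_misses;          /* MEM_LOAD_UOPS_RETIRED:L3_MISS    */
    unsigned long long l2_pending_stalls;  /* CYCLE_ACTIVITY:STALLS_L2_PENDING */
} perf_sample_t;

/* Equations 4.16 and 4.17: memory-bound fraction from raw counter values. */
double mem_bound_fraction(const perf_sample_t *s)
{
    double l3_miss_ratio = (double)s->l3_misses /
                           (double)(s->l3_misses + s->l3_hits);
    double l3_stall_cycles = (double)s->l2_pending_stalls * l3_miss_ratio;
    return l3_stall_cycles / (double)s->core_cycles;
}

/* Equation 4.18: predicted cycle count at frequency f2, given the cycle
 * count and memory-bound fraction m measured at frequency f1. */
double predict_cycles(double cycles_f1, double m, double f1, double f2)
{
    return cycles_f1 * ((1.0 - m) + m * (f2 / f1));
}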

Overall, we must track the four performance events listed in Table 4.2 to calculate the memory-bound fraction of an application. To validate the accuracy of Equation 4.17, we run the benchmarks in Table 4.1 (at one, two, and four threads) at one frequency and use the calculated memory-bound fraction to predict execution time at other frequencies using Equation 4.18.

Table 4.2: Performance events required to calculate the memory-bound fraction

Performance Event                   Description
UNHALTED_CORE_CYCLES                Core cycles
MEM_LOAD_UOPS_RETIRED:L3_HIT        Memory loads that hit in L3
MEM_LOAD_UOPS_RETIRED:L3_MISS       Memory loads that miss in L3
CYCLE_ACTIVITY:STALLS_L2_PENDING    Pipeline stall cycles while an L2 miss is pending

Table 4.3: Parameter Values Used for Generating Energy Numbers

Parameter         Values
Frequency (GHz)   0.8, 1.0, 1.1, 1.3, 1.5, 1.7, 1.8, 2.0, 2.2, 2.4, 2.5, 2.7, 2.9, 3.1, 3.2, 3.4
Threads           1, 2, 4
Mem-bound %       0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100

When we predict performance from 3.4 GHz to 800 MHz, the mean absolute error across benchmarks is 3.35%, with a standard deviation of 3.81%. When performance is predicted from 800 MHz to 3.4 GHz, the mean absolute error is 6.58%, with a standard deviation of 6.37%. The prediction error is higher when predicting from 800 MHz because some LLC-induced pipeline stalls that occur at higher frequencies may disappear at lower frequencies.

Our validation for performance prediction shows that we can predict the memory-bound fraction at any frequency with reasonable accuracy. This, in turn, means that we can predict the frequency at which the current application phase will be most energy-efficient by tracking only the events required to calculate the memory-bound fraction, instead of all the events required to calculate power consumption. We repeat the methodology from Section 4.3.1, where we use the power models formulated for our system to calculate energy for a hypothetical workload while varying the core frequency, the number of parallel threads, and the workload's memory-bound fraction. Table 4.3 lists the values of these three parameters that we use in our experiments. After predicting energy and EDP values at each combination of frequency, threads, and memory-bound fraction, we can determine the frequency at which a workload (or an individual phase) will expend the least energy/EDP. We populate a table with this information. An online DVFS manager can then simply use this table to predict the most energy-efficient voltage-frequency point for the current phase based on the calculated memory-bound percentage, without the need to predict energy at all available system frequencies. This reduces the overhead associated with DVFS management while still retaining accurate energy-efficient frequency predictions.
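A sketch of this offline table generation is shown below; it assumes the hypothetical predict_time() and power_model() helpers from the earlier sketch and simply records, for each thread count and memory-bound bucket of Table 4.3, the frequency with the lowest modeled energy.

#define N_BUCKETS 21                      /* memory-bound %: 0, 5, ..., 100 */
#define N_THREADS 3                       /* thread counts: 1, 2, 4         */

static const double freqs_ghz[] = { 0.8, 1.0, 1.1, 1.3, 1.5, 1.7, 1.8, 2.0,
                                    2.2, 2.4, 2.5, 2.7, 2.9, 3.1, 3.2, 3.4 };
static const int thread_counts[N_THREADS] = { 1, 2, 4 };

extern double predict_time(double t_f1, double m, double f1, double f2); /* Eq. 4.15     */
extern double power_model(double f_ghz, int threads, double m);          /* placeholder  */

void build_best_freq_table(double best_freq_ghz[N_THREADS][N_BUCKETS])
{
    for (int t = 0; t < N_THREADS; t++) {
        for (int b = 0; b < N_BUCKETS; b++) {
            double m = 0.05 * b;                          /* memory-bound fraction */
            double best_e = -1.0, best_f = freqs_ghz[0];
            for (unsigned i = 0; i < sizeof freqs_ghz / sizeof freqs_ghz[0]; i++) {
                double time = predict_time(1.0, m, freqs_ghz[0], freqs_ghz[i]);
                double e = power_model(freqs_ghz[i], thread_counts[t], m) * time;
                if (best_e < 0.0 || e < best_e) { best_e = e; best_f = freqs_ghz[i]; }
            }
            best_freq_ghz[t][b] = best_f;                 /* lowest-energy frequency */
        }
    }
}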

To validate this DVFS prediction methodology, we run the benchmarks from Table 4.1 at all system frequencies to find the frequency at which each benchmark expends the least energy (hereafter termed F_E). We then calculate the average memory-bound fraction of the application from the performance events collected at 3.4 GHz. Based on the calculated memory-bound fraction, we predict the frequency at which the benchmark would expend the least energy (hereafter P(3.4)_E) and compare the prediction with the empirically determined best frequency (F_E). We repeat the same experiment by calculating the memory-bound fraction at 800 MHz and predicting the lowest-energy frequency (P(0.8)_E).

Figure 4.12 compares the predicted and actual lowest-energy frequencies for the NAS benchmarks. Since the memory-bound fraction changes when the frequency changes, and because of the error in calculating memory boundedness, the predictions from 800 MHz and 3.4 GHz are not always the same. However, these two predictions are very close to the actual lowest-energy frequency for almost all benchmarks. Figure 4.13 shows the energy expenditure at the predicted frequencies normalized to the energy expenditure at F_E. The error in frequency prediction results in an average increase in energy expenditure of around 5% across the benchmarks.

Figure 4.12: Actual versus predicted frequency at which the least energy is expended for NAS benchmarks ((a) 1 Thread, (b) 2 Threads, (c) 4 Threads).

Figure 4.13: Energy expenditure at P(3.4)_E and P(0.8)_E normalized to F_E for NAS benchmarks ((a) 1 Thread, (b) 2 Threads, (c) 4 Threads).

Figures 4.14 and 4.15 show the prediction error for the SPEC OMP2001 benchmarks. Figures 4.16 and 4.17 show the prediction error for the single-threaded SPEC CPU2006 benchmarks. For both SPEC OMP2001 and SPEC CPU2006, the frequency predictions for the lowest energy (P(3.4)_E and P(0.8)_E) are very close to the actual best frequency (F_E), while the average energy cost of mispredictions is less than 5% for both benchmark suites.

The experimental results presented above validate our methodology for predicting energy-efficient frequencies without the need to predict energy at those frequencies.

For online DVFS management, we must predict the energy-efficient frequency without prior knowledge of the application's average memory boundedness. We developed a proof-of-concept userspace DVFS scheduler that tracks the performance events listed in Table 4.2 and uses Equation 4.17 to calculate memory boundedness every 10 ms. We then use our look-up table, which gives the lowest-energy frequency prediction for the calculated memory boundedness, and switch to that frequency. We conduct the same experiments for achieving the lowest EDP (this requires a separate look-up table for the lowest-EDP prediction at the parameter values in Table 4.3). We first measure the power overhead of our DVFS manager (by running it when the system is idle) to confirm that invoking the DVFS scheduler every 10 ms incurs less than 0.05 W of power overhead. We then run our DVFS manager with an entirely CPU-bound application and compare the performance overhead with the case where the application is run at a statically determined lowest-energy frequency, finding no discernible performance overhead.

Figure 4.14: Actual versus predicted frequency at which the least energy is expended for SPEC OMP2001 benchmarks ((a) 1 Thread, (b) 2 Threads, (c) 4 Threads).

Figure 4.15: Energy expenditure at P(3.4)_E and P(0.8)_E normalized to F_E for SPEC OMP2001 benchmarks ((a) 1 Thread, (b) 2 Threads, (c) 4 Threads).

Figure 4.16: Actual versus predicted frequency at which the least energy is expended for single-threaded SPEC CPU2006 benchmarks.

Figure 4.17: Energy expenditure at P(3.4)_E and P(0.8)_E normalized to F_E for single-threaded SPEC CPU2006 benchmarks.

We then validate the DVFS manager for actual benchmarks. Figures 4.18, 4.19, and 4.20 show the energy expenditure/EDP while running benchmarks from different suites concurrently with the DVFS manager, with results normalized to the energy expenditure/EDP when the same benchmarks are run at the statically determined lowest-energy/EDP frequency. For most of the benchmarks, energy expenditure under dynamic frequency prediction is within 10% of the value obtained by running at the lowest-energy/EDP frequency. In some instances, the DVFS manager reduces energy/EDP even below the statically determined frequency because of the opportunities provided by phase changes within the workload. It should be noted that since our Haswell processor only provides chip-wide DVFS, we are restricted to changing frequencies for all cores simultaneously, but our methodology can easily be used for per-core DVFS management.
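The core loop of such a userspace manager can be as simple as the sketch below; read_mem_bound_fraction() stands for the Equation 4.17 calculation over the events in Table 4.2, and set_cpu_frequency_ghz() stands for whatever mechanism (e.g., the cpufreq userspace governor) changes the operating point. Both are illustrative placeholders rather than actual library calls.

#include <unistd.h>

#define N_BUCKETS 21   /* memory-bound %: 0, 5, ..., 100 */

extern double read_mem_bound_fraction(void);        /* Equation 4.17 over the last window */
extern void   set_cpu_frequency_ghz(double f_ghz);  /* e.g., via the cpufreq interface    */

/* best_freq_ghz is the slice of the offline table for the active thread count. */
void dvfs_manager_loop(const double best_freq_ghz[N_BUCKETS])
{
    for (;;) {
        usleep(10 * 1000);                            /* 10 ms scheduling quantum */
        double m = read_mem_bound_fraction();
        int bucket = (int)(m * 20.0 + 0.5);           /* nearest 5% bucket        */
        if (bucket < 0) bucket = 0;
        if (bucket >= N_BUCKETS) bucket = N_BUCKETS - 1;
        set_cpu_frequency_ghz(best_freq_ghz[bucket]); /* table look-up, no energy math */
    }
}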

4.4 Energy Characterization of Thread Scaling

Chip multiprocessors can improve application performance by exploiting an application's inherent parallelism. A plethora of research has been done to maximize performance gains by parallelizing applications. But apart from performance gains, it is becoming increasingly important to consider the impact of parallelism on energy expenditure. On a processor with ideal energy efficiency, performing a given amount of work should expend roughly the same amount of energy, regardless of the number of cores employed to do that work. But the sources of energy inefficiency in modern processors — and certain limiting characteristics of parallel applications — result in energy trends that differ from this “ideal” scenario. The following characteristics of parallel applications contribute to energy overhead when the number of parallel threads is increased:

Figure 4.18: Energy Consumed During DVFS versus Best Static Frequency for NAS Benchmarks ((a) 1 Thread, (b) 2 Threads, (c) 4 Threads; normalized energy and EDP).

Figure 4.19: Energy Consumed During DVFS versus Best Static Frequency for SPEC OMP2001 Benchmarks ((a) 1 Thread, (b) 2 Threads, (c) 4 Threads; normalized energy and EDP).

Figure 4.20: Energy Consumed During DVFS versus Best Static Frequency for single-threaded SPEC CPU2006 benchmarks (normalized energy and EDP).

• Serial fraction. Ideally, while executing the serial portion¹ of the application (when only one core is doing useful work), the other cores should consume no power. This can be achieved by power gating idle cores. But since a time penalty is associated with bringing a power-gated core back to an active state, the processor only activates that state if it is sure that the time penalty is worth it. Depending on the length of the serial portions, the processor may instead either let idle cores remain in active states or let them enter intermediate sleep states in which the core (or parts of it) may be clock gated but not power gated. Hence, the idle cores may consume some power, which directly contributes to the energy overhead of running applications in parallel.

• Core-to-core communication overhead. Most parallel threads share data to some extent. When one core requests data from another core, the execution pipeline of the first core may be stalled if its execution window does not have enough independent instructions to hide the remote cache access latency. The power consumed in the core during these execution stalls adds to the energy overhead of parallelism.

• Bandwidth contention. Depending on the application, the working set may grow when the number of parallel threads is increased. This could lead to increased contention for off-core and off-chip resources, resulting in a potential increase in execution stalls, which in turn adds to the energy overhead.

• Locking mechanism overhead. In lock-based applications, as the number of parallel threads grows, the contention among threads trying to acquire the same lock also increases. Depending on the locking mechanism employed, this could result in increased execution cycles and/or total instructions executed, with both scenarios adding to the energy overhead. Another synchronization mechanism that can potentially increase the total instructions executed is hardware/software transactional memory, which will be covered in Chapter 5.

We use our power models to analyze how thread scaling under the influence of variations in the listed parameters affects total energy expenditure and that of the core and uncore components. To accurately study the effect of these parameters on the energy efficiency of thread scaling, we must vary them in isolation. It is possible to do so using the same technique we use for DVFS scaling: simulate a hypothetical microbenchmark by feeding appropriate values to the power model. However, this would require creating a thread-scaling performance model that takes into account variables like remote core access latency, locking-mechanism overhead, and the effects of bandwidth contention on performance.

In addition to the inputs required for performance modeling, other parameters like temperature variations, sleep-state latencies, and instruction-count overheads from synchronization primitives make it more difficult to calculate a simulated application's performance and energy accurately. To avoid these limitations, we instead use a (real) microbenchmark for the thread-scaling analysis and measure performance from runs on hardware instead of using a performance model. We leverage the code from Eigenbench [114], developed by Hong et al., to create microbenchmarks for characterizing energy when scaling the number of threads. Eigenbench is a parameterized microbenchmark created to characterize the design space of transactional memory (TM) systems by defining TM-related characteristics that can be explored orthogonally. The same approach can be used to characterize the energy efficiency of thread scaling by defining orthogonal communication parameters and studying the effects of their variation on energy expenditure during thread scaling.

In the following sections, we describe the energy characterization of thread scaling when varying individual application characteristics. The core and uncore frequency is fixed at 3.4 GHz for all analyses in this section.

¹We use the term “serial portion” to refer to individual serial sections of the application and “serial fraction” to refer to the sum of all serial portions as a fraction of the total application.

4.4.1 Energy Effects of Serial Fraction

To analyze how the serial fraction of a parallel application affects its energy expenditure, we need a microbenchmark for which the serial fraction can be easily parameterized. In the original Eigenbench code, all work is done from the function eigenbench_core, which is called from within the parallel threads. To accurately control and vary the serial fraction of the application — without needing to create a critical section — we call eigenbench_core from both inside and outside the parallel section. We then estimate the serial fraction of the application as the ratio of the total read and write operations done from the serial portion to the sum of the reads and writes from both the serial and parallel sections. We verify the serial fraction thus calculated by comparing the execution time predicted by Equation 4.19 (based on Amdahl's law) to the actual execution time. As per our experimental results, this gives us a reasonably accurate estimate of the application's serial fraction. Figure 4.21 shows the pseudo-code for the microbenchmark we use in this analysis.

$T_{N_2} = T_{N_1} \times \big( S + (1 - S) \times \tfrac{N_1}{N_2} \big)$    (4.19)

where $T_{N_x}$ is the execution time with $x$ parallel threads and $S$ is the serial fraction of the application.

Table 4.4: Microbenchmark Parameters for Serial Fraction Analysis

Test          oloop  iloop  array1  array2  R1  W1   R2  W2   Frequency  Range
Serial-Large  100    10000  32KB    32KB    0   Var  0   Var  3.4 GHz    W2 + W1 = 100; W2 = 0, 20, 40, ..., 100
Serial-Small  100    300    32KB    32KB    0   Var  0   Var  3.4 GHz    W2 + W1 = 100; W2 = 0, 20, 40, ..., 100
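The serial-fraction bookkeeping reduces to the two small helpers sketched below (illustrative code): one estimates S from the operation counts, and the other applies Equation 4.19 to check the estimate against the measured execution time.

/* Estimate of the serial fraction S: operations issued from the serial
 * portion divided by all operations (serial + parallel). */
double serial_fraction(long serial_ops, long parallel_ops)
{
    return (double)serial_ops / (double)(serial_ops + parallel_ops);
}

/* Equation 4.19 (Amdahl's law): predicted execution time with n2 threads,
 * given the time measured with n1 threads and serial fraction s. */
double predict_time_threads(double t_n1, double s, int n1, int n2)
{
    return t_n1 * (s + (1.0 - s) * ((double)n1 / (double)n2));
}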

During execution of the serial portion of the application, the idle cores (cores not running that serial portion) go into a sleep state (a C-state on Intel machines). The bigger the serial portion, the deeper the sleep state that the idle cores can enter, and the larger the energy savings. The size of the serial portion depends on the number of operations in the eigenbench_core function, which, in turn, depends on the value of the variable iloop. For our analysis, we test with two sizes of iloop: 10000, which ensures that the idle cores enter the deepest sleep state (C7) during the serial portion, and 300, which does not give the idle cores enough time to enter a sleep state deeper than C3. Table 4.4 lists the parameters we use for testing the sensitivity of energy expenditure to an application's serial fraction.

Figure 4.22 shows the energy expenditure and execution times from our analysis. Both the energy consumption and execution time values are normalized to those obtained for single-thread runs. We make the following observations about our results:

Observation 1: When the number of threads grows, the fraction of uncore energy to total chip energy goes down.

Explanation: This test does not share any data among parallel threads, and the working set fits in the L2 cache. Hence the number of uncore accesses is negligible. This means that the uncore energy (both static and dynamic) is almost entirely idle energy. When only one core is active, this uncore idle energy constitutes 58% of total chip energy at a 0% serial fraction. When parallelism is introduced and more cores are active, the ratio of uncore idle power to total chip power goes down, and as a result, we see a reduction in uncore static and dynamic energy with increasing numbers of threads. At a 0% serial fraction in Figure 4.22(a), the uncore idle energy fraction goes down to 42% for two threads and 26% for four threads. When the serial fraction increases, the reduction in uncore idle energy becomes less pronounced. For example, at a 20% serial fraction in Figure 4.22(a), the uncore idle energy fraction goes down from 58% at one thread to 46% at two threads and 35% at four threads. This is because when the application's serial fraction increases, the performance gain from increasing the number of threads diminishes. When the application is completely serial, there is no performance gain when the degree of parallelism increases.

void eigenbench_core(loop, r, w, array) {
    long val = 0;
    for (i = 0; i < loop; i++) {
        for (j = 0; j < (r + w); j++) {
            action = rand_action(w, r);
            index = rand_index(tid, array);
            if (action == READ) val += array[index];
            else array[index] = val;
        }
    }
}

void main() {
    int R1, W1, R2, W2;   // R1 and W1: reads and writes in the serial section (array1)
                          // R2 and W2: reads and writes in the parallel section (array2)
    long *array1, *array2[NUM_THREADS];
    int iloop, oloop, array1_size, array2_size;

    set_params(R1, W1, R2, W2, iloop, oloop, array1_size, array2_size);

    Initialize_array(array1, array1_size);
    for (j = 0; j < NUM_THREADS; j++)
        Initialize_array(array2[j], array2_size);

    for (i = 0; i < oloop; i++) {
        for (j = 0; j < NUM_THREADS; j++) {
            // parallel portion; ARGS: iloop/NUM_THREADS, R2, W2, array2[j]
            thread_start(eigenbench_core(), ARGS);
        }
        barrier();
        eigenbench_core(iloop, R1, W1, array1);   // serial portion
    }
}

Figure 4.21: Pseudo-Code for Serial Fraction Analysis

This is why there is no reduction in uncore idle energy from one to four threads. This analysis highlights the role of the energy expended in the idle uncore.

Observation 2: The degree of parallelism at which the lowest energy is expended depends on the serial fraction and the size of the serial portion of the application.

Explanation: When the serial fraction is small, increasing the number of parallel threads always results in reduced energy expenditure because of the reduction in uncore idle energy discussed above. When the serial fraction increases, which results in lower speedups, we still see reductions in energy for applications with large serial portions. This is because when the serial portion is large, the inactive cores enter the C7 sleep state, in which the core is power gated, and hence they consume neither static nor dynamic power. As a result, core dynamic and static energy remain the same across varying serial fractions and degrees of parallelism, as Figure 4.22(a) shows. Combined with reductions in uncore energy, this results in decreases in total chip energy (until the serial fraction hits 100%). But when the serial portion is small, the inactive cores only enter the shallower sleep states (C1/C1E/C3), in which the cores are not power gated and only partially clock gated. The cores still consume static and dynamic power, resulting in increases in core static and dynamic energy as the number of threads is increased. This offsets the reduction in uncore energy to some extent. Figure 4.22(b) shows that at lower serial fractions, the reduction in uncore energy offsets the increase in core energy at all degrees of parallelism, and hence running the application at four threads is most energy efficient. At around a 40% serial fraction, when the number of threads increases from two to four, the reduction in uncore energy is not enough to offset the increase in core energy, and hence energy expenditure at two threads is less than at four threads, even though the execution time at four threads is smaller. As the serial fraction is increased, this effect becomes more pronounced. When the serial fraction is 80% or more, it becomes more energy efficient to simply run the application serially.

4.4.2 Energy Effects of Core-to-core Communication Overhead

To analyze the effects of core-to-core communication on the energy efficiency of thread scaling, we need our microbenchmark to accurately control the fraction of shared memory writes to total memory writes. We tweak the microbenchmark from the serial fraction analysis to use two buffers in the eigenbench_core function: one (array2) is private to the thread and hence does not generate coherence operations, and the other (array1) is shared among the threads and hence generates coherence operations. Figure 4.23 shows the pseudo-code for this analysis. In this version of the pseudo-code, we do not call the work function from outside the parallel section, and hence the serial fraction is always zero.

Figure 4.22: Effects of Serial Fraction on the Energy Efficiency of Thread Scaling. Panels: (a) large serial portion (iloop=10000), (b) small serial portion (iloop=300). Stacked bars show uncore dynamic, uncore static, core dynamic, and core static energy (with execution time) for 1, 2, and 4 threads at serial fractions of 0% to 100%, normalized to the single-thread run.

Table 4.5: Microbenchmark Parameters for Core-to-core Communication Overhead Analysis

Test          iloop  array1  array2  R1  W1   R2  W2   Frequency  Range
Core-to-core  50000  32KB    32KB    0   Var  0   Var  3.4 GHz    W2 + W1 = 100; W2 = 0, 20, 40, ..., 100

We do not use synchronization primitives to protect shared writes because we only want to isolate the effects of core-to-core communication overhead; correctness is not essential here.

Table 4.5 lists the parameters used for testing the sensitivity of energy expenditure to core-to-core communication overhead. Figure 4.24 shows the execution time and energy expenditure trends for varying fractions of shared writes to total writes by parallel threads. Both energy and execution times are normalized to those for single-thread runs. The absolute energy expenditure is the same for all single-thread runs, since for single-thread runs all cache accesses hit in the private L1/L2 caches. We make the following observations from these results.

Observation 1: The execution time and energy expenditure are almost the same for 0% and 20% shared writes, even though there is a significant increase in L2 misses for the latter.

Explanation: When only 20% of the memory accesses are shared, the latency of getting data from a remote cache is hidden by the instruction-level parallelism in the core pipeline.

void eigenbench_core(loop, r1, w1, array1, r2, w2, array2) {
    long val = 0;
    for (i = 0; i < loop; i++) {
        for (j = 0; j < (r1 + w1 + r2 + w2); j++) {
            action = rand_action(r1, w1, r2, w2);
            index = rand_index(tid, array1);
            if (action == READ1) val += array1[index];
            else if (action == WRITE1) array1[index] = val;
            else if (action == READ2) val += array2[index];
            else array2[index] = val;
        }
    }
}

void main() {
    int R1, W1, R2, W2;   // R1 and W1: reads and writes in array1
                          // R2 and W2: reads and writes in array2
    long *array2[NUM_THREADS], *array1;   // each thread writes in its own copy of array2;
                                          // array1 is shared among parallel threads
    int iloop, oloop, array2_size, array1_size;

    set_params(R1, W1, R2, W2, iloop, oloop, array1_size, array2_size);

    Initialize_array(array1, array1_size);
    for (j = 0; j < NUM_THREADS; j++)
        Initialize_array(array2[j], array2_size);

    for (j = 0; j < NUM_THREADS; j++) {
        // ARGS: iloop/NUM_THREADS, R1, W1, array1, R2, W2, array2[j]
        thread_start(eigenbench_core(), ARGS);
    }
}

Figure 4.23: Pseudo-Code for Core-to-core Communication Overhead Analysis

Figure 4.24: Effects of Core-to-core Communication on Energy Efficiency of Thread Scaling. Stacked bars show uncore dynamic, uncore static, core dynamic, and core static energy (with execution time) for 1, 2, and 4 threads as the percentage of stores that hit a remote cache varies from 0% to 100%, normalized to the single-thread run.

Table 4.6: Microbenchmark Parameters for Bandwidth Contention Analysis

Test           oloop  iloop   array1  array2  R1  W1  R2  W2  Frequency  Range
BW Contention  1      100000  0       Var     0   0   10  90  3.4 GHz    array2 = 512KB, 1MB, ..., 16MB

The execution time does not increase for small fractions of shared memory accesses. The energy expenditure increases only slightly (~2%) because the contribution of uncore accesses to uncore dynamic energy is small compared to uncore idle energy. This, again, underlines the importance of low-power design for the uncore, which could potentially lead to significant energy savings.

Observation 2: There are significant increases in core energy at higher fractions of shared memory accesses, even though the work done remains the same.

Explanation: When the fraction of shared writes is 60%, execution time from one to four threads decreases by 45%, but energy expenditure increases by 20%. When the fraction of shared writes is 80%, the execution time from one to two threads decreases by 3% and energy increases by 37%. In the first case, the uncore energy goes down by 17%, and in the second case, it increases by 7%. The main contribution to the increase in energy expenditure comes from the increase in core energy, both static and dynamic, even though the number of instructions executed remains the same across thread counts. This is because when the pipeline is stalled due to L2 misses, the core is in the active C-state (C0), and hence it is consuming both static and dynamic energy. This results in increases in core energy expenditure for the same amount of work done.

4.4.3 Energy Effects of Bandwidth Contention

This analysis is designed to study cases where increasing parallelism results in increased bus and memory contention due to increases in working-set size, even though the total work done remains the same. For this analysis, we use the microbenchmark code shown in Figure 4.21, but with the serial fraction set to zero and with varying sizes of array2. Table 4.6 lists the parameter values we use.

Figure 4.25: Effects of Bandwidth Contention on Energy Efficiency of Thread Scaling. Stacked bars show uncore dynamic, uncore static, core dynamic, and core static energy (with execution time) for 1, 2, and 4 threads at per-thread working-set sizes from 512 KB to 16 MB, normalized to the single-thread run.

The execution time and energy expenditure results from this analysis are shown in Figure 4.25. The working-set size on the x-axis is per thread. We make the following observations from these results.

Observation 1: There is almost no time or energy penalty for increases in contention when the total working-set size fits within the L3 cache.

Explanation: When the working-set size per thread is less than 2 MB, the working set fits in the 8 MB L3 cache at all levels of parallelism. When the working set per thread is increased from 512 KB to 1 MB, we see an increase of around 17% in total L2 misses, but no increase in execution time or energy. Even when we decrease the working-set size per thread to 128 KB (not shown in the figure), the execution time and energy do not change significantly, even though the number of L2 misses does decrease as expected. This is because the latency of fetching data from the L3 is short enough that the instruction-level parallelism in the pipeline can hide it. The energy also remains the same because, again, the energy expenditure due to L2 misses is a small fraction of the uncore idle energy.

Observation 2: When the total working-set size does not fit in the L3 cache, running at two threads expends the least energy, even though running at four threads gives the lowest execution time.

Explanation: As we observed in the core-to-core communication analysis, when core execution is stalled due to data dependencies on L3 misses, the core waits for data in an active state, and hence it consumes static and dynamic energy. Also, as we observed before, any speedup due to parallelism almost always leads to a reduction in uncore energy. When the parallelism leads to increased memory contention but still delivers some speedup, uncore energy is reduced, but core energy is increased due to stall cycles. Going from one to two threads, the reduction in uncore energy is greater than the increase in core energy. However, going from two to four threads, the increase in core energy offsets the reduction in uncore energy, resulting in the energy trends we observe in our results.

Table 4.7: Microbenchmark Parameters for Lock-Overhead Analysis

Test           iloop   array  R1  W1   R2  W2   Frequency  Range
Lock Overhead  100000  32KB   0   Var  0   Var  3.4 GHz    W1 = 100, 50, 40, 30, 20, 10; W2 = W1 * 9

4.4.4 Energy Effects of Locking-Mechanism Overhead

In this analysis, we characterize the effects of overheads related to locking mechanisms on the energy efficiency of thread scaling. Figure 4.26 shows the pseudo-code of the microbenchmark we use for this test. We use a pthreads mutex as the locking mechanism. Table 4.7 shows the parameter values we use. We vary the size of the critical section by controlling the number of stores inside it. The number of stores outside the critical section is always equal to nine times the number inside (W2 = W1 × 9), which effectively sets the serial fraction of the application to 10%. We observe the effects of locking-mechanism overhead on the energy efficiency of thread scaling as we decrease the size of the critical section. Figure 4.27 shows our results. We make the following observations from these results.

Observation 1: The core energy increases significantly with increases in the number of parallel threads. When the critical section is large (100 stores), the core energy increases by 16% from one to two threads and by 40% from one to four threads. When the critical section is small (10 stores), the core energy increases by 93% from one to two threads and 133% from one to four threads.

Explanation: As we saw in our serial fraction analysis for the case where the serial portion of the application is small (Figure 4.22(b)), core energy is expected to increase when parallelism is introduced because the inactive cores cannot enter deeper sleep states. In this case, the critical section is small enough that the inactive cores spend their idle cycles in either the active state or the shallowest sleep states (C1/C1E), resulting in increases in energy consumption. Moreover, as the critical section gets smaller, the overhead of acquiring and releasing locks gets higher. When the critical section has 100 stores, the total instruction count increases by 3% from one to four threads. When the critical section has only 10 stores, the instruction count increases by 15% from one to four threads, contributing to the overall increase in core energy expenditure.

Observation 2: For small critical sections, the energy expenditure at two threads is more than the energy expenditure at one and four threads. Also, running at one thread is most energy efficient, even when running at four threads gives the best performance.

Explanation: When the size of the critical section is reduced, the overhead related to acquiring the lock starts affecting both the speedup and the energy efficiency of thread scaling. When the application is scaled from one to two threads, the energy reduction in the uncore (due to speedup) is not enough to offset the energy increase.

void eigenbench_core(loop, r1, w1, r2, w2, array) {
    long val = 0;
    for (i = 0; i < loop; i++) {
        acquire_lock();
        for (j = 0; j < (w1 + r1); j++) {
            action = rand_action(w1, r1);
            index = rand_index(tid, array);
            if (action == READ) val += array[index];
            else array[index] = val;
        }
        release_lock();
        barrier();

        for (j = 0; j < (w2 + r2); j++) {
            action = rand_action(w2, r2);
            index = rand_index(tid, array);
            if (action == READ) val += array[index];
            else array[index] = val;
        }
    }
}

void main() {
    int R1, W1, R2, W2;   // R1 and W1: reads and writes inside the critical section
                          // R2 and W2: reads and writes outside the critical section
    long *array[NUM_THREADS];   // array is accessed by parallel threads,
                                // but each thread writes in its own copy
    int iloop, array_size;

    set_params(R1, W1, R2, W2, iloop, array_size);

    for (j = 0; j < NUM_THREADS; j++)
        Initialize_array(array[j], array_size);

    for (j = 0; j < NUM_THREADS; j++) {
        // ARGS: iloop/NUM_THREADS, R1, W1, R2, W2, array[j]
        thread_start(eigenbench_core(), ARGS);
    }
}

Figure 4.26: Pseudo-Code for Locking Mechanism Overhead Analysis

Figure 4.27: Effects of Locking-Mechanism Overhead on the Energy Efficiency of Thread Scaling. Stacked bars show uncore dynamic, uncore static, core dynamic, and core static energy (with execution time) for 1, 2, and 4 threads with 100, 50, 40, 30, 20, and 10 stores in the critical section, normalized to the single-thread run.

When the application is scaled from two to four threads, the speedup obtained further reduces uncore energy expenditure, which this time offsets the increase in core energy, but not enough to reduce the energy expenditure below that of the single-thread run. We see this trend for all critical-section sizes of 20 stores or fewer. But this holds true only when the serial fraction is small enough that increasing the number of threads provides significant speedups. If the serial fraction is increased from 10% to more than 60%, the speedup obtained going from two to four threads is not enough to offset the increase in core energy expenditure, and, as a result, the total energy at four threads is higher than at two threads.

4.5 Related Work

Various publications in the past have explored the use of performance counters for power estimation. Some of the related work has already been discussed in the previous chapter. Here, we discuss previous research on power estimation modeling that explicitly attempts to differentiate between static and dynamic power consumption.

The power model proposed by Bertran et al. [50] makes a distinction between dynamic and static power. However, they define static power as idle power and use regression on the values of their chosen events to derive both static (idle) and dynamic power. Unlike us, their power models disregard both voltage and temperature and hence cannot be used for characterizing the true static and dynamic power consumption of the chip.

Spiliopoulos et al. [36] also address static and dynamic power separately in their approach to power estimation, but they also define static power as idle power. They create a table (offline) of idle power values at all frequency steps and “typical core temperature ranges”. We instead differentiate between static power (based on voltage and temperature) and idle dynamic power.

Huang et al. [117] present the methodology used to develop a power proxy for estimating per-core power consumption for the IBM POWER7+ server chip. They break the power consumption down into active power, clock-grid power, and leakage power. They leverage in-house information about the technology constants to build their power model, while our methodology requires no such prior knowledge. Their power estimation logic is implemented partly in hardware and partly in firmware and hence requires on-chip hardware support, while our model is implemented fully in software.

Su et al. [109] use 12 performance events to estimate power consumption for their AMD system and hence require time-multiplexing to gather data for all events. They address idle and dynamic power separately. Like our second approach, they use voltage and temperature to develop a static power model, but they group idle dynamic power into their static power model, which does not reflect the true static power consumption of the core and the northbridge (equivalent to the uncore in our models).

The use of power models to drive DVFS policies has also been explored in previous research. Su et al. [109] and Spiliopoulos et al. [36] use their performance and energy estimation models to predict the voltage-frequency operating point at which the current application phase will expend the lowest energy/EDP. In contrast to our work, their DVFS schedulers need to perform energy calculations for each available DVFS operating point at runtime, while we use a look-up table to directly predict the low-energy/EDP operating point without performing any calculations, and hence induce lower overhead. This makes our methodology more suitable for future systems where scheduling decisions need to be taken more frequently and possibly across more voltage-frequency pairs.

Su et al. [109] also use their models to characterize energy expenditure on their AMD FX-8320 system at various frequencies. Their system always expends the least energy at the lowest voltage-frequency state, in contrast to our observations on the Haswell machine. The frequency scaling energy trends would differ among implementations depending on the fraction of energy expended in the idle uncore (or northbridge), which is affected by the level of clock and power gating.
Patki et al. [112] develop a power-aware job scheduler for a power-constrained cluster. Like our DVFS scheduler, their scheduling policies also rely upon a look-up table populated in advance with details like execution time and power for certain application configurations to achieve a scheduling complexity of O(1).

Dzhagaryan et al. [118] study the impact of thread and frequency scaling on the performance and energy expenditure of an Intel Xeon processor. Unlike us, they do not analyze the underlying reasons for the observed energy trends, namely the static and dynamic power consumption of the core and uncore. They use the PARSEC benchmarks for their study, while we use microbenchmarks to isolate the effect of individual communication parameters.

4.6 Conclusions

Accurately estimating the power consumption of processor components is important for supporting better power-aware resource management. We present a methodology to estimate the static and dynamic power consumption of the core and uncore processor components. The methodology uses core and uncore voltage, package temperature, and performance counters to create models that can estimate power consumption for sequential and parallel applications across all system frequencies. We validate our models on benchmarks from NAS, SPEC CPU2006, and SPEC OMP2001, and we show that our models can estimate power with high accuracy (3.14% mean absolute error) across all voltage-frequency pairs and different concurrency levels.

We use our power models to study the impact of DVFS on energy consumption, showing that — contrary to conventional wisdom — it is not always most energy efficient to run applications at the lowest frequency. Uncore static energy effects must be taken into consideration. The frequency at which an application expends the lowest energy depends on how memory bound it is and how many concurrent threads it uses. We study the impact of thread scaling on energy expenditure, showing that the relative serial portion of a program influences the level of concurrency at which the program will expend the least energy.

We leverage the energy characterization for frequency scaling to implement an online DVFS manager that uses a look-up table to predict the energy-efficient frequency for the current application phase without performing actual energy predictions at runtime, thus incurring low overhead. We show that such a simple DVFS manager is able to achieve high levels of energy efficiency.

We characterize the energy expenditure at different levels of parallelism and demonstrate how the static and dynamic power consumption of the core and uncore components affect the energy trends. We define four orthogonal communication parameters and isolate their effect on the energy efficiency of thread scaling. We show that the number of threads at which a parallel program expends the least energy is not necessarily the one at which it achieves the lowest execution time. The energy consumed by the idle uncore and the stalled core significantly impacts the energy efficiency of thread scaling.

5 Characterization of Intel's Restricted Transactional Memory

Transactional memory (TM) [119] simplifies some of the challenges of shared-memory programming. The responsibility for maintaining mutual exclusion over arbitrary sets of shared-memory locations is devolved to the TM system, which may be implemented in software (STM) or hardware (HTM). TM presents the programmer with fairly easy-to-use programming constructs that define a transaction — a piece of code whose execution is guaranteed to appear as if it occurred atomically and in isolation.

The research community has explored this design space in depth, and a variety of proposed systems take advantage of transaction characteristics to simplify implementation and improve performance [120–123]. Hardware support for transactional memory has been implemented in Rock [124] from Sun Microsystems, Vega from Azul Systems [125], and Blue Gene/Q [126] and System z [127] from IBM. Haswell is the first Intel product to provide such hardware support. Intel's Transactional Synchronization Extensions (TSX) allow programmers to run transactions on a best-effort HTM implementation, i.e., the platform provides no guarantees that hardware transactions will commit successfully, and thus the programmer must provide a non-transactional path as a fallback mechanism.

Intel TSX supports two software interfaces to execute atomic blocks: Hardware Lock Elision (HLE) is an instruction set extension to run atomic blocks on legacy hardware, and Restricted Transactional Memory (RTM) is a new instruction set interface to execute transactions on the underlying TSX hardware.

The previous chapters have highlighted the need for better hardware introspection into power consumption. As better hardware support for such introspection becomes available, it is important to evaluate its accuracy and acuity so that users can choose among various combinations of measurement and modeling techniques. Since Intel's Core i7 4770 microarchitecture is among the first to support both power modeling and transactional memory, it makes an interesting platform for analyzing power/performance trade-offs.

As an initial study, we compare the performance and energy of the Haswell implementation of RTM to those of other approaches for controlling concurrency. We use a variety of workloads to test the susceptibility of RTM's best-effort nature to performance degradation and increased energy consumption. We compare RTM performance to TinySTM, a software transactional memory implementation that uses time to reason about the consistency of transactional data and about the order of transaction commits.¹ We highlight the crossover points between the two and analyze the impact of thread scaling on energy expenditure. We find that RTM performs well with small to medium working sets when the amount of data (particularly that being written) accessed in transactions is small. When data contention among concurrent transactions is low, TinySTM performs better than RTM, but as contention increases, RTM consistently wins. RTM generally suffers less overhead than TinySTM for single-threaded runs, and it is more energy-efficient when working sets fit in cache.

5.1 Experimental Setup

The Intel 4th Generation Core i7 4770 processor comprises four physical cores that can run up to eight simultaneous threads when hyper-threading is enabled. Each core has two eight-way 32 KB private L1 caches (separate for instructions and data) and a 256 KB private L2 cache (combined instructions and data); the cores share an 8 MB L3 cache, and the system has 16 GB of physical memory on board. We compile all microbenchmarks, benchmarks, and synchronization libraries using gcc v4.8.1 with the -O3 optimization flag. We use the -mrtm flag to access the Intel TSX intrinsics. We schedule threads on separate physical cores (unless running more than four threads) and fix the CPU affinity to prevent migration.
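The affinity setup can be done with a helper along the following lines (an illustrative snippet; the core numbering is machine-specific).

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the given thread to a single core so the OS cannot migrate it. */
static int pin_to_core(pthread_t thread, int core_id)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    return pthread_setaffinity_np(thread, sizeof(cpu_set_t), &set);
}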

1We choose TinySTM because during our experiments we find that it consistently outperforms other STM alternatives like TL2 (to which RTM was compared in another recent study [128]).

We modify the task example from libpfm4.4 to read both the performance counters and the processor package energy via the Running Average Power Limit (RAPL) [84] interface. As we concluded in Chapter 2, RAPL is fairly accurate when the time between successive RAPL counter reads is more than tens of milliseconds. Since all our benchmarks run for at least a few seconds, we use RAPL for our energy measurements. We implement Intel TSX synchronization as a separate library and add RTM definitions to the STAMP tm.h file. When a transaction fails more than eight times, we invoke reader/writer lock-based fallback code to ensure forward progress. If the return status bits indicate that an abort was due to another thread having acquired the lock (in the fallback code), we wait for the lock to become free before retrying the transaction. The following pseudocode shows a sample transaction.

Algorithm 1 Implementation of BeginTransaction
  while true do
    nretries ← nretries + 1
    status ← _xbegin()
    if status = _XBEGIN_STARTED then
      if arch_read_can_lock(serialLock) then
        return
      else
        _xabort(0)
      end if
    end if
    {*** fall-back path ***}
    while not arch_read_can_lock(serialLock) do
      _mm_pause()
    end while
    if nretries ≥ MAX_RETRIES then
      break
    end if
  end while
  arch_write_lock(serialLock)
  return
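A user-space rendering of Algorithm 1 with the TSX intrinsics might look as follows. This is a simplified sketch, not our library code: the serialization lock is a plain atomic flag rather than the kernel-style reader/writer spinlock, and error handling is omitted.

```c
#include <immintrin.h>    /* _xbegin, _xend, _xabort, _xtest, _mm_pause; compile with -mrtm */
#include <stdatomic.h>

#define MAX_RETRIES 8

/* A plain atomic flag stands in for the reader/writer serialization lock
 * used by our actual library (a simplification for this sketch). */
static atomic_int serial_lock;

static void begin_transaction(void)
{
    int nretries = 0;
    for (;;) {
        nretries++;
        unsigned int status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* Read the lock inside the transaction: a later acquisition by
             * another thread then aborts this transaction automatically. */
            if (atomic_load(&serial_lock) == 0)
                return;            /* run the critical section transactionally */
            _xabort(0);            /* lock is held: abort and fall through */
        }
        /* Abort path: wait until the serializing thread releases the lock. */
        while (atomic_load(&serial_lock) != 0)
            _mm_pause();
        if (nretries >= MAX_RETRIES)
            break;                 /* give up on hardware transactions */
    }
    /* Fallback: acquire the lock and run the critical section serially. */
    int expected = 0;
    while (!atomic_compare_exchange_weak(&serial_lock, &expected, 1)) {
        expected = 0;
        _mm_pause();
    }
}

static void end_transaction(void)
{
    if (_xtest())
        _xend();                          /* commit the hardware transaction */
    else
        atomic_store(&serial_lock, 0);    /* release the fallback lock */
}
```

Because the lock is read inside the transaction, it becomes part of the transaction's read set, so a thread that acquires the lock in the fallback path forces all in-flight hardware transactions to abort.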

5.2 Microbenchmark Analysis

5.2.1 Basic RTM Evaluation

We first quantify RTM's hardware limitations that affect its performance using microbenchmark studies. We detail the results of these experiments below.

RTM Capacity Test. To test the limitations of read-set and write-set capacity for RTM, we create a custom microbenchmark, results for which are shown in Fig. 5.1. The abort rate of write-only transactions tops out at 512 cache blocks (the size of the L1 data cache). We suspect this is because write-sets are tracked only in L1, and so evicting any transactionally written cache line from L1 results in a transaction abort. For read-sets, the abort rate saturates at 128K cache blocks (the size of the L3 cache). This suggests that evicting transactionally read cache lines from L3 (but not L1) triggers transaction aborts, and thus RTM maintains performance for much larger read-sets than write-sets.

Figure 5.1: RTM Read-Set and Write-Set Capacity Test. Abort rate (%) for read-set and write-set aborts versus transaction size in cache blocks.

RTM Duration Test. Since RTM aborts can be caused by system events like interrupts and context switches, we study the effects of transaction duration (measured in CPU cycles) on success rate. For this analysis, we use a single thread, set the working-set size to 64 bytes, and set the number of writes inside the transaction to 0. This tries to ensure that the number of aborts due to memory events and conflicts remains insignificant. We gradually increase the duration by increasing the number of reads within the transaction. Fig. 5.2 shows that transaction duration begins to affect the abort rate at about 30K cycles and that durations of more than 10M cycles cause all transactions to abort (note that these results are likely machine dependent).
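For concreteness, the following is a minimal sketch of a write-set capacity probe in the spirit of the capacity test above. The buffer layout, trial count, and lack of retries are assumptions made for illustration, and a TSX-capable processor is required (compile with gcc -O3 -mrtm).

```c
#include <immintrin.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64
#define TRIALS     1000

/* Abort rate (%) of a write-only transaction touching 'nblocks' cache lines. */
static double write_set_abort_rate(char *buf, size_t nblocks)
{
    int aborts = 0;
    for (int t = 0; t < TRIALS; t++) {
        if (_xbegin() == _XBEGIN_STARTED) {
            for (size_t i = 0; i < nblocks; i++)
                buf[i * CACHE_LINE] = (char)t;   /* one write per cache block */
            _xend();
        } else {
            aborts++;                            /* no retry: just count the abort */
        }
    }
    return 100.0 * aborts / TRIALS;
}

int main(void)
{
    size_t max_blocks = 128 * 1024;              /* sweep up to 128K cache blocks */
    char *buf = aligned_alloc(CACHE_LINE, max_blocks * CACHE_LINE);
    if (!buf)
        return 1;
    memset(buf, 0, max_blocks * CACHE_LINE);     /* pre-fault pages: page faults abort transactions */
    for (size_t n = 1; n <= max_blocks; n *= 2)
        printf("%zu blocks: %5.1f%% aborts\n", n, write_set_abort_rate(buf, n));
    free(buf);
    return 0;
}
```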

Figure 5.2: RTM Abort Rate versus Transaction Duration. Abort rate (%) versus transaction duration in CPU cycles.

RTM Overhead Test. Next we quantify performance overheads for RTM compared to spin locks and the atomic compare-and-swap (CAS) instruction. For this test, we create a microbenchmark that removes elements from a queue (defined in the STAMP [129] library). We initialize the queue to 1M elements, and threads extract elements until the queue is empty. Work is not statically divided among threads. We first compare RTM against the spinlock implementation in the Linux kernel (arch/x86/include/asm/spinlock.h). We then compare against a version of the queue implementation modified to use CAS in queue_pop(). For RTM, we simply retry the transaction on aborts. We perform three sets of experiments. To observe the cost of starting an RTM transaction in the absence of contention, we first run single-threaded experiments. We repeat the experiment with four threads to generate a high-contention workload. Finally, we lower contention by making threads work on local data for a fixed number of operations after each critical section. Table 5.1 summarizes execution times normalized to those of the lock-based version.

                  Type of synchronization
Contention    None     Lock     CAS     RTM
None          0.64     1        1.05    1.45
Low           N/A      1        0.64    0.69
High          N/A      1        0.64    0.47

Table 5.1: Relative Overhead of RTM versus Locks and CAS

Table 5.1 shows that the cost of starting a transaction makes RTM perform worse than the other alternatives when executing non-contended critical sections with few instructions. RTM suffers about a 45% slowdown compared to using locks and CAS, and it takes over twice the time of an unsynchronized version. In contrast, our multi-threaded experiments reveal that RTM exhibits roughly 30% and 50% lower overhead than locks in low and high contention, respectively, while CAS is in both cases around 35% better than locks. Note that transactions avoid hold-and-wait behavior, which seems to give RTM an advantage in our study. When comparing locks and CAS, the higher lock overhead is likely due in part to the ping-pong coherence behavior of the cache line containing the lock and to cache-to-cache transfers of the line holding the queue head.
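For reference, the CAS variant of the pop operation used in the overhead test can be sketched as follows. The queue structure shown here is a hypothetical simplification (a fixed array drained by advancing an atomic head index), not the STAMP queue itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical fixed-size queue drained by concurrent consumers: elements
 * data[head..tail-1] remain, and a pop claims slot 'head' with CAS.
 * (The STAMP queue is a circular buffer; only the CAS idea is shown here.) */
typedef struct {
    _Atomic long  head;   /* index of the next element to pop */
    long          tail;   /* one past the last element; fixed after initialization */
    void        **data;
} cas_queue_t;

static bool queue_pop_cas(cas_queue_t *q, void **out)
{
    long h = atomic_load(&q->head);
    while (h < q->tail) {
        /* Try to claim slot h; on failure, h is refreshed with the current head. */
        if (atomic_compare_exchange_weak(&q->head, &h, h + 1)) {
            *out = q->data[h];
            return true;
        }
    }
    return false;         /* queue exhausted */
}
```

In contrast to the lock-based version, contention here is limited to the single cache line holding the head index, which is consistent with the roughly 35% advantage of CAS over locks reported in Table 5.1.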

Characteristic        Definition
Concurrency           Number of concurrently running threads
Working-set size a    Size of frequently used memory
Transaction length    Number of memory accesses per transaction
Pollution             Fraction of writes to total memory accesses inside transaction
Temporal locality     Probability of repeated address inside transaction
Contention            Probability of transaction conflict
Predominance          Fraction of transactional cycles to total application cycles

a Working-set size for Eigenbench is defined per-thread.

Table 5.2: Eigenbench TM Characteristics

5.2.2 Eigenbench Characterization

To compare RTM and STM in detail, we next study the behaviors of Hong et al.'s Eigenbench [114]. This parameterizable microbenchmark attempts to characterize the design space of TM systems by orthogonally exploring different transactional application behaviors. Table 5.2 defines the seven characteristics we use to compare performance and energy expenditure of the Haswell RTM implementation and the TinySTM [130] software transactional memory system. Hong et al. [114] provide a detailed explanation of these characteristics and the equations used to quantify them.

Unless otherwise specified, we use the following parameters in our experiments, results for which we average over 10 runs. Transactions are 100 memory references (90 reads and 10 writes) in length. We use one small (16KB) and one medium (256KB) working set size to demonstrate the differences in RTM performance. Since L1 size has no influence on TinySTM's abort rates, we only show TinySTM results for the smaller working set size. To prevent L1 cache interference, we run four threads with hyper-threading disabled as our default, and we fix the CPU affinity to prevent thread migration.

For each characteristic, we compare RTM and TinySTM performance and energy (versus sequential runs of the same code) and transaction-abort rates. For the graphs in which we plot two working-set sizes for RTM, the speedups and energy efficiency given are relative to the sequential run of the same size working set.

Working-Set Size. Fig. 5.3 shows Eigenbench results over a logarithmic scale as we increase each thread's working set from 8KB to 128MB. RTM performs best with the smallest working set, and its performance gradually degrades as working-set size increases. The performance of both RTM and TinySTM drops once the combined working sets of all threads exceed the 8MB L3 cache. RTM performance suffers more because events like L3 evictions, page faults, and interrupts trigger a transaction abort, which is not the case for TinySTM. The speedups of both RTM and TinySTM are lowest at working sets of 4MB: at this point, the parallelized code's working sets (16MB in total) exceed L3, but the working set of the sequential version (4MB) still fits. For working sets above 4MB, the sequential version starts encountering L3 misses, and thus the relative performance of both transactional memory implementations begins to improve. Parallelizing the transactional code using RTM is energy-efficient compared to the sequential version when the combined working sets of all threads fit inside the cache.

Figure 5.3: Eigenbench Working-Set Size. Panels show (a) speedup with four threads, (b) energy relative to the sequential run, and (c) abort rate (%), plotted against per-thread working set (KB) for RTM and TinySTM.

Transaction Length. Fig. 5.4 shows Eigenbench results as we increase the transaction length from 10 to 520 memory operations. When the working set (16KB) fits within L1, RTM outperforms TinySTM for all transaction lengths. For 256KB working sets, RTM performance drops sharply when the transaction length exceeds 100 accesses. Recall that evicting write-set data from the L1 triggers transaction aborts, but when the working set fits within L1, such evictions are few. As the working set grows, the randomly chosen addresses accessed inside the transactions have a higher probability of occupying more L1 cache blocks, and hence they are more likely to be evicted. In contrast, TinySTM shows no performance dependence on working-set size. The overhead of starting the hardware transaction affects RTM performance for very small transactions. As observed in the working-set analysis above, RTM is more energy efficient than both the sequential run and TinySTM for all transaction lengths when using the smaller working set. When using the larger working set, RTM expends more energy for transactions exceeding 120 accesses.

Figure 5.4: Eigenbench Transaction Length. Panels show (a) speedup with four threads, (b) energy relative to the sequential run, and (c) abort rate (%), plotted against transaction length in memory operations for RTM 16K, RTM 256K, and TinySTM.

Pollution. Fig. 5.5 shows results when we test symmetry (with respect to handling read-sets and write-sets) by gradually increasing the fraction of writes. The pollution level is zero when all memory operations in the transaction are reads and one when all are writes. When the working set fits within L1, RTM shows almost no asymmetry. But for the larger working-set size, RTM speedup suffers as the level of pollution increases. TinySTM outperforms RTM when the pollution level increases beyond 0.4.

Figure 5.5: Eigenbench Pollution. Panels show (a) speedup with four threads, (b) energy relative to the sequential run, and (c) abort rate (%), plotted against pollution ratio for RTM 16K, RTM 256K, and TinySTM.

Temporal Locality. We next study the effects of temporal locality on TM performance (where temporal locality is defined as the probability of repeatedly accessing the same memory address within a transaction). The results in Fig. 5.6 reveal that RTM shows no dependence on temporal locality for the 16KB working set, but performance degrades for the 256KB working set (where low temporal locality increases the number of aborts due to L1 write-set evictions). In contrast, TinySTM performance degrades as temporal locality increases, indicating that it favors unique addresses unless only one address is being accessed inside the transaction (locality = 1.0).

Figure 5.6: Eigenbench Temporal Locality. Panels show (a) speedup with four threads, (b) energy relative to the sequential run, and (c) abort rate (%), plotted against temporal locality for RTM 16K, RTM 256K, and TinySTM.

Contention. This analysis studies the behavior of the TM systems when the level of contention is varied from low to high. We set the working-set size to 2MB for both RTM and TinySTM. The level of contention is calculated as an approximate value representing the probability of a transaction causing a conflict (as per the probability formula given by Hong et al. [114]). The conflict probability figures shown in Fig. 5.7 are calculated at word granularity and hence are specific to TinySTM. Since RTM detects conflicts at the granularity of a cache line (64 bytes), the contention level is actually higher for RTM with the same workload configuration. When the degree of contention among competing threads is very low, RTM performs better than TinySTM. For low to medium contention, TinySTM considerably outperforms RTM. However, for high-contention workloads, TinySTM performance degrades while RTM performance remains almost the same.

Figure 5.7: Eigenbench Contention. Panels show (a) speedup with four threads, (b) energy relative to the sequential run, and (c) abort rate (%), plotted against conflict probability for RTM and TinySTM.

Predominance. We study the behavior of the TM systems when varying the ratio of application cycles executed within transactions to the total number of application cycles. For this analysis, we set the working-set size to 256KB for both TM systems, we set contention to zero, and we vary the predominance ratio from 0.125 to 0.875. Fig. 5.8 shows that performance for both RTM and TinySTM suffers as the ratio of transactional cycles to non-transactional cycles grows. This can be attributed to the overheads associated with the TM systems: for the same level of predominance, TinySTM introduces more overhead because it must instrument the program's memory accesses.

Figure 5.8: Eigenbench Predominance. Panels show (a) speedup with four threads, (b) energy relative to the sequential run, and (c) abort rate (%), plotted against predominance rate for RTM and TinySTM.

Concurrency. Next we study how the performance and energy of RTM and TinySTM scale when concurrency is increased from one thread to eight. Fig. 5.9 shows that RTM scales well up to four threads. At eight threads, the L1 cache is shared between two threads running on the same core. This cache sharing degrades performance for the larger working set more than for the smaller working set because hyper-threading effectively halves the write-set capacity of RTM. In contrast, TinySTM scales well up to eight threads. For the small working set, RTM proves to be more energy-efficient than either TinySTM or the sequential runs.

Figure 5.9: Eigenbench Concurrency. Panels show (a) speedup, (b) energy relative to the sequential run, and (c) abort rate (%) for 1, 2, 4, and 8 threads (RTM 16K, RTM 256K, and TinySTM).

The results from the Eigenbench analysis help us identify a range of workload characteristics for which either RTM or TinySTM is better performing or more energy efficient. We next apply the insights gained from our microbenchmark studies to analyze the performance and energy numbers we see for the STAMP benchmark suite.

5.3 HTM versus STM using STAMP

Next we use the STAMP transactional memory benchmark suite [129] to compare the performance and energy efficiency of RTM and TinySTM. We use the lock-based fallback mechanism explained in Section 5.1 and run the applications with input sizes that create large working sets and high contention. We average all results over 10 runs. Fig. 5.10 shows STAMP execution times for RTM and TinySTM normalized to the average execution time of sequential (non-TM) runs. Fig. 5.11 shows the corresponding energy expenditures, again normalized to the average energy of the sequential runs.


Results for single-threaded TM versions of the benchmarks illustrate the TM system overheads.

bayes has a large working set and long transactions, and thus RTM performs worse than TinySTM. This corresponds to our findings in the Eigenbench transaction-length analysis in Fig. 5.4. As expected, RTM does not improve the performance of bayes as the number of threads scales, and TinySTM performs better overall. Since the time the bayes algorithm takes to learn the network dependencies depends on the computation order, we see significant deviations in learning times for multi-threaded runs.

genome has medium transactions, a medium working-set size, and low contention. Most transactions have fewer than 100 accesses. Recall that in the working-set analysis shown in Fig. 5.3(a) (for transaction length 100), RTM slightly outperforms TinySTM for working-set sizes up to 4MB. On the other hand, TinySTM outperforms RTM when contention is low (Fig. 5.7(a)). The confluence of these two factors within genome yields similar performance for RTM and TinySTM up to four threads. For eight threads, as expected, TinySTM's performance continues to improve, whereas RTM's suffers from increased resource sharing among hyper-threads.

intruder is also a high-contention benchmark. As with genome, RTM performance scales well from one to four threads. Since intruder executes very short transactions, scaling to eight threads does not cause as much resource contention as for genome, and thus RTM and TinySTM perform similarly. Even though this application has a small to medium working set — which might otherwise give RTM an advantage — its performance is dominated by very short transaction lengths.

kmeans is a clustering algorithm that groups data items in N-dimensional space into K clusters. As with bayes, our 10 runtimes deviate significantly for the multi-threaded versions. On average, RTM performs better than TinySTM. The short transactions experience low contention, and the small working set has high locality, all of which give RTM a performance advantage over TinySTM. Even though both TM systems show speedups over the sequential runs, synchronizing the kmeans algorithm in TinySTM expends more energy at all thread counts.

labyrinth routes a path in a three-dimensional maze, where each thread grabs a start and an end point and connects them through adjacent grid points. Fig. 5.10 shows that labyrinth does not scale in RTM. This is because each thread makes a copy of the global grid inside the transaction, triggering capacity aborts that eventually cause the fallback to using a lock. Energy expenditure increases for the RTM multi-threaded runs because the threads try to execute the transaction in parallel but eventually fail, wasting many instructions while increasing cache and bus activity.

ssca2 has short transactions, a small read-write set, and low contention, and thus even though it has a large working set, it scales well to higher thread counts. Performance for eight threads is good for both RTM and TinySTM. In general, RTM performs better (with respect to both execution time and energy expenditure) but not by much, as is to be expected for very short transactions.

vacation has low to medium contention among threads and a medium working-set size. The transactions are of medium length, locality is medium, and contention is low.
Like genome, vacation scales well up to four threads, but performance degrades for eight threads because its read-write set size is large enough that cache sharing causes resource limitation issues.

yada has a big working set, medium transaction length, a large read-write set, and medium contention. All these conditions give TinySTM a consistent performance advantage over RTM at all thread counts.

Our results in Fig. 5.11 indicate that the energy trends of applications do not always follow their performance trends. Applications like bayes, labyrinth, and yada expend more energy as they are scaled up, even when performance changes little (or even improves, in the case of yada). Only intruder, kmeans, and ssca2 benefit from hyper-threading under RTM. In contrast, most STAMP applications benefit from hyper-threading under TinySTM, and those that do not suffer only small degradations.

Figure 5.10: RTM versus TinySTM Performance for STAMP Benchmarks. Execution time normalized to sequential runs (lower is better) for 1, 2, 4, and 8 threads across bayes, genome, intruder, kmeans, labyrinth, ssca2, vacation, and yada.

Figure 5.11: RTM versus TinySTM Energy Expenditure for STAMP Benchmarks. Energy normalized to sequential runs (lower is better) for 1, 2, 4, and 8 threads across the same benchmarks.

Abort Type                     Description
Data-conflict/Read-capacity    Conflict aborts and read-set capacity aborts
Write-capacity                 Write-set capacity aborts
Lock                           Conflict and explicit aborts caused by serialization locks
Misc3                          Unsupported instruction aborts a
Misc5                          Aborts due to none of the previous categories b

a includes explicit aborts and aborts due to page fault/page table modification
b interrupts, etc.

Table 5.3: Intel RTM Abort Types

Fig. 5.12 shows the overall abort rates for all benchmarks, including the contributions of different abort types. Based on our observations of hardware counter values, the current RTM implementation does not seem to distinguish between data-conflict aborts and aborts caused by read-set evictions from the L3 cache, and thus both phenomena are reported as conflict aborts. When a thread incurs the maximum number of failed transactions and acquires the lock in the fallback path, it forces all currently running transactions to abort. We term this a lock abort. These aborts are reported either as conflict aborts, explicit aborts (i.e., deliberately triggered by the application code), or both (i.e., the machine increments multiple counters). Lock aborts are specific to the fallback mechanism we use in our experiments. Other fallback mechanisms that do not use serialization locks within transactions (they can be employed in non-transactional code) do not incur such aborts. Note that avoiding lock aborts does not necessarily result in better performance, since the lock aborts mask other types of aborts (i.e., aborts that would have occurred subsequently). This can be seen in the abort contributions shown in the figure. As applications are scaled, the fraction of aborts caused by locks increases because every acquisition potentially triggers N-1 lock aborts (where N is the number of threads).

The RTM_RETIRED:ABORTED_MISC3 performance counter reports aborts due to events like issuing unsupported instructions, page faults, and page table modifications. The RTM_RETIRED:ABORTED_MISC5 counter includes miscellaneous aborts not categorized elsewhere, such as aborts caused by interrupts. Table 5.3 gives an overview of these abort types. In addition to these counters, three more performance counters represent categorized abort numbers: RTM_RETIRED:ABORTED_MISC1 counts aborts due to memory events like data conflicts and capacity overflows; RTM_RETIRED:ABORTED_MISC2 counts aborts due to uncommon conditions; and RTM_RETIRED:ABORTED_MISC4 counts aborts due to incompatible memory types (e.g., due to cache bypassing or I/O accesses). In our experiments, RTM_RETIRED:ABORTED_MISC4 counts are always less than 20, which we attribute to hardware error (as per the Intel specification update [131]). In all our experiments, RTM_RETIRED:ABORTED_MISC2 is zero.
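These abort counters can be sampled through the perf_event interface; the sketch below programs one of them via libpfm4, in the spirit of our modified task example. It is a minimal illustration rather than our measurement harness, and the event name is the libpfm4 spelling used above, which may differ between library versions.

```c
/* Build with: gcc -O3 rtm_abort_counter.c -lpfm */
#include <perfmon/pfmlib_perf_event.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

/* Program one RTM abort counter for the calling thread via libpfm4. */
static int open_counter(const char *name)
{
    struct perf_event_attr attr;
    pfm_perf_encode_arg_t arg;
    memset(&attr, 0, sizeof(attr));
    memset(&arg, 0, sizeof(arg));
    arg.attr = &attr;
    arg.size = sizeof(arg);
    if (pfm_get_os_event_encoding(name, PFM_PLM3, PFM_OS_PERF_EVENT, &arg) != PFM_SUCCESS)
        return -1;
    attr.disabled = 1;
    /* count for this thread (pid 0), on any CPU, no group, no flags */
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    if (pfm_initialize() != PFM_SUCCESS)
        return 1;
    int fd = open_counter("RTM_RETIRED:ABORTED_MISC3");
    if (fd < 0) {
        fprintf(stderr, "could not program abort counter\n");
        return 1;
    }
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    /* ... run the transactional region of interest here ... */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t aborts = 0;
    if (read(fd, &aborts, sizeof(aborts)) == sizeof(aborts))
        printf("MISC3 aborts: %llu\n", (unsigned long long)aborts);
    close(fd);
    return 0;
}
```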

Figure 5.12: RTM Abort Distributions for STAMP Benchmarks

5.4 Related Work

Hardware transactional memory systems must track memory updates within transactions and detect conflicts (read-write, write-read, or write-write conflicts across concurrent transactions, or non-transactional writes to active locations within transactions) at the time of access. The choice of where to buffer speculative memory modifications has microarchitectural ramifications, and commercial implementations naturally strive to minimize modifications to the cores and on-chip memory hierarchies on which they are based. For instance, Blue Gene/Q [126] tracks updates in the 32MB L2 cache, and the IBM System z [127] series and the canceled Sun Rock [124] track updates in their store queues. Like the Haswell RTM implementation that we study here, Azul's Vega Java compute appliance [125] uses the L1 cache to record speculative writes. The size of transactions that can benefit from such hardware TM support depends on the capacity of the chosen buffering scheme. Like us, others have found that rewriting software to be more transaction-friendly improves hardware TM effectiveness [125].

Previous studies investigate the characteristics of hardware transactional memory systems. Wang et al. [126] use the STAMP benchmarks to evaluate hardware transactional memory support on Blue Gene/Q, finding that the largest source of TM overhead is loss of cache locality from bypassing or flushing the L1 cache. Schindewolf et al. [132] analyze the performance of the TM subsystem in Blue Gene/Q using the CLOMP-TM benchmark. Yoo et al. [128] use the STAMP benchmarks to compare Haswell RTM with the TL2 software transactional memory system [133], finding significant performance differences between TL2 and RTM. We perform a similar study and find that TinySTM consistently outperforms TL2, and thus we choose the former as our STM point of comparison. Our RTM scaling results for the STAMP benchmarks concur with theirs.

Wang et al. [134] evaluate RTM performance for concurrent skip list scalability, comparing against competing synchronization mechanisms like fine-grained locking and lock-free linked lists. They use the Intel RTM emulator to model up to 40 cores, corroborating results for one to eight cores with Haswell hardware experiments. Like us, they highlight RTM performance limitations due to capacity and conflict misses and propose programmer actions that can improve RTM performance.

Others have also studied power/performance trade-offs for TM systems. For instance, Gaona et al. [135] perform a simulation-based energy characterization study of two HTM systems: the LogTM-SE Eager-Eager system [120] and the Scalable TCC Lazy-Lazy system [136]. Ferri et al. [137] estimate the performance and energy implications of using TM in embedded multiprocessor systems-on-chip (MPSoCs), providing detailed energy distribution figures from their energy models.

In contrast to the work presented here, none of these studies analyzes energy expenditure for a commercial hardware implementation.

5.5 Conclusions

The Restricted Transactional Memory support available in the Intel Haswell microarchitecture makes programming with transactions more accessible to parallel computing researchers and practitioners. In this study, we compare RTM and TinySTM, a software transactional memory implementation, in terms of performance and energy. We highlight RTM's hardware limitations and quantify their effects on application behavior, finding that performance degrades for workloads with large working sets and long transactions. Enabling hyper-threading worsens RTM performance due to resource sharing at the L1 level. We give details about the sources of aborts in a TM application and a way to quantify these aborts. Using the knowledge presented in this thesis, parallel programmers can optimize TM applications to better utilize the Haswell support for RTM.

6 Conclusion

The significance of energy-efficient computing is growing because of concerns regarding the ecological footprint of the IT sector and the economic effect of operating computers with high power consumption. Feedback from the system about its power and/or energy usage is required to facilitate energy-efficient computing. This thesis proposes methodologies for power measurement and power modeling that can be used by the system and software to enable power-aware decision making. Alternatively, the information about power consumption can be collected, analyzed, and correlated with other performance events to characterize various aspects of the system. This characterization data can be used to determine the energy efficiency of the system, make quantitative comparisons between competing hardware or software designs, or provide valuable information to formulate offline/online system policies.

This thesis first analyzes different techniques to measure power consumption and provides a quantitative comparison between these techniques. We sample the power consumption at three different points in the system: the wall outlet, the ATX power rails, and directly on the motherboard. Each successive technique is more intrusive. We compare the measurements at these points for accuracy, sensitivity, and temporal granularity. The power measurement at the wall outlet is the most accessible and requires the cheapest hardware but suffers from a lack of granularity.

We find that measuring power consumption at the ATX power rails provides the best balance between accuracy, accessibility, and granularity. We test the accuracy of Intel's RAPL energy counter against power measurements at the ATX power rails and find that, although the RAPL values are fairly accurate, they exhibit irregularities in temporal granularity and comparatively lower sensitivity to temperature. Based upon this comparison, we conclude that the RAPL energy counter is a good choice for measuring the energy of an application that runs for at least a few tens of milliseconds, but for instantaneous power trends, actual power measurements are superior to the values reported by RAPL.

This thesis proposes two methodologies for estimating the power consumption of the chip. The first is a statistical model that uses correlation to select performance events and piece-wise multiple linear regression analysis to assign weights to the selected performance events and temperature values to estimate the power consumption. This modeling methodology is well suited for portability across multiple platforms, which we prove by validating the model across six different platforms. We demonstrate the high accuracy of the model across all six platforms (overall median errors per machine between 1.2% and 4.4%). The linear regression equation used for estimating the power in this methodology is suitable for estimating power consumption at runtime. We show its effectiveness for power budgeting by incorporating the model in a user-level meta scheduler. However, this model has a few limitations. First, it has limited decomposability: although it provides power consumption values at the granularity of each individual core, it does not estimate separate power values for the processor uncore. Second, this model does not distinguish between static and dynamic power. As a result, the model has to be retrained when the voltage-frequency operating point of the core is changed. To address these limitations, we propose a second power modeling methodology that uses a mix of analytical and statistical approaches to model static and dynamic power consumption for the processor uncore and individual cores. We validate this model for multi-threaded and single-threaded benchmark suites across different voltage-frequency operating points on the Haswell microarchitecture and show that the model exhibits good accuracy at all levels of parallelism and all frequencies (mean error of 3.14% across all experiments).

We use the power model developed using the second approach to characterize the energy efficiency of frequency scaling on Haswell. Our characterization study shows that uncore idle energy can be a significant source of energy inefficiency on Haswell, with uncore static energy contributing up to 61% of total energy expenditure in some cases. As a result of this energy inefficiency, lowering system frequency does not always lower total energy expenditure, and the frequency at which the lowest energy is expended depends on the memory-bound fraction of the application. We use this information to develop a low-overhead DVFS scheduler that uses the memory-bound fraction calculated at runtime to perform a simple look-up in a table created offline and thereby choose the most energy-efficient frequency for the current phase of the application.
Based upon our experimental results, we conclude that this simplistic DVFS scheduler is almost as effective as some of the more complex schedulers suggested by prior works in choosing energy-efficient frequencies for different phases of an application.

We next use the power model to characterize the energy efficiency of thread scaling. We identify four aspects of communication — serial fraction, core-to-core communication overhead, bandwidth contention, and locking mechanism overhead — that can be varied orthogonally to study their effects on the efficiency of thread scaling. We demonstrate how these parameters affect the core/uncore static and dynamic energy trends as the application is scaled from one to more threads.

We next characterize Intel's restricted transactional memory implementation, called RTM, and compare it with TinySTM, a software transactional memory implementation, in terms of performance and energy. We quantify RTM's limitations and identify the hardware implementation details that result in these limitations. We conclude that RTM can provide better performance gains than STM implementations if the parallel programmer is aware of these hardware limitations. For example, since the RTM write-set size is limited by the size of the L1 data cache, enabling hyper-threading almost always results in performance degradation due to halving of the write-set size. Similarly, since RTM detects contention at cache-block granularity while STM detects contention at word granularity, for the same level of contention, RTM may experience more aborts. But when contention is low, RTM outperforms and is more energy efficient than STM implementations. Parallel programmers can use the performance and energy characterization data presented in this thesis to make efficient use of RTM.

The work done as part of this thesis can be leveraged to enable future research in multiple directions. The energy characterization on Haswell shows that the power consumed by the idle uncore plays an important role in influencing the energy trends observed during frequency and thread scaling. On a processor that implements more or less aggressive power-saving techniques, we expect to see different energy trends. Our methodology for modeling static and dynamic power consumption can be used to quantitatively compare the energy trends across different microarchitectures.

Most of the current research on thread scaling is performance centric, and energy savings are generally a side-effect of reduction in execution time. But our analysis shows that energy trends for thread scaling do not necessarily match performance trends. Runtime systems or task schedulers that prioritize reducing energy expenditure over increasing performance can use the insights gained from the thread-scaling energy characterization done as part of this thesis to formulate energy-aware scaling policies.

The transactional memory characterization shows that cross-over points exist where it becomes more energy-efficient to use HTM over STM or vice versa. The data gathered from the TM characterization can be employed to formulate energy-efficient hybrid TM systems. Such a TM system can employ STM in the fall-back path of RTM and force RTM to abort before the retry threshold is reached if the collected metrics indicate that STM will be more energy-efficient for the current atomic block.

Bibliography

[1] N. Oreskes, “The scientific consensus on climate change,” Science, vol. 306, no. 5702, pp. 1686–1686, 2004. [2] Ryan Heath and Linda Cain, “Digital Agenda: global tech sector measures its carbon footprint,” http://europa.eu/rapid/press-release_ IP-13-231_en.htm, 2013. [3] S. Murugesan, “Harnessing green it: Principles and practices,” IEEE IT Profes- sional, vol. 10, no. 1, pp. 24–33, 2008. [4] Gary Cook and Jodie Van Horn, “How Dirty is your Data,” A Look at the Energy Choices That Power Cloud Computing, 2011. [5] J. Whitney and P. Delforge, “Data center efficiency assessment,” Natural Re- sources Defense Council, New York City, New York, 2014. [6] V. Khandelwal and A. Srivastava, “Leakage control through fine-grained place- ment and sizing of sleep transistors,” Computer-Aided Design of Integrated Cir- cuits and Systems, IEEE Transactions on, vol. 26, no. 7, pp. 1246–1255, 2007. [7] J. T. Kao and A. P. Chandrakasan, “Dual-threshold voltage techniques for low- power digital circuits,” Solid-State Circuits, IEEE Journal of, vol. 35, no. 7, pp. 1009–1018, 2000. [8] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, and V. De, “Dy- namic sleep transistor and body bias for active leakage power control of micropro- cessors,” Solid-State Circuits, IEEE Journal of, vol. 38, no. 11, pp. 1838–1845, 2003. [9] A. Abdollahi, F. Fallah, and M. Pedram, “Leakage current reduction in cmos vlsi circuits by input vector control,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 12, no. 2, pp. 140–154, 2004. [10] J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Ap- proach, Morgan Kaufmann Publishers, Inc., 1st edition, 1990.


[11] Peter Greenhalgh, “Big. little processing with arm cortex-a15 & cortex-a7,” ARM White paper, pp. 1–8, 2011. [12] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, “Interconnect-power dissipa- tion in a ,” in Proceedings of the 2004 international workshop on System level interconnect prediction. ACM, 2004, pp. 7–13. [13] Parthasarathy Ranganathan, Sarita Adve, and Norman Jouppi, “Reconfigurable caches and their application to media processing,” in Proc. 27th IEEE/ACM In- ternational Symposium on Computer Architecture, 2000, pp. 214–224. [14] A. Settle, D.A. Connors, E. Gibert, and A. Gonzalez, “A dynamically reconfig- urable cache for multithreaded processors,” Journal of Embedded Computing: Special Issue on Single-Chip Multi-core Architectures, vol. 1, no. 1, pp. 221–233, Dec. 2005. [15] Adria Armejach, Azam Seydi, Rubén Titos-Gil, Ibrahim Hur, Adrián Cristal, Os- man Unsal, and Mateo Valero, “Using a reconfigurable l1 data cache for efficient version management in hardware transactional memory,” in Proc. 20th Interna- tional Conference on Parallel Architectures and Compilation Techniques, 2011. [16] R. Balasubramonian, D.H. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas, “Memory hierarchy reconfiguration for energy and performance in general- purpose processor architectures,” in Proc. IEEE/ACM 33nd International Sym- posium on Microarchitecture, Dec. 2000, pp. 245–257. [17] G. Keramidas, C. Datsios, and S. Kaxiras, “A framework for efficient cache resiz- ing,” in Embedded Computer Systems (SAMOS), 2012 International Conference on. IEEE, 2012, pp. 76–85. [18] S. Kaxiras, Z. Hu, and M. Martonosi, “Cache decay: Exploiting generational behavior to reduce cache leakage power,” in Proc. 28th IEEE/ACM International Symposium on Computer Architecture, June 2001, pp. 240–251. [19] N.S. Kim, K. Flautner, D. Blaauw, and T. Mudge, “Drowsy instruction caches: Leakage power reduction using dynamic voltage scaling and cache sub-bank pre- diction,” in Proc. IEEE/ACM 36th International Symposium on Microarchitec- ture, Nov. 2002, pp. 219–230. [20] Hewlett Packard, Intel, Microsoft, Phoenix, and Toshiba, Advanced Configura- tion and Power Interface Specification, 4.0a edition, apr 2010. [21] L. A. Barroso and U. Hölzle, “The case for energy-proportional computing,” Computer, , no. 12, pp. 33–37, 2007. [22] Daniel Wong and Murali Annavaram, “Knightshift: Scaling the energy propor- tionality wall through server-level heterogeneity,” in Microarchitecture (MICRO), BIBLIOGRAPHY 132

2012 45th Annual IEEE/ACM International Symposium on. IEEE, 2012, pp. 119– 130. [23] M. Kandemir, N. Vijaykrishnan, and M. J. Irwin, “Compiler optimizations for low power systems,” in Power aware computing, pp. 191–210. Springer, 2002. [24] B. Rountree, D. H. Ahn, S. De R. Bronis, D. K. Lowenthal, and M. Schulz, “Beyond dvfs: A first look at performance under a hardware-enforced power bound,” in Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International. IEEE, 2012, pp. 947–953. [25] C. Lefurgy, X. Wang, and M. Ware, “Power capping: a prelude to power shifting,” Cluster Computing, vol. 11, no. 2, pp. 183–195, 2008. [26] Jason Cong and Bo Yuan, “Energy-efficient scheduling on heterogeneous multi- core architectures,” in Proceedings of the 2012 ACM/IEEE international sympo- sium on Low power electronics and design. ACM, 2012, pp. 345–350. [27] T. S. Muthukaruppan, M. Pricopi, V. Venkataramani, T. Mitra, and S. Vishin, “Hierarchical power management for asymmetric multi-core in dark silicon era,” in Proceedings of the 50th Annual Design Automation Conference. ACM, 2013, p. 174. [28] Q. Tang, S.K.S. Gupta, and G. Varsamopoulos, “Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach,” IEEE Transactions on Parallel and Distributed Sys- tems, vol. 19, no. 11, pp. 1458–1472, 2008. [29] T. Mukherjee, A. Banerjee, G. Varsamopoulos, S.K.S. Gupta, and S. Rungta, “Spatio-temporal thermal-aware job scheduling to minimize energy consumption in virtualized heterogeneous data centers,” Elsevier Computer Networks, vol. 53, no. 17, pp. 2888–2904, 2009. [30] G. Von Laszewski, L. Wang, A.J. Younge, and X. He, “Power-aware scheduling of virtual machines in DVFS-enabled clusters,” in IEEE International Conference on Cluster Computing, 2009, pp. 1–10. [31] Josep Ll Berral, Íñigo Goiri, Ramón Nou, Ferran Julià, Jordi Guitart, Ricard Gavaldà, and Jordi Torres, “Towards energy-aware scheduling in data centers using machine learning,” in Proc. the 1st International Conference on energy- Efficient Computing and Networking. ACM, 2010, pp. 215–224. [32] D. Jenkins, “Node Manager — a dynamic approach to managing power in the data center,” White Paper, Intel Corporation. http://communities. intel.com/docs/DOC-4766, Jan. 2010. BIBLIOGRAPHY 133

[33] C. Isci and M. Martonosi, “Runtime power monitoring in high-end processors: Methodology and empirical data,” in Proc. IEEE/ACM 36th Annual International Symposium on Microarchitecture, 2003, pp. 93–104. [34] K. Rajamani, H. Hanson, J. Rubio, S. Ghiasi, and F. Rawson, “Application-aware power management,” in Proc. IEEE International Symposium on Performance Analysis of Systems and Software, Oct. 2006, pp. 39–48. [35] Z. Cui, Y. Zhu, Y. Bao, and M. Chen, “A fine-grained component-level power measurement method,” in Proc. 2nd International Green Computing Conference, July 2011, pp. 1–6. [36] V. Spiliopoulos, S. Kaxiras, and G. Keramidas, “Green governors: A framework for continuously adaptive DVFS,” in Proc. 2nd International Green Computing Conference, July 2011, pp. 1–8. [37] Krishna Rangan, Gu-Yeon Wei, and David Brooks, “Thread motion: Fine-grained power management for multi-core systems,” in Proc. 36th IEEE/ACM Interna- tional Symposium on Computer Architecture, June 2009. [38] M. Banikazemi, D. Poff, and B. Abali, “PAM: A novel performance/power aware meta-scheduler for multi-core systems,” in Proc. IEEE/ACM Supercomputing International Conference on High Performance Computing, Networking, Storage and Analysis, Nov. 2008, number 39. [39] C. Isci, A. Buyuktosunoglu, C.Y. Cher, P. Bose, and M. Martonosi, “An analysis of efficient multi-core global power management policies: Maximizing perfor- mance for a given power budget,” in Proc. IEEE/ACM 39th Annual International Symposium on Microarchitecture, Dec. 2006, pp. 347–358. [40] Ke Meng, Russ Joseph, Robert P. Dick, and Li Shang, “Multi-optimization power management for chip multiprocessors,” in Proc. the 17th international conference on Parallel architectures and compilation techniques, 2008, pp. 177–186. [41] Kyong Hoon Kim, Rajkumar Buyya, and Jong Kim, “Power aware scheduling of bag-of-tasks applications with deadline constraints on DVS-enabled Clusters,” in CCGRID, 2007, vol. 7, pp. 541–548. [42] D. Economou, S. Rivoire, C. Kozyrakis, and P. Ranganathan, “Full-system power analysis and modeling for server environments,” in Proc. Workshop on Modeling, Benchmarking, and Simulation, June 2006. [43] Xin Zhan and Sherief Reda, “Techniques for energy-efficient power budgeting in data centers,” in Proc. the 50th Annual Design Automation Conference, 2013, pp. 176:1–176:7. BIBLIOGRAPHY 134

[44] E. Rotem, A. Naveh, D. Rajwan, A. Ananthadrishnan, and E Weissmann, “Power management architecture of the 2nd generation Intel R CoreTM microarchitecture, formerly codenamed Sandy Bridge,” in Proc. 23rd HotChips Symposium on High Performance Chips, Aug. 2011. [45] Advanced Micro Devices, “AMD OpteronTM 6200 series processors Linux tun- ing guide,” 2012, http://developer.amd.com/wordpress/media/ 2012/10/51803A_OpteronLinuxTuningGuide_SCREEN.pdf. [46] NVIDIA, Nvidia NVML API Reference Manual, 2012. [47] Abhinav Pathak, Y Charlie Hu, Ming Zhang, Paramvir Bahl, and Yi-Min Wang, “Fine-grained power modeling for smartphones using system call tracing,” in Proc. ACM SIGOPS/EuroSys European Conference on Computer Systems, Apr. 2011, pp. 153–168. [48] R. Joseph and M. Martonosi, “Run-time power estimation in high-performance ,” in Proc. IEEE/ACM International Symposium on Low Power Electronics and Design, Aug. 2001, pp. 135–140. [49] F. Bellosa, S. Kellner, M. Waitz, and A. Weissel, “Event-driven energy account- ing for dynamic thermal management,” in Proc. Workshop on Compilers and Operating Systems for Low Power, Sept. 2003. [50] R. Bertran, M. Gonzalez, X. Martorell, N. Navarro, and E. Ayguade, “Decom- posable and responsive power models for multicore processors using performance counters,” in Proc. 24th ACM International Conference on Supercomputing, June 2010, pp. 147–158. [51] G. Contreras and M. Martonosi, “Power prediction for Intel XScale processors using performance monitoring unit events,” in Proc. IEEE/ACM International Symposium on Low Power Electronics and Design, Aug. 2005, pp. 221–226. [52] K. Singh, M. Bhadauria, and S.A. McKee, “Real time power estimation and thread scheduling via performance counters,” in Proc. Workshop on Design, Ar- chitecture and Simulation of Chip Multi-Processors, Nov. 2008. [53] R. Ren, J.E. Sanz, and M. Raulet, “System-level PMC-driven energy estimation models in RVC-CAL video codec specifications,” in Proc. IEEE Conference on Design and Architectures for Signal and Image Processing, Oct. 2013, pp. 8–10. [54] B.C. Lee and D.M. Brooks, “Accurate and efficient regression modeling for mi- croarchitectural performance and power prediction,” in Proc. 12th ACM Sympo- sium on Architectural Support for Programming Languages and Operating Sys- tems, Oct. 2006, pp. 185–194. BIBLIOGRAPHY 135

[55] V. Jimenez, CF. azorla, R. Gioiosa, E. Kursun, C. Isci, A. Buyuktosunoglu, P. Bose, and M. Valero, “Energy-aware accounting and billing in large-scale computing facilities,” IEEE Micro, vol. 31, no. 3, pp. 60–71, 2011. [56] Jeonghwan Choi, Chen-Yong Cher, Hubertus Franke, Henrdrik Hamann, Alan Weger, and Pradip Bose, “Thermal-aware task scheduling at the system software level,” in Proc. the 2007 international symposium on Low power electronics and design. ACM, 2007, pp. 213–218. [57] Ayse Kivilcim Coskun, Tajana Simunic Rosing, and Keith Whisnant, “Temper- ature aware task scheduling in mpsocs,” in Proc. the conference on Design, au- tomation and test in Europe. EDA Consortium, 2007, pp. 1659–1664. [58] Abhinav Pathak, Y. Charlie Hu, and Ming Zhang, “Where is the energy spent inside my app?: Fine grained energy accounting on smartphones with eprof,” in Proc. the 7th ACM European Conference on Computer Systems, 2012, EuroSys ’12, pp. 29–42. [59] K.S. Banerjee and E. Agu, “PowerSpy: fine-grained software energy profiling for mobile devices,” in Proc. International Conference on Wireless Networks, Communications and Mobile Computing, 2005, vol. 2, pp. 1136–1141 vol.2. [60] J. Flinn and M. Satyanarayanan, “Powerscope: A tool for profiling the energy usage of mobile applications,” in Proc. 2nd IEEE Workshop on Mobile Computing Systems and Applications, Feb. 1999, pp. 2–10. [61] Y.S. Shao and D. Brooks, “Energy characterization and instruction-level energy model of intel’s xeon phi processor,” in Proc. 2013 IEEE International Sympo- sium on Low Power Electronics and Design (ISLPED), 2013, pp. 389–394. [62] Ramon Bertran, Alper Buyuktosunoglu, Meeta S. Gupta, Marc Gonzalez, and Pradip Bose, “Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks,” in Proc. the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2012, MICRO-45, pp. 199–211, IEEE Computer Society. [63] T. Mukherjee, G. Varsamopoulos, S. K S Gupta, and S. Rungta, “Measurement- based power profiling of data center equipment,” in 2007 IEEE International Conference on Cluster Computing, 2007, pp. 476–477. [64] Epifanio Gaona, Rubén Titos-Gil, Juan Fernandez, and Manuel E. Acacio, “Char- acterizing energy consumption in hardware transactional memory systems,” in Proc. 22nd International Symposium on Computer Architecture and High Perfor- mance Computing (SBAC-PAD), 2010, pp. 9–16. BIBLIOGRAPHY 136

[65] Y. Guo, S. Chheda, I. Koren, C. M. Krishna, and C. A. Moritz, “Energy Char- acterization of Hardware-Based Data Prefetching,” in Proc. IEEE International Conference on Computer Design, 2004, pp. 518–523. [66] Marc Gamell, Ivan Rodero, Manish Parashar, Janine C. Bennett, Hemanth Kolla, Jacqueline Chen, Peer-Timo Bremer, Aaditya G. Landge, Attila Gyulassy, Patrick McCormick, Scott Pakin, Valerio Pascucci, and Scott Klasky, “Exploring power behaviors and trade-offs of in-situ data analytics,” in Proc. SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, 2013, SC ’13, pp. 77:1–77:12. [67] John C. McCullough, Yuvraj Agarwal, Jaideep Chandrashekar, Sathyanarayan Kuppuswamy, Alex C. Snoeren, and Rajesh K. Gupta, “Evaluating the effective- ness of model-based power characterization,” in Proc. the 2011 USENIX Confer- ence on USENIX Annual Technical Conference, 2011, pp. 12–12. [68] K. Singh, Prediction Strategies for Power-Aware Computing on Multicore Pro- cessors, Ph.D. thesis, Cornell University, June 2009. [69] B. Goel, S.A. McKee, R. Gioiosa, K. Singh, M. Bhadauria, and M. Cesati, “Portable, scalable, per-core power estimation for intelligent resource manage- ment,” in Proc. 1st International Green Computing Conference, Aug. 2010, pp. 135–146. [70] C. Sun, L. Shang, and R.P. Dick, “Three-dimensional multiprocessor system-on- chip thermal optimization,” in Proc. 5th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis, Sept. 2007, pp. 117–122. [71] E. Stahl, “Power Benchmarking: A new methodology for analyzing performance by applying energy efficiency metrics,” White Paper, IBM, 2006. [72] Standard Performance Evaluation Corporation, “SPECpower_ssj2008 benchmark suite,” http://www.spec.org/power_ssj2008/, 2008. [73] B. Goel, R. Titos-Gil, A. Negi, S.A. McKee, and P. Stenstrom, “Performance and energy analysis of the restricted transactional memory implementation on haswell,” in Proc. Proc. International Parallel and Distributed Processing Sym- posium, 2014, To appear. [74] Marcus Hähnel, Björn Döbel, Marcus Völp, and Hermann Härtig, “Measuring energy consumption for short code paths using RAPL,” SIGMETRICS Perform. Eval. Rev., vol. 40, no. 3, pp. 13–17, Jan. 2012. [75] Electronic Educational Devices, “Watts Up PRO,” http://www. wattsupmeters.com/, May 2009. BIBLIOGRAPHY 137

[76] X. Wang and M. Chen, “Cluster-level feedback power control for performance optimization,” in Proc. 14th IEEE International Symposium on High Performance Computer Architecture, Feb. 2008, pp. 101–110. [77] D. Bedard, M. Y. Lim, R. Fowler, and A. Porterfield, “Powermon: Fine-grained and integrated power monitoring for commodity computer systems,” in Proc. IEEE SoutheastCon 2010 (SoutheastCon), Mar. 2010, pp. 479–484. [78] Server System Infrastructure Forum, EPS12V Power Supply Design Guide, Dell, HP, SGI, and IBM, 2.92 edition, 2006. [79] LEM Corporation, “Intel current transducer LTS 25-NP,” Datasheet, LEM, Nov. 2009. [80] National Instruments Corporation, “NI bus-powered m series multifunction daq for usb,” http://sine.ni.com/ds/app/doc/p/id/ds-9/lang/en, Apr. 2009. [81] Intel Corporation, “Voltage regulator-down (VRD) 11.1,” Design Guidelines, Intel Corporation, Sept. 2009. [82] International Rectifier, “Digital multi-phase buck controller chl8316, v1.02,” http://www.irf.com/product-info/datasheets/data/ pb-chl8316.pdf, Oct. 2012, [Online; accessed 10-May-2016]. [83] D. Hackenberg, R. Schone, T. Ilsche, D. Molka, J. Schuchart, and R. Geyer, “An energy efficiency feature survey of the intel haswell processor,” in Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE Inter- national. IEEE, 2015, pp. 896–904. [84] Intel, Intel Architecture Software Developer’s Manual: System Programming Guide, June 2013. [85] D. Hackenberg, T. Ilsche, R. Schone, D. Molka, M. Schmidt, and W.E. Nagel, “Power measurement techniques on standard compute nodes: A quantitative com- parison,” in Proc. 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013, pp. 194–204. [86] R. Ge, X. Feng, S. Song, H. Chang, D. Li, and K. W. Cameron, “Powerpack: En- ergy profiling and analysis of high-performance systems and applications,” Par- allel and Distributed Systems, IEEE Transactions on, vol. 21, no. 5, pp. 658–671, 2010. [87] M.S. Floyd, S. Ghiasi, T.W. Keller, K. Rajamani, F.L. Rawson, J.C. Rubio, and M.S. Ware, “System power management support in the IBM POWER6 micropro- cessor,” IBM Journal of Research and Development, vol. 51, no. 6, pp. 733–746, 2007. BIBLIOGRAPHY 138

[88] H.-Y. McCreary, M. A. Broyles, M. S. Floyd, A. J. Geissler, S. P. Hartman, F. L. Rawson, T. J. Rosedahl, J. C. Rubio, and M. S. Ware, “EnergyScale for IBM POWER6 microprocessor-based systems,” IBM Journal of Research and Development, vol. 51, no. 6, pp. 775–786, 2007.
[89] V. M. Weaver and S. A. McKee, “Can hardware performance counters be trusted?,” in Proc. IEEE International Symposium on Workload Characterization, Sept. 2008, pp. 141–150.
[90] D. Zaparanuks, M. Jovic, and M. Hauswirth, “Accuracy of performance counter measurements,” Tech. Rep. USI-TR-2008-05, Università della Svizzera italiana, Sept. 2008.
[91] V. M. Weaver and J. Dongarra, “Can hardware performance counters produce expected, deterministic results,” in Proc. 3rd Workshop on Functionality of Hardware Performance Monitoring, Dec. 2010.
[92] Intel, Intel 64 and IA-32 Architectures Software Developer’s Manual, May 2013.
[93] N. S. Kim, T. Austin, D. Baauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan, “Leakage current: Moore’s law meets static power,” IEEE Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.
[94] M. Berktold and T. Tian, “CPU monitoring with DTS/PECI,” White Paper 322683, Intel Corporation, http://download.intel.com/design/intarch/papers/322683.pdf, Sept. 2010.
[95] C. Spearman, “The proof and measurement of association between two things,” The American Journal of Psychology, vol. 15, no. 1, pp. 72–101, Jan. 1904.
[96] Standard Performance Evaluation Corporation, “SPEC CPU benchmark suite,” http://www.specbench.org/osg/cpu2006/, 2006.
[97] Standard Performance Evaluation Corporation, “SPEC OMP benchmark suite,” http://www.specbench.org/hpg/omp2001/, 2001.
[98] V. Aslot and R. Eigenmann, “Performance characteristics of the SPEC OMP2001 benchmarks,” in Proc. European Workshop on OpenMP, Sept. 2001.
[99] D. H. Bailey, T. Harris, W. C. Saphir, R. F. Van der Wijngaart, A. C. Woo, and M. Yarrow, “The NAS parallel benchmarks 2.0,” Report NAS-95-020, NASA Ames Research Center, Dec. 1995.
[100] S. Eranian, “Perfmon2: A flexible performance monitoring interface for Linux,” in Proc. 2006 Ottawa Linux Symposium, July 2006, pp. 269–288.
[101] C. Boneti, R. Gioiosa, F. J. Cazorla, and M. Valero, “A dynamic scheduler for balancing HPC applications,” in Proc. IEEE/ACM Supercomputing International Conference on High Performance Computing, Networking, Storage and Analysis, Nov. 2008, number 41.
[102] E. Betti, M. Cesati, R. Gioiosa, and F. Piermaria, “A global operating system for HPC clusters,” in Proc. IEEE International Conference on Cluster Computing, Aug. 2009, p. 6.
[103] Advanced Micro Devices, AMD Athlon Processor Model 6 Revision Guide, 2003.
[104] V. Tiwari, S. Malik, A. Wolfe, and M. T.-C. Lee, “Instruction level power analysis and optimization of software,” Kluwer Journal of VLSI Design, vol. 13, no. 3, pp. 223–238, 1996.
[105] J. T. Russell and M. F. Jacome, “Software power estimation and optimization for high-performance 32-bit embedded processors,” in Proc. IEEE International Conference on Computer Design, Oct. 1998, pp. 328–333.
[106] T. Cignetti, K. Komarov, and C. S. Ellis, “Energy estimation tools for the Palm,” in Proc. ACM International Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Aug. 2000, pp. 96–103.
[107] A. Merkel and F. Bellosa, “Balancing power consumption in multicore processors,” in Proc. ACM SIGOPS/EuroSys European Conference on Computer Systems, Apr. 2006, pp. 403–414.
[108] M. Moudgill, P. Bose, and J. Moreno, “Validation of Turandot, a fast processor model for microarchitecture exploration,” in Proc. International Performance, Computing, and Communications Conference, Feb. 1999, pp. 452–457.
[109] B. Su, J. Gu, L. Shen, W. Huang, J. L. Greathouse, and Z. Wang, “PPEP: Online performance, power, and energy prediction framework and DVFS space exploration,” in Proc. IEEE/ACM 47th Annual International Symposium on Microarchitecture, Dec. 2014, pp. 445–457.
[110] R. Ge, X. Feng, W. Feng, and K. W. Cameron, “CPU MISER: A performance-directed, run-time system for power-aware clusters,” in Proc. International Conference on Parallel Processing, Sept. 2007, pp. 18–26.
[111] A. Marathe, P. E. Bailey, D. Lowenthal, B. Rountree, M. Schulz, and B. R. de Supinski, “A run-time system for power-constrained HPC applications,” in High Performance Computing. Springer, 2015, pp. 394–408.
[112] T. Patki, D. K. Lowenthal, A. Sasidharan, M. Maiterth, B. Rountree, M. Schulz, and B. R. de Supinski, “Practical resource management in power-constrained, high performance computing,” in Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing. ACM, 2015, pp. 121–132.
[113] J. Mair, D. Eyers, Z. Huang, and H. Zhang, “Myths in power estimation with Performance Monitoring Counters,” Elsevier Sustainable Computing: Informatics and Systems, vol. 4, no. 2, pp. 83–93, June 2014.
[114] Sungpack Hong, Tayo Oguntebi, Jared Casper, Nathan Bronson, Christos Kozyrakis, and Kunle Olukotun, “EigenBench: A simple exploration tool for orthogonal TM characteristics,” in Proc. IEEE International Symposium on Workload Characterization, 2010, pp. 1–11.
[115] B. Goel, S. A. McKee, and M. Själander, Techniques to Measure, Model, and Manage Power, vol. 87 of Advances in Computers, chapter 2, pp. 7–54, Elsevier, Nov. 2012.
[116] R. B. Hall, Thin Solid Films, vol. 8, chapter 2, pp. 263–271, Elsevier, Oct. 1971.
[117] W. Huang, C. Lefurgy, W. Kuk, A. Buyuktosunoglu, M. Floyd, K. Rajamani, M. Allen-Ware, and B. Brock, “Accurate fine-grained processor power proxies,” in Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2012, pp. 224–234.
[118] A. Dzhagaryan and A. Milenković, “Impact of thread and frequency scaling on performance and energy in modern multicores: A measurement-based study,” in Proceedings of the 2014 ACM Southeast Regional Conference. ACM, 2014, p. 14.
[119] Maurice Herlihy and J. Eliot B. Moss, “Transactional memory: Architectural support for lock-free data structures,” in Proc. 20th International Symposium on Computer Architecture, 1993, pp. 289–300.
[120] Luke Yen, Jayaram Bobba, Michael R. Marty, Kevin E. Moore, Haris Volos, Mark D. Hill, Michael M. Swift, and David A. Wood, “LogTM-SE: Decoupling hardware transactional memory from caches,” in Proc. 13th Symposium on High-Performance Computer Architecture, 2007, pp. 261–272.
[121] Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun, “Transactional memory coherence and consistency,” in Proc. 31st International Symposium on Computer Architecture, 2004, pp. 102–113.
[122] Marc Lupon, Grigorios Magklis, and Antonio González, “A dynamically adaptable hardware transactional memory,” in Proc. 43rd International Symposium on Microarchitecture, 2010, pp. 27–38.
[123] Rubén Titos-Gil, Anurag Negi, Manuel E. Acacio, Jose M. Garcia, and Per Stenstrom, “Zebra: A data-centric, hybrid-policy hardware transactional memory design,” in Proc. 25th International Conference on Supercomputing, 2011, pp. 53–62.
[124] Dave Dice, Yossi Lev, Mark Moir, and Daniel Nussbaum, “Early experience with a commercial hardware transactional memory implementation,” ACM SIGPLAN Notices, vol. 44, no. 3, pp. 157–168, Mar. 2009.
[125] Cliff Click, “Azul’s experiences with hardware transactional memory,” 2009, http://sss.cs.purdue.edu/projects/tm/tmw2010/talks/Click-2010_TMW.pdf.
[126] Amy Wang, Matthew Gaudet, Peng Wu, Jose Nelson Amaral, Martin Ohmacht, Christopher Barton, Raul Silvera, and Maged Michael, “Evaluation of Blue Gene/Q hardware support for transactional memories,” in Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, 2012, pp. 127–136.
[127] Christian Jacobi, Timothy Slegel, and Dan Greiner, “Transactional memory architecture and implementation for IBM System z,” in Proc. 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012, pp. 25–36.
[128] Richard M. Yoo, Christopher J. Hughes, Konrad Lai, and Ravi Rajwar, “Performance evaluation of Intel transactional synchronization extensions for high-performance computing,” in Proc. SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, 2013, pp. 19:1–19:11.
[129] Chi Cao Minh, JaeWoong Chung, Christos Kozyrakis, and Kunle Olukotun, “STAMP: Stanford transactional applications for multi-processing,” in Proc. IEEE International Symposium on Workload Characterization, 2008, pp. 35–46.
[130] Pascal Felber, Christof Fetzer, Patrick Marlier, and Torvald Riegel, “Time-based software transactional memory,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 12, pp. 1793–1807, 2010.
[131] Intel, Desktop 4th Generation Intel Core™ Processor Family Specification Update, Aug. 2013.
[132] M. Schindewolf, “Performance analysis of and tool support for transactional memory on BG/Q,” Tech. Rep., Lawrence Livermore National Laboratory (LLNL), Livermore, CA, 2011.
[133] David Dice, Ori Shalev, and Nir Shavit, “Transactional Locking II,” in Proc. 19th International Symposium on Distributed Computing, 2006.
[134] Zhaoguo Wang, Hao Qian, Haibo Chen, and Jinyang Li, “Opportunities and pitfalls of multi-core scaling using hardware transaction memory,” in Proc. 4th Asia-Pacific Workshop on Systems, 2013, pp. 3:1–3:7.
[135] E. Gaona-Ramirez, R. Titos-Gil, J. Fernandez, and M. E. Acacio, “Characterizing energy consumption in hardware transactional memory systems,” in Proc. 22nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2010, pp. 9–16.
[136] Hassan Chafi, Jared Casper, Brian D. Carlstrom, Austen McDonald, Chi Cao Minh, Woongki Baek, Christos Kozyrakis, and Kunle Olukotun, “A scalable, non-blocking approach to transactional memory,” in Proc. 13th Symposium on High-Performance Computer Architecture, 2007, pp. 97–108.
[137] C. Ferri, R. I. Bahar, A. Marongiu, L. Benini, M. Herlihy, B. Lipton, and T. Moreshet, “SoC-TM: Integrated HW/SW support for transactional memory programming on embedded MPSoCs,” in Proc. 9th International Conference on Hardware/Software Codesign and System Synthesis, 2011, pp. 39–48.