ON POWER-PROPORTIONAL PROCESSORS

by

Yasuko Watanabe

A dissertation submitted in partial fulfillment of

the requirements for the degree of

Doctor of Philosophy

(Computer Sciences)

at the

UNIVERSITY OF WISCONSIN - MADISON

2011 © Copyright by Yasuko Watanabe 2011

All Rights Reserved Abstract i

Although advancements in technology continue to physically shrink transistors, power reduc- tion has fallen behind transistor scaling. This imbalance results in chips with increasing numbers of transistors that consume more power than the chips of preceding generations. The effort to meet an affordable power budget and still maintain continuous performance improvements cre- ates a need to tailor power consumption for the delivered performance—a concept called power proportionality.

Dynamic voltage and frequency scaling (DVFS) is the predominant method of controlling power and performance. However, DVFS has reached a point of decreasing benefits because tech- nology scaling reduces the effective voltage range. Previous attempts to overcome the limitations of DVFS often have undesirable consequences, including increased leakage power and process variations.

Therefore, this thesis investigates microarchitectural mechanisms to achieve power propor- tionality. From a nominal design point, the core scales up by aggregating resources for higher per- formance at higher power. Conversely, disabling resources scales down the design for lower power and performance.

We first propose a core design called WiDGET that scales from low-power in-order execution to high-performance out-of-order execution. We achieve this scalability by varying active in-order buffer and functional unit count and by organizing them in a distributed manner for higher latency tolerance. Using low-power in-order buffers makes WiDGET a more power-efficient design than a traditional out-of-order processor. ii To explore further scaling opportunities, we also examine trade-offs in achieving energy-effi- cient scalability using generalized scalable core designs. Due to wire delay, maintaining pipeline balance results in energy inefficiency when scaled up. On the other hand, it is more energy effi- cient to uniformly scale down the pipeline.

Finally, we explore techniques for scaling power and performance down. We propose a con- cept called Power Gliding that selectively disables microarchitectural optimizations that follow the traditional DVFS 3:1 power-to-performance optimization rule for efficient power scale-down.

Through two case studies, we empirically show that power gliding frequently does as well as DVFS and performs better in some cases.

With the mechanisms proposed above, this thesis demonstrates processor designs that provide power proportionality beyond DVFS. Acknowledgments iii

It has been a long journey to get here: a journey I could not have possibly done alone. Only

with family, friends, mentors, and teachers, could I have completed it. I want to dedicate this dis-

sertation to my fiancé, Joseph Eckert and my family. Joe has always been there for me. He was very

supportive and believed in me even when I myself could not. He put my education first and never

once complained about me always working on the next deadline. Thank you for bringing laughter

and comfort into my life.

I cannot imagine the bravery and faith my parents had when they allowed their 18-year-old

daughter to leave a small town in Japan to pursue a bachelor’s degree in the U.S. all by herself. I

only found out a couple years ago that my father has been donating to UNICEF under my name

for good luck all these years. My mother actively reached out to exchange students in Japan, hop-

ing that people in the U.S. would do the same for me. My sister and brother always supported my

decisions and encouraged me. I never felt alone even across the Pacific, thanks to my family.

I feel fortunate to have Professor David Wood as my advisor. He is highly intelligent, has a wide

scope of knowledge, and is able to provide both detailed discussion and see big picture on any

topic. He taught me the joy and the depth of research. I will remember all the lessons he gave me to become a great researcher like him.

John Davis was a vital part of my graduate school career. He was always available even for brainstorming and provided industrial perspectives and insights. He was also a great mentor for me. It is not an exaggeration to say that I would not have been able to complete my dissertation without his guidance and encouragement. iv I also want to thank my committee for their constructive criticism.

The fellow students in the architecture group enriched my graduate school years. In particular,

I had a pleasure to work closely with Dan Gibson, who has a quick wit and is caring. He is also a

good teacher and taught me the mechanisms of low-level circuits as well as how to play chess. I

miss our trips to "The Library." Derek Hower has also been supportive. He provided valuable feed-

back to my dissertation. His ability to think out of the box is admirable.

I also want to thank the students who came before me for their guidance, especially Phillip

Wells, Michael Marty, Alaa Alameldeen, Matthew Allen, Brad Beckmann, Jayaram Bobba,

Koushik Chakraborty, Jichuan Chang, Natalie Enright Jerger, Kevin Moore, Dana Vantrease, Min

Xu, and Luke Yen. In addition, I want to thank my fellow students: Shoaib Altaf, Akanksha Baid,

Arkaprava Basu, Spyridon Blanas Emily Blem, Marc de Kruijf, Polina Dudnik, Hamid Reza

Ghasemi, Venkatraman Govindaraju, Gagan Gupta, Andrew Nere, Lena Olson, Marc Orr, Jason

Power, Cindy Rubio Gonzalez, Somayeh Sardashti, Rathijit Sen, Srinath Sridharan, Nelay Vaish,

Haris Volos, and Cong Wang.

Doug Burger sparked my interest in computer architecture when I first took an undergraduate

course with him at the University of Texas at Austin. I am thankful that I had an opportunity to

work as an undergraduate research assistant with him, and it was he who first encouraged me to

pursue a Ph.D.

Lastly, I thank the Wisconsin Computer Architecture Affiliates for their time and feedback, the

Computer Systems Laboratory for machine and software support, and the Wisconsin Condor project. v

Table of Contents

Abstract...... i

Acknowledgments...... iii

Table of Contents ...... v

List of Figures...... ix

List of Tables ...... xii

Chapter 1 Introduction ...... 1 1.1 Technology Trends ...... 3 1.2 Power Proportionality ...... 6 1.2.1 Wire Delay ...... 7

1.3 Desirable Hardware Features for Power Proportionality ...... 9 1.3.1 WiDGET: Decoupled, In-Order Scalable Cores ...... 11

1.3.2 Scalable Core Substrate ...... 12

1.3.3 Power Gliding: Extending the Power-Performance Curve ...... 14

1.4 Contributions ...... 14 1.5 Dissertation Structure ...... 15

Chapter 2 Related Work...... 17 2.1 Power-Proportional Computing ...... 17 2.1.1 Circuit-Level Techniques ...... 19

2.1.2 System-Level Techniques ...... 20

2.1.3 Dynamically Adaptive Cores ...... 21

2.1.4 Heterogeneous Chip Multi-Processors ...... 21

2.2 Low-Complexity Microarchitectures ...... 22 2.2.1 Clustered Architectures ...... 22 vi

2.2.2 Thread-Level Speculation ...... 23

2.2.3 Approximating OoO Performance with In-Order Execution ...... 23

2.3 Instruction Steering Cost Model ...... 24 2.4 Prior Scalable Core Designs ...... 26 2.5 Designing Power-Proportional Processors ...... 27

Chapter 3 Evaluation Methodology...... 29 3.1 Simulation Tools ...... 29 3.1.1 Simulation Assumptions ...... 30

3.2 Workloads ...... 31 3.3 Common Design Configuration ...... 33

Chapter 4 WiDGET: Wisconsin Decoupled Grid Execution Tiles ...... 34 4.1 High-Level Overview ...... 34 4.2 Toward Practical Instruction Steering ...... 36 4.3 Microarchitecture ...... 38 4.3.1 Pipeline Stages ...... 39

4.3.2 Frontend ...... 40

4.3.3 Execution Unit ...... 42

4.3.4 Backend ...... 44

4.4 Evaluation ...... 44 4.4.1 Simulation Methodology ...... 44

4.4.2 Performance Range ...... 45

4.4.3 Improving Performance ...... 47

4.4.4 Impacts of a Cluster Size ...... 49

4.4.5 Power Range ...... 50

4.5 Summary ...... 55 vii

Chapter 5 Deconstructing Scalable Cores...... 57 5.1 Core Scaling Taxonomy ...... 57 5.2 Two Abstract Cores: Borrowing vs. Overprovisioning ...... 59 5.2.1 Trade-offs of Resource Borrowing and Overprovisioning ...... 61

5.3 Methodology ...... 62 5.4 Initial Evaluation ...... 64 5.4.1 Performance Comparison ...... 64

5.4.2 Performance Sensitivity to Communication Overheads ...... 68

5.4.3 Chip Power Comparison ...... 71

5.4.4 Energy Efficiency ...... 73

5.5 Deconstructing Power-Hungry Components ...... 74 5.5.1 Scaling the Frontend and Backend Width ...... 75

5.5.2 Cache Aggregation ...... 76

5.6 Improving the Energy Efficiency of Scalable Cores ...... 82 5.6.1 Evaluation ...... 84

5.7 Summary ...... 90

Chapter 6 Power Gliding: Extending the Power-Performance Curve ...... 92 6.1 Limitation of Frequency Scaling and Power Gliding Opportunities ...... 94 6.1.1 Analysis of Frequency Scaling ...... 94

6.1.2 Power-Performance Scaling Opportunities ...... 98

6.2 Methodology ...... 99 6.3 Case Study 1: Frontend Power Gliding ...... 100 6.3.1 Implementation ...... 100

6.3.2 Evaluation ...... 103

6.4 Case Study 2: L2 Power Gliding ...... 109 6.4.1 Implementation ...... 110 viii

6.4.2 Evaluation ...... 111

6.4.3 Application of Power Gliding to COBRi ...... 115

6.5 Summary ...... 118

Chapter 7 Conclusions...... 119 7.1 Summary ...... 119 7.2 Reflections ...... 121

References...... 124

Appendix A Supplements for Instruction Steering Cost Model (Chapter 2) ...... 130

Appendix B Supplements for Simulation Tools (Chapter 3)...... 132

Appendix C Supplements for WiDGET’s Instruction Steering Heuristic (Chapter 4) ...... 135

Appendix D Supplements for Per-EU Instruction Buffer Limit Study (Chapter 5)...... 137

Appendix E Tables of Baseline Values ...... 141 ix List of Figures 1-1 thermal design power trend ...... 3 1-2 Impact of Intel technology scaling ...... 4 1-3 Supply and threshold voltage trends ...... 5 1-4 Power proportionality ...... 6 1-5 Maximum signalling distance vs. clock frequency, M3 ...... 8 1-6 On-chip communication distances in the context of out-of-order core sizes ...... 9 1-7 Conceptual power proportionality goal by this thesis ...... 10 1-8 High-level WiDGET design ...... 11 1-9 Conceptual diagrams of (a) resource borrowing and (b) resource overprovisioning philosophies. Shaded components are shared between cores...... 13 2-1 Salverda and Zilles cost model ...... 25 3-1 Target CMP ...... 33 4-1 Conceptual block diagram of WiDGET ...... 35 4-2 Limitations of the Salverda and Zilles cost model ...... 36 4-3 WiDGET microarchitecture ...... 38 4-4 Pipeline Stages ...... 40 4-5 Frontend ...... 40 4-6 Pseudo-code for instruction steering ...... 42 4-7 Execution Unit ...... 43 4-8 8-EU performance relative to the Neon ...... 46 4-9 Average cycles spent on each EU state with 8 EUs ...... 47 4-10 Harmonic mean IPCs relative to the Neon ...... 49 4-11 Harmonic mean system power relative to the Neon ...... 51 4-12 Power breakdown relative to the Neon ...... 52 4-13 Power Proportionality of WiDGET compared to Neon and Mite ...... 53

4-14 Geometric mean power efficiency (BIPS3/W) ...... 54 5-1 Conceptual block diagrams of (a) Borrowing All Resources (BAR) and (b) Cheap Overprovisioned Resources (COR) models ...... 60 5-2 IPC normalized to the baseline OoO ...... 65 5-3 Percentages of in-flight instructions spent in each state ...... 66 x 5-5 Misprediction rate of the cache-bank predictor in the BAR ...... 67 5-6 Memory-level parallelism ...... 67 5-4 Instruct-ions affected by remote operand transfers in BAR ...... 67 5-7 Performance sensitivity to communication overheads ...... 69 5-8 Chip power normalized to the baseline OoO ...... 70 5-9 Categorized chip-wide power consumption ...... 70 5-11 L1-I access count normalized to BAR1 ...... 72 5-10 Per-core power breakdown normalized to the baseline OoO ...... 72 5-12 Geometric mean energy efficiency ...... 74 5-13 Conceptual diagrams of BAR4 and COR4 with the default configurations in Table 5-3 ...... 75 5-14 Effect of 2-wide (Narrow) and 4-wide (Wide) frontend/backend on COR ...... 76 5-15 L1-I aggregation mechanisms across BAR and COR ...... 77 5-16 L1-I miss rate of default BAR with L1-I aggregation ...... 78 5-17 L1-D aggregation mechanisms across BAR and COR ...... 79 5-18 L1-D miss rate of default BAR with L1-D aggregation ...... 81 5-19 Conceptual block diagram of the COBRA hybrid design ...... 83 5-20 Power-performance of all designs with the default configurations normalized to the baseline 85 5-21 IPC of COBRi8 normalized to COBRo8 ...... 86 5-22 MLP of COBRo (left bars) and COBRi (right bars) ...... 87 5-23 Per-benchmark IPC of COBRo ...... 88 5-24 Categorized chip-wide power consumption ...... 89 5-25 Geometric mean energy efficiency ...... 90 6-1 Conceptual power proportionality goal ...... 92 6-2 Chip power reduction by frequency scaling ...... 97 6-3 Run-time slowdown by frequency scaling ...... 97 6-4 Chip power breakdown at the nominal frequency ...... 98 6-5 Useful checkpoint rate of the baseline ...... 101 6-6 Power-performance normalized to the baseline ...... 105 6-7 Ratio of committed / dispatched instructions ...... 107 6-8 IPC impacts of the applied techniques with Stall-1 ...... 107 6-9 Power breakdown normalized to the baseline ...... 108 6-10 Power-performance normalized to the baseline ...... 112 xi 6-11 Power breakdown normalized to the baseline ...... 113 6-12 Normalized total L2 power ...... 114 6-13 Harmonic mean power and performance ...... 116 A-1 Performance sensitivity under realistic communication delays ...... 130 C-1 Instruction steering example ...... 136 D-1 IPC sensitivity ...... 138 D-2 Chip power sensitivity ...... 138 D-3 ED sensitivity ...... 139 D-4 ED2 sensitivity ...... 139 xii List of Tables 2-1 Comparison of prior related work with regard to desirable power proportional core attributes . 18 3-1 SPEC CPU 2006 characterization...... 31 3-2 Wisconsin commercial workload characterization ...... 32 3-3 Common configuration parameters...... 33 4-1 Machine configurations ...... 45 5-1 Core scaling taxonomy...... 58 5-2 Scaling mechanisms of WiDGET ...... 59 5-3 Design-Specific Default Configuration Parameters...... 63 5-4 Power Categories and Descriptions ...... 71 5-3 COBRA Configuration Parameters ...... 84 6-1 Workload characteristics ...... 95 6-2 Baseline configuration parameters ...... 99 6-3 Simulated frequency scaling points ...... 99 6-4 Configuration space for Case Study 1 ...... 104 6-5 Configuration space for Case Study 2 ...... 111 6-6 COBRi configuration parameters ...... 115 B-1 Simulation parameter space...... 
132 E-1 Baseline values for Figure 4-8 ...... 141 E-2 Baseline values for Figure 5-2 ...... 142 E-3 Baseline values for Figure 5-7 ...... 142 E-4 Baseline values for Figure 5-11 ...... 143 E-5 Baseline values for Figure 5-21 ...... 144 E-6 Baseline values for Figure 6-2 ...... 144 E-7 Baseline values for Figure 6-8 ...... 145 E-8 Baseline values for Figure 6-13 ...... 146 E-9 TAGE branch misprediction rate ...... 146 1 Chapter 1

Introduction

Microprocessor performance has increased dramatically over the past few decades. This rapid

increase in performance was driven, in part, by technological developments that doubled the

number of transistors on chip every two years, a trend described by Moore’s law. Unfortunately, the

power supply voltage of transistors did not improve at the same rate. These mismatched transistor

scaling trends created chip designs with growing numbers of power-inefficient transistors, and, as

a direct result, chip power usage continued to increase exponentially alongside any performance gains until early 2000s. This unsustainable increase in power usage, known as the Power Wall, cre-

ates packaging and cooling problems for the correct operation of processors, and has placed a tight

limit on total chip power.

In an effort to utilize the increasing transistor budget within the constraining framework of the

strict power budget, chip vendors have moved away from traditional uniprocessors to multi-core

chips [43,65]. Multi-cores better addresses the Power Wall for two reasons. First, with dynamic

voltage and frequency scaling (DVFS) [49], cores can run at lower voltage to stay within affordable

chip power while still yielding higher system throughput. DVFS is an effective power management

technique due to the cubic relationship between (dynamic) power and performance: for each 3%

reduction in power, DVFS reduces performance by only about 1%. Second, the high throughput

nature of multi-cores allows the complexity of each core to be reduced without significantly sacri- ficing overall performance. Guided by the 3:1 power-to-performance ratio of DVFS, cores can 2 eliminate performance optimizations that exceed the 3:1 ratio in order to save more in power than

they lose in performance [32].

Despite increasing performance demands, the utility of DVFS is diminishing due to the nar-

rowing gap between maximum and minimum supply voltages [77]. Consequently, either fewer

cores can run simultaneously, or each core must be simplified further to prevent an increase in

chip power. The former comes with the cost of reduced system throughput, while the latter is sus-

ceptible to sequential bottlenecks. Neither is a desirable solution, especially for future applications

with versatile resource requirements [26]. Amdahl’s Law, a key tenant of microprocessor design

philosophy, states that the overall run-time enhancement achieved when only a part of the system

is improved depends on the time spent executing the non-optimized part. In other words, it is the

weakest link in the chain that determines the bottleneck on performance. Therefore, it is critical to

balance system throughput and single-thread performance to avoid those performance bottle-

necks [5,35].

This thesis describes an alternative to DVFS that uses microarchitectural mechanisms for flex-

ible power and performance management. Rather than statically selecting a design point optimal

for a small set of workloads, we propose the use of power-proportional cores—cores that dissipate

power in proportion to work done—to provide many different operating points appropriate for a broad range of workloads. Chips composed of power-proportional cores can speed up some of the cores for sequential threads at higher power while running as many parallel threads as possible at lower speed so as not to exceed a given power budget. We evaluate power-proportional cores in a single-thread context to limit the scope of this dissertation. 3

FIGURE 1-1. Intel thermal design power trend [74]

We first review technology trends (Section 1.1) that have led to the need for power-propor- tional computing (Section 1.2) in more detail. To achieve power proportionality using microarchi- tectural mechanisms, we analyze what underlying hardware components and mechanisms are desirable, and how to harness them. Section 1.3 provides a brief overview of our findings, and we conclude this chapter by presenting key contributions (Section 1.4) and the structure of this dis- sertation (Section 1.5).

1.1 Technology Trends

For decades, every performance increase has come at the price of an increase in power usage.

Figure 1-1 demonstrates that the thermal design power (TDP) of Intel chips increased exponen- tially over the last forty years until power dissipation became too large for affordable cooling sys- tems. Power usage then plateaued with the 4 processor, the last uniprocessor Intel has produced to date. Until that point, chip designers exploited Moore’s Law by devoting larger num- ber of smaller transistors to a single core in pursuit of higher single-thread performance. Those 4

FIGURE 1-2. Impact of Intel technology scaling [74]

transistors were used to make pipelines both deeper and wider and increased the degree and

aggressiveness of speculation, eventually running into diminishing returns and wasted power.

However, a more significant reason behind the escalation in power usage is the fact that scaling

of supply voltage (Vdd) has lagged in comparison with feature size. Figure 1-2 plots the impact of

Intel technology scaling, normalized to the 4004 processor released in 1971. The figure demon-

strates that feature scaling substantially outpaced Vdd scaling by up to two orders of magnitude.

That is, a given die area can have more transistors, however, power per area continues to rise.

These transistors with increasing power per area and rising chip power led designers to start integrating multiple cores on the same die and use DVFS to manage chip power. Although this paradigm shift enabled performance improvement without exponential power increase, it is not a reliable solution going forward due to an inherent limitation of DVFS. Figure 1-3 plots the past

trend in supply and threshold voltages [71,21] as well as both conservative [12] and optimistic

[25] projections of future supply voltage reductions. These voltage-scaling trends show that the 5

Vdd Trend 2.5 Conservative Vdd Projection Optimistic Vdd Projection Vt Trend and Optimistic Projection 2 Gate Drive Rule

1.5

Voltage (V) 1

0.5

0 250 180 130 90 65 45 32 22 15 11 8 Technology Node (nm) FIGURE 1-3. Supply and threshold voltage trends

gap between supply and threshold voltages is closing, and thus the range of voltage scaling is diminishing. In fact, for high-performance transistors, future technology nodes have no room left for voltage scaling, based on the gate drive rule of maintaining at least a 4:1 supply to threshold ratio [71]. While various circuit-level techniques have been proposed to address this problem, such as ultra-low voltage operation [19,21], many incur performance, leakage power, and/or reli- ability issues.

The limited leverage of DVFS makes design point selection challenging for multi-cores.

Regardless of a homogeneous or heterogeneous core composition, modern statically configured cores have a chosen design point that is optimal only for a few target workloads. To efficiently run a variety of workloads on a single chip, we need to redesign processors in order to accommodate today’s technology scaling. 6

(a) Current Servers [10] (b) Ideal FIGURE 1-4. Power proportionality

1.2 Power Proportionality

We propose using power-proportional cores that provide many different power-performance points, without DVFS. Our aim of power proportionality—dissipating power in proportion to the amount of work performed—is adapted from Barroso and Hölzle’s definition of energy propor- tionality [10], with a focus on single-thread power and performance. Barroso and Hölzle defined

“work” loosely to encompass all performance metrics, and we use instruction-level parallelism

(ILP) as a measure of work throughout the dissertation.

Achieving power proportionality within the framework of modern processor designs, without relying on DVFS, is a challenging proposition. Processors are typically optimized for a narrow power-performance range. Furthermore, leakage power will consume a larger fraction of the total power in future technology nodes [36], and, as a result, will become a hurdle in scaling down power when there is little work to do.

Figure 1-4a plots current server power usage as a function of work [10]. The performance and availability requirements of servers prohibit the use of DVFS or other conventional low-power 7 techniques (Chapter 2) regardless of work load. As a result, even the idle state burns almost half of

the maximum power, representing a non-negligible amount of power simply wasted due to the ris-

ing power consumption of computing systems. An ideal power proportional design, on the other

hand, would scale power more gracefully with work, as illustrated in Figure 1-4b. Running a sys-

tem at close to full speed corresponds to the upper-right region, the area in which current systems

are designed to operate. The remaining region is much more difficult to reach because the system

must use just enough power to meet the performance goal. Therefore, an important characteris- tic—and challenge—of power proportionality is yielding a wide dynamic power-performance range, covering both full-utilization and idle states.

Achieving power-proportional computing requires a holistic approach, incorporating the entire system, from the memory and disk subsystems to networks, not just processors. In addition, software-hardware collaboration may enable more efficient management of hardware resources as well as determine the mix of concurrent threads to better control the system-wide power-perfor- mance. However, we limit the scope of this thesis to core microarchitectures, leaving the rest as future work.

1.2.1 Wire Delay

The modern trend toward increasing wire delay makes designing power-proportional cores more difficult. Figure 1-5 plots the maximum single-cycle signalling distance as a function of the

clock frequency, using a 100-stage Pi wire model. Data is derived from CACTI [64], and assumes

optimal repeater count and placement (50 ps fixed delay per repeater, and 25 ps setup time at the

wire’s endpoint). We further assume level 3 metal layer (M3) for short-distance, component-to-

component routing. For a signal to traverse about two millimeters of linear distance in a clock 8

FIGURE 1-5. Maximum signalling distance vs. clock frequency, M3 [64]

cycle using a 45nm technology node, the maximum possible frequency is 3 GHz. In contrast, the

frequency drops to 2 GHz in a smaller 32nm technology.

Tight bounds on communication distance restrict the size of a core as well as resource place-

ment in order to meet a target clock speed. In 45 nm, a small two-issue out-of-order core occupies

about 22.5 mm2 [40]. Assuming an optimistic 2:1 core aspect ratio, linear distance to cross a core is

3.3 mm or 2 cycles, depending on the particulars of the core’s internal floorplan. Communication

latency starts to dominate for more aggressive and larger superscalar core designs.

Wire delay is a challenge for multi-cores. Without attempting to address all possible internal

core designs, Figure 1-6 plots the hypothetical best-case single-cycle (dark grey) and two-cycle

(light grey) communication distances with the above design assumptions. In the two-core domain

(i.e., Figure 1-6a), a two-millimeter horizontal distance between cores provides enough coverage.

However, four-way communication among horizontally- and vertically-mirrored cores with a 1:1 core aspect ratio (i.e., Figure 1-6b) leaves very little communication area (~2x2 mm) after consid- 9

Core0 Core1

Core0 Core1 2-Cycle Distance 1-Cycle Distance

Core2 Core3 (a) Two cores (b) Four cores (c) L1-D cache area

FIGURE 1-6. On-chip communication distances in the context of out-of-order core sizes. Figure to scale, 1mm = 0.1in

ering one- or two-cycle Manhattan signalling distances. This area, for reference, is about two thirds of that occupied by a 32KB L1-D cache (Figure 1-6c). On the other hand, if four cores are organized into rows in the on-chip floorplan, signalling distance is 6.6-9.9 mm (3-5 cycles)—much higher than that of the “four corners” floorplan—but the constraint of corner placement effectively disappears.

Regardless of the organization, wire delay leads to a simple implication for multi-cores. In line with intuition, latency-sensitive core resources that need to communicate must reside near one another to minimize wire delay between these resources. The more cores participate in communi- cation, the tighter constraints on the size and placement are imposed on designing multi-cores.

1.3 Desirable Hardware Features for Power Proportionality

Rather than scaling voltage and/or frequency, we scale architectures themselves to deliver power proportionality (as selected by system software, compiler, hardware predictor, or some combination thereof). Our approach seeks to achieve more than a linear power-performance scal- ability by scaling frequency alone. We aim to continue the cubic voltage function of the DVFS or the 3:1 power-to-performance ratio, and leverage it at the resource allocation level. We aggregate 10

฀ ฀ FIGURE 1-7. Conceptual power proportionality goal by this thesis

resources to seek a 1% performance increase with at most 3% power increase, and selectively dis-

able resources or performance optimizations for a 3% power savings at a 1% performance loss.

For higher performance cores, a key challenge lies in providing flexibility in performance with proportional power consumption. We achieve this with a scalable core substrate. Scalable cores

scale up the amount of resources dedicated to a single core to meet the performance demands of

single threads. They can also scale down resources to match the available ILP or to run many

threads in parallel and still stay within the given power budget. As the granularity and selection of

resource scaling have large impacts on the resulting power proportionality, this thesis explores the

organization of microarchitecture, proposing an example architectural design called WiDGET, and

what and how resources should be scaled to best facilitate graceful transition across the power-

performance curve.

For lower performance cores, we identify resources or performance optimizations to disable in

order to “glide” down the power-performance curve. We call this concept power gliding. 11

FIGURE 1-8. High-level WiDGET design

Figure 1-7 depicts our approach for power proportional computing. We seek to provide a foundation for future computing systems and diverse workload types by provisioning resources.

The remainder of this section provides a high-level overview of our findings, the first two pieces targeting scalable core foundations and the last piece targeting gliding down the power-perfor- mance curve.

1.3.1 WiDGET: Decoupled, In-Order Scalable Cores

We first propose a power-proportional computing infrastructure, called WiDGET (Wisconsin

Decoupled Grid Execution Tiles), that decouples thread context management (i.e., instruction engines) from a sea of simple in-order execution units (EUs), loosely defining core boundaries

(Figure 1-8). WiDGET’s decoupled design provides the flexibility to scale cores up and down through global resource allocation, varying the number of enabled instruction engines and the number of EUs assigned to each instruction engine. Because WiDGET activates only the computa- tion resources needed for a particular power-performance target and turns off the rest to save 12 power, it dynamically enables many different combinations of small and/or powerful cores on a

single chip.

Low-complexity building blocks like in-order EUs are desirable for power savings, but make

delivering high single-thread performance a key challenge. To overcome in-order issue con-

straints, we leverage 1) distributed instruction buffers spread across the EUs for latency tolerance

and 2) instruction steering logic that accounts for communication overheads and data locality. By

distributing scheduled instructions to the buffers, later, ready instructions can execute ahead of earlier, stalled instructions in other buffers, essentially approximating OoO execution with much simpler in-order building blocks. This feature allows each core on WiDGET to scale from in-order to coarse-grain OoO execution just by varying the number EUs and/or instruction buffers. When scaled up, we show that per-thread performance of WiDGET exceeds a -like high-perfor- mance processor while consuming more power (upper right region in Figure 1-7). WiDGET can also scale down to a level comparable to an -like low-power processor, turning off resources for less aggressive execution and power (toward the middle region in Figure 1-7). Importantly,

WiDGET delivers power-performance points anywhere in between these two extremes for fine- grained scaling.

1.3.2 Scalable Core Substrate

WiDGET is one design point, and many other scalable core designs are also possible. To better understand scaling trade-offs in this architecture space, we step back and reconsider energy-effi- cient design options for scalable cores in general, not necessarily related to WiDGET. We examine three previously proposed scalable cores—Core Fusion, Composable Lightweight Processors, and

Forwardflow [40,28,46]—to identify seven principal areas in which scaling mechanisms and poli- 13 ฀

฀ (a) Resource borrowing (b) Resource overprovisioning

FIGURE 1-9. Conceptual diagrams of (a) resource borrowing and (b) resource overprovisioning philosophies. Shaded components are shared between cores.

cies differ. We argue that these differences stem from disparate fundamental resource acquisition

philosophies: whether to borrow resources from neighboring cores (Figure 1-9a), or to overprovi-

sion core-private resources (Figure 1-9b). We analyze the impact of these differing design philos-

ophies using a common framework to abstract away artifactual differences in the original

proposals and focus on the most important constraints underlying modern core design: wire

delay, energy efficiency, and area. We study two abstract cores, BAR (Borrowing All Resources)

and COR (Cheap Overprovisioned Resources), which represent two extremes in the design space

for cores that scale up to 4x their nominal size.

We find that when scaling up, overprovisioning a few, cheap resources (i.e., COR) is more

energy efficient than borrowing large resources from neighboring cores (i.e., BAR). When bor- rowing resources, wire delays add multi-cycle penalties to access distant resources, largely negat- ing the benefits. For caches, borrowing has the potential to do more harm than good. L1-I aggregation increases L1-I power more than 2.5x, and L1-D aggregation actually degrades perfor- mance by up to 16% on average. On the other hand, when fully scaled down, smaller components in borrowing-based cores better facilitate energy efficiency than statically overprovisioned larger components. 14 Derived from these insights, we propose a hybrid design called COBRA. Although COBRA

builds on the overprovisioned model, it also integrates the small modular feature of the resource-

borrowing model. Moreover, COBRA can borrow small, latency-effective resources for further scalability. We investigate both OoO and in-order versions of COBRA, and show that both achieve significant energy efficiency improvements, but the in-order version yields better effi- ciency and power-performance scalability by harnessing low-power components.

1.3.3 Power Gliding: Extending the Power-Performance Curve

The last piece of this thesis targets the lower end of the power-performance curve. Because of the goal to bring power consumption toward zero as work or performance decreases, we only focus on scaling down architectures in this work. We propose power gliding which dynamically dis-

ables performance optimizations that meet the 3:1 power-to-performance ratio. While some opti-

mizations may have much less than a 3:1 ratio, and thus should be left on, others may exceed the

3:1 ratio for a given workload, allowing power gliding to do better than DVFS.

Power gliding can leverage many previously proposed low-power techniques that result in a

performance loss. Although those techniques might not have been considered appropriate for

high-performance processors, the techniques become viable options under the 3:1 ratio. We select

two sets of techniques—targeting the core frontend and the L2 cache, respectively—and evaluate

them in conventional static cores to demonstrate the broad applicability.

1.4 Contributions

The following summarizes this dissertation’s most important contributions. 15 • Proposes and evaluates a power-proportional architecture, WiDGET, enabled by a sea of

computation resources, scaling across the power-performance spectrum. By harnessing in-

order buffers with an intelligent instruction steering heuristic, WiDGET scales anywhere from

an -like low-power processor to a chip that exceeds an Intel Xeon-like high-perfor-

mance processor while consuming less power on a single chip (Chapter 4).

• Demonstrates that resource acquisition philosophy is central to the efficiency of core scal-

ing. Borrowing resources from neighboring cores adds communication overheads, resulting in

performance degradation and higher power. Consequently, overprovisioning—provisioning

core-private resources for the most aggressive configuration—provides better energy effi-

ciency and scalability (Chapter 5).

• Makes a case for power gliding—turning off power-dominant performance optimizations

to lower the power-performance curve toward zero. Power gliding allows previously pro-

posed low-power techniques to be re-examined in a new context and apply them without com-

plex policies or logic to obtain the best possible power savings. Two case studies demonstrate

the potential of power gliding, even providing better power scaling than DVFS in some cases

(Chapter 6).

1.5 Dissertation Structure

The rest of this dissertation first reviews prior work on power-proportional computing and low-power/complexity designs (Chapter 2). We then present the common evaluation methodol- ogy used throughout the dissertation (Chapter 3). 16 The next three chapters discuss the dissertation contributions in detail: a power-proportional

computing infrastructure, WiDGET (Chapter 4); deconstruction of scalable cores to identify energy efficient scaling (Chapter 5); and power gliding to glide down the power-performance curve (Chapter 6). This dissertation ends with a summary of power-proportional scalable microarchitecture, reflects on the limitations of the current state, and discusses future work to achieve system-wide power-proportional computing (Chapter 7).

The dissertation not only includes all the content from our previously submitted work, but also provides materials supplemental to that work. It adds simulated data and discussion on design assumptions and configurations. Furthermore, the dissertation includes a section in Chapter 6 that demonstrates how our three proposals work together achieve the power proportionality goal. 17 Chapter 2

Related Work

Although the use of dynamic microarchitectural modifications to achieve power-proportional computing is relatively recent, the fundamental concept has been around for the last few decades.

This chapter discusses the prior work that formulated the general concept of dynamic scalability, comparing each work to the seven desirable attributes of power-proportional cores in Table 2-1.

We first review designs and techniques that dynamically adjust power and performance without

the use of scalable cores (Section 2.1). We next consider a class of work that simplifies microarchi- tectures to address the Power Wall (Section 2.2). Previous work has challenged the instruction steering aspect of distributed architectures, and because WiDGET builds on that architecture we review the observations made by the previous work (Section 2.3). After discussing prior scalable

core designs (Section 2.4), we conclude this chapter by explaining how the previous work has shaped our approach in realizing power-proportional processors (Section 2.5).

2.1 Power-Proportional Computing

Chandrakasan et al. are among the first to introduce the concept of power-proportional com-

puting [20]. They pointed out that once computational capability of a design meets service-level

agreements, the remaining transistor budget should be devoted to power saving techniques. Bar-

roso and Hölzle made a case for energy proportionality, especially for servers that rarely reach

complete idle or near-peak utilization [10]. They call for novel energy efficient mechanisms that 1) 18 TABLE 2-1. Comparison of prior related work with regard to desirable power proportional core attributes. The designs above the bold line represent orthogonal work to this thesis.

Category Design Row number Scale & Down? Up Symmetric? Exec? Decoupled In-Order? Wire Delays? Driven? Data ISA Compatibility? DVFS [49] 2 Y - - - - - Y Circuit-Level Techniques Power gating [37] 3 N - - - - - Y PowerNap [53] 4 N - - - - - Y System-Level Techniques Thread Motion [58] 5 N N N - - - Y Adaptive Cores [3,29,24] 6 N - Y/N Y/N - - Y Heterogeneous CMPs [48] 7 N N N Y/N - - Y Quad-Cluster [9] 8 N Y Y N Y Y/N Y Cost-Effective [18] 9 N N Y N Y Y Y Clustered Architectures Multiscalar [68] 10 N Y N N - N N Complexity-Effective [56] 11 N Y Y Y N Y Y Access/Execute [66] 12 N N N Y - Y N TLS [69,68] 13 N Y N Y/N - N Y/N TLS Hydra [31] 14 N Y N N - N Y OoO Approximation ILDP [47] & Braid [73] 15 N Y Y Y - Y N Steering Cost Model Salverda & Zilles [61] 16 Y Y N Y N Y Y Core Fusion [40] 17 Y Y N N N - N CLP [46] 18 Y Y N Y N Y N Scalable Cores Forwardflow [28] 19 Y Y N N Y Y Y WiDGET 20YYYYYYY Column number 3 4 5 6 7 8 9

lessen wake-up penalties from power-saving modes and 2) consume energy in proportion to the amount of work performed. Due to the single-thread performance focus of this thesis, we use power proportionality as a metric instead. 19 2.1.1 Circuit-Level Techniques

Circuit-level techniques (rows 2-3 in Table 2-1) have been a popular method to aid power pro- portionality, although some of them only address the lower end of the curve. We consider these techniques orthogonal to the work done in this dissertation, which focuses on mechanisms at the microarchitectural level.

Clock gating is a widely used method for dynamic power reduction by turning off the clock to idle structures [11]. Dynamic voltage-frequency scaling (DVFS) also aims at dynamic power; however, a main difference is its ability to vary transistor speed by tuning supply voltage and PLL frequency (col. 3) [49]. The shrinking operating voltage range of DVFS has spawned various research efforts to either widen the voltage scaling range or uncover alternative methods for power savings. The former aims at overcoming the challenges of reducing the supply voltage to sub- threshold [19] or near threshold [21]. These challenges include leakage power, transistor perfor- mance, and reliability, which worsen as the supply voltage is lowered further. Dreslinski et al. argue that near-threshold operation is a more attractive design point than subthreshold operation because both have comparable energy savings but near-threshold operation has improved perfor- mance and variability [21]. On the other hand, Chandrakasan et al. propose optimizing the tradi- tional 6T bit-cell SRAM design for subthreshold operation to mitigate the issues of ultra-low voltage operation [19].

Azizi et al., in contrast, view the minimizing voltage scaling range as an optimization problem and provide a framework to evaluate numerous circuit styles and gate sizes as well as architectural models [6]. 20 Frequency scaling is gaining its importance because of the limited utility of DVFS in future

technology nodes and the need for fine-grained power management within shared voltage planes.

For instance, IBM POWER7 [43] implements per-core frequency scaling to regulate the power of cores on the same voltage plane. However, such globally-asynchronous locally-synchronous design requires an entirely different verification process, covering numerous non-deterministic asynchronous interactions and detecting and correcting any race conditions. This process is known to be non-trivial, even with formal analysis tools [1].

Other circuit-level techniques to control both dynamic and leakage power include multi-Vdd

[54] and power gating [37]. The former utilizes high Vdd for transistors on critical paths identified at design time and low Vdd on other logic, while the latter dynamically cuts off supply voltage to selected logic. Lastly, multi-threshold CMOS [67], variable threshold CMOS [72], and sleep tran- sistors [41] are examples of leakage power reduction techniques.

2.1.2 System-Level Techniques

System-level techniques (rows 4-5) are also orthogonal to our use of microarchitectural mech- anisms to achieve power proportionality. Two notable system-level techniques are PowerNap and

Thread Motion. PowerNap aims to reduce idle power on a machine with frequent yet short idle periods, such as servers [53], The entire system quickly transitions to a near-zero-power idle state when server utilization goes down. In contrast, Thread Motion proposes fine-grained manage- ment of non-idle power [58]. Instead of a regulator-based DVFS approach, they migrate a thread to a different, statically set voltage/frequency domain (col. 4) for power savings. Because the effec- tiveness of Thread Motion still depends on a large operating voltage range, it is susceptible to the shrinking voltage scaling range in future technology nodes. 21 2.1.3 Dynamically Adaptive Cores

Previous work addresses the rapid rise of chip power by making limited, localized changes to

the microarchitecture. The primary focus was reducing wasteful power in a statically fixed resource by exploiting application phase behaviors. Thus, maintaining performance is crucial, unlike the aim of power proportional cores to scale up and down for a wide range of operating points.

Adaptive cores (row 6 in Table 2-1), for instance, vary the sizes of power-hungry hardware structures, including instruction queues [16] and caches [4,7,22]. Albonesi et al. give an overview of dynamic adaptation techniques for microprocessors [3]. As previously proposed adaptive cores focus on power savings [3,29,24], they are limited to scaling cores down (column 3).

Other microarchitectural adaptive techniques include exploiting narrow-width operands either by disabling unused width of the hardware or by packing multiple values [13], and com- pressing strings of zeros or ones anywhere they appear in the full width of an operand [17].

Although these techniques reduce component-specific local power, they do not take a global approach (e.g., cols. 4, 7, and 8) to address inherent inefficiencies that exist in the microarchitec- ture.

2.1.4 Heterogeneous Chip Multi-Processors

A heterogeneous chip multi-processor (CMP) (row 7) has an asymmetric design by combining a small number of aggressive superscalar cores for ILP with many lightweight cores for thread- level parallelism (TLP) (col. 4) [48]. Rather than adjusting the capability of cores (e.g., power pro- portional cores), a heterogeneous CMP dynamically migrates threads to best-fit cores for given 22 workload conditions (col. 2), thereby controlling the obtainable power and performance. Due to

the statically set core designs for the target class of applications, a heterogeneous CMP has limited

effectiveness for applications outside of the target class [61]. Furthermore, it requires a chip vendor

to design and verify at least two different cores for a single chip release or integrate pre-existing

cores with different interface requirements. Finally, as fixed resources, resource scheduling and

real-time constraints are more difficult because of the performance differential on the heteroge- neous cores.

2.2 Low-Complexity Microarchitectures

Over the past two decades, many researchers attempted to simplify microarchitectural designs with little impact on the performance. Although their aims may not necessarily coincide with power efficiency, our work builds on many of their insights and also avoids shortcomings of their mechanisms in pursuit of developing a power-proportional scalable design.

2.2.1 Clustered Architectures

The goal of early clustered architectures (rows 8-12) [56,9,66,18,68] was to further improve superscalar performance while reducing the complexity. Hence, each cluster may still utilize com- plex OoO execution (col. 6). To eliminate monolithic structures on critical paths, some designs decouple the execution core and/or the backend from the frontend (col. 5). The decentralized exe- cution inevitably gave rise to a plethora of instruction steering policies. Despite many subtle differ-

ences, they commonly trade off inter-cluster communication latencies for load balancing so long

as the latencies can be hidden (cols. 7-8). These performance-centric policies ignore the energy 23 aspect of communication, which worsens with more clusters and longer wires. We therefore take data locality into consideration when developing a steering heuristic in Section 2.3.

2.2.2 Thread-Level Speculation

Another example of achieving high single-thread performance without using complex cores is thread-level speculation (TLS). TLS (rows 13-14) leverages multiple thread contexts in multi-cores with minimum changes to the core microarchitecture, using software [69,68] or speculative mem- ory support [31]. In the case of the former, a TLS compiler divides a dynamic instruction stream into contiguous segments at control-flow boundaries (col. 8). The hardware then speculatively executes the resulting chain of control dependent threads, using buffered state to recover from misspeculation. Instead of relying on conservative synchronization points inserted by a compiler, the latter exploits speculative memory support in hardware to enable more aggressive speculation.

However, because both the software and hardware approaches take a control-driven execution style, they are susceptible to load imbalance and thread squash propagation [59]. Another short- coming is increased inter-thread traffic and the resulting energy, as data dependent instructions are spread across threads.

2.2.3 Approximating OoO Performance with In-Order Execution

Palacharla et al. observed that the wake-up and selection logic of an OoO issue queue is one of the most complex structures in a traditional superscalar, and employed multiple in-order buffers instead (col. 6) [56]. We exploit data locality to avoid prohibitive wire delays (col. 7).

Two microarchitectures (row 15) that leverage clusters of in-order buffers (col. 6) are the Braid architecture [73] and Instruction Level Distributed Processing (ILDP) [47], both of which have 24 heavy software reliance. The Braid architecture expands a conventional ISA (col. 9) and uses a

compiler to re-order instructions based on the data dependencies at basic-block boundaries (col.

8). ILDP similarly requires either a new ISA or binary translation (col. 9), utilizing a profiler to

identify groups of dependent instructions spanning control-flow boundaries (col. 8). Although the

software-based dependence extraction simplifies the hardware, exploiting dynamic data depen-

dency becomes a challenge not to mention that loosing binary compatibility creates a major hur-

dle, especially for legacy applications. Hence, the following section discusses reducing the

complexity of on-line dependency analysis to make hardware steering feasible, achieving binary

compatibility.

2.3 Instruction Steering Cost Model

Instruction steering policy is an integral part of the performance equation for any distributed

architecture, especially those that build on in-order instruction issue. The policy essentially deter- mines issue time, which is governed by data dependencies, and structural hazards. Salverda and

Zilles (row 16) therefore proposed a cost model for instruction steering to understand this com- plex interaction, and argued that hardware steering logic for in-order execution units (EUs) is not practical [61]. This section reviews their model, and, in Chapter 4, we extend it, making a case for implementable steering design.

Salverda and Zilles evaluated steering cost of an instruction i as a function of the dataflow (i.e., horizon) and in-order issue constraints (i.e., frontier). The horizon marks the time when an

instruction becomes ready to issue, which is imposed by the dispatch time, disp(i), and computa- tion of the source operands, data(i). Hence, the horizon of i is h(i) = max{disp(i), data(i)}. 25

฀฀฀฀ ฀ ฀ ฀ ฀฀฀฀ ฀฀ ฀฀฀฀ ฀฀฀฀ ฀ ฀

฀ ฀฀ ฀ ฀

(a) (b) (c) FIGURE 2-1. Salverda and Zilles cost model. (a) An example instruction sequence and the dataflow graph. (b) Steering cost of i3. (c) Steering under idealized communication assumption.

We use Figure 2-1 to help explain their cost model. Figure 2-1(a) shows an example sequence of instructions, each of which takes one cycle to execute on one of the two available in-order EUs.

Both i2 and i3 depend on i1 and must execute before i4. Assuming all three instructions dis-

patch in cycle 0, then the horizon of i3 is 2 because it must wait for the result of i1 to become

available, as the arrow and shaded region in Figure 2-1(b) show. On the other hand, the frontier of

an in-order EU e, f(e), denotes the earliest time an instruction becomes the head of the FIFO queue. In Figure 2-1(b), the frontier of EU 1 is 3, whereas that of EU 2 is 1 due to the unutilized

resource.

The cost of steering an instruction to an EU becomes: Cost(i, e) = h(i) - f(e). A negative cost

indicates a true cost of the instruction because earlier instructions in the steered EU delay the issue time. This is the case of steering i3 to EU 1. Although i3 becomes ready at cycle 2, it cannot issue until i2 finishes execution at cycle 3. A positive cost, on the other hand, reflects an opportunity 26 cost. The instruction becomes the earliest instruction in the EU while still waiting for the oper- ands, potentially deferring execution of later instructions. Steering i3 to EU 2 incurs an opportu- nity cost by leaving EU 2 idle at cycle 1. Thus, an ideal steering occurs when Cost(i, e) is zero, issuing the instruction as soon as it becomes ready without lowering EU utilization. However, this example has no zero-cost steering. As the Salverda and Zilles cost model prefers minimal true cost to opportunity cost in order to increase parallelism, it steers i3 to EU 2. i4 can be steered to

either EU 1 or EU 2, but in either case completes in cycle three as illustrated in Figure 2-1(c).

In Chapter 4, we extend their model by accounting for realistic communication latencies

(Appendix A) between EUs and discuss the implications of the new model.

2.4 Prior Scalable Core Designs

There have been many proposals for dynamically scalable chips [3,4,7,16,22,28,40,46]. In gen-

eral, these designs are capable of multiple discrete configurations, each of which provides a differ-

ent power-performance trade-off. When scaled up, additional resources (e.g., caches, functional

units, instruction window space, etc.) are allocated to a single thread, improving performance

through better exploitation of instruction-level parallelism (ILP). Typically, cores consume signifi-

cantly more power when scaled up, and it may be necessary to slow down (via DVFS) or disable

other cores to accommodate additional power demands of a scaled-up core [39].

When scaled down, cores de-activate resources to reduce power consumption, which commen-

surately reduces the core’s ability to exploit ILP. Though individually less capable than scaled-up

cores, scaling down affords the power needed to operate many scaled-down cores in parallel,

exploiting available thread-level parallelism (TLP) in multithreaded or multi-programmed work-

loads. 27 In this section, we focus on three previous scalable cores (rows 17-19): Core Fusion [40], Com- posable Lightweight Processors (CLP) [46], and Forwardflow [28]. Though the main visions of these

proposals are similar (i.e., scalable cores), each of these proposed designs approaches the problem

of implementing a scalable core very differently. In particular, the first two approaches [40,46]

consider whole-core aggregation as a means by which to implement scale-up—when aggressively

pursuing single-thread performance, entire cores are merged into a single processing entity, effec- tively sharing all resources in the processor pipeline. These individual proposals differ on the granularity of individual scaled components; Core Fusion considers aggregation of full out-of- order pipelines, whereas CLP considers aggregation of simpler processing elements in greater number.

This general approach contrasts that taken by Forwardflow [28], which dynamically scales only the instruction window and execution units. Specifically, Forwardflow [28] seeks to build a large instruction scheduler from an explicit dataflow representation. This design gives no consideration to scaling other aspects of the pipeline or dynamically sharing resources between cores when scaled up.

Chapter 5 develops core scaling taxonomy based on these three scalable core designs.

2.5 Designing Power-Proportional Processors

Given the current technology constraints and the insights from the relevant prior work, achieving power proportionality (cols. 4-8) requires a core design that scales both up and down

(col. 3) while retaining ISA compatibility for widespread adoption (col. 9). We present an example power-proportional design called WiDGET (row 20) in Chapter 4. As ever-increasing wire delays favor many small hardware structures over a few large monolithic ones (col. 7), WiDGET forms clusters of execution resources, each of which executes instructions in order for low complexity and power (col. 6). By decoupling the execution resources from the rest of the pipeline structures (col. 5), WiDGET has the flexibility to vary active execution resource count, managing the attainable power and performance. To harness the distributed in-order resources, its hardware steering logic sends groups of data-dependent instructions (col. 8) to the same execution resources for data locality while distributing independent instructions to different execution resources for parallelism.

WiDGET takes a unique stance with regard to core scale-up. It scales only the instruction window and execution resources, like Forwardflow. However, WiDGET borrows some execution resources from neighboring cores, an implementation which resembles the resource acquisition style of Core Fusion and CLP.

Although scalable cores, including WiDGET, yield many different operating points, fully scaled-down configurations still consume moderate power. To bring the power-performance point closer to zero, power gliding selectively disables resources and performance optimizations. Using

WiDGET when performance is more important and power gliding when power must be conserved helps achieve the goal of power proportionality.

Chapter 3

Evaluation Methodology

This chapter presents the common evaluation methodology used for the dissertation.

3.1 Simulation Tools

We evaluate core designs in this dissertation using full-system cycle-accurate execution-driven simulation. A full-system simulator effectively provides virtual hardware that is independent of the nature of the host computer, running real device drivers and operating systems, not just applica-

tion programs. Unlike a trace-driven simulator, an execution-driven simulator allows the executed path to change dynamically (e.g., down a mispredicted branch).

We use Simics [50], GEMS’s Ruby [52], and in-house timing-first processor models.

Simics provides full-system functional simulation of multiprocessor systems and verifies correct-

ness of the other two simulators. Ruby models detailed memory system timing, while our proces-

sor models offer a rich environment for microarchitectural exploration ranging from a simple in-

order model to an out-of-order (OoO) superscalar and a multithreaded multi-core model. Fur-

thermore, we have augmented our simulators with Wattch [9] and CACTI 5 [36], which provide

architectural-level approximations of power consumed by logic and memory structures. Detailed

discussion of our simulation models and assumptions follows next.

3.1.1 Simulation Assumptions

In line with other microarchitecture simulators, our simulators do not attempt to model every computer component in detail, but instead focus on specific components of interest while making simplifying assumptions about the rest in order to maintain usable simulation speed. Given the nature of this dissertation, our in-house timing-first processor models aim to capture many of the subtle interactions within and between microarchitectural states. All pipeline stages, which are configurable, and core structures faithfully model the allocated width, size, and/or ports, thereby simulating structural hazards. The execution-driven simulation naturally handles both data and control hazards as instructions execute. All processor models share various flavors of memory disambiguation mechanisms, which can be as aggressive or conservative as the configuration allows. Because we simulate unmodified SPARCv9 operating systems, all microarchitectures model hardware-assisted translation lookaside buffer (TLB) fill and register window exceptions. Meanwhile, GEMS's Ruby manages the memory systems, including coherence protocols, on-chip interconnects, and memory controllers.

Although some ALU instructions (e.g., multiplication and division) have variable execution latencies, we simplify the ALU logic by fixing the latency for a given opcode. We select the laten- cies based on numerous published SPARCv9 data. Another simplification is infinite bandwidth of operand networks, though we faithfully model network routing and latency. Finally, the power models assume aggressive clock gating of logic structures not in use, with no reactivation delay.

Appendix B details the parameter space of our simulators.

3.2 Workloads

TABLE 3-1. SPEC CPU 2006 characterization

Workload: Description (characterization flags; the original columns are ILP, branch misprediction, L1-I/L1-D/L2/L3 miss rates, and MLP)
perlbench: Email spam checking with a perl script (L)
sphinx3: Speech recognition (L)
gromacs: Molecular dynamics: lysozyme in water and ion solution (L L)
calculix: Finite-element solver (LL)
dealII: C++ program library for finite element models and error estimation (LL)
soplex: Sparse matrix solver (LL)
leslie3D: Computational fluid dynamics (LL)
omnetpp: Discrete event simulation of a large Ethernet (LL)
tonto: Quantum chemistry (LL)
gcc: Based on gcc Version 3.2 (LL)
h264ref: H.264/AVC audio/video codec (LLL)
astar: Path-finding AI (HLL)
gobmk: AI: The Game of Go (HLL)
wrf: Weather research forecasting (LLH)
povray: Raytracing (LLH)
xalancbmk: XML to HTML converter (LH)
cactusADM: Numerical relativity calculation (LH)
GemsFDTD: Finite difference time domain solver of Maxwell's equations (LH)
namd: Biomolecular systems simulation (HLLH)
sjeng: Chess AI (HLLH)
bzip2: Multiple compression/decompression steps of input JPEGs, binaries, source, and HTML (LHLH)
gamess: Quantum chemical computations (HHLLH)
hmmer: Search of a gene sequence database (HLL)
zeusmp: Computational fluid dynamics (LHH)
lbm: Lattice Boltzmann computational fluid dynamics (LHH)
bwaves: Simulation of blast waves in 3D viscous flow (LLHH)
mcf: Combinatorial optimization: vehicle scheduling (LLHH)
milc: Quantum chromodynamics lattice computation (LLLHH)
libquantum: Quantum computer simulation (LLLHHH)

We simulate SPEC CPU 2006 [34] and Wisconsin commercial workloads [2]. All programs were compiled for the 64-bit SPARC ISA using the Sun Studio 11 compiler with base tuning.

As performance scaling is best understood in the context of single-threaded benchmarks,

Chapters 4, 5, and 6 assume that each benchmark runs on a single core with no other concurrent threads. We simulate each benchmark for one hundred million instructions. We fast-forward benchmarks past their initialization phases, which warms up page tables, TLBs, caches, and numerous predictors.

We provide brief workload characterization of SPEC CPU 2006 in Table 3-1 and Wisconsin commercial workloads in Table 3-2, each grouped by Hamming distance of the listed characteristics. L means low, while H is for high. Blank cells indicate medium.

TABLE 3-2. Wisconsin commercial workload characterization

Workload: Description (characterization flags; the original columns are ILP, branch misprediction, L1-I/L1-D/L2/L3 miss rates, and MLP)
Apache: Static web content serving (LL)
SPECjbb: Java program emulating a 3-tier e-business system with emphasis on the middle tier business logic (LL)
Zeus: Web server (LL)
OLTP: TPC-C v3.0, IBM's DB2 V7.2 EEE database management system (HLL)

3.3 Common Design Configuration

FIGURE 3-1. Target CMP.

Our target machine is an 8-core chip multiprocessor (CMP), as Figure 3-1 depicts. Each node consists of a core, a private L1-L2 cache hierarchy, and one bank of a large shared L3. We keep some configuration parameters unchanged throughout (or most of) the dissertation, which Table 3-3 lists. We discuss other design-specific parameters in each chapter.

TABLE 3-3. Common configuration parameters

Component: Chapter 4 | Chapter 5 | Chapter 6
Branch Prediction: TAGE [62], 1K-entry tagged tables, 5 history bits; 16-entry RAS; 64-entry 4-way BTB (Chapter 4) / 256-entry 4-way BTB (Chapters 5 and 6)
Disambiguation: NoSQ [63]; 1024-entry predictor; 1024-entry double-buffered SSBF
Cache-Core Predictor: N/A (Chapters 4 and 6); 2048 entries per core [40,8] (Chapter 5)
Fetch-Dispatch: 7 cycles, unless specified otherwise
L1-I Cache: 32KB, 4-way, 64B line, next-line prefetching; 1-cycle latency (Chapter 4) / 2-cycle latency (Chapters 5 and 6)
L1-D Cache: 32KB, 4-way, 64B line, write-through, write-invalidate, 2 ports; 1-cycle latency (Chapter 4) / 2-cycle latency (Chapters 5 and 6)
L2 Cache: 1 MB, 8-way, 4 banks, 64B line, 11-cycle latency, write-back, private, inclusive
L3 Cache: 16-way, 8 banks, 64B line, 24-cycle latency, shared, inclusive; 4 MB (Chapter 4) / 8 MB (Chapters 5 and 6)
Main Memory: 2 QPI-like links (up to 64 GB/s), ~300-cycle latency
Coherence: MOSI-based protocol / MOESI-based directory protocol
Technology: 3 GHz clock, 0.9 Vdd, 45nm process

Chapter 4

WiDGET: Wisconsin Decoupled Grid Execution Tiles

Power-proportional computing requires the ability to operate, on a single platform, across a

wide range of operating points, from high performance, high power operation to low power, low

performance operation. We achieve this wide operating range by aggregating or disabling

resources to scale up or down, respectively, for each individual core. In this chapter, we propose

the underlying microarchitecture, called WiDGET (Wisconsin Decoupled Grid Execution Tiles), and evaluate the single thread capability with emphasis on power proportionality.

This chapter begins with a high-level overview of WiDGET (Section 4.1). We then present an

instruction cost model that accounts for communication overheads, making a case for an imple-

mentable hardware instruction steering design (Section 4.2). We discuss details of the WiDGET

microarchitecture (Section 4.3), and examine the power proportionality (Section 4.4), followed by

a summary of our findings (Section 4.5).

4.1 High-Level Overview

WiDGET aims to gracefully scale cores across the power-performance spectrum. To address

this goal, we harness multiple in-order issue resources, instead of relying on power-dominant out-

of-order (OoO) logic. This design delivers power efficiency while preserving three OoO-like per-

formance constraints. First, ready instructions must be exposed to the heads of the in-order issue 35

FIGURE 4-1. Conceptual block diagram of WiDGET. The shaded components are the primary means to accelerate the single thread performance.

buffers. Second, stalled instructions must not block the execution of later ready instructions.

Third, the design should provide enough buffering capacity to prevent instruction buffer clog—a pathological stall condition in which dispatch is halted while all scheduling resources are occupied by earlier waiting instructions. Satisfying all of these constraints requires intelligent management of in-order issue resources.

Figure 4-1 illustrates WiDGET’s sea of resources design. An instruction engine (IE) resembles a conventional OoO core’s frontend and backend pipeline functions with the addition of instruc- tion steering logic for the distributed execution units (EUs). Each EU is capable of buffering and executing instructions in order. A hierarchical operand network connects a cluster of four adjacent

EUs via full bypass, while a 1-cycle link bridges two adjacent clusters. An IE has an associated EU cluster, which is enough to deliver the performance of a comparable OoO machine (Section 4.4.2).

Yet the decoupled design provides the flexibility to further scale up the core by borrowing up to four EUs from the neighboring IE. The hardware has control paths to distribute instructions and commands to any assigned EU in one cycle. 36


FIGURE 4-2. Limitations of the Salverda and Zilles cost model. (a) An example instruction sequence and the dataflow graph. (b) Steering cost of i3. (c) Steering under idealized communication assumption. (d) Impacts of adding a 1-cycle latency between EUs using the ideal cost model. (e) Communication-latency-aware cost model under a 1-cycle latency between EUs.

By varying the number of in-order EUs, which include some amount of instruction buffering,

we can select a point on the power-performance spectrum best suited to the current situation.

When the workload calls for aggressive exploitation of instruction-level parallelism (ILP), addi-

tional EUs can be allocated to service demands. On the other hand, the number of EUs can be

reduced to conserve power, e.g., when running many threads, or if available ILP is limited.

4.2 Toward Practical Instruction Steering

Chapter 2 reviewed the instruction steering cost model Salverda and Zilles proposed [61]

(depicted again in Figure 4-2(a)-(c)). They argued that steering for in-order EUs is too complex to implement, assuming idealized inter-EU communication (Figure 4-2(c)). However, we cannot

ignore the communication cost as wire delay increasingly dominates with shrinking CMOS feature

sizes (Appendix A). If just a single-cycle inter-EU delay is added, Figure 4-2(d) shows that the same instruction sequence in Figure 4-2(a) now takes five cycles. The clouds depict the

incurred operand transfer delays that did not exist under the idealized communication assump-

tion in Figure 4-2(c). A cost model that is sensitive to communication overheads will instead keep

all four instructions in the same EU, completing the sequence in four cycles as depicted in

Figure 4-2(e).

Our extension of the Salverda and Zilles cost model incorporates the above observation. Spe-

cifically, operand availability is now governed by two variables: operand computation time by the

producer and the operand transfer time to reach the consumer EU. We denote the latter as comm(i, e). The horizon is therefore a function of an EU as well: h(i, e) = max{disp(i), data(i) + comm(i, e)}.

In the example of Figure 4-2, the horizon of i3 is calculated as the following, provided it is dis-

patched at time 0 and i1 is steered to EU 1:

h(i3, EU 1) = max{0, 2 + 0} = 2

h(i3, EU 2) = max{0, 2 + 1} = 3
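A minimal sketch of the extended cost computation (again with our own illustrative naming; the comm and frontier values would come from the operand-network topology and per-EU state) is:

    #define NUM_EUS 8

    typedef struct {
        int disp;             /* dispatch time of instruction i                  */
        int data;             /* time at which the producer computes the operand */
        int comm[NUM_EUS];    /* operand transfer latency from producer to EU e  */
    } SteerInfo;

    /* Cost(i, e) = max{disp(i), data(i) + comm(i, e)} - f(e) */
    static int cost(const SteerInfo *i, int e, const int frontier[NUM_EUS]) {
        int arrival = i->data + i->comm[e];
        int horizon = (i->disp > arrival) ? i->disp : arrival;
        return horizon - frontier[e];
    }

With disp = 0, data = 2, comm = 0 for EU 1, and comm = 1 for EU 2, this reproduces the horizons of 2 and 3 computed above.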

We call the extended model the Communication-Latency-Aware Cost Model and measure the

steering cost: Cost(i, e) = h(i, e) - f(e). An ideal steering decision, therefore, sends an instruction to

a different EU from the producer’s EU only when the operand transfer latency can be hidden. In

contrast, the Salverda and Zilles cost model, assuming no communication penalties, spreads com-

putation across the EUs to minimize true cost, benefiting from higher EU count. Salverda and

Zilles therefore conclude that dataflow properties constrain the performance improvement from

fusing in-order cores. It requires either a very convoluted steering mechanism that keeps track of

each EU’s frontier in relation to an instruction’s horizon or fusing so many cores that fusion over-

heads become impractical. Under realistic communication delays, however, our model tends to

FIGURE 4-3. WiDGET microarchitecture.

mitigate the pressure for more EUs and obviate the need for considering distant EUs. This reduces

the number of available instruction steering slots, leading to an implementable steering policy.

We approximate the communication-latency-aware cost model by controlling often known

variables, disp(i) and comm(i, e), and simplifying hard-to-predict variables, data(i) and f(e). We try

to steer a consumer directly behind the producer, similar to the dependence-based steering pro- posed by Palacharla et al. [56]. The important difference is accounting for communication, thereby keeping dependent instructions nearby to reduce the latency and power from operand transfers. Hence, WiDGET only considers a subset of the available EUs for a given instruction, making the steering complexity tractable.

4.3 Microarchitecture

The current technological trends favor a hardware design based on a sea of resources. This

design naturally maps well to TLP, but makes achieving high ILP very challenging. WiDGET addresses this issue by aggregating in-order-issue EUs to approximate OoO-issue capability. We therefore employ steering to distribute instructions, localizing dependent instructions into the same cluster whenever possible. The routing network forwards operands to intra-cluster EUs in time for back-to-back execution, but incurs an additional cycle for each inter-cluster transfer.

Conversely, independent instructions are steered to any empty EUs and execute in parallel. When there is a long-latency instruction, the EU that is executing the instruction acts as a buffer for the chain of dependent instructions; other EUs remain in an unblocked state and can continue execu- tion. Thus, our sea of resources design enables independent instructions to run ahead of the ear- lier stalled instructions in available EUs, extracting ILP and memory-level parallelism from a program. Figure 4-3 illustrates an example WiDGET chip with eight IEs, each of which consists of frontend and backend pipeline functions comparable to a conventional OoO core. An IE therefore manages thread specific information, including the register file and the re-order buffer (ROB), for a thread fetched and dispatched from the IE. The following sections provide more details about the

IE and EU functionality.

4.3.1 Pipeline Stages

Figure 4-4 shows WiDGET’s pipeline stages, highlighting those that are unique to WiDGET.

The non-shaded stages resemble a conventional OoO design except for the additional NoSQ

(short for No Store Queue) support to eliminate a centralized memory disambiguation mechanism during execution [63].

WiDGET makes steering decisions at the Steer Stage so that instructions are dispatched to the appropriate EUs the following cycle. Section 4.3.2 provides a detailed description of our steering heuristic.

FIGURE 4-4. Pipeline Stages.

FIGURE 4-5. Frontend.

The Execute stage can take multiple cycles depending on the operation and the utilization of the selected EU. Each EU independently manages instruction execution and no more than one operation issues at a time per EU. The total issue width is a function of the aggregate EU count, as each EU provides an additional execution engine. Executed instructions are removed from their

EUs and forward the results directly to the consumer EUs, if any, and to the register file in the dis- patching IE. Section 4.3.3 describes the detailed implementation.

4.3.2 Frontend

Figure 4-5 illustrates the detailed frontend of our architecture, which resembles a conventional

OoO core’s, including the centralized instruction fetch. Our frontend also has the NoSQ mecha- nism (Bypassing Predictor) for memory disambiguation and instruction steering. We derive the steering heuristic from the observation made in Chapter 2 that dependent instructions must be 41 kept nearby, obviating the need for considering every EU each time. Specifically, we send consum-

ers directly behind the producer or to an empty EU in the same EU cluster. If no such EU is found,

we simply stall steering until either a desirable EU becomes available or the producer finishes exe-

cution. It is through stalling that we ensure steering complexity is manageable and communication

overheads do not diminish the benefits from parallelism.

The heuristic requires three pieces of information: a producer’s steered EU, whether a pro-

ducer has another consumer steered to the same instruction buffer, and a list of empty EUs. We

employ a Last Producer Table (LPT) and an empty bit vector to keep track of the first two and the

last information, respectively. The LPT is indexed by a register and contains two fields. The first

field indicates the instruction buffer ID to which the producing instruction of the given register is

steered. The second field consists of a single bit; when set, this bit indicates at least one instruction

has been steered as a result of the producer-consumer relationship. An LPT entry is updated when

an instruction is steered and is invalidated when the register value is written back to the register

file. An invalid entry, therefore, indicates that the value has been computed and is available in the

register file. The empty bit vector is sized to the total number of instruction buffers, marking the

corresponding buffer’s occupancy status. We similarly use a full bit vector to ensure a producer

instruction buffer still has room for the consumer instruction. Feedback from the EUs updates both of the bit vectors every cycle.
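The steering state just described might be organized as in the following sketch; sizes and field names (NUM_REGS, NUM_IBS, and so on) are our own illustrative choices, not the actual implementation.

    #define NUM_REGS 64      /* architectural registers indexing the LPT            */
    #define NUM_IBS  32      /* total instruction buffers (illustrative count)      */

    typedef struct {
        int buffer_id;       /* IB to which the producer was steered; invalidated   */
                             /* (e.g., set to -1) once the value reaches the        */
                             /* register file, meaning the operand is available     */
        int has_consumer;    /* set when a consumer has been steered behind it      */
    } LPTEntry;

    typedef struct {
        LPTEntry      lpt[NUM_REGS];    /* Last Producer Table, indexed by register */
        unsigned char empty[NUM_IBS];   /* 1 if the corresponding IB is empty       */
        unsigned char full[NUM_IBS];    /* 1 if the IB has no room for a consumer   */
    } SteerState;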

/* Let I be an instruction under consideration. Also, let s and EUp,s be I's
   source operand and the producer EU of operand s, respectively. s is omitted
   when only a single operand is outstanding. */
switch (numOutstandingOps(I)) {
  case 0:
    return getEmptyEU();
  case 1:
    if (!hasInstrBehind(s))
      return EUp;
    else
      return getEmptyEUInCluster(EUp);
  case 2:
    if (!hasInstrBehind(s1))
      return EUp1;
    else if (!hasInstrBehind(s2))
      return EUp2;
    else
      return getEmptyEUInCluster(EUp1, EUp2);
}

FIGURE 4-6. Pseudo-code for instruction steering.

Figure 4-6 provides pseudo-code for the steering heuristic. The location of a producer (EUp) is tracked by accessing the LPT with the consumer's source operand. If the indexed entry's buffer ID is null, the producer has already computed the operand. getEmptyEU() accesses the empty bit vector and returns an ID whose corresponding entry is set to 1. If more than one entry is set, it randomly chooses an ID from an EU cluster with more empty buffers for load balancing. If all of the entries

are set to 0, the function returns -1, indicating a steering stall. hasInstrBehind(s) returns true if an

LPT entry indexed by s has the consumer field set to 1; otherwise, it returns false. Given one or

more EU IDs, getEmptyEUInCluster() searches the empty bit vector only within the corresponding

EU cluster(s). It returns either an available ID similarly to getEmptyEU() or -1 if no empty buffers

are found in the cluster(s), stalling the steering. Appendix C provides a working example to help

explain this heuristic.
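Under the same illustrative assumptions, the helpers named in Figure 4-6 could be realized roughly as follows; this is a sketch of our reading of the heuristic, with buffers grouped into clusters of four.

    #define CLUSTER_SIZE 4

    /* True if a consumer was already steered behind the producer of operand s
     * (the consumer bit of the LPT entry sketched above). */
    static int hasInstrBehind(const int lpt_has_consumer[], int reg_s) {
        return lpt_has_consumer[reg_s];
    }

    /* Return an empty IB within the producer's cluster of four, or -1 to stall.
     * (The two-producer case in Figure 4-6 would search both clusters.) */
    static int getEmptyEUInCluster(const unsigned char empty[], int producer_ib) {
        int base = (producer_ib / CLUSTER_SIZE) * CLUSTER_SIZE;
        for (int ib = base; ib < base + CLUSTER_SIZE; ib++)
            if (empty[ib])
                return ib;
        return -1;   /* no desirable buffer: stall until one frees up */
    }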

4.3.3 Execution Unit

Figure 4-7 shows an EU, consisting of a small instruction buffer (IB) FIFO, execution engine

(an integer ALU, floating-point unit, and address generation unit), operand buffer, and router con-

nections at the input and output. The IB is configured to be four times larger than the fetch width

FIGURE 4-7. Execution Unit.

to minimize frontend stalls. Each entry has four fields: instruction, operand 1, operand 2, and a bit vector of consumer EU IDs. The consumer EU ID field indicates EUs to forward the result to.

Although our dynamic instruction steering provides the ability to adjust to dynamic events, the caveat is that consumer EUs are not known until the dependent instructions are steered. There- fore, an instruction has to send its EU ID to the producer EU via control paths after steering is per- formed. However, a race can occur if a producer has finished execution and has been removed from the IB by the time the consumer reaches the producer EU. To prevent this situation, an oper- and buffer holds the result of an instruction for a cycle, which is the latency of register file write backs.
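Putting the entry fields and the one-cycle operand buffer together, an EU's buffering state might look like the sketch below; the types and sizes are ours and purely illustrative.

    #define MAX_EUS 8

    typedef struct {
        unsigned long instr;               /* buffered instruction                     */
        unsigned long operand1, operand2;  /* source operands, once available          */
        unsigned char consumers[MAX_EUS];  /* bit vector of consumer EU IDs to which   */
                                           /* the result is forwarded                  */
    } IBEntry;

    typedef struct {
        unsigned long value;  /* most recent result                                    */
        int           valid;  /* held for one cycle (the register-file write-back      */
                              /* latency) so a late-arriving consumer EU ID still      */
                              /* finds the value after the producer leaves the IB      */
    } OperandBuffer;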

The units in the execution engine are pipelined, though an EU can only issue one instruction per cycle. Note that each additional EU increases both issue bandwidth and buffer space for scheduled instructions. This primarily contributes to WiDGET's high single thread performance with simple in-order EUs.

4.3.4 Backend

The backend resembles a conventional OoO core’s. The ROB ensures in-order commit, and

stores write their values to the data cache at commit as in a traditional pipeline.

4.4 Evaluation

This section evaluates the microarchitecture described in Section 4.3 and the potential for power-proportional computing. Given this chapter’s focus on single thread capability, our evalua-

tion examines whether WiDGET meets two crucial properties of power proportionality: wide per-

formance and power ranges.

4.4.1 Simulation Methodology

We use two baselines designed after commercial processors on the opposite ends of a design

spectrum: a low power Intel Atom Silverthorne [27] and a high performance Intel Xeon Tulsa [70].

We call the former Mite and the latter Neon. The memory hierarchy of the baselines and WiDGET

is configured to emulate Neon’s in order to isolate the performance of the three different core

designs. We evaluate WiDGET configurations with one through eight EUs allocated to an instruc-

tion engine. These initial experiments assume a priori static allocation of EUs, as might be done by low-level system software.

Table 4-1 lists the key configuration parameters. The area estimate only accounts for a single-threaded core with the listed memory hierarchy. We derived the area of the Neon and Mite from published die area and attributed core component or unit area. The Neon's area was then halved because of the process technology change from 65 to 45 nm. Note that our more aggressive memory hierarchy increases Mite's memory die area. WiDGET's area estimate includes an instruction

TABLE 4-1. Machine configurations

Component: Mite | Neon | WiDGET
L1-I / L1-D (all): 32 KB, 4-way, 1 cycle; next-line prefetching for L1-I
Instruction Engine: 2-wide FE and BE (Mite) | 4-wide FE and BE, 128-entry ROB (Neon, WiDGET)
Execution Core: 16-entry unified in-order instruction buffer, 2 INT, 2 FP, 2 Addr Gen (Mite) | 32-entry unified OoO instruction queue, 3 INT, 3 FP, and 2 Addr Gen, 0-cycle operand bypass to anywhere in core (Neon) | 16-entry in-order IB per EU, 1 INT, 1 FP, and 1 Addr Gen per EU, 0-cycle operand bypass within a cluster of four EUs, 1-cycle inter-cluster link (WiDGET)
Disambiguation (all): NoSQ; 256-entry, 4-way store-load bypassing predictor; 1K-entry T-SSBF
L2 / L3 / DRAM (all): 1 MB, 8-way, 12 cycles / 4 MB, 16-way, 24 cycles / ~300 cycles, 16-entry MSHR
Area Estimate (45nm): ~30 mm2 (Mite) | ~41 mm2 (Neon) | ~33 mm2 with 8 EUs (WiDGET)

engine, 8 EUs, and the memory hierarchy. We estimate that the Atom's core area is roughly equivalent to WiDGET with 2 EUs due to the similar core structure sizes. The area of each additional

EU is based on a TRIPS processor’s Execution Tile [44], which resembles the EU composition.

Despite WiDGET’s greater ALU resources, our area model concludes that WiDGET is smaller than the Neon, mainly due to WiDGET’s simpler structures.

In this evaluation, we fully provision ports on the register file and L1-D cache. When modeling power consumption, we outfit the register file with read ports numbering twice the dispatch width

(8), and write ports numbering the commit width (4). A similar simplifying assumption is applied to the L1-D power model.

4.4.2 Performance Range

A wide performance range is vital for power proportionality, yet WiDGET's in-order issue constraint makes it challenging to match OoO execution performance. Therefore, we first evaluate


FIGURE 4-8. 8-EU performance relative to the Neon.

single thread performance of WiDGET when configured with the maximum number of EUs: eight

EUs with one instruction buffer per EU. Figure 4-8 presents IPCs relative to the high-performance

Neon baseline, with integer benchmarks on the left and floating-point benchmarks on the right.

Even with more than double the ALU resources, one third of the benchmarks fail to match the

Neon’s performance. In particular, WiDGET is only able to produce 35% of the Neon performance

for the outlier libquantum, drastically impacting the integer harmonic mean.

Figure 4-9 demonstrates the average EU utilization, revealing the sources of the performance

degradation. Empty is when an EU has no instructions in the instruction buffer (IB). Waiting for

Producer and Waiting for Op Transfer are when EU utilization is wasted because the head instruc- tion in the IB has at least one outstanding operand. The former is waiting for the operand to be computed, whereas the latter indicates that the operand has been computed but has not yet reached the EU due to the inter-cluster communication delay. Finally, Accessing Memory and Exe- cuting ALU are when an EU is executing memory and non-memory instructions, respectively.

Since instructions reside in IBs until execution is complete, memory-intensive workloads cause

EUs to spend much of the time waiting for load data (Accessing Memory).

The under-performing benchmarks demonstrate common characteristics: frequent stalls due to memory access and outstanding producers. This increases the pressure on the EUs to buffer

FIGURE 4-9. Average cycles spent on each EU state with 8 EUs.

more dependent instructions for a longer period of time. As a result, the steering logic becomes

more prone to stalls due to the lack of desirable EUs. libquantum is the most prominent example.

It spends 62% and 37% of the time on memory accesses and waiting for producers, respectively,

leaving non-memory execution to a mere 1%. In contrast, benchmarks that have comparable per-

formance to the Neon have the opposite trends. They have a larger portion of time spent on exe-

cuting non-memory instructions and are less likely to waste EU utilization by waiting for

operands. Hence, fewer EUs are necessary to buffer stalled chains of instructions, leveraging more

EUs to execute independent chains. Note that WiDGET’s hierarchical operand network in lieu of

the Neon’s full operand bypass has little effect on the EUs. EUs spend less than 1% of the time on

waiting for operands to be transferred, which is accomplished by enforcing cluster affinity at the

steering logic.

4.4.3 Improving Performance

As Salverda and Zilles observed [61], the limiting factor of WiDGET's performance is the number of independent instruction chains the system can expose, not the issue bandwidth. To overcome this, we expand the buffering capability by allocating multiple IBs to each EU. Despite the same issue bandwidth, an EU can now buffer more than one stalled chain while permitting an independent chain in another IB to utilize the otherwise idle execution engine. This change, however, requires each EU to have simple instruction issue selection logic; we use an oldest-instruction-first policy. Nevertheless, the logic is much less complex than that of a monolithic OoO as long as the number of IBs per EU is kept small. WiDGET's selection logic, with 8 EUs and 4 IBs each, only consumes 3% of the Neon's centralized instruction selection logic power. We also enlarge the size of the empty and full bit vectors in the steering logic to the total number of IBs.

The steering complexity is managed by keeping the cluster locality invariant.
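A minimal sketch of the oldest-instruction-first selection within one EU, assuming each IB head carries a dispatch sequence number and a ready flag (our assumption for illustration only):

    #define IBS_PER_EU 4

    typedef struct {
        int      valid;   /* an instruction sits at this IB head          */
        int      ready;   /* all of its source operands are available     */
        unsigned seq;     /* dispatch order; smaller means older          */
    } IBHead;

    /* Pick the oldest ready IB head this cycle, or -1 if none can issue. */
    static int select_ib(const IBHead head[IBS_PER_EU]) {
        int pick = -1;
        for (int ib = 0; ib < IBS_PER_EU; ib++)
            if (head[ib].valid && head[ib].ready &&
                (pick < 0 || head[ib].seq < head[pick].seq))
                pick = ib;
        return pick;
    }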

Figure 4-10(a) summarizes the performance benefits of increased buffering. We present the harmonic mean IPCs of the entire SPEC CPU2006 benchmark suite, normalized to the Neon.

WiDGET performs comparably to the Neon with at least 12 IBs, which is explained by the benchmarks' dataflow characteristics. With a 128-entry ROB, which is our configuration, the integer and floating-point benchmarks have 8 and 12 extractable independent chains, respectively.

With 4 IBs per EU, as few as 3 EUs are sufficient, while 8 EUs outperform the Neon by 26%—a sharp contrast to 18% degradation by the 1 IB counterpart. Therefore, mapping chains of dependent instructions to a sufficient number of buffers achieves extraction of ILP and memory-level parallelism despite the in-order issue constraints. WiDGET can also scale down with a single EU and an IB with performance slightly less than the Mite, offering a wide performance range of 3.8x.

It is clear that the performance sees diminishing returns after 7 EUs with 4 IBs each, obviating the need for more than 2 clusters. Rather, an interesting comparison is to 8 EUs with 3 IBs, which yield similar performance. 7 EUs with 4 IBs deploy more buffers than 8 EUs with 3 IBs, while the latter uses more ALUs. We evaluate the trade-off from the power dissipation perspective in

Section 4.4.5.

(c) IPC degradation (%) of the 2-EU cluster size relative to the 4-EU cluster size (rows: EUs; columns: IBs per EU):

         1 IB    2 IBs   3 IBs   4 IBs
1 EU      0.0     0.0     0.0     0.0
2 EUs     0.0     0.0     0.0     0.0
3 EUs    -1.9    -8.1    -22     -24
4 EUs    -1.8    -3.1    -4.3    -2.7
5 EUs    -3.4    -7.0    -2.3    -3.4
6 EUs    -1.5    -3.0    -2.4    -2.0
7 EUs    -3.2    -6.6    -7.3    -8.8
8 EUs    -1.1    -4.1    -3.3    -2.3

FIGURE 4-10. Harmonic mean IPCs relative to the Neon. (a) 4-EU cluster size. (b) 2-EU cluster size. (c) IPC degradation (%) of the 2-EU cluster size compared to the 4-EU cluster size.

4.4.4 Impacts of a Cluster Size

Cluster sizes impact WiDGET performance, making it non-monotonic with increasing EU count. Figure 4-10(a) illustrates this anomaly for the 5-EU case with three or four IBs, which is caused by the hierarchical operand network that employs a single-cycle link to bridge two adjacent

Reducing the cluster size to two has more dramatic performance impacts. Even though it can simplify steering logic and intra-cluster full-bypass network, Figure 4-10(b) demonstrates that the performance becomes highly sensitive to the cluster formation. With three or more IBs per EU, the odd EU configurations degrade the performance of the even EU configurations once 2 EUs are assigned. Therefore, under this cluster size, WiDGET must allocate a pair of EUs to realize perfor- mance improvements, resulting in coarser-grained power proportionality. We thus use a cluster size of four for the rest of the chapter.

4.4.5 Power Range

Figure 4-11 presents the harmonic mean system power of the SPEC CPU2006 benchmark suite, normalized to the Neon. We did a best-effort validation of the Neon and Mite power consumption against Xeon [70] and Atom [27] processors, respectively, by first configuring them using the published data. We power down non-provisioned EUs [37].

The shape of WiDGET’s curve resembles that of the performance curve in Figure 4-10, dem- onstrating power proportionality. WiDGET, composed of simple building blocks, achieves 8-58% power savings compared to the Neon. Furthermore, WiDGET’s EU modularity enables scaling down the power by up to 2.2 to approximate the Mite’s low power. Note that the 5-EU case slightly 51 ฀ ฀ ฀ ฀ ฀ ฀

FIGURE 4-11. Harmonic mean system power relative to the Neon.

less power than the smaller 4-EU configuration. This behavior arises because of the non-monotonic performance increase in the 5-EU case that the previous section observed.

Figure 4-11 resolves the previous section's performance and EU provisioning trade-off. Since the power consumption of 7 EUs with 4 IBs and 8 EUs with 3 IBs is almost identical, one can resort to the former, the slightly higher performing configuration. This has the additional benefit of stealing fewer EUs from a neighbor, allowing more threads to run in parallel as the power budget permits.

WiDGET’s power increase from dedicating more EUs is not solely due to the additional resources, but is also the result of higher utilization in the existing resources. As Figure 4-12 shows, the breakdown for harmonic mean system power of the SPEC CPU2006 benchmark suite is divided into two broad categories, each with four subcategories: caches and core logic. Fetch/

Decode/Rename, which includes a branch predictor and an instruction translation buffer, accounts for most of the frontend logic. Although WiDGET's instruction steering logic resides in the

FIGURE 4-12. Power breakdown relative to the Neon.

frontend, it is included in the Execution component along with the execution core to make a fair com-

parison with Neon and Mite for power stemming from scheduling and execution. Hence, this

category encompasses in-order instruction buffers for the Mite and WiDGET, an OoO instruction

queue for the Neon, operand network, and a data translation lookaside buffer. Backend and ALU

include the commit logic and the ALU resources, respectively.

Enlarging resource allocation has a first-order impact on the ALU and execution power. The larger effective window size also increases activity in the system, resulting in proportional power growth in Fetch/Decode/Rename, L1D, and L2. Yet, WiDGET’s considerable power savings com- pared to Neon comes from the difference in the execution models. WiDGET effectively replaces the Neon’s associative search in the OoO issue queue and the full bypass with simple in-order EUs and the hierarchical operand network, resulting in 24-29% reduction in the execution power. This is sufficient to mask out WiDGET’s additional power resulting from the higher ALU count, even in the 8-EU case.

The breakdown is also useful to understand the power gap between Mite and WiDGET's 1 EU with 1 IB configuration. WiDGET's OoO support in the instruction engine, namely register renaming and ROB, is primarily responsible for the extra power.


FIGURE 4-13. Power Proportionality of WiDGET compared to Neon and Mite.

Figure 4-13 puts together the power-performance relationship of the three designs. WiDGET

consumes 21% less power than the Neon for the same performance (i.e., 3 EUs with 4 IBs), and

yields 8% power savings for 26% better performance (i.e., 8 EUs with 4 IBs). WiDGET dissipates

power in proportion to the performance, covering both the high-performance Neon and the low-

power Mite on a single chip.

For a more detailed look at Figure 4-13, we provide a blown-up cutout focusing on the 4- and

5-EU data points that do not follow the rest of the power-performance trend due to the unbal- anced clusters. The data points of each EU count correspond to 1 through 4 IBs from left to right.

The distance between the 4- and 5-EU points shrinks as the number of IBs per EU increases from

1 to 2. With 3 or 4 IBs, the positions of the 4- and 5-EU points are swapped; the 5-EU points are to the left and lower than their 4-EU counterparts. Hence, it is not beneficial to increase buffers after allocating 2 IBs to the 5-EU configuration.

Finally, Figure 4-14 compares power efficiency of the three processor designs. We use

BIPS³/W as the metric, which is appropriate for evaluating efficiency differentials caused by microarchi-


FIGURE 4-14. Geometric mean power efficiency (BIPS³/W).

tectural designs [33]. The data points within the rectangle in the figure represent higher power efficiency than the Neon design. Notably, seven of WiDGET’s configurations below the diagonal line in the rectangle deliver higher power efficiency despite the lower performance than the Neon:

6-8 EUs with 1 IB, 3-5 EUs with 2 IBs, and 3 EUs with 3 IBs. Furthermore, WiDGET is 48% more power efficient than the Neon when achieving the same performance and exceeds the Neon and

Mite by up to 2x and 21x, respectively.
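For reference, with a fixed clock frequency f and average power P, the metric reduces to

BIPS³/W = (IPC × f / 10⁹)³ / P,

so the cubic weighting on performance makes it roughly equivalent, up to a constant, to the inverse of an energy-delay-squared product, which is why it is suited to comparing microarchitectures at a common voltage and frequency.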

It is possible for Neon to improve the power efficiency by scaling down the instruction window size (i.e., ROB). However, the ROB is not the main source of power difference when comparing

Neon to the same performance point of WiDGET (3 EUs with 4 instruction buffers). Rather, it is

Neon’s OoO issue queue that is the primary source of power disparity between the two designs. 55 Because reducing a window size only has an indirect impact on the issue queue power (unless the

window becomes smaller than the issue queue), improvement in power efficiency will be limited.

4.5 Summary

We proposed a power proportional design called WiDGET. The decoupling of the computa-

tion resources enables flexibility to provision EUs to meet different power-performance goals.

WiDGET can be optimized for high throughput and low power by provisioning a small number of

EUs to each instruction engine. When a powerful core is needed to avoid a sequential bottleneck,

for instance, system software can dedicate more EUs to accelerate single thread performance. By

using only as many resources as necessary to deliver target performance, WiDGET achieves

power-proportional computing.

Despite the use of in-order EUs to save power, WiDGET yields even higher performance than

the aggressive Neon by deploying sufficient buffering for scheduled instructions and steering

based on dependencies. This distributed instruction buffering was the key to single thread perfor-

mance and we believe global distributed buffering, if managed well, can yield performance for

other forms of parallelism. The distributed instruction buffers improve latency tolerance, allowing

independent chains to execute ahead of earlier, stalled chains, which is our mechanism to extract

ILP and memory level parallelism. Furthermore, the removal of OoO execution logic contributes

to the primary power savings of WiDGET, resulting in 24-29% reduction in the execution power

compared to the Neon. We experimentally showed that WiDGET consumes 21% less power than

the Neon for the same performance and achieves 8% power savings for 26% better performance

than the Neon. WiDGET’s additional capability to scale down to a level comparable to the Mite makes WiDGET a desirable framework for achieving power proportionality. 56 However, some questions still remain. We did not consider scaling other hardware structures,

including the frontend, backend, and caches. As WiDGET enables more aggressive execution with more EUs, scaling up other structures as well might facilitate even better performance scalability.

On the other hand, we have identified that fixed OoO logic in the frontend and backend (i.e.,

renaming and ROB) hinders scaling down WiDGET’s power, particularly compared to the Mite’s

in-order pipeline. Hence, the next chapter investigates the power-performance impacts of scaling other hardware structures.

Chapter 5

Deconstructing Scalable Cores

"Neither a borrower nor a lender be; For loan oft loses both itself and friend, and borrowing dulls the edge of husbandry" —Hamlet Act I, Scene III by William Shakespeare

In the previous chapter, WiDGET made implicit design choices to only scale the execution resources. Even though it achieved substantial power-performance scalability, questions still remain whether further scalability is feasible by also tuning other structures to the aggressiveness of computation.

In this chapter, we deconstruct prior scalable core designs—Core Fusion, Composable Light- weight Processors, and Forwardflow [40,46,28]—to identify which mechanisms best help, or hurt, energy-efficient scalability. We first develop core scaling taxonomy (Section 5.1). We then intro- duce (Section 5.2) and study (Section 5.4) two abstract cores that capture the fundamental differ- ence in scaling without design-specific details of the prior work. After considering various scaling mechanisms for power-inefficient components (Section 5.5), we propose a more energy-efficient hybrid design, COBRA (Section 5.6). Finally, the summary of our findings (Section 5.7) concludes this chapter.

5.1 Core Scaling Taxonomy

Table 5-1 summarizes a taxonomy for core scaling, including the mechanisms and philosophies used by each exemplar design. These mechanisms capture the high-level implications of

TABLE 5-1. Core scaling taxonomy

Scaled Component: Core Fusion | Composable Lightweight Processors | Forwardflow
L1-I (mechanism to aggregate L1-I caches): Sub-banked L1-Is, operate cooperatively | Distributed blocks, central fetch | No aggregation
Frontend (mechanism to scale fetch and decode width): Collective (centralized) rename, frontend x-bars | EDGE ISA avoids centralized rename | Overprovision
Scheduling (mechanism to scale the instruction scheduler): Aggregated OoO schedulers, instructions steered based on dependency | Distributed OoO buffers, compiler-assisted steering | Monolithic, banked scheduler, static instruction steering
Execution Resources (mechanism to scale the number of functional pipelines): FUs associated with each scheduler | FUs associated with OoO schedulers | FUs associated with groups of scheduler banks
Instruction Window (mechanism to scale the size of the instruction window): Interleaved ROB | Distributed block-level commit | Distributed ROB-like structure
L1-D (mechanism to aggregate L1-D caches): Multiple core-interleaved L1-Ds | Multiple core-interleaved L1-Ds | No aggregation
Resource Acquisition Philosophy (means by which cores are provided with additional resources when scaled up): Aggregate neighboring cores, no per-core overprovisioning | Aggregate many cores, no per-core overprovisioning | Overprovision per-core instruction window and execution resources, no resource borrowing from other cores

these design decisions, without necessarily focusing on the low-level details and design-specific

artifacts. We identify seven principal areas in which these designs differ: instruction cache scaling,

frontend width scaling, instruction scheduling, scaling of execution resources, instruction window

scaling, data cache scaling, and most importantly, resource acquisition philosophy. Note that not

all designs scale all components—i.e., Forwardflow only scales the execution stages and uses fixed-

size caches and a fixed-width frontend pipeline.

The summary in Table 5-1 demonstrates that prior work adopts very different resource acquisition philosophies. Two of the designs seek to scale all aspects of a core through resource borrowing, i.e., through dynamic sharing of microarchitectural resources between cores. The remaining design scales only a portion of the core, and implements an individual core's scale-up by activating addi-

TABLE 5-2. Scaling mechanisms of WiDGET. Shaded cells indicate mechanisms unique to WiDGET.

Scaled Component: WiDGET
L1-I: No aggregation
Frontend: Overprovision
Scheduling: Distributed in-order FIFO buffers; instructions steered based on dependency
Execution Resources: FUs associated with each execution resource
Instruction Window: Monolithic ROB
L1-D: No aggregation
Resource Acquisition Philosophy: Overprovision per-core instruction window resources; borrow execution resources from neighboring cores when necessary

tional core-private resources, a philosophy based on resource overprovisioning. Interestingly, WiD-

GET has both aspects of resource acquisition philosophies, as Table 5-2 explains. Despite the overprovisioned frontend and the mostly core-private resources, each core can borrow execution resources from neighboring cores to enhance the scalability.

In this work, we focus on two areas of evaluation: what elements should scale and which of

these scaled elements should be borrowed or overprovisioned.

5.2 Two Abstract Cores: Borrowing vs. Overprovisioning

Concerns over wire delay have significant impacts on the core-scaling trade-offs, intricately

affecting the benefits of more aggressive resource scaling. We therefore consider two abstract

designs that approach the problem of building a scalable core from either end of the spectrum of

resource acquisition philosophies. We will use these abstractions as vehicles in our evaluation.

Borrowing All Resources (BAR). Resource borrowing in general attempts to maintain pipeline balance of scaled-up cores by utilizing resources from nearby cores. Our borrowing-based scalable core, BAR, seeks to leverage all the resources of neighboring cores, from L1-I to commit logic,

FIGURE 5-1. Conceptual block diagrams of (a) Borrowing All Resources (BAR) and (b) Cheap Overprovisioned Resources (COR) models. Shaded components are shared between cores.

when scaling up, to make best use of on-chip area (Figure 5-1a). Although constraints on area and wire delay suggest that aggregating more than two cores might be prohibitive, we optimistically evaluate the cost of borrowing resources as a flat two-cycle overhead, regardless of the number of cores BAR can effectively aggregate. As a result, this work overestimates the performance of 4-way borrowing. Section 5.4.2 examines performance sensitivity to varying borrowing overheads.

Cheap Overprovisioned Resources (COR). Our overprovisioning-based scalable core, COR, takes a complementary approach and sacrifices pipeline balance in an effort to minimize wire delays (Figure 5-1b). Intuitively, some critical core resources are small and occupy relatively little area, e.g., functional units. COR provisions these resources for the largest scaling point with relatively little area overhead. On the other hand, larger structures like the L1-I and L1-D caches are too large to overprovision—COR simply does not scale these entities. Essentially, this means that the COR core concentrates on scaling the effective instruction window size (scheduler, datapaths, and re-order buffer), but not the frontend resources or the caches. COR simply provisions the latter resources to some fixed configuration, and exercises them differently under different configu-

rations of the instruction window. Because COR’s scalable resources are small and per-core private,

wire delay is not a first-order concern to access these scaled resources.

5.2.1 Trade-offs of Resource Borrowing and Overprovisioning

Baseline Scaled Resources. BAR and COR differ in what resources are scaled, and from where those resources are acquired. Overall, COR scales fewer microarchitectural structures than does

BAR, leaving open the potential for imbalance if the statically provisioned elements of COR are improperly sized with respect to the scaled elements.

BAR scales many more processor resources than does COR, but pays latency penalties for scal- ing these resources. From these differing baselines, we can derive insight into what resources are most profitably placed in the near-neighbor signalling domain, for use in more energy-efficient core designs (Section 5.6).

Coordinating Scaling Operations. Borrowing in general assumes cores from which resources are borrowed are either themselves scaled-down, or entirely powered off. To safely transfer ownership of shared resources, some coordination is required between the participating cores, either by sys- tem software or under hardware control or both—the nature of this coordination is outside the scope of this chapter. On the other hand, one advantage of resource overprovisioning (and COR in particular) is simplicity: cores designed with overprovisioned resources never incur the complex- ity of borrowing resources from other cores. That is, core-private resources need no coordination with other cores before they can be used. 62 Area. Overprovisioned designs have obvious area overheads. When cores scale down, they leave

their core-private resources unused or underutilized (i.e., dark silicon). In the case of COR, given fixed L1-I/D cache sizes, Burns and Gaudiot [15] estimate the area increase from 2-wide to 4-wide pipeline to be 62%, including support for increased register file pressure. Nonetheless, intuition suggests that, compared to overprovisioning, resource borrowing in general (and BAR in particu- lar) makes more efficient use of chip area, as scaled cores leave fewer areas of the chip unutilized.

As a result of implementing dynamic sharing, however, resources borrowed from neighboring cores have more complex routing and floorplanning constraints than core-private resources. Core

Fusion estimated the area overhead of implementing borrowing (among four cores) as equivalent to half of the area of a single core [40]. This additional hardware constitutes multiplexors at the inputs and outputs of shared resources, and the necessary wiring to physically route signals between shared resources in other cores.

5.3 Methodology

Table 5-3 details the configuration of the two abstract cores with three scaling points (scale-1, scale-2, and scale-4, e.g., BAR1/BAR2/BAR4), ranging from a 64-entry instruction window to a 256-entry window, and includes a typical out-of-order superscalar core, similar to the Alpha 21364

[42], as a baseline for comparison. This baseline is a four-wide superscalar core, roughly equivalent to a scale-2 core. The unscaled resources (i.e., frontend, physical register file, etc.) in COR are tuned to match this baseline—these resources do not scale physically (though demand on these resources does change in response to scaling of other components). Other common configuration parameters are summarized in Chapter 3.

TABLE 5-3. Design-Specific Default Configuration Parameters. Shaded cells indicate optimistic assumptions and advantages of one design over another.

Component: Borrowing All Resources (BAR) | Cheap Overprovisioned Resources (COR) | OoO (Baseline)

L1-I:
  BAR1: no aggregation (total: 32KB); BAR2: 2 aggregated, sub-banked L1-Is [40] (total: 64KB); BAR4: 4 aggregated, sub-banked L1-Is (total: 128KB)
  COR1/2/4: no aggregation (total: 32KB)
  OoO: no aggregation (total: 32KB)

Frontend/Backend Width:
  BAR1: 2 wide; BAR2: 4 wide; BAR4: 8 wide
  COR1/2/4: 4 wide
  OoO: 4 wide

Frontend Depth:
  BAR1: 7 cycles; BAR2/BAR4: 11 cycles (7 cycles + 2 cycles for communal rename xbar + 2 cycles for inter-core dispatch xbar)
  COR1/2/4: 7 cycles
  OoO: 7 cycles

Scheduling:
  BAR1: 16-entry unified OoO instruction queue (IQ), 2-wide issue per IQ
  BAR2: 2 aggregated 16-entry OoO IQs, 2-cycle xbar interconnect, back-to-back bypass within a core, cache-bank-predictor steering [40,8], 2-wide issue per IQ
  BAR4: 4 aggregated 16-entry OoO IQs, 2-cycle xbar interconnect, back-to-back bypass within a core, cache-bank-predictor steering, 2-wide issue per IQ
  COR1: 16-entry unified OoO IQ, 2-wide issue per IQ
  COR2: 2 aggregated 16-entry OoO IQs, back-to-back bypass between the IQs, base steering [40], 2-wide issue per IQ
  COR4: 4 aggregated 16-entry OoO IQs, back-to-back bypass between the IQs, base steering, 2-wide issue per IQ
  OoO: 32-entry unified OoO IQ, 4-wide issue per IQ

Execution Resources:
  BAR1/COR1: 1 IALU, 1 FPALU, 1 AGEN
  BAR2/COR2: 2 IALU, 2 FPALU, 2 AGEN
  BAR4/COR4: 4 IALU, 4 FPALU, 4 AGEN
  OoO: 2 IALU, 2 FPALU, 2 AGEN

Instruction Window:
  BAR1: 64 entries, unified; BAR2: 2 banks, 64 entries each; BAR4: 4 banks, 64 entries each
  COR1: 64 entries, unified; COR2: 128 entries, unified; COR4: 256 entries, unified
  OoO: 128 entries, unified

L1-D:
  BAR1: no aggregation (total: 32KB); BAR2: 2 aggregated, core-interleaved L1-Ds (total: 64KB); BAR4: 4 aggregated, core-interleaved L1-Ds (total: 128KB)
  COR1/2/4: no aggregation (total: 32KB)
  OoO: no aggregation (total: 32KB)

We derived 2-wide and 4-wide frontend delays from two sources. First, the access time of a two-port L1-I cache (i.e., up to two lines per cycle for unaligned fetch) is three cycles according to

CACTI 5 [64]. Second, Burns and Gaudiot estimate the critical path delay of both decode and rename stages to be 1.15 ns and 1.3 ns for 2-wide and 4-wide, respectively [15]. Despite the small

tive latency of four cycles, resulting in a total of seven-cycle frontend depth. Although a different

frequency assumption produces dissimilar frontend depth for 2-wide and 4-wide, Section 5.4

shows that the frontend pipeline depth has little impact on the overall core efficiency.
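To spell out the arithmetic: at the assumed 3 GHz clock (a ~0.33 ns cycle),

ceil(1.15 ns × 3 GHz) = ceil(3.45) = 4 cycles and ceil(1.3 ns × 3 GHz) = ceil(3.9) = 4 cycles,

so both widths need four decode and rename stages; adding the three-cycle L1-I access gives the seven-cycle frontend depth used above.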

BAR aggregates nearby ROBs when scaled up, using them as banks. A banked ROB demands

special logic to commit instructions from multiple banks in the same cycle [40,28], but in this

chapter we assume no overhead for retiring any number of instructions, from any bank, up to the

commit width of the pipeline. This assumption favors BAR over COR.

5.4 Initial Evaluation

To begin our evaluation, we consider the raw power and performance characteristics of our

abstract core models introduced in Section 5.2. Section 5.5 will examine the roots underlying

these trends.

5.4.1 Performance Comparison

Figure 5-2 plots IPC for all scaling points of BAR and COR, normalized to the baseline OoO.

The fully scaled down BAR1 and COR1 designs achieve comparable performance, which is not

surprising, considering their only difference is the width of the frontend and backend (two for

BAR1, four for COR1). At higher scaling points, COR outperforms BAR by 9% on average.

Philosophically, the BAR designs seek to uniformly increase pipeline and cache resources, with the expectation that maintaining pipeline balance will outweigh the additional communication latencies. Conversely, the COR designs vary only the size of the instruction window and the number of execution resources (those elements of the core cheap enough to simply overprovision), sacrificing pipeline balance. But because these overprovisioned resources are core-private and small (~0.3 mm2), they incur no significant additional wire delay.

FIGURE 5-2. IPC normalized to the baseline OoO. (a) Integer (left) and commercial (right) benchmarks; (b) floating-point benchmarks.

To quantify the impacts of wire delays, Figure 5-3 plots a breakdown of in-flight instructions into four different states for fully scaled-up BAR and COR (i.e., BAR4 and COR4). Executed is for instructions that have been executed but not yet committed, Executing ALU is for ALU instructions that are executing, Accessing Memory is for outstanding memory accesses, and Waiting is for dispatched but not yet issued instructions, including those marked as ready. The breakdown shows that COR4 has 18% fewer waiting instructions and 15% more executing instructions than BAR4 on average. Hence, the additional inter-core communication of BAR4 causes more instructions to be idle compared to COR4.

FIGURE 5-3. Percentages of in-flight instructions spent in each state. (a) Integer (left) and commercial (right) benchmarks; (b) floating-point benchmarks.

BAR's whole-core scaling impacts pipeline latency in several ways. The most detrimental, direct impact is longer effective latency of the operand crossbar—on the critical path of instruction

execution. Although the steering heuristic aims to dispatch dependent instructions to the same

core, there are three cases where inter-core communication occurs. First, when an instruction has

two outstanding operands and the producer instructions are in different cores, the instruction is

randomly steered to one of the cores, requiring inter-core communication for the other operand.

Second, an instruction is steered away from the producer if the producer’s instruction queue (IQ)

is full. Because these IQs enable out-of-order instruction wake-up, respecting data dependencies at

the expense of frontend stalls is unnecessary, unlike in-order IQs [21]. Third, BAR uses a cache-bank predictor [21,6] to steer memory instructions to a core-interleaved L1-D cache bank. Loads incur additional latency whenever the predicted address maps to a different core than that which produces the address operand(s). All of these cases require remote operand transfers, which amount to 23% and 32% of all instructions in BAR2 and BAR4, respectively (Figure 5-4). Furthermore, bank mispredictions result in additional cycles to reissue the memory instruction to the correct core.

FIGURE 5-4. Instructions affected by remote operand transfers in BAR.

FIGURE 5-5. Misprediction rate of the cache-bank predictor in BAR.

FIGURE 5-6. Memory-level parallelism.
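To make the three communication cases concrete, the sketch below shows how a BAR-style steering heuristic might fall back to inter-core operand transfers. It is illustrative only; the object names and interfaces are our own assumptions, not structures from the simulator.

    import random

    def steer(instr, cores, bank_predictor):
        """Illustrative BAR-style steering; returns the core chosen for dispatch.

        Assumed interfaces (not from the dissertation's simulator): each core
        exposes iq_full(); instructions expose producer_cores() and
        is_memory_op; bank_predictor.predict() guesses the L1-D bank (core).
        """
        producers = instr.producer_cores()

        # Case 3: memory instructions follow the cache-bank predictor so the
        # access lands in the core-interleaved L1-D slice that (hopefully)
        # holds the line; a misprediction later forces a re-route.
        if instr.is_memory_op:
            target = bank_predictor.predict(instr)
        # Case 1: two outstanding operands produced in different cores, so
        # pick one at random; the other operand must cross the crossbar.
        elif len(producers) == 2 and producers[0] is not producers[1]:
            target = random.choice(producers)
        # Common case: co-locate with the single producer to avoid transfers.
        elif len(producers) == 1:
            target = producers[0]
        else:
            target = cores[0]  # no outstanding operands; any core works

        # Case 2: if the preferred core's IQ is full, steer away rather than
        # stall the frontend, accepting a remote operand transfer instead.
        if target.iq_full():
            target = next((c for c in cores if not c.iq_full()), target)

        return target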

This general trend is not without exception. BAR4 outperforms COR4 for two workloads, cactus and gems. Both benchmarks incur relatively few remote operand transfers, in part because the cache-bank predictor is highly accurate for these workloads (Figure 5-5). Furthermore, both benchmarks are memory-bound, and the memory latency helps overlap the remote operand transfers that do occur. Figure 5-6 plots memory-level parallelism (MLP) and shows that both cactus and gems have many more simultaneously outstanding misses than the average benchmark. Additional results (not shown) indicate that cactus and gems have L1-D miss rates of 28% and 48%, respectively, on BAR4, which are about 10% lower than on COR4. The combination of accurate prediction, larger (aggregated) caches, and (memory) latency tolerance makes BAR outperform COR for these benchmarks.

Figure 5-5 also illustrates the limitations of cache-bank prediction. The benchmarks leslie3d and sphinx3 have such high misprediction rates that the performance of BAR2 is actually worse than BAR1, despite the greater cache and pipeline resources.

5.4.2 Performance Sensitivity to Communication Overheads

The performance results of Section 5.4.1 intimately depend upon the wire delay assumptions that we use for our models: 0-cycle overhead for COR and 2-cycle overhead for BAR. To under- stand how different design and technology assumptions may impact these results, we vary the wire delays from zero to two cycles for both designs. For COR, increasing the wire delay from 0 to

n cycles increases the frontend depth by n cycles and also adds n cycles to transfer operands

between function units. For BAR, reducing the wire delay from 2 to n cycles reduces the latency of the operand transfers and of the two frontend crossbars to/from the centralized renamer (see Table 5-3), but has no effect on cache-bank misprediction, which also degrades BAR's performance.

Figure 5-7a plots normalized runtime as the wire delay varies from zero to two cycles for COR

(denoted COR-0C, COR-1C, and COR-2C), as well as the default BAR configurations (asterisks indicate default configurations). The graph breaks runtime into three categories: Cache, Comm, and Ideal. The topmost stack, Cache, represents cache-bank misprediction penalties in BAR. The middle stack, Comm, shows communication overheads that are affected by the different wire delay assumptions. Finally, Ideal represents the runtime in the absence of Comm and Cache overheads.

FIGURE 5-7. Performance sensitivity to communication overheads. (a) COR; (b) BAR.

Regardless of the wire delay, all COR configurations outperform the default BAR configura-

tion, on average. At scaling point 1, neither design incurs any Cache or Comm overheads and

COR’s Ideal runtime is less than BAR’s because of COR’s wider frontend (4-wide vs. 2-wide). As

the designs scale up, BAR’s Ideal time improves more than COR’s, as it applies more (borrowed)

resources to the problem. As we increase the wire delay in COR, the Comm overhead increases to

be nearly as large as the default (2-cycle) BAR configuration. The small difference reflects BAR’s

round-trip to the centralized renamer, while COR only has a single wire delay in its frontend. This

small difference indicates that performance is not particularly sensitive to frontend depth. How-

ever, even assuming equal wire delay, COR outperforms BAR because COR never incurs a cache-

bank misprediction. Despite using a sophisticated bank predictor, BAR’s Cache overhead domi-

nates the Comm overhead.

Figure 5-7b elaborates on the relative impact of Comm and Cache overheads by plotting BAR’s overheads with different wire delays, compared to the default COR configuration. At scaling point

2 (4), BAR’s Comm overhead increases by 4% (6%) for each cycle of wire delay. Note that the 70

1.5 1.0 Scaling 4 Scaling 2 0.5 Scaling 1 0.0 Normalized Power BAR COR BAR COR BAR COR CINT CFP Commercial FIGURE 5-8. Chip power normalized to the baseline OoO.

5 Scaling 4 4 Scaling 2 3 Scaling 1 2 1 0 Normalized Power BAR COR BAR COR BAR COR BAR COR BAR COR BAR COR BAR COR BAR COR L1I F/D/R Sched/Steer Exec Backend L1D L2/L3 Chip Power FIGURE 5-9. Categorized chip-wide power consumption.

Cache component also increases with wire delay, as mispredicted instructions take correspond-

ingly longer to re-route. In general, we find that the overheads of cache aggregation, largely due to

cache-bank misprediction, outweigh benefits of larger cache capacity.

These graphs show average behavior, which does not hold for memory-bound workloads.

These workloads are largely insensitive to wire delay and their performance largely depends upon

the memory system, not inter-core communications.

In summary, BAR’s average performance only beats COR’s for scaling point 4, when both

designs assume 0-cycle delays. Such a design point seems very optimistic, especially for BAR, which must aggregate four full cores and access a centralized rename table.

5.4.3 Chip Power Comparison

TABLE 5-4. Power Categories and Descriptions
L1I: L1-I, including sub-banking where applicable (i.e., BAR).
F/D/R: Fetch, Decode, and Rename logic. Includes frontend crossbars in BAR and I-TLB for both.
Sched/Steer: Scheduling and steering logic. Includes scheduling crossbar and cache-bank predictor in BAR.
Exec: Execution pipelines, IALUs, FPALUs, data address generation, and D-TLB.
Backend: Commit sequencing logic (i.e., ROB).
L1D: L1-D, including core-interleaved caches.
L2/L3: Power consumed in L2 and L3 caches.

Figure 5-8 plots mean total chip power of BAR and COR with the default configurations listed in Table 5-3. At scaling point 1, COR consumes 9% more power than BAR, across all benchmarks, as a result of COR1's fixed, 4-wide frontend.

Both models consume more power as they scale up, as they activate additional resources. For scaling points 2 and 4, BAR consumes more power than COR (16% at scaling point 2, 36% at scaling point 4).

Overall, BAR’s power scales more rapidly than does COR’s, as BAR scales up all pipeline and L1 cache resources, whereas COR only scales the instruction window and function units.

Figure 5-9 illustrates the differences in the two designs, with a detailed breakdown of mean chip power. Table 5-4 summarizes the different categories, with each including both static and dynamic power. Figure 5-10 compares BAR4 and COR4 to the OoO baseline. The largest contributors to the power difference between BAR and COR are L1I, L1D, and the frontend/backend (F/D/R and Backend), which we will investigate further in the next section.

When scaled up, much of BAR’s added power arises from its implementation of a distributed instruction fetch engine. BAR4’s L1-I cache power exceeds BAR1’s by more than 5x despite the 4x increase in cache size. Although leakage power grows in proportion to the cache size, the aggregate 72

L2/L3 8 1.5 L1D 6 1.0 Backend BAR 1 Exec 4 BAR 2 0.5 Sched/Steer F/D/R 2 BAR 4

0.0 Normalized Power L1I

0 Normalized Accesses BAR COR OoO Scaling 4 Avg

FIGURE 5-10. Per-core power breakdown FIGURE 5-11. L1-I access count normalized to the baseline OoO. normalized to BAR1.

L1-I access count and therefore dynamic power increase at a faster pace for two reasons. First, BAR

requires additional logic to implement sub-banking, which makes each L1-I’s line size smaller than

that of the L2. Not only do tag comparisons increase proportionally, but a cache fill by prefetching

also needs to access all the sub-banks in order to correctly split the line. Second, BAR also aggre-

gates fetch units along with the caches, increasing both fetch bandwidth and fetch buffer count.

Deploying more fetch buffers increases the likelihood of fetching wrong-path instructions, and

consequently increases total L1-I accesses. Figure 5-11 shows that BAR2 has 6x more accesses than

BAR1; however, the increase from BAR2 to BAR4 is much more modest. We speculate that this dif-

ference is because of control-flow convergence. BAR4 creates large enough buffers to find already

fetched instructions in the buffers after pipeline flushes.

These aggregation overheads are not present in COR. Unlike BAR's whole-core scaling, COR does not scale frontend resources. Instead, COR scales frontend utilization, yielding a much smaller power difference between scaling points for the F/D/R category. However, COR is unable to scale down its frontend for scaling point 1, effectively wasting energy by overprovisioning some of the structures' ports to facilitate the four-wide frontend.

Interestingly, BAR's L1-D power consumption is not substantially higher than COR's, despite integration of additional caches. Unlike the L1-Is, which are each accessed every cycle, the activity factors in the individual interleaved L1-Ds are substantially lower. As the L1-Ds are interleaved on address bits, roughly the same number of total L1-D accesses occur across the BAR configura-

tions—in BAR1, all accesses are routed to a single cache, whereas in BAR4 accesses span multiple

caches, but the number of accesses is, to first order, constant. Hence, higher static power from

BAR’s two to four times more active L1-D caches contributes to much of the power difference.

COR’s power consumption dominates the Backend category, as COR overprovisions a central- ized ROB to sequence instructions. Though BAR also relies on an ROB, BAR2 and BAR4 interleave

several smaller structures to scale to maximum width, whereas COR statically scales a single struc-

ture to maximum size, resulting in a much higher per-access energy (3x for COR2 and 4x for

COR4).

5.4.4 Energy Efficiency

We use two metrics to evaluate the energy efficiency of our two models: energy-delay (ED)

product (Figure 5-12a) and ED2 (Figure 5-12b). At scaling point 1, BAR’s ability to scale down the

frontend and backend makes it more energy efficient than COR by 5% under ED and 3% under

ED2. However, COR achieves better energy efficiency as it scales up in both metrics (24% at scaling

2 and 22% at scaling 4 under ED, and 42% at scaling 2 and 48% at scaling 4 under ED2). Con- versely, BAR performs worse as it scales under the ED metric, which emphasizes energy dissipa- tion more than ED2. Under ED2, BAR’s energy efficiency is roughly constant across the scaling points due to the higher emphasis on performance. These results show that for larger configura- tions, overprovisioned cores deliver better energy efficiency for two reasons. First, the energy cost 74 2 BAR 2 1.5 COR BAR COR 1.0 1 0.5 Normalized E*D 0.0

0 Normalized E*D^ Scaling 1 Scaling 2 Scaling 4 Scaling 1 Scaling 2 Scaling 4 (a) Geometric mean ED (b) Geometric mean ED2

FIGURE 5-12. Geometric mean energy efficiency.

of borrowing outweighs its performance benefit. Second, COR consumes less static power by avoiding cache aggregation. However, when scaled down, the smaller components employed in resource-borrowing cores enable better energy efficiency.
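For readers who want to reproduce these metrics, a minimal sketch of computing normalized ED and ED2 from per-benchmark energy and delay is shown below; the variable names, placeholder numbers, and normalization are our own illustration, not the exact scripts used in this study.

    from statistics import geometric_mean

    def efficiency_metrics(energy, delay, base_energy, base_delay):
        """Return (normalized ED, normalized ED^2) for one benchmark run.

        Lower is better for both; ED^2 weighs performance (delay) more
        heavily than energy, matching the discussion in Section 5.4.4.
        """
        ed = (energy * delay) / (base_energy * base_delay)
        ed2 = (energy * delay ** 2) / (base_energy * base_delay ** 2)
        return ed, ed2

    # Hypothetical per-benchmark results for one design point (illustrative only):
    # (energy, delay, baseline energy, baseline delay)
    runs = [(1.2, 0.9, 1.0, 1.0), (0.8, 1.1, 1.0, 1.0)]
    eds, ed2s = zip(*(efficiency_metrics(*r) for r in runs))
    print("Geometric mean ED: ", geometric_mean(eds))
    print("Geometric mean ED^2:", geometric_mean(ed2s))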

5.5 Deconstructing Power-Hungry Components

Section 5.4 showed that, despite BAR’s attempts to maintain pipeline balance, COR’s cheap

overprovisioning philosophy yielded better results most of the time. In this section, we seek to

identify the root causes of our findings. In particular, the COR and BAR designs differ not only in

resource acquisition schemes, but also in selection of which areas of the microarchitecture to scale

at all. We focus on three areas with large power differentials between the designs, namely frontend/

backend width (Section 5.5.1) and L1-I and L1-D caches (Section 5.5.2), and evaluate if each indi-

vidual area should scale, and which strategy best addresses scaling for the area. We take the default configurations in Table 5-3, depicted again in Figure 5-13, and vary one component from each

core design at a time to isolate the effectiveness of each scaling mechanism.

FIGURE 5-13. Conceptual diagrams of BAR4 and COR4 with the default configurations in Table 5-3. (a) Default BAR4; (b) default COR4.

5.5.1 Scaling the Frontend and Backend Width

Our default models take dissimilar approaches for frontend/backend scaling. COR fixes the frontend width at four, regardless of the instruction window size. This imbalanced scaling poten- tially wastes power when the datapaths are narrower than the frontend width. In contrast, BAR aggregates multiple frontends to dynamically scale the frontend width in proportion to the datap- ath width, but effectively increases the depth of the frontend pipe in doing so—a result of the com- munication between fused cores (renaming is a centralized operation).

To evaluate the benefits of varying the frontend/backend width, we consider a COR variant with a differently sized (static) frontend. Instead of fixing the width at four regardless of the scaling point, we scale the fetch/decode/rename/dispatch/commit pipeline of these designs to correspond to the widths delivered through the BAR approach. In particular, we evaluate COR1 with a narrow, 2-wide (static) pipeline, and COR4 with a wide, 8-wide (static) pipeline, while holding other parameters unchanged. Figure 5-14a plots the mean IPC of each variant; Figure 5-14b plots power. Importantly, the performance loss of the narrow-frontend COR1 is negligible with respect to the default COR1 design (astar has the greatest impact, a 6% performance reduction). This finding shows that COR1's static 4-wide frontend unnecessarily wastes energy when scaled down: using a 2-wide frontend instead saves about 8% of chip-wide power.

FIGURE 5-14. Effect of narrow (2-wide) and wide (8-wide) frontends/backends on COR. (a) IPC; (b) chip power.

Interestingly, widening the pipeline when scaled up also has little effect on performance.

Among the workloads used in this study, all exhibit IPCs substantially lower than the peak theo- retical throughput of the scaled-up designs. In essence, this behavior means the frontend through-

put is not on the critical path most of the time, and therefore there is little benefit to scaling the

frontend beyond width 4. I.e., BAR4’s aggressive borrowing to implement an 8-wide frontend was

not necessary.

In summary, scaling up the frontend and backend seems to have little benefit. On the other

hand, it is worthwhile to scale frontend/backend width down when operating less aggressively for

substantial power savings, as scaling down usually comes at little or no performance cost.

5.5.2 Cache Aggregation

The effect of increasing cache capacity has been widely studied—statically in working set anal-

ysis as well as dynamically [5]. Several approaches to cache scaling present themselves for scalable cores, yet the fundamental question we investigate in this section is whether caches should be aggregated at all as cores scale up.

FIGURE 5-15. L1-I aggregation mechanisms across BAR and COR. (a) BAR4' with a single active L1-I; (b) COR4' with four active, sub-banked L1-Is; (c) IPC.

L1-I. Default BAR sub-banks L1-I caches to allow concurrent instruction fetch from all participat- ing cores. This mechanism increases the effective cache size without incurring a latency penalty, as each core accesses the nearest cache. Section 5.4.3, however, showed that sub-banking L1-Is imposes a substantial power cost. To find a less power-hungry alternative, we consider a BAR vari- ant, BAR’, in which only one L1-I is active, and instructions from this single cache feed all partici- pating cores with a 2-cycle penalty for remote accesses (Figure 5-15a).

On the other hand, default COR (Figure 5-13b) always uses a single L1-I, regardless of scaling point. In this section, we consider the effect of L1-I aggregation on COR-like designs with the COR' variant, in which sub-banked L1-Is operate collectively to increase capacity, at a latency cost to access an aggregated sub-bank (Figure 5-15b).

FIGURE 5-16. L1-I miss rate of default BAR with L1-I aggregation.

Figure 5-15c plots the resulting mean IPCs of having a single L1-I (1I$) and sub-banking mul-

tiple L1-Is ([2,4]SubBankI$). Surprisingly, the variant designs perform very similarly to the default designs; L1-I aggregation seems to have little impact on overall performance, at least for the cache sizes and benchmarks used in this study. To investigate why this might be the case, Figure 5-16

plots the L1-I miss rate sensitivity to aggregated L1-Is in default BAR. As expected, the miss rate decreases as the aggregated cache size increases. However, there are two reasons why the improved miss rate does not result in an overall performance increase. First, most benchmarks miss in the

L1-I very infrequently with our nominal cache size of 32KB at scaling point 1—even the large-instruction-footprint commercial workloads have miss rates of 10% or less. Second, the average IQ clog rate is 60% at scaling point 1, creating enough slack to hide L1-I miss latencies. We therefore infer from these results that L1-I cache aggregation is not an integral component for scalable cores, especially given its prohibitive power consumption (Section 5.4.3).

(a) BAR4’ with a single active L1-D (b) COR4’ with four active L1-Ds

1.0

0.5

0.0 Normalized IPC 1D$ 1D$ 1D$ 1D$ 2BankD$ 2BankD$ 4BankD$ 4BankD$ 2AdhocD$ 2AdhocD$ 4AdhocD$ 4AdhocD$

BAR2-4w COR2-4w BAR4-8w COR4-4w (c) IPC

FIGURE 5-17. L1-D aggregation mechanisms across BAR and COR.

L1-D. Default BAR bank-interleaves aggregated L1-D caches from other cores, to maximize total cache size1 (Figure 5-13a). BAR employs a predictor at steering-time to steer memory instructions,

to maximize locality. In case of mispredictions (20% on average in BAR4 (Figure 5-5)), accesses

must be re-routed to the correct caches via the inter-core crossbar, effectively lengthening load-to-

use latency.

As with our approach to L1-I aggregation, we evaluate L1-D aggregation with variants on our

default designs. BAR4’ accesses a single L1-D when scaled up, incurring a latency penalty if that cache is not local (Figure 5-17a). COR4’ (Figure 5-17b) interleaves four caches, accessing remote

1. We also considered a scheme in which cores always access the nearest cache (i.e., ad-hoc L1-D aggregation), and rely on normal inter-core coherence to forward values appropriately. However, we found this aggregation approach to perform worse than interleaving in nearly all cases, due to lack of spatial prefetching. 80 cache slices, incurring a latency penalty when necessary, but after effective address calculation

(eliminating the effect of cache-bank prediction).

Figure 5-17c plots the mean IPCs of each design and variant. Overall, L1-D aggregation

([2,4]BankD$) seems to do more harm than good, an effect more pronounced for the fully scaled- up cores (2% for BAR4 and 4% for COR4). In the case of COR and COR’, aggregating additional caches improves capacity but harms latency, and in the case of BAR and BAR’, removing cache aggregation (even at constant latency cost for some cores) eliminates costly mispredictions.

Together, these results are counterintuitive, as they suggest that smaller caches yield better IPCs.

However, consider the classic equation for a three-level cache hierarchy:

    L_{MEM} = p_{HIT-l1} \cdot L_{l1} + p_{HIT-l2} \cdot L_{l2} + p_{HIT-l3} \cdot L_{l3} + p_{MISS} \cdot L_{DRAM}    (5.1)

where L_{MEM} is the average latency of accessing the memory hierarchy, p_{HIT-li} is the hit probability in the i-th-level cache, L_{li} is the access latency of the i-th-level cache, p_{MISS} is the probability of missing in all the caches, and L_{DRAM} is the DRAM access latency.

To first order, L1-D cache aggregation does not alter the probability of a hit in the L2 or L3 caches. Similarly, the latency of these caches does not change with the L1-D configuration. However, cache aggregation does affect p_{HIT-l1} and L_{l1}:

• Interleaving effectively doubles cache capacity with two caches (or quadruples with four-way aggregation), suggesting intuitively that p_{HIT-l1} should increase. However, many of our workloads have extremely large working sets that easily exceed even the sizes of our L3 caches. In fact, the smallest inner working set among these workloads is 0.3MB, by gamess [30], which is three times larger than our largest aggregated L1-D of 128KB. Therefore, p_{HIT-l1} does not substantially change. Figure 5-18 confirms this by demonstrating that enlarging the L1-D capacity by 4x (scaling point 4) has a negligible impact on the L1-D miss rate for most of the benchmarks.

• Core-interleaving can lengthen L_{l1} because of the likelihood of cache-bank misprediction. In our evaluation, a misprediction doubles the effective L1-D access latency from 2 to 4 cycles. The average latency therefore becomes 2.4 cycles at scaling point 4 when factoring in the 20% misprediction rate.

FIGURE 5-18. L1-D miss rate of default BAR with L1-D aggregation.
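To see how these two effects interact in Equation 5.1, the small sketch below evaluates the average memory latency for an interleaved versus a non-aggregated L1-D. The hit probabilities and the DRAM latency are illustrative placeholders chosen in the spirit of the discussion above, not measured values from our simulations; the L2 and L3 latencies follow the 12- and 24-cycle figures used later in Table 6-2.

    def avg_mem_latency(p_hit_l1, l1_lat, p_hit_l2, l2_lat, p_hit_l3, l3_lat, dram_lat):
        """Average memory-hierarchy latency per Equation 5.1."""
        p_miss = 1.0 - (p_hit_l1 + p_hit_l2 + p_hit_l3)
        return (p_hit_l1 * l1_lat + p_hit_l2 * l2_lat +
                p_hit_l3 * l3_lat + p_miss * dram_lat)

    # Illustrative placeholders (not simulation results): a 20% bank-misprediction
    # rate stretches the 2-cycle L1-D to an effective 0.8*2 + 0.2*4 = 2.4 cycles,
    # while the larger aggregated capacity barely moves the L1-D hit probability.
    no_aggregation = avg_mem_latency(0.90, 2.0, 0.06, 12, 0.03, 24, dram_lat=300)
    interleaved    = avg_mem_latency(0.91, 2.4, 0.05, 12, 0.03, 24, dram_lat=300)
    print(no_aggregation, interleaved)  # the longer L1 latency outweighs the small hit-rate gain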

We also evaluated the effect of working set size by simulating with an L1-D cache 4x larger than our default (but no more latent), which yielded performance very close to our baseline size.

The most sensitive benchmark was povray, which saw a 6% performance improvement. Contrary to intuition, we found no evidence of inner working sets that fit in 128KB but not in 32KB. Therefore, among the designs we have selected, L1-D aggregation increases cache latency for a negligible reduction in miss rate, degrading overall performance. We cannot claim this is a general phenomenon—it may not hold for all cache sizes, latencies, and caching techniques—but we find it holds across most benchmarks (especially those with high cache hit rates), and across all microarchitectures examined in this study.

5.6 Improving the Energy Efficiency of Scalable Cores

The previous two sections support our general conclusion that borrowing resources from

other cores, as in BAR, is an inefficient approach to scaling up due to the latency and energy costs

of communication. Conversely, the COR design shows that significant benefit can be obtained just from scaling those resources that are easily and cheaply attained through overprovisioning (i.e., small area), without communication penalty.

However, BAR scales down more gracefully than COR, with BAR1 dissipating 8% less power than COR1. This power advantage arises because BAR scales down its frontend/backend resources,

whereas COR only scales down frontend utilization. This distinction is important, as BAR1’s two- wide frontend consumes 16% less power on average than COR1’s four-wide frontend, even though they yield similar performance (Figure 5-14a).

The performance scaling of COR and power scaling of BAR guide our design philosophy for

COBRA (Cheaply Overprovisioned and Borrowed Resources for All execution types), a hybrid design point that provides the foundation for two execution styles: an out-of-order version

(COBRo) and an in-order version (COBRi). (We use the term COBRA when discussing character-

istics applicable to both COBRo and COBRi.) COBRA seeks the best of both worlds. First, COBRA

leverages the high-performance features of COR (overprovisioned window and execution

resources) with the lower-power features of BAR (an interleaved ROB and a dynamically scalable

frontend/backend). Specifically, COBRA uses a core-private interleaved ROB and a core-private

pipeline of variable width (2-wide or 4-wide) to achieve better power scaling. Second, COBRA can

borrow an adjacent core’s execution resources (about ~0.3 mm2 each, including datapath) to scale

beyond its own overprovisioned resources (COBRA8), but because these resources are very small, and because they are the only borrowed resource between two cores, they can easily fit within the single-cycle communication domain (i.e., Chapter 2). Hence, COBRA keeps the cost of borrowing small enough not to diminish the benefits, whereas wire delays and power increases limit BAR from scaling up beyond four.

FIGURE 5-19. Conceptual block diagram of the COBRA hybrid design. Shaded components are shared between two adjacent cores, and dotted components are turned off.

Figure 5-19 depicts the resulting designs of COBRo and COBRi. The former makes use of

aggressive OoO IQs like BAR and COR, coupled with COR’s base steering, while the latter takes

WiDGET’s simple in-order execution unit (EU) approach. To mitigate in-order issuing constraints

(Chapter 4), COBRi employs WiDGET’s steering heuristic instead, and each of the EUs is

equipped with multiple in-order instruction buffers. Table 5-5 provides configuration details of

both COBRo and COBRi and also marks which design influenced the design of a given compo-

nent. Appendix D discusses the reasons for our choice of instruction buffer count per EU (i.e.,

eight) on COBRi.

TABLE 5-5. COBRA Configuration Parameters

L1-I
  Scaling 1, 2, 4, 8: No aggregation (Total: 32KB)
Frontend/Backend Width
  Scaling 1: 2 wide
  Scaling 2, 4, 8: 4 wide
Frontend Depth
  Scaling 1, 2, 4, 8: 7 cycles
Scheduling
  Scaling 1: COBRo: 1 unified 16-entry OoO IQ, no steering necessary, 2-wide issue per IQ. COBRi: 1 EU (eight 16-entry instruction buffers per EU), WiDGET steering, single issue per EU.
  Scaling 2: COBRo: 2 aggregated OoO IQs, back-to-back bypass between the IQs, base steering, 2-wide issue per IQ. COBRi: 2 aggregated EUs, back-to-back bypass between the EUs, WiDGET steering, single issue per EU.
  Scaling 4: COBRo: 4 aggregated OoO IQs, back-to-back bypass between the IQs, base steering, 2-wide issue per IQ. COBRi: 4 aggregated EUs, back-to-back bypass between the EUs, WiDGET steering, single issue per EU.
  Scaling 8: COBRo: 2 clusters of 4 IQs, back-to-back bypass within a cluster, 1-cycle link between clusters, base steering, 2-wide issue per IQ. COBRi: 2 clusters of 4 EUs, back-to-back bypass within a cluster, 1-cycle link between clusters, WiDGET steering, single issue per EU.
Execution Resources
  Scaling 1: 1 IALU, 1 FPALU, 1 AGEN
  Scaling 2: 2 IALU, 2 FPALU, 2 AGEN
  Scaling 4: 4 IALU, 4 FPALU, 4 AGEN
  Scaling 8: 8 IALU, 8 FPALU, 8 AGEN
Instruction Window
  Scaling 1: 64 entries, unified
  Scaling 2: 2 banks, 64 entries each
  Scaling 4, 8: 4 banks, 64 entries each
L1-D
  Scaling 1, 2, 4, 8: No aggregation (Total: 32KB)

5.6.1 Evaluation

Figure 5-20 shows a power-performance graph comparing the two versions of COBRA (i.e.,

COBRo and COBRi) to BAR and COR (mean of all the benchmarks from Figures 5-2 and 5-8). All four designs have scaling points one, two, and four, but COBRA also includes an additional scaling point (COBRo8 and COBRi8), achieved through overprovisioning and borrowing (i.e., beyond the capabilities of BAR or COR by themselves).

FIGURE 5-20. Power-performance of all designs with the default configurations normalized to the baseline.

As COBRo more closely resembles the microarchitectures of BAR and COR than COBRi does,

COBRo’s power-performance curve has a similar shape with BAR’s and COR’s. At scaling point 1,

COBRo leverages a scaled-down (though fully private) frontend like BAR1 for power savings. As

COBRo scales up, its performance follows that of COR2 and COR4 at those respective scaling points, by using only core-private (cheaply overprovisioned) resources. Despite the identical per- formance, COBRo consumes 5% less chip power than COR by replacing COR’s monolithic ROB with smaller, interleaved ROBs (Figure 5-24).

Although the scale-2 cores with OoO execution (i.e., BAR, COR, and COBRo) are most similar to the baseline OoO, they have slightly worse performance for the following three reasons.

FIGURE 5-21. IPC of COBRi8 normalized to COBRo8.

First, the scalable cores partition execution bandwidth such that each IQ only has one mix of each

ALU type. Despite the same total count of execution resources for all designs including OoO, the

hardwired partitioning can leave otherwise-usable execution bandwidth underutilized. Sec-

ond, the simple base steering used by COR and COBRo does not take load balancing into consider-

ation in order to optimize for data locality. Third, scaled-up BAR incurs additional latency

overheads as discussed before. These three factors decrease the aggregated issue rates of BAR2 and

COR2 by 10% and 12%, respectively, compared to the baseline OoO, resulting in performance deg-

radation.

COBRi, on the other hand, has a distinctively different power-performance curve due to its in-

order execution style. Because each EU can only issue one instruction per cycle, COBRi's issue bandwidth is half that of the designs with OoO IQs, significantly handicapping COBRi1 and

COBRi2. Nonetheless, at scaling point 4, which matches the issue bandwidth with the frontend/ backend width, COBRi outperforms all other designs by 13-23%. Furthermore, COBRi outper-

forms its OoO counterpart, COBRo, by 21% at scaling point 8. These counter-intuitive results are

due to two properties unique to COBRi. First, COBRi employs more intelligent instruction steer-

ing—based on WiDGET's steering logic—to expose independent instructions to the in-order buffers. Second, COBRi creates a larger instruction window by provisioning eight times more of the equally sized buffers than COBRo, although each of COBRi's buffers is a much less capable, in-order buffer. These two properties together give COBRi better latency tolerance than COBRo, essentially overcoming its in-order issuing constraints.

FIGURE 5-22. MLP of COBRo (left bars) and COBRi (right bars). (a) Integer (left) and commercial (right) benchmarks; (b) floating-point benchmarks.

To understand each benchmark’s contribution to the harmonic mean IPC at scaling point 8,

Figure 5-21 plots per-benchmark IPCs of COBRi8 normalized to COBRo8 in ascending order. COBRi8 outperforms COBRo8 on all benchmarks; however, five benchmarks (bwaves through libquantum) see more than 25% higher IPCs. These benchmarks benefit the most from the larger window and the improved latency tolerance of COBRi8, resulting in higher MLP, as Figure 5-22 presents. These performance results also confirm the observation made in Chapter 4 that harnessing many in-order buffers can match—or even exceed—the performance of traditional OoO issuing.

FIGURE 5-23. Per-benchmark IPC of COBRo.

Borrowing execution resources from within the single-cycle communication domain (i.e., adjacent neighbor) enables both COBRo and COBRi to have more power-performance scalability than BAR and COR, achieving a wide performance range of 1.6 for COBRo and 2.6 for COBRi. As

Figure 5-23 demonstrates for the COBRo case (COBRi shares the same characteristics), this greater

scalability is especially useful for memory-intensive benchmarks, including mcf and libquantum

(6% and 23% higher IPC, respectively) because the additional IQs (or instruction buffers) effec-

tively provide more buffering for memory-dependent instructions while increasing MLP. Bench-

marks that do not stress latency tolerance (e.g., perlbench) scale more modestly from scaling points

4 to 8.

As for the power scaling, Figure 5-24 demonstrates that it is negligible or modest for most of the components. The two exceptions are L1D and COBRo’s Sched/Steer. The power increase of the

former is solely from higher utilization of the statically provisioned resource. The latter, on the other hand, deploys twice as many power-demanding OoO IQs, resulting in 2.4 times more Sched/Steer power. In contrast, COBRi provides power savings of up to 99% by using in-order issuing. Nevertheless, even with a greater-than-linear increase in Sched/Steer power by COBRo, distributed OoO IQs scale power much better than a larger, monolithic OoO IQ of the same effective size [56,28].

FIGURE 5-24. Categorized chip-wide power consumption.

Overall, COBRA’s design philosophy provides an energy efficient foundation for both of the

execution styles, as Figure 5-25 presents. Both always achieve better energy efficiency than BAR and COR, except for COBRi1, COBRo exceeding them by up to 48% under ED2 and COBRi exceeding by up to 68% under ED2. COBRi’s inefficiency at scaling point 1 is due to the much

lower performance than the other designs while not conserving the power proportionally, as

Figure 5-20 demonstrates. In contrast, at scaling point 2, COBRi balances power and performance,

yielding the best energy efficiency in both ED (Figure 5-25a) and ED2 (Figure 5-25b) despite the

lowest performance (Figure 5-20). The energy efficiency gap between COBRo and COBRi becomes

larger (by up to 50%) at the higher scaling points of 4 and 8. COBRi outperforms its OoO-issuing counterpart by effectively harnessing the lower-power in-order building blocks, providing the most energy-efficient scalable core design.

FIGURE 5-25. Geometric mean energy efficiency. (a) Geometric mean ED; (b) geometric mean ED2.

5.7 Summary

This chapter deconstructed prior scalable core designs from the literature [40,46,75,28], and made an observation that the fundamental difference between the designs lies in where additional resources come from. These additional resources come from either across cores or within a core itself, and result in trade-offs between latency and the amount of scalable resources. Resource bor- rowing allows maintaining pipeline balance throughout core scaling without significant area over- heads. However, borrowing incurs latency penalties. In contrast, overprovisioning sacrifices balance in an effort to reduce wire delays. As wire delays worsen in smaller technology nodes, this chapter evaluated if/what elements are worthwhile to scale even with latency overheads, and what scaling mechanisms should be employed from an energy efficiency standpoint.

We showed that only a small portion of a core resides within single-cycle communication dis- tance of other cores, and this area becomes smaller with higher clock frequency and in smaller technology nodes. As an example of this phenomenon, we find that coarse-grained aggregation of

L1-D caches can do more harm than good (2-4% performance degradation on average), by not improving the hit rate enough to compensate for the increased load-to-use latency. L1-I aggregation, on the other hand, comes with a prohibitive power cost and little performance benefit. The opposite design extreme, overprovisioning core-private resources, avoids the energy inefficiency of cache aggregation, but the unscaled, wide frontend wastes energy when the core is scaled down. We find that 16% of frontend power can be saved by scaling down the frontend, at little performance cost when paired with a scaled-down pipeline.

Our COBRA hybrid design integrates the performance scalability of an overprovisioned COR design with the lower-power features of a borrowing design, BAR. COBRA also enables further performance scaling by only borrowing small, latency-effective execution resources. We explored two execution styles based on the COBRA design philosophy: out-of-order COBRo and in-order

COBRi. With better performance and lower power, COBRo improves the energy efficiency of COR by up to 6% and BAR by up to 48% on average. In contrast, COBRi leverages multiple in-order instruction buffers with WiDGET's steering heuristic, scaling performance from in-order to coarse-grain OoO. Once the issue bandwidth matches the frontend/backend width, we showed that in-order COBRi even outperforms the other OoO-based designs while still enabling power savings of up to 39%. At scaling points of 2 or higher, COBRi always delivers the best energy efficiency, improving on COR by up to 43% and on BAR by up to 68% on average.

However, even with in-order issuing and a scaled-down pipeline, the lowest power-performance point of COBRi is still far from zero, the ideal lowest end of the power proportionality curve. The next chapter therefore investigates methods to reach the lower region of the curve.

Chapter 6

Power Gliding: Extending the Power-Performance Curve

FIGURE 6-1. Conceptual power proportionality goal.

The final proposal of this thesis focuses on scaling down a processor along the left side of the power proportionality curve as Figure 6-1 illustrates. In this low-power low-performance region, commercial processors increasingly rely on frequency scaling to lower power, as the utility of

DVFS becomes limited in future technology nodes. However, the linear power reduction of fre- quency scaling calls for an alternative, more efficient method.

This chapter explores power gliding, a microarchitectural alternative to fine-grain power management, seeking a power-performance curve closer to DVFS than to frequency scaling. Power gliding selectively disables or constrains microarchitectural performance optimizations, on a per-core basis, trading performance for power reductions. Our approach draws insight from the performance optimization rule established by Intel: a 1% performance improvement

should come with no more than a 3% power increase; otherwise, DVFS could do better [32].

Power gliding uses this rule in reverse: by disabling optimizations that meet this 3:1 power-to-per-

formance ratio, power gliding can effectively extend the cubic DVFS power-performance curve.

While some optimizations may have much less than a 3:1 ratio, and thus should be left on, others

may exceed the 3:1 ratio for a given workload, allowing power gliding to do better than DVFS.
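As a back-of-the-envelope illustration of this reversed rule, the sketch below flags an optimization as a power-gliding candidate when its measured power cost per unit of performance meets or exceeds the 3:1 ratio. The helper function and the example numbers are hypothetical, not part of our simulation infrastructure.

    def is_gliding_candidate(perf_gain_pct, power_cost_pct, ratio=3.0):
        """Return True if disabling the optimization should scale down at
        least as efficiently as DVFS: the optimization buys perf_gain_pct
        percent performance at power_cost_pct percent power."""
        return power_cost_pct >= ratio * perf_gain_pct

    # Hypothetical optimizations (illustrative numbers only).
    optimizations = {
        "optimization A": (0.5, 2.0),  # 0.5% perf for 2.0% power -> disable
        "optimization B": (2.0, 1.0),  # 2.0% perf for 1.0% power -> keep
    }
    for name, (gain, cost) in optimizations.items():
        action = "glide (disable)" if is_gliding_candidate(gain, cost) else "keep"
        print(f"{name}: {action}")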

We decided to aim for the 3:1 power to performance ratio, rather than simply attempting to

extend the 1:1 ratio from frequency scaling, because traditional performance optimizations have

been built into chips based on the DVFS 3:1 rule. This means there are many existing optimiza-

tions that approach this 3:1 rule, and are therefore provide the potential for scaling down at this more aggressive level. Essentially, because we are looking to selectively disable existing optimiza-

tions in order to extend the low power scaling ability of DVFS, it makes intuitive sense to aim for

the ratio at which those optimizations were designed.

Potentially a large number of performance optimizations exist that meet the power gliding cri-

terion and, as a result, can be disabled for power savings. In fact, many previously proposed low-

power techniques that result in a performance loss become viable options, even though they might

not have been previously considered appropriate for high-performance processors. Under the 3:1

ratio power gliding leverages, no complex policies are necessary to use those techniques as long as

the power savings exceed the performance loss. We select two sets of intuitive techniques that

affect a fairly broad range of workloads in a pair of case studies: frontend power gliding and L2

power gliding. We apply power gliding to the baseline out-of-order design in Chapter 5 due to the 94 generic nature of our power gliding mechanisms. Nonetheless, we also evaluate L2 power gliding on a scalable core to examine the extent of power proportionality this thesis accomplishes.

The rest of the chapter first discusses the limitations of frequency scaling further and identifies opportunities for more efficient power-performance scaling (Section 6.1). After explaining our simulation environment (Section 6.2), we provide two case studies (Section 6.3 and Section 6.4).

Finally, we summarize our findings (Section 6.5).

6.1 Limitation of Frequency Scaling and Power Gliding Opportunities

The operating frequency range of commercial chips is typically much larger than the operating voltage range [43]. However, frequency scaling is less effective for power savings than voltage scal- ing for two main reasons. First, the former only enables a linear dynamic power reduction, signifi- cantly less than the cubic reduction achieved by voltage (and frequency) scaling. Second, unlike voltage scaling, frequency scaling does not directly impact static power. Given that static power is a significant contributor to the total power in smaller technology nodes [36], this limitation con- strains the amount of power savings achievable via frequency.
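To make the linear-versus-cubic contrast concrete, the sketch below uses the textbook first-order CMOS power model (dynamic power proportional to C·V²·f plus a static term). The capacitance, activity factor, and static-power values are illustrative placeholders, not parameters of our simulated core.

    def chip_power(freq_ghz, vdd, alpha=0.2, cap_nf=30.0, p_static_w=8.0):
        """First-order CMOS power model: alpha * C * V^2 * f plus static power.
        With C in nF and f in GHz, the dynamic term comes out in watts."""
        p_dynamic = alpha * cap_nf * vdd ** 2 * freq_ghz
        return p_dynamic + p_static_w

    nominal = chip_power(freq_ghz=3.0, vdd=1.0)

    # Frequency-only scaling: dynamic power shrinks linearly, static power stays.
    f_scaled = chip_power(freq_ghz=1.5, vdd=1.0)

    # DVFS: lowering voltage along with frequency shrinks dynamic power roughly
    # cubically (and in reality reduces static power too, which this simple
    # model ignores).
    dvfs = chip_power(freq_ghz=1.5, vdd=0.8)

    print(nominal, f_scaled, dvfs)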

With these factors in mind, we first model and understand the power-performance impacts of frequency scaling on different workload types (Section 6.1.1) and then identify opportunities to achieve more graceful power-performance scaling using architectural techniques and other exist- ing circuit techniques (Section 6.1.2).

6.1.1 Analysis of Frequency Scaling

A trend in the industry has been to achieve more fine-grained power management by increasing the number of frequency domains on chip [43]. We therefore model independent frequency

TABLE 6-1. Workload characteristics. L3 MPKI = L3 misses per thousand instructions. BPKI = branches per thousand instructions. SPKI = branch-misprediction-caused squashes per thousand instructions. L2 sensitivity = increase in L2 miss rate from an 8-way L2 to a direct-mapped L2. Shaded cells indicate representative workloads.

Workload     L3 MPKI  BPKI  SPKI   L2 Sens.    Workload    L3 MPKI  BPKI  SPKI  L2 Sens.
libquantum   36.1     251   .003   1.0         gromacs     1.1      12    0.9   3.7
mcf          36.0     205   6.7    1.3         hmmer       0.9      6     0.2   1.4
bwaves       28.1     3     0.2    1.2         omnetpp     0.9      231   7.9   12.4
milc         23.8     14    .009   1.0         gcc         0.8      185   13.0  9.3
lbm          22.9     12    0.1    1.5         leslie3d    0.7      116   13.2  19.6
gems         15.1     6     0.1    2.9         calculix    0.6      45    1.4   5.5
zeus         8.8      153   15.6   4.2         dealII      0.5      43    5.7   5.5
apache       8.1      154   20.7   4.0         h264ref     0.5      67    2.8   12.0
sphinx3      7.9      56    10.3   1.7         bzip2       0.4      166   2.4   7.9
perlbench    5.8      176   5.2    2.3         gobmk       0.4      140   19.0  40.0
cactus       4.2      1     0.1    2.2         sjeng       0.3      137   18.4  14.9
zeusmp       3.4      16    0.1    5.9         tonto       0.3      74    8.2   13.4
astar        2.9      129   23.8   2.1         wrf         0.2      120   2.3   14.8
jbb          2.0      172   11.7   3.8         namd        0.2      53    2.4   22.0
soplex       1.6      200   6.0    8.0         povray      0.2      101   3.7   57.0
xalancbmk    1.4      212   4.5    1.7         gamess      0.1      53    2.7   89.2
oltp         1.2      155   20.1   8.1         HMean       1.7      17    0.1   13.8

domains for each core and an on-chip L3 cache. Because an L3 is usually shared in a chip multiprocessor (CMP) for servers while an L2 is more often core-private [43], we assume that core frequency scaling only affects the associated L1 and L2; the frequencies of the L3 and the memory remain the same. In addition, we idealize asynchronous domain crossing as having no overhead, resulting in an optimistic frequency scaling model.

late five of the scaled-down frequencies, 1.0 (nominal frequency), 0.88, 0.76, 0.64, and 0.50 (mini-

mum frequency), on a 4-way out-of-order (OoO) superscalar core (Section 6.2) running the SPEC

CPU 2006 benchmark suite [34] and Wisconsin Commercial Workloads [2]. Table 6-1 lists the

characteristics of each workload. We show the results of five representative workloads throughout

this chapter: two memory-intensive (libquantum and bwaves), two CPU-intensive (gobmk and

namd), and one commercial (oltp) workloads. These workloads also represent a diverse mix of characteristics that the following two case studies exploit. The first case study, frontend power gliding exploits branches per thousand instructions (BPKI) and branch-misprediction-caused squashes per thousand instructions (SPKI). libquantum has high BPKI but low SPKI. bwaves has low BPKI and SPKI. Both gobmk and oltp have high BPKI and SPKI. Finally, namd has medium

BPKI and low SPKI. The performance sensitivity of the second case study, L2 power gliding depends on workloads’ L3 misses per thousand instructions (MPKI) and L2 miss-rate sensitivity to different associativity (L2 Sensitivity). Memory-intensive libquantum and bwaves have high L3

MPKI and low L2 sensitivity, while CPU-intensive gobmk and namd exhibit the opposite charac- teristics of low L3 MPKI and high L2 sensitivity. oltp, on the other hand, sits in the middle: medium L3 MPKI and L2 sensitivity.

Figure 6-2 and Figure 6-3 plot the five workloads’ resulting chip power and run-time slow- down from frequency scaling, respectively. In general, frequency scaling degrades performance

(up to 44% on average) more than it saves power (up to 36% on average), especially at lower fre- quencies. This is because frequency scaling reduces dynamic power linearly, but leaves static power largely unchanged. In fact, our simulation results show that static power accounts for 89-

99% of the total L2 power because the L2 is idle for a significant portion of the time.

FIGURE 6-2. Chip power reduction by frequency scaling.

FIGURE 6-3. Run-time slowdown by frequency scaling.

Upon closer inspection, these two figures demonstrate that core frequency scaling does not apply uniformly across the workloads. As expected, memory-intensive workloads are power- and performance-tolerant to reducing frequencies because much of the time is spent idle waiting for memory data. The higher nominal frequency of the L3 and memory makes the perceived access latencies smaller for the core. This behavior in turn implies that these workloads consume more L2 and L3 dynamic power than the other workloads, which coincides with the chip power breakdown shown in Figure 6-4. The L2 and L3 power of libquantum, for instance, occupies 57% of the total power, but the fraction is only 13% for the CPU-intensive benchmark, namd. Despite the much higher cache power consumption by libquantum, its total chip power is significantly less than namd’s, making the absolute L2 and L3 power of these two benchmarks roughly the same.

Figure 6-4 also shows that our 8MB L3 consumes less power than the 1MB L2, as the L3 is opti- mized for static power, while the L2 is tuned for performance.

To first order, lowering the core frequency has no impact on the L3 power. Coupled with the dominance of static power in the L2 and L3, these large caches are almost insensitive to frequency scaling and, consequently, the fraction of the total power consumed by the caches only rises as the frequency decreases. Therefore, the large L2 and L3 limit the overall power savings of the memory-intensive workloads, making frequency scaling less effective in reducing power.

FIGURE 6-4. Chip power breakdown at the nominal frequency.

6.1.2 Power-Performance Scaling Opportunities

Architecture scale-down is effective when the power benefits at least outweigh the perfor- mance loss. We also want to deliver a power-performance curve that is lower than the frequency scaling curve. Although many circuit-level techniques have been investigated to enlarge the volt- age scaling range [21,19], we argue that there is abundant unexplored territory in which architec-

tural techniques can flourish, side by side with re-purposed circuit techniques. In particular, chips

should trade off performance for power during power-saving modes and provide knobs for the

low-level system software to exploit for graceful power scaling.

We present two case studies to make our argument more concrete, though many other tech-

niques are also possible. One example is the approach used by scalable core designs that scale

down the pipeline resources, width, and depth. Beyond these core resource scalings, hardware

speculation mechanisms (e.g., the memory disambiguation predictor and prefetching) and the interconnect network

TABLE 6-2. Baseline configuration parameters
Fetch buffer: 16 entries
Renamer checkpoints: 16
Physical registers: 128
L2 cache: 1MB, 8-way, 2 banks, 64B line, 12 cycles, write back, private, high-performance device type
L3 cache: 8MB, 16-way, 8 banks, 64B line, 24 cycles, shared, low-standby-power device type

TABLE 6-3. Simulated frequency scaling points
1.0 (Nominal), 0.88, 0.76, 0.64, 0.50 (Minimum)

should also be re-examined to see if they meet power gliding’s 3:1 ratio. One can also implement

more selective drowsy-mode policies that discriminate instruction and data streams in a shared

cache, for instance. Ultimately, the goal of this exploration lies in identifying extra resources or

activities that only produce marginal performance benefits and scaling them down during power-

saving modes.

6.2 Methodology

We use the 4-way OoO superscalar described in Chapter 3 as our baseline core design. Table 6-

2 details the configuration parameters specific to this chapter. Our frequency scaling model is

based on IBM’s most recent POWER7 [43], which implements per-core frequency scaling.

Although POWER7 allows a fine-grained frequency step of 25MHz, we only simulate the nominal and minimum frequencies and three intermediate frequencies while fixing the voltage, and use a linear interpolation to estimate the remaining ones. Table 6-3 notes the simulated frequency points.

We derive DVFS curves by assuming a 22% operating voltage range based on the Intel Pentium M.

We factor in supply voltage and temperature fluctuations when estimating the corresponding static power.

Each case study discussed in the following two sections provides additional configurations specific to the study.

6.3 Case Study 1: Frontend Power Gliding

Our first case study explores opportunities within the frontend of a core for trading off ILP and/or MLP for power during power-saving modes. The frontend is an attractive target—it is organized for quick retrieval of instructions after a pipeline flush, and it is generally underutilized once the scheduler is full. In addition, the rate of instruction flow to the backend sets an upper bound on achievable ILP and MLP. In this case study we apply previously proposed mechanisms to reduce wasteful power from wrong-path instructions and lessen or turn off the capabilities of power-hungry frontend structures—even at the cost of performance (Section 6.3.1). We then compare the resulting power and performance to those of frequency scaling and DVFS (Section 6.3.2).

6.3.1 Implementation

Frontend power gliding makes collective use of three techniques: checkpoint removal, speculation control, and power-gating portions of the fetch buffer and registers. These techniques work together to provide more efficient power-performance scaling than frequency scaling.


FIGURE 6-5. Useful checkpoint rate of the baseline

Checkpoint removal. Many high-performance OoO processors checkpoint architectural state (e.g., a register renamer table) to recover from branch mispredictions [45,76]. Although checkpoints facilitate fast misprediction recovery [38,55], they in fact offer only small opportunities to improve overall performance because modern branch predictors are highly accurate [62]. Figure 6-5 reports the useful checkpoint rate as the fraction of committed mispredicted branches over checkpoint-allocated branches. When allocating checkpoints to all branches, only 0.002-8% of all checkpoints are useful for misprediction recovery.
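Restating the metric concretely, the small sketch below computes the useful checkpoint rate from two counters, assuming the all-branches policy in which every branch allocates a checkpoint; the example counts are invented for illustration.

def useful_checkpoint_rate(committed_mispredicted_branches, checkpointed_branches):
    """Fraction of allocated checkpoints that were actually used for recovery."""
    if checkpointed_branches == 0:
        return 0.0
    return committed_mispredicted_branches / checkpointed_branches

# A hypothetical run with 180M checkpointed branches and 1.2M committed
# mispredictions yields a useful checkpoint rate below 1%.
print(f"{useful_checkpoint_rate(1_200_000, 180_000_000):.2%}")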

Moshovos proposed reducing checkpoints either by predicting which branches are likely to require checkpoints, and/or by releasing checkpoints out of order [55]. While both of these techniques still recover from mispredictions as soon as they are detected, we propose a novel technique that alternates between recovery at detection and recovery at commit depending on the operating mode. During normal, performance-driven mode, we use conventional checkpoint-based recovery at detection. When power efficiency is required (i.e., power gliding mode), we disable all checkpointing hardware and recover at commit by flushing any structures with corrupted state. Compared to checkpoint-based recovery at detection, recovery at commit lengthens misprediction penalties by the cycles between detection and commit (unless it takes longer to re-fill the window), which can be significant. We mitigate the penalties by flushing wrong-path instructions in the OoO execution engine as soon as we detect a misprediction, and refetching the correct-path instructions. We also stall the rename stage, which contains corrupted state, until the recovery action is taken. Even though flushing the OoO execution engine is non-trivial, the mechanism already exists for checkpointing in the baseline design and eliminates unnecessary interference with correct-path execution.
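The sketch below summarizes the two recovery policies, assuming a simple mode flag and a dictionary of pipeline control signals; the signal names are illustrative and are not taken from our simulator infrastructure.

from enum import Enum

class Mode(Enum):
    PERFORMANCE = 0   # normal mode: checkpoints allocated, recover at detection
    POWER_GLIDE = 1   # gliding mode: checkpointing hardware disabled

def on_misprediction_detected(mode, pipeline):
    """Sketch of the mode-dependent misprediction recovery described above."""
    # In both modes the wrong-path work in the OoO engine is flushed immediately
    # and the correct path is refetched.
    pipeline["flush_wrong_path"] = True
    pipeline["refetch_correct_path"] = True
    if mode is Mode.PERFORMANCE:
        # Checkpoint-based recovery: restore the rename map right away.
        pipeline["restore_rename_map_from_checkpoint"] = True
        pipeline["stall_rename"] = False
    else:
        # No checkpoints: corrupted rename state is repaired only when the
        # mispredicted branch commits, so rename stalls until then.
        pipeline["restore_rename_map_from_checkpoint"] = False
        pipeline["stall_rename"] = True
    return pipeline

print(on_misprediction_detected(Mode.POWER_GLIDE, {}))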

Speculation control. To reduce wasteful power from wrong-path instructions, we also apply a simplified version of speculation control that gates the frontend when the amount of speculation in the window exceeds a certain threshold [14,51]. Although the prior proposals measure speculation by the number of unresolved low-confidence branches, we instead use the number of all unresolved branches, regardless of confidence, as a proxy. This coarse approximation penalizes workloads and phases with very low SPKI (squashes per thousand instructions); however, it obviates the need for a confidence estimator, which would increase complexity and power consumption.

The rest of the speculation-control mechanism remains the same. When the threshold is reached, we stall all instructions at rename; once the number of unresolved branches drops below the threshold, we un-stall renaming. The smaller the threshold, the more aggressively we suppress wrong-path instructions, with a greater risk of performance loss.
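A minimal sketch of this throttle appears below: rename is gated whenever the count of unresolved branches in the window reaches the threshold. The per-cycle trace is invented for illustration.

def rename_allowed(unresolved_branches, threshold):
    """Gate the rename stage when too many unresolved branches are in flight.
    Unlike the original proposals [14,51], no confidence estimate is used."""
    return unresolved_branches < threshold

# Toy trace of unresolved-branch counts, evaluated with the most aggressive
# Stall-1 setting (threshold = 1): rename stalls whenever any branch is unresolved.
trace = [0, 1, 2, 1, 0, 3, 1, 0]
stalled = [not rename_allowed(n, threshold=1) for n in trace]
print(stalled)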

Fetch buffer resizing. A complementary method for regulating speculation is to lessen fetch aggressiveness by shrinking the fetch buffer. The fetch buffer is usually sized larger than the fetch width to mask I-cache miss latencies. Because the fetch stage continues fetching instructions until the buffer becomes full, a larger buffer size generally increases the probability of fetching wrong-path instructions. Furthermore, even for low-SPKI workloads, the full capability of the fetch stage is unnecessary most of the time because it is designed to reduce the latency of infrequent window re-fills. Thus, we power-gate a portion of the fetch buffer, making the buffer size match the fetch width, which in turn results in less frequent accesses to the I-cache and instruction translation look-aside buffer.

Register file resizing. Because speculation control and fetch buffer resizing reduce the number of in-flight instructions, these two techniques create opportunities to proportionally resize other structures as well. While a more balanced approach is desirable, this initial work focuses on a major source of frontend power: the physical registers. Nevertheless, the next section shows that the less aggressive frontend also affects the rest of the pipeline. Although power-gating a portion of the physical registers shrinks both the physical register file and the free list, dynamic physical register file resizing is not an easy task because valid register mappings may be scattered across the file. We simply stop allocating registers that will be power-gated and enter the power gliding mode once those registers no longer contain valid mappings. This expensive operation only needs to occur once, and the power-gated portion remains fixed throughout the power-gliding mode.
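The sketch below outlines this drain-then-gate step under simplifying assumptions: physical registers are plain integers, and the routine merely removes the to-be-gated registers from the free list and reports whether they have drained; the data-structure names are illustrative.

def enter_register_gliding(free_list, mapped_regs, gate_set):
    """Stop handing out registers in gate_set and report whether the gated
    portion is free of valid mappings (the condition for entering the mode)."""
    usable_free_list = [r for r in free_list if r not in gate_set]
    drained = gate_set.isdisjoint(mapped_regs)
    return usable_free_list, drained

# Hypothetical 8-register file in which registers 4..7 will be power-gated.
gate = {4, 5, 6, 7}
free, ready = enter_register_gliding(free_list=[2, 5, 7],
                                     mapped_regs={0, 1, 3, 6},
                                     gate_set=gate)
print(free, ready)   # [2] False: register 6 still holds a valid mapping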

In summary, the above techniques work in concert to reduce wasteful work and lessen the aggressiveness of the frontend for power savings.

6.3.2 Evaluation

Table 6-4 explains the configuration space of this first case study. We experiment with five configurations that selectively stall renaming with varying degrees of aggressiveness. All the configurations power-gate the entire checkpointing hardware, three quarters of the fetch buffer, and half of the physical register file and the free list in the baseline.

TABLE 6-4. Configuration space for Case Study 1

Configuration   Max in-flight unresolved BRs   Checkpoint count   Fetch buffer size   Physical registers
Base            Unconstrained                  16                 16                  128
Stall-8         8                              0                  4                   64
Stall-4         4                              0                  4                   64
Stall-3         3                              0                  4                   64
Stall-2         2                              0                  4                   64
Stall-1         1                              0                  4                   64

In Section 6.1, we selected five representative workloads. These five workloads have different branch characteristics, as the third and fourth columns in Table 6-1 show. libquantum has high BPKI (branches per thousand instructions) but low SPKI. bwaves has low BPKI and low SPKI. Both oltp and gobmk have high BPKI and high SPKI. Finally, namd has medium BPKI and low SPKI. These workloads also represent other characteristics. libquantum and bwaves are memory-intensive, while gobmk and namd are CPU-intensive. Additionally, libquantum and gobmk represent integer benchmarks, and bwaves and namd represent floating-point benchmarks. oltp, on the other hand, is an on-line transaction processing workload.

Figure 6-6 presents power and performance normalized to the baseline for the five representative workloads as well as the harmonic mean of all SPEC and the commercial workloads. The data points labeled Freq Scaling are the same results as the ones in Figure 6-2 and Figure 6-3 in Section 6.1, and Stall-* represents the Stall-8 through Stall-1 configurations in Table 6-4. Analytical DVFS represents the derived DVFS curve for each workload. Although the degree and the scalability vary, all our configurations yield lower power-performance curves than frequency scaling, indicating a more efficient power-performance trade-off. Furthermore, some of the Stall-* data points even lie on top of the DVFS curves.

FIGURE 6-6. Power-performance normalized to the baseline, one panel per workload (libquantum, bwaves, oltp, gobmk, namd, and HMean, the harmonic mean of all 33 workloads), plotting normalized chip power against normalized performance for Freq Scaling, Analytical DVFS, and Stall-*. (Lower right is better.)

We verified that the effectiveness of our configurations holds true for the rest of the workloads, and summarize the results using the harmonic mean. These favorable results demonstrate that the previously proposed techniques we employed are viable options in the context of frequency scaling, even though the techniques may not have worked well under peak-performance restrictions. Coupling the techniques together reduces wasteful energy and trims the capability of structures designed for worst-case performance, rather than uniformly slowing down execution by frequency scaling regardless of whether instructions are on the wrong path or the correct path.

The effectiveness of frontend power gliding, however, depends largely on the workloads' BPKI

and SPKI because of the inherent dependence on branch characteristics. The higher the BPKI, the more unresolved branches an instruction window has, creating more opportunities for stalling the rename stage. Similarly, the higher the SPKI, the more reduction in wrong-path instructions we can achieve. Hence, the high-BPKI, high-SPKI workloads, oltp and gobmk, exhibit the most power savings for the amount of performance loss and also scale well with different stalling aggressiveness. The converse—that low BPKI and low SPKI exhibit less significant power savings—unfortunately also applies, a fact exemplified by the low-BPKI, low-SPKI workload bwaves. The low BPKI characteristic in particular makes this workload category insensitive to frontend stalling, resulting in one power-performance point for all levels of aggressiveness. The reduction in physical registers constrains the ILP and MLP as well as the power. However, only six out of 33 workloads fit in this workload category.

The utility of stalling for wasteful-power reduction is similarly dependent on BPKI and SPKI. Figure 6-7 plots the ratio of committed instructions over dispatched instructions. Because workloads with medium to high SPKI (e.g., oltp and gobmk) have more wrong-path instructions than those with low SPKI (e.g., libquantum, bwaves, and namd), the former improve the ratio by up to 54% as we increase the stalling aggressiveness. The latter workload type is insensitive to stalling because the ratio of the baseline is already quite high.

FIGURE 6-7. Ratio of committed / dispatched instructions for Stall-8 through Stall-1. (Higher is better.)

FIGURE 6-8. IPC impacts of the applied techniques ("& Chkpt Removal", "& Spec Cntrl", "& Fetch Buff", "& Regs") with Stall-1, normalized to the baseline.

To understand the performance impact of each frontend power gliding technique, we added one technique at a time to the baseline and measured the resulting IPCs. Figure 6-8 presents the IPC degradations compared to the baseline for each of the five representative workloads. The "& Chkpt Removal" stack shows the IPC loss just from our commit-time misprediction recovery. The next stack, "& Spec Cntrl", represents the IPC loss when coupling checkpoint removal with our most aggressive speculation control of allowing only one unresolved branch in flight. Similarly, "& Fetch Buff" adds fetch buffer resizing, and "& Regs" adds register file resizing to the techniques listed above. The height of the bottom "Stall-1" stack represents the IPC of all techniques combined (i.e., Stall-1) normalized to the baseline.

FIGURE 6-9. Power breakdown normalized to the baseline (L1-I, frontend, execution, backend, L1-D, L2, and L3) for Base and Stall-8 through Stall-1 on each workload.

As expected, the lack of checkpoints has very small performance implications because of the low useful checkpoint rate in the baseline (Figure 6-5). Even the high-SPKI workloads, oltp and gobmk, have only 5% and 7% performance degradation, respectively, from the longer misprediction penalties, and the fraction becomes negligible for workloads with low SPKI (e.g., libquantum, bwaves, and namd). Adding the most aggressive speculation control affects the workloads differently. libquantum suffers from this technique the most, degrading the IPC by 58%, because our approximated speculation control stalls rename often due to the high BPKI, but the stalling only slows down correct-path execution due to the low SPKI. The performance loss is more modest for the high-SPKI workloads (oltp and gobmk) and the medium-BPKI workload (namd). In contrast, the low-BPKI bwaves allows few opportunities for the speculation control, thereby showing insensitivity to the technique.

Combining the above two techniques with fetch buffer resizing has negligible performance impact, for different reasons. The speculation-control-sensitive workloads permit far fewer instructions to enter the window than the baseline; hence, the reduced fetching capability does not affect the overall performance much. The reason for the outlier bwaves is the mostly idle backend due to the memory intensity as well as the infrequent pipeline flushes (Figure 6-5). As bwaves is the only workload that is largely unaffected by the speculation control, it instead shows the most sensitivity (28%) to the reduced physical register count. The large IPC loss is also an indication that the baseline design is not skewed during the normal, unconstrained operation mode.

Figure 6-9 presents a normalized power breakdown. As expected, the fraction of the frontend

power becomes smaller with more aggressive stalling levels, with the exception of the stalling-insensitive bwaves. Much of bwaves' power reduction comes from the smaller physical register file, which slows down fetching and execution. Even though our approach only targets the frontend of the core, regulating speculation and scaling down the fetch buffer and physical registers reduce power consumption throughout the pipeline and the cache hierarchy. gobmk, for example, lowers the power of the execution logic by 47%, the backend by 50%, and the L1-D by 50% with Stall-1. Most importantly, frontend power gliding is also effective for memory-intensive workloads, which are largely tolerant of frequency scaling, yielding up to 15% and 18% chip power reduction for bwaves and libquantum, respectively. As a result, we achieve 2.5x more power-performance scalability than frequency scaling for bwaves and 3.8x for libquantum.

6.4 Case Study 2: L2 Power Gliding

The second case study addresses the increasingly problematic issue of static power in large caches. Static power essentially imposes an upper limit on the power savings achievable by frequency scaling, and large caches exacerbate the issue. As transistors become leakier in future technology nodes [23], it becomes ever more important to scale down static power along with dynamic power in order to provide a wide power range.

In this section, we turn our attention to existing circuit techniques for static power management and employ those techniques during power-saving modes rather than within a nominal operating mode. Hence, we do not investigate mechanisms to mitigate the performance degradations these techniques impose; rather, we use them without any complex policies to fully enjoy the benefits of the techniques. We first explain the five levels of power-saving techniques we apply to the L2 cache (Section 6.4.1), then evaluate the power-performance impacts both within the L2 and on the chip as a whole (Section 6.4.2). We conclude this section by evaluating L2 power gliding on the COBRi scalable core design proposed in the previous chapter for completeness (Section 6.4.3).

As we assume a private L2 but a shared L3, we leave the L3 configuration (Table 6-2) unchanged, similar to our frequency scaling assumption.

6.4.1 Implementation

For gradual power-performance scaling, we enable five levels of power-saving modes, with each additional level increasing in aggressiveness. The first level puts the entire L2 data arrays into a drowsy mode to cut down the L2 static power. Drowsy mode is an effective technique that supplies just enough voltage (the drowsy voltage) to preserve the state of the memory cells and switches to a higher active voltage to safely read out the data [23]. Because each L2 cache line is maintained at the low drowsy voltage during inactivity and only accessed lines temporarily boost the supply voltage, static power is significantly reduced. Accessing a drowsy line, however, takes an extra cycle to wake up, increasing the L2 access latency to 13 cycles in our implementation, and many policies for selective drowsy mode have been proposed to mitigate the performance impact [23,57]. Given the goal of this work—providing a lower power-performance curve than frequency scaling—we are more interested in aggressive power reduction, and hence we simply apply drowsy mode uniformly to the cache.
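The sketch below captures how a drowsy L2 can be accounted for: one extra cycle per drowsy array, and reduced leakage for the fraction of the arrays held at the drowsy voltage. The 4x leakage-reduction ratio in the sketch is an illustrative assumption, not the value used by our power model.

def l2_access_latency(drowsy_data, drowsy_tag, base_cycles=12):
    """Each drowsy array adds one wake-up cycle, matching the 12/13/14-cycle
    latencies used in this chapter."""
    return base_cycles + int(drowsy_data) + int(drowsy_tag)

def l2_static_power(nominal_leakage, drowsy_fraction, drowsy_leakage_ratio=0.25):
    """Rough leakage estimate when a fraction of the arrays sits at the drowsy
    voltage. drowsy_leakage_ratio is an assumed reduction, for illustration only."""
    awake = (1.0 - drowsy_fraction) * nominal_leakage
    drowsy = drowsy_fraction * nominal_leakage * drowsy_leakage_ratio
    return awake + drowsy

print(l2_access_latency(True, False))                              # 13 cycles (Level-1)
print(l2_static_power(nominal_leakage=1.0, drowsy_fraction=1.0))   # 0.25 under the assumed ratio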

The second level of power-saving modes goes one step further by putting the L2 tag arrays into drowsy mode as well, lengthening the access latency by another cycle (i.e., 14 cycles).

TABLE 6-5. Configuration space for Case Study 2

Configuration   Drowsy L2 Data   Drowsy L2 Tag   L2 Associativity   L2 Access Cycles
Base            N                N               8                  12
Level-1         Y                N               8                  13
Level-2         Y                Y               8                  14
Level-3         Y                Y               4                  14
Level-4         Y                Y               2                  14
Level-5         Y                Y               1                  14

The remaining three levels gradually trade off the L2 cache capacity for power by power-gating some of the associative ways; the gating remains in effect for the duration of each level. Again, we apply this technique uniformly to all the sets in the L2 for simplicity and maximum power savings, rather than optimizing for each workload's cache-line reuse patterns or idle periods [36]. When enacting each of the three levels, any dirty lines in the ways that will be power-gated must be written back to the L3. Without attempts to minimize the associated power and latency cost [36], we simply write back dirty lines before gating in this case study. We expect the performance impact of the reduced associativity to be workload dependent—depending on the workload's working set size, data locality, and reuse distance. However, compared to the first two levels, power-gating a portion of the L2 completely eliminates the static power of the disabled portion and reduces the dynamic energy of L2 tag accesses.
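A minimal sketch of this transition appears below, assuming each cache set is represented as a dictionary from way number to line; the data layout and the tiny example are illustrative only.

def power_gate_ways(l2_sets, ways_to_gate, write_back_to_l3):
    """Write back any dirty line in a way that is about to be gated, then drop
    the line; clean lines are simply discarded."""
    for set_lines in l2_sets:                 # one dict of way -> line per set
        for way in ways_to_gate:
            line = set_lines.pop(way, None)   # the line leaves the L2 either way
            if line is not None and line.get("dirty"):
                write_back_to_l3(line)

# Tiny example: a 2-set cache moving from 4 ways to 2 (a Level-4-style step).
l3_writes = []
sets = [
    {0: {"tag": 0x10, "dirty": True}, 2: {"tag": 0x20, "dirty": True}},
    {1: {"tag": 0x30, "dirty": True}},
]
power_gate_ways(sets, ways_to_gate=[2, 3], write_back_to_l3=l3_writes.append)
print(sets)        # only ways 0 and 1 remain populated
print(l3_writes)   # the dirty line evicted from way 2 was written back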

6.4.2 Evaluation

We evaluate the effectiveness of our five levels of power-saving modes against frequency scaling and DVFS. Table 6-5 summarizes the configuration parameters. As these configurations trade off the L2 cache performance for power, the effectiveness depends on a workload's memory intensity and the L2 miss rate sensitivity to cache sizes (the last column of Table 6-1).

FIGURE 6-10. Power-performance normalized to the baseline, one panel per workload (libquantum, bwaves, oltp, gobmk, namd, and HMean, the harmonic mean of all 33 workloads), plotting normalized chip power against normalized performance for Freq Scaling, Analytical DVFS, and Level-*. (Lower right is better.)

Both of the memory-bound workloads, libquantum and bwaves, are insensitive to L2 size because of their already high miss rates (0.31 and 0.40, respectively); however, the L2 miss rates of the CPU-intensive workloads (gobmk and namd) and the commercial workload (oltp) worsen as the L2 shrinks. We model the drowsy cache based on the work by Flautner et al. [23].

FIGURE 6-11. Power breakdown normalized to the baseline (L1-I, frontend, execution, backend, L1-D, L2, and L3) for Base and Level-1 through Level-5 on each workload.

Figure 6-10 plots the power-performance scaling of our five levels of power-saving modes, the frequency scaling model, and the derived DVFS. All five workloads enable much more power savings for the same performance than frequency scaling—and even than DVFS in many cases—by

addressing both the dynamic and static power of the L2. Again, we verified that the rest of the workloads also exhibit the effectiveness of L2 power gliding over frequency scaling, summarizing the results using the harmonic mean. We achieve these better power-performance trade-offs despite our simple, non-optimized use of drowsy mode and L2 way shrinking. The memory-intensive libquantum and bwaves see the largest chip power reduction (43% and 33%, respectively) from placing the L2 cache into drowsy mode. Because these workloads spend more than a third of the chip power in the static-power-dominated L2 in the baseline, as Figure 6-11 shows, they provide ample opportunity for static power reduction by implementing the drowsy mode. Although the rest of the workloads benefit from drowsy mode as well (Figure 6-12), the L2 contributes much less to their total power in the baseline, limiting chip power reduction to 15-24%. As mentioned in Section 6.1, the L3 consumes less power than the L2 in the baseline because the L3 is optimized for leakage and is less frequently accessed, while the L2 is optimized for performance.

The Level-[345] configurations, on the other hand, target both the static and dynamic power of the L2 by power-gating some of the associativity. However, performance responds differently across workloads due to differences in L2 utilization and miss rate sensitivity to reduced associativity.

FIGURE 6-12. Normalized total L2 power for Base and Level-1 through Level-5 on each workload.

The memory-intensive workload libquantum (bwaves) has a negligible performance loss of 0.1% (2%) while yielding up to 5% (5%) additional power savings, resulting in almost ideal power scaling. The other three workloads, which are more sensitive to the reduced L2 associativity, have more modest power-performance scaling curves, though they still outperform frequency scaling. Even namd, which has the smallest L2 power fraction, achieves 18% more power reduction for the same performance compared to frequency scaling.

The worsened L2 miss rate under Level-[345] power gliding inevitably increases the L3 utilization and consequently the L3 power, as Figure 6-11 shows. Despite essentially shifting some of the power from the L2 to the L3, the result is a profound impact on the core power. The longer memory request latencies leave the core idle more often, most notably resulting in 19% and 20% average power reduction in the execution and backend logic, respectively, for namd. The increased idleness in the core in turn provides an opportunity to scale down the aggressiveness of the core, similar to Case Study 1 in the last section; however, we leave further design explorations as future work.

TABLE 6-6. COBRi configuration parameters

Component                Scale     Configuration
L1-I                     1,2,4,8   No aggregation (Total: 32KB)
Frontend/Backend Width   1         2 wide
                         2,4,8     4 wide
Frontend Depth           1,2,4,8   7 cycles
Scheduling               1         1 EU (eight 16-entry instruction buffers per EU), WiDGET steering, single issue per EU
                         2         2 aggregated EUs, back-to-back bypass between the EUs, WiDGET steering, single issue per EU
                         4         4 aggregated EUs, back-to-back bypass between the EUs, WiDGET steering, single issue per EU
                         8         2 clusters of 4 EUs, back-to-back bypass within a cluster, 1-cycle link between clusters, WiDGET steering, single issue per EU
Execution Resources      1         1 IALU, 1 FPALU, 1 AGEN
                         2         2 IALU, 2 FPALU, 2 AGEN
                         4         4 IALU, 4 FPALU, 4 AGEN
                         8         8 IALU, 8 FPALU, 8 AGEN
Instruction Window       1         64 entries, unified
                         2         2 banks, 64 entries each
                         4,8       4 banks, 64 entries each
L1-D                     1,2,4,8   No aggregation (Total: 32KB)

6.4.3 Application of Power Gliding to COBRi

For completeness, this section evaluates L2 power gliding on a scalable core, which is our foundation for delivering power proportionality during more performance-centric modes. We use the COBRi design proposed in the previous chapter as an example scalable core. As described earlier, COBRi combines energy-efficient aspects of the borrowing-based and overprovisioning-based scalable core models to enhance the power-performance scalability in both directions (upward and downward). COBRi also addresses the power inefficiency of out-of-order (OoO) schedulers by integrating WiDGET's in-order Execution Units and instruction steering.

FIGURE 6-13. Harmonic mean power and performance: normalized chip power versus normalized performance for COBRi (COBRi1 through COBRi8) and COBRi + L2 Power Gliding (Level-0 through Level-5), with the ideal power proportionality curve for reference. (a) Normalized to the baseline; (b) normalized to COBRi8.

Given that the aim of power gliding is to scale down power even at the cost of performance, we apply power gliding to the fully scaled-down configuration of COBRi (i.e., COBRi1). Table 6-6 repeats the configuration parameters of COBRi1 used in Chapter 5 as a reference. Despite the scaled-down pipeline, COBRi1 still employs eight in-order instruction buffers for performance, enabling a limited form of OoO execution. Hence, we add a new power gliding level called Level-0 that is tailored to the COBRi design and reduces the instruction buffer count to one. Level-0 therefore converts COBRi1 into a single-issue in-order design. Building up from this, we apply the remaining L2 power gliding levels, Level-1 through Level-5, to the in-order design formed by Level-0.

Figure 6-13a plots the harmonic means of the resulting chip power and performance normalized to the baseline OoO machine. The figure also includes the power-performance points of COBRi from the previous chapter to show the scaling trend. From the COBRi1 point, Level-0 enables 10% more power savings just by reducing the instruction buffer count from eight (COBRi1) to one. However, Level-0 sacrifices more performance (18%) for that amount of power savings, indicating that simply converting from an OoO pipeline to an in-order pipeline does not produce cost-effective scaling, at least on the COBRi infrastructure. In contrast, the remaining five levels of L2 power gliding attack power-dominant performance optimizations and enable 25% power reduction with only 4% performance degradation compared to the Level-0 point. Similar to the L2 power gliding results when applied directly to the OoO baseline design (Section 6.4.2), placing the L2 into drowsy mode (Level-[12]) results in the largest power savings because of the significance of the L2 static power. A difference, however, is that making the L2 smaller (Level-[345]) has little effect on the in-order core on average. This behavior arises because the in-order core's lack of latency tolerance already incurs frequent memory stalls and long idle core periods in the baseline, and, therefore, many more workloads become insensitive to the poor L2 performance. The resulting power-performance characteristics in Figure 6-13a resemble those of the memory-intensive workloads—libquantum and bwaves—from the previous section.

the harmonic mean chip power and performance points of COBRi and power gliding in Figure 6-

13a using a different normalization point: the fully scaled-up configuration COBRi8. By using the

energy-efficient scalable core when performance is more critical and by switching to power gliding

when scaling down power toward zero, we collectively yield a power-performance curve that

closely follows the ideal power proportionality curve. In fact, only two of the points—by COBRi1

and Level-0—result in slightly less efficient scaling, while the rest of the power-performance

points lie in the lower right region of the ideal curve, indicating more efficient scaling. Although

our design scalability does not quite reach the ideal zero point, we nonetheless achieve a design 118 that scales from an aggressive OoO processor to a single-issue in-order processor that consumes

85% less power.

6.5 Summary

Frequency scaling is becoming increasingly important due to the shrinking voltage scaling range and the need for fine-grained power management within shared voltage planes. However, the limited linear dynamic power reduction of frequency scaling is likely to constrain flexible power budgeting.

We proposed a new concept called power gliding, which disables or regulates performance optimizations that meet the 3:1 power-to-performance ratio to yield a power-performance curve closer to DVFS than frequency scaling. Because power reduction is not constrained by maintaining peak performance in the context of frequency scaling, power gliding enables a new way to look at previously proposed low-power techniques that result in performance loss. Our two case studies examined the core frontend and the L2 cache, leveraging simplified speculation control, power-gating of structures designed for aggressive performance, and drowsy mode. We showed that power gliding enables more efficient power-performance scaling than frequency scaling across all workloads, even those that exhibit tolerance to frequency scaling. In particular, L2 power gliding resulted in up to 48% more power savings for the same performance compared to frequency scaling, and many of the power gliding data points even exceeded the power savings of DVFS.

Although we statically selected resources to disable across all workloads, dynamic detection that exploits workload and phase behaviors will certainly lead to better power-performance scaling. The power gliding concept opens opportunities to explore many different mechanisms for this power-constrained era.

Chapter 7

Conclusions

The increasing levels of power consumption by computer systems have necessitated innovations in every aspect of computer science, not just once but multiple times, in recent years. From the computer architecture perspective, the goal of building chips with ever faster clock speeds was replaced by a focus on more power- or energy-efficient designs, with less emphasis on performance. We argue that performance improvements are still possible as long as new designs maintain tight control over where and when to spend power. Power proportionality is one of the properties needed for those designs. Because technology scaling in nano-scale nodes limits the effectiveness of commonly used low-power circuit techniques, especially voltage scaling, this thesis examined microarchitectural mechanisms to realize power-proportional computing.

In this final chapter, we first present a summary of our work (Section 7.1). We then reflect on further work necessary to achieve system-wide power proportionality and discuss opportunities that lie ahead (Section 7.2).

7.1 Summary

We approached the goal of power-proportional processors through dynamic resource allocation. We aggregate resources to achieve higher performance at higher power, and selectively disable resources and/or performance optimizations to reduce performance with lower power. To deliver modest to high performance, we proposed a scalable core design called WiDGET. The novelty of WiDGET lies in harnessing low-power building blocks, namely distributed in-order buffers, and varying the number of active in-order buffers and functional units for power-efficient scalability. On a single chip, WiDGET scales from a low-power in-order core to a high-performance out-of-order (OoO) core and anywhere in between. Our mechanism to approximate OoO execution with the distributed in-order buffers resulted in a core that consumes 8% less power than a high-end Intel Xeon-like core while exceeding its performance by 26%, using the most aggressive configuration. Overall, WiDGET delivered a power range of 2.2 and a performance range of 3.8 by only scaling the execution resources.

We also conducted a study to identify the power-performance impacts of scaling other core resources beyond execution resources. Based on our core scaling taxonomy, we produced the insight that the fundamental difference among the prior scalable core proposals lies in where resources come from when scaling up. We developed two abstract scalable core models to understand the impacts on power and performance: 1) borrow entire core resources from neighboring cores; or 2) overprovision a few core-private resources. We found that increased latencies from inter-core communication outweigh the benefits of maintaining pipeline balance when scaled up.

Aggregating L1 caches resulted in poor power-performance scaling because of already low miss rates in the L1-Is and the increased load-to-use latency of the L1-Ds. On the other hand, when fully scaled down, smaller components in the borrowing-based cores facilitated better power-performance trade-offs than statically overprovisioning resources. We concluded this study by proposing a hybrid design that combines the desirable features of the two models, improving the scalability and energy efficiency of both.

Finally, we explored avenues for scaling down power and performance toward zero. We proposed dynamically disabling performance optimizations, trading performance for power reductions, and called this concept power gliding. The important implication of power gliding is that it allows performance degradation as long as the degradation is exceeded by the power savings. Hence, we leveraged previously proposed low-power techniques without complex policies or logic to obtain the best possible power savings. Our two case studies focused on the core frontend and the L2 cache using frontend stalling for speculation control, power-gating of structures designed for aggressive performance, and drowsy mode. Despite targeting only a portion of the core, each case study showed power reduction throughout the core, and resulted in even better power-performance scaling than DVFS in some cases.

With dynamic resource allocation, we achieved a processor design that dissipates power in proportion to work done across the entire performance spectrum. The lowest performance configuration consumes only 15% of the peak power, approaching the ideal power proportional curve.

The importance of approaching, or matching, the ideal power proportional curve has

increased in modern chip designs. In fact, concerns for chip power are becoming more urgent as

the utility of traditional power-management techniques becomes limited in future technology

nodes. We must design chips that flexibly adjust the capability and the number of active cores for

various working conditions, rather than only targeting peak performance at a maximum allowable

power.

7.2 Reflections

Within the framework of this thesis we have only taken the first steps toward achieving system-wide power proportionality. Due to the broad scope of this problem, we narrowed our focus to achieving power proportionality on a single thread, which is itself a challenging goal on many-core systems that are better tuned for throughput rather than single-thread performance. Using

our work as a foundation, one of the key directions for future study lies in dynamically balancing

TLP and ILP for a given power-performance target. An optimal balance would require identifying

and solving thread interference, adapting to different workload demands, desirable configurations

of each active core, and thermal hot spots. All of these elements would then have to be handled

dynamically to provide an adaptable system for emerging versatile workloads.

Although processors are one of the major contributors to total system power, system-wide

power proportionality cannot be achieved without addressing every non-negligible power source

in the system, including the memory system, interconnect network, and disks. Many researchers

have investigated techniques to make these components more power proportional. A coherent and

coordinated approach across these disparate research efforts will be the only means of providing a

smooth transition across the power-performance curve.

On a more personal note, I feel I devoted my graduate school years to a very important and challenging problem. The Power Wall makes it difficult to keep up with previous years' substantial performance improvements, whether measured by single-thread performance, system throughput, or performance per watt. Large data centers and providers of cloud systems are continuously searching for methods to reduce power cost. My proposals in this thesis will not, in isolation, be solutions for these challenges. Instead, they provide a solid foundation for significant additional research and development that may, in combination with the ideas and research of many others, help the computing field overcome the Power Wall limitations. The knowledge and experience I have gained as a graduate student, formulating these proposals, have also made me better prepared to make a meaningful contribution to the computing industry in coming years.

References

[1] A. Al-Nayeem, M. Sun, X. Qiu, L. Sha, S. P. Miller, and D. D. Cofer. A Formal Architecture Pattern for Real-Time Distributed Systems. In 2009 Real-Time Systems Symposium, pages 161–170.

[2] A. R. Alameldeen, C. J. Mauer, M. Xu, P. J. Harper, M. M. K. Martin, D. J. Sorin, M. D. Hill, and D. A. Wood. Evaluating Non-deterministic Multi-threaded Commercial Workloads. In Proc. of the 5th Workshop on Computer Architecture Evaluation Using Commercial Workloads, pages 30–38, Feb. 2002.

[3] D. Albonesi, R. Balasubramonian, S. Dropsho, S. Dwarkadas, E. Friedman, M. Huang, V. Kursun, G. Magklis, M. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P. Cook, and S. Schuster. Dynamically tuning processor resources with adaptive processing. IEEE Computer, 36(12):49–58, Dec. 2003.

[4] D. H. Albonesi. Selective cache ways: on-demand cache resource allocation. In Proc. of the 32nd Annual IEEE/ACM International Symp. on Microarchitecture, pages 248–259, Nov. 1999.

[5] G. M. Amdahl. Validity of the Single-Processor Approach to Achieving Large Scale Computing Capabilities. In AFIPS Conference Proceedings, pages 483–485, Apr. 1967.

[6] O. Azizi, A. Mahesri, B. C. Lee, S. J. Patel, and M. Horowitz. Energy-performance tradeoffs in pro- cessor architecture and circuit design: a marginal cost analysis. In Proc. of the 37th Annual Intnl. Symp. on Computer Architecture, June 2010.

[7] R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In Proc. of the 33rd Annual IEEE/ACM International Symp. on Microarchitecture, pages 245–257, Dec. 2000.

[8] R. Balasubramonian, S. Dwarkadas, and D. H. Albonesi. Dynamically managing the communica- tion-parallelism trade-off in future clustered processors. In Proc. of the 30th Annual Intnl. Symp. on Computer Architecture, June 2003.

[9] A. Baniasadi and A. Moshovos. Instruction distribution heuristics for quad-cluster, dynamically- scheduled, superscalar processors. In Proc. of the 27th Annual Intnl. Symp. on Computer Architec- ture, June 2000.

[10] L. A. Barroso and U. Hölzle. The Case for Energy-Proportional Computing. IEEE Computer, 40(12), 2007.

[11] L. Benini, P. Siegel, and G. D. Micheli. Automatic Synthesis of Gated Clocks for Power Reduction in Sequential Circuits. IEEE Design and Test of Computers, 1994.

[12] S. Borkar. The Exascale Challenge. In International Symposium on VLSI Design Automation and Test, pages 2–3, 2010.

[13] D. Brooks and M. Martonosi. Dynamically exploiting narrow width operands to improve processor power and performance. In Proc. of the 5th IEEE Symp. on High-Performance Computer Architecture, pages 13–22, Jan. 1999.

[14] D. Brooks and M. Martonosi. Dynamic Thermal Management for High-Performance Microproces- sors. In Proceedings of the 7th IEEE Symposium on High-Performance Computer Architecture, Jan. 2001.

[15] J. Burns and J.-L. Gaudiot. Area and System Clock Effects on SMT/CMP Processors. In Proc. of the Intnl. Conf. on Parallel Architectures and Compilation Techniques, Sept. 2001.

[16] A. Buyuktosunoglu, D. Albonesi, S. Schuster, D. Brooks, P. Bose, and P. Cook. A circuit level imple- mentation of an adaptive issue queue for power-aware microprocessors. In Great Lakes Symposium on VLSI Design, pages 73–78, 2001.

[17] R. Canal, A. Gonzalez, and J. E. Smith. Very Low Power Pipelines Using Significance Compression. In Proc. of the 33rd Annual IEEE/ACM International Symp. on Microarchitecture, pages 181–190, Dec. 2000.

[18] R. Canal, J.-M. Parcerisa, and A. Gonzalez. A Cost-Effective Clustered Architecture. In Proc. of the Intnl. Conf. on Parallel Architectures and Compilation Techniques, Oct. 1999.

[19] A. P. Chandrakasan, D. C. Daly, D. F. Finchelstein, J. Kwong, Y. K. Ramadass, M. E. Sinangil, V. Sze, and N. Verma. Technologies for Ultradynamic Voltage Scaling. Proceedings of the IEEE, 98(2):191– 214, Feb. 2010.

[20] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low-Power CMOS Digital Design. IEEE Jour- nal of Solid-State Circuits, 27(4):473–484, April 1992.

[21] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge. Near-Threshold Comput- ing: Reclaiming Moore’s Law Through Energy Efficient Integrated Circuits. Proceedings of the IEEE, 98(2):253–266, Feb. 2010.

[22] S. Dropsho, A. Buyuktosunoglu, R. Balasubramonian, D. H. Albonesi, S. Dwarkadas, G. Semeraro, G. Magklis, and M. L. Scott. Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power. In Proc. of the Intnl. Conf. on Parallel Architectures and Compilation Techniques, pages 141–152, Sept. 2002.

[23] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge. Drowsy caches: simple techniques for reducing leakage power. In Proc. of the 29th Annual Intnl. Symp. on Computer Architecture, May 2002.

[24] M. S. Floyd, S. Ghiasi, T. W. Keller, K. Rajamani, F. L. Rawson, J. C. Rubio, and M. S. Ware. System power management support in the IBM POWER6 microprocessor. IBM Journal of Research and Development, 51(6), 2007.

[25] International Technology Roadmap for Semiconductors. ITRS 2010 Update. Semiconductor Industry Association, 2010. www.itrs.net/links/2010itrs/home2010.htm.

[26] A. S. Ganapathi, Y. Chen, A. Fox, R. H. Katz, and D. A. Patterson. Statistics-driven workload modeling for the Cloud. In 2010 IEEE 26th International Conference on Data Engineering Workshops, 2010.

[27] G. Gerosa, S. Curtis, M. D'Addeo, B. Jiang, B. Kuttanna, F. Merchant, B. Patel, M. Taufique, and H. Samarchi. A Sub-2 W Low Power IA Processor for Mobile Internet Devices in 45 nm High-k Metal Gate CMOS. IEEE Journal of Solid-State Circuits, 44(1):73–82, 2009.

[28] D. Gibson and D. A. Wood. Forwardflow: A Scalable Core for Power-Constrained CMPs. In Proc. of the 37th Annual Intnl. Symp. on Computer Architecture, June 2010.

[29] J. González and A. González. Dynamic Cluster Resizing. In Proceedings of the 21st International Conference on Computer Design, 2003.

[30] D. Gove. CPU2006 Working Set Size. Computer Architecture News, 35(1):90–96, 2007.

[31] L. Hammond, B. Hubbert, M. Siu, M. Prabhu, M. Chen, and K. Olukotun. The Stanford Hydra CMP. IEEE Micro, 20(2):71–84, March-April 2000.

[32] H. Hanson, S. W. Keckler, S. Ghiasi, K. Rajamani, F. Rawson, and J. Rubio. Thermal response to DVFS: analysis with an Intel Pentium M. In Proceedings of the 2007 international symposium on Low power electronics and design, pages 219–224, New York, NY, USA, 2007. ACM.

[33] A. Hartstein and T. R. Puzak. Optimum Power/Performance Pipeline Depth. In Proc. of the 36th Annual IEEE/ACM International Symp. on Microarchitecture, Dec. 2003.

[34] J. L. Henning. SPEC CPU2006 Benchmark Descriptions. Computer Architecture News, 34(4):1–17, 2006.

[35] M. D. Hill and M. R. Marty. Amdahl’s Law in the Multicore Era. IEEE Computer, pages 33–38, July 2008.

[36] M. Horowitz, E. Alon, S. Naffziger, R. Kumar, and K. Bernstein. Scaling, power, and the future of CMOS. In IEEE International Electron Devices Meeting, 2005., Dec. 2005.

[37] Z. Hu, A. Buyuktosunoglu, V. Srinivasan, V. Zyuban, H. Jacobson, and P. Bose. Microarchitectural techniques for power gating of execution units. In International Symposium on Low Power Electron- ics and Design, pages 32–37, Aug. 2004.

[38] W.-M. W. Hwu and Y. N. Patt. Checkpoint Repair for High-Performance Out-of-Order Execution Machines. IEEE Transactions on Computers, 36(12), Dec. 1987.

[39] Intel. Intel and Core i7 (Nehalem) Dynamic Power Management, 2008.

[40] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. Core Fusion: Accomodating Software Diversity in Chip Multiprocessors. In Proc. of the 34th Annual Intnl. Symp. on Computer Architecture, June 2007.

[41] J. Tschanz, S. Narendra, Y. Ye, B. Bloechel, S. Borkar, and V. De. Dynamic-sleep transistor and body bias for active leakage power control of microprocessors. In Proceedings of the IEEE 2003 International Solid-State Circuits Conference, February 2003.

[42] A. Jain and et al. A 1.2GHz Alpha Microprocessor with 44.8GB/s Chip Pin Bandwidth. In Proceed- ings of the IEEE 2001 International Solid-State Circuits Conference, pages 240–241, 2001.

[43] R. Kalla, B. Sinharoy, W. J. Starke, and M. Floyd. Power7: IBM’s Next-Generation Server Processor. IEEE Micro, 30:7–15, 2010.

[44] S. Keckler, D. Burger, K. Sankaralingam, R. Nagarajan, R. McDonald, R. Desikan, S. Drolia, M. Govindan, P. Gratz, D. Gulati, H. H. amd C. Kim, H. Liu, N. Ranganathan, S. Sethumadhavan, S. Sharif, and P. Shivakumar. Architecture and Implementation of the TRIPS Processor. CRC Press, 2007.

[45] R. E. Kessler. The Alpha 21264 Microprocessor. IEEE Micro, 19(2):24–36, March/April 1999.

[46] C. Kim, S. Sethumadhavan, M. S. Govindan, N. Ranganathan, D. Gulati, D. Burger, and S. W. Keckler. Composable Lightweight Processors. In Proc. of the 40th Annual IEEE/ACM International Symp. on Microarchitecture, Dec. 2007.

[47] H. S. Kim and J. E. Smith. An instruction set and microarchitecture for instruction level distributed processing. In Proc. of the 29th Annual Intnl. Symp. on Computer Architecture, May 2002.

[48] R. Kumar, D. Tullsen, P. Ranganathan, N. Jouppi, and K. Farkas. Single-ISA Heterogeneous Multi- core Architectures for Multithreaded Workload Performance. In Proc. of the 31st Annual Intnl. Symp. on Computer Architecture, pages 64–75, June 2004.

[49] G. Magklis, G. Semeraro, D. H. Albonesi, S. G. Dropsho, S. Dwarkadas, and M. L. Scott. Dynamic Frequency and Voltage Scaling for a Multiple-Clock-Domain Microprocessor. IEEE Micro, 23(6):62–68, Nov/Dec 2003.

[50] P. S. Magnusson et al. Simics: A Full System Simulation Platform. IEEE Computer, 35(2):50–58, Feb. 2002.

[51] S. Manne, A. Klauser, and D. Grunwald. Pipeline Gating: Speculation Control for Energy Reduc- tion. In Proc. of the 25th Annual Intnl. Symp. on Computer Architecture, pages 132–141, June 1998.

[52] M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet’s General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News, pages 92–99, Sept. 2005.

[53] D. Meisner, B. T. Gold, and T. F. Wenisch. PowerNap: Eliminating Server Idle Power. In Proc. of the 14th Intnl. Conf. on Architectural Support for Programming Languages and Operating Systems, Mar. 2009.

[54] M. Miyazaki, J. Kao, and A. Chandrakasan. A 175mW Multiply-Accumulate Unit Using Adaptive Supply Voltage and Body Bias (ASB) Architecture. In Proceedings of the IEEE 2002 International Solid-State Circuits Conference, pages 58–59, February 2002.

[55] A. Moshovos. Checkpointing alternatives for high performance, power-aware processors. 2003.

[56] S. Palacharla and J. E. Smith. Complexity-Effective Superscalar Processors. In Proc. of the 24th Annual Intnl. Symp. on Computer Architecture, pages 206–218, June 1997.

[57] S. Petit, J. Sahuquillo, J. M. Such, and D. Kaeli. Exploiting temporal locality in drowsy cache policies. In Proceedings of the 2nd conference on Computing Frontiers, 2005.

[58] K. K. Rangan, G.-Y. Wei, and D. Brooks. Thread Motion: Fine-Grained Power Management for Multi-Core Systems. In Proc. of the 36th Annual Intnl. Symp. on Computer Architecture, June 2009.

[59] J. Renau, K. Strauss, L. Ceze, W. Liu, S. Sarangi, J. Tuck, and J. Torrellas. Energy-Efficient Thread- Level Speculation on a CMP. IEEE Micro, 26(1), Jan/Feb 2006.

[60] A. Roth and G. S. Sohi. Register Integration: A Simple and Efficient Implementation of Squash Reuse. In Proc. of the 33rd Annual IEEE/ACM International Symp. on Microarchitecture, pages 223– 234, Dec. 2000.

[61] P. Salverda and C. Zilles. Fundamental performance constraints in horizontal fusion of in-order cores. In Proc. of the 14th IEEE Symp. on High-Performance Computer Architecture, pages 252–263, Feb. 2008.

[62] A. Seznec and P. Michaud. A case for (partially) TAgged GEometric history length branch predic- tion. Journal of Instruction Level Parallelism, Feb. 2006.

[63] T. Sha, M. M. K. Martin, and A. Roth. NoSQ: Store-Load Communication without a Store Queue. In Proc. of the 39th Annual IEEE/ACM International Symp. on Microarchitecture, pages 285–296, Dec. 2006.

[64] T. Shyamkumar, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. CACTI 5.1. Technical Report HPL-2008-20, Hewlett Packard Labs, 2008.

[65] R. Singhal. Inside Intel Next Generation Nehalem Microarchitecture. 2008.

[66] J. E. Smith. Decoupled Access/Execute Computer Architecture. In Proc. of the 9th Annual Symp. on Computer Architecture, pages 112–119, Apr. 1982.

[67] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada. 1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS. IEEE Journal of Solid-State Circuits, 30(8):847–854, 1995.

[68] G. Sohi, S. Breach, and T. Vijaykumar. Multiscalar Processors. In Proc. of the 22nd Annual Intnl. Symp. on Computer Architecture, pages 414–425, June 1995.

[69] J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A Scalable Approach to Thread-Level Speculation. In Proc. of the 27th Annual Intnl. Symp. on Computer Architecture, June 2000.

[70] S. Tam, S. Rusu, J. Chang, S. Vora, B. Cherkauer, and D. Ayers. A 65nm 95W Dual-Core Multi- Threaded Xeon Processor with L3 Cache. In Proc. of the 2006 IEEE Asian Solid-State Circuits Con- ference, Nov. 2006.

[71] S. Thompson, P. Packan, and M. Bohr. MOS Scaling: Transistor Challenges for the 21st Century, 1998.

[72] T. Kuroda, T. Fujita, S. Mita, T. Nagamatsu, S. Yoshioka, K. Suzuki, F. Sano, M. Norishima, M. Murota, M. Kako, M. Kinugawa, M. Kakumu, and T. Sakurai. A 0.9-V, 150-MHz, 10-mW, 4mm2, 2-D Discrete Cosine Transform Core Processor with Variable Threshold-Voltage (VT) Scheme. IEEE Journal of Solid-State Circuits, 31(11):1770–1779, November 1996.

[73] F. Tseng and Y. N. Patt. Achieving Out-of-Order Performance with Almost In-Order Complexity. In Proc. of the 35th Annual Intnl. Symp. on Computer Architecture, June 2008.

[74] Stanford University. CPU DB, 2011. http://cpudb.stanford.edu/.

[75] Y. Watanabe, J. D. Davis, and D. A. Wood. WiDGET: Wisconsin Decoupled Grid Execution Tiles. In Proc. of the 37th Annual Intnl. Symp. on Computer Architecture, June 2010.

[76] K. C. Yeager. The MIPS R10000 Superscalar Microprocessor. IEEE Micro, 16(2):28–40, Apr. 1996.

[77] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner. Theoretical and Practical Limits of Dynamic Voltage Scaling. In Proc. of the 41st Annual Design Automation Conference, pages 868–873, June 2004.

Appendix A

Supplements for Instruction Steering Cost Model (Chapter 2)

Figure A-1 demonstrates the importance of accounting for communication delays in instruction steering cost models. Figure A-1(a) plots the harmonic means of IPC speedup for the SPEC CPU2006 benchmark suite while varying the communication delay from zero to four cycles. The x-axis is the number of employed in-order execution units (EUs), and each speedup is based on the four-EU configuration with the same delay. The idealized communication (Perfect) enables 26% speedup when the EU count increases from four to eight, whereas the speedup drops to 19% under four-cycle delays. Thus, as one would expect, performance gains from more EUs degrade as communication becomes more expensive.

FIGURE A-1. Performance sensitivity under realistic communication delays. (a) Unclustered EUs: EU count impact on performance; (b) clustered EUs: cluster size impact on performance.

However, assuming delays between every EU is rather pessimistic. A more realistic design will

cluster a few EUs with no intra-cluster delay, while imposing inter-cluster delays. Figure A-1(b) plots the performance implications of cluster size. It fixes the total EU count to eight and assumes a 1-cycle delay per inter-cluster hop. The speedups are normalized to an unclustered design, in which inter-EU communication takes one additional cycle. By assigning two EUs per cluster, performance increases 64%. A cluster size of four further improves the speedup by another 4%, but the speedup gain becomes negligible beyond that point. Despite the similar performance of the 2- and 4-EU clusters, the WiDGET design presented in Chapter 4 employs the latter for more scalable power proportionality.

Appendix B

Supplements for Simulation Tools (Chapter 3)

Table B-1 summarizes the parameter space of the simulators we used. As Chapter 3 discussed, we parametrize most of the microarchitectural details for high fidelity. Though GEMS's Ruby provides many configuration options, we opt to make a few idealized assumptions for simulation speed.

TABLE B-1. Simulation parameter space

Type              Parameter                                                   Parameterized?   Realistic Parameter Value?
System scope      Chip count                                                  N                -
                  Core count per chip                                         Y                Y
Core scope        Processor model                                             Y                Y
                  Pipeline depth                                              Y                Y
                  Pipeline width                                              Y                Y
Fetch             Fetch width                                                 Y                Y
                  Fetch buffer size                                           Y                Y
                  Instruction prefetch                                        Y                Y
                  Branch predictor models                                     Y                Y
                  Branch predictor table sizes                                Y                Y
                  Branch target buffer sizes/associativity                    Y                Y
                  Return address stack size                                   Y                Y
Decode            Architectural registers                                     N                -
                  Physical registers                                          Y                Y
                  Register file ports                                         Y                Y
                  Rename maps                                                 Y                Y
Execution         Functional unit count                                       Y                Y
                  Specialized ALUs (e.g., shifter, floating-point divider)    N                -
                  Variable functional unit latency                            N                -
                  Instruction queue/buffer organization                       Y                Y
                  Instruction queue/buffer count                              Y                Y
                  Instruction queue/buffer size                               Y                Y
                  Instruction queue/buffer ports                              Y                Y
                  Instruction wakeup/select latency                           Y                Y
                  Instruction steering heuristics                             Y                Y
                  Memory disambiguation models                                Y                Y
                  Load-store queue size/ports                                 Y                Y
                  Disambiguation predictor table sizes                        Y                Y
                  MSHRs                                                       Y                Y
                  Hardware data prefetch                                      N                -
                  TLB latency                                                 N                Y
                  TLB size/associativity                                      N                -
Operand network   Operand network models                                      Y                Y
                  Network latency                                             Y                Y
                  Network bandwidth                                           Y                N
Commit            Commit logic size                                           Y                Y
                  Commit logic ports                                          Y                Y
Multithreading    Thread count per core                                       Y                Y
                  Thread scheduling policy                                    Y                Y
                  Resource sharing discipline                                 Y/N^a            Y
Memory system     Cache size, associativity, ports, latency                   Y                Y
                  Cache line size                                             Y                Y
                  Cache line eviction policies                                N                -
                  L1-I/D organization                                         Y                Y
                  L2/L3 organization (e.g., shared/private)                   N                -
                  Cache coherence protocols                                   Y                Y
                  On-chip interconnect models                                 Y                N
                  Memory controller latency                                   Y                Y
                  Memory controller contention                                Y                N
                  DRAM latency                                                Y                Y
Technology        Technology node                                             Y                Y
                  Die area                                                    Y                Y
                  Clock frequency                                             Y                Y
Power             Clock gating option                                         N                -
                  Clock gating reactivation delay                             N                -
                  Power gating option                                         N                -
                  Power gating reactivation delay                             N                -
                  Hardware structure size/ports                               Y                Y
                  Logic                                                       Y                Y
                  Datapath width                                              Y                Y
                  Wire delay                                                  Y                Y

a. Not all hardware resources (e.g., L1 caches) allow configurable sharing disciplines.

Appendix C

Supplements for WiDGET's Instruction Steering Heuristic (Chapter 4)

The steering heuristic provided in Figure 4-6 is best explained with an example, illustrated in Figure C-1. Suppose eight EUs spanning two clusters are dedicated to this instruction engine. Further assume all operands are initially available in the register file and all EUs are empty. Figure C-1(a) shows a dataflow graph of instructions, with each node denoting an instruction sequence number and the destination register in parentheses.

In the first cycle, instructions i1 through i4 are steered. Since i1 has no data dependencies, it is steered to the empty EU 0 (line 2 in Figure 4-6). It marks the steered EU ID in the LPT entry for the destination register r1, leaving the consumer field unchanged. It also resets the empty bit vector for EU 0. Conversely, i2 depends on i1. An access to the LPT entry for r1 reveals that the producer of r1, namely i1, was steered to EU 0 and that no other instructions have followed i1 yet. Hence, i2 is steered to the producer EU 0 (line 5). It updates the corresponding LPT entry as well as i1's to prevent other consumers of i1 from steering to EU 0. i3 begins a new independent chain. It selects the empty EU 4 in Cluster 1 to balance the load (line 2). Both the LPT entry for r3 and the empty bit vector are updated accordingly. i4 is analogous to the case of i2, following the producer EU 4 (line 5). As both of the head instructions in EUs 0 and 4 are ready, they execute in their EUs. Figure C-1(b) shows the steering result at the end of cycle 1.

FIGURE C-1. Instruction steering example: (a) dataflow graph, (b) state at the end of cycle 1, (c) state at the end of cycle 2

In the second cycle, i5 through i8 are steered. i5 is sent to the producer EU 0 (line 5), setting the consumer field in r2's LPT entry. i6 also depends on i2, yet an LPT lookup indicates that the slot immediately succeeding r2's producer has already been claimed. i6 therefore finds the empty EU 2 in the same Cluster 0 by accessing the empty bit vector (line 7). Note that i2 will forward its result to EU 2 at the end of execution, enabling both i5 and i6 to execute in parallel. i7 depends on both i5 and i4, which are in EUs 0 and 4, respectively. Although both producers have empty slots behind them, i7 selects the producer EU 0 of its first source operand r5 (line 10). Finally, i8 is sent to the producer EU 2 (line 5). Figure C-1(c) displays the final state at the end of cycle 2.
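For readers who prefer code, the following is a minimal sketch of the steering decision walked through above. It paraphrases the heuristic of Figure 4-6 under simplifying assumptions (one LPT entry per architectural register, an empty bit vector over eight EUs in two clusters, and first-empty selection within a cluster); the structure and names are illustrative, not the simulator's actual implementation, and the line numbers in the comments refer to Figure 4-6.

#include <array>
#include <bitset>
#include <optional>
#include <vector>

constexpr int kNumEUs = 8;
constexpr int kEUsPerCluster = 4;       // two clusters of four EUs, as in this example

// One Last Producer Table (LPT) entry per architectural register: which EU holds
// the most recent producer, and whether a consumer already claimed the slot behind it.
struct LPTEntry {
    std::optional<int> producerEU;      // empty if the value is only in the register file
    bool consumerClaimed = false;
};

struct SteeringState {
    std::array<LPTEntry, 64> lpt{};     // register count chosen arbitrarily for the sketch
    std::bitset<kNumEUs> emptyEU{0xFF}; // bit set => EU currently empty
};

// Pick an empty EU. If a preferred cluster is given, look there first; otherwise
// choose from the cluster with the most empty EUs to balance load.
int pickEmptyEU(const SteeringState& s, int preferredCluster = -1) {
    auto firstEmptyIn = [&](int cluster) -> int {
        for (int eu = cluster * kEUsPerCluster; eu < (cluster + 1) * kEUsPerCluster; ++eu)
            if (s.emptyEU[eu]) return eu;
        return -1;
    };
    if (preferredCluster >= 0) {
        int eu = firstEmptyIn(preferredCluster);
        if (eu >= 0) return eu;
    }
    int best = -1, bestFree = -1;
    for (int c = 0; c < kNumEUs / kEUsPerCluster; ++c) {
        int freeCount = 0;
        for (int eu = c * kEUsPerCluster; eu < (c + 1) * kEUsPerCluster; ++eu)
            if (s.emptyEU[eu]) ++freeCount;
        if (freeCount > bestFree) { bestFree = freeCount; best = firstEmptyIn(c); }
    }
    return best;                        // -1 means no empty EU (a real design would stall)
}

// Steer one instruction given its source and destination registers.
// The "line" comments refer to the heuristic in Figure 4-6.
int steer(SteeringState& s, const std::vector<int>& srcRegs, int dstReg) {
    int target = -1;

    // Lines 5/10: follow the first source operand whose producer still has a free slot.
    for (int r : srcRegs) {
        LPTEntry& e = s.lpt[r];
        if (e.producerEU && !e.consumerClaimed) {
            target = *e.producerEU;
            e.consumerClaimed = true;
            break;
        }
    }
    // Line 7: a producer exists but its slot is taken; pick an empty EU in its cluster.
    if (target < 0) {
        for (int r : srcRegs) {
            if (s.lpt[r].producerEU) {
                target = pickEmptyEU(s, *s.lpt[r].producerEU / kEUsPerCluster);
                break;
            }
        }
    }
    // Line 2: no pending producer at all; pick any empty EU, balancing the clusters.
    if (target < 0)
        target = pickEmptyEU(s);
    if (target < 0)
        return -1;                      // all EUs full: the front end would stall here

    s.emptyEU[target] = false;
    s.lpt[dstReg] = LPTEntry{target, false};   // this instruction becomes the new producer
    return target;
}

Note that within a cluster this sketch simply takes the first empty EU it finds, so the specific EU indices it produces need not match Figure C-1 exactly; the steering decisions (follow a free producer, stay in the producer's cluster, or start a fresh chain on an empty EU) are the point of the example.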

A naive implementation of steering is serial, since it is constrained by the serial dependencies among the group of instructions being steered. However, we utilize a parallel prefix computation. The parallel dependence check performed in renaming is employed to detect dependencies within the same steering group [60]. Concurrently, each instruction accesses the LPT and the bit vectors to choose a candidate EU. The candidate EUs are then compared and, if necessary, modified to reflect the intra-group dependencies, after which the LPT and the bit vectors are updated accordingly.

Appendix D

Supplements for Per-EU Instruction Buffer Limit Study (Chapter 5)

Chapters 4 and 5 observed that distributed buffering is key to enabling a large instruction window without prohibitive power impacts. The performance benefit is amplified for in-order buffers, which have less latency tolerance than out-of-order (OoO) buffers. Although a higher instruction buffer count generally allows more look-ahead and higher performance, it also demands more power and area and increases the design complexity. This appendix conducts a limit study of this trade-off in the context of COBRi from Chapter 5. We vary the number of in-order instruction buffers in each execution unit (EU) as well as the total EU count dedicated to a single thread. All results are normalized to the 4-way OoO core described in Table 5-3, which also lists the other configuration parameters of COBRi.

As Figure D-1 plots, a higher per-EU buffer count (shown in the legend) leads, as expected, to higher IPC. With just one EU and four or more instruction buffers, the performance improvement is negligible because the bottleneck becomes the issue bandwidth, which is one. On the other hand, configurations with more than one EU are more sensitive to the buffer count. The eight-EU configurations, which yield the highest IPCs for a given per-EU buffer count, boost performance by 20% from two to four buffers, 8% from four to six buffers, and 4% from six to eight buffers, but the IPC gains become negligible (less than 1%) beyond that point.

FIGURE D-1. IPC sensitivity

FIGURE D-2. Chip power sensitivity

Figure D-2 shows the impact on chip power for the same configuration points. Given a fixed EU count, the power increase from additional buffers is more moderate than the performance gains, indicating the power efficiency of the in-order buffers. Although additional buffers in an EU increase the number of inputs to the issue selection logic, each buffer contains only 16 entries (roughly 0.1×10⁻² mm² in 45 nm). Rather, the chip power is more sensitive to the number of allocated EUs, which provide not only instruction buffers but also functional units and thus have first-order impacts on the issue rate and operand network traffic. With eight EUs and eight or more buffers per EU, the power increase becomes negligible (less than 0.7%), mirroring the performance trend, because most benchmarks do not have enough independent instructions to fully utilize the buffers.

FIGURE D-3. ED sensitivity

FIGURE D-4. ED² sensitivity

We use two metrics to evaluate the impact on energy efficiency: the energy-delay (ED) product (Figure D-3) and ED² (Figure D-4). Again, the x-axes show the active EU count, while the number of buffers per EU is shown in the legends.
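For reference, a minimal sketch of how these two metrics could be computed from a run's energy and delay follows; the function and variable names are illustrative, and, as noted in Appendix E, our absolute energy numbers are meaningful only for relative comparisons, so the metrics are normalized against a baseline run.

// Energy-efficiency metrics used in Figures D-3 and D-4 (lower is better).
struct EfficiencyMetrics {
    double ed;    // energy-delay product: E * D
    double ed2;   // energy-delay-squared product: E * D^2
};

EfficiencyMetrics computeMetrics(double energy, double delay) {
    return { energy * delay, energy * delay * delay };
}

// Normalization against a baseline run, as done for the figures in this appendix.
double normalizedED2(double energy, double delay,
                     double baseEnergy, double baseDelay) {
    return computeMetrics(energy, delay).ed2 /
           computeMetrics(baseEnergy, baseDelay).ed2;
}

ED weights energy and delay equally, whereas ED² emphasizes delay, so a configuration that trades a small performance loss for a large power saving looks relatively better under ED than under ED².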

Both ED and ED² display little sensitivity to the instruction buffer count beyond four. Nonetheless, given the power-performance saturation points of the eight-EU configurations, eight instruction buffers per EU are optimal for the design space we evaluated.

Appendix E

Tables of Baseline Values

This appendix provides baseline values used to produce the normalized figures in the dissertation. We omit power and energy values because our power estimation is meaningful only for relative, not absolute, comparisons.

The following table lists the IPCs of Neon used in Figure 4-8.

TABLE E-1. Baseline values for Figure 4-8

Workload       IPC     Workload       IPC
perlbench      1.02    zeusmp         1.35
bzip2          1.06    gromacs        1.49
gcc            0.98    cactusADM      1.83
mcf            0.17    leslie3D       0.90
gobmk          1.30    namd           2.12
hmmer          2.21    dealII         1.58
sjeng          1.43    soplex         1.59
libquantum     0.26    povray         1.83
h264           1.59    calculix       2.32
omnetpp        0.88    GemsFDTD       0.67
astar          0.95    tonto          1.66
xalancbmk      0.44    lbm            0.39
INT HMean      0.61    wrf            1.83
bwaves         0.65    sphinx3        0.39
gamess         2.18    FP HMean       0.97
milc           0.48
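The summary rows in Tables E-1 through E-9 (HMean, GMean, Avg) are, as the names indicate, harmonic, geometric, and arithmetic means over the workload group named in the row label (INT, FP, or the commercial workloads). A minimal sketch of the three aggregations, with illustrative names:

#include <cmath>
#include <vector>

// Aggregations behind the HMean, GMean, and Avg rows in Tables E-1 through E-9.
double harmonicMean(const std::vector<double>& v) {
    double sumInv = 0.0;
    for (double x : v) sumInv += 1.0 / x;         // assumes all values are non-zero
    return static_cast<double>(v.size()) / sumInv;
}

double geometricMean(const std::vector<double>& v) {
    double sumLog = 0.0;
    for (double x : v) sumLog += std::log(x);     // assumes all values are positive
    return std::exp(sumLog / static_cast<double>(v.size()));
}

double arithmeticMean(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

The harmonic mean is the conventional aggregate for rates such as IPC, which is presumably why the IPC and misprediction tables report HMean rows while the cycle-count tables report GMean.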

The following table lists the IPCs of the baseline out-of-order (OoO) model used in Figure 5-2.

TABLE E-2. Baseline values for Figure 5-2

Workload       IPC     Workload       IPC
perlbench      0.61    bwaves         0.39
bzip2          1.73    gamess         2.04
gcc            1.04    milc           0.17
mcf            0.18    zeusmp         0.93
gobmk          1.22    gromacs        1.12
hmmer          2.16    cactusADM      1.27
sjeng          1.31    leslie3D       0.97
libquantum     0.19    namd           2.01
h264           0.98    dealII         1.32
omnetpp        1.18    soplex         1.00
astar          1.06    povray         1.53
xalancbmk      0.67    calculix       1.85
INT HMean      0.59    GemsFDTD       0.51
apache         0.38    tonto          1.52
jbb            0.90    lbm            0.33
oltp           0.73    wrf            1.64
zeus           0.38    sphinx3        0.37
Com HMean      0.52    FP HMean       0.68

The following table lists the run-time in cycles of the baseline OoO model used in Figure 5-7.

TABLE E-3. Baseline values for Figure 5-7

Workload       Cycles (Billion)    Workload       Cycles (Billion)
perlbench      0.16                bwaves         0.25
bzip2          0.06                gamess         0.05
gcc            0.09                milc           0.56
mcf            0.55                zeusmp         0.11
gobmk          0.08                gromacs        0.09
hmmer          0.05                cactusADM      0.08
sjeng          0.07                leslie3D       0.10
libquantum     0.52                namd           0.05
h264           0.10                dealII         0.07
omnetpp        0.08                soplex         0.10
astar          0.09                povray         0.06
xalancbmk      0.15                calculix       0.05
INT GMean      0.12                GemsFDTD       0.19
apache         0.25                tonto          0.06
jbb            0.11                lbm            0.30
oltp           0.13                wrf            0.06
zeus           0.26                sphinx3        0.26
Com GMean      0.18                FP GMean       0.11

The following table lists the L1-I access count of BAR1 used in Figure 5-11.

TABLE E-4. Baseline values for Figure 5-11

Workload       Accesses (Million)    Workload       Accesses (Million)
perlbench      30.38                 bwaves         6.46
bzip2          2.47                  gamess         11.30
gcc            32.07                 milc           7.43
mcf            21.54                 zeusmp         7.31
gobmk          45.01                 gromacs        15.16
hmmer          6.70                  cactusADM      6.34
sjeng          27.19                 leslie3D       53.03
libquantum     0.01                  namd           8.81
h264           9.09                  dealII         35.33
omnetpp        35.68                 soplex         29.72
astar          24.66                 povray         31.82
xalancbmk      20.09                 calculix       8.54
INT Avg        21.24                 GemsFDTD       6.77
apache         65.93                 tonto          23.77
jbb            23.07                 lbm            6.75
oltp           55.51                 wrf            9.44
zeus           57.85                 sphinx3        54.88
Com Avg        50.59                 FP Avg         18.99

The following table lists the IPCs of COBRo8 used in Figure 5-21.

TABLE E-5. Baseline values for Figure 5-21

Workload       IPC     Workload       IPC
perlbench      0.63    bwaves         0.49
bzip2          1.42    gamess         2.29
gcc            1.03    milc           0.22
mcf            0.19    zeusmp         1.11
gobmk          1.31    gromacs        1.23
hmmer          2.54    cactusADM      1.68
sjeng          1.38    leslie3D       0.94
libquantum     0.21    namd           2.03
h264           1.05    dealII         1.46
omnetpp        1.28    soplex         1.02
astar          1.07    povray         1.57
xalancbmk      0.71    calculix       2.41
INT HMean      0.62    GemsFDTD       0.65
apache         0.40    tonto          1.58
jbb            0.95    lbm            0.44
oltp           0.76    wrf            1.91
zeus           0.40    sphinx3        0.40
Com HMean      0.54    FP HMean       0.80

The following table lists the run-time in cycles of the baseline OoO model used in Figure 6-2.

TABLE E-6. Baseline values for Figure 6-2

Workload       Cycles (Billion)    Workload       Cycles (Billion)
perlbench      0.15                bwaves         0.23
bzip2          0.05                gamess         0.05
gcc            0.09                milc           0.50
mcf            0.53                zeusmp         0.10
gobmk          0.08                gromacs        0.08
hmmer          0.04                cactusADM      0.07
sjeng          0.08                leslie3D       0.10
libquantum     0.46                namd           0.05
h264           0.10                dealII         0.07
omnetpp        0.08                soplex         0.09
astar          0.10                povray         0.06
xalancbmk      0.13                calculix       0.05
INT GMean      0.11                GemsFDTD       0.17
apache         0.24                tonto          0.07
jbb            0.10                lbm            0.27
oltp           0.13                wrf            0.06
zeus           0.24                sphinx3        0.25
Com GMean      0.17                FP GMean       0.10

The following table lists the IPCs of the baseline OoO model used in Figure 6-8.

TABLE E-7. Baseline values for Figure 6-8

Workload       IPC     Workload       IPC
perlbench      0.65    bwaves         0.43
bzip2          1.81    gamess         2.09
gcc            1.04    milc           0.20
mcf            0.19    zeusmp         1.02
gobmk          1.25    gromacs        1.19
hmmer          2.24    cactusADM      1.31
sjeng          1.28    leslie3D       1.00
libquantum     0.21    namd           2.09
h264           0.99    dealII         1.40
omnetpp        1.23    soplex         1.07
astar          1.03    povray         1.61
xalancbmk      0.74    calculix       1.87
INT HMean      0.62    GemsFDTD       0.57
apache         0.41    tonto          1.49
jbb            0.94    lbm            0.36
oltp           0.77    wrf            1.67
zeus           0.40    sphinx3        0.40
Com HMean      0.55    FP HMean       0.74

The following table lists the IPCs of COBRi8 used in Figure 6-13.

TABLE E-8. Baseline values for Figure 6-13

Workload       IPC     Workload       IPC
perlbench      0.68    bwaves         0.61
bzip2          1.93    gamess         2.53
gcc            1.21    milc           0.35
mcf            0.21    zeusmp         1.24
gobmk          1.42    gromacs        1.34
hmmer          2.81    cactusADM      1.71
sjeng          1.54    leslie3D       1.04
libquantum     0.41    namd           2.30
h264           1.17    dealII         1.61
omnetpp        1.41    soplex         1.17
astar          1.21    povray         1.74
xalancbmk      0.98    calculix       2.55
INT HMean      0.80    GemsFDTD       0.77
apache         0.42    tonto          1.74
jbb            1.03    lbm            0.48
oltp           0.80    wrf            2.21
zeus           0.41    sphinx3        0.46
Com HMean      0.57    FP HMean       0.98

Finally, the following table lists the TAGE branch misprediction rates of the baseline OoO model.

TABLE E-9. TAGE branch misprediction rate

Workload       Misprediction (%)    Workload       Misprediction (%)
perlbench      1.43                 bwaves         7.26
bzip2          1.16                 gamess         3.38
gcc            2.67                 milc           0.04
mcf            2.13                 zeusmp         0.72
gobmk          5.31                 gromacs        3.27
hmmer          1.95                 cactusADM      2.37
sjeng          4.46                 leslie3D       3.38
libquantum     0.001                namd           3.50
h264           1.40                 dealII         2.53
omnetpp        1.82                 soplex         1.53
astar          6.31                 povray         1.55
xalancbmk      1.44                 calculix       2.16
INT HMean      0.01                 GemsFDTD       1.20
apache         2.74                 tonto          3.74
jbb            3.06                 lbm            0.73
oltp           4.15                 wrf            1.38
zeus           2.04                 sphinx3        5.16
Com HMean      2.81                 FP HMean       0.54