POWER CONSTRAINED PERFORMANCE OPTIMIZATION IN CHIP MULTI-PROCESSORS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of

Philosophy in the Graduate School of the Ohio State University

By

Kai Ma, B.S., M.S.

Graduate Program in Electrical and Computer Engineering

The Ohio State University

2013

Dissertation Committee:

Prof. Xiaorui Wang, Advisor

Prof. Füsun Özgüner

Prof. Kevin M. Passino

Prof. Ümit V. Çatalyürek

© Copyright by

Kai Ma

2013

ABSTRACT

With continued technology scaling in the semiconductor industry, both the power density and the power consumption of processors keep increasing. Compared with the traditional approach of raising the clock frequency, integrating more cores on a processor chip offers the opportunity to exploit inter-thread parallelism with better energy efficiency. Processor design has therefore entered the chip multi-processor era. However, the peak power consumption (i.e., power budget or power cap) of a processor is still constrained by the cooling capacity, by power delivery limitations, or by limits specified by users for different management purposes. Accordingly, it is important to optimize performance under power constraints (i.e., power capping). Important as it is, power capping is also challenging. Fundamentally, the performance/power relationship of an application is unknown a priori due to runtime variations, so it is difficult to choose the optimal adjustment from a large space of possible adjustments.

In this document, we investigate several aspects of power capping: considering more components (e.g., the cache) in addition to the traditional core domain, using new knobs (e.g., power gating), managing emerging platforms (e.g., CPU-GPU hybrid systems), and exploiting new cooling technology (e.g., thermoelectric cooling).

First, we explore the opportunity to coordinate the cache and the cores in a CMP (i.e., chip multi-processor). Second, we investigate a scalable power capping algorithm that can leverage the inter-thread dependency of multi-threaded applications for optimized performance. Third, we integrate dynamic voltage and frequency scaling with power gating for power capping while also balancing core-level service lifetime. Fourth, we develop an energy conservation algorithm for CPU-GPU hybrid systems. Fifth, we examine the co-optimization between computational power and the cooling power of new cooling devices. Throughout this document, we focus on the power capping problem but also discuss related energy conservation and thermal issues.

This document is dedicated to my wonderful family.

ACKNOWLEDGMENTS

Without the help of the following people, I would not have been able to complete my dissertation. My heartfelt thanks to:

Dr. Xiaorui Wang, for his guidance. I could not have asked for a better mentor. Without his help, I would not have had the opportunity to change my specialization to

Computer Architecture, nor would I have enjoyed the level of success I have achieved in this area of research.

Dr. Yefu Wang, for his help with the feedback-control-based power control concept that ultimately developed into our Temperature-Constrained Power Control paper.

Dr. Ming Chen, for his help with the writing advice that ultimately developed into our Scalable Power Control paper.

Xue Li, Wei Chen, and Chi Zhang, for their contributions to the GreenGPU project.

VITA

1981 .............. Born in Changchun, Jilin, China

2004 .............. B.S. Information Engineering, Zhejiang University, Hangzhou, Zhejiang, China

2007 .............. M.S. Electrical Engineering, Tongji University, Shanghai, China

2008-2011 ........ Graduate Research Associate, The University of Tennessee, Knoxville, Knoxville, TN, USA

2011-Present ..... Graduate Research Associate, The Ohio State University, Columbus, OH, USA

PUBLICATIONS

Yefu Wang, Kai Ma, and Xiaorui Wang, Temperature-Constrained Power Control for Chip Multiprocessors with Online Model Estimation, the 36th International Symposium on Computer Architecture, June 2009, Austin, Texas, USA

Xiaorui Wang, Kai Ma, and Yefu Wang, Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors, the 39th International Conference on Parallel Processing, September 2010, San Diego, California, USA

Kai Ma, Xue Li, Ming Chen, and Xiaorui Wang, Scalable Power Control for Many-Core Architectures Running Multi-threaded Applications, the 38th International Symposium on Computer Architecture, June 2011, San Jose, California, USA

Kai Ma, Xiaorui Wang, and Yefu Wang, DPPC: Dynamic Power Partitioning and Capping in Chip Multiprocessors, the 29th International Conference on Computer Design, October 2011, Amherst, Massachusetts, USA

Kai Ma, Xue Li, Wei Chen, Chi Zhang, and Xiaorui Wang, GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures, the 41st International Conference on Parallel Processing, September 10-13, 2012, Pittsburgh, PA, USA

Kai Ma and Xiaorui Wang, PGCapping: Exploiting Power Gating for Power Capping and Core Lifetime Balancing in CMPs, the 21st International Conference on Parallel Architectures and Compilation Techniques, September 19-23, 2012, Minneapolis, MN, USA

Xiaorui Wang, Kai Ma, and Yefu Wang, Cache Latency Control for Application Fairness or Differentiation in Power-Constrained Chip Multiprocessors, IEEE Transactions on Computers, 61(12): 1-15, December 2012

Xiaorui Wang, Kai Ma, and Yefu Wang, Adaptive Power Control with Online Model Estimation for Chip Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 22(10): 1681-1696, October 2011

Kai Ma, Xiaorui Wang, and Yefu Wang, DPPC: Dynamic Power Partitioning and Control for Improved Chip Multiprocessor Performance, IEEE Transactions on Computers, 2013 (accepted)

FIELDS OF STUDY

Major Field: Electrical and Computer Engineering

Specialization: Computer Systems and Architecture

TABLE OF CONTENTS

Abstract

Dedication

Acknowledgments

Vita

List of Figures

1 Introduction and Background
  1.1 Power Wall
  1.2 Chip Multi-processors
  1.3 Power Capping
  1.4 Contributions

2 Scalable Many-Core Power Control for Multi-threaded Applications
  2.1 Introduction
  2.2 Background
  2.3 System Architecture
  2.4 Chip-level Power Control
  2.5 Dynamic Aggregated Frequency Partitioning
    2.5.1 Chip-level Partitioning
    2.5.2 Group-level Partitioning
  2.6 Core-level Power Estimation on Physical Testbed
  2.7 Implementation
    2.7.1 Testbed
    2.7.2 Simulation Environment
    2.7.3 Discussion on Hardware Implementation
  2.8 Evaluation
    2.8.1 Baselines
    2.8.2 Estimation Accuracy
    2.8.3 Testbed Results
    2.8.4 Simulation Results
    2.8.5 Discussion on Algorithm Complexity and Scalability
  2.9 Conclusion

3 Power Gating for Power Capping and Core Lifetime Balancing
  3.1 Introduction
  3.2 Background
  3.3 System Design
    3.3.1 Design of PCPG Management Module
    3.3.2 Design of DVFS Management Module
    3.3.3 Lifetime Balancing
  3.4 Implementation
    3.4.1 Power Capping Evaluation Testbed
    3.4.2 Lifetime Balancing Evaluation Simulator
  3.5 Evaluation
    3.5.1 Baselines
    3.5.2 Power Control Accuracy
    3.5.3 Application Performance
    3.5.4 Lifetime Balancing
  3.6 Conclusion

4 Energy Efficiency in GPU-CPU Heterogeneous Architectures
  4.1 Introduction
  4.2 Background
  4.3 Motivation
    4.3.1 A Case Study on Frequency Scaling for GPU Cores and Memory
    4.3.2 A Case Study on Workload Division between GPU and CPU
  4.4 System Design of GreenGPU
  4.5 GreenGPU Algorithms
    4.5.1 Dynamic Frequency Scaling for GPU Cores and Memory
    4.5.2 Workload Division
  4.6 Implementation
  4.7 Experiments
    4.7.1 Frequency Scaling for GPU Cores and Memory
    4.7.2 Workload Division between GPU and CPU
    4.7.3 GreenGPU as a Holistic Solution
  4.8 Conclusion

5 Integrating Thermoelectric Coolers and Fans for Energy Efficiency
  5.1 Introduction
  5.2 Background
  5.3 System Design
    5.3.1 Thermal Model
    5.3.2 Power and Performance Model
    5.3.3 Problem Formulation
    5.3.4 Heuristic Solution
    5.3.5 Hardware Cost
    5.3.6 Per-core DVFS Assumption
  5.4 Simulation Setup
  5.5 Experiments
    5.5.1 Baseline Results
    5.5.2 Studied Policies
    5.5.3 Integrating TEC with Fan
    5.5.4 Cooling Performance
    5.5.5 System Performance
  5.6 Conclusions

6 Conclusions

Bibliography

LIST OF FIGURES


2.1 Three-layer power control architecture for a 16-core chip multiprocessor. Cores running the same multi-threaded applications are grouped together. Idle cores (e.g., C9) are transitioned into a low power mode.

2.2 Power estimation accuracy experiments on a 12-core hardware testbed.

2.3 Power control accuracy comparison. In (a)-(c), the frequencies are relative to the peak of a selected core. In (d), the power values are relative to the peak power in each test case.

2.4 Group-level (thread criticality-aware) frequency quota (i.e., sum of normalized DVFS levels) allocation traces of FreqPar and the baselines.

2.5 Chip-level (power efficiency-aware) frequency quota (i.e., sum of normalized DVFS levels) allocation traces and power efficiency of FreqPar and the baselines.

2.6 Overall performance comparison between FreqPar and the baselines on a 12-core hardware testbed.

2.7 Power and performance comparison in simulations under different numbers of cores.

2.8 Execution time experiments show that FreqPar is more scalable than Steepest Drop.

3.1 Decoupled design uses the power budget, chip power measurement, per-core utilization, temperature, and lifetime as inputs. It computes the next-step power mode (e.g., on/off, DVFS levels, overclocking state) of each core to cap the entire chip power, boost performance, and balance the lifetime.

3.2 Quicksearch algorithm flowchart. Only the power-higher-than-budget case is presented for concision.

3.3 Decoupled solution PGCapping can precisely enforce the power budget by using PCPG, DVFS, and overclocking in both the high and low power budget cases. We calculate the average power with a 2s window as Pavg to clearly present the general trend. Hardware testbed results.

3.4 Decoupled solution PGCapping can reserve power headroom by using PCPG and accelerate cores running useful workloads by using overclocking. Hardware testbed results. The frequencies are normalized to the peak frequency of one core (i.e., relative freq). The Freq/# is calculated by dividing the total aggregated relative frequency of the entire chip by the number of turned-on cores, which can be interpreted as the high-level computing capability that each turned-on core can offer (higher is preferable). We also calculate the average Freq/# with a 2s window as Fa/#.

3.5 PGCapping achieves very close results to Balanced (a best-effort per-core-DVFS-based lifetime balancing). PGCapping outperforms the Random and Round-robin baselines. Simulation results.

4.1 Normalized execution time is the execution time of a workload normalized to its execution time at the peak frequency. Relative energy is the energy normalized to the energy consumed at the peak frequency. There are opportunities to save energy with negligible performance loss by throttling under-utilized components.

4.2 Energy consumption for different workload division ratios. The cooperation of the CPU and GPU parts can be more energy efficient than the GPU part taking all the work exclusively.

4.3 GreenGPU features a two-tier design to reduce the energy consumption of CPU-GPU heterogeneous platforms. The higher tier (i.e., the workload division tier) dynamically partitions the incoming workloads to the CPU and GPU parts. The dashed lines connect the components of the workload division part. The lower tier (i.e., the frequency scaling tier) takes the utilization of processing elements (GPU cores, GPU memory, and CPUs) to decide their proper frequency levels to reduce energy consumption. The dotted lines connect the components of the frequency scaling part.

4.4 Hardware testbed used in our experiments, which includes a Dell Optiplex 580 desktop with an Nvidia GeForce GPU and an AMD Phenom II CPU, two power meters, and one separate ATX power supply to power the GPU card. Meter1 measures the power of the CPU side, while Meter2 measures the power of the GPU side.

4.5 Frequency scaling algorithm adjusts the frequencies of GPU cores and memory based on their respective utilizations to save energy without increasing execution time.

4.6 Energy saving compared with best-performance for different workloads.

4.7 Workload division algorithm adjusts the workload allocation between the CPU and GPU parts to minimize idling energy on either side caused by waiting for the other (slower) side.

4.8 Energy and workload division ratio trace with respect to the iterations. GreenGPU outperforms workload division only and frequency scaling only on energy savings.

5.1 The side view of the target chip packaging and the TEC cooling effect. TECs are embedded between the heat spreader and the processor chip in the thermal interface material (TIM) layer. By applying current to the TEC, heat can be pumped from one side of this film device to the other. iTECool coordinates the fan and multiple TECs to improve the overall cooling efficiency. In addition, iTECool also coordinates the DVFS level of each core and the cooling subsystem (TECs and fan) to reduce the energy consumption of the entire system (i.e., processor, fan, and TECs).

5.2 Multi-step down-hill greedy algorithm (Co-op) flow chart. Based on the thermal, power, and performance models (Equations (10)-(17)), Co-op estimates the next-step energy if a certain adjustment is used; it then selects the adjustment that has the smallest energy consumption within the temperature constraint. If the current temperature is lower than the threshold, Co-op compares the energy of turning off a TEC and raising the DVFS level of one core; if the current temperature is higher than the threshold, Co-op compares the energy of turning on a TEC and lowering the DVFS level of one core. Co-op moves forward multiple steps along the smallest-energy adjustment direction until the temperature constraint is achieved.

5.3 Simulated processor floor plan. The chip floor plan is scaled based on the SCC 48-core chip [49]. A core tile is half the size of the dual-core tile on SCC. The component placement and relative size are scaled with the Alpha 21264. The router size is the same as the SCC on-chip router. We estimate the voltage regulator size based on a 0.5W delivered power/mm2 measurement on a prototype on-chip regulator [62]. Chip floor plan: 10.4mm×14.4mm, a 4×4 core tile array. Core tile floor plan: 2.6mm×3.6mm.

5.4 TEC+Fan vs. Dynamic-fan: temperature and cooling power comparison between Dynamic-fan and TEC+fan. Using the 1st (highest) fan speed level can achieve much better cooling than using the 2nd fan speed level. However, using TEC and the 2nd fan speed can achieve a cooling effect very close to that of the 1st fan speed level. In addition, the combined cooling power consumption of using TEC and the 2nd fan speed is much lower than running the fan at the 1st speed level. That is due to the cubic dependence of fan power consumption on fan speed [4].

5.5 Cooling performance comparison. We set the highest temperature of the baseline cases as Tth in each experiment. Co-op consistently offers the lowest maximum temperature in the studied cases. Co-op also has the smallest Tth violation.

5.6 Execution performance comparison. Due to the cubic dynamic power reduction of DVFS at a linear performance degradation cost, DVFS+fan has the lowest energy usage. However, it has the longest delay. Since Co-op gives priority to performance, it reduces power on the condition of not sacrificing too much performance. Therefore, Co-op achieves the lowest EDP.

5.7 Total relative cycles comparison. This metric shows the slow-down introduced by applying DVFS. DVFS introduces 26% and 27% slow-down on DVFS+fan and DVFS+TEC, respectively. However, from Figure 5.6, the delays of DVFS+fan and DVFS+TEC are more than 50%. The performance gap can be introduced by the inter-thread correlation. Throttling one core without considering the other cores will make one thread become the slowest thread, increasing the total execution time.

5.8 Number of active cores sensitivity. We deploy 4 threads running on our simulator.

5.9 Temperature threshold sensitivity. We set Tth as running 16 threads at the second fan level.

CHAPTER 1

INTRODUCTION AND BACKGROUND

The broad focus of this document is the performance optimization of computer systems with the power consumption as a primary constraint. This chapter provides an overview of the problem that we target.

1.1 Power Wall

For half a century, Moore's Law [106] has driven the technology scaling in the semiconductor industry. The number of components in integrated circuits has doubled every eighteen months. In theory, if the supply voltage of CMOS scaled with lithographic dimensions, the process scaling would have introduced faster and lower energy gates.

The switching energy reduction can match the increased energy from having more gates and having them switch faster, so the power density (i.e., power per unit area) stays constant. This analysis has been summarized as classic Dennard Scaling [28]. However, in reality, the supply voltage has practically stopped declining, mainly for two reasons: 1) the gate switching delay does not decrease at the same rate as the geometric feature size decreases, which means we cannot lower the voltage at the same rate as the feature size shrinks; 2) lowering the supply voltage, combined with lowering the feature size, reduces the circuit's robustness to process variations (i.e.,

parameter deviation from the designed nominal value). Therefore, both the absolute power consumption and the power density of processors have kept increasing.

In embedded computing and other battery-powered devices, battery capacity advancement lags far behind the exponential scaling pace of the semiconductor industry. Such a lag makes battery lifetime an even more important design constraint; the battery lifetime limits the power consumption of embedded systems. In desktop environments, the cooling capacity determines the power dissipation of the system. In data center servers, the huge electricity bill required by the servers and the related cooling devices is one of the key concerns for data center service providers. Power and related issues have become the key limiter for computer system advancement across the entire spectrum, a situation summarized as the Power Wall [83]. Therefore, our study takes power and related issues as the primary constraint in system performance optimization.

1.2 Chip Multi-processors

The ever-growing demand for higher computing throughput requires processors to increase their operating frequency and the number of working units. If we keep increasing the operating frequency of processors, we need to maintain a high supply voltage to ensure reliable transistor switching, which increases the power density. Since the cooling capacity of computer systems has already been limited by cost, processor vendors have universally shifted technology advancement from increasing the operating frequency to integrating more cores on one chip, because CMPs (i.e., chip multi-processors) offer the possibility to exploit inter-thread parallelism and allow increasing throughput without increasing the power density. Therefore, processor builders started to pack more and more cores on one processor chip. Computer

systems entered the CMP era [34]. To push the CMP idea even further, some hardware (e.g., Nvidia GPUs) integrates hundreds of simple cores on one chip to maximize throughput.

1.3 Power Capping

However, increasing the throughput of CMPs still requires consuming more power.

The peak power consumption is still constrained by the cooling capacity, by power delivery limitations, or by limits specified by users for different management purposes. A key concept related to the cooling limit is the thermal design power (TDP) [82]. TDP is a key parameter in packaging design: as long as the power dissipation of the entire chip is under the TDP, the packaging design can guarantee that the chip will not overheat in most cases. Therefore, we normally assume that the TDP is the power upper bound in terms of thermal issues. The power delivery limit is constrained primarily by the power pins on the processor [113]. Due to the fixed area of a processor, off-chip communication and power delivery compete for pin resources. Therefore, the limited number of power pins is expected to form an even tighter power budget constraint than the cooling capacity in the near future [113]. In addition to cooling and power delivery limitations, users might want to assign a power budget to a computer system at runtime to enable server oversubscription (i.e., safely deploying more servers within a fixed power/cooling infrastructure in a data center environment) or to enforce a power budget cut (i.e., assigning a very tight power budget to the system to address a temporary/partial cooling failure). In summary, the power budget of a computer system comes from the cooling limit, the power delivery limit, or user specification.

Due to these limitations, it is important to study performance optimization under power constraints (i.e., power capping).

1.4 Contributions

This document presents several novel solutions to power/thermal-constrained performance optimization problems.

1. A scalable power control method that dynamically tunes the frequency allocation among cores in a many-core processor to maintain a fixed power budget as well as to improve performance.

2. A fast power control technique that integrates per-core DVFS and power gating for improved performance and service lifetime.

3. A practical power management system that coordinates the CPU and GPU in a high-performance server to improve the energy efficiency of the entire system.

4. An intelligent online management system that adjusts the thermoelectric coolers, the cooling fan(s), and the DVFS level of each core on a CMP to keep the temperature under an assigned threshold as well as to conserve energy.

CHAPTER 2

SCALABLE MANY-CORE POWER CONTROL FOR

MULTI-THREADED APPLICATIONS

2.1 Introduction

Power dissipation has become a first-class constraint in current microprocessor design. As the gap between peak and average power widens with the rapidly increasing level of core integration, it is important to control the peak power of a many-core microprocessor to allow improved reliability and reduced costs in chip cooling and packaging. Therefore, compared with the extensively studied power minimization problem, an equally, if not more, important problem is to precisely control the peak power consumption of a many-core microprocessor to stay below a desired budget level while optimizing its performance.

Scalability is the first key challenge in controlling the power consumption of a many-core microprocessor. While various power control solutions have been proposed for multi-core microprocessors (e.g., [54, 84, 124]), the majority of current solutions relies on centralized decision making and thus cannot be applied directly to many-core systems. For example, the MaxBIPS policy [54] uses an exhaustive search to find a combination of DVFS (Dynamic Voltage and Frequency Scaling) levels for all the cores of a microprocessor. The search is predicted to result in the best application performance while maintaining the power of the chip below the budget. While this

solution works effectively for microprocessors with only a few cores, MaxBIPS does not scale well because the number of possible combinations increases exponentially with the number of cores. Therefore, highly scalable approaches need to be developed for many-core architectures.

The requirement to host multi-threaded applications is the second challenge for many-core power control. Although a few recent studies [127, 105, 88] present scalable control algorithms for many-core architectures based on per-core DVFS, they do not consider multi-threaded parallel applications and assume that the workload of every core is independent. As a result, these solutions may unnecessarily decrease the DVFS levels of the CPU cores running the critical threads in barrier-based multi-threaded applications. The lack of knowledge of thread criticality can exacerbate the load imbalance in multi-core microprocessors and thus lead to unnecessarily long application execution times and undesired barrier stalls. This issue is particularly important for many-core architectures whose primary workloads are expected to be multi-threaded applications. Furthermore, many-core systems are likely to simultaneously host a mixed group of single-threaded and multi-threaded applications, due to the increasing trend of server consolidation, to fully utilize the core resources [80, 12].

Therefore, a power control algorithm must be able to handle such realistic workload combinations and utilize thread criticality to efficiently allocate power among the cores that are running different applications.

Another major challenge in multi-core or many-core power control is accurate power monitoring [99]. Although the power consumption of a microprocessor can be measured by sensing the current fed into the chip [125], direct power measurement of a single core on a multi-core or many-core die is not yet available. On-die current sensors have been proposed, but have rarely been used in production due to problems such as area and performance overhead and calibration drift introduced by process

variations [18]. It is possible to estimate the core power at runtime by counting the component utilizations (e.g., cache accesses) and computing power based on a per-component power model. However, such direct computation of core and structure power at runtime is complex due to the large number of performance statistics required [125]. Since many-core systems are expected to have many simple cores [12], it may not be desirable to adopt an approach that requires much extra hardware and statistics collection. Recently, Kansal et al. [58] have shown that the CPU power consumption of each Virtual Machine (VM) on a server can be estimated by adaptively weighting only one metric (CPU utilization) of each VM. However, they did not explicitly consider the impact of DVFS on their model despite the fact that the power consumption is different under different DVFS levels even for the same application.

We propose to extend their work to estimate the power consumption of each core in a DVFS environment by taking both DVFS level and utilization into consideration. As a result, many-core power control can be evaluated on a real hardware platform instead of just by simulations as in previous work [127, 105].

In this chapter, we propose a novel and highly scalable power control solution for many-core microprocessors that is specifically designed to handle realistic workload combinations. Our control solution features a three-layer design. First, we adopt control theory to precisely control the power of the whole chip to its chip-level budget, with theoretically guaranteed accuracy and stability, by adjusting the aggregated frequency quota of all the cores on the chip. In a DVFS-enabled system, aggregated frequency is defined as the summation of the DVFS levels of all the cores normalized to the peak DVFS level of one core. Second, we dynamically group cores running the same applications and then partition the aggregated chip-level frequency quota derived from the chip-level power controller among different groups for optimized overall microprocessor performance. Finally, we partition the group-level aggregated

frequency quota among the cores in each group based on measured thread criticality for a shorter application completion time. As a result, our solution can optimize the processor performance while precisely limiting the chip-level power consumption below the desired budget. Specifically, this chapter makes the following major contributions:

• We propose a highly scalable power control solution for many-core architectures running multi-threaded applications. Our solution partitions the limited chip-level power budget among different applications and cores based on measured application performance and thread criticality.

• We adopt feedback control theory as a theoretical foundation to control the power consumption of a many-core chip to its desired power budget. This rigorous design methodology is in sharp contrast to heuristic-based solutions that rely on extensive manual tuning.

• Since the power consumption of a core cannot be directly measured in real multi-core microprocessors, we extend the technique of estimating the power consumption of a VM on a physical server to estimate the power consumption of a CPU core and validate the estimation model on a hardware testbed.

• We implement our control solution on a 12-core AMD processor and present empirical results to demonstrate that our solution achieves better application performance within a given power budget than two state-of-the-art solutions. Our extensive simulation results with 32, 64, and 128 cores, as well as overhead analysis for up to 4,096 cores, demonstrate the scalability of our solution in many-core architectures.

The rest of this chapter is organized as follows. Section 2.3 discusses the system architecture of our control solution. Section 2.4 presents the chip-level power controller design. Section 2.5 describes dynamic aggregated frequency quota partitioning at the chip and group levels. Section 2.6 presents the per-core power estimation technique.

Section 2.7 introduces our hardware testbed, simulation setups, and the implementation details of our solution. Section 2.8 presents our evaluation results. Section 2.2 discusses the related work and Section 2.9 concludes this chapter.

2.2 Background

Power dissipation has been one of the major design concerns for computing systems. Much prior work has focused on minimizing the power consumption within a specified performance guarantee. For example, Li et al. [70] propose a solution called thrifty barrier that places the faster cores into a lower power mode at the barriers (i.e., join points) while waiting for the slower cores so that power can be saved. Liu et al. [74] use per-core DVFS to slow down the faster cores, such that both the idle time due to waiting and the power consumption are reduced. Cai et al. [17] extend [74] by adding meeting points within the execution of the parallel loops and solve the same problem at a finer granularity. However, none of these solutions can provide any explicit guarantee for the power consumption to stay below a desired budget, though the performance is guaranteed to some extent. Our work is different in that we focus on a different, but equally important, problem, i.e., power capping to avoid power overload or thermal violations and prevent over-provisioning of cooling, packaging, and power supply capacities at processor design time. Some work has been performed to manage peak power or temperature for CMPs.

Intel Foxton technology [82] has successfully controlled the power and temperature of a microprocessor using chip-wide DVFS. Isci et al. [54] propose a closed-loop algorithm called Priority and a prediction-based algorithm called MaxBIPS to limit the power of a CMP. Wang et al. [124] also apply advanced control theory to develop a

power control algorithm for improved CMP performance. However, the application of these solutions on many-core systems is prohibited either by the exponential explosion of the number of possible global power management states in many-core architectures, e.g., [82, 54], or by the high control delay and computation overhead due to centralized decision making, e.g., [124]. As a result, none of them are scalable to the large number of cores in many-core architectures.

A recent study by Winter et al. [127] presents a global power management algorithm called Steepest Drop for many-core systems with a light overhead. Sartori et al. [105] discuss using a hierarchical structure to cap the power of many-core systems. Another related piece of work by Mishra et al. [87, 88] uses absolute BIPS to allocate the chip power budget to each power island and performs per-island power control. However, these solutions assume that the workloads on all the cores are independent. Therefore, they may overlook the coupling of workloads among the cores and result in degraded system performance. In contrast, our highly scalable solution can dynamically shift the power budget among the groups of cores that host different applications based on power efficiency, and then further among all the cores in the same group that host the coupled threads from the same application based on thread criticality.

2.3 System Architecture

In this section, we present a high-level description of our three-layer power control solution. As shown in Figure 2.1, in the first layer, the chip-level power controller controls the power consumption of the whole chip to the chip power budget by adjusting the aggregated frequency quota (i.e., summation of normalized DVFS levels) of all the cores. The second layer, i.e., the chip-level frequency quota partitioning layer, partitions the chip-level aggregated frequency quota among the groups of cores, which

host different applications proportionally to a metric called power efficiency (defined in Section 2.5.1). The third layer, i.e., the group-level frequency partitioning layer, further partitions the group aggregated frequency quota among all the cores in each group, which host coupled threads of the same application, based on thread criticality (defined in Section 2.5.2). The aggregated frequency quota is first partitioned among different applications (i.e., groups of cores) and then partitioned among coupled threads (i.e., individual cores) to achieve optimized performance. As a result, if the aggregated frequency quota of every core is enforced, the power of the entire chip can be controlled to stay within the desired power budget. In this chapter, we adopt

DVFS to enforce the frequency quota of each core, but our solution can also work with other frequency scaling techniques such as clock modulation. We assume that the frequency of each core can be adjusted individually in future many-core systems based on various industry practices and research studies [127, 105]. For example, IBM and AMD have implemented per-core DVFS on commercial massive multi-core microprocessors (POWER7 8-core and Opteron 12-core systems). Moreover, Intel has implemented per-tile DVFS on its 24-tile many-core experimental chips [48]. In addition, a 167-core computational platform with per-core DVFS support has been implemented recently [119]. Even in the systems without physically implemented per-core DVFS (e.g., multi-power-island chips), Rangan et al. [102] have shown that thread migration on systems with only two power states can be used to approximate the functionality of continuous, per-core DVFS.

As shown in Figure 2.1, the key components in the chip-level power control layer include a power controller and a power monitor. The following steps are invoked at the end of every control period: 1) the power monitor (e.g., an on-board power measurement circuit [125]) measures the power consumption of the chip in the last control period and sends the value to the power controller and 2) the power controller

Figure 2.1: Three-layer power control architecture for a 16-core chip multiprocessor. Cores running the same multi-threaded applications are grouped together. Idle cores (e.g., C9) are transitioned into a low power mode.

computes the new aggregated frequency quota for all the cores of the chip based on the desired power budget and measured power consumption. The aggregated frequency quota is then partitioned to optimize the system performance in the partitioning layers. The key components in the chip-level frequency quota partitioning layer include a single chip-level partitioner and an IPS (Instructions Per Second) counter on each core. In order to effectively partition the power budget, we need to be able to calculate the power efficiency of each core. We adopt IPS/Watt as our power efficiency metric, which has been used by Intel for this purpose [42]. The chip-level frequency quota is partitioned among multiple groups of cores periodically. At the end of each control period, the partitioner collects the grouping information of all the cores based on the

OS scheduler (details are described in Section 2.5.1). Each group of cores hosts all the threads of the same application. If a group consists of only one core, we refer to it as a single-threaded group; otherwise, we refer to it as a multi-threaded group. If a core is idle, we transition it to a low-power mode. For example, Cores 1, 2, 5, and 6 are grouped together since they run four threads of a parallel application based on

the scheduling information from the OS. Core 9 is transitioned into the low-power mode since it is idle. The chip-level partitioner computes the power efficiency based on the

IPS and the estimated power of each core, then calculates the overall power efficiency of each group by summing up the efficiency of each core in the group. The chip-level partitioner partitions the aggregated frequency quota of the entire chip among the groups proportionally to the overall power efficiency of each group. Note that since the power control period at the chip level can be configured shorter than the OS scheduling period, we assume the mapping between the threads and cores does not change within each control period. The same assumption has been made in [127].

The group-level frequency quota partitioning layer includes a group-level partitioner in each group, a criticality counter, and a DVFS modulator in each core. At the end of each control period, the criticality counter on each core monitors the criticality metric (defined in Section 2.5.2) and forwards it to the partitioner. The partitioner receives the allocated group frequency quota from the chip-level partitioner and partitions the frequency quota among all the cores in the group based on the thread criticality of each core. Then, the DVFS modulator of each core changes the DVFS level of the core accordingly.

Because the computation of the controller may change the overall aggregated frequency quota and the recalculation of the chip-level partitioner may change the group aggregated frequency quota, the three layers run sequentially at the end of every control period. Figure 2.1 shows a possible implementation in which the three layers are integrated as firmware on the service processor, similar to IBM POWER7's power control module [125]. We also discuss other implementation possibilities in Section

2.7.3.

2.4 Chip-level Power Control

In this section, we introduce the chip-level power controller that controls the power consumption of the entire chip to the desired power budget by adjusting the aggregated frequency quota (i.e., summation of normalized DVFS levels). A key advantage of the control-theoretic design approach is that it can tolerate a certain degree of modeling errors and adapt to online model variations based on dynamic feedback [36].

Therefore, our solution does not rely on power models that are perfectly accurate, which is in sharp contrast to open-loop solutions that would fail without an accurate model.

We first introduce some notation. Tc is the control period. M is the number of cores on this chip. cp(k) is the power consumption of the entire chip in the kth control period. f(k) is the total aggregated frequency of all the cores on the chip in the kth control period. The dynamic range of f(k) is L × M ≤ f(k) ≤ M, relative to the peak of one core, where L is the lowest available DVFS level normalized to the peak level. We assume that our target system is a homogeneous-core system, which is the dominant configuration of current multi-core and many-core systems [48, 120, 5].

However, extending to heterogeneous-core systems is straightforward by scaling f(k). For example, if we have a more powerful core in the system along with the normal cores, instead of taking the dynamic range of the more powerful core as L to 1 like a normal core, we count it as L to H. Both L and H are derived by scaling the available DVFS levels of the powerful core to the peak DVFS level of a normal core.

Δf(k) = f(k+1) − f(k). Pt is the power budget of the whole chip, which can be determined by the thermal and power supply constraints of the processor or specified by the user at runtime. e(k) is the control error; specifically, e(k) = Pt − cp(k).

The control goal is to direct cp(k) to converge to Pt within a certain number of control periods by adjusting f(k).

System Modeling. We now model the dynamics of the controlled system, namely the relationship between the controlled variable, i.e., cp(k), and the manipulated variable, i.e., f(k). Existing studies by both Raghavendra et al. [101] and Wang et al. [123] have shown that the processor power can be modeled as an approximately linear function of the DVFS level within the limited DVFS adaptation range available in real multi-core processors. In this chapter, the power consumption of a processor is modeled similarly as:

cp(k) = a Δf(k − 1) + cp(k − 1),    (2.4.1)

where a is a generalized parameter that may vary for different chips and applications. a is also the scaling factor that characterizes the impact of a DVFS change on the chip power. In our design, we derive a by dividing the data sheet full power range (from the idle power to the maximum power of the chip [3]) by the dynamic range of f(k). We conducted a stability analysis [36] on our controlled system. The results show that the system remains stable as long as the actual value of a lies between 0 and twice the value used at design time. Since we use the maximum possible a at design time, the variation of a can never exceed this range. The control loop is theoretically guaranteed to converge to the set point for all possible workloads.

Controller Design. Proportional-Integral (PI) control can provide robust control performance despite considerable modeling errors. Based on the system model (2.4.1), we design a PI controller as follows:

f(k) = f(k − 1) + K1 e(k) − K1 K2 e(k − 1).    (2.4.2)

Following the standard pole placement method [36], we can choose our control parameters as K1 = 1/a and K2 = 0, such that the controlled system is stable and has a zero steady-state error. The detailed steps can be found in a standard

control textbook and are skipped due to space limitations. The desired aggregated frequency quota of all the cores on the chip in the kth control period can be computed accordingly as:

f(k) = f(k − 1) + (Pt − cp(k − 1)) / a.    (2.4.3)
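To make the control law concrete, the following C sketch shows one way the per-period update of Equation (2.4.3) could be coded (for example, inside an OS daemon such as the one described in Section 2.7.1). The structure, function names, gain value, and power trace below are hypothetical placeholders for illustration, not the daemon actually used in our experiments.

    #include <stdio.h>

    /* Chip-level power controller state (illustrative sketch).
     * All names and constants here are hypothetical placeholders. */
    typedef struct {
        double budget;   /* Pt: chip power budget (W)                        */
        double a;        /* model gain: watts per unit of aggregated freq    */
        double f;        /* current aggregated frequency quota (sum of
                            normalized DVFS levels, L*M <= f <= M)           */
        double f_min;    /* L * M */
        double f_max;    /* M     */
    } chip_ctrl_t;

    /* One control-period update, following Equation (2.4.3):
     * f(k) = f(k-1) + (Pt - cp(k-1)) / a, clamped to the feasible range. */
    static double ctrl_update(chip_ctrl_t *c, double measured_power)
    {
        double e = c->budget - measured_power;   /* control error e(k) */
        c->f += e / c->a;
        if (c->f < c->f_min) c->f = c->f_min;
        if (c->f > c->f_max) c->f = c->f_max;
        return c->f;                             /* quota passed to the partitioners */
    }

    int main(void)
    {
        /* 12 cores, lowest DVFS level 0.4 of peak, assumed gain a = 5 W per
         * unit of aggregated frequency, 80 W budget; power trace fabricated. */
        chip_ctrl_t c = { .budget = 80.0, .a = 5.0, .f = 12.0,
                          .f_min = 0.4 * 12, .f_max = 12.0 };
        double measured[] = { 95.0, 88.0, 83.0, 80.5 };
        for (int k = 0; k < 4; k++)
            printf("period %d: quota = %.2f\n", k, ctrl_update(&c, measured[k]));
        return 0;
    }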

2.5 Dynamic Aggregated Frequency Partitioning

We first introduce the details of the aggregated frequency (i.e., summation of normalized DVFS levels) partitioning schemes at the chip and group levels.

2.5.1 Chip-level Partitioning

A many-core microprocessor may host multiple applications simultaneously. For example, a virtualized many-core system may host multiple VMs and each VM may host a different application. If the power budget is limited, different power allocations among the applications may lead to different system performance. Achieving high overall performance is one of the most fundamental goals for many-core systems [12]. The goal of chip-level partitioning is to dynamically partition the chip-level aggregated frequency quota computed in the chip power controller (Equation

(2.4.3)) among different applications, such that we can achieve optimized system performance. In this chapter, we use Fair Speedup (FS) as the performance indicator. The FS of a partitioning scheme is defined as the harmonic mean of the per-application speedups with respect to the equal resource share case (i.e., peak frequency for all applications) [24, 10]. The FS achieved by a scheme can be expressed as FS(scheme) = Na / Σ_{i=1}^{Na} (ETappi(scheme) / ETappi(base)), where ETappi(scheme) is the execution time of the ith application under a certain power management scheme, and ETappi(base) is the execution time of running the ith application at the peak frequency level all the

time. Na is the number of applications in the system, i.e., the number of applications that execute together. FS is an indicator of the overall improvement in execution efficiency gained across the applications. It is also a metric of fairness. In the following sections, we first introduce how to group the cores that run the threads of the same application based on the scheduling information in the OS. We then present the aggregated frequency quota partitioning among the groups.
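As a concrete illustration of the FS metric, the short C sketch below computes FS from per-application execution times; the values and the function name are fabricated for the example.

    #include <stdio.h>

    /* Fair Speedup sketch: FS = Na / sum_i (ETappi(scheme) / ETappi(base)).
     * The execution times below are fabricated for illustration. */
    static double fair_speedup(const double *et_scheme, const double *et_base, int na)
    {
        double sum = 0.0;
        for (int i = 0; i < na; i++)
            sum += et_scheme[i] / et_base[i];
        return (double)na / sum;
    }

    int main(void)
    {
        double et_scheme[] = { 12.0, 20.0, 9.0 };   /* seconds under a power cap */
        double et_base[]   = { 10.0, 15.0, 8.0 };   /* seconds at peak frequency */
        printf("FS = %.3f\n", fair_speedup(et_scheme, et_base, 3));
        return 0;
    }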

Core Grouping

In many-core microprocessors, different threads run simultaneously on different cores.

We place the cores that host the threads of the same application into a group. Therefore, the number of groups is equal to the number of applications running on all the cores. The benefit of core grouping is to reduce the coupling of the power demand among different applications.

In this project, we assume that the mapping between the threads and cores does not change within a certain period (i.e., the scheduling period). Since the scheduling interval in operating systems is in tens of milliseconds, if we conduct the power control in a shorter period, this assumption is valid. The same assumption is made in [127, 6, 10, 17, 84]. At the end of each scheduling period, the chip-level frequency partitioner may collect the grouping information from the OS. If we implement the algorithm as a loadable kernel module of the OS, the grouping information can be derived from a system function. If we implement the controller as a piece of hardware on the chip, this information exchange between hardware and software can be achieved by adding special-purpose registers on the chip. If the proposed solution is implemented as firmware running on the service processor on the motherboard, the information exchange between the main processor and the service processor can be achieved via external ports [125].

Aggregated Frequency Partitioning

Before we discuss the chip-level power partitioning, we first introduce some notation.

A many-core microprocessor has N groups of cores, and group i runs application i, where 1 ≤ i ≤ N. IPSi is the average IPS of group i when running the ith application on the many-core microprocessor without any power constraint. IPSi can be derived by conducting application profiling on the desired number of cores at the peak DVFS level and then calculating the average IPS of each core. Note that the profiling is only performed once for each application on the desired number of cores. The OS can send IPSi to the controller via on-chip registers. ipsi(k) is the measured IPS of the ith group. WTi(k) is the estimated power of the ith group. Since each group may consist of multiple cores, ipsi(k) and WTi(k) are the accumulated IPS and power of all the cores in the ith group.

To achieve optimized overall performance, the aggregated frequency quota partitioned among different groups should be proportional to the ratio between the performance and the power consumption (i.e., ipsi(k)/WTi(k)). However, this may lead to the following problem. Some applications intrinsically have a low IPS even without any power constraint. Partitioning power based on IPS is unfair to those applications if they run simultaneously with other applications that have intrinsically high IPSs.

To address this problem, we use the relative IPS, ripsi(k), as the performance metric in this chapter, which is the measured IPS ipsi(k) normalized to IPSi. Specifically, ripsi(k) = ipsi(k)/IPSi, similar to the fairness definition used in [61]. We define

the power efficiency of the ith group, ei(k), as the ratio between ripsi(k) and WTi(k). Specifically, ei(k) = ripsi(k)/WTi(k).

In this chapter, we partition the chip-level aggregated frequency quota among groups proportionally based on the power efficiency of each group to achieve the

Table 2.1: Workload mixes used in testbed and simulation experiments.

1. Physical testbed workload mixes

   Mixes | PARSEC 2.1, SPEC2006                                       | Aggregate Effect
   mix1  | 12-perlbench                                               | all separated applications
   mix2  | 12-Streamcluster                                           | high-barrier parallel workload
   mix3  | 8-swaptions, 4-omnetpp                                     | no-barrier parallel workload
   mix4  | 4-x264, 8-fluidanimate                                     | no-barrier, high-lock workload mix
   mix5  | 4-(blackscholes, bodytrack), 2-(xalancbmk, povray)         | low-barrier and high-barrier mix
   mix6  | 4-(vips, facesim), 1-(libquantum, astar, soplex, dealII)   | random mix

2. Simulation workload mixes

   Mixes | SPLASH-2, SPEC2006                                         | Aggregate Effect
   mix1  | water (nsquared)                                           | all parallel application
   mix2  | dealII                                                     | all separated applications
   mix3  | FFT, Ocean non, LU con, LU non (each occupies 1/4 of cores)| random mix

optimized performance

fgi(k) = (ei(k − 1) / Σ_{j=1}^{N} ej(k − 1)) · f(k),    (2.5.1)

where f(k) is the aggregated frequency quota of the entire chip and fgi(k) is the aggregated frequency allocation for the ith group in the kth control period. In systems that need to support application priority, we can assign different weights to the co-scheduled applications when we calculate fgi(k).
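A minimal C sketch of this chip-level partitioning step (Equation (2.5.1), using the power efficiency definition from above) is shown below; the array names and the sample values in main() are illustrative placeholders only.

    #include <stdio.h>

    /* Chip-level partitioning sketch (Equation (2.5.1)).
     * rips[i] and wt[i] are the relative IPS and estimated power of group i
     * from the previous control period; f is the chip-level quota from the
     * power controller; fg[i] receives the quota allocated to group i. */
    static void chip_level_partition(const double *rips, const double *wt,
                                     int n_groups, double f, double *fg)
    {
        double sum_eff = 0.0;

        /* Power efficiency of each group: e_i = rips_i / WT_i. */
        for (int i = 0; i < n_groups; i++)
            sum_eff += rips[i] / wt[i];

        /* Allocate the chip-level quota proportionally to group efficiency. */
        for (int i = 0; i < n_groups; i++)
            fg[i] = (rips[i] / wt[i]) / sum_eff * f;
    }

    int main(void)
    {
        double rips[] = { 0.85, 0.60, 0.95 };   /* fabricated relative IPS    */
        double wt[]   = { 30.0, 18.0, 25.0 };   /* fabricated group power (W) */
        double fg[3];
        chip_level_partition(rips, wt, 3, 9.0, fg);
        for (int i = 0; i < 3; i++)
            printf("group %d: quota %.2f\n", i, fg[i]);
        return 0;
    }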

2.5.2 Group-level Partitioning

The goal of group-level aggregated frequency quota partitioning is to further partition the group frequency quota among all the cores running the threads of the same application, such that each thread makes balanced progress toward the common barriers. For single-threaded groups, the core quota is the same as the group quota. For multi-threaded groups, the problem of achieving optimized performance in a group

19 is translated into discerning which running threads are more critical (i.e., slower) and then allocating more aggregated frequency to the critical threads to expedite the progress of the entire application.

In this chapter, we adopt a thread criticality prediction approach proposed by

Bhattacharjee and Martonosi [6], which considers both L1 and L2 cache misses. Compared with other approaches [17, 70, 74], the advantage of this predictor is that it can handle both barrier and non-barrier parallel workloads. The criticality of core j in the ith group in the kth period is

crij(k) = N(L1miss) + (L1L2penalty × N(L1L2miss)) / L1penalty,    (2.5.2)

where N(L1miss) is the number of L1 misses that hit in the L2 cache, N(L1L2miss) is the number of L1 misses that also miss in the L2 cache, and L1L2penalty and L1penalty are the L2 and L1 cache miss penalties, respectively. The cache miss penalties are measured in CPU cycles. Within a parallel working group, a higher criticality value implies a more poorly-cached, slower thread [6], which means that additional power needs to be shifted to that thread from the non-critical threads (with smaller criticality values) to reduce the runtime imbalance. In our design, we proportionally sub-partition the frequency quota of a multi-threaded group among its cores based on criticality as follows

fcij(k) = (crij(k − 1) / Σ_{m=1}^{Mi} crim(k − 1)) · fgi(k),    (2.5.3)

where fcij(k) is the target frequency of Core j in Group i in the kth control period.

Mi is the number of cores in Group i.
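The following C sketch illustrates the group-level step: it computes the criticality of Equation (2.5.2) from cache-miss counts and splits the group quota according to Equation (2.5.3). The miss-penalty constants and the per-core counts are assumed values chosen for the example, not measurements from our testbed.

    #include <stdio.h>

    /* Group-level partitioning sketch (Equations (2.5.2)-(2.5.3)).
     * Criticality: cr = N(L1miss) + L1L2penalty * N(L1L2miss) / L1penalty.
     * The group quota fg is then split among the cores of the group in
     * proportion to their criticality. Penalty values are assumptions. */

    #define L1_PENALTY    12.0   /* assumed L1 miss (L2 hit) penalty, cycles */
    #define L1L2_PENALTY 400.0   /* assumed L1+L2 miss penalty, cycles       */

    static double criticality(double n_l1_miss, double n_l1l2_miss)
    {
        return n_l1_miss + (L1L2_PENALTY * n_l1l2_miss) / L1_PENALTY;
    }

    /* Split the group quota fg among n cores proportionally to criticality. */
    static void group_level_partition(const double *cr, int n, double fg, double *fc)
    {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += cr[j];
        for (int j = 0; j < n; j++)
            fc[j] = cr[j] / sum * fg;
    }

    int main(void)
    {
        /* Fabricated per-core miss counts for a 4-core group with quota 3.2. */
        double l1[]   = { 1.0e6, 2.5e6, 1.2e6, 4.0e6 };
        double l1l2[] = { 1.0e5, 4.0e5, 0.8e5, 6.0e5 };
        double cr[4], fc[4];
        for (int j = 0; j < 4; j++)
            cr[j] = criticality(l1[j], l1l2[j]);
        group_level_partition(cr, 4, 3.2, fc);
        for (int j = 0; j < 4; j++)
            printf("core %d: quota %.3f\n", j, fc[j]);
        return 0;
    }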

20 2.6 Core-level Power Estimation on Physical Testbed

In our power management solution, the chip-level partitioning is conducted according to the relative power efficiency of each group. Therefore, we need a reasonable estimation of the power consumption of each core. In addition, one of our baselines,

Steepest Drop [127], also assumes the knowledge of the power consumption of each core, even though real microprocessors available in today's market cannot yet provide such information. In this section, we introduce our per-core power estimation method.

Although the power consumption of each individual core cannot be directly measured in today's microprocessors, previous work by Kansal et al. [58] has shown that the CPU power consumption of each VM on a server can be estimated by adaptively weighting the CPU utilization of the VM. However, they did not explicitly consider the impact of DVFS in their model despite the fact that power consumption scales with different DVFS levels. We extend their work to estimate the power consumption of each core in a DVFS environment by taking both the DVFS level and the utilization into consideration. The utilization metric represents the high-level workload characteristics, while the DVFS level represents the hardware working condition of the core. Power consumption is the interactive result of both the hardware and software parts. We adopt the commonly used multiplication operation to model the interaction among different parts [91]. Therefore, the total power consumption of the chip is modeled as:

CP = Σ_{i=1}^{M} Ui · fi · W + C,    (2.6.1)

where CP is the total power consumption of the entire chip. Ui is the utilization of

the ith core, 1 ≤ i ≤ M. M is the number of cores. fi is the DVFS level of Core i. W

is the weight and C is the idle power of the chip. In this model, the static power of the chip is captured by C. We do not explicitly consider the dynamic power of the uncore part because the uncore power is actually driven by the core part. For example, the last-level cache accesses are introduced by the cache misses or writebacks from the cache of each individual core. Instead of modeling the dynamic power of the uncore part explicitly, we attribute it to the corresponding core part, since the purpose of our per-core power estimation is to support power allocation among the cores. Therefore, we simply sum the power of all the cores as the power of the entire chip. CP can be measured with a multimeter (as described in Section 2.7.1). Ui and fi can be measured with the performance monitoring registers. W and C are updated by linear regression at runtime. Note that this simple, yet effective, estimation method has only two unknown variables in the regression. Moreover, the number of unknown variables does not scale with the number of cores. Since our testbed is a homogeneous-core system, the estimated power of each core CPi is

CPi = W · Ui · fi + C/M.    (2.6.2)

For heterogeneous-core systems, we could extend the model by scaling fi (as described in Section 2.4). Intuitively, the workload characteristics could be more accurately captured by using different weights (Wi instead of W ) for different cores.

However, this approach would make the complexity of the on-line regression problem increase linearly with the number of cores, which might not be favorable for many-core architectures.
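As an illustration of how W and C could be fitted and then used per core, the C sketch below performs an ordinary least-squares fit of Equation (2.6.1) over a few fabricated samples and then applies Equation (2.6.2). A production implementation would instead update the fit online (e.g., over a sliding window of recent measurements), and all numbers shown here are invented for the example.

    #include <stdio.h>

    #define NCORES 12   /* number of cores on the testbed chip */

    /* Per-core power estimation sketch (Equations (2.6.1)-(2.6.2)).
     * Each sample pairs the measured chip power CP with x = sum_i(U_i * f_i).
     * W and C are fitted by ordinary least squares over the samples. */
    static void fit_power_model(const double *x, const double *cp, int n,
                                double *w, double *c)
    {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int k = 0; k < n; k++) {
            sx += x[k]; sy += cp[k];
            sxx += x[k] * x[k]; sxy += x[k] * cp[k];
        }
        *w = (n * sxy - sx * sy) / (n * sxx - sx * sx);   /* slope  W */
        *c = (sy - *w * sx) / n;                          /* offset C (idle power) */
    }

    /* Estimated power of one core: CP_i = W * U_i * f_i + C / M. */
    static double core_power(double w, double c, double util, double freq)
    {
        return w * util * freq + c / NCORES;
    }

    int main(void)
    {
        /* Fabricated samples of aggregated utilization*frequency vs. chip power. */
        double x[]  = { 2.0, 4.5, 7.0, 9.5, 11.0 };
        double cp[] = { 55.0, 68.0, 80.0, 93.0, 101.0 };
        double w, c;
        fit_power_model(x, cp, 5, &w, &c);
        printf("W = %.2f W per (util*freq), C = %.2f W\n", w, c);
        printf("core estimate at U=0.8, f=0.9: %.2f W\n", core_power(w, c, 0.8, 0.9));
        return 0;
    }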

22 Table 2.2: Simulator configuration parameters for SESC.

CMP                        32, 64, 128 cores
Core                       Alpha 21264-like
Peak frequency/Vdd         1GHz / 1.0V
Technology/Temperature     65nm / 80°C
Fetch/issue/commit width   4/4/4
Private L1 d/i-cache       2-way, 64-byte block, 32KB, 2-cycle hit latency
Private L2 cache           8-way, 64-byte block, 256KB, 12-cycle hit latency
Main memory latency        400 cycles

2.7 Implementation

In this section, we first introduce the physical testbed and the simulation environment used in our experiments. Next, we discuss the possible implementation of our solution on a physical chip.

2.7.1 Testbed

Our testbed is a 12-core AMD Opteron 6168 processor, which supports per-core frequency scaling with five different levels [3]. The operating system is OpenSUSE 11.2 with Linux kernel 2.6.31. To evaluate our power control policy, we simultaneously run different combinations of selected benchmarks from the PARSEC 2.1 [7] and SPEC

CPU2006 suites on our physical testbed. We use the SPEC suite subset identified in [98] to represent the major characteristics of SPEC CPU2006. The constructed workload combinations of PARSEC and SPEC cover a variety of different aggregate effects and are listed in Table 2.1. In Table 2.1, the number-appname notation is the number of threads of the application with the name of appname for the PARSEC and SPLASH-2 workloads; for the SPEC2006 workload, it is the number of copies of the application with the name of appname.

To measure the power consumption of the processor, we use the approach proposed

in [32, 55]. An Agilent 34410A digital multimeter is used together with a Fluke i410 current probe to measure the current running through the 12V power lines that power the processor. The probe is clamped to the 12V lines and produces a voltage signal proportional to the current running through the lines with a coefficient of 1mV/A.

The resultant voltage signal is then measured with the multimeter. The measured value is read by the server through a USB cable using a USBTMC device driver. The accuracy of the current probe is (3.5% of reading + 0.5A). The power consumption of each core is estimated periodically (described in Section 2.6).

On our prototype testbed, we implement the control algorithm as an OS daemon process to control the target processor, though the controller can be implemented in the service processor firmware in a real system. The controller periodically reads the power consumption from the power monitor and the performance statistics from the

OS. It then executes the control algorithm presented in Section 2.4. As the outputs of the control algorithm, new frequency levels are calculated and enforced in the next control period. The control period Tc for the controller is set to 1 second because the timer resolution in Linux is 10ms. Note that much shorter control periods could be used in a real firmware or on-chip implementation.

Since the new aggregated frequency level periodically received from the power controller can be any value, it may not be exactly one of the frequency levels supported by the processor. Therefore, a frequency modulator is needed to approximate the desired level with a series of supported frequency levels. For example, to approximate

0.9 during a control period, the modulator would output the sequence 0.8, 1.0, 0.8, 1.0, etc., on a smaller timescale. To implement the approximation, we use a first-order delta-sigma modulator [67] to generate the sequence in each control period. This type of modulator is commonly used in analog-to-digital signal conversion. Clearly, when the sequence has more numbers during a control period, the approximation will be

better, but the actuation overhead may become higher. On our prototype testbed, we use 10 subintervals to approximate the desired DVFS level and each subinterval is 100ms. As a result, the effect of actuation overhead on system performance is no more than 0.02% (20μs of DVFS overhead [111] divided by 100ms), even in the worst scenario when the DVFS level needs to be changed in each subinterval.
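The following is a minimal sketch of such a first-order delta-sigma modulator, written in its error-feedback form; the helper name and the normalized level sets in the example are illustrative.

def delta_sigma_sequence(desired, levels, subintervals=10):
    """Quantize 'desired' to the nearest supported level in each subinterval while
    feeding the accumulated quantization error back, so the average approaches 'desired'."""
    seq, error = [], 0.0
    for _ in range(subintervals):
        target = desired + error                              # desired value plus accumulated error
        level = min(levels, key=lambda l: abs(l - target))    # nearest supported level
        error += desired - level                              # integrate the quantization error
        seq.append(level)
    return seq

print(delta_sigma_sequence(0.9, [0.7, 0.8, 0.9, 1.0]))
# With only two supported levels, 0.9 yields the alternating 0.8/1.0 pattern described above.
print(delta_sigma_sequence(0.9, [0.8, 1.0]))

Because the quantization error is carried over from one subinterval to the next, the running average of the emitted levels converges to the desired level, which is the property the power controller relies on.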

2.7.2 Simulation Environment

To simulate different applications running in the many-core system, we simultaneously run different combinations of selected benchmarks from the SPLASH-2 [128] and

SPEC CPU2006 suites in our simulator. We use SPLASH-2 in our simulation because there is no known report about cross-compiling PARSEC for SESC. We are working with the PARSEC authors on it.

To stress test our power management solution in a many-core microprocessor with a large-scale configuration (i.e., a large number of cores), we conduct simulations using

the SESC [104] simulator with modifications for per-core DVFS support. The cores are configured based on an Alpha 21264 (EV6) scaled to current process technology. Its main parameters are listed in Table 2.2. Each core has 4 DVFS levels normalized to the maximum frequency (i.e., 1, 0.9, 0.8 and 0.7). We integrate the MESI cache coherence protocol on top of the basic Network on Chip (NoC) module of SESC to construct a fully functional NoC so that our simulated many-core microprocessor can provide a coherence guarantee among all the cores. The power of the core part is derived as the combination of the dynamic power reported by Wattch [15] and the leakage power estimated by HotLeakage [131]. We use Orion 2.0 [57] to estimate the power consumption of the NoC. To simulate the DVFS actuation overhead, we assume that no instruction is executed during synchronization [54]. Based on recent studies [63], the DVFS actuation overhead can be decreased to tens of ns in the near future.

Therefore, we choose to use a 250ns transition overhead [6] and 10 subintervals in a control period of 250μs, which leads to a DVFS overhead of up to 1% in each control period. Our scheme allows a higher DVFS overhead by having a longer control period.

For example, if we change the control period from 250μs to 5ms, the DVFS overhead is still up to 1% with a 5μs transition time as in Nehalem processors [52].

2.7.3 Discussion on Hardware Implementation

Since the power management algorithm might need to be implemented on-chip [82] due to the future system integration trend, we discuss the possible on-chip implementation of our solution in this section. The dominant circuit in our solution is the fixed-point division in the partitioner and the fixed-point multiplication in the controller. The division could be implemented as a multiplier with 10% extra circuits

[96]. Since the controller and two partitioners are sequential in our algorithm, they could share the same piece of hardware that could be deployed on an on-chip PMU

[82]. For a total power budget on the order of hundreds of Watts (a 10-bit representation is sufficiently accurate for power management purposes), we conservatively use a 16-bit fixed-point multiplier. Bitirgen et al. [10] have estimated that a 16-bit fixed-point multiplier yields an area of 0.0057mm² with 65nm process technology.

For a typical 200mm² die, the hardware overhead is only 0.003%. To approximate the power consumption of the fixed-point multiplier circuits in the hardware, we assume the power density of IBM POWER6's FPU [25], which is 0.56W/mm² at 100% utilization with nominal voltage and frequency values (1.1V and 4GHz). The extra power consumed by the fixed-point multiplier is only 0.003W. Additionally, we need to add two programmable registers on each core: one to record the base IPS (IPS_i defined in Section 2.5.1) of the thread scheduled on this core; the other to record the grouping information. The performance counters recording the instruction count

Figure 2.2: Power estimation accuracy experiments on a 12-core hardware testbed. (a) At the peak DVFS level. (b) At randomized DVFS levels. (c) Average power. (d) Standard deviation.

and cache miss count are already available on most modern microprocessors. The delta-sigma modulator consists of one accumulator and one comparator, which is a very lightweight piece of circuitry. Furthermore, if the DVFS levels are fine-grained enough (e.g., [125]), the modulator could be eliminated. Therefore, our solution is hardware-efficient even for on-chip implementation.

2.8 Evaluation

First, we introduce two state-of-the-art baselines. Next, we present our physical testbed and simulation results for many-core architectures. Finally, we provide both the theoretical and experimental analysis of the scalability of our solution.

2.8.1 Baselines

Our first baseline, referred to as Priority, is a heuristic-based power controller for

CMPs proposed by Isci et al. [54]. The control scheme of Priority is briefly summarized as follows. 1) Every core is assigned a priority. 2) In each control period, if the total power consumption of the chip is lower than the set point, Priority chooses the core with the highest priority to increase its DVFS level by one. If the core is already running at its highest DVFS level, the core with the next highest priority will be tried. Alternatively, if the power of the chip is above the set point, Priority chooses a core (starting from the lowest priority core) to decrease its DVFS level by one. 3) Priority repeats step 2 until the system stops. Priority represents a typical centralized solution that adopts a lightweight trial-and-error approach to exploit the combinations of the power states of all the cores. We compare our power management solution against Priority to show that heuristic-based solutions, though lightweight, may be insufficient for many-core architectures due to the exponential explosion of the number of global power states. Our second baseline, referred to as Steepest Drop, is a heuristic-based optimization algorithm for many-core architectures proposed by Winter et al. [127]. When the chip power is over the budget, the algorithm selects the application/core pair that would provide the biggest ratio of power reduction to performance loss if the DVFS level were dropped by one step. This process is repeated until the total power is smaller than the budget. Although Steepest Drop has been demonstrated to outperform a variety of state-of-the-art algorithms and to have a small computation overhead even for a 256-core system, it is based on the assumption that all the cores run independent workloads. This assumption is invalid for parallel workloads because it oversimplifies the coupling among the cores hosting the same parallel application. We compare it with our solution to show

that it is necessary to consider the couplings among all the cores in power management for improved application performance.

In the following subsections, we refer to our solution as FreqPar.

2.8.2 Estimation Accuracy

In this experiment, we test the accuracy of our online power estimator presented in

Section 2.6. In Figure 2.2(a), we set all the cores to the peak frequency and run mix1 on the 12 cores. Figure 2.2(a) shows that the estimated total chip power can track the measured actual power during the execution time with only small errors. To stress test our estimation scheme, in Figure 2.2(b), we randomly modulate the DVFS levels of all the cores. Figure 2.2(b) shows our scheme can track the total power with a reasonable accuracy. In Figures 2.2(c) and (d), we plot the average and standard deviation of the measured and estimated power respectively, under different frequency levels and benchmark mixes. On our physical testbed, there are five DVFS levels:

0.8GHz, 1.0GHz, 1.3GHz, 1.5GHz, and 1.9GHz. The difference between the estimated and measured average power is within 1W in most cases, with the worst case being mix5 at 1.9GHz with a 3.6W difference. For the standard deviation, the difference between the estimated and measured values is within 2W in most cases, with the worst case being mix4 at 0.8GHz with a difference of 3.2W.

2.8.3 Testbed Results

In this subsection, we compare our solution with the two baselines on our physical testbed, in terms of power control accuracy and application performance.

Figure 2.3: Power control accuracy comparison. (a) Priority. (b) Steepest Drop. (c) FreqPar. (d) Different benchmarks and budgets. In (a)-(c), the frequencies are relative to the peak of a selected core. In (d), the power values are relative to the peak power in each test case.

Power Control Accuracy

In this subsection, we first perform a case study to investigate different power management algorithms to show that algorithms based on feedback control theory can achieve better power control accuracy. Then, we present the average power control results of FreqPar and the baselines under different power budgets.

In Figures 2.3(a)-(c), we schedule 12 copies of lbm from the SPEC CPU2006 suite on the 12 cores, and all algorithms start with all cores set to the lowest frequency level, which is the default policy of Linux when the cores are idle. Since the power is lower than the set point (80W for all the policies in this test) at the beginning, all the policies try to raise the DVFS levels of the cores to improve performance. In Figure

2.3(a), Priority responds by increasing the DVFS level of one core at a time, until the power rises above the set point in the 20th control period. Then it steps the DVFS level of a core up/down around the set point. There are two parts of this simple

algorithm that can be improved. The first is that a large number of cores will lead to a very long settling time, because Priority only steps the DVFS level of one core up or down in each control period. The second is that Priority always oscillates between two adjacent DVFS levels of a core, even in the steady state. As a result, it never settles to the set point. Since Priority has steady-state errors, it may be undesirable to use Priority in a real system, because a positive steady-state error (i.e., average power above the set point) may violate the power budget. Lefurgy et al. [67] have identified this issue and addressed it by adding a safety margin when the power budget is assigned. To ensure that the safety margin is safe for all the benchmarks, we run the most power-hungry benchmark (mix1) on every set point from 60W to 85W with a 1W step to find the maximum positive steady-state error, which is 2.42W. By applying such a safety margin when assigning the power budget, we ensure that the average power is always within the power budget. We refer to the Priority policy with a safety margin as Improved Priority. Note that Improved Priority is actually not feasible in practice, because it is difficult to have such a priori knowledge of the safety margin before actually running the workload at all the possible power set points. In the following experiments, we use Improved Priority as a baseline that can achieve the best possible performance in an ad hoc way and yet does not violate the power constraint.

In order to address the long settling time, Steepest Drop [127] has been proposed to explore multiple steps in each control period, based on an analytical model of the power consumption and performance contribution of a core. Figure 2.3(b) plots a typical run of Steepest Drop. Steepest Drop takes advantage of the direction of its model, going directly to the estimated optimal DVFS level each time. This policy successfully addresses the long settling time issue in Priority. However, even in the ideal case, the power managed under Steepest Drop is always lower than the set point

because the algorithm only terminates the search once the estimated power falls just below the budget, resulting in an unnecessarily low total running frequency.

Figure 2.3(c) shows that FreqPar can precisely control the power of the chip by receiving a desired DVFS level from the controller, and then using the DVFS modulator to generate a series of supported DVFS levels on a finer timescale to approximate the desired level. One may think that Priority could be improved by also using a series of DVFS levels for each core. However, Priority would still have the same steady-state error because, without a desired DVFS level precisely determined based on control theory, Priority can only oscillate between two DVFS levels of a core. Compared with Steepest Drop in Figure 2.3(b), FreqPar on average runs at a higher frequency (9.9 relative to the peak of one core, compared to 9.4 for Steepest

Drop), because the precise control policy of FreqPar can use all the available power budget. Figure 2.3(d) plots the average power with standard deviations (as error bars) for

FreqPar and the baselines. In all the tests, we run the application mixes to the end.

The power readings are relative to the peak power in each test case. As discussed previously, FreqPar can precisely achieve the desired power budgets while the other two methods waste the budgets.

Application Performance

In this subsection, we first provide two case studies to discuss the underlying reasons that FreqPar has better performance at both the group and chip levels. Then, we present the average performance of FreqPar and the baselines under different power budgets.

Figures 2.4(a)-(c) show the reason that FreqPar can outperform the baselines for

Figure 2.4: Group-level (thread criticality-aware) frequency quota (i.e., sum of normalized DVFS levels) allocation traces of FreqPar and the baselines. (a) Improved Priority. (b) Steepest Drop. (c) FreqPar.

parallel workloads. In this experiment, we run two threads of streamcluster (PAR-

SEC) on Core0 and Core1 and leave the other cores unused, with a 45W total chip power budget. Under the management of Steepest Drop, the algorithm always favors the cores hosting a high raw IPC, which is not necessarily the critical thread that needs more frequency quota within the parallel workload at runtime. Consider a spinning lock, during which a thread has a very high IPC without making real progress; a policy that simply favors high IPC will make the imbalance worse [1]. Bhattacharjee et al. [6] have identified that weighted cache misses are positively correlated with criticality and proposed to take weighted cache misses as the imbalance indicator. In

Figures 2.4(a) and (b), we observe that the imbalanced frequency allocation policies like Priority and Steepest Drop increase the weighted cache miss difference between the two cores within a short period of time (60s in this test). The cross-marked curves in Figure 2.4 are the aggregated weighted cache misses. In contrast, with FreqPar,

Figure 2.4(c) shows that the frequency quota is fairly shared by the two cores and always shifted to the core with a higher criticality within the working group, which dramatically reduces the imbalance between the two threads. As a comparison, we also test Even, the policy from our initial design that evenly divides the frequency quota between the two cores. Our results show that by dynamically allocating the frequency

Figure 2.5: Chip-level (power efficiency-aware) frequency quota (i.e., sum of normalized DVFS levels) allocation traces and power efficiency of FreqPar and the baselines. (a) Improved Priority. (b) Steepest Drop. (c) FreqPar frequency traces. (d) FreqPar power efficiency.

to the critical thread, FreqPar is better than Even. The average absolute weighted cache miss differences between the cores in this test are 7843, 4437, 1703, and 739 for Priority, Steepest Drop, Even, and FreqPar, respectively.

Figure 2.5 illustrates the reason why FreqPar can outperform the baselines in terms of FS (Fair Speedup) at the chip level. In this experiment, we run one copy of milc (SPEC CPU2006) on Core0 and one copy of zeusmp (SPEC CPU2006) on

Core1. We leave the other cores unused. Figures 2.5(a)-(c) show the frequencies of the two cores under different policies with a 45W chip power budget. Steepest

Drop always favors the core hosting a high raw IPC because that core generally has a higher IPC/Watt gradient, hence sacrificing fairness. In Figure 2.5(b), Core1 has an advantage because it has a higher IPC by nature than Core0. Therefore, Core0 is always kept at a low frequency level except when Core1 has reached its peak frequency and the power consumption is still lower than the budget. Improved Priority

Figure 2.6: Overall performance comparison between FreqPar and the baselines on a 12-core hardware testbed.

in Figure 2.5(a) behaves in a similar way because it has a preset static priority. In contrast, with FreqPar, Figures 2.5(c) and (d) show that the frequency quota is always shifted to the core with a higher relative power efficiency. Both cores have the opportunity to get their fair share of the frequency quota under the power budget.

FreqPar outperforms the baselines because FreqPar uses relative power efficiency as the frequency quota partitioning criterion to achieve optimized performance with fairness consideration. In FreqPar, rips_i(k) (defined in Section 2.5.1) captures the relative application progress and eliminates the instruction count inflation caused by the natural characteristics of the applications. In contrast, the fairness-blind baselines favor some applications too much over other co-scheduled applications, resulting in degraded overall performance. Based on rips_i(k), FreqPar allocates more frequency quota to high-efficiency applications to optimize the overall performance with fairness consideration.

Figure 2.6 plots the overall performance comparison among different power budgets and benchmarks in terms of FS. Due to the reasons discussed in the previous case studies, we observe that FreqPar outperforms Improved Priority and Steepest

Drop by 17% and 11% on average, respectively.

2.8.4 Simulation Results

In this subsection, we compare FreqPar with the two baselines in our simulation environment. In our simulator, for each core-count configuration, we run the constructed benchmarks (Table 2.1) with a power budget of 75% of the peak power of each benchmark mix and report the average. We fast-forward 1 billion instructions and simulate 3, 5, and 8 billion instructions for the 32-, 64-, and 128-core configurations, respectively.

Figure 2.7 shows the power control accuracy and performance of each policy. Power readings are relative to the power budget. The FS numbers are relative to the FS of FreqPar in each test case. We observe that FreqPar can precisely control the power of the chip to the desired set point, while both baselines waste a certain amount of the available power budget. Note that when we calculate the average power, we skip the initial phase and compute over the steady phase for all the policies. Figure

2.7 also shows that FreqPar has better performance than the baselines for each core configuration. Furthermore, FreqPar's power and performance improvements over the baselines increase with the number of cores. This is because Improved Priority spends a long time under the budget because of its long settling time. For Steepest Drop, with more cores, the raw-IPC-directed optimization without fairness and criticality considerations worsens the imbalance among the cores.

2.8.5 Discussion on Algorithm Complexity and Scalability

The computational complexity, in terms of the algorithm execution time, determines the algorithm scalability. We first analyze the computational complexity of the studied algorithms and then provide experimental results. Suppose there is an M-core system that supports per-core DVFS and the number of available DVFS levels is L. Priority adjusts the DVFS level of a core one step at a time. It is an O(1) algorithm. However,

Figure 2.7: Power and performance comparison in simulations under different numbers of cores.

Figure 2.8: Execution time experiments show that FreqPar is more scalable than Steepest Drop.

its long convergence time makes it unlikely to be implemented in many-core systems.

In the worst case, if the current power state is that every core is running at the peak state, Priority takes L ∗ M control periods to reach the lowest power set point. During that time, undesirable overheating may occur. Steepest Drop checks all the cores to determine the optimal one-step action based on an analytical model in one iteration, and it iterates multiple times until the estimated power of the analytical model is less than the budget. In the worst case, the search goes up/down L ∗ M times.

Therefore, it is an O(L ∗ M²) algorithm. Winter et al. [127] propose a special data structure to reduce the implementation complexity to O(L ∗ M lg(M)). Since IBM has implemented a digital phase-locked loop [117], which achieves a near-continuous set of output frequencies without skipping processing cycles, an algorithm whose execution time increases linearly with the number of available DVFS levels might be unfavorable. In contrast, even in the worst case, FreqPar proceeds through the cores twice (once for the chip level and once for the group level). Therefore, it is an O(M) algorithm independent of L. Furthermore, if the available DVFS levels are fine-grained enough, the delta-sigma modulator could be eliminated, resulting in an even simpler design.

We now conduct experiments to measure the execution times of the three algorithms on our testbed. We examine the algorithm execution time as the number of cores increases in Figure 2.8. In the experiment, we invoke the investigated algorithms 500 times on our physical testbed with randomly generated power and DVFS levels as the inputs and present the average execution time.

In Figure 2.8, the execution time scaling trends of the algorithms confirm our theoretical analysis. As an O(L ∗ M lg(M)) algorithm, Steepest Drop achieves a modest execution time scaling with the number of cores when the number of available DVFS levels is small (e.g., 2 levels). However, when the number of DVFS levels is large, the scalability is questionable. Suppose that the dynamic power adaptation range of one core is from 0.5 to 1 relative to the peak, with a 1% step size (achievable for

POWER7 [117]); in that case, the average execution time of Steepest Drop exceeds 20ms on a 4096-core system. In contrast, FreqPar is a scalable O(M) algorithm independent of the number of DVFS levels. Even for managing a 4096-core system, the average execution time of FreqPar is only 650μs. The key reason that FreqPar outperforms

Steepest Drop (a classic downhill search algorithm with an optimized implementation) in scalability is that, with the delta-sigma modulator, FreqPar uses a numerical computation to replace the discrete search used by Steepest Drop in the decision-making part.

2.9 Conclusion

The majority of existing solutions on power control of multi-core architectures do not scale well for many-core architectures. More importantly, those solutions cannot effectively allocate power based on thread criticality to accelerate multi-threaded parallel applications, which are expected to be the primary workloads of many-core

architectures. In this chapter, we have presented a highly scalable power control solution that is specifically designed to handle realistic workloads, i.e., a mixed group of single-threaded and multi-threaded applications. Our solution features a three-layer design. First, we adopt control theory to precisely control the power of the entire chip to its chip-level budget by adjusting the aggregated frequency of all the cores on the chip. Second, we dynamically group cores running the same applications and then partition the chip-level aggregated frequency quota among different groups for optimized overall processor performance. Finally, we partition the group-level aggregated frequency quota among the cores in each group based on the measured thread criticality for a shorter application completion time. As a result, our solution can optimize the processor performance while precisely limiting the chip-level power consumption below the desired budget. Empirical results on a physical testbed show that our control solution can provide precise power control, as well as 17% and 11% better application performance than two state-of-the-art solutions, on average, for mixed PARSEC and SPEC benchmarks. Furthermore, our extensive simulation results with 32, 64, and 128 cores, as well as overhead analysis for up to 4096 cores, demonstrate that our solution is highly scalable to many-core architectures.

CHAPTER 3

POWER GATING FOR POWER CAPPING AND CORE

LIFETIME BALANCING

3.1 Introduction

Power has become a first-class constraint in current microprocessor design due to packaging, cooling, and power delivery circuit limits. An important research challenge is to optimize the performance of a Chip Multiprocessor (CMP) within a given power constraint (i.e., power capping). Recently, many research studies have been conducted to utilize Dynamic Voltage and Frequency Scaling (DVFS) to provide a way to cap power. Unfortunately, in recent generations of technology scaling, to keep leakage current under control, the decrease in the threshold voltage (Vth) of transistors has stopped [47]. This, in turn, has prevented the supply voltage (Vdd) from decreasing further. As a result, DVFS alone may no longer be able to fully address the power capping issue. Power gating is a technique that cuts off the power supply of a logic block by inserting a gate (or sleep transistor) in series with the power supply

[77]. Gating the power supply results in almost no power consumption in the gated block [53]. Power gating complements DVFS by providing an effective mechanism to reduce leakage power. Therefore, it is preferable to integrate power gating and DVFS in power capping for further improved CMP performance. In this chapter, we consider the case of per-core power gating (PCPG) because it has

been implemented in mainstream processors [53]. PCPG and DVFS have different characteristics in terms of their transition overheads and interactions with the OS, which requires a decoupled design to address their differences. PCPG has a larger transition time and energy overhead [77, 68]. Furthermore, since PCPG changes the number of turned-on cores, the OS scheduler may reallocate the thread-core mapping in each power gating interval. Therefore, the algorithm designed for PCPG should track long-term trends and avoid actuation oscillations. In contrast, DVFS has a smaller transition overhead (e.g., 10μs in Nehalem processors [53]). Moreover, it does not change the on/off states of the cores. Therefore, DVFS is preferable for exploring short-term workload variations (e.g., multiple DVFS adjustment intervals within one scheduling interval) [116, 78]. However, existing efforts [64, 8] on integrating power gating and DVFS for power capping simply treat the power gating state as an extra-low power state below the existing DVFS levels and consider the power gating and DVFS states in a coupled fashion at a coarse time scale. These coupled designs can neither take advantage of fine-time-granularity DVFS nor avoid unnecessary actuations for power gating. Furthermore, these coupled designs usually require manually disabling the OS scheduler to have a fixed thread-core mapping, because if the OS changes the thread-core mapping, the current statistics of one core cannot be used to decide the power state for the next interval. Therefore, a decoupled design that can meet the different requirements of power gating and DVFS, and that can be deployed with the native OS, needs to be developed.

Through PCPG, the wasted leakage power of certain under-utilized cores (in

DVFS-only systems) can be proactively transformed into dynamic power headroom for accelerating useful applications. Hardware overclocking provides CMPs with the capability of fully utilizing the dynamic power headroom for optimized performance. However, many existing CMPs have only homogeneous cores with chip-level

overclocking capability (e.g., Intel TurboBoost [53]), which cannot fully explore the variations of different applications among cores at runtime. Since the benefit of per-core DVFS has been discussed in detail [63], we consider the case that, with per-core overclocking enabled, CMPs with homogeneous cores can mimic the functionality of heterogeneous cores to dynamically provide more powerful cores to meet the runtime requirements of applications. Even in systems without physically implemented per-core DVFS (e.g., multi-power-island chips), Rangan et al. [102] have shown that thread migration on systems with only two power states can be used to approximate the functionality of per-core DVFS. Compared with TurboBoost, which only adjusts the chip-level DVFS level, this chapter addresses a different issue of coordinating the DVFS/overclocking states of multiple on-chip cores to improve the

CMP performance within a chip-level power cap.

While per-core DVFS and overclocking offer new opportunities to explore the power-performance trade-off, they also pose serious challenges to CMP reliability.

Overclocking directly leads to a higher wear-out rate on the overclocked cores [59].

The practice of employing per-core DVFS/overclocking aggravates the situation in which some cores age much faster than others and become the reliability bottleneck for the whole system, significantly reducing the system service life [51]. Previous studies [23] have developed effective algorithms that use per-core DVFS to balance the service lifetime of on-chip cores. However, the solution of lifetime-balancing algorithms may conflict with the solution of power-capping algorithms. For example, lifetime-balancing per-core DVFS algorithms may throttle the cores running high-activity applications (e.g., high IPC) to reduce the wear-out. However, those cores are exactly the cores that performance-power optimization algorithms usually seek to boost. It is unlikely that these conflicting goals can be coordinated using just one knob. Fortunately, power gating offers new opportunities to manage the core

lifetime balancing issue, because the power-gated cores do not wear out measurably [59]. Therefore, it is possible to achieve lifetime balancing through modulating the power gating state of each core.

Based on the above observations, this chapter proposes PGCapping (Power Gating for Capping), a decoupled design to integrate power gating with per-core DVFS/overclocking for CMP power capping, and also discusses how to use power gating to balance the core-level lifetime. Specifically, PGCapping consists of a Proportional-Integral (PI) controller based on feedback control theory to manage power gating and a Quicksearch algorithm for DVFS/overclocking management. Both the PI controller and

Quicksearch algorithm are invoked periodically. We select different intervals for the PI controller and Quicksearch to decouple them. The PI controller adjusts the number of turned-on cores to control the chip power at a coarse time scale with a theoretically provable stability guarantee. At a fine-grained time scale, Quicksearch employs per-core DVFS to fully handle the short-term workload variations. Core-level lifetime balancing is achieved by selecting which cores to turn on/off after the power controller decides the number of turned-on cores.

Specifically, this chapter makes the following major contributions:

• We propose a novel algorithm, PGCapping, which integrates power gating, DVFS, and core overclocking to optimize the CMP performance within a power cap. PGCapping explores a novel decoupled design direction: it conducts power gating at a coarse time scale for reduced runtime overhead and DVFS at a fine time scale to handle short-term workload variations.

• Since overclocking may have negative impacts on the core aging rates and lead to an unnecessarily shortened CMP lifetime, PGCapping integrates core lifetime balancing as an integral part of the proposed power capping framework and uses a power-gating-based balancing algorithm to maximize the CMP lifetime.

• While most existing work has only simulation results, we implement the proposed power capping solution on a 12-core AMD Opteron processor and present empirical results. Our results show that our decoupled design achieves up to 42.0% better average application performance than five state-of-the-art baselines for mixed PARSEC and SPEC CPU 2006 benchmarks.

The rest of this chapter is organized as follows. Section 3.2 highlights the differences between this chapter and related work. Section 3.3 describes the decoupled design. Section 3.4 introduces our hardware testbed, simulation setups, and the implementation details of our solutions. Section 3.5 presents our baselines and evaluation results. Section 3.6 concludes this chapter.

3.2 Background

A well-known industry practice to integrate DVFS, overclocking, and power gating is the Intel TurboBoost technology [53]. However, TurboBoost, as well as the studies by Li [71] and Lee [64], discusses chip-wide DVFS rather than per-core DVFS, which limits the optimization space. Chip-wide DVFS cannot fully explore the inter-core variations. Since per-core frequency scaling has been available in generally available processors (e.g., AMD [2]) and on-chip regulators have been proposed [63] for per-core DVFS, this chapter focuses on integrating power gating and per-core

DVFS. Compared with power gating and chip-level DVFS, this chapter addresses a different and more challenging problem because we need to coordinate the power state (i.e., the power gating state and the DVFS level) of each core for optimized performance. Many power-capping solutions [116, 78] based on per-core DVFS have been proposed. However, they do not discuss coordination with power gating. Bircher et al. [8] use both per-core DVFS and power gating.

However, they focus on proposing and validating a workload phase predictor for power management. The impact of real-world OS scheduling, which is crucial to workload-based power management, needs more detailed consideration.

Karpuzcu et al. [59] have proposed using per-core overclocking to boost sequential application execution. However, they only overclock one core at a time until it wears out, and then they cut that core off from the on-chip logic and power network. In contrast, our scheme allows multiple cores to simultaneously stay in the overclocking state within a desired power budget, and we balance the service lifetime of each core.

The core-level lifetime balancing issue of multi-core processors has been extensively studied. However, previous explorations mainly focus on per-core DVFS and scheduling [51, 23]. Using per-core power gating as a reliability management knob and the impact of overclocking are rarely considered. Compared with previous art, this chapter discusses using power gating as a reliability management knob and specifically considers overclocking scenarios.

3.3 System Design

In this section, we introduce PGCapping, which integrates power gating, DVFS, and overclocking to optimize the CMP performance within a power cap. PGCapping is a novel decoupled design that conducts power gating at a coarse time scale for reduced runtime overhead and DVFS at a fine time scale to handle short-term workload variations.

As shown in Figure 3.1, we design one power gating management module (i.e., the PI controller) and one DVFS/overclocking management module (i.e., the Quicksearch algorithm) to conduct power capping at different time scales. The PI controller adjusts the number of turned-on cores to control the chip power based on the power measurement and budget at a coarse time interval. Core-level lifetime balancing is achieved through assigning the on/off state of each core based on its utilization. At a fine-grained time

Figure 3.1: The decoupled design uses the power budget, chip power measurement, per-core utilization, temperature, and lifetime as inputs. It computes the next-step power mode (e.g., on/off, DVFS levels, overclocking state) of each core to cap the entire chip power, boost performance, and balance the lifetime. (In the figure, the voltage gate (V-gate) is a power transistor used to enforce power gating; the voltage regulator module (VRM) and phase-locked loop (PLL) are used to enforce DVFS.)

interval, the power control and performance optimization are realized by adjusting the DVFS/overclocking level of each core. We treat the overclocking states as extra

DVFS levels above the labeled peak DVFS level. Therefore, the assignment of the overclocking state is also conducted in Quicksearch. Both the PI controller and Quicksearch are invoked periodically. We select different intervals for the PI controller and Quicksearch to decouple them. The interval of the PI controller is set to be much longer than the OS scheduling interval to give the OS enough time for spreading and balancing the workload among the turned-on cores. The interval of

Quicksearch is selected to be much shorter than that of OS scheduling. Therefore, before the next PI controller interval starts, Quicksearch has already settled at the optimal point with the current power gating setting. As a result, the PI controller does not introduce oscillations to the DVFS management. On the other hand, since

Quicksearch converges quickly, the impacts of DVFS on PCPG are observed to be negligible. Therefore, the two loops will not interfere with each other and can be designed independently. Moreover, both loops are decoupled from OS scheduling, so native OS scheduling without modification can be used in this decoupled design.

3.3.1 Design of PCPG Management Module

In this section, we introduce the design of the PI controller, which controls the power consumption of the entire chip to a desired power budget by adjusting the number of turned-on cores at a coarse time interval.

The power of the chip generally has a monotonic relationship with the number of turned-on cores. However, the run-time variations (e.g., different applications or

DVFS) make it unlikely to develop an accurate model for all possible cases. Therefore, we adopt feedback control theory to design a PI controller to decide the number of turned-on cores. A key advantage of the control-theoretic design approach is that it can tolerate a certain degree of modeling errors and adapt to online model variations based on dynamic feedback [36]. Therefore, our solution does not rely on power models that are perfectly accurate, which is in sharp contrast to open-loop solutions that would fail without an accurate model.

Controller Design. Following standard PI control-theoretic design procedures

[36], our controller is designed as:

N(k) = N(k − 1) + (Pt − cp(k − 1)) / a,    (3.3.1)

where N(k) is the number of turned-on cores on the chip in the kth control period.

Pt is the power budget of the entire chip, which can be determined by the thermal and power supply constraints of the processor or specified by the user during runtime. cp(k) is the power consumption of the entire chip in the kth control period. a characterizes the power consumption of one core, which may vary for different chips and applications. In our design, we derive a by using the datasheet full power range

(from the idle power to the maximum power of the chip [3]) divided by the dynamic range of N(k) (e.g., a = 5.9).

Figure 3.2: Quicksearch algorithm flowchart. Only the power-higher-than-budget case is presented for concision. (In the flowchart, the next power state is evaluated as Power_next = Power_current ∗ (Freq_next/Freq_current) ∗ (Vdd_next/Vdd_current)² and Perf_next = Perf_current ∗ (Freq_next/Freq_current).)

Controller Analysis. A fundamental benefit of the control-theoretic approach is that it offers a mathematical framework to analyze the system stability and performance, even when the system power model may change at runtime due to variations.

By applying pole analysis [36] to our controller, we prove that the closed-loop system is stable as long as 0 ≤ a ≤ 11.8. In our experiments, the variation never exceeded this range.
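To sketch where such a bound comes from, assume (as a simplification not stated in this exact form above) that the chip power responds approximately linearly to the number of turned-on cores, cp(k) ≈ g ∗ N(k) + c, where g is the actual per-core power. Substituting into Eq. (3.3.1) gives

N(k) = N(k − 1) + (Pt − g ∗ N(k − 1) − c)/a = (1 − g/a) ∗ N(k − 1) + (Pt − c)/a.

The closed-loop pole is 1 − g/a, so the loop is stable when |1 − g/a| < 1, i.e., 0 < g < 2a. With the design value a = 5.9, this corresponds to the stated range of roughly 0 to 11.8 for the actual per-core power.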

Actuation Refinement. Through the controller calculation, we have derived the number of cores Na that should be in the turned-on state in order to optimize the CMP performance within the desired power cap. Next, we get the number of unfinished tasks Nb from the OS. We then enforce min(Na, Nb) cores in the turned-on state for the next interval to transform wasted leakage power into the dynamic power headroom that can be used to improve the CMP performance.
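A minimal sketch of the controller in Eq. (3.3.1) together with this actuation refinement is given below; the rounding, the bounds on the core count, and the example numbers are illustrative assumptions rather than the exact implementation.

def pcpg_controller(n_prev, chip_power, power_budget, a=5.9, n_cores=12):
    """Eq. (3.3.1): N(k) = N(k-1) + (Pt - cp(k-1)) / a, rounded and bounded to [1, n_cores]."""
    n = n_prev + (power_budget - chip_power) / a
    return max(1, min(n_cores, round(n)))

def cores_to_turn_on(n_controller, unfinished_tasks):
    """Actuation refinement: enforce min(Na, Nb) turned-on cores in the next interval."""
    return min(n_controller, unfinished_tasks)

# Example: 8 cores on, 70 W measured against a 58 W budget, 10 runnable tasks
na = pcpg_controller(n_prev=8, chip_power=70.0, power_budget=58.0)
print(cores_to_turn_on(na, unfinished_tasks=10))   # fewer cores stay on in the next interval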

3.3.2 Design of DVFS Management Module

Since the power gating state of each core has been decided by the PI controller, we now introduce the Quicksearch algorithm to explore short-term workload variations with per-core DVFS/overclocking.

Quicksearch uses the performance/power ratio to decide the power state (i.e., the

DVFS/overclocking level) of each core. Specifically, we define Perf = Util ∗ Freq as our performance metric. As discussed in [59], the aggregated frequency Freq is a reasonable high-level computational capacity metric. We extend this metric by weighting the frequency with the utilization Util to discount idle cycles. As shown in

Figure 3.2, Quicksearch starts with the current power states of all the cores. If the current power is higher than the chip-wide budget, the algorithm selects the application/core pair that would provide the highest D_power−perf (i.e., power reduction to performance loss ratio). The chip-level power consumption with this new configuration is estimated by adding Power_next − Power_current to the current chip power. If the new estimated chip power is still higher than the budget, Quicksearch is again invoked from the new configuration. This process is repeated until the power budget is met. If the current power is lower than the chip-wide budget, the algorithm selects the application/core pair that provides the highest ratio of performance gain to power increase, D_perf−power. The chip power consumption of this new configuration is estimated by adding Power_next − Power_current to the current chip power. If the new estimated chip power is still lower than the budget, Quicksearch is again invoked from the new configuration. This process is repeated until the power budget is met. Quicksearch extends the SteepestDrop algorithm [127] to consider overclocking states. In an overclocking-enabled system, the final peak DVFS level might not be fixed [2]. Starting from the peak configuration (as in the original SteepestDrop) is not always possible. Therefore, Quicksearch uses the current configuration as the search starting point. Moreover, there are relatively few power mode transitions in stable execution phases. Therefore, Quicksearch reduces the search time. As a greedy algorithm, Quicksearch cannot completely avoid ending at a local optimum.

However, DVFS impacts performance approximately linearly unless the application is extremely memory-bound, which is not a common case.
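The following sketch illustrates one direction of the Quicksearch loop (the power-higher-than-budget case). The level table reuses the testbed p-states listed in Section 3.4.1, but the per-core power and performance models and all other constants are illustrative assumptions, not calibrated values.

def perf(util, freq):
    return util * freq                                   # Perf = Util * Freq

def core_power(util, freq, vdd, beta=30.0, idle=4.0):
    # Illustrative per-core power that scales with utilization, frequency, and Vdd^2
    return beta * util * freq * vdd * vdd + idle

def quicksearch_down(cores, levels, budget):
    """cores: list of {'util', 'level'}; levels: (freq, vdd) pairs from lowest to highest."""
    chip_power = lambda: sum(core_power(c['util'], *levels[c['level']]) for c in cores)
    while chip_power() > budget:
        best, best_ratio = None, -1.0
        # Pick the one-step drop with the largest power reduction per unit of performance loss
        for c in cores:
            if c['level'] == 0:                          # already at the lowest DVFS level
                continue
            f_cur, v_cur = levels[c['level']]
            f_new, v_new = levels[c['level'] - 1]
            d_power = core_power(c['util'], f_cur, v_cur) - core_power(c['util'], f_new, v_new)
            d_perf = perf(c['util'], f_cur) - perf(c['util'], f_new)
            ratio = d_power / max(d_perf, 1e-9)
            if ratio > best_ratio:
                best, best_ratio = c, ratio
        if best is None:                                 # nothing left to throttle
            break
        best['level'] -= 1
    return cores

levels = [(0.8, 0.975), (1.0, 1.0), (1.3, 1.025), (1.5, 1.0625), (1.9, 1.1125)]
cores = [{'util': 0.9, 'level': 4}, {'util': 0.3, 'level': 4}, {'util': 0.6, 'level': 4}]
print(quicksearch_down(cores, levels, budget=120.0))

The power-lower-than-budget direction is symmetric: it raises the level of the core with the highest performance gain per unit of power increase until the budget is reached.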

49 Overclocking and parallel applications. In Quicksearch, we do not explicitly discuss the core acceleration because we consider the overclocking states as extra

DVFS levels above the labeled peak DVFS level. For parallel applications, hardware or software spin loop detectors [72] can be used to identify the active waiting loops within applications to factor out spinning cycles from working cycles. In that case, the utilization-weighted frequency represents the amount of real work that a core is doing. Selecting the steps to maximize the performance/power ratio in Quicksearch is still valid. We can even further improve parallel application performance by detecting the critical thread online and overclocking it [78].

3.3.3 Lifetime Balancing

In this section, we introduce the core lifetime balancing algorithm in PGCapping.

We propose to organize all the cores into two lists. One list contains all the turned-on cores. The other list contains all the turned-off cores. When we need to turn on one core, we select the core with the longest estimated lifetime from the turned-off core list. When we turn off one core, we select the core with the shortest estimated lifetime from the turned-on core list. Wear-out information can be estimated by using aging sensors that dynamically measure the increase in critical path delays due to aging [11]. Note that lifetime balancing only takes place when some cores are turned on/off. Therefore, this balancing algorithm does not interfere with the power-capping algorithm introduced earlier. Our experiments have shown that this balancing strategy can achieve decent core-level lifetime balance results

(in Section 3.5.4). Compared with previous sophisticated designs (e.g., [23]), which usually require on-line workload characteristic evaluation and then use scheduling or per-core DVFS to balance the lifetime of each core, the algorithm we present does not require any modification in OS scheduling and DVFS adjustment. There are mainly

two reasons why such a lightweight algorithm can work well: 1) in a real production environment, there are always variations (e.g., processor utilization and power budget variations) for us to turn on/off cores; 2) the normal wear-out process takes place gradually since we eliminate catastrophic overheating through power capping.
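A minimal sketch of the two-list selection policy is given below; the remaining-lifetime readings are hypothetical values standing in for the aging-sensor estimates.

def pick_core_to_turn_on(off_list, remaining_life):
    # Turn on the least-worn turned-off core (longest estimated remaining lifetime)
    return max(off_list, key=lambda core: remaining_life[core])

def pick_core_to_turn_off(on_list, remaining_life):
    # Turn off the most-worn turned-on core (shortest estimated remaining lifetime)
    return min(on_list, key=lambda core: remaining_life[core])

remaining_life = {0: 0.92, 1: 0.97, 2: 0.88, 3: 0.95}   # hypothetical normalized estimates
on_list, off_list = [0, 2], [1, 3]
print(pick_core_to_turn_on(off_list, remaining_life))    # core 1: least worn of the off cores
print(pick_core_to_turn_off(on_list, remaining_life))    # core 2: most worn of the on cores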

3.4 Implementation

In this section, we introduce the physical testbed for the power capping evaluation and the simulation environment for the lifetime balancing evaluation.

3.4.1 Power Capping Evaluation Testbed

Our testbed is a 12-core AMD Opteron 6168 processor running OpenSUSE 11.3.

The 6168 does not support per-core DVFS or PCPG [3]. We use per-core p-state assignment to emulate per-core DVFS. The 6168 has 5 p-states (0.8GHz/0.975V, 1.0GHz/1.0V,

1.3GHz/1.025V, 1.5GHz/1.0625V, and 1.9GHz/1.1125V). When we assign a p-state to one core, the frequency of that core is enforced independently as the defined value.

However, the core voltage is decided by the highest frequency among the cores because all the cores share the same voltage plane [3]. We use CPU Hotplug [90] to emulate PCPG as in [68]. CPU Hotplug was originally intended for systems with hardware support to install and remove CPU modules without interruption, which mimics precisely the behavior necessary to model per-core power gating. The per-core p-state assignment and CPU Hotplug can emulate the performance impact of per-core DVFS and PCPG. However, we need to estimate the power consumption because the directly measured CPU power does not factor in the power impact of core-grained voltage scaling and PCPG. Our hybrid power estimation (based on real-time measurement), which considers the power impact of per-core DVFS and PCPG, works as follows.

First, we measure the power consumption of the 6168 as in [55]. An Agilent 34410A digital multimeter is used together with a Fluke i410 current probe to measure the current running through the 12V power lines that power the processor. The accuracy of this measurement is ±3.5% of reading. We first set all the cores to the same p-state and measure the idle power Pidle. Our measurements are: 44.0W (0.8GHz), 46.4W (1.0GHz), 51.3W (1.3GHz), 55.1W (1.5GHz), and 61.3W (1.9GHz). Then we calculate the activity factor β in model (3.4.1) [58, 78].

mPow = β ∗ \sum_{j=0}^{N−1} (Freq(j)/1.9) ∗ (VDD/1.1125)² ∗ Util(j) + Pidle,    (3.4.1)

where mPow is the measured power of the entire chip. Freq(j) is the frequency of

the jth core. VDD is the Vdd corresponding to the highest frequency among all the cores because all the cores share one voltage plane. Util(j) is the utilization of the jth core. mPow is measured. Pidle at different chip-level p-states (decided by the peak frequency among the cores) has been measured. Freq(j) and Util(j) can be derived from the OS. By using those inputs, we can compute the chip-level activity factor

β. Using β is a trade-off between estimation accuracy and estimation complexity

[58, 78]. Once we derive β, we estimate the power of each core as:

Pow(j) = β ∗ (Freq(j)/1.9) ∗ (VDD(j)/1.1125)² ∗ Util(j) + Pidle/12,    (3.4.2)

where Pow(j) is the estimated power of the jth core. Please note that VDD(j) is the voltage of the jth core at the current DVFS level. The chip power is estimated as Pow = \sum_{j=0}^{N−1} Pow(j). We assume Pow(j) = 0 if the core is power-gated. We report Pow in our experiments because it accounts for the power impact of per-core DVFS and

PCPG. We do not explicitly consider the dynamic power of the uncore part because the uncore power is actually driven by the core part. We attribute the uncore power to the corresponding core part, which has been shown to be sufficiently accurate for power management purposes [58, 78].

Table 3.1: Workload mixes used in the physical testbed (PARSEC 2.1 and SPEC2006 workloads1; aggregate effect in parentheses).

mix1: 12-mcf (memory intensive)
mix2: 12-perlbench (CPU intensive)
mix3: 8-swaptions, 4-omnetpp (no-barrier parallel)
mix4: 4-(blackscholes, bodytrack), 2-(xalancbmk, povray) (low-barrier and high-barrier parallel)
mix5: 4-x264, 8-fluidanimate (high-lock parallel)
mix6: 4-(vips, facesim), 1-(libq, astar, soplex, dealII) (random mix)
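For concreteness, the following sketch shows how the activity factor β in Eq. (3.4.1) could be solved from one measured chip-power sample and then used in Eq. (3.4.2); the readings in the example are made up, and only the constants given in the text (1.9GHz, 1.1125V, 12 cores) are reused.

PEAK_FREQ, PEAK_VDD, N_CORES = 1.9, 1.1125, 12

def solve_beta(measured_power, idle_power, freq, util, chip_vdd):
    """Eq. (3.4.1): mPow = beta * sum_j (Freq(j)/1.9) * (VDD/1.1125)^2 * Util(j) + Pidle."""
    activity = sum((f / PEAK_FREQ) * (chip_vdd / PEAK_VDD) ** 2 * u
                   for f, u in zip(freq, util))
    return (measured_power - idle_power) / activity

def per_core_power(beta, idle_power, freq, util, vdd, gated):
    """Eq. (3.4.2) per core; a power-gated core is assumed to consume zero."""
    return [0.0 if g else
            beta * (f / PEAK_FREQ) * (v / PEAK_VDD) ** 2 * u + idle_power / N_CORES
            for f, u, v, g in zip(freq, util, vdd, gated)]

# Example with made-up readings: all cores at 1.9 GHz, shared chip Vdd of 1.1125 V
freq, util = [1.9] * 12, [0.7] * 12
beta = solve_beta(measured_power=95.0, idle_power=61.3, freq=freq, util=util,
                  chip_vdd=1.1125)
print(per_core_power(beta, 61.3, freq, util, vdd=[1.1125] * 12, gated=[False] * 12))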

We evaluate the proposed solution with the PARSEC and SPEC workload mixes

(Table 3.1), which cover a variety of different aggregate effects [78].

On our prototype testbed, we implement the control algorithm as an OS daemon process to control the target processor. The intervals of our designs are decided as follows. First, our physical testbed measurements show that the overhead of DVFS is 35μs (close to the 10μs in Nehalem [53]) and the overhead of PCPG averages 127ms, with a worst case of 220ms (close to the 100ms measurement result in [68]). Second, we study the key intervals in the OS. Although the minimum Linux time quantum can reach

10ms, the default time slice is 100ms [13]. The default load balancing interval is

200ms. Third, we decide our design intervals. We select 2s as the PCPG interval to give the kernel sufficient time to balance the workloads in the decoupled design. We select the DVFS interval to be 50ms, which is smaller than the default time slice, to test fine-grained DVFS.

1 The number-appname notation is the number of threads of the application with the name of appname for PARSEC; for the SPEC2006 workloads, it is the number of copies of the application with the name of appname.

3.4.2 Lifetime Balancing Evaluation Simulator

The 6168 does not have aging sensors. Therefore, we evaluate the lifetime balancing part in a simulator [79]. We model the core of Opteron 6168 (e.g., voltage, DVFS levels, frequency). In each time interval (e.g., DVFS interval), per-core power is modeled as in [64] with two parts: the dynamic power is based on utilization and

DVFS level, and the leakage power is based on temperature. The temperature is modeled as in [59] based on the total power. Since the total power impacts the temperature and the temperature impacts the leakage power, these two parts are entangled. We run multiple iterations of the calculation until both the power and temperature readings converge (e.g., no more than 5% change between iterations). We use RAMP2.0 to model the service lifetime. The different failure models and key parameters are the same as those used in [112].
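The following sketch illustrates this fixed-point iteration; the leakage and thermal models and all constants are illustrative assumptions rather than the simulator's calibrated models (the RAMP2.0 parameters are not reproduced here).

def converge_power_temp(dyn_power, ambient=45.0, theta=0.35, leak0=8.0, k_leak=0.03,
                        tol=0.05, max_iter=50):
    """dyn_power: utilization/DVFS-dependent dynamic power of the chip (W)."""
    temp, total = ambient, dyn_power
    for _ in range(max_iter):
        leakage = leak0 * (1.0 + k_leak * (temp - ambient))  # leakage grows with temperature
        new_total = dyn_power + leakage
        new_temp = ambient + theta * new_total               # simple thermal-resistance model
        if abs(new_total - total) / total < tol and abs(new_temp - temp) / temp < tol:
            return new_total, new_temp                       # both readings changed by < 5%
        total, temp = new_total, new_temp
    return total, temp

print(converge_power_temp(dyn_power=60.0))   # converged (total power, temperature)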

3.5 Evaluation

In this section, we compare the power control accuracy, application performance, and core-level lifetime balancing results of PGCapping and the baselines in both the physical testbed and the simulator.

3.5.1 Baselines

In this section, we introduce the five studied baseline policies. Our first baseline is per-core-DVFS-only. It adopts the original SteepestDrop algorithm [127], without PCPG or overclocking states. We select SteepestDrop as an example of state-of-the-art per-core DVFS based solutions. However, the dynamic range of per-core-DVFS-only is limited due to the lack of overclocking states and

PCPG.

Figure 3.3: Power traces of (a) per-core-DVFS-only, (b) PCPG-only, (c) ETurboBoost, (d) per-core-DVFS+PCPG, (e) EQuicksearch, and (f) PGCapping, and (g) power comparison with different budgets and different benchmarks. Decoupled solution PGCapping can precisely enforce the power budget by using PCPG, DVFS, and overclocking in both the high and low power budget cases. We calculate the average power with a 2s window as P_avg to clearly present the general trend. Hardware testbed results.

Our second baseline is PCPG-only. We disable DVFS and use PCPG as the only knob to control power. When the current power cp(k) is lower than the budget Pt, we need to turn on cores. In order to achieve a fast convergence, we turn on multiple cores each time. We randomly pick a core in the turned-off core list, then we use the power when it was turned off as an estimate and add that power to cp(k). If cp(k) is still lower than Pt, we pick another core and so on, until cp(k) is higher than Pt.

When the current power cp(k) is higher than the budget Pt, we do just the opposite.

We allow turning on/off multiple cores each time to achieve fast convergence because PCPG takes place at a coarse time interval. We design PCPG-only as an example

of using PCPG as the only knob to control the chip-level power consumption in a heuristic way. However, this simple heuristic cannot avoid oscillations.
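The PCPG-only heuristic can be sketched as follows; the data structures and function names are illustrative, and only the turn-on/turn-off logic described above is taken from the text.

# Sketch of one PCPG-only control step. core_power_est[c] holds the estimated
# power of core c (its power when it was last on); names are illustrative.
import random

def pcpg_only_step(cp_k, Pt, on_cores, off_cores, core_power_est):
    """Return a list of (action, core) pairs for this PCPG interval."""
    actions = []
    if cp_k < Pt:
        # Below budget: turn on randomly picked off-cores, adding each core's
        # estimated power, until the estimate rises above the budget.
        for core in random.sample(off_cores, len(off_cores)):
            cp_k += core_power_est[core]
            actions.append(("on", core))
            if cp_k >= Pt:
                break
    else:
        # Above budget: do the opposite, turning off cores until the estimate
        # drops below the budget.
        for core in random.sample(on_cores, len(on_cores)):
            cp_k -= core_power_est[core]
            actions.append(("off", core))
            if cp_k <= Pt:
                break
    return actions

Because each step adds or removes whole cores until the estimate crosses the budget, the result repeatedly overshoots the set point, which is the source of the oscillations noted above.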

Our third baseline ETurboBoost is an extended version of Intel TurboBoost [53].

It uses chip-level DVFS and PCPG. The original TurboBoost does not turn off cores when all the cores are running applications and adjusts the DVFS level of the entire chip to fit the power budget by one level at each DVFS interval. ETurboBoost extends original TurboBoost by using the PI controller to control the number of cores when the assigned power budget is lower than the power consumption when all the cores are running at the lowest frequency levels. We implement ETurboBoost as an example of state-of-the-art power capping solution through power gating and chip-level DVFS.

Our fourth baseline Per-core-DVFS+PCPG is using the PI controller for PCPG and per-core DVFS with Quicksearch but without overclocking states. We select this baseline to show the necessity of including overclocking states to fully explore the power headroom.

We design our fifth baseline EQuicksearch (i.e., Enhanced Quicksearch) to be a very competitive coupled solution that integrates per-core DVFS/overclocking with power gating. EQuicksearch adds the power gating state to Quicksearch. We take the power gating state as the extra-low power state below the labeled lowest DVFS level, as in the previous coupled design [64]. We assume the power and performance of a turned-off core to be zero. We estimate Pow_next and Perf_next of a currently turned-off core as the average power and performance of all the turned-on cores at the last interval. We also add the transition power to Pow_next and deduct the transition time from Perf_next to minimize undesirable oscillations. We take EQuicksearch as an example of coupled solutions because it demonstrates two key features of such a solution: 1) the coupled solution has the opportunity to select from the entire available adjustment space; 2) due to the huge search space, pruning strategies (e.g., a greedy

algorithm like SteepestDrop) usually need to be employed. We consider EQuicksearch as a state-of-the-art coupled design because its predecessor SteepestDrop has been shown to outperform many other recently published solutions [84, 116] and intuitive baselines [127]. Since EQuicksearch may change the power gating states of multiple cores, we set the interval to be sufficiently long to conduct the required actuations.

In our experiment, we invoke EQuicksearch every 1s to get the trade-off between actuation overhead and system responsiveness. Note that coupled solutions (either EQuicksearch or previous studies like [64]) may change the on/off states of cores at each control interval. As a result, the OS might reschedule the thread-core mapping, which makes the last-interval-statistic-based decision-making invalid. Therefore, we have to manually assign thread-core affinities when we use coupled solutions. In contrast, our proposed decoupled design can work with the native OS without any manual intervention because the PCPG and DVFS parts have been decoupled from the OS through interval selection.
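For the coupled baselines, the manual thread-core pinning mentioned above can be done through the standard Linux affinity interface; the sketch below uses Python's os.sched_setaffinity wrapper, and the thread-to-core mapping shown is hypothetical.

# Sketch of manually pinning benchmark threads to cores for the coupled baselines
# (the decoupled PGCapping design does not need this). Linux-only.
import os

def pin_threads(thread_to_core):
    """thread_to_core: {tid: core_id} mapping decided by the baseline policy."""
    for tid, core in thread_to_core.items():
        os.sched_setaffinity(tid, {core})   # restrict the task to a single core

# Hypothetical usage: pin 7 worker threads to cores 0-6, leaving cores 7-11
# available for power gating.
# pin_threads({tid: core for core, tid in enumerate(worker_tids)})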

3.5.2 Power Control Accuracy

In this section, we present the physical testbed results. We set 1.3GHz as the normal peak working state. We have three DVFS states (0.8GHz, 1.0GHz, 1.3GHz) and two overclocking states (1.5GHz, 1.9GHz). The overclocking setting agrees with the overclocking potential analysis in [41] (e.g., normally 10-40%).

In Figure 3.3, we run 7 copies of a randomly selected benchmark gobmk from

SPEC CPU2006 on Opteron 6168. We reduce the power budget at 40s from 55W to

35W to emulate a power budget cut scenario due to various reasons (e.g., thermal emergency). The power budget is then raised back to 55W at 80s after the emergency is resolved. Initially, we turn on all cores at the lowest DVFS level, as in the Linux default case.

Figure 3.4: Frequency and turned-on core count traces of (a) per-core-DVFS-only, (b) PCPG-only, (c) ETurboBoost, (d) per-core-DVFS+PCPG, (e) EQuicksearch, and (f) PGCapping, and (g) performance comparison with different budgets and different benchmarks. Decoupled solution PGCapping can reserve power headroom by using PCPG and accelerate cores running useful workloads by using overclocking. Hardware testbed results. The frequencies are normalized to the peak frequency of one core (i.e., Relative freq). The Freq/# is calculated by dividing the total aggregated relative frequency of the entire chip by the number of turned-on cores, which can be interpreted as a high-level computing capability that each turned-on core can offer (higher is preferable). We also calculate the average Freq/# with a 2s window as F_a/#.

Figure 3.3a shows the results of per-core-DVFS-only. Because the frequencies of all the cores have been set to the lowest level, per-core-DVFS-only cannot further reduce power to enforce the power budget cut at 40-80s. This might cause a system crash or other undesirable failures. In Figure 3.3b, with PCPG, the system can successfully reach the low power budget. However, the frequent turning on/off of cores introduces large power oscillations and potentially large thermal cycling. Figure

3.3c shows the results of ETurboBoost. ETurboBoost can enforce the desired power budget. However, ETurboBoost uses chip-level DVFS and introduces large power oscillations. More importantly, ETurboBoost always oscillates between two adjacent DVFS levels of the entire chip, even in the steady state. As a result, even on average, it never settles to the set point. Note that there is on average a 2.5W power gap between the 2s-window average power and the budget curve during 0-40s and 40-80s.

Figure 3.5: (a) Lifetime-balancing results with real-world data center utilization traces; (b) lifetime-balancing results across locations and months; (c) combined test. PGCapping achieves results very close to Balanced (a best-effort per-core-DVFS-based lifetime balancing) and outperforms the Random and Round-robin baselines. Simulation results.

Potential performance benefits could be gained by fully utilizing the power budget.

Moreover, the processor is running on average 0.7W higher than the budget during

40-80s, which may possibly introduce undesirable failures. In Figure 3.3d, we show the case of per-core DVFS with PCPG but no overclocking. Our PI controller with refinement (Section 3.3.1) can identify that only 7 cores are used in our experiment and turns off the remaining 5 cores. However, without overclocking, even with all the cores set to the peak DVFS level, the power budget still cannot be fully utilized. Figure 3.3e shows that EQuicksearch can enforce the power budget.

However, EQuicksearch has larger oscillations due to the coarser time interval. Figure

3.3f shows that PGCapping can precisely enforce the power budget. In the steady state, PGCapping oscillates between two adjacent DVFS levels of one core, which is much smaller than oscillating between two adjacent DVFS levels of the entire chip in ETurboBoost. As more and more cores are expected to be integrated on-chip with future process technology scaling, the benefit of one-core adaptation instead of whole-chip adaptation will become more significant. Figure 3.3g shows that per-core-

DVFS-only, PCPG-only, ETurboBoost, and Per-core-DVFS+PCPG cannot adjust the processor to the desired power budget in certain cases for the reasons analyzed above. In contrast, PGCapping and EQuicksearch can always control the power to the assigned set point for different benchmark mixes with different power budgets.

3.5.3 Application Performance

In Figure 3.4, we present the frequency and turned-on core count traces corresponding to Figure 3.3 to show the performance impact of the different policies. The frequencies of the turned-off cores are counted as 0.

In Figure 3.4a, per-core-DVFS-only policy has no PCPG. Therefore, during 40-

80s, the frequencies of all the cores have been set to the lowest level continuously and could not go lower to enforce the desired power budget. Figure 3.4b shows the results of

PCPG-only. Because PCPG-only does not change the DVFS levels of the cores, it only adjusts the number of turned-on cores to fit the power budget. We can observe that the cores are always turned on/off frequently. This is not desirable because

PCPG has a high overhead compared with DVFS. Frequently turning on/off cores negatively impacts performance. In Figure 3.4c, ETurboBoost adjusts the DVFS level of the entire chip up/down around the power budget, which cannot exploit the inter-core variations. Moreover, due to the large granularity of chip-level DVFS,

DVFS and PCPG have large interference at transition time, leading to the large

overshoot at 80s. Figure 3.4d shows the results of per-core DVFS with PCPG but no overclocking. We can see that during the periods when the power budget is high (0-40s and after 80s), due to the lack of overclocking states, the DVFS levels of the cores could not be raised further, which fails to achieve optimal performance. In Figure 3.4e, EQuicksearch purely relies on a heuristic algorithm to decide the power state of each core in a coupled fashion (e.g., using DVFS or on/off state to adjust power consumption).

Therefore, the adaptation is limited to a coarse level. Furthermore, EQuicksearch cannot avoid unnecessary oscillation of power states due to system noise and workload variations. Note that we already minimize the power gating oscillation by considering the transition power and time overhead in the EQuicksearch algorithm. In

Figure 3.4f, PGCapping can identify the idling cores and turn them off to reserve the power budget headroom. It then uses overclocking to fully utilize the headroom. Note that a core is running at a higher DVFS level on average when managed by PGCapping than ETurboBoost and EQuicksearch because PGCapping can fully utilize the power budget. In addition, PGCapping can fully explore the runtime variations among different cores by using per-core DVFS and per-core overclocking. Figure 3.4g shows the performance comparison for different policies on different benchmark mixes. In this chapter, we use Fair Speedup (FS) as our performance indicator. The FS of a partitioning scheme is defined as the harmonic mean of per-application speedup with respect to the equal resource share case (i.e., peak frequency for all applications) [10].

The FS achieved by a scheme can be expressed as

FS(scheme) = \frac{N_{app}}{\sum_{i=1}^{N_{app}} \frac{ET_{app_i}(scheme)}{ET_{app_i}(base)}} \qquad (3.5.1)

where ET_{app_i}(scheme) is the execution time of the i-th application under a certain power management scheme, ET_{app_i}(base) is the execution time of running the i-th application at the peak frequency level all the time, and N_{app} is the number of applications in the system, i.e., the set of applications that execute together. FS is an indicator of the overall improvement in execution efficiency gained across the applications.
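The metric is straightforward to compute; the following is a minimal sketch of Eq. (3.5.1) with illustrative names.

# Minimal sketch of the Fair Speedup metric in Eq. (3.5.1).
def fair_speedup(et_scheme, et_base):
    """et_scheme[i] and et_base[i] are the execution times of application i under
    the evaluated scheme and at the peak-frequency baseline, respectively.
    FS is the harmonic mean of the per-application speedups."""
    n_app = len(et_scheme)
    return n_app / sum(s / b for s, b in zip(et_scheme, et_base))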

Figure 3.4g shows that PGCapping outperforms per-core-DVFS-only, PCPG-only,

ETurboBoost, and Per-core-DVFS+PCPG by 16.9%, 42.0%, 13.9%, and 21.4% on average, respectively. In general, the reason that PGCapping can outperform the baselines is that it fully utilizes the power budget. Therefore, the cores are generally running at a higher frequency level. Specifically, PGCapping reserves power headroom for cores running useful workloads by turning off unused cores. Therefore, it outperforms per-core-DVFS-only. Then, PGCapping uses per-core DVFS/overclocking to fully utilize the power headroom and explore the runtime inter-core variations for optimized performance. Therefore, PGCapping can outperform ETurboBoost and Per-core-DVFS+PCPG. Although EQuicksearch has the full set of possible configuration combinations to explore for optimized performance at a coarse time interval, it cannot eliminate unnecessary oscillation of power gating due to system noise. In contrast,

PGCapping benefits from fine-time-grain DVFS and the PI controller in the power gating part. Therefore, PGCapping achieves 11.5% better performance than EQuicksearch on average.

3.5.4 Lifetime Balancing

In this section, we first introduce three baselines for our lifetime balancing part. Next, we present our simulation results. We use three baselines to compare with our power-gating-based lifetime balancing algorithm in PGCapping (Section 3.3.3). The first baseline is Random: when we need to turn on cores, we randomly select the turned-off cores to turn on; when we turn

off cores, we randomly select the turned-on cores to turn off. The second baseline is Round-robin [59]: when we turn off cores, we start from the first core, then the second, and so on. When we turn on cores, we also start with the first. If it is already turned on, we turn to the next. The third baseline Balanced is an extension of the lifetime balancing algorithm discussed in [23]. Balanced sorts the cores in descending order of lifetime and the next-step available per-core DVFS levels or on/off states in descending order. Next, it maps the two lists. Balanced strictly enforces lifetime balancing among cores at each DVFS and PCPG interval without considering the actuation overhead.

Balanced negatively impacts performance. Therefore, it is not suitable for real-world implementation. However, it has been suggested as a reasonable approximation for the best-effort per-core DVFS based lifetime balancing [23].

In Figure 3.5, we conduct three test cases to evaluate the lifetime balancing with real-world workload and power budget variations. In Figure 3.5a, we evaluate our proposed life-balancing solution in PGCapping with real-world workload variations. We assign a fixed power cap (e.g., 62W) to the processor while using a continuous 30-day real-world server utilization trace [132].

We derive the 62W budget by calculating the overall 30-day average utilization and test-running the simulator with that average utilization at the peak frequency. PGCapping outperforms Random and Round-robin by 9.2% and 8.1%. PGCapping achieves 92.3% of Balanced in MTTF. In Figure 3.5b, we evaluate our proposed life-balancing solution with real-world power budget variations. We assume the processor is powered by a solar panel (e.g., BP3180N [14]) and hosts HPC applications, which is a newly proposed green-energy solution [69]. Therefore, the power budget changes with the solar radiation but the utilization is fixed to 100%. We use meteorological data from the Measurement and Instrumentation Data Center (MIDC) [92] at locations that have different solar energy resource potentials (e.g., AZ, CO, NY, TN) across

different seasons (e.g., the middle of Jan., Apr., Jul. of 2011 and Oct. of 2010) to calculate the maximum available solar power envelopes. Then, we run our simulator with the power budget envelopes. On average, PGCapping outperforms Random and

Round-robin by 11.0% and 7.1%. PGCapping achieves 90.7% of Balanced in MTTF.

In Figure 3.5c, we assume the processor is powered by a solar panel with the real-world server utilization trace. We exhaustively test the 30-day trace with 4 different seasons and 4 different locations and present the result in Figure 3.5c (i.e., a total of 4x4 30-day tests). PGCapping outperforms Random and Round-robin by 9.2% and 7.2%. It achieves 90.4% of Balanced in terms of MTTF in this test case.

When PGCapping uses PCPG to enforce the power budget, it always turns on the longer-lifetime cores and turns off the shorter-lifetime cores. Therefore, none of the on-chip cores becomes significantly worn out. That is the reason why PGCapping can outperform Random and Round-robin. Due to real-world power budget and utilization variations, there are always certain chances to turn on/off some cores.

Therefore, although PGCapping does not strictly balance lifetime among cores at each DVFS and PCPG interval, it still achieves 90.4% lifetime of Balanced.

3.6 Conclusion

Both DVFS and power gating are anticipated to be important power adaptation knobs. However, current studies on integrating power gating and DVFS focus on deciding the power gating and DVFS levels in a coupled fashion, leaving the whole direction of a decoupled design undiscussed. In this chapter, we have explored PGCapping, a decoupled design to integrate per-core power gating with DVFS/overclocking for

CMP power capping. However, both per-core DVFS and overclocking may make some cores age much faster than others and thus become the reliability bottleneck in the whole system. Therefore, PGCapping also balances the lifetimes of the CMP

cores by using power gating. Our empirical results show that PGCapping achieves up to 42.0% better average application performance than five state-of-the-art baselines.

Furthermore, our extensive simulation results with real-world traces demonstrate that

PGCapping can increase CMP lifetime by 9.2% on average.

CHAPTER 4

ENERGY EFFICIENCY IN GPU-CPU

HETEROGENEOUS ARCHITECTURES

4.1 Introduction

In recent years, GPU-CPU heterogeneous architectures have been increasingly adopted in high performance computing, because of their capabilities of providing high com- putational throughput. For example, the recently built supercomputer Tianhe-1A, which has won the second spot on the TOP500 list [118], is equipped with Intel

5670 processors and Nvidia’s CUDA-enabled Tesla M2050 general purpose GPUs.

GPUs excel at data parallel operations due to the optimized SIMD (Single Instruction Multiple Data) structure. Given the same amount of data, using one instruction to process multiple pieces of data can be more efficient than one instruction for one piece of data in both performance and energy. This advantage has helped Tianhe-1A to achieve more than twice the energy efficiency of the third-place CPU-based

Jaguar on the TOP500 list. However, Tianhe-1A still has an estimated annual elec- tricity bill of $2.7 million [40]. Therefore, it is important to further improve the energy efficiency of GPU-CPU heterogeneous architectures.

GPU-CPU heterogeneous architectures offer some unique opportunities for energy conservation. Since GPUs have energy efficiency advantage over CPUs for parallel and computation-intensive applications, an intuitive solution seems to be assigning

all those workloads to the GPU for energy efficiency. However, our experiments (in Section 4.3.2) show that the GPU taking all the workloads is not necessarily the most energy-efficient workload division. The main reason is that if the GPU takes all the workloads while the CPU is totally idling, the execution time of the entire system can be longer than that in the case when the CPU does a fair portion of work. Since energy is the product of time and power, a more energy-efficient solution is to split and distribute the workload to the GPU and CPU, such that both sides can finish approximately at the same time. However, because CPUs and GPUs differ considerably in their processing and memory capabilities, it is challenging to design an algorithm that can achieve an energy-efficient workload division for all different workloads. Furthermore, different power adaptation knobs, such as frequency (and voltage) scaling, are commonly enabled in both CPUs and GPUs. Since frequency scaling may impact the hardware capabilities, workload division policies assuming fixed underlying hardware working status might lead to inferior workload allocation.

Therefore, a dynamic workload division algorithm aware of the hardware status needs to be designed.

After the workload is split and allocated to the GPU and CPU, another research challenge is to manage hardware resources according to the runtime needs of work- loads for energy savings without compromising performance. In GPUs, real-world ap- plications rarely fully stress the GPU cores and memory simultaneously [46]. Hence, there are opportunities to save energy by throttling the components with low utiliza- tions. For example, for GPU core-bounded workloads, we can throttle the memory frequency for energy savings. However, a naive solution may over-throttle the memory and thus make the memory part become the system bottleneck, resulting in unneces- sary performance degradation. A similar argument also applies to memory-bounded

workloads. Therefore, the GPU cores and memory must be managed in a coordinated manner, based on the workload characteristics, for energy savings with minimal performance degradation. However, existing research on GPU energy management focuses on either GPU cores or memory. The direction of throttling the frequency levels of both the GPU cores and memory in a coordinated fashion for improved GPU energy efficiency needs to be investigated.

In this paper, we propose GreenGPU, a holistic energy management framework for GPU-CPU heterogeneous architectures. Our solution features a two-tier design.

In the first tier, GreenGPU dynamically splits and distributes workloads to GPU and CPU based on the workload characteristics, such that both sides can finish ap- proximately at the same time. As a result, the energy wasted on idling and waiting for the slower side is minimized. In the second tier, GreenGPU dynamically throt- tles the frequencies of the GPU cores and memory in a coordinated manner, based on their utilizations, for maximized energy savings with only minimal performance degradation. Likewise, the frequency and voltage of the CPU are scaled similarly.

Specifically, this paper makes the following contributions:

• We propose to improve the energy efficiency of GPU-CPU heterogeneous architectures in a holistic way to utilize both workload division and frequency scaling for maximized energy savings.

• We design a two-tier solution that dynamically splits and distributes workloads to GPU and CPU in the first tier and throttles the frequencies of GPU cores, GPU memory, and CPU cores in the second tier.

• We develop a light-weight machine learning algorithm to adjust the frequency levels for the GPU cores and memory in a coordinated manner, based on the workload characteristics, for energy conservation.

Figure 4.1: (a) Memory frequency vs. energy, (b) memory frequency vs. performance, (c) core frequency vs. energy, and (d) core frequency vs. performance for nbody and SC. Normalized execution time is the execution time of a workload normalized to its execution time at the peak frequency. Relative energy is the energy normalized to the energy consumed at the peak frequency. There are opportunities to save energy with negligible performance loss by throttling under-utilized components.

• We implement GreenGPU using the CUDA framework on a real physical testbed with Nvidia GeForce GPUs and AMD Phenom II CPUs. Experiment results with standard Rodinia benchmarks show that the proposed GreenGPU framework achieves 21.04% average energy savings compared with several well-designed baselines.

The rest of the paper is organized as follows. Section 4.2 highlights the difference between this paper and related work. Section 4.3 motivates our work with two case studies. Section 4.4 presents our overall system design at a high level. Section 4.5 introduces in detail the algorithms proposed for dynamic frequency scaling and workload division. Section 4.6 provides the implementation details of our testbed, while Section 4.7 discusses our hardware experimental results. Finally, Section 4.8 concludes this paper.

4.2 Background

Recently, workload division between CPU and GPU has drawn the attention of re- searchers. Luk et al. [76] propose Qilin framework, an adaptive mapping scheme to

map computation tasks to processing elements on the CPU and GPU in one machine. While their target is solely to minimize the execution time, our scheme integrates workload division with GPU core and memory throttling to improve the energy efficiency of the entire system. Wang et al. [122] propose to coordinate inter-processor work distribution to minimize energy consumption under a given scheduling length constraint. However, their work does not throttle the GPU memory and cores in a coordinated manner based on workload characteristics for maximized energy savings. In addition, their approach requires offline profiling, which may be undesirable because it can be expensive to do profiling for applications with a large amount of data every time for different input variables. Some GPU-CPU workload division studies are conducted based on the MapReduce [27] framework. For example, Ravi [103] proposes dynamic input data partitioning among CPU cores and GPU cores based on two applications, K-means clustering and Principal Component Analysis. Hong et al. [45] discuss a uniform memory management framework among CPU and GPU as well as a uniform programming API. While both studies aim at improving the code portability between CPU and GPU, GreenGPU addresses a different problem, i.e., improving the energy efficiency of GPU-CPU heterogeneous architectures.

There are some existing research efforts to improve the energy efficiency of GPUs but those studies address GPU cores and memory in an isolated manner. Hong et al. [46] have shown that throttling the number of GPU cores based on their novel power model can save energy. Based on their power measurement, Collange et al. [22] conclude that memory access pattern and bandwidth play a major role in achieving both good performance and low power consumption. Jang et al. [56] formulate the memory access pattern of the threads inside a computation kernel in a matrix to select runtime parameters. Compared with those previous studies that address either GPU cores or memory, GreenGPU coordinates both GPU cores and memory for

maximized energy savings. Recently, Lee et al. [65] have studied coordinating the number of active cores, the frequency of the core part, and the cache part in GPUs. However, the coordination between CPU and GPU (one of the key components in GreenGPU) is not examined in their work.

The general energy saving techniques of CPUs have been researched extensively

[50, 129, 44, 70, 17]. Particularly, some projects have tried to improve the energy efficiency of chip multiprocessors running parallel applications. For example, Li et al. [70] propose a solution called thrifty barrier that places the faster cores into a lower power mode at the barriers (i.e., join points) while waiting for the slower cores so that energy can be saved. Liu et al. [74] use per-core DVFS to slow down the faster cores, such that both the idle time due to waiting and the energy consumption are reduced. Cai et al. [17] extend [74] by adding meeting points within the execution of the parallel loops and solve the same problem at a finer granularity. However, all these studies focus on CPU-only architectures without considering the GPU as part of the system, so their methods cannot be directly applied to GPU-CPU heterogeneous architectures.

4.3 Motivation

In this section, we conduct two case studies: 1) frequency scaling on GPU cores and memory, and 2) workload division between CPU and GPU, to motivate our work.

4.3.1 A Case Study on Frequency Scaling for GPU Cores and Memory

In GPUs, real-world applications rarely stress both the core and memory parts at the same time [46]. The component (e.g., cores and memory) utilization measures how intensely a workload is exercising one part of the hardware resource. Nvidia defines the core part utilization as "GPU busy cycles/total cycles" and the memory utilization as "actual bandwidth/rated peak bandwidth" [93]. We get the utilizations by using Nvidia's nvidia-smi toolset. Based on the utilization trace analysis, we categorize nbody as core-bounded and streamcluster (SC) as memory-bounded. nbody and SC are workloads in the CUDA SDK. We conduct the following experiments on an Nvidia GeForce 8800 GTX GPU to explore energy saving opportunities.

Figures 4.1a and 4.1b illustrate the performance and energy when we throttle the memory frequency. The frequency of the cores is set to the peak value. The opposite case is presented in Figures 4.1c and 4.1d. In Figures 4.1a and 4.1b, reducing the frequency of the memory part saves energy with minor performance loss for core- bounded nbody; but it impacts both performance and energy for memory-bounded

SC. Figures 4.1c and 4.1d show that reducing the frequency of the core part negatively impacts both performance and energy for core-bounded nbody. For SC, reducing the frequency of the core part to a certain point (e.g., 410MHz) saves energy with negligible performance loss; further reducing the frequency of the core part negatively impacts both performance and energy. We make the following two observations based on the experiments:

1. For either memory-bounded or core-bounded workloads, properly scaling down the under-utilized component can save energy with negligible performance impact.

2. For a given utilization, there can be a frequency level of that component that is most suitable, i.e., a higher frequency may lead to higher energy consumption while a lower frequency may result in performance loss.

In this paper, we aim to design frequency scaling algorithms that dynamically adjust the frequency levels according to the measured utilizations of the cores and memory.

Figure 4.2: Energy consumption for different workload division ratios (percentage of the workload the CPU takes). The cooperation of the CPU and GPU parts can be more energy efficient than the GPU part taking all the work exclusively.

4.3.2 A Case Study on Workload Division between GPU and CPU

Although GPUs have a theoretical energy-efficiency advantage in parallel computing, CPUs may participate in the computation to provide even higher throughput for the whole system. Luk et al. [76] give a workload division case study to show that different workload divisions between the CPU and GPU parts yield different performance. In

Figure 4.2, we conduct similar experiments to investigate the relation between energy consumption and workload division. Implementation details are introduced in Section 4.6. We measure the energy consumption of the entire GPU-CPU system when we vary the percentage of work allocated to the CPU from 0% to 90%. The example case is based on the kmeans workload from the Rodinia benchmark set [20]. We observe that the energy consumption goes down as the CPU work percentage goes from 0% to

10%, and then goes up from 10% to 90%. The optimal point takes place when CPU side takes 10% of the total work. Figure 4.2 shows that the cooperation of the CPU and GPU parts can be more energy efficient than the GPU part taking all the work exclusively. On a GPU-CPU heterogeneous platform, the average energy/data effi- ciencies (i.e., the joules consumed per certain amount of data [42]) on the GPU and

CPU sides are different. Given a fixed amount of workload (fixed amount of data), the energy optimization problem can be abstracted as: Given a GPU-CPU system with a GPU and a CPU, a certain amount of workload as x, the energy coefficient of

the CPU and GPU as a1 and a2, respectively, we need to find a workload division x1 for the CPU and x2 for the GPU with x1 + x2 = x, such that a1 ∗ x1 + a2 ∗ x2 is minimized.

This optimization problem is not trivial because a1 and a2 are different for different workloads and may change over time. So the minimum point may also change over time. In this paper, we aim to design a workload division algorithm that dynamically adjusts the workload division to find the energy minimized point at runtime.

4.4 System Design of GreenGPU

In this section, we first introduce a typical hardware configuration of GPU-CPU heterogeneous architectures. We then present the two-tier design of GreenGPU, tar- geting at reducing the system energy consumption. Finally, we discuss the interaction between the two tiers in GreenGPU.

The lower part of Figure 4.3 shows the logic view of a typical GPU-CPU heterogeneous platform. The CPU, main memory, and the GPU are connected by the system bus. The CPU works as the master and the GPU works as the slave in this configuration. Although the GPU is a slave device, it has DMA (Direct Memory Access) to improve its memory capability. The CPU and GPU parts have very different architectures. Compared with the CPU, the GPU part is equipped with enhanced stream processors (SP) organized as stream multiprocessors (SM) to perform high throughput computation.

The SIMD architecture of GPUs is highly optimized for data parallelism and has higher energy efficiency than CPUs for certain parallel workloads. As shown in the upper part of Figure 4.3, GreenGPU is a two-tier solution, running on the CPU. GreenGPU includes a workload division unit and two frequency scaling units (one for CPU and one for GPU, respectively). Both the workload division and the frequency scaling are invoked periodically, however, with different invocation periods.


Figure 4.3: GreenGPU features a two-tier design to reduce the energy consumption of CPU-GPU heterogeneous platforms. The higher tier (i.e., the workload division tier) dynamically partitions the incoming workloads to the CPU and GPU parts. The dash lines connect the components of the workload division part. The lower tier (i.e., the frequency scaling tier) takes the utilization of processing elements (GPU cores, GPU memory, and CPUs) to decide the proper frequency levels of them to reduce energy consumption. The dotted lines connect the components of the frequency scaling part.

In the first tier, the workload division unit dynamically divides the workloads between CPU and GPU based on their execution time to reduce the idling energy.

The dynamic workload division takes place at every iteration. An iteration is defined as the execution of a fixed amount of work. The iteration selection is workload de- pendent. An iteration could be the execution to the reduction point in the algorithm of a workload (e.g., the iteration in kmeans). Or it could be the execution to the common barrier point in a workload with barriers (e.g., the step in hotspot). Since the operations inside each iteration are similar [86], the statistics collected during the previous iteration can serve as a prediction for the execution of the next iteration.

Our target is to dynamically adjust the workload division in every iteration to min- imize the execution time difference between the GPU and CPU parts for minimized

idle energy. If, in the previous iteration, the CPU ran slower than the GPU, we assign less work to the CPU and more work to the GPU in the next iteration; likewise, if the CPU ran faster than the GPU, we assign more work to the CPU and less to the GPU. This light-weight heuristic reduces the idling time through workload division between the CPU and GPU parts to reduce idle energy. Please note that although we give two examples for iterations, the iteration selection is not limited to these two types. As long as the next-iteration execution time on the CPU and GPU parts can be reasonably predicted based on the current iteration, those iteration selections are valid. For example, for workloads in which all the working threads are totally independent, the whole data set could be divided into a series of chunks. We refer to each run on a chunk as an iteration.

In the second tier, once the workload is divided and assigned to the CPU and

GPU, the two frequency scaling units in GreenGPU monitor the utilization of each component, and dynamically adjust the frequency levels of each component to achieve an improved energy efficiency. As we analyzed in detail in Section 4.3, component utilization metric can capture how intense the application is exercising the hardware.

Highly utilized resource needs to be running at a high frequency level; low utilized resource can be throttled to save energy without significantly impacting performance.

Since the GPU cores and memory interact with each other, we propose a coordinated algorithm to assign frequencies to the GPU cores and memory. The inputs of the algorithm are the utilizations of the GPU cores and memory in the current interval.

The outputs of the algorithm are the target frequency levels of the GPU cores and memory for the next interval. The detailed algorithm is presented in Section 4.5. For the CPU part, because there already exists rich research on dynamic voltage and frequency scaling (DVFS), we simply adopt the default Linux power saving strategy by setting the CPU frequency policy mode to ondemand [97], instead of designing a new algorithm. Ondemand is a dynamic in-kernel cpufreq governor that can change the CPU frequency depending on the CPU utilization. The ondemand governor works as follows. If the CPU utilization rises above an upper utilization threshold value, the ondemand governor increases the CPU frequency to the highest available frequency.

When CPU utilization falls below a low utilization threshold, the governor sets the

CPU to run at the next lowest frequency. Ondemand was first introduced in the linux-

2.6.9 kernel. It has been commonly used in a variety of systems with proven success. Please note that other more sophisticated DVFS-based processor power management strategies, such as [50, 129, 110], can also be integrated into GreenGPU for even more energy savings.
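Selecting the stock governor requires no new algorithm on the CPU side; one way to do it programmatically is to write the governor name through the standard cpufreq sysfs interface, as in the sketch below (root privileges assumed; the function name is illustrative).

# Sketch of selecting the stock Linux "ondemand" cpufreq governor for all CPU
# cores via the standard sysfs interface; requires root privileges.
import glob

def set_cpu_governor(governor="ondemand"):
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
        with open(path, "w") as f:
            f.write(governor)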

The workload division tier is invoked periodically to change the workload division between the CPU and GPU parts, which impacts the utilizations of the GPU cores,

GPU memory and CPU. Therefore, to minimize the impact on the stability of the frequency scaling loops in the second tier, the period (i.e., iteration length) of the workload division tier is configured to be much longer than the period of frequency scaling to decouple the two loops. This design choice provides more flexibility for the design of each individual part. Alternatively, we could explore coupled algorithms.

However, workload division commonly has higher overheads than frequency scaling and thus cannot be performed too frequently to deal with short-term workload vari- ations. On the other hand, frequency scaling can be conducted more frequently due to its lower overheads. As a result, the two actuations are designed to run in dif- ferent intervals for the trade-off between overhead and system responsiveness. For the GPU in our testbed, both the core and memory parts have 6 frequency levels.

Therefore, the GPU frequency scaling loop may need 36 periods to test all the 36

(6x6) combinations of core and memory frequencies in the worst case. We select the workload division interval long enough (e.g., no less than 40 times longer than that of the GPU frequency scaling interval) for the DVFS algorithm to converge to the optimal setting within one workload division interval. Therefore, before the next division starts, the DVFS setting has already settled at the optimal point with the current division setting. Since the convergence time is normally short, the impacts of DVFS on workload division are observed to be negligible. Therefore, the two tiers will not interfere in a destructive way.

Algorithm 1 Pseudo code for the online learning frequency scaling scheme
  Initialize weight[N][M]; weight[i][j] ∈ [0, 1], i ∈ N, j ∈ M, with N core frequency levels and M memory frequency levels
  Set up ucmean[N], ummean[M];
  while 1 do
    Read GPU core and memory utilizations uc and um;
    Calculate the core loss function, the memory loss function, and the total loss;
    Update weight[N][M];
    Select the frequency pair corresponding to the highest weight[i][j] for the GPU cores and memory;
    Assign the frequency levels to the cores and memory;
  end while

4.5 GreenGPU Algorithms

In this section, we first present our frequency scaling algorithm for GPU cores and memory. We then introduce our workload division algorithm.

4.5.1 Dynamic Frequency Scaling for GPU Cores and Memory

Our dynamic frequency scaling algorithm aims at assigning frequency levels to GPU cores and memory to save energy with minimal performance loss. Since the GPU cores and memory parts interact with each other, we develop a coordinated algo- rithm to address this issue. We adopt the design framework of WMA (Weighted

Majority Algorithm) [73] to develop our algorithm. In machine learning, WMA is a meta-learning algorithm used to construct a compound algorithm from a pool of

78 configurations. A weighted voting method could be used to find the optimal one(s) [37]. Specifically, we maintain a core-memory frequency pair weight table. Each field records the weight of one core-memory frequency pair. Those weights are updated according to the utilizations of the GPU cores and memory in the previous interval.

The algorithm selects the core-memory pair with the highest weight to enforce in the next interval. Algorithm 1 explains the flow of our algorithm. We first initialize all fields to an equal value (e.g., 0) since we do not have a preference for any specific frequency level in the initial state. After the initialization, we periodically read the utilizations of the GPU cores and memory, update the weight in each field based on its corresponding utilizations, then select the highest weight in the table and enforce the corresponding core-memory frequency pair. We can see that the key part is how to update the weights, which we introduce in detail in the following paragraphs.

As discussed in Section 4.3.1, highly utilized resource needs to run at a high fre- quency level while low utilized resource can be throttled to save energy without sig- nificantly impacting system performance. Therefore, for each component utilization, there may be a corresponding optimal frequency level. However, the available fre- quency levels in our system are discrete. Therefore, the purpose of our core-memory pair table is to evaluate how close the current utilization value is to the most suit- able utilization of each core-memory frequency pair. The suitability for the current

workload is represented by a loss factor (0 ≤ l_i^t ≤ 1, 1 ≤ i ≤ N, where N is the number of available core frequency levels and t is the time interval index), which can be evaluated by comparing the current utilization u of the workload to the most suitable utilization umean of each available frequency level. We define umean in the same way as in [29], which has been studied and validated on CPUs. We assume the peak frequency is suitable for utilization 100%, the lowest frequency is suitable for utilization 0%, and the other utilization/frequency pairs are linearly mapped.

Table 4.1: Loss function used in the GPU frequency scaling algorithm

Value of u        Energy Loss (l_ie^t)    Performance Loss (l_ip^t)
u > umean[i]      0                       u - umean[i]
u < umean[i]      umean[i] - u            0

Table 4.1 shows how to calculate the overall loss (l_i^t). If the current u is smaller than the umean of a utilization level, then the workload stresses the resource less than the current frequency level can deliver. Hence, the resource can afford to run slower. This configuration

has no performance loss (l_ip^t), but since it could have saved energy by running slower,

it has an energy loss (l_ie^t). Similarly, there is a performance loss, but no energy loss, when u > umean. Our metric includes both performance and energy to be general, because it offers a trade-off between performance and energy. Although energy is the product of execution time and power, sometimes a DVFS setting with very low power consumption but a long execution time can be selected if its energy is the lowest. We include the trade-off to prevent this situation, because performance is the primary concern for many HPC applications. Our target is to save energy with only negligible performance degradation. The parameter α used to combine the two losses is user-defined and determines the relative importance of performance vs. energy savings. A smaller

α directs the algorithm to favor performance while a larger α directs the algorithm to favor energy saving. In our system, since energy increases when performance degrades (i.e., a longer execution time), we give a higher weight to performance by setting αc = 0.15 for cores and αm = 0.02 for memory (αc and αm are derived from experiments). Specifically, the loss factors for the GPU cores and memory are calculated as:

l_{ci}^t = \alpha_c \times l_{cie}^t + (1 - \alpha_c) \times l_{cip}^t \qquad (4.5.1)

l_{mj}^t = \alpha_m \times l_{mje}^t + (1 - \alpha_m) \times l_{mjp}^t \qquad (4.5.2)

Then we combine the core and memory losses by a factor φ to get the total loss in Equation 4.5.3. φ balances the impact of the cores and memory on the system performance and energy saving. In our hardware testbed, φ = 0.3 is the value that reflects the system characteristics; it is derived from experiments.

TotalLoss_{ij}^t = \phi \times l_{ci}^t + (1 - \phi) \times l_{mj}^t \qquad (4.5.3)

Based on the total loss, the weights used in the frequency scaling algorithm are updated as follows.

weight_{ij}^{(t+1)} = weight_{ij}^{(t)} \times (1 - (1 - \beta) \times TotalLoss_{ij}^t) \qquad (4.5.4)

In Equation 4.5.4, β (0 < β < 1) is introduced to trade off the current loss factor against the previous history weight. We select β = 0.2 from experiments to filter out limited system noise while still responding quickly to workload changes.

Among the N × M weights (suppose we have N core frequency levels and M memory frequency levels), the highest one is selected and its corresponding core and memory frequencies are enforced in the next period. Please note that we currently derive α, β, and φ from manual tuning due to the lack of an accurate, general, and scalable performance/power model for GPUs, which could be a future direction.
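Putting the pieces together, the per-interval update and selection step can be sketched as follows. The parameter values (αc = 0.15, αm = 0.02, φ = 0.3, β = 0.2) come from the text; the helper names are illustrative, and the weights are assumed to start from a uniform positive value so that the multiplicative update in Eq. (4.5.4) remains meaningful.

# Sketch of the WMA-style loss/weight update of Section 4.5.1.
ALPHA_C, ALPHA_M, PHI, BETA = 0.15, 0.02, 0.3, 0.2

def suitable_utils(n_levels):
    """Linear map of frequency levels to suitable utilizations:
    lowest level <-> 0%, peak level <-> 100%."""
    return [i / (n_levels - 1) for i in range(n_levels)]

def loss(u, u_mean, alpha):
    """Combine energy and performance loss for one frequency level (Table 4.1)."""
    energy_loss = max(u_mean - u, 0.0)   # level delivers more than the workload needs
    perf_loss = max(u - u_mean, 0.0)     # level delivers less than the workload needs
    return alpha * energy_loss + (1 - alpha) * perf_loss

def select_frequencies(weight, uc, um, ucmean, ummean):
    """Update the N x M weight table in place and return the chosen (i, j) pair.
    weight is assumed to be initialized to a uniform positive value (e.g., 1.0)."""
    for i, ucm in enumerate(ucmean):
        for j, umm in enumerate(ummean):
            total = PHI * loss(uc, ucm, ALPHA_C) + (1 - PHI) * loss(um, umm, ALPHA_M)
            weight[i][j] *= 1 - (1 - BETA) * total
    return max(((i, j) for i in range(len(ucmean)) for j in range(len(ummean))),
               key=lambda ij: weight[ij[0]][ij[1]])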

4.5.2 Workload Division

We now introduce how we use execution time as an indicator to divide workloads between the CPU and GPU parts.

We define the percentage of work that the CPU takes in an iteration as r; then the GPU takes the remaining 1 − r percent of the work. The time the CPU uses to finish its work in an iteration is defined as tc, while the GPU's execution time is defined as tg.

When the system finishes the computation of the current iteration, the workload division unit will compare tc and tg. If tc is longer than tg, r will be reduced by one step (e.g., one fixed amount, 5%). If tc is shorter than tg, r will be increased by one step. The 5% division step is hardware platform dependent and decided by experiments. The system takes a long time to converge to the optimal division point if we use a small step. There will be large oscillation if we use a large step.

Since the workload division is not continuous, there may be oscillation between two ratios. For example, if the optimal division is 12.5/87.5 (CPU/GPU), the system will oscillate between 10/90 (CPU/GPU) and 15/85 (CPU/GPU). In our experiments, this oscillation significantly degrades system performance due to the overheads of frequent workload division. Therefore, we introduce a safeguard scheme to avoid this situation. Specifically, we linearly scale the execution times of the GPU and CPU in the previous iteration on both sides based on the possible workload allocation to predict the execution times in the next iteration. If the predicted execution times show that there can be oscillation, we keep using the current division for the next interval. For example, if we have tc < tg for a division of 10/90 (CPU/GPU) in one iteration, we should take 5% of the workload away from the GPU and give it to the CPU based on the algorithm. We now predict the execution times of the CPU and GPU in the next iteration as tc' = (15/10) ∗ tc and tg' = (85/90) ∗ tg, respectively. If tc' > tg', oscillation may happen and so we keep the current division for the next interval.
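A minimal sketch of this division step, including the oscillation safeguard, is given below; the 5% step follows the text, while the function and variable names are illustrative.

# Sketch of the workload division step with the oscillation safeguard.
STEP = 5  # percent of work moved per iteration (from the text)

def next_division(r, tc, tg):
    """r: current CPU share in percent; tc, tg: CPU/GPU execution times of the
    iteration that just finished. Returns the CPU share for the next iteration."""
    if tc > tg:
        r_new = max(r - STEP, 0)      # CPU is the straggler: shift work to the GPU
    elif tc < tg:
        r_new = min(r + STEP, 100)    # GPU is the straggler: shift work to the CPU
    else:
        return r
    if r_new == r or r == 0 or r == 100:
        return r_new                  # no change, or nothing to scale from
    # Safeguard: linearly scale last iteration's times to the candidate division
    # and keep the current split if the slower side would flip (oscillation).
    tc_pred = (r_new / r) * tc
    tg_pred = ((100 - r_new) / (100 - r)) * tg
    if (tc > tg) != (tc_pred > tg_pred):
        return r
    return r_new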

Clearly, our light-weight heuristic cannot completely guarantee reaching the global optimum since we do not exhaust the search space. But our experiments (Section

4.7.2) show that the result is close to the global optimum. We choose to use this light-weight algorithm as a trade-off between solution performance and runtime overheads.

Table 4.2: Summary of workloads used in our hardware experiments.

Workloads       Enlargement                               Description
bfs             65536 iterations                          High core and memory utilization
lud             10 iterations; 8192 by 8192 matrix        Medium core utilization, low memory utilization
nbody           50 iterations                             High core and memory utilization
PF              2048 by 2048 dimensions                   Low core and memory utilization
QG              600 iterations; 16777216 points           Utilizations highly fluctuate
srad v2         2048 columns by 2048 rows                 High core utilization, medium memory utilization
hotspot         2048 by 2048 grids of 600 iterations      Medium core utilization, low memory utilization
kmeans          988040 data points                        Medium core utilization, low memory utilization
streamcluster   65536 points with 512 dimensions          Utilizations highly fluctuate

Please note the focus of this paper is on a holistic energy management framework that integrates higher-level workload division and lower-level hardware resource manage- ment (i.e., frequency scaling) to improve the system energy efficiency. GreenGPU can be integrated with other sophisticated global optimal algorithms (e.g., [46]) for better performance or more energy saving at the cost of more complicated implementation and higher runtime overheads.

4.6 Implementation

In this section, we introduce our hardware testbed and the implementation details of

GreenGPU.

We use GeForce8800 GTX GPU [93] in our testbed. GTX8800 has 16 Stream

MultiProcessors (SM) with the 90nm technology. By using the utility bandwidthTest from the Nvidia SDK, we derive the CPU to GPU bandwidth is 656.3 MB/s and the GPU to CPU bandwidth is 803.3 MB/s. We use an off-the-shelf GPU card but not the latest card (e.g., Tesla series) because it is fully compatible with the management tools such nvidia-smi and nvidia-settings, which are required in our experiments.

Figure 4.4: Hardware testbed used in our experiments, which includes a Dell Optiplex 580 desktop with an Nvidia GeForce GPU and an AMD Phenom II CPU, two power meters, and one separate ATX power supply to power the GPU card. Meter1 measures the power of the CPU side, while Meter2 measures the power of the GPU side.

We select six frequency levels with equal distance in the dynamic range of the core part and the memory part, respectively (e.g., 900MHz, 820MHz, 740MHz, 660MHz, 580MHz, and 500MHz for the GPU memory). Finer frequency levels may introduce a large convergence time; coarser frequency levels may introduce large oscillation. Our selection is a trade-off between the convergence time and oscillation. For example, we use 576MHz, 513MHz, 466MHz, 411MHz, 356MHz, and 297MHz for the GPU core frequency levels, and 900MHz, 820MHz, 740MHz, 660MHz, 580MHz, and 500MHz for the GPU memory frequency levels. We enable the Coolbits attribute of the NVIDIA graphic card and use nvidia-settings shipped with the NVIDIA GPU driver to adjust the core and memory frequencies of the GPU. We use the 4.0 CUDA driver and the 3.2 runtime. nvidia-smi

[94] in Nvidia’s toolkit is used to read the current GPU core and memory utiliza- tions. The CPU in our physical testbed is a dual core AMD Phenom II X2 processor with four available frequency levels as 2.8GHz, 2.1GHz, 1.3GHz, and 800MHz. The operating system is Ubuntu 10.04 with a Linux kernel 2.6.32. We use 2 Wattsup Pro power meters [126] to get the power readings. As shown in Figure 4.4, to measure the power consumption of the CPU and other parts of the system, we put the first power meter between the box and the 110V AC wall outlet. Therefore, the first power

84 meter measures the total power of the CPU side, including the motherboard, disk, and main memory. The GPU card is powered by an independent ATX power supply and its power consumption is measured with the second power meter placed between this ATX power supply and the wall outlet. The second power meter measures the total power of the GPU card.

We use Rodinia [20] and the Nvidia SDK [95] as our workloads. The Rodinia suite provides both CUDA and OpenMP implementations for its applications. We enlarge the data size and/or the number of iterations of the GPU computation kernels in order to get stable power readings. Table 4.2 shows the key parameters. Our workload selection covers all core and memory utilization characteristics, and we also include workloads with dramatic fluctuation in terms of utilization (i.e., QG and SC) to test our GreenGPU framework. We identify QG and SC as high-fluctuation workloads by studying the utilization traces of our workloads. We implement our frequency scaling part as a Python script and run the script as a background daemon process to adjust the GPU core and memory frequency levels.
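The daemon loop itself is small; the sketch below shows one way to poll the GPU utilizations and apply the selected frequency pair. The nvidia-smi --query-gpu flags used here exist in recent driver releases but may not match the testbed-era tool (which may instead require parsing the plain nvidia-smi output), apply_freqs stands in for the nvidia-settings invocation whose attribute names are driver-specific, and the polling period is illustrative.

# Sketch of the frequency-scaling daemon loop. select_pair can be the
# select_frequencies() helper sketched for Section 4.5.1; apply_freqs wraps the
# driver-specific nvidia-settings call.
import subprocess, time

def read_gpu_utilization():
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu,utilization.memory",
         "--format=csv,noheader,nounits"]).decode()
    core_util, mem_util = (float(x) / 100.0 for x in out.split(","))
    return core_util, mem_util

def daemon_loop(select_pair, apply_freqs, period_s=6):
    while True:
        uc, um = read_gpu_utilization()
        i, j = select_pair(uc, um)     # indices into the core/memory level lists
        apply_freqs(i, j)              # enforce the chosen core-memory pair
        time.sleep(period_s)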

The workloads for two-tier design experiments are from Rodinia. Please note programming model that supports GPU-CPU heterogeneous architectures is still in early experimental stages (e.g., OpenCL). For instance, Open Computing Language

(OpenCL) [60] is a C-based programming framework with promising heterogeneous processing support, but it is still in early developmental phases. Therefore, the work- load distribution between the CPU and GPU still requires low-level programming and memory management (e.g., programming a combination of OpenMP or pthreads with CUDA [95]). We adopt a preliminary implementation structure as introduced in [103, 76]: multiple pthreads are launched in the main function. Some pthreads are in charge of CUDA execution (one pthread for one GPU), the other pthreads are deployed on the cores of the main CPU (one pthread for one core). We wrap

85 the CPU and GPU implementations into different kernel functions and launch those kernels in different pthreads. We also implement parameters in those kernels such that the data size mapped to each kernel can be changed when it is invoked. We repeatedly call kernel functions with different data sizes to implement the workload division. We implement our workload division algorithm within the application code.

Currently, the main program spawns pthreads for both CPU and GPU versions of implementation for the benchmarks in our evaluation. However, new programming interface like OpenCL offers opportunities to have just one copy of implementation that can be deployed on both CPU and GPU (or even DSP) platforms, which will significantly reduce the programming effort.

Due to the future system integration trend, the energy and power management algorithm might need to be implemented on-chip [52]. However, since workload di- vision within an application is a software problem by nature, it does not suit for on-chip hardware implementation. Therefore, we only sketch the possible hardware implementation for our frequency scaling tier. For monitoring, all the statistics we need in our algorithm can be derived from performance counters, which are already available on most modern CPUs and GPUs. There is no extra monitoring hardware required. The key part in the frequency scaling tier is the core-memory pair weight table. We need a N × M table to record the core-memory frequency pairs. Because the loss factor value is between 0 and 1. 8-bit precision is accurate enough for the purpose of picking up the largest weight. For our testbed with 6 core frequency levels and 6 memory levels, we only need a 36 bytes table (6x6x8). The weights are up- dated based on Section 4.5.1. Please note the multipliers in Equations 4.5.1 to 4.5.3 are one coefficient fixed, they can be highly optimized to a simple shift-add logic.

Scaled to 8-bit and current 65nm technology, the adder presented in [81] only con- sumes 0.001mm2 and 12.5×10−9J each invocation. The leakage power of the 36-byte

86 95 Core Frequency Memory Frequency GPU scaling 1200 120 1200 120 Best performance Core Utilization Memory Utilization 75 800 80 800 80

55

400 40 400 40 Power (w)

0 0 (%) Core utilization 0 0 35 Core frequency Core frequency (MHz)

0 6 12 18 24 30 36 0 6 12 18 24 30 36 utilization(%) Memory 0 6 12 18 24 30 36

Time (s) frequency(MHz) Memory Time (s) Time (s) (a) Core part trace (b) Memory part trace (c) Power trace Figure 4.5: Frequency scaling algorithm adjusts the frequencies of GPU cores and memory based on their utilizations respectively to save energy without increasing execution time. storage and the adder is negligible compared to that of billions of transistors in a modern GPU. Therefore, the hardware structure of our frequency scaling algorithm is area and energy efficient to be implemented on-chip.

4.7 Experiments

In this section, we first evaluate the GPU frequency scaling. We then test the work- load division tier. Finally, we present the results of GreenGPU as a holistic energy management solution.

4.7.1 Frequency Scaling for GPU Cores and Memory

In this experiment, we enable the frequency scaling tier but disable the workload division tier (i.e., all the workloads are put to the GPU) to test the performance and energy savings of the frequency scaling algorithm. We use the best-performance policy as our baseline. Best-performance sets both core and memory frequencies always at the highest level (i.e., 576MHz for cores and 900MHz for memory). We compare our frequency scaling algorithm with best-performance to show that our algorithm can achieve considerable energy savings with only negligible performance loss. Figure 4.5 shows the trace file of a typical run of our frequency scaling with the

87 16 80 20 GPU Scaling Dynamic Energy Saving CPU/GPU Scaling 12 60 15

8 40 10

4 20 5 Energy saving (%) saving Energy Energy saving (%) saving Energy 0 (%) saving Energy 0 0

(a) GPU energy (b) GPU dynamic energy (c) System energy Figure 4.6: Energy saving compared with best-performance for different workloads. streamcluster workload. Our experiment starts with the frequencies of cores and memory running at the lowest levels, which is the default case for a GPU. Figures

4.5a and 4.5b show that the core and memory frequencies are generally directed by their utilizations. In Figure 4.5a, the utilization of cores starts to ramp up from the

6th second. Since our frequency scaling interval is 3s in this test, at the 9th second

(i.e., the immediate next period after the utilization increase), the frequency of cores is adjusted to be higher. Since our algorithm evaluates the loss value of all possible frequency levels, it can adjust the GPU core and memory frequencies directly to the best levels according to the utilizations. In Figure 4.5b, the memory frequency converges to 820MHz, which is lower than the peak frequency (i.e., 900Mhz) and so results in energy savings. As shown in Figure 4.5c, the average power consumption of our algorithm is lower than that of best-performance throughout the experiment, but the execution time (i.e., performance) is similar. As a result, the energy efficiency is improved.

Figure 4.6 presents the energy saving percentage of our scheme compared with best-performance for different workloads. In Figure 4.6a, GPU scaling is the measured result of our core-memory coordinated frequency scaling algorithm. Our algorithm saves 5.97% on average and up to 14.53% of GPU energy. In Figure 4.6b, we present the energy savings in termsofdynamicGPUenergy. Dynamic Energy Saving num- bers are calcuated by subtracting the idle energy from the runtime energy. Figure

88 4.6b shows that our approach saves 29.2% of dynamic energy on average with only 2.95% longer execution time than best-performance. In Figure 4.6c, CPU/GPU scal- ing is the result when we throttle both the CPU and GPU for maximized energy savings. The key idea is that the CPU frequency can be throttled down for energy savings with asynchronized GPU-CPU communications, when the GPU part is do- ing all the computation. However, due to the limitations in the implementation of synchronized GPU-CPU communications used in our benchmarks, the CPU has a utilization of 100% even when it is idling and the GPU is doing all the work. As a result, the on-demand CPU frequency governor of Linux used in GreenGPU fails to throttle the CPU frequency for energy savings. We therefore emulate this case to highlight the energy saving potential of dynamically throttling both the CPU and

GPU. In our emulation, we conservatively assume that the CPU frequency cannot be throttled if the CPU needs to communicate with the GPU at any time, such as the workload launching and ending times. When the CPU is idling and its frequency can be throttled without impacting the system performance, we replace the CPU energy with the average CPU energy at the lowest frequency level to emulate that CPU frequency is throttled to the lowest level. Figure 4.6c shows that the average energy saving is 12.48% if both CPU and GPU are throttled. Note that we do emulation only in Figure 4.6c. Based on Figure 4.6 and the corresponding workloads in Table 4.2, we can make the following observations. First, for workloads with phase fluctuation, such as QG and streamcluster, our scheme can achieve energy savings because we dynamically detect the on-line utilization information of the cores and memory and dynamically adjust frequencies accordingly. Second, for applications with a lower average utiliza- tion (either core part or memory part, such as PF and lud), our scheme yields good energy savings. However, for the applications with high utilization rates, such as

89 100 200 100 200 80 160 80 160 60 120 60 120 40 80 40 80 Time (s) 20 40 Time (s) 20 40 0 0 0 0 Division ratio (%) ratio Division Division ratio (%) ratio Division 12345678910 12345678910 Computation iterations Computation iterations

GPU CPU GPU Execution Time CPU Execution Time GPU CPU GPU Execution Time CPU Execution Time (a) Workload division and execution time of(b) Workload division and execution time of kmeans. hotspot. Figure 4.7: Workload division algorithm adjusts the workload allocation between the CPU and GPU parts to minimize idling energy on either side caused by waiting for the other (slower) side. bfs, the energy savings are smaller. This is because if all the resources are occupied, throttling either core or memory frequency will significantly increase execution time, resulting in increased energy consumption.

To summarize this part, our scheme is effective for both phase-stable and phase-

fluctuating workloads, and it performs better for the workloads with low utilizations of either GPU cores or memory than the workloads with high utilizations of both

GPU cores and memory.

4.7.2 Workload Division between GPU and CPU

In this section, we enable the workload division tier but disable the frequency scaling tier to investigate the effectiveness of our workload division algorithm.

Figure 4.7 presents the traces of the workload division for kmeans and hotspot.

In Figure 4.7, the X-axis is the iteration sequence number; the left Y-axis is the workload division percentage; the right Y-axis is the execution time. The triangle dot is the execution time of the GPU part in the corresponding iteration. The round dot is the execution time of the CPU part in the corresponding iteration. In Figure

4.7a, the initial division ratio is set to be 30% workloads on the CPU part. We pick up 30% here in order to have a faster convergence. In real usage, this value

90 can be set to an arbitrary ratio (e.g., 50%). Since we use 5% as workload division step, in the worst case, we need 10 iterations if we start from the 50% division point.

In our experiments, we find that setting initial ratio to 50% to 30% can help to converge the balanced workload division in a shorter time. However, we will show that our algorithm converges to the balanced workload division regardless of this initial division ratio. In the 1st iteration, the CPU execution time is much longer than the GPU execution time. Our division algorithm takes one piece of workload from the CPU and assigns it to the GPU part. The execution time difference between the CPU and GPU becomes smaller in the 2nd iteration. But the CPU execution time is still much longer than the GPU execution time. Our division algorithm takes one more piece of workload from the CPU and assigns it to the GPU part. In the

3rd iteration, the execution times of the two parts become even closer. The process repeats until the execution times on both sides are roughly the same after 4 iterations. The rationale of this adjustment is to minimize the idling energy caused by waiting for the slower side, as discussed in Section 4.1. Figure 4.7b shows a similar case for another workload, hotspot. As demonstrated in figures 4.7a and 4.7b, our algorithm can dynamically adjust the workload division based on the runtime execution time difference between the CPU and GPU parts in a GPU-CPU system regardless of its initial division ratio. To examine how close the result of our workload division algorithm is to the optimal division point with the minimum energy consumption, we have also conducted a series of experiments to test static workload division from 0/100 to 100/0 (CPU/GPU) with a step size of 5. For kmeans, we find that the energy- minimum division is 15/85 (CPU/GPU). In comparison, our algorithm converges to 20/80. For hotspot, the energy minimum division is 50/50 (CPU/GPU), while our algorithm converges exactly to 50/50 and obtains 99% of the maximum saving. The 1% difference is mainly due to 1) the higher energy consumption before the

91 100 60 100 35 80 50 80 32 60 60 29 40 40 40 26 20 30 20 23 Energy (KJ) Energy Energy (KJ) Energy 0 20 0 20 12345678910 12345678910 Division Ratio (%) Ratio Division Division Ratio (%) Ratio Division Computation Iterations Computation Iterations

GPU CPU GreenGPU Division Frequency Scaling GPU CPU GreenGPU Division Frequency Scaling (a) hotspot. (b) kmeans. Figure 4.8: Energy and workload division ratio trace in respect of the iterations. GreenGPU outperforms workload division only and frequency scaling only on energy savings. convergence, and 2) the overheads of dynamic workload division. For the workload division part only, our solution only has 5.45% longer execution time than the optimal division.

4.7.3 GreenGPU as a Holistic Solution

We now enable both the workload division and the frequency scaling tiers to test

GreenGPU as a holistic solution. We show that such a holistic solution leads to more energy savings than each individual scheme.

Figure 4.8 shows the runtime traces of hotspot and kmeans. In Figure 4.8, the X- axis is the iteration sequence number; the left Y-axis is the workload division percent- age; the right Y-axis is the energy consumption in each iteration. The triangle dots, hollow round dots, squares are the energy consumption of GreenGPU, Division (with frequency scaling disabled), and Frequency-scaling (with workload division disabled) in the corresponding iteration. In Figure 4.8a, GreenGPU consumes less energy than

Division and Frequency-scaling. Compared with Division, GreenGPU’s frequency scaling units observe the energy saving opportunities when the GPU cores, GPU memory, or CPU has a low utilization. GreenGPU then throttles the frequencies of them based on the proposed frequency scaling algorithm. By doing that, GreenGPU achieves lower energy consumption than Division. Compared with Frequency-scaling,

92 GreenGPU has more energy savings by dynamically dividing workloads between the GPU and CPU. By balancing the workload between GPU and CPU, GreenGPU can minimize the time idle energy consumed by GPU or CPU to wait for the other side to

finish. For hotspot, GreenGPU achieves 7.88% more energy saving than Division and

28.76% more than Frequency-scaling. Figure 4.8b shows a similar case for a different workload, kmeans. In our testbed, Division contribute more to energy saving than

Frequency-scaling in holistic solution because nvidia-settings on GeForce8800 only conducts frequency scaling [94]. If DVFS is enabled, we expect more energy saving can be achieved from frequency scaling. GreenGPU saves 1.6% more than Division and 12.05% more than frequency scaling for kmeans. The default runtime config- uration of Rodinia is that all the workloads are allocated to the GPU and all the frequencies are at their peak levels [20]. Compared with that, GreenGPU can achieve on average 21.04% energy saving for kmeans and hotspot. For the holistic solution as a whole, GreenGPU has 1.7% longer execution time than workload-division-only.

4.8 Conclusion

Current research on GPU-CPU architectures focuses mainly on the performance as- pects, while the energy efficiency of such systems receives much less attention. There are few existing studies that start to lower the energy consumption of GPU-CPU ar- chitectures, but they address either GPU or CPU in an isolated manner and thus can- not achieve maximized energy savings. In this paper, we have presented GreenGPU, a holistic energy management framework for GPU-CPU heterogeneous architectures.

Our solution features a two-tier design. In the first tier, GreenGPU dynamically splits and distributes workloads to GPU and CPU based on the workload characteristics, such that both sides can finish approximately at the same time. As a result, the en- ergy wasted on staying idle and waiting for the slower side to finish is minimized. In

93 the second tier, GreenGPU dynamically throttles the frequencies of GPU cores and memory in a coordinated manner, based on their utilizations, for maximized energy savings with only marginal performance degradation. Likewise, the frequency and voltage of the CPU are scaled similarly. We implement GreenGPU using the CUDA framework on a real physical testbed with Nvidia GeForce GPUs and AMD Phenom

II CPUs. Experiment results show that GreenGPU achieves 21.04% average energy savings and outperform several well-designed baselines.

94 CHAPTER 5

INTEGRATING THERMOELECTRIC COOLERS AND

FANS FOR ENERGY EFFICIENCY

5.1 Introduction

Semiconductor industry has observed the end of classic Dennard scaling [33] in recent generations of process technology scaling. As the transistor integration outpaces the supply voltage scaling-down, the power density of microprocessor chips increases rapidly [107]. Traditionally, computer systems rely on forced convection effect to cool

Fan speed Fan Heat i dissipated

TEC Cu iTECool N P on/off Heat Spreader Thermal interface Heat absorbed DVFS levels Chip material Substrate (TIM) TEC TEC Peltier cooling Figure 5.1: The side view of the target chip packaging and TEC cooling effect. TECs are embedded between the heat spreader and the processor chip in the thermal in- terface material (TIM) layer. By applying current to the TEC, heat can be pumped from one side to the other side of this film device. iTEcool coordinates the fan and multiple TECs to improve the overall cooling efficiency. In addition, iTECool also coordinates DVFS level of each core and the cooling subsystem (TECs and fan) to reduce the energy consumption of the entire system (i.e., processor, fan and TECs).

95 the processors. The cooling fans are driven by motors with feedback controllers, such that the speed of the fan is adjusted by on board firmware or management algorithms running on CPU [107]. By adjusting the fan speed, we can provide flexible cooling to the processor. Unfortunately, this solution has following limitations. First, the fan speed increases at the cost of cubic increase in its power consumption [4]. The fan system power in high-end servers has reached up to 51% of the total server power consumption [66]. Furthermore, the fan systems can only provide global cooling, which cannot efficiently address the multi-core hot spot issues [19]. For example, among all the cores, only several cores (specifically, only certain components) appear to be hot, the fan(s) have to work at a very high speed, resulting in low cooling efficiency. Alternative more efficient cooling solutions need to be investigated.

Thermoelectric cooler (TEC) is a new kind of film material that can actively pump heat from one side to the other side when current is applied. Compared with conventional fan-based cooling system, TEC-based cooling has promising potential to manage the local hot spots to improve overall cooling efficiency [21]. However,

TEC is an active device that itself also generates heat within processor package.

The overall thermal effect of TEC is the competing effect of Peltier cooling (surface effect) and Joule heating (volume effect) [43]. Without careful management, TEC might heat the chip instead of cooling it down. Furthermore, in a TEC-based cooling design, previous studies usually assume a fixed fan speed [9] without exploring the coordination between TEC and fan. Since adjustable fan can provide flexible global cooling ability and TECs excel in local hot spots removal, intelligently coordinating the two can improve the overall cooling efficiency in different scenarios, which needs to be explored.

Other than improving the cooling subsystem efficiency, dynamic thermal man- agement (DTM) with throttling processor execution to lower temperature has been

96 widely studied [16, 109, 30]. Among those studies, dynamic voltage and frequency scaling (DVFS) has been shown to be one of the most successful knobs due to its cubic dynamic power reduction at only linear performance degradation. The key idea of

DTM can be summarized as: when the temperature is higher than a certain thresh- old, we lower the DVFS level of the processor to reduce heat dissipation; when the temperature is lower than the threshold, we increase the DVFS level of the processor to improve performance. However, high temperature on the processor is usually cor- related to high activity, which exactly the time that a processor needs high frequency the most for better performance. Throttling processors at high temperature may significantly degrade the system performance. On the other side, lower temperature usually means less activity or more stalls (e.g., many memory accesses), increasing the

DVFS level at those times may not improve performance a lot. Compared with throt- tling at high temperature and boosting at low temperature, it seems more intuitive to boosting at high temperature and throttling at low temperature from performance perspective. However, in order to avoid overheating, cooling subsystem needs to be coordinated with computational part, which needs detailed analysis.

Based on the above observations, this paper presents iTECool, a highly con-

figurable TEC-based cooling framework for energy efficiency. iTECool uses well- established power, performance, and thermal models to formulate the global energy management problem as a multi-variable energy efficiency optimization problem with temperature constraint. Since the search space of the optimization problem is pro- hibitively large, iTECool uses an efficient multi-step down-hill algorithm to solve this problem. We evaluate iTECool with extensive experiments on our simulation plat- form. The results show that iTECool can save the system energy by 10% compared with state-of-the-art baselines. In summary, this paper makes the following major contributions.

97 • The paper proposed a highly configurable optimization framework based on well-established power, performance, and thermal models.

• The paper applies the optimization framework on the overall energy conserva-

tion problem with TEC device as part of the cooling solution.

• To reduce the algorithm running time, this paper proposes simplified algorithm to solve the optimization problem.

The rest of this paper is organized as follows. Section 5.2 highlights the differences between this paper and related work. Section 5.3 sketches the system design. Section

5.4 introduces our simulation setups and the implementation details of our solutions.

Section 5.5 presents the evaluation results. Section 5.6 concludes this paper.

5.2 Background

Chowdhury et al. [21] have shown a prototype chip of thin-film TEC using nanotech- nology and presented key parameters. Gupta et al. [43] have examined the transient temperature behaviors of TEC devices. Those studies focus on physical parame- ter characterization. At the architecture level, Long [75] have studied the optimal amount of TECs and their locations on chips. Chaparro et al. [19] have conducted several case studies on how to manage the on/off state of TEC devices. Biswas et al

[9] have shown that using TEC to improve cooling efficiency can save cooling cost in data centers. Although those studies all explicitly or implicitly assume the existence of fans in the cooling subsystem, they do not discuss the coordination between TEC and fans. In contrast, iTECool focuses on the coordination between fan and TEC to improve overall cooling efficiency.

There have been several early-stage studies on joint energy co-optimization of the cooling and computational power. Shin et al. [107] have discussed the co-optimization 98 between the CPU and fan. Ayoub [4] have introduced a cooling-aware task scheduling. However, those studies do not consider the local cooling capacity that TEC can offer.

Compared with discussing the trade-off between the fan and CPU, iTECool addresses a more challenging multi-variable optimization problem.

5.3 System Design

In this section, we present the system design of iTECool. The target system is illustrated in Figure 5.1. TECs are embedded between the heat spreader and the processor chip in the thermal interface material layer. We assume using through silicon via (TSVs), a widely-used technique in 3-D stacking, to connect the TEC device and on-chip power delivery network (PDN) and use the power transistors to control the on/off state of the TECs. By applying current to TEC, the heat can be pumped from the chip side to heat sink side. In this way, we enable fine-grain local cooling. By adjusting the fan speed, we change the global cooling. We discuss the coordination between the fan and multiple TECs to improve the cooling efficiency, we also include DVFS as an extra knob to coordinate the computational power and the cooling power.

By adjusting the TEC on/off state, the DVFS level of each core, and fan speed, we minimize the energy of the entire system, including the cooling (TEC and fan) and computational (processor) parts, with the constraint that the peak temperature is always below a safe threshold.

99 5.3.1 Thermal Model

A widely-used steady state thermal model (e.g., in [75]) for multiple components on achipis

G(k)Ts(k)=P(k) (5.3.1)

T where Ts(k)=[Ts1(k),Ts2(k), ··· ,TsN (k)] is the steady state temperature vector th T of all the components at the k interval. P (k)=[P1(k),P1(k), ··· ,PN (k)] is the power vector of all the components at the kth interval. G(k) is the thermal conduc- tance matrix among the components. ⎡ ⎤ ⎢ g11(k) ··· g1N (k) ⎥ ⎢ ⎥ ⎢ . . . ⎥ G(k)=⎢ . .. . ⎥ (5.3.2) ⎣ ⎦ gN1(k) ··· gNN(k)

gij(k)=gfan(k)+gTECij (k)+gothersij (k) (5.3.3)

th where gij(k) is the thermal conductance between node i and j at the k interval (gij(k) is 0 if node k and j are not adjacent). Therefore, G(k) is a band matrix with only main diagonal and two minor diagonals adjacent to the main diagonal (totally only three diagonals) have nonzero elements. gfan(k) is the thermal conductance of

the fan. gTECij (k) is the thermal conductance of the TEC device (gTECij (k)is0if i = j). In Equation (5.3.3), by changing the fan speed, we can adjust gfan(k); by

changing the on/off state of TECs, we can adjust gTECij (k). gothersij (k)isthethermal conductance of the rest of the package, which we assume as a constant. If DVFS is enabled in the system, we can also change P(k) through DVFS.

100 Please note Equation (5.3.1) only presents the steady state temperature. In the following process, we derive the transient temperature estimation in iTECool. We use scalar form for concision. The matrix induction process is very similar. We start with the well-known RC thermal model [109],

dT (t) 1 1 = ∗ P (t) − (T (t) − Tα) (5.3.4) dt Cth RthCth

where T (t) is the processor temperature. Tα is the ambient temperature. Rth is the thermal resistance, Cth is the thermal capacitance. P (t) is the power feeding into the

RC model. By solving Equation (5.3.4), we have

T (t)=(1− κ) ∗ Ts+ κ ∗ T (t0) (5.3.5)

− t−t0 κ = e Rth∗Cth (5.3.6)

where Ts = Tα + Rth ∗ P (k) is the steady state temperature. T (t0) is the initial tem- perature. Equation (5.3.5) shows that the transient temperature can be interpolated by using the initial temperature and the steady state temperature. By discretizing

Equation (5.3.5), we derive

T (k)=(1− κ) ∗ Ts+ κ ∗ T (k − 1) (5.3.7)

− Δt κ = e Rth∗Cth (5.3.8) where Δt is the time interval. By minus T (k −1) from both sides of Equation (5.3.7),

101 we get

ΔT (k)=(1− κ) ∗ (Ts− T (k − 1)) (5.3.9)

The matrix format is

T(k)=(1− κ) ∗ (Ts(k) − T(k − 1)) + T(k − 1) (5.3.10)

By solving Equation (5.3.1), we can get Ts(k). Then we plug Ts(k) into Equation

(5.3.10), we can predict the transient temperature results of our actuators (e.g., TEC on/off, fan speed, and core level DVFS). In matrix form, it is very complicated to compute κ, we conservatively use 1 − κ =Δt as an approximation when Δt is small

(e.g., at the order of microseconds, the same approximation is used in [121]). When

Δt is large (e.g., at the order of seconds), we directly use Ts(k)asT(k).

5.3.2 Power and Performance Model

Fan and TEC power model. The power and air flow rate (e.g., used to calculate gfan(k)) can be derived from a fan datasheet [31]. TEC power and conductance

gTECij (k) can also be derived from product datasheet. In this paper, we use the parameters in [75].

Processor power model.

Pi(k)=Pleaki (k)+Pdyni (k) (5.3.11)

∗ − − ∗ Ai Pleaki (k)=(PTDPleak + α (Ti(k 1) TTDP)) (5.3.12) Achip

102 − ∗ Fi(k) ∗ Vddi(k) 2 Pdyni (k)=Pdyni (k 1) ( ) (5.3.13) Fi(k − 1) Vddi(k − 1)

where Pi(k) is the total power of component i, Pleaki (k) is the leakage part, Pdyni (k)

is the dynamic part. PTDPleak is leakage power under TDP status, which could be estimated by using datasheet and used as a constant in our on-line estimation. Ai is the area of component i; Achip is the chip area. Using linear temperature model to estimate leakage power within a limited temperature range has been shown rea- sonably accurate in [107, 114]. Equation (5.3.13) has been widely used in dynamic power estimation. In real processors, the estimation of per-structure dynamic power − Pdyni (k 1) by monitoring only six performance metrics has been proposed in [100]. Processor power model. We estimate the execution speed of the entire chip as:

N−1 N−1 − ∗ Fi(k) IPSchip(k)= IPSi(k)= IPSi(k 1) − (5.3.14) i=0 0 Fi(k 1)

th where IPSchip(k) is the number of instructions per second at k interval, which is the summation of the number of instructions of each core IPSi(k). IPSi(k)isestimated by last interval IPS of each core IPSi(k − 1) scaled by the frequency change ratio Fi(k) . Fi(k − 1)

Enchip = Pchip ∗ T imechip (5.3.15)

N−1 M−1

Pchip = Pcorei + PTECj + Pfan (5.3.16) i=0 j=0

103 2 PTEC = r ∗ i + αiΔθ (5.3.17)

where PTEC is TEC power. i is current applied. r and α are material paramters.

Δθ is the temperature difference between the two sides of the TEC device. Since applying more than 8A has been identified as dangerous to introduce overheating

[75]. We conservatively assume we apply 6A current. Therefore, the PTEC has a linear relationship with the Δθ.

1 T imechip ∝ (5.3.18) IPSchip

where Enchip is the energy of the entire chip. Pchip is the power of the CPU and cooling components. We have N cores and M pieces of TEC devices in our packaging.

T imechip is the execution time of the entire chip, which is inverse to the execution speed.

5.3.3 Problem Formulation

Our goal is to minimize the system energy, including both the cooling and compu- tational part, with maximum temperature constraint by adjusting TEC on/off state, core-level DVFS, and fan speed. In formal formulation:

N−1 M−1 ( i=0 Pcorei + j=0 PTECj + Pfan) min{Enchip = } (5.3.19) N−1 Fi(k) i − ∗ 0 (IPS (k 1) Fi(k−1) )

104 subject to

max{T (k)}≤Tth (5.3.20)

by adjusting TEC on/off state (impact G(k)andPTECj ), core-level DVFS (impact P (k), Pcorei ,andFi), and fan speed (impact G(k)andPfan).

5.3.4 Heuristic Solution

To solve the problem formulated in Section 5.3.3 in general, exhaustive search needs

Lcore to be used to derive the optimal result. The entire search space is (Ncore) ∗

LTEC (NTEC) ∗ Lfan (Ncore is the number of cores; Lcore is the DVFS levels of each core; NTEC is the number of TEC devices; LTEC is the number of state levels of each TEC device; Lfan is the levels of the fan). However, this number of different combined configurations is prohibitive for on-line search. We develop a multi-step down-hill heuristic to simplify the solver based on the following observations: 1) The

TEC modulation impacts local thermal characteristics; per-core DVFS impacts the core-level power dissipation; the fan speed adjustment impacts the global thermal characteristics. 2) The effective time of different knobs is different. The fan cooling effect takes place through heat sink and heat spreader, which the thermal capacitance is normally hundreds of Joule per Kelvin (several seconds to stable). In contrast, the

TEC and per-core DVFS modulation’s effect engages much faster. The cooling effect of TEC device (i.e., Peltier effect) takes up to 20μs [43] to engage. The overhead of per-core DVFS is decided by the on-chip voltage regulator, which is reported to be at the order of tens of nanoseconds [62]. Those two key observations inspire us to develop the following two-level hierarchical solution Co-op. At the low level (the fine time scale, e.g., 2ms, TEC engage overhead is 20μs, DVFS overhead is 100ns [130]),

105 Estimate the En chip if turn off the TEC on top of coolest component, or raise the DVFS level of one core by one level, select the adjustment that saves the most energy cool Y iteration Iterations end until the condition does not max{Tˆ(k)} T th hold, then apply all the hot adjustments iteration N

Estimate the En chip if turn on the TEC on top of hottest component, or lower the DVFS level of the hottest core by one level, select the adjustment that saves the most energy

Figure 5.2: Multi-step down-hill greedy algorithm (Co-op) flow chart. Based on the thermal, power, performance models (Equation (10)-(17)), Co-op estimates the next step possible energy if certain adjustment is used; it then selects the adjustment that has the smallest energy consumption within the temperature constraint. If the current temperature is lower than the threshold, Co-op compares the energy of turning off TEC and raising the DVFS level of one core; if the current temperature is higher than the threshold, Co-op compares the energy of turning on TEC and lowering the DVFS level of one core. Co-op takes the steps forward multiple steps along the small energy adjustment direction until the temperature constraint is achieved.

we use per-core DVFS and TEC device on/off state modulation as knobs to save energy as well as control temperature. The key idea is to use the thermal, power, and performance models presented in previous sections to estimate the next step status if certain adjustment is used. We then select the adjustment that has the smallest energy consumption within the temperature constraint. At the high level (the coarse time scale, e.g., seconds), we use the fan speed modulation as a knob to control temperature and save energy. Our down-hill algorithm works as follows.

In each low level interval, we assume the fan speed is fixed and use Equation

(5.3.1, 5.3.10) to estimate the next interval temperature of adjusting the power of each core and turn on/off the TEC device.

As shown in Figure 5.2, if some components are hotter than the threshold (i.e., “hot iteration”, connected by the red line). We estimate and compare the next step

106 energy of lowering the hot core or turning on the TEC on top of the hot area. We then select the one that offers the smaller energy consumption to estimate the temperature.

We define this energy estimation, comparison, and temperature estimation process as one iteration. If the temperature is still higher than the threshold, we repeat the iterations until the estimated temperature is lower than the threshold. If there still exist hot spots somewhere on the chip, we repeat the iteration process until all the hot spots are resolved. As shown in Figure 5.2, if there exist no hot spots (i.e., “cool iteration”, connected bytheblueline).WeevaluatetheopportunitytochangetheDVFSlevelsorturnoff the TEC device to save energy. We first calculate the energy of increasing the DVFS level of a core by one step, then we calculate the energy of turning off the TEC on top of the coolest components. We adopt the configuration that can save the most energy. We then estimate the temperature. If there still exist no hot spots, we use the new configuration as the starting point of our next iteration search until there is no opportunity to conserve energy consumption by increasing the DVFS level of each core or turning off the TEC without violating the temperature constraint.

In each fan interval, we assume the average power and DVFS levels of the last fan interval as the power reading and DVFS level configuration. We also assume the average TEC state in the last fan interval as TEC state, which means we have middle state besides the on/off state. Then we predict the next step temperature by using Equation (5.3.1, 5.3.10). If there exists hot spots, we increase the speed of the fan speed level until hot spots disappear. If there is no hot spots. We evaluate the opportunity to slow down the fan to save energy. We lower the fan speed until the hot spots appear.

107 5.3.5 Hardware Cost

We estimate the hardware cost of our proposed solution in this section. The key hardware overhead of iTECool is the temperature estimation (calculating Equation (5.3.1, 5.3.10). Please note we use thermal conductance matrix G in consistence with our evaluation framework HotSpot simulator. Because HotSpot uses thermal conduc- tance matrix. In real hardware implementation, instead of using thermal conductance matrix, we can use thermal resistance matrix. In that case, the inverse calculation could be bypassed and only the matrix-vector-multiplication is left. Since the thermal impact only takes place on adjecent or nearby components, G is by-nature a band matrix, which means the matrix only has nonzero elements along the main diagonal and some additional minor diagonals (typically only one) on either side of the main diagonal. Band matrix and vector multiplication can be implemented in systolic array

[85], a very speed and space efficient hardware. Since the inter-core thermal impact is limited in tile-structured many-core architecture. We only evaluate the temperature of one core each time. We analysis the power and area cost of a very aggressive design, in which the systolic array gives the temperature results of one core in one cycle, we need K × M fixed-point multiplication (M is the number of components in one core,

K is the number of how many component we assume they have thermal impact).

For power and thermal comparison, 8-bit encoding is sufficient. Bitirgen et al. [10] have estimated that a 16-bit fixed-point multiplier yields an area of 0.057mm2 with

65nm process technology. For a typical 200mm2 die, the hardware overhead is only 0.03%. To approximate the power consumption by the fixed-point multiplier circuits in the hardware, we assume the power density of IBM POWER6’s FPU [26], which is

0.56W/mm2 at 100% utilization with nominal voltage and frequency values (1.1V and

4 GHz). The extra power consumed by the fix-point multiplier is only 0.03W. Since in our design, we evaluate 18 components and assume only adjacent components have

108 Table 5.1: Simulation Configuration. Memory 200 cycles Core Alpha 21264 like I- and D-cache private, 16KB, 4-way, 64B, 1 cycle L2 cache private, 256KB, 16-way, 64B, 12 cycles DVFS levels 1 1.14V/1.0GHz, 1.05V/750MHz, 0.9V/500MHz, 0.7V/125MHz

FPMap IntMap Int_Q IntReg Core1 Core2 Core3 Core4 FPMul LdSt_Q IntExec FPReg FP_Q FPAdd ITB Voltage Core5 Core6 Core7 Core8 Bpred DTB Regulator i-cache d-cache

Core9 Core10 Core11 Core12

L2

Core13 Core14 Core15 Core16 Router

(a) Chip floor plan. (b) Core tile floor plan.

Figure 5.3: Simulated processor floor plan. The chip floor plan is scaled based on the Intel SCC 48-core chip [49]. A core tile is half the size of the daul-core tile on SCC. The component placement and relative size is scaled with Alpha 21264. The router size is the same with SCC on-chip router. We estimate the voltage regulator size based on 0.5W delivered power/mm2 measurement on a prototype on-chip regulator [62]. Chip floor plan: 10.4mm×14.4mm, a 4×4 core tile array. Core tile floor plan: 2.6mm×3.6mm.

thermal interaction. We use 54 = M ∗ K =18∗ 3 8-bit fixed-point multiplications in order to evaluate the temperature of one core in one cycle, only adding less than

1.7% extra area and power to the target chip. The other computation of iTECool can time-share the calculation unit of the temperature estimation part. Therefore, iTECool is an affordable solution in terms of power and area cost.

1Our DVFS level setting is consistent with [49].

109 5.3.6 Per-core DVFS assumption

Please note iTECool does not rely on per-core DVFS, we only per-core DVFS to show that even with per-core DVFS enabled, iTECool is still able to handle the large searching space that is introduced by this extra knob. iTECool can be integrated with chip-level DVFS seamlessly. Even without DVFS, iTECool can stand alone as a solution to improve cooling subsystem efficiency. Due to its capability to explore inter-core runtime variations of different workloads, per-core DVFS has been shown to be effective to address the thermal issues on multi-core processors. With the popularity of digital phase-lock loop (DPLL), per-core frequency scaling has been available on main-stream processors (e.g., IBM POWER7 [125]). However, to enable per-core DVFS, we still need to deploy the voltage regulator (VR) on-chip to provide fast voltage adjustment. Novel voltage regulator (VR) (e.g., on-chip VR) has drawn a lot of research attention due to its potential to enable fine-grain DVFS and reduce off-chip power pins requirement [38]. Kim et al. [63] have analyzed some key design trade-offs in deploying on-chip VR on processors. Eyerman et al. [35] have shown the power-performance benefits of fine-grain DVFS enabled by on-chip VR. Recently,

IBM Watson lab has demonstrated a 2.5D on-chip VR, showing the great promise of having per-core DVFS. However, the area overhead of on-chip VR is still the major concern for this technology. Fortunately, Yan et al. [130] have presented a hybrid scheme to mimic per-core VR with per-cluster VR and scheduling to reduce the area overhead of on-chip VR. Therefore, we use per-core DVFS as one of knobs.

5.4 Simulation Setup

In this section, we introduce our simulation environment. We use SESC [104] to evaluation the processor performance. SESC has been integrated with Wattch [15]

110 Table 5.2: Baseline results.

Workload Inputfile FF Inst Threads Inst Time (ms) Power (W) T(◦C) 16 1 billion 48.0 125.9 90.07 cholesky tk29.O 200M 4 250M 57.2 42.0 74.8 16 1 billion 59.68 74.9 69.69 fmm fmm.in 300M 4 250M 72.66 32.5 62.15 volrend head 300M 16 800M 41.42 85.4 71.79 water-nsquared water.in 300M 4 250M 38.1 43.7 68.7 16 400M 20.34 109.9 84.49 lu on input 300M 4 100M 19.6 42.1 70.75

and CACTI [89] to estimate the power of each on-chip component. The key simulation parameters are summarized in Table 5.1. Then we use HotSpot 5.02 [121] to estimate the temperature. We use SPLASH-2 benchmark with 16 threads launched.

We use Intel single-chip cloud (SCC) processor [49] as our floorplan base (shown as Figure 5.3a) to recalibrate the power reading because Intel releases detailed mea- surement results which are not available for other processors. From the die micro- graph, we estimate the size of the most identifiable components. A dual-core tile is

3.6×5.2mm. An IA-32 core in SCC is 4mm2 (L2 cache not included). A on-chip router is 1.5mm2. A router is shared by two cores on a dual-core tile. Therefore, we assume the router for each core is 0.8mm2. L2 cache for each core is 2.4mm2.

Since the detailed functional unit size information is not available to our study. We scale an Alpha 21264 core to a SCC core size and use scaled the function unit size in temperature estimation (shown as Figure 5.3b). We also scale the simulator reported power numbers to the SCC measurement results. Specifically, we calibrate the peak power estimated by Wattch to the peak reported core part power. We monitor the cache coherence activities and scale the reported mesh power to estimate the run-time router power. We use a second-order polynomial model [114] to estimate the leak- age power. We calibrate the chip leakage model [114] to the reported leakage power presented in [49]. We assign the chip leakage power to each component proportional

111 to the area of each component. We use this value as the initial state of temperature- leakage iteration. Please note HotSpot 5.02 only integrates temperature-leakage loop in the steady temperature routine. We modify the transient temperature calculation routine to consider temperature-leakage loop at run-time.

We use 0.5W delivered/mm2 to estimated the on-chip VR area [62]. The peak power of one core is 1.8W, resulting in a 3.6mm2 VR.Consideringthecoresizeis only 4mm2, the VR is of too much area overhead. Therefore, we use a quasi-parallel VR [115]. The key idea is that we parallel connect the off-chip regulator and on-chip regulator. We use rely on off-chip converter to deliver most of the output power; while we use low power on-chip buck type converter to achieve voltage regulation. In the proposed off-/on-chip hybrid VR design, for 1.8W peak power, if we use one off- chip regulator connected to one on-chip regulator, only 0.9W needs to be provided by on-chip VR on steady state time. However, to be conservative, we budget the on-chipVRwith2.2mm2, resulting in 1.1W peak power delivery capability. The size of VR has been reduce by 40% in our proposed scheme. If we can parallel more off-chip VR, the area overhead of on-chip VR can be further reduced. Although the on-chip VR still occupies a large on-chip area, this 24% area overhead (i.e., 2.2mm2 on a 3.6×2.6 core tile) could be justified by the dark silicon effect [130] (e.g, 20% area are dark). The relative size of each component is shown in Figure 5.3b. The power delivery efficiency of this on-chip VR is based on reported measured values

[62]. The efficiency peak is 77%. The efficiency decreases as the load power decrease.

We assume the power generation of on-chip VR is its power delivery efficiency loss.

We assume using the same TEC device in [75]. Each TEC device is a 0.5mm×0.5mm

film-form material. We embed a 3×3 TEC array on top of each core tile between the heat sink and processor die. The TEC array covers most core area of each core tile. We assume the each TEC device is controlled independently by a power transistor,

112 which is coupled to the on-chip power delivery circuit by through silicon vias (TSV) [108]. TSV has been commonly used in multi-layer stacking technology. The TEC power is calculated with the model presented in [75], which considers the temperature difference between the cool side and the hot side and the TEC input current. The cooling effect of TEC device (i.e., Peltier effect) takes (e.g., up to be to 20μs [43]) to engage. We calculate the heat resistance change 20mus after we turn on a TEC.

Please note this setting is conservative, if we use shorter delay, the cooling effect will be better.

We assume using a speed adjustable fan in our packaging. The available speed levels and the fan power and air flow rate at different speed levels are from a com- mercial fan datasheet [31]. Since the impact of fan is through heat sink and the thermal constant of heat sink is in the range of 15-30s [4]. If we simulate the effect of dynamic fan, the simulated CPU time should be at least several minutes. For a 16- core processor in SESC (integrated with HotSpot), the simulation time is prohibitive.

Therefore, for each benchmark, we run all the studied policies with all possible fan speed levels in multiple tests. In this way, we get the results for all possible fan speed levels. When we report our results, on the condition of not violating the temperature threshold, we choose the lowest fan speed results as the dynamic fan results.

5.5 Experiments

In this section, we present our evaluation results.

5.5.1 Baseline Results

In our baseline case, we set all the cores running at the peak DVFS levels; we use the highest fan speed; all the TEC devices are powered off. We run 16 and 4 threads on the

16-core system each time. Our command line parameters and results are presented 113 in Table 5.2. When we run 16-thread water and 4-thread volrend,thesimulation suspends and cannot reach the required number of instructions. Therefore, we only report the 4-thread water and 16-thread volrend cases.

In Table 5.2 shows that the maximum temperature correlates to the total power consumption. Within each workload, from 16 to 4 threads, the total power dissipation decreases, so as the maximum temperature. Across different workloads, this general rule still holds in most cases. Only when we compare lu and water at 4-thread cases, lu consumes less power than water but the maximum temperature of lu is higher than water. Through trace file analysis, we find that in both lu and water,IntQ0 (the integer queue of the 1st core) is the hottest unit. However, the component power in water is higher that that in lu. This shows that besides the global heat dissipation, the power density also decides the temperature. That confirms our motivation to use

TEC, which is to use local cooling to address local hot spot issues.

5.5.2 Studied Policies

We study five different management policies in this paper. They are Dynamic-fan,

TEC+fan, DVFS-only, DVFS+TEC,andCo-op. The current industry practice on dynamic cooling system is predominantly fan- based. The fan is dynamically adjusted by monitoring the temperature of the proces- sor. If the temperature is higher than the assigned threshold, we raise the fan speed until the temperature goes lower than the threshold; if the temperature is lower than the threshold, we lower the fan speed to save cooling power. We take this widely- adopted industry practice as our fan part management policy. Since the thermal capacity of head sink is very large (e.g., several hundred ◦C/J), it takes a long time to see the effect of changing the fan speed. Therefore, we run all the studied policies with all possible fan speed levels. We choose the one at the lowest fan speed level

114 without violating the temperature threshold as the Dynamic-fan result. In all the other policies expect Co-op, we assume dynamic fan is used.

TEC+fan is a simple combination of Dynamic-fan and a simple heuristic TEC+fan to manage the on/off state of the TECs. Specifically, the adjustment of fan and TECs are independent. We use Dynamic-fan to manage the fan. For TECs, when the tem- perature of a on-chip component is higher than the threshold, we turn on the TEC on top of the hot area; when the temperature is lower than the threshold, we turn off the TEC. We assume temperature sensing is available at all components (also assumed in [75, 19]).

DVFS-only is a classic DVFS-based DTM algorithm to control the peak temper- ature of a processor under the threshold. The idea is to raise the core DVFS level when the temperature of the core is lower than the threshold to improve performance and reduce the DVFS level when the temperature is higher than the threshold to protect the processor. Please note that we run DVFS-only on different fan speed levels and report the test results that provide the best energy as DVFS-only to derive an aggressive comparison.

DVFS+TEC is a simple combination of TEC+fan and DVFS-only.Theadjust- ment of per-core DVFS levels and TECs are independent. We also DVFS+TEC on different fan speed levels and report the test results that provide the best energy as final result to derive an aggressive comparison.

Co-op is our proposed solution. Co-op coordinates all the available knobs for op- timized energy based on well-established models. Also a multi-step down-hill greedy algorithm is developed to reduce the policy execution time. The algorithm is dis- cussed in detail at Section 5.3.4.

115 5.5.3 Integrating TEC with Fan

We provide the following case study to show the effectiveness of using TEC with fan in Figure 5.4. Figure 5.4 compares the cooling effect (maximum chip temperature) and cooling cost (cooling power) of Dynamic-fan and TEC+fan. In this experiment, we run 16- thread of cholesky on 16-cores. Figure 5.4(a) and (b) monitors the temperature of the hottest component under different cooling management policies. Figure 5.4(c) shows the corresponding power usage trace. We set the baseline case maximum temperature as the temperature threshold (e.g., 91◦C). That is the best that dynamic fan can do on this 16-core system. In Figure 5.4(a), simply setting fan on the 2nd speed level (i.e.,

“Fan level 2”) will introduce multiple temperature violations. Figure 5.4(b) shows that TEC+fan significantly reduce the temperature. Please note the fan is running at the 2nd speed level in Figure 5.4(b). The temperature is always under the threshold expect two data points (1283 and 1817). Figure 5.4(b) shows the effectiveness of using TEC to reduce hot spot temperature. Figure 5.4(c) shows the corresponding cooling power in this test case. The left y-axis is the power scale of the fan part; the right y-axis is the power scale of the TEC part. Since the power consumption of a fan has a cubic relationship with the speed of the fan [4], running fan at the 1st level consumes much more power than using the 2nd level. At 1st speed level, this fan consumes 14.4W cooling power. At the 2nd level, it only consumes 3.8W. Our fan power is estimated based on Dynatron R16 fan [31], which is designed for Intel Core i5 packaging. In Figure 5.4(c), the combined cooling power of fan running at

2nd level and TEC is still much smaller than running the fan at the 1st speed level.

However, using TEC+fan can achieve very close cooling effect of using fan level 1.

116 110 Fan level 1 Fan level 2 100 Threshold 90 80 70

Temperature (ºC) Temperature 60 0 500 1000 1500 Time (200us) (a) Temperature traces of dynamic fan. 110 TEC+fan Threshold 100 90 80 70

Temperature (ºC) Temperature 60 0 500 1000 1500 Time (200us) (b) Temperature trace of TEC+fan. 15 0.5 12 Fan level 1 power 0.4 9 Fan level 2 power 0.3 6 TEC power 0.2 3 0.1 Fan power(W) TEC power(W) 0 0 0 500 1000 1500 Time (200us) (c) Power traces. Figure 5.4: TEC+Fan vs. Dynamic-fan: temperature and cooling power comparison between Dynamic-fan and TEC+fan. Using the 1st (highest) fan speed level can achieve much better cooling than using the 2nd fan speed level. However, using TEC and the 2nd fan speed can achieve very close cooling effect to that of the 1st fan speed level. In addition, the combined cooling power consumption of using TEC and 2nd fan speed is much lower than running fan at 1st speed level. That is due to the cubic dependence of fan power consumption on fan speed [4].

117 105 10 Dynamic-fan TEC+fan Dynamic-fan TEC+fan 95 DVFS+fan DVFS+TEC 7.5 DVFS+fan DVFS+TEC Co-op Co-op 85 5 75 2.5 65 0 T_th violation violation (%) T_th Max temperature (ºC)

(a) Maximum temperature. (b) T th violation points/total points. Figure 5.5: Cooling performance comparison. We set the highest temperature of base- line cases as T th in each experiment. Co-op consistently offers the lowest maximum temperature in studied cases. Co-op also has the smallest T th violation.

4 2 Dynamic-fan TEC-fan Dynamic-fan TEC-fan 3 DVFS-fan DVFS+TEC 1.5 DVFS-fan DVFS+TEC Co-op Co-op 2 1 1 0.5 0 0 baseline case) to baseline case) Power (normalized Delay (normalized to to (normalized Delay

(a) Execution delay comparison. (b) Average power comparison. 2 4 Dynamic-fan TEC-fan Dynamic-fan TEC+fan 1.5 DVFS-fan DVFS+TEC 3 DVFS+fan DVFS+TEC Co-op Co-op 1 2 0.5 1 0 0 baseline case) baseline to baseline case) EDP (normalized EDP to Energy (normalized Energy

(c) Total energy comparison. (d) Energy delay product comparison. Figure 5.6: Execution performance comparison. Due to the cubic dynamic reduction of DVFS at linear performance degradation cost, DVFS+fan has the lowest energy usage. However, it has the longest delay. Since Co-op gives priority to performance, it reduces the power on the condition of not sacrificing too much performance. There- fore, Co-op achieves the lowest EDP.

118 5.5.4 Cooling Performance

Figure 5.5 compares the cooling performance of the studied policies under different workloads. Figure 5.5(a) compares the maximum temperature of the different policies. In the baselines, since we use the highest fan speed (the most cooling), we can cool the processor to the lowest temperature. For the general high-power workload cholesky and the workload with a local hot spot, lu, TEC+fan has a lower maximum temperature than DVFS+fan, because TEC+fan uses the TEC to effectively address the local hot spot. The low-power workload fmm also enjoys the benefit of local cooling. However, for water, whose power dissipation is high and evenly distributed, the DVFS+fan policy has a lower maximum temperature than TEC+fan. Interestingly, we find that in all the tested cases, the simple combination of DVFS and TEC, DVFS+TEC, has a higher maximum temperature than either TEC+fan or DVFS+fan, because DVFS+TEC does not consider the interference between the two knobs. For example, when the current temperature is lower than the threshold, the TEC management algorithm turns off the TEC while the DVFS management algorithm increases the DVFS level; in the next step, the temperature may overshoot. In contrast, Co-op consistently has the lowest temperature in the studied cases because it coordinates all the available knobs to fully take advantage of each. In Figure 5.5(b), we set the maximum temperature of each workload in the baseline cases as the temperature threshold (T_th). T_th represents the best cooling performance that fan-based cooling can offer (at the cost of large cooling power). Due to dynamic management, the studied policies have occasional temperature violations. Since we set T_th to the maximum temperature in the baseline cases, Dynamic-fan would incur many temperature violations if it used any fan speed other than the highest one, as we can see in Figure 5.4. Therefore, Dynamic-fan behaves the same as the baseline case. DVFS+fan and DVFS+TEC have more violations than TEC+fan due to DVFS level oscillations. Co-op coordinates all the available knobs to consistently achieve a small violation rate (<0.5%).

5.5.5 System Performance

Figure 5.6 compares the execution delay, average power, energy, and energy-delay product [39] of the different policies. Figure 5.6(a) compares the delay of each case. We define the delay as the execution time of each case normalized to the baseline case. The shorter the delay, the better the performance. Since TEC+fan and Dynamic-fan do not adjust the DVFS level of the processor, they have the same execution time as the baseline case. By coordinating all the available knobs and choosing the best actuation to enforce, Co-op has only a 4% longer delay than the baseline. Relying on throttling the processor to cut power dissipation and reduce temperature, DVFS+fan has a 60% longer delay. DVFS+TEC leverages TEC's local cooling capability to reduce the need for throttling the processor. However, DVFS+TEC suffers from the lack of coordination between DVFS and TEC. Therefore, DVFS+TEC has a longer delay than Co-op but a shorter delay than DVFS+fan. Figure 5.6(b) compares the power in each case. By using TEC for local cooling, TEC+fan can afford to run the fan at the 2nd speed level with a cooling effect similar to running the fan at the 1st speed level (the baseline case). By putting the fan at the 2nd level, TEC+fan reduces the total power by 9%. DVFS+fan reduces the DVFS level to reduce heat dissipation. Since DVFS cuts dynamic power in a cubic way, DVFS+fan significantly reduces the power, by 57%. By using TEC to cool down local hot spots, DVFS+TEC reduces the use of DVFS at the cost of increasing the TEC power. Therefore, the power of DVFS+TEC is slightly higher than that of DVFS+fan. Co-op coordinates all the available knobs for optimized energy and gives priority to performance. Therefore, Co-op does not lower the DVFS levels very often.

Figure 5.7: Total relative cycles comparison. This metric shows the slow-down introduced by applying DVFS. DVFS introduces 26% and 27% slow-down in DVFS+fan and DVFS+TEC, respectively. However, from Figure 5.6, the delays of DVFS+fan and DVFS+TEC are more than 50%. The performance gap can be introduced by inter-thread correlation: throttling one core without considering the other cores makes that core's thread become the slowest thread, increasing the total execution time.

The power of Co-op is higher than that of DVFS+TEC. Figure 5.6(c) presents the energy of each case. We sum each power value multiplied by its time interval over the trace file of one execution to obtain the total energy. The reported numbers in Figure 5.6(c) are normalized to the baseline case. Having the same execution time but a lower cooling requirement, TEC+fan saves 9% energy. Due to the cubic dynamic power reduction effect, although DVFS+fan sacrifices significant performance, it buys more power savings. Therefore, DVFS+fan achieves 34% energy savings. DVFS+TEC also adopts aggressive DVFS throttling. Therefore, it achieves 32% energy savings. Although Co-op gives priority to performance, it still saves 27% energy. To correct the energy bias of DVFS, we also present the energy-delay product [39] results in Figure 5.6(d). Co-op shows a clear advantage since it still gives performance higher priority. In contrast, the policies that rely heavily on DVFS lose their advantage. DVFS+fan has no EDP saving (4% higher than the baseline). DVFS+TEC has an even worse EDP than the baseline cases. Figure 5.7 compares the relative total cycles to further investigate the impact of DVFS.
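As a concrete illustration of how the energy and EDP numbers above can be derived from a power trace, the following sketch sums power multiplied by interval length to obtain energy and multiplies by the total execution time to obtain EDP. The trace format, the helper names, and the 200 µs interval are illustrative assumptions, not the exact instrumentation of our simulator.

```python
# Illustrative sketch: total energy and energy-delay product (EDP) from a
# power trace given as (power_in_watts, interval_in_seconds) samples.

def total_energy(trace):
    """Sum power * interval over the whole trace (Joules)."""
    return sum(p * dt for p, dt in trace)

def energy_delay_product(trace):
    """EDP = total energy * total execution time."""
    delay = sum(dt for _, dt in trace)
    return total_energy(trace) * delay

if __name__ == "__main__":
    dt = 200e-6  # hypothetical 200 us sampling interval, as in the traces above
    trace = [(84.0, dt), (79.5, dt), (81.2, dt)]
    print(total_energy(trace), energy_delay_product(trace))
```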

Figure 5.8: Number of active cores sensitivity. (a) Max temperature comparison. (b) Delay comparison. (c) Total energy comparison. (d) EDP comparison. We deploy 4 threads running on our simulator.

Specifically, we sum the cycles executed by each core to obtain the processor cycles and then sum all the processor cycles over the entire trace file. We also compute the total cycles of the entire execution time assuming the processor cores run at the peak frequency all the time. The relative total cycles is computed by dividing the total aggregated cycles by the total number of possible cycles. For example, suppose we have a 2-core system and a 3-interval frequency trace (normalized to the peak frequency): [(0.6, 0.9), (1.0, 0.8), (0.7, 0.9)]. The total aggregated cycles are 0.6+0.9+1.0+0.8+0.7+0.9 = 4.9. The total possible cycles are 2 × 3 × 1.0 = 6. The relative total cycles is therefore 4.9/6 ≈ 0.82. This metric presents the slow-down introduced by throttling the processor. We observe that DVFS introduces 4% slow-down in Co-op, and 26% and 27% slow-down in DVFS+fan and DVFS+TEC, respectively. However, from Figure 5.6, the delays of DVFS+fan and DVFS+TEC are more than 50%. The performance gap can be introduced by inter-thread correlation: throttling one core without considering the other cores makes that core's thread become the slowest thread, increasing the total execution time.
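The worked example above translates directly into a few lines of code. In the sketch below the function and variable names are ours, and the trace is assumed to list, for every interval, each core's frequency normalized to the peak frequency.

```python
# Illustrative sketch of the relative-total-cycles metric described above.

def relative_total_cycles(freq_trace, num_cores):
    """freq_trace: list of per-interval tuples of normalized core frequencies."""
    aggregated = sum(sum(interval) for interval in freq_trace)
    possible = num_cores * len(freq_trace) * 1.0  # all cores at peak frequency
    return aggregated / possible

# The worked example from the text: 2 cores, 3 intervals.
trace = [(0.6, 0.9), (1.0, 0.8), (0.7, 0.9)]
print(relative_total_cycles(trace, 2))  # -> 0.8166..., i.e., about 0.82
```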

Figure 5.9: Temperature threshold sensitivity. (a) Max temperature comparison. (b) The number of T_th violation points. (c) Delay comparison. (d) Total energy comparison. We set T_th as the peak temperature of running 16 threads at the 2nd fan speed level.

The comparison between Figure 5.7 and Figure 5.6 shows the necessity of coordinating the adjustments among all the on-chip cores.

Figure 5.8 compares 4-thread cases to study the impact of the number of active cores on the studied policies. In this test, we set T_th as the peak temperature of running 4 threads in the baseline cases. Since we use the peak temperature of the 4-thread baseline as T_th, the baseline case has the lowest temperature of all the cases. Due to dynamic management, all the studied policies have occasional temperature violations. Figures 5.8(a) and (b) show that, since the total power of the 4-thread case is low, neither TEC nor DVFS needs to be engaged frequently. That is why in Figure 5.8(b) the delay differences among the policies are small. For the same reason, the energy (Figure 5.8(c)) and EDP (Figure 5.8(d)) differences among the policies are small. However, Co-op still offers the lowest energy and EDP because it coordinates all the available knobs.

Figure 5.9 shows the impact of the temperature threshold. We set the temperature threshold T_th as the highest temperature when running the processor at the 2nd fan speed. Compared with the test cases in Figures 5.5 and 5.6, the threshold temperature in this test case is higher. We assume Dynamic-fan selects the 2nd fan speed level, which is the best-effort selection that does not violate the temperature threshold. Due to their reactive adjustment, the other policies all have transient temperature violations during runtime. Therefore, in Figure 5.9(a), Dynamic-fan has the lowest maximum temperature during runtime. However, Figure 5.9(b) shows that Co-op has the smallest threshold violation among all the dynamic policies because Co-op coordinates all the available knobs. Figure 5.9(c) shows that the execution time differences among the studied policies are small. Since we use a higher temperature threshold, even the DVFS-throttling-based policies do not have to resort to DVFS throttling as often. Therefore, the performance degradation is reduced. For a similar reason, the energy differences among the studied policies are small. The only exception is Dynamic-fan, because we assume that Dynamic-fan selects the best-effort fan adjustment at the beginning of the execution and then consumes a fixed cooling power. The other policies are similar, each saving about 20% energy compared with Dynamic-fan.

5.6 Conclusions

Both global and local thermal issues in processors are exacerbated by technology scaling. As an emerging technology, TEC offers effective local cooling, which complements the global cooling of fans to improve the overall cooling efficiency. However, TEC is an active device that itself also generates heat within the package. Therefore, careful coordination between the fan and the TEC, and between the cooling part and the computation part, is needed to optimize the energy of the entire system. In this chapter, we have presented iTECool, a highly configurable TEC-based cooling framework for energy efficiency. Specifically, we first formulate the energy optimization problem with a temperature constraint as an assignment problem. Since such a problem requires a prohibitively long time to solve online, we design a quick greedy algorithm to solve it. iTECool is designed to be highly configurable so that it can easily integrate other components. We provide two case studies. One is to adopt DVFS as an extra knob to co-optimize cooling and computational power. The other is to consider the voltage regulator (VR) efficiency change at different load conditions and voltage levels in iTECool. Our extensive experimental results show that iTECool can reduce the system energy by 27% compared with state-of-the-art baselines.
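For readers who want a concrete picture of what such a greedy heuristic can look like, the sketch below repeatedly selects the knob setting (e.g., a TEC on/off state, a fan level, or a DVFS level) that yields the largest predicted energy saving without violating the temperature threshold. This is only an illustrative reconstruction under our own assumptions; the candidate set, the prediction functions, and all names are hypothetical and do not reproduce the actual iTECool algorithm.

```python
# Hedged, assumption-based sketch of a greedy knob-selection loop in the
# spirit of the summary above; not the dissertation's actual implementation.

def greedy_select(candidates, predict_temp, predict_energy, t_threshold):
    """Greedily add the feasible knob setting with the largest predicted
    energy saving until no candidate improves energy without violating T_th."""
    chosen = []
    improved = True
    while improved:
        improved = False
        best, best_saving = None, 0.0
        base_energy = predict_energy(chosen)
        for c in candidates:
            if c in chosen:
                continue
            trial = chosen + [c]
            if predict_temp(trial) > t_threshold:
                continue  # would violate the temperature constraint
            saving = base_energy - predict_energy(trial)
            if saving > best_saving:
                best, best_saving = c, saving
        if best is not None:
            chosen.append(best)
            improved = True
    return chosen
```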

CHAPTER 6

CONCLUSIONS

In this work, we have studied several techniques to improve the power/energy efficiency of a CMP. A scalable power control solution has been presented that adjusts the frequency of each core in a CMP to keep the power consumption from exceeding an assigned budget while optimizing the performance of multi-threaded workloads. PGCapping presents a solution that integrates core-level power gating and per-core frequency scaling to control the power consumption of a CMP and improve system performance. GreenGPU coordinates the CPU and GPU in a high-performance computing server to improve the power efficiency of the entire system by dynamically allocating the workload between the GPU and CPU at the software level and dynamically scaling the frequencies at the hardware level. Finally, iTECool integrates per-core DVFS, the on/off state of each TEC device, and the speed of the cooling fan to improve the energy efficiency of the entire system, including the cooling power.

BIBLIOGRAPHY

[1] Alameldeen, A. R., and Wood, D. A. IPC considered harmful for multiprocessor workloads. IEEE Micro 26, 4 (2006).

[2] AMD. AMD dragon platform technology performance tuning guide, 2009.

[3] AMD. AMD family 10h server and workstation processor power and thermal data sheet, 2010.

[4] Ayoub, R., and Rosing, T. Cool and save: Cooling aware dynamic workload scheduling in multi-socket cpu systems. In ASP-DAC (2010).

[5] Bell, S., et al. Tile64 processor: A 64-core SoC with mesh interconnect. In ISSCC (2008).

[6] Bhattacharjee, A., and Martonosi, M. Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors. In ISCA (2009).

[7] Bienia, C., et al. The PARSEC benchmark suite: Characterization and architectural implications. In PACT (2008).

[8] Bircher, W. L., and John, L. Predictive power management for multi-core processors. In WEED (2010).

[9] Biswas, S., et al. Fighting fire with fire: Modeling the data center scale effects of targeted superlattice thermal management. In ISCA (2011).

[10] Bitirgen, R., Ipek, E., and Martinez, J. Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach. In MICRO (2008).

[11] Blome, J., Feng, S., Gupta, S., and Mahlke, S. Self-calibrating online wearout detection. In MICRO (2007).

[12] Borkar, S. Thousand core chips: a technology perspective. In DAC (2007).

[13] Bovet, D. P., and Cesati, M. Understanding the Linux Kernel, Third Edition. O'Reilly, 1997.

[14] BP. BP3180N datasheet. http://www.solardepot.com/pdf/BP3180N.pdf, 2009.

[15] Brooks, D., et al. Wattch: A framework for architectural-level power analysis and optimizations. In ISCA (2000).

[16] Brooks, D., and Martonosi, M. Dynamic thermal management for high-performance microprocessors. In HPCA (2001).

[17] Cai, Q., et al. Meeting points: using thread criticality to adapt multicore hardware to parallel regions. In PACT (2008).

[18] Calin, T., et al. Built-in current sensor for IDDQ testing in deep submicron CMOS. In VLSITS (1999).

[19] Chaparro, P., et al. Dynamic thermal management using thin-film thermoelectric cooling. In ISLPED (2009).

[20] Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J., Lee, S.-H., and Skadron, K. Rodinia: A benchmark suite for heterogeneous computing. In IISWC (2009).

[21] Chowdhury, I., Prasher, R., Lofgreen, K., Chrysler, G., Narasimhan, S., Mahajan, R., Koester, D., Alley, R., and Venkatasubramanian, R. On-chip cooling by superlattice-based thin-film thermoelectrics. Nature NanoTech. 4 (2009).

[22] Collange, S., Defour, D., and Tisserand, A. Power consumption of gpus from a software perspective. Computational Science (2009), 914–923.

[23] Coskun, A., Strong, R., Tullsen, D., and Rosing, T. S. Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors. In MMCS (2009), SIGMETRICS.

[24] Coskun, A. K., et al. Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors. In SIGMETRICS (2009).

[25] Curran, B., et al. 4GHz+ low-latency fixed-point and binary floating-point execution units for the POWER6 processor. In ISSCC (2006).

[26] Curran, B., et al. 4GHz+ low-latency fixed-point and binary floating-point execution units for the power6 processor. In ISSCC (2006).

[27] Dean, J., and Ghemawat, S. Mapreduce: simplified data processing on large clusters, 2004.

[28] Dennard, R., Gaensslen, F., Yu, H.-N., Rideout, L., Bassous, E., and LeBlanc, A. Design of ion-implanted MOSFET's with very small physical dimensions. Journal of Solid-State Circuits, IEEE (1974).

[29] Dhiman, G., and Rosing, T. S. Dynamic voltage frequency scaling for multi-tasking systems using online learning. In ISLPED (2007).

[30] Donald, J., and Martonosi, M. Techniques for multicore thermal management: Classification and new exploration. In ISCA (2006).

[31] Dynatron. Dynatron new product SPEC sheet R16. http://www.dynatron-corp.com.

[32] Economou, D., et al. Full-system power analysis and modeling for server environments. In MOBS (2006).

[33] Esmaeilzadeh, H., et al. Dark silicon and the end of multicore scaling. In ISCA (2011).

[34] Esmaeilzadeh, H., Blem, E., Amant, R., Sankaralingam, K., and Burger, D. Dark silicon and the end of multicore scaling. In ISCA (2011).

[35] Eyerman, S., and Eeckhout, L. Fine-grained DVFS using on-chip regulators. TACO 8, 1 (2011).

[36] Franklin, G. F., Powell, D. J., and Workman, M. Digital Control of Dynamic Systems, 3rd edition. Addison-Wesley, 1997.

[37] Freund, V., and Schapire, R. A decision-theoretic generalization of on-line learning and an application to boosting. JCSS 55 (1997).

[38] Gjanci, J., and Chowdhury, M. Investigating issues of on-chip voltage regulator in nanoscale integrated circuits. In ICM (2008).

[39] Gonzalez, R., and Horowitz, M. Energy dissipation in general purpose microprocessors. JSSC 31 (1996).

[40] GREEN500.org. The Green500 list. http://www.green500.org/lists/2010/11/top/list.php, 2010.

[41] Greskamp, B., and Torrellas, J. Paceline: Improving single-thread performance in nanoscale CMPs through core overclocking. In PACT (2007).

[42] Grochowski, E., et al. Energy per instruction trends in Intel microprocessors. Tech. rep., Intel Microarchitecture Research Lab, 2006.

[43] Gupta, M. P., Sayer, M.-H., Mukhopadhyay, S., and Kumar, S. Ultra-thin thermoelectric devices for on-chip Peltier cooling. Components, Packaging and Manufacturing Technology, IEEE Transactions on 1, 9 (2011).

[44] Herbert, S., and Marculescu, D. Variation-aware dynamic voltage/frequency scaling. In HPCA (2009).

[45] Hong, C., Chen, D., Chen, W., Zheng, W., and Lin, H. MapCG: writing parallel program portable between CPU and GPU. In PACT (2010).

[46] Hong, S., and Kim, H. An integrated GPU power and performance model. In ISCA (2010).

[47] Horowitz, M. Scaling, power, and the future of CMOS. In VLSID (2007).

[48] Howard, J., et al. A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS. In ISSCC (2010).

[49] Howard, J., et al. A 48-core IA-32 processor in 45 nm CMOS using on-die message-passing and DVFS for performance and power scaling. JSSC 46, 1 (2011).

[50] Hsu, C.-H., and Kremer, U. The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction. In PLDI (2003).

[51] Huang, L., Yuan, F., and Xu, Q. Lifetime reliability-aware task allocation and scheduling for MPSoC platforms. In DATE (2009).

[52] Intel. Intel Turbo Boost technology. http://www.intel.com/technology/turboboost/, 2007.

[53] Intel. Intel Core i7 processor extreme edition and Intel Core i7 processor datasheet. Tech. rep., Intel Corporation, 2008.

[54] Isci, C., Buyuktosunoglu, A., Cher, C.-Y., Bose, P., and Martonosi, M. An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget. In MICRO (2006).

[55] Isci, C., and Martonosi, M. Runtime power monitoring in high-end processors: Methodology and empirical data. In MICRO (2003).

[56] Jang, B., et al. Exploiting memory access patterns to improve memory performance in data-parallel architectures. Parallel and Distributed Systems, IEEE Transactions on 22, 1 (2011), 105–118.

[57] Kahng, A., et al. Orion 2.0: A fast and accurate NoC power and area model for early-stage design space exploration. In DATE (2009).

[58] Kansal, A., et al. Virtual machine power metering and provisioning. In SoCC (2010).

[59] Karpuzcu, U., Greskamp, B., and Torrellas, J. The bubblewrap many-core: popping cores for sequential acceleration. In MICRO (2009).

[60] Khronos. OpenCL - the open standard for parallel programming of heterogeneous systems. http://www.khronos.org/opencl, 2010.

[61] Kim, S., Chandra, D., and Solihin, Y. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT (2004).

[62] Kim, W., et al. A fully-integrated 3-level DC/DC converter for nanosecond-scale DVFS. JSSC (2012).

[63] Kim, W., Gupta, M. S., Wei, G.-Y., and Brooks, D. System level analysis of fast, per-core DVFS using on-chip switching regulators. In HPCA (2008).

[64] Lee, J., and Kim, N. S. Optimizing throughput of power- and thermal-constrained multicore processors using DVFS and per-core power-gating. In DAC (2009).

[65] Lee, J., Sathisha, V., Schulte, M., Compton, K., and Kim, N. S. Improving throughput of power-constrained GPUs using dynamic voltage/frequency and core scaling. In PACT (2011).

[66] Lefurgy, C., et al. Energy management for commercial servers. Computer 36, 12 (2003).

[67] Lefurgy, C., et al. Server-level power control. In ICAC (2007).

[68] Leverich, J., et al. Power management of datacenter workloads using per-core power gating. IEEE Comput. Archit. Lett. 8 (2009).

[69] Li, C., Zhang, W., Cho, C.-B., and Li, T. Solarcore: Solar energy driven multi-core architecture power management. In HPCA (2011).

[70] Li, J., et al. The thrifty barrier: Energy-aware synchronization in shared-memory multiprocessors. In HPCA (2004).

[71] Li, J., and Martinez, J. F. Dynamic power-performance adaptation of parallel computation on chip multiprocessors. In HPCA (2006).

[72] Li, T., Lebeck, A., and Sorin, D. Spin detection hardware for improved management of multithreaded systems. IEEE Trans. Parallel Distrib. Syst. 17, 6 (2006).

[73] Littlestone, N., and Warmuth, M. The weighted majority algorithm. Inf. Comput. 108 (1994), 212–261.

[74] Liu, C., et al. Exploiting barriers to optimize power consumption of CMPs. In IPDPS (2005).

[75] Long, J., and Memik, S. O. A framework for optimizing thermoelectric active cooling systems. In DAC (2010).

[76] Luk, C.-K., Hong, S., and Kim, H. Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In MICRO (2009).

[77] Lungu, A., et al. Dynamic power gating with quality guarantees. In ISLPED (2009).

[78] Ma, K., Li, X., Chen, M., and Wang, X. Scalable power control for many-core architectures running multi-threaded applications. In ISCA (2011).

[79] Madan, N., Buyuktosunoglu, A., Bose, P., and Annavaram, M. A case for guarded power gating in multi-core processors. In HPCA (2011).

[80] Marty, M. R., and Hill, M. D. Virtual hierarchies to support server consolidation. In ISCA (2007).

[81] Mathew, S., Anders, M., Krishnamurthy, R., and Borkar, S. A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core. In ISSCC (2003).

[82] McGowen, R., Poirier, C., Bostak, C., Ignowski, J., Millican, M., Parks, W., and Naffziger, S. Power and temperature control on a 90-nm Itanium family processor. IEEE Journal of Solid-State Circuits 41, 1 (2006).

[83] Meenderinck, C., and Juurlink, B. (When) will CMPs hit the power wall? In Euro-Par 2008 Workshops - Parallel Processing, E. César, M. Alexander, A. Streit, J. L. Träff, C. Cérin, A. Knüpfer, D. Kranzlmüller, and S. Jha, Eds. Springer-Verlag, 2009, pp. 184–193.

[84] Meng, K., Joseph, R., Dick, R., and Shang, L. Multi-optimization power management for chip multiprocessors. In PACT (2008).

[85] Milovanovic, E., et al. Synthesis of space optimal systolic arrays for band matrix-vector multiplication. JC (2008).

[86] Minh, T. N., and Wolters, L. Modeling parallel system workloads with temporal locality. In Job Scheduling Strategies for Parallel Processing. Springer-Verlag, 2009.

[87] Mishra, A. K., et al. Poster: Coordinated power management of voltage islands in CMPs. In SIGMETRICS (2010).

[88] Mishra, A. K., Srikantaiah, S., Kandemir, M., and Das, C. R. CPM in CMPs: Coordinated power management in chip-multiprocessors. In SC (2010).

[89] Muralimanohar, N., et al. CACTI 6.0: A tool to model large caches. Tech. rep., HP Laboratories, 2009.

[90] Mwaikambo, Z., and Raj, A. Linux kernel hotplug cpu support. Linux Symposium 2 (2004).

[91] Noonburg, D. B., and Shen, J. P. Theoretical modeling of superscalar processor performance. In MICRO (1994).

[92] NREL. Measurement and instrumentation data center. http://www.nrel.gov/midc, 2011.

[93] NVIDIA. Geforce 8800. http://www.nvidia.com/page/geforce_8800.html, 2010.

[94] NVIDIA. nvidia-smi. http://www.nvidia.com/, 2010.

[95] NVIDIA. CUDA toolkit 3.2 downloads. http://developer.nvidia.com/cuda-toolkit-32-downloads, 2011.

[96] Oberman, S. Floating point division and square root algorithms and implementation in the AMD-K7 microprocessor. In CA (1999).

[97] Pallipadi, V., and Starikovskiy, A. The ondemand governor. http://ftp.kernel.org/pub/linux/kernel/, 2006.

[98] Phansalkar, A., et al. Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite. SIGARCH Comput. Archit. News 35, 2 (2007).

[99] Powell, M., et al. CAMP: A technique to estimate per-structure power at run-time using a few simple parameters. In HPCA (2009).

[100] Powell, M. D., et al. Camp: A technique to estimate per-structure power at run-time using a few simple parameters. In HPCA (2009).

[101] Raghavendra, R., et al. No power struggle: Coordinated multi-level power management for the data center. In ASPLOS (2008).

[102] Rangan, K. R., Wei, G.-Y., and Brooks, D. Thread motion: Fine-grained power management for multi-core systems. In ISCA (2009).

[103] Ravi, V. T., Ma, W., Chiu, D., and Agrawal, G. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In ICS (2010).

[104] Renau, J., et al. SESC simulator, January 2005. http://sesc.sourceforge.net.

[105] Sartori, J., and Kumar, R. Distributed peak power management for many-core architectures. In DATE (2009).

[106] Schaller, R. R. Moore's law: past, present, and future. IEEE Spectr. 34, 6 (1997), 52–59.

[107] Shin, D., et al. Energy-optimal dynamic thermal management: Computation and cooling power co-optimization. Industrial Informatics, IEEE Transactions on 6, 3 (2010).

[108] Singh, P., et al. Power delivery network design and optimization for 3d stacked die designs. In 3DIC (2010).

[109] Skadron, K., Abdelzaher, T., and Stan, M. R. Control-theoretic techniques and thermal-RC modeling for accurate and localized dynamic thermal management. In HPCA (2002).

[110] Skadron, K., Stan, M., Sankaranarayanan, K., Huang, W., Velusamy, S., and Tarjan, D. Temperature-aware microarchitecture: Modeling and implementation. ACM Trans. Archit. Code Optim. 1 (2004), 94–125.

[111] Skadron, K., Stan, M. R., Sankaranarayanan, K., Huang, W., Velusamy, S., and Tarjan, D. Temperature-aware microarchitecture: Modeling and implementation. ACM Trans. Archit. Code Optim. 1, 1 (2004).

[112] Srinivasan, J., Adve, S., Bose, P., and Rivers, J. Lifetime reliability: toward an architectural solution. Micro, IEEE (2005).

[113] Stanley-Marbell, P., Cabezas, V. C., and Luijten, R. Pinned to the walls: impact of packaging and application properties on the memory and power walls. In ISLPED (2011).

[114] Su, H., et al. Full chip leakage-estimation considering power supply and temperature variations. In ISLPED (2003).

[115] Sun, J., Xu, M., Reusch, D., and Lee, F. High efficiency quasi-parallel voltage regulators. In APEC (2008).

[116] Teodorescu, R., et al. Variation-aware application scheduling and power management for chip multiprocessors. In ISCA (2008).

[117] Tierno, J., et al. A DPLL-based per core variable frequency clock generator for an eight-core POWER7 microprocessor. In VLSIC (2010).

[118] TOP500.org. National Supercomputing Center in Tianjin. http://top500.org/site/3154, 2010.

[119] Truong, D. N., et al. A 167-processor computational platform in 65 nm CMOS. IEEE Journal of Solid-State Circuits (JSSC) 44, 4 (2009).

[120] Vangal, S. R., et al. An 80-tile sub-100-w teraflops processor in 65-nm CMOS. IEEE Solid-state circuits 43, 1 (2008).

[121] Huang, W., et al. HotSpot: Thermal modeling for CMOS VLSI systems. TCPM (2005).

[122] Wang, G., and Ren, X. Power-efficient work distribution method for CPU- GPU heterogeneous system. In ISPA (2010).

[123] Wang, X., and Chen, M. Cluster-level feedback power control for perfor- mance optimization. In HPCA (2008).

[124] Wang, Y., Ma, K., and Wang, X. Temperature-constrained power control for chip multiprocessors with online model estimation. In ISCA (2009).

[125] Ware, M., Rajamani, K., Floyd, M., Brock, B., Rubio, J. C., Rawson, F., and Carter, J. B. Architecting for power management: The IBM POWER7 approach. In HPCA (2010).

[126] Wattsupmeters. Watts up pro power meter. http://www.wattsupmeters.com, 2010.

[127] Winter, J. A., Albonesi, D. H., and Shoemaker, C. A. Scalable thread scheduling and global power management for heterogeneous many-core architectures. In PACT (2010).

[128] Woo, S. C., et al. The SPLASH-2 programs: characterization and methodological considerations. In ISCA (1995).

[129] Wu, Q., Juang, P., Martonosi, M., and Clark, D. W. Formal online methods for voltage/frequency control in multiple clock domain microprocessors. In ASPLOS (2004).

[130] Yan, G., et al. Agileregulator: A hybrid voltage regulator scheme redeeming dark silicon for power efficiency in a multicore architecture. In HPCA (2012).

[131] Zhang, Y., et al. HotLeakage: A temperature-aware model of subthreshold and gate leakage for architects. Tech. rep., University of Virginia, 2003.

[132] Zhou, C., Sylvester, D., and Blaauw, D. Process variation and temperature-aware reliability management. In DATE (2010).
