[3B2-9] mmi2012030110.3d 17/5/012 12:23 Page 110
...... WHAT IS HAPPENING TO POWER, PERFORMANCE, AND SOFTWARE?
...... SYSTEMATICALLY EXPLORING POWER, PERFORMANCE, AND ENERGY SHEDS NEW LIGHT
ON THE CLASH OF TWO TRENDS THAT UNFOLDED OVER THE PAST DECADE: THE RISE
OF PARALLEL PROCESSORS IN RESPONSE TO TECHNOLOGY CONSTRAINTS ON POWER,
CLOCK SPEED, AND WIRE DELAY; AND THE RISE OF MANAGED HIGH-LEVEL, PORTABLE
PROGRAMMING LANGUAGES.
...... Quantitative performance analy- seeking performance improvements and en- sis is the foundation for computer system de- ergy efficiency in smaller technologies to sign and innovation. In their classic paper, build parallel heterogeneous architectures. EmerandClarknotedthat‘‘alackof This hardware requires parallel software detailed timing information impairs efforts and exposes software to ongoing hardware to improve performance.’’1 They pioneered upheaval. Unfortunately, most software Hadi Esmaeilzadeh the quantitative approach by characterizing today is not parallel, nor is it designed to instruction mix and cycles per instruction modularly decompose onto a heterogeneous University of Washington on timesharing workloads. Emer and Clark substrate. surprised expert readers by demonstrating a Over this same decade, Moore’s transistor gap between the theoretical 1 million bounty drove orthogonal and disruptive soft- Ting Cao instructions per second (MIPS) peak of the ware changes with respect to how software is VAX-11/780 and the 0.5 MIPS it delivered deployed, sold, and built. Software demands Xi Yang on real workloads. Hardware and software for correctness, complexity management, researchers in industry and academia now programmer productivity, time-to-market, Stephen M. Blackburn use and have extended this principled perfor- reliability, security, and portability pushed Australian National mance analysis methodology. Our research developers away from low-level compiled applies this quantitative approach to mea- ahead-of-time (native) programming lan- University sured power and energy. guages. Developers increasingly choose This work is timely because the past de- high-level managed programming languages cade heralded the era of power-constrained with a selection of safe pointer disciplines, Kathryn S. McKinley hardware design. Hardware demands for en- garbage collection (automatic memory man- ergy efficiency intensified in large-scale sys- agement), extensive standard libraries, and Microsoft Research tems, in which power began to dominate portability through dynamic just-in-time costs, and in mobile systems, which are con- compilation. For example, modern web ser- strained by battery life. Unfortunately, tech- vices combine managed languages, such as nology limits on power retard Dennard PHP on the server side and JavaScript on scaling2 and prevent systems from using all theclientside.Inmarketsasdiverseasfinan- transistors simultaneously (dark silicon).3 cial software and cell phone applications, These constraints are forcing architects Java and .NET are the dominant choice......
110 Published by the IEEE Computer Society 0272-1732/12/$31.00 c 2012 IEEE [3B2-9] mmi2012030110.3d 17/5/012 12:23 Page 111
...... Related Work in Power Measurement, Power Modeling, and Methodology Processor design literature is full of performance measurement and analysis. Despite power’s growing importance, power measurements References 1-3 are still relatively rare. Isci and Martonosi combine a clamp ammeter 1. X. Fan, W.D. Weber, and L.A. Barroso, ‘‘Power Provisioning for with performance counters for per-unit power estimation of the Intel Pen- a Warehouse-sized Computer,‘‘ Proc. 34th Ann. Int’l Symp. 2 tium 4 on SPEC CPU2000. Fan et al. estimate whole-system power for Computer Architecture (ISCA 07), ACM, 2007, pp. 13-23. 1 large-scale data centers. They find that even the most power-consuming 2. C. Isci and M. Martonosi, ‘‘Runtime Power Monitoring in workloads draw less than 60 percent of peak possible power consump- High-End Processors: Methodology and Empirical Data‘‘, tion. We measure chip power and support their results by showing that Proc.36thAnn.IEEE/ACMInt’lSymp.Microarchitecture thermal design power (TDP) does not predict measured chip power. Our (Micro 36), IEEE CS, 2003, pp. 93-104. work is the first to compare microarchitectures, technology generations, 3. E. Le Sueur and G. Heiser, ‘‘Dynamic Voltage and Frequency individual benchmarks, and workloads in the context of power and Scaling: The Laws of Diminishing Returns,‘‘ Proc. Int’l Conf. performance. Power-Aware Computing and Systems (HotPower 10), USENIX Power modeling is necessary to thoroughly explore architecture Assoc., 2010, pp. 1-8. 4-6 design. Measurement complements simulation by providing valida- 4. O. Azizi et al., ‘‘Energy-Performance Tradeoffs in Processor tion. For example, some prior simulators used TDP, but our measure- Architecture and Circuit Design: A Marginal Cost Analysis,‘‘ ments show that this estimate is not accurate. As we look to the Proc. 37th Ann Int’l Symp. Computer Architecture (ISCA 10), future, programmers will need to tune their applications for power 2010, pp. 26-36. and energy, and not just performance. Just as architectural event per- 5. Y. Li et al., ‘‘CMP Design Space Exploration Subject to Phys- formance counters provide insight to applications, so will power and ical Constraints,‘‘ Proc. 12th Int’l Symp. High-Performance energy measurements. Computer Architecture, IEEE CS, 2006, pp. 17-28. Although the results show conclusively that managed and native 6. J.B. Brockman et al., ‘‘McPAT: An Integrated Power, Area, workloads respond differently to architectural variations, perhaps this re- and Timing Modeling Framework for Multicore and Manycore 7 sult should not be surprising. Unfortunately, few architecture or operat- Architectures,‘‘ Proc. 42nd Ann. IEEE/ACM Int’l Symp. Micro- ing systems publications with processor measurements or simulated architecture (MICRO 42), IEEE CS, 2009, pp. 469-480. designs use Java or any other managed workloads, even though the 7. S.M. Blackburn et al., ‘‘Wake Up and Smell the Coffee: Eval- evaluation methodologies we use here for real processors and those uation Methodologies for the 21st Century,‘‘ Comm. ACM, 7 for simulators are well developed. vol. 51, no. 8, 2008, pp.83-89.
Exponential performance improvements in is the first to systematically measure the hardware hid many of the costs of high- power, performance, and energy characteris- level languages and helped create a virtuous tics of software and hardware across a range cycle with ever more capable, reliable, and of processors, technologies, and workloads. well performing software. This ecosystem is We execute 61 diverse sequential and paral- resulting in an explosion of developers, soft- lel benchmarks written in three native lan- ware, and devices that continue to change guages and one managed language, all of how we live and learn. which are widely used: C, C++, Fortran, Unfortunately, a lack of detailed power and Java. We choose Java because it has ma- measurements is impairing efforts to reduce ture virtual machine technology and substan- energy consumption on modern software. tial open-source benchmarks. We choose eight representative Intel IA32 processors Examining power, performance, and energy from five technology generations (130 nm This work quantitatively examines power, to 32 nm). Each processor has an isolated performance, and energy during this period processor power supply on the motherboard. of disruptive software and hardware changes Each processor has a power supply with sta- (2003 to 2011). Voluminous research explores ble voltage, to which we attach a Hall effect performance and a growing body of work sensor that measures current and, hence, pro- explores power (see the ‘‘Related Work in cessor power. We calibrate and validate our Power Measurement, Power Modeling, sensor data. We find that power consump- and Methodology’’ sidebar), but our work tion varies widely among benchmarks......
MAY/JUNE 2012 111 [3B2-9] mmi2012030110.3d 17/5/012 12:23 Page 112
...... TOP PICKS
Findings Our results recommend that systems research- Power consumption is highly application dependent and is poorly correlated to TDP. ers include managed, native, sequential and Energy-efficient architecture design is very sensitive to workload. Configurations in parallel workloads when designing and eval- the native nonscalable Pareto frontier differ substantially from all the other workloads. uating energy-efficient systems. Comparing one core to two, enabling a core is not consistently energy efficient. The Java Virtual Machine induces parallelism into the execution of single-threaded Architecture Java benchmarks. Hardware features such as clock scaling, Simultaneous multithreading delivers substantial energy savings for recent hardware gross microarchitecture, simultaneous multi- and in-order processors. threading (SMT), and chip multiprocessors The most recent processor in our study does not consistently increase energy (CMPs) each elicit a huge variety of power, consumption as its clock increases. performance, and energy responses. This va- The power/performance response to clock scaling of the native nonscalable workload riety and the difficulty of obtaining power differs from the other workloads. measurements recommends exposing on-chip power meters and, when possible, structure- Figure 1. We organize our discussion around these seven findings from specific power meters for cores, caches, and an analysis of measured chip power, performance, and energy on other structures. 61 workloads and eight processors. The ASPLOS paper includes more Modern processors include power- findings and analysis.4 management techniques that monitor power sensors to minimize power usage and boost performance, but these sensors are not visi- ble. Only in 2011, after our original paper, Furthermore, relative power, performance, did Intel first expose energy counters in a and energy are not well predicted by core production processor (Sandy Bridge).5 Just count, clock speed, or reported thermal de- as hardware event counters provide a quanti- sign power (TDP). tative grounding for performance innova- We use controlled hardware configura- tions, future architectures should include tions to explore the energy impact of hard- power meters to drive innovation in the ware features, software parallelism, and power-constrained computer systems era. workload. We perform historical and Pareto Measurement is the key to understanding analyses that identify the most energy- and optimization. efficient designs in our architecture configu- ration space. We made all of our data pub- Methodology licly available in the ACM Digital Library This section overviews essential elements as a companion to our original ASPLOS of our experimental methodology. For a 2011 paper.4 Our data quantifies the extent, more detailed treatment, see our original with precision and depth, of some known paper.4 workload and hardware trends and some pre- viously unobserved trends. This article is Software organized around just seven findings, listed We systematically explored workload se- in Figure 1, from our more comprehensive lection because it is a critical component ASPLOS analysis. Two themes emerge for analyzing power and performance. Native from our analysis with respect to workload and managed applications embody different and architecture. tradeoffs between performance, reliability, portability, and deployment. It is impossible Workload to meaningfully separate language from The power, performance, and energy workload. We offer no commentary on the trends of nonscalable native workloads differ virtue of language choice. We created the fol- substantially from native parallel and managed lowing four workloads from 61 benchmarks. workloads. For example, the SPEC CPU2006 native benchmarks draw significantly less Native nonscalable benchmarks: C, C++, power than parallel benchmarks; and and Fortran single-threaded compute- managed runtimes exploit parallelism even intensive benchmarks from SPEC when executing single-threaded applications. CPU2006...... 112 IEEE MICRO [3B2-9] mmi2012030110.3d 17/5/012 12:23 Page 113