What Is Happening to Power, Performance, and Software?

[3B2-9] mmi2012030110.3d 17/5/012 12:23 Page 110 .......................................................................................................................................................................................................................... WHAT IS HAPPENING TO POWER, PERFORMANCE, AND SOFTWARE? .......................................................................................................................................................................................................................... SYSTEMATICALLY EXPLORING POWER, PERFORMANCE, AND ENERGY SHEDS NEW LIGHT ON THE CLASH OF TWO TRENDS THAT UNFOLDED OVER THE PAST DECADE: THE RISE OF PARALLEL PROCESSORS IN RESPONSE TO TECHNOLOGY CONSTRAINTS ON POWER, CLOCK SPEED, AND WIRE DELAY; AND THE RISE OF MANAGED HIGH-LEVEL, PORTABLE PROGRAMMING LANGUAGES. ......Quantitative performance analy- seeking performance improvements and en- sis is the foundation for computer system de- ergy efficiency in smaller technologies to sign and innovation. In their classic paper, build parallel heterogeneous architectures. EmerandClarknotedthat‘‘alackof This hardware requires parallel software detailed timing information impairs efforts and exposes software to ongoing hardware to improve performance.’’1 They pioneered upheaval. Unfortunately, most software Hadi Esmaeilzadeh the quantitative approach by characterizing today is not parallel, nor is it designed to instruction mix and cycles per instruction modularly decompose onto a heterogeneous University of Washington on timesharing workloads. Emer and Clark substrate. surprised expert readers by demonstrating a Over this same decade, Moore’s transistor gap between the theoretical 1 million bounty drove orthogonal and disruptive soft- Ting Cao instructions per second (MIPS) peak of the ware changes with respect to how software is VAX-11/780 and the 0.5 MIPS it delivered deployed, sold, and built. Software demands Xi Yang on real workloads. Hardware and software for correctness, complexity management, researchers in industry and academia now programmer productivity, time-to-market, Stephen M. Blackburn use and have extended this principled perfor- reliability, security, and portability pushed Australian National mance analysis methodology. Our research developers away from low-level compiled applies this quantitative approach to mea- ahead-of-time (native) programming lan- University sured power and energy. guages. Developers increasingly choose This work is timely because the past de- high-level managed programming languages cade heralded the era of power-constrained with a selection of safe pointer disciplines, Kathryn S. McKinley hardware design. Hardware demands for en- garbage collection (automatic memory man- ergy efficiency intensified in large-scale sys- agement), extensive standard libraries, and Microsoft Research tems, in which power began to dominate portability through dynamic just-in-time costs, and in mobile systems, which are con- compilation. For example, modern web ser- strained by battery life. Unfortunately, tech- vices combine managed languages, such as nology limits on power retard Dennard PHP on the server side and JavaScript on scaling2 and prevent systems from using all theclientside.Inmarketsasdiverseasfinan- transistors simultaneously (dark silicon).3 cial software and cell phone applications, These constraints are forcing architects Java and .NET are the dominant choice. .............................................................. 110 Published by the IEEE Computer Society 0272-1732/12/$31.00 c 2012 IEEE [3B2-9] mmi2012030110.3d 17/5/012 12:23 Page 111 ............................................................................................................................................................................................... Related Work in Power Measurement, Power Modeling, and Methodology Processor design literature is full of performance measurement and analysis. Despite power’s growing importance, power measurements References 1-3 are still relatively rare. Isci and Martonosi combine a clamp ammeter 1. X. Fan, W.D. Weber, and L.A. Barroso, ‘‘Power Provisioning for with performance counters for per-unit power estimation of the Intel Pen- a Warehouse-sized Computer,‘‘ Proc. 34th Ann. Int’l Symp. 2 tium 4 on SPEC CPU2000. Fan et al. estimate whole-system power for Computer Architecture (ISCA 07), ACM, 2007, pp. 13-23. 1 large-scale data centers. They find that even the most power-consuming 2. C. Isci and M. Martonosi, ‘‘Runtime Power Monitoring in workloads draw less than 60 percent of peak possible power consump- High-End Processors: Methodology and Empirical Data‘‘, tion. We measure chip power and support their results by showing that Proc.36thAnn.IEEE/ACMInt’lSymp.Microarchitecture thermal design power (TDP) does not predict measured chip power. Our (Micro 36), IEEE CS, 2003, pp. 93-104. work is the first to compare microarchitectures, technology generations, 3. E. Le Sueur and G. Heiser, ‘‘Dynamic Voltage and Frequency individual benchmarks, and workloads in the context of power and Scaling: The Laws of Diminishing Returns,‘‘ Proc. Int’l Conf. performance. Power-Aware Computing and Systems (HotPower 10), USENIX Power modeling is necessary to thoroughly explore architecture Assoc., 2010, pp. 1-8. 4-6 design. Measurement complements simulation by providing valida- 4. O. Azizi et al., ‘‘Energy-Performance Tradeoffs in Processor tion. For example, some prior simulators used TDP, but our measure- Architecture and Circuit Design: A Marginal Cost Analysis,‘‘ ments show that this estimate is not accurate. As we look to the Proc. 37th Ann Int’l Symp. Computer Architecture (ISCA 10), future, programmers will need to tune their applications for power 2010, pp. 26-36. and energy, and not just performance. Just as architectural event per- 5. Y. Li et al., ‘‘CMP Design Space Exploration Subject to Phys- formance counters provide insight to applications, so will power and ical Constraints,‘‘ Proc. 12th Int’l Symp. High-Performance energy measurements. Computer Architecture, IEEE CS, 2006, pp. 17-28. Although the results show conclusively that managed and native 6. J.B. Brockman et al., ‘‘McPAT: An Integrated Power, Area, workloads respond differently to architectural variations, perhaps this re- and Timing Modeling Framework for Multicore and Manycore 7 sult should not be surprising. Unfortunately, few architecture or operat- Architectures,‘‘ Proc. 42nd Ann. IEEE/ACM Int’l Symp. Micro- ing systems publications with processor measurements or simulated architecture (MICRO 42), IEEE CS, 2009, pp. 469-480. designs use Java or any other managed workloads, even though the 7. S.M. Blackburn et al., ‘‘Wake Up and Smell the Coffee: Eval- evaluation methodologies we use here for real processors and those uation Methodologies for the 21st Century,‘‘ Comm. ACM, 7 for simulators are well developed. vol. 51, no. 8, 2008, pp.83-89. Exponential performance improvements in is the first to systematically measure the hardware hid many of the costs of high- power, performance, and energy characteris- level languages and helped create a virtuous tics of software and hardware across a range cycle with ever more capable, reliable, and of processors, technologies, and workloads. well performing software. This ecosystem is We execute 61 diverse sequential and paral- resulting in an explosion of developers, soft- lel benchmarks written in three native lan- ware, and devices that continue to change guages and one managed language, all of how we live and learn. which are widely used: C, C++, Fortran, Unfortunately, a lack of detailed power and Java. We choose Java because it has ma- measurements is impairing efforts to reduce ture virtual machine technology and substan- energy consumption on modern software. tial open-source benchmarks. We choose eight representative Intel IA32 processors Examining power, performance, and energy from five technology generations (130 nm This work quantitatively examines power, to 32 nm). Each processor has an isolated performance, and energy during this period processor power supply on the motherboard. of disruptive software and hardware changes Each processor has a power supply with sta- (2003 to 2011). Voluminous research explores ble voltage, to which we attach a Hall effect performance and a growing body of work sensor that measures current and, hence, pro- explores power (see the ‘‘Related Work in cessor power. We calibrate and validate our Power Measurement, Power Modeling, sensor data. We find that power consump- and Methodology’’ sidebar), but our work tion varies widely among benchmarks. .................................................................... MAY/JUNE 2012 111 [3B2-9] mmi2012030110.3d 17/5/012 12:23 Page 112 ............................................................................................................................................................................................... TOP PICKS Findings Our results recommend that systems research- Power consumption is highly application dependent and is poorly correlated to TDP. ers include managed, native, sequential and Energy-efficient architecture design is very sensitive to workload. Configurations in parallel workloads when designing and eval- the native nonscalable Pareto frontier differ substantially from all the other workloads. uating energy-efficient systems. Comparing one core to two, enabling a core is not consistently energy efficient. The Java Virtual Machine

Load more