Power Efficiency in High Performance Computing
Shoaib Kamil (LBNL/UC Berkeley), John Shalf (LBNL/NERSC), Erich Strohmaier (LBNL/CRD)

ABSTRACT
After 15 years of exponential improvement in microprocessor clock rates, the physical principles allowing for Dennard scaling, which enabled performance improvements without a commensurate increase in power consumption, have all but ended. Until now, most HPC systems have not focused on power efficiency. However, as the cost of power reaches parity with capital costs, it is increasingly important to compare systems with metrics based on sustained performance per watt. We therefore need to establish practical methods to measure the power consumption of such systems in situ in order to support such metrics. Our study provides power measurements for various computational loads on the largest scale HPC systems ever involved in such an assessment. This study demonstrates clearly that, contrary to conventional wisdom, the power consumed while running the High Performance Linpack (HPL) benchmark is very close to the power consumed by any subset of a typical compute-intensive scientific workload. Therefore HPL, which in most cases cannot serve as a suitable workload for performance measurement, can be used for the purposes of power measurement. Furthermore, we show through measurements on a large scale system that the power consumed by smaller subsets of the system can be projected straightforwardly and accurately to estimate the power consumption of the full system. This allows a less invasive approach for determining the power consumption of large-scale systems.

1. INTRODUCTION
We are entering an era in which petaflop HPC systems are anticipated to draw enormous amounts of electrical power. Concerns over total cost of ownership have shifted the focus of HPC system architecture from peak performance towards improved power efficiency. The increase in power consumption can be illustrated by comparing typical top HPC systems. In November 2001, NERSC's new 3 teraflop HPC system reached #3 on the Top500 list of the most powerful machines while using less than 400 KW of electrical power. In November 2007, NERSC's 100 teraflop successor consumes almost 1,500 KW without even being in the top 10. The first petaflop-scale systems, expected to debut in 2008, will draw 2-7 megawatts of power, and projections for exaflop-scale systems, expected in 2016-2018, range from 60-130 megawatts [16]. As a result, fewer sites in the US will be able to host the largest scale computing systems, owing to the limited availability of facilities with sufficient power and cooling capabilities. Following this trend, an ever increasing proportion of an HPC center's budget will be needed to supply power to these systems.

The root cause of this impending crisis is that chip power efficiency is no longer improving at historical rates. Until now, Moore's Law improvements in photolithography resulted in proportional reductions in dynamic power consumption per transistor and consequent improvements in clock frequency at the same level of power dissipation, a property referred to as Dennard scaling. Below 90 nm, however, static power dissipation (power lost to current leakage through the silicon substrate) has overtaken dynamic power dissipation, stalling clock frequency improvements in order to stay within practical thermal dissipation limits. The free ride of clock frequency and power efficiency improvements is over: power is rapidly becoming the leading design constraint for future HPC systems, and after many years of architectural evolution driven by clock frequency improvements at any cost, architectural power efficiency matters once again.

In this paper we address how power consumption on small and large scale systems can be measured for a variety of workloads. We do not address the related but independent question of performance measurement itself. In Section 2 we discuss approaches for defining workloads, procedures for power measurement, and different power efficiency metrics. In Section 3 we describe the experimental setup for the systems in our study. Results for single-node measurements are presented in Section 4, for single-cabinet measurements in Section 5, and for a full large scale system in Section 6. In Section 7 we demonstrate that full system power consumption can be approximated with high accuracy by extrapolating from power consumption measured at the cabinet level. Our conclusions are presented in Section 9.
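As a rough illustration of the cabinet-level extrapolation summarized above (the actual measurements and analysis appear in Section 7), the arithmetic amounts, in the simplest case, to scaling the average measured cabinet power by the cabinet count and adding any centrally metered load the cabinet meters do not capture. The sketch below is not code from this study; the helper name, cabinet count, readings, and overhead term are all hypothetical.

    # Minimal sketch of projecting full-system power from per-cabinet
    # measurements; all names and numbers are hypothetical.
    def project_system_power(cabinet_watts, total_cabinets, overhead_watts=0.0):
        """Scale the average instrumented-cabinet power (W) to the full system
        and add power for components that are not metered per cabinet."""
        avg_cabinet = sum(cabinet_watts) / len(cabinet_watts)
        return avg_cabinet * total_cabinets + overhead_watts

    # Example with made-up readings from three cabinets running HPL.
    print(project_system_power([8900.0, 9100.0, 9000.0],
                               total_cabinets=102, overhead_watts=45000.0))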
2. RELATED WORK IN POWER EFFICIENCY METRICS
While metrics for assessing performance such as SPEC-FP [5], the NAS Parallel Benchmarks [9], and the Top500 list [6] have received considerable attention over the past decade, similarly robust assessments of power efficiency have received comparably little attention. In order to foster an industry-wide focus on keeping power consumption under control, it is necessary to provide appropriate power-efficiency metrics that can be used to compare and rank systems, much as LINPACK is used to rank peak delivered performance. Such efforts are already underway for commercial data centers: the EPA Energy Star program, for example, has defined a rigorous set of Server Metrics [1] and testing methodologies oriented towards transactional workloads, but these are inappropriate for assessing the power efficiency delivered for HPC and scientific workloads. The emerging SpecPower metric is valuable for assessing technical applications on workstation-class computing platforms, but may have limited applicability to large scale HPC systems.

It is our intent to foster the development of power metrics by popular HPC rankings such as the Top500 list, and thereby efficiency standards that are appropriate for scientific workloads. To arrive at a sound power efficiency metric, we need to define a suitable workload for both performance and power measurements, a power measurement procedure, and an appropriate metric itself.

2.1 Workload Definition for Performance Measurements
For any serious evaluation of performance, it is critically important to develop a workload that correctly reflects the requirements of large-scale scientific computing applications. Contrary to first impressions, this remains largely unsolved and has not proven straightforward.

Numerous alternative computer architectures have been proposed to address power efficiency concerns, but little information is available on their sustained power efficiency. Novel architectures such as general-purpose GPUs [14], the STI Cell Broadband Engine [18], and embedded-processor-based systems like the IBM BG/L hold some promise of improving the power efficiency of HPC platforms, but the lack of a uniform basis for comparison makes it difficult to determine whether any of these approaches offer genuine power efficiency benefits for relevant scientific computing problems.

... HPCC benchmark performance). There is no need to measure the power consumption under each different benchmark workload; the power drawn while running LINPACK can suffice.

2.3 Methodologies for Measuring Power Consumption
Several methods for measuring power usage on current architectures have been proposed. These methods differ in the tools used to measure power consumption and in the places where valid measurements can be collected. In this study we explore several different measurement methods and compare their effectiveness in our experience.

We investigated a variety of measurement techniques to fit the diverse constraints of existing facility infrastructure. For example, in many facilities more than one system shares the same PDU or metered circuit, making it very difficult to isolate the power of a single system. Warren et al. [17] use Transmeta processors and infrastructure that allow them to measure power with an off-the-shelf UPS, both because the system's power consumption is low and because it draws power from standard 3-prong wall sockets. Their methodology is not generally applicable, since most cluster systems do not draw power from wall sockets; in addition, current cluster designs attempt to perform a single AC-to-DC conversion for the entire rack.

A 2005 study by Feng et al. [11] proposed a framework for building cluster systems from commodity components, integrating a set of extension wires from the power supply, each connected to a sense resistor and a digital multimeter. Experimentally, they correlate each wire with its associated components and then measure the power consumption of various NAS Parallel Benchmark codes. Although their power measurement hardware is infeasible for many systems, they provide important results that agree with our findings.
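To make the sense-resistor technique concrete, the sketch below converts a multimeter reading into the power drawn on one supply wire using Ohm's law. It is only an illustration of the idea described above, not the instrumentation used by Feng et al.; the resistor value, rail voltage, and reading are invented for the example.

    # Illustrative sketch of the sense-resistor measurement: a small resistor is
    # placed in series with a power-supply wire, the voltage drop across it gives
    # the current, and multiplying by the rail voltage gives the power drawn.
    # All values below are hypothetical.
    def component_power(v_drop, r_sense, v_rail):
        """Power (W) drawn on one supply wire: I = v_drop / r_sense, P = v_rail * I."""
        current = v_drop / r_sense
        return v_rail * current

    # Example: 20 mV drop across a 1 milliohm sense resistor on a 12 V rail.
    print(component_power(v_drop=0.020, r_sense=0.001, v_rail=12.0))  # 240.0 W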
Previous work has shown that it can be extremely challenging (and likely impractical) to measure power for a complete HPC system while running the LINPACK benchmark. Systems that have already collected performance data are loath