Introducing Application Awareness Into a Unified Power Management Stack

Daniel C. Wilson∗‡, Siddhartha Jana∗, Aniruddha Marathe†, Stephanie Brink†, Christopher M. Cantalupo∗, Diana R. Guttman∗, Brad Geltz∗, Lowren H. Lawson∗, Asma H. Al-rawi∗, Ali Mohammad∗, Fuat Keceli∗, Federico Ardanaz∗, Jonathan M. Eastep∗, Ayse K. Coskun‡
∗Intel Corporation, †Lawrence Livermore National Laboratory, ‡Boston University
Email: ∗{siddhartha.jana, christopher.m.cantalupo, diana.r.guttman, brad.geltz, lowren.h.lawson, asma.h.al-rawi, ali.mohammad, fuat.keceli, federico.ardanaz, jonathan.m.eastep}@intel.com, †{marathe1, brink2}@llnl.gov, ‡{danielcw, acoskun}@bu.edu

Abstract—Effective power management in a data center is critical to ensure that power delivery constraints are met while maximizing the performance of users' workloads. Power limiting is needed in order to respond to greater-than-expected power demand. HPC sites have generally tackled this by adopting one of two approaches: (1) a system-level power management approach that is aware of the facility or site-level power requirements, but is agnostic to the application demands; OR (2) a job-level power management solution that is aware of the application design patterns and requirements, but is agnostic to the site-level power constraints. Simultaneously incorporating solutions from both domains often leads to conflicts in power management mechanisms. This, in turn, affects system stability and leads to irreproducibility of performance. To avoid this irreproducibility, HPC sites have to choose between one of the two approaches, thereby leading to missed opportunities for efficiency gains. This paper demonstrates the need for the HPC community to collaborate towards seamless integration of system-aware and application-aware power management approaches. This is achieved by proposing a new dynamic policy that inherits the benefits of both approaches from tight integration of a resource manager and a performance-aware job runtime environment. An empirical comparison of this integrated management approach against state-of-the-art solutions exposes the benefits of investing in end-to-end solutions to optimize for system-wide performance or efficiency objectives. With our proposed system–application integrated policy, we observed up to 7% reduction in system time dedicated to jobs and up to 11% savings in compute energy, compared to a baseline that is agnostic to system power and application design constraints.

Index Terms—Energy-aware systems, Scheduling, Information resource management

Fig. 1. Power usage of Quartz system at LLNL over a period of one year. The dashed line shows the peak power rating of the system. The solid blue line shows actual instantaneous power usage whereas the solid black line shows the moving average of the instantaneous power draw over a window of one day.

I. INTRODUCTION

Sizing a data center's power supply involves a trade-off between peak performance of individual workloads, and the total number of hosts available to run those workloads. Invest too little in power, and you run the risk that a cluster will perform poorly because it needs more power than can be delivered. But if you invest too much in power delivery equipment, you sacrifice capital and floor space that could have gone to acquiring more infrastructure. Several examples involving under-utilization of procured power exist in the real world. For example, Figure 1 shows the average power draw of the Quartz system at LLNL over a period of one year. The system is rated to consume 1.35 MW as indicated by the dashed line, but the average power draw remains at 830 kW. Another example is the Cori system at NERSC which is rated to consume 5.7 MW, but its peak power has only reached 4.6 MW, and it typically only consumes 3.9 MW [1].

Power delivery infrastructure must ensure that a site's total power consumption does not exceed the deliverable power capacity. The site's power draw depends on non-uniform demands of its compute nodes, servicing different applications. State-of-the-art approaches for power management of the compute nodes can be divided into two categories: (1) system-level control based on the observed consumption (e.g., [2], [3]), and (2) application-level control based on the required power (e.g., [4]).
Solution-(1) has visibility into the site-level constraints, but is agnostic to the actual application demands. Solution-(2) is aware of the application design requirements, but remains oblivious to the available power at the system level. This paper presents an opportunity analysis of approaches that leverage the best of both solutions by integrating system-level and application-level power management policies. The key takeaways from this paper are as follows:

• A siloed power management strategy, isolated to a single layer of the system stack, does not sufficiently optimize system performance or efficiency objectives. Moreover, having multiple such solutions run simultaneously without coordination leads to unpredictability. For example, if power limits are controlled through the same hardware interface by both a resource manager and a job runtime environment, one layer may unintentionally overwrite limits set by the other layer.
• The solution is for HPC vendors to use holistic approaches that apply power management techniques that run in tandem to address both system-level and application-level power requirements.
• Power delivery infrastructure can handle more aggressive power limits with less impact to quality of service by coordinating between layers. Such coordination can ultimately increase the amount of science per watt.

The paper presents an opportunity analysis for introducing interoperability among multiple layers of the system software stack. The novel contributions of this paper are as follows:

1) We propose a workload-aware and system-power-aware policy that reduces time to solution in a power-limited cluster by distributing power both across and within workloads in a performance-aware manner.
2) We identify a set of application design characteristics that emulate different power and performance trade-offs seen in HPC applications, and we design a microbenchmark¹ to trigger these different conditions.
3) We evaluate the proposed policy against two other dynamic policies with either system power awareness or workload awareness, and an additional policy with neither feature. Our evaluations at scale show up to 7% system time reduction and up to 11% CPU energy savings, versus the policy that has neither feature.

¹The synthetic kernel used in this study is hosted on a public repository at https://github.com/dannosliwcd/arithmetic-intensity.git
This paper is organized as follows. We first motivate the implementation of a multi-level system-wide power management stack. Next, we describe the power management policies, along with the synthetic benchmark and cluster environment used for evaluation. Then we describe the matrix of workload combinations and power cap levels. We explain our results before moving on to discuss related approaches to system-wide power management. We conclude with takeaways and the next steps that should follow this work.

II. MOTIVATION

HPC sites have different power and energy requirements, depending on their environment and the types of workloads they run. The Energy Efficient High Performance Computing Working Group (EEHPC-WG) performed a survey of Top500 sites working with energy- and power-aware job scheduling and resource management [5], showing a need for awareness of dynamic factors in both power supply and demand at a site.

Within a site, a system can be composed of many nodes that react differently under the same workloads. Even with a homogeneous system containing all nodes with the same configuration, hardware variation can have a significant impact on workload efficiency and performance [6].

Workload characteristics also have a significant impact on performance and efficiency [7]. For example, workloads often have compute-intensive and non-compute-intensive phases. Non-compute-intensive phases could result from pipeline stalls due to dependencies on memory, network resources, storage devices, accelerators, scheduling, etc. Since CPU activity is a major contributor to total system power, and can be controlled with low-latency interfaces, this paper studies the impact of controlling CPU power to meet total system power and performance constraints.

Even within a phase of an application, workloads can require varied demand for compute resources across processes, called workload imbalance in this paper. Workload imbalance in bulk-synchronous workloads can make the application's overall performance largely insensitive to the performance of some of its parts, since only the work in the workload's critical path will have an immediate impact on aggregate performance. Several efforts have identified software critical paths for optimization in development and for modifying an execution environment to exploit that insensitivity [4], [8].

The theme among our motivations is that some performance and efficiency factors are most visible with a system-wide perspective, and others are more visible within the scale of a single workload execution. In this paper, we evaluate power management policies with varying degrees of visibility into system constraints and workload properties. Through our evaluation, we demonstrate an opportunity for the HPC community to work toward standardization of power management policies across layers of a system stack. This is in alignment with the charter of the HPC PowerStack consortium [9].

III. SYSTEM-LEVEL POWER MANAGEMENT POLICIES

To evaluate the potential for performance improvement, we design a new power management policy, MixedAdaptive. We also implement baseline power management policies for comparison purposes.

A. Performance-Aware System Power Management

The proposed MixedAdaptive policy enables a resource manager to share power across jobs in a power-aware manner. This policy's power awareness is made available to the resource manager by a job runtime, which can search at execution time for the distribution of available power that minimizes elapsed time per iteration in a workload. Our experiments utilize the GEOPM [4] job runtime to apply energy- and performance-aware power management algorithms. GEOPM includes multiple plugins (called agents) for monitoring and managing efficiency.

While existing job runtimes are able to utilize performance awareness, they are not currently able to establish an execution-time feedback loop with a resource manager. We emulate such a feedback loop in our experiments by first running our workloads under the GEOPM power balancer agent and identifying the steady-state power consumption for each workload host. The power balancer agent reduces the power limit where it does not impact performance, and redistributes that power where it can improve performance, all during execution. Our experiments use the final power distribution from a pre-characterization run with that agent to select how much power each job is allocated by a power-limited resource manager, and when determining how a job runtime would distribute that allocated power within each job.
The MixedAdaptive power distribution steps are as follows:

1) Uniformly distribute the system power limit among hosts across all jobs.
2) Decrease the allocated power of each host down to the amount of power needed on that host, as determined by the previously described power balancer pre-characterization runs. The total amount of decreased power is now considered deallocated. If there is a significant enough power shortage, the surplus can be as low as zero watts.
3) Uniformly distribute the deallocated power among hosts that need more power to meet their characterized performance, at most up to the characterized power. Repeat this step until no deallocated power remains, or all hosts have been assigned their needed power.
4) If there is a power surplus, allocate the remainder of power across all hosts with a weighted distribution. The weight of each host is determined by the distance from the host's minimum settable power limit to the host's allocated power from previous steps.
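The following Python sketch restates the four steps for concreteness. It is a minimal illustration, not GEOPM or resource manager code: the function name, its inputs (needed_power from the power balancer pre-characterization, and min_power as each host's minimum settable limit), and the iteration details are assumptions made for this example.

```python
def mixed_adaptive_allocation(system_limit, hosts, needed_power, min_power):
    # Step 1: uniformly distribute the system power limit among hosts
    # across all jobs.
    alloc = {h: system_limit / len(hosts) for h in hosts}

    # Step 2: decrease each host's allocation down to the power it needs
    # (from the power balancer pre-characterization); the difference is
    # considered deallocated.
    deallocated = 0.0
    for h in hosts:
        if alloc[h] > needed_power[h]:
            deallocated += alloc[h] - needed_power[h]
            alloc[h] = needed_power[h]

    # Step 3: repeatedly hand deallocated power to hosts still below
    # their characterized need, never exceeding that need.
    while deallocated > 1e-9:
        under = [h for h in hosts if alloc[h] < needed_power[h] - 1e-9]
        if not under:
            break
        share = deallocated / len(under)
        deallocated = 0.0
        for h in under:
            grant = min(share, needed_power[h] - alloc[h])
            alloc[h] += grant
            deallocated += share - grant  # leftovers go around again

    # Step 4: if a surplus remains, spread it over all hosts, weighted by
    # how far each host's current allocation sits above its minimum
    # settable power limit.
    if deallocated > 1e-9:
        weights = {h: max(alloc[h] - min_power[h], 0.0) for h in hosts}
        total = sum(weights.values())
        for h in hosts:
            frac = weights[h] / total if total > 0 else 1.0 / len(hosts)
            alloc[h] += deallocated * frac
    return alloc
```

One detail left open by the prose, and chosen arbitrarily here, is how step 3 iterates; the sketch simply re-divides any leftover power among the still-underfed hosts until none remains.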
B. Baseline Power Management Policies

In addition to the previously-described power balancer characterization, our baselines also use workload characterization data from the GEOPM monitor agent, which simply reports requested metrics of interest, such as energy and time, without modifying system behavior. Our methods for determining policy power allocations based on GEOPM characterization data are explained below.

For the Precharacterized policy, a user pre-characterizes a workload, and submits the job with a cap equal to the average power consumption at the most power-hungry node. This policy does not consider system-wide power limits.

For the StaticCaps policy, system power is uniformly distributed to all nodes in the cluster. A static cap is applied for each job, using the max of average powers from all nodes in the job's monitor characterization run. Note that this policy's final state is the same as the initial state of the MinimizeWaste and MixedAdaptive power-sharing policies.

MinimizeWaste shares system power across hosts, to minimize unused power budget. This policy is intended to statically emulate the dynamic approach documented in SLURM's real-time power management feature [3], which is full-system-aware. Our policy first distributes power caps across jobs. It then reduces the budget for low-power jobs to minimize unused (wasted) power budgets, and evenly redistributes power to high-power jobs. The power is removed from and added to jobs based on the observed performance-agnostic power usage (obtained from GEOPM reports) for each workload. Surplus power is redistributed, weighted by the difference between minimum settable power and currently assigned power.

For the JobAdaptive policy, system power is dynamically shared within jobs to maximize performance, but power cannot be shared across different jobs. In other words, the policy is not full-system-aware. The system power cap is initially distributed uniformly across jobs. Power is further distributed among hosts within each job, based on the performance-aware characterization data (from GEOPM reports). If any of the nodes are assigned a power limit that exceeds an evenly-distributed power cap, then all nodes in the job have their power caps reduced by the percentage of their current power consumption that corrects that violation.
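The within-job rescaling rule of JobAdaptive can be summarized with a similar sketch. Again, this is a simplified illustration under assumed inputs (a single characterized power value per host from the performance-aware runs), not the actual implementation; in particular, the even per-job budget here is derived from node counts.

```python
def job_adaptive_allocation(system_limit, jobs, characterized_power):
    """jobs maps a job name to its list of hosts; characterized_power maps
    each host to the power observed in its performance-aware
    (power balancer) characterization run."""
    total_hosts = sum(len(hosts) for hosts in jobs.values())
    alloc = {}
    for hosts in jobs.values():
        # Each job receives its even share of the system power limit;
        # power is never shared across jobs.
        job_budget = system_limit * len(hosts) / total_hosts
        requested = {h: characterized_power[h] for h in hosts}
        total_requested = sum(requested.values())
        # If the characterized demand violates the job's budget, every
        # host in the job is scaled down by the same percentage.
        scale = min(1.0, job_budget / total_requested)
        for h in hosts:
            alloc[h] = requested[h] * scale
    return alloc
```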

IV. WORKLOAD DESIGN

We use a synthetic kernel to provide fine-grained control of factors that dictate the energy/power signatures of a target platform. This section gives an overview of the different application characteristics that are captured by the kernel, followed by results of a characterization study that identifies energy/power bounds of the kernel. We also present a roofline analysis of the benchmark to show that the kernel exhibits the expected range of memory- and compute-boundedness.

Fig. 2. Design characteristics of the synthetic microbenchmark used for emulating typical HPC application properties. (Figure labels: Common Work, Imbalance Work, Waiting Ranks, Compute Phase (Synthetic Kernel), Slack/Polling Phase (MPI_Barrier), Time → Iteration End.)

A. Compute Intensity Benchmark

Our synthetic kernel is derived from a benchmark used by Choi et al. to evaluate a roofline model of energy [10]. The makeup of the kernel is illustrated in Figure 2. The typical application characteristics that impact the energy/power signature of an application are as follows:

• Computational intensity: This is the ratio of compute work to memory work, in FLOPs/byte. Intensity directly impacts the maximum possible CPU throughput [11].
• Vector length of instructions: A fixed computational intensity can have power-performance trade-offs that vary with the length of vector instructions. We use vector fused multiply-add (FMA) and load instructions to capture a high ratio of compute work to memory work.
• Excess time on a non-critical path: Processes in large-scale bulk-synchronous and fork-join applications frequently converge at synchronizing points within the application. Such applications can end up with some processes polling at synchronizing points while making no application progress, but still consuming energy. Our microbenchmark captures such energy sinks by controlling the work performed on the non-critical path.
• Percent of waiting ranks: This is the fraction of synchronizing processes on the non-critical path. A higher percentage corresponds to a higher number of processes polling at a barrier and consuming energy without making any application progress.
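The published kernel referenced in footnote 1 is a compiled benchmark; the Python/mpi4py fragment below is only a schematic of the iteration structure in Figure 2, with made-up parameter names (common_work, imbalance_factor, waiting_fraction) standing in for the knobs listed above. It is not the benchmark's actual code and does not reproduce its control over vector instruction widths.

```python
import numpy as np
from mpi4py import MPI

def run(iterations, common_work, imbalance_factor, waiting_fraction,
        array_len=1 << 20):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    # Ranks in the trailing fraction are "waiting ranks": they perform
    # only the common work and then poll at the barrier.
    is_waiting = rank >= int(size * (1.0 - waiting_fraction))
    work = common_work if is_waiting else int(common_work * imbalance_factor)
    a = np.random.rand(array_len)
    b = np.random.rand(array_len)
    for _ in range(iterations):
        # Compute phase: repeated multiply-add style traversals; more
        # repetitions per byte loaded mean a higher FLOPs/byte ratio.
        for _ in range(work):
            a = a * 1.000001 + b
        # Slack/polling phase: all ranks converge here; waiting ranks
        # burn time (and energy) without making application progress.
        comm.Barrier()
```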

We verify that the kernel covers the full spectrum of achievable throughput of the platform by overlaying the kernel's performance on top of the system's roofline plot. Figure 3 shows that the kernel's performance at each configuration reaches the peak performance (in GFLOPS) of the system, bounded by the DRAM bandwidth and double-precision floating point vector fused multiply-add operations. The data points corresponding to the kernel behavior are depicted by colored dots in the graph, which is generated by the Intel Advisor tool [12].

Fig. 3. The roofline plot of the synthetic kernel when executed on the target platform. The X-axis corresponds to the computational intensity (FLOPs/Byte). The Y-axis corresponds to the achieved computational throughput (in GFLOPS). The colored dots correspond to the runs with the synthetic kernel. The bold continuous black line corresponds to the maximum achievable performance on the target platform, given our workload configurations.
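For reference, the ceiling that the kernel's data points are compared against is the standard roofline bound [11]: for computational intensity I (FLOPs/byte), memory bandwidth B, and peak floating-point throughput P_peak, the attainable throughput is

\[ P_{\mathrm{attainable}}(I) = \min\left(P_{\mathrm{peak}},\; I \cdot B\right). \]

Taking the 12.44 GB/s DRAM bandwidth annotated in Figure 3 as B, a configuration at 1 FLOP/byte is DRAM-bound at roughly 12 GFLOPS, while the highest-intensity configurations (16 and 32 FLOPs/byte) run up against the double-precision vector FMA ceiling instead.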

B. Workload Characterization

For our experiments, we need to know: (a) the maximum power each workload consumes under no power constraints, and (b) the minimum power each workload needs to complete execution. Greater differences between these metrics indicate more opportunity for leveraging application awareness in power-constrained scenarios.

We obtain Metric-(a) by executing each workload with the GEOPM monitor agent across 100 test nodes. The heat map in Figure 4 summarizes our measurements.

Fig. 4. Total CPU power per node for different workload configurations, when running the benchmark variant that uses 256-bit (ymm) vector registers with no power limit under the GEOPM monitor agent. This non-capped power consumption is largely insensitive to imbalance. (Rows give the computational intensity from 0.25 to 32 FLOPs/byte; columns give the percent of waiting ranks at 2x and 3x imbalance.)

We obtain Metric-(b) by observing the actual power consumed by each workload when subjected to an average power budget equal to the total TDP (Thermal Design Power) of each node. The resulting measurements, obtained using the GEOPM power balancer agent, are shown in Figure 5.

Fig. 5. Total CPU power per node for different workload configurations, when running the benchmark variant that uses 256-bit (ymm) vector registers under the GEOPM power balancer agent. The most significant reductions in power from the monitor case are in the mid-intensity range. The apparent vertical bands indicate that the percent of waiting ranks has a strong effect on average needed power, which motivates using a performance-aware power management policy in those cases.

V. EXPERIMENTAL SETUP

This section describes methods we follow to design our evaluation grid. This setup is divided into three sections. Section V-A explains the computing environment in which we run our experiments. Section V-B explains how we select combinations of workloads to evaluate in that environment. Section V-C outlines how we select power budgets to apply to those workload mixes.

We limit the scope of our investigation to focus on workload CPU power. Other work may explore a holistic data management system by also considering other major contributors to power and performance, such as network fabric, memory hierarchy, secondary storage, accelerator offloads, and cooling infrastructure. While we do not consider hardware variation in our comparisons, we control for it by selecting similarly-performing nodes for our experiments.
A. Test Environment

1) Quartz System: Our experiments are run on the LLNL Quartz system. The properties of Quartz are summarized in Table I. This system provides large-scale access to fine-grained power management knobs via the MSR-safe Linux kernel module [13]. Our experiments utilize 34 cores per node for the benchmark, leaving the remaining two cores for monitoring processes and system services. Although these experiments are executed on a single Intel architecture, they can be ported to other architectures (Intel and non-Intel) by leveraging GEOPM's portable plugin infrastructure.

TABLE I
QUARTZ SYSTEM PROPERTIES
CPU: Intel Xeon E5-2695, dual-socket
Cores Per Node: 36
Operating System: TOSS 3 (Red Hat Enterprise Linux 7.8)
Thermal Design Power: 120 W per CPU socket
Minimum RAPL Limit: 68 W per CPU socket
Base Frequency: 2.1 GHz

2) Hardware Variation: Many of our experiments are running under tight power constraints, so it is important to consider the impact of hardware variation when running under a power cap. To reduce the impact of hardware variation on our final results, we identified a subset of the cluster's nodes that perform similarly with our workload. To do this, we first monitored the achieved frequency of each node in the cluster while running our most power-hungry workload configurations under a low power limit. We used k-means clustering over the achieved frequencies to partition the nodes into three groups. We used the 918 medium frequency nodes in Figure 6 for our experiments so that our results reflect a central tendency of performance in a significant portion of our test system.

Fig. 6. Achieved frequencies of 2000 nodes in the Quartz cluster, under 70 watt CPU power limits. We use the medium frequency cluster for other stages of our experiments. Whiskers extend to the nearest samples beyond the inter-quartile range (IQR) by 1.5*IQR. Circles represent observations beyond the whiskers. (Cluster sizes: low n = 522, medium n = 918, high n = 560.)
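The node-screening step can be illustrated with the short sketch below. It assumes per-node mean achieved frequencies have already been collected under the 70 W cap; the use of scikit-learn here is an illustrative choice, not necessarily the tooling used for this study.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_medium_frequency_nodes(node_freqs):
    """node_freqs: dict mapping node name -> mean achieved frequency (GHz)."""
    names = list(node_freqs)
    freqs = np.array([node_freqs[n] for n in names]).reshape(-1, 1)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(freqs)
    # Rank the three clusters by mean frequency; the middle one is "medium".
    order = np.argsort([freqs[labels == k].mean() for k in range(3)])
    medium = order[1]
    return [n for n, lab in zip(names, labels) if lab == medium]
```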

B. Workload Mixes

Our experiments include workload mixes that are composed of multiple configurations of the benchmark described in Section IV-A. The combinations of workloads in the mixes are summarized in Table II. We include some mixes that are expected to demonstrate best-case behavior for each of the baselines, as well as high power, low power, and random job mixes.

TABLE II
WORKLOADS IN EACH WORKLOAD MIX
(Each row is one benchmark configuration, described by its vector register width (xmm 128-bit or ymm 256-bit), imbalance factor (balanced, 2x, or 3x), percent of waiting ranks (0% to 75%), and computational intensity (0 to 32 FLOPs/byte); check marks indicate which of the six mixes (NeedUsedPower, HighImbalance, WastefulPower, LowPower, HighPower, RandomLarge) include each configuration.)

NeedUsedPower is expected to be the best-case mix for the MinimizeWaste policy. This case is characterized by some jobs with low average power and one job with high compute intensity, and where all used power is needed for performance, as determined by our power balancer characterization runs. In this case, the MinimizeWaste policy is expected to decrease the power allocated to the many low-power jobs, and use that spare power to help the high compute intensity job finish more quickly. Since the observed power is similar to the needed power for performance, policies that depend on performance-awareness are not expected to have an advantage.

HighImbalance is expected to be the best-case mix for the JobAdaptive policy. This case is characterized by a single, imbalanced job across all nodes. The JobAdaptive policy is expected to decrease the power to nodes within the single job that do not impact elapsed execution time, while increasing the power to nodes that do impact elapsed execution time. Since the workload is highly imbalanced within its bulk-synchronous loops, policies without performance awareness are not expected to have an advantage.

WastefulPower is expected to be the best-case mix for the MixedAdaptive policy. This is similar to the NeedUsedPower mix in the sense that it exhibits a range of average power levels. But it is different from that mix because the average power consumed by an unconstrained workload run significantly differs from the power consumed when the workload is balanced for performance under a generous power cap. The MixedAdaptive policy is expected to redistribute power within each job to maximize performance. As a result, it may reduce an imbalanced job's total allocated power to less than the total allocated power of MinimizeWaste, and it can share power across jobs unlike JobAdaptive.

LowPower contains the nine lowest power workload configurations, with 100 nodes per job.

HighPower contains the nine highest power workload configurations, with 100 nodes per job.

RandomLarge contains nine jobs selected from a random shuffle, with 100 nodes per job.

In order to identify the appropriate workload configurations for the above mixes, we separately characterize the power and performance characteristics of the workloads and of the hardware that runs our tests.
C. Selection of Power Budgets

We evaluate the power management policies at three different system-wide power budgets — min, ideal, and max — representing degrees of over-provisioning. The range of power caps we test covers the power capping region within which policies produce different power allocations relative to the needs of the workload mix. Power caps less than min result in all policies producing the same configuration as StaticCaps. Power caps greater than max result in all policies allocating at least as much power as Precharacterized. The quantitative values of the three power budgets are described in Table III.

TABLE III
POWER BUDGETS FOR EACH WORKLOAD MIX
Workload Mix    min     ideal   max*
NeedUsedPower   167 kW  171 kW  209 kW
HighImbalance   141 kW  163 kW  209 kW
WastefulPower   136 kW  144 kW  209 kW
LowPower        138 kW  152 kW  209 kW
HighPower       140 kW  177 kW  209 kW
RandomLarge     139 kW  164 kW  209 kW
*TDP of all CPUs is 216 kW

The min power budget represents an aggressively over-provisioned system, where there is little capacity to share power across workloads. Its budget is selected by determining which workload in the mix has the least power consumed by a single node under the performance-aware characterization described in Section IV-B. The system is allocated enough power to provide that amount to each node.

The ideal power budget represents over-provisioning that is ideal for a given workload mix. This budget is used in our experiments to evaluate cases where there is significant opportunity to improve performance by sharing power across workloads. Its limit is selected by summing the power used by each node for all workloads in the mix, as determined by the performance-aware characterization described in Section IV-B. The system is allocated that summed total of power.

The max power budget represents a conservatively over-provisioned system, where there is little need to share power across workloads. Its budget is selected by determining which workload in the mix has the most power consumed by a single node under the uncapped characterization described in Section IV-B. The system is allocated enough power to provide that much to each node.
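Under those definitions, the three budgets can be computed directly from the characterization data. The sketch below assumes one representative per-node power value per workload for each characterization (a simplification of the per-host data in Figures 4 and 5); the function and variable names are illustrative.

```python
def power_budgets(mix, balancer_power, monitor_power, nodes_per_job):
    """mix: list of workload names in the mix.
    balancer_power[w]: per-node power of workload w under the
        performance-aware (power balancer) characterization.
    monitor_power[w]: per-node power of workload w under the uncapped
        monitor characterization.
    nodes_per_job[w]: node count for workload w in the mix."""
    total_nodes = sum(nodes_per_job[w] for w in mix)
    # min: every node gets the smallest per-node balancer power in the mix.
    min_budget = min(balancer_power[w] for w in mix) * total_nodes
    # ideal: sum of the balancer power actually used by every node.
    ideal_budget = sum(balancer_power[w] * nodes_per_job[w] for w in mix)
    # max: every node gets the largest per-node uncapped power in the mix.
    max_budget = max(monitor_power[w] for w in mix) * total_nodes
    return min_budget, ideal_budget, max_budget
```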
VI. EXPERIMENTAL RESULTS

In this section, we evaluate the impact of various factors on elapsed time, energy, and power usage. Specifically, we discuss:
1) how system power limits impact performance under different policies
2) trade-offs of different levels of visibility into a system's resources and a workload's performance
3) how different workload mixes react to power management policies

A. Impact of System Power Cap

Support for hardware over-provisioning is one of the use cases for a system-wide power budget. We evaluate the impact of the system-wide power budgets described in Section V-C, which are meant to represent different levels of over-provisioning.

As shown in Figure 7, lower system-wide power limits, which correspond to more aggressively over-provisioned systems, result in closer to full utilization of the power capacity. The Precharacterized policy is unable to stay within the system-wide budget for all except the high power cap case, so it is omitted from further plots. Note that system-unaware policies may under-utilize the available power in cases with balanced supply and demand or with a surplus of power.


Fig. 7. Mean power used by each policy (as a percent of the system budget), across workload mixes (mixes described in Section V-B). Bars above 100% indicate that a policy exceeds the system-wide power budget. When bars are below 100%, that indicates either opportunity for more aggressive over-provisioning, or a missed opportunity to distribute power to places where it can be utilized. Annotations indicate cases where policies had large differences in power usage. Marker-(a) highlights max-power-limit cases where performance awareness enables less power usage under relaxed limits. Marker-(b) highlights ideal-power-limit cases where system power awareness enables more power utilization under a moderate limit.

In cases of balanced power demand, annotated with marker-(b), we can see that JobAdaptive is unable to utilize all of the budgeted power. This is because some budgeted power is set aside for workloads that cannot use all of their allocated power, while other workloads remain power-bound. The other policies are able to avoid this by using power awareness at the system level to share the budget across jobs.

In cases with high power budgets, annotated with marker-(a), the amount of power being provisioned is higher than needed, regardless of performance-awareness in the policy. The difference between policies in these low-power-utilization cases is observable when we run the workloads and evaluate their energy impacts. In Section VI-B, we note that when job awareness enables reduction of a power budget, it results in improved energy efficiency. This efficiency improvement grows stronger at higher power limits where there is more opportunity to decrease power without impacting performance.

The impact of power caps is also visible as trends in mean time savings and energy savings in Figure 8. The time-saving opportunity decreases as system-wide power budget increases, with a maximum opportunity at 8% (marker-(e)) in the min power case for the HighPower workload mix. In that maximum opportunity scenario, the MixedAdaptive approach achieved about 7% time savings over the StaticCaps baseline (marker-(e)). The energy-saving opportunity increases as power budget increases, with a maximum opportunity of 11% energy savings in the WastefulPower workload mix, when subjected to the maximum power budget (marker-(d)).
Takeaway 1: Sites incorporating dynamic power management policies benefit from energy savings. These savings increase with the amount of surplus power budget.

B. Impact of Job Awareness

Our experiments evaluate a static power management policy as well as three policies that depend on workload characteristics, as outlined in Section III. Of the dynamic policies, only JobAdaptive and MixedAdaptive are aware of the performance of a running workload. Here, we note some takeaways about job awareness from traits that these policies share with each other, and not with the other policies.

The plots in Figure 8 show how energy, elapsed time, energy-delay product, and FLOPS per watt are impacted by each of the policies for different workload mixes. All metrics are reported as a percent improvement from the StaticCaps policy, which makes no workload-dependent changes to power allocation. The Precharacterized policy is omitted since it is unable to operate within the budgeted power in most cases.
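The bar heights in Figure 8 can be read as relative improvement over the StaticCaps baseline: for a metric value m where lower is better (elapsed time, energy, or the energy-delay product, EDP = E × t), the reported savings are

\[ \mathrm{savings}(m) = \frac{m_{\mathrm{StaticCaps}} - m_{\mathrm{policy}}}{m_{\mathrm{StaticCaps}}} \times 100\%, \]

while the FLOPS-per-watt row reports the corresponding percent increase.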

One trend to note is that JobAdaptive and MixedAdaptive policies tend to perform similarly in the min and max power levels. Since the min power level has no opportunity for power sharing, both policies remain in their initial state, where power is distributed uniformly across hosts (refer to step 1 in Section III-A).

Those policies do have a chance to decrease their allocated power in the max power level case, which is why they are able to noticeably reduce total energy consumption in those runs. But, the cross-job power-sharing capability of the MixedAdaptive policy does not differentiate it from the JobAdaptive policy since there are no power-bound jobs that need to make use of that surplus.

Takeaway 2: HPC sites incorporating application awareness have increased opportunities for energy savings under a system power limit, compared to sites with application-agnostic solutions.

C. Impact of System-Wide Resource Awareness

The MinimizeWaste policy also depends on workload characteristics, but unlike JobAdaptive it is not aware of the workload's relationship between performance and power consumption. Instead, it relies only on observing and limiting power consumption from a resource manager's level of visibility in a cluster. MixedAdaptive also has this level of cluster power visibility, in addition to job awareness.


Fig. 8. The grid above depicts the impact of three system-wide power management policies (Section III) on different workload mixes (Section V-B) and power budgets (Section V-C). The height of each bar is quantified as a mean percentage decrease from the StaticCaps baseline measurements. Each row reports a metric of efficiency or performance (time savings, energy savings, EDP savings, and FLOPS/W increase). Each column reports a workload mix. Marker-(c) highlights an ideal-power-limit case where MinimizeWaste saves more time than JobAdaptive. Marker-(d) highlights a max-power-limit case where MixedAdaptive saves more energy than JobAdaptive. Marker-(e) highlights the case with the most time savings opportunity. Error bars show the 95% confidence interval, calculated over measurements from 100 iterations per benchmark configuration.

MinimizeWaste and MixedAdaptive both excel when workloads are all balanced with high compute intensity, and some workloads are more power-hungry than others. Unused power can be steered toward power-hungry workloads in that case, resulting in time savings for the NeedUsedPower workload mix in Figure 8.

The MixedAdaptive policy often performs similarly to the JobAdaptive policy. The cases where it stands out indicate scenarios where added resource awareness offers new opportunities. Cases that favor resource awareness manifest as additional energy savings when time savings are near equal. They are also visible in the form of time savings, as in the WastefulPower workload mix.

The energy savings of MixedAdaptive are further improved over those of JobAdaptive in some max power limit cases, most notably in the WastefulPower workload mix annotated with marker-(d). These savings are a result of the final stage of the power allocation strategy used for these policies, described in Section III. After all needed power has been distributed to the workloads, the JobAdaptive policy continues to distribute the remainder power within each workload to the nodes that need the most power. In contrast, the MixedAdaptive policy uses its system-wide resource awareness to distribute the remaining power across all workloads to the nodes that need the most power. The JobAdaptive policy is not able to take the most efficient actions with the remainder of the power budget because it does not have system-wide power awareness.

Takeaway 3: Resource awareness alone has small benefits, but when combined with application awareness, can have greater benefits than either application awareness or resource awareness alone.

D. Impact of Workload Mixes

The running workloads significantly impact savings opportunities. While most of the mixes show significant time savings and energy savings for both MixedAdaptive and JobAdaptive policies, the NeedUsedPower mix shows no opportunity for energy savings, and small time savings for the MinimizeWaste and MixedAdaptive policies.
Both differences occur because the mix of jobs is composed of workloads that actually need all of their consumed power to avoid loss of performance. There is no opportunity to save energy because any reduction in power consumption will also cause the applications to take more time to finish running. Workloads under the MinimizeWaste policy can run faster, as indicated by marker-(c), because the high power jobs are able to use the slack power budget from the low power jobs. The JobAdaptive policy can not see a benefit to elapsed time from re-balancing because the workloads are already balanced.

Takeaway 4: Application characteristics dictate the amount of surplus power available for that application. Based on the mix of applications within a job schedule, this surplus power can be steered towards improving the efficiency of the entire system.

VII. RELATED WORK

One of the key factors that differentiates this literature from past work is the incorporation of site-level, application-level, and node-level power management. To the best of our knowledge, there have been no past efforts within the HPC community that study the impact of all three layers in a unified manner. This aligns with this paper's position of advocating for interoperation between different levels of power management in a system stack. In this section, we group past works into categories that correspond to different layers of the stack.

A. Hardware-Level

Most hardware vendors provide some degree of firmware support for power management features. Examples of software interfaces to such features include DVFS (Dynamic Voltage Frequency Scaling) and power capping through vendor-specific interfaces such as Intel's RAPL (Running Average Power Limit) [14], IBM's PSR (Power Shifting Ratio) [15], NVML (NVIDIA Management Library) [16], and AMD's APM (Application Power Management) [17]. While power controls are exposed to software, the challenge is to choose a configuration that ensures efficient or performant application execution. In this work, we integrate our power management solution with the GEOPM power-management framework that utilizes RAPL for monitoring and control.

B. Application-Level

Application-level power management techniques benefit from awareness of application characteristics and design patterns. Typically these approaches leverage monitor and control knobs exposed by the underlying hardware in addition to software-level controls like the number of threads, logical core count, environment variables, etc. Multiple efforts fall into these categories. These efforts include PUPiL by Zhang and Hoffmann, for power-constrained resource allocation [18], PShifter by Gholkar et al., which dynamically adjusts the ratio of compute to network time for improved energy efficiency [8], and EAR by Corbalan et al., which detects application loops and scales frequency for reduced energy consumption [19]. Additionally, efforts like Adapt&Cap by Hankendi et al. [20] and the online algorithm [21] by De Sensi and Danelutto leverage software control knobs to boost efficiency.

All of these efforts support optimization algorithms that are application-aware, but remain oblivious of the site-specific requirements. In this work, we leverage GEOPM [4] to implement site-aware power management policies. These policies translate to specific algorithms which dynamically adapt to application requirements.
C. System-Level

State-of-the-art system-level power management techniques benefit from visibility into energy and power constraints of HPC facilities. Resource managers like HPE's Cray-ALPS [22] and Altair's PBS [23] offer static power management interfaces. Dynamic system-wide power management efforts like SchedMD's Slurm integrated power management [3] and POW by Ellsworth et al. [2] demonstrate the utility of steering power from system components with low power demand to components with high power demand. The fundamental shortcoming of an application-agnostic solution is that the power management mechanism remains incapable of responding to application-level design characteristics.

The most similar prior work is Dynamo, by Wu et al. [24]. That paper proposes a framework which utilizes hardware power monitoring and controls at all layers of their data center's power delivery hierarchy. While Dynamo supports end-to-end power management, the framework relies on knowledge that specific nodes are limited to running particular workloads. It does not directly apply to typical shared HPC scenarios where compute nodes are subjected to previously-unseen workloads by multiple users. Our work captures variable power characteristics by ensuring that our search space includes a range of system power limits and different mixes of concurrently running applications.

In this paper, we compare different system-level policies with varying degrees of visibility to application properties and the site-level power constraints. This strengthens our proposal for the community to work toward integrating system-level and application-level solutions.

VIII. CONCLUSION

While designing HPC sites under a power constraint, there arises a need to design a power management policy that works well in the common case, and minimizes the loss of quality of service in exceptional cases.

In this paper, we built upon existing efforts to use dynamic system-level power reallocation for greater power utilization of a cluster, and efforts to use dynamic workload-level power reallocation to improve performance of a power-limited application. By combining these efforts, we show cases where each baseline policy can outperform the other. In this paper, we present an opportunity analysis highlighting the need for integrating application awareness with system-level resource awareness.

It is well-established that application-aware power management solutions can exhibit efficiency gains in a power-constrained environment. However, this approach alone fails to account for system-level power constraints that dictate HPC site policies. In this paper, we present empirical evidence that suggests that integrating application and system-level solutions together leads to more tangible savings, showing up to 7% reduction in system time and up to 11% savings in energy.
Since there is not currently an existing protocol or central mechanism for coordinating power management decisions across a data center's power delivery hierarchy, we emulated this execution-time behavior by pre-characterizing our workloads and determining the steady-state power management properties ahead of time. By defining such a protocol, this approach could be adapted to occur at execution time by coordinating system-level objectives of a resource manager with workload-level objectives of a job runtime.

As a follow-up to this literature, this work can further be extended to include additional levels of power and performance controls, such as cooling, networking, storage, and workload accelerators. Future work will also include extending this study to account for applications with multiple phases that have varying design characteristics. A large-scale empirical study for this effort will require extending modern-day job schedulers to dynamically react to changes in system-level policies. Through current and future efforts, we hope to establish a strong foundation for community-wide developments within the HPC PowerStack consortium.

ACKNOWLEDGMENTS

Development of the GEOPM software package has been partially funded through contract B609815 with Argonne National Laboratory. Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-815595).
REFERENCES

[1] J. Broughton, "HPC energy efficiency in the exascale era," https://eehpcwg.llnl.gov/assets/sc19_01_915_1000_hpc_energy_efficiency_in_the_exascale_era.pdf, Nov. 2019, keynote talk at the Energy Efficient HPC Workshop, SC 2019.
[2] D. A. Ellsworth, A. D. Malony, B. Rountree, and M. Schulz, "POW: System-wide dynamic reallocation of limited power in HPC," in HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015, pp. 145–148.
[3] "Slurm Power Management Guide," https://slurm.schedmd.com/power_mgmt.html, Mar. 2018, [Online; accessed 2020-01-28].
[4] J. Eastep, S. Sylvester, C. Cantalupo, B. Geltz, F. Ardanaz, A. Al-Rawi, K. Livingston, F. Keceli, M. Maiterth, and S. Jana, "Global extensible open power manager: A vehicle for HPC community collaboration on co-designed energy management solutions," in High Performance Computing, J. M. Kunkel, R. Yokota, P. Balaji, and D. Keyes, Eds. Cham: Springer International Publishing, 2017, pp. 394–412.
[5] M. Maiterth, G. Koenig, K. Pedretti, S. Jana, N. Bates, A. Borghesi, D. Montoya, A. Bartolini, and M. Puzovic, "Energy and power aware job scheduling and resource management: Global survey — initial analysis," in 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018, pp. 685–693.
[6] A. Marathe, Y. Zhang, G. Blanks, N. Kumbhare, G. Abdulla, and B. Rountree, "An empirical survey of performance and energy efficiency variation on Intel processors," in Proceedings of the 5th International Workshop on Energy Efficient Supercomputing, ser. E2SC'17. New York, NY, USA: Association for Computing Machinery, 2017.
[7] T. Patki, D. K. Lowenthal, B. Rountree, M. Schulz, and B. R. de Supinski, "Exploring hardware overprovisioning in power-constrained, high performance computing," in Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ser. ICS '13. New York, NY, USA: Association for Computing Machinery, 2013, pp. 173–182.
[8] N. Gholkar, F. Mueller, B. Rountree, and A. Marathe, "PShifter: Feedback-based dynamic power shifting within HPC jobs for performance," in Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, ser. HPDC '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 106–117.
[9] C. Cantalupo, J. Eastep, S. Jana, M. Kondo, M. Maiterth, A. Marathe, T. Patki, B. Rountree, R. Sakamoto, M. Schulz, and C. Trinitis, "A strawman for an HPC PowerStack," Aug. 2018. [Online]. Available: https://hpcpowerstack.github.io/
[10] J. W. Choi, D. Bedard, R. Fowler, and R. Vuduc, "A roofline model of energy," in 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. IEEE, May 2013, pp. 661–672.
[11] S. Williams, A. Waterman, and D. Patterson, "Roofline: An insightful visual performance model for multicore architectures," Commun. ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009.
[12] "Intel Advisor," https://software.intel.com/en-us/advisor, [Online; accessed 2020-03-04].
[13] "LLNL: msr-safe," https://github.com/LLNL/msr-safe, [Online; accessed 2020-04-02].
[14] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le, "RAPL: Memory power estimation and capping," in Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design, ser. ISLPED '10. New York, NY, USA: Association for Computing Machinery, 2010, pp. 189–194.
[15] P. S. Ratio, "IBM POWER9 overview." [Online]. Available: https://variorum.readthedocs.io/en/latest/IBM.html
[16] NVIDIA, "NVML API reference manual, vR450," June 2020. [Online]. Available: https://docs.nvidia.com/deploy/nvml-api/index.html
[17] AMD, "Linux tuning guide, AMD Opteron 6200 series processors," April 2012.
[18] H. Zhang and H. Hoffmann, "Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques," in Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS '16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 545–559.
[19] J. Corbalan and L. Brochard, "EAR: Energy management framework for supercomputers," https://www.bsc.es/sites/default/files/public/bscw2/content/software-app/technical-documentation/ear.pdf, Barcelona Supercomputing Center (BSC), Tech. Rep., 2019.
[20] C. Hankendi, A. K. Coskun, and H. Hoffmann, "Adapt&Cap: Coordinating system- and application-level adaptation for power-constrained systems," IEEE Design & Test, vol. 33, no. 1, pp. 68–76, 2016.
[21] D. De Sensi and M. Danelutto, "Application-aware power capping using Nornir," in Parallel Processing and Applied Mathematics, R. Wyrzykowski, E. Deelman, J. Dongarra, and K. Karczewski, Eds. Cham: Springer International Publishing, 2020, pp. 191–202.
[22] HPE, "XC(TM) series user application placement guide - S-2496." [Online]. Available: https://pubs.cray.com/bundle/XC_Series_User_Application_Placement_Guide_CLE60UP01_S-2496/page/Run_Applications_Using_the_aprun_Command.html
[23] Altair, "PBSPro administrator guide 2020." [Online]. Available: https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2020.1.pdf
[24] Q. Wu, Q. Deng, L. Ganesh, C.-H. Hsu, Y. Jin, S. Kumar, B. Li, J. Meza, and Y. J. Song, "Dynamo: Facebook's data center-wide power management system," in Proceedings of the 43rd International Symposium on Computer Architecture, ser. ISCA '16. IEEE Press, 2016, pp. 469–480.