WattWatcher: Fine-Grained Power Estimation on Live Multicore Systems Using Configurable Models

Michael LeBeane∗, Jee Ho Ryoo†, Reena Panda‡, and Lizy K. John§
Department of Electrical and Computer Engineering, The University of Texas at Austin
Email: ∗[email protected], †[email protected], ‡[email protected], §[email protected]

Abstract—Power is a first-order design constraint in any modern computing platform. Extensive research has focused on estimating power to guide advances in power management schemes, thermal hot spots, and voltage noise. However, most power estimation tools possess drawbacks related to the level of detail and amount of training needed for reliable results. This paper introduces WattWatcher, a power measurement framework that provides fine-grained power traces by passing predefined performance counters and a hardware descriptor file into configurable back-end power models. We show that WattWatcher has a mean absolute percentage error of 2.67% aggregated over all benchmarks when compared to measured power consumption on SPEC CPU 2006 and PARSEC benchmarks across three different machines.

Fig. 1. Illustration of important fine-grained monitoring features provided by WattWatcher. (a) Per Core Power Breakdown: total and per-core power (Watts) over time, with two sample points marked. (b) Functional Unit Power Breakdown: per-unit power shares (Fetch, OoO, ALU, LS, L2, L3, MC, Other) at the two sample points.

I. INTRODUCTION

Estimating the power and energy consumption of processors, either at runtime or offline, is a critical concern in modern machines. Currently, coarse grained measurements can be obtained through hardware power counters [1][2] or external probes [3]. While processor wide power metering may be useful for high level policies, such aggregate studies mask important internal power consumption trends. Figure 1(a) illustrates a common scenario where individual core power consumption trends drastically differ from the power consumption of the entire processor. Isolating and understanding the trends in individual core power consumption is critical for researching and applying per core DVFS, power capping, and power gating techniques. Even core level power estimates, however, may prove insufficient for certain areas of active research. Figure 1(b) illustrates how functional unit power consumption can vary drastically over the course of program execution. Measuring the power consumption of the internal components of a core is critical for studies in thermal analysis and voltage noise.

Some curve fitting solutions [4][5][6][1] can overcome the granularity limitations inherent in coarse grained power probes, but require extensive training on numerous targeted microbenchmarks, limiting their deployability. These solutions are also extremely sensitive to the coverage of the training set, and fail when exposed to trends that they have not yet encountered. In this paper, we present WattWatcher, a methodology and accompanying toolkit that uses detailed configurable models from the computer architecture simulation domain and adapts them for power modeling on real, live multicore systems. WattWatcher works by collecting performance events from the system under test (SUT) and passing them through easily customizable power models. Our work offers several contributions over the prior art, and carves out a unique and important spot in available power estimation methodologies:

• WattWatcher can model a variety of different processors with its extensible configuration interface. The statistics collected by WattWatcher are generic enough to apply to most modern processors in a variety of form factors. Researchers and vendors can add other processors to our tool by mapping these machines to the WattWatcher interface.

• WattWatcher's power models do not require significant training. Most curve fitting models require an extensive amount of training over a sample space wide enough to cover all possible program types that will be run in the future. WattWatcher instead explicitly models all of the major functional units in a microprocessor by querying the SUT for its hardware configuration.

• WattWatcher offers power breakdowns at the individual core and functional unit granularity. This supports advanced research that requires a much finer level of component breakdown than is available from coarse grained monitoring tools.

This paper is organized into several sections that present the WattWatcher methodology and toolkit. Section II explains the design of WattWatcher. Section III evaluates how WattWatcher performs on three different processors. Finally, Section IV concludes the paper.
II. WATTWATCHER: OVERVIEW AND OPERATION

In this paper we introduce WattWatcher, a tool that measures performance events in live systems and manipulates them to drive detailed configurable power models. WattWatcher estimates structure and functional unit access patterns to produce input traces suitable for power and timing models traditionally associated with cycle accurate simulation. By calling into the back-end power model at a much larger time granularity than traditional cycle accurate simulators, we can utilize these power models in a real-time environment.

There are a number of configurable power simulators that could serve as the backend for WattWatcher [7][8][9]. For the specific embodiment of the tool presented in this paper, we have chosen to implement WattWatcher around McPAT [8], due to its frequent updates, verification, and popularity within the computer architecture community. The following sections describe how WattWatcher integrates real hardware information into the back-end model and the layout of the tool.

TABLE I. GENERIC HARDWARE EVENTS COLLECTED BY WATTWATCHER.
Category    | Hardware Events
Core/Kernel | cycles, instructions, uops, migrations
Frontend    | branches, mispredictions, IC/iTLB misses
LS          | L1/L2/LLC misses, LLC/dTLB misses
Execution   | FP scalar, FP packed

Fig. 2. Overview of the WattWatcher toolchain. (Block diagram: the Collector on the SUT reads hardware counter events through the operating system's performance monitoring interface and sends them over a network connection to the Analyzer, which formats the input for a customized McPAT and formats its output for power analysis; the WattWatcher Controller coordinates both from an admin application.)
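The exact on-disk format of the counter descriptor that maps the generic categories of Table I to a particular processor's events is not given here, so the following is only a rough sketch, in Python, of what such a mapping could look like. The perf event names are examples and would need to be checked against the target machine with perf list; uops and packed/scalar floating point counts in particular typically require vendor-specific raw event codes (umask and event number).

    # Hypothetical counter descriptor sketch: maps the generic WattWatcher
    # categories of Table I onto perf event names for one particular SUT.
    # Names are illustrative only; vendor-specific raw events may be needed.
    COUNTER_DESCRIPTOR = {
        "core":      ["cycles", "instructions", "cpu-migrations"],
        "frontend":  ["branches", "branch-misses",
                      "L1-icache-load-misses", "iTLB-load-misses"],
        "ls":        ["L1-dcache-load-misses", "LLC-load-misses",
                      "dTLB-load-misses"],
        "execution": ["fp_arith_inst_retired.scalar_double",      # Intel-specific
                      "fp_arith_inst_retired.256b_packed_double"],
    }

    def perf_event_string(descriptor):
        # Flatten into the comma-separated list that `perf stat -e` expects.
        return ",".join(e for events in descriptor.values() for e in events)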

A. Modeling

Since most configurable architectural power models are originally designed for simulation environments, they require a large number of input statistics representing precise events inside the microarchitecture. However, researchers have noted a tight correlation between power consumption and only a few important hardware events that are exposed as performance counters [4]. WattWatcher leverages this knowledge by monitoring only the generally available counters shown in Table I, deriving the inputs to the power model either directly from these counters, or by making reasonable estimations and transformations. A few simple examples are illustrated in the following paragraphs.

OoO Engine and Instruction Fetch: These structures include the reorder buffer, instruction window, and reservation stations. Accesses to these data structures are estimated directly from the instructions (either RISC instructions or micro-ops) issued and retired.

Register File Accesses: Register file accesses are estimated based on the number and type of instructions issued. For example, we assume that integer instructions will perform on average two reads to the integer register file and one write. Floating point is handled similarly, with the width of the read and the number of accesses determined by characteristics of the instruction (single vs. double precision and packed vs. scalar).

Memory Hierarchy Accesses: Most memory hierarchy statistics are collected directly from the processor counters. Load and store queue activity is based on the L1 D-cache statistics. The raw number of accesses to a level of the memory hierarchy is estimated from the number of misses in the level directly above it. Prefetches are counted separately.

Although WattWatcher focuses on modeling fine-grained functional unit power variations, it can also model states exposed through ACPI. Performance states (P-states) can be modeled by feeding operating voltage and frequency into the configurable back-end power model. For measuring only dynamic power, WattWatcher will work correctly out of the box. Modeling leakage during processor states (C-states) is challenging, however, since the unique combination of when and where to gate certain components is vendor dependent and not generally available to the public. This is further exacerbated by McPAT's inaccurate modeling of leakage in devices using modern manufacturing technologies [10]. Fortunately, processor leakage is relatively constant within each operating processor state and can be accounted for in one of two ways. First, WattWatcher can be calibrated by measuring leakage power with a coarse grained measurement tool, such as RAPL or hardware probes, at each available processor state. WattWatcher will use this information, along with statistics it gathers on core C-state transitions, to factor in the correct leakage power. Second, if no coarse grained measurement tool is available, the datasheet for the processor can be consulted for ideal numbers regarding power dissipation in different C-states.
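To make the counter-to-input transformations above concrete, the following is a minimal sketch, assuming one sampling interval's worth of counter deltas in a plain Python dictionary. The constants mirror the assumptions stated in this subsection (roughly two integer register reads and one write per integer instruction; accesses to a cache level derived from the misses of the level above it, with prefetches tracked separately); the field names and the 2x weighting for packed floating point are hypothetical and not part of the published tool.

    # Minimal sketch of counter-to-model-input transformations for one
    # sampling interval. Field names are hypothetical.
    def estimate_accesses(c):
        est = {}

        # OoO engine / fetch: accesses scale with instructions (or uops) retired.
        est["ooo_accesses"] = c["uops_retired"]

        # Integer register file: assume ~2 reads and 1 write per integer instruction.
        int_instr = c["instructions"] - c["fp_scalar"] - c["fp_packed"]
        est["int_regfile_reads"] = 2 * int_instr
        est["int_regfile_writes"] = int_instr

        # FP register file: width/packing determine the effective access count;
        # the 2x weighting for packed operations is purely illustrative.
        fp_weighted = c["fp_scalar"] + 2 * c["fp_packed"]
        est["fp_regfile_reads"] = 2 * fp_weighted
        est["fp_regfile_writes"] = fp_weighted

        # Memory hierarchy: accesses to a level come from misses in the level
        # directly above it; prefetches are counted separately.
        est["l2_accesses"] = c["l1d_misses"] + c.get("l1d_prefetches", 0)
        est["l3_accesses"] = c["l2_misses"] + c.get("l2_prefetches", 0)
        est["dram_accesses"] = c["llc_misses"]

        return est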
B. Tool Overview

WattWatcher is a toolkit that integrates a number of utilities and McPAT together with configurable system models and functional unit estimators. The toolkit is divided into three components: the Controller, the Analyzer, and the Collector. A common workflow in which these three elements operate together is presented in Figure 2 and described in the following paragraphs.

Controller: All user interaction with the WattWatcher system is initiated via the Controller module. The user passes a number of parameters to the Controller at startup, such as the location (hostname) of the SUT(s) and the counter descriptor file containing the umask and event numbers. The WattWatcher Controller opens a connection to the SUT(s) and queries it to gather high level statistics on microarchitectural features such as cache layout, number of CPUs, and core frequency. These statistics are used to populate an XML file that represents the machine configuration in McPAT. Functional unit information is estimated from a pre-populated table of common system configurations. This information can also be overridden by a user's custom configuration file, in the event that automatic discovery is insufficient or the microarchitecture is very unconventional. The Controller then stores the system configuration for later use and proceeds to launch the Collector with the counter descriptor and machine configuration.

Collector: The Collector is in charge of gathering runtime statistics on the SUT. Towards this end, the Collector uses the popular Linux performance monitoring tool, perf [11]. Perf logs hardware performance counter information at a user defined sampling rate. The counter descriptor file provided by the Controller determines exactly which hardware events to collect, and how to classify them. In live analysis mode, the Collector constantly pipes the data to the Analyzer for immediate processing. For off-line analysis, the data is buffered on the Collector node and sent in bulk at the end of a run.

Analyzer: The Analyzer is the main module of the WattWatcher toolchain. It is responsible for turning the raw data transferred by the Collector into power estimates for the SUT. It first parses the raw data format output by the Collector into a McPAT compatible format. The raw data is combined with the system configuration obtained by the Controller to produce an XML file that represents the topology and runtime behavior of the SUT. This information can then be passed directly into McPAT for real time data reporting, or archived for offline analysis. Any missing input parameters to McPAT are estimated from the provided statistics. The results of a McPAT run are then output to the directory specified by the user in an easy-to-parse comma separated values (CSV) format.

The setup in Figure 2 is simply one example of how WattWatcher can be configured to monitor power on a target system. One can also run the Analyzer and Controller on the SUT, or the Controller and Analyzer can be configured to monitor more than one system on a cluster consisting of homogeneous or heterogeneous machines.
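The exact perf invocation used by the Collector is not listed in the paper; the following is a minimal sketch of how a Collector could drive perf in interval mode and forward one sample per interval. The one-second interval, event list, and use of sleep infinity as the measured command are only examples, and the CSV field layout of perf stat varies somewhat between perf versions, so the parsing is deliberately conservative.

    # Minimal Collector sketch (not the published tool): runs `perf stat` in
    # system-wide interval mode and yields (timestamp, event, value) tuples.
    import subprocess

    EVENTS = "cycles,instructions,branches,branch-misses,LLC-load-misses"

    def collect(interval_ms=1000):
        # -a: all CPUs, -I: print every interval_ms milliseconds, -x,: CSV output.
        cmd = ["perf", "stat", "-a", "-I", str(interval_ms), "-x,", "-e", EVENTS,
               "sleep", "infinity"]
        proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
        for line in proc.stderr:                 # perf stat reports on stderr
            fields = [f.strip() for f in line.split(",")]
            if len(fields) < 4 or line.lstrip().startswith("#"):
                continue                         # skip header/comment lines
            try:
                timestamp = float(fields[0])     # interval timestamp (seconds)
                value = int(fields[1])           # counter value
            except ValueError:
                continue                         # e.g. "<not counted>"
            yield (timestamp, fields[3], value)  # fields[3]: event name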

[Fig. 3 consists of six panels, each plotting WattWatcher and RAPL power (Watts, left axis) and instantaneous percentage error (right axis) over time, annotated with the panel's MAPE and Pearson's ρ: (a) mcf from SPECint, (b) xalancbmk from SPECint, (c) bwaves from SPECfp, (d) dealII from SPECfp, (e) canneal from PARSEC, (f) facesim from PARSEC.]

Fig. 3. Comparison of WattWatcher to RAPL power counters. The x-axis is the runtime of each benchmark in seconds.

TABLE II. SUTS USED TO EVALUATE THE PROPOSED METHODOLOGY.
Alias | Model           | Form Factor    | TDP
SUT0  | Intel i7 2720QM | Laptop (32nm)  | 45W
SUT1  | Intel i7 4700QM | Laptop (22nm)  | 47W
SUT2  | AMD A10-6800K   | Desktop (32nm) | 100W

TABLE III. ERROR ACROSS SUTS (MAE IN WATTS / RMSE IN WATTS / MAPE IN %).
      | SPECint        | SPECfp         | PARSEC         | TOTAL
SUT0  | 0.39/0.42/2.78 | 0.25/0.29/1.75 | 1.09/1.26/5.65 | 0.47/0.55/2.85
SUT1  | 0.41/0.45/2.51 | 0.36/0.39/2.10 | 0.85/0.96/4.81 | 0.51/0.57/2.97
SUT2  | 1.31/1.64/1.68 | 1.62/2.13/2.06 | 2.17/2.83/2.84 | 1.69/2.19/2.18

III. VALIDATION

In this section, we validate WattWatcher on processors from three different vendors, form factors, and manufacturing technologies, as illustrated in Table II. For our studies, we use the SPEC CPU2006 [12] and PARSEC [13] benchmark suites. SPEC is an industry standard single-threaded performance benchmarking toolkit, and PARSEC is a research benchmark suite for multicore systems. For all benchmarks, we use the largest input size available to guarantee a runtime in excess of several minutes on our fastest machines.

During validation, we do not allow frequency or voltage modulation, as we would like to show that our tool captures small variations in power related to functional unit activity, not coarse grained power management decisions. We do allow C-states that do not change operating frequency or voltage, to avoid no-op loops and polling on idle threads. We calibrate leakage power for all processor states as described in Section II-A. All references to hardware measured power refer to the appropriate power monitoring counters in AMD (APM) and Intel (RAPL) machines.

A. Time Variant Analysis

We now illustrate how WattWatcher correctly measures total power over time when compared to RAPL counters on SUT0. All data was sampled once every second over the complete execution of each program and is presented in Figure 3. The power contributions of the individual cores have been summed to give total processor power consumption. We aggregate across all cores since RAPL does not allow for finer grained breakdowns.

Starting from the top left and moving clockwise, Figure 3 shows mcf, xalancbmk, bwaves, dealII, canneal, and facesim. There are two benchmarks from each of SPECint, SPECfp, and PARSEC. Results are reported in three ways: instantaneous error, MAPE (mean absolute percentage error), and Pearson's correlation coefficient. Instantaneous error is the absolute value of the error at each sample point, and MAPE is the mean of the absolute residuals presented as a percentage of RAPL power. Pearson's correlation coefficient indicates how well WattWatcher tracks RAPL, with 1 indicating a perfect correlation and 0 indicating no correlation.
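For reference, the metrics used here follow their textbook definitions. The sketch below computes MAE, RMSE, MAPE, and Pearson's correlation coefficient from two equal-length, time-aligned lists of per-sample power readings; it is an illustration of the definitions, not code from the WattWatcher toolkit.

    # Error metrics over aligned power traces (estimated vs. measured, in Watts).
    import math

    def error_metrics(estimated, measured):
        n = len(measured)
        residuals = [e - m for e, m in zip(estimated, measured)]
        mae  = sum(abs(r) for r in residuals) / n                    # Watts
        rmse = math.sqrt(sum(r * r for r in residuals) / n)          # Watts
        mape = 100.0 * sum(abs(r) / m                                # percent
                           for r, m in zip(residuals, measured)) / n

        # Pearson's correlation coefficient between the two traces.
        mean_e = sum(estimated) / n
        mean_m = sum(measured) / n
        cov   = sum((e - mean_e) * (m - mean_m)
                    for e, m in zip(estimated, measured))
        var_e = sum((e - mean_e) ** 2 for e in estimated)
        var_m = sum((m - mean_m) ** 2 for m in measured)
        rho = cov / math.sqrt(var_e * var_m)

        return mae, rmse, mape, rho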
WattWatcher correlates with RAPL counters extremely well, with all correlation coefficients greater than 0.9. Figure 3(c) in particular is almost perfectly captured by WattWatcher. From the other figures, we can see that there are two primary sources of error in WattWatcher estimation. The first involves WattWatcher under- or over-estimating the entire workload by a small constant value; this is illustrated by Figures 3(b), 3(e), and 3(d). The second source of error springs from rapid changes in power: while the upward and downward trends are almost always captured correctly, the raw magnitude of power spikes is occasionally incorrect. This can be seen most clearly in Figures 3(f) and 3(a).

[Fig. 4 is a bar chart showing MAE and RMSE (Watts, left axis) and MAPE (percent, right axis) for each workload, grouped into SPECint, SPECfp, and PARSEC.]

Fig. 4. Error for each workload when compared to hardware power measurements.

Despite these small inaccuracies, however, WattWatcher trends very well, and the total error for these workloads is less than 5%.

B. Aggregate Analysis

Figure 4 illustrates the accuracy of WattWatcher summed over the duration of each program execution on SUT0. The results are presented as MAE (mean absolute error), RMSE (root mean squared error), and MAPE (mean absolute percentage error). For the SPECint workloads, WattWatcher achieves an MAE of 0.39 W, RMSE of 0.42 W, and MAPE of 2.78%. SPECfp workloads exhibit slightly better accuracy due to their repetitive and periodic nature, with an MAE of 0.25 W, RMSE of 0.29 W, and MAPE of 1.75%. PARSEC workloads exhibit a very different power profile than their counterparts in SPECint and SPECfp. These workloads are highly multithreaded and display a great deal of variance, both when compared with the single threaded workloads and among themselves. WattWatcher achieves an MAE of 1.09 W, RMSE of 1.26 W, and MAPE of 5.65% for the multithreaded PARSEC workloads. PARSEC's error is generally higher than SPEC's due to the presence of multiple active cores; for SPEC, only one core is active at a time, and the other three cores dissipate only leakage power, which is much easier to estimate. Overall, the MAPE across all workloads on this SUT is 2.85%.

We have also verified WattWatcher on two other SUTs, but only perform a deep dive into SUT0 due to space constraints. Table III shows the MAE, RMSE, and MAPE of WattWatcher over the three SUTs that we tested. The two Intel machines (SUT0 and SUT1) both display similar trends in accuracy across the three categories of programs. The AMD machine (SUT2) displayed less application dependent variation in power than the other two machines, resulting in similar error rates among the programs. We anticipate that this is primarily due to the difference in form factor (laptop vs. desktop) rather than an intrinsic difference between the vendors themselves.

We cannot verify the functional unit power consumption in our evaluation, due to the difficulty of measuring real hardware power for individual functional units in isolation. While it is true that an over-estimation in one component and an under-estimation in another could lead to correct overall trends, we believe that our consistently high accuracy against the RAPL counters minimizes this possibility. Furthermore, our underlying power model, McPAT, has been validated for accuracy at the functional unit level against several RTL models of real hardware in prior work [8][10].

IV. CONCLUSION

Understanding power consumption is a critical concern in any modern computing platform. Towards this end, we have developed the WattWatcher methodology and framework for fine-grained power estimation on multicore processors. Our framework estimates power by passing predefined performance counters and a hardware descriptor file into the research standard multicore power model, McPAT. Since McPAT explicitly models each component and functional unit of a modern multicore processor, WattWatcher does not require training like curve fitting power models, and it can offer the user a complete breakdown at the functional unit level. WattWatcher is capable of automatically reading the system configuration from a host machine to calibrate the power estimation tool, and can sample power at a sampling period as small as 100 ms with minimal overhead. We show that WattWatcher has a MAPE of 2.67% against measured power consumption on the SPEC CPU 2006 and PARSEC benchmarks, aggregated across three machines of differing vendors, form factors, and manufacturing processes. Through the use of this methodology, it is possible to obtain a detailed power breakdown on real hardware without vendor proprietary models or hardware instrumentation.
REFERENCES

[1] R. Joseph and M. Martonosi, "Run-time power estimation in high performance microprocessors," in ISLPED, 2001, pp. 135–140.
[2] "AMD BIOS and Kernel Developer's Guide for AMD Family 15h Models 00h-0Fh Processors," http://support.amd.com/TechDocs/.
[3] R. Ge et al., "PowerPack: Energy profiling and analysis of high-performance systems and applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 5, pp. 658–671, May 2010.
[4] W. Bircher and L. John, "Complete system power estimation: A trickle-down approach based on performance events," in ISPASS, April 2007, pp. 158–168.
[5] S. Gurumurthi et al., "Using complete machine simulation for software power estimation: The SoftWatt approach," in HPCA, 2002, pp. 141–.
[6] C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in MICRO, Dec 2003, pp. 93–104.
[7] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A framework for architectural-level power analysis and optimizations," in ISCA, June 2000, pp. 83–94.
[8] S. Li et al., "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in MICRO, Dec 2009, pp. 469–480.
[9] D. Brooks et al., "Power-performance modeling and tradeoff analysis for a high end microprocessor," in Power-Aware Computer Systems, 2001, pp. 126–136.
[10] S. Xi et al., "Quantifying sources of error in McPAT and their potential impacts on architectural studies," in HPCA, 2015.
[11] "perf: Linux profiling with performance counters," https://perf.wiki.kernel.org/.
[12] J. L. Henning, "SPEC CPU2006 benchmark descriptions," SIGARCH Comput. Archit. News, vol. 34, no. 4, pp. 1–17, Sep 2006.
[13] C. Bienia, "Benchmarking modern multiprocessors," Ph.D. dissertation, Princeton University, January 2011.