arXiv:1401.4655v1 [cs.OH] 19 Jan 2014
The Energy/Frequency Convexity Rule: Modeling and Experimental Validation on Mobile Devices

Karel De Vogeleer (1), Gerard Memmi (1), Pierre Jouvelot (2), and Fabien Coelho (2)

(1) TELECOM ParisTech – INFRES – CNRS LTCI - UMR 5141 – Paris, France
(2) MINES ParisTech – CRI – Fontainebleau, France
{karel.devogeleer,gerard.memmi}@telecom-paristech.fr, {pierre.jouvelot,fabien.coelho}@mines-paristech.fr

Abstract. This paper provides both theoretical and experimental evidence for the existence of an Energy/Frequency Convexity Rule, which relates energy consumption and CPU frequency on mobile devices. Using a high-resolution power gauge, we monitored a typical smartphone running a computing-intensive kernel of multiple nested loops written in C. Data gathered during a week-long acquisition campaign suggest that the energy consumed per input element is strongly correlated with CPU frequency. More interestingly, they suggest that the corresponding curve exhibits a clear minimum over a 0.2 GHz to 1.6 GHz window. We provide and motivate an analytical model for this behavior, which fits the data well. Our work should be of clear interest to researchers focusing on energy usage and minimization for mobile devices, and should provide new insights into optimization opportunities.

Keywords: energy consumption and modeling, DVFS, power consumption, execution time modeling, smartphone, bit-reverse algorithm.

1 Introduction

The service uptime of battery-powered devices, e.g., smartphones, is a sensitive issue for nearly any user [1]. Battery capacity and performance are expected to increase steadily over time. Even so, improving the energy efficiency of current battery-powered systems is essential, because users expect communication devices to provide data access anytime, anywhere, to anyone. Understanding the energy consumption of the different features of (battery-powered) computer systems is thus a key issue.
Providing models for energy consumption can pave the way to energy optimization, both at design time and at run time. The power consumption of Central Processing Units (CPUs) and external memory systems depends on the application and on user behavior [2]. Moreover, the CPU energy consumption may dominate the external memory consumption for cache-intensive and CPU-bound applications, or for specific Dynamic Voltage and Frequency Scaling (DVFS) settings [3]. For example, Aaron and Carroll [2] studied an embedded system running benchmarks from the SPEC CPU2000 suite. For equake, vpr, and gzip, the CPU consumed more energy than the RAM. For crafty and mcf from the same suite, the RAM consumed more energy than the CPU.

Providing an accurate model of energy consumption for embedded devices, and more generally for energy-limited devices such as mobile phones, is of key importance to both users and system designers. To reach that goal, our paper provides both theoretical and first experimental evidence for the existence of an Energy/Frequency Convexity Rule that relates energy consumption and CPU frequency on mobile devices. Based on both theoretical and practical evidence, this convexity property suggests the existence of an optimal frequency at which energy usage is minimal.

More specifically, we monitored a Samsung Galaxy SII smartphone running Gold-Rader's Bit Reverse algorithm [4], a small kernel based on multiple nested loops written in C. The monitoring used a high-resolution power gauge from Monsoon Solutions Inc. Data gathered during a week-long acquisition campaign suggest that the energy consumed per input element is strongly correlated with CPU frequency. More interestingly, they suggest that the corresponding curve exhibits a clear minimum over a 0.2 GHz to 1.6 GHz window. We also provide and motivate an analytical model of this behavior, which fits the data well.
Our work should be of clear interest to researchers focusing on energy usage and minimization on mobile devices, and should provide new insights into optimization opportunities.

The paper is organized as follows. Section 2 introduces the notions of energy and power, and how these can be decomposed into different components on electronic devices. Section 3 describes the power measurement protocol and methodology driving our experiments, and the C benchmark we used. Section 4 introduces our CPU energy consumption model and shows that it fits the data well. Section 5 outlines the Energy/Frequency Convexity Rule derived from our experiments and modeling. Related work is surveyed in Section 6. We conclude and discuss future work in Section 7.

2 Power Usage in Computer Systems

The total power $P_{total}$ consumed by a computer system that includes a CPU may be separated into two components: $P_{total} = P_{system} + P_{CPU}$. Here $P_{CPU}$ is the power consumed by the CPU itself and $P_{system}$ the power consumed by the rest of the system. In a battery-powered hand-held device, $P_{system}$ may include the power needed for several tasks. These include lighting the LCD display, enabling and maintaining I/O devices (including memory), and keeping sensors online (GPS, gyro-sensors, etc.).

The CPU power $P_{CPU}$, on which we focus here, can be divided into two parts: $P_{CPU} = P_{dynamic} + P_{leak}$. $P_{dynamic}$ is the power consumed by the switching activity of transistors during computation. $P_{leak}$ is the power lost to leakage effects inherent to silicon-based transistors, and in essence does no useful work for the CPU. $P_{dynamic}$ may in turn be split into two terms. $P_{short}$ is lost when transistors briefly short-circuit during gate state changes, and $P_{charge}$ is needed to charge the gates' capacitors: $P_{dynamic} = P_{short} + P_{charge}$.
In the literature, $P_{charge}$ is usually defined as $\alpha C f V^2$ [5]. Here $\alpha$ is a proportionality constant indicating the fraction of the system that is active or switching, and $C$ is the capacitance of the system. $f$ is the frequency at which the system is switching, and $V$ is the voltage swing across $C$.

$P_{short}$ arises when a logic gate toggles. During this switching, the transistors inside the gate may conduct simultaneously for a very short time, creating a direct path between $V_{CC}$ and ground. This peak current flows only over a very small time interval. Given today's high clock frequencies and large number of logic gates, however, the short-circuit current may be non-negligible. Quantifying $P_{short}$ is gate-specific, but it may be approximated as proportional to $P_{charge}$, i.e., $P_{short} = (\eta - 1) P_{charge}$. The power stemming from switching activity and short-circuit currents in a CPU is thus

$$P_{dynamic} = P_{charge} + (\eta - 1) P_{charge} = \eta \cdot \alpha C f V^2,$$

where $\eta$ is a scaling factor representing the effect of short-circuit power.

$P_{leak}$ originates from leakage currents that flow between differently doped parts of a metal-oxide-semiconductor field-effect transistor (MOSFET), the basic building block of CPUs. The energy carried by these currents is lost and does not contribute to the information held by the transistor. Some leakage currents occur in the on-state of the transistor, some in the off-state, and some in both. Six distinct sources of leakage have been identified [6]. Of these, sub-threshold leakage, gate leakage, and band-to-band tunneling (BTBT) dominate the others for sub-100 nm technologies [7]. Leakage current models, e.g., those incorporated in the BSIM [6] models, are accurate yet complex, since they depend on multiple variables. Moreover, $P_{leak}$ fluctuates constantly, as it also depends on the temperature of the system. Consequently, $P_{leak}$ cannot be considered a static part of the system's power consumption.
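The following minimal C sketch makes the dynamic-power expression concrete. It evaluates $P_{charge}$, $P_{dynamic} = \eta \cdot \alpha C f V^2$, and the resulting dynamic energy per clock cycle, $P_{dynamic}/f = \eta \alpha C V^2$. The struct and function names are hypothetical. Any parameter values fed to it are illustrative placeholders, not values measured in this paper.

```c
#include <assert.h>

/* Hypothetical switching parameters for a CMOS circuit; illustrative only. */
typedef struct {
    double alpha; /* activity factor: fraction of the circuit that switches */
    double cap;   /* switched capacitance C, in farads */
    double eta;   /* short-circuit scaling factor, eta >= 1 */
} switching_params;

/* P_charge = alpha * C * f * V^2, in watts (f in Hz, v in volts). */
double p_charge(const switching_params *p, double f_hz, double v) {
    return p->alpha * p->cap * f_hz * v * v;
}

/* P_dynamic = P_charge + (eta - 1) * P_charge = eta * alpha * C * f * V^2. */
double p_dynamic(const switching_params *p, double f_hz, double v) {
    double pc = p_charge(p, f_hz, v);
    double p_short = (p->eta - 1.0) * pc; /* short-circuit share */
    return pc + p_short;
}

/* Dynamic energy per clock cycle, P_dynamic / f = eta * alpha * C * V^2,
   in joules: at a fixed voltage it does not depend on f. */
double e_dynamic_per_cycle(const switching_params *p, double f_hz, double v) {
    assert(f_hz > 0.0);
    return p_dynamic(p, f_hz, v) / f_hz;
}
```

Take $\alpha = 0.5$, $C = 1$ nF, $\eta = 1.2$, $f = 1$ GHz, and $V = 1$ V. With these values, `p_dynamic` yields about 0.6 W and `e_dynamic_per_cycle` about 0.6 nJ. Halving $f$ at the same voltage halves the dynamic power but leaves the dynamic energy per cycle unchanged. Only a lower $V$ reduces the latter, and it does so quadratically.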
Given these different sources of power consumption in a MOSFET-based CPU, the total power can be rewritten as $P_{total} = P_{system} + P_{leak} + P_{dynamic}$.

The relationship between the power $P(t)$ (in watts, i.e., joules per second) and the energy $E(\Delta t)$ (in joules) consumed by an electrical system over a time period $\Delta t$ is given by

$$E(\Delta t) = \int_0^{\Delta t} P(t)\,dt = \int_0^{\Delta t} I(t) \cdot V(t)\,dt, \tag{1}$$

where $I(t)$ is the current supplied to the system and $V(t)$ the voltage drop over the system. Often $V(t)$ is constant over time, so that variations of $P(t)$ depend only on $I(t)$. If both current and voltage are constant over time, the energy integral reduces to the product of voltage, current, and time, or equivalently of power and time: $E(\Delta t) = I V \Delta t = P \Delta t$.

3 Power Measurement Protocol on Mobile Devices

Our testbed is a Samsung Galaxy S2, built around the 45 nm dual-core Samsung Exynos 4 System-on-Chip (SoC). The Galaxy S2 has 32 KB L1 data and instruction caches and a 1 MB L2 cache. The device runs Android 4.0.3 on the Siyah kernel, based on Linux 3.0.31. The Linux frequency-scaling governor was set to userspace mode to prevent on-the-fly frequency and voltage scaling. The second CPU core was disabled during measurements. The smartphone is booted in clockwork recovery mode to minimize noisy side effects of the Operating System (OS) and other frameworks.

During the experiments, the phone's battery was replaced by a power supply (Monsoon Power Monitor) that measures power consumption at 5 kHz with an accuracy of 1 mW. The power of the system and the temperature of the CPU were logged simultaneously. The kernel was patched to print a temperature sample to the kernel debug output at a rate of 2 Hz.

The bit-reverse algorithm is used as the benchmark kernel. It is an important operation, since it is part of the ubiquitous Fast Fourier Transform (FFT) algorithm. It deterministically rearranges the elements of an array.
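Equation (1) can be applied directly to the sampled output of a power monitor. The C sketch below approximates the integral from power samples taken at a fixed rate, using the trapezoidal rule. At the Monsoon's 5 kHz, consecutive samples are 0.2 ms apart. The sketch then divides the result by the number of processed elements to obtain the energy per input element. The paper does not state how its samples were integrated. The trapezoidal rule and the function names here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Approximates Eq. (1), E = integral of P(t) dt, from n power samples
   (in watts) taken every 1/rate_hz seconds, using the trapezoidal rule.
   Returns energy in joules. */
double energy_from_samples(const double *power_w, size_t n, double rate_hz) {
    assert(rate_hz > 0.0);
    if (n < 2)
        return 0.0;            /* fewer than two samples span no time */
    double dt = 1.0 / rate_hz; /* sampling period: 0.2 ms at 5 kHz */
    double e = 0.0;
    for (size_t i = 1; i < n; i++)
        e += 0.5 * (power_w[i - 1] + power_w[i]) * dt;
    return e;
}

/* Energy per input element, in joules (hypothetical helper). */
double energy_per_element(double energy_j, size_t elements) {
    assert(elements > 0);
    return energy_j / (double)elements;
}
```

With a constant power $P$, the sum collapses to $P \cdot (n-1)/\text{rate}$, which is the constant-power case discussed above. For example, three samples of 1 W at 2 Hz span 1 s and give 1 J.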
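On Linux, the governor setup described above is normally done through the cpufreq sysfs interface. The following C sketch selects the userspace governor, pins a frequency given in kHz, and takes a core offline. It assumes root access and the standard sysfs layout: `/sys/devices/system/cpu/cpu0/cpufreq` with the userspace governor's `scaling_setspeed` attribute, and `/sys/devices/system/cpu/cpu1/online` for CPU hotplug. The paper does not list the exact commands used, so the function names and this procedure are illustrative only.

```c
#include <stdio.h>

/* Writes `value` to the attribute file dir/name; returns 0 on success. */
static int write_attr(const char *dir, const char *name, const char *value) {
    char path[512];
    int len = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;                     /* path did not fit */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)                /* write errors may surface on flush */
        ok = 0;
    return ok ? 0 : -1;
}

/* Selects the userspace governor, then pins the frequency (in kHz).
   cpufreq_dir is e.g. "/sys/devices/system/cpu/cpu0/cpufreq". */
int pin_cpu_frequency(const char *cpufreq_dir, long khz) {
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", khz);
    if (write_attr(cpufreq_dir, "scaling_governor", "userspace") != 0)
        return -1;
    return write_attr(cpufreq_dir, "scaling_setspeed", buf);
}

/* Takes a core offline; cpu_dir is e.g. "/sys/devices/system/cpu/cpu1". */
int set_cpu_offline(const char *cpu_dir) {
    return write_attr(cpu_dir, "online", "0");
}
```

A typical call sequence would be `pin_cpu_frequency("/sys/devices/system/cpu/cpu0/cpufreq", 1200000)` followed by `set_cpu_offline("/sys/devices/system/cpu/cpu1")`. The requested frequency should be one of the values listed in `scaling_available_frequencies`, where the driver provides that file. Passing the directory as a parameter keeps the helpers testable outside `/sys`.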
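The bit-reversal kernel is CPU-intensive, induces cache effects, and is economically pertinent. The Gold-Rader implementation of the bit-reverse algorithm, often considered the reference implementation [4], is given below:

```c
#include <stdint.h>

/* Element type: per the text, each element is a pair of 32-bit
   integers representing a complex number. */
typedef struct { int32_t re, im; } complex;

/* Permutes data[0..N-1] into bit-reversed order, in place.
   N is the number of elements and must be a power of two. */
void bitreverse_gold_rader(int N, complex *data)
{
    int n = N, nm1 = n - 1;
    int j = 0;                        /* bit-reversed counterpart of i */
    for (int i = 0; i < nm1; i++) {
        int k = n >> 1;
        if (i < j) {                  /* swap each pair only once */
            complex temp = data[i];
            data[i] = data[j];
            data[j] = temp;
        }
        /* Advance j to the next index in bit-reversed order by
           propagating the carry from the most significant bit down. */
        while (k <= j) { j -= k; k >>= 1; }
        j += k;
    }
}
```

The input of the bit-reversal algorithm is an array with a size of 2N; the elements are pairs of 32-bit integers, representing complex numbers.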
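The Gold-Rader routine can be cross-checked against a more direct computation of the same permutation, sketched below in C. This is not the Gold-Rader algorithm: it reverses the bits of each index explicitly and swaps each pair once. It operates on plain `int` values rather than complex pairs, so that the resulting index order is easy to inspect. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reverses the lowest `bits` bits of i (bits = 3: 1 -> 4, 3 -> 6). */
static unsigned reverse_bits(unsigned i, unsigned bits) {
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u); /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Naive reference permutation: swaps data[i] and data[rev(i)] once per
   pair. n must be a power of two. */
void bitreverse_naive(size_t n, int *data) {
    assert(n > 0 && (n & (n - 1)) == 0);
    unsigned bits = 0;
    while (((size_t)1 << bits) < n)
        bits++;                  /* bits = log2(n) */
    for (size_t i = 0; i < n; i++) {
        size_t j = reverse_bits((unsigned)i, bits);
        if (i < j) {             /* skip self-pairs and already-swapped pairs */
            int tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For n = 8, an array holding 0, 1, ..., 7 becomes 0, 4, 2, 6, 1, 5, 3, 7. This is also the order `bitreverse_gold_rader` produces on eight elements. The Gold-Rader loop updates `j` incrementally, in amortized constant time per index. The naive version instead spends log2(n) steps per index in `reverse_bits`.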