Vertigo: Automatic Performance-Setting for Linux

USENIX Association (The Advanced Computing Systems Association)

Proceedings of the 5th Symposium on Operating Systems Design and Implementation
Boston, Massachusetts, USA
December 9–11, 2002

© 2002 by The USENIX Association. All Rights Reserved. For more information about the USENIX Association: Phone: 1 510 528 8649; FAX: 1 510 548 5738; WWW: http://www.usenix.org. Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

Krisztián Flautner, ARM Limited, 110 Fulbourn Road, Cambridge, UK CB1 9NJ
Trevor Mudge, The University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122

Abstract

Combining high performance with low power consumption is becoming one of the primary objectives of processor designs. Instead of relying just on sleep mode for conserving power, an increasing number of processors take advantage of the fact that reducing the clock frequency and corresponding operating voltage of the CPU can yield a quadratic decrease in energy use. However, performance reduction can only be beneficial if it is done transparently, without causing the software to miss its deadlines. In this paper, we describe the implementation and performance-setting algorithms used in Vertigo, our power management extensions for Linux. Vertigo makes its decisions automatically, without any application-specific involvement. We describe how a hierarchy of performance-setting algorithms, each specialized for different workload characteristics, can be used for controlling the processor's performance. The algorithms operate independently from one another and can be dynamically configured. As a basis for comparison with conventional algorithms, we contrast measurements made on a Transmeta Crusoe-based computer using its built-in LongRun power manager with Vertigo running on the same system. We show that unlike conventional interval-based algorithms like LongRun, Vertigo is successful at focusing in on a small range of performance levels that are sufficient to meet an application's deadlines. When playing MPEG movies, this behavior translates into an 11%-35% reduction of mean performance level over LongRun, without any negative impact on the framerate. The performance reduction can in turn yield significant power savings.

1. Introduction

Power considerations are increasingly driving processor designs from embedded computers to servers. Perhaps the most apparent need for low-power processors is for mobile communication and PDA devices. These devices are battery operated, have small form factors and are increasingly taking up computational tasks that in the past have been performed by desktop computers. The next generation 3G mobile phones promise always-on connections, high-bandwidth mobile data access, voice recognition, video-on-demand services, video conferencing and the convergence of today's multiple standalone devices (MP3 player, game machine, camera, GPS, even the wallet) into a single device. This requires processors that are capable of high performance and modest power consumption. Moreover, to be power efficient, the processors for the next generation communicator need to take advantage of the highly variable performance requirements of the applications they are likely to run. For example, an MPEG video player requires about an order of magnitude higher performance than an MP3 audio player, but optimizing the processor to always run at the level that accommodates the video player would be wasteful.

Dynamic Voltage Scaling (DVS) exploits the fact that the peak frequency of a processor implemented in CMOS is proportional to the supply voltage, while the amount of dynamic energy required for a given workload is proportional to the square of the processor's supply voltage [12]. Running the processor slower means that the voltage level can also be lowered, yielding a quadratic reduction in energy consumption, at the cost of increased run time. Making use of this trade-off depends on performance-setting algorithms that aim to reduce the processor's performance level (clock frequency) only when it is not critical to meeting the software's deadlines. The key observation is that often the processor is running too fast. For example, it is pointless from a quality-of-service perspective to decode the 30 frames of a video in half a second, when the software is only required to display those frames during a one second interval. Completing a task before its deadline is an inefficient use of energy [6].

While dynamic power currently accounts for the greatest fraction of a processor's power consumption, static power consumption, which results from the leakage current in CMOS devices, is rapidly increasing. If left unchecked, in a 0.07 micron process, leakage power could become comparable to the amount of dynamic power [3]. Like dynamic power, leakage can also be substantially reduced if the processor does not always have to operate at its peak performance level. One technique for accomplishing this is adaptive reverse body biasing (ABB), which combined with dynamic voltage scaling can yield substantial reduction in both leakage and dynamic power consumption [11]. The pertinent point for this paper with respect to DVS and ABB is that lowering the speed of the processor results in better than linear energy savings. Vertigo provides the main lever for controlling both of these techniques by providing an estimate for the necessary performance level of the processor.

Most mobile processors on the market today already support some form of voltage scaling; Intel calls its version of this technology SpeedStep [8]. However, due to the lack of built-in performance-setting policies in current operating systems, the computers based on these chips use a simple approach that is driven not by the workload but by the usage model: when the notebook computer is plugged into a power outlet the processor runs at a higher speed; when running on batteries, it is switched to a more power efficient but slower mode. Transmeta's Crusoe processor sidesteps this problem by building the power management policy, called LongRun, into the processor's firmware to avoid the need to modify the operating system [20]. LongRun uses the historical utilization of the processor to guide clock rate selection: it speeds up the processor if utilization is high and decreases performance if utilization is low. Unlike on more conventional processors, the power management policy can be implemented on the Crusoe relatively easily because it already has a hidden software layer that performs dynamic binary translation and optimizations. However, it is currently an open question, one that we address in this paper, how effectively a policy implemented at such a low level in the software hierarchy can perform.

Research into performance-setting algorithms can be broadly divided into two categories: ones that use information about task deadlines in real-time kernels to guide the performance-setting decisions of the processor [9][13][15][19][16][17], and others that seek to derive deadlines automatically by either monitoring past utilization of the processor (interval-based techniques) [6][14][21] or based on semantic task and event classification [4][10]. Our work falls into the latter category. Previously, we presented a mechanism for auto- [...] described in our previous papers and moves these techniques out of the simulator into a hardware and software implementation.

Our performance-setting algorithms, described in Section 2, compare favorably to previous interval-based algorithms. The two key differences in our approach are that multiple performance-setting algorithms are used to come up with a global prediction and that the algorithms are implemented in the OS kernel, which gives them access to a richer set of data for predictions. The multiple performance-setting algorithms in the system ensure that they do not all have to be optimal in all possible circumstances. This allows at least some of the algorithms to be less concerned about the worst case. Figure 1 illustrates the fraction of time spent at each of the processor's four performance levels (300, 400, 500, and 600 MHz) using the Crusoe's built-in LongRun power manager in contrast with Vertigo during playbacks of two MPEG movies. The data for both algorithms were collected on the same hardware; however, during the Vertigo measurements, the built-in LongRun power manager was disabled. While the playback quality of the different runs was identical, the main difference between the results is that Vertigo spends significantly more time below peak performance than LongRun. During the first movie, Vertigo switches mostly between two performance levels: the machine's minimum 300 MHz and 400 MHz, while during the second, it settles on the processor's third performance level at 500 MHz. LongRun, on the other hand, during both movies chooses the machine's peak performance setting for the dominant portion of execution time.

FIGURE 1. MPEG video playback: LongRun vs. Vertigo. [Two stacked-bar panels, "Danse De Cable MPEG" and "Legendary MPEG", each comparing LongRun and Vertigo; y-axis: fraction of time at each performance level (300, 400, 500, and 600 MHz).]

Vertigo is implemented as a set of kernel modules and patches that hook into the Linux kernel to monitor program execution and to control the speed and voltage levels of the processor (Figure 2).
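The energy argument behind the 30-frame video example in the introduction can be made concrete with a small calculation. The sketch below is not Vertigo's algorithm. It assumes an idealized model in which supply voltage scales linearly with clock frequency, so the dynamic energy for a fixed number of cycles scales with the square of the frequency ratio, and it ignores leakage, voltage floors, and switching overheads. The function names `lowest_sufficient_level` and `relative_energy` are hypothetical, introduced only for illustration; the four performance levels are the ones the text gives for the Crusoe.

```python
# Idealized DVS model (assumption): supply voltage scales linearly with clock
# frequency, so dynamic energy for a fixed cycle count scales with the square
# of the frequency ratio. Leakage and switching overheads are ignored.
LEVELS_MHZ = (300, 400, 500, 600)  # the four Crusoe levels named in the text


def lowest_sufficient_level(megacycles, deadline_s, levels_mhz=LEVELS_MHZ):
    """Return the slowest level (MHz) that finishes the work by the deadline."""
    required_mhz = megacycles / deadline_s
    for level in sorted(levels_mhz):
        if level >= required_mhz:
            return level
    return max(levels_mhz)  # deadline cannot be met; fall back to peak


def relative_energy(level_mhz, peak_mhz=max(LEVELS_MHZ)):
    """Dynamic energy for a fixed cycle count, relative to running at peak."""
    return (level_mhz / peak_mhz) ** 2


# The 30-frame example: work that takes 0.5 s at 600 MHz is 300 megacycles,
# and the frames only need to be ready within a 1 s interval.
work_megacycles = 0.5 * 600
level = lowest_sufficient_level(work_megacycles, deadline_s=1.0)
print(level, relative_energy(level))  # prints: 300 0.25
```

Under these assumptions, stretching the decode to fill the full one-second interval uses a quarter of the dynamic energy of finishing in half a second at peak speed. A real performance-setting policy does not know the cycle count or the deadline in advance, which is the problem the paper's algorithms address.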
