TEAPC: Adaptive Computing and Underclocking in a Real PC Augustus K
Total Page:16
File Type:pdf, Size:1020Kb
From: The Proceedings of the First IBM P=ac2 Conference, October 6-8, 2004,IBM T.J. Watson Research Center, Yorktown Heights, NY, USA. Copyright 2004, A.K. Uht and R.J. Vaccaro. TEAPC: Adaptive Computing and Underclocking in a Real PC Augustus K. Uht and Richard J. Vaccaro University of Rhode Island Microarchitecture Research Institute Department of Electrical and Computer Engineering 4 East Alumni Ave. Kingston, RI 02864, USA {[email protected], [email protected]} real-time internal temperature. Since this is the major Abstract component of variation of timing within an Intel TEAPC is an IBM/Intel-standard PC realization of microprocessor (the CPU’s core voltage is highly the TEAtime performance “maximizing” adaptive regulated and approximately constant), it mirrors the computing algorithm, giving performance beyond original TEAtime philosophy. Further, it is possible worst-case-specifications. TEAPC goes beyond the to base the real-time performance on the real-time TEAtime algorithm by adapting to the current CPU power consumption of the entire PC; the latter is load. It is also the first machine to use extensive partially addressed in this paper. underclocking for disaster tolerance, low power consumption and high reliability. This is all done We have implemented the entire control system in dynamically, at runtime, on an unmodified standard software, in a Windows application. It runs in a operating system (Windows 2000) with a purely normal application mode on Windows 2000, software-implemented feedback-control based multitasking normally with other applications. CPU algorithm. frequency and core voltage are changed dynamically, based on the output of a feedback control system 1 Introduction using the CPU chip temperature as its primary input. While details such as specific hardware device Adaptive systems “maximize” performance or addresses may change, the basic approach and “minimize” power consumption with respect to detailed realization of the control loop and changes in environmental, operating and underclocking should be implementable on any PC to manufacturing conditions. Underclocking reduces the be built, as well as many that already exist. operating frequency below that normally specified, reducing power consumption and increasing Summarizing, our goals are: TEAPC should have the reliability. In this paper we present a prototype based adaptation and frequency maximization properties of on the standard IBM/Intel PC architecture realizing TEAtime; additionally, TEAPC should realize both adaptive computing and underclocking. adaptation to varying CPU loads, low power consumption on-demand, and disaster tolerance. All In prior work we demonstrated an adaptive system of these goals have been realized with the aid of a called TEAtime with a simple physical prototype (see purely software-based feedback control system, Section 2). We wanted to fully demonstrate employing CPU frequency underclocking as needed. TEAtime’s potential by putting it in a production microprocessor, but that is problematic. We therefore The rest of the paper is organized as follows. Section did the next best thing: demonstrating TEAtime’s 2 discusses prior work and background information. adaptive attributes on a real PC assembled with Section 3 describes the initial TEAPC solution and COTS (Commercial Off-The-Shelf) parts. TEAPC, real system constraints. In Section 4 TEAPC’s final the resulting prototype, goes further than TEAtime: It version is discussed, along with the experimental adapts to current CPU computational loading, and methodology. Section 5 presents the experiments, greatly underclocks for disaster tolerance and low data, and analyses. Examples of different TEAPC power, while still functioning normally. operating scenarios are given in Section 6. We conclude in Section 7. The initial TEAPC implementation used timing- based sensors, similar to that used in the TEAtime 2 Prior Work prototype. For reasons to be discussed later, this was abandoned in favor of basing the real-time Performance- and power- adaptive computing performance or frequency of the CPU on the CPU’s approaches have only been proposed in the last 45 decade or so. Most techniques are mainly concerned the frequency is decreased until the error ceases, and with power reduction [1, 4, 5], while a few recent then the process repeats. Since the tracking logic is methods consider performance enhancement [6, 8]. guaranteed by design to be slower than any other path Most methods use voltage-scaling for power- in the system, any timing error will occur in the reduction. The recent Razor proposal [2] and the tracking logic first, and only there. Hence, timing earlier TIMERRTOL model [11] are unusual in that errors in the main system are completely avoided. timing errors are not avoided, but rather are used as a This also completely avoids added pipeline stalls or feedback mechanism to regulate either power other additional clock cycles. Throughput increases consumption (Razor) or performance of almost a factor of two were demonstrated in a (TIMERRTOL). They employ different methods to physical TEAtime prototype. recover from the timing errors. One of the key TEAtime adapted to changes in the environment observations that can be made about all of this (e.g., temperature), operating conditions (e.g., supply adaptive computing work is that in general the same voltage) and manufacturing conditions (e.g., quality basic mechanism can be used for either power of a given production run), dynamically consumption or performance enhancement (or both). “maximizing” performance These methods achieve better-than-worst-case performance. In the case of Razor, its operating While the TEAtime prototype demonstrated the basic voltage is lowered below specs, reducing power. worth and functionality of the TEAtime idea, it used a very simple home-brew processor implemented on Skadron et al [9] first considered the use of formal an FPGA. In our desire to extend the TEAtime ideas feedback control theory as applied to the on-chip and applications, we decided to implement and control of chip temperature of a microprocessor. evaluate TEAtime on a real PC, using COTS parts. Instruction fetch toggling was used to control the We have created a high-performance feedback- chip temperature. In [10] on-chip frequency-scaling control system using normal application software, not was used to regulate the chip temperature using a requiring any operating system modifications. simple linear relationship between frequency and Disaster tolerance is achieved, and reduced power temperature; formal control theory was not needed consumption and increased reliability may be user- since the relevant thermal delays and changes in specified. TEAPC is the result. frequency and temperature were small. Slowdown was also small. Reliability could not be enhanced. In TEAPC, performance can be increased with the 3 Initial TEAPC Architecture particular methods used, and at the system level, and Control System without chip modification. Should reliability be an issue, TEAPC can instead enhance that. 3.1 Foundations of TEAPC Rohou and Smith [7] used operating-system based The guiding principle in the design of TEAPC was to software to control temperature of a chip in a system use as much COTS hardware and software as via frequency scaling; performance only decreases, possible. To this end, TEAPC’s foundation is a never increases. The frequency control mechanism standard IBM/Intel PC architecture assembled out of puts the CPU in either a doze state or a full operation COTS parts. (A commercial PC was not considered, state, while TEAPC utilizes sophisticated feedback since it would be unlikely to allow the kind of control for adjustment of frequency, temperature and internal access we required. However, the results of reliability. TEAPC executes all code normally, all of this work can easily be applied to commercial PC’s, the time. Also, TEAPC does not require operating during their design, construction and/or testing.) system modifications, and can increase performance. We wished to be conservative in our results, and Our recent work, TEAtime (Timing Error Avoidance) knew that it is always hardest to speedup or otherwise [12-14], approximated a maximization of control more advanced components. We therefore performance much above that normally allowed by assembled TEAPC out of what were at the time high- standard design practices. The key component was end parts: a 3.0 Ghz Intel Pentium 4 microprocessor tracking logic, a one-bit wide copy of the worst-case- with an 800 MHz bus and an Intel 875P chipset (a delay path between any pair of flip-flops in the digital “chipset” is the glue logic that connects the CPU to system, plus a small delay margin. In operation, the the main memory and I/O devices; in Figure 4 it is tracking logic is fed a never-ending stream of the Northbridge/Southbridge pair of chips). The main alternating one’s and zero’s. The operating frequency memory used high speed DDR (Dual Data Rate) of the system is incrementally increased until an error dynamic RAM. occurs in the output of the tracking logic; at this point 46 3.2 Initial Approach Towards a oscillations (peg-to-peg) and slow response. The main reason for the inadequacy of this simplistic TEAPC Prototype approach is the thermal capacitance of the Our initial plan and activities were to closely model components’ heatsinks, with the resulting roughly TEAPC operation on that of the TEAtime prototype. exponential time delay between component heating While we were of course unable to embed tracking and sensor response. logic in the existing components, especially the CPU, We then used formal feedback control system theory we were still able to create tracking sensors whose to eventually achieve a fast frequency response to delay varied with temperature, supply voltage, etc. thermal changes. The basic control input is a While we did not fully validate the following, since temperature setting, Tset, used by the control system each sensor was a standard CMOS logic gate to maintain a CPU temperature of Tset degrees C.