Energy Management for Hypervisor-Based Virtual Machines

Jan Stoess, Christian Lang, Frank Bellosa
System Architecture Group, University of Karlsruhe, Germany
{stoess, chlang, bellosa}@ira.uka.de

Abstract

Current approaches to power management are based on operating systems with full knowledge of and full control over the underlying hardware; the distributed nature of multi-layered virtual machine environments renders such approaches insufficient. In this paper, we present a novel framework for energy management in modular, multi-layered operating system structures. The framework provides a unified model to partition and distribute energy, and mechanisms for energy-aware resource accounting and allocation. As a key property, the framework explicitly takes the recursive energy consumption into account, which is spent, e.g., in the virtualization layer or subsequent driver components.

Our prototypical implementation targets hypervisor-based virtual machine systems and comprises two components: a host-level subsystem, which controls machine-wide energy constraints and enforces them among all guest OSes and service components, and, complementary, an energy-aware guest operating system, capable of fine-grained, application-specific energy management. Guest-level energy management thereby relies on effective virtualization of physical energy effects provided by the virtual machine monitor. Experiments with CPU and disk devices and an external data acquisition system demonstrate that our framework accurately accounts and regulates the power consumption of individual hardware devices, both for energy-aware and energy-unaware guest operating systems.

1 Introduction

Over the past few years, virtualization technology has regained considerable attention in the design of computer systems. Virtual machines (VMs) establish a development path for incorporating new functionality – server consolidation, transparent migration, secure computing, to name a few – into a system that still retains compatibility to existing operating systems (OSes) and applications. At the very same time, the ever increasing power density and dissipation of modern servers has turned energy management into a key concern in the design of OSes.

Research has proposed several approaches to OS-directed control over a computer's energy consumption, including user- and service-centric management schemes. However, most current approaches to energy management are developed for standard, legacy OSes with a monolithic kernel. A monolithic kernel has full control over all hardware devices and their modes of operation; it can directly regulate device activity or energy consumption to meet thermal or energy constraints. A monolithic kernel also controls the whole execution flow in the system. It can easily track the power consumption at the level of individual applications and leverage its application-specific knowledge during device allocation to achieve dynamic and comprehensive energy management.

Modern VM environments, in contrast, consist of a distributed and multi-layered software stack including a hypervisor, multiple VMs and guest OSes, driver modules, and other service infrastructure (Figure 1). In such an environment, direct and centralized energy management is unfeasible, as device control and accounting information are distributed across the whole system.

At the lowest level of the virtual environment, the privileged hypervisor and host driver modules have direct control over hardware devices and their energy consumption. By inspecting internal data structures, they can obtain coarse-grained per-VM information on how energy is spent on the hardware. However, the host level does not possess any knowledge of the energy consumption of individual applications. Moreover, with the ongoing trend to restrict the hypervisor's support to a minimal set of hardware and to perform most of the device control in unprivileged driver domains [8, 15], hypervisor and driver modules each have direct control over a small set of devices; but they are oblivious to the ones not managed by themselves.
The guest OSes, in turn, have intrinsic knowledge of their own applications. However, guest OSes operate on deprivileged virtualized devices, without direct access to the physical hardware, and are unaware that the hardware may be shared with other VMs. Guest OSes are also unaware of the side-effects on power consumption caused by the virtual device logic: since virtualization is transparent, the "hidden", or recursive, power consumption, which the virtualization layer itself causes when requiring the CPU or other resources, simply vanishes unaccounted in the software stack. Depending on the complexity of the interposition, resource requirements can be substantial: a recent study shows that the virtualization layer requires a considerable amount of CPU processing time for I/O virtualization [5].

[Figure 1: Increasing number of layers and components in today's virtualization-based OSes. The figure shows applications on top of guest OSes with virtual devices (vCPU, vNIC, vDISK), service and driver VMs for NIC and DISK, and the hypervisor multiplexing the physical CPU.]

The whole situation is even worsened by the non-partitionability of some of the physical effects of power dissipation: the temperature of a power consuming device, for example, cannot simply be partitioned among different VMs in a way that each one gets alloted its own share on the temperature. Beyond the lack of comprehensive control over and knowledge of the power consumption in the system, we can thus identify the lack of a model to comprehensively express physical effects of energy consumption in distributed OS environments.

To summarize, current power management schemes are limited to legacy OSes and unsuitable for VM environments. Current virtualization solutions disregard most energy-related aspects of the hardware platform; they usually virtualize a set of standard hardware devices only, without any special power management capabilities or support for energy management. Up to now, power management for VMs is limited to the capabilities of the host OS in hosted solutions and mostly dispelled from the server-oriented hypervisor solutions.

Observing these problems, we present a novel framework for managing energy in distributed, multi-layered OS environments, as they are common in today's computer systems. Our framework makes three contributions. The first contribution is a model for partitioning and distributing energy effects; our model solely relies on the notion of energy as the base abstraction. Energy quantifies the physical effects of power consumption in a distributable way and can be partitioned and translated from a global, system-wide notion into a local, component- or user-specific one. The second contribution is a distributed energy accounting approach, which accurately tracks back the energy spent in the system to originating activities. In particular, the presented approach incorporates both the direct and the side-effectual energy consumption spent in the virtualization layers or subsequent driver components. As the third contribution, our framework exposes all resource allocation mechanisms from drivers and other resource managers to the respective energy management subsystems. Exposed allocation enables dynamic and remote regulation of energy consumption in a way that the overall consumption matches the desired constraints.

We have implemented a prototype that targets hypervisor-based systems. We argue that virtual server environments benefit from energy management within and across VMs; hence the prototype employs management software both at host-level and at guest-level. A host-level management subsystem enforces system-wide energy constraints among all guest OSes and driver or service components. It accounts direct and hidden power consumption of VMs and regulates the allocation of physical devices to ensure that each VM does not consume more than a given power allotment. Naturally, the host-level subsystem performs independent of the guest OS; on the downside, it operates at low level and in a coarse-grained manner. To benefit from fine-grained, application-level knowledge, we have complemented the host-level part with an optional energy-aware guest OS, which redistributes the VM-wide power allotments among its own, subordinate applications. In analogy to the host-level, where physical devices are allocated to VMs, the guest OS regulates the allocation of virtual devices to ensure that its applications do not spend more energy than their alloted budget.

Our experiments with CPU and disk devices demonstrate that the prototype effectively accounts and regulates the power consumption of individual physical and virtual devices, both for energy-aware and energy-unaware guest OSes.

The rest of the paper is structured as follows: In Section 2, we present a generic model for energy management in distributed, multi-layered OS environments. We then detail our prototypical implementation for hypervisor-based systems in Section 3. We present experiments and results in Section 4. We then discuss related approaches in Section 5, and finally draw a conclusion and outline future work in Section 6.

2 Distributed Energy Management

The following section presents the design principles we consider to be essential for distributed energy management. We begin with formulating the goals of our work. We then describe the unified energy model that serves as a foundation for the rest of our approach. We finally describe the overall structure of our distributed energy management framework.

2.1 Design Goals

The increasing number of layers, components, and subsystems in modern OS structures demands a distributed approach to control the energy spent in the system. The approach must perform effectively across protection boundaries, and it must comprise different types of activities, software abstractions, and hardware resources. Furthermore, the approach must be flexible enough to support diversity in energy management paradigms. The desire to control power and energy effects of a computer system stems from a variety of objectives: Failure rates typically increase with the temperature of a computer node or device; reliability requirements or limited cooling capacities thus directly translate into temperature constraints, which are to be obeyed for the hardware to operate correctly. Specific power limits, in turn, are typically imposed by battery or backup generators, or by contracts with the power supplier. Controlling power consumption on a per-user base finally enables accountable computing, where customers are billed for the energy consumed by their applications, but also receive a guaranteed level or quality of service. However, not only the objectives for power management are diverse; there also exists a variety of algorithms to achieve those objectives. Some of them use real temperature sensors, whereas others rely on estimation models [3, 12]. To reach their goals, the algorithms employ different mechanisms, like throttling resource usage, request batching, or migrating execution [4, 9, 17]. Hence, a valid solution must be flexible and extensible enough to suit a diversity of goals and algorithms.

2.2 Unified Energy Model

To encompass the diverse demands on energy management, we propose to use the notion of energy as the base abstraction in our system, an approach which is similar to the currentcy model in [28]. The key advantage of using energy is that it quantifies power consumption in a partitionable way – unlike other physical effects of power consumption such as the temperature of a device. Such effects can easily be expressed as energy constraints, by means of a thermal model [3, 12]. The energy constraints can then be partitioned from global notions into local, component-wise ones. Energy constraints also serve as a coherent base metric to unify and integrate management schemes for different hardware devices.

2.3 Distributed Management

Current approaches to OS power management are tailored to a single building-block OS design, where one kernel instance manages all software and hardware resources. We instead model the OS as a set of components, each responsible for controlling a hardware device, exporting a service library, or providing a software resource for use by applications.

Our design is guided by the familiar concept of separating policy and mechanism. We formulate the procedure of energy management as a simple feedback loop: the first step is to determine the current power consumption and to account it to the originating activities. The next step is to analyze the accounting data and to make a decision based on a given policy or goal. The final step is to respond with allocation or de-allocation of energy consuming resources to the activities, with the goal to align the energy consumption with the desired constraints.

We observe that mainly the second step is associated with policy, whereas the two other steps are mechanisms, bound to the respective providers of the resource, which we hence call resource drivers. We thus model the second step as an energy manager module, which may, but need not, reside in a separate software component or protection domain. Multiple such managers may exist concurrently in the system, at different positions in the hierarchy and with different scopes.

[Figure 2: Distributed energy management. Energy managers may reside in different components or protection domains. Resource drivers consume resources themselves, for which the energy is accounted back to the original clients. The figure shows clients issuing requests to resource drivers and energy-consuming resources, with energy accounting flowing from the drivers to an energy manager and energy allocation flowing back.]
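The separation can be made concrete with a small sketch. The following C fragment is purely illustrative; the types and functions (energy_record, driver_query_energy, driver_set_allocation, sleep_for) are hypothetical names, not interfaces of our framework. It shows how the three steps of the feedback loop map onto the manager/driver split:

/* Illustrative sketch of the feedback loop; all names are hypothetical. */
struct energy_record {
    unsigned client;   /* originating activity, e.g., a VM id */
    double   joules;   /* energy accounted to it this period  */
};

/* mechanisms, provided by a resource driver */
int  driver_query_energy(struct energy_record *buf, int max); /* step 1 */
void driver_set_allocation(unsigned client, double share);    /* step 3 */
void sleep_for(double seconds);                                /* timer  */

/* policy, residing in an energy manager module */
void manager_loop(double limit_watts, double period)
{
    struct energy_record rec[16];
    for (;;) {
        int n = driver_query_energy(rec, 16);      /* step 1: account  */
        for (int i = 0; i < n; i++) {              /* step 2: decide   */
            double watts = rec[i].joules / period;
            if (watts > limit_watts)               /* step 3: allocate */
                driver_set_allocation(rec[i].client, limit_watts / watts);
        }
        sleep_for(period);
    }
}

Because only driver_query_energy() and driver_set_allocation() cross the component boundary, the policy in manager_loop() can be exchanged or moved to another protection domain without touching the drivers.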

Each energy manager is responsible for a set of subordinate resources and their energy consumption. Since the system is distributed, the resource manager cannot assume direct control or access over the resource; it requires remote mechanisms to account and allocate the energy (see Figure 2). Hence, by separating policy from mechanism, we translate our general goal of distributed energy management into the two specific aspects of distributed energy accounting and dynamic, exposed resource allocation; these are the subject of the following paragraphs.

Distributed energy accounting. Estimating and accounting the energy of a physical device usually requires detailed knowledge of the particular device. Our framework therefore requires each driver of an energy consuming device or resource to be capable of determining (or estimating) the energy consumption of its resources. Likewise, it must be capable of accounting the consumption to its consumers. If the energy management software resides outside the resource driver, it must propagate the accounting information to the manager.

Since the framework does not assume a single kernel comprising all resource subsystems, it has to track energy consumption across module boundaries. In particular, it must incorporate the recursive energy consumption: that is, the driver of a given resource such as a disk typically requires other resources, like the CPU, in order to provide its service successfully. Depending on the complexity, such recursive resource consumption may be substantial; consider, as examples, a disk driver that transparently encrypts and decrypts its client requests, or a driver that forwards client requests to a network attached storage server via a network interface card. Recursive resource consumption requires energy, which must be accounted back to the clients. In our example, it would be the responsibility of the disk driver to calculate its clients' shares of the disk and of its own CPU energy. To determine its CPU energy, the driver must recursively query the driver of the CPU resource, which is the hypervisor in our case.
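To illustrate the charge-back of recursive consumption, the sketch below shows how such a disk driver might fold its own CPU energy into its per-client accounts. This is a sketch under assumed interfaces; cpu_driver_query_my_energy() stands in for the recursive query to the CPU's resource driver and is not an actual function of our prototype:

/* Illustrative per-client accounting with recursive CPU charge-back. */
#define MAX_CLIENTS 8

struct client_account {
    double disk_joules;   /* energy spent on the disk for this client */
    double cpu_joules;    /* driver CPU energy apportioned to it      */
};

struct client_account acct[MAX_CLIENTS];

/* hypothetical: CPU energy this driver itself consumed since last query */
double cpu_driver_query_my_energy(void);

void charge_request(int client, double disk_joules, double cpu_share)
{
    acct[client].disk_joules += disk_joules;
    /* recursive step: the driver's own CPU consumption is split
       among its clients instead of vanishing unaccounted */
    acct[client].cpu_joules += cpu_driver_query_my_energy() * cpu_share;
}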
Dynamic and exposed resource allocation. To regulate the energy spent on a device or resource, each driver must expose its allocation mechanisms to energy manager subsystems. The manager leverages the allocation mechanisms to ensure that energy consumption matches the desired constraints. Allocation mechanisms relevant for energy management can be roughly distinguished into hardware and software mechanisms. Hardware-provided power saving features typically provide a means to change the power consumption of a device, by offering several modes of operation with different efficiency and energy coefficients (e.g., halt cycles or different active and sleep modes). The ultimate goal is to achieve the optimal level of efficiency with respect to the current resource utilization, and to reduce the wasted power consumption. Software-based mechanisms, in turn, rely on the assumption that energy consumption depends on the level of utilization, which is ultimately dictated by the number of device requests. The rate of served requests can thus be adapted by software to control the power consumption.

3 A Prototype for Hypervisor-Based Systems

Based on the design principles presented above, we have developed a distributed, two-level energy management framework for hypervisor-based VM systems. The prototype employs management software both at host-level and at guest-level. It currently supports management of two main energy consumers, CPU and disk. CPU services are directly provided by the hypervisor, while the disk is managed by a special device driver VM. In the following section, we first describe the basic architecture of our prototype. We then present the energy model for CPU and disk devices. We then describe the host-level part, and finally the guest-level part of our energy management prototype.

3.1 Prototype Architecture

Our prototype uses the L4 micro-kernel as the privileged hypervisor, and para-virtualized Linux kernel instances running on top of it. L4 provides core abstractions for user-level resource management: virtual processors (kernel threads), synchronous communication, and mechanisms to recursively construct virtual address spaces. I/O devices are managed at user-level; L4 only deals with exposing interrupts and providing mechanisms to protect device memory.

The guest OSes are adaptions of the Linux 2.6 kernel, modified to run on top of L4 instead of on bare hardware [11]. For managing guest OS instances, the prototype includes a user-level VM monitor (VMM), which provides the virtualization service based on L4's core abstractions. To provide user-level device driver functionality, the framework dedicates a special device driver VM to each device, which exports a virtual device interface to client VMs and multiplexes virtual device requests onto the physical device.

The driver VMs are Linux guest OS instances themselves, which encapsulate and reuse standard Linux device driver logic for hardware control [15].

[Figure 3: Prototype architecture. The host-level subsystem controls system-wide energy constraints and enforces them among all guests. A complementary energy-aware guest OS is capable of performing its own, application-specific energy management. The figure shows a legacy guest OS and an energy-aware OS with virtual devices (vCPU, vDISK), a driver VM exporting the disk, the host-level energy manager, and the hypervisor, with accounting and allocation flows between them.]

The prototype features a host-level energy manager module responsible for controlling the energy consumption of VMs on CPUs and disk drives. The energy manager periodically obtains the per-VM CPU and disk energy consumption from the hypervisor and driver VM, and matches them against a given power limit. To bring both in line, it responds by invoking the exposed throttling mechanisms for the CPU and disk devices.

Our energy-aware guest OS is a modified version of Linux that implements the resource container abstraction [1] for resource management and accounting. We enhanced the resource containers to support energy management of virtual CPUs and disks. Since the energy-aware guest OS requires virtualization of the energy effects of CPU and disk, the hypervisor and driver VM propagate their accounting records to the user-level VM monitor. The monitor then creates, for each VM, a local view on the current energy consumption, and thereby enables the guest to pursue its own energy-aware resource management. Note that our energy-aware guest OS is an optional part of the prototype: it provides the benefit of fine-grained energy management for Linux-compatible applications. For all energy-unaware guests, our prototype resorts to the coarser-grained host-level management, which achieves the constraints regardless of whether the guest-level subsystem is present or not. Figure 3 gives a schematic overview of the basic architecture. Our prototype currently runs on IA-32 microprocessors. Certain parts, like the device driver VMs, are presently limited to single processor systems; we are working on multi-processor support and will integrate it into future versions.

3.2 Device Energy Models

In the following section, we present the device energy models that serve as a base for CPU and disk accounting. We generally break down the energy consumption into access and idle consumption. Access consumption consists of the energy spent when using the device. This portion of the energy consumption can be reduced by controlling device allocation, e.g., in terms of the client request rate. Idle consumption, in turn, is the minimum power consumption of the device, which it needs even when it does not serve requests. Many current microprocessors support multiple sleep and active modes, e.g., via frequency scaling or clock gating. A similar technology, though not yet available on current standard servers, can be found in multi-speed disks, which allow lowering the spinning speed during phases of low disk utilization [10]. To retain fairness, we propose to decouple the power state of a multi-speed device from the accounting of its idle costs. Clients that do not use the device are charged for the lowest active power state. Higher idle consumptions are only charged to the clients that are actively using the device.

3.2.1 CPU Energy Model

Our prototype leverages previous work [3, 13] and bases CPU energy estimation on the rich set of performance counters featured by modern IA-32 microprocessors. For each performance counter event, the approach assigns a weight representing its contribution to the processor energy. The weights are the result of a calibration procedure that employs test applications with constant and known power consumptions and physical instrumentation of the microprocessors [3]. Previous experiments have demonstrated that this approach is fairly accurate for integer applications, with an error of at most 10 percent. To obtain the processor energy consumption during a certain period of time, e.g., during execution of a VM, the prototype sums up the number of events that occurred during that period, multiplied with their weights. The time stamp counter, which counts clock cycles regardless of whether the processor is halted or not, yields an accurate estimation of the CPU's idle consumption.
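As a minimal sketch of this estimation scheme (the number of counters and the weights below are placeholders, not the calibrated values from [3]), the energy of an interval is the weighted sum of the counter advances during that interval:

/* Sketch: CPU energy from performance counter deltas. NUM_EVENTS and
   weight[] are placeholders; real weights result from the calibration
   procedure described in [3]. */
#define NUM_EVENTS 9   /* assumed: time stamp counter + 8 event counters */

double weight[NUM_EVENTS];   /* nanojoules per counted event */

double estimate_energy(const unsigned long long before[NUM_EVENTS],
                       const unsigned long long after[NUM_EVENTS])
{
    double joules = 0.0;
    for (int i = 0; i < NUM_EVENTS; i++)
        joules += weight[i] * (double)(after[i] - before[i]) * 1e-9;
    return joules;
}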
3.2.2 Disk Energy Model

Our disk energy model differs from the CPU model in that it uses a time-based approach rather than event sampling. Instead of attributing energy consumption to events, we attribute power consumption to different device states, and calculate the time the device requires to transfer requests of a given size.

There is no conceptual limit to the number of power states. However, we consider suspending the disk to be an unrealistic approach for hypervisor systems; for lack of availability, we do not consider multi-speed disks either. We thus distinguish two different power states: active and idle.

To determine the transfer time of a request – which is equal to the time the device must remain in active state to handle it – we divide the size of the request by the disk's transfer rate in bytes per second. We calculate the disk transfer rate dynamically, in intervals of 50 requests. Although we ignore several parameters that affect the energy consumption of requests (e.g., seek time or rotational delays), our evaluation shows that our simple approach is sufficiently accurate. Our observation is substantiated by the study in [26], which indicates that such a 2-parameter model is inaccurate only because of sleep-modes, which we can safely disregard for our approach.
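Stated as formulas, this restates the model above: with $P_{active}$ and $P_{idle}$ denoting the disk's data-sheet power values and $n$ the number of client VMs,

$E_{access} = \frac{size}{transfer\_rate} \cdot \left(P_{active} - P_{idle}\right)$

$E_{idle}^{(i)} = \frac{P_{idle} \cdot \Delta t}{n}$ for each client VM $i$ during an interval $\Delta t$.

The first term is charged per request; the second is recomputed periodically, as shown in Section 3.3.2.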
3.3 Host-Level Energy Management

Our framework requires each driver of a physical device to determine the device's energy consumption and to account the consumption to the client VMs. The accounting infrastructure uses the device energy model presented above: Access consumption is charged directly to each request, after the request has been fulfilled. The idle consumption, in turn, cannot be attributed to specific requests; rather, it is alloted to all client VMs in proportion to their respective utilization. For use by the energy manager and others, the driver grants access to its accounting records via shared memory and updates the records regularly.

In addition to providing accounting records, each resource driver exposes its allocation mechanisms to energy managers and other resource management subsystems. At host-level, our framework currently supports two allocation mechanisms: CPU throttling and disk request shaping. CPU throttling can be considered a combined software-hardware approach, which throttles activities in software and spends the unused time in halt cycles. Our disk request shaping algorithm is implemented in software. In the remainder of this section, we first explain how we implemented runtime energy accounting and allocation for the CPU and disk devices. We then detail how these mechanisms enable our energy management software module to keep the VMs' energy consumption within constrained limits.

3.3.1 CPU Energy Accounting

To accurately account the CPU energy consumption, we trace the performance counter events within the hypervisor and propagate them to the user-space energy manager module. Our approach extends our previous work on supporting resource management via event logging [20] to the context of energy management. The tracing mechanism instruments context switches between VMs within the hypervisor; at each switch, it records the current values of the performance counters into an in-memory log buffer. The hypervisor memory-maps the buffers into the address space of the energy manager. The energy manager periodically analyzes the log buffer and calculates the energy consumption of each VM (Figure 4).

[Figure 4: The hypervisor collects performance counter (PMC) traces and propagates the trace logs to user-space energy managers.]

By design, our tracing mechanism is asynchronous and separates performance counter accumulation from their analysis and the derivation of the energy consumption. It is up to the energy manager to perform the analysis often enough to ensure timeliness and accuracy. Since the performance counter logs are relatively small, we consider this to be easy to fulfil; our experience shows that the performance counter records cover a few hundred or thousand bytes, if the periodical analysis is performed about every 20th millisecond.

The main advantage of using tracing for recording CPU performance counters is that it separates policy from mechanism. The hypervisor is extended by a simple and cheap mechanism to record performance counter events. All aspects relevant to energy estimation and policy are kept outside the hypervisor, within the energy manager module. A further advantage of in-memory performance counter records is that they can easily be shared – propagating them to other guest-level energy accountants is a simple matter of leveraging the hypervisor's memory-management primitives.

In the current prototype, the energy manager is invoked every 20 ms to check the performance counter logs for new records. The log records contain the performance counter values relevant for energy accounting, sampled at each context switch together with an identifier of the VM that was active on the CPU.
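The records themselves are small; the layout below is our assumed arrangement based on this description (the prototype's exact record structure is not spelled out in the text):

/* Assumed layout of one in-memory trace record, written by the
   hypervisor at each context switch between VMs. */
struct pmc_log_record {
    unsigned int       vm_id;    /* VM that was active on the CPU  */
    unsigned long long tsc;      /* time stamp counter (idle cost) */
    unsigned long long pmc[8];   /* event counters (access cost)   */
};

With this layout a record occupies well under a hundred bytes, so the log sizes quoted above correspond to only a handful of context switches per analysis period.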

For each period between subsequent context switches, the manager calculates the energy consumption during that period, by multiplying the advance of the performance counters with their weights. Rather than charging the complete energy consumption to the active VM, the energy manager subtracts the idle cost and splits it between all VMs running on that processor. The time stamp counter, which is included in the recorded performance counters, provides an accurate estimation of the processor's idle cost. Thus the energy estimation looks as follows:

/* per-VM idle energy based on TSC advance (pmc0) */
for (id = 0; id < max_vms; id++)
    vm[id].cpu_idle += weight[0] * pmc[0] / max_vms;

/* calculate and charge access energy (pmc1..pmc8) */
for (p = 1; p <= 8; p++)
    vm[cur_id].cpu_access += weight[p] * pmc[p];

3.3.2 Disk Energy Accounting

To virtualize physical disk drives, our framework reuses legacy Linux disk driver code by executing it inside VMs. The driver functionality is exported via a translation module that mediates requests between the device driver and external client VMs. The translation module runs in the same address space as the device driver and handles all requests sent to and from the driver. It receives disk requests from other VMs, translates them to basic Linux block I/O requests, and passes them to the original device driver. When the device driver has finalized the request, the module again translates the result and returns it to the client VM.

The translation module has access to all information relevant for accounting the energy dissipated by the associated disk device. We implemented accounting completely in this translation module, without changing the original device driver. The module estimates the energy consumption of the disk using the energy model presented above. When the device driver has completed a request, the translation module estimates the energy consumption of the request, depending on the number of transferred bytes:

/* estimate transfer cost for size bytes */
vm[cur_id].disk_access += (size / transfer_rate)
    * (active_disk_power - idle_disk_power);

Because the idle consumption is independent of the requests, it does not have to be calculated for each request. However, the driver must recalculate it periodically, to provide the energy manager with up-to-date accounting records of the disk's power consumption. For that purpose, the driver invokes the following procedure periodically, every 50 ms:

/* estimate idle energy since last time */
idle_disk_energy = idle_disk_power * (now - last) / max_client_vms;
for (id = 0; id < max_client_vms; id++)
    vm[id].disk_idle += idle_disk_energy;

3.3.3 Recursive Energy Accounting

Fulfilling a virtual device request issued by a guest VM may involve interacting with several different physical devices. Thus, with respect to host-level energy accounting, it is not sufficient to focus on single physical devices; rather, accounting must incorporate the energy spent recursively in the virtualization layer or subsequent services.

We therefore perform a recursive, request-based accounting of the energy spent in the system, according to the design principles presented in Section 2. In particular, each driver of a physical device determines the energy spent for fulfilling a given request and passes the cost information back to its client. If the driver requires other devices to fulfill a request, it charges the additional energy to its clients as well. Since the idle consumption of a device cannot be attributed directly to requests, each driver additionally provides an "electricity meter" for each client. It indicates the client's share in the total energy consumption of the device, including the cost already charged with the requests. A client can query the meter each time it determines the energy consumption of its respective clients.

As a result, recursive accounting yields a distributed matrix of virtual-to-physical transactions, consisting of the idle and the active energy consumption of each physical device required to provide a given virtual device (see Figure 5). Each device driver is responsible for reporting its own vector of the physical device energy it consumes to provide its virtual device abstraction.

[Figure 5: Recursive accounting of disk energy consumption; for each client VM and physical device, the driver reports idle and active energy to the energy manager. The driver is assumed to consume 8W CPU idle power, which is apportioned equally to the two clients. Reported idle+active values: vdisk1 – disk 3+2W, CPU 4+4W; vdisk2 – disk 3+3W, CPU 4+6W; driver VM CPU total 6+12W.]

Since our framework currently supports CPU and disk energy accounting, the only case where recursive accounting is required occurs in the virtual disk driver located in the driver VM. The cost for the virtualized disk consists of the energy consumed by the disk and the energy consumed by the CPU while processing the requests. Hence, our disk driver also determines the processing energy for each request in addition to the disk energy as presented above.

As with disk energy accounting, we instrumented the translation module in the disk driver VM to determine the active and idle CPU energy per client VM. The Linux disk driver combines requests to get better performance and delays part of the processing in work-queues and tasklets. When determining the active CPU energy, it would be infeasible to track the CPU energy consumption of each individual request. Instead, we retrieve the CPU energy consumption at intervals and apportion it between the requests. Since the driver runs in a VM, it relies on the energy virtualization capabilities of our framework to retrieve a local view on the CPU energy consumption (details on energy virtualization are presented in Section 3.4).

The driver VM constantly consumes a certain amount of energy, even if it does not handle disk requests. According to our energy model, we do not charge idle consumption with the request. To be able to distinguish the idle driver consumption from the access consumption, we approximate the idle consumption of the Linux kernel when no client VM uses the disk.

To account active CPU consumption, we assume constant values per request, and predict the energy consumption of future requests based on the past. Every 50th request, we estimate the driver's CPU energy consumption by means of virtualized performance monitoring counters and adjust the expected cost for the next 50 requests. The following code illustrates how we calculate the cost per request. In the code fragment, the static variable unaccounted_cpu_energy keeps track of the deviation between the consumed energy and the energy consumption already charged to the clients. The function get_cpu_energy() returns the guest-local view of the current idle and active CPU energy since the last query.

/* subtract idle CPU consumption of driver VM */
unaccounted_cpu_energy -= drv_idle_cpu_power * (now - last);

/* calculate cost per request */
num_req = 50;
unaccounted_cpu_energy += get_cpu_energy();
unaccounted_cpu_energy -= cpu_req_energy * num_req;
cpu_req_energy = unaccounted_cpu_energy / num_req;

3.3.4 CPU Resource Allocation

To regulate the CPU energy consumption of individual machines, our hypervisor provides a mechanism to throttle the CPU allocation at runtime, from user-level. The hypervisor employs a stride scheduling algorithm [21, 23] that allots proportional CPU shares to virtual processors; it exposes control over the shares to selected, privileged user-level components. The host-level energy manager dynamically throttles a virtual CPU's energy consumption by adjusting the alloted share accordingly.

A key feature of stride scheduling is that it does not impose fixed upper bounds on CPU utilization: the shares have only relative meaning, and if one virtual processor does not fully utilize its share, the scheduler allows other, competing virtual processors to steal the unused remainder. An obvious consequence of dynamic upper bounds is that energy consumption will not be constrained either, at least not with a straight-forward implementation of stride scheduling. We solved this problem by creating a distinct and privileged idle virtual processor per CPU, which is guaranteed to spend all alloted time with issuing halt instructions (we modified our hypervisor to translate the idle processor's virtual halt instructions directly into real ones). Initially, each idle processor is alloted only a minuscule CPU share; thus all other virtual processors will be favored on the CPU if they require it. However, to constrain energy consumption, the energy manager will decrease the CPU shares of those virtual processors, and the idle virtual processor will directly translate the remaining CPU time into halt cycles. Our approach guarantees that energy limits are effectively imposed; but it still preserves the advantageous processor stealing behavior for all other virtual processors. It also keeps the energy policy out of the hypervisor and allows, for instance, to improve the scheduling policy with little effort, or to exchange it with a more throughput-oriented one for those environments where CPU energy management is not required.
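The interplay of shares and the idle virtual processor can be condensed into a few lines. set_share() below is a hypothetical stand-in for the hypervisor's stride interface, and the share arithmetic is simplified; the sketch only illustrates the mechanism:

/* Sketch: constrain a VM by shifting CPU share to the idle vCPU. */
void set_share(int vcpu, unsigned share);   /* hypothetical interface */

void constrain_vm(int vm_vcpu, int idle_vcpu,
                  unsigned total, unsigned allowed)
{
    /* the VM may use at most `allowed` of `total` share units; the
       privileged idle vCPU turns the remainder into real halt cycles,
       so other vCPUs cannot steal it and the CPU actually sleeps */
    set_share(vm_vcpu,   allowed);
    set_share(idle_vcpu, total - allowed);
}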

3.3.5 Disk Request Shaping

To reduce disk power consumption, we pursue a similar approach and enable an energy manager to throttle the disk requests of individual VMs. Throttling the request rate not only reduces the direct access consumption of the disk; it also reduces the recursive CPU consumption which the disk driver requires to process, recompute, and issue requests. We implemented the algorithm as follows: the disk driver processes a client VM's disk requests only up to a specific request budget, and it delays all pending requests. The driver periodically refreshes the budgets according to the specific throttling level set by the energy manager. The algorithm is illustrated by the following code snippet:

/* process requests up to the client's current budget */
for (i = 0; i < client->budget; i++)
{
    desc = &client->desc[ ring->start ];
    ring->start = (ring->start + 1) % ring->cnt;
    initiate_io(conn, desc, ring);
}

3.3.6 Host-level Energy Manager

Our host-level energy manager relies on the accounting and allocation mechanisms described previously, and implements a simple policy that enforces given device power limits on a per-VM base. The manager consists of an initialization procedure and a subsequent feedback loop. During initialization, the manager determines a power limit for each VM and device type, which may not be exceeded during runtime. The CPU power limit reflects the active CPU power a VM is allowed to consume directly. The disk power limit reflects the overall active power consumption the disk driver VM is allowed to spend in servicing a particular VM, including the CPU energy spent for processing (nevertheless, the driver's CPU and disk energy are accounted separately, as depicted by the matrix in Figure 5). Finding an optimal policy for the allotment of power budgets is not the focus of our work; at present, the limits are set to static values.

The feedback loop is invoked periodically, every 100 ms for the CPU and every 200 ms for the disk. It first obtains the CPU and disk energy consumption of the past interval by querying the accounting infrastructure. The current consumptions are used to predict future consumptions. For each device, the manager compares the VM's current energy consumption with the desired power limit multiplied with the time between subsequent invocations. If they do not match for a given VM, the manager regulates the device consumption by recomputing and propagating the CPU strides and disk throttle factors, respectively. To compute a new CPU stride, the manager adds or subtracts a constant offset from the current value. When computing the disk throttle factor, the manager takes the past period into consideration and calculates the offset $\Delta t$ according to the following formula. In this formula, $e_c$ denotes the energy consumed, $e_l$ the energy limit per period, and $t$ and $t_l$ denote the present and past disk throttle factors; viable throttle factors range from 0 to a few thousand:

$$\Delta t = \begin{cases} \dfrac{t_l - t}{4} + \left(1 - \dfrac{e_l - e_c}{e_c}\right) & \text{if } e_c > e_l,\ t > t_l \\[1ex] \dfrac{t_l - t}{4} + \dfrac{|e_l - e_c|}{e_c} & \text{otherwise} \end{cases}$$
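For the CPU, one iteration of this loop is little more than a comparison and a constant-offset adjustment. The sketch below mirrors that description; STRIDE_STEP and the two helpers are invented names, not the prototype's actual interface:

/* Sketch of one CPU iteration of the feedback loop (every 100 ms). */
#define STRIDE_STEP 16   /* assumed magnitude of the constant offset */

extern double query_vm_cpu_energy(int vm);      /* accounting records */
extern void   add_to_stride(int vm, int delta); /* exposed allocation */

void cpu_feedback_step(int vm, double limit_watts, double period)
{
    double joules = query_vm_cpu_energy(vm);
    double budget = limit_watts * period;    /* limit times interval */

    if (joules > budget)
        add_to_stride(vm,  STRIDE_STEP);     /* larger stride: less CPU */
    else if (joules < budget)
        add_to_stride(vm, -STRIDE_STEP);     /* return CPU time         */
}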
tors respectively. To compute a new CPU stride, As mentioned, not only an energy-aware guest OS the manager adds or subtracts a constant offset from requires the virtual performance counters; the spe- the current value. When computing the disk throt- cialized device driver VM uses them as well, when tle factor, the manager takes the past period into recursively determining the CPU energy for its disk consideration, and calculates the offset ∆t according services. to the following formula. In this formula, ec denotes Like their physical counterparts, each virtual CPU the energy consumed, el the energy limit per period, has a set of virtual performance counters, which and t and tl and denote the present and past disk

[Figure 6: Virtualizing performance counters via hypervisor-collected performance counter traces.]

Like their physical counterparts, each virtual CPU has a set of virtual performance counters, which factor out the events of other, simultaneously running VMs. If a guest OS determines the current value of a virtual performance counter, an emulation routine in the in-place monitor obtains the current hardware performance counter and subtracts all advances of the performance counters that occurred when other VMs were running. The hardware performance counters are made read-accessible to user-level software by setting a control register flag in the physical processors. The advances of other VMs are derived from the performance counter log buffers. To be accessible by the in-place VMM, the log buffers are mapped read-only into the address space of the guest OS.
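The emulation amounts to one subtraction per counter read. The sketch below summarizes it with assumed helper names; foreign_advance() stands for summing, from the read-only mapped log buffers, all counter advances that occurred while other VMs were running:

/* Sketch: guest-visible value of one virtual performance counter. */
extern unsigned long long read_hw_pmc(int counter);             /* e.g., RDPMC */
extern unsigned long long foreign_advance(int vm, int counter); /* from logs   */

unsigned long long read_virtual_pmc(int vm, int counter)
{
    return read_hw_pmc(counter) - foreign_advance(vm, counter);
}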
The cus- with the number of VMs, which contradicts the orig- tom device driver in the client accepts requests in inal estimation model. To ensure accuracy in the form of bio objects and translates them to a request long run, the guest would have to query the virtual for the device driver VM. When it receives the reply devices regularly for updated parameters. together with the cost for processing the request, it For our current virtual disk energy model, we charges the cost to the resource container bound to therefore use a para-virtual device extension. We the bio structure. expose each disk energy meter as an extension of To control the energy consumption of virtual de- the virtual disk device; energy-aware guest operating vices, the guest kernel redistributes its own, VM- systems can take advantage of them by customizing wide power limits to subordinate resource contain- the standard device driver appropriately. ers, and enforces them by means of preemption. Whenever a container exhausts the energy budget of the current period (presently set to 50 ms), it is pre- empted until a refresh occurs in the next period. A

4 Experiments and Results

In the following section, we present experimental results we obtained from our prototype. Our main goal is to demonstrate that our infrastructure provides an effective solution to manage energy in distributed, multi-layered OSes. We consider two aspects as relevant: First, we validate the benefits of distributed energy accounting. We then present experiments that aim to show the advantages of multi-layered resource allocation to enforce energy constraints.

For CPU measurements, we used a Pentium D 830 with two cores at 3GHz. Since our implementation is currently limited to single processor systems, we enabled only one core, which always ran at its maximum frequency. When idle, the core consumes about 42W; under full load, power consumption may be 100W and more. We performed disk measurements on a Maxtor DiamondMax Plus 9 IDE hard disk with 160GB size, for which we took the active power (about 5.6W) and idle power (about 3.6W) characteristics from the data sheet [16]. We validated our internal, estimation-based accounting mechanisms by means of an external high-performance data acquisition (DAQ) system, which measured the real disk and CPU power consumption with a sampling frequency of 1KHz.

4.1 Energy Accounting

To evaluate our approach of distributed energy accounting, we measured the overall energy required for using a virtual disk. For that purpose, we ran a synthetic disk stress test within a Linux guest OS. The test runs on a virtual hard drive, which is multiplexed onto the physical disk by the disk driver VM. The test performs almost no computation, but generates heavy disk load. By opening the virtual disk in raw access mode, the test bypasses most of the guest OS's caching effects, and causes the file I/O to be performed directly to and from buffers. Afterwards, the test permanently reads (writes) consecutive disk blocks of a given size from (to) the disk, until a maximum size has been reached. We performed the test for block sizes from 0.5 KByte up to 32 KByte. We obtained the required energy per block size to run the benchmark from our accounting infrastructure.

[Figure 7: Energy distribution for CPU and disk during the disk stress test, for block sizes from 0.5K to 32K bytes. Each bar breaks the power (Watt) down into base power, idle power (driver:disk, driver:CPU), and active power (driver:disk, driver:CPU, clientVM:CPU). The thin bar shows the real CPU and disk power consumption, measured with an external DAQ system.]

The results for the read case are shown in Figure 7. The write case yields virtually the same energy distribution; for reasons of space, we do not show it here. For each size, the figure shows the disk and CPU power consumption of the client and the device driver VM. The lowermost part of each bar shows the base CPU power consumption required by core components such as the hypervisor and the user-level VMM (36W); this part is consumed independently of any disk load. The upper parts of each bar show the active and idle power consumption caused by the stress test, broken into CPU and disk consumption. Since the client VM is the only consumer of the hard disk, it is accounted the complete idle disk power (3.5W) and CPU power (8W) consumed by the driver VM. Since the benchmark saturates the disk, the active disk power consumption of the disk driver mostly stays at its maximum (2W), which is again accounted to the client VM as the only consumer. Active CPU power consumption in the driver VM heavily depends on the block size and ranges from 9W for small block sizes down to 1W for large ones. Note that the CPU costs for processing a virtual disk request may even surpass the costs for handling the request on the physical disk. Finally, active CPU power consumption in the client VM varies with the block sizes as well, but at a substantially lower level; the lower level is unsurprising, as the benchmark bypasses most parts of the disk driver in the client OS. The thin bar on the right of each energy bar shows the real power consumption of the CPU and disk, measured with the external DAQ system.

4.2 Enforcing Power Constraints

To demonstrate the capabilities of VM-based energy allocation, and to evaluate the behavior of our disk throttling algorithm over time, we performed a second experiment with two clients that simultaneously require disk service from the driver. The clients interface with a single disk driver VM, but operate on distinct hard disk partitions. We set the active driver power limit of client VM 1 to 1W and the limit of client VM 2 to 0.5W, and periodically obtained driver energy and disk throughput over a period of about 2 minutes. Figure 8 shows both distributions; we set the limit about 45 seconds after having started the measurements. Our experiment demonstrates the driver's capability of VM-specific control over power consumption. The internal accounting and control furthermore corresponds with the external measurements.

[Figure 8: Disk power consumption and throughput of two constrained disk tests simultaneously running in two different guest VMs. The upper plot shows the disk power (Watt) of VM1, VM2, and the external DAQ over 120 seconds; the lower plot shows the corresponding disk throughput over time.]

4.2.1 Guest-Level Energy Allocation

In the next experiment, we compared the effects of enforcing power limits at the host-level against the effects of guest-level enforcement. In the first part of the experiment, we ran two instances of the compute-intensive bzip2 application within an energy-unaware guest OS. In the unconstrained case, a single bzip2 instance causes an active CPU power consumption of more than 50W. The guest, in turn, is alloted an overall CPU active power of only 40W. As the guest is not energy-aware, the limit is enforced by the host-level subsystem. In the second part, we used an energy-aware guest, which complies with the alloted power itself. It redistributes the budget among the two bzip2 instances using the resource container facility. Within the guest, we set the application-level power limits to 10W for the first, and to 30W for the second bzip2 instance. Note that the power limits are effective limits; strictly spoken, both bzip2 instances still consume 50 Joules per second each when running; however, the resource container implementation reduces each task's CPU time accordingly, with the result that over time, the limits are effectively obeyed.

[Figure 9: Guest-level energy redistribution under host-level and guest-level management. For each case, the bars show the active CPU power (Watt) of the complete VM and of each bzip2 instance, and the throughput of each bzip2 instance.]

The results are given in Figure 9. For both cases, the figure shows the overall active CPU power of the guest VM in the leftmost bar, and the throughput broken down to each bzip2 instance in the rightmost bar. For the energy-aware VM, we additionally obtained the power consumption per bzip2 instance as seen by the guest's energy management subsystem itself; it is drawn as the bar in the middle.

Note that the guest's view of the power consumption is slightly higher than the view of the host-level energy manager. Hence, the guest imposes somewhat harsher power limits, and causes the overall throughput of both bzip2 instances to drop compared to host-level control. We attribute the differences in estimation to clock drift and rounding errors in the client.

However, the results are still as expected: host-level control enforces the budgets independent of the guest's particular capabilities – but the enforcement treats all of the guest's applications as equal and thus reduces the throughput of both bzip2 instances proportionally. In contrast, guest-level management allows the guest to respect its own user priorities and preferences: it allots a higher power budget to the second bzip2 instance, resulting in a higher throughput compared to the first instance.

5 Related Work

There has been considerable research interest in involving operating systems and applications in the management of energy and power of a computer system [6, 7, 9, 10, 14, 25, 27].

12 2007 USENIX Annual Technical Conference USENIX Association lems that arise if the OS consists of several layers and power state. ECOSystem does not distinguish be- is distributed across multiple components, as cur- tween the fractions contributed by different devices; rent virtualization environments do. To our knowl- all cost that a task causes is accumulated to one edge, neither the popular hypervisor [2,19] nor value. This allows the OS to control the overall en- VMware’s most recent hypervisor-based ESX Server ergy consumption without considering the currently [22] support distributed energy accounting or allo- installed devices. However, it renders the approach cation across module boundaries or software layers. too inflexible for other energy management schemes Achieving accurate and easy accounting of energy such as thermal management, for which energy con- by vertically structuring an OS was proposed by the sumption must be managed individually per device. designers of Nemesis [14,18]. Their approach is very In previous work [3], Bellosa et al. proposed similar to our work in that it addresses account- to estimate the energy consumption of the CPU ability issues within multi-layered OSes. A verti- for the purpose of thermal management. The ap- cally structured system multiplexes all resources at proach leverages the performance monitoring coun- a low level, and moves protocol stacks and most ters present in modern processors to accurately esti- parts of device drivers into user-level libraries. As mate the energy consumption caused by individual a result, shared services are abandoned, and the ac- tasks. Like the ECOSystem approach, this work uses tivities typically performed by the kernel are exe- a monolithic operating system kernel. Also, the es- cuted within each application itself. Thus, most re- timated energy consumption is just a means to the source and energy consumption can be accounted to end of a specific management goal, i.e., thermal man- individual applications, and there is no significant agement. Based on the energy consumption and a anonymous consumption anymore. thermal model, the kernel estimates the temperature Our general observation is that hypervisor-based of the CPU and throttles the execution of individual VM environments are structured similarly to some tasks according to their energy characteristics if the extent: a hypervisor also multiplexes the system re- temperature reaches a predefined limit. sources at a low level, and lets each VM use its own protocol stack and services. Unfortunately, a big 6 Conclusion limitation of vertical structuring is that it is hard to achieve with I/O device drivers. As only one In this work, we have presented a novel framework driver can use the device exclusively, all applica- for managing energy in multi-layered OS environ- tions share a common driver provided by the low- ments. Based on a unified energy model and mech- level subsystem. To process I/O requests, a shared anisms for energy-aware resource accounting and driver consumes CPU resources, which recent ex- allocation, the framework provides an effective in- periments demonstrate to be substantial in multi- frastructure to account, distribute, and control the layered systems that are used in practice [5]. In a power consumption at different software layers. 
6 Conclusion

In this work, we have presented a novel framework for managing energy in multi-layered OS environments. Based on a unified energy model and mechanisms for energy-aware resource accounting and allocation, the framework provides an effective infrastructure to account, distribute, and control the power consumption at different software layers. In particular, the framework explicitly accounts the recursive energy consumption spent in the virtualization layer or in subsequent driver components. Our prototypical implementation encompasses a host-level subsystem controlling global power constraints and, optionally, an energy-aware guest OS for local, application-specific power management. Experiments show that our prototype is capable of enforcing power limits for energy-aware and energy-unaware guest OSes.

We see our work as a support infrastructure to develop and evaluate power management strategies for VM-based systems. We consider three areas to be important and prevalent for future work: devices with multiple power states, processors with support for hardware-assisted virtualization, and multi-core architectures. There is no design limit with respect to their integration into our framework, and we are actively developing support for them.

Acknowledgements

We would like to thank Simon Kellner, Andreas Merkel, Raphael Neider, and the anonymous reviewers for their comments and helpful suggestions. This work was in part supported by the Intel Corporation.

References

[1] G. Banga, P. Druschel, and J. C. Mogul. Resource containers: A new facility for resource management in server systems. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation, pages 45–58, Berkeley, CA, Feb. 1999.
[2] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Proceedings of the 19th Symposium on Operating System Principles, pages 164–177, Bolton Landing, NY, Oct. 2003.
[3] F. Bellosa, A. Weissel, M. Waitz, and S. Kellner. Event-driven energy accounting for dynamic thermal management. In Proceedings of the Workshop on Compilers and Operating Systems for Low Power, pages 1–10, New Orleans, LA, Sept. 2003.
[4] R. Bianchini and R. Rajamony. Power and energy management for server systems. IEEE Computer, 37(11):68–74, 2004.
[5] L. Cherkasova and R. Gardner. Measuring CPU overhead for I/O processing in the Xen virtual machine monitor. In Proceedings of the USENIX Annual Technical Conference, pages 387–390, Anaheim, CA, Apr. 2005.
[6] K. Flautner and T. N. Mudge. Vertigo: automatic performance-setting for Linux. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, pages 105–116, Boston, MA, Dec. 2002.
[7] J. Flinn and M. Satyanarayanan. Energy-aware adaptation for mobile applications. In Proceedings of the 17th Symposium on Operating System Principles, pages 48–63, Charleston, SC, Dec. 1999.
[8] K. Fraser, S. Hand, R. Neugebauer, I. Pratt, A. Warfield, and M. Williamson. Safe hardware access with the Xen virtual machine monitor. In Proceedings of the 1st Workshop on Operating System and Architectural Support for the On-Demand IT Infrastructure, Boston, MA, Oct. 2004.
[9] M. Gomaa, M. D. Powell, and T. N. Vijaykumar. Heat-and-run: leveraging SMT and CMP to manage power density through the operating system. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 260–270, Boston, MA, Sept. 2004.
[10] S. Gurumurthi, A. Sivasubramaniam, M. Kandemir, and H. Franke. DRPM: dynamic speed control for power management in server class disks. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA), pages 169–181, New York, NY, June 2003.
[11] H. Härtig, M. Hohmuth, J. Liedtke, and S. Schönberg. The performance of µ-kernel based systems. In Proceedings of the 16th Symposium on Operating System Principles, pages 66–77, Saint Malo, France, Oct. 1997.
[12] T. Heath, A. P. Centeno, P. George, L. Ramos, Y. Jaluria, and R. Bianchini. Mercury and Freon: temperature emulation and management in server systems. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 106–116, San Jose, CA, Oct. 2006.
[13] R. Joseph and M. Martonosi. Run-time power estimation in high performance microprocessors. In Proceedings of the 2001 International Symposium on Low Power Electronics and Design, pages 135–140, Huntington Beach, CA, Aug. 2001.
[14] I. M. Leslie, D. McAuley, R. Black, T. Roscoe, P. T. Barham, D. Evers, R. Fairbairns, and E. Hyden. The design and implementation of an operating system to support distributed multimedia applications. IEEE Journal of Selected Areas in Communications, 14(7):1280–1297, Sept. 1996.
[15] J. LeVasseur, V. Uhlig, J. Stoess, and S. Götz. Unmodified device driver reuse and improved system dependability via virtual machines. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation, pages 17–30, San Francisco, CA, Dec. 2004.
[16] Maxtor Corporation. DiamondMax Plus 9 Data Sheet, 2003.
[17] A. Merkel and F. Bellosa. Balancing power consumption in multiprocessor systems. In Proceedings of the 1st EuroSys Conference, pages 403–414, Leuven, Belgium, Apr. 2006.
[18] R. Neugebauer and D. McAuley. Energy is just another resource: Energy accounting and energy pricing in the Nemesis OS. In Proceedings of the 8th Workshop on Hot Topics in Operating Systems, pages 67–74, Schloß Elmau, Oberbayern, Germany, May 2001.
[19] I. Pratt, K. Fraser, S. Hand, C. Limpach, A. Warfield, D. Magenheimer, J. Nakajima, and A. Malick. Xen 3.0 and the art of virtualization. In Proceedings of the 2005 Ottawa Linux Symposium, pages 65–85, Ottawa, Canada, July 2005.
[20] J. Stoess and V. Uhlig. Flexible, low-overhead event logging to support resource scheduling. In Proceedings of the 12th International Conference on Parallel and Distributed Systems, volume 2, pages 115–120, Minneapolis, MN, July 2006.
[21] V. Uhlig, J. LeVasseur, E. Skoglund, and U. Dannowski. Towards scalable multiprocessor virtual machines. In Proceedings of the 3rd Virtual Machine Research and Technology Symposium, pages 43–56, San Jose, CA, May 2004.
[22] VMware Inc. ESX Server Data Sheet, 2006.
[23] C. A. Waldspurger and W. E. Weihl. Lottery scheduling: Flexible proportional-share resource management. In Proceedings of the 1st Symposium on Operating Systems Design and Implementation, pages 1–11, Monterey, CA, Nov. 1994.
[24] A. Weissel and F. Bellosa. Dynamic thermal management for distributed systems. In Proceedings of the 1st Workshop on Temperature-Aware Computer Systems, Munich, Germany, May 2004.
[25] A. Weissel, B. Beutel, and F. Bellosa. Cooperative I/O: a novel I/O semantics for energy-aware applications. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, pages 117–130, Boston, MA, Dec. 2002.
[26] J. Zedlewski, S. Sobti, N. Garg, F. Zheng, A. Krishnamurthy, and R. Wang. Modeling hard-disk power consumption. In Proceedings of the 2nd Conference on File and Storage Technologies, pages 217–230, San Francisco, CA, Mar. 2003.
[27] H. Zeng, C. S. Ellis, A. R. Lebeck, and A. Vahdat. ECOSystem: managing energy as a first class operating system resource. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123–132, San Jose, CA, Oct. 2002.
[28] H. Zeng, C. S. Ellis, A. R. Lebeck, and A. Vahdat. Currentcy: unifying policies for resource management. In Proceedings of the USENIX 2003 Annual Technical Conference, pages 43–56, San Antonio, TX, June 2003.
