
A NEW SYSTEM ARCHITECTURE FOR GREEN ENTERPRISE

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Maria A. Kazandjieva August 2013

© 2013 by Maria Alexandrova Kazandjieva. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution- Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/dk404ky3467

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Philip Levis, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Christoforos Kozyrakis

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Nick McKeown

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Computing systems account for at least 13% of the electricity use of office buildings. This translates to about 2% of the electricity consumption of the entire US [46] or the equivalent of the State of New Jersey! As computing becomes pervasive, making these systems more efficient is an opportunity to reduce operational costs and have a positive environmental impact. Unfortunately, current understanding of energy consumption in office buildings is limited and coarse-grained. Without better visibility into how electricity is spent and how much of it is wasted, it is difficult to find ways to reduce it.

Powernet – a multi-year power and utilization study of the computing infrastructure in the Computer Science Department at Stanford University – begins to address the visibility problem in one building. Powernet’s data is collected via a large network of plug-level wireless power meters and software sensors that cover a significant portion of the 2nd, 3rd, and 4th floors of the Gates building at Stanford. The Powernet data show that at least 25% of Gates’s electricity is wasted on idle and over-provisioned devices. At the extreme, many desktops operate at near-idle for 75% of the time. The combination of high idle power and low utilization means that a large chunk of energy is wasted. This highlights an opportunity to improve on current computing systems.

This dissertation presents a novel system architecture for office computing, Anyware. To save energy, Anyware leverages two observations. First, an increase in energy use does not translate to the same increase in performance. Second, there is a range of resources one can have for a fixed power budget. Anyware’s hybrid design splits workload execution between a local low-power client device and a virtual machine (VM) on a backend server. Applications that benefit from hardware

optimizations, such as video and graphics, remain local; other tasks (document and picture editing, PDF viewing, etc.) are offloaded to the server. Anyware reduces the energy cost of computing by 70%–80% because the client has power draw comparable to that of a thin client or a laptop (15 to 20 watts) while the server can host multiple user VMs. Fast I/O, the availability of network resources in a LAN environment, and the increased CPU and memory on the server mean that users can get comparable performance at a fraction of the energy cost. Anyware demonstrates that with a new computing architecture, it is possible to have the best of both worlds: desktop performance at the energy costs of thin clients.

Acknowledgements

“The universe is big. It is vast and complicated and ridiculous. And sometimes, very rarely, impossible things just happen and we call them miracles.”

Doctor Who, Season 5, Episode 12

I have often felt that, just like the universe, graduate school is vast and complicated and ridiculous. The real miracle is not the diploma at the end, but rather the continuous support of so many of you.

Phil – you have been my champion from day one. I will always remember how you introduced me to a boat-full of researchers in the Sydney harbor. You looked so proud and excited; I had only been at Stanford for two months (and survived my first NSDI submission!) Thank you for being a true advisor – somebody who taught me how to be a better researcher not by laying down an agenda but by allowing me to find my own academic path. I am already seeing the payoff of being able to confidently tackle underspecified technical challenges. What more could I ask for! I hope I will continue to make you proud as I venture out into the ‘real world’. My intellectual framework has never been more cognitively dissonant. Thank you!

Christos and Scott – thank you for always bringing a fresh perspective to my research; your ideas and guidance have allowed me to look at energy efficiency from so many different directions. And remember, if worst comes to worst, Christos will adopt me in exchange for homemade beer and cheese! On my vacations, I will be in San Diego visiting Scott.

Nick, John O., and Martin – you all stepped in to be part of my defense and reading committees. The closed-session discussion we had was extremely energizing – not an easy feat in the presence of a tired and stressed Maria. Thank you for all the valuable feedback.

Margaret and Sami – without a doubt, none of this would have been possible without you two. I am fortunate to have not one but two amazing technical women who have mentored me throughout the years. Sami – in undergrad I told you I wanted to feel stupid and you knew just what I meant. Thank you for encouraging me to surround myself with the smartest people out there so I could grow as a researcher. Margaret – I was not applying to Stanford until you told me to. Good call! Thank you for being there for me long after I left Princeton; I hope our paths cross again.

SING – Mayank, Jung Il, Kannan, Ewen, Behram, Wanja, Jung Woo, Tahir, Chinmayee, Jae-Young, Omid – you made it guys. You are Maria-free (almost!) Thank you for all the collaborations and all the fun moments. You have made my time in grad school a worthy experience. Ewen, now that we are no longer labmates, I hope I can be one of those secret, imaginary friends you have, drinking beer and enjoying food together.

Nick and the McKeowns (no, it is not a band name) – thank you for making me an honorary member of your group; it has been and continues to be a fantastic experience.

Stanford folk – I received so much support from many who owed me nothing. Mary Jane, Alexis, and Chris are truly indispensable. A lot of content in this dissertation would not be here without collaborations with utilities and building managers, as well as IT staff at Gates and elsewhere on campus. Thank you.

Friends, a full list of whom can be viewed on a website called Facebook – you have all enriched my life, made the tough times easier, the fun times crazier, and provided the occasional body or pad to punch (Yeah, Muay Thai.)
Forgive me for not going through a long list of names; nobody deserves to be forgotten, yet my brain is not what it used to be. Everybody has my deepest gratitude.

Lindsay – ten years into this, the two of us are just as R&D as always, except now we are also proper adults and doctors! I love it and I love you.

Mom – ten years ago you sent me to the U.S. with a one-way ticket, $300, and two suitcases; thank you for your strength and sacrifice. Thank you for always putting aside your literature-degree sensibility and catering to a child who refused to read fiction but begged to get encyclopedias. Life has not been easy but my drive to succeed has always come down to making both of us proud and being able to take care of you. Your ‘love without attachment’ motto might just have been the most selfless thing a parent could teach their child. I love you!

Last but certainly not least, Brandon – learning how to make you smile is perhaps worthy of a whole other Ph.D. (Explosions. Leafy sea dragons. Cars. Ocean. Scenic time lapses. Palm hearts. Slo-mo punches. Llama. Mushrooms.) It is so worth it! Thank you for being my cosmological constant, ensuring that my universe expands¹.

¹ If you would like to know what this means, please contact me or read [37].

Contents

Abstract

Acknowledgements

1 Introduction
   1.1 Measuring Energy and Utilization
   1.2 Data Analysis
   1.3 Anyware
   1.4 Thesis Statement and Contributions

2 Background and Related Work
   2.1 Measuring Energy Use
   2.2 Existing Green Computing Solutions
      2.2.1 Making Desktops Sleep
      2.2.2 Thin Clients
   2.3 Offloading Computation
   2.4 Summary

3 Powernet: A Sensing Infrastructure
   3.1 Power Monitoring
      3.1.1 Overall System Design
      3.1.2 Power Meters
      3.1.3 Deployment Experiences
      3.1.4 Data Access

   3.2 Utilization Monitoring
      3.2.1 PCs
      3.2.2 Network Traffic
   3.3 Deployments
      3.3.1 Gates Hall
      3.3.2 Thin Clients
   3.4 Summary

4 Data Analysis
   4.1 Device Energy Consumption
      4.1.1 Personal Computers
      4.1.2 Computer Displays
      4.1.3 Server Machines
      4.1.4 Networking Equipment
      4.1.5 Whole-building Summary
   4.2 Utilization
      4.2.1 Computers
      4.2.2 Network Equipment
   4.3 Thin Client Setups
      4.3.1 United States VMWare Deployment
      4.3.2 German Computer Science Lab Sun Deployment
   4.4 Implications for Enterprise Computing
      4.4.1 Systems have poor power proportionality.
      4.4.2 Small increases in performance can have a high power cost.
      4.4.3 User systems have low average utilization but occasionally need high performance.
      4.4.4 Networks are significantly overprovisioned.
      4.4.5 Current power saving techniques trade off productivity for efficiency.
   4.5 Summary

5 Anyware: A Hybrid Compute Model
   5.1 Overall System Design
      5.1.1 Anyware Overview
      5.1.2 Remote Execution Options
      5.1.3 Anyware Design
   5.2 Finding Remote Resources
   5.3 Execution Placement
      5.3.1 Methodology
      5.3.2 Experimental Results
      5.3.3 Logistic Regression Model
      5.3.4 Model Training and Validation
      5.3.5 Discussion
   5.4 Architectural Support for Anyware
   5.5 Evaluation
      5.5.1 Experimental Methodology
      5.5.2 Anyware Client Performance
      5.5.3 Local-only Applications
      5.5.4 Sharing Server Resources
      5.5.5 Energy Savings
   5.6 Discussion

6 Methodology Guidelines for Energy Research
   6.1 Power Measurement Hardware
   6.2 Methodology Lessons in Power Measurement
      6.2.1 Sampling Frequency
      6.2.2 Device Variations
      6.2.3 Device Sample Size
      6.2.4 Duration of Measurements
   6.3 Four Steps to Characterizing Energy Use
   6.4 External Datasets: A Cautionary Tale

7 Conclusion
   7.1 Contributions
   7.2 Going Forward

Bibliography

List of Tables

3.1 Powernet covers a variety of devices whose power measurements enable a detailed characterization of the energy consumption. Some devices also have CPU utilization or network traffic monitors.
3.2 Summary of collected data, organized by type of measurement.
3.3 System Outages

4.1 Personal computers are binned into three categories, and university and active network node counts allow us to extrapolate to the whole building.
4.2 A survey shows that the majority of building occupants use mid-sized LCD displays. The number of large (30”) monitors is increasing as equipment is upgraded.
4.3 Summary of switch types, quantities, and estimated individual power consumptions. This inventory includes all major network switches and excludes small per-room switches and hubs.
4.4 We cross-correlate Powernet measurements with IT databases to extrapolate the energy consumption of all computing systems in the building.
4.5 CPU utilization of both student and administrative staff machines reveals that processing resources are only lightly taxed. Data was collected once a second for 11 months (students) and 1 month (staff).
4.6 The most popular workloads on administrative computing systems are general office and web applications. These workloads imply that a laptop can be used instead of a desktop.

4.7 Summary of groups of switches with individual and estimated total power consumption, Gates building.
4.8 Power draw for two servers handling 44 virtual desktops via VMWare’s ESX Server OS. Each VM corresponds to a user with a 15-watt thin client at their desk. The average total per-user computing cost is 30 watts.

5.1 Tasks performed by subjects to build and test a predictive model for Anyware. The ’local’ column refers to whether users preferred this application to remain on the client.
5.2 Application features collected in order to build a predictive model for Anyware.
5.3 Execution placement classes determined from user experiments.
5.4 Even with Anyware, replacing a desktop with a lower-end machine has a large negative effect on system performance.
5.5 Hardware used to evaluate Anyware. Non-desktop clients have slower CPU and I/O performance.
5.6 Remote video playback results in low frames-per-second and a degraded viewing experience. Anyware has the benefit of both local and remote resources, so such tasks will remain local.
5.7 Average power draw of different types of computing equipment in watts. The per-user server values assume 25 VMs per one physical server.

6.1 Average power draw of two different devices of the same model (standard deviation shown in parentheses). Two devices of the same model can differ by as much as 43%. Networking equipment is more uniform than PCs.

List of Figures

2.1 Power proportionality.

3.1 Diagram of the Powernet system.
3.2 Powernet custom meters.
3.3 Number of nodes from which packets were received at the base station during the deployment.
3.4 Map of wireless Powernet meters over three floors in the Gates building. Each dot is a sensing point and the two black triangles are base stations.

4.1 Distribution of power draw values for desktops and laptops monitored with Powernet.
4.2 Desktop energy varies both over time and between different pieces of equipment, as shown by these three PC data traces.
4.3 Brightness level and color scheme have a significant effect on monitor power consumption. A one-time change in LCD screen configurations can have a large impact on the power draw.
4.4 Power remains constant across a wide variety of network loads. The number of active Ethernet ports, and more interestingly, the maker of the equipment, have a larger effect on energy.
4.5 Aggregate power draw for the entire Powernet building shows diurnal and weekday/weekend patterns. Computing systems account for 51% of the total 445 kW. The given week of data is representative of the building, except for Monday, which was a university holiday (Feb 15).

4.6 CDF of traffic for seven switches over 6 months shows that switches are operating well under capacity.
4.7 Typical traffic patterns for one edge switch in the building. Network utilization remains low. Power consumption for this switch remains constant, at approximately 500 watts.
4.8 CDF of traffic for seven switches over 6 months shows that switches are operating well under capacity.
4.9 CPU utilization for two Sun Fire servers at the Erlangen thin client infrastructure.
4.10 Relationship between power draw and utilization for three pieces of office equipment. Note that the network cost is low and constant.
4.11 Processor performance as rated by the 3DMark benchmark versus maximum power dissipation. For modern processors, power draw increases non-linearly with performance.
4.12 Utilization data collected over 3 months from seven office PCs show how over-provisioned current desktop setups are: CPU utilization lies below 25% for 85% of the time.

5.1 On the Gnumeric workload, Anyware performs identically to the desktop. The results are averaged over 10 runs, which are all within 0.6% of each other.
5.2 The Van Gogh GIMP filter is a CPU-intensive task, so both the laptop and the Eee PC benefit from Anyware’s access to the server. The missing ‘Eee PC only’ bar is at 147 seconds.
5.3 Editing a 3.2MB text file with the Kate editor has comparable performance for all three setups, with variation between runs of at most 0.25 seconds.
5.4 Executing applications with a cold cache represents worst-case performance for Anyware, similar to a just-booted desktop.
5.5 As pre-existing CPU load on the server increases, so does the execution time of workloads. An offloading policy should take into account current utilization before assigning tasks for remote execution.

5.6 Anyware’s energy savings depend on how many users can share the same server. Total per-user power draw lies within the shaded region.

6.1 Five-minute averages of power data do not show anything out of the ordinary – the PC is idle at about 95 watts.
6.2 Power data collected once a second reveals a misbehaving PC. Earlier, 5-minute averages hid the anomaly. In certain use cases it is beneficial to have high-resolution data.
6.3 PCs have a wide distribution of power draws, making it critical to have multiple measurement points.
6.4 Desktop diversity requires the measurement of a large sample of the population. In this experiment, if only 5 desktops are used to estimate the power of all 69, then the expected error is over 16%.
6.5 As the number of months of data increases, the standard deviation of error in estimates decreases. Even if only one month of data is used over 16 desktops, the yearly approximation will be within 4% of the true value of $1600.
6.6 A week-long trace of power consumption and CPU utilization shows how well the two track each other, with r² = 0.996.
6.7 The datasets for both desktops and laptops show lower mean and median values for power draw. The difference in desktops is particularly large, misrepresenting the real world. Additionally, the Energy Star dataset underestimates the energy savings a desktop/laptop switch could have.

Chapter 1

Introduction

The Department of Energy estimates [46] that computing systems in office and education buildings account for 2% of the electricity use of the United States. This energy cost is equivalent to the electricity use of the entire state of New Jersey! As computing becomes more prevalent in our daily jobs, this number will continue to grow. Even though enterprise computing has an energy footprint comparable to that of data centers, enterprises have seen much less attention.

The homogeneity and single administrative domain of data centers make them a compelling target for more energy-efficient designs, resulting in tangible improvements. Enterprise computing, on the other hand, is much more difficult to tackle due to highly heterogeneous equipment and workloads, decentralized management, and misaligned financial incentives. The research is there, covering thin-client computing [23], dynamic VM migration [35, 44], network proxies [27, 67], and even network port deactivation [64]. Yet in our enterprise (a computer science department), as in others, the manager’s path to greater efficiency is less clear than that in a data center. Instead of an orderly, pre-planned warehouse of identical boxes running a limited codebase, this is a multi-floor, VLAN-partitioned, diverse aggregate that grows organically, at best, and randomly, at worst. The manager does not know where the power is going, and even if she did, quantifying the comparative benefits of greener options is far beyond her reach.

Improving the energy efficiency of a computing system first requires knowing where


and when energy is wasted. Until now, power characterization studies have either collected data at the macro scale of a whole building [29], combining all plug loads into one number, or at the micro scale from a handful of computers and LCD monitors [58]. Data at the macro scale is informative but difficult to act upon – it does not provide visibility into the computing components that can be made more energy efficient. Isolated measurements fail to show how individual devices relate to an entire IT infrastructure. This lack of comprehensive data hinders a deeper understanding of the existing problems and limits our ability to offer innovative solutions.

This dissertation proposes a novel enterprise computing system that can reduce energy costs by 70–80% with no loss of performance. The proposal is motivated by an in-depth analysis of over two years of energy and utilization data from a variety of computing systems. This dissertation describes the methodology for collecting these datasets; it discusses the challenges of building Powernet, a sensing infrastructure with over 200 power and utilization sensors. The analysis of the resulting data leads to several observations. First, desktops alone account for almost 20% of the building’s electricity use. Second, utilization data show that active PCs are as wasteful as idle ones because their power draw is very high despite them doing little work. Third, users’ computing needs are met by a variety of equipment, each offering a different tradeoff between power and performance.

These insights drive the proposal of Anyware, a hybrid system architecture for the enterprise. Anyware uses the fact that many office workloads can be completed on lower-power computers with limited resources. This eliminates the need for wasteful desktops but means that the performance of more resource-intensive tasks will be affected, decreasing users’ satisfaction and productivity.
To solve this problem, Anyware borrows an idea from thin client systems and uses shared backend servers to give low-power client computers additional resources. Since servers are shared, their cost is amortized between users, reducing the overall energy use of the system. This dissertation describes the research challenges involved in designing Anyware.

The rest of this chapter presents additional details on the data gathering methodology and analysis, and the high-level design of Anyware. The chapter ends with a list of dissertation contributions.

1.1 Measuring Energy and Utilization

Open data sets describing the energy use and efficiency of enterprise IT systems are difficult to find. Therefore, we choose to collect our own, using the Stanford Computer Science building as a sample office building. The space is largely used for graduate student, professor, and staff offices, and houses two server rooms. This makes it a good approximation of a workplace that relies on PCs for everyday tasks. Every month, the department pays a $40,000 (330,000 kWh) electricity bill, yet there is no visibility into where this energy is going and how much of it is spent on the computing infrastructure.

To collect the necessary data, we designed and deployed Powernet, a sensing infrastructure that can monitor power at the plug level and utilization data at the system level. Over more than two years, Powernet has collected data from over 250 devices, including desktops, laptops, network switches and access points, servers, thin clients, etc. We make the data from the deployment available to the public [17].

Building, deploying, and maintaining such a sensing infrastructure raises a number of questions. The first step is to determine what technology should be used to collect data from a large number of individual computing devices over a long period of time. Initial experiences with off-the-shelf power meters help us identify a set of criteria for better meter design. Chapter 3 presents the design of Powernet.

In addition to hardware considerations, we also study the effect of data collection parameters such as measurement intervals and deployment lifetime. Chapter 6 discusses the tradeoffs when setting these parameters. Yet another parameter to consider is the number of devices from which to collect data. In many cases, including ours, it is not practical to plug a power meter into every single computing device in an office. Therefore, we study the best way to choose a subset of equipment and the effect of subsampling on data accuracy.
Answering the questions above results in Powernet, a measurement system that can reliably deliver power and utilization data from tens to hundreds of devices, all with minimal manual intervention.
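The subsampling question can be illustrated with a small simulation. The sketch below is illustrative only: the desktop population is synthetic, drawn from a Gaussian loosely matching the 100–160 W desktop range discussed later, rather than actual Powernet measurements, and the function names are my own.

```python
import random
import statistics

def subsample_error(population, k, trials=2000, seed=1):
    """Mean absolute % error when estimating the population's total
    power from the average of a random k-device subsample."""
    rng = random.Random(seed)
    true_total = sum(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, k)
        estimate = statistics.mean(sample) * len(population)
        errors.append(abs(estimate - true_total) / true_total)
    return 100 * statistics.mean(errors)

# Hypothetical population: 69 desktops with widely varying power draw.
rng = random.Random(7)
desktops = [rng.gauss(130, 35) for _ in range(69)]

for k in (5, 20, 50):
    print(f"n={k:2d}  expected extrapolation error {subsample_error(desktops, k):.1f}%")
```

Runs like this reproduce the qualitative lesson of Chapter 6: with only a handful of metered desktops, the building-wide extrapolation error is large, and it shrinks as the monitored subset grows.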

1.2 Data Analysis

The next step is analyzing the rich datasets that Powernet provides in order to gain better insight into the energy efficiency of enterprise computing. Throughout the data analysis process, we investigate a number of questions. For example, we want to be able to say something about the energy use of all desktops in an office, yet there is only data for a subset of them. So what are the appropriate ways to take these limited measurements and generalize them? We choose to add metadata in the form of an equipment inventory.

In addition to studying power alone, we also combine it with the utilization datasets. This part of the analysis lets us correlate energy with system use, revealing waste. Interestingly, we identify not only resources that can be scaled back to save energy, but also ones that can be used more without impacting the electricity bill.

Our data analysis goes beyond the systems in the Computer Science department. Powernet also provides data on an alternative setup comprising thin clients and servers. We use these data to study how different IT choices affect overall energy efficiency.

Chapter 4 presents an in-depth analysis of our datasets. It characterizes the energy use of office buildings and it teases apart the high-level insights that can motivate change.
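The inventory-based generalization works by scaling per-category averages from the monitored subset up to the full device counts. A minimal sketch, with made-up device categories, power values, and counts (the real categories and inventory appear in Chapter 4):

```python
import statistics

# Measured average power draws (watts) for a monitored subset of
# devices; these numbers are illustrative, not the Powernet data.
measured = {
    "desktop": [142, 118, 130, 155, 97],
    "laptop":  [22, 31, 18],
    "switch":  [480, 510],
}

# Device counts from a hypothetical IT inventory.
inventory = {"desktop": 450, "laptop": 300, "switch": 12}

def building_estimate(measured, inventory):
    """Extrapolate per-category mean power to the whole building (kW)."""
    total_w = sum(statistics.mean(measured[c]) * inventory[c] for c in inventory)
    return total_w / 1000

print(f"Estimated computing load: {building_estimate(measured, inventory):.1f} kW")
```

The obvious caveat, which Chapter 6 quantifies, is that the per-category means must come from a representative sample; a skewed subset skews the whole-building estimate.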

1.3 Anyware

One of the main takeaways from the Powernet data analysis is that desktops are the largest contributor of wasted energy. In the most common case this waste comes from systems that are powered on but idle. Recognizing that idle devices present an opportunity for significant and easy savings, systems such as LiteGreen [44], Somniloquy [27], and GreenUp [80] propose intelligently sleeping desktops, reducing energy consumption by 31–74%. In this sleep system model, the energy saved is directly proportional to the amount of time devices can be put in a low-power state that makes the user’s work environment temporarily unavailable. The energy spent when

the user is active remains unchanged. While effective, such sleep approaches can only solve part of the problem. The Powernet data unequivocally show that enterprise systems are heavily under-utilized even when not idle. This means that a traditional desktop, with a power draw ranging between 100 and 160 watts, will still consume a lot of energy when it is active but lightly loaded and unable to enter a sleep state. The fundamental problem is that there is a large mismatch between energy cost and workload demands. Resolving this mismatch without hurting users’ productivity can have a significant impact on the efficiency of enterprise computing systems.

To this end, we propose Anyware, a new system architecture that leverages the fact that power draw is not linear with performance. It replaces under-utilized desktops with lightweight yet capable computers such as laptops and low-power PCs. These clients have sufficient resources for many common workloads; they also draw 86% less power on average when idle (14 W) compared to a desktop. Of course, the lower power draw comes from a reduction in hardware capabilities. To make up for the performance difference, Anyware offloads some applications to a shared backend server on the same LAN.

At a high level, Anyware seems similar to thin clients, but Anyware clients are not thin at all: for the same energy as a thin client, they can execute local applications when needed and serve as personal data storage devices via fast solid-state drives. In fact, we find that some applications, such as video, cannot be effectively run remotely, but require local resources – a case that Anyware handles well and thin clients do not. Making Anyware an effective solution requires tackling several research challenges:

• How can Anyware be designed so it does not require changes to operating systems, applications, or users’ established workflow?

• How should the choice between the low-power client and the powerful server be made?

• What are the right hardware choices to run Anyware on, in order to maximize performance while minimizing energy costs?
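The second question, choosing between the low-power client and the powerful server, is answered in Chapter 5 with a logistic regression model over application features. A toy sketch of that idea follows; the features and hand-picked weights here are hypothetical, whereas the real model is trained on data from user experiments:

```python
import math

def placement_score(features, weights, bias):
    """Logistic model: probability that an application should run
    on the remote server rather than the local client."""
    z = bias + sum(weights[k] * features[k] for k in weights)
    return 1 / (1 + math.exp(-z))

# Hand-set illustrative weights: CPU-hungry tasks favor offloading,
# frame-rate-sensitive tasks (e.g. video) strongly favor staying local.
weights = {"cpu_util": 4.0, "fps_sensitive": -6.0, "log_mem_mb": 0.5}
bias = -2.0

gimp_filter = {"cpu_util": 0.9, "fps_sensitive": 0, "log_mem_mb": math.log(300)}
video_play = {"cpu_util": 0.4, "fps_sensitive": 1, "log_mem_mb": math.log(200)}

for name, f in [("GIMP filter", gimp_filter), ("video player", video_play)]:
    p = placement_score(f, weights, bias)
    print(f"{name}: {'offload to server' if p > 0.5 else 'keep local'}")
```

Even this toy version reproduces the behavior described above: the CPU-intensive filter is offloaded, while video stays on the client.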

Chapter 5 describes in detail the design, implementation, and evaluation of our prototype system.
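The claimed savings follow from simple per-user arithmetic. In the sketch below, the desktop and client draws come from the ranges quoted above, while the server draw is an assumption on my part; the 25-users-per-server sharing factor matches the assumption used in the dissertation's power tables, and Chapter 5 reports measured values.

```python
# Back-of-the-envelope per-user power comparison (illustrative numbers).
desktop_w = 130          # typical always-on office desktop (100-160 W range)
client_w = 20            # low-power Anyware client
server_w = 250           # assumed backend server draw (not a measured value)
users_per_server = 25    # VMs sharing one physical server

anyware_per_user = client_w + server_w / users_per_server
savings = 1 - anyware_per_user / desktop_w
print(f"per-user draw: {anyware_per_user:.0f} W, savings {100 * savings:.0f}%")
```

With these inputs the per-user draw is 30 W and the savings about 77%, squarely inside the 70–80% range the dissertation reports.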

1.4 Thesis Statement and Contributions

This dissertation uses detailed power and utilization datasets to pinpoint energy waste. Based on insights from these data, the dissertation presents a new, more energy efficient, computing system. Specifically, this dissertation argues that:

Enterprise compute systems waste a huge amount of energy. A novel architecture can remove this waste without sacrificing performance.

Four contributions support the thesis statement above:

1. The design and deployment of a wireless sensing infrastructure, Powernet, to collect per-device power and usage measurements once per second over a multi-year time period;

2. The analysis of power and system utilization data together with equipment metadata to create a building-level characterization of energy use and waste in the enterprise;

3. The proposal, implementation, and evaluation of a new system architecture for green enterprise computing, Anyware, that provides performance comparable to that of existing systems at a fraction of the energy cost;

4. The discussion of methodology guidelines for future power measurement efforts based on deployment experiences and data.

The Powernet data collection effort results in over ten billion datapoints describing the energy behavior of computing devices, as well as the types of tasks these devices are used for. Combining the data allows us to understand how different devices affect the overall energy use of a building. In our own Computer Science department,

IT systems account for 50% of the electricity use, costing thousands of dollars each month.

Characterizing the energy use and utilization of hundreds of devices yields another benefit. It allows us to study and compare the different ways in which users’ computing needs could be met. The variety of desktops, laptops, thin clients, and servers that are present in an IT infrastructure suggests that there is not just one right device to buy. With Anyware, we explore a system design that re-envisions what resources and capabilities a personal computer needs to have. The result is a compute architecture that can perform as well as a desktop, while reducing energy use by 75%. Anyware’s performance shows that a hybrid computing infrastructure that uses multiple types of hardware can deliver a significantly more energy efficient overall system.

The rest of this dissertation expands on each one of the contributions. Chapter 2 gives additional context on why there is an energy efficiency problem in the first place and reviews existing solution attempts. Since there is little energy data to aid in green computing research, Chapter 3 describes the design and deployment of Powernet, a sensing infrastructure that can fill the data gap. Chapter 4 analyzes the collected data, revealing how a mismatch between available and needed resources leads to large energy waste. Building on Powernet’s observations, Chapter 5 introduces and evaluates the Anyware system architecture. Anyware combines local execution with shared remote resources in order to meet users’ needs while minimizing waste. Chapter 6 discusses Powernet’s implications for future energy measurement. Lastly, Chapter 7 summarizes this work and gives future directions.

Chapter 2

Background and Related Work

This chapter presents background on existing power measurement techniques and studies that highlight energy inefficiencies in enterprises. It reviews prior green computing solutions and technological trends that affect the design choices behind Powernet and Anyware.

Rising fossil fuel prices, growing concern with climate change, and the increasing importance of energy bills to computing's total cost of ownership have raised the issue of energy efficiency in a diverse set of domains, such as data centers [55], enterprise buildings [60], and homes [73]. Energy is now considered alongside classic metrics like performance, scalability, and fault tolerance. A new willingness to consider alternatives to the status quo, purely for energy savings, has led to research into power-performance benchmarks such as JouleSort [78], low-power CPUs, and even redesigned hardware [56]. In fact, energy-efficiency research no longer ends at the level of individual computing systems and components; it now extends all the way to massive aggregations of them. Data center examples include storage architectures [30], cloud migration systems [44], service migration algorithms [75], and network power optimizers [55]. The homogeneity and single administrative domain of data centers make them a compelling target for more energy-efficient designs. Enterprise computing, in contrast, has seen far less study and change, despite the fact that it consumes at least as much power as data centers [46]. The rest of this chapter reviews related work in the domains of green computing and distributed systems.

2.1 Measuring Energy Use

The research and design decisions in this dissertation are based on a detailed study of the energy use of one building's computing infrastructure, but such data sets are not readily available and are laborious to collect. Traditionally, the only form of feedback available to a building manager has been the monthly electricity bill. With the development of smart meters [8, 26], the granularity of data has increased by orders of magnitude, with many building meters now providing power readings every 15 minutes to an hour. These advances have also allowed users to monitor energy use in real time [12, 29]. The improvements in whole-building metering have brought more visibility into the temporal patterns of energy use. However, aggregate data for an entire building is not helpful in understanding where energy is going within the building. It is impossible to know how computing systems contribute to the overall cost in an office environment if the only available information is an aggregate power reading.

One way to get around the problem is to model the power consumption of computing systems by taking measurements in the lab and understanding how different components contribute [59, 76, 84]. For example, one study [76] used power measurements from individual hardware components (CPU, memory, etc.) to build a model for the whole computer. These data, together with software counters indicating the load on each subsystem, allowed the researchers to predict the energy use of another machine of the same model to within 20%. More sophisticated techniques that take operating-system-level counters into account to capture system use can produce more accurate models, with sub-10% error [77]. Modeling can be an easy way to estimate energy consumption, especially in setups where many identical machines are used, e.g., in a data center. In such a case, one or more models can be trained on a small subset of equipment, using real wall-power measurements.
The rest of the measurements can then be made via software counters. This approach might not be as straightforward in more heterogeneous environments, such as the enterprise. An extensive benchmark study [66] showed that the increasing complexity and variety of hardware make it difficult to create good general models. It is likely that each piece of equipment would require independent training. Additionally, continuous monitoring of software counters is likely to be more intrusive for users than direct power measurements.

The desire for detailed data from individual devices and the limits of modeling techniques have motivated the creation of plug-level power meters [3, 16, 19, 58, 88]. Such a meter sits between a power outlet and the piece of equipment to be measured; as current passes through, the meter takes instantaneous measurements and integrates them over a period of time. While all power meters provide the same basic functionality, several features differentiate the available options. The first is whether and how power data is stored. The simplest power meters display readings on a built-in screen but cannot record any of the data. A step up are internal-logging meters with a few kilobytes of memory; users can connect such a meter to a computer (via USB) and download the data. The most advanced meters can transmit data in real time via Ethernet or a wireless signal to a PC, removing the storage limitations of internal memory. For the energy-conscious consumer, a meter with a display might be enough; for controlled lab measurements, internal logging is sufficient; for long-term continuous monitoring in the wild, the ability to get data out of the meter in real time is key. Whether or not a power meter can record and transmit data also affects its ease of deployment. Ethernet-enabled meters require access to network ports and additional cables.
On the other hand, meters that communicate their data wirelessly require the appropriate hardware to interface between the wireless medium and the end storage device (e.g., a PC). Last but not least, meters differ in how fast they can take power measurements. Current plug-level meters greatly improve on older building smart meters, from once every 15 minutes at best to once per second or faster.

The custom meter design [62] we use in Powernet delivers a high sampling frequency and wireless data collection capabilities. One benefit of foregoing commercial options is that we have control over the software, a component we have found to be unreliable in some off-the-shelf options. Chapter 3 discusses the Powernet meter design and functionality in greater detail. Chapter 6 shares experiences with one other type of meter.

In the context of computing systems energy, such power meters are often used in small numbers to characterize an entire device class. For example, prior work [28, 44, 58] has often based its analyses of desktop energy use on a limited number of data points, ten or fewer. Such real-world data is an improvement over the limited visibility that bills and building-scale meters give, but a small sample of measurements can lead to inaccurate results. PCs are usually chosen based on convenience only, i.e., desktops in the lab where the research is done, instead of using more deliberate samples that are representative of typical users. Related work also does not always indicate the duration over which power data was collected. In some cases [29, 58] power draw is shown over the course of a day or week, but otherwise it is presented as a one-time, instantaneous measurement. Despite the limited duration of these measurements, they are used in calculations of the long-term benefits of energy-saving techniques. Chapter 6 studies how the choice of monitored equipment, duration, and sampling interval can affect the characterization of energy use.

The lack of substantial power datasets within our research community means that it is not uncommon for academic work [40, 67] to base its analyses on Energy Star data. The Energy Star program [5] establishes standards for energy-efficient consumer products, including computing systems. As part of the process of obtaining an Energy Star certification, manufacturers of PCs submit power data for each of their devices. The result is an extensive database [25] of computer models and their power draw.
These data, however, are not representative of the real-world power characteristics and use patterns of machines. Most recently, several large-scale power monitoring projects [45, 83] have deployed hundreds of power meters in office buildings, similar to Powernet. Their work is complementary to ours, and the datasets can be of value to the whole community.
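The linear, counter-based models described in this section can be sketched in a few lines. The idle baseline and per-subsystem coefficients below are hypothetical placeholders, not values from the cited studies; a real deployment would fit them against wall-power measurements.

```python
# Minimal sketch of a counter-based power model: whole-system power is
# estimated as an idle baseline plus per-subsystem terms scaled by
# software utilization counters. All coefficients are hypothetical.
IDLE_WATTS = 45.0
COEFFICIENTS = {"cpu": 35.0, "memory": 8.0, "disk": 6.0}  # watts added at 100% load

def predict_power(utilization):
    """Estimate power draw (watts) from utilization fractions in [0, 1]."""
    active = sum(COEFFICIENTS[k] * utilization.get(k, 0.0) for k in COEFFICIENTS)
    return IDLE_WATTS + active

# A lightly loaded desktop: mostly idle CPU, some memory and disk activity.
print(predict_power({"cpu": 0.10, "memory": 0.20, "disk": 0.05}))  # 50.4
```

Training such a model amounts to regressing measured wall power against the counters; the sketch shows why a single model rarely transfers across heterogeneous hardware, since every machine type needs its own coefficients.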

Figure 2.1: Power proportionality

2.2 Existing Green Computing Solutions

Desktop computers are a significant contributor to the energy cost of enterprise IT infrastructures. Each individual machine can draw anywhere between 40 and 200 watts, and it is typical for desktops to remain continuously powered on, wasting large amounts of energy. One reason for this waste is that the energy cost of an idle PC is far from zero, even though no useful work is being done. In other words, the hardware design lacks energy proportionality.

The concept of energy proportionality was first described in the context of data center computing [33]. Google server measurements showed that servers spent the majority of their time in the 10–50% CPU utilization range, yet their power requirements were much higher than 10–50% of the maximum. The observation that power draw does not increase proportionally with the amount of work being done holds true for both servers and desktop PCs. This lack of energy proportionality results in a large inefficiency when utilization is low or the machine is idle. Figure 2.1 illustrates the power behavior of a perfectly proportional machine compared to current systems.

The benefit of an energy-proportional system is that its energy cost scales with the load on the system and there is no penalty for being in the idle state. The drawback is that optimizing a machine for every possible workload is very difficult. Energy proportionality makes the implicit assumption that workloads are distributed roughly evenly between 0% and 100% utilization, making it worthwhile to be equally energy efficient at every point. In practice, this is rarely the case, and a system might deliver better energy efficiency if it were optimized for the most common range of operation. For example, in the data center space, one can take advantage of having many similar machines and consolidate workloads. This results in two types of machines: those working very close to capacity and those not doing anything. In this case, the ideal scenario is a machine that is most energy efficient at 100% load, not one that is optimized for all possible loads. The unused servers can be turned off or put to sleep [87].

In the context of enterprise computing, each user typically has an individual work environment, making consolidation much harder. It is also not immediately obvious what the most common utilization range is, for which desktops should be optimized. Therefore, recent work has concentrated on maximizing the time a computer can be put to sleep, eliminating the idle-PC scenario. Next, we review a number of sleep techniques; a discussion of active PCs appears in Chapter 4.
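The contrast in Figure 2.1 can be made concrete with a small numeric sketch; the wattages below are illustrative assumptions rather than measurements from this study.

```python
def actual_power(load, idle_watts=60.0, max_watts=100.0):
    """Typical machine: a high idle floor, with power growing linearly
    from idle to maximum as load goes from 0 to 1."""
    return idle_watts + (max_watts - idle_watts) * load

def proportional_power(load, max_watts=100.0):
    """Ideal energy-proportional machine: power scales directly with load."""
    return max_watts * load

load = 0.2  # a common low-utilization operating point
print(actual_power(load))        # 68.0 watts
print(proportional_power(load))  # 20.0 watts
```

At 20% load the non-proportional machine draws more than three times the power of the proportional one, which is exactly the gap that sleep techniques leave untouched.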

2.2.1 Making Desktops Sleep

A substantial amount of effort within the green computing community has been dedicated to reducing the energy waste of idle computers. Instead of turning off PCs, the goal is to put them in a sleep state as often as possible without affecting users' normal workflow. The power difference between the sleep and off states for modern hardware is so small (a couple of watts) that the energy savings are comparable. The added benefit of making desktops sleep is that when they are brought back up, the work environment is as the user left it.
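The claim that sleep and off achieve comparable savings follows from simple arithmetic; the wattages below are assumed for illustration and are not measurements from this study.

```python
HOURS_OFF_WORK = 16  # assumed hours per day the desktop is not in use
IDLE_W, SLEEP_W, OFF_W = 100.0, 2.0, 0.5  # assumed power draw in each state

def nightly_kwh(watts, hours=HOURS_OFF_WORK):
    """Energy consumed over the off-work hours, in kilowatt-hours."""
    return watts * hours / 1000.0

print(nightly_kwh(IDLE_W))   # 1.6 kWh if left idle overnight
print(nightly_kwh(SLEEP_W))  # 0.032 kWh if asleep
print(nightly_kwh(OFF_W))    # 0.008 kWh if powered off
```

Relative to the 1.6 kWh idle baseline, sleep recovers 98% of the waste and powering off recovers 99.5%; the difference between the two is negligible next to the cost of idling.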

From a technical perspective, there is no reason why a PC cannot be put to sleep at the end of the work day, either manually by the user or automatically via a schedule set in software. However, a disadvantage of this simple approach is that the machine's network presence will be lost. Researchers have argued that this is undesirable, since some applications might require a network connection for background tasks, or a user might want to access their computer remotely, from home on the weekend, for example. In addition, studies show that manual sleep features are rarely used [82] and automatic ones are often disabled by IT staff to make patching, backup, and maintenance easier [81]. Two general approaches have been proposed to allow desktops to go to sleep when idle, yet enable them to remain reachable over the network: network proxies and migration.

Network Proxies

Sleep proxies, also known as network proxies, attempt to address computing devices' need for always-on network connectivity. At a high level, a proxy is a device that continues responding to network packets on behalf of a host that is sleeping. It is always on and usually located on the same LAN [69]. For more complex tasks, the network proxy can wake up the sleeping PC.

One of the earliest proxy designs, Somniloquy [27], proposes the use of a USB desktop add-on that provides a low-power network interface. When the desktop is put to sleep, the Somniloquy device remains powered on; it runs stripped-down versions of common applications (e.g., file transfer) that can trigger the desktop to wake up and process incoming packets. The savings one can expect are directly dependent on how often a desktop is idle. Somniloquy resulted in savings of up to 65%. The key drawbacks to the approach are the need for additional hardware (the USB NIC), the need to modify some applications, and the user disruption while the machine is exiting sleep mode.

A mitigation to the hardware problem is to employ a dedicated machine or switch on the same local network that serves as a network proxy for many machines; this dedicated machine runs proxy software. To better understand the design space of network proxies and their potential impact, [67] studies network traffic patterns and user idle behavior. The authors find that sleep proxies can keep machines in sleep mode for up to 50% of the time while providing uninterrupted network access for a limited set of protocols. SleepServer [28] is another software implementation of a sleep proxy, developed on the same principles as Somniloquy but without the need for per-host hardware additions. Similar to the proxy in [67], dedicated machines proxy for hosts on the same subnet, or others (via VLANs). These dedicated machines run trimmed-down virtual machine instances, one per host, that maintain the host's network presence. An evaluation of SleepServer on 30 PCs shows about a 60% reduction in energy consumption.

Another real-world deployment of a software-based proxy [76] was evaluated in an enterprise setting. The solution achieved average energy savings of about 20% over fifty users in six months. While this result is not as promising as earlier work that estimated upwards of 60% savings, it is an opportunity to study what factors affect performance on machines with real, active users. The takeaway is that IT maintenance as well as buggy software applications prevent machines from going to sleep or interrupt sleep too many times. This highlights the need to consider how existing software might interact with energy-saving techniques.

The evolution of network proxies has gone from specialized per-host hardware to software proxies running on dedicated machines that can proxy for multiple user PCs. More recently, GreenUp [80] demonstrated that centralized dedicated machines are not necessary. Instead, GreenUp proposes a distributed proxy that runs on those end users' machines that are not in a sleep state. An evaluation over 100 machines shows that desktops spend on average 31% of the time in sleep mode.
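When a proxy needs to wake a sleeping host, the standard mechanism is a Wake-on-LAN magic packet: six 0xFF bytes followed by the target's MAC address repeated sixteen times, usually sent as a UDP broadcast. A minimal sketch (the MAC address shown is made up):

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF followed by the
    target MAC address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local subnet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

pkt = magic_packet("00:11:22:33:44:55")
print(len(pkt))  # 102
```

The packet format itself is trivial; the unreliability noted later in this chapter comes from NICs, BIOS settings, and switches, not from the packet construction.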

Virtual Machine Migration

The network proxy solutions described above might require small changes to applications and the OS, but the overall user work environment remains the same. Virtual machine (VM) migration takes a different approach: individual applications no longer need to be modified, but the entire desktop environment is put inside a VM. The rationale behind this technique is that when the user is idle, their workspace can be completely migrated to a central server and the desktop can be put to sleep. The migrated VM is still running somewhere, so network presence is preserved. Idle virtual machines can be consolidated on a smaller number of servers because they require fewer resources.

LiteGreen [44] is a system that takes advantage of VM migration in order to enable more frequent desktop sleep. It shows that, with advances in live migration, it is possible to move virtual machines between a host desktop and a local server in an acceptable amount of time. An evaluation on ten users for 28 days shows that LiteGreen can save 70% compared to always-on machines. Taking this approach one step further, Bila et al. [35] propose to use partial VM migration. One benefit is faster migration, which can allow the use of shorter idle intervals. The reduced amount of data that needs to be transferred also means that migration can be done over a wider-area network. Additionally, the server can host more user VMs because each one has a smaller footprint compared to those in LiteGreen. From a user perspective, partial virtual machine migration is preferable because it minimizes the time it takes for the work environment to become available after a sleep period. One potential problem that neither paper addresses is how the normal use of the desktop will be affected by virtualizing it. While VM technology has improved tremendously over the years, there is still a difference in performance between native and virtualized applications and operating systems.

Takeaway

Both sleep proxies and VM migration techniques have the same goal: ensure that sleeping desktops still have a presence on the network. This enables remote access and uninterrupted execution of applications that might require a long-lived network connection. The proposed approaches are an improvement over simpler sleep techniques (such as Windows settings) that do not offer remote access, and over Wake-on-LAN [24], which is notoriously unreliable.

However, in the context of energy efficiency, sleep approaches leave a large part of the problem unsolved. As some of the user activity data from these studies, as well as our own data in Chapter 4, show, even when users are actively using their desktops, the load is fairly low. A sleep technique can help when a 100-watt desktop is idle, but does nothing to reduce the energy use of a 105-watt desktop that is being used to read a paper or the news. The lack of energy proportionality means that the inefficiencies of the desktop extend well past the idle state into the state of low utilization. Therefore, a more aggressive energy-saving approach might argue that users' virtual machines should always run on a server, where they can be consolidated not only when idle but also when lightly used. Indeed, this is the overarching idea of thin clients, a compute model that serves as an alternative to standard desktop setups.

2.2.2 Thin Clients

In thin client systems, user-facing terminals have the ability to display a graphical work environment whose drawing calls come over the network. All computation and I/O happen on a back-end server shared by many of these clients. Originally, thin clients were a way to make expensive and rare compute resources available to more people. With the rise of personal computers, thin clients were replaced with desktops and laptops. Recently, however, there has been a renewed interest in these client-server systems. At the scale of an enterprise, they have many benefits, including lower energy and equipment costs, centralized management, and enhanced security. Current thin client solutions such as VMware's [23] are similar to the virtual machine migration approaches above in that the entire user environment is virtualized. The difference is that the VM always runs on the back-end server, while the client machine simply displays it. The benefit is that the client does not have to be as powerful as a desktop: a more lightweight piece of hardware results in lower energy use, and low utilization means servers can be shared by a large number of users. Section 4.3 presents two measurement studies of thin client deployments in order to quantify their energy efficiency.

Software-based remote desktop clients can also provide a thin-client-like system when run on local, low-power commodity hardware that connects to a remote server. Examples include Windows Remote Desktop [14], RealVNC [18], and NX [15]. Despite their energy and ease-of-management advantages, thin clients are not well suited for all workloads. Specifically, since all the work happens at a remote server, applications cannot take advantage of local hardware optimizations. The result is that graphics-heavy applications and video playback do not perform well, hurting the user's satisfaction and productivity. THINC [32] is one attempt to address the problem by proposing a virtual display architecture that can send and process display commands faster. Chapter 4 investigates several thin client systems deployed in enterprise environments. Using the resulting observations, Chapter 5 presents a novel approach to the problem that relies neither on putting desktops to sleep nor on replacing them with entirely remote execution on shared equipment.

2.3 Offloading Computation

So far, green computing research and development has focused on choosing between two extremes. At one end sit desktops: machines with all-local resources that are powerful but also power-hungry and underutilized. At the other end of the spectrum are thin clients and virtual machine solutions that recognize the benefits of consolidating workloads on remote servers. However, if we look at the enterprise, a middle-ground approach to computing is gaining popularity. While desktops still constitute a significant fraction of enterprise computing [49], more and more workplaces opt for laptop computers. It is worth placing laptops in the context of other computing options. Since one of the primary design considerations is battery life, laptops are much more energy efficient than desktops. However, the lower power draw comes at a cost: laptops generally have lower performance than desktops, due to the type of processors used, I/O busses, and slower hard drives.

If desktops are too wasteful and laptops might not always have the needed resources, what is the right way to combine the best of both worlds? In the earlier days of computing, users were often faced with limited local resources while remote ones were available, either at idle neighboring terminals or at more powerful backend servers. Therefore, finding ways to share and optimize the use of all compute resources within an IT infrastructure was of interest to researchers in the 1980s. For example, SpriteOS [72] recognized that user workstations were often idle and set out to design a kernel that would take advantage of these idle machines on the local network. SpriteOS uses remote procedure calls (RPCs) and multithreading to execute portions of code on remote workstations.

The V-system [86] also had a kernel that implemented threads. Unlike current thread implementations such as POSIX threads, the V-system used message passing for communication between them. Therefore, execution was not limited to a single machine: threads could be used to spread a workload among multiple workstations. This approach to offloading computation is more reminiscent of today's parallel and many-core programming techniques. These early systems were motivated by the constrained resources of any individual machine and the observation that users often leave their workstations idle. Today, the latter observation has not changed, but personal computers have become much more capable, increasing the waste of resources. The Anyware system presented in this dissertation tackles a slightly different problem with a different motivation, but the high-level idea of sharing is the same. Instead of sharing peer compute cycles, Anyware uses shared servers and lower-end computers.

From an implementation standpoint, our system is very different. Operating systems and computing abstractions are deeply ingrained in today's technology, and designing a new kernel would not be an effective way to impact existing systems. Therefore, Anyware is built on top of existing solutions, similar to another 1980s system, Butler [68].
Butler gives the user the ability to execute a program remotely on somebody else's machine using NFS and the X windowing system. Anyware goes beyond what Butler provides by making remote execution automatic instead of requiring user intervention. Understanding which processes can benefit from remote execution is a key contribution of this dissertation.

More recently, migration of code at runtime has been explored in the context of mobile phones [31, 42, 71, 79]. While smartphones have improved in their ability to deliver computation, they are still significantly slower than desktops or servers. In addition, battery capacity is not growing at the speed at which computation is, raising the question of energy efficiency. To address the problem, several systems have proposed to offload computation from the phone to the cloud. Programmers using MAUI [43] must annotate methods so the system knows which portions of the code can be migrated. CloneCloud [41] and COMET [52] eliminate the need for support from the application developer. However, they do so only for processes running in an application-layer virtual machine. This assumption simplifies the task of migrating pieces of code on mobile devices, but it might not be practical for enterprise computing environments. Overall, systems like MAUI and CloneCloud can result in lower energy use on the mobile device and faster completion of workloads. These two goals are similar to those of Anyware and serve as guidelines in Anyware's design. Anyware is similar in its assumption that user data lives on the client device. On the other hand, to avoid the need for virtual machines on the user's local machine or involvement from developers, Anyware migrates complete applications instead of portions of code. Anyware achieves a net energy saving through sharing of resources on the server, unlike mobile device migration systems, which do not take server energy into account.

In the extreme, systems such as Google's Chromebook [7] offload almost all computation by replacing the operating system with a browser. In a way, this approach is a modernized version of thin clients (discussed in Chapter 4.3), since the local machine no longer does any work. Everything happens on a shared infrastructure somewhere over the network. This approach does have the potential to consolidate millions of user workloads, amortizing the energy cost of computing resources.
It remains to be seen whether users will feel comfortable with an all-in-the-cloud approach and whether cloud services will catch up in terms of features and performance to what a desktop application can provide.
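The offloading decision these mobile systems face can be framed as a simple energy comparison: ship the work only if the client-side cost of transferring state and idling during remote execution is below the cost of computing locally. A sketch with hypothetical parameters (none of the numbers come from the cited systems):

```python
def should_offload(local_joules, bytes_to_send, net_joules_per_byte,
                   idle_wait_watts, remote_seconds):
    """Offload if shipping the work costs the client less energy than
    running it locally. The client pays the radio cost of the transfer
    plus idle power while it waits for the remote result."""
    offload_joules = (bytes_to_send * net_joules_per_byte
                      + idle_wait_watts * remote_seconds)
    return offload_joules < local_joules

# Hypothetical task: 50 J to run locally, 1 MB of state to transfer at
# 2e-6 J/byte, 1 W client idle draw, 5 s of remote execution.
print(should_offload(50.0, 1_000_000, 2e-6, 1.0, 5.0))  # True
```

The same inequality explains why Anyware's whole-application granularity is attractive in the enterprise: with wired networks the transfer term is small, so most of the decision reduces to comparing local and remote compute costs.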

2.4 Summary

Despite the increased interest in energy efficiency and green computing, there are few detailed characterizations of the energy use of IT systems. Due to this limited visibility, most research has focused on an easy-to-identify problem, idle desktops, with the goal of turning them off or putting them to sleep. Unfortunately, these sleep approaches affect users' productivity by making the work environment temporarily unavailable. Furthermore, none of the existing solutions address the energy wasted by active, but lightly loaded, computers. One way to tackle the latter problem is to allow users to share resources such that the high baseline energy cost of a computing device is amortized among many users. Thin clients do just that, but fail to give the user dedicated local resources, affecting the performance of some tasks. The idea of sharing resources has a long history, dating back to the times when compute cycles were scarce. Remote execution would allow a user to take advantage of idle resources on other users' machines, in addition to their own local one. Even though remote execution was not widely adopted in the 1980s, it is worth revisiting now, when network connections are fast and a plethora of computing devices offer different combinations of local and remote resources.

Chapter 3

Powernet: A Sensing Infrastructure

This chapter presents a data collection effort, Powernet, that addresses the lack of visibility into the energy use and resource utilization of computing systems. The resulting datasets reveal both the parts of the IT system that can be made more energy efficient and the ways in which we can achieve this goal.

Powernet is designed to collect energy and utilization data from computing devices such as desktops, laptops, and network switches. It addresses the lack of energy visibility in office environments by gathering data from individual pieces of equipment. Our custom power meters, presented in Section 3.1, sense instantaneous power draw as often as once per second at the plug level, not at the building scale. The number of meters that need to be installed and the amount of data they generate raise a number of interesting deployment challenges. Unlike other studies [45, 83] aimed at characterizing the energy use of computing, Powernet measures not only power but also device usage. This is the first study to collect both power and utilization data with the explicit goal of understanding waste. Section 3.2 discusses the three types of usage data Powernet collects: CPU, system processes, and network traffic. This information is critical to understanding how energy is spent and to identifying inefficiencies. We deploy Powernet in several different environments, described in Section 3.3.


Device Type       Count
Desktop              75
Monitor              70
Laptop               28
Network Switch       27
Printer              15
Server               36
Thin Client          12
Misc                  3
Total:              266

Table 3.1: Powernet covers a variety of devices whose power measurements enable a detailed characterization of the energy consumption. Some devices also have CPU utilization or network traffic monitors.

Sensing Type      Num. data points
Power Data        10 billion
CPU Usage         400 million
User Processes    2 billion
Network Traffic   10 million

Table 3.2: Summary of collected data, organized by type of measurement.

The largest of these efforts is located in the Computer Science department at Stanford University. Table 3.1 summarizes the equipment that our sensing infrastructure measures. As of 2012, Powernet has collected over 10 billion power data points and over 2 billion utilization measurements, as shown in Table 3.2. A number of sensors continue to report measurements at the time of writing this dissertation.

3.1 Power Monitoring

This subsection presents the portion of Powernet responsible for the collection of power data from individual computing devices.

Figure 3.1: Diagram of the Powernet system.

3.1.1 Overall System Design

Figure 3.1 shows the main components of the infrastructure – power meters that gather data, base stations that relay packets, and a central server that stores the resulting data traces. There are two types of meters, wired and wireless, that take power measurements and send them to the server. As the names suggest, the main difference between the two types of meters is the way data is transmitted to the server. The wired meters are plugged directly into the building LAN via Ethernet and can send data to the server using HTTP. The wireless meters, on the other hand, self-organize into a low-power wireless ad-hoc multihop network. The meters use the 802.15.4 physical layer standard [10], which is suitable for short-range communication in which the transmit power is lower than that of standard 802.11 WiFi. This ensures that Powernet's operation does not interfere with the usage of the regular Stanford wireless network. The Powernet nodes use the Collection Tree Protocol [51] (CTP) to form a routing tree along which data packets are forwarded. CTP is the de-facto protocol for routing data in sensor networks, and we discuss its performance later in this chapter. At the root of CTP's collection tree is a base station whose responsibility is to bridge the wireless and wired domains. A base station can be any device that has both a wireless antenna and another interface (e.g., Ethernet or USB). Powernet uses TelosB sensor nodes [22] as base stations. These small devices can receive packets over the air and forward them over a USB cable to a physically co-located computer. Once forwarded, data is placed in UDP packets and sent to the central server over the local area network.

Regardless of where data originate, they all arrive at the same destination: the central back-end server. We use a 1U rack server with two mirrored hard disk drives. The machine is located in one of the server rooms in the basement of the Gates building. Since most data comes from within the building, this minimizes Powernet's traffic impact on the campus network facilities. The server has three responsibilities: it processes arriving data, it performs actions on existing data, and it hosts web services to make data available to the public. A set of Python scripts processes raw incoming packets by parsing their payloads to extract meter metadata and power readings. One part of the codebase runs an HTTP server in order to communicate with the wired meters. Another part receives the UDP packets created by the base-station computers. After the data is parsed, it is validated to ensure power readings are within acceptable limits (e.g., always positive), and written to a MySQL database. In addition to recording the raw data, the server periodically processes existing data in order to keep running statistics, such as averages over several different intervals and device types. These summary data are stored in a separate database and used for faster, higher-level analysis. The web services that run on the server are responsible for providing and displaying data to interested users. Section 3.1.4 lists the various ways in which we make data accessible.
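As a concrete illustration of the parse-and-validate step, the sketch below unpacks a hypothetical meter payload and filters out implausible readings before database insertion. The packet layout (a 16-bit meter ID followed by ten 32-bit float watt values) and the 3000 W plausibility bound are assumptions for illustration, not Powernet's actual wire format.

```python
import struct

MAX_PLAUSIBLE_WATTS = 3000.0  # a single outlet load should never exceed this

def parse_payload(payload: bytes):
    """Unpack a meter ID (uint16) followed by ten watt readings (float32)."""
    meter_id, = struct.unpack_from("<H", payload, 0)
    samples = struct.unpack_from("<10f", payload, 2)
    return meter_id, list(samples)

def validate(samples):
    """Keep only readings that are positive and within plausible limits."""
    return [w for w in samples if 0.0 < w <= MAX_PLAUSIBLE_WATTS]
```

The surviving readings would then be inserted into the MySQL database along with the meter ID and a timestamp.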

3.1.2 Power Meters

This section discusses in detail the two types of power meters that Powernet uses. The off-the-shelf wired Watts Up .NET [88] meters were quick and easy to obtain, so the deployment could be bootstrapped. They met our primary goal of collecting fine-granularity data from individual plugs and were also used to calibrate and cross-validate the custom wireless meters we later built. Each Watts Up device can collect power data up to once per second; data can be either stored locally on the meter or sent over the Internet (via HTTP) to a central server. In addition to an Ethernet port, the Watts Up has a micro USB connection for interfacing with a computer. The USB connection can be used to download local data and to configure the meter's parameters (e.g., data collection interval and server IP address). For Powernet's purposes, we plugged each meter into the local area network and directed the data stream to the custom HTTP server process running on the backend machine. The meters were placed in networking closets, one of the basement server rooms, and several offices.

Three key issues made the wired meters unsuitable for large-scale deployment: lack of code accessibility and remote firmware upgrade, high overhead of installing meters within the building network, and user dissatisfaction with clutter and frequent maintenance. Chapter 6 delves deeper into the experiences using Watts Up meters. The problems we encountered highlighted the need for a different approach to power monitoring. To this end, we designed custom wireless plug-level sensors that improve ease of deployment, increase flexibility, and simplify maintenance.

Figure 3.2 shows the Powernet wireless meter. The case has a standard three-prong U.S. plug where users can connect their computing device. On the right side of the case, there are three small holes through which LED lights are visible. We use the lights to indicate the status of the meter when active and to debug when in the lab. Figure 3.2(b) shows the Powernet circuit board and its components. Each meter has sensing parts, a microprocessor, and communication pieces.
The sensing portion includes current and voltage sensors, plus an ADE7753 digital power meter chip that multiplies the sensor values to get an instantaneous power reading [2]. The meters can sample at 14 kHz, enabling harmonics and power-quality analysis, as well as controlled experiments where device utilization varies faster than once per second. The meter board houses an Epic sensor mote platform [6, 48], which provides a microprocessor and storage. We use TinyOS 2.1 [50], a sensor network operating system, to program and use the Epic core. TinyOS provides low-level interfaces, such as SPI communication, in order to communicate with the ADE7753 chip; it also allows us to write higher-level software with data processing and routing capabilities.

Figure 3.2: Powernet custom meters. (a) Case; (b) Insides, with the power measurement circuitry, LEDs, microcontroller, and antenna labeled.

As described earlier, the Collection Tree Protocol (CTP) lets data leave the meters. Nodes in the network exchange control packets in order to determine the best route up the tree. CTP has support for multiple collection roots, and therefore, multiple trees. Each root node's ID is a multiple of 500, allowing the power meters to identify them. CTP's ability to support multiple collection trees is a feature that Powernet takes advantage of – there are multiple base stations throughout the Computer Science deployment. Each base station is on a different floor, which reduces congestion at the top of each tree and ensures that signal strength is not impaired by floors and ceilings.

A different portion of the meter software enables over-the-air updates. Deluge [57], a dissemination protocol, is responsible for receiving code updates from a specialized node and ensuring that the updates propagate through the entire network. This capability makes our Powernet nodes easier to maintain.

The top-level application that runs on each power meter makes a power reading

once a second, buffers ten samples, and constructs and sends a data packet. In the background, the nodes also exchange control packets to ensure the software is up-to-date and the routing tree is healthy and efficient. Lastly, the communications portion includes a low-power processor (1 mA when active, 1 µA when asleep), a radio (2.4 GHz unlicensed spectrum, 802.15.4-based), and an integrated antenna.

The power meter board is extensively tested and calibrated. We use a Watts Up meter in line with our power meters to calibrate them at different points between 0- and 300-watt loads. We find that raw meter values exhibit linear behavior with an r-squared of 0.99 or above for all meters. Of course, our calibration is limited by the accuracy of the Watts Up meters. Complementary to our work, the authors of [45] have designed a calibration method that can achieve utility-grade accuracy.

The second, and current, board revision is a result of our experiences with the first Powernet boards, combined with lessons learned by colleagues at UC Berkeley collaborating on a similar device, the ACme [58]. A major modification was the addition of a mechanical relay, which opens the possibility of turning power on and off remotely. This change removed the old solid-state relay, enabling a sealed case due to lower heat dissipation. The second change was to add an expansion port with a range of serial interfaces, to support new sensors and added storage. The cost per meter is $120, as compared to $189 for the wired meters, both in quantities of 100.
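The calibration amounts to an ordinary least-squares fit of reference watts against raw meter output, judged by its r-squared. The sketch below shows that computation on fabricated raw/reference pairs; the actual calibration used Watts Up readings at several load points between 0 and 300 watts.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = a*x + b, plus the r-squared of the fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2

# Fabricated raw meter counts vs. reference watts at several load points:
raw = [0, 500, 1000, 2000, 3000]
ref = [0.3, 50.1, 99.8, 200.5, 299.9]
a, b, r2 = linear_fit(raw, ref)  # a, b convert raw counts to watts
```

A meter whose fit falls below the 0.99 r-squared threshold would be flagged for re-calibration.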

3.1.3 Deployment Experiences

The off-the-shelf Watts Up meters were a useful initial step in gathering power data and validating the custom hardware. Unfortunately, deploying and maintaining large numbers of these devices proved to be difficult and unreliable. Of the ninety or so Watts Up meters we originally deployed, about thirty are still in operation. Many meters would revert to internal data collection, instead of networked operation, requiring human intervention to reprogram the meter. Others would revert to the default sampling interval of five minutes – another problem that needed personal attention. A large number of users simply unplugged their meters or requested we remove them,

due to the frequent maintenance visits and cable clutter. Section 6.1 discusses in detail our experiences with deploying, configuring, and using these off-the-shelf devices. To their credit, the wired meters generally reported accurate data and are a good solution for dispersed deployments or short-term measurements. In contrast, the deployment of the first batch of 85 custom Powernet wireless meters took several afternoons, compared to two weeks for the wired meters. The benefits of the wireless deployment were noticed immediately, and some users even requested that we replace their wired meters with wireless ones. The IT staff was not burdened by meter network registrations, and the open nature of the software and hardware made modifications easy. The main meter limitation is transmission distance, but as the deployment expanded from a single floor to multiple ones, we added two more base stations. The ability of the network to self-organize was key during this step, keeping efforts to a minimum and preventing any disruptions in data collection.

Instrumenting the entire Gates building was not feasible due to the costs and practical challenges associated with monitoring over 2000 devices. Yet, we wanted detailed-enough data to understand where in the building energy is spent and wasted. Several considerations went into deciding what to instrument. We focused our efforts on one of the two building wings, considering it representative of both wings in terms of types of devices and usage cases. Further, we were only interested in computing equipment; therefore, we did not include miscellaneous electric loads such as staplers, fridges, coffee makers, or lights and HVAC. This is in contrast to the @Scale deployment [36, 45], which adopted a stratified sampling approach in order to avoid a random sample overwhelmed by small, insignificant loads.
The main goal of Powernet's sampling was to measure a wide variety of computing equipment to maximize the new information we gain. We did partially follow the stratified approach in allocating meters to different device categories, such as servers and networking equipment. To date, we have not observed any hardware failures in the 200+ meters that have been active at different stages of the deployment. The wireless network has proven a reliable way to collect the data. This is important since indoor office environments


Figure 3.3: Number of nodes from which packets were received at the base station during the deployment.

Label  Date    Duration  Description
A      Jan 19  9 hrs     Building power outage / MySQL recovery
B      Jan 21  10 hrs    Backend maintenance/backup
C      Jan 30  1 hr      Base station maintenance
D      Feb 4   9 hrs     Base station software failure
E      Feb 8   1 hr      Backend maintenance
F      Feb 28  0.5 hr    Backend maintenance
G      Mar 8   34 hrs    Backend disk failure
H      Mar 9   83 hrs    Backend disk replacement
I      Mar 14  9 hrs     Base station buffering
J      Mar 18  7 hrs     Base station buffering
K      Mar 22  4 hrs     Backend RAID1 rebuild

Table 3.3: System outages.

are difficult in terms of their varying wireless environment. The low-power wireless signal is subject to interference from WiFi, as well as propagation challenges due to placement – in the most common case, meters are located under desks, with little to no line of sight due to humans, office furniture, walls, and doors. We were able to seamlessly add additional base stations to improve connectivity without any manual re-configuration of the network.

Figure 3.3 shows a 90-day trace of the number of connected wireless meters reported for each 15-minute period. Over the 90 days, the network experienced 11

network-wide outages in data logging, labeled A–K. Table 3.3 describes each outage, including whole-building power loss, backend downtime for maintenance, disk failures, and gateway PC software failure, resulting in a Powernet uptime of 91%. While the high point of the plot remains stable (e.g., between points D and F), it does vary. For example, for a week around K (days 77–84), 8 nodes stopped reporting. This is not a network failure: the eight nodes were all in the same room, and the outage occurred when the room was repainted and all computing equipment was unplugged and moved. Other, smaller dips represent users unplugging meters and logging delay due to MySQL buffering.

Overall, the backend collected 85.9% of the expected data. Of the 14.1% of missing data, 8.2% is due to backend failures, such as whole-building power outages or server disk failures. This type of failure also affected data from the Watts Up meters and utilization sensors. Of the remaining 5.9%, we approximate that 2.8% is due to users taking meters offline by unplugging them; the remaining 3.1% of data losses are due to CTP routing problems.

3.1.4 Data Access

Throughout Powernet’s deployment, we developed several ways to access and share power data.

• Querying the database – Direct access to the databases gives complete and up-to-date access to the data. This method was used in all analyses described in Chapter 4.

• Status reports – A webpage was set up to list all actively reporting meters. This simplified maintenance and debugging.

• Individual device timeline – Volunteers whose devices were monitored by Powernet requested the ability to see individual power-draw timeline graphs. To this end, we provided a web interface that generated a real-time graph for any power meter.

• Powertron visualization – In order to present an overview of Powernet's measurements, we designed an interactive web site. The visualization allows a user to explore two of the three deployment floors by selecting sensors on a floor plan. Users can choose to see high-level information in the form of color-coded meters (based on current power or daily energy use), or they can choose to dig into a specific meter and see how it compares to other similar devices. In addition to providing a website, we made Powertron available to occupants in the Gates Computer Science building via a large touch-screen display in the lobby. Powertron can be explored at http://powernet.stanford.edu/powertron

• Downloadable data sets – Finally, we have made a large (1+ year) part of the Powernet datasets available to the community for download. Those can be found at http://sing.stanford.edu/maria/powernet

3.2 Utilization Monitoring

The second piece of the Powernet sensing infrastructure is collecting utilization data. The value of such data is that it puts the measured energy use into perspective – when is energy wasted and when is it not. Powernet monitors utilization on a subset of computers and network switches.

3.2.1 PCs

User workloads tax different subsystems of a computer. The CPU is one of the most power-hungry components, and its load has a significant effect on a computer's overall energy use.

CPU

To collect CPU information, we provide volunteers with a program they can install on their Windows or Linux machine. A short registration process ensures that the PC's IP address, used for identification, is associated with the correct power meter

ID. This step is necessary so that power and CPU data can be correlated during later data analysis. On Windows, the program runs as a service; on Linux, it is set up as a script executed periodically via cron. In either case, the code checks the current CPU load once per second and, after batching ten readings, sends them via UDP to the Powernet server.
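The client's batch-and-send logic can be sketched as follows. The collector address is a placeholder, and `batch_and_encode`/`send_all` are illustrative names; the actual Powernet client is a Windows service or a cron-driven script.

```python
import json
import socket

BATCH = 10  # per-second CPU readings are sent ten at a time

def batch_and_encode(samples):
    """Group per-second CPU readings into batches of ten and JSON-encode
    each full batch as one UDP payload (leftover readings wait for more)."""
    payloads = []
    for i in range(0, len(samples) - len(samples) % BATCH, BATCH):
        payloads.append(json.dumps(samples[i:i + BATCH]).encode())
    return payloads

def send_all(payloads, server=("collector.example.edu", 9100)):
    """Ship each ten-sample batch as a single datagram to the (hypothetical)
    collection server."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for p in payloads:
        sock.sendto(p, server)
```

On the server side, the sender's IP address on the arriving datagram is what ties the readings back to the registered power meter ID.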

Active Processes

For Windows machines, we collect additional workload information in the form of actively running processes. This data cannot be directly correlated with power draw, but it is useful in characterizing in more detail the type of work office computers are used for. The active processes are also logged once per second, and each record includes the process name and its current share of the CPU utilization.

3.2.2 Network Traffic

Even though they are not computing devices, network switches are a critical part of the IT infrastructure of any enterprise. Therefore, we chose to measure their utilization. The Powernet server runs an SNMP script that polls seven network switches once a minute and records the average incoming and outgoing traffic in Mbps. This measurement methodology has the drawback of missing transient peaks in the network due to short bursts of network activity. Collecting data at shorter intervals would impose too much overhead on these switches. While this might prevent certain types of data analysis, for the purposes of Powernet, the data is enough to describe the average demand on the network. We can also use the recorded number to calculate a worst-case scenario in which all bits were sent not over a minute but over just one second, or even less.
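The worst-case reasoning can be made concrete: if a switch reports an average of R Mbps over a 60-second polling interval, the same bits squeezed into a single second would appear as a burst of 60×R Mbps. A minimal sketch:

```python
def worst_case_burst_mbps(avg_mbps, poll_interval_s=60, burst_window_s=1):
    """Upper bound on the instantaneous rate if all traffic observed over
    the polling interval had arrived within one short burst window."""
    total_megabits = avg_mbps * poll_interval_s
    return total_megabits / burst_window_s
```

For example, a one-minute average of 5 Mbps bounds any burst within that minute at 300 Mbps.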

2nd floor 3rd floor 4th floor

Figure 3.4: Map of wireless Powernet meters over three floors in the Gates building. Each dot is a sensing point and the two black triangles are base stations.

3.3 Deployments

The main Powernet deployment took place in the Gates Computer Science building at Stanford University. There were two additional, smaller, deployments that allowed us to measure types of devices not available in the Gates building. This section describes in turn each of the three environments.

3.3.1 Gates Hall

The majority of data analyzed in Chapter 4 comes from the Gates Hall deployment. Figure 3.4 shows the floor plan of one wing of the building over three floors. Each dot is a power meter, and the black triangles (one per floor) are the base stations. The offices shown on the floor plans are a mixture of staff, student, and professor spaces. In addition, the type of work varies significantly between the floors – some areas host graphics research (high-power graphics cards), others host theory groups (more laptops, fewer desktops), and yet others have a high concentration of administrative staff (more representative office workloads). There are additional meters, part of Powernet, not shown in Figure 3.4. A large subset of those are in one of the server rooms in the basement of the building; the rest are spread out in networking closets throughout Gates.

3.3.2 Thin clients

One type of computing environment that is not found in the Gates building is thin clients. These systems rely on centralized computing, with users sharing a small number of servers that execute user virtual machines. The thin clients themselves have minimal resources, enough to receive graphics calls from the servers and display the user's work environment. The motivation behind measuring thin clients as part of Powernet is that a common reason for adopting them is that they save energy; however, there is little data for a measurement-based analysis to show what savings can be expected.

We collected data from two different thin client setups. The first is an administrative department at Stanford University, where we measured 12 thin clients, 6 desktops, and 2 servers. This power monitoring deployment used the same tools and Powernet infrastructure as the main Gates deployment. The second set of measurements was collected on a small scale at the Computer Science department at Friedrich-Alexander-Universität Erlangen-Nürnberg. We selected four representative thin clients and one of the two servers that run the virtual machines. These devices were connected to Plugwise [16] power meters (which can operate on Europe's higher voltage). The server also reported CPU utilization once a second. As with all other monitoring devices, the data was sent to the central Powernet server at Stanford.

3.4 Summary

The Powernet sensing infrastructure collects continuous, fine-grained power and utilization data from tens to hundreds of computing devices. It uses plug-level power meters that can monitor individual pieces of equipment and has been deployed in three separate locations, with the largest one in the Stanford Computer Science department.

Deployment difficulties early on highlighted the importance of easy-to-install and easy-to-program meters that do not interfere with existing IT operations. As a result, we designed custom power meters that are cheaper, more reliable, and easier to manage

than existing commercial ones. This also made it trivial to extend our measurement efforts to new environments, such as the thin client ones. Despite making it easier to collect power data, we discovered that covering all devices in a single building is a difficult and expensive feat. Therefore, Powernet monitored about 10% of all devices. This led to a conscious effort to maximize the diversity of the measured equipment, while remaining representative of a larger enterprise. Chapter 6 discusses methodological implications of this decision.

The combination of Powernet deployments gave us the ability to collect an unprecedented amount of power data, filling a gap in prior research efforts. Powernet is also the first to place energy in the context of utilization, allowing for deeper analysis of the energy efficiency of computing systems. The next chapter presents this analysis.

Chapter 4

Data Analysis

This chapter uses the datasets collected via Powernet to characterize and analyze the energy use of enterprise computing devices. Much of the analysis focuses on Stanford's Computer Science department. We combine power data with IT infrastructure metadata to create a picture of where energy is spent. Next, we use utilization data to reveal the inefficiencies of computing systems. The insights in this study, together with observations from the thin client measurements, inspire the proposal and design of a more efficient enterprise computing architecture, discussed in Chapter 5.

4.1 Device Energy Consumption

We divide computing equipment into four categories – LCD screens, PCs, networking equipment, and servers – and study each device class individually. It is worth noting that despite the large amount of data collected via Powernet, the monitoring system only covers about 10% of the computing equipment in the building. Therefore, in addition to examining the empirical data, we also develop a set of methodologies that allow us to extrapolate measurements and make conclusions about the entire department. To do that, we collect various metadata in the form of network activity logs, device registration database entries, and a survey of building occupants. This extrapolation allows us to estimate what percentage of the building's electricity is spent on the computing infrastructure.


Figure 4.1: Distribution of power draw values for desktops and laptops monitored with Powernet.

4.1.1 Personal Computers

The energy characteristics of personal computers are diverse due to variations in hardware. In an idle state, the baseline energy use of a computer is determined by the amount of power needed to spin hard disks, refresh volatile memory (RAM), keep processors turned on, and keep fans running. The large number of options for any of these components creates different energy profiles. Usage patterns and load introduce even more power dynamics.

Figure 4.1 shows the distribution of average power for 86 machines, including 17 laptops. The 'desktop' curve shows a large spread, with an order-of-magnitude difference between the lowest-power PC, a 30-watt Mac Mini, and the highest one, a 300-watt custom-built desktop with a capable graphics card. This diversity highlights the need for widespread desktop power monitoring in order to ensure results are representative. It also means that studies that limit themselves to measuring a handful of desktops and then proceed to make generalizations will likely introduce large errors in the data analysis. Chapter 6 investigates this problem further.

Another observation apparent in Figure 4.1 is the large difference between the power requirements of laptops and desktops. For the purposes of this analysis, laptop

Figure 4.2: Desktop energy varies both over time and between different pieces of equipment, as shown by these three PC data traces.

data does not include the screen, for a fairer comparison with desktops. The median difference between laptops and desktops is 83 watts. Laptops have traditionally been designed with energy efficiency in mind, since users expect to use their laptops on battery power; thus, the efficiency of a laptop was born out of necessity. The significantly lower energy requirements of laptops do not come for free. The hardware is not as capable as that of a desktop, sacrificing fast hard drives and better graphics cards, and using lower-performance processors. Therefore, the choice of hardware can have a large impact on both efficiency, and user productivity and experience. This observation is one of the main motivations for the Anyware compute system described in Chapter 5.

The heterogeneous nature of personal computers does not end with the differences in hardware. Powernet data show that there can be a large variation over time for the same machine. This suggests that measurements over an extended time period will carry a lot more information compared to a single data point. Figure 4.2 shows the power draw of three desktops over 24 hours; each one of them has a different power behavior over time, indicating differences in how the PCs are used. The PC labeled 'b' does not exhibit much temporal variation, and only a few data points

might be sufficient to describe its energy usage. However, PCs 'a' and 'c' have more complicated power traces. If one were to measure PC 'c' at a single point in time, the resulting number could be 60 watts or 3 watts. A longer record of its power draw is much more valuable in understanding when and how much energy is used.

The next step after studying the power characteristics of individual devices is to use the empirical data to estimate the energy use of all desktops. Such information can provide more visibility into the monthly electricity bill. First, we must account for all computers, not just the ones that are part of Powernet. Since there is no up-to-date inventory of computing equipment, we use network presence as a way to gauge the number of PCs. According to the department's database of registered devices, there are about 1250 machines in the building that are actively observed on the network. Of those, about 500 are servers in the basement, giving us a count of roughly 750 PCs.

The simplest way to draw conclusions about these machines would be to take the average reading of the 100 or so Powernet PCs and use that as the estimated power draw of the rest of the computers. The diversity of equipment, however, means that such a simplistic approach would be highly inaccurate. In order to capture this diversity in power numbers, we bin PCs into three classes – laptops, low-end desktops, and high-end desktops. Low-end desktops are those with average power of 80 watts or less and include models such as the Mac Mini, Shuttle PC, and OptiPlex. Full-size desktops like the Dell Precision are considered high-end machines and use more than 80 watts. For this part of the analysis, we assume that laptops are used with their screens on (i.e., no external monitors). The average power draw of each device is calculated using one week of data in order to capture both weekdays and weekends. Binning the PCs monitored by Powernet is easy, since the power data is already available.
In order to bin all the unmonitored machines, we need additional metadata. We take the 742 MAC addresses from the network database and cross-reference them with the university's whois service. The whois metadata includes node descriptions, such as PC model and OS, provided upon network registration. Of the 742 nodes, 456 have descriptions that allow us to classify them as laptops, low-end, or high-end desktops. This is reflected in the 'Observed' column of Table 4.1.

              Observed  Estimated  Total
Laptops       47        29         76
Low-end PCs   43        27         70
High-end PCs  366       230        596
Total         456       286        742

Table 4.1: Personal computers are binned into three categories; university databases and active network node counts allow us to extrapolate to the whole building.

After accounting for the machines with available descriptions, 286 PCs remained to be classified. These are the ones labeled 'Estimated' in Table 4.1. The breakdown of these 286 machines assumes that the observed distribution is representative of the building. This is a straightforward way of filling the gaps in inventory information, without requiring a manual inspection of the entire building.

Based on Powernet measurements, we calculate that the median power draw is 26 watts for laptops (including the screen), 63 watts for low-end machines, and 121 watts for high-end machines. This means that the three categories of machines draw 2 kW, 4.4 kW, and 72.1 kW respectively, for a total of 78.5 kW, or about 58,500 kilowatt-hours a month. According to utility bills provided by the building manager, the electricity consumption of Gates Hall is about 355,000 kWh per month. Therefore, the 742 personal computers in the building account for approximately 17% of the total electricity use.
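The building-wide estimate can be reproduced from the Table 4.1 totals and the per-category median powers, under the assumption that machines draw their median power around the clock. The sketch below is a check of the arithmetic, not part of the original analysis:

```python
# Category totals from Table 4.1 and measured median power per category.
counts = {"laptop": 76, "low-end": 70, "high-end": 596}
median_watts = {"laptop": 26, "low-end": 63, "high-end": 121}

total_kw = sum(counts[c] * median_watts[c] for c in counts) / 1000.0
monthly_kwh = total_kw * 24 * 31      # assumes machines stay on all month
share = monthly_kwh / 355_000         # building bill: ~355,000 kWh/month
```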

4.1.2 Computer Displays

While not often discussed in the green computing literature, displays are just as prevalent as the PCs they are attached to. The trend of using larger LCDs also means that their energy cost is increasing. For example, a 30” LCD can draw as much as or more power than the average desktop. To better understand the contribution of computer displays to the overall cost of computing systems, we first study a single LCD. Then we present the full Powernet dataset and extrapolate to the whole building.

(a) Power draw of a 30” Dell display under different settings.

(b) Energy consumption can be reduced by 10%–28% without affecting usability. In this case, a user changed to a dark color scheme.

Figure 4.3: Brightness level and color scheme have a significant effect on monitor power consumption. A one-time change in LCD screen configuration can have a large impact on the power draw.

Figure 4.3(a) shows an hour-long data trace during which we adjusted one 30” monitor's brightness and the associated desktop color scheme. Depending on the monitor brightness settings and the colors in the image displayed, the power varies by up to 35 W (25%). Lowering the brightness by two settings (pressing the '–' button on the screen twice) reduced the average power draw from 145 to 117 watts, a 19% reduction in consumption. Additionally, the energy use is visibly affected by the colors being displayed. This observation is tied to the fundamental way in which LCD screens work. In order to display brighter colors, a larger number of liquid crystals per pixel need to be aligned. This alignment permits more light to shine through and thus requires more energy. The result is that the 30-inch monitor consumed a maximum amount of power, 145 watts, when the majority of the screen displayed white visual elements. Switching to a dark background and color scheme, or viewing darker web pages, reduces the draw to 127 watts. Displaying dark colors with the lower brightness setting reduces power draw to 110 W.

These findings prompted users participating in the Powernet deployment to lower their monitor brightness, as well as change their desktop backgrounds. Figure 4.3(b) shows typical data from one such user who only modified desktop color schemes. (The monitor brightness was already reduced.) The monitor's power profile is shown over a working weekday, once in April and then again in May. We observe a reduction in energy usage of over 10%. For a device that is on about 40 hours a week, 400 Wh are conserved.

In addition to the controlled measurements described above, Powernet collected power data from over 70 LCDs of various sizes. These individual sensing points allow us to quantify the average power draw of different-size LCDs when they are in use. The data is shown in the last column of Table 4.2.
Extrapolating this data to all computer screens in the building has the same challenge as the PC device category – there is no inventory of the existing equipment. To obtain an estimate of the number of displays and their size distribution, we conducted an online survey asking occupants for the number, size, and manufacturer of the computer screens they use. Table 4.2 presents data from the 169 responses reporting 225 monitors. These responses account for about 30% of the building’s occupants.

  Size         Count   Power
  < 20”        42      30 W
  20” to 22”   40      45 W
  23” to 25”   84      63 W
  26” to 27”   15      80 W
  29” to 32”   44      120 W

Table 4.2: A survey shows that the majority of building occupants use mid-sized LCD displays. The number of large (30”) monitors is increasing as equipment is upgraded.

The cumulative power draw of the LCDs reported by users is 15 kW. We use the survey responses to make a linear extrapolation for 750 LCD screens, based on their size and popularity. Scaling to the whole building yields 52 kW, or 12% of the building’s power demand during daytime.

At first glance, 12% is close to the footprint of desktops. However, active duty cycling of screens reduces their energy cost significantly. Many operating systems are by default configured to turn off the screen when the user is idle. LCDs also shut themselves off if the computing device attached to them is powered off. In practice, Powernet data over time shows that displays are powered on around 50 hours a week. Therefore, over one month, LCD screens consume about 14,000 kWh, or 4.2% of the monthly electricity budget. This brings the total electricity use by computers and screens to 21%, or about a fifth of the bill.

It is worth noting that while some displays draw as much power as a desktop, the built-in duty cycling means that the overall energy use is lower than that of always-on PCs. Our analysis shows that LCD screens are an energy-efficient component of the computing infrastructure. They consume energy when being used, but remain turned off when the user is not active. Any further optimizations in this device class would have to come from improvements in hardware.
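As a rough illustration of the extrapolation, the survey counts and per-size power draws from Table 4.2 can be scaled linearly to the estimated 750 screens in the building. This is a simplified sketch; the actual extrapolation also weights by size and popularity, which is why it arrives at 52 kW rather than the roughly 49 kW this naive scaling gives.

```python
# Naive linear extrapolation of LCD power from the Table 4.2 survey data.
survey = {               # size class -> (count, average power in watts)
    "< 20in":    (42, 30),
    "20-22in":   (40, 45),
    "23-25in":   (84, 63),
    "26-27in":   (15, 80),
    "29-32in":   (44, 120),
}
surveyed = sum(count for count, _ in survey.values())          # 225 monitors
surveyed_kw = sum(c * p for c, p in survey.values()) / 1000.0  # ~14.8 kW
building_kw = surveyed_kw * 750 / surveyed                     # ~49 kW
print(surveyed, round(surveyed_kw, 1), round(building_kw, 1))
```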

4.1.3 Server Machines

One aspect that makes the Computer Science department building different from more typical enterprise environments is the existence of two server rooms in the basement.

The two rooms have on the order of 500 machines that are used by research groups and outside departments. While such a large amount of dedicated equipment might not be representative of an office building, it is not uncommon for enterprises to have internal cloud services such as email and shared software. Therefore, Powernet monitors 32 of the 500 servers in Gates Hall in order to observe their energy behavior.

Similarly to desktops, servers exhibit varied power profiles. For example, a standard 1U rack mount can have a power draw anywhere between 95 and 275 watts. Unlike desktops though, the server population is much more homogeneous; it is common to see 40 identical machines in a single rack. For this reason, we spread out our measurements to get maximum coverage, with meters measuring identical devices for verification purposes. We find the average power over the Powernet dataset to be 233 watts. With about 500 servers, the aggregate draw is 117 kW – 26% of the total building energy consumption per month.

Unfortunately, we were unable to collect much information from the server rooms since users were hesitant to take their equipment down and install power meters. During the process, we also discovered that there was little accountability regarding the addition of equipment to the two rooms. There was no record of the number of servers, how many of them were powered on, and more importantly, how many were actively used. Since these observations in 2010, things have started changing. The department recently informed faculty and staff that the server rooms’ circuit breakers are operating near capacity, resulting in power outages when a burst in energy usage trips the breakers. While energy cost alone has not traditionally been a motivator for improving efficiency, power provisioning limits are. Therefore, our department is actively seeking to employ both technological and policy solutions.

4.1.4 Networking Equipment

In most network switches, linecards with Ethernet ports are always powered on and ready to process packets. Whether or not there is network traffic does not change the power draw of a switch. This means that networking equipment consumes the same amount of energy at all times, regardless of usage, barring small deviations due to

Figure 4.4: Power remains constant across a wide variety of network loads. The number of active Ethernet ports, and more interestingly, the maker of the equipment, have a larger effect on energy.

CPU load and fan activity [62]. Powernet measurements confirm this and Figure 4.4 shows three representative data traces.

For similar switch models, the number of linecards correlates strongly with power draw. The bottom two switches in Figure 4.4 illustrate this point – the NEC switch with 47 active ports (on 4 linecards) draws about twice as much power as the one with 23 ports (2 linecards).

Looking at a single piece of equipment, one can expect little variation over time, suggesting that power monitoring can be less frequent or even one-off. The low variability also means that power data can be a strong signal of anomalous behavior. For example, if a 200-watt switch suddenly starts reporting much lower numbers, a linecard might have failed. If the switch reports a much higher than average number, there might be a rogue process taxing the CPU or the fans might be misbehaving.

While this observation can be helpful in estimating the power draw of similar-model switches, its predictive power is extremely limited. When comparing switches that differ significantly in their make or year of production, fresh measurements are needed. For example, the 23-port NEC switch in Figure 4.4 has the same number of active ports as the HP one, yet it draws only a third of the power. Since variability can be large when comparing different network switches, datasets like Powernet can be helpful in highlighting greener choices when new equipment is being considered for purchase.

  Type                  Count   Power Draw (watts)
  HP 5406zl (6-slot)    20      325
  HP 5412zl (12-slot)   8       500
  HP 2724               2       100
  Cisco Cat 6509        2       400
  Cisco Cat 4000        2       600
  Cisco Cat 3750G       2       160
  Linksys               2       50
  NEC (various)         5       100
  Cisco (various)       5       100
  Quanta (4-slot)       5       50
  Misc (estimated)      100     10
  Total major switches: 53

Table 4.3: Summary of switch types, quantities, and estimated individual power consumption. This inventory includes all major network switches and excludes small per-room switches and hubs.

Collecting individual switch data allows us to study the network infrastructure as a whole. Unlike the PC or server device classes, getting an inventory of network equipment is much easier. Most enterprise networks are planned before deployment, resulting in more homogeneous choices. The total number of devices is also significantly lower than that of computers. In the Computer Science building, the network backbone is provided by 2 core switches located in the basement and 26 edge switches spread across five floors. There are also a number of medium- and small-sized switches that have been deployed on an as-needed basis. We account for all major switches and estimate the number of smaller ones with the help of IT staff. Table 4.3 summarizes the types of networking equipment together with their power draw. The power draw of wireless access points is folded into the switch data since they are Ethernet-powered. We use Powernet’s measurements and inventory from Table 4.3 to calculate the daily power draw of all

  Device Type    Measured   Total     Extrapolated        Total    Uptime    Monthly   Share
                 Devices    Devices   via                 Power    (h/day)   kWh       (%)
  Switches       27         62        network records     15 kW    24        11,000    3.5
  PCs            83         742       MAC registrations   80 kW    24        61,000    17
  LCD displays   70         750       occupant survey     48 kW    8         14,400    4
  Servers        32         500       manual inspection   117 kW   ≈24       86,000    26

Table 4.4: We cross-correlate Powernet measurements with IT databases to extrapolate the energy consumption of all computing systems in the building.

Figure 4.5: Aggregate power draw for the entire Powernet building shows diurnal and weekday/weekend patterns. Computing systems account for 51% of the total 445 kW. The given week of data is representative of the building, except for Monday, which was a university holiday (Feb 15).

networking equipment: 15.4 kilowatts. This translates to 11,500 kWh per month, or 3.5% of the building’s total consumption. We defer a discussion of potential efficiency improvements until the utilization of the network has also been studied.
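The conversion from constant power draw to a monthly energy share follows directly. This is an illustrative sketch; the 340,000 kWh monthly building total is inferred from Section 4.1.5, where 170,000 kWh is reported as 50% of the bill.

```python
# Converting the network's constant draw into a monthly energy share.
network_kw = 15.4
hours_per_month = 24 * 30                    # always-on equipment
network_kwh = network_kw * hours_per_month   # ~11,100 kWh
share_pct = network_kwh / 340_000 * 100      # ~3.3% of the monthly bill
print(round(network_kwh), round(share_pct, 1))
```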

4.1.5 Whole-building Summary

So far, this section presented empirical power data and a new methodology for characterizing energy consumption, using both plug-level empirical measurements and device metadata, to create a detailed picture of IT energy. We find that 50% of the building’s energy goes to computing equipment: 26% goes to servers, 17% to PCs, 4% to displays, and 3.5% to networking.

Table 4.4 and Figure 4.5 summarize our extrapolation methodology and the resulting breakdown. Ground truth is provided by aggregate measurements from outside the building, logged every 15 minutes by campus services. The top curve in Figure 4.5 shows one week of this data, with Monday being a holiday.

Our data confirms prior observations and intuition that PCs and servers are major contributors to the energy bill of enterprise buildings [45]. The data also highlights that smaller parts of IT, such as networks and LCD monitors, account for almost 8% of the overall building’s electricity use. We find that displays are responsible for 50% of the building’s diurnal power variation and are the only computing component that exhibits such patterns. This confirms that there is room for improvement not only in the IT infrastructure but also in the rest of the building.

In summary, we find that computing systems in the Gates building draw between 210 and 259 kilowatts, depending on the time of day, or 47% to 58% of the building’s 445-kilowatt load. This aggregate power draw translates to 170,000 kilowatt-hours, or 50% of the building’s monthly electricity usage.
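The whole-building totals can be cross-checked by summing the per-category figures in Table 4.4 (illustrative only; small rounding differences explain the gap with the 170,000 kWh quoted above).

```python
# Summing the per-category monthly energy and shares from Table 4.4.
monthly_kwh = {"Switches": 11_000, "PCs": 61_000,
               "LCD displays": 14_400, "Servers": 86_000}
share_pct = {"Switches": 3.5, "PCs": 17, "LCD displays": 4, "Servers": 26}

total_kwh = sum(monthly_kwh.values())    # ~172,000 kWh per month
total_share = sum(share_pct.values())    # ~50% of the building's bill
print(total_kwh, total_share)
```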

4.2 Utilization

While a breakdown of the electric bill is a useful first step toward finding opportunities for savings, it is difficult to identify specific failures in energy efficiency from it alone. Energy data is only meaningful if paired with a characterization of systems’ utilization. This section examines the workloads of computers and network switches to determine what part of the energy is spent well and how much of it is wasted.

4.2.1 Computers

Related work [27, 76] suggests that desktop machines are rarely turned off when not in use, and Powernet power measurements over a period of more than a year support this claim. Complementary work [34] has begun looking into how power data can be combined with occupancy sensors to decide when rooms are empty and desktops can be put to sleep.

                                         Percentile CPU
  Machine Type                   5th     50th    95th
  Student PCs
    Dell Precision T3400         0%      1%      7%
    Dell Inspiron 530            1%      1%      8%
    Dell Precision T3400         0%      1%      13%
    HP Pavilion Elite m9250f     0%      0%      25%
    Dell Precision T3400         0%      4%      29%
    High-end custom-built        0%      1%      57%
    Dell Optiplex 745            1%      9%      58%
  Staff PCs
    Dell Dimension 9200          0%      0.75%   3%
    Dell Precision 690           0%      0.7%    4%
    Dell OptiPlex 760            0%      0%      5.45%
    Dell OptiPlex SX280          0%      0.75%   5.5%
    Dell Dimension 9200          0%      1.5%    8%
    Dell OptiPlex 745            0%      1.5%    9%
    Dell OptiPlex SX280          0%      0%      10%
    Dell OptiPlex 760            0%      1.55%   17%

Table 4.5: CPU utilization of both student and administrative staff machines reveals that processing resources are only lightly taxed. Data was collected once a second for 11 months (students) and 1 month (staff).
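Percentile summaries like those in Table 4.5 can be computed directly from once-per-second utilization samples. The sketch below uses synthetic data and a simple nearest-rank percentile; it is not Powernet's actual collection code.

```python
import random

# Synthetic mostly-idle CPU trace: long idle stretches with occasional bursts.
random.seed(0)
samples = [random.choice([0, 0, 0, 0, 1, 1, 2, 5, 30]) for _ in range(10_000)]

def percentile(data, p):
    """Nearest-rank percentile of a list of numbers (0 <= p <= 100)."""
    s = sorted(data)
    k = int(round(p / 100 * (len(s) - 1)))
    return s[k]

for p in (5, 50, 95):
    print(f"{p}th percentile CPU: {percentile(samples, p)}%")
```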

Figure 4.6: CDF of traffic for seven switches over 6 months shows that switches are operating well under capacity.

While most green computing research so far has focused on solving the problem of idle PCs, our utilization data sheds light on an equally wasteful problem – power-hungry machines that, even when active, barely tax their resources. To study this low utilization further, Powernet collects data from both student and staff PCs. Since the computing needs of the two groups are likely to differ, we analyze the data separately.

Table 4.5 shows the CPU utilization of a number of desktops. Computer science students use more of their available processing resources, but even so, in many cases CPU usage is under 30% for 95% of the time. The demand on administrative staff machines is even lower. Since most of the measured computers were left powered on at all times, the 50th-percentile data is not surprising: machines are often idling. What is surprising is that even when PCs are in active use, the level of usage is low – 5% to 10% on a typical staff machine. If desktops were energy-proportional, that would not be an issue, but the current high baseline power draw means that the energy cost for a PC that is running at 5-6% of its capabilities is disproportionately high. In one extreme case, measurements showed that the most power-hungry staff desktop (a quad-core Dell Dimension 9200), drawing over 150 watts, has the lowest CPU utilization – around 3.1% for 95% of the time.

Another way of investigating whether utilization matches the type of equipment we buy is to look at typical tasks users perform. We focus on staff computing because it is more representative of an enterprise computing environment. Table 4.6 shows the most common workloads on administrative machines, excluding Windows services and virus checks. The percentage of active time is calculated as the cumulative time over one month that the process was running; the range of time captures the minimum and maximum numbers over four computers. The workload data raises the question of mismatched user needs and technology.
There is no reason why an entry-level laptop or a Mac Mini cannot perform the same basic tasks (document editing, web browsing, PDF viewing) as a quad-core, 150-watt desktop. Characterizing the utilization of computers has revealed that there is a lot more waste than idle machines alone. The baseline power draw of desktops, combined with low use of system resources, means that there are energy-saving opportunities even when PCs are actively used. Powernet’s PC utilization data suggests that future

  Process                % of time active
  Acrobat Professional   1% to 4%
  Firefox                0.5% to 4%
  Internet Explorer      0.3% to 2%
  MS Excel               1% to 2%
  Thunderbird            0.4% to 1.2%
  MS Word                0.2% to 0.8%
  Outlook                0.4%
  Acrobat Reader         0.3%
  Explorer               0.01% to 0.3%

Table 4.6: The most popular workloads on administrative computing systems are general office and web applications. These workloads imply that a laptop could be used instead of a desktop.

Figure 4.7: Typical traffic patterns for one edge switch in the building. Network utilization remains low. Power consumption for this switch remains constant, at approximately 500 watts.

green computing research should tackle all PCs, not just idle ones.

4.2.2 Network Equipment

Section 4.1.4 found that the networking infrastructure consumes 3.5% of the building’s monthly electricity. This translates to a cost of $15,000 a year just for networking. We also noted that switches consume a constant amount of power due to their hardware design. If the network is operating near capacity, then the 3.5% is energy spent well. Otherwise, if we find that the network operates at, say, 10% capacity even at peak, it means the bulk of the energy is wasted.

Figure 4.8: CDF of traffic for seven switches over 6 months shows that switches are operating well under capacity.

  Label   Switch Type   Active Ports     Data trace
                        (gigabit each)   (# days)
  a       HP 5412zl     120              150
  b       HP 5406zl     96               40
  c       HP 5412zl     120              40
  d       HP 5406zl     72               150
  e       NEC IP8800    24               420
  f       HP 5412zl     24               420
  g       NEC IP8800    48               420

Table 4.7: Summary of the groups of switches measured in the Gates building, with the number of active ports and the length of each data trace.

This prompts the question of how much traffic is flowing through the 60 or so switches in the building, and whether smaller or fewer switches could more efficiently meet bandwidth demands. We begin by examining the traffic coming into one of the four switches on the second floor of the Gates building. This is an HP ProCurve switch with 96 active 1-gigabit ports, consuming 500 watts and serving 50+ people. Figure 4.7 shows the switch bandwidth over one week, measured once per second. The demand never exceeded 200 Mbps – an amount that could have been handled by a less power-hungry edge switch and additional small switches (2 to 5 watts each) in individual offices to meet port demand.

To verify that this is not aberrant behavior, Figure 4.8 shows the cumulative distribution of traffic for 7 of the main building switches. Note that the x-axis has a log scale. The number of ports for different switches varies from 24 to 120, and the CDF data was collected over 40- to 420-day periods. Table 4.7 accompanies the figure with a list of switch types we measure and the length of each data trace.

Similarly to PCs, switches are highly underutilized. For the equipment we measure, total network demand is lower than 1000 Mbps for 100% of the time. The equipment in question is provisioned to handle 20 to 100 times the traffic. Of course, network over-provisioning is not a new concept or observation; it provides benefits, including higher throughput, lower loss, and lower jitter. But when the average utilization is under one hundredth of one percent, several questions are worth considering. Is the amount of over-provisioning unnecessarily large? How can we take better advantage of the large amount of bandwidth that today’s networks are ready to support? Going forward, there are two ways to address the issue: consolidate equipment and make better purchasing decisions in the future, or make use of the extra available bandwidth.
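The over-provisioning factors quoted above follow from dividing total port capacity by the traffic ceiling. A sketch using the figures from the text, where 1000 Mbps is the bound traffic stayed under:

```python
# Over-provisioning factor: total gigabit port capacity vs. observed peak traffic.
def overprovision_factor(active_gigabit_ports, peak_mbps=1000):
    capacity_mbps = active_gigabit_ports * 1000
    return capacity_mbps / peak_mbps

print(overprovision_factor(24))    # smallest measured switch -> 24.0x
print(overprovision_factor(120))   # largest measured switch  -> 120.0x
```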
The story that network traffic tells is no different from that of PC utilization – systems are heavily over-provisioned, often with no regard for expected workloads, leading to wasted energy. Powernet’s contribution is in bringing such utilization data to light and placing it in the context of green computing.

4.3 Thin Client Setups

The majority of Powernet data was collected from the main deployment at the Gates building at Stanford. This dataset allowed us to characterize the energy use and utilization of a typical enterprise – one with a variety of equipment, most of it used by individuals – and encompassed one way of meeting the computing needs of an enterprise. Such a setup is common in many office buildings.

An alternative approach is the use of thin client systems. Thin client setups rely on lightweight terminals on the users’ desks and powerful backend servers. The clients have no local compute resources but are instead responsible for displaying graphics and handling user input. The real work happens on servers, where each user has a dedicated virtual machine. From an energy perspective, the thin client approach appears promising since workloads are consolidated on the servers, making the most out of their energy use. In addition, since the clients do very little processing, their power draw should be much lower than that of desktops and laptops. Unfortunately, to date there are no detailed studies that show the per-user energy cost of thin client setups. Such data can shed light on the comparative advantages of thin clients versus standard desktop setups.

To fill the gap in data, we expanded Powernet outside of the Gates building. We deployed smaller versions of the power monitoring setup at an administrative department at Stanford University and at the Computer Science department at Erlangen University in Germany. Next, we describe the computing infrastructure at these two departments and present data from their thin client deployments.

4.3.1 United States VMWare Deployment

The first data set comes from an administrative department on the Stanford University campus that handles paperwork associated with research grants. The transition to thin clients was prompted by a desire to have a greener infrastructure. Currently, the user workloads are supported by a mixture of Dell thin clients/servers, desktops, and a few laptops. Of those, we collected power measurements for 12 clients, 6 desktops, 2 laptops, and 2 servers.

The desktops are a selection of Dell machines with average power draws of 55 to 100 watts. These measurements reinforce those in the Gates building dataset, showing that there is significant variation in equipment and energy cost even when the workload across machines is the same. Two members of the department have opted for laptops, presumably for mobility, as their workload involves meetings and presentations in addition to regular office applications. On average, the laptops draw between 12 and 15 watts, excluding the screens, immediately resulting in an energy reduction in comparison to the desktops of their co-workers.

The remainder of our power sensors measure the thin client setup. The deployment includes Dell FX-160 diskless thin clients, together with two Dell PowerEdge servers running VMware ESX Server. The servers have two quad-core, 3GHz CPUs and 32GB of RAM. Each user has their own virtual desktop. The system load-balances these VMs, migrating them between the servers if necessary. The thin clients all average between 15 and 17 watts, with a standard deviation of less than 1 watt. As with the desktop and laptop measurements, displays are excluded. The main observation from the data is that the thin clients, despite having no real computational capabilities, draw the same amount of power as laptops.

The two backend servers handle a total of 44 thin clients, a typical number for administrative departments at Stanford. To calculate the power overhead due to servers, we measured their power draw without any VMs running, as well as over several weeks under normal use. Table 4.8 shows power data from each server as well as the number of virtual desktops running on each machine. The collected data show that the server overhead for each client is 15 watts on average and 18 watts at peak. Adding the power draw of a thin client to its share of the backend server yields a total average per-user power cost of 30 watts.
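The 30-watt figure combines each client's draw with its share of the backend servers. A sketch using the numbers from Table 4.8:

```python
# Per-user power of the thin client setup: client draw plus server share.
server_avg_w = [328, 348]   # average draw of the two VMware servers (Table 4.8)
vm_counts = [21, 23]        # virtual desktops hosted on each server
client_w = 15               # typical thin client draw

server_share_w = sum(server_avg_w) / sum(vm_counts)   # ~15.4 W per user
per_user_w = client_w + server_share_w                # ~30 W in total
print(round(server_share_w, 1), round(per_user_w, 1))
```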
The characterization of this department is particularly valuable. Since the users whose equipment was monitored have very similar workloads, we can compare the different compute options on fair ground. The laptops have the lowest energy cost, at 17 watts, followed by the thin clients (30 watts). Lastly, there are the desktops, the most power-hungry of which draws 100 watts.

  Machine    VM      Min     Max     Avg     Avg per
             count   power   power   power   client
  Server 1   21      311W    373W    328W    15.6W
  Server 2   23      332W    410W    348W    15.1W

Table 4.8: Power draw statistics for two servers handling 44 virtual desktops via VMWare’s ESX Server OS. Each VM corresponds to a user with a 15-watt thin client at their desk. The average total per-user computing cost is 30 watts.

This immediately highlights the tremendous savings to be had by replacing desktops. It also shows that a thin client setup can draw 50% more power than a laptop setup. Each of these options has different benefits. For example, thin clients allow for centralized control and easier growth of resources. Laptops, on the other hand, allow users to stay mobile. The most interesting observation is that the clients alone consume as much energy as laptops (about 15-20 watts), yet have no local resources. We return to this point in Chapter 5 and use it as a stepping stone to propose a new, more energy-efficient computing setup.

4.3.2 German Computer Science Lab Sun Deployment

To validate the results from the Stanford thin client setup, we measure another similar IT infrastructure. The equipment is used by over twenty Computer Science faculty, students, and staff at Friedrich-Alexander-Universität Erlangen-Nürnberg in Germany. The department has been using thin clients for the past ten years.

The current deployment consists of 26 Sun Ray clients [70] of types 1, 1G, and 2FS. The clients are connected to two Sun Fire X4200 [21] servers, where client virtual machines (VMs) migrate between the two servers based on load balancing and redundancy policies. The servers are backed by an HP ProLiant DL360 G7 NFS file server and a storage unit that provides 7.2TB of secondary storage. The X4200 servers feature two dual-core AMD Opteron 280 CPUs each, totaling 8 cores at 2.4 GHz each; each server has 16 GB of RAM available. The servers were purchased five years ago, with the expectation to accommodate up to 32 clients. The

Figure 4.9: CPU utilization for two Sun Fire servers in the Erlangen thin client infrastructure.

file server and storage unit were purchased about half a year ago, and are expected to meet the demands of the users for the next couple of years. We measured the power draw of four clients and one of the two servers in the setup with the help of four Plugwise [16] power meters. Our custom Powernet meters could not be used due to differences in voltage and plug types between Europe and the United States. The data was collected over a period of three weeks.

The server power draw over the three weeks is 302 watts on average, with a standard deviation of 6 watts, a maximum of 345 watts, and a minimum of 223 watts. Figure 4.9 shows utilization data gathered from both backend servers in parallel with the power measurements. For 75% of the time, the CPU utilization is under 50% across the two machines. This data once again shows a level of under-utilization, but this time it is not as bad as what was observed on individual desktops. The difference is expected since the thin client setup is actively consolidating workloads.

The power draw of the thin clients does not vary drastically; the newer 1G and 2FS clients draw 10 watts and the older type-1 clients about 18 watts. The file server and attached storage unit draw another 325 watts. These measurements are consistent with the thin client deployment at Stanford, with the Sun clients drawing slightly less power than the Dell ones (15 watts).

Altogether, the power data show that on average, the back-end infrastructure draws 627 watts to support 26 clients, or 24 watts per user. If we were to assume the best-case scenario in which the system has 32 users, its maximum intended load, the

per-client overhead goes down to 20 watts. The thin client power is in addition to that, at 10 to 18 watts. Thus the total power budget per user is 30 to 38 watts. These numbers are comparable to the infrastructure at Stanford, with small differences in the client hardware and storage options. At 30 watts per user, the German Computer Science department is much more energy efficient than the one at Stanford, where the median desktop draws 100 watts.

4.4 Implications for Enterprise Computing

The two sets of measurements of thin client setups contribute to a fuller characterization of enterprise computing energy use. While desktop computing, as seen in the Gates building at Stanford, is perhaps the most common way to meet users’ needs, alternatives do exist. Some enterprises encourage staff to use laptops, so they can remain mobile and have the ability to work from home. In addition, terminal computing via thin clients is gaining popularity as IT staff try to reduce energy costs and centralize management. The power and utilization data from a variety of equipment leads to a set of high-level insights and observations.

4.4.1 Systems have poor power proportionality.

Figure 4.10 shows how power draw increases with system utilization for representative examples of four different computing elements within the enterprise: a Mac laptop, a Dell desktop, a Xeon server, and a single active port on an HP network switch. All power data comes from empirical measurements of devices within our building, gathered using custom wireless power meters. All devices have poor power proportionality [33]: their idle power draw is 48% to 100% of their draw at full utilization. One important observation is that the switch’s power draw is completely independent of its utilization. Therefore, a computing system that increases network traffic can still reduce the aggregate energy consumption.

Figure 4.10: Relationship between power draw and utilization for four pieces of office equipment. Note that the network cost is low and constant.
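One way to summarize power proportionality is the ratio of idle to peak power, where 0% would be perfectly proportional. The numbers below are illustrative of the device classes in Figure 4.10, not exact measurements:

```python
# Idle-to-peak power ratios for representative office devices (illustrative values).
devices = {                   # name -> (idle watts, full-utilization watts)
    "laptop":      (12, 25),
    "desktop":     (100, 150),
    "server":      (200, 300),
    "switch port": (2, 2),    # draw is independent of traffic
}
for name, (idle, peak) in devices.items():
    print(f"{name}: idle draw is {idle / peak:.0%} of peak")
```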

Figure 4.11: Processor performance as rated by the 3DMark benchmark versus maximum power dissipation. For modern processors, power draw increases non-linearly with performance.

4.4.2 Small increases in performance can have a high power cost.

Power and performance have a non-linear relationship. Beyond a certain point, small increases in performance consume a lot of energy. Figure 4.11 illustrates these diminishing returns for several modern mobile and desktop processors [74]. The performance is rated using the 3DMark benchmarking tool [1]. The power ratings reflect the maximum processor power dissipation, an upper bound for active power draw. A typical mobile CPU draws about 15-20 watts during operation, while a desktop processor reaches 100 watts or more — four to five times as much [45, 61]. While the difference in power draw is significant, this four-fold increase in energy cost does not come with a four-fold increase in performance. Therefore, we could build more energy-efficient systems by using multiple points from the design space.
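These diminishing returns can be expressed as performance per watt. In the sketch below the benchmark scores are hypothetical placeholders; only the 20 W versus 100 W power figures come from the text.

```python
# Performance per watt for a mobile vs. a desktop CPU (scores are hypothetical).
mobile  = {"power_w": 20,  "score": 2000}
desktop = {"power_w": 100, "score": 5000}   # 5x the power, only 2.5x the score

for name, cpu in (("mobile", mobile), ("desktop", desktop)):
    print(f"{name}: {cpu['score'] / cpu['power_w']:.0f} points per watt")
```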

4.4.3 User systems have low average utilization but occasionally need high performance.

Figure 4.12 shows CPU utilization for seven desktops in our building; the data were collected once per second over a month-long period. The machines are largely underutilized – 85% of the time, CPU utilization is under 25%. Office workloads only occasionally tax a machine’s resources, but systems are provisioned for such uncommon, bursty events. The non-linear relationship between power and performance means this over-provisioning has a significant cost: hundreds of machines consume energy for the rare cases when one of them needs high performance.

4.4.4 Networks are significantly overprovisioned.

Enterprise networks are also highly underutilized. Figure 4.8 has network traffic data from several switches within an enterprise building. The number of active gigabit ports on each device varies from 24 to 96 but the cumulative traffic is much lower than the capacity, with the peak at 20% or less. This unused capacity within the enterprise LAN can be put to use without increasing the energy cost of the network.

Figure 4.12: Utilization data collected over 3 months from seven office PCs show how over-provisioned current desktop setups are: CPU lies below 25% for 85% of the time.

4.4.5 Current power saving techniques trade off productivity for efficiency.

Thin clients are one common approach to reducing energy consumption. Thin client systems consist of an end-user system which is typically no more than a display and a lightweight processor for processing draw calls. They provide no local compute or storage resources. Instead, a small number of powerful backend servers run all user programs and sessions, assuming that multiplexing the server resources across users improves average utilization without reducing peak performance.

Thin clients have two major limitations. First, the lack of local compute resources means they cannot handle graphics-heavy tasks such as watching videos. The server decompresses a video codec designed for efficient transmission and turns it into an unoptimized series of draw calls, and videos display poorly. Second, thin clients draw as much power as lightweight computers such as laptops (15–20W) [60].

Recent research on enterprise energy efficiency has concentrated on putting idle desktops to sleep by using sleep proxies and migrating user sessions [44, 67, 76, 28]. A common limitation of these approaches is that they require users to wait for their compute environment to become available. This fundamentally trades off convenience

and productivity for improved energy efficiency. Furthermore, putting personal computers to sleep is not always possible. For example, in our organization, computers need to be awake at night for nightly backups, and as prior work has pointed out [27], network wakeup mechanisms are fragile and difficult to rely on in practice.

4.5 Summary

This chapter presented a detailed study of the energy use and waste of enterprise computing systems. The energy data, collected via Powernet, revealed that IT equipment in the Gates Computer Science building accounts for about 50% of the overall electricity use. While this number is pushed up by the equipment in two server rooms, we believe that the rest of the infrastructure (desktops, laptops, LCD monitors, network equipment) is representative of how office environments are structured today. The analysis of utilization data brought to light the large mismatch between available compute resources and actual workload needs. The chapter also highlighted the benefits and drawbacks of alternative computing systems such as thin clients. Studying the network component of the IT infrastructure revealed that sending data between devices is cheap, in terms of energy. These observations suggest that green computing efforts should focus not just on a single system component, but rather think about the enterprise as a whole, with all the resources it has to offer.

Chapter 5

Anyware: A Hybrid Compute Model

The five insights presented at the end of the previous chapter directly motivate the design of a new system architecture for enterprise computing. The combination of low typical utilization and lack of power proportionality creates waste. The analysis of thin client and laptop data shows that it is worth exploring multiple points in the design space. Lastly, the energy behavior of the network suggests that rather than focusing on individual components (e.g., CPU) or devices (e.g., server) within a computing infrastructure, redesigning it as a whole can have greater benefits with lower costs. The non-linear relationship between device power and performance, the rare need for high performance in office environments, and the opportunity to improve the use of available resources guide the design of a novel computing system called Anyware. By running most undemanding applications on low-power, efficient personal computers such as laptops, one can reduce each person's personal computing energy consumption by over 80%. But this benefit comes by sacrificing performance and therefore productivity. For some demanding tasks, a lightweight personal computer such as an Eee PC can take twice as long as a reasonable desktop. To provide both low energy consumption and high performance, Anyware uses a hybrid architecture. Some applications run locally on user computers and others run remotely on powerful servers which are shared across many (e.g., 25) users. The architecture is elastic since


more servers can be added to scale to users’ workload needs. Anyware’s goal is to reduce energy consumption without harming productivity. It is therefore completely invisible to its users. This is in contrast with existing approaches such as LiteGreen [44] and SleepServer [28], which put computers to sleep such that users have to wake them up when needed. Applications running on top of Anyware look and run identically (configuration, preferences, plugins, etc.) whether they are run locally or remotely. All of a user’s files are always accessible. Anyware requires no centralized application authority: a user can install a new application on her file system and use it with Anyware immediately. This invisibility comes without any modifications to user applications or operating system kernels. Although Anyware is a synthesis of many existing system design concepts targeted at these problems, its contributions lie in answering the following research questions:

1. How can one run a locally installed application on a remote server without modifications, user intervention, or even a user noticing?

2. How does a client find remote computing resources for offloading applications?

3. When should a client offload applications and when should it run them locally?

4. What does a computing system designed to take advantage of Anyware look like?

5. How does Anyware affect application performance and energy consumption?

The remainder of this chapter tackles each of these questions in turn.

5.1 Overall System Design

The very low average PC utilization in the enterprise means that most work can be handled by lower power, lightweight computers such as Eee PCs, laptops, and Mac Minis. For example, if laptops have an average power draw of 20 watts while desktops have an average draw of 100 watts, simply replacing all desktops in the enterprise

with laptops will reduce idle power by 80%, more than any existing sleep solution. Furthermore, the power/performance curve of these lower power devices means that they can provide a reasonable fraction of desktop performance at a tiny fraction of the cost. But some occasional workloads need the performance of a desktop, and running them on a laptop can be significantly slower. To conserve energy without harming productivity, an enterprise computing system needs multiple classes of computing devices that appear as a single enterprise system. This leads us to our first question:

How can one run a locally installed application on a remote server without modifications, user intervention, or even a user noticing?

5.1.1 Anyware Overview

Anyware is a hybrid, elastically-provisioned system architecture for enterprise computing. It is hybrid because a single user can be executing some tasks on a local low-power client, while other workloads are offloaded to a more powerful remote server. It is elastically-provisioned because, in the spirit of cloud computing, resources can grow as needed by adding more backend servers. Anyware requires neither OS nor application changes; it exports directories via NFS and reassigns MIME types via a configuration file. Anyware assumes that user files and programs are stored locally, on the low-power client. Communication between local and remote resources happens over a local area network (LAN); we leave it as future work to explore other options such as WiFi. An Anyware client machine has a processor with a very high performance-per-joule, such as a laptop or a low-end PC. Anyware shares a small number of servers across many users, amortizing their cost similarly to thin client systems. But unlike them, an Anyware client is a fully operational computer that can function disconnected from the server, if necessary. Unlike sleep approaches, the user's work environment can always remain on since idle power is low. This incurs no wakeup latency or penalty. Alternatively, IT managers or users can employ sleep techniques on top of Anyware, similarly to any other computing setup.

5.1.2 Remote Execution Options

The first challenge in developing Anyware is running applications both locally and remotely such that the user cannot tell the difference. We considered a number of points in the design space, including kernel- and library-level modifications, network booting, and application-level virtual machines. Kernel and low-level library modifications require updating the OS on all client machines, an invasive requirement which we found few would agree to, including our system administrators. Network booting has the advantage of providing a simple, central place to manage systems and deploy Anyware, but it is much slower than local booting and requires users to switch over their systems. Furthermore, people often customize their systems significantly, an ability they were wary of giving up. Finally, we considered running every application in its own virtual machine and migrating those VMs as needed. This imposes a huge overhead on the compute infrastructure. People often have six or more applications open, so using one VM per application would require significant hardware upgrades.

5.1.3 Anyware Design

After trying each of these approaches, we settled on one that operates at the application level and can be built from existing, well-known and well-tested mechanisms that system administrators are familiar and comfortable with. This simplicity makes setup practical and maintenance straightforward. Our approach requires only one change to the OS configuration and allows Anyware to run entirely in user space. The approach trivially extends to other Unix-like systems that support network file systems and graphics display. Anyware can be engineered for other OSes in a similar fashion, by sending drawing calls over the network to the client machine. An Anyware client maintains a full operating system, with all user files, configuration data, libraries, and applications. A small number of high-performance servers on the enterprise cloud host virtual machines. A given client can have at most one VM per server but may have VMs on multiple servers. Each server VM contains a minimal OS installation, configured only to be able to bootstrap execution of the user's software. The Anyware client is an NFS server, exporting data and program

directories. Anyware server VMs mount the necessary directories via NFS with the sync option. All other communication between the Anyware client and VM is over SSH. X11 forwarding allows remotely executed applications to appear as if running on the client machine. The server has to carefully select which portions of the user file system to mount. Some directories and files are hardware-specific or virtual. Certain core system configurations also need to be unique. For instance, the client and the VM do not share the same fstab file or log files. They also do not share network-related configurations. The end result is that an application running on a remote server looks almost exactly as if it is running on the local client, except when one digs into low-level OS services and abstractions. In this way, Anyware borrows ideas from prior work such as the Utility Processor [47] and CDE [54] to have a single execution environment which many machines mimic. Having the VMs mount NFS volumes from the Anyware client allows the existing software on the user's computer to run on the server as is. A user can install a new application locally and immediately run it remotely using Anyware without any modification. Anyware flexibly places applications by intercepting all user-initiated program executions through a level of indirection to operating system MIME type associations. It introduces two shadow files, mimeinfo.cache and mimeapps.list, that cause all file extensions and application shortcuts to be associated with the Anyware executable. Once intercepted, the Anyware daemon decides where to run the application and invokes the original program on that host. However, the system design makes some trade-offs in favor of simplicity and practicality. In particular, we note two drawbacks of Anyware's approach. First, Anyware must choose the location of the computation up-front.
The time-granularity for choosing the location of applications in Anyware is the entire duration of the application. This restricts the ability of the system to react to dynamic changes in the demands of applications, unlike systems which support process or VM migration. This choice drastically simplifies Anyware's implementation. To mitigate this potential drawback, the policy is built based on user experience ratings with real-world tasks, with the goal of capturing the long term behavior of applications and

discovering features of applications which lead to a poor experience. Second, the choices of NFS and X11 may not be ideal for synchronization or interactivity. For example, NFS operations can be asynchronous, so an application may improperly believe its data has been stored to the user's disk. Alternatively, using NFS in synchronous mode may significantly reduce performance. However, NFS is commonly used and simple to set up, so it is at least a good starting point for a prototype. Similarly, X11 mostly operates asynchronously and usually remains highly responsive, but some applications are unusable over even a high bandwidth, low latency LAN connection because of heavy use of synchronous X commands. In both cases, we believe these choices are the right ones for Anyware because they reduce the complexity of the system, ease deployment, and the drawbacks are not as significant in a highly controlled enterprise setting.
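As a concrete illustration of this placement path, the sketch below shows a minimal user-space dispatcher in the spirit of the Anyware daemon. The shadow-file mechanism (mimeinfo.cache, mimeapps.list) and the SSH/X11 transport come from the text; the placement table contents, function names, and the server hostname anyware.local are illustrative assumptions, not the actual implementation.

```python
import shlex

# Hypothetical placement table in the spirit of Table 5.1; in the real system
# the decision comes from a trained model described later in this chapter.
APP_CLASS = {
    "mplayer": "local-only",   # high-rate screen updates must stay local
    "evince": "local-pref",
    "phatch": "remote-pref",   # batch image processing benefits from the server
}

def runs_locally(app):
    # The 'either' class defaults to remote so clients can stay thin and low-power.
    return APP_CLASS.get(app, "either") in ("local-only", "local-pref")

def build_command(app, args, server="anyware.local"):
    """argv the dispatcher would exec after intercepting a MIME-type launch."""
    base = [app] + list(args)
    if runs_locally(app):
        return base
    # ssh -X forwards X11 so the remotely executed window appears on the client.
    return ["ssh", "-X", server, " ".join(shlex.quote(a) for a in base)]

print(build_command("mplayer", ["movie.mkv"]))
print(build_command("phatch", ["album/"]))
```

Because the indirection operates on argv alone, the same launch path works for any locally installed application without modifying it.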

5.2 Finding Remote Resources

A careful composition of user-level settings such as MIME type bindings and exported filesystems allows Anyware to execute workloads remotely in a way that is invisible to the user. This goal of creating a minimally invasive computing system raises an additional challenge:

How does a client find available remote computing resources for offloading applications?

There are a variety of ways to answer this question, from a manually configured installation to the use of a centralized authority that keeps track of server resources and that clients can query. The latter alternative would require tasking one server with special responsibilities – an unnecessary addition of complexity. The insight is that instead of treating the backend resources as a static part of the system, one can treat them as a dynamic service provided by Anyware. Anyware adopts a zero-configuration (Zeroconf) methodology that takes advantage of DNS-based Service Discovery (DNS-SD) [38]. DNS-SD uses standard DNS queries to allow network clients to look up a service within a specific name domain.

For example, Bonjour is an implementation of DNS-SD for Mac OS, enabling services such as the Shared Folder and iTunes sharing on the same network. This technology is well suited for Anyware, which operates on enterprise LANs. The current implementation of Anyware uses the Avahi Zeroconf implementation for Linux and BSD [4]. Remote Anyware servers use it to advertise themselves by placing a configuration file in the /etc/avahi/services folder. A new Anyware client can bootstrap the entire process via one of these advertisements; it requires a thin layer of software to execute an Avahi query,

avahi-browse _anyware_zeroconf._tcp

and parse the returned advertisement,

= eth0 IPv4 Anyware Server on anyware _anyware_zeroconf._tcp local

hostname = [anyware.local]
address = [172.27.76.224]
port = [3500]
txt = ["This is a service that provides zero-config for new clients."]

This advertisement has the server's IP address and port. The client then uses these to establish a socket connection and download the rest of the setup scripts from the server. First, the client initiates several local configuration steps. It creates a public/private key pair (currently, without a passphrase) and modifies /etc/exports to export its directories. Next, it sends information about its operating system and architecture so that the server can instantiate a new virtual machine, one per user. During this step the server can pass additional information to the VM, including the client's username and IP address. The VM creates a new user to match the client machine's username. This is necessary so permissions for NFS directories are later set correctly

by the idmapd process that maps usernames to user IDs. After the VM is up and running, the client's NFS directories are mounted. In addition to giving the VM access to applications and user data, the mount also implicitly sets up the SSH connection: both the private and public keys live in the same folder and are accessed by the server and client respectively. From the client's perspective this is equivalent to running ssh localhost. At this point, the VM signals that the setup is complete. Since DNS-SD advertisements are cheap to send, the server can use them not only for new client setups but also for communicating other bits of information such as load statistics or specific capabilities. In a multi-server environment, clients issue Avahi queries if they want to have VMs on more than one server, for load balancing purposes. Taking a zero configuration approach to Anyware is key to being invisible to the user. A client only needs to execute a simple program to look for advertisements in order to start using Anyware; everything past the first step is an automated exchange of network messages, letting users focus on more productive tasks. This setup needs to be done only once, when new client hardware is configured. From a system's perspective, Zeroconf allows flexibility on both the remote and local sides of Anyware. Additional server resources can be added on demand, by initiating new advertisements. Clients can automatically find whom to offload applications to and have the ability to choose one or more servers to connect to.
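The bootstrap step that extracts the server's address and port from an advertisement can be sketched as follows. The record text mirrors the example advertisement shown in this section, while the function name and field handling are assumptions for illustration.

```python
import re

# Resolved avahi-browse output, following the example advertisement above.
ADVERTISEMENT = """\
= eth0 IPv4 Anyware Server on anyware _anyware_zeroconf._tcp local
   hostname = [anyware.local]
   address = [172.27.76.224]
   port = [3500]
   txt = ["This is a service that provides zero-config for new clients."]
"""

def parse_advertisement(text):
    """Pull out the bracketed fields a new client needs to contact the server."""
    fields = dict(re.findall(r"(\w+) = \[(.+?)\]", text))
    return fields["hostname"], fields["address"], int(fields["port"])

host, addr, port = parse_advertisement(ADVERTISEMENT)
print(host, addr, port)  # the client opens a socket to addr:port next
```

With the address and port in hand, the remaining setup is the automated exchange of messages described above.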

5.3 Execution Placement

Sections 5.1 and 5.2 described the mechanisms by which an Anyware client can invisibly run applications remotely. However, we cannot expect enterprise users to be experts in the needs or performance of applications and force them to decide where to run applications. Instead, the enterprise computing system needs to decide automatically, leading to the question:

When should a client offload applications and when should it run them locally?

The centralization worldview, argued by the administrators of a lab filled with thin clients, is that remote execution is almost always better because it can use high-end server hardware. The local execution worldview, argued by our local system administrators, posits that the added latency and bottleneck of the network mean local execution will be generally better. Which is true? Or are the two mostly equivalent except in some edge cases? The answers depend on more than just the raw performance of a client or a server. End user applications are complex mixes of computation, communication, storage, and user interaction. Furthermore, users' satisfaction and productivity can be affected in very subtle ways. Small changes in the latencies of a task can lead people to choose very different workflow strategies [53]. Google found that slowing down search page results by 100-400 milliseconds reduced the number of searches a person performs and that the effect increased over time [20]. We perform a small-scale user study in which subjects complete a wide range of computer tasks using either a laptop or remote Anyware execution and rate their experience. The results from the study (Section 5.3.2) together with a set of application features allow us to develop a statistical model. This model, described in Section 5.3.3 and validated in Section 5.3.4, shows how a few application properties can be used to predict user preferences for remote and local execution.

5.3.1 Methodology

Five subjects participated in the study by performing 39 unique tasks in two conditions. In the laptop condition, tasks were executed locally. In the Anyware condition, tasks were offloaded to a server using Anyware. The jobs span a variety of applications that capture the workloads of an enterprise environment. These include office programs (e.g., create a slide presentation, edit photos, read and fill out PDFs, compress files), social communication (e.g., send an instant message or an email), and games. Table 5.1 shows the full list of workloads. To reduce bias, the order of the two conditions was randomized and the study was double-blind. Each task was run with a warm disk cache because Anyware is an always-on system where we expect both the server VM and the laptop to remain on for long periods.

Task Description                  Application   Class
Play 5 turns of FreeCiv           freeciv       either
Send an email                     thunderbird   local-pref
Play 5 turns of Solitaire         sol           either
Create a document                 lowriter      either
Add 3 address book entries        contacts      either
Check stocks online               firefox       local-pref
Send instant message              pidgin        either
Create spreadsheet graph          localc        either
Create a short presentation       loimpress     either
Export a presentation as a PDF    loimpress     either
Edit a photo                      gimp          either
Add items to task list            gtodo         either
Fill out a PDF form               evince        either
View 1 page PDF                   evince        either
View multi-page PDF               evince        local-pref
Create a plot in RStudio          rstudio       either
Play a game of minesweeper        mines         either
Look up tide predictions          xtide         local-pref
Look up Bible passages            bibletime     either
Play hangman                      khangman      either
Transfer file with FTP            gftp          either
Read RSS feeds                    liferea       remote-pref
Send a tweet                      turpial       either
Resize a photo                    gimp          either
View a photo album                shotwell      either
Search for a location on a map    google-earth  local-only
Compare two PDFs                  diffpdf       either
Look up a word in a dictionary    artha         either
Manage recipes                    gourmet       either
Zip a directory of files          file-roller   local-pref
Unzip an archive                  file-roller   local-pref
Play video                        mplayer       local-only
Draw the sun                      inkscape      either
Batch process images              phatch        remote-pref
Practice math                     kbruch        either
Create photomosaic                pixelize      either
Play a game of checkers           flcheckers    either
Create a label                    glabels       either
Write a text document             focuswriter   local-pref

Table 5.1: Tasks performed by subjects to build and test a predictive model for Anyware. The 'class' column indicates whether users preferred this application to remain on the client.

Identifier  Explanation                                        Collection mechanism
instr       Instructions executed (×10^9)                      perf
ipc         Instructions per cycle                             perf
llc         % last-level cache misses                          perf
xmsg        Number of X messages                               xtrace
procs       Number of processes spawned                        strace
opencalls   Number of open system calls                        strace
mbread      Data read (MB)                                     strace
mbin        Data sent over the network, server to laptop (MB)  iptables
mbout       Data sent over the network, laptop to server (MB)  iptables

Table 5.2: Application features collected in order to build a predictive model for Anyware.

After performing a task in either the laptop or Anyware condition, the subject answered whether the application performance was acceptable (yes or no). At the end of each workload, the subject answered whether the first or second execution was better (first, second, or no difference). Lastly, we profiled each workload separately to collect application features. This profiling was separate from the user study to prevent interference with user experiences. These profiles include metrics such as instructions per cycle, number of window drawing calls, and data transferred over the network or the I/O bus. Table 5.2 summarizes the features and how they were measured. Based on the user responses, we classified applications into six classes, shown in Table 5.3.
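The features in Table 5.2 map onto standard Linux profiling tools. The sketch below shows plausible invocations for the perf- and strace-collected features; the flags are standard usage of those tools, but the exact invocations used in the study are assumptions.

```python
def perf_cmd(workload):
    # Hardware counters behind instr, ipc, and llc in Table 5.2.
    return ["perf", "stat", "-e",
            "instructions,cycles,LLC-load-misses"] + workload

def strace_cmd(workload):
    # -f follows spawned processes (procs); -c tallies system calls (opencalls).
    return ["strace", "-f", "-c"] + workload

print(perf_cmd(["gimp", "photo.png"]))
print(strace_cmd(["gimp", "photo.png"]))
```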

5.3.2 Experimental Results

While there was some disagreement between test subjects on whether there was no difference between conditions or one condition was better, the results were never inconsistent. In no case did one subject mark a task as better with Anyware's remote execution and another subject mark it as better running locally. Most applications

Class             Description
Local only        Remote execution is unusable, local is usable
Local preferred   Both executions are usable, local is better
Either            Both remote and local were acceptable, no difference
Remote preferred  Both executions are usable, remote is better
Remote only       Local execution is unusable, remote is usable
Fail              Neither local nor remote is usable

Table 5.3: Execution placement classes determined from user experiments.

were rated as usable in both the local and remote scenario. A majority of users classified PDF viewing and photo organizing as local preferred. Users preferred remote execution for many of the image processing tasks, as well as the Open Office document manipulation tasks. One user placed Thunderbird in the fail class due to its slow response to user interactions. Only two applications were classified as local only by two or more people – video playback and Google Earth. These tasks deserve some special examination because they highlight the failure case of systems in which all computation is remote (thin clients or remote VMs). Why is it that such tasks cannot be run remotely? Both applications update screen graphics at a high rate. The model derived in the next two subsections delves deeper into this issue, and Section 5.5.3 shows an evaluation of one of these local only tasks.

5.3.3 Logistic Regression Model

We use the features in Table 5.2 and user classifications to create a model of where Anyware should run applications. Our model is built using logistic regression, a type of regression analysis that is well-suited for predicting a boolean value [11] from predictor variables. To build a model, one specifies a data set of numeric prediction variables and their corresponding boolean results. Given a set of prediction variables, a model produces a number between 0 and 1, indicating the predicted probability that the result is true. In the case of Anyware, the predictor variables are application features and the

boolean value is whether to run an application locally. For model training, local only and local preferred are local execution (value is 1), while either, remote only, and remote preferred are remote execution (value is 0). The either case defaults to remote in order to enable thinner, lower power clients. Anyware runs an application locally if the output of the model is ≥ 0.5. To generate a more compact model, reduce the number of features Anyware must collect, and gain insight into which features are most important, we simplify the model using single-term deletions. Single-term deletion removes individual features or combinations of features whose removal does not lead to a statistically significant difference (at p-value < 0.05) in the model's predictive power.

5.3.4 Model Training and Validation

In order to train and test a user-independent model that determines an application's execution environment, we first need to collapse the five users' ratings into one. Section 5.3.2 discussed how different user responses diverged. For example, four users place a task in the either category, while the last user places it in local preferred. In these situations, we assign the task to the category that the majority of users chose and note this in the 'class' column of Table 5.1. To evaluate the effectiveness of logistic regression based on these features and classes, we randomly select 29 of the tasks to be in the training set and test the resulting model on the remaining 10. We generate and test 20 such models, each with a different division of the 39 workloads. This evaluates whether training on a reasonable set of tasks can generate a model that is broadly applicable to many more tasks. All resulting reduced-feature models use the same variables:

−k × instructions + l × MB sent + m × MB received

The constants (k, l, m) depend on the training data set and the units for each feature, e.g. megabytes versus kilobytes. What is important is the sign of each contributing variable. A positive coefficient increases the output of the function,

such that a higher value for that feature will push the application to run locally. Conversely, a negative coefficient means a higher value for that feature will push the application to run remotely. This highly simplified model captures the inherent tradeoff between processing and I/O capabilities. The number of instructions an application executes is a proxy for how CPU intensive the task is; higher instruction counts bias towards remote execution that can use the server's more powerful CPU. Conversely, tasks that perform substantial I/O between server and client (MB sent and MB received) can become network latency bound in comparison to local systems. The average prediction accuracy is 88%, the minimum is 60% (in one of the 20 models), and the maximum is 100% (in five models). In only one case is an application assigned to an unusable environment: Google Earth, which is classified as local only, is predicted to be offloaded to the server. The remaining errors are tasks in the local preferred class, which the model predicted as remote. These include firefox, thunderbird, unzip, and full-screen text editing – all tasks with a high number of instructions executed. On the flip side, the FreeCiv game, for which users did not have a preference, was predicted as local by the model, due to its higher network traffic.
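The sign behavior described above can be made concrete with a small numeric sketch. The coefficient values below are placeholders chosen only to show the mechanics of the reduced model; the real (k, l, m) come from training and depend on the data set and feature units.

```python
import math

def placement_score(instr_e9, mb_sent, mb_recv, k=0.02, l=0.3, m=0.3):
    """Reduced model: -k*instructions + l*MB_sent + m*MB_received, squashed
    through the logistic function into a probability of local execution.
    k, l, m here are illustrative positive constants, not trained values."""
    logit = -k * instr_e9 + l * mb_sent + m * mb_recv
    return 1.0 / (1.0 + math.exp(-logit))

def run_locally(instr_e9, mb_sent, mb_recv):
    # The text's rule: run locally when the model output is >= 0.5.
    return placement_score(instr_e9, mb_sent, mb_recv) >= 0.5

# Many instructions, little client I/O: biased toward the server's fast CPU.
print(run_locally(200, 0.5, 0.5))  # False
# Few instructions, chatty network I/O: biased toward local execution.
print(run_locally(1, 10, 10))      # True
```

With these toy constants the CPU-heavy task falls well below the 0.5 threshold and is offloaded, while the I/O-heavy task stays on the client, matching the sign analysis in the text.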

5.3.5 Discussion

The model derived here is limited by a small-scale data collection effort – only five users and 29 training applications. Still, the logistic regression results in a model that, on average, chooses the correct execution environment 9 out of 10 times. As more data becomes available, the model can be refined. For example, it might make sense to adjust the model features so they capture the duration of workloads. Instead of cumulative numbers such as instructions and network traffic, instructions per second or Mbps can provide an additional model dimension. The current model takes a prediction value between 0 and 1 and translates that to local or remote execution. One avenue for future work is using the raw values to dynamically adjust the threshold under which a task is offloaded. For example, if a

Runtime in seconds   Gnumeric   GIMP      Kate
Desktop              6.93       42.19     2.42
Laptop only          8.72       51.06     2.72
                     (+31.5%)   (+21.0%)  (+12.4%)
Anyware on Laptop    8.21       37.11     4.75
                     (+18.5%)   (-12.0%)  (+96.3%)
Eee PC only          12.25      155.39    4.33
                     (+76.8%)   (+268%)   (+78.9%)
Anyware on Eee PC    9.48       44.13     4.48
                     (+36.8%)   (+4.6%)   (+85.1%)

Table 5.4: Even with Anyware, replacing a desktop with a lower-end machine has a large negative effect on system performance.

server is running low on resources, it may only accept tasks with a score of 0.3 or lower, instead of 0.5 or lower. Using these prediction scores opens up opportunities for custom policies that take into account available resources on both the client and server side. Another option is to incorporate energy as a decision metric in the cases in which a workload can be executed either locally or remotely. Such a decision policy would depend not only on user tasks but also on the power characteristics of the hardware being used.

5.4 Architectural Support for Anyware

The Anyware architecture trades off an increase in I/O for the ability to choose where to run an application. Given that the client serves user data, a question emerges:

What does a client system designed to take advantage of Anyware look like?

Laptops and low-end PCs often have low performance storage systems, such as low-speed (5400RPM) disks. This hardware decision becomes a performance bottleneck for the I/O-centric nature of Anyware clients. Table 5.4 shows the time in seconds it takes on average (over ten runs) to execute three sample workloads with warm caches.

Section 5.5.1 describes the experimental methodology in detail; in this context, the important point is that Anyware client performance lags behind a desktop, in some cases drastically (half speed). The Gnumeric workload, which is a mixture of CPU and I/O, has a 31% increase in completion time on the laptop, while the Eee PC gives a 76% increase, compared to the desktop. This verifies that simply switching out equipment is not an acceptable path to energy savings. Anyware makes up for some of the slowdown by taking advantage of the faster CPU, but there is still an 18% and 37% increase in completion time relative to the base case. The advantage of the server-side CPU is more visible in the CPU-bound GIMP workload. In the extreme case, the Eee PC execution sees a 268% increase in completion time. The combination of a single-core slow CPU, a slow disk, and not enough memory is devastating for the task. Anyware's remote run remedies this. When used on the laptop, Anyware performs better than the power-hungry desktop; on the Eee PC, it is close to the desktop. The last workload has higher I/O demand than the previous two since it requires the reading and writing of a 3.2MB file. The slower hard disks affect execution time by 12% and 80% for the two clients, and adding Anyware makes things even worse. NFS served from a slow drive becomes a serious bottleneck for Anyware, resulting in almost a 100% increase in execution time compared to the desktop. Since Anyware uses NFS mounts with the sync option, every write has to hit the laptop disk before the application can proceed. The Kate workload causes NFS to send 72 'WRITE' remote procedure calls over the network, compared to 12 and 6 for Gnumeric and GIMP, respectively. On one hand, the data in Section 5.3 was collected on the laptop and users rated application performance as satisfactory. Yet, the data in Table 5.4 illustrate that the slowdown is not insignificant.
Even though a workload can benefit from the fast CPU at the server, it turns out that with careless hardware configurations the local hard drive and NFS become bottlenecks. Given that many user workloads are not CPU-bound but rather I/O-bound, Anyware's reliance on client storage becomes a liability. Anyware clients therefore follow the trend of storage today and replace their hard disk drives with solid state drives. In our prototype, we replace the laptop and Eee PC drives with OCZ Vertex 4 256GB SSDs.

Such a change raises concerns about the cost of hardware. A transition to Anyware can happen gradually, as old desktops become outdated or new personnel need new hardware. The system can work on a variety of equipment, which means IT staff would have the flexibility to choose between equipment with different price points and capabilities. A $500 Dell laptop (Inspiron 15, Intel i3 CPU at 1.8GHz, 4GB of RAM) is comparable to the equipment tested in Section 5.5. The more advanced XPS Ultrabook comes with more resources, and an SSD, at the price of $1000. The Asus Eee PC that was also used to evaluate Anyware cost only $280. Either of these hardware options will need an SSD upgrade – a 256GB SSD cost us an additional $180. For comparison, a powerful desktop can easily cost more than $1,000 (e.g. the XPS 8500 is $1,300). Anyware deployments will also require backend servers, assuming they are not already available as part of the IT infrastructure. The machine that we used in our setup cost $5,000, and we estimate that it can be shared by at least 25 people. This amortizes the server cost to an additional $200 per user.

We defer a discussion of what more significant changes to client hardware architectures might benefit Anyware to Section 5.6.
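As a sanity check on the cost argument above, the amortization works out as follows. This is a rough sketch: the dollar figures and the 25-users-per-server estimate come from the text, while the function name is ours.

```python
# Per-user hardware cost for an Anyware deployment, using the price
# points quoted above. USERS_PER_SERVER is the dissertation's estimate.
SERVER_COST = 5000        # shared backend server ($)
USERS_PER_SERVER = 25
SSD_UPGRADE = 180         # 256GB SSD, if the client lacks one ($)

def per_user_cost(client_cost, needs_ssd=True):
    """Hardware cost attributable to a single Anyware user."""
    server_share = SERVER_COST / USERS_PER_SERVER   # $200 per user
    return client_cost + (SSD_UPGRADE if needs_ssd else 0) + server_share

print(per_user_cost(500))                    # Dell Inspiron 15 plus SSD: 880.0
print(per_user_cost(280))                    # Asus Eee PC plus SSD: 660.0
print(per_user_cost(1000, needs_ssd=False))  # XPS Ultrabook (has an SSD): 1200.0
```

Even the most expensive client option stays at or below the price of a single powerful desktop once the server share is included.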

5.5 Evaluation

This section quantitatively evaluates how closely Anyware can mimic desktop performance, answering the question

How does Anyware affect application performance and energy consumption?

The results in Table 5.4 suggest that storage I/O is critically important: Section 5.5.2 evaluates Anyware using SSDs rather than HDDs, showing it matches and in some cases exceeds desktop performance. Section 5.5.3 illustrates the benefits of a

Machine     CPU                          Mem (GB)   HDD (RPM)
Desktop     Intel Core2 Quad, 2.40GHz    4          7200 or SSD
Mac Laptop  Intel Core2 Duo, 1.6GHz      4          5400 or SSD
Eee PC      Atom D425, 1.8GHz            2          5400 or SSD
Server      Intel Xeon, 12 cores, 3GHz   48         7200
User VM     4 cores                      4          7200

Table 5.5: Hardware used to evaluate Anyware. Non-desktop clients have slower CPU and I/O performance.

hybrid approach by allowing applications with I/O-bound workloads to run locally. Next, Section 5.5.4 discusses how Anyware's performance is affected when multiple users share the server's resources. The section concludes with an analysis of Anyware's energy savings.

5.5.1 Experimental Methodology

Our experimental Anyware setup has a client at the user's desk and a backend virtual machine running on a shared server. We use a midrange desktop to compare Anyware to a more traditional setup. The laptop, desktop, and VM all run 64-bit Ubuntu 11.10; the server runs Ubuntu 10.04 LTS. All machines are on the same VLAN; the client machines are located on the second floor, while the server resides in the basement server room. Average ping time between client and server is 270us with a standard deviation of 44us. The Anyware server is a 12-core, 3GHz Xeon with 7200RPM drives and 48GB of RAM. Clients are a Macintosh PowerBook laptop or an Eee PC. Choosing two points in the low-end PC design space allows us to see how client hardware variations affect performance. Table 5.5 contains the details on each device.

To simulate realistic user interactions with applications, we use a Perl package that automates interaction with X11 GUIs [89]. Because we are interested in performance bottlenecks, the script executes commands as quickly as possible, waiting only for X11 events indicating that a previous action has completed. For example, to open a file the

script may send the keys Ctrl+O to the application, wait for the file open dialog to open, then send keys to open the correct file.

We use three sample workloads to evaluate Anyware. The first is in the Gnumeric spreadsheet application. It involves opening a data file, selecting the data, and creating and saving a graph. The second is CPU-bound. It consists of opening a large image in GIMP and applying the 'Van Gogh' visual filter to the picture. The third task involves opening a 3.2MB text file, editing it by adding a few sentences, then saving and exiting the Kate text editor application. In addition, we also play a video to evaluate a case in which remote application execution is not desirable. The following section presents Anyware's performance on these workloads.
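The measurement loop behind these workloads can be sketched as a small timing harness. The original automation is a Perl X11 package; the Python below is a hypothetical stand-in that times an arbitrary workload callable the way the completion-time tables were gathered.

```python
import statistics
import time

def time_workload(workload, runs=10):
    """Run a workload several times and return (mean, stdev) completion
    time in seconds. A real run would drive the GUI, e.g. send Ctrl+O
    and wait for the file-open dialog's X11 event before continuing."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in workload; a real one would script Gnumeric, GIMP, or Kate.
mean_s, stdev_s = time_workload(lambda: sum(range(100_000)), runs=5)
print(mean_s > 0 and stdev_s >= 0)   # True
```

Executing commands as fast as X11 events allow, rather than at human typing speed, is what exposes the performance bottlenecks rather than masking them behind think time.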

5.5.2 Anyware Client Performance

Spreadsheet. Figure 5.1 shows the average performance of the Gnumeric workload for different setups, all using SSDs. Comparing to the data in Table 5.4 reveals that the desktop does not benefit tremendously from the SSD addition since it already has a relatively fast hard drive. All other setups, on the other hand, see a significant reduction in completion time. Most notably, adding an SSD to Anyware makes its performance on the laptop comparable to the desktop's; it takes 6.87 seconds to create and save a graph on the desktop and 6.83 seconds using Anyware. Since it is a hybrid setup, Anyware gets the best of both worlds – a fast data drive on the local side and a fast processor on the remote side. Delays due to network transfers are negligible. While the laptop spends 3.29 seconds saving a figure (which involves CPU work due to image compression), Anyware is able to do the same over the network in 30% less time, outperforming even the power-hungry desktop. The Eee PC+Anyware setup is within 10% of the desktop – a performance degradation that will be acceptable to most users.

Image Editing. Both Anyware configurations complete the GIMP task faster than the 100+ watt desktop. Figure 5.2 summarizes the results; data are averaged over ten runs and total application execution time is within 3% for all iterations. The Eee PC sees an improvement of 10 seconds over the hard-drive case, to a total

Figure 5.1: On the Gnumeric workload, Anyware performs identically to the desktop. The results are averaged over 10 runs, which are all within 0.6% of each other.

Figure 5.2: The Van Gogh GIMP filter is a CPU-intensive task, so both the laptop and the Eee PC benefit from Anyware's access to the server. The missing 'Eee PC only' bar is at 147 seconds.

Figure 5.3: Editing a 3.2MB text file with the Kate Editor has comparable performance for all three setups, with variation between runs of at most 0.25 seconds.

execution time of 147 seconds (bar omitted in the figure). This is an example of a task that makes the most of the additional resources made available by Anyware.

Text Editing. Lastly, Figure 5.3 presents data on the text edit workload. Anyware's performance matches that of the powerful desktop. The Eee PC is slower due to its lower memory capacity; the entire slowdown is due to application startup, which does not affect productivity on longer tasks.

Cold Cache. To present a complete picture of Anyware's performance profile, we execute the three tasks on just-booted hardware, with empty data, instruction, and NFS caches. This is similar to starting an application immediately after booting a desktop. Figure 5.4 shows these worst-case data for the spreadsheet and text edit workloads. The GIMP workload is not affected relative to the desktop because of the remote CPU gains. The Gnumeric task can be slowed down by as much as 35% (using the Eee PC), and the text edit suffers the most due to the large amount of data it needs to load. The user will experience such performance on rare occasions. We leave it as future work to modify Anyware so that it pre-caches applications to produce warm-cache performance at all times.

Summary. The time-to-completion data in this subsection highlight two important points. First, the switch from hard drives to SSDs illustrates how choosing the right hardware in combination with a new system design can produce the needed

Figure 5.4: Executing applications with a cold cache represents worst-case perfor- mance for Anyware, similar to a just-booted desktop.

Setup     Not Decoded   Dropped   Total   % Not Displ.   FPS
Desktop   1             2         14308   0              24
Laptop    0             5         14309   0              24
Remote    1670          2923      12639   32             18.08

Table 5.6: Remote video playback results in low frames-per-second and a degraded viewing experience. Anyware has the benefit of both local and remote resources, so such tasks will remain local.

performance. Second, testing Anyware on two different client machines shows that flexibility in the client setup can yield different results. In a practical deployment, the IT staff and users can make a hardware decision based on a balance of performance, price, and energy.

5.5.3 Local-only Applications

Section 5.3 revealed that some applications effectively cannot be offloaded. Workloads that require large data transfers over the network or that take advantage of special hardware, e.g. graphics acceleration, will perform better locally. Google Earth's data traffic and heavy graphics make remote execution unsatisfactory. Another such

application is video. In order to quantify what a user might call 'unsatisfactory performance', we play a 10-minute video through mplayer, enabling its internal benchmarking. Table 5.6 shows the result when the video is played locally and remotely (using Anyware). The percentage of frames not displayed is a suitable metric for video quality. The video player can skip displaying a frame for two reasons. First, if decoding takes too long, by the time the frame is decoded it is no longer worth displaying. Second, if decoding the previous frame took so long that the next frame is already out of date by the time processing finishes, the player skips that frame completely. Running mplayer remotely using Anyware causes one-third of all frames to not be displayed, resulting in a video playing at 18 frames per second. In contrast, the video plays at 24 FPS on the desktop and laptop.

The unacceptable remote-run performance of the video task is not a drawback for Anyware. Instead, it validates the need for a hybrid computing approach that makes use of local resources when those are needed. Thin clients cannot accommodate such tasks: everything must execute remotely, leading to an unsatisfactory user experience.
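The "frames not displayed" metric can be computed directly from mplayer's benchmark counters. A small sketch follows; the function name is ours, and the counts are taken from Table 5.6.

```python
def pct_not_displayed(not_decoded, dropped, total_frames):
    """Percentage of frames the player never showed: frames that were
    not decoded in time plus frames dropped outright."""
    return round(100 * (not_decoded + dropped) / total_frames)

# Remote playback through Anyware (Table 5.6): about a third of the
# roughly 14,309 frames in the 10-minute clip never reach the screen.
print(pct_not_displayed(1670, 2923, 14309))   # 32
print(pct_not_displayed(0, 5, 14309))         # 0 (local laptop playback)
```

A single scalar like this makes it easy to set a threshold for what counts as unacceptable remote playback.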

5.5.4 Sharing Server Resources

Section 5.5.2 showed that Anyware can achieve acceptable performance for a variety of workloads. However, the measurements so far have been collected under ideal conditions with no additional load on the server. One of the key ways in which Anyware saves energy is by consolidating user workloads on the server. Therefore, it is important to understand how a single user's experience will be affected by sharing server resources. We stress three components of the server (memory, CPU, and network) and rerun the three test workloads using the laptop+SSD hardware.

To introduce server memory pressure, we use a Perl script that allocates memory and keeps it live by reading it. The memory usage increases from 0% to 95% in steps of 10%, or about 4.8GB. There is no observed effect on execution time for the Gnumeric and Kate workloads, and an increase of one second in the GIMP execution time past 90% of memory in use.
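The memory-pressure generator described above was a Perl script; a scaled-down Python equivalent might look like the following. Sizes here are shrunk so the sketch is safe to run; the real steps were about 4.8 GB each.

```python
def allocate_step(chunks, step_bytes):
    """Allocate one step of memory and return the total bytes held."""
    chunks.append(bytearray(step_bytes))
    return sum(len(c) for c in chunks)

chunks = []
STEP = 1024 * 1024            # 1 MB here; ~4.8 GB (10% of 48 GB) in the real test
for _ in range(3):
    held = allocate_step(chunks, STEP)
print(held)                   # 3145728 bytes currently held

# Keep the memory "live" by reading it, so the OS cannot quietly
# reclaim or swap out the untouched pages.
touched = sum(chunk[0] for chunk in chunks)
```

Reading the allocated pages, not just allocating them, is what makes the pressure real: an untouched allocation may never be backed by physical memory.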

Figure 5.5: As pre-existing CPU load on the server increases, so does the execution time of workloads. An offloading policy should take into account current utilization before assigning tasks for remote execution.

To generate pre-existing CPU load, the server runs multiple threads of a Python script that performs a variety of computations. The CPU utilization increases in steps of 5% until it reaches 99+%. The text edit task is not affected, as expected for an I/O-bound workload. The other two tasks experience increasing execution time as CPU utilization goes up. Figure 5.5 shows the performance of the Gnumeric and GIMP tasks at different points of CPU load. For Gnumeric, execution time increases from 6.7 seconds to 9.7 seconds at 99% CPU utilization. The effect of sharing the CPU is even more visible in the case of GIMP, which is a CPU-intensive task. There is a steady increase in execution time, which accelerates past 20% utilization. At 30% utilization, Anyware still completes the GIMP task as fast as a desktop (40 seconds). At its worst, when 99% of the CPU is occupied by other tasks, it takes 60% (20 seconds) longer to execute the task than it would have on a standard desktop.

We use iperf to generate additional network traffic going into the server. The traffic load increases in steps of 100Mbps to a maximum of 960Mbps, and all traffic is on the same virtual LAN. There is no effect for the GIMP application and minimal effect for the spreadsheet one. As expected, the text edit task is the most affected. Its completion time becomes visibly variable, between 2.5 and 3.4 seconds, when the traffic on the link exceeds 500Mbps. A breakdown reveals that the extra 0.9 seconds comes from opening files.
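The CPU load generator can be approximated with busy-loop workers. This is a sketch, not the original script; note that a Python version needs processes rather than threads to occupy multiple cores, since the interpreter's GIL serializes CPU-bound threads.

```python
import multiprocessing
import time

def burn(deadline):
    """Busy-loop on one core until the deadline passes."""
    x = 0
    while time.time() < deadline:
        x = (x * 31 + 7) % 1000003
    return x

def load_cores(n_workers, seconds):
    """Occupy roughly n_workers cores; the real experiment stepped
    server utilization up 5% at a time until it passed 99%."""
    deadline = time.time() + seconds
    with multiprocessing.Pool(n_workers) as pool:
        pool.map(burn, [deadline] * n_workers)

if __name__ == "__main__":
    load_cores(2, 0.2)        # brief demo: two busy workers for 0.2 s
    print("load generation finished")
```

Fractional utilization (the 5% steps) can be approximated by alternating short burn and sleep intervals inside each worker.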

Understanding the performance of applications under pre-existing load on the backend server has two purposes. First, it shows how different types of tasks will be affected when multiple users share the server. Second, it gives an idea of how to decide that a server is overloaded and should not accept any more tasks. CPU is the most important server resource, affecting both individual user experience and energy savings. Prior work [27, 61, 67, 76] and Figure 4.12 consistently show that office machines are underutilized. Therefore, for a reasonable number of users (e.g. 25) we do not expect long stretches of high CPU use – any minimal slowdown of Anyware applications would be transient. As a simple policy, IT staff should set CPU and memory thresholds past which offloaded tasks are sent to a secondary server.
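The threshold policy suggested above is simple enough to state in a few lines. A sketch follows; the 90% thresholds are illustrative, not values from the dissertation.

```python
def choose_target(cpu_util, mem_util, cpu_thresh=0.90, mem_thresh=0.90):
    """Route an offloaded task to the secondary server once the
    primary server is past either resource threshold."""
    if cpu_util > cpu_thresh or mem_util > mem_thresh:
        return "secondary"
    return "primary"

print(choose_target(0.30, 0.50))   # primary: plenty of headroom
print(choose_target(0.99, 0.50))   # secondary: CPU saturated
```

Given the measurements above, the CPU threshold matters most: GIMP-like tasks degrade steadily past 20-30% pre-existing utilization, while memory pressure had almost no effect until 90%.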

5.5.5 Energy Savings

To achieve these results and still keep energy costs low, Anyware relies on shared servers. We assume that the server can support about 25 user VMs. This assumption is critical in estimating per-user equipment and server energy costs. The choice of this number is backed by data from three real-world thin client IT setups. Two of them are in administrative departments at a university; the other supports an academic department. Using thin client setups for making Anyware estimates is acceptable because in those types of deployments all work is done at the server. Therefore, if a typical thin client office can support 25 users with no local processing power, Anyware will be able to support at least as many, with a subset of tasks being completed on local machines. The three compute setups that inform the 25-VM assumption are supported by the following hardware:

• 20 user VMs: two E5450 Xeon quad-core 3GHz CPUs with 32GB RAM;

• 30 user VMs: two dual-core AMD Opteron 2.8GHz CPUs with 16GB RAM;

• 55 user VMs: Dell PowerEdge R810 with four 6-core 2.0GHz processors and 256GB RAM.

Equipment            Idle (W)   100% CPU (W)
Desktop              100        165
Server (total)       130        270
Server (per user)    5          11
Laptop               14         24
Eee PC               13         22
Anyware (per user)   18-19      33-35

Table 5.7: Average power draw of different types of computing equipment in watts. The per-user server values assume 25 VMs per physical server.

The first two setups use slightly older equipment compared to the Anyware server, yet comfortably support between 20 and 30 users. The last one is newer and works out to about 4.5GB of memory and 0.87 of a CPU per user – resources similar to those on the Anyware server. While in practice it might be possible to fit many more VMs on a single server, the energy calculations use 25 users in order to be conservative.

Table 5.7 summarizes the power draw of the equipment used to evaluate Anyware. The data are from empirical measurements collected at the device plug level; these numbers are lower than the data in Figure 4.11, which used maximum power dissipation figures. The data do not include LCD screens; we assume that users will use the same display regardless of PC choice. In the best-case scenario, when the setup is idle, the Anyware setup consumes fewer than 20 watts. This is an 80% decrease from the desktop running at 100W. In the worst-case scenario from an energy perspective, at 100% utilization, the Anyware hybrid setup will still draw only 35 watts.

The energy savings presented in this section are subject to our specific equipment setup and the assumption of 25 VMs per physical server. Figure 5.6 shows how the total per-user power draw changes as the number of users per server varies. Even a system that can support only 10 clients per server will have a power draw of roughly 35 watts, or a 65% reduction over desktops.
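The per-user rows of Table 5.7 follow from a simple amortization: each user's draw is their client plus a 1/25 share of the server. A sketch reproducing those numbers (the function name is ours):

```python
USERS_PER_SERVER = 25
SERVER_IDLE_W, SERVER_BUSY_W = 130, 270   # whole-server draw, Table 5.7

def anyware_per_user(client_idle_w, client_busy_w, users=USERS_PER_SERVER):
    """(idle, busy) watts for one Anyware user: client + server share."""
    return (client_idle_w + SERVER_IDLE_W / users,
            client_busy_w + SERVER_BUSY_W / users)

print(anyware_per_user(14, 24))   # laptop client: about (19.2, 34.8) W
print(anyware_per_user(13, 22))   # Eee PC client: about (18.2, 32.8) W
```

Rounding these gives exactly the 18-19 W idle and 33-35 W busy ranges in the table, and lowering `users` to 10 pushes the busy figure toward the roughly 35 W point shown in Figure 5.6.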

Figure 5.6: Anyware’s energy savings depend on how many users can share the same server. Total per-user power draw lies within the shaded region.

5.6 Discussion

Anyware shows that it is possible to build an invisibly different enterprise computing infrastructure that consumes only one fifth the energy of systems today. This hybrid computing system, by using both highly energy-efficient clients and high-performance servers, achieves these savings without sacrificing performance and therefore productivity. Furthermore, Anyware supports disconnected operation. If a person takes their client home or the network fails, all that happens is that some tasks that would run better remotely run locally. The system also provides flexibility through its elasticity: the ratio of servers to clients can scale with the share of demanding applications a particular enterprise runs.

The growth of cluster computing saw the emergence of networked and distributed file systems such as AFS and Coda [63]. Anyware extends this to include distributed execution, but in a narrower sense than distributed operating systems such as Amoeba [85], where any process can run anywhere and storage is replicated. Instead, Anyware's disconnection-friendly approach complements the recent trend towards syncing data with the cloud for long-term durability and sharing through services such as Dropbox.

The computing trend of the 1980s and 1990s was towards individual, distributed

resources, as systems moved from mainframes to workstations and personal computers. This was in part due to the non-linear dollar costs of equipment: a mainframe with ten times the capabilities had a price tag more than ten times greater. Now we see a similar non-linearity in power, and Anyware amortizes the energy cost across clients in a manner similar to the old mainframe computing systems. This also resembles the trend towards elastic computing within the cloud – why buy and pay for a server you only occasionally need when you can just use one on demand?

The current Anyware implementation uses existing computing devices. One intellectually interesting question to ask is, were Anyware to become a common computing model, how would one design a client for it? Section 5.4 argued that optimizing I/O is the most critical step. Section 5.5 demonstrated that an Anyware setup with a high-performance I/O architecture can significantly improve performance; it can also reduce the cost of cold caches. Multiple SSD drives, each on a separate I/O channel, can improve read/write throughput. Such an approach would more closely resemble embedded systems and storage area network architectures (e.g. HP's 3PAR [9]) than traditional clients. Solid state drives mean improved performance does not have to cost energy. DDR3 memory dissipates approximately 1-1.5 watts/GB on average [13], and idle power dominates costs [65]. An Anyware client can therefore safely have significant memory resources to run complex applications locally and remain responsive. Mobile devices have been tremendously successful at delivering compute resources at very low energy cost. Many modern laptop CPU chipsets draw about 15 watts when idle and 20-30 when active. At the same time, tablets and smartphones are becoming increasingly capable at power draws of under 10 watts.
An Anyware client can take advantage of the progress made in mobile computing, in combination with features common on standard PCs, to enable an energy-efficient system with elastic resources. Essentially, the Anyware client becomes a storage device, with a low-power processor attached for local applications, and a GPU or perhaps ASICs for video processing, since many local tasks are graphical in nature.

Chapter 6

Methodology Guidelines for Energy Research

This chapter uses the experiences of deploying Powernet and analyzing its data to present a discussion of the methodology of energy-related research. It starts by recounting how the use of off-the-shelf power meters highlighted key desirable features in a plug-level meter. It then examines how the methodology of collecting and analyzing power data can be improved, and concludes by summarizing the steps necessary for a successful power monitoring deployment.

6.1 Power Measurement Hardware

The Powernet deployment of the Watts Up power meters was a valuable experience that informed what an ideal power meter would look like. Using off-the-shelf equipment manifested a number of problems.

The first practical issue was the lack of in-field upgradable firmware. When a bug was discovered in the TCP stack, the only option was to pack up four large boxes of power meters and send them back so that company staff could fix the proprietary code. It took several weeks to receive the repaired meters before the deployment could begin. This roadblock highlights that power meters must be remotely upgradable to minimize the time and effort required to program individual devices.


In the early stages of the Watts Up deployment, it became clear that few offices had an open Ethernet port for each power meter. Many required additional small Ethernet switches and extra cables. The volunteer participants were unhappy with the clutter under their desks, partially due to wiring and partially due to the size of the meters themselves. Each one weighs 2.5 lbs, with a thick, six-foot-long cord leading to a 7" x 4" x 2" base. These observations were a driving force behind the design of a smaller, wireless meter.

Another lesson that pointed to the need for wireless meters was network registrations. In the Gates building, each device must have a MAC address registration to obtain an IP address. Each group within the building has a unique VLAN, so each meter would be statically registered to a group. The network admins were burdened by the process of entering the MAC address and pertinent information for every individual meter; with this much manual configuration, mistakes inevitably happen.

After about 80 meters were deployed, we received an email from an IT administrator stating that "more than half of all DNS lookups emanating from [the three Engineering buildings] to the campus servers" were coming from the power meters. This was a clear signal that, as-is, the power monitoring infrastructure was interfering with the normal operation of the network. The immediate solution for the lack of DNS caching was to go back to each meter, plug it into a laptop via USB, and hard-code the IP address of the Powernet server. The meters were also making ARP requests once per second and overwhelming the monitoring infrastructure. We received another email from the IT staff, pointing out that "[t]he 70 current meters now account for 20% of total daily recorded flows" by the security system. To work around this problem, the logging server was moved to a special VLAN that was not monitored by the network admins.
That resulted in an IP address change, which meant yet another trip to the individual meters to update the hard-coded IP address of the server.

The problems the meter deployment caused to the existing computing infrastructure in the Gates building led to the lesson that monitoring equipment should be separate from the main setup. The Powernet wireless meters were designed to address the challenges identified via the use of the Watts Up meters. Any large power

monitoring deployment should strive for the following practical goals:

• minimal infrastructure requirements – no need for additional Ethernet cables, network switches, or power strips;

• remotely upgradeable software;

• minimal interference with existing network setups when transmitting data.

Meeting these goals can help a deployment be more welcome in the community, be completed faster, and remain cost-effective.

6.2 Methodology Lessons in Power Measurement

Once individual meters are collecting data from devices, there are many deployment parameters that can be adjusted, including the duration and frequency of monitoring and the type and number of devices measured. The extensive Powernet datasets provide an opportunity to investigate some of these choices.

6.2.1 Sampling Frequency

Prior work has rarely discussed in detail the sample interval at which power is measured. If the end goal is to calculate the amount of energy used over a day or week, the frequency is not of great concern. If the goal is to analyze data or to detect changes and anomalies, then sampling speed could make a difference. There are inherent tradeoffs associated with the choice of interval. Less frequent sampling will result in less stress on the measurement infrastructure and a more manageable dataset. The longer the interval between consecutive samples, the larger the risk of missing interesting events in between.

An interesting example appeared during the Powernet deployment, illustrating the point above. Figure 6.1 shows a one-hour timeline of power draw for a Dell Studio XPS desktop. Each datapoint in the graph is the average power over the last minute, for a total of 60 measurements. Minute granularity is not atypical

Figure 6.1: Five-minute averages of power data do not show anything out of the ordinary – the PC is idle at about 95 watts.

Figure 6.2: Power data collected once a second reveals a misbehaving PC. Earlier, 5-minute averages hid the anomaly. In certain use cases it is beneficial to have high-resolution data.

for many commercially available plug-level meters. The flatness of the line indicates that the computer was largely idle for the duration of the data collection period. Furthermore, the value of approximately 100 watts is a reasonable power draw for a full-sized desktop – nothing is out of the ordinary.

Increasing the granularity of measurements paints a completely different picture, as shown in Figure 6.2. This is the same one-hour period, but power samples were recorded once a second, for a total of 3600 measurements. The frequent jump in power draw is immediately obvious, and taking the FFT of the data confirms that the 30-watt spike is regular with a period of one minute. Upon further investigation, we were able to correlate the power measurements with CPU spikes caused by the wireless card on the desktop. A quick search online

Device             #1          #2          % diff
Optiplex 760       60W (9)     34W (24)    43%
Optiplex SX280     68W (12)    56W (8)     18%
Optiplex GX620     71W (8)     63W (13)    11%
Precision T3400    117W (17)   110W (10)   6%
HP 5400zl switch   467W (8)    463W (4)    0.01%

Table 6.1: Average power draw of two different devices of the same model (standard deviation shown in parentheses). Two devices of the same model can differ by as much as 43%. Networking equipment is more uniform than PCs.

revealed the existence of a bug in the Linux drivers for this specific piece of hardware. Turning off the wireless card solved the periodic CPU/power spike.

Concurrent work [39] has found that 10-second intervals are a reasonable choice, capturing power dynamics without overloading the infrastructure. Our experiences showed that, depending on the task, different resolutions are desirable. For many practical uses – visualizing data, computing long-term estimates – even 5-minute averages are useful. Higher-resolution data, on the other hand, are needed for correlating utilization metrics with power draw. In the extreme case, measurements at sub-second scale can be used to study power harmonics. While different deployments will have varying resources and goals, experience with Powernet teaches us that high-resolution data can be valuable not only for energy characterization but also for monitoring for unexpected behavior.
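The periodic spike in Figure 6.2 is straightforward to find programmatically. The dissertation used an FFT; the sketch below uses a plain autocorrelation (same idea, dependency-free) on a synthetic trace shaped like the misbehaving desktop's.

```python
def dominant_period(samples, max_lag=120):
    """Return the lag (in samples) with the highest autocorrelation,
    i.e. the strongest repeating period in the trace."""
    mean = sum(samples) / len(samples)
    xs = [s - mean for s in samples]
    best_lag, best_score = 0, float("-inf")
    for lag in range(1, max_lag + 1):
        score = sum(xs[i] * xs[i + lag] for i in range(len(xs) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic 1 Hz trace: ~95 W idle, with a 30 W spike every 60 seconds,
# mimicking the wireless-driver bug described above.
trace = [95 + (30 if t % 60 == 0 else 0) for t in range(600)]
print(dominant_period(trace))   # 60
```

The same check run on 5-minute averages would find nothing, which is exactly the argument for keeping at least some high-resolution data.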

6.2.2 Device Variations

Prior work has implicitly assumed that instances of the same equipment model or specification have the same power characteristics, simply because of the lack of better data. Under this assumption, measurements taken from one or two devices have been used to reason about other, unmonitored pieces of equipment. Unfortunately, such a methodology can yield inaccurate results.

Powernet data reveal that some types of computing systems can exhibit large variations even when comparing two instances of the same device model. Table 6.1 shows five example devices – four Dell desktop models and one network switch. The two

Dell Optiplex 760 desktops have over 40% difference in their average power draw. In contrast, the two HP switches have almost identical power draw, as well as very low standard deviation over time. In some cases, even though the PCs appear to be the same on the surface, they might have been upgraded with custom components, causing a difference in power draw. Furthermore, while two devices of the same type might have similar motherboards, power supplies, and processors, they can differ in the user workloads they support, leading to different power profiles.

There is no single solution to errors in estimating power when only partial measurements are available. In the case of desktops, it is not surprising that there is great variability, but putting concrete numbers to it can help anticipate inaccuracies. Additionally, one could augment power measurements with other data, such as PC utilization, to get a more accurate understanding of how equipment is used. In the cases when variability is low (e.g. switches), data points from only one or two devices can be treated as much more reliable.

One argument for monitoring a larger number of low-variation devices is anomaly detection. For example, if a set of machines is known to be identical in hardware and usage, power data can highlight unexpected behavior. In one specific case, Powernet was monitoring ten 1U server machines in a single rack. The user expected the same behavior and power characteristics from all the devices – they would either be idle or be running an identical workload in parallel. The initial power data showed very close readings for all machines, with one noticeable exception. One of the servers was drawing 308 watts, roughly 25% more than the 245 watts drawn by the other machines. The utilization data confirmed that all the machines were idle. Running a demanding, balanced workload resulted in the same increase in power for all machines.
Was the special 308W server misconfigured, did it have a malfunctioning component, or was its position in the rack affecting its power draw? To test the latter theory, we swapped the outlier server, which was at the top of the rack, with the bottom server. After the swap, the power consumption of the original server dropped to 245W, while its replacement's increased from 250W to 270W. This observation confirmed the hypothesis that cooling at the top of the rack was an issue, causing the fans to spin faster and longer. Further investigation

showed that a network switch was placed close to the top server, generating additional heat. In this case, utilization alone could not explain the power variation within a server rack. Thus, an infrastructure that senses multiple modalities – power, load, temperature, configuration, fans, etc. – can help create a more complete picture of energy usage.
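A peer-comparison check like the one that caught the 308-watt server can be automated. The sketch below (Python, with hypothetical readings; the `flag_outliers` helper and its 15% threshold are our own illustrative choices, not part of Powernet) flags any device in a nominally identical group whose mean power strays too far from the group median.

```python
# Peer-group anomaly detection for nominally identical machines: flag any
# device whose average power deviates from the group median by more than
# a threshold. The readings below are hypothetical, echoing the rack
# example in the text (one idle server near 308 W, the rest near 245 W).

from statistics import median

def flag_outliers(avg_power_w, threshold=0.15):
    """Return names of devices whose mean power deviates more than
    `threshold` (fraction) from the group median.
    avg_power_w maps device name -> mean power in watts."""
    med = median(avg_power_w.values())
    return sorted(
        name for name, p in avg_power_w.items()
        if abs(p - med) / med > threshold
    )

rack = {f"server-{i}": 245.0 for i in range(9)}
rack["server-9"] = 308.0  # the top-of-rack machine from the text

print(flag_outliers(rack))  # -> ['server-9']
```

With richer sensing, the same comparison could be repeated per modality (power, fan speed, temperature) to narrow down the cause of a flagged deviation.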

6.2.3 Device Sample Size

In most power monitoring studies it is not practical to measure every single device. For example, the Gates building has an estimated 2,000+ computing devices, but Powernet only had about 200–300 plug-level meters. This raises the question of the accuracy of a prediction for a whole device class, given limited data. Figure 6.3, the Powernet curve, shows the wide distribution of desktop power. It is worth considering what errors can be expected if one were to sample only part of the PC population. We use ground-truth data from 69 desktops to show how the expected error of average power draw changes when it is based on random samples from the population. The average power draw of the 69 desktops is 109 watts. We generate 1,000,000 random samples of size 5, 10, and 20, drawing from the list of 69 machines. Figure 6.4 shows the resulting histograms of estimated average power. Samples of only 5 desktops can have more than 16% error in estimating the mean power draw. Increasing the sample size from 5 to 20 machines cuts the error by more than half. The lesson is that when it comes to PCs, a small sample is not desirable if trying to extrapolate to a large, heterogeneous set. Recent work [36] correctly brings attention to the importance of complete device inventories in order to understand how varied an environment is and to target measurement points accordingly.
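The sampling experiment above is straightforward to reproduce. The sketch below uses a synthetic population of 69 power readings (the real study used measured ground truth averaging 109 watts) and 20,000 trials per sample size instead of 1,000,000, purely to keep the run short; the qualitative result – worst-case error shrinking as the sample grows – is the same.

```python
# Monte Carlo version of the sample-size experiment: draw repeated random
# samples of 5, 10, and 20 machines from a 69-desktop population and look
# at the spread of the estimated mean power.

import random

random.seed(1)
# Hypothetical population of 69 desktop power readings (watts).
population = [random.uniform(40, 250) for _ in range(69)]

def worst_case_error(pop, k, trials=20000):
    """Worst relative error of the sample mean over many random samples
    of size k (a scaled-down stand-in for the 1,000,000-sample study)."""
    true_mean = sum(pop) / len(pop)
    return max(
        abs(sum(random.sample(pop, k)) / k - true_mean) / true_mean
        for _ in range(trials)
    )

for k in (5, 10, 20):
    err = worst_case_error(population, k)
    print(f"sample size {k:2d}: worst-case error {err:5.1%}")
```

The exact error figures depend on the population's spread; the monotone trend across sample sizes is what carries over to the measured data.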

6.2.4 Duration of Measurements

The Powernet datasets show that while the base load of computing systems is consistently high, month-to-month variations do exist. These changes result in slightly different energy use throughout the month and year. For example, over one year the

Figure 6.3: PCs have a wide distribution of power draws, making it critical to have multiple measurement points.

Figure 6.4: Desktop diversity requires the measurement of a large sample of the population. In this experiment, if only 5 desktops are used to estimate the power of all 69, then the expected error is over 16%.

Figure 6.5: As the number of months of data increases, the standard deviation of the error in estimates decreases. Even if only one month of data is used over 16 desktops, the yearly approximation will be within 4% of the true value of $1600.

monthly power draw average of one desktop varied from 183W (min) to 293W (max), with a 216-watt average over the whole year. Another PC was consistently between 247 and 257 watts. One question to tackle is 'How does the duration of power measurements affect a yearly cost estimate for some set of devices?' Our analysis uses data from 16 desktops; each PC was monitored for one year, from May 2010 to April 2011. The cumulative average power draw of the sixteen PCs is 1524 watts ($1600 for the whole year, at $0.11 per kWh). Examining the monthly average power draw of each machine reveals that no single month is representative of the whole year. If we were to take one day, week, or month of data in the hopes of estimating the yearly electricity cost, we should expect to be off. But by how much? Using a single day of data from the year-long trace allows us to generate over 350 different estimates for the yearly cost. Similarly, for one week or month of data, we can compute multiple cost estimates. We can also repeat the process using sliding windows with sizes of two to 11 months. Figure 6.5 summarizes the results. The x-axis shows what duration of data was used for the estimate. The y-axis shows the average and maximum error of the estimate as a percent of the real energy cost. In a worst-case scenario, measurements taken for less than a week can have an error of 15% or more. At the scale of a building's IT systems, such error in predicting costs can amount to thousands of dollars. On the positive side, the analysis shows that collecting data at the month timescale, as opposed to longer, could yield data with an acceptable error. These results are in line with concurrent work at Lawrence Berkeley National Labs [39], which found that two months of data yields an acceptable tradeoff between deployment effort and accuracy.
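The sliding-window analysis can be sketched as follows. The monthly power values below are hypothetical (only the 183W minimum and 293W maximum echo the desktop described above); the $0.11/kWh rate is the one used in the text.

```python
# Sliding-window version of the yearly-cost experiment: estimate annual
# electricity cost from n consecutive months of data and compare against
# the true 12-month figure.

HOURS_PER_YEAR = 365 * 24
RATE = 0.11  # dollars per kWh, as in the text

def yearly_cost(avg_power_w):
    """Annual cost of a constant draw of avg_power_w watts."""
    return avg_power_w / 1000.0 * HOURS_PER_YEAR * RATE

# Hypothetical monthly average draw (watts) for one desktop over a year.
monthly_w = [183, 190, 210, 250, 293, 240, 200, 195, 230, 215, 205, 185]
true_cost = yearly_cost(sum(monthly_w) / 12)

def worst_error(window):
    """Worst relative error over all windows of `window` consecutive
    months (wrapping around the year)."""
    errs = []
    for start in range(12):
        win = [monthly_w[(start + i) % 12] for i in range(window)]
        est = yearly_cost(sum(win) / window)
        errs.append(abs(est - true_cost) / true_cost)
    return max(errs)

for n in (1, 3, 6, 11):
    print(f"{n:2d} month(s) of data: worst-case error {worst_error(n):5.1%}")
```

As in Figure 6.5, longer windows shrink the worst case: a single unlucky month can miss the annual cost badly, while an 11-month window is off by only a few percent.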

6.3 Four Steps to Characterizing Energy Use

The prior section pointed out several data and methodology considerations which, if not thought out carefully, can lead to inaccurate analysis of energy use. The results from Powernet, however, represent only one point in time. As computing continues to evolve, green computing research will need to periodically re-measure energy consumption and waste. This raises the follow-up question: 'Given limited time, money, and effort, how should one measure computing system energy consumption in order to minimize error?' This section presents methodology guidelines to aid future green computing research.

Step 1: Characterization

Not all device classes are equal: some require much more effort to measure accurately than others. Table 6.1, for example, showed a 43% variation in the power draw of Optiplex 760 PCs but a 0.01% variation in the power draw of HP 5400zl switches. An approximate ordering of the different devices in terms of variability places desktops as the most diverse, followed by servers, laptops, LCD monitors, and lastly, switches. Rather than distribute measurement points uniformly, one should measure the high-variation device classes more densely. But device classes change quickly: Dell, for example, no longer sells Optiplex 760 PCs. Determining which device classes have significant variation therefore requires up-to-date measurements. To understand where to measure, one first needs to know which device classes are high variation and which are not. This can be done quickly, as a series of point measurements made over a day. For example, suppose that an enterprise has a large

Figure 6.6: A week-long trace of power consumption and CPU utilization shows how well the two track each other, with r² = 0.996.

number of a new Dell PC model. One can randomly select 10 of these PCs and measure each of them booting. This will provide a large dynamic range of power measurements within the class as well as across the class. If the 10 show significant differences, then the class might need to be measured densely. One can use the observed power draw distributions to statistically compute which deployment of sensors will lead to the lowest observed error. These point measurements should use simple digital readouts (e.g., Watt's Up or Kill-A-Watt meters) which a person reads and writes down. Depending on a wireless mesh or wired network ports is probably more trouble than it's worth (lack of connectivity, VLANs, etc.).
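One way to turn such point measurements into a sensor plan is proportional (Neyman-style) allocation from survey sampling: give each device class a share of the meter budget proportional to its population times its observed power variability. The class counts and standard deviations below are hypothetical; only the ordering of variability (desktops most diverse, switches least) follows the Powernet results.

```python
# Allocate a fixed budget of plug meters across device classes in
# proportion to (population size) x (power standard deviation), so that
# high-variation, numerous classes get measured most densely.

def allocate_meters(classes, budget):
    """classes: name -> (population, stddev_watts).
    Returns name -> number of meters, each class getting at least one.
    Rounding may leave the total slightly above or below the budget."""
    weight = {n: pop * sd for n, (pop, sd) in classes.items()}
    total = sum(weight.values())
    return {n: max(1, round(budget * w / total)) for n, w in weight.items()}

# Hypothetical inventory: (device count, observed stddev of power in watts).
classes = {
    "desktop": (800, 45.0),
    "server":  (150, 20.0),
    "laptop":  (600, 8.0),
    "lcd":     (900, 5.0),
    "switch":  (60, 0.1),
}
print(allocate_meters(classes, budget=200))
```

Under these assumed numbers, nearly all meters go to desktops, while the near-constant switches get a single token meter – matching the intuition in the text that uniform placement wastes measurement effort.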

Step 2: Measurement

Once a short-term characterization study has provided guidance on where to deploy sensors, they need to be deployed for a sufficient duration. We were able to use custom sensors and our own software to collect data over a wireless mesh, but this technology is not commonly available. Our experience with Watt's Up meters – coordination with IT infrastructure, reconfiguration, failure, etc. – was that they are a poor choice for a very large, long-term deployment, but are acceptable at smaller scale. The results in Figure 6.5 showed that energy consumption, especially for personal computing, changes significantly over time. One should measure for at least a week, and preferably for a month. After a month, the expected error, even for high-variation devices, drops to 4%.

In addition, while using power data from one machine to estimate another's can be problematic, CPU load can sometimes be used as a proxy for power on a single, calibrated machine. Figure 6.6 shows one week of power and CPU data for a desktop. Visually, it is immediately noticeable that CPU tracks power very closely, with r² = 0.996. Therefore, in the context of desktops, one can collect a limited set of power meter measurements followed by the use of software sensors that report a feature (CPU) that is closely correlated with power. Overall, given the choice between breadth (number of devices measured) and depth (length of measurement), greater breadth generally leads to more accurate results. At the extreme, it is better to gain a single point measurement of every device than to measure one device for a year.
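The calibration step can be sketched as an ordinary least-squares fit of power against CPU load. The (CPU, watts) pairs below are synthetic; on a real machine they would come from a short period with a plug meter attached, after which the software sensor alone can predict power.

```python
# Using CPU load as a power proxy: calibrate a per-machine linear model
# power ~= a + b * cpu from a brief metered period, then predict power
# from CPU-only software sensors. The strong correlation reported in the
# text (r^2 = 0.996) is what motivates a single-feature model.

def fit_linear(xs, ys):
    """Ordinary least-squares fit; returns (intercept a, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Synthetic calibration pairs (CPU %, measured watts) for one desktop.
cpu = [2, 5, 10, 25, 50, 75, 100]
watt = [88, 90, 94, 105, 124, 143, 162]

a, b = fit_linear(cpu, watt)
print(f"power ~= {a:.1f} + {b:.2f} * cpu")
print(f"predicted at 60% cpu: {a + b * 60:.1f} W")
```

The fitted intercept is the machine's idle power; the slope captures how much each percent of CPU load adds, which differs per machine and is why the model must be calibrated individually.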

Step 3: Extrapolation

The third step is to take the set of biased measurements and extrapolate to whole-system power. Our experiences with Powernet have highlighted the need for data beyond power and utilization measurements. If extrapolation is to be successful, one also needs metadata in the form of equipment inventories and descriptions. Surprisingly, such metadata is not nearly as complete and readily available as we had hoped. Rather, we had to resort to indirect sources such as cross-correlating networked device registrations with active IPs on the network. In the future, green computing researchers should encourage IT personnel to keep updated and detailed records of what equipment is added to a building. Such record-keeping can ensure that estimates are meaningful and that equipment purchases are kept in check.

Step 4: Beyond Power Data

In addition to collecting and analyzing power data, we recommend a focus on additional information about how computing systems are used. Complementing power data with other modalities is critical for understanding efficiency and pinpointing waste. For computers, we find that CPU utilization numbers, lists of common processes, and an understanding of common user tasks are valuable in assessing the true compute needs of users. On the network side, load data can reveal whether there is unused capacity that can be taken advantage of. Going past low-level utilization metrics, data on power-saving settings, building occupancy, and purchasing decisions can also add value to the power data analysis.

6.4 External Datasets: A Cautionary Tale

Until now, this section discussed how to get the most out of a power monitoring deployment, with minimum effort and maximum accuracy. Unfortunately, large data-gathering efforts like Powernet might not always be possible. Instead, IT staff might rely on existing datasets to support equipment and infrastructure changes. Perhaps the largest openly-available dataset of computing power draw is provided by Energy Star [5], which collects measurements from companies and products that seek its certification. Energy Star data should be used cautiously for several reasons. First, it is composed entirely of devices that have passed minimum energy efficiency requirements. This means that it does not reflect the distribution of devices sold, and the data is self-reported. Furthermore, Energy Star measurements and certification do not consider PCs under load – they only deal with idle, sleep, and off states. Recently, measurements have started including a peak power draw value, but that is still not enough to tell what the expected power draw will be during typical usage. Figure 6.7 illustrates the divide between Energy Star data and the real-world measurements collected by Powernet. The Energy Star dataset has the benefit of many more data points, and real-world distributions might shift from building to building. Despite having far fewer data points, the Powernet dataset captures a greater variety of devices. The differences between the two datasets are easy to observe for both laptops and desktops. In Figure 6.7(a), close to 100% of the 4,000+ Energy Star desktops fall below the 100-watt cutoff. In our measurements, that is the median PC power draw.

(a) Desktop computers

(b) Laptop computers

Figure 6.7: The Energy Star datasets for both desktops and laptops show lower mean and median values for power draw. The difference in desktops is particularly large, misrepresenting the real world. Additionally, the Energy Star dataset underestimates the energy savings a desktop-to-laptop switch could have.

The divergence in the laptop dataset is about 30% for the median power draw. The conclusion is that using Energy Star data in lieu of real measurements is likely to underestimate energy costs in most contexts. On the positive side, the data can be useful in advocating lower-power machines. Another way in which Energy Star data can be misleading is the comparison between the two types of computing devices. The 50th-percentile difference between desktops and laptops is much greater for the Powernet data – 83 watts compared to 34 watts. In an enterprise scenario where different hardware options are being weighed, the Energy Star dataset is likely to de-emphasize the tremendous savings to be had from switching away from desktops to laptops instead. Different datasets will have different benefits and drawbacks; the key is understanding how and what data was collected before analyzing it and using it to make decisions. Without a doubt, the best possible scenario is a repository of many and varied datasets that help build a more complete picture of computer energy consumption.

Chapter 7

Conclusion

This dissertation presents a novel system architecture to greatly improve the energy efficiency of enterprise computing. This architecture is informed by an extensive study of the energy cost and utilization of enterprise computing.

7.1 Contributions

The thesis put forward in this work is that:

Enterprise compute systems waste a huge amount of energy. A novel architecture can remove this waste without sacrificing performance.

We argued that in order to pinpoint waste, it is necessary to understand where electricity is going and what user workloads are powered by it. Therefore, Chapter 3 described a large-scale deployment, Powernet, that monitors the energy use and utilization of a variety of computing devices. Powernet comprises over 200 power meters, each connected to an individual piece of equipment, and data are collected once per second over many months. Such detailed, long-term data allowed us to study the energy profile of computing devices at various time scales. Software sensors provided utilization data in the form of CPU load and network traffic, which was combined with power measurements to identify how energy is spent. Chapter 4 presented a detailed analysis of Powernet's datasets, revealing the high cost of computing for even moderate workloads.

Since the IT infrastructure of the Gates building is a single data point, representing a desktop-heavy environment, we used Powernet to gather data from a different compute setup. Thin clients differ from more classical systems in that they have limited local resources. Instead, user workloads are consolidated on back-end servers, allowing for easier management and reduction in energy use. The existence of successful thin client deployments in the enterprise is proof that most office workloads do not require the resources of a desktop. Thin clients, however, limit productivity. Their complete reliance on a backend server means that no work can be done locally. For tasks that benefit from hardware optimizations or that require large transfers of data (e.g., video), thin clients underperform, hurting users' experience. Furthermore, Powernet data showed that the power draw of thin clients is comparable to that of many laptops.

A key idea explored in this dissertation is that a system can combine the low-power local resources of a laptop with the thin client's ability to scale up performance and consolidate workloads via shared servers. Chapter 5 presented the design of Anyware, a system that trades off local computing power for a small number of powerful servers and increased network utilization. The former is amortized across many clients, while the latter essentially comes for free, as network power is independent of its utilization. Anyware applications run remotely by having very thin virtual machines hosted on remote servers that use the client's local file system. Clients can automatically find remote Anyware resources using DNS with service discovery.
Once a user's machine is connected to the local network, Anyware is auto-configured and starts running applications remotely without user intervention. We derived an offloading policy based on a logistic regression model of multiple application properties and a user study of perceived performance. Anyware's design revealed the importance of energy-efficient but fast I/O, suggesting a future personal computing device centered around solid-state drives and low-power processors. An evaluation of

Anyware’s functional prototype showed that it reduces the energy costs of enterprise computing by over 75% without harming application performance and in some cases even improving it.
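The placement decision summarized above can be illustrated with a minimal logistic model: application features go through a linear model and a sigmoid, and scores above 0.5 run remotely. The features, weights, and threshold below are invented for illustration; they are not the trained model from the Anyware study.

```python
# Minimal sketch of a logistic-regression placement decision of the kind
# described in the text. Feature values are normalized to [0, 1]; the
# weights and bias are illustrative, not learned from real data.

import math

WEIGHTS = {"cpu_demand": 2.0, "io_mb_per_s": -1.5, "interactive": -0.8}
BIAS = -0.5

def offload_probability(features):
    """Sigmoid of the weighted feature sum: probability the task should
    run on the backend server."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def place(features):
    return "remote" if offload_probability(features) > 0.5 else "local"

compile_job = {"cpu_demand": 0.9, "io_mb_per_s": 0.1, "interactive": 0.0}
video_play = {"cpu_demand": 0.3, "io_mb_per_s": 0.9, "interactive": 1.0}

print(place(compile_job))  # CPU-heavy, little I/O: goes remote
print(place(video_play))   # I/O-heavy and interactive: stays local
```

The sign structure mirrors the dissertation's findings: high CPU demand favors the server, while heavy data transfer and interactivity (e.g., video) favor local execution.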

7.2 Going Forward

Anyware's approach of distributed execution and centralized storage, potentially backed up to the cloud, is a novel design point in the end-user computing space. It is motivated by the changing tradeoffs between energy, network use and speed, and the workloads we see in enterprises today. Currently, Anyware's execution placement algorithm makes a binary decision – a task is either sent to the remote server or it is not. One important direction for future work is extending the model to have a classification output that provides more flexibility. If applications have different offloading priorities, one can formulate and test a larger set of policies. For example, if a spreadsheet application is classified as remote but with low priority, Anyware can consider other factors, such as server load, before making a decision. In this dissertation, we only explored Anyware's performance with a single server. In practice, it is possible that a client machine has multiple servers to choose from. A combination of server load data and application priorities will allow the Anyware daemon to choose the best execution environment for each task. A spreadsheet application might be offloaded to a server that is already at 60% of its resource use, while a CPU-intensive task will be sent to a more lightly used backend machine. Server resources are only part of the equation. A more advanced algorithm will also incorporate the local state when making decisions. In a scenario in which the client machine is not doing much work, it might make sense to keep tasks local. If, however, the client is already using a lot of its local resources by executing a video player, a browser, and an email application, it makes sense to start offloading more aggressively. Finding the point at which more and more tasks become unusable locally is an interesting area of future research. In addition to adapting Anyware to the availability of local and remote resources,

we also consider adapting to users. The existing placement model is trained and tested on applications, and it assumes that the same offloading decision is appropriate for all users. In practice, people interact differently with computers and have varying workloads. One idea is to seed Anyware with a default policy that evolves over time by taking user feedback and employing techniques. Lastly, the proliferation of mobile devices presents another opportunity to rethink enterprise computing systems. Today, the same office task can be accomplished on a variety of platforms, ranging from PCs and laptops, to netbooks that use exclusively the cloud, to tablets and smartphones. Anyware has the potential to take advantage of this array of devices. Each option offers a slightly different set of capabilities at different energy points, and any one can play the role of an Anyware client. In the extreme, a user can have access to multiple servers and multiple clients. A challenging direction of future research is studying how Anyware can be modified to provide seamless computing between multiple types of devices. Going forward, it will be exciting to see how lower-power devices (such as phones and tablets) fit into the vision of hybrid, elastic computing.

Bibliography

[1] 3D Computer Benchmarking. http://en.wikipedia.org/wiki/3DMark.

[2] ADE7753: Single Phase Multifunction Energy Metering IC with di/dt Input. www.analogdevices.com.

[3] Arch Rock. www.archrock.com.

[4] Avahi Zeroconf Software. http://en.wikipedia.org/wiki/Avahi_(software).

[5] Energy Star. http://www.energystar.gov.

[6] Epic: An Open Mote Platform for Application-Driven Design. http://www.cs.berkeley.edu/~prabal/projects/epic/.

[7] Google Chromebook. http://www.google.com/intl/en/chrome/devices/.

[8] Google Power Meter. www.google.org/powermeter.

[9] HP 3Par Architecture. http://h18006.www1.hp.com/storage/solutions/3par/architecture.html.

[10] IEEE 802.15: Wireless Personal Area Networks (PANs). http://standards.ieee.org/about/get/802/802.15.html.

[11] Logistic Regression. http://en.wikipedia.org/wiki/Logistic_regression.

[12] Lucid Design Group Building Dashboard. www.luciddesigngroup.com.


[13] Micron 2Gb: x4, x8, x16 DDR3 SDRAM. Datasheet MT41J128M16HA-125, Micron, 2010. http://download.micron.com/pdf/datasheets/dram/ddr3/2Gb_DDR3_SDRAM.pdf.

[14] Microsoft Windows Remote Desktop. http://www.microsoft.com/.

[15] NX Distributed Computing Infrastructure. http://www.nomachine.com/documentation/html/intr-technology.html.

[16] Plugwise. www.plugwise.com.

[17] Public Powernet Datasets. http://sing.stanford.edu/maria/powernet.

[18] RealVNC - VNC Remote Control Software. http://www.realvnc.com/.

[19] Sentilla Energy Management. http://www.sentilla.com.

[20] Speed Matters for Google Web Search. http://googleresearch.blogspot.com/2009/06/speed-matters.html.

[21] Sun Fire X4270 M2 Server by Oracle. http://www.oracle.com/us/products/servers-storage/servers//sun-fire-x4270-m2-server-ds-079882.pdf.

[22] TelosB Wireless Mote. http://bullseye.xbow.com:81/Products/productdetails.aspx?sid=252.

[23] VMWare Desktop . http://www.vmware.com/solutions/desktop/.

[24] Wake-on-LAN Technology. https://en.wikipedia.org/wiki/Wake-on-LAN.

[25] Energy Star Computer Power Data. https://energystar.gov/products/specs/node/143, 2010.

[26] PG&E Smart Meter. www.pge.com/smartmeter/, 2013.

[27] Yuvraj Agarwal, Steve Hodges, Ranveer Chandra, James Scott, Paramvir Bahl, and Rajesh Gupta. Somniloquy: Augmenting Network Interfaces to Reduce PC Energy Usage. In NSDI'09, 2009.

[28] Yuvraj Agarwal, Stefan Savage, and Rajesh Gupta. SleepServer: A Software-Only Approach for Reducing the Energy Consumption of PCs within Enterprise Environments. USENIX Annual Technical Conference, 2010.

[29] Yuvraj Agarwal, Thomas Weng, and Rajesh Gupta. The Energy Dashboard: Improving the Visibility of Energy Consumption at a Campus-Wide Scale. ACM Workshop On Embedded Sensing Systems For Energy-Efficiency In Buildings, 2009.

[30] D.G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A Fast Array of Wimpy Nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, 2009.

[31] Rajesh Krishna Balan, Mahadev Satyanarayanan, So Young Park, and Tadashi Okoshi. Tactics-based Remote Execution for Mobile Computing. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys '03, pages 273–286, New York, NY, USA, 2003. ACM.

[32] R.A. Baratto, L.N. Kim, and Jason Nieh. THINC: A Virtual Display Architecture for Thin-client Computing. ACM SIGOPS Operating Systems Review, 39(5):290, 2005.

[33] Luiz André Barroso and Urs Hölzle. The Case for Energy-Proportional Computing. Computer, 40(12):33–37, 2007.

[34] Gowtham Bellala, Manish Marwah, Martin Arlitt, Geoff Lyon, and Cullen E. Bash. Towards an Understanding of Campus-scale Power Consumption. In Proceedings of the Third ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, BuildSys '11, pages 73–78, New York, NY, USA, 2011. ACM.

[35] Nilton Bila, Eyal De Lara, Matti Hiltunen, Kaustubh Joshi, and H.A. Lagar. The Case for Energy-Oriented Partial Desktop Migration. 2nd USENIX Workshop on Hot Topics in Cloud Computing, 2010.

[36] Richard Brown, Steven Lanzisera, Hoi Ying (Iris) Cheung, Judy Lai, Xiaofan Jiang, Stephen Dawson-Haggerty, Jay Taneja, Jorge Ortiz, and David Culler. Using Wireless Power Meters to Measure Energy Use of Miscellaneous and Electric Devices in Buildings. Energy Efficiency in Domestic Appliances and Lighting, 2011.

[37] Sean Carroll. The Cosmological Constant. http://preposterousuniverse.com/writings/encyc/encycpdf.pdf.

[38] S. Cheshire and M. Krochmal. DNS-Based Service Discovery. RFC 6763 (Proposed Standard), February 2013.

[39] H.Y. Iris Cheung, Steven Lanzisera, Judy Lai, Richard Brown, Stephen Dawson-Haggerty, Jay Taneja, and David Culler. Detailed Energy Data Collection for Miscellaneous and Electronic Loads in a Commercial Office Building. To appear in ACEEE Summer Study on Energy Efficiency in Buildings, August 2012.

[40] Kenneth Christensen, Chamara Gunaratne, Bruce Nordman, and Alan George. The Next Frontier for Communications Networks: . Computer Communications, 2004.

[41] Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik, and Ashwin Patti. CloneCloud: Elastic Execution between Mobile Device and Cloud. In Proceedings of the Sixth Conference on Computer Systems, EuroSys '11. ACM, 2011.

[42] Byung-Gon Chun and Petros Maniatis. Augmented Smartphone Applications Through Clone Cloud Execution. In Proceedings of the 12th Conference on Hot Topics in Operating Systems, HotOS'09, pages 8–8, Berkeley, CA, USA, 2009. USENIX Association.

[43] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, MobiSys '10, New York, NY, USA, 2010. ACM.

[44] Tathagata Das, Pradeep Padaala, Venkata Padmanabhan, Ramachandran Ramjee, and Kang Shin. LiteGreen: Saving Energy in Networked Desktops Using Virtualization. USENIX Annual Technical Conference, 2009.

[45] Stephen Dawson-Haggerty, Steven Lanzisera, Jay Taneja, Richard Brown, and David Culler. @scale: Insights from a Large, Long-Lived Appliance Energy WSN. Conference on Information Processing in Sensor Networks, SPOTS Track, 2012.

[46] Department of Energy, Annual Energy Review 2011. http://www.eia.doe.gov/aer/, September 2012.

[47] John R. Douceur, Jeremy Elson, Jon Howell, and Jacob R. Lorch. The Utility Coprocessor: Massively Parallel Computation from the Coffee Shop. In USENIX Annual Technical Conference, 2010.

[48] Prabal Dutta, Jay Taneja, Jaein Jeong, Xiaofan Jiang, and David Culler. A Building Block Approach to Sensornet Systems. In Proceedings of the Sixth ACM Conference on Embedded Networked Sensor Systems (SenSys'08), 2008.

[49] Federal Agencies' EPEAT Purchasing Takes Off. EPEAT Press Release, April 2009.

[50] Philip Levis et al. T2: A Second Generation OS For Embedded Sensor Networks. Technical Report TKN-05-007, Telecommunication Networks Group, Technische Universität Berlin, 2005.

[51] Omprakash Gnawali, Rodrigo Fonseca, Kyle Jamieson, David Moss, and Philip Levis. Collection Tree Protocol. In Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems (SenSys'09), November 2009.

[52] Mark S. Gordon, D. Anoushe Jamshidi, Scott Mahlke, Z. Morley Mao, and Xu Chen. COMET: Code Offload by Migrating Execution Transparently. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 93–106, Berkeley, CA, USA, 2012. USENIX Association.

[53] Wayne D. Gray and Deborah A. Boehm-Davis. Milliseconds Matter: An Introduction to Microstrategies and to Their Use in Describing and Predicting Interactive Behavior. 2000.

[54] Philip J. Guo. CDE: Run Any Linux Application On-Demand Without Installation. In USENIX Large Installation System Administration Conference, 2011.

[55] Brandon Heller, Yiannis Yiakoumis, Srini Seetharaman, Puneet Sharma, Priya Mahadevan, Sujata Banerjee, and Nick McKeown. ElasticTree: Saving Energy in Data Center Networks. In Networked Systems Design and Implementation. USENIX Association, 2010.

[56] C.H. Hsu and U. Kremer. The Design, Implementation, and Evaluation of a Algorithm for CPU Energy Reduction. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, page 48. ACM, 2003.

[57] Jonathan W. Hui and David Culler. The Dynamic Behavior of a Data Dissemination Protocol for Network Programming at Scale. In Proceedings of the Second ACM Conference on Embedded Networked Sensor Systems (SenSys), 2004.

[58] Xiaofan Jiang, Stephen Dawson-Haggerty, Prabal Dutta, and David Culler. Design and Implementation of a High-Fidelity AC Metering Network. In The 8th ACM/IEEE International Conference on Information Processing in Sensor Networks, San Francisco, CA, USA, 2009.

[59] Xiaofan Jiang, Minh Van Ly, Jay Taneja, Prabal Dutta, and David Culler. Experiences with a High-Fidelity Wireless Building Energy Auditing Network. In Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems - SenSys '09, page 113, New York, New York, USA, 2009. ACM Press.

[60] Maria Kazandjieva, Brandon Heller, Omprakash Gnawali, Wanja Hofer, Philip Levis, and Christos Kozyrakis. Software or hardware: The Future of Green Enterprise Computing. Technical Report CS TR 2011-02, Stanford, July 2011.

[61] Maria Kazandjieva, Brandon Heller, Omprakash Gnawali, Philip Levis, and Christos Kozyrakis. Green Enterprise Computing Data: Assumptions and Real- ities. In International Green Computing Conference,2012.

[62] Maria Kazandjieva, Brandon Heller, Philip Levis, and Christos Kozyrakis. En- ergy Dumpster Diving. In Workshop on Power Aware Computing and Systems (HotPower’09),2009.

[63] James J. Kistler and M. Satyanarayanan. Disconnected Operation in the Coda File System. ACM Trans. Comput. Syst.,10(1):3–25,February1992.

[64] P. Mahadevan, S. Banerjee, and P. Sharma. Energy Proportionality of an Enter- prise Network. In Proceedings of the first ACM SIGCOMM workshop on Green networking,pages53–60.ACM,2010.

[65] Krishna T. Malladi, Benjamin C. Lee, Frank A. Nothaft, Christos Kozyrakis, Karthika Periyathambi, and Mark Horowitz. Towards Energy-proportional Dat- acenter Memory with Mobile DRAM. In Proceedings of the 39th International Symposium on , ISCA ’12, pages 37–48, Piscataway, NJ, USA, 2012. IEEE Press.

[66] John C. McCullough, Yuvraj Agarwal, Jaideep Chandrashekar, Sathyanarayan Kuppuswamy, Alex C. Snoeren, and Rajesh K. Gupta. Evaluating the Effec- tiveness of Model-based Power Characterization. In Proceedings of the 2011 USENIX conference on USENIX annual technical conference, USENIXATC’11, pages 12–12, Berkeley, CA, USA, 2011. USENIX Association. BIBLIOGRAPHY 118

[67] Sergiu Nedevschi, Sylvia Ratnasamy, Jaideep Chandrashekar, Bruce Nordman, and Nina Taft. Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems. In Networked Systems Design and Implementation,2009.

[68] David Nichols. Using Idle Workstation in a Shared Computing Environment. Proceedings of the Eleventh ACM Symposium on Operating Systems Principles, SOSP’87,1987.

[69] Bruce Nordman and Ken Christensen. Improving the Energy Efficiency of Ethernet-Connected Devices: A Proposal for Proxying. Ethernet Alliance white paper, 2007.

[70] Oracle. Sun Ray 2 Virtual Display Client. http://www.oracle.com/us/products/servers-storage/desktop-workstations/030726.htm, pages 1–4, 2010.

[71] Steven Osman, Dinesh Subhraveti, Gong Su, and Jason Nieh. The Design and Implementation of Zap: a System for Migrating Computing Environments. SIGOPS Oper. Syst. Rev., 36(SI):361–376, December 2002.

[72] John K. Ousterhout, Andrew R. Cherenson, Frederick Douglis, Michael N. Nelson, and Brent B. Welch. The Sprite Network Operating System. Computer, 21(2):23–36, February 1988.

[73] James Pierce, Diane Schiano, and Eric Paulos. Home, Habits, and Energy: Deconstructing Domestic Interactions and Energy Consumption. In ACM Conference on Human Factors in Computing Systems, Atlanta, GA, USA, 2010.

[74] Processor Performance and Power Data. http://www.notebookcheck.net.

[75] Asfandyar Qureshi, Rick Weber, Hari Balakrishnan, John Guttag, and Bruce Maggs. Cutting the Electric Bill for Internet-scale Systems. In SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication, pages 123–134, New York, NY, USA, 2009.

[76] Joshua Reich, Michel Goraczko, Aman Kansal, and Jitendra Padhye. Sleepless in Seattle No Longer. USENIX Annual Technical Conference, 2010.

[77] Suzanne Rivoire, Parthasarathy Ranganathan, and Christos Kozyrakis. A Comparison of High-level Full-system Power Models. In Proceedings of the 2008 conference on Power aware computing and systems, HotPower'08, pages 3–3, Berkeley, CA, USA, 2008. USENIX Association.

[78] Suzanne Rivoire, Mehul A. Shah, Parthasarathy Ranganathan, and Christos Kozyrakis. JouleSort: a Balanced Energy-efficiency Benchmark. In Chee Yong Chan, Beng Chin Ooi, and Aoying Zhou, editors, SIGMOD Conference, pages 365–376. ACM, 2007.

[79] Mahadev Satyanarayanan, Paramvir Bahl, Ramón Cáceres, and Nigel Davies. The Case for VM-Based Cloudlets in Mobile Computing. IEEE Pervasive Computing, 8(4):14–23, October 2009.

[80] Siddhartha Sen, Jacob R. Lorch, Richard Hughes, Carlos Garcia Jurado Suarez, Brian Zill, Weverton Cordeiro, and Jitendra Padhye. Don't Lose Sleep Over Availability: The GreenUp Decentralized Wakeup Service. In Networked Systems Design and Implementation, 2012.

[81] Power Management Software Helps Slash Energy Costs. http://www.windowsitpro.com/article/news2/power-management-software-helps-slash-energy-costs.aspx, 2008.

[82] After-hours Power Status of Office Equipment and Inventory of Miscellaneous Plug-Load Equipment. Lawrence Berkeley Laboratory, http://enduse.lbl.gov/info/LBNL-53729.pdf, 2004.

[83] Steven Lanzisera, Stephen Dawson-Haggerty, H.Y. Iris Cheung, Jay Taneja, David Culler, and Richard Brown. Methods for Detailed Energy Data Collection of Miscellaneous and Electronic Loads in a Commercial Office Building. Building and Environment, 65:170–177, 2013.

[84] Yuwen Sun, Lucas Francisco Wanner, and Mani B. Srivastava. Low-cost Estimation of Sub-system Power. In International Green Computing Conference (IGCC), pages 1–10, 2012.

[85] Andrew S. Tanenbaum and Sape J. Mullender. An Overview of the Amoeba Distributed Operating System.

[86] Marvin M. Theimer, Keith A. Lantz, and David R. Cheriton. Preemptable Remote Execution Facilities for the V-system. In Proceedings of the tenth ACM symposium on Operating systems principles, SOSP ’85, pages 2–12, New York, NY, USA, 1985. ACM.

[87] Niraj Tolia, Zhikui Wang, Manish Marwah, Cullen Bash, Parthasarathy Ranganathan, and Xiaoyun Zhu. Delivering Energy Proportionality with Non-energy-proportional Systems: Optimizing the Ensemble. In Proceedings of the 2008 conference on Power aware computing and systems, HotPower'08, pages 2–2, Berkeley, CA, USA, 2008. USENIX Association.

[88] Watt's Up Internet Enabled Power Meters. https://www.wattsupmeters.com/secure/products.php, 2009.

[89] X11::GUITest - Perl Package for User Emulation. http://sourceforge.net/projects/x11guitest/.