Investigation into the dependency between resource utilization, power consumption and performance in Multimedia servers

Master thesis by Alaa Abdulkareem Hameed Brihi

Supervisor: Dr.-Ing. habil. Waltenegus Dargie

Advising Professor: Prof. Dr. rer. nat. habil. Dr. h. c. Alexander

Technische Universität Dresden, Department of Computer Science, Chair of Computer Networks

August 2012

Declaration of Authorship

In the following I, Alaa Brihi, declare that this thesis, titled "Investigation into the dependency between resource utilization, power consumption and performance in Multimedia servers", and the content presented in it are the results of my own work.

All published works of others, which were utilized in this thesis, are referenced and can be found in the Bibliography.

Dresden, 21.08.2012

Abstract

Managing power consumption has become a major challenge in designing computer systems. Dynamic voltage and frequency scaling (DVFS) is a technique which aims to reduce the energy consumption of computing platforms by dynamically scaling the CPU frequency at run time. The literature demonstrates the usefulness of the DVFS strategy in embedded systems, mobile devices, and wireless sensor networks. Recently, it has also been proposed for servers and data centers.

This thesis experimentally investigates the relationship between the power consumption, resource utilization and performance of a multimedia server when various voltage and frequency levels are used. It investigates the applicability of DVFS to reducing the power consumed on two multimedia server platforms based on AMD Athlon X2 and Intel Core 2 Duo processors. The servers were set up with two realistic workloads, which were selected and evaluated. The workloads follow two different scenarios. In the first, I/O-intensive workload scenario, the servers hosted requests to download video files in known and available formats. In the second, CPU-intensive workload scenario, the server employed a transcoder to convert between AVI, MPEG and FLV formats before the videos were downloaded, in case the requested formats were not available.

The experimental results indicated that there was no meaningful relation between power consumption, CPU utilization and performance when the server ran the two workload scenarios under different DVFS power management policies. Additionally, a comparison between the AMD and Intel servers was made by calculating the energy-efficiency (EE) ratio. The results showed that, under a frequency scaling policy, the AMD server was more energy efficient than the Intel server for the I/O-intensive workload, while the Intel server was more energy efficient than the AMD server for the CPU-intensive workload.

It was concluded that the optimal DVFS power management policy is not obvious; it depends on many factors, such as the type of workload, the processor design and the maximum server frequency.

Acknowledgment

This dissertation would not have been possible without the guidance and help of many people who have, in one way or another, contributed and extended their valuable assistance in the preparation and completion of this study.

First and foremost, I would like to thank my supervisor, Dr. Waltenegus Dargie, for his support, worthy guidance and valuable comments in the completion of this work.

There are also other people who provided enormous support during my thesis. I would like to thank Dr. Marius Feldmann for helping me and providing me with his valuable suggestions. I also thank my Energy Lab mates. Many thanks to Mr. Jianjun Wen for helping me configure the machines; he was always willing to help and offered his best suggestions.

It would not have been possible to write this Master thesis without the help and support of the kind people around me. I would like to thank Mrs. Rasha Faqeh, Mr. Muhammad Hassan Obeid and Mr. Mahmoud Hafiez for reviewing my writing.

A special gratitude and love goes to my family for their unfailing support. I thank my parents for their love, and for inspiring me to complete my higher education with their continued emotional support, which has often proven to be the deciding factor for my successes. Also, I would like to thank my husband, Mr. Abdulqader Shawaa, whose love, encouragement and belief made everything possible; he has enriched my life and made every day happier. Special thanks go to my lovely son, Mohammed, for his sweet smile, his patience and understanding.

Without their love, support and encouragement, I could never have gotten this far.

Last but not least, I would like to acknowledge that this thesis is part of the project SFB-912/1 2011: Energy-Efficient Service Execution.

Contents

1 Introduction...... 1

1.1 Power consumption...... 2

1.2 Power management and energy efficiency overview...... 4

1.3 Power Management Techniques...... 5

1.3.1 Processor Power Management...... 7

1.4 Problem statement...... 9

1.5 Thesis Organization...... 10

2 Background and related work 11

2.1 Introduction...... 11

2.2 Data center...... 14

2.3 Server system...... 17

2.3.1 CPU power management...... 20

2.3.2 Operating system power management...... 24

3 Concept 29

3.1 DVFS support in Linux...... 29

3.1.1 CPUfreq Subsystem...... 30

3.2 Methodology...... 31

3.2.1 System architecture...... 32

3.2.2 System components...... 33

3.2.3 Measurement system...... 36

3.2.4 The experiment methodology...... 38

3.3 Summary ...... 39

4 The experimental results 41

4.1 Measurements...... 41

4.1.1 Power Consumption...... 42

4.1.2 CPU utilization...... 57

4.1.3 Performance...... 59

4.2 Experiments analysis...... 60

4.3 Summary...... 75

5 Conclusions 77

5.1 Summary and Conclusion...... 77

5.2 Future work...... 80

List of Figures

1.1 All system power states as defined by the ACPI specification [33].....7

2.1 Moore's law - the doubling of the transistor count every two years [43].... 12

2.2 Worldwide expense to power and cool the server installed base, 1996-2010 [45]...... 15

2.3 Server Power Consumption according to Intel Lab [4]...... 17

3.1 A high-level view of the CPUfreq subsystem...... 31

3.2 General architecture of experiment...... 32

3.3 The server environment...... 35

3.4 The experiment scenarios...... 37

4.1 General diagram of the main components in a motherboard...... 43

4.2 The cumulative overall power consumption of the AMD server (left) without transcoder: it runs Apache only; (right) with transcoder: it runs both Apache and the FFmpeg transcoder..... 45

4.3 The cumulative overall power consumption of the Intel server (left) without transcoder: it runs Apache only; (right) with transcoder: it runs both Apache and the FFmpeg transcoder..... 45

4.4 The cumulative power consumption of 12 V CPU of AMD server when (left) without transcoder, (right) with transcoder...... 47

4.5 The cumulative power consumption of 12 V CPU of Intel server when (left) without transcoder, (right) with transcoder...... 48

4.6 The cumulative power consumption of 5 V supply line to motherboard of AMD server when (left) without transcoder, (right) with transcoder.. 49

4.7 The cumulative power consumption of 5 V supply line to motherboard of Intel server when (left) without transcoder, (right) with transcoder... 49

4.8 The cumulative of the memory used by the kernel in AMD server using the four DVFS policies: (left) cache memory, (right) virtual memory, without transcoder workload...... 51

4.9 The cumulative of the memory used by the kernel in AMD server using the four DVFS policies: (left) cache memory, (right) virtual memory, with transcoder workload...... 51

4.10 The cumulative power consumption of 3.3 V supply line to motherboard in AMD server (left) without transcoder, (right) with transcoder...... 52

4.11 The cumulative power consumption of 12 V supply line to motherboard of AMD server (left) without transcoder, (right) with transcoder...... 53

4.12 The cumulative power consumption of 12V supply line to Hard Disk in AMD server (left) without transcoder, (right) with transcoder...... 54

4.13 The cumulative power consumption of 12V supply line to Hard Disk in Intel server (left) without transcoder, (right) with transcoder...... 54

4.14 The cumulative power consumption of 5V supply line to Hard Disk in AMD server (left) without transcoder, (right) with transcoder...... 55

4.15 The cumulative power consumption of 5V supply line to Hard Disk in Intel server (left) without transcoder, (right) with transcoder...... 55

4.16 The cumulative of a) read operation, b) write operation of Hard Disk in AMD server without transcoder...... 56

4.17 The cumulative of a) read operation, b) write operation of Hard Disk in AMD server with transcoder...... 56

4.18 The cumulative CPU utilization average in AMD server (left) without transcoder and (right) with transcoder...... 57

4.19 The cumulative CPU utilization average in Intel server (left) without transcoder and (right) with transcoder...... 58

4.20 The cumulative detailed CPU utilization average in AMD server with transcoder, CPU stats (system, user, idle, wait, hardware interrupt, interrupt)...... 58

4.21 The cumulative of the total power consumption with different workload in (left) AMD server and (right) Intel server...... 64

4.22 The relation of power consumption, resource utilization and throughput.. 67

4.23 The cumulative of power consumption in the start and shutdown cases of the AMD server...... 68

4.24 The cumulative of the overall power consumption for the Intel and AMD servers on the Halt state...... 69

4.25 The cumulative of the overall power consumption for the Intel and AMD servers on the Sleep state...... 69

4.26 The DC power consumption for the AMD server (left) without transcoder and (right) with transcoder...... 70

4.27 The DC power consumption for the AMD server (with maximum frequency) with transcoder workload...... 71

4.28 The cumulative of the CPU power consumption for serving requests of different file sizes separately, (left) without transcoder scenario and (right) with transcoder scenario...... 73

4.29 The cumulative of the 12 V CPU power consumption for serving different request sizes in (left) without transcoder scenario and in (right) with transcoder scenario...... 74

4.30 The cumulative comparison of power consumption between C-state and P-state with the without-transcoder workload in Intel server...... 75

List of Tables

3.1 Specifications of the server systems used...... 34

4.1 A comparison of the throughput of AMD multimedia server with and without transcoding...... 59

4.2 A comparison of the throughput of Intel multimedia server with and without transcoding...... 60

4.3 A comparison of the power, CPU utilization and throughput of AMD multimedia server without transcoding workload...... 60

4.4 A comparison of the power, CPU utilization and throughput of AMD multimedia server with transcoding workload...... 61

4.5 A comparison of the power, CPU utilization and throughput of Intel multimedia server without transcoding workload...... 61

4.6 A comparison of the power, CPU utilization and throughput of Intel multimedia server with transcoding workload...... 62

4.7 A comparison of the total and CPU power consumption, CPU utilization on servers with different workload...... 63

4.8 A comparison of the Energy Efficiency (EE) ratio between AMD and Intel servers without transcoder workload...... 71

4.9 A comparison of the Energy Efficiency (EE) ratio between AMD and Intel servers with transcoder workload...... 72

Chapter 1

Introduction

The power consumption and energy cost of servers have increased over time. Therefore, power management of server environments is becoming increasingly important. Power management can in general be defined as a process in which power is directed efficiently to the different components of the system and the hardware is managed in a way that saves power.

The aim of this thesis is to analyze power management in server systems. The scope of application and usefulness of Dynamic Voltage and Frequency Scaling (DVFS) in a realistic multimedia server environment is investigated experimentally. The effects of different DVFS policies on power consumption, performance and hardware resource utilization are examined.

The first chapter provides basic information about power consumption, power management and energy efficiency, and introduces the technologies used in the implementation. The problem statement and an overview of the thesis are presented.

1.1 Power consumption

In electrical circuits, when an electric current passes through a resistor or any electric appliance, the power P is the product of the electric potential difference V and the current I, where P is measured in watts, V in volts, and I in amperes:

P = V · I (1.1)

The energy E is defined as the product of the power and the time t, where E is measured in joules and t in seconds:

E = P · t (1.2)

From equation (1.2), the power can be defined as the energy consumed per unit of time:

P = E / t (1.3)

Therefore, reducing the power will lead to a reduction in the amount of energy consumed in a given time.
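The relations in equations (1.1)-(1.3) can be sketched as a short worked example; the voltage, current and time values below are illustrative only, not measurements from this thesis.

```python
# Worked example of equations (1.1)-(1.3): power, energy and time.
# The numbers are made up for illustration.

def power(voltage_v: float, current_a: float) -> float:
    """P = V * I (equation 1.1), in watts."""
    return voltage_v * current_a

def energy(power_w: float, time_s: float) -> float:
    """E = P * t (equation 1.2), in joules."""
    return power_w * time_s

p = power(12.0, 5.0)    # a 12 V rail drawing 5 A dissipates 60 W
e = energy(p, 3600.0)   # running for one hour consumes 216,000 J (= 60 Wh)

print(p)  # 60.0
print(e)  # 216000.0
```

Dividing the energy by the time recovers the power, illustrating equation (1.3).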

Because recent processors are made up of hundreds of millions of transistors, power consumption in computing systems has become a critical issue. A large amount of power is dissipated when the transistors switch between on and off states, which increases power consumption and cooling requirements. There are two types of power consumption in Complementary Metal Oxide Semiconductor (CMOS) technology: static power consumption and dynamic power consumption. Static power consumption arises from the leakage of a transistor's bias currents. It becomes increasingly significant as transistors become smaller and faster, which is a challenge for circuit designers. To reduce static power, there are Static Power Management (SPM) techniques, which regulate power consumption in idle periods by keeping the system in a power-efficient state while maintaining the states of the operating system and applications according to pre-defined policies [29]. There is also much research in this area; the most recent work, introduced by [42], proposes a new method to reduce static power by reducing leakage power in CMOS Very Large Scale Integration (VLSI) circuits.

Dynamic power consumption occurs due to the charging and discharging of load capacitance, as well as short-circuit¹ power dissipation. So far, no method has been found to reduce the short-circuit current consumption without compromising performance. The charging and discharging of capacitance is considered to be the main origin of dynamic power consumption. Therefore, the dynamic power consumption can be defined as follows:

Power = α · C · V² · F (1.4)

where α is the switching activity, C is the capacitance, V is the supply voltage, and F is the clock frequency. To reduce dynamic power, there are Dynamic Power Management (DPM) techniques.
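Equation (1.4) can be sketched directly in code; all parameter values below (switching activity, capacitance, voltage, frequency) are invented for illustration and do not correspond to any processor measured in this thesis.

```python
# Sketch of equation (1.4): dynamic CMOS power P = alpha * C * V^2 * F.
# All parameter values are illustrative, not measured.

def dynamic_power(alpha: float, capacitance_f: float,
                  voltage_v: float, frequency_hz: float) -> float:
    """Dynamic power in watts per equation (1.4)."""
    return alpha * capacitance_f * voltage_v ** 2 * frequency_hz

base = dynamic_power(0.2, 1e-9, 1.2, 2.0e9)    # nominal point: 1.2 V at 2 GHz
scaled = dynamic_power(0.2, 1e-9, 1.0, 2.0e9)  # same frequency, lower voltage

# Lowering V from 1.2 V to 1.0 V alone cuts dynamic power by the
# quadratic ratio (1.0 / 1.2)^2, about 0.694.
print(round(scaled / base, 3))  # 0.694
```

The quadratic dependence on V is why voltage reduction dominates the savings that DVFS can achieve.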

The total power of a server is consumed by several components. Recent research demonstrates that in currently used server systems the main consumers of power are the processors, followed by the memory, the disks, the motherboard, and lastly the fans and network interconnects [5].

Data centers use a huge number of servers and computing devices for data processing and data storage, which has led to increased electricity usage. Improvements in the sector's energy efficiency can provide significant energy savings. Modern hardware and operating systems provide power management features for reducing power consumption which require software control. This software controls the power state of the hardware to save energy when it is not in use.

¹ A short circuit is a low-resistance electrical path that allows a current to travel along a different path from the one intended.

1.2 Power management and energy efficiency overview

The amount of electricity used by servers has become significant in recent years, and with rising energy costs it is highly desirable to reduce the energy wasted by servers. Therefore, the focus of computer system design has shifted toward power and energy efficiency, especially in data centers serving internet services such as multimedia downloads or internet communication programs. Additionally, significant amounts of energy are consumed by power delivery and cooling systems. The problem is much worse for data centers, as huge numbers of servers and computing devices are used for data processing and data storage, which leads to a significant increase in electricity usage. A report on the energy efficiency of servers and data centers issued to Congress by the EPA (Environmental Protection Agency) in the USA in 2007 stated that data centers in the USA consume about 61 billion kWh for a total electricity cost of about 4.5 billion dollars, accounting for 1.5 percent of total U.S. electricity consumption [19]. The PC Energy Report 2009 found that approximately 30% of PCs in Germany were left "on" overnight, resulting in an estimated annual energy waste of 4.8 billion kWh, costing 919 million Euro [28].

One of the most controversially discussed topics of today is global warming, which is due to the increase of CO2 (carbon dioxide). To avoid the worst effects of global warming in the future, CO2 emissions have to be reduced; that means the consumption of fossil fuels has to be reduced to stop climate change and environmental pollution [24]. In 2007, the Information and Communications Technology (ICT) sector reported that IT energy demand accounts for approximately 2% of global CO2 emissions, of which 23% are related to servers and data centers [26]. Recent research demonstrates that the emissions from the ICT sector are going to increase significantly over the coming years. This rise implies that it is meaningful for the ICT sector to make impressive advances in energy efficiency, which may be accomplished by improving the energy efficiency of its products and services; this could reduce carbon emissions and increase cost savings. For example, from 2007 to 2010 Intel reduced IT-related CO2 emissions by 50% and reduced its absolute carbon footprint by 20% by using the most energy-efficient IT equipment [25].

Therefore, to deal with the problems of rising energy consumption and increasing resource scarcity, more sustainable sources of clean and renewable energy need to be developed. To mitigate climate change and its impacts, the potential of energy efficiency needs to be extensively exploited, and economical, practical and technological measures have to be taken to maximize energy efficiency and decrease overall energy use [1]. Significant reductions in operating costs can be realized by using intelligent power management with energy-efficient components.

Power management means turning off the power or switching the system to a low-power state when it is inactive [2]. To identify the energy-efficient components of IT systems, the concept of energy efficiency in IT first has to be defined. Energy efficiency is the optimization of energy usage. In a computer system, energy efficiency deals with minimizing the energy necessary to perform a task. It can be increased by improving the control of energy-using devices and systems. Therefore, using energy-efficient resources leads to minimal waste of power and achieves better performance, which is the goal of energy management. To this end, the system has to adjust the hardware resources dynamically to perform tasks, using power management software.

1.3 Power Management Techniques

A lot of research work has been done in the area of power- and energy-efficient resource management in computing systems. Power management is done by switching hardware components between high- and low-power states. For example, in high-power mode the central processing unit (CPU) is fully active and operational and can run at its highest frequency to complete a task in a short time, but with high power consumption. In low-power mode, the CPU can reduce power consumption by going into a deep sleep state after it has completed its work and finished executing its instructions. This can be done using DVFS and sleep states, which will be explained later. Therefore, different power management technologies, spanning process, circuit, architecture, platform, methodology and system-level software techniques to control a system's energy and reduce power consumption in times of low utilization, have been introduced.

As already mentioned in section 1.1, power management techniques can be divided into static and dynamic power management. DPM techniques are classified in [41] into the hardware level and the software level. Hardware DPM techniques are themselves classified as Dynamic Performance Scaling (DPS), such as DVFS, and Dynamic Component Deactivation (DCD) during periods of inactivity. Software DPM techniques, in contrast, utilize an interface to the system's power management and apply hardware DPM. For example, Advanced Power Management (APM) was developed by Intel and Microsoft and released in 1992. It enables the operating system to work with the basic input/output system (BIOS) of the PC to achieve power management, but it suffered from many problems. In 1996, Hewlett-Packard, Intel, Microsoft, Phoenix and Toshiba developed the Advanced Configuration and Power Interface (ACPI), an open industry power management standard which focuses on OS-based power management. It aims to consolidate, check and improve upon existing power and configuration standards for hardware devices [27].

Hardware platforms are based on CPUs and related components. Therefore, ACPI defines power states for the system and its different devices. These levels start with the global power state G0 and go up to G3, where G0 is the working state and G1 to G3 designate the different sleeping states. Within G0 there are per-device power states, called D-states (D0-D3), and CPU power states, called C-states (C0-C3). While a device or processor operates (D0 and C0, respectively), it can be in one of several power-performance states, called P-states, which affect the component's operational performance while running. In C0 there is also another power state, the throttling state, called the T-state, as shown in Figure 1.1.

Modern operating systems, like Linux, use ACPI to improve their decisions related to power management. The optimal decisions depend on both the platform and the workload. To know how much power a computer consumes, many things must be known: which hardware (the resources utilized), operating system and applications are used, and how much power each of these components consumes.

Figure 1.1: All system power states as defined by the ACPI specification [33]

1.3.1 Processor Power Management

The CPU is one of the largest consumers of energy. For this reason, the processor is a target of power management. Current processors feature flexible power management technologies and provide a variety of power-saving features. Some features are hardware-triggered and some are intended to be used by the operating system. There are a number of techniques to control CPU power, such as the processor P-states and C-states mentioned previously among the ACPI power states.

Processor P-states (operational states) provide the ability to run the processor at different operating voltage and frequency levels at run time [44]. This is a widely applied technique for adjusting performance. When the clock speed is lowered, the operating voltage can also be reduced; this is called DVFS (dynamic voltage and frequency scaling), which will be described later.

Processor C-states (processor sleep states) are the processor's capability to go into different idle states [44]. These are states which a processor enters when the system is idle, and in which the power of the CPU goes down to various degrees.

Another processor feature is Intel Dynamic Acceleration (IDA) [30]. During execution of single-threaded applications, this feature allows one processor core to deliver extra performance by increasing its clock rate while the other core is idle. Intel later developed an improved version of IDA called Turbo Boost Technology, which allows the processor to run above its base operating frequency via dynamic control of the CPU's clock rate [31].

The tickless kernel project [32] in Linux introduced an initial implementation of a dynamic tick. The timer tick happens periodically, whether CPUs are idle or busy. This costs quite a lot of energy, causing unnecessary power consumption in servers. The periodic timer prevents the CPU from remaining idle long enough to enter power-saving states, thus increasing overall system power consumption. This problem is solved by reprogramming the per-CPU timer to eliminate clock ticks during idle time.

Dynamic Frequency and Voltage Scaling

Dynamic Voltage and Frequency Scaling (DVFS) is a power reduction technique used to reduce both dynamic and static power consumption. With this technique, the CPU's frequency is adjusted for power-saving purposes.

The basic motivation for dynamic voltage and frequency scaling (DVFS) follows from the CMOS dynamic power relation of equation (1.4): the dissipated power is approximately proportional to the frequency and to the square of the voltage,

Power ∝ Voltage² · Frequency

At a lower frequency, the power consumption is lowered proportionally. At a lower voltage, the power consumption drops quadratically. This is because lowering the voltage also lowers the maximum frequency at which the chip can operate [5].
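The proportionality above can be illustrated with a short sketch: scaling the frequency alone reduces power linearly, while scaling the voltage along with it reduces power roughly cubically. The scaling factors below are illustrative assumptions, not measured values.

```python
# Sketch of Power ∝ V^2 * F: relative power at scaled operating points.
# Scaling factors are illustrative, not measurements.

def relative_power(v_scale: float, f_scale: float) -> float:
    """Power relative to the nominal operating point: P/P0 = v^2 * f."""
    return v_scale ** 2 * f_scale

print(relative_power(1.0, 0.5))  # frequency halved only      -> 0.5
print(relative_power(0.5, 0.5))  # voltage halved along with F -> 0.125
```

This is the reason DVFS lowers voltage together with frequency whenever the hardware allows it: the combined scaling approaches a cubic power reduction.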

The CPU should dynamically change its frequency according to the current system load. This technique is supported by most modern CPUs, including those in mobile devices, desktops, laptops and server systems. For systems in which the CPU is the dominant power consumer, an increasing number of processors implementing DVFS can lead to quadratic energy savings [17].

The Linux kernel provides pseudo file systems that can be used to access various components of the operating system. For example, the sysfs pseudo file system presents a frequency scaling interface; it provides several governors that react differently to changes in system load.
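As a sketch of how this interface can be inspected, the snippet below reads the governor files of the standard Linux CPUfreq sysfs layout (`/sys/devices/system/cpu/cpu0/cpufreq/`). The directory is a parameter so the helper is not tied to one machine; the file names follow the mainline cpufreq convention.

```python
# Minimal sketch: read the current and available cpufreq governors
# from the standard Linux sysfs layout. The base directory is a
# parameter so the helper can be pointed at any path.

import os

def read_cpufreq(base="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Return the active governor and the available governors for one core."""
    def read_file(name):
        with open(os.path.join(base, name)) as f:
            return f.read().strip()
    return {
        "governor": read_file("scaling_governor"),
        "available": read_file("scaling_available_governors").split(),
    }

# On a Linux machine with cpufreq enabled this might print, e.g.,
# {'governor': 'ondemand', 'available': ['ondemand', 'performance', ...]}.
if os.path.isdir("/sys/devices/system/cpu/cpu0/cpufreq"):
    print(read_cpufreq())
```

Writing a governor name back to `scaling_governor` (as root) is how the DVFS policies compared in this thesis are switched at run time.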

An advantage of frequency scaling is that, since it is done in hardware, it does not cause overhead. A disadvantage is the limited number of scaling levels available, which may not be sufficient for some experiments. Another disadvantage is that, on most CPUs, it is not possible to change the frequency of each core independently [9].

To find a good balance between power saving and performance degradation, frequency and voltage scaling is used. In the scope of this thesis, the effectiveness of DVFS on multimedia server systems will be analyzed. However, the effectiveness of DVFS depends on the workload and the processor design. The aim is to cover different workloads and platform classes to determine which frequency should be used to provide energy efficiency.

1.4 Problem statement

While the largest consumer of power is the CPU, today's CPUs support dynamic frequency scaling. This thesis applies the DVFS power management technique to multimedia server systems. It attempts to investigate the dependency between resource utilization, power consumption and performance on the servers. This is done by applying DVFS power management policies to the server CPU. The aim of the thesis is to evaluate how frequency scaling affects the performance and power consumption of workloads with different characteristics, investigating the following question: how does the power consumption of the CPU change for a given workload as different power management policies are applied to the server? The work is based on experimental results.

To evaluate the employed policies, synthetic workloads are implemented to simulate a YouTube-like website, because of the widespread use of this service in data centers. This investigation discusses a detailed power consumption analysis of complete computer systems and of some individual components.

The response of two different platforms to different workloads was studied. For each platform, the processor executed the same workload at different frequencies. Then, depending on the relation between resource consumption, performance and energy consumption, it can be decided which frequency provides optimal energy efficiency.
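The abstract compares the two servers via an energy-efficiency (EE) ratio; a common way to define such a ratio, assumed here, is useful work delivered per watt consumed. The sketch below uses that assumed definition, and all the throughput and power numbers are invented for illustration only.

```python
# Hedged sketch of an energy-efficiency comparison across frequency
# settings, assuming EE = throughput per watt. Sample numbers are
# invented, not measurements from the thesis.

def ee_ratio(throughput_mbps: float, power_w: float) -> float:
    """Energy efficiency as throughput delivered per watt consumed."""
    return throughput_mbps / power_w

# Hypothetical measurements at two frequency settings of one server:
low_freq = ee_ratio(throughput_mbps=80.0, power_w=95.0)
high_freq = ee_ratio(throughput_mbps=110.0, power_w=150.0)

# With these numbers the lower frequency delivers more work per watt,
# so it would be the more energy-efficient operating point.
print(round(low_freq, 2), round(high_freq, 2))  # 0.84 0.73
```

Repeating such a comparison over every available frequency, per workload and per platform, is one way to pick the operating point mentioned above.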

1.5 Thesis Organization

The thesis is organized as follows:

• Chapter 2 provides related work: summaries and an analysis of prior work in the power management research area.

• Chapter 3 describes the concept, architecture and methodology of the experimental work.

• Chapter 4 describes the experimental measurements and provides an evaluation of the results.

• Chapter 5 provides conclusions, with a brief summary along with suggested future work.

Chapter 2

Background and related work

Recently, much research on power management and energy-efficient resource utilization has been conducted. Throughout this chapter, background information is presented which is fundamental for understanding the work accomplished. Section 2.1 introduces an overview of the development of power management techniques as well as their effectiveness. Section 2.2 summarizes some approaches used to achieve power control at different levels of abstraction in data center systems. Afterwards, section 2.3 focuses on the single server system and discusses different power management approaches at the server level.

2.1 Introduction

According to Moore's Law, the number of transistors on a chip doubles every two years, as shown in Figure 2.1, while the sizes of semiconductor devices have increased only slightly. As a result, the power loss and the temperature inside the device structures increase. This inevitably leads to growing energy consumption, which is used to cool the system.

Figure 2.1: Moore's law - the doubling of the transistor count every two years [43]

Therefore, the problem of power consumption has led to wide research studies by computer architects and has become a critical design constraint for all classes of systems. Different studies have been conducted in the field of power management in order to mitigate power consumption problems. In general, they aim to increase performance and reduce power consumption.

The initial studies on power management and energy-efficient resource techniques were applied to mobile devices, which are battery-powered and widely used in embedded systems [48], wireless systems and wireless sensor networks [35, 49]. Due to the continuous increase in the functionality and complexity of systems and applications integrated in mobile devices, reducing the power consumption and extending battery lifetime have become critical aspects of designing battery-powered systems.

Furthermore, developments in processors and display technologies, and the rising complexity of applications and their computational demands, have raised considerable interest in reducing the amount of power they consume. However, the system performance is still required to remain at a satisfactory level. The resulting scenario requires power management techniques at various levels of abstraction.

To evaluate the effectiveness of these power management techniques, different tests need to be conducted. These tests may be done either by developing power models, which allow accurate prediction of the system’s power consumption at any given time, or by taking measurements at various points in the system in order to obtain profile data. The profile data can be analyzed to provide information related to particular hardware components. For example, one approach is to measure the AC current of an entire computer [67, 74, 83], while others take measurements at isolated power supplies leading directly to various points in the system and measure the DC current of specific components such as the CPU package [3, 84]. The results of the different approaches have different interpretations, and these measurements can be used as parameters in power models.

Several approaches use the performance monitoring unit (PMU), which enables online measurement of different events such as cache misses, unhalted CPU cycles, etc. Bellosa [82] proposes using performance-counter events measured by the PMU to model processor power consumption. Performance-counter measurements are recorded for each process running in the system at run-time. These measurements are used to demonstrate the correlations between performance events (instructions/cycle, memory references, cache references) and power consumption. Bircher et al. [68] also use performance counters to present power measurements of the complete system. They present a simple linear model for the Pentium 4 based on the number of instructions, with an average error of less than 9%. The use of performance counters and the proposed runtime thermal model configured for the Pentium 4 processor is extended by Lee et al. [81]. In addition, they test the thermal behavior of applications and explain the potential benefits of using this model for temperature-aware research.
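In their simplest form, such counter-based power models reduce to fitting a linear function from a counter rate to measured power. The following is a minimal sketch of such a model in the spirit of the linear Pentium 4 model above; the sample values are hypothetical, not measurements from the cited works.

```python
def fit_linear_power_model(counter_rates, measured_power):
    """Least-squares fit of P ~= a * rate + b from paired profiling
    samples (e.g. instructions retired per second vs. power in watts)."""
    n = len(counter_rates)
    mean_x = sum(counter_rates) / n
    mean_y = sum(measured_power) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(counter_rates, measured_power))
    var = sum((x - mean_x) ** 2 for x in counter_rates)
    a = cov / var                 # watts per unit of counter rate
    b = mean_y - a * mean_x       # baseline power in watts
    return a, b


def estimate_power(rate, a, b):
    """Predict power for a newly observed counter rate."""
    return a * rate + b
```

At run-time the fitted model is evaluated once per sampling interval, so its predictions can be compared against direct measurements to track the model error.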

Another approach uses power measurement sensors, which allow individual components of the system to be measured separately. For example, Carroll and Heiser [51] analyze the power consumption of the OpenMoko Freerunner smartphone, an open-source mobile hardware platform. They inserted sense resistors on the power supply rails of the relevant components in order to measure the current. They developed a power model based on the resulting measurements and analyzed the energy usage and battery lifetime under a number of usage patterns.

Different power models for different systems have been developed to measure the power consumption of various hardware and software components. However, the focus of power management techniques has moved beyond extending battery lifetime in mobile and embedded systems to server and data center systems, due to the continuously rising power and energy consumption of these systems.

2.2 Data center

Data centers consist of a large number of computing servers with high power consumption, which generate more heat and thereby face a greater probability of thermal failure. As a result, additional energy for cooling is required. Additionally, computational capacities are increasing due to the deployment of multiple CPU packages in modern systems, which raises the power consumption further. Therefore, the operational power consumed in executing tasks and in cooling has become a major concern. Figure 2.2 presents the general trend in data centers of increasing expenditures for power, cooling, and new servers from 1996 until 2010.

In this situation, power control is becoming a key concern for modern data center operators. It addresses the data center challenges related to the reduction of all energy-related costs. A multitude of approaches have been developed in the power control field. Parolini et al. [53] classify these approaches into three typical scales: data center level, group (cluster) level and server level. The three classifications can be summarized as follows:

• Data center level:

Various approaches at this level have been concerned with the workload management system of the data center [55, 56, 57, 58]. Saving energy by migrating workload to locations with lower real-time electricity prices is discussed by Qureshi et al. [55]. They observed that geographical and temporal differences in the price of electricity offer an opportunity to reduce the cost of servicing requests in large distributed systems and to save millions of dollars per year in electricity costs.

Figure 2.2: Worldwide expense to power and cool the server installed base, 1996-2010 [45]

On the other hand, other approaches focus on idle servers, where much of the energy of data centers is wasted. It should be noted that servers in data centers are mostly busy at between 10% and 50% of their maximum utilization and are rarely completely idle [59]. In order to maximize server energy efficiency, it would be desirable to turn servers off when they are not in use. The drawback of turning servers off is the setup time, i.e. the time a server requires to turn back on, which leads to energy loss. Barroso and Hölzle [59] address this problem with energy-proportional computers, in which energy consumption is proportional to the usage of some underlying resource.
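The gap that energy-proportional computing targets can be illustrated with the commonly used affine server power model, in which a server draws a large constant idle power plus a utilization-proportional part. The numbers below are purely illustrative, not measurements from [59]:

```python
def server_power(utilization, p_idle, p_peak):
    """Affine server power model: constant idle power plus a part
    that grows linearly with utilization (0.0 .. 1.0)."""
    return p_idle + (p_peak - p_idle) * utilization


# A hypothetical server drawing 200 W idle and 300 W at peak still
# draws 230 W at 30% utilization, i.e. about 77% of peak power for
# 30% of the work. An energy-proportional server would have p_idle
# close to zero, so power would track the work performed.
print(server_power(0.30, 200.0, 300.0))   # 230.0
```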

• Group Level: A significant body of work has been produced on power management at this level, as indicated by Wang et al. [52]. They focus on feedback control algorithms for the unified power management of a group of servers, using DVFS as the control knob. They developed a power capping algorithm designed from experimental data on real servers and evaluated the integrated architecture using trace-driven simulation. The overall results showed that servers under the integrated control algorithms provided a reasonable trade-off between power capping, efficiency and application performance for various production workload traces. Another approach, on load balancing in server clusters, has been proposed by Dargie and Schill [54]. The relationship between power consumption, resource utilization, and throughput of a multimedia server cluster under different load balancing policies is investigated. They found through experiment that the processors of the multimedia servers were not fully utilized and that the bandwidth utilization of the cluster was always near saturation. To deal with these problems, they investigated two approaches. The first was to increase the bandwidth from 1 Gbit/s to 10 Gbit/s, which doubled the throughput while increasing the power consumption by only 1.25%. The second was to scale the CPU frequency while leaving the bandwidth as it was. The authors observe that using DVFS to scale the CPU frequency does not affect the overall throughput of IO-bound applications, while reducing the power consumption of the servers by about 12% of the overall power consumption.

• Server Level: When designing traditional servers, system designers do not consider energy savings the most important goal at the server level [60]. The reasons for this are, firstly, related to the difference between the terms power and energy, which has been explained thoroughly in the previous chapter: a high-power server that consumes double the power of a mobile node but completes a task in half the time will consume the same amount of energy for this task as the mobile node. Secondly, in the computing world, reducing power reduces performance. From the customers’ point of view such a reduction in power and performance is not acceptable; they prefer to deal with low-latency systems. However, now it

is becoming more attractive to replace traditional server nodes with high-performing, low-power nodes. Much of the prior work [5, 12] oversimplifies the problems in data centers by improving the energy efficiency of individual server components and controlling a single server independently of others. The next section provides more detail about research at the server level.

Figure 2.3: Server Power Consumption according to Intel Lab [4]

2.3 Server system

According to data provided by Intel Labs [4], the main power consumers in a server are the processors and the memory, followed by power supply, which wastes energy by converting AC power into DC. Disk drive power only becomes significant in servers with several disk drives. Figure 2.3 shows how power is consumed on average within an individual server.

Different approaches have been developed to reduce power consumption at the server level. Current power management research at the platform level (whether for an individual node or a server) is divided into the following three categories [46]:

1. System component level

For this level, researchers have proposed different approaches to reduce the energy consumption of different hardware components, such as the processor, memory, etc. Some approaches focus on a single component, while others interact with multiple components.

For single components, different power-saving techniques have been presented. For example, for DRAM memory systems, Mini-rank [61] describes techniques that can improve bandwidth and power efficiency, at a latency cost, by reducing the number of devices involved in each memory access and taking advantage of fast low-power modes. For disks, the Energy Efficient Disk (EED) [63] drive architecture was proposed, which can reduce energy consumption significantly while improving performance. The EED integrates a relatively small non-volatile flash memory into a traditional disk drive. Real trace-driven simulations were employed to validate the EED disk drive architecture. The simulation results show that the EED reduces the number of program/erase calls and extends the life span of the flash memory.

For caches, [64] proposes several power-aware storage cache management algorithms that provide more opportunities for the underlying disk power management schemes to save energy. The proposed algorithms are an off-line energy-optimal cache replacement algorithm, an off-line power-aware greedy algorithm and two on-line power-aware algorithms (PA-LRU and PB-LRU). These algorithms can save up to 22% of disk energy and provide up to 64% better average response time.

Among multiple-component approaches, Li et al. [62] develop a joint-adaptation algorithm for power adaptation between the processor and memory. The goal is to reduce the total energy consumed within a specified performance loss. Another approach is reported by Li et al. [66] for disk and memory. They determined the limitations of control algorithms for memory and disk power management, which are related to the manual tuning of thresholds and the lack of performance guarantees. They addressed these limitations by proposing two control algorithms with performance guarantees. The Performance-Directed Dynamic (PD) algorithm is a self-tuning, heuristics-based energy management algorithm which features dynamically adjustable thresholds, but it is limited to disk management due to its complexity. To improve on PD, the Performance-Directed Static (PS) algorithm is proposed, which is a simple threshold-free control algorithm.

PowerNap [96] is an approach to fine-grained power management based on fast switching between two power states, an active and a low-power (sleep) state, in order to minimize the power consumption of each system component while the server is idle. In contrast, Barroso and Hölzle [59] address this problem by pursuing energy-proportional computing, which can be accomplished by developing machines that consume energy in proportion to the amount of work performed.

2. Application level

Concerning the application level, several approaches have been studied to enforce power management based on workload characteristics. For example, application-aware power management [65] was developed, which incorporates processor performance counters for monitoring critical workload characteristics, event-counter-based power and performance projection models, and low-overhead DVFS-based P-state change mechanisms. Two power management solutions are presented. The first is PerformanceMaximizer (PM), which finds the best possible performance under specified power constraints. The second is PowerSave (PS), which saves energy while keeping performance above specified requirements.

3. Operating system level

To reduce power consumption, the operating system (OS) power management approach is used. The main advantage of the OS is that it controls all the resources in the system and can turn off unused devices to reduce power consumption. Therefore, adding power management to the operating system design can reduce the overall power consumption of a particular system. For the operating system to manage power effectively, it requires algorithms and heuristics that let it make good decisions about what to shut down and when.

In this thesis, the processor as a single component is of interest. Subsection 2.3.1 will discuss different approaches to processor power management and present different approaches that use the DVFS algorithm to save processor power. In subsection 2.3.2, operating system power management will be presented. It provides an overview of several frameworks that have been developed for OSs, including those that Linux currently uses to leverage the mechanisms available on modern CPUs.

2.3.1 CPU power management

In recent years, processor power management has been an area receiving a lot of attention, and the most addressed aspect is at the server level. Different techniques have been introduced by system designers to improve energy efficiency and thermal behavior. Modern processors and chipsets have evolved to include numerous features intended for power management. Some of these features are hardware-triggered, and some allow the OS to trade performance for reduced power consumption.

Processor power management approaches:

Various studies of processor power management have used the performance monitoring unit (PMU) found in most modern CPUs. A PMU consists of a number of counters and configurable registers to measure events that are significant to energy consumption, such as cache misses, unhalted CPU cycles, etc. PMU readings can be used as parameters in models for the estimation of power consumption and execution time. Isci and Martonosi [67] used the PMU on a Pentium 4 system to determine the power consumed by individual functional units within the processor, presenting runtime power modeling with high accuracy.

Bircher and John [85] presented a power and performance analysis of dynamic power adaptations in a quad-core AMD processor. They analyzed indirect and direct performance effects of P-states and C-states, taking into account two different workloads, compute-bound and memory-bound. They studied the effect of the idle-core frequency on the power and performance of the active core by changing the frequency of the idle core. They concluded that the direct effect, slow transitions by the operating system between idle and active operation, causes significant performance loss and reduces power savings. The second conclusion concerns an indirect effect: they found significant performance reduction due to shared power-managed resources (like the cache) if idle-core frequency reductions are not limited. Furthermore, complete system measurements are presented, and it was observed that the processor cores and the disk consume the most power. They proposed a power management configuration and policy that achieves an average power reduction of 30% with an impact on performance of less than 3%.

Other research has focused on the processor's inactive (idle) states. Pallipadi et al. [95] introduced the cpuidle generic processor idle management framework in the Linux kernel. Knowledge of the next timer deadline in the scheduler is used to match against C-state latencies and to transition to the deepest possible state; the main aim is to obtain the maximum possible power advantage with little impact on performance. However, this approach failed to address the effects of cache flushing due to aggressive C-state switching. Amur et al. [86] introduced IdlePower, which addresses the effects of aggressive C-state switching. The trade-offs and effects of C-state management on multi-core architectures in virtualized systems are evaluated. They studied the relationship between applications and C-state switching to address the effect of cache flushing. IdlePower introduced application awareness to idle-state management algorithms. Results show improved residencies in the deepest C3 idle state by up to 10% and avoided performance degradations in workloads of up to 26%.

Extensive studies of specific CPU power management mechanisms and policies have been published. One of these, and the main focus of this thesis, is the use of dynamic voltage and frequency scaling (DVFS) at the operating system level, which allows a system to switch its frequency and voltage at run-time and is implemented in many modern processors.

Dynamic Voltage and Frequency Scaling (DVFS)

Dynamic voltage and frequency scaling is a widely studied power management technique which aims to reduce the power consumption of computing platforms and improve energy efficiency by dynamically scaling the CPU frequency and voltage.
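The leverage behind DVFS comes from the well-known CMOS dynamic power relation P_dyn ≈ A · C · V² · f: frequency enters linearly, but the supply voltage, which can often be lowered together with the frequency, enters quadratically. The sketch below uses hypothetical capacitance and voltage values purely to illustrate the scaling:

```python
def dynamic_power(c_eff, voltage, freq, activity=1.0):
    """CMOS dynamic (switching) power: P = A * C * V^2 * f, with
    activity factor A, effective capacitance C (farads), supply
    voltage V (volts) and clock frequency f (hertz)."""
    return activity * c_eff * voltage ** 2 * freq


# Hypothetical operating points: halving the frequency alone halves
# dynamic power, but halving the voltage as well yields a roughly
# cubic reduction (to one eighth here).
p_full = dynamic_power(1e-9, 1.2, 2.0e9)   # 1.2 V at 2.0 GHz
p_half = dynamic_power(1e-9, 0.6, 1.0e9)   # 0.6 V at 1.0 GHz
print(p_half / p_full)                      # 0.125
```

Note that this covers only dynamic power; static (leakage) power is unaffected by frequency scaling, which limits DVFS gains on modern process nodes, as discussed later in this section.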

There exists extensive work using DVFS in different systems: embedded [37], mobile [39, 40] and desktop systems, wireless sensor networks [35], and also server systems, where it is used to reduce cost and prevent overheating [69]. However, most of this work is based on analytic or simulation models.

Weiser [34] suggested using DVFS to reduce the energy consumption of computer processors by producing a number of techniques that can be employed by the operating system (OS). A trace-driven simulation approach is used, which determines the clock frequency for each interval and schedules jobs at different clock rates to evaluate the energy, as well as choosing a new CPU frequency at each OS scheduling point. Later this work was extended by Govil [38], using simulation to compare a number of policies for dynamic speed-setting and to develop algorithms which predict future CPU utilization based on recent events.

Grunwald et al. [8] implemented a number of clock scaling algorithms that reduce processor power by adjusting the processor speed. These algorithms attempt to minimize idle time and ignore the complications of DVFS. Because the algorithms were limited to two frequencies, they failed to achieve power savings.

Three policies designed to reduce energy consumption in web servers are described by Elnozahy et al. [22]. These policies are based on DVFS and request batching. The first policy uses DVFS together with a feedback-driven control framework to maintain system responsiveness. The second policy uses request batching, a mechanism to conserve energy during low workload intensities. The third policy uses both DVFS and the request batching mechanism to reduce processor energy usage over a wide range of workload intensities. Results show that the DVFS and request batching policies reduced the CPU energy used by the base system by 8.7% to 38% and by 3.1% to 27%, respectively, while the combined policy saved 17% to 42% of CPU energy.

Rajamony et al. [3] presented a case for managing power consumption in web servers. They measured the energy consumption of a typical web server under a variety of workloads derived from access logs of real websites; the power consumption was measured on each power supply line over time. They created a power simulator for web serving workloads that estimates CPU energy consumption with less than 5.7% error for the workloads for which it was used. Experimental results show that the CPU is the largest consumer of power in web servers and that DVFS is effective for saving energy, reducing CPU energy consumption by up to 36% while keeping server responsiveness within reasonable limits.

Shalan and El-Sissy [70] proposed an implementation of a negative feedback control algorithm that uses DVFS for power saving in soft real-time systems. Different soft real-time workloads (audio/video files) are used. To achieve the best performance, they periodically calculate the CPU utilization at runtime and report it to the controller, which adjusts the CPU operating point online using the proposed algorithm. The experimental results show that power savings of up to 24% can be achieved.

Huang and Feng [6] introduced a power-aware, eco-friendly run-time daemon algorithm called eco. This algorithm monitors CPU utilization in order to apply DVFS during processor stall cycles caused by long off-chip activities, improving energy efficiency. The results show tightly controlled performance (5% loss) compared to the adaptation algorithm and the Linux ondemand governor, while saving up to 50% of CPU energy and delivering substantial overall energy savings of 11%.

Ruan et al. [91] designed an energy-efficient scheduling algorithm (TDVAS) using dynamic voltage scaling for parallel applications on large-scale clusters. Due to significant communication latencies and high energy consumption, the scheduling of parallel applications on large-scale clusters is technically challenging. The idea of the TDVAS scheduling algorithm is to exploit idle processor time intervals in each node of a cluster, making it possible to leverage those intervals to dynamically reduce the supply voltage of the node, which provides significant energy savings. The salient feature of the TDVAS algorithm is that it addresses the increased execution times of parallel tasks when processor voltage is reduced, by exploiting task precedence constraints. Experimental results show that the TDVAS algorithm is helpful for reducing energy dissipation in large-scale clusters while limiting the impact on system performance.

Donald and Martonosi [71] used DVFS to mitigate thermal problems. Research in the thermal area has focused on keeping the temperature of the system's cores as low as possible and below a certain threshold. The core temperature was kept below a targeted threshold by using a distributed DVFS algorithm [71].

The memory system consumes a significant amount of power at the server level. David et al. [72] proposed memory dynamic voltage/frequency scaling (DVFS) to address this problem and increase energy efficiency. They presented a control algorithm based on observing memory bandwidth utilization and adjusting the memory frequency/voltage to minimize the performance impact. They concluded that memory DVFS can be an effective energy efficiency technique, especially when memory bandwidth utilization is low. A concurrent approach is MemScale [73], which also proposed DVFS for memory systems. However, there are two differences between the two approaches. First, the evaluation in [73] was done using simulations, while [72] used a real-system evaluation methodology. Second, the algorithm designs differ: MemScale estimates the performance impact when the memory frequency is reduced, while [72] switches frequency based on memory bandwidth utilization.

Le Sueur [74] analyzed the effectiveness of DVFS on recent computer systems, ranging from server- to mobile-class systems. He found that the effect of DVFS on total power consumption in modern systems is decreasing. This decrease is due to the scaling of transistors to smaller feature sizes in modern processors, which greatly increases static power consumption while reducing dynamic power consumption due to smaller gate capacitance and short-circuit currents. Furthermore, only a small reduction in system-level energy consumption is reported for realistic workloads such as MPEG video playback. It is worth mentioning that he found DVFS could still be effective at reducing system-level energy consumption when there is low load or high interrupt and/or direct memory access (DMA) rates, without impacting throughput or response latency.

2.3.2 Operating system power management

A significant volume of research has focused on operating system power management and, as a result, many frameworks have been developed. These frameworks aim to allow the OS to obtain accurate estimates of the power consumption of tasks running on a system and to make smart decisions to manage power effectively.

OS power management frameworks:

Many frameworks for operating system power management exist. Weissel and Bellosa [50] developed the Process Cruise Control framework for multitasking systems. It is based on modeling the response of workloads to changes in CPU frequency. Their technique uses event counters to choose a frequency domain for an executing task. They used the performance monitoring unit (PMU), available in most processors, to count parameters such as memory requests per cycle and instructions per cycle. The effect of frequency scaling on performance and total power consumption was measured. The system was able to achieve energy savings of up to 22% for memory-intensive applications when the frequency alone was scaled; it was further claimed that the system would achieve savings of 37% if the voltage were scaled as well.

The ECOSystem project attempted to build an energy-aware operating system based on Linux. Zeng et al. [75], [76], [77] developed ECOSystem, a framework that fairly shares energy resources among the tasks running on a system. They proposed the Currentcy model, which unifies resource management for different components of the system and allows energy itself to be explicitly managed. This framework can be used to specify energy management policies that selectively degrade the level of service in order to preserve energy capacity for more important work.

Snowdon developed the Koala framework [23], which is based on ideas from ECOSystem and Process Cruise Control [50]. It is similar to ECOSystem in that it uses energy accounting and modeling to estimate the energy used by processes, and similar to Weissel and Bellosa’s approach [50] in that it calculates which event counters are most significant for the system. It should be noted that Koala focuses on the CPU and memory, while ECOSystem considered the whole system. In contrast to Weissel and Bellosa’s approach, Koala separates the model from the policy.

Snowdon enhanced Weissel and Bellosa’s approach by developing a technique to automatically choose the best model parameters from the hundreds of possible events that can be measured by PMUs. He applied DVFS policies using different event counters depending on the predictions of the model. The first policy, maximum-degradation, chooses the lowest frequency that guarantees a predefined performance threshold. The second, the generalized energy-delay policy, minimizes a weighted power-delay product. The generalized energy-delay policy is represented by

η = P^(1−α) · T^(1+α)

where η is the value to be minimized, P is the power consumption, T is the execution time and α is a parameter taking a value between -1 and 1 that sets the desired trade-off between energy consumption and performance (α = 1 considers only performance, α = -1 only power, and α = 0 minimizes the energy P · T). This policy was used to determine the frequency at which a task should run at each time quantum.
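A hedged sketch of how such a policy can select an operating point: given per-frequency predictions of power and execution time for the next time quantum (the prediction model itself is out of scope here, and the example numbers are hypothetical), the policy simply minimizes η.

```python
def eta(power, time, alpha):
    """Generalized energy-delay metric: eta = P^(1-alpha) * T^(1+alpha).
    alpha = 1 considers only delay, alpha = -1 only power, and
    alpha = 0 minimizes the energy P * T."""
    return power ** (1 - alpha) * time ** (1 + alpha)


def choose_frequency(predictions, alpha):
    """predictions maps frequency (Hz) -> (predicted power in W,
    predicted execution time in s). Returns the frequency with the
    minimal eta for the chosen trade-off parameter."""
    return min(predictions, key=lambda f: eta(*predictions[f], alpha))


# Hypothetical predictions for one time quantum at two frequencies:
predictions = {1.0e9: (10.0, 2.0), 2.0e9: (30.0, 1.0)}
print(choose_frequency(predictions, alpha=1))    # 2000000000.0 (fastest)
print(choose_frequency(predictions, alpha=-1))   # 1000000000.0 (lowest power)
```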

The Koala framework was implemented in the Linux kernel, and it was shown that significant energy savings of up to 30% can be achieved on some workloads, for a performance impact of about 4%.

Research on power management at the operating system level has focused on finding the best policies to save power, but designing the best policy for all computers is not easy. Pettis et al. [79, 80] presented automatic policy selection, which aims to choose from a group of policies, instead of a single fixed policy, at run-time without user or administrator intervention. A software framework called the Homogeneous Architecture for Power Policy Integration (HAPPI) was proposed. They validated HAPPI’s energy savings by implementing the architecture on different platforms running Linux. HAPPI introduced an interface that simplifies the implementation of policies in a commodity OS and automatically selects the proper policy for each device. This approach allows policies to be compared simultaneously in order to select the best among a set of distinct policies at runtime. Experimental results showed that the best policy depends on a device’s power parameters and workload. HAPPI achieved energy savings within 4% of the best individual policy for each device in several computing systems, without a priori knowledge of workloads.

Regarding memory-aware scheduling, Bellosa et al. [36] designed a scheduling policy for avoiding resource contention and for optimal frequency selection. They analyzed the memory characteristics of tasks and proposed a scheduling policy that sorts core-specific run-queues to allow co-scheduling of tasks with a minimal energy-delay product. Then, according to the memory characteristics of the workload, the frequency is reduced when only memory-bound tasks are available. The results show that a memory-aware scheduling policy can reduce the energy-delay product (EDP) considerably.

Linux power management frameworks:

The Linux operating system is an obvious choice for research purposes because it is open-source and offers compatibility with existing hardware. The power consumption of a normally running system is a major target for power management in the Linux kernel, and several projects aim to reduce it. For example, LessWatts.org is an open-source project focused on delivering the components and tools required to reduce power consumption and improve power efficiency on systems running Linux.

The Linux kernel has two major frameworks that control CPU power management: the 'cpufreq' framework is used when the processor is active, and the 'cpuidle' framework when the processor has nothing to do. There are further mechanisms to save power when the CPU is not active, such as the dynamic tick, which prevents the CPU from being woken up every millisecond out of a low-power mode when there is nothing to do.

Pallipadi et al. [21] developed an in-kernel power manager for the Linux OS called the ondemand governor. The governor defines utilization thresholds to determine the appropriate CPU frequency: if the utilization is above a certain threshold, ondemand sets the frequency to the highest value; if it is below a certain threshold, the frequency is reduced by 20%. The ondemand governor attempts to minimize idle time by changing the CPU frequency in response to load while keeping the performance loss caused by reduced frequency to a minimum. The governor manages each CPU individually, and different cores in a CPU can be managed separately if supported by hardware. The power gains were compared against two static settings: the performance state (when a server runs at maximum frequency) and the power-save state (when it runs at a low frequency). They found that, even though the actual power gain differs from one application to another, the adaptive policies perform better than the performance state with an insignificant reduction in performance. The disadvantage of this approach is that it can lead to bad decisions when there are dependent logical CPUs that must run at one common frequency, as they are managed together as a single entity; the hardware then performs the coordination and runs the CPUs at the frequency required by the core with the highest utilization.
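The threshold logic described above can be sketched as follows; the thresholds and the 20% step mirror the description in the text, but the exact in-kernel values and sampling details are simplified here:

```python
def next_frequency(utilization, cur_khz, min_khz, max_khz,
                   up_threshold=0.80, down_threshold=0.30):
    """Simplified ondemand-style decision for one sampling interval:
    jump straight to the maximum frequency under high load, step the
    frequency down by 20% under low load, otherwise keep it."""
    if utilization > up_threshold:
        return max_khz                            # latency matters: go fast
    if utilization < down_threshold:
        return max(min_khz, int(cur_khz * 0.80))  # gradual step down
    return cur_khz


print(next_frequency(0.95, 1_200_000, 800_000, 2_400_000))  # 2400000
print(next_frequency(0.10, 1_200_000, 800_000, 2_400_000))  # 960000
```

Clamping the step-down to `min_khz` corresponds to the lower bound of the frequency range exported by the CPU-specific driver; the real governor re-evaluates this decision at every sampling interval.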

The Linux kernel thus takes advantage of the different power management features that are already available on a platform, and it can achieve a substantial amount of power savings by enabling them. However, there is still room for improvement as new platform features become available in the future.

Chapter 3

Concept

In this chapter, the concept of this work is explained. First, an overview is given of the implementation of DVFS in the Linux kernel that was used in the experiments. Secondly, the chapter presents the methodology used to set up the experiment's architecture and components. The last section summarizes the chapter.

The design of the concept has been driven by the requirements and research questions introduced in Chapter 1. The experimental results and the evaluation of this concept are presented in the next chapter.

3.1 DVFS support in Linux

Modern CPUs support changing the core voltage and operating frequency of the processor at run time to minimize the power consumption of the system. Since the release of Linux kernel version 2.6.0, the CPUfreq subsystem has implemented this feature, providing DVFS support in the Linux kernel. The next subsection describes the CPUfreq subsystem in more detail.

3.1.1 CPUfreq Subsystem

CPUfreq is a Linux kernel framework that supports dynamic frequency scaling to reduce the power consumption [11]. Figure 3.1 shows a high-level view of the CPUfreq subsystem [11], which contains the following components:

1. CPUfreq module: provides a common interface between the CPU-specific frequency-control technologies and the CPU frequency-controlling policies, enabling changes to the CPU frequency.

2. CPU-specific driver: implements the various technologies that support DVFS. For example, Intel processors utilize Enhanced Intel SpeedStep Technology (EIST), while AMD processors implement Cool'n'Quiet for desktops and PowerNow! for laptops, and VIA processors employ LongHaul technology. Even though multiple drivers can exist in the kernel, the CPUfreq infrastructure allows only one CPU-specific driver per platform to change the frequencies. It should be noted that Intel processors are covered by both the acpi-cpufreq and speedstep-centrino drivers, while the AMD and VIA processors use the powernow-k8 and longhaul drivers, respectively.

3. In-kernel governors: once the proper driver is loaded, the desired CPU policy governor must be chosen. The governor policies are built as kernel modules, and they manage the actual behavior of the CPU. A governor monitors the performance requirements of the system and changes the CPU frequency whenever a change is required, based on criteria such as the usage of the CPU. The following policy governors are available:

• Performance: keeps the CPU at the highest possible frequency.

• Powersave: in contrast to performance, keeps the CPU clock at the lowest frequency.

• Ondemand: sets the CPU speed dynamically depending on the workload.

• Conservative: also sets the CPU frequency dynamically like ondemand, but increases it step by step rather than changing it drastically.

• Userspace: exports the available frequency information to the user, who can manually set the operating frequency of the CPU to a specific value.

There are several software packages for managing CPU frequency settings. The cpufrequtils package, which was used in the experiments, provides useful command-line utilities and a daemon script to set the governor at boot [47]; the Linux kernel supplies the drivers and driver infrastructure underneath.
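For illustration, governor selection with cpufrequtils could look as follows; the CPU number and the 1000 MHz value (the AMD server's lowest frequency) are examples, and the commands require root privileges. This is a configuration sketch rather than part of the measured experiment.

```shell
# Show the supported frequency range and the currently active governor.
cpufreq-info -c 0

# Activate the ondemand governor on CPU 0.
cpufreq-set -c 0 -g ondemand

# With the userspace governor, a fixed frequency can be forced manually.
cpufreq-set -c 0 -g userspace
cpufreq-set -c 0 -f 1000MHz
```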

Figure 3.1: A high-level view of the CPUfreq subsystem

3.2 Methodology

This section presents the general view of the system architecture, followed by a more detailed view of the architecture of its components. Additionally, it describes the measurement system for the power, performance and CPU-utilization measurements. Then the experimental methodology is presented.

3.2.1 System architecture

Figure 3.2 shows the general architecture and the components of the experimental system. The architecture consists of four main components: client, server, power meter and a 1 Gbit/s Ethernet switching router, which interconnects the server and the client. The client generates a workload by sending HTTP requests and interacting with the server. An additional node, connected to the power meter, records the power consumption.

Figure 3.2: General architecture of the experiment

3.2.2 System components

This part describes the specific components used to implement the above-mentioned architecture.

1. Users (clients): To generate user requests, a synthetic workload was used to simulate the 'YouTube' service. This workload was developed in another experiment in our energy lab [7]. It models the user's behavior and simulates a maximum of 100 users; it was modified to fit the experimental requirements. To keep the workload comparable across experiments, the range of user requests was fixed and unchanged over time. The experiment was executed in two scenarios. In the first, the users request videos of known and available formats. In this scenario, the request command of the original application, which uses the Wget [14] utility to download the video, was not changed, as shown in the pseudocode in Listing 3.1.

wget http://<YouTube-server-address>/<filename>

Listing 3.1: Pseudo code for generating user requests without transcoder

In the second scenario, the users request videos of known but unavailable formats, and the servers employ a transcoder to carry out the format conversion. This additional transcoding operation required a change to the original application, as shown in Listing 3.2.

wget http://<YouTube-server-address>/<PHP-script-name>?<filename>

Listing 3.2: Pseudo code for generating user requests with transcoder

After a certain number of user requests had been generated, the program waited for all of the downloads to finish.
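The client loop can be sketched as follows; `fetch` is a stand-in for the actual wget call against the multimedia server, and the batch of three requests here replaces the up to 100 simulated users of the real workload.

```shell
# Sketch of the request generator: start concurrent downloads, then wait.
fetch() { echo "downloaded $1"; }   # placeholder for: wget "http://<server>/$1"

for i in 1 2 3; do
  fetch "video$i.flv" &             # each simulated user downloads in the background
done
wait                                # block until every download has finished
```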

2. Servers: The experiment was performed using two different servers. The first server was equipped with an AMD Athlon 64 X2 dual-core processor 3800+, and the second with an Intel Core 2 Duo processor E8500. Table 3.1 outlines the main characteristics of the AMD and Intel server systems.

Server              AMD                          Intel
Processor           Athlon 64 X2 3800+           Core 2 Duo E8500
Clock speed (GHz)   2.0                          3.16
Cores / Threads     2/2                          2/2
Frequency (GHz)     1.0 - 2.0                    1.9 - 3.16
Voltage (V)         0.8 - 1.55                   0.85 - 1.36
L2 cache            512 KB                       6 MB
Memory              4 GB DDR2 SDRAM 133 MHz      4 GB DDR2 SDRAM 667 MHz
Storage (GB)        160                          160

Table 3.1: Specifications of the server systems used

It is worth mentioning that Ubuntu Server Edition (Ubuntu 10.04.3 LTS) was running on both servers. The experiment required installing the following:

• Apache2 [13], installed on both servers to handle the users' HTTP requests.

• The PHP server-side scripting language [15], working together with the Apache server in order to run the transcoder script.

• The FFmpeg tool [16] to execute the transcoding. It is one of the most popular command-line tools for converting media files such as audio, video and images from one format to another.

• The Dstat tool [10] to monitor server-resource consumption. It is a powerful tool for generating system-resource statistics, monitoring all server resources with a single command.
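An invocation along the following lines could produce such statistics; the interval, duration and file name are illustrative, not the settings used in the thesis.

```shell
# Record CPU, memory and disk statistics once per second for one hour,
# writing each sample as a CSV row for later analysis in R.
dstat -c -m -d --output server_stats.csv 1 3600
```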

Moreover, the Linux CPUfreq subsystem was enabled to control the processor's power-saving features. The servers were configured as multimedia servers (like YouTube). The Apache server hosted a database containing more than 100 videos, whose sizes vary between 3 MB and 100 MB. Figure 3.3 describes the server environment.

Figure 3.3: The server environment

The AMD Athlon 64 server can operate at 3 frequencies: 1000 MHz, 1800 MHz and 2000 MHz. The core operating voltage of this processor can be varied between 0.8 V and 1.55 V, linearly with the frequency. The AMD processor uses Cool'n'Quiet technology [89] to support voltage and frequency scaling; the driver it uses is powernow-k8. In the second server, equipped with the Intel Core 2 Duo processor, the core operating voltage can be varied between 0.85 V and 1.3625 V, with corresponding operating frequencies of 1999 MHz and 3165 MHz scaling linearly with the voltage. It uses Enhanced Intel SpeedStep Technology (EIST) [94] to support DVFS; the driver used is acpi-cpufreq.

The operating system allows the CPU-utilization sampling interval to be set between 10700 and 4294967295 µs on the AMD server, and between 10000 and 4294967295 µs on the Intel server. A long sampling interval yields inaccurate estimates at minimal overhead, while a short sampling interval yields accurate estimates at high overhead. To balance accuracy and overhead, the sampling interval was chosen as 107000 µs for the AMD server and 20000 µs for the Intel server.
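On the kernels of this era, the ondemand sampling interval is exposed through sysfs and could be inspected and set roughly as follows; values are in microseconds, the exact path varies between kernel versions, and writing requires root.

```shell
# Inspect the permitted minimum, then set the interval used on the AMD server.
cat /sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_min
echo 107000 > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate
```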

As mentioned earlier, the experiment was carried out in two different scenarios, in which Apache was run with and without a transcoder, as described in Figure 3.4. In the first scenario (without transcoder), the servers serve video files for download without changing their formats. In the second scenario (with transcoder), the transcoder converts the requested videos between the FLV, MPEG-4 and AVI formats before the videos are ready to be downloaded. A PHP script, run inside the Apache web server, was developed to execute the transcoding operation using the FFmpeg tool. It is worth noting that, to better measure the web server's capability and to make the setup more realistic, downloaded videos are not cached on the server side after the download finishes; if the next user decides to download the same video, it will not be served from a cache. In the second scenario, the video converted to the other format is saved temporarily in a separate folder on the hard disk, and after the download finishes this video is removed, for the same reason as in scenario one.
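The transcoding step carried out by the PHP script amounts to an FFmpeg invocation of roughly the following form; the file names are placeholders, and the original script's exact codec options are not reproduced here.

```shell
# Convert a requested FLV video to AVI before it is offered for download.
ffmpeg -i requested_video.flv converted_video.avi

# The converted copy is removed after the download, so it is never cached.
rm converted_video.avi
```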

3. Power measurement device: For measuring the power, the servers were connected to Yokogawa WT210 digital power meters [93]. These devices can measure DC and AC power consumption at a rate of 10 Hz and DC currents between 15 µA and 26 A with an accuracy of 0.1%. The experimental power readings were collected through these power meters.

3.2.3 Measurement system

Different techniques are used to make the power, CPU utilization, and performance measurements. These techniques can be expressed as follows:

Figure 3.4: The experiment scenarios

1. The power: Power consumption can be measured at different points in the server, either directly at the power supply or at the system's individual supply lines. Measuring directly at the power supply gives the total power consumed by the system, while measuring at the individual supply lines gives the power consumed by each component of the server on the related line. The following power supplies were monitored:

• The main power supply of the server.

• +3.3 V supply to the motherboard, powering the video card and Ethernet card.

• +5 V supply to the motherboard, powering the RAM and processor.

• +12 V supply to the motherboard, powering the processor cooling fan.

• +12 V supply to the CPU, powering the processor.

• +5 V supply to the single disk drive, powering the controller and other components.

• +12 V supply to the single disk drive, powering the motor and head-actuator mechanisms.

The power was measured on all of these supply lines under the different power management policies and scenarios.

2. CPU utilization:

The Dstat tool [10] was used to obtain the actual CPU-utilization measurements. This tool gives real-time information about all server resources. The data were written directly to a CSV file [87], to be imported later into R (a statistical and graphical programming language) [78] for analyzing the experimental results. Dstat reports statistics for the user, system, idle, wait, hardware-interrupt and software-interrupt CPU states. To calculate the CPU utilization as the percentage of time the processor spends doing work (as opposed to the time it spends in the idle state), the following equation was used:

CPU utilization = 100% − idle% (3.1)
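Applied to a single dstat sample, Equation (3.1) works out as in the sketch below; the idle percentage is a made-up value, not measured data.

```shell
# CPU utilization = 100% - idle%; 'idle' would come from dstat's idle column.
idle=92.0
util=$(awk -v i="$idle" 'BEGIN { printf "%.1f", 100 - i }')
echo "CPU utilization: ${util}%"   # prints: CPU utilization: 8.0%
```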

3. The performance:

The Apache access-log file [88] records all incoming requests and all requests processed by the server, providing feedback about the server's activity and performance. It was used to extract the total size of the videos downloaded up to a given point in time in each experiment.
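The extraction can be sketched with awk over the access log's response-size field; the two log lines below are synthetic, and the field position assumes Apache's common log format, where the bytes sent are the tenth field.

```shell
# Sum the bytes-sent field and report the total volume served in megabytes.
printf '%s\n' \
  '10.0.0.2 - - [21/Aug/2012:12:00:00 +0200] "GET /video1.flv HTTP/1.1" 200 3145728' \
  '10.0.0.3 - - [21/Aug/2012:12:00:05 +0200] "GET /video2.flv HTTP/1.1" 200 1048576' \
  > access.log
awk '{ bytes += $10 } END { printf "%.2f MB\n", bytes / (1024 * 1024) }' access.log
# prints: 4.00 MB
```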

3.2.4 The experiment methodology

All measurements were taken by running the experiments for the same duration. Each experiment was conducted for one hour, and each test was run at least 5 times to ensure that the results were statistically sound. Workloads that stress the system components were run while the utilization measurements were recorded. The measurement data were collected after every experiment and analyzed to determine the power consumption, performance and processor utilization. The experiments were conducted with the CPUfreq subsystem governors; the following policy governors were selected for the experiment:

• Ondemand

• Powersave

• Performance

• Conservative

3.3 Summary

In this chapter, the design concept of the whole system was presented. The development of the system, which consists of the server, the client, the power meter and the measurement system (measuring power, performance and CPU utilization), as well as the power management policies, has been discussed in detail.

Chapter 4

The experimental results

This chapter covers the details of the system's behavior under the users' requests on the two servers. In Section 4.1, the experimental results obtained by applying the CPUfreq governors on the two server platforms are described. The measurements cover three aspects: power consumption, CPU utilization and performance. In Section 4.2, the analysis of the experiments is presented. Finally, a summary of the entire chapter is given.

4.1 Measurements

The experiments aimed to analyze the relationship between the CPU utilization, the power consumed by the various components of the multimedia server, and the performance when different power management policies are applied.

As mentioned in subsection 3.2.3, power consumption was measured at different points throughout the servers. The experiment was performed for each scenario (with and without transcoder) and for each governor policy (ondemand, powersave, performance and conservative). The acquired measurement data were analyzed to determine the power consumption, performance and processor utilization. Thereafter, the results for power consumption and CPU utilization were plotted as cumulative distribution functions (CDF) over time, implemented in the R statistics package [78]. In these plots, the X-axis represents the cumulative distribution function (CDF) of each measurement, and the Y-axis represents the power consumption measured in watts, or the percentage of CPU utilization when that is plotted together with the CDF.

4.1.1 Power Consumption

To understand the power consumption characteristics of the multimedia server under the different frequency scaling policies, the power measurement experiment examined the overall power consumption (AC power consumption) of the multimedia server as well as the DC power consumption.

The AMD server was built on a Siemens/Fujitsu D2461 motherboard, while the Intel server was built on a Siemens/Fujitsu D2581 motherboard.

To move in the right direction in the investigation, one should be aware of the motherboard architecture. Every device in a computer system connects either directly or indirectly to the motherboard. The motherboard has a chipset that determines the type of processor, the type and capacity of the RAM, and the internal and external devices that the motherboard supports. Most modern chipsets consist of a Northbridge and a Southbridge, as shown in Figure 4.1. The Northbridge chip handles communication among the CPU, the PCI Express graphics card and the RAM. The difference between Intel-based and AMD-based motherboards is that on AMD-based motherboards the memory controller is built into the CPU. The Southbridge handles all of the computer's I/O functions, such as USB, audio, the hard drives, the system BIOS and others. Motherboards come in two form factors, ATX and BTX, each with a corresponding power supply type. ATX power supplies are the most common, and they fit all sizes of ATX and BTX motherboards [18].

The power supply converts AC power into DC power that the motherboard and drives need.

Figure 4.1: General diagram of the main components of a motherboard

The power supplies are connected to the motherboard through Molex connectors that provide power to the motherboard, hard disk drive and other components. The motherboards are powered through 4-pin and 24-pin Molex connectors. The main power of the motherboard is supplied through the 24-pin connector, which provides +3.3 V, +5 V and +12 V (in this thesis, this voltage is denoted as 12 V motherboard (MB)). The 4-pin connector provides +12 V, which is responsible for supplying the processor with power (in this thesis, this voltage is denoted as 12 V CPU); its black wires are grounded.

The ISL6312 four-phase pulse-width modulation (PWM) control integrated circuit (IC) provides an accurate voltage-regulation system for advanced microprocessors; this IC controller works with both Intel and AMD microprocessors. The CPU core voltage is generated by a three-phase voltage regulator: the main voltage of the regulator comes from the 12 V line of the 4-pin connector, and the PWM controller takes its core voltage from the 5 V line. Furthermore, the ISL6545 PWM controller works with either the 5 V or the 12 V supply voltage of the 24-pin connector. The motherboard provides the memory unit with a single-phase voltage regulator. The motherboard also provides additional voltage regulators to the Southbridge, which predominantly uses the 12 V line of the 24-pin connector, while the other I/O controllers predominantly draw current from the 3.3 V line [90].

In the following subsection, power measurements at different points in the servers are presented. The power consumption of the servers is characterized using the two scenarios described in the previous chapter while varying the DVFS power management policy.

• Overall power measurement

The two server platforms were connected to a power meter to measure the total power drawn by each system when it was tested with the two workload scenarios.

– AMD server

Figure 4.2 displays the CDF of the overall power consumption of the AMD multimedia server when the first scenario (without transcoder) and the second scenario (with transcoder) were applied.

As shown on the left of Figure 4.2, the server under the without-transcoder scenario consumed, on average, 58.5 W in the performance state, whereas it consumed 53 W with the ondemand, 51.3 W with the conservative and 51.9 W with the powersave policy. The second scenario, displayed on the right of Figure 4.2, revealed that the average power consumption of the server operating in the performance state was 73.3 W, while it was 78.1 W under the conservative policy and 81.8 W under the ondemand policy. For the powersave policy, the average power consumption was 62.4 W.

– Intel server

Figure 4.2: The cumulative distribution of the overall power consumption of the AMD server (left) without transcoder: it runs Apache only; (right) with transcoder: it runs both Apache and the FFmpeg transcoder

Figure 4.3 displays the CDF of the overall power consumption of the Intel multimedia server when the first scenario (without transcoder) and the second scenario (with transcoder) were applied.

Figure 4.3: The cumulative distribution of the overall power consumption of the Intel server (left) without transcoder: it runs Apache only; (right) with transcoder: it runs both Apache and the FFmpeg transcoder

As shown on the left of Figure 4.3, in the first scenario the server consumed on average 53.2 W in the performance state, whereas it consumed 52.6 W with the ondemand, 52.59 W with the conservative and 52.86 W with the powersave policy. As the right of Figure 4.3 shows, when the second scenario was applied, the average power consumption of the server operating in the performance state was 101.52 W, while it was 100.89 W under the conservative policy and 99.96 W under the ondemand policy. For the powersave policy, the average power consumption was 70.86 W. Due to the transcoding process, which requires additional work to convert the video files to another format, the power consumption increased in both servers compared with the without-transcoder scenario. Furthermore, the power consumption of the Intel server in the transcoder scenario is higher than that of the AMD server; this is because the Intel processor's clock speed is higher than the AMD processor's, which leads to more power consumption.

• DC power measurement

Power supplies convert AC power to DC power so that it can be used by the subsystems' circuitry. Throughout the experiments, two aspects of the DC power consumption were observed: a dynamic aspect and a static aspect. The dynamic aspect was observed where the DC power consumption changed under the various settings, namely on the 12 V CPU and 5 V lines. In the static aspect, the DC power consumption changed only slightly under the various settings, namely on the 3.3 V and 12 V MB lines, as well as on the 12 V and 5 V lines of the disk drive. In the next part, the DC power consumption of the different subsystems (processor, memory, disk drive and other I/O) is investigated. This provides a deeper understanding of the power consumption characteristics of the server under the different frequency scaling policies.

• Dynamic aspect

1) 12 V CPU power supply

The CPU power consumption was measured at the 12 V CPU supply line under the four frequency scaling policies, with and without the transcoder. Figure 4.4 shows the power consumption of the CPU of the AMD server, and Figure 4.5 shows that of the Intel server.

Figure 4.4: The cumulative power consumption of 12 V CPU of AMD server when (left) without transcoder, (right) with transcoder

As can be seen from Figures 4.4 and 4.5, the processor accounts for a significant portion of the DC power consumption of the server on the 12 V CPU line. The power consumption on this line varies with the workload of the server and the frequency setting. In the first scenario, when Apache ran alone, frequency scaling worked effectively on the processor of the AMD server: the power consumption improved by up to 42% in comparison with the performance policy (without frequency scaling). On the other hand,

Figure 4.5: The cumulative power consumption of 12 V CPU of the Intel server (left) without transcoder, (right) with transcoder

the frequency scaling on the Intel server's processor, with the same workload, had no influence; it consumed about 10 W under the different DVFS policies. In the second scenario, when both Apache and the FFmpeg transcoder were running, the processor consumed about 30 W in the AMD server and about 48 W in the Intel server, except when the processor frequency was at its minimum: there the power consumption improved by up to 50%, but at the cost of a performance loss of up to 65% in the Intel server and up to 10% in the AMD server, as we will see later.

2) 5 V power supply

Most of the power coming from the 5 V line supplies the memory subsystem, while the rest supplies the power for transferring data between the memory and the CPU. Figure 4.6 displays the power consumption on the 5 V supply line of the AMD server, while Figure 4.7 shows the power consumption on the same supply line of the Intel server. This line was measured under the four frequency scaling policies in the two scenarios, with and without transcoder.

As can be seen from Figures 4.6 and 4.7, the power consumed on the 5 V line also varies with the workload of the server and the frequency setting.

Figure 4.6: The cumulative power consumption of the 5 V supply line to the motherboard of the AMD server (left) without transcoder, (right) with transcoder

Figure 4.7: The cumulative power consumption of the 5 V supply line to the motherboard of the Intel server (left) without transcoder, (right) with transcoder

When the Apache server ran alone, the 5 V line consumed about 8 W in the AMD server and nearly 12 W in the Intel server. When the two services ran at the same time (Apache and transcoder), the power consumption of this line increased slightly, by about 2 W in both servers, since the transcoder is a memory-intensive service. The data obtained for this line suggest that DVFS may not work well for the memory subsystem, because there are only slight differences in power consumption between the four DVFS policies. The behavior of the memory subsystem was analyzed using the measurements collected with the Dstat utility. The results show that the amount of memory used by the kernel as cache was significant when Apache worked alone in the first scenario, while the amount of virtual memory used was insignificant, as seen in Figure 4.8. The opposite happened in the second scenario: the used memory increased and the cache memory decreased when Apache and the transcoder ran together, as displayed in Figure 4.9. When the Apache web server starts, it launches multiple server processes and distributes the traffic among them; it then loads the PHP library to execute the PHP script, increasing the memory usage. In the second workload scenario, the used memory grows over time because the transcoding process saturates the memory with reading and writing of video files. From the experiments, we notice that the behavior of the cache and the used memory greatly affected the memory performance, and hence the power consumption: whenever the memory usage (cache or used) increases, the power consumption also increases. Furthermore, while the used memory kept increasing, the two cores worked at the same time, which contributed to the increase in power consumption. However, when the memory usage is small (no transcoding is done), or when the memory reads from the hard disk to get the requested video, as in the first scenario, the amount of power consumed is reduced, since one of the cores goes into the idle state.

Figure 4.8: The cumulative distribution of the memory used by the kernel in the AMD server under the four DVFS policies: (left) cache memory, (right) virtual memory, without transcoder

Figure 4.9: The cumulative distribution of the memory used by the kernel in the AMD server under the four DVFS policies: (left) cache memory, (right) virtual memory, with transcoder

• Static aspect

As mentioned before, in the static aspect the DC power consumption on the supply lines varies only slightly while executing the workloads, and this power consumption is also quite small.

1) 3.3 V power supply

The 3.3 V line supplies power to the peripherals, including the network interface controller (NIC) and the graphics card. Figure 4.10 displays the power consumption on the 3.3 V supply line of the AMD server under the two workload scenarios. It was observed that the power consumption did not change with the workload; it remains nearly constant at about 3.36 W.

Figure 4.10: The cumulative power consumption of 3.3 V supply line to motherboard in AMD server (left) without transcoder, (right) with transcoder

2) 12 V motherboard (MB) power supply

The 12V motherboard’s power supply provides power to the CPU fan. In addition, 4.1. Measurements 53

it is used as a control signal by the voltage regulators of the Southbridge and the memory termination logic. Figure 4.11 displays the power consumption in 12 V motherboard’s power supply line of the AMD server. It was observed that the power consumption of this line is mostly constant about (7 W) with two different workload scenarios.

Figure 4.11: The cumulative power consumption of 12 V supply line to motherboard of AMD server (left) without transcoder, (right) with transcoder

3) 12 V hard drive (HD) power supply

The 12 V HD power supply powers the motor and head-actuator mechanisms of the single disk drive. Figure 4.12 displays the power consumption on the 12 V HD supply line of the AMD server, while Figure 4.13 displays that of the Intel server. The 12 V disk power was also nearly constant: 3.5 W in the AMD server and 3 W in the Intel server.

Figure 4.12: The cumulative power consumption of the 12 V supply line to the hard disk in the AMD server (left) without transcoder, (right) with transcoder

Figure 4.13: The cumulative power consumption of the 12 V supply line to the hard disk in the Intel server (left) without transcoder, (right) with transcoder

4) 5 V hard drive (HD) power supply

The 5 V HD power supply provides power to the controller and other components of the single disk drive, and it showed relatively little variation. Figure 4.14 displays the power consumption on the 5 V HD supply line of the AMD server, while Figure 4.15 displays that of the Intel server. The power consumption on this line was also constant, about 3 W, in both servers.

Figure 4.14: The cumulative power consumption of 5V supply line to Hard Disk in AMD server (left) without transcoder, (right) with transcoder

Figure 4.15: The cumulative power consumption of 5V supply line to Hard Disk in Intel server (left) without transcoder, (right) with transcoder

The behavior of the hard drive subsystem was analyzed using the measurements collected with the Dstat utility. The results showed that the read and write operations on the hard disk increased significantly in the second scenario in comparison with the first. This is because the transcoding process in the second scenario requires more read and write operations on the video files, whereas the high cache usage in the first scenario leads to a reduction in disk activity. This can be seen in Figures 4.16 and 4.17.

Figure 4.16: The cumulative distribution of (left) read operations and (right) write operations of the hard disk in the AMD server without transcoder

Figure 4.17: The cumulative distribution of (left) read operations and (right) write operations of the hard disk in the AMD server with transcoder

4.1.2 CPU utilization

Figures 4.18 and 4.19 show the percentage of CPU usage in the AMD and Intel servers, calculated according to Equation 3.1. The system was rebooted after every test in order to cool down and reset it.

Figure 4.18: The cumulative CPU utilization average in AMD server (left) without transcoder and (right) with transcoder

In the first scenario, when Apache ran alone, the CPU utilization was found to be below 10%. In the second scenario, when the server ran both services (Apache and the transcoder), the CPU utilization exceeded 98% in both servers.

Figure 4.19: The cumulative CPU utilization average in the Intel server (left) without transcoder and (right) with transcoder

Figure 4.20: The detailed breakdown of the average CPU utilization in the AMD server with transcoder; CPU states (system, user, idle, wait, hardware interrupt, software interrupt)

Figure 4.20 details the CPU-utilization measurements for the transcoder scenario. The data were obtained by analyzing the CPU states reported by Dstat. The dominant CPU state in the second scenario is the user state, i.e. the time spent running user-level processes; in the performed experiments this is the PHP script that transcodes the video files to the other formats. The user state reached up to 80% of the CPU utilization.
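A minimal sketch of how the per-sample utilization can be averaged from the Dstat state percentages follows. The exact form of Equation 3.1 is not reproduced here; the sketch assumes utilization is taken as 100 minus the idle share (user + system + wait + interrupt time), and the dictionary keys mirror Dstat's state names.

```python
def cpu_utilization(samples):
    """Average CPU utilization over a run, taking each Dstat sample's
    utilization as 100 - idle percentage.  'samples' is a list of dicts
    of per-state percentages (keys as in Dstat: usr, sys, idl, wai, ...).
    The exact formula of Eq. 3.1 is assumed here.
    """
    per_sample = [100.0 - s["idl"] for s in samples]
    return sum(per_sample) / len(per_sample)

# Hypothetical transcoder-scenario samples: the user state dominates
samples = [
    {"usr": 80.0, "sys": 15.0, "idl": 1.0, "wai": 4.0},
    {"usr": 82.0, "sys": 14.0, "idl": 0.5, "wai": 3.5},
]
print(cpu_utilization(samples))  # 99.25
```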

4.1.3 Performance

To calculate the throughput of the multimedia servers under the different DVFS power-management policies, the Apache access log file was used. Tables 4.1 and 4.2 present the throughput (in GB) of the AMD and Intel servers, respectively. Each experiment ran for one hour, applying one of the two workload scenarios together with one of the DVFS power-management policies.

Governor policy    Without transcoder (GB)    With transcoder (GB)
ondemand           49.87089                   1.3622
powersave          55.24799                   0.6737399
performance        51.80392                   0.9463543
conservative       50.17798                   0.5701639

Table 4.1: A comparison of the throughput of AMD multimedia server with and without transcoding

As expected, the throughput decreased significantly when the second scenario was applied, because the transcoding process needs time to convert the video files to the other formats.

Governor policy    Without transcoder (GB)    With transcoder (GB)
ondemand           50.45462                   1.86271
powersave          51.33617                   0.4185703
performance        52.74526                   1.328156
conservative       47.26372                   1.345913

Table 4.2: A comparison of the throughput of Intel multimedia server with and without transcoding
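As a sketch of how these throughput figures can be extracted from the access log, the snippet below sums the response sizes. It assumes Apache's Common Log Format, where the response size is the final field and "-" marks responses without a body; the thesis does not state the exact log format used, so this layout is an assumption.

```python
def throughput_gb(log_lines):
    """Total bytes served according to an Apache access log in Common
    Log Format (response size is the last field; '-' means no body),
    converted to GB."""
    total = 0
    for line in log_lines:
        size = line.rsplit(None, 1)[-1]
        if size.isdigit():
            total += int(size)
    return total / 1024 ** 3

# Hypothetical log excerpt: one 5 MB video served, one failed request
logs = [
    '10.0.0.5 - - [21/Aug/2012:10:00:01 +0200] "GET /videos/a.avi HTTP/1.1" 200 5242880',
    '10.0.0.6 - - [21/Aug/2012:10:00:02 +0200] "GET /videos/b.flv HTTP/1.1" 404 -',
]
print(round(throughput_gb(logs), 6))  # 0.004883
```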

4.2 Experiments analysis

Based on the measurements in the previous section, the comparison of the three criteria (power consumption, CPU utilization and throughput) under the different frequency-scaling governor policies is summarized in Tables 4.3 to 4.6.

• AMD server

Governor policy    Power (W)    CPU (%)    Throughput (GB)
ondemand           53           5.79       51.80392
powersave          51.9         5.27       50.17798
performance        58.5         4.44       49.87089
conservative       51.3         5.6        55.24799

Table 4.3: A comparison of the power, CPU utilization and throughput of the AMD multimedia server without the transcoding workload

Governor policy    Power (W)    CPU (%)     Throughput (GB)
ondemand           81.8         99.97356    0.9463543
powersave          62.4         99.97458    0.5701639
performance        73.3         99.52904    1.362283
conservative       78.1         99.65371    0.6737399

Table 4.4: A comparison of the power, CPU utilization and throughput of the AMD multimedia server with the transcoding workload

• Intel server

Governor policy    Power (W)    CPU (%)     Throughput (GB)
ondemand           52.6         1.204606    50.45462
powersave          52.86        1.572874    51.33617
performance        53.22        2.034471    52.74526
conservative       52.59        1.472948    47.26372

Table 4.5: A comparison of the power, CPU utilization and throughput of the Intel multimedia server without the transcoding workload

The total power consumption of the system ranged between 48 W and 90 W on the AMD Athlon and between 49 W and 101 W on the Intel Core2Duo, depending on the workload. In general, the Intel motherboard requires more power than the AMD motherboard, because Intel motherboards are designed for higher data-transfer speeds and support additional memory sockets [20].

When comparing the two workloads, the first scenario, which applied the Apache server without the transcoder, is an I/O-intensive workload. As can be seen from Figures 4.8, 4.18 and 4.19, the cache was stressed while the CPUs of both servers were idle most of the time, with CPU utilization below 10%. Each core only reads or writes the requested video, which resides in the server's memory, without performing any other operation; the server therefore has less work to do and consumes less power. In contrast to the I/O-intensive workload, the CPU-intensive workload is the transcoder. This workload kept the processors busy at over 98% CPU utilization, showing that the system was fully utilized (Figures 4.9, 4.18 and 4.19) and that the CPUs had no time to enter the idle state during the execution of the workload. Accordingly, the power consumption of the server increased significantly in comparison with the I/O-intensive workload. The observations indicate that the power consumption of both servers increased with increasing CPU utilization, as shown in Table 4.7.

Governor policy    Power (W)    CPU (%)     Throughput (GB)
ondemand           99.96        98.23628    1.86271
powersave          70.86        99.97383    0.4185703
performance        101.52       98.17405    1.328156
conservative       100.89       98.71851    1.345913

Table 4.6: A comparison of the power, CPU utilization and throughput of the Intel multimedia server with the transcoding workload

When no workload is executed on the servers, the processor spends nearly all of its time in the idle state; the 12 V CPU line then consumes approximately 5.0 W in the AMD server and 8.8 W in the Intel server. The overall power consumption of the idle AMD server was about 51 W, and that of the idle Intel server about 49 W. This power is needed to run the operating system and to maintain the hardware peripherals (such as the memory, disks, PCI slots and fans) even when the server is not loaded with user tasks. Table 4.7 and Figure 4.21 show the total and CPU power consumption and the CPU utilization of the servers with and without workload.

Power consumption, CPU utilization and throughput based on different frequency scaling policies:

Reducing the CPU frequency to the lowest level increases the execution time of tasks, which reduces the throughput; however, the power consumption also decreases.

Platform    Workload              Power (W)    CPU power (W)    CPU (%)
AMD         No workload           51.28        5.09             0.45
AMD         Without transcoder    53           6.74             5.79
AMD         With transcoder       81.8         31.14            99.97
Intel       No workload           49.54        8.88             0.28
Intel       Without transcoder    52.6         9.56             1.2
Intel       With transcoder       99.9653      46.44            98.23

Table 4.7: A comparison of the total and CPU power consumption and the CPU utilization of the servers under different workloads

The overall power measurements (Figures 4.2 and 4.3) show that the powersave policy saved a significant amount of power (up to about 25%) in the CPU-intensive workload, but did not actually save power in the I/O-intensive workload. As Tables 4.3 and 4.5 show, under the I/O-intensive workload the servers consumed more power with the powersave policy than with the conservative policy on the AMD server, and more than with both the ondemand and conservative policies on the Intel server. Conversely, the throughput was reduced in the CPU-intensive workload but remained acceptable in the I/O-intensive workload.

Increasing the CPU frequency decreased the execution time of tasks and increased the throughput, but it also increased the power consumption, as the performance-policy rows of Tables 4.3, 4.5 and 4.6 show. The performance policy is intended to provide the user with the best possible service regardless of the power cost, but the experimental results showed that this was not always the case. When the I/O workload ran on the AMD server at the highest CPU frequency, the power consumption was the highest, yet the service was not the best compared with the other governor policies. The opposite happened when the CPU-intensive workload was applied to the two servers. These observations on the powersave and performance policies clearly indicate that neither policy is suitable for a wide range of workloads.

All the frequency-scaling policies were comparatively effective on the I/O-intensive workload on both servers, because they reduced the power consumption compared with the performance state, while the CPU utilization remained below 10% under every policy.

Figure 4.21: The cumulative total power consumption under different workloads in (left) the AMD server and (right) the Intel server

The throughput was reduced, but within an acceptable range relative to the power saved. The throughput reduction may be caused by the frequency-scaling policies themselves: they need time to respond to changes in the system load, so performance can degrade when the workload utilization changes frequently.

On the other hand, when the CPU-intensive workload was applied, two points were noticed on the servers. First, the frequency-scaling policies were not effective on the AMD server: under the ondemand and conservative policies the server consumed more power than when it operated at the maximum frequency (Table 4.4). Moreover, the throughput of the ondemand and conservative policies was lower than that of the performance policy, which clearly indicates that for this CPU-intensive workload the cost of dynamic voltage and frequency scaling exceeds the gain that can be achieved with it. Second, the frequency-scaling policies were comparatively effective when the CPU-intensive workload was applied to the Intel server: the power consumed under the ondemand and conservative policies was lower than under the performance state, the throughput of these policies was higher than that of the performance policy, and the CPU utilization was nearly the same in all cases.

From these observations, it is reasonable to conclude that CPU frequency scaling can be effective in reducing the power consumption of multimedia servers when the systems run an I/O-intensive workload with less than 20% CPU utilization. Furthermore, the effectiveness of the frequency-scaling policies was influenced by the processor's design and the type of application. When the processor supports several frequency levels, switching the CPU frequency from one level to another incurs a penalty that raises the overall power consumption. During a frequency switch, the CPU is briefly unsettled while the voltage regulator automatically adjusts the corresponding core voltage; in this interval the CPU may operate at frequencies different from the requested one. Consequently, the switching has a non-negligible cost that can degrade performance. This effect appears in the AMD server, which has three frequency levels, whereas the Intel server has only two, which reduces the switching penalty.

However, the power efficiency of a server is determined not only by the efficiency of the CPU; other components on the server also influence the overall power consumption. For example, even if two CPUs are of the same model but deployed on different motherboards, the efficiency of the two servers is not the same. Specifically, differences in the RAM and in the amount of cache memory influence the power efficiency. Esmaeilzadeh et al. [84] noted that variations in processors' power, performance and energy responses, caused by features such as clock scaling, microarchitecture, simultaneous multithreading (SMT) and chip multiprocessing (CMP), lead to a poor understanding of the energy-efficient design space.

Relation between resource utilization, power consumption and performance:

Several models characterize the relationship between power consumption, resource utilization and performance (throughput). One of them is a probabilistic model that captures the probabilistic interactions between power consumption, resource utilization and performance by treating the three quantities as random variables.

Based on Leibniz's integral rule, Dargie [90] investigated the relationship between power consumption, resource utilization and performance (throughput) as follows:

    f(z) = ∫₀^∞ (1/y) f_X(z/y) f_Y(y) dy

where X is the request arrival rate, Y is the workload size each request introduces, and Z is the workload of the server per unit time, expressed as the product of the two random variables (Z = X·Y); f(z) is the density of Z. In this work, the user request arrival rate varied between 0 and 100 requests per second, while the video size varied between 3 and 100 MB.

Dargie assumed that the model is more realistic when the request rates are uniformly distributed and the sizes of the downloaded videos are exponentially distributed; f(z) then becomes:

    f(z) = ∫₀^∞ (1/y) f_X(z/y) (1/µ) exp(−y/µ) dy

where µ is the mean video size of 3 MB.
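A numeric sketch of this density follows, assuming X is uniform on (0, 100) requests per second and Y exponential with mean µ = 3 MB (the parameters stated above). With f_X(x) = 1/a on (0, a), the product formula reduces to f(z) = (1/(aµ)) ∫ from z/a to ∞ of exp(−y/µ)/y dy, evaluated here with a plain trapezoid rule (upper limit truncated, which is harmless because the exponential tail is negligible) and cross-checked against a simulation of Z = X·Y.

```python
import math
import random

A_MAX = 100.0   # request arrival rate: uniform on (0, 100) req/s
MU = 3.0        # mean video size in MB (exponential)

def density(z, steps=20000, y_max=60.0):
    """Density of Z = X * Y for X ~ Uniform(0, A_MAX) and
    Y ~ Exponential(mean MU):
    f(z) = (1/(A_MAX*MU)) * integral_{z/A_MAX}^inf exp(-y/MU)/y dy,
    evaluated with a trapezoid rule (upper limit truncated at y_max)."""
    lo = z / A_MAX
    h = (y_max - lo) / steps
    ys = [lo + i * h for i in range(steps + 1)]
    vals = [math.exp(-y / MU) / y for y in ys]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / (A_MAX * MU)

# Cross-check against simulated workloads: empirical density near z = 50 MB/s
random.seed(1)
n = 200_000
hits = sum(1 for _ in range(n)
           if 49.0 <= random.uniform(0, A_MAX) * random.expovariate(1 / MU) <= 51.0)
print(density(50.0), hits / (n * 2.0))  # both approximately 0.0046
```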

For the Intel server without the transcoder workload, the described relation held: an increase (decrease) in any of the studied factors resulted in a corresponding increase (decrease) in the other factors. In contrast, for the AMD server under the same workload, as well as for both servers under the transcoder workload, a change in the power consumption did not lead to a corresponding variation in the throughput, as shown in Figure 4.22.

In conclusion, the experimental results indicate no visible meaningful relation between power, CPU utilization and performance when the server ran the two different workload scenarios, and it was not possible to determine whether this is caused by DVFS itself or by the workload-arrival estimation.

Figure 4.22 shows the relation between power consumption, CPU utilization and performance (throughput) on the two server platforms under the two workload scenarios.

Figure 4.22: The relation of power consumption, resource utilization and throughput

Power consumption during switch on/off, sleep state:

For further investigation, the servers’ power consumption was checked when the server was in the switch on/off and sleep states.

The power consumption while starting (switching on) the system was found to be 61.9 W, while it was 56.9 W during shutdown, as shown in Figure 4.23.

It is worth mentioning that, in order to obtain an accurate result, the experiment was repeated by starting and shutting down the system 50 times.

Figure 4.23: The cumulative power consumption during start-up and shutdown of the AMD server

To reduce the power cost of switching the server on and off, the server might be put into a sleep state instead of being turned off. A sleeping server consumes far less power than an idle server and can return to the "on" state faster than a server that has been switched off. The servers' power consumption was measured using the halt instruction and the suspend-to-memory instruction (sleep mode) on the two servers, as shown in Figures 4.24 and 4.25. The halt state brings the system down to its lowest state but leaves it powered on. Suspending to memory saves more energy: all of the motherboard's components are disabled except those needed to refresh the RAM and handle wake-up events [97].

Figure 4.24: The cumulative overall power consumption of the Intel and AMD servers in the halt state

Figure 4.25: The cumulative overall power consumption of the Intel and AMD servers in the sleep state

Power efficiency of the power supply unit:

A loss of power is observed between the AC input and the measured DC power consumption; this is due to the inefficiency of the power supply unit, which is consistent with the ATX specification. For example, the average overall AC power consumption of the AMD server was 53 W, while the average DC power consumption, collected from all the DC supply lines, was 33 W. Hence 20 W (about 38%) was lost under the I/O-intensive workload, and about 32% was lost under the CPU-intensive workload, as shown in Figure 4.26.

Figure 4.26: The DC power consumption of the AMD server, (left) without transcoder and (right) with transcoder

The efficiency of the power supply unit improved to 70% when the server operated at the maximum frequency under the CPU-intensive workload: the average overall AC power consumption of the server was 81.8 W and the average DC power consumption was 57.22 W, so the power loss was reduced to about 30%, as shown in Figure 4.27.

Figure 4.27: The DC power consumption of the AMD server (at maximum frequency) with the transcoder workload
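The efficiency figures above follow directly from the AC and DC averages; as a small sketch using the values reported in this section:

```python
def psu_efficiency(ac_watts, dc_watts):
    """Fraction of wall (AC) power that reaches the DC supply lines;
    the remainder is lost in the power supply unit."""
    return dc_watts / ac_watts

# AMD server, I/O-intensive workload: 53 W AC in, 33 W DC out
eff = psu_efficiency(53.0, 33.0)
print(round(eff * 100), round((1.0 - eff) * 100))  # 62 38

# AMD server at maximum frequency, CPU-intensive workload
print(round(psu_efficiency(81.8, 57.22) * 100))  # 70
```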

Energy Efficiency ratio:

In order to compare the AMD and Intel platforms fairly, the energy efficiency of both platforms was calculated using the same equation as in [83]. It defines the energy efficiency (EE) as the ratio of the work performed to the energy consumed:

    EE = Work / Energy = Work / (Power × Time) = Performance / Power    (4.1)

Governor policy    AMD EE without transcoder (GB/W)    Intel EE without transcoder (GB/W)
ondemand           0.97                                0.95
powersave          0.96                                0.97
performance        0.85                                0.99
conservative       1.05                                0.89

Table 4.8: A comparison of the energy-efficiency (EE) ratio of the AMD and Intel servers without the transcoder workload
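The tabulated ratios follow from Eq. 4.1 applied to the measured values; for instance, using the AMD figures of Table 4.3 (the small deviations from the tabulated ratios presumably come from rounding in the thesis):

```python
def energy_efficiency(throughput_gb, power_watts):
    """EE ratio from Eq. 4.1: work performed per unit power, here the
    throughput (GB) of a fixed one-hour run divided by the average power."""
    return throughput_gb / power_watts

# AMD server without transcoder, ondemand (Table 4.3): 51.80392 GB at 53 W
print(round(energy_efficiency(51.80392, 53.0), 2))  # 0.98
```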

Governor policy    AMD EE with transcoder (GB/W)    Intel EE with transcoder (GB/W)
ondemand           0.011                            0.018
powersave          0.009                            0.05
performance        0.018                            0.013
conservative       0.008                            0.013

Table 4.9: A comparison of the Energy Efficiency (EE) ratio between AMD and Intel servers with transcoder workload

From the results for the I/O-intensive workload in Table 4.8, it can be concluded that the AMD platform is more energy efficient than the Intel platform when frequency scaling is used, while without frequency scaling the Intel platform is more energy efficient than the AMD platform.

On the other hand, the results for the CPU-intensive workload, displayed in Table 4.9, show that the Intel platform is more energy efficient than AMD when frequency-scaling policies are applied, while without frequency scaling the AMD platform is more energy efficient.

Other experiments:

Three further sets of experiments were performed. The first used a steady request stream for a single HTTP request for a small file (5 MB), a medium file (100 MB) and a large file (450 MB) separately. Each test for a given file ran for 30 minutes and was executed in both scenarios, with and without the transcoder. Although it is not a realistic workload, this experiment provides valuable information on the effect of the load on the server's power consumption.

Figure 4.28: The cumulative CPU power consumption for serving requests for different file sizes separately, (left) in the scenario without transcoder and (right) in the scenario with transcoder

The experimental results are plotted in Figure 4.28. In the first scenario, without the transcoder, the power consumption in the individual tests is fairly close: the average power consumption was 7.7 W for the small file, 8.11 W for the medium file and 8.16 W for the large file. In the second scenario, with the transcoder, the results are also close: the average CPU power consumption was 22.8 W for the small file, 22.2 W for the medium file and 29.4 W for the large file. The higher value for the large file is reasonable, since transcoding it imposes a heavy load that increases the CPU power consumption.

The second experiment used a steady stream of requests for a single small video file (about 5 MB) at various request rates (50, 100 and 200 requests per second). Each request rate ran for 30 minutes, and the results are shown in Figure 4.29. The CPU power consumption under the I/O-intensive workload stayed approximately the same, since there was no heavy load on the server, whereas under the heavy load it reached a roughly constant value of about 30 W.

Figure 4.29: The cumulative 12 V CPU power consumption for serving different request rates, (left) in the scenario without transcoder and (right) in the scenario with transcoder

The third experiment compared the power consumption of the C-state and P-state mechanisms. The C-state power-management mechanism reduces the idle power consumption of modern processors by switching off unused components. It was investigated on the Intel server: in these experiments, only the idle power-management mechanism was enabled through the BIOS while Intel SpeedStep was kept disabled, and vice versa. The C-state mechanism was suitable only for the first workload scenario, because there the CPU is idle most of the time; in the second scenario the CPU is fully utilized, so only dynamic power management was considered for that workload.

Figure 4.30 shows the comparison between the C-states and P-states under the I/O-intensive workload. When the server ran at a high frequency level, the experimental results show that the C-states provide the same benefit as the ondemand governor of the Intel SpeedStep mechanism in reducing the power consumption under the I/O-intensive workload. Even so, the reduction in power consumption is very small.

Le Sueur [74] investigated the effect of DVFS on modern processors and concluded that its effectiveness in improving energy efficiency is likely to decrease on recent computers.

Figure 4.30: The cumulative comparison of the power consumption of the C-state and P-state mechanisms under the without-transcoder workload on the Intel server

He observed several trends in processor design that diminish DVFS. For instance, scaling transistors to smaller feature sizes increases the static power and reduces the dynamic power, which reduces the effect of DVFS on total power consumption. He also found other trends related to multi-core processors that could affect DVFS, for instance improved memory performance, improved sleep/idle modes, and asynchronously clocked L3 caches and memory controllers. Hence DVFS alone is not the best strategy; however, it can be combined with other energy-management techniques.

4.3 Summary

The experimental results lead to observations inferred from analyzing the power consumption, resource utilization and performance trends of the multimedia servers over several months. The first section presented the experimental measurement graphs in three parts: the first part presented the power measurements, covering the overall power consumption and the DC power consumption on the different supply lines of the servers; the second part presented the CPU utilization measurements; and the last part presented the performance measurements of the two servers under the different workload scenarios.

The second section presented the analysis of the intensive measurements performed on the server platforms in different situations. The power consumption of the multimedia servers was analyzed experimentally under different dynamic voltage and frequency scaling policies. The experimental results showed that frequency scaling was not equally effective on all workloads. Specifically, the ondemand and conservative policies performed poorly on the AMD server under the CPU-intensive workload: they neither reduced the power consumption nor achieved appreciable throughput. However, the same policies performed effectively on the Intel server with the same workload, where the power consumption was reduced by about 2% and the performance increased by about 40%. Furthermore, the results indicated no meaningful relation between power consumption, CPU utilization and performance when the server ran the two different workload scenarios under the different DVFS power-management policies. Additionally, a comparison between the AMD and Intel servers was made by calculating the energy-efficiency (EE) ratio; it showed that, with frequency scaling, the AMD server was more energy efficient than the Intel server under the I/O-intensive workload, while the Intel server was more energy efficient than the AMD server under the CPU-intensive workload. It can be concluded that the optimal DVFS power-management policy is unclear and depends on many factors, such as the type of workload, the processor design and the maximum server frequency.

It should be mentioned that part of the implementations and measurements accomplished in this work was published at the 5th International Conference on Cloud Computing (IEEE Cloud 2012) by Dargie [90].

Chapter 5

Conclusions

5.1 Summary and Conclusion

In this thesis, power management in server systems was analyzed. The scope and usefulness of dynamic voltage and frequency scaling (DVFS) in a realistic multimedia-server environment were investigated experimentally. This project examined the effects of different DVFS policies on power consumption, performance and hardware-resource utilization in order to improve system energy efficiency. The thesis is organized as follows:

Chapter 1, the introduction, gave an overview of power management and the technologies used to implement it. The chapter then discussed the research questions and outlined the general structure of the thesis.

Chapter 2 presented the most important background information: an overview of developments in power-management techniques, and a summary of approaches used to achieve power control at different levels of abstraction in data-center systems. Much of the prior work was discussed in this chapter; most of it focused on single-server systems and discussed power-management approaches at the server level. Various research frameworks were described, as well as the power-management algorithms currently used in Linux.

Chapter 3 discussed the design concepts underlying the thesis. It started with a general overview of the DVFS implementation in the Linux kernel used in the experiments, then presented the methodology used to set up the experimental architecture and its components, and finally discussed the measurement system (for power, performance and CPU utilization) in detail.

Chapter 4 presented the experimental measurements of the power consumption of a multimedia server under different dynamic voltage and frequency scaling policies, followed by the evaluation of the measurement results.

Two different server systems were analyzed to examine the effectiveness of DVFS. A set of measurements tested both I/O-intensive (idling, cache-intensive) and CPU-intensive (non-idling) workloads, and the power consumption of a multimedia server was investigated under different dynamic voltage and frequency scaling policies. Decisions about power management are non-trivial, as the power consumption varies significantly with the workload. Through the measurements, it was observed that DVFS can be effective in reducing the power consumption of multimedia servers. When the system ran an I/O-intensive workload with less than 20% CPU utilization, the power consumption was slightly reduced: by approximately 2% on the Intel server and about 10% on the AMD server. Furthermore, the reduction in CPU frequency had little effect on the throughput of the multimedia servers. For the CPU-intensive workload, the efficiency of DVFS depended on the processor design: although DVFS on the Intel server reduced the power consumption by only up to 2%, the performance improved by 40%; on the AMD server, by contrast, DVFS was not effective.

We conclude that dynamic voltage and frequency scaling alone is not a perfect mechanism in server environments, because no single policy has the same effect on all applications. The conservative policy is more effective (in terms of power and performance as well as CPU utilization) than ondemand for an I/O-intensive workload on the AMD server, while on the Intel server ondemand is more efficient in terms of performance with the same workload. However, on lightly loaded systems the power consumption can be reduced and the energy efficiency improved significantly.

The total power of the system is still dominated by the CPU power usage: up to 36% of the total in the CPU-intensive workload and up to 16% in the I/O-intensive workload. However, one of the interesting observations from the experimental results is that a large fraction of the power is spent in other components of the multimedia server. For example, the DC power measurements showed a loss of power due to the inefficiency of the power supply unit: in the AMD server, 38% of the total power was lost under the I/O-intensive workload and 32% under the CPU-intensive workload. Also, the memory system, powered by the 5 V supply line, consumed 14% of the power under the I/O-intensive workload and 11% under the CPU-intensive workload. Hence, power efficiency should not cover only the efficiency of the CPU; it should also include improving the power consumption of the other server components that influence the overall power consumption.

The detailed measurements indicate that there is no observable meaningful relation between power consumption, CPU utilization and performance when the server runs the two different workload scenarios. Additionally, the comparison between the AMD and Intel servers via the energy-efficiency (EE) ratio showed that, with frequency scaling, the AMD server is more energy efficient than the Intel server under the I/O-intensive workload; in contrast, the Intel server is more energy efficient than the AMD server under the CPU-intensive workload.

Finally, further experiments were performed on the multimedia servers, such as requesting a single HTTP video different numbers of times on the AMD server, to provide valuable information about the effect of the applied load on the server's power consumption. Another experiment compared the C-states and P-states on the Intel server; it showed that the C-states reduce the power consumption as effectively as the ondemand governor does under the I/O-intensive workload.

5.2 Future work

Throughout this project, DVFS showed great disparity in improving energy efficiency; its influence was shown to depend on the type of workload and on the design of the processor. Le Sueur [74] investigated the effect of DVFS on modern processors and concluded that its influence on energy efficiency is likely to decrease on recent computers. Hence, DVFS alone is not the best strategy to apply to modern processors, but it might be combined with other energy management techniques.

Most recent computers support other power management policies, like the Turbo-Boost technology [31] mentioned in Chapter 2. As future work, it is therefore suggested to compare the effectiveness of this technique with DVFS. It is also recommended to compare the frequency scaling policies with policies developed in other research, such as Koala [23].

Due to the large fraction of power spent on other server components, such as the memory and the power supply unit, it is concluded that it is necessary to investigate and develop approaches that address the power consumption of these components as well.

These paths of research are left to future work.

Bibliography

[1] S. Chu, The energy problem and Lawrence Berkeley National Laboratory. Talk given to the California Air Resources Board, 2008, http://www.arb.ca.gov/research/seminars/chu/chu.pdf. 1.2

[2] Wikipedia, Power management, http://en.wikipedia.org/wiki/Power_management, last modified on 25 November 2011 at 00:55. 1.2

[3] Pat Bohrer, Elmootazbellah N. Elnozahy, Tom Keller, Michael Kistler, Charles Lefurgy, Chandler McDowell, Ram Rajamony, The Case for Power Management in Web Servers. IBM Research, Austin, TX 78758, USA, http://www.research.ibm.com/arl/projects/papers/power-management-in-web-servers-2002.pdf. 2.1, 2.3.1

[4] The Problem of Power Consumption in Servers, based on material from the book Energy Efficiency for Information Technology by Lauri Minas and Brad Ellison, Intel, April 01, 2009, http://www.infoq.com/articles/power-consumption-servers. (document), 2.3, 2.3

[5] Yang Jiao, Heshan Lin, Wu-chun Feng, Characterizing Performance and Power of GPU Applications with DVFS, IEEE/ACM International Conference on Green Computing and Communications (GreenCom), Hangzhou, China, 2010. 1.1, 1.3.1, 2.2

[6] Huang Song, Feng Wu-chun, Energy-Efficient Cluster Computing via Accurate Workload Characterization, 9th IEEE International Symposium on Cluster Computing and the Grid (CCGrid), Shanghai, China, May 2009. 2.3.1

[7] Syed Ejaz, Analysis of the trade-off between performance and energy consumption of existing load balancing algorithms, Technische Universität Dresden, 1 November 2011. 1

[8] Dirk Grunwald, Philip Levis, Keith I. Farkas, Charles B. Morrey III, Michael Neufeld, Policies for Dynamic Clock Scheduling. In Proceedings of the 4th USENIX Symposium on Operating Systems Design and Implementation, pages 73-86, San Diego, CA, USA, Oct. 2000. 2.3.1

[9] Tomasz Buchert, Lucas Nussbaum, Jens Gustedt, Methods for Emulation of Multi-Core CPU Performance, Poznan University of Technology, November 2010. 1.3.1

[10] Andrew Pollock, Dag Wieers, Dstat Linux man page, http://linux.die.net/man/1/dstat. 2, 2

[11] V. Pallipadi, Enhanced Intel SpeedStep Technology and Demand-Based Switching on Linux, article, Feb 2009, http://software.intel.com/en-us/articles/enhanced-intel-speedstepr-technology-and-demand-based-switching-on-linux/. 3.1.1

[12] Xiaorui Wang, Ming Chen, Cluster-level feedback power control for performance optimization, HPCA 2008. 2.2

[13] Apache, http://www.apache.org/. 2

[14] Linux / Unix Command: wget, http://linux.about.com/od/commands/l/blcmdl1_wget.htm. 1

[15] Apache 2 and PHP 5 mod-php on Linux, http://dan.drydog.com/apache2php.html. 2

[16] FFmpeg, http://ffmpeg.org/. 2

[17] R. Xu, C. Rusu, D. Zhu, D. Mosse, R. Melhem, Practical Energy-Efficient Policies for Server Clusters. In the 6th Brazilian Workshop on Real-Time Systems (WTR'04), Gramado, Rio Grande do Sul, Brazil, May 2004. 1.3.1

[18] Motherboards, Power Supplies, and Cases, based on material from the book Mike Meyers' CompTIA Certification Passport, Fourth Edition, chapter 4, by Michael Meyers, http://www.mhprofessional.com/downloads/products/0072263083/0072263083_ch04.pdf. 4.1.1

[19] R. Brown, Report to Congress on server and data center energy efficiency, Public Law 109-431, Lawrence Berkeley National Laboratory, 2008. 1.2

[20] Jasmine Carpenter, AMD Vs. Intel Motherboards, eHow tech article, http://www.ehow.com/about_5663519_amd-vs_-intel-motherboards.html#ixzz22xDu54Vp. 4.2

[21] V. Pallipadi and A. Starikovskiy, The ondemand governor, in Proceedings of the Linux Symposium, vol. 2, 2006. 2.3.2

[22] M. Elnozahy, M. Kistler, R. Rajamony, Energy Conservation Policies for Web Servers, IBM Research, USA, 2003. 2.3.1

[23] D. C. Snowdon, E. Le Sueur, S. M. Petters, G. Heiser, Koala: a platform for OS-level power management. In Proceedings of the 4th ACM European Conference on Computer Systems, EuroSys '09, pages 289-302, New York, NY, USA, 2009. ACM. 2.3.2, 5.2

[24] Jens Malmodin, The Energy and Carbon Footprint of ICT and Media Services and Lessons Learned so far in the EARTH Project for Wireless Access Networks, 5th International ICST Conference on Access Networks (Accessnets 2010), http://www.accessnets.org/keynote.shtml. 1.2

[25] Smart 2020: Enabling the low carbon economy in the information age, technical report, 2008. 1.2

[26] IT energy management, http://en.wikipedia.org/wiki/IT_energy_management, last modified on 18 February 2012 at 16:51. 1.2

[27] Hewlett-Packard, Intel Corporation, Microsoft, Phoenix Technologies, Toshiba (2011-10-23), Advanced Configuration and Power Interface Specification, revision 5.0. Retrieved 2011-10-30. 1.3

[28] PC Energy Report 2009, United States, United Kingdom, Germany, Alliance to Save Energy and 1E, March 2009, http://www.climatesaverscomputing.org/docs/1E_PC_Energy_Report_2009_US.pdf. 1.2

[29] U. Kumar, K. Burnwal, Power management in embedded systems, http://intranet.daiict.ac.in/~ranjan/esp/F/Final-complete.pdf. 1.1

[30] Tech ARP - Intel Dynamic Acceleration, http://www.techarp.com/showfreebog.aspx?lang=0&bogno=412. 1.3.1

[31] Intel Turbo Boost Technology, http://www.intel.com/technology/turboboost/. 1.3.1, 5.2

[32] Tickless kernel project, http://www.lesswatts.org/projects/tickless/. 1.3.1

[33] ACPI power states, http://www.techarp.com/article/PC_Power_Management/acpi_states.gif. (document), 1.1

[34] M. Weiser, B. Welch, A. J. Demers, and S. Shenker, Scheduling for reduced CPU energy. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation, pages 13-23, Monterey, CA, USA, Nov. 1994. 2.3.1

[35] W. Dargie, Dynamic power management in wireless sensor network: State-of-the-art. IEEE Sensors Journal, 12(5):1518-1528, 2012. 2.1, 2.3.1

[36] A. Merkel, F. Bellosa, Memory-aware scheduling for energy efficiency on multicore processors. In HotPower'08: Proceedings of the 1st USENIX Workshop on Power Aware Computing and Systems. USENIX Association, 2008. 2.3.2

[37] C. Tianzhou, H. Jiangwei, Z. Zhenwei, X. Liangxiang, A practical dynamic frequency scaling scheduling algorithm for general purpose embedded operating system, College of Computer Science, Zhejiang University, China. In International Journal of u- and e-Service, Science and Technology, Vol. 2, No. 1, 2009. 2.3.1

[38] K. Govil, E. Chan, and H. Wasserman, Comparing algorithms for dynamic speed-setting of a low-power CPU. In 1st Annual International Conference on Mobile Computing and Networking, November 1995. 2.3.1

[39] D. Tam, W. Tsang, C. Drula, in Mobile Devices, CSC2228 Project Final Report, December 15, 2003. 2.3.1

[40] W. Liang, Po-Ting Lai, C. Chiou, An Energy Conservation DVFS Algorithm for the Android Operating System, Journal of Convergence, December 2010. 2.3.1

[41] A. Beloglazov, R. Buyya, Y. Choon Lee, A. Y. Zomaya, A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems. Advances in Computers 82:47-111 (2011). 1.3

[42] Md. Asif Chowdhury, Md. Shahriar Rizwan, and M. S. Islam, An Efficient VLSI Design Approach to Reduce Static Power using Variable Body Biasing, World Academy of Science, Engineering and Technology 64:264-167 (2012). 1.1

[43] Moore's law, http://en.wikipedia.org/wiki/Moore's_law, last modified on 28 May 2012 at 04:17. (document), 2.1

[44] V. Pallipadi, S. B. Siddha, Processor Power Management features and Process Scheduler: Do we need to tie them together? In Proc. LinuxConf Europe, 2007. 1.3.1

[45] J. Scaramella, Worldwide server power and cooling expense 2006-2010 forecast. International Data Corporation (IDC), Sep. 2006. (document), 2.2

[46] I. Rodero, S. Chandra, M. Parashar, R. Muralidhar, H. Seshadri, S. Poole, Investigating the potential of application-centric aggressive power management for HPC workloads. HiPC 2010: 1-10. 2.3

[47] Cpufrequtils package, https://wiki.archlinux.org/index.php/, Official Repositories. Last accessed on November 14, 2011, 22:38 CET. 3.1.1

[48] A. Agarwal, E. Fernandez, System Level Power Management for Embedded RTOS: An Object Oriented Approach. International Journal of Engineering (IJE), 2009. 2.1

[49] N. Shigei, I. Fukuyama, H. Miyajima, Y. Yudo, Battery Aware Mobile Relay for Wireless Sensor Network, IMECS 2012, Hong Kong. 2.1

[50] A. Weissel and F. Bellosa, Process cruise control: event-driven clock scaling for dynamic power management. In CASES, 2002. 2.3.2

[51] A. Carroll and G. Heiser, An analysis of power consumption in a smartphone. In Proceedings of the 2010 USENIX Annual Technical Conference, pages 1-12, Boston, MA, USA, June 2010. 2.1

[52] Z. Wang, C. McCarthy, X. Zhu, P. Ranganathan, and V. Talwar, Feedback Control Algorithms for Power Management of Servers. Proc. Third Int'l Workshop Feedback Control Implementation and Design in Computing Systems and Networks (FeBID), 2008. 2.2

[53] L. Parolini, N. Tolia, B. Sinopoli, B. H. Krogh, A Cyber-Physical Systems approach to energy management in data centers. In Proc. of First International Conference on Cyber-Physical Systems, April 2010, Stockholm, Sweden. 2.2

[54] Waltenegus Dargie and Alexander Schill, Analysis of the Power and Hardware Resource Consumption of Servers under Different Load Balancing Policies. The 5th International Conference on Cloud Computing (IEEE Cloud 2012), June 24-29, 2012, Honolulu, Hawaii, USA. 2.2

[55] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, and B. Maggs, Cutting the electric bill for internet-scale systems, in Proc. of the ACM SIGCOMM Conference on Data Communication, Aug. 2009, pp. 123-134. 2.2

[56] L. Rao, X. Liu, L. Xie, and W. Liu, Minimizing electricity cost: Optimization of distributed internet data centers in a multi-electricity-market environment, in Proc. of the 29th IEEE International Conference on Computer Communications (INFOCOM), Mar. 2010, pp. 1-9. 2.2

[57] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach, IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 11, pp. 1458-1472, 2008. 2.2

[58] S. Srikantaiah, A. Kansal, and F. Zhao, Energy aware consolidation for cloud computing, Cluster Computing, vol. 12, pp. 1-15, 2009. 2.2

[59] L. A. Barroso and U. Hölzle, The case for energy-proportional computing. Computer, 40(12):33-37, 2007. 2.2, 1

[60] Laura Keys, A Model-Based Process for Evaluating Cluster Building Blocks. Technical Report No. UCB/EECS-2010-117, August 18, 2010, http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-117.html. 2.2

[61] H. Zheng, J. Lin, Z. Zhang, E. Gorbatov, H. David, and Z. Zhu, Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency. In MICRO, 2008. 1

[62] X. Li, R. Gupta, S. V. Adve, and Y. Zhou, Cross-Component Energy Management: Joint Adaptation of Processor and Memory. ACM Transactions on Architecture and Code Optimization, 4, 2007. 1

[63] Y. Deng, F. Wang, N. Helian, EED: Energy Efficient Disk drive architecture. Information Sciences: an International Journal, Volume 178, Issue 22, November 2008, Pages 4403-4417. 1

[64] Q. Zhu, Y. Zhou, Power-aware storage cache management. IEEE Trans. Computers, 54(5), 2005. 1

[65] K. Rajamani, H. Hanson, J. Rubio, S. Ghiasi, F. L. Rawson, Application-Aware Power Management. IISWC 2006: 39-48. 2

[66] X. Li, Z. Li, Pin Zhou, Y. Zhou, S. Adve, S. Kumar, Performance Directed Energy Management for Main Memory and Disks. In the IEEE Micro Special Issue: Micro's Top Picks from Computer Architecture Conferences, pages 38-49, November 2004. 1

[67] C. Isci and M. Martonosi, Runtime power monitoring in high-end processors: Methodology and empirical data. In Proceedings of the International Symposium on Microarchitecture (MICRO-36), Dec. 2003. 2.1, 2.3.1

[68] W. Lloyd Bircher, Lizy K. John, Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events. In Proc. of ISPASS 2007, pp. 158-168. 2.1

[69] X. Zheng and Y. Cai, Optimal Server Provisioning and Frequency Adjustment in Server Clusters, 39th International Conference on Parallel Processing Workshops, 2010. 2.3.1

[70] M. Shalan, D. El-Sissy, Online Power Management using DVFS for RTOS, 4th International Design and Test Workshop (IDT), 2009. 2.3.1

[71] J. Donald and M. Martonosi, Techniques for multicore thermal management: Classification and new exploration. In ISCA '06: Proceedings of the 33rd Annual International Symposium on Computer Architecture, pages 78-88, Washington, DC, USA, 2006. IEEE Computer Society. 2.3.1

[72] H. David, C. Fallin, E. Gorbatov, U. R. Hanebutte, O. Mutlu, Memory Power Management via Dynamic Voltage/Frequency Scaling. ICAC '11, June 14-18, 2011, Karlsruhe, Germany. 2.3.1

[73] Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, R. Bianchini, MemScale: Active low-power modes for main memory. ASPLOS, 2011. 2.3.1

[74] E. Le Sueur, An Analysis of the Effectiveness of Energy Management on Modern Computer Processors. Master thesis, The University of New South Wales, 2011. 2.1, 2.3.1, 4.2, 5.2

[75] H. Zeng, C. S. Ellis, A. R. Lebeck, Experiences in Managing Energy with ECOSystem. IEEE Pervasive Computing, 4(1):62-68, 2005. 2.3.2

[76] H. Zeng, C. S. Ellis, A. R. Lebeck, and A. Vahdat, Ecosystem: managing energy as a first class operating system resource. SIGPLAN Not., 37(10):123-132, 2002. ISSN 0362-1340, http://doi.acm.org/10.1145/605432.605411. 2.3.2

[77] H. Zeng, C. S. Ellis, A. R. Lebeck, and A. Vahdat, Currentcy: Unifying policies for resource management. In Proceedings of the 2003 USENIX Annual Technical Conference, San Antonio, Texas, June 2003. 2.3.2

[78] The R Project for Statistical Computing, http://www.r-project.org/. 2, 4.1

[79] N. Pettis, J. Ridenour, and Y.-H. Lu, Automatic Run-Time Selection of Power Policies for Operating Systems, Proc. Design, Automation and Test in Europe (DATE '06), pp. 508-513, 2006. 2.3.2

[80] Nathaniel Pettis and Yung-Hsiang Lu, A homogeneous architecture for power policy integration in operating systems (2009). ECE Faculty Publications, Paper 35, http://docs.lib.purdue.edu/ecepubs/35. 2.3.2

[81] Kyeong Lee and Kevin Skadron, Using Performance Counters for Runtime Temperature Sensing in High Performance Processors. High-Performance, Power-Aware Computing, April 2005. 2.1

[82] F. Bellosa, The benefits of event-driven energy accounting in power-sensitive systems. In Proceedings of the 9th SIGOPS European Workshop, Kolding, Denmark, Sept. 17-20, 2000. 2.1

[83] Dimitris Tsirogiannis, Stavros Harizopoulos, Mehul A. Shah, Analyzing the energy efficiency of a database server. SIGMOD Conference 2010: 231-242. 2.1, 4.2

[84] H. Esmaeilzadeh, T. Cao, X. Yang, S. M. Blackburn, and K. S. McKinley, Looking back on the language and hardware revolutions: Measured power, performance, and scaling. In Proceedings of the 16th ASPLOS 2011, Newport Beach, California, USA, Mar. 2011. 2.1, 4.2

[85] W. Lloyd Bircher and Lizy K. John, Analysis of Dynamic Power Management on Multi-Core Processors. In ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 327-338, New York, NY, USA, 2008. ACM. 2.3.1

[86] H. Amur, R. Nathuji, M. Ghosh, K. Schwan, H. S. Lee, IdlePower: Application-Aware Management of Processor Idle States, First Workshop on Managed Many-Core Systems (MMCS, in conjunction with HPDC), Boston, June 2008. 2.3.1

[87] Comma-separated values, http://en.wikipedia.org/wiki/Comma-separated-values, last modified on 5 July 2012 at 05:49. 2

[88] Access Log - Apache, http://httpd.apache.org/docs/2.0/logs.html. 3

[89] AMD Cool'n'Quiet Technology, http://www.amd.com/us/products/technologies/cool-n-quiet/Pages/cool-n-quiet.aspx. 2

[90] W. Dargie, Analysis of the Power Consumption of a Multimedia Server under Different DVFS Policies. The 5th International Conference on Cloud Computing (IEEE Cloud 2012), June 24-29, 2012, Honolulu, Hawaii, USA. 4.1.1, 4.2, 4.3

[91] X. Ruan, X. Qin, Z. Zong, K. Bellam, M. Nijim, An Energy-Efficient Scheduling Algorithm Using Dynamic Voltage Scaling for Parallel Applications on Clusters. The 16th IEEE International Conference on Computer Communications and Networks (ICCCN), Honolulu, Hawaii, Aug. 2007. 2.3.1

[92] Andreas Merkel, Frank Bellosa, Memory-aware Scheduling for Energy Efficiency on Multicore Processors. HotPower, 2008.

[93] Yokogawa power analyzer, http://tmi.yokogawa.com/products/digital-power-analyzers/digital-power-analyzers/wt210wt230-digital-power-meters/#tm-wt210_01.htm. 3

[94] Enhanced Intel SpeedStep Technology and Demand-Based Switching on Linux, http://software.intel.com/en-us/articles/enhanced-intel-speedstepr-technology-and-demand-based-switching-on-linux/. 2

[95] V. Pallipadi, cpuidle - Do nothing, efficiently..., http://ols.108.redhat.com/2007/Reprints/pallipadi-Reprint.pdf, 2007. 2.3.1

[96] David Meisner, Brian T. Gold, and Thomas F. Wenisch, PowerNap: eliminating server idle power. ASPLOS, ACM, 2009, pp. 205-216, http://dblp.uni-trier.de/db/conf/asplos/asplos2009.html#MeisnerGW09. 1

[97] A. Leonard Brown, Rafael J. Wysocki, Suspend-to-RAM in Linux. In Proc. of the Linux Symposium, 2008. 4.2