Beyond CPU: Considering Memory Power Consumption of Software

Hayri Acar1, Gülfem I. Alptekin2, Jean-Patrick Gelas3 and Parisa Ghodous1 1LIRIS, University of Lyon, Lyon, France 2Galatasaray University, Istanbul, Turkey 3ENS Lyon, LIP, UMR 5668, Lyon, France

Keywords: Power Consumption, Sustainable Software, Energy Efficiency, Green IT.

Abstract: ICTs (Information and Communication Technologies) are responsible around 2% of worldwide greenhouse gas emissions (Gartner, 2007). And according to the Intergovernmental Panel on Climate Change (IPPC) recent reports, CO2 emissions due to ICTs are increasing widely. For this reason, many works tried to propose various tools to estimate the energy consumption due to software in order to reduce carbon footprint. However, these studies, in the majority of cases, takes into account only the CPU and neglects all others components. Whereas, the trend towards high-density packaging and raised memory involve a great increased of power consumption caused by memory and maybe memory can become the largest power consumer in servers. In this paper, we model and then estimate the power consumed by CPU and memory due to the execution of a software. Thus, we perform several experiments in order to observe the behavior of each component.

1 INTRODUCTION more possible to obtain accurate and efficient results for energy consumption. However, using these types ICTs (Information and Communication of devices is complicated because it is necessary to Technologies) are responsible around 2% of have these devices and connect them to different worldwide greenhouse gas emissions (Gartner, components. What is more with this method, it is 2007). And according to the Intergovernmental Panel impossible to measure the energy consumed by on Climate Change (IPPC) recent reports, CO2 virtual machines and applications on process. emissions due to ICTs are increasing widely. For this In later years, a new methology has appeared reason, many works tried to propose various tools to which consists of estimating the energy consumed by estimate the energy consumption due to software in a software based on mathematical formula order to reduce carbon footprint. established according to the characteristics of each Since a few years, we have been able to find components susceptible to consume power. But, these several research, on the web tools, (Power Supply tools (Kansal et al., 2010); (Wang et al., 2011); Calculator, 2014), (eXtreme Power Supply (Noureddine et al., 2012), in the majority of cases, Calculator, 2006), (Computer Power Consumption takes into account only the CPU and neglects all Calculator) that allow estimating the energy others components. Moreover, the trend towards consumed by each component of a computer. Doing high-density packaging and raised memory involve a so, the user chooses the feature of the component and great increase of power consumption caused by an estimation is given about related power memory and maybe memory can become the largest consumption. However, this approach provides quite power consumer in servers (Minas and Ellison, 2012). vague results so that a developer cannot use them as In this paper, we will present a methodology to a guide when developing the software. estimate the energy consumed by CPU and memory. That is the reason of the appearance of other Through different experiments we show the measurement means: Measurement of power performance of the proposed methodology. consumption via hardware devices such as power meter or printed circuits (Kern et al., 2013); (Joseph et al., 2001); (Kamil et al., 2008). Using them, it is

417

Acar, H., Alptekin, G., Gelas, J-P. and Ghodous, P. Beyond CPU: Considering Memory Power Consumption of Software. In Proceedings of the 5th International Conference on Smart Cities and Green ICT Systems (SMARTGREENS 2016), pages 417-424 ISBN: 978-989-758-184-7 Copyright c 2016 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

SMARTGREENS 2016 - 5th International Conference on Smart Cities and Green ICT Systems

2 CPU MODELIZATION E it . V . dt (4)

For a long time the CPU was considered the largest We also know that the current is given with the energy consumer component (Kim et al., 2014) in a following expression (5): computer. That is why, in each research work, the modelization of his structure has been taken into dv i t C . (5) account to estimate the energy consumed by an dt computer program only. Thus, the expression (4) becomes (6): Several factors contribute to the CPU power consumption and globally it is possible to give the E V .C dt following formula (1) in order to describe the power (6) E V .C consumed by the CPU: We assume that in a switching cycle, there are low- P P, P, P, (1) to-high and high-to-low transition. So, we can obtain where represents dynamic power the power formulate (7) of this gate: , consumption, corresponds to short-circuit , P f .V .C (7) power consumption and , power loss due to , where f is the frequency. transistor leakage currents and varies with the For N gates, we must multiply the power by N. In temperature (Zapater et al., 2015). The last two power a complex circuit the situation is more complicated, are due to at the hardware manufacturing. Hence, as not all the gates commute at the same frequency. only the manufacturer can reduce the energy Hence, we can define a parameter α < 1 as the average consumption due to hardware. So, it is possible to fraction of gates that commute at every cycle. group this two power in order to obtain a static power Thus, the next expression of the power (8): on the equation (2): P f .V .C . N . α (8) P, P, P, (2) By combining the constants as follows (9): Thus, it is possible to reformulate the equation (1) as follows (3): βC .N.α (9)

P P, P, (3) we obtain (10):

In our case, we want to reduce the energy consumed P, β .f . V (10) by software. For this, we take account only to have more accurate and efficient Moreover, we want to obtain the power consumed by , the program. Thus, the percentage of the process Id results. The CPU, like many integrated circuit, is a set of () is multiplied with the previous expression (10) as follows (11): switches. So the main power consumption in CPU is due to charge and discharge of capacitors during P,, P, . N (11) computations that we can represent with the following figure 1: Thanks to these formulas, we can say that there are several ways to reduce the power consumption due to CPU:

Table 1: Possibilities to reduce power consumption of the CPU. Solutions Technics Dual voltage CPUs Voltage reduction Overvolting/Undervolting Underclocking Frequency reduction Dynamic frequency scaling Capacitance reduction Integrated circuits

Figure 1: One switch in CPU. Dual voltage CPUs consist of uses a split-rail design to allow lower voltages to be used in the The energy can be expressed (4) as follows:

418

Beyond CPU: Considering Memory Power Consumption of Software

processor core while the external Input/Output (I/O) . Read power; voltages remain unchanged. . Write power. Dynamic voltage scaling: the voltage used is To modelize these powers, we need to understand the increased (Overvolting) or decreased (Undervolting) functionality of a DDR3 SDRAM. The master depending upon circumstances. operation is controlled by clock enable (CKE) that Underclocking: modify timing settings to run at a must be high to allow the DRAM to receive activate, lower than is specified. precharge, read, and write commands. And in this Dynamic frequency scaling: the frequency of a situation, commands begin to propagate across the microprocessor can be automatically adjusted for DRAM command decoders, and the activity rises the saving energy. power consumption. Integrated circuits: replace PCB (Printed Circuit We regroup all the parameters that we will use to Board) traces between two chips. calculate the following powers in the table 2. So, we defined a mathematical formula in order to estimate the power consumed by the CPU. And, we 3.1 Activate Power noted the different ways to save energy. Thus, should be limited to the energy The first command sent to the DRAM, during normal consumption of the CPU or does it take into account working, is an activate command that chooses a bank other components whose energy consumption could and row address in order to allow a DDR3 SDRAM be represent an importance compare to the CPU ? to read or write data. The data, that is stored in the That is why, we will try to model the energy cells of the chosen row, is then transferred from the consumption due to Memory. array into the sense amplifiers. Then, the DRAM past in the active state. The precharge command restores the data from the sense amplifiers into the memory 3 POWER CONSUMPTION OF array and resets the bank for the next activate DRAM command. This leaves the bank in its precharge condition. According to (Minas and Ellison, 2012), the power Thus, the following expression (12) can be used used on servers is increasing and the two largest to estimate activate power: consumers of power are the processor and the P PsysACT_PDN memory. Otherwise, several research works try to PsysACT_STBY (12) optimize systems to reduce DRAM power PsysACT consumption: . (Kang et al., 2010); where: . (Hur and Lin, 2008); PsysACT_PDN IDD3P ∗ Vcc . (Emma et al., 2008); ∗BNK_PRE ∗ CKE_LO_ACT (13) . (Zheng et al., 2008); ∗ Vdd / Vcc² . (Vogelsang, 2010). ∗syst_ck_freq / 1000 ∗Tck_used There are also some memory system simulator: . DRAMSim2 (Rosenfeld et al., 2011); PsysACT_STBY IDD3N ∗ Vcc ∗ 1 . Cacti 5.1 (Thoziyoor et al., 2008); BNK_PRE ∗ 1 CKE_LO_ACT (14) . Micron System Power Calculator (Micron, 2007). ∗ Vdd / Vcc² That is why, we choose to study the DRAM in order ∗syst_ck_freq / 1000 to model his power consumption. We need to use ∗Tck_used datasheet values from DRAM manufacturer to PsysACT IDD0 IDD3N establish an expression to estimate the power. ∗tRAS / tRC IDD2N As the CPU, we are interested only by the ∗ tRC tRAS / tRC (15) dynamic power consumed because we can only save ∗ Vcc ∗ tRC / tRRDsch energy in this part. Thus, based on (Micron, 2007) we ∗ Vdd / Vcc² assume that the dynamic power is composed of: So, activate power depends of many factors. Each . Activate power; term of these equations are summarized on the Table . Precharge power; 2.

419

SMARTGREENS 2016 - 5th International Conference on Smart Cities and Green ICT Systems

3.2 Precharge Power column address location. Write power is defined with (20): Every activate command, that opens a row, have a P IDD4W IDD3N ∗ Vcc precharge command, that closes the row, associated ∗8/Blength with it. ∗ WRsch Precharge power can be formulated with the (20) ∗ Vdd / Vcc² equation (16): ∗syst_ck_freq / 1000 P PsysPRE_PDN + ∗Tck_used (16) Psys(PRE_STBY) Each parameter of this formula is also expressed on where: the Table 2.

PsysPRE_PDN Idd2P ∗ Vcc 3.5 DRAM Total Power ∗ BNK_PRE (17) ∗ CKE_LO_PRE DRAM total power (21) is obtained by summing all ∗ Vdd / Vcc² ∗ 1 the equations (12), (16), (19) and (20) of powers PsysPRE_STBY defined in the preceding paragraphs.

IDD2N ∗ Vcc P P P P ∗ BNK_PRE ∗ 1 (21) P CKE_LO_PRE% (18) ∗ Vdd / Vcc² Moreover, we want to calculate the power consumed ∗ syst_ck_freq / 1000 by the application. That is why, the usage percent of ∗ Tck_used the process Id () is multiplied with the previous expression (21) as follows (22): Precharge power depends also of several factors that are defined on the Table 2. P, P . M (22)

3.3 Read Power Table 2: Data sheet specifications. Parameter Description During active state, data can be read from or written Idd2P Precharge power-down current to the DDR3 SDRAM. A read command decodes a Vcc Voltage The percentage of time that all banks on BNK_PRE specific column address associated with the data that the DRAM are in a precharged state is stored in the sense amplifiers. The data from this Percentage of the all bank precharge time CKE_LO_PRE column is driven across the I/O, gating to the internal for which CKE is held LOW read latch. From there, it is multiplexed onto the Vdd System VDD output drivers. IDD2N Precharge standby current syst_ck_freq System CK frequency Read power can be expressed as follows (19): Tck_used Used for current measurements

P IDD4R IDD3N ∗ Vcc IDD3P Active power-down current Percentage of the at least one bank active ∗ 8 / Blength ∗ RDsch CKE_LO_ACT time for which ∗ Vdd / Vcc² (19) CKE is held LOW ∗ syst_ck_freq / 1000 IDD3N Active standby current Operating current: One bank active- ∗ Tck_used IDD0 precharge Each term of this equation is also described on the tRAS Used for IDD0 calculation Table 2. tRC Activate-to-activate timing The average time between ACT tRRDsch commands to this DRAM 3.4 Write Power IDD4W Operating burst write current Blength Burst length The power needed for a write data is similar to the The percentage of clock cycles which are read data except the data propagates in the opposite WRsch inputting write data to the DRAM direction. Data from the DQ pins is latched into the IDD4R Operating burst read current data receivers/registers and is transferred to the The percentage of clock cycles which are internal data drivers that transmit the data to the sense RDsch outputting read amplifiers across the I/O gating and into the decoded data from the DRAM

420

Beyond CPU: Considering Memory Power Consumption of Software

Thus, we established also the relation allowing us multiply and divide. For instance, a >> 2 can be to estimate the power consumed by DRAM. Hence, used in place of a / 4, and a << 1 replaces a * we implemented a tool and realize some experiments 2. in order to see the behavior of DRAM compare to In our case in order to see the behavior of this CPU. replacement, we execute the same operation several time (here: 50000 repetitions). So, we can observe the results on the Figures 3a and 3b. 4 EXPERIMENTS

4.1 Devices Used

We used the laptop ASUS model N751J composed of a CPU Core i7-4710HQ (2.5GHz) and a RAM 16 Go (2 * 8 Go) DDR3 1600 MHz. To run tests, we developed a tool TEEC (Tool to Estimate Energy Consumption), whose model is shown in Figure 2, in Java programing language because depending on (Noureddine, 2012) Java represent the language with the least power Figure 3a: Strength reduction unoptimized. consumption during compilation and execution steps in default parameter settings of the compiler. In this tool TEEC, we use Sigar library (Morgan and MacEachern, 2010) in order to get information about the CPU and the RAM. Moreover, we use also the parameter provides by manufacturers. And, Java Agents allows us to the instrumentation capabilities to an application.

Figure 3b: Strength reduction optimized.

For this test, we observe that the DRAM power consumption remains constant in the two cases and Figure 2: Model of our tool TEEC. the time elapsed is similar. The CPU power consumption is less important after the strength Thus, using TEEC, we realized several different reduction. This show, the impact of the code source tests in order to observe the variation of the power on the CPU power consumption. Moreover, in the consumption due to the CPU and the memory and two cases, we observe that the CPU power varies and compare them. sometimes these values are close to DRAM values. Thus, we can say that the DRAM power consumption 4.2 Source Code Adjustment is not always neglected in front of CPU power consumption. Based on (Kambadur and Kim, 2014), we realize the following tests in order to see the impacts of source 4.2.2 Eliminate Common Subexpressions code on CPU and memory power consumption. To remove redundant calculation, we eliminate 4.2.1 Strength Reduction common subexpressions. This part of code:

double a = c * (d / e) * f; Strength reduction consists of replacing an operation double b = c * (d / e) * g; by a similar operation. The most common example of strength reduction is using the shift operator to can be rewritten as:

421

SMARTGREENS 2016 - 5th International Conference on Smart Cities and Green ICT Systems

double h = c * (d / e); The results of this test is represented on the double a = h * f; Figures 5a and 5b. double b = h * g;

We run test in a loop of 50000 repetitions to observe the variation of power. The results are in Figure 4a and 4b.

Figure 5a: Code motion unoptimized.

Figure 4a: Subexpression unoptimized.

Figure 5b: Code motion optimized.

This test show that in the unoptimized code motion, the time elapsed is slightly greater than Figure 4b: Subexpression optimized. optimized code. CPU and DRAM power consumption are quite similar in the two cases. And, In this test, the results show that the CPU and the sometimes CPU power consumption curve DRAM power consumption and the elapsed time in approaches DRAM power consumption curve. the two cases are quite similar. However, we note that the CPU power consumption vary and several times 4.2.4 Unrolling Loops is more close to DRAM power consumption. Unrolling loops reduces the number of loop control 4.2.3 Code Motion code by performing more than one operation each time in the loop, and consequently running fewer Code motion moves code that calculates an iterations. With the previous example, if the length of expression whose result doesn't change. This is most the table a is always a multiple of two, the loop can common with loops, but it can also involve code be rewrite like: repeated on each invocation of a method. For example: double pico = Math.PI* Math.cos(b); for (int i = 0; i < a.length; i += 2) { for (int i = 0; i < a.length; ++i) a[i] *= pico; a[i] *= Math.PI * Math.cos(b); a[i+1] *= pico; } becomes: Figure 6 shows the power consumption of CPU and double pico = Math.PI* Math.cos(b); DRAM depending on the time. for (int i = 0; i < a.length; i++) a[i] *= pico;

422

Beyond CPU: Considering Memory Power Consumption of Software

CPU power consumption remains the main energy consumer. However, the DRAM power consumption can’t be neglected. Moreover, some code optimizations don’t make any real impact on the CPU and DRAM power consumption. The contribution to power measurement literature will continue by bringing improvement to the estimation of the consumption of other components; such as, disk and network in order to observe their impact. It will allow us to have a higher accuracy in Figure 6: Unrolling loops. estimating the energy consumption of a program. The proposed tool TEEC is expected to be Compare to the Figure 5b, in this case, we observe improved, and it is planned to dynamically that at the beginning of the curve, the CPU consumes identifying locations where code consume the largest more power during some time than code motion and power. This will allow developers to optimize their then becomes similar. But, in unrolling loops case, the own codes to obtain green and sustainable software. total execution elapsed time is the half of the code motion case. And at the end of the curve in Figure 6, the CPU power is less important than the curve in REFERENCES code motion (Figure 5b). Moreover, in this test, the difference between CPU and DRAM power Gartner, Green IT: The New Industry Shock Wave, consumption is less important than code motion case. Gartner, Presentation at Symposium/ITXPO Thus, the results reveal that the unrolling loops Conference, 2007. method is quicker and consumes less CPU power than Power Supply Calculator, February 2014. URL: the code motion method. http://powersupplycalculator.net/. eXtreme Power Supply Calculator, January 2006. URL: http://outervision.com/power-supply-calculator. Computer Power Consumption Calculator. URL: 5 CONCLUSIONS http://www.matthewb.id.au/power/computer_power_c onsumption_calculator.html. A modelization of the CPU and the DRAM has been Kern, E., Dick, M., Naumann, S., Guldner, A., Johann, T., made in order to understand the behavior and the 2013. Green software and green software engineering– definitions, measurements, and quality aspects. ICT4S functionality of each component. Thanks to this 2013: Proceedings of the First International model, several mathematical formulas have been Conference on Information and Communication established to estimate the power consumption due to Technologies for Sustainability. each part of each component. Thus, based on this Joseph, R., Brooks, D., Martonosi, M., 2001. Live, runtime methodology, a tool that allow to measure the power power measurements as a foundation for evaluating consumed by CPU and DRAM has been implemented power/performance tradeoffs. Workshop on Complexity and named TEEC (Tool to Estimate Energy Effective Design WCED, held in conjunction with Consumption). This tool gives accurate and efficient ISCA-28. information about CPU and DRAM power Kamil, S., Shalf, J., Strohmaier, E., 2008. Power efficiency in high performance computing. IEEE International consumption, has been used to perform some Symposium on Parallel and Distributed Processing, experiments. The goal of these tests was to observe IPDPS 2008. the impact of the code source of an application in the Kansal, A., Zhao, F., Liu, J., Kothari, N., Bhattacharya, A., power consumption. These experiments have 2010. Virtual Machine Power Metering and provided several results. Provisioning. ACM Symposium on Cloud Computing When the code source is optimized, it is possible (SOCC). to reduce the power consumption due to CPU. But, Wang, S., Chen, H., Shi, W., 2011. SPAN: A software the DRAM power consumption remains quite power analyzer for multicore computer systems. constant. Sustainable Computing: Informatics and Systems, Volume 1, Issue 1. Sometimes, it is possible to save energy with an Noureddine, A., Bourdon, A., Rouvoy, R., Seinturier, L., optimization of the code by reducing execution time 2012. A Preliminary Study of the Impact of Software of an application. Engineering on GreenIT. First International Workshop In several cases, after some time of execution,

423

SMARTGREENS 2016 - 5th International Conference on Smart Cities and Green ICT Systems

on Green and Sustainable Software. Kim, M., Ju, Y., Chae, J., Park, M., 2014. A Simple Model for Estimating Power Consumption of a Multicore Server System. International Journal of Multimedia and Ubiquitos Engineering. Zapater, M. et al., 2015. Leakage-Aware Cooling Management for Improving Server Energy Efficiency. IEEE Trans. Parallel Distrib. Syst. 26(10): 2764-2777. Minas, L., Ellison, B., 2012. The Problem of Power Consumption in Servers. Intel Press. Kang, U. et al., 2010. 8 Gb 3-D DDR3 DRAM Using Through-Silicon-Via Technology. Journal of Solid- State Circuits. Hur, I. and Lin, C., 2008. A comprehensive approach to DRAM . International Symposium on High Performance Computer Architecture. Emma, P., Reohr, W. and Meterelliyoz, M., 2008. Rethinking Refresh: Increasing Availability and Reducing Power in DRAM for Cache Applications. IEEE Micro. Zheng, H. et al., 2008. Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency. Proceedings of Micro. Vogelsang, T., 2010. Understanding the energy consumption of dynamic random access memories. Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture. Rosenfeld, P., Cooper-Balis, E., Jacob, B., 2011. DRAMSim2: A Cycle Accurate Memory System Simulator. CAL. Thoziyoor, S., Muralimanohar, N., Ahn, J., Jouppi, N., 2008. CACTI 5.1. HP Laboratories Palo Alto. Micron Technologies Inc. System Power Calculator. URL: http://www.micron.com/support/power-calc. Micron, 2007. Calculating Memory System Power for DDR3. Morgan, R. and MacEachern, D. 2010. URL: https://support.hyperic.com/display/SIGAR/Home Kambadur, M., Kim, M.A., 2014. An experimental survey of energy management across the stack. ACM International Conference on Object Oriented Programming Systems Languages & Applications.

424