Power-Energy Simulation for Multi-Core Processors in Bench- Marking Mona A

Advances in Science, Technology and Engineering Systems Journal ASTES Journal Vol. 2, No. 1, 255-262 (2017) ISSN: 2415-6698 www.astesj.com Power-Energy Simulation for Multi-Core Processors in Bench- marking Mona A. Abou-Of*;1, Amr A. Sedky2, Ahmed H. Taha2 1Assistant Professor, Department of Computer Engineering, Pharos University in Alexandria, Alexandria, Egypt 2Undergraduate Student, Department of Computer Engineering, Pharos University in Alexandria, Alexandria, Egypt ARTICLEINFO ABSTRACT Article history: At Microarchitectural level, multi-core processor, as a complex System Received: 21 December, 2016 on Chip, has sophisticated on-chip components including cores, shared Accepted: 19 January, 2017 caches, interconnects and system controllers such as memory and Online: 28 January 2017 ethernet controllers. At technological level, architects should consider Keywords : the device types forecast in the International Technology Roadmap for Multi-Core Processors Semiconductors (ITRS). Energy simulator Energy simulation enables architects to study two important metrics Power Profiling Systems simultaneously. Timing is a key element of the CPU performance that Performance Benchmarks imposes constraints on the CPU target clock frequency. Power and the Convexity Rule resulting heat impose more severe design constraints, such as core System Clock Speed clustering, while semiconductor industry is providing more transistors in the die area in pace with Moore’s law. Energy simulators provide a solution for such serious challenge. Energy is modelled either by combining performance benchmarking tool with a power simulator or by an integrated framework of both performance simulator and power profiling system. This article presents and asses trade-offs between different architectures using four cores battery-powered mobile systems by running a custom-made and a standard benchmark tools. The experimental results assure the Energy/ Frequency convexity rule over a range of frequency settings on different number of enabled cores. The reported results show that increasing the number of cores has a great effect on increasing the power consumption. However, a minimum energy dissipation will occur at a lower frequency which reduces the power consumption. Despite that, increasing the number of cores will also increase the effective cores value which will reflect a better processor performance. 1 Introduction mizations have to be considered in order to optimize the offered workload. In addition, other factors in mi- Microprocessor performance has helmed its indus- croprocessor hardware implementation must be taken try for four decades. Reducing power consumption into account in order to speed up this workload exe- has become a stringent design principle especially for cution time such as using many cores. battery-driven devices. Limiting the increase in CPU In this paper, we make the case for exploring the clock frequency, because of low-power constraints trade-off between low power and energy efficiency and high energy efficiency, has become a real chal- over a wide range of clock frequencies. We do the lenge for improving microprocessor performance over experiments on different battery-powered Laptops the next generation. So, other aspects in microproces- and Smartphones in [1] on a single core. We en- sor architecture (Instruction Set) and compilers opti- face two problems: the choice of power measure- *Mona A. Abou-Of, [email protected] www.astesj.com 255 https://dx.doi.org/10.25046/aj020131 MA. Abou-Of et al. / Advances in Science, Technology and Engineering Systems Journal Vol. 2, No. 1, 255-262 (2017) ment tools and the choice of performance bench- approaches. Section 3 formulates the problem mark tools. An accurate reliable power measure- with some equations. In section 4 the experi- ment software has to be selected in such a way to mental results are evaluated and analyzed. Fi- be running on Linux platform for Laptop devices like nally, section 5 concludes the paper. Powerstat (Power consumption calculator for Ubuntu Linux. Available: http://www.hecticgeek.com), or running on Android platform for Smartphones like 2 Related Work Powertutor [2]. To capture the transitions be- Most of the existing system energy modeling ap- tween power states, two different finite state ma- proaches combine between power profiling sys- chines (FSM) based power modeling scheme [3] are tems and performance benchmark tools. SPEC implemented: The standard CoreMark benchmark has developed SPECpower ssj2008 (S.P.E. Cor- (Industry-standard benchmarks for embedded sys- poration. specpower ssj2008 benchmark suite. tems. Available: http://www.eembc.org/coremark), Available: http://www.spec.org/power ssj2008) executed on Linux OS, represents a disk with tail focusing on server computer consumption, and power state model that writes the running power on EEMBC has introduced EnergyBench establish- a disk file and stays at high power state for a pe- ing a framework for adding energy to the riod after the active I/O activity. The custom-made metrics of the EEMBC’s performance bench- Fibonacci benchmark, written with Java on Android, marks (E.T.E.M.B. Consortium. energybench ver- represents a free model that returns to the base state sion 1.0 power/energy benchmarks. Available: without inactivity period. http://www.eembc.org/benchmark/power sl.php). In summary, this paper makes the following contribu- McPAT [5] is a fully-integrated power, area and tim- tions: ing modeling framework. It models all types of power • We make laboratory experiments for explor- dissipation and provides an integrated solution for ing the relationship between processor perfor- multithreaded and multi-core processors. McPAT mance, power consumption and energy effi- power modeling is combined with Sniper perfor- ciency over a range of clock frequencies on dif- mance simulation in [6]. ferent number of enabled cores. • We represent the experimental setup in order to 2.1 Power Profiling Systems obtain reliable results. Existing power measurement methods are lim- • We represent a detailed implementation on dif- ited in two ways. First, some systems ferent laptops and Smartphones operating sys- [3,7,8] and Monsoon power monitor (Available: tems. http://www.msoon.com/LabEquipment/PowerMonitor) generate their models by using external hardware lab • The plotted results assure that minimum energy equipments like sensors, meters, and data acquisi- dissipation is always achieved even with differ- tion devices. Second, other systems like Powerstat, ent workloads, and at a certain clock frequency [2,9,10] are self-modeling. They construct their mod- but with a limited performance, lower power els without external circuitry. They use built-in bat- consumption and without optimization realiza- tery sensors or the smart battery interface fuel gauge tion. IC; or read system files available on mobile systems. Integrated sensors are provided on CPUs [11] such • We have proved the Energy/ Frequency convex- as Intel processors [12] and AMD processors [13], on ity rule on multi-core (instead of one core) pro- GPU cards [14], or on motherboards equipped with cessors [4]. a Baseboard Management Controller (BMC) monitor- • Such observations can be fed into an intelligent ing chip [15]. DVFS scheduling, power management module Some of this systems are Event-based as in [3,5] or of an operating system, on multi-core proces- per-component power measurements in addition to sors, which can achieve energy and power sav- the total power as in [5,7,9,16,17]. Others modeled ings without impacting the performance. power measurements by applications as in [2]. Industry simulators are typically cycle-accurate that • We have proved that increasing number of cores run at a speed of 1 to 10 kHz. Academic simulators, has a great effect on increasing the power con- such as [18,19] are not truly cycle-accurate compared sumption. However, a minimum energy dissi- to real hardware, and therefore they are faster, with pation will occur at a lower frequency which simulation speeds in the tens to hundreds of KIPS reduces the power consumption. Despite that, (kilo simulated instructions per second) range. They increasing the number of cores will also increase do not scale well to large multi-core systems. the effective cores value which will reflect a better processor performance. 2.2 Performance Benchmark Tools The rest of this paper is organized as follows: SPECpower ssj2008 benchmark and the Apache Section 2 presents the existing energy modeling benchmarking tool (ab - apache benchmarking tool. www.astesj.com 256 MA. Abou-Of et al. / Advances in Science, Technology and Engineering Systems Journal Vol. 2, No. 1, 255-262 (2017) Available: http://httpd.apache.org/docs/2.2/programs 4 Experimental Setup /ab.html) are used for HTTP server traffics. The SPECpower ssj2008 is the first industry standard The presented experiments measure the power and SPEC benchmark that evaluates the power and the execution time while running different workloads performance characteristics of volume server class on specific Dynamic Voltage and Frequency Scaling and multi-node class computers.The widespread (DVFS) mobile system settings over a 0.6 GHz to 1.7 used benchmark in industry and academia is SPEC GHz range of CPU frequencies. CPU2006 [20]. EEMBC has benchmarks for general- The variation of CPU frequency settings needs the purpose performance analysis including CoreMark, CPU frequency information of the used mobile de- MultiBench(multicore), and FPMark (floating-point). vice. These settings demand the resetting of the power management policy, the disabling of some cores; and the setting of the only enabled cores with one of its 3 Problem Formulation frequency values in parallel with its upper frequency limit. The experiments are implemented on three differ- The basic relationships among computer perfor- ent battery-powered mobile systems shown in Table mance, power consumption and energy efficiency are 1: two Intel Laptops (Acer and Dell) and one ARM expressed as follows: Smartphone (Samsung A5), on different Operating Systems Ubuntu and Android respectively in [1] and P rocessor P erf ormance = 1=CP U Execution time (1) extended to multi-core on the two Laptops only.

Load more