Simultaneous Multithreading on x86_64 Systems: An Energy Efficiency Evaluation


Robert Schöne, Daniel Hackenberg, Daniel Molka
Center for Information Services and High Performance Computing (ZIH)
Technische Universität Dresden, 01062 Dresden, Germany
{robert.schoene, daniel.hackenberg, daniel.molka}@tu-dresden.de

ABSTRACT

In recent years, power consumption has become one of the most important design criteria for microprocessors. CPUs are therefore no longer developed with a narrow focus on raw compute performance. This means that well-established processor features that have proven to increase compute performance now need to be re-evaluated with a new focus on energy efficiency. This paper presents an energy efficiency evaluation of the simultaneous multithreading (SMT) feature on state-of-the-art x86_64 processors. We use a mature power measurement methodology to analyze highly sophisticated low-level microbenchmarks as well as a diverse set of application benchmarks. Our results show that, depending on the workload, SMT can be at the same time advantageous in terms of performance and disadvantageous in terms of energy efficiency. Moreover, we demonstrate how the SMT efficiency has advanced between two processor generations.

1. INTRODUCTION

Hardware multithreading is an established technique to increase processor throughput by handling multiple threads in parallel. It can be distinguished into temporal multithreading (TMT) and simultaneous multithreading (SMT). While TMT processors can only issue instructions from a single thread in one cycle, SMT processors are able to issue instructions from multiple threads concurrently. The first SMT-capable system [2] was installed in 1988.

Intel added simultaneous multithreading to the popular x86 family with the Netburst microarchitecture (e.g., Pentium 4) [6]. The brand name for this technology is Hyper-Threading (HT) [8]. It provides two threads (logical processors) per core and is still available in many Intel processors.

The primary objective of multithreading is to improve the throughput of pipelined and superscalar architectures. Working on multiple threads provides more independent instructions and thereby increases the utilization of the hardware as long as the threads do not compete for the same crucial resource (e.g., bandwidth). However, MT may also decrease the performance of a system due to contention for shared hardware resources like caches, TLBs, and re-order buffer entries. Harmful effects such as cache and TLB thrashing can be the result.

Many scientific studies have investigated the effectiveness of SMT in general and HT in particular. In this paper we re-evaluate this technique with a new focus: Does HT increase the power consumption of the processor, and if so, does this outweigh the performance advantage so that the overall energy requirement to carry out a given task increases? The question is even more difficult to answer considering that the result strongly depends on the type of workload that is executed.

This paper contributes an energy efficiency study of Hyper-Threading using a mature power measurement methodology and two state-of-the-art x86_64 Intel microarchitectures (Westmere-EP and Sandy Bridge). We analyze the impact of HT on the power consumption when running synthetic benchmarks as well as a rich set of application benchmarks from the SPEC CPU and OMP benchmark suites.
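To make the energy question concrete: the relevant metric is energy to solution, the product of average power and runtime. The following minimal C sketch (our illustration with placeholder numbers, not code or data from this paper) shows how a performance gain from HT can still translate into a net energy increase when the accompanying power overhead is large enough.

```c
/* Minimal sketch (not from the paper): energy to solution E = P_avg * t.
 * HT is only a net energy win for a given workload if the relative runtime
 * reduction exceeds the relative increase in average power draw. */
#include <stdio.h>

typedef struct {
    double avg_power_w;   /* average system power in watts */
    double runtime_s;     /* time to solution in seconds   */
} run_t;

static double energy_j(run_t r) { return r.avg_power_w * r.runtime_s; }

int main(void) {
    /* placeholder numbers, not measurements from the paper */
    run_t ht_off = { 200.0, 100.0 };
    run_t ht_on  = { 215.0,  95.0 };

    double dt = (ht_on.runtime_s - ht_off.runtime_s) / ht_off.runtime_s;
    double de = (energy_j(ht_on) - energy_j(ht_off)) / energy_j(ht_off);

    printf("runtime change: %+.1f %%, energy change: %+.1f %%\n",
           100.0 * dt, 100.0 * de);
    /* Here HT shortens the runtime by 5 percent but still costs about
     * 2 percent more energy -- the combination the abstract describes as
     * advantageous in performance yet disadvantageous in energy efficiency. */
    return 0;
}
```

Whether the sign of the energy change flips therefore depends on whether the relative runtime reduction outweighs the relative power increase, which is exactly what the measurements in the following sections quantify per workload.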
2. RELATED WORK

Ungerer et al. describe the different types of hardware multithreading that are implemented in processors along with their advantages and shortcomings [13]. Tullsen et al. focus on simultaneous multithreading (SMT) and illustrate the substantially increased complexity of the processor design [12]. Hyper-Threading is the most common type of SMT and has been described in [8, 6]. A recent performance evaluation of a Nehalem cluster has demonstrated that the performance improvement of HT is highly application dependent [11]. Li et al. [7] model the effects of SMT on power consumption for POWER4-like architectures. They conclude that the IBM implementation of SMT can significantly improve energy efficiency.

A set of synthetic low-level benchmarks that allows determining the performance of data transfers within shared memory x86_64 systems has been presented in [9, 4]. These benchmarks have been further extended to determine the energy consumption of accesses to different levels of the memory hierarchy and of arithmetic operations on two-socket AMD and Intel systems [10]. That study has shown first signs that the use of Hyper-Threading introduces a significant power consumption overhead while not providing any performance advantage. In this paper we build upon the results of [10] in order to investigate how the more recent Sandy Bridge microarchitecture performs and how this effect translates to real applications as represented by the SPEC OMP [1] and SPEC CPU2006 [5] benchmarks.

3. EXPERIMENTAL SETUP

3.1 Test System Hardware

We use two state-of-the-art x86_64 test systems to evaluate the energy efficiency of the multithreading implementation (HT) of two different generations of Intel processors. One test system is based on the most current dual socket Intel Xeon processors (Westmere-EP). This processor features six cores with SSE units, private L1 and L2 caches, and a shared L3 cache. The second test system is based on the most recent Intel processor family (Sandy Bridge)¹. This processor features four cores with AVX units and a similar memory subsystem. Table 1 lists the test system configuration in detail.

Table 1: Hardware Configuration

  System          Dell PowerEdge R510          Fujitsu ESPRIMO P700 E85+
  Processor       2x Xeon X5670                1x Core i7 2600
  Core clock      2.93 GHz (Turbo 3.33 GHz)    3.4 GHz (Turbo 3.8 GHz)
  Uncore clock    2.66 GHz                     3.4 GHz (Turbo 3.8 GHz)
  TDP             95 W                         95 W
  Codename        Westmere-EP                  Sandy Bridge
  FPU             2x 128 Bit (SSE)             2x 256 Bit (AVX)
  L1 cache        2x 32 KiB per core           2x 32 KiB per core
  L2 cache        256 KiB per core             256 KiB per core
  L3 cache        12 MiB per chip              8 MiB per chip
  IMC channels    3x RDDR3                     2x DDR3
  Memory type     PC3L-10600R                  PC3-10600
  Memory size     12 GiB (6x 2 GiB)            8 GiB (2x 4 GiB)
  Chipset         Intel 5520                   Intel Q65
  Power supply    Dell 480 W PN H410J          Fujitsu DPS-250AB-62 A
  OS              Linux 2.6.38                 Linux 2.6.38

Both of the test systems are highly power optimized with only indispensable components installed. The power supplies of both systems are very efficient, and the Dell system additionally features low-power DRAM modules. Due to the unavailability of dual socket Sandy Bridge processors, we compare a single socket Sandy Bridge system with a dual socket Westmere-EP machine. In order to ensure a fair comparison, we do not look at absolute power consumption numbers, but only at relative changes, i.e., the percentage difference of the power and energy consumption when running a workload with SMT enabled or disabled. Moreover, we compare workloads that, while using all available cores of the system, perform almost no communication between the individual threads. Finally, we ensure that memory allocation of each thread occurs locally, so that no significant inter-socket communication occurs on the dual socket machine.

¹ This is in fact a desktop class CPU (Core i7). However, almost identical results were obtained on a pre-production Intel Xeon 1280 test system. The Xeon system showed some stability issues and we therefore present the Core i7 results.
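The paper does not show how the node-local allocation mentioned in Section 3.1 is enforced; a common way to achieve it on Linux is first-touch page placement combined with thread pinning. The following C/OpenMP sketch is an illustrative assumption, not the authors' benchmark code.

```c
/* Minimal first-touch sketch (an assumption, not the authors' benchmark code):
 * on Linux, a page is placed on the NUMA node of the thread that first writes
 * it, so letting every thread initialize its own disjoint chunk keeps all
 * subsequent accesses node-local on a dual socket system. */
#include <stdlib.h>
#include <omp.h>

double *alloc_local(size_t n_per_thread)
{
    int nthreads = omp_get_max_threads();
    double *buf = malloc((size_t)nthreads * n_per_thread * sizeof(double));
    if (buf == NULL)
        return NULL;

    #pragma omp parallel
    {
        size_t t = (size_t)omp_get_thread_num();
        double *mine = buf + t * n_per_thread;   /* disjoint region per thread */
        for (size_t i = 0; i < n_per_thread; i++)
            mine[i] = 0.0;                       /* first touch places the pages */
    }
    return buf;
}
```

Combined with pinning the threads to cores (e.g., via OMP_PROC_BIND or KMP_AFFINITY), each thread then only touches pages resident on its own socket, so essentially no inter-socket traffic remains on the Westmere-EP machine.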
3.2 Software Workloads

The highly optimized microbenchmarks [9, 4] allow us to determine the data throughput of different instruction types that access well-defined locations within the memory hierarchy. Either n (on an n-core system with HT disabled) or 2n (HT enabled) threads perform a uniform workload consisting of independent operations on disjoint memory regions. The innermost benchmark routines are written in highly optimized assembly code that achieves near peak bandwidth for every cache level as well as main memory. One thread per core is generally sufficient to fully utilize the data paths, leaving little to no potential for performance increases through the use of HT. With HT enabled, the execution units and caches are concurrently stressed by two threads per core that each achieve roughly half of the peak per-core throughput. The existing benchmarks have been extended to use AVX instructions on the Sandy Bridge platform in order to fully utilize the 256 Bit wide execution units and data paths.

We also run a subset of the SPEC CPU2006 application benchmarks on our test systems. Officially submitted SPEC CPU2006 results from HT-capable systems are often generated with HT enabled to improve the overall benchmark performance. In our study we focus on the integer part of SPEC CPU2006, as we found these benchmarks to be more sensitive to the use of HT. We focus on the throughput mode of the benchmark (rate instead of speed) that runs multiple copies of the same serial workload on all available cores. To avoid unnecessary complexity, we use the base configuration that allows for just one set of compiler flags for all benchmarks instead of individually tuned configurations. Table 2 lists the compiler configuration.

Table 2: Compiler Configuration

                            Westmere-EP              Sandy Bridge
  Compiler Suite            Intel Composer XE 12.0   Intel Composer XE 12.0
  Compiler Flags SPEC CPU   -m32 -xSSE4.2 -ipo       -m32 -xAVX -ipo
                            -O3 -no-prec-div -static -opt-prefetch
  Compiler Flags SPEC OMP   -xSSE4.2
                            -O3 -ipo -openmp

3.3 Power Measurement

For both the synthetic and the application benchmarks we measure the power consumption using a ZES Zimmer LMG450 power analyzer attached to the power supply of each test system. This device provides highly accurate power consumption data with a sampling frequency of 20 Hz. The power consumption data is collected by dedicated hardware and software components [10].
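Given the 20 Hz sampling rate of the LMG450, the energy of a benchmark run follows from integrating the recorded power samples over the run's duration. The post-processing sketch below is our own illustration of that step, not the authors' measurement software.

```c
/* Minimal post-processing sketch (illustration only, not the authors' tooling):
 * integrate 20 Hz power samples over a benchmark run to obtain energy,
 * E = sum(P_i) * dt with dt = 1/20 s (rectangle rule). */
#include <stdio.h>

#define SAMPLE_RATE_HZ 20.0

static double energy_joules(const double *power_w, size_t n_samples)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_samples; i++)
        sum += power_w[i];
    return sum / SAMPLE_RATE_HZ;    /* watts * seconds = joules */
}

int main(void)
{
    /* placeholder trace: 5 seconds of samples at a constant 210 W */
    double trace[100];
    for (size_t i = 0; i < 100; i++)
        trace[i] = 210.0;

    printf("energy: %.1f J over %.1f s\n",
           energy_joules(trace, 100), 100 / SAMPLE_RATE_HZ);
    return 0;
}
```

The relative changes described in Section 3.1 are then simply ratios of such energy (or average power) values for HT-enabled and HT-disabled runs of the same workload.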