Energy Efficiency Analysis of Arm Big.Little Global Task Scheduling

Total Page:16

File Type:pdf, Size:1020Kb

Energy Efficiency Analysis of Arm Big.Little Global Task Scheduling ENERGY EFFICIENCY ANALYSIS OF ARM BIG.LITTLE GLOBAL TASK SCHEDULING Bachelor Degree Project in Informatics with Specialization towards Network and System Administration Level G2E, 15hp Spring term 2015 Sigurd Israelsson Supervisor: Thomas Fischer Examiner: Jonas Mellin ABSTRACT In this paper an ARM big.LITTLE system with Global Task Scheduling is evaluated in terms of its energy efficiency and performance measured in execution time. The big.LITTLE system is evaluated against the same system but with only the big or LITTLE processor active. The evaluation is done by performing experiments that target three different levels of load: full load, varying load and low load. The benchmarking software PARSEC Blackscholes and BBench are used to put the system under a synthetic workload in the tests. The results show that overall big.LITTLE achieves an improvement in execution time for all test scenarios, although very slim for varying load, and is more energy efficient than the big processor with the possible exception of a low load scenario. However, the LITTLE processor by itself is found to be the most energy efficient system even though it showed the slowest execution time. Keywords: ARM big.LITTLE, Global Task Scheduling, PARSEC Blackscholes, BBench, Energy efficiency Table of Contents 1. Introduction ............................................................................................................................ 1 2. Background ............................................................................................................................ 2 2.1 ARM big.LITTLE ............................................................................................................ 2 2.2 Power and Energy ............................................................................................................ 3 2.3 Energy efficiency ............................................................................................................. 3 2.4 INA231 Sensor ................................................................................................................. 3 3. Problem Description and Problem Statement ........................................................................ 4 3.1 Purpose ............................................................................................................................. 4 3.2 Motivation ........................................................................................................................ 4 3.3 Objectives ......................................................................................................................... 5 4. Methodology .......................................................................................................................... 6 4.1 Validity ............................................................................................................................. 6 4.2 Evaluation of workloads ................................................................................................... 9 5. Implementation ..................................................................................................................... 13 5.1 Experimental Platform ................................................................................................... 13 5.2 System Setup .................................................................................................................. 13 5.3 Test configurations ......................................................................................................... 15 5.3.1 Parsec Blackscholes ................................................................................................ 15 5.3.2 BBench .................................................................................................................... 16 5.3.3 Idle ........................................................................................................................... 16 5.5 Measurement Tools ........................................................................................................ 17 5.6 Experimental Procedure ................................................................................................. 18 5. Results .................................................................................................................................. 20 5.1 Idle .................................................................................................................................. 20 5.2 BBench ........................................................................................................................... 21 5.3 PARSEC Blackscholes ................................................................................................... 23 6. Analysis ................................................................................................................................ 26 7. Related work ........................................................................................................................ 28 8. Conclusion ............................................................................................................................ 30 9. Discussion ............................................................................................................................ 31 10. Future Work ....................................................................................................................... 32 11. Non-technical summary ..................................................................................................... 33 12. References .......................................................................................................................... 34 Appendix A. Threats ................................................................................................................ 37 Appendix B. Scripts ................................................................................................................. 38 Appendix C. Benchmark software evaluation .......................................................................... 40 1. Introduction This work will evaluate the performance, in terms of execution time, and energy efficiency of the latest ARM big.LITTLE design called global task scheduling. The big.LITTLE system employing this scheduling technique will be evaluated against the internal little and big processors individually. For a long time processors from ARM holdings have designed small, low power processors that is often found in embedded devices and cellphones. It is estimated that technology from ARM exists in 95% of every smartphone (Bent, 2012). In cellphones and other mobile platforms performance is increasingly being requested as the users expect more and more functionality from their devices. Battery capacity and design goals concerning it are limiting factors to how much performance can be implemented in mobile devices. (Kingsley-Hughes, 2012) This leads to innovations in design to increase performance while reducing impact on battery time. One fairly recent innovation from ARM has been to design system on chips (SoCs) that make use of low power processors together with high performance ones to be able to provide a low energy usage for long periods of low workloads while at the same time being able to provide high performance for occasional high intensity workloads. This innovation is called ARM big.LITTLE. Energy efficiency is not only a concern in mobile devices such as smartphones, tablets and laptops. Energy efficiency is an increasing concern in nearly all IT and computing today. In large data centers power hungry servers not only increase the total cost of electricity, but also generate a lot of heat which has to be diverted with cooling solutions. Another infrastructural cost that is directly affected by power consumption is the cost of backup power solutions. By using less energy to perform the same task the total cost of ownership can decrease dramatically. Hamilton (2010) calculated a model for data centers where he concluded that power consumption and the infrastructure to support it consisted of 31 percent of the total cost of ownership within the data center. In a study by the Natural Resources Defense Council (2014) the total energy consumption of U.S. servers in the year 2013 was estimated to be 91 billion kilo watt-hours which cost roughly 9 billion U.S. dollars in electricity. The power consumption of computers worldwide has spawned a new term called Green Computing. Green Computing describes an aim to reduce carbon emission from computers by making sure that the energy efficiency is as high as possible and that the electricity used comes from reliable sources. 1 2. Background This section describes important concepts and theory necessary to understand the work done in this paper. 2.1 ARM big.LITTLE ARM Holdings is a supplier of intellectual property (IP) for processing units and other related technologies. ARMs business model is to license its IP to semiconductor businesses who then can incorporate ARMs technology in their systems (ARM, n.d, a). Among other things ARM develops its own instruction set architecture (ISA) as well as design application processors, real-time processors and embedded processors. The two most recent architectures from ARM are ARMv7 and ARMv8. The main difference between the two architectures is the instruction length. ARMv7 uses a 32-bit instruction set while the newer ARMv8 introduces support for 64-bits. (ARM, n.d, b)
Recommended publications
  • EEMBC and the Purposes of Embedded Processor Benchmarking Markus Levy, President
    EEMBC and the Purposes of Embedded Processor Benchmarking Markus Levy, President ISPASS 2005 Certified Performance Analysis for Embedded Systems Designers EEMBC: A Historical Perspective • Began as an EDN Magazine project in April 1997 • Replace Dhrystone • Have meaningful measure for explaining processor behavior • Developed business model • Invited worldwide processor vendors • A consortium was born 1 EEMBC Membership • Board Member • Membership Dues: $30,000 (1st year); $16,000 (subsequent years) • Access and Participation on ALL Subcommittees • Full Voting Rights • Subcommittee Member • Membership Dues Are Subcommittee Specific • Access to Specific Benchmarks • Technical Involvement Within Subcommittee • Help Determine Next Generation Benchmarks • Special Academic Membership EEMBC Philosophy: Standardized Benchmarks and Certified Scores • Member derived benchmarks • Determine the standard, the process, and the benchmarks • Open to industry feedback • Ensures all processor/compiler vendors are running the same tests • Certification process ensures credibility • All benchmark scores officially validated before publication • The entire benchmark environment must be disclosed 2 Embedded Industry: Represented by Very Diverse Applications • Networking • Storage, low- and high-end routers, switches • Consumer • Games, set top boxes, car navigation, smartcards • Wireless • Cellular, routers • Office Automation • Printers, copiers, imaging • Automotive • Engine control, Telematics Traditional Division of Embedded Applications Low High Power
    [Show full text]
  • Cloud Workbench a Web-Based Framework for Benchmarking Cloud Services
    Bachelor August 12, 2014 Cloud WorkBench A Web-Based Framework for Benchmarking Cloud Services Joel Scheuner of Muensterlingen, Switzerland (10-741-494) supervised by Prof. Dr. Harald C. Gall Philipp Leitner, Jürgen Cito software evolution & architecture lab Bachelor Cloud WorkBench A Web-Based Framework for Benchmarking Cloud Services Joel Scheuner software evolution & architecture lab Bachelor Author: Joel Scheuner, [email protected] Project period: 04.03.2014 - 14.08.2014 Software Evolution & Architecture Lab Department of Informatics, University of Zurich Acknowledgements This bachelor thesis constitutes the last milestone on the way to my first academic graduation. I would like to thank all the people who accompanied and supported me in the last four years of study and work in industry paving the way to this thesis. My personal thanks go to my parents supporting me in many ways. The decision to choose a complex topic in an area where I had no personal experience in advance, neither theoretically nor technologically, made the past four and a half months challenging, demanding, work-intensive, but also very educational which is what remains in good memory afterwards. Regarding this thesis, I would like to offer special thanks to Philipp Leitner who assisted me during the whole process and gave valuable advices. Moreover, I want to thank Jürgen Cito for his mainly technologically-related recommendations, Rita Erne for her time spent with me on language review sessions, and Prof. Harald Gall for giving me the opportunity to write this thesis at the Software Evolution & Architecture Lab at the University of Zurich and providing fundings and infrastructure to realize this thesis.
    [Show full text]
  • Automatic Benchmark Profiling Through Advanced Trace Analysis Alexis Martin, Vania Marangozova-Martin
    Automatic Benchmark Profiling through Advanced Trace Analysis Alexis Martin, Vania Marangozova-Martin To cite this version: Alexis Martin, Vania Marangozova-Martin. Automatic Benchmark Profiling through Advanced Trace Analysis. [Research Report] RR-8889, Inria - Research Centre Grenoble – Rhône-Alpes; Université Grenoble Alpes; CNRS. 2016. hal-01292618 HAL Id: hal-01292618 https://hal.inria.fr/hal-01292618 Submitted on 24 Mar 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Automatic Benchmark Profiling through Advanced Trace Analysis Alexis Martin , Vania Marangozova-Martin RESEARCH REPORT N° 8889 March 23, 2016 Project-Team Polaris ISSN 0249-6399 ISRN INRIA/RR--8889--FR+ENG Automatic Benchmark Profiling through Advanced Trace Analysis Alexis Martin ∗ † ‡, Vania Marangozova-Martin ∗ † ‡ Project-Team Polaris Research Report n° 8889 — March 23, 2016 — 15 pages Abstract: Benchmarking has proven to be crucial for the investigation of the behavior and performances of a system. However, the choice of relevant benchmarks still remains a challenge. To help the process of comparing and choosing among benchmarks, we propose a solution for automatic benchmark profiling. It computes unified benchmark profiles reflecting benchmarks’ duration, function repartition, stability, CPU efficiency, parallelization and memory usage.
    [Show full text]
  • CMSC 411: Computer Architecture
    CMSC 411: Computer Architecture Spring 2019 Jason Tang "1 About Your Friendly Instructor • Jason Tang (just call me Jason!)! • UMBC adjunct faculty member since 2012! • Taught CMSC 104, 202, 421, and 411! • Work full-time at a nearby mega-corporation as a software engineer "2 Contact Information • Email me at [email protected]! • O$ce in ITE 201C! • Tuesday / Thursday, 7:00 pm - 8:00 pm, right after class! • Teaching Assistant:! • TBA! • "3 Am I in the Right Class? • Prerequisites are:! • CMSC 313, or! • CMPE 212 + CMPE 310! • Must be able to read hexadecimal notation! • Should already by familiar with C/C++ and some assembly code! • This does not mean Java, Python, or other scripting language "4 Required Programming Knowledge • Know how (or research on Stack Overflow) to do these things:! • Read the very fantastic man pages! • Call a function and pass values in and out! • Di%erence between an in parameter, out parameter, and in/out parameter! • Know what a C++ reference technically is! • Understand basic boolean logic "5 Topics Covered • Instruction Sets! • Performance Measurements! • Machine Arithmetic! • Processor Design! • Memory Systems! • I/O Design! • Computer Buses "6 Course Information • http://www.csee.umbc.edu/~jtang/cs411.s19! • Grades will be posted on Blackboard! • Discussion forums are also on Blackboard! • All assignments submitted via submit system at linux.gl.umbc.edu! • Ensure you have a way to transfer files between your development machine and UMBC server (scp, PuTTy, Cyberduck, or equivalent)! • Using the clipboard to
    [Show full text]
  • Automatic Benchmark Profiling Through Advanced Trace Analysis
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Hal - Université Grenoble Alpes Automatic Benchmark Profiling through Advanced Trace Analysis Alexis Martin, Vania Marangozova-Martin To cite this version: Alexis Martin, Vania Marangozova-Martin. Automatic Benchmark Profiling through Advanced Trace Analysis. [Research Report] RR-8889, Inria - Research Centre Grenoble { Rh^one-Alpes; Universit´eGrenoble Alpes; CNRS. 2016. <hal-01292618> HAL Id: hal-01292618 https://hal.inria.fr/hal-01292618 Submitted on 24 Mar 2016 HAL is a multi-disciplinary open access L'archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destin´eeau d´ep^otet `ala diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publi´esou non, lished or not. The documents may come from ´emanant des ´etablissements d'enseignement et de teaching and research institutions in France or recherche fran¸caisou ´etrangers,des laboratoires abroad, or from public or private research centers. publics ou priv´es. Automatic Benchmark Profiling through Advanced Trace Analysis Alexis Martin , Vania Marangozova-Martin RESEARCH REPORT N° 8889 March 23, 2016 Project-Team Polaris ISSN 0249-6399 ISRN INRIA/RR--8889--FR+ENG Automatic Benchmark Profiling through Advanced Trace Analysis Alexis Martin ∗ † ‡, Vania Marangozova-Martin ∗ † ‡ Project-Team Polaris Research Report n° 8889 — March 23, 2016 — 15 pages Abstract: Benchmarking has proven to be crucial for the investigation of the behavior and performances of a system. However, the choice of relevant benchmarks still remains a challenge. To help the process of comparing and choosing among benchmarks, we propose a solution for automatic benchmark profiling.
    [Show full text]
  • Introduction to Digital Signal Processors
    INTRODUCTION TO Accumulator architecture DIGITAL SIGNAL PROCESSORS Memory-register architecture Prof. Brian L. Evans in collaboration with Niranjan Damera-Venkata and Magesh Valliappan Load-store architecture Embedded Signal Processing Laboratory The University of Texas at Austin Austin, TX 78712-1084 http://anchovy.ece.utexas.edu/ Outline n Signal processing applications n Conventional DSP architecture n Pipelining in DSP processors n RISC vs. DSP processor architectures n TI TMS320C6x VLIW DSP architecture n Signal and image processing applications n Signal processing on general-purpose processors n Conclusion 2 Signal Processing Applications n Low-cost embedded systems 4 Modems, cellular telephones, disk drives, printers n High-throughput applications 4 Halftoning, radar, high-resolution sonar, tomography n PC based multimedia 4 Compression/decompression of audio, graphics, video n Embedded processor requirements 4 Inexpensive with small area and volume 4 Deterministic interrupt service routine latency 4 Low power: ~50 mW (TMS320C5402/20: 0.32 mA/MIP) 3 Conventional DSP Architecture n High data throughput 4 Harvard architecture n Separate data memory/bus and program memory/bus n Three reads and one or two writes per instruction cycle 4 Short deterministic interrupt service routine latency 4 Multiply-accumulate (MAC) in a single instruction cycle 4 Special addressing modes supported in hardware n Modulo addressing for circular buffers (e.g. FIR filters) n Bit-reversed addressing (e.g. fast Fourier transforms) 4Instructions to keep the
    [Show full text]
  • Geekbench 3 License Keygen 26
    1 / 5 Geekbench 3 License Keygen 26 Geekbench 3 License Keygen 26 > http://fancli.com/19xvdd 1a8c34a149 Geekbench 3 is Primate Labs' cross-platform processor benchmark, .... Call of duty 4 Serial key Included For Pc Only free · [P3D] - Beech B200 ... full hd Don 2 movies free download 720p torrent ... geekbench 3 license keygen 26. 5960x cpu cache voltage, Intel Core i7-5960X - Benchmark, Geekbench 5, Cinebench R20, ... The Intel XScale processor supports a range of frequencies and voltages in order to allow the user to save power [3]. Instead ... Serial number to imei converter free ... Vcenter 7 keygen ... Jobsmart 26 gallon air compressor specs.. Geekbench 3 License Keygen 26 DOWNLOAD: http://cinurl.com/1ff2qd geekbench keygen, geekbench 4 keygen, geekbench 3 keygen 608fcfdb5b 3 Crack 61 .... ... work with our system. 3. Ready for kiosk ... II gaming platform. Novomatic(Gaminator) offline casino system ... geekbench 3 license keygen 26. Coverage setting aviation 4 audio hijack pro keygen. ... Download Crack Archicad 19 4006 Ita: Audio hijack pro 2.10.5 keygen ... Geekbench 3.. I told them that I wasn't expecting that and have adobe cs5 keygen mac 2020 reason to do so given that CS3 is serving me. ... To or adobe mindjet 26 adobe. ... These Geekbench 3 benchmarks are in bit mode and are for a single processor .... Crack Data Glitch 2 0 1. 3 Juin 2020 0. data glitch, data ... Rowbyte Data Glitch 2.0; Data Glitch 2 Serial Serial Numbers. Con . ... geekbench 3 license keygen 26 Increase benchmark scores (Antutu, Geekbench, Quadrant) Use App ... PlayerPro Music Player v4.2 APK Is Here [LATEST] Driver Magician 5.0 Keygen Is Here! ..
    [Show full text]
  • BOOM): an Industry- Competitive, Synthesizable, Parameterized RISC-V Processor
    The Berkeley Out-of-Order Machine (BOOM): An Industry- Competitive, Synthesizable, Parameterized RISC-V Processor Christopher Celio David A. Patterson Krste Asanović Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2015-167 http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-167.html June 13, 2015 Copyright © 2015, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor Christopher Celio, David Patterson, and Krste Asanovic´ University of California, Berkeley, California 94720–1770 [email protected] BOOM is a work-in-progress. Results shown are prelimi- nary and subject to change as of 2015 June. I$ L1 D$ (32k) L2 data 1. The Berkeley Out-of-Order Machine BOOM is a synthesizable, parameterized, superscalar out- exe of-order RISC-V core designed to serve as the prototypical baseline processor for future micro-architectural studies of uncore regfile out-of-order processors. Our goal is to provide a readable, issue open-source implementation for use in education, research, exe and industry. uncore BOOM is written in roughly 9,000 lines of the hardware L2 data (256k) construction language Chisel.
    [Show full text]
  • A Free, Commercially Representative Embedded Benchmark Suite
    MiBench: A free, commercially representative embedded benchmark suite Matthew R. Guthaus, Jeffrey S. Ringenberg, Dan Ernst, Todd M. Austin, Trevor Mudge, Richard B. Brown {mguthaus,jringenb,ernstd,taustin,tnm,brown}@eecs.umich.edu The University of Michigan Electrical Engineering and Computer Science 1301 Beal Ave., Ann Arbor, MI 48109-2122 Abstract Among the common features of these This paper examines a set of commercially microarchitectures are deep pipelines, significant representative embedded programs and compares them instruction level parallelism, sophisticated branch to an existing benchmark suite, SPEC2000. A new prediction schemes, and large caches. version of SimpleScalar that has been adapted to the Although this class of machines has been the chief ARM instruction set is used to characterize the focus of the computer architecture community, performance of the benchmarks using configurations relatively few microprocessors are employed in this similar to current and next generation embedded market segment. The vast majority of microprocessors processors. Several characteristics distinguish the are employed in embedded applications [8]. Although representative embedded programs from the existing many are just inexpensive microcontrollers, their SPEC benchmarks including instruction distribution, combined sales account for nearly half of all memory behavior, and available parallelism. The microprocessor revenue. Furthermore, the embedded embedded benchmarks, called MiBench, are freely application domain is the fastest growing market available to all researchers. segment in the microprocessor industry. The wide range of applications makes it difficult to 1. Introduction characterize the embedded domain. In fact, an Performance based design has made benchmarking embedded benchmark suite should reflect this by a critical part of the design process [1].
    [Show full text]
  • Geekbench 5 Compute Workloads
    Geekbench 5 Compute Workloads Introduction 3 Platform Support 4 API Support 4 Runtime 5 Scores 6 Comparing Scores 6 Compute Workloads 7 Sobel 7 Canny 7 Stereo Matching 7 Histogram Equalization 7 Gaussian Blur 8 Depth of Field 8 Face Detection 8 Horizon Detection 8 Feature Matching 9 Particle Physics 9 SFFT 9 September 2019 "2 Introduction This document outlines the workloads included in the Geekbench 5 Compute Benchmark suite. Compute Benchmark scores are used to evaluate and optimize GPU Compute performance using workloads that include image processing, computational photography, computer vision, and machine learning. Performance in these workloads is important for a wide variety of applications including cameras, image editors, and real-time renderers. September 2019 "3 Platform Support Platform Minimum Version Comment Android Android 7 “Nougat” iOS iOS 12 Linux Ubuntu 16.04 LTS macOS macOS 10.13 Windows Windows 10 API Support Geekbench 5 supports the following GPU Compute APIs: API Version Comment CUDA CUDA 9.0 Compute Capability 3.0 or later. Metal Metal 2.0 Metal 2.1 if available. OpenCL OpenCL 1.1 Vulkan Vulkan 1.0 September 2019 "4 Runtime Geekbench 5 runs Compute workloads in the order listed here as the Compute Benchmark. Each workload is run for 20 iterations by default. September 2019 "5 Scores Geekbench 5 provides one overall score for the Compute Benchmark. The overall score is the geometric mean of the scores of the individual Compute workloads. Comparing Scores Each Compute workload has an implementation for each supported Compute API. While it is possible to compare scores across APIs (e.g., a OpenCL score with a Metal score) it is important to keep in mind that due to the nature of Compute APIs, the performance difference can be due to more than differences in the underlying hardware (e.g., the GPU driver can have a huge impact on performance).! September 2019 "6 Compute Workloads Sobel The Sobel operator is used in image processing and computer vision for finding edges in images.
    [Show full text]
  • Geekbench Browser
    MAcPro7,1 - Geekbench Browser https://browser.geekbench.com/v4/cpu/15449062 Geekbench Browser Geekbench 5 Top Single-Core Results Top Multi-Core Results Recent Results Search Results Recent Compute Results Search Compute Results Geekbench 4 Top Single-Core Results Top Multi-Core Results Recent Results Search Results Recent Compute Results Search Compute Results Benchmark Charts Android Benchmark Chart iOS Benchmark Chart Mac Benchmark Chart Processor Benchmark Chart CUDA Benchmark Chart Metal Benchmark Chart OpenCL Benchmark Chart Vulkan Benchmark Chart Search Account Profile Settings Log Out This result has been added to your account. MAcPro7,1 Single-Core Score Multi-Core Score 6567 38907 Geekbench 4.4.2 Tryout for Mac OS X x86 (64-bit) Result Information User foster2005 Upload Date May 01 2020 07:27 AM Views 1 System Information System Information Operating System macOS 10.15.5 (Build 19F72f) Model MAcPro7,1 Motherboard Acidanthera Mac-EE2EBD4B90B839A8 MacBook10,1 Memory 32.00 GB 3200 MHz DDR4 Northbridge 1 of 6 5/1/20, 1:33 AM MAcPro7,1 - Geekbench Browser https://browser.geekbench.com/v4/cpu/15449062 System Information Southbridge BIOS Acidanthera 185.0.0.0.0 Processor Information Name Intel Core i9-9900K Topology 1 Processor, 8 Cores, 16 Threads Identifier GenuineIntel Family 6 Model 158 Stepping 12 Base Frequency 3.60 GHz Package Codename L1 Instruction Cache 32.0 KB x 8 L1 Data Cache 32.0 KB x 8 L2 Cache 256 KB x 8 L3 Cache 16.0 MB x 1 Single-Core Performance Single-Core Score 6567 Crypto Score 4285 Integer Score 6545 Floating Point
    [Show full text]
  • Low-Power High Performance Computing
    Low-Power High Performance Computing Panagiotis Kritikakos August 16, 2011 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2011 Abstract The emerging development of computer systems to be used for HPC require a change in the architecture for processors. New design approaches and technologies need to be embraced by the HPC community for making a case for new approaches in system design for making it possible to be used for Exascale Supercomputers within the next two decades, as well as to reduce the CO2 emissions of supercomputers and scientific clusters, leading to greener computing. Power is listed as one of the most important issues and constraint for future Exascale systems. In this project we build a hybrid cluster, investigating, measuring and evaluating the performance of low-power CPUs, such as Intel Atom and ARM (Marvell 88F6281) against commodity Intel Xeon CPU that can be found within standard HPC and data-center clusters. Three main factors are considered: computational performance and efficiency, power efficiency and porting effort. Contents 1 Introduction 1 1.1 Report organisation . 2 2 Background 3 2.1 RISC versus CISC . 3 2.2 HPC Architectures . 4 2.2.1 System architectures . 4 2.2.2 Memory architectures . 5 2.3 Power issues in modern HPC systems . 9 2.4 Energy and application efficiency . 10 3 Literature review 12 3.1 Green500 . 12 3.2 Supercomputing in Small Spaces (SSS) . 12 3.3 The AppleTV Cluster . 13 3.4 Sony Playstation 3 Cluster . 13 3.5 Microsoft XBox Cluster . 14 3.6 IBM BlueGene/Q .
    [Show full text]