ARCHITECTURE Fuat Keceli, Doctor of Philosophy

Total Page:16

File Type:pdf, Size:1020Kb

ARCHITECTURE Fuat Keceli, Doctor of Philosophy ABSTRACT Title of dissertation: POWER AND PERFORMANCE STUDIES OF THE EXPLICIT MULTI-THREADING (XMT) ARCHITECTURE Fuat Keceli, Doctor of Philosophy, 2011 Dissertation directed by: Professor Uzi Vishkin Department of Electrical and Computer Engineering Power and thermal constraints gained critical importance in the design of micropro- cessors over the past decade. Chipmakers failed to keep power at bay while sustaining the performance growth of serial computers at the rate expected by consumers. As an alter- native, they turned to fitting an increasing number of simpler cores on a single die. While this is a step forward for relaxing the constraints, the issue of power is far from resolved and it is joined by new challenges which we explain next. As we move into the era of many-cores, processors consisting of 100s, even 1000s of cores, single-task parallelism is the natural path for building faster general-purpose com- puters. Alas, the introduction of parallelism to the mainstream general-purpose domain brings another long elusive problem to focus: ease of parallel programming. The result is the dual challenge where power efficiency and ease-of-programming are vital for the prevalence of up and coming many-core architectures. The observations above led to the lead goal of this dissertation: a first order valida- tion of the claim that even under power/thermal constraints, ease-of-programming and com- petitive performance need not be conflicting objectives for a massively-parallel general- purpose processor. As our platform, we choose the eXplicit Multi-Threading (XMT) many- core architecture for fine grained parallel programs developed at the University of Mary- land. We hope that our findings will be a trailblazer for future commercial products. XMT scales up to thousand or more lightweight cores and aims at improving single task execution time while making the task for the programmer as easy as possible. Per- formance advantages and ease-of-programming of XMT have been shown in a number of publications, including a study that we present in this dissertation. Feasibility of the hardware concept has been exhibited via FPGA and ASIC (per our partial involvement) prototypes. Our contributions target the study of power and thermal envelopes of an envisioned 1024-core XMT chip (XMT1024) under programs that exist in popular parallel benchmark suites. First, we compare XMT against an area and power equivalent commercial high-end many-core GPU. We demonstrate that XMT can provide an average speedup of 8.8x in ir- regular parallel programs that are common and important in general purpose computing. Even under the worst-case power estimation assumptions for XMT, average speedup is only reduced by half. We further this study by experimentally evaluating the performance advantages of Dynamic Thermal Management (DTM), when applied to XMT1024. DTM tech- niques are frequently used in current single and multi-core processors, however until now their effects on single-tasked many-cores have not been examined in detail. It is our pur- pose to explore how existing techniques can be tailored for XMT to improve performance. Performance improvements up to 46% over a generic global management technique has been demonstrated. The insights we provide can guide designers of other similar many- core architectures. A significant infrastructure contribution of this dissertation is a highly configurable cycle-accurate simulator, XMTSim. To our knowledge, XMTSim is currently the only publicly-available shared-memory many-core simulator with extensive capabilities for es- timating power and temperature, as well as evaluating dynamic power and thermal man- agement algorithms. As a major component of the XMT programming toolchain, it is not only used as the infrastructure in this work but also contributed to other publications and dissertations. POWER AND PERFORMANCE STUDIES OF THE EXPLICIT MULTI-THREADING (XMT) ARCHITECTURE by Fuat Keceli Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2011 Advisory Committee: Professor Uzi Vishkin, Chair/Advisor Assistant Professor Tali Moreshet Associate Professor Manoj Franklin Associate Professor Gang Qu Associate Professor William W. Pugh c Copyright by Fuat Keceli 2011 To my loving mother, Gulbun, who never stopped believing in me. Anneme. ii Acknowledgments I would like to sincerely thank my advisors Dr. Uzi Vishkin and Dr. Tali Moreshet for their guidance. It has been a long but rewarding journey through which they have always been understanding and patient with me. Dr. Vishkin’s invaluable experiences, unique insight and perseverance not only shaped my research, but also gave me an everlasting perspective on defining and solving problems. Dr. Moreshet selflessly spent countless hours in discussions with me, during which we cultivated the ideas that were collected in this dissertation. Her advice as a mentor and a friend kept me focused and taught me how to convey ideas clearly. It was a privilege to have both as my advisors. I am indebted to the past and present members of the XMT team, especially my good friends George C. Caragea and Alexandros Tzannes with whom I worked closely and puz- zled over many problems. James Edwards’ collaboration and feedback was often crucial. Aydin Balkan, Xingzhi Wen and Michael Horak were exceptional colleagues during their time in the team. I would not have made it to the finish line without their help. It was the sympathetic ear of many friends that kept me sane through the course of difficult years. They have helped me deal with the setbacks along the way, push through the pain of my father’s long illness and gave me a space where I can let off steam. Thank you (in no particular order), Apoorva Prasad, Thanos Chrysis, Harsh Dhundia, Nimi Dvir, Orkan Dere, Bulent Boyaci, and everybody else that I may have inadvertently left out. Apoorva Prasad deserves a special mention, for he took the time to proofread most of this dissertation during his brief visit from overseas. Finally and most importantly, I would like express my heart-felt gratitude to my family. I am lucky that my mother Gulbun and my brother Alp were, are, and always will be with me. Alas, my father Ismail, who was the first engineer that I knew and a very good one, passed away in 2009; he lives in our memories and our hearts. I would not be the person that I am today without their unwavering support, encouragement, and love. I owe many thanks to Tina Peterson for being my family away from home, for her continuous support and concern. Lastly, I will always remember my grandfather, Muzaffer Tuncel, as the person who planted the seeds of scientific curiosity in me. iii Table of Contents List of Tables ix List of Figures x List of Abbreviations xiii 1 Introduction 1 2 Background on Power/Temperature Estimation and Management 5 2.1 Sources of Power Dissipation in Digital Processors ............... 5 2.2 Dynamic Power Consumption .......................... 6 2.2.1 Clock Gating ................................ 6 2.2.2 Estimation of of Dynamic Power in Simulation ............. 7 2.3 Leakage Power Consumption ........................... 9 2.3.1 Management of Leakage Power ..................... 10 2.3.2 Estimation of of Leakage Power in Simulation ............. 11 2.4 Power Consumption in On-Chip Memories ................... 12 2.5 Power and Clock Speed Trade-offs ........................ 12 2.6 Dynamic Scaling of Voltage and Frequency ................... 14 2.7 Technology Scaling Trends ............................. 16 2.8 Thermal Modeling and Design .......................... 18 2.9 Power and Thermal Constraints in Recent Processors ............. 21 2.10 Tools for Area, Power and Temperature Estimation .............. 22 3 The Explicit Multi-Threading (XMT) Platform 24 3.1 The XMT Architecture ............................... 25 3.1.1 Memory Organization ........................... 26 3.1.2 The Mesh-of-Trees Interconnect (MoT-ICN) ............... 26 3.2 Programming of XMT ............................... 28 3.2.1 The PRAM Model ............................. 28 3.2.2 XMTC – Enhanced C Programming for XMT .............. 29 3.2.3 The Prefix-Sum Operation ......................... 30 iv 3.2.4 Example Program .............................. 31 3.2.5 Independence-of-Order and No-Busy-Wait ............... 31 3.2.6 Ease-of-Programming ........................... 32 3.3 Thread Scheduling in XMT ............................ 34 3.4 Performance Advantages ............................. 36 3.5 Power Efficiency of XMT and Design for Power Management ........ 36 3.5.1 Suitability of the Programming Model .................. 37 3.5.2 Re-designing Thread Scheduling for Power ............... 37 3.5.3 Low Power States for Clusters ...................... 40 3.5.4 Power Management of the Synchronous MoT-ICN ........... 41 3.5.5 Power Management of Shared Caches – Dynamic Cache Resizing .. 42 4 XMTSim – The Cycle-Accurate Simulator of the XMT Architecture 44 4.1 Overview of XMTSim ................................ 45 4.2 Simulation Statistics and Runtime Control .................... 48 4.3 Details of Cycle-Accurate Simulation ....................... 49 4.3.1 Discrete-Event Simulation ......................... 49 4.3.2 Concurrent Communication of Data Between Components ...... 52 4.3.3 Optimizing the DE Simulation Performance .............. 54 4.3.4 Simulation Speed .............................. 56 4.4 Cycle Verification Against
Recommended publications
  • From Blue Gene to Cell Power.Org Moscow, JSCC Technical Day November 30, 2005
    IBM eServer pSeries™ From Blue Gene to Cell Power.org Moscow, JSCC Technical Day November 30, 2005 Dr. Luigi Brochard IBM Distinguished Engineer Deep Computing Architect [email protected] © 2004 IBM Corporation IBM eServer pSeries™ Technology Trends As frequency increase is limited due to power limitation Dual core is a way to : 2 x Peak Performance per chip (and per cycle) But at the expense of frequency (around 20% down) Another way is to increase Flop/cycle © 2004 IBM Corporation IBM eServer pSeries™ IBM innovations POWER : FMA in 1990 with POWER: 2 Flop/cycle/chip Double FMA in 1992 with POWER2 : 4 Flop/cycle/chip Dual core in 2001 with POWER4: 8 Flop/cycle/chip Quadruple core modules in Oct 2005 with POWER5: 16 Flop/cycle/module PowerPC: VMX in 2003 with ppc970FX : 8 Flops/cycle/core, 32bit only Dual VMX+ FMA with pp970MP in 1Q06 Blue Gene: Low frequency , system on a chip, tight integration of thousands of cpus Cell : 8 SIMD units and a ppc970 core on a chip : 64 Flop/cycle/chip © 2004 IBM Corporation IBM eServer pSeries™ Technology Trends As needs diversify, systems are heterogeneous and distributed GRID technologies are an essential part to create cooperative environments based on standards © 2004 IBM Corporation IBM eServer pSeries™ IBM innovations IBM is : a sponsor of Globus Alliances contributing to Globus Tool Kit open souce a founding member of Globus Consortium IBM is extending its products Global file systems : – Multi platform and multi cluster GPFS Meta schedulers : – Multi platform
    [Show full text]
  • IBM Powerpc 970 (A.K.A. G5)
    IBM PowerPC 970 (a.k.a. G5) Ref 1 David Benham and Yu-Chung Chen UIC – Department of Computer Science CS 466 PPC 970FX overview ● 64-bit RISC ● 58 million transistors ● 512 KB of L2 cache and 96KB of L1 cache ● 90um process with a die size of 65 sq. mm ● Native 32 bit compatibility ● Maximum clock speed of 2.7 Ghz ● SIMD instruction set (Altivec) ● 42 watts @ 1.8 Ghz (1.3 volts) ● Peak data bandwidth of 6.4 GB per second A picture is worth a 2^10 words (approx.) Ref 2 A little history ● PowerPC processor line is a product of the AIM alliance formed in 1991. (Apple, IBM, and Motorola) ● PPC 601 (G1) - 1993 ● PPC 603 (G2) - 1995 ● PPC 750 (G3) - 1997 ● PPC 7400 (G4) - 1999 ● PPC 970 (G5) - 2002 ● AIM alliance dissolved in 2005 Processor Ref 3 Ref 3 Core details ● 16(int)-25(vector) stage pipeline ● Large number of 'in flight' instructions (various stages of execution) - theoretical limit of 215 instructions ● 512 KB L2 cache ● 96 KB L1 cache – 64 KB I-Cache – 32 KB D-Cache Core details continued ● 10 execution units – 2 load/store operations – 2 fixed-point register-register operations – 2 floating-point operations – 1 branch operation – 1 condition register operation – 1 vector permute operation – 1 vector ALU operation ● 32 64 bit general purpose registers, 32 64 bit floating point registers, 32 128 vector registers Pipeline Ref 4 Benchmarks ● SPEC2000 ● BLAST – Bioinformatics ● Amber / jac - Structure biology ● CFD lab code SPEC CPU2000 ● IBM eServer BladeCenter JS20 ● PPC 970 2.2Ghz ● SPECint2000 ● Base: 986 Peak: 1040 ● SPECfp2000 ● Base: 1178 Peak: 1241 ● Dell PowerEdge 1750 Xeon 3.06Ghz ● SPECint2000 ● Base: 1031 Peak: 1067 Apple’s SPEC Results*2 ● SPECfp2000 ● Base: 1030 Peak: 1044 BLAST Ref.
    [Show full text]
  • Chapter 1-Introduction to Microprocessors File
    Chapter 1 Introduction to Microprocessors Expected Outcomes Explain the role of the CPU, memory and I/O device in a computer Distinguish between the microprocessor and microcontroller Differentiate various form of programming languages Compare between CISC vs RISC and Von Neumann vs Harvard architecture NMKNYFKEEUMP Introduction A microprocessor is an integrated circuit built on a tiny piece of silicon It contains thousands or even millions of transistors which are interconnected via superfine traces of aluminum The transistors work together to store and manipulate data so that the microprocessor can perform a wide variety of useful functions The particular functions a microprocessor perform are dictated by software The first microprocessor was the Intel 4004 (16-pin) introduced in 1971 containing 2300 transistors with 46 instruction sets Power8 processor, by contrast, contains 4.2 billion transistors NMKNYFKEEUMP Introduction Computer is an electronic machine that perform arithmetic operation and logic in response to instructions written Computer requires hardware and software to function Hardware is electronic circuit boards that provide functionality of the system such as power supply, cable, etc CPU – Central Processing Unit/Microprocessor Memory – store all programming and data Input/Output device – the flow of information Software is a programming that control the system operation and facilitate the computer usage Programming is a group of instructions that inform the computer to perform certain task NMKNYFKEEUMP Introduction Computer
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Green500 List
    Making a Case for a Green500 List S. Sharma†, C. Hsu†, and W. Feng‡ † Los Alamos National Laboratory ‡ Virginia Tech Outline z Introduction What Is Performance? Motivation: The Need for a Green500 List z Challenges What Metric To Choose? Comparison of Available Metrics z TOP500 as Green500 z Conclusion W. Feng, [email protected], (540) 231-1192 Where Is Performance? Performance = Speed, as measured in FLOPS W. Feng, [email protected], (540) 231-1192 What Is Performance? TOP500 Supercomputer List z Benchmark LINPACK: Solves a (random) dense system of linear equations in double-precision (64 bits) arithmetic. Introduced by Prof. Jack Dongarra, U. Tennessee z Evaluation Metric Performance, as defined by Performance (i.e., Speed) speed, is an important metric, Floating-Operations Per Second (FLOPS) but… z Web Site http://www.top500.org z Next-Generation Benchmark: HPC Challenge http://icl.cs.utk.edu/hpcc/ W. Feng, [email protected], (540) 231-1192 Reliability & Availability of HPC Systems CPUs Reliability & Availability ASCI Q 8,192 MTBI: 6.5 hrs. 114 unplanned outages/month. HW outage sources: storage, CPU, memory. ASCI 8,192 MTBF: 5 hrs. (2001) and 40 hrs. (2003). White HW outage sources: storage, CPU, 3rd-party HW. NERSC 6,656 MTBI: 14 days. MTTR: 3.3 hrs. Seaborg SW is the main outage source. Availability: 98.74%. PSC 3,016 MTBI: 9.7 hrs. Lemieux Availability: 98.33%. Google ~15,000 20 reboots/day; 2-3% machines replaced/year. (as of 2003) HW outage sources: storage, memory. Availability: ~100%. MTBI: mean time between interrupts; MTBF: mean time between failures; MTTR: mean time to restore Source: Daniel A.
    [Show full text]
  • Power Vector Intrinsic Programming Reference
    REVIEW DRAFT www.openpowerfoundation.org Power Vector Intrinsic Program- February 11, 2020 Revision 1.0.0_prd2 ming Reference Power Vector Intrinsic Programming Reference System Software Work Group <[email protected]> OpenPower Foundation Revision 1.0.0_prd2 (February 11, 2020) Copyright © 2017-2020 OpenPOWER Foundation Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 REVIEW DRAFT - Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. The limited permissions granted above are perpetual and will not be revoked by OpenPOWER or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis AND TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, THE OpenPOWER Foundation AS WELL AS THE AUTHORS AND DEVELOPERS OF THIS STANDARDS FINAL DELIVERABLE OR OTHER DOCUMENT HEREBY DISCLAIM ALL OTHER WARRANTIES AND CONDITIONS, EITHER EXPRESS, REVIEW DRAFT - IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES, DUTIES OR CONDITIONS OF MERCHANTABILITY, OF FITNESS FOR A PARTICULAR PURPOSE, OF ACCURACY OR COMPLETENESS OF RESPONSES, OF RESULTS, OF WORKMANLIKE EFFORT, OF LACK OF VIRUSES, OF LACK OF NEGLIGENCE OR NON-INFRINGEMENT. OpenPOWER, the OpenPOWER logo, and openpowerfoundation.org are trademarks or registered trademarks of OpenPOWER Foundation, Inc., registered in many jurisdictions worldwide.
    [Show full text]
  • Game Consoles: a Look Ahead Game Consoles: a Look Ahead by Brian Neal – December 2003
    Ace’s Hardware Game Consoles: A Look Ahead Game Consoles: A Look Ahead By Brian Neal – December 2003 Introduction With the emergence of Microsoft and its Xbox, the console gaming market has become increasingly competitive, especially with Microsoft and Nintendo both angling for the No. 2 spot in the market and a shot for the top. While the present battle rages, though, the stage is being set for the next-generation of gaming machines. All three competitors have formed alliances with various partners and although Sony's Playstation 3 and the "CELL" project have been the subject of much discussion and countless rumors, information about the other future consoles has only begun to trickle out relatively recently. In this article, we'll take a look at the current generation as well as what's been revealed thus far for all three upcoming gaming machines, in an effort provide some speculation as to what we think the future may hold. The Status Quo Before peering into the future, let's take a look at the current situation and the three major gaming consoles: the Sony Playstation 2, Nintendo Gamecube, and Microsoft Xbox. Sony Playstation 2 The Playstation 2 was introduced in 2000 as the successor to the well-received and popular Playstation. Under the hood of the PS2 is a 300 MHz Toshiba Emotion Engine and 150 MHz Sony Graphics Synthesizer. The Emotion Engine tackles the roles of both a general purpose CPU and vector processor/vertex shader for the GPU. The main CPU is a MIPS R5900 core -- an in-order 2-issue superscalar processor implementing the MIPS IV instruction set.
    [Show full text]
  • IBM Research Report: Power Measurement of the Apple Power
    RC23276 (W0407-046) July 20, 2004 Computer Science IBM Research Report Power Measurement on the Apple Power Mac G5 Wes Felter, Tom Keller IBM Research Division Austin Research Laboratory 11501 Burnet Road Austin, TX 78758 Research Division Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. I thas been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g ,. payment of royalties). Copies may be requested from IBM T. J. Watson Research Center , P. O. Box 218, Yorktown Heights, NY 10598 USA (email: [email protected]). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home . Power Measurement on the Apple Power Mac G51 Wes Felter ([email protected]) , Tom Keller ([email protected]) IBM Austin Research Lab 1 Introduction This report describes the instrumentation developed for the dynamic measurement of power and temperature for the Apple Power Mac G5. The G5 employs one or two 64-bit PowerPC 970 processors and runs Mac OS X, and 32-bit or 64-bit Linux. Apple has implemented a set of temperature, power and fan RPM sensors. OS X uses the sensors for fan speed control, in order to minimize the acoustic disturbance of a deskside system.
    [Show full text]
  • SQA Advanced Higher Computing Unit 3C: Computer Architecture
    SCHOLAR Study Guide SQA Advanced Higher Computing Unit 3c: Computer Architecture David Bethune Heriot-Watt University Andy Cochrane Heriot-Watt University Ian King Heriot-Watt University Interactive University Edinburgh EH12 9QQ, United Kingdom. First published 2005 by Heriot-Watt University This edition published in 2006 by Interactive University Copyright ­c 2006 Heriot-Watt University Members of the SCHOLAR Forum may reproduce this publication in whole or in part for educational purposes within their establishment providing that no profit accrues at any stage, Any other use of the materials is governed by the general copyright statement that follows. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, without written permission from the publisher. Heriot-Watt University accepts no responsibility or liability whatsoever with regard to the information contained in this study guide. SCHOLAR is a programme of Heriot-Watt University and is published and distributed on its behalf by Interactive University. British Library Cataloguing in Publication Data Interactive University SCHOLAR Study Guide Unit 3: Computing 1. Computing I. Computer Architecture ISBN 1 904647 89 8 Typeset by: Interactive University, Wallace House, 1 Lochside Avenue, Edinburgh, EH12 9QQ. Printed and bound in Great Britain by Graphic and Printing Services, Heriot-Watt University, Edinburgh. Part Number 2006-1385 Acknowledgements Thanks are due to the members of Heriot-Watt University’s
    [Show full text]
  • Performance of Various Computers Using Standard Linear Equations Software
    ———————— CS - 89 - 85 ———————— Performance of Various Computers Using Standard Linear Equations Software Jack J. Dongarra* Electrical Engineering and Computer Science Department University of Tennessee Knoxville, TN 37996-1301 Computer Science and Mathematics Division Oak Ridge National Laboratory Oak Ridge, TN 37831 University of Manchester CS - 89 - 85 June 15, 2014 * Electronic mail address: [email protected]. An up-to-date version of this report can be found at http://www.netlib.org/benchmark/performance.ps This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC05-96OR22464, and in part by the Science Alliance a state supported program at the University of Tennessee. 6/15/2014 2 Performance of Various Computers Using Standard Linear Equations Software Jack J. Dongarra Electrical Engineering and Computer Science Department University of Tennessee Knoxville, TN 37996-1301 Computer Science and Mathematics Division Oak Ridge National Laboratory Oak Ridge, TN 37831 University of Manchester June 15, 2014 Abstract This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from the Earth Simulator to personal computers. 1. Introduction and Objectives The timing information presented here should in no way be used to judge the overall performance of a computer system. The results reflect only one problem area: solving dense systems of equations. This report provides performance information on a wide assortment of computers ranging from the home-used PC up to the most powerful supercomputers. The information has been collected over a period of time and will undergo change as new machines are added and as hardware and software systems improve.
    [Show full text]
  • The Playstation 3 for High- Performance Scientific Computing
    T ec HNOLOG ies Editor: Michael A. Gray, [email protected] THE PLAYSTATION 3 FOR HIGH- PERFORMANCE SCIENTIFIC COMPUTING By Jakub Kurzak, Alfredo Buttari, Piotr Luszczek, and Jack Dongarra Is real-world gaming technology the next big thing in the more academically based high-performance computing arena? The authors put PlayStation 3 to the test. he heart of the Sony PlaySta- SMT feature, which comes at a small chip. Figure 1 shows a schematic of tion 3—the Cell processor— 5 percent increase in the hardware’s the Cell processor’s design. T wasn’t originally intended for cost, can deliver up to 30 percent in- All the Cell processor’s compo- scientific number crunching, just as crease in performance. The PPU also nents, including the PPE, the SPEs, the PlayStation 3 itself wasn’t meant includes a short-vector single instruc- the main memory, and the I/O sys- primarily to serve such purposes. Yet, tion, multiple data (SIMD) engine tem, are connected via the element both these items could impact the called VMX, which is an incarnation interconnection bus, which has four high-performance computing world of the PowerPC’s AltiVec. unidirectional rings (two in each in significant ways. This introductory However, the Cell processor’s real direction) and a token-based arbi- article takes a closer look at their po- power lies in the eight Synergistic tration mechanism that plays the tential to do so; an extended version of Processing Elements (SPEs) that ac- role of traffic light. Each partici- it is published as a University of Ten- company the PPE.
    [Show full text]
  • Die Geschichte Der Digitalen Evolution Bezugsquelle
    Die Geschichte der digitalen Evolution Bezugsquelle: www.computerposter.ch 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 und ... Phasen Online-Zeitalter Internet-Hype Wireless-Zeitalter Web 2.0/Start Cloud Computing Start des Tablet-Zeitalters Cognitive Computing und Internet der Dinge (IoT) Zukunftsvisionen Jobs mel- All-in-One- NAS-Konzept OLPC-Projekt: A. Bowyer Cloud Wichtig Dass Computer und Bausteine immer kleiner, det sich Konzepte Start der entwickelt Computing für die AI- schneller, billiger und energieoptimierter werden, Hardware mit dem werden Massenpro- den ersten Akzeptanz: ist bekannt. Bei diesen Visionen geht es um iMac und inter- duktion des Open Source Unterstüt- mögliche zukünftige Anwendungen, welche sich mit neuem essant: XO-1-Laptops: 3D-Drucker zung und mit neuen Technologien und Konzepten realisie- Veriton RepRap nicht Ersatz ren lassen. Diese basieren auf Resultaten aus Logo Millennium Bug (Datumfehler): Ver- Haupteinsatz: Apple Watch: Jetzt kaufbar (April). FP2 (Acer), (Replicating von Spezia- Forschung und Entwicklung, welche man in den zurück. Uruguay, Peru, Sensoren: Herzfrequenz, Lage, IBM lanciert die Aptiva-Linie für PC im hinderung des IT-Horrorszenario Rapid-Pro- AI (Artificial Intelligence) wird immer listen. weltweiten Labors erarbeitet. iMac wird verschlingt 800 bis 1’000 Mia $. Mexico, Ruan- Beschleunigung. Wi-Fi, Bluetooth - Systeme den Heimmarkt. Compaq beherrscht als Markt- Bildschirm totyper) als wichtiger: Computer-Magazine 1. kommerzieller Einsatz von Watson Cognitive Computing als Ergänzung IBM ThinkPad TransNote: 27.1.2010: Steve Jobs präsentiert Das IBM-Programm Watson 4.0, NFC, S1-CPU, 10’000 Apps. leader das PC-Business.
    [Show full text]