Introduction: Why High Performance Computing?

Chapter 1: Introduction. Why High Performance Computing?
Barry Linnert, [email protected], Cluster Computing SoSe 2017

Quote: "It is hard to understand an ocean because it is too big. It is hard to understand a molecule because it is too small. It is hard to understand nuclear physics because it is too fast. It is hard to understand the greenhouse effect because it is too slow. Supercomputers break these barriers to understanding. They, in effect, shrink oceans, zoom in on molecules, slow down physics, and fast-forward climates. Clearly a scientist who can see natural phenomena at the right size and the right speed learns more than one who is faced with a blur." (Al Gore, 1990)

Why High Performance Computing?
• Grand challenges (basic research)
  • Decoding of the human genome
  • Cosmogenesis
  • Global climate change
  • Biological macromolecules
• Product development
  • Fluid mechanics
  • Crash tests
  • Material minimization
  • Drug design
  • Chip design
• IT infrastructure
  • Search engines
  • Data mining
  • Virtual reality
  • Rendering
  • Vision

Examples of HPC applications: Finite element method, crash analysis
Asymmetric frontal impact of a car against a rigid obstacle.

Examples of HPC applications: Aerodynamics

Examples of HPC applications: Molecular dynamics
Simulation of a noble gas (argon) with a partitioning for 8 processors: 2048 molecules in a cube of 16 nm edge length, simulated in time steps of 2 x 10^-14 s.

Examples of HPC applications: Nuclear energy
• 1946 Baker test
• 23,000 tons of TNT
• Large numbers of nuclear weapons produced during the Cold War
• Status today: largely unknown
Source: Wikipedia

Examples of HPC applications: Visualization
Rendering (interior architecture); rendering (movie: Titanic)
Photorealistic presentation using sophisticated illumination models.

Examples of HPC applications: Analysis of complex materials
C60 trimer, Si1600, MoS2, 4H-SiC
Source: Th. Frauenheim, Paderborn

Grosch's Law
(Figure: performance plotted against price.)
"The performance of a computer increases (roughly) quadratically with the price."
Consequence: it is better to buy one computer that is twice as fast than to buy two slower computers.
(The law was valid in the sixties and seventies over a wide range of general-purpose computers.)

Eighties: availability of powerful microprocessors
• High integration density (VLSI)
• Single-chip processors
• Automated production process
• Automated development process
• High-volume production
Consequence: Grosch's Law no longer holds: 1000 cheap microprocessors deliver (theoretically) more performance in MFLOPS than an expensive supercomputer (e.g. a Cray); see the sketch below.
Idea: to achieve high performance at low cost, use many microprocessors together: parallel processing.
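The economics behind this shift can be checked with a toy calculation. The sketch below is only an illustration: the scaling constant, the budget units and the MFLOPS figures are assumed numbers chosen to make the arithmetic visible, not data from the lecture.

```c
#include <stdio.h>

/* Grosch's Law: performance grows (roughly) with the square of the price,
 * perf(price) = k * price^2 for some constant k.                          */
static double grosch_perf(double price, double k) {
    return k * price * price;
}

int main(void) {
    const double k = 1.0;        /* arbitrary scaling constant (assumed)    */
    const double budget = 2.0;   /* two "units" of money to spend (assumed) */

    /* While the law held: one machine at twice the price beats two cheap ones. */
    printf("Grosch era: one fast machine = %.1f, two slow machines = %.1f\n",
           grosch_perf(budget, k), 2.0 * grosch_perf(budget / 2.0, k));

    /* After the law broke down (eighties): the aggregate MFLOPS of many cheap
     * microprocessors overtakes a single vector supercomputer.
     * Both MFLOPS figures below are illustrative assumptions.                 */
    const double micro_mflops = 1.0;    /* assumed MFLOPS per microprocessor   */
    const double super_mflops = 500.0;  /* assumed MFLOPS for the supercomputer */
    printf("1000 microprocessors: %.0f MFLOPS vs. one supercomputer: %.0f MFLOPS\n",
           1000.0 * micro_mflops, super_mflops);
    return 0;
}
```

Under the quadratic law the single expensive machine delivers 4 performance units against 2 for the pair, so buying one big machine is rational; once microprocessor price/performance became linear or better, the aggregate of many cheap parts wins. That is the economic argument behind parallel processing and, later, clusters.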
Eighties (continued)
• Widespread use of workstations and PCs: terminals are replaced by PCs; workstations achieve the (computational) performance of mainframes at a fraction of the price.
• Availability of local area networks (LANs, e.g. Ethernet): a larger number of autonomous computers can be connected over a low-cost medium, giving access to data on other computers and to the programs and other resources of remote machines.
• A network of workstations as a parallel computer: unused computational capacity of other machines can be exploited for compute-intensive calculations (idle-time computing).

Nineties
• Parallel computers are built from a large number of microprocessors (massively parallel processors, MPP), e.g. Transputer systems, Connection Machine CM-5, Intel Paragon, Cray T3E.
• Alternative architectures are built (Connection Machine CM-2, MasPar).
• The trend towards cost-efficient standard components ("commercial off-the-shelf", COTS) leads to the coupling of standard PCs into a "cluster" (Beowulf, 1995).
• Unused compute power is exploited for HPC ("idle-time computing").
• Program libraries such as the Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI) allow the development of portable parallel programs (see the example after this list).
• Linux becomes the prevailing operating system for HPC.
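The portability that PVM and MPI brought is easiest to see in code. The following is a minimal message-passing sketch in C (not taken from the lecture slides); it uses only standard MPI calls, and each process contributes a partial value that rank 0 collects with point-to-point receives.

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal message-passing program: every process computes a partial value
 * and rank 0 collects the results with plain point-to-point receives.     */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double partial = (double)rank * rank;   /* stand-in for real work */

    if (rank == 0) {
        double sum = partial;
        for (int src = 1; src < size; src++) {
            double incoming;
            MPI_Recv(&incoming, 1, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += incoming;
        }
        printf("sum over %d processes = %.1f\n", size, sum);
    } else {
        MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and started with, for example, mpirun -np 8 ./sum, the identical source runs on a network of workstations, a Beowulf cluster of PCs, or an MPP system, which is exactly the portability argument made above.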
Top 10 of TOP500 List (06/2001)

Top 10 of TOP500 List (06/2002)

Earth Simulator (rank 1, 2002-2004)
• 640 nodes
• 8 vector processors (8 GFLOPS each) and 16 GB of memory per node
• 5120 CPUs in total
• 10 TB main memory
• 40 TFLOPS peak performance (recomputed in the sketch at the end of this chapter)
• Physical dimensions: 65 m x 50 m
Source: H. Simon, NERSC

Top 10 of TOP500 List (Nov. 2004); Rmax and Rpeak in GFLOPS:
1. IBM/DOE, United States, 2004: BlueGene/L DD2 beta-System (0.7 GHz PowerPC 440), 32768 processors, Rmax 70720, Rpeak 91750, IBM
2. NASA/Ames Research Center/NAS, United States, 2004: Columbia, SGI Altix 1.5 GHz, Voltaire Infiniband, 10160 processors, Rmax 51870, Rpeak 60960, SGI
3. The Earth Simulator Center, Japan, 2002: Earth-Simulator, 5120 processors, Rmax 35860, Rpeak 40960, NEC
4. Barcelona Supercomputer Center, Spain, 2004: MareNostrum, eServer BladeCenter JS20 (PowerPC 970 2.2 GHz), Myrinet, 3564 processors, Rmax 20530, Rpeak 31363, IBM
5. Lawrence Livermore National Laboratory, United States, 2004: Thunder, Intel Itanium2 Tiger4 1.4 GHz, Quadrics, 4096 processors, Rmax 19940, Rpeak 22938, California Digital Corporation
6. Los Alamos National Laboratory, United States, 2002: ASCI Q, AlphaServer SC45 1.25 GHz, 8192 processors, Rmax 13880, Rpeak 20480, HP
7. Virginia Tech, United States, 2004: System X, 1100 dual 2.3 GHz Apple XServe, Mellanox Infiniband 4X / Cisco GigE, 2200 processors, Rmax 12250, Rpeak 20240, self-made
8. IBM Rochester, United States, 2004: BlueGene/L DD1 prototype (0.5 GHz PowerPC 440 w/custom), 8192 processors, Rmax 11680, Rpeak 16384, IBM/LLNL
9. Naval Oceanographic Office, United States, 2004: eServer pSeries 655 (1.7 GHz Power4+), 2944 processors, Rmax 10310, Rpeak 20019.2, IBM
10. NCSA, United States, 2003: Tungsten, PowerEdge 1750, P4 Xeon 3.06 GHz, Myrinet, 2500 processors, Rmax 9819, Rpeak 15300, Dell

Top 10 of TOP500 List (Nov. 2005); Rmax in GFLOPS:
1. DOE/NNSA/LLNL, United States: BlueGene/L, eServer Blue Gene Solution, 131072 processors, 2005, Rmax 280600, IBM
2. IBM Thomas J. Watson Research Center, United States: BGW, eServer Blue Gene Solution, 40960 processors, 2005, Rmax 91290, IBM
3. DOE/NNSA/LLNL, United States: ASC Purple, eServer pSeries p5 575 1.9 GHz, 10240 processors, 2005, Rmax 63390, IBM
4. NASA/Ames Research Center/NAS, United States: Columbia, SGI Altix 1.5 GHz, Voltaire Infiniband, 10160 processors, 2004, Rmax 51870, SGI
5. Sandia National Laboratories, United States: Thunderbird, PowerEdge 1850 3.6 GHz, Infiniband, 8000 processors, 2005, Rmax 38270, Dell
6. Sandia National Laboratories, United States: Red Storm, Cray XT3 2.0 GHz, 10880 processors, 2005, Rmax 36190, Cray Inc.
7. The Earth Simulator Center, Japan: Earth-Simulator, 5120 processors, 2002, Rmax 35860, NEC
8. Barcelona Supercomputer Center, Spain: MareNostrum, JS20 Cluster, PPC 970 2.2 GHz, Myrinet, 4800 processors, 2005, Rmax 27910, IBM
9. ASTRON/University Groningen, Netherlands: Stella, eServer Blue Gene Solution, 12288 processors, 2005, Rmax 27450, IBM
10. Oak Ridge National Laboratory, United States: Jaguar, Cray XT3 2.4 GHz, 5200 processors, 2005, Rmax 20527, Cray Inc.

IBM Blue Gene

Top 10 of TOP500 List (Nov. 2006)

Top 500, Nov. 2008 (ranks 1-11); columns: computer, country, year, cores, Rmax in GFLOPS, processor family, system family, interconnect:
1. BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband (IBM); United States; 2008; 129600 cores; Rmax 1105000; Power; Cluster; Infiniband
2. Cray XT5 QC 2.3 GHz; United States; 2008; 150152 cores; Rmax 1059000; AMD x86_64; Cray XT; proprietary
3. SGI Altix ICE 8200EX, Xeon QC 3.0/2.66 GHz; United States; 2008; 51200 cores; Rmax 487005; Intel EM64T; SGI Altix; Infiniband
4. eServer Blue Gene Solution (IBM); United States; 2007; 212992 cores; Rmax 478200; Power; BlueGene; proprietary
5. Blue Gene/P Solution (IBM); United States; 2007; 163840 cores; Rmax 450300; Power; BlueGene; proprietary
6. SunBlade x6420, Opteron QC 2.3 GHz, Infiniband; United States; 2008; 62976 cores; Rmax 433200; AMD x86_64; Sun Blade System; Infiniband
7. Cray XT4 QuadCore 2.3 GHz; United States; 2008; 38642 cores; Rmax 266300; AMD x86_64; Cray XT; proprietary
8. Cray XT4 QuadCore 2.1 GHz; United States; 2008; 30976 cores; Rmax 205000; AMD x86_64; Cray XT; proprietary
9. Sandia/Cray Red Storm, XT3/4, 2.4/2.2 GHz dual/quad core; United States; 2008; 38208 cores; Rmax 204200; AMD x86_64; Cray XT; Cray interconnect
10. Dawning 5000A, QC Opteron 1.9 GHz, Infiniband, Windows HPC 2008 (Dawning); China; 2008; 30720 cores; Rmax 180600; AMD x86_64; Cluster; Infiniband
11. Blue Gene/P Solution (IBM); Germany; 2007; 65536 cores; Rmax 180000; Power; BlueGene; proprietary
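To close the chapter, the numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the Earth Simulator's peak performance from its node configuration and the Rmax/Rpeak ratio for two entries of the Nov. 2004 list; the helper function is introduced here only for illustration, and all figures come from the slides.

```c
#include <stdio.h>

/* Peak performance = nodes x processors per node x GFLOPS per processor. */
static double peak_gflops(int nodes, int procs_per_node, double gflops_per_proc) {
    return (double)nodes * procs_per_node * gflops_per_proc;
}

int main(void) {
    /* Earth Simulator: 640 nodes, 8 vector processors of 8 GFLOPS each. */
    double es_peak = peak_gflops(640, 8, 8.0);
    printf("Earth Simulator peak: %.0f GFLOPS (~%.1f TFLOPS)\n",
           es_peak, es_peak / 1000.0);

    /* Efficiency = Rmax / Rpeak, values in GFLOPS from the Nov. 2004 list. */
    printf("BlueGene/L beta system: %.0f%% of peak\n", 100.0 * 70720.0 / 91750.0);
    printf("Earth Simulator:        %.0f%% of peak\n", 100.0 * 35860.0 / 40960.0);
    return 0;
}
```

Rmax is the measured LINPACK performance and Rpeak the theoretical peak: by these figures the vector-based Earth Simulator sustains roughly 88% of its peak, while the BlueGene/L beta system sustains about 77%.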