Introduction: Why High Performance Computing?

Chapter 1: Introduction. Why High Performance Computing?
Barry Linnert, [email protected], Cluster Computing SoSe 2017

Quote: "It is hard to understand an ocean because it is too big. It is hard to understand a molecule because it is too small. It is hard to understand nuclear physics because it is too fast. It is hard to understand the greenhouse effect because it is too slow. Supercomputers break these barriers to understanding. They, in effect, shrink oceans, zoom in on molecules, slow down physics, and fast-forward climates. Clearly a scientist who can see natural phenomena at the right size and the right speed learns more than one who is faced with a blur." (Al Gore, 1990)

Why High Performance Computing?
• Grand challenges (basic research)
  • Decoding of the human genome
  • Cosmogenesis
  • Global climate change
  • Biological macromolecules
• Product development
  • Fluid mechanics
  • Crash tests
  • Material minimization
  • Drug design
  • Chip design
• IT infrastructure
  • Search engines
  • Data mining
  • Virtual reality
  • Rendering
  • Vision

Examples of HPC applications: Finite element method, crash analysis
Asymmetric frontal impact of a car against a rigid obstacle.

Examples of HPC applications: Aerodynamics

Examples of HPC applications: Molecular dynamics
Simulation of a noble gas (argon) with a partitioning for 8 processors: 2048 molecules in a cube of 16 nm edge length, simulated in time steps of 2 x 10^-14 s.

Examples of HPC applications: Nuclear energy
• 1946 Baker test
• 23,000 tons of TNT
• Large numbers of nuclear weapons produced during the Cold War
• Status today: largely unknown
Source: Wikipedia

Examples of HPC applications: Visualization
Rendering (interior architecture); rendering (movie: Titanic)
Photorealistic presentation using sophisticated illumination models.

Examples of HPC applications: Analysis of complex materials
C60 trimer, Si1600, MoS2, 4H-SiC
Source: Th. Frauenheim, Paderborn

Grosch's Law
(Figure: performance plotted against price.)
"The performance of a computer increases (roughly) quadratically with the price."
Consequence: it is better to buy one computer that is twice as fast than to buy two slower computers.
(The law was valid in the sixties and seventies over a wide range of general-purpose computers.)

Eighties: availability of powerful microprocessors
• High integration density (VLSI)
• Single-chip processors
• Automated production process
• Automated development process
• High-volume production
Consequence: Grosch's Law no longer holds: 1000 cheap microprocessors deliver (theoretically) more performance in MFLOPS than an expensive supercomputer (e.g. a Cray); see the sketch below.
Idea: to achieve high performance at low cost, use many microprocessors together: parallel processing.
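The economics behind this shift can be checked with a toy calculation. The sketch below is only an illustration: the scaling constant, the budget units and the MFLOPS figures are assumed numbers chosen to make the arithmetic visible, not data from the lecture.

```c
#include <stdio.h>

/* Grosch's Law: performance grows (roughly) with the square of the price,
 * perf(price) = k * price^2 for some constant k.                          */
static double grosch_perf(double price, double k) {
    return k * price * price;
}

int main(void) {
    const double k = 1.0;        /* arbitrary scaling constant (assumed)    */
    const double budget = 2.0;   /* two "units" of money to spend (assumed) */

    /* While the law held: one machine at twice the price beats two cheap ones. */
    printf("Grosch era: one fast machine = %.1f, two slow machines = %.1f\n",
           grosch_perf(budget, k), 2.0 * grosch_perf(budget / 2.0, k));

    /* After the law broke down (eighties): the aggregate MFLOPS of many cheap
     * microprocessors overtakes a single vector supercomputer.
     * Both MFLOPS figures below are illustrative assumptions.                 */
    const double micro_mflops = 1.0;    /* assumed MFLOPS per microprocessor   */
    const double super_mflops = 500.0;  /* assumed MFLOPS for the supercomputer */
    printf("1000 microprocessors: %.0f MFLOPS vs. one supercomputer: %.0f MFLOPS\n",
           1000.0 * micro_mflops, super_mflops);
    return 0;
}
```

Under the quadratic law the single expensive machine delivers 4 performance units against 2 for the pair, so buying one big machine is rational; once microprocessor price/performance became linear or better, the aggregate of many cheap parts wins. That is the economic argument behind parallel processing and, later, clusters.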
Eighties (continued)
• Widespread use of workstations and PCs: terminals are replaced by PCs; workstations achieve the (computational) performance of mainframes at a fraction of the price.
• Availability of local area networks (LANs, e.g. Ethernet): a larger number of autonomous computers can be connected over a low-cost medium, giving access to data on other computers and to the programs and other resources of remote machines.
• A network of workstations as a parallel computer: unused computational capacity of other machines can be exploited for compute-intensive calculations (idle-time computing).

Nineties
• Parallel computers are built from a large number of microprocessors (massively parallel processors, MPP), e.g. Transputer systems, Connection Machine CM-5, Intel Paragon, Cray T3E.
• Alternative architectures are built (Connection Machine CM-2, MasPar).
• The trend towards cost-efficient standard components ("commercial off-the-shelf", COTS) leads to the coupling of standard PCs into a "cluster" (Beowulf, 1995).
• Unused compute power is exploited for HPC ("idle-time computing").
• Program libraries such as the Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI) allow the development of portable parallel programs (see the example after this list).
• Linux becomes the prevailing operating system for HPC.
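The portability that PVM and MPI brought is easiest to see in code. The following is a minimal message-passing sketch in C (not taken from the lecture slides); it uses only standard MPI calls, and each process contributes a partial value that rank 0 collects with point-to-point receives.

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal message-passing program: every process computes a partial value
 * and rank 0 collects the results with plain point-to-point receives.     */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double partial = (double)rank * rank;   /* stand-in for real work */

    if (rank == 0) {
        double sum = partial;
        for (int src = 1; src < size; src++) {
            double incoming;
            MPI_Recv(&incoming, 1, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += incoming;
        }
        printf("sum over %d processes = %.1f\n", size, sum);
    } else {
        MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and started with, for example, mpirun -np 8 ./sum, the identical source runs on a network of workstations, a Beowulf cluster of PCs, or an MPP system, which is exactly the portability argument made above.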
Top 10 of TOP500 List (06/2001)

Top 10 of TOP500 List (06/2002)

Earth Simulator (rank 1, 2002-2004)
• 640 nodes
• 8 vector processors (8 GFLOPS each) and 16 GB of memory per node
• 5120 CPUs in total
• 10 TB main memory
• 40 TFLOPS peak performance (recomputed in the sketch at the end of this chapter)
• Physical dimensions: 65 m x 50 m
Source: H. Simon, NERSC

Top 10 of TOP500 List (Nov. 2004); Rmax and Rpeak in GFLOPS:
1. IBM/DOE, United States, 2004: BlueGene/L DD2 beta-System (0.7 GHz PowerPC 440), 32768 processors, Rmax 70720, Rpeak 91750, IBM
2. NASA/Ames Research Center/NAS, United States, 2004: Columbia, SGI Altix 1.5 GHz, Voltaire Infiniband, 10160 processors, Rmax 51870, Rpeak 60960, SGI
3. The Earth Simulator Center, Japan, 2002: Earth-Simulator, 5120 processors, Rmax 35860, Rpeak 40960, NEC
4. Barcelona Supercomputer Center, Spain, 2004: MareNostrum, eServer BladeCenter JS20 (PowerPC 970 2.2 GHz), Myrinet, 3564 processors, Rmax 20530, Rpeak 31363, IBM
5. Lawrence Livermore National Laboratory, United States, 2004: Thunder, Intel Itanium2 Tiger4 1.4 GHz, Quadrics, 4096 processors, Rmax 19940, Rpeak 22938, California Digital Corporation
6. Los Alamos National Laboratory, United States, 2002: ASCI Q, AlphaServer SC45 1.25 GHz, 8192 processors, Rmax 13880, Rpeak 20480, HP
7. Virginia Tech, United States, 2004: System X, 1100 dual 2.3 GHz Apple XServe, Mellanox Infiniband 4X / Cisco GigE, 2200 processors, Rmax 12250, Rpeak 20240, self-made
8. IBM Rochester, United States, 2004: BlueGene/L DD1 prototype (0.5 GHz PowerPC 440 w/custom), 8192 processors, Rmax 11680, Rpeak 16384, IBM/LLNL
9. Naval Oceanographic Office, United States, 2004: eServer pSeries 655 (1.7 GHz Power4+), 2944 processors, Rmax 10310, Rpeak 20019.2, IBM
10. NCSA, United States, 2003: Tungsten, PowerEdge 1750, P4 Xeon 3.06 GHz, Myrinet, 2500 processors, Rmax 9819, Rpeak 15300, Dell

Top 10 of TOP500 List (Nov. 2005); Rmax in GFLOPS:
1. DOE/NNSA/LLNL, United States: BlueGene/L, eServer Blue Gene Solution, 131072 processors, 2005, Rmax 280600, IBM
2. IBM Thomas J. Watson Research Center, United States: BGW, eServer Blue Gene Solution, 40960 processors, 2005, Rmax 91290, IBM
3. DOE/NNSA/LLNL, United States: ASC Purple, eServer pSeries p5 575 1.9 GHz, 10240 processors, 2005, Rmax 63390, IBM
4. NASA/Ames Research Center/NAS, United States: Columbia, SGI Altix 1.5 GHz, Voltaire Infiniband, 10160 processors, 2004, Rmax 51870, SGI
5. Sandia National Laboratories, United States: Thunderbird, PowerEdge 1850 3.6 GHz, Infiniband, 8000 processors, 2005, Rmax 38270, Dell
6. Sandia National Laboratories, United States: Red Storm, Cray XT3 2.0 GHz, 10880 processors, 2005, Rmax 36190, Cray Inc.
7. The Earth Simulator Center, Japan: Earth-Simulator, 5120 processors, 2002, Rmax 35860, NEC
8. Barcelona Supercomputer Center, Spain: MareNostrum, JS20 Cluster, PPC 970 2.2 GHz, Myrinet, 4800 processors, 2005, Rmax 27910, IBM
9. ASTRON/University Groningen, Netherlands: Stella, eServer Blue Gene Solution, 12288 processors, 2005, Rmax 27450, IBM
10. Oak Ridge National Laboratory, United States: Jaguar, Cray XT3 2.4 GHz, 5200 processors, 2005, Rmax 20527, Cray Inc.

IBM Blue Gene

Top 10 of TOP500 List (Nov. 2006)

Top 500, Nov. 2008 (ranks 1-11); columns: computer, country, year, cores, Rmax in GFLOPS, processor family, system family, interconnect:
1. BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband (IBM); United States; 2008; 129600 cores; Rmax 1105000; Power; Cluster; Infiniband
2. Cray XT5 QC 2.3 GHz; United States; 2008; 150152 cores; Rmax 1059000; AMD x86_64; Cray XT; proprietary
3. SGI Altix ICE 8200EX, Xeon QC 3.0/2.66 GHz; United States; 2008; 51200 cores; Rmax 487005; Intel EM64T; SGI Altix; Infiniband
4. eServer Blue Gene Solution (IBM); United States; 2007; 212992 cores; Rmax 478200; Power; BlueGene; proprietary
5. Blue Gene/P Solution (IBM); United States; 2007; 163840 cores; Rmax 450300; Power; BlueGene; proprietary
6. SunBlade x6420, Opteron QC 2.3 GHz, Infiniband; United States; 2008; 62976 cores; Rmax 433200; AMD x86_64; Sun Blade System; Infiniband
7. Cray XT4 QuadCore 2.3 GHz; United States; 2008; 38642 cores; Rmax 266300; AMD x86_64; Cray XT; proprietary
8. Cray XT4 QuadCore 2.1 GHz; United States; 2008; 30976 cores; Rmax 205000; AMD x86_64; Cray XT; proprietary
9. Sandia/Cray Red Storm, XT3/4, 2.4/2.2 GHz dual/quad core; United States; 2008; 38208 cores; Rmax 204200; AMD x86_64; Cray XT; Cray interconnect
10. Dawning 5000A, QC Opteron 1.9 GHz, Infiniband, Windows HPC 2008 (Dawning); China; 2008; 30720 cores; Rmax 180600; AMD x86_64; Cluster; Infiniband
11. Blue Gene/P Solution (IBM); Germany; 2007; 65536 cores; Rmax 180000; Power; BlueGene; proprietary
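To close the chapter, the numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the Earth Simulator's peak performance from its node configuration and the Rmax/Rpeak ratio for two entries of the Nov. 2004 list; the helper function is introduced here only for illustration, and all figures come from the slides.

```c
#include <stdio.h>

/* Peak performance = nodes x processors per node x GFLOPS per processor. */
static double peak_gflops(int nodes, int procs_per_node, double gflops_per_proc) {
    return (double)nodes * procs_per_node * gflops_per_proc;
}

int main(void) {
    /* Earth Simulator: 640 nodes, 8 vector processors of 8 GFLOPS each. */
    double es_peak = peak_gflops(640, 8, 8.0);
    printf("Earth Simulator peak: %.0f GFLOPS (~%.1f TFLOPS)\n",
           es_peak, es_peak / 1000.0);

    /* Efficiency = Rmax / Rpeak, values in GFLOPS from the Nov. 2004 list. */
    printf("BlueGene/L beta system: %.0f%% of peak\n", 100.0 * 70720.0 / 91750.0);
    printf("Earth Simulator:        %.0f%% of peak\n", 100.0 * 35860.0 / 40960.0);
    return 0;
}
```

Rmax is the measured LINPACK performance and Rpeak the theoretical peak: by these figures the vector-based Earth Simulator sustains roughly 88% of its peak, while the BlueGene/L beta system sustains about 77%.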