Introduction to High Performance Computing

- Why HPC
- Basic concepts
- How to program
- Technological trends

Why HPC?
- Many problems require more resources than available on a single computer
- "Grand Challenge" problems (en.wikipedia.org/wiki/Grand_Challenge) requiring PetaFLOPS and PetaBytes of computing resources
- Web search engines/databases processing millions of transactions per second

Uses of HPC
- Historically "the high end of computing"
  - Chemistry, Molecular Sciences
  - Atmosphere, Earth, Environment
  - Geology, Seismology
  - Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
  - Mechanical Engineering - from prosthetics to spacecraft
  - Electrical Engineering, Circuit Design, Microelectronics
  - Bioscience, Biotechnology, Genetics
  - Computer Science, Mathematics

Uses of HPC

- Today, commercial applications provide an equal or greater driving force; they require processing of large amounts of data in sophisticated ways
  - Databases, data mining
  - Oil exploration
  - Web search engines, web based business services
  - Medical imaging and diagnosis
  - Pharmaceutical design
  - Financial and economic modeling
  - Management of national and multi-national corporations
  - Advanced graphics and virtual reality, particularly in the entertainment industry
  - Networked video and multi-media technologies
  - Collaborative work environments

Example: Weather Prediction

[Figure: global simulation at 10 km versus 1 km resolution.] Target for addressing key science challenges in weather & climate prediction: global 1-km Earth system simulations at a rate of ~1 year/day.
(Peter Bauer & Erwin Laure, ETP4HPC SRA-3 kick-off meeting, IBM IOT Munich, March 20th 2017)

Example: NOMAD Science and Data Handling Challenges
- Data is the raw material of the 21st century
- The NOMAD Archive: NOMAD supports all important codes in computational materials science. The code-independent Archive contains data from many million calculations (billions of CPU hours).
- The NOMAD challenge: build a map and fill the existing white spots
- Discovering interpretable patterns and correlations in this data will
  • create knowledge,
  • advance materials science,
  • identify new scientific phenomena, and
  • support industrial applications.
[Figure: number of geometries per composition versus number of compositions in the Archive, and a schematic map (Descriptor A vs. Descriptor B) locating photovoltaics, thermal-barrier materials, transparent metals and superconductors.]

The Airbus Challenge (David Hills, 2008) - ExaFLOW

- An Airbus 310 cruising at 250 m/s at 10,000 m
- On a Teraflops machine (10^12 FLOPS): 8·10^5 years
- Result in one week would require a 4·10^19 FLOPS machine (40 EFlops)
- (based on John Kim's estimate, TSFP-9, 2015)

Predicting interactomes by docking… a dream?

- ~20,000 human proteins
- Interactome prediction will require 20,000² docking runs
- This will require > 10 billion CPU hours and generate about 100 exabytes of data
- Interest in simulating/understanding the impact of disease-related mutations that affect/alter the interaction network

Molecular Dynamics on the exascale

- Understanding proteins and drugs
- A 1 μs simulation: 10 exaflop
- Many structural transitions: many simulations needed
- Study the effect of several bound drugs
- Study the effect of mutations
- All this multiplies to >> zettaflop
- Question: how far can we parallelize?

[Figure: example ion channel in a nerve cell, ~200,000 atoms. It opens and closes during signalling and is affected by e.g. alcohol and drugs.]

(bioexcel.eu)

FET: Human Brain Project

(F. Schürmann, H. Markram, Blue Brain Project, EPFL)

What is Parallel Computing?
- Traditionally, software has been written for serial computation:
  - To be run on a single computer having a single Central Processing Unit (CPU)
  - A problem is broken into a discrete series of instructions
  - Instructions are executed one after another
  - Only one instruction may execute at any moment in time

Parallel Computing
- In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
  - A problem is broken into discrete parts that can be solved concurrently
  - Each part is further broken down into a series of instructions

Parallelism on different levels
- CPU
  - Instruction level parallelism, pipelining
  - Vector unit
  - Multiple cores
    • Multiple threads or processes
- Computer
  - Multiple CPUs
  - Co-processors (GPUs, FPGAs, …)
- Network
  - Tightly integrated network of computers (supercomputer)
  - Loosely integrated network of computers (distributed computing)

Flynn's taxonomy (1966)
- {Single, Multiple} {Instructions, Data}

  SISD - Single Instruction, Single Data
  SIMD - Single Instruction, Multiple Data
  MISD - Multiple Instruction, Single Data
  MIMD - Multiple Instruction, Multiple Data

Single Instruction Single Data
- A serial (non-parallel) computer
- Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle
- Single data: only one data stream is being used as input during any one clock cycle
- Deterministic execution
- This is the oldest and used to be the most common type of computer (up to the arrival of multicore CPUs)
- Examples: older generation mainframes, minicomputers and workstations; older generation PCs
- Attention: single-core CPUs exploit instruction level parallelism (pipelining, multiple issue, speculative execution) but are still classified as SISD

Single Instruction Multiple Data
- "Vector" computer
- Single instruction: all processing units execute the same instruction at any given clock cycle
- Multiple data: each processing unit can operate on a different data element
- Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing
- Synchronous (lockstep) and deterministic execution
- Two varieties: processor arrays and vector pipelines
- Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units (see the sketch below)
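To make the data-parallel idea concrete, here is a small C loop (an illustrative sketch, not taken from the slides; the function and array names are made up) in which the same independent operation is applied to every element, so a vectorizing compiler can map it onto SIMD instructions.

```c
/* SIMD-friendly kernel: identical, independent work per element,
 * so several elements can be processed per vector instruction. */
#include <stddef.h>

void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];   /* same operation on every data element */
    }
}
```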

Multiple Instruction, Multiple Data
- Currently the most common type of parallel computer; most modern computers fall into this category
- Multiple instruction: every processor may be executing a different instruction stream
- Multiple data: every processor may be working with a different data stream
- Execution can be synchronous or asynchronous, deterministic or non-deterministic
- Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs
- Note: many MIMD architectures also include SIMD execution sub-components

Multiple Instruction, Single Data
- No examples exist today
- Potential uses might be:
  - Multiple cryptography algorithms attempting to crack a single coded message
  - Multiple frequency filters operating on a single signal

Single Program Multiple Data (SPMD)
- MIMDs are typically programmed following the SPMD model
- A single program is executed by all tasks simultaneously
- At any moment in time, tasks can be executing the same or different instructions within the same program. All tasks may use different data. (MIMD)
- SPMD programs usually have the necessary logic programmed into them to allow different tasks to branch or conditionally execute only those parts of the program they are designed to execute. That is, tasks do not necessarily have to execute the entire program - perhaps only a portion of it (see the sketch below).
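A minimal SPMD sketch in C with MPI (illustrative only, assuming an MPI installation): every process runs the same executable and uses its rank to branch into the part of the program it is meant to execute.

```c
/* SPMD: one program, many processes; behaviour selected by rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        printf("Rank 0 of %d: doing the coordination part\n", size);
    } else {
        printf("Rank %d of %d: doing a worker part\n", rank, size);
    }

    MPI_Finalize();
    return 0;
}
```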

Multiple Program Multiple Data (MPMD)
- MPMD applications typically have multiple executable object files (programs). While the application is being run in parallel, each task can be executing the same or a different program as other tasks.
- All tasks may use different data
- Workflow applications, multidisciplinary optimization, combination of different models

FLOPS
- FLoating Point Operations per Second
- Most commonly used performance indicator for parallel computers
- Typically measured using the Linpack benchmark
- Most useful for scientific applications
- Other benchmarks include SPEC, NAS, STREAM (memory)

  Name   FLOPS
  Yotta  10^24
  Zetta  10^21
  Exa    10^18
  Peta   10^15
  Tera   10^12
  Giga   10^9
  Mega   10^6

Moore's Law

- Gordon E. Moore, "Cramming more components onto integrated circuits", Electronics Magazine, 19 April 1965:

"The complexity for minimum component costs has increased at a rate of roughly a factor of two per year ... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer."

- With later alterations: transistor density doubles every 18 months
- So far this law holds
- It has also been interpreted as doubling performance every 18 months
  - A little inaccurate - see later

[Figure: performance development over time, with successive 4-year spans marked.]

Top500 Nr 1: "TaihuLight" (Sunway)
- National Supercomputing Center in Wuxi, China
- Sunway SW26010 processors: 260 cores each, 1.45 GHz
- 10,649,600 cores in total
- 93 PF Linpack (125.5 PF theoretical peak)
- 15 MW

Communication Architecture

A parallel computer is

“a collection of processing elements that communicate and cooperate to solve large problems fast”

(Almasi and Gottlieb 1989)

Communication Architecture
- Defines basic communication and synchronization operations
- Addresses the organizational structures that realize these operations
- Communication: exchange of data between processing units
- Synchronization: coordination of parallel activities

Synchronization: Dining Philosophers
- Algorithm:
  - Think
  - Take left fork
  - Take right fork
  - Eat
  - Release right fork
  - Release left fork
- Synchronization problems:
  - Deadlock:
    • All philosophers hold their left fork
  - Starvation:
    • One philosopher can never get hold of two forks
    • Only in the modified algorithm: release the fork if the second fork cannot be obtained

Common Synchronization Patterns
- Barrier: hold activities until all processes have reached the same point
- Semaphore: finite resources; two operations: P - wait for a free resource and lock it; V - release the resource
- Mutex: only one process can access a shared resource at a time (see the sketch below)
- Events: a process waits until notified by another process
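The following POSIX-threads sketch (illustrative, not from the slides; thread count and variable names are made up) shows two of these patterns: a mutex protecting a shared counter and a barrier that holds all threads until everyone has finished its update.

```c
/* Mutex and barrier with POSIX threads. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t barrier;
static int shared_counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);      /* mutex: only one thread updates at a time */
    shared_counter++;
    pthread_mutex_unlock(&lock);

    pthread_barrier_wait(&barrier); /* barrier: wait until all threads got here */
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %d\n", shared_counter);  /* prints 4 */
    pthread_barrier_destroy(&barrier);
    return 0;
}
```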

Typical Communication Architectures
- Shared Memory
- Distributed Memory

Shared Memory

Shared Memory Multiprocessor
- Hardware provides a single physical address space for all processors
- Global physical address space and symmetric access to all of main memory (symmetric multiprocessor - SMP)
- All processors and memory modules are attached to the same interconnect (bus or switched network)

Differences in Memory Access
- Uniform Memory Access (UMA): memory access takes about the same time independent of data location and requesting processor
- Non-uniform Memory Access (NUMA): memory access can differ depending on where the data is located and which processor requests the data

Cache coherence
- While main memory is shared, caches are local to individual processors
- Client B's cache might have old data, since updates in client A's cache are not yet propagated
- Different cache coherency protocols exist to avoid this problem
- Subject of subsequent lectures

Synchronization
- Access to shared data needs to be protected
  - Mutual exclusion (mutex)
  - Point-to-point events
  - Global event synchronization (barrier)

SMP Pros and Cons
- Advantages:
  - Global address space provides a user-friendly programming perspective to memory
  - Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs
- Disadvantages:
  - The primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase traffic associated with cache/memory management.
  - The programmer is responsible for synchronization constructs that ensure "correct" access to global memory.
  - Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever increasing numbers of processors.

Distributed Memory Multiprocessors

DMMPs
- Each processor has a private physical address space
- No cache coherence problem
- Hardware sends/receives messages between processors
  - Message passing (see the sketch below)
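A hedged sketch of message passing with MPI (illustrative; run with at least two ranks, e.g. mpirun -np 2): a blocking exchange with MPI_Send/MPI_Recv, followed by the non-blocking MPI_Isend/MPI_Irecv plus MPI_Wait variant discussed on the next slide.

```c
/* Blocking and non-blocking message passing between rank 0 and rank 1. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double x = 0.0, y = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        x = 3.14;
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);           /* blocking send */
    } else if (rank == 1) {
        MPI_Recv(&y, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                                  /* blocking receive */
    }

    /* Non-blocking: initiate the transfer, then wait for completion. */
    MPI_Request req;
    if (rank == 0) {
        MPI_Isend(&x, 1, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Irecv(&y, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```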

Synchronization
- Synchronization via exchange of messages
- Synchronous communication
  - Sender/receiver wait until data has been sent/received
- Asynchronous communication
  - Sender/receiver can proceed after sending/receiving has been initiated
  - Example: P1 does e = isend(x); wait(e) while P2 does e = irecv(y); wait(e), instead of blocking send(x)/recv(y)
- Higher-level concepts (barriers, semaphores, …) can be constructed using send/recv primitives
  - Message passing libraries typically provide them

DMMPs Pros and Cons
- Advantages:
  - Memory is scalable with the number of processors. Increase the number of processors and the size of memory increases proportionately.
  - Each processor can rapidly access its own memory without interference and without the overhead incurred with trying to maintain cache coherency.
  - Cost effectiveness: can use commodity, off-the-shelf processors and networking.
- Disadvantages:
  - The programmer is responsible for many of the details associated with data communication between processors.
  - It may be difficult to map existing data structures, based on global memory, to this memory organization.
  - Non-uniform memory access (NUMA) times
  - Administration and software overhead (essentially N systems vs. 1 SMP)

Hybrid Approaches

Combining SMPs and DMMPs
- Today, DMMPs are typically built with SMPs as building blocks
  - E.g. a Cray XC40 DMMP node has two CPUs with 16 cores each
  - Soon, systems with more CPUs and many more cores will appear
- This combines the advantages and disadvantages of both categories
- Programming is more complicated due to the combination of several different memory organizations that require different treatment (see the hybrid sketch below)
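A minimal hybrid sketch (illustrative, assuming both MPI and OpenMP are available; compile with an MPI compiler wrapper plus -fopenmp or equivalent): MPI ranks between nodes, OpenMP threads within each node's shared memory.

```c
/* Hybrid MPI + OpenMP "hello": one MPI process per node, several threads inside. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* each MPI process spawns a team of threads on its node */
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```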

Moore's law revisited
- Doubling of transistor density every 18 months
- Often paraphrased as doubling of performance every 18 months

Reinterpreting Moore's law

- Moore's law is holding, in the number of transistors
  - Transistors on an ASIC still doubling every 18 months at constant cost
  - 15 years of exponential clock rate growth has ended
- Moore's Law reinterpreted:
  - Performance improvements are now coming from the increase in the number of cores on a processor (ASIC)
  - #cores per chip doubles every 18 months instead of clock frequency

[Figure (courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith): transistor counts continue to grow, while clock frequency, thread performance and power (watts) have flattened; the number of cores per chip is now what increases.]

Computing Power Consumption

Power = Capacitive load × Voltage² × Frequency

- Capacitive load per transistor is a function of both the number of transistors connected to an output and the technology, which determines the capacitance of both wires and transistors
- Frequency switched is a function of the clock rate
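A rough worked instance of this formula (illustrative, using the factors shown on the next slide): if clock frequency grows by ×1000 while the supply voltage drops from 5 V to 1 V and the capacitive load stays fixed, then

    P_new / P_old = (1/5)² × 1000 = 40

so power would still grow by roughly ×40, in the same ballpark as the ~30× that was actually observed.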

Hitting the Power Wall

Power = Capacitive load × Voltage² × Frequency

[Figure: while clock frequency increased by about ×1000, power grew only about ×30, because the supply voltage dropped from 5 V to 1 V; with voltage near its practical lower limit, further frequency scaling runs into the power wall.]

Multiple cores deliver more performance per watt

[Figure: a big core with relative power 4 and performance 2 versus a small core with power 1 and performance 1 - i.e. roughly 1/4 of the power for 1/2 of the performance.]

- Many core is more power efficient:
  - Power ~ area
  - Single-core performance ~ area^0.5
[Figure: one big core replaced by four small cores (C1-C4) sharing a cache.]

Multicore CPUs
- Intel Xeon 6-core processor

What does this mean?
- The easy times are gone
- Updating to the next processor generation will not automatically increase performance anymore
- Parallel computing techniques are needed to fully exploit a new processor generation
- Parallel computing is going mainstream

GPUs

GPUs
- GPU = Graphics Processing Unit = a specialized microcircuit to accelerate the creation and manipulation of images in a video frame for display devices
- Excellent for processing large blocks of data in parallel
- GPUs are used in game consoles, embedded systems (such as systems in cars for automated driving), computers and supercomputers
- Since 2012, GPUs have been the main workforce for training deep-learning networks

The Rise of the GPU in HPC
- GPUs are a core technology in many of the world's fastest and most energy-efficient supercomputers
- GPUs compete well in terms of FLOPS/Watt
- Between 2012 and December 2013, the list of the ten most energy-efficient supercomputers (Green500) changed to 100% NVIDIA GPU based systems

- In the current Green500, the top 2 most energy-efficient supercomputers use NVIDIA P100 GPUs

GPU Design Motivation: Process Pixels in Parallel

- Data parallel
  - In 1080i and 1080p videos, 1920 × 1080 pixels = 2M pixels per video frame → compute intensive
  - Lots of parallelism at low clock speed → power efficient
- Computation on each pixel is independent from computation on other pixels
  - No need for synchronization
- Large data locality = access to data is regular
  - No need for large caches

CPU and GPU

- A CPU has tens of massive cores; the CPU excels at irregular control-intensive work
  - Lots of hardware for control, fewer ALUs
- A GPU has thousands of small cores; the GPU excels at regular math-intensive work
  - Lots of ALUs, little hardware for control

Weakness of GPU

GPU is very fast (huge parallelism) but getting data from/to GPU is slow

[Figure: NVIDIA Tesla K40 system diagram - GPU (base clock 745 MHz) with 12 GB of GDDR5 memory at 288 GB/s, connected via PCIe Gen3 (~32 GB/s) to the CPU, which has 64 GB of DDR4 DRAM at 80 GB/s.]

NVIDIA Tesla K40 = the most common GPU on supercomputers in the Nov. 2016 Top500 list

CPU vs GPU
- CPUs are latency-optimized
  - Reduce memory latency with big caches
  - Hide memory latencies with other instructions (instruction window, out-of-order execution)
  - Each thread runs as fast as possible, but there are fewer threads
- GPUs are throughput-optimized
  - Each thread might take a long time, but thousands of threads are used

Is a GPU good for my non-graphics application?

- It depends ...
  - Compute-intensive applications with little synchronization benefit the most from GPUs:
    • Deep-learning network training 8×-10×, GROMACS 2×-3×, LAMMPS 2×-8×, QMCPack 3×
  - Irregular applications, such as sorting and constraint solvers, are faster on the CPU*
- GPU applications are more difficult to program ...
  - CUDA is the de-facto standard for programming NVIDIA GPUs
  - OpenCL supports all accelerators, including non-NVIDIA ones
  - OpenACC and OpenMP 4 provide a higher-level programming interface (see the sketch below)
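As a hedged illustration of the directive-based approach (a sketch only; it assumes a compiler with OpenMP 4 offload support, and the array names are made up):

```c
/* Directive-based offload of a data-parallel loop to an accelerator. */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Offload the loop to a device (GPU) if one is available. */
    #pragma omp target teams distribute parallel for map(to: x[0:N]) map(tofrom: y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);  /* expect 4.0 */
    return 0;
}
```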

* Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU, Victor W. Lee et al.

Network Topologies

The Role of the Network
- The overall performance of DMMPs depends critically on the performance of the network used to connect the individual nodes
  - How fast can messages be transmitted, and how much data can be exchanged?
  - Also applies to networked SMPs
- Latency: time from the start of packet transmission to the start of packet reception (but typically measured as the round-trip time of zero-sized messages)
- Bandwidth: how much data can be transmitted over the network (bit/s)

Different Technologies
- Ethernet
- Myrinet
- InfiniBand
- Proprietary networks
  - Cray Aries
  - IBM BlueGene
- They differ in bandwidth and latency, but most notably in sustained performance through e.g. MPI

Network Topologies
- Networks can be arranged in a number of ways
- The typical design goal is to balance performance and cost
- Factors in addition to latency and bandwidth:
  - Fault tolerance
  - Power consumption
  - Number of switches
  - Number of links
  - Cable length
- Additional considerations:
  - Total network bandwidth (TB)
    • Bandwidth of each link multiplied by the number of links
  - Bisection bandwidth (BS)
    • Worst-case bandwidth if the nodes are divided into two disjoint sets

Common Topologies
- Bus
  - TB: the bandwidth of the (single) link
  - BS: the bandwidth of the link
- Ring
  - TB: P times the bandwidth of one link
  - BS: 2 times the bandwidth of one link
- Fully connected network
  - TB: P × (P-1)/2 times the bandwidth of one link
  - BS: (P/2)² times the bandwidth of one link
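A quick worked instance of these formulas (illustrative), counting bandwidth in units of one link, for P = 64 nodes:

    Bus:              TB = 1,                   BS = 1
    Ring:             TB = 64,                  BS = 2
    Fully connected:  TB = 64 × 63 / 2 = 2016,  BS = (64/2)² = 1024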

Common Topologies Cont'd
- Mesh
  - Typically 2D or 3D

- N-cube (hypercube)
  - 2^n nodes

- Fat tree
  - Common in InfiniBand-based systems

Summary
- An HPC system is a collection of "nodes" connected by some network
- Nodes consist of (multiple) many-core CPUs, accelerators (GPUs, FPGAs, etc.) and memory
- Memory is typically shared between all CPUs of a node
  • But not (yet) with accelerators

Performance

Why worry about Performance?
- Compare different systems
- Select the most appropriate system for a given problem
- Make efficient use of available resources
- Scaling
  - An increase in resources should result in faster results
  - How does an increase in resources affect overall runtime?
  - How does an increase of problem size affect overall runtime?

Optimization Goals
- Execution time
  - Minimize the time between start and completion of a task
  - Typical goal in HPC
- Throughput
  - Maximize the number of tasks completed in a given time
  - Typical goal of large data centers (HTC)

Performance Definitions

    Performance_X = 1 / Execution time_X

For two computers X and Y, if the performance of X is greater than the performance of Y, we have

    Performance_X > Performance_Y
    1 / Execution time_X > 1 / Execution time_Y
    Execution time_Y > Execution time_X

i.e. the execution time on Y is longer than on X.


Measuring Performance
- Performance is measured in time units
- Different ways to measure time:
  - Wall clock time or elapsed time
    • Time taken from start to end
    • Measures everything, including other tasks performed on multitasking systems
  - CPU time
    • Actual time the CPU spends computing for a specific task
    • Does not include time spent on other processes or I/O
    • CPU time < wall clock time
  - User CPU time
    • CPU time spent in the user program
    • User CPU time < CPU time < wall clock time
  - System CPU time
    • CPU time spent in the operating system on tasks for the user program
    • User/system CPU time can be difficult to measure
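A small C sketch (illustrative; the loop is just dummy work) of the difference between CPU time and wall-clock time, using the standard clock() and clock_gettime() calls:

```c
/* clock() accumulates CPU time used by the process; clock_gettime()
 * with CLOCK_MONOTONIC measures elapsed wall-clock time, which also
 * includes time spent waiting or descheduled. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec w0, w1;
    clock_t c0, c1;

    clock_gettime(CLOCK_MONOTONIC, &w0);
    c0 = clock();

    volatile double s = 0.0;
    for (long i = 0; i < 100000000L; i++)   /* some CPU-bound work */
        s += 1.0 / (double)(i + 1);

    c1 = clock();
    clock_gettime(CLOCK_MONOTONIC, &w1);

    double cpu  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    double wall = (w1.tv_sec - w0.tv_sec) + (w1.tv_nsec - w0.tv_nsec) / 1e9;
    printf("CPU time:  %.3f s\n", cpu);
    printf("Wall time: %.3f s\n", wall);
    return 0;
}
```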

Factors of CPU performance

CPU execution time = CPU clock cycles / Clock rate

CPU clock cycles = Instructions × Average clock cycles per instruction (CPI)

CPU time = (Instruction count × CPI) / Clock rate
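As a worked instance of this formula (illustrative numbers): a program that executes 10^9 instructions with an average CPI of 2 on a 4 GHz processor needs

    CPU time = (10^9 × 2) / (4 × 10^9 cycles/s) = 0.5 s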

  Components of performance            Units of measure
  CPU execution time                   Seconds for the program
  Instruction count                    Instructions executed for the program
  Clock cycles per instruction (CPI)   Average number of clock cycles per instruction
  Clock cycle time                     Seconds per clock cycle

Other Performance Factors
- Memory subsystem
  - Cache misses
  - Amount and frequency of data to be moved
- I/O subsystem
  - Amount and frequency of data to be moved
- For parallel systems
  - Synchronization
  - Communication
  - Load balancing

Amdahl's Law

Amdahl's law

- Pitfall: expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement

- Gene Amdahl (1967):

    Improved time = (time affected by improvement / amount of improvement) + time unaffected

- Example: Suppose a program runs for 100 seconds, with 80 seconds spent in multiply operations. Doubling the efficiency of multiply operations will result in a new runtime of 60 seconds and thus a performance improvement of 1.67. How much do we need to improve multiply to achieve a 5 times improvement?
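Working the question out with Amdahl's formula (using the numbers from the example above): doubling multiply gives 80/2 + 20 = 60 s, i.e. 100/60 ≈ 1.67, as stated. A 5× overall improvement would require a total runtime of 100/5 = 20 s:

    80/n + 20 = 20  ⇒  80/n = 0

so no finite improvement of multiply alone can deliver a 5× speedup - the 20 s spent outside multiply already use up the entire time budget.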

Speedup

- Speedup (S) is defined as the improvement in execution time when increasing the amount of parallelism, or sequential execution time (TS) over parallel execution time (TP):

    S = TS / TP

[Figure: speedup versus number of CPUs (1, 2, 4, 8, 16) compared against the perfect (linear) speedup line.]

Efficiency
- Speedup as percentage of the number of processors:

    E = (1/P) × S = TS / (TP × P)

- A speedup of 90 with 100 processors yields 90% efficiency.

Superlinear Speedup
- Sometimes speedup is larger than the number of processors
- Very rare
- Main reasons:
  - Different parallel and sequential algorithms
  - Changes in memory behavior
    • The smaller problem size per processor in the parallel version fits main memory while the sequential one doesn't
    • Changes in cache behavior

Typical Speedup Curves

Amdahl's Law and Parallel Processing
- According to Amdahl's law, speedup is limited by the non-parallelizable fraction of a program
- Assume rp is the parallelizable fraction of a program and rs the sequential one, with rp + rs = 1. The maximum theoretical speedup achievable on n processors is

    Smax = 1 / (rs + rp/n)

- If 20% of a program is sequential, the maximum achievable speedup is 5 for n → ∞

How to live with Amdahl's law
- Many real-world problems have significant parallel portions
- Yet, to use 100,000 cores with 90% efficiency, the sequential part needs to be limited to roughly 0.0001%!

- Conclusion: minimize rs and maximize rp
  - Increase the amount of work done in the parallel (typically compute-intensive) parts

Scaling Example
- Workload: sum of 10 scalars, and 10 × 10 matrix sum
- Speedup from 10 to 100 processors
- Single processor: Time = (10 + 100) × tadd
- 10 processors:
  - Time = 10 × tadd + 100/10 × tadd = 20 × tadd
  - Speedup = 110/20 = 5.5 (55% of potential)
- 100 processors:
  - Time = 10 × tadd + 100/100 × tadd = 11 × tadd
  - Speedup = 110/11 = 10 (10% of potential)
- Assumes the load can be balanced across the processors

Scaling Example (cont'd)
- What if the matrix size is 100 × 100?
- Single processor: Time = (10 + 10000) × tadd
- 10 processors:
  - Time = 10 × tadd + 10000/10 × tadd = 1010 × tadd
  - Speedup = 10010/1010 = 9.9 (99% of potential)
- 100 processors:
  - Time = 10 × tadd + 10000/100 × tadd = 110 × tadd
  - Speedup = 10010/110 = 91 (91% of potential)
- Assuming the load is balanced

Strong and Weak Scaling
- Strong scaling: the speedup achieved without increasing the size of the problem
- Weak scaling: the speedup achieved while increasing the size of the problem proportionally to the increase in the number of processors

Weak Scaling Example
- 10 processors, 10 × 10 matrix:
  - Time = 10 × tadd + 100/10 × tadd = 20 × tadd
- 100 processors, 32 × 32 matrix:
  - Time = 10 × tadd + 1000/100 × tadd = 20 × tadd
- Constant performance in this example

Load Balancing
- Good speedup can only be achieved if the parallel workload is spread relatively evenly over the available processors
- If the workload is unevenly spread, overall performance is bound by the slowest processor (i.e. the processor with the most workload)

Example Continued
- 100 processors:
  - Time = 10 × tadd + 10000/100 × tadd = 110 × tadd
  - Speedup = 10010/110 = 91 (91% of potential)
  - Assumes each processor gets 1% of the workload
- Now assume one processor gets 2% (i.e. 200 matrix elements) and the remaining 9800 elements are distributed equally over the other 99 processors:

    Time = max(9800/99, 200/1) × tadd + 10 × tadd = 210 × tadd

  - Speedup = 10010/210 = 47.6

Load Balancing Examples

[Figure: different ways of distributing a 2-D domain over four processes P0-P3: 2 × 2 blocks, column blocks, and a cyclic column distribution.]

Synchronization and Communication

- Parallel programs need synchronization and communication to ensure correct program behavior
- Synchronization and communication add significant overhead and thus reduce parallel efficiency
- S = TS / TP can be refined as

    S = TS / (TPC + synchronization wait time + communication time)

  with TPC denoting the net parallel computation time

Synchronization and Communication Cont'd
- The goal is to avoid synchronization and communication
  - Not always possible
- Overlap communication with computation and optimize communication (see the sketch below)
  - Communication overhead is impacted by latency and bandwidth
    • Block communication
  - Use more efficient communication patterns
- Profiling tools can help identify synchronization and communication overhead
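A sketch of overlapping communication with computation using non-blocking MPI (illustrative; the buffer names and sizes are made up, run with at least two ranks): start the receive early, do independent work, and wait only when the data is actually needed.

```c
/* Overlap: communication is in flight while independent work proceeds. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double halo = 0.0, interior[1000];
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Irecv(&halo, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req); /* start communication */
        for (int i = 0; i < 1000; i++)                               /* work that does not   */
            interior[i] = i * 0.5;                                   /* depend on the message */
        MPI_Wait(&req, MPI_STATUS_IGNORE);                           /* block only when needed */
        printf("got halo value %f\n", halo);
    } else if (rank == 1) {
        double x = 42.0;
        MPI_Send(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```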

Example: Vampir traces

The Impact of Data
- Apart from communication, data affects performance at many levels
  - Memory hierarchy
  - I/O

Memory/Storage Hierarchies

Tomorrow
- How to program HPC systems
- Technological trends
