
High-Performance Computing - and Why Learn About It?

Tarek El-Ghazawi

The George Washington University, Washington, D.C., USA

Outline

- What is High-Performance Computing?
- Why is High-Performance Computing Important?
- Advances in Performance and Architectures
- Heterogeneous Accelerated Computing
- Advances in Parallel Programming
- Making Progress: The HPCS Program, near-term
- Making Progress: Exascale and DOE
- Conclusions

What Are Supercomputing and Parallel Architectures?

- Also called High-Performance Computing and Parallel Computing
- Research and innovation in architecture, programming, and applications associated with systems that are orders of magnitude faster (10x-1000x or more) than modern desktops and laptops
- Such systems achieve speed through massive parallelism - Parallel Architectures! - e.g., many processors working together
http://www.collegehumor.com/video:1828443
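A minimal sketch (illustrative only, not from the talk) of "many processors working together": the snippet below splits one large summation across several worker processes so that the partial sums are computed in parallel.

```python
# Split a big sum over [0, n) across several worker processes ("many processors
# working together"). The problem size and worker count are arbitrary.
from multiprocessing import Pool

def chunk_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, workers = 8_000_000, 8
    step = n // workers
    chunks = [(w * step, (w + 1) * step) for w in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(chunk_sum, chunks))  # partial sums run in parallel
    print(total)
```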

Outline

- What is High-Performance Computing?
- Why is High-Performance Computing Important?
- Advances in Performance and Architectures
- Hardware Accelerators and Accelerated Computing
- Advances in Parallel Programming
- What is Next: The HPCS Program, near-term
- What is Next: Exascale and DARPA UHPC
- Conclusions

Why is HPC Important?

- Critical for economic competitiveness because of its wide range of applications (through simulations and intensive data analyses)
- Drives computer hardware and software innovations for future conventional computing
- Is becoming ubiquitous, i.e., all computing/information technology is turning parallel!

Is that why it is turning into an international HPC muscle-flexing contest?

Why is HPC Important?

The traditional engineering cycle: Design -> Build -> Test

With HPC simulation: Design -> Model -> Simulate -> Build

Why is HPC Important? National and Economic Competitiveness

Application examples:

Molecular Dynamics (HIV-1 protease inhibitor drug) - HPC simulation for 2 ns:
- 2 weeks on a desktop
- 6 hours on a supercomputer

Gene Sequence Alignment - phylogenetic analysis:
- 32 days on a desktop
- 1.5 hrs on a supercomputer

Car Crash Simulations - 2-million-element simulation:
- 4 days on a desktop
- 25 minutes on a supercomputer

Understanding the Fundamental Structure of Matter:
- Requires a billion-billion (10^18) calculations per second

Why is HPC Important? National and Economic Competitiveness

- Industrial competitiveness
- Computational models that run on HPC are not only for the design of NASA space shuttles; they can also help with:
  - Business intelligence (e.g., IBM's Watson)
  - Designing effective shapes and/or materials for potato chips, Clorox bottles, ...

HPC Technology of Today is Conventional Computing of Tomorrow: Multi/Many-cores in Desktops and Laptops

Intel 80-core chip: 1 chip and 1 TeraFLOPS in 2007

The ASCI Red supercomputer: 9,000 chips for 3 TeraFLOPS in 1997

Intel 72-core chip (KNL): 1 chip and 3 TeraFLOPS in 2016

Why is HPC Important? - HPC is Ubiquitous

HPC is ubiquitous! All computing is becoming HPC; can we be bystanders?
- The Sony PS3 uses Cell processors, as did the Roadrunner, the fastest supercomputer in '08
- The iPhone 7 has 4 cores running at 2.34 GHz
- The Xeon Phi KNL is a single chip with 72 CPUs

Why is this happening? - The End of Moore's Law in Clocking

The phenomenon of exponential improvement in processors was observed in 1965 by Intel co-founder Gordon Moore. Three common formulations:
- The speed of a processor doubles every 18-24 months, assuming the price of the processor stays the same. Wrong, not anymore!
- The price of a microchip drops about 48% every 18-24 months, assuming the same processor speed and on-chip memory capacity. OK, for now.
- The number of transistors on a microchip doubles every 18-24 months, assuming the price of the chip stays the same. OK, for now.
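To make the doubling cadence concrete, here is a toy compound-growth sketch; the starting transistor count and the strict two-year doubling period are illustrative assumptions, not figures from the talk.

```python
# Illustrative only: project transistor counts under a strict "double every
# two years" assumption, starting from a made-up baseline of 2,300 transistors.
count = 2_300
for year in range(1971, 2017, 2):
    print(year, f"{count:,}")
    count *= 2   # one doubling per two-year step
```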

No faster clocking, but more cores?

Source: Ed Davis, Intel

Cores and Power Efficiency

Source: Ed Davis, Intel

Comparative View of Processors and Accelerators

Device                     | Process (nm) | Freq (GHz) | # Cores          | Peak SP (GFlops) | Peak DP (GFlops) | Power (W) | GFlops/W | Mem BW (GB/s) | Memory
PowerXCell 8i              | 65           | 3.2        | 1 + 8            | 204              | 102.4            | 92        | 1.11     | 25.6          | XDR
NVIDIA Fermi Tesla M2090   | 40           | 1.3        | 512              | 1330             | 665              | 225       | 2.9      | 177           | GDDR5
NVIDIA Kepler K20X         | 28           | 0.73       | 2688             | 3950             | 1310             | 235       | 5.6      | 250           | GDDR5
NVIDIA Kepler K80          | 28           | 0.88       | 2x2496           | 8749             | 2910             | 300       | 9.7      | 480           | GDDR5
Intel Xeon Phi 5110P (KNC) | 22           | 1.05       | 60 (240 threads) | -                | 1011             | 225       | 4.5      | 320           | GDDR5
Intel Xeon Phi 7290 (KNL)  | 14           | 1.7        | 72 (288 threads) | -                | ~3500            | 245       | 14.3     | 115.2         | DDR4
Intel Xeon E7-8870         | 32           | 2.4 (2.8)  | 10               | 202.6            | 101.3            | 130       | 0.78     | 42.7          | DDR3-1333
AMD Opteron 6176 SE        | 45           | 2.5        | 12               | 240              | 120              | 140       | 0.86     | 42.7          | DDR3-1333
Xilinx V6 SX475T           | 40           | -          | -                | -                | 98.8             | 50        | 3.3      | -             | -
Altera Stratix V GSB8      | 28           | -          | -                | -                | 210              | 60        | 3.5      | -             | -
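The GFlops/W column above is simply the peak double-precision rate divided by board power; a quick sanity check against a few rows of the table (values copied from the slide):

```python
# The GFlops/W column is peak double-precision rate divided by board power.
# Values below are copied from the table; the computed ratios match its column.
accelerators = {
    "NVIDIA Kepler K80":   (2910.0, 300.0),   # (peak DP GFlops, power in W)
    "Intel Xeon Phi 7290": (3500.0, 245.0),
    "Intel Xeon E7-8870":  (101.3, 130.0),
}
for name, (dp_gflops, watts) in accelerators.items():
    print(f"{name}: {dp_gflops / watts:.2f} GFlops/W")
# Prints roughly 9.70, 14.29, and 0.78, i.e., the 9.7 / 14.3 / 0.78 in the table.
```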

Most Power Efficient Architectures: Green 500

https://www.top500.org/green500/lists/2016/11/15

Outline

- What is High-Performance Computing?
- Why is High-Performance Computing Important?
- Advances in Performance and Architectures
- Heterogeneous Accelerated Computing
- Advances in Parallel Programming
- What is Next: The HPCS Program, near-term
- What is Next: Exascale and DoE
- Conclusions

How is the Supercomputing Race Conducted? TOP500 Supercomputers and LINPACK

- Top500 lists are published in November and in June
- Rmax: maximal LINPACK performance achieved
- Rpeak: theoretical peak performance
- In the TOP500 list, computers are ordered first by their Rmax value; in the case of equal Rmax values, they are ordered by Rpeak; for sites with the same performance, the order is by memory size and then alphabetically (a sketch of this ordering follows the list)
- Check www.top500.org for more information
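A small sketch of that tie-breaking rule; the systems and numbers below are made up for illustration:

```python
# Rank systems by Rmax, breaking ties by Rpeak, then memory size, then name,
# as described on this slide. All entries here are invented examples.
systems = [
    {"name": "SystemB", "rmax": 33.9, "rpeak": 54.9, "memory_tb": 1400},
    {"name": "SystemA", "rmax": 93.0, "rpeak": 125.4, "memory_tb": 1300},
    {"name": "SystemC", "rmax": 33.9, "rpeak": 50.0, "memory_tb": 1000},
]
ranked = sorted(systems,
                key=lambda s: (-s["rmax"], -s["rpeak"], -s["memory_tb"], s["name"]))
for rank, s in enumerate(ranked, start=1):
    print(rank, s["name"], s["rmax"], "PFlops")
```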

Top 10 Supercomputers: November 2016 (www.top500.org)

Rank | Site, Country                                          | Computer                                                                                                        | # Cores    | Rmax (PFlops)
1    | National Supercomputing Center in Wuxi, China          | Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway (NRCPC)                                     | 10,649,600 | 93.0
2    | National University of Defense Technology, China       | Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P | 3,120,000  | 33.9
3    | Oak Ridge National Laboratory, United States           | Titan - Cray XK7, Opteron 16 Cores, 2.2GHz, Gemini, NVIDIA K20x                                                 | 560,640    | 17.6
4    | Lawrence Livermore National Laboratory, United States  | Sequoia - BlueGene/Q, Power BQC 16 Cores, custom interconnect                                                   | 1,572,864  | 16.3
5    | DOE/SC/LBNL/NERSC, United States                       | Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect (Cray Inc.)                                | 622,336    | 14.0

Top 10 Supercomputers: November 2016 (www.top500.org)

Rank | Site, Country                                                | Computer                                                                                     | # Cores | Rmax (PFlops)
6    | Joint Center for Advanced High Performance Computing, Japan | Oakforest-PACS - PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path         | 556,104 | 13.6
7    | RIKEN Advanced Institute for Computational Science, Japan   | K computer - SPARC64 VIIIfx 2.0 GHz, Tofu interconnect                                       | 795,024 | 10.5
8    | Swiss National Supercomputing Centre (CSCS), Switzerland    | Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x (Cray Inc.) | 206,720 | 9.8
9    | Argonne National Laboratory, United States                  | Mira - BlueGene/Q, Power BQC 16 Cores, custom interconnect                                   | 786,432 | 8.16
10   | DOE/NNSA/LANL/SNL, United States                             | Trinity - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect (Cray Inc.)               | 301,056 | 8.1

History

Source: top500.org. Also see: http://spectrum.ieee.org/tech-talk/computing/hardware/china-builds-worlds-fastest-supercomputer

Supercomputers - History

Computer                                     | Processor                                                                               | # Proc.    | Year | Rmax (TFlops)
Sunway TaihuLight                            | Sunway MPP, Sunway SW26010 260C 1.45GHz                                                 | 10,649,600 | 2016 | 93,014
Tianhe-2 (MilkyWay-2)                        | TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P | 3,120,000  | 2013 | 33,862
Titan                                        | Cray XK7, Opteron 16 Cores, 2.2GHz, NVIDIA K20X                                         | 560,640    | 2012 | 17,600
K-Computer, Japan                            | SPARC64 VIIIfx 2.0GHz                                                                   | 705,024    | 2011 | 10,510
Tianhe-1A, China                             | Intel EM64T Xeon X56xx (Westmere-EP) 2930 MHz (11.72 GFlops) + NVIDIA GPU, FT-1000 8C   | 186,368    | 2010 | 2,566
Jaguar, Cray                                 | Cray XT5-HE Opteron Six Core 2.6 GHz                                                    | 224,162    | 2009 | 1,759
Roadrunner, IBM                              | PowerXCell 8i 3200 MHz (12.8 GFlops)                                                    | 122,400    | 2008 | 1,026
BlueGene/L - eServer Blue Gene Solution, IBM | PowerPC 440 700 MHz (2.8 GFlops)                                                        | 212,992    | 2007 | 478
BlueGene/L - eServer Blue Gene Solution, IBM | PowerPC 440 700 MHz (2.8 GFlops)                                                        | 131,072    | 2005 | 280
BlueGene/L beta-System, IBM                  | PowerPC 440 700 MHz (2.8 GFlops)                                                        | 32,768     | 2004 | 70.7
Earth-Simulator / NEC                        | NEC 1000 MHz (8 GFlops)                                                                 | 5,120      | 2002 | 35.8
IBM ASCI White, SP                           | POWER3 375 MHz (1.5 GFlops)                                                             | 8,192      | 2001 | 7.2
IBM ASCI White, SP                           | POWER3 375 MHz (1.5 GFlops)                                                             | 8,192      | 2000 | 4.9
Intel ASCI Red                               | Intel IA-32 Pentium Pro 333 MHz (0.333 GFlops)                                          | 9,632      | 1999 | 2.4

Historical Analysis

Chart: performance (TeraFLOPS to PetaFLOPS) versus time, showing the progression from vector processors, to massively parallel machines (TeraFLOPS around 1993, the HPCC era), to MPPs with multicores and heterogeneous accelerators (PetaFLOPS around 2008-2011, first discrete and then integrated), and on to "tons of lightweight cores" by 2016, driven by the end of Moore's Law in clocking.

DARPA High-Productivity Computing Systems (HPCS)

- Launched in 2002
- Next-generation supercomputers by 2010
- Not only performance, but productivity, where:

Productivity = f(execution time, development time)
Typically, Productivity = utility / cost
(a toy numeric reading of this definition follows this list)

- Addresses everything: hardware and software
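As noted above, here is a toy numeric reading of "utility / cost" in which both development time and execution time contribute to cost; the cost rates and example numbers are illustrative assumptions, not HPCS definitions:

```python
# Toy illustration only: utility delivered per unit cost, where cost combines
# the time to develop the code and the time to run it. Rates are made up.
def productivity(utility, dev_time_hours, exec_time_hours,
                 dev_cost_per_hour=100.0, machine_cost_per_hour=50.0):
    cost = dev_time_hours * dev_cost_per_hour + exec_time_hours * machine_cost_per_hour
    return utility / cost

# A system that runs faster but is much harder to program can still lose.
print(productivity(utility=1.0, dev_time_hours=40, exec_time_hours=100))
print(productivity(utility=1.0, dev_time_hours=200, exec_time_hours=2))
```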

HPCS Structure

- Each team is led by a company and includes university research groups
- Three phases:
  - Phase I: Research Concepts - SGI, HP, Cray, IBM, and Sun
  - Phase II: R&D - Cray, IBM, Sun
  - Phase III: Deployment - Cray, IBM
- GWU with SGI in Phase I and IBM in Phase II

IBM, Sun & Cray's Efforts on HPCS

Vendor | Project | Hardware Arch.           | Language
IBM    | PERCS   | PowerPC                  | X10
Sun    | Hero    | "Rock", multi-core SPARC | Fortress
Cray   | Cascade |                          | Chapel

HPCS on IBM, Sun & Cray

IBM PERCS(Productive, Easy-to-use, Reliable Computing System) Power Architecture Sum Hero Multi-core “Rock” Sparc Cray Cascade

What is New in HPCS

- Architecture
  - Lots of parallelism on the chip
  - Intelligent and transactional memory
  - Innovative co-processing: streaming, PIM, ...
  - Computations migrate to data, instead of data going to computations
- Programming
  - PGAS programming models
  - Parallel MATLAB and other simple interfaces
  - Multiple types of parallelism and locality
  - Transactions
- Reliability
  - Self-healing
- More proprietary stuff

What is Next: Exascale and DOE

The DoE Exascale Computing Project goals:
- Deliver 50x the performance of today's systems (20 PF), i.e., about 1 ExaFLOPS
- Operate with 20-30 MW of power
- Be sufficiently resilient (MTTI < 1 week)
- Provide a software stack supporting a wide range of applications

Growth of supercomputing capability

Source: https://energy.gov/sites/prod/files/2013/09/f2/20130913-SEAB-DOE-Exascale-Initiative.pdf
Source: Figure modified from singularity.com

Technical Challenges on the Road to Exascale

Bill Dally, “Technical Challenges on the Road to Exascale” http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/BillDally_NVIDIA_SC12.pdf

Technical Challenges on the Road to Exascale

The High Cost of Data Movement: fetching operands costs more than computing on them

10000"

1000"

100" 2008"(45nm)"

2018"(11nm)" 10"

Picojoules*Per*64bit*opera2on* 1" " " " " " " t" " P er ip ip ip M c m O t h h h A e e FL is 3c 3c 3c R n st " g n n n D n y P e o o o / o "s D R " " " ip rc s m m m h e s m m m c t ro 1 5 5 f3 in C 1 f l" O ca lo

Source: ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems. Courtesy: John Shalf, PGAS 2015.

Three Pre-Exascale Supercomputers as Part of the DOE CORAL Initiative

Feature           | Summit                                               | Sierra                                               | Aurora
Contract budget   | $325M (combined with Sierra)                         | (combined with Summit)                               | $200M
Location          | Oak Ridge                                            | Livermore                                            | Argonne
Delivery date     | 2017-18                                              | 2017-18                                              | 2018-19
Vendor            | IBM                                                  | IBM                                                  | Cray
Node architecture | Multiple IBM POWER9 CPUs + multiple NVIDIA Volta GPUs | Multiple IBM POWER9 CPUs + multiple NVIDIA Volta GPUs | Intel Knights Hill many-core CPUs
Node performance  | 40+ TFLOPS                                           | -                                                    | 3+ TFLOPS
Interconnect      | Mellanox dual-rail EDR InfiniBand                    | Mellanox dual-rail EDR InfiniBand                    | Intel Omni-Path
Rpeak             | 150 PFLOPS                                           | 120-150 PFLOPS                                       | 180 PFLOPS
Nodes             | ~3,400                                               | -                                                    | 50,000+
Power             | ~10 MW                                               | ~10 MW                                               | ~13 MW

Aurora Highlights

Available data:
- Cray Shasta compute platform
- Intel Knights Hill many-core CPUs (3rd-generation manycore, 10nm node)
- 3+ TFLOPS per node
- 50,000+ nodes
- 180 PFLOPS
- 13 MW
- Intel Omni-Path (2nd gen) with silicon photonics
- 500+ TB/s bisection bandwidth
- 2.5+ PB/s aggregate node link bandwidth

Prediction for next generation:
- 1 processor per node: one 100-core CPU capable of 4.5 TFLOPS peak, or 3+ TFLOPS sustained
- Dual Omni-Path links, 400 Gb/s aggregate BW per node
- 50,000 nodes; 4 nodes per blade -> 12,500 blades; 16 blades per chassis -> 782 chassis; 6 chassis per group -> 130 groups (the arithmetic is spelled out in the sketch below)

GWU HPCL Facility

Historical Highlights of the Facility

- ~50 tons of cooling, 2,000 sq. ft. of elevated floor, 0.25 MW of power
- Small experimental parallel systems that represent a wide spectrum of architectural ideas
- Systems with GPU accelerators from Cray and ACI
- A system with Intel Phi accelerators from ACI
- Systems with FPGA accelerators from SRC, SGI, Cray, and Starbridge
- Homegrown clusters with InfiniBand and Myrinet
- Many experimental boards and workstations from Xilinx, Intel, ...

GW CRAY XE6m/XK7m

- 1,856 processor cores
- Based on 12-core 64-bit AMD Opteron 6100 Series processors and 16-core AMD Bulldozer processors
- 32 NVIDIA K20 GPUs
- 64 GB registered ECC DDR3 SDRAM per compute node
- 1 Gemini routing and communications ASIC per two compute nodes

Conclusions

- HPC is critical for economic competitiveness at all levels, and it is turning into an international race!
- Advances in HPC today are the advances in conventional computing tomorrow
- HPC is ubiquitous, as all computing turns into HPC
- Multicore and heterogeneous accelerator architectures are receiving growing attention but still lack software infrastructure and hardware support; they will require new programming models and OS support, which is an opportunity for leadership in research

Light Reading

- http://spectrum.ieee.org/computing/hardware/ibm-reclaims-supercomputer-lead, 2005
- http://spectrum.ieee.org/tech-talk/computing/hardware/china-builds-worlds-fastest-supercomputer, 2010
- http://spectrum.ieee.org/computing/hardware/chinas-homegrown-supercomputers, 2012
