Highlights of the 51St TOP500 List

ISC 2018, Frankfurt, Highlights of June 25, 2018 the 51st Erich TOP500 List Strohmaier ISC18 TOP500 TOPICS • New #1 • New TOP5 • Slow-down, aging, and concentration • 25 years of systems and sites • China, a new twist • Industry and GPUs • HPCG petaflops Power # Site Manufacturer Computer Country Cores Rmax ST [Pflops] [MW] Oak Ridge 41 LIST: THESummit TOP10 1 IBM USA 2,282,544 122.3 8.8 National Laboratory IBM Power System, P9 22C 3.07GHz, Mellanox EDR, NVIDIA GV100 National Supercomputing Sunway TaihuLight 2 NRCPC China 10,649,600 93.0 15.4 Center in Wuxi NRCPC Sunway SW26010, 260C 1.45GHz Lawrence Livermore Sierra 3 IBM USA 1,572,480 71.6 National Laboratory IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA GV100 National University of Tianhe-2A 4 NUDT China 4,981,760 61.4 18.5 Defense Technology ANUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Matrix-2000 National Institute of Advanced AI Bridging Cloud Infrastructure (ABCI) 5 Fujitsu Japan 391,680 19.9 1.65 Industrial Science and Technology PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100 Swiss National Supercomputing Piz Daint 6 Cray Switzerland 361,760 19.6 2.27 Centre (CSCS) Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100 Oak Ridge Titan 7 Cray USA 560,640 17.6 8.21 National Laboratory Cray XK7, Opteron 16C 2.2GHz, Gemini, NVIDIA K20x Lawrence Livermore Sequoia 8 IBM USA 1,572,864 17.2 7.89 National Laboratory BlueGene/Q, Power BQC 16C 1.6GHz, Custom Los Alamos NL / Trinity 9 Cray USA 979,968 14.1 3.84 Sandia NL Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries Cori Lawrence Berkeley 10 Cray Cray XC40, USA 622,336 14.0 3.94 National Laboratory Intel Xeons Phi 7250 68C 1.4 GHz, Aries System Overview System Performance Each node has The system includes • Peak performance of 200 • 2 IBM POWER9 • 4608 nodes petaflops for modeling & processors • Dual-rail Mellanox EDR simulation • 6 NVIDIA Tesla V100 GPUs InfiniBand network • Peak of 3.3 ExaOps for data • 608 GB of fast memory • 250 PB IBM Spectrum analytics and artificial • 1.6 TB of NVMe memory Scale intelligence file system transferring data at 2.5 TB/s 5 Power # Site Manufacturer Computer Country Cores Rmax ST [Pflops] [MW] Oak Ridge 41 LIST: THESummit TOP10 1 IBM USA 2,282,544 122.3 8.8 National Laboratory IBM Power System, P9 22C 3.07GHz, Mellanox EDR, NVIDIA GV100 National Supercomputing Sunway TaihuLight 2 NRCPC China 10,649,600 93.0 15.4 Center in Wuxi NRCPC Sunway SW26010, 260C 1.45GHz Lawrence Livermore Sierra 3 IBM USA 1,572,480 71.6 National Laboratory IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA GV100 National University of Tianhe-2A 4 NUDT China 4,981,760 61.4 18.5 Defense Technology ANUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Matrix-2000 National Institute of Advanced AI Bridging Cloud Infrastructure (ABCI) 5 Fujitsu Japan 391,680 19.9 1.65 Industrial Science and Technology PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100 Swiss National Supercomputing Piz Daint 6 Cray Switzerland 361,760 19.6 2.27 Centre (CSCS) Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100 Oak Ridge Titan 7 Cray USA 560,640 17.6 8.21 National Laboratory Cray XK7, Opteron 16C 2.2GHz, Gemini, NVIDIA K20x Lawrence Livermore Sequoia 8 IBM USA 1,572,864 17.2 7.89 National Laboratory BlueGene/Q, Power BQC 16C 1.6GHz, Custom Los Alamos NL / Trinity 9 Cray USA 979,968 14.1 3.84 Sandia NL Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries Cori Lawrence Berkeley 10 Cray Cray XC40, USA 622,336 14.0 3.94 National Laboratory Intel Xeons Phi 7250 68C 1.4 GHz, Aries AVERAGE SYSTEM AGE 25 20 15 10 7.6 month Age [Months] 5 0 RANK AT WHICH HALF OF TOTAL PERFORMANCE IS ACCUMULATED 100 80 60 40 20 0 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 PERFORMANCE DEVELOPMENT June 2013 10 Eflop/s 1.00E+10 1.21 EFlop/s 1 Eflop/s 1.00E+09 122 PFlop/s 1.00E+08 100 Pflop/s 1.00E+07 10 Pflop/s SUM 716 TFlop/s 1.00E+06 1 Pflop/s 1.00E+05 100 Tflop/s 1.00E+04 10 Tflop/s N=1 1.00E+03 1 Tflop/s 1.17 TFlop/s 1.00E+02 100 Gflop/s N=500 59.7 GFlop/s 1.00E+01 10 Gflop/s June 2008 1.00E+00 1 Gflop/s 422 MFlop/s 1.00E-01 100 Mflop/s 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 ANNUAL PERFORMANCE INCREASE OF THE TOP500 2.6 2.4 TOP500 2.2 1000 x 2 TOP500: Averages 11 y 1.8 15 y 1.6 Moore’s Law 20 y 1.4 1.2 1 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 25 Years – 51 Editions How do we integrate/aggregate over time/editions? • System counts are focused on the low end • Moore’s Law overpowers everything performance based • Normalize each list by average performance • HPL not Peak • Average performance not max (#1) or min • Add up contributions from various lists over time • Each full lists contributed a total of 500 • 51 lists together have a total weight of 25,500 No. 1’s – HPL R_max 11.0E+06 Eflop/s Summit Sunway TaihuLight 1001.0E+05 Pflop/s Tianhe-2 Titan 10 Pflop/s Sequoia 1.0E+04 K computer 1 Pflop/s Tianhe-1A 1.0E+03 Jaguar 100 Tflop/s Roadrunner 1.0E+02 BlueGene/L 10 Tflop/s Earth-Simulator 1.0E+01 ASCI White 1 Tflop/s ASCI Red 1.0E+00 CP-PACS/2048 SR2201/1024 100 Gflop/s 1.0E-01 XP/S140 Numerical Wind Tunnel 10 Gflop/s 1.0E-02 CM-5/1024 1995 2000 2005 2010 2015 # 1’s – Accumulated Norm-HPL 500 Summit Sunway TaihuLight 450 Tianhe-2 Titan 400 Sequoia 350 K computer Tianhe-1A 300 Jaguar Roadrunner 250 BlueGene/L 200 Earth-Simulator ASCI White 150 ASCI Red CP-PACS/2048 100 SR2201/1024 50 XP/S140 Numerical Wind Tunnel 0 CM-5/1024 1995 2000 2005 2010 2015 Dominant Sites # Site Country Norm-HPL 1 LLNL USA 1,504 2 LANL USA 816 3 ORNL USA 795 4 SNL USA 658 5 NSCC Guangzhou China 474 6 RIKEN AICS Japan 361 7 NASA/Ames USA 356 8 FZ Jülich Germany 339 9 JAMSTEC Japan 325 10 NERSC / LBNL USA 310 11 ANL USA 305 COUNTRIES 500 China 400 Korea, South Italy 300 Canada 200 France United Kingdom 100 Germany 0 Japan United States 1993 1995 1997 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 COUNTRIES / SYSTEM SHARE South Korea, 1% Others, 10% United States Ireland, 1% China Netherlands, 2% Japan France, 4% United United Kindom Germany, 4% States, 25% United Kindom, Germany 5% China, 41% France Japan, 7% Netherlands Ireland 100 200 300 400 500 0 1993 1995 1997 1999 2001 PRODUCERS 2003 2005 2007 2009 2011 2013 2015 2017 Japan Europe China USA Russia VENDORS / SYSTEM SHARE Penguin Others, 38, 8% Fujitsu, Compung, Lenovo 13, 3% 11, 2% HPE Dell EMC, 13, 3% Lenovo, 117, 23% Inspur Huawei, 14, 3% Sugon IBM, 18, 4% HPE, 79, Cray Inc. Bull, 21, 4% 16% Bull Cray Inc., 53, IBM 10% Inspur, 68, 13% Huawei Sugon, 55, 11% # of systems, % of 500 VENDORS / PERFORMANCE SHARE others, 132, 11% Lenovo, 143, Lenovo NRCPC, 12% Penguin 94, 8% HPE Compung, HPE, 120, 10% 15, 1% Inspur Fujitsu, 64, 5% Sugon Inspur, 73, 6% Dell EMC, 26, 2% Cray Inc. Huawei, 12, IBM, 239, Sugon, 53, 4% Bull 1% 20% IBM Cray Inc., 188, Huawei Bull, 52, 4% 16% Sum of Pflop/s, % of whole list ACCELERATORS 130 Matrix-2000 120 PEZY-SC 110 100 Kepler/Phi 90 Xeon Phi Main 80 Intel Xeon Phi 70 Clearspeed 60 IBM Cell Systems 50 40 ATI Radeon 30 Nvidia Volta 20 Nvidia Pascal 10 Nvidia Kepler 0 Nvidia Fermi 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 PERFORMANCE SHARE OF ACCELERATORS 60% 50% Xeon Phi Main 40% Accelerators 30% Performance 20% 10% FracYon of Total TOP500 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 MOST ENERGY EFFICIENT ARCHITECTURES Rmax/ Computer Power Shoubou system B, ZettaScaler-2.2 Xeon 16C 1.3GHz Infiniband EDR PEZY-SC2 18.4 Suiren2, ZettaScaler-2.2 Xeon 16C 1.3GHz Infiniband EDR PEZY-SC2 16.8 Sakura, ZettaScaler-2.2 Xeon 8C 2.3GHz Infiniband EDR PEZY-SC2 16.7 DGX Saturn V, NVIDIA DGX-1 Volta36 Xeon 20C 2.2GHz Infiniband EDR Tesla V100 15.1* Summit, IBM Power System Power9 22C 3.07GHz Infiniband EDR Volta GV100 13.9 Tsubame 3.0, SGI ICE XA Xeon 14C 2.4GHz Intel Omni-Path Tesla P100 SXM2 13.7* AIST AI Cloud, NEC 4U-8GPU Xeon 10C 1.8GHz Infiniband EDR Tesla P100 SXM2 12.7 AI Bridging Cloud Infrastructure (ABCI), Xeon Gold 20C Infiniband EDR Tesla V100 SXM2 12.1 Fujitsu PRIMERGY, NVIDIA Tesla V100 2.4GHz MareNostrum P9 CTE, IBM Power System Power9 22C 3.1GHz Infiniband EDR Tesla V100 11.9 Wilkes-2, Dell C4130 Xeon 12C 2.2GHz Infiniband EDR Tesla P100 10.4 * Efficiency based on Power op`mized HPL runs of equal size to TOP500 run.

Highlights of the 51St TOP500 List

ORNL Debuts Titan Supercomputer Table of Contents

Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators

Tsubame 2.5 Towards 3.0 and Beyond to Exascale

Science-Driven Development of Supercomputer Fugaku

2017 HPC Annual Report Team Would Like to Acknowledge the Invaluable Assistance Provided by John Noe

Safety and Security Challenge

Towards Exascale Computing

Computational PHYSICS Shuai Dong

Lessons Learned in Deploying the World's Largest Scale Lustre File

Titan: a New Leadership Computer for Science

Musings RIK FARROWOPINION

FCMSSR Meeting 2018-01 All Slides