<<

ISC 2018, Frankfurt, Highlights of June 25, 2018 the 51st Erich TOP500 List Strohmaier ISC18 TOP500 TOPICS • New #1 • New TOP5 • Slow-down, aging, and concentration • 25 years of systems and sites • , a new twist • Industry and GPUs • HPCG petaflops Power # Site Manufacturer Country Cores Rmax ST [Pflops] [MW] Oak Ridge 41 LIST: THESummit TOP10 1 IBM USA 2,282,544 122.3 8.8 National Laboratory IBM Power System, P9 22C 3.07GHz, Mellanox EDR, GV100 National Supercomputing TaihuLight 2 NRCPC China 10,649,600 93.0 15.4 Center in Wuxi NRCPC Sunway SW26010, 260C 1.45GHz Lawrence Livermore 3 IBM USA 1,572,480 71.6 National Laboratory IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA GV100 National University of Tianhe-2A 4 NUDT China 4,981,760 61.4 18.5 Defense Technology ANUDT TH-IVB-FEP, 12C 2.2GHz, Matrix-2000 National Institute of Advanced AI Bridging Cloud Infrastructure (ABCI) 5 391,680 19.9 1.65 Industrial Science and Technology PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100 Swiss National Supercomputing Piz Daint 6 361,760 19.6 2.27 Centre (CSCS) Cray XC50, Xeon E5 12C 2.6GHz, Aries, P100 Oak Ridge 7 Cray USA 560,640 17.6 8.21 National Laboratory Cray XK7, 16C 2.2GHz, Gemini, NVIDIA K20x Lawrence Livermore 8 IBM USA 1,572,864 17.2 7.89 National Laboratory BlueGene/Q, Power BQC 16C 1.6GHz, Custom Los Alamos NL / Trinity 9 Cray USA 979,968 14.1 3.84 Sandia NL Cray XC40, 7250 68C 1.4GHz, Aries Cori Lawrence Berkeley 10 Cray Cray XC40, USA 622,336 14.0 3.94 National Laboratory Intel Phi 7250 68C 1.4 GHz, Aries System Overview

System Performance Each node has The system includes

• Peak performance of 200 • 2 IBM POWER9 • 4608 nodes petaflops for modeling & processors • Dual-rail Mellanox EDR simulation • 6 NVIDIA Tesla V100 GPUs InfiniBand network • Peak of 3.3 ExaOps for data • 608 GB of fast memory • 250 PB IBM Spectrum analytics and artificial • 1.6 TB of NVMe memory Scale intelligence file system transferring data at 2.5 TB/s

5 Power # Site Manufacturer Computer Country Cores Rmax ST [Pflops] [MW] Oak Ridge 41 LIST: THESummit TOP10 1 IBM USA 2,282,544 122.3 8.8 National Laboratory IBM Power System, P9 22C 3.07GHz, Mellanox EDR, NVIDIA GV100 National Supercomputing Sunway TaihuLight 2 NRCPC China 10,649,600 93.0 15.4 Center in Wuxi NRCPC Sunway SW26010, 260C 1.45GHz Lawrence Livermore Sierra 3 IBM USA 1,572,480 71.6 National Laboratory IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA GV100 National University of Tianhe-2A 4 NUDT China 4,981,760 61.4 18.5 Defense Technology ANUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Matrix-2000 National Institute of Advanced AI Bridging Cloud Infrastructure (ABCI) 5 Fujitsu Japan 391,680 19.9 1.65 Industrial Science and Technology PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100 Swiss National Supercomputing Piz Daint 6 Cray Switzerland 361,760 19.6 2.27 Centre (CSCS) Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100 Oak Ridge Titan 7 Cray USA 560,640 17.6 8.21 National Laboratory Cray XK7, Opteron 16C 2.2GHz, Gemini, NVIDIA K20x Lawrence Livermore Sequoia 8 IBM USA 1,572,864 17.2 7.89 National Laboratory BlueGene/Q, Power BQC 16C 1.6GHz, Custom Los Alamos NL / Trinity 9 Cray USA 979,968 14.1 3.84 Sandia NL Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries Cori Lawrence Berkeley 10 Cray Cray XC40, USA 622,336 14.0 3.94 National Laboratory Intel Xeons Phi 7250 68C 1.4 GHz, Aries AVERAGE SYSTEM AGE

25

20

15

10 7.6 month Age [Months] 5

0 RANK AT WHICH HALF OF TOTAL PERFORMANCE IS ACCUMULATED 100

80

60

40

20

0 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 PERFORMANCE DEVELOPMENT June 2013 10 Eflop/s 1.00E+10 1.21 EFlop/s 1 Eflop/s 1.00E+09 122 PFlop/s 1.00E+08 100 Pflop/s 1.00E+07 10 Pflop/s SUM 716 TFlop/s 1.00E+06 1 Pflop/s 1.00E+05 100 Tflop/s 1.00E+04 10 Tflop/s N=1 1.00E+03 1 Tflop/s 1.17 TFlop/s 1.00E+02 100 Gflop/s N=500 59.7 GFlop/s 1.00E+01 10 Gflop/s June 2008 1.00E+00 1 Gflop/s 422 MFlop/s 1.00E-01 100 Mflop/s 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 ANNUAL PERFORMANCE INCREASE OF THE TOP500

2.6 2.4 TOP500 2.2 1000 x 2 TOP500: Averages 11 y 1.8 15 y 1.6 Moore’s Law 20 y 1.4 1.2 1 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 25 Years – 51 Editions

How do we integrate/aggregate over time/editions? • System counts are focused on the low end • Moore’s Law overpowers everything performance based

• Normalize each list by average performance • HPL not Peak • Average performance not max (#1) or min • Add up contributions from various lists over time • Each full lists contributed a total of 500 • 51 lists together have a total weight of 25,500 No. 1’s – HPL R_max

11.0E+06 Eflop/s Sunway TaihuLight 1001.0E+05 Pflop/s Tianhe-2 Titan 10 Pflop/s Sequoia 1.0E+04 1 Pflop/s Tianhe-1A 1.0E+03 100 Tflop/s Roadrunner 1.0E+02 BlueGene/L 10 Tflop/s Earth-Simulator 1.0E+01 ASCI White 1 Tflop/s ASCI Red 1.0E+00 CP-PACS/2048 SR2201/1024 100 Gflop/s 1.0E-01 XP/S140 Numerical Wind Tunnel 10 Gflop/s 1.0E-02 CM-5/1024 1995 2000 2005 2010 2015 # 1’s – Accumulated Norm-HPL

500 Summit Sunway TaihuLight 450 Tianhe-2 Titan 400 Sequoia 350 K computer Tianhe-1A 300 Jaguar Roadrunner 250 BlueGene/L 200 Earth-Simulator ASCI White 150 ASCI Red CP-PACS/2048 100 SR2201/1024 50 XP/S140 Numerical Wind Tunnel 0 CM-5/1024 1995 2000 2005 2010 2015 Dominant Sites # Site Country Norm-HPL 1 LLNL USA 1,504 2 LANL USA 816 3 ORNL USA 795 4 SNL USA 658 5 NSCC Guangzhou China 474 6 RIKEN AICS Japan 361 7 NASA/Ames USA 356 8 FZ Jülich 339 9 JAMSTEC Japan 325 10 NERSC / LBNL USA 310 11 ANL USA 305 COUNTRIES

500 China

400 Korea, South 300 200 100 Germany 0 Japan 1993 1995 1997 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 COUNTRIES / SYSTEM SHARE

South Korea, 1% Others, 10% United States Ireland, 1% China , 2% Japan France, 4% United United Kindom Germany, 4% States, 25% United Kindom, Germany 5% China, 41% France Japan, 7% Netherlands Ireland PRODUCERS

500 400 China 300

200 Europe

100 Japan 0 USA 1993 1995 1997 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 VENDORS / SYSTEM SHARE

Penguin Others, 38, 8% Fujitsu, Compung, 13, 3% 11, 2% HPE Dell EMC, 13, 3% Lenovo, 117, 23% Huawei, 14, 3% IBM, 18, 4% HPE, 79, Cray Inc. Bull, 21, 4% 16% Bull Cray Inc., 53, IBM 10% Inspur, 68, 13% Huawei Sugon, 55, 11%

# of systems, % of 500 VENDORS / PERFORMANCE SHARE

others, 132, 11% Lenovo, 143, Lenovo NRCPC, 12% Penguin 94, 8% HPE Compung, HPE, 120, 10% 15, 1% Inspur Fujitsu, 64, 5% Sugon Inspur, 73, 6% Dell EMC, 26, 2% Cray Inc. Huawei, 12, IBM, 239, Sugon, 53, 4% Bull 1% 20% IBM Cray Inc., 188, Huawei Bull, 52, 4% 16%

Sum of Pflop/s, % of whole list ACCELERATORS

130 Matrix-2000 120 PEZY-SC 110 100 Kepler/Phi 90 Xeon Phi Main 80 Intel Xeon Phi 70 Clearspeed 60 IBM Systems 50 40 ATI Radeon 30 Nvidia Volta 20 Nvidia Pascal 10 Nvidia Kepler 0 Nvidia Fermi 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 PERFORMANCE SHARE OF ACCELERATORS

60%

50% Xeon Phi Main 40% Accelerators 30%

Performance 20%

10% Fracon of Total TOP500 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 MOST ENERGY EFFICIENT ARCHITECTURES Rmax/ Computer Power Shoubou system B, ZettaScaler-2.2 Xeon 16C 1.3GHz Infiniband EDR PEZY-SC2 18.4 Suiren2, ZettaScaler-2.2 Xeon 16C 1.3GHz Infiniband EDR PEZY-SC2 16.8 Sakura, ZettaScaler-2.2 Xeon 8C 2.3GHz Infiniband EDR PEZY-SC2 16.7 DGX Saturn V, NVIDIA DGX-1 Volta36 Xeon 20C 2.2GHz Infiniband EDR Tesla V100 15.1* Summit, IBM Power System Power9 22C 3.07GHz Infiniband EDR Volta GV100 13.9 3.0, SGI ICE XA Xeon 14C 2.4GHz Intel Omni-Path Tesla P100 SXM2 13.7* AIST AI Cloud, NEC 4U-8GPU Xeon 10C 1.8GHz Infiniband EDR Tesla P100 SXM2 12.7 AI Bridging Cloud Infrastructure (ABCI), Xeon Gold 20C Infiniband EDR Tesla V100 SXM2 12.1 Fujitsu PRIMERGY, NVIDIA Tesla V100 2.4GHz MareNostrum P9 CTE, IBM Power System Power9 22C 3.1GHz Infiniband EDR Tesla V100 11.9 Wilkes-2, Dell C4130 Xeon 12C 2.2GHz Infiniband EDR Tesla P100 10.4

* Efficiency based on Power opmized HPL runs of equal size to TOP500 run. [Gflops/Wa] POWER EFFICIENCY

7 TOP10

/W] 6 5

Gflops TOP50 4

3

/Power [ 2 1 TOP500 Linpack 0 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 ENERGY EFFICIENCY

20 ZeaScaler-2.2 18 Max-Efficiency /W] 16 Tsubame 3.0 14 DGX SaturnV Gflops 12 ZeaScaler-1.6

10 Tsubame KFC 8 AMD FirePro NVIDIA K20x – K80 /Power [ 6 Mic 4 BlueGene/Q Cell

Linpack 2 TOP500 Average 0 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 HPCG HPCG/ HPCG/ # T Site Manufacturer Computer Country Rmax ST [Pflop/s] [Pflop/s] Peak HPL 41 LIST: SummitTHE TOP10 1 1 Oak Ridge National Laboratory IBM IBM Power System, USA 2.9258 122.3 1.6% 2.4% P9 22C 3.07 GHz, Volta GV100, EDR Lawrence Livermore Sierra 2 3 IBM USA 1.7957 71.6 1.5% 2.5% National Laboratory IBM Power System, P9 22C 3.1 GHz, Volta GV100, EDR RIKEN Advanced Institute for K Computer 3 16 Fujitsu Japan 0.6027 10.5 5.3% 5.7% Computational Science SPARC64 VIIIfx 2.0GHz, Tofu Interconnect Trinity Los Alamos NL / 4 9 Cray Cray XC40, USA 0.5461 14.1 1.2% 3.9% Sandia NL Intel Xeon Phi 7250 68C 1.4GHz, Aries Swiss National Supercomputing Piz Daint 5 6 Cray Switzerland 0.4864 19.6 1.9% 2.5% Centre (CSCS) Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100 National Supercomputing Sunway TaihuLight 6 2 NRCPC China 0.4808 93.0 0.4% 0.5% Center in Wuxi NRCPC Sunway SW26010, 260C 1.45GHz JCAHPC Oakforest-PACS 7 12 Fujitsu Japan 0.3855 13.6 1.5% 2.8% Joint Center for Advanced HPC PRIMERGY CX1640 M1, Intel Xeons Phi 7250 68C 1.4 GHz, OmniPath Lawrence Berkeley Cori 8 10 Cray USA 0.3554 14.0 1.3% 2.5% National Laboratory Cray XC40, Intel Xeons Phi 7250 68C 1.4 GHz, Aries Commissariat a l'Energie Tera-1000-2 9 14 Bull France 0.3338 12.0 1.4% 2.8% Atomique (CEA) Bull Sequana X1000, Intel Xeon Phi 7250 68C 1.4 GHz, Bull BXI 1.2 Lawrence Livermore Sequoia 10 8 IBM USA 0.3304 17.2 1.6% 1.9% National Laboratory BlueGene/Q, Power BQC 16C 1.6GHz, Custom ISC18 TOP500 HIGHLIGHTS • ORNL’s Summit is new #1 (IBM, NVIDIA, Mellanox). • Four ‘new’ system in the TOP5! (Summit, Sierra, Tianhe-2A, ABCI) • Slow-down in performance growth since 2013 goes hand in hand with – Longer system usage (~2x) and – Concentration of capabilities at the top (relatively larger top systems) • Lenovo is first Chinese manufacturer to sell systems in numbers outside of China (everywhere) (China: 20, USA: 21, ROW: 23). • Accelerated system get finally adopted by industrial users (25% of new systems in November + June). • Summit and Sierra are the first systems to achieve over 1 Pflop/s on HPCG.