Highlights of the 53rd TOP500 List

Total pages: 16

File type: PDF, size: 1020 KB

Highlights of the 53rd TOP500 List
Erich Strohmaier, ISC 2019, Frankfurt, June 17, 2019

ISC 2019 TOP500 Topics
• Petaflops are everywhere!
• "New" TOP10
• Dennard scaling and the TOP500
• China: top consumer and producer? A closer look
• Green500, HPCG
• Future of TOP500

The TOP10 (system; manufacturer; site, country; cores; Rmax; power)
1. Summit, IBM Power System, P9 22C 3.07GHz, Mellanox EDR, NVIDIA GV100; IBM; Oak Ridge National Laboratory, USA; 2,414,592 cores; 148.6 Pflop/s; 10.1 MW
2. Sierra, IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA GV100; IBM; Lawrence Livermore National Laboratory, USA; 1,572,480 cores; 94.6 Pflop/s; 7.4 MW
3. Sunway TaihuLight, NRCPC Sunway SW26010, 260C 1.45GHz; NRCPC; National Supercomputing Center in Wuxi, China; 10,649,600 cores; 93.0 Pflop/s; 15.4 MW
4. Tianhe-2A, NUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Matrix-2000; NUDT; National University of Defense Technology, China; 4,981,760 cores; 61.4 Pflop/s; 18.5 MW
5. Frontera, Dell C6420, Xeon Platinum 8280 28C 2.7GHz, Mellanox HDR; Dell; Texas Advanced Computing Center / Univ. of Texas, USA; 448,448 cores; 23.5 Pflop/s
6. Piz Daint, Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100; Cray; Swiss National Supercomputing Centre (CSCS), Switzerland; 387,872 cores; 21.2 Pflop/s; 2.38 MW
7. Trinity, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries; Cray; Los Alamos NL / Sandia NL, USA; 979,072 cores; 20.2 Pflop/s; 7.58 MW
8. AI Bridging Cloud Infrastructure (ABCI), Fujitsu PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100; Fujitsu; National Institute of Advanced Industrial Science and Technology, Japan; 391,680 cores; 19.9 Pflop/s; 1.65 MW
9. SuperMUC-NG, Lenovo ThinkSystem SD530, Xeon Platinum 8174 24C 3.1GHz, Intel Omni-Path; Lenovo; Leibniz Rechenzentrum, Germany; 305,856 cores; 19.5 Pflop/s
10. Lassen, IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA Tesla V100; IBM; Lawrence Livermore National Laboratory, USA; 288,288 cores; 18.2 Pflop/s

Replacement Rate
[Chart: number of new systems on each list, 1993-2019; the latest labeled value is 96.]

Performance Development
[Chart: total (SUM), N=1, and N=500 performance on a log scale, 1993-2019.] Current values: SUM 1.56 EFlop/s, N=1 149 PFlop/s, N=500 1.01 PFlop/s; the 1993 starting points were 1.17 TFlop/s (SUM), 59.7 GFlop/s (N=1), and 422 MFlop/s (N=500). The slowdown of Dennard scaling is marked at June 2008.

Average System Age
[Chart: average age of listed systems in months, 1995-2019; a value of 7.6 months is marked.] Fewer incentives to upgrade systems lead to older systems.

Rank at Which Half of Total Performance Is Accumulated
[Chart, 1993-2019.] Less frequent purchases allow system sizes to increase and lead to a more top-heavy TOP500.

Annual Performance Increase of the TOP500
[Chart: year-over-year growth factor of the TOP500 averages, 1993-2019, with reference lines for 1000x in 11, 15, and 20 years and for Moore's Law.] Less frequent purchases on a steady budget lead to larger system sizes, which temporarily keeps the overall TOP500 performance increase up.
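For orientation, the annual growth factors behind those 1000x-in-N-years reference lines follow from simple arithmetic; a minimal sketch (not from the slides):

```python
# Annual growth factor implied by "1000x in N years": g = 1000 ** (1 / N)
for years in (11, 15, 20):
    growth = 1000 ** (1 / years)
    print(f"1000x in {years} years -> {growth:.2f}x per year")
# prints ~1.87x, ~1.58x and ~1.41x per year; for comparison, a Moore's-Law
# doubling every two years corresponds to 2 ** 0.5, i.e. about 1.41x per year.
```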
Performance Development, Annotated
[The same performance-development chart with June 2013 marked, five years after the June 2008 Dennard-scaling marker.]

Projected Performance Development
[Chart: SUM, N=1, and N=500 trends extrapolated to 2024.]

Countries
[Chart: number of systems by country, 1993-2019, for the United States, Japan, Germany, United Kingdom, France, Canada, Italy, South Korea, and China.]

Countries / System Share
China 44%, United States 23%, Japan 6%, France 4%, United Kingdom 3%, Germany 3%, Ireland 3%, Netherlands 2%, Canada 2%, others 10%.

Countries / Performance Share
United States 38%, China 30%, Japan 8%, France 4%, Germany 4%, Switzerland 4%, United Kingdom 3%, Italy 2%, Netherlands 1%, Ireland 1%, others 5% (77 Pflop/s).

Countries (TOP50) / System Share
United States 38%, Japan 18%, France 10%, Germany 8%, United Kingdom 6%, China 6%, Italy 4%, others 10%.

Producers
[Chart: number of systems by producer region, 1993-2019: USA, Japan, Europe, China, Russia.]

Vendors / System Share (number of systems, % of 500)
Lenovo 175 (35%), Inspur 71 (14%), Sugon 63 (13%), Cray Inc. 42 (9%), HPE 40 (8%), Bull 21 (4%), Dell EMC 17 (3%), IBM 16 (3%), Fujitsu 14 (3%), others 41 (8%).

Vendors / Performance Share (sum of Pflop/s, % of whole list)
IBM 324 (21%), Lenovo 306 (20%), Cray Inc. 195 (12%), HPE 120 (8%), Sugon 96 (6%), NRCPC 93 (6%), Inspur 87 (6%), Fujitsu 71 (4%), NUDT 66 (4%), Dell EMC 58 (4%), Bull 54 (3%), others 90 (6%).

Vendors (TOP50) / System Share
Cray Inc. 14 (28%), HPE 10 (20%), IBM 7 (14%), Fujitsu 5 (10%), Lenovo 3 (6%), Bull 3 (6%), others 8 (16%).

Lenovo* / System Share, 2018+2019
China 69 (42%), United States 38 (23%), Ireland 13 (8%), Netherlands 12 (7%), United Kingdom 6 (4%), Singapore 5 (3%), Canada 4 (2%), Australia 3 (2%), others 14 (9%).
* Inspur, Sugon, and Huawei have not sold outside of China yet.
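Comparing the two vendor charts above also shows how different the average system size per vendor is; a minimal sketch using only the numbers quoted above (the dictionary layout is mine):

```python
# (systems, total Rmax in Pflop/s) copied from the two vendor charts above
vendors = {
    "IBM": (16, 324), "Lenovo": (175, 306), "Cray Inc.": (42, 195),
    "HPE": (40, 120), "Sugon": (63, 96), "Inspur": (71, 87),
}
for name, (count, pflops) in vendors.items():
    print(f"{name}: {pflops / count:.1f} Pflop/s per system on average")
# IBM averages ~20 Pflop/s per system while Lenovo, Sugon, and Inspur average
# ~1-2 Pflop/s, i.e. a few very large machines versus many smaller ones.
```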
Accelerators
[Chart: number of accelerated systems by accelerator type, 2006-2019. Types shown: Nvidia Fermi, Kepler, Pascal, and Volta; ATI Radeon; IBM Cell; ClearSpeed; Intel Xeon Phi as coprocessor and as main processor; Kepler/Phi; PEZY-SC; Matrix-2000.]

Accelerators (TOP50) / System Share
Nvidia Volta 9, Xeon Phi as main processor 8, Nvidia Pascal 3, Power BQC 3, Nvidia Kepler 1, Matrix-2000 1, Deep Computing Processor 1, ShenWei 1, none 23.

Most Energy-Efficient Architectures [Gflops/Watt]
1. Shoubu system B, ZettaScaler-2.2, Xeon 16C 1.3GHz, Infiniband EDR, PEZY-SC2: 17.6
2. DGX Saturn V, NVIDIA DGX-1 Volta36, Xeon 20C 2.2GHz, Infiniband EDR, Tesla V100: 15.1*
3. Summit, IBM Power System, Power9 22C 3.07GHz, Infiniband EDR, Volta GV100: 14.7
4. AI Bridging Cloud Infrastructure (ABCI), Fujitsu PRIMERGY, Xeon Gold 20C 2.4GHz, Infiniband EDR, NVIDIA Tesla V100 SXM2: 14.4*
5. MareNostrum P9 CTE, IBM Power System, Power9 22C 3.1GHz, Infiniband EDR, Tesla V100: 14.1
6. Tsubame 3.0, SGI ICE XA, Xeon 14C 2.4GHz, Intel Omni-Path, Tesla P100 SXM2: 13.7*
7. PANGEA III, IBM Power System, Power9 18C 3.45GHz, Infiniband EDR, Volta GV100: 13.1
8. Sierra, IBM Power System, Power9 22C 3.1GHz, Infiniband EDR, Volta GV100: 12.7
9. ACS (PreE), Sugon TC8600, Hygon Dhyana (Epyc 7501) 32C 2GHz, 200Gb 6D Torus, Deep Computing Processor: 11.4
10. Taiwania 2, QCT QuantaGrid D52G-4U/LC, Xeon Gold 6154 18C 3GHz, InfiniBand EDR, Tesla V100 SXM2: 11.3
* Efficiency based on power-optimized HPL runs of equal size to the TOP500 run.

Power Efficiency
[Chart: Linpack performance per power (Gflops/W) for the TOP10, TOP50, and full TOP500, 2008-2019.]

Energy Efficiency
[Chart: maximum efficiency (Gflops/W) versus the TOP500 average, 2008-2019, with the leading technologies marked: Cell, BlueGene/Q, MIC, NVIDIA K20x-K80, AMD FirePro, Tsubame KFC, ZettaScaler-1.6, DGX SaturnV, Tsubame 3.0, ZettaScaler-2.2.]

HPCG (system; manufacturer; site, country; HPCG result; Rmax; HPCG/Peak; HPCG/HPL)
1. (TOP500 #1) Summit, IBM Power System, P9 22C 3.07GHz, Volta GV100, EDR; IBM; Oak Ridge National Laboratory, USA; 2.9258 Pflop/s; 148.6 Pflop/s; 1.6%; 2.0%
2. (TOP500 #2) Sierra, IBM Power System, P9 22C 3.1GHz, Volta GV100, EDR; IBM; Lawrence Livermore National Laboratory, USA; 1.7957 Pflop/s; 94.6 Pflop/s; 1.5%; 1.9%
3. (TOP500 #18) K Computer, SPARC64 VIIIfx 2.0GHz, Tofu Interconnect; Fujitsu; RIKEN Advanced Institute for Computational Science, Japan; 0.6027 Pflop/s; 10.5 Pflop/s; 5.3%; 5.7%
4. (TOP500 #6) Trinity, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries; Cray; Los Alamos NL / Sandia NL, USA; 0.5461 Pflop/s; 20.2 Pflop/s; 1.2%; 2.7%
5. (TOP500 #7) AI Bridging Cloud Infrastructure (ABCI), PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100; Fujitsu; National Institute of Advanced Industrial Science and Technology, Japan; 0.5089 Pflop/s; 19.9 Pflop/s; 1.6%; 2.6%
6. (TOP500 #5) Piz Daint, Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100; Cray; Swiss National Supercomputing Centre (CSCS), Switzerland; 0.4970 Pflop/s; 21.2 Pflop/s; 1.8%; 2.3%
7. (TOP500 #3) Sunway TaihuLight, NRCPC Sunway SW26010, 260C 1.45GHz; NRCPC; National Supercomputing Center in Wuxi, China; 0.4808 Pflop/s; 93.0 Pflop/s; 0.4%; 0.5%
8. (TOP500 #13) Nurion, Cray CS500, Intel Xeon Phi 7250 68C 1.4GHz, OmniPath; Cray; Korea Institute of Science and Technology Information, South Korea; 0.3915 Pflop/s; 13.9 Pflop/s; 1.5%; 2.8%
9. (TOP500 #14) Oakforest-PACS, PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, OmniPath; Fujitsu; JCAHPC (Joint Center for Advanced HPC), Japan; 0.3855 Pflop/s; 13.6 Pflop/s; 1.5%; 2.8%
10. (TOP500 #12) Cori, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries; Cray; Lawrence Berkeley National Laboratory, USA; 0.3554 Pflop/s; 14.0 Pflop/s; 1.3%; 2.5%

ISC 2019 TOP500 Highlights
• Petaflops are everywhere!
• The slowdown of Dennard scaling led to longer procurement cycles, which delayed the slowdown at the very top by about five years (2008 to 2013). Don't expect this for Moore's Law!
• China: top consumer and producer overall, but not in the TOP50 (yet).
• A future increase in architectural diversity will necessitate a flexible approach to benchmarking.
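As a quick cross-check of the efficiency metrics in the tables above, both the Gflops/W figure and the HPCG/HPL fraction follow directly from the listed numbers; a minimal sketch for Summit:

```python
# Summit, from the lists above: Rmax 148.6 Pflop/s, power 10.1 MW, HPCG 2.9258 Pflop/s
rmax_pflops, power_mw, hpcg_pflops = 148.6, 10.1, 2.9258

gflops_per_watt = (rmax_pflops * 1e6) / (power_mw * 1e6)  # Pflop/s -> Gflop/s, MW -> W
hpcg_over_hpl = hpcg_pflops / rmax_pflops

print(f"{gflops_per_watt:.1f} Gflops/W")  # ~14.7, as in the energy-efficiency table
print(f"{hpcg_over_hpl:.1%} HPCG/HPL")    # ~2.0%, as in the HPCG table
```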
Recommended publications
  • Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators
Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators. Toshio Endo (Graduate School of Information Science and Engineering, Tokyo Institute of Technology), Akira Nukada (Global Scientific Information and Computing Center, Tokyo Institute of Technology), Satoshi Matsuoka (Global Scientific Information and Computing Center, Tokyo Institute of Technology / National Institute of Informatics), and Naoya Maruyama (Global Scientific Information and Computing Center, Tokyo Institute of Technology), Tokyo, Japan.
    Abstract: We report Linpack benchmark results on the TSUBAME supercomputer, a large scale heterogeneous system equipped with NVIDIA Tesla GPUs and ClearSpeed SIMD accelerators. With all of 10,480 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs, we have achieved 87.01 TFlops, which is the third record as a heterogeneous system in the world. This paper describes the careful tuning and load balancing method required to achieve this performance. On the other hand, since the peak speed is 163 TFlops, the efficiency is 53%, which is lower than other systems.
    From the introduction: Unlike Roadrunner or other systems described above, it includes two types of accelerators. This is due to incremental upgrade of the system, which has been the case in commodity CPU clusters; they may have processors with different speeds as a result of incremental upgrade. In this paper, we present a Linpack implementation and evaluation results on TSUBAME with 10,480 Opteron cores, 624 Tesla GPUs and 648 ClearSpeed accelerators. In the evaluation, we also used a ...
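The abstract's central issue, balancing HPL work across devices with very different speeds, can be illustrated with a simple proportional block split; this is only an illustrative sketch under assumed per-device peaks, not the paper's actual algorithm or measurements:

```python
# Illustrative only: split N block-columns across heterogeneous devices in
# proportion to an assumed peak throughput, so faster devices get more work.
def split_blocks(num_blocks, peaks):
    total = sum(peaks.values())
    shares = {dev: int(num_blocks * p / total) for dev, p in peaks.items()}
    fastest = max(peaks, key=peaks.get)            # hand the rounding remainder
    shares[fastest] += num_blocks - sum(shares.values())
    return shares

# placeholder per-unit peaks in GFlop/s (made up for illustration)
peaks = {"Opteron node": 70.0, "ClearSpeed board": 75.0, "Tesla GPU": 85.0}
print(split_blocks(240, peaks))
# -> {'Opteron node': 73, 'ClearSpeed board': 78, 'Tesla GPU': 89}
```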
  • Tsubame 2.5 Towards 3.0 and Beyond to Exascale
Being Very Green with TSUBAME 2.5, towards 3.0 and beyond to Exascale. Satoshi Matsuoka, Professor, Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology; ACM Fellow / SC13 Tech Program Chair. NVIDIA Theater Presentation, November 19, 2013, Denver, Colorado.
    TSUBAME2.0 (Nov. 1, 2010): "The Greenest Production Supercomputer in the World." [Slide: TSUBAME2.0 new development, with memory bandwidths at several levels of the hierarchy (>600 TB/s system, >12 TB/s, >1.6 TB/s, >400 GB/s per node), network bandwidth (220 Tbps bisection, 80 Gbps per node), power (1.4 MW system max, 35 kW, ~1 kW per node), and 40 nm / 32 nm process technology.]
    Performance comparison of CPU vs. GPU: [chart of peak performance in GFLOPS and memory bandwidth in GB/s] roughly a 5-6x socket-to-socket advantage for the GPU in both compute and memory bandwidth at the same power (200 W GPU vs. 200 W CPU + memory + network + ...).
    TSUBAME2.0 compute node (thin node, productized as the HP ProLiant SL390s; the HP SL390G7 was developed for TSUBAME 2.0): 1.6 TFlops, 400 GB/s memory bandwidth, 80 Gbps network via InfiniBand QDR x2, ~1 kW max.
    • GPU: NVIDIA Fermi M2050 x3, 515 GFlops and 3 GB of memory per GPU
    • CPU: Intel Westmere-EP 2.93 GHz x2 (12 cores/node)
    • Multiple I/O chips, 72 PCI-e lanes (16 x 4 + 4 x 2) feeding 3 GPUs + 2 IB QDR
    • Memory: 54 or 96 GB DDR3-1333; SSD: 60 GB x2 or 120 GB x2
    • Total system: 2.4 PFlops peak, ~100 TB memory, ~200 TB SSD
    2010: TSUBAME2.0 was No. 1 in Japan, exceeding all other Japanese centers on the Top500 combined (2.3 PetaFlops total); 2.4 Petaflops peak, #4 on the Top500, Nov. ...
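The quoted 1.6 TFlops node peak can be roughly reproduced from the component figures above; a back-of-the-envelope sketch (the 4 double-precision flops/cycle/core for Westmere-EP is my assumption, not stated in the slides):

```python
# TSUBAME2.0 thin node: 3 x NVIDIA M2050 GPUs plus 2 x 6-core Westmere-EP at 2.93 GHz
gpu_peak = 3 * 515.0            # GFlop/s double precision, from the slides
cpu_peak = 2 * 6 * 2.93 * 4     # sockets * cores * GHz * assumed 4 DP flops/cycle
print(f"node peak ~ {(gpu_peak + cpu_peak) / 1000:.2f} TFlop/s")  # ~1.69, quoted as 1.6 TFlops
```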
  • Science-Driven Development of Supercomputer Fugaku
Special Contribution: Science-driven Development of Supercomputer Fugaku. Hirofumi Tomita, Team Leader, Flagship 2020 Project, RIKEN Center for Computational Science.
    In this article, I would like to take a historical look at activities on the application side during supercomputer Fugaku's climb to the top of supercomputer rankings as seen from a vantage point in early July 2020. I would like to describe, in particular, the change in mindset that took place on the application side along the path toward Fugaku's top ranking in four benchmarks in 2020 that all began with the Application Working Group and Computer Architecture/Compiler/System Software Working Group in 2011. Somewhere along this path, the application side joined forces with the architecture/system side based on the keyword "co-design." During this journey, there were times when our opinions clashed and when efforts to solve problems came to a halt, but there were also times when we took bold steps in a unified manner to surmount difficult obstacles. I will leave the description of specific technical debates to other articles, but here, I would like to describe the flow of Fugaku development from the application side based heavily on a "science-driven" approach along with some of my personal opinions. Actually, we are only halfway along this path. We look forward to the many scientific achievements and solutions to social problems that Fugaku is expected to bring about.
    1. Introduction. On June 22, 2020, the supercomputer Fugaku became the first supercomputer in history to simultaneously rank No. ... In this article, I would like to take a look back at the flow along the path toward Fugaku's top ranking as seen from the application side from a vantage point ...
  • Towards Exascale Computing
Towards Exascale Computing: The ECOSCALE Approach. Dirk Koch, The University of Manchester, UK ([email protected]).
    Motivation: let's build a 1,000,000,000,000,000,000 FLOPS computer (exascale computing: 10^18 FLOPS, one quintillion or a billion billion floating-point calculations per second). [Slide contrasting the 10^18 FLOPS target with the 1975 MOS 6502 (Commodore 64, BBC Micro).]
    Sunway TaihuLight supercomputer:
    • 2016 (fully operational)
    • 125,436,000,000,000,000 FLOPS (125.436 petaFLOPS)
    • Architecture: Sunway SW26010 260C (Digital Alpha clone), 1.45 GHz, 10,649,600 cores
    • Power: "The cooling system for TaihuLight uses a closed-coupled chilled water outfit suited for 28 MW with a custom liquid cooling unit" (https://www.nextplatform.com/2016/06/20/look-inside-chinas-chart-topping-new-supercomputer/)
    • Cost: ~US$270 million
    TOP500 performance development: we need more than the performance of all TOP500 machines together!
    TaihuLight for exascale computing? We would need 8x the world's fastest supercomputer:
    • Architecture: Sunway SW26010 260C (Digital Alpha clone) @1.45GHz: more than 85M cores
    • Power: 224 MW (including cooling), costing roughly US$40K/hour or US$340M/year; generated from coal, that is 2,302,195 tons of CO2 per year
    • Cost: US$2.16 billion
    We have to get at least 10x better in energy efficiency and 2-3x better in cost. Also needed: scalable programming models.
    Alternative: Green500. Shoubu supercomputer (#1 on the Green500 in 2015):
    • Cores: 1,181,952
    • Theoretical peak: 1,535.83 TFLOPS
    • Memory: 82 TB
    • Processor: Xeon E5-2618Lv3 8C 2.3GHz ...
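The slide's 8x scaling argument is simple arithmetic on the TaihuLight figures above; a minimal sketch of the same estimate:

```python
# Scale TaihuLight (125.436 PFLOPS, 10,649,600 cores, ~28 MW incl. cooling,
# ~US$270M) up to 1 EFLOPS by straight multiplication, as the slide does.
factor = 1000 / 125.436                                    # ~8x
print(f"scale factor ~ {factor:.1f}x")
print(f"cores ~ {factor * 10_649_600 / 1e6:.0f} million")  # > 85 million cores
print(f"power ~ {factor * 28:.0f} MW incl. cooling")       # ~223 MW (slide rounds 8 x 28 to 224 MW)
print(f"cost  ~ US$ {factor * 0.27:.2f} billion")          # ~2.15 (slide: 2.16 at exactly 8x)
```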
  • Computational PHYSICS Shuai Dong
Computational Physics. Shuai Dong. (Title slide caption: "Evolution: Is this our final end-result?")
    Outline:
    • Brief history of computers
    • Supercomputers
    • Brief introduction of computational science
    • Some basic concepts, tools, examples
    Birth of computational science (physics): the first electronic general-purpose computer, ENIAC (Electronic Numerical Integrator And Computer), was constructed in the Moore School of Electrical Engineering, University of Pennsylvania, in 1946.
    ENIAC:
    • Design and construction was financed by the United States Army.
    • Designed to calculate artillery firing tables for the United States Army's Ballistic Research Laboratory.
    • It was heralded in the press as a "Giant Brain".
    • It had a speed of one thousand times that of electro-mechanical machines.
    • ENIAC was named an IEEE Milestone in 1987.
    Giant Brain: ENIAC contained 17,468 vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000 resistors, 10,000 capacitors and around 5 million hand-soldered joints. It weighed more than 27 tons, took up 167 m2, and consumed 150 kW of power. This led to the rumor that whenever the computer was switched on, lights in Philadelphia dimmed. Input was from an IBM card reader, and an IBM card punch was used for output.
    Development of micro-computers: the 1981 IBM PC 5150 (8088 CPU, 5 MHz, floppy disk or cassette) versus a modern PC (Intel i3/i5/i7 CPU, ~3 GHz, solid state disk); the 1984 Macintosh (Steve Jobs) versus a modern iMac.
    Supercomputers: the CDC (Control Data Corporation) 6600, released in 1964, is generally considered the first supercomputer. Seymour Roger Cray (1925-1996), the father of supercomputing, created the supercomputer industry (Cray-1, Cray Inc.).
  • FCMSSR Meeting 2018-01 All Slides
Federal Committee for Meteorological Services and Supporting Research (FCMSSR). Dr. Neil Jacobs, Assistant Secretary for Environmental Observation and Prediction and FCMSSR Chair. April 30, 2018. Office of the Federal Coordinator for Meteorology Services and Supporting Research.
    Agenda:
    2:30 – Opening Remarks (Dr. Neil Jacobs, NOAA)
    2:40 – Action Item Review (Dr. Bill Schulz, OFCM)
    2:45 – Federal Coordinator's Update (OFCM)
    3:00 – Implementing Section 402 of the Weather Research and Forecasting Innovation Act of 2017 (OFCM)
    3:20 – Federal Meteorological Services and Supporting Research Strategic Plan and Annual Report (OFCM)
    3:30 – Qualification Standards for Civilian Meteorologists (Mr. Ralph Stoffler, USAF A3-W)
    3:50 – National Earth System Prediction Capability (ESPC) High Performance Computing Summary (ESPC Staff)
    4:10 – Open Discussion (All)
    4:20 – Wrap-Up (Dr. Neil Jacobs, NOAA)
    FCMSSR Action Items:
    • 2017-2.1 (office responsible: OFCM, FCMSSR Agencies; status: Working; due 04/30/18): Reconvene JAG/ICAWS to develop options to broaden FCMSSR chairmanship beyond the Undersecretary of Commerce for Oceans and Atmosphere. Draft a modified FCMSSR charter to include ICAWS duties as outlined in Section 402 of the Weather Research and Forecasting Innovation Act of 2017 and secure ICMSSR concurrence. Recommend new due date: 30 June 2018. Comments: JAG/ICAWS convened; options presented to ICMSSR, then FCMSSR, with a revised charter; draft charter reviewed by ICMSSR; pending FCMSSR and OSTP approval to finalize the charter for signature.
    • 2017-2.2 (office responsible: OFCM; status: Closed; due 11/03/17): Publish the Strategic Plan for Federal Weather Coordination as presented during the 24 October 2017 FCMSSR meeting. Comment: 1/12/18: plan published on the OFCM website.
  • (Intel® OPA) for Tsubame 3
CASE STUDY: High Performance Computing (HPC) with Intel® Omni-Path Architecture. Tokyo Institute of Technology chooses Intel® Omni-Path Architecture for Tsubame 3. Price/performance, thermal stability, and adaptive routing are key features for enabling #1 on the Green 500 list.
    Challenge: How do you make a good thing better? Professor Satoshi Matsuoka of the Tokyo Institute of Technology (Tokyo Tech) has been designing and building high-performance computing (HPC) clusters for 20 years. Among the systems he and his team at Tokyo Tech have architected, Tsubame 1 (2006) and Tsubame 2 (2010) have shown him the importance of heterogeneous HPC systems for scientific research, analytics, and artificial intelligence (AI). Tsubame 2, built on Intel® Xeon® processors and Nvidia* GPUs with InfiniBand* QDR, was Japan's first peta-scale HPC production system: it achieved #4 on the Top500, was the #1 Green 500 production supercomputer, and was the fastest supercomputer in Japan at the time. For Matsuoka, the next-generation machine needed to take all the goodness of Tsubame 2, enhance it with new technologies to not only advance all the current and latest generations of simulation codes, but also drive the latest application targets (which included deep learning/machine learning, AI, and very big data analytics), and make it more efficient than its predecessor.
    Tsubame at a glance:
    • Tsubame 3, the second-generation large production cluster based on heterogeneous computing at Tokyo Institute of Technology (Tokyo Tech); #61 on the June 2017 Top 500 list and #1 on the June 2017 Green 500 list
    • The system is based upon HPE Apollo* 8600 blades, which are smaller than a 1U server, ...
  • This Is Your Presentation Title
Introduction to GPU/Parallel Computing. Ioannis E. Venetis, University of Patras. (PRACE training material, www.prace-ri.eu)
    Introduction to High Performance Systems. Wait, what? Aren't we here to talk about GPUs? And how to program them with CUDA? Yes, but we need to understand their place and their purpose in modern High Performance Systems. This will make it clear when it is beneficial to use them.
    Top 500 (June 2017), top five systems (site; system; CPU cores; accelerator cores; Rmax; Rpeak; power):
    1. National Supercomputing Center in Wuxi, China: Sunway TaihuLight, Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway, NRCPC; 10,649,600 CPU cores; no accelerator cores; Rmax 93,014.6 TFlop/s; Rpeak 125,435.9 TFlop/s; 15,371 kW
    2. National Super Computer Center in Guangzhou, China: Tianhe-2 (MilkyWay-2), TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P, NUDT; 3,120,000 CPU cores; 2,736,000 accelerator cores; Rmax 33,862.7 TFlop/s; Rpeak 54,902.4 TFlop/s; 17,808 kW
    3. Swiss National Supercomputing Centre (CSCS): Piz Daint, Cray XC50, Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100, Cray Inc.; 361,760 CPU cores; 297,920 accelerator cores; Rmax 19,590.0 TFlop/s; Rpeak 25,326.3 TFlop/s; 2,272 kW
    4. DOE/SC/Oak Ridge National Laboratory, United States: Titan, Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x, Cray Inc.; 560,640 CPU cores; 261,632 accelerator cores; Rmax 17,590.0 TFlop/s; Rpeak 27,112.5 TFlop/s; 8,209 kW
    5. DOE/NNSA/LLNL, United States: Sequoia, BlueGene/Q, Power BQC 16C 1.60 GHz, Custom, IBM; 1,572,864 CPU cores; no accelerator cores; Rmax 17,173.2 TFlop/s; Rpeak 20,132.7 TFlop/s; 7,890 kW
    How do ...
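Two quantities worth deriving from a table like this are the HPL efficiency (Rmax/Rpeak) and the energy efficiency (Rmax divided by power); a minimal sketch using the June 2017 numbers above:

```python
# (Rmax TFlop/s, Rpeak TFlop/s, power kW) from the June 2017 list above
systems = {
    "Sunway TaihuLight": (93014.6, 125435.9, 15371),
    "Tianhe-2":          (33862.7, 54902.4, 17808),
    "Piz Daint":         (19590.0, 25326.3, 2272),
}
for name, (rmax, rpeak, power_kw) in systems.items():
    hpl_efficiency = rmax / rpeak                         # fraction of peak sustained by HPL
    gflops_per_watt = (rmax * 1000) / (power_kw * 1000)   # TFlop/s -> GFlop/s, kW -> W
    print(f"{name}: {hpl_efficiency:.0%} of peak, {gflops_per_watt:.1f} GFlops/W")
# TaihuLight ~74% and 6.1 GFlops/W, Tianhe-2 ~62% and 1.9, Piz Daint ~77% and 8.6
```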
  • Germany's Top500 Businesses
GERMANY'S TOP500: DIGITAL BUSINESS MODELS IN SEARCH OF BUSINESS
    Contents: Foreword; Insight 1: Slow growth rates yet high sales; Insight 2: Not enough revenue is attributable to digitization; Insight 3: EU regulations are weakening innovation; Insight 4: The German federal government could turn into an innovation driver; Conclusion.
    Foreword: Large German companies are on the lookout. Their purpose: to find new growth prospects. While revenue increases of more than 5 percent on average have not been uncommon for Germany's 500 largest companies in the past, that level of growth has not occurred for the last four years. The reasons are obvious. With their high export rates, Germany's industrial companies continue to be major players in the global market. Their exports have, in fact, been so high in the past that it is now increasingly difficult to sustain their previous rates of growth. Accenture regularly examines Germany's largest companies on the basis of their ranking in "Germany's Top500," a list published every year in the German daily newspaper DIE WELT. These 500 most successful German companies generate revenue of more than one billion Euros. In previous years, they were the engines of the German economy.
    This study is intended to examine critically the opportunities arising at the beginning of a new era of technology. Accenture uses four insights to not only describe the progress that has been made in digital markets, but also suggest possible steps companies can take to overcome weak growth.
    The four insights in detail: Insight 1: Despite high levels of sales, growth among Germany's ...
  • Providing a Robust Tools Landscape for CORAL Machines
Providing a Robust Tools Landscape for CORAL Machines. 4th Workshop on Extreme Scale Programming Tools @ SC15 in Austin, Texas, November 16, 2015. Dong H. Ahn (Lawrence Livermore National Lab), Michael Brim (Oak Ridge National Lab), Scott Parker (Argonne National Lab), Gregory Watson (IBM), Wesley Bland (Intel).
    CORAL: Collaboration of ORNL, ANL, LLNL. Current DOE leadership computers: Titan (ORNL, 2012-2017), Sequoia (LLNL, 2012-2017), Mira (ANL, 2012-2017).
    Objective: procure three leadership computers to be sited at Argonne, Oak Ridge and Lawrence Livermore in 2017. Leadership computers: the RFP requests >100 PF, 2 GiB/core main memory, local NVRAM, and science performance 4x-8x Titan or Sequoia.
    Approach:
    • Competitive process: one RFP (issued by LLNL) leading to 2 R&D contracts and 3 computer procurement contracts
    • For risk reduction and to meet a broad set of requirements, 2 architectural paths will be selected, and Oak Ridge and Argonne must choose different architectures
    • Once selected, multi-year Lab-Awardee relationship to co-design computers
    • Both R&D contracts jointly managed by the 3 Labs
    • Each lab manages and negotiates its own computer procurement contract, and may exercise options to meet their specific needs
    CORAL Innovations and Value. IBM, Mellanox, and NVIDIA awarded $325M U.S. Department of Energy CORAL contracts:
    • Scalable system solution (scale up, scale down) to address a wide range of application domains
    • Modular, flexible, cost-effective
    • Directly leverages OpenPOWER partnerships and IBM's Power system roadmap
    • Air and water cooling
    • Heterogeneous ...
  • Summit Supercomputer
Cooling System Overview: Summit Supercomputer. David Grant, PE, CEM, DCEP, HPC Mechanical Engineer, Oak Ridge National Laboratory; Corresponding Member of ASHRAE TC 9.9; Infrastructure Co-Lead, EEHPCWG. ORNL is managed by UT-Battelle, LLC for the US Department of Energy.
    Today's presentation (#1 on the TOP500 list, June 2018 and November 2018): system description, cooling system components, cooling system performance.
    Titan vs. Summit node overview (feature: Titan / Summit):
    • Application performance: baseline / 5-10x Titan
    • Number of nodes: 18,688 / 4,608
    • Node performance: 1.4 TF / 42 TF
    • Memory per node: 32 GB DDR3 + 6 GB GDDR5 / 512 GB DDR4 + 96 GB HBM2
    • NV memory per node: 0 / 1600 GB
    • Total system memory: 710 TB / >10 PB (DDR4 + HBM2 + non-volatile)
    • System interconnect: Gemini (6.4 GB/s) / dual-rail EDR-IB (25 GB/s)
    • Interconnect topology: 3D torus / non-blocking fat tree
    • Bi-section bandwidth: 15.6 TB/s / 115.2 TB/s
    • Processors per node: 1 AMD Opteron™ + 1 NVIDIA Kepler™ / 2 IBM POWER9™ + 6 NVIDIA Volta™
    • File system: 32 PB, 1 TB/s, Lustre® / 250 PB, 2.5 TB/s, GPFS™
    • Peak power consumption: 9 MW / 13 MW
    System description, how do we cool it? More than 100,000 liquid connections.
    What's in the data center? Passive RDHXs: 215,150 ft2 (19,988 m2) of total heat exchange surface (more than 20x the area of the data center). With a 70°F (21.1°C) entering water temperature, the room averages ~73°F (22.8°C) with a ~3.5 MW load and ~75.5°F (23.9°C) with a ~10 MW load.
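For a rough sense of the chilled-water flow such a load implies, the usual Q = m_dot * c_p * dT relation applies; a sketch under an assumed 10°F loop temperature rise (my assumption, not a figure from the presentation):

```python
# Chilled-water flow needed to carry a heat load Q: Q = m_dot * c_p * dT
load_w = 10.0e6               # ~10 MW room load case from the slide
delta_t = 10 * 5 / 9          # assumed 10 degF rise across the loop, in kelvin
cp = 4186.0                   # specific heat of water, J/(kg*K)
m_dot = load_w / (cp * delta_t)                 # kg/s
gpm = m_dot / 998.0 * 1000 / 3.785 * 60         # kg/s -> L/s -> US gallons per minute
print(f"~{m_dot:.0f} kg/s of water, roughly {gpm:.0f} GPM, under these assumptions")
```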
  • Clearspeed Technical Training
ENVISION. ACCELERATE. ARRIVE. ClearSpeed Technical Training, December 2007: Overview. Copyright © 2007 ClearSpeed Technology Inc. All rights reserved. www.clearspeed.com
    Presenters: Ronald Langhi, Technical Marketing Manager; Brian Sumner, Senior Engineer.
    ClearSpeed Technology, company background:
    • Founded in 2001
    • Focused on alleviating the power, heat, and density challenges of HPC systems
    • 103 patents granted and pending (as of September 2007)
    • Offices in San Jose, California and Bristol, UK
    Agenda: accelerators; ClearSpeed and HPC; hardware overview; installing hardware and software; thinking about performance; Software Development Kit; application examples; help and support.
    What is an accelerator?
    • A device to improve performance: relieve the main CPU of workload, or augment the CPU's capability
    • An accelerator card can increase performance on specific tasks, without aggravating facility limits on clusters (power, size, cooling)
    All accelerators are good... for their intended purpose:
    • Cell and GPUs: good for video gaming tasks; 32-bit FLOPS, not IEEE; unconventional programming model; small local memory; high power consumption (>200 W); not for 64-bit FLOPS
    • FPGAs: good for integer, bit-level ops; programming looks like circuit design; low power per chip, but 20x more power than custom VLSI
    • ClearSpeed: good for HPC applications; IEEE 64-bit and 32-bit FLOPS; custom VLSI, true coprocessor; at least 1 GB local memory; very low power consumption (25 W); familiar programming model