High Performance Computing for Numerical Weather Prediction at The Weather Company, an IBM Business

Todd Hutchinson

John Wong

You’re likely familiar with our consumer properties

266 million monthly users through The Weather Channel and Weather Underground brands

World’s most downloaded weather app

Data provider for Apple, Facebook, Android devices, and more

IBM Systems at Supercomputing 2019 / © 2019 IBM Corporation

We also supply decision support to many industries

ENERGY: Outage Prediction

AGRICULTURE: Decision Platform

RETAIL: Weather Signals

INSURANCE: Alerts

AVIATION: Fusion / Turbulence

GROUND TRANS.: Travel Time Forecast

We have the world’s most accurate forecasts

[Figure: forecast accuracy by provider for the US, EUROPE, and ASIA, with one panel each for Days 1-3, Days 4-6, and Days 7-9. Providers shown include TWC, Foreca, Dark Sky, Intellicast, AccuWeather, MeteoGroup, W. Wx Online, BBC, NWS Web, and NWS NDFD.]

Source: https://www.forecastwatch.com/wp-content/uploads/Three_Region_Accuracy_Overview_2010-2017.pdf Date: September 2018

The Weather Company Forecasts

Forecasts are generated from more than 50 forecast models

Most are run by governments, often out to 10+ days

IBM GRAF is run internally by The Weather Company

Introducing IBM GRAF, our new Global Hi-Resolution Forecasting System

Most models are run at 9-15km resolution and updated just 2-4 times daily

IBM GRAF will run at 3km over most land areas GLOBALLY and be updated HOURLY

First known operational global model to be focused on the 0-15 hour forecast period

GRAF brings the resolution once limited to the US, Japan, and W. Europe to the rest of the world

The IBM GRAF Modeling System

Data Assimilation: Hybrid EnKF using GSI; early and late cycles

Dynamic Core: Model for Prediction Across Scales (MPAS)

Compute: IBM Power9 GPU-accelerated HPC System

Product Generation: Numerous proprietary forecast products

The MPAS model core

MPAS-Atmosphere has been developed by the National Center for Atmospheric Research (NCAR)

Accelerated for GPU using OpenACC by NCAR, NVIDIA, IBM

Variable resolution on unstructured grid (Voronoi meshes)

Flexible grid geometries with no abrupt nested-grid transitions

[Figure: variable-resolution mesh with regions labeled 3km and 15km]

Courtesy NCAR: http://www2.mmm.ucar.edu/projects/mpas/tutorial/Boulder2018/slides/1._overview.pdf

Global and highly-detailed

Resolution: 3km over 30% of the world

[Figure: global map with regions labeled 3km and 15km resolution]
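To get a feel for what "3km over 30% of the world" means computationally, here is a rough back-of-envelope sketch. It is not GRAF's actual mesh size. It assumes an Earth surface area of about 5.1e8 km², treats every Voronoi cell as a regular hexagon whose center spacing equals the nominal resolution, assumes the remaining 70% sits at 15km, and ignores the smooth transition zone between the two resolutions. The function names are illustrative only.

```python
import math

EARTH_SURFACE_KM2 = 5.1e8  # approximate surface area of Earth (assumption)

def hex_cell_area(spacing_km):
    """Area of a regular hexagon whose neighboring cell centers are spacing_km apart."""
    return math.sqrt(3) / 2 * spacing_km ** 2

def estimate_cells(fine_fraction, fine_km=3.0, coarse_km=15.0):
    """Rough cell counts for a two-resolution mesh (transition zone ignored)."""
    fine_area = fine_fraction * EARTH_SURFACE_KM2
    coarse_area = (1.0 - fine_fraction) * EARTH_SURFACE_KM2
    fine_cells = fine_area / hex_cell_area(fine_km)
    coarse_cells = coarse_area / hex_cell_area(coarse_km)
    return fine_cells, coarse_cells

fine_cells, coarse_cells = estimate_cells(0.30)
# Going from 15km to 3km multiplies the cell count per unit area by (15/3)^2.
density_ratio = hex_cell_area(15.0) / hex_cell_area(3.0)

print(f"3km region:  ~{fine_cells / 1e6:.1f} million cells")
print(f"15km region: ~{coarse_cells / 1e6:.1f} million cells")
print(f"Cells per unit area, 3km vs 15km: {density_ratio:.0f}x")
```

Under these assumptions the 3km region dominates the cell count even though it covers less than a third of the globe, which is why the fine-resolution coverage drives the compute requirement.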

IBM GRAF

First operational Global NWP system running on GPU

Dyeus

An IBM® Power System™ AC922 HPC

76 GPU nodes

16 CPU-only nodes

4 infrastructure nodes

3.3PB usable space

Per server:

2x 20-core IBM Power 9 CPUs

4x NVIDIA Tesla V100 GPUs with NVLink

256GB of DDR4 memory

340 GB/sec peak memory bandwidth
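The node counts and per-server figures above can be combined into cluster-wide totals. The sketch below does only that arithmetic. It assumes the "Per server" specification describes the 76 GPU nodes; the slides do not give the configuration of the CPU-only or infrastructure nodes, so those are counted but not sized. `ServerSpec` is an illustrative name, not part of any IBM tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerSpec:
    """Per-server figures quoted on the slide."""
    cpus: int = 2
    cores_per_cpu: int = 20
    gpus: int = 4
    memory_gb: int = 256

# Node counts quoted on the slide.
GPU_NODES = 76
CPU_ONLY_NODES = 16
INFRA_NODES = 4

spec = ServerSpec()
total_nodes = GPU_NODES + CPU_ONLY_NODES + INFRA_NODES

# Assumption: the per-server spec applies to the GPU nodes only.
total_gpus = GPU_NODES * spec.gpus
gpu_node_cores = GPU_NODES * spec.cpus * spec.cores_per_cpu
gpu_node_memory_gb = GPU_NODES * spec.memory_gb

print(f"Nodes in total:          {total_nodes}")
print(f"V100 GPUs:               {total_gpus}")
print(f"CPU cores on GPU nodes:  {gpu_node_cores}")
print(f"DDR4 on GPU nodes (GB):  {gpu_node_memory_gb}")
```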

IBM Data center

Research Triangle Park, NC

✓ Level 3+ data center

✓ Provides the necessary power/cooling supply and backup

✓ Security infrastructure

✓ Supports 24/7 non-stop high-intensity computation

✓ Dedicated team from IBM Lab Services

Computational Performance

MPAS-GPU full weather forecast runs 7-9 times faster on AC922 servers compared to Intel servers

AC922: Most components on GPU: OpenACC + MPI. 40 available CPU cores used for remaining parameterizations (radiation, land-surface) and I/O.

Intel: MPI distributed parallelism

Computational Profile

Overall, MPAS is 7.2x faster on AC922 compared to Intel Broadwell servers

If we exclude 2 modules, MPAS is 13x faster on AC922

There is room for further improvement!
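One way to read the 7.2x and 13x figures together is through an Amdahl-style model: the two excluded modules speed up less than the rest, and they pull the overall number down. The slides do not say how much of the baseline runtime those two modules take, so the fractions below are purely hypothetical. The sketch only shows which module speedup each assumed fraction would imply.

```python
def overall_speedup(slow_fraction, slow_speedup, fast_speedup):
    """Combined speedup when slow_fraction of baseline time speeds up by
    slow_speedup and the remainder speeds up by fast_speedup."""
    new_time = slow_fraction / slow_speedup + (1.0 - slow_fraction) / fast_speedup
    return 1.0 / new_time

def implied_slow_speedup(slow_fraction, overall=7.2, fast_speedup=13.0):
    """Speedup the two slow modules would need so the totals match the slide."""
    # Time left for the slow modules after accounting for the fast part.
    remaining_time = 1.0 / overall - (1.0 - slow_fraction) / fast_speedup
    return slow_fraction / remaining_time

# Hypothetical shares of baseline (Intel) runtime spent in the two modules.
for fraction in (0.10, 0.20, 0.30):
    speedup = implied_slow_speedup(fraction)
    check = overall_speedup(fraction, speedup, 13.0)
    print(f"baseline share {fraction:.0%}: modules run ~{speedup:.1f}x faster "
          f"(overall {check:.1f}x)")
```

In this model, raising the speedup of the two modules toward 13x moves the overall figure toward 13x as well, which is the "room for further improvement" the slide points to.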

GPU parallelism accomplished via OpenACC directives and code restructuring

Lines of code (LOC):

MPAS Original: 376,110

MPAS Accelerated: 384,032

OpenACC directives: 2,600

Additional lines: 5,322
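The line counts above are internally consistent, and a quick check shows how small the port is relative to the code base. The sketch below uses only the four numbers from the table; the variable names are illustrative.

```python
# Line counts as quoted in the table.
original_loc = 376_110
accelerated_loc = 384_032
openacc_directive_lines = 2_600
additional_lines = 5_322

# Net growth of the code base after GPU acceleration.
added_loc = accelerated_loc - original_loc

# The two itemized categories should account for all of the growth.
itemized_loc = openacc_directive_lines + additional_lines

growth_percent = 100.0 * added_loc / original_loc
directive_share_percent = 100.0 * openacc_directive_lines / added_loc

print(f"Lines added:               {added_loc}")
print(f"Itemized lines:            {itemized_loc}")
print(f"Growth of code base:       {growth_percent:.1f}%")
print(f"Directives as share added: {directive_share_percent:.1f}%")
```

The directives plus the additional restructuring lines sum exactly to the difference between the two code bases, and the total growth is roughly 2% of the original source.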

How will this benefit consumers and businesses?

Product generation, often tailored to business use cases

Examples: Mobile Devices, Presentation, Internet, Aviation Turbulence

HPC for good
