Precipitation Nowcasting Leveraging and HPC Systems to Optimize the Data Pipeline Agenda

● Introduction ● Motivation ● Dataset ● Prediction Modeling ● Data Pipelines ● Results ● Q&A

2 AMS 2018 Copyright 2018 Cray Inc. Introduction

● Nowcasting ● Predict precipitation locations and rates at a regional level over a short timeframe ● Traditional Approach: Numerical Weather Prediction ● Requires an extended lead time between newly acquired data and release of forecasts ● Deep Learning ● Branch of based on Neural Networks ● Deep implies multiple layers of computation between inputs and output ● Pattern Matching

3 AMS 2018 Copyright 2018 Cray Inc. Motivation

● Why is short-term nowcasting important? ● Provide reliable ride-to-work forecasts ● Predict rapid formation of severe precipitation events: Flash Flood Warning! ● Why Deep Learning? ● Traditional Nowcasting relies on NWP, slow to respond to new data ● Deep Learning learns from past rainfall patterns ● Trained models are computationally cheap to utilize ● Why HPC? ● A lot of data! Training can utilize decades of observations ● Models can be trained and inferred for small regions in parallel

4 AMS 2018 Copyright 2018 Cray Inc. Prototype Nowcasting System

● Small: 4 stations ● KATX (Seattle), KTLX (Oklahoma City), KTLH (Tallahassee), KBUF (Buffalo) ● Variable sized dataset, as large as 7 years of historical rainfall data ● Total size (raw data): 4TB ● Total size (processed): 684GB ● Examine Pipeline Performance and Bottlenecks ● Explore Nowcasting Performance via Deep Learning

5 AMS 2018 Copyright 2018 Cray Inc. Dataset Processing

Data Collection Transformation

• Historical Radar Data • Raw radial data structure Sampling (NETCDF) converted to evenly • Geographical Region spaced Cartesian grid • Time-series • Days with over 0.1 inches (Tensors with float 32) • Inputs and of precipitation, info from • Resolution scaling and Labels NOAA – NCDC clipping • Random • Radar scans every 5-10 • Configure dimensionality sampling minutes throughout the • Sequencing BigDL day • 2 channels – Reflectivity, Velocity • Uses Py-ART package Framework

• Apache Spark on Urika-XC/GX • Implemented in Jupyter notebooks and Python

6 AMS 2018 Copyright 2018 Cray Inc. Prediction Modeling

● Convolutional ● Convolutional Neural Network – Spatial Patterns ● Recurrent Neural Network – Temporal Patterns ● ConvLSTM – Convolutional Long Short-Term Memory Network ● Sequence to Sequence ● Encoder Decoder ● Use recent history to predict future changes

7 AMS 2018 Copyright 2018 Cray Inc. Pipeline: Data Processing Iterate on Dataset

Compute Gradients

Convert to RDDs BigDL

Save As NumPy Arrays Feed-Forward Distributed File-System

8 AMS 2018 Copyright 2018 Cray Inc. Pipeline: Distributed Training

Split Data Train Unique Hyper- Networks for Parameter Trained By Region each region Optimization Networks KATX

KTLH

KBUF

… Distributed KTLX File-System

9 AMS 2018 Copyright 2018 Cray Inc. Idealized Training Timeline

● Station: KATX ● Dataset size: Process Wall-Time Proportion ● 118,342 Sequences ● 101GB Download 13 hours 32% ● Parameters ● Systems: Spark 4 hours 10% ● Data processing: Cray Urika-GX – 1024 cores Training 24 hours 58% ● Training: Cray CS-Storm – 8 Nvidia P100 GPUs Inference 10 seconds 0%

10 AMS 2018 Copyright 2018 Cray Inc. Scaling

● Tensorflow via Cray MPI Device Scaling Throughput Com. Plugin Count Efficiency ● Nvidia Tesla P100 GPUs 1 25.8 1.0

● Batchsize of 4 samples 2 51.6 1.0 per device ● Throughput in 4 102.7 .995 Samples/Second 8 205.4 .995

16 410.5 .994

11 AMS 2018 Copyright 2018 Cray Inc. Model Performance KATX CSI and Average Error KBUF CSI and Average Error 0.7 0.8 0.7

0.5 0.6 0.5

0.4 0.3 0.3

0.2 0.1 1 2 3 4 5 6 1 2 3 4 5 6 CSI-ConvLSTM CSI-Persistence CSI-ConvLSTM CSI-Persistence MAE-ConvLSTM MAE-Persistence MAE-ConvLSTM MAE-Persistence KTLX CSI and Average Error KTLH CSI and Average Error 0.6 0.6

0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1 1 2 3 4 5 6 1 2 3 4 5 6

12 AMS 2018 Copyright 2018 Cray Inc. Effect of dataset size

● More data is correlated to higher performing Average CSI for varying sized datasets 0.458 prediction model 0.456 ● Station: KATX (Seattle) 0.454 0.452 0.45 Years Size (GB) Sequences 0.448 1 40 43,171 0.446 0.444 Critical Success Index 3 109 118,342 0.442 0.44 5 143 155,337 0 1 2 3 4 5 6 7 8 Years in Dataset 7 217 235,301

13 AMS 2018 Copyright 2018 Cray Inc. Sample Prediction + Q/A

14 AMS 2018 Copyright 2018 Cray Inc.