Computing Challenges for Weather and Climate Modelling at the

Paul Selwood

© Crown copyright Met Office Current Status

© Crown copyright Met Office The

• The same model formulation is used for all models from climate scale to mesoscale

Climate modelling: input into IPCC reports (Coupled Atmosphere-Ocean models)

Seasonal forecasting: For commercial and business customers

NWP: Public Weather Service WAFC, Commercial ……

Dynamics – aka CFD

• Lat-Long grid
• Advection
  o Semi-Lagrangian scheme (departure point to arrival point)
  o Variable order interpolation
• Adjustment
  o Semi-implicit scheme
  o 3D Helmholtz equation
• Diffusion: removing noise
  o Variable order
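The semi-Lagrangian idea on this slide can be illustrated with a minimal 1D sketch. This is not the model's code: the function name, the periodic domain, the constant wind and the use of linear interpolation (the slide mentions variable order interpolation) are all assumptions made for illustration.

```python
import math

def semi_lagrangian_step(q, u, dx, dt):
    """Advance a periodic 1D field q by one step under a constant wind u.

    For each arrival grid point, trace back to the departure point
    x_i - u*dt and interpolate q there (linear interpolation, an assumption).
    """
    n = len(q)
    new_q = []
    for i in range(n):
        # Departure point in grid-index units, wrapped periodically.
        x_dep = (i - u * dt / dx) % n
        j = int(math.floor(x_dep))  # grid point to the left of the departure point
        w = x_dep - j               # fractional distance past that point
        new_q.append((1.0 - w) * q[j % n] + w * q[(j + 1) % n])
    return new_q

# Hypothetical usage: a single pulse advected with Courant number u*dt/dx = 2.
pulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(semi_lagrangian_step(pulse, u=1.0, dx=1.0, dt=2.0))
```

With u*dt/dx = 2 the pulse moves exactly two cells in one step, which shows why the scheme is attractive: the time step is not tied to the grid spacing in the way it is for simple explicit Eulerian schemes.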

Physical Parameterizations

• Convection
• Clouds
• Short-wave radiation
• Vegetation Model
• Long-wave radiation
• Precipitation
• Surface Processes

Parallel Implementation

• Regular, Static, Lat-Long Decomposition
• Mixed mode MPI/OpenMP
• Asynchronous I/O servers
• Communications on demand for advection
• Multiple halo sizes
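A regular, static decomposition with halos can be sketched as below. The function names, the 16 x 8 processor grid and the halo widths are hypothetical; the 1024 x 769 grid is the N512 size quoted in the Future Costs table. This is only an illustration of the bookkeeping, not the model's implementation.

```python
def decompose(n_points, n_procs):
    """Split n_points as evenly as possible over n_procs.

    Returns a list of (start, size) pairs, one per processor.
    """
    base, extra = divmod(n_points, n_procs)
    ranges = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # first `extra` ranks get one more point
        ranges.append((start, size))
        start += size
    return ranges

def local_shapes_with_halo(nx, ny, px, py, halo):
    """Local array shape, including halos, for each rank on a px-by-py processor grid."""
    shapes = {}
    for j, (_, size_y) in enumerate(decompose(ny, py)):
        for i, (_, size_x) in enumerate(decompose(nx, px)):
            shapes[(i, j)] = (size_x + 2 * halo, size_y + 2 * halo)
    return shapes

# Hypothetical usage: N512 grid on a 16 x 8 processor grid with two halo widths.
for halo in (1, 5):
    print(halo, local_shapes_with_halo(1024, 769, 16, 8, halo)[(0, 0)])
```

Supporting more than one halo width, as the slide notes, lets a scheme with a wide stencil exchange a deep halo while cheaper schemes exchange only a single row of points.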

[Figure: model configurations plotted by atmospheric grid length (1.5km, 4km, 12km, 24km, 40km, 80km, 150km, 300km) against timescale (36hrs, 48hrs, 5 days, 15 days, 6 months, 10 years, 30 years, >100 years), with complexity as a further axis. Configurations shown: UKV (1.5km), UK4 (4km), NAE, HadGEM3-RA, MOGREPS-R ensemble, HadGEM3, HadGEM2, TIGGE, GloSea4 ensemble, HadGEM1, DePreSys, HadCM3, PRECIS. Categories: global, regional, Earth System, coupled atmos/ocean, global atmosphere-only, regional atmosphere-only. Legend: production system; in transition to production.]

Current Production schedule

Met Office HPC

• 1989-2003: Cray YMP, C90, T3E
• 2003-2008: NEC SX6/8, ~5 TFlop peak
• 2009-12: IBM p575 Power6
  o Operational from August 2009
  o 145 TFlop peak capacity (7744 cores)
  o 2 identical systems (2 x 106 nodes) for resilience, plus a small system (30 nodes) for collaboration with UK universities
• 2012->: IBM Power 7
  o ~3x faster than Phase 1, measured by benchmark application speedup
  o At least 25000 cores, with total capacity approaching 1 PFlop

Scientific Drivers

Timeliness is essential

Resolution is important!

Boscastle storm: forecast rainfall accumulations for 16 August 2004, 12:00-18:00

[Figure panels: 5km radar actual; 12km forecast from 00UTC; 4km forecast from 00UTC; 1km forecast from 00UTC]

[Figure: four panels labelled 1990, 1996, 2001 and 2007 (IPCC Timescales)]

Toward the Earth-System Model

[Figure: schematic linking Climate, Greenhouse Effect, Water usage, Greenhouse gases (CO2, CH4), Aerosols, Water cycle, Fires: soot, Mineral dust, Chemistry, Ecosystems and Land use change, with Human Emissions feeding several of these components]

Future Costs

                Atmosphere                  Ocean
Name            Res (km)      X     Y    Z  Res (deg)  Levels  Complexity  Cost Factor
HG2 N96-ES           135    192   145   38          1      40         2.5            1
HG3 N96-ES           135    192   145   85          1      40           2            2
HG3 N216              60    432   325   85       0.24      75           2           18
HG3 N216-ES           60    432   325   85       0.24      75           5           45
HG3 N320              40    640   481   85      0.084      75           2          196
HG3 N320-ES           40    640   481   85      0.084      75           5          489
HG3 N512              25   1024   769   85      0.084      75           2          293
HG3 N512-ES           25   1024   769   85      0.084      75           5          732
HG3 N512-ES+          25   1024   769   85      0.084      75          10         1463
HG3 1.5km-ES         1.5  17000  1280  200      0.084      75          10       406123

Can We Scale?

Is Weak Scalability Possible?

• Scalability challenge suggests resolution increase.
• Double resolution from M to M/2 km
  o Grid-points increase by O(n^2) in horizontal
  o Grid-points increase by O(n^2) in vertical
  o Time-step will reduce
  o Iteration count in solver will increase
• Scientists continue to add complexity to models
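The way these factors multiply can be sketched with a small cost estimate. The function and every exponent below are assumptions for illustration, including the default that the time step shrinks in proportion to the grid length; none of the values are taken from the slides.

```python
def cost_factor(refine, horiz_exp=2.0, vert_exp=0.0, timestep_exp=1.0,
                solver_exp=0.0, complexity=1.0):
    """Relative cost when the grid length shrinks by `refine` (2 = double resolution).

    Each exponent says how one contribution grows with `refine`; all defaults
    are illustrative assumptions.
    """
    gridpoints = refine ** (horiz_exp + vert_exp)  # more points to update
    timesteps = refine ** timestep_exp             # shorter step, so more steps
    solver = refine ** solver_exp                  # extra solver iterations per step
    return gridpoints * timesteps * solver * complexity

# Hypothetical usage: doubling horizontal resolution only, then adding vertical refinement.
print(cost_factor(2))
print(cost_factor(2, vert_exp=1.0))
```

Under these assumptions, doubling horizontal resolution alone costs 8 times as much, so weak scaling needs the extra cores to absorb not just the additional grid points but also the additional time steps.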

Global Model Weak Scalability

[Figure: normalised time per gridpoint, and normalised time per gridpoint per timestep, for N96L63, N144L70, N216L70, N320L70 and N512L70; vertical axis 0.0 to 3.5; data labels 448, 8, 96, 32, 12]

Strong Scaling – Mar 2010

[Figure: strong scaling curves for N512L70 (no I/O), N512L70 (full I/O), UKV and HadGEM3-AO; horizontal axis 0 to 2000, vertical axis 0 to 14]
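Strong scaling results of this kind are commonly summarised as speedup and parallel efficiency relative to the smallest run. The sketch below shows that calculation; the function name and the timings are hypothetical and are not read from the plot.

```python
def strong_scaling(timings):
    """Speedup and efficiency relative to the smallest core count.

    timings: dict mapping core count to elapsed seconds for a fixed problem size.
    Returns a dict mapping core count to (speedup, efficiency).
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    results = {}
    for cores, elapsed in sorted(timings.items()):
        speedup = base_time / elapsed
        efficiency = speedup / (cores / base_cores)  # 1.0 means perfect scaling
        results[cores] = (speedup, efficiency)
    return results

# Hypothetical timings, for illustration only.
for cores, (s, e) in strong_scaling({256: 800.0, 512: 400.0, 1024: 250.0}).items():
    print(cores, round(s, 2), round(e, 2))
```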

Global Model Dynamics Problems

• Lat-Long grid causes problems
• ADI preconditioner scales poorly
• Communication on demand in the advection is fairly costly and introduces imbalance
• Polar filtering is communication dominated and imbalanced
• Polar re-mapping in wind advection introduces load imbalance
• Constant pole requirement introduces communication
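One reason a lat-long grid is awkward is that meridians converge, so the east-west spacing between grid points shrinks towards the poles. The sketch below computes that spacing for a grid with 1024 points around each latitude circle (the N512 value of X in the Future Costs table). The function name, the spherical-Earth approximation and the chosen latitudes are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def east_west_spacing_km(n_lon, lat_deg):
    """East-west distance between neighbouring points on a regular lat-long grid."""
    circumference = 2.0 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))
    return circumference / n_lon

# Spacing at a few illustrative latitudes for a 1024-point latitude circle.
for lat in (0.0, 50.0, 80.0, 89.0, 89.9):
    print(f"lat {lat:5.1f}: {east_west_spacing_km(1024, lat):8.3f} km")
```

The spacing collapses near the poles while the north-south spacing stays fixed, which is consistent with the polar filtering and polar re-mapping costs listed above.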

Data assimilation

[Figure: timeline from T-3 to T+144 (T-3, T-2, T-1, T+0, T+1, T+2, T+3, T+144) showing the first guess and the observations]

• The challenge: to compute the model state from which the resulting forecast best matches the available observations
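The challenge can be made concrete with a toy, single-variable analysis: one background (first guess) value and one direct observation, each with an error variance. This is a textbook-style sketch under those assumptions, not the operational scheme; the function names and numbers are hypothetical.

```python
def cost(x, x_b, var_b, y, var_o):
    """Quadratic misfit of state x to the background x_b and to the observation y."""
    return 0.5 * (x - x_b) ** 2 / var_b + 0.5 * (y - x) ** 2 / var_o

def analysis(x_b, var_b, y, var_o):
    """State that minimises cost(): background plus a weighted observation increment."""
    gain = var_b / (var_b + var_o)  # weight given to the observation
    return x_b + gain * (y - x_b)

# Hypothetical usage: equally trusted background and observation.
x_a = analysis(x_b=10.0, var_b=1.0, y=12.0, var_o=1.0)
print(x_a, cost(x_a, 10.0, 1.0, 12.0, 1.0))
```

In a real system the state has a very large number of components and the observations are related to it through the forecast model, so the minimum is found iteratively rather than in closed form.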

Analysis (schematic)

[Figure: schematic of the analysis steps leading to the main run (N216, QG12). Labels: OPS (QG12), model, UM background (QU06), UM analysis (QG12), increment, VAR N108, VAR N216, vguess, Hessian eigenvectors; times 1445, 1503 and 1623 GMT]

Earth System Model Components

• JULES: Land Surface
• UM: Atmosphere
• UKCA: Chemistry
• OASIS: Coupler
• NEMO: Ocean
• CICE: Sea Ice

Load balancing and all that

• Component speed depends on
  o Cores given
  o Number of threads
• Coupled model speed
  o Only runs as fast as the slowest component
  o Don’t want one component waiting for another
  o During optimisation work, constant need to rebalance.
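The balancing problem can be sketched as a search over core allocations. Everything here is hypothetical: the Amdahl-style timing model, the two-component setup, the assumption that the components run concurrently on separate cores, and the work figures.

```python
def component_time(work, cores, serial_fraction):
    """Hypothetical Amdahl-style timing model for one component."""
    return work * (serial_fraction + (1.0 - serial_fraction) / cores)

def best_split(total_cores, atmos, ocean):
    """Try every split of cores between two concurrently running components.

    atmos and ocean are (work, serial_fraction) pairs.
    Returns (coupled_time, atmos_cores, ocean_cores) for the fastest split.
    """
    best = None
    for a_cores in range(1, total_cores):
        o_cores = total_cores - a_cores
        t_a = component_time(atmos[0], a_cores, atmos[1])
        t_o = component_time(ocean[0], o_cores, ocean[1])
        coupled = max(t_a, t_o)  # the slower component sets the pace
        if best is None or coupled < best[0]:
            best = (coupled, a_cores, o_cores)
    return best

# Hypothetical usage: an atmosphere three times as expensive as the ocean.
print(best_split(8, atmos=(300.0, 0.0), ocean=(100.0, 0.0)))
```

Any optimisation that speeds up one component changes its timing curve and so moves the best split, which is why rebalancing is a constant need.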

An extra dimension ...

Coupled model scaling

Individual components

The world and its weather aren’t uniform …

… so we get load imbalance in convection

… and in surface schemes

What to do about it?

Conventional Optimisation

• Figure 1

Rip it up and start again…

• Next Generation Weather and Climate Programme
• Collaboration with Hartree, Met Office and NERC
• Combine computer science and meteorology/climatology specialists
• Clean slate approach.
• EO has been put out and evaluations are starting.

VAR v EnDA (future scheme) with current IBM scaling

[Figure: scaling with fixed node number. Elapsed time (s), 0 to 1500, against nodes (0, 12, 24, 36, 48) for UM N512 T+12, VAR N216 and EnDA N216 48 members, with a perfect scaling line for UM or VAR]

Global Model Comparison using NWP Index basket measure

[Figure: % difference relative to Met Office (MetO)]

Questions and answers