“Ensembles” in Numerical Weather Prediction & the Ensemble


Overview

• Aspects of Numerical Weather Prediction
• (Ensemble) Kalman filter
• Statistical approach to prevent filter divergence

Thomas Bengtsson, Jeff Anderson, Doug Nychka
http://www.cgd.ucar.edu/∼furrer

Numerical Weather Prediction

Forecasting weather by the use of numerical models and observations, run on high-speed computers.

Numerical models:
• Physical laws of flow
• Simplification
• Discretization
• Parameterization

Observations:
• Surface (towers, ships)
• Altitude (planes, balloons)
• Radar
• Satellite data

NWP Scheme

[Diagram: Observations → Global analysis and balancing → Background or first guess → Global forecast model → Operational forecasts]

Data assimilation “calls” for the Kalman filter.

Kalman Filter

State and space equations:

y_t = H_t x_t + w_t,    w_t ∼ N_r(0, R_t)
x_{t+1} = G_t x_t + v_t,    v_t ∼ N_q(0, Q_t)

where
y_t : observations         H_t : measurement operator
x_t : state                G_t : system operator
w_t : measurement error    R_t : measurement error covariance matrix
v_t : state error          Q_t : state error covariance matrix
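As a concrete sketch, the state and space equations can be simulated directly. This is a toy setup: the dimensions and the matrices G, H, Q, R below are illustrative choices, not the operational ones.

```python
import numpy as np

rng = np.random.default_rng(1)
q, r, T = 2, 1, 50                       # state dim, observation dim, time steps (toy sizes)
G = np.array([[1.0, 0.1],
              [0.0, 1.0]])               # system operator G_t (constant in t here)
H = np.array([[1.0, 0.0]])               # measurement operator H_t
Q = 0.01 * np.eye(q)                     # state error covariance Q_t
R = 0.1 * np.eye(r)                      # measurement error covariance R_t

x = np.zeros(q)                          # initial state
states, obs = [], []
for t in range(T):
    y = H @ x + rng.multivariate_normal(np.zeros(r), R)   # y_t = H_t x_t + w_t
    states.append(x)
    obs.append(y)
    x = G @ x + rng.multivariate_normal(np.zeros(q), Q)   # x_{t+1} = G_t x_t + v_t
```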

Kalman Filter

State and space equations:

y_t = H_t x_t + w_t,    w_t ∼ N_r(0, R_t)
x_{t+1} = G_t x_t + v_t,    v_t ∼ N_q(0, Q_t)

We aim to update our knowledge of the state x_t given the new data y_t.

The Kalman filter is an iterative procedure based on
– a Bayes approach
– conditional expectation
– . . .

NWP within KF Scheme

[Diagram: the NWP cycle as a Kalman filter — Observations (y_t) → global analysis and balancing (H_t^{−1}) → background or first guess (x_t) → global forecast model (G_{t+1}) → operational forecasts (x_{t+1})]

Kalman Filter

The standard Kalman filter uses the initial knowledge

p(x_t | Y_{t−1}) ∼ N_q(x_t^f, P_t^f),    Y_{t−1} = {y_{t−1}, . . . , y_0}

then

p(x_t | Y_t) ∼ N_q(x_t^a, P_t^a)
  x_t^a = x_t^f + K_t (y_t − H_t x_t^f)
  P_t^a = (I − K_t H_t) P_t^f
  K_t = P_t^f H_t^T (H_t P_t^f H_t^T + R_t)^{−1}

p(x_{t+1} | Y_t) ∼ N_q(x_{t+1}^f, P_{t+1}^f)
  x_{t+1}^f = G_t x_t^a
  P_{t+1}^f = G_t P_t^a G_t^T + Q_t

Kalman Filter

What if we have:
• a nonlinear state G_t(x_t)?
  Linearize the state via a Taylor expansion (extended KF).
• non-Gaussian errors?
  The distributions do not have a closed form; the KF is still the best linear predictor.
• outliers, . . . ?
  Work for statisticians . . .
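A minimal sketch of one iteration of these update and forecast equations, in NumPy with toy dimensions (not an operational implementation):

```python
import numpy as np

def kf_step(xf, Pf, y, H, R, G, Q):
    """One Kalman filter iteration: analysis update, then forecast."""
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # K_t = P_t^f H_t^T (H_t P_t^f H_t^T + R_t)^{-1}
    xa = xf + K @ (y - H @ xf)                      # x_t^a = x_t^f + K_t (y_t - H_t x_t^f)
    Pa = (np.eye(len(xf)) - K @ H) @ Pf             # P_t^a = (I - K_t H_t) P_t^f
    return xa, Pa, G @ xa, G @ Pa @ G.T + Q         # x_{t+1}^f = G_t x_t^a, P_{t+1}^f = G_t P_t^a G_t^T + Q_t

# one step with a 2-dimensional state and a scalar observation
xa, Pa, xf_next, Pf_next = kf_step(
    xf=np.zeros(2), Pf=np.eye(2),
    y=np.array([1.0]), H=np.array([[1.0, 0.0]]),
    R=0.1 * np.eye(1), G=np.eye(2), Q=0.01 * np.eye(2))
```

The analysis pulls the state toward the observation and shrinks the covariance in the observed direction.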

“High” Dimensions

The KF in one and in higher dimensions has similar properties. If “high” gets really high, say 10^6 or 10^7, it becomes infeasible to calculate the densities.

Simplifications:
• Ensemble Kalman filter (EnsKF): propagation of the forecast distribution with a small ensemble.
• Unscented Kalman filter: description of the moments with a small specific sample.

EnsKF

Assume a sample {x_{t,i}^f} from the forecast distribution p(x_t | Y_{t−1}). The update is performed according to

x_{t,i}^a = x_{t,i}^f + K̂_t (y_t + ε_{t,i} − H_t x_{t,i}^f),    {ε_{t,i}} iid ∼ N_r(0, R_t)

• K_t is replaced by K̂_t = P̂_t^f H_t^T (H_t P̂_t^f H_t^T + R_t)^{−1} for some estimate P̂_t^f of P_t^f.
• R_t = r_t I, Q_t = 0: serial computations of the “inverses”.
• Square root filter: deterministic version, no need to draw samples.
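A sketch of this perturbed-observation update, assuming the plain sample covariance as the estimate P̂_t^f (illustrative dimensions):

```python
import numpy as np

def enskf_update(Xf, y, H, R, rng):
    """EnsKF analysis step: Xf is the (q, n) forecast ensemble, y the (r,) observation."""
    n = Xf.shape[1]
    Xc = Xf - Xf.mean(axis=1, keepdims=True)
    P_hat = Xc @ Xc.T / (n - 1)                       # sample estimate of P_t^f
    K_hat = P_hat @ H.T @ np.linalg.inv(H @ P_hat @ H.T + R)
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=n).T  # {eps_t,i} iid N_r(0, R_t)
    return Xf + K_hat @ (y[:, None] + eps - H @ Xf)   # x_a = x_f + K(y + eps - H x_f)

rng = np.random.default_rng(0)
Xf = rng.standard_normal((5, 40))         # q = 5 state components, ensemble size n = 40
H = np.zeros((1, 5)); H[0, 0] = 1.0       # observe the first component only
Xa = enskf_update(Xf, np.array([0.5]), H, 0.1 * np.eye(1), rng)
```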

EnsKF: +/−

• Feasible when the observation vector is of size O(10^6) and the state vector is of size O(10^7).
• Works well even for very small ensemble sizes, n = 10, 40, . . . , if we take a well-chosen estimate of P_t^f.
• Some “tuning” is needed to prevent filter divergence.

Operational EnsKF in NWP

The EnsKF has essentially three goals:
• improve the forecast by ensemble averaging
• provide an indication of the reliability of the forecast
• provide a quantitative basis for probabilistic forecasting

Operational EnsKF in NWP

Practically, the EnsKF may be performed by:
• choosing different initial or boundary conditions
• using different numerical techniques to solve the PDEs
• using different parameterizations

Statistical Questions

We need to choose an estimate or approximation of P_t^f.
• What estimate or approximation should we choose?
• What is the optimal ensemble size n?
• What is the optimal “tuning”?

Approximations of P_t^f

Typical examples for approximations of P_t^f:

1. P_t^f is estimated with n ensemble members: P̃_t^f = P̂_t^f.
2. P_t^f is estimated with n ensemble members and tapered with C: P̃_t^f = P̂_t^f ∘ C.
3. P̂_t^f is inflated/boosted with ρ > 1: P̃_t^f = ρ P̂_t^f or P̃_t^f = ρ P̂_t^f ∘ C.

Goal of the Study

What is the dependence of the errors

|| P̃^f − P^f ||,    || P̃^a − P^a ||,    || K̃ − K ||,

where P̃^f = P̂^f or P̃^f = P̂^f ∘ C or . . . , on the ensemble size n, the state dimension q and the eigenvalues λ_i of P^f?

Associated questions:
• How big does the ensemble size n have to be?
• What is an optimal taper matrix C?
• What is the optimal boosting factor ρ?
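The three approximation schemes can be sketched as follows (the Schur product ∘ is elementwise multiplication; the taper C and the inflation factor ρ below are illustrative choices):

```python
import numpy as np

def pf_approximations(Xf, C, rho):
    """1. sample estimate, 2. tapered estimate P_hat * C, 3. inflated estimate rho * P_hat."""
    n = Xf.shape[1]
    Xc = Xf - Xf.mean(axis=1, keepdims=True)
    P_hat = Xc @ Xc.T / (n - 1)   # 1. ensemble estimate of P_t^f
    P_tap = P_hat * C             # 2. Schur (elementwise) product with a taper matrix C
    P_inf = rho * P_hat           # 3. boosted with rho > 1
    return P_hat, P_tap, P_inf

rng = np.random.default_rng(0)
q, n = 10, 20
s = np.linspace(0.0, 1.0, q)
d = np.abs(s[:, None] - s[None, :])
C = np.maximum(0.0, 1.0 - d / 0.5)        # simple compactly supported taper, unit diagonal
Xf = rng.standard_normal((q, n))
P_hat, P_tap, P_inf = pf_approximations(Xf, C, rho=1.1)
```

Tapering leaves the variances untouched but sets distant covariances to zero, which is what removes the long-range sampling noise.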

EnsKF with P̃^f = P̂^f

Straightforward but “tedious” analysis is used to obtain, for the
• forecast covariance matrix: an exact expression
• update covariance matrix: a lower bound
• Kalman gain matrix: an approximation

The expressions involve q, n, {λ_i} and the eigenvectors of P^f.

EnsKF with tapering

What is the error when using P̃^f = P̂^f ∘ C, for a positive definite taper matrix C?

For P^a and K, the expressions are long and only approximate. We want to minimize the error for the forecast covariance matrix with respect to all positive definite matrices C.

For diagonal P^f we have simple solutions.


Optimal Taper C

A naive approach is to minimize component-wise:

(C_min)_ij = p_ij^2 / ( p_ij^2 + (p_ij^2 + p_ii p_jj)/n )

But C_min is
• not a correlation matrix: (C_min)_ii = n/(n + 2)
• not always positive definite: depends on n

As n → ∞,
(C_min)_ij = 1 if p_ij ≠ 0,    (C_min)_ij = 0 if p_ij = 0.

Optimal Taper with isotropic P^f

Suppose we have an “isotropic” forecast matrix, i.e. the covariance depends only on the distance between the locations. It is natural to minimize within “isotropic” taper matrices.

We get analytic solutions for particular cases. Solutions in the squared sense are feasible.
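A sketch of this component-wise minimizer; the exponential covariance below is just an illustrative choice of P^f:

```python
import numpy as np

def naive_optimal_taper(P, n):
    """(C_min)_ij = p_ij^2 / (p_ij^2 + (p_ij^2 + p_ii p_jj) / n)."""
    p2 = P ** 2
    den = p2 + (p2 + np.outer(np.diag(P), np.diag(P))) / n
    # entries with p_ij = 0 give (C_min)_ij = 0, matching the n -> infinity limit
    return np.divide(p2, den, out=np.zeros_like(p2), where=den > 0)

s = np.linspace(0.0, 1.0, 50)
P = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.2)   # exponentially decaying covariance
C_min = naive_optimal_taper(P, n=10)
print(C_min[0, 0])   # diagonal entries equal n/(n + 2) = 10/12, so C_min is not a correlation matrix
```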


Illustration

Suppose a regular transect with q = 250 points in [0, 1] and an exponentially decaying forecast covariance structure. We consider ensemble size n = 20. For 100 MC samples we calculated the spectrum.

[Figure: true spectrum λ_i versus index i, with the estimated spectrum.]

Illustration

Suppose a regular transect with q = 250 points in [0, 1] and an exponentially decaying forecast covariance structure. We taper with Gaspari and Cohn’s taper for ensemble size n = 20. For 100 MC samples, we calculate the MSE of the spectrum as a function of the taper length θ.

Estimated MSE:
No tapering            2556.8 − 3390.8 − 5009.7
Exponential tapering    731.82 − 1131.22 − 1730.78
Naive tapering          727.74 − 1032.5 − 1644.86
Gaspari and Cohn        712.46 − 1011.8 − 1626.51

[Figure: estimated MSE as a function of the taper length θ ∈ [0.2, 0.9]; estimated MSE without tapering: 2502.39 − 3459.69 − 5374.36.]

Discussion

• Tapering the forecast covariance matrix significantly reduces the error due to sampling variability.
• There exists an optimal taper matrix:
  – it is not always practical, i.e. it is not always positive definite,
  – the system is not sensitive with respect to the taper choice.
• Statistical tools to determine optimal taper matrices, “boosting factors”, . . .
• Discrepancy between theory and practice.

Further Research

• Find optimal taper matrices for specific cases.
• Apply to “more”-step forecasts.
• Generalize to other matrices H_t and R_t.
• Apply to high-dimensional problems, i.e. numerical applications, DART.
