The Ensemble Kalman Filter for Combined State and Parameter Estimation

MONTE CARLO TECHNIQUES FOR DATA ASSIMILATION IN LARGE SYSTEMS

GEIR EVENSEN

The ensemble Kalman filter (EnKF) [1] is a sequential data assimilation method that provides an alternative to the traditional Kalman filter (KF) [2], [3] and adjoint or four-dimensional variational (4DVAR) methods [4]–[6] to better handle large state spaces and nonlinear error evolution. EnKF provides a simple conceptual formulation and ease of implementation, since


there is no need to derive a tangent linear operator or adjoint equations, and there are no integrations backward in time. EnKF is used extensively in a large community, including ocean and atmospheric sciences, oil reservoir simulations, and hydrological modeling. To a large extent EnKF overcomes two problems associated with the traditional KF. First, in KF an error covariance matrix for the model state needs to be stored and propagated in time, making the method computationally infeasible for models with high-dimensional state vectors. Second, when the model dynamics are nonlinear, the extended KF (EKF) uses a linearized equation for the error covariance evolution, and this linearization can result in unbounded linear instabilities for the error evolution [7].

Digital Object Identifier 10.1109/MCS.2009.932223

1066-033X/09/$25.00©2009IEEE JUNE 2009 « IEEE CONTROL SYSTEMS MAGAZINE 83

In contrast with EKF, EnKF represents the error covariance matrix by a large stochastic ensemble of model realizations. For large systems, the dimensionality problem is managed by using a low-rank approximation of the error covariance matrix, where the number of independent model realizations is less than the number of unknowns in the model. Thus, the uncertainty is represented by a set of model realizations rather than an explicit expression for the error covariance matrix. The ensemble of model states is integrated forward in time to predict error statistics. For linear models the ensemble integration is consistent with the exact integration of an error covariance equation in the limit of an infinite ensemble size. Furthermore, for nonlinear dynamical models, the use of an ensemble integration leads to full nonlinear evolution of the error statistics, which in EnKF can be computed with a much lower computational cost than in EKF [8].

Whenever measurements are available, each individual realization is updated to incorporate the new information provided by the measurements. Implementations of the update schemes can be formulated as either a stochastic [9] or a deterministic scheme [8], [10]–[13]. Both kinds of schemes solve for a variance-minimizing solution and implicitly assume that the forecast error statistics are Gaussian by using only the ensemble covariance in the update equation.

The assumption of Gaussian distributions in EnKF allows for a linear and efficient update equation to be used. A more sophisticated update scheme needs to be derived to take into account higher order statistics, which leads to particle filtering theory [14], where the Bayes formula is solved at each update step, although normally at a huge computational cost. While the particle filter accounts for non-Gaussian distributions by representing the full pdf in the parameter space, its applicability is normally limited to estimation of a few unknowns at the cost of integrating a very large ensemble consisting of typically more than $O(10^4)$ realizations.

In [8], EnKF is rederived as a sequential Monte Carlo method starting from a Bayesian formulation. The EnKF can then be characterized as a special case of the particle filter, where the Bayesian update step in the particle filter is approximated with a linear update step in the EnKF using only the two first moments of the predicted probability density function (pdf). With linear dynamics, EnKF is equivalent to a particle filter, since this case is fully described by Gaussian pdfs. However, with nonlinear dynamics, non-Gaussian contributions may develop, and the EnKF only approximates the particle filter. Unlike the particle filters [14], EnKF does not need to re-sample the ensemble from the posterior pdf during the analysis step, since each prior model realization is individually updated to create the correct posterior ensemble.

In EnKF, the solution is solved for in the affine space spanned by the ensemble of realizations. The ensemble, which evolves in time according to the nonlinear dynamical model, provides a representation of the subspace where the update is computed at each analysis time. It is possible to formulate analysis schemes in terms of the ensemble, leading to efficient algorithms where the state error covariance matrix is not computed and is only implicitly used.

A major approximation introduced in EnKF is related to the use of a limited number of ensemble realizations. The ensemble size limits the space where the solution is searched for and in addition introduces spurious correlations that lead to excessive decrease of the ensemble variance and possibly filter divergence. The spurious correlations can be handled by localization methods that attempt to reduce the impact of measurements that are located far from the gridpoint to be updated. Localization methods either filter away distant measurements or attempt to reduce the amplitude of the long-range spurious correlations. The use of a local analysis scheme effectively increases the ensemble solution space while reducing the impact of spurious correlations. The use of a local analysis scheme allows for a relatively small ensemble size to be used with a high-dimensional dynamical model.

A chronological list of applications of EnKF is given in [8]. This list includes both low-dimensional systems of highly nonlinear dynamical models as well as high-dimensional ocean and atmospheric circulation models with $O(10^6)$ or more unknowns. Applications include state estimation in operational circulation models for the ocean and atmosphere as well as parameter estimation or history matching in reservoir simulation models. For example, [15]–[17] present an implementation of an EnKF with an isopycnal ocean general circulation model, while [18] examines an implementation of a local EnKF with a state-of-the-art operational numerical weather prediction model using simulated measurements. It is shown that a modest-sized ensemble of 40 members can track the evolution of the atmospheric state with high accuracy.

An implementation of the EnKF at the Canadian Meteorological Centre in [19] demonstrates EnKF for operational atmospheric data assimilation and reviews EnKF with focus on localization and sampling errors. A review in [20] of a variant of EnKF called the local ensemble transform Kalman filter includes a derivation of the analysis equations and the numerical implementation, which differ somewhat from what is normally used in the Kalman filtering literature. An implementation of the local ensemble transform Kalman filter with the National Centers for Environmental Prediction (NCEP) global model, given in [21], concludes that the accuracy of the method is competitive with operational algorithms and that this technique can efficiently handle large numbers of measurements.

An implementation of EnKF with the NCEP model in [22] is compared with the operational NCEP global data assimilation system. The ensemble data assimilation system outperforms a reduced-resolution version of the operational three-dimensional variational (3DVAR) data assimilation

system and shows improvement in data sparse regions. An observation-thinning algorithm is presented in [22], where observations with little information content leading to low variance reduction are filtered out. The thinning algorithm improves the analysis when unmodeled error correlations are present between nearby observations. The need for the thinning is eliminated if the error correlations are properly specified in the measurement error covariance matrix.

EnKF is currently used in several research fields in addition to the ocean and atmosphere applications cited throughout this article. In [23], the EnKF is used to update a model of tropospheric ozone concentrations and to compute short-term air quality forecasts. It is found that the EnKF updated estimates provide improved initial conditions and lead to better forecasts of the next day's ozone concentration maxima. In [24], EnKF is applied to a magnetohydrodynamic model for space weather prediction.

The performance of EnKF in a land surface data assimilation experiment is examined in [25]. These results are compared with a sequential importance re-sampling (SIR) filter, and it is found that EnKF performs almost as well as the SIR filter. Furthermore, it is emphasized that EnKF leads to skewed and even multimodal distributions despite the normality assumption imposed when computing the analysis updates.

In this article, we outline the theory behind the EnKF and demonstrate its use in various high-dimensional and nonlinear applications in mathematical physics while also considering the combined parameter and state estimation problem in some detail. The goal of this article is to serve as an introduction and tutorial for new users of EnKF. We thus present examples that illustrate particular properties of the EnKF, such as its capability to handle high-dimensional state spaces as well as highly nonlinear dynamics.

DATA ASSIMILATION AND PARAMETER ESTIMATION
Given a dynamical model with initial and boundary conditions and a set of measurements that can be related to the model state, the state estimation problem is defined as finding the estimate of the model state that in some weighted measure best fits the model equations, the initial and boundary conditions, and the observed data. Unless we relax the equations and allow some or all of the dynamical model, the conditions, and the measurements to contain errors, the problem may become overdetermined and no general solution exists.

We often use a prior assumption of Gaussian distributions for the error terms. It is also common to assume that errors in the measurements are uncorrelated with errors in the dynamical model. The problem can then be formulated by using a quadratic cost function whose minimum defines the best estimate of the state.

The parameter estimation problem is different from the state estimation problem. Traditionally, in parameter estimation we want to improve estimates of a set of poorly known model parameters leading to an exact model solution that is close to the measurements. Thus, in this case we assume that all errors in the model equations are associated with uncertainties in the selected model parameters. The model initial conditions, boundary conditions, and the model structure are all exactly known. Thus, for any set of model parameters the corresponding solution is found from a single forward integration of the model. The way forward is then to define a cost function that measures the distance between the model prediction and the observations plus a term measuring the deviation of the parameter values from a prior estimate of the parameter values. The relative weight between these two terms is determined by the prior error statistics for the measurements and the prior parameter estimate. Unfortunately, these problems are often hard to solve [8] since the model is highly nonlinear, and multiple local minima may be present in the cost function.

In [8] the combined parameter and state estimation problem is considered. An improved state estimate and a set of improved model parameters are then searched for simultaneously. In [26] and [27] this problem is formulated using a variational cost function that is minimized using the representer method [28], [29]. Both [26] and [27] report convergence problems due to the nonlinearity of the problem and the possible presence of multiple local minima in the cost function. In [8] it is shown that the combined parameter and state estimation problem can be formulated, and in many cases solved efficiently, using ensemble methods. An illustrative application of the EnKF for combined state and parameter estimation includes estimation of the permeability fields together with dynamic state variables in reservoir simulation models [30]. These problems have huge parameter and state spaces with $O(10^6)$ unknowns. The formulation and solution of the combined parameter and state estimation problem using ensemble methods are further discussed below.

REVIEW OF THE KALMAN FILTER

Variance Minimizing Analysis Scheme
The KF is a variance-minimizing algorithm that updates the state estimate whenever measurements are available. The update equations in the KF are normally derived by minimizing the trace of the posterior error covariance matrix. The algorithm refers only to first- and second-order statistical moments. With the assumption of Gaussian priors for the model prediction and the data, the update equation can also be derived as the minimizing solution of a quadratic cost function. We start with a vector of variables stored in $c(x, t)$, which is defined on some spatial domain $D$ with spatial coordinate $x$. When $c(x, t)$ is discretized on a numerical grid representing the spatial

model domain, it can be represented by the state vector $c_k$ at each time instant $t_k$. The cost function can then be written as

\[
J[c_k^a] = (c_k^a - c_k^f)^T (C_{cc}^f)_k^{-1} (c_k^a - c_k^f) + (d_k - M_k c_k^a)^T (C_{\epsilon\epsilon})_k^{-1} (d_k - M_k c_k^a), \tag{1}
\]

where $c_k^a$ and $c_k^f$ are the analyzed and forecast estimates respectively, $d_k$ is the vector of measurements, $M_k$ is the measurement operator that maps the model state $c_k$ to the measurements $d_k$, $(C_{cc}^f)_k$ is the error covariance of the predicted model state, and $(C_{\epsilon\epsilon})_k$ is the measurement error covariance matrix. Minimizing with respect to $c_k^a$ yields the classical KF update equations

\[
c_k^a = c_k^f + K_k (d_k - M_k c_k^f), \tag{2}
\]
\[
(C_{cc}^a)_k = (I - K_k M_k)(C_{cc}^f)_k, \tag{3}
\]
\[
K_k = (C_{cc}^f)_k M_k^T \big( M_k (C_{cc}^f)_k M_k^T + (C_{\epsilon\epsilon})_k \big)^{-1}, \tag{4}
\]

where the matrix $K_k$ is the Kalman gain. Thus, both the model state and its error covariance are updated.

Kalman Filter
It is assumed that the true state $c^t$ evolves in time according to the dynamical model

\[
c_k^t = F c_{k-1}^t + q_{k-1}, \tag{5}
\]

where $F$ is a linear model operator and $q_{k-1}$ is the unknown model error over one time step from $k-1$ to $k$. In this case a numerical model evolves according to

\[
c_k^f = F c_{k-1}^a, \tag{6}
\]

where the superscripts a and f denote analysis and forecast. That is, given the best possible estimate (traditionally named analysis) for $c$ at time $t_{k-1}$, a forecast is calculated at time $t_k$, using the approximate equation (6).

The error covariance equation is derived by subtracting (6) from (5), squaring the result, and taking the expectation, which yields

\[
C_{cc}^f(t_k) = F C_{cc}^a(t_{k-1}) F^T + C_{qq}(t_{k-1}), \tag{7}
\]

where we define the error covariance matrices for the predicted and analyzed estimates as

\[
C_{cc}^f = \overline{(c^f - c^t)(c^f - c^t)^T}, \tag{8}
\]
\[
C_{cc}^a = \overline{(c^a - c^t)(c^a - c^t)^T}. \tag{9}
\]

The overline denotes an expectation operator, which is equivalent to averaging over an ensemble of infinite size.

We now assume a nonlinear model, where the true state vector $c_k^t$ at time $t_k$ is calculated from

\[
c_k^t = G(c_{k-1}^t) + q_{k-1}, \tag{10}
\]

and a forecast is calculated from the approximate equation

\[
c_k^f = G(c_{k-1}^a). \tag{11}
\]

The error statistics then evolve according to the equation

\[
C_{cc}^f(t_k) = G_{k-1}' C_{cc}^a(t_{k-1}) G_{k-1}'^T + C_{qq}(t_{k-1}) + \cdots, \tag{12}
\]

where $C_{qq}(t_{k-1})$ is the model error covariance matrix and $G_{k-1}'$ is the Jacobian or tangent linear operator given by

\[
G_{k-1}' = \left. \frac{\partial G(c)}{\partial c} \right|_{c_{k-1}}. \tag{13}
\]

Note that in (12) we neglect an infinite number of terms containing higher order statistical moments and higher order derivatives of the model operator. EKF is based on the assumption that the contributions from all of the higher order terms are negligible. By discarding these terms we are left with the approximate error covariance expression

\[
C_{cc}^f(t_k) \approx G_{k-1}' C_{cc}^a(t_{k-1}) G_{k-1}'^T + C_{qq}(t_{k-1}). \tag{14}
\]

Higher order approximations for the error covariance evolution are discussed in [31].

EKF with a Nonlinear Ocean Circulation Model
As an application of EKF we consider a nonlinear ocean circulation model [7]. The model in Figure 1 is a multilayer quasi-geostrophic model of the mesoscale ocean currents. The quasi-geostrophic model solves simplified fluid equations for the slow motions in the ocean and is formulated in terms of potential vorticity advection in a background velocity field represented by a stream function. Given a change in the vorticity field, at each time step we can solve for the corresponding stream function.

It is found that the linear evolution equation for the error covariance matrix leads to a linear instability. This instability is demonstrated in an experiment using a steady background flow defined by an eddy standing on a flat bathymetry [see Figure 1(a)]. This particular stream function results in a velocity shear and thus supports a sheared flow instability. Thus, if we add a perturbation and advect it using the linearized equations, then the perturbation grows exponentially. This growth is exactly what is observed in Figure 1(b) and (c). By choosing an initial variance equal to one throughout the model domain, we observe strong

error-variance growth at locations of large velocity and velocity shear in the eddy. The estimated square errors, which equal the trace of $C_{cc}$ divided by the number of gridpoints, indicate exponential error-variance growth.

This linear instability is not realistic. In the real world we expect the instability to saturate at a certain climatological amplitude. As an example, in the atmosphere it is always possible to define a maximum and minimum pressure, which is never exceeded, and the same applies for the eddy field in the ocean. An unstable variance growth cannot be accepted but is in fact what the EKF provides in some cases.

Thus, an apparent closure problem is present in the error-covariance evolution equation, caused by discarding third- and higher order moments in the error covariance equation, leading to a linear instability. If a correct equation could be used to predict the time evolution of the errors, then linear instabilities would saturate due to nonlinear effects. This saturation is missing in EKF, as confirmed by [32]–[34].

FIGURE 1 Example of an extended Kalman filter experiment from [7]. (a) shows the stream function defining the velocity field of a stationary eddy, while (b) shows the resulting error variance in the model domain after integration from $t = 0$ to $t = 25$. Note the large errors at locations where velocities are high. (c) shows the exponential time evolution of the estimated variance averaged over the model domain.

Extended Kalman Filter for the Mean
Equations (11), (13), and (14) are the most commonly used for EKF. A weakness of the formulation is that the central forecast is used as the estimate. The central forecast is the single model realization initialized with the expected value of the initial state and then integrated by the dynamical model and updated at the measurement steps. For nonlinear dynamics the central forecast may not be equal to the expected value, and thus it is just one realization from an infinite ensemble of possible realizations.

An alternative approach is to derive a model for the evolution of the first moment or mean. First $G(c)$ is expanded around $\overline{c}$ to obtain

\[
G(c) = G(\overline{c}) + G'(\overline{c})(c - \overline{c}) + \frac{1}{2} G''(\overline{c})(c - \overline{c})^2 + \cdots. \tag{15}
\]

Inserting (15) in (11) and taking the expectation or ensemble average yields

\[
\overline{c}_k = G(\overline{c}_{k-1}) + \frac{1}{2} G_{k-1}'' C_{cc}(t_{k-1}) + \cdots. \tag{16}
\]

It can be argued that for a statistical estimator it makes more sense to work with the mean than a central forecast. After all, the central forecast does not have any statistical interpretation as illustrated by running an atmospheric model without assimilation updates. The central forecast then becomes just one realization out of infinitely many possible realizations, and it is not clear how we can relate the central forecast to the climatological error covariance estimate. On the other hand the equation for the mean provides an estimate that converges to the climatological mean, and the covariance estimate thus describes the error variance of the climatological mean. All applications of the EKF for data assimilation in ocean and atmospheric models use an equation for the central forecast. However, the interpretation using the equation for the mean supports the formulation used in EnKF.

ENSEMBLE KALMAN FILTER
We begin by representing the error statistics using an ensemble of model states. Next, we present an alternative to the traditional error covariance equation for predicting error statistics. Finally, we derive the traditional EnKF analysis scheme.
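The EKF forecast step discussed above can be sketched numerically. The following is a minimal illustration of equations (11), (13), and (14) for an assumed toy nonlinear model (decoupled logistic growth per component), not the article's quasi-geostrophic model; all names here are our own.

```python
import numpy as np

dt = 0.1  # assumed time step for the toy model

def G(c):
    """Toy nonlinear model operator over one time step (not from the article)."""
    return c + dt * c * (1.0 - c)

def G_jacobian(c):
    """Tangent linear operator (13), analytic for this decoupled toy model."""
    return np.diag(1.0 + dt * (1.0 - 2.0 * c))

def ekf_forecast(c_a, C_a, C_q):
    """One EKF forecast: central forecast (11) and linearized covariance (14)."""
    c_f = G(c_a)
    Gp = G_jacobian(c_a)
    C_f = Gp @ C_a @ Gp.T + C_q  # higher-order moments are discarded, as in (14)
    return c_f, C_f

c_a = np.array([0.2, 0.5])   # analyzed state
C_a = 0.1 * np.eye(2)        # analyzed error covariance
C_q = 0.01 * np.eye(2)       # model error covariance
c_f, C_f = ekf_forecast(c_a, C_a, C_q)
```

For this toy model the Jacobian entries exceed one near $c = 0$, so repeated forecasts without measurement updates inflate the variance, mirroring on a small scale the unbounded linearized growth discussed above.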

Representation of Error Statistics
The error covariance matrices $C_{cc}^f$ and $C_{cc}^a$ for the predicted and analyzed estimate in the Kalman filter are defined in terms of the true state in (8) and (9). However, since the true state is not known, we define the ensemble covariance matrices around the ensemble mean $\overline{c}$ according to

\[
(C_{cc}^e)^f = \overline{(c^f - \overline{c^f})(c^f - \overline{c^f})^T}, \tag{17}
\]
\[
(C_{cc}^e)^a = \overline{(c^a - \overline{c^a})(c^a - \overline{c^a})^T}, \tag{18}
\]

where now the overline denotes an average over the ensemble. Thus, we can use an interpretation where the ensemble mean is the best estimate and the spreading of the ensemble around the mean is a natural definition of the error in the ensemble mean.

Thus, instead of storing a full covariance matrix, we can represent the same error statistics using an appropriate ensemble of model states. Given an error covariance matrix, an ensemble of finite size provides an approximation to the error covariance matrix, and, as the size $N$ of the ensemble increases, the errors in the Monte Carlo sampling decrease proportionally to $1/\sqrt{N}$.

Suppose now that we have $N$ model states or realizations in the ensemble, each of dimension $n$. Each realization can be represented as a single point in an $n$-dimensional state space, while together the realizations constitute a cloud of such points. In the limit as $N$ goes to infinity, the cloud of points can be described using the pdf

\[
f(c) = \frac{dN}{N}, \tag{19}
\]

where $dN$ is the number of points in a small unit volume and $N$ is the total number of points. Statistical moments can then be calculated from either $f(c)$ or the ensemble representing $f(c)$.

Prediction of Error Statistics
A nonlinear model that contains stochastic errors can be written as the stochastic differential equation

\[
dc = G(c)\,dt + h(c)\,dq. \tag{20}
\]

Equation (20) states that an increment in time yields an increment in $c$, which, in addition, is influenced by a random contribution from the stochastic forcing term $h(c)\,dq$, representing the model errors. The term $dq$ describes a vector Brownian motion process with covariance $C_{qq}\,dt$. Since the model operator $G$ in (20) is not an explicit function of the random variable $dq$, the Ito interpretation is used rather than the Stratonovich interpretation [35].

When additive Gaussian model errors forming a Markov process are used, it is possible to derive the Fokker-Planck equation (also called Kolmogorov's equation), which describes the time evolution of the pdf $f(c)$ of the model state. This equation has the form

\[
\frac{\partial f(c)}{\partial t} + \sum_i \frac{\partial \big( g_i f(c) \big)}{\partial c_i} = \frac{1}{2} \sum_{i,j} \frac{\partial^2 \big( f(c) (h C_{qq} h^T)_{ij} \big)}{\partial c_i \partial c_j}, \tag{21}
\]

where $g_i$ is the component number $i$ of the model operator $G$ and $h C_{qq} h^T$ is the covariance matrix for the model errors. The Fokker-Planck equation (21) does not entail any approximations and can be considered as the fundamental equation for the time evolution of the error statistics. A detailed derivation is given in [35]. Equation (21) describes the change of the probability density in a local "volume," which depends on the divergence term describing a probability flux into the local "volume" (impact of the dynamical equation) and the diffusion term, which tends to flatten the probability density due to the effect of stochastic model errors. If (21) could be solved for the pdf, it would be possible to calculate statistical moments such as the mean and the error covariance for the model forecast to be used in the analysis scheme.

A linear model for a Gauss-Markov process, in which the initial condition is assumed to be taken from a normal distribution, has a probability density that is completely characterized by its mean and covariance for all times. We can then derive exact equations for the evolution of the mean and the covariance as a simpler alternative than solving the full Fokker-Planck equation. These moments of (21), including the error covariance (7), are easy to derive, and several methods are illustrated in [35]. The KF uses the first two moments of (21).

For a nonlinear model, the mean and covariance matrix do not in general characterize the time evolution of $f(c)$. These quantities do, however, determine the mean path and the width of the pdf about that path, and it is possible to solve approximate equations for the moments, which is the procedure characterizing the EKF.

The EnKF applies a Markov chain Monte Carlo (MCMC) method to solve (21). The probability density is then represented by a large ensemble of model states. By integrating these model states forward in time according to the model dynamics, as described by the stochastic differential equation (20), this ensemble prediction is equivalent to using a MCMC method to solve the Fokker-Planck equation.

Dynamical models can have stochastic terms embedded within the nonlinear model operator, and the derivation of the associated Fokker-Planck equation can become complex. Fortunately, the explicit form of the Fokker-Planck equation is not needed, since, to solve this equation using MCMC methods, it is sufficient to know that the equation and a solution exist.

Analysis Scheme
We now derive the update scheme in the KF using the ensemble covariances as defined by (17) and (18). For

convenience the time index $k$ is omitted in the equations to follow. As shown by [9] it is essential that the observations be treated as random variables having a distribution with mean equal to the observed value and covariance equal to $C_{\epsilon\epsilon}$. Thus, we start by defining an ensemble of observations

\[
d_j = d + \epsilon_j, \tag{22}
\]

where $j$ counts from one to the number $N$ of ensemble members. By subtracting any nonzero mean from the $N$ samples $\epsilon_j$, it is ensured that the simulated random measurement errors have mean equal to zero and thus the random perturbations do not introduce any bias in the update. Next we define the ensemble covariance matrix of the measurement errors as

\[
C_{\epsilon\epsilon}^e = \overline{\epsilon \epsilon^T}, \tag{23}
\]

while, in the limit of infinite ensemble size, this matrix converges to the prescribed error covariance matrix $C_{\epsilon\epsilon}$ used in the Kalman filter. The following discussion is valid using both an exactly prescribed $C_{\epsilon\epsilon}$ and an ensemble representation $C_{\epsilon\epsilon}^e$ of $C_{\epsilon\epsilon}$, which can be useful in some implementations of the analysis scheme.

The analysis step in EnKF consists of updates performed on each of the ensemble members, as given by

\[
c_j^a = c_j^f + (C_{cc}^e)^f M^T \big( M (C_{cc}^e)^f M^T + C_{\epsilon\epsilon}^e \big)^{-1} (d_j - M c_j^f). \tag{24}
\]

With a finite ensemble size, the use of the ensemble covariances introduces an approximation of the true covariances. Furthermore, if the number of measurements is larger than the number of ensemble members, then the matrices $M (C_{cc}^e)^f M^T$ and $C_{\epsilon\epsilon}^e$ are singular, and pseudo inversion must be used. Equation (24) implies that

\[
\overline{c^a} = \overline{c^f} + (C_{cc}^e)^f M^T \big( M (C_{cc}^e)^f M^T + C_{\epsilon\epsilon}^e \big)^{-1} (\overline{d} - M \overline{c^f}), \tag{25}
\]

where $\overline{d} = d$ since the measurement perturbations have ensemble mean equal to zero. Thus, the relation between the analyzed and predicted ensemble mean is identical to the relation between the analyzed and predicted state in the standard Kalman filter, apart from the use of $(C_{cc}^e)^{f,a}$ and $C_{\epsilon\epsilon}^e$ instead of $C_{cc}^{f,a}$ and $C_{\epsilon\epsilon}$. Note that the introduction of an ensemble of observations does not affect the update of the ensemble mean.

It is now shown that, by updating each of the ensemble members using the perturbed observations, we can create a new ensemble with the correct error statistics. We derive the analyzed error covariance estimate resulting from the analysis scheme given above, although we retain the standard Kalman filter form for the analysis equations. First, (24) and (25) are used to obtain

\[
c_j^a - \overline{c^a} = (I - K_e M)(c_j^f - \overline{c^f}) + K_e (d_j - \overline{d}), \tag{26}
\]

where we use the Kalman gain

\[
K_e = (C_{cc}^e)^f M^T \big( M (C_{cc}^e)^f M^T + C_{\epsilon\epsilon}^e \big)^{-1}. \tag{27}
\]

The error covariance update is then derived as

\[
\begin{aligned}
(C_{cc}^e)^a &= \overline{(c^a - \overline{c^a})(c^a - \overline{c^a})^T} \\
&= \overline{\big( (I - K_e M)(c^f - \overline{c^f}) + K_e (d - \overline{d}) \big)\big( (I - K_e M)(c^f - \overline{c^f}) + K_e (d - \overline{d}) \big)^T} \\
&= (I - K_e M)\,\overline{(c^f - \overline{c^f})(c^f - \overline{c^f})^T}\,(I - K_e M)^T + K_e\,\overline{(d - \overline{d})(d - \overline{d})^T}\,K_e^T \\
&= (I - K_e M)(C_{cc}^e)^f (I - M^T K_e^T) + K_e C_{\epsilon\epsilon}^e K_e^T \\
&= (C_{cc}^e)^f - K_e M (C_{cc}^e)^f - (C_{cc}^e)^f M^T K_e^T + K_e \big( M (C_{cc}^e)^f M^T + C_{\epsilon\epsilon}^e \big) K_e^T \\
&= (I - K_e M)(C_{cc}^e)^f.
\end{aligned} \tag{28}
\]

The last expression in (28) is the traditional result for the minimum error covariance found in the KF analysis scheme. Thus, (28) implies that EnKF in the limit of an infinite ensemble size gives the same result as KF. It is assumed that the distributions used to generate the model-state ensemble and the observation ensemble are independent. Using a finite ensemble size, neglecting the cross-term introduces sampling errors. Note that the derivation (28) shows that the observations $d$ must be treated as random variables to introduce the measurement error covariance matrix $C_{\epsilon\epsilon}^e$ into the expression. That is,

\[
C_{\epsilon\epsilon}^e = \overline{\epsilon \epsilon^T} = \overline{(d - \overline{d})(d - \overline{d})^T}. \tag{29}
\]

A full-rank measurement error covariance matrix can be used in (27), but the use of an ensemble representation of the measurement error covariance matrix leads to an exact cancellation in the second last line in (28), which becomes

\[
K_e \big( M (C_{cc}^e)^f M^T + C_{\epsilon\epsilon}^e \big) K_e^T = K_e \big( M (C_{cc}^e)^f M^T + C_{\epsilon\epsilon}^e \big) \big( M (C_{cc}^e)^f M^T + C_{\epsilon\epsilon}^e \big)^{-1} M (C_{cc}^e)^f = K_e M (C_{cc}^e)^f. \tag{30}
\]

Thus, we conclude that the use of a low-rank measurement error covariance matrix, represented by the measurement perturbations, when computing the Kalman gain, reduces the sampling errors in EnKF. The remaining sampling errors come from neglecting the cross-correlation term between the measurements and the forecast ensemble, which is nonzero with a finite ensemble size, and from the approximation of the state error covariance matrix using a finite ensemble size.

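The stochastic analysis step (22)–(24), with the gain (27), can be written compactly for an ensemble stored as an $n \times N$ matrix. The following is an illustrative sketch, not production code; the variable names and the $1/(N-1)$ normalization of the sample covariances are assumptions of this sketch.

```python
import numpy as np

def enkf_analysis(E, d, M, C_eps, rng):
    """Stochastic EnKF analysis: update every member of the n-by-N ensemble E
    with its own perturbed observation, following (22)-(24) and (27)."""
    n, N = E.shape
    m = d.size
    # Perturbed observations (22); remove the sample mean so the perturbations
    # introduce no bias in the update.
    eps = rng.multivariate_normal(np.zeros(m), C_eps, size=N).T
    eps -= eps.mean(axis=1, keepdims=True)
    D = d[:, None] + eps
    # Ensemble forecast covariance (17) and ensemble measurement error
    # covariance (23), both with an assumed 1/(N-1) normalization.
    A = E - E.mean(axis=1, keepdims=True)
    C_f = A @ A.T / (N - 1)
    C_eps_e = eps @ eps.T / (N - 1)
    # Kalman gain (27); np.linalg.pinv here would give the low-rank variant.
    K = C_f @ M.T @ np.linalg.inv(M @ C_f @ M.T + C_eps_e)
    # Member-by-member update (24).
    return E + K @ (D - M @ E)

rng = np.random.default_rng(0)
E = rng.normal(1.0, 1.0, size=(1, 500))   # scalar prior: mean 1, variance 1
Ea = enkf_analysis(E, np.array([0.0]), np.eye(1), np.array([[0.04]]), rng)
```

For this scalar case the posterior ensemble mean should fall near the variance-weighted average of the prior mean and the observation, and the posterior variance near $(1^{-1} + 0.04^{-1})^{-1} \approx 0.038$, up to sampling error.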

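The finite-ensemble sampling errors discussed above, and the $1/\sqrt{N}$ Monte Carlo convergence of the ensemble covariance, can be checked numerically. A minimal sketch, with an assumed identity "true" covariance and assumed trial counts:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_cov_error(N, trials=200):
    """Average Frobenius-norm error of the ensemble covariance estimate (17)
    for ensembles of size N drawn from a distribution with covariance I."""
    errs = []
    for _ in range(trials):
        E = rng.standard_normal((2, N))
        A = E - E.mean(axis=1, keepdims=True)
        C_e = A @ A.T / (N - 1)
        errs.append(np.linalg.norm(C_e - np.eye(2)))
    return float(np.mean(errs))

e_50, e_5000 = mean_cov_error(50), mean_cov_error(5000)
# increasing N by a factor of 100 should shrink the error by roughly 10
```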
The above derivation assumes that the inverse in the Kalman gain (27) exists. However, the derivation also holds when the matrix in the inversion is of low rank, for example, when the number of measurements is larger than the number of realizations and the low-rank C^e_εε is used. The inverse in (27) can then be replaced with the pseudoinverse, and we can write the Kalman gain as

  K_e = (C^e_ψψ)^f M^T (M (C^e_ψψ)^f M^T + C^e_εε)^+.   (31)

When the matrix in the inversion is of full rank, (31) becomes identical to (27). Using (31) the expression (30) becomes

  K_e (M (C^e_ψψ)^f M^T + C^e_εε) K_e^T
    = (C^e_ψψ)^f M^T (M (C^e_ψψ)^f M^T + C^e_εε)^+ (M (C^e_ψψ)^f M^T + C^e_εε) (M (C^e_ψψ)^f M^T + C^e_εε)^+ M (C^e_ψψ)^f
    = (C^e_ψψ)^f M^T (M (C^e_ψψ)^f M^T + C^e_εε)^+ M (C^e_ψψ)^f
    = K_e M (C^e_ψψ)^f,   (32)

where we have used the property Y^+ = Y^+ Y Y^+ of the pseudoinverse.

It should be noted that the EnKF analysis scheme is approximate in the sense that non-Gaussian contributions in the predicted ensemble are not properly taken into account. In other words, the EnKF analysis scheme does not solve the Bayesian update equation for non-Gaussian pdfs. On the other hand, the EnKF analysis scheme is not just a resampling of a Gaussian posterior distribution. Only the updates defined by the right-hand side of (24), which are added to the prior non-Gaussian ensemble, are linear. Thus, the updated ensemble inherits many of the non-Gaussian properties from the forecast ensemble. In summary, we have a computationally efficient analysis scheme where we avoid resampling of the posterior.

Ensemble Kalman Filter with a Linear Advection Equation
The properties of EnKF are now illustrated in a simple example when used with a one-dimensional linear advection model. The model describes general transport in a prescribed background flow on a periodic domain of length 1000 m. The model has the constant advection speed u = 1 m/s, the grid spacing Δx = 1 m, and the time step Δt = 1 s. Given an initial condition, the solution of this model is exactly known, which facilitates realistic experiments with zero model error to examine the impact of the dynamical evolution of the error covariance.

The true initial state is sampled from a normal distribution N, with mean equal to zero, variance equal to one, and a spatial decorrelation length of 20 m. The first-guess solution is generated by drawing another sample from N and adding this sample to the true state. The initial ensemble of 1000 realizations is generated by adding samples drawn from N to the first-guess solution. Thus, the initial state is assumed to have an error variance equal to one. Four measurements of the true solution, distributed regularly in the model domain, are assimilated every fifth time step. The measurements of the wave amplitude are contaminated by errors of variance equal to 0.01, in nondimensional units, and we assume uncorrelated measurement errors. The length of the integration is 300 s, which is 50 s longer than the time of 250 s needed for the solution to advect from one measurement to the next.

The example in Figure 2 illustrates the convergence of the estimated solution at various times during the experiment. In particular, Figure 2 shows how information from measurements is propagated with the advection speed and how the error variance is reduced each time measurements are assimilated. The first plot shows the result of the first update with the four measurements. Near the measurement locations, the estimated solution is consistent with both the true solution and the measurements, and the error variance is reduced accordingly. The second plot is taken at t = 150 s, that is, after 30 updates with measurements. Now the information from the measurements has propagated to the right with the advection speed, as seen both from direct comparison of the estimate with the true solution and from the estimated variance. The final plot, which is taken at t = 300 s, shows that the estimate is now in good agreement with the true solution throughout the model domain. Note also the linear increase in error variance to the right of the measurements, which is caused by the addition of model errors at each time step. It is also clear that the estimated solution deteriorates far from the measurements in the advection direction. For linear models with regular measurements at fixed locations and stationary error statistics, the increase of error variance from model errors balances the reduction from the updates with measurements.

Discussion
We now have a complete system of equations that constitute the EnKF, and the similarity with the standard KF is maintained both for the prediction of error covariances

and in the analysis scheme. For linear dynamics the EnKF solution converges exactly to the KF solution with increasing ensemble size.

One of the advantages of EnKF is that, for nonlinear models, the equation for the mean is solved and no closure assumption is used, since each ensemble member is integrated by the full nonlinear model. This nonlinear error evolution is contrary to the approximate equation for the mean (16), which is used in EKF.

Thus, it is possible to interpret EnKF as a purely statistical Monte Carlo method where the ensemble of model states evolves in state space with the mean as the best estimate and the spreading of the ensemble as the error variance. At measurement times each observation is represented by another ensemble, where the mean is the actual measurement and the variance of the ensemble represents the measurement errors. Thus, we combine a stochastic prediction step with a stochastic analysis step.

FIGURE 2 An ensemble Kalman filter experiment. For this experiment a linear advection equation illustrates how a limited ensemble size of 100 realizations facilitates estimation in a high-dimensional system whose state vector contains 1000 entries. The plots show the reference solution, measurements, estimate, and estimated standard deviation at three different times, (a) t = 5 s, (b) t = 150 s, and (c) t = 300 s.

PROBABILISTIC FORMULATION
For the ensemble Kalman smoother (EnKS) [8], the estimate at a particular time is updated based on past, present, and future measurements. In contrast, a filter estimate is influenced only by the past and present measurements. Thus, EnKF becomes a special case of EnKS, where information from measurements is not projected backward in time. The assumptions of measurement errors being independent in time and the dynamical model being a Markov process are sufficient to derive the EnKF and the EnKS. These assumptions are normally not critical and are already used in the original KF. It is also possible to include the estimation of static model parameters in a consistent manner. The combined parameter and state estimation problem for a dynamical model can be formulated as finding the joint pdf of the parameters and model state, given a set of measurements d and a dynamical model with known uncertainties.

Model Equations and Measurements
We consider a model with associated initial and boundary conditions on the spatial domain D with boundary ∂D, and with observations,

  ∂ψ(x, t)/∂t = G(ψ(x, t), α(x)) + q(x, t),   (33)
  ψ(x, t₀) = Ψ₀(x) + a(x),   (34)
  ψ(ξ, t) = Ψ_b(ξ, t) + b(ξ, t),  for all ξ ∈ ∂D,   (35)
  α(x) = α₀(x) + α′(x),   (36)
  M[ψ, α] = d + ε.   (37)

The model state ψ(x, t) ∈ R^{n_ψ} is a vector consisting of the n_ψ model variables, where each variable is a function of

space and time. The nonlinear model is defined by (33), where G(ψ, α) ∈ R^{n_ψ} is the nonlinear model operator. More general forms can be used for the nonlinear model operator, although (33) suffices to demonstrate the methods considered here.

The model state is assumed to evolve in time from the initial state Ψ₀(x) ∈ R^{n_ψ} defined in (34), under the constraints of the boundary conditions Ψ_b(ξ, t) ∈ R^{n_ψ} defined in (35). The coordinate ξ runs over the surface ∂D, where the boundary conditions are defined. The variable b is used to represent errors in the boundary conditions.

We define α(x) ∈ R^{n_α} as the set of n_α poorly known parameters of the model. The parameters can be a vector of spatial fields in the form written here, or, alternatively, a vector of scalars, and are assumed to be constant in time. A prior estimate α₀(x) ∈ R^{n_α} of the vector of parameters α(x) ∈ R^{n_α} is introduced through (36), and possible errors in the prior are represented by α′(x).

Additional conditions are present in the form of the measurements d ∈ R^M. Both direct point measurements of the model solution and more complex parameters that are nonlinearly related to the model state can be used. For the time being we restrict ourselves to the case of linear measurements. An example of a direct measurement functional is then

  M_i[ψ] = ∫∫ ψ^T(x, t) δψ_i δ(t − t_i) δ(x − x_i) dt dx = ψ^T(x_i, t_i) δψ_i,   (38)

where the integration is over the space and time domain of the model. The measurement d_i is related to the model-state variable as selected by the vector δψ_i ∈ R^{n_ψ} and evaluated at the space and time location (x_i, t_i). If a model with three state variables is used and the second variable is measured, then δψ_i becomes the vector (0, 1, 0)^T, while δ(t − t_i) and δ(x − x_i) are Dirac delta functions.

In (33)–(37) we include unknown error terms, q, a, b, α′, and ε, which represent errors in the model equations, the initial and boundary conditions, the first guess for the model parameters, and the measurements, respectively. Without these error terms the system as given above is overdetermined and has no solution. On the other hand, when we introduce these error terms without additional conditions, the system has infinitely many solutions. The way to proceed is to introduce a statistical hypothesis about the errors, for example, assuming that the errors are normally distributed with mean equal to zero and known error covariances.

FIGURE 3 Discretization in time. The time interval is discretized into k + 1 nodes, t₀ to t_k, where the model state vector ψ_i = ψ(t_i) is defined. The measurement vectors d_j are available at the discrete subset of times t_{i(j)}, where j = 1, …, J.

Bayes Theorem
We now consider the model variables, the poorly known parameters, the initial and boundary conditions, and the measurements as random variables, which can be described by pdfs. The joint pdf for the model state, as a function of space, time, and the parameters, is f(ψ, α). Furthermore, for the measurements we can define the likelihood function f(d|ψ, α). Thus, we may have measurements of both the model state and the parameters. Using Bayes theorem, the parameter and state estimation problem is now written in the simplified form

  f(ψ, α|d) = γ f(ψ, α) f(d|ψ, α),   (39)

where γ is a constant of proportionality whose computation requires the evaluation of the integral of (39) over the high-dimensional solution and parameter space.

Parameter estimation problems, in particular, for applications involving high-dimensional models, such as oceanic, atmospheric, marine ecosystem, hydrology, and petroleum applications, often do not include the model state as a variable to be estimated. It is more common to first solve for the poorly known parameters by minimizing an appropriate cost function where the model equations act as a strong constraint and then rerun the model to find the model solution. It is then implicitly assumed that the model does not contain errors, an assumption that generally is invalid.

In the dynamical model, we specify initial and boundary conditions as random variables, and we include prior information about the parameters. Thus, we define the pdfs f(Ψ₀), f(Ψ_b), and f(α) for the estimates Ψ₀, Ψ_b, and α of the initial and boundary conditions, and the parameters, respectively. Instead of f(ψ, α), we write

  f(ψ, α, Ψ₀, Ψ_b) = f(ψ|α, Ψ₀, Ψ_b) f(Ψ₀) f(Ψ_b) f(α).   (40)

Equation (39) should accordingly be written as

  f(ψ, α, Ψ₀, Ψ_b|d) = γ f(ψ|α, Ψ₀, Ψ_b) f(Ψ₀) f(Ψ_b) f(α) f(d|ψ, α),   (41)

where it is also assumed that the boundary conditions and initial conditions are independent, although this assumption may not be true for the locations where initial and boundary conditions intersect at t₀. Here the pdf f(ψ|α, Ψ₀, Ψ_b) is the prior density for the model solution given the parameters and initial and boundary conditions.
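In a discretized model, the linear point-measurement functional (38) reduces to a row of zeros with a single one: the Dirac deltas select a grid location and δψ_i selects a variable. A minimal sketch of this reduction (the grid, sizes, and names are ours, not the article's):

```python
import numpy as np

n_var, n_grid = 3, 5                       # three model variables on a five-point grid
psi = np.arange(n_var * n_grid, dtype=float).reshape(n_var, n_grid)

def point_measurement_row(var_index, grid_index):
    """Discrete analogue of (38): the deltas become a single 1 in a row of M."""
    row = np.zeros(n_var * n_grid)
    row[var_index * n_grid + grid_index] = 1.0
    return row

# Measure the second variable at the fourth grid point, cf. delta_psi_i = (0, 1, 0)^T.
M_op = point_measurement_row(1, 3)[None, :]
d_i = M_op @ psi.reshape(-1)

assert d_i[0] == psi[1, 3]
```

Stacking one such row per observation gives the linear measurement operator M used in the analysis equations below.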

Discrete Formulation
In the following discussion we work with a model state that is discretized in time, that is, ψ(x, t) is represented at fixed time intervals as ψ_i(x) = ψ(x, t_i) with i = 0, 1, …, k; see Figure 3. Furthermore, we define the pdf for the model integration from time t_{i−1} to t_i as f(ψ_i|ψ_{i−1}, α, Ψ_b(t_i)), which assumes that the model is a first-order Markov process. The joint pdf for the model solution and the parameters in (40) can now be written as

  f(ψ₁, …, ψ_k, α, Ψ₀, Ψ_b) = f(α) f(Ψ_b) f(Ψ₀) ∏_{i=1}^{k} f(ψ_i|ψ_{i−1}, α, Ψ_b).   (42)

Independent Measurements
We now assume that the measurements d ∈ R^M can be divided into subsets of measurement vectors d_j ∈ R^{m_j}, collected at times t_{i(j)}, with j = 1, …, J and 0 < i(1) < i(2) < … < i(J) ≤ k. The subset d_j depends only on ψ(t_{i(j)}) = ψ_{i(j)} and α. Furthermore, it is assumed that the measurement errors are uncorrelated in time. We can then write

  f(d|ψ, α) = ∏_{j=1}^{J} f(d_j|ψ_{i(j)}, α),   (43)

and from Bayes theorem we obtain

  f(ψ₁, …, ψ_k, α, Ψ₀, Ψ_b|d) = γ f(α) f(Ψ₀) f(Ψ_b) ∏_{i=1}^{k} f(ψ_i|ψ_{i−1}, α) ∏_{j=1}^{J} f(d_j|ψ_{i(j)}, α).   (44)

Sequential Processing of Measurements
It is shown in [36] and [37] that, in the case of time-correlated model errors, it is possible to reformulate the problem as a first-order Markov process by augmenting the model errors to the model-state vector. A simple equation forced by white noise can be used to simulate the time evolution of the model errors.

In [38] it is shown that a general smoother and filter can be derived from the Bayesian formulation given in (44). We now rewrite (44) as a sequence of iterations

  f(ψ₁, …, ψ_{i(j)}, α, Ψ₀, Ψ_b|d₁, …, d_j)
    = γ f(ψ₁, …, ψ_{i(j−1)}, α, Ψ₀, Ψ_b|d₁, …, d_{j−1}) ∏_{i=i(j−1)+1}^{i(j)} f(ψ_i|ψ_{i−1}, α) f(d_j|ψ_{i(j)}, α).   (45)

Thus, we formulate the combined parameter and state-estimation problem using Bayesian statistics and see that, under the condition that measurement errors are independent in time and the dynamical model is a Markov process, a recursive formulation can be used for Bayes theorem. That is, the model state and parameters with their respective uncertainties are updated sequentially in time whenever the measurements become available.

We note again that this recursion does not introduce any significant approximations and thus describes the full inverse problem as long as the model is a Markov process and the measurement errors are independent in time. Further, for many problems the recursive processing of measurements provides a better posed approach for solving the inverse problem than trying to process all of the measurements simultaneously as is normally done in variational formulations. Sequential processing is also convenient for forecasting problems where new measurements can be processed when they arrive without recomputing the full inversion.

Ensemble Smoother
The ensemble smoother (ES) can be derived by assuming that the pdfs for the model prediction as well as the likelihood are Gaussian and by using the original Bayes theorem (41). The derivation requires that we approximate the pdfs resulting from an integration of the ensemble through the whole assimilation time period with Gaussian pdfs. We can then replace Bayes theorem with a least squares cost function similar to (1), but with the time dimension included, and the analysis becomes a standard variance-minimizing analysis in space and time. All of the data are processed in one step, and the solution is updated as a function of space and time, using the space-time covariances estimated from the ensemble of model realizations. The ES in [39] is computed as a first-guess estimate, which is the mean of the freely evolving ensemble, plus a linear combination of time-dependent influence functions, which are calculated from the ensemble statistics. Thus, the method is equivalent to a variance-minimizing objective analysis method where the time dimension is included.

Ensemble Kalman Smoother
An assumption of Gaussian pdfs for the model prediction (the prior) and the distribution for the data (the likelihood function) in (45) allows us to replace the Bayesian update formula with a least squares cost function similar to (1) but additionally including the state vector at all previous times. Again the cost function is minimized using a standard variance-minimizing analysis scheme, involving a state variable defined from the initial time to the current update time. That is, we also update the state variables backward in time using the combined time and space ensemble covariances. This scheme results in the EnKS as in [38].

Ensemble Kalman Filter
The EnKF is just a special case of EnKS where the updates at previous times are skipped. EnKF is obtained by

integrating out the state variables at all previous times from (45) and assuming that the resulting model pdf for the current time as well as the likelihood are Gaussian. The incremental update (45) can then be replaced by the penalty function (1), leading to the standard Kalman filter analysis equations. Thus, the measurements are filtered. At the final time, or, actually, from the latest update and for predictions into the future, EnKF and EnKS provide identical solutions.

EXAMPLE WITH THE LORENZ EQUATIONS
The example from [40] and [38] with the chaotic Lorenz model of [41] is now used to compare ES, EnKS, and EnKF. The Lorenz model consists of the coupled system of nonlinear ordinary differential equations given by

  dx/dt = γ(y − x),   (46)
  dy/dt = ρx − y − xz,   (47)
  dz/dt = xy − βz.   (48)

Here x(t), y(t), and z(t) are the dependent variables, and we choose the parameter values γ = 10, ρ = 28, and β = 8/3. The initial conditions for the reference case are given by (x₀, y₀, z₀) = (1.508870, −1.531271, 25.46091), and the time interval is t ∈ [0, 40].

The observations and initial conditions are simulated by adding normally distributed white noise with zero mean and variance equal to 2.0 to the reference solution. All of the variables x, y, and z are measured. In the calculation of the ensemble statistics, an ensemble of 1000 members is used. The same simulation is rerun with various ensemble sizes, and the differences between the results are negligible with as few as 50 ensemble members.

The three methods discussed above are now examined and compared in an experiment where the time between measurements is Δt_obs = 0.5, which is similar to Experiment B in [40]. In the upper plots in figures 4–6, the red line denotes the estimate and the blue line is the reference solution. In the lower plots the red line is the standard deviation estimated from ensemble statistics, while the blue line is the true residuals with respect to the reference solution.

Ensemble Smoother Solution
The ES solution for the x-component and the associated estimated error variance are given in Figure 4. It is found that the ES performs rather poorly with the current data density. Note, however, that even if the fit to the reference trajectory is poor, the ES solution captures most of the transitions. The main problem is related to the estimate of the amplitudes in the reference solution. The problem is linked to the appearance of non-Gaussian contributions in the distribution for the model evolution, which can be expected in such a strongly nonlinear case.

Clearly, the error estimates evaluated from the posterior ensemble are not large enough at the peaks where the smoother performs poorly. The underestimated errors again result from neglecting the non-Gaussian contribution from the probability distribution for the model evolution. Otherwise, the error estimate looks reasonable, with minima at the measurement locations and maxima between the measurements. Note again that if a linear model is used, then the posterior density becomes Gaussian and the ES provides, in the limit of an infinite ensemble size, the same solution as the EnKS and the Kalman smoother.

FIGURE 4 Ensemble smoother. (a) shows the inverse estimate (red line) and reference solution (blue line) for x. (b) shows the corresponding estimated standard deviations (red line) as well as the absolute value of the difference between the reference solution and the estimate, that is, the real posterior errors (blue line). (Reproduced from [38] with permission.)

Ensemble Kalman Filter Solution
EnKF does a reasonably good job tracking the reference solution with the lower data density, as can be seen in Figure 5. One transition is missed near t = 18, while EnKF has problems, for example, at t = 1, 5, 9, 10, 13, 17, 19, 23, 26, and 34. The error variance estimate is consistent, showing large peaks at the locations where the estimate obviously has problems tracking the reference solution. Note also the similarity between the absolute value of the residual between the reference solution and the estimate, and the estimated standard deviation. For all peaks in the residual, a corresponding peak is present in the error variance estimate.
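The Lorenz system (46)–(48) with these parameter values is easy to integrate with a standard fourth-order Runge–Kutta step; the step size and storage below are our own illustrative choices, not the article's setup:

```python
import numpy as np

def lorenz(state, g=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of (46)-(48)."""
    x, y, z = state
    return np.array([g * (y - x), r * x - y - x * z, x * y - b * z])

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.508870, -1.531271, 25.46091])   # reference initial condition
dt, n_steps = 0.01, 4000                            # integrate over t in [0, 40]
traj = np.empty((n_steps + 1, 3))
traj[0] = state
for i in range(n_steps):
    state = rk4_step(lorenz, state, dt)
    traj[i + 1] = state

# The solution stays on the bounded attractor and visits both wings (x changes sign).
assert np.all(np.abs(traj) < 100.0)
assert traj[:, 0].min() < 0.0 < traj[:, 0].max()
```

The bounded trajectory with irregular sign changes in x reflects the chaotic switching between the two attractor wings that makes this a demanding test case for the ensemble methods.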

The error estimates show the same behavior as in [32], with very strong error growth when the model solution passes through the unstable regions of the state space, and otherwise weak error variance growth, or even decay, in the stable regions. Note, for example, the low error variance for t ∈ [28, 34] corresponding to the oscillation of the solution around one of the attractors.

In this case, the nonlinearity of the problem causes EnKF to perform better than the ES. In fact, at each update, the realizations are pulled toward the true solution and are not allowed to diverge toward the wrong attractors of the system. In addition, the Gaussian increments of the ensemble members lead to an approximately Gaussian ensemble distributed around the true solution. This property of the sequential updating is not exploited in the ES, where realizations evolve freely and lead to non-Gaussian ensemble distributions. Note again that if the model dynamics are linear, then, in the limit of an infinite ensemble size, EnKF gives the same solution as the Kalman filter and the ES solution gives a better result than EnKF.

Ensemble Kalman Smoother Solution
Figure 6 shows the solution obtained by EnKS. This solution is smoother in time than the EnKF solution and provides a better fit to the reference trajectory. All of the problematic locations in the EnKF solution are recovered in the smoother estimate. Note, for example, that the additional transitions at t = 1, 5, 13, and 34 in the EnKF solution are eliminated in the smoother. In addition, the missed transition at t = 17 is recovered by EnKS.

The error estimates are reduced throughout the time interval. In particular, the large peaks in the EnKF solution are now significantly reduced. As for the EnKF solution, there are corresponding peaks in the error estimates for all the peaks in the residuals, which suggests that the EnKS error estimate is consistent with the true errors. In fact, in [40], it is found that the EnKS solution with Δt_obs = 0.5 seems to do as well as or better than the EnKF solution with Δt_obs = 0.25.

Note that, if only z is measured in the Lorenz equations, the measured information is not sufficient to determine the solution. EnKF in this case develops realizations located at both attractors, and a bimodal distribution develops. The EnKF update breaks down with the bimodal distribution, but even the use of a Bayesian update in a particle filter does not suffice to determine the correct solution in this case, since the bimodal distribution has the same probability for both peaks of the distribution. Note also that the assumption of Gaussian pdfs in the analysis equation is an approximation, whose severity must be judged on a case-by-case basis.

PRACTICAL IMPLEMENTATION
In [37] it is shown that the EnKF analysis scheme can be formulated in terms of the ensemble without reference to the ensemble covariance matrix, which allows for efficient numerical implementation and an alternative interpretation of the method. In the discussion below, we omit the time index, since all variables refer to the same update time.

Ensemble Representation of the Covariance
We define the matrix A whose columns are the ensemble members ψ_i ∈ R^n by

  A = (ψ₁, ψ₂, …, ψ_N) ∈ R^{n×N},   (49)

where N is the number of ensemble members and n is the size of the model state vector. The ensemble mean is stored in each column of Ā, which is defined as

  Ā = A 1_N,   (50)

where 1_N ∈ R^{N×N} is the matrix whose entries are all equal to 1/N. We then define the ensemble perturbation matrix as

  A′ = A − Ā = A(I − 1_N).   (51)

The ensemble covariance matrix C^e_ψψ ∈ R^{n×n} can be defined as

  C^e_ψψ = A′(A′)^T / (N − 1).   (52)

Measurement Perturbations
Given a vector of measurements d ∈ R^m, where m is the number of measurements, we define the N vectors of perturbed observations as

  d_j = d + ε_j,  j = 1, …, N,   (53)

which are stored in the columns of the matrix

  D = (d₁, d₂, …, d_N) ∈ R^{m×N},   (54)

while the ensemble of perturbations, with ensemble mean equal to zero, are stored in the matrix

  E = (ε₁, ε₂, …, ε_N) ∈ R^{m×N},   (55)

from which we construct the ensemble representation of the measurement error covariance matrix

  C^e_εε = E E^T / (N − 1).   (56)

Analysis Equation
The analysis equation (24), expressed in terms of the ensemble matrices, is

  A^a = A + C^e_ψψ M^T (M C^e_ψψ M^T + C^e_εε)^{−1} (D − M A).   (57)
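The matrices (49)–(57) map directly onto array code. The sketch below applies the stochastic analysis (57) to a scaled-down analogue of the linear advection experiment of Figure 2; the grid size, ensemble size, the field sampler, and all names are our own simplifications, and the measurement operator M is plain row selection:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, N = 100, 50                       # grid points and ensemble size (scaled down)
obs_idx = np.arange(12, nx, 25)       # four regularly spaced measurement locations
obs_var = 0.01

def sample_fields(n_samples):
    """Smooth periodic random fields with (approximately) unit variance."""
    k = np.fft.rfftfreq(nx)
    taper = np.exp(-(10.0 * k) ** 2)  # sets an illustrative decorrelation length
    f = np.array([np.fft.irfft(taper * np.fft.rfft(rng.standard_normal(nx)), nx)
                  for _ in range(n_samples)])
    return f / f.std()

truth = sample_fields(1)[0]
A = (truth + sample_fields(1)[0] + sample_fields(N)).T   # ensemble A of (49), (nx, N)
rmse0 = np.sqrt(np.mean((A.mean(axis=1) - truth) ** 2))

for step in range(50):
    truth = np.roll(truth, 1)                 # u*dt = dx: advection is a one-cell shift
    A = np.roll(A, 1, axis=0)
    if (step + 1) % 5 == 0:                   # assimilate every fifth time step
        d = truth[obs_idx] + np.sqrt(obs_var) * rng.standard_normal(obs_idx.size)
        E = np.sqrt(obs_var) * rng.standard_normal((obs_idx.size, N))  # cf. (55)
        D = d[:, None] + E                                             # (53)-(54)
        Ap = A - A.mean(axis=1, keepdims=True)                         # (51)
        S = Ap[obs_idx]                       # M A' for a row-selection M
        C = S @ S.T + E @ E.T                 # ensemble form of M Ce M^T + Cee
        A = A + Ap @ S.T @ np.linalg.solve(C, D - A[obs_idx])          # (57)

rmse1 = np.sqrt(np.mean((A.mean(axis=1) - truth) ** 2))
assert rmse1 < rmse0                          # assimilation reduces the mean error
```

As in the article's experiment, the corrections introduced near the measurement locations are advected downstream between updates, so the error of the ensemble mean decreases over the whole domain.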

Using the ensemble of innovation vectors defined as

  D′ = D − M A,   (58)

along with the definitions of the ensemble error covariance matrices in (52) and (56), the analysis can be expressed as

  A^a = A + A′(A′)^T M^T (M A′(A′)^T M^T + E E^T)^{−1} D′,   (59)

where all references to the error covariance matrices are eliminated. We now introduce the matrix S ∈ R^{m×N} holding the measurements of the ensemble perturbations,

  S = M A′,   (60)

and the matrix C ∈ R^{m×m},

  C = S S^T + (N − 1) C_εε.   (61)

Here we can use the full-rank, exact measurement error covariance matrix C_εε as well as the low-rank representation C^e_εε defined in (56). The analysis equation (59) can then be written as

  A^a = A + A′ S^T C^{−1} D′
      = A + A(I − 1_N) S^T C^{−1} D′
      = A(I + (I − 1_N) S^T C^{−1} D′)
      = A(I + S^T C^{−1} D′)
      = A X,   (62)

where we use (51) and 1_N S^T ≡ 0. The matrix X ∈ R^{N×N} is defined as

  X = I + S^T C^{−1} D′.   (63)

Thus, the EnKF analysis becomes a combination of the forecast ensemble members and is searched for in the space spanned by the forecast ensemble.

FIGURE 5 Ensemble Kalman filter. (a) shows the inverse estimate (red line) and reference solution (blue line) for x. (b) shows the corresponding estimated standard deviations (red line) as well as the absolute value of the difference between the reference solution and the estimate, that is, the real posterior errors (blue line). (Reproduced from [38] with permission.)

It is clear that (62) is a stochastic scheme due to the use of randomly perturbed measurements. Thus, (62) allows for a nice interpretation of EnKF as a sequential Markov chain Monte Carlo algorithm, while making it easy to understand and implement the method. The efficient and stable numerical implementation of the analysis scheme is discussed in [8], including the case in which C is singular due to the number of measurements being larger than the number of realizations.

In practice, the ensemble size is critical, since the computational cost scales linearly with the number of realizations. That is, each individual realization needs to be integrated forward in time. The cost associated with the ensemble integration motivates the use of an ensemble with the minimum number of realizations that can provide acceptable accuracy.

There are two major sources of sampling errors in EnKF, namely, the use of a finite ensemble of stochastic model realizations as well as the introduction of stochastic measurement perturbations [8], [42]. In addition, stochastic model errors influence the predicted error statistics, which are approximated by the ensemble. The sampling of physically acceptable model realizations and realizations of model errors is chosen to ensure that the ensemble matrix has full rank and good conditioning. Furthermore, the stochastic perturbation of measurements used in EnKF can be avoided using a square root implementation of the analysis scheme, to be discussed below.

EnKF for Combined Parameter and State Estimation
When using EnKF to estimate poorly known model parameters, we start by representing the prior pdfs of the parameters by an ensemble of realizations, which is augmented to the state ensemble matrix A at the update steps. The poorly known parameters are then updated using the variance-minimizing analysis scheme, where the covariances between the predicted data and the parameters are used to update the parameters.
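The equivalence of the covariance form (59) and the ensemble-space form (62)–(63), including the property 1_N S^T = 0 used in the derivation, can be checked numerically on synthetic matrices (a self-contained sketch; names ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, m = 6, 30, 4

A = rng.standard_normal((n, N))                 # forecast ensemble
Ap = A - A.mean(axis=1, keepdims=True)          # A' = A(I - 1_N), (51)
M_op = rng.standard_normal((m, n))              # a generic linear measurement operator
E = 0.5 * rng.standard_normal((m, N))
E -= E.mean(axis=1, keepdims=True)              # zero-mean measurement perturbations
D = rng.standard_normal(m)[:, None] + E         # perturbed measurements, (53)-(54)
Dp = D - M_op @ A                               # innovations, (58)

S = M_op @ Ap                                   # (60)
C = S @ S.T + E @ E.T                           # (61) with (N - 1) Cee = E E^T

one_N = np.full((N, N), 1.0 / N)
assert np.allclose(one_N @ S.T, 0.0)            # the property 1_N S^T = 0 used in (62)

# Covariance form (59) and ensemble-space form (62)-(63) of the same analysis.
Aa_cov = A + Ap @ Ap.T @ M_op.T @ np.linalg.solve(
    M_op @ Ap @ Ap.T @ M_op.T + E @ E.T, Dp)
X = np.eye(N) + S.T @ np.linalg.solve(C, Dp)    # (63)
Aa_ens = A @ X                                  # (62)

assert np.allclose(Aa_cov, Aa_ens)
```

The check also makes the interpretation of (62) concrete: the analysis is a right-multiplication of the forecast ensemble by an N × N matrix, so it lives entirely in the ensemble space.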

The updated ensemble for the parameters is included in the space defined by the initial ensemble of realizations. Thus, EnKF reduces the dimension of the combined parameter and state estimation problem to a size given by the dimension of the ensemble space. This simplification allows us to handle large sets of parameters, but it requires that the true parameters can be well represented in the ensemble space.

The parameter estimation approach used in EnKF and EnKS is a statistical minimization, or sampling of a posterior pdf, rather than a traditional minimization of a cost function. Thus, EnKF does not to the same extent suffer from the typical problems of converging to local minima as in parameter-estimation methods. EnKF rather has a problem with multimodal pdfs. However, the EnKF does not search for the mode but rather the mean of the distribution. Thus, in many cases where a minimization method might converge to a local minimum, EnKF provides an estimate that is the mean of the posterior. An important point is that the sequential updating used in EnKF reduces the risk of development of multimodal distributions, a result that is supported by the Lorenz example.

FIGURE 6 Ensemble Kalman smoother. (a) shows the inverse estimate (red line) and reference solution (blue line) for x. (b) shows the corresponding estimated standard deviations (red line) as well as the absolute value of the difference between the reference solution and the estimate, that is, the real posterior errors (blue line). (Reproduced from [38] with permission.)

DETERMINISTIC SQUARE ROOT SCHEME
The perturbation of measurements used in the EnKF standard analysis equation (57) is an additional source of sampling error. However, methods such as the square root scheme compute the analysis without perturbing the measurements [10]–[13], [43], [44].

Based on results from [10]–[13], a variant of the square root analysis scheme is derived in [42] and further elaborated on in [8]. The perturbation of measurements is avoided, and the scheme solves for the analysis without imposing any additional approximations, such as the assumption of uncorrelated measurement errors or knowledge of the inverse of the measurement error covariance matrix. This implementation requires the inverse of the matrix C, defined in (61), which can be computed efficiently, either using the low-rank ensemble representation C^e_εε or by projecting the measurement error covariance matrix onto the space defined by the columns in S from (60). This version of the square root scheme is now presented.

Updating the Mean
In the square root scheme, the analyzed ensemble mean is computed from the standard Kalman filter analysis equation, which can be obtained by multiplying the first line in (62) from the right with 1_N, so that each column in the resulting equation for the mean becomes

  ψ̄^a = ψ̄^f + A′ S^T C^{−1} (d − M ψ̄^f).   (64)

Updating the Ensemble Perturbations
The deterministic algorithm used to update the ensemble perturbations is derived starting from the traditional analysis equation for the covariance update (28) in the Kalman filter. By using the ensemble covariances, (28) can be written as

  (C^e_ψψ)^a = (C^e_ψψ)^f − (C^e_ψψ)^f M^T (M (C^e_ψψ)^f M^T + R)^{−1} M (C^e_ψψ)^f,   (65)

with the time index dropped for convenience. When using the ensemble representation for the error covariance matrix defined in (52), (65) becomes

  A^{a′}(A^{a′})^T = A′ (I − S^T C^{−1} S) (A′)^T,   (66)

where S and C are defined in (60) and (61), and we drop the superscripts "f" on the forecast ensemble. We now derive an equation for updating the ensemble perturbations A′ by defining a factorization of (66) that does not involve the measurements or measurement perturbations.

We start by forming C as defined in (61). For now we assume that C^{−1} exists, which requires that the rank of the ensemble be greater than the number of measurements. The low-rank case involves pseudoinversion [8]. Note also that the use of a full-rank C_εε can result in a full-rank C even when m ≥ N.
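The step from (65) to (66) can be verified numerically: with (C^e_ψψ)^f = A′(A′)^T/(N − 1), S = MA′, and C from (61), both expressions describe the same analyzed covariance. A small self-check on synthetic matrices (names ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, m = 5, 25, 3

Ap = rng.standard_normal((n, N))
Ap -= Ap.mean(axis=1, keepdims=True)           # forecast perturbations with zero mean
M_op = rng.standard_normal((m, n))
R = np.diag(rng.uniform(0.2, 0.5, m))          # full-rank measurement error covariance

Ce = Ap @ Ap.T / (N - 1)                       # forecast ensemble covariance, (52)
S = M_op @ Ap                                  # (60)
C = S @ S.T + (N - 1) * R                      # (61)

# Covariance update (65), written with the ensemble covariance ...
Ca = Ce - Ce @ M_op.T @ np.linalg.solve(M_op @ Ce @ M_op.T + R, M_op @ Ce)
# ... agrees with the factorized ensemble form (66), divided by N - 1.
Ca_ens = Ap @ (np.eye(N) - S.T @ np.linalg.solve(C, S)) @ Ap.T / (N - 1)

assert np.allclose(Ca, Ca_ens)
```

The identity holds exactly because M(C^e_ψψ)^f M^T + R equals C/(N − 1); the square root schemes below construct an A^{a′} whose outer product reproduces this analyzed covariance.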

By computing the eigenvalue decomposition Z \Lambda Z^T = C, we obtain the inverse of C as

C^{-1} = Z \Lambda^{-1} Z^T,   (67)

where Z \in R^{m \times m} is an orthogonal matrix and \Lambda \in R^{m \times m} is diagonal. The eigenvalue decomposition may be the most demanding computation required for the analysis when m is large. An efficient alternative inversion algorithm is presented in [8]. We now write (66) as

A'^a (A'^a)^T = A' (I - S^T Z \Lambda^{-1} Z^T S) A'^T
             = A' (I - (\Lambda^{-1/2} Z^T S)^T (\Lambda^{-1/2} Z^T S)) A'^T
             = A' (I - X_2^T X_2) A'^T,   (68)

where X_2 \in R^{m \times N} is defined as

X_2 = \Lambda^{-1/2} Z^T S,   (69)

and where rank(X_2) = min(m, N - 1). Thus, X_2 is a projection of S onto the eigenvectors of C scaled by the square root of the eigenvalues of C.

Next we compute the singular value decomposition of X_2 given by

U_2 \Sigma_2 V_2^T = X_2,   (70)

with U_2 \in R^{m \times m}, \Sigma_2 \in R^{m \times N}, and V_2 \in R^{N \times N}. Since U_2 and V_2 are orthogonal matrices, (68) can be written

A'^a (A'^a)^T = A' (I - [U_2 \Sigma_2 V_2^T]^T [U_2 \Sigma_2 V_2^T]) A'^T
             = A' (I - V_2 \Sigma_2^T \Sigma_2 V_2^T) A'^T
             = A' V_2 (I - \Sigma_2^T \Sigma_2) V_2^T A'^T
             = (A' V_2 \sqrt{I - \Sigma_2^T \Sigma_2}) (A' V_2 \sqrt{I - \Sigma_2^T \Sigma_2})^T.   (71)

Thus, a solution for the analysis ensemble perturbations is

A'^a = A' V_2 \sqrt{I - \Sigma_2^T \Sigma_2}.   (72)

As noted in [45], the update equation (72) does not conserve the mean of the ensemble perturbations and in fact leads to the production of outliers that contain most of the ensemble variance, as explained in [46] and [8] and further illustrated in the example below.

We now write the square root update in the more general form

A'^a = A' T,   (73)

where T is a square root transformation matrix. It is shown in [44] and [47] that for the square root analysis scheme to be unbiased and preserve the zero mean in the updated perturbations, the vector (1/N)1, where 1 \in R^N has all components equal to one, must be an eigenvector of the square root transformation matrix T. As noted in [44] and [47], this condition is not satisfied for the update in (72).

Multiplying (73) from the right with the vector 1 and assuming that (1/N)1 is an eigenvector of T, we can write

0 = A'^a 1 = A' T 1 = \lambda A' 1 = 0.   (74)

Equation (74) shows that a sufficient condition for the mean to be unbiased is that (1/N)1 be an eigenvector of T. If the transform matrix is of full rank, then this condition is also necessary [47].

The symmetric square root solution for the analysis ensemble perturbations is defined as

A'^a = A' V_2 (I - \Sigma_2^T \Sigma_2)^{1/2} V_2^T.   (75)

It is easy to show that (75) is also a factorization of (71) since V_2 is an orthogonal matrix. As shown in [44] and [47], the symmetric square root has an eigenvector equal to (1/N)1 and is unbiased. In addition, the symmetric square root resolves the issue with outliers in the factorization used in (72). The analysis update of the perturbations becomes a symmetric contraction of the forecast ensemble perturbations. Thus, if the predicted ensemble members have a non-Gaussian distribution, then the updated distribution retains the shape but the variance is reduced.

A randomization of the analysis update can be used to generate updated perturbations that better resemble a Gaussian distribution [42]. Thus, we write the symmetric square root solution (75) as

A'^a = A' V_2 (I - \Sigma_2^T \Sigma_2)^{1/2} V_2^T F^T,   (76)

where F is a mean-preserving random orthogonal matrix, which can be computed using the algorithm from [44].

The properties of the square root schemes are illustrated in Figure 7, which shows the resulting ensemble updates using several variants of the EnKF analysis scheme. The Lorenz equations (46)–(48) are used since the strong nonlinearities lead to the development of a non-Gaussian distribution for the forecast ensemble. Three observations are used in the update step. Each ensemble member is plotted as a circle in the x, y plane. In Figure 7(a) and (b) the forecast ensemble members are plotted as the blue circles, which have a non-Gaussian distribution in the x, y plane.

In Figure 7(a) the updated analysis from the "one-sided" square root scheme in (72) is shown as the yellow circles. It can be seen that N - 3 of the updated ensemble perturbations collapse onto (0, 0), while the three nonzero "outliers," one for each measurement, determine the ensemble variance. However, one of the outliers is

too close to zero to be distinguished from the other points at zero. The variance of the updated ensemble is correct, but the analysis introduces a bias through a shift in the ensemble mean. The shift in the mean should come as no surprise since we do not impose a condition for the conservation of the mean when the update equation is derived. The particular ensemble collapse related to the use of (72) is discussed and explained in [8]. It is in fact shown that with three measurements and a diagonal measurement error covariance matrix, we obtain an ensemble with three outliers, while the remainder of the perturbations collapse onto zero.

In Figure 7(a) the updated analysis from the symmetric square root scheme in (75) is shown as the red circles. This scheme has the property that it rescales the ensemble of perturbations without changing the original shape of the perturbations. Thus, the scheme allows for preserving possible non-Gaussian structures in the ensemble during the update. We also note that the symmetric square root scheme from (75) is unbiased and thus preserves the mean [44].

In Figure 7(b) the updated analysis from the symmetric square root scheme from (76), which includes an additional mean-preserving random rotation, is plotted using the green circles. It is clear that the ensemble of updated perturbations now has a Gaussian shape, and the non-Gaussian shape of the forecast ensemble perturbations is lost. The random rotation completely destroys any prior structure in the ensemble by randomly redistributing the variability among all of the ensemble members. Thus, the random rotation acts as a complete resampling from a Gaussian distribution, while preserving the ensemble mean and variance.

FIGURE 7 Forecast and analysis ensembles for the Lorenz equations illustrating properties of the analysis schemes discussed in the text. The data used in these plots were contributed by Dr. Pavel Sakov. [Two scatter plots of x (Amplitude) versus y (Amplitude): (a) Forecast, SQRT Sym, EnKF No-Pert, and SQRT One-Sided; (b) Forecast, SQRT Sym MPRR, and EnKF.]

Figure 7(b) also shows the updated analysis from the standard EnKF scheme from (62), where the measurements are randomly perturbed to represent their uncertainty. The standard EnKF analysis becomes similar to the symmetric square root analysis with random rotation. As with the symmetric square root analysis, most of the non-Gaussian shape of the forecast ensemble is lost. However, only the increment in the standard EnKF analysis is Gaussian, and some of the non-Gaussian properties of the forecast ensemble are retained, as indicated by the two outliers that represent the tail of the distribution seen in the forecast ensemble.

It is also interesting to consider the standard EnKF scheme when used without perturbation of measurements. It is then clear from (28) that the variance is reduced twice by the additional multiplication with I - K^e M resulting from C_{\epsilon\epsilon}^e in (28) being identical to zero when the measurements are not treated as stochastic variables. Figure 7(a) shows that the EnKF scheme without perturbation of measurements preserves the shape of the forecast distribution in the same


way as the symmetric square root scheme, although the variance is too low. Thus, the perturbation of measurements in EnKF both increases the ensemble variance to the "correct" value and introduces additional randomization. The randomization is different from the one observed in (76) since only the increments are randomized in the EnKF scheme with perturbation of measurements.

It is currently not clear which of the analysis schemes, that is, the standard EnKF (62), the symmetric square root (75), or the symmetric square root with random rotation (76), is best in practice. Probably the choice of analysis scheme depends on the dynamical model and possibly also on the measurement density and ensemble size used. For a linear dynamical model, the forecast distribution is Gaussian, and the random rotation is not needed. Thus, we then expect the symmetric square root (75) to be the best choice. On the other hand, for a strongly nonlinear dynamical model where non-Gaussian effects are dominant in the predicted ensemble, the symmetric square root with a random rotation (76) or EnKF with perturbed measurements (62) may work better. Both of these schemes introduce Gaussianity into the analysis update, while a Gaussian forecast ensemble may lead to more consistent analysis updates.

The random rotation might be considered as a resampling from a Gaussian distribution at each analysis update. Note again that the random rotation in the square root filter, contrary to the measurement perturbation used in EnKF, completely eliminates all previous non-Gaussian structures that may be contained in the forecast ensemble.

SPURIOUS CORRELATIONS, LOCALIZATION, AND INFLATION
Since EnKF is a Monte Carlo method, making this method affordable for large systems requires the use of a sufficiently small ensemble of model realizations. Around 100 realizations in the ensemble is typical in applications, and in many cases we see only marginal improvements when the ensemble size is further increased, which is explained by the slow convergence, proportional to 1/\sqrt{N}, of Monte Carlo methods, together with the fact that a large part of the variability in the state and parameters often is well represented by an ensemble of 100 model realizations. On the other hand, even O(100) model realizations become extremely computationally demanding in many applications, which is an incentive for using as few realizations as possible. In the following we discuss the problems caused by using a finite ensemble size and present some remedies that can reduce the impact of sampling errors.

Spurious Correlations
The use of a finite ensemble size to approximate the error covariance matrix introduces sampling errors that are seen as spurious correlations over long spatial distances or between variables known to be uncorrelated. A result of these sampling errors is that the updated ensemble variance is underestimated. On the other hand, the consistency of the updated variance improves when a larger ensemble is used. A spurious correlation between a predicted measurement and a variable leads to a small nonphysical update of the variable in each ensemble member, and thus an associated variance reduction. This problem is present in all EnKF applications and can lead to filter divergence.

The following example, which is based on the linear advection case from Figure 2, illustrates the variance reduction resulting from spurious correlations. We use the form (62) for the EnKF analysis scheme with the update matrix X defined from (63). An additional ensemble B \in R^{n_rand \times N} is generated, where each row contains random samples from a Gaussian distribution with mean equal to zero and variance equal to one, and the entries in different rows are sampled independently. Thus, B is the ensemble matrix for a state vector of independent variables with zero mean and unit variance. At analysis times we compute the updates

\begin{bmatrix} A^a \\ B^a \end{bmatrix} = \begin{bmatrix} A^f \\ B^f \end{bmatrix} X.   (77)

The predicted ensemble A^f is the result of the ensemble integration using the advection model, while B^f does not evolve according to any dynamical equation and at an update time equals B^a at the previous update time.

Since the correlations between B and the predicted measurement perturbations S become zero in the limit of an infinite ensemble size, it follows that

\lim_{N \to \infty} \frac{B S^T}{N - 1} = 0.   (78)

However, due to the finite ensemble size, (78) cannot be exactly satisfied, and B^a experiences a small update and associated reduction of variance through the update in (77).

As in the advection example, we compute the matrix X based on the four measurements, and then apply it to B according to (77) at every analysis time. The value n_rand = 100 is found to be sufficient to obtain a consistent result that is independent of the random sampling of B.
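A minimal sketch of this experiment follows, with the advection dynamics omitted and the dimensions, gridpoint indices, and the stochastic update matrix X = I + S^T C^{-1} (D - M A^f) chosen for illustration; the article's actual setup in (62)-(63) and Figure 2 may differ in these details.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, m, n_rand = 100, 100, 4, 100   # state dim, ensemble size, measurements, rows in B

A = rng.normal(size=(n, N))          # "dynamical" ensemble (model integration omitted)
B = rng.normal(size=(n_rand, N))     # independent N(0,1) ensemble with no dynamics
M = np.zeros((m, n))                 # observe four gridpoints
M[np.arange(m), [10, 35, 60, 85]] = 1.0
C_ee = np.eye(m)                     # measurement error covariance
d = rng.normal(size=m)               # fixed measurement vector

def update_matrix(A):
    """Stochastic EnKF update matrix: X = I + S^T C^{-1} (D - M A)."""
    A_p = A - A.mean(axis=1, keepdims=True)
    S = M @ A_p
    C = S @ S.T + (N - 1) * C_ee
    D = d[:, None] + rng.normal(size=(m, N))   # perturbed measurements
    return np.eye(N) + S.T @ np.linalg.solve(C, D - M @ A)

spread0 = (B - B.mean(axis=1, keepdims=True)).std()
for _ in range(60):                  # repeated analysis updates
    X = update_matrix(A)
    A = A @ X
    B = B @ X                        # (77): B receives the same update as A
spread = (B - B.mean(axis=1, keepdims=True)).std()
# B is uncorrelated with the data, yet its spread shrinks: spurious correlations
print(spread < spread0)
```

Although (78) holds in expectation, the finite-sample projection of B onto the measured directions is contracted at every update, so the spread of B decreases steadily, mirroring the behavior in Figure 8.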

The variance reduction resulting from the spurious correlations is illustrated in Figure 8, which shows the decrease of the average variance of the random ensemble B, resulting from EnKF with 100 and 250 realizations, and from the symmetric square root scheme using 100 realizations. EnKF with 100 realizations is repeated five more times using different random seeds to verify that the result is independent of the seed. A nearly linear decrease of variance is obtained during the first 50 updates, while for the final 12 updates the decrease is lower. The reason for the lower error variance reduction in the final part of the experiment is that the information assimilated at one measurement location propagates to the next measurement location during 50 updates. Thus, after 50 updates the ensemble variance is lower at the measurement locations, and the relative weight on the data compared to the prediction is decreased. EnKF with 250 realizations experiences a significantly lower impact from spurious correlations, as expected. The square root scheme is slightly less influenced by the spurious correlations, and an explanation can be that the measurement perturbations in the EnKF update increase the strength of the update of individual realizations and thus amplify the impact of the spurious correlations.

FIGURE 8 Variance reduction of a random ensemble due to spurious correlations, as a function of analysis updates. The ensemble Kalman filter (EnKF) with 100 realizations is compared with EnKF with 250 realizations as well as the square root scheme using 100 realizations. EnKF with 100 realizations is repeated using different seeds to ensure that the results are consistent. [Plot of ensemble standard deviation versus updates for EnKF_100_a–e, EnKF_100, EnKF_250, and SQRT_100.]

In many dynamical systems, the variance decrease caused by spurious correlations may be masked by strong dynamical instabilities. The impact of the spurious correlations may then be less significant. On the other hand, in parameter-estimation problems, the spurious correlations clearly lead to an underestimate of the ensemble variance of the parameters.

Localization
We now discuss the use of localization to reduce spurious correlations [48]. Two classes of localization methods are currently used, covariance localization and local updating.

In [48] the ensemble covariance matrix is multiplied with a specified correlation matrix through a Schur product (entry-wise multiplication). The specified correlation functions are defined with local support and thus effectively truncate the long-range spurious correlations produced by the limited ensemble size. Covariance localization is used in [11], [12], [49], and [50].

We can assume that only measurements located within a certain distance from a gridpoint impact the analysis in that gridpoint. This assumption allows for an algorithm where the analysis is computed gridpoint by gridpoint, and only a subset of observations, located near the current gridpoint, is used in each local analysis. This approach is used in [51], [52], and [37] and is also the approach used in the local EnKF in [53]. In addition to reducing the impact of long-range spurious correlations, the localization methods make it simpler to handle large data sets where the number of measurements is much greater than the number of ensemble realizations.

Another reason for computing the local analysis is the fact that EnKF is computed in a space spanned by the ensemble members. This subspace may be rather small compared to the total dimension of the model state. Computing the analysis gridpoint by gridpoint implies that, for each gridpoint, a small model state is solved for in a relatively large ensemble space. The analysis then results from a different combination of ensemble members for each gridpoint, and the analysis scheme is allowed to reach solutions not originally represented by the ensemble. In many applications the local analysis scheme significantly reduces the impact of a limited ensemble size and allows for the use of EnKF with high-dimensional model systems.

The degree of approximation introduced by the local analysis depends on the range of influence defined for the observations. In the limit that this range becomes sufficiently large to include all of the data, the solution for all the gridpoints becomes identical to the standard global analysis. The range parameter must be tuned and should be large enough to include the information from measurements that contribute significantly but small enough to eliminate the spurious impact of remote measurements.

The local analysis algorithm goes as follows. We first construct the input matrices to the global EnKF, that is, the measured ensemble perturbations S, the innovations D', and either the measurement perturbations E or the measurement error covariance matrix C_{\epsilon\epsilon}. We then loop through the model grid, and, for each gridpoint, for example, (i, j) for a two-dimensional model, we extract the rows from these matrices corresponding to measurements

that are used in the current update, and then compute the matrix X_{(i,j)} that defines the update for gridpoint (i, j). The analysis at gridpoint (i, j) becomes

A^a_{(i,j)} = A_{(i,j)} X_{(i,j)}   (79)
           = A_{(i,j)} X + A_{(i,j)} (X_{(i,j)} - X),   (80)

where X is the global solution, while X_{(i,j)} becomes the solution for a local analysis corresponding to gridpoint (i, j) where only the nearest measurements are used in the analysis. Thus, it is possible to compute the global analysis first and then add the corrections from the local analysis if these effects are significant.

The quality of the EnKF analysis is connected to the ensemble size used. We expect that to achieve the same quality of the result, a larger ensemble is needed for the global analysis than the local analysis. In the global analysis, a large ensemble is needed to properly explore the state space and to provide a consistent result that is as good as the local analysis. Note also that the use of a local analysis scheme is likely to introduce nondynamical modes, although the amplitudes of these modes are small if a large enough influence radius is used when selecting measurements. We also refer to the discussions on localization and filtering of long-range correlations in [54].

In adaptive localization methods, the assimilation system itself is used to determine the localization strategy. Such algorithms are useful since the dynamical covariance functions change in space and time, and the spurious correlations depend on the ensemble size. Thus, every assimilation problem and ensemble size requires a separate tuning of the localization parameters.

The hierarchical approach in [55] uses several small ensembles to explore the need for using localization in the analysis. This approach uses a Monte Carlo method based on splitting the ensemble into several small ensembles to assess the sampling errors and the spurious correlations. This method is a statistically consistent approach to the problem. However, the localization is optimized for a small ensemble and may become suboptimal when used with the full ensemble including all realizations.

An alternative localization method in [56] is based on the online computation of a flow-dependent moderation function that is used to damp long-range and spurious correlations. This method is named SENCORP for "smoothed ensemble correlations raised to a power." The idea is that the moderation functions can be generated from a smoothed covariance function, which, when raised to a power, damps small correlations.

In [57] a local analysis method handles measurements that are integral parameters of the model state. The idea is that the covariance matrix of the predicted measurements is computed globally using the full model state, while the updates are computed locally gridpoint by gridpoint, and only the measurements that have significant correlations with the model variables in the local gridpoint are assimilated. Thus, while traditional localization methods are distance based, [55]–[57] discuss adaptive localization methods where the assimilation system determines whether correlations are significant or spurious, and whether a particular measurement shall be used in the update of a particular model variable. The further development of adaptive localization methods is important for many applications where distance-based methods are less suitable, an example being the use of measurements that are integral functions of the model state as in [57].

Finally, it is not clear how the local analysis scheme is best implemented in EnKS. One approach is to define the local analysis to use only measurements in a certain space-time domain, taking into account the propagation of information in the model together with the time scales of the model. In [58] EnKS is used with a high-dimensional atmospheric circulation model. The impact of spurious correlations related to the lag time in a lagged EnKS is studied, and it is pointed out that the lagged implementation facilitates localization in time.

Inflation
A covariance inflation procedure [59] can be used to counteract the variance reduction observed due to the impact of spurious correlations as well as other effects leading to underestimation of the ensemble variance. The impact of ensemble size on noise in distant covariances is examined in [49], while the impact of using an "inflation factor" as discussed in [59] is evaluated. The inflation factor is used to replace the forecast ensemble according to

\psi_j = \rho (\psi_j - \bar{\psi}) + \bar{\psi},   (81)

with \rho slightly greater than one (typically 1.01). The inflation procedure is also used in [60], where the EnKF is examined in an application with the Lorenz attractor, and results are compared with those obtained from different versions of the singular evolutive extended Kalman (SEEK) filter and a particle filter. In [60], ensembles with very few members are used, which favors methods like the SEEK where the "ensemble" of empirical orthogonal functions (EOFs) is selected to best represent the model attractor.

Several approaches adaptively estimate an optimal inflation parameter. In [61] the covariance inflation is estimated based on the sequence of innovation statistics, while in [62] a method is presented that is based on augmenting the inflation parameter to the model state where it is updated as a parameter in the EnKF analysis computations. Online estimation of the inflation parameter is also studied in [63] together with the simultaneous estimation of observation errors. It is found that the estimation of inflation alone does not work appropriately without accurate observation error statistics, and vice versa.
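The inflation step (81) is a one-line transformation of each ensemble member about the ensemble mean; a minimal sketch, with the ensemble dimensions here chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
ensemble = rng.normal(loc=3.0, scale=0.5, size=(10, 100))   # 10 variables, 100 members

def inflate(ens, rho=1.01):
    """Covariance inflation (81): psi_j <- rho * (psi_j - psi_bar) + psi_bar."""
    mean = ens.mean(axis=1, keepdims=True)
    return rho * (ens - mean) + mean

inflated = inflate(ensemble)
# The ensemble mean is unchanged, while every standard deviation grows by the factor rho.
```

Because only the deviations from the mean are scaled, the covariance is multiplied by rho^2 while the mean is preserved exactly, which is why the procedure counteracts underestimated spread without biasing the state estimate.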

Clearly, the inflation parameter becomes a tuning parameter, and optimally it is best estimated adaptively. The need for inflation depends on the use of a local versus global analysis scheme, and the use of a local scheme can to a large extent reduce the need for an additional inflation.

Here we describe an alternative approach for estimating the inflation coefficient. In the spurious correlation example, as presented in Figure 8, an independent ensemble is used to quantify the variance reduction due to spurious correlations. A simple algorithm for correcting the analyzed ensemble perturbations in each analysis step goes as follows. At each analysis time we generate the additional ensemble matrix B^f with random normally distributed numbers, such that the mean in each row is exactly zero, and the variance is exactly equal to one. We thus sample the matrix randomly from N(0, 1). Then, for each row, first subtract any nonzero mean, then compute the standard deviation and scale all entries by it. Then, compute the analysis update according to (77). For each row in B^a, compute the standard deviation. The inflation factor \rho is then defined as one over the average of the standard deviations from each row in B^a. The accuracy of the estimated inflation factor depends on the number of realizations used as well as the number of rows in B. It is expected that with a low number of realizations additional rows in B might compensate for the sampling errors when computing the inflation factor.

This algorithm provides a good first approximation of the inflation factor needed to counteract variance reduction due to long-range spurious correlations resulting from sample noise. The estimated inflation factor depends on the number of realizations used, the number of measurements, and the strength of the update determined by the innovation vector and both the predicted and measurement error covariance matrices. A question remains as to whether the inflation is best applied equally for the whole model state, including at the measurement locations.

CONCLUSIONS
This article provides a fundamental theoretical basis for understanding EnKF and serves as a useful text for future users. Data assimilation and parameter-estimation problems are explained, and the concept of joint parameter and state estimation, which can be solved using ensemble methods, is presented. KF and EKF are briefly discussed before introducing and deriving EnKF. Similarities and differences between KF and EnKF are pointed out. The benefits of using EnKF with high-dimensional and highly nonlinear dynamical models are illustrated by examples. EnKF and EnKS are also derived from Bayes theorem, using a probabilistic approach. The derivation is based on the assumption that measurement errors are independent in time and the model represents a Markov process, which allows for Bayes theorem to be written in a recursive form, where measurements are processed sequentially in time. The practical implementation of the analysis scheme is discussed, and it is shown that it can be computed efficiently in the space spanned by the ensemble realizations. The square root scheme is discussed as an alternative method that avoids the perturbation of measurements. However, the square root scheme has other pitfalls, and it is recommended to use the symmetric square root with or without a random rotation. The random rotation introduces a stochastic component to the update, and the quality of the scheme may then not improve compared to the original stochastic EnKF scheme with perturbed measurements.

ACKNOWLEDGMENTS
I am grateful for many fruitful discussions with Pavel Sakov and Laurent Bertino during the preparation of this article, and I thank Pavel Sakov for providing the data used to illustrate the properties of the different analysis schemes in Figure 7 and for pointing out that the use of an ensemble representation of the measurement error covariance matrix leads to an exact cancellation in the second last line in (27). The extensive feedback provided by the reviewers contributed to significantly improving the quality of the manuscript. I am partly supported by the Norwegian Research Council through the Evita EnKF Project.

AUTHOR INFORMATION
Geir Evensen ([email protected]) received the Ph.D. in applied mathematics from the University of Bergen, Norway, in 1992. He has published more than 40 articles related to data assimilation, including Data Assimilation: The Ensemble Kalman Filter. He currently works on data assimilation and parameter estimation as a research director at StatoilHydro in Bergen and holds an associate position at the Nansen Center in Bergen.

REFERENCES
[1] G. Evensen, "Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics," J. Geophys. Res., vol. 99, no. C5, pp. 10143–10162, 1994.
[2] R. E. Kalman, "A new approach to linear filtering and prediction problems," J. Basic Eng., vol. 82, pp. 35–45, 1960.
[3] R. E. Kalman and R. S. Bucy, "New results in linear filtering and prediction theory," J. Basic Eng., vol. 83, pp. 95–108, 1961.
[4] O. Talagrand and P. Courtier, "Variational assimilation of meteorological observations with the adjoint vorticity equation. I. Theory," Q. J. R. Meteorol. Soc., vol. 113, pp. 1311–1328, 1987.
[5] E. Kalnay, H. Li, T. Miyoshi, S.-C. Yang, and J. Ballabrera-Poy, "4D-Var or ensemble Kalman filter?," Tellus A, vol. 59, pp. 758–773, 2007.
[6] E. J. Fertig, J. Harlim, and B. R. Hunt, "A comparative study of 4D-Var and a 4D ensemble Kalman filter: Perfect model simulations with Lorenz-96," Tellus A, vol. 59, pp. 96–101, 2006.
[7] G. Evensen, "Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model," J. Geophys. Res., vol. 97, no. C11, pp. 17905–17924, 1992.
[8] G. Evensen, Data Assimilation: The Ensemble Kalman Filter. New York: Springer, 2007.
[9] G. Burgers, P. J. van Leeuwen, and G. Evensen, "Analysis scheme in the ensemble Kalman filter," Mon. Weather Rev., vol. 126, pp. 1719–1724, 1998.
[10] J. L. Anderson, "An ensemble adjustment Kalman filter for data assimilation," Mon. Weather Rev., vol. 129, pp. 2884–2903, 2001.
[11] J. S. Whitaker and T. M. Hamill, "Ensemble data assimilation without perturbed observations," Mon. Weather Rev., vol. 130, pp. 1913–1924, 2002.

[12] C. H. Bishop, B. J. Etherton, and S. J. Majumdar, “Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects,” Mon. Weather Rev., vol. 129, pp. 420–436, 2001.
[13] M. K. Tippett, J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, “Ensemble square-root filters,” Mon. Weather Rev., vol. 131, pp. 1485–1490, 2003.
[14] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice (Statistics for Engineering and Information Science). New York: Springer-Verlag, 2001.
[15] C. L. Keppenne and M. Rienecker, “Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model,” Mon. Weather Rev., vol. 130, pp. 2951–2965, 2002.
[16] C. L. Keppenne and M. Rienecker, “Assimilation of temperature into an isopycnal ocean general circulation model using a parallel ensemble Kalman filter,” J. Mar. Syst., vol. 40-41, pp. 363–380, 2003.
[17] C. L. Keppenne, M. Rienecker, N. P. Kurkowski, and D. A. Adamec, “Ensemble Kalman filter assimilation of temperature and altimeter data with bias correction and application to seasonal prediction,” Nonlinear Process. Geophys., vol. 12, pp. 491–503, 2005.
[18] I. Szunyogh, E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, and J. A. Yorke, “Assessing a local ensemble Kalman filter: Perfect model experiments with the National Centers for Environmental Prediction global model,” Tellus A, vol. 57, pp. 528–545, 2005.
[19] P. L. Houtekamer and H. L. Mitchell, “Ensemble Kalman filtering,” Q. J. R. Meteorol. Soc., vol. 131, pp. 3269–3289, 2005.
[20] B. R. Hunt, E. J. Kostelich, and I. Szunyogh, “Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter,” Physica D, vol. 230, pp. 112–126, 2007.
[21] I. Szunyogh, E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, “A local ensemble transform Kalman filter data assimilation system for the NCEP global model,” Tellus A, vol. 60, pp. 113–130, 2008.
[22] J. S. Whitaker, T. M. Hamill, X. Wei, Y. Song, and Z. Toth, “Ensemble data assimilation with the NCEP global forecast system,” Mon. Weather Rev., vol. 136, pp. 463–482, 2008.
[23] K. Eben, P. Juruš, J. Resler, M. Belda, E. Pelikán, B. C. Krüger, and J. Keder, “An ensemble Kalman filter for short-term forecasting of tropospheric ozone concentrations,” Q. J. R. Meteorol. Soc., vol. 131, pp. 3313–3322, 2005.
[24] O. Barrero Mendoza, B. De Moor, and D. S. Bernstein, “Data assimilation for magnetohydrodynamics systems,” J. Comput. Appl. Math., vol. 189, no. 1, pp. 242–259, 2006.
[25] Y. Zhou, D. McLaughlin, and D. Entekhabi, “An ensemble-based smoother with retrospectively updated weights for highly nonlinear systems,” Mon. Weather Rev., vol. 135, pp. 186–202, 2007.
[26] M. Eknes and G. Evensen, “Parameter estimation solving a weak constraint variational formulation for an Ekman model,” J. Geophys. Res., vol. 102, no. C6, pp. 12479–12491, 1997.
[27] J. C. Muccino and A. F. Bennett, “Generalized inversion of the Korteweg-de Vries equation,” Dyn. Atmos. Oceans, vol. 35, pp. 227–263, 2001.
[28] A. F. Bennett, Inverse Methods in Physical Oceanography. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[29] A. F. Bennett, Inverse Modeling of the Ocean and Atmosphere. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[30] G. Evensen, J. Hove, H. C. Meisingset, E. Reiso, K. S. Seim, and Ø. Espelid, “Using the EnKF for assisted history matching of a North Sea reservoir model,” Houston, TX, Soc. Petroleum Eng., Inc., SPE 106184, 2007.
[31] R. N. Miller, “Perspectives on advanced data assimilation in strongly nonlinear systems,” in Data Assimilation: Tools for Modelling the Ocean in a Global Change Perspective (NATO ASI, vol. I 19), P. P. Brasseur and J. C. J. Nihoul, Eds. Berlin: Springer-Verlag, 1994, pp. 195–216.
[32] R. N. Miller, M. Ghil, and F. Gauthiez, “Advanced data assimilation in strongly nonlinear dynamical systems,” J. Atmos. Sci., vol. 51, pp. 1037–1056, 1994.
[33] P. Gauthier, P. Courtier, and P. Moll, “Assimilation of simulated wind lidar data with a Kalman filter,” Mon. Weather Rev., vol. 121, pp. 1803–1820, 1993.
[34] F. Bouttier, “A dynamical estimation of forecast error covariances in an assimilation system,” Mon. Weather Rev., vol. 122, pp. 2376–2390, 1994.
[35] A. H. Jazwinski, Stochastic Processes and Filtering Theory. San Diego, CA: Academic, 1970.
[36] R. H. Reichle, D. B. McLaughlin, and D. Entekhabi, “Hydrologic data assimilation with the ensemble Kalman filter,” Mon. Weather Rev., vol. 130, pp. 103–114, 2002.
[37] G. Evensen, “The ensemble Kalman filter: Theoretical formulation and practical implementation,” Ocean Dyn., vol. 53, pp. 343–367, 2003.
[38] G. Evensen and P. J. van Leeuwen, “An ensemble Kalman smoother for nonlinear dynamics,” Mon. Weather Rev., vol. 128, pp. 1852–1867, 2000.
[39] P. J. van Leeuwen and G. Evensen, “Data assimilation and inverse methods in terms of a probabilistic formulation,” Mon. Weather Rev., vol. 124, pp. 2898–2913, 1996.
[40] G. Evensen, “Advanced data assimilation for strongly nonlinear dynamics,” Mon. Weather Rev., vol. 125, pp. 1342–1354, 1997.
[41] E. N. Lorenz, “Deterministic nonperiodic flow,” J. Atmos. Sci., vol. 20, pp. 130–141, 1963.
[42] G. Evensen, “Sampling strategies and square root analysis schemes for the EnKF,” Ocean Dyn., vol. 54, pp. 539–560, 2004.
[43] O. Leeuwenburgh, “Assimilation of along-track altimeter data in the Tropical Pacific region of a global OGCM ensemble,” Q. J. R. Meteorol. Soc., vol. 131, pp. 2455–2472, 2005.
[44] P. Sakov and P. R. Oke, “Implications of the form of the ensemble transform in the ensemble square root filters,” Mon. Weather Rev., vol. 136, no. 3, pp. 1042–1053, 2008.
[45] X. Wang, C. H. Bishop, and S. J. Julier, “Which is better, an ensemble of positive-negative pairs or a centered spherical simplex ensemble,” Mon. Weather Rev., vol. 132, pp. 1590–1605, 2004.
[46] O. Leeuwenburgh, G. Evensen, and L. Bertino, “The impact of ensemble filter definition on the assimilation of temperature profiles in the Tropical Pacific,” Q. J. R. Meteorol. Soc., vol. 131, pp. 3291–3300, 2005.
[47] D. M. Livings, S. L. Dance, and N. K. Nichols, “Unbiased ensemble square root filters,” Physica D, vol. 237, pp. 1021–1028, 2008.
[48] P. L. Houtekamer and H. L. Mitchell, “A sequential ensemble Kalman filter for atmospheric data assimilation,” Mon. Weather Rev., vol. 129, pp. 123–137, 2001.
[49] T. M. Hamill, J. S. Whitaker, and C. Snyder, “Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter,” Mon. Weather Rev., vol. 129, pp. 2776–2790, 2001.
[50] J. L. Anderson, “A local least squares framework for ensemble filtering,” Mon. Weather Rev., vol. 131, pp. 634–642, 2003.
[51] V. E. Haugen and G. Evensen, “Assimilation of SLA and SST data into an OGCM for the Indian Ocean,” Ocean Dyn., vol. 52, pp. 133–151, 2002.
[52] K. Brusdal, J. M. Brankart, G. Halberstadt, G. Evensen, P. Brasseur, P. J. van Leeuwen, E. Dombrowsky, and J. Verron, “An evaluation of ensemble based assimilation methods with a layered OGCM,” J. Mar. Syst., vol. 40-41, pp. 253–289, 2003.
[53] E. Ott, B. Hunt, I. Szunyogh, A. V. Zimin, E. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, and J. A. Yorke, “A local ensemble Kalman filter for atmospheric data assimilation,” Tellus A, vol. 56, pp. 415–428, 2004.
[54] H. L. Mitchell, P. L. Houtekamer, and G. Pellerin, “Ensemble size, balance, and model-error representation in an ensemble Kalman filter,” Mon. Weather Rev., vol. 130, pp. 2791–2808, 2002.
[55] J. L. Anderson, “Exploring the need for localization in the ensemble data assimilation using a hierarchical ensemble filter,” Physica D, vol. 230, pp. 99–111, 2007.
[56] C. H. Bishop and D. Hodyss, “Flow-adaptive moderation of spurious ensemble correlations and its use in ensemble-based data assimilation,” Q. J. R. Meteorol. Soc., vol. 133, pp. 2029–2044, 2007.
[57] E. J. Fertig, B. R. Hunt, E. Ott, and I. Szunyogh, “Assimilating non-local observations with a local ensemble Kalman filter,” Tellus A, vol. 59, pp. 719–730, 2007.
[58] S. P. Khare, J. L. Anderson, T. J. Hoar, and D. Nychka, “An investigation into the application of an ensemble Kalman smoother to high-dimensional geophysical systems,” Tellus A, vol. 60, pp. 97–112, 2008.
[59] J. L. Anderson and S. L. Anderson, “A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts,” Mon. Weather Rev., vol. 127, pp. 2741–2758, 1999.
[60] D. T. Pham, “Stochastic methods for sequential data assimilation in strongly nonlinear systems,” Mon. Weather Rev., vol. 129, pp. 1194–1207, 2001.
[61] X. Wang and C. H. Bishop, “A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes,” J. Atmos. Sci., vol. 60, pp. 1140–1158, 2003.
[62] J. L. Anderson, “An adaptive covariance inflation error correction algorithm for ensemble filters,” Tellus A, vol. 59, pp. 210–224, 2007.
[63] H. Li, E. Kalnay, and T. Miyoshi, “Simultaneous estimation of covariance inflation and observation errors within ensemble Kalman filter,” Q. J. R. Meteorol. Soc., vol. 135, no. 639, pp. 523–533, 2009.
