
Understanding the Ensemble Kalman Filter

Matthias Katzfuss (Department of Statistics, Texas A&M University, College Station, TX, USA), Jonathan R. Stroud (Department of Statistics, George Washington University, Washington, DC, USA), and Christopher K. Wikle (Department of Statistics, University of Missouri, Columbia, MO, USA)

ABSTRACT. The ensemble Kalman filter (EnKF) is a computational technique for approximate inference in state-space models. In typical applications, the state vectors are large spatial fields that are observed sequentially over time. The EnKF approximates the Kalman filter by representing the distribution of the state with an ensemble of draws from that distribution. The ensemble members are updated based on newly available data by shifting instead of reweighting, which allows the EnKF to avoid the degeneracy problems of reweighting-based algorithms. Taken together, the ensemble representation and shifting-based updates make the EnKF computationally feasible even for extremely high-dimensional state spaces. The EnKF is successfully used in data-assimilation applications with tens of millions of dimensions. While it implicitly assumes a linear Gaussian state-space model, it has also turned out to be remarkably robust to deviations from these assumptions in many applications. Despite its successes, the EnKF is largely unknown in the statistics community. We aim to change that with the present article, and to entice more statisticians to work on this topic.

KEYWORDS: Data assimilation; Forecasting; Kalman smoother; Sequential Monte Carlo; State-space models

1. Introduction

Data assimilation involves combining observations with "prior knowledge" (e.g., mathematical representations of mechanistic relationships; numerical models; model output) to obtain an estimate of the true state of a system and the associated uncertainty of that estimate (see, e.g., Nychka and Anderson 2010, for a review). Although data assimilation is required in many fields, its origins as an area of scientific inquiry arose out of the weather-forecasting problem in geophysics. With the advent of the digital computer, one of the first significant applications was the integration of the partial differential equations that described the evolution of the atmosphere, for the purposes of short-to-medium range weather forecasting. Such a numerical model requires initial conditions from real-world observations that are physically plausible. Observations of the atmosphere have varying degrees of measurement uncertainty and are fairly sparse in space and time, yet numerical models require initial conditions that match the relatively dense spatial domain of the model. Data assimilation seeks to provide these "interpolated" fields while accounting for the uncertainty of the observations and using the numerical model itself to evolve the atmospheric state variables in a physically plausible manner. Thus, data assimilation considers an equation for measurement error in the observations and an equation for the state evolution, a so-called state-space model (see, e.g., Wikle and Berliner 2007).

In the geophysical problems that motivated the development of data assimilation, the state and observation dimensions are huge and the evolution operators associated with the numerical models are highly nonlinear. From a statistical perspective, obtaining estimates of the true system state and its uncertainty in this environment can be carried out, in principle, by a type of inference called filtering, which attempts to obtain sequentially the posterior distribution of the state at the current time point based on all observations collected so far. The combination of high dimensionality and nonlinearity makes this a very challenging problem.

The ensemble Kalman filter (EnKF) is an approximate filtering method introduced in the geophysics literature by Evensen (1994). In contrast to the standard Kalman filter (Kalman 1960), which works with the entire distribution of the state explicitly, the EnKF stores, propagates, and updates an ensemble of vectors that approximates the state distribution. This ensemble representation is a form of dimension reduction, in that only a small ensemble is propagated instead of the joint distribution including the full covariance matrix. When new observations become available, the ensemble is updated by a linear "shift" based on the assumption of a linear Gaussian state-space model. Hence, additional approximations are introduced when non-Gaussianity or nonlinearity is involved. However, the EnKF has been highly successful in many extremely high-dimensional, nonlinear, and non-Gaussian data-assimilation applications. It is an embodiment of the principle that an approximate solution to the right problem is worth more than a precise solution to the wrong problem (Tukey 1962). For many realistic, highly complex systems, the EnKF is essentially the only way to do (approximate) inference, while alternative exact inference techniques can only be applied to highly simplified versions of the problem.

The key difference between the EnKF and other sequential Monte Carlo algorithms (e.g., particle filters) is the use of a linear updating rule that converts the prior ensemble to a posterior ensemble after each observation. Most other sequential Monte Carlo algorithms use a reweighting or resampling step, but it is well known that the weights degenerate (i.e., all but one weight are essentially zero) in high-dimensional problems (Snyder et al. 2008).

Most of the technical development and application of the EnKF has been in the geophysics literature (e.g., Burgers, van Leeuwen, and Evensen 1998; Houtekamer and Mitchell 1998; Bishop, Etherton, and Majumdar 2001; Tippett et al. 2003; Ott et al. 2004; see also Anderson 2009, for a review), and it has received relatively little attention in the statistics literature. This is at least partially because the jargon and notation in the geophysics literature can be daunting upon first glance. Yet, with the increased interest in approximate computational methods for "big data" in statistics, it is important that statisticians be aware of the power of this relatively simple methodology. We also believe that statisticians have much to contribute to this area of research. Thus, our goal in this article is to provide the elementary background and concepts behind the EnKF in a notation that is more common to the statistical state-space literature. Although the real strength of the EnKF is its application to high-dimensional nonlinear and non-Gaussian problems, for pedagogical purposes, we focus primarily on the case of linear measurement and evolution models with Gaussian errors to better illustrate the approach.

In Section 2, we review the standard Kalman filter for linear Gaussian state-space models. We then show in Section 3 how the basic EnKF can be derived based on the notions of conditional simulation, with an interpretation of shifting samples from the prior to the posterior based on observations. In Section 4, we discuss issues, extensions, and operational variants of the basic EnKF, and Section 5 concludes.

2. State-Space Models and the Kalman Filter

For discrete time points $t = 1, 2, \ldots$, assume a linear Gaussian state-space model,

$$y_t = H_t x_t + v_t, \quad v_t \sim N_{m_t}(0, R_t), \qquad (1)$$
$$x_t = M_t x_{t-1} + w_t, \quad w_t \sim N_n(0, Q_t), \qquad (2)$$

where $y_t$ is the observed $m_t$-dimensional data vector at time $t$, $x_t$ is the $n$-dimensional unobserved state vector of interest, and the observation and innovation errors $v_t$ and $w_t$ are mutually and serially independent. We call Equations (1)–(2) the observation model and the evolution model, respectively. The observation matrix $H_t$ relates the state to the observation, and the evolution (propagator) matrix $M_t$ determines how the state evolves over time. In weather-prediction problems, the state and observation dimensions are often enormous (i.e., $n \geq 10^7$ and $m_t \geq 10^5$). In addition, with the exception of Section 4.4, we assume that $M_t$, $H_t$, $R_t$, and $Q_t$ are known, which is often the assumption in geophysical applications.

An important form of inference for state-space models is filtering. In this framework, the goal at every time point $t = 1, 2, \ldots$ is to obtain the filtering distribution of the state; in Bayesian terms, this is equivalent to the posterior distribution of $x_t$ given $y_{1:t} = \{y_1, y_2, \ldots, y_t\}$, the data up to time $t$. This distribution is the conditional distribution of $x_t$ given $y_{1:t}$, which we denote by $x_t \mid y_{1:t}$. Filtering for a linear Gaussian model as in (1)–(2) can be carried out using the Kalman filter, which consists of two steps at every time point: a forecast step and an update step.

Assuming that the filtering distribution at the previous time $t-1$ is given by

$$x_{t-1} \mid y_{1:t-1} \sim N_n(\hat{\mu}_{t-1}, \hat{\Sigma}_{t-1}), \qquad (3)$$

the forecast step computes the forecast distribution for time $t$ based on (2) as

$$x_t \mid y_{1:t-1} \sim N_n(\tilde{\mu}_t, \tilde{\Sigma}_t), \quad \tilde{\mu}_t := M_t \hat{\mu}_{t-1}, \quad \tilde{\Sigma}_t := M_t \hat{\Sigma}_{t-1} M_t' + Q_t. \qquad (4)$$

The update step modifies the forecast distribution using the new data $y_t$. The update formula can be easily derived by considering the joint distribution of $x_t$ and $y_t$ conditional on the past data $y_{1:t-1}$, which is given by a multivariate normal distribution,

$$\begin{pmatrix} x_t \\ y_t \end{pmatrix} \Bigg|\, y_{1:t-1} \sim N_{n+m_t}\!\left( \begin{pmatrix} \tilde{\mu}_t \\ H_t \tilde{\mu}_t \end{pmatrix}, \begin{pmatrix} \tilde{\Sigma}_t & \tilde{\Sigma}_t H_t' \\ H_t \tilde{\Sigma}_t & H_t \tilde{\Sigma}_t H_t' + R_t \end{pmatrix} \right). \qquad (5)$$

Using well-known properties of the multivariate normal distribution, it follows that $x_t \mid y_{1:t} \sim N_n(\hat{\mu}_t, \hat{\Sigma}_t)$, where the update equations are given by

$$\hat{\mu}_t := \tilde{\mu}_t + K_t (y_t - H_t \tilde{\mu}_t), \qquad (6)$$
$$\hat{\Sigma}_t := (I_n - K_t H_t) \tilde{\Sigma}_t, \qquad (7)$$

and $K_t := \tilde{\Sigma}_t H_t' (H_t \tilde{\Sigma}_t H_t' + R_t)^{-1}$ is the so-called Kalman gain matrix of size $n \times m_t$.

An alternative expression for the update equations is given by

$$\hat{\mu}_t = \hat{\Sigma}_t (\tilde{\Sigma}_t^{-1} \tilde{\mu}_t + H_t' R_t^{-1} y_t), \qquad (8)$$
$$\hat{\Sigma}_t^{-1} = \tilde{\Sigma}_t^{-1} + H_t' R_t^{-1} H_t, \qquad (9)$$

where the second equation is obtained from (7) using the Sherman–Morrison–Woodbury formula (Sherman and Morrison 1950; Woodbury 1950). These equations allow a nice interpretation of the Kalman filter update. The filtered mean in (8) is a weighted average of the prior mean $\tilde{\mu}_t$ and the observation vector $y_t$, where the weights are proportional to the prior precision $\tilde{\Sigma}_t^{-1}$ and to $H_t' R_t^{-1}$, a combination of the observation matrix and the data precision matrix. The posterior precision in (9) is the sum of the prior precision and the data precision (projected onto the state space).

In summary, the Kalman filter provides the exact filtering distribution for linear Gaussian state-space models as in (1)–(2). However, if $n$ or $m_t$ are large, calculating and storing the $n \times n$ matrices $\tilde{\Sigma}_t$ and $\hat{\Sigma}_t$ and calculating the inverse of the $m_t \times m_t$ matrix in the Kalman gain (defined below (7)) are extremely expensive, and approximations become necessary.
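To make the two steps concrete, here is a minimal Python/NumPy sketch of one forecast-update cycle, intended only as a transcription of Equations (4), (6), and (7); the function and variable names are ours, chosen for illustration, and the demo observation value is arbitrary.

import numpy as np

def kalman_filter_step(mu, Sigma, y, M, H, Q, R):
    """One Kalman-filter cycle: forecast step (4), then update step (6)-(7)."""
    # Forecast step (4): push the filtering moments through the evolution model (2)
    mu_f = M @ mu                          # forecast mean
    Sigma_f = M @ Sigma @ M.T + Q          # forecast covariance
    # Kalman gain (defined below (7)); requires an m_t x m_t inverse
    K = Sigma_f @ H.T @ np.linalg.inv(H @ Sigma_f @ H.T + R)
    # Update step: mean update (6) and covariance update (7)
    mu_u = mu_f + K @ (y - H @ mu_f)
    Sigma_u = (np.eye(len(mu_u)) - K @ H) @ Sigma_f
    return mu_u, Sigma_u

# One-dimensional example with the settings used in Figure 1 below:
# M_t = 0.9, H_t = 1, Q_t = R_t = 1, and an arbitrary observation y = 0.5.
mu, Sigma = np.zeros(1), np.eye(1)
M, H, Q, R = 0.9 * np.eye(1), np.eye(1), np.eye(1), np.eye(1)
mu, Sigma = kalman_filter_step(mu, Sigma, y=np.array([0.5]), M=M, H=H, Q=Q, R=R)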

3. The Ensemble Kalman Filter (EnKF)

The ensemble Kalman filter (EnKF) can be viewed as an approximate version of the Kalman filter, in which the state distribution is represented by a sample or "ensemble" from the distribution. This ensemble is then propagated forward through time and updated when new data become available. The ensemble representation is a form of dimension reduction, which leads to computational feasibility even for very high-dimensional systems.

Specifically, assume that the ensemble $\hat{x}_{t-1}^{(1)}, \ldots, \hat{x}_{t-1}^{(N)}$ is a sample from the filtering distribution at time $t-1$ in (3): $\hat{x}_{t-1}^{(i)} \sim N_n(\hat{\mu}_{t-1}, \hat{\Sigma}_{t-1})$. Similar to the Kalman filter, the EnKF consists of a forecast step and an update step at every time point $t$.

The EnKF forecast step obtains a sample from the forecast distribution (4) by simply applying the evolution equation in (2) to each ensemble member:

$$\tilde{x}_t^{(i)} = M_t \hat{x}_{t-1}^{(i)} + w_t^{(i)}, \quad w_t^{(i)} \sim N_n(0, Q_t), \quad i = 1, \ldots, N. \qquad (10)$$

It is easy to verify that $\tilde{x}_t^{(i)} \sim N_n(\tilde{\mu}_t, \tilde{\Sigma}_t)$ holds exactly.

Then, this forecast ensemble $\tilde{x}_t^{(1)}, \ldots, \tilde{x}_t^{(N)}$ must be updated based on $y_t$, the new data at time $t$. This update step can be carried out stochastically or deterministically.
3.1. Stochastic Updates

We focus first on stochastic updates, as these are more natural to statisticians and can be easily motivated as an approximate form of conditional simulation (e.g., Journel 1974).

Specifically, to conditionally simulate from the state filtering distribution, we use the state forecast ensemble in (10), together with a set of simulated observations $\tilde{y}_t^{(1)}, \ldots, \tilde{y}_t^{(N)}$ from the observation forecast distribution. Setting $v_t^{(i)} \sim N_{m_t}(0, R_t)$, it can be easily verified that $\tilde{x}_t^{(i)}$ and $\tilde{y}_t^{(i)} = H_t \tilde{x}_t^{(i)} - v_t^{(i)}$ follow the correct joint distribution in (5). Conditional simulation then shifts the forecast ensemble based on the difference between the simulated and actual observations:

$$\hat{x}_t^{(i)} = \tilde{x}_t^{(i)} + K_t (y_t - \tilde{y}_t^{(i)}), \quad i = 1, \ldots, N.$$

It is straightforward to show that $\hat{x}_t^{(i)} \sim N_n(\hat{\mu}_t, \hat{\Sigma}_t)$ by considering the first two moments: clearly, $E(\hat{x}_t^{(i)}) = \hat{\mu}_t$, and the covariance matrix is given by

$$\mathrm{var}(\hat{x}_t^{(i)}) = \mathrm{var}(\tilde{x}_t^{(i)}) + \mathrm{var}(K_t \tilde{y}_t^{(i)}) - 2\,\mathrm{cov}(\tilde{x}_t^{(i)}, K_t \tilde{y}_t^{(i)}) = \tilde{\Sigma}_t + K_t H_t \tilde{\Sigma}_t - 2 K_t H_t \tilde{\Sigma}_t = \hat{\Sigma}_t.$$

Hence, conditional simulation provides a simple way to update the prior ensemble to obtain a filtering ensemble that is an exact sample from the filtering distribution. This requires computation of the Kalman gain, $K_t$, which in turn requires computation and storage of the $n \times n$ forecast covariance matrix, $\tilde{\Sigma}_t$. To avoid calculating this potentially huge matrix, the update step of the EnKF is an approximate version of conditional simulation, for which the Kalman gain $K_t$ is replaced by an estimate $\hat{K}_t$ based on the forecast ensemble. Often, the estimated Kalman gain has the form

$$\hat{K}_t := \hat{C}_t H_t' (H_t \hat{C}_t H_t' + R_t)^{-1}, \qquad (11)$$

where $\hat{C}_t$ is an estimate of the state forecast covariance matrix $\tilde{\Sigma}_t$. The simplest example is $\hat{C}_t = S_t$, where $S_t$ is the sample covariance matrix of $\tilde{x}_t^{(1)}, \ldots, \tilde{x}_t^{(N)}$. For more details, see Section 4.1.

Then, given an initial ensemble $\hat{x}_0^{(1)}, \ldots, \hat{x}_0^{(N)}$, which can be taken as draws from the posterior distribution at time $t = 0$ or, in the case of complex nonlinear models, obtained from various "spin-up" algorithms (e.g., Hoteit et al. 2015), one can implement the EnKF in Algorithm 1.

Algorithm 1: Stochastic EnKF
Start with an initial ensemble $\hat{x}_0^{(1)}, \ldots, \hat{x}_0^{(N)}$. Then, at each time $t = 1, 2, \ldots$, given an ensemble $\hat{x}_{t-1}^{(1)}, \ldots, \hat{x}_{t-1}^{(N)}$ of draws from the filtering distribution at time $t-1$, the stochastic EnKF carries out the following two steps for $i = 1, \ldots, N$:
1. Forecast Step: Draw $w_t^{(i)} \sim N_n(0, Q_t)$ and calculate $\tilde{x}_t^{(i)} = M_t \hat{x}_{t-1}^{(i)} + w_t^{(i)}$.
2. Update Step: Draw $v_t^{(i)} \sim N_{m_t}(0, R_t)$ and calculate $\hat{x}_t^{(i)} = \tilde{x}_t^{(i)} + \hat{K}_t (y_t + v_t^{(i)} - H_t \tilde{x}_t^{(i)})$, where $\hat{K}_t$ is given in (11).

As we have seen, the EnKF update can be nicely interpreted as an approximate version (because $K_t$ is estimated by $\hat{K}_t$) of conditional simulation. For alternative interpretations, rewrite the update step as

$$\hat{x}_t^{(i)} = \tilde{x}_t^{(i)} + \hat{K}_t (y_t^{(i)} - H_t \tilde{x}_t^{(i)}) \qquad (12)$$
$$\phantom{\hat{x}_t^{(i)}} = (I_n - \hat{K}_t H_t) \tilde{x}_t^{(i)} + \hat{K}_t y_t^{(i)}, \qquad (13)$$

where $y_t^{(i)} = y_t + v_t^{(i)}$ is a "perturbed" observation. From (12), the EnKF update can be viewed as a stochastic version of the Kalman filter mean update (6), where the prior mean and the observation, $(\tilde{\mu}_t, y_t)$, are replaced with a prior draw and a perturbed observation, $(\tilde{x}_t^{(i)}, y_t^{(i)})$, respectively. That is, the EnKF combines a draw from the prior or forecast distribution, $\tilde{x}_t^{(i)}$, with a draw from the "likelihood," $y_t^{(i)} \sim N_{m_t}(y_t, R_t)$, to obtain a posterior draw from the filtering distribution as in (12). From (13), we further see that the filtering ensemble is a linear combination or "shift" of the forecast ensemble and the observation.
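As an illustration, here is a minimal Python/NumPy sketch of Algorithm 1 for the linear Gaussian case, using the sample covariance $S_t$ as the estimate $\hat{C}_t$ in (11); the function name and interface are ours, not from any established package.

import numpy as np

rng = np.random.default_rng(0)

def stochastic_enkf_step(X, y, M, H, Q, R):
    """One cycle of Algorithm 1 for an n x N ensemble X of filtering draws."""
    n, N = X.shape
    # 1. Forecast step: draw w_t^(i) and propagate each member, eq. (10)
    Xf = M @ X + rng.multivariate_normal(np.zeros(n), Q, size=N).T
    # Estimated Kalman gain (11) with C_t = S_t, the sample covariance of the
    # forecast ensemble (localization and inflation are discussed in Section 4.1)
    S = np.atleast_2d(np.cov(Xf))
    K = S @ H.T @ np.linalg.inv(H @ S @ H.T + R)
    # 2. Update step: shift each member toward its perturbed observation
    V = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N).T
    return Xf + K @ (y[:, None] + V - H @ Xf)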

3.2. Deterministic Updates

The update step of the stochastic EnKF in Algorithm 1 can be replaced by a deterministic update, leading to the widely used class of deterministic EnKFs. These methods obtain an approximate ensemble from the posterior by deterministically shifting the prior ensemble, without relying on simulated or perturbed observations.

The main idea behind the deterministic filter in the univariate case is as follows. Suppose we have prior draws, $\tilde{x}^{(i)} \sim N(\tilde{\mu}, \tilde{\sigma}^2)$, and we want to convert them to posterior draws, $\hat{x}^{(i)} \sim N(\hat{\mu}, \hat{\sigma}^2)$. To do this, we first standardize the prior draws as $z^{(i)} = (\tilde{x}^{(i)} - \tilde{\mu})/\tilde{\sigma}$, and then "unstandardize" them as $\hat{x}^{(i)} = \hat{\mu} + \hat{\sigma} z^{(i)}$. Combining the two steps gives $\hat{x}^{(i)} = \hat{\mu} + (\hat{\sigma}/\tilde{\sigma})(\tilde{x}^{(i)} - \tilde{\mu})$. Thus, the deterministic update involves shifting and scaling each prior draw so that the resulting posterior draws are shifted toward the data and have a smaller variance than the prior. Figure 1(b) illustrates the idea in a simple univariate example.

There are many variants of deterministic updates, including the ensemble adjustment Kalman filter (EAKF) and the ensemble transform Kalman filter (ETKF). The details of these algorithms differ slightly when $N < n$, but they all belong to the family of square-root filters and are based on the same idea (Tippett et al. 2003).
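The univariate shift-and-scale update described above takes only a few lines; in this sketch (our names), the posterior moments are computed exactly from the scalar Kalman-filter update with $H = 1$, and the prior moments are taken from the ensemble itself.

import numpy as np

def deterministic_update_1d(x_prior, y, R):
    """Univariate deterministic update: standardize the prior draws, then
    'unstandardize' them to have the posterior mean and variance (H = 1)."""
    mu_p, var_p = x_prior.mean(), x_prior.var(ddof=1)
    K = var_p / (var_p + R)                  # scalar Kalman gain
    mu_u = mu_p + K * (y - mu_p)             # posterior mean, as in (6)
    sd_u = np.sqrt((1 - K) * var_p)          # posterior sd, as in (7)
    z = (x_prior - mu_p) / np.sqrt(var_p)    # standardized prior draws
    return mu_u + sd_u * z                   # shifted and scaled posterior draws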

Figure . Simulation from a one-dimensional state-space model with n mt 1, Ht 1, Mt 0.9, Rt 1,andQt 1.Thetoppanel(a)showsthestate,observations, Kalman filter, and the N 5 EnKF ensemble members for the first  time= points.≡ For≡ the Kalman≡ filter, we≡ show the≡ filtering mean and 75% confidence intervals (which correspond to the average= spread from N 5 samples from a ). For the first time point t 1,thebottompanel(b)comparestheKalmanfilterand stochastic EnKF updates (with perturbed observations)= to a deterministic EnKF (see Section .)andtotheimportancesampler(IS),whichisthebasisfortheupdatestep= in most particle filters (e.g., Gordon, Salmond, and Smith ). The bars above the ensemble/particles are proportional to the weights. All approximation methods start with the same, equally weighted N 5 prior samples. Even for this one-dimensional example, importance sampling degenerates and represents the posterior distribution with essentially only one particle with= significant weight, while the EnKF methods shift the ensemble members and thus obtain a better representation of the posterior distribution.

The square-root construction is as follows. Omitting the subscript $t$ for notational simplicity, let $\tilde{L}$ be a matrix square root of the forecast (prior) covariance matrix $\tilde{\Sigma}$ (i.e., $\tilde{\Sigma} = \tilde{L}\tilde{L}'$), and define $D := H \tilde{\Sigma} H' + R$. Then we can write (7) as

$$\hat{\Sigma} = (I_n - \tilde{L}\tilde{L}' H' D^{-1} H)\, \tilde{L}\tilde{L}' = \tilde{L}\, (I_n - \tilde{L}' H' D^{-1} H \tilde{L})\, \tilde{L}' = \hat{L}\hat{L}',$$

and so $\hat{L} = \tilde{L} W$ is a matrix square root of the posterior (filtering) covariance matrix, where $W W' = I_n - \tilde{L}' H' D^{-1} H \tilde{L}$. That is, a square root of the filtering covariance matrix can be obtained by post-multiplying the forecast covariance square root with the matrix $W$. In the deterministic EnKF variants, the $n \times n$ matrix $\tilde{\Sigma}$ is obtained based on the forecast ensemble with $N \ll n$ members, and the resulting low-rank structure is computationally exploited in different ways by the different variants.
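A sketch of this square-root transform in Python/NumPy, under the assumption that a symmetric square root $W$ is acceptable (it is computed here by eigendecomposition); the names are ours rather than those of a specific EAKF/ETKF implementation.

import numpy as np

def square_root_update(L, H, R):
    """Given a forecast covariance square root L (Sigma_tilde = L L'), return
    Lhat = L W such that Lhat Lhat' equals the filtering covariance (7)."""
    D = H @ (L @ L.T) @ H.T + R              # D := H Sigma_tilde H' + R
    A = np.eye(L.shape[1]) - L.T @ H.T @ np.linalg.solve(D, H @ L)
    lam, U = np.linalg.eigh(A)               # A = U diag(lam) U' is sym. PSD
    W = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T   # W W' = A
    return L @ W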

3.3. Summary of the Basic EnKF

In summary, the EnKF only requires storing and operating on $N$ vectors (ensemble members) of length $n$, and the estimated Kalman gain can often be calculated quickly (see Section 4.1). In theory, as $N \to \infty$, the EnKF converges to the (exact) Kalman filter for linear Gaussian models, but large values of $N$ are typically infeasible in practice. Updating the ensemble by shifting makes the algorithm much less prone to degeneration than alternatives that rely on reweighting of ensemble members (e.g., particle filters). This is illustrated in a simple one-dimensional example in Figure 1.

Deterministic EnKF variants generally have less sampling variability and are more accurate than stochastic filters for very small ensemble sizes (e.g., Furrer and Bengtsson 2007, sec. 3.4). However, if the prior distribution is non-Gaussian (e.g., multimodal or skewed), then stochastic filters can be more accurate than deterministic updates (Lei, Bickel, and Snyder 2010).

For readers interested in applying the EnKF to large real-world problems, we recommend user-friendly software such as the Data Assimilation Research Testbed, or DART (Anderson et al. 2009).

4. Operational Variants and Extensions

4.1. Variance Inflation and Localization

For dimension reduction and computational feasibility, the ensemble size, $N$, typically is much smaller than the state dimension, $n$. If the estimated Kalman gain $\hat{K}_t$ in (11) is computed with the estimated forecast covariance $\hat{C}_t$ simply taken to be $S_t$ (the sample covariance matrix of the forecast ensemble), then $\hat{K}_t$ is often a poor approximation of the true Kalman gain. This has two adverse effects that require adjustments in practice. First, small ensemble sizes lead to downwardly biased estimates of the posterior state covariance matrix (Furrer and Bengtsson 2007). This can be alleviated by covariance inflation (e.g., Anderson 2007a), where $S_t$ is multiplied by a constant greater than one. Second, small ensemble sizes lead to rank deficiency in $S_t$, and often result in spurious correlations appearing between state components that are physically far apart. In practice, this is usually avoided by "localization." Several localization approaches have been proposed, but many of them can be viewed as a form of tapering from a statistical perspective (Furrer and Bengtsson 2007). The most common form of localization sets $\hat{C}_t = S_t \circ T_t$, where $\circ$ denotes the Hadamard (entrywise) product and $T_t$ is a sparse positive definite correlation matrix (e.g., Furrer, Genton, and Nychka 2006; Anderson 2007b).
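The following sketch implements tapering-based localization using the settings of the spatial example described below ($n = 40$ locations, exponential prior correlation, $H = R = I$). The Gaspari-Cohn taper is written in its commonly stated piecewise form, which is identically zero beyond twice its length scale; setting the length scale to $r/2$, so that correlations vanish beyond $r = 10$ units, is our reading of the "radius" and should be treated as an assumption.

import numpy as np

def gaspari_cohn(d, c):
    """Compactly supported 5th-order piecewise rational correlation function
    of Gaspari and Cohn (1999); identically zero for distances beyond 2c."""
    z = np.abs(d) / c
    t = np.zeros_like(z, dtype=float)
    m1, m2 = z <= 1, (z > 1) & (z < 2)
    t[m1] = 1 - (5/3)*z[m1]**2 + (5/8)*z[m1]**3 + (1/2)*z[m1]**4 - (1/4)*z[m1]**5
    t[m2] = (4 - 5*z[m2] + (5/3)*z[m2]**2 + (5/8)*z[m2]**3
             - (1/2)*z[m2]**4 + (1/12)*z[m2]**5 - (2/3)/z[m2])
    return t

rng = np.random.default_rng(1)
n, N, r = 40, 25, 10
loc = np.arange(n, dtype=float)
Sigma = 0.9 ** np.abs(loc[:, None] - loc[None, :])    # true prior covariance
Xf = rng.multivariate_normal(np.zeros(n), Sigma, size=N).T   # forecast ensemble
S = np.cov(Xf)                 # sample covariance (inflation would use c*S, c > 1)
T = gaspari_cohn(loc[:, None] - loc[None, :], c=r/2)  # taper, zero beyond r units
C_hat = S * T                  # Hadamard product: C_t = S_t o T_t
K_hat = C_hat @ np.linalg.inv(C_hat + np.eye(n))      # gain (11) with H = R = I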

Figure . For the spatial example in Section .,(a)N 25 prior ensemble members x(1),...,x(25) and (b) observations y and posterior ensemble members = t t x(1),...,x(25) from the stochastic EnKF algorithm with tapering radius r 10.Notethatthex-axis represents spatial location, not time. For illustration, three mem- t t = bers of the ensemble are shown in different colors. The prior ensemble members have constant' mean' and variance, while the mean of the posterior ensemble is shifted &toward the& data and the variance is smaller than for the prior ensemble. where St is multiplied by a constant greater than one. Second, uncorrelated (i.e., Rt is diagonal). If Rt is nondiagonal, the obser- 1/2 small ensemble sizes lead to rank defciency in St ,andoften vation Equation (1)canbepremultipliedbyRt− to obtain result in' spurious correlations appearing between state compo- uncorrelated observation errors. nents that are physically far apart. In practice, this' is usually For linear Gaussian models, it can be shown that the simul- avoided by “localization.” Several localization approaches have taneous and serial Kalman flter schemes yield identical pos- been proposed, but many of them can be viewed as a form of teriors after mt observations. The advantage of serial updating tapering from a statistical perspective (Furrer and Bengtsson is that it avoids calculation and storage of the n mt Kalman × 2007). The most common form of localization sets C S , gain matrix Kt (which requires inverting an mt mt matrix). t = t ◦ Tt × where denotes the Hadamard (entrywise) product, and is a Serial methods require computing an n 1 Kalman gain vec- ◦ Tt × sparse positive defnite correlation matrix (e.g., Furrer, Genton,' tor (and hence inverting only a scalar) for each of the mt scalar and Nychka 2006;Anderson2007b). observations. Stochastic and deterministic EnKFs can be imple- Figures 2 and 3 illustrate the EnKF update step with taper- mented serially by applying the updating formulas to each row ing localization in a one-dimensional spatial example at a single of (1)oneatatime.Covarianceinfation and localization can time step. Omitting time subscripts for notational simplicity, we also be applied in the serial setting, but they are only applied to defne the state vector as x (x ,...,x )′,whichrepresents one column of the covariance matrix after each scalar update, = 1 40 the “true spatial feld” at n 40 equally spaced locations along and the serial updates are then generally not equivalent to joint = a line. Observations are taken at each location, so m 40, and updates. = we assume that H I and R I. The prior distribution for the = = i j state is x (0, !),with! φ| − |,whereφ 0.9. That ∼ N40 ij = = is, the state follows a stationary spatial process with an expo- nential correlation function." We" implement the EnKF algorithm 4.3.Smoothing with localization using N 25 ensemble members, where the = While we have focused on fltering inference so far, there is tapering matrix is based on the compactly supported 5th- T another form of inference in state-space models called smooth- order piecewise polynomial correlation function of Gaspari and ing. Based on data y1:T collected in a fxed time period Cohn (1999), with a radius of r 10. No covariance infation 1,...,T ,smoothingattemptstofnd the posterior distribu- = { } is used. Figure 2 shows the prior and posterior ensemble mem- tion of the state x conditional on y for any t 1,...,T . t 1:T ∈ { } bers from the EnKF algorithm, along with the observed data. 
4.3. Smoothing

While we have focused on filtering inference so far, there is another form of inference in state-space models called smoothing. Based on data $y_{1:T}$ collected in a fixed time period $\{1, \ldots, T\}$, smoothing attempts to find the posterior distribution of the state $x_t$ conditional on $y_{1:T}$ for any $t \in \{1, \ldots, T\}$. For linear Gaussian state-space models, this can be done exactly using the Kalman smoother, which consists of a Kalman filter, plus subsequent recursive "backward smoothing" for $t = T, T-1, \ldots, 1$ (e.g., Shumway and Stoffer 2006). Again, this becomes computationally challenging when the state or observation dimensions are large. For moderate dimensions, smoothing inference can then be carried out using an ensemble-based approximation of the Kalman smoother, which also starts with an EnKF and then carries out backward recursions using the estimated filtering ensembles (e.g., Stroud et al. 2010).

If the state dimension is very large or the evolution is nonlinear, the most widely used smoother is the ensemble Kalman smoother (EnKS) of Evensen and van Leeuwen (2000). The EnKS is a forward-only (i.e., no backward pass is required) algorithm that relies on the idea of state augmentation. At each time point $t$, the state vector is augmented to include the lagged states back to time 1: $x_{1:t} = (x_1', \ldots, x_t')'$. The update step is analogous to the EnKF, but instead of updating only $x_t$, it is necessary to update the entire history $x_{1:t}$. To avoid having to update the entire history, sometimes a moving-window approach is applied, in which the update at each time point only considers the recent history up to a certain time lag.
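An illustrative sketch (our construction, not a reference implementation of the EnKS) of one forward step via state augmentation: the ensemble matrix carries the current state in its first $n$ rows and the lagged states below, and the stochastic update shifts the entire history; a practical implementation would truncate the history to a moving window.

import numpy as np

rng = np.random.default_rng(3)

def enks_step(X_aug, y, M, H, Q, R, n):
    """One forward smoothing step: forecast the current state (first n rows of
    the augmented ensemble), stack it on top of the history, update jointly."""
    N = X_aug.shape[1]
    x_new = M @ X_aug[:n] + rng.multivariate_normal(np.zeros(n), Q, size=N).T
    Z = np.vstack([x_new, X_aug])            # augmented state (x_t first)
    # observation matrix for the augmented state: only x_t is observed
    H_aug = np.hstack([H, np.zeros((H.shape[0], Z.shape[0] - n))])
    S = np.atleast_2d(np.cov(Z))             # augmented sample covariance
    K = S @ H_aug.T @ np.linalg.inv(H_aug @ S @ H_aug.T + R)
    V = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N).T
    return Z + K @ (y[:, None] + V - H_aug @ Z)   # shift current and lagged states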

Figure . For the spatial example in Section .,comparisonoftrueandensemble-basedpriorcovarianceandKalmangainmatrices.Toprow:(a)truecovariance!,(b) sample covariance S, (c) tapered sample covariance C with r 10. Bottom row: Kalman gain matrices K and K obtained using the prior covariances in the top row (i.e., = plots (d)–(f) correspond to covariances in (a)–(c), respectively). The sample forecast covariance matrix in (b) and the corresponding Kalman gain matrix in (e) exhibit a" large amount of sampling' variability. The tapered covariance matrix in (c) has less sampling variability, the correlations& die out faster, and become identically zero (shown in white) for locations more than r 10 units apart. The corresponding gain matrix in (f) is a more accurate estimate of the true gain than the untapered version in (e), but it is no longer sparse. =

4.4. Parameter Estimation

So far, we have assumed that the only unknown quantities in the state-space model (1)–(2) are the state vectors $x_t$, and that there are no other unknown parameters. In practice, of course, this is often not the case. The matrices $H_t$, $M_t$, $R_t$, and $Q_t$ often include some unknown parameters $\theta$ (e.g., autoregressive coefficients, variance parameters, spatial range parameters) that also need to be estimated.

There are two main approaches to parameter estimation within the EnKF framework. The first is a very popular approach called state augmentation (Anderson 2001). Here, the parameters are treated as time-varying quantities with small artificial evolution noise. We then combine the states and parameters in an augmented state vector $z_t = (x_t', \theta_t')'$ and run an EnKF on the augmented state vector to obtain posterior estimates of states and parameters at each time $t$. This approach works well in many examples; however, it implicitly assumes that the states and parameters jointly follow a linear Gaussian state-space model. For some parameters, such as covariance parameters, this assumption is violated, and the method fails completely (Stroud and Bengtsson 2007).

The second approach to parameter estimation is based on approximate likelihood functions constructed using the output from the EnKF. It can in principle be used for any type of parameter. The parameters are estimated either by maximum likelihood (ML) or Bayesian methods. Examples of these approaches include sequential ML (Mitchell and Houtekamer 2000), off-line ML (Stroud et al. 2010), and sequential Bayesian methods (Stroud and Bengtsson 2007; Frei and Künsch 2012). In general, these methods have been successful in examples with a relatively small number of parameters, and more work is needed for cases where the parameter and state are both high dimensional.
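A sketch of the state-augmentation approach (our names; the evolution operator is passed as a function that may depend on the parameters member-by-member, and the artificial noise variance tau2 is an arbitrary illustrative choice): the parameter ensemble is appended to the state ensemble and updated by the same shift as the state.

import numpy as np

rng = np.random.default_rng(4)

def augmented_enkf_step(X, theta, y, M, H, Q, R, tau2=1e-4):
    """EnKF step on z_t = (x_t', theta_t')': states evolve via M (which may
    depend on theta member-by-member); parameters get small artificial noise."""
    n, N = X.shape
    theta_f = theta + rng.normal(0.0, np.sqrt(tau2), size=theta.shape)
    Xf = np.empty_like(X)
    for i in range(N):                        # evolution may depend on theta^(i)
        Xf[:, i] = M(X[:, i], theta_f[:, i]) + rng.multivariate_normal(np.zeros(n), Q)
    Z = np.vstack([Xf, theta_f])              # augmented forecast ensemble
    H_aug = np.hstack([H, np.zeros((H.shape[0], theta.shape[0]))])
    S = np.atleast_2d(np.cov(Z))
    K = S @ H_aug.T @ np.linalg.inv(H_aug @ S @ H_aug.T + R)
    V = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N).T
    Z = Z + K @ (y[:, None] + V - H_aug @ Z)  # joint shift of states and parameters
    return Z[:n], Z[n:]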

4.5. Non-Gaussianity and Nonlinearity

As we have shown above in Section 3, the EnKF updates are based on the assumption of a linear Gaussian state-space model of the form (1)–(2). However, the EnKF is surprisingly robust to deviations from these assumptions, and there are many examples of successful applications of such models in the geophysical literature.

In general, nonlinearity of the state-space model means that $H_t x_t$ in (1) is replaced by a nonlinear observation operator $\mathcal{H}_t(x_t)$, or $M_t x_{t-1}$ in (2) is replaced by a nonlinear evolution operator $\mathcal{M}_t(x_{t-1})$. An advantage of the EnKF is that these operators do not have to be available in closed form. Instead, as can be seen in Algorithm 1, these operators simply have to be "applied" to each ensemble member. In addition, it is easy to show that if $\hat{x}_{t-1}$ is a sample from the filtering distribution, then $\tilde{x}_t = \mathcal{M}_t(\hat{x}_{t-1}) + w_t$ is an (exact) sample from the forecast distribution.

The effectiveness of the EnKF and its variants in the non-Gaussian/nonlinear case is a consequence of so-called linear Bayesian estimation (e.g., Hartigan 1969; Goldstein and Wooff 2007). In its original form, linear Bayesian estimation seeks to find linear estimators given only the first and second moments of the prior distribution and likelihood, and in that sense it is "distribution-free." This is a Bayesian justification for kriging methods in spatial statistics (Omre 1987) and non-Gaussian/nonlinear state-space models in time-series analysis (e.g., West, Harrison, and Migon 1985; Fahrmeir 1992). When considering linear Gaussian priors and likelihoods, this approach obviously gives the true Bayesian posterior. In other contexts, it is appealing in that it allows one to update prior information without explicit distributional assumptions. However, as described in O'Hagan (1987), the linear Bayesian approach is potentially problematic in the context of highly non-Gaussian distributions (e.g., skewness, heavy tails, multimodality, etc.).

If the EnKF approximation is not satisfactory in the presence of non-Gaussianity, it is possible to employ normal mixtures (e.g., Alspach and Sorenson 1972; Anderson and Anderson 1999; Bengtsson, Snyder, and Nychka 2003), hybrid particle-filter EnKFs (Hoteit et al. 2008; Stordal et al. 2011; Frei and Künsch 2012), and other approaches (see, e.g., Bocquet, Pires, and Wu 2010, for a review).
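To illustrate the point above that nonlinear operators only need to be "applied" to each member, here is a sketch (our interface) of a stochastic EnKF step in which $\mathcal{M}_t$ and $\mathcal{H}_t$ are passed in as black-box functions, with the gain estimated from sample cross-covariances between the state and the predicted observations.

import numpy as np

rng = np.random.default_rng(5)

def nonlinear_enkf_step(X, y, M_op, H_op, Q, R):
    """Stochastic EnKF step where M_op and H_op are arbitrary (black-box)
    functions applied member-by-member; no closed form or linearity needed."""
    n, N = X.shape
    Xf = np.column_stack([M_op(X[:, i]) for i in range(N)]) \
         + rng.multivariate_normal(np.zeros(n), Q, size=N).T
    Yf = np.column_stack([H_op(Xf[:, i]) for i in range(N)])  # predicted obs
    Xc = Xf - Xf.mean(1, keepdims=True)
    Yc = Yf - Yf.mean(1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)               # sample cov(state, predicted obs)
    C_yy = Yc @ Yc.T / (N - 1)               # sample cov of predicted obs
    K = C_xy @ np.linalg.inv(C_yy + R)       # ensemble-estimated gain
    V = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N).T
    return Xf + K @ (y[:, None] + V - Yf)    # shift with perturbed observations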
5. Conclusions

The EnKF is a powerful tool for inference in high-dimensional state-space models. The key idea of the EnKF relative to other sequential Monte Carlo methods is the use of shifting instead of reweighting in the update step, which allows the algorithm to remain stable in high-dimensional problems. In practice, the algorithm requires the choice of important tuning parameters (e.g., tapering radius and variance inflation factor).

As noted in the introduction, much of the development of the EnKF has been outside the statistics community. There are many outstanding problems and questions associated with EnKFs (e.g., parameter estimation, highly nonlinear and non-Gaussian models), and statisticians surely have much to contribute.

Funding

Katzfuss' research was partially supported by National Science Foundation (NSF) Grant DMS-1521676 and by NASA's Earth Science Technology Office AIST-14 program. Wikle acknowledges the support of NSF grant DMS-1049093 and Office of Naval Research (ONR) grant ONR-N00014-10-0518.

Acknowledgment

The authors thank Thomas Bengtsson, Jeffrey Anderson, and three anonymous reviewers for helpful comments and suggestions.

References

Alspach, D. L., and Sorenson, H. W. (1972), "Nonlinear Bayesian Estimation Using Gaussian Sum Approximations," IEEE Transactions on Automatic Control, 17, 439–448.
Anderson, J., Hoar, T., Raeder, K., Liu, H., Collins, N., Torn, R., and Avellano, A. (2009), "The Data Assimilation Research Testbed: A Community Facility," Bulletin of the American Meteorological Society, 90, 1283–1296.
Anderson, J. L. (2001), "An Ensemble Adjustment Kalman Filter for Data Assimilation," Monthly Weather Review, 129, 2884–2903.
——— (2007a), "An Adaptive Covariance Inflation Error Correction Algorithm for Ensemble Filters," Tellus A, 59, 210–224.
——— (2007b), "Exploring the Need for Localization in Ensemble Data Assimilation Using an Hierarchical Ensemble Filter," Physica D, 230, 99–111.
——— (2009), "Ensemble Kalman Filters for Large Geophysical Applications," IEEE Control Systems Magazine, 29, 66–82.
Anderson, J. L., and Anderson, S. L. (1999), "A Monte Carlo Implementation of the Nonlinear Filtering Problem to Produce Ensemble Assimilations and Forecasts," Monthly Weather Review, 127, 2741–2758.
Bengtsson, T., Snyder, C., and Nychka, D. (2003), "Toward a Nonlinear Ensemble Filter for High-Dimensional Systems," Journal of Geophysical Research, 108, 8775.
Bishop, C., Etherton, B., and Majumdar, S. (2001), "Adaptive Sampling With the Ensemble Transform Kalman Filter. Part I: Theoretical Aspects," Monthly Weather Review, 129, 420–436.
Bocquet, M., Pires, C. A., and Wu, L. (2010), "Beyond Gaussian Statistical Modeling in Geophysical Data Assimilation," Monthly Weather Review, 138, 2997–3023.
Burgers, G., van Leeuwen, P. J., and Evensen, G. (1998), "Analysis Scheme in the Ensemble Kalman Filter," Monthly Weather Review, 126, 1719–1724.
Evensen, G. (1994), "Sequential Data Assimilation With a Nonlinear Quasi-Geostrophic Model Using Monte Carlo Methods to Forecast Error Statistics," Journal of Geophysical Research, 99, 10143–10162.
Evensen, G., and van Leeuwen, P. J. (2000), "An Ensemble Kalman Smoother for Nonlinear Dynamics," Monthly Weather Review, 128, 1852–1867.
Fahrmeir, L. (1992), "Posterior Mode Estimation by Extended Kalman Filtering for Multivariate Dynamic Generalized Linear Models," Journal of the American Statistical Association, 87, 501–509.
Frei, M., and Künsch, H. R. (2012), "Sequential State and Observation Noise Covariance Estimation Using Combined Ensemble Kalman and Particle Filters," Monthly Weather Review, 140, 1476–1495.
Furrer, R., and Bengtsson, T. (2007), "Estimation of High-Dimensional Prior and Posterior Covariance Matrices in Kalman Filter Variants," Journal of Multivariate Analysis, 98, 227–255.
Furrer, R., Genton, M. G., and Nychka, D. (2006), "Covariance Tapering for Interpolation of Large Spatial Datasets," Journal of Computational and Graphical Statistics, 15, 502–523.
Gaspari, G., and Cohn, S. E. (1999), "Construction of Correlation Functions in Two and Three Dimensions," Quarterly Journal of the Royal Meteorological Society, 125, 723–757.
Goldstein, M., and Wooff, D. (2007), Bayes Linear Statistics, Theory & Methods (Vol. 716), New York: Wiley.
Gordon, N., Salmond, D., and Smith, A. (1993), "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation," IEE Proceedings F, Radar and Signal Processing, 140, 107–113.
Hartigan, J. (1969), "Linear Bayesian Methods," Journal of the Royal Statistical Society, Series B, 31, 446–454.
Hoteit, I., Pham, D.-T., Gharamti, M., and Luo, X. (2015), "Mitigating Observation Perturbation Sampling Errors in the Stochastic EnKF," Monthly Weather Review.
Hoteit, I., Pham, D. T., Triantafyllou, G., and Korres, G. (2008), "A New Approximate Solution of the Optimal Nonlinear Filter for Data Assimilation in Meteorology and Oceanography," Monthly Weather Review, 136, 317–334.
Houtekamer, P. L., and Mitchell, H. L. (1998), "Data Assimilation Using an Ensemble Kalman Filter Technique," Monthly Weather Review, 126, 796–811.
——— (2001), "A Sequential Ensemble Kalman Filter for Atmospheric Data Assimilation," Monthly Weather Review, 129, 123–137.

Journel, A. G. (1974), "Geostatistics for Conditional Simulation of Ore Bodies," Economic Geology, 69, 673–687.
Kalman, R. (1960), "A New Approach to Linear Filtering and Prediction Problems," Journal of Basic Engineering, 82, 35–45.
Lei, J., Bickel, P., and Snyder, C. (2010), "Comparison of Ensemble Kalman Filters Under Non-Gaussianity," Monthly Weather Review, 138, 1293–1306.
Mitchell, H. L., and Houtekamer, P. L. (2000), "An Adaptive Ensemble Kalman Filter," Monthly Weather Review, 128, 416–433.
Nychka, D., and Anderson, J. L. (2010), "Data Assimilation," in Handbook of Spatial Statistics, eds. A. Gelfand, P. Diggle, P. Guttorp, and M. Fuentes, New York, NY: Chapman & Hall/CRC, pp. 477–492.
O'Hagan, A. (1987), "Bayes Linear Estimators for Randomized Response Models," Journal of the American Statistical Association, 82, 580–585.
Omre, H. (1987), "Bayesian Kriging: Merging Observations and Qualified Guesses in Kriging," Mathematical Geology, 19, 25–39.
Ott, E., Hunt, B. R., Szunyogh, I., Zimin, A. V., Kostelich, E. J., Corazza, M., Kalnay, E., Patil, D. J., and Yorke, J. A. (2004), "A Local Ensemble Kalman Filter for Atmospheric Data Assimilation," Tellus, 56A, 415–428.
Sherman, J., and Morrison, W. (1950), "Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix," Annals of Mathematical Statistics, 21, 124–127.
Shumway, R. H., and Stoffer, D. S. (2006), Time Series Analysis and Its Applications With R Examples (2nd ed.), New York: Springer.
Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J. L. (2008), "Obstacles to High-Dimensional Particle Filtering," Monthly Weather Review, 136, 4629–4640.
Stordal, A. S., Karlsen, H. A., Nævdal, G., Skaug, H. J., and Vallès, B. (2011), "Bridging the Ensemble Kalman Filter and Particle Filters: The Adaptive Gaussian Mixture Filter," Computational Geosciences, 15, 293–305.
Stroud, J. R., and Bengtsson, T. (2007), "Sequential State and Variance Estimation Within the Ensemble Kalman Filter," Monthly Weather Review, 135, 3194–3208.
Stroud, J. R., Stein, M. L., Lesht, B. M., Schwab, D. J., and Beletsky, D. (2010), "An Ensemble Kalman Filter and Smoother for Satellite Data Assimilation," Journal of the American Statistical Association, 105, 978–990.
Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M., and Whitaker, J. S. (2003), "Ensemble Square-Root Filters," Monthly Weather Review, 131, 1485–1490.
Tukey, J. W. (1962), "The Future of Data Analysis," The Annals of Mathematical Statistics, 33, 1–67.
West, M., Harrison, P. J., and Migon, H. S. (1985), "Dynamic Generalized Linear Models and Bayesian Forecasting," Journal of the American Statistical Association, 80, 73–83.
Wikle, C. K., and Berliner, L. M. (2007), "A Bayesian Tutorial for Data Assimilation," Physica D: Nonlinear Phenomena, 230, 1–16.
Woodbury, M. (1950), "Inverting Modified Matrices," Memorandum Report 42, Statistical Research Group, Princeton University, Princeton, NJ.