The Ensemble Kalman Filter and its Relations to Other Nonlinear Filters

Michael Roth, Carsten Fritsche, Gustaf Hendeby and Fredrik Gustafsson

Linköping University Post Print

N.B.: When citing this work, cite the original article.

Original Publication: Michael Roth, Carsten Fritsche, Gustaf Hendeby and Fredrik Gustafsson, The Ensemble Kalman Filter and its Relations to Other Nonlinear Filters, 2015, 23rd European Signal Processing Conference, Nice, France, Aug 31–Sept 4, 2015.

Copyright: The Authors

Preprint ahead of publication available at: Linköping University Electronic Press http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-121079

23rd European Signal Processing Conference (EUSIPCO)

THE ENSEMBLE KALMAN FILTER AND ITS RELATIONS TO OTHER NONLINEAR FILTERS

Michael Roth∗, Carsten Fritsche∗, Gustaf Hendeby∗†, and Fredrik Gustafsson∗

∗ Linköping University, Department of Electrical Engineering, Linköping, Sweden, e-mail: {roth,carsten,hendeby,fredrik}@isy.liu.se
† Swedish Defence Research Agency (FOI), Linköping, Sweden, e-mail: [email protected]

ABSTRACT

The Ensemble Kalman filter (EnKF) is a standard algorithm in oceanography and meteorology, where it has received thousands of citations. It is appreciated in these communities because it scales much better with the state dimension n than the standard Kalman filter (KF). In short, the EnKF propagates ensembles with N state realizations instead of mean values and covariance matrices, and thereby avoids the computational and storage burden of working with n × n matrices. Perhaps surprisingly, very little attention has been devoted to the EnKF in the signal processing community. In an attempt to change this, we present the EnKF in a Kalman filtering context. Furthermore, its application to nonlinear problems is compared to sigma point Kalman filters and the particle filter, so as to reveal new insights and improvements for high-dimensional filtering algorithms in general. A simulation example shows the EnKF performance in a space debris tracking application.

Index Terms— Kalman filter, ensemble Kalman filter, sigma point Kalman filter, UKF, particle filter

1. INTRODUCTION

The Ensemble Kalman Filter (EnKF) is an algorithm for state estimation in high-dimensional state-space models. Its development has been driven by applications in oceanography, meteorology, and other geosciences, in which the state dimension can be in the order of millions [1].

With the ever increasing amount of available data, such high-dimensional state estimation problems also become more important in the signal processing community. Still, the EnKF has been widely overlooked by signal processing researchers. It is our intention to change this, first, by giving a clear presentation that derives the EnKF from the linear Kalman filter (KF) and, second, by giving some relations to more familiar algorithms such as sigma point Kalman filters (e.g., the UKF) and the particle filter (PF). We hope that this leads to a cross-fertilization of ideas that will reveal new insights and improvements, both for the EnKF and filtering algorithms for high-dimensional problems in general.

One reason for the lack of awareness towards the EnKF is that its development took place in geoscientific rather than signal processing journals. For example, the first EnKF paper [2] was published in a geophysics journal; the paper [3], which contains an important correction to [2], appeared in a meteorological journal. Accordingly, application-specific jargon is often used: the KF time update or prediction step is often termed forecast in EnKF literature; the measurement update or correction step is termed analysis or data assimilation. An extensive listing of geoscientific EnKF publications is given in [4]. Popular KF references such as [5] or even Kalman's paper [6] are seldom cited in EnKF articles.

Some attention beyond the geoscientific context has been devoted to the EnKF by the automatic control [7, 8] and statistics communities [9–12]. In particular, a result that for linear systems the EnKF converges to the KF as the number of ensemble members tends to infinity has been reported by different authors [12–14] rather recently.

The structure of the paper is as follows. Sec. 2 develops the EnKF from the KF for linear systems. Sec. 3 shows how the EnKF can be applied to nonlinear systems, and establishes relations to sigma point Kalman filters and the particle filter. A high-dimensional simulation example in which space debris is tracked is presented in Sec. 4, and followed by some concluding remarks.

This work was supported by the project Scalable Kalman Filters granted by the Swedish Research Council.

2. FROM KALMAN FILTER TO ENSEMBLE KALMAN FILTER IN LINEAR SYSTEMS

This section recalls the KF, which is then used to derive the ensemble updates in the EnKF. Furthermore, we discuss the filter gain computation and several practical aspects.

978-0-9928626-3-3/15/$31.00 ©2015 IEEE

2.1. The Kalman Filter

We consider linear stochastic state-space models

x_{k+1} = F x_k + G v_k,   (1a)
y_k = H x_k + e_k,   (1b)

with an n-dimensional state x, an m-dimensional measurement y, with E(x_0) = x̂_0, cov(x_0) = P_0, cov(v_k) = Q, and cov(e_k) = R. The Kalman filter [5] for (1) can be divided into a time update of the state estimate and its covariance

x̂_{k+1|k} = F x̂_{k|k},   (2a)
P_{k+1|k} = F P_{k|k} F^T + G Q G^T;   (2b)

the computation of the predicted output and its covariance

ŷ_{k|k−1} = H x̂_{k|k−1},   (3a)
S_k = H P_{k|k−1} H^T + R;   (3b)

and a measurement update of the state estimate and its covariance

K_k = P_{k|k−1} H^T S_k^{−1} = M_k S_k^{−1},   (4a)
x̂_{k|k} = x̂_{k|k−1} + K_k (y_k − ŷ_{k|k−1})   (4b)
        = (I − K_k H) x̂_{k|k−1} + K_k y_k,   (4c)
P_{k|k} = (I − K_k H) P_{k|k−1} (I − K_k H)^T + K_k R K_k^T.   (4d)

Here, P_{k|k} is written in Joseph form, which appears useful in the upcoming discussion of the EnKF. It should be noted that the inverse S_k^{−1} in the gain computation (4a) does not need to be computed explicitly. Rather, it is numerically advisable to compute K_k by solving the linear system of equations K_k S_k = M_k [15].

2.2. The Ensemble Kalman Filter

The KF is problematic in high-dimensional state-spaces because it requires storing and processing n × n covariance matrices. Moreover, the full P_{k|k} and P_{k|k−1} are hardly ever useful as output information to the user. These motives led to the development of the EnKF, which is based on the idea to condense the information that is carried by x̂_{k|k} and P_{k|k} into samples. Specifically, the EnKF represents the filtering result by an ensemble of N realizations {x_k^{(i)}}_{i=1}^N such that

x̄_{k|k} = (1/N) Σ_{i=1}^N x_k^{(i)} ≈ x̂_{k|k},   (5a)
P̄_{k|k} = (1/(N−1)) Σ_{i=1}^N (x_k^{(i)} − x̄_{k|k})(x_k^{(i)} − x̄_{k|k})^T ≈ P_{k|k}.   (5b)

The ensemble can be stored in an n × N matrix X_{k|k}, which also allows for the compact notation of the ensemble mean and (unbiased) ensemble covariance

x̄_{k|k} = (1/N) X_{k|k} 1,   (6a)
P̄_{k|k} = (1/(N−1)) X̃_{k|k} X̃_{k|k}^T,   (6b)

where 1 = [1, …, 1]^T is an N-dimensional column vector and X̃_{k|k} = X_{k|k} − x̄_{k|k} 1^T is an ensemble of deviations from x̄_{k|k}. It turns out that computation and storage of the large matrices in (5b) or (6b) is avoided in the EnKF, which makes the algorithm attractive for solving high-dimensional state estimation problems.

2.3. Ensemble propagation for a known Kalman gain

The EnKF time update corresponds to (2) and amounts to computing an ensemble X_{k+1|k} that encodes x̂_{k+1|k} and P_{k+1|k} from X_{k|k}. This can be achieved with an ensemble V_k = [v_k^{(1)}, …, v_k^{(N)}] of N process noise realizations:

X_{k+1|k} = F X_{k|k} + G V_k.   (7)

The EnKF time update amounts to a simulation. Consequently, also continuous-time and/or nonlinear dynamics can be considered as long as state transitions can be simulated. In fact, the time update is typically omitted in the EnKF literature [8], where an ensemble X_{k|k−1} is often the starting point.

In the EnKF measurement update, a prediction ensemble X_{k|k−1} is processed to obtain the filtering ensemble X_{k|k} that encodes the KF mean and covariance. For the moment, we assume that the Kalman gain K_k is known but discuss its computation in Sec. 2.4.

With K_k available, the KF update (4c) can be applied to each ensemble member as follows:

X_{k|k} = (I − K_k H) X_{k|k−1} + K_k y_k 1^T.   (8)

The resulting ensemble average (6a) is the correct x̂_{k|k}. However, with y_k 1^T known, the sample covariance of X_{k|k} gives only the first term of (4d) and therefore fails to encode P_{k|k}. A solution [3] is to account for the missing term K_k R K_k^T by adding artificial zero-mean measurement noise realizations E_k = [e_k^{(1)}, …, e_k^{(N)}] with covariance R, according to

X_{k|k} = (I − K_k H) X_{k|k−1} + K_k y_k 1^T − K_k E_k.   (9)

Then, X_{k|k} correctly encodes x̂_{k|k} and P_{k|k}. The model (1) is implicit in (9) because the matrix H appears. If we, similar to (3), define a predicted output ensemble

Y_{k|k−1} = H X_{k|k−1} + E_k   (10)

that encodes ŷ_{k|k−1} and S_k, we can reformulate (9) into an update that resembles (4b):

X_{k|k} = X_{k|k−1} + K_k (y_k 1^T − Y_{k|k−1}).   (11)

In contrast to the update (9), (11) is merely sample based, which is a useful property for the extension to nonlinear systems.
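As a concrete companion to (7)–(11), the following NumPy sketch propagates and updates an ensemble for the linear model (1), assuming the gain K_k is given (its computation is the subject of Sec. 2.4). The function names and small dimensions are our own illustration, not a reference implementation.

```python
import numpy as np

def enkf_time_update(X, F, G, Q, rng):
    """EnKF time update (7): X_{k+1|k} = F X_{k|k} + G V_k."""
    N = X.shape[1]
    # Ensemble V_k of N process noise realizations with covariance Q.
    V = rng.multivariate_normal(np.zeros(Q.shape[0]), Q, size=N).T
    return F @ X + G @ V

def enkf_measurement_update(X, y, H, R, K, rng):
    """EnKF measurement update (10)-(11) for a given gain K."""
    N = X.shape[1]
    # Artificial zero-mean measurement noise E_k with covariance R.
    E = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N).T
    Y = H @ X + E                    # predicted output ensemble (10)
    return X + K @ (y[:, None] - Y)  # update (11); broadcasting gives y 1^T - Y
```

One EnKF cycle then reads `X = enkf_measurement_update(enkf_time_update(X, F, G, Q, rng), y, H, R, K, rng)`; only n × N and m × N arrays appear.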


2.4. Estimating the Kalman Gain

The preceding derivation of the EnKF assumed a known gain K_k, which in the KF is a function of covariance matrices. Accordingly, a more explicit version of (4a) can be written as

K_k = M_k S_k^{−1} = cov(x_k, y_k | y_{1:k−1}) cov(y_k | y_{1:k−1})^{−1}.   (12)

The EnKF encodes predicted mean values and covariance matrices via ensembles of random states (7) and outputs (10). The fully deterministic K_k of (4a) is in (9) or (11) replaced by the EnKF gain K̄_k = M̄_k S̄_k^{−1}. The randomness in M̄_k and S̄_k implies a random K̄_k. Below, we discuss methods for computing the matrices M̄_k and S̄_k.

A direct way to compute M̄_k and S̄_k is to utilize X_{k|k−1} from (7) and Y_{k|k−1} from (10). First, the ensemble averages x̄_{k|k−1} = (1/N) X_{k|k−1} 1 and ȳ_{k|k−1} = (1/N) Y_{k|k−1} 1 are used to compute the deviation ensembles

X̃_{k|k−1} = X_{k|k−1} − x̄_{k|k−1} 1^T,   (13a)
Ỹ_{k|k−1} = Y_{k|k−1} − ȳ_{k|k−1} 1^T,   (13b)

from which we can then obtain

M̄_k = (1/(N−1)) X̃_{k|k−1} Ỹ_{k|k−1}^T,   (13c)
S̄_k = (1/(N−1)) Ỹ_{k|k−1} Ỹ_{k|k−1}^T.   (13d)

The computations (13) are entirely sampling based, which is useful for the upcoming nonlinear case. However, they inherit errors due to the random sampling that can be avoided.

The basis for the KF is the knowledge of how mean values and covariance matrices propagate through linear systems (1). Consequently, we can use model knowledge to decrease the randomness in (13). First, we can replace the ensemble averages in (13a) and (13b) by x̄_{k|k−1} = F x̄_{k−1|k−1} and ȳ_{k|k−1} = H F x̄_{k−1|k−1}, with a previous filtering mean x̄_{k−1|k−1}. Second, we know that the measurement noise e_k is not correlated with the state and should not influence M̄_k. Therefore, we can work with an unperturbed output deviation ensemble

Z̃_{k|k−1} = H X_{k|k−1} − ȳ_{k|k−1} 1^T   (14a)

to compute

M̄_k = (1/(N−1)) X̃_{k|k−1} Z̃_{k|k−1}^T,   (14b)
S̄_k = (1/(N−1)) Z̃_{k|k−1} Z̃_{k|k−1}^T + R.   (14c)

It should be noted that the above considerations do not apply to the nonlinear systems of the next section. Neither (13) nor (14) requires the computation of n × n matrices.

2.5. Practical aspects

The practical challenges and remedies below have their origin in geoscientific applications, where the state-space models often come from discretizing partial differential equations for spatially distributed grid points. They are described in a dedicated chapter in [4].

The matrices M̄_k and S̄_k are estimated from data and subject to random sampling errors. For example, there might be non-zero elements in M̄_k where M_k is actually zero. This can lead to unwanted correlation among state components, an effect called spurious correlation in EnKF literature. A suggested remedy is called localization and is based on the fact that most measurements only affect a small subset of the entire state vector. The idea is to carry out the measurement update only for these specific states.

Furthermore, heuristics that rely on artificially increasing the uncertainty are used to improve filter performance. The method is called covariance inflation in EnKF literature and amounts to increasing the spread of the ensembles. A similarity can be seen to heuristics in the filtering literature, such as the "fudge factor" of [16] and dithering in the PF [17].

3. THE ENSEMBLE KALMAN FILTER FOR NONLINEAR SYSTEMS

The filtering problem in nonlinear and/or non-Gaussian state-space models

x_{k+1} = f(x_k, v_k),   (15a)
y_k = h(x_k, e_k),   (15b)

with known probability density functions p(x_0), p(v_k), and p(e_k) has an elegant conceptual solution: the Bayesian filtering equations [18] recursively yield the posterior p(x_k | y_{1:k}). Unfortunately, the required computations are intractable in all but the simplest cases.

We have shown in the previous section that the EnKF can be executed entirely sampling based. By overloading the functions in (15) to accept ensembles as input, we see from

X_{k+1|k} = f(X_{k|k}, V_k),   (16a)
Y_{k|k−1} = h(X_{k|k−1}, E_k)   (16b)

that the EnKF with the measurement update (11) is applicable to nonlinear systems.

This section highlights parallels between the EnKF and two other (approximate) nonlinear filtering algorithms, sigma point Kalman filters and the particle filter.

3.1. The EnKF and sigma point Kalman filters

Another class of sampling based algorithms that can be applied to (15) are sigma point Kalman filters such as the unscented KF [19, 20], or variants that are based on interpolation [21] or numerical integration [21, 22]. All of [19–22] share the measurement update (4), which yields an estimate x̂_{k|k} and a covariance P_{k|k}, with the KF. Hence, sigma point Kalman filters share the storage requirements of the KF, and inherit the problems that appear for large n.
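The contrast can be made concrete in code. The sketch below combines the overloaded propagation (16), the deviation ensembles and gain (13), and the update (11) into one sampling-based EnKF step; no n × n matrix is ever formed, and the gain is obtained by solving K̄ S̄ = M̄ as advised after (4a). The function signature and names are our own illustration: f and h are the ensemble-overloaded model functions, and sample_v and sample_e draw N process and measurement noise realizations.

```python
import numpy as np

def enkf_step(X, y, f, h, sample_v, sample_e, rng):
    """One sampling-based EnKF iteration: time update (16a), output
    ensemble (16b), gain estimate (13), and measurement update (11)."""
    N = X.shape[1]
    Xp = f(X, sample_v(N, rng))            # time update (16a)
    Y = h(Xp, sample_e(N, rng))            # predicted output ensemble (16b)
    Xdev = Xp - Xp.mean(axis=1, keepdims=True)  # deviations (13a)
    Ydev = Y - Y.mean(axis=1, keepdims=True)    # deviations (13b)
    M = Xdev @ Ydev.T / (N - 1)            # cross covariance (13c), n x m
    S = Ydev @ Ydev.T / (N - 1)            # output covariance (13d), m x m
    K = np.linalg.solve(S.T, M.T).T        # solve K S = M instead of inverting S
    return Xp + K @ (y[:, None] - Y)       # measurement update (11)
```

With linear f and h, this step reduces to the linear EnKF of Sec. 2; only the overloaded model functions change for nonlinear systems.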


A similarity between the EnKF and sigma point filters is that samples are propagated according to (16). These samples are called an ensemble in the former and sigma points in the latter class of filters. In contrast to the EnKF, which retains the ensemble, the sigma points are generated from x̂_{k|k} and P_{k|k} at the beginning of each iteration and also condensed into mean values and covariance matrices before the measurement update. Furthermore, each sigma point is assigned a weight. In contrast to the random sampling in the EnKF, the sigma point and weight generation is fully deterministic in all of [19–22]. Based on the viewpoint in the algorithm development, however, the sampling methodology and sample size differ.

As an illustration, we sketch one possible UKF time update for (15). Let r be the dimension of [x_k^T, v_k^T]^T and N = 2r + 1. First, N sigma points X_{k|k} and V_k, and an N-dimensional weight vector w_{k|k}, are generated such that X_{k|k} w_{k|k} = x̂_{k|k}, V_k w_{k|k} = 0, and, with W_{k|k} = diag(w_{k|k}),

[X_{k|k} − x̂_{k|k} 1^T; V_k] W_{k|k} [X_{k|k} − x̂_{k|k} 1^T; V_k]^T = [P_{k|k}, 0; 0, Q].   (17)

Second, the transformed sigma points X_{k+1|k} are generated via (16a). In contrast to the EnKF, however, X_{k+1|k} is condensed into a mean value x̂_{k+1|k} = X_{k+1|k} w_{k|k} and a covariance P_{k+1|k} = X_{k+1|k} (W_{k|k} − w_{k|k} w_{k|k}^T) X_{k+1|k}^T. The latter expression compactly denotes a weighted sample covariance. A similar procedure yields M_k and S_k, and eventually the Kalman gain K_k. There is no randomness in the algorithm, and for linear systems the UKF is equivalent to the KF. The UKF gain computation is in fact similar to the one in the EnKF, except that all ensemble members are equally weighted in the latter.

3.2. The EnKF and the particle filter

Similar to the EnKF, the particle filter (PF) retains N samples, called particles, throughout each iteration. The algorithm has been developed to approximately solve the Bayesian filtering equations [17, 23]. Using the principle of importance sampling, the PF generates samples X_{k|k−1} from a proposal distribution and then assigns weights w_{k|k} such that weighted particle averages approximate expected values with respect to the posterior p(x_k | y_{1:k}).

The key difference between the EnKF and the PF is that the PF samples are only generated once in each iteration, and then left unchanged. In basic PF variants, the measurement influences merely the weights w_{k|k}, and the particles only indirectly through a resampling step that is carried out to avoid the problem of weight degeneracy [17]. More advanced PF variants can make use of y_k in the generation of samples [23]. The EnKF, in contrast, changes each ensemble member in the measurement update.

For a specific choice of PF proposal distribution, characterized by the transition density p(x_{k+1} | x_k), the PF time update amounts to a simulation of each particle with an independent process noise realization, similar to (16a). That is, the EnKF and PF time updates are similar.

Particle filters are capable of solving severely nonlinear and non-Gaussian problems, but can only be applied to low-dimensional problems without further care. One reason for this is the problem of generating meaningful samples in high-dimensional spaces.

4. ENKF PERFORMANCE IN SIMULATIONS

Here, a simulation inspired by the problem of tracking space debris, a growing problem in space aviation, is presented. The example has been considerably simplified so as not to draw attention away from the EnKF itself. Hence, the objects travel in 1D and are affected by the same external force, the solar wind, which effectively changes their speed. The nominal angular velocity of the objects is chosen to represent debris orbiting earth at a constant altitude of 100 km.

The trajectory of object i is modeled with a constant velocity model, x^i = [θ^i, ω^i]^T,

x^i_{k+1} = [1, T; 0, 1] x^i_k + [0; T] v^i_k,   (18a)
y^i_k = [1, 0] x^i_k + e^i_k.   (18b)

The sample interval is T = 1 min, and measurements are available every 10 min. At these times all objects can be observed. The nominal object speed is ω_n = 1.2 · 10^{−5} rad/s, the measurement noise is R^i = cov(e^i_k) = (1°)^2, the common process noise is Q = cov(v_k) = (10^{−4}/60)^2, and the initial uncertainty of each object state is P^i_0 = cov(x^i_0) = 10^6 diag(R^i, Q).

The individual states are stacked to handle M objects. The resulting model can be expressed using Kronecker product notation:

x_{k+1} = (I_M ⊗ [1, T; 0, 1]) x_k + (1_M ⊗ [0; T]) v_k,   (19a)
y_k = (I_M ⊗ [1, 0]) x_k + e_k,   (19b)

where x has the dimension n = 2M.

100 Monte Carlo simulations each were performed for different numbers of tracked objects M and different ensemble sizes N. The result is presented in Fig. 1. The illustrated MSE is computed as the average squared position error of all objects over time. Of the total simulation time of 1 000 min, only the last 250 min are used for evaluation.

The result verifies that the EnKF converges to the KF as N grows. However, as M and thus n increase, the ensemble size N has to grow significantly in order to retain the KF performance for the chosen example.
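The Kronecker-structured model (19) is straightforward to assemble, e.g., with np.kron. The sketch below (our own illustration; function name and shape checks are not from the paper) builds F, G, and H for M objects and confirms that the stacked state dimension is n = 2M.

```python
import numpy as np

def stacked_model(M, T=1.0):
    """Stacked constant-velocity model (19) for M objects with common noise."""
    F1 = np.array([[1.0, T], [0.0, 1.0]])  # per-object dynamics, cf. (18a)
    G1 = np.array([[0.0], [T]])            # noise enters the angular rate
    H1 = np.array([[1.0, 0.0]])            # the angle is measured, cf. (18b)
    F = np.kron(np.eye(M), F1)             # I_M (x) F1: block-diagonal dynamics
    G = np.kron(np.ones((M, 1)), G1)       # 1_M (x) G1: one shared noise input
    H = np.kron(np.eye(M), H1)             # I_M (x) H1: every object observed
    return F, G, H

F, G, H = stacked_model(3)
assert F.shape == (6, 6) and G.shape == (6, 1) and H.shape == (3, 6)
```

The shared column G reflects that a single solar-wind disturbance v_k drives all M objects simultaneously.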


[Figure: log-log plot of the normalized MSE versus the ensemble size N, with one curve each for the EnKF with 2, 5, 10, and 25 objects.]

Fig. 1. Time average of the position MSE as a function of the ensemble size N, normalized by the KF MSE, for different numbers of objects M. The state dimension is n = 2M. The results are based on 100 Monte Carlo simulations.

5. CONCLUDING REMARKS

We have presented the ensemble Kalman filter (EnKF) in a way that clearly shows its origin in the Kalman filter (KF), and makes it easily accessible to the signal processing community. We have discussed how the EnKF can be applied to nonlinear non-Gaussian filtering problems and at the same time highlighted the similarities and differences to sigma point Kalman filters and the particle filter (PF). The simulation experiment shows that the EnKF gives the KF results for linear problems, once the ensemble becomes large enough.

The EnKF is still an unexplored method in the signal processing community. We have provided some initial insights into the algorithm. However, we believe that this is only the beginning and that cross-fertilization with other methods is possible. Understanding the Kalman gain approximation could be one key ingredient in this. Moreover, it is important to characterize the model features that make the EnKF work well in its successful applications.

REFERENCES

[1] J. L. Anderson, "Ensemble Kalman filters for large geophysical applications," IEEE Control Syst. Mag., vol. 29, no. 3, pp. 66–82, Jun. 2009.
[2] G. Evensen, "Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics," Journal of Geophysical Research, vol. 99, no. C5, pp. 10143–10162, 1994.
[3] G. Burgers, P. J. van Leeuwen, and G. Evensen, "Analysis scheme in the ensemble Kalman filter," Monthly Weather Review, vol. 126, no. 6, pp. 1719–1724, Jun. 1998.
[4] G. Evensen, Data Assimilation: The Ensemble Kalman Filter, 2nd ed. Dordrecht; New York: Springer, Aug. 2009.
[5] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Prentice Hall, Jun. 1979.
[6] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, Mar. 1960.
[7] S. Gillijns, O. B. Mendoza, J. Chandrasekar, B. L. R. De Moor, D. S. Bernstein, and A. Ridley, "What is the ensemble Kalman filter and how well does it work?" in Proceedings of the American Control Conference, Jun. 2006.
[8] G. Evensen, "The ensemble Kalman filter for combined state and parameter estimation," IEEE Control Syst. Mag., vol. 29, no. 3, pp. 83–104, Jun. 2009.
[9] R. Furrer and T. Bengtsson, "Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants," Journal of Multivariate Analysis, vol. 98, no. 6, pp. 227–255, Feb. 2007.
[10] M. Frei and H. R. Künsch, "Mixture ensemble Kalman filters," Computational Statistics & Data Analysis, vol. 58, pp. 127–138, Feb. 2013.
[11] M. Frei, "Ensemble Kalman filtering and generalizations," Dissertation, ETH Zürich, 2013, Nr. 21266.
[12] F. Le Gland, V. Monbet, and V. Tran, "Large sample asymptotics for the ensemble Kalman filter," in The Oxford Handbook of Nonlinear Filtering, D. Crisan and B. Rozovskii, Eds. Oxford University Press, 2011, ch. 7.3, pp. 598–634.
[13] M. D. Butala, J. Yun, Y. Chen, R. A. Frazin, and F. Kamalabadi, "Asymptotic convergence of the ensemble Kalman filter," in 15th IEEE International Conference on Image Processing, Oct. 2008, pp. 825–828.
[14] J. Mandel, L. Cobb, and J. D. Beezley, "On the convergence of the ensemble Kalman filter," Applications of Mathematics, vol. 56, no. 6, pp. 533–541, Jan. 2011.
[15] L. N. Trefethen and D. Bau, III, Numerical Linear Algebra. Philadelphia: SIAM, Jun. 1997.
[16] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation. Wiley-Interscience, Jun. 2001.
[17] F. Gustafsson, "Particle filter theory and practice with positioning applications," IEEE Aerosp. Electron. Syst. Mag., vol. 25, no. 7, pp. 53–82, 2010.
[18] A. H. Jazwinski, Stochastic Processes and Filtering Theory. Academic Press, Mar. 1970.
[19] S. J. Julier, J. K. Uhlmann, and H. F. Durrant-Whyte, "A new approach for filtering nonlinear systems," in Proceedings of the American Control Conference, vol. 3, 1995, pp. 1628–1632.
[20] S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proc. IEEE, vol. 92, no. 3, pp. 401–422, Mar. 2004.
[21] K. Ito and K. Xiong, "Gaussian filters for nonlinear filtering problems," IEEE Trans. Autom. Control, vol. 45, no. 5, pp. 910–927, May 2000.
[22] I. Arasaratnam and S. Haykin, "Cubature Kalman filters," IEEE Trans. Autom. Control, vol. 54, no. 6, pp. 1254–1269, Jun. 2009.
[23] A. Doucet, S. J. Godsill, and C. Andrieu, "On sequential Monte Carlo methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
