
Particle filtering with dependent noise processes

Saikat Saha, Member, IEEE, and Fredrik Gustafsson, Fellow, IEEE

Abstract—Modeling physical systems often leads to discrete time state space models with dependent process and measurement noise. For linear Gaussian models, the Kalman filter handles this case, as is well described in the literature. However, for nonlinear or non-Gaussian models, the particle filter as described in the literature provides a general solution only for the case of independent noise. Here, we present an extended theory of the particle filter for dependent noises with the following key contributions: (i) The optimal proposal distribution is derived. (ii) The special case of Gaussian noise in nonlinear models is treated in detail, leading to a concrete algorithm that is as easy to implement as the corresponding Kalman filter. (iii) The marginalized (Rao-Blackwellized) particle filter, handling linear Gaussian substructures in the model in an efficient way, is extended to dependent noise. Finally, (iv) the parameters of a joint Gaussian distribution of the noise processes are estimated jointly with the state in a recursive way.

Index Terms—Bayesian methods, recursive estimation, particle filters, dependent noise, Rao-Blackwellized particle filter

I. INTRODUCTION

The particle filter (PF) provides an arbitrarily good numerical approximation to the online nonlinear filtering problem. More specifically, the PF approximates the posterior distribution p(x_k|Y_k) of the latent state x_k at time k, given the observations Y_k = {y_1, y_2, ..., y_k}, based on the discrete time state space model

x_{k+1} = f(x_k, v_k),   (1a)
y_k = h(x_k, e_k).   (1b)

Here, the model is specified by the state dynamics f(·), the observation dynamics h(·) and the initial state x_0. Note that the latent state is usually assumed to be Markovian, i.e., the conditional density of x_{k+1} given the past states x_{0:k} ≡ (x_0, x_1, ..., x_k) depends only on x_k. The process noise v_k and the measurement noise e_k, k = 1, 2, ..., are both assumed to be independent over time (white noise). The processes v_k and e_k are independent of each other, except for either the pair v_k, e_k (Type I) or the pair v_{k-1}, e_k (Type II), which in this contribution are assumed to be dependent. In this model, we assume that the probability density functions for x_0, v_k and e_k are known.

The theory of the PF as described in survey and tutorial articles [6]–[11] treats the process noise and the measurement noise as independent processes. This is in contrast to the Kalman filter, where the case of correlated noise is a standard modification, see for instance [4], where Type I dependence is assumed throughout the whole book. It is the purpose of this contribution to fill this gap in the PF theory.

The case of dependent noise might be more common in practice than is acknowledged in the literature. More specifically, it occurs whenever a (linear or nonlinear) filter is based on a discrete time model that is derived from a continuous time model (the only exception is when a piecewise constant noise process is assumed). For instance, in a typical target tracking application, a radar is used to track an object. Even if the measurement noise in the radar is completely independent of the motion of the object, the sampling process of the motion dynamics gives an extra noise contribution to the sensor model which is dependent with the object's motion. We will explain this phenomenon in more detail in the last section. Also downsampling the dynamics in a filtering problem introduces such noise dependency (see [23] for details). This dependency also arises in modeling many practical applications of interest, see e.g., [21].

In this article¹, we propose a new class of particle filter algorithms which can take care of the noise dependency. The organization of this article is as follows. In Section II, we start by outlining the somewhat different structures of the dependency and treat the optimal filtering for these different structures in parallel. We then derive the optimal proposal densities to be used in combination with the particle filters in Section III. The two most common approximations (prior and likelihood proposals) are also outlined. The optimal proposals are then specialized to the case of Gaussian dependent noise processes. Moreover, with an affine sensor model, the optimal proposals for Gaussian dependent noise processes are obtained in closed form. We next develop the marginalized particle filter framework with dependent noise processes in Section IV. In Section V, we address a recursive framework for estimating the unknown noise statistics of the dependent Gaussian noises driving a general state space model. Finally, in Section VI, as an illustration, we show how sampling continuous time models can lead to noise dependency.

¹A preliminary version was presented in FUSION 2010 [21].

The authors are with the Department of Electrical Engineering, Division of Automatic Control, Linköping University, Linköping, SE 581-83, Sweden (e-mail: {saha, fredrik}@isy.liu.se).

II. OPTIMAL FILTERING WITH DEPENDENT NOISE PROCESSES

For simplicity, consider the dynamic system (1) with additive measurement noise

x_{k+1} = f(x_k, v_k),   (2a)
y_l = h(x_l) + e_l,   (2b)

where x_k is the latent state at time step k while y_l is the observation at time step l. v_k and e_l are the respective process and measurement noises with their joint density assumed to be known. Furthermore, as in (1), the noise sequences are individually assumed to be independent. The joint posterior p(X_k|Y_k) can be recursively obtained (up to a proportionality constant) as

p(X_k|Y_k) ∝ p(y_k|X_k, Y_{k-1})\, p(x_k|X_{k-1}, Y_{k-1})\, p(X_{k-1}|Y_{k-1}).   (3)

For the standard Markovian model with independent process and measurement noises (i.e., p(v_i, e_j) = p(v_i)\, p(e_j)), we have

p(x_k|X_{k-1}, Y_{k-1}) = p(x_k|x_{k-1}),   (4a)
p(y_k|X_k, Y_{k-1}) = p(y_k|x_k).   (4b)

Here the predictive density p(x_k|x_{k-1}) and the likelihood density p(y_k|x_k) can be characterized in terms of the noise densities p(v_{k-1}) and p(e_k), respectively. This explains why the standard model in the particle filtering literature (see e.g. [6]–[11]) is based on the prediction model and the likelihood function, respectively.

Now we consider the more general case where the process noise and the measurement noise are dependent. To further explain this dependency, consider the graphical representation of the dynamic system given by (2a)–(2b) in Figure 1. Here the process noise v_k is driving the latent state x_k to the next time step x_{k+1}. The measurements corresponding to the adjacent states x_k and x_{k+1} are y_k and y_{k+1}, obtained through the measurement noises e_k and e_{k+1}, respectively. According to the time occurrence of the dependency (adjacent in time), we treat here two dependency structures, where the process noise v_k depends either on (1) the measurement noise e_k (same time step) or (2) e_{k+1} (one step apart). As we will explain here, the two cases are not quite the same. However, to proceed with both cases, the main idea is a suitable decomposition of the joint density of the dependent noises p(v_i, e_j), j = i or (i + 1), into factors of appropriate conditionals.

Figure 1. A graphical representation of the state space model (2a)–(2b).

A. Type I dependency

We first consider the dependency structure where v_{k-1} is dependent on e_{k-1}, k = 1, 2, ..., as shown in Figure 2. Here the sequence of the noise vector (v_{k-1}, e_{k-1})^T over different k is assumed to be independent. We call this Type I dependency. This is pretty common in the engineering literature, see e.g., [5]. For this dependency, we have

p(x_k|X_{k-1}, Y_{k-1}) = p(x_k|x_{k-1}, y_{k-1}),   (5a)
p(y_k|X_k, Y_{k-1}) = p(y_k|x_k).   (5b)

Figure 2. Type I dependency between the process and measurement noise processes.

Now, we use the decomposition

p(v_{k-1}, e_{k-1}) = p(v_{k-1}|e_{k-1})\, p(e_{k-1}).   (6)

Since knowing x_{k-1} and y_{k-1} together would provide complete information on e_{k-1}, p(x_k|x_{k-1}, y_{k-1}) and p(y_k|x_k) can be characterized in terms of p(v_{k-1}|e_{k-1}) and p(e_k), respectively. Clearly, for this case, the hidden states and the observations jointly form a Markov chain [14]. This joint Markov chain model was considered in [13] for the particle filter with correlated Gaussian noises.

B. Type II dependency

We next consider the dependency structure where v_k and e_{k+1} are dependent on each other for k = 0, 1, ..., while the sequence of the noise vector (v_k, e_{k+1})^T over different k is assumed to be independent. We call this Type II dependency. This is shown in Figure 3. This dependency structure has been considered, e.g., in [12] for the treatment of the Kalman filter. It follows that

p(x_k|X_{k-1}, Y_{k-1}) = p(x_k|x_{k-1}),   (7a)
p(y_k|X_k, Y_{k-1}) = p(y_k|x_k, x_{k-1}).   (7b)

Figure 3. Type II dependency between the process and measurement noise processes.

For this case, we use the decomposition

p(v_{k-1}, e_k) = p(e_k|v_{k-1})\, p(v_{k-1}).   (8)

Like the previous case, knowing x_k and x_{k-1} together would provide complete information on v_{k-1}. As a result, p(x_k|x_{k-1}) and p(y_k|x_k, x_{k-1}) can be characterized in terms of p(v_{k-1}) and p(e_k|v_{k-1}), respectively. Note that this dependency treatment does not require any so-called joint Markovian assumption.

III. OPTIMAL PROPOSAL FOR DEPENDENT NOISES

In particle filtering, the posterior is approximated in the form of (weighted) random samples, also known as particles. These particles are generated from a different distribution, called the importance distribution (or the proposal), which is ideally supposed to be as close as possible to the (unknown) posterior. Without going into further details of the PF, we here provide a generic PF algorithm that will be generalized to dependent noise in our subsequent developments.

Algorithm 1: [Generic particle filter]
Recursively over time k = 0, 1, 2, ..., for i = 1, ..., N, where N is the total number of particles,
• sample x_k^{(i)} ∼ q(x_k|X_{k-1}^{(i)}, Y_k) and set X_k^{(i)} := {X_{k-1}^{(i)}, x_k^{(i)}},
• evaluate the corresponding importance weights w_k^{(i)} according to

w_k^{(i)} ∝ \tilde{w}_{k-1}^{(i)} \frac{p(y_k|X_k^{(i)}, Y_{k-1})\, p(x_k^{(i)}|X_{k-1}^{(i)}, Y_{k-1})}{q(x_k^{(i)}|X_{k-1}^{(i)}, Y_k)},

• normalize the importance weights \tilde{w}_k^{(i)} = w_k^{(i)} / \sum_{i=1}^{N} w_k^{(i)},
• resample the trajectories {x_k^{(i)}}_{i=1}^{N} with probabilities {\tilde{w}_k^{(i)}}_{i=1}^{N} and set \tilde{w}_k^{(i)} = 1/N. Resampling can be done at every time step (SIR-PF) or when sample depletion is indicated (SIS-PF).

The performance of the PF depends critically on the selection of the proposal q(x_k|·). In this section, we derive the optimal proposal when the noises are dependent. Here the proposal is optimal in the sense of minimum conditional variance of the importance weights [3].
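To make the recursion of Algorithm 1 concrete, the following minimal Python/NumPy sketch (ours, not part of the original algorithm statement) performs one step of the generic particle filter. The callables propose, log_q, log_f and log_g are placeholders for the proposal and the model densities p(x_k|X_{k-1}, Y_{k-1}) and p(y_k|X_k, Y_{k-1}); how these are chosen under dependent noise is the subject of the rest of this section.

```python
import numpy as np

def pf_step(particles, log_w_prev, y, y_prev, propose, log_q, log_f, log_g, rng):
    """One recursion of Algorithm 1 (generic particle filter), hypothetical interface.

    propose(x_prev, y, y_prev)    draws x_k ~ q(x_k | x_{k-1}, y_k, y_{k-1})
    log_q(x, x_prev, y, y_prev)   = log q(x_k | x_{k-1}, y_k, y_{k-1})
    log_f(x, x_prev, y_prev)      = log p(x_k | X_{k-1}, Y_{k-1})  (uses y_{k-1} for Type I)
    log_g(y, x, x_prev)           = log p(y_k | X_k, Y_{k-1})      (uses x_{k-1} for Type II)
    """
    N = len(particles)
    new = np.array([propose(xp, y, y_prev) for xp in particles])

    # importance weight update of Algorithm 1, in the log domain for stability
    log_w = log_w_prev + np.array(
        [log_g(y, xn, xp) + log_f(xn, xp, y_prev) - log_q(xn, xp, y, y_prev)
         for xn, xp in zip(new, particles)])
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()

    # SIR variant: resample at every step and reset to equal weights
    idx = rng.choice(N, size=N, p=w)
    return new[idx], np.full(N, -np.log(N))
```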
A. Optimal Proposal for Type I dependency

In the engineering literature, the following state space model, and variants of it, is commonly used:

x_{k+1} = f_k(x_k) + G_k v_k,   (9a)
y_k = h_k(x_k) + e_k,   (9b)

where the process noise sequence v_k and the measurement noise sequence e_k, k = 1, 2, ..., are individually assumed to be independent, while v_k and e_k are dependent. This corresponds to Type I dependency as defined in II-A. This case of dependent noise can now be phrased as p(y_k, x_{k+1}|x_k), where

p(y_k, x_{k+1}|x_k) ≠ p(x_{k+1}|x_k)\, p(y_k|x_k),   (10)

that is, y_k and x_{k+1} are not independent given x_k.

The proposal distribution has the functional form q(x_k|X_{k-1}, Y_k). In the standard PF, the Markovian property and the independence of Y_{k-1} and x_k are used to get [5]

q(x_k|X_{k-1}, Y_k) = q(x_k|x_{k-1}, y_k).   (11a)

For the case (10), there is a dependency between y_{k-1} and x_k, so we get

q(x_k|X_{k-1}, Y_k) = q(x_k|x_{k-1}, y_k, y_{k-1}).   (11b)

Based on this, we derive the following theorem for the optimal proposal function.

Theorem 1: [Optimal proposal for Type I dependent noise] Here the optimal proposal function is given by

q(x_k|x_{k-1}, y_k, y_{k-1}) = \frac{p(x_k|y_{k-1}, x_{k-1})\, p(y_k|x_k)}{p(y_k|y_{k-1}, x_{k-1})}.   (12)

Proof: The optimal proposal is given by the posterior distribution of the same functional form, which can be rewritten using Bayes' law as

q(x_k|x_{k-1}, y_k, y_{k-1})   (13a)
= p(x_k|x_{k-1}, y_k, y_{k-1})   (13b)
= \frac{p(x_k, y_k|y_{k-1}, x_{k-1})}{p(y_k|y_{k-1}, x_{k-1})}   (13c)
= \frac{p(y_k|x_k)\, p(x_k|y_{k-1}, x_{k-1})}{p(y_k|y_{k-1}, x_{k-1})}.   (13d)

This concludes the proof. □

The optimal proposal as described in (12) has been used as a special case for estimating the stochastic volatility in [15]. This optimal proposal should be compared to the standard one given by

q(x_k|x_{k-1}, y_k) = \frac{p(x_k|x_{k-1})\, p(y_k|x_k)}{p(y_k|x_{k-1})}.   (14)

One can, just as for the standard PF, define two extreme cases of sub-optimal proposal distributions:

Prior:      q(x_k|x_{k-1}, y_{k-1}) ∝ p(x_k|y_{k-1}, x_{k-1}),   (15a)
Likelihood: q(x_k|y_k) ∝ p(y_k|x_k).   (15b)

The first proposal corresponds to the model (10), while the second proposal is obtained directly from the observation model (9b).

B. Gaussian Noise Case for Type I dependency

To get instructive and explicit expressions, the Gaussian case, where

x_0 ∼ N(\hat{x}_{1|0}, P_{1|0}),   (16a)
\begin{pmatrix} v_k \\ e_k \end{pmatrix} ∼ N\left(0, \begin{pmatrix} Q_k & S_k \\ S_k^T & R_k \end{pmatrix}\right),   (16b)

is studied in detail. The standard PF applies for the case S_k = 0 only. This Gaussian noise model can also be written as

p\left( \begin{pmatrix} x_{k+1} \\ y_k \end{pmatrix} \Big| x_k \right) = N\left( \begin{pmatrix} f(x_k) \\ h(x_k) \end{pmatrix}, \begin{pmatrix} G_k Q_k G_k^T & G_k S_k \\ S_k^T G_k^T & R_k \end{pmatrix} \right).   (17)

Note that this noise model is the standard representation in the classic text book [4] on Kalman filters (KF). The main result is given in Theorem 2.

Theorem 2: [Optimal proposal for Type I Gaussian dependent noise] For the model specified by (17), the optimal proposal function is given by

q(x_k|x_{k-1}, y_k, y_{k-1}) ∝ N\big( f(x_{k-1}) + G_{k-1} S_{k-1} R_{k-1}^{-1} (y_{k-1} - h(x_{k-1})),\; G_{k-1} (Q_{k-1} - S_{k-1} R_{k-1}^{-1} S_{k-1}^T) G_{k-1}^T \big) × N(h(x_k), R_k).   (18)

Proof: The result follows from (12) by studying the two factors in the numerator. The second factor p(y_k|x_k) is obtained from the observation model (9b). The first factor p(x_k|x_{k-1}, y_{k-1}) follows by using Lemma 1 below. □

Lemma 1: [Conditional Gaussian Distributions] Suppose the vectors X and Y are jointly Gaussian distributed as

\begin{pmatrix} X \\ Y \end{pmatrix} ∼ N\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} P_{xx} & P_{xy} \\ P_{xy}^T & P_{yy} \end{pmatrix} \right) = N(\mu, P).   (19)

Then, the conditional distribution for X, given the observed Y = y, is Gaussian distributed:

(X|Y = y) ∼ N\big( \mu_x + P_{xy} P_{yy}^{-1} (y - \mu_y),\; P_{xx} - P_{xy} P_{yy}^{-1} P_{yx} \big).   (20)

In the above lemma, let X = x_k|x_{k-1} and Y = y_{k-1}|x_{k-1}, and use the joint distribution of X and Y from (17), time-shifted one step; then

p(x_k|x_{k-1}, y_{k-1}) = N\big( f(x_{k-1}) + G_{k-1} S_{k-1} R_{k-1}^{-1} (y_{k-1} - h(x_{k-1})),\; G_{k-1} (Q_{k-1} - S_{k-1} R_{k-1}^{-1} S_{k-1}^T) G_{k-1}^T \big).   (21)

The prior proposal in (15a) becomes more explicit as summarized in Corollary 1.

Corollary 1: [Prior proposal for Type I Gaussian dependent noise] For the model specified by (17), the prior proposal function is given by

q(x_k|x_{k-1}, y_{k-1}) = N\big( f(x_{k-1}) + G_{k-1} S_{k-1} R_{k-1}^{-1} (y_{k-1} - h(x_{k-1})),\; G_{k-1} (Q_{k-1} - S_{k-1} R_{k-1}^{-1} S_{k-1}^T) G_{k-1}^T \big).   (22)

Proof: In this case, the proposal is p(x_k|x_{k-1}, y_{k-1}), which is directly given by Lemma 1, as shown above. □

It is thus straightforward to generate samples from this proposal using a Gaussian random number generator. The standard SIR PF is obtained by letting S_{k-1} = 0 in (22).

The optimal proposal in (18) cannot be obtained analytically in general. However, one important exception is an affine sensor model, for which the optimal proposal is Gaussian. This is shown below.

Corollary 2: [Optimal Gaussian proposal for affine sensor model with Type I dependency] When the sensor model in (9b) is affine, the optimal proposal in (18) is Gaussian, i.e., q(x_k|y_k, x_{k-1}, y_{k-1}) = N(\bar{\mu}_k, \bar{\Sigma}_k) (refer to (27)).

Proof: From Corollary 1, we have p(x_k|x_{k-1}, y_{k-1}) = N(\mu_k^*, \Sigma_k^*), where

\mu_k^* := f(x_{k-1}) + G_{k-1} S_{k-1} R_{k-1}^{-1} (y_{k-1} - h(x_{k-1})),   (23a)
\Sigma_k^* := G_{k-1} (Q_{k-1} - S_{k-1} R_{k-1}^{-1} S_{k-1}^T) G_{k-1}^T.   (23b)

When the sensor model is affine (i.e., h(x_k) = A_k + C_k x_k), we can write

\begin{pmatrix} x_k \\ y_k \end{pmatrix} = \begin{pmatrix} I & 0 \\ C_k & I \end{pmatrix} \begin{pmatrix} x_k \\ e_k \end{pmatrix} + \begin{pmatrix} 0 \\ A_k \end{pmatrix}.   (24)

Conditioned on (x_{k-1}, y_{k-1}), we have

\begin{pmatrix} x_k \\ e_k \end{pmatrix} \Big| \begin{pmatrix} x_{k-1} \\ y_{k-1} \end{pmatrix} ∼ N\left( \begin{pmatrix} \mu_k^* \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_k^* & 0 \\ 0 & R_k \end{pmatrix} \right),   (25)

and by (24), we obtain

p\left( \begin{pmatrix} x_k \\ y_k \end{pmatrix} \Big| \begin{pmatrix} x_{k-1} \\ y_{k-1} \end{pmatrix} \right) = N\left( \begin{pmatrix} \mu_k^* \\ C_k \mu_k^* + A_k \end{pmatrix}, \begin{pmatrix} \Sigma_k^* & \Sigma_k^* C_k^T \\ C_k \Sigma_k^* & C_k \Sigma_k^* C_k^T + R_k \end{pmatrix} \right).   (26)

Now using Lemma 1 in (26), we can show that p(x_k|y_k, x_{k-1}, y_{k-1}) = N(\bar{\mu}_k, \bar{\Sigma}_k), where

\bar{\mu}_k = \mu_k^* + \Sigma_k^* C_k^T (C_k \Sigma_k^* C_k^T + R_k)^{-1} (y_k - C_k \mu_k^* - A_k),   (27a)
\bar{\Sigma}_k = \Sigma_k^* - \Sigma_k^* C_k^T (C_k \Sigma_k^* C_k^T + R_k)^{-1} C_k \Sigma_k^*.   (27b)

Now, defining K_k := \Sigma_k^* C_k^T (C_k \Sigma_k^* C_k^T + R_k)^{-1}, we can rewrite (27a)–(27b) as

\bar{\mu}_k = K_k (y_k - A_k) + (I - K_k C_k) \mu_k^*,   (27c)
\bar{\Sigma}_k = (I - K_k C_k) \Sigma_k^*.   (27d)

□
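As an illustration of Corollary 2, the following sketch (ours; NumPy assumed, noise statistics taken as time-invariant for brevity) computes the moments (23) and (27c)–(27d) and draws one sample from the optimal Type I proposal for an affine sensor model h(x) = A + Cx.

```python
import numpy as np

def type1_optimal_proposal(x_prev, y_prev, y, f, G, Q, S, R, A, C, rng):
    """Sample x_k ~ q(x_k | x_{k-1}, y_k, y_{k-1}) for h(x) = A + C x under
    Type I Gaussian dependent noise (Corollary 2); time-invariant G, Q, S, R."""
    n = len(x_prev)
    h_prev = A + C @ x_prev

    # prior proposal moments (23a)-(23b), i.e. p(x_k | x_{k-1}, y_{k-1})
    SRinv = S @ np.linalg.inv(R)
    mu_star = f(x_prev) + G @ SRinv @ (y_prev - h_prev)
    Sigma_star = G @ (Q - SRinv @ S.T) @ G.T

    # Kalman-like correction (27c)-(27d) using the current measurement y_k
    Sy = C @ Sigma_star @ C.T + R
    K = Sigma_star @ C.T @ np.linalg.inv(Sy)
    mu_bar = K @ (y - A) + (np.eye(n) - K @ C) @ mu_star
    Sigma_bar = (np.eye(n) - K @ C) @ Sigma_star

    return rng.multivariate_normal(mu_bar, Sigma_bar), mu_bar, Sigma_bar
```

When this optimal proposal is used, the importance weight update in Algorithm 1 reduces to evaluating p(y_k|y_{k-1}, x_{k-1}), which for the affine case is the Gaussian N(C_k \mu_k^* + A_k, C_k \Sigma_k^* C_k^T + R_k) appearing in (26).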

C. Optimal Proposal for Type II dependency

We consider here the following model:

x_k = f_k(x_{k-1}) + G_k v_{k-1},   (28a)
y_k = h_k(x_k) + e_k,   (28b)

where the process noise sequence v_k and the measurement noise sequence e_k, k = 1, 2, ..., are individually assumed to be independent, while v_{k-1} and e_k are dependent for each k = 1, 2, ... . This corresponds to Type II dependency as defined in II-B. This noise dependency implies that

p(x_k, y_k|x_{k-1}) = p(x_k|x_{k-1})\, p(y_k|x_k, x_{k-1}).   (29)

Theorem 3: [Optimal proposal for Type II dependent noise] For the model specified by (28a)–(28b), the optimal proposal function is given by

q(x_k|X_{k-1}, Y_k) = p(x_k|x_{k-1}, y_k)   (30)
∝ p(x_k|x_{k-1})\, p(y_k|x_k, x_{k-1}).   (31)

Proof:

p(x_k|x_{k-1}, y_k) = \frac{p(y_k, x_k|x_{k-1})}{p(y_k|x_{k-1})} = \frac{p(y_k|x_k, x_{k-1})\, p(x_k|x_{k-1})}{p(y_k|x_{k-1})} ∝ p(x_k|x_{k-1})\, p(y_k|x_k, x_{k-1}). □

Just like the standard PF, one can define the following two sub-optimal proposal distributions:

Prior:      q(x_k|x_{k-1}) ∝ p(x_k|x_{k-1}),   (32a)
Likelihood: q(x_k|y_k, x_{k-1}) ∝ p(y_k|x_k, x_{k-1}).   (32b)

D. Gaussian Noise Case for Type II dependency

When the noises v_{k-1} and e_k are jointly Gaussian as

\begin{pmatrix} v_{k-1} \\ e_k \end{pmatrix} ∼ N\left(0, \begin{pmatrix} Q_{k-1} & S_k \\ S_k^T & R_k \end{pmatrix}\right),   (33)

the equivalent probabilistic description of the state space model (28a)–(28b) can be given by

p\left( \begin{pmatrix} x_k \\ y_k \end{pmatrix} \Big| x_{k-1} \right) = N\left( \begin{pmatrix} f_k(x_{k-1}) \\ h_k(x_k) \end{pmatrix}, \begin{pmatrix} G_k Q_{k-1} G_k^T & G_k S_k \\ S_k^T G_k^T & R_k \end{pmatrix} \right).   (34)

Theorem 4: [Optimal proposal for Type II Gaussian dependent noise] For the model specified by (34), the optimal proposal function is given by

q(x_k|x_{k-1}, y_k) ∝ N\big( f(x_{k-1}),\; G_k Q_{k-1} G_k^T \big) × N\big( h(x_k) + S_k^T Q_{k-1}^{-1} G_k^{\dagger} (x_k - f(x_{k-1})),\; R_k - S_k^T Q_{k-1}^{-1} S_k \big).   (35)

Proof: The result follows from (31) by studying the two factors. The first factor is obtained directly from the process model (28a). The second factor p(y_k|x_{k-1}, x_k) follows by using Lemma 1 in (34). □

Corollary 3: [Prior proposal for Type II Gaussian dependent noise] For the model specified by (34), the prior proposal function is given by

q(x_k|x_{k-1}) ∝ N\big( f(x_{k-1}),\; G_k Q_{k-1} G_k^T \big).   (36)

Proof: In this case, the proposal is p(x_k|x_{k-1}), which is directly obtained from the process model (28a). □

Corollary 4: [Optimal Gaussian proposal for affine sensor model with Type II dependency] When the sensor model in (28b) is affine, the optimal proposal in (35) is Gaussian, given by q(x_k|y_k, x_{k-1}) = N(\tilde{\mu}_k, \tilde{\Sigma}_k) (refer to (37)).

Proof: When the sensor model is affine (i.e., h(x_k) = A_k + C_k x_k), the state space model (28a)–(28b) can be written as

\begin{pmatrix} x_k \\ y_k \end{pmatrix} = \begin{pmatrix} f_k(x_{k-1}) \\ A_k + C_k f_k(x_{k-1}) \end{pmatrix} + \begin{pmatrix} G_k & 0 \\ C_k G_k & I \end{pmatrix} \begin{pmatrix} v_{k-1} \\ e_k \end{pmatrix}.   (37a)

Now using (33), one can show that

p\left( \begin{pmatrix} x_k \\ y_k \end{pmatrix} \Big| x_{k-1} \right) ∼ N(\bar{\mu}_k, \bar{\Sigma}_k),   (37b)

where

\bar{\mu}_k = \begin{pmatrix} f(x_{k-1}) \\ A_k + C_k f(x_{k-1}) \end{pmatrix},   (37c)

\bar{\Sigma}_k = \begin{pmatrix} G_k Q_{k-1} G_k^T & G_k Q_{k-1} G_k^T C_k^T + G_k S_k \\ C_k G_k Q_{k-1} G_k^T + S_k^T G_k^T & C_k G_k (Q_{k-1} G_k^T C_k^T + S_k) + S_k^T G_k^T C_k^T + R_k \end{pmatrix}   (37d)

=: \begin{pmatrix} \bar{P}_{xx,k} & \bar{P}_{xy,k} \\ \bar{P}_{xy,k}^T & \bar{P}_{yy,k} \end{pmatrix}.   (37e)

Using Lemma 1 in (37b), we obtain

q(x_k|y_k, x_{k-1}) = p(x_k|y_k, x_{k-1}) = N(\tilde{\mu}_k, \tilde{\Sigma}_k),   (37f)

where

\tilde{\mu}_k = f(x_{k-1}) + \bar{P}_{xy,k} \bar{P}_{yy,k}^{-1} (y_k - A_k - C_k f(x_{k-1})),   (37g)
\tilde{\Sigma}_k = \bar{P}_{xx,k} - \bar{P}_{xy,k} \bar{P}_{yy,k}^{-1} \bar{P}_{xy,k}^T.   (37h)

□

To conclude this section, a summary of the different proposals for the dependent noise cases is presented in Table I.

Table I. SUMMARY OF THE DIFFERENT PROPOSALS WITH DEPENDENT NOISE CASES

                                                                 Dependent noise: Type I   Dependent noise: Type II
Optimal proposal                                                 Theorem 1                 Theorem 3
Optimal proposal: Gaussian noise case                            Theorem 2                 Theorem 4
Prior proposal: Gaussian noise case                              Corollary 1               Corollary 3
Optimal proposal: Gaussian noise case with affine sensor model   Corollary 2               Corollary 4

IV. MPF FOR MIXED LINEAR/NONLINEAR STATE SPACE MODELS WITH DEPENDENT GAUSSIAN NOISES

The idea behind the marginalized particle filter (MPF) is as follows. If there is an analytically tractable substructure within the general state space model, the state estimation problem can be divided into sub-parts: given any observation, the non-analytical part is estimated using the PF, and the tractable substructure can be estimated analytically conditioned on the PF output. This method is also referred to as the Rao-Blackwellized particle filter. There are several advantages of using the MPF: besides obtaining an improved estimate from the Rao-Blackwellization, it helps us to keep the state dimension small enough for the PF to be feasible.

A widely used MPF for state estimation is the one containing a conditionally linear-Gaussian substructure, for which the optimal estimate can be obtained analytically using the Kalman filter (see e.g., [17], [3], [19]). However, in all such available algorithms for the MPF, the measurement noise vector is assumed to be independent of the process noise vector. Here we relax this assumption and extend the available results to the case of dependent noise processes². For the derivation, we mainly follow [19].

²A multivariate Gaussian noise is completely characterized by its second order statistics. Hence dependent Gaussian noise implies correlated Gaussian noise.

A. Mixed linear/nonlinear state space model

A rather general model containing a linear-Gaussian substructure is given as [5]

x_{k+1}^l = f_k^l(x_k^p) + A_k^l(x_k^p) x_k^l + G_k^l(x_k^p) w_k^l,   (38a)
x_{k+1}^p = f_k^p(x_k^p) + A_k^p(x_k^p) x_k^l + G_k^p(x_k^p) w_k^p,   (38b)
y_k = h_k(x_k^p) + C_k(x_k^p) x_k^l + e_k,   k = 1, 2, ...   (38c)

where x_k^l denotes the state variable with conditionally linear dynamics, x_k^p denotes the nonlinear state variable and y_k is the measurement at discrete time step k. w_k^l and w_k^p are the process noises driving x_{k+1}^l and x_{k+1}^p, respectively, and e_k is the measurement noise. The noise vector at time step k is assumed to be jointly Gaussian with zero mean as

\begin{pmatrix} w_k^l \\ w_k^p \\ e_k \end{pmatrix} ∼ N(0, \Sigma_k)   (39)

with covariance matrix

\Sigma_k = \begin{pmatrix} \Sigma_k^{ll} & \Sigma_k^{lp} & \Sigma_k^{ly} \\ (\Sigma_k^{lp})^T & \Sigma_k^{pp} & \Sigma_k^{py} \\ (\Sigma_k^{ly})^T & (\Sigma_k^{py})^T & \Sigma_k^{yy} \end{pmatrix}.   (40)

The sequence of this noise vector over different k is assumed to be independent. Furthermore, x_0^l is assumed to be independent of the noises and distributed according to a Gaussian as

x_0^l ∼ N(\bar{x}_0, \bar{P}_0).   (41)

The density of x_0^p is arbitrary, but it is assumed to be known. We further assume that the dynamic model satisfies a favorable mixing property as in [18]. For notational brevity, the dependence on x_k^p in equations (38a)–(38c) is suppressed from now on.

B. Gram-Schmidt orthogonalization for dependent noise processes

We define here two new Gaussian noise processes, \bar{w}_k^p and \bar{w}_k^l, which are independent of each other and also individually independent of e_k, using the standard Gram-Schmidt procedure ([1], [4]) as follows. Define

\bar{w}_k^p := w_k^p - \Sigma_k^{py} (\Sigma_k^{yy})^{-1} e_k,   (42a)

such that

\bar{w}_k^p ∼ N(0, \Lambda_k^{\bar{p}}),   (42b)
\Lambda_k^{\bar{p}} = \mathrm{Cov}(\bar{w}_k^p) = \Sigma_k^{pp} - \Sigma_k^{py} (\Sigma_k^{yy})^{-1} (\Sigma_k^{py})^T.   (42c)

Now define

\Lambda_k^{l\bar{p}} = E\big(w_k^l (\bar{w}_k^p)^T\big) = \Sigma_k^{lp} - \Sigma_k^{ly} (\Sigma_k^{yy})^{-1} (\Sigma_k^{py})^T.   (43)

Then

\bar{w}_k^l := w_k^l - \Sigma_k^{ly} (\Sigma_k^{yy})^{-1} e_k - \Lambda_k^{l\bar{p}} (\Lambda_k^{\bar{p}})^{-1} \bar{w}_k^p,   (44a)

leading to

\bar{w}_k^l ∼ N(0, \bar{\Lambda}_k^l)   (44b)

with

\bar{\Lambda}_k^l = \mathrm{Cov}(\bar{w}_k^l) = \Sigma_k^{ll} - \Sigma_k^{ly} (\Sigma_k^{yy})^{-1} (\Sigma_k^{ly})^T - \Lambda_k^{l\bar{p}} (\Lambda_k^{\bar{p}})^{-1} (\Lambda_k^{l\bar{p}})^T.   (44c)

For notational convenience, we define

\Gamma_k^{py} = \Sigma_k^{py} (\Sigma_k^{yy})^{-1},   (45a)
\Gamma_k^{ly} = \Sigma_k^{ly} (\Sigma_k^{yy})^{-1},   (45b)
\Gamma_k^{lp} = \Lambda_k^{l\bar{p}} (\Lambda_k^{\bar{p}})^{-1}.   (45c)

Now using (42a) and (44a), the model described by (38a)–(38c) can be rewritten as

x_{k+1}^l = f_k^l + A_k^l x_k^l + G_k^l [\bar{w}_k^l + \Gamma_k^{ly} e_k + \Gamma_k^{lp} \bar{w}_k^p],   (46a)
x_{k+1}^p = f_k^p + A_k^p x_k^l + G_k^p [\bar{w}_k^p + \Gamma_k^{py} e_k],   (46b)
y_k = h_k + C_k x_k^l + e_k.   (46c)

Defining the pseudo-measurements obtained from the residuals as

Z_k^{(1)} = (x_{k+1}^p - f_k^p),   (47a)
Z_k^{(2)} = (y_k - h_k),   (47b)

it then follows that

Z_k^{(2)} = C_k x_k^l + e_k,   (48a)
Z_k^{(1)} = A_k^p x_k^l + G_k^p [\bar{w}_k^p + \Gamma_k^{py} \{ Z_k^{(2)} - C_k x_k^l \}].   (48b)

Defining

\bar{A}_k^p = [A_k^p - G_k^p \Gamma_k^{py} C_k],   (49)

equation (48b) can now be written as

Z_k^{(1)} = \bar{A}_k^p x_k^l + G_k^p \Gamma_k^{py} Z_k^{(2)} + G_k^p \bar{w}_k^p.   (50)

Similarly, using equations (48a) and (50) in equation (46a), and assuming G_k^p to be invertible, we have

x_{k+1}^l = f_k^l + A_k^l x_k^l + G_k^l [\bar{w}_k^l + \Gamma_k^{ly} \{Z_k^{(2)} - C_k x_k^l\} + \Gamma_k^{lp} (G_k^p)^{-1} \{Z_k^{(1)} - \bar{A}_k^p x_k^l - G_k^p \Gamma_k^{py} Z_k^{(2)}\}]
= f_k^l + [A_k^l - G_k^l \Gamma_k^{ly} C_k - G_k^l \Gamma_k^{lp} (G_k^p)^{-1} \bar{A}_k^p] x_k^l + G_k^l [\Gamma_k^{ly} - \Gamma_k^{lp} (G_k^p)^{-1} (G_k^p) \Gamma_k^{py}] Z_k^{(2)} + [G_k^l \Gamma_k^{lp} (G_k^p)^{-1}] Z_k^{(1)} + G_k^l \bar{w}_k^l.   (51)

Define

\bar{A}_k^l = [A_k^l - G_k^l \Gamma_k^{ly} C_k - G_k^l \Gamma_k^{lp} (G_k^p)^{-1} \bar{A}_k^p],   (52)
\bar{f}_k^l = f_k^l + G_k^l [\Gamma_k^{ly} - \Gamma_k^{lp} \Gamma_k^{py}] Z_k^{(2)} + [G_k^l \Gamma_k^{lp} (G_k^p)^{-1}] Z_k^{(1)},   (53)

so that equation (51) can be written as

x_{k+1}^l = \bar{f}_k^l + \bar{A}_k^l x_k^l + G_k^l \bar{w}_k^l.   (54)

C. Revised Model Definition

The state space model obtained using equations (54), (50) and (48a) is linear, driven by independent zero mean Gaussian noises, as

x_{k+1}^l = \bar{f}_k^l + \bar{A}_k^l x_k^l + G_k^l \bar{w}_k^l,   (55a)
Z_k^{(1)} = \bar{A}_k^p x_k^l + G_k^p \Gamma_k^{py} Z_k^{(2)} + G_k^p \bar{w}_k^p,   (55b)
Z_k^{(2)} = C_k x_k^l + e_k,   (55c)

with

Z_k^{(1)} = (x_{k+1}^p - f_k^p),   (55d)
Z_k^{(2)} = (y_k - h_k),   (55e)

where

\mathrm{Cov}\begin{pmatrix} \bar{w}_k^l \\ \bar{w}_k^p \\ e_k \end{pmatrix} = \begin{pmatrix} \bar{\Lambda}_k^l & 0 & 0 \\ 0 & \Lambda_k^{\bar{p}} & 0 \\ 0 & 0 & \Sigma_k^{yy} \end{pmatrix}.   (56)

Here \bar{f}_k^l, \bar{A}_k^l and \bar{A}_k^p are obtained using equations (53), (52) and (49), respectively. Now one can apply the standard results (e.g. in [19]) for the MPF utilizing a linear-Gaussian substructure on this revised model. A summary of the main steps is given in Table II and the details are outlined in the appendix. An application of this framework to the terrain navigation problem is presented in [24].
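The decorrelation step (42)–(45) is purely algebraic; the following sketch (ours, NumPy assumed) computes the covariances of the transformed noises and the Γ matrices from the blocks of Σ_k in (40).

```python
import numpy as np

def gram_schmidt_decorrelate(S_ll, S_lp, S_ly, S_pp, S_py, S_yy):
    """Gram-Schmidt orthogonalization of (w^l, w^p, e) as in (42)-(45).

    Inputs are the blocks of the joint covariance (40); returns the
    covariances of the decorrelated noises and the Gamma matrices."""
    S_yy_inv = np.linalg.inv(S_yy)

    # (42c): covariance of w_bar^p = w^p - S_py S_yy^{-1} e
    Lam_pbar = S_pp - S_py @ S_yy_inv @ S_py.T
    # (43): cross covariance between w^l and w_bar^p
    Lam_lpbar = S_lp - S_ly @ S_yy_inv @ S_py.T
    # (44c): covariance of w_bar^l
    Lam_lbar = (S_ll - S_ly @ S_yy_inv @ S_ly.T
                - Lam_lpbar @ np.linalg.inv(Lam_pbar) @ Lam_lpbar.T)

    # (45a)-(45c): gain matrices used in the rewritten model (46)
    Gam_py = S_py @ S_yy_inv
    Gam_ly = S_ly @ S_yy_inv
    Gam_lp = Lam_lpbar @ np.linalg.inv(Lam_pbar)
    return Lam_lbar, Lam_pbar, Gam_ly, Gam_py, Gam_lp
```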

Table II. SUMMARY OF THE INFORMATION STEPS OF THE MPF UTILIZING A LINEAR-GAUSSIAN SUBSTRUCTURE

Prior:       p(x_k^l, X_k^p | Y_k) = p(x_k^l | X_k^p, Y_k)\, p(X_k^p | Y_k)
PF: TU:      p(X_k^p | Y_k) ⇒ p(X_{k+1}^p | Y_k)
KF: Dyn MU:  p(x_k^l | X_k^p, Y_k) ⇒ p(x_k^l | X_{k+1}^p, Y_k)
KF: TU:      p(x_{k+1}^l | X_{k+1}^p, Y_k) = \int p(x_{k+1}^l | X_{k+1}^p, x_k^l, Y_k)\, p(x_k^l | X_{k+1}^p, Y_k)\, dx_k^l
             (y_{k+1} is available now)
PF: MU:      p(X_{k+1}^p | Y_k) ⇒ p(X_{k+1}^p | Y_{k+1})
KF: MU:      p(x_{k+1}^l | X_{k+1}^p, Y_k) ⇒ p(x_{k+1}^l | X_{k+1}^p, Y_{k+1})
Posterior:   p(x_{k+1}^l, X_{k+1}^p | Y_{k+1}) = p(x_{k+1}^l | X_{k+1}^p, Y_{k+1})\, p(X_{k+1}^p | Y_{k+1})
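The steps of Table II can be organized, per particle, as in the following structural sketch (ours; NumPy assumed), which follows the appendix equations (89), (91), (93), (95) and (97). The function names and the per-particle interface are ours.

```python
import numpy as np
from numpy.linalg import inv, slogdet

def pf_time_update(xl_hat, P, f_p, A_p_bar, G_p, Gam_py, Lam_pbar, Z2, rng):
    """PF: TU. Draw x^p_{k+1} from the prior transition density (93)."""
    mu_tr = f_p + A_p_bar @ xl_hat + G_p @ Gam_py @ Z2
    Sig_tr = A_p_bar @ P @ A_p_bar.T + G_p @ Lam_pbar @ G_p.T
    return rng.multivariate_normal(mu_tr, Sig_tr)

def kf_dyn_measurement_update(xl_hat, P, Z1, Z2, A_p_bar, G_p, Gam_py, Lam_pbar):
    """KF: Dyn MU (89); Z1 = x^p_{k+1} - f^p_k acts as an extra measurement."""
    N = A_p_bar @ P @ A_p_bar.T + G_p @ Lam_pbar @ G_p.T
    L = P @ A_p_bar.T @ inv(N)
    xl_star = xl_hat + L @ (Z1 - A_p_bar @ xl_hat - G_p @ Gam_py @ Z2)
    return xl_star, P - L @ N @ L.T

def kf_time_update(xl_star, P_star, f_l_bar, A_l_bar, G_l, Lam_lbar):
    """KF: TU (91)."""
    xl_pred = f_l_bar + A_l_bar @ xl_star
    P_pred = A_l_bar @ P_star @ A_l_bar.T + G_l @ Lam_lbar @ G_l.T
    return xl_pred, P_pred

def pf_measurement_update_loglik(y, h, C, xl_pred, P_pred, S_yy):
    """PF: MU. log p(y_{k+1} | X^p_{k+1}, Y_k) from (95), used in the weight (92)."""
    Sig = C @ P_pred @ C.T + S_yy
    r = y - h - C @ xl_pred
    return -0.5 * (r @ inv(Sig) @ r + slogdet(2.0 * np.pi * Sig)[1])

def kf_measurement_update(xl_pred, P_pred, y, h, C, S_yy):
    """KF: MU (97)."""
    M = C @ P_pred @ C.T + S_yy
    K = P_pred @ C.T @ inv(M)
    xl_new = xl_pred + K @ (y - h - C @ xl_pred)
    return xl_new, P_pred - K @ M @ K.T
```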

V. ESTIMATING THE UNKNOWN NOISE STATISTICS OF DEPENDENT GAUSSIAN NOISES

Most of the estimation algorithms involving a state space model assume prior knowledge of the noise distributions, whereas the properties of the noise processes are often unknown in many practical problems. Moreover, the noise distributions may be non-stationary or state dependent, which further prevents the so-called off-line tuning approach. For a linear Gaussian model, adaptive Kalman filters can estimate the unknown noise parameters jointly with the state. However, the same problem for a general state space model is less studied. In this section, we address a joint state and noise parameter estimation problem involving dependent Gaussian noise processes using the PF.

To proceed, consider the following state-space model with additive process and measurement noises:

x_k = f(x_{k-1}) + v_k,   (57a)
y_k = h(x_k) + e_k,   k = 1, 2, ...   (57b)

where x_k ∈ R^{n_v} is the hidden state and y_k ∈ R^{n_e} is the measurement at time step k. v_k and e_k are the corresponding process and measurement Gaussian noises, which are dependent on each other³. Define w_k ∈ R^d (here d = n_v + n_e) as

w_k = \begin{pmatrix} v_k \\ e_k \end{pmatrix},   (58)

where the sequence of w_k is independent Gaussian conditioned on an unknown mean \mu_k and covariance \Sigma_k. Here we assume the parameters (\mu_k, \Sigma_k) to be slowly varying in time. This slowly varying nature can arise, e.g., due to model mis-specification [26]. Now, the conditional distribution of w_k is given as

w_k | (\mu_k, \Sigma_k) ∼ N(\mu_k, \Sigma_k)   (59)

with

\mu_k = [\mu_{v,k}^T\; \mu_{e,k}^T]^T;   \Sigma_k = \begin{pmatrix} \Sigma_{vv,k} & \Sigma_{ve,k} \\ \Sigma_{ve,k}^T & \Sigma_{ee,k} \end{pmatrix}.   (60)

One key objective here is to learn the noise parameters \theta_k := (\mu_k, \Sigma_k) adaptively as new measurements arrive. The problem of learning those parameters (using an MPF approach) when the noises are independent has recently been addressed in [22]. Here we consider a more general case with dependent noises.

³We note that this corresponds to Type II dependency (as in Figure 3, but v_{k−1} is replaced by v_k for notational convenience). Treatment of Type I dependency, which we have not included here, can be carried out similarly.

A. Conjugate prior for unknown Gaussian noise parameters

Following [2], a suitable conjugate prior for (\mu_k, \Sigma_k) is known to be a Normal-inverse-Wishart distribution of the form (\mu_k, \Sigma_k) ∼ NiW(\nu_k, V_k), where

\mu_k | \Sigma_k ∼ N(\hat{\mu}_k, \hat{\Sigma}_k),   (61a)
\Sigma_k ∼ iW(\nu_k - d - 1, \Lambda_k).   (61b)

Here iW(·) denotes the Inverse Wishart distribution. The parameters \nu_k and V_k hold the sufficient statistics and can be updated recursively. The relevant quantities are defined as

V_k = \begin{pmatrix} V_{w_k w_k, k} & V_{1 w_k, k} \\ V_{w_k 1, k} & V_{11,k} \end{pmatrix},   (62a)
\hat{\mu}_k = V_{11,k}^{-1} V_{1 w_k, k},   (62b)
\hat{\Sigma}_k = \Sigma_k V_{11,k}^{-1},   (62c)
\Lambda_k = V_{w_k w_k, k} - V_{1 w_k, k} V_{11,k}^{-1} V_{w_k 1, k},   (62d)

where V_{w_k w_k, k} is defined as the upper-left d × d sub-block of V_k ∈ R^{(d+1)×(d+1)}. The joint density of (\mu_k, \Sigma_k) is of the form

p(\mu_k, \Sigma_k) = NiW(\nu_k, V_k)   (63a)
= \frac{1}{c} |\Sigma_k|^{-\nu_k/2} \exp\left( -\frac{1}{2} \mathrm{tr}\big( \Sigma_k^{-1} [-I_d, \mu_k] V_k [-I_d, \mu_k]^T \big) \right),   (63b)

where c is the normalizing constant. Consequently, the posterior predictive distribution of w_k can be analytically obtained as a multivariate Student's t distribution of the form

p(w_k) = t_{\tilde{\nu}_k}(\tilde{\mu}_k, \tilde{\Sigma}_k) = \frac{\Gamma(\tilde{\nu}_k/2 + d/2)}{\Gamma(\tilde{\nu}_k/2)} \frac{|\tilde{\Sigma}_k|^{-1/2}}{(\tilde{\nu}_k \pi)^{d/2}} \left[ 1 + \frac{1}{\tilde{\nu}_k} (w_k - \tilde{\mu}_k)^T \tilde{\Sigma}_k^{-1} (w_k - \tilde{\mu}_k) \right]^{-\frac{\tilde{\nu}_k + d}{2}},   (64)

where \tilde{\nu}_k = \nu_k - d + 1 is the degree of freedom, and \tilde{\mu}_k = \hat{\mu}_k and \tilde{\Sigma}_k = \frac{(1 + V_{11,k})}{(\nu_k - d + 1) V_{11,k}} \Lambda_k are the location and the scale parameters of the above Student's t distribution, with \Gamma(·) the Gamma function. If we now partition the variable w_k into two blocks (corresponding to v_k and e_k respectively), then the marginals of w_k (i.e. of v_k and e_k) are also obtained as Student's t distributions [20]:

v_k ∼ t_{\tilde{\nu}_k}(\tilde{\mu}_{v,k}, \tilde{\Sigma}_{vv,k}),   (65a)
e_k ∼ t_{\tilde{\nu}_k}(\tilde{\mu}_{e,k}, \tilde{\Sigma}_{ee,k}).   (65b)

Moreover, the conditional p(e_k|v_k) can be obtained as

p(e_k|v_k) ∼ t_{(\tilde{\nu}_k + d_v)}(\tilde{\mu}_{e|v,k}, \tilde{\Sigma}_{e|v,k})   (65c)

with

\tilde{\mu}_{e|v,k} = \tilde{\mu}_{e,k} + \tilde{\Sigma}_{ve,k}^T \tilde{\Sigma}_{vv,k}^{-1} (v_k - \tilde{\mu}_{v,k}),   (65d)
\tilde{\Sigma}_{e|v,k} = h_{e|v,k} (\tilde{\Sigma}_{ee,k} - \tilde{\Sigma}_{ve,k}^T \tilde{\Sigma}_{vv,k}^{-1} \tilde{\Sigma}_{ve,k}),   (65e)
h_{e|v,k} = \frac{1}{\tilde{\nu}_k + d_v} \left[ \tilde{\nu}_k + (v_k - \tilde{\mu}_{v,k})^T \tilde{\Sigma}_{vv,k}^{-1} (v_k - \tilde{\mu}_{v,k}) \right].   (65f)

B. Joint state and parameter estimation

Our interest lies in estimating p(x_k|Y_k) and p(\theta_k|Y_k) recursively over time. Suppose we are at time step k - 1 and p(x_{k-1}|Y_{k-1}) is approximately given in the form of a weighted particle cloud. The propagation of p(x_{k-1}|Y_{k-1}) to the next time step using a running PF is shown below.

1) Particle filter update: The PF approximates p(X_{k-1}|Y_{k-1}) by the empirical measure

p(X_{k-1}|Y_{k-1}) ≃ \sum_{i=1}^{N} \omega_{k-1}^{(i)} \delta_{X_{k-1}^{(i)}}(X_{k-1}).   (66)

Now we generate new samples x_k^{(i)} from the proposal q(x_k|·) and form the trajectories X_k^{(i)} := [X_{k-1}^{(i)}, x_k^{(i)}] such that

p(X_k|Y_k) ≃ \sum_{i=1}^{N} \omega_k^{(i)} \delta_{X_k^{(i)}}(X_k).   (67)

The weight update of the PF can be obtained recursively as

\omega_k^{(i)} = \omega_{k-1}^{(i)} \frac{p(y_k|X_k^{(i)}, Y_{k-1})\, p(x_k^{(i)}|X_{k-1}^{(i)}, Y_{k-1})}{q(x_k^{(i)}|·)}.   (68)

So to obtain the new weights, we need to evaluate p(y_k|X_k, Y_{k-1}) and p(x_k|X_{k-1}, Y_{k-1}), respectively. Now, from (57a)–(57b), p([x_k - f(x_{k-1})]|X_{k-1}, Y_{k-1}) = p(v_k), which is given by (65a). So

(x_k|X_{k-1}, Y_{k-1}) ∼ t_{(\tilde{\nu}_k + d_v)}(\tilde{\mu}_{v,k}^*, \tilde{\Sigma}_{vv,k}),   (69)

where \tilde{\mu}_{v,k}^* = \tilde{\mu}_{v,k} + f(x_{k-1}). Similarly, p([y_k - h(x_k)]|X_k, Y_{k-1}) = p(e_k|v_k), given by (65c). So p(y_k|X_k, Y_{k-1}) is another Student's t distribution as given by (65c), with the mean modified as

\bar{\tilde{\mu}}_{e|v,k}^* = \tilde{\mu}_{e|v,k} + h(x_k).   (70)

Subsequently, from (67), one can approximate the marginal as

p(x_k|Y_k) ≃ \sum_{i=1}^{N} \omega_k^{(i)} \delta_{x_k^{(i)}}(x_k).   (71)

The PF provides an approximation of the joint smoothing distribution recursively. However, as k increases, such a particle filter suffers from a progressively impoverished particle representation as a result of resampling. This problem is known as the particle path degeneracy problem [8]. On the other hand, uniform convergence in time of the particle filter is known under the mixing assumptions as in [18]. This property ensures that any error is forgotten exponentially with time and explains why the particle filter works for the marginal filter density (as in (71)) in most practical applications.

2) Updates on noise parameters: We note from (57a)–(57b) that knowing the sequence (X_{k-1}, Y_{k-1}) would lead us to the completely observed noise sequence w_{1:k-1}. Now suppose that p(\theta_{k-1}|X_{k-1}, Y_{k-1}) is given by a Normal-inverse-Wishart prior as⁴

p(\theta_{k-1}|X_{k-1}, Y_{k-1}) = p(\theta_{k-1}|w_{k-1}) = NiW(\nu_{k-1}, V_{k-1}).   (72)

Since we assume the noise parameters to be slowly varying, we approximate the time update step using the principle of exponential forgetting [25] as

p(\theta_k|X_{k-1}, Y_{k-1}) = p(\theta_k|w_{k-1}) = NiW(\lambda \nu_{k-1}, \lambda V_{k-1}),   (73)

where \lambda ∈ [0, 1] is the forgetting factor used⁵. Via Bayesian conjugacy, the posterior p(\theta_k|X_k, Y_k) is again a Normal-inverse-Wishart distribution as

p(\theta_k|X_k, Y_k) = p(\theta_k|w_k) = NiW(\nu_k, V_k),   (74)

with

V_k = \lambda V_{k-1} + \begin{pmatrix} w_k \\ 1 \end{pmatrix} \begin{pmatrix} w_k^T & 1 \end{pmatrix},   (75)
\nu_k = \lambda \nu_{k-1} + 1,   (76)

where

w_k = \begin{pmatrix} v_k \\ e_k \end{pmatrix} = \begin{pmatrix} x_k - f(x_{k-1}) \\ y_k - h(x_k) \end{pmatrix}.   (77)

⁴Treating both \mu_k and \Sigma_k as unknown might have implications on the observability and identifiability of the model. When \Sigma_k is the only unknown parameter, a suitable conjugate prior is given by the inverse Wishart distribution and the posterior predictive is again a Student's t distribution [27].
⁵A similar argument was put forward in [26], page 369, where \lambda is called a discount factor.

We again stress that the path dependency of the parameter posterior leads to accumulation of errors over time. However, by using the principle of exponential forgetting, this is less critical here. Now, we define T_k(x_k) := p(\theta_k|x_k, Y_k). Since T_k(x_k) can be written as

T_k(x_k) = \int p(\theta_k|X_k, Y_k)\, p(X_{k-1}|Y_{k-1}, x_k)\, dX_{k-1},   (78)

we can establish the recursive relation for T_k(x_k) as

T_k(x_k) = \iint \frac{p(\theta_k|X_k, Y_k)}{p(\theta_{k-1}|X_{k-1}, Y_{k-1})} p(\theta_{k-1}|X_{k-1}, Y_{k-1})\, p(X_{k-2}|Y_{k-2}, x_{k-1})\, p(x_{k-1}|Y_{k-1}, x_k)\, dX_{k-2}\, dx_{k-1}
= \int \frac{p(\theta_k|X_k, Y_k)}{p(\theta_{k-1}|X_{k-1}, Y_{k-1})} T_{k-1}(x_{k-1})\, p(x_{k-1}|Y_{k-1}, x_k)\, dx_{k-1}.   (79)

From the forward particle filter, we have

p(x_{k-1}|Y_{k-1}, x_k) ≈ \frac{\sum_{j=1}^{N} \omega_{k-1}^{(j)} p(x_k|x_{k-1}^{(j)})\, \delta_{x_{k-1}^{(j)}}(x_{k-1})}{\sum_{l=1}^{N} \omega_{k-1}^{(l)} p(x_k|x_{k-1}^{(l)})}.   (80)

Now using (80) we can approximate (79) as

T_k(x_k^{(i)}) = \sum_{j=1}^{N} \frac{NiW(\nu_k^{(ij)}, V_k^{(ij)})}{NiW(\nu_{k-1}^{(j)}, V_{k-1}^{(j)})} T_{k-1}(x_{k-1}^{(j)}) \frac{\omega_{k-1}^{(j)} p(x_k^{(i)}|x_{k-1}^{(j)})}{\sum_{l=1}^{N} \omega_{k-1}^{(l)} p(x_k^{(i)}|x_{k-1}^{(l)})},   (81)

with

V_k^{(ij)} = \lambda V_{k-1}^{(j)} + \begin{pmatrix} w_k^{(ij)} \\ 1 \end{pmatrix} \begin{pmatrix} (w_k^{(ij)})^T & 1 \end{pmatrix},   (82)
\nu_k^{(ij)} = \lambda \nu_{k-1}^{(j)} + 1,   (83)

where we define

w_k^{(ij)} = \begin{pmatrix} x_k^{(i)} - f(x_{k-1}^{(j)}) \\ y_k - h(x_k^{(i)}) \end{pmatrix}.   (84)

Finally, using (81) and (71), we get

p(\theta_k|Y_k) ≈ \int T_k(x_k)\, p(x_k|Y_k)\, dx_k = \sum_{i=1}^{N} \omega_k^{(i)} T_k(x_k^{(i)}).   (85)
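The conjugate updates (73)–(76) and the predictive parameters used in (64) amount to a few matrix operations; the following sketch (ours, NumPy assumed) illustrates them, assuming V_k is stored with the d × d block V_{w_k w_k,k} in the upper-left corner and the scalar V_{11,k} in the lower-right, as in (62a).

```python
import numpy as np

def niw_update(V_prev, nu_prev, w, lam):
    """One conjugate update (73)-(76): exponential forgetting followed by
    inclusion of the 'observed' noise vector w_k = (v_k; e_k)."""
    z = np.append(w, 1.0)                     # stacked vector (w_k, 1)
    V = lam * V_prev + np.outer(z, z)         # (75), with forgetting (73)
    nu = lam * nu_prev + 1.0                  # (76)
    return V, nu

def predictive_t_params(V, nu, d):
    """Location, scale and degrees of freedom of the Student-t predictive (64)."""
    Vww = V[:d, :d]                           # upper-left d x d block
    V1w = V[:d, d]                            # cross block
    V11 = V[d, d]                             # lower-right scalar
    mu_hat = V1w / V11                        # (62b)
    Lam = Vww - np.outer(V1w, V1w) / V11      # (62d)
    nu_t = nu - d + 1.0                       # degrees of freedom
    Sigma_t = (1.0 + V11) / (nu_t * V11) * Lam
    return mu_hat, Sigma_t, nu_t
```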

C. Algorithmic summary

In this section, we give a summary of one step of the main algorithm.

Algorithm 2: [Estimating the statistics of unknown dependent Gaussian noises]
• At time step (k - 1): we have {x_{k-1}^{(i)}, \omega_{k-1}^{(i)}, T_{k-1}(x_{k-1}^{(i)})}_{i=1}^{N}, such that

p(x_{k-1}|Y_{k-1}) ≃ \sum_{i=1}^{N} \omega_{k-1}^{(i)} \delta_{x_{k-1}^{(i)}}(x_{k-1}),
p(\theta_{k-1}|Y_{k-1}) ≃ \sum_{i=1}^{N} \omega_{k-1}^{(i)} T_{k-1}(x_{k-1}^{(i)}).

• At time step (k): with the new observation y_k,
– Generate particles from the proposal x_k^{(i)} ∼ q(x_k|·).
– Recursive weight update: \omega_k^{(i)} according to (68), using (69)–(70).
– Recursive computation of T_k(x_k^{(i)}) using (81)–(84).

To summarize this section, the problem of estimating the unknown state from a general state space model involving unknown Gaussian noise characteristics is presented. Specifically, we addressed the online joint state and noise parameter estimation problem involving additive dependent Gaussian noises using a PF based approach.

VI. SAMPLING CONTINUOUS TIME MODELS

Dependency between the process and observation noises in a general state space model occurs naturally in many situations, although it is often ignored in favor of modeling conveniences. This section motivates the importance of dependent noise processes for a wide range of applications starting with a continuous time model.

Consider the following continuous time model with a noisy input u(t),

\dot{x}(t) = A x(t) + B u(t) + v(t),   (86a)
y(t_k) = C x(t_k) + e(t_k),   (86b)
e(t_k) ∼ N(0, R),   (86c)

where x(t) is the continuous time state, y(t_k) is the discrete time observation and v(t) is a zero mean white Gaussian noise process with E(v(t) v(s)^T) = Q \delta(t - s), where E(·) denotes the expectation operator and \delta(·) is the Dirac delta function.

For the continuous time model, it is natural to assume that v(t) and e(t_k) are independent. However, converting the model to a sampled discrete time model defined at the time instants t_k may introduce dependence. Tables III and IV summarize different sampling strategies (see Chapter 13.1 in [5]) for single and double integrators, respectively. The corresponding discrete time model is given by

x_{k+1} = F x_k + G (u_k + v_k) = F x_k + G u_k + \bar{v}_k,   (87a)
y_k = H x_k + J (u_k + v_k) + e_k = H x_k + J u_k + \bar{e}_k,   (87b)
\begin{pmatrix} \bar{v}_k \\ \bar{e}_k \end{pmatrix} ∼ N\left( 0, \begin{pmatrix} G Q G^T & G Q J^T \\ J Q G^T & J Q J^T + R \end{pmatrix} \right).   (87c)

We note from Tables III and IV that we get dependence (J ≠ 0 and G ≠ 0) in all cases except for ZOH, where a piecewise constant process noise is assumed. The following application areas surveyed in [16] are important cases:

• Navigation – The input is one of, or a combination of, acceleration and angular rates as measured by accelerometers and gyroscopes. The basic form of the continuous time dynamics is in both cases given by a double integrator \ddot{p}(t) = u(t) + v(t). The sensors are typically low-pass filtered before sampling to avoid aliasing, and thus the true acceleration/angular rate u(t) + v(t) is not piecewise constant.
• Odometry – Here the angular speeds of two wheels are measured and used to compute the position based on the principle of dead-reckoning. The basic form of the continuous time dynamics here is a single integrator \dot{p}(t) = u(t) + v(t). The angular rate encoders are typically low-pass filtered before sampling.
• Tracking – The input is an unobserved force, so u(t) = 0 and v(t) models the force input. The dynamics is given by a double integrator \ddot{p}(t) = v(t). A suitable model for v(t) is subject to debate, but all cases except when it is assumed piecewise constant and synchronized with the external position sensor lead to dependent noise.

VII. CONCLUSIONS

The fact that the process noise is dependent on the measurement noise in sampled models is often neglected in the literature. There might be several reasons for this. The process noise is often instrumental for the tuning, so its physical interpretation is of less importance. Another reason is that the correlation can be quite small for fast sampling compared to the time constant of the system. A final reason might be that the PF theory is not yet adapted to dependent noise, in contrast to the KF literature, where this is more of a standard assumption with a rather simple remedy.

We have extended the particle filter theory in three ways. First, the important choice of proposal density is examined. Both the prior and optimal proposals are derived for dependent noise, for two different cases of dependence structures. For Gaussian noise, the optimal proposal gets an analytical expression, which further simplifies to a Gaussian for the prior proposal. Second, the marginalized particle filter, which is instrumental for real-world applications to mitigate the curse of dimensionality, was derived for dependent noise. Third, the less studied problem of estimating parameters in the noise distributions was addressed for the case of Gaussian dependent noise.

ACKNOWLEDGMENT

The authors would like to thank the Linnaeus research environment CADICS, funded by the Swedish Research Council, for the financial support.

Table III. CORRELATION DUE TO SAMPLING OF THE STATE x(t) = (p(t), v(t))^T IN A DOUBLE INTEGRATOR USING ZERO-ORDER HOLD (ZOH, ASSUMING PIECEWISE CONSTANT INPUT), FIRST-ORDER HOLD (FOH, ASSUMING PIECEWISE LINEAR INPUT), AND BILINEAR TRANSFORMATION (BIL), WHICH IS AN OFTEN USED APPROXIMATION FOR BAND-LIMITED SIGNALS.

Continuous time:  A = [[0_n, I_n], [0_n, 0_n]]    B = [[0_n], [I_n]]               C = (I_n, 0_n)          D = 0_n
ZOH:              F = [[I_n, T I_n], [0_n, I_n]]  G = [[(T^2/2) I_n], [T I_n]]     H = (I_n, 0_n)          J = 0_n
FOH:              F = [[I_n, T I_n], [0_n, I_n]]  G = [[(T^2/2) I_n], [T I_n]]     H = (I_n, 0_n)          J = (T^2/6) I_n
BIL:              F = [[I_n, T I_n], [0_n, I_n]]  G = [[(T^2/2) I_n], [T I_n]]     H = (I_n, (T/2) I_n)    J = (T^2/4) I_n

Table IV. SIMILAR TO TABLE III, BUT FOR A SINGLE INTEGRATOR USING THE STATE x(t) = p(t).

Continuous time:  A = 0_n      B = I_n       C = I_n
ZOH:              F = I_n      G = T I_n     H = I_n     J = 0_n
FOH:              F = I_n      G = T I_n     H = I_n     J = (T/2) I_n
BIL:              F = I_n      G = T I_n     H = I_n     J = (T/2) I_n
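To illustrate the sampling-induced correlation (87c) with the quantities in Tables III and IV, the following sketch (ours; NumPy assumed, with example values for T, Q and R) forms the joint covariance of (\bar{v}_k, \bar{e}_k) for a single integrator sampled with FOH.

```python
import numpy as np

def sampled_noise_covariance(G, J, Q, R):
    """Joint covariance of (v_bar_k, e_bar_k) in (87c) for a sampled model."""
    top = np.hstack([G @ Q @ G.T, G @ Q @ J.T])
    bottom = np.hstack([J @ Q @ G.T, J @ Q @ J.T + R])
    return np.vstack([top, bottom])

# Example: single integrator sampled with FOH (Table IV): G = T*I, J = (T/2)*I.
T, n = 0.1, 1
G = T * np.eye(n)
J = (T / 2.0) * np.eye(n)
Q = 1.0 * np.eye(n)        # continuous-time process noise intensity (assumed value)
R = 0.01 * np.eye(n)       # measurement noise covariance (assumed value)
cov = sampled_noise_covariance(G, J, Q, R)
S = cov[:n, n:]            # cross covariance G Q J^T; nonzero => dependent noise
```

The off-diagonal block G Q J^T is precisely the cross covariance S_k used in the Gaussian proposals of Section III; it vanishes only when J = 0, i.e., for ZOH.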

APPENDIX
DETAILED STEPS OF THE MPF USING A LINEAR-GAUSSIAN SUBSTRUCTURE

At time zero, p(x_0^l|X_0^p, Y_0) = N(\hat{x}_{0|0}^l, P_{0|0}). Under a favorable mixing condition, we assume that p(x_k^l|X_k^p, Y_k) = N(\hat{x}_{k|k}^l, P_{k|k}) at an arbitrary time k. We now outline one complete cycle of propagating the joint density of the state conditioned on the available observations.

PF time update (PF: TU): At this stage, it is required to generate N new particles (samples) {x_{k+1}^{p(i)}}_{i=1}^{N} from the appropriate importance function q(x_{k+1}^p|·). The transition density p(x_{k+1}^p|X_k^p, Y_k) can be obtained as

p(x_{k+1}^p|X_k^p, Y_k) = \int p(x_{k+1}^p|X_k^p, x_k^l, Y_k)\, p(x_k^l|X_k^p, Y_k)\, dx_k^l,   (88)

where p(x_k^l|X_k^p, Y_k) = N(\hat{x}_{k|k}^l, P_{k|k}), as obtained from the prior.

KF dynamic measurement update (KF: DYN MU): The conditional density p(x_k^l|X_{k+1}^p, Y_k) can be obtained as

p(x_k^l|X_{k+1}^p, Y_k) = \frac{p(x_{k+1}^p|X_k^p, x_k^l, Y_k)\, p(x_k^l|X_k^p, Y_k)}{\int p(x_{k+1}^p|X_k^p, x_k^l, Y_k)\, p(x_k^l|X_k^p, Y_k)\, dx_k^l}.

From the prior, we have p(x_k^l|X_k^p, Y_k) = N(\hat{x}_{k|k}^l, P_{k|k}). Now, at this stage, Z_k^{(1)} is available. Let p(x_k^l|X_{k+1}^p, Y_k) = N(\hat{x}_{k|k}^{l*}, P_{k|k}^*). Then, following the proof (part 2) of [19],

N_k^* = \bar{A}_k^p P_{k|k} (\bar{A}_k^p)^T + G_k^p \Lambda_k^{\bar{p}} (G_k^p)^T,   (89a)
L_k = P_{k|k} (\bar{A}_k^p)^T (N_k^*)^{-1},   (89b)
\hat{x}_{k|k}^{l*} = \hat{x}_{k|k}^l + L_k (Z_k^{(1)} - \bar{A}_k^p \hat{x}_{k|k}^l - G_k^p \Gamma_k^{py} Z_k^{(2)}),   (89c)
P_{k|k}^* = P_{k|k} - L_k N_k^* L_k^T.   (89d)

KF time update (KF: TU):

p(x_{k+1}^l|X_{k+1}^p, Y_k) = \int p(x_{k+1}^l|X_{k+1}^p, x_k^l, Y_k)\, p(x_k^l|X_{k+1}^p, Y_k)\, dx_k^l,   (90)

where p(x_k^l|X_{k+1}^p, Y_k) = N(\hat{x}_{k|k}^{l*}, P_{k|k}^*). It follows that p(x_{k+1}^l|X_{k+1}^p, Y_k) = N(\hat{x}_{k+1|k}^l, P_{k+1|k}) with

\hat{x}_{k+1|k}^l = \bar{f}_k^l + \bar{A}_k^l \hat{x}_{k|k}^{l*},   (91a)
P_{k+1|k} = \bar{A}_k^l P_{k|k}^* (\bar{A}_k^l)^T + G_k^l \bar{\Lambda}_k^l (G_k^l)^T.   (91b)

PF measurement update (PF: MU): With the new measurement y_{k+1}, we get p(X_{k+1}|Y_{k+1}) ≃ \sum_{i=1}^{N} \omega_{k+1}^{(i)} \delta_{X_{k+1}^{(i)}}(X_{k+1}), where the weights of the particle filter can be recursively updated according to

\omega_{k+1}^{(i)} = \omega_k^{(i)} \frac{p(y_{k+1}|X_{k+1}^{p(i)}, Y_k)\, p(x_{k+1}^{p(i)}|X_k^{p(i)}, Y_k)}{q(x_{k+1}^{p(i)}|·)}.   (92)

From the prior, we have p(x_k^l|X_k^p, Y_k) = N(\hat{x}_{k|k}^l, P_{k|k}). Since at this stage Z_k^{(2)} is known, we have from (55b) and (55d) that p(x_{k+1}^p|X_k^p, Y_k) = N(\mu_{k+1}^{tr}, \Sigma_{k+1}^{tr}), where

\mu_{k+1}^{tr} = f_k^p + \bar{A}_k^p \hat{x}_{k|k}^l + G_k^p \Gamma_k^{py} Z_k^{(2)},   (93a)
\Sigma_{k+1}^{tr} = \bar{A}_k^p P_{k|k} (\bar{A}_k^p)^T + G_k^p \Lambda_k^{\bar{p}} (G_k^p)^T.   (93b)

We now obtain the likelihood density p(y_{k+1}|X_{k+1}^p, Y_k) as

p(y_{k+1}|X_{k+1}^p, Y_k) = \int p(y_{k+1}|X_{k+1}^p, x_{k+1}^l, Y_k)\, p(x_{k+1}^l|X_{k+1}^p, Y_k)\, dx_{k+1}^l.   (94)

Let p(y_{k+1}|X_{k+1}^p, Y_k) = N(\mu_{k+1}^L, \Sigma_{k+1}^L). Then

\mu_{k+1}^L = h_{k+1} + C_{k+1} \hat{x}_{k+1|k}^l,   (95a)
\Sigma_{k+1}^L = C_{k+1} P_{k+1|k} C_{k+1}^T + \Sigma_{k+1}^{yy}.   (95b)

KF measurement update (KF: MU): From the KF time update stage, we have p(x_{k+1}^l|X_{k+1}^p, Y_k) = N(\hat{x}_{k+1|k}^l, P_{k+1|k}). As y_{k+1} is available now, it implies that Z_{k+1}^{(2)} = (y_{k+1} - h_{k+1}) is also available at this stage. Now, following the proof (part 1) of [19], we have

p(x_{k+1}^l|X_{k+1}^p, Y_{k+1}) = \frac{p(y_{k+1}|X_{k+1}^p, x_{k+1}^l, Y_k)\, p(x_{k+1}^l|X_{k+1}^p, Y_k)}{\int p(y_{k+1}|X_{k+1}^p, x_{k+1}^l, Y_k)\, p(x_{k+1}^l|X_{k+1}^p, Y_k)\, dx_{k+1}^l}.   (96)

Using the fact that the measurement noise, and thereby p(y_{k+1}|X_{k+1}^p, x_{k+1}^l, Y_k), is Gaussian, and using the KF, we can show that p(x_{k+1}^l|X_{k+1}^p, Y_{k+1}) = N(\hat{x}_{k+1|k+1}^l, P_{k+1|k+1}), where

M_{k+1} = C_{k+1} P_{k+1|k} C_{k+1}^T + \Sigma_{k+1}^{yy},   (97a)
K_{k+1} = P_{k+1|k} C_{k+1}^T (M_{k+1})^{-1},   (97b)
\hat{x}_{k+1|k+1}^l = \hat{x}_{k+1|k}^l + K_{k+1} (Z_{k+1}^{(2)} - C_{k+1} \hat{x}_{k+1|k}^l),   (97c)
P_{k+1|k+1} = P_{k+1|k} - K_{k+1} M_{k+1} K_{k+1}^T.   (97d)

REFERENCES

[1] F. Gustafsson, Adaptive Filtering and Change Detection, John Wiley and Sons, 2000.
[2] V. Smidl and A. Quinn, The Variational Bayes Method in Signal Processing, Springer, 2006.
[3] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering", Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[4] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation, Prentice-Hall, Upper Saddle River, NJ, 2000.
[5] F. Gustafsson, Statistical Sensor Fusion, Studentlitteratur, 2010.
[6] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking", IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174–188, Feb. 2002.
[7] J. V. Candy, "Bootstrap particle filtering", IEEE Signal Processing Magazine, vol. 24, no. 4, pp. 73–85, 2007.
[8] O. Cappe, S. J. Godsill, and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo", Proceedings of the IEEE, vol. 95, no. 5, pp. 899–924, 2007.
[9] P. M. Djuric, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Miguez, "Particle filtering", IEEE Signal Processing Magazine, vol. 20, no. 5, pp. 19–38, 2003.
[10] F. Gustafsson, "Particle filter theory and practice with positioning applications", IEEE Aerospace and Electronic Systems Magazine, vol. 25, no. 7, pp. 53–82, 2010.
[11] A. Doucet and A. M. Johansen, "A tutorial on particle filtering and smoothing: fifteen years later", in Handbook of Nonlinear Filtering (eds. D. Crisan and B. Rozovsky), Oxford University Press, 2011.
[12] D. Simon, Optimal State Estimation, Wiley Interscience, 2006.
[13] N. Caylus, A. Guyader, and F. Le Gland, "Particle filters for partially observed Markov chains", IEEE Workshop on Statistical Signal Processing, Saint-Louis, 2003.
[14] F. Desbouvries and W. Pieczynski, "Particle filtering with pairwise Markov processes", IEEE ICASSP, Hong Kong, 2003.
[15] S. Saha, "Topics in Particle Filtering and Smoothing", PhD thesis, University of Twente, 2009, ISBN 978-90-365-2864-1.
[16] F. Gustafsson, F. Gunnarsson, N. Bergman, U. Forssell, J. Jansson, R. Karlsson, and P. Nordlund, "Particle filters for positioning, navigation and tracking", IEEE Transactions on Signal Processing, vol. 50, no. 2, Feb. 2002.
[17] R. Chen and J. S. Liu, "Mixture Kalman filters", Journal of the Royal Statistical Society, vol. 62, no. 3, pp. 493–508, 2000.
[18] D. Crisan and A. Doucet, "A survey of convergence results on particle filtering methods for practitioners", IEEE Transactions on Signal Processing, vol. 50, no. 3, pp. 736–746, 2002.
[19] T. Schön, F. Gustafsson, and P. J. Nordlund, "Marginalized particle filters for mixed linear/nonlinear state-space models", IEEE Transactions on Signal Processing, vol. 53, no. 7, pp. 2279–2289, Jul. 2005.
[20] K. P. Murphy, "Conjugate Bayesian analysis of the Gaussian distribution", Technical Report, UBC, 2007.
[21] F. Gustafsson and S. Saha, "Particle filtering with dependent noises", 13th International Conference on Information Fusion (FUSION 2010), Edinburgh, UK.
[22] S. Saha, E. Özkan, F. Gustafsson, and V. Smidl, "Marginalized particle filters for Bayesian estimation of Gaussian noise parameters", 13th International Conference on Information Fusion (FUSION 2010), Edinburgh, UK.
[23] F. Gustafsson, S. Saha, and U. Orguner, "The benefits of down-sampling in the particle filter", 14th International Conference on Information Fusion (FUSION 2011), Chicago, USA.
[24] S. Saha and F. Gustafsson, "Marginalized particle filter for dependent Gaussian noise processes", IEEE Aerospace Conference, 2012, Montana, USA.
[25] R. Kulhavý and M. B. Zarrop, "On a general concept of forgetting", International Journal of Control, vol. 58, no. 4, pp. 905–924, 1993.
[26] M. West and J. Harrison, Bayesian Forecasting and Dynamic Models, Springer-Verlag, 1989.
[27] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Saikat Saha is an Assistant Professor at the Division of Automatic Control, Linköping University, Sweden. He previously held a postdoctoral position in the same division between 2009 and 2011. He received his MSc (Engg) from the Indian Institute of Science in 2003 and PhD from the University of Twente, the Netherlands, in 2009. His research interests include statistical signal processing, information fusion, system identification and computational finance.

Fredrik Gustafsson is professor in Sensor Informatics at the Department of Electrical Engineering, Linköping University, since 2005. He received the M.Sc. degree in electrical engineering in 1988 and the Ph.D. degree in Automatic Control in 1992, both from Linköping University. During 1992–1999 he held various positions in automatic control, and in 1999 he got a professorship in Communication Systems. In 2004, he was awarded the Arnberg prize by the Royal Swedish Academy of Science (KVA) and in 2007 he was elected member of the Royal Academy of Engineering Sciences (IVA). His research interests are in stochastic signal processing, adaptive filtering and change detection, with applications to communication, vehicular, airborne and audio systems, where the current focus is on sensor fusion algorithms for navigation and tracking problems. He was an associate editor for IEEE Transactions on Signal Processing 2000–2006 and is currently associate editor for EURASIP Journal on Applied Signal Processing and IEEE Transactions on Aerospace and Electronic Systems. He is an IEEE Fellow.