Linköping University Post Print

Decentralized Particle Filter with Arbitrary State Decomposition

Tianshi Chen, Thomas B. Schön, Henrik Ohlsson and Lennart Ljung

N.B.: When citing this work, cite the original article.

©2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Tianshi Chen, Thomas B. Schön, Henrik Ohlsson and Lennart Ljung, Decentralized Particle Filter with Arbitrary State Decomposition, 2011, IEEE Transactions on Signal Processing, (59), 2, 465-478. http://dx.doi.org/10.1109/TSP.2010.2091639

Postprint available at: Linköping University Electronic Press http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-66191

Decentralized Particle Filter with Arbitrary State Decomposition

Tianshi Chen, Member, IEEE, Thomas B. Schön, Member, IEEE, Henrik Ohlsson, Member, IEEE, and Lennart Ljung, Fellow, IEEE

Abstract—In this paper, a new particle filter (PF), which we refer to as the decentralized PF (DPF), is proposed. By first decomposing the state into two parts, the DPF splits the filtering problem into two nested sub-problems and then handles the two nested sub-problems using PFs. The DPF has the advantage over the regular PF that the DPF can increase the level of parallelism of the PF. In particular, part of the resampling in the DPF bears a parallel structure and can thus be implemented in parallel. The parallel structure of the DPF is created by decomposing the state space, differing from the parallel structure of the distributed PFs, which is created by dividing the sample space. This difference results in a couple of unique features of the DPF in contrast with the existing distributed PFs. Simulation results of two examples indicate that the DPF has the potential to achieve, in a shorter execution time, the same level of performance as the regular PF.

Index Terms—Particle filtering, parallel algorithms, nonlinear systems, state estimation.

Copyright (c) 2010 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

The authors are with the Division of Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, 58183, Sweden. E-mail: {tschen, schon, ohlsson, ljung}@isy.liu.se. Phone: +46 (0)13 282 226. Fax: +46 (0)13 282 622.

I. INTRODUCTION

In this paper, we study the filtering problem for the following nonlinear discrete-time system

  ξ_{t+1} = f_t(ξ_t, v_t)
  y_t = h_t(ξ_t, e_t)    (1)

where t is the discrete-time index, ξ_t ∈ R^{n_ξ} is the state at time t, y_t ∈ R^{n_y} is the measurement output, v_t ∈ R^{n_v} and e_t ∈ R^{n_e} are independent noises whose known distributions are independent of t, ξ_t and y_t, and f_t(·) and h_t(·) are known functions. The filtering problem consists of recursively estimating the posterior density p(ξ_t | y_{0:t}), where y_{0:t} ≜ {y_0, ..., y_t}. Analytic solutions to the filtering problem are only available for a relatively small and restricted class of systems, the most important being the Kalman filter [19], which assumes that system (1) has a linear-Gaussian structure. A class of powerful numerical algorithms for the filtering problem are particle filters (PFs), which are sequential Monte Carlo methods based on particle representations of probability densities [2]. Since the seminal work [15], PFs have become an important tool in handling the nonlinear non-Gaussian filtering problem, and have found many applications in statistical signal processing, economics and engineering; see, e.g., [2,12,14,16] for recent surveys of PFs.

As pointed out in [5], the application of PFs in real-time systems is limited due to their computational complexity, which is mainly caused by the resampling involved in the PF. The resampling is essential in the implementation of the PF, as without resampling the variance of the importance weights will increase over time [13]. The resampling, however, introduces a practical problem: it limits the opportunity to parallelize, since all the particles must be combined, although the particle generation and the importance weight calculation of the PF can still be realized in parallel [13]. Therefore, the resampling becomes a bottleneck for shortening the execution time of the PF. Recently, some distributed resampling algorithms for parallel implementation of PFs have been proposed in [5,22]. The idea of the distributed resampling is to divide the sample space into several strata or groups such that the resampling can be performed independently for each stratum or group and can thus be implemented in parallel. The effect of different distributed resampling algorithms on the variance of the importance weights has been analyzed in [22]. Based on the introduced distributed resampling algorithms, a couple of distributed PFs have been further proposed in [5,22], such as the distributed resampling with proportional allocation PF (DRPA-PF) and the distributed resampling with nonproportional allocation PF (DRNA-PF).

In this paper, a new PF, which we refer to as the decentralized PF (DPF), is proposed. By first decomposing the state into two parts, the DPF splits the filtering problem of system (1) into two nested sub-problems and then handles the two nested sub-problems using PFs. The DPF has the advantage over the regular PF that it can increase the level of parallelism of the PF, in the sense that besides the particle generation and the importance weight calculation, part of the resampling in the DPF can also be implemented in parallel. As will be seen from the DPF algorithm, there are actually two resampling steps in the DPF. The first resampling in the DPF, like the resampling in the regular PF, cannot be implemented in parallel, but the second resampling bears a parallel structure and can thus be implemented in parallel. Hence, the parallel implementation of the DPF can be used to shorten the execution time of the PF.

The underlying idea of the DPF is different from that of the existing distributed PFs, although they all have a parallel structure. The parallel structure of the DPF is created by decomposing the state space, differing from the parallel structure of the distributed PFs, which is created by dividing the sample space. This difference results in a couple of unique features of the DPF in contrast with the existing distributed PFs.

First, compared to the DRPA-PF, the DPF allows a simpler scheme for particle routing and actually treats each processing element as a particle in the particle routing. Second, the DPF does not have the efficiency decrease problem of the DRPA-PF. Given a PF with parallel structure, it works most efficiently if each processing element handles the same number of particles. However, the efficiency of the DRPA-PF usually decreases, since the numbers of particles produced by the processing elements are not evenly but randomly distributed among the processing elements. Third, it will be verified by two numerical examples that the DPF has the potential to achieve, in a shorter execution time, the same level of performance as the bootstrap PF. In contrast, the DRNA-PF actually trades PF performance for the speed improvement [5]. Besides, the level of parallelism of the DPF can be further increased in two ways so that the execution time of the parallel implementation of the DPF can be further shortened; the first is to utilize any of the distributed resampling algorithms proposed in [5,22] to perform the first resampling of the DPF, and the other is based on an extension of the DPF. As a result, the DPF is a new option for the application of PFs in real-time systems and the parallel implementation of PFs.

The rest of the paper is organized as follows. The problem formulation is given in Section II. In Section III, the DPF algorithm is first derived and then summarized, and finally some issues regarding the implementation of the DPF are discussed. In Section IV, some discussions about the DPF and its comparison with the existing distributed PFs are made. In Section V, two numerical examples are elaborated to show the efficacy of the DPF. Finally, we conclude the paper in Section VI.

II. PROBLEM FORMULATION

A. Intuitive preview

The formulas for particle filtering tend to look complex, and it may be easy to get lost in indices and update expressions. Let us therefore provide a simple and intuitive preview to get the main ideas across.

Filtering is about determining the posterior densities of the states. If the state is two-dimensional with components x and z, say, the density is a surface over the x-z plane, see Fig. 1. One way to estimate the density is to fix M points in the plane in a regular grid (Fig. 1.a) and update the values of the density according to Bayesian formulas. This is known as the point-mass filter [3,6]. Another way is to throw M points at the plane at random (Fig. 1.b), let them move to important places in the plane, and update the values of the densities at the chosen points, using Bayesian formulas. This is a simplified view of what happens in the regular PF. A third way is illustrated in Fig. 1.c: let the points move to well chosen locations, but restrict them to be aligned parallel to one of the axes (the z-axis in the plot). The parallel lines can move freely, as can the points on the lines, but there is a restriction of the pattern as depicted. The algorithm we develop in this paper (DPF) gives both the movements of the lines and the positions of the points on the lines, and the density values at the chosen points, by application of Bayesian formulas.

[Fig. 1. Patterns for points where the posterior densities are computed/estimated. a. A fixed regular grid used in the point-mass filter. b. Randomly allocated points following the regular PF equations. c. Points randomly allocated to vertical parallel lines (which themselves are randomly located) used in the DPF.]

It is well known that the regular PF outperforms the point-mass filter with the same number of points, since it can concentrate them to important areas. One would thus expect that the DPF would give worse accuracy than the regular PF with the same number of points, since it is less "flexible" in the allocation of points. On the other hand, the structure might allow more efficient ways of calculating new point locations and weights. That is what we will develop and study in the following sections.

B. Problem statement

Consider system (1). Suppose that the state ξ_t can be decomposed as

  ξ_t = [x_t; z_t]    (2)

and accordingly that system (1) can be decomposed as

  x_{t+1} = f_t^x(x_t, z_t, v_t^x)
  z_{t+1} = f_t^z(x_t, z_t, v_t^z)    (3)
  y_t = h_t(x_t, z_t, e_t)

where x_t ∈ R^{n_x}, z_t ∈ R^{n_z}, and v_t = [(v_t^x)^T (v_t^z)^T]^T with v_t^x ∈ R^{n_{v^x}} and v_t^z ∈ R^{n_{v^z}}. In the following, it is assumed for convenience that the probability densities p(x_0), p(z_0 | x_0) and, for t ≥ 0, p(x_{t+1} | x_t, z_t), p(z_{t+1} | x_{t:t+1}, z_t) and p(y_t | x_t, z_t) are known.

In this paper, we will study the filtering problem of recursively estimating the posterior density p(z_t, x_{0:t} | y_{0:t}). According to the following factorization

  p(z_t, x_{0:t} | y_{0:t}) = p(z_t | x_{0:t}, y_{0:t}) p(x_{0:t} | y_{0:t})    (4)

where x_{0:t} ≜ {x_0, ..., x_t}, the filtering problem (4) can be split into two nested sub-problems:

1) recursively estimating the density p(x_{0:t} | y_{0:t});
2) recursively estimating the density p(z_t | x_{0:t}, y_{0:t}).

That the two sub-problems are nested can be seen from Fig. 2, where we have sketched the five steps used to derive the recurrence relation of the conceptual solution to the filtering problem (4). Since there is in general no analytic solution to the filtering problem (4), a numerical algorithm, i.e., the DPF, is introduced to provide recursively the empirical approximations to p(x_{0:t} | y_{0:t}) and p(z_t | x_{0:t}, y_{0:t}). The DPF actually handles the two nested sub-problems using PFs. Roughly speaking, the DPF solves the first sub-problem using a PF with N_x particles x_{0:t}^{(i)}, i = 1, ..., N_x, to estimate p(x_{0:t} | y_{0:t}). Then the DPF handles the second sub-problem using N_x PFs with N_z particles each to estimate p(z_t | x_{0:t}^{(i)}, y_{0:t}), i = 1, ..., N_x. As a result of the nestedness of the two sub-problems, it will be seen later that the steps of the PF used to estimate p(x_{0:t} | y_{0:t}) are nested with those of the N_x PFs used to estimate p(z_t | x_{0:t}^{(i)}, y_{0:t}).

[Fig. 2. Sketch of the steps used to derive the conceptual solution to the filtering problem (4): p(x_{0:t} | y_{0:t-1}) and p(z_t | x_{0:t}, y_{0:t-1}) lead to p(x_{0:t} | y_{0:t}) via p(y_t | x_{0:t}, y_{0:t-1}), which uses p(y_t | x_t, z_t); then to p(z_t | x_{0:t}, y_{0:t}); then to p(x_{0:t+1} | y_{0:t}) via p(x_{t+1} | x_{0:t}, y_{0:t}), which uses p(x_{t+1} | x_t, z_t); then to p(z_t | x_{0:t+1}, y_{0:t}); and finally to p(z_{t+1} | x_{0:t+1}, y_{0:t}) via p(z_{t+1} | x_{t:t+1}, z_t). Assume that the probability densities p(x_0), p(z_0 | x_0) and, for t ≥ 0, p(x_{t+1} | x_t, z_t), p(z_{t+1} | x_{t:t+1}, z_t) and p(y_t | x_t, z_t) are known. With a slight abuse of notation, let p(x_0 | y_{0:-1}) = p(x_0) and p(z_0 | x_0, y_{0:-1}) = p(z_0 | x_0). Then five steps are needed to derive the recurrence relation of the conceptual solution to the filtering problem (4). The specific relations between the different densities can be obtained by straightforward application of Bayesian formulas and thus are omitted. Instead, the notation p_1(·) → p_2(·) is used to indicate that the calculation of the density p_2(·) makes use of the knowledge of the density p_1(·).]

Remark 2.1: The idea of decomposing the state into two parts and accordingly splitting the filtering problem into two nested sub-problems is not new. Actually, it has been used in the Rao-Blackwellized PF (RBPF); see, e.g., [1,8,9,11,13,25]. However, the RBPF imposes a certain tractable substructure assumption on the system considered and hence solves one of the sub-problems with a number of optimal filters, such as the Kalman filter [19] or the HMM filter [24]. In particular, the filtering problem (4) has been previously studied in [25], where system (3) is assumed to be conditionally (on x_t) linear in z_t and subject to Gaussian noise. Due to these assumptions, the state z_t of system (3) is marginalized out by using the Kalman filter. However, since there is no tractable substructure assumption made on system (3) in this paper, no part of the state ξ_t is analytically tractable as was the case in [1,8,9,11,13,25]. ♦

In the following, let x̃ ∼ p(x) denote that x̃ is a sample drawn from the density p(x) of the random variable x, let N(m, Σ) denote the (multivariate) Gaussian probability density with mean vector m and covariance matrix Σ, and let P(A) denote the probability of the event A. For convenience, for each i = 1, ..., N, α_i = β_i / Σ_{j=1}^N β_j is denoted by

  α_i ∝ β_i;  Σ_{i=1}^N α_i = 1    (5)

where N is a natural number, α_i, β_i, i = 1, ..., N, are positive real numbers, and α_i ∝ β_i denotes that α_i is proportional to β_i.

We here make a comment regarding some notations used throughout the paper. Suppose we have

  P_N(dξ_t) = (1/N) Σ_{i=1}^N δ_{ξ_t^{(i)}}(dξ_t)    (6)

where δ_{ξ_t^{(i)}}(A) is a Dirac measure for a given ξ_t^{(i)} and a measurable set A. Then the expectation of any test function g(ξ_t) with respect to p(ξ_t) can be approximated by

  ∫ g(ξ_t) p(ξ_t) dξ_t ≈ ∫ g(ξ_t) P_N(dξ_t) = (1/N) Σ_{i=1}^N g(ξ_t^{(i)})    (7)

Although the distribution P_N(dξ_t) does not have a well-defined density with respect to the Lebesgue measure, it is a common convention in the particle filtering community [2,15] to use

  p_N(ξ_t) = (1/N) Σ_{i=1}^N δ(ξ_t - ξ_t^{(i)})    (8)

where δ(·) is a Dirac delta function, as if it were an empirical approximation of the probability density p(ξ_t) with respect to the Lebesgue measure. Notations like (8) and the corresponding terms mentioned above are, in the most rigorous mathematical sense, not correct. However, they enable one to avoid the use of measure theory, which indeed simplifies the presentation a lot, especially when a theoretical convergence proof is not the concern.
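The conventions (5)-(8) translate directly into code. The following sketch (function names are ours, for illustration only) normalizes unnormalized weights as in (5) and approximates an expectation by the particle average of (7):

```python
import numpy as np

def normalize(beta):
    # Eq. (5): alpha_i is proportional to beta_i and the alphas sum to one.
    beta = np.asarray(beta, dtype=float)
    return beta / beta.sum()

def empirical_expectation(g, particles):
    # Eq. (7): E[g(xi_t)] is approximated by averaging g over the particles
    # xi_t^(1), ..., xi_t^(N) drawn from (an approximation of) p(xi_t).
    return np.mean([g(p) for p in particles], axis=0)
```

For instance, `empirical_expectation(lambda x: x, particles)` gives the particle mean, the simplest test function used in convergence analyses.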

III. DECENTRALIZED PARTICLE FILTER

In this section, the DPF algorithm is first derived and then summarized. Finally, some issues regarding the implementation of the DPF are discussed.

A. Derivation of the DPF algorithm

First, we initialize the particles x̃_0^{(i)} ∼ p(x_0), i = 1, ..., N_x, and for each x̃_0^{(i)}, the particles z̃_0^{(i,j)} ∼ p(z_0 | x̃_0^{(i)}), j = 1, ..., N_z. Then the derivation of the DPF algorithm is completed in two steps by the induction principle. In the first step, we show that Inductive Assumptions 3.1 to 3.3, to be introduced below, hold at t = 1. In the second step, assuming that Inductive Assumptions 3.1 to 3.3 hold at t for t ≥ 1, we show that they hold recursively at t + 1.

In the following, we first introduce Inductive Assumptions 3.1 to 3.3 at t.

Assumption 3.1: The DPF produces N_x particles x_{0:t-1}^{(i)}, i = 1, ..., N_x, and an unweighted empirical approximation of p(x_{0:t-1} | y_{0:t-1}) as

  p̃_{N_x}(x_{0:t-1} | y_{0:t-1}) = (1/N_x) Σ_{i=1}^{N_x} δ(x_{0:t-1} - x_{0:t-1}^{(i)})    (9)

and for each path x_{0:t-1}^{(i)}, i = 1, ..., N_x, the DPF produces N_z particles z̄_{t-1}^{(i,j)}, j = 1, ..., N_z, and a weighted empirical approximation of p(z_{t-1} | x_{0:t-1}^{(i)}, y_{0:t-1}) as

  p_{N_z}(z_{t-1} | x_{0:t-1}^{(i)}, y_{0:t-1}) = Σ_{j=1}^{N_z} q̄_{t-1}^{(i,j)} δ(z_{t-1} - z̄_{t-1}^{(i,j)})    (10)

where q̄_{t-1}^{(i,j)} is the importance weight, whose definition will be given in the derivation.

Assumption 3.2: The particles x̃_t^{(i)}, i = 1, ..., N_x, are generated according to the proposal function π(x_t | x_{0:t-1}^{(i)}, y_{0:t-1}), and for each x̃_t^{(i)}, i = 1, ..., N_x, the particles z̃_t^{(i,j)}, j = 1, ..., N_z, are generated according to the proposal function π(z_t | x̃_{0:t}^{(i)}, y_{0:t-1}), where x̃_{0:t}^{(i)} ≜ (x_{0:t-1}^{(i)}, x̃_t^{(i)}).

Assumption 3.3: For each i = 1, ..., N_x, an approximation p_{N_z}(y_t | x̃_{0:t}^{(i)}, y_{0:t-1}) of p(y_t | x̃_{0:t}^{(i)}, y_{0:t-1}) can be obtained as

  p_{N_z}(y_t | x̃_{0:t}^{(i)}, y_{0:t-1}) = Σ_{j=1}^{N_z} r̃_t^{(i,j)} p(y_t | x̃_t^{(i)}, z̃_t^{(i,j)}) / Σ_{l=1}^{N_z} r̃_t^{(i,l)}    (11)

with r̃_t^{(i,j)} ≜ p̃_{N_z}(z̃_t^{(i,j)} | x̃_{0:t}^{(i)}, y_{0:t-1}) / π(z̃_t^{(i,j)} | x̃_{0:t}^{(i)}, y_{0:t-1}), where p̃_{N_z}(z̃_t^{(i,j)} | x̃_{0:t}^{(i)}, y_{0:t-1}) is an approximation of p(z̃_t^{(i,j)} | x̃_{0:t}^{(i)}, y_{0:t-1}) whose definition will be given in the derivation.

Remark 3.1: In Inductive Assumptions 3.1 and 3.3, p̃_{N_x}(x_{0:t-1} | y_{0:t-1}) and p_{N_z}(z_{t-1} | x_{0:t-1}^{(i)}, y_{0:t-1}) are empirical approximations of p(x_{0:t-1} | y_{0:t-1}) and p(z_{t-1} | x_{0:t-1}^{(i)}, y_{0:t-1}), respectively. These approximations should of course be as close to the true underlying density as possible. This closeness is typically assessed via some test function g_{t-1}(·) used in the following sense,

  I(g_{t-1}) = ∫ g_{t-1}(z_{t-1}, x_{0:t-1}) p(z_{t-1}, x_{0:t-1} | y_{0:t-1}) dz_{t-1} dx_{0:t-1}    (12)

where g_{t-1}: R^{n_z} × R^{t×n_x} → R is a test function. With the empirical approximations defined in (9) and (10), an estimate I_{N_x,N_z}(g_{t-1}) of I(g_{t-1}) is obtained as follows

  I_{N_x,N_z}(g_{t-1}) = (1/N_x) Σ_{i=1}^{N_x} Σ_{j=1}^{N_z} q̄_{t-1}^{(i,j)} g_{t-1}(z̄_{t-1}^{(i,j)}, x_{0:t-1}^{(i)})    (13)

A convergence result formalizing the "closeness" of (13) to (12), as N_x and N_z tend to infinity, will be in line with our earlier convergence results [18] of the PF for rather arbitrary unbounded test functions. ♦

In the first step we should show that Inductive Assumptions 3.1 to 3.3 hold at t = 1. However, since the first step is exactly the same as the second step, we only consider the second step here due to the space limitation. In the second step, assuming that Inductive Assumptions 3.1 to 3.3 hold at t, we show in seven steps that they recursively hold at t + 1.

1) Measurement update of x_{0:t} based on y_t:
By using importance sampling, this step aims to derive a weighted empirical approximation of p(x_{0:t} | y_{0:t}). Note that an unweighted empirical approximation p̃_{N_x}(x_{0:t-1} | y_{0:t-1}) of p(x_{0:t-1} | y_{0:t-1}) has been given as (9) in Inductive Assumption 3.1; then a simple calculation shows that the following equation holds approximately

  p(x_{0:t} | y_{0:t}) ∝ p̃_{N_x}(x_{0:t-1} | y_{0:t-1}) p(x_t | x_{0:t-1}, y_{0:t-1}) p(y_t | x_{0:t}, y_{0:t-1})    (14)

From Inductive Assumptions 3.1 and 3.2, x_{0:t-1}^{(i)}, i = 1, ..., N_x, can be regarded as samples drawn from p̃_{N_x}(x_{0:t-1} | y_{0:t-1}), and for i = 1, ..., N_x, x̃_t^{(i)} is the sample drawn from π(x_t | x_{0:t-1}^{(i)}, y_{0:t-1}). Therefore p̃_{N_x}(x_{0:t-1} | y_{0:t-1}) π(x_t | x_{0:t-1}, y_{0:t-1}) can be treated as the importance function. On the other hand, from (10) and

  p(x_t | x_{0:t-1}, y_{0:t-1}) = ∫ p(z_{t-1} | x_{0:t-1}, y_{0:t-1}) p(x_t | x_{t-1}, z_{t-1}) dz_{t-1}    (15)

an approximation p_{N_z}(x̃_t^{(i)} | x_{0:t-1}^{(i)}, y_{0:t-1}) of p(x̃_t^{(i)} | x_{0:t-1}^{(i)}, y_{0:t-1}) can be obtained as

  p_{N_z}(x̃_t^{(i)} | x_{0:t-1}^{(i)}, y_{0:t-1}) = Σ_{j=1}^{N_z} q̄_{t-1}^{(i,j)} p(x̃_t^{(i)} | x_{t-1}^{(i)}, z̄_{t-1}^{(i,j)})    (16)

Moreover, an approximation p_{N_z}(y_t | x̃_{0:t}^{(i)}, y_{0:t-1}) of p(y_t | x̃_{0:t}^{(i)}, y_{0:t-1}) has been given as (11) in Inductive Assumption 3.3.

Now using importance sampling yields a weighted empirical approximation of p(x_{0:t} | y_{0:t}) as

  p_{N_x}(x_{0:t} | y_{0:t}) = Σ_{i=1}^{N_x} w_t^{(i)} δ(x_{0:t} - x̃_{0:t}^{(i)})    (17)

where w_t^{(i)} is evaluated according to

  w_t^{(i)} ∝ p_{N_z}(y_t | x̃_{0:t}^{(i)}, y_{0:t-1}) p_{N_z}(x̃_t^{(i)} | x_{0:t-1}^{(i)}, y_{0:t-1}) / π(x̃_t^{(i)} | x_{0:t-1}^{(i)}, y_{0:t-1});  Σ_{i=1}^{N_x} w_t^{(i)} = 1    (18)
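As a numerical sketch of step 1), the code below first evaluates p_{N_z}(y_t | x̃_{0:t}^{(i)}, y_{0:t-1}) via the weighted likelihood average (11) and then the normalized weights w_t^{(i)} of (18). It assumes the simplified case where the x-proposal is taken as the approximation p_{N_z}(x̃_t^{(i)} | x_{0:t-1}^{(i)}, y_{0:t-1}) itself, so that the ratio in (18) approximately cancels; the array names are ours, not from the paper:

```python
import numpy as np

def outer_weights(lik, r_tilde):
    # lik[i, j]     = p(y_t | x~_t^(i), z~_t^(i,j))
    # r_tilde[i, j] = r~_t^(i,j) from Assumption 3.3.
    # Returns w[i] of eq. (18), assuming the proposal ratio
    # p_Nz(x~_t^(i) | .) / pi(x~_t^(i) | .) is (approximately) one.
    lik = np.asarray(lik, dtype=float)
    r = np.asarray(r_tilde, dtype=float)
    # Eq. (11): r-weighted average of the measurement likelihood over j.
    p_y = (r * lik).sum(axis=1) / r.sum(axis=1)
    # Eq. (18): normalize over the Nx x-particles.
    return p_y / p_y.sum()
```

Note that the final normalization sums over all i, which is one of the two operations in the DPF that cannot be parallelized over the processing elements.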

2) Resampling of {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}}, i = 1, ..., N_x:
By using resampling, this step aims to derive an unweighted empirical approximation of p(x_{0:t} | y_{0:t}). Resample N_x times from the discrete distribution over {{x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}}, i = 1, ..., N_x}, with probability mass w_t^{(i)} associated with the element {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}}, to generate samples {x_{0:t}^{(i)}, z̄_t^{(i,1)}, r_t^{(i,1)}, ..., z̄_t^{(i,N_z)}, r_t^{(i,N_z)}}, i = 1, ..., N_x, so that for any m,

  P{ {x_{0:t}^{(m)}, z̄_t^{(m,1)}, r_t^{(m,1)}, ..., z̄_t^{(m,N_z)}, r_t^{(m,N_z)}} = {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}} } = w_t^{(i)}    (19)

where r_t^{(i,j)} can be defined as r_t^{(i,j)} = p̃_{N_z}(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}) / π(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}), according to the definition of r̃_t^{(i,j)} in Inductive Assumption 3.3 (see Section III-C1 for a more detailed explanation).

As a result, it follows from (17) that an unweighted empirical approximation of p(x_{0:t} | y_{0:t}) is obtained as

  p̃_{N_x}(x_{0:t} | y_{0:t}) = (1/N_x) Σ_{i=1}^{N_x} δ(x_{0:t} - x_{0:t}^{(i)})    (20)

3) Measurement update of z_t based on y_t:
By using importance sampling, this step aims to derive a weighted empirical approximation of p(z_t | x_{0:t}^{(i)}, y_{0:t}). A simple calculation shows that

  p(z_t | x_{0:t}^{(i)}, y_{0:t}) ∝ p(z_t | x_{0:t}^{(i)}, y_{0:t-1}) p(y_t | x_t^{(i)}, z_t)    (21)

From Inductive Assumption 3.2, for each x_t^{(i)}, i = 1, ..., N_x, the particles z̄_t^{(i,j)}, j = 1, ..., N_z, are generated from the proposal function π(z_t | x_{0:t}^{(i)}, y_{0:t-1}). Therefore π(z_t | x_{0:t}^{(i)}, y_{0:t-1}) is chosen as the importance function. On the other hand, note that an approximation p̃_{N_z}(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}) of p(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}) has been given in Inductive Assumption 3.3. Then using importance sampling and also noting the definition of r_t^{(i,j)} yields, for each i = 1, ..., N_x, a weighted empirical approximation of p(z_t | x_{0:t}^{(i)}, y_{0:t}) as

  p_{N_z}(z_t | x_{0:t}^{(i)}, y_{0:t}) = Σ_{j=1}^{N_z} q̄_t^{(i,j)} δ(z_t - z̄_t^{(i,j)})    (22)

where q̄_t^{(i,j)} is evaluated according to

  q̄_t^{(i,j)} ∝ p(y_t | x_t^{(i)}, z̄_t^{(i,j)}) r_t^{(i,j)};  Σ_{j=1}^{N_z} q̄_t^{(i,j)} = 1    (23)

4) Generation of particles x̃_{t+1}^{(i)}, i = 1, ..., N_x:
Assume that the particles x̃_{t+1}^{(i)}, i = 1, ..., N_x, are generated according to the proposal function π(x_{t+1} | x_{0:t}^{(i)}, y_{0:t}).

5) Measurement update of z_t based on x_{t+1}:
By using importance sampling, this step aims to derive a weighted empirical approximation of p(z_t | x̃_{0:t+1}^{(i)}, y_{0:t}). A simple calculation shows that

  p(z_t | x̃_{0:t+1}^{(i)}, y_{0:t}) ∝ p(z_t | x_{0:t}^{(i)}, y_{0:t-1}) p(y_t | x_t^{(i)}, z_t) p(x̃_{t+1}^{(i)} | x_t^{(i)}, z_t)    (24)

Analogously to step 3), choose π(z_t | x_{0:t}^{(i)}, y_{0:t-1}) as the importance function and note that p̃_{N_z}(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}) is an approximation of p(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}). Using importance sampling and also noting the definition of r_t^{(i,j)} yields, for each i = 1, ..., N_x, a weighted empirical approximation of p(z_t | x̃_{0:t+1}^{(i)}, y_{0:t}) as

  p_{N_z}(z_t | x̃_{0:t+1}^{(i)}, y_{0:t}) = Σ_{j=1}^{N_z} q_t^{(i,j)} δ(z_t - z̄_t^{(i,j)})    (25)

where x̃_{0:t+1}^{(i)} ≜ (x_{0:t}^{(i)}, x̃_{t+1}^{(i)}) and q_t^{(i,j)} is evaluated according to

  q_t^{(i,j)} ∝ p(y_t | x_t^{(i)}, z̄_t^{(i,j)}) p(x̃_{t+1}^{(i)} | x_t^{(i)}, z̄_t^{(i,j)}) r_t^{(i,j)};  Σ_{j=1}^{N_z} q_t^{(i,j)} = 1    (26)

6) Resampling of the particles z̄_t^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z:
By using resampling, this step aims to derive an unweighted empirical approximation of p(z_t | x̃_{0:t+1}^{(i)}, y_{0:t}). For each i = 1, ..., N_x, resample N_z times from the discrete distribution over {z̄_t^{(i,j)}, j = 1, ..., N_z}, with probability mass q_t^{(i,j)} associated with the element z̄_t^{(i,j)}, to generate samples {z_t^{(i,j)}, j = 1, ..., N_z}, so that for any m,

  P{ z_t^{(i,m)} = z̄_t^{(i,j)} } = q_t^{(i,j)}    (27)

As a result, it follows from (25) that for each i = 1, ..., N_x, an unweighted empirical approximation of p(z_t | x̃_{0:t+1}^{(i)}, y_{0:t}) is as follows

  p̃_{N_z}(z_t | x̃_{0:t+1}^{(i)}, y_{0:t}) = (1/N_z) Σ_{j=1}^{N_z} δ(z_t - z_t^{(i,j)})    (28)
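The grouped structure of step 6) is what makes the second resampling parallel: each index i can be resampled independently. A minimal sketch follows (multinomial resampling per group, with a plain loop standing in for the parallel processing elements; the names are ours):

```python
import numpy as np

def second_resampling(z_bar, q, rng=None):
    # z_bar[i, j] and q[i, j] hold the particles z_bar_t^(i,j) and the
    # normalized weights q_t^(i,j) of eq. (26). For each i, draw Nz indices
    # with probabilities q[i], as in eq. (27). The Nx groups are
    # independent, so each loop iteration could run on its own
    # processing element.
    if rng is None:
        rng = np.random.default_rng(0)
    Nx, Nz = q.shape
    out = np.empty_like(z_bar)
    for i in range(Nx):
        idx = rng.choice(Nz, size=Nz, p=q[i])
        out[i] = z_bar[i, idx]
    return out
```

Systematic or stratified resampling within each group would work equally well here; only the independence across i matters for the parallel structure.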

7) Generation of particles z̃_{t+1}^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z:
Assume that for each x̃_{t+1}^{(i)}, i = 1, ..., N_x, the particles z̃_{t+1}^{(i,j)}, j = 1, ..., N_z, are generated according to the proposal function π(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}). By using importance sampling, we try to derive a weighted empirical approximation of p(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}). First, choose π(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}) as the importance function. Then from (28) and

  p(z_{t+1} | x_{0:t+1}, y_{0:t}) = ∫ p(z_t | x_{0:t+1}, y_{0:t}) p(z_{t+1} | x_{t:t+1}, z_t) dz_t    (29)

an approximation of p(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}), i = 1, ..., N_x, can be obtained as

  p̃_{N_z}(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}) = (1/N_z) Σ_{l=1}^{N_z} p(z_{t+1} | x̃_{t:t+1}^{(i)}, z_t^{(i,l)})    (30)

Now using importance sampling yields, for i = 1, ..., N_x, a weighted empirical approximation of p(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}) as follows

  p_{N_z}(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}) = Σ_{j=1}^{N_z} r̃_{t+1}^{(i,j)} δ(z_{t+1} - z̃_{t+1}^{(i,j)}) / Σ_{l=1}^{N_z} r̃_{t+1}^{(i,l)}    (31)

where r̃_{t+1}^{(i,j)} is evaluated according to

  r̃_{t+1}^{(i,j)} = p̃_{N_z}(z̃_{t+1}^{(i,j)} | x̃_{0:t+1}^{(i)}, y_{0:t}) / π(z̃_{t+1}^{(i,j)} | x̃_{0:t+1}^{(i)}, y_{0:t})    (32)

Finally, from (31) and

  p(y_{t+1} | x_{0:t+1}, y_{0:t}) = ∫ p(z_{t+1} | x_{0:t+1}, y_{0:t}) p(y_{t+1} | x_{t+1}, z_{t+1}) dz_{t+1}    (33)

an approximation p_{N_z}(y_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}) of p(y_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}) can be obtained as (11) with t replaced by t + 1. In turn, it can be seen from (20), (22), step 4), step 7), and (32) that Inductive Assumptions 3.1 to 3.3 hold with t replaced by t + 1. Hence, we have completed the derivation of the DPF by the induction principle.

B. Summary of the filtering algorithm

Initialization:
Initialize the particles x̃_0^{(i)} ∼ p(x_0), i = 1, ..., N_x, and for each x̃_0^{(i)}, the particles z̃_0^{(i,j)} ∼ p(z_0 | x̃_0^{(i)}), j = 1, ..., N_z. With a slight abuse of notation, let p(x_0) = p_{N_z}(x_0 | x_{0:-1}, y_{0:-1}) = π(x_0 | x_{0:-1}, y_{0:-1}) and p(z_0 | x_0) = p̃_{N_z}(z_0 | x_0, y_{0:-1}) = π(z_0 | x_0, y_{0:-1}).

At each time instant (t ≥ 0):
1) Measurement update of x_{0:t} based on y_t: the importance weights w_t^{(i)}, i = 1, ..., N_x, are evaluated according to (18).
2) Resampling of {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}}, i = 1, ..., N_x: according to (19), resample {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}}, i = 1, ..., N_x, to generate samples {x_{0:t}^{(i)}, z̄_t^{(i,1)}, r_t^{(i,1)}, ..., z̄_t^{(i,N_z)}, r_t^{(i,N_z)}}, i = 1, ..., N_x, where for t ≥ 0, r̃_t^{(i,j)} is defined according to (32).
3) Measurement update of z_t based on y_t: for i = 1, ..., N_x, the importance weights q̄_t^{(i,j)}, j = 1, ..., N_z, are evaluated according to (23).
4) Generation of particles x̃_{t+1}^{(i)}, i = 1, ..., N_x: for each i = 1, ..., N_x, x̃_{t+1}^{(i)} is generated according to the proposal function π(x_{t+1} | x_{0:t}^{(i)}, y_{0:t}).
5) Measurement update of z_t based on x_{t+1}: for i = 1, ..., N_x, the importance weights q_t^{(i,j)}, j = 1, ..., N_z, are evaluated according to (26).
6) Resampling of the particles z̄_t^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z: according to (27), for each i = 1, ..., N_x, resample the particles z̄_t^{(i,j)}, j = 1, ..., N_z, to generate samples z_t^{(i,j)}, j = 1, ..., N_z.
7) Generation of particles z̃_{t+1}^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z: for each i = 1, ..., N_x, the particles z̃_{t+1}^{(i,j)}, j = 1, ..., N_z, are generated according to the proposal function π(z_{t+1} | x̃_{0:t+1}^{(i)}, y_{0:t}), where x̃_{0:t+1}^{(i)} ≜ (x_{0:t}^{(i)}, x̃_{t+1}^{(i)}).

C. Implementation issues

1) Two resampling steps: Unlike most PFs in the literature, the DPF has two resampling steps, i.e., step 2) and step 6). Furthermore, the second resampling bears a parallel structure. This is because the particles z̄_t^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z, can be divided into N_x independent groups in terms of the index i. Therefore, the second resampling can be implemented in parallel.

In the implementation of the first resampling, it would be helpful to note the following points. For i = 1, ..., N_x, {r̃_t^{(i,1)}, ..., r̃_t^{(i,N_z)}} is associated with {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}}, according to the definition r̃_t^{(i,j)} = p̃_{N_z}(z̃_t^{(i,j)} | x̃_{0:t}^{(i)}, y_{0:t-1}) / π(z̃_t^{(i,j)} | x̃_{0:t}^{(i)}, y_{0:t-1}). Therefore, after resampling of {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}}, i = 1, ..., N_x, the weights {r̃_t^{(i,1)}, ..., r̃_t^{(i,N_z)}}, i = 1, ..., N_x, should accordingly be resampled to obtain {r_t^{(i,1)}, ..., r_t^{(i,N_z)}}, i = 1, ..., N_x. According to the definition of r̃_t^{(i,j)}, r_t^{(i,j)} can be defined as r_t^{(i,j)} = p̃_{N_z}(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}) / π(z̄_t^{(i,j)} | x_{0:t}^{(i)}, y_{0:t-1}). As a result, {x̃_{0:t}^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}}, i = 1, ..., N_x, is resampled in step 2). Moreover, since the particles x_{0:t-1}^{(i)}, i = 1, ..., N_x, will not be used in the future, it is actually only necessary to resample {x̃_t^{(i)}, z̃_t^{(i,1)}, r̃_t^{(i,1)}, ..., z̃_t^{(i,N_z)}, r̃_t^{(i,N_z)}}, i = 1, ..., N_x, to generate {x_t^{(i)}, z̄_t^{(i,1)}, r_t^{(i,1)}, ..., z̄_t^{(i,N_z)}, r_t^{(i,N_z)}}, i = 1, ..., N_x.
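The first resampling (step 2), by contrast, draws over all i jointly; since each x-particle carries its whole z-particle set and r-weights with it (eq. (19)), the implementation reduces to sampling N_x row indices and copying rows. A sketch with illustrative array names of our own choosing:

```python
import numpy as np

def first_resampling(w, x_tilde, z_tilde, r_tilde, rng=None):
    # w[i] are the weights w_t^(i) of eq. (18); x_tilde[i], z_tilde[i, :]
    # and r_tilde[i, :] together form the composite element resampled in
    # eq. (19). Drawing row indices once keeps every x-particle together
    # with its z-particles and r-weights, which is the simple particle
    # routing scheme discussed above.
    if rng is None:
        rng = np.random.default_rng(0)
    w = np.asarray(w, dtype=float)
    idx = rng.choice(len(w), size=len(w), p=w)
    return x_tilde[idx], z_tilde[idx], r_tilde[idx]
```

This joint draw over all i is one of the operations that cannot be split across processing elements without resorting to the distributed resampling algorithms of [5,22].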

given in (30). Then, it follows from (30) and the assumption that p(z_{t+1}|x_{t:t+1}, z_t) is known that, for each i = 1, ..., N_x, the particles \tilde{z}_{t+1}^{(i,j)}, j = 1, ..., N_z, can be generated from

\pi(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, y_{0:t}) = \tilde{p}_{N_z}(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, y_{0:t})   (36)

Remark 3.2: From (25) and (29), another approximation of p(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, y_{0:t}) can be obtained as

p_{N_z}(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, y_{0:t}) = \sum_{j=1}^{N_z} q_t^{(i,j)} p(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, \bar{z}_t^{(i,j)}, y_{0:t})   (37)

This observation shows that the PF used to estimate p(z_t|x_{0:t}^{(i)}, y_{0:t}) is closely related to the so-called marginal PF [21]. A marginal PF would sample the particles \tilde{z}_{t+1}^{(i,j)}, j = 1, ..., N_z, from

\pi(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, y_{0:t}) = \sum_{j=1}^{N_z} q_t^{(i,j)} \pi(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, \bar{z}_t^{(i,j)}, y_{0:t})   (38)

According to [21], sampling from (38) is precisely equivalent to first resampling the particles \bar{z}_t^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z, according to (27) and then sampling the particle \tilde{z}_{t+1}^{(i,j)} from \pi(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, z_t^{(i,j)}, y_{0:t}). If (37) is chosen as the proposal function, i.e., \pi(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, y_{0:t}) = p_{N_z}(z_{t+1}|\tilde{x}_{0:t+1}^{(i)}, y_{0:t}) in the marginal PF, then sampling from (38) would be exactly the same as what we did. In addition, like the marginal PF, for each i = 1, ..., N_x, the computational cost of \tilde{r}_t^{(i,j)}, j = 1, ..., N_z, is O(N_z^2). However, this should not be a problem, as a small number N_z of particles is usually used to approximate p(z_t|x_{0:t}^{(i)}, y_{0:t}). Moreover, a couple of methods have been given in [21] to reduce this computational cost. On the other hand, if (36) is chosen as the proposal function, then \tilde{r}_t^{(i,j)} = 1, i = 1, ..., N_x, j = 1, ..., N_z. As a result, the resampling of \{\tilde{r}_t^{(i,1)}, ..., \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x, is not needed and the computation of (11), (23) and (26) is simplified. ♦

Remark 3.3: If system (3) has a special structure, then the construction of \pi(x_{t+1}|x_{0:t}^{(i)}, y_{0:t}) can become simpler. We mention two cases here. First, assume that the x-dynamics of system (3) is independent of z; then p(x_{t+1}|x_{0:t}^{(i)}, y_{0:t}) = p(x_{t+1}|x_t^{(i)}) and we can thus choose \pi(x_{t+1}|x_{0:t}^{(i)}, y_{0:t}) = p(x_{t+1}|x_t^{(i)}). Systems with this special structure have been studied in the literature before, see e.g. [11]. Second, assume that system (3) takes the following form

x_{t+1} = f_t^x(x_t, z_t) + g_t^x(x_t) v_t^x
z_{t+1} = f_t^z(x_t, z_t) + g_t^z(x_t, z_t) v_t^z   (39)
y_t = h_t(x_t, z_t, e_t)

where f_t^x(\cdot), f_t^z(\cdot), g_t^x(\cdot), g_t^z(\cdot) and h_t(\cdot) are known functions, and v_t is assumed white and Gaussian distributed according to

v_t = \begin{bmatrix} v_t^x \\ v_t^z \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} Q_t^x & (Q_t^{zx})^T \\ Q_t^{zx} & Q_t^z \end{bmatrix}\right)   (40)

Then from (34), a further simplified approximation of p(x_{t+1}|x_{0:t}^{(i)}, y_{0:t}) can be obtained as \mathcal{N}(\bar{x}_{t+1}^{(i)}, \Sigma_{t+1}^{(i)} + g_t^x(x_t^{(i)}) Q_t^x (g_t^x(x_t^{(i)}))^T), with \bar{x}_{t+1}^{(i)} and \Sigma_{t+1}^{(i)}, respectively, the mean and the covariance of the discrete distribution \{\tilde{x}_{t+1}^{(i,j)}, j = 1, ..., N_z\} with probability mass \bar{q}_t^{(i,j)} associated with the element \tilde{x}_{t+1}^{(i,j)} \sim p(x_{t+1}|x_t^{(i)}, \bar{z}_t^{(i,j)}). For this special case, \pi(x_{t+1}|x_{0:t}^{(i)}, y_{0:t}) = \mathcal{N}(\bar{x}_{t+1}^{(i)}, \Sigma_{t+1}^{(i)} + g_t^x(x_t^{(i)}) Q_t^x (g_t^x(x_t^{(i)}))^T). ♦

While we have assumed proposal functions of the form \pi(x_t|x_{0:t-1}^{(i)}, y_{0:t-1}) and \pi(z_t|\tilde{x}_{0:t}^{(i)}, y_{0:t-1}), it is possible to choose proposal functions of more general forms. For example, in the importance sampling of (24), the proposal function \pi(z_t|x_{0:t}^{(i)}, y_{0:t-1}) can be replaced by another proposal function of the form \pi(z_t|\tilde{x}_{0:t+1}^{(i)}, y_{0:t-1}). Moreover, this new proposal function, together with the two proposal functions \pi(x_t|x_{0:t-1}^{(i)}, y_{0:t-1}) and \pi(z_t|\tilde{x}_{0:t}^{(i)}, y_{0:t-1}), can further be made dependent on y_t. That is, in the importance sampling of (14), (21) and (24), proposal functions of the form \pi(x_t|x_{0:t-1}^{(i)}, y_{0:t}), \pi(z_t|x_{0:t}^{(i)}, y_{0:t}) and \pi(z_t|\tilde{x}_{0:t+1}^{(i)}, y_{0:t}) can be used, respectively. Due to space limitations, we do not discuss this issue here and refer the interested reader to, for example, [2,7,13,23] for relevant discussions. Finally, we remind the reader that if these more general proposal functions are used, then slight modifications need to be made to the DPF algorithm.

3) Computing the state estimate: A common application of a PF is to compute the state estimate, i.e., the expected mean of the state. For system (3), the state estimates of x_t and z_t are defined as

\bar{x}_t = E_{p(x_t|y_{0:t})}(x_t), \quad \bar{z}_t = E_{p(z_t|y_{0:t})}(z_t)   (41)

The approximations of \bar{x}_t and \bar{z}_t can then be computed in the following way for the DPF. Note from (17) that p(x_t|y_{0:t}) has an empirical approximation p_{N_x}(x_t|y_{0:t}) = \sum_{i=1}^{N_x} w_t^{(i)} \delta(x_t - \tilde{x}_t^{(i)}). Then, an approximation \hat{x}_t of \bar{x}_t can be calculated as

\hat{x}_t = E_{p_{N_x}(x_t|y_{0:t})}(x_t) = \sum_{i=1}^{N_x} w_t^{(i)} \tilde{x}_t^{(i)}   (42)

Analogously, note from (20) and (22) that p(z_t|y_{0:t}) has an empirical approximation p_{N_x,N_z}(z_t|y_{0:t}) = \frac{1}{N_x} \sum_{i=1}^{N_x} \sum_{j=1}^{N_z} \bar{q}_t^{(i,j)} \delta(z_t - \bar{z}_t^{(i,j)}). Then, an approximation \hat{z}_t of \bar{z}_t can be calculated as

\hat{z}_t = E_{p_{N_x,N_z}(z_t|y_{0:t})}(z_t) = \frac{1}{N_x} \sum_{i=1}^{N_x} \sum_{j=1}^{N_z} \bar{q}_t^{(i,j)} \bar{z}_t^{(i,j)}   (43)

IV. DISCUSSION

In the DPF algorithm, the summation calculation in the normalizing factor (the denominator) of w_t^{(i)} in (18) and the first resampling, i.e., the resampling of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x, are the only operations that cannot be implemented in parallel. The remaining operations, including the second resampling, i.e., the resampling of the particles \bar{z}_t^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z, can be divided into N_x independent parts in terms of the index i and can thus be implemented in parallel. In this sense, we say that the DPF has the advantage over the

regular PF in that the DPF can increase the level of parallelism of the PF.

Fig. 3. The architecture of the PF with parallel structure. It consists of a central unit (CU) and a number of processing elements (PEs), where the CU handles the operations that cannot be implemented in parallel and the PEs run in parallel to deal with the operations that can be implemented in parallel.

The underlying idea of the DPF is different from that of the distributed PFs, such as the DRPA-PF and the DRNA-PF (see Section I), although they all have a parallel structure. This difference results in a couple of unique features of the DPF in contrast with the distributed PFs, which identify the potential of the DPF in the application of PFs in real-time systems and in the parallel implementation of PFs. In the remaining part of this section, we first briefly review the DRPA-PF and the DRNA-PF, and then we point out the unique features of the DPF. Finally, we show that there exist two ways to further increase the level of parallelism of the DPF.

Before the discussion, it should be noted that the DPF, the DRPA-PF and the DRNA-PF all have the architecture shown in Fig. 3.

Fig. 4. Sequence of operations performed by the specified PE and the CU for the DRPA-PF and the DPF. The data transmitted between the specified PE and the CU and between different PEs are marked. For the DRPA-PF, the abbreviations are MU (measurement update of \xi_{0:t} based on y_t), Inter R (inter-resampling), Intra R (intra-resampling), PR (particle routing), and TU (generation of particles \xi_{t+1}^{(l)}, l = 1, ..., M). For the DPF, the abbreviations are MU_x (measurement update of x_{0:t} based on y_t), R_x (resampling of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x), PR (particle routing), MU_z^y (measurement update of z_t based on y_t), TU_x (generation of particles \tilde{x}_{t+1}^{(i)}, i = 1, ..., N_x), MU_z^x (measurement update of z_t based on x_{t+1}), R_z (resampling of \bar{z}_t^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z), and TU_z (generation of particles \tilde{z}_{t+1}^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z).

A. Review of the DRPA-PF and the DRNA-PF [5]

Assume that the sample space contains M particles, where M is assumed to be the number of particles that is needed for the sampling importance resampling PF (SIR-PF) [13] or the bootstrap PF [15] to achieve a satisfactory performance for the filtering problem of system (1). The sample space is then divided into K disjoint strata, where K is an integer satisfying 1 \leq K \leq M. Further assume that each stratum corresponds to a PE. Before resampling, each PE thus has N particles, where N = M/K is an integer.

The sequence of operations performed by the kth PE and the CU for the DRPA-PF is shown in Fig. 4. The inter-resampling is performed on the CU, and its function is to calculate the number of particles N^{(k)} that will be produced after resampling for the kth PE. E(N^{(k)}) should be proportional to the weight W^{(k)} of the kth PE, which is defined as the sum of the weights of the particles inside the kth PE. In particular, N^{(k)} is calculated using the residual systematic resampling (RSR) algorithm proposed in [4]. Once N^{(k)}, k = 1, ..., K, are known, resampling is performed inside the K PEs independently, which is referred to as the intra-resampling. Note that after resampling, for k = 1, ..., K, the kth PE has N^{(k)} particles, and N^{(k)} is a random number because it depends on the overall distribution of the weights W^{(k)}, k = 1, ..., K. On the other hand, note that each PE is supposed to be responsible for processing N particles and to perform the same operations in time. Therefore, after resampling, an exchange of particles among the PEs has to be performed such that each PE again has N particles. This procedure is referred to as particle routing and is conducted by the CU. According to [5], the DRPA-PF requires a complicated scheme for particle routing due to the proportional allocation rule. In order to shorten the delay caused by the complicated particle routing scheme in the DRPA-PF, the DRNA is in turn proposed in [5].

B. Unique features of the DPF

The parallel structure of the DPF is created by decomposing the state space, differing from the parallel structure of the distributed PFs, which is created by dividing the sample space. In the following, we will show that this difference results in a couple of unique features of the DPF.

Before the discussion, note that the sequence of operations performed by the ith PE and the CU for the DPF is shown in Fig. 4. The summation calculation in the normalizing factor of w_t^{(i)} in (18) and the resampling of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\},

i = 1, ..., N_x, are performed on the CU. Assume that there are N_x PEs and that, for each i = 1, ..., N_x, the ith PE handles the ith independent part of the remaining operations of the DPF. In particular, for each i = 1, ..., N_x, the ith PE handles the resampling of the particles \bar{z}_t^{(i,j)}, j = 1, ..., N_z. Therefore, the resampling of the particles \bar{z}_t^{(i,j)}, i = 1, ..., N_x, j = 1, ..., N_z, is run in parallel on the PEs.

Remark 4.1: In the particle routing of the DPF, the data transmitted between different PEs depends on the resampling result of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x. More specifically, no data will be transmitted through the ith PE if \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\} is selected only once in the resampling. Otherwise, the data transmitted through the ith PE will be \{\tilde{x}_t^{(m)}, \tilde{z}_t^{(m,1)}, \tilde{r}_t^{(m,1)}, ..., \tilde{z}_t^{(m,N_z)}, \tilde{r}_t^{(m,N_z)}\} for some m = 1, ..., N_x. In particular, if \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\} is selected more than once, then m = i; if it is not selected, then m = 1, ..., N_x with m \neq i. Therefore, the data transmitted between any two PEs, say the i_1th PE and the i_2th PE, will be either zero or \{\tilde{x}_t^{(m)}, \tilde{z}_t^{(m,1)}, \tilde{r}_t^{(m,1)}, ..., \tilde{z}_t^{(m,N_z)}, \tilde{r}_t^{(m,N_z)}\} for either m = i_1 or m = i_2. ♦

In contrast to the DRPA-PF, the DPF allows a simpler particle routing scheme. For the DRPA-PF, since after resampling the kth PE has N^{(k)} particles, which is a random number, a complicated scheme has to be used to give all K PEs equally N particles. For the DPF, however, since after the resampling of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x, all N_x PEs still have the same number of particles, the DPF allows a simpler particle routing scheme, and each PE can actually be treated as a single particle in the particle routing.

Given a PF with parallel structure, it works most efficiently if each PE handles the same number of particles. The efficiency of the DRPA-PF usually decreases, since the numbers of particles produced by the PEs are not evenly but randomly distributed among the PEs. To be specific, note that the times used by the kth PE to produce N^{(k)} particles, k = 1, ..., K, after resampling are usually not the same. This observation implies that the time used by the DRPA to produce the particles after resampling is determined by the k*th PE, the one that produces the largest N^{(k*)}. Clearly, the more unevenly the numbers of particles produced by the PEs are distributed, the more time the DRPA takes to produce the particles after resampling. In particular, in the extreme case where N^{(k*)} \gg N^{(k)} for k = 1, ..., K, k \neq k*, the efficiency of the DRPA-PF decreases significantly. For the DPF, however, the ith PE that handles the resampling of the particles \bar{z}_t^{(i,j)}, j = 1, ..., N_z, produces, after resampling, the same number of particles z_t^{(i,j)}, j = 1, ..., N_z. Therefore, the DPF does not have the efficiency decrease problem of the DRPA-PF. Besides, it will be verified by two numerical examples in the subsequent section that the DPF has the potential to achieve in a shorter execution time the same level of performance as the bootstrap PF, whereas the DRNA-PF actually trades PF performance for speed improvement [5,22].

Remark 4.2: The ideal minimum execution times T_{ex} of the DRPA-PF and the DRNA-PF have been given in [5,22]. Analogously, we can also give the ideal minimum execution time of the DPF. As in [5,22], the following assumptions are made. Consider an implementation with a pipelined processor. Assume that the execution time of the particle generation and the importance weights calculation of every particle is L T_{clk}, where L is the latency due to the pipelining and T_{clk} is the clock period. Also assume that the resampling takes the same amount of time as the particle generation and the importance weights calculation. As a result, the ideal minimum execution time of the DPF is T_{ex}^{DPF} = (2N_z + L + N_x + M_r + 1) T_{clk}. Here, 2N_z represents the delay due to the resampling of the particles \bar{z}_t^{(i,j)}, j = 1, ..., N_z, and the corresponding particle generation and importance weights calculation; N_x represents the delay due to the resampling of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x; M_r is the delay due to the particle routing; and the extra one T_{clk} is due to the particle generation and importance weight calculation of the particle x_t^{(i)}. ♦

C. Two ways to further increase the level of parallelism of the DPF

The first resampling of the DPF, i.e., the resampling of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x, is the major operation that cannot be implemented in parallel. If N_x is large, then this resampling will cause a large delay. In order to further increase the level of parallelism of the DPF and shorten the execution time, it is valuable to find ways to handle this problem. Two possible ways will be given here.

The first one is straightforward: employ any of the distributed resampling algorithms proposed in [5,22] to perform the first resampling of the DPF and leave the remaining parts of the DPF unchanged. Nonetheless, we prefer the DRPA to the other distributed resampling algorithms, since it produces the same result as the systematic resampling [20] according to [4].

Compared to the first way, the second way only applies to a high dimensional system (1), and it is based on an extension of the DPF. We have assumed above that the state \xi_t is decomposed into two parts according to (2). Actually, the DPF can be extended to handle the case where the state \xi_t is decomposed into more than two (at most n_\xi) parts. For illustration, we briefly consider the case where the state \xi_t is decomposed into three parts; the more general case can be studied using the induction principle.

For convenience, assume that the state x_t in (2) can be further decomposed into two parts, i.e.,

x_t = \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}   (44)

and accordingly, system (3) can be further decomposed into the following form

x_{1,t+1} = f_t^{x_1}(x_{1,t}, x_{2,t}, z_t, v_t^{x_1})
x_{2,t+1} = f_t^{x_2}(x_{1,t}, x_{2,t}, z_t, v_t^{x_2})
z_{t+1} = f_t^{z}(x_{1,t}, x_{2,t}, z_t, v_t^{z})   (45)
y_t = h_t(x_{1,t}, x_{2,t}, z_t, e_t)

where x_{1,t} \in R^{n_{x_1}}, x_{2,t} \in R^{n_{x_2}}, and v_t^x = [(v_t^{x_1})^T (v_t^{x_2})^T]^T with v_t^{x_1} \in R^{n_{v_{x_1}}}, v_t^{x_2} \in R^{n_{v_{x_2}}}. Assume that the probability densities p(x_{1,0}|y_{-1}) = p(x_{1,0}), p(x_{2,0}|x_{1,0}, y_{-1}) = p(x_{2,0}|x_{1,0}), p(z_0|x_{1,0}, x_{2,0}, y_{-1}) = p(z_0|x_{1,0}, x_{2,0}) and, for t \geq 0, p(x_{1,t+1}|x_{1,t}, x_{2,t}, z_t), p(x_{2,t+1}|x_{1,t:t+1}, x_{2,t}, z_t), p(z_{t+1}|x_{1,t:t+1}, x_{2,t:t+1}, z_t) and p(y_t|x_{1,t}, x_{2,t}, z_t) are known.

The filtering problem of system (45) can be split into three nested sub-problems according to the following factorization

p(z_t, x_{1,0:t}, x_{2,0:t}|y_{0:t}) = p(z_t|x_{1,0:t}, x_{2,0:t}, y_{0:t}) p(x_{2,0:t}|x_{1,0:t}, y_{0:t}) p(x_{1,0:t}|y_{0:t})   (46)

where, for i = 1, 2, x_{i,0:t} \triangleq \{x_{i,0}, ..., x_{i,t}\}. It can be shown that the DPF can be extended to handle the filtering problem of system (45) by using PFs to solve the three nested sub-problems. Roughly speaking, a PF with N_{x_1} particles (x_{1,0:t}^{(i)}, i = 1, ..., N_{x_1}) will be used to estimate p(x_{1,0:t}|y_{0:t}); for each i = 1, ..., N_{x_1}, a PF with N_{x_2} particles (x_{2,0:t}^{(i,j)}, j = 1, ..., N_{x_2}) will be used to estimate p(x_{2,0:t}|x_{1,0:t}^{(i)}, y_{0:t}); and for each i = 1, ..., N_{x_1} and j = 1, ..., N_{x_2}, a PF with N_z particles will be used to estimate p(z_t|x_{1,0:t}^{(i)}, x_{2,0:t}^{(i,j)}, y_{0:t}).

Similar to the DPF based on (4), the major operation that cannot be implemented in parallel in the DPF based on (46) is its first resampling, i.e., the resampling of N_{x_1} composite particles. If a satisfactory performance of the DPF based on (46) can be achieved with N_{x_1} + N_{x_2} \ll N_x, then the number of composite particles involved in the first resampling of the DPF will be reduced from N_x to N_{x_1}. In this way, the level of parallelism of the DPF is further increased, and if the DPF is implemented in parallel, its execution time will be further decreased as well. However, it should be noted that N_{x_1} \cdot N_{x_2} PEs are required to fully exploit the parallelism of the DPF based on (46). Due to space limitations, we cannot include the extension of the DPF in this paper; instead, we refer the reader to [10] for the details.

V. NUMERICAL EXAMPLES

In this section we test how the DPF performs on two examples. The simulations are performed using Matlab under the Linux operating system. The platform is a server consisting of eight Intel(R) Quad Xeon(R) CPUs (2.53 GHz).

A. Algorithms tested

For the two examples, the bootstrap PF is implemented in the standard fashion, using different numbers of particles (M). The DPF is implemented according to Section III-B for different combinations of "x and z particles" (N_x and N_z). The DRPA-PF according to [5] is tested as well, using different numbers of PEs (K). The formulas of [5] have been closely followed, but the implementation is our own, and it is of course possible that it can be further trimmed. In addition, as suggested in [17,20], systematic resampling is chosen as the resampling algorithm for all algorithms tested.

B. Performance evaluation: Accuracy

In the tests, the performance of all algorithms is evaluated by 20000 Monte Carlo simulations. Basically, the accuracy of the state estimate is measured by the Root Mean Square Error (RMSE) between the true state and the state estimate. For example, the RMSE of \hat{x} is defined as

RMSE of \hat{x} = \sqrt{\frac{1}{250} \frac{1}{20000} \sum_{t=1}^{250} \sum_{i=1}^{20000} \|x_t^i - \hat{x}_t^i\|^2}   (47)

where, with a slight abuse of notation, x_t^i denotes the true state at time t for the ith simulation and \hat{x}_t^i is the corresponding state estimate. It is also tested how well the RMSE reflects the accuracy of the estimated posterior densities (see Remark 5.1 for more details).

C. Performance evaluation: Timing

One objective of the simulations is to assess the potential efficiency of the parallel implementation of the DPF. For that purpose, we record the following times:
• T_{si}: the average execution time of the sequential implementation of a PF.
• T_{cp}: the average time used by the operations that cannot be implemented in parallel in a PF.
• T_{pi}: the potential execution time of the parallel implementation of a PF. For the bootstrap PF with centralized resampling and for the DPF, it is calculated according to T_{pi} = T_{cp} + (T_{si} - T_{cp})/N_{PE}, where N_{PE} is the number of processing elements. For the DPF, let N_{PE} = N_x. For the bootstrap PF with centralized resampling, let N_{PE} be the maximal N_x in the simulation of the corresponding example. Here, the bootstrap PF with centralized resampling means that, apart from the resampling, the particle generation and importance weights calculation of the bootstrap PF are implemented in parallel. For the DRPA-PF, T_{pi} is calculated according to T_{pi} = T_{cp} + T_{mir} + (T_{si} - T_{cp} - T_{mir})/N_{PE}, where N_{PE} = K and T_{mir} is the average maximal intra-resampling time for the DRPA-PF.

D. Performance evaluation: Divergence failures

The rate r_d is used to reveal how often a PF diverges in the 20000 Monte Carlo simulations. The bootstrap PF and the DRPA-PF are said to diverge if their importance weights are all equal to zero in the simulation; the DPF is said to diverge if w_t^{(i)}, i = 1, ..., N_x, are all equal to zero in the simulation. Once the divergence of a PF is detected, the PF is rerun.

E. Sketch of the simulation

For the two examples, the bootstrap PF using M particles is first implemented, and its accuracy measured by the RMSE is treated as the reference level. It is then shown that the DPF using suitable numbers N_x and N_z of "x and z particles" can achieve the same level of accuracy. In turn, the DRPA-PF using M particles, but with different numbers of processing elements, is also implemented. Finally, the bootstrap PF using 2M particles is implemented.
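As a concrete illustration of (47), the following is a minimal Python/NumPy sketch of the RMSE computation (the paper's own experiments were run in Matlab; the function and array names here are our own):

```python
import numpy as np

def rmse(true_states, estimates):
    """RMSE as in (47): the squared error norms ||x_t^i - xhat_t^i||^2 are
    averaged over both Monte Carlo runs (axis 0) and time steps (axis 1),
    and the square root of the average is returned.

    true_states, estimates: arrays of shape (n_runs, n_steps, state_dim).
    """
    sq_err = np.sum((np.asarray(true_states) - np.asarray(estimates)) ** 2, axis=-1)
    return float(np.sqrt(np.mean(sq_err)))
```

For the entries of Table I, `true_states` and `estimates` would hold one state component over 20000 runs of 250 time steps each, i.e., arrays of shape (20000, 250, 1).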

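The two timing formulas of Section V-C are simple enough to state as code; the following helpers (names ours) show how the T_{pi} values reported in the tables are derived from the measured T_{si} and T_{cp}:

```python
def potential_time(t_si, t_cp, n_pe):
    """T_pi = T_cp + (T_si - T_cp)/N_PE: the serial part T_cp cannot be
    sped up, while the parallelizable remainder is divided evenly among
    N_PE processing elements."""
    return t_cp + (t_si - t_cp) / n_pe

def potential_time_drpa(t_si, t_cp, t_mir, n_pe):
    """DRPA-PF variant: the average maximal intra-resampling time T_mir is
    treated as serial as well, since the slowest PE sets the pace."""
    return t_cp + t_mir + (t_si - t_cp - t_mir) / n_pe
```

For example, for the first DPF row of Table I (T_si = 0.3545, T_cp = 0.0168, N_PE = N_x = 100), `potential_time` yields 0.020177, which matches the tabulated T_pi = 0.0202 after rounding.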
TABLE I
SIMULATION RESULT FOR SYSTEM (48) WITH (49) – SEE SECTIONS V-B – V-D FOR EXPLANATIONS OF THE NUMBERS

Case | RMSE of [x̂_t, ẑ_t] | T_si (Sec) | T_cp (Sec) | T_pi (Sec) | r_d
Bootstrap PF, M = 1000 | [2.0173, 2.3322] | 0.1891 | 0.0313 | 0.0326 | 0.0155
DPF, N_x = 100, N_z = 19 | [2.0104, 2.3497] | 0.3545 | 0.0168 | 0.0202 | 0.0133
DPF, N_x = 120, N_z = 19 | [1.9914, 2.3045] | 0.3901 | 0.0176 | 0.0207 | 0.0175
DPF, N_x = 110, N_z = 24 | [1.9907, 2.3154] | 0.4127 | 0.0175 | 0.0211 | 0.0113
DPF, N_x = 120, N_z = 24 | [1.9906, 2.3259] | 0.4338 | 0.0179 | 0.0214 | 0.0076
DRPA-PF, M = 1000, K = 40 | [2.0222, 2.3557] | 0.6324 | 0.0671 | 0.0878 | 0.0124
DRPA-PF, M = 1000, K = 25 | [2.0332, 2.4049] | 0.4769 | 0.0565 | 0.0799 | 0.0124
Bootstrap PF, M = 2000 | [1.9714, 2.2664] | 0.2579 | 0.0510 | 0.0528 | 0.0059
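Section V-A states that systematic resampling [17,20] is used in all algorithms compared in the tables. The following is a minimal NumPy sketch of that step (our own illustration, not the authors' Matlab implementation); in the DPF, the second resampling applies this independently to the z-particles of each index i, which is what makes it parallelizable:

```python
import numpy as np

def systematic_resampling(weights, rng=None):
    """Return n offspring indices drawn with a single uniform u: the kth
    stratified position is (u + k)/n, so the offspring count of each
    particle matches its weight to within one particle."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n
    cum = np.cumsum(weights)
    cum[-1] = 1.0  # guard against floating-point round-off
    # for each position, the index of the first cumulative weight >= position
    return np.searchsorted(cum, positions)
```

Because the positions are ordered, a single pass suffices and the output indices are sorted, which is convenient for in-place particle copying on each PE.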

F. Two dimensional example

Consider the following two dimensional nonlinear system

x_{t+1} = x_t + \frac{z_t}{1+z_t^2} + v_t^x
z_{t+1} = x_t + 0.5 z_t + \frac{25 z_t}{1+z_t^2} + 8\cos(1.2t) + v_t^z   (48)
y_t = \arctan(x_t) + \frac{z_t^2}{20} + e_t

where [x_0\ z_0]^T is assumed Gaussian distributed with [x_0\ z_0]^T \sim \mathcal{N}(0, I_{2\times 2}), and v_t = [v_t^x\ v_t^z]^T and e_t are assumed white and Gaussian distributed with

v_t \sim \mathcal{N}\left(0, \begin{bmatrix} 1 & 0.1 \\ 0.1 & 10 \end{bmatrix}\right), \quad e_t \sim \mathcal{N}(0, 1)   (49)

For the DPF, the proposal functions are chosen according to Remark 3.3 and (36). The simulation result over [1 250] is shown in Table I, from which it can be seen that the DPF has the potential to achieve in a shorter execution time the same level of accuracy as the bootstrap PF.

Remark 5.1: In Table I, the accuracy of the algorithms is measured entirely through the RMSE of the state estimate (47). Since the PF actually computes estimates of the posterior density p(z_t, x_{0:t}|y_{0:t}), one may discuss whether this would be a more appropriate quantity to evaluate for comparison. Actually, the state estimates \hat{x}_t could be quite accurate even though the estimate of the posterior density is poor. To evaluate that, we computed an accurate value of the true posterior density using the bootstrap PF with "many" (100000) particles, and compared that with the estimates of the posterior densities obtained using the bootstrap PF and the DPF with fewer (M = 1000 and N_x = 100, N_z = 19) particles. To avoid smoothing issues for the empirical approximations of the posterior densities, we made the comparisons for the posterior cumulative distributions (empirical approximations of the posterior cumulative distributions). The result is shown in Fig. 5. Moreover, let \bar{x}_t^i denote the true mean of x_t^i; then (47) with x_t^i replaced by \bar{x}_t^i is also calculated: it is 0.7239 for the bootstrap PF with M = 1000, and 0.6851 for the DPF with N_x = 100 and N_z = 19. From the above simulation results, we see that the DPF is at least as good as the bootstrap PF in approximating the posterior cumulative distribution. We conclude that the RMSE (47) gives a fair evaluation of the accuracy of the state estimate produced by the DPF for system (48) with (49). ♦

Fig. 5. The distance \sum_{x_t} |F(x_t|y_{0:t}) - \tilde{F}(x_t|y_{0:t})| between the "true" posterior cumulative distribution F(x_t|y_{0:t}) and the empirical approximation of the posterior cumulative distribution \tilde{F}(x_t|y_{0:t}) obtained by using the bootstrap PF and the DPF (with M = 1000 and N_x = 100, N_z = 19 particles, respectively) as a function of time t (thin curve: the bootstrap PF; thick curve: the DPF). The result is an average over 20000 simulations.

G. Four dimensional example

Consider the following four dimensional nonlinear system

x_{1,t+1} = 0.5 x_{1,t} + 8\sin(t) + v_t^{x_1}
x_{2,t+1} = 0.4 x_{1,t} + 0.5 x_{2,t} + v_t^{x_2}
z_{1,t+1} = z_{1,t} + \frac{z_{2,t}}{1+z_{2,t}^2} + v_t^{z_1}
z_{2,t+1} = z_{1,t} + 0.5 z_{2,t} + \frac{25 z_{2,t}}{1+z_{2,t}^2} + 8\cos(1.2t) + v_t^{z_2}   (50)
y_t = \frac{x_{1,t} + x_{2,t}}{1+x_{1,t}^2} + \arctan(z_{1,t}) + \frac{z_{2,t}^2}{20} + e_t

where x_t = [x_{1,t}\ x_{2,t}]^T and z_t = [z_{1,t}\ z_{2,t}]^T. [x_0^T\ z_0^T]^T is assumed Gaussian distributed with [x_0^T\ z_0^T]^T \sim \mathcal{N}(0, I_{4\times 4}), and v_t = [(v_t^x)^T (v_t^z)^T]^T with v_t^x = [v_t^{x_1}\ v_t^{x_2}]^T and v_t^z = [v_t^{z_1}\ v_t^{z_2}]^T, and e_t are assumed white and Gaussian distributed

TABLE II
SIMULATION RESULT FOR SYSTEM (50) WITH (51) – SEE SECTIONS V-B – V-D FOR EXPLANATIONS OF THE NUMBERS

Case | RMSE of [x̂_{1,t}, x̂_{2,t}, ẑ_{1,t}, ẑ_{2,t}] | T_si (Sec) | T_cp (Sec) | T_pi (Sec) | r_d
Bootstrap PF, M = 1500 | [1.1566, 1.3494, 2.0111, 2.8241] | 0.2022 | 0.0316 | 0.0339 | 0.0072
DPF, N_x = 50, N_z = 29 | [1.1707, 1.3678, 2.0485, 2.9383] | 0.2366 | 0.0145 | 0.0190 | 0.0109
DPF, N_x = 60, N_z = 49 | [1.1633, 1.3569, 1.9879, 2.7911] | 0.3255 | 0.0160 | 0.0212 | 0.0039
DPF, N_x = 75, N_z = 39 | [1.1610, 1.3537, 1.9794, 2.7547] | 0.3309 | 0.0164 | 0.0206 | 0.0040
DRPA-PF, M = 1500, K = 30 | [1.1564, 1.3490, 2.0028, 2.7894] | 0.5083 | 0.0470 | 0.0669 | 0.0084
DRPA-PF, M = 1500, K = 50 | [1.1566, 1.3495, 2.0148, 2.8302] | 0.6951 | 0.0607 | 0.0780 | 0.0107
Bootstrap PF, M = 3000 | [1.1518, 1.3419, 1.9794, 2.7601] | 0.3105 | 0.0539 | 0.0573 | 0.0021

with

v_t \sim \mathcal{N}\left(0, \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0.1 \\ 0 & 0 & 0.1 & 10 \end{bmatrix}\right), \quad e_t \sim \mathcal{N}(0, 1)   (51)

Since the "x-dynamics" does not depend on z_t, we let \pi(x_{t+1}|x_{0:t}^{(i)}, y_{0:t}) = p(x_{t+1}|x_t^{(i)}) and choose the other proposal function according to (36). The simulation result over [1 150] is shown in Table II, from which it can be seen that the DPF has the potential to achieve in a shorter execution time the same level of accuracy as the bootstrap PF.

H. Summary

Regarding the accuracy, comparison of the first part of the RMSE column in Tables I and II shows that, with suitably chosen N_x and N_z, the DPF achieves the same level of accuracy as the bootstrap PF. On the other hand, with a comparable number of particles (it is fair to compare M with N_x(N_z + 1)), the accuracy is not much worse for the DPF than for the bootstrap PF. In fact, in Table II the DPF even performs slightly better than the PF for some of the states (no statistical significance), illustrating that allocating points as in Fig. 1.c could actually be beneficial for some systems.

Regarding timing, comparison of the T_{si} and T_{pi} columns in Tables I and II shows that the execution time of the DPF can be shortened significantly if the DPF can be implemented in parallel. Moreover, the DPF has a potential to offer better accuracy in a shorter execution time. In particular, the T_{pi} of the DPF is less than that of the bootstrap PF with centralized resampling. It is valuable to note that the T_{pi} of the DPF is even smaller than the T_{cp} of the bootstrap PF with centralized resampling, which is actually the lower bound of the execution time of the bootstrap PF with centralized resampling. In addition, as discussed in Section IV-C, the T_{pi} of the DPF can be further shortened by using any one of the distributed resampling algorithms to perform the resampling of \{\tilde{x}_t^{(i)}, \tilde{z}_t^{(i,1)}, \tilde{r}_t^{(i,1)}, ..., \tilde{z}_t^{(i,N_z)}, \tilde{r}_t^{(i,N_z)}\}, i = 1, ..., N_x. As a result, it is fair to say that the parallel implementation of the DPF has the potential to shorten the execution time of the PF.

VI. CONCLUSIONS

In this paper we have proposed a new way of allocating the particles in particle filtering. By first decomposing the state into two parts, the DPF splits the filtering problem into two nested sub-problems and then handles the two nested sub-problems using PFs. Returning to the questions in the intuitive preview in Section II-A, we have seen that the DPF can produce results of not much worse accuracy than the regular PF, with a comparable number of particles. We have also seen that the structure gives a potential for more efficient calculations. The advantage of the DPF over the regular PF lies in that the DPF can increase the level of parallelism of the PF in the sense that part of the resampling has a parallel structure. The parallel structure of the DPF is created by decomposing the state space, differing from the parallel structure of the distributed PFs, which is created by dividing the sample space. This difference results in a couple of unique features of the DPF in contrast with the existing distributed PFs. As a result, we believe that the DPF is a new option for the parallel implementation of PFs and for the application of PFs in real-time systems.

An interesting topic for future work is to study how to decompose the state, given a high dimensional system, such that the execution time of the parallel implementation of the DPF can be maximally reduced. Another interesting topic is to test the DPF on different types of parallel hardware, for example, graphical processing units (GPUs) and field-programmable gate arrays (FPGAs).

A further topic of future investigation is to generalize the line pattern in Fig. 1.c to other lines and curves that may pick up useful shapes in the posterior densities to be estimated. This essentially involves a change of state variables before the DPF is applied.

REFERENCES

[1] C. Andrieu and A. Doucet, "Particle filtering for partially observed Gaussian state space models," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, pp. 827–836, 2002.
[2] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, pp. 174–188, 2002.
[3] N. Bergman, "Recursive Bayesian estimation: Navigation and tracking applications," PhD thesis No 579, Linköping Studies in Science and Technology, SE-581 83 Linköping, Sweden, May 1999.
[4] M. Bolić, P. Djurić, and S. Hong, "Resampling algorithms for particle filters: A computational complexity perspective," EURASIP Journal on Applied Signal Processing, vol. 15, pp. 2267–2277, 2004.
[5] ——, "Resampling algorithms and architectures for distributed particle filters," IEEE Transactions on Signal Processing, vol. 53, no. 7, pp. 2442–2450, 2005.
[6] R. S. Bucy and K. D. Senne, "Digital synthesis of nonlinear filters," Automatica, vol. 7, pp. 287–298, 1971.
[7] J. Carpenter, P. Clifford, and P. Fearnhead, "Improved particle filter for nonlinear problems," IEE Proceedings - Radar, Sonar and Navigation, vol. 146, no. 1, pp. 2–7, 1999.
[8] G. Casella and C. P. Robert, "Rao-Blackwellisation of sampling schemes," Biometrika, vol. 83, pp. 81–94, 1996.

[9] R. Chen and J. Liu, “Mixture Kalman filters,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 62, pp. 493–508, 2000.
[10] T. Chen, T. B. Schön, H. Ohlsson, and L. Ljung, “An extension of the decentralized particle filter with arbitrary state partitioning,” Automatic Control Division, Linköping University, Tech. Rep. No. 2930, 2010.
[11] A. Doucet, N. de Freitas, K. Murphy, and S. Russell, “Rao-Blackwellised particle filtering for dynamic Bayesian networks,” in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 2000, pp. 176–183.
[12] A. Doucet, N. de Freitas, and N. Gordon, Sequential Monte Carlo methods in practice. New York: Springer, 2001.
[13] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Statistics and Computing, vol. 10, pp. 197–208, 2000.
[14] A. Doucet and A. M. Johansen, “A tutorial on particle filtering and smoothing: Fifteen years later,” in Handbook of Nonlinear Filtering, D. Crisan and B. Rozovsky, Eds. Oxford, UK: Oxford University Press, 2009.
[15] N. Gordon, D. Salmond, and A. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” Radar and Signal Processing, IEE Proceedings F, vol. 140, no. 2, pp. 107–113, 1993.
[16] F. Gustafsson, F. Gunnarsson, N. Bergman, U. Forssell, J. Jansson, R. Karlsson, and P.-J. Nordlund, “Particle filters for positioning, navigation, and tracking,” IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 425–437, 2002.
[17] J. D. Hol, T. B. Schön, and F. Gustafsson, “On resampling algorithms for particle filters,” in IEEE Nonlinear Statistical Signal Processing Workshop, 2006, pp. 79–82.
[18] X. Hu, T. B. Schön, and L. Ljung, “A basic convergence result for particle filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 4, pp. 1337–1348, 2008.
[19] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[20] G. Kitagawa, “Monte Carlo filter and smoother for non-Gaussian nonlinear state space models,” Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996.
[21] M. Klass, N. de Freitas, and A. Doucet, “Towards practical N² Monte Carlo: The marginal particle filter,” in Uncertainty in Artificial Intelligence, 2005.
[22] J. Míguez, “Analysis of parallelizable resampling algorithms for particle filtering,” Signal Processing, vol. 87, pp. 3155–3174, 2007.
[23] M. K. Pitt and N. Shephard, “Filtering via simulation: Auxiliary particle filters,” Journal of the American Statistical Association, vol. 94, no. 446, pp. 590–599, 1999.
[24] L. R. Rabiner and B. H. Juang, “An introduction to hidden Markov models,” IEEE ASSP Magazine, pp. 4–16, 1986.
[25] T. B. Schön, F. Gustafsson, and P.-J. Nordlund, “Marginalized particle filters for mixed linear/nonlinear state-space models,” IEEE Transactions on Signal Processing, vol. 53, pp. 2279–2289, 2005.

Tianshi Chen received the M.S. degree in 2005 from Harbin Institute of Technology, and the Ph.D. degree in December 2008 from the Chinese University of Hong Kong. Since April 2009, he has been a postdoctoral researcher at the Division of Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden. His research interests include nonlinear control theory and applications, system identification and statistical signal processing.

Thomas B. Schön was born in Sweden in 1977. He received the M.Sc. degree in Applied Physics and Electrical Engineering in Sep. 2001, the B.Sc. degree in Business Administration and Economics in Feb. 2001, and the Ph.D. degree in Automatic Control in Feb. 2006, all from Linköping University, Linköping, Sweden. He has held visiting positions at the University of Cambridge (UK) and the University of Newcastle (Australia). His research interests are mainly within the areas of signal processing, system identification and machine learning, with applications to the automotive and the aerospace industry. He is currently an Associate Professor at Linköping University.

Henrik Ohlsson was born in Sweden in 1981. He received the M.Sc. degree in Applied Physics and Electrical Engineering in Oct. 2006 and his Licentiate degree in Automatic Control in Dec. 2008, all from Linköping University, Sweden. He has held visiting positions at the University of Cambridge (UK) and the University of Massachusetts (USA). His research interests are mainly within the areas of system identification and machine learning. He is currently a Ph.D. student at Linköping University.

Lennart Ljung received his Ph.D. in Automatic Control from Lund Institute of Technology in 1974. Since 1976 he has been Professor of the chair of Automatic Control in Linköping, Sweden, and he is currently Director of the Strategic Research Center “Modeling, Visualization and Information Integration” (MOVIII). He has held visiting positions at Stanford and MIT and has written several books on System Identification and Estimation. He is an IEEE Fellow, an IFAC Fellow and an IFAC Advisor, as well as a member of the Royal Swedish Academy of Sciences (KVA), a member of the Royal Swedish Academy of Engineering Sciences (IVA), an Honorary Member of the Hungarian Academy of Engineering and a Foreign Associate of the US National Academy of Engineering (NAE). He has received honorary doctorates from the Baltic State Technical University in St. Petersburg, from , Sweden, from the Technical University of Troyes, France, from the Catholic University of Leuven, Belgium, and from Helsinki University of Technology, Finland. In 2002 he received the Quazza Medal from IFAC, in 2003 he received the Hendrik W. Bode Lecture Prize from the IEEE Control Systems Society, and he was the recipient of the IEEE Control Systems Award for 2007.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their useful comments and constructive suggestions, and the associate editor for his time handling the paper. The first author would also like to thank Umut Orguner for his continuous support and helpful discussions.

This work was supported by the Strategic Research Center MOVIII, funded by the Swedish Foundation for Strategic Research, SSF, and by CADICS, a Linnaeus center funded by the Swedish Research Council.
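As a closing illustration, the parallelizable second resampling step discussed at the end of Section V, in which each of the N_x groups of z-particles is resampled independently of the others, can be sketched as follows. This is only a sketch under stated assumptions: the function names are illustrative, and systematic resampling is chosen here as one common scheme (cf. [17]), not because the paper prescribes it. The point of the sketch is that no data is shared between groups, so the N_x resampling calls can run concurrently.

```python
# Sketch (not the paper's implementation) of the DPF's parallel second
# resampling: each particle group is resampled independently, so the
# per-group calls can be distributed over workers.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def systematic_resample(particles: Sequence[float],
                        weights: Sequence[float],
                        u0: float = 0.5) -> List[float]:
    """Systematic resampling of a single particle group.

    u0 in [0, 1) is the offset, normally drawn uniformly at random; it is
    a parameter here so the sketch is deterministic and testable.
    """
    n = len(particles)
    total = sum(weights)
    # Normalized cumulative weights.
    cum, c = [], 0.0
    for w in weights:
        c += w / total
        cum.append(c)
    # Evenly spaced positions (k + u0)/n select the surviving particles.
    out, j = [], 0
    for k in range(n):
        u = (k + u0) / n
        while cum[j] < u:
            j += 1
        out.append(particles[j])
    return out


def resample_groups(groups: Sequence[Tuple[Sequence[float], Sequence[float]]]
                    ) -> List[List[float]]:
    """Resample all N_x groups; each (particles, weights) pair is handled
    independently, which is what makes this step parallelizable."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda g: systematic_resample(*g), groups))
```

For example, `resample_groups([(z_group_1, w_group_1), (z_group_2, w_group_2)])` resamples both groups concurrently; a process pool or hardware-level parallelism could be substituted without changing the per-group logic.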