Distributed Particle Filtering Via Optimal Fusion of Gaussian Mixtures Jichuan Li and Arye Nehorai , Life Fellow, IEEE

Home , Particle filter

280 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018 Distributed Particle Filtering via Optimal Fusion of Gaussian Mixtures Jichuan Li and Arye Nehorai , Life Fellow, IEEE

Abstract—We propose a distributed particle filtering algorithm communication between neighboring sensors. Depending on the based on an optimal fusion rule for local posteriors. We imple- type of information communicated in the consensus algorithm, ment the optimal fusion rule in a distributed and iterative fashion a distributed particle filtering algorithm can be categorized as via an average consensus algorithm. We approximate local posteriors as Gaussian mixtures and fuse Gaussian mixtures through weight-based, likelihood-based, or posterior-based. importance sampling. We prove that under certain conditions the Weight-based algorithms [5]Ð[9] communicate the weight of proposed distributed particle filtering algorithm converges in prob- each particle or, similarly, the local likelihood evaluated at each ability to a global posterior locally available at each sensor in the particle. To guarantee an accurate Monte Carlo approximation, network. Numerical examples are presented to demonstrate the the number of particles held by each local filter is usually con- performance advantages of the proposed method in comparison with other distributed particle filtering algorithms. siderably large, which results in considerably high communication overhead for weight consensus. Also, in order for weight Index Terms—Average consensus, data fusion, distributed par- consensus to make sense, different local filters must have iden- ticle filtering, Gaussian mixture model, importance sampling. tical particles, which necessitates perfect synchronization between the random number generators of different sensors. The I. INTRODUCTION reliance on perfect synchronization, together with the high communication overhead, makes weight-based algorithms costly to ARTICLE filtering, also known as the sequential Monte implement in practice. Carlo method, is a powerful tool for sequential Bayesian P Likelihood-based algorithms [10]Ð[12] communicate local estimation [1]. Unlike Kalman filtering [2], particle filtering likelihood functions approximated via factorization and linear is able to work with both nonlinear models and non-Gaussian regression. Since there is no universal approach to the desired noise, and thus is applicable to sequential estimation problems factorization format, the likelihood approximation approach under general assumptions. To improve estimation accuracy, a does not generalize well beyond the exponential family. Also, particle filter is often built on observations from more than one likelihood consensus requires uniform factorization across the perspective or sensor. These sensors form a network and collab- network and thus does not apply to scenarios where the noise orate via wireless communication. The network can designate distribution at each sensor varies. Hence, likelihood consensus one of the sensors or an external node as the fusion center, which might not be an ideal choice for general applications. receives and processes observations sent from all the other sen- Posterior-based algorithms [13]Ð[21] communicate local sors in the network. This centralized implementation is optimal posteriors parametrically approximated in a compact form, for particle filtering in terms of accuracy, but does not scale and have several advantages over likelihood-based and weight- with growing network size in applications like target tracking, based algorithms. First, unlike likelihood functions, posteriors environmental monitoring, and smart grids. This shortcoming are essentially probability density functions and thus easy to motivates distributed particle filtering [3]. represent parametrically. If a posterior follows a (multivariate) Distributed particle filtering consists of separate particle fil- Gaussian distribution, it can be losslessly represented by its ters that have access to local observations only and produce mean and variance (covariance matrix); if a posterior follows global estimates via communication. It is often implemented a non-Gaussian distribution, it can be sufficiently accurately using a consensus algorithm [4], where sensors in a net- approximated by a convex combination of multiple Gaussian work reach agreement among their beliefs iteratively through components, i.e., a Gaussian mixture (GM) [22]. Also, such a Manuscript received February 10, 2016; revised September 7, 2016 and March compact parametric representation incurs significantly lower 24, 2017; accepted April 4, 2017. Date of publication April 12, 2017; date of communication overhead than a nonparametric representation, current version May 8, 2018. This work was supported by the Air Force Office of Scientific Research under Grants FA9550-16-1-0386 and FA9550-11-1-0210. e.g., particles. Moreover, posterior-based algorithms are The associate editor coordinating the review of this manuscript and approving invariant to how local posteriors are obtained and thus allow it for publication was Dr. Marcelo G. S. Bruno. (Corresponding author: Arye diverse sensing modalities [23] and various filtering tools to be Nehorai.) The authors are with the Preston M. Green Department of Electrical and exploited in a network. Lastly, posterior-based algorithms give Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130 each sensor privacy, since no sensor in the network needs to USA (e-mail: [email protected]; [email protected]). know how others compute their local posteriors. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. The challenge of posterior-based algorithms mainly lies Digital Object Identifier 10.1109/TSIPN.2017.2694318 in the fusion of parametrically-represented local posteriors.

2373-776X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information. LI AND NEHORAI: DISTRIBUTED PARTICLE FILTERING VIA OPTIMAL FUSION OF GAUSSIAN MIXTURES 281

In [13]Ð[16], local posteriors are fused in a Bayesian fashion the graph G is connected, or in other words that there exists but assumed to be Gaussian for fusion tractability. As we know, a multi-hop communication route connecting any two sensors a posterior follows a Gaussian distribution only if both the state in the network. Moreover, we assume the sensor network to be transition model and the observation model are linear with addi- synchronous or, if not, synchronized via a clock synchronization tive Gaussian noise. Thus, the Gaussian assumption is so strong scheme [25]Ð[27]. that it will incur obvious approximation errors in nonlinear applications. In [17] and [18], local posteriors are approximated as B. Signal Model Gaussian mixtures but fused linearly through their parameters. We consider a single moving target to be observed by the This linear fusion rule is, however, suboptimal because it is not sensor network. We connect target state transition with sensor justified by the underlying statistical model. Also, it requires observation using a discrete-time state-space model, Gaussian mixtures to have a uniform number of components, thus limiting the flexibility and adaptivity of local parametric xn = g(xn−1 )+un representation. In [19]Ð[21], Gaussian mixtures are fused in , (1) yn,k = hk (xn )+vn,k (k =1, 2,...,K) an analytical fashion with approximations, but the approximate fusion strategy incurs inaccuracy and makes it difficult to con- where ∈ Rd trol the number of components within a reasonable range for a 1) xn is the target state at the nth time point; ∈ Rbk Gaussian mixture model. 2) yn,k is the observation taken by Sk at the nth time In this paper, we propose a posterior-based distributed particle point; filtering algorithm. We approximate local posteriors as Gaussian 3) g is a known state transition function; mixtures and fuse local posteriors via an optimal distributed fu- 4) hk is a known observation function of Sk ; { } { } sion rule derived from Bayesian statistics and implemented via 5) both un and vn,k are uncorrelated additive noise; average consensus. Unlike other posterior-based algorithms, the 6) the distribution of x0 is given as prior information; proposed algorithm neither compromises approximation accu- 7) state transition is Markovian, i.e., past and future states racy for fusion tractability nor compromises fusion validity for are conditionally independent, given the current state; approximation accuracy. Also, the proposed algorithm seeks 8) the current observation is conditionally independent of consensus on the posterior distribution, rather than on parame- past states and observations, given the current state. ters of the posterior distribution, thus giving flexibility to local parametric approximations by allowing each Gaussian mixture C. Goal to have an optimal yet possibly nonuniform number of com- The goal is to sequentially estimate the current state xn based ponents. To address the challenge in fusion, we design algo- on the estimate of the preceding state xn−1 and the newly avail- rithms based on importance sampling [24] to fuse Gaussian { } able observations yn,1 , yn,2 ,...,yn,K . mixtures nonlinearly within each consensus step. Finally, we prove the convergence of the proposed distributed particle filter- D. Notation ing algorithm and demonstrate its advantages through numerical { } examples. We denote consecutive states x1 , x2 ,...,xn as x1:n , ob- The rest of the paper is organized as follows. Section II in- servations taken by the whole network at the nth time point { } troduces the sensor network model and the state-space model. yn,1 , yn,2 ,...,yn,K as yn , and consecutive observations { } Section III introduces centralized particle filtering. Section IV taken by the whole network y1 , y2 ,...,yn as y1:n .Weuse presents our distributed particle filtering algorithm. Section V f to denote a probability density function (pdf) and q to denote analyzes the performance of the proposed algorithm. Section VI the pdf of a proposal distribution in importance sampling. presents numerical examples, and Section VII concludes the paper. III. CENTRALIZED PARTICLE FILTERING The problem formulated in Section II is a filtering problem. II. PROBLEM FORMULATION A filtering problem is often solved by a particle filter when the state-space model is nonlinear or the noise is non-Gaussian. A A. Network Model particle filter can be implemented in a centralized fashion by We model a sensor network as a graph G =(V , E), where collecting observations from all the sensors in the network and V = {S1 ,S2 ,...,SK } is the set of vertices, corresponding to processing them together. sensors, with cardinality |V | = K, and E ⊂ V × V is the set of A centralized particle filter approximates the posterior distri- | edges, corresponding to communication links between sensors. bution of the current state, f(xn y1:n ), as a weighted ensemble We assume each communication link to be bidirectional, in the of Monte Carlo samples (also known as particles): sense that sensors can transmit information in either direction M through the link. With no particular direction assigned to any | ≈ (m ) − (m ) f(xn y1:n ) wn δ(xn xn ), (2) edge, we assume the graph G to be undirected. We restrict each m =1 communication link to a local neighborhood defined as an area (m ) within a circle of radius ρ, in the sense that a sensor can directly where M is the total number of particles, xn is the mth (m ) (m ) M (m ) communicate only with its neighbors. Also, we assume that particle, wn is the weight of xn with m =1 wn =1, 282 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018 and δ is the Dirac delta function. Using importance sampling, following advantages in distributed data fusion. First, it ends up a particle is generated according to a proposal distribution with a global estimate available at each sensor in the network, | (m ) so that the network is robust to sensor failures and every sensor q(xn xn−1 , yn ), and its weight is updated according to in the network is ready to react based on the global estimate. | (m ) (m )| (m ) (m ) f(yn xn )f(xn xn−1 ) (m ) Second, it requires only local communications and does not w ∝ × w − . (3) n (m )| (m ) n 1 need global routing. Last but not least, it is robust to changes q(xn x − , yn ) n 1 in the network topology. In this paper, we fuse local posteriors The proposal distribution q is commonly chosen as the state provided by different sensors via consensus, so that every sensor | (m ) transition pdf f(xn xn−1 ), which, although slightly inefficient, in the network ultimately obtains a global posterior. yields a convenient weight update rule: Likelihood factorization in (5), as mentioned in Section III, makes data fusion convenient, because its logarithmic form (m ) ∝ | (m ) × (m ) wn f(yn xn ) wn−1 . (4) (from now on, we assume that every pdf is positive over its | (m ) support, so that its logarithm is well defined) The global likelihood function f(yn xn ) in (4) can be factor- ized into a product of local likelihood functions, K log f(y |xn )= log f(y |xn ) (7) K n n,k | (m ) | (m ) k=1 f(yn xn )= f(yn,k xn ), (5) k=1 gives rise to a straightforward implementation of an average consensus algorithm [4]. However, unlike a prior or posterior thus providing a centralized fusion rule. density function, a likelihood function is generally difficult to As time goes on, due to the finite number of particles, the approximate parametrically through a universal approach such weight in an ensemble tends to be concentrated in only a few as the Gaussian mixture model. This difficulty motivates us to particles, resulting in a small effective sample size and thus a communicate posterior density functions, instead of likelihood poor approximation. When an ensemble’s effective sample size functions, in average consensus. falls below a threshold, a possible remedy is to resample the To derive a distributed fusion rule for posteriors, we start from particles according to their weights. A popularly used estimate a centralized approach to posterior fusion [28]. of the effective sample size of an ensemble is Due to conditional independence, a likelihood function can − M 1 be equivalently written as ˆ (m ) 2 Me = (wn ) , (6) f(y |xn )=f(y |xn , y − ), (8) m =1 n,k n,k 1:n 1 and the threshold can be set as, for example, 60% of the original which, according to Bayes’ theorem, can be rewritten as sample size M,or100% if the plan is to resample in every f(x |y , y )f(y |y ) | n n,k 1:n−1 n,k 1:n−1 iteration. f(yn,k xn )= . (9) f(x |y − ) Although centralized particle filtering is optimal in estimation n 1:n 1 accuracy, it is impractical for large-scale sensor networks. First, Substitute (9) into (7), and we get it expends considerable energy and bandwidth on transmitting | | f(xn y1:n )f(yn y1:n−1 ) raw measurements from everywhere in the network to a common log f(x |y − ) fusion center. Second, it causes severely unbalanced energy n 1:n 1 consumption and communication traffic in the network, because K | | f(xn yn,k, y1:n−1 )f(yn,k y1:n−1 ) sensors located near the fusion center relay many more messages = log , (10) f(xn |y − ) than those located far away. Further, reliance on a common k=1 1:n 1 fusion center makes it vulnerable to a single point of failure. which simplifies to Moreover, it does not scale with the network size. Therefore, it | − | is often preferable to perform distributed particle filtering. log f(xn y1:n )+(K 1) log f(xn y1:n−1 ) K | IV. DISTRIBUTED PARTICLE FILTERING = log f(xn yn,k, y1:n−1 )+const, (11) In distributed particle filtering, every sensor in the network k=1 performs local particle filtering on its own observation and then where “const” represents a constant term equal to communicates with its neighbors for data fusion, thus achieving K centralized filtering in a distributed fashion. − | | log f(yn y1:n−1 )+ log f(yn,k y1:n−1 ). (12) k=1 A. Consensus Because the constant term is not a function of the state variable Consensus [4] is a type of data fusion algorithm in which xn , we do not have to explicitly compute it, when we compute every sensor in the network iteratively communicates with its the distribution of xn . neighbors and updates its own belief based on its neighbors’ Equation (11) presents a centralized fusion rule for local pos- | until all the sensors hold the same belief. Consensus has the teriors: f(xn yn,k, y1:n−1 ) on the right-hand side of (11) is the LI AND NEHORAI: DISTRIBUTED PARTICLE FILTERING VIA OPTIMAL FUSION OF GAUSSIAN MIXTURES 283 local posterior of x by S , while f(x |y ) on the left-hand n k n 1:n Algorithm 1: GM Learning from Weighted Samples. side of (11) is the global posterior of x by the whole network. n { }M There are two other terms in (11), namely the constant term and 1: procedure GMLEARN( xi ,wi i=1, C) { Σ }C the prediction term. The constant term will disappear when we 2: initialize C (if not given) and αc , µc , c c=1 | 3: repeat normalize f(xn y1:n ) so that it integrates to 1; the prediction | 4: for i =1to M do E-step term f(xn y1:n−1 ) can be calculated as 5: for c =1to C do | | | 6: pi,c = αc N (xi|µ , Σc ) f(xn y1:n−1 )= f(xn xn−1 )f(xn−1 y1:n−1 )dxn−1 , c Rd 7: end for (13) { }C 8: normalize pi,c c=1 where f(x |x − ) is available from the state transition model, 9: end for n n 1 − | 10: for c =1to C do M-step and f(xn 1 y1:n−1 ), i.e., the global posterior of the last state, M is available at each sensor thanks to the consensus algorithm 11: αc = i=1pi,cwi −1 M performed during the last time step. 12: µc = αc i=1 pi,cwi xi Σ −1 M − − T The centralized fusion rule (11) can be implemented in a 13: c =αc i=1 pi,cwi (xi µc )(xi µc ) distributed manner through an average consensus algorithm. 14: end for (0) C | 15: normalize {αc } Denoting f(xn yn,k, y1:n−1 ) as ηk (xn ), the summation on c=1 the right-hand side of (11) can be computed iteratively based on 16: until convergence { Σ }C a two-step distributed fusion rule: 17: return GM = αc , µc , c c=1 18: end procedure (i+1) (i) Step 1: log ηk (xn )= εkj log ηj (xn ), (14) j∈N k Note that (18) is the final result of the proposed distributed Step 2: Normalize η(i+1)(x ), (15) k n fusion approach, calculated individually at each sensor based on (i) the posterior it holds locally at the end of the average consensus where ηk (xn ) is the posterior density function of xn held algorithm. As a distributed fusion result, (18) is also validated by Sk in the ith iteration of the average consensus algorithm by the centralized fusion result in [28], which, in contrast, is during the nth time step, Nk is the neighborhood of Sk with Sk calculated centrally at a global fusion center based on local included, and εkj is the Metropolis weight [29] defined as ⎧ posteriors provided by all the sensors in the network. ⎪ 1/ max{|N |, |N |} if (k, j) ∈ E ⎨ k j εkj = 1 − ∈ εkl if k = j . (16) B. Gaussian Mixture Model ⎩⎪ l N k 0 otherwise Consensus necessitates inter-sensor communication. Com- munication is a major source of energy consumption for wire- We call (14) the distributed fusion step and (15) the normal- less sensor networks. Since wireless sensor networks are usually ization step. In the distributed fusion step, every sensor sends subject to strong energy constraints, it is important to minimize its current belief to its neighbors and updates it with beliefs re- the amount of communication needed in consensus. A possible ceived from its neighbors; in the normalization step, an updated solution to communication minimization is to compress the data belief is normalized so that it appears as a valid probability to be transmitted. In this paper, we compress all the posteriors density function and can be parametrically represented as a in the distributed fusion step (14) and the recovery step (18) into Gaussian mixture model for future communication. Note that Gaussian mixtures [22]. (14) and (15) can also be reached from another perspective by A Gaussian mixture is a convex combination of Gaussian minimizing the weighted average Kullback-Leibler (KL) dis- components as follows, tance between the fused posterior and the posteriors to be fused and then taking the logarithm, as shown in [21]. C (i) ≈ N Σ In Section V-A, we show that under certain conditions, we ηk (xn ) αc (xn ; µc , c ) , (19) have for ∀ k c=1 1 K where C is the total number of components, and αc , µc , and (i) (0) Σ lim log η (xn )= log ηj (xn )+const, (17) c are the weight, mean, and covariance matrix, respectively, i→∞ k K j=1 of the cth component. where the constant term is added simply for the purpose of A Gaussian mixture model can be used to approximate an normalization. Combining (11) and (17), we have for ∀ k arbitrary probability distribution, and is often learned via the expectation-maximization (EM) algorithm [30] from samples K (i) generated from the underlying distribution. In particle filtering, limi→∞ ηk (xn ) f(x |y ) ∝ , (18) samples are often weighted due to importance sampling, and n 1:n | K −1 f(xn y1:n−1 ) thus we need to learn a Gaussian mixture model from weighted which, called the recovery step, concludes the consensus-based samples using the weighted EM algorithm [31], as summarized distributed particle filtering. in Algorithm 1. 284 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018

The convergence of Algorithm 1 can be determined in various Algorithm 2: GM Fusion. ways. In this paper, we terminate Algorithm 1 when the abso- 1: procedure GMFUSE(GM , {GM } ∈ ) lute difference between the log-likelihoods of the current and k j j N k 2: initialize M, {ε } ∈ previous Gaussian mixture models is smaller than a chosen per- kj j N k centage of the absolute difference between the log-likelihoods 3: for j in Nk do of the current and initial models. 4: Mj = Mεkj { (m )}M j Note that the EM algorithm is often computationally ineffi- 5: generate xj m =1 from GMj cient and thus might not be a good choice for real-time applica- 6: for m =1to Mj do (m ) (m ) −1 (m ) k,l tions. For real-time applications, an efficient Gaussian mixture 7: wj = GMj (xj ) GMl (xj ) ∈ learning method [32] can be used, but the discussion of efficient l N k Gaussian mixture learning is beyond the scope of this paper. 8: end for Also, the EM algorithm is not adaptive in terms of the number 9: end for normalize {w(m )} of components in the Gaussian mixture model. To obtain adap- 10: j { (m ) (m )} tivity, an adaptive Gaussian mixture learning method [33]Ð[36] 11: return GMLEARN( xj ,wj ) can be used, but the discussion of adaptive Gaussian mixture 12: end procedure learning is also beyond the scope of this paper.

(j,m) (j,m) C. Fusion of Gaussian Mixtures be learned from the weighted samples {xn ,wn } using With posteriors represented as Gaussian mixtures, the fu- Algorithm 1. sion of Gaussian mixtures has to be considered for both the Here, the proposed approach draws samples from each Gaus- distributed fusion step (14) and the recovery step (18). For con- sian mixture to be fused, so that the drawn samples cover most venience, we convert the distributed fusion step (14) from the of the support of the fused density function. Since multiple pro- logarithmic form to the exponential form: posal distributions are used, the proposal approach equivalently samples from a mixture of proposal distributions. However, we ε (i+1) (i) kj do not have to use the whole mixture when calculating the ηk (xn )= ηj (xn ) . (20) j∈N k importance weight of each sample. Instead, since we know ex- actly which proposal distribution in the mixture each sample is Both (18) and (20) involve a product of powers of Gaussian drawn from, it would be more accurate if we use the correspond- mixtures, which is unfortunately intractable to compute ana- ing proposal distribution alone when calculating the importance lytically. Therefore, we consider importance sampling. In [37], weight. Also, since the sampling bias introduced by each pro- various methods are proposed to sample from a product of Gaus- posal distribution is eliminated when we divide the true density sian mixtures, but unfortunately none of them directly applies by the corresponding importance density, importance weights to our problem, because of the negative exponent in (18) and calculated under different proposal distributions are consistent. the fractional exponents in (20). In this paper, we extend the Another contribution of the proposed approach is weighted mixture importance sampling approach presented in [37] to a sample allocation. As we can see, the Gaussian mixtures in (20) general case where Gaussian mixtures in the product can have do not contribute equally to the product, and a Gaussian mixture fractional or negative exponents, and propose a weighted mix- with a large exponent contributes more to the product and is ture importance sampling approach for the fusion of Gaussian more influential in local fusion than one with a small exponent. mixtures in both (18) and (20). By adjusting the contribution of each Gaussian mixture to the 1) Distributed Fusion Step: We generate samples from each proposal distribution mixture according to its contribution to Gaussian mixture to be fused and assign to them importance the product, weighted sample allocation makes the proposal weights calculated under their corresponding proposal distri- distribution mixture closer to the product, thus improving the ∈ { (j,m)}M j butions. For each j Nk ,wedrawMj samples xn m =1 efficiency of importance sampling. (i) (j,m) from ηj (xn ) and assign to each xn an importance weight The weighted mixture importance sampling approach is sum- (j,m) wn calculated as marized in Algorithm 2 for the distributed fusion step. −1 ε 2) Recovery Step: We implement the recovery step (18) in (j,m) (i) (j,m) (i) (j,m) kl wn = ηj (xn ) ηl (xn ) . (21) a similar way via weighted mixture importance sampling. Let (∞) l∈N k GMk be the fully fused posterior held by Sk , i.e., ηk (xn ), and GMpk be the prior prediction of the current state by Sk , i.e., We set Mj to be proportional to the Metropolis weight εkj, i.e., f(xn |y − ). We draw half of the samples from GMk and the Mj = Mεkj , where is the floor function and M is the 1:n 1 (m ) given total number of samples to be drawn (the total number of other half from GMpk . For a sample xn drawn from GMk , its thus generated samples might be smaller than M due to round- importance weight is calculated as ing, but could be manually adjusted back to M by distributing (m ) K (m ) K −1 the unused quota to some of the Gaussian mixtures to be fused). (m ) GMk (xn ) GMk (xn ) wn = = ; (j,m) (m ) K −1 (m ) (m ) K −1 After applying the normalization step (15) to {wn }, a Gaus- GMpk (xn ) GMk (xn ) GMpk (xn ) (i+1) (22) sian mixture model of the updated posterior ηk (xn ) can LI AND NEHORAI: DISTRIBUTED PARTICLE FILTERING VIA OPTIMAL FUSION OF GAUSSIAN MIXTURES 285

Algorithm 3: GM Recovery. Algorithm 4: Distributed Particle Filtering. 1: procedure GMRECOVER(GMk ,GMpk ) { (m ) (m ) }M,K 1: procedure DPF( xn−1,k,wn−1,k m,k=1, yn ) 2: initialize M 2: for k =1to K do in parallel filtering (m ) M/2 3: generate {x } from GM (m ) (m ) (m ) (m ) m =1 k {x ,w }M = {x ,w }M y 3: n,k n,k m =1 PF( n−1,k n−1,k m =1, n,k) 4: for m =1to M/2 do (m ) (m ) K −1 { }M (m ) (m ) (m ) 4: GMpk = GMLEARN( g(xn−1,k),wn−1,k m =1) 5: w = GMk (x )/GMpk (x ) { (m ) (m )}M 6: end for 5: GMk = GMLEARN( xn,k ,wn,k m =1) { (m )}M 6: end for 7: generate x m = M/2 +1 from GMpk 7: repeat fusion 8: for m = M/ 2 +1to M do K 8: for k =1to K do in parallel 9: w(m ) = GM (x(m ))/GM (x(m )) k pk 9: S sends GM to S for ∀j ∈ N 10: end for k k j k { (m )}M 10: end for 11: normalize w m =1 { (m ) (m )}M 11: for k =1to K do in parallel 12: return GMLEARN( x ,w m =1) 12: GM = GMFUSE(GM , {GM } ∈ ) 13: end procedure k k j j N k 13: end for 14: until convergence 15: for k =1to K do in parallel recovery (m ) for a sample xn drawn from GMpk , its importance weight is 16: GMk = GMRECOVER(GMk ,GMpk ) calculated as { (m )}M 17: generate xn,k m =1 from GMk (m ) K (m ) K 18: end for (m ) GMk (xn ) GMk (xn ) (m ) M,K wn = = . 19: return {x , 1/M } (m ) K −1 (m ) (m ) K n,k m,k=1 GMpk (xn ) GMpk (xn ) GMpk (xn ) 20: end procedure (23) A Gaussian mixture model of the recovered global posterior is (m ) (m ) V. P ERFORMANCE ANALYSIS then learned from the weighted samples {xn ,wn } using Algorithm 1. Note that we do not apply weighted sample al- In this section, we investigate the performance of the proposed location to the recovery step, because negative weights are not distributed particle filtering algorithm in terms of convergence, well justified for allocation. communication overhead, and computational complexity. The recovery step is summarized in Algorithm 3. As we can see, the fusion of Gaussian mixtures in A. Convergence of Average Consensus Algorithms 2 and 3 depends only on the density function de- The proposed distributed particle filtering algorithm is built scribed by each Gaussian mixture and does not care about how on an average consensus algorithm. A standard average con- many components each Gaussian mixture has. In other words, sensus algorithm is proved to converge under certain conditions the proposed method gives each individual sensor the flexibil- in [4], [21], [29], and [38]. However, the proof for standard ity to choose an optimal, yet not necessarily uniform, number average consensus does not directly apply to the proposed aver- of components based on its own samples, thus improving ap- age consensus algorithm in Section IV-A, because the proposed proximation accuracy and efficiency. In contrast, most other algorithm has an additional normalization step (15) for each sen- posterior-based algorithms fuse local posteriors based on their sor in each iteration and is thus different from standard average parameters rather than the density functions described by these consensus. We claim the convergence of the proposed average parameters, and thus put structural constraints on local paramet- consensus algorithm in (17) and, for rigorousness, show the ric representations. For example, linear fusion of Gaussian mix- convergence below, based on the proof for the convergence of tures [17], [18] requires each mixture to have the same number standard average consensus. of components. This requirement gives less flexibility to sensors Theorem 1: After a sufficiently large number of iterations and compromises adaptivity in local signal processing. of average consensus with normalization, the posterior held by each sensor converges in probability to the normalized geo- D. Summary metric mean of the initial local posteriors obtained from local We summarize the proposed distributed particle filtering al- particle filters. gorithm in Algorithm 4, in which “PF” is short for “particle Proof: The exponential form of (14) with normalization (15) can be written as filtering”, and the convergence in fusion is locally determined ε when the discrepancy among local beliefs is lower than a certain (i+1) (i+1) (i) kj η (xn )=γ η (xn ) , (24) threshold under a chosen metric or no neighbor is still sending k k j j∈N k data. We do not specify the exact particle filter used for local (i+1) (i+1) particle filtering, because any particle filter would fit. Also, each where γk is a constant coefficient that normalizes ηk (xn ) sensor can select its own customized particle filter, thanks to the so that it integrates to one. Each consensus iteration involves flexibility given by posterior-based fusion. such a constant coefficient for each sensor, and the constant 286 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018 coefficient accumulates across iterations. We denote the part of Note that the proof above is for the average consensus al- (i) gorithm, instead of the distributed particle filtering algorithm, ηk that comes purely from the fusion of the original posteriors obtained from local particle filters (in other words, η(0))as and thus does not involve Gaussian mixture approximations or k Monte Carlo approximations. We discuss the convergence of p(i)(x ), and the accumulated constant coefficient that η(i) has k n k the distributed particle filtering algorithm, with approximations λ(i) (i) collected up to the ith iteration as k . Then, we have ηk (xn )= involved, in the following subsection. λ(i) (i) λ(0) (0) k pk (xn ). When i =0,wehave k =1and pk (xn )= | ≥ f(xn yn,k, y1:n−1 ); when i 1,wehave B. Convergence of Distributed Particle Filtering εkj η(i+1)(x )=γ(i+1) λ(i)p(i)(x ) The proposed distributed particle filtering algorithm imple- k n k j j n ments the proposed average consensus algorithm with approxi- j∈N k mations and asymptotically converges under the following three ε ε (i+1) λ(i) kj × (i) kj assumptions: (i) the number of consensus iterations is suffi- = γk j pj (xn ) . (25) ∈ ∈ ciently large, (ii) the number of generated samples is suffi- j Nk j N k ciently large, and (iii) the approximation error of a Gaussian λ( i +1) p( i +1)(x ) k k n mixture model is sufficiently small. In practice, however, none (i+1) of these can be perfectly satisfied without considerable commu- The logarithmic form of the last term pk (xn ) is nication or computation. Hence, convergence errors are usually (i+1) (i) inevitable. Due to independent randomness, different sensors log p (xn )= εkj log p (xn ) k j are likely to have different convergence errors, thus resulting j∈N k in consensus errors. Although the proposed algorithm does not (i) (i) − (i) = log pk (xn )+ εkj log pj (xn ) log pk (xn ) , require exact consensus as weight-based algorithms do, inexact j∈N k consensus, if too significant, leads to errors in both filtering and (26) fusion in upcoming time steps. Consensus errors can be manually eliminated by additional which coincides with the canonical form of (weighted) average average consensus on the parameters of the obtained Gaussian consensus. With the underlying graph G being connected and mixture models. To promote the convergence of the average con- not bipartite, according to [4], [21], [29], and [38], we have the sensus, we match similar components between different Gaus- following convergence in probability sian mixtures based on the Kullback-Leibler (KL) distance, and K perform local parameter averaging among the matched compo- (i) 1 (0) lim log p (xn )= log p (xn ), (27) nents. As mentioned in the Introduction, parameter-based aver- i→∞ k K k k=1 age consensus is not justified by the underlying statistical model or equivalently, and thus is suboptimal in the fusion of local posteriors. However, its suboptimality is not problematic here, because the method is K 1 (i) (0) K used not for fusion but for numerical fine-tuning of beliefs that lim p (xn )= p (xn ) i→∞ k k are already close to consensus. Also, because of the closeness to k=1 consensus, it is not expected to take many consensus iterations. K 1 Note that parameter-based average consensus requires that | K = f(xn yn,k, y1:n−1 ) . (28) all the Gaussian mixtures to be fused have the same number k=1 of components. To satisfy the constraint, we have to adjust the Hence, number of components for each Gaussian mixture in case they (i) (i) (i) do not agree. We achieve this via sampling. More specifically, lim η (xn ) = lim λ p (xn ) i→∞ k i→∞ k k we first sample from each Gaussian mixture and then learn from (i) (i) the samples a Gaussian mixture model with a specified uniform = lim λ lim p (xn ) i→∞ k i→∞ k number of components. K (i) 1 λ | K C. Communication Overhead = lim f(xn yn,k, y1:n−1 ) , (29) i→∞ k k=1 In the proposed algorithm, posteriors are transmitted be- (i) tween sensors in the form of Gaussian mixtures. Let C be where lim i→∞ ηk (xn ) is the posterior held by Sk at conver- K 1 the average number of components in these Gaussian mix- f(x |y , y ) K gence, k=1 n n,k 1:n−1 is the geometric mean of tures, then we need to transmit C(d2 + d +1) numbers per λ(i) the initial local posteriors, and limi→∞ k normalizes the ge- Gaussian mixture, with d being the state dimension. Since ometric mean so that it exists as a valid probability density covariance matrices are symmetric, we only need to transmit (i) 2 2 function in the form of limi→∞ ηk (xn ). (d + d)/2, instead of d , numbers for each covariance matrix Following the convergence, the global posterior can be ob- in a Gaussian mixture. Also, since component weights sum to tained separately by each sensor through a recovery step, which one, we only need to transmit C − 1, instead of C, component results from substituting (29) into (11). weights. Thus, the actual count of numbers needed to represent a LI AND NEHORAI: DISTRIBUTED PARTICLE FILTERING VIA OPTIMAL FUSION OF GAUSSIAN MIXTURES 287

Gaussian mixture is Cd2 /2+(C/2+1)d + C − 1. In a con- approximation (M is the sample size, q is the dimension of the sensus iteration, each communication link is used once in each state function appearing in factorization, and R is the dimension direction, so the total number of Gaussian mixtures transmit- of the polynomial basis expansion) and O(LKR) on consensus ted in a consensus iteration is 2|E|.LetL be the number of (L is the number of consensus iterations). Thus, the overall consensus iterations, and then the proposed algorithm commu- complexity is O(R3 +(M + q)R2 +[(d + q)M + LK]R). nicates 2|E|L[Cd2 /2+(C/2+1)d + C − 1] numbers in to- Since R itself is a combinatorial function of the state dimension tal and 2|E|L[Cd2 /2+(C/2+1)d + C − 1]/K numbers per d, the cubic function of R might make the algorithm scale sensor during each time step. Since |E| ranges from O(K) to poorly in high-dimensional systems. A weight-based algorithm O(K2 ) for a connected graph, the communication complexity [9] only needs to perform average consensus on weights per sensor is between O(LCd2 ) and O(KLCd2 ). with a computational complexity of O(LKM). A Gaussian In comparison, the count of numbers transmitted by each sen- posterior-based algorithm costs O(LKd3 ). Generally, the sor in a weight-based algorithm is proportional to the number proposed algorithm and the likelihood-based algorithm require of particles, which is proved to grow exponentially with the more computation than the weight-based algorithm and the state dimensionality d for a successful particle filter [39]; the Gaussian posterior-based algorithm. The former two algorithms communication complexity of a likelihood-based algorithm is use a certain compact representation for inter-sensor com- combinatorial in the state dimension, because the number of re- munication and thus need to enclose information in and read gression coefficients needed for the polynomial approximation information out of the representation. Such a representation presented in [11] is combinatorial with the state dimension; the incurs much lower communication overhead than particles, and communication complexity of a posterior-based algorithm with provides a more accurate approximation than a single Gaussian the Gaussian approximations is quadratic in the state dimen- distribution, as shown in Section VI. sion, with d and (d2 + d)/2 numbers to represent the mean and covariance matrix, respectively, of a Gaussian distribution. It is hard to directly compare the communication complex- VI. NUMERICAL EXAMPLES ity of the proposed algorithm with those of other algorithms, In this section, we demonstrate the performance of the pro- because the dependence of L and C on d is mostly problem- posed distributed particle filtering algorithm in comparison specific. In Section VI-E, we compare the actual communication with weight-based, likelihood-based, and other posterior-based costs of different algorithms through numerical examples. algorithms, through numerical examples of distributed target tracking. D. Computational Complexity We now investigate the computational complexity of dis- A. General Settings tributed particle filtering algorithms, focusing on the fusion part We considered a wireless sensor network consisting of 20 with the filtering part excluded, because the latter has almost the sensors programmed to track a moving target. same complexity among different algorithms. Since a distributed The target followed a Wiener process acceleration model [40] particle filtering algorithm runs at each sensor in parallel, we in two-dimensional space. The target state consisted of the po- only consider the computation performed at a single sensor. sition, velocity, and acceleration of the target in each dimension The proposed algorithm calls Algorithms 1, 2, and 3. 2 as Algorithm 1 (GM learning) costs O(Lg Mg Cd ) to learn a Gaussian mixture of C components from Mg samples in Lg T iterations. Algorithm 2 (GM fusion) calls Algorithm 1 once xn = xn,1 xn,2 x˙ n,1 x˙ n,2 x¨n,1 x¨n,2 (30) 2 and costs O(KMf Cd ) in addition to calling Algorithm 1, where K means that a sensor has at most O(K) neighbors and The state transition function was Mf is the sample size for importance sampling in distributed fusion. Algorithm 3 (GM recovery) also calls Algorithm 1 g(x )=D · x , (31) 2 n n once and costs O(Mr Cd ) in addition to calling Algorithm 1, with Mr being the sample size for importance sampling in recovery. In addition to Algorithms 1, 2, and 3, the pro- where posed algorithm calls the parameter-based average consensus ⎡ ⎤ 2 1 2 algorithm for numerical fine-tuning, which costs O(KCd ) in 10t 0 2 t 0 ⎢ 1 ⎥ ⎢ 010t 0 t2 ⎥ each iteration. In summary, if we assume that the proposed ⎢ 2 ⎥ ⎢ 0010 t 0 ⎥ algorithm takes Lf iterations for distributed fusion and Lp iter- D = ⎢ ⎥ (32) ations for fine-tuning, then the overall computational complex- ⎢ 0001 0 t ⎥ 2 ⎣ 0000 1 0⎦ ity is O((Lf Lg Mg + KLf Mf + Mr + KLp )Cd ). Assuming 0000 0 1 that Mg , Mf , and Mr are all O(M), then the complexity sim- 2 plifies to O([(Lg + K)Lf M + KLp ]Cd ). In comparison, a likelihood-based algorithm [11] with t being the state transition interval. The state transi- 3 2 costs O(R +(M + q)R +(d + q)MR) on polynomial tion noise un followed a multivariate Gaussian distribution 288 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018

20,000 samples for importance sampling in both distributed fusion and recovery of the proposed algorithm. We compared the proposed algorithm (“Optimal GM”) with other posterior-based algorithms, including the Bayesian fusion of Gaussian approximations (“Bayesian Gauss”) [13] and the linear fusion of Gaussian mixtures (“Linear GM”) [18]. We also compared it with a representative weight-based algorithm [9], likelihood-based algorithm [11], and distributed unscented Kalman filter (UKF) [41], which can be also considered as a posterior-based algorithm, although it does not involve particle filtering. Moreover, we compared it with centralized particle filtering, which served as a benchmark. We tested all the algorithms on repeated experiments and compared their average performance.

B. Metrics

Fig. 1. Wireless sensor network, communication links, and target trajectory. We considered the posterior mean as a point estimate of each state and used the root-mean-square error (RMSE) to quantify the performance. For a single state xn , the RMSE of an esti- N (0, R), where mate xˆn was defined as ||xˆn − xn ||, namely the l-2 norm of ⎡ ⎤ − { }T 1 t5 0 1 t4 0 1 t3 0 xˆn xn ; for a state sequence of length T , i.e., xn n=1,the 20 8 6 T ⎢ ⎥ average RMSE (ARMSE) of a sequence estimate, {xˆn } , ⎢ 0 1 t5 0 1 t4 0 1 t3 ⎥ n=1 ⎢ 20 8 6 ⎥ 1 T || − ||2 ⎢ 1 4 1 3 1 2 ⎥ was defined as T n=1 xˆn xn . In a network that per- 2 ⎢ 8 t 0 3 t 0 2 t 0 ⎥ R = σ ⎢ ⎥. (33) forms distributed filtering, each sensor holds a separate global u ⎢ 0 1 t4 0 1 t3 0 1 t2 ⎥ ⎢ 8 3 2 ⎥ estimate and thus has its own RMSE and ARMSE. We used their ⎣ 1 3 1 2 ⎦ 6 t 0 2 t 0 t 0 averages to quantify the performance of the whole network. We 1 3 1 2 used the Kullback-Leibler (KL) distance [42] to describe the 0 6 t 0 2 t 0 t dissimilarity between two Gaussian mixtures. Since it is ana- We assumed that the target traveled over 30 unit-length state lytically intractable to compute the KL distance between two transition intervals. Gaussian mixtures, we approximated it using the first Gaussian Each sensor measured the range and range rate (Doppler) of approximation approach introduced in [43]. the target. The kth sensor, Sk , was located at lk =(lk,1 ,lk,2 ) with the observation function C. Accuracy T hk (xn )= hk,range(xn ) hk,doppler(xn ) , (34) We tested all the methods on repeated experiments to com- where pare their trajectory estimation accuracy as an ensemble average. Fig. 2 compares the ARMSEs as a function of the number of 2 2 hk,range(xn )= (xn,1 − lk,1 ) +(xn,2 − lk,2 ) (35) particles, under sufficient consensus iterations. We can see that the error of the proposed method varied the most with the num- and ber of particles. With 2000 particles, its error was the second

x˙ n,1 (xn,1 − lk,1 )+x ˙n,2 (xn,2 − lk,2 ) highest; with no less than 8000 particles, its error was close hk,doppler(xn )= , (36) (x − l )2 +(x − l )2 to that of centralized particle filtering and no higher than that n,1 k,1 n,2 k,2 of any other method. The performance of the proposed method and the observation noise varied significantly because the approximation accuracy of a 0 σ2 0 Gaussian mixture is strongly affected by the number of particles v ∼N , v . (37) n,k 0 0 σ2 used in local particle filtering. In contrast, the error of Bayesian w Gauss, also a posterior-based method, stayed almost constant We set σu as 0.5, σv as 1, and σw as 1. We set the neigh- across different numbers of particles, because the accuracy of a borhood radius threshold ρ according to the sensor locations to Gaussian approximation, consisting of a mean and a covariance ensure that the network was connected. We set the initial target matrix only, is relatively robust to the number of particles used state x0 as 0, and assumed N (x0 , R) as the prior information to represent a local posterior. The error of Linear GM, another available to each sensor. Fig. 1 shows a wireless sensor network posterior-based method, did not vary much with the number of and a realization of the target trajectory under random noise. particles either, because Linear GM failed to benefit from the In a predefined fashion, we assumed three components for any increased number of particles due to its unjustified fusion rule. Gaussian mixture. We used the sampling importance resam- The errors of both the likelihood-based and weight-based meth- pling (SIR) particle filter [1] for local particle filtering. We used ods dropped as the number of particles increased. Their errors LI AND NEHORAI: DISTRIBUTED PARTICLE FILTERING VIA OPTIMAL FUSION OF GAUSSIAN MIXTURES 289

Fig. 4. KL distance and state estimation RMSE across iterations during the 10th time step using the proposed method. Fig. 2. A comparison in the trajectory estimation ARMSE as a function of the number of particles. high errors at many time points. Among all the methods, Linear GM obviously yielded the highest errors.

D. Consensus We investigated the consensus process of the proposed method within a single time step. Fig. 4 shows the consensus process during the 10th time step as an example, in which we applied the proposed method, with 10 000 particles for local filtering, 20 iterations for average consensus, and 20 iterations for numerical fine-tuning, to the example in Fig. 1. We can see that in both average consensus and numerical fine-tuning, both the KL distance and the RMSE dropped and converged as the algorithm proceeded, which demonstrated the validity of the proposed average consensus algorithm in terms of convergence. Note that the metrics in Fig. 4 were computed based on unre- covered beliefs for average consensus and recovered beliefs for Fig. 3. A comparison in the state estimation RMSE as a function of time. numerical fine-tuning, so they came in different scales.

E. Communication Overhead were lower than that of the proposed method when the number of particles was small, and comparable to that of the proposed We investigated the communication overhead of each method method when the number of particles was either medium or and the relationship between communication overhead and es- large. In summary, the proposed method was the most accurate timation accuracy. For each method, we fixed the number of among all the posterior-based methods, and competitive with the particles at its elbow point, as specified in Section VI-C, and likelihood-based and weight-based methods, when the number investigated its performance, as an ensemble average from of particles was not too small. repeated experiments, under a varied number of consensus We also investigated the state estimation accuracy of each iterations. method. For each method, we used the number of particles cor- In Fig. 5, we demonstrate the effect of the number of con- responding to its elbow point in Fig. 2, i.e., 10 000 particles sensus iterations on the performance of a distributed filtering for the proposed method, 2000 for Bayesian Gauss, 4000 for algorithm. As we can see, the error of each method dropped as Linear GM, 10 000 for the weight-based method, 8000 for the the number of iterations increased and stayed constant beyond likelihood-based method, and 6000 for centralized particle fil- a certain threshold. tering. In Fig. 3, we show the state estimation RMSE of each In Fig. 6, we show the trajectory estimation ARMSE of method as a function of time along the trajectory of the tar- each method as a function of the communication overhead get. We can see that the proposed method, the likelihood-based per time step. We used the count of numbers transmitted be- method, the weight-based method, and centralized particle fil- tween sensors in the network to quantify the communication tering had state estimation errors at almost the same level, while overhead of each method. As expected, there was a trade-off Linear GM, Bayesian Gauss, and distributed UKF suffered from between estimation accuracy and communication efficiency. 290 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018

Fig. 7. Trajectory estimation ARMSE and average degree as functions of the local communication radius. Fig. 5. Trajectory estimation ARMSE as a function of the number of consensus iterations.

accuracy of Gaussian approximations. The communication overhead of distributed UKF was close to that of the proposed method, while that of Bayesian Gauss was lower than that of any other method in Fig. 6. Note that the trade-off between estimation accuracy and communication efficiency existed not only within each method, but also between different methods. As we can see, the proposed method was more accurate than Bayesian Gauss, benefiting from the upgrade from Gaussian approximations to Gaussian mixture approximations, but in the meantime incurred extra communication overhead due to the upgrade. Given the significant improvement in accuracy, we claim that the extra communication incurred by the Gaussian mixture model used in the proposed method was justified.

F. Local Communication Radius The local communication radius determines the number of neighbors for each sensor. Fig. 7 shows the effect of the radius on the performance of distributed particle filtering methods, Fig. 6. Trajectory estimation ARMSE as a function of the communication with both the number of particles and the number of consensus cost per time step. iterations fixed at the respective elbow points corresponding to each method. The simulations were conducted on the network in Fig. 1, whose default radius was 48. As we can see in Fig. 7, For each method, the estimation error dropped as the com- when the radius was increasing below the default radius, the er- munication overhead increased, but stayed almost constant be- rors of distributed UKF and weight-based method decreased dra- yond a certain threshold. The proposed method, the weight- matically, and those of the proposed method, Bayesian Gauss, based method, and the likelihood-based method had errors at and the likelihood-based method decreased slightly; when the the same level but communication costs of different orders of radius was increasing above the default radius, the error of each magnitude. The weight-based method, which communicated method either stayed constant or decreased slightly. In fact, the non-parametric representations, transmitted significantly more radius controls the rate of consensus. When the radius is small, numbers than the proposed method and the likelihood-based it might takes many iterations of communication for informa- method, both of which communicated parametric represen- tion to be transmitted from a sensor to another in the network; tations. The likelihood-based method, which used polyno- when the radius is sufficiently large, a sensor can communicate mial approximations, transmitted more numbers than the pro- directly with any other sensor in the network, and the network posed method, which used Gaussian mixture approximations. becomes effectively centralized. When the number of consensus Bayesian Gauss and distributed UKF, both posterior-based iterations is fixed, the radius effectively controls the progress of methods, had errors at the same level, higher than that of consensus. Thus, when a radius is large enough for the net- the proposed method, due to the insufficient approximation work to reach consensus within a given number of consensus LI AND NEHORAI: DISTRIBUTED PARTICLE FILTERING VIA OPTIMAL FUSION OF GAUSSIAN MIXTURES 291

Iterations, it would not help much to further increase the radius, [6] C. J. Bordin and M. G. S. Bruno, “Consensus-based distributed particle fil- as shown in Fig. 7. Also, a large radius adds to the difficulty of tering algorithms for cooperative blind equalization in receiver networks,” in Proc. 2011 IEEE Int. Conf. Acoust., Speech Signal Process., Prague, wireless communication, and thus might not be always desirable Czech Republic, May 2011, pp. 3968Ð3971. in distributed fusion. [7] C. J. Bordin and M. G. Bruno, “Distributed particle filtering for blind equalization in receiver networks using marginal non-parametric approx- VII. CONCLUSION imations,” in Proc. 2014 IEEE Int. Conf. Acoust., Speech Signal Process., Florence, Italy, May 2014, pp. 7984Ð7987. In this paper, we proposed a distributed particle filtering algo- [8] S. Farahmand, S. Roumeliotis, and G. B. Giannakis, “Set-membership rithm based on optimal fusion of local posteriors approximated constrained particle filter: Distributed adaptation for sensor networks,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4122Ð4138, as Gaussian mixtures. We implemented the optimal fusion rule Jun. 2011. in a distributed fashion via an average consensus algorithm. We [9] V.Savic, H. Wymeersch, and S. Zazo, “Belief consensus algorithms for fast derived a distributed fusion rule for the consensus algorithm distributed target tracking in wireless sensor networks,” Signal Process., vol. 95, pp. 149Ð160, Feb. 2014. and performed the fusion of Gaussian mixtures via a proposed [10] O. Hlinka, F. Hlawatsch, and P. M. Djuric,« “Likelihood consensus-based variant of importance sampling. With an extra normalization distributed particle filtering with distributed proposal density adaptation,” step involved in the distributed fusion rule, the convergence of in Proc. 2012 IEEE Int. Conf. Acoust., Speech Signal Process., Kyoto, Japan, Mar. 2012, pp. 3869Ð3872. the proposed average consensus algorithm does not directly fol- [11] O. Hlinka, O. Sluciak,ˇ F. Hlawatsch, P. M. Djuric,« and M. Rupp, low from that of a standard average consensus algorithm. We “Likelihood consensus and its application to distributed particle fil- therefore proved the convergence of the proposed average con- tering,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4334Ð4349, Aug. 2012. sensus algorithm and then validated it with numerical examples. [12] O. Sluciak,ˇ O. Hlinka, M. Rupp, F. Hlawatsch, and P. M. Djuric,« “Se- We also demonstrated through numerical examples that the pro- quential likelihood consensus and its application to distributed parti- posed distributed particle filtering algorithm is more accurate cle filtering with reduced communications and latency,” in Proc. 45th Asilomar Conf. Signals, Syst. Comput., Pacific Grove, CA, Nov. 2011, than other posterior-based algorithms, and approaches central- pp. 1766Ð1770. ized particle filtering when sufficient particles are used for local [13] B. N. Oreshkin and M. J. Coates, “Asynchronous distributed particle filter filtering. We compared communication costs and showed that via decentralized evaluation of Gaussian products,” in Proc. 13th Int. Conf. Inform. Fusion, Edinburgh, U.K., Jul. 2010, pp. 1Ð8. the proposed algorithm incurs a lower communication cost than [14] A. Mohammadi and A. Asif, “Consensus-based distributed unscented the weight-based and likelihood-based algorithms, thanks to the particle filter,” in Proc. 2011 IEEE Statistical Signal Process. Workshop, compact representation of the Gaussian mixture model, and that Nice, France, Jun. 2011, pp. 237Ð240. [15] J. K. Uhlmann, “Dynamic map building and localization: New theoretical the extra communication cost on the use of the Gaussian mixture foundations,” Ph.D. dissertation, Dept. Eng. Sci., Univ. Oxford, Oxford, model, instead of a Gaussian approximation, in the proposed al- U.K., 1995. gorithm is justified by the improvement in accuracy. [16] S. J. Julier and J. K. Uhlmann, “A non-divergent estimation algorithm in the presence of unknown correlations,” in Proc. 1997 Am. Control Conf., The advantages of the proposed distributed particle filter- Jun. 1997, vol. 4, pp. 2369Ð2373. ing algorithm extend beyond accuracy and communication effi- [17] D. Gu, J. Sun, Z. Hu, and H. Li, “Consensus based distributed particle filter ciency. As a posterior-based algorithm, it allows diverse sensing in sensor networks,” in Proc. 2008 Int. Conf. Inform. Autom., Changsha, China, Jun. 2008, pp. 302Ð307. modalities and filtering tools to be exploited by the network; by [18] D. Gu, “Distributed particle filter for target tracking,” in Proc. performing importance sampling in fusion, it does not require 2007 IEEE Int. Conf. Robot. Autom., Rome, Italy, Apr. 2007, uniformity in local approximations but allows each sensor to pp. 3856Ð3861. [19] S. J. Julier, “An empirical study into the use of Chernoff information for approximate its local belief as a Gaussian mixture with an adap- robust, distributed fusion of gaussian mixture models,” in Proc. 9th Int. tively determined number of mixture components based on its Conf. Inform. Fusion, 2006, pp. 1Ð8. own data. Although adaptive Gaussian mixture learning is not [20] M. B. Guldogan, “Consensus Bernoulli filter for distributed detection and tracking using multi-static Doppler shifts,” IEEE Signal Process. Lett., a focus of this paper, we will explore efficient solutions in fu- vol. 21, no. 6, pp. 672Ð676, Jun. 2014. ture work. Also, we will study analytical approximations to the [21] G. Battistelli, L. Chisci, C. Fantacci, A. Farina, and A. Graziano, “Consen- product of powers of Gaussian mixtures. Moreover, we will ex- sus CPHD filter for distributed multitarget tracking.” IEEE J. Sel. Topics Signal Process., vol. 7, no. 3, pp. 508Ð520, 2013. plore gossip algorithms to be used in the fusion of posteriors in [22] C. M. Bishop, “Mixture models and EM,” in, Pattern Recognition and distributed particle filtering. Machine Learning. New York, NY, USA: Springer, 2007. [23] P. Chavali and A. Nehorai, “Managing multi-modal sensor networks using REFERENCES price theory,” IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4874Ð4887, Jun. 2012. [1] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on par- [24] J. V. Candy, Bayesian Signal Processing: Classical, Modern and Par- ticle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE ticle Filtering Methods. New York, NY, USA: Wiley-Interscience, Trans. Signal Process., vol. 50, no. 2, pp. 174Ð188, Feb. 2002. 2009. [2] R. E. Kalman, “A new approach to linear filtering and prediction prob- [25] Y.-C. Wu, Q. Chauhari, and E. Serpedin, “Clock synchronization of wire- lems,” J. Basic Eng., vol. 82, no. 1, pp. 35Ð45, 1960. less sensor networks,” IEEE Signal Process. Mag., vol. 28, no. 1, pp. 124Ð [3] O. Hlinka, F. Hlawatsch, and P. M. Djuri, “Distributed particle filtering in 38, Jan. 2011. agent networks,” IEEE Signal Process. Mag., vol. 1, no. 30, pp. 61Ð81, [26] O. Simeone, U. Spagnolini, Y. Bar-Ness, and S. H. Strogatz, “Dis- Jan. 2013. tributed synchronization in wireless networks,” IEEE Signal Process. [4] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation Mag., vol. 25, pp. 81Ð97, Sep. 2008. in networked multi-agent systems,” Proc. IEEE, vol. 95, no. 1, pp. 215Ð [27] J. Li and A. Nehorai, “Joint sequential target estimation and clock syn- 233, Jan. 2007. chronization in wireless sensor networks,” IEEE Trans. Signal Inform. [5] D. Ustebay,¬ M. Coates, and M. Rabbat, “Distributed auxiliary par- Process. Netw., vol. 1, no. 2, pp. 74Ð88, Jun. 2015. ticle filters using selective gossip,” in Proc. 2011 IEEE Int. Conf. [28] Y. Bar-Shalom, “Distributed multitarget multisensor tracking,” in Acoust., Speech Signal Process., Prague, Czech Republic, May 2011, Multitarget-Multisensor Tracking: Advanced Applications.Storrs,CT, pp. 3296Ð3299. USA: YBS Publishing, 1990, vol. 1, ch. 8. 292 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018

[29] L. Xiao, S. Boyd, and S. Lall, “Distributed average consensus with time- Arye Nehorai (S’80–M’83–SM’90–F’94–LF’17) varying Metropolis weights,” Automatica, Jun. 2006. [Online]. Available: received the B.Sc. and M.Sc. degrees from the Tech- http://web.stanford.edu/~boyd/papers/avg_metropolis.html nion, Israel and the Ph.D. from Stanford University, [30] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood Stanford, CA, USA. He is the Eugene and Martha from incomplete data via the EM algorithm,” J. Roy. Statist. Soc., Series Lohman Professor of Electrical Engineering in the B (Methodological), vol. 39, no. 1, pp. 1Ð38, 1977. Preston M. Green Department of Electrical and Sys- [31] N. R. Ahmed and M. Campbell, “Fast consistent Chernoff fusion of Gaus- tems Engineering (ESE), Washington University in sian mixtures for ad hoc sensor networks,” IEEE Trans. Signal Process., St. Louis (WUSTL), St. Louis, MO, USA. He served vol. 60, no. 12, pp. 6739Ð6745, Nov. 2012. as Chair of this department from 2006 to 2016. Un- [32] D. M. Blei and M. I. Jordan, “Variational inference for Dirichlet process der his leadership, the undergraduate enrollment has mixtures,” Bayesian Anal., vol. 1, no. 1, pp. 121Ð144, 2006. more than tripled and the master’s enrollment has [33] P. Pacl«õk and J. Novovicovˇ a,« “Number of components and initialization in grown sevenfold. He is also a Professor in the Division of Biology and Biomed- Gaussian mixture model for pattern recognition,” in Artificial Neural Nets ical Sciences, the Division of Biostatistics, the Department of Biomedical En- and Genetic Algorithms. New York, NY, USA: Springer, 2001, pp. 406Ð gineering, and Department of Computer Science and Engineering, and Director 409. of the Center for Sensor Signal and Information Processing at WUSTL. Prior [34] T. Huang, H. Peng, and K. Zhang, “Model selection for Gaussian mixture to serving at WUSTL, he was a faculty member at Yale University and the models,” arXiv:1301.3558, 2013. University of Illinois at Chicago. [35] R. J. Steele and A. Raftery, “Performance of Bayesian model selection He was the Editor-in-Chief of the IEEE TRANSACTIONS ON SIGNAL PRO- criteria for Gaussian mixture models,” in Frontiers Statistical Decision CESSING from 2000 to 2002. From 2003 to 2005, he was the Vice President Making Bayesian Analysis. New York, NY, USA: Springer-Verlag, 2010, (Publications) of the IEEE Signal Processing Society (SPS), the Chair of the pp. 113Ð130. Publications Board, and a member of the Executive Committee of this Society. [36] C. Olivier, F. Jouzel, and A. Matouat, “Choice of the number of component He was the founding Editor of the special columns on Leadership Reflections clusters in mixture models by information criteria,” in Proc. Vis. Interface, in the IEEE SIGNAL PROCESSING MAGAZINE from 2003 to 2006. 1999, pp. 74Ð81. He received the 2006 IEEE SPS Technical Achievement Award and the 2010 [37] A. T. Ihler, E. B. Sudderth, W. T. Freeman, and A. S. Willsky, “Efficient IEEE SPS Meritorious Service Award. He was elected Distinguished Lecturer multiscale sampling from products of Gaussian mixtures,” Adv. Neural of the IEEE SPS for a term lasting from 2004 to 2005. He received several best Inform. Process. Syst., vol. 16, pp. 1Ð8, 2004. paper awards in IEEE journals and conferences. In 2001, he was named Uni- [38] G. Battistelli and L. Chisci, “Kullback-Leibler average, consensus on versity Scholar of the University of Illinois. He was the Principal Investigator probability densities, and distributed state estimation with guaranteed sta- of the Multidisciplinary University Research Initiative project titled Adaptive bility,” Automatica, vol. 50, no. 3, pp. 707Ð718, 2014. Waveform Diversity for Full Spectral Dominance from 2005 to 2010. He is a [39] C. Snyder, T. Bengtsson, P. Bickel, and J. Anderson, “Obstacles to high- Fellow of the Royal Statistical Society since 1996 and Fellow of AAAS since dimensional particle filtering,” Monthly Weather Rev., vol. 136, no. 12, 2012. pp. 4629Ð4640, Dec. 2008. [40] Y. Bar-Shalom, P. K. Willett, and X. Tian, “Introduction,” in Tracking and Data Fusion: A Handbook of Algorithms. Storrs, CT: YBS Publishing, 2011, ch. 1. [41] W. Li and Y. Jia, “Consensus-based distributed multiple model UKF for jump Markov nonlinear systems,” IEEE Trans. Autom. Control, vol. 57, no. 1, pp. 227Ð233, Dec. 2012. [42] T. M. Cover and J. A. Thomas, Elements of Information Theory.New York, NY, USA: Wiley, 2012. [43] J. R. Hershey and P. Olsen, “Approximating the Kullback Leibler diver- gence between Gaussian mixture models,” in Proc. 2007 IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 2007, vol. 4, pp. 317Ð320.

Jichuan Li received the B.Sc. degree in electrical engineering from Fudan University, Shanghai, China, in 2011 and the M.Sc. and Ph.D. degrees in electrical engineering from Washington Univer- sity in St. Louis, St. Louis, MO, USA, in 2014 and 2016, respectively, under the guidance of Dr. Arye Nehorai. He is currently a Software Engineer at Google, Inc., Mountain View, CA, USA. His research interests include statistical signal processing, Bayesian inference, Monte Carlo methods, machine learning, and distributed computing.