PARALLEL STOCHASTIC PARTICLE METHODS USING MARKOV CHAIN RANDOM WALKS
A DISSERTATION SUBMITTED TO THE DEPARTMENT OF AERONAUTICS AND ASTRONAUTICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Sun Hwan Lee December 2010
© 2011 by Sun Hwan Lee. All Rights Reserved. Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/jn897hc5058
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Matthew West, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Peter Glynn, Co-Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Juan Alonso
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Sanjay Lall
Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
Particle methods, also known as Monte Carlo methods in the statistical community, have become a powerful tool in a variety of research areas such as chemistry, astronomy, and finance, to name a few. This is mainly due to the enormous advances in computational resources in recent years. In this work, we consider an efficient and robust parallel methodology that can be applied to particle methods in a general setting. The parallel methodology proposed in this thesis takes advantage of Markov chain random walks and the corresponding Markov chain theory. We develop parallel stochastic particle methods in two different areas: (1) the optimal filtering problem, and (2) the simulation of particle coagulation. In each application, a mathematical proof of convergence as well as a numerical example is provided. After a brief review of Markov chain random walks and an explanation of the two application areas in chapter 1, the Markov Chain Distributed Particle Filter (MCDPF) algorithm is introduced. The performance of this method is demonstrated with a bearing-only-measurement target-tracking numerical example and is further compared with an existing method, the Distributed Extended Kalman Filter (DEKF), using a flocking model for the target vehicles. We study the convergence of the MCDPF to the Centralized Particle Filter (CPF) and to the optimal filtering solution using results from Markov chain theory. In addition, the robustness of the MCDPF method is highlighted for practical problems. As the second application area, we develop a parallel stochastic particle method for the stochastic simulation of Smoluchowski's coagulation equation. This equation is used in many broad areas, and for high-dimensional problems the stochastic particle solution is more accurate, stable, and computationally cheaper than classical numerical
integration schemes. In this application, the simulated particles can be considered as representing physical particles. Since more particles result in more accurate and useful solutions, it is desirable to simulate this equation with a greater number of particles. By applying the parallel stochastic particle method, a comparable solution is obtained more efficiently using multiple processors, where each processor maintains many fewer particles and communicates with neighboring processors. A numerical study as well as a theoretical analysis is provided to demonstrate the convergence of the parallel stochastic particle algorithm.
Acknowledgement
During the six years I spent on my M.S. and Ph.D. degrees, I seldom noticed the importance of the people around me who helped me in various ways. One nice thing about defending a Ph.D. is that it gives me an opportunity to pause and appreciate such valuable people as I wrap up my studies. First, I would like to thank Professor Matt West, my advisor, for giving me a great research opportunity and a lot of valuable advice. He is the one who introduced me to the areas of numerical computation, stochastic systems, and probability, and I was able to explore fields that were totally new to me thanks to his deep knowledge and generous support. I could not have finished my degree without his guidance and patience. I also want to thank Professor Peter Glynn, who advised me after Professor Matt West left Stanford. I learned a lot from his classes about stochastic systems and stochastic calculus, which equipped me with the theoretical background on those subjects. It was a great help to have someone with whom I could consult about research in person. Thanks to Professors Sanjay Lall, Juan Alonso, and James Primbs for generously serving as committee members for my Ph.D. oral examination. The Samsung Scholarship Foundation supported me for four years of my graduate studies. Along with the financial support, I really appreciated the opportunities to meet other Korean students studying across the world and the great experiences with them. My thanks also go to the friends I met here at Stanford: Younggeun Cho, Taemie Kim, Taesup Moon, Jeeyoung Peck, Chunki Park, Jinsung Kwon, Jongyoon Peck, Kahye Song, Daeseok Nam, Minyong Shin, Hyungsik Shin, Jonghan Kim, Jaeheung Park, and all SGBT members. I always miss Korea because of my friends: Taesung Choi, Jiyoung Kang, Keum-Dong Jung, Yoonkyoung Hur, Jisun Peck, Hyejung Lee,
Seungmin Wie, and Sehyuk Kwak. No words are enough to thank my family in Korea for their spiritual support and love. My sincere thanks go to my parents, Joowon Lee and Hwasook Park, whom I respect the most in the world, and to my older brother, Daehwan Lee, who is a good competitor in all kinds of sports and with whom I hope to have many rounds of golf. I also would like to thank my parents-in-law for their love and care. Last but not least, I would like to thank my family, YeoMyoung and Yuna. My marriage and the birth of my daughter changed my Ph.D. life dramatically, but in a very positive way. From our first moment at the Stanford West tennis court to finishing my Ph.D. degree, we enjoyed life at Stanford as a student family, and I am so excited about the journey toward the new stage of our life from now on. I love you and thank you, YeoMyoung and Yuna.
Contents
Abstract iv
Acknowledgement vi
1 Introduction 1
  1.1 Problem description 1
  1.2 Dissertation overview 3

2 Background 4
  2.1 Markov chain random walk 4
  2.2 Steady state of Markov chains 6

3 Markov Chain Distributed Particle Filter 9
  3.1 Introduction 10
  3.2 Random walks on a graph 12
  3.3 Particle filters 14
    3.3.1 Centralized particle filters 15
    3.3.2 The Markov Chain Distributed Particle Filter (MCDPF) 16
    3.3.3 Convergence to CPF and algorithm 18
    3.3.4 Convergence to optimal filtering 20
  3.4 Strong convergence 23
    3.4.1 Preliminaries 23
    3.4.2 Proof of strong convergence 27
  3.5 Numerical certificate of strong convergence 35
  3.6 Performance comparison 38
    3.6.1 Extended Kalman filter 38
    3.6.2 Numerical example 41
  3.7 Conclusions 48

4 Parallel stochastic simulation of coagulation 50
  4.1 Introduction 51
  4.2 Gillespie's method 52
    4.2.1 Numerical example 53
  4.3 Parallel stochastic particle algorithm 55
    4.3.1 Numerical example 58
  4.4 Convergence of parallel stochastic particle method 61
  4.5 Conclusions 74
Bibliography 75
List of Tables
3.1 Table of the algorithms, RMSE values and the fraction of divergence (averaged over 1000 Monte Carlo runs) 43
3.2 Table of the algorithms, RMSE values and the fraction of divergence (averaged over 1000 Monte Carlo runs) 46
3.3 Table of the algorithms, RMSE values and the fraction of divergence 47
List of Figures
3.1 The trajectory estimation by CPF and DPF with Markov chain steps $k = 4$. 37
3.2 RMSE of MCDPF and CPF with respect to number of executions (left) and different Markov chain steps $k$ (right). 37
3.3 Trajectory of flocking model and its position estimation by EKF, DEKF, REKF, RDEKF, CPF and MCDPF. 44
3.4 RMSE versus time for EKF, DEKF, REKF, RDEKF, CPF, MCDPF. 45
3.5 RMSE with respect to BW with changing $N = 50, 100, 200, 500$ (CPF), $k_{mc} = 5, 10, 20, 50$ (MCDPF) and $k_{con} = 2, 6, 10, 14$ (DEKF). The decrease in RMSE is observed with increased BW. 48
4.1 $c(t, k)$ of linear kernel with $2 \le k \le 10$. 54
4.2 The stochastic solution with $M_0 = 500, 120$ and the histogram of $\tau_n$. For $M_0 = 120$ and $M_0 = 500$ the portion of $\tau_n \le 0.01$ is 0.4311 and 0.8016. 56
4.3 The plot of $c_{10^4}(t, 5)$ and $\tilde{c}_{10^4}(t, 5)$ with 3 different $\tau_{mix}$'s. 59
4.4 The plot of $c_{10^4}(5, k)$ and $\tilde{c}_{10^4}(5, k)$ with 3 different $\tau_{mix}$'s. 60
4.5 $\|e_R\|_2$ defined in (4.22) up to an ensemble size of $R = 10^4$. 60
4.6 Particular realization of particle coagulation using Gillespie's algorithm. 67
Chapter 1
Introduction
1.1 Problem description
Stochastic particle methods, which are based on Monte Carlo or sampling methods, have become powerful and practical tools in a variety of research areas due to significant developments in computing power. Nonlinear and high dimensional functions and complex probability distributions are good examples of objects that can be represented by a set of particles and associated weights. In many areas where particle methods are used, it is generally true that more particles mean more accurate results. Therefore there is a need for faster computation to process a large number of particles in order to apply particle methods in more practical situations. Parallel computation is a direct method to achieve this objective, not only because it meets this requirement by itself but also because some practical situations are naturally distributed over a computer network. As simple examples, the position estimation of a moving object can be achieved by a physically distributed sensor network, and numerous computationally expensive simulations already execute on parallel clusters. Applying parallel computation to particle methods, however, requires clarification of what type of information is to be communicated and how to exchange it between processing nodes. We use a Markov chain random walk as a way to communicate information between processors in a parallel particle method, and we exchange individual particles
as the basic unit of data. This is in contrast with methods that exchange parameterized representations of sets of particles. While such methods often aim to reduce the communication load, we will show that a parallel method using raw particle data is superior in some cases. Furthermore, we can easily prove the convergence of our parallel methods to centralized approaches by adapting well established results from Markov chain theory. The parallel stochastic particle method introduced in this thesis is demonstrated through application to two different areas: (1) an optimal filtering problem, and (2) the stochastic simulation of particle coagulation. The application of the parallel stochastic particle method to the optimal filtering problem is motivated by recent interest in distributed nonlinear system estimation, which has practical implications in many areas. We show that the Markov Chain Distributed Particle Filter (MCDPF) studied in this thesis converges both weakly and strongly to both the Centralized Particle Filter (CPF) and the optimal filtering solution in a probabilistically well defined manner. The robustness and practicality of the MCDPF is demonstrated numerically by comparing its performance with an existing distributed estimation method, the Distributed Extended Kalman Filter (DEKF), on a distributed target tracking problem with flocking vehicles. The motivation for the second application, the simulation of particle coagulation, stems from the need for a massive number of simulated particles to adequately capture complex, high-dimensional particle populations. The distributed stochastic particle simulation method is based on the centralized Stochastic Simulation Algorithm (SSA, or Gillespie's method). The idea is to distribute particles across many processors and to exchange particles between neighboring processors so that particles at different processors can interact with each other.
The convergence of the distributed simulation to the centralized simulation is shown analytically as the particle exchange rate goes to infinity. To explore the convergence rates, a numerical example with varying particle exchange rate is presented.
1.2 Dissertation overview
We review the background on Markov chain random walks and steady state Markov chain distributions in the next chapter. Following this background material, we consider the use of parallel stochastic particle methods for the optimal filtering problem and the simulation of particle coagulation in separate chapters. In chapter 3, the Markov Chain Distributed Particle Filter (MCDPF) algorithm is introduced and its convergence to the Centralized Particle Filter (CPF) is proved. In addition, the performance of the MCDPF is compared with an existing distributed nonlinear system estimation method, the Distributed Extended Kalman Filter (DEKF), using distributed target tracking for a flocking model. In chapter 4, the parallel stochastic particle method is applied to solving Smoluchowski's coagulation equation. We present a parallel algorithm for this problem and prove convergence of its solution to that of the serial algorithm. To illustrate the behavior of the parallel algorithm, a numerical example is also included.

Chapter 2
Background
A Markov chain is a stochastic process whose next value depends only on the current value; such processes can be classified as discrete-time Markov chains (DTMC), continuous-time Markov chains (CTMC), or Brownian motion, depending on the type of the state space and time index. Many theorems on the subject of DTMCs can be generalized to the case of Brownian motion, but we focus only on DTMCs in this thesis. The material reviewed here mostly follows [51].
2.1 Markov chain random walk
A discrete-time Markov chain (DTMC) is a Markov process whose state space is a finite or a countable set and whose time index set is discrete, T = (0, 1, 2,...). A DTMC has the Markov property,
\[
\Pr(X_{n+1} = j \mid X_0 = i_0, \dots, X_{n-1} = i_{n-1}, X_n = i) = \Pr(X_{n+1} = j \mid X_n = i), \tag{2.1}
\]
where $X_n$ is the random variable at time $n$ and $i_0, \dots, i_{n-1}, i, j$ are states of the DTMC. Since the future value depends on the past only through the current value, the one-step transition probability, which describes the probability of $X_{n+1}$ being at state $j$ given that $X_n$ is at $i$, is defined as follows:
\[
P_{ij}^{n,n+1} = \Pr(X_{n+1} = j \mid X_n = i). \tag{2.2}
\]
If the one-step transition probability is independent of the time index $n$, then we say that the DTMC has stationary transition probabilities, and the transition probabilities can be arranged in matrix form:
\[
P = \begin{pmatrix}
P_{00} & P_{01} & P_{02} & \cdots \\
P_{10} & P_{11} & P_{12} & \cdots \\
P_{20} & P_{21} & P_{22} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}. \tag{2.3}
\]
We restrict our discussion in this thesis to stationary Markov chains from now on.
The matrix P is called a Markov matrix or a transition probability matrix and it satisfies the following properties.
\[
P_{ij} \ge 0 \quad \text{for } i, j = 0, 1, 2, \dots \tag{2.4}
\]
\[
\sum_{j=0}^{\infty} P_{ij} = 1 \quad \text{for } i = 0, 1, 2, \dots \tag{2.5}
\]
A DTMC is completely defined by its transition probability matrix and an initial state $X_0$, since the probability of an arbitrary sample path is obtained as follows.
\begin{align*}
\Pr(X_0 = i_0, X_1 = i_1, \dots, X_n = i_n)
&= \Pr(X_0 = i_0, \dots, X_{n-1} = i_{n-1}) \tag{2.6} \\
&\quad \times \Pr(X_n = i_n \mid X_0 = i_0, \dots, X_{n-1} = i_{n-1}) \tag{2.7} \\
&= \Pr(X_0 = i_0, \dots, X_{n-1} = i_{n-1}) \, P_{i_{n-1}, i_n} \tag{2.8} \\
&= \cdots = p_{i_0} P_{i_0, i_1} \cdots P_{i_{n-1}, i_n}, \tag{2.9}
\end{align*}
where $p_{i_0} = \Pr(X_0 = i_0)$.

A Markov chain random walk is a random process whose transition probability is defined by the transition probability matrix $P$. It is easily understood if we consider a particle moving in the state space according to the transition probability matrix: the probability that the particle moves from state $i$ to state $j$, where $i, j$ are states of the Markov chain, is the $(i,j)$-th element of the transition probability matrix $P$,
\[
\Pr(X_n = j \mid X_{n-1} = i) = P_{ij}. \tag{2.10}
\]
The Markov chain random walk introduced in this section will later be used as a methodology to communicate or exchange information between connected nodes. In the parallel setting, the particle represents an appropriate type of information, and the states are treated as processors of the parallel machine, sensors, or nodes in a sensor network.
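As a concrete illustration of the random walk and the path-probability factorization in (2.9), the following sketch simulates a short walk driven by a transition matrix and evaluates the probability of a given path. The three-state matrix and the initial distribution are hypothetical choices for illustration only, not taken from the text.

```python
import random

# A small 3-state transition matrix (illustrative values).
# Each row sums to one, as required by (2.4)-(2.5).
P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

def step(state, rng):
    """One random-walk step: sample the next state from row `state` of P."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return j
    return len(P) - 1

def path_probability(path, p0):
    """Pr(X_0 = i_0, ..., X_n = i_n) = p_{i_0} P_{i_0,i_1} ... P_{i_{n-1},i_n}."""
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

rng = random.Random(0)
state, path = 0, [0]
for _ in range(5):
    state = step(state, rng)
    path.append(state)
print(path, path_probability(path, [1.0, 0.0, 0.0]))
```

Starting from state 0 with certainty, every realized length-5 path here has probability $0.5^5$, since each row of this particular matrix spreads its mass equally over the other two states.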
2.2 Steady state of Markov chains
Given a random process such as the Markov chain random walk introduced in the previous section, it is impossible to know in which state the particle will be at the next time index, or how many times the particle will visit a particular state $i$ during $n$ time steps. The basic limit theorem of Markov chains says that if we run the process for a very long time, we can nevertheless gain useful information on the behavior of the random process. The distribution $\pi$ is called a stationary distribution if it satisfies
\[
\pi_j = \sum_{i \in S} \pi_i P_{ij}, \qquad \sum_{j \in S} \pi_j = 1, \tag{2.11}
\]
where $S$ is the state space of the Markov chain. The following theorem shows the existence of a stationary distribution for a finite-state Markov chain.
Theorem 1. Suppose $P$ is the transition probability matrix of a finite-state Markov chain. Then there exists a matrix $\Lambda$ such that
\[
P\Lambda = \Lambda P = \Lambda^2 = \Lambda. \tag{2.12}
\]
Furthermore,
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} P^j = \Lambda. \tag{2.13}
\]
To study the uniqueness of limit distributions, we recall some properties of Markov chains.
Irreduciblity
A state $j$ is said to be accessible from the state $i$ if $P_{ij}^{(n)} > 0$ for some integer $n \ge 0$. If two states $i, j$ are accessible from each other, then they are said to communicate. Communication is transitive, so given a Markov chain we can partition the states into equivalence classes in which states communicate with each other. A Markov chain is irreducible if all states communicate with each other.
Periodicity of Markov chain
The period of a state $i$ is defined as the greatest common divisor of all integers $n \ge 1$ for which $P_{ii}^{(n)} > 0$. A Markov chain in which each state has period 1 is called aperiodic.
Recurrent or Transient
A state i is said to be recurrent if and only if the probability that the state starting from i returns to i is one. The states that are not recurrent are called transient.
Formally, let the random variable $\tau_i$ be the first return time to the state $i$,
\[
\tau_i = \inf\{ n \ge 1 : X_n = i \mid X_0 = i \}. \tag{2.14}
\]
Then the state $i$ is recurrent if and only if
\[
\Pr(\tau_i < \infty) = 1. \tag{2.15}
\]
Although the return time is then almost surely finite, it need not have a finite expectation. The state $i$ is positive recurrent if $E[\tau_i]$ is finite.
Theorem 2. An irreducible Markov chain has a stationary distribution $\pi$ if and only if all of its states are positive recurrent. Furthermore, $\pi$ is related to the expected return time by
\[
\pi_i = \frac{1}{E[\tau_i]}. \tag{2.16}
\]

Theorem 3. Consider a positive recurrent, irreducible, aperiodic Markov chain with state space $S$. Then there is a unique stationary distribution $\pi$ satisfying
\[
\lim_{n \to \infty} P_{jj}^{(n)} = \pi_j = \sum_{i \in S} \pi_i P_{ij}, \tag{2.17}
\]
\[
\lim_{n \to \infty} P_{ij}^{(n)} = \frac{1}{E[\tau_j]}. \tag{2.18}
\]
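The identities in Theorems 2 and 3 can be checked numerically. The sketch below approximates the stationary distribution of a small chain by power iteration and compares $\pi_i$ with the reciprocal of a Monte Carlo estimate of the expected return time $E[\tau_i]$; the three-state transition matrix is an illustrative example, not from the text.

```python
import random

# An irreducible, aperiodic 3-state chain (illustrative values).
P = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
m = len(P)

def stationary(P, iters=500):
    """Approximate pi by repeated left-multiplication pi <- pi P (Theorem 3)."""
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def mean_return_time(P, i, runs=20000, seed=1):
    """Monte Carlo estimate of E[tau_i], the expected first return time to i."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        state, t = i, 0
        while True:
            u, acc = rng.random(), 0.0
            for j, p in enumerate(P[state]):
                acc += p
                if u < acc:
                    state = j
                    break
            t += 1
            if state == i:
                break
        total += t
    return total / runs

pi = stationary(P)
# Theorem 2: pi_i = 1 / E[tau_i]; the two estimates should agree closely.
print(pi[0], 1.0 / mean_return_time(P, 0))
```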
The rigorous proofs of the above theorems are worth reviewing for a deep understanding of Markov chain random walks and their stationary behavior, but since we use them only as tools to develop a parallel methodology for stochastic particle methods, we simply list good resources for a mathematical treatment of Markov chain random walks, for example [32, 51, 9]. The Markov chain random walk introduced in this chapter is the main idea required for the development of our parallel stochastic particle methods. The following relation between the Markov chain random walk and the parallel stochastic particle method is stated again before we turn to the two application areas. Each state is considered to be a node or sensor in a physically distributed sensor network, or an individual processor in a parallel computing cluster, and the particle moving by a random walk between states in the Markov chain is any type of information that is exchanged between nodes, sensors, or processors.

Chapter 3
Markov Chain Distributed Particle Filter
Distributed particle filters (DPF) are known to provide robustness for the state estimation problem and can reduce the amount of communication compared to centralized approaches. Due to the difficulty of merging multiple distributions represented by particles and associated weights, however, most uses of DPFs to date tend to approximate the posterior distribution using a parametric model or to use a predetermined message path. In this chapter, the Markov Chain Distributed Particle Filter (MCDPF) algorithm is proposed, based on particles performing random walks across the network. This approach maintains robustness, since every sensor only needs to exchange particles and weights locally, and furthermore enables more general representations of posterior distributions because there are no a priori assumptions on the distribution form. In section 3.2, basic properties and theorems for random walks on graphs are reviewed. Section 3.3 contains the Centralized Particle Filter (CPF) algorithm and the proposed decentralized particle filter algorithm. In addition, the weak convergence of the posterior distribution of the MCDPF to that of the CPF and the optimal filter, which is the main result of [29], is reviewed in this section. Section 3.4 consists of the proof of strong convergence of the MCDPF to the optimal filter. The definition of strong convergence is also provided at the beginning of that section. Furthermore we compare the performance of the MCDPF in
a practical situation, a range-only tracking problem using a flocking model, with the Distributed Extended Kalman Filter (DEKF) in section 3.6. Conclusions and future work are discussed in section 3.7.
3.1 Introduction
In the Bayesian filtering problem, there have been many efforts to approximate the posterior distribution. Popular methods developed in the 1960s and 70s include the extended Kalman filter [3] and the sequential Monte Carlo method [1, 23, 22, 54]. The very first introduction of the sequential Monte Carlo method, also known as particle filtering, indeed goes back to calculations of polymer growth [21, 44]. Particle filtering could not be broadly adopted at the time, mainly because of its very high computational complexity and the lack of adequate computing resources [7]. With the huge advances in computing power, however, particle filtering has become a very active research topic and has been applied to various areas. Among those areas, signal processing started to take advantage of particle filtering following a seminal paper [17]. Various modifications of the standard particle filter that improve its performance are introduced clearly in the tutorial paper [31]. Stratified sampling, residual sampling [30], and systematic resampling [27] were proposed as efficient resampling schemes. Pitt and Shephard [40] introduced the Auxiliary Sampling Importance Resampling (ASIR) filter for better estimation, and the regularized particle filter (RPF) was proposed in [33] to solve the problem induced by the resampling step. Distributed Particle Filters (DPF) have been emerging as an efficient tool for state estimation, for instance in target tracking with a robotic navigation system [45, 18]. The general benefits of distributed estimation include robustness of the estimate, a reduction in the amount of information flow, and estimation results comparable to the centralized approach. Much effort has been directed toward the realization of decentralized Kalman filtering [42, 35, 36], but decentralized particle filters were thought to be challenging due to the difficulty of merging probability distributions represented by particles and weights [38].
The currently existing distributed particle filtering methods, however, are not able to gain all of these advantages, or turn out to benefit from these properties only for relatively low dimensional systems, by introducing an assumption such as a Gaussian Mixture Model (GMM). The methods developed so far try to avoid exchanging the raw data, namely particles and associated weights, mainly due to the large amount of information that implies. The communication of such raw data scales better with system dimension, however, than the existing methods do. The distributed particle filter proposed in [29] exchanges particles and weights only between nearest neighbor nodes and estimates the true state by assimilating data with an algorithm based on a Markov chain random walk. Past work on distributed particle filters can be broadly categorized into two approaches, namely message passing approaches and consensus-based local information exchange methods. Message passing approaches transfer information along a predetermined route covering the entire network. For example, [4] passes parameters of a parametric model of the posterior distribution, while [48] transmits the raw information, particles and weights, or the parameters of a GMM approximation of the posterior distribution. Consensus-based methods communicate information only between nearest neighbors and achieve global consistency by consensus filtering. The type of exchanged information can be, for example, the parameters of a GMM approximation [18] or the local mean and covariance [19]. The message passing approaches [4, 48] can have reduced robustness because the distributed algorithms cannot themselves cope with the failure of even one node, since the system uses fixed message paths. Furthermore, the assumption of synchronization, with identical particles at every node, can cause fragility.
On the other hand, the consensus-based approaches proposed so far [18, 19] all approximate the posterior distribution with a GMM in order to reduce information flow. The reduction in information, however, is not significant compared with transmitting full particles when the dimension of the system is very large, due to the covariance matrices of the posterior distribution. For an $n$-dimensional system, a consensus-based DPF with a GMM approximation has to transmit $O(cn^2|E|)$ data through the entire network per consensus step, where $c$ is the number of Gaussian mixture components and $|E|$ is the number of network edges. If the DPF is instead realized by exchanging $N$ particles, the amount of information per Markov chain iteration is $O(nmN)$, where $m$ is the number of nodes. A detailed description of the Markov chain iteration is given below. For a system with $cn|E| \ge mN$, a GMM approximation no longer benefits from reduced information flow. Furthermore, it frequently happens that the posterior distribution is not well approximated by a combination of a small number of Gaussian distributions. To mention a few, a fault detection problem [13] and estimating the indoor environment of a building system [26] are examples of a non-Gaussian posterior distribution and high-dimensional system estimation, respectively.

In this chapter we briefly review a distributed particle filter based on exchanging particles and associated weights according to a Markov chain random walk, the MCDPF, and prove the strong convergence of the MCDPF to the optimal filter along with its rate. The MCDPF maintains the robustness of a distributed system, since each node only needs local information, and it scales well in the case of non-Gaussian and high dimensional systems. The convergence of particle filtering is shown in general in the probability literature [8, 6], and [5] provides an excellent survey of standard particle filtering and proofs of convergence. The convergence result shown in this chapter is based on a straightforward modification of the induction-in-time argument given in [5], but the effect of the Markov chain iteration step is additionally considered in the MCDPF setting, and hence the convergence rate with respect to the Markov chain properties, namely the iteration step and the spectral gap, is established.
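To make the cost comparison concrete, the short sketch below evaluates both scaling expressions for one hypothetical network; all parameter values are made up for illustration and are not taken from the text.

```python
# Per-round communication cost comparison (constants dropped from the O-notation).
# All parameter values below are hypothetical.
n = 200          # state dimension
m = 20           # number of nodes
N = 1000         # number of particles
c = 5            # number of Gaussian mixture components
num_edges = 40   # |E|, number of network edges

gmm_cost = c * n**2 * num_edges  # consensus DPF with GMM: O(c n^2 |E|)
raw_cost = n * m * N             # exchanging raw particles: O(n m N)
print(gmm_cost, raw_cost)
# Here c*n*|E| = 40000 >= m*N = 20000, so the GMM approximation transmits
# more data per round than exchanging the particles themselves.
```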
3.2 Random walks on a graph
A sensor network system can be modeled as a graph, $G = (V, E)$, with a normalized adjacency matrix $\mathcal{A}$. The vertices $V = \{1, \dots, m\}$ correspond to nodes or sensors in the network system and the edges $E$ represent the connections between sensors. The neighbors of node $i$ are defined as $N_i = \{ j \in V : a_{ij} \neq 0 \}$. The matrix $\mathcal{A}$ is a Markov transition probability matrix defined on the graph because it satisfies
\[
\mathcal{A} \ge 0, \qquad \mathcal{A}\mathbf{1} = \mathbf{1}.
\]
Consequently, a random walk on the network system can be defined according to the matrix $\mathcal{A}$, and we assume no self-loops in the chain. Here we review several properties of random walks on graphs which are useful for the development of the DPF.
Theorem 4. If $\mathcal{A}$ is the normalized adjacency matrix of an undirected connected graph $G$, then the Markov chain defined by $\mathcal{A}$ has a unique stationary distribution $\Pi$ and $\Pi_i > 0$ for all $i$. For any starting distribution,
\[
\lim_{k \to \infty} \frac{M(\cdot, k)}{k} = \Pi, \tag{3.1}
\]
where $M(\cdot, k) \in \mathbb{R}^m$ is the vector whose $i$-th element is the number of visits to state $i$ during $k$ steps. Furthermore, $M(\cdot, k)/k$ converges to $\Pi$ in distribution as $k \to \infty$:
\[
\sqrt{k} \left( \frac{M(\cdot, k)}{k} - \Pi \right) \xrightarrow{d} \mathcal{N}(0, V). \tag{3.2}
\]
Proof. See [43, Theorem 42.VII].
Theorem 5. If $\mathcal{A}$ is the normalized adjacency matrix of an undirected graph $G$, then the stationary distribution of the Markov chain defined by $\mathcal{A}$ is given by $\Pi = (\Pi_1, \Pi_2, \dots, \Pi_m)$, where $\Pi_i = \frac{d(i)}{2|E(G)|}$. Here $d(i)$ is the degree of node $i$ and $|E(G)|$ is the number of edges of the graph.

Proof. We compute
\[
(\Pi \mathcal{A})_j = \sum_{i=1}^m \Pi_i \mathcal{A}_{ij} = \sum_{i=1}^m \frac{d(i)}{2|E|} \frac{|E_{ij}|}{d(i)} = \frac{d(j)}{2|E|} = \Pi_j, \tag{3.3}
\]
where $|E_{ij}|$ is the number of edges connecting nodes $i$ and $j$ and $\sum_{i=1}^m |E_{ij}| = d(j)$. Since $\Pi$ satisfies $\Pi \mathcal{A} = \Pi$, it is the stationary distribution of the Markov chain defined by $\mathcal{A}$.

Corollary 6. If $G$ is a $d$-regular connected graph, then the stationary distribution of the Markov chain defined by the normalized adjacency matrix of $G$ is the uniform distribution $\Pi = (\frac{1}{m}, \frac{1}{m}, \dots, \frac{1}{m})$, where $m$ is the number of nodes.
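Theorem 5 and its proof can be verified directly on a small example. The sketch below builds the normalized adjacency matrix of a hypothetical 4-node undirected graph and checks that $\Pi_i = d(i)/(2|E|)$ is invariant under $\mathcal{A}$.

```python
# A small undirected graph (illustrative): 4 nodes, |E| = 4 edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
m = 4

deg = [0] * m
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

# Normalized adjacency matrix: A_ij = 1/d(i) if (i, j) is an edge, else 0.
A = [[0.0] * m for _ in range(m)]
for i, j in edges:
    A[i][j] = 1.0 / deg[i]
    A[j][i] = 1.0 / deg[j]

# Claimed stationary distribution Pi_i = d(i) / (2|E|) (Theorem 5).
pi = [d / (2.0 * len(edges)) for d in deg]
# One step of the chain: (Pi A)_j should reproduce Pi_j, as in (3.3).
pi_next = [sum(pi[i] * A[i][j] for i in range(m)) for j in range(m)]
print(pi, pi_next)
```

For this graph the degrees are (2, 2, 3, 1), so $\Pi = (1/4, 1/4, 3/8, 1/8)$, which is non-uniform; Corollary 6 yields the uniform distribution only when the graph is regular.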
3.3 Particle filters
Suppose we have the general state space model,
\[
x_{t+1} = f(x_t, w_t), \tag{3.4}
\]
\[
y_t = g(x_t, v_t), \tag{3.5}
\]
where $x_t \in \mathbb{R}^n$, $y_t \in \mathbb{R}^p$, and $w_t, v_t$ are the process and measurement noises, respectively. We define two stochastic processes, $X = \{X_t, t \in \mathbb{N}\}$ and $Y = \{Y_t, t \in \mathbb{N}\}$, where $X$ is the signal process and $Y$ is the observation process. The signal process $X$ is Markovian with initial distribution $\mu(x_0)$ and transition kernel $K(dx_t \mid x_{t-1})$, and the observation process $Y$ is conditionally independent given $X$. For simplicity, we assume that the kernel and the conditional probability distribution of $Y$ admit densities with respect to Lebesgue measure:
\[
\Pr(X_t \in A \mid X_{t-1} = x_{t-1}) = \int_A K(x_t \mid x_{t-1}) \, dx_t, \tag{3.6}
\]
\[
\Pr(Y_t \in B \mid X_t = x_t) = \int_B \rho(y_t \mid x_t) \, dy_t, \tag{3.7}
\]
where $\rho(y_t \mid x_t)$ is the transition probability density of a measurement $y_t$ given the state $x_t$.

The filtering problem is to estimate the true state $x_t$ at time $t$ given the time series of observations $y_{1:t}$. The prediction and update steps of the optimal filter, based on Bayes' recursion, are given as follows:
\[
p(x_t \mid y_{1:t-1}) = \int_{\mathbb{R}^n} p(x_{t-1} \mid y_{1:t-1}) \, K(x_t \mid x_{t-1}) \, dx_{t-1}, \tag{3.8}
\]
\[
p(x_t \mid y_{1:t}) = \frac{\rho(y_t \mid x_t) \, p(x_t \mid y_{1:t-1})}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t) \, p(x_t \mid y_{1:t-1}) \, dx_t}. \tag{3.9}
\]
Analytic solutions for the posterior distribution in (3.9) do not generally exist, except in special cases such as linear dynamical systems with Gaussian noise. In the particle filtering setting, the posterior distribution is represented by a group of particles and associated weights, so that the integral in (3.9) is approximated by a sum of discrete values.
3.3.1 Centralized particle filters
Particle filtering is a recursive method to estimate the true state given the time series of measurements [5, 8]. Suppose the posterior distribution at time $t-1$, $\pi_{t-1|t-1}(dx_{t-1})$, is approximated by $N$ particles $\{x^i_{t-1}\}_{i=1}^{N}$. Then we have
\[
p(x_{t-1} \mid y_{1:t-1}) \triangleq \pi_{t-1|t-1}(dx_{t-1}) \approx \pi^N_{t-1|t-1}(dx_{t-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^i_{t-1}}(dx_{t-1}), \tag{3.10}
\]
where particle $i$ is at position $x^i_{t-1}$ in state space. The particles now go through the prediction and measurement update steps to approximate the posterior distribution at time $t$. Given $N$ particles, new particles are sampled from the transition kernel density, $\tilde{x}^i_t \sim \pi^N_{t-1|t-1} K(dx_t) = \frac{1}{N} \sum_{i=1}^{N} K(x_t \mid x^i_{t-1})$. This set of particles is the approximation of $\pi_{t|t-1}$:
\[
p(x_t \mid y_{1:t-1}) \triangleq \pi_{t|t-1}(dx_t) \approx \tilde{\pi}^N_{t|t-1}(dx_t) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\tilde{x}^i_t}(d\tilde{x}_t). \tag{3.11}
\]
If the empirical distribution in (3.11) is substituted into (3.9), we obtain the following distribution approximating the posterior distribution $p(x_t \mid y_{1:t})$:
\[
\tilde{\pi}^N_{t|t}(dx_t) \triangleq \frac{\rho(y_t \mid x_t)\, \tilde{\pi}^N_{t|t-1}(dx_t)}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t)\, \tilde{\pi}^N_{t|t-1}(dx_t)\, dx_t} = \frac{\sum_{i=1}^{N} \rho(y_t \mid \tilde{x}^i_t)\, \delta_{\tilde{x}^i_t}(d\tilde{x}_t)}{\sum_{i=1}^{N} \rho(y_t \mid \tilde{x}^i_t)} \tag{3.12}
\]
\[
= \sum_{i=1}^{N} w^i_t\, \delta_{\tilde{x}^i_t}(d\tilde{x}_t), \tag{3.13}
\]
where $\sum_{i=1}^{N} w^i_t = 1$ and the $w^i_t$ are called the importance weights. To avoid the degeneracy problem, particles are selected according to a resampling step that samples $N$ particles from the empirical distribution $\tilde{\pi}^N_{t|t}(dx_t)$ and resets the weights to $\frac{1}{N}$. We then have the empirical distribution approximating the posterior at time $t$ given by
\[
\pi^N_{t|t}(dx_t) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^i_t}(dx_t). \tag{3.14}
\]
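The prediction, weighting, and resampling cycle (3.10)–(3.14) can be sketched for a scalar toy model as follows; the kernel, the Gaussian likelihood, and the single measurement value are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500                               # number of particles

def propagate(x):                     # sample from the kernel K(. | x)
    return 0.9 * x + 0.5 * rng.normal(size=x.shape)

def likelihood(y, x):                 # rho(y | x), Gaussian measurement noise
    return np.exp(-0.5 * ((y - x) / 0.1) ** 2)

def pf_step(particles, y):
    """One bootstrap particle filter step: predict, weight, resample."""
    xt = propagate(particles)                 # prediction, eq. (3.11)
    w = likelihood(y, xt)
    w = w / w.sum()                           # importance weights, eq. (3.13)
    idx = rng.choice(N, size=N, p=w)          # resampling, eq. (3.14)
    return xt[idx]

particles = rng.normal(size=N)        # draw from the initial distribution
y_obs = 0.3                           # a single (assumed) measurement
particles = pf_step(particles, y_obs)
estimate = particles.mean()           # posterior-mean estimate of x_t
```

After resampling, all particles carry equal weight $\frac{1}{N}$, matching (3.14).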
3.3.2 The Markov Chain Distributed Particle Filter (MCDPF)
The main difference between the CPF and a DPF is that the CPF has a central unit to collect the measurements from all nodes and to update the particles using all measurements simultaneously. During data collection, the CPF might suffer from bottlenecks in information flow. A DPF, on the other hand, can overcome this problem by passing information only locally between connected nodes. If we have $m$ nodes measuring partial observations independently, then we can decompose the general state space model (3.4)–(3.5) as follows.
\[
x_{t+1} = f(x_t, w_t), \tag{3.15}
\]
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ \vdots \\ y_{m,t} \end{bmatrix} = \begin{bmatrix} g_1(x_t, v_{1,t}) \\ g_2(x_t, v_{2,t}) \\ \vdots \\ g_m(x_t, v_{m,t}) \end{bmatrix}. \tag{3.16}
\]
Here $x_t \in \mathbb{R}^n$, $y_{i,t} \in \mathbb{R}^{p_i}$ with $\sum_{i=1}^{m} p_i = p$, and the subscript $i$ denotes node $i$. In addition, the measurement noise at each node is assumed to be uncorrelated, $E[v_t v_t^T] = \mathrm{diag}(R_1, R_2, \ldots, R_m)$. The uncorrelated noise structure gives conditionally independent measurements $y_{i,t}$ at each node, given the true state $x_t$. As a consequence of this assumption, the function $\rho(y_t \mid x^i_t)$ in (3.9) factorizes into a product of the $\rho_j(y_{j,t} \mid x^i_t)$ at each node:
\[
\rho(y_t \mid x^i_t) = \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x^i_t). \tag{3.17}
\]
We propose a distributed particle filtering method using a random walk on the graph defined by the network topology. In the sensor network, node $i$ measures the partial observation $y_{i,t}$ at time $t$, and the data at every node must be fused to reach a global estimate of the true state. While achieving a global estimate by exchanging data, it is desirable to maintain robustness with respect to unexpected changes in global properties, such as the loss of a node. The DPF proposed here is robust since the information, consisting of particles and weights, is transferred only to the connected neighborhood of each node; in other words, every node needs only local information. Transferring particle data is inefficient for low-dimensional systems, but it scales well (only linearly) with the dimension, as opposed to existing methods using GMM approximations of the posterior distribution [19] [18] [48]. As briefly explained in section 3.1, communicating raw data is more efficient in terms of bandwidth capacity for relatively high-order systems.

The MCDPF moves particles around the network according to the Markov chain defined by the normalized adjacency matrix $\mathcal{A}$ in order to compute the importance weights. The main idea is that each particle accumulates the local likelihood $\rho_i(y_{i,t} \mid x_t)$ with an exponent proportional to its number of visits to node $i$, normalized by the expected number of visits. Suppose we have the graph $G = (V, E)$ of the sensor network and the normalized adjacency matrix $\mathcal{A}$. In the MCDPF setting, the Markov chain is run for $k$ steps on every particle after the prediction step, and the number of visits to the $i$-th node is denoted $M(i,k)$. Accounting for the number of visits to each node, each particle multiplies its previous weight by $\rho_i(y_{i,t} \mid x_t)^{\frac{2|E(G)|}{k\,d(i)}}$ every time it visits the $i$-th node. If we have $N$ particles at a node after $k$ Markov chain steps, then the posterior distribution of the MCDPF is given as follows:
\[
\tilde{\pi}^N_{t|t,k}(dx_t) = \frac{\sum_{i=1}^{N} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid \tilde{x}^i_t)^{\frac{2|E(G)|}{k\,d(j)} \times M(j,k)}\, \delta_{\tilde{x}^i_t}(d\tilde{x}_t)}{\sum_{i=1}^{N} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid \tilde{x}^i_t)^{\frac{2|E(G)|}{k\,d(j)} \times M(j,k)}} = \sum_{i=1}^{N} w^i_{t,k}\, \delta_{\tilde{x}^i_t}(d\tilde{x}_t). \tag{3.18}
\]
The MCDPF is defined in Algorithm 1 below. We use the notation $x^i_{j,t}$ for the $i$-th particle of node $j$ at time $t$ and $N(j)$ for the number of particles at node $j$. Also, $I_{i \to j}$ is the set of indices of particles moving from node $i$ to node $j$ in the current Markov chain step, and we recall that $\mathcal{A}$ is the normalized adjacency matrix of the network.
Algorithm 1 Markov Chain Distributed Particle Filter (MCDPF)

Initialization:
  $\{x^i_{j,0}\}_{i=1}^{N} \sim p(x_0)$, $\{w^i_{j,0}\}_{i=1}^{N} = \frac{1}{N}$ for $j = 1, \ldots, m$
Importance Sampling: for $j = 1, \ldots, m$:
  $\{\tilde{x}^i_{j,t}\}_{i=1}^{N(j)} \sim p\big(x_t \mid \{x^i_{j,t-1}\}_{i=1}^{N(j)}\big)$, $\{\tilde{w}^i_{j,t}\}_{i=1}^{N(j)} = 1$
for $k$ iterations do
  Move $\{\tilde{x}^i_{\cdot,t}\}$, $\{\tilde{w}^i_{\cdot,t}\}$ according to matrix $\mathcal{A}$
  for $j = 1$ to $m$ do
    $\{\tilde{x}^i_{j,t}\}_{i=1}^{N(j)} = \bigcup_{l \in N_j} \{\tilde{x}^i_{l,t}\}_{i \in I_{l\to j}}$
    $\{\tilde{w}^i_{j,t}\}_{i=1}^{N(j)} = \bigcup_{l \in N_j} \{\tilde{w}^i_{l,t}\}_{i \in I_{l\to j}}$
    $\{\tilde{w}^i_{j,t}\}_{i=1}^{N(j)} \leftarrow \{\tilde{w}^i_{j,t}\}_{i=1}^{N(j)} \times \rho_j\big(y_{j,t} \mid \{\tilde{x}^i_{j,t}\}_{i=1}^{N(j)}\big)^{\frac{2|E(G)|}{k\,d(j)}}$
  end for
end for
Resample: for $j = 1, \ldots, m$:
  Resample $\{x^i_{j,t}\}_{i=1}^{N(j)}$ according to $\{\tilde{w}^i_{j,t}\}_{i=1}^{N(j)}$ and set weights $\{w^i_{j,t}\}_{i=1}^{N(j)} = \frac{1}{N(j)}$
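The weight-accumulation loop of Algorithm 1 can be sketched for one time step as follows. The ring topology, the local likelihoods, and the measurement values are illustrative assumptions; the per-node resampling and the index-set bookkeeping of $I_{l\to j}$ are omitted, and each particle simply carries its own weight as it walks.

```python
import numpy as np

rng = np.random.default_rng(2)

# One MCDPF time step on an assumed ring of m = 4 nodes.
m, N, k = 4, 200, 50
neighbors = [np.array([(j - 1) % m, (j + 1) % m]) for j in range(m)]
d = np.array([len(nb) for nb in neighbors], dtype=float)   # degrees d(j)
num_edges = d.sum() / 2.0                                  # |E(G)|

y = np.array([0.5, 0.45, 0.55, 0.5])     # assumed local measurements y_{j,t}

def log_rho(j, x):
    # Log of the local likelihood rho_j(y_j | x); Gaussian, sigma = 0.2.
    return -0.5 * ((y[j] - x) / 0.2) ** 2

# Importance sampling: N predicted particles per node, pooled in one array.
x_pred = rng.normal(size=m * N)
node = np.repeat(np.arange(m), N)        # current node of each particle
log_w = np.zeros(m * N)

for _ in range(k):
    # Each particle moves to a uniformly random neighbor (the chain A) ...
    node = np.array([rng.choice(neighbors[j]) for j in node])
    # ... and absorbs the local likelihood with exponent 2|E(G)| / (k d(j)).
    log_w += (2.0 * num_edges / (k * d[node])) * log_rho(node, x_pred)

w = np.exp(log_w - log_w.max())
w /= w.sum()                             # normalized weights w^i_{t,k}, eq. (3.18)
estimate = np.sum(w * x_pred)            # weighted estimate of the state
```

Working in log-weights avoids numerical underflow when the per-visit likelihood factors are small.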
3.3.3 Convergence to CPF and algorithm
We will show that the empirical posterior distribution of the MCDPF converges weakly to that of the CPF as the number of Markov chain steps $k$ per measurement goes to infinity. The notation used throughout the proof mainly follows that of [5]. In the stochastic filtering problem, the functions $a_t$ and $b_t$ are defined from a metric space $(E, d)$ to itself and are the continuous maps $\pi_{t|t-1} \mapsto \pi_{t|t}$ and $\pi_{t-1|t-1} \mapsto \pi_{t|t-1}$, respectively. Additionally, $a^k_t$ is also a continuous function, mapping $\pi_{t|t-1} \mapsto \pi_{t|t,k}$ and given by
\[
a^k_t(p(x_t \mid y_{1:t-1})) = \frac{\prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k\,d(j)} M(j,k)}\; p(x_t \mid y_{1:t-1})}{\int_{\mathbb{R}^n} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k\,d(j)} M(j,k)}\; p(x_t \mid y_{1:t-1})\, dx_t}. \tag{3.19}
\]
The perturbation $c^N$ is defined as a function that maps a measure $\nu$ to a random sample of size $N$ from the measure, so that
\[
c^{N,w}(\nu) = \frac{1}{N} \sum_{j=1}^{N} \delta_{V_j(w)}, \tag{3.20}
\]
where the $V_j : \Omega \to \mathbb{R}^n$ are IID random variables with distribution $\nu$ and $w \in \Omega$. For notational simplicity, let $h^N_t$ and $h^N_{1:t}$ be defined as compositions of the functions $a_t$, $b_t$, and $c^N$ as follows:
\[
h^N_t \triangleq c^N \circ a_t \circ c^N \circ b_t, \qquad h^N_{1:t} \triangleq h^N_t \circ \cdots \circ h^N_1, \tag{3.21}
\]
\[
h^N_{t,k} \triangleq c^N \circ a^k_t \circ c^N \circ b_t, \qquad h^N_{1:t,k} \triangleq h^N_{t,k} \circ \cdots \circ h^N_{1,k}. \tag{3.22}
\]
Thus the posterior distributions of the CPF and the MCDPF at time $t$ can be expressed as
\[
\pi^N_{t|t} = h^N_t(\pi^N_{t-1|t-1}) = h^N_{1:t}(\pi_0), \tag{3.23}
\]
\[
\pi^N_{t|t,k} = h^N_{t,k}(\pi^N_{t-1|t-1,k}) = h^N_{1:t,k}(\pi_0). \tag{3.24}
\]
To prove $\lim_{k\to\infty} \pi^N_{t|t,k} = \pi^N_{t|t}$, several lemmas are reviewed here.
Lemma 7. Let $(E, d)$ be a metric space with functions $a^k_t, a_t, b_t : E \to E$ such that $\lim_{k\to\infty} a^k_t = a_t$ pointwise for each $t$. Then
\[
\lim_{k\to\infty} h^N_{1:t,k} = h^N_{1:t} \tag{3.25}
\]
pointwise for each $t$ and $N$.

Proof. For $e \in E$ and arbitrary $t$, we have $c^N(b_t(e)) \in E$. Since we assumed pointwise convergence of $a^k_t$ to $a_t$, for all $\varepsilon > 0$ there exists $k(e, \varepsilon)$ such that for $k > k(e, \varepsilon)$,
\[
\left\| a^k_t(c^N(b_t(e))) - a_t(c^N(b_t(e))) \right\| < \varepsilon, \tag{3.26}
\]
where $\|\cdot\|$ is the supremum norm on functions from $(E, d)$ to itself. Equivalently, $\lim_{k\to\infty} h^N_{t,k} = h^N_t$ pointwise for all $t$. By induction over $t$ we have (3.25).

Lemma 8. For the MCDPF and CPF as defined above,
\[
\lim_{k\to\infty} a^k_t = a_t \tag{3.27}
\]
pointwise for all $t$.
Proof. For any $e \in E$,
\[
\lim_{k\to\infty} \left\| a^k_t(e) - a_t(e) \right\| \tag{3.28}
\]
\[
= \lim_{k\to\infty} \left\| \frac{\prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k\,d(j)} M(j,k)}\, e(x_t)}{\int_{\mathbb{R}^n} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k\,d(j)} M(j,k)}\, e(dx_t)} - \frac{\rho(y_t \mid x_t)\, e(x_t)}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t)\, e(dx_t)} \right\| \tag{3.29}
\]
\[
= \left\| \frac{\prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)\, e(x_t)}{\int_{\mathbb{R}^n} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)\, e(dx_t)} - \frac{\rho(y_t \mid x_t)\, e(x_t)}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t)\, e(dx_t)} \right\| = 0. \tag{3.30}
\]
The first equality is due to Theorem 4 and the second equality comes from the conditional independence of the measurements at each node.
Theorem 9. Consider a connected sensor network with measurements at different nodes conditionally independent given the true state. Then the estimated distribution of the MCDPF in Algorithm 1 converges weakly to the estimated distribution of the CPF as the number of Markov chain steps $k$ per measurement goes to infinity. That is,
\[
\lim_{k\to\infty} \pi^N_{t|t,k} = \pi^N_{t|t} \tag{3.31}
\]
pointwise.

Proof. Combining Lemmas 7 and 8 with the optimal Bayesian filtering functions $a_t$, $b_t$, and $a^k_t$ gives (3.31).
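The mechanism behind this convergence can be checked numerically: the ergodic averages of the random walk make $M(j,k)/k$ converge to the stationary probability $d(j)/(2|E(G)|)$, so the per-node likelihood exponent $\frac{2|E(G)|}{k\,d(j)} M(j,k)$ in (3.18) approaches one as $k \to \infty$. A minimal sketch, assuming a 4-node ring topology:

```python
import numpy as np

rng = np.random.default_rng(3)

# The exponent (2|E(G)| / (k d(j))) * M(j, k) tends to 1 as k grows,
# since M(j, k)/k converges to d(j) / (2|E(G)|).  Ring of m = 4 nodes.
m = 4
neighbors = [np.array([(j - 1) % m, (j + 1) % m]) for j in range(m)]
d = np.full(m, 2.0)                  # the ring is 2-regular
num_edges = d.sum() / 2.0            # |E(G)|

def exponents(k):
    """Run the walk k steps; return the per-node exponents (which -> 1)."""
    visits = np.zeros(m)
    j = 0
    for _ in range(k):
        j = rng.choice(neighbors[j])   # uniform move to a neighbor
        visits[j] += 1.0
    return 2.0 * num_edges / (k * d) * visits

err_small = np.abs(exponents(100) - 1.0).max()
err_large = np.abs(exponents(50_000) - 1.0).max()
```

For large `k` the deviation of the exponents from one is typically much smaller than for small `k`, consistent with the $O(1/\sqrt{k})$ fluctuations of ergodic averages.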
3.3.4 Convergence to optimal filtering
So far we have shown that the MCDPF converges weakly to the CPF. The next step is to prove the convergence of the MCDPF to the optimal filtering distribution as $N \to \infty$ as well as $k \to \infty$. The convergence of the classical particle filter to the optimal filtering distribution is shown in [5, 24]. The only difference between the MCDPF and the standard particle filter used in the proof is the function $a^k_t$, which needs to satisfy the following condition to ensure the convergence of the MCDPF to the optimal filter: for all sequences $e_N \to e \in E$ we have
\[
\lim_{N\to\infty} \lim_{k\to\infty} a^k_t(e_N) = a_t(e). \tag{3.32}
\]
This property is not obvious because the function $a^k_t$ converges only pointwise to $a_t$. Fortunately, however, $a_t$ is continuous, which is sufficient to give (3.32), as we see from the following lemma.
Lemma 10. Suppose $(E, d)$ is a metric space and $a^k, a : E \to E$ are continuous functions such that $a^k$ converges pointwise to $a$ as $k \to \infty$. For a convergent sequence $\lim_{N\to\infty} e_N = e \in E$, we have
\[
\lim_{N\to\infty} \lim_{k\to\infty} a^k(e_N) = \lim_{k\to\infty} \lim_{N\to\infty} a^k(e_N) = a(e). \tag{3.33}
\]
Proof. For functions $a^k \to a$ pointwise and $e_N \to e \in E$, we have
\[
\lim_{k\to\infty} a^k(e_N) = a(e_N) \quad \text{for all } N \tag{3.34}
\]
\[
\Rightarrow\; \lim_{N\to\infty} \lim_{k\to\infty} a^k(e_N) = \lim_{N\to\infty} a(e_N) = a(e), \tag{3.35}
\]
since $a$ is continuous. Conversely, continuity of $a^k$ gives
\[
\lim_{N\to\infty} a^k(e_N) = a^k(e) \quad \text{for all } k \tag{3.36}
\]
\[
\Rightarrow\; \lim_{k\to\infty} \lim_{N\to\infty} a^k(e_N) = \lim_{k\to\infty} a^k(e) = a(e).
\]
Lemma 11. Consider $a_t$ and $b_t$ from (3.9), $a^k_t$ from (3.19), and $c^N$ from (3.20). Assume that the sequence $a^k_t$ satisfies the property (3.32) for each $t$. Then we have
\[
\lim_{N\to\infty} \lim_{k\to\infty} h^N_{1:t,k} = h_{1:t}. \tag{3.37}
\]
Moreover, for all sequences $e_N \to e \in E$ we have
\[
\lim_{N\to\infty} \lim_{k\to\infty} h^N_{1:t,k}(e_N) = h_{1:t}(e). \tag{3.38}
\]
Proof. From [5, Lemma 2], for all $e_N \to e \in E$, $c^N$ satisfies
\[
\lim_{N\to\infty} c^N(e_N) = e. \tag{3.39}
\]
Thus, for all $e_N \to e \in E$ and any continuous function $b_t$,
\[
\lim_{N\to\infty} c^N(b_t(e_N)) = b_t(e). \tag{3.40}
\]
From the property (3.32),
\[
\lim_{N\to\infty} c^N(b_t(e_N)) = b_t(e) \tag{3.41}
\]
\[
\Rightarrow\; \lim_{N\to\infty} \lim_{k\to\infty} a^k_t(c^N(b_t(e_N))) = a_t(b_t(e)). \tag{3.42}
\]
And again from the property (3.39) of $c^N$,
\[
\lim_{N\to\infty} \lim_{k\to\infty} a^k_t(c^N(b_t(e_N))) = a_t(b_t(e)) \tag{3.43}
\]
\[
\Rightarrow\; \lim_{N\to\infty} \lim_{k\to\infty} c^N(a^k_t(c^N(b_t(e_N)))) = a_t(b_t(e)). \tag{3.44}
\]
Thus we have $\lim_{N\to\infty} \lim_{k\to\infty} h^N_{t,k}(e_N) = h_t(e)$, and by induction over $t$ we conclude $\lim_{N\to\infty} \lim_{k\to\infty} h^N_{1:t,k}(e_N) = h_{1:t}(e)$.
Recall that a kernel K is said to have the Feller property if Kϕ is a continuous bounded function whenever ϕ is a continuous bounded function. For such kernels we have the following result.
Lemma 12. Suppose $a_t$ and $b_t$ are the functions defined in (3.9). Then $a_t$ is continuous provided the function $\rho(y_t \mid \cdot)$ is bounded, continuous, and strictly positive. Furthermore, $b_t$ is a continuous function if the transition kernel $K$ is Feller.
Proof. See [5, Section IV.B.].
Putting all the above lemmas together gives the following main result.
Theorem 13. Assume that the kernel $K$ is Feller and the function $\rho$ is bounded, continuous, and strictly positive. Then the estimated distribution of the MCDPF in Algorithm 1 converges to the optimal filtering distribution as the number of particles $N$ and the number of Markov chain steps $k$ per measurement go to infinity:
\[
\lim_{N\to\infty} \lim_{k\to\infty} \pi^N_{t|t,k} = \pi_{t|t}. \tag{3.45}
\]
Proof. For the initial probability measure $\mu_0$, we know that $\lim_{N\to\infty} \mu^N_0 = \mu_0$. From Lemma 11,
\[
\lim_{N\to\infty} \lim_{k\to\infty} \pi^N_{t|t,k} = \lim_{N\to\infty} \lim_{k\to\infty} h^N_{1:t,k}(\mu^N_0) = h_{1:t}(\mu_0) = \pi_{t|t}, \tag{3.46}
\]
giving the desired result.
3.4 Strong convergence
In the previous section, the weak convergence of the MCDPF to the optimal filtering distribution was proved. Another type of convergence, namely strong convergence of the MCDPF, is considered here: we say that the sequence of random probability measures $(\mu_N)_{N=1}^{\infty}$ converges to $\mu$ in a strong manner if for any $\varphi \in B(\mathbb{R}^n)$, where $B(\mathbb{R}^n)$ is the set of bounded Borel measurable functions,
\[
\lim_{N\to\infty} E\left( (\mu_N, \varphi) - (\mu, \varphi) \right)^2 = 0, \tag{3.47}
\]
where we define
\[
(\mu, \varphi) \triangleq \int \varphi\, d\mu, \qquad K\varphi(x) \triangleq \int K(dz \mid x)\, \varphi(z). \tag{3.48}
\]
3.4.1 Preliminaries
We need a couple of lemmas to prove the strong convergence of the MCDPF. First, the main results on the Markov chain central limit theorem are reviewed. S. Meyn and R. Tweedie [32] showed that if $(X_i)_{i=0}^{\infty}$ is a countable state space Markov chain on a state space $\mathcal{X}$ that is irreducible and positive recurrent with stationary distribution $\Pi$, then for $g : \mathcal{X} \to \mathbb{R}$ the random variable $W_k(g)$, defined as
\[
W_k(g) = \frac{1}{\sqrt{k}} \sum_{i=0}^{k-1} \left( g(X_i) - \Pi(g) \right), \tag{3.49}
\]
converges to a normal random variable with mean 0 and variance
\[
\gamma_g^2 = \Pi(x_0)\, E_{x_0}\!\left[ \left( \sum_{i=0}^{\tau_{x_0}-1} \left( g(X_i) - \Pi(g) \right) \right)^{\!2} \right], \tag{3.50}
\]
where $\tau_{x_0}$ is the first return time to the initial state. In addition, [32] showed that for such a Markov chain there exist a constant $R \geq 1$ and the second largest eigenvalue modulus (SLEM) $\nu$ satisfying $|P^i_x(g) - \Pi(g)| \leq R\nu^i$, where $P^i_x$ is the distribution of $X_i$ given $X_0 = x$. We now have the following theorem from [49, Corollary 1] on the convergence of the moment generating function.
Theorem 14. If there is a positive $c$ such that $|g(x) - \Pi(g)| \leq c$ for all $x$, then for any $\lambda \leq 1/(3\sqrt{3}L \vee L')$, all $k \geq 1$, and all $x \in \mathcal{X}$,
\[
\left| E_x \exp\{\lambda W_k(g)\} - E \exp\{\lambda \gamma_g X\} \right| \leq k^{-1/2} V(x)\left( C'L'\lambda\, e^{(\lambda L')^2} + k^{-1/2} C\, e^{(3\sqrt{3}\lambda L)^2} + k^{-1} \frac{C'L'\lambda}{1-(\lambda L')^2} + k^{-3/2} \frac{C}{1-(3\sqrt{3}\lambda L)^2} \right), \tag{3.51–3.52}
\]
where $X$ is a standard normal random variable and $V(x)$ is a function ensuring the $V$-uniform ergodicity of the Markov chain. Furthermore, the positive constants $C$, $L$, $C'$, and $L'$ depend on the SLEM $\nu$:
\[
C = \frac{(4e)^2 + 3 + 2(1-\nu)}{1-\nu}, \qquad L = e^{\frac{\log 3}{2\cdot 3}}\, R^{\frac{25}{32}} \left( (1-\nu)^{-1/2} \vee \frac{2\sqrt{e}}{1-\nu} \right), \tag{3.53}
\]
\[
C' = \sqrt{\frac{2R}{1-\nu}}, \qquad L' = \sqrt{\frac{2R}{1-\nu}}\; e^{\frac{\log 3}{2(2\cdot 3+1)}}. \tag{3.54}
\]
Proof. The proof of this theorem is given in [49, Sections 4 and 5.1]. Here we show how the constants $L$, $L'$, $C$, and $C'$ are related to the SLEM. The upper bound on the error of the $2n$-th moments given in [49, (42)] reduces to
\[
k^{-1} \frac{n(2n)!}{(n-1)!} \left( \frac{(4e)^2 + 3 + 2(1-\nu)}{1-\nu} \right) \left( e^{\frac{\log 3}{2\cdot 3}}\, R^{\frac{25}{32}} \left( (1-\nu)^{-1/2} \vee \frac{2\sqrt{e}}{1-\nu} \right) \right)^{\!2n} \left( 1 + \frac{n!}{k} \right) V(x). \tag{3.55}
\]
With $C$ and $L$ as defined, the error of the even moments is bounded by
\[
\left| E_x W_k(g)^{2n} - (2n-1)!!\, (\gamma_g^2)^n \right| \leq k^{-1} \frac{n(2n)!}{(n-1)!}\, C L^{2n} \left( 1 + \frac{n!}{k} \right) V(x). \tag{3.56}
\]
Similarly, the upper bound on the error of the odd moments given in [49, (43)] is written as
\[
k^{-1/2} \frac{(2n+1)!}{n!} \sqrt{\frac{2R}{1-\nu}} \left( \sqrt{\frac{2R}{1-\nu}}\; e^{\frac{\log 3}{2(2\cdot 3+1)}} \right)^{\!2n+1} \left( 1 + \frac{n!}{k} \right) V(x). \tag{3.57}
\]
Using $C'$ and $L'$, the error of the odd moments is bounded by
\[
\left| E_x W_k(g)^{2n+1} \right| \leq k^{-1/2}\, C' L'^{\,2n+1}\, \frac{(2n+1)!}{n!} \left( 1 + \frac{n!}{k} \right) V(x). \tag{3.58}
\]
Hence the error of the moment generating function in (3.51)–(3.52) is given by
\[
\left| E_x \exp\{\lambda W_k(g)\} - E \exp\{\lambda \gamma_g X\} \right| \tag{3.59}
\]
\[
\leq \sum_{n=0}^{\infty} \left[ k^{-1} C L^{2n} \frac{\lambda^{2n} n^2}{n!} \left( 1 + \frac{n!}{k} \right) + k^{-1/2} C' L'^{\,2n+1} \frac{\lambda^{2n+1}}{n!} \left( 1 + \frac{n!}{k} \right) \right] V(x) \tag{3.60}
\]
\[
\leq k^{-1/2} V(x) \left( C'L'\lambda\, e^{(\lambda L')^2} + k^{-1/2} C\, e^{(3\sqrt{3}\lambda L)^2} + k^{-1} \frac{C'L'\lambda}{1-(\lambda L')^2} + k^{-3/2} \frac{C}{1-(3\sqrt{3}\lambda L)^2} \right). \tag{3.61}
\]
The following lemma is a simple inequality from [24, Lemma 7.2].
Lemma 15. Let $Y$ be a random variable. If the $p$-th moment of $Y$ is finite, $E|Y|^p < \infty$, then for any $p \geq 1$,
\[
E\,|Y - E(Y)|^p \leq 2^p\, E|Y|^p. \tag{3.62}
\]
Proof. From Jensen's inequality with $p \geq 1$, we have
\[
(E|Y|)^p \leq E|Y|^p. \tag{3.63}
\]
Minkowski's inequality gives
\[
\left( E\,|Y - E(Y)|^p \right)^{1/p} \leq \left( E|Y|^p \right)^{1/p} + \left( |EY|^p \right)^{1/p} = \left( E|Y|^p \right)^{1/p} + |EY| \leq 2\left( E|Y|^p \right)^{1/p}. \tag{3.64–3.65}
\]
The last inequality is from (3.63).
The equality in the following lemma, used in [5] without proof, is reviewed here.
Lemma 16. In the particle filter setting, let $\{x^i_{t-1}\}$ be the particles at time $t-1$ and let $\mathcal{G}_{t-1}$ be the $\sigma$-algebra generated by $\{x^i_{t-1}\}$. Then for any $\varphi \in B(\mathbb{R}^n)$ we have
\[
E\left[ (\pi^N_{t|t-1}, \varphi) \mid \mathcal{G}_{t-1} \right] = (\pi^N_{t-1|t-1}, K\varphi). \tag{3.66}
\]
Proof. We have
\[
E\left[ (\pi^N_{t|t-1}, \varphi) \mid \mathcal{G}_{t-1} \right] = E\left[ \frac{1}{N} \sum_{i=1}^{N} \varphi(x^i_t) \,\Big|\, \mathcal{G}_{t-1} \right] \tag{3.67}
\]
\[
= E\left[ \varphi(x_t) \mid \mathcal{G}_{t-1} \right] \tag{3.68}
\]
\[
= \int_{\mathbb{R}^n} \varphi(x_t)\, \frac{1}{N} \sum_{j=1}^{N} K(dx_t \mid x^j_{t-1}) \tag{3.69}
\]
\[
= \frac{1}{N} \sum_{j=1}^{N} \int_{\mathbb{R}^n} K(dx_t \mid x^j_{t-1})\, \varphi(x_t) \tag{3.70}
\]
\[
= \frac{1}{N} \sum_{j=1}^{N} K\varphi(x^j_{t-1}) \tag{3.71}
\]
\[
= (\pi^N_{t-1|t-1}, K\varphi). \tag{3.72}
\]
The third equality is from the definition in (3.48).
3.4.2 Proof of strong convergence
We start this section by proving an error bound on the difference between the likelihoods assigned to a single particle by the CPF and by the MCDPF. A particle weight in the CPF is assigned with respect to the entire set of measurements, whereas a particle weight in the MCDPF is assigned sequentially as the particle jumps around the nodes according to the Markov chain. Hence, understanding how these likelihood-determined weights differ is a fundamental step of the proof.
Lemma 17. Suppose we have a uniformly ergodic MCDPF, and let $\rho_{t,k}$ and $\rho_t$ be defined respectively as
\[
\rho_{t,k} = \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k\,d(j)} \times M(j,k)}, \qquad \rho_t = \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t). \tag{3.73}
\]
With the constant $c$ given by
\[
c = \max_j \left| \ln \frac{\rho_j^{\frac{2|E(G)|}{d(j)}}}{\rho_t} \right|, \tag{3.74}
\]
there exists $k^\star$ such that $\frac{2}{\sqrt{k^\star}} \leq \frac{1}{3\sqrt{3}L \vee L'}$, and we have the following upper bound on the expected error between $\rho_{t,k}$ and $\rho_t$ for $k \geq k^\star$:
\[
E\,|\rho_{t,k} - \rho_t|^2 \leq \rho_t^2\, \Phi(k,\nu), \tag{3.75}
\]
where
\[
\Phi(k,\nu) = \frac{1}{k}\left[ 2C'L'\left( e^{\frac{(2L')^2}{k}} + e^{\frac{(L')^2}{k}} \right) + C\left( e^{\frac{(2\cdot 3\sqrt{3}L)^2}{k}} + 2\, e^{\frac{(3\sqrt{3}L)^2}{k}} \right) \right] \tag{3.76}
\]
\[
+\; 2C'L'\left( \frac{1}{k - (2L')^2} + \frac{1}{k - (L')^2} \right) \tag{3.77}
\]
\[
+\; C\left( \frac{1}{k - (2\cdot 3\sqrt{3}L)^2} + \frac{2}{k - (3\sqrt{3}L)^2} \right) + e^{\frac{2\gamma_g^2}{k}} - 2\, e^{\frac{\gamma_g^2}{2k}} + 1, \tag{3.78}
\]
where the positive constants $L$, $L'$, $C$, and $C'$ depend on the SLEM of the Markov chain transition matrix as before.
Proof. For notational simplicity the argument of the function $\rho_j$ is omitted. Also let $\mathcal{F}_t$ be the $\sigma$-field generated by the particles $\{x^i_t\}_{i=1}^{N}$; then
\[
E\left[ |\rho_{t,k} - \rho_t|^2 \mid \mathcal{F}_t = x_t \right] = E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{k\,d(j)} \times M(j,k)} - \prod_{j=1}^{m} \rho_j \right)^{\!2} \,\Big|\, \mathcal{F}_t = x_t \right] \tag{3.79}
\]
\[
= \prod_{j=1}^{m} \rho_j^2\; E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{k\,d(j)} \times M(j,k) - 1} - 1 \right)^{\!2} \,\Big|\, \mathcal{F}_t = x_t \right] \tag{3.80}
\]
\[
= \rho_t^2\; E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{d(j)} \left( \frac{M(j,k)}{k} - \frac{d(j)}{2|E(G)|} \right)} - 1 \right)^{\!2} \,\Big|\, \mathcal{F}_t = x_t \right] \tag{3.81}
\]
\[
= \rho_t^2\; E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{d(j)} Z_{k,j}} - 1 \right)^{\!2} \,\Big|\, \mathcal{F}_t = x_t \right] \tag{3.82}
\]
\[
= \rho_t^2\; E\left[ \left( e^{\sum_{j=1}^{m} C_j Z_{k,j}} - 1 \right)^{\!2} \,\Big|\, \mathcal{F}_t = x_t \right], \tag{3.83}
\]
where $C_j = \frac{2|E(G)|}{d(j)} \ln \rho_j$ and $Z_{k,j} = \frac{M(j,k)}{k} - \frac{d(j)}{2|E(G)|}$. Let $(X_i)_{i=0}^{\infty}$ be the Markov chain on the state space $\mathcal{X}$ and let the function $g : \mathcal{X} \to \mathbb{R}$ be defined as $g(X_i) = I(X_i = j)\, C_j$ for $j = 1, \ldots, m$. Now we have
\[
\sum_{j=1}^{m} C_j Z_{k,j} = \sum_{j=1}^{m} C_j \frac{M(j,k)}{k} - \sum_{j=1}^{m} C_j \frac{d(j)}{2|E(G)|} \tag{3.84}
\]
\[
= \sum_{j=1}^{m} C_j \frac{M(j,k)}{k} - \sum_{j=1}^{m} \ln \rho_j \tag{3.85}
\]
\[
= \frac{1}{k} \sum_{i=0}^{k-1} g(X_i) - \ln \rho_t \tag{3.86}
\]
\[
= \frac{1}{k} \sum_{i=0}^{k-1} g(X_i) - \Pi(g). \tag{3.87}
\]
Defining $W_k(g) = \sqrt{k} \sum_{j=1}^{m} C_j Z_{k,j}$, the Markov chain central limit theorem gives the convergence of $W_k(g)$ to a normal random variable with mean 0 and the variance $\gamma_g^2$ defined in (3.50). With the definition of $W_k(g)$ and a standard normal random variable $X$, the relevant expectation over $W_k(g)$ satisfies
\[
E\left( e^{\frac{1}{\sqrt{k}} W_k(g)} - 1 \right)^{\!2} = E\left[ e^{\frac{2}{\sqrt{k}} W_k(g)} - 2\, e^{\frac{1}{\sqrt{k}} W_k(g)} + 1 \right] \tag{3.88}
\]
\[
\leq \left| E\!\left[ e^{\frac{2}{\sqrt{k}} W_k(g)} \right] - E\!\left[ e^{\frac{2}{\sqrt{k}} \gamma_g X} \right] \right| + 2 \left| E\!\left[ e^{\frac{1}{\sqrt{k}} W_k(g)} \right] - E\!\left[ e^{\frac{1}{\sqrt{k}} \gamma_g X} \right] \right| + E\!\left[ e^{\frac{2}{\sqrt{k}} \gamma_g X} \right] - 2\, E\!\left[ e^{\frac{1}{\sqrt{k}} \gamma_g X} \right] + 1 \tag{3.89}
\]
\[
= \left| E\!\left[ e^{\frac{2}{\sqrt{k}} W_k(g)} \right] - E\!\left[ e^{\frac{2}{\sqrt{k}} \gamma_g X} \right] \right| + 2 \left| E\!\left[ e^{\frac{1}{\sqrt{k}} W_k(g)} \right] - E\!\left[ e^{\frac{1}{\sqrt{k}} \gamma_g X} \right] \right| + e^{\frac{2\gamma_g^2}{k}} - 2\, e^{\frac{\gamma_g^2}{2k}} + 1. \tag{3.90}
\]
Since $\frac{1}{\sqrt{k}}$ is a decreasing sequence for $k = 1, 2, \ldots$, there exists $k^\star$ such that $\frac{2}{\sqrt{k^\star}} \leq \frac{1}{3\sqrt{3}L \vee L'}$, as required by Theorem 14, and the constant $c$ defined in the lemma satisfies the condition $|g(x) - \Pi(g)| = |C_j - \ln \rho_t| \leq c$. Therefore, for $k \geq k^\star$,
\[
E\left[ \left( e^{\frac{1}{\sqrt{k}} W_k(g)} - 1 \right)^{\!2} \,\Big|\, \mathcal{F}_t = x_t \right] \leq \frac{1}{k}\left[ 2C'L'\left( e^{\frac{(2L')^2}{k}} + e^{\frac{(L')^2}{k}} \right) + C\left( e^{\frac{(2\cdot 3\sqrt{3}L)^2}{k}} + 2\, e^{\frac{(3\sqrt{3}L)^2}{k}} \right) \right] \tag{3.91}
\]
\[
+\; 2C'L'\left( \frac{1}{k - (2L')^2} + \frac{1}{k - (L')^2} \right) + C\left( \frac{1}{k - (2\cdot 3\sqrt{3}L)^2} + \frac{2}{k - (3\sqrt{3}L)^2} \right) \tag{3.92}
\]
\[
+\; e^{\frac{2\gamma_g^2}{k}} - 2\, e^{\frac{\gamma_g^2}{2k}} + 1. \tag{3.93}
\]
Multiplying by $\rho_t^2$ gives the desired result.
The strong convergence of the MCDPF will now be proved through each step of the MCDPF: the prediction, measurement, and resampling steps. This inductive proof is a minor modification of [5, Lemmas 3–5]. The main difference, however, is that the effect of the number of Markov chain steps $k$ has to be accounted for; furthermore, the mean square error in terms of the SLEM is established in the main theorem. The following lemma concerns the prediction update step.
Lemma 18. Let us assume that for any $\varphi \in B(\mathbb{R}^n)$
\[
E\left[ \left( (\pi^N_{t-1|t-1}, \varphi) - (\pi_{t-1|t-1}, \varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t-1|t-1}}}{\sqrt{N}} \right)^{\!2}; \tag{3.94}
\]
then we have
\[
E\left[ \left( (\pi^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^{\!2}, \tag{3.95}
\]
where $c_{t|t-1} = (2 + \sqrt{c_{t-1|t-1}})^2$.
Proof. From Minkowski's inequality,
\[
\left[ E\left( (\pi^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi) \right)^2 \right]^{1/2} \leq \left[ E\left( (\pi^N_{t|t-1}, \varphi) - (\pi^N_{t-1|t-1}, K\varphi) \right)^2 \right]^{1/2} + \left[ E\left( (\pi^N_{t-1|t-1}, K\varphi) - (\pi_{t-1|t-1}, K\varphi) \right)^2 \right]^{1/2}. \tag{3.96–3.97}
\]
The first term on the right-hand side is bounded as follows. From Lemma 16 and Lemma 15,
\[
E\left[ \left( (\pi^N_{t|t-1}, \varphi) - (\pi^N_{t-1|t-1}, K\varphi) \right)^2 \,\Big|\, \mathcal{G}_{t-1} \right] = E\left[ \left( (\pi^N_{t|t-1}, \varphi) - E[(\pi^N_{t|t-1}, \varphi) \mid \mathcal{G}_{t-1}] \right)^2 \,\Big|\, \mathcal{G}_{t-1} \right] \tag{3.98}
\]
\[
= \frac{1}{N^2} E\left[ \left( \sum_{i=1}^{N} \varphi(x^i_t) - E[\varphi(x^i_t) \mid \mathcal{G}_{t-1}] \right)^{\!2} \,\Big|\, \mathcal{G}_{t-1} \right] = \frac{1}{N^2} \sum_{i=1}^{N} E\left[ \left( \varphi(x^i_t) - E[\varphi(x^i_t) \mid \mathcal{G}_{t-1}] \right)^2 \,\Big|\, \mathcal{G}_{t-1} \right]
\]
\[
\leq \frac{4}{N^2} \sum_{i=1}^{N} E\left[ \varphi^2(x^i_t) \mid \mathcal{G}_{t-1} \right] \tag{3.99}
\]
\[
= \frac{4}{N} E\left[ (\pi^N_{t|t-1}, \varphi^2) \mid \mathcal{G}_{t-1} \right] \tag{3.100}
\]
\[
= \frac{4}{N} (\pi^N_{t-1|t-1}, K\varphi^2). \tag{3.101}
\]
Since Markov operators are contractions [28], $\|K\varphi\|_\infty \leq \|\varphi\|_\infty$, and so
\[
E\left[ \left( (\pi^N_{t|t-1}, \varphi) - (\pi^N_{t-1|t-1}, K\varphi) \right)^2 \,\Big|\, \mathcal{G}_{t-1} \right] \leq \frac{4}{N} \|\varphi\|_\infty^2. \tag{3.102}
\]
The upper bound on the second term is given by the hypothesis of the lemma:
\[
E\left| (\pi^N_{t-1|t-1}, K\varphi) - (\pi_{t-1|t-1}, K\varphi) \right|^2 \leq \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t-1|t-1}}}{\sqrt{N}} \right)^{\!2}. \tag{3.103}
\]
And so
\[
\left[ E\left( (\pi^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi) \right)^2 \right]^{1/2} \leq \|\varphi\|_\infty \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right), \tag{3.104}
\]
where $\sqrt{c_{t|t-1}} = 2 + \sqrt{c_{t-1|t-1}}$.
Given the result for the prediction step, together with Lemma 17, the following lemma gives the error bound after the Markov chain iteration step.
Lemma 19. Let us assume that for any $\varphi \in B(\mathbb{R}^n)$
\[
E\left[ \left( (\pi^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^{\!2} \tag{3.105}
\]
and
\[
E\,|\rho_{t,k} - \rho_t|^2 \leq \rho_t^2\, \Phi(k,\nu). \tag{3.106}
\]
Then we have
\[
E\left[ \left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \tilde{\phi}_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^{\!2}, \tag{3.107}
\]
where $\tilde{\phi}_t = \phi_{t-1} + \rho_t$.
Proof. From Minkowski's inequality,
\[
\left[ E\left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi) \right)^2 \right]^{1/2} \leq \left[ E\left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi^N_{t|t-1}, \rho_t\varphi) \right)^2 \right]^{1/2} + \left[ E\left( (\pi^N_{t|t-1}, \rho_t\varphi) - (\pi_{t|t-1}, \rho_t\varphi) \right)^2 \right]^{1/2}. \tag{3.108–3.109}
\]
The first term on the right-hand side is bounded as follows:
\[
E\left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi^N_{t|t-1}, \rho_t\varphi) \right)^2 = E\left( \frac{1}{N} \sum_{i=1}^{N} \left( \rho_{t,k}\varphi(x^i_t) - \rho_t\varphi(x^i_t) \right) \right)^{\!2}
\]
\[
\leq \frac{1}{N} E\left[ \sum_{i=1}^{N} \left( \rho_{t,k}\varphi(x^i_t) - \rho_t\varphi(x^i_t) \right)^2 \right] \tag{3.110}
\]
\[
\leq E\left[ \frac{\|\varphi\|_\infty^2}{N} \sum_{i=1}^{N} \left( \rho_{t,k} - \rho_t \right)^2 \right] \tag{3.111}
\]
\[
\leq \|\varphi\|_\infty^2\, \rho_t^2\, \Phi(k,\nu). \tag{3.112}
\]
The first inequality is a simple application of Hölder's inequality. And so
\[
\left[ E\left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi) \right)^2 \right]^{1/2} \leq \|\varphi\|_\infty \left( (\phi_{t-1} + \rho_t) \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right).
\]
The next MCDPF step is the measurement update, and the following lemma provides the error bound after the measurement step.
Lemma 20. Let us assume that for any $\varphi \in B(\mathbb{R}^n)$
\[
E\left[ \left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \tilde{\phi}_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^{\!2}; \tag{3.113}
\]
then we have
\[
E\left[ \left( (\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{\tilde{c}_{t|t}}}{\sqrt{N}} \right)^{\!2}. \tag{3.114}
\]
Proof. We have
\[
(\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) = \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi^N_{t|t-1}, \rho_{t,k})} - \frac{(\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)} \tag{3.115}
\]
\[
= \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi^N_{t|t-1}, \rho_{t,k})} - \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} + \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} - \frac{(\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)}, \tag{3.116}
\]
where
\[
\frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi^N_{t|t-1}, \rho_{t,k})} - \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} = (\pi^N_{t|t-1}, \rho_{t,k}\varphi)\, \frac{(\pi_{t|t-1}, \rho) - (\pi^N_{t|t-1}, \rho_{t,k})}{(\pi^N_{t|t-1}, \rho_{t,k})\,(\pi_{t|t-1}, \rho)} \tag{3.117}
\]
\[
\leq \|\varphi\|_\infty\, \frac{\left| (\pi^N_{t|t-1}, \rho_{t,k}) - (\pi_{t|t-1}, \rho) \right|}{(\pi_{t|t-1}, \rho)}. \tag{3.118}
\]
Also,
\[
\frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} - \frac{(\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)} = \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)}. \tag{3.119}
\]
Using Minkowski's inequality again gives
\[
\left[ E\left( (\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right]^{1/2} \leq \frac{\|\varphi\|_\infty}{(\pi_{t|t-1}, \rho)} \left[ E\left( (\pi^N_{t|t-1}, \rho_{t,k}) - (\pi_{t|t-1}, \rho) \right)^2 \right]^{1/2} \tag{3.120}
\]
\[
+\; \frac{1}{(\pi_{t|t-1}, \rho)} \left[ E\left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho\varphi) \right)^2 \right]^{1/2} \tag{3.121}
\]
\[
\leq \frac{2\|\varphi\|_\infty}{(\pi_{t|t-1}, \rho)} \left( \tilde{\phi}_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right), \tag{3.122}
\]
where $\phi_t = \frac{2\tilde{\phi}_t}{(\pi_{t|t-1}, \rho)}$ and $\tilde{c}_{t|t} = \frac{4 c_{t|t-1}}{(\pi_{t|t-1}, \rho)^2}$.

The next lemma provides the error bound after the resampling step, which is the final step of the inductive proof.
Lemma 21. Let us assume that for any $\varphi \in B(\mathbb{R}^n)$
\[
E\left[ \left( (\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{\tilde{c}_{t|t}}}{\sqrt{N}} \right)^{\!2}; \tag{3.123}
\]
then we have
\[
E\left[ \left( (\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right] \leq \|\varphi\|_\infty^2 \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t}}}{\sqrt{N}} \right)^{\!2}. \tag{3.124}
\]
Proof. We have
\[
(\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) = (\pi^N_{t|t,k}, \varphi) - (\tilde{\pi}^N_{t|t,k}, \varphi) + (\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi). \tag{3.125}
\]
If $\mathcal{H}_t$ is the $\sigma$-algebra generated by the particles $\{x^i_t\}_{i=1}^{N}$, then
\[
E\left[ (\pi^N_{t|t,k}, \varphi) \mid \mathcal{H}_t \right] = \frac{1}{N} \sum_{i=1}^{N} E[\varphi(x^i_t) \mid \mathcal{H}_t] = \sum_{i=1}^{N} w^i_t\, \varphi(\tilde{x}^i_t) = (\tilde{\pi}^N_{t|t,k}, \varphi). \tag{3.126}
\]
Thus, by the same procedure as (3.98)–(3.101) in Lemma 18, for some constant $\bar{C}$,
\[
E\left[ \left( (\pi^N_{t|t,k}, \varphi) - (\tilde{\pi}^N_{t|t,k}, \varphi) \right)^2 \,\Big|\, \mathcal{H}_t \right] \leq \bar{C}\, \frac{\|\varphi\|_\infty^2}{N}. \tag{3.127}
\]
Finally we have
\[
\left[ E\left( (\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right]^{1/2} \leq \sqrt{\bar{C}}\, \frac{\|\varphi\|_\infty}{\sqrt{N}} + \|\varphi\|_\infty \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{\tilde{c}_{t|t}}}{\sqrt{N}} \right), \tag{3.128}
\]
where $\sqrt{c_{t|t}} = \sqrt{\bar{C}} + \sqrt{\tilde{c}_{t|t}}$.
Putting Lemmas 18, 20, and 21 together gives the following theorem on the convergence of the MCDPF.
Theorem 22. Under the assumptions made in Theorem 9, there exist a time-dependent constant $c_{t|t}$ and a constant $\Phi(k,\nu)$, dependent on the number of Markov chain steps $k$ and the SLEM $\nu$, such that for all $t \geq 0$,
\[
\left[ E\left( (\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right]^{1/2} \leq \|\varphi\|_\infty \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t}}}{\sqrt{N}} \right). \tag{3.129}
\]
Therefore the root mean square error of the MCDPF converges at the rate $\mathcal{O}(\frac{1}{\sqrt{k}} + \frac{1}{\sqrt{N}})$, and it increases proportionally to $\mathcal{O}(\frac{1}{\sqrt{\delta}}\, e^{1/\delta})$ as $\delta \to 0$, where $\delta = 1 - \nu$ is the spectral gap.

Proof. The dependence of the root mean square error of the MCDPF on the number of particles $N$ follows directly from the upper bound in (3.129). Also, $\Phi(k,\nu) = \mathcal{O}(k^{-1})$ for $k \geq k^\star$ and fixed $\nu$, as given in (3.76)–(3.78). Similarly, if $k$ is fixed, then $\Phi(k,\nu) = \mathcal{O}(\frac{e^{1/\delta}}{\delta})$ because the $C\, e^{(2\cdot 3\sqrt{3}L)^2/k}$ term dominates, with $C = \mathcal{O}(\delta^{-1})$ and $L = \mathcal{O}(\delta^{-1})$.
3.5 Numerical certificate of strong convergence
In [29] the performance of the MCDPF is illustrated with a bearing-only tracking example, and the relation between the root mean square error (RMSE) and the number of Markov chain steps is plotted to demonstrate the error behavior. Here, however, the error rate has been rigorously proved, and the decay rate of the RMSE numerically verifies the result of the main Theorem 22. We consider a bearing-only tracking example in this section for the purpose of demonstrating the performance of the MCDPF. The dynamic model was a time-dependent linear system but the measurement model was nonlinear, with one moving target tracked by 4 bearing sensors linked in a network. There were two modes for the movement of the target: a straight mode and a turning mode. The target moved with linear dynamics, turning to the right by 90 degrees between 0.5 and 1, 2 and 2.5, and 3.5 and 4 seconds. The state vector is $[x_t\;\, y_t\;\, \dot{x}_t\;\, \dot{y}_t]$, and the state-space system and measurements at each sensor were given by
\[
x_{t+1} = e^{F_t \Delta t} x_t + q_t, \tag{3.130}
\]
\[
\theta^i_t = \arctan \frac{y_t - s^i(y)}{x_t - s^i(x)} + r^i_t, \tag{3.131}
\]
where
\[
F_t = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & a_t \\ 0 & 0 & -a_t & 0 \end{bmatrix}, \qquad a_t = \begin{cases} 0 & \text{straight mode} \\ \dfrac{\pi}{2 \times 51\, \Delta t} & \text{turning mode,} \end{cases}
\]
\[
r^i_t \sim \mathcal{N}(0,\, 0.05^2), \qquad q_t \sim \mathcal{N}\left( 0,\; \begin{bmatrix} \frac{\Delta t^3}{3} & 0 & \frac{\Delta t^2}{2} & 0 \\ 0 & \frac{\Delta t^3}{3} & 0 & \frac{\Delta t^2}{2} \\ \frac{\Delta t^2}{2} & 0 & \Delta t & 0 \\ 0 & \frac{\Delta t^2}{2} & 0 & \Delta t \end{bmatrix} \right).
\]
Here $\Delta t = 0.01$, $(s^i(x), s^i(y))$ was the position of the $i$th sensor, and $r^i_t$ was the measurement noise. Each sensor was connected to its nearest two neighboring sensors. The true trajectory of the moving target was estimated by the CPF and the MCDPF. The centralized particle filter tracked the true trajectory with $N = 400$ particles and bearing information gathered from all four sensors. The trajectory was also estimated by the MCDPF with $N = 400$ particles at each node. Figure 3.1 shows the estimation results of the CPF and the MCDPF with $k = 4$ Markov chain steps per measurement for the MCDPF. Even with such a small number of Markov chain steps per measurement, the MCDPF at each sensor obtained a reasonable estimate of the true trajectory by exchanging information only with connected neighbors according to a random walk of particles and weights on the sensor network. To numerically study the convergence of the posterior distribution of the MCDPF at each node to the posterior distribution of the CPF (as proved in Theorem 9), we define the root mean square error (RMSE) to be
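The target dynamics (3.130) and bearing measurements (3.131) can be simulated as follows. The sensor positions, the initial state, and the truncated-Taylor matrix exponential are assumptions of this sketch (the text does not specify them), and `arctan2` is used in place of `arctan` to resolve the bearing quadrant.

```python
import numpy as np

rng = np.random.default_rng(4)
dt = 0.01

def expm_taylor(A, terms=20):
    """Truncated Taylor series for the matrix exponential e^A (small ||A||)."""
    E = np.eye(A.shape[0])
    T = np.eye(A.shape[0])
    for n in range(1, terms):
        T = T @ A / n
        E = E + T
    return E

def F(a):
    # Dynamics matrix of (3.130); a = 0 in straight mode, a != 0 while turning.
    return np.array([[0, 0, 1, 0],
                     [0, 0, 0, 1],
                     [0, 0, 0, a],
                     [0, 0, -a, 0]], dtype=float)

# Process noise covariance of q_t from the text.
Q = np.array([[dt**3 / 3, 0, dt**2 / 2, 0],
              [0, dt**3 / 3, 0, dt**2 / 2],
              [dt**2 / 2, 0, dt, 0],
              [0, dt**2 / 2, 0, dt]])

sensors = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # assumed positions

def bearing(x, s):
    # Measurement model (3.131) with N(0, 0.05^2) bearing noise.
    return np.arctan2(x[1] - s[1], x[0] - s[0]) + 0.05 * rng.normal()

a_turn = np.pi / (2 * 51 * dt)
x = np.array([5.0, 5.0, 1.0, 0.0])            # assumed initial state
L_chol = np.linalg.cholesky(Q)                # to sample q_t ~ N(0, Q)
traj = [x.copy()]
for step in range(100):                       # 1 second of motion
    a = a_turn if 50 <= step < 100 else 0.0   # turn between 0.5 s and 1 s
    x = expm_taylor(F(a) * dt) @ x + L_chol @ rng.normal(size=4)
    traj.append(x.copy())

theta = [bearing(traj[-1], s) for s in sensors]   # bearings at the final time
```

Running the CPF or MCDPF on the `theta` streams from all four sensors reproduces the tracking setup described above.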