PARALLEL STOCHASTIC PARTICLE METHODS USING RANDOM WALKS

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF AERONAUTICS AND ASTRONAUTICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Sun Hwan Lee December 2010

© 2011 by Sun Hwan Lee. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution- Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/jn897hc5058

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Matthew West, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Peter Glynn, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Juan Alonso

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Sanjay Lall

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Particle methods, also known as Monte Carlo methods in the statistical community, have become a powerful tool for a variety of research areas such as astronomy and finance, to list a few. This is mainly due to the enormous advances in computational resources in recent years. In this work, we consider an efficient and robust parallel methodology that can be applied to particle methods in a general setting. The parallel methodology proposed in this thesis takes advantage of Markov chain random walks and the corresponding Markov chain theory. We develop parallel stochastic particle methods in two different areas: (1) the optimal filtering problem, and (2) simulation of particle coagulation. In each application, a mathematical proof of convergence as well as a numerical example are provided. After a brief review of Markov chain random walks and an explanation of the two application areas in chapter 1, the Markov Chain Distributed Particle Filter (MCDPF) algorithm is introduced. The performance of this method is demonstrated with a bearing-only-measurement target-tracking numerical example and is further compared with an existing method, the Distributed Extended Kalman Filter (DEKF), using a flocking model for the target vehicles. We study the convergence of the MCDPF to the Centralized Particle Filter (CPF) and the optimal filtering solution by using results from Markov chain theory. In addition, the robustness of the MCDPF method is highlighted for practical problems. As the second application area, we developed a parallel stochastic particle method for the stochastic simulation of Smoluchowski's coagulation equation. This equation is used in many broad areas, and for high-dimensional problems the stochastic particle solution is more accurate, stable and computationally cheaper than classical numerical

integration schemes. In this application, simulated particles can be considered as representing physical particles. Since more particles result in more accurate and useful solutions, it is desirable to simulate this equation with a greater number of particles. By applying the parallel stochastic particle method, a comparable solution is obtained more efficiently using multiple processors, where each processor maintains many fewer particles and communicates with neighboring processors. A numerical study as well as a theoretical analysis are provided to demonstrate the convergence of the parallel stochastic particle algorithm.

Acknowledgement

During the six years I spent on my M.S. and Ph.D. degrees, I never fully noticed the importance of the people around me who helped me in various ways. One nice thing about defending the Ph.D. is that it gives me an opportunity to pause and appreciate such valuable people as I wrap up my studies. First, I would like to thank Professor Matt West, my advisor, for giving me a great research opportunity and a lot of valuable advice. He is the one who introduced me to the areas of numerical computation and stochastic systems, and I was able to explore fields that were totally new to me thanks to his deep knowledge and generous support. I could not have finished my degree without his guidance and patience. I also want to thank Professor Peter Glynn, who advised me after Professor Matt West left Stanford. I learned a lot from his classes about stochastic systems and calculus, which equipped me with the theoretical background on those subjects. It was a great help to have someone with whom I could consult about research in person. Thanks to Professors Sanjay Lall, Juan Alonso and James Primbs for generously serving as the committee members of my Ph.D. oral examination. The Samsung Scholarship Foundation supported me for four years of my graduate studies. Along with the financial support, I really appreciate the opportunities to meet other Korean students studying across the world and the great experiences with them. My thanks also go to the friends I met here at Stanford: Younggeun Cho, Taemie Kim, Taesup Moon, Jeeyoung Peck, Chunki Park, Jinsung Kwon, Jongyoon Peck, Kahye Song, Daeseok Nam, Minyong Shin, Hyungsik Shin, Jonghan Kim, Jaeheung Park, and all SGBT members. I always miss Korea because of my friends: Taesung Choi, Jiyoung Kang, Keum-Dong Jung, Yoonkyoung Hur, Jisun Peck, Hyejung Lee,

Seungmin Wie, and Sehyuk Kwak. Words are not enough to thank my family in Korea for their spiritual support and love. My sincere thanks go to my parents, Joowon Lee and Hwasook Park, whom I respect the most in the world, and to my older brother, Daehwan Lee, who is a good competitor at all kinds of sports and with whom I hope to have many rounds of golf. I also would like to thank my parents-in-law for their love and care. Last but not least, I would like to thank my family, YeoMyoung and Yuna. My marriage and the birth of my daughter changed my Ph.D. life dramatically, but in a very positive way. From the first moment at the Stanford West tennis court to finishing my Ph.D. degree, we enjoyed life at Stanford as a student family, and I am so excited about the journey toward the new stage of our life from now on. I love you and thank you, YeoMyoung and Yuna.

Contents

Abstract

Acknowledgement

1 Introduction
  1.1 Problem description
  1.2 Dissertation overview

2 Background
  2.1 Markov chain random walk
  2.2 Steady state of Markov chains

3 Markov Chain Distributed Particle Filter
  3.1 Introduction
  3.2 Random walks on a graph
  3.3 Particle filters
    3.3.1 Centralized particle filters
    3.3.2 The Markov Chain Distributed Particle Filter (MCDPF)
    3.3.3 Convergence to CPF and algorithm
    3.3.4 Convergence to optimal filtering
  3.4 Strong convergence
    3.4.1 Preliminaries
    3.4.2 Proof of strong convergence
  3.5 Numerical certificate of strong convergence
  3.6 Performance comparison
    3.6.1 Extended Kalman filter
    3.6.2 Numerical example
  3.7 Conclusions

4 Parallel stochastic simulation of coagulation
  4.1 Introduction
  4.2 Gillespie's method
    4.2.1 Numerical example
  4.3 Parallel stochastic particle algorithm
    4.3.1 Numerical example
  4.4 Convergence of parallel stochastic particle method
  4.5 Conclusions

Bibliography

List of Tables

3.1 Table of the algorithms, RMSE values and the fraction of divergence (averaged over 1000 Monte Carlo runs)
3.2 Table of the algorithms, RMSE values and the fraction of divergence (averaged over 1000 Monte Carlo runs)
3.3 Table of the algorithms, RMSE values and the fraction of divergence

List of Figures

3.1 The trajectory estimation by CPF and DPF with Markov chain steps $k = 4$
3.2 RMSE of MCDPF and CPF with respect to the number of executions (left) and different Markov chain steps $k$ (right)
3.3 Trajectory of the flocking model and its position estimation by EKF, DEKF, REKF, RDEKF, CPF and MCDPF
3.4 RMSE versus time for EKF, DEKF, REKF, RDEKF, CPF and MCDPF
3.5 RMSE with respect to BW with changing $N = 50, 100, 200, 500$ (CPF), $k_{mc} = 5, 10, 20, 50$ (MCDPF) and $k_{con} = 2, 6, 10, 14$ (DEKF). The decrease in RMSE is observed with increased BW
4.1 $c(t, k)$ of the linear kernel with $2 \le k \le 10$
4.2 The stochastic solution with $M_0 = 500, 120$ and the histogram of $\tau_n$. For $M_0 = 120$ and $M_0 = 500$ the portion of $\tau_n \le 0.01$ is 0.4311 and 0.8016, respectively
4.3 The plot of $c_{10^4}(t, 5)$ and $\tilde{c}_{10^4}(t, 5)$ with 3 different $\tau_{mix}$'s
4.4 The plot of $c_{10^4}(5, k)$ and $\tilde{c}_{10^4}(5, k)$ with 3 different $\tau_{mix}$'s
4.5 $\|e_R\|_2$ defined in (4.22) up to an ensemble size of $R = 10^4$
4.6 A particular realization of particle coagulation using Gillespie's algorithm

Chapter 1

Introduction

1.1 Problem description

Stochastic particle methods, which are based on Monte Carlo methods, have become powerful and practical tools in a variety of research areas due to significant developments in computing power. Nonlinear and high-dimensional functions and complex probability distributions are good examples of objects that can be represented by a set of particles and associated weights. In many areas where particle methods are used, it is generally true that more particles mean more accurate results. Therefore there is a need for faster computation to process a large number of particles, for the purpose of applying particle methods in more practical situations. Parallel computation is a direct way to achieve this objective, not only because it meets this requirement directly but because some practical situations are naturally distributed over a computer network. As simple examples, the position estimation of a moving object can be achieved by a physically distributed sensor network, and numerous computationally expensive simulations already execute on parallel clusters. Applying parallel computation to particle methods, however, requires clarification of what type of information is to be communicated and how to exchange it between processing nodes. We use a Markov chain random walk as a way to communicate information between processors in a parallel particle method, and we exchange individual particles


as the basic unit of data. This is in contrast with methods that exchange parameterized representations of sets of particles. While such methods often aim to reduce the communication load, we will show that a parallel method with raw particle data is superior in some cases. Furthermore, we can easily prove the convergence of our parallel methods to centralized approaches by adapting well-established results from Markov chain theory. The parallel stochastic particle method that is introduced in this thesis is demonstrated through application to two different areas: (1) an optimal filtering problem, and (2) a stochastic simulation of particle coagulation. The application of the parallel stochastic particle method to the optimal filtering problem is motivated by recent interest in distributed nonlinear system estimation, which has practical implications in many areas. We show that the Markov Chain Distributed Particle Filter (MCDPF) studied in this thesis converges both weakly and strongly to both the Centralized Particle Filter (CPF) and the optimal filtering solution in a probabilistically well-defined manner. The robustness and practicality of the MCDPF is demonstrated numerically by comparing its performance with an existing distributed estimation method, the Distributed Extended Kalman Filter (DEKF), for a distributed target tracking problem with flocking vehicles. The motivation for the second application, the simulation of particle coagulation, stems from the need for a massive number of simulated particles to adequately capture complex, high-dimensional particle populations. The distributed stochastic particle simulation method is based on the centralized Stochastic Simulation Algorithm (SSA, or Gillespie's method). The idea is to distribute particles across many processors and to exchange particles between neighboring processors so that particles at different processors can interact with each other. The convergence of the distributed simulation to the centralized simulation is shown analytically as the particle exchange rate goes to infinity. To explore the convergence rates, a numerical example with varying particle exchange rate is demonstrated.

1.2 Dissertation overview

We review the background on Markov chain random walks and steady-state Markov chain distributions in the next chapter. Following this background material, we consider the use of parallel stochastic particle methods for the optimal filtering problem and the simulation of particle coagulation in separate chapters. In chapter 3, the Markov Chain Distributed Particle Filter (MCDPF) algorithm is introduced and its convergence to the Centralized Particle Filter (CPF) is proved. In addition, the performance of the MCDPF is compared with an existing distributed nonlinear system estimation method, the Distributed Extended Kalman Filter (DEKF), using distributed target tracking for a flocking model. In chapter 4, the parallel stochastic particle method is applied to solving Smoluchowski's coagulation equation. We present a parallel algorithm for this problem and prove convergence of the solution from this method to that of the serial solution. To understand the behavior of the parallel algorithm, a numerical example is also included.

Chapter 2

Background

A Markov chain is a stochastic process whose next value depends only on the current value. It can be classified as a discrete-time Markov chain (DTMC), a continuous-time Markov chain (CTMC), or Brownian motion, depending on the type of the state space and the time index. Many theorems on the subject of DTMCs can be generalized to the continuous-time case, but we focus only on DTMCs in this thesis. The material reviewed here mostly follows [51].

2.1 Markov chain random walk

A discrete-time Markov chain (DTMC) is a Markov process whose state space is a finite or countable set and whose time index set is discrete, $T = (0, 1, 2, \ldots)$. A DTMC has the Markov property,

\Pr(X_{n+1} = j \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i) = \Pr(X_{n+1} = j \mid X_n = i),   (2.1)

where $X_n$ is a random variable at time $n$ and $i_0, \ldots, i_{n-1}, i, j$ are states of the DTMC. Since the future value depends on the past only through the current value, the one-step transition probability, which describes the probability of $X_{n+1}$ being at state $j$


given that $X_n$ is at $i$, is defined as follows:

P_{ij}^{n,n+1} = \Pr(X_{n+1} = j \mid X_n = i).   (2.2)

If the one-step transition probability is independent of the time index $n$, then we say that the DTMC has a stationary transition probability, and the transition probabilities can be arranged in matrix form. We restrict our discussion in this thesis to stationary Markov chains from now on.

P = \begin{pmatrix} P_{00} & P_{01} & P_{02} & \cdots \\ P_{10} & P_{11} & P_{12} & \cdots \\ P_{20} & P_{21} & P_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.3)

The matrix $P$ is called a Markov matrix or a transition probability matrix, and it satisfies the following properties:

P_{ij} \ge 0 \quad \text{for } i, j = 0, 1, 2, \ldots   (2.4)

\sum_{j=0}^{\infty} P_{ij} = 1 \quad \text{for } i = 0, 1, 2, \ldots   (2.5)

A DTMC is completely defined by its transition probability matrix and an initial state $X_0$, since the probability of an arbitrary finite sample path of the Markov process is obtained as follows:

\Pr(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n)   (2.6)

= \Pr(X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1})   (2.7)

\quad \times \Pr(X_n = i_n \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1})   (2.8)

= \Pr(X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}) \, P_{i_{n-1}, i_n} = \cdots = p_{i_0} P_{i_0, i_1} \cdots P_{i_{n-1}, i_n},   (2.9)

where $p_{i_0} = \Pr(X_0 = i_0)$. A Markov chain random walk is a random process whose transition probability is defined by the transition probability matrix $P$. It is easily understood if we consider a

particle moving in the state space according to the transition probability matrix. The probability that the particle moves from state i to j, where i, j are states of the Markov chain, is the (i, j)th element of the transition probability matrix P .

\Pr(X_n = j \mid X_{n-1} = i) = P_{ij}.   (2.10)

The Markov chain random walk introduced in this section will later be used as a methodology to communicate or exchange information between connected nodes. In the parallel setting, the particle is considered to be an appropriate type of information, and the states are treated as processors of the parallel machine, sensors, or nodes in the sensor network.
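As a concrete illustration of these mechanics, the following minimal Python sketch (using only numpy; the function name and example matrix are ours, not taken from the referenced literature) simulates a particle performing a Markov chain random walk according to a transition probability matrix and counts the visits to each state.

import numpy as np

def random_walk(P, x0, k, rng):
    """Simulate k steps of a Markov chain random walk with transition matrix P,
    returning the number of visits to each state, cf. (2.10)."""
    m = P.shape[0]
    visits = np.zeros(m, dtype=int)
    x = x0
    for _ in range(k):
        # Move to the next state by sampling row x of the transition matrix.
        x = rng.choice(m, p=P[x])
        visits[x] += 1
    return visits

# A hypothetical 3-state stationary transition probability matrix as in (2.3).
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
rng = np.random.default_rng(0)
print(random_walk(P, x0=0, k=10000, rng=rng) / 10000)  # roughly uniform here

In the parallel setting discussed above, the state index would be a processor or sensor identifier and the "particle" any unit of exchanged information.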

2.2 Steady state of Markov chains

Given a random process such as the Markov chain random walk introduced in the previous section, it is impossible to know which state the particle will be in at the next time index, or how many times the particle will visit a particular state $i$ during $n$ time steps. The basic limit theorem of Markov chains says that if we run the process for a very long time, we can gain some useful information on the behavior of the random process. The distribution $\pi$ is called a stationary distribution if it satisfies the following:

\pi_j = \sum_{i \in S} \pi_i P_{ij}, \qquad \sum_{j \in S} \pi_j = 1.   (2.11)

Here $S$ is the state space of the Markov chain. The following theorem shows the existence of a stationary distribution for a finite-state Markov chain.

Theorem 1. Suppose P is the transition probability matrix of a finite-state Markov chain. Then there exists a matrix Λ such that

P\Lambda = \Lambda P = \Lambda^2 = \Lambda.   (2.12)

Furthermore,

\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} P^j = \Lambda.   (2.13)

To study the uniqueness of limit distributions, we recall some properties of Markov chains.

Irreducibility

A state $j$ is said to be accessible from the state $i$ if $P_{ij}^{(n)} > 0$ for some integer $n \ge 0$. If two states $i, j$ are accessible to each other, then they are said to communicate. Communication is transitive, so given a Markov chain, we can partition the states into equivalence classes in which states communicate with each other. A Markov chain is irreducible if all states communicate with each other.

Periodicity of Markov chain

The period of a state $i$ is defined as the greatest common divisor of all integers $n \ge 1$ for which $P_{ii}^{(n)} > 0$. A Markov chain in which each state has period 1 is called aperiodic.

Recurrent or Transient

A state $i$ is said to be recurrent if and only if the probability that the chain starting from $i$ returns to $i$ is one. States that are not recurrent are called transient.

Formally, let the random variable τi be the first return time to the state i,

\tau_i = \inf\{ n \ge 1 : X_n = i \mid X_0 = i \}.   (2.14)

Then, the state i is recurrent if and only if

\Pr(\tau_i < \infty) = 1.   (2.15)

Although the return time is finite with probability one, it need not have a finite expectation. The state $i$ is positive recurrent if $E[\tau_i]$ is finite.

Theorem 2. An irreducible Markov chain has a stationary distribution $\pi$ if and only if all of its states are positive recurrent. Furthermore, $\pi$ is related to the expected return time by

\pi_i = \frac{1}{E[\tau_i]}.   (2.16)

Theorem 3. Consider a positive recurrent irreducible aperiodic Markov chain with state space $S$. Then we have a unique stationary distribution $\pi$ satisfying

\lim_{n \to \infty} P_{jj}^{(n)} = \pi_j = \sum_{i \in S} \pi_i P_{ij},   (2.17)

\lim_{n \to \infty} P_{ij}^{(n)} = \frac{1}{E[\tau_j]}.   (2.18)
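Theorems 2 and 3 are easy to check numerically for a small chain. The Python sketch below (with a hypothetical three-state chain of our own choosing, numpy only) estimates the stationary distribution from powers of $P$ and compares $\pi_i$ against $1/E[\tau_i]$ using simulated return times, cf. (2.16) and (2.18).

import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])

# Stationary distribution via the limit in (2.18): rows of P^n converge to pi.
pi = np.linalg.matrix_power(P, 100)[0]

rng = np.random.default_rng(1)

def mean_return_time(P, i, runs=5000):
    """Estimate E[tau_i] by simulating return times to state i, cf. (2.14)."""
    total = 0
    for _ in range(runs):
        x, steps = i, 0
        while True:
            x = rng.choice(P.shape[0], p=P[x])
            steps += 1
            if x == i:
                break
        total += steps
    return total / runs

print(pi[0], 1.0 / mean_return_time(P, 0))  # the two values should agree, cf. (2.16)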

The rigorous proofs of the above theorems are worth reviewing for a deep understanding of Markov chain random walks and their stationary behavior, but since we use them only as tools to develop a parallel methodology for stochastic particle methods, we simply list good resources for a mathematical treatment of Markov chain random walks, for example [32, 51, 9]. The Markov chain random walk introduced in this chapter is the main idea required for the development of our parallel stochastic particle methods. The following relation between the Markov chain random walk and the parallel stochastic particle method is stated again before we jump into the two application areas. Each state is considered to be a node or sensor in a physically distributed sensor network, or an individual processor in a parallel computing cluster, and the particle moving by a random walk between states in the Markov chain is any type of information that is exchanged between nodes, sensors, or processors.

Chapter 3

Markov Chain Distributed Particle Filter

Distributed particle filters (DPF) are known to provide robustness for the state estimation problem and can reduce the amount of information communication compared to centralized approaches. Due to the difficulty of merging multiple distributions represented by particles and associated weights, however, most uses of DPF to date tend to approximate the posterior distribution using a parametric model or to use a predetermined message path. In this chapter, the Markov Chain Distributed Particle Filter (MCDPF) algorithm is proposed, based on particles performing random walks across the network. This approach maintains robustness, since every sensor only needs to exchange particles and weights locally, and furthermore enables more general representations of posterior distributions because there are no a priori assumptions on the distribution form. In section 3.2, basic properties and theorems for random walks on graphs are reviewed. Section 3.3 contains the Centralized Particle Filter (CPF) algorithm and the proposed decentralized particle filter algorithm. In addition, the weak convergence of the posterior distribution of the MCDPF to that of the CPF and the optimal filter, which is the main result of [29], is reviewed in this section. Section 3.4 consists of the proof of strong convergence of the MCDPF to the optimal filter; the definition of strong convergence is provided at the beginning of that section. Furthermore, we compare the performance of the MCDPF in


a practical situation, a range-only tracking problem using a flocking model, with the Distributed Extended Kalman Filter (DEKF) in section 3.6. Conclusions and future work are discussed in section 3.7.

3.1 Introduction

In the Bayesian filtering problem, there have been many efforts to approximate the posterior distribution. Popular methods developed in the 1960s and 70s include the extended Kalman filter [3] and the sequential Monte Carlo method [1, 23, 22, 54]. The very first introduction of the sequential Monte Carlo method, also known as particle filtering, indeed goes back to early work on simulating growing polymers [21, 44]. Particle filtering was not broadly adopted at first, mainly because of its very high computational complexity and the lack of adequate computing resources at that time [7]. Along with the huge development in computing power, particle filtering has become a very active research topic and has been applied to various areas, with many communities starting to take advantage of particle filtering following a seminal paper [17]. Various modifications of the standard particle filter to improve its performance are introduced in the tutorial paper [31] in a clear manner. Stratified sampling, residual sampling [30] and systematic resampling [27] were proposed as efficient resampling schemes. Pitt and Shephard [40] introduced the Auxiliary Sampling Importance Resampling (ASIR) filter for better estimation, and the regularized particle filter (RPF) was proposed in [33] to solve the sample impoverishment problem induced by the resampling step.

Distributed Particle Filters (DPF) have been emerging as an efficient tool for state estimation, for instance in target tracking with a robotic navigation system [45, 18]. The general benefits of distributed estimation include the robustness of the estimation, the reduction of the amount of information flow, and estimation results comparable to the centralized approach. Much effort has been directed toward the realization of decentralized Kalman filtering [42, 35, 36], but decentralized particle filters were thought to be challenging due to the difficulty of merging probability distributions represented by particles and weights [38]. The currently existing distributed particle

filtering methods, however, are not able to gain all of these advantages, or turn out to benefit from these properties only for relatively low-dimensional systems by introducing an assumption such as a Gaussian Mixture Model (GMM). The methods developed so far try to avoid exchanging the raw data, namely particles and associated weights, mainly due to the large amount of information that this implies. The communication of such raw data scales better with system dimension, however, than the existing methods do. The distributed particle filter proposed in [29] exchanges particles and weights only between nearest neighbor nodes and estimates the true state by assimilating data with an algorithm based on a Markov chain random walk.

Past work on distributed particle filters can be broadly categorized into two approaches, namely message passing approaches and consensus-based local information exchange methods. Message passing approaches transfer information along a predetermined route covering an entire network. For example, [4] passes parameters of a parametric model of the posterior distribution while [48] transmits the raw information, particles and weights, or the parameters of a GMM approximation of the posterior distribution. Consensus-based methods communicate information only between the nearest nodes and achieve global consistency by consensus filtering. The type of exchanged information can be, for example, the parameters of a GMM approximation [18] or the local mean and covariance [19].

The message passing approaches [4, 48] can have reduced robustness because the distributed algorithms cannot themselves cope with the failure of even one node, since the system uses fixed message paths. Furthermore, the assumption of synchronization with identical particles at every node can cause fragility. On the other hand, the consensus-based approaches so far proposed [18, 19] all use GMM approximations of the posterior distribution in order to reduce information flow. The amount of reduced information, however, is not significant compared with transmitting full particles when the dimension of the system is very large, due to the covariance matrix of the posterior distribution. For an $n$-dimensional system, a consensus-based DPF with a GMM approximation has to transmit $O(cn^2 |E|)$ data through the entire network per consensus step, where $c$ is the number of Gaussian mixture components and $|E|$ is the number of network edges. If the DPF is realized by exchanging $N$ particles, however, then the

amount of information per Markov chain iteration is $O(nmN)$, where $m$ is the number of nodes. A detailed description of the Markov chain iteration is given below. For a system with $cn|E| \gg mN$, a GMM approximation no longer benefits from the effect of reduced information flow. Furthermore, it frequently happens that the posterior distribution is not well approximated by a combination of a small number of Gaussian distributions. To mention just a few, a fault detection problem [13] and estimating the indoor environment of a building system [26] are examples of a non-Gaussian posterior distribution and high-dimensional system estimation, respectively.

In this chapter we briefly review a distributed particle filter based on exchanging particles and associated weights according to a Markov chain random walk, the MCDPF, and prove the strong convergence of the MCDPF to the optimal filter together with its rate. The MCDPF maintains the robustness of a distributed system, since each node only needs local information, and it scales well in the case of non-Gaussian and high-dimensional systems. The convergence of particle filtering is shown generally in the probability literature [8, 6], and [5] provides an excellent survey of standard particle filtering and proofs of its convergence. The convergence result shown in this chapter is based on a minor modification of the induction in time given in [5], but the effect of the Markov chain iteration step is additionally considered in the MCDPF setting, and hence the convergence rate with respect to the Markov chain properties, namely the iteration step and the spectral gap, is established.

3.2 Random walks on a graph

A sensor network system can be modeled as a graph $G = (V, E)$ with a normalized adjacency matrix $\mathcal{A}$. The vertices $V = \{1, \ldots, m\}$ correspond to nodes or sensors in the network system and the edges $E$ represent the connections between sensors. The neighbor set of node $i$ is defined as $N_i = \{ j \in V : a_{ij} \neq 0 \}$. The matrix $\mathcal{A}$ is a Markov transition probability matrix defined on the graph because it satisfies

\mathcal{A} \ge 0, \qquad \mathcal{A} \mathbf{1} = \mathbf{1}.

Consequently a random walk on the network system can be defined according to the matrix $\mathcal{A}$, and we assume there are no self-loops in the chain. Here we review several properties of random walks on graphs which are useful for the development of the DPF.

Theorem 4. If $\mathcal{A}$ is the normalized adjacency matrix of an undirected connected graph $G$, then the Markov chain defined by $\mathcal{A}$ has a unique stationary distribution $\Pi$ with $\Pi_i > 0$ for all $i$. For any starting distribution,

\lim_{k \to \infty} \frac{M(\cdot, k)}{k} = \Pi,   (3.1)

where $M(\cdot, k) \in \mathbb{R}^m$ is a vector whose $i$-th element is the number of visits to state $i$ during $k$ steps. Furthermore, $M(\cdot, k)/k$ converges to $\Pi$ in distribution as $k \to \infty$:

\sqrt{k} \left( \frac{M(\cdot, k)}{k} - \Pi \right) \xrightarrow{d} \mathcal{N}(0, V).   (3.2)

Proof. See [43, Theorem 42.VII].

Theorem 5. If $\mathcal{A}$ is the normalized adjacency matrix of an undirected graph $G$, then the stationary distribution of the Markov chain defined by $\mathcal{A}$ is given by $\Pi = (\Pi_1, \Pi_2, \ldots, \Pi_m)$, where $\Pi_i = \frac{d(i)}{2|E(G)|}$. Here $d(i)$ is the degree of node $i$ and $|E(G)|$ is the number of edges of the graph.

Proof. We compute

(\Pi \mathcal{A})_j = \sum_{i=1}^{m} \Pi_i \mathcal{A}_{ij} = \sum_{i=1}^{m} \frac{d(i)}{2|E|} \frac{|E_{ij}|}{d(i)} = \Pi_j,   (3.3)

where $|E_{ij}|$ is the number of edges connecting nodes $i$ and $j$ and $\sum_{i=1}^{m} |E_{ij}| = d(j)$. Since $\Pi$ satisfies $\Pi \mathcal{A} = \Pi$, $\Pi$ is the stationary distribution of the Markov chain defined by $\mathcal{A}$.

Corollary 6. If $G$ is a $d$-regular connected graph, then the stationary distribution of the Markov chain defined by the normalized adjacency matrix of $G$ is the uniform distribution $\Pi = (\frac{1}{m}, \frac{1}{m}, \ldots, \frac{1}{m})$, where $m$ is the number of nodes.
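The degree formula of theorem 5 can also be verified numerically. The sketch below (numpy only; the small example graph is hypothetical) builds the normalized adjacency matrix of an undirected graph and compares the left eigenvector of $\mathcal{A}$ for eigenvalue 1 against $d(i)/(2|E(G)|)$.

import numpy as np

# Adjacency matrix of a small undirected connected graph (4 nodes, 4 edges).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)                  # node degrees d(i)
P = A / d[:, None]                 # normalized adjacency matrix (row-stochastic)

# Stationary distribution predicted by theorem 5: Pi_i = d(i) / (2|E(G)|).
Pi_theory = d / d.sum()            # d.sum() equals 2|E(G)| for an undirected graph

# Stationary distribution from the left eigenvector of P with eigenvalue 1.
w, v = np.linalg.eig(P.T)
Pi = np.real(v[:, np.argmin(np.abs(w - 1))])
Pi /= Pi.sum()

print(Pi_theory, Pi)               # the two vectors should coincide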

3.3 Particle filters

Suppose we have the general state space model,

x_{t+1} = f(x_t, w_t),   (3.4)

y_t = g(x_t, v_t),   (3.5)

where $x_t \in \mathbb{R}^n$, $y_t \in \mathbb{R}^p$, and $w_t, v_t$ are the process and measurement noises, respectively. We define two stochastic processes, $X = \{X_t, t \in \mathbb{N}\}$ and $Y = \{Y_t, t \in \mathbb{N}\}$, where $X$ is the signal process and $Y$ the observation process. The signal process $X$ is Markovian with initial distribution $\mu(x_0)$ and transition kernel $K(dx_t \mid x_{t-1})$, and the observation process $Y$ is conditionally independent given $X$. For simplicity, we assume that the kernel and the distribution of $Y$ admit densities with respect to Lebesgue measure:

\Pr(X_t \in A \mid X_{t-1} = x_{t-1}) = \int_A K(x_t \mid x_{t-1}) \, dx_t,   (3.6)

\Pr(Y_t \in B \mid X_t = x_t) = \int_B \rho(y_t \mid x_t) \, dy_t,   (3.7)

where $\rho(y_t \mid x_t)$ is the transition probability density of a measurement $y_t$ given the state $x_t$.

The filtering problem is to estimate the true state $x_t$ at time $t$ given the history of observations $y_{1:t}$. The prediction and update steps of the optimal filter based on the Bayes recursion are given as follows:

p(x_t \mid y_{1:t-1}) = \int_{\mathbb{R}^n} p(x_{t-1} \mid y_{1:t-1}) \, K(x_t \mid x_{t-1}) \, dx_{t-1},   (3.8)

p(x_t \mid y_{1:t}) = \frac{\rho(y_t \mid x_t) \, p(x_t \mid y_{1:t-1})}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t) \, p(x_t \mid y_{1:t-1}) \, dx_t}.   (3.9)

Analytic solutions for the posterior distribution in (3.9) do not generally exist, except in special cases such as linear dynamical systems with Gaussian noise. In the particle filtering setting, the posterior distribution is represented by a group of particles and associated weights, so that the integral in (3.9) is approximated by a sum of discrete

values.

3.3.1 Centralized particle filters

Particle filtering is a recursive method to estimate the true state, given the history of measurements [5, 8]. Suppose the posterior distribution at time $t-1$, $\pi_{t-1|t-1}(dx_{t-1})$, is approximated by $N$ particles $\{x_{t-1}^i\}_{i=1}^N$. Then we have

p(x_{t-1} \mid y_{1:t-1}) \triangleq \pi_{t-1|t-1}(dx_{t-1}) \approx \pi_{t-1|t-1}^N(dx_{t-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_{t-1}^i}(dx_{t-1}),   (3.10)

where particle $i$ is at position $x_{t-1}^i$ in state space. Now, the particles go through the prediction and measurement update steps to approximate the posterior distribution at time $t$. Given $N$ particles, new particles are sampled from the transition kernel density, $\tilde{x}_t^i \sim \pi_{t-1|t-1}^N K(dx_t) = \frac{1}{N} \sum_{i=1}^{N} K(x_t \mid x_{t-1}^i)$. This set of particles is the approximation of $\pi_{t|t-1}$,

p(x_t \mid y_{1:t-1}) \triangleq \pi_{t|t-1}(dx_t) \approx \tilde{\pi}_{t|t-1}^N(dx_t) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\tilde{x}_t^i}(d\tilde{x}_t).   (3.11)

If the empirical distribution in (3.11) is substituted into (3.9), we have the following distribution approximating the posterior distribution $p(x_t \mid y_{1:t})$:

\tilde{\pi}_{t|t}^N(dx_t) \triangleq \frac{\rho(y_t \mid x_t) \, \tilde{\pi}_{t|t-1}^N(dx_t)}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t) \, \tilde{\pi}_{t|t-1}^N(dx_t) \, dx_t} = \frac{\sum_{i=1}^{N} \rho(y_t \mid \tilde{x}_t^i) \, \delta_{\tilde{x}_t^i}(d\tilde{x}_t)}{\sum_{i=1}^{N} \rho(y_t \mid \tilde{x}_t^i)}   (3.12)

= \sum_{i=1}^{N} w_t^i \, \delta_{\tilde{x}_t^i}(d\tilde{x}_t),   (3.13)

where $\sum_{i=1}^{N} w_t^i = 1$ and the $w_t^i$ are called the importance weights. To avoid the degeneracy problem, particles are selected according to a resampling step that samples $N$ particles from the empirical distribution $\tilde{\pi}_{t|t}^N(dx_t)$ and resets the weights to $\frac{1}{N}$. We then have

the empirical distribution approximating the posterior at time t given by

\pi_{t|t}^N(dx_t) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_t^i}(dx_t).   (3.14)
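To make the recursion (3.10)-(3.14) concrete, the following minimal Python sketch implements one bootstrap particle filter step (predict, weight, resample) for a hypothetical scalar model; the helper names f_kernel and likelihood stand in for the transition kernel $K$ and the density $\rho$ and are not the models used later in this chapter.

import numpy as np

rng = np.random.default_rng(0)

def cpf_step(particles, y, f_kernel, likelihood):
    """One centralized particle filter step: predict, weight, resample."""
    N = len(particles)
    # Prediction: sample each particle from the transition kernel, cf. (3.11).
    pred = f_kernel(particles)
    # Measurement update: importance weights from the likelihood, cf. (3.12)-(3.13).
    w = likelihood(y, pred)
    w /= w.sum()
    # Resampling: draw N particles from the weighted empirical measure, cf. (3.14).
    idx = rng.choice(N, size=N, p=w)
    return pred[idx]

# Hypothetical scalar model: x_t = 0.9 x_{t-1} + w_t, y_t = x_t + v_t.
f_kernel = lambda x: 0.9 * x + rng.normal(0.0, 0.5, size=len(x))
likelihood = lambda y, x: np.exp(-0.5 * (y - x) ** 2)

particles = rng.normal(0.0, 1.0, size=1000)
particles = cpf_step(particles, y=1.2, f_kernel=f_kernel, likelihood=likelihood)
print(particles.mean())  # posterior mean estimate after one step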

3.3.2 The Markov Chain Distributed Particle Filter (MCDPF)

The main difference between the CPF and a DPF is that the CPF has a central unit that collects all the measurements from all nodes and updates particles using all measurements simultaneously. During the process of data collection, the CPF might suffer from bottlenecks in information flow. On the other hand, a DPF can overcome this problem by passing information only locally between connected nodes. If we have $m$ nodes measuring partial observations independently, then we can decompose the general state space model (3.4)-(3.5) as follows.

x_{t+1} = f(x_t, w_t),   (3.15)

\begin{pmatrix} y_{1,t} \\ y_{2,t} \\ \vdots \\ y_{m,t} \end{pmatrix} = \begin{pmatrix} g_1(x_t, v_{1,t}) \\ g_2(x_t, v_{2,t}) \\ \vdots \\ g_m(x_t, v_{m,t}) \end{pmatrix}.   (3.16)

Here $x_t \in \mathbb{R}^n$, $y_{i,t} \in \mathbb{R}^{p_i}$ with $\sum_{i=1}^{m} p_i = p$, and the subscript $i$ denotes node $i$. In addition, the measurement noise at each node is assumed to be uncorrelated, $E[v_t v_t^T] = \mathrm{diag}(R_1, R_2, \ldots, R_m)$. The uncorrelated noise structure makes the measurements $y_{i,t}$ at each node conditionally independent given the true state $x_t$. As a consequence of this assumption, the likelihood $\rho(y_t \mid x_t^i)$ in (3.9) can be factorized into a product of the per-node likelihoods $\rho_j(y_{j,t} \mid x_t^i)$,

\rho(y_t \mid x_t^i) = \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t^i).   (3.17)

We propose a distributed particle filtering method using a random walk on the graph defined by the network topology. In the sensor network, node $i$ measures the

partial observation $y_{i,t}$ at time $t$, and the data at every node have to be fused to reach a global estimate of the true state. While achieving a global estimate by exchanging data, it is desirable to maintain robustness with respect to unexpected changes of global properties, such as losing a node. The DPF proposed here is robust since the information, consisting of particles and weights, is transferred only to the connected neighborhood of each node. In other words, every node only needs local information. Transferring particle data is inefficient for low-dimensional systems, but scales well (only linearly) with the dimension, as opposed to existing methods using GMM approximations of the posterior distribution [19, 18, 48]. As briefly explained in section 3.1, communicating raw data is more efficient in terms of bandwidth capacity for relatively high-order systems.

The MCDPF moves particles around the network according to the Markov chain defined by the normalized adjacency matrix $\mathcal{A}$ in order to compute the importance weights. The main idea is that each particle accumulates a factor of $\rho_i(y_{i,t} \mid x_t)$ raised to a power proportional to the expected number of visits to node $i$. Suppose we have the graph $G = (V, E)$ based on the sensor network and the normalized adjacency matrix $\mathcal{A}$. In the MCDPF setting, the Markov chain is run for $k$ steps on every particle after the prediction step, and the number of visits to the $i$-th node is denoted by $M(i, k)$. Accounting for the number of visits to each node, each particle multiplies its previous weight by $\rho_i(y_{i,t} \mid x_t)^{\frac{2|E(G)|}{k d(i)}}$ every time it visits the $i$-th node. If we have $N$ particles after $k$ Markov chain steps at a node, then the posterior distribution of the MCDPF is given as follows.

\tilde{\pi}_{t|t,k}^N(dx_t) = \frac{\sum_{i=1}^{N} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid \tilde{x}_t^i)^{\frac{2|E(G)|}{k d(j)} M(j,k)} \, \delta_{\tilde{x}_t^i}(d\tilde{x}_t)}{\sum_{i=1}^{N} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid \tilde{x}_t^i)^{\frac{2|E(G)|}{k d(j)} M(j,k)}} = \sum_{i=1}^{N} w_{t,k}^i \, \delta_{\tilde{x}_t^i}(d\tilde{x}_t).   (3.18)

The MCDPF is defined in Algorithm 1 below. We use the notation $x_{j,t}^i$ for the $i$-th particle of node $j$ at time $t$ and $N(j)$ for the number of particles at node $j$. Also, $I_{l \to j}$ denotes the indices of particles moving from node $l$ to $j$ in the current Markov chain step, and we recall that $\mathcal{A}$ is the normalized adjacency matrix of the network.

Algorithm 1 Markov Chain Distributed Particle Filter (MCDPF)

Initialization:
  $\{x_{j,0}^i\}_{i=1}^N \sim p(x_0)$, $\{w_{j,0}^i\}_{i=1}^N = \frac{1}{N}$ for $j = 1, \ldots, m$
Importance Sampling: for $j = 1, \ldots, m$,
  $\{\tilde{x}_{j,t}^i\}_{i=1}^{N(j)} \sim p(x_t \mid \{x_{j,t-1}^i\}_{i=1}^{N(j)})$, $\{\tilde{w}_{j,t}^i\}_{i=1}^{N(j)} = 1$
for $k$ iterations do
  Move $\{\tilde{x}_{\cdot,t}^i\}$, $\{\tilde{w}_{\cdot,t}^i\}$ according to the matrix $\mathcal{A}$
  for $j = 1$ to $m$ do
    $\{\tilde{x}_{j,t}^i\}_{i=1}^{N(j)} = \bigcup_{l \in N_j} \{\tilde{x}_{l,t}^i\}_{i \in I_{l \to j}}$
    $\{\tilde{w}_{j,t}^i\}_{i=1}^{N(j)} = \bigcup_{l \in N_j} \{\tilde{w}_{l,t}^i\}_{i \in I_{l \to j}}$
    $\{\tilde{w}_{j,t}^i\}_{i=1}^{N(j)} \leftarrow \{\tilde{w}_{j,t}^i\}_{i=1}^{N(j)} \times \rho_j(y_{j,t} \mid \{\tilde{x}_{j,t}^i\}_{i=1}^{N(j)})^{\frac{2|E(G)|}{k d(j)}}$
  end for
end for
Resample: for $j = 1, \ldots, m$,
  Resample $\{x_{j,t}^i\}_{i=1}^{N(j)}$ according to $\{\tilde{w}_{j,t}^i\}_{i=1}^{N(j)}$ and set the weights $\{w_{j,t}^i\}_{i=1}^{N(j)} = \frac{1}{N(j)}$
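A minimal Python sketch of the weight-update loop of Algorithm 1 is given below, under simplifying assumptions: communication is simulated in shared memory rather than over a real network, each particle starts at a uniformly random node, and rho is a hypothetical Gaussian per-node likelihood. The sketch illustrates the exponent $\frac{2|E(G)|}{k d(j)}$ accumulated per visit, not a full distributed implementation.

import numpy as np

rng = np.random.default_rng(0)

def mcdpf_weights(x, y, A_hat, d, num_edges, k, rho):
    """Weight update of Algorithm 1: each particle random-walks k steps over the
    m nodes, multiplying in rho_j(y_j | x)^(2|E(G)|/(k d(j))) at every visit."""
    m = A_hat.shape[0]
    N = len(x)
    w = np.ones(N)
    node = rng.integers(0, m, size=N)       # initial node of each particle
    for _ in range(k):
        for i in range(N):
            # Move particle i to a neighboring node (one Markov chain step).
            node[i] = rng.choice(m, p=A_hat[node[i]])
            j = node[i]
            w[i] *= rho(j, y[j], x[i]) ** (2.0 * num_edges / (k * d[j]))
    return w / w.sum()

# Hypothetical 3-node path graph and Gaussian per-node likelihoods.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
d = A.sum(axis=1); A_hat = A / d[:, None]; num_edges = A.sum() / 2
rho = lambda j, yj, xi: np.exp(-0.5 * (yj - xi) ** 2)

x = rng.normal(0.0, 1.0, size=500)          # predicted particles at time t
y = np.array([0.8, 1.1, 0.9])               # partial observations y_{j,t}
w = mcdpf_weights(x, y, A_hat, d, num_edges, k=20, rho=rho)
print(np.sum(w * x))                        # MCDPF posterior mean estimate

As $k \to \infty$, $M(j,k)/k \to d(j)/(2|E(G)|)$ by theorem 5, so each accumulated exponent tends to 1 and the weights approach the centralized likelihood product (3.17).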

3.3.3 Convergence to CPF and algorithm

We will show that the empirical posterior distribution of the MCDPF converges weakly to that of the CPF as the number of Markov chain steps $k$ per measurement goes to infinity. The notation used throughout the proof mainly follows that of

[5]. In the stochastic filtering problem, the functions $a_t$ and $b_t$ are defined from a metric space $(E, d)$ to itself and are considered as continuous maps $\pi_{t|t-1} \mapsto \pi_{t|t}$ and $\pi_{t-1|t-1} \mapsto \pi_{t|t-1}$, respectively. Additionally, $a_t^k$ is a continuous function mapping $\pi_{t|t-1} \mapsto \pi_{t|t,k}$, given by

a_t^k(p(x_t \mid y_{1:t-1})) = \frac{\prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k d(j)} M(j,k)} \, p(x_t \mid y_{1:t-1})}{\int_{\mathbb{R}^n} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k d(j)} M(j,k)} \, p(x_t \mid y_{1:t-1}) \, dx_t}.   (3.19)

The perturbation $c^N$ is defined as a function that maps a measure $\nu$ to a random sample of size $N$ from the measure, so that

c^{N,w}(\nu) = \frac{1}{N} \sum_{j=1}^{N} \delta_{V_j(w)},   (3.20)

where the $V_j : \Omega \to \mathbb{R}^n$ are IID random variables with distribution $\nu$ and $w \in \Omega$. For notational simplicity, let $h_t^N$, $h_{1:t}^N$ be defined as compositions of the functions $a_t$, $b_t$, $c^N$ as follows:

h_t^N \triangleq c^N \circ a_t \circ c^N \circ b_t, \qquad h_{1:t}^N \triangleq h_t^N \circ \cdots \circ h_1^N,   (3.21)

h_{t,k}^N \triangleq c^N \circ a_t^k \circ c^N \circ b_t, \qquad h_{1:t,k}^N \triangleq h_{t,k}^N \circ \cdots \circ h_{1,k}^N.   (3.22)

Thus the posterior distributions of the CPF and the MCDPF at time $t$ can be expressed as

\pi_{t|t}^N = h_t^N(\pi_{t-1|t-1}^N) = h_{1:t}^N(\pi_0),   (3.23)

\pi_{t|t,k}^N = h_{t,k}^N(\pi_{t-1|t-1,k}^N) = h_{1:t,k}^N(\pi_0).   (3.24)

To prove $\lim_{k \to \infty} \pi_{t|t,k}^N = \pi_{t|t}^N$, several lemmas are reviewed here.

Lemma 7. Let $(E, d)$ be a metric space with functions $a_t^k, a_t, b_t : E \to E$ such that $\lim_{k \to \infty} a_t^k = a_t$ pointwise for each $t$. Then

\lim_{k \to \infty} h_{1:t,k}^N = h_{1:t}^N   (3.25)

pointwise for each $t$ and $N$.

Proof. For $e \in E$ and arbitrary $t$, we have $c^N(b_t(e)) \in E$. Since we assumed pointwise convergence of $a_t^k$ to $a_t$, for all $\varepsilon > 0$ there exists $k(e, \varepsilon)$ such that for $k > k(e, \varepsilon)$,

\| a_t^k(c^N(b_t(e))) - a_t(c^N(b_t(e))) \| < \varepsilon,   (3.26)

where $\|\cdot\|$ is the supremum norm on functions from $(E, d)$ to itself. Equivalently, $\lim_{k \to \infty} h_{t,k}^N = h_t^N$ pointwise for all $t$. By induction over $t$ we have (3.25).

Lemma 8. For the MCDPF and CPF as defined above,

\lim_{k \to \infty} a_t^k = a_t   (3.27)

pointwise for all $t$.

Proof. For any $e \in E$,

\lim_{k \to \infty} \| a_t^k(e) - a_t(e) \|   (3.28)

= \lim_{k \to \infty} \left\| \frac{\prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k d(j)} M(j,k)} \, e(x_t)}{\int_{\mathbb{R}^n} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k d(j)} M(j,k)} \, e(dx_t)} - \frac{\rho(y_t \mid x_t) \, e(x_t)}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t) \, e(dx_t)} \right\|   (3.29)

= \left\| \frac{\prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t) \, e(x_t)}{\int_{\mathbb{R}^n} \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t) \, e(dx_t)} - \frac{\rho(y_t \mid x_t) \, e(x_t)}{\int_{\mathbb{R}^n} \rho(y_t \mid x_t) \, e(dx_t)} \right\| = 0.   (3.30)

The first equality is due to theorem 4 and the second equality comes from the conditional independence of the measurements at each node.

Theorem 9. Consider a connected sensor network with measurements at different nodes conditionally independent given the true state. Then the estimated distribution of the MCDPF in Algorithm 1 converges weakly to the estimated distribution of the CPF as the number of Markov chain steps $k$ per measurement goes to infinity. That is,

\lim_{k \to \infty} \pi_{t|t,k}^N = \pi_{t|t}^N   (3.31)

pointwise.

Proof. Combining lemmas 7 and 8 with the optimal Bayesian filtering functions $a_t$, $b_t$ and $a_t^k$ gives (3.31).

3.3.4 Convergence to optimal filtering

So far we have shown that the MCDPF converges weakly to the CPF. The next step is to prove the convergence of the MCDPF to the optimal filtering distribution as $N \to \infty$ as well as $k \to \infty$. The convergence of the classical particle filter to the optimal filtering distribution is shown in [5, 24]. The only difference between the MCDPF and the standard particle filter used in the proof is the function $a_t^k$, which needs to satisfy the following condition to ensure the convergence of the MCDPF to the optimal filter. For all

sequences $e_N \to e \in E$ we have

\lim_{N \to \infty} \lim_{k \to \infty} a_t^k(e_N) = a_t(e).   (3.32)

This property is not obvious because the function $a_t^k$ converges only pointwise to the function $a_t$. Fortunately, however, $a_t$ is continuous, which is sufficient to give (3.32), as we see from the following lemma.

Lemma 10. Suppose $(E, d)$ is a metric space and $a^k, a : E \to E$ are continuous functions such that $a^k$ converges pointwise to $a$ as $k \to \infty$. For a convergent sequence $\lim_{N \to \infty} e_N = e \in E$, we have

\lim_{N \to \infty} \lim_{k \to \infty} a^k(e_N) = \lim_{k \to \infty} \lim_{N \to \infty} a^k(e_N) = a(e).   (3.33)

Proof. For $a^k \to a$ pointwise and $e_N \to e \in E$, we have

\lim_{k \to \infty} a^k(e_N) = a(e_N) \quad \text{for all } N   (3.34)

\Rightarrow \lim_{N \to \infty} \lim_{k \to \infty} a^k(e_N) = \lim_{N \to \infty} a(e_N) = a(e),   (3.35)

since $a$ is continuous. Conversely, continuity of $a^k$ gives

\lim_{N \to \infty} a^k(e_N) = a^k(e) \quad \text{for all } k   (3.36)

\Rightarrow \lim_{k \to \infty} \lim_{N \to \infty} a^k(e_N) = \lim_{k \to \infty} a^k(e) = a(e).

Lemma 11. Consider $a_t$, $b_t$ from (3.9), $a_t^k$ from (3.19), and $c^N$ from (3.20). Assume that the sequence $a_t^k$ satisfies the property (3.32) for each $t$. Then we have

\lim_{N \to \infty} \lim_{k \to \infty} h_{1:t,k}^N = h_{1:t}.   (3.37)

Moreover, for all sequences $e_N \to e \in E$ we have

\lim_{N \to \infty} \lim_{k \to \infty} h_{1:t,k}^N(e_N) = h_{1:t}(e).   (3.38)

Proof. From [5, Lemma 2], for all $e_N \to e \in E$, $c^N$ satisfies

\lim_{N \to \infty} c^N(e_N) = e.   (3.39)

Thus, for all $e_N \to e \in E$ and any continuous function $b_t$,

\lim_{N \to \infty} c^N(b_t(e_N)) = b_t(e).   (3.40)

From the property (3.32),

\lim_{N \to \infty} c^N(b_t(e_N)) = b_t(e)   (3.41)

\Rightarrow \lim_{N \to \infty} \lim_{k \to \infty} a_t^k(c^N(b_t(e_N))) = a_t(b_t(e)).   (3.42)

And again from the property (3.39) of $c^N$,

\lim_{N \to \infty} \lim_{k \to \infty} a_t^k(c^N(b_t(e_N))) = a_t(b_t(e))   (3.43)

\Rightarrow \lim_{N \to \infty} \lim_{k \to \infty} c^N(a_t^k(c^N(b_t(e_N)))) = a_t(b_t(e)).   (3.44)

Thus we have $\lim_{N \to \infty} \lim_{k \to \infty} h_{t,k}^N(e_N) = h_t(e)$, and by induction over $t$ we conclude $\lim_{N \to \infty} \lim_{k \to \infty} h_{1:t,k}^N(e_N) = h_{1:t}(e)$.

Recall that a kernel K is said to have the Feller property if Kϕ is a continuous bounded function whenever ϕ is a continuous bounded function. For such kernels we have the following result.

Lemma 12. Suppose $a_t$ and $b_t$ are the functions defined in (3.9). Then $a_t$ is continuous provided the function $\rho(y_t \mid \cdot)$ is bounded, continuous and strictly positive. Furthermore, $b_t$ is a continuous function if the transition kernel $K$ is Feller.

Proof. See [5, Section IV.B.].

Putting all the above lemmas together gives the following main result.

Theorem 13. Assume that the kernel $K$ is Feller and the function $\rho$ is bounded, continuous and strictly positive. Then the estimated distribution of the MCDPF in Algorithm 1 converges to the optimal filtering distribution as the number of particles $N$ and the number of Markov chain steps $k$ per measurement go to infinity:

\lim_{N \to \infty} \lim_{k \to \infty} \pi_{t|t,k}^N = \pi_{t|t}.   (3.45)

Proof. For the initial distribution $\mu_0$, we know that $\lim_{N \to \infty} \mu_0^N = \mu_0$. From lemma 11,

\lim_{N \to \infty} \lim_{k \to \infty} \pi_{t|t,k}^N = \lim_{N \to \infty} \lim_{k \to \infty} h_{1:t,k}^N(\mu_0^N) = h_{1:t}(\mu_0) = \pi_{t|t},   (3.46)

giving the desired result.

3.4 Strong convergence

In the previous section, the weak convergence of the MCDPF to the optimal filtering distribution was proved. Another type of convergence, the strong convergence of the MCDPF, is considered here. We say that a sequence of random probability measures $(\mu_N)_{N=1}^{\infty}$ converges to $\mu$ in the strong sense if for any $\varphi \in B(\mathbb{R}^n)$, where $B(\mathbb{R}^n)$ is the set of bounded Borel measurable functions,

\lim_{N \to \infty} E\left[ ((\mu_N, \varphi) - (\mu, \varphi))^2 \right] = 0,   (3.47)

where we define

(\mu, \varphi) \triangleq \int \varphi \, d\mu, \qquad K\varphi(x) \triangleq \int K(dz \mid x) \, \varphi(z).   (3.48)

3.4.1 Preliminaries

We need a couple of lemmas to prove the strong convergence of the MCDPF. First, the main results on Markov chain central limit theory are reviewed. S. Meyn and R. Tweedie [32] showed that if $(X_i)_{i=0}^{\infty}$ is a Markov chain on a countable state space $\mathcal{X}$ which is irreducible and positive recurrent with stationary distribution $\Pi$, then for $g : \mathcal{X} \to \mathbb{R}$ the random variable $W_k(g)$, defined as

W_k(g) = \frac{1}{\sqrt{k}} \sum_{i=0}^{k-1} \left( g(X_i) - \Pi(g) \right),   (3.49)

converges to a normal random variable with mean 0 and variance

\gamma_g^2 = \Pi(x_0) \, E_{x_0}\left[ \left( \sum_{i=0}^{\tau_{x_0} - 1} [g(X_i) - \Pi(g)] \right)^2 \right],   (3.50)

where $\tau_{x_0}$ is the first return time to the initial state. In addition, [32] showed that for this type of Markov chain there exist a constant $R \ge 1$ and the second largest eigenvalue modulus (SLEM) $\nu$ satisfying $|P_x^i(g) - \Pi(g)| \le R \nu^i$, where $P_x^i$ is the distribution of $X_i$ given $X_0 = x$. We now have the following theorem from [49, Corollary 1] on the convergence of the moment generating function.

Theorem 14. If there is a positive $c$ such that $|g(x) - \Pi(g)| \le c$ for all $x$, then for any $\lambda \le 1/(3\sqrt{3}L \vee L')$, all $k \ge 1$, and all $x \in \mathcal{X}$,

\left| E_x \exp\{\lambda W_k(g)\} - E \exp\{\lambda \gamma_g X\} \right| \le k^{-1/2} V(x) \Big( C' L' \lambda \, e^{(\lambda L')^2} + k^{-1/2} C \, e^{(3\sqrt{3} \lambda L)^2}   (3.51)

+ k^{-1} \frac{C' L' \lambda}{1 - (\lambda L')^2} + k^{-3/2} \frac{C}{1 - (3\sqrt{3} \lambda L)^2} \Big),   (3.52)

where $X$ is a standard normal random variable and $V(x)$ is a function ensuring the V-uniform ergodicity of the Markov chain. Furthermore, the following positive constants $C$, $L$, $C'$ and $L'$ depend on the SLEM $\nu$:

C = \frac{(4e)^2 + 3 + 2(1 - \nu)}{1 - \nu}, \qquad L = e^{\frac{\log 3}{2 \cdot 3^2}} R \sqrt{\frac{2}{1 - \nu}} \left( 3^{5/2} (1 - \nu)^{-1/2} \vee 2\sqrt{e} \right),   (3.53)

C' = \sqrt{\frac{2R}{1 - \nu}}, \qquad L' = \sqrt{\frac{2R}{1 - \nu}} \, e^{\frac{\log 3}{2(2 \cdot 3 + 1)}}.   (3.54)

Proof. The proof of this theorem is given in [49, Section 4, Section 5.1]. Here we show how the constants $L$, $L'$, $C$ and $C'$ are related to the SLEM. The upper bound on the error of the $2n$-th moments given in [49, (42)] reduces to

k^{-1} \frac{n (2n)!}{(n-1)!} \left( \frac{(4e)^2 + 3 + 2(1 - \nu)}{1 - \nu} \right) \left( e^{\frac{\log 3}{2 \cdot 3^2}} R \sqrt{\frac{2}{1 - \nu}} \left( 3^{5/2} (1 - \nu)^{-1/2} \vee 2\sqrt{e} \right) \right)^{2n} \left( 1 + \frac{n!}{k} \right) V(x).   (3.55)

With $C$, $L$ as defined, the error of the even moments is bounded by

\left| E_x W_k(g)^{2n} - (2n - 1)(\gamma_g^2)^n \right| \le k^{-1} C L^{2n} \frac{n (2n)!}{(n-1)!} \left( 1 + \frac{n!}{k} \right) V(x).   (3.56)

Similarly, for the error of the odd moments, the upper bound given in [49, (43)] is written as

k^{-1/2} \frac{(2n+1)!}{n!} \sqrt{\frac{2R}{1 - \nu}} \left( \sqrt{\frac{2R}{1 - \nu}} \, e^{\frac{\log 3}{2(2 \cdot 3 + 1)}} \right)^{2n+1} \left( 1 + \frac{n!}{k} \right) V(x).   (3.57)

Using $C'$, $L'$, the error of the odd moments is bounded by

\left| E_x W_k(g)^{2n+1} \right| \le k^{-1/2} C' L'^{2n+1} \frac{(2n+1)!}{n!} \left( 1 + \frac{n!}{k} \right) V(x).   (3.58)

Hence the error of the moment generating function (3.51)-(3.52) is given by

\left| E_x \exp\{\lambda W_k(g)\} - E \exp\{\lambda \gamma_g X\} \right|   (3.59)

\le V(x) \sum_{n=0}^{\infty} \left[ \frac{\lambda^{2n} n^2}{n!} k^{-1} C L^{2n} \left( 1 + \frac{n!}{k} \right) + \frac{\lambda^{2n+1}}{n!} k^{-1/2} C' L'^{2n+1} \left( 1 + \frac{n!}{k} \right) \right]   (3.60)

\le k^{-1/2} V(x) \left( C' L' \lambda \, e^{(\lambda L')^2} + k^{-1/2} C \, e^{(3\sqrt{3} \lambda L)^2} + k^{-1} \frac{C' L' \lambda}{1 - (\lambda L')^2} + k^{-3/2} \frac{C}{1 - (3\sqrt{3} \lambda L)^2} \right).   (3.61)

The following lemma is a trivial inequality from [24, Lemma 7.2].

Lemma 15. Let $Y$ be a random variable. If the $p$-th moment of $Y$ is finite, $E|Y|^p < \infty$, then for any $p \ge 1$,

E|Y - E(Y)|^p \le 2^p \, E|Y|^p.   (3.62)

Proof. From Jensen's inequality with $p \ge 1$, we have

(E|Y|)^p \le E|Y|^p.   (3.63)

Minkowski's inequality gives

\left( E|Y - E(Y)|^p \right)^{1/p} \le \left( E|Y|^p \right)^{1/p} + \left( |EY|^p \right)^{1/p} = \left( E|Y|^p \right)^{1/p} + |EY|   (3.64)

\le 2 \left( E|Y|^p \right)^{1/p}.   (3.65)

The last inequality is from (3.63).

The equality in the following lemma, used in [5] without proof, is verified here.

Lemma 16. In the particle filter setting, let $\{x_{t-1}^i\}$ be the particles at time $t-1$ and $\mathcal{G}_{t-1}$ be the $\sigma$-algebra generated by $\{x_{t-1}^i\}$. Then for any $\varphi \in B(\mathbb{R}^n)$ we have

E\left[ (\pi_{t|t-1}^N, \varphi) \mid \mathcal{G}_{t-1} \right] = (\pi_{t-1|t-1}^N, K\varphi).   (3.66)

Proof. We have

E\left[ (\pi_{t|t-1}^N, \varphi) \mid \mathcal{G}_{t-1} \right] = E\left[ \frac{1}{N} \sum_{i=1}^{N} \varphi(x_t^i) \,\Big|\, \mathcal{G}_{t-1} \right]   (3.67)

= E\left[ \varphi(x_t) \mid \mathcal{G}_{t-1} \right]   (3.68)

= \int_{\mathbb{R}^n} \varphi(x_t) \, \frac{1}{N} \sum_{j=1}^{N} K(dx_t \mid x_{t-1}^j)   (3.69)

= \frac{1}{N} \sum_{j=1}^{N} \int_{\mathbb{R}^n} K(dx_t \mid x_{t-1}^j) \, \varphi(x_t)   (3.70)

= \frac{1}{N} \sum_{j=1}^{N} K\varphi(x_{t-1}^j)   (3.71)

= (\pi_{t-1|t-1}^N, K\varphi).   (3.72)

The equality in (3.71) follows from the definition in (3.48).

3.4.2 Proof of strong convergence

We start this section by proving an error bound on the difference between the likelihoods assigned to a single particle by the CPF and by the MCDPF. A particle weight in the CPF is assigned with respect to the entire set of measurements, whereas a particle weight in the MCDPF is assigned sequentially as the particle jumps around the nodes according to the Markov chain. Hence understanding how these likelihood-determined weights differ is a fundamental step in the proof.

Lemma 17. Suppose the MCDPF Markov chain is uniformly ergodic, and $\rho_{t,k}$ and $\rho_t$ are the functions defined respectively as

\rho_{t,k} = \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t)^{\frac{2|E(G)|}{k d(j)} M(j,k)}, \qquad \rho_t = \prod_{j=1}^{m} \rho_j(y_{j,t} \mid x_t).   (3.73)

With the constant $c$ defined as

c = \max_j \left| \ln \frac{\rho_j^{\frac{2|E(G)|}{d(j)}}}{\rho_t} \right|,   (3.74)

there exists $k^\star$ such that $\frac{2}{\sqrt{k^\star}} \le \frac{1}{3\sqrt{3}L \vee L'}$, and we have the following upper bound on the expected error between $\rho_{t,k}$ and $\rho_t$ for $k \ge k^\star$:

E\left[ |\rho_{t,k} - \rho_t|^2 \right] \le \rho_t^2 \, \Phi(k, \nu),   (3.75)

where

\Phi(k, \nu) = \frac{1}{k} \left[ 2 C' L' \left( e^{\frac{(2L')^2}{k}} + e^{\frac{L'^2}{k}} \right) + C \left( e^{\frac{(2 \cdot 3\sqrt{3} L)^2}{k}} + 2 e^{\frac{(3\sqrt{3} L)^2}{k}} \right) \right]   (3.76)

+ 2 C' L' \left( \frac{1}{k - (2L')^2} + \frac{1}{k - (L')^2} \right)   (3.77)

+ C \left( \frac{1}{k - (2 \cdot 3\sqrt{3} L)^2} + \frac{2}{k - (3\sqrt{3} L)^2} \right) + e^{\frac{2\gamma_g^2}{k}} - 2 e^{\frac{\gamma_g^2}{2k}} + 1,   (3.78)

where the positive constants $L$, $L'$, $C$ and $C'$ depend on the SLEM of the Markov chain transition matrix as before.

Proof. For the sake of simple notation, the argument of the function $\rho_j$ is omitted. Also let $\mathcal{F}_t$ be the $\sigma$-field generated by the particles $\{x_t^i\}_{i=1}^N$; then

E\left[ |\rho_{t,k} - \rho_t|^2 \mid \mathcal{F}_t = x_t \right] = E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{k d(j)} M(j,k)} - \prod_{j=1}^{m} \rho_j \right)^2 \,\Big|\, \mathcal{F}_t = x_t \right]   (3.79)

= \prod_{j=1}^{m} \rho_j^2 \; E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{k d(j)} M(j,k) - 1} - 1 \right)^2 \,\Big|\, \mathcal{F}_t = x_t \right]   (3.80)

= \rho_t^2 \, E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{d(j)} \left( \frac{M(j,k)}{k} - \frac{d(j)}{2|E(G)|} \right)} - 1 \right)^2 \,\Big|\, \mathcal{F}_t = x_t \right]   (3.81)

= \rho_t^2 \, E\left[ \left( \prod_{j=1}^{m} \rho_j^{\frac{2|E(G)|}{d(j)} Z_{k,j}} - 1 \right)^2 \,\Big|\, \mathcal{F}_t = x_t \right]   (3.82)

= \rho_t^2 \, E\left[ \left( e^{\sum_{j=1}^{m} C_j Z_{k,j}} - 1 \right)^2 \,\Big|\, \mathcal{F}_t = x_t \right],   (3.83)

where $C_j = \frac{2|E(G)|}{d(j)} \ln \rho_j$ and $Z_{k,\cdot} = \frac{M(\cdot,k)}{k} - \frac{d(\cdot)}{2|E(G)|}$. Let $(X_i)_{i=0}^{\infty}$ be the Markov chain on the state space $\mathcal{X}$ and let the function $g : \mathcal{X} \to \mathbb{R}$ be defined as $g(X_i) = I(X_i = j) C_j$ for $j = 1, \ldots, m$. Now we have

\sum_{j=1}^{m} C_j Z_{k,j} = \sum_{j=1}^{m} C_j \frac{M(j,k)}{k} - \sum_{j=1}^{m} C_j \frac{d(j)}{2|E(G)|}   (3.84)

= \sum_{j=1}^{m} C_j \frac{M(j,k)}{k} - \sum_{j=1}^{m} \ln \rho_j   (3.85)

= \frac{1}{k} \sum_{i=0}^{k-1} g(X_i) - \ln \rho_t   (3.86)

= \frac{1}{k} \sum_{i=0}^{k-1} g(X_i) - \Pi(g).   (3.87)

Defining $W_k(g) = \sqrt{k} \sum_{j=1}^{m} C_j Z_{k,j}$, the Markov chain central limit theorem gives the convergence of $W_k(g)$ to the normal random variable with mean 0 and variance $\gamma_g^2$ defined in (3.50). With the definition of $W_k(g)$ and a standard normal random variable $X$, the following expectation over $W_k(g)$ is

E\left[ \left( e^{\frac{1}{\sqrt{k}} W_k(g)} - 1 \right)^2 \right] = E\left[ e^{\frac{2}{\sqrt{k}} W_k(g)} - 2 e^{\frac{1}{\sqrt{k}} W_k(g)} + 1 \right]   (3.88)

\le \left| E\left[ e^{\frac{2}{\sqrt{k}} W_k(g)} \right] - E\left[ e^{\frac{2}{\sqrt{k}} \gamma_g X} \right] \right| + 2 \left| E\left[ e^{\frac{1}{\sqrt{k}} W_k(g)} \right] - E\left[ e^{\frac{1}{\sqrt{k}} \gamma_g X} \right] \right| + \left| E\left[ e^{\frac{2}{\sqrt{k}} \gamma_g X} \right] - 2 E\left[ e^{\frac{1}{\sqrt{k}} \gamma_g X} \right] + 1 \right|   (3.89)

= \left| E\left[ e^{\frac{2}{\sqrt{k}} W_k(g)} \right] - E\left[ e^{\frac{2}{\sqrt{k}} \gamma_g X} \right] \right| + 2 \left| E\left[ e^{\frac{1}{\sqrt{k}} W_k(g)} \right] - E\left[ e^{\frac{1}{\sqrt{k}} \gamma_g X} \right] \right| + e^{\frac{2\gamma_g^2}{k}} - 2 e^{\frac{\gamma_g^2}{2k}} + 1.   (3.90)

Since $\frac{1}{\sqrt{k}}$ is a decreasing sequence for $k = 1, 2, \ldots$, there exists $k^\star$ such that $\frac{2}{\sqrt{k^\star}} \le \frac{1}{3\sqrt{3}L \vee L'}$ from theorem 14, and the constant $c$ defined in the lemma satisfies the condition $|g(x) - \Pi(g)| = |C_j - \ln \rho_t| \le c$. Therefore, for $k \ge k^\star$,

E\left[ \left( e^{\frac{1}{\sqrt{k}} W_k(g)} - 1 \right)^2 \,\Big|\, \mathcal{F}_t = x_t \right] \le \frac{1}{k} \left[ 2 C' L' \left( e^{\frac{(2L')^2}{k}} + e^{\frac{L'^2}{k}} \right) + C \left( e^{\frac{(2 \cdot 3\sqrt{3} L)^2}{k}} + 2 e^{\frac{(3\sqrt{3} L)^2}{k}} \right) \right]   (3.91)

+ 2 C' L' \left( \frac{1}{k - (2L')^2} + \frac{1}{k - (L')^2} \right) + C \left( \frac{1}{k - (2 \cdot 3\sqrt{3} L)^2} + \frac{2}{k - (3\sqrt{3} L)^2} \right)   (3.92)

+ e^{\frac{2\gamma_g^2}{k}} - 2 e^{\frac{\gamma_g^2}{2k}} + 1.   (3.93)

Multiplying by $\rho_t^2$ gives the desired result.
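The quantity controlled by lemma 17 can be probed numerically. The sketch below (numpy only; the three-node path graph and the fixed per-node likelihood values rho_j are hypothetical) estimates $E|\rho_{t,k} - \rho_t|^2$ by Monte Carlo for increasing $k$, and should display the decay toward zero that the bound $\rho_t^2 \Phi(k, \nu)$ guarantees.

import numpy as np

rng = np.random.default_rng(0)

# Path graph on 3 nodes: normalized adjacency, degrees, number of edges.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
d = A.sum(axis=1); A_hat = A / d[:, None]; E = A.sum() / 2
rho_j = np.array([0.7, 1.3, 0.9])     # fixed per-node likelihood values
rho_t = rho_j.prod()                  # centralized likelihood, cf. (3.73)

def rho_tk(k):
    """One realization of rho_{t,k}: run the walk k steps and accumulate weights."""
    m = len(rho_j)
    M = np.zeros(m)                   # visit counts M(j, k)
    x = rng.integers(0, m)
    for _ in range(k):
        x = rng.choice(m, p=A_hat[x])
        M[x] += 1
    return np.prod(rho_j ** (2.0 * E * M / (k * d)))

for k in [10, 50, 250]:
    err = np.mean([(rho_tk(k) - rho_t) ** 2 for _ in range(1000)])
    print(k, err)                     # mean square error decays with k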

Now the strong convergence of the MCDPF will be proved through each step of the MCDPF, namely the prediction, measurement and resampling steps. This inductive proof is a minor modification of [5, Lemmas 3-5]. The main difference, however, is that the effect of the Markov chain step $k$ has to be considered, and furthermore the mean square error in terms of the SLEM is derived in the main theorem. The following lemma is about the prediction update step.

Lemma 18. Let us assume that for any $\varphi \in B(\mathbb{R}^n)$,

E\left[ \left( (\pi_{t-1|t-1}^N, \varphi) - (\pi_{t-1|t-1}, \varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t-1|t-1}}}{\sqrt{N}} \right)^2;   (3.94)

then we have

E\left[ \left( (\pi_{t|t-1}^N, \varphi) - (\pi_{t|t-1}, \varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^2,   (3.95)

where $c_{t|t-1} = (2 + \sqrt{c_{t-1|t-1}})^2$.

Proof. From Minkowski's inequality,

\left( E\left[ ((\pi_{t|t-1}^N, \varphi) - (\pi_{t|t-1}, \varphi))^2 \right] \right)^{1/2} \le \left( E\left[ ((\pi_{t|t-1}^N, \varphi) - (\pi_{t-1|t-1}^N, K\varphi))^2 \right] \right)^{1/2}   (3.96)

+ \left( E\left[ ((\pi_{t-1|t-1}^N, K\varphi) - (\pi_{t-1|t-1}, K\varphi))^2 \right] \right)^{1/2}.   (3.97)

The first term on the right hand side is bounded above as follows. From lemma 16 and lemma 15,

E\left[ |(\pi_{t|t-1}^N, \varphi) - (\pi_{t-1|t-1}^N, K\varphi)|^2 \mid \mathcal{G}_{t-1} \right] = E\left[ |(\pi_{t|t-1}^N, \varphi) - E[(\pi_{t|t-1}^N, \varphi) \mid \mathcal{G}_{t-1}]|^2 \mid \mathcal{G}_{t-1} \right]   (3.98)

= \frac{1}{N^2} E\left[ \Big| \sum_{i=1}^{N} \varphi(x_t^i) - E[\varphi(x_t^i) \mid \mathcal{G}_{t-1}] \Big|^2 \,\Big|\, \mathcal{G}_{t-1} \right]

= \frac{1}{N^2} \sum_{i=1}^{N} E\left[ \left( \varphi(x_t^i) - E[\varphi(x_t^i) \mid \mathcal{G}_{t-1}] \right)^2 \,\Big|\, \mathcal{G}_{t-1} \right]

\le \frac{4}{N^2} \sum_{i=1}^{N} E\left[ \varphi^2(x_t^i) \mid \mathcal{G}_{t-1} \right]   (3.99)

= \frac{4}{N} E\left[ (\pi_{t|t-1}^N, \varphi^2) \mid \mathcal{G}_{t-1} \right]   (3.100)

= \frac{4}{N} (\pi_{t-1|t-1}^N, K\varphi^2).   (3.101)

Since Markov operators are contractions [28], $\|K\varphi\| \le \|\varphi\|_\infty$, so

E\left[ |(\pi_{t|t-1}^N, \varphi) - (\pi_{t-1|t-1}^N, K\varphi)|^2 \mid \mathcal{G}_{t-1} \right] \le \frac{4}{N} \|\varphi\|_\infty^2.   (3.102)

The upper bound on the second term is given in the assumption of the lemma:

E\left[ |(\pi_{t-1|t-1}^N, K\varphi) - (\pi_{t-1|t-1}, K\varphi)|^2 \right] \le \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t-1|t-1}}}{\sqrt{N}} \right)^2.   (3.103)

And so

\left( E\left[ ((\pi_{t|t-1}^N, \varphi) - (\pi_{t|t-1}, \varphi))^2 \right] \right)^{1/2} \le \|\varphi\|_\infty \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right),   (3.104)

where $\sqrt{c_{t|t-1}} = 2 + \sqrt{c_{t-1|t-1}}$.

Given the result for the prediction step, together with lemma 17, the following lemma gives the error bound after the Markov chain iteration step.

Lemma 19. Let us assume that for any $\varphi \in B(\mathbb{R}^n)$,

E\left[ \left( (\pi_{t|t-1}^N, \varphi) - (\pi_{t|t-1}, \varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \phi_{t-1} \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^2   (3.105)

and

E\left[ |\rho_{t,k} - \rho_t|^2 \right] \le \rho_t^2 \, \Phi(k, \nu).   (3.106)

Then we have

E\left[ \left( (\pi_{t|t-1}^N, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \tilde{\phi}_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^2,   (3.107)

where $\tilde{\phi}_t = \phi_{t-1} + \rho_t$.

Proof. From Minkowski's inequality,

\left( E\left[ ((\pi_{t|t-1}^N, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi))^2 \right] \right)^{1/2} \le \left( E\left[ ((\pi_{t|t-1}^N, \rho_{t,k}\varphi) - (\pi_{t|t-1}^N, \rho_t\varphi))^2 \right] \right)^{1/2}   (3.108)

+ \left( E\left[ ((\pi_{t|t-1}^N, \rho_t\varphi) - (\pi_{t|t-1}, \rho_t\varphi))^2 \right] \right)^{1/2}.   (3.109)

The first term on the right-hand side is bounded as follows:

E\left[ ((\pi_{t|t-1}^N, \rho_{t,k}\varphi) - (\pi_{t|t-1}^N, \rho_t\varphi))^2 \right] = E\left[ \left( \frac{1}{N} \sum_{i=1}^{N} (\rho_{t,k}\varphi(x_t^i) - \rho_t\varphi(x_t^i)) \right)^2 \right]

\le \frac{1}{N} E\left[ \sum_{i=1}^{N} (\rho_{t,k}\varphi(x_t^i) - \rho_t\varphi(x_t^i))^2 \right]   (3.110)

\le \frac{\|\varphi\|_\infty^2}{N} E\left[ \sum_{i=1}^{N} (\rho_{t,k} - \rho_t)^2 \right]   (3.111)

\le \|\varphi\|_\infty^2 \, \rho_t^2 \, \Phi(k, \nu).   (3.112)

The first inequality is a trivial application of Hölder's inequality. And so

\left( E\left[ ((\pi_{t|t-1}^N, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi))^2 \right] \right)^{1/2} \le \|\varphi\|_\infty \left( (\phi_{t-1} + \rho_t) \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right).

The next MCDPF step is a measurement update and the following lemma provides the error bound after the measurement step.

Lemma 20. Let us assume that for any $\varphi \in B(\mathbb{R}^n)$,

E\left[ \left( (\pi_{t|t-1}^N, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho_t\varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \tilde{\phi}_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right)^2;   (3.113)

then we have

E\left[ \left( (\tilde{\pi}_{t|t,k}^N, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{\tilde{c}_{t|t}}}{\sqrt{N}} \right)^2.   (3.114)

Proof. We have

(\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) = \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi^N_{t|t-1}, \rho_{t,k})} - \frac{(\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)} \quad (3.115)

= \left( \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi^N_{t|t-1}, \rho_{t,k})} - \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} \right) + \left( \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} - \frac{(\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)} \right), \quad (3.116)

where

\frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi^N_{t|t-1}, \rho_{t,k})} - \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} = \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi) \left( (\pi_{t|t-1}, \rho) - (\pi^N_{t|t-1}, \rho_{t,k}) \right)}{(\pi^N_{t|t-1}, \rho_{t,k})\, (\pi_{t|t-1}, \rho)} \quad (3.117)

\le \|\varphi\|_\infty \frac{\left| (\pi^N_{t|t-1}, \rho_{t,k}) - (\pi_{t|t-1}, \rho) \right|}{(\pi_{t|t-1}, \rho)}. \quad (3.118)

Also

\frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi)}{(\pi_{t|t-1}, \rho)} - \frac{(\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)} = \frac{(\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho\varphi)}{(\pi_{t|t-1}, \rho)}. \quad (3.119)

Using Minkowski’s inequality again gives

\mathbb{E}\left[ \left( (\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right]^{1/2} \le \frac{\|\varphi\|_\infty}{(\pi_{t|t-1}, \rho)} \mathbb{E}\left[ \left( (\pi^N_{t|t-1}, \rho_{t,k}) - (\pi_{t|t-1}, \rho) \right)^2 \right]^{1/2} \quad (3.120)

+ \frac{1}{(\pi_{t|t-1}, \rho)} \mathbb{E}\left[ \left( (\pi^N_{t|t-1}, \rho_{t,k}\varphi) - (\pi_{t|t-1}, \rho\varphi) \right)^2 \right]^{1/2} \quad (3.121)

\le \frac{2\|\varphi\|_\infty}{(\pi_{t|t-1}, \rho)} \left( \tilde{\phi}_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t-1}}}{\sqrt{N}} \right), \quad (3.122)

where \phi_t = \frac{2\tilde{\phi}_t}{(\pi_{t|t-1}, \rho)} and \tilde{c}_{t|t} = \frac{4 c_{t|t-1}}{(\pi_{t|t-1}, \rho)^2}.

The next lemma provides the error bound after the resampling step, which is the final step of the inductive proof.

Lemma 21. Let us assume that for any \varphi \in B(\mathbb{R}^n),

\mathbb{E}\left[ \left( (\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{\tilde{c}_{t|t}}}{\sqrt{N}} \right)^2, \quad (3.123)

then we have

\mathbb{E}\left[ \left( (\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right] \le \|\varphi\|_\infty^2 \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t}}}{\sqrt{N}} \right)^2. \quad (3.124)

Proof. We have

(\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) = (\pi^N_{t|t,k}, \varphi) - (\tilde{\pi}^N_{t|t,k}, \varphi) + (\tilde{\pi}^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi). \quad (3.125)

If \mathcal{H}_t is the \sigma-algebra generated by the particles \{x^i_t\}_{i=1}^N, then

\mathbb{E}\left[ (\pi^N_{t|t,k}, \varphi) \,\middle|\, \mathcal{H}_t \right] = \frac{1}{N} \sum_{i=1}^N \mathbb{E}\left[ \varphi(x^i_t) \,\middle|\, \mathcal{H}_t \right] = \frac{1}{N} \sum_{i=1}^N w^i_t \varphi(x^i_t) = (\tilde{\pi}^N_{t|t,k}, \varphi). \quad (3.126)

Thus, by the same procedure as from (3.98) to (3.101) in lemma 18, for some constant \bar{C},

\mathbb{E}\left[ \left( (\pi^N_{t|t,k}, \varphi) - (\tilde{\pi}^N_{t|t,k}, \varphi) \right)^2 \,\middle|\, \mathcal{H}_t \right] \le \bar{C} \frac{\|\varphi\|_\infty^2}{N}. \quad (3.127)

Finally we have

\mathbb{E}\left[ \left( (\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right]^{1/2} \le \sqrt{\bar{C}} \frac{\|\varphi\|_\infty}{\sqrt{N}} + \|\varphi\|_\infty \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{\tilde{c}_{t|t}}}{\sqrt{N}} \right), \quad (3.128)

where \sqrt{c_{t|t}} = \sqrt{\bar{C}} + \sqrt{\tilde{c}_{t|t}}.

Putting lemmas 18, 19, 20 and 21 together gives the following theorem on the convergence of the MCDPF.

Theorem 22. Under the assumptions made in theorem 9, there exist a time-dependent constant c_{t|t} and a constant \Phi(k,\nu), depending on the Markov chain step k and the SLEM \nu, such that for all t \ge 0,

\mathbb{E}\left[ \left( (\pi^N_{t|t,k}, \varphi) - (\pi_{t|t}, \varphi) \right)^2 \right]^{1/2} \le \|\varphi\|_\infty \left( \phi_t \sqrt{\Phi(k,\nu)} + \frac{\sqrt{c_{t|t}}}{\sqrt{N}} \right). \quad (3.129)

Therefore the error of the MCDPF converges proportionally to \mathcal{O}(\frac{1}{\sqrt{k}} + \frac{1}{\sqrt{N}}) and increases proportionally to \mathcal{O}(\frac{1}{\sqrt{\delta}} e^{1/\delta}) as \delta \to 0, where \delta = 1 - \nu is the spectral gap.

Proof. The root mean square error of the MCDPF in terms of the number of particles N is obtained directly from the upper bound in (3.129). Also, we have \Phi(k,\nu) = \mathcal{O}(k^{-1}) for k \ge k^\star and a fixed \nu, as given in (3.76)-(3.78). Similarly, if k is fixed, then \Phi(k,\nu) = \mathcal{O}(e^{1/\delta}/\delta), because the C e^{-(2\sqrt{3}L)^2/k} term dominates, and C = \mathcal{O}(\delta^{-1}) and L = \mathcal{O}(\delta^{-1}).

3.5 Numerical certificate of strong convergence

In [29] the performance of the MCDPF is illustrated with a bearing-only tracking example, and the relation between the root mean square error (RMSE) and the Markov chain step is plotted in order to demonstrate the error behavior. The rate of error, however, is rigorously proved in this chapter, and the decay rate of the RMSE numerically verifies the result of the main theorem 22. We consider a bearing-only tracking example in this section for the purpose of demonstrating the performance of the MCDPF. The dynamic model was a time-dependent linear system but the measurement model was nonlinear, with one moving target tracked by 4 bearing sensors linked in a network. There were two modes for the movement of the target, a straight and a turning mode. The target moved with linear dynamics, turning to the right by 90 degrees between 0.5 and 1, 2 and 2.5, and 3.5 and 4 seconds. The state vector is [x_t \; y_t \; \dot{x}_t \; \dot{y}_t]^T and the state-space system and measurements at each sensor were given by

x_{t+1} = e^{F_t \Delta t} x_t + q_t, \quad (3.130)

\theta^i_t = \arctan\left( \frac{y_t - s^i(y)}{x_t - s^i(x)} \right) + r^i_t, \quad (3.131)

where

F_t = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & a_t \\ 0 & 0 & -a_t & 0 \end{bmatrix}, \qquad
a_t = \begin{cases} 0 & \text{straight mode} \\ \dfrac{\pi}{2 \times 51 \Delta t} & \text{turning mode,} \end{cases}

r^i_t \sim \mathcal{N}(0, 0.05^2), \qquad
q_t \sim \mathcal{N}\left( 0, \begin{bmatrix} \frac{\Delta t^3}{3} & 0 & \frac{\Delta t^2}{2} & 0 \\ 0 & \frac{\Delta t^3}{3} & 0 & \frac{\Delta t^2}{2} \\ \frac{\Delta t^2}{2} & 0 & \Delta t & 0 \\ 0 & \frac{\Delta t^2}{2} & 0 & \Delta t \end{bmatrix} \right).

Here \Delta t = 0.01, (s^i(x), s^i(y)) was the position of the i-th sensor, and r^i_t was the measurement noise. Each sensor was connected to its nearest two neighboring sensors. The true trajectory of the moving target was estimated by the CPF and the MCDPF. The centralized particle filter tracked the true trajectory with N = 400 particles and bearing information gathered from all four sensors. The trajectory was also estimated by the MCDPF with N = 400 particles at each node. Figure 3.1 shows the estimation results of the CPF and the MCDPF with k = 4 Markov chain steps per measurement for the MCDPF. Even with such a small number of Markov chain steps per measurement, the MCDPF at each sensor obtained a reasonable estimate of the true trajectory by exchanging information only with connected neighbors according to a random walk of particles and weights on the sensor network. To numerically study the convergence of the posterior distribution of the MCDPF at each node to the posterior distribution of the CPF (as proved in theorem 9), we define the root mean square error (RMSE) to be

\mathrm{RMSE}(\hat{x}_{\mathrm{DPF}}) = \mathbb{E}\left[ \| \hat{x}_{\mathrm{DPF}} - \hat{x}_{\mathrm{opt}} \|^2 \right]^{1/2} \quad (3.132)

\approx \mathbb{E}\left[ \| \hat{x}_{\mathrm{DPF}} - \hat{x}_{\mathrm{CPF}} \|^2 \right]^{1/2}. \quad (3.133)
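As an illustration of how (3.133) is evaluated in practice, the following sketch averages squared estimate differences over repeated filter executions; run_cpf and run_mcdpf are hypothetical callables returning the state estimates at the time of interest for a fixed measurement sequence, not code from this work.

    import numpy as np

    def rmse_vs_cpf(run_cpf, run_mcdpf, n_runs):
        # Monte Carlo estimate of E[ ||x_DPF - x_CPF||^2 ]^(1/2),
        # the approximation to (3.132) given in (3.133).
        sq_err = np.empty(n_runs)
        for r in range(n_runs):
            x_cpf = run_cpf()      # centralized PF estimate
            x_mcdpf = run_mcdpf()  # MCDPF estimate at one node
            sq_err[r] = np.sum((x_mcdpf - x_cpf) ** 2)
        return np.sqrt(sq_err.mean())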

Figure 3.2 shows the RMSE versus the number of PF executions and the number of Markov chain steps k per measurement.

Figure 3.1: The trajectory estimation by CPF and DPF with Markov chain steps k = 4.

Figure 3.2: RMSE of MCDPF and CPF with respect to the number of executions (left) and different Markov chain steps k (right).

The RMSE is computed at time t = 0.1 during the simulation. The number of PF executions is increased up to 2.1 \times 10^5 so that the non-sampling error of the CPF and MCDPF is removed. The RMSE on the right side of figure 3.2 is thus the value obtained by averaging over 2.1 \times 10^5 CPF and MCDPF executions. We can see that the decay rate of the RMSE in the figure approaches k^{-1/2}, where k^\star = 216 in this example. This figure numerically verifies the strong convergence and its rate, k^{-1/2}, stated in theorem 22.

3.6 Performance comparison

In general, particle filters outperform the EKF, and the EKF estimate may even diverge, when the nonlinearity of the system becomes severe. To confirm the benefit of the MCDPF over the DEKF, the performances of the MCDPF and the DEKF are compared using a highly nonlinear system, a flocking model, with various system parameters.

3.6.1 Extended Kalman filter

In this section the Extended Kalman Filter (EKF) [3] in information filter form is reviewed and the distributed EKF using a consensus scheme is introduced. Suppose we have the following nonlinear system with Gaussian noise.

xt+1 = f(xt) + qt (3.134)

zt = h(xt) + rt. (3.135)

where x_t \in \mathbb{R}^n, z_t \in \mathbb{R}^p, and q_t \sim \mathcal{N}(0, Q_t), r_t \sim \mathcal{N}(0, R_t) are the process and measurement noises respectively. The following notation is used for state estimation.

\hat{x}_{t|t-1} = \mathbb{E}(x_t \mid z_{1:t-1}), \qquad \hat{x}_{t|t} = \mathbb{E}(x_t \mid z_{1:t}). \quad (3.136)

Here z1:t is the time series of measurements, z1,..., zt.

Centralized approach

The EKF estimates the state of a nonlinear system by linearizing the system and measurement equations around the previously estimated state. Given the distribution of the initial state, x_0 \sim \mathcal{N}(\bar{x}_0, P_0), the EKF initializes the information matrix and information state as follows.

Y_{0|0} = P_0^{-1}, \qquad y_{0|0} = Y_{0|0} \bar{x}_0, \qquad \hat{x}_{0|0} = Y_{0|0}^{-1} y_{0|0}. \quad (3.137)

For the prediction step,

Y_{t|t-1} = \left( F_{t-1} Y_{t-1|t-1}^{-1} F_{t-1}^T + Q_t \right)^{-1} \quad (3.138)

y_{t|t-1} = Y_{t|t-1} f(\hat{x}_{t-1|t-1}) \quad (3.139)

\hat{x}_{t|t-1} = Y_{t|t-1}^{-1} y_{t|t-1}. \quad (3.140)

The matrix F_{t-1} is the Jacobian of the function f(x_t) evaluated at \hat{x}_{t-1|t-1}:

F_{t-1} = \left. \frac{\partial f}{\partial x} \right|_{\hat{x}_{t-1|t-1}}. \quad (3.141)

For the measurement step,

Y_{t|t} = Y_{t|t-1} + H_t^T R_t^{-1} H_t \quad (3.142)

y_{t|t} = y_{t|t-1} + H_t^T R_t^{-1} \tilde{z}_t \quad (3.143)

\hat{x}_{t|t} = Y_{t|t}^{-1} y_{t|t}, \quad (3.144)

where \tilde{z}_t = z_t - h(\hat{x}_{t|t-1}) + H_t \hat{x}_{t|t-1} and, similarly, H_t is the Jacobian of the measurement equation h(x_t),

H_t = \left. \frac{\partial h}{\partial x} \right|_{\hat{x}_{t|t-1}}. \quad (3.145)
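For concreteness, a minimal sketch of one information-form EKF cycle implementing (3.138)-(3.145) is given below. The functions f, h and their Jacobians f_jac, h_jac are assumed to be supplied by the user; this is an illustration, not the exact code used for the experiments.

    import numpy as np

    def ekf_information_step(Y, y, z, f, f_jac, h, h_jac, Q, R):
        # Prediction step, (3.138)-(3.140).
        x = np.linalg.solve(Y, y)                  # x = Y^{-1} y
        F = f_jac(x)
        Y_pred = np.linalg.inv(F @ np.linalg.inv(Y) @ F.T + Q)
        y_pred = Y_pred @ f(x)
        x_pred = np.linalg.solve(Y_pred, y_pred)
        # Measurement step, (3.142)-(3.145).
        H = h_jac(x_pred)
        z_lin = z - h(x_pred) + H @ x_pred         # linearized measurement
        R_inv = np.linalg.inv(R)
        Y_new = Y_pred + H.T @ R_inv @ H
        y_new = y_pred + H.T @ R_inv @ z_lin
        return Y_new, y_new, np.linalg.solve(Y_new, y_new)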

Distributed approach

In the distributed approach we assume there are m nodes measuring the partial observations independently as follows.

x_{t+1} = f(x_t) + q_t \quad (3.146)

\begin{bmatrix} z_{1,t} \\ z_{2,t} \\ \vdots \\ z_{m,t} \end{bmatrix} = \begin{bmatrix} h_1(x_t) + r_{1,t} \\ h_2(x_t) + r_{2,t} \\ \vdots \\ h_m(x_t) + r_{m,t} \end{bmatrix}. \quad (3.147)

Here z_{i,t} \in \mathbb{R}^{p_i} with \sum_{i=1}^m p_i = p, and the subscript i is the node index. In addition, the measurement noise at each node is assumed to be uncorrelated with that at the other nodes, \mathbb{E}[r_t r_t^T] = \mathrm{diag}(R_{1,t}, R_{2,t}, \ldots, R_{m,t}). The initialization and prediction steps are the same as for the centralized approach, except for the fact that the same procedure is repeated at every node. The measurement step is distributed for i = 1, \ldots, m,

Y_{i,t|t} = Y_{i,t|t-1} + H_t^T R_t^{-1} H_t = Y_{i,t|t-1} + \sum_{i=1}^m H_{i,t}^T R_{i,t}^{-1} H_{i,t} \quad (3.148)

y_{i,t|t} = y_{i,t|t-1} + H_t^T R_t^{-1} \tilde{z}_t = y_{i,t|t-1} + \sum_{i=1}^m H_{i,t}^T R_{i,t}^{-1} \tilde{z}_{i,t} \quad (3.149)

\hat{x}_{i,t|t} = Y_{i,t|t}^{-1} y_{i,t|t}. \quad (3.150)

Again \tilde{z}_{i,t} = z_{i,t} - h_i(\hat{x}_{i,t|t-1}) + H_{i,t} \hat{x}_{i,t|t-1}. Consensus is an iterative process for multiple agents to reach a common value, for example the average. A sensor network system can be modeled as a graph, G = (V, E), with a normalized adjacency matrix \mathcal{A}. The vertices V = \{1, \ldots, m\} correspond to nodes or sensors in the network, while the edges E represent the connections between sensors. The neighbors of node i are given by N_i = \{ j \in V : \mathcal{A}_{ij} \neq 0 \}. The Distributed Kalman Filter (DKF) [36, 37] is implemented with a consensus scheme, and the DEKF can be distributed in exactly the same way as the DKF if the system and measurement matrices are replaced by the Jacobian matrices F_{i,t} and H_{i,t}. When new measurements arrive at time t, the following terms are initialized.

S_{i,t} = H_{i,t}^T R_{i,t}^{-1} H_{i,t}, \qquad s_{i,t} = H_{i,t}^T R_{i,t}^{-1} \tilde{z}_{i,t}. \quad (3.151)

Then the average values of S_{i,t} and s_{i,t} are obtained by the following iterative consensus scheme [53].

S_{i,t} \leftarrow S_{i,t} + \sum_{j \in N_i(t)} w_{ij}(t) (S_{j,t} - S_{i,t}), \qquad s_{i,t} \leftarrow s_{i,t} + \sum_{j \in N_i(t)} w_{ij}(t) (s_{j,t} - s_{i,t}). \quad (3.152)

While many weight choices are possible, we consider the Metropolis weights,

w_{ij}(t) = \frac{1}{1 + \max\{ d_i(t), d_j(t) \}}, \quad (3.153)

where d_i(t) is the degree of node i. The average values of S_{i,t} and s_{i,t} then converge exponentially to the true values. Then we can write the measurement step after the consensus procedure as

Y_{i,t|t} = Y_{i,t|t-1} + H_t^T R_t^{-1} H_t = Y_{i,t|t-1} + m S_{i,t} \quad (3.154)

y_{i,t|t} = y_{i,t|t-1} + H_t^T R_t^{-1} \tilde{z}_t = y_{i,t|t-1} + m s_{i,t}. \quad (3.155)
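As an illustration, the consensus iteration (3.152) with Metropolis weights (3.153) can be sketched as follows; neighbors is an assumed adjacency list describing the sensor graph, and the sketch mirrors the update rather than reproducing the experimental code.

    import numpy as np

    def metropolis_consensus(S, s, neighbors, k_con):
        # S and s are lists with one matrix / vector per node;
        # neighbors[i] lists the nodes adjacent to node i.
        deg = [len(nb) for nb in neighbors]
        for _ in range(k_con):
            S_new = [Si.copy() for Si in S]
            s_new = [si.copy() for si in s]
            for i in range(len(S)):
                for j in neighbors[i]:
                    w = 1.0 / (1.0 + max(deg[i], deg[j]))  # (3.153)
                    S_new[i] += w * (S[j] - S[i])
                    s_new[i] += w * (s[j] - s[i])
            S, s = S_new, s_new
        return S, s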

3.6.2 Numerical example

We use a flocking model with a control law introduced in [50]. The model has K vehicles which move on the plane with the following dynamics.

x_t = \begin{bmatrix} x_{1,t} \\ \vdots \\ x_{K,t} \end{bmatrix}, \qquad f(x_t) = x_t + \begin{bmatrix} f_1(x_t) \\ \vdots \\ f_K(x_t) \end{bmatrix} \Delta t, \quad (3.156)

where

x_{i,t} = \begin{bmatrix} x_{i,t} \\ y_{i,t} \\ \theta_{i,t} \\ v_{i,t} \end{bmatrix}, \qquad f_i(x_t) = \begin{bmatrix} v_{i,t} \cos\theta_{i,t} \\ v_{i,t} \sin\theta_{i,t} \\ w_{i,t} \\ a_{i,t} \end{bmatrix} \quad (3.157)

for i = 1, \ldots, K, where the state variables represent the position, orientation and translational speed of the vehicles. The variables w_{i,t} and a_{i,t} are control inputs and are given as

a_{i,t} = -\nabla_{x_{i,t}} V_{i,t} \cos\theta_{i,t} - \nabla_{y_{i,t}} V_{i,t} \sin\theta_{i,t} \quad (3.158)

w_{i,t} = -k \sum_{j \sim i} |v_{i,t}| |v_{j,t}| (\theta_{i,t} - \theta_{j,t}) + \frac{\nabla_{x_{i,t}} V_{i,t} \sin\theta_{i,t} - \nabla_{y_{i,t}} V_{i,t} \cos\theta_{i,t}}{|v_{i,t}|}, \quad (3.159)

where k is a control gain. The function Vi,t is given as

V_{ij,t}(x_t) = \frac{1}{\|r_{ij,t}\|^2} + \log \|r_{ij,t}\|^2 \quad (3.160)

V_{i,t}(x_t) = \sum_{j \neq i} V_{ij,t}(x_t), \quad (3.161)

where \|r_{ij,t}\|^2 = (x_{i,t} - x_{j,t})^2 + (y_{i,t} - y_{j,t})^2. The covariance matrix of the process noise Q_t is chosen randomly and is proportional to \Delta t^2. The range-only measurement model is used, and thus h_j(x_t) is given as follows.

 2 2 1/2  ((x1,t pj) + (y1,t pj) )  − . −  hj(xt) =  .  . (3.162)   2 2 1/2 ((xK,t pj) + (yK,t pj) ) − − The covariance of the measurement noise is proportional to the estimated distance and is given as  2 2 1/2  ((ˆx1,t pj) + (ˆy1,t pj) )  − . −  Rj,t = diag  .  . (3.163)   2 2 1/2 ((ˆxK,t pj) + (ˆyK,t pj) ) − − In the sensor network, each node is connected to two neighbors, forming a square in figure 3.3.

Regularizing the EKF and DEKF

It is known that the EKF may diverge for several reasons, including a poor linear approximation and very different measurement noise levels [39]. Divergence of the estimator is defined to occur when the RMSE defined in (3.164) is larger than the RMSE of the null estimator that always returns the origin as the state estimate. When divergence occurs, the information matrices Y_{t|t-1} and Y_{t|t} become ill-conditioned after the prediction or measurement update steps. To avoid divergence we use two methods of regularization. First, the system equations are modified so that the denominators in (3.159) and (3.160) are |v_{i,t}| + \epsilon and \|r_{ij,t}\|^2 + \epsilon, respectively, for a small value \epsilon.

Table 3.1: Algorithms, RMSE values and fraction of divergent runs (averaged over 1000 Monte Carlo runs)

Algorithm (4 vehicles), ∆t = 0.1           RMSE     Divergence
EKF                                        N/A      49.7%
DEKF (k_con = 2)                           N/A      62.3%
Regularized EKF                            1.1986   2.1%
Regularized DEKF (k_con = 2)               2.5427   2.3%
CPF (N = 100)                              0.2871   0%
MCDPF (N = 100, k_mc = 10)                 0.8011   0%

Second, if the condition number of the information matrix exceeds 10^6, then we constrain its singular values to lie within [1, 10^6] by taking an SVD and capping the values outside this range. These values were chosen heuristically based on cases where the estimation result was fairly good.
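This second regularization translates into a few lines of linear algebra; the following is a minimal sketch using the thresholds above (condition number 10^6, singular values clipped to [1, 10^6]), not the exact experimental code.

    import numpy as np

    def regularize_information_matrix(Y, s_min=1.0, s_max=1e6):
        # Cap the singular values of Y to [s_min, s_max] whenever its
        # condition number exceeds s_max / s_min.
        U, S, Vt = np.linalg.svd(Y)
        if S.max() > (s_max / s_min) * max(S.min(), np.finfo(float).tiny):
            S = np.clip(S, s_min, s_max)
            Y = U @ np.diag(S) @ Vt
        return Y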

Performance comparison

The estimator tracks the positions of 4 vehicles using 4 sensor nodes with range-only measurements. The number of particles is 100 in total for both the CPF and the MCDPF, which means that each node maintains 25 particles on average for the MCDPF.

The number of Markov chain iterations is kmc = 10 and the number of consensus iterations is kcon = 2. The estimation results are shown in fig. 3.3. The time step is ∆t = 0.1 with one measurement per timestep. The performance of the different state estimations is evaluated by comparing the root mean squared error (RMSE) of the estimation result,

\mathrm{RMSE} = \frac{\Delta t}{TK} \sum_{t=1}^{T/\Delta t} \sum_{i=1}^K \sqrt{ (x_{i,t} - \hat{x}_{i,t|t})^2 + (y_{i,t} - \hat{y}_{i,t|t})^2 }, \quad (3.164)

where the total simulation time is T = 4. The comparison of results is given in table 3.1. Note that the RMSE of the EKF and DEKF is reported as N/A because of the large fraction of divergent runs.

Figure 3.3: Trajectory of flocking model and its position estimation by EKF, DEKF, REKF, RDEKF, CPF and MCDPF.

Figure 3.4: RMSE versus time for EKF, DEKF, REKF, RDEKF, CPF, MCDPF.

By using regularization we can see that the divergence is suppressed, but we still observe a large RMSE. The position error at each time during the simulation is given in fig. 3.4. We observe that the EKF and DEKF quickly diverge at around t = 2.2, while the regularized EKF (REKF) and regularized DEKF (RDEKF) do not.

Effect of ∆t

As we decrease ∆t, we expect the error of the EKF and DEKF to become negligible and the performance of the EKF to beat that of the CPF for very small ∆t, since the nonlinear system is then well approximated by its linearization. With the same numerical example using a smaller ∆t = 0.05, the same performance comparison is shown in table 3.2. With the faster measurement rate, the RMSE for the EKF, DEKF, REKF and RDEKF decreases much more significantly than that of the CPF and MCDPF. Furthermore, it becomes very unlikely that a run diverges. This illustrates the fact that for sufficiently small ∆t the Kalman filter based approaches are superior, whereas for large ∆t with significant nonlinearities the particle filters are better. This is due to the fact that the accuracy of the linearization varies inversely with ∆t.

Table 3.2: Algorithms, RMSE values and fraction of divergent runs (averaged over 1000 Monte Carlo runs)

Algorithm (4 vehicles), ∆t = 0.05          RMSE     Divergence
EKF                                        0.1236   0%
DEKF (k_con = 2)                           0.1733   0.1%
Regularized EKF                            0.1164   0%
Regularized DEKF (k_con = 2)               0.1579   0%
CPF (N = 100)                              0.1921   0%
MCDPF (N = 100, k_mc = 10)                 0.7453   0%

Effect of system complexity

We now change the number of vehicles to 10, so the dimension of the system is increased from 16 to 40. The same timesteps, ∆t = 0.1 and ∆t = 0.05, are used and the RMSE is shown in table 3.3. We see that the increased system complexity dramatically reduces the performance of the Kalman filter based approaches, which no longer give acceptable performance even for the smaller timestep. In contrast, the particle filters (including the MCDPF) are robust with respect to the increase in system complexity, with very little loss of accuracy. We expect that a much smaller ∆t would eventually stabilize the EKF and DEKF and improve their RMSE, but for sufficiently large numbers of vehicles this may be prohibitively expensive.

Information Exchange Bandwidth

It is worthwhile to quantify the bandwidth (the amount of data to be sent per unit time) at each node for the purpose of the practical design of estimators for sensor networks. The information is communicated between connected sensors within the time interval of each new measurement, ∆t. Suppose we have a fully connected sensor network with m sensors, N particles for the MCDPF, n system dimensions, k_con consensus iteration steps, k_mc Markov chain iteration steps, and ∆t timestep for measurements. Taking a single scalar number as the basic unit of measurement (ignoring quantization issues), the bandwidth BW (measured in numbers per unit time) for the DEKF, MCDPF

Table 3.3: Algorithms, RMSE values and fraction of divergent runs

Algorithm (10 vehicles), ∆t = 0.1          RMSE      Divergence
EKF                                        N/A       100%
DEKF (k_con = 2)                           N/A       100%
Regularized EKF                            1.7108    8%
Regularized DEKF (k_con = 2)               12.7356   16.7%
CPF (N = 100)                              0.4987    0%
MCDPF (N = 100, k_mc = 10)                 0.7416    0%

Algorithm (10 vehicles), ∆t = 0.05         RMSE      Divergence
EKF                                        N/A       100%
DEKF (k_con = 2)                           N/A       100%
Regularized EKF                            1.5674    2.9%
Regularized DEKF (k_con = 2)               5.8275    6.1%
CPF (N = 100)                              0.2434    0%
MCDPF (N = 100, k_mc = 10)                 0.4807    0%

and CPF are computed as follows.

\mathrm{Bps}_{\mathrm{DEKF}} = \frac{k_{\mathrm{con}}\, n(n+1)(m-1)}{\Delta t} \quad (3.165)

\mathrm{Bps}_{\mathrm{MCDPF}} = m\, \frac{k_{\mathrm{mc}} \frac{N}{m} (n+1)}{\Delta t} \quad (3.166)

\mathrm{Bps}_{\mathrm{CPF}} = \frac{N(n+1)}{\Delta t}. \quad (3.167)

For a given system and sensor network, n, m and ∆t are fixed, and the parameters we can choose are k_con, k_mc and N. It is known that consensus schemes have an exponential convergence rate in terms of k_con, and the MCDPF has a geometric convergence rate in k_mc and N, as seen in figure 3.5, allowing bandwidth-constrained filters to be designed.
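For reference, the bandwidth expressions (3.165)-(3.167) translate directly into a small sizing helper; the following is only an illustration of the formulas, with hypothetical function names.

    def bandwidth_dekf(n, m, dt, k_con):
        # (3.165): consensus traffic for the DEKF.
        return k_con * n * (n + 1) * (m - 1) / dt

    def bandwidth_mcdpf(n, m, dt, N, k_mc):
        # (3.166): particle-exchange traffic for the MCDPF.
        return m * k_mc * (N / m) * (n + 1) / dt

    def bandwidth_cpf(n, dt, N):
        # (3.167): traffic to a central fusion node for the CPF.
        return N * (n + 1) / dt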

In figure 3.5 the convergence of the CPF and MCDPF according to N and k_mc is shown.

Figure 3.5: RMSE with respect to BW with changing N = 50, 100, 200, 500 (CPF), k_mc = 5, 10, 20, 50 (MCDPF) and k_con = 2, 6, 10, 14 (DEKF). The decrease in RMSE is observed with increased BW.

3.7 Conclusions

In this chapter, we introduced the new Markov Chain Distributed Particle Filter (MCDPF), which exchanges particles between distributed sensor nodes, thus providing robust state estimation and good scaling in the dimension of the system being estimated. The robustness is maintained by the fact that the proposed MCDPF algorithm only exchanges information locally, while the good scaling is due to the use of only particles to store the state. This does mean, however, that the MCDPF will be less efficient for a lower-dimensional system that can be efficiently represented by Gaussian Mixture Models or other similar approximations. We proved that the estimated distribution of the MCDPF algorithm converges to that of the classical Centralized Particle Filter (CPF) in both a weak and strong sense as the number of Markov chain steps k per measurement goes to infinity, and we presented a numerical example to demonstrate the result. In addition, along with the review of the DEKF and MCDPF estimators, we described the nominal EKF and a regularization strategy, as well as comparing their performance numerically for a vehicle flocking model with 4 and 10 vehicles. We showed that the DEKF had lower RMSE than the MCDPF for simple systems (fewer vehicles) or high measurement frequency (small ∆t), whereas for complex systems (more vehicles) or low measurement frequency (large ∆t) we found that the MCDPF was both more robust and more accurate than the DEKF. In addition, the DEKF experienced divergence for complex or infrequently measured systems, whereas the MCDPF always gave stable estimates.

Chapter 4

Parallel stochastic simulation of coagulation

The simulation of particle coagulation is used in many areas, such as atmospheric science, biology, and chemistry. Traditional numerical integration methods become extremely computationally expensive as the number of dimensions needed to describe each particle increases, and it is also difficult to compute fluctuation information. Stochastic solutions based on Monte Carlo sampling methods scale better as the problem dimension increases, and are also able to resolve population fluctuations. We consider in this chapter a distributed approach for stochastic particle simulation. By distributing particles across a finite number of processors, the computational load on each processor can be reduced if the communication load for an exchange of particles is maintained at a low level. A distributed algorithm using Markov chain random walks for the simulation of particle coagulation is proposed, and its convergence to the centralized algorithm, Gillespie's method, is also provided.


4.1 Introduction

The following is the well-known discrete-size Smoluchowski’s coagulation equation for particle coagulation [52]

k−1 ∞ ∂ 1 X X c(t, k) = K(k j, j)c(t, k j)c(t, j) K(k, j)c(t, k)c(t, j), (4.1) ∂t 2 j=1 − − − j=1 where t 0, x = 1, 2,... and K(k, j) is the coalescence kernel while c(t, k) is the ≥ concentration of the particle with a size k at time t. This equation is used in many areas, such as atmospheric science, chemistry, and biology, to mention a few [2]. Classical deterministic numerical integration methods for such systems are well understood [47] [14], but the solution of the numerical method neglects finite-number fluctuation effects. To overcome this weakness of the numerical integration, Monte Carlo simulation was proposed to simulate the coagulating particles accurately. Gille- spie proposed a stochastic method using a Markov process to simulate the coales- cence model based on Monte Carlo simulation in [15] [16]. Also Eibeck and Wag- ner [11][12][10] demonstrated not only an efficient stochastic particle algorithm with an acceptance-rejection method and a reduced variance but also provided a theoret- ical analysis of the existence of the solution for a certain class of kernels and proved the convergence of the stochastic particle method. More rigorous results regarding a weak convergence of the stochastic particle method can be found in [20] [25] and [34]. The purpose of this chapter is to propose an efficient stochastic particle method for Smoluchowski’s coagulation equation by introducing a parallel methodology. So far the stochastic particle method, regardless of the type of the method, uses one cen- tral unit to process particle system. For example, Gillespie’s method should perform at most O(N 2) computation with N existing particles at one processor. As a conse- quence, it would be advantageous from the perspective of computational load if we could divide particles amongst several processors and each processor could perform a computation only with the reduced number of particles it holds. With the paral- lel method implemented, it should be considered how to exchange information only locally to achieve a global solution which equivalent to that obtained by collecting CHAPTER 4. PARALLEL STOCHASTIC SIMULATION OF COAGULATION52

all particles in one processor. We propose a Markov chain random walk as a way to exchange information across the processors in this paper. Markov random walks were introduced as a method with good parallel scalability for particle filters [29]. Even though the field of application is different, the basic idea of how to exchange infor- mation is identical. As particles hop around the processors according to a Markov chain random walk defined by a graph from the processor configuration, the particles carry information that they possess. This chapter reviews a stochastic particle method by Gillespie [15] in section 4.2 to represent an example of a stochastic particle method with one processor. The parallel stochastic particle method with the numerical example is presented in the following section 4.3. In section 4.4, the weak convergence of the parallel stochastic particle method to Gillespie’s method will be proved.

4.2 Gillespie’s method

We interpret Gillespie's method in a slightly different manner for the purpose of an easy comparison with the parallel stochastic particle method. Rather than considering i, j as indices of particles, they are considered here as the sizes of particles, and x_i, x_j are the numbers of particles of the corresponding sizes. We introduce the space denoting the possible configurations of the finite number of particles. Let \mathcal{C}^{M_0} be the space of M_0-dimensional vectors satisfying the following.

\mathcal{C}^{M_0} = \left\{ x \in \mathbb{Z}^{M_0} \,\middle|\, x_i \ge 0,\ \sum_{i=1}^{M_0} i\, x_i = M_0 \right\}. \quad (4.2)

Additionally, the disjoint subsets \mathcal{C}^{M_0}_l of \mathcal{C}^{M_0}, for l = 1, \ldots, M_0, are defined as follows.

\mathcal{C}^{M_0}_l = \left\{ x \in \mathcal{C}^{M_0} \,\middle|\, \sum_{i=1}^{M_0} x_i = l \right\}. \quad (4.3)

Considering the index i as the size of particles and x_i as the number of particles of the corresponding size i, \mathcal{C}^{M_0} represents the possible configurations of particles through the coagulation events. Gillespie's method defines the coalescence probability density function

P(\tau, i, j) = C_{ij}(x) \exp\left[ -\sum_{i=1}^{M_0} \sum_{j=i}^{M_0} C_{ij}(x)\, \tau \right], \quad (4.4)

where

C_{ij}(x) \equiv \frac{K(i,j)\, N(x_i, x_j)}{M_0}, \qquad N(x_i, x_j) = \begin{cases} x_i x_j & \text{if } i \neq j \\ \dfrac{x_i (x_i - 1)}{2} & \text{if } i = j, \end{cases} \quad (4.5)

with M_0 being the initial number of particles. With the stochastic model (4.4) in hand, the coagulation time \tau and the pair (i, j) are drawn to simulate the coalescence model. Note that Gillespie's method is equivalent, in its stochastic nature, to the simulation of an inhomogeneous Poisson process, N(t), with a coagulation of particles as an event and \tau as the inter-arrival times, which are exponentially distributed with the parameter C_0(x) = \sum_{i=1}^{M_0} \sum_{j=i}^{M_0} C_{ij}(x). What we have in addition to the classic Poisson process is a probability C_{ij}(x)/C_0(x) of a coagulation between particles with sizes i and j at each arrival time. Using the rate C_{ij}(x), the probability that size i and j particles coagulate can be expressed generally as follows.

\mathbb{P}_{ij}(x) = \frac{K(i,j)\, N(x_i, x_j) / M_0}{\sum_{i=1}^{M_0} \sum_{j=i}^{M_0} K(i,j)\, N(x_i, x_j) / M_0} = \frac{C_{ij}(x)}{C_0(x)}. \quad (4.6)
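A minimal sketch of one step of this simulation, drawing the waiting time \tau with rate C_0(x) and the pair (i, j) with probability C_{ij}(x)/C_0(x) as in (4.4)-(4.6), could look as follows; here x[i] is assumed to store the number of particles of size i + 1, and the code is an illustration rather than the implementation used for the results below.

    import numpy as np

    def gillespie_step(x, K, M0, rng):
        # Assemble the rates C_ij(x) of (4.5) over pairs with i <= j.
        sizes = np.nonzero(x)[0]
        pairs, rates = [], []
        for a, i in enumerate(sizes):
            for j in sizes[a:]:
                n_ij = x[i] * (x[i] - 1) / 2 if i == j else x[i] * x[j]
                pairs.append((i, j))
                rates.append(K(i + 1, j + 1) * n_ij / M0)
        c0 = float(sum(rates))
        if c0 == 0.0:
            return None  # no further coagulation is possible
        tau = rng.exponential(1.0 / c0)
        i, j = pairs[rng.choice(len(pairs), p=np.array(rates) / c0)]
        x[i] -= 1
        x[j] -= 1
        x[i + j + 1] += 1  # sizes (i+1) and (j+1) merge into (i+j+2)
        return tau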

4.2.1 Numerical example

The linear kernel, K(i, j) = i + j, is one of the kernels for which an analytic solution is known [11]; it is plotted in figure 4.1 over the particle sizes 2 \le k \le 10:

c(t, k) = e^{-t} \frac{\left( k(1 - e^{-t}) \right)^{k-1}}{k!} e^{-k(1 - e^{-t})}, \qquad t \ge 0, \quad k = 1, 2, \ldots \quad (4.7)

Figure 4.1: c(t, k) of the linear kernel with 2 \le k \le 10.

Along with the analytic solution c(t, k), the moments of the solution are frequently of interest:

m_\delta(t) = \int_0^\infty k^\delta c(t, k)\, dk, \qquad \delta \ge 0. \quad (4.8)

For the linear kernel,

m_0(t) = e^{-t}, \qquad m_1(t) = 1, \qquad m_2(t) = e^{2t}, \qquad t \ge 0, \quad (4.9)

which are approximated in Gillespie's method as

m_\delta(t) \approx \frac{1}{M_0} \sum_{i=1}^{M_0} i^\delta x_i. \quad (4.10)

Using Gillespie's method with an initial number of particles M_0 = 120 and M_0 = 500, 10 independent realizations are repeated to obtain the solution. Figure 4.2 shows the stochastic solution for k = 5, and the logarithmic histogram of the coagulation time step after n coagulations, \tau_n, is provided along with the mean and median to give a sense of the distribution of \tau_n in terms of the initial number of particles. Also provided is the line representing the theoretical distribution of \tau_n. The mean and median of \tau_n, \tilde{\tau} and \bar{\tau} respectively, are not trivial to compute for a general kernel. In the case of a linear or constant kernel, however, it is straightforward to compute \tilde{\tau}, \bar{\tau} because of the linearity of the kernel. To be precise, the rate parameter C_{M_n} after n coagulations is identical for any given configuration due to the linearity of the kernel. Therefore, the rate parameter C_{M_n} for a linear kernel, with M_n particles remaining after n coagulations, is given as

C_{M_n} = \frac{(M_n - 1)\, M_0}{M_0} = M_n - 1 = M_0 - n - 1, \quad (4.11)

where n = 0, 1, \ldots, M_0 - 2. For the M_0 - 1 different rate parameters we have the means and medians

\tilde{\tau}_n = \frac{1}{M_0 - n - 1}, \qquad \bar{\tau}_n = \frac{\ln(2)}{M_0 - n - 1}. \quad (4.12)

The theoretical histogram is obtained by summing f_n(x) over n, where

f_n(x) = \frac{d}{dx}\left( 1 - \exp\left( -\frac{10^x}{\tilde{\tau}_n} \right) \right) = \ln(10)\, \frac{10^x}{\tilde{\tau}_n} \exp\left( -\frac{10^x}{\tilde{\tau}_n} \right). \quad (4.13)

The \tilde{\tau}, \bar{\tau} shown in figure 4.2 are the mean and median from the 10 independent simulations, and their theoretical values defined in (4.12) are listed in the caption for comparison.

4.3 Parallel stochastic particle algorithm

In this section, the parallel stochastic particle algorithm is proposed, where m nodes with a properly scaled kernel simulate Gillespie's model independently while exchanging particles through a symmetric network at multiples of \tau_{mix}. In algorithm 2, M_0 particles are initially distributed uniformly at random across the m nodes, and the nodes exchange particles according to the Markov chain random walk defined by a symmetric network whenever the next simulation time t_k for all nodes exceeds the next multiple of \tau_{mix}.

Let N_k(t) be a point process at each node. Then the stochastic solution of the coalescence model in the parallel setting is obtained by adding the point processes from each node, N_{\tau_{mix}}(t) = \sum_{k=1}^m N_k(t). The point process N_{\tau_{mix}}(t) is not a Poisson process, as opposed to Gillespie's method, since its inter-arrival times are not independent; this will be explained in more detail later. We will prove later, however, that the stochastic solution obtained by the proposed parallel stochastic algorithm converges to the solution of the centralized stochastic algorithm, which in this case is Gillespie's algorithm, as the particle exchange rate goes to infinity (that is, \tau_{mix} \to 0).

Figure 4.2: The stochastic solution with M_0 = 500, 120 and the histogram of \tau_n. For M_0 = 120 and M_0 = 500 the portion of \tau_n \le 0.01 is 0.4311 and 0.8016, respectively.

In the parallel stochastic particle method, particles are initially distributed across m different nodes, and each processor runs Gillespie's algorithm independently between the steps when particles are exchanged. Thus particle configurations are described by the following M_0 \times m matrix.

\mathcal{C}^{M_0, m} = \left\{ y \in \mathbb{Z}^{M_0 \times m} \,\middle|\, y_{i,a} \ge 0,\ \sum_{i=1}^{M_0} \sum_{a=1}^m i\, y_{i,a} = M_0 \right\}. \quad (4.14)

Algorithm 2 Parallel Stochastic Particle Algorithm
  Distribute M_0 particles uniformly at random across the m nodes
  c ← 1
  t_k ← 0 for k = 1, ..., m
  while max_k t_k < t_final and coagulation occurs do
    for k = 1 to m do
      Draw coagulation time τ at the k-th node
      while t_k + τ < c τ_mix do
        t_k ← t_k + τ
        Draw the coagulation pair i, j at the k-th node and simulate the coagulation
        Draw coagulation time τ at the k-th node
      end while
    end for
    Exchange particles through the nodes according to the symmetric Markov chain random walk
    t_k ← c τ_mix for k = 1, ..., m
    c ← c + 1
  end while
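Algorithm 2 can be transcribed almost line for line; in the following sketch, draw_time and do_coagulation stand for the per-node Gillespie draws (for instance the step sketched in section 4.2), and mix_particles for the Markov chain random walk exchange. All of these are assumed interfaces rather than code from this work.

    def parallel_coagulation(nodes, draw_time, do_coagulation,
                             mix_particles, tau_mix, t_final):
        # Direct transcription of Algorithm 2: every node runs Gillespie's
        # method up to the next multiple of tau_mix, then particles mix.
        m = len(nodes)
        t = [0.0] * m
        c = 1
        while max(t) < t_final:
            for k in range(m):
                tau = draw_time(nodes[k])      # may be inf if no pairs remain
                while t[k] + tau < c * tau_mix:
                    t[k] += tau
                    do_coagulation(nodes[k])
                    tau = draw_time(nodes[k])
            mix_particles(nodes)               # symmetric Markov chain walk
            t = [c * tau_mix] * m
            c += 1
        return nodes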

The subset \mathcal{C}^m_l of \mathcal{C}^{M_0, m} is defined as

\mathcal{C}^m_l = \left\{ y \in \mathcal{C}^{M_0, m} \,\middle|\, \sum_{i=1}^{M_0} \sum_{a=1}^m y_{i,a} = l \right\}. \quad (4.15)

We introduce the projection operator \mathcal{S} : \mathcal{C}^{M_0, m} \to \mathcal{C}^{M_0}, defined for y \in \mathcal{C}^{M_0, m} by

\mathcal{S}(y) = z, \qquad z_i = \sum_{a=1}^m y_{i,a}, \quad (4.16)

where z \in \mathcal{C}^{M_0}. Also, the selection operator \mathcal{I}_a : \mathcal{C}^{M_0, m} \to \mathbb{Z}^{M_0} is defined as follows for later use:

\mathcal{I}_a(y) = w, \qquad w_i = y_{i,a}. \quad (4.17)

In addition to these operators, the following terms for the parallel method are defined as in the previous section:

C^m_{ij}(\mathcal{I}_a(y)) = \frac{K(i,j)\, N(\mathcal{I}_a(y)_i, \mathcal{I}_a(y)_j)}{M_0 / m} = C_{ij}(\mathcal{I}_a(y))\, m, \quad (4.18)

C^m_0(\mathcal{I}_a(y)) = \sum_{i=1}^{M_0} \sum_{j=i}^{M_0} \frac{K(i,j)\, N(\mathcal{I}_a(y)_i, \mathcal{I}_a(y)_j)}{M_0 / m} = C_0(\mathcal{I}_a(y))\, m. \quad (4.19)

We denote the rate of coagulation between particles of sizes i and j after a mixing by \tilde{C}^m_{ij}(\mathcal{S}(y)), which will be computed explicitly later. The probability that particles of sizes i and j coagulate at node a is given as

\mathbb{P}^m_{ij}(\mathcal{I}_a(y)) = \frac{C^m_{ij}(\mathcal{I}_a(y))}{C^m_0(\mathcal{I}_a(y))} = \frac{C_{ij}(\mathcal{I}_a(y))}{C_0(\mathcal{I}_a(y))}. \quad (4.20)

Now let \mathcal{G}_a(y) be the probability that the coagulation happens at node a, given y \in \mathcal{C}^{M_0, m}. Then this probability is given as

\mathcal{G}_a(y) = \frac{\sum_{i=1}^{M_0} \sum_{j=i}^{M_0} C_{ij}(\mathcal{I}_a(y))}{\sum_{a=1}^m \sum_{i=1}^{M_0} \sum_{j=i}^{M_0} C_{ij}(\mathcal{I}_a(y))} = \frac{C^m_0(\mathcal{I}_a(y))}{\sum_{a=1}^m C^m_0(\mathcal{I}_a(y))}. \quad (4.21)

4.3.1 Numerical example

We used a numerical example with M_0 = 120 and m = 40 for the purpose of demonstrating the quality of the stochastic solution of the proposed parallel method. In this example, each processor has on average 3 particles at the beginning of the simulation. In practice this is an unreasonably small number of particles per processor for an efficient simulation. This example, however, highlights that the solution from the parallel stochastic particle algorithm is comparable with the one from Gillespie's method when particles are exchanged between processors, even in such an extreme case. The solutions in figure 4.3 are computed by averaging 10^4 parallel stochastic simulations. Plotted in figure 4.3 are c_{10^4}(t, 5) and \tilde{c}_{10^4}(t, 5) for different \tau_{mix}, together with the analytic solution, to provide better insight into how the solution looks for a particular particle size k. Furthermore, the distributions of particle sizes for Gillespie's method and the parallel method with 3 different \tau_{mix} at the specific time t = 5 are shown in figure 4.4. Due to the relatively small number of initial particles, the distribution deviates from the analytic solution for large sizes. By decreasing \tau_{mix} we can see that the distribution from the parallel method gets closer to that from Gillespie's method, as expected.

Figure 4.3: The plot of c_{10^4}(t, 5) and \tilde{c}_{10^4}(t, 5) with 3 different \tau_{mix}'s.

To quantify the error between the solution from the parallel stochastic particle method and Gillespie's solution, the L^2 norm error is introduced as follows. Dividing the time interval (0, T) into subintervals of length dt, and letting c_n(t, k), \tilde{c}_n(t, k) be the solutions from n independent simulations of Gillespie's method and the parallel stochastic particle method respectively, define the L^2 norm error from n simulations as

\| e_n \|_2 = \left( \frac{1}{T/dt} \sum_{i=1}^{T/dt} \sum_{k=1}^{M_0} \left( c_n(t_i, k) - \tilde{c}_n(t_i, k) \right)^2 \right)^{1/2}, \quad (4.22)

which is illustrated in figure 4.5. As we can see in the figure, the error between the solution from the parallel stochastic particle method and Gillespie's method decreases as \tau_{mix} decreases.
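Computing (4.22) from stored concentration tables is straightforward; a sketch, assuming c_g and c_p hold c_n(t_i, k) and \tilde{c}_n(t_i, k) on a common time grid:

    import numpy as np

    def l2_error(c_g, c_p):
        # ||e_n||_2 of (4.22); c_g and c_p are arrays of shape
        # (num_times, M0) holding the two solutions on the same grid.
        return np.sqrt(np.mean(np.sum((c_g - c_p) ** 2, axis=1)))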

Figure 4.4: The plot of c_{10^4}(5, k) and \tilde{c}_{10^4}(5, k) with 3 different \tau_{mix}'s.

Figure 4.5: \| e_R \|_2 defined in (4.22) up to an ensemble size of R = 10^4.

4.4 Convergence of parallel stochastic particle method

In this section, we will prove the convergence of the parallel stochastic particle method by considering the convergence of probability measures over the configurations of particles and coagulation times. To provide a concrete view of the parallel stochastic particle algorithm, a couple of lemmas are reviewed here.

Lemma 23 (sum of Poisson processes). Suppose that the Poisson processes N_k(t) with rate functions \lambda_k(t) for k = 1, \ldots, m are independent. Then the counting process N(t) defined by N(t) = N_1(t) + \cdots + N_m(t) is a Poisson process with rate function \lambda(t) given by \lambda(t) = \lambda_1(t) + \cdots + \lambda_m(t).

Proof. This can be shown by using the moment generating function:

M_{N(t)}(s) = \mathbb{E} e^{sN(t)} \quad (4.23)

= \mathbb{E} e^{sN_1(t) + \cdots + sN_m(t)} = \mathbb{E} e^{sN_1(t)} \cdots \mathbb{E} e^{sN_m(t)} \quad (4.24)

= M_{N_1(t)}(s) \cdots M_{N_m(t)}(s) = e^{\int_0^t \lambda_1(u)\, du\, (e^s - 1)} \cdots e^{\int_0^t \lambda_m(u)\, du\, (e^s - 1)} \quad (4.25)

= e^{\int_0^t (\lambda_1(u) + \cdots + \lambda_m(u))\, du\, (e^s - 1)}. \quad (4.26)

Thus N(t) is also Poisson, and its rate is \lambda_1(t) + \cdots + \lambda_m(t).

Lemma 24. Let X_1, \ldots, X_m be independent exponentially distributed random variables with rate parameters \lambda_1, \ldots, \lambda_m. Then \min\{X_1, \ldots, X_m\} is also exponentially distributed, with parameter \lambda = \lambda_1 + \cdots + \lambda_m.

Proof. This can be seen by considering the complementary cumulative distribution function:

p(\min\{X_1, \ldots, X_m\} > x) = p(X_1 > x \text{ and } \ldots \text{ and } X_m > x) \quad (4.27)

= \prod_{i=1}^m p(X_i > x) = \prod_{i=1}^m \exp(-x \lambda_i) = \exp\left( -x \sum_{i=1}^m \lambda_i \right).

From lemmas 23 and 24, it can be seen that the parallel stochastic particle algorithm is statistically the same as computing the minimum of \tau_1, \ldots, \tau_m at every time interval \tau_{mix} and letting the coagulation happen only at the node that produced the smallest coagulation time.
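Lemma 24 is also easy to confirm numerically; a quick Monte Carlo sketch (with illustrative rates) compares min{X_1, ..., X_m} with a single exponential of rate \lambda_1 + \cdots + \lambda_m:

    import numpy as np

    rng = np.random.default_rng(0)
    lam = np.array([0.5, 1.2, 2.0])  # illustrative rates at three nodes
    mins = rng.exponential(1.0 / lam, size=(100000, 3)).min(axis=1)
    direct = rng.exponential(1.0 / lam.sum(), size=100000)
    # Both sample means approach 1 / sum(lam) = 0.27...
    print(mins.mean(), direct.mean())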

Definition 25 (Weak convergence of random variables). We say that random variables X_n converge in law (or weakly) to a random variable X, denoted by X_n \xrightarrow{L} X, if F_{X_n}(x) \to F_X(x) as n \to \infty for each fixed x which is a continuity point of F_X (where x is a continuity point of F_X(\cdot) if F_X(x_k) \to F_X(x) whenever x_k \to x). This is also called convergence in distribution, and is denoted X_n \xrightarrow{D} X.

For a better understanding of the parallel stochastic particle algorithm, we derive the analytic expressions of an inter-arrival time and of the coagulation rate between particles of sizes i and j for the parallel stochastic particle algorithm. The significant difference of the parallel algorithm stems from the introduction of \tau_{mix}. The dependency of the current coagulation time on the past one is due to this mixing time step.

Suppose the past coagulation happened between (n-1)\tau_{mix} and n\tau_{mix}. If it is very close to n\tau_{mix}, then it is highly likely that particles do not coagulate until n\tau_{mix}. On the other hand, if the past coagulation time is rather close to (n-1)\tau_{mix}, then there is more chance that the next coagulation happens before n\tau_{mix}. As described by this simple example, it is obvious that the inter-arrival times of the parallel algorithm are indeed not independent, and thus we lose one of the main properties of Poisson processes, independent inter-arrival times. We will show, however, that the inter-arrival times become independent as we mix particles infinitely many times, which means \tau_{mix} \to 0. This convergence is crucial to establish the convergence of the parallel stochastic particle algorithm to Gillespie's method.

Considering the extreme case where \tau_{mix} = \infty, for example, if the rate parameters at the nodes are C^m_0(\mathcal{I}_1(y)), \ldots, C^m_0(\mathcal{I}_m(y)), then the inter-arrival time of the parallel method is exponentially distributed with the rate parameter \sum_{a=1}^m C^m_0(\mathcal{I}_a(y)), and thus the point process N_{\tau_{mix}}(t) becomes a Poisson process because it is the sum of Poisson processes. Introducing a deterministic interval \tau_{mix}, however, does make the argument more complicated, because the next inter-arrival time certainly depends on the past inter-arrival time, as explained above. We now prove that the rate at which particles of sizes i and j coagulate in Gillespie's algorithm is the same as the rate with m different nodes under a perfect mixing of particles.

Lemma 26. Suppose we have y \in \mathcal{C}^{M_0, m} and x \in \mathcal{C}^{M_0} such that x = \mathcal{S}(y). Then the function N(x_i, x_j) defined in (4.5) satisfies the following:

\sum_{\substack{y_{j,1}, \ldots, y_{j,m} \ge 0 \\ y_{j,1} + \ldots + y_{j,m} = x_j}} \binom{x_j}{y_{j,1}, \ldots, y_{j,m}} \sum_{a=1}^m N(y_{i,a}, y_{j,a}) = m^{x_j - 1} N(x_i, x_j). \quad (4.28)

Proof. For i \neq j,

\sum_{\substack{y_{j,1}, \ldots, y_{j,m} \ge 0 \\ y_{j,1} + \ldots + y_{j,m} = x_j}} \binom{x_j}{y_{j,1}, \ldots, y_{j,m}} \sum_{a=1}^m N(y_{i,a}, y_{j,a}) \quad (4.29)

= \sum_{a=1}^m \sum_{\substack{y_{j,1}, \ldots, y_{j,m} \ge 0 \\ y_{j,1} + \ldots + y_{j,m} = x_j}} \binom{x_j}{y_{j,1}, \ldots, y_{j,m}} y_{i,a}\, y_{j,a} \quad (4.30)

= \sum_{a=1}^m y_{i,a} \sum_{\substack{y_{j,1}, \ldots, y_{j,a} - 1, \ldots, y_{j,m} \ge 0 \\ y_{j,1} + \ldots + (y_{j,a} - 1) + \ldots + y_{j,m} = x_j - 1}} \binom{x_j - 1}{y_{j,1}, \ldots, y_{j,a} - 1, \ldots, y_{j,m}} x_j \quad (4.31)

= \sum_{a=1}^m y_{i,a}\, m^{x_j - 1} x_j \quad (4.32)

= x_i x_j m^{x_j - 1} = m^{x_j - 1} N(x_i, x_j). \quad (4.33)

Now for i = j,

\sum_{\substack{y_{j,1}, \ldots, y_{j,m} \ge 0 \\ y_{j,1} + \ldots + y_{j,m} = x_j}} \binom{x_j}{y_{j,1}, \ldots, y_{j,m}} \sum_{a=1}^m \frac{y_{j,a}(y_{j,a} - 1)}{2} \quad (4.34)

= \sum_{a=1}^m \sum_{\substack{y_{j,1}, \ldots, y_{j,m} \ge 0 \\ y_{j,1} + \ldots + y_{j,m} = x_j}} \binom{x_j}{y_{j,1}, \ldots, y_{j,m}} \frac{y_{j,a}(y_{j,a} - 1)}{2} \quad (4.35)

= \frac{1}{2} \sum_{a=1}^m \sum_{\substack{y_{j,1}, \ldots, y_{j,a} - 2, \ldots, y_{j,m} \ge 0 \\ y_{j,1} + \ldots + (y_{j,a} - 2) + \ldots + y_{j,m} = x_j - 2}} \binom{x_j - 2}{y_{j,1}, \ldots, y_{j,a} - 2, \ldots, y_{j,m}} x_j (x_j - 1) \quad (4.36)

= \frac{1}{2} x_j (x_j - 1) \sum_{a=1}^m m^{x_j - 2} \quad (4.37)

= m^{x_j - 1} \frac{x_j (x_j - 1)}{2} = m^{x_j - 1} N(x_i, x_j).

Lemma 27. Suppose we have y \in \mathcal{C}^{M_0, m}, let M_0 be the initial number of particles, and let K(i, j) be the kernel between two particles of sizes i and j. The coagulation rate between size i and j particles for the parallel stochastic particle method after a mixing step is the same as the rate of Gillespie's method with x \in \mathcal{C}^{M_0}, where x = \mathcal{S}(y). That is,

\tilde{C}^m_{ij}(\mathcal{S}(y)) = C_{ij}(x) = \frac{K(i,j)\, N(x_i, x_j)}{M_0}. \quad (4.38)

M0   m ˜m Y 1 X xk X m Cij ( (y)) = x Cij ( a(˜y)). (4.39) S m k y˜k,1,..., y˜k,m I k=1 y˜k,1,...,y˜k,m≥0 a=1 y˜k,1+...+˜yk,m=xk

The last term of (4.39) is given

m m X X K(i, j)N(˜yi,a, y˜j,a) Cm( (˜y)) = . (4.40) ij a M /m a=1 I a=1 0

For k = j, from the result of lemma 26,

\frac{1}{m^{x_j}} \sum_{\substack{\tilde{y}_{j,1}, \ldots, \tilde{y}_{j,m} \ge 0 \\ \tilde{y}_{j,1} + \ldots + \tilde{y}_{j,m} = x_j}} \binom{x_j}{\tilde{y}_{j,1}, \ldots, \tilde{y}_{j,m}} \sum_{a=1}^m \frac{K(i,j)\, N(\tilde{y}_{i,a}, \tilde{y}_{j,a})}{M_0 / m} \quad (4.41)

= \frac{1}{m^{x_j}}\, m^{x_j - 1}\, \frac{K(i,j)\, N(\mathcal{S}(\tilde{y})_i, \mathcal{S}(\tilde{y})_j)}{M_0 / m} = \frac{K(i,j)\, N(x_i, x_j)}{M_0}. \quad (4.42)

For k \neq j, the multinomial theorem [46] gives

\frac{1}{m^{x_k}} \sum_{\substack{\tilde{y}_{k,1}, \ldots, \tilde{y}_{k,m} \ge 0 \\ \tilde{y}_{k,1} + \ldots + \tilde{y}_{k,m} = x_k}} \binom{x_k}{\tilde{y}_{k,1}, \ldots, \tilde{y}_{k,m}} = 1. \quad (4.43)

Therefore we have the following:

\tilde{C}^m_{ij}(\mathcal{S}(y)) = \frac{K(i,j)\, N(x_i, x_j)}{M_0} = C_{ij}(x). \quad (4.44)

To derive the probability density function of an inter-arrival time of the parallel stochastic particle algorithm we first consider when we have only one mix of particles.

Lemma 28. Suppose we have a particle configuration y \in \mathcal{C}^{M_0, m} and mix the particles. Then the probability that no coagulation occurs during a time interval \tau_{mix} is e^{-C_0(x)\tau_{mix}}, where x = \mathcal{S}(y).

Proof. From lemma 27, the rate of coagulation between particles of sizes i and j is identical to the rate of Gillespie's method, C_{ij}(x). Given this rate, we can easily see without any further effort that the probability that no particles coagulate during \tau_{mix} is e^{-C_0(x)\tau_{mix}}. For details see [15, (7a)].

Now we have an explicit form of the probability density function of an inter-arrival time.

Lemma 29. Suppose we have a particle configuration y \in \mathcal{C}^{M_0, m} at time t and x \in \mathcal{C}^{M_0} such that x = \mathcal{S}(y). Then the probability density function of an inter-arrival time is given as

f_{\tau_{mix}}(\tau; t) = \begin{cases} \lambda_0(y)\, e^{-\lambda_0(y)(\tau - t)} & t \le \tau < \tau_{next}(t, \tau_{mix}) \\ C_0(x)\, e^{-C_0(x)(\tau - \tau_{next}(t, \tau_{mix}))}\, e^{-\lambda_0(y)(\tau_{next}(t, \tau_{mix}) - t)} & \tau \ge \tau_{next}(t, \tau_{mix}). \end{cases} \quad (4.45)

Here

\lambda_0(y) = \sum_{a=1}^m C^m_0(\mathcal{I}_a(y)) \quad (4.46)

and

\tau_{next}(t, \tau_{mix}) = t + \tau_{mix} - (t \bmod \tau_{mix}). \quad (4.47)

Proof. Suppose the particles are mixed n times and the inter-arrival time \tau falls before the (n+1)-th mixing. Then no coagulation occurs over n\tau_{mix} and we have the following probability density:

\mathrm{Prob}\{\text{coagulation happens in the time interval } (\tau, \tau + d\tau)\}, \quad (4.48)

\text{for } \tau_{next}(t, \tau_{mix}) + n\tau_{mix} \le \tau < \tau_{next}(t, \tau_{mix}) + (n+1)\tau_{mix}, \quad (4.49)

= e^{-\lambda_0(y)(\tau_{next}(t, \tau_{mix}) - t)}\, e^{-C_0(x)\tau_{mix} n}\, C_0(x)\, e^{-C_0(x)(\tau - (\tau_{next}(t, \tau_{mix}) + n\tau_{mix}))}\, d\tau \quad (4.50)

= C_0(x)\, e^{-C_0(x)(\tau - \tau_{next}(t, \tau_{mix}))}\, e^{-\lambda_0(y)(\tau_{next}(t, \tau_{mix}) - t)}\, d\tau. \quad (4.51)

Since this holds for all n \ge 0, we have the desired result.

As we can see, the probability density function of \tau is not exponential and has a discontinuity. Furthermore, it is not memoryless, because the past coagulation time has an influence on the next coagulation time, which is reflected through \tau_{next}(t, \tau_{mix}). We now prove in the following lemma that inter-arrival times drawn from f_{\tau_{mix}}(\tau) converge weakly to exponential random variables with the rate parameter of Gillespie's method.

Lemma 30. Suppose we have a particle configuration y \in \mathcal{C}^{M_0, m} at time t and x \in \mathcal{C}^{M_0} such that x = \mathcal{S}(y). Then the inter-arrival time of the parallel stochastic particle algorithm with an arbitrary \tau_{mix} > 0, drawn from f_{\tau_{mix}}(\tau) in (4.45), converges weakly to the exponential random variable with the rate parameter C_0(x) of Gillespie's method as \tau_{mix} \to 0.

Proof. We will show that at any point T \in [t, \infty) the cumulative distribution F_{\tau_{mix}}(T) converges to F(T) = 1 - e^{-C_0(x)(T - t)} as \tau_{mix} \to 0. For T \le \tau_{next}(t, \tau_{mix}),

F_{\tau_{mix}}(T) = 1 - e^{-\lambda_0(y)(T - t)}. \quad (4.52)

For T τnext(t, τmix), ≥

Z τnext(t,τmix) −λ0(y)(τ−t) Fτmix (T ) = λ0(y)e dτ (4.53) t Z T −C0(x)(τ−τnext(t,τmix)) −λ0(y)(τnext(t,τmix)−t) + C0(x)e e dτ (4.54) τnext(t,τmix) = 1 eλ0(y)(τnext(t,τmix)−t) (4.55) − CHAPTER 4. PARALLEL STOCHASTIC SIMULATION OF COAGULATION67

+ eλ0(y)(τnext(t,τmix)−t)eC0(x)τnext(t,τmix)(e−C0(x)τnext(t,τmix) e−C0(x)T ) (4.56) − = 1 eλ0(y)(τnext(t,τmix)−t)e−C0(x)(T −τnext(t,τmix)). (4.57) −

If \tau_{mix} \to 0, then \tau_{next}(t, \tau_{mix}) \to t, and thus

\lim_{\tau_{mix} \to 0} F_{\tau_{mix}}(T) = 1 - e^{-C_0(x)(T - t)} = F(T). \quad (4.58)
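Lemma 30 can also be checked by direct sampling: the two-phase density (4.45) is sampled by drawing an exponential of rate \lambda_0(y) starting at t and, if it overshoots the next mixing time, restarting from \tau_{next} with rate C_0(x). A sketch, with illustrative names:

    import numpy as np

    def sample_interarrival(t, tau_mix, lam0, c0, rng):
        # Draw an arrival time from f_{tau_mix}(tau; t) in (4.45):
        # rate lam0 until the next mixing time, rate c0 afterwards.
        t_next = t + tau_mix - (t % tau_mix)
        tau = t + rng.exponential(1.0 / lam0)
        if tau < t_next:
            return tau
        return t_next + rng.exponential(1.0 / c0)

    # As tau_mix -> 0 the samples approach t + Exp(c0), matching (4.58).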

Suppose Gillespie's algorithm produces a sequence of particle configurations, x_n \in \mathcal{C}^{M_0}_{M_0 - n} for n = 0, \ldots, M_0 - 1, at an arbitrary sequence of times 0 = t_0 < t_1 < \cdots < t_l, as illustrated in figure 4.6.

Figure 4.6: Particular realization of particle coagulation using Gillespie's algorithm.

By introducing X = (X(t) : t \ge 0) as a \mathcal{C}^{M_0}-valued stochastic process, the probability density of this event is

P(X(t_0) = x_0, X(t_1) = x_1, \ldots, X(t_l) = x_l)\, dt_0\, dt_1 \cdots dt_l. \quad (4.59)

This probability is written as the product of l + 1 density functions.

P(X(t_0) = x_0, X(t_1) = x_1, \ldots, X(t_l) = x_l)

= P(X(t_0) = x_0)\, P(X(t_1) = x_1 \mid X(t_0) = x_0) \cdots P(X(t_l) = x_l \mid X(t_{l-1}) = x_{l-1}). \quad (4.60)

Similarly, for the parallel stochastic method we can define the probability density function using Y = (Y(t) : t \ge 0) as a \mathcal{C}^{M_0, m}-valued stochastic process,

P_{\tau_{mix}}(Y(t_0) = y_0, Y(t_1) = y_1, \ldots, Y(t_l) = y_l)\, dt_0\, dt_1 \cdots dt_l, \quad (4.61)

which can be written as

P_{\tau_{mix}}(Y(t_0) = y_0, Y(t_1) = y_1, \ldots, Y(t_l) = y_l)

= P_{\tau_{mix}}(Y(t_0) = y_0)\, P_{\tau_{mix}}(Y(t_1) = y_1 \mid Y(t_0) = y_0) \cdots P_{\tau_{mix}}(Y(t_l) = y_l \mid Y(t_{l-1}) = y_{l-1}). \quad (4.62)

We claim the convergence of the parallel stochastic particle method to Gillespie's method:

\lim_{\tau_{mix} \to 0} \sum_{y_0 \in \mathcal{S}^{-1}(x_0)} \sum_{y_1 \in \mathcal{S}^{-1}(x_1)} \cdots \sum_{y_l \in \mathcal{S}^{-1}(x_l)} P_{\tau_{mix}}(Y(t_0) = y_0, Y(t_1) = y_1, \ldots, Y(t_l) = y_l)

= P(X(t_0) = x_0, X(t_1) = x_1, \ldots, X(t_l) = x_l), \quad (4.63)

which will be shown by induction. With the given initial probability P(X(t_0) = x_0), let us begin with what P(X(t_l) = x_l \mid X(t_{l-1}) = x_{l-1}) looks like.

Lemma 31. Suppose x_l = x_{l-1} + \delta^{M_0}_{i+j} - (\delta^{M_0}_i + \delta^{M_0}_j), where x_{l-1,i}, x_{l-1,j} \ge 1 and the M_0-dimensional vector \delta^{M_0}_i is defined as the zero vector except for the i-th element, whose value is 1. The probability density that we have the coagulation between sizes i and j between times t_l and t_l + d\tau, given that the particle configuration is x_{l-1} at time t_{l-1}, is as follows:

P(X(t_l) = x_l \mid X(t_{l-1}) = x_{l-1}) = C_{ij}(x_{l-1})\, e^{-C_0(x_{l-1})(t_l - t_{l-1})}. \quad (4.64)

Proof. Almost the same probability is derived by Gillespie [15], but we consider i and j as the sizes of particles instead of particle indices. Following the derivation by Gillespie [15], the probability P(X(t_l) = x_l \mid X(t_{l-1}) = x_{l-1}) is obtained as the product of three terms: (i) the probability that none of the droplets coalesce in the time interval (t_{l-1}, t_l), (ii) the probability that the droplets with sizes i and j coalesce in the next differential time interval (t_l, t_l + d\tau), and (iii) the probability that no other droplets coalesce in that same differential time interval. It turns out that these probabilities are

(i) = e^{-C_0(x_{l-1})(t_l - t_{l-1})} \quad (4.65)

(ii) = C_{ij}(x_{l-1})\, d\tau \quad (4.66)

(iii) = \prod_{\substack{k=1 \\ k \neq i}}^{M_0} \prod_{\substack{l=k \\ l \neq j}}^{M_0} \left( 1 - C_{kl}(x_{l-1})\, d\tau \right). \quad (4.67)

Therefore we have

P(X(t_l) = x_l \mid X(t_{l-1}) = x_{l-1}) = C_{ij}(x_{l-1})\, e^{-C_0(x_{l-1})(t_l - t_{l-1})}. \quad (4.68)

Furthermore, for the sequence of times,

P(X(t_0) = x_0, X(t_1) = x_1, \ldots, X(t_l) = x_l) = P(X(t_0) = x_0) \prod_{k=1}^l C_{ij}(x_{k-1})\, e^{-C_0(x_{k-1})(t_k - t_{k-1})}. \quad (4.69)

The transition probability for the parallel method is a little more complicated because of the mixing step, which causes two different cases: with or without a mixing before the coagulation. Note that we have the following lemma.

Lemma 32. Suppose we have arbitrary times 0 < t_{l-1} < t_l and y_l \in \mathcal{C}^m_{M_0 - l}, y_{l-1} \in \mathcal{C}^m_{M_0 - (l-1)} such that

\mathcal{S}(y_l) = \mathcal{S}(y_{l-1}) + \delta^{M_0}_{i+j} - (\delta^{M_0}_i + \delta^{M_0}_j). \quad (4.70)

Then the transition probability for the parallel method, P_{\tau_{mix}}(Y(t_l) = y_l \mid Y(t_{l-1}) = y_{l-1}), has the following property:

\lim_{\tau_{mix} \to 0} P_{\tau_{mix}}(Y(t_l) = y_l \mid Y(t_{l-1}) = y_{l-1}) = \prod_{i=1}^{M_0} \frac{1}{m^{(y_{l_{i,1}} + \cdots + y_{l_{i,m}})}} \binom{y_{l_{i,1}} + \cdots + y_{l_{i,m}}}{y_{l_{i,1}}, \ldots, y_{l_{i,m}}} \quad (4.71)

\times C_{ij}(\mathcal{S}(y_{l-1}))\, e^{-C_0(\mathcal{S}(y_{l-1}))(t_l - t_{l-1})}. \quad (4.72)

Proof. Note that we have two different situations:

P_{\tau_{mix}}(Y(t_l) = y_l \mid Y(t_{l-1}) = y_{l-1}) \quad (4.73)

= P_{\tau_{mix}}(Y(t_l) = y_l \mid t_l \le \tau_{next}(t_{l-1}, \tau_{mix}), Y(t_{l-1}) = y_{l-1}) \quad (4.74)

\times P_{\tau_{mix}}(t_l \le \tau_{next}(t_{l-1}, \tau_{mix}) \mid Y(t_{l-1}) = y_{l-1}) \quad (4.75)

+ P_{\tau_{mix}}(Y(t_l) = y_l \mid t_l > \tau_{next}(t_{l-1}, \tau_{mix}), Y(t_{l-1}) = y_{l-1}) \quad (4.76)

\times P_{\tau_{mix}}(t_l > \tau_{next}(t_{l-1}, \tau_{mix}) \mid Y(t_{l-1}) = y_{l-1}), \quad (4.77)

where \tau_{next}(t_{l-1}, \tau_{mix}) = t_{l-1} + \tau_{mix} - (t_{l-1} \bmod \tau_{mix}), and this specifies the next mixing time after t_{l-1}. Exploring the terms appearing in (4.73)-(4.77) one by one, the first and second terms are

P_{\tau_{mix}}(Y(t_l) = y_l \mid t_l \le \tau_{next}(t_{l-1}, \tau_{mix}), Y(t_{l-1}) = y_{l-1}) = \sum_{\substack{a=1 \\ y_{l-1_{i,a}} \ge 1,\ y_{l-1_{j,a}} \ge 1}}^m C^m_{ij}(\mathcal{I}_a(y_{l-1}))\, e^{-C^m_0(\mathcal{I}_a(y_{l-1}))(t_l - t_{l-1})}, \quad (4.78)

P_{\tau_{mix}}(t_l \le \tau_{next}(t_{l-1}, \tau_{mix}) \mid Y(t_{l-1}) = y_{l-1}) = 1 - e^{-\sum_{a=1}^m C^m_0(\mathcal{I}_a(y_{l-1}))(\tau_{next}(t_{l-1}, \tau_{mix}) - t_{l-1})}. \quad (4.79)

−1 Pτ (Y (tl) = (xl) tl > τnext(tl−1, τmix),Y (tl−1) = yl−1). (4.80) mix S |

From (4.45) and lemma 30, the probability (i) is given as

(i) = e^{-\lambda_0(y_{l-1})(\tau_{next}(t_{l-1}, \tau_{mix}) - t_{l-1})}\, e^{-C_0(\mathcal{S}(y_{l-1}))(t_l - \tau_{next}(t_{l-1}, \tau_{mix}))}. \quad (4.81)

The probabilities (ii) and (iii) are calculated by computing the rate of coagulation between particles of sizes i and j at any node in the case of the parallel stochastic method. From lemma 27, it turns out that this rate is identical to that of Gillespie's method under the condition of a perfect mixing. Thus, for x_{l-1} = \mathcal{S}(y_{l-1}), the probabilities (ii) and (iii) are

(ii) = C_{ij}(\mathcal{S}(y_{l-1}))\, d\tau \quad (4.82)

(iii) = \prod_{\substack{l=1 \\ l \neq i}}^{M_0} \prod_{\substack{n=l \\ n \neq j}}^{M_0} \left( 1 - C_{ln}(\mathcal{S}(y_{l-1}))\, d\tau \right). \quad (4.83)

The transition probability density (4.80) can be obtained as the product of these three probabilities divided by the infinitesimal d\tau:

P_{\tau_{mix}}(Y(t_l) = \mathcal{S}^{-1}(x_l) \mid t_l > \tau_{next}(t_{l-1}, \tau_{mix}), Y(t_{l-1}) = y_{l-1})

= C_{ij}(\mathcal{S}(y_{l-1}))\, e^{-\lambda_0(y_{l-1})(\tau_{next}(t_{l-1}, \tau_{mix}) - t_{l-1})}\, e^{-C_0(\mathcal{S}(y_{l-1}))(t_l - \tau_{next}(t_{l-1}, \tau_{mix}))}. \quad (4.84)

Now we consider (iv), the probability of having a particular particle configuration yl after the mixing.

Pτmix (Y (tl) = yl tl > τnext(tl−1, τmix)Y (tl−1) = yl−1) −1 | Pτ (Y (tl) = (xl) tl > τnext(tl−1, τmix),Y (tl−1) = yl−1) mix S | M0   Y 1 yli,1 + + yli,m = (y +···+y ) ··· . (4.85) li,1 li,m y , . . . , y i=1 m li,1 li,m

This is a multinomial distribution, because the particles are distributed uniformly at random across the nodes after the mixing. Putting (4.84) and (4.85) together gives the following result:

P_{\tau_{mix}}(Y(t_l) = y_l \mid t_l > \tau_{next}(t_{l-1}, \tau_{mix}), Y(t_{l-1}) = y_{l-1}) \quad (4.86)

= \prod_{i=1}^{M_0} \frac{1}{m^{(y_{l_{i,1}} + \cdots + y_{l_{i,m}})}} \binom{y_{l_{i,1}} + \cdots + y_{l_{i,m}}}{y_{l_{i,1}}, \ldots, y_{l_{i,m}}} \quad (4.87)

\times C_{ij}(\mathcal{S}(y_{l-1}))\, e^{-\lambda_0(y_{l-1})(\tau_{next}(t_{l-1}, \tau_{mix}) - t_{l-1})}\, e^{-C_0(\mathcal{S}(y_{l-1}))(t_l - \tau_{next}(t_{l-1}, \tau_{mix}))}. \quad (4.88)

The last term is simply given as

P_{\tau_{mix}}(t_l > \tau_{next}(t_{l-1}, \tau_{mix}) \mid Y(t_{l-1}) = y_{l-1}) = e^{-\sum_{a=1}^m C^m_0(\mathcal{I}_a(y_{l-1}))(\tau_{next}(t_{l-1}, \tau_{mix}) - t_{l-1})}. \quad (4.89)

Because only the last two terms remain as \tau_{mix} \to 0,

\lim_{\tau_{mix} \to 0} P_{\tau_{mix}}(Y(t_l) = y_l \mid Y(t_{l-1}) = y_{l-1}) \quad (4.90)

= \lim_{\tau_{mix} \to 0} \prod_{i=1}^{M_0} \frac{1}{m^{(y_{l_{i,1}} + \cdots + y_{l_{i,m}})}} \binom{y_{l_{i,1}} + \cdots + y_{l_{i,m}}}{y_{l_{i,1}}, \ldots, y_{l_{i,m}}} \quad (4.91)

\times C_{ij}(\mathcal{S}(y_{l-1}))\, e^{-\lambda_0(y_{l-1})(\tau_{next}(t_{l-1}, \tau_{mix}) - t_{l-1})}\, e^{-C_0(\mathcal{S}(y_{l-1}))(t_l - \tau_{next}(t_{l-1}, \tau_{mix}))} \quad (4.92)

= \prod_{i=1}^{M_0} \frac{1}{m^{(y_{l_{i,1}} + \cdots + y_{l_{i,m}})}} \binom{y_{l_{i,1}} + \cdots + y_{l_{i,m}}}{y_{l_{i,1}}, \ldots, y_{l_{i,m}}}\, C_{ij}(\mathcal{S}(y_{l-1}))\, e^{-C_0(\mathcal{S}(y_{l-1}))(t_l - t_{l-1})}.

Finally we have the convergence of the probability density function as follows.

Theorem 33. Suppose we have Gillespie's algorithm and the parallel stochastic particle algorithm provided in Algorithm 2. If the rate of information exchange goes to infinity (i.e. \tau_{mix} \to 0), then the stochastic solution of Smoluchowski's coagulation equation given by the parallel stochastic particle algorithm converges to that of Gillespie's algorithm. That is,

\lim_{\tau_{mix} \to 0} \sum_{y_0 \in \mathcal{S}^{-1}(x_0)} \sum_{y_1 \in \mathcal{S}^{-1}(x_1)} \cdots \sum_{y_l \in \mathcal{S}^{-1}(x_l)} P_{\tau_{mix}}(Y(t_0) = y_0, Y(t_1) = y_1, \ldots, Y(t_l) = y_l)

= P(X(t_0) = x_0, X(t_1) = x_1, \ldots, X(t_l) = x_l). \quad (4.93)

Proof. Induction is used to prove the result. For t = t_0 and \mathcal{S}(y_0) = x_0 we have

P(X(t_0) = x_0) = \sum_{y_0 \in \mathcal{S}^{-1}(x_0)} P_{\tau_{mix}}(Y(t_0) = y_0). \quad (4.94)

Now for t = t1,

$$ \lim_{\tau_{mix} \to 0} \sum_{y_0 \in S^{-1}(x_0)} \sum_{y_1 \in S^{-1}(x_1)} P_{\tau_{mix}}\bigl(Y(t_0) = y_0, Y(t_1) = y_1\bigr) \quad (4.95) $$

$$ = \sum_{y_0 \in S^{-1}(x_0)} \sum_{y_1 \in S^{-1}(x_1)} P_{\tau_{mix}}(Y(t_0) = y_0) \lim_{\tau_{mix} \to 0} P_{\tau_{mix}}\bigl(Y(t_1) = y_1 \mid Y(t_0) = y_0\bigr) \quad (4.96) $$

$$ = \sum_{y_0 \in S^{-1}(x_0)} \sum_{y_1 \in S^{-1}(x_1)} P_{\tau_{mix}}(Y(t_0) = y_0) \prod_{i=1}^{M_0} \frac{1}{m^{(y_{1_{i,1}} + \cdots + y_{1_{i,m}})}} \binom{y_{1_{i,1}} + \cdots + y_{1_{i,m}}}{y_{1_{i,1}}, \ldots, y_{1_{i,m}}} \quad (4.97) $$

$$ \times\; C_{ij}(S(y_0))\, e^{-(C_0(S(y_0)))\,(t_1 - t_0)} \quad (4.98) $$

$$ = \sum_{y_0 \in S^{-1}(x_0)} P_{\tau_{mix}}(Y(t_0) = y_0)\, C_{ij}(S(y_0))\, e^{-(C_0(S(y_0)))\,(t_1 - t_0)} \quad (4.99) $$

$$ \times \sum_{y_1 \in S^{-1}(x_1)} \prod_{i=1}^{M_0} \frac{1}{m^{(y_{1_{i,1}} + \cdots + y_{1_{i,m}})}} \binom{y_{1_{i,1}} + \cdots + y_{1_{i,m}}}{y_{1_{i,1}}, \ldots, y_{1_{i,m}}} \quad (4.100) $$

$$ = P(X(t_0) = x_0)\, C_{ij}(x_0)\, e^{-(C_0(x_0))\,(t_1 - t_0)} \prod_{i=1}^{M_0} \frac{1}{m^{x_{1_i}}} \sum_{\substack{y_{1_{i,1}}, \ldots, y_{1_{i,m}} \ge 0 \\ y_{1_{i,1}} + \cdots + y_{1_{i,m}} = x_{1_i}}} \binom{x_{1_i}}{y_{1_{i,1}}, \ldots, y_{1_{i,m}}} \quad (4.101) $$

$$ = P(X(t_0) = x_0)\, C_{ij}(x_0)\, e^{-(C_0(x_0))\,(t_1 - t_0)}. \quad (4.102) $$

The second equality is from lemma 32 and the last equality is based on the multinomial theorem [41]. Now suppose that for $t = t_{l-1}$ we have

$$ \lim_{\tau_{mix} \to 0} \sum_{y_0 \in S^{-1}(x_0)} \cdots \sum_{y_{l-1} \in S^{-1}(x_{l-1})} P_{\tau_{mix}}(t_0, y_0, \ldots, t_{l-1}, y_{l-1}) = P(t_0, x_0, \ldots, t_{l-1}, x_{l-1}). \quad (4.103) $$

Then applying lemma 32 to $P_{\tau_{mix}}(Y(t_l) = y_l \mid Y(t_{l-1}) = y_{l-1})$ and repeating the same procedure as for $t = t_1$ yields the following.

$$ \lim_{\tau_{mix} \to 0} \sum_{y_0 \in S^{-1}(x_0)} \cdots \sum_{y_{l-1} \in S^{-1}(x_{l-1})} \sum_{y_l \in S^{-1}(x_l)} P_{\tau_{mix}}\bigl(Y(t_0) = y_0, \ldots, Y(t_{l-1}) = y_{l-1}, Y(t_l) = y_l\bigr) $$

$$ = P\bigl(X(t_0) = x_0, \ldots, X(t_{l-1}) = x_{l-1}\bigr)\, C_{ij}(x_{l-1})\, e^{-(C_0(x_{l-1}))\,(t_l - t_{l-1})} \quad (4.104) $$

$$ = P\bigl(X(t_0) = x_0, \ldots, X(t_{l-1}) = x_{l-1}\bigr)\, P\bigl(X(t_l) = x_l \mid X(t_{l-1}) = x_{l-1}\bigr) \quad (4.105) $$

$$ = P\bigl(X(t_0) = x_0, \ldots, X(t_{l-1}) = x_{l-1}, X(t_l) = x_l\bigr). \quad (4.106) $$

Therefore for any $0 = t_0 < t_1 < \cdots < t_l$ we have

$$ \lim_{\tau_{mix} \to 0} \sum_{y_0 \in S^{-1}(x_0)} \sum_{y_1 \in S^{-1}(x_1)} \cdots \sum_{y_l \in S^{-1}(x_l)} P_{\tau_{mix}}\bigl(Y(t_0) = y_0, Y(t_1) = y_1, \ldots, Y(t_l) = y_l\bigr) $$

$$ = P\bigl(X(t_0) = x_0, X(t_1) = x_1, \ldots, X(t_l) = x_l\bigr). \quad (4.107) $$
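As an informal numerical illustration of Theorem 33 (a minimal sketch under simplifying assumptions, not the code used for the experiments in this chapter): take a constant kernel $C_{ij} = c$ and compare the mean number of particles remaining at time $T$ under Gillespie's method on the pooled system with the parallel scheme in which each of $m$ nodes runs Gillespie's method locally and all particles are re-mixed every $\tau_{mix}$. With a constant kernel, particle sizes do not affect the event rates, so tracking particle counts alone suffices. All function names and parameter values below are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    def gillespie_count(n, c, T):
        """Particles remaining at time T under Gillespie's method with a
        constant kernel: each unordered pair coagulates at rate c, so the
        total event rate with n particles is c * n * (n - 1) / 2."""
        t = 0.0
        while n > 1:
            t += rng.exponential(1.0 / (c * n * (n - 1) / 2.0))
            if t > T:
                break
            n -= 1  # one coagulation merges two particles into one
        return n

    def parallel_count(n, c, T, m, tau_mix):
        """Parallel scheme: m nodes coagulate independently between mixing
        times; at each mixing time all particles are pooled and reassigned
        to the nodes uniformly at random, as in (4.85)."""
        counts = rng.multinomial(n, [1.0 / m] * m)
        t = 0.0
        while t < T:
            dt = min(tau_mix, T - t)
            counts = np.array([gillespie_count(k, c, dt) for k in counts])
            counts = rng.multinomial(counts.sum(), [1.0 / m] * m)  # mix
            t += dt
        return counts.sum()

    n0, c, T, m, runs = 64, 0.01, 2.0, 4, 2000
    print("Gillespie:", np.mean([gillespie_count(n0, c, T) for _ in range(runs)]))
    for tau in (1.0, 0.1, 0.01):
        est = np.mean([parallel_count(n0, c, T, m, tau) for _ in range(runs)])
        print("parallel, tau_mix =", tau, ":", est)

As $\tau_{mix}$ decreases, the parallel estimate approaches the Gillespie estimate to within Monte Carlo error, consistent with the theorem.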

4.5 Conclusions

The classical single-processor approach to simulating Smoluchowski's coagulation equation, Gillespie's method, was reviewed, and a new distributed algorithm based on that method was presented. The numerical example and the theoretical analysis showed that as particles are exchanged at an increasing rate, the simulation produced by the proposed distributed algorithm converges to the simulation produced by Gillespie's method. Further analysis of the computational load of particle exchanges would be necessary to understand the full computational cost, but a reduction in overall computation time is expected based on the reduced per-processor load.

Bibliography

[1] H. Akashi and H. Kumamoto. Construction of discrete-time nonlinear filter by Monte Carlo methods with variance-reducing techniques. Systems and Control, 19:211–221, 1975.

[2] D. J. Aldous. Deterministic and stochastic models for coalescence (aggregation and coagulation): A review of the mean-field theory for probabilists. Bernoulli, 5(1):3–48, 1999.

[3] B. D. O. Anderson and J. B. Moore. Optimal filtering. Prentice-Hall, 1979.

[4] M. Coates. Distributed particle filters for sensor networks. In Proc. of the 3rd Workshop on Information Processing in Sensor Networks (IPSN), pages 99–107, 2004.

[5] D. Crisan and A. Doucet. A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions on Signal Processing, 50(3):736–746, Mar 2002.

[6] D. Crisan, P. Del Moral, and T. Lyons. Discrete filtering using branching and interacting particle systems. Markov Proc. Rel. Fields, 5:293–318, 1998.

[7] Petar M. Djurić, Jayesh H. Kotecha, Jianqiu Zhang, Yufei Huang, Tadesse Ghirmai, Mónica F. Bugallo, and Joaquín Míguez. Particle filtering. IEEE Signal Processing Magazine, 20(5):19–38, Sep 2003.

[8] A. Doucet, N. De Freitas, and N. Gordon. Sequential Monte Carlo methods in practice. Springer-Verlag, 2001.


[9] R. Durrett. Probability: Theory and Examples. Duxbury Press, second edition, 1996.

[10] A. Eibeck and W. Wagner. Approximative solution of the coagulation-fragmentation equation by stochastic particle systems. Stochastic Anal. Appl., 18:921–948, 2000.

[11] A. Eibeck and W. Wagner. An efficient stochastic algorithm for studying coagulation dynamics and gelation phenomena. SIAM J. Sci. Comput., 22(3):802–821, 2000.

[12] A. Eibeck and W. Wagner. Stochastic particle approximations for Smoluchowski's coagulation equation. The Annals of Applied Probability, 11(4):1137–1165, 2001.

[13] Nando De Freitas. Rao-Blackwellised particle filtering for fault diagnosis. In IEEE Aerospace, pages 1767–1772, 2001.

[14] D. T. Gillespie. The stochastic coalescence model for cloud droplet growth. Journal of the Atmospheric Sciences, 29:1496–1510, 1972.

[15] D. T. Gillespie. An exact method for numerically simulating the stochastic coalescence process in a cloud. Journal of the Atmospheric Sciences, 32:1977–1989, 1975.

[16] D.T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys., 22:403–434, 1976.

[17] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In IEE Proceedings F (Radar and Signal Processing), volume 140, pages 107–113, Apr 1993.

[18] D. Gu. Distributed particle filter for target tracking. In 2007 IEEE International Conference on Robotics and Automation, pages 3856–3861, April 2007.

[19] D. Gu, J. Sun, Z. Hu, and H. Li. Consensus based distributed particle filter in sensor networks. In 2008 International Conference on Information and Automation (ICIA), pages 302–307, June 2008.

[20] F. Guiaş. A Monte Carlo approach to the Smoluchowski equations. Monte Carlo Methods Appl., 3:313–326, 1997.

[21] J.M. Hammersley and K.W. Morton. Symposium on Monte Carlo methods: Poor man’s Monte Carlo. J. R. Stat. Soc., Ser. B, 16:23–38, 1954.

[22] J.E. Handschin. Monte Carlo techniques for prediction and filtering of non-linear stochastic processes. Automatica, 6:555–563, 1970.

[23] J. E. Handschin and D. Q. Mayne. Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering. International Journal of Control, 9:547–559, 1969.

[24] Xiao-Li Hu, T. B. Schön, and L. Ljung. A basic convergence result for particle filtering. IEEE Transactions on Signal Processing, 56(4):1337–1348, 2008.

[25] I. Jeon. Existence of gelling solutions for coagulation-fragmentation equations. Communications in Mathematical Physics, 194:541–567, 1998.

[26] Jong-Han Kim, M. West, E. Scholte, and S. Narayanan. Multiscale consensus for decentralized estimation and its application to building systems. In 2008 American Control Conference, pages 888–893, June 2008.

[27] Genshiro Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1–25, 1996.

[28] A. Lasota and M. C. Mackey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Springer-Verlag, 1994.

[29] Sun Hwan Lee and Matthew West. Markov chain distributed particle filters (MCDPF). In 48th IEEE Conference on Decision and Control / 28th Chinese Control Conference, pages 5496–5501, Dec. 2009.

[30] Jun S. Liu and Rong Chen. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93:1032–1044, 1998.

[31] Simon Maskell and Neil Gordon. A tutorial on particle filters for on-line nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50:174–188, 2001.

[32] S. P. Meyn and R. L. Tweedie. Markov chains and stochastic stability. Springer-Verlag, 1993.

[33] C. Musso, N. Oudjane, and Franç. Legland. Improving regularized particle filters. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice, pages 247–271. Statistics for Engineering and Information Science, 2001.

[34] J. R. Norris. Smoluchowski’s coagulation equation: Uniqueness, nonuniqueness and a hydrodynamic limit for the stochastic coalescent. Ann. Appl. Probab., 9(1):78–109, 1999.

[35] R. Olfati-Saber. Distributed Kalman filter with embedded consensus filters. In 44th IEEE Conference on Decision and Control and 2005 European Control Conference, pages 8179–8184, Dec. 2005.

[36] R. Olfati-Saber. Distributed Kalman filtering for sensor networks. In 2007 46th IEEE Conference on Decision and Control, pages 5492–5498, Dec. 2007.

[37] R. Olfati-Saber and J. S. Shamma. Consensus filters for sensor networks and distributed sensor fusion. In 44th IEEE Conference on Decision and Control and 2005 European Control Conference, pages 6698–6703, Dec. 2005.

[38] L. Ong, B. Upcroft, M. Ridley, T. Bailey, S. Sukkarieh, and H. Durrant-Whyte. Consistent methods for decentralised data fusion using particle filters. In 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 85–91, Sept. 2006.

[39] L. Perea, J. How, L. Breger, and P. Elosegui. Nonlinearities in sensor fusion: Divergence issues in EKF, modified truncated SOF, and UKF. In AIAA Guidance, Navigation and Control Conference and Exhibit, Hilton Head, South Carolina, Aug. 2007.

[40] Michael K Pitt and Neil Shephard. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599, 1999.

[41] Lennart Råde and Bertil Westergren. Mathematics handbook for science and engineering. Birkhäuser Boston, Inc., Secaucus, NJ, USA, 1995.

[42] B. S. Rao and H. F. Durrant-Whyte. Fully decentralised algorithm for multisensor Kalman filtering. In IEE Proceedings D (Control Theory and Applications), pages 413–420, Sept. 1991.

[43] V. Romanovsky. Discrete Markov Chains. Wolters-Noordhoff Publishing, 1970.

[44] M.N. Rosenbluth and A.W. Rosenbluth. Monte Carlo calculation of the average extension of molecular chains. J. Chem. Phys., 23:356–359, 1956.

[45] M. Rosencrantz, G. Gordon, and S. Thrun. Decentralized sensor fusion with distributed particle filters. In Proceedings of the Conference on Uncertainty in AI (UAI), 2003.

[46] S. M. Ross. Introduction to Probability Models, Ninth Edition. Academic Press, Inc., Orlando, FL, USA, 2006.

[47] W. T. Scott. Poisson statistics in distributions of coalescing droplets. Journal of the Atmospheric Sciences, 24:221–225, 1967.

[48] X. Sheng, Y. Hu, and P. Ramanathan. Distributed particle filter with GMM approximation for multiple targets localization and tracking in wireless sensor network. In IPSN '05: Proceedings of the 4th International Symposium on Information Processing in Sensor Networks, page 24, Piscataway, NJ, USA, 2005. IEEE Press.

[49] D. Steinsaltz. Convergence of moments in a Markov-chain central limit theorem. Indagationes Mathematicae, 12:533–555, 2001.

[50] H. Tanner, A. Jadbabaie, and G. J. Pappas. Coordination of multiple autonomous vehicles. In IEEE Mediterranean Conference on Control and Automation, June 2003.

[51] Howard M. Taylor and Samuel Karlin. An introduction to stochastic modeling. Academic Press, 1998.

[52] M. von Smoluchowski. Drei Vorträge über Diffusion, Brownsche Molekularbewegung und Koagulation von Kolloidteilchen. Phys. Z., 17:557–571, 585–599, 1916.

[53] Lin Xiao, Stephen Boyd, and Sanjay Lall. A space-time diffusion scheme for peer-to-peer least-squares estimation. In IPSN '06: Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pages 168–176, 2006.

[54] V. S. Zaritskii, V. B. Svetnik, and L. I. Shimelevich. Monte Carlo technique in problems of optimal data processing. Automation and Remote Control, 12:95–103, 1975.