
Sequential Monte Carlo Techniques and Bayesian Filtering: Applications in Tracking

Samarjit Das

Abstract—Visual tracking is a very important issue in various applications involving image processing and/or computer vision. The basic idea is to estimate a hidden state sequence from noisy observations under a given state space model. The state space model comprises a system model and an observation model. The system models that we are going to discuss here have the Markov property, and the corresponding state space model is known as a hidden Markov model (HMM). Due to non-linearities in the system and the observation model, Bayesian filtering (also known as particle filtering) is an ideal candidate for the purpose of tracking. Particle filtering is basically a non-linear, non-Gaussian analog of the Kalman filter. It is based on sequential Monte Carlo techniques. In this article we first discuss Monte Carlo techniques for solving numerical problems, especially for computing posterior distributions that involve complicated integrals. We then extend these ideas to sequential Monte Carlo techniques and develop the sequential importance sampling algorithm, leading to particle filtering. We demonstrate applications of particle filtering in areas such as target tracking in a wireless sensor network.

Index Terms—Monte Carlo simulation, sequential Monte Carlo techniques, particle filtering, visual tracking

• This article was submitted as a term paper for the course ComS 577, Fall 2008 at Iowa State University, Ames, IA 50010.

1 MONTE CARLO METHODS

Monte Carlo methods (also referred to as Monte Carlo simulation) are a class of computational algorithms that rely on repeated random sampling to compute their results. These methods are quite often used to simulate physical and mathematical systems. They are most suitable for computation by computers as they rely on repeated computation and random or pseudo-random numbers. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm. A simple example is the numerical computation of complicated integrals using Monte Carlo simulation.

Consider the following case. Say we want to compute,

    I = ∫_a^b ψ(x) dx

where ψ(x) is a function with a very complicated closed form expression making it impossible to integrate in closed form. We can solve this problem numerically using the Monte Carlo technique. First, consider a random variable X distributed over the interval [a, b] with a probability density function (pdf) fX(x). The closed form expression of fX is known and we can draw samples w.r.t. fX(.). We draw N (sufficiently large) samples from the pdf,

    x^i ∼ fX(x), i = 1, ..., N

    f̂X(x) ≜ (1/N) Σ_{i=1}^N δ(x − x^i)

where δ(.) denotes the Dirac delta function. Thus f̂X(x) is the discretized PMF (probability mass function) approximation of the continuous pdf fX. Now the integral can be written as,

    I = ∫_a^b ψ(x) dx = ∫_a^b [ψ(x)/fX(x)] fX(x) dx = E_fX[ψ(X)/fX(X)]

where E_fX[.] denotes the expectation w.r.t. the distribution fX. Now, we can estimate the expectation and thus the integral using the discretized version of the pdf. Or, Î ≈ E_f̂X[ψ(X)/fX(X)], which can be computed as follows,

    Î = ∫_a^b [ψ(x)/fX(x)] [(1/N) Σ_{i=1}^N δ(x − x^i)] dx

    Or, Î = (1/N) Σ_{i=1}^N ψ(x^i)/fX(x^i)    (1)

Using the law of large numbers it can be shown that Î → I as N → ∞. Thus by drawing a large number of samples in the interval [a, b] we are able to numerically approximate the integral I using the Monte Carlo technique. The process of sampling from the distribution fX can be extended to the importance sampling technique, by which we can estimate the properties of a particular distribution while only having samples from a different distribution (known as the importance density or importance function) rather than the distribution of interest. This will be discussed in detail while developing the particle filtering algorithm.
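As an illustration (not part of the original paper's experiments), the following minimal Python sketch estimates I = ∫_a^b ψ(x) dx using equation (1). The sampling density fX is taken to be uniform on [a, b]; the function psi and the endpoints a, b are arbitrary placeholders chosen only for this example.

    import numpy as np

    def mc_integral(psi, a, b, N=100_000, seed=0):
        """Monte Carlo estimate of the integral of psi over [a, b], as in eq. (1)."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(a, b, size=N)     # x^i ~ f_X, here uniform on [a, b]
        f_x = 1.0 / (b - a)               # f_X(x^i) is constant for the uniform density
        return np.mean(psi(x) / f_x)      # (1/N) * sum_i psi(x^i) / f_X(x^i)

    # Example: integral of exp(-x^2) over [0, 2]; the exact value is about 0.882.
    print(mc_integral(lambda x: np.exp(-x**2), 0.0, 2.0))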

Fig. 1. This figure shows the hidden Markov model structure of the state space. The state sequence xt is a Markov process and the observations yt, t = 1, 2, ... are conditionally independent given the state at t.

2 BAYESIAN FILTERING PROBLEM

Consider a state-space model with t = 0, ..., T,

    xt = φ(xt−1) + nt    (2)
    yt = g(xt) + vt    (3)

where xt and yt denote the state and the observation at the current instant respectively. In real-life applications, for example, the state can be the position of a target while the observation is the noisy sensor data about the current position, and our goal could be to extract the true 'hidden' state information using the observations and the state dynamical model (the system model, eq (2)). The functions φ(.) and g(.) can be any linear or non-linear functions.
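To make the model in equations (2) and (3) concrete, the following Python sketch (not from the original paper) simulates a hidden state sequence and its noisy observations for one particular, arbitrarily chosen pair of functions φ and g with Gaussian noise; any other choices satisfying the model would work the same way.

    import numpy as np

    def simulate_state_space(T=100, sigma_n=0.2, sigma_v=0.1, seed=1):
        """Simulate x_t = phi(x_{t-1}) + n_t and y_t = g(x_t) + v_t for toy phi and g."""
        rng = np.random.default_rng(seed)
        phi = lambda x: 0.9 * x + 0.5 * np.sin(x)   # example non-linear system function
        g = lambda x: x**2 / 5.0                    # example non-linear observation function
        x = np.zeros(T + 1)
        y = np.zeros(T + 1)
        x[0] = rng.normal(0.0, 1.0)                 # draw x_0 from an assumed p(x_0)
        for t in range(1, T + 1):
            x[t] = phi(x[t - 1]) + rng.normal(0.0, sigma_n)   # system model, eq. (2)
            y[t] = g(x[t]) + rng.normal(0.0, sigma_v)         # observation model, eq. (3)
        return x, y

    states, observations = simulate_state_space()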

The following quantities are assumed to be known: the initial state distribution p(x0), the state transition density p(xt|xt−1) and the observation likelihood p(yt|xt). Here, p(.) is used to denote a probability density function. The transition density and the observation likelihood can be determined from the known distributions of the i.i.d. noise sequences nt and vt. The state space is assumed to follow a hidden Markov model (HMM). In other words, the state sequence xt is a Markov process and the observations yt, t = 1, 2, ... are conditionally independent given the state at t. The states are hidden (i.e. not observable); all we can know about the system is through the observations available at each instant. The graphical structure of the HMM is shown in Fig. 1. Under the HMM assumption, p(xt|xt−1, past) = p(xt|xt−1) and p(yt|xt, past) = p(yt|xt), e.g. p(yt|xt, yt−1) = p(yt|xt).

The goal of particle filtering (or Bayesian filtering) is to estimate the joint posterior state distribution at time t, i.e. p(x1:t|y1:t), or quite often its marginal p(xt|y1:t). Here, x1:t = {x1, x2, ..., xt}. Finally, we would like to compute

    It = E_p(xt|y1:t)[f(Xt)] = ∫ f(xt) p(xt|y1:t) dxt    (4)

When the posterior can be assumed to be Gaussian and the functions φ(.) and g(.) to be linear, the same problem can be recursively solved using the Kalman filter [1]. To tackle non-linearity of φ(.) and g(.), the Extended Kalman filter [1] can be used, but it still assumes the Gaussianity of the posterior distribution. But in many practical problems the posterior is highly non-Gaussian (quite often multimodal) together with non-linear φ(.) and g(.). Under such circumstances, the sequential Monte Carlo technique based particle filtering algorithm gives us a way to solve this posterior estimation problem. In the next few sections, we shall develop the particle filter starting with Monte Carlo sampling for pdf approximation. We also discuss sequential Monte Carlo sampling (especially sequential importance sampling) and how it leads to particle filtering.

3 DERIVATION OF THE PARTICLE FILTER

Similar to eq (4), say we are trying to compute the expectation w.r.t. the joint posterior distribution p(x1:t|y1:t),

    E_p(x1:t|y1:t)[f(X1:t)] = ∫ f(x1:t) p(x1:t|y1:t) dx1:t = ?    (5)

In order to solve this problem, we have to look into two very important issues: a) Do we have a closed form expression for p(x1:t|y1:t)? b) Can we sample from p(x1:t|y1:t)? Depending on these two issues we can use three different methods to solve this problem, namely simple Monte Carlo sampling, importance sampling and Bayesian importance sampling. The method involving Bayesian importance sampling is our main focus as it is exactly what Bayesian/particle filtering does. These methods are discussed below.

3.1 Simple Monte Carlo Sampling

Consider the case when we can sample from the joint posterior distribution p(x1:t|y1:t). In that case we can approximate the distribution as a discretized version (as done in Sec. 1) in terms of the samples drawn from that distribution. Or,

    x^i_{1:t} ∼ p(x1:t|y1:t), i = 1, ..., N

    p̂(x1:t|y1:t) ≈ (1/N) Σ_{i=1}^N δ(x1:t − x^i_{1:t})    (6)

Now, we can approximate the expectation in equation (5) as follows,

    E_p(x1:t|y1:t)[f(X1:t)] ≈ E_p̂(x1:t|y1:t)[f(X1:t)] = (1/N) Σ_{i=1}^N f(x^i_{1:t})

This is the most ideal case. In reality, we almost always cannot sample from the posterior, but we might have a closed form expression for the distribution. This leads to the second method, known as importance sampling.

3.2 Importance Sampling

Importance sampling can be used to estimate the properties of a particular distribution while only having samples from a different distribution (known as the importance density or importance function) rather than the distribution of interest. Thus when we have a closed form expression for p(x1:t|y1:t) but cannot sample from it, we use an importance density to draw the samples. We choose the importance density such that it has a convenient closed form expression and we can easily draw samples from it. Say this importance density is given as π(x1:t|y1:t). Now, the problem of posterior expectation computation can be solved as follows,

    It = ∫ f(x1:t) p(x1:t|y1:t) dx1:t
       = ∫ f(x1:t) [p(x1:t|y1:t) / π(x1:t|y1:t)] π(x1:t|y1:t) dx1:t
       = E_π(x1:t|y1:t)[ f(X1:t) p(X1:t|Y1:t) / π(X1:t|Y1:t) ]

Now, draw samples from π(.) and get its discretized version π̂(.). Then the computation of the posterior expectation boils down to,

    x^i_{1:t} ∼ π(x1:t|y1:t), i = 1, ..., N

    E_p(x1:t|y1:t)[f(X1:t)] ≈ E_π̂(x1:t|y1:t)[ f(X1:t) p(X1:t|Y1:t) / π(X1:t|Y1:t) ]
       = (1/N) Σ_{i=1}^N f(x^i_{1:t}) [ p(x^i_{1:t}|y1:t) / π(x^i_{1:t}|y1:t) ]    (7)

Thus we can see that even if we cannot sample from the joint posterior, we can still get around the problem by performing importance sampling from a conveniently chosen π(.).
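As a hedged illustration of equation (7) outside the filtering context (not from the paper), the sketch below estimates the mean of a target density p using samples drawn from a broader importance density π; both densities are arbitrary Gaussians chosen only for the example.

    import numpy as np

    def normal_pdf(x, mu, sigma):
        return np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

    rng = np.random.default_rng(2)
    N = 100_000

    # Target density p: N(1, 0.5^2); importance density pi: N(0, 2^2), easy to sample from.
    x = rng.normal(0.0, 2.0, size=N)                       # x^i ~ pi
    w = normal_pdf(x, 1.0, 0.5) / normal_pdf(x, 0.0, 2.0)  # p(x^i) / pi(x^i)

    # Estimate E_p[X] via eq. (7) with f(x) = x; the result should be close to 1.0.
    print(np.mean(x * w))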

3.3 Bayesian Importance Sampling

It turns out that in most real-life situations, we can neither sample from the posterior (i.e. p(x1:t|y1:t)) nor do we have any closed form expression for it. Thus computing the integral in equation (5) becomes challenging. Bayesian importance sampling is a variant of the standard importance sampling technique to tackle this problem. The numerical methods developed for solving this problem lead to sequential importance sampling (SIS) and then finally to particle/Bayesian filtering. Now, under the problem statement we do not have a closed form expression for the posterior p(x1:t|y1:t). But it can be expressed in the following manner,

    p(x1:t|y1:t) = p(x1:t, y1:t) / p(y1:t),  thus
    p(x1:t|y1:t) ∝ p(x1:t, y1:t)    (8)

It turns out that we can at least compute p(x1:t, y1:t) in closed form recursively. This follows from the hidden Markov model (HMM) assumption on the state space. The recursion can be performed as p(x1:t, y1:t) = p(x1:t−1, y1:t−1) p(yt|xt) p(xt|xt−1), with p(yt|xt) and p(xt|xt−1) known. We shall utilize this fact to solve the problem of computing the posterior expectation in a recursive fashion, i.e. starting with a known p(x0) and using p(yt|xt), p(xt|xt−1), keep computing the approximate posterior p̂(x1:t|y1:t) and E_p(x1:t|y1:t)[f(X1:t)] for t > 0.

The Bayesian importance sampling is performed as follows. The posterior expectation E_p(x1:t|y1:t)[f(X1:t)] can be written as,

    It = ∫ f(x1:t) p(x1:t|y1:t) dx1:t
       = ∫ f(x1:t) [p(x1:t, y1:t) / p(y1:t)] dx1:t
       = ∫ f(x1:t) p(x1:t, y1:t) dx1:t / p(y1:t)
       = ∫ f(x1:t) p(x1:t, y1:t) dx1:t / ∫ p(x1:t, y1:t) dx1:t    (9)

Now, let us consider the importance density to be π(x1:t|y1:t), from which we are going to draw samples. We choose π(.) in such a way that we at least recursively know how to compute the expression for π(x1:t|y1:t). The importance density π(.) is assumed to have certain properties which are crucial to the development of the recursive algorithm for approximating the posterior distribution and expectation. These will be discussed soon. Now, let us get back to the problem of computing It. From equation (9) it follows that

    It = ∫ f(x1:t) [p(x1:t, y1:t)/π(x1:t|y1:t)] π(x1:t|y1:t) dx1:t / ∫ [p(x1:t, y1:t)/π(x1:t|y1:t)] π(x1:t|y1:t) dx1:t
       = E_π(.)[ f(X1:t) p(X1:t, Y1:t)/π(X1:t|Y1:t) ] / E_π(.)[ p(X1:t, Y1:t)/π(X1:t|Y1:t) ]    (10)

Now we can sample from π(.) as x^i_{1:t} ∼ π(x1:t|y1:t), i = 1, ..., N, and then get its discretized version π̂(.). Finally, It can be estimated as follows,

    Ît = E_π̂(.)[ f(X1:t) p(X1:t, Y1:t)/π(X1:t|Y1:t) ] / E_π̂(.)[ p(X1:t, Y1:t)/π(X1:t|Y1:t) ]
       = [ (1/N) Σ_{i=1}^N f(x^i_{1:t}) w̃^i_t ] / [ (1/N) Σ_{j=1}^N w̃^j_t ],  where w̃^i_t = p(x^i_{1:t}, y1:t) / π(x^i_{1:t}|y1:t)
       = Σ_{i=1}^N f(x^i_{1:t}) w^i_t,  with w^i_t = w̃^i_t / Σ_{j=1}^N w̃^j_t    (11)
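The key difference from equation (7) is that the weights in equation (11) only require the unnormalized joint density p(x1:t, y1:t) and are then normalized to sum to one. A minimal Python sketch of this self-normalized estimator, using an arbitrary unnormalized scalar target chosen purely for illustration, is given below.

    import numpy as np

    def normal_pdf(x, mu, sigma):
        return np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

    rng = np.random.default_rng(3)
    N = 100_000

    # Unnormalized target: exp(-(x - 1)^2), a Gaussian with mean 1 missing its
    # normalizing constant; importance density pi: N(0, 2^2).
    p_unnorm = lambda x: np.exp(-(x - 1.0) ** 2)
    x = rng.normal(0.0, 2.0, size=N)                  # x^i ~ pi
    w_tilde = p_unnorm(x) / normal_pdf(x, 0.0, 2.0)   # unnormalized weights w~^i
    w = w_tilde / np.sum(w_tilde)                     # normalized importance weights w^i

    # Self-normalized estimate of E[X] as in eq. (11); the result should be close to 1.0.
    print(np.sum(x * w))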

The corresponding approximation to the joint posterior distribution is given as,

    p̂(x1:t|y1:t) = Σ_{i=1}^N w^i_t δ(x1:t − x^i_{1:t})    (12)

where the {w^i_t} ∀i are called the normalized importance weights. Now, clearly the w^i_t's cannot be computed directly at a given instant. So, we develop a recursive way to compute them. Once this is done, we have a computationally feasible method for computing the posterior distribution. In order to do that we first make sure the following assumption holds for the importance density π(x1:t|y1:t),

    π(x1:t|y1:t) = π(x1:t−1|y1:t−1) π(xt|x1:t−1, y1:t)    (13)

Since p(x1:t, y1:t) = p(x1:t−1, y1:t−1) p(yt|xt) p(xt|xt−1), we can develop a recursive way of computing the importance weights as,

    w̃^i_t = p(x^i_{1:t}, y1:t) / π(x^i_{1:t}|y1:t)
          = w̃^i_{t−1} [ p(yt|x^i_t) p(x^i_t|x^i_{t−1}) / π(x^i_t|x^i_{1:t−1}, y1:t) ]    (14)

where x^i_t ∼ π(xt|x^i_{1:t−1}, y1:t) and x^i_{1:t} = [x^i_{1:t−1}, x^i_t].

Thus we can recursively compute the estimates of the posterior distribution and the corresponding expectations starting with the initial distribution. The entire algorithm, known as sequential importance sampling (SIS), is summarized in the next section.

3.4 Sequential Importance Sampling (SIS)

Before we give the algorithm for SIS, it is important to choose the importance density π(.). There could be many choices. The simplest one is to use the state transition density as the importance density [2]. Or,

    π(xt|x1:t−1, y1:t) = p(xt|xt−1)    (15)

This gives,

    w̃^i_t = w̃^i_{t−1} p(yt|x^i_t)    (16)

The optimal importance density [3], [4] is the one which minimizes the variance of the importance weights conditioned on the observations and previous state samples. It can be shown that πopt(.) = p(xt|xt−1, yt) under the hidden Markov model (HMM) assumption. The derivation of πopt(.) is beyond the scope of this article. All our discussions will stick to the case where the transition density is used as the importance density (given in equation (15)). Finally, the recursive algorithm for estimating the posterior density is summarized below.

SIS Algorithm

1) Sample from the initial distribution: x^i_0 ∼ p(x0), assign w̃^i_0 = w^i_0 = 1/N, i = 1, ..., N.
2) For t > 0 and i = 1, ..., N,
   a) Sample x^i_t ∼ p(xt|x^i_{t−1}) and set x^i_{1:t} = [x^i_{1:t−1}, x^i_t]. Compute the weights as w̃^i_t = w̃^i_{t−1} p(yt|x^i_t).
   b) Get the normalized importance weights w^i_t and finally, p̂(x1:t|y1:t) = Σ_{i=1}^N w^i_t δ(x1:t − x^i_{1:t}).
3) Set t + 1 ← t and go back to step 2.

The SIS algorithm gives us a way to recursively estimate the discretized posterior distribution. But this algorithm has one major problem which makes it almost ineffective for practical applications. It turns out that the variance of the importance weights conditioned on the observations only increases over time. As a result, after a few iterations, only a few samples (also called particles) will have non-zero normalized importance weights. This is known as degeneracy of weights, and it ends up wasting a lot of particles, making the algorithm very inefficient. The particle filter comes into the picture to solve this problem. The basic particle filtering algorithm is discussed in the next section.

3.5 The Basic Particle Filter

The particle filter basically modifies the SIS algorithm to prevent the degeneracy of weights. In order to do that, at each time step the particles (i.e. the samples) are resampled w.r.t. their normalized importance weights. In other words, {w^i_t}_{i=1}^N is used as a probability mass function (PMF) to sample the existing particles again. This is denoted as x^i_t ∼ PMF[{w^i_t}], and the new importance weights are assigned as w^i_t = 1/N. The basic particle filtering algorithm is summarized below.

Particle Filtering (PF) Algorithm

1) Sample from the initial distribution: x^i_0 ∼ p(x0), assign w̃^i_0 = w^i_0 = 1/N, i = 1, ..., N.
2) For t > 0,
   a) Sample x^i_t ∼ p(xt|x^i_{t−1}) and compute the weights as w̃^i_t = p(yt|x^i_t), i = 1, ..., N.
   b) Get the normalized importance weights w^i_t = w̃^i_t / Σ_{j=1}^N w̃^j_t.
   c) Resample the particles as x^i_t ∼ PMF[{w^i_t}] and reassign w^i_t = 1/N.
   d) Get the estimated posterior as p̂(x1:t|y1:t) = Σ_{i=1}^N (1/N) δ(x1:t − x^i_{1:t}), where x^i_{1:t} = [x^i_{1:t−1}, x^i_t].
   e) Finally, compute It = E_p(xt|y1:t)[f(Xt)] ≈ E_p̂(xt|y1:t)[f(Xt)] = (1/N) Σ_{i=1}^N f(x^i_t).
3) Set t + 1 ← t and go back to step 2.
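The following Python sketch is one possible implementation of the PF algorithm above (a bootstrap filter using the transition density as the importance density). It is written for a generic scalar model: sample_x0, sample_transition and likelihood stand for p(x0), p(xt|xt−1) and p(yt|xt) respectively, and are placeholders to be supplied by the user.

    import numpy as np

    def particle_filter(ys, sample_x0, sample_transition, likelihood, N=1000, seed=0):
        """Bootstrap particle filter; returns posterior mean estimates E[x_t | y_1:t]."""
        rng = np.random.default_rng(seed)
        x = sample_x0(rng, N)                      # step 1: x_0^i ~ p(x_0)
        estimates = []
        for y in ys:                               # step 2: for t > 0
            x = sample_transition(rng, x)          # a) x_t^i ~ p(x_t | x_{t-1}^i)
            w_tilde = likelihood(y, x)             #    w~_t^i = p(y_t | x_t^i)
            w = w_tilde / np.sum(w_tilde)          # b) normalized importance weights
            idx = rng.choice(N, size=N, p=w)       # c) resample w.r.t. the PMF {w_t^i}
            x = x[idx]                             #    and reassign w_t^i = 1/N
            estimates.append(np.mean(x))           # e) posterior mean, i.e. f(x) = x
        return np.array(estimates)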

3.5.1 A Simple PF Example

Consider a very simple linear state space model,

    xt = 0.5 xt−1 + wt,  wt ∼ N(0, σw²)
    yt = 0.4 xt + vt,  vt ∼ N(0, σv²)

with a given initial distribution p(x0) = N(0, σ0²). Here N(µ, σ²) denotes a Gaussian distribution with mean µ and variance σ². The algorithm with N = 1000 particles is implemented as follows,

1) Sample x^i_0 ∼ N(0, σ0²), i = 1, ..., N.
2) For t > 0,
   a) Sample x^i_t ∼ N(0.5 x^i_{t−1}, σw²), i = 1, ..., N.
   b) Compute the weights: w̃^i_t ∝ exp( −(yt − 0.4 x^i_t)² / (2σv²) ).
   c) Compute the normalized importance weights w^i_t = w̃^i_t / Σ_{j=1}^N w̃^j_t.
   d) Resample the particles as x^i_t ∼ PMF[{w^i_t}] and reassign w^i_t = 1/N, x^i_{1:t} = [x^i_{1:t−1}, x^i_t].
   e) Compute p̂(x1:t|y1:t) = Σ_{i=1}^N w^i_t δ(x1:t − x^i_{1:t}) and It ≈ E_p̂(xt|y1:t)[f(Xt)] = (1/N) Σ_{i=1}^N f(x^i_t).
3) Set t + 1 ← t and go back to step 2.
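A self-contained, runnable Python version of this scalar example might look as follows; the noise standard deviations are illustrative choices, not values specified in the paper.

    import numpy as np

    sigma_w, sigma_v, sigma_0 = 0.5, 0.3, 1.0      # illustrative noise levels
    N, T = 1000, 60
    rng = np.random.default_rng(4)

    # Generate ground truth and observations from the model of Sec. 3.5.1.
    x_true = np.zeros(T + 1)
    y = np.zeros(T + 1)
    x_true[0] = rng.normal(0.0, sigma_0)
    for t in range(1, T + 1):
        x_true[t] = 0.5 * x_true[t - 1] + rng.normal(0.0, sigma_w)
        y[t] = 0.4 * x_true[t] + rng.normal(0.0, sigma_v)

    # Particle filter following steps 1)-3) above.
    x = rng.normal(0.0, sigma_0, size=N)                             # step 1
    x_hat = np.zeros(T + 1)
    for t in range(1, T + 1):                                        # step 2
        x = 0.5 * x + rng.normal(0.0, sigma_w, size=N)               # a) propagate
        w_tilde = np.exp(-(y[t] - 0.4 * x) ** 2 / (2 * sigma_v**2))  # b) weights
        w = w_tilde / np.sum(w_tilde)                                # c) normalize
        x = x[rng.choice(N, size=N, p=w)]                            # d) resample
        x_hat[t] = np.mean(x)                                        # e) posterior mean (f(x) = x)

    print(np.mean((x_hat[1:] - x_true[1:]) ** 2))                    # mean squared tracking error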

4 APPLICATIONS: TARGET TRACKING IN A SENSOR NETWORK

Consider a network coverage area of 200m × 200m with 200 randomly distributed sensors. The track length was 60 time steps. Let Xt = [xt, yt] be the target state (i.e. the position of the target in the network) at time t, with corresponding velocity Vt = [vx,t, vy,t]. The state dynamics equation used was,

    xt = xt−1 + vx,t−1  and  yt = yt−1 + vy,t−1
    vx,t = vx,t−1 + e1,t  and  vy,t = vy,t−1 + e2,t    (17)

And the sensor model (i.e. the measurement/observation model) was,

    zt = tan⁻¹( (xt − xs) / (yt − ys) ) + et,

where (xs, ys) is the position of the current sensor leader. The system and observation noises used were as follows: ei,t ∼ N(0, 0.2²), i = 1, 2 and et ∼ N(0, 0.05²) were assumed to be white, stationary and independent of each other. The ground truth (i.e. the original track) was generated by assuming the initial position of the target to be [5, 5] and the corresponding initial velocity to be [2, 1.5]. While implementing tracking we assumed that we have an estimate of the initial position of the target. The current sensor leader was chosen using the minimum distance criterion w.r.t. the current target location. A total of 4000 particles were used. At each step, using the observations of the current sensor leader, we estimate the state of the target by averaging over the resampled particles. The results are shown in Fig. 2.

Fig. 2. This figure shows target tracking in a wireless sensor network. The blue path denotes the ground truth, i.e. the true trajectory, and the red × denote the tracked version. The blue, magenta and green circular patches are the sensors. The green color indicates a leader node while a magenta color indicates a neighboring node.

5 CONCLUSION

This article gives an overview of sequential Monte Carlo techniques leading to Bayesian/particle filtering. We have shown how Monte Carlo techniques can be efficiently used to compute numerical solutions to problems which are often impossible to solve in closed form. We have derived the particle filtering algorithm starting with simple Monte Carlo sampling and then importance sampling, followed by Bayesian importance sampling. We have seen that particle filtering can be used in problems with a non-linear state space together with a non-Gaussian and multimodal posterior for which we do not have a closed form expression. Such algorithms give us a discretized estimate of the posterior distribution in terms of the particles (i.e. samples) and their corresponding weights. This also enables us to approximate the posterior expectation of any function (equation (5)). We have demonstrated the working of a basic PF algorithm for a simple scalar case. Finally, we gave an application of particle filtering for target tracking in a wireless sensor network. Particle filters have numerous other applications in areas such as image processing and computer vision, robotics, finance and even spacecraft trajectory tracking. In computer vision, PF has been successfully used to track deforming shape sequences, leading to articulated body tracking. It has also found applications for deformable contour tracking, also known as condensation [5]. More about Bayesian/particle filtering and its applications can be found in [2], [4], [3].

REFERENCES

[1] G. Welch and G. Bishop, "An Introduction to the Kalman Filter," SIGGRAPH, 2001.
[2] N. Gordon, D. Salmond, and A. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings-F (Radar and Signal Processing), vol. 140, no. 2, pp. 107–113, 1993.
[3] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Processing, vol. 50, pp. 174–188, Feb. 2002.
[4] A. Doucet, "On sequential Monte Carlo sampling methods for Bayesian filtering," Technical Report CUED/F-INFENG/TR. 310, Cambridge University Department of Engineering, 1998.
[5] M. Isard and A. Blake, "Condensation: Conditional Density Propagation for Visual Tracking," Intl. Journal of Computer Vision, pp. 5–28, 1998.