The particle filtering and their applications

Oppenheim, Georges (a); Philippe, Anne (b); de Rigal, Jean (c)

(a) Université Orsay (Equipe de Probabilité et Statistiques, Laboratoire de Mathématiques, bat 425, Centre d'Orsay, 91405 Orsay Cedex, France, [email protected]), et Université de Marne la Vallée
(b) Université Nantes, Laboratoire de mathématiques Jean Leray UMR CNRS 6629, 2 rue de la Houssinière - BP 92208 - F-44322 Nantes Cedex 3
(c) L'Oréal Recherche, Chevilly Larue, France

Preprint submitted to Elsevier, March 16, 2007

Abstract

Particle filtering is a Monte-Carlo simulation method designed to approximate non-linear filters that estimate and track the state of a dynamic system. We present the general principle of these algorithms and illustrate their wide range of applications with examples.

Key words: Dynamic system, Monte Carlo method, Non-linear filtering, Particle filter, Sequential Bayesian filtering.

1 Introduction, Examples and Problems

1.1 The goal

In chemometrics, research on time-varying systems is ongoing, as recent articles show: Chen et al (2004); Chen et al (2007); Shen et al (2006). Whether autonomous or controlled, these systems account for dynamic evolutions in domains as varied as signal processing, oil or pharmaceutical chemistry, biology, and surveillance.

The simplest situations are linear. These are well understood, both theoretically and practically. The key word for these is 'the Kalman Filter' (around 1960), where realistic situations are modelled by linear equations. The filter consists in estimating the conditional distribution of the partially observed state of a stochastic process from a sample path. The Kalman Filter solves this exactly and quickly for linear Gaussian dynamics. Outside these cases, in non-linear situations, other methods such as particle methods are available. These are based on Monte-Carlo simulation, which approximates the conditional distribution (a weak approximation).
They propagate a particle system over time. These domains currently generate numerous articles covering both theoretical and practical aspects. See, for example, the books Cappé et al (2005); Del Moral (2004); Doucet et al (2001), or the articles Doucet et al (2000); Le Gland and Oudjane (2004); Kong et al (1994); Del Moral et al (2006); Crisan and Doucet (2002). This list is not exhaustive; see http://www-sigproc.eng.cam.ac.uk/smc/ for complementary references.

1.2 Dynamic system

1.2.1 Linear dynamic system

This is the representation of a physical system that evolves over time k. The evolution of its state (X_k) is of interest, and we wish to estimate and predict it. The reference for these systems is the linear model described by a state equation and a measurement equation: for any k ∈ N,

State equation:        X_k = F_k X_{k-1} + ε^X_k
Measurement equation:  Y_k = G_k X_k + ε^Y_k

where:
• the state noise (ε^X_k)_{k∈N} and the measurement noise (ε^Y_k)_{k∈N} are independent standard white noises (sequences of uncorrelated random variables with zero mean). The noise (ε^Y_k) is independent of the state (X_k);
• the distribution of X_0, called the initial or prior distribution, is uncorrelated with the processes (ε^X_k)_{k∈N} and (ε^Y_k)_{k∈N};
• unknown parameters may be present in F_k, G_k and in the characteristics of the white noises (ε^X_k)_{k∈N} and (ε^Y_k)_{k∈N}.

The first equation is that of a Markov process (X_k)_{k∈N}, called the state process, which is not completely measured and which we would like to estimate. Information on X_k comes from the measurements Y_k. Recall that a Markov process is a stochastic process for which prediction of the future based on the present and the past does not require knowledge of the past. In other terms, the conditional distribution of X_{n+1} given the past states (X_j)_{j≤n} is a function of X_n alone:

P(X_{n+1} ∈ A | X_0, X_1, X_2, ..., X_n) = P(X_{n+1} ∈ A | X_n).

The second equation gives information on the state X_k based on the observed process (Y_k)_{k∈N}.
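The linear state and measurement equations above can be simulated directly. A minimal sketch, assuming scalar coefficients F_k = F and G_k = G and standard Gaussian white noises (all numeric values here are illustrative, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar linear dynamic system:
#   state:       X_k = F * X_{k-1} + eps_X_k
#   measurement: Y_k = G * X_k     + eps_Y_k
F, G, T = 0.9, 1.0, 100   # assumed values, chosen for illustration

X = np.empty(T)
Y = np.empty(T)
X[0] = rng.normal()                      # draw X_0 from the prior
Y[0] = G * X[0] + rng.normal()
for k in range(1, T):
    X[k] = F * X[k - 1] + rng.normal()   # state noise eps_X_k
    Y[k] = G * X[k] + rng.normal()       # measurement noise eps_Y_k
```

The filtering problem discussed below is the inverse of this simulation: recover the distribution of the unobserved X_k from the observed Y_0, ..., Y_k.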
For a given state, we suppose that the random variables (Y_k)_{k∈N} are independent. These hypotheses on the dependence structure greatly simplify the theory.

1.2.2 Non-linear dynamic system

The formulation of a non-linear system, modelled on the linear case, is as follows:

State equation:        X_k = f(k, X_{k-1}, ε^X_k, ϑ^X)    (1)
Measurement equation:  Y_k = g(k, X_k, ε^Y_k, ϑ^Y)

where ϑ^X and ϑ^Y are unknown parameters. More generally, the dynamic system can be described by two sequences of operators:

(1) A sequence of operators (Q_k)_{k∈N} that describes the evolution of the states X_k over time. The operator Q_k is a Markov transition kernel:

Q_k(x, dy) = P(X_{k+1} ∈ dy | X_k = x)    (2)

(2) A sequence of operators (ψ_k)_{k∈N} (called likelihood operators) that describes the conditional distribution of the observations (Y_k)_{k∈N} given the state (X_k)_{k∈N}:

ψ_k(x) = P(Y_k | X_k = x)    (3)

These formulae may appear abstract, but they cover numerous examples, both simple and complicated, that can all be studied in the same way. We give two classical statistical models that can be expressed in the form of a dynamic system.

• The linear regression model Y = θX + ε, where Y represents the observations, θ the parameter (or state) to be estimated and X the explanatory variables, can be expressed as a linear dynamic model:

State equation:        θ_k = θ_{k-1}
Measurement equation:  Y_k = θ_k X_k + ε_k

• The linear difference equation models, such as autoregressive processes. Take the example of an AR(2), a solution of the equation

Y_k = a_1 Y_{k-1} + a_2 Y_{k-2} + ε_k.

It can be described by the following system:

X_{k+1} = [[0, 1], [a_2, a_1]] X_k + (0, ε_k)^T,  where X_k = (Y_{k-2}, Y_{k-1})^T,
Y_k = (0 1) X_{k+1}.

1.2.3 An example of a highly non-linear function: biomass and fisheries, Campillo et al (2006)

This is an estimation of fish stocks and an evaluation of the impact of capture on the animal biomass B_t. Let I_t be an index of abundance measured each year t.
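The equivalence between the autoregressive recursion and its state-space form can be checked numerically. A minimal sketch, with illustrative coefficient values (a_1, a_2 and the horizon T are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
a1, a2, T = 0.5, 0.3, 200          # illustrative values, not from the text
eps = rng.normal(size=T)

# Direct recursion: Y_k = a1*Y_{k-1} + a2*Y_{k-2} + eps_k
Y = np.zeros(T)
for k in range(2, T):
    Y[k] = a1 * Y[k - 1] + a2 * Y[k - 2] + eps[k]

# Equivalent state-space form with X_k = (Y_{k-2}, Y_{k-1})
A = np.array([[0.0, 1.0],
              [a2,  a1]])
X = np.zeros(2)                    # X_2 = (Y_0, Y_1) = (0, 0)
Y2 = np.zeros(T)
for k in range(2, T):
    X = A @ X + np.array([0.0, eps[k]])   # X becomes (Y_{k-1}, Y_k)
    Y2[k] = X[1]                          # measurement: read off Y_k
```

Both constructions produce the same trajectory, which is the point of the state-space rewriting: the scalar process with memory two becomes a two-dimensional Markov process.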
We get a quite complex non-linear system:

State equation:        B_t = F(B_{t-1}, θ) exp(σ_W W_t)
Measurement equation:  I_t = a_5 B_t exp(σ_V V_t)

where:
• F is an AR(3)-type operator,
F(B_{t-1}, θ) = S_{t-1} B_{t-1} − a_2 S_{t-1} S_{t-2} B_{t-2} + a_3 (1 − a_4 S_{t-1}),  with S_t = (B_t − C_t) B_t^{-1};
• the captured animal biomass (C_t) during year t and the abundance index (I_t) are observed;
• (W_t) and (V_t) are two independent sequences of Gaussian white noise;
• the constants a_j are estimated by the algorithm.

These equations show the type of problem that can be addressed by particle methods. Here the difficulty comes from the fact that the estimation is based on few noisy measurements taken over a long period of time.

2 How is it done?

2.1 The Kalman filter

The Kalman filter is the optimal filter for Gaussian linear dynamic models (see Anderson and Moore (1979), for example). Because of the Markovian character of the model, we can compute the conditional distributions of the state given the observations by recurrence:

1. Forecast step:   p(x_t | y_0, ..., y_{t-1}) ← p(x_{t-1} | y_0, ..., y_{t-1})
2. Filtering step:  p(x_t | y_0, ..., y_t) ← p(x_t | y_0, ..., y_{t-1}) and y_t

In a Gaussian context, these formulae reduce to the conditional expectations and variances E(x_t | y_u, y_{u-1}, ..., y_1) and Var(x_t | y_u, y_{u-1}, ..., y_1), taking u = t − 1 for the forecast step and u = t for the filtering step.

There are numerous extensions of Kalman filtering, for example the extended Kalman filter (EKF) for non-linear models. The method is based on linearisation of the functions around the current estimate. This filter is effective if a reference trajectory is available and if the model is almost linear in the proximity of the reference trajectory or of the target. Theoretical proofs of stability are recent.

Outside the Gaussian domain, the problem can no longer be reduced to the computation of the first two moments; the conditional distributions P(X_k | Y_k = y_k, ..., Y_0 = y_0) must be calculated or estimated.
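The forecast/filtering recursion can be written out explicitly for a scalar Gaussian model. A minimal sketch, where the coefficients F, G and the noise variances q, r are illustrative assumptions, not values from the text:

```python
import numpy as np

def kalman_step(m, P, y, F=0.9, G=1.0, q=1.0, r=1.0):
    """One forecast + filtering step for the scalar Gaussian model
    x_t = F x_{t-1} + eps_X,  y_t = G x_t + eps_Y.

    m, P : mean and variance of p(x_{t-1} | y_0..y_{t-1})
    q, r : variances of the state and measurement noises (assumed values).
    """
    # Forecast step: p(x_t | y_0..y_{t-1})
    m_pred = F * m
    P_pred = F * F * P + q
    # Filtering step: fold in the new observation y_t
    S = G * G * P_pred + r            # innovation variance
    K = P_pred * G / S                # Kalman gain
    m_new = m_pred + K * (y - G * m_pred)
    P_new = (1.0 - K * G) * P_pred
    return m_new, P_new

# Usage: iterate over observations, carrying (mean, variance) forward.
m, P = 0.0, 1.0                       # prior on x_0 (assumed)
for y in (0.5, 0.2, -0.1):            # illustrative observations
    m, P = kalman_step(m, P, y)
```

In the Gaussian linear case, this pair (m, P) is exactly the conditional distribution described above; the particle methods of the following sections replace it by a weighted sample when no such closed form exists.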
These probability distributions describe what is of interest about the unknown state X_k when the observations y_k, ..., y_0 are known. The question is then: how can the conditional distribution P(X_k | Y_k = y_k, ..., Y_0 = y_0) be calculated, and how does it evolve with time k as observations are obtained?

2.2 Monte Carlo Method

Monte Carlo methods are simulation techniques for approximating distributions, or functionals of a distribution such as the expectation and variance. Two large complementary families of algorithms are distinguished:

• Markov chain Monte Carlo (MCMC) techniques,
• Monte-Carlo filtering techniques, called particle filtering.

2.2.1 MCMC

MCMC algorithms are iterative methods for simulating a sample whose distribution is the target π (possibly known only up to a constant). The central tool is the simulation of a Markov chain with stationary distribution π. The simulated trajectories are then used to estimate integrals of the form ∫ h(x)π(x)dx (see Robert and Casella (2004)).

These methods are widely used in Bayesian statistics to approximate posterior distributions when the models' parameters are assumed to be constant over time (see Robert (2001)). The MCMC approximation can also be considered for estimation problems linked to dynamic systems. For these models, MCMC algorithms give an estimation of the conditional distribution P(X_1, ..., X_k | Y_k = y_k, ..., Y_0 = y_0) and of the marginal distributions P(X_t | Y_k = y_k, ..., Y_0 = y_0) for all t = 1, ..., k. This is the classic situation of smoothing. These MCMC algorithms are not well suited to dynamic models because the distribution must be entirely recalculated each time a new observation is obtained.
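As a concrete instance of such an algorithm, one classical MCMC scheme (random-walk Metropolis, not detailed in the text) can be sketched for a target π known up to a constant; the function names, target choice π = N(0, 1) and all numeric settings are illustrative assumptions:

```python
import numpy as np

def metropolis(log_target, n, x0=0.0, step=1.0, seed=0):
    """Random-walk Metropolis sampler: a minimal MCMC sketch.

    log_target : log of the target density pi, up to an additive constant.
    Returns n correlated samples whose distribution approaches pi.
    """
    rng = np.random.default_rng(seed)
    x = x0
    out = np.empty(n)
    for i in range(n):
        prop = x + step * rng.normal()   # symmetric random-walk proposal
        # Accept with probability min(1, pi(prop)/pi(x));
        # the unknown normalising constant of pi cancels in the ratio.
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        out[i] = x
    return out

# Estimate the integral  ∫ h(x) pi(x) dx  for h(x) = x^2 and pi = N(0, 1),
# discarding an initial burn-in before averaging.
samples = metropolis(lambda x: -0.5 * x * x, n=20000)
estimate = np.mean(samples[5000:] ** 2)
```

This illustrates the drawback noted above: the whole chain targets one fixed distribution, so if a new observation changed the posterior, the entire simulation would have to be rerun, which is what motivates sequential particle methods.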