Bayesian Inference for Dynamic Models with Dirichlet Process Mixtures

Francois Caron, Manuel Davy, Arnaud Doucet, Emmanuel Duflos, Philippe Vanheeghe
1 LAGIS (UMR CNRS 8146), Ecole Centrale de Lille, BP 48, Villeneuve d'Ascq, France, fi[email protected]
2 Department of Statistics and Department of Computer Science, University of British Columbia, Vancouver, Canada, [email protected]

9th IEEE International Conference on Information Fusion, 2006, Florence, Italy.
HAL Id: inria-00119993, https://hal.inria.fr/inria-00119993 (submitted on 12 Dec 2006)

Abstract - Using Kalman techniques, it is possible to perform optimal estimation in linear Gaussian state-space models. We address here the case where the noise probability density functions are of unknown functional form. A flexible Bayesian nonparametric noise model based on Dirichlet process mixtures is introduced. Efficient Markov chain Monte Carlo and Sequential Monte Carlo methods are then developed to perform optimal estimation in such contexts.

Keywords: Bayesian nonparametrics, Dirichlet Process Mixture, Markov chain Monte Carlo, Rao-Blackwellisation, Particle filter.

1 Introduction

Let us consider the following dynamic linear model

  x_t = F_t x_{t-1} + C_t u_t + G_t v_t    (1)

  z_t = H_t x_t + w_t    (2)

where x_t is the hidden state vector, z_t is the observation, and v_t and w_t are sequences of independent random variables. F_t and H_t are the known state and observation matrices, u_t is a known input, C_t is the input transfer matrix and G_t is the state transfer matrix. For any sequence {a_t}, we write a_{i:j} for (a_i, a_{i+1}, ..., a_j). We are interested in estimating the hidden state x_t given the observations z_{1:t} (filtering) or z_{1:T} for T ≥ t (smoothing). If the noise sequences are assumed Gaussian with known parameters, optimal estimation can be performed using the Kalman filter/smoother. However, these Gaussian assumptions can be very inappropriate for many real-world models. In this paper, we address the problem of optimal state estimation when the probability density functions (pdf) of the noise sequences are unknown and need to be estimated on-line or off-line from the data. To simplify the presentation, we limit ourselves here to the estimation of the pdf of the state noise sequence {v_t} and assume that {w_t} is Gaussian. However, our algorithms extend straightforwardly to the case where the observation noise pdf must also be estimated.

Our methodology relies on the introduction of a Dirichlet Process Mixture (DPM) for the state noise pdf. The DPM is a flexible Bayesian nonparametric model which has become very popular in statistics over the last few years for nonparametric density estimation. To the best of our knowledge, this powerful class of models has never been used in the context of dynamic models. We demonstrate here that such tools can drastically improve the performance of standard algorithms when the noise pdfs are unknown.

The rest of this paper is organized as follows. In Section 2, we recall the basics of Bayesian nonparametric density estimation, and in Section 3 we derive the dynamic model with unknown noise distribution. In Section 4, we develop an efficient Markov chain Monte Carlo algorithm to perform optimal estimation in the batch case. In Section 5, we develop an efficient Sequential Monte Carlo/particle filter to perform optimal estimation in the sequential case. All these algorithms can be interpreted as generalized Rao-Blackwellised methods. Finally, we demonstrate our algorithms on two applications: blind deconvolution of impulse processes and robust regression.
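For concreteness, when both noise sequences are Gaussian with known parameters, v_t ~ N(0, Q_t) and w_t ~ N(0, R_t), one step of the Kalman filter for model (1)-(2) can be sketched as follows. This snippet is purely illustrative and not from the paper; the function name, argument conventions and the use of NumPy are our assumptions.

    import numpy as np

    def kalman_step(m, P, z, F, C, u, G, Q, H, R):
        # Prediction under x_t = F x_{t-1} + C u_t + G v_t, v_t ~ N(0, Q).
        m_pred = F @ m + C @ u
        P_pred = F @ P @ F.T + G @ Q @ G.T
        # Update with z_t = H x_t + w_t, w_t ~ N(0, R).
        S = H @ P_pred @ H.T + R                 # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        m_post = m_pred + K @ (z - H @ m_pred)
        P_post = P_pred - K @ S @ K.T
        return m_post, P_post

The paper's point of departure is that Q_t, and indeed the Gaussian form of the state noise itself, is unavailable here, so this recursion cannot be applied directly.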
2 Bayesian nonparametric density estimation

We briefly review here modern Bayesian tools to perform nonparametric density estimation.

2.1 Density estimation

Let y_1, ..., y_n be a statistically exchangeable sequence distributed according to the pdf F,

  y_k \sim F(\cdot).

We are interested here in estimating F, and we consider the following nonparametric model

  F(y) = \int_\Theta f(y|\theta) \, dG(\theta)    (3)

where \theta \in \Theta is called the latent variable or cluster variable, f(\cdot|\theta) is the mixed pdf and G is the mixing distribution. Under a Bayesian framework, it is assumed that G is a Random Probability Measure (RPM) distributed according to a prior distribution (i.e., a distribution over the set of probability distributions). We select here an RPM that follows a Dirichlet Process (DP) prior.

2.2 Dirichlet Processes

Ferguson [1] introduced the Dirichlet Process (DP) as a probability measure on the space of probability measures. Given a probability measure G_0 on a (measurable) space (T, A) and a positive real number \alpha, a probability distribution G distributed according to a DP of base distribution G_0 and scale factor \alpha, denoted G \sim DP(G_0, \alpha), satisfies, for any partition A_1, ..., A_k of T,

  (G(A_1), ..., G(A_k)) \sim \mathcal{D}(\alpha G_0(A_1), ..., \alpha G_0(A_k))

where \mathcal{D} is a standard Dirichlet distribution. Sethuraman [2] established that the realizations of a Dirichlet process are discrete with probability one and admit the so-called stick-breaking representation

  G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k}    (4)

with \theta_k \sim G_0, \pi_k = \beta_k \prod_{j=1}^{k-1} (1 - \beta_j) and \beta_k \sim B(1, \alpha), where B denotes the Beta distribution and \delta_{\theta_k} denotes the Dirac delta measure located at \theta_k. Combining (3) and (4), the following flexible prior model is adopted for the unknown distribution F:

  F(y) = \sum_{k=1}^{\infty} \pi_k f(y|\theta_k).

Apart from its flexibility, a fundamental motivation for using the DP model is the simplicity of the posterior update. Let \theta_1, ..., \theta_n be n random samples from G,

  \theta_k | G \overset{i.i.d.}{\sim} G

where G \sim DP(G_0, \alpha); then the posterior distribution of G given \theta_{1:n} is also a DP,

  G | \theta_{1:n} \sim DP\left( \frac{\alpha}{\alpha + n} G_0 + \frac{1}{\alpha + n} \sum_{k=1}^{n} \delta_{\theta_k}, \; \alpha + n \right).

Moreover, it can be shown that the predictive distribution, computed by integrating out the RPM G, admits the following Polya urn representation [3]:

  \theta_{n+1} | \theta_{1:n} \sim \frac{1}{\alpha + n} \sum_{k=1}^{n} \delta_{\theta_k} + \frac{\alpha}{\alpha + n} G_0.

2.3 Dirichlet Process Mixture

Using these modelling tools, it is now possible to reformulate the density estimation problem using the following hierarchical model, known as DPM [4]:

  G \sim DP(G_0, \alpha),
  \theta_k | G \sim G,
  y_k | \theta_k \sim f(\cdot | \theta_k).

Density estimation then boils down to estimating the posterior distribution p(\theta_{1:n} | y_{1:n}). Although DPMs were introduced in the 1970s, these models were too complex to handle numerically before the introduction of MCMC [5, 6]. They have now become extremely popular.
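Both the stick-breaking construction (4) and the Polya urn predictive are straightforward to simulate, which may help build intuition for the models above. The sketch below is illustrative only; the truncation level k_max and the example choice G_0 = N(0, 1) are our assumptions, not the paper's.

    import numpy as np

    rng = np.random.default_rng(0)

    def stick_breaking(alpha, sample_g0, k_max=100):
        # Truncated draw of G = sum_k pi_k delta_{theta_k}, cf. eq. (4):
        # beta_k ~ B(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j).
        betas = rng.beta(1.0, alpha, size=k_max)
        pis = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
        atoms = np.array([sample_g0() for _ in range(k_max)])
        return pis, atoms

    def polya_urn(n, alpha, sample_g0):
        # Sequential draws with G integrated out:
        # theta_{m+1} | theta_{1:m} ~ (1/(alpha+m)) sum_k delta_{theta_k}
        #                             + (alpha/(alpha+m)) G_0.
        thetas = [sample_g0()]
        for m in range(1, n):
            if rng.random() < alpha / (alpha + m):
                thetas.append(sample_g0())               # new atom from G_0
            else:
                thetas.append(thetas[rng.integers(m)])   # reuse a past atom
        return thetas

    # Example with G_0 = N(0, 1): a small alpha yields few distinct atoms.
    draws = polya_urn(20, alpha=1.0, sample_g0=lambda: rng.normal())

The clustering behaviour visible in such draws is exactly what the DPM exploits for density estimation: observations sharing an atom share mixture parameters.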
We so-called stick-breaking representation adopt a flexible Bayesian nonparametric model for this ∞ noise pdf. {vt} is supposed to be distributed according G to a DPM of base mixed distribution N (µt, Σt), scale = πkδθk (4) k=1 parameter α and Normal-inverse Wishart base distri- X bution G0 [7] denoted G0 = NIW (µ0, κ0, ν0, Λ0) with k−1 G µ0, κ0, ν0, Λ0 are hyperparameters that are assumed to with θk ∼ 0, πk = βk j=1 (1 − βj) and βk ∼ B(1, α) where B denotes the Beta distribution and δθ denotes be known and fixed. To summarize, we have the fol- Q k the Dirac delta measure located in θk. Using (3), it lowing model comes that the following flexible prior model is adopted G DP G , α , for the unknown distribution F ∼ ( 0 ) (µt, Σt) ∼ G, ∞ i.i.d. vt ∼ N (µt, Σt). F (y) = πkf(y|θk). k X=1 This model is much more flexible than a standard mix- ture of Gaussians. It is equivalent to say that v is sam- Apart from its flexibility, a fundamental motivation t pled from a fixed but unknown distribution F which to use the DP model is the simplicity of the posterior admits the following representation update. Let θ1, .., θn be n random samples from G i.i.d. F (vt) = N (vt; µ, Σ)dG(µ, Σ). (5) θk|G ∼ G Z F is a countable infinite mixture of Gaussians pdf of Thus p(θt|z1:T , θ−t) is proportional to unknown parameters, and the mixing distribution G is sampled from a Dirichlet process. We will denote T p(z1:T |θ1:T ) × δθ (θt) + αG0(θt) . θt = {µt, Σt} the latent cluster variables giving the k k ,k t mean and covariance matrix for that cluster. =1X6= The MCMC algorithms available in the literature to We can sample from this density with a Metropolis- estimate these Bayesian nonparametric models – e.g. Hastings step, where the candidate pdf is the condi- [5, 6]– do not apply in our case because we do not ob- tional prior p(θt|θ−t). This is given by algorithm 2. serve the sequence {vt} but only the observations {zt} generated by the dynamic model (1)-(2). In the fol- Algorithm 2 Metropolis-Hastings step to sample from lowing sections, we propose two strategies to perform p(θt|z1:T , θ−t) inference in this more complex framework. Despite • Sample a candidate cluster the complexity of the model, we develop efficient com- putational methods based on a Rao-Blackwellisation T (i)∗ 1 α approach.