Sequential Monte Carlo Methods for System Identification (A Talk About Strategies)
"The particle filter provides a systematic way of exploring the state space."

Thomas Schön, Division of Systems and Control, Department of Information Technology, Uppsala University. Email: [email protected], www: user.it.uu.se/~thosc112

Joint work with: Fredrik Lindsten (University of Cambridge), Johan Dahlin (Linköping University), Johan Wågberg (Uppsala University), Christian A. Naesseth (Linköping University), Andreas Svensson (Uppsala University), and Liang Dai (Uppsala University).

IFAC Symposium on System Identification (SYSID), Beijing, China, October 20, 2015.

Introduction

A state space model (SSM) consists of a Markov process $\{x_t\}_{t\ge 1}$ that is indirectly observed via a measurement process $\{y_t\}_{t\ge 1}$ (probabilistic and functional form, respectively):
$$x_{t+1} \mid x_t \sim f_\theta(x_{t+1} \mid x_t, u_t), \qquad x_{t+1} = a_\theta(x_t, u_t) + v_{\theta,t},$$
$$y_t \mid x_t \sim g_\theta(y_t \mid x_t, u_t), \qquad\;\; y_t = c_\theta(x_t, u_t) + e_{\theta,t},$$
$$x_1 \sim \mu_\theta(x_1), \qquad (\theta \sim \pi(\theta)).$$

Identifying the nonlinear SSM: find $\theta$ based on $y_{1:T} \triangleq \{y_1, y_2, \ldots, y_T\}$ (and $u_{1:T}$). Hence, the off-line problem.

One of the key challenges: the states $x_{1:T}$ are unknown.

Aim of the talk: reveal the structure of the system identification problem arising in nonlinear SSMs and highlight where SMC is used.

Two commonly used problem formulations

Maximum likelihood (ML) formulation – model the unknown parameters as a deterministic variable and solve
$$\widehat\theta_{\mathrm{ML}} = \arg\max_{\theta \in \Theta} p_\theta(y_{1:T}).$$

Bayesian formulation – model the unknown parameters as a random variable $\theta \sim \pi(\theta)$ and compute
$$p(\theta \mid y_{1:T}) = \frac{p(y_{1:T} \mid \theta)\,\pi(\theta)}{p(y_{1:T})} = \frac{p_\theta(y_{1:T})\,\pi(\theta)}{p(y_{1:T})}.$$

The combination of ML and Bayes is probably more interesting than we think.

Central object – the likelihood

The likelihood is computed by marginalizing the joint density
$$p_\theta(x_{1:T}, y_{1:T}) = \mu_\theta(x_1) \prod_{t=1}^{T} g_\theta(y_t \mid x_t) \prod_{t=1}^{T-1} f_\theta(x_{t+1} \mid x_t)$$
w.r.t. the state sequence $x_{1:T}$,
$$p_\theta(y_{1:T}) = \int p_\theta(x_{1:T}, y_{1:T})\, \mathrm{d}x_{1:T}.$$
We are averaging $p_\theta(x_{1:T}, y_{1:T})$ over all possible state sequences. Equivalently, we have
$$p_\theta(y_{1:T}) = \prod_{t=1}^{T} p_\theta(y_t \mid y_{1:t-1}) = \prod_{t=1}^{T} \underbrace{\int g_\theta(y_t \mid x_t)\, p_\theta(x_t \mid y_{1:t-1})\, \mathrm{d}x_t}_{\text{key challenge}}.$$

Sequential Monte Carlo

The need for computational methods, such as SMC, is tightly coupled to the intractability of the integrals on the previous slide. SMC offers numerical approximations to state estimation problems. The particle filter and the particle smoother maintain the empirical approximations
$$\widehat p_\theta(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^i\, \delta_{x_t^i}(x_t), \qquad \widehat p_\theta(x_{1:t} \mid y_{1:t}) = \sum_{i=1}^{N} w_t^i\, \delta_{x_{1:t}^i}(x_{1:t}),$$
which converge to the true distributions as $N \to \infty$.

The use of SMC in nonlinear sysid

SMC can be used to approximately
1. compute the likelihood and its derivatives,
2. solve state smoothing problems, e.g. compute $p(x_{1:T} \mid y_{1:T})$,
3. simulate from the smoothing pdf, $\widetilde x_{1:T} \sim p(x_{1:T} \mid y_{1:T})$.

These three capabilities are key components in realizing various nonlinear system identification strategies; a sketch of the first one follows below.
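To make the first capability concrete, here is a minimal sketch (not from the talk) of a bootstrap particle filter that returns an estimate of the log-likelihood $\log p_\theta(y_{1:T})$. The function names, arguments, and the NumPy-based implementation are illustrative assumptions, not the speaker's code.

```python
import numpy as np

def bootstrap_pf(y, mu_sample, f_sample, g_logpdf, N=500, rng=None):
    """Bootstrap particle filter returning an estimate of log p_theta(y_{1:T}).

    mu_sample(N, rng) -- draws N samples from mu_theta(x_1)
    f_sample(x, rng)  -- propagates particles through f_theta(x_{t+1} | x_t)
    g_logpdf(y_t, x)  -- evaluates log g_theta(y_t | x_t) for all particles
    """
    rng = np.random.default_rng() if rng is None else rng
    x = mu_sample(N, rng)                           # x_1^i ~ mu_theta(x_1)
    loglik = 0.0
    for t in range(len(y)):
        logw = g_logpdf(y[t], x)                    # unnormalized log-weights
        c = logw.max()
        w = np.exp(logw - c)
        loglik += c + np.log(w.mean())              # estimate of log p_theta(y_t | y_{1:t-1})
        idx = rng.choice(N, size=N, p=w / w.sum())  # multinomial resampling
        x = f_sample(x[idx], rng)                   # x_{t+1}^i ~ f_theta(. | x_t^i)
    return loglik
```

A useful property of this estimator is that the implied estimate of $p_\theta(y_{1:T})$ (the exponential of the returned value) is unbiased, which is what later allows it to be plugged into a Metropolis-Hastings acceptance ratio without changing the stationary distribution, as discussed in the PMCMC slides below.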
Identification strategies – overview

Marginalisation: deal with $x_{1:T}$ by marginalizing (integrating) them out and view $\theta$ as the only unknown quantity.
• Frequentistic formulation: the prediction error method (PEM) and direct maximization of the likelihood.
• Bayesian formulation: the Metropolis-Hastings sampler.

Data augmentation: deal with $x_{1:T}$ by treating them as auxiliary variables to be estimated along with $\theta$.
• Frequentistic formulation: the expectation maximization (EM) algorithm.
• Bayesian formulation: the Gibbs sampler.

In marginalization we integrate over $x_{1:T}$; in data augmentation we simulate $x_{1:T}$.

Marginalization – Metropolis Hastings

Metropolis Hastings (MH) is an MCMC method that produces a sequence of random variables $\{\theta[m]\}_{m \ge 1}$ by iterating:
1. Propose a new sample $\theta' \sim q(\cdot \mid \theta[m])$.
2. Accept the new sample with probability
$$\alpha = \min\left(1,\; \frac{p_{\theta'}(y_{1:T})\,\pi(\theta')}{p_{\theta[m]}(y_{1:T})\,\pi(\theta[m])}\, \frac{q(\theta[m] \mid \theta')}{q(\theta' \mid \theta[m])}\right).$$

The above procedure results in a Markov chain $\{\theta[m]\}_{m \ge 1}$ with $p(\theta \mid y_{1:T})$ as its stationary distribution!

SMC is used to approximate the likelihood $p_\theta(y_{1:T})$.

Exact approximations and PMCMC

Exact approximation: $p(\theta \mid y_{1:T})$ is recovered exactly, despite the fact that we employ an SMC approximation of the likelihood using a finite number of particles $N$. This is one of the members of the particle MCMC (PMCMC) family introduced by

Christophe Andrieu, Arnaud Doucet and Roman Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.

The idea underlying PMCMC is to make use of SMC algorithms to propose (simulate) state trajectories $x_{1:T}$. These state trajectories are then used within standard MCMC algorithms.
1. Particle Metropolis Hastings
2. Particle Gibbs

Data augmentation – Gibbs sampling

Intuitively, the data augmentation strategy amounts to iterating between updating $x_{1:T}$ and $\theta$. Gibbs sampling amounts to sequentially sampling from conditionals of the target distribution $p(\theta, x_{1:T} \mid y_{1:T})$. A (blocked) example:
• Draw $\theta[m] \sim p(\theta \mid x_{1:T}[m-1], y_{1:T})$; OK!
• Draw $x_{1:T}[m] \sim p(x_{1:T} \mid \theta[m], y_{1:T})$. Hard!

Problem: $p(x_{1:T} \mid \theta[m], y_{1:T})$ is not available!

SMC is used to simulate from the smoothing pdf $p(x_{1:T} \mid y_{1:T})$.

Data augmentation – EM

The expectation maximization (EM) algorithm is an iterative approach to compute ML estimates of unknown parameters ($\theta$) in probabilistic models involving latent variables (the state trajectory $x_{1:T}$). EM works by iteratively computing
$$\mathcal{Q}(\theta, \theta_k) = \int \log p_\theta(x_{1:T}, y_{1:T})\, p_{\theta_k}(x_{1:T} \mid y_{1:T})\, \mathrm{d}x_{1:T}$$
and then maximizing $\mathcal{Q}(\theta, \theta_k)$ w.r.t. $\theta$.

Problem: the E-step requires us to solve a smoothing problem, i.e. to compute an expectation under $p_{\theta_k}(x_{1:T} \mid y_{1:T})$.

[Diagram: EM → MCEM → PSEM.]

SMC is used to approximate the smoothing pdf $p_{\theta_k}(x_{1:T} \mid y_{1:T})$; a sketch of one such sampler follows below.
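To make the smoothing step concrete, here is a minimal sketch of backward simulation (in the spirit of FFBSi), which draws one trajectory approximately from $p_\theta(x_{1:T} \mid y_{1:T})$ given particles and normalized weights stored during a forward particle-filter pass (the bootstrap_pf sketch above would need to be extended to store them). Scalar states, the array layout, and the function names are assumptions; the talk's own Gibbs strategy instead relies on conditional particle filters.

```python
def backward_simulation(X, W, f_logpdf, rng=None):
    """Draw one trajectory x_{1:T} approximately from p_theta(x_{1:T} | y_{1:T}).

    X[t, i]              -- particle i at time t from a forward particle filter
    W[t, i]              -- corresponding normalized importance weight
    f_logpdf(x_next, x)  -- log f_theta(x_next | x) evaluated for all particles x
    """
    rng = np.random.default_rng() if rng is None else rng
    T, N = X.shape
    traj = np.empty(T)
    j = rng.choice(N, p=W[-1])               # sample x_T from the filter approximation
    traj[-1] = X[-1, j]
    for t in range(T - 2, -1, -1):
        logw = np.log(W[t]) + f_logpdf(traj[t + 1], X[t])   # reweight by the transition
        w = np.exp(logw - logw.max())
        j = rng.choice(N, p=w / w.sum())
        traj[t] = X[t, j]
    return traj
```

Repeated draws of this kind can be used to approximate the expectation in the E-step above, and a single draw per iteration is exactly what the stochastic approximation EM variant discussed next asks for.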
Combined ML and Bayesian approach

[Diagram: EM → MCEM → PSEM and EM → SAEM → Markov SAEM → PSAEM.]

In stochastic approximation EM (SAEM), $\mathcal{Q}(\theta, \theta_k)$ is replaced by
$$\widehat{\mathcal{Q}}_k(\theta) = (1 - \gamma_k)\, \widehat{\mathcal{Q}}_{k-1}(\theta) + \gamma_k \log p_\theta(x_{1:T}[k], y_{1:T}),$$
where $x_{1:T}[k]$ denotes a sample from $p_{\theta_k}(x_{1:T} \mid y_{1:T})$.

F. Lindsten, An efficient stochastic approximation EM algorithm using conditional particle filters. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May 2013.

Example – regularization in nonlinear models

Regularization allows us to tune the model complexity. Place a Gaussian process state space model (GP-SSM) prior on the nonlinear transition $f(\cdot)$!

Toy example: an autonomous system (i.e., no $u_t$) defined by
$$x_{t+1} = \frac{-10\, x_t}{1 + 3 x_t^2} + w_t, \qquad y_t = x_t + e_t,$$
where $w_t \sim \mathcal{N}(0, 0.1)$ and $e_t \sim \mathcal{N}(0, 0.5)$; a data-simulation sketch follows below.

[Figure: true function, identified function, true and estimated standard deviation of $w_t$, and data, shown for $m = 6$, $m = 100$, and $m = 100$ with regularization basis functions; horizontal axis $x_t$ from $-3$ to $3$, vertical axis $f(x_t)$.]

Key: PSAEM.

Results for the Hammerstein-Wiener benchmark (experiment with $T = 2\,000$):
Mean simulation error: 0.0005 V
Standard deviation of simulation error: 0.020 V
RMS simulation error: 0.020 V
Run time: 13 min
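For completeness, here is a minimal sketch that simulates data from the toy system above. The initial state, the interpretation of 0.1 and 0.5 as noise variances, and the function name are assumptions made for illustration.

```python
def simulate_toy(T=2000, x1=0.0, q=0.1, r=0.5, rng=None):
    """Simulate x_{t+1} = -10 x_t / (1 + 3 x_t^2) + w_t,  y_t = x_t + e_t,
    with w_t ~ N(0, q) and e_t ~ N(0, r), where q and r are read as variances."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(T)
    y = np.empty(T)
    x[0] = x1
    for t in range(T):
        y[t] = x[t] + np.sqrt(r) * rng.standard_normal()      # measurement y_t = x_t + e_t
        if t + 1 < T:
            x[t + 1] = (-10.0 * x[t] / (1.0 + 3.0 * x[t] ** 2)
                        + np.sqrt(q) * rng.standard_normal())  # state transition plus w_t
    return x, y
```

Data generated this way is the kind of input on which the PSAEM-based GP-SSM identification in the example would be run.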