Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models
Johan Dahlin (Linköping University) and Thomas B. Schön (Uppsala University)
Abstract

This tutorial provides a gentle introduction to the particle Metropolis-Hastings (PMH) algorithm for parameter inference in nonlinear state-space models together with a software implementation in the statistical programming language R. We employ a step-by-step approach to develop an implementation of the PMH algorithm (and the particle filter within) together with the reader. The final implementation is also available as the package pmhtutorial from the Comprehensive R Archive Network (CRAN) repository. Throughout the tutorial, we provide some intuition as to how the algorithm operates and discuss some solutions to problems that might occur in practice. To illustrate the use of PMH, we consider parameter inference in a linear Gaussian state-space model with synthetic data and a nonlinear stochastic volatility model with real-world data.
Keywords: Bayesian inference, state-space models, particle filtering, particle Markov chain Monte Carlo, stochastic volatility model.
1. Introduction
We are concerned with Bayesian parameter inference in nonlinear state-space models (SSMs). This is an important problem as SSMs are ubiquitous in, e.g., automatic control (Ljung 1999), econometrics (Durbin and Koopman 2012), finance (Tsay 2005) and many other fields. An overview of some concrete applications of state-space modeling is given by Langrock (2011).

The major problem with Bayesian parameter and state inference in SSMs is that it is analytically intractable and cannot be solved in closed form. Hence, approximations are required and these typically fall into two categories: analytical approximations and statistical simulations. The focus of this tutorial is on a member of the latter category known as particle Metropolis-Hastings (PMH; Andrieu et al. 2010), which approximates the posterior distribution by employing a specific Markov chain to sample from it. This requires us to be able to compute unbiased point-wise estimates of the posterior. In PMH, these are provided by the particle filter (Gordon et al. 1993; Doucet and Johansen 2011), which is another important and rather general statistical simulation algorithm.

The main aim and contribution of this tutorial is to give a gentle introduction (both in terms of methodology and software) to the PMH algorithm. We assume that the reader is familiar with traditional time series analysis and Kalman filtering at the level of Brockwell and Davis (2002). Some familiarity with Monte Carlo approximations, importance sampling and Markov chain Monte Carlo (MCMC) at the level of Ross (2012) is also beneficial.

arXiv:1511.01707v8 [stat.CO] 12 Mar 2019
The focus is on implementing the particle filter and PMH algorithms step by step in the programming language R (R Core Team 2018). We employ literate programming¹ to build up the code using a top-down approach, see Section 1.1 in Pharr and Humphreys (2010). The final implementation is available as an R package pmhtutorial (Dahlin 2018) distributed under the GPL-2 license via the Comprehensive R Archive Network (CRAN) repository and available at https://CRAN.R-project.org/package=pmhtutorial. The R source code is also provided via GitHub with corresponding versions for MATLAB (The MathWorks Inc. 2017) and Python (van Rossum et al. 2011) to enable learning by readers from many different backgrounds, see Section 8 for details.

There are a number of existing tutorials which relate to this one. A thorough introduction to particle filtering is provided by Arulampalam et al. (2002), where a number of different implementations are discussed and pseudo-code is provided to facilitate implementation. This is an excellent start for the reader who is particularly interested in particle filtering. However, they do not discuss nor implement PMH. The software LibBi and the connected tutorial by Murray (2015) include high-performance implementations and examples of particle filtering and PMH. The focus of the present tutorial is a bit different as it tries to explain all the steps of the algorithms and provides basic implementations in several different programming languages. Finally, Ala-Luhtala et al. (2016) present an advanced tutorial paper on twisted particle filtering. It also includes pseudo-code for PMH but its focus is on particle filtering and it does not discuss, e.g., how to tune the proposal in PMH. We return to related work and software throughout this tutorial in Sections 3.3, 4.3, 6 and 7.

The disposition of this tutorial is as follows. In Section 2, we introduce the Bayesian parameter and state inference problem in SSMs and discuss how and why the PMH algorithm works.
In Sections 3 and 4, we implement the particle filter and PMH algorithm for a toy model and study their properties, as well as give a small survey of results suitable for further study. In Sections 5 and 6, the PMH implementation is adapted to solve a real-world problem from finance. Furthermore, some more advanced modifications and improvements to PMH are discussed and exemplified. Finally, we conclude this tutorial in Section 7, where related software implementations are discussed.
2. Overview of the PMH algorithm
In this section, we introduce the Bayesian inference problem in SSMs related to the parameters and the latent states. The PMH algorithm is then introduced to solve these intractable problems. The inclusion of the particle filter into PMH is discussed from two complementary perspectives: as an approximation of the acceptance probability and as auxiliary variables in an augmented posterior distribution.
¹The skeleton of the code is outlined as a collection of different variables denoted by …
··· ──→ x_{t-1} ──→ x_t ──→ x_{t+1} ──→ ···
            │          │         │
            ▼          ▼         ▼
        y_{t-1}       y_t     y_{t+1}
Figure 1: The structure of an SSM represented as a graphical model.
2.1. Bayesian inference in SSMs

In Figure 1, we present a graphical model of the SSM with the latent states (top) and the observations (bottom), but without exogenous inputs². The latent state x_t only depends on the previous state x_{t-1} of the process as it is a (first-order) Markov process. That is, all the information in the past states x_{0:t-1} ≜ {x_s}_{s=0}^{t-1} is summarized in the most recent state x_{t-1}. The observations y_{1:t} are conditionally independent as they only relate to each other through the states.
We assume that the states and observations are real-valued, i.e., y_t ∈ Y ⊆ R^{n_y} and x_t ∈ X ⊆ R^{n_x}. The initial state x_0 is distributed according to the density µ_θ(x_0) parameterized by some unknown static parameters θ ∈ Θ ⊂ R^p. The latent state and the observation processes are governed by the densities f_θ(x_t | x_{t-1}) and g_θ(y_t | x_t), respectively³. The density describing the state process gives the probability that the next state is x_t given the previous state x_{t-1}. An analogous interpretation holds for the observation process. With the notation in place, a general nonlinear SSM can be expressed as
x_0 ∼ µ_θ(x_0),   x_t | x_{t-1} ∼ f_θ(x_t | x_{t-1}),   y_t | x_t ∼ g_θ(y_t | x_t),   (1)
where no notational distinction is made between a random variable and its realization. The main objective in Bayesian parameter inference is to obtain an estimate of the parameters θ given the measurements y1:T by computing the parameter posterior (distribution) given by
π_θ(θ) ≜ p(θ | y_{1:T}) = p(θ) p(y_{1:T} | θ) / Z_θ ≜ γ_θ(θ) / Z_θ,   Z_θ = ∫_Θ p(θ') p(y_{1:T} | θ') dθ',   (2)

where p(θ) and p(y_{1:T} | θ) ≜ p_θ(y_{1:T}) denote the parameter prior (distribution) and the data distribution, respectively. The un-normalized posterior distribution, denoted by γ_θ(θ) = p(θ) p(y_{1:T} | θ), is an important component in the construction of PMH. The denominator Z_θ is usually referred to as the marginal likelihood or the model evidence.
²It is straightforward to modify the algorithms presented for vector-valued quantities and also to include a known exogenous input u_t.
³In this tutorial, we assume that x_t | x_{t-1} and y_t | x_t can be modeled as continuous random variables with a density function. However, the algorithms discussed can be applied to deterministic states and discrete states/observations as well.
The data distribution pθ(y1:T ) is intractable for most interesting SSMs due to the fact that the state sequence is unknown. The problem can be seen by studying the decomposition
p_θ(y_{1:T}) = p_θ(y_1) ∏_{t=2}^{T} p_θ(y_t | y_{1:t-1}),   (3)

where the so-called predictive likelihood is given by

p_θ(y_t | y_{1:t-1}) = ∫∫ g_θ(y_t | x_t) f_θ(x_t | x_{t-1}) π_{t-1}(x_{t-1}) dx_t dx_{t-1}.   (4)
The marginal filtering distribution π_{t-1}(x_{t-1}) ≜ p_θ(x_{t-1} | y_{1:t-1}) can be obtained from the Bayesian filtering recursions (Anderson and Moore 2005), which for a fixed θ are given by
π_t(x_t) = [g_θ(y_t | x_t) / p_θ(y_t | y_{1:t-1})] ∫_X f_θ(x_t | x_{t-1}) π_{t-1}(x_{t-1}) dx_{t-1},   for 0 < t ≤ T,   (5)
with π0(x0) = µθ(x0) as the initialization. In theory, it is possible to construct a sequence of filtering distributions by iteratively applying (5). However in practice, this strategy cannot be implemented for many models of interest as the marginalization over xt−1 (expressed by the integral) cannot be carried out in closed form.
2.2. Constructing the Markov chain

The PMH algorithm offers an elegant solution to both of these intractabilities by leveraging statistical simulation. Firstly, a particle filter is employed to approximate π_{t-1}(x_{t-1}) and obtain point-wise unbiased estimates of the likelihood. Secondly, an MCMC algorithm (Robert and Casella 2004) known as Metropolis-Hastings (MH) is employed to approximate π_θ(θ) based on the likelihood estimator provided by the particle filter. PMH emerges as the combination of these two algorithms.

MCMC methods generate samples from a so-called target distribution by constructing a specific Markov chain, which can be used to form an empirical approximation. In this tutorial, the target distribution will be either the parameter posterior π_θ(θ) or the posterior of the parameters and states π_{θ,T}(θ, x_{0:T}) = p(θ, x_{0:T} | y_{1:T}). We focus here on the former case; the latter follows analogously. Executing the PMH algorithm results in K correlated samples θ^{(1:K)} ≜ {θ^{(k)}}_{k=1}^{K}, which can be used to construct a Monte Carlo approximation of π_θ(θ). This empirical approximation of the parameter posterior distribution is given by
π̂_θ^K(dθ) = (1/K) ∑_{k=1}^{K} δ_{θ^{(k)}}(dθ),   (6)
which corresponds to a collection of Dirac delta functions δ_{θ'}(dθ) located at θ = θ' with equal weights. In practice, histograms or kernel density estimators are employed to visualize the estimate of the parameter posterior obtained from (6).

The construction of the Markov chain within PMH amounts to carrying out two steps to obtain one sample from π_θ(θ). The first step is to propose a so-called candidate parameter θ' from a proposal distribution q(θ' | θ^{(k-1)}) given the previous state of the Markov chain denoted by θ^{(k-1)}. The user is quite free to choose this proposal, but its support should cover the support of the target distribution. The second step is to determine if the state of the Markov chain should be changed to the candidate parameter θ' or if it should remain in the previous state θ^{(k-1)}. This decision is stochastic and the candidate parameter is assigned as the next state of the Markov chain, i.e., θ^{(k)} ← θ', with a certain acceptance probability α(θ', θ^{(k-1)}), which is given by
α(θ', θ^{(k-1)}) = min{1, [π_θ(θ') q(θ^{(k-1)} | θ')] / [π_θ(θ^{(k-1)}) q(θ' | θ^{(k-1)})]}
               = min{1, [γ_θ(θ') q(θ^{(k-1)} | θ')] / [γ_θ(θ^{(k-1)}) q(θ' | θ^{(k-1)})]},   (7)

where Z_θ cancels as it is independent of the current state of the Markov chain. The stochastic behavior introduced by (7) facilitates exploration of (in theory) the entire posterior and is also a necessary condition for the Markov chain to actually have the target as its stationary distribution. The latter is known as detailed balance, which ensures that the Markov chain is reversible and has the correct stationary distribution, that is, that the samples generated by PMH actually are from π_θ(θ). Hence, the stochastic acceptance decision can be seen as a correction of the Markov chain generated by the proposal distribution. The stationary distribution of the corrected Markov chain is the desired target distribution. In a sense this is similar to importance sampling, where samples from a proposal distribution are corrected by weighting to be approximately distributed according to the target distribution.

The intuition for the acceptance probability (7) (disregarding the influence of the proposal q) is that we always accept a candidate parameter θ' if π_θ(θ') > π_θ(θ^{(k-1)}), that is, if θ' increases the value of the target compared with the previous state θ^{(k-1)}. This results in a mode-seeking behavior which is similar to that of an optimization algorithm estimating the maximum of the posterior distribution. However, from (7) we also note that a small decrease in the posterior value can be accepted to facilitate exploration of the entire posterior. This is the main difference between PMH and an optimization algorithm: the former maps the entire posterior whereas the latter only seeks the location of its mode.
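To make the two-step construction concrete, the following is a minimal sketch in R of a standard MH sampler for a one-dimensional toy problem, where a standard Gaussian stands in for the target π_θ; the names logTarget and stepSize are ours, not from the paper. With a symmetric random-walk proposal, the ratio of proposal densities in (7) cancels.

```r
# Minimal MH sketch on a toy target; not yet PMH (the exact log-target
# here stands in for the particle-filter estimate introduced later).
set.seed(10)
K <- 5000
logTarget <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE)
stepSize <- 0.5

theta <- numeric(K)
theta[1] <- 1
for (k in 2:K) {
  candidate <- theta[k - 1] + stepSize * rnorm(1)   # symmetric proposal
  logAlpha <- logTarget(candidate) - logTarget(theta[k - 1])
  if (log(runif(1)) < logAlpha) {
    theta[k] <- candidate                           # accept the candidate
  } else {
    theta[k] <- theta[k - 1]                        # reject, keep old state
  }
}
```

The empirical mean and standard deviation of theta should be close to 0 and 1; PMH later replaces logTarget with the logarithm of the estimator in (12).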
2.3. Approximating test functions

In Bayesian parameter inference, the interest often lies in computing the expectation of a so-called test function, which is a well-behaved (integrable) function ϕ : Θ → R^{n_ϕ} mapping the parameters to a value on the real space. The expectation with respect to the parameter posterior is given by

π_θ[ϕ] ≜ E[ϕ(θ) | y_{1:T}] = ∫_Θ ϕ(θ) π_θ(θ) dθ,   (8)

where, e.g., choosing ϕ(θ) = θ corresponds to computing the mean of the parameter posterior. The expectation in (8) can be estimated via the empirical approximation in (6) according to
π̂_θ^K[ϕ] ≜ ∫_Θ ϕ(θ) π̂_θ^K(dθ) = (1/K) ∑_{k=1}^{K} ∫_Θ ϕ(θ) δ_{θ^{(k)}}(dθ) = (1/K) ∑_{k=1}^{K} ϕ(θ^{(k)}),   (9)

where the last equality follows from the properties of the Dirac delta function. This estimator is well-behaved and it is possible to establish a law of large numbers (LLN) and a central limit
theorem (CLT), see Meyn and Tweedie (2009) for more information. From the LLN, we know that the estimator is consistent (and asymptotically unbiased as K → ∞). Moreover, from the CLT we know that the error is approximately Gaussian with a variance decreasing as 1/K, which is the usual Monte Carlo rate. Note that the LLN usually assumes independent samples, but the ergodic theorem gives a similar result (under some assumptions) even when the samples θ^{(1:K)} are correlated⁴. However, the asymptotic variance is usually larger than if the samples were uncorrelated.
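As a small illustration (with made-up draws standing in for PMH output), the estimator (9) with ϕ(θ) = θ is simply the sample mean of the chain, and other test functions follow the same pattern:

```r
# Estimator (9): averages of phi evaluated at the (correlated) samples.
# The draws below are illustrative, not actual PMH output.
thetaSamples <- c(0.2, 0.4, 0.3, 0.5, 0.1)
posteriorMean <- mean(thetaSamples)                      # phi(theta) = theta
posteriorVar <- mean((thetaSamples - posteriorMean)^2)   # phi(theta) = (theta - mean)^2
```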
2.4. Approximating the acceptance probability

One major obstacle remains to be solved before the PMH algorithm can be implemented. The acceptance probability (7) is intractable and cannot be computed as the likelihood is unknown. From above, we know that the particle filter can provide an unbiased point-wise estimator of the likelihood and therefore also of the posterior π_θ(θ). It turns out that this unbiasedness property is crucial for PMH to provide a valid empirical approximation of π_θ(θ). This approach is known as an exact approximation: the likelihood is replaced by an estimate, but the algorithm still retains its validity, see Andrieu and Roberts (2009) for more information.
The particle filter generates a set of samples from πt(xt) for each t, which can be used to create an empirical approximation. These samples are generated by sequentially applying importance sampling to approximate the solution to (5). The approximation of the filtering distribution is then given by
π̂_t^N(dx_t) ≜ p̂_θ^N(dx_t | y_{1:t}) = ∑_{i=1}^{N} w_t^{(i)} δ_{x_t^{(i)}}(dx_t),   w_t^{(i)} ≜ v_t^{(i)} / ∑_{j=1}^{N} v_t^{(j)},   (10)
where the particles x_t^{(i)} and their normalized weights w_t^{(i)} constitute the so-called particle system generated during a run of the particle filter. The un-normalized weights are denoted by v_t^{(i)}. It is possible to prove that the empirical approximation converges to the true distribution as N → ∞ under some regularity assumptions⁵. The likelihood can then be estimated via the decomposition (3) by inserting the empirical filtering distribution (10) into (4). This results in the estimator

p̂_θ^N(y_{1:T}) = p̂_θ^N(y_1) ∏_{t=2}^{T} p̂_θ^N(y_t | y_{1:t-1}) = ∏_{t=1}^{T} [ (1/N) ∑_{i=1}^{N} v_t^{(i)} ],   (11)

where a point-wise estimate of the un-normalized parameter posterior is given by
γ̂_θ^N(θ) = p(θ) p̂_θ^N(y_{1:T}).   (12)

⁴The theoretical details underpinning MH have been deliberately omitted. For the ergodic theorem to hold, the Markov chain needs to be ergodic, which means that the Markov chain should be able to explore the entire state-space and not get stuck in certain parts of it. The interested reader can find more about the theoretical details in, e.g., Tierney (1994) and Meyn and Tweedie (2009) for MH and Andrieu et al. (2010) for PMH.
⁵Again, we have omitted many of the important theoretical details regarding the particle filter. For more information about these and many other important topics related to this algorithm, see, e.g., Crisan and Doucet (2002) and Doucet and Johansen (2011).
Here, it is assumed that the prior can be computed point-wise by closed-form expressions. The acceptance probability (7) can be approximated by plugging in (12) resulting in
α(θ', θ^{(k-1)}) = min{1, [γ̂_θ^N(θ') q(θ^{(k-1)} | θ')] / [γ̂_θ^N(θ^{(k-1)}) q(θ' | θ^{(k-1)})]}
               = min{1, [p(θ') p̂_{θ'}^N(y_{1:T}) q(θ^{(k-1)} | θ')] / [p(θ^{(k-1)}) p̂_{θ^{(k-1)}}^N(y_{1:T}) q(θ' | θ^{(k-1)})]}.   (13)
This approximation is only valid if p̂_θ^N(y_{1:T}) is an unbiased estimator of the likelihood. Fortunately, this is the case when employing the particle filter as its likelihood estimator is unbiased (Del Moral 2013; Pitt et al. 2012) for any N ≥ 1.

As in the parameter inference problem, interest often lies in computing an expectation of a well-behaved test function ϕ : X → R^{n_ϕ} with respect to π_t(x_t) given by

π_t[ϕ] ≜ E[ϕ(x_t) | y_{1:t}] = ∫_X ϕ(x_t) π_t(x_t) dx_t,

where again choosing the test function ϕ(x_t) = x_t corresponds to computing the mean of the marginal filtering distribution. This expectation is intractable as π_t(x_t) is unknown. Again, we replace it with an estimate provided by the empirical approximation (10), which results in
π̂_t^N[ϕ] ≜ ∫_X ϕ(x_t) π̂_t^N(dx_t) = ∑_{i=1}^{N} w_t^{(i)} ∫_X ϕ(x_t) δ_{x_t^{(i)}}(dx_t) = ∑_{i=1}^{N} w_t^{(i)} ϕ(x_t^{(i)}),   (14)

for some 0 ≤ t ≤ T by the properties of the Dirac delta function. In the subsequent sections, we are especially interested in the estimator for the mean of the marginal filtering distribution,
x̂_t^N ≜ π̂_t^N[x] = ∑_{i=1}^{N} w_t^{(i)} x_t^{(i)},   (15)

which is referred to as the filtered state estimate. Under some assumptions, the properties of π̂_t^N[ϕ] are similar to those of the estimator in the PMH algorithm, see Crisan and Doucet (2002) and Doucet and Johansen (2011) for more information. Hence, the estimator is consistent (and asymptotically unbiased as N → ∞) and the error is approximately Gaussian with a variance decreasing as 1/N.
2.5. Outlook: Pseudo-marginal algorithms

The viewpoint adopted in the previous discussion of PMH is that it is an MH algorithm which employs a particle filter to approximate the otherwise intractable acceptance probability. Another, more general viewpoint is to consider PMH as a pseudo-marginal algorithm (Andrieu and Roberts 2009). In this type of algorithm, some auxiliary variables are introduced to facilitate computations, but these are marginalized away during a run of the algorithm. As a result, the PMH algorithm can be seen as a standard MH algorithm operating on the non-standard extended space Θ × U, where Θ and U denote the parameter space and the space of the auxiliary variables, respectively. The resulting extended target is given by
π_{θ,u}(θ, u) = π̂_θ^N(θ | u) m(u).   (16)

Here, the parameter posterior is augmented with u ∈ U, which denotes some multivariate random variables with density m(u). From the discussion above, we know that u can be used to construct a point-wise estimator π̂_θ^N(θ | u) via the particle filter by (12). The unbiasedness property of the likelihood estimator based on the particle filter gives

E_u[γ̂_θ^N(θ | u)] = ∫_U γ̂_θ^N(θ | u) m(u) du = γ_θ(θ).   (17)
This means that the un-normalized parameter posterior γ_θ(θ) can be recovered by marginalizing over all the auxiliary variables u. In the implementation, this means that the sampled values of u can simply be disregarded. Hence, we will not store them in the subsequent implementations but only keep the samples of θ (and x_t). We conclude by deriving the pseudo-marginal algorithm following from (16). A proposal for θ and u is selected with the form
q(θ', u' | θ^{(k-1)}, u^{(k-1)}) = q_θ(θ' | θ^{(k-1)}, u^{(k-1)}) q_u(u' | u^{(k-1)}),   (18)
q_θ(θ' | θ^{(k-1)}, u^{(k-1)}) = q_θ(θ' | θ^{(k-1)}),   q_u(u' | u^{(k-1)}) = m(u').   (19)
This corresponds to an independent proposal for u and a proposal for θ that does not include u. Other options are a topic of current research, see Section 4.3 for some references. The resulting acceptance probability from this choice of target and proposal is given by

α(θ', u', θ^{(k-1)}, u^{(k-1)}) = min{1, [π_{θ,u}(θ', u') q(θ^{(k-1)}, u^{(k-1)} | θ', u')] / [π_{θ,u}(θ^{(k-1)}, u^{(k-1)}) q(θ', u' | θ^{(k-1)}, u^{(k-1)})]}
                              = min{1, [γ̂_θ^N(θ' | u') q_θ(θ^{(k-1)} | θ')] / [γ̂_θ^N(θ^{(k-1)} | u^{(k-1)}) q_θ(θ' | θ^{(k-1)})]}.

Note that the resulting acceptance probability is the same as in (13). These arguments provide a sketch of the proof that PMH generates samples from the target distribution and were first presented by Flury and Shephard (2011). For a more formal treatment and proof of the validity of PMH, see Andrieu and Roberts (2009) and Andrieu et al. (2010).
Figure 2: Simulated data from the LGSS model with latent state (orange), observations (green) and autocorrelation function (ACF) of the observations (light green).
3. Estimating the states in a linear Gaussian SSM
We are now ready to start implementing the PMH algorithm based on the material in Section 2. To simplify the presentation, this section discusses how to estimate the filtering distributions {π_t(x_t)}_{t=0}^{T} for a linear Gaussian state-space (LGSS) model. These distributions can be used to compute x̂_{0:T}^N and p̂_θ^N(y_{1:T}), i.e., the estimates of the filtered states and the estimate of the likelihood, respectively. The parameter inference problem is treated in the subsequent section. The particular LGSS model considered is given by
x_0 ∼ δ_{x_0}(x),   x_t | x_{t-1} ∼ N(x_t; φ x_{t-1}, σ_v²),   y_t | x_t ∼ N(y_t; x_t, σ_e²),   (20)

where the parameters are denoted by θ = {φ, σ_v, σ_e}. Here, φ ∈ (−1, 1) determines the persistence of the state, while σ_v, σ_e ∈ R_+ denote the standard deviations of the state transition noise and the observation noise, respectively. The Gaussian density is denoted by N(x; µ, σ²) with mean µ and standard deviation σ > 0. Figure 2 presents a simulated data record from the model with T = 250 observations using θ = {0.75, 1.00, 0.10} and x_0 = 0. The complete code for the data generation is available in generateData.
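A minimal sketch of what generateData does under (20) is given below; the interface of the actual package function may differ.

```r
# Simulate T observations from the LGSS model (20) with x_0 = 0.
set.seed(10)
T <- 250
phi <- 0.75; sigmav <- 1.00; sigmae <- 0.10

x <- numeric(T + 1)
y <- numeric(T + 1)
x[1] <- 0                                    # initial state x_0
y[1] <- NA                                   # no observation at time 0
for (t in 2:(T + 1)) {
  x[t] <- phi * x[t - 1] + sigmav * rnorm(1) # state transition
  y[t] <- x[t] + sigmae * rnorm(1)           # observation
}
```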
3.1. Implementing the particle filter

The complete source code for the implementation of the particle filter is available in the function particleFilter. Its code skeleton is given by:

particleFilter <- function(y, theta, noParticles, initialState) {
The function particleFilter has the inputs: y (vector with T + 1 observations), theta (parameters {φ, σ_v, σ_e}), noParticles and initialState. The outputs will be the estimates of the filtered states x̂_{0:T}^N and the estimate of the likelihood p̂_θ^N(y_{1:T}). Note that particleFilter iterates over t = 2, ..., T, which corresponds to time indices t = 1, ..., T − 1 in (20), as the numbering of vector elements starts from 1 in R. The iteration terminates at time index T − 1 as the future observation y_{t+1} is required at each iteration. That is, we assume that the new observation arrives to the algorithm between the propagation and weighting steps. It therefore makes sense to use this information in the weighting step when possible.
T <- length(y) - 1
particles <- matrix(0, nrow = noParticles, ncol = T + 1)
ancestorIndices <- matrix(0, nrow = noParticles, ncol = T + 1)
weights <- matrix(1, nrow = noParticles, ncol = T + 1)
normalizedWeights <- matrix(0, nrow = noParticles, ncol = T + 1)
xHatFiltered <- matrix(0, nrow = T, ncol = 1)
logLikelihood <- 0
P[a_t^{(i)} = j] = w_{t-1}^{(j)},   j = 1, ..., N.
The resampling step is implemented by a call to the function sample by:
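A sketch of this call, consistent with the variable names in the skeleton above (the small illustrative setup values are ours), is:

```r
# Multinomial resampling of the ancestor indices via the built-in
# sample(); uniform weights and small dimensions are illustrative only.
set.seed(10)
noParticles <- 4
T <- 3
t <- 2
normalizedWeights <- matrix(1 / noParticles, noParticles, T + 1)
ancestorIndices <- matrix(rep(1:noParticles, T + 1), noParticles, T + 1)

newAncestors <- sample(noParticles, replace = TRUE,
                       prob = normalizedWeights[, t - 1])
# Resample the particle lineage together with the particles (bookkeeping
# of the genealogy of the particle system).
ancestorIndices[, 1:(t - 1)] <- ancestorIndices[newAncestors, 1:(t - 1)]
ancestorIndices[, t] <- newAncestors
```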
where the resulting ancestor indices a_t^{(1:N)} are returned as newAncestors and stored in ancestorIndices for bookkeeping. Note that the particle lineage is also resampled at the same time as the particles. All this is done to keep track of the genealogy of the particle system over time, that is, the entire history of the particle system.
x_t^{(i)} | x_{t-1}^{a_t^{(i)}} ∼ p_θ(x_t | x_{t-1}^{a_t^{(i)}}, y_t),   (21)
where information from the resampled previous particle x_{t-1}^{a_t^{(i)}} and the current measurement y_t can be included. There are a few different choices for the particle proposal and the most common one is to make use of the model itself, i.e., p_θ(x_t | x_{t-1}, y_t) = f_θ(x_t | x_{t-1}). However, for the LGSS model, an optimal proposal can be derived using the properties of the Gaussian distribution, resulting in
p_θ^{opt}(x_t | x_{t-1}^{a_t^{(i)}}, y_t) ∝ g_θ(y_t | x_t) f_θ(x_t | x_{t-1}^{a_t^{(i)}}) = N(x_t; σ² [σ_e^{-2} y_t + σ_v^{-2} φ x_{t-1}^{a_t^{(i)}}], σ²),   (22)

with σ^{-2} = σ_v^{-2} + σ_e^{-2}. This particle proposal is optimal in the sense that it minimizes the variance of the incremental particle weights at the current time step⁷. For most other
⁶This unbiasedness property is crucial for the PMH algorithm to be valid. As a consequence, adaptive methods (Doucet and Johansen 2011, Section 3.5) that only resample the particle system at certain iterations cannot be used within PMH.
⁷However, it is unclear exactly how this influences the entire particle system, i.e., if this is the globally optimal choice.
SSMs, the optimal proposal is intractable and the state transition model is used instead. The propagation step is implemented by:
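A sketch of the propagation step using the optimal proposal (22); variable names are assumed from the skeleton above and the time-indexing conventions are simplified for illustration.

```r
# Propagate resampled particles through the optimal proposal (22);
# setup values are illustrative only.
set.seed(10)
noParticles <- 4
t <- 2
phi <- 0.75; sigmav <- 1.00; sigmae <- 0.10
y <- c(0, 0.12, -0.25)
particles <- matrix(0, noParticles, 3)
newAncestors <- c(1, 1, 2, 3)

sigma2 <- 1 / (sigmav^(-2) + sigmae^(-2))    # proposal variance in (22)
proposalMean <- sigma2 * (sigmae^(-2) * y[t] +
                          sigmav^(-2) * phi * particles[newAncestors, t - 1])
particles[, t] <- proposalMean + sqrt(sigma2) * rnorm(noParticles)
```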
From (22), the ratio between the noise variances is seen to determine the shape of the proposal. Essentially, there are two different extreme situations: (i) σ_e²/σ_v² ≪ 1 and (ii) σ_e²/σ_v² ≫ 1. In the first extreme (i), the location of the proposal is essentially governed by y_t and the scale is mainly determined by σ_e. This corresponds to essentially simulating particles from g_θ(y_t | x_t). When σ_e is small, this usually allows for running the particle filter using only a small number of particles. In the second extreme (ii), the proposal essentially simulates from f_θ(x_t | x_{t-1}) and does not take the observation into account. In summary, the performance and characteristics of the optimal proposal are therefore related to the noise variances of the model and their relative sizes.
Remember that in this implementation, the new observation yt+1 is introduced in the particle filter between the propagation and the weighting steps for the first time. This is the reason for the slightly complicated choice of time indices. However, the inclusion of the information in yt+1 in this step is beneficial as it improves the accuracy of the filter. That is, we keep particles that after propagation are likely to result in the observation yt+1. In many situations, the resulting weights are small and it is therefore beneficial to work with shifted log-weights to avoid problems with numerical precision. This is done by applying the transformation
ṽ_t^{(i)} = log v_t^{(i)} − v_max,   for i = 1, ..., N,
where v_max denotes the largest element of log v_t^{(1:N)}. The weights are then normalized (ensuring that they sum to one) by
w_t^{(i)} = exp(ṽ_t^{(i)}) / ∑_{j=1}^{N} exp(ṽ_t^{(j)}) = [exp(−v_max) exp(log v_t^{(i)})] / [exp(−v_max) ∑_{j=1}^{N} exp(log v_t^{(j)})] = v_t^{(i)} / ∑_{j=1}^{N} v_t^{(j)},   (23)

where the shifts −v_max cancel and do not affect the relative sizes of the weights. Hence, the first and third expressions in (23) are equivalent, but the first expression enjoys better numerical precision. The computation of the weights is implemented by:
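A sketch of the normalization using the shifted log-weights in (23); the log-weights below are illustrative values rather than output of the weighting function.

```r
# Normalize weights via the shifted log-sum-exp trick in (23).
logWeights <- c(-1.2, -0.7, -2.1, -0.9)       # illustrative log v_t^(i)
vMax <- max(logWeights)
shiftedWeights <- exp(logWeights - vMax)      # shift for numerical precision
normalizedWeights <- shiftedWeights / sum(shiftedWeights)
```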
We remind the reader that we compare y[t + 1] and particles[, t] due to the convention for indexing arrays in R; these correspond to y_t and x_{t-1}^{(1:N)} in the model, respectively. This is also the reason for the comment regarding the for-loop in particleFilter. We now see that the weights depend on the next observation, which is why the loop needs to run to T − 1 and not T.
log p̂_θ^N(y_{1:t}) = log p̂_θ^N(y_{1:t-1}) + v_max + log[ ∑_{i=1}^{N} exp(ṽ_t^{(i)}) ] − log N,

where the log-shifted weights are used to increase the numerical precision. Note that the shift by v_max needs to be compensated for in the estimation of the log-likelihood. This recursion is implemented by:
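A sketch of this update (reusing illustrative log-weights); note the identity v_max + log Σ_i exp(ṽ_t^{(i)}) − log N = log of the average un-normalized weight, which is the predictive likelihood term in (11).

```r
# Update the running log-likelihood estimate with the predictive
# likelihood of the current time step; values are illustrative.
noParticles <- 4
logWeights <- c(-1.2, -0.7, -2.1, -0.9)
vMax <- max(logWeights)
predictiveLogLik <- vMax + log(sum(exp(logWeights - vMax))) - log(noParticles)
logLikelihood <- 0
logLikelihood <- logLikelihood + predictiveLogLik
```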
The estimate of the posterior distribution follows from inserting the estimate of the log- likelihood into (12).
x̂_t^N = (1/N) ∑_{i=1}^{N} x_t^{(i)},   (24)

which is implemented by:
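In this filter, (24) is a plain average over the particles since the weights are uniform; a sketch with illustrative particle values:

```r
# Filtered state estimate (24): average of the particles at time t.
particlesAtT <- c(0.1, -0.3, 0.2, 0.4)       # illustrative x_t^(i)
xHatFilteredAtT <- mean(particlesAtT)
```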
Note that this corresponds to the weights w_t^{(i)} = 1/N, which according to the expression in (23) would correspond to all particles being identical. However, this is not the case
Number of particles (N):   10      20      50      100     200     500     1000
log-bias:                 −3.70   −3.96   −4.57   −4.85   −5.19   −5.67   −6.08
log-MSE:                  −6.94   −7.49   −8.72   −9.29   −9.91  −10.87  −11.67

Table 1: The log-bias and the log-MSE of the filtered states for varying N.

in general, and the estimator (24) is the result of making use of the optimal choices in the propagation and weighting steps of the particle filter. This choice corresponds to a particular type of algorithm known as the fully-adapted particle filter (faPF; Pitt and Shephard 1999), see Section 3.3 for more details. In the faPF, we make use of separate weights in the resampling step and when constructing the empirical approximation of π_t(x_t). In Section 5, we introduce another type of particle filter with another estimator for x̂_t^N, which can be used for most SSMs.
3.2. Numerical illustration

We make use of the implementation of particleFilter to estimate the filtered states and to investigate the properties of this estimate for finite N. The complete implementation is found in the script/function example1_lgss. We use the data presented in Figure 2, which was generated in the beginning of this section. The estimates from the particle filter are compared with the corresponding estimates from the Kalman filter. The latter follows from using the properties of the Gaussian distribution to solve (5) exactly, which is only possible for an LGSS model. In the middle of Figure 3, we present the difference between the optimal state estimate from the Kalman filter and the estimate from the particle filter using N = 20 particles. Two alternative measures of accuracy are the bias (absolute error) and the mean squared error (MSE) of the state estimate. These are computed according to
Bias(x̂_t^N) = (1/T) ∑_{t=1}^{T} |x̂_t^N − x̂_t|,   MSE(x̂_t^N) = (1/T) ∑_{t=1}^{T} (x̂_t^N − x̂_t)²,

where x̂_t denotes the optimal state estimate obtained by the Kalman filter. In Table 1 and in the bottom part of Figure 3, we present the logarithm of the bias and the MSE for different values of N. We note that the bias and the MSE decrease rapidly when increasing N. Hence, we conclude that for this model N does not need to be large for x̂_t^N to be accurate.
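These accuracy measures can be computed as below, where kfEstimate and pfEstimate are illustrative stand-ins for the Kalman and particle filter outputs rather than actual runs.

```r
# Log-bias and log-MSE of a particle filter estimate relative to the
# Kalman filter reference; values are illustrative.
kfEstimate <- c(0.00, 0.10, -0.20, 0.30)
pfEstimate <- c(0.01, 0.12, -0.18, 0.29)
logBias <- log(mean(abs(pfEstimate - kfEstimate)))
logMSE  <- log(mean((pfEstimate - kfEstimate)^2))
```

Since the per-time errors are smaller than one, the MSE is smaller than the bias and thus logMSE lies below logBias, matching the ordering seen in Table 1.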
3.3. Outlook: Design choices and generalizations

We make a short detour before the tutorial continues with the implementation of the PMH algorithm. In this section, we provide some references to important improvements and further studies of the particle filter. Note that the scientific literature concerning the particle filter is vast and this is only a small and naturally biased selection of topics and references.
Figure 3: Top and middle: A simulated set of observations (top) from the LGSS model and the error in the latent state estimate (middle) using a particle filter with N = 20. Bottom: the estimated log-bias (left) and log-MSE (right) for the particle filter when varying the number of particles N.
More comprehensive surveys and tutorials of particle filtering are given by Doucet and Johansen (2011), Cappé et al. (2007), Arulampalam et al. (2002) and Ala-Luhtala et al. (2016), which also provide pseudo-code. The alterations discussed within these papers can easily be incorporated into the implementation developed within this section. In Section 3.1, we employed the faPF (Pitt and Shephard 1999) to estimate the filtered state and the likelihood. This algorithm makes use of the optimal choices for the particle proposal and weighting function, which corresponds to being able to sample from p(xt+1|xt, yt+1) and to evaluate p(yt+1|xt). This is not possible for many SSMs. One possibility is to create Gaussian approximations of the required quantities; see Doucet et al. (2001) and Pitt et al. (2012). However, these methods rely on quasi-Newton optimization that can be computationally prohibitive if N is large. See Arulampalam et al. (2002) for a discussion and for pseudo-code. Another possibility is to make use of a mixture of proposals and weighting functions in the particle filter as described by Kronander and Schön (2014). This type of filter is based on multiple importance sampling, which is commonly used in, e.g., computer graphics. A different approach is to make use of an additional particle filter to approximate the fully adapted proposal, resulting in a nested construction where one particle filter is run within another to construct its proposal distribution. The resulting construction is referred to as nested sequential Monte Carlo (SMC); see Naesseth et al. (2015) for details. The nested SMC construction makes it possible to consider state-spaces where nx is large, something that other types of particle filters struggle with. The bootstrap particle filter (bPF) is the default choice as it can be employed for most SSMs.
A drawback with the bPF is that it suffers from poor statistical accuracy when the dimension of the state-space nx grows beyond about 10. In this setting, e.g., the faPF or nested SMC is required for state inference. However, there are some approaches that could improve the performance of the bPF in some cases and should be considered for efficient implementations. These include: (i) parallel implementations, (ii) better resampling schemes, (iii) exploiting linear substructures of the SSM and (iv) using quasi-Monte Carlo. Parallel implementations of the particle filter (i) are a topic of ongoing research, but some encouraging results are reported by, e.g., Lee et al. (2010) and Lee and Whiteley (2016). In this tutorial, we make use of multinomial resampling in the particle filter. Alternative resampling schemes (ii) can be useful in decreasing the variance of the estimates; see Hol et al. (2006) and Douc and Cappé (2005) for some comparisons. In general, systematic resampling is recommended for particle filtering. Another possible improvement is the combination of Kalman and particle filtering (iii), which is possible if the model is conditionally linear and Gaussian in some of its states. The idea is then to make use of Kalman filtering to estimate these states and particle filtering for the remaining states, while keeping the linear ones fixed to their Kalman estimates. These types of models are common in engineering, and Rao-Blackwellization schemes like this can lead to a substantial decrease in variance. See, e.g., Doucet et al. (2000), Chen and Liu (2000) and Schön et al. (2005) for more information and comparisons. Quasi-Monte Carlo (iv) is based on so-called quasi-random numbers, which are generated by deterministic recursions to better fill the space compared with standard random numbers. These are useful in standard Monte Carlo to decrease the variance of the estimates.
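As an illustration of point (ii), systematic resampling replaces the N independent draws of multinomial resampling with a single uniform draw that stratifies the whole sample. The following sketch is not part of the paper's implementation; the function name is illustrative:

```r
# Systematic resampling: one uniform draw generates N evenly spaced
# points in (0, 1), which are matched against the cumulative weights.
# This typically gives lower variance than multinomial resampling.
systematicResampling <- function(weights) {
  N <- length(weights)
  u <- (runif(1) + 0:(N - 1)) / N            # N ordered, stratified points
  cumWeights <- cumsum(weights / sum(weights))
  indices <- rep(0, N)
  j <- 1
  for (i in 1:N) {
    # Advance to the first particle whose cumulative weight covers u[i]
    while (cumWeights[j] < u[i]) j <- j + 1
    indices[i] <- j
  }
  indices
}
```

In the particle filter from Section 3.1, this function could be used as a drop-in replacement for the call to sample() in the multinomial resampling step.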
The use of quasi-Monte Carlo in particle filtering is discussed by Gerber and Chopin (2015). Finally, particle filtering is an instance of the SMC method (Del Moral et al. 2006), which represents a general class of algorithms based on importance sampling. SMC can be employed for inference in many statistical models and is a complement/alternative to MCMC. A particularly interesting member is the SMC2 algorithm (Chopin et al. 2013; Fulop and Li 2013), which is essentially a two-level particle filter similar to nested SMC. The outer particle filter maintains a particle system targeting πθ(θ), where each particle contains a hypothesis of the value of θ. The inner particle filter targets πt(xt) and is run for each particle in the outer filter. The likelihood estimates from the inner filter are used to compute the weights in the outer filter. Hence, SMC2 is an alternative to PMH for parameter inference; see Svensson and Schön (2016) for a comparison.
4. Estimating the parameters in a linear Gaussian SSM
This tutorial now proceeds with the main event, where the PMH algorithm is implemented according to the outline provided in Section 2. The particle filter implementation from Section 3.1 is employed to approximate the acceptance probability (13) in the PMH algorithm. We keep the LGSS model as a toy example to illustrate the implementation and provide an outlook of the PMH literature at the end of this section. A prior is required to be able to carry out Bayesian parameter inference. The objective in this section is to estimate the parameter posterior for θ = φ while keeping {σv, σe} fixed to their true values. Hence, we select the prior p(φ) = TN_{(-1,1)}(φ; 0, 0.5) to ensure that the system is stable (i.e., that the value of xt is bounded for every t). The truncated Gaussian distribution with mean µ and standard deviation σ on the interval z ∈ [a, b] is defined by
$$
\mathcal{TN}_{[a,b]}(z; \mu, \sigma^2) = \mathbb{I}(a < z < b) \, \mathcal{N}(z; \mu, \sigma^2),
$$

where I(s) denotes the indicator function with value one if s is true and zero otherwise.
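In the MH acceptance ratio, the prior only needs to be known up to its normalising constant, since that constant cancels. A sketch of the corresponding log-density is given below; the function name logPrior is illustrative, and we assume that 0.5 in the prior above denotes the variance σ², matching the definition TN(z; µ, σ²):

```r
# Unnormalised log-density of the truncated Gaussian prior
# TN_{(-1,1)}(phi; 0, 0.5): a Gaussian log-density combined with an
# indicator on the support. The missing normalising constant cancels
# in the PMH acceptance probability.
logPrior <- function(phi, mu = 0, sigma2 = 0.5, a = -1, b = 1) {
  if (phi <= a || phi >= b) return(-Inf)   # indicator: outside the support
  dnorm(phi, mean = mu, sd = sqrt(sigma2), log = TRUE)
}

logPrior(0.75)   # finite: inside the support
logPrior(1.50)   # -Inf: proposal would be rejected with probability one
```

Returning -Inf outside (-1, 1) means that any proposed φ violating the stability constraint is rejected automatically.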
4.1. Implementing particle Metropolis-Hastings

The complete source code for the implementation of the PMH algorithm is available in the function particleMetropolisHastings. Its code skeleton is given by:
particleMetropolisHastings <- function(y, initialPhi, sigmav, sigmae,
  noParticles, initialState, noIterations, stepSize) {
  # body developed in the remainder of this section
}
This function has the inputs: y (vector with T + 1 observations), initialPhi (the initial value φ(0) for φ), sigmav, sigmae (parameters {σv, σe}), noParticles, initialState, noIterations (no. PMH iterations K) and stepSize (step size in the proposal). The output from the function particleMetropolisHastings is φ(1:K), i.e., the correlated samples approximately distributed according to the parameter posterior πθ.
phi[1] <- initialPhi
theta <- c(phi[1], sigmav, sigmae)
outputPF <- particleFilter(y, theta, noParticles, initialState)
logLikelihood[1] <- outputPF$logLikelihood
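Before filling in the rest of the loop, it may help to preview the bare random-walk Metropolis mechanics that PMH builds on. The self-contained toy below uses an exact log-density of a Gaussian target in place of the particle filter's log-likelihood estimate; all names are illustrative and not part of the paper's code:

```r
# Random-walk Metropolis on a toy Gaussian target N(1, 2^2): the same
# accept/reject mechanics as in PMH, but with an exact log-target
# instead of a particle filter estimate of the log-likelihood.
set.seed(10)
logTarget <- function(theta) dnorm(theta, mean = 1, sd = 2, log = TRUE)

noIterations <- 5000
stepSize <- 1.0
theta <- rep(0, noIterations)
logPost <- logTarget(theta[1])
for (k in 2:noIterations) {
  thetaProposed <- theta[k - 1] + stepSize * rnorm(1)  # random walk proposal
  logPostProposed <- logTarget(thetaProposed)
  # Accept with probability min(1, exp(logPostProposed - logPost))
  if (log(runif(1)) < logPostProposed - logPost) {
    theta[k] <- thetaProposed
    logPost <- logPostProposed
  } else {
    theta[k] <- theta[k - 1]       # reject: keep the previous sample
  }
}
mean(theta)   # Monte Carlo estimate of the target mean
```

In PMH, the only structural change is that logTarget is replaced by the sum of the log-prior and the log-likelihood estimate returned by particleFilter, which is exactly what the initialization above prepares for.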