Inference for dynamic and latent variable models via iterated, perturbed Bayes maps

Edward L. Ionides^a,1, Dao Nguyen^a, Yves Atchadé^a, Stilian Stoev^a, and Aaron A. King^b,c

Departments of ^aStatistics, ^bEcology and Evolutionary Biology, and ^cMathematics, University of Michigan, Ann Arbor, MI 48109

Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved December 9, 2014 (received for review June 6, 2014)

Iterated filtering algorithms are stochastic optimization procedures for latent variable models that recursively combine parameter perturbations with latent variable reconstruction. Previously, theoretical support for these algorithms has been based on the use of conditional moments of perturbed parameters to approximate derivatives of the log likelihood function. Here, a theoretical approach is introduced based on the convergence of an iterated Bayes map. An algorithm supported by this theory displays substantial numerical improvement on the computational challenge of inferring parameters of a partially observed Markov process.

sequential Monte Carlo | particle filter | maximum likelihood | Markov process

Significance

Many scientific challenges involve the study of stochastic dynamic systems for which only noisy or incomplete measurements are available. Inference for partially observed Markov process models provides a framework for formulating and answering questions about these systems. Except when the system is small, or approximately linear and Gaussian, state-of-the-art statistical methods are required to make efficient use of available data. Evaluation of the likelihood for a partially observed Markov process model can be formulated as a filtering problem. Iterated filtering algorithms carry out repeated Monte Carlo filtering operations to maximize the likelihood. We develop a new theoretical framework for iterated filtering and construct a new algorithm that dramatically outperforms previous approaches on a challenging inference problem in disease ecology.

Author contributions: E.L.I., D.N., Y.A., and A.A.K. designed research; E.L.I., D.N., Y.A., S.S., and A.A.K. performed research; E.L.I., D.N., Y.A., and A.A.K. analyzed data; and E.L.I., D.N., Y.A., S.S., and A.A.K. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. ^1To whom correspondence should be addressed. Email: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1410597112/-/DCSupplemental.

An iterated filtering algorithm was originally proposed for maximum likelihood inference on partially observed Markov process (POMP) models by Ionides et al. (1). Variations on the original algorithm have been proposed to extend it to general latent variable models (2) and to improve numerical performance (3, 4). In this paper, we study an iterated filtering algorithm that generalizes the data cloning method (5, 6) and is therefore also related to other Monte Carlo methods for likelihood-based inference (7–9). Data cloning methodology is based on the observation that iterating a Bayes map converges to a point mass at the maximum likelihood estimate. Combining such iterations with perturbations of model parameters improves the numerical stability of data cloning and provides a foundation for stable algorithms in which the Bayes map is numerically approximated by sequential Monte Carlo computations.

We investigate convergence of a sequential Monte Carlo implementation of an iterated filtering algorithm that combines data cloning, in the sense of Lele et al. (5), with the stochastic parameter perturbations used by the iterated filtering algorithm of ref. 1. Lindström et al. (4) proposed a similar algorithm, termed fast iterated filtering, but the theoretical support for that algorithm involved unproved conjectures. We present convergence results for our algorithm, which we call IF2. Empirically, it can dramatically outperform the previous iterated filtering algorithm of ref. 1, which we refer to as IF1. Although IF1 and IF2 both involve recursively filtering through the data, the theoretical justification and practical implementations of these algorithms are fundamentally different. IF1 approximates the Fisher score function, whereas IF2 implements an iterated Bayes map. IF1 has been used in applications for which no other computationally feasible algorithm for statistically efficient, likelihood-based inference was known (10–15). The extra capabilities offered by IF2 open up further possibilities for drawing inferences about nonlinear partially observed stochastic dynamic models from time series data.

Iterated filtering algorithms implemented using basic sequential Monte Carlo techniques have the property that they do not need to evaluate the transition density of the latent Markov process. Algorithms with this property have been called plug-and-play (12, 16). Various other plug-and-play methods for POMP models have been proposed recently (17–20), due largely to the convenience of this property in scientific applications.
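To see the data cloning observation concretely, the following sketch (ours, not from the original; it uses a hypothetical one-dimensional Gaussian log likelihood on a grid) iterates the Bayes map, which multiplies by the likelihood and renormalizes, and shows the density concentrating at the MLE:

```python
import numpy as np

# Data cloning in one dimension: iterating the Bayes map
#   f(theta) -> l(theta) * f(theta) / integral of l * f
# drives the density toward a point mass at the MLE.
theta = np.linspace(-3.0, 3.0, 2001)     # parameter grid
dtheta = theta[1] - theta[0]
loglik = -0.5 * (theta - 1.0) ** 2       # hypothetical log likelihood, MLE at 1.0
f = np.full_like(theta, 1.0 / 6.0)       # flat initial density on [-3, 3]

for m in range(50):                      # 50 iterations of the Bayes map
    f = f * np.exp(loglik)
    f = f / (f.sum() * dtheta)           # renormalize to a probability density

sd = np.sqrt(np.sum((theta - 1.0) ** 2 * f) * dtheta)
print("mode:", theta[np.argmax(f)], "SD:", sd)   # mode near 1.0; SD shrinks like 1/sqrt(m)
```

Without perturbations, the swarm variance shrinks at every iteration, which is why a Monte Carlo implementation eventually degenerates; the parameter perturbations in IF2 counteract this.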

An Algorithm and Related Questions

A general POMP model consists of an unobserved stochastic process $\{X(t), t \geq t_0\}$ with observations $Y_1, \ldots, Y_N$ made at times $t_1, \ldots, t_N$. We suppose that $X(t)$ takes values in $\mathsf{X} \subset \mathbb{R}^{\dim(X)}$, $Y_n$ takes values in $\mathsf{Y} \subset \mathbb{R}^{\dim(Y)}$, and there is an unknown parameter $\theta$ taking values in $\Theta \subset \mathbb{R}^{\dim(\Theta)}$. We adopt the notation $y_{m:n} = (y_m, y_{m+1}, \ldots, y_n)$ for integers $m \leq n$, so we write the collection of observations as $Y_{1:N}$. Writing $X_n = X(t_n)$, the joint density of $X_{0:N}$ and $Y_{1:N}$ is assumed to exist, and the Markovian property of $X_{0:N}$ together with the conditional independence of the observation process means that this joint density can be written as

\[
f_{X_{0:N}, Y_{1:N}}(x_{0:N}, y_{1:N}; \theta) = f_{X_0}(x_0; \theta) \prod_{n=1}^{N} f_{X_n \mid X_{n-1}}(x_n \mid x_{n-1}; \theta)\, f_{Y_n \mid X_n}(y_n \mid x_n; \theta).
\]

The data consist of a sequence of observations, $y_{1:N}^*$. We write $f_{Y_{1:N}}(y_{1:N}; \theta)$ for the marginal density of $Y_{1:N}$, and the likelihood function is defined to be $\ell(\theta) = f_{Y_{1:N}}(y_{1:N}^*; \theta)$. We look for a maximum likelihood estimate (MLE), i.e., a value $\hat\theta$ maximizing $\ell(\theta)$.
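Evaluation of $\ell(\theta)$ is a filtering problem, and a bootstrap particle filter solves it using only a simulator for the latent process. The sketch below is ours, not from the paper; `rinit`, `rprocess`, and `dmeasure` are hypothetical user-supplied functions standing in for the simulator and evaluator required by the algorithm in the next section:

```python
import numpy as np

def pf_loglik(theta, y, rinit, rprocess, dmeasure, J=1000, rng=None):
    """Bootstrap particle filter estimate of log l(theta).

    rinit(theta, J, rng)       -> array (J, dim_X) of draws from f_{X_0}
    rprocess(x, n, theta, rng) -> array (J, dim_X), one step of f_{X_n | X_{n-1}}
    dmeasure(y_n, x, n, theta) -> array (J,) of densities f_{Y_n | X_n}(y_n | x)
    Only simulation of the latent process is needed: plug-and-play.
    """
    rng = rng or np.random.default_rng()
    x = rinit(theta, J, rng)
    loglik = 0.0
    for n in range(len(y)):
        x = rprocess(x, n, theta, rng)            # prediction particles
        w = dmeasure(y[n], x, n, theta)           # measurement weights
        w = np.maximum(w, 1e-300)                 # guard against total weight underflow
        loglik += np.log(w.mean())                # log of conditional likelihood estimate
        k = rng.choice(J, size=J, p=w / w.sum())  # multinomial resampling
        x = x[k]
    return loglik
```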

Algorithm IF2. Iterated filtering

```
input:
  Simulator for f_{X_0}(x_0; theta)
  Simulator for f_{X_n|X_{n-1}}(x_n | x_{n-1}; theta), n in 1:N
  Evaluator for f_{Y_n|X_n}(y_n | x_n; theta), n in 1:N
  Data, y*_{1:N}
  Number of iterations, M
  Number of particles, J
  Initial parameter swarm, {Theta^0_j, j in 1:J}
  Perturbation density, h_n(theta | phi; sigma), n in 0:N
  Perturbation sequence, sigma_{1:M}
output:
  Final parameter swarm, {Theta^M_j, j in 1:J}

For m in 1:M
  Theta^{F,m}_{0,j} ~ h_0(theta | Theta^{m-1}_j; sigma_m) for j in 1:J
  X^{F,m}_{0,j} ~ f_{X_0}(x_0; Theta^{F,m}_{0,j}) for j in 1:J
  For n in 1:N
    Theta^{P,m}_{n,j} ~ h_n(theta | Theta^{F,m}_{n-1,j}; sigma_m) for j in 1:J
    X^{P,m}_{n,j} ~ f_{X_n|X_{n-1}}(x_n | X^{F,m}_{n-1,j}; Theta^{P,m}_{n,j}) for j in 1:J
    w^m_{n,j} = f_{Y_n|X_n}(y*_n | X^{P,m}_{n,j}; Theta^{P,m}_{n,j}) for j in 1:J
    Draw k_{1:J} with P(k_j = i) = w^m_{n,i} / sum_{u=1}^J w^m_{n,u}
    Theta^{F,m}_{n,j} = Theta^{P,m}_{n,k_j} and X^{F,m}_{n,j} = X^{P,m}_{n,k_j} for j in 1:J
  End For
  Set Theta^m_j = Theta^{F,m}_{N,j} for j in 1:J
End For
```
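A direct transcription of the pseudocode into Python might look as follows. This is our sketch, not the pomp implementation; the plug-and-play ingredients here have the per-particle signatures `rinit(theta, rng)`, `rprocess(x, n, theta, rng)`, and `dmeasure(y_n, x, n, theta)`, since each particle carries its own parameter, and the perturbation densities $h_0, \ldots, h_N$ are taken to be Gaussian with SD $\sigma_m$:

```python
import numpy as np

def if2(swarm, y, rinit, rprocess, dmeasure, sigmas, rng=None):
    """IF2 sketch. swarm: (J, dim_Theta) initial parameter swarm;
    sigmas: perturbation sequence sigma_1, ..., sigma_M.
    Returns the final parameter swarm."""
    rng = rng or np.random.default_rng()
    theta = swarm.copy()
    J = theta.shape[0]
    for sigma in sigmas:                                              # m = 1, ..., M
        theta = theta + sigma * rng.standard_normal(theta.shape)      # h_0 perturbation
        x = np.array([rinit(th, rng) for th in theta])                # initial states
        for n in range(len(y)):
            theta = theta + sigma * rng.standard_normal(theta.shape)  # h_n perturbation
            x = np.array([rprocess(xj, n, th, rng) for xj, th in zip(x, theta)])
            w = np.array([dmeasure(y[n], xj, n, th) for xj, th in zip(x, theta)])
            w = np.maximum(w, 1e-300)                                 # guard against underflow
            k = rng.choice(J, size=J, p=w / w.sum())                  # resample pairs (theta, x)
            theta, x = theta[k], x[k]
    return theta        # swarm concentrates near the MLE as sigma_m decreases
```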

The IF2 algorithm defined above provides a plug-and-play Monte Carlo approach to obtaining $\hat\theta$. A simplification of IF2 arises when $N = 1$, in which case iterated filtering is called iterated importance sampling (2) (SI Text, Iterated Importance Sampling). Algorithms similar to IF2 with a single iteration ($M = 1$) have been proposed in the context of Bayesian inference (21, 22) (SI Text, Applying Liu and West's Method to the Toy Example and Fig. S1). When $M = 1$ and $h_n(\theta \mid \varphi; \sigma)$ degenerates to a point mass at $\varphi$, the IF2 algorithm becomes a standard particle filter (23, 24). In the IF2 algorithm description, $\Theta^{F,m}_{n,j}$ and $X^{F,m}_{n,j}$ are the $j$th particles at time $n$ in the Monte Carlo representation of the $m$th iteration of a filtering recursion. The filtering recursion is coupled with a prediction recursion, represented by $\Theta^{P,m}_{n,j}$ and $X^{P,m}_{n,j}$. The resampling indices $k_{1:J}$ in IF2 are taken to be a multinomial draw for our theoretical analysis, but systematic resampling is preferable in practice (23).
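Systematic resampling (our sketch, a standard implementation rather than code from the paper) draws the indices $k_{1:J}$ with the same marginal probabilities as the multinomial scheme in the pseudocode, but uses a single uniform variate, reducing resampling variance:

```python
import numpy as np

def systematic_resample(w, rng=None):
    """Return J indices k_1, ..., k_J with E[#{j : k_j = i}] = J * w_i / sum(w)."""
    rng = rng or np.random.default_rng()
    J = len(w)
    points = (rng.uniform() + np.arange(J)) / J    # one stratified point per particle
    cumulative = np.cumsum(w) / np.sum(w)
    return np.searchsorted(cumulative, points)     # stratum containing each point
```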
A natural choice of $h_n(\theta \mid \varphi; \sigma)$ is a multivariate normal density with mean $\varphi$ and variance $\sigma^2 \Sigma$ for some covariance matrix $\Sigma$, but in general, $h_n$ could be any conditional density parameterized by $\sigma$. Combining the perturbations over all of the time points, we define

\[
h(\theta_{0:N} \mid \varphi; \sigma) = h_0(\theta_0 \mid \varphi; \sigma) \prod_{n=1}^{N} h_n(\theta_n \mid \theta_{n-1}; \sigma).
\]

We define an extended likelihood function on $\Theta^{N+1}$ by

\[
\hat\ell(\theta_{0:N}) = \int \cdots \int dx_0 \cdots dx_N\, f_{X_0}(x_0; \theta_0) \prod_{n=1}^{N} f_{X_n \mid X_{n-1}}(x_n \mid x_{n-1}; \theta_n)\, f_{Y_n \mid X_n}\big(y_n^* \mid x_n; \theta_n\big).
\]

Each iteration of IF2 is a Monte Carlo approximation to a map

\[
T_\sigma f(\theta_N) = \frac{\int \hat\ell(\theta_{0:N})\, h(\theta_{0:N} \mid \varphi; \sigma)\, f(\varphi)\, d\varphi\, d\theta_{0:N-1}}{\int \hat\ell(\theta_{0:N})\, h(\theta_{0:N} \mid \varphi; \sigma)\, f(\varphi)\, d\varphi\, d\theta_{0:N}}, \quad [1]
\]

with $f$ and $T_\sigma f$ approximating the initial and final density of the parameter swarm. For our theoretical analysis, we consider the case when the SD of the parameter perturbations is held fixed at $\sigma_m = \sigma > 0$ for $m = 1, \ldots, M$. In this case, IF2 is a Monte Carlo approximation to $T_\sigma^M f(\theta)$. We call the fixed-$\sigma$ version of IF2 "homogeneous iterated filtering," since each iteration implements the same map. For any fixed $\sigma$, one cannot expect a procedure such as IF2 to converge to a point mass at the MLE. However, for fixed but small $\sigma$, we show that IF2 does approximately maximize the likelihood, with an error that shrinks to zero in a limit as $\sigma \to 0$ and $M \to \infty$. An immediate motivation for studying the homogeneous case is simplicity; it turns out that even with this simplifying assumption, the theoretical analysis is not entirely straightforward. Moreover, the homogeneous analysis gives at least as much insight as an asymptotic analysis into the practical properties of IF2 when $\sigma_m$ decreases down to some positive level $\sigma > 0$ but never completes the asymptotic limit $\sigma_m \to 0$. Iterated filtering algorithms have been developed primarily in the context of making progress on complex models for which successfully achieving and validating global likelihood optimization is challenging. In such situations, it is advisable to run multiple searches and continue each search up to the limits of available computation (25). If no single search can reliably locate the global maximum, a theory assuring convergence to a neighborhood of the maximum is as relevant as a theory assuring convergence to the maximum itself in a practically unattainable limit.

The map $T_\sigma$ can be expressed as a composition of a parameter perturbation with a Bayes map that multiplies by the likelihood and renormalizes. Iteration of the Bayes map alone has a central limit theorem (CLT) (5) that forms the theoretical basis for the data cloning methodology of refs. 5 and 6. Repetitions of the parameter perturbation may also be expected to follow a CLT. One might therefore imagine that the composition of these two operations also has a Gaussian limit. This is not generally true, since the rescaling involved in the perturbation CLT prevents the Bayes map CLT from applying (SI Text, A Class of Exact Non-Gaussian Limits for Iterated Importance Sampling).
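To make this composition explicit, Eq. 1 can be rewritten (our notation, not displayed in the original) as a perturbation step $P_\sigma$ followed by a Bayes step $B$:

\[
(P_\sigma f)(\theta_{0:N}) = \int h(\theta_{0:N} \mid \varphi; \sigma)\, f(\varphi)\, d\varphi, \qquad
(B g)(\theta_{0:N}) = \frac{\hat\ell(\theta_{0:N})\, g(\theta_{0:N})}{\int \hat\ell(\vartheta_{0:N})\, g(\vartheta_{0:N})\, d\vartheta_{0:N}},
\]

with $T_\sigma f(\theta_N)$ the $\theta_N$ marginal of $B(P_\sigma f)$: each IF2 iteration first diffuses the parameter swarm and then reweights it by the extended likelihood.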

Our agenda is to seek conditions guaranteeing the following:

(A1) For every fixed $\sigma > 0$, $\lim_{m \to \infty} T_\sigma^m f = f_\sigma$ exists.
(A2) When $J$ and $M$ become large, IF2 numerically approximates $f_\sigma$.
(A3) As the noise intensity becomes small, $\lim_{\sigma \to 0} f_\sigma$ approaches a point mass at the MLE, if it exists.

Stability of filtering problems and uniform convergence of sequential Monte Carlo numerical approximations are closely related, and so A1 and A2 are studied together in Theorem 1. Each iteration of IF2 involves standard sequential Monte Carlo filtering techniques applied to an extended model in which the latent variable space is augmented to include a time-varying parameter. Indeed, all $M$ iterations together can be represented as a filtering problem for this extended POMP model on $M$ replications of the data. The proof of Theorem 1 therefore leans on existing results. The novel issue of A3 is then addressed in Theorem 2.

Convergence of IF2

First, we set up some notation. Let $\{\hat\Theta^m_{0:N}, m = 1, 2, \ldots\}$ be a Markov chain taking values in $\Theta^{N+1}$ such that $\hat\Theta^1_{0:N}$ has density $\int h(\theta_{0:N} \mid \varphi; \sigma)\, f(\varphi)\, d\varphi$, and $\hat\Theta^m_{0:N}$ has conditional density $h(\theta_{0:N} \mid \varphi_N; \sigma)$ given $\hat\Theta^{m-1}_{0:N} = \varphi_{0:N}$, for $m \geq 2$. Suppose that $\{\hat\Theta^m_{0:N}, m \geq 1\}$ is constructed on the canonical probability space $\Omega = \{(\theta^1_{0:N}, \theta^2_{0:N}, \ldots)\}$, with $\theta^m_{0:N} = \hat\Theta^m_{0:N}(\vartheta)$ for $\vartheta = (\theta^1_{0:N}, \theta^2_{0:N}, \ldots) \in \Omega$. Let $\{\mathcal{F}^m\}$ be the corresponding Borel filtration. To consider a time-rescaled limit of $\{\hat\Theta^m_{0:N}, m = 1, 2, \ldots\}$ as $\sigma \to 0$, let $\{W_\sigma(t), t \geq 0\}$ be a continuous-time, right-continuous, piecewise constant process defined at its points of discontinuity by $W_\sigma(k\sigma^2) = \hat\Theta^{k+1}_N$ when $k$ is a nonnegative integer. Let $\{\hat Z^m_{0:N}, m = 1, 2, \ldots\}$ be the filtered process defined such that, for any event $E \in \mathcal{F}^M$,

\[
\hat P_Z(E) = \frac{\hat E_\Theta\big[\hat\ell_{1:M}\, \mathbb{1}_E\big]}{\hat E_\Theta\big[\hat\ell_{1:M}\big]}, \quad [2]
\]

where $\mathbb{1}_E$ is the indicator function for event $E$ and

\[
\hat\ell_{1:M}(\vartheta) = \prod_{m=1}^{M} \hat\ell\big(\theta^m_{0:N}\big).
\]

In Eq. 2, $\hat P_Z(E)$ denotes probability under the law of $\{\hat Z^m_n\}$, and $\hat E_\Theta$ denotes expectation under the law of $\{\hat\Theta^m_n\}$. The process $\{\hat Z^m_n\}$ is constructed so that $\hat Z^m_N$ has density $T_\sigma^m f$. We make the following assumptions.

(B1) $\{W_\sigma(t), 0 \leq t \leq 1\}$ converges weakly as $\sigma \to 0$ to a diffusion $\{W(t), 0 \leq t \leq 1\}$, in the space of right-continuous functions with left limits equipped with the uniform convergence topology. For any open set $A \subset \Theta$ with positive Lebesgue measure and $\epsilon > 0$, there is a $\delta(A, \epsilon) > 0$ such that $P[W(t) \in A \text{ for all } \epsilon \leq t \leq 1 \mid W(0)] > \delta$.
(B2) For some $t_0(\sigma)$ and $\sigma_0 > 0$, $W_\sigma(t)$ has a positive density on $\Theta$, uniformly over the distribution of $W(0)$, for all $t > t_0$ and $\sigma < \sigma_0$.
(B3) $\ell(\theta)$ is continuous in a neighborhood $\{\theta : \ell(\theta) > \lambda_1\}$ for some $\lambda_1 < \sup_\varphi \ell(\varphi)$.
(B4) There is an $\epsilon > 0$ with $\epsilon < f_{Y_n \mid X_n}(y_n^* \mid x_n; \theta) < \epsilon^{-1}$ for all $1 \leq n \leq N$, $x_n \in \mathsf{X}$, and $\theta \in \Theta$.
(B5) There is a $C_1$ such that $h_n(\theta \mid \varphi; \sigma) = 0$ when $|\theta - \varphi| > C_1 \sigma$, for all $\sigma$.
(B6) There is a $C_2$ such that $\sup_{1 \leq n \leq N} |\theta_n - \theta_{n-1}| < C_1 \sigma$ implies $|\hat\ell(\theta_{0:N}) - \ell(\theta_N)| < C_2 \sigma$, for all $\sigma$.

Conditions B1 and B2 hold when $h_n(\theta \mid \varphi; \sigma)$ corresponds to a reflected Gaussian random walk and $\{W(t)\}$ is a reflected Brownian motion (SI Text, Checking Conditions B1 and B2). More generally, when $h_n(\theta \mid \varphi; \sigma)$ is a location-scale family with mean $\varphi$ away from a boundary, then $\{W(t)\}$ will behave like Brownian motion in the interior of $\Theta$. B4 follows if $\mathsf{X}$ is compact and $f_{Y_n \mid X_n}(y_n^* \mid x_n; \theta)$ is positive and continuous as a function of $\theta$ and $x_n$. B5 can be guaranteed by construction. B3 and B6 are undemanding regularity conditions on the likelihood and extended likelihood. A formalization of A1 and A2 can now be stated as follows.

Theorem 1. Let $T_\sigma$ be the map of Eq. 1 and suppose B2 and B4. There is a unique probability density $f_\sigma$ such that, for any probability density $f$ on $\Theta$,

\[
\lim_{m \to \infty} \big\| T_\sigma^m f - f_\sigma \big\|_1 = 0, \quad [3]
\]

where $\|f\|_1$ is the $L^1$ norm of $f$. Let $\{\Theta^M_j, j = 1, \ldots, J\}$ be the output of IF2, with $\sigma_m = \sigma > 0$. There is a finite constant $C > 0$ such that, for any function $\phi : \Theta \to \mathbb{R}$ and all $M$,

\[
E\, \bigg| \frac{1}{J} \sum_{j=1}^{J} \phi\big(\Theta^M_j\big) - \int \phi(\theta)\, f_\sigma(\theta)\, d\theta \bigg| \leq \frac{C \sup_\theta |\phi(\theta)|}{\sqrt{J}}. \quad [4]
\]

Proof. B2 and B4 imply that $T_\sigma^k$ is mixing, in the sense of ref. 26, for all sufficiently large $k$. The results of ref. 26 are based on the contractive properties of mixing maps in the Hilbert projective metric. Although ref. 26 stated their results in the case where $T$ itself is mixing, the required geometric contraction in the Hilbert metric holds as long as $T^k$ is mixing for all $K \leq k \leq 2K - 1$ for some $K \geq 1$ (ref. 27, theorem 2.5.1). Corollary 4.2 of ref. 26 implies Eq. 3, noting the equivalence of the Hilbert projective metric and the total variation norm shown in their lemma 3.4. Then, corollary 5.12 of ref. 26 implies Eq. 4, completing the proof of Theorem 1. A longer version of this proof is given in SI Text, Additional Details for the Proof of Theorem 1.

Results similar to Theorem 1 can be obtained using Dobrushin contraction techniques (28). Results appropriate for noncompact spaces can be obtained using drift conditions on a potential function (29). Now we move on to our formalization of A3.

Theorem 2. Assume B1–B6. For $\lambda_2 < \sup_\varphi \ell(\varphi)$,

\[
\lim_{\sigma \to 0} \int f_\sigma(\theta)\, \mathbb{1}_{\{\ell(\theta) < \lambda_2\}}\, d\theta = 0.
\]

Proof. Let $\lambda_0 = \sup_\varphi \ell(\varphi)$ and $\lambda_3 = \inf_\varphi \ell(\varphi)$. From B4, $\infty > \lambda_0 \geq \lambda_3 > 0$. For positive constants $\epsilon_1$, $\epsilon_2$, $\eta_1$, $\eta_2$ and $\lambda_1 < \lambda_0$, define

\[
e_1 = (1 - \epsilon_1) \log(\lambda_0 + \epsilon_2) + \epsilon_1 \log(\lambda_2 + \epsilon_2),
\]
\[
e_2 = (1 - \eta_1) \log(\lambda_1 - \eta_2) + \eta_1 \log(\lambda_3 - \eta_2).
\]

We can pick $\epsilon_1$, $\epsilon_2$, $\eta_1$, $\eta_2$, and $\lambda_1$ so that $e_1 < e_2$. Suppose that $\{\hat\Theta^m_n\}$ is initialized with the stationary distribution $f = f_\sigma$ identified in Theorem 1. Now, set $M$ to be the greatest integer less than $1/\sigma^2$, and let $F_1$ be the event that $\{\hat\Theta^m_N, m = 1, \ldots, M\}$ spends at least a fraction of time $\epsilon_1$ in $\{\theta : \ell(\theta) < \lambda_2\}$. Formally,

\[
F_1 = \bigg\{ \vartheta \in \Omega : \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}_{\{\ell(\theta^m_N) < \lambda_2\}} > \epsilon_1 \bigg\}.
\]

We wish to show that $\hat P_\Theta[F_1]$ is small for small $\sigma$. Let $F_2$ be the set of sample paths that spend at least a fraction of time $(1 - \eta_1)$ up to time $M$ in $\{\theta : \ell(\theta) > \lambda_1\}$, i.e.,

\[
F_2 = \bigg\{ \vartheta \in \Omega : \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}_{\{\ell(\theta^m_N) > \lambda_1\}} > (1 - \eta_1) \bigg\}.
\]

Then, we calculate

\[
\begin{aligned}
\hat P_Z[F_1] &= \frac{\hat E_\Theta\big[\hat\ell_{1:M}\, \mathbb{1}_{F_1}\big]}{\hat E_\Theta\big[\hat\ell_{1:M}\big]} \leq \frac{\hat E_\Theta\big[\hat\ell_{1:M}\, \mathbb{1}_{F_1}\big]}{\hat E_\Theta\big[\hat\ell_{1:M}\, \mathbb{1}_{F_2}\big]} \quad [5] \\
&\leq \frac{\hat E_\Theta\Big[\prod_{m=1}^{M} \big\{\ell(\theta^m_N) + C_2\sigma\big\}\, \mathbb{1}_{F_1}\Big]}{\hat E_\Theta\Big[\prod_{m=1}^{M} \big\{\ell(\theta^m_N) - C_2\sigma\big\}\, \mathbb{1}_{F_2}\Big]} \leq \frac{\hat E_\Theta\big[\exp\{M e_1\}\, \mathbb{1}_{F_1}\big]}{\hat E_\Theta\big[\exp\{M e_2\}\, \mathbb{1}_{F_2}\big]} \quad [6] \\
&= \exp\{(e_1 - e_2) M\}\, \frac{\hat P_\Theta[F_1]}{\hat P_\Theta[F_2]}. \quad [7]
\end{aligned}
\]
We used B5 and B6 to arrive at Eq. 5, then, to get to Eq. 6,we σ σ < e σ < η have taken small enough that C2 2 and C2 2. From B3, fθ : ℓðθÞ > λ1g is an open set, and B1 therefore ensures each P P 7 of the probabilities Θ1:M ½F1 and Θ1:M ½F2 in Eq. tends to a positive limit as σ → 0 given by the probability under the limiting distribution fWðtÞg (SI Text, Lemma S1). The term expfðe1 − e2ÞMg tendstozeroasσ → 0 since, by construction, M^→ ∞ and e1 < e2.SettingL = fθ : ℓðθÞ ≤ λ2g, and noting that m; = ; ; ... fZN m 1 2 g is constructed to have stationary marginal density fσ,wehave Z   XM ^ 1 m fσðθÞdθ = P^ Z ∈ LF1 P^½F1 M Z N Z m=1 L Fig. 1. Results for the simulation study of the toy example. (A) IF1 point   estimates from 30 replications (circles) and the MLE (green triangle). The ^ Â Ã m region of parameter space with likelihood within 3 log units of the maxi- + P^ Z ∈ L Fc P^ Fc ; Z N 1 Z 1 mum (white), within 10 log units (red), within 100 log units (orange), and lower (yellow). (B) IF2 point estimates from 30 replications (circles) with the ≤ e + P^½F ; same algorithmic settings as IF1. (C) Final parameter value of 30 PMCMC 1 Z 1 chains (circles). (D) Kernel density estimates of the posterior for θ1 for the e σ first eight of these 30 PMCMC chains (solid lines), with the true posterior which can be made arbitrarily small by picking 1 small and distribution (dotted black line). small, completing the proof.

Demonstration of IF2 with Nonconvex Superlevel Sets advantage of the structure of this particular problem. However, Theorems 1 and 2 do not involve any Taylor series expansions, this example demonstrates a feature that makes tuning algo- which are basic in the justification of IF1 (2). This might suggest rithms tricky: The nonlinear ridge along contours of constant that IF2 can be effective on likelihood functions without good θ2 expðθ1Þ becomes increasingly steep as θ1 increases, so no sin- low-order polynomial approximations. In practice, this can be gle global estimate of the second derivative of the likelihood is seen by comparing IF2 with IF1 on a simple 2D toy example appropriate. Reparameterization can linearize the ridge in this ðdimðΘÞ = dimðXÞ = dimðYÞ = 2Þ in which the superlevel sets toy example, but in practical problems with much larger param- fθ : ℓðθÞ > λg are connected but not convex. We also compare eter spaces, one does not always know how to find appropriate with particle Monte Carlo (PMCMC) imple- reparameterizations, and a single reparameterization may not be mented as the particle marginal Metropolis–Hastings algo- appropriate throughout the parameter space. rithm of ref. 17. The justification of PMCMC also does not Fig. 1 compares the performance of the three methods, based depend on Taylor series expansions, but PMCMC is compu- on 30 Monte Carlo replications. These replications investigate tationally expensive compared with iterated filtering (30). Our the likelihood and posterior distribution for a single draw from toy example has a constant and nonrandom latent process, our toy model, since our interest is in the Monte Carlo behavior for a given dataset. For this simulated dataset, the MLE is X = ðexpfθ g; θ expfθ gÞ for n = 1; ...; N. The known mea- n 1 2 1 θ = ð1:20; 0:81Þ, shown as a green triangle in Fig. 1 A−C. In this surement model is toy example, the posterior distribution can also be computed    directly by numerical integration. In Fig. 1A, we see that IF1 100 0 f ðyjx ; θÞ ∼ Normal x; ; performs poorly on this challenge. None of the 30 replications YnjXn 01 approach the MLE. The linear combination of perturbed parameters involved in the IF1 update formula can all too easily This example was designed so that a nonlinear combination of knock the search off a nonlinear ridge. Fig. 1B shows that IF2 the parameters is well identified whereas each parameter is performs well on this test, with almost all of the Monte Carlo marginally weakly identified. For the truth, we took θ = ð1; 1Þ. We supposed that θ is suspected to fall in the interval ½−2; 2 replications clustering in the region of highest likelihood. Fig. 1C 1 shows the end points of the PMCMC replications, which are and θ2 is expected in ½0; 10. We used a uniform distribution on this rectangle to specify the prior for PMCMC and to generate nicely spread around the region of high posterior probability. random starting points for all of the algorithms. We set N = 100 However, Fig. 1D shows that mixing of the PMCMC Markov observations, and we used a Monte Carlo sample size of J = 100 chains was problematic. particles. For IF1 and IF2, we used M = 100 filtering iterations, Application to a Model with initial random walk SD 0.1 decreasing geometrically down to 0.01. For PMCMC, we used 104 filtering iterations with ran- Highly nonlinear, partially observed, stochastic dynamic systems dom walk SD 0.1, awarding PMCMC 100 times the computa- are ubiquitous in the study of biological processes. 
The physical tional resources offered to IF1 and IF2. Independent, normally scale of the systems vary widely from molecular biology (31) to distributed parameter perturbations were used for IF1, IF2, and population ecology and epidemiology (32), but POMP models PMCMC. The random walk SD for PMCMC is not immediately arise naturally at all scales. In the face of biological complexity, comparable to that for IF1 and IF2, since the latter add the noise it is necessary to determine which scientific aspects of a system at each observation time whereas the former adds it only be- are critical for the investigation. Giving consideration to a range tween filtering iterations. All three methods could have their of potential mechanisms, and their interactions, may require parameters fine-tuned, or be modified in other ways to take working with highly parameterized models. Limitations in the

Application to a Cholera Model

Highly nonlinear, partially observed, stochastic dynamic systems are ubiquitous in the study of biological processes. The physical scale of these systems varies widely, from molecular biology (31) to population ecology and epidemiology (32), but POMP models arise naturally at all scales. In the face of biological complexity, it is necessary to determine which scientific aspects of a system are critical for the investigation. Giving consideration to a range of potential mechanisms, and their interactions, may require working with highly parameterized models. Limitations in the available data may result in some combinations of parameters being weakly identifiable. Despite this, other combinations of parameters may be adequately identifiable and give rise to interesting statistical inferences. To demonstrate the capabilities of IF2 for such analyses, we fit a model for cholera epidemics in historic Bengal developed by King et al. (10). The model, the data, and the implementations of IF1 and IF2 used below are all contained in the open source R package pomp (33). The code generating the results in this article is provided as supplementary data (Datasets S1 and S2).

Cholera is a diarrheal disease caused by the bacterial pathogen Vibrio cholerae. Without appropriate medical treatment, severe infections can rapidly result in death by dehydration. Many questions regarding cholera transmission remain unresolved: What is the epidemiological role of free-living environmental vibrio? How important are mild and asymptomatic infections for the transmission dynamics? How long does protective immunity last following infection? The model we consider splits the study population of $P(t)$ individuals into those who are susceptible, $S(t)$, infected, $I(t)$, and recovered, $R(t)$. $P(t)$ is assumed known from census data. To allow flexibility in representing immunity, $R(t)$ is subdivided into $R_1(t), \ldots, R_k(t)$, where we take $k = 3$. Cumulative cholera mortality in each month is tracked with a variable $M(t)$ that resets to zero at the beginning of each observation period. The state process, $\{X(t) = (S(t), I(t), R_1(t), \ldots, R_k(t), M(t)),\; t \geq t_0\}$, follows a stochastic differential equation,

\[
\begin{aligned}
dS &= \big\{ k\epsilon R_k + \delta(P - S) - \lambda(t) S \big\}\, dt + dP - (\sigma S I / P)\, dB, \\
dI &= \big\{ \lambda(t) S - (m + \delta + \gamma) I \big\}\, dt + (\sigma S I / P)\, dB, \\
dR_1 &= \big\{ \gamma I - (k\epsilon + \delta) R_1 \big\}\, dt, \\
&\;\;\vdots \\
dR_k &= \big\{ k\epsilon R_{k-1} - (k\epsilon + \delta) R_k \big\}\, dt,
\end{aligned}
\]

driven by a Brownian motion $\{B(t)\}$. Nonlinearity arises through the force of infection, $\lambda(t)$, specified as

\[
\lambda(t) = \bar\beta \exp\bigg\{ \beta_{\mathrm{trend}}(t - t_0) + \sum_{j=1}^{N_s} \beta_j s_j(t) \bigg\}\, (I/P) + \bar\omega \exp\bigg\{ \sum_{j=1}^{N_s} \omega_j s_j(t) \bigg\}.
\]
where fsjðtÞ; j = 1; ...; Nsg is a periodic cubic B-spline basis; β ; = ; ...; ω ; = ; ...; f j j 1 Nsg model seasonality of transmission; f j j 1 Nsg model seasonality of the environmental reservoir; ω and β −1 are scaling constants set to ω = β = 1 y , and we set Ns = 6. The data, consisting of monthly counts of choleraR mortality, are mod- 2 2 tn eled via Yn ∼ NormalðMn; τ M Þ for Mn = m IðsÞds. n tn−1 The inference goal used to assess IF1 and IF2 is to find high- likelihood parameter values starting from randomly drawn starting values in a large hyperrectangle (Table S1). A single search cannot necessarily be expected to reliably obtain the maximum of the likelihood, due to multimodality, weak identi- Fig. 2. Comparison of IF1 and IF2 on the cholera model. Points are the log fiability, and considerable Monte Carlo error in evaluating the likelihood of the parameter vector output by IF1 and IF2, both started at a uniform draw from a large hyperrectangle (Table S1). Likelihoods were likelihood. Multiple starts and restarts may be needed both for evaluated as the median of 10 particle filter replications (i.e., IF2 applied effective optimization and for assessing the evidence to validate 4 with M = 1 and σ1 = 0) each with J = 2 × 10 particles. Seventeen poorly per- effective optimization. However, optimization progress made on forming searches are off the scale of this plot (15 due to the IF1 estimate, 2 an initial search provides a concrete criterion to compare due to the IF2 estimate). Dotted lines show the maximum log likelihood methodologies. Since IF1 and IF2 have essentially the same reported by ref. 10. computational cost, for a given Monte Carlo sample size and number of iterations, shared fixed values of these algorithmic

parameters provide an appropriate comparison. STATISTICS available data may result in some combinations of parameters Fig. 2 compares results for 100 searches with J = 104 particles being weakly identifiable. Despite this, other combinations of and M = 100 iterations of the search. An initial Gaussian ran- parameters may be adequately identifiable and give rise to some dom walk SD of 0.1 geometrically decreasing down to a final interesting statistical inferences. To demonstrate the capa- value of 0.01 was used for all parameters except S0, I0, R1;0, R2;0, bilities of IF2 for such analyses, we fit a model for cholera and R3;0. For those initial value parameters, the random walk epidemics in historic Bengal developed by King et al. (10). The SD decreased geometrically from 0.2 down to 0.02, but these model, the data, and the implementations of IF1 and IF2 used perturbations were applied only at time t . Since some starting 0 ECOLOGY below are all contained in the open source R package pomp points may lead both IF1 and IF2 to fail to approach the global (33). The code generating the results in this article is provided maximum, Fig. 2 plots the likelihoods of parameter vectors as supplementary data (Datasets S1 and S2). output by IF1 and IF2 for each starting point. Fig. 2 shows that, Cholera is a diarrheal disease caused by the bacterial pathogen on this problem, IF2 is considerably more effective than IF1. Vibrio cholerae. Without appropriate medical treatment, severe This maximization was considered challenging for IF1, and (10) infections can rapidly result in death by dehydration. Many required multiple restarts and refinements of the optimization procedure. Our implementation of PMCMC failed to converge questions regarding cholera transmission remain unresolved: on this inference problem (SI Text, Applying PMCMC to the What is the epidemiological role of free-living environmental Cholera Model and Fig. S2), and we are not aware of any pre- vibrio? How important are mild and asymptomatic infections vious successful PMCMC solution for a comparable situation. for the transmission dynamics? How long does protective im- For IF2, however, this situation appears routine. Some Monte munity last following infection? The model we consider splits Carlo replication is needed because searches occasionally fail up the study population of PðtÞ individuals into those who are to approach the global optimum, but replication is always ap- susceptible, SðtÞ,infected,IðtÞ, and recovered, RðtÞ. PðtÞ is as- propriate for Monte Carlo optimization procedures. sumed known from census data. To allow flexibility in repre- A fair numerical comparison of methods is difficult. For ex- senting immunity, RðtÞ is subdivided into R1ðtÞ; ...; RkðtÞ, ample, it could hypothetically be the case that the algorithmic where we take k = 3. Cumulative cholera mortality in each settings used here favor IF2. However, the settings used are month is tracked with a variable MðtÞ that resets to zero at the those that were developed for IF1 by ref. 10 and reflect con- siderable amounts of trial and error with that method. Likeli- beginning of each observation period. 
The state process, fXðtÞ = hood-based inference for general partially observed nonlinear ðSðtÞ; IðtÞ; R ðtÞ; ...; R ðtÞ; MðtÞÞ; t ≥ t g, follows a stochastic dif- 1 k 0 stochastic dynamic models was considered computationally un- ferential equation, feasible before the introduction of IF1, even in situations con- siderably simpler than the one investigated in this section (19). dS = fkeRk + δðS − HÞ − λðtÞSgdt + dP − ðσSI=PÞdB; We have shown that IF2 offers a substantial improvement on dI = fλðtÞS − ðm + δ + γÞIgdt + ðσSI=PÞdB; IF1, by demonstrating that it functions effectively on a problem at the limit of the capabilities of IF1. dR1 = fγI − ðke + δÞR1gdt; . . Discussion

Discussion

Theorems 1 and 2 assert convergence without giving insights into the rate of convergence. In the particular case of a quadratic log likelihood function and additive Gaussian parameter perturbations, $\lim_{M \to \infty} T_\sigma^M f$ is Gaussian, and explicit calculations are available (SI Text, Gaussian and Near-Gaussian Analysis of Iterated Importance Sampling). If $\log \ell(\theta)$ is close to quadratic and the parameter perturbation is close to additive Gaussian noise, then $\lim_{M \to \infty} T_\sigma^M f$ exists and is close to the limit for the approximating Gaussian system (SI Text, Gaussian and Near-Gaussian Analysis of Iterated Importance Sampling). These Gaussian and near-Gaussian situations also demonstrate that the compactness conditions for Theorem 2 are not always necessary.
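As a concrete instance (our own scalar sketch of the $N = 1$ case with a single perturbation per iteration, not the SI derivation): if $\ell(\theta) \propto \exp\{-(\theta - \hat\theta)^2 / (2\gamma^2)\}$ and the perturbation adds $N(0, \sigma^2)$ noise, then $T_\sigma$ maps a Gaussian density with variance $v$ to a Gaussian whose variance $v'$ satisfies

\[
\frac{1}{v'} = \frac{1}{v + \sigma^2} + \frac{1}{\gamma^2},
\]

so the fixed point is $v^* = \tfrac{\sigma}{2}\big(\sqrt{\sigma^2 + 4\gamma^2} - \sigma\big) \approx \sigma\gamma$ for small $\sigma$, with fixed-point mean exactly $\hat\theta$: the stationary density contracts onto the MLE as $\sigma \to 0$, consistent with Theorem 2.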

In the case $N = 1$, IF2 applies to the more general class of latent variable models. The latent variable model, extended to include a parameter vector that varies over iterations, nevertheless has the formal structure of a POMP in the context of the IF2 algorithm. Some simplifications arise when $N = 1$ (SI Text, Iterated Importance Sampling, Gaussian and Near-Gaussian Analysis, and A Class of Exact Non-Gaussian Limits), but the proofs of Theorems 1 and 2 do not greatly change.

A variation on iterated filtering, making white noise perturbations to the parameter rather than random walk perturbations, has favorable asymptotic properties (3). However, practical algorithms based on this theoretical insight have not yet been published. Our experience suggests that white noise perturbations can be effective in a neighborhood of the MLE but fail to match the performance of IF2 on global optimization problems for complex models.

The main theoretical innovation of this paper is Theorem 2, which does not depend on the specific sequential Monte Carlo filter used in IF2. One could, for example, modify IF2 to use an ensemble Kalman filter (20, 34) or an unscented Kalman filter (35). Or, one could take advantage of variations of sequential Monte Carlo that may improve the numerical performance (36). However, basic sequential Monte Carlo is a general and widely used nonlinear filtering technique that provides a simple yet theoretically supported foundation for the IF2 algorithm. The numerical stability of sequential Monte Carlo for the extended POMP model constructed by IF2 is comparable, in our cholera example, to that for the model with fixed parameters (SI Text, Consequences of Perturbing Parameters for the Numerical Stability of SMC and Fig. S3).

ACKNOWLEDGMENTS. We acknowledge constructive comments by two anonymous referees, the editor, and Joon Ha Park. Funding was provided by National Science Foundation Grants DMS-1308919 and DMS-1106695; National Institutes of Health Grants 1-R01-AI101155, 1-U54-GM111274, and 1-U01-GM110712; and the Research and Policy for Infectious Disease Dynamics program of the Department of Homeland Security and the National Institutes of Health Fogarty International Center.

1. Ionides EL, Bretó C, King AA (2006) Inference for nonlinear dynamical systems. Proc Natl Acad Sci USA 103(49):18438–18443.
2. Ionides EL, Bhadra A, Atchadé Y, King AA (2011) Iterated filtering. Ann Stat 39(3):1776–1802.
3. Doucet A, Jacob PE, Rubenthaler S (2013) Derivative-free estimation of the score vector and observed information matrix with application to state-space models. arXiv:1304.5768.
4. Lindström E, Ionides EL, Frydendall J, Madsen H (2012) Efficient iterated filtering. System Identification (Elsevier, New York), Vol 16, pp 1785–1790.
5. Lele SR, Dennis B, Lutscher F (2007) Data cloning: Easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecol Lett 10(7):551–563.
6. Lele SR, Nadeem K, Schmuland B (2010) Estimability and likelihood inference for generalized linear mixed models using data cloning. J Am Stat Assoc 105(492):1617–1625.
7. Doucet A, Godsill SJ, Robert CP (2002) Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Stat Comput 12(1):77–84.
8. Gaetan C, Yao J-F (2003) A multiple-imputation Metropolis version of the EM algorithm. Biometrika 90(3):643–654.
9. Jacquier E, Johannes M, Polson N (2007) MCMC maximum likelihood for latent state models. J Econom 137(2):615–640.
10. King AA, Ionides EL, Pascual M, Bouma MJ (2008) Inapparent infections and cholera dynamics. Nature 454(7206):877–880.
11. Laneri K, et al. (2010) Forcing versus feedback: Epidemic malaria and monsoon rains in NW India. PLoS Comput Biol 6(9):e1000898.
12. He D, Ionides EL, King AA (2010) Plug-and-play inference for disease dynamics: Measles in large and small populations as a case study. J R Soc Interface 7(43):271–283.
13. Blackwood JC, Cummings DAT, Broutin H, Iamsirithaworn S, Rohani P (2013) Deciphering the impacts of vaccination and immunity on pertussis epidemiology in Thailand. Proc Natl Acad Sci USA 110(23):9595–9600.
14. Shrestha S, Foxman B, Weinberger DM, Steiner C, Viboud C, Rohani P (2013) Identifying the interaction between influenza and pneumococcal pneumonia using incidence data. Sci Transl Med 5:191ra84.
15. Blake IM, et al. (2014) The role of older children and adults in wild poliovirus transmission. Proc Natl Acad Sci USA 111(29):10604–10609.
16. Bretó C, He D, Ionides EL, King AA (2009) Time series analysis via mechanistic models. Ann Appl Stat 3:319–348.
17. Andrieu C, Doucet A, Holenstein R (2010) Particle Markov chain Monte Carlo methods. J R Stat Soc Ser B 72(3):269–342.
18. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MP (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6(31):187–202.
19. Wood SN (2010) Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466(7310):1102–1104.
20. Shaman J, Karspeck A (2012) Forecasting seasonal outbreaks of influenza. Proc Natl Acad Sci USA 109(50):20425–20430.
21. Kitagawa G (1998) A self-organising state-space model. J Am Stat Assoc 93:1203–1215.
22. Liu J, West M (2001) Combined parameter and state estimation in simulation-based filtering. Sequential Monte Carlo Methods in Practice, eds Doucet A, de Freitas N, Gordon NJ (Springer, New York), pp 197–224.
23. Arulampalam MS, Maskell S, Gordon N, Clapp T (2002) A tutorial on particle filters for online nonlinear, non-Gaussian Bayesian tracking. IEEE Trans Signal Process 50(2):174–188.
24. Doucet A, de Freitas N, Gordon NJ, eds (2001) Sequential Monte Carlo Methods in Practice (Springer, New York).
25. Ingber L (1993) Simulated annealing: Practice versus theory. Math Comput Model 18:29–57.
26. Le Gland F, Oudjane N (2004) Stability and uniform approximation of nonlinear filters using the Hilbert metric and application to particle filters. Ann Appl Probab 14(1):144–187.
27. Eveson SP (1995) Hilbert's projective metric and the spectral properties of positive linear operators. Proc London Math Soc 3(2):411–440.
28. Del Moral P, Doucet A (2004) Particle motions in absorbing medium with hard and soft obstacles. Stochastic Anal Appl 22(5):1175–1207.
29. Whiteley N, Kantas N, Jasra A (2012) Linear variance bounds for particle approximations of time-homogeneous Feynman–Kac formulae. Stochastic Process Appl 122(4):1840–1865.
30. Bhadra A (2010) Discussion of 'Particle Markov chain Monte Carlo methods' by C. Andrieu, A. Doucet and R. Holenstein. J R Stat Soc B 72:314–315.
31. Wilkinson DJ (2012) Stochastic Modelling for Systems Biology (Chapman & Hall, Boca Raton, FL).
32. Keeling M, Rohani P (2009) Modeling Infectious Diseases in Humans and Animals (Princeton Univ Press, Princeton, NJ).
33. King AA, Ionides EL, Bretó CM, Ellner S, Kendall B (2009) pomp: Statistical inference for partially observed Markov processes (R package). Available at cran.r-project.org/web/packages/pomp.
34. Yang W, Karspeck A, Shaman J (2014) Comparison of filtering methods for the modeling and retrospective forecasting of influenza epidemics. PLOS Comput Biol 10(4):e1003583.
35. Julier S, Uhlmann J (2004) Unscented filtering and nonlinear estimation. Proc IEEE 92(3):401–422.
36. Cappé O, Godsill S, Moulines E (2007) An overview of existing methods and recent advances in sequential Monte Carlo. Proc IEEE 95(5):899–924.
