Markov Chain Monte Carlo Simulation Using the DREAM Software Package: Theory, Concepts, and MATLAB Implementation


UC Irvine Previously Published Works
Author: Jasper A. Vrugt (a, b, c)
Permalink: https://escholarship.org/uc/item/6j55p5kh
Publication date: 2016
DOI: 10.1016/j.envsoft.2015.08.013
Published in: Environmental Modelling & Software 75 (2016) 273-316 (ScienceDirect; journal homepage: www.elsevier.com/locate/envsoft)
Peer reviewed. eScholarship.org, powered by the California Digital Library, University of California.

(a) Department of Civil and Environmental Engineering, University of California Irvine, 4130 Engineering Gateway, Irvine, CA 92697-2175, USA (corresponding author; e-mail: [email protected]; URL: http://faculty.sites.uci.edu/jasper)
(b) Department of Earth System Science, University of California Irvine, Irvine, CA, USA
(c) Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands

Article history: Received 17 February 2015; received in revised form 13 August 2015; accepted 13 August 2015.

Abstract

Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. Bayes' theorem states that the posterior probability, p(H|Ỹ), of a hypothesis, H, is proportional to the product of the prior probability, p(H), of this hypothesis and the likelihood, L(H|Ỹ), of the same hypothesis given the new observations, Ỹ, or p(H|Ỹ) ∝ p(H) L(H|Ỹ). In science and engineering, H often constitutes some numerical model, F(x), which summarizes, in algebraic and differential equations, state variables and fluxes, all knowledge of the system of interest, and the unknown parameter values, x, are subject to inference using the data Ỹ. Unfortunately, for complex system models the posterior distribution is often high-dimensional and analytically intractable, and sampling methods are required to approximate the target. In this paper I review the basic theory of Markov chain Monte Carlo (MCMC) simulation and introduce a MATLAB toolbox of the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm developed by Vrugt et al. (2008a, 2009a) and used for Bayesian inference in fields ranging from physics, chemistry and engineering to ecology, hydrology, and geophysics. This MATLAB toolbox provides scientists and engineers with an arsenal of options and utilities to solve posterior sampling problems involving (among others) bimodality, high dimensionality, summary statistics, bounded parameter spaces, dynamic simulation models, formal/informal likelihood functions (GLUE), diagnostic model evaluation, data assimilation, Bayesian model averaging, distributed computation, and informative/noninformative prior distributions. The DREAM toolbox supports parallel computing and includes tools for convergence analysis of the sampled chain trajectories and post-processing of the results. Seven different case studies illustrate the main capabilities and functionalities of the MATLAB toolbox. © 2015 Elsevier Ltd. All rights reserved.

Keywords: Bayesian inference; Markov chain Monte Carlo (MCMC) simulation; Random walk Metropolis (RWM); Adaptive Metropolis (AM); Differential evolution Markov chain (DE-MC); Prior distribution; Likelihood function; Posterior distribution; Approximate Bayesian computation (ABC); Diagnostic model evaluation; Residual analysis; Environmental modeling; Bayesian model averaging (BMA); Generalized likelihood uncertainty estimation (GLUE); Multi-processor computing; Extended Metropolis algorithm (EMA)

1. Introduction and scope

Continued advances in direct and indirect (e.g. geophysical, pumping test, remote sensing) measurement technologies and improvements in computational technology and process knowledge have stimulated the development of increasingly complex environmental models that use algebraic and (stochastic) ordinary (partial) differential equations (PDEs) to simulate the behavior of a myriad of highly interrelated ecological, hydrological, and biogeochemical processes at different spatial and temporal scales. These water, energy, nutrient, and vegetation processes are often non-separable and non-stationary, with very complicated and highly nonlinear spatio-temporal interactions (Wikle and Hooten, 2010), which gives rise to complex system behavior. This complexity poses significant measurement and modeling challenges, in particular how to adequately characterize the spatio-temporal processes of the dynamic system of interest in the presence of (often) incomplete and insufficient observations, process knowledge, and system characterization. This includes prediction in space (interpolation/extrapolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters.

The use of differential equations might be more appropriate than purely empirical relationships among variables, but does not guard against epistemic errors due to incomplete and/or inexact process knowledge. Fig. 1 provides a schematic overview of the most important sources of uncertainty that affect our ability to describe as closely and consistently as possible the observed system behavior. These sources of uncertainty have been discussed extensively in the literature, and much work has focused on the characterization of parameter, model output, and state variable uncertainty. Explicit knowledge of each individual error source would provide strategic guidance for investments in data collection and/or model improvement. For instance, if input (forcing/boundary condition) data uncertainty dominates total simulation uncertainty, then it would not be productive to increase model complexity, but rather to prioritize data collection instead. On the contrary, it would be naive to spend a large portion of the available monetary budget on system characterization if this constitutes only a minor portion of total prediction uncertainty. Note that model structural error (label 4, also called epistemic error) has received relatively little attention, but is key to learning and scientific discovery (Vrugt et al., 2005; Vrugt and Sadegh, 2013).

The focus of this paper is on spatio-temporal models that may be discrete in time and/or space, but with processes that are continuous in both. A MATLAB toolbox is described which can be used to derive the posterior parameter (and state) distribution, conditioned on measurements of observed system behavior. At least some level of calibration of these models is required to make sure that the simulated state variables, internal fluxes, and output variables match the observed system behavior as closely and consistently as possible. Bayesian methods have found widespread application and use to do so, in particular because of their innate ability to handle, in a consistent and coherent manner, parameter, state variable, and model output (simulation) uncertainty.

If Ỹ = {ỹ1, ..., ỹn} signifies a discrete vector of measurements at times t = {1, ..., n} which summarizes the response of some environmental system J to forcing variables Ũ = {ũ1, ..., ũn}, then the observations or data are linked to the physical system

    Ỹ ← J(x*) + ε,    (1)

where x* = {x1*, ..., xd*} are the unknown parameters and ε = {ε1, ..., εn} is an n-vector of measurement errors. When a hypothesis, or simulator, Y ← F(x*, Ũ, ψ̃0) of the physical process is available, then the data can be modeled using

    Ỹ ← F(x*, Ũ, ψ̃0) + E,    (2)

where ψ̃0 ∈ Ψ ⊆ ℝ^t signify the initial states, and E = {e1, ..., en} includes observation error (forcing and output data) as well as error due to the fact that the simulator, F(·), may be systematically different from reality, J(x*), for the parameters x*. The latter may arise from numerical errors (inadequate solver and discretization), and improper model formulation and/or parameterization.

By adopting a Bayesian formalism, the posterior distribution of the parameters of the model can be derived by conditioning the spatio-temporal behavior of the model on measurements of the observed system response

    p(x|Ỹ) = p(x) p(Ỹ|x) / p(Ỹ),    (3)

where p(x) and p(x|Ỹ) signify the prior and posterior parameter distribution, respectively, and L(x|Ỹ) ≡ p(Ỹ|x) denotes the likelihood function. The evidence, p(Ỹ), acts as a normalization constant (scalar) so that the posterior distribution integrates to unity

    p(Ỹ) = ∫_χ p(x) p(Ỹ|x) dx = ∫_χ p(x, Ỹ) dx,    (4)

over the parameter space, x ∈ χ ⊆ ℝ^d. In practice, p(Ỹ) is not required for posterior estimation, as all statistical inferences about p(x|Ỹ) can be made from the unnormalized density

    p(x|Ỹ) ∝ p(x) L(x|Ỹ).    (5)

If we assume, for the time being, that the prior distribution, p(x)

Fig. 1. Schematic illustration of the most important sources of uncertainty in environmental systems modeling, including (1) parameter,
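The unnormalized density of Eq. (5) is all that MCMC requires: a random walk Metropolis sampler (the RWM scheme the paper reviews; DREAM replaces its fixed proposal with differential-evolution proposals) accepts or rejects moves using only ratios of p(x) L(x|Ỹ), so the evidence p(Ỹ) cancels. The DREAM toolbox itself is MATLAB; the following is a minimal, hypothetical Python sketch of the idea, with an illustrative one-dimensional Gaussian prior and likelihood (not the toolbox API):

```python
import math
import random

def log_post(x):
    # Hypothetical 1-D target: N(0, 10^2) prior times a unit-variance
    # Gaussian likelihood centered at 3; stands in for p(x)L(x|Y) in Eq. (5).
    log_prior = -0.5 * (x / 10.0) ** 2
    log_lik = -0.5 * (x - 3.0) ** 2
    return log_prior + log_lik

def metropolis(log_density, x0, n_steps, step=1.0, seed=1):
    # Random walk Metropolis: symmetric proposal, accept with
    # probability min(1, ratio of unnormalized densities).
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        if rng.random() < math.exp(min(0.0, log_density(prop) - log_density(x))):
            x = prop
        samples.append(x)
    return samples

samples = metropolis(log_post, x0=0.0, n_steps=20000)
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
```

Because only the difference of log-densities enters the acceptance test, any normalization constant common to all x drops out, which is exactly why Eq. (5) suffices for posterior inference.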
Recommended publications
  • Markov Chain Monte Carlo Convergence Diagnostics: a Comparative Review
    Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review. By Mary Kathryn Cowles and Bradley P. Carlin. Abstract: A critical issue for users of Markov Chain Monte Carlo (MCMC) methods in applications is how to determine when it is safe to stop sampling and use the samples to estimate characteristics of the distribution of interest. Research into methods of computing theoretical convergence bounds holds promise for the future but currently has yielded relatively little that is of practical use in applied work. Consequently, most MCMC users address the convergence problem by applying diagnostic tools to the output produced by running their samplers. After giving a brief overview of the area, we provide an expository review of thirteen convergence diagnostics, describing the theoretical basis and practical implementation of each. We then compare their performance in two simple models and conclude that all the methods can fail to detect the sorts of convergence failure they were designed to identify. We thus recommend a combination of strategies aimed at evaluating and accelerating MCMC sampler convergence, including applying diagnostic procedures to a small number of parallel chains, monitoring autocorrelations and cross-correlations, and modifying parameterizations or sampling algorithms appropriately. We emphasize, however, that it is not possible to say with certainty that a finite sample from an MCMC algorithm is representative of an underlying stationary distribution. (Mary Kathryn Cowles is Assistant Professor of Biostatistics, Harvard School of Public Health, Boston, MA 02115. Bradley P. Carlin is Associate Professor, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455.)
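One of the strategies recommended above, running a small number of parallel chains and comparing their within- and between-chain behavior, underlies the widely used Gelman-Rubin diagnostic (one of the diagnostics the review covers). A minimal, hypothetical Python sketch, with independent draws standing in for chain output:

```python
import random

def gelman_rubin(chains):
    # Potential scale reduction factor R-hat from m parallel chains of length n.
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain variance
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)            # mean within-chain variance
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

rng = random.Random(0)
# Two "chains" drawing from the same N(0,1) target: R-hat should be near 1.
mixed = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(2)]
# Two chains stuck in different regions: R-hat should be well above 1.
stuck = [[rng.gauss(0, 1) for _ in range(2000)],
         [rng.gauss(5, 1) for _ in range(2000)]]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

As the review cautions, a value near 1 is necessary but not sufficient evidence of convergence: chains that have all missed the same mode also produce R-hat near 1.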
  • Meta-Learning MCMC Proposals
    Meta-Learning MCMC Proposals. Tongzhou Wang (Facebook AI Research, [email protected]), Yi Wu (University of California, Berkeley, [email protected]), David A. Moore (Google, [email protected]), Stuart J. Russell (University of California, Berkeley, [email protected]). Abstract: Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. Inspired by recent progress in meta-learning for training learning agents that can generalize to unseen environments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network to provide fast approximations to block Gibbs conditionals. The learned neural proposals generalize to occurrences of common structural motifs across different models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler yields higher final F1 scores than classical single-site Gibbs sampling. 1 Introduction: Model-based probabilistic inference is a highly successful paradigm for machine learning, with applications to tasks as diverse as movie recommendation [31], visual scene perception [17], music transcription [3], etc. People learn and plan using mental models, and indeed the entire enterprise of modern science can be viewed as constructing a sophisticated hierarchy of models of physical, mental, and social phenomena. Probabilistic programming provides a formal representation of models as sample-generating programs, promising the ability to explore an even richer range of models.
  • Markov Chain Monte Carlo Edps 590BAY
    Markov Chain Monte Carlo. Edps 590BAY. Carolyn J. Anderson, Department of Educational Psychology, © Board of Trustees, University of Illinois, Fall 2019. Overview: introduction to Bayesian computing; Markov chains; Metropolis algorithm for the mean of a normal given fixed variance; revisit the anorexia data; practice problem with SAT data; some tools for assessing convergence; Metropolis algorithm for the mean and variance of a normal; anorexia data; practice problem; summary. Depending on the book that you select for this course, read either Gelman et al. pp 275-291 or Kruschke pp 143-218. I am relying more on Gelman et al. Introduction to Bayesian Computing: Our major goal is to approximate the posterior distributions of unknown parameters and use them to estimate parameters. The analytic computations are fine for simple problems: beta-binomial for bounded counts, normal-normal for continuous variables, gamma-Poisson for (unbounded) counts, Dirichlet-multinomial for multicategory variables (i.e., a categorical variable), and models in the exponential family with a small number of parameters. For a large number of parameters and more complex models, the algebra of an analytic solution becomes overwhelming, a grid takes too much time, and it is too difficult for most applications. Steps in Modeling: Recall the steps in an analysis: 1. Choose a model for the data (i.e., p(y|θ)) and models for the parameters (i.e., p(θ) and p(θ|y)).
  • Tail Estimation of the Spectral Density Under Fixed-Domain Asymptotics
    TAIL ESTIMATION OF THE SPECTRAL DENSITY UNDER FIXED-DOMAIN ASYMPTOTICS. By Wei-Ying Wu. A dissertation submitted to Michigan State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Statistics, 2011. Abstract: For spatial statistics, two asymptotic approaches are usually considered: increasing domain asymptotics and fixed-domain asymptotics (or infill asymptotics). For increasing domain asymptotics, sampled data increase with the increasing spatial domain, while under infill asymptotics, data are observed on a fixed region with the distance between neighboring observations tending to zero. The consistency and asymptotic results under these two asymptotic frameworks can be quite different. For example, not all parameters are consistently estimated under infill asymptotics, while consistency holds for those parameters under increasing domain asymptotics (Zhang 2004). For a stationary Gaussian random field on ℝ^d with a spectral density f(λ) that satisfies f(λ) ~ c|λ|^(-θ) as |λ| → ∞, the parameters c and θ control the tail behavior of the spectral density, where θ is related to the smoothness of the random field and c can be used to determine the orthogonality of probability measures for a fixed θ. Specifically, c corresponds to the microergodic parameter mentioned in Du et al. (2009) when the Matérn covariance is assumed. Additionally, under infill asymptotics, the tail behavior of the spectral density dominates the performance of the prediction and the equivalence of the probability measures. For these reasons, it is significant in statistics to estimate c and θ.
  • Bayesian Phylogenetics and Markov Chain Monte Carlo Will Freyman
    IB200, Spring 2016, University of California, Berkeley. Lecture 12: Bayesian phylogenetics and Markov chain Monte Carlo. Will Freyman. 1 Basic Probability Theory: Probability is a quantitative measurement of the likelihood of an outcome of some random process. The probability of an event, like flipping a coin and getting heads, is notated P(heads). The probability of getting tails is then 1 - P(heads) = P(tails). The joint probability of event A and event B both occurring is written as P(A, B). The joint probability of mutually exclusive events, like flipping a coin once and getting heads and tails, is 0. The probability of A occurring given that B has already occurred is written P(A|B), which is read "the probability of A given B". This is a conditional probability. The marginal probability of A is P(A), which is calculated by summing or integrating the joint probability over B. In other words, P(A) = P(A, B) + P(A, not B). Joint, conditional, and marginal probabilities can be combined with the expression P(A, B) = P(A)P(B|A) = P(B)P(A|B). The above expression can be rearranged into Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B). Bayes' theorem is a straightforward and uncontroversial way to calculate an inverse conditional probability. If P(A|B) = P(A) and P(B|A) = P(B), then A and B are independent, and the joint probability is calculated P(A, B) = P(A)P(B). 2 Interpretations of Probability: What exactly is a probability? Does it really exist? There are two major interpretations of probability: Frequentists believe that the probability of an event is its relative frequency over time.
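Bayes' theorem and the marginalization rule above can be verified numerically. A small Python sketch with hypothetical probabilities (a rare hypothesis A and a noisy indicator B):

```python
# Hypothetical numbers, for illustration only.
p_A = 0.01             # base rate of A
p_B_given_A = 0.95     # probability of observing B when A holds
p_B_given_notA = 0.05  # probability of observing B when A does not hold

# Marginal probability of B, summing the joint over A as in the text:
# P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
p_A_given_B = p_B_given_A * p_A / p_B
```

With these numbers the posterior P(A|B) is only about 0.16 despite the 0.95 hit rate, because the prior P(A) is small: the inverse conditional probability is not the same as the forward one.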
  • Statistica Sinica Preprint No: SS-2016-0112R1
    Statistica Sinica Preprint No: SS-2016-0112R1. Title: Inference of Bivariate Long-memory Aggregate Time Series. URL: http://www.stat.sinica.edu.tw/statistica/. DOI: 10.5705/ss.202016.0112. Authors: Henghsiu Tsai (Academia Sinica), Heiko Rachinger (University of Vienna), and Kung-Sik Chan (University of Iowa). Corresponding author: Henghsiu Tsai, e-mail: [email protected]. Notice: accepted version subject to English editing. January 5, 2017. Abstract: Multivariate time-series data are increasingly collected, with the increasing deployment of affordable and sophisticated sensors. These multivariate time series are often of long memory, the inference of which can be rather complex. We consider the problem of modeling long-memory bivariate time series that are aggregates from an underlying long-memory continuous-time process. We show that with increasing aggregation, the resulting discrete-time process is approximately a linear transformation of two independent fractional Gaussian noises with the corresponding Hurst parameters equal to those of the underlying continuous-time processes. We also use simulations to confirm the good approximation of the limiting model to aggregate data from a continuous-time process. The theoretical and numerical results justify modeling long-memory bivariate aggregate time series by this limiting model. However, the model parametrization changes drastically in the case of identical Hurst parameters. We derive the likelihood ratio test for testing the equality of the two Hurst parameters, within the framework of Whittle likelihood, and the corresponding maximum likelihood estimators.
  • The Markov Chain Monte Carlo Approach to Importance Sampling
    The Markov Chain Monte Carlo Approach to Importance Sampling in Stochastic Programming, by Berk Ustun. B.S., Operations Research, University of California, Berkeley (2009); B.A., Economics, University of California, Berkeley (2009). Submitted to the School of Engineering in partial fulfillment of the requirements for the degree of Master of Science in Computation for Design and Optimization at the Massachusetts Institute of Technology, September 2012. © Massachusetts Institute of Technology 2012. All rights reserved. Thesis supervisor: Mort Webster, Assistant Professor of Engineering Systems. Thesis reader: Youssef Marzouk, Class of 1942 Associate Professor of Aeronautics and Astronautics. Accepted by: Nicolas Hadjiconstantinou, Associate Professor of Mechanical Engineering, Director, Computation for Design and Optimization. Abstract: Stochastic programming models are large-scale optimization problems that are used to facilitate decision-making under uncertainty. Optimization algorithms for such problems need to evaluate the expected future costs of current decisions, often referred to as the recourse function. In practice, this calculation is computationally difficult as it involves the evaluation of a multidimensional integral whose integrand is an optimization problem. Accordingly, the recourse function is estimated using quadrature rules or Monte Carlo methods.
Although Monte Carlo methods present numerous computational benefits over quadrature rules, they require a large number of samples to produce accurate results when they are embedded in an optimization algorithm.
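A standard way to reduce the sample size that plain Monte Carlo needs is importance sampling, the technique the thesis builds on: draw from a proposal density that concentrates where the integrand matters, and reweight each draw by the density ratio. A minimal, hypothetical Python sketch estimating a Gaussian tail probability (a toy integrand, not the thesis's stochastic-programming setting):

```python
import math
import random

def is_tail_prob(n, rng):
    # Estimate P(Z > 3) for Z ~ N(0,1) by drawing from a N(3,1) proposal,
    # which places most of its mass in the rare-event region, and
    # reweighting each draw by the density ratio phi(x) / q(x).
    total = 0.0
    for _ in range(n):
        x = rng.gauss(3.0, 1.0)
        if x > 3.0:
            # N(0,1) and N(3,1) share the 1/sqrt(2*pi) constant, so the
            # weight reduces to exp(-x^2/2 + (x-3)^2/2).
            total += math.exp(-0.5 * x * x + 0.5 * (x - 3.0) ** 2)
    return total / n

rng = random.Random(3)
p_hat = is_tail_prob(200000, rng)
```

Sampling Z directly from N(0,1) would see an exceedance only about once per 740 draws, whereas the shifted proposal sees one on roughly half its draws; the weights correct the resulting bias exactly.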
  • Markov Chain Monte Carlo (Mcmc) Methods
    MARKOV CHAIN MONTE CARLO (MCMC) METHODS. (These notes utilize a few sources: some insights are taken from Profs. Vardeman's and Carriquiry's lecture notes, some from a great book on Monte Carlo strategies in scientific computing by J.S. Liu. EE 527, Detection and Estimation Theory, # 4c.) Markov Chains: Basic Theory. Definition 1. A (discrete time/discrete state space) Markov chain (MC) is a sequence of random quantities {X_k}, each taking values in a (finite or) countable set X, with the property that P{X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1}} = P{X_n = x_n | X_{n-1} = x_{n-1}}. Definition 2. A Markov chain is stationary if P{X_n = x | X_{n-1} = x'} is independent of n. Without loss of generality, we will henceforth name the elements of X with the integers 1, 2, 3, ... and call them "states." Definition 3. With p_ij ≜ P{X_n = j | X_{n-1} = i}, the square matrix P ≜ {p_ij} is called the transition matrix for a stationary Markov chain, and the p_ij are called transition probabilities. Note that a transition matrix has nonnegative entries and its rows sum to 1. Such matrices are called stochastic matrices. More notation for a stationary MC: define p_ij^(k) ≜ P{X_{n+k} = j | X_n = i} and f_ij^(k) ≜ P{X_{n+k} = j, X_{n+k-1} ≠ j, ..., X_{n+1} ≠ j | X_n = i}. These are, respectively, the probabilities of moving from i to j in k steps and of first moving from i to j in k steps.
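The definitions above can be exercised directly: repeatedly applying a stochastic matrix P to an initial distribution converges, for a well-behaved chain, to the stationary distribution π satisfying π = πP. A minimal Python sketch with a hypothetical 2-state chain:

```python
def step(dist, P):
    # One transition of the chain: new_j = sum_i dist_i * p_ij.
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 2-state transition matrix: nonnegative entries, rows sum to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

dist = [1.0, 0.0]          # start in state 1 with certainty
for _ in range(200):       # iterate the chain forward
    dist = step(dist, P)
```

Solving π = πP by hand for this P gives π = (5/6, 1/6), and the iteration approaches it geometrically; MCMC methods run this logic in reverse, constructing a chain whose stationary distribution is a prescribed target.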
  • An Introduction to Markov Chain Monte Carlo Methods and Their Actuarial Applications
    AN INTRODUCTION TO MARKOV CHAIN MONTE CARLO METHODS AND THEIR ACTUARIAL APPLICATIONS. DAVID P. M. SCOLLNIK, Department of Mathematics and Statistics, University of Calgary. Abstract: This paper introduces the readers of the Proceedings to an important class of computer based simulation techniques known as Markov chain Monte Carlo (MCMC) methods. General properties characterizing these methods will be discussed, but the main emphasis will be placed on one MCMC method known as the Gibbs sampler. The Gibbs sampler permits one to simulate realizations from complicated stochastic models in high dimensions by making use of the model's associated full conditional distributions, which will generally have a much simpler and more manageable form. In its most extreme version, the Gibbs sampler reduces the analysis of a complicated multivariate stochastic model to the consideration of that model's associated univariate full conditional distributions. In this paper, the Gibbs sampler will be illustrated with four examples. The first three of these examples serve as rather elementary yet instructive applications of the Gibbs sampler. The fourth example describes a reasonably sophisticated application of the Gibbs sampler in the important arena of credibility for classification ratemaking via hierarchical models, and involves the Bayesian prediction of frequency counts in workers compensation insurance. 1. INTRODUCTION: The purpose of this paper is to acquaint the readership of the Proceedings with a class of simulation techniques known as Markov chain Monte Carlo (MCMC) methods. These methods permit a practitioner to simulate a dependent sequence of random draws from very complicated stochastic models.
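The Gibbs sampler described above alternates draws from the model's full conditional distributions. A minimal, hypothetical Python sketch for a standard bivariate normal with correlation rho, where each full conditional is univariate normal (far simpler than the paper's actuarial examples):

```python
import random

def gibbs_bivariate_normal(rho, n_draws, seed=0):
    # For a standard bivariate normal with correlation rho, each full
    # conditional is univariate: x | y ~ N(rho * y, 1 - rho^2), and vice versa.
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(n_draws):
        x = rng.gauss(rho * y, sd)   # draw x from its full conditional
        y = rng.gauss(rho * x, sd)   # draw y from its full conditional
        draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(0.8, 20000)
xs, ys = zip(*draws[1000:])          # discard a short burn-in
# Both margins are standard normal, so the mean of x*y estimates the correlation.
sample_corr = sum(a * b for a, b in zip(xs, ys)) / len(xs)
```

The point of the construction is that no draw from the joint distribution is ever needed: only the two univariate conditionals are sampled, yet the chain's stationary distribution is the full bivariate target.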
  • Approximate Likelihood for Large Irregularly Spaced Spatial Data
    Approximate likelihood for large irregularly spaced spatial data. Montserrat Fuentes. SUMMARY: Likelihood approaches for large irregularly spaced spatial datasets are often very difficult, if not infeasible, to use due to computational limitations. Even when we can assume normality, exact calculation of the likelihood for a Gaussian spatial process observed at n locations requires O(n^3) operations. We present a version of Whittle's approximation to the Gaussian log likelihood for spatial regular lattices with missing values and for irregularly spaced datasets. This method requires O(n log2 n) operations and does not involve calculating determinants. We present simulations and theoretical results to show the benefits and the performance of the spatial likelihood approximation method presented here for spatial irregularly spaced datasets and lattices with missing values. 1 Introduction: Statisticians are frequently involved in the spatial analysis of huge datasets. In this situation, calculating the likelihood function to estimate the covariance parameters or to obtain the predictive posterior density is very difficult due to computational limitations. Even if we can assume normality, calculating the likelihood function involves O(N^3) operations, where N is the number of observations. However, if the observations are on a regular complete lattice, then it is possible to compute the likelihood function with fewer calculations using spectral methods (Whittle 1954, Guyon 1982, Dahlhaus and Künsch 1987, Stein 1995, 1999). (M. Fuentes is an Associate Professor at the Statistics Department, North Carolina State University (NCSU), Raleigh, NC 27695-8203. Tel.: (919) 515-1921, Fax: (919) 515-1169, E-mail: [email protected]. This research was sponsored by a National Science Foundation grant DMS 0353029. Key words: covariance, Fourier transform, periodogram, spatial statistics, satellite data.)
  • Introduction to Markov Chain Monte Carlo
    1 Introduction to Markov Chain Monte Carlo. Charles J. Geyer. 1.1 History: Despite a few notable uses of simulation of random processes in the pre-computer era (Hammersley and Handscomb, 1964, Section 1.2; Stigler, 2002, Chapter 7), practical widespread use of simulation had to await the invention of computers. Almost as soon as computers were invented, they were used for simulation (Hammersley and Handscomb, 1964, Section 1.2). The name "Monte Carlo" started as cuteness (gambling was then, around 1950, illegal in most places, and the casino at Monte Carlo was the most famous in the world) but it soon became a colorless technical term for simulation of random processes. Markov chain Monte Carlo (MCMC) was invented soon after ordinary Monte Carlo at Los Alamos, one of the few places where computers were available at the time. Metropolis et al. (1953) simulated a liquid in equilibrium with its gas phase. The obvious way to find out about the thermodynamic equilibrium is to simulate the dynamics of the system, and let it run until it reaches equilibrium. The tour de force was their realization that they did not need to simulate the exact dynamics; they only needed to simulate some Markov chain having the same equilibrium distribution. Simulations following the scheme of Metropolis et al. (1953) are said to use the Metropolis algorithm. As computers became more widely available, the Metropolis algorithm was widely used by chemists and physicists, but it did not become widely known among statisticians until after 1990. Hastings (1970) generalized the Metropolis algorithm, and simulations following his scheme are said to use the Metropolis-Hastings algorithm.
  • Estimating Accuracy of the MCMC Variance Estimator: a Central Limit Theorem for Batch Means Estimators
    Estimating accuracy of the MCMC variance estimator: a central limit theorem for batch means estimators. Saptarshi Chakraborty (Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, 485 Lexington Ave, New York, NY 10017, USA; e-mail: [email protected]), Suman K. Bhattacharya and Kshitij Khare (Department of Statistics, University of Florida, 101 Griffin Floyd Hall, Gainesville, Florida 32601, USA; e-mail: [email protected], [email protected]). Abstract: The batch means estimator of the MCMC variance is a simple and effective measure of accuracy for MCMC based ergodic averages. Under various regularity conditions, the estimator has been shown to be consistent for the true variance. However, the estimator can be unstable in practice as it depends directly on the raw MCMC output. A measure of accuracy of the batch means estimator itself, ideally in the form of a confidence interval, is therefore desirable. The asymptotic variance of the batch means estimator is known; however, without any knowledge of the asymptotic distribution, asymptotic variances are in general insufficient to describe variability. In this article we prove a central limit theorem for the batch means estimator that allows for the construction of asymptotically accurate confidence intervals for the batch means estimator. Additionally, our results provide a Markov chain analogue of the classical CLT for the sample variance parameter for i.i.d. observations. Our result assumes standard regularity conditions similar to the ones assumed in the literature for proving consistency. Simulated and real data examples are included as illustrations and applications of the CLT. (arXiv:1911.00915v1 [stat.CO] 3 Nov 2019. MSC 2010 subject classifications: Primary 60J22; secondary 62F15.)
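The batch means estimator analyzed above is simple to state: split the chain of length n into batches, compute the batch means, and scale their sample variance by the batch size. A minimal, hypothetical Python sketch (i.i.d. draws stand in for MCMC output, for which the true asymptotic variance is 1):

```python
import random

def batch_means_variance(chain, n_batches):
    # Batch means estimate of the asymptotic variance in the CLT for the
    # ergodic average: sample variance of the batch means, scaled by the
    # batch size b = n // n_batches.
    b = len(chain) // n_batches
    means = [sum(chain[i * b:(i + 1) * b]) / b for i in range(n_batches)]
    grand = sum(means) / n_batches
    return b * sum((m - grand) ** 2 for m in means) / (n_batches - 1)

rng = random.Random(42)
chain = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # i.i.d. stand-in for MCMC output
sigma2_hat = batch_means_variance(chain, 50)
```

The instability the paper addresses is visible here: with 50 batches the estimate itself has a standard deviation of roughly sqrt(2/49) of the true value, which is exactly why a confidence interval for the estimator, as the CLT in this paper provides, is useful.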