Markov Chain Monte Carlo: The Gibbs Sampler


Markov Chain Monte Carlo

Recall: To compute the expectation $E(h(Y))$ we use the approximation

$$E(h(Y)) \approx \frac{1}{n} \sum_{t=1}^{n} h(Y^{(t)}) \qquad \text{with } Y^{(1)}, \ldots, Y^{(n)} \sim f(y).$$

Thus our aim is to sample $Y^{(1)}, \ldots, Y^{(n)}$ from $f(y)$.

Problem: Independent sampling from $f(y)$ may be difficult.

Markov chain Monte Carlo (MCMC) approach:

• Generate a Markov chain $\{Y^{(t)}\}$ with stationary distribution $f(y)$.
• Early iterations $Y^{(1)}, \ldots, Y^{(m)}$ reflect the starting value $Y^{(0)}$; these iterations are called the burn-in.
• After the burn-in, we say the chain has "converged".
• Omit the burn-in from averages:

$$\frac{1}{n - m} \sum_{t=m+1}^{n} h(Y^{(t)})$$

[Figure: trace plot of $Y^{(t)}$ over 1000 iterations, with the burn-in phase marked off from the stationary phase.]

How do we construct a Markov chain $\{Y^{(t)}\}$ which has stationary distribution $f(y)$?

• Gibbs sampler
• Metropolis–Hastings algorithm (Metropolis et al. 1953; Hastings 1970)

Gibbs Sampler

Let $Y = (Y_1, \ldots, Y_d)$ be $d$-dimensional with $d \geq 2$ and distribution $f(y)$. The full conditional distribution of $Y_i$ is given by

$$f(y_i \mid y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_d) = \frac{f(y_1, \ldots, y_{i-1}, y_i, y_{i+1}, \ldots, y_d)}{\int f(y_1, \ldots, y_{i-1}, y_i, y_{i+1}, \ldots, y_d) \, dy_i}$$

Gibbs sampling: sample, or update, the components in turn, always using the most recent values:

$$Y_1^{(t+1)} \sim f(y_1 \mid Y_2^{(t)}, Y_3^{(t)}, \ldots, Y_d^{(t)})$$
$$Y_2^{(t+1)} \sim f(y_2 \mid Y_1^{(t+1)}, Y_3^{(t)}, \ldots, Y_d^{(t)})$$
$$Y_3^{(t+1)} \sim f(y_3 \mid Y_1^{(t+1)}, Y_2^{(t+1)}, Y_4^{(t)}, \ldots, Y_d^{(t)})$$
$$\vdots$$
$$Y_d^{(t+1)} \sim f(y_d \mid Y_1^{(t+1)}, Y_2^{(t+1)}, \ldots, Y_{d-1}^{(t+1)})$$

In two dimensions, the sample path of the Gibbs sampler moves parallel to the coordinate axes, one component at a time.

[Figure: sample path of the Gibbs sampler in two dimensions over iterations $t = 1, \ldots, 7$, plotted in the $(Y_1^{(t)}, Y_2^{(t)})$ plane.]
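To make the update scheme concrete, here is a minimal sketch in R (the language of the implementation later in these notes) of a systematic-scan Gibbs sampler for a generic $d$-dimensional target. It is an illustration added here, not code from the original notes: the names `gibbs_sweep` and `run_gibbs` are invented, and `full_cond` is assumed to be a user-supplied list of functions, the $i$-th of which draws $Y_i$ from its full conditional given the current state.

```r
# Sketch (added for illustration): one systematic-scan Gibbs sweep.
# `full_cond` is assumed to be a list of d functions; the i-th function
# draws Y_i from its full conditional given the current state vector y.
gibbs_sweep <- function(y, full_cond) {
  d <- length(y)
  for (i in 1:d) {
    y[i] <- full_cond[[i]](y)   # always condition on the most recent values
  }
  y
}

# Run N sweeps from a starting value y0 and store the chain row by row.
run_gibbs <- function(y0, full_cond, N) {
  d <- length(y0)
  out <- matrix(NA_real_, nrow = N, ncol = d)
  y <- y0
  for (t in 1:N) {
    y <- gibbs_sweep(y, full_cond)
    out[t, ] <- y
  }
  out
}
```

Any model-specific Gibbs sampler, such as the normal example below, is an instance of this pattern with the model's particular full conditionals plugged in.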
Gibbs Sampler

Detailed balance for the Gibbs sampler: For simplicity, let $Y = (Y_1, Y_2)^T$. Then the update $Y^{(t+1)}$ at time $t+1$ is obtained from the previous $Y^{(t)}$ in two steps:

$$Y_1^{(t+1)} \sim p(y_1 \mid Y_2^{(t)})$$
$$Y_2^{(t+1)} \sim p(y_2 \mid Y_1^{(t+1)})$$

Accordingly, the transition matrix $P(y, y') = \Pr(Y^{(t+1)} = y' \mid Y^{(t)} = y)$ can be factorized into two separate transition matrices,

$$P(y, y') = P_1(y, \tilde{y}) \, P_2(\tilde{y}, y'),$$

where $\tilde{y} = (y_1', y_2)^T$ is the intermediate result after the first step. Obviously we have

$$P_1(y, \tilde{y}) = p(y_1' \mid y_2) \quad \text{and} \quad P_2(\tilde{y}, y') = p(y_2' \mid y_1').$$

Note that for any $y, y'$ we have $P_1(y, y') = 0$ if $y_2 \neq y_2'$, and $P_2(y, y') = 0$ if $y_1 \neq y_1'$. According to detailed balance for time-dependent Markov chains, it suffices to show detailed balance for each of the transition matrices separately. For any states $y, y'$ such that $y_2 = y_2'$,

$$p(y) \, P_1(y, y') = p(y_1, y_2) \, p(y_1' \mid y_2) = p(y_1 \mid y_2) \, p(y_2) \, p(y_1' \mid y_2) = p(y_1 \mid y_2) \, p(y_1', y_2) = P_1(y', y) \, p(y'),$$

while for $y, y'$ with $y_2 \neq y_2'$ the equation is trivially fulfilled. Similarly we obtain, for $y, y'$ such that $y_1 = y_1'$,

$$p(y) \, P_2(y, y') = p(y_1, y_2) \, p(y_2' \mid y_1) = p(y_2 \mid y_1) \, p(y_1) \, p(y_2' \mid y_1) = p(y_2 \mid y_1) \, p(y_1, y_2') = P_2(y', y) \, p(y'),$$

while for $y, y'$ with $y_1 \neq y_1'$ the equation trivially holds. Altogether this shows that $p(y)$ is indeed the stationary distribution of the Gibbs sampler. Note that combined we get

$$p(y) \, P(y, y') = p(y) \, P_1(y, \tilde{y}) \, P_2(\tilde{y}, y') = p(y') \, P_2(y', \tilde{y}) \, P_1(\tilde{y}, y) \neq p(y') \, P(y', y).$$

Explanation: Markov chains $\{Y_t\}$ which satisfy the detailed balance equation are called time-reversible, since it can be shown that

$$\Pr(Y_{t+1} = y' \mid Y_t = y) = \Pr(Y_t = y' \mid Y_{t+1} = y).$$

For the above Gibbs sampler, to go back in time we have to update the two components in reverse order: first $Y_2^{(t+1)}$ and then $Y_1^{(t+1)}$.

Example: Bayes inference for a univariate normal sample

Consider normally distributed observations $Y = (Y_1, \ldots, Y_n)^T$ with

$$Y_i \overset{iid}{\sim} N(\mu, \sigma^2).$$

Likelihood function:

$$f(Y \mid \mu, \sigma^2) \propto \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2\right)$$

Prior distribution (noninformative prior):

$$\pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}$$

Posterior distribution:

$$\pi(\mu, \sigma^2 \mid Y) \propto \left(\frac{1}{\sigma^2}\right)^{n/2 + 1} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2\right)$$

Define $\tau = 1/\sigma^2$. Then we can show that

$$\pi(\mu \mid \sigma^2, Y) = N\left(\bar{Y}, \sigma^2/n\right)$$
$$\pi(\tau \mid \mu, Y) = \Gamma\left(\frac{n}{2}, \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu)^2\right)$$

Gibbs sampler:

$$\mu^{(t+1)} \sim N\left(\bar{Y}, (n \cdot \tau^{(t)})^{-1}\right)$$
$$\tau^{(t+1)} \sim \Gamma\left(\frac{n}{2}, \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu^{(t+1)})^2\right)$$

with $\sigma^{2\,(t+1)} = 1/\tau^{(t+1)}$.

Implementation in R:

```r
n<-20                                     # Data
Y<-rnorm(n,2,2)
MC<-2; N<-1000                            # Run MC=2 chains of length N=1000
p<-rep(0,2*MC*N)                          # Allocate memory for results
dim(p)<-c(2,MC,N)
for (j in (1:MC)) {                       # Loop over chains
  p2<-rgamma(1,n/2,1/2)                   # Starting value for tau
  for (i in (1:N)) {                      # Gibbs iterations
    p1<-rnorm(1,mean(Y),sqrt(1/(p2*n)))   # Update mu
    p2<-rgamma(1,n/2,sum((Y-p1)^2)/2)     # Update tau
    p[1,j,i]<-p1                          # Save results
    p[2,j,i]<-p2
  }
}
```

Example: Bivariate normal distribution

Let $Y = (Y_1, Y_2)^T$ be normally distributed with mean $\mu = (0, 0)^T$ and covariance matrix

$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$

The conditional distributions are

$$Y_1 \mid Y_2 \sim N(\rho Y_2, 1 - \rho^2)$$
$$Y_2 \mid Y_1 \sim N(\rho Y_1, 1 - \rho^2)$$

Thus the steps of the Gibbs sampler are

$$Y_1^{(t+1)} \sim N(\rho Y_2^{(t)}, 1 - \rho^2)$$
$$Y_2^{(t+1)} \sim N(\rho Y_1^{(t+1)}, 1 - \rho^2)$$

Note: We can obtain an independent sample $Y^{(t)} = (Y_1^{(t)}, Y_2^{(t)})^T$ by

$$Y_1^{(t)} \sim N(0, 1)$$
$$Y_2^{(t)} \sim N(\rho Y_1^{(t)}, 1 - \rho^2)$$

Results: Bayes inference for a univariate normal sample

[Figure: two runs of the Gibbs sampler (N = 500): trace plots of $\mu^{(t)}$ and $\tau^{(t)}$, and autocorrelation functions of $\mu$ and $\tau$ up to lag 50.]

[Figure: marginal and joint posterior distributions based on 1000 draws: histograms of $\mu^{(t)}$ and $\tau^{(t)}$ and a scatter plot of $(\mu^{(t)}, \tau^{(t)})$.]

Markov Chain Monte Carlo

Comparison of MCMC and independent draws

[Figure: histograms of $Y_1^{(t)}$ from the Gibbs sampler after burn-in (iterations 100 to 400, 700, 1000, and 10000) next to histograms from independent sampling with matching sample sizes (n = 300, 600, 900, 9900); the MCMC histograms approach the target density more slowly.]

Convergence diagnostics

• Plot the chain for each quantity of interest.
• Plot the autocorrelation function (ACF)

$$\rho_i(h) = \mathrm{corr}\left(Y_i^{(t)}, Y_i^{(t+h)}\right),$$

which measures the correlation of values $h$ lags apart.
  ◦ Slow decay of the ACF indicates slow convergence and bad mixing.
  ◦ It can be used to find an independent subsample.
• Run multiple, independent chains (e.g. 3–10).
  ◦ Several long runs (Gelman and Rubin 1992) give an indication of convergence and a sense of statistical security.
  ◦ One very long run (Geyer 1992) reaches parts other schemes cannot reach.
• Widely dispersed starting values are particularly helpful to detect slow convergence.
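These diagnostics can be produced directly from the output array `p` of the R implementation above. The following sketch is an added illustration (it assumes that code has been run); the burn-in of 100 iterations and the hand-rolled potential scale reduction factor are illustrative choices, and in practice one would typically use a dedicated package such as coda.

```r
# Sketch (added for illustration): diagnostics for mu, i.e. component 1
# of the array p from the implementation above. Assumes p, N, MC exist.
burnin <- 100

# Trace plot and ACF for mu from chain 1:
plot(p[1, 1, ], type = "l", xlab = "Iteration", ylab = "mu")
acf(p[1, 1, (burnin + 1):N], lag.max = 50)

# A simple Gelman-Rubin potential scale reduction factor for mu,
# computed by hand from the MC = 2 chains (discarding the burn-in):
chains <- p[1, , (burnin + 1):N]        # MC x (N - burnin) matrix
len  <- ncol(chains)
W    <- mean(apply(chains, 1, var))     # within-chain variance
B    <- len * var(rowMeans(chains))     # between-chain variance
Rhat <- sqrt(((len - 1) / len * W + B / len) / W)
Rhat                                    # values near 1 suggest convergence
```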
If you are not satisfied, try some other diagnostics (see the literature).

Markov Chain Monte Carlo

Note: Even after the chain has reached convergence, the sample might not yet be good enough for estimating $E(h(Y))$, because successive draws are correlated and therefore carry less information than independent ones.
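One way to make this note concrete (a sketch added here, not the notes' prescribed method) is to estimate the effective sample size from the ACF: with autocorrelations $\rho(h)$, roughly $n_{\mathrm{eff}} \approx n / (1 + 2 \sum_h \rho(h))$, so a slowly decaying ACF means far fewer effectively independent draws than iterations.

```r
# Sketch (added for illustration): effective sample size of a chain x via
# the ACF, using the approximation n_eff = n / (1 + 2 * sum of autocorrelations).
# Truncating at the first non-positive autocorrelation is a common heuristic.
eff_size <- function(x, lag.max = 100) {
  n   <- length(x)
  rho <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0
  k   <- which(rho <= 0)[1]             # truncate at first non-positive lag
  if (!is.na(k)) rho <- rho[seq_len(k - 1)]
  n / (1 + 2 * sum(rho))
}

# Example with an AR(1) chain, whose successive draws are strongly correlated:
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.9), n = 10000))
eff_size(x)    # much smaller than 10000
```

The Monte Carlo standard error of the average of $h(Y^{(t)})$ then scales like $1/\sqrt{n_{\mathrm{eff}}}$ rather than $1/\sqrt{n}$.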