The Markov Chain Monte Carlo Approach to Importance Sampling
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Monte Carlo Simulation in Statistics with a View to Recent Ideas
(sfi)2 Statistics for Innovation Monte Carlo simulation in statistics, with a view to recent ideas Arnoldo Frigessi [email protected] THE JOURNAL OF CHEMICAL PHYSICS VOLUME 21, NUMBER 6 JUNE, 1953 Equation of State Calculations by Fast Computing Machines NICHOLAS METROPOLIS, ARIANNA W. ROSENBLUTH, MARSHALL N. ROSENBLUTH, AND AUGUSTA H. TELLER, Los Alamos Scientific Laboratory, Los Alamos, New Mexico AND EDWARD TELLER, * Department of Physics, University of Chicago, Chicago, Illinois (Received March 6, 1953) A general method, suitable for fast computing machines, for investigating such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the two-dimensional rigid-sphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a four-term virial coefficient expansion. The Metropolis algorithm from1953 has been cited as among the top 10 algorithms having the "greatest influence on the development and practice of science and engineering in the 20th century." MANIAC The MANIAC II 1952 Multiplication took a milli- second. (Metropolis choose the name MANIAC in the hope of stopping the rash of ”silly” acronyms for machine names.) Some MCMC milestones 1953 Metropolis Heat bath, dynamic Monte Carlo 1970 Hastings Political Analysis, 10:3 (2002) Society for Political Methodology 1984 Geman & Geman Gibbs -
Markov Chain Monte Carlo Convergence Diagnostics: a Comparative Review
Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review By Mary Kathryn Cowles and Bradley P. Carlin1 Abstract A critical issue for users of Markov Chain Monte Carlo (MCMC) methods in applications is how to determine when it is safe to stop sampling and use the samples to estimate characteristics of the distribu- tion of interest. Research into methods of computing theoretical convergence bounds holds promise for the future but currently has yielded relatively little that is of practical use in applied work. Consequently, most MCMC users address the convergence problem by applying diagnostic tools to the output produced by running their samplers. After giving a brief overview of the area, we provide an expository review of thirteen convergence diagnostics, describing the theoretical basis and practical implementation of each. We then compare their performance in two simple models and conclude that all the methods can fail to detect the sorts of convergence failure they were designed to identify. We thus recommend a combination of strategies aimed at evaluating and accelerating MCMC sampler convergence, including applying diagnostic procedures to a small number of parallel chains, monitoring autocorrelations and cross- correlations, and modifying parameterizations or sampling algorithms appropriately. We emphasize, however, that it is not possible to say with certainty that a finite sample from an MCMC algorithm is rep- resentative of an underlying stationary distribution. 1 Mary Kathryn Cowles is Assistant Professor of Biostatistics, Harvard School of Public Health, Boston, MA 02115. Bradley P. Carlin is Associate Professor, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455. -
Meta-Learning MCMC Proposals
Meta-Learning MCMC Proposals Tongzhou Wang∗ Yi Wu Facebook AI Research University of California, Berkeley [email protected] [email protected] David A. Moorey Stuart J. Russell Google University of California, Berkeley [email protected] [email protected] Abstract Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. Inspired by recent progresses in meta-learning for training learning agents that can generalize to unseen environ- ments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network to provide fast approximations to block Gibbs conditionals. The learned neural proposals generalize to occurrences of common structural motifs across different models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler yields higher final F1 scores than classical single-site Gibbs sampling. 1 Introduction Model-based probabilistic inference is a highly successful paradigm for machine learning, with applications to tasks as diverse as movie recommendation [31], visual scene perception [17], music transcription [3], etc. People learn and plan using mental models, and indeed the entire enterprise of modern science can be viewed as constructing a sophisticated hierarchy of models of physical, mental, and social phenomena. Probabilistic programming provides a formal representation of models as sample-generating programs, promising the ability to explore a even richer range of models. -
A Selected Bibliography of Publications By, and About, J
A Selected Bibliography of Publications by, and about, J. Robert Oppenheimer Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 581 4148 E-mail: [email protected], [email protected], [email protected] (Internet) WWW URL: http://www.math.utah.edu/~beebe/ 17 March 2021 Version 1.47 Title word cross-reference $1 [Duf46]. $12.95 [Edg91]. $13.50 [Tho03]. $14.00 [Hug07]. $15.95 [Hen81]. $16.00 [RS06]. $16.95 [RS06]. $17.50 [Hen81]. $2.50 [Opp28g]. $20.00 [Hen81, Jor80]. $24.95 [Fra01]. $25.00 [Ger06]. $26.95 [Wol05]. $27.95 [Ger06]. $29.95 [Goo09]. $30.00 [Kev03, Kle07]. $32.50 [Edg91]. $35 [Wol05]. $35.00 [Bed06]. $37.50 [Hug09, Pol07, Dys13]. $39.50 [Edg91]. $39.95 [Bad95]. $8.95 [Edg91]. α [Opp27a, Rut27]. γ [LO34]. -particles [Opp27a]. -rays [Rut27]. -Teilchen [Opp27a]. 0-226-79845-3 [Guy07, Hug09]. 0-8014-8661-0 [Tho03]. 0-8047-1713-3 [Edg91]. 0-8047-1714-1 [Edg91]. 0-8047-1721-4 [Edg91]. 0-8047-1722-2 [Edg91]. 0-9672617-3-2 [Bro06, Hug07]. 1 [Opp57f]. 109 [Con05, Mur05, Nas07, Sap05a, Wol05, Kru07]. 112 [FW07]. 1 2 14.99/$25.00 [Ber04a]. 16 [GHK+96]. 1890-1960 [McG02]. 1911 [Meh75]. 1945 [GHK+96, Gow81, Haw61, Bad95, Gol95a, Hew66, She82, HBP94]. 1945-47 [Hew66]. 1950 [Ano50]. 1954 [Ano01b, GM54, SZC54]. 1960s [Sch08a]. 1963 [Kuh63]. 1967 [Bet67a, Bet97, Pun67, RB67]. 1976 [Sag79a, Sag79b]. 1981 [Ano81]. 20 [Goe88]. 2005 [Dre07]. 20th [Opp65a, Anoxx, Kai02]. -
Markov Chain Monte Carlo Edps 590BAY
Markov Chain Monte Carlo Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2019 Overivew Bayesian computing Markov Chains Metropolis Example Practice Assessment Metropolis 2 Practice Overview ◮ Introduction to Bayesian computing. ◮ Markov Chain ◮ Metropolis algorithm for mean of normal given fixed variance. ◮ Revisit anorexia data. ◮ Practice problem with SAT data. ◮ Some tools for assessing convergence ◮ Metropolis algorithm for mean and variance of normal. ◮ Anorexia data. ◮ Practice problem. ◮ Summary Depending on the book that you select for this course, read either Gelman et al. pp 2751-291 or Kruschke Chapters pp 143–218. I am relying more on Gelman et al. C.J. Anderson (Illinois) MCMC Fall 2019 2.2/ 45 Overivew Bayesian computing Markov Chains Metropolis Example Practice Assessment Metropolis 2 Practice Introduction to Bayesian Computing ◮ Our major goal is to approximate the posterior distributions of unknown parameters and use them to estimate parameters. ◮ The analytic computations are fine for simple problems, ◮ Beta-binomial for bounded counts ◮ Normal-normal for continuous variables ◮ Gamma-Poisson for (unbounded) counts ◮ Dirichelt-Multinomial for multicategory variables (i.e., a categorical variable) ◮ Models in the exponential family with small number of parameters ◮ For large number of parameters and more complex models ◮ Algebra of analytic solution becomes overwhelming ◮ Grid takes too much time. ◮ Too difficult for most applications. C.J. Anderson (Illinois) MCMC Fall 2019 3.3/ 45 Overivew Bayesian computing Markov Chains Metropolis Example Practice Assessment Metropolis 2 Practice Steps in Modeling Recall that the steps in an analysis: 1. Choose model for data (i.e., p(y θ)) and model for parameters | (i.e., p(θ) and p(θ y)). -
The Convergence of Markov Chain Monte Carlo Methods 3
The Convergence of Markov chain Monte Carlo Methods: From the Metropolis method to Hamiltonian Monte Carlo Michael Betancourt From its inception in the 1950s to the modern frontiers of applied statistics, Markov chain Monte Carlo has been one of the most ubiquitous and successful methods in statistical computing. The development of the method in that time has been fueled by not only increasingly difficult problems but also novel techniques adopted from physics. In this article I will review the history of Markov chain Monte Carlo from its inception with the Metropolis method to the contemporary state-of-the-art in Hamiltonian Monte Carlo. Along the way I will focus on the evolving interplay between the statistical and physical perspectives of the method. This particular conceptual emphasis, not to mention the brevity of the article, requires a necessarily incomplete treatment. A complementary, and entertaining, discussion of the method from the statistical perspective is given in Robert and Casella (2011). Similarly, a more thorough but still very readable review of the mathematics behind Markov chain Monte Carlo and its implementations is given in the excellent survey by Neal (1993). I will begin with a discussion of the mathematical relationship between physical and statistical computation before reviewing the historical introduction of Markov chain Monte Carlo and its first implementations. Then I will continue to the subsequent evolution of the method with increasing more sophisticated implementations, ultimately leading to the advent of Hamiltonian Monte Carlo. 1. FROM PHYSICS TO STATISTICS AND BACK AGAIN At the dawn of the twentieth-century, physics became increasingly focused on under- arXiv:1706.01520v2 [stat.ME] 10 Jan 2018 standing the equilibrium behavior of thermodynamic systems, especially ensembles of par- ticles. -
Bayesian Phylogenetics and Markov Chain Monte Carlo Will Freyman
IB200, Spring 2016 University of California, Berkeley Lecture 12: Bayesian phylogenetics and Markov chain Monte Carlo Will Freyman 1 Basic Probability Theory Probability is a quantitative measurement of the likelihood of an outcome of some random process. The probability of an event, like flipping a coin and getting heads is notated P (heads). The probability of getting tails is then 1 − P (heads) = P (tails). • The joint probability of event A and event B both occuring is written as P (A; B). The joint probability of mutually exclusive events, like flipping a coin once and getting heads and tails, is 0. • The probability of A occuring given that B has already occurred is written P (AjB), which is read \the probability of A given B". This is a conditional probability. • The marginal probability of A is P (A), which is calculated by summing or integrating the joint probability over B. In other words, P (A) = P (A; B) + P (A; not B). • Joint, conditional, and marginal probabilities can be combined with the expression P (A; B) = P (A)P (BjA) = P (B)P (AjB). • The above expression can be rearranged into Bayes' theorem: P (BjA)P (A) P (AjB) = P (B) Bayes' theorem is a straightforward and uncontroversial way to calculate an inverse conditional probability. • If P (AjB) = P (A) and P (BjA) = P (B) then A and B are independent. Then the joint probability is calculated P (A; B) = P (A)P (B). 2 Interpretations of Probability What exactly is a probability? Does it really exist? There are two major interpretations of probability: • Frequentists believe that the probability of an event is its relative frequency over time. -
Monte Carlo Sampling Methods
[1] Monte Carlo Sampling Methods Jasmina L. Vujic Nuclear Engineering Department University of California, Berkeley Email: [email protected] phone: (510) 643-8085 fax: (510) 643-9685 UCBNE, J. Vujic [2] Monte Carlo Monte Carlo is a computational technique based on constructing a random process for a problem and carrying out a NUMERICAL EXPERIMENT by N-fold sampling from a random sequence of numbers with a PRESCRIBED probability distribution. x - random variable N 1 xˆ = ---- x N∑ i i = 1 Xˆ - the estimated or sample mean of x x - the expectation or true mean value of x If a problem can be given a PROBABILISTIC interpretation, then it can be modeled using RANDOM NUMBERS. UCBNE, J. Vujic [3] Monte Carlo • Monte Carlo techniques came from the complicated diffusion problems that were encountered in the early work on atomic energy. • 1772 Compte de Bufon - earliest documented use of random sampling to solve a mathematical problem. • 1786 Laplace suggested that π could be evaluated by random sampling. • Lord Kelvin used random sampling to aid in evaluating time integrals associated with the kinetic theory of gases. • Enrico Fermi was among the first to apply random sampling methods to study neutron moderation in Rome. • 1947 Fermi, John von Neuman, Stan Frankel, Nicholas Metropolis, Stan Ulam and others developed computer-oriented Monte Carlo methods at Los Alamos to trace neutrons through fissionable materials UCBNE, J. Vujic Monte Carlo [4] Monte Carlo methods can be used to solve: a) The problems that are stochastic (probabilistic) by nature: - particle transport, - telephone and other communication systems, - population studies based on the statistics of survival and reproduction. -
Markov Chain Monte Carlo (Mcmc) Methods
MARKOV CHAIN MONTE CARLO (MCMC) METHODS 0These notes utilize a few sources: some insights are taken from Profs. Vardeman’s and Carriquiry’s lecture notes, some from a great book on Monte Carlo strategies in scientific computing by J.S. Liu. EE 527, Detection and Estimation Theory, # 4c 1 Markov Chains: Basic Theory Definition 1. A (discrete time/discrete state space) Markov chain (MC) is a sequence of random quantities {Xk}, each taking values in a (finite or) countable set X , with the property that P {Xn = xn | X1 = x1,...,Xn−1 = xn−1} = P {Xn = xn | Xn−1 = xn−1}. Definition 2. A Markov Chain is stationary if 0 P {Xn = x | Xn−1 = x } is independent of n. Without loss of generality, we will henceforth name the elements of X with the integers 1, 2, 3,... and call them “states.” Definition 3. With 4 pij = P {Xn = j | Xn−1 = i} the square matrix 4 P = {pij} EE 527, Detection and Estimation Theory, # 4c 2 is called the transition matrix for a stationary Markov Chain and the pij are called transition probabilities. Note that a transition matrix has nonnegative entries and its rows sum to 1. Such matrices are called stochastic matrices. More notation for a stationary MC: Define (k) 4 pij = P {Xn+k = j | Xn = i} and (k) 4 fij = P {Xn+k = j, Xn+k−1 6= j, . , Xn+1 6= j, | Xn = i}. These are respectively the probabilities of moving from i to j in k steps and first moving from i to j in k steps. -
An Introduction to Markov Chain Monte Carlo Methods and Their Actuarial Applications
AN INTRODUCTION TO MARKOV CHAIN MONTE CARLO METHODS AND THEIR ACTUARIAL APPLICATIONS DAVID P. M. SCOLLNIK Department of Mathematics and Statistics University of Calgary Abstract This paper introduces the readers of the Proceed- ings to an important class of computer based simula- tion techniques known as Markov chain Monte Carlo (MCMC) methods. General properties characterizing these methods will be discussed, but the main empha- sis will be placed on one MCMC method known as the Gibbs sampler. The Gibbs sampler permits one to simu- late realizations from complicated stochastic models in high dimensions by making use of the model’s associated full conditional distributions, which will generally have a much simpler and more manageable form. In its most extreme version, the Gibbs sampler reduces the analy- sis of a complicated multivariate stochastic model to the consideration of that model’s associated univariate full conditional distributions. In this paper, the Gibbs sampler will be illustrated with four examples. The first three of these examples serve as rather elementary yet instructive applications of the Gibbs sampler. The fourth example describes a reasonably sophisticated application of the Gibbs sam- pler in the important arena of credibility for classifica- tion ratemaking via hierarchical models, and involves the Bayesian prediction of frequency counts in workers compensation insurance. 114 AN INTRODUCTION TO MARKOV CHAIN MONTE CARLO METHODS 115 1. INTRODUCTION The purpose of this paper is to acquaint the readership of the Proceedings with a class of simulation techniques known as Markov chain Monte Carlo (MCMC) methods. These methods permit a practitioner to simulate a dependent sequence of ran- dom draws from very complicated stochastic models. -
Oral History Interview with Nicolas C. Metropolis
An Interview with NICOLAS C. METROPOLIS OH 135 Conducted by William Aspray on 29 May 1987 Los Alamos, NM Charles Babbage Institute The Center for the History of Information Processing University of Minnesota, Minneapolis Copyright, Charles Babbage Institute 1 Nicholas C. Metropolis Interview 29 May 1987 Abstract Metropolis, the first director of computing services at Los Alamos National Laboratory, discusses John von Neumann's work in computing. Most of the interview concerns activity at Los Alamos: how von Neumann came to consult at the laboratory; his scientific contacts there, including Metropolis, Robert Richtmyer, and Edward Teller; von Neumann's first hands-on experience with punched card equipment; his contributions to shock-fitting and the implosion problem; interactions between, and comparisons of von Neumann and Enrico Fermi; and the development of Monte Carlo techniques. Other topics include: the relationship between Turing and von Neumann; work on numerical methods for non-linear problems; and the ENIAC calculations done for Los Alamos. 2 NICHOLAS C. METROPOLIS INTERVIEW DATE: 29 May 1987 INTERVIEWER: William Aspray LOCATION: Los Alamos, NM ASPRAY: This is an interview on the 29th of May, 1987 in Los Alamos, New Mexico with Dr. Nicholas Metropolis. Why don't we begin by having you tell me something about when you first met von Neumann and what your relationship with him was over a period of time? METROPOLIS: Well, he first came to Los Alamos on the invitation of Dr. Oppenheimer. I don't remember quite when the date was that he came, but it was rather early on. It may have been the fall of 1943. -
Doing Mathematics on the ENIAC. Von Neumann's and Lehmer's
Doing Mathematics on the ENIAC. Von Neumann's and Lehmer's different visions.∗ Liesbeth De Mol Center for Logic and Philosophy of Science University of Ghent, Blandijnberg 2, 9000 Gent, Belgium [email protected] Abstract In this paper we will study the impact of the computer on math- ematics and its practice from a historical point of view. We will look at what kind of mathematical problems were implemented on early electronic computing machines and how these implementations were perceived. By doing so, we want to stress that the computer was in fact, from its very beginning, conceived as a mathematical instru- ment per se, thus situating the contemporary usage of the computer in mathematics in its proper historical background. We will focus on the work by two computer pioneers: Derrick H. Lehmer and John von Neumann. They were both involved with the ENIAC and had strong opinions about how these new machines might influence (theoretical and applied) mathematics. 1 Introduction The impact of the computer on society can hardly be underestimated: it affects almost every aspect of our lives. This influence is not restricted to ev- eryday activities like booking a hotel or corresponding with friends. Science has been changed dramatically by the computer { both in its form and in ∗The author is currently a postdoctoral research fellow of the Fund for Scientific Re- search { Flanders (FWO) and a fellow of the Kunsthochschule f¨urMedien, K¨oln. 1 its content. Also mathematics did not escape this influence of the computer. In fact, the first computer applications were mathematical in nature, i.e., the first electronic general-purpose computing machines were used to solve or study certain mathematical (applied as well as theoretical) problems.