Bayesian Programming

Chapman & Hall/CRC Machine Learning & Pattern Recognition Series

SERIES EDITORS
Ralf Herbrich, Amazon Development Center, Berlin, Germany
Thore Graepel, Microsoft Research Ltd., Cambridge, UK

AIMS AND SCOPE
This series reflects the latest advances and applications in machine learning and pattern recognition through the publication of a broad range of reference works, textbooks, and handbooks. The inclusion of concrete examples, applications, and methods is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of machine learning, pattern recognition, computational intelligence, robotics, computational/statistical learning theory, natural language processing, computer vision, game AI, game theory, neural networks, computational neuroscience, and other relevant topics, such as machine learning applied to bioinformatics or cognitive science, which might be proposed by potential contributors.

PUBLISHED TITLES
MACHINE LEARNING: An Algorithmic Perspective, Stephen Marsland
HANDBOOK OF NATURAL LANGUAGE PROCESSING, Second Edition, Nitin Indurkhya and Fred J. Damerau
UTILITY-BASED LEARNING FROM DATA, Craig Friedman and Sven Sandow
A FIRST COURSE IN MACHINE LEARNING, Simon Rogers and Mark Girolami
COST-SENSITIVE MACHINE LEARNING, Balaji Krishnapuram, Shipeng Yu, and Bharat Rao
ENSEMBLE METHODS: FOUNDATIONS AND ALGORITHMS, Zhi-Hua Zhou
MULTI-LABEL DIMENSIONALITY REDUCTION, Liang Sun, Shuiwang Ji, and Jieping Ye
BAYESIAN PROGRAMMING, Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin, and Kamel Mekhnacha

Bayesian Programming
Pierre Bessière, CNRS, Paris, France
Emmanuel Mazer, CNRS, Grenoble, France
Juan-Manuel Ahuactzin, PROBAYES, Puebla, Mexico
Kamel Mekhnacha, PROBAYES, Grenoble, France

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20131023
International Standard Book Number-13: 978-1-4398-8033-3 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To the late Edwin Thompson Jaynes for his doubts about certitudes and for his certitudes about probabilities

Contents

Foreword
Preface

1 Introduction
  1.1 Probability an alternative to logic
  1.2 A need for a new computing paradigm
  1.3 A need for a new modeling methodology
  1.4 A need for new inference algorithms
  1.5 A need for a new programming language and new hardware
  1.6 A place for numerous controversies
  1.7 Running real programs as exercises

I Bayesian Programming Principles

2 Basic Concepts
  2.1 Variable
  2.2 Probability
  2.3 The normalization postulate
  2.4 Conditional probability
  2.5 Variable conjunction
  2.6 The conjunction postulate (Bayes theorem)
  2.7 Syllogisms
  2.8 The marginalization rule
  2.9 Joint distribution and questions
  2.10 Decomposition
  2.11 Parametric forms
  2.12 Identification
  2.13 Specification = Variables + Decomposition + Parametric forms
  2.14 Description = Specification + Identification
  2.15 Question
  2.16 Bayesian program = Description + Question
  2.17 Results

3 Incompleteness and Uncertainty
  3.1 Observing a water treatment unit
    3.1.1 The elementary water treatment unit
    3.1.2 Experimentation and uncertainty
  3.2 Lessons, comments, and notes
    3.2.1 The effect of incompleteness
    3.2.2 The effect of inaccuracy
    3.2.3 Not taking into account the effect of ignored variables may lead to wrong decisions
    3.2.4 From incompleteness to uncertainty

4 Description = Specification + Identification
  4.1 Pushing objects and following contours
    4.1.1 The Khepera robot
    4.1.2 Pushing objects
    4.1.3 Following contours
  4.2 Description of a water treatment unit
    4.2.1 Specification
    4.2.2 Identification
    4.2.3 Bayesian program
    4.2.4 Results
  4.3 Lessons, comments, and notes
    4.3.1 Description = Specification + Identification
    4.3.2 Specification = Variables + Decomposition + Forms
    4.3.3 Learning is a means to transform incompleteness into uncertainty

5 The Importance of Conditional Independence
  5.1 Water treatment center Bayesian model
  5.2 Description of the water treatment center
    5.2.1 Specification
    5.2.2 Identification
    5.2.3 Bayesian program
  5.3 Lessons, comments, and notes
    5.3.1 Independence versus conditional independence
    5.3.2 The importance of conditional independence

6 Bayesian Program = Description + Question
  6.1 Water treatment center Bayesian model (end)
  6.2 Forward simulation of a single unit
    6.2.1 Question
    6.2.2 Results
  6.3 Forward simulation of the water treatment center
    6.3.1 Question
    6.3.2 Results
  6.4 Control of the water treatment center
    6.4.1 Question (1)
    6.4.2 Results (1)
    6.4.3 Question (2)
    6.4.4 Results (2)
  6.5 Diagnosis
    6.5.1 Question
    6.5.2 Results
  6.6 Lessons, comments, and notes
    6.6.1 Bayesian Program = Description + Question
    6.6.2 The essence of Bayesian inference
    6.6.3 No inverse or direct problem
    6.6.4 No ill-posed problem

II Bayesian Programming Cookbook

7 Information Fusion
  7.1 "Naive" Bayes sensor fusion
    7.1.1 Statement of the problem
    7.1.2 Bayesian program
    7.1.3 Instance and results
  7.2 Relaxing the conditional independence fundamental hypothesis
    7.2.1 Statement of the problem
    7.2.2 Bayesian program
    7.2.3 Instance and results
  7.3 Classification
    7.3.1 Statement of the problem
    7.3.2 Bayesian program
    7.3.3 Instance and results
  7.4 Ancillary clues
    7.4.1 Statement of the problem
    7.4.2 Bayesian program
    7.4.3 Instance and results
  7.5 Sensor fusion with false alarm
    7.5.1 Statement of the problem
    7.5.2 Bayesian program
    7.5.3 Instance and results
  7.6 Inverse programming
    7.6.1 Statement of the problem
    7.6.2 Bayesian program
    7.6.3 Instance and results

8 Bayesian Programming with Coherence Variables
  8.1 Basic example with Boolean variables
    8.1.1 Statement of the problem
    8.1.2 Bayesian program
    8.1.3 Instance and results
  8.2 Basic example with discrete variables
    8.2.1 Statement of the problem
    8.2.2 Bayesian program
    8.2.3 Instance and results
  8.3 Checking the semantic of Λ
    8.3.1 Statement of the problem
    8.3.2 Bayesian program
    8.3.3 Instance and results
  8.4 Information fusion revisited using coherence variables
    8.4.1 Statement of the problems
    8.4.2 Bayesian program
    8.4.3 Instance and results
  8.5 Reasoning with soft evidence
    8.5.1 Statement of the problem
    8.5.2 Bayesian program
    8.5.3 Instance and results
  8.6 Switch
    8.6.1 Statement of the problem
    8.6.2 Bayesian program

Recommended publications
  • The Semantics of Subroutines and Iteration in the Bayesian Programming Language ProBT
    The Semantics of Subroutines and Iteration in the Bayesian Programming Language ProBT. R. Laurent*, K. Mekhnacha*, E. Mazer† and P. Bessière‡. *ProbaYes S.A.S, Grenoble, France; †University of Grenoble-Alpes, CNRS/LIG, Grenoble, France; ‡University Pierre et Marie Curie, CNRS/ISIR, Paris, France.
    Abstract: Bayesian models are tools of choice when solving problems with incomplete information. Bayesian networks provide a first but limited approach to address such problems. For real world applications, additional semantics is needed to construct more complex models, especially those with repetitive structures or substructures. ProBT, a Bayesian programming language, provides a set of constructs for developing and applying complex models with substructures and repetitive structures. The goal of this paper is to present and discuss the semantics associated to these constructs.
    Index Terms: Probabilistic Programming semantics, Bayesian Programming, ProBT
    [Figure 1 diagram: Program = Description + Question; Description = Specification (π) + Identification (based on δ); Specification = Variables + Decomposition + Parametric Forms]
    Figure 1. A Bayesian program is constructed from a Description and a Question. The Description is given by the programmer who writes a Specification of a model π and an Identification of its parameter values, which can be set in the program or obtained through a learning process from a data set δ. The Specification is constructed from a set of relevant variables, a decomposition of the joint probability distribution over these variables and a set of parametric forms (mathematical models) for the terms of this decomposition.
    I. INTRODUCTION ProBT [1] was designed to translate the ideas of E.T. Jaynes [2] into an actual programming language.
  • A Modern History of Probability Theory
    A Modern History of Probability Theory. Kevin H. Knuth, Depts. of Physics and Informatics, University at Albany (SUNY), Albany NY USA. 4/29/2016, Knuth - Bayes Forum.
    A Long History: The History of Probability Theory, Anthony J.M. Garrett, MaxEnt 1997, pp. 223-238. Hájek, Alan, "Interpretations of Probability", The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/win2012/entries/probability-interpret/>.
    … la théorie des probabilités n'est, au fond, que le bon sens réduit au calcul … (… the theory of probabilities is basically just common sense reduced to calculation …) Pierre Simon de Laplace, Théorie Analytique des Probabilités
    Taken from Harold Jeffreys "Theory of Probability"
    The terms certain and probable describe the various degrees of rational belief about a proposition which different amounts of knowledge authorise us to entertain. All propositions are true or false, but the knowledge we have of them depends on our circumstances; and while it is often convenient to speak of propositions as certain or probable, this expresses strictly a relationship in which they stand to a corpus of knowledge, actual or hypothetical, and not a characteristic of the propositions in themselves. A proposition is capable at the same time of varying degrees of this relationship, depending upon the knowledge to which it is related, so that it is without significance to call a proposition probable unless we specify the knowledge to which we are relating it. John Maynard Keynes
  • Teaching Bayesian Behaviours to Video Game Characters
    Robotics and Autonomous Systems 47 (2004) 177–185. Teaching Bayesian behaviours to video game characters. Ronan Le Hy*, Anthony Arrigoni, Pierre Bessière, Olivier Lebeltel. GRAVIR/IMAG, INRIA Rhône-Alpes, ZIRST, 38330 Montbonnot, France.
    Abstract: This article explores an application of Bayesian programming to behaviours for synthetic video games characters. We address the problem of real-time reactive selection of elementary behaviours for an agent playing a first person shooter game. We show how Bayesian programming can lead to condensed and easier formalisation of finite state machine-like behaviour selection, and lend itself to learning by imitation, in a fully transparent way for the player. © 2004 Published by Elsevier B.V.
    Keywords: Bayesian programming; Video games characters; Finite state machine; Learning by imitation
    1. Introduction. Today's video games feature synthetic characters involved in complex interactions with human players. A synthetic character may have one of many different roles: tactical enemy, partner for the human, strategic opponent, simple unit amongst many, commenter, etc. In all of these cases, the game developer's ultimate objective is for the synthetic character to act like a [...] After listing our practical objectives, we will present our Bayesian model. We will show how we use it to specify by hand a behaviour, and how we use it to learn a behaviour. We will tackle learning by example using a high-level interface, and then the natural controls of the game. We will show that it is possible to map the player's actions onto bot states, and use this reconstruction to learn our model. Finally, we will come back to our objectives as a conclusion.
  • The Logical Basis of Bayesian Reasoning and Its Application on Judicial Judgment
    2018 International Workshop on Advances in Social Sciences (IWASS 2018) The Logical Basis of Bayesian Reasoning and Its Application on Judicial Judgment Juan Liu Zhengzhou University of Industry Technology, Xinzheng, Henan, 451100, China Keywords: Bayesian reasoning; Bayesian network; judicial referee Abstract: Bayesian inference is a law that corrects subjective judgments of related probabilities based on observed phenomena. The logical basis is that when the sample's capacity is close to the population, the probability of occurrence of events in the sample is close to the probability of occurrence of the population. The basic expression is: posterior probability = prior probability × standard similarity. Bayesian networks are applications of Bayesian inference, including directed acyclic graphs (DAGs) and conditional probability tables (CPTs) between nodes. Using the Bayesian programming tool to construct the Bayesian network, the ECHO model is used to analyze the node structure of the proposition in the first trial of von Blo, and the jury can be simulated by the insertion of the probability value in the judgment of the jury in the first instance, but find and set The difficulty of all conditional probabilities limits the effectiveness of its display of causal structures. 1. Introduction The British mathematician Thomas Bayes (about 1701-1761) used inductive reasoning for the basic theory of probability theory and created Bayesian statistical theory, namely Bayesian reasoning. After the continuous improvement of scholars in later generations, a scientific methodology system has gradually formed, "applied to many fields and developed many branches." [1] Bayesian reasoning needs to reason about the estimates and hypotheses to be made based on the sample information observed by the observer and the relevant experience of the inferencer.
  • A Brief Overview of Probability Theory in Data Science by Geert
    A brief overview of probability theory in data science. Geert Verdoolaege. (1) Department of Applied Physics, Ghent University, Ghent, Belgium; (2) Laboratory for Plasma Physics, Royal Military Academy (LPP–ERM/KMS), Brussels, Belgium. Tutorial, 3rd IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis, 27-05-2019.
    Overview: 1 Origins of probability; 2 Frequentist methods and statistics; 3 Principles of Bayesian probability theory; 4 Monte Carlo computational methods; 5 Applications (classification, regression analysis); 6 Conclusions and references.
    Early history of probability. Earliest traces in Western civilization: Jewish writings, Aristotle. Notion of probability in law, based on evidence. Usage in finance. Usage and demonstration in gambling.
    Middle Ages. World is knowable but uncertainty due to human ignorance. William of Ockham: Ockham's razor. Probabilis: a supposedly 'provable' opinion. Counting of authorities. Later: degree of truth, a scale. Quantification: law, faith → Bayesian notion; gaming → frequentist notion.
    Quantification. 17th century: Pascal, Fermat, Huygens. Comparative testing of hypotheses. Population statistics. 1713: Ars Conjectandi by Jacob Bernoulli: weak law of large numbers, principle of indifference. De Moivre (1718): The Doctrine of Chances.
    Bayes and Laplace. Paper by Thomas Bayes (1763): inversion
  • Debugging Probabilistic Programs
    Debugging Probabilistic Programs. Chandrakana Nandi and Dan Grossman (University of Washington, Seattle, USA), Adrian Sampson (Cornell University, Ithaca, USA), Todd Mytkowicz (Microsoft Research, Redmond, USA), Kathryn S. McKinley (Google, USA).
    Abstract: Many applications compute with estimated and uncertain data. While advances in probabilistic programming help developers build such applications, debugging them remains extremely challenging. New types of errors in probabilistic programs include 1) ignoring dependencies and correlation between random variables and in training data, 2) poorly chosen inference hyper-parameters, and 3) incorrect statistical models. A partial solution to prevent these errors in some languages forbids developers from explicitly invoking inference. While this prevents some dependence errors, it limits composition and control over inference, and does not guarantee absence of other types of errors.
    [...]ity of being true in a given execution. Even though these assertions fail when the program's results are unexpected, they do not give us any information about the cause of the failure. To help determine the cause of failure, we identify three types of common probabilistic programming defects.
    Modeling errors and insufficient evidence. Probabilistic programs may use incorrect statistical models, e.g., using Gaussian (0.0, 1.0) instead of Gaussian (1.0, 1.0), where Gaussian (µ, σ) represents a Gaussian distribution with mean µ and standard deviation σ. On the other hand, even if the statistical model is correct, their input data (e.g., training data) may be erroneous, insufficient or inappropriate for performing a given statistical task.
  • Maximum Entropy: the Universal Method for Inference
    MAXIMUM ENTROPY: THE UNIVERSAL METHOD FOR INFERENCE by Adom Giffin A Dissertation Submitted to the University at Albany, State University of New York in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy College of Arts & Sciences Department of Physics 2008 Abstract In this thesis we start by providing some detail regarding how we arrived at our present understanding of probabilities and how we manipulate them – the product and addition rules by Cox. We also discuss the modern view of entropy and how it relates to known entropies such as the thermodynamic entropy and the information entropy. Next, we show that Skilling's method of induction leads us to a unique general theory of inductive inference, the ME method and precisely how it is that other entropies such as those of Renyi or Tsallis are ruled out for problems of inference. We then explore the compatibility of Bayes and ME updating. After pointing out the distinction between Bayes' theorem and the Bayes' updating rule, we show that Bayes' rule is a special case of ME updating by translating information in the form of data into constraints that can be processed using ME. This implies that ME is capable of reproducing every aspect of orthodox Bayesian inference and proves the complete compatibility of Bayesian and entropy methods. We illustrated this by showing that ME can be used to derive two results traditionally in the domain of Bayesian statistics, Laplace's Succession rule and Jeffrey's conditioning rule. The realization that the ME method incorporates Bayes' rule as a special case allows us to go beyond Bayes' rule and to process both data and expected value constraints simultaneously.
  • Bayesian Cognition Probabilistic Models of Action, Perception, Inference, Decision and Learning
    Bayesian Cognition: Probabilistic models of action, perception, inference, decision and learning. Diard – LPNC/CNRS, Cognition Bayésienne – 2017-2018.
    Lecture 2: Bayesian Programming. Julien Diard, http://diard.wordpress.com, CNRS - Laboratoire de Psychologie et NeuroCognition, Grenoble. Pierre Bessière, CNRS - Institut des Systèmes Intelligents et de Robotique, Paris.
    Contents: c1: theoretical foundations, definition of the Bayesian programming formalism; c2: Bayesian robot programming; c3: Bayesian cognitive modeling; c4: Bayesian model comparison.
    Schedule: c1: Wednesday 11 October; c2: Wednesday 18 October; c3: Wednesday 25 October; no class the week of 30 October; c4: Wednesday 8 November; c5: Wednesday 15 November; no class the week of 20 November; c6: Wednesday 29 November; Exam ?/?/? (for M2 students).
    Plan: Summary & questions! Basic concepts: minimal example and spam detection example. Bayesian Programming methodology: variables; decomposition & conditional independence hypotheses; parametric forms (demo); learning; inference. Taxonomy of Bayesian models.
    Probability Theory As Extended Logic. E.T. Jaynes (1922-1998). "Frequentist" probabilities: a probability is a physical property of a [...]. "Subjective" probabilities: reference to a state of knowledge
  • UC Berkeley UC Berkeley Electronic Theses and Dissertations
    UC Berkeley Electronic Theses and Dissertations. Title: Bounds on the Entropy of a Binary System with Known Mean and Pairwise Constraints. Permalink: https://escholarship.org/uc/item/1sx6w3qg. Author: Albanna, Badr Faisal. Publication Date: 2013. Peer reviewed, Thesis/dissertation.
    A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Physics in the Graduate Division of the University of California, Berkeley. Committee in charge: Professor Michael R. DeWeese, Chair; Professor Ahmet Yildiz; Professor David Presti. Fall 2013. Copyright 2013 by Badr Faisal Albanna.
    Abstract: Maximum entropy models are increasingly being used to describe the collective activity of neural populations with measured mean neural activities and pairwise correlations, but the full space of probability distributions consistent with these constraints has not been explored. In this dissertation, I provide lower and upper bounds on the entropy for both the minimum and maximum entropy distributions over binary units with any fixed set of mean values and pairwise correlations, and we construct distributions for several relevant cases. Surprisingly, the minimum entropy solution has entropy scaling logarithmically with system size, unlike the possible linear behavior of the maximum entropy solution, for any set of first- and second-order statistics consistent with arbitrarily large systems.
  • Fully Bayesian Computing
    Fully Bayesian Computing. Jouni Kerman and Andrew Gelman, Department of Statistics, Columbia University. November 24, 2004.
    Abstract: A fully Bayesian computing environment calls for the possibility of defining vector and array objects that may contain both random and deterministic quantities, and syntax rules that allow treating these objects much like any variables or numeric arrays. Working within the statistical package R, we introduce a new object-oriented framework based on a new random variable data type that is implicitly represented by simulations. We seek to be able to manipulate random variables and posterior simulation objects conveniently and transparently and provide a basis for further development of methods and functions that can access these objects directly. We illustrate the use of this new programming environment with several examples of Bayesian computing, including posterior predictive checking and the manipulation of posterior simulations. This new environment is fully Bayesian in that the posterior simulations can be handled directly as random variables.
    Keywords: Bayesian inference, object-oriented programming, posterior simulation, random variable objects
    1 Introduction. In practical Bayesian data analysis, inferences are drawn from an L × k matrix of simulations representing L draws from the posterior distribution of a vector of k parameters. This matrix is typically obtained by a computer program implementing a Gibbs sampling scheme or other Markov chain Monte Carlo (MCMC) process. Once the matrix of simulations from the posterior density of the parameters is available, we may use it to draw inferences about any function of the parameters.
  • Nature, Science, Bayes' Theorem, and the Whole of Reality
    Nature, Science, Bayes' Theorem, and the Whole of Reality Moorad Alexanian Department of Physics and Physical Oceanography University of North Carolina Wilmington Wilmington, NC 28403-5606 Abstract A fundamental problem in science is how to make logical inferences from scientific data. Mere data does not suffice since additional information is necessary to select a domain of models or hypotheses and thus determine the likelihood of each model or hypothesis. Thomas Bayes' Theorem relates the data and prior information to posterior probabilities associated with differing models or hypotheses and thus is useful in identifying the roles played by the known data and the assumed prior information when making inferences. Scientists, philosophers, and theologians accumulate knowledge when analyzing different aspects of reality and search for particular hypotheses or models to fit their respective subject matters. Of course, a main goal is then to integrate all kinds of knowledge into an all-encompassing worldview that would describe the whole of reality. A generous description of the whole of reality would span, in the order of complexity, from the purely physical to the supernatural. These two extreme aspects of reality are bridged by a nonphysical realm, which would include elements of life, man, consciousness, rationality, mental and mathematical abstractions, etc. An urgent problem in the theory of knowledge is what science is and what it is not. Albert Einstein's notion of science in terms of sense perception is refined by defining operationally the data that makes up the subject matter of science. It is shown, for instance, that theological considerations included in the prior information assumed by Isaac Newton is irrelevant in relating the data logically to the model or hypothesis.
  • Distribution Transformer Semantics for Bayesian Machine Learning
    Distribution Transformer Semantics for Bayesian Machine Learning. Johannes Borgström (1), Andrew D. Gordon (1), Michael Greenberg (2), James Margetson (1), and Jurgen Van Gael (3). (1) Microsoft Research; (2) University of Pennsylvania; (3) Microsoft FUSE Labs.
    Abstract. The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define novel combinators for distribution transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid distributions, and observations of zero-probability events. We compile our core language to a small imperative language that in addition to the distribution transformer semantics also has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We then use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
    1 Introduction. In the past 15 years, statistical machine learning has unified many seemingly unrelated methods through the Bayesian paradigm. With a solid understanding of the theoretical foundations, advances in algorithms for inference, and numerous applications, the Bayesian paradigm is now the state of the art for learning from data.
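
The contents listing above summarizes the book's central structure as Specification = Variables + Decomposition + Parametric forms, Description = Specification + Identification, and Bayesian program = Description + Question. The following is a minimal Python sketch of that structure, not the book's own code and not the ProBT API. The two Boolean variables (Fault, Alarm), the probability tables, and the helper names joint and ask are all hypothetical, chosen only for illustration; the parameter values are set by hand rather than learned from data.

```python
from itertools import product

# Specification, part 1: the relevant variables and their domains
# (hypothetical example variables, not taken from the book).
VARIABLES = {"Fault": (False, True), "Alarm": (False, True)}

# Identification: parameter values of the parametric forms, fixed by hand
# here (they could instead be estimated from a data set).
P_FAULT = {False: 0.9, True: 0.1}
P_ALARM_GIVEN_FAULT = {
    False: {False: 0.95, True: 0.05},
    True: {False: 0.2, True: 0.8},
}


def joint(fault, alarm):
    """Specification, part 2: P(Fault, Alarm) = P(Fault) * P(Alarm | Fault)."""
    return P_FAULT[fault] * P_ALARM_GIVEN_FAULT[fault][alarm]


def ask(searched, known):
    """Question: P(searched | known), by marginalizing the joint and normalizing."""
    free = [name for name in VARIABLES if name != searched and name not in known]
    scores = {}
    for value in VARIABLES[searched]:
        total = 0.0
        # Sum the joint over every assignment of the free (unobserved) variables.
        for combo in product(*(VARIABLES[name] for name in free)):
            assignment = dict(known, **dict(zip(free, combo)))
            assignment[searched] = value
            total += joint(assignment["Fault"], assignment["Alarm"])
        scores[value] = total
    norm = sum(scores.values())
    return {value: score / norm for value, score in scores.items()}


# Bayesian program = Description + Question: what is P(Fault | Alarm = True)?
posterior = ask("Fault", {"Alarm": True})
print(posterior)
```

With these hand-set tables, the question P(Fault | Alarm = True) comes out to 0.08 / (0.08 + 0.045) = 0.64 for Fault = True. Exhaustive enumeration of the joint is only practical for small discrete models; it is used here to keep the sketch short.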