Springer Series in Astrostatistics

Stefano Andreon
Brian Weaver

Bayesian Methods for the Physical Sciences
Learning from Examples in Astronomy and Physics

Editor-in-chief: Joseph M. Hilbe, Jet Propulsion Laboratory, and Arizona State University, USA

Jogesh Babu, The Pennsylvania State University, USA
Bruce Bassett, University of Cape Town, South Africa
Steffen Lauritzen, Oxford University, UK
Thomas Loredo, Cornell University, USA
Oleg Malkov, Moscow State University, Russia
Jean-Luc Starck, CEA/Saclay, France
David van Dyk, Imperial College, London, UK

More information about this series at http://www.springer.com/series/1432

Springer Series in Astrostatistics
Astrostatistical Challenges for the New Astronomy: ed. Joseph M. Hilbe
Astrostatistics and Data Mining: ed. Luis Manuel Sarro, Laurent Eyer, William O’Mullane, Joris De Ridder
Statistical Methods for Astronomical Data Analysis: by Asis Kumar Chattopadhyay & Tanuka Chattopadhyay

Stefano Andreon
INAF
Osservatorio Astronomico di Brera
Milano, Italy

Brian Weaver
Statistical Sciences
Los Alamos National Laboratory
Los Alamos, NM, USA

ISSN 2199-1030
ISSN 2199-1049 (electronic)
Springer Series in Astrostatistics
ISBN 978-3-319-15286-8
ISBN 978-3-319-15287-5 (eBook)
DOI 10.1007/978-3-319-15287-5
Library of Congress Control Number: 2015932855
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015

This work is subject to copyright.
All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

Preface

This book is a consultant’s guide for the researcher in astronomy or physics who is willing to analyze his or her own data. It offers a statistical background, some numerical advice, and a large number of examples of statistical analyses of real-world problems, many from the experiences of the first author. While writing this book, we placed ourselves in the role of the researcher willing to analyze his or her own data but lacking a practical way to do it (in fact, one of us was once in this position). For this reason, we chose the JAGS symbolic language to perform the Bayesian fitting, allowing us (the researchers) to focus on the research applications, not on programming (coding) issues.
By using JAGS, it is easy for the researcher to take one of the many applications presented in this book and morph it into a form that is relevant to his or her needs. More than 50 examples are listed and are intended to be starting points for researchers, so they can develop the confidence to solve their own problems. All examples are illustrated with (more than 100) figures showing the data, the fitted model, and its uncertainty in a way that researchers in astronomy and physics are used to seeing them. All models and datasets used in this book are made available at the site http://www.brera.mi.astro.it/~andreon/BayesianMethodsForThePhysicalSciences/.

The topics in this book include analyzing measurements subject to errors of different amplitudes and with a larger-than-expected scatter, dealing with upper limits and with selection effects, modeling of two populations (one we are interested in and a nuisance population), regression (fitting) models with and without the above “complications” and population gradients, and how to predict the value of an unavailable quantity by exploiting the existence of a trend with another, available, quantity. The book also provides some advice to answer the following questions: Is the considered model at odds with the fitted data? Furthermore, are the results constrained by the data or unduly affected by assumptions? These are the “bread and butter” activities of researchers in physics and astronomy, and also of all those analyzing and interpreting datasets in other branches, insofar as they are confronted with similar problems (measurement errors, variety in the studied population, etc.).

The book also serves as a guide for (and has been shaped thanks to) Ph.D. students in the physical and space sciences: by dedicating 12 h to studying the content of this book, the average student is able to write, alone and unsupervised, the complex models in Chap. 8 (and indeed the example in Sect.
8.2 was written by a student).

Content

The book consists of ten chapters. After a short description of how to use the book (Chap. 1), Chaps. 2 and 3 are a gentle introduction to the theory behind Bayesian methods and to how to compute the target numbers in practice (e.g., parameter estimates of the model). In later chapters, we present problems of increasing difficulty with an emphasis on astronomy examples. Each application in the book has the same structure: it presents the problem, provides the data (listed in the text or, in the case of large datasets, via a URL) or shows how to generate the data, has a detailed analysis including quantitative summaries and figures (more than 100 figures), and provides the code used for the analysis (again using the open-source JAGS program). The idea is that the reader has the capability to fully reproduce our analysis. The provided JAGS code (of which the book contains over 50 examples) can be the starting point for a researcher applying Bayesian methods to different problems. The key subject of model checking and sensitivity analysis is addressed in Chap. 9. Lastly, Chap. 10 provides some comparisons of the Bayesian approach to older ways of doing statistics (i.e., frequentist methods) in astronomy and physics.

Most chapters conclude with a set of exercises. An exercise solution book is available at the same URL given above.

Contact Details

Send comments and errors found to either [email protected] or theguz@lanl.gov. Updated programs and an erratum are kept at http://www.brera.mi.astro.it/~andreon/BayesianMethodsForThePhysicalSciences/.
Acknowledgments

SA thanks Merrilee Hurn for her collaborations that inspired much of the content of this book, Steve Gull for enjoyable discussions on this book’s content and for the use of his office in Cambridge (UK), Joe Hilbe for encouragement to continue the effort of writing this book, Andrew Gelman for his responsiveness in answering questions, Andrea Benaglia for the example in Sect. 8.2, and Giulio D’Agostini for computing the analytical expression of the posterior probability distribution for a few problems of astronomical relevance. The practical way to summarize our knowledge about the quantity under study, more than philosophical debates, is maximally useful to researchers.

BW thanks William Meeker and Max Morris for teaching him how to be a practical statistician. He also thanks Michael Hamada, Alyson Wilson, and Harry Martz for taking a young frequentist and opening his eyes to the beauty of Bayesian statistics, and David Higdon for teaching him how to have fun being a statistician.

We also thank our beautiful wives for their encouragement and support.

Finally, a warm thanks goes to Martin Plummer for JAGS, i.e., the software used to perform the numerical computations of this book. The ease of use of JAGS has made our daily research work easier and allows us to easily explore other possible models and the sensitivity of our conclusions to (unavoidable) assumptions.

Contents

1 Recipes for a Good Statistical Analysis
2 A Bit of Theory
  2.1 Axiom 1: Probabilities Are in the Range Zero to One
  2.2 Axiom 2: When a Probability Is Either Zero or One
  2.3 Axiom 3: The Sum, or Marginalization, Axiom
  2.4 Product Rule
  2.5 Bayes Theorem
  2.6 Error Propagation
  2.7 Bringing It All Home
  2.8 Profiling Is Not Marginalization
  2.9 Exercises
  References
3 A Bit of Numerical Computation
  3.1 Some Technicalities
  3.2 How to Sample from a Generic Function
  References