Parameterization and Bayesian Modeling


Andrew GELMAN

Progress in statistical computation often leads to advances in statistical modeling. For example, it is surprisingly common that an existing model is reparameterized, solely for computational purposes, but then this new configuration motivates a new family of models that is useful in applied statistics. One reason why this phenomenon may not have been noticed in statistics is that reparameterizations do not change the likelihood. In a Bayesian framework, however, a transformation of parameters typically suggests a new family of prior distributions. We discuss examples in censored and truncated data, mixture modeling, multivariate imputation, stochastic processes, and multilevel models.

KEY WORDS: Censored data; Data augmentation; Gibbs sampler; Hierarchical model; Missing-data imputation; Parameter expansion; Prior distribution; Truncated data.

Andrew Gelman is Professor, Department of Statistics, Columbia University, New York, NY 10027 (E-mail: [email protected]). The author thanks Xiao-Li Meng and several reviewers for helpful discussions and the U.S. National Science Foundation for financial support.

© 2004 American Statistical Association. Journal of the American Statistical Association, June 2004, Vol. 99, No. 466, Review Article. DOI 10.1198/016214504000000458

1. INTRODUCTION

Progress in statistical computation often leads to advances in statistical modeling. We explore this idea in the context of data and parameter augmentation, techniques in which latent data or parameters are added to a model. Data and parameter augmentation are methods for reparameterizing a model, not changing its description of data, but allowing computation to proceed more easily, quickly, or reliably. In a Bayesian context, however, these latent data and parameters can often be given substantive interpretations in a way that expands the model's practical utility.

1.1 Data Augmentation and Parameter Expansion in Likelihood and Bayesian Inference

Data augmentation (Tanner and Wong 1987) refers to a family of computational methods that typically add new latent data that are partially identified by the data. By "partially identified," we mean that there is some information about these new variables, but as sample size increases, the amount of information about each variable does not increase. Examples included in this article include censored data, latent mixture indicators, and latent continuous variables for discrete regressions. Data augmentation is designed to allow simulation-based computations to be performed more simply on the larger space of "complete data," by analogy to the workings of the EM algorithm for maximum likelihood (Dempster, Laird, and Rubin 1977).

Parameter expansion (Liu, Rubin, and Wu 1998) typically adds new parameters that are nonidentified; in Bayesian terms, if they have improper prior distributions (as they typically do), then they have improper posterior distributions. An important example is replacing a parameter $\theta$ by a product, $\phi\psi$, so that inference can be obtained about the product but not about either individual parameter. Parameter expansion can be viewed as part of a larger perspective on iterative simulation (see van Dyk and Meng 2001; Liu 2003), but our focus here is on its construction of nonidentifiable parameters as a computational aid.

Both data augmentation and parameter expansion are exciting tools for increasing the simplicity and speed of computations. In a likelihood-inference framework, that is all they can be: computational tools. By design, these methods do not change the likelihood; they only change its parameterization. The same goes for simpler computational methods, such as standardization of predictors in regression models and rotations to speed Gibbs samplers (e.g., Hills and Smith 1992; Boscardin 1996; Roberts and Sahu 1997).

From a Bayesian perspective, however, new parameterizations can lead to new prior distributions and thus new models. One way in which this often occurs is if the prior distribution for a parameter is conditionally conjugate (i.e., conjugate in the conditional posterior distribution), given the data and all other parameters in the model. In Gibbs sampler computation, conditional conjugacy can allow more efficient computation of posterior moments using "Rao–Blackwellization" (see Gelfand and Smith 1990). This technique is also useful for performing inferences on latent parameters in mixture models, conditional on convergence of the simulations for the hyperparameters. As we discuss in Section 5, parameter expansion leads to new families of conditionally conjugate models.

Once again, there is an analogy to Bayesian inference in simpler settings. For example, in classical regression, applying a linear transformation to regression predictors has no effect on the predictions. But in a Bayesian regression with a hierarchical prior distribution, rescaling and other linear transformations can pull parameters closer together so that shrinkage is more effective, as we discuss in Section 5.1.

1.2 Model Expansion for Substantive or Computational Reasons

The usual reason for expanding a model is for substantive reasons: to better capture an underlying model of interest, to better fit existing data, or both. Interesting statistical issues arise when balancing these goals, and Bayesian inference with proper prior distributions can resolve the potential nonidentifiability problems that can arise.

A Bayesian model can be expanded by adding parameters, or a set of candidate models can be bridged using discrete model averaging or continuous model expansion. Recent treatments of these approaches from a Bayesian perspective have been presented by Madigan and Raftery (1994), Hoeting, Madigan, Raftery, and Volinsky (1999), Draper (1995), and Gelman, Huang, van Dyk, and Boscardin (2003, secs. 6.6 and 6.7). In the context of data fitting, Green and Richardson (1997) showed how Bayesian model mixing can be used to perform the equivalent of nonparametric density estimation and regression.

Another important form of model expansion is for sensitivity to potential nonignorability in data collection (see Rubin 1976; Little and Rubin 1987). The additional parameters in these models cannot be identified but are varied to explore sensitivity of inferences to assumptions about selection (see Diggle and Kenward 1994; Kenward 1998; Rotnitzky, Robins, and Scharfstein 1998; Troxel, Ma, and Heitjan 2003). Nandaram and Choi (2002) argued that continuous model expansion with an informative prior distribution is appropriate for modeling potential nonignorability in nonresponse, and showed how the variation in nonignorability can be estimated from a hierarchical data structure (see also Little and Gelman 1998).

In this article, we do not further consider these substantive examples of model expansion, but rather discuss several classes of models for which theoretically or computationally motivated model expansion has unexpectedly led to new insights or new classes of models. All of these examples feature new parameters or model structures that could be considered purely mathematical constructions but gain new life when given direct interpretations. Where possible, we illustrate with applications from our own research so that we have more certainty in our claims about the original motivation and ultimate uses of the reparameterizations and model expansions.

2. TRUNCATED AND CENSORED DATA

It is well understood that the censored data model is like the truncated data model, but with additional information. With censoring, certain specific measurements are missing. Here we further explore the connection between the two models.

2.1 Truncated and Censored Data Models

We work in the context of a simple example (see Gelman et al. 2003, sec. 7.8). A random sample of $N$ animals are weighed on a digital scale. The scale is accurate, but it does not give a reading for objects that weigh more than 200 pounds. Of the $N$ animals, $n = 91$ are successfully weighed; their weights are $y_1, \ldots, y_{91}$. We assume that the weights of the animals in the population are normally distributed with mean $\mu$ and standard deviation $\sigma$.

In the "truncated data" scenario, $N$ is unknown and the posterior distribution of the unknown parameters $\mu$ and $\sigma$ of the data is

2.2 Modeling Truncated Data as Censored but With an Unknown Number of Censored Data Points

Now suppose that $N$ is unknown. We can consider two options for modeling the data:

1. Using the truncated-data model (1)
2. Using the censored-data model (2), treating the original sample size $N$ as missing data.

The second option requires a probability distribution for $N$. The complete posterior distribution of the observed data $y$ and missing data $N$ is then

\[
p(\mu, \sigma, N \mid y) \propto p(N)\, p(\mu, \sigma) \binom{N}{91} \left( \Phi\!\left( \frac{\mu - 200}{\sigma} \right) \right)^{N-91} \prod_{i=1}^{91} \mathrm{N}(y_i \mid \mu, \sigma^2).
\]

We can obtain the marginal posterior density of $(\mu, \sigma)$ by summing over $N$,

\[
\begin{aligned}
p(\mu, \sigma \mid y) &\propto \sum_{N=91}^{\infty} p(N)\, p(\mu, \sigma) \binom{N}{91} \left( \Phi\!\left( \frac{\mu - 200}{\sigma} \right) \right)^{N-91} \prod_{i=1}^{91} \mathrm{N}(y_i \mid \mu, \sigma^2) \\
&= p(\mu, \sigma) \prod_{i=1}^{91} \mathrm{N}(y_i \mid \mu, \sigma^2) \sum_{N=91}^{\infty} p(N) \binom{N}{91} \left( \Phi\!\left( \frac{\mu - 200}{\sigma} \right) \right)^{N-91}. \qquad (3)
\end{aligned}
\]

It turns out that if $p(N) \propto 1/N$, then the expression inside the summation in (3) has the form of a negative binomial density with $\theta = N - 1$, $\alpha = 91$, and $1/(\beta + 1) = \Phi((\mu - 200)/\sigma)$. The expression inside the summation is proportional to $\left(1 - \Phi\!\left(\frac{\mu - 200}{\sigma}\right)\right)^{-91}$, so that for this particular choice of noninformative prior distribution, the entire expression (3) becomes proportional to the simple truncated-data posterior density (1). Meng and Zaslavsky (2002) discussed other properties of the $1/N$ prior distribution and the negative binomial model.

It seems completely sensible that if we add a parameter $N$ to the model and average it out, then we
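
The role of the $1/N$ prior in (3) can be checked numerically. The sketch below is an illustration added here, not code from the article: it assumes $p(N) = 1/N$ exactly (the proportionality constant is dropped), writes $q$ for $\Phi((\mu - 200)/\sigma)$, and compares a truncated version of the sum over $N$ with the closed form $(1/91)(1 - q)^{-91}$, which follows from the identity $\frac{1}{N}\binom{N}{91} = \frac{1}{91}\binom{N-1}{90}$ and the negative binomial series. The function names, the cutoff `n_max`, and the example values of $\mu$ and $\sigma$ are all hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_sum(q: float, n: int = 91, n_max: int = 5000) -> float:
    """Sum over N = n..n_max of (1/N) * C(N, n) * q**(N - n).

    Each term is built on the log scale with lgamma so that the large
    binomial coefficients do not overflow. Requires 0 < q < 1.
    """
    total = 0.0
    for big_n in range(n, n_max + 1):
        log_term = (
            math.lgamma(big_n + 1)
            - math.lgamma(n + 1)
            - math.lgamma(big_n - n + 1)
            - math.log(big_n)
            + (big_n - n) * math.log(q)
        )
        total += math.exp(log_term)
    return total


def closed_form(q: float, n: int = 91) -> float:
    """Closed form of the infinite sum: (1/n) * (1 - q)**(-n)."""
    return (1.0 - q) ** (-n) / n


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    mu, sigma = 180.0, 20.0
    q = normal_cdf((mu - 200.0) / sigma)  # probability a weight exceeds 200
    print(truncated_sum(q) / closed_form(q))
```

If the printed ratio is close to 1 for a range of $(\mu, \sigma)$ values, the summation contributes only the factor $(1 - \Phi((\mu - 200)/\sigma))^{-91}$ up to a constant, which is the dependence on $\mu$ and $\sigma$ described in the text. The cutoff `n_max` needs to be raised when $q$ is close to 1, because the terms then decay slowly.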