The Role of Hierarchical Priors in Robust Bayesian Inference
Order Number 9411946
George, Robert Emerson, Ph.D.
The Ohio State University, 1993
UMI, 300 N. Zeeb Rd., Ann Arbor, MI 48106

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Robert Emerson George, B.A., M.S.

The Ohio State University
1993

Dissertation Committee: Prem K. Goel (Adviser), Mark Berliner, Saul Blumenthal
Department of Statistics

To Mrs. Sharia J. Keebaugh
With Deep Appreciation for Showing Me the Majesty of the Mathematical Sciences

ACKNOWLEDGEMENTS

I wish to express my gratitude to my adviser, Dr. Prem Goel: without his invaluable advice, farsighted guidance, and unflagging enthusiasm, this thesis could not have been completed. I also wish to thank Drs. Mark Berliner and Saul Blumenthal, the other members of my Committee; and Dr. Steve MacEachern, who after serving on my General Examination Committee was prevented by scheduling conflicts from serving on my Committee. Also, I have benefitted greatly from the efficiency, knowledge, and eagerness to help of the staffs of the Statistical Computing Laboratory and of The Ohio State University Libraries. Finally, I thank my family for all the innumerable kindnesses, many great and some seemingly small, they have so willingly shown me throughout my life.

VITA

July 5, 1966    Born - Urbana, Ohio
1988            B.A., Wittenberg University, Springfield, Ohio
1988-1991       National Science Foundation Fellow, Department of Statistics, The Ohio State University, Columbus, Ohio
1990            M.S., Department of Statistics, The Ohio State University, Columbus, Ohio
1991-1992       Graduate Teaching / Consulting Assistant, Department of Statistics, The Ohio State University, Columbus, Ohio
1992-1993       Fellow, Graduate School, The Ohio State University, Columbus, Ohio

PUBLICATIONS

1987. "Some Heuristics for Solving Elementary Word Problems." Spectrum: Writing at Wittenberg (2), 59-61.

FIELDS OF STUDY

Major Field: Statistics
Studies in Decision Theory, Sequential Analysis (Dr. Prem Goel)
Studies in Bayesian Inference (Dr. Mark Berliner)
Studies in Statistical Computing (Dr. Jason Hsu, Dr. Elizabeth Stasny)
Studies in Multivariate Analysis (Dr. Sue Leurgans, Dr. Joseph Verducci)

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
VITA
LIST OF TABLES
LIST OF FIGURES

CHAPTER

I. INTRODUCTION AND SUMMARY
    Overview
    Some Notation and Terminology
    Earlier Work on Hierarchical Priors
    Information Theory and Hierarchical Priors
    A Survey of Bayesian Robustness
    Summary

II. BAYESIAN ROBUSTNESS UNDER SQUARED ERROR LOSS
    General Remarks
    Estimation of a Normal Mean
    Estimation of the Exponential Parameter

III. BAYESIAN ROBUSTNESS AND KULLBACK-LEIBLER DISTANCE
    Kullback-Leibler Distance
    The Kullback-Leibler Approach to the Normal Problem
    Some Further Results on Kullback-Leibler Distance

IV. BAYESIAN ROBUSTNESS AND FINITE MIXTURES
    Introduction
    Main Results

V. HIERARCHICAL PRIORS AND Γ-MINIMAXITY
    Γ-Minimaxity: General Remarks
    Γ-Minimax Rules as Hierarchical Bayes Rules
    Γ-Minimax Rules when Γ Contains Two Priors
    A Γ-Minimax Regret Procedure for Testing a Normal Mean

VI. CONCLUSIONS AND FUTURE AVENUES OF RESEARCH
    Summary of Chapters II-V
    Future Avenues of Research

APPENDICES

A. SOME FORTRAN AND PASCAL PROGRAMS USED IN SECTION 2.3
B. FINITE MIXTURE DISTRIBUTIONS
C. SOME EXACT AND APPROXIMATE FORMULAE
    Introduction
    Normal Likelihood, Cauchy Prior
    Normal Likelihood, Double Exponential Prior
    Approximate Computational Formulae
D. SOME FORTRAN AND PASCAL PROGRAMS USED IN CHAPTER IV
E. SOME FORTRAN PROGRAMS USED IN CHAPTER V

LIST OF REFERENCES

LIST OF TABLES

1. Comparison of Hierarchical and "Best-Guess" Rules
2. Ratio of Regret for the Hierarchical Rule vs. Incorrect Benchmark Rule
3. Ratio of Risk for the Hierarchical Rule vs. the Benchmark Rule
4. Ratio of Regret for the Hierarchical Rule vs. Incorrect Benchmark Rule for S Containing Three Priors
5. Table of Risk for the Hierarchical Rule vs. Risk for the Benchmark Rule for E Containing Three Priors
6. Ratio of Regret for Optimal Hierarchical Rule to Regret for Incorrect Rule
7. Ratio of Regret for Approximate vs. Optimal Hierarchical Rule

LIST OF FIGURES

1. The Behavior of fP and fM
2. The Behavior of the Hierarchical and "Best Guess" Prior
3. Behavior of the Tails of the Three Priors
4. A Sketch of the Behavior of Regret Functions

CHAPTER I

INTRODUCTION AND SUMMARY

1.1: Overview

Two of the most extensively and intensively studied areas of Bayesian decision theory are those centered upon hierarchical-prior models and upon Bayesian robustness. The degree of interest in these areas is perhaps not surprising, for one can argue that hierarchical priors are (for reasons which will be discussed below) "as Bayesian as even a Bayesian can get," while on the other hand the issue of Bayesian robustness is one which must be confronted if serious criticisms of the Bayesian paradigm are to be addressed. In this thesis we will discuss various approaches to using hierarchical priors to achieve Bayesian robustness in a variety of situations.

Section 1.2 establishes some fundamental notation and terminology which will be used in later chapters. Sections 1.3 and 1.4 give a brief survey of certain topics pertaining to hierarchical Bayesian models,[1] while Section 1.5 surveys Bayesian robustness. Section 1.6 summarizes the results of Chapters II through V; Chapter VI reviews and integrates earlier material, and discusses various problems to be examined in the future.
[1] Those two sections by no means constitute an exhaustive survey of all the important or elegant work done in hierarchical Bayesian methods, and the exclusion of any particular work in no way reflects a negative evaluation of that work. Rather, the topics presented are those which convey something of the history of the development of hierarchical models.

1.2: Some Notation and Terminology

Before proceeding further, we review some notation and terminology. The data $x$ are realizations of a random variable $X$ defined on a sample space $\mathcal{X}$. The distribution of $X$ is

\[ f(x \mid \theta), \quad \theta \in \Theta, \tag{1.2.1} \]

with $\theta$ unknown. (Note that $\Theta$ could be any index set: in particular, attention is not restricted to parametric families, as we shall see in Chapter IV.) We denote the prior distribution on $\theta$ by $\pi$, and the distribution of $\theta \mid X = x$ (i.e., the posterior distribution) by $\pi(\cdot \mid x)$. The marginal (or predictive) distribution of $X$ is given by

\[ m(x) = \int_{\Theta} f(x \mid \theta)\, d\pi(\theta). \tag{1.2.2} \]

The action space $A$ consists of all possible actions (or decisions) open to the statistician, and with each $a \in A$, $\theta \in \Theta$ there is associated a loss $L(\theta, a)$. We shall assume that all loss functions discussed herein are bounded from below. The posterior expected loss of an action $a$ is given by

\[ \rho(\pi, x, a) = E^{\pi(\cdot \mid x)}[L(\theta, a)]. \tag{1.2.3} \]

A (possibly randomized) decision rule mapping $\mathcal{X}$ into $A$ will be denoted by $\delta$, and the set of all such decision rules will be denoted by $D^*$. The frequentist risk function $R(\theta, \delta)$ and the Bayes risk $r(\pi, \delta)$ for the decision rule $\delta$ are, respectively,

\[ R(\theta, \delta) = E^{f(x \mid \theta)}[L(\theta, \delta(X))], \tag{1.2.4} \]

and

\[ r(\pi, \delta) = E^{\pi(\theta)}[R(\theta, \delta)]. \tag{1.2.5} \]

It is well known (Berger, 1985) that, for loss functions bounded from below,

\[ r(\pi, \delta) = E^{m(x)}[\rho(\pi, X, \delta(X))]. \tag{1.2.6} \]

The above notation is used throughout this thesis, although at times certain extensions and modifications are made.

1.3: Earlier Work on Hierarchical Priors

When one thinks of hierarchical priors, the name of I. J. Good immediately comes to mind.
We begin this section by discussing Good's contribution to the theoretical and philosophical justification for hierarchical models. Good (1980) views "[t]he notion of a hierarchy of different types, orders, levels, or stages of probability" as arising naturally in three distinct settings. Before delineating these settings, some terminology is in order. By a "physical probability" Good means "an intrinsic property of the material world, existing irrespective of minds and logic. ... psychological probability is a degree of belief or intensity of conviction that is used ... for making decisions." A "subjective probability" is one of a set of psychological probabilities which are coherent in the sense that those probabilities cannot lead their adherents into bets which are certain to lose. A "logical probability ... is a rational intensity of conviction, implicit in the given information ... such that if a person does not agree with it he is wrong" (Good, 1965). Subjective probabilities are properly viewed as approximations to logical probabilities (Good, 1976).

The three situations (Good, 1980) wherein hierarchies of probabilities can arise are:

(i) Hierarchies of physical probabilities. Good says that the "meaning ... is made clear merely by mentioning populations, superpopulations, and super-duper-populations." As one moves "downward" from the superpopulation to the subpopulations, probabilities will often change: this is the idea behind advertising which "targets" a particular subset of the population. The probability that a teenage consumer will buy a skateboard is much different from, and higher than, the probability that a consumer chosen from the population at large will buy a skateboard.

(ii) Situations that involve more than one "kind" of probability (e.g., physical and subjective).
By introducing a hierarchical structure, one can simplify the problem somewhat and separate the various kinds of probabilities: for instance, the first stage might be physical while the second stage is subjective.
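
The separation of a physical first stage from a subjective second stage can be sketched numerically. The example below is an illustration only and is not taken from Good or from this thesis: it assumes a first-stage physical probability theta governing independent success/failure trials, and a second-stage subjective Beta(a, b) distribution expressing belief about theta. The function name `two_stage_update`, the choice of a Beta second stage, and all numbers are hypothetical.

```python
from fractions import Fraction


def two_stage_update(a, b, successes, failures):
    """Combine a subjective Beta(a, b) second stage with first-stage trial data.

    Assumptions (illustrative only): the trials are independent with a common
    physical success probability theta, and a, b are positive integers so that
    exact fractions can be used.
    Returns the updated Beta parameters and the predictive probability that
    the next trial is a success, which is the posterior mean of theta.
    """
    post_a = a + successes          # prior "pseudo-successes" plus observed successes
    post_b = b + failures           # prior "pseudo-failures" plus observed failures
    predictive = Fraction(post_a, post_a + post_b)
    return post_a, post_b, predictive


# Hypothetical data: a uniform Beta(1, 1) second stage, then 3 successes in 10 trials.
post_a, post_b, p_next = two_stage_update(a=1, b=1, successes=3, failures=7)
print(post_a, post_b, p_next)  # 4 8 1/3
```

In this sketch the data enter only through the first (physical) stage, while the analyst's beliefs enter only through the second-stage parameters a and b, so each kind of probability can be examined or revised separately.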