The Interplay of Bayesian and Frequentist Analysis
M. J. Bayarri and J. O. Berger
Statistical Science 2004, Vol. 19, No. 1, 58–80
DOI 10.1214/088342304000000116
© Institute of Mathematical Statistics, 2004

Abstract. Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the debate has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach. In this article, we embark upon a rather idiosyncratic walk through some of these issues.

Key words and phrases: Admissibility, Bayesian model checking, conditional frequentist, confidence intervals, consistency, coverage, design, hierarchical models, nonparametric Bayes, objective Bayesian methods, p-values, reference priors, testing.

M. J. Bayarri is Professor, Department of Statistics, University of Valencia, Av. Dr. Moliner 50, 46100 Burjassot, Valencia, Spain (e-mail: susie.bayarri@uv.es). J. O. Berger is Director, Statistical and Applied Mathematical Sciences Institute, and Professor, Duke University, P.O. Box 90251, Durham, North Carolina 27708-0251, USA (e-mail: [email protected]).

CONTENTS
1. Introduction
2. Inherently joint Bayesian–frequentist situations
2.1. Design or preposterior analysis
2.2. The meaning of frequentism
2.3. Empirical Bayes, gamma minimax, restricted risk Bayes
3. Estimation and confidence intervals
3.1. Computation with hierarchical, multilevel or mixed model analysis
3.2. Assessment of accuracy of estimation
3.3. Foundations, minimaxity and exchangeability
3.4. Use of frequentist methodology in prior development
3.5. Frequentist simplifications and asymptotic approximations
4. Testing, model selection and model checking
4.1. Conditional frequentist testing
4.2. Model selection
4.3. p-Values for model checking
5. Areas of current disagreement
6. Conclusions
Acknowledgments
References

1. INTRODUCTION

Statisticians should readily use both Bayesian and frequentist ideas. In Section 2 we discuss situations in which simultaneous frequentist and Bayesian thinking is essentially required. For the most part, however, the situations we discuss are situations in which it is simply extremely useful for Bayesians to use frequentist methodology or frequentists to use Bayesian methodology.

The most common scenarios of useful connections between frequentists and Bayesians are when no external information (other than the data and model itself) is to be introduced into the analysis; on the Bayesian side, this is when "objective prior distributions" are used. Frequentists are usually not interested in subjective, informative priors, and Bayesians are less likely to be interested in frequentist evaluations when using subjective, highly informative priors.

We will, for the most part, avoid the question of whether the Bayesian or frequentist approach to statistics is "philosophically correct." While this is a valid question, and research in this direction can be of fundamental importance, the focus here is simply on methodology. In a related vein, we avoid the question of what is "pedagogically correct." If pressed, we would probably argue that Bayesian statistics (with emphasis on objective Bayesian methodology) should be the type of statistics that is taught to the masses, with frequentist statistics being taught primarily to advanced statisticians, but that is not an issue for this paper.

Several caveats are in order. First, we primarily focus on the Bayesian and frequentist approaches here; these are the most generally applicable and accepted statistical philosophies, and both have features that are compelling to most statisticians. Other statistical schools, such as the likelihood school (see, e.g., Reid, 2000), have many attractive features and vocal proponents, but have not been as extensively developed or utilized as the frequentist and Bayesian approaches.

A second caveat is that the selection of topics here is rather idiosyncratic, being primarily based on situations and examples in which we are currently interested. Other Bayesian–frequentist synthesis works (e.g., Pratt, 1965; Barnett, 1982; Rubin, 1984; and even Berger, 1985a) focus on a quite different set of situations. Furthermore, we almost completely ignore many of the most time-honored Bayesian–frequentist synthesis topics, such as empirical Bayes analysis. Hence, rather than being viewed as a comprehensive review, this paper should be thought of more as a personal view of current interesting issues in the Bayesian–frequentist synthesis.

2. INHERENTLY JOINT BAYESIAN–FREQUENTIST SITUATIONS

There are certain statistical scenarios in which a joint frequentist–Bayesian approach is arguably required. As illustrations of this, we first discuss the issue of design (in which the notion should not be controversial) and then discuss the basic meaning of frequentism, which arguably should be (but is not typically perceived as) a joint frequentist–Bayesian endeavor.

2.1 Design or Preposterior Analysis

Frequentist design focuses on planning of experiments, for instance, the issue of choosing an appropriate sample size. In Bayesian analysis this is often called preposterior analysis, because it is done before the data is collected (and, hence, before the posterior distribution is available).

EXAMPLE 2.1. Suppose $X_1, \ldots, X_n$ are i.i.d. Poisson random variables with mean $\theta$, and that it is desired to estimate $\theta$ under the weighted squared error loss $(\hat{\theta} - \theta)^2/\sqrt{\theta}$ and using the classical estimator $\hat{\theta} = \bar{X}$. This estimator has frequentist expected loss $E_\theta[(\bar{X} - \theta)^2/\sqrt{\theta}] = \sqrt{\theta}/n$.

A typical design problem would be to choose the sample size $n$ so that the expected loss is less than some prespecified limit $C$. (An alternative formulation might be to minimize $C + nc$, where $c$ is the cost of an observation, but this would not significantly alter the discussion here.) This is clearly not possible for all $\theta$; hence we must bring prior knowledge about $\theta$ into play.

A primitive recommendation that one often sees, in such situations, is to make a "best guess" $\theta_0$ for $\theta$, and then choose $n$ so that $\sqrt{\theta_0}/n \le C$; that is, choose $n \approx \sqrt{\theta_0}/C$. This is needlessly dogmatic, in that one rarely believes particularly strongly in a particular value $\theta_0$.

A common primitive recommendation in the opposite direction is to choose an upper bound $\theta_U$ for $\theta$, and then choose $n$ so that $\sqrt{\theta_U}/n \le C$; that is, choose $n \approx \sqrt{\theta_U}/C$. This is needlessly conservative, in that the resulting $n$ will typically be much larger than needed.

The Bayesian approach to the design question is to elicit a subjective prior distribution $\pi(\theta)$ for $\theta$, and then to choose $n$ so that $\int (\sqrt{\theta}/n)\,\pi(\theta)\,d\theta \le C$; that is, choose $n \approx \int \sqrt{\theta}\,\pi(\theta)\,d\theta / C$. This is a reasonable compromise between the above two extremes and will typically result in the most reasonable values of $n$.

Classical design texts often focus on the very special situations in which the design criterion is constant in the unknown model parameter $\theta$, and hence fail to clarify the philosophical centrality of Bayesian issues in design. The basic fact is that, before experimentation, one knows neither the data nor $\theta$, and so expectations over both (i.e., both frequentist and Bayesian expectations) are needed for design. See Chaloner and Verdinelli (1995) and Dawid and Sebastiani (1999).

A very common situation in which design evaluation is not constant is classical testing, in which the sample size is often chosen to achieve a given power at a specified value $\theta$ of the parameter under the alternative hypothesis. Again, specifying a specific $\theta$ is very crude when viewed from a Bayesian perspective. Far more reasonable for a classical tester would be to specify a prior distribution for $\theta$ under the alternative, and consider the average power with respect to this distribution. (More controversial would be to consider an average Type I error.)

2.2 The Meaning of Frequentism

There is a sense in which essentially everyone should subscribe to frequentism:

FREQUENTIST PRINCIPLE. In repeated practical use of a statistical procedure, the long-run average actual accuracy should not be less than (and ideally should equal) the long-run average reported accuracy.

This version of the frequentist principle is actually a joint frequentist–Bayesian principle. Suppose, for instance, that we decide it is relevant to statistical practice to repeatedly use a particular statistical model and procedure, for instance, a 95% classical confidence interval for a normal mean. This procedure will, in practice, be used on a series of different problems involving a series of different normal means with a corresponding series of data. Hence, in evaluating the procedure, we should simultaneously be averaging over the differing means and data.

This is in contrast to textbook statements of the

The impact of this "real" frequentist principle thus arises when the frequentist evaluation of a procedure is not constant over the parameter space. Here is an example.

EXAMPLE 2.2. Binomial confidence interval. Brown, Cai and DasGupta (2001, 2002) considered the problem of observing $X \sim \text{Binomial}(n, \theta)$ and determining a 95% confidence interval for the unknown success probability $\theta$. We consider here the special case of $n = 50$, and two confidence procedures. The first is $C^J(x)$, defined as the "Jeffreys equal-tailed 95% confidence interval," given by

$$C^J(x) = \bigl(q_{0.025}(x),\, q_{0.975}(x)\bigr), \tag{2.1}$$

where $q_\alpha(x)$ is the $\alpha$th-quantile of the $\text{Beta}(x + 0.5,\, 50.5 - x)$ distribution. The second confidence procedure we consider is the "modified Jeffreys equal-tailed 95% confidence interval," given by

$$C^{J*}(x) = \begin{cases} \bigl(q_{0.025}(x),\, q_{0.975}(x)\bigr), & \text{if } x \ne 0 \text{ and } x \ne n, \\ \bigl(0,\, q_{0.975}(x)\bigr), & \text{if } x = 0, \\ \bigl(q_{0.025}(x),\, 1\bigr), & \text{if } x = n. \end{cases} \tag{2.2}$$
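As a rough numerical companion to Example 2.2, the following Python sketch computes the two intervals (2.1) and (2.2) for n = 50 and evaluates their frequentist coverage at a chosen θ, so that one can see how the evaluation varies over the parameter space. It is an illustration under stated assumptions, not code from the paper: the function names (`reg_inc_beta`, `beta_quantile`, `jeffreys_interval`, `coverage`) are hypothetical, and the Beta quantiles are obtained by bisection on a standard continued-fraction evaluation of the regularized incomplete beta function, using only the Python standard library and with accuracy adequate for illustration only.

```python
from math import comb, exp, lgamma, log


def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even-indexed term of the continued fraction.
        num = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-indexed term of the continued fraction.
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b): the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_quantile(p, a, b, iters=80):
    """p-th quantile of Beta(a, b), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(a, b, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jeffreys_interval(x, n=50, modified=False):
    """Interval (2.1), or (2.2) when modified=True, for X = x successes out of n."""
    a, b = x + 0.5, n - x + 0.5  # Beta(x + 0.5, n + 0.5 - x)
    lower = beta_quantile(0.025, a, b)
    upper = beta_quantile(0.975, a, b)
    if modified and x == 0:
        lower = 0.0
    if modified and x == n:
        upper = 1.0
    return lower, upper


def coverage(theta, n=50, modified=False):
    """Frequentist coverage at theta: P_theta(theta lies in the interval for X)."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = jeffreys_interval(x, n, modified)
        if lower <= theta <= upper:
            total += comb(n, x) * theta**x * (1.0 - theta) ** (n - x)
    return total
```

Calling `coverage(theta)` and `coverage(theta, modified=True)` over a grid of θ values gives the coverage curves of the two procedures. The only difference between them is at x = 0 and x = n: the unmodified interval at x = 0 has a strictly positive lower endpoint, so it excludes values of θ sufficiently close to 0, whereas the modified interval extends to 0 (and symmetrically to 1 at x = n).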