Bayesian Models and Methods in Public Policy and Government Settings

Statistical Science 2011, Vol. 26, No. 2, 212–226. DOI: 10.1214/10-STS331. © Institute of Mathematical Statistics, 2011. arXiv:1108.2177v1 [stat.ME] 10 Aug 2011.

Stephen E. Fienberg

Abstract. Starting with the neo-Bayesian revival of the 1950s, many statisticians argued that it was inappropriate to use Bayesian methods, and in particular subjective Bayesian methods, in governmental and public policy settings because of their reliance upon prior distributions. But the Bayesian framework often provides the primary way to respond to questions raised in these settings, and the numbers and diversity of Bayesian applications have grown dramatically in recent years. Through a series of examples, both historical and recent, we argue that Bayesian approaches with formal and informal assessments of priors AND likelihood functions are well accepted and should become the norm in public settings. Our examples include census-taking and small area estimation, US election night forecasting, studies reported to the US Food and Drug Administration, assessing global climate change, and measuring potential declines in disability among the elderly.

Key words and phrases: Census adjustment, confidentiality, disability measurement, election night forecasting, Bayesian clinical drug studies, global warming, small area estimation.

Stephen E. Fienberg is Maurice Falk University Professor, Department of Statistics, Heinz College, Machine Learning Department, and Cylab, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213-3890, USA; e-mail: fi[email protected]; URL: http://www.stat.cmu.edu/fienberg/.

Discussed in 10.1214/11-STS331A, 10.1214/11-STS331B and 10.1214/11-STS331C; rejoinder at 10.1214/11-STS331REJ.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2011, Vol. 26, No. 2, 212–226. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION AND HISTORY

Beginning with the posthumous publication in 1763 of the essay attributed to the Rev. Thomas Bayes, and continuing well into the twentieth century, virtually the only approach to statistical inference was the method of inverse probability based on applications of Bayes's theorem (see, e.g., Fienberg, 2006a). Nonetheless, most applications of statistical methods in governmental settings were based primarily on descriptive statistics, and there was little debate regarding the relevance of Bayesian approaches in public life despite efforts at implementation, for example, Laplace's development of ratio estimation to estimate the size of the population of France.

Criticism of the method of inverse probability, as Bayesian methodology was known for almost 200 years, began in the mid-19th century with the rise of a philosophical school advocating objective probability. The fundamental concern of the objectivists was the requirement for a prior distribution, and they argued for a frequentist view of probability. Unfortunately they failed to present a methodology for inference to counter that of inverse probability, and it was not until the work of R. A. Fisher and of Jerzy Neyman and Egon Pearson in the 1920s that serious alternative statistical procedures were in place. Neyman's (1934) critique of Gini's version of the representative method for survey taking not only ushered the frequentist repeated sampling perspective into the realm of official statistics, but it also introduced the frequentist tool of confidence intervals and its long-run repeated sampling interpretation (see Fienberg and Tanur, 1996).

Bayesian tools played an important role in a number of statistical efforts during World War II, including Alan Turing's work at Bletchley Park, England, to crack the Enigma code. But with the creation of such frequentist methods as sequential analysis by Barnard in England and Wald in the United States, and the elaboration of design-based analyses in sample surveys, frequentist approaches were in the ascendancy in the public arena as statistics passed the mid-century mark. This was especially true in statistical agencies, where the ideas of random selection of samples and repeated sampling as the basis of inference were synonymous, and statistical models and likelihood-based methods were frowned upon at best.

With the introduction of computers for statistical calculations in the 1960s, however, Bayesian methods began a slow but prolonged comeback that accelerated substantially with the introduction of Markov chain Monte Carlo (MCMC) methods in the early 1990s. Today Bayesian methods are challenging the supremacy of the frequentist approaches in a wide array of areas of application.

How do the approaches differ? In frequentist inference, tests of significance are performed by supposing that a hypothesis is true (the null hypothesis) and then computing the probability of observing a statistic at least as extreme as the one actually observed during hypothetical future repeated trials conditional on the parameters, that is, a p-value. Bayesian inference relies upon direct inferences about parameters or predictions conditional on the observations. In other words, frequentist statistics examines the probability of the data given a model (hypothesis) and looks at repeated sampling properties of a procedure, whereas Bayesian statistics examines the probability of a model given the observed data. Bayesian methodology relies largely upon Bayes's theorem for computing posterior probabilities and provides an internally consistent and coherent normative methodology; frequentist methodology has no such consistent normative framework. Freedman (1995) gave an overview of these philosophical positions, but largely from a frequentist perspective that is critical of the Bayesian normative approach.

The remainder of the article has the following structure. In the next section I give a summary of some of the most common and cogent criticisms of the Bayesian method, especially with regard to its use in a public context. Then in Section 3, through a series of examples, both historical and recent, I argue that Bayesian approaches with formal and informal assessments of priors and likelihood functions are well accepted and should become the norm in public settings. My examples include US election night forecasting, census-taking and small area estimation, studies reported to the US Food and Drug Administration, assessing global climate change, and measuring declines in disability among the elderly. I conclude with a brief summary of challenges facing broader implementation of Bayesian methods in public contexts.

I do not claim to be providing a comprehensive account of Bayesian applications but have merely attempted to illustrate their breadth. One area where Bayesian ideas have made serious inroads, both in theory and in actual practice, but which I do not discuss here, is the law (e.g., see Fienberg and Kadane, 1983; Donnelly, 2005; Taroni et al., 2006; Kadane, 2008). The present article includes a purposeful selection of references to guide the reader to some of the relevant recent Bayesian literature on applications in the domains mentioned, but the list is far from comprehensive and tends to emphasize work closest to my own.

2. THE ARGUMENTS FOR AND AGAINST THE USE OF BAYESIAN METHODS

Bayesian and frequentist inference in a nutshell: It is especially convenient for the present purposes to think about Bayes's theorem in terms of density functions. Let h(y|θ) denote the conditional density of the random variable Y given a parameter value θ in the parameter space Θ. Then we can go from the prior distribution for θ, g(θ), to that associated with θ given Y = y, g(θ|y), by

\[
g(\theta \mid y) = \frac{h(y \mid \theta)\, g(\theta)}{\sum_{\theta \in \Theta} h(y \mid \theta)\, g(\theta)} \tag{1}
\]

if θ has a discrete distribution, and by

\[
g(\theta \mid y) = \frac{h(y \mid \theta)\, g(\theta)}{\int_{\Theta} h(y \mid \theta)\, g(\theta)\, d\theta} \tag{2}
\]

if θ has a continuous distribution.

Bayesians make inferences about the parameters by looking directly at the posterior distribution g(θ|y) given the data y. Frequentists make inferences about θ indirectly by considering the repeated sampling properties of the distribution of the data y given the parameter θ, that is, through h(y|θ). Bayesians integrate out quantities not of direct substantive interest and then are able to make probabilistic inferences from marginal distributions. Most frequentists use some form of conditioning argument for inference purposes while others maximize likelihood functions. Frequentists distinguish between random variables and parameters, which they take to be fixed, and this leads to linear mixed models where some of the effects are fixed, that is, are parameters, and some are random variables. For a Bayesian all linear models are in essence random effects models, since parameters are themselves considered as random variables. Thus it is natural for a Bayesian to consider them to be independent draws from a common distribution, g(θ), that is, treating them as exchangeable following the original argument of de Finetti (1937).

[...] of the parameter, or those that are “informationless.” Berger (2006) and Goldstein (2006) presented arguments in favor of the objective and subjective Bayesian approaches in a forum followed by extensive discussion. For a discussion of the fruitlessness of the search for an objective and informationless prior, see the article by Fienberg (2006b).

There are a number of other features associated with the subjective approach, including the elicitation of information for the formulation of prior distributions and the use of exchangeability in the development of successive layers of hierarchical models. A number of the examples described in the sections that follow utilize subjective Bayesian features, although not always with full elicitation.

One characteristic of Bayesian inference that weakens this criticism of the reliance on the prior dis-
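
As a small numerical illustration of equation (1), the following sketch updates a discrete prior g(θ) to the posterior g(θ|y). It is not taken from the article: the binomial form chosen for h(y|θ), the three candidate values of θ, the uniform prior, the data (7 successes in 10 trials) and the function names are all assumptions made only for illustration.

```python
from math import comb


def binomial_likelihood(y, n, theta):
    """h(y | theta): probability of y successes in n trials (assumed model)."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


def discrete_posterior(prior, y, n):
    """Apply equation (1) to a discrete prior.

    `prior` maps each candidate theta to its prior probability g(theta).
    Returns a dict mapping each theta to g(theta | y).
    """
    # Numerator of equation (1) for every theta in the parameter space.
    joint = {theta: binomial_likelihood(y, n, theta) * g for theta, g in prior.items()}
    # Denominator of equation (1): the sum of the numerators over all theta.
    evidence = sum(joint.values())
    return {theta: value / evidence for theta, value in joint.items()}


# Hypothetical inputs: three candidate values of theta with a uniform prior,
# and 7 successes observed in 10 trials.
prior = {0.25: 1 / 3, 0.50: 1 / 3, 0.75: 1 / 3}
posterior = discrete_posterior(prior, y=7, n=10)
posterior_mean = sum(theta * p for theta, p in posterior.items())

for theta, p in sorted(posterior.items()):
    print(f"theta={theta:.2f}  prior={prior[theta]:.3f}  posterior={p:.3f}")
print(f"posterior mean of theta: {posterior_mean:.3f}")
```

With these inputs the posterior places most of its mass on θ = 0.75, the candidate value closest to the observed proportion. The continuous case in equation (2) has the same structure, with the sum in the denominator replaced by an integral over Θ.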
