arXiv:1108.2177v1 [stat.ME] 10 Aug 2011

Statistical Science
2011, Vol. 26, No. 2, 212–226
DOI: 10.1214/10-STS331
© Institute of Mathematical Statistics, 2011

BAYESIAN MODELS AND METHODS IN PUBLIC POLICY AND GOVERNMENT SETTINGS¹

Stephen E. Fienberg

Stephen E. Fienberg is Maurice Falk University Professor, Department of Statistics, Machine Learning Department, Heinz College, and Cylab, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213-3890, USA (e-mail: fi[email protected]; URL: http://www.stat.cmu.edu/fienberg/).

¹Discussed in 10.1214/11-STS331A, 10.1214/11-STS331B and 10.1214/11-STS331C; rejoinder at 10.1214/11-STS331REJ.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2011, Vol. 26, No. 2, 212–226. This reprint differs from the original in pagination and typographic detail.

Abstract. Starting with the neo-Bayesian revival of the 1950s, many argued that it was inappropriate to use Bayesian methods, and in particular subjective Bayesian methods, in government and public policy settings because of their reliance upon prior distributions. But because the Bayesian framework often provides the primary way to respond to the questions raised in these settings, the number and diversity of Bayesian applications have grown dramatically in recent years. Through a series of examples, both historical and recent, we argue that Bayesian approaches with formal and informal assessments of priors AND likelihood functions are well accepted and should become the norm in public settings. Our examples include US election night forecasting, census-taking and small area estimation, studies reported to the US Food and Drug Administration, assessing global climate change, and measuring potential declines in disability among the elderly.

Key words and phrases: Census adjustment, confidentiality, disability measurement, election night forecasting, Bayesian clinical drug studies, global warming, small area estimation.

1. INTRODUCTION AND HISTORY

Beginning with the posthumous publication in 1763 of the essay attributed to the Rev. Thomas Bayes, and continuing well into the twentieth century, virtually the only approach to statistical inference was the method of inverse probability based on Bayes's theorem (see, e.g., Fienberg, 2006a). Nonetheless, most applications of statistical methods in governmental settings were based primarily on descriptive statistics, and there was little public debate regarding the relevance of Bayesian approaches despite efforts at implementation, for example, Laplace's development of ratio estimation to estimate the size of the population of France.

Criticism of the method of inverse probability, as Bayesian methodology was known for almost 200 years, began in the mid-19th century with the rise of a philosophical school advocating an objective view of probability. The fundamental concern of the objectivists was the requirement for a prior distribution, and they argued for a frequentist view of probability. Unfortunately, they failed to present a methodology for inference to counter that of inverse probability, and it was not until the work of R. A. Fisher in the 1920s and that of Jerzy Neyman and Egon Pearson that serious alternative statistical procedures were in place. Neyman's (1934) critique of Gini's version of the representative method for survey taking not only ushered the frequentist perspective into the realm of official statistics, but also introduced the frequentist tool of confidence intervals and its long-run repeated sampling interpretation (see Fienberg and Tanur, 1996).

Bayesian tools played an important role in a number of statistical efforts during World War II, including Alan Turing's work at Bletchley Park, England, to crack the Enigma code. But with the creation of such frequentist methods as sequential analysis by Barnard in England and Wald in the United States, and the elaboration of design-based analyses in sample surveys, as statistics passed the mid-century mark, frequentist approaches were in the ascendancy in the public arena. This was especially true in statistical agencies, where the ideas of random selection of samples and repeated sampling as the basis of inference were synonymous, and statistical models and likelihood-based methods were frowned upon at best.

With the introduction of computers for statistical calculations in the 1960s, however, Bayesian methods began a slow but prolonged comeback that accelerated substantially with the introduction of Markov chain Monte Carlo (MCMC) methods in the early 1990s. Today Bayesian methods are challenging the supremacy of the frequentist approaches in a wide array of areas of application.

How do the approaches differ? In frequentist inference, tests of significance are performed by supposing that a hypothesis is true (the null hypothesis) and then computing the probability of observing a statistic at least as extreme as the one actually observed during hypothetical future repeated trials, conditional on the null hypothesis, that is, a p-value. Bayesian inference relies upon direct probability statements about parameters or hypotheses, conditional on the observations. In other words, frequentist statistics examines the probability of the data given a model (hypothesis) and looks at repeated sampling properties of a procedure, whereas Bayesian statistics examines the probability of a model given the observed data. Bayesian methodology relies largely upon Bayes's theorem for computing posterior probabilities and provides an internally consistent and coherent normative methodology; frequentist methodology has no such consistent normative framework. Freedman (1995) gave an overview of these philosophical positions, but largely from a frequentist perspective that is critical of the Bayesian normative approach.

The remainder of the article has the following structure. In the next section I give a summary of some of the most common and cogent criticisms of the Bayesian method, especially with regard to its use in a public context. Then in Section 3, through a series of examples, both historical and recent, I argue that Bayesian approaches with formal and informal assessments of priors and likelihood functions are well accepted and should become the norm in public settings. My examples include US election night forecasting, census-taking and small area estimation, studies reported to the US Food and Drug Administration, assessing global climate change, and measuring declines in disability among the elderly. We conclude with a brief summary of challenges facing broader implementation of Bayesian methods in public contexts.

I do not claim to be providing a comprehensive account of Bayesian applications but have merely attempted to illustrate their breadth. One area where Bayesian ideas have made serious inroads, both in theory and in actual practice, but which we do not discuss here, is the law (e.g., see Fienberg and Kadane, 1983; Donnelly, 2005; Taroni et al., 2006; Kadane, 2008). The present article includes a purposeful selection of references to guide the reader to some of the relevant recent Bayesian literature on applications in the domains mentioned, but the list is far from comprehensive and tends to emphasize work closest to my own.

2. THE ARGUMENTS FOR AND AGAINST THE USE OF BAYESIAN METHODS

Bayesian and frequentist inference in a nutshell: It is especially convenient for the present purposes to think about Bayes's theorem in terms of density functions. Let h(y|θ) denote the conditional density of the random variable Y given a value θ in the parameter space Θ. Then we can go from the prior distribution for θ, g(θ), to that associated with θ given Y = y, g(θ|y), by

(1)  g(θ|y) = h(y|θ)g(θ) / Σ_{θ∈Θ} h(y|θ)g(θ)

if θ has a discrete distribution, and

(2)  g(θ|y) = h(y|θ)g(θ) / ∫_Θ h(y|θ)g(θ) dθ

if θ has a continuous distribution.
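The discrete form (1) can be sketched numerically. In the minimal example below, the two-point parameter space and all of the prior and likelihood values are invented purely for illustration:

```python
# Discrete Bayes rule, equation (1): the posterior is likelihood times prior,
# renormalized. The two-point parameter space and all numbers are invented
# purely for illustration.

def posterior(prior, likelihood):
    """prior: {theta: g(theta)}; likelihood: {theta: h(y|theta)} at the observed y."""
    joint = {t: likelihood[t] * prior[t] for t in prior}
    norm = sum(joint.values())  # the normalizing constant in the denominator of (1)
    return {t: v / norm for t, v in joint.items()}

prior = {"theta0": 0.5, "theta1": 0.5}       # g(theta)
likelihood = {"theta0": 0.2, "theta1": 0.6}  # h(y|theta) for the observed y
post = posterior(prior, likelihood)
# post["theta1"] = (0.6 * 0.5) / (0.2 * 0.5 + 0.6 * 0.5) = 0.75
```

The continuous form (2) replaces the sum in the denominator with an integral over Θ, but the logic is identical.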
Bayesians make inferences about the parameters by looking directly at the posterior distribution g(θ|y) given the data y. Frequentists make inferences about θ indirectly, by considering the repeated sampling properties of the distribution of the data y given the parameter θ, that is, through h(y|θ). Bayesians integrate out quantities not of direct substantive interest and then are able to make probabilistic inferences from marginal distributions. Most frequentists use some form of conditioning argument for inference purposes, while others maximize likelihood functions. Frequentists distinguish between random variables and parameters, which they take to be fixed, and this leads to linear mixed models where some of the effects are fixed, that is, are parameters, and some are random variables. For a Bayesian, all linear models are in essence random effects models, since parameters are themselves considered as random variables. Thus it is natural for a Bayesian to consider them to be independent draws from a common distribution, g(θ), that is, treating them as exchangeable, following the original argument of de Finetti (1937). This approach leads naturally to putting distributions on the parameters of prior distributions and to what we now call the hierarchical Bayesian model. It is the normalizing constants [the denominators of (1) and (2)] that are notoriously difficult to compute, and this fact has led, in large part, to the use of MCMC methods such as Gibbs sampling that involve sampling from the posterior distribution.

A reviewer of an earlier version of this article suggested that hierarchical models are really not Bayesian, unless one puts a prior at the top level of the hierarchy. This ignores history. As Good (1965) noted, his own use of such ideas draws on work dating back at least to the 1920s and the work of W. E. Johnson, whose "sufficientness" postulate implicitly used finite exchangeable sequences. And while non-Bayesians came to recognize the power of such structures many decades later, they did attempt to emulate the Bayesian approach, but of course without the clean Bayesian probabilistic interpretation.

Critique of the Bayesian perspective: The most common criticism of Bayesian methods is that, since there is no single correct prior distribution, g(θ), all conclusions drawn from the posterior distribution are suspect. One counter to this argument is that published analyses using Bayesian methods should consider and report the results associated with a variety of prior distributions, thus allowing the reader to see the effects of different prior beliefs on the posterior distribution of a parameter. Others argue that one should choose as a prior distribution one that in some sense eliminates personal subjectivity. Examples of such "objective" priors are those that are uniform or diffuse across all possible values of the parameter, or those that are "information-less." Berger (2006) and Goldstein (2006) presented arguments in favor of the objective and subjective Bayesian approaches in a forum followed by extensive discussion. For a discussion of the fruitlessness of the search for an objective and informationless prior, see the article by Fienberg (2006b).

There are a number of other features associated with the subjective approach, including the elicitation of information for the formulation of prior distributions and the use of exchangeability in the development of successive layers of hierarchical models. A number of the examples described in the sections that follow utilize subjective Bayesian features, although not always with full elicitation.

One characteristic of Bayesian inference that weakens this criticism of the reliance on the prior distribution is that the more data we collect, the less influence the prior distribution has on the posterior distribution relative to that of the data. There are situations, however, where even an infinite amount of data may not bring two people into agreement (see, e.g., Diaconis and Freedman, 1986).

Another aspect of the Bayesian methodology that arises in many applications is the manner in which it "borrows strength" when we are estimating many parameters simultaneously, especially through the use of hierarchical models. This feature, which is usually viewed as a virtue, has also been the focal point of criticism by frequentists. For example, see the commentary by Freedman and Navidi (1986) in the context of census adjustment, in which they critiqued a Bayesian methodology at least in part because it resulted in the use of data from one state to adjust the census-based population figures in other ones. Today, borrowing strength via cross-area regression models is common in frequentist circles, and the Freedman–Navidi argument thus takes on a nonstatistical legal issue rather than a statistical one.

For an interesting dialog on different frequentist perspectives related to Bayesian ideas, see the discussion paper by a group of frequentist statisticians at Groningen University in The Netherlands, Kardaun et al. (2003), which was a response to a series of questions posed following a lecture at Groningen. As someone else has noted, it is a rare occasion where frequentists seriously entertain ideas such as those extolled by de Finetti (1937) and attempt to reject them. A number of the questions discussed in this article arise in the context of the examples that follow.
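The hierarchical "borrowing strength" discussed above can be illustrated with the simplest normal-normal model, in which each unit's posterior mean is a precision-weighted compromise between its own data and a common prior mean. The values of μ, τ² and the data below are hypothetical:

```python
# Posterior mean in the simplest normal-normal hierarchical model:
# theta_i ~ N(mu, tau2) a priori, and ybar_i | theta_i ~ N(theta_i, sigma2_i).
# Each observed mean is shrunk toward mu, with more shrinkage the noisier the
# observation. mu, tau2 and the data are hypothetical; a fully Bayesian
# analysis would put prior distributions on mu and tau2 as well.

def shrink(ybar, sigma2, mu, tau2):
    w = tau2 / (tau2 + sigma2)      # weight on the unit's own data
    return w * ybar + (1 - w) * mu  # posterior mean E[theta_i | ybar_i]

mu, tau2 = 10.0, 1.0
estimates = [shrink(y, s2, mu, tau2) for y, s2 in [(14.0, 1.0), (6.0, 3.0)]]
# (14.0, sigma2 = 1.0): w = 0.5  -> 12.0
# (6.0,  sigma2 = 3.0): w = 0.25 ->  9.0
```

The noisier second observation is pulled three-quarters of the way toward the common mean, which is exactly the pooling across units that frequentist critics of census adjustment objected to.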

3. SMALL AREA ESTIMATION AND CENSUS ADJUSTMENT

Small area estimation: As we have already intimated, small area estimation has been a ripe area for Bayesian methods, although because so much of the literature has been oriented toward national statistical agency problems, the area is dominated by frequentist techniques and assessments. Surveys conducted by national statistical agencies typically generate "reliable" information either at national or regional levels. But the demand for information at lower levels of disaggregation is sufficiently great, and resources tend to be relatively scarce, so that techniques that compensate for the sparsity of data at the lower level of disaggregation with data from other sources or from other areas or domains are essential to getting estimates with relatively small standard errors.

The big question is with respect to what distribution the standard errors are computed. There are three different answers, depending on one's perspective. Sampling statisticians most often wish to take expectations with respect to the random structure in the sampling design. At the other extreme are Bayesians, for whom the variability is an inherent part of the stochastic model structure for the phenomenon of interest, for example, unemployment or crime. And in the middle are model-based likelihood statisticians. My argument is that in the context of small area estimation the design-based statisticians were singularly unsuccessful until they emulated Bayesian ideas of smoothing and borrowing strength, but even then they have insisted on averaging with respect to the sampling design, with arguments about robustness of results.

Jiang and Lahiri (2006) suggested that the problem goes back almost a millennium to the eleventh century, but interest in formal statistical estimation for small areas is a relatively recent phenomenon, and much of the recent literature can be traced to a seminal article by Fay and Herriot (1979), who used the James–Stein "shrinkage" estimation ideas to carry out small area estimation in a frequentist manner. Given the close relationship between such techniques and empirical Bayesian estimation (e.g., see Efron and Morris, 1973) and mixed linear models, it is a relatively small leap to the use of fully Bayesian methodology. But the evolution toward such methodology documented by Jiang and Lahiri has been relatively slow and marked by a general resistance in statistical agencies to use models to begin with, let alone Bayesian formulations; for example, see the descriptions of small area estimation methodology in the book by Rao (2003), and contrast them with the Bayesian hierarchical formulations in the work of Ballin, Scanu and Vicard (2005) and Trevisani and Torelli (2004).

Census adjustment: What is remarkable about the ascendancy of the small area estimation methodology in the United States is that many of those who argued for its use opposed the use of essentially the same ideas for census adjustment for differential undercount in the 1980s and 1990s. The basic component of census adjustment in these debates was the use of the now standard capture-recapture methodology for population estimation (e.g., see Bishop, Fienberg and Holland, 1975, Chapter 6), methodology that has its roots in Laplace's method of ratio estimation. Because a second count (the recapture) in a census context cannot reasonably be done for the nation as a whole, methods that utilize a sample of individuals were introduced in 1950, and to get small area estimates of population, that is, for every block in the nation, Ericksen and Kadane (1985) proposed the use of a Bayesian regression model for smoothing. Being fully Bayesian was especially important because of the sparseness of the data at their disposal for adjustment, based on a sample from the Current Population Survey. As we noted above, Freedman and Navidi (1986) opposed the use of this methodology, as did Fay and Herriot's colleagues at the US Census Bureau, at least in part because of its use of models with unverifiable assumptions, and precisely because the approach embedded in the methodology borrowed strength across state boundaries to get sufficiently tight estimates of error.

Ericksen, Kadane and Tukey (1989) presented a more refined version of the technical arguments, looking back to the 1980 census as well as ahead to the 1990 census. For the 1990 census, the US Census Bureau essentially proposed the use of a frequentist approach that had similar structure, at least in spirit, to that proposed for 1980, and this was possible only by increasing the size of the sample used for adjustment purposes by an order of magnitude. This plan was opposed largely on political grounds, as well as by Freedman and colleagues, who continued to object to the role of statistical models in the estimation procedure. A similar controversy ensued as planning for the 2000 census progressed, with components for adjustment as well as sampling for nonresponse followup, and ultimately the Supreme Court stepped in and interpreted the Census Act as banning the use of sampling for this purpose.
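The capture-recapture logic at the core of these adjustment debates reduces, in its simplest form, to the dual-system (Lincoln–Petersen) estimator. The counts below are hypothetical, and actual census adjustment stratifies and smooths such estimates across areas rather than using them raw:

```python
# Dual-system (capture-recapture) population estimate in its simplest form:
# n1 people are counted in the census, n2 in an independent recount sample,
# and m are matched in both lists; N_hat = n1 * n2 / m. The counts below are
# hypothetical.

def dual_system_estimate(n1, n2, m):
    if m == 0:
        raise ValueError("no matched individuals: estimate undefined")
    return n1 * n2 / m

n_hat = dual_system_estimate(n1=900, n2=800, m=720)
# 900 * 800 / 720 = 1000.0, implying a census undercount of about 100 people
```

The estimator assumes independent lists and perfect matching; it is the failure of such assumptions in small areas that motivates the Bayesian smoothing models discussed above.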

Anderson and Fienberg (1999) and Anderson et al. (2000) provided extensive details on the 1990 and 2000 adjustment controversies. While American politicians have eschewed the use of Bayesian and non-Bayesian adjustment techniques, statistical agencies in several other countries, such as Argentina, Australia and the United Kingdom, have implemented similar methodology, although with little emphasis on its Bayesian motivation.

4. ELECTION NIGHT FORECASTING

In the United States the use of statistical forecasting of election outcomes based on early reported returns began in the early 1950s. The CBS television network employed one of the early computers, the UNIVAC, and Max Woodbury developed a regression-style model that was used successfully to predict the outcome of the 1952 presidential election. By 1960, computers had become a major tool of the US television networks in support of their election night coverage. Everything was based in some form or another on the 150,000+ precincts where votes were cast across the US, and attention focused on subsets of "key" precincts, chosen in different ways by the three major networks, and on early access to precinct results. The following description draws upon that in the article by Fienberg (2007).

In 1960, the RCA Corporation, which owned the NBC television network, hired CEIR, a statistical consulting firm, to develop a rapid election night projection procedure. CEIR consultants included Max Woodbury and a number of others, including John Tukey. Computers were still large, expensive and slow, and much of what Max Woodbury had done for CBS still had to be done by hand. Data of several types were available: past history (at various levels, e.g., county), results of polls preceding the election, political scientists' predictions, partial county returns flowing in during the evening, and complete results for selected precincts. The data of the analyses were, in many cases, swings from sets of base values derived from past results and from political scientists' opinions. It turned out that the important problem of projecting turnout was more difficult than that of projecting candidate percentage.

Starting with the 1962 congressional election, Tukey assembled a statistical team to develop the required methodology and to analyze the results as they flowed in on election night. Early members of the team included Bob Abelson, David Brillinger, Dick Link, John Mauchly and David Wallace, who joined for the 1964 primaries. From 1962 through 1966, they were consultants to RCA, and they interacted with the political scientist and one-time Census Bureau official Richard Scammon, who had his own methodology using a collection of key precinct results.

David Brillinger (2002) recalled: "Tukey sought 'improved' estimates. His terminology was that the problem was one of 'borrowing strength'." There is a remarkably close resemblance between this methodology and that used for small area estimation. The novel feature in the election night context comes from the nature of the sparsity, because estimation was based on early reported returns. The methodology is now recognizable as hierarchical Bayesian, with the use of empirical Bayesian techniques at the top level. Data flowed in with observations at the precinct (polling place) level and were aggregated to county level, and then to the state as a whole. Subjective judgment was used in the choice of the subsets of "key" precincts, and prior distributions were typically based on the results of prior state elections, with the choice being made subjectively to capture the political scientists' best judgment about which past election most closely resembled the election at hand. As early returns arrived at the computing central command facility, a team of statisticians reviewed the actual distribution of early returns across the state to check for anomalies in light of special circumstances and political practices.

And the estimates that really mattered were those at the state level, since the model was used for statewide elections for governor and senate positions as well as for presidential elections, where state outcomes play a crucial role. Two models were used: one for projecting turnout and the other for projecting the actual percentage difference ("swing") between Democratic and Republican candidates. The occasional rise of serious independent candidates led to model extensions and empirical complications.

Brillinger went on to note: "Jargon was developed; for example, there were 'barometric' and 'swing-o-metric' precinct samples. The procedures developed can be described as an early example of empirical Bayes. The uncertainties, developed on a different basis, were just as important as the point estimates." The calculations appeared nowhere in the statistical literature, and thus they had to be derived and verified by members of the team. This was at about the same time as David Wallace was working with Frederick Mosteller on their landmark Bayesian study of The Federalist Papers, which was published in 1964. Tukey's attitude to release of the techniques developed is worth commenting on. Brillinger recounted how, on various occasions, members of his "team" were asked to give talks and write papers describing the work. When Tukey's permission was sought, his remark was invariably that it was "too soon" and that the techniques were "proprietary" to RCA and NBC. With Tukey's death in 2000, we may well have lost the opportunity to learn all of the technical details of the work done 40 years earlier.

Tukey's students and his collaborators began to use related ideas on "borrowing strength," for example, in the National Halothane Study of anesthetics (Bunker et al., 1969) and for the analysis of contingency table data (e.g., see Bishop, Fienberg and Holland, 1975). All of this came before the methodology was described in somewhat different form by I. J. Good in his 1965 book and christened as "hierarchical Bayes" in the classic 1972 paper by Dennis Lindley and Adrian Smith. The specific version of hierarchical Bayes in the election night model remained unpublished, although in an ironic twist, something close to it appeared in papers written by one of David Wallace's former students, Alastair Scott, and a colleague, Fred Smith (1969, 1971), who were unaware of any of the details of Wallace's work for NBC and who developed their approach for different purposes! Several other hierarchical Bayesian election night forecasting models have since been used in other countries; for example, see the work of Brown, Firth and Payne (1997) and Bernardo and Girón (1992).

The methods described here were in use at NBC through the 1980 presidential elections. Other networks used different methodology, and the statisticians who worked on the Tukey team were quite proud of their record of earlier and more accurate calls of winners than those made by the other networks, especially in close elections. With Reagan's landslide presidential victory in 1980, the results were seemingly better captured by exit polls, and from 1982 onward NBC switched to the use of exit polls, first in competition and then in collaboration with the other television networks. See the article by Fienberg (2007) for further details and a number of the recent controversies regarding exit poll forecasting and reporting.

5. BAYESIAN METHODOLOGY AND THE US FOOD AND DRUG ADMINISTRATION

Traditional randomized clinical trials, evaluated with frequentist methodology, have long been viewed as the bedrock of the drug and device approval system at the US Food and Drug Administration (FDA). Over the past couple of decades, the drug companies and some members of the US Congress have been critical of the lengthy FDA review processes that have resulted, and of the enormous expense associated with bringing drugs and medical devices to market. The statistical literature has also produced Bayesian randomized design alternatives (e.g., see Spiegelhalter, Freedman and Parmar, 1994; Berry, 1991, 1993, 1997; Berry and Stangl, 1996; Simon, 1999), as well as ethical critiques of traditional frequentist trials (e.g., see Kadane, 1996). Aside from the actual interpretation of the outcomes in a Bayesian framework, these and other authors have argued that the Bayesian approach can provide faster and more useful information in a wide variety of circumstances in comparison with frequentist methodology.

Bayesian designs and analyses are part of an increasing number of premarket submissions to FDA's Center for Devices and Radiological Health (CDRH). This initiative, which began in the late 1990s, takes advantage of good prior information on safety and effectiveness that is often available for studies of the same or similar recent-generation devices. In 2006, CDRH issued draft guidelines for the use of Bayesian statistics in clinical trials for medical devices (FDA, 2006), and these were finalized in 2010 (FDA, 2010). Previous regulatory guidelines had mentioned Bayesian methods briefly, but this was the first broadly circulated specific document focusing on Bayesian methodologies. The guidelines do, however, place considerable onus on the drug companies who wish to present Bayesian studies, largely because of justifiable concerns over selective use of data from within studies and the reporting of results.

As the guidelines make clear, Bayesian formulations and methods can improve the assessment of new drugs and devices by incorporating expert opinion and the results of prior investigations, both randomized and observational studies, and by synthesizing results across concurrent studies. There are sections that emphasize the importance of hierarchical models and the different roles for exchangeability, for example, among patients within trials and among trials. We quote from the final guidelines on the role of prior information (pages 22–23):

   We recommend you identify as many sources of good prior information as possible. The evaluation of "goodness" of the prior information is subjective. Because your trial will be conducted with the goal of FDA approval of a medical device, you should present and discuss your choice of prior information with FDA reviewers (clinical and statistical) before your study begins.

   Possible sources of prior information include:
   • clinical trials conducted overseas,
   • patient registries,
   • clinical data on very similar products,
   • pilot studies.

The guidelines go on:

   Prior distributions based directly on data from other studies are the easiest to evaluate. While we recognize that two studies are never exactly alike, we nonetheless recommend the studies used to construct the prior be similar to the current study in the following aspects:
   • protocol (endpoints, target population, etc.), and
   • time frame of the data (e.g., to ensure that the practice of medicine and the study populations are comparable).

   In some circumstances, it may be helpful if the studies are also similar in investigators and sites. Include studies that are favorable and nonfavorable. Including only favorable studies creates bias. Bias based on study selection may be evaluated by:
   • the representativeness of the studies that are included, and
   • the reasons for including or excluding each study.

   Prior distributions based on expert opinion rather than data can be problematic. Approval of a device could be delayed or jeopardized if FDA advisory panel members or other clinical evaluators do not agree with the opinions used to generate the prior.

The FDA guidelines include examples of Bayesian studies that have met agency review standards. Two examples are:

Example 1 (T-Scan).2 T-scan 2000 is a device to be used as an adjunct to mammography for patients with equivocal results. The FDA was presented with an "intended-use" study of 74 consecutive biopsies in Italy. The company combined the results with those from a prospective double-blind study at seven centers that compared T-scan to T-scan plus mammography for 504 patients, and the results from a "targeted" study of 657 biopsy cases at two centers in Israel, using a Bayesian multinomial logistic model. It was able to demonstrate effectiveness in the intended-use context, where there was otherwise insufficient information to demonstrate effectiveness. The prior was chosen to smooth the zero counts but to be relatively diffuse. The device was approved for this use as a consequence in 1999.

Example 2 (Inter Fix).3 Inter Fix is an implant device for a spinal fusion procedure for patients with degenerative disc disease and back pain. There were data available for 139 patients in a randomized clinical trial, with 77 treated and 62 controls. There were also 104 nonrandomized subjects treated. An interim analysis was performed based on a Bayesian predictive model for the future success rate of the device, although most of the other analyses reported appear to be frequentist in nature. The device was approved in 1999 as well.

2 http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfTopic/pma/pma.cfm?num=p970033.
3 http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfTopic/pma/pma.cfm?num=p970015.

CDRH statisticians have been exploring and lecturing on important lessons learned in the course of the Bayesian initiative for the design, conduct and analysis of medical device studies such as the two outlined here.

Although the two studies described above made use of the pooling of data, in many ways the key benefit of Bayesian methods is the ability it offers to change a study's course when the welfare of subjects is at stake, using what is known as adaptive design. As Don Berry has argued:

    In a multiyear frequentist study, new patients will have the same chance of being enrolled in either group, regardless of whether the new or old drug is performing better. This approach can put patients at a disadvantage. A Bayesian model, on the other hand, can periodically show researchers that one arm is outperforming the other and then put more new volunteers into the better arm. (Don Berry quoted in Beckman, 2006)

As is the case in other applications, at the FDA the main criticism of the Bayesian approach is the difficulty associated with the choice of the prior. Spiegelhalter, Freedman and Parmar (1994) stressed the use of different forms of priors such as reference priors, "clinical" priors, "skeptical" priors and enthusiastic priors. The FDA guidelines clearly argue against "subjective" expert opinion, but as we know from other settings the likelihood is often at least as subjective as is the prior, and hierarchical Bayesian structures impose substantial constraints on the prior and thus the posterior even when one uses "diffuse" distributions on the parameters at the highest levels of the hierarchy! Moreover, when one is drawing upon previous studies, there is always an issue of how much "weight" these should receive in the prior, especially if the previous studies did not involve randomization as in Example 2.

Unfortunately, as these ideas move to other parts of the FDA they are not without controversy. While we were completing this article, a new controversy over a specific drug made news. Vasogen Inc. announced that on Friday, March 14, 2008 it had an initial teleconference with the FDA to discuss and clarify the recent FDA comments regarding the use of a Bayesian approach for ACCLAIM II, a clinical trial which is being planned to support an application for US market approval of the Celacade™ System for the treatment of patients with New York Heart Association Class II heart failure.4 Oversight of the drug approval had shifted from CDRH—which had issued the guidelines for use of Bayesian methods—to the FDA Center for Biologics Evaluation and Research (CBER), which has adopted a far more cautious approach. How such issues will work themselves out remains to be seen.

Another place at the FDA where Bayesian methodology has recently come into vogue is in the post-approval surveillance of drugs and devices, especially with regard to side effects. DuMouchel (1999) discussed hierarchical Bayesian models for analyzing a very large frequency table that cross-classifies adverse events by type of drug used. Madigan et al. (2010) described a more elaborate, large-scale approach to the analysis of adverse event data gathered via spontaneous reporting systems linked to claims databases.

It is worth noting that Bayesian methods have been used in innovative ways to study the combination of evidence across studies on matters directly before the FDA. On the advice of an expert panel, the FDA in 2004 put a "black-box" warning—its highest warning level—on antidepressants for pediatric use, especially among teenagers. The panel's advice was based not on actual suicides, but on indications that suicidal thoughts and behaviors increased in some children and teens taking newer selective serotonin reuptake inhibitor (SSRI)-type antidepressants. Kaizar et al. (2006) later addressed the combination of evidence using a hierarchical Bayesian meta-analytical approach. They concluded that the evidence supporting a causal link between SSRI-type antidepressant use and suicidality in children is weak. This will clearly be evidence that the FDA will need to consider when it next reviews this issue, as it surely will, because of subsequent observational studies that suggest teen suicides have increased considerably despite a substantial decrease in the use of antidepressants (e.g., see Gibbons et al., 2007).

Finally, we note the extensive applications of a range of Bayesian methods in the related matters of health technology assessment as described by Spiegelhalter et al. (2000) and Spiegelhalter (2004).

4 FDA deals blow to Vasogen's heart treatment, Reuters, March 3, 2008.

6. CONFIDENTIALITY AND THE RISK–UTILITY TRADE-OFF

Protecting the confidentiality of data provided by individuals and establishments has been and continues to be a major preoccupation of statistical agencies around the world. Over the past 30 years, statisticians within and outside a number of major agencies have worked to cast the confidentiality problem as a statistical one, and over the past decade this effort has taken on substantial Bayesian overtones as the focus has shifted to the trade-off between the risk associated with protection of confidentiality and the utility of databases for different kinds of statistical analyses. See the articles in the book edited by Doyle, Lane, Theeuwes and Zayatz (2001) for a broad review of the literature as it stood about a decade ago.

Some of the earlier confidentiality literature focused on the protection of data against intruders or "data snoopers," and Fienberg, Makov and Sanil (1997) proposed modeling intruder behavior (and thus protection against it) using a subjective Bayesian model; cf. the discussion of Bayesian "matching" methods in the book by D'Orazio, Di Zio and Scanu (2006). In 2001, Duncan et al. suggested a Bayesian approach to the risk–utility trade-off problem, which was later generalized in the context of a formal statistical model by Trottini and Fienberg (2002) and implemented in illustrative form by Dobra, Fienberg and Trottini (2003) in the context of protecting categorical databases.

More recently, Ting, Fienberg and Trottini (2008) contrasted their method of random orthogonal matrix masking with other microdata perturbation methods, such as additive noise, from the Bayesian perspective of the trade-off between disclosure risk and data utility. This work has yet to be adopted by statistical agencies, but related Bayesian modeling in the same spirit by Franconi and Stander (2002), Polettini and Stander (2004), Rinott and Shlomo (2007) and Forster and Webb (2007) has been done in close collaboration with those in agencies in Israel, Italy and the United Kingdom.

One other Bayesian approach to confidentiality protection which has already seen successful penetration into US statistical agencies is based on the multiple imputation approach due originally to Rubin and proposed by him for application in the context of protecting confidentiality in 1993. See the article by Fienberg, Makov and Steele (1998) for a related proposal. The basic idea is simple although the details of the implementation can be complex. We want to replace the actual confidential data by simulated data drawn from the posterior distribution of a model that captures the relationships among the variables to be released. Since these "sampled units" are synthetic and do not actually correspond to original sample members, proponents claim that the resulting data protect confidentiality by definition—others point out that synthetic people may be close enough to "real" sample members for there still to be problems of possible re-identification. The method of multiple imputation allows one to generate multiple synthetic (imputed) samples from the posterior and to use these samples to produce estimates of variability that have a frequentist interpretation. Raghunathan, Reiter and Rubin (2003) and authors of a number of subsequent articles described the formalisms of the methodology as well as extensions involving only partially imputed data. Because statistical agencies in the US were already experimenting with multiple imputation to deal with missing value problems, a number of them have recently experimented with this technology for confidentiality protection as well. Since the methodology works for fairly general classes of prior distributions it could utilize, at least in principle, prior information from multiple sources as well as expert judgment.

7. CLIMATE CHANGE AND ITS ABATEMENT

By now there is hardly a literate person who has not heard about global warming and the dire consequences predicted if we do not change our behavior regarding the emission of greenhouse gases and aerosols. The following statements are typical and come from a report to the US Senate by Thomas Karl (2001), a senior official in the National Oceanic and Atmospheric Administration:

• The natural "greenhouse" effect is real, and is an essential component of the planet's climate process.
• Some greenhouse gases are increasing in the atmosphere because of human activities and increasingly trapping more heat.
• The increase in heat-trapping greenhouse gases due to human activities is projected to be amplified by feedback effects, such as changes in water vapor, snow cover and sea ice.
• Particles (or aerosols) in the atmosphere resulting from human activities can also affect climate.
• There is a growing set of observations that yields a collective picture of a warming world over the past century.
• It is likely that the frequency of heavy and extreme precipitation events has increased as global temperatures have risen.
• Scenarios of future human activities indicate continued changes in atmospheric composition throughout the 21st century.

These and similar conclusions have been shared with the public by the Intergovernmental Panel on Climate Change (IPCC) and the US National Academy of Sciences–National Research Council through a series of committee reports. Many of the statements are backed up by elaborate statistical assessments and modeling, and over the past decade this work has taken on an increasingly Bayesian flavor. There have also been challenges to many of these statements, despite what the "global warming" proponents describe as increasingly strong empirical support. See, for example, the report by Wegman, Scott and Said (2006) for a statistical critique of some recent modeling efforts.

In Figure 1 we reproduce an example of the temperature reconstruction for the past 2000 years based on multiple sources prepared by a panel from the National Research Council (2006); see also National Academy of Sciences (2008). One thing that is obvious from this figure is the convergence of the data sources for the past 150 years, from the start of the industrial revolution, showing temperatures increasing substantially throughout recent times—this is global warming! What is also clear is the uncertainty associated with these reconstructions going back further in time—this is indicated by the shading in the background of the figure, with darkness associated with greater uncertainty; cf. the article by Chu (2005).

The precise trajectory of the recent increases in temperature clearly has substantial uncertainty across the data sources and models, and it would surprise few of us to learn that projections from these data can vary dramatically. This has recently been the focus of intensive Bayesian analysis by a number of authors around the world; see, for example, the articles by Min and Hense (2006, 2007), and especially work in the United States by Berliner, Levine and Shea (2000), Tebaldi et al. (2005) and Sanso, Forest and Zantedeschi (2008).

Tebaldi, Smith and Sansó (2010) described a way to combine an ensemble of computer simulation model results and projections and actual observations via hierarchical modeling in order to derive posterior probabilities of temperature and precipitation change at regional scale. They considered the ensemble of computer models as being drawn from a superpopulation of such models, and used hierarchical Bayesian models to combine results and compute the posterior predictive distribution for a new climate model's projections along with the uncertainty to be associated with them. For a related discussion about assessing the uncertainties of projections, see the article by Chandler, Rougier and Collins (2010).

Whether in the context of this work, or in many other efforts to forecast future temperatures, Bayesian and non-Bayesian, almost all modeling efforts agree that temperatures will continue to rise. Where the principal disagreements come in is "by how much" and "what would be the impact of various strategies for abatement."

It is worth noting that subjective Bayesian methods were proposed for use in climate modeling as early as 1997 by Hobbs, and the prominence of Bayesian arguments is due not only to statisticians working in this area but also to climate modeling specialists such as Schneider (2002), who has noted:

    For three decades, I have been debating alternative solutions for sustainable development with thousands of fellow scientists and policy analysts—exchanges carried out in myriad articles and formal meetings. Despite all that, I readily confess a lingering frustration: uncertainties so infuse the issue of climate change that it is still impossible to rule out either mild or catastrophic outcomes, let alone provide confident probabilities for all the claims and counterclaims made about environmental problems.

    Even the most credible international assessment body, the Intergovernmental Panel on Climate Change (IPCC), has refused to attempt subjective probabilistic estimates of future temperatures. This has forced politicians to make their own guesses about the likelihood of various degrees of global warming. Will temperatures in 2100 increase by 1.4 degrees Celsius or by 5.8? The difference means relatively adaptable changes or very damaging ones. . .

    So what then is "the real state of the world"? Clearly, it isn't knowable in traditional statistical terms, even though subjective estimates can be responsibly offered. The ranges presented by the IPCC in its peer-reviewed reports give the best snapshot of the real state of climate change: we could be lucky and see a mild effect or unlucky and get the catastrophic outcomes.

The IPCC assessment builds on formal and informal use of subjective assessments of the evidence.
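The superpopulation view taken by Tebaldi, Smith and Sansó can be caricatured in a few lines: treat each ensemble member's projected temperature change as a draw from a common normal distribution and compute the posterior for the underlying mean change. The conjugate normal–normal setup and every number below are illustrative assumptions, not their actual hierarchical model:

```python
# Caricature of the ensemble-as-superpopulation idea: each climate model's
# projected temperature change is treated as a draw from N(mu, sigma2), with
# a diffuse normal prior on the underlying mean change mu. All values are
# hypothetical; this is not the Tebaldi-Smith-Sanso model itself.
projections = [1.8, 2.4, 3.1, 2.0, 2.7]   # hypothetical projections (deg C)
sigma2 = 0.5 ** 2                          # assumed between-model variance
m0, v0 = 0.0, 10.0 ** 2                    # diffuse N(m0, v0) prior on mu

n = len(projections)
xbar = sum(projections) / n
v_post = 1.0 / (1.0 / v0 + n / sigma2)           # posterior variance of mu
m_post = v_post * (m0 / v0 + n * xbar / sigma2)  # posterior mean of mu

# With a diffuse prior, the posterior mean sits essentially at the
# ensemble mean, and the posterior variance shrinks like sigma2 / n.
assert abs(m_post - xbar) < 0.01
```

In this sketch the posterior predictive distribution for a new ensemble member's projection would be N(m_post, v_post + sigma2), which is the mechanism by which a hierarchical model attaches uncertainty to a new model's output; the full Tebaldi–Smith–Sansó analysis adds regional structure, model-specific variances and observational data on top of this borrowing-strength core.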

[Figure 1 series: Borehole temperatures (Huang et al., 2000); Glacier lengths (Oerlemans, 2005b); Multiproxy (Mann and Jones, 2003a); Multiproxy (Moberg et al., 2005a); Multiproxy (Hegerl et al., 2006); Tree rings (Esper et al., 2002a); Instrumental record (Jones et al., 2001).]

Fig. 1. Smoothed reconstructions of large-scale (Northern Hemisphere or global mean) surface temperature variations from six different research teams are shown along with the instrumental record of global mean surface temperature. Source: Figure S-1, National Research Council (2006), page 2. Reproduced with permission.

There is in fact now a tradition in this field of elicitation of expert judgments; for example, see the articles by Morgan and Keith (1995), Keith (1996) and Zickfeld et al. (2007).

8. DISABILITY AMONG THE ELDERLY

In the United States, there are no official government surveys of disability and how it is changing over time, but the National Institute on Aging (NIA) has funded, with support of other government agencies, two major longitudinal surveys that capture information on disability and link it to other data—the Health and Retirement Survey (HRS) and the National Long Term Care Survey (NLTCS). The original cohort for the NLTCS was surveyed in 1982 and there have been subsequent waves in 1984, 1989, 1994, 1999 and 2004. The NLTCS has been managed by a university-based organization since the late 1980s, but actual data collection has been carried out by the US Census Bureau.

Considerable interest in the NLTCS has focused on a series of measures of disability known as "Activities of Daily Living" (ADLs) and "Instrumental Activities of Daily Living" (IADLs), especially for those in the sample exhibiting some dimension of disability on a screener question. Erosheva, Fienberg and Joutard (2007) studied a cross-sectional version of 16 binary ADLs and IADLs, represented in the form of a 2^16 contingency table, using a Bayesian latent variable model that was developed to be an analogue to the frequentist Grade of Membership (GoM) model of Manton, Woodbury and Tolley (1994), the likelihood function for which is notoriously problematic.

The Bayesian version of the GoM model utilizes hierarchical modeling ideas through a layered latent variable structure. Let x = (x_1, x_2, ..., x_J) be a vector of binary manifest variables. The GoM model is structured around K mixture components (extreme profiles), and it assigns to each individual a latent partial membership vector of K nonnegative random variables, g = (g_1, g_2, ..., g_K), whose components sum to 1. By assigning a distribution D(g) to the vector g and integrating, we obtain the marginal distribution for individual response patterns in the form of individual-level mixtures. Erosheva, Fienberg and Joutard explained how to fit this Bayesian GoM model using MCMC techniques and apply it to the data in the 2^16 contingency table displaying outcomes on the 16 ADLs and IADLs, treating these different measures of disability as exchangeable, and thus as if they were independent and drawn from a common distribution. Airoldi et al. (2007, 2010) explored related aspects of model specification and model choice. As with a number of the earlier examples, the hierarchical latent structure embedded in this modeling approach is a mechanism for gaining control over what might otherwise be an unmanageable number of parameters, and essential to the success of the related methods.

This work on disability opens the door to a number of challenging problems for the Bayesian modeling community. For example:

• How should a Bayesian working with hierarchical models such as the Bayesian GoM model incorporate the survey weights that arise from the sampling scheme of the survey and adjustments for nonresponse? There is now an extensive literature that provides conflicting advice on the use of survey weights in the Bayesian framework, but the hierarchical model complexities bring these issues into somewhat sharper focus in this setting; for example, see the contrasting arguments of Fienberg (2009) and Little (2009).

• Manrique-Vallier and Fienberg (2010) extended these ideas to longitudinal latent profiles applied to the six ADLs measured across all six waves of the survey, and Manrique-Vallier (2010) added in survival and generational effects to address the question of whether disability is increasing or decreasing over time. He appeared to be able to capture characteristics that others have addressed using comparisons across cross-sections for each wave of the survey (see, e.g., Manton and Gu, 2001; Manton, Gu and Lamb, 2006). Scaling these methods up to the full array of ADLs and IADLs with key covariates remains a major challenge. This is a matter of considerable interest to policy planners who are interested in forecasting future demands on the health-care infrastructure as a result of changes in long-term disability over time.

The Bayesian GoM model is a special case of a much larger class of mixed membership models that can be used to analyze a diverse array of data types ranging from text in documents, to images, to linkages in networks, and longitudinal versions may prove applicable in other settings beyond the study of disability.

9. CONCLUSION

For much of the twentieth century, approaches to the design and analysis of statistical studies in government settings and public policy were almost exclusively descriptive or dominated by the frequentist approach that followed from the work of Fisher and from Neyman and Pearson. With the neo-Bayesian revival of the 1950s, Bayesian methods and techniques slowly began to appear in the public arena, and their use has accelerated dramatically during the past two decades, especially with the rise of MCMC methods that have allowed for sampling from posterior distributions in settings involving very large datasets.

In this article, we have attempted to give some examples, both old and new, of Bayesian methods in statistical practice in government and public policy settings and to suggest why in most of the cases there was ultimately little or no resistance to the Bayesian approach. Our examples have included census-taking and small area estimation, US election night forecasting, studies reported to the US Food and Drug Administration, assessing global climate change and measuring declines in disability among the elderly. Their diversity suggests that there is growing recognition of the value of Bayesian results, and a realization that the approach deals directly with questions of substantive interest.

Where there has been controversy, it has largely focused on the role of the choice of prior distributions and the appropriateness of "borrowing strength" across geographic boundaries. Arguments in favor of the use of "objective" priors have done little to stem the frequentist criticism of Bayesian methods, and typically ignore the highly subjective aspects of elements on hierarchical structures and likelihood functions. Through the examples discussed here, we have tried to convey the fact that a pragmatic Bayesian approach inevitably includes many subjective elements, although prior distributions may well draw on data from related settings and have an empirical flavor to them. Nonetheless, the principal challenge to Bayesian methods that remains is the need to constantly rebut the notion that frequentist methods are "objective" and thus more appropriate for use in the public domain.

In other areas of statistical application Bayesian methodology has also seen a major resurgence, and this is especially true in connection with machine learning approaches to very large datasets, where the use of hierarchically structured latent variable models is essential to generating high-quality estimates and predictions.

ACKNOWLEDGMENTS

Supported in part by NIH Grant R01 AG023141-01 to the Department of Statistics and Army contract DAAD19-02-1-3-0389 to CyLab at Carnegie Mellon University and by NSF Grants EIA9876619 and IIS0131884 to the National Institute of Statistical Sciences.

REFERENCES

Airoldi, E., Fienberg, S. E., Joutard, C. and Love, T. (2007). Discovering latent patterns with hierarchical Bayesian mixed-membership models. In Data Mining Patterns: New Methods and Applications (F. Masseglia, P. Poncelet and M. Teisseire, eds.) 240–275. Information Science Reference (IGI Global), Hershey, PA.
Airoldi, E., Fienberg, S. E., Joutard, C. and Love, T. (2010). Hierarchical Bayesian mixed-membership models and latent pattern discovery. In Frontier of Statistical Decision Making and Bayesian Analysis (M.-H. Chen et al., eds.) 360–376. Springer, New York.
Anderson, M., Daponte, B. O., Fienberg, S. E., Kadane, J. B., Spencer, B. D. and Steffey, D. (2000). Sample-based adjustment of the 2000 census—A balanced perspective. Jurimetrics 40 341–356.
Anderson, M. and Fienberg, S. E. (1999). Who Counts? The Politics of Census-Taking in Contemporary America. Russell Sage Foundation, New York.
Ballin, M., Scanu, M. and Vicard, P. (2005). Bayesian networks and complex survey sampling from finite populations. 2005 FCSM Conference Papers, Federal Committee on Statistical Methodology, US Office of Management and Budget. Available at http://www.fcsm.gov/05papers/Ballin Scanu Vicard IIC.pdf.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philos. Trans. Roy. Soc. London 53 370–418.
Beckman, M. (2006). Are Bayes' days upon us? Statistical methods could change the conduct of clinical trials. J. Nat. Cancer Inst. 98 1512–1513.
Berger, J. O. (2006). The case for objective Bayesian analysis. Bayesian Anal. 1 385–402. MR2221271
Berliner, L. M., Levine, R. A. and Shea, D. J. (2000). Bayesian climate change assessment. J. Climate 13 3806–3820.
Bernardo, J. M. and Girón, F. J. (1992). Robust sequential prediction from non-random samples: The election night forecasting case. In Bayesian Statistics 4 (J. M. Bernardo, J. O. Berger, D. V. Lindley and A. F. M. Smith, eds.) 3–60. Oxford Univ. Press, Oxford. MR1380270
Berry, D. A. (1991). Bayesian methods in phase III trials. Drug Inform. J. 25 345–368.
Berry, D. A. (1993). A case for Bayesianism in clinical trials. Stat. Med. 12 1377–1393.
Berry, D. A. (1997). Using a Bayesian approach in medical device development. Technical report. Available from Division of Biostatistics, Center for Devices and Radiological Health, FDA.
Berry, D. A. and Stangl, D. K., eds. (1996). Bayesian Biostatistics. Dekker, New York.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (with contributions by R. J. Light and F. Mosteller) (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA. MR0381130
Brillinger, D. R. (2002). John W. Tukey: His life and professional contributions. Ann. Statist. 30 1535–1575. MR1969439
Brown, P. J., Firth, D. and Payne, C. (1997). Forecasting on British election night 1997. J. Roy. Statist. Soc. Ser. A 162 211–226.
Bunker, J. P., Forrest Jr., W. H., Mosteller, F. and Vandam, L. D., eds. (1969). The National Halothane Study, Report of the Subcommittee on the National Halothane Study of the Committee on Anesthesia. Division of Medical Sciences, National Academy of Sciences—National Research Council, US Government Printing Office, Washington, DC.
Chandler, R., Rougier, J. and Collins, M. (2010). Climate change: Making certain what the uncertainties are. Significance 7 9–12.
Chu, S. (2005). Biological solution to the energy crisis. AAPPS Bull. 15 2–11.
D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester. MR2268833
de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Ann. Inst. Henri Poincaré. Translated into English as Foresight: Its logical laws, its subjective sources. In Studies in Subjective Probability (H. E. Kyburg and H. E. Smokler, eds.) 93–158. Wiley, New York, 1964. MR0179814
Diaconis, P. and Freedman, D. A. (1986). On the consistency of Bayes' estimates (with discussion). Ann. Statist. 14 1–87. MR0829555
Dobra, A., Fienberg, S. E. and Trottini, M. (2003). Assessing the risk of disclosure of confidential categorical data (with discussion). In Bayesian Statistics 7 (J. Bernardo et al., eds.) 125–144. Oxford Univ. Press, Oxford. MR2003170
Donnelly, P. (2005). Appealing statistics. Significance 2 46–48. MR2224084
Doyle, P., Lane, J., Theeuwes, J. and Zayatz, L., eds. (2001). Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies. Elsevier, Amsterdam.
Duncan, G. T., Fienberg, S. E., Krishnan, R., Padman, R. and Roehrig, S. F. (2001). Disclosure limitation methods and information loss for tabular data. In Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies (P. Doyle, J. Lane, J. Theeuwes and L. Zayatz, eds.) 135–166. Elsevier, Amsterdam.
DuMouchel, W. (1999). Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Amer. Statist. 53 177–202.
Efron, B. and Morris, C. N. (1973). Stein's estimator and its competitors—an empirical Bayesian approach. J. Amer. Statist. Assoc. 68 117–130. MR0388597
Ericksen, E. P. and Kadane, J. B. (1985). Estimating the population in a census year: 1980 and beyond (with discussion). J. Amer. Statist. Assoc. 80 927–943.

Ericksen, E. P., Kadane, J. B. and Tukey, J. W. (1989). Adjusting the 1980 census of population and housing. J. Amer. Statist. Assoc. 84 927–943.
Erosheva, E. A., Fienberg, S. E. and Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. Ann. Appl. Stat. 1 502–537. MR2415745
Esper, J., Cook, E. R. and Schweingruber, F. H. (2002a). Low-frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science 295 2250–2253.
Fay, R. E. and Herriot, R. A. (1979). Estimation of income for small places: An application of James–Stein procedures to census data. J. Amer. Statist. Assoc. 74 269–277. MR0548019
Fienberg, S. E. (2006a). When did Bayesian inference become "Bayesian"? Bayesian Anal. 1 1–40. MR2227361
Fienberg, S. E. (2006b). Does it make sense to be an "objective Bayesian"? (Comment on articles by Berger and by Goldstein.) Bayesian Anal. 1 429–432. MR2221275
Fienberg, S. E. (2007). Memories of election night predictions past: Psephologists and statisticians at work. Chance 20 6–15. MR2416414
Fienberg, S. E. (2009). The relevance or irrelevance of weights for confidentiality and statistical analyses. J. Privacy Confidentiality 1 183–195.
Fienberg, S. E. and Kadane, J. B. (1983). The presentation of Bayesian statistical analyses in legal proceedings. Statistician 32 88–98.
Fienberg, S. E., Makov, U. E. and Sanil, A. P. (1997). A Bayesian approach to data disclosure: Optimal intruder behavior for continuous data. J. Official Statist. 13 75–90.
Fienberg, S. E., Makov, U. E. and Steele, R. J. (1998). Disclosure limitation using perturbation and related methods for categorical data (with discussion). J. Official Statist. 14 485–502.
Fienberg, S. E. and Tanur, J. M. (1996). Reconsidering the fundamental contributions of Fisher and Neyman on experimentation and sampling. Int. Statist. Rev. 64 237–253.
Food and Drug Administration (2006). Draft guidance for industry and FDA staff: Guidance for the use of Bayesian statistics in medical device clinical trials. Office of Surveillance and Biometrics, Division of Biostatistics, Center for Devices and Radiological Health, Food and Drug Administration, US Dept. Health and Human Services.
Food and Drug Administration (2010). Guidance for industry and FDA staff: Guidance for the use of Bayesian statistics in medical device clinical trials. Office of Surveillance and Biometrics, Division of Biostatistics, Center for Devices and Radiological Health, Food and Drug Administration, US Dept. Health and Human Services (issued February 5, 2010). Available at http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm.
Forster, J. J. and Webb, E. L. (2007). Bayesian disclosure risk assessment: Predicting small frequencies in contingency tables. J. Roy. Statist. Soc. Ser. C 56 551–570. MR2405419
Franconi, L. and Stander, J. (2002). A model based method for disclosure limitation of business microdata. The Statistician 51 51–61. MR1891578
Freedman, D. A. (1995). Some issues in the foundation of statistics (with discussion). Foundations of Science 1 19–83. MR1798108
Freedman, D. A. and Navidi, W. C. (1986). Regression models for adjusting the 1980 census (with discussion). Statist. Sci. 1 1–39.
Gibbons, R. D., Brown, C. H., Hur, K., Marcus, S. M., Bhaumik, D. K., Erkens, J. A., Herings, R. M. C. and Mann, J. J. (2007). Early evidence on the effects of regulators' suicidality warnings on SSRI prescriptions and suicide in children and adolescents. Amer. J. Psych. 164 1356–1363.
Goldstein, M. (2006). Subjective Bayesian analysis: Principles and practice. Bayesian Anal. 1 403–420. MR2221272
Good, I. J. (1965). The Estimation of Probabilities. MIT Press, Cambridge, MA. MR0185724
Hegerl, G. C., Crowley, T. J., Hyde, W. T. and Frame, D. J. (2006). Climate sensitivity constrained by temperature reconstructions over the past seven centuries. Nature 440 1029–1032.
Hobbs, B. F. (1997). Bayesian methods for analysing climate change and water resource uncertainties. J. Environ. Manag. 49 53–72.
Huang, S. P., Pollack, H. N. and Shen, P.-Y. (2000). Temperature trends over the past five centuries reconstructed from borehole temperatures. Nature 403 756–758.
Jiang, J. and Lahiri, P. (2006). Mixed model prediction and small area estimation. Test 15 1–96. MR2252522
Jones, P. D., Osborn, T. J., Briffa, K. R., Folland, C. K., Horton, E. B., Alexander, L. V., Parker, D. E. and Rayner, N. A. (2001). Adjusting for sampling density in grid box land and ocean surface temperature time series. J. Geophys. Res. 106 3371–3380.
Kadane, J. B. (1996). Bayesian Methods and Ethics in a Clinical Trial Design. Wiley, New York.
Kadane, J. B. (2008). Statistics in the Law: A Practitioner's Guide, Cases, and Materials. Oxford Univ. Press, Oxford. MR2367407
Kaizar, E. E., Greenhouse, J. B., Seltman, H. and Kelleher, K. (2006). Do antidepressants cause suicidality in children? A Bayesian meta-analysis. Clin. Trials 3 73–98.
Kardaun, O. J. W. F., Salomé, D., Schaafsma, W., Steerneman, A. G. M., Willems, J. C. and Cox, D. R. (2003). Reflections on fourteen cryptic issues concerning the nature of statistical inference (with discussion by T. Schweder and J. M. Bernardo). Int. Statist. Rev. 71 277–318.
Karl, T. (2001). Testimony before the US Senate Committee on Governmental Affairs, July 18, 2001. Available at http://www.senate.gov/~govt-aff/071801_karl.htm.
Keith, D. W. (1996). When is it appropriate to combine expert judgments? Climatic Change 33 139–143.
Lindley, D. V. and Smith, A. F. M. (1972). Bayes estimates for the linear model (with discussion). J. Roy. Statist. Soc. Ser. B 34 1–41. MR0415861

Little, R. J. A. (2009). Weighting and prediction in sample the method of purposive selection (with discussion). J. Roy. surveys (with discussion). Calcutta Statist. Assoc. Bull. 60 Statist. Soc. Ser. B 97 558–606. 1–48. MR2553424 Oerlemans, J. (2005b). Global Glacier length tempera- Madigan, D., Ryan, P., Simpson, S. and Zorych, I. (2010). ture reconstruction. IGBP PAGES/World Data Center Bayesian methods for drug safety surveilance. In Bayesian for Paleoclimatology. Data Contribution Series #2005-059. Statistics 9 (J. Bernardo et al., eds.). Oxford Univ. Press, NOAA/NCDC Paleoclimatology Program, Boulder, CO. Oxford. To appear. Polettini S. and Stander, J. (2004). A Bayesian hierar- Mann, M. E. and Jones, P. D. (2003a). 2,000 year Hemi- chical model approach to risk estimation in statistical dis- spheric multi-proxy temperature reconstructions. IGBP closure limitation. In Privacy in Statistical Databases (J. PAGES/World Data Center for Paleoclimatology Data Domingo-Ferrer and V. Torra, eds.). Lecture Notes in Com- Contribution Series #2003-051. NOAA/NGDC Paleocli- put. Sci. 3050 247–261. Springer, Berlin. matology Program, Boulder, CO. Raghunathan, T. E., Reiter, J. P. and Rubin, D. B. Manrique-Vallier, D. (2010). Longitudinal mixed member- (2003). Multiple imputation for statistical disclosure limi- ship models with applications. Ph.D. dissertation, Dept. tation. J. Official Statist. 19 1–16. Statistics, Carnegie Mellon Univ. Rao, J. N. K. (2003). Small Area Estimation. Wiley, New Manrique-Vallier, D. and Fienberg, S. E. (2010). Longi- York. MR1953089 tudinal mixed-membership models for survey data on dis- Rinott, Y. and Shlomo, N. (2007). A smoothing model ability. In Longitudinal Surveys: From Design to Analy- for sample disclosure risk estimation. In Complex Datasets sis: Proceedings of XXV International Methodology Sym- and Inverse Problems: Tomography, Networks and Beyond. posium, 2009. Statistics Canada, Ottawa, QC, Canada. IMS Lecture Notes Monogr. Ser. 
54 161–171. IMS, Beach- Manton, K. G. and Gu, X. (2001). Changes in the preva- wood, OH. MR2459186 lence of chronic disability in the United States black and Rubin, D. B. (1993). Discussion: Statistical disclosure limi- nonblack population above age 65 from 1982 to 1999. Proc. tation. J. Official Statist. 9 462–468. MR1207984 Natl. Acad. Sci. USA 98 6354–6359. Sanso´ B., Forest, C. E. and Zantedeschi, D. (2008). In- Manton, K. G., Gu, X. and Lamb, V. L. (2006). Change in ferring climate system properties using a computer model chronic disability from 1982 to 2004/2005 as measured by (with discussion). Bayesian Anal. 3 1–62. MR2383247 long-term changes in function and health in the US elderly Schneider, S. H. (2002). Global warming: Neglecting the population. Proc. Natl. Acad. Sci. USA 103 18374–18379. complexities. Sci. Amer. 286 62–65. Manton, K. G., Woodbury, M. A. and Tolley, H. D. Scott, A. and Smith, T. M. F. (1969). Estimation for multi- (1994). Statistical Applications Using Fuzzy Sets. Wiley, stage surveys. J. Amer. Statist. Assoc. 64 830–840. New York. MR1269319 Scott, A. and Smith, T. M. F. (1971). Bayes estimates for Min. S.-K. and Andreas Hense, A. (2006). A Bayesian as- subclasses in stratified sampling. J. Amer. Statist. Assoc. sessment of climate change using multimodel ensembles. 66 834–836. Part I: Global mean surface temperature. J. Climate 19 Simon, R. (1999). Bayesian design and analysis of active con- 2769–2790. trol clinical trials. Biometrics 55 484–487. Min. S.-K. and Andreas Hense, A. (2007). A Bayesian as- Spiegelhalter, D. J. (2004). Incorporating Bayesian ideas sessment of climate change using multimodel ensembles. into health-care evaluation. Statist. Sci. 19 156–174. II: Regional and seasonal mean surface temperatures. J. MR2086325 Climate 20 3237–3256. Spiegelhalter, D. J., Freedman, L. S. and Parmar, Moberg, A., Sonechkin, D. M., Holmgren, K., Dat- M. K. B. (1994). Bayesian approaches to randomized tri- senko, N. M. and Karlen, W. 
(2005a). 2,000-year als. J. Roy. Statist. Soc. Ser. A 157 356–416. MR1321308 Northern Hemisphere temperature reconstruction. IGBP Spiegelhalter, D. J., Myles, J. P., Jones, D. R. and PAGES/World Data Center for Paleoclimatology Data. Abrams, K. R. (2000). Bayesian method in health technol- Contribution Series #2005-019. NOAA/NGDC Paleocli- ogy assessment: A review. Health Technology Assessment 4 matology Program, Boulder, CO. 1–130. Morgan, M. G. and Keith, D. W. (1995). Subjective judg- Taroni, F., Aitkin, C., Garbolino, P. and Biedermann, ments by climate experts. Environ. Sci. Technol. 29 A468– A. (2006). Bayesian Networks and Probabilistic Inference A476. in Forensic Science. Wiley, Chichester. MR2271510 Mosteller, F. and Wallace, D. L. (1964). Inference Tebaldi, C., Smith, R. L., Nychka, D. and Mearns, L. O. and Disputed Authorship: The Federalist. Addison-Wesley, (2005). Quantifying uncertainty in projections of regional Reading, MA. MR0175668 climate change: A Bayesian approach to the analysis of National Academy of Sciences (2008). Understanding multimodel ensembles. J. Climate 18 1524–1540. and Responding to Climate Change. Highlights of National Tebaldi, C., Smith, R. L. and Sanso,´ B. (2010). Character- Academies Reports. National Academy Press, Washington, izing uncertainty of future climate change projections using DC. hierarchical Bayesian models. In Bayesian Statistics 9 (J. National Research Council (2006). Surface Tempera- Bernardo et al., eds.). Oxford Univ. Press, Oxford. To ap- ture Reconstructions for the Last 2,000 Years. National pear. Academy Press, Washington, DC. Ting, D., Fienberg, S. E. and Trottini, M. (2008). Ran- Neyman, J. (1934). On the two different aspects of the rep- dom orthogonal matrix masking methodology for micro- resentative method: The method of stratified sampling and data release. Int. J. Inform. Comput. Secur. 2 86–105. 16 S. E. FIENBERG

Trevisani, M. and Torelli, N. (2004). Small area es- climate reconstruction. A Report to the House Com- timation by hierarchical Bayesian models: Some prac- mittee on Energy and Commerce and House Sub- tical and theoretical issues. In Atti della XLII Riu- committee on Oversight and Investigations. Avail- nione Scientifica, Societ`aItaliana di Statistica 273–276. able at http://energycommerce.house.gov/108/home/ Available at http://www.sis-statistica.it/files/pdf/atti/ 07142006 Wegman Report.pdf. RSBa2004p273-276.pdf. Zickfeld, K., Leverman, A., Keith, D. W., Kuhl- Trottini, M. and Fienberg, S. E. (2002). Modelling user brodt, T., Morgan, M. G. and Rahmstorf, S. (2007). uncertainty for disclosure risk and data utility. Internat. J. Expert judgements on the response of the Atlantic merid- Uncertainty Fuzziness Knowledge-Based Systems 10 511– ional overturning circulation to climate change. Climatic 528. Change 82 235–265. Wegman, E. J., Scott, D. W. and Said, Y. (2006). Ad hoc committee report on the ‘hockey stick’ global