Statistical Science 1998, Vol. 13, No. 2, 95–122

R. A. Fisher in the 21st Century

Invited Paper Presented at the 1996 R. A. Fisher Lecture

Bradley Efron

Abstract. Fisher is the single most important figure in 20th century statistics. This talk examines his influence on modern statistical thinking, trying to predict how Fisherian we can expect the 21st century to be. Fisher's philosophy is characterized as a series of shrewd compromises between the Bayesian and frequentist viewpoints, augmented by some unique characteristics that are particularly useful in applied problems. Several current research topics are examined with an eye toward Fisherian influence, or the lack of it, and what this portends for future statistical developments. Based on the 1996 Fisher lecture, the article closely follows the text of that talk.

Key words and phrases: Statistical inference, Bayes, frequentist, fiducial, empirical Bayes, model selection, bootstrap, confidence intervals.

1. INTRODUCTION

Even scientists need their heroes, and R. A. Fisher was certainly the hero of 20th century statistics. His ideas dominated and transformed our field to an extent a Caesar or an Alexander might have envied. Most of this happened in the second quarter of the century, but by the time of my own education Fisher had been reduced to a somewhat minor figure in American academic statistics, with the influence of Neyman and Wald rising to their high water mark.

There has been a late 20th century resurgence of interest in Fisherian statistics, in England where his influence never much waned, but also in America and the rest of the statistical world. Much of this revival has gone unnoticed because it is hidden behind the dazzle of modern computational methods. One of my main goals here will be to clarify Fisher's influence on modern statistics. Both the strengths and limitations of Fisherian thinking will be described, mainly by example, finally leading up to some speculations on Fisher's role in the statistical world of the 21st century.

What follows is basically the text of the Fisher lecture presented to the August 1996 Joint Statistical Meetings in Chicago. The talk format has certain advantages over a standard journal article. First and foremost, it is meant to be absorbed quickly, in an hour, forcing the presentation to concentrate on main points rather than technical details. Spoken language tends to be livelier than the gray prose of a journal paper. A talk encourages bolder distinctions and personal opinions, which are dangerously vulnerable in a written article but appropriate I believe for speculations about the future. In other words, this will be a broad-brush painting, long on color but short on detail.

These advantages may be viewed in a less favorable light by the careful reader. Fisher's mathematical arguments are beautiful in their power and economy, and most of that is missing here. The broad brush strokes sometimes conceal important areas of controversy. Most of the argumentation is by example rather than theory, with examples from my own work playing an exaggerated role. References are minimal, and not indicated in the usual author–year format but rather collected in annotated form at the end of the text. Most seriously, the one-hour limit required a somewhat arbitrary selection of topics, and in doing so I concentrated on those parts of Fisher's work that have been most important to me, omitting whole areas of Fisherian influence such as randomization and experimental design. The result is more a personal essay than a systematic survey.

Bradley Efron is Max H. Stein Professor of Humanities and Sciences and Professor of Statistics and Biostatistics, Department of Statistics, Stanford University, Stanford, California 94305-4065 (e-mail: [email protected]).

This is a talk (as I will now refer to it) on Fisher's influence, not mainly on Fisher himself or even his intellectual history. A much more thorough study of the work itself appears in L. J. Savage's famous talk and essay, "On rereading R. A. Fisher," the 1971 Fisher lecture, a brilliant account of Fisher's statistical ideas as sympathetically viewed by a leading Bayesian (Savage, 1976). Thanks to John Pratt's editorial efforts, Savage's talk appeared, posthumously, in the 1976 Annals of Statistics. In the article's discussion, Oscar Kempthorne called it the best statistics talk he had ever heard, and Churchill Eisenhart said the same. Another fine reference is Yates and Mather's introduction to the 1971 five-volume set of Fisher's collected works. The definitive Fisher reference is Joan Fisher Box's 1978 biography, The Life of a Scientist.

It is a good rule never to meet your heroes. I inadvertently followed this rule when Fisher spoke at the Stanford Medical School in 1961, without notice to the Statistics Department. The strength of Fisher's powerful personality is missing from this talk, but not I hope the strength of his ideas. Heroic is a good word for Fisher's attempts to change statistical thinking, attempts that had a profound influence on this century's development of statistics into a major force on the scientific landscape. "What about the next century?" is the implicit question asked in the title, but I won't try to address that question until later.

2. THE STATISTICAL CENTURY

Despite its title, the greater portion of the talk concerns the past and the present. I am going to begin by looking back on statistics in the 20th century, which has been a time of great advancement for our profession. During the 20th century statistical thinking and methodology have become the scientific framework for literally dozens of fields, including education, agriculture, economics, biology and medicine, and with increasing influence recently on the hard sciences such as astronomy, geology and physics.

In other words, we have grown from a small obscure field into a big obscure field. Most people and even most scientists still don't know much about statistics except that there is something good about the number ".05" and perhaps something bad about the bell curve. But I believe that this will change in the 21st century and that statistical methods will be widely recognized as a central element of scientific thinking.

The 20th century began on an auspicious statistical note with the appearance of Karl Pearson's famous χ² paper in the spring of 1900. The groundwork for statistics's growth was laid by a pre–World War II collection of intellectual giants: Neyman, the Pearsons, Student, Kolmogorov, Hotelling and Wald, with Neyman's work being especially influential. But from our viewpoint at the century's end, or at least from my viewpoint, the dominant figure has been R. A. Fisher. Fisher's influence is especially pervasive in statistical applications, but it also runs through the pages of our theoretical journals. With the end of the century in view this seemed like a good occasion for taking stock of the vitality of Fisher's legacy and its potential for future development.

A more accurate but less provocative title for this talk would have been "Fisher's influence on modern statistics." What I will mostly do is examine some topics of current interest and assess how much Fisher's ideas have or have not influenced them. The central part of the talk concerns six research areas of current interest that I think will be important during the next couple of decades. This will also give me a chance to say something about the kinds of applied problems we might be dealing with soon, and whether or not Fisherian statistics is going to be of much help with them.

First though I want to give a brief review of Fisher's ideas and the ideas he was reacting to. One difficulty in assessing the importance of Fisherian statistics is that it's hard to say just what it is. Fisher had an amazing number of important ideas and some of them, like randomization inference and conditionality, are contradictory. It's a little as if in economics Marx, Adam Smith and Keynes turned out to be the same person. So I am just going to outline some of the main Fisherian themes, with no attempt at completeness or philosophical reconciliation. This and the rest of the talk will be very short on references and details, especially technical details, which I will try to avoid entirely.

In 1910, two years before the 20-year-old Fisher published his first paper, an inventory of the statistics world's great ideas would have included the following impressive list: Bayes theorem, least squares, the normal distribution and the central limit theorem, binomial and Poisson methods for count data, Galton's correlation and regression, multivariate distributions, Pearson's χ² and Student's t. What was missing was a core for these ideas. The list existed as an ingenious collection of ad hoc devices. The situation for statistics was similar to the one now faced by computer science. In Joan Fisher Box's words, "The whole field was like an unexplored archaeological site, its structure hardly perceptible above the accretions of rubble, its treasures scattered throughout the literature."

There were two obvious candidates to provide a statistical core: "objective" Bayesian statistics in the Laplace tradition of using uniform priors for unknown parameters, and a rough frequentism exemplified by Pearson's χ² test. In fact, Pearson was working on a core program of his own through his system of Pearson distributions and the method of moments.

By 1925, Fisher had provided a central core for statistics – one that was quite different and more compelling than either the Laplacian or Pearsonian schemes. The great 1925 paper already contains most of the main elements of Fisherian estimation theory: consistency; sufficiency; likelihood; Fisher information; efficiency; and the asymptotic optimality of the maximum likelihood estimator. Partly missing is ancillarity, which is mentioned but not fully developed until the 1934 paper.

The 1925 paper even contains a fascinating and still controversial section on what Rao has called the second order efficiency of the maximum likelihood estimate (MLE). Fisher, never really satisfied with asymptotic results, says that in small samples the MLE loses less information than competing asymptotically efficient estimators, and implies that this helps solve the problem of small-sample inference (at which point Savage wonders why one should care about the amount of information in a point estimator).

Fisher's great accomplishment was to provide an optimality standard for statistical estimation – a yardstick of the best it's possible to do in any given estimation problem. Moreover, he provided a practical method, maximum likelihood, that quite reliably produces estimators coming close to the ideal optimum even in small samples.

Optimality results are a mark of scientific maturity. I mark 1925 as the year statistical theory came of age, the year statistics went from an ad hoc collection of ingenious techniques to a coherent discipline. Statistics was lucky to get a Fisher at the beginning of the 20th century. We badly need another one to begin the 21st, as will be discussed near the end of the talk.

3. THE LOGIC OF STATISTICAL INFERENCE

Fisher believed that there must exist a logic of inductive inference that would yield a correct answer to any statistical problem, in the same way that ordinary logic solves deductive problems. By using such an inductive logic the statistician would be freed from the a priori assumptions of the Bayesian school.

Fisher's main tactic was to logically reduce a given inference problem, sometimes a very complicated one, to a simple form where everyone should agree that the answer is obvious. His favorite target for the "obvious" was the situation where we observe a single normally distributed quantity x with unknown expectation θ,

(1)  x ~ N(θ, σ²),

the variance σ² being known. Everyone agrees, says Fisher, that in this case the best estimate is θ̂ = x and the correct 90% confidence interval for θ (to use terminology Fisher hated) is

(2)  θ̂ ± 1.645σ.

Fisher's inductive logic might be called a theory of types, in which problems are reduced to a small catalogue of obvious situations. This had been tried before in statistics, the Pearson system being a good example, but never so forcefully nor successfully. Fisher was astoundingly resourceful at reducing problems to simple forms like (1). Some of the devices he invented for this purpose were sufficiency, ancillarity and conditionality, transformations, pivotal methods, geometric arguments, randomization inference and asymptotic maximum likelihood theory. Only one major reduction principle has been added to this list since Fisher's time, invariance, and that one is not in universal favor these days.

Fisher always preferred exact small-sample results but the asymptotic optimality of the MLE has been by far the most influential, or at least the most popular, of his reduction principles. The 1925 paper shows that in large samples the MLE θ̂ of an unknown parameter θ approaches the ideal form (1),

θ̂ → N(θ, σ²),

with the variance σ² determined by the Fisher information and the sample size. Moreover, no other "reasonable" estimator of θ has a smaller asymptotic variance. In other words, the maximum likelihood method automatically produces an estimator that can reasonably be termed "optimal," without ever invoking the Bayes theorem.

Fisher's great accomplishment triggered a burst of interest in optimality results. The most spectacular product of this burst was the Neyman–Pearson lemma for optimal hypothesis testing, followed soon by Neyman's theory of confidence intervals. The Neyman–Pearson lemma did for hypothesis testing what Fisher's MLE theory did for estimation, by pointing the way toward optimality.

Philosophically, the Neyman–Pearson lemma fits in well with Fisher's program: using mathematical logic it reduces a complicated problem to an obvious solution without invoking Bayesian priors. Moreover, it is a tremendously useful idea in applications, so that Neyman's ideas on hypothesis testing and confidence intervals now play a major role in day-to-day applied statistics.

However, the success of the Neyman–Pearson lemma triggered new developments, leading to a more extreme form of statistical optimality that Fisher deeply distrusted. Even though Fisher's personal motives are suspect here, his philosophical qualms were far from groundless. Neyman's ideas, as later developed by Wald into decision theory, brought a qualitatively different spirit into statistics.

Fisher's maximum likelihood theory was launched in reaction to the rather shallow Laplacian Bayesianism of the previous century. Fisher's work demonstrated a more stringent approach to statistical inference. The Neyman–Wald decision theoretic school carried this spirit of astringency much further. A strict mathematical statement of the problem at hand, often phrased quite narrowly, followed by an optimal solution became the ideal. The practical result was a more sophisticated form of frequentist inference having enormous mathematical appeal.

Fisher, caught I think by surprise by this flanking attack from his right, complained that the Neyman–Wald decision theorists could be accurate without being correct. A favorite example of his concerned a Cauchy distribution with unknown center,

(3)  f_θ(x) = 1 / (π[1 + (x − θ)²]).

Given a random sample x = (x₁, x₂, ..., xₙ) from (3), decision theorists might try to provide the shortest interval of the form θ̂ ± c that covers the true θ with probability 0.90. Fisher's objection, later spelled out in his 1934 paper on ancillarity, was that c should be different for different samples x depending upon the correct amount of information in x.

The decision theory movement eventually spawned its own counter-reformation. The neo-Bayesians, led by Savage and de Finetti, produced a more logical and persuasive Bayesianism, emphasizing subjective probabilities and personal decision making. In its most extreme form the Savage–de Finetti theory directly denies Fisher's claim of an impersonal logic of statistical inference. There has also been a postwar revival of interest in objectivist Bayesian theory, Laplacian in intent but based on Jeffreys's more sophisticated methods for choosing objective priors, which I shall talk more about later on.

Very briefly then, this is the way we arrived at the end of the 20th century with three competing philosophies of statistical inference: Bayesian; Neyman–Wald frequentist; and Fisherian. In many ways the Bayesian and frequentist philosophies stand at opposite poles from each other, with Fisher's ideas being somewhat of a compromise. I want to talk about that compromise next because it has a lot to do with the popularity of Fisher's methods.

4. THREE COMPETING PHILOSOPHIES

The chart in Figure 1 shows four major areas of disagreement between the Bayesians and the frequentists. These are not just philosophical disagreements. I chose the four categories because they lead to different behavior at the data-analytic level. For each category I have given a rough indication of Fisher's preferred position.

FIG. 1. Four major areas of disagreement between Bayesian and frequentist methods. For each one I have inserted a row of stars to indicate, very roughly, the preferred location of Fisherian inference.

4.1 Individual Decision Making versus Scientific Inference

Bayes theory, and in particular Savage–de Finetti Bayesianism (the kind I'm focusing on here, though later I'll also talk about the Jeffreys brand of objective Bayesianism), emphasizes the individual decision maker, and it has been most successful in fields like business where individual decisions are paramount. Frequentists aim for universal acceptance of their inferences. Fisher felt that the proper realm of statistics was scientific inference, where it is necessary to persuade all or at least most of the world of science that you have reached the correct conclusion. Here Fisher is far over to the frequentist side of the chart (which is philosophically accurate but anachronistic, since Fisher's position predates both the Savage–de Finetti and Neyman–Wald schools).

4.2 Coherence versus Optimality

Bayesian theory emphasizes the coherence of its judgments, in various technical ways but also in the wider sense of enforcing consistency relationships between different aspects of a decision-making situation. Optimality in the frequentist sense is frequently incoherent. For example, the uniform minimum variance unbiased (UMVU) estimate of exp{θ} does not have to equal exp{the UMVU estimate of θ}, and more seriously there is no simple calculus relating the two different estimates. Fisher wanted to have things both ways, coherent and optimal, and in fact maximum likelihood estimation is coherent in this sense: the MLE of exp{θ} equals exp{θ̂}.

The tension between coherence and optimality is like the correctness–accuracy disagreement concerning the Cauchy example (3), where Fisher argued strongly for correctness. The emphasis on correctness, and a belief in the existence of a logic of statistical inference, moves Fisherian philosophy toward the Bayesian side of Figure 1. Fisherian practice is a less clear story. Different parts of the Fisherian program don't cohere with each other and in practice Fisher seemed quite willing to sacrifice logical consistency for a neat solution to a particular problem, for example, switching back and forth between frequentist and nonfrequentist justifications of the Fisher information. This kind of case-to-case expediency, which is a common attribute of modern data analysis, has a frequentist flavor. I have located the Fisherian stars for this category a little closer to the Bayesian side of Figure 1, but spreading over a wide range.

4.3 Synthesis versus Analysis

Bayesian decision making emphasizes the collection of information across all possible sources, and the synthesis of that information into the final inference. Frequentists tend to break problems into separate small pieces that can be analyzed separately (and optimally). Fisher emphasized the use of all available information as a hallmark of correct inference, and in this way he is more in sympathy with the Bayesian position.

In this case Fisher tended toward the Bayesian position both in theory and in methodology: maximum likelihood estimation and its attendant theory of approximate confidence intervals based on Fisher information are superbly suited to the combination of information from different sources. (On the other hand, we have this quote from Yates and Mather: "In his own work Fisher was at his best when confronted with small self-contained sets of data. ... He was never much interested in the assembly and analysis of large amounts of data from varied sources bearing on a given issue." They blame this for his stubbornness on the smoking–cancer controversy. Here as elsewhere we will have to view Fisher as a lapsed Fisherian.)

4.4 Optimism versus Pessimism

This last category is more psychological than philosophical, but it is psychology rooted in the basic nature of the two competing philosophies. Bayesians tend to be more aggressive and risk-taking in their data analyses. There couldn't be a more pessimistic and defensive theory than minimax, to choose an extreme example of frequentist philosophy. It says that if anything can go wrong it will. Of course a minimax person might characterize the Bayesian position as "If anything can go right it will."

Fisher took a middle ground here. He scorns the finer mathematical concerns of the decision theorists ("Not only does it take a cannon to shoot a sparrow, but it misses the sparrow!"), but he fears averaging over the states of nature in a Bayesian way.

One of the really appealing features of Fisher's work is its spirit of reasonable compromise, cautious but not overly concerned with pathological situations. This has always struck me as the right attitude toward most real-life problems, and it's certainly a large part of Fisher's dominance in statistical applications.

Looking at Figure 1, I think it is a mistake trying too hard to make a coherent philosophy out of Fisher's theories. From our current point of view they are easier to understand as a collection of extremely shrewd compromises between Bayesian and frequentist ideas. Fisher usually wrote as if he had a complete logic of statistical inference in hand, but that didn't stop him from changing his system when he thought up another landmark idea.

De Finetti, as quoted by Cifarelli and Regazzini, puts it this way: "Fisher's rich and manifold personality shows a few contradictions. His common sense in applications on one hand and his lofty conception of scientific research on the other lead him to disdain the narrowness of a genuinely objectivist formulation, which he regarded as a wooden attitude. He professes his adherence to the objectivist point of view by rejecting the errors of the Bayes–Laplace formulation. What is not so good here is his mathematics, which he handles with mastery in individual problems but rather cavalierly in conceptual matters, thus exposing himself to clear and sometimes heavy criticism. From our point of view it appears probable that many of Fisher's observations and ideas are valid provided we go back to the intuitions from which they spring and free them from the arguments by which he thought to justify them."

Figure 1 describes Fisherian statistics as a compromise between the Bayesian and frequentist schools, but in one crucial way it is not a compromise: in its ease of use. Fisher's philosophy was always expressed in very practical terms. He seemed to think naturally in terms of computational algorithms, as with maximum likelihood estimation, analysis of variance and permutation tests. If anything is going to replace Fisher in the 21st century it will have to be a methodology that is equally easy to apply in day-to-day practice.

5. FISHER'S INFLUENCE ON CURRENT RESEARCH

There are three parts to this talk: past, present and future. The past part, which you have just seen, didn't do justice to Fisher's ideas, but the subject here is more one of influence than ideas, admitting of course that the influence is founded on the ideas' strengths. So now I am going to discuss Fisher's influence on current research.

What follows are several (actually six) examples of current research topics that have attracted a lot of attention recently. No claim of completeness is being made here. The main point I'm trying to make with these examples is that Fisher's ideas are still exerting a powerful influence on developments in statistical theory, and that this is an important indication of their future relevance. The examples will gradually get more speculative and futuristic, and will include some areas of development not satisfactorily handled by Fisher – holes in the Fisherian fabric – where we might expect future work to be more frequentist or Bayesian in motivation.

The examples will also allow me to talk about the new breed of applied problems statisticians are starting to see, the bigger, messier, more complicated data sets that we will have to deal with in the coming decades. Fisherian methods were fashioned to deal with the problems of the 1920s and 1930s. It is not a certainty that they will be equally applicable to the problems of the 21st century – a question I hope to shed at least a little light upon.

5.1 Fisher Information and the Bootstrap

This first example is intended to show how Fisher's ideas can pop up in current work, but be difficult to recognize because of computational advances. First, here is a very brief review of Fisher information. Suppose we observe a random sample x₁, x₂, ..., xₙ from a density function f_θ(x) depending on a single unknown parameter θ,

f_θ(x) → x₁, x₂, ..., xₙ.

The Fisher information in any one x is the expected value of minus the second derivative of the log density,

i_θ = E_θ{−(∂²/∂θ²) log f_θ(x)},

and the total Fisher information in the whole sample is n·i_θ.

Fisher showed that the asymptotic standard error of the MLE is inversely proportional to the square root of the total information,

(4)  se_θ(θ̂) ≈ 1/√(n·i_θ),

and that no other consistent and sufficiently regular estimator of θ – essentially no other asymptotically unbiased estimator – can do better.

A tremendous amount of philosophical interpretation has been attached to i_θ, concerning the meaning of statistical information, but in practice Fisher's formula (4) is most often used simply as a handy estimate of the standard error of the MLE. Of course, (4) by itself cannot be used directly because i_θ involves the unknown parameter θ. Fisher's tactic, which seems obvious but in fact is quite central to Fisherian methodology, is to plug in the MLE θ̂ for θ in (4), giving a usable estimate of standard error,

(5)  ŝe = 1/√(n·i_θ̂).

Here is an example of formula (5) in action. Figure 2 shows the results of a small study designed to test the efficacy of an experimental antiviral drug.
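The plug-in calculation in (5) is easy to sketch in code. The following few lines of Python are a hypothetical illustration, not part of the original talk: they use the bivariate-normal information formula i_θ = 1/(1 − θ²)² and the cd4 values θ̂ = 0.723, n = 20 quoted in the text.

```python
import math

# Fisher information for the correlation coefficient of a bivariate
# normal sample, after accounting for nuisance parameters:
# i_theta = 1 / (1 - theta^2)^2  (formula quoted in the text).
def fisher_info(theta):
    return 1.0 / (1.0 - theta ** 2) ** 2

# Plug-in principle, formula (5): substitute the MLE theta_hat for the
# unknown theta in se = 1 / sqrt(n * i_theta).
def plugin_se(theta_hat, n):
    return 1.0 / math.sqrt(n * fisher_info(theta_hat))

theta_hat, n = 0.723, 20        # cd4 data: sample correlation and size
se = plugin_se(theta_hat, n)    # algebraically (1 - theta_hat^2) / sqrt(n)
print(round(se, 3))             # -> 0.107
```

The same two functions would work for any one-parameter family: only `fisher_info` changes, which is what makes the recipe "automatic" in Fisher's sense.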
If we assume a bivariate normal model for the mates ␮ˆ, ⌺ for the unknown parameters ␮, ⌺ data, in the formula; ⅷ bootstrapᎏŽi. plug in Ž ␮ˆ, ˆ⌺. for the unknown Ž6. N2 Ž ␮, ⌺. ª x1 , x2 , x3 , . . . , x20 , parameters Ž ␮, ⌺. in the mechanism generating the data; Žii. compute the standard error of the the notation indicating a random sample of 20 pairs sample correlation coefficient, for the plugged-in from a bivariate normal distribution with expecta- mechanism, by Monte Carlo simulation. tion vector ␮ and covariance matrix ⌺, then ␪ˆ is the MLE for the true correlation coefficient ␪. The The two methods proceed in reverse order, ‘‘com- Fisher information for estimating ␪ turns out to be pute and then plug in’’ versus ‘‘plug in and then 2 2 i␪ s 1rŽ1 y ␪ . Žafter taking proper account of the compute,’’ but this is a relatively minor technical ‘‘nuisance parameters’’ in Ž6.ᎏone of those techni- difference. The crucial step in both methods, and cal points I am avoiding in this talk. so Ž5. gives the only statistical inference going on, is the substi- estimated standard error tution of the estimates Ž ␮ˆ, ˆ⌺. for the unknown parameters Ž ␮, ⌺., in other words the plug-in prin- 2 $ Ž1 y ␪ˆ . ciple. Fisherian inference makes frequent use of the se s s 0.107. '20 plug-in principle, and this is one of the main rea- sons that Fisher’s methods are so convenient to use Here is a bootstrap estimate of standard error for in practice. All possible inferential questions are the same problem, also assuming that the bivariate answered by simply plugging in estimates, usually normal model is correct. In this context the boot- maximum likelihood estimates, for unknown pa- strap samples are generated from model Ž6., but rameters. with estimates ␮ˆ and ˆ⌺ substituted for the un- The Fisher information method involves cleverer known parameters ␮ and ⌺: mathematics than the bootstrap, but it has to be- cause we enjoy a 107 computational advantage over ˆ U U U U ˆU NŽ ␮ˆ, ⌺. 
ª x1 , x2 , x3 , . . . , x20 ª ␪ , Fisher. A year’s combined computational effort by 102 B. EFRON
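The parametric bootstrap loop described above is equally short in code. This Python sketch is a hypothetical reconstruction, not the original computation: the raw cd4 measurements are not reproduced here, so zero means and unit variances stand in for μ̂ and Σ̂, with the estimated correlation 0.723 plugged in (for a bivariate normal, the distribution of the sample correlation depends only on the true correlation and n, so the stand-in means and variances are harmless).

```python
import math
import random

random.seed(0)  # reproducible illustration

def corr(pairs):
    """Pearson sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def bivariate_normal_sample(n, rho):
    """n pairs from a bivariate normal with correlation rho (zero means,
    unit variances), built from independent normals Cholesky-style."""
    samples = []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        samples.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return samples

# Plug in the estimated correlation and redraw B data sets of size n,
# recomputing the sample correlation each time.
theta_hat, n, B = 0.723, 20, 2000
boot = [corr(bivariate_normal_sample(n, theta_hat)) for _ in range(B)]

# Bootstrap standard error = empirical standard deviation of the B values.
mean_boot = sum(boot) / B
se_boot = math.sqrt(sum((t - mean_boot) ** 2 for t in boot) / (B - 1))
print(round(se_boot, 3))  # a value near the 0.112 reported in the text
```

Note that the plug-in substitution is the single inferential step here; everything after it is pure Monte Carlo bookkeeping.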

FIG. 3. Histogram of 2,000 bootstrap correlation coefficients; bivariate normal model.

all the statisticians of 1925 wouldn't equal a minute of modern computer time. The bootstrap exploits this advantage to numerically extend Fisher's calculations to situations where the mathematics becomes hopelessly complicated. One of the less attractive aspects of Fisherian statistics is its overreliance on a small catalog of simple parametric models like the normal, understandable enough given the limitations of the mechanical calculators Fisher had to work with.

Modern computation has given us the opportunity to extend Fisher's methods to a much wider class of models, including nonparametric ones (the more usual arena of the bootstrap). We are beginning to see many such extensions, for example, the extension of discriminant analysis to CART, and the extension of linear regression to generalized additive models.

6. THE STANDARD INTERVALS

I want to continue the cd4 example, but proceeding from standard errors to confidence intervals. The confidence interval story illustrates how computer-based inference can be used to extend Fisher's ideas in a more ambitious way.

The MLE and its estimated standard error were used by Fisher to form approximate confidence intervals, which I like to call the standard intervals because of their ubiquity in day-to-day practice,

(7) θ̂ ± 1.645 ŝe.

The constant, 1.645, gives intervals of approximate 90% coverage for the unknown parameter θ, with 5% noncoverage probabilities at each end of the interval. We could use 1.96 instead of 1.645 for 95% coverage, and so on, but here I'll stick to 90%.

The standard intervals follow from Fisher's result that θ̂ is asymptotically normal, unbiased and with standard error fixed by the sample size and the Fisher information,

(8) θ̂ → N(θ, se²),

as in (4). We recognize (8) as one of Fisher's ideal "obvious" forms.

If usage determines importance then the standard intervals were Fisher's most important invention. Their popularity is due to a combination of optimality, or at least asymptotic optimality, with computational tractability. The standard intervals are:

• accurate: their noncoverage probabilities, which are supposed to be 0.05 at each end of the interval, are actually

(9) 0.05 + c/√n,

where c depends on the situation, so as the sample size n gets large we approach the nominal value 0.05 at rate n^(-1/2);

• correct: the estimated standard error based on the Fisher information is the minimum possible for any asymptotically unbiased estimate of θ, so interval (7) doesn't waste any information nor is it misleadingly optimistic;

• automatic: θ̂ and ŝe are computed from the same basic algorithm no matter how complicated the problem may be.

Despite these advantages, applied statisticians know that the standard intervals can be quite inaccurate in small samples. This is illustrated in the left panel of Figure 4 for the cd4 correlation example, where we see that the standard interval endpoints lie far to the right of the endpoints for the normal-theory exact 90% central confidence interval. In fact, we can see from the bootstrap histogram (reproduced from Figure 3) that in this case the asymptotic normality of the MLE hasn't taken hold at n = 20, so that there is every reason to doubt the standard interval. Being able to look at the histogram, which has a lot of information in it, is a luxury Fisher did not have.

Fisher suggested a fix for this specific situation: transform the correlation coefficient to φ̂ = tanh⁻¹(θ̂), that is, to

(10) φ̂ = (1/2) log[(1 + θ̂)/(1 − θ̂)],

apply the standard method on this scale and then transform the standard interval back to the θ scale. This was another one of Fisher's ingenious reduction methods. The tanh⁻¹ transformation greatly accelerates convergence to normality, as we can see from the histogram of the 2,000 values of φ̂* = tanh⁻¹(θ̂*) in the right panel of Figure 4, and makes the standard intervals far more accurate. However, we have now lost the "automatic" property of the standard intervals. The tanh⁻¹ transformation works only for the normal correlation coefficient and not for most other problems.

The standard intervals take literally the large sample approximation θ̂ ~ N(θ, se²), which says that θ̂ is normally distributed, is unbiased for θ and has a constant standard error. A more careful look at the asymptotics shows that each of these three assumptions can fail in a substantial way: the sampling distribution of θ̂ can be skewed; θ̂ can be biased as an estimate of θ; and its standard error can change with θ. Modern computation makes it practical to correct all three errors. I am going to mention two methods of doing so, the first using the bootstrap histogram, the second based on likelihood methods. It turns out that there is enough information in the bootstrap histogram to correct all three errors of the standard intervals.
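The standard interval (7), its bootstrap implementation, and Fisher's transformation trick (10) can all be sketched in a few lines. This is a minimal illustration, not the cd4 analysis itself: the bivariate normal sample below (n = 20, true correlation 0.7) is simulated as a stand-in for the cd4 data, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the cd4 data: n = 20 bivariate normal pairs.
n = 20
x = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n)

def corr(sample):
    return np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]

theta_hat = corr(x)

# Bootstrap estimate of the standard error: resample rows with replacement.
B = 2000
boot = np.array([corr(x[rng.integers(0, n, n)]) for _ in range(B)])
se_hat = boot.std(ddof=1)

# The standard interval (7): theta_hat +/- 1.645 * se_hat, approx. 90% coverage.
standard = (theta_hat - 1.645 * se_hat, theta_hat + 1.645 * se_hat)

# Fisher's fix (10): move to the phi = arctanh(theta) scale, apply the
# standard method there, then map the endpoints back with tanh.
phi_hat = np.arctanh(theta_hat)
se_phi = np.arctanh(boot).std(ddof=1)
transformed = tuple(np.tanh([phi_hat - 1.645 * se_phi,
                             phi_hat + 1.645 * se_phi]))

# The bootstrap histogram itself can also replace the normal approximation:
percentile = tuple(np.percentile(boot, [5, 95]))
```

Unlike (7), the transformed and percentile intervals respect the range of the correlation coefficient; this is the "correction from the bootstrap histogram" idea in miniature.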

FIG. 4. (Left panel) Endpoints of the exact 90% confidence interval for the cd4 correlation coefficient (solid lines) are much different than the standard interval endpoints (dashed lines), as suggested by the nonnormality of the bootstrap histogram. (Right panel) Fisher's transformation normalizes the bootstrap histogram and makes the standard interval more accurate.

The result is a system of approximate confidence intervals an order of magnitude more accurate, with noncoverage probabilities

0.05 + c/n

compared to (9), achieving what is called second order accuracy. Table 1 demonstrates the practical advantages of second order accuracy. In most situations we would not have exact endpoints as a "gold standard" for comparison, but second order accuracy would still point to the superiority of the bootstrap intervals.

TABLE 1
Endpoints of exact and approximate 90% confidence intervals for the cd4 correlation coefficient assuming bivariate normality

The bootstrap method, and also the likelihood-based methods of the next section, are transformation invariant; that is, they give the same interval for the correlation coefficient whether or not you go through the tanh⁻¹ transformation. In this sense they automate Fisher's wonderful transformation trick.

I like this example because it shows how a basic Fisherian construction, the standard intervals, can be extended by modern computation. The extension lets us deal easily with very complicated probability models, even nonparametric ones, and also with complicated statistics such as a coefficient in a stepwise regression.

Moreover, the extension is not just to a wider set of applications. Some progress in understanding the theoretical basis of approximate confidence intervals is made along the way. Other topics are springing up in the same fashion. For example, Fisher's 1925 work on the information loss for insufficient estimators has transmuted into our modern theories of the EM algorithm and Gibbs sampling.

7. CONDITIONAL INFERENCE, ANCILLARITY AND THE MAGIC FORMULA

Table 2 shows the occurrence of a very undesirable side effect in a randomized experiment that will be described more fully later. The treatment produces a smaller ratio of these undesirable effects than does the control, the sample log odds ratio being

θ̂ = log[(1/15)/(13/3)] = −4.2.

TABLE 2
The occurrence of adverse events in a randomized experiment; sample log odds ratio θ̂ = −4.2

              Yes    No
Treatment       1    15    16
Control        13     3    16
               14    18

Fisher wondered how one might make appropriate inferences for θ, the true log odds ratio. The trouble here is nuisance parameters. A multinomial model for the 2 × 2 table has three free parameters, representing four cell probabilities constrained to add up to 1, and in some sense two of the three parameters have to be eliminated in order to get at θ. To do this Fisher came up with another device for reducing a complicated situation to a simple form.

Fisher showed that if we condition on the marginals of the table, then the conditional density of θ̂ given the marginals depends only on θ. The nuisance parameters disappear. This conditioning is "correct," he argued, because the marginals are acting as what might be called approximate ancillary statistics. That is, they do not carry much direct information concerning the value of θ, but they have something to say about how accurately θ̂ estimates θ. Later Neyman gave a much more specific frequentist justification for conditioning on the marginals, through what is now called Neyman structure.

For the data in Table 2, the conditional distribution of θ̂ given the marginals yields [−6.3, −2.4] as a 90% confidence interval for θ, ruling out the null hypothesis value θ = 0 where Treatment equals Control. However, the conditional distribution is not easy to calculate, even in this simple case, and it becomes prohibitive in more complicated situations.

In his 1934 paper, which was the capstone of Fisher's work on efficient estimation, he solved the conditioning problem for translation families. Suppose that x = (x₁, x₂, ..., xₙ) is a random sample from a Cauchy distribution (3) and that we wish to use x to make inferences about θ, the unknown center point of the distribution. In this case there is a genuine ancillary statistic A, the vector of spacings between the ordered values of x. Again Fisher argued that correct inferences about θ should be based on the conditional density f_θ(θ̂ | A).
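Fisher's conditioning device is concrete enough to sketch. The snippet below computes the sample log odds ratio for Table 2 and the conditional density of the table given its margins, a noncentral hypergeometric distribution in which only θ appears; the delta-method standard deviation is included only for orientation and is not Fisher's conditional analysis.

```python
import math

# Table 2 counts: adverse events (Yes, No) under Treatment and Control.
a, b = 1, 15   # Treatment: Yes, No
c, d = 13, 3   # Control:  Yes, No

# Sample log odds ratio, Treatment versus Control: log[(a/b)/(c/d)].
theta_hat = math.log((a / b) / (c / d))          # about -4.17

# Usual delta-method standard deviation (an unconditional approximation).
sd = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def conditional_lik(theta, a_obs=a):
    # Conditional density of the upper-left cell given the table's margins:
    # a noncentral hypergeometric distribution.  The nuisance parameters
    # have disappeared; only the log odds ratio theta remains.
    r1, r2, k = a + b, c + d, a + c              # row totals, column total
    lo, hi = max(0, k - r2), min(k, r1)
    weight = lambda j: (math.comb(r1, j) * math.comb(r2, k - j)
                        * math.exp(theta * j))
    return weight(a_obs) / sum(weight(j) for j in range(lo, hi + 1))
```

For the observed cell a = 1, the conditional likelihood is much higher at θ near −4 than at the null value θ = 0, which is the computation behind the interval [−6.3, −2.4] quoted above.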

FIG. 5. Fiducial density for a binomial parameter θ having observed 3 successes out of 10 trials. The dashed line is an approximation that is useful in complicated situations.

…tially correct. The fiducial distribution in Figure 5 is the confidence density based on the usual confidence limits for θ (taking into account the discrete nature of the binomial distribution): θ̂[α] is the value of θ such that S ~ Binomial(10, θ) satisfies

prob{S > 3} + (1/2) prob{S = 3} = α.

Fisher was uncomfortable applying fiducial arguments to discrete distributions because of the ad hoc continuity corrections required, but the difficulties caused are more theoretical than practical.

The advantage of stating fiducial ideas in terms of the confidence density is that they then can be applied to a wider class of problems. We can use the approximate confidence intervals mentioned earlier, either the bootstrap or the likelihood ones, to get approximate fiducial distributions even in very complicated situations having lots of nuisance parameters. (The dashed curve in Figure 5 is the confidence density based on approximate bootstrap intervals.) And there are practical reasons why it would be very convenient to have good approximate fiducial distributions, reasons connected with our profession's 250-year search for a dependable objective Bayes theory.

8.2 Objective Bayes

By "objective Bayes" I mean a Bayesian theory in which the subjective element is removed from the choice of prior distribution; in practical terms, a universal recipe for applying Bayes theorem in the absence of prior information. A widely accepted objective Bayes theory, which fiducial inference was intended to be, would be of immense theoretical and practical importance.

I have in mind here dealing with messy, complicated problems where we are trying to combine information from disparate sources, doing a metaanalysis, for example. Bayesian methods are particularly well-suited to such problems. This is particularly true now that techniques like the Gibbs sampler and Markov chain Monte Carlo are available for integrating the nuisance parameters out of high-dimensional posterior distributions.

The trouble of course is that the statistician still has to choose a prior distribution in order to use Bayes's theorem. An unthinking use of uniform priors is no better now than it was in Laplace's day. A lot of recent effort has been put into the development of uninformative or objective prior distributions, priors that eliminate nuisance parameters safely while remaining neutral with respect to the parameter of interest. Kass and Wasserman's 1996 JASA article reviews current developments by Berger, Bernardo and many others, but the task of finding genuinely objective priors for high-dimensional problems remains daunting.

Fiducial distributions, or confidence densities, offer a way to finesse this difficulty. A good argument can be made that the confidence density is the posterior density for the parameter of interest, after all of the nuisance parameters have been integrated out in an objective way. If this argument turns out to be valid, then our progress in constructing approximate confidence intervals, and approximate confidence densities, could lead to an easier use of Bayesian thinking in practical problems. This is all quite speculative, but here is a safe prediction for the 21st century: statisticians will be asked to solve bigger and more complicated problems. I believe that there is a good chance that objective Bayes methods will be developed for such problems, and that something like fiducial inference will play an important role in this development. Maybe Fisher's biggest blunder will become a big hit in the 21st century!

9. MODEL SELECTION

Model selection is another area of statistical research where important developments seem to be building up, but without a definitive breakthrough. The question asked here is how to select the model itself, not just the continuous parameters of a given model, from the observed data. F-tests, and "F" stands for Fisher, help with this task, and are certainly the most widely used model selection techniques. However, even in relatively simple problems things can get complicated fast, as anyone who has gotten lost in a tangle of forward and backward stepwise regression programs can testify. The fact is that classic Fisherian estimation and testing theory are a good start, but not much more than that, on model selection. In particular, maximum likelihood estimation theory and model fitting do not account for the number of free parameters being fit, and that is why frequentist methods like Mallows' Cp, the Akaike information criterion and cross-validation have evolved. Model selection seems to be moving away from its Fisherian roots.

Now statisticians are starting to see really complicated model selection problems, with thousands and even millions of data points and hundreds of candidate models. A thriving area called machine learning has developed to handle such problems, in ways that are not yet very well connected to statistical theory.

Table 3, taken from Gail Gong's 1982 thesis, shows part of the data from a model selection problem that is only moderately complicated by today's standards, though hopelessly difficult from a prewar viewpoint. A "training set" of 155 chronic hepatitis patients was measured on 19 diagnostic prediction variables. The outcome variable y was whether or not the patient died from liver failure (122 lived, 33 died), the goal of the study being to develop a prediction rule for y in terms of the diagnostic variables.

TABLE 3
Diagnostic data for the last 11 of the 155 chronic hepatitis patients: the outcome y, 1 or 0 as the patient died or lived, a constant, and the 19 prediction variables Age, Sex, Steroid, Antiviral, Fatigue, Malaise, Anorexia, Big Liver, Firm Liver, Spleen Palpable, Spiders, Ascites, Varices, Bilirubin, Alk Phos, SGOT, Albumin, Protein and Histology; negative numbers indicate missing data

In order to predict the outcome, a logistic regression model was built up in three steps:

• Individual logistic regressions were run for each of the 19 predictors, yielding 13 that were significant at the 0.05 level.
• A forward stepwise logistic regression program, including only those patients with none of the 13 predictors missing, retained 5 of the 13 predictors at significance level 0.10.
• A second forward stepwise logistic regression program, including those patients with none of the 5 predictors missing, retained 4 of the 5 at significance level 0.05.

These last four variables,

(13) ascites, (15) bilirubin, (7) malaise, (20) histology,

were deemed the "important predictors." The logistic regression based on them misclassified 16% of the 155 patients, with cross-validation suggesting a true error rate of about 20%.

A crucial question concerns the validity of the selected model. Should we take the four "important predictors" very seriously in a medical sense? The bootstrap answer seems to be "probably not," even though it was natural for the medical investigator to do so given the impressive amount of statistical machinery involved in their selection.

Gail Gong resampled the 155 patients, taking as a unit each patient's entire record of 19 predictors and response. For each bootstrap data set of 155 resampled records, she reran the three-stage logistic regression model, yielding a bootstrap set of "important predictors." This was done 500 times. Figure 6 shows the important predictors for the final 25 bootstrap data sets. The first of these is (13, 7, 20, 15), agreeing except for order with the set (13, 15, 7, 20) from the original data. This didn't happen in any other of the 499 bootstrap cases.

FIG. 6. The set of "important predictors" selected in the last 25 of 500 bootstrap replications of the three-step logistic regression model selection program; original choices were (13, 15, 7, 20).

In all 500 bootstrap replications only variable 20, histology, which appeared 295 times, was "important" more than half of the time. These results certainly discourage confidence in the causal nature of the predictor variables (13, 15, 7, 20).

Or do they? It seems like we should be able to use the bootstrap results to quantitatively assess the validity of the various predictors. Perhaps they could also help in selecting a better prediction model. Questions like these are being asked these days, but the answers so far are more intriguing than conclusive.

It is not clear to me whether Fisherian methods will play much of a role in the further progress of model selection theory. Figure 6 makes model selection look like an exercise in discrete estimation, while Fisher's MLE theory was always aimed at continuous situations. Direct frequentist methods like cross-validation seem more promising right now, and there have been some recent developments in Bayesian model selection, but in fact our best efforts so far are inadequate for problems like the hepatitis data. We could badly use a clever Fisherian trick for reducing complicated model selection problems to simple obvious ones.
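Gong's resampling experiment is easy to mimic in outline. The sketch below is hypothetical throughout: synthetic data stand in for the hepatitis records, and a crude univariate screen stands in for the three-stage stepwise logistic fit. The bookkeeping, however, is hers: rerun the entire selection procedure on each bootstrap data set and tally how often each predictor is chosen.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the hepatitis study: n patients, p candidate
# predictors, with only the first two truly related to the binary outcome.
n, p = 155, 10
X = rng.normal(size=(n, p))
logit = 1.2 * X[:, 0] - 1.0 * X[:, 1]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

def select(Xb, yb, threshold=2.0):
    # Crude univariate screen standing in for the stepwise logistic fits:
    # keep predictors whose two-sample t statistic exceeds the threshold.
    chosen = set()
    for j in range(Xb.shape[1]):
        g1, g0 = Xb[yb, j], Xb[~yb, j]
        se = np.sqrt(g1.var(ddof=1) / len(g1) + g0.var(ddof=1) / len(g0))
        if abs(g1.mean() - g0.mean()) / se > threshold:
            chosen.add(j)
    return chosen

# Gong's idea: resample whole patient records with replacement and rerun
# the entire selection procedure, counting each predictor's appearances.
counts = np.zeros(p)
for _ in range(200):
    idx = rng.integers(0, n, n)
    for j in select(X[idx], y[idx]):
        counts[j] += 1
```

Predictors with real signal are selected in almost every replication, while noise predictors come and go, which is exactly the instability Figure 6 displays for the hepatitis data.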

10. EMPIRICAL BAYES METHODS

As a final example, I wanted to say a few words about empirical Bayes methods. Empirical Bayes seems like the wave of the future to me, but it seemed that way 25 years ago and the wave still hasn't washed in, despite the fact that it is an area of enormous potential importance. It is not a topic that has had much Fisherian input.

Table 4 shows the data for an empirical Bayes situation: independent clinical trials were run in 41 cities, comparing the occurrence of recurrent bleeding, an undesirable side effect, for two stomach ulcer surgical techniques, a new treatment and an older control. Each trial yielded an estimate of the true log odds ratio for recurrent bleeding, Treatment versus Control,

θᵢ = log odds ratio in city i,  i = 1, 2, ..., 41.

In city 8, for example, we have the estimate seen earlier in Table 2,

θ̂₈ = log[(1/15)/(13/3)] = −4.2,

indicating that the new surgery was very effective in reducing recurrent bleeding, at least in city 8.

FIG. 7. Individual likelihood functions for θᵢ, for 10 of the 41 experiments in Table 4; L₈, the likelihood for the log odds ratio in city 8, lies to the left of most of the others.

Figure 7 shows the likelihoods for θᵢ in 10 of the 41 cities. These are conditional likelihoods, using Fisher's trick of conditioning on the marginals to get rid of the nuisance parameters in each city. It seems clear that the log odds ratios θᵢ are not all the same. For instance, the likelihoods for cities 8 and 13 barely overlap. On the other hand, the θᵢ values are not wildly discrepant, most of the 41 likelihood functions concentrating themselves on the range (−6, 3). (This is the kind of complicated inferential situation I was worrying about in the discussion of fiducial inference, confidence densities and objective Bayes methods.)

TABLE 4
Ulcer data: 41 independent experiments concerning the number of occurrences of recurrent bleeding following ulcer surgery; (a, b) = (# bleeding, # nonbleeding) for Treatment, a new surgical technique; (c, d) is the same for Control, an older surgery; θ̂ is the sample log odds ratio, with estimated standard deviation ŜD; stars indicate cases shown in Figure 7

Experiment   a   b   c   d     θ̂     ŜD  |  Experiment   a   b   c   d     θ̂     ŜD
  1*         7   8  11   2  −1.84   0.86 |    21         6  34  13   8  −2.22   0.61
  2          8  11   8   8  −0.32   0.66 |    22         4  14   5  34   0.66   0.71
  3*         5  29   4  35   0.41   0.68 |    23        14  54  13  61   0.20   0.42
  4          7  29   4  27   0.49   0.65 |    24         6  15   8  13  −0.43   0.64
  5*         3   9   0  12    ∞     1.57 |    25         0   6   6   0   −∞     2.08
  6*         4   3   4   0   −∞     1.65 |    26         1   9   5  10  −1.50   1.02
  7*         4  13  13  11  −1.35   0.68 |    27         5  12   5  10  −0.18   0.73
  8*         1  15  13   3  −4.17   1.04 |    28         0  10  12   2   −∞     1.60
  9          3  11   7  15  −0.54   0.76 |    29         0  22   8  16   −∞     1.49
 10*         2  36  12  20  −2.38   0.75 |    30         2  16  10  11  −1.98   0.80
 11          6   6   8   0   −∞     1.56 |    31         1  14   7   6  −2.79   1.01
 12*         2   5   7   2  −2.17   1.06 |    32         8  16  15  12  −0.92   0.57
 13*         9  12   7  17   0.60   0.61 |    33         6   6   7   2  −1.25   0.92
 14          7  14   5  20   0.69   0.66 |    34         0  20   5  18   −∞     1.51
 15          3  22  11  21  −1.35   0.68 |    35         4  13   2  14   0.77   0.87
 16          4   7   6   4  −0.97   0.86 |    36        10  30  12   8  −1.50   0.57
 17          2   8   8   2  −2.77   1.02 |    37         3  13   2  14   0.48   0.91
 18          1  30   4  23  −1.65   0.98 |    38         4  30   5  14  −0.99   0.71
 19          4  24  15  16  −1.73   0.62 |    39         7  31  15  22  −1.11   0.52
 20*         7  36  16  27  −1.11   0.51 |    40         0  34  34   0   −∞     2.01
                                         |    41         0   9   0  16   NA     2.04
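The empirical Bayes combination can be sketched with a deliberately simplified normal-normal model: each city i is summarized as θ̂ᵢ ~ N(θᵢ, ŜDᵢ²), a method-of-moments prior is fitted to ten of the finite (θ̂, ŜD) pairs from Table 4, and city 8's estimate is shrunk accordingly. These are simplifying assumptions, so the numbers will not match the conditional-likelihood analysis in the text; only the qualitative pull toward the other cities carries over.

```python
import numpy as np

# Ten of the finite (theta_hat, SD) pairs from Table 4, used as the
# "other" cities (experiments 1, 2, 3, 9, 10, 13, 19, 20, 21, 30).
others = np.array([
    (-1.84, 0.86), (-0.32, 0.66), (0.41, 0.68), (-0.54, 0.76),
    (-2.38, 0.75), (0.60, 0.61), (-1.73, 0.62), (-1.11, 0.51),
    (-2.22, 0.61), (-1.98, 0.80),
])
est, sd = others[:, 0], others[:, 1]

# Method-of-moments prior N(mu, tau2): the spread of the estimates in
# excess of their average sampling variance is attributed to the prior.
mu = est.mean()
tau2 = max(est.var(ddof=1) - np.mean(sd**2), 0.0)

# City 8: theta_hat = -4.17, SD = 1.04 (Tables 2 and 4).
theta8, sd8 = -4.17, 1.04

# Normal-normal posterior: precision-weighted compromise between the
# estimated prior and city 8's own likelihood.
w = (1 / tau2) / (1 / tau2 + 1 / sd8**2)
post_mean = w * mu + (1 - w) * theta8
post_sd = (1 / (1 / tau2 + 1 / sd8**2)) ** 0.5
interval = (post_mean - 1.645 * post_sd, post_mean + 1.645 * post_sd)
```

The posterior mean lands between city 8's own estimate and the prior mean fitted from the other cities, which is the shrinkage effect discussed below for intervals (13) and (14).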

Notice that L₈, the likelihood for θ₈, lies to the left of most of the other curves. This would still be true if we could see all 41 curves instead of just 10 of them. In other words, θ₈ appears to be more negative than the log odds ratios in most of the other cities.

What is a good estimate or confidence interval for θ₈? Answering this question depends on how much the results in other cities influence our thinking about city 8. That is where empirical Bayes theory comes in, giving us a systematic framework for combining the direct information for θ₈ from city 8's experiment with the indirect information from the experiments in the other 40 cities.

The ordinary 90% confidence interval for θ₈, based only on the data (1, 15, 13, 3) from its own experiment, is

(13) θ₈ ∈ [−6.3, −2.4].

Empirical Bayes methods give a considerably different result. The empirical Bayes analysis uses the data in the other 40 cities to estimate a prior density for log odds ratios. This prior density can be combined with the likelihood L₈ for city 8, using Bayes theorem, to get a central 90% a posteriori interval for θ₈,

(14) θ₈ ∈ [−5.1, −1.8].

The fact that most of the cities had less negatively tending results than city 8 plays an important role in the empirical Bayes analysis. The Bayesian prior estimated from the other 40 cities says that θ₈ is unlikely to be as negative as its own data by itself would indicate.

The empirical Bayes analysis implies that there is a lot of information in the other 40 cities' data for estimating θ₈, as a matter of fact, just about as much as in city 8's own data. This kind of "other" information does not have a clear Fisherian interpretation. The whole empirical Bayes analysis is heavily Bayesian, as if we had begun with a genuinely informative prior for θ₈, and yet it still has some claims to frequentist objectivity.

Perhaps we are verging here on a new compromise between Bayesian and frequentist methods, one that is fundamentally different from Fisher's proposals. If so, the 21st century could look a lot less Fisherian, at least for problems with parallel structure like the ulcer data. Right now there aren't many such problems. This could change quickly if the statistics community became more confident about analyzing empirical Bayes problems. There weren't many factorial design problems before Fisher provided an effective methodology for handling them. Scientists tend to bring us the problems we can solve. The current attention to metaanalysis and hierarchical models certainly suggests a growing interest in the empirical Bayes kind of situation.

11. THE STATISTICAL TRIANGLE

The development of modern statistical theory has been a three-sided tug of war between the Bayesian, frequentist and Fisherian viewpoints. What I have been trying to do with my examples is apportion the influence of the three philosophies on several topics of current interest: standard error estimation; approximate confidence intervals; conditional inference; objective Bayes theories and fiducial inference; model selection; and empirical Bayes techniques.

Figure 8, the statistical triangle, does this more concisely. It uses barycentric coordinates to indicate the influence of Bayesian, frequentist and Fisherian thinking upon a variety of active research areas. The Fisherian pole of the triangle is located between the Bayesian and frequentist poles, as in Figure 1, but here I have allocated Fisherian philosophy its own dimension to take account of its distinctive operational features: reduction to "obvious" types; the plug-in principle; an emphasis on inferential correctness; the direct interpretation of likelihoods; and the use of automatic computational algorithms.

Of course, a picture like this cannot be more than roughly accurate, even if one accepts the author's prejudices, but many of the locations are difficult to argue with. I had no trouble placing conditional inference and partial likelihood near the Fisherian pole, robustness at the frequentist pole and multiple imputation near the Bayesian pole. Empirical Bayes is clearly a mixture of Bayesian and frequentist ideas. Bootstrap methods combine the convenience of the plug-in principle with a strong frequentist desire for accurate operating characteristics, particularly for approximate confidence intervals, while the jackknife's development has been more purely frequentistic.

Some of the other locations in Figure 8 are more problematical. Fisher provided the original idea behind the EM algorithm, and in fact the self-consistency of maximum likelihood estimation (when missing data is filled in by the statistician) is a classic Fisherian correctness argument. On the other hand EM's modern development has had a strong Bayesian component, seen more clearly in the related topic of Gibbs sampling. Similarly, Fisher's method for combining independent p-values is an early form of metaanalysis, but the subject's recent growth has been strongly frequentist.

FIG. 8. A barycentric picture of modern statistical research, showing the relative influence of the Bayesian, frequentist and Fisherian philosophies upon various topics of current interest.
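For concreteness, here is how barycentric coordinates place a topic in such a triangle. The corner positions and the example weights below are invented for illustration; only the convex-combination arithmetic is the point.

```python
import numpy as np

# Corners of the triangle (illustrative positions, not Figure 8's layout).
corners = np.array([[0.0, 0.0],   # Bayesian
                    [1.0, 0.0],   # frequentist
                    [0.5, 0.9]])  # Fisherian

def place(weights):
    # Barycentric coordinates: normalize the influence weights to sum to 1,
    # then take the corresponding convex combination of the corners.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ corners

# A hypothetical topic judged 20% Bayesian, 30% frequentist, 50% Fisherian
# plots halfway up toward the Fisherian pole.
point = place([0.2, 0.3, 0.5])
```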

The trouble here is that Fisher wasn't always a Fisherian, so it is easy to confuse parentage with development.

The most difficult and embarrassing case concerns what I have been calling "objective Bayes" methods, among which I included fiducial inference. One definition of frequentism is the desire to do well, or at least not to do poorly, against every possible prior distribution. The Bayesian spirit, as epitomized by Savage and de Finetti, is to do very well against one prior distribution, presumably the right one.

There have been a variety of objective Bayes compromises between these two poles. Working near the frequentist end of the spectrum, Welch and Peers showed how to calculate priors whose a posteriori credibility intervals coincide closely with standard confidence intervals. Jeffreys's work, which has led to vigorous modern development of Bayesian model selection, is less frequentistic. In a bivariate normal situation Jeffreys would recommend the same prior distribution for estimating the correlation coefficient or for the ratio of expectations, while the Welch-Peers theory would use two different priors in order to separately match each of the frequentist solutions.

Nevertheless Jeffreys's Bayesianism has an undeniable objectivist flavor. Erich Lehmann (personal communication) had this to say: "If one separates the two Bayesian concepts [Savage-de Finetti and Jeffreys] and puts only the subjective version in your Bayesian corner, it seems to me that something interesting happens: the Jeffreys concept moves to the right and winds up much closer to the frequency corner than to the Bayesian one. For example, you contrasted Bayes as optimistic and risk-taking with frequentist as pessimistic and playing it safe. On both of these scales Jeffreys is much closer to the frequentist end of the spectrum. In fact, the concept of uninformative prior is philosophically close to Wald's least favorable distribution, and the two often coincide."

Lehmann's advice is followed a bit in Figure 8, where the Bayesian model selection (BIC) point, a direct legacy of Jeffreys's work, has been moved a little ways toward the frequentist pole. However, I have located fiducial inference, Fisher's form of objective Bayesianism, near the center of the triangle. There isn't much work in that area right now but there is a lot of demand coming from all three directions.

The point of my examples, and the main point of this talk, was to show that Fisherian statistics is not a dead language and that it continues to inspire new research. I think this is clear in Figure 8, even allowing for its inaccuracies. But Fisher's language is not the only language in town, and it is not even the dominant language of our research journals. That prize would have to go to a rather casual frequentism, not usually as hard-edged as pure decision theory these days. We might ask what Figure 8 will look like 20 or 30 years from now, and whether there will be many points of active research interest lying near the Fisherian pole of the triangle.

12. R. A. FISHER IN THE 21ST CENTURY

Most talks about the future are really about the present, and this one has certainly been no exception. But here at the end of the talk, and nearly at the end of the 20th century, we can peek cautiously ahead and speculate at least a little bit about the future of Fisherian statistics.

Of course Fisher's fundamental discoveries like sufficiency, Fisher information, the asymptotic efficiency of the MLE, experimental design and randomization inference are not going to disappear. They might become less visible though. Right now we use those ideas almost exactly as Fisher coined them, but modern computing equipment could change that.

For example, maximum likelihood estimates can be badly biased in certain situations involving a great many nuisance parameters (as in the Neyman-Scott paradox). A computer-modified version of the MLE that was less biased could become the default estimator of choice in applied problems. REML estimation of variance components offers a current example. Likewise, with the universal spread of high-powered computers statisticians might automatically use some form of the more accurate confidence intervals I mentioned earlier instead of the standard intervals.

Changes like these would conceal Fisher's influence, but not really diminish it. There are a couple of good reasons though that one might expect more dramatic changes in the statistical world, the first of these being the miraculous improvement in our computational equipment, by orders of magnitude every decade. Equipment is destiny in science, and statistics is no exception to that rule. Second, statisticians are being asked to solve bigger, harder, more complicated problems, under such names as pattern recognition, DNA screening, neural networks, imaging and machine learning. New problems have always evoked new solutions in statistics, but this time the solutions might have to be quite radical ones.

Almost by definition it's hard to predict radical change, but I thought I would finish with a few speculative possibilities about a statistical future that might, or might not, be a good deal less Fisherian.

12.1 A Bayesian World

In 1974 Dennis Lindley predicted that the 21st century would be Bayesian. (I notice that his recent Statistical Science interview now predicts the year 2020.) He could be right. Bayesian methods are attractive for complicated problems like the ones just mentioned, but unless the scientific world changes the way it thinks I can't imagine subjective Bayes methods taking over. What I called objective Bayes, the use of neutral or uninformative priors, seems a lot more promising and is certainly in the air these days.

A successful objective Bayes theory would have to provide good frequentist properties in familiar situations, for instance, reasonable coverage probabilities for whatever replaces confidence intervals. Such a Bayesian world might not seem much different than the current situation except for more straightforward analyses of complicated problems like multiple comparisons. One can imagine the statistician of the year 2020 hunched over his or her supercomputer terminal, trying to get Proc Prior to run successfully, and we can only wish that future colleague "good luck."

12.2 Nonparametrics

As part of our Fisherian legacy we tend to overuse simple parametric models like the normal. A nonparametric world, where parametric models were a last resort instead of the first, would favor the frequentist vertex of the triangle picture.

12.3 A New Synthesis

The postwar years and especially the last few decades have been more notable for methodological progress than for the development of fundamental new ideas in the theory of statistical inference. This doesn't mean that such developments are finished forever. Fisher's work came out of the blue in the 1920s, and maybe our field is due for another bolt of lightning.

It's easy for us to imagine that Fisher, Neyman and the others were lucky to live in a time when all the good ideas hadn't been plucked from the trees. In fact, we are the ones living in the golden age of statistics, the time when computation has become fast and easy. In this sense we are overdue for a new statistical paradigm, to consolidate the methodological gains of the postwar period. The rubble is building up again, to use Joan Fisher Box's simile, and we could badly use a new Fisher to put our world in order.

My actual guess is that the old Fisher will have a very good 21st century. The world of applied statistics seems to need an effective compromise between Bayesian and frequentist ideas, and right now there is no substitute in sight for the Fisherian synthesis. Moreover, Fisher's theories are well suited to life in the computer age. Fisher seemed naturally to think in algorithmic terms. Maximum likelihood estimates, the standard intervals, ANOVA tables, permutation tests are all expressed algorithmically and are easy to extend with modern computation.

Let me say finally that Fisher was a genius of the first rank, who has a solid claim to being the most important applied mathematician of the 20th century. His work has a unique quality of daring mathematical synthesis combined with the utmost practicality. The stamp of his work is very much upon our field and shows no sign of fading. It is the stamp of a great thinker, and statistics, and science in general, is much in his debt.

Section 1 Sections 5 and 6 SAVAGE, L. J. H. Ž1976.. On rereading R. A. Fisher Žwith discus- DICICCIO, T. and EFRON, B. Ž1996.. Bootstrap confidence inter- sion. Ann Statist 4 441᎐500 ŽSavage says that Fisher’s . . . . vals Žwith discussion.. Statist. Sci. 11 189᎐228. ŽPresents work greatly influenced his seminal book on subjective and discusses the cd4 data of Figure 2. The bootstrap confi- Bayesianism Fisher’s great ideas are examined lovingly . dence limits in Table 1 were obtained by the BC method.. here, but not uncritically.. a YATES, F. and MATHER K. Ž1971.. Ronald Aylmer Fisher. In Collected Papers of R. A. Fisher ŽK. Mather, ed.. 1 23᎐52. Section 7 Univ. Adelaide Press. ŽReprinted from a 1963 Royal Statisti- REID, N. Ž1995.. The roles of conditioning in inference. Statist. cal Society memoir. Gives a nontechnical assessment of U Sci. 10 138᎐157. wThis is a survey of the p formula, what I Fisher’s ideas, personality and attitudes toward science.. called the magic formula following Ghosh’s terminology, and BOX, J. F. Ž1978.. The Life of a Scientist. Wiley, New York. ŽThis many other topics in conditional inference; see also the is both a personal and an intellectual biography by Fisher’s discussion Žfollowing the companion article. on pages daughter, a scientist in her own right and also an historian 173᎐199, in particular McCullagh’s commentary. Gives an of science, containing some unforgettable vignettes of preco- extensive bibliography.x cious mathematical genius mixed with a difficulty in ordi- EFRON, B. and HINKLEY, D. Ž1978.. Assessing the accuracy of the nary human interaction. The sparrow quote in Section 4 is maximum likelihood estimator: observed versus expected put in context on page 130.. Fisher information. Biometrika 65 457᎐487. ŽConcerns an- cillarity, approximate ancillarity and the assessment of ac- Section 2 curacy for a MLE..

FISHER, R. A. Ž1925.. Theory of statistical estimation Proc . . Section 8 Cambridge Philos. Soc. 22 200᎐225. ŽReprinted in the Mather collection, and also in the 1950 Wiley Fisher collec- EFRON, B. Ž1993.. Bayes and likelihood calculations from con- tion Contributions to Mathematical Statistics. This is my fidence intervals. Biometrika 80 3᎐26. ŽShows how ap- choice for the most important single paper in statistical proximate confidence intervals can be used to get good theory. A competitor might be Fisher’s 1922 Philosophical approximate confidence densities, even in complicated prob- Society paper, but as Fisher himself points out in the Wiley lems with a great many nuisance parameters.. collection, the 1925 paper is more compact and businesslike than was possible in 1922, and more sophisticated as well.. Section 9 EFRON B. Ž1995.. The statistical century. Royal Statistical Soci- ety News 22 Ž5. 1᎐2. ŽThis is mostly about the postwar boom EFRON, B. and GONG, G. Ž1983.. A leisurely look at the bootstrap, in statistical methodology and uses a different statistical the jackknife, and cross-validation. Amer. Statist. 37 36᎐48. triangle than Figure 8.. ŽThe chronic hepatitis example is discussed in Section 10 of this bootstrap᎐jackknife survey article.. Section 3 O’HAGAN, A. Ž1995.. Fractional Bayes factors for model compari- son Žwith discussion.. J. Roy. Statist. Soc. Ser. B 57 99᎐138. FISHER, R. A. Ž1934.. Two new properties of mathematical likeli- ŽThis paper and the ensuing discussion, occasionally rather hood. Proc. Roy. Soc. Ser. A 144 285᎐307. ŽConcerns two heated, give a nice sense of Bayesian model selection in the situations when fully efficient estimation is possible in finite Jeffreys tradition.. samples: one-parameter exponential families, where the KASS, R. and RAFTERY, A. Ž1995.. Bayes factors. J. Amer. Statist. MLE is a sufficient statistic, and location᎐scale families, Soc. 90 773᎐795. 
ŽThis review of Bayesian model selection where there are exhaustive ancillary statistics. Reprinted in features five specific applications and an enormous bibliog- the Mather and the Wiley collections.. raphy.. 114 B. EFRON Section 10 tice, most Bayesian analyses are performed with so-called noninformative priors. . . . ’’. EFRON, B. Ž1996.. Empirical Bayes methods for combining likeli- hoods 538᎐565 . J. Amer. Statist. Assoc. 91 . Section 12 Section 11 LINDLEY, D. V. Ž1974.. The future of statisticsᎏa Bayesian 21st century. In Proceedings of the Conference on Directions for KASS, R. and WASSERMAN, L. Ž1996.. The selection of prior distri- Mathematical Statistics. Univ. College, London. butions by formal rules. J. Amer. Statist. Assoc. 91 SMITH, A. Ž1995.. A conversation with Dennis Lindley. Statist. 1343᎐1370. ŽBegins ‘‘Subjectivism has become the dominant Sci. 10 305᎐319. ŽThis is a nice view of Bayesians and philosophical tradition for Bayesian inference. Yet in prac- Bayesianism. The 2020 prediction is attributed to de Finetti..
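The Neyman–Scott point in Section 12, that maximum likelihood can stay badly biased when nuisance parameters proliferate while a REML-style correction repairs it, can be illustrated with a minimal simulation. The sketch below is my own, not from the paper: each pair of observations has its own nuisance mean, and with two observations per pair the MLE of the common variance converges to half its true value, whereas dividing by the within-pair degrees of freedom restores consistency.

```python
import numpy as np

# Neyman-Scott setup: pairs x_i1, x_i2 ~ N(mu_i, sigma2), one nuisance
# mean mu_i per pair, and a single variance sigma2 of real interest.
rng = np.random.default_rng(0)
n_pairs, sigma2 = 2000, 1.0
mu = rng.normal(0.0, 5.0, size=n_pairs)
x = rng.normal(mu[:, None], np.sqrt(sigma2), size=(n_pairs, 2))

# Within-pair sums of squared residuals about each pair's own mean.
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

mle = ss.mean() / 2    # divide by k = 2 observations per pair: inconsistent
reml = ss.mean() / 1   # divide by k - 1 = 1 degree of freedom: consistent

print(f"MLE {mle:.2f} (tends to sigma2/2), REML {reml:.2f} (tends to sigma2)")
```

The point does not depend on the normal model or the pair size; any fixed number of observations per nuisance parameter leaves the MLE of the variance inconsistent by the same degrees-of-freedom deficit.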

Comment D. R. Cox

I very much enjoyed Professor Efron's eloquent and perceptive assessment of R. A. Fisher's contributions and of their current relevance. I am sure that Professor Efron is right to attach outstanding importance to Fisher's ideas.

As Professor Efron emphasizes, Fisher's ideas are so wide ranging that it is not feasible to cover them all in a single paper. The following outline notes are in supplementation of rather than in disagreement with the paper.

(1) Fisher stressed the need for different modes of attack on different types of inferential problem.

(2) While his formal ideas deal with fully parametric problems he gave the "exact" randomization test based on the procedure used in design. The status to him of such tests is not entirely clear. Did he regard them as reassurance for the faint of heart, timid about working assumptions of normality, or are they the preferred method of analysis to which normal theory based results are often a convenient approximation? Yates vigorously rejected the second interpretation. The more important point is probably that Fisher recognized that randomization indicated the appropriate analysis of variance, that is, the appropriate estimate of error. This replaced the need for special ad hoc assumptions of a new linear model for each design.

(3) A key to understanding some of the distinctions between Fisherian and Neyman–Pearson approaches lies in Fisher's special notion of the meaning of the probability p of a "unique" event as set out, for example, in Fisher (1956, pages 31–36). There are two aspects, one that the individual belongs to an ensemble or population in a proportion p of which the event holds. The other is that it is not possible to recognize the individual as lying in a subpopulation with a different proportion. Fisher considered, it seems to me correctly, that this enabled probability statements to be attached to an unknown parameter on the basis of a random sample, no other information being available, without invoking a prior distribution. The snag is that such distributions cannot be manipulated or combined by the ordinary laws of probability.

(4) The development of Fisher's ideas on Bayesian inference can be traced by comparing the polemical remarks at the start of Design of Experiments (Fisher, 1935) with the more measured comments in Fisher (1956).

(5) Fisher was of course an extremely powerful mathematician, especially with distributional and combinatorial calculations. It helps us to understand his attitude to mathematical rigor to note the remarks of Mahalanobis (1938), who wrote:

    The explicit statement of a rigorous argument interested him but only on the important condition that such explicit demonstration of rigor was needed. Mechanical drill in the technique of rigorous statement was abhorrent to him, partly for its pedantry, and partly as an inhibition to the active use of the mind. He felt it was more important to think actively, even at the expense of occasional errors from which an alert intelligence would soon recover, than to proceed with perfect safety at a snail's pace along well known paths with the aid of a most perfectly designed mechanical crutch.

This is a comment on Fisher's attitude when an undergraduate at Cambridge; it is tempting to think that Mahalanobis was using Fisher's own words.

(6) In some ways Fisher's greatest methodological contributions, the analysis of variance including discriminant analysis, and ideas about experimental design, appear to owe relatively little directly to his ideas on general statistical theory. There are of course connections to be perceived subsequently, for example, to sufficiency and transformation models, and in the case of randomization somewhat underdeveloped connections to conditioning and ancillary statistics. Fisher's mastery of distribution theory was, however, obviously relevant, perhaps most strikingly in his approach to the connection between the distribution theory of multiple regression and that of linear discriminant analysis.

(7) In the normal process of scientific development important notions get simplified and absorbed into the general ethos of the subject, and reference to original sources becomes unnecessary except for the historian of ideas. It is a measure of the range and subtlety of Fisher's ideas that it is still fruitful to read parts at least of the two books mentioned above as well as the papers that Professor Efron mentions in his list of references.

(8) I agree with Professor Efron that one key current issue is the synthesis of ideas on modified likelihood functions, empirical Bayes methods and some notion of reference priors.

(9) I like Professor Efron's triangle, although on balance I would prefer a square, labeled by one axis that represents mathematical formulation and the other that represents conceptual objective. For example, Fisher and Jeffreys were virtually identical in their objective although of course different in their mathematics.

(10) Finally, in contemplating Fisher's contributions, one must not forget that he was as eminent a geneticist as he was a statistician.

D. R. Cox is an Honorary Fellow, Nuffield College, Oxford OX1 1NF, United Kingdom (e-mail: [email protected]).

Comment Rob Kass

Very nice, very provocative. As I read Efron's version of Fisher's profound influence on our discipline, I found myself wondering whether statistics could have succeeded to this point so spectacularly, across so many disciplines, if it had been based primarily on Bayesian logic rather than largely Fisherian frequentist logic. Without the historical counterfactual, the question might be stated this way: from a Bayesian point of view, when, if ever, is frequentist logic necessary?

I believe there are three places where frequentist reasoning is essential to the success of our enterprise. First, we need goodness-of-fit assessments, or what Dempster in his Fisherian ruminations has called "postdictive inference" (see Gelman, Meng and Stern, 1996, and the discussion of it). Second, although many principles have been formulated for defining noninformative, or reference, priors, it seems that good behavior under repeated sampling should play a role somehow. It is possible, as Cox implied in his 1994 Statistical Science interview, that we have not yet recognized how, and I have the sense that Efron shares this view. But the third and perhaps most important place where even we Bayesians currently find frequentist methods useful is that they offer highly desirable shortcuts. This is related to Efron's point at the end of Section 4. The Fisherian frequentist methods for what are now relatively simple situations, such as analysis of variance, are certainly easy to use; it remains unclear whether standardized Bayesian methods could entirely replace them. Equally crucial, however, are the more sophisticated data analytic methods, such as modern nonparametric regression, which share a big relative advantage in ease of use over their Bayesian counterparts. In short, despite my strong preference for Bayesian thinking, based on our current understanding of inference, I cannot see how the next century or any other could be exclusively Bayesian.

I have felt for a long time that the Bayesian versus frequentist contrast, while strictly a matter of logic, might more usefully be considered, metaphorically, a matter of language; that is, they are two alternative languages that are used to grapple with uncertainty, with fluent speakers of one being capable of arriving at an understanding of essentially any phenomenon that is understood by fluent speakers of the other, despite there being no good translation of certain phrases.

I suspect we all agree on Fisher's greatness. I like to say Fisher was to statistics what Newton was to physics. Continuing the analogy, Efron suggests that we need a statistical Einstein. But the real question is whether it is possible to obtain a new framework that achieves the goal of fiducial inference. The situation in statistical inference is beautifully peaceful and compelling for one-parameter problems: reference Bayesian and Fisherian roads converge to second-order, via the magic formula. When we go to the multiparameter world, however, the hope dims not only for a reconciliation of Bayesian and frequentist paradigms, but for any satisfactory, unified approach in either a frequentist or Bayesian framework, and we must wonder whether the world is simply depressingly messy. Indeed, some of the cautionary notes sounded in the 1996 JASA review paper I wrote with Larry Wasserman implicitly suggest that (as pure subjectivists are quick to argue) there may be no way around the fundamental difficulties.

It is clear that statistical problems are becoming much more complicated. I got the possibly erroneous sense (from his comment at the end of Section 8) that Efron connects this to his unease with the current situation and the need for a new paradigm, to be furnished perhaps by our Einstein Messiah. I would instead look toward a different big new theoretical development. Bayesian and frequentist analyses based on parametric models are, from a consumer's point of view, really quite similar. Bayesian nonparametrics, however, is in its infancy and its connection with frequentist nonparametrics almost nonexistent. I hope for a much more thorough and successful Bayesian nonparametric theory and a resulting deeper understanding of infinite-dimensional problems. Perhaps entirely new principles would have to be invoked to supplement those of Fisher, Jeffreys, Neyman and de Finetti–Savage. If it happens, an increasingly nonparametric future would not, in principle, move us toward the frequentist vertex of Efron's triangle. Rather, Bayesian inference would continue to play its illuminating foundational role, important new methodology would be developed, and it might even turn out that there is a genuine, deep and detailed sense in which frequentist methods could be considered shortcut substitutes for full-fledged Bayesian alternatives.

Rob Kass is Professor and Head, Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 (e-mail: [email protected]).

Comment Ole E. Barndorff-Nielsen

It has been a pleasure reading Professor Efron's far-ranging, thoughtful and valiant paper. I agree with most of the views presented there, and this contribution to the discussion of the paper consists of a number of disperse comments, mostly adding to what has been mentioned in the paper.

(1) One, potentially major, omission from the paper's vision of Fisherian influence in the next century is the lack of discussion of the role that ideas and methods of statistical inference may have in quantum mechanics. Such ideas and methods are likely to become of increasing importance, particularly in connection with the developments in experimental techniques that allow the study of very small quantum systems. Moreover, there is already now, in the physical literature, a substantial body of results on quantum analogues of Fisher information and on associated results of statistical differential geometry, in the vein of Amari.

(2) It also seems pertinent to stress the importance of "pseudolikelihood," that is, functions of part or all of the data and part or all of the parameters that to a large extent can be treated as genuine likelihoods. Many of the most fruitful advances in the second half of the 20th century center around such functions.

(3) My own view on optimality is, perhaps, somewhat different from that of Bradley Efron in that I find that the focussing on optimality has to a considerable extent been to the detriment of statistical development. The problems and danger stem from too narrow definitions of what is meant by optimality. One prominent instance of how too much emphasis on a restricted concept of optimality may lead astray from general principles of scientific enquiry is provided by the upper storeys of the Neyman–Pearson test theory edifice. In general, what good is it to have a procedure that is "optimal" if this optimality does not fit in naturally with a comprehensive and integrated methodological approach to scientific studies? However, if I read correctly, Professor Efron and I do not disagree strongly here. I am thinking in particular of his reference to the "spirit of reasonable compromise."

(4) In relation to the statement (at the end of Section 8.1, on "The Confidence Density") concerning approximate confidence intervals and approximate fiducial distributions in situations with many nuisance parameters, I wonder how this is to be reconciled with the many counterexamples to Fisher's idea(s) for fiducial inference under multiparameter distributions.

(5) Concerning model selection, I suspect that techniques in Fisherian spirit can be developed, although this largely remains to be done.

(6) On a historical note, neither Bayesian statistics nor Neyman–Pearson–Wald type theory has ever taken serious foothold in statistics in Denmark, and thinking along lines close to those of Fisher has been prominent throughout. The tradition does, in fact, go back to the latter part of the 19th century, in particular to Thiele, who, one might add, essentially worked with analysis of variance long before Fisher (cf. Hald, 1981).

Ole E. Barndorff-Nielsen is Professor, Theoretical Statistics, Institute of Mathematics, Aarhus University, 8000 Aarhus C, Denmark (e-mail: [email protected]).

Comment D. V. Hinkley

Using this paper as an excuse, I revisited the Collected Papers (Bennett, 1972) after a gap of 15 years and was struck again by how clear and concise the writing was, usually firm and irreverent, but often good-humored.

Fisher's work certainly colored my own view of the principles and practices of statistics, especially in the early 1980s. But I think that the biological context of much of Fisher's work made him wrongly dismissive of Bayes's theorem as a potential tool. It is instructive to read the two-page note by Fisher (1929), which answered Student's suggestion that normal-theory methods such as analysis of variance may be extended to nonnormal data. Fisher says "I have never known difficulty to arise in biological work from imperfect normality of variation, often though I have examined data for this particular cause of difficulty. . . . This is not to say that the deviation from [normal-theory methods] may not have a real application in some technological work . . ." (what would Fisher have thought of mathematical finance, one wonders!). I think that Fisher saw sampling distributions as concrete, logically distinct from the weaker uncertainty characterizing prior distributions, and so incompatible with the application of Bayes's theorem. Fisher's view about sampling models is still widely held, especially among those using non-Bayesian methods, and so model uncertainty is a nonissue in nearly all of statistical education (not so in some sciences, however). But it may yet turn out to be one of the most important topics in statistics. Note that model uncertainty is the antithesis of model selection, which we are beginning to understand fairly well using non-Fisherian ideas.

Other discussants may comment on Efron's interpretation of Fisher, so I shall note only that he may have overstated the contradictions. For example, randomization inference could be conditioned, when appropriate, by appropriate design, as happened with the Knight's Move and Diagonal Squares; see Yates (1970, page 58) and Savage et al. (1976, page 464). Indeed the combination of Efron (1971) and Cox (1982) seems quintessentially Fisherian! (Similar ideas extend into resampling–bootstrap–cross-validation analysis, but are not yet common.)

But what of Fisher's possible influence on the future of statistics? With the bootstrap I am not sure. Certainly the parametric bootstrap involves a harmless use of computers to do Fisher's calculations. The ingenious theoretical basis for the parametric BCa bootstrap method (Efron, 1987) could almost have been written by Fisher himself. The idea of going beyond a simple standard error and standardized estimate (for confidence intervals or tests) follows Fisher; see Fisher (1928), where in reference to the conventional standard error of the sample correlation Fisher quips "among the things 'which students have to know, but only a fool would use'." Nevertheless, the alternative developments based on likelihood seem closer to a continuation of the Fisher approach. (All of these methods do need further development to become easily and widely useable.)

A major question is whether or not the nonparametric bootstrap methods have or will have Fisherian influence. I think not. Certainly when we get into model selection and highly nonparametric regression such as regression trees (Ripley, 1996), we are more in the arena of Tukey than Fisher. Purely empirical validation and assessment does not seem to appear in Fisher's work, possibly due to impossibility of practical calculations.

The topic of empirical Bayes estimation, or random-effect modelling, has become of increasing importance with the spread of metaanalysis ideas and methods. To paraphrase Efron (Section 10), the information about one subpopulation in another subpopulation "does not have a clear Fisherian interpretation." But surely Fisher would have addressed this problem in a sensible way, so we need to figure out how by going back to read Fisher.

Fisher's major contributions may be the ideas about designs to avoid bias in experimental results and to enable calculation of reliable measures of uncertainty. Compared to these, the fine points about whether confidence intervals are Bayesian or not seem relatively unimportant. Fisher did make valuable contributions to the discussion of the roles of probability in statistics, much of it usefully surveyed by Lane (1980). As far as Fisher's theoretical work goes, particularly the 1925 and 1934 masterpieces, it has long seemed to me that Fisher may have inadvertently prepared us for an acceptable future of Bayesian methodology.

D. V. Hinkley is Professor, Department of Statistics and Applied Probability, University of California, Santa Barbara, California 93106-3110 (e-mail: [email protected]).
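Hinkley's remark that the parametric bootstrap is a harmless use of computers to do Fisher's calculations can be made concrete with a short sketch. The example below is my own illustration, not from the discussion: for the mean of an exponential sample, resampling from the fitted model recovers by simulation the same standard error that Fisher-information calculus gives in closed form, namely the estimate divided by the square root of the sample size.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=40)   # observed sample
theta_hat = x.mean()                      # MLE of the exponential mean

# Parametric bootstrap: draw fresh samples from the *fitted* model,
# re-estimate, and use the spread of the replicates as a standard error.
reps = np.array([rng.exponential(scale=theta_hat, size=x.size).mean()
                 for _ in range(4000)])
se_boot = reps.std(ddof=1)

se_exact = theta_hat / np.sqrt(x.size)    # closed-form Fisher-information answer

print(f"bootstrap SE {se_boot:.3f} vs exact SE {se_exact:.3f}")
```

The simulation is doing nothing Fisher could not have done analytically here; its value appears in models where the closed-form step is unavailable.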

Comment D. A. S. Fraser

It is a delight to read Brad’s thoughtful and at the Stanford statistics seminar describing how insightful overview giving deep credits to Fisher’s he had graphed the likelihood function for an ap- contributions to current and future statistics. Brad plied problem to obtain insight on a parameter of mentions large areas not covered by his review and interest, something that now would be viewed as a it is our loss that these areas such as randomiza- sensible step in a statistical analysis. Siegel was tion and experimental design have been omitted; widely challenged by the Statistics faculty then, they may also be among Fisher’s major contribu- that this cannot and should not be done, a frontal tions although much neglected in current practice. I rejection of a Fisherian idea, leaving Siegel feeling strongly endorse Brad’s positive approval of Fisher’s dispossessed of his credibility as a statistician. Also contributions and add just a few further approving that same fall ‘‘Fisher spoke at the Stanford Medi- remarks. cal School’’ as noted by Brad. I was visiting the Brad notes that at ‘‘the time of ŽBrad’s. education Stanford Statistics Department that fall and did Fisher had been reduced to a somewhat minor have notice of Fisher’s talk. The absence of Statis- figure in American academic statistics.’’ This seems tics faculty from Fisher’s talk was a measure of to be a substantial understatement particularly at Fisher’s influence at that time, perhaps the nadir of Stanford in 1961᎐1962 where, and when, Brad be- his influence. Brad does much to correct this and gan his graduate work in statistics. In the fall of give credit for the wealth we have inherited from 1961 a psychologist᎐statistician Sidney Siegel spoke Fisher. I prefer a somewhat different interpretation for ‘‘frequentist’’ than that indicated by ‘‘competing D. A. S. Fraser is Professor, Department of Statis- philosophies . . . 
: Bayesian; Neyman᎐Wald frequen- tics, Sidney Smith Hall, University of Toronto, tist; and Fisherian.’’ ‘‘Frequentist’’ seems to refer to Toronto, Canada M5S 3G3 Že-mail: dfraser@ut- the interpretation for the probabilities given by the stat.toronto.edu.. statistical model alone, thus embracing both Fishe- R. A. FISHER IN THE 21ST CENTURY 119 rian and decision theoretic philosophies. Bayesian which reproduces the conditional density except for statistics adds prior probabilities as the distin- a norming constant. By contrast the given formula guishing feature. But even here the distinction may Ž10. taken literally with the preamble describing now be blurring when we view such additions on a LŽ␪ . ‘‘as a function of ␪ with x fixed’’ at an ob- ‘‘what if’’ basis of temporarily enlarging the model, served value Ž␪ˆ0, A. gives in a way similar to the ‘‘what if’’ basis we often apply to the model itself 0 . g Ž␪ˆ y ␪ < A. Brad refers to empirical Bayes as ‘‘not a topic c , 0 that has had much Fisherian input.’’ It does seem, g Ž␪ˆ y ␪ˆ< A. however, to be quite consonant with Fisher’s view and indeed Fisher should be treated as the initiator which is the MLE density approximation only for 0 of what was latter called empirical Bayes. In his the observed data point Ž␪ˆ, A. s Ž␪ˆ , A.. This may 1956 book Statistical Methods and Scientific Infer- seem like a very technical point but the formulas ence Žpage 18f. Fisher considers the genetic origins are often cited as a way ‘‘for calculating f␪ Ž␪ˆ< A..’’ of an animal under investigation and appends to In practice they can rarely be used for such a the initial statistical model the theoretical᎐em- calculation as it would require the likelihood func- pirical probabilities concerning the origins of the tion to be available at points Ž␪ˆ, A. with fixed A animal. This is pure what-we-now-call empirical and varying ␪ˆ, and this would require an explicit Bayes and Fisher should be credited. 
… an ancillary and much computation. From another viewpoint it is just enlarging the statistical model to provide in some sense proper modeling.

In referring to Fisher's "devices" for solving problems, Brad sometimes uses the seemingly pejorative term "trick." Most of these devices did seem to be tricks when introduced, but hardly now. Perhaps we both long for more of these "tricks" as guides to future devices.

Two formulas, (10) and (11), are recorded for "the conditional density" f_θ(θ̂ | A) "of the MLE θ̂ given [an] ancillary" A. These formulas use cryptic notation that makes them technically wrong without clarification to indicate additional dependencies on θ̂. The context assumes that x is equivalent to (θ̂, A), so that density and likelihood have the form f_θ(θ̂, A) and L(θ; θ̂, A); the amplified formulas would then appear as

    f_θ(θ̂ | A) = c L(θ; θ̂, A) / L(θ̂; θ̂, A),

(1)    f_θ(θ̂ | A) = c [L(θ; θ̂, A) / L(θ̂; θ̂, A)] {−(d²/dθ²) log L(θ; θ̂, A) |_{θ=θ̂}}^{1/2}

for the location and general cases (with appropriate accuracies). In the location model case with f(x; θ) = g(θ̂ − θ | A) h(A), the amplified version of (10) then gives

    c g(θ̂ − θ | A) / g(0 | A).

For the location model case there is a formula from Fisher that gives the conditional density of θ̂ from just the observed likelihood L⁰(θ) = L(θ; θ̂⁰, A):

    f_θ(θ̂ | A; θ) = c L⁰(θ − θ̂ + θ̂⁰).

An extension of this is available for transformation models.

Brad mentions that "the magic formula can be used to generate approximate confidence intervals [with] at least second order accura[cy]." In fact, for continuous models third order confidence intervals are available in wide generality, but they require additional theory for approximate ancillaries.

I particularly welcome Brad's positive views on the prospects for fiducial methods. As one who has worked extensively on contexts where fiducial calculations have good conventional properties (the transformation group context) or are useful for increasing the accuracy of a significance value (using fiducial to eliminate a nuisance parameter), I find it reassuring to see optimism elsewhere. Certainly the typical statistician feels fiducial is wrong, but in most cases he is also unfamiliar with the details or the overlap with confidence methods.

For the case where the dimension of the variable has been reduced to the dimension of the parameter, both fiducial and confidence make use quite generally of a pivotal quantity. The confidence procedure chooses a 90% region on the pivot space and then inverts to the parameter space to get the confidence region; by contrast, the fiducial inverts to the parameter space and then chooses the 90% region. The fiducial issues arise from the increased generality in the second procedure. Of course any observed 90% fiducial region has a corresponding 90% pivot region and is then the 90% confidence region for that pivot region. The controversial issues arise with simulations and repetitions. Should these repetitions always be with the same value of the parameter? This is not the reality of applications! And there are alternatives. I do applaud Brad's boldness in inverting the pivotal distribution and getting a confidence density; the fiducial whisper campaign has been too constraining on the profession.
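The two procedures described above can be checked numerically in the simplest case. The sketch below (not part of the original discussion; `xbar`, `sigma` and `n` are made-up values) uses the normal location pivot Z = (x̄ − θ)/se and compares "choose the 90% region on the pivot scale, then invert" with "invert the pivot first, then choose the 90% region of the resulting confidence density":

```python
# A minimal sketch of the two procedures, for the normal location
# pivot Z = (xbar - theta)/se; all numbers are hypothetical.
from statistics import NormalDist

xbar, sigma, n = 4.2, 1.0, 25          # made-up observed mean, known sigma, n
se = sigma / n ** 0.5

# Confidence route: central 90% region on the pivot scale, then invert
# to the parameter scale.
z = NormalDist().inv_cdf(0.95)
conf_int = (xbar - z * se, xbar + z * se)

# Confidence-density route: invert the pivot first, giving the density
# theta ~ N(xbar, se^2), then take its central 90% region.
cd = NormalDist(mu=xbar, sigma=se)
fid_int = (cd.inv_cdf(0.05), cd.inv_cdf(0.95))

# For a monotone one-dimensional pivot the two routes agree.
assert all(abs(a - b) < 1e-9 for a, b in zip(conf_int, fid_int))
```

For a monotone one-dimensional pivot the inversion is order-preserving, so the two orders of operations give the same interval; the fiducial issues, and the controversy, appear only with the increased generality of the second procedure.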

Comment

A. P. Dempster

Bradley Efron's colorful lecture is fun to read. Brad is generous and accurate in crediting Fisher's important mathematical foundations for sparking the "frequentist" school whose framework is evidently deeply embedded in Brad's psyche and work. At the same time, I question whether he is sufficiently in touch with Fisher's thinking to do justice to Fisher's also remarkable contributions to the logic of inference. A balanced reading of the Fisher-Neyman disputes suggests that the history of 20th century statistics is not a linear path from Fisher, to Neyman and ultimately to a modern "frequentist" statistics whose main challenger is "Bayesianism" of either "subjective" or "objective" varieties. The hackneyed terms in quotes need fundamental clarification and definition, laying out roles in science as actually practiced.

Fisher's conception of inference is built around the interpretation of sampling distributions in relation to sample data. Sample data are frequency data, and sampling distributions have natural frequency interpretations. But these roles for frequency are basic to any view of statistics as a discipline and are far from making Fisher "frequentist" in Neyman's sense. Fisher aimed to characterize the information in data, whereas Neyman settled on a theory that guides a statistician's behavior in choosing among procedures seen as competing on the basis of long run operating characteristics. "Neyman-Wald" theory provides useful insights, but creates a sterile view of practice. After a procedure is chosen and applied, how does one then think postanalysis about uncertainties? Fisher gave interpretations of significance tests, likelihood and fiducial intervals that, whatever their strengths and weaknesses, meet the question head-on.

Understanding Fisher requires understanding that a probability in practice determines a "well-specified state of uncertainty" (Fisher, 1958) about a specific situation in hand, a position identifiable as describing a kind of formal subjectivity that complements rather than contradicts the conventional view of science as objective. Interpreting the p-value of a significance test Fisher's way (Fisher, 1956, page 39), after the result is reported, requires "postdictive" assessment of such a formal subjective probability (Dempster, 1971). As early as 1935, Fisher recognized that "confidence intervals" are "only another way of saying that, by a certain test of significance, some kinds of hypothetical possibilities are to be rejected, while others are not" (Bennett, 1990, page 187). It may be, as Brad says, "generally considered" that fiducial probability was Fisher's "biggest blunder," but it should not be on the basis of the simplistic arguments accompanying Neyman's exasperated polemics, for example, "a conglomeration of mutually inconsistent assertions, not a mathematical theory" (Neyman, 1977, page 100). Interpreting a fiducial probability shares with interpreting a Bayesian posterior probability an intended postanalysis predictive interpretation of formal subjective probability that depends on accepting detailed prior assumptions, including specifically in the case of the fiducial argument that the distribution of an identified pivotal quantity is independent of, and hence unaffected by, the observations. Both fiducial inferences and Bayesian inferences stand or fall on case-by-case judgments of both models and independences assumed.

Was Fisher a "lapsed Fisherian" in the smoking and lung cancer debate of the late 1950s? He was certainly correct to draw attention to the potential for misleading selection biases affecting causal inferences from observational data. Are Fisher's innovative ideas "like randomization inference and conditionality . . . contradictory"? Not if one accepts that postdictive reasoning and predictive reasoning are complementary activities. Conditionality is important for estimation, but irrelevant for interpreting the outcome of a randomization test. Also, in Fisher's view of estimation, conditionality is not the blanket principle of Bayesian statistics. Indeed, selective conditioning was for Fisher a key to avoiding universal submission to Bayes. Fisher was unsuccessful in obvious ways, but he addresses deeper and harder questions than Neymanian theory attempts to solve. I believe we should downplay the sound-bite criticisms coming from self-limiting theoretical perspectives, either frequentist or Bayesian. I do not wish to protest too much, however. Brad and I agree in seeking to resurrect Fisher from his relative obscurity in American academic statistics.

A. P. Dempster is Professor, Department of Statistics, Harvard University, Cambridge, Massachusetts 02138 (e-mail: [email protected]).

Rejoinder

Bradley Efron

One could scarcely ask for a clearer set of discussions, or a more qualified set of discussants, leaving me with very little to rejoin. The lecture itself was written quickly, in a few weeks, and barely revised for publication. My fear was that a careful revision would become just that, careful, and lose its force in an attempt to cover all Fisherian bases. The commentaries each add important ideas that I omitted, while scarcely overlapping with each other. Fisher's world must be a very high-dimensional one! Let me end here with just a few brief reactions to the commentaries.

(1) Paragraph (3) of Professor Cox's comments concerns Fisher's definition of probability, hinging on the absence of recognizable subsets. This is a crucial point for the relevance of statistical arguments to actual decision making, but it is not an easy criterion to apply. As an analog I like to imagine a field of intermixed orange and white poppies, with the proportion of orange poppies representing the probability of an event of interest. Perhaps the orange proportion increases steadily from east to west or from north to south or diagonally from corner to corner. A model-building program based on logistic regression or CART amounts to a systematic search for the field's inhomogeneities.

We are all used to basing statistical inferences on the outputs of such programs, saying perhaps that the estimated probability of an orange poppy at the field's center point is 0.20, but how does this "probability" relate to Fisher's definition? A different model, amounting to a different partitioning of the field, might give a different probability. My problem here is not with recognizability, which is obviously an important idea, but with its application to contexts where the statistician must choose which subsets might be recognizable.

(2) Professor Kass discusses one of the deepest problems of statistical philosophy: why are frequentist computations often useful and compelling, even for those who prefer the Bayesian paradigm? Fisher's work provides the best answers so far to that question, but we are still a long way from having a satisfactory resolution. Kass offers a different hope, via Bayesian nonparametrics, for bridging the chasm. Going this route requires us to solve another deep problem: how to put uninformative prior distributions on high- (or infinite-) dimensional parameter spaces.

(3) On the same point, Professor Barndorff-Nielsen asks how we can reconcile fiducial inference with high-dimensional estimation results like the James-Stein theorem. For a multivariate version of formula (1), x ~ N_K(θ, σ²I), the original fiducial argument seems to say that θ | x ~ N_K(x, σ²I), a terrible answer if we are interested in estimating ‖θ‖², for example. The confidence density approach gives sensible results in such situations. My 1993 Biometrika paper argues, a la Kass, that this is a way of using frequentist methods to aid Bayesian calculations.

(4) Professor Hinkley notes the continued vitality of Tukey-style data analysis. In its purest form this line of work is statistics without probability theory (see, e.g., Mosteller and Tukey's 1977 book Data Analysis and Regression) and as such I could not place it anywhere in the statistical triangle of Section 11. This is my picture's fault of course, not Tukey's. Problem-driven areas like neural networks often begin with a healthy burst of pure data analysis before settling down to an accommodation with statistical theory.

(5) I am grateful to Professor Fraser for presenting a more intelligible version of the magic formula. This was the spot in the talk where "avoiding technicalities" almost avoided coherency. "Trick" is a positive word in my vocabulary, reflecting a Caltech education, and I only wish I could think of some more Fisher-level tricks. Fisherian statistics was not entirely missing from early 1960s Stanford: it flourished in the medical school, where Lincoln Moses and Rupert Miller ran the biostatistics training program.

(6) I resisted the urge to locate the discussants in the statistical triangle, perhaps because Professor Dempster has placed me closer to the frequentist corner than I feel comfortable with. Dempster emphasizes an important point: that Fisher's theory, more so than Neyman's, aimed at postdata interpretations of uncertainty. This appeared in almost pure Bayesian form with fiducial inference, but usually was expressed less formally, as in his interpretations of significance test p-values. (See Section 20 of the 1954 edition of Statistical Methods for Research Workers.)

One cannot consider Fisher without also discussing Neyman, and he is likely to come out of such discussions sounding like the bad guy. This is most unfair. To invert Kass's metaphor, Neyman played Niels Bohr to Fisher's Einstein. Nobody could match Fisher's intuition for statistical inference, but even the strongest intuition can sometimes go astray. Neyman-Wald decision theory was an heroic and largely successful attempt to put statistical inference on a sound mathematical foundation. Too much mathematical soundness can be stultifying, as Barndorff-Nielsen points out, but too much intuition can cross over into mysticism, and one can sympathize with Neyman's frustration over Fisher's sometimes Delphic fiducial pronouncements.

(7) Hinkley and Dempster (and others at the lecture) questioned whether randomization inference really contradicts the conditionality principle. I have to admit to using both ends of the contradiction myself without feeling any great amount of guilt, but the logical inconsistency still seems to be there. Isn't a randomly chosen experimental design an ancillary, and shouldn't we condition on it rather than basing inference on its random properties? Dempster's statement "Conditionality is important for estimation, but irrelevant for interpreting the outcome of a randomization test" seems to just restate the problem. We condition on a randomly selected sample size n (to borrow Professor Cox's famous example) whether the data is used for estimation or testing . . . .

(8) Lumping subjective and objective Bayesianism together simplified my presentation, but maybe to a dangerous degree. It elicited Professor Lehmann's objection quoted in Section 11. Perhaps it would be better to follow Professor Cox's preference for a square instead of a triangle.

Statistical philosophy is best taken in small doses. The discussants have followed this rule (even if I have not) with six excellent brief essays. I am grateful to them, to COPSS for the invitation to give the Fisher lecture, and to the Editors of Statistical Science for arranging this discussion.

ADDITIONAL REFERENCES

BENNETT, J. H. (1972). Collected Papers of R. A. Fisher. Univ. Adelaide Press.
BENNETT, J. H., ed. (1990). Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher. Oxford Univ. Press.
COX, D. R. (1982). A remark on randomization in clinical trials. Utilitas Math. 21A 245-252. (Birthday volume for F. Yates.)
DEMPSTER, A. P. (1971). Model searching and estimation in the logic of inference. In Foundations of Statistical Inference (V. P. Godambe and D. A. Sprott, eds.) 56-78. Holt, Rinehart and Winston, Toronto.
EFRON, B. (1971). Forcing a sequential experiment to be balanced. Biometrika 58 403-417.
EFRON, B. (1987). Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82 171-200.
FISHER, R. A. (1928). Correlation coefficients in meteorology. Nature 121 712.
FISHER, R. A. (1929). Statistics and biological research. Nature 124 266-267.
FISHER, R. A. (1935). Design of Experiments. Oliver and Boyd, Edinburgh.
FISHER, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh. (Slightly revised versions appeared in 1958 and 1960.)
FISHER, R. A. (1958). The nature of probability. Centennial Review 2 261-274.
GELMAN, A., MENG, X.-L. and STERN, H. (1996). Posterior predictive assessments of model fitness (with discussion). Statist. Sinica 6 773-807.
HALD, A. (1981). T. N. Thiele's contributions to statistics. Internat. Statist. Rev. 49 1-20.
LANE, D. A. (1980). Fisher, Jeffreys and the nature of probability. In R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1. Springer, New York.
MAHALANOBIS, P. C. (1938). Professor Ronald Aylmer Fisher. Sankhya 4 265-272.
NEYMAN, J. (1977). Frequentist probability and frequentist statistics. Synthese 36 97-131.
RIPLEY, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press.
SAVAGE, L. J. (1976). On rereading R. A. Fisher. Ann. Statist. 4 441-483.
YATES, F. (1970). Experimental Design: Selected Papers of Frank Yates, C.B.E., F.R.S. Griffin, London.
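The multivariate fiducial breakdown mentioned in point (3) of the rejoinder can be checked with a small simulation. The sketch below (an illustration with made-up values of K, σ and θ, not taken from the paper) shows that for x ~ N_K(θ, σ²I), the naive fiducial distribution θ | x ~ N_K(x, σ²I) systematically overestimates ‖θ‖²:

```python
# Sketch: under the naive fiducial posterior theta|x ~ N_K(x, sigma^2 I),
# the mean of ||theta||^2 is ||x||^2 + K*sigma^2, which overshoots the
# true value by about 2*K*sigma^2 on average. Illustration values only.
import random

random.seed(1)
K, sigma, reps = 100, 1.0, 2000
theta = [2.0] * K                      # made-up true parameter, ||theta||^2 = 400
overshoot = 0.0
for _ in range(reps):
    x = [t + random.gauss(0, sigma) for t in theta]
    fid_mean = sum(xi * xi for xi in x) + K * sigma ** 2  # fiducial E||theta||^2
    overshoot += fid_mean - 400.0
print(overshoot / reps)                # near 2*K*sigma^2 = 200, not 0
```

Since E‖x‖² = ‖θ‖² + Kσ², the unbiased estimate of ‖θ‖² is ‖x‖² − Kσ²; the naive fiducial mean sits a full 2Kσ² above it, which is the sense in which it is "a terrible answer" in high dimensions and why a confidence density construction behaves differently here.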