Quick viewing(Text Mode)

Statistical Inference: the Big Picture

Statistical Inference: the Big Picture

arXiv:1106.2895v2 [stat.OT] 22 Jun 2011 ttsia neec:TeBgPicture Big Kass E. The Robert : Statistical 01 o.2,N.1 1–9 1, DOI: No. 26, Vol. 2011, Science Statistical 10.1214/11-STS337REJ 10.1214/11-STS337D [email protected] itbrh enyvna123 S e-mail: USA 15213, Machine Pennsylvania University, and Pittsburgh, Cognition Mellon Carnegie of Department, Basis Learning Neural the for Center iosocrec hogotsine Furthermore, ubiq- science. their throughout despite occurrence events uitous unique possibility for about the Meanwhile, inference ignored of have work. frequentists applied part, in their concepts these success obvious signifi- of the statistical aside sweep and to confidence attempting cance, of the nied the practice. with statistical our up of yet caught eclecticism debates, not old have these explanations all, past textbook have of moved We all, ownership failure. nearly exclusive to or goal doomed was its each as inference, philoso- took Because a camp attitudes. without each contemporary matches left that illumi- has phy deeply it been but has nating, disciples, many and Neyman, Savage Jeffreys, Fisher, by vociferously joined tics,

oetE asi rfso,Dprmn fStatistics, of Department Professor, is Kass E. Robert c ttsia Science article Statistical original the the of by reprint published electronic an is This ern iesfo h rgnli aiainand pagination in detail. original typographic the from differs reprint 1 Statistics Mathematical of Institute h iclisg ohwy.Bysashv de- have Bayesians ways. both go difficulties The statis- of foundations the for battle protracted The icse in Discussed 10.1214/10-STS337 10.1214/11-STS337C .INTRODUCTION 1. nttt fMteaia Statistics Mathematical of Institute . , ut?Isgetta hlspycmail ihstatisti with i compatible to here philosophy ability a labeled our that leave suggest this I does sults? Where past. the of troversies Abstract. dcto,saitclpamts,saitclsignifica statistical , statistical education, phrases: depictio and picture” “big words alternative Key an propose I of I and process . the ference observed mischaracterize with often t courses models introductory emphasizes statistical and connect inclusive that is tions pragmatism Statistical ence. 01 o.2,N.1 1–9 1, No. 26, Vol. 2011, and . 10.1214/11-STS337B 2011 , ttsishsmvdbyn h rqets-aeincon- frequentist-Bayesian the beyond moved has Statistics ttsia pragmatism statistical , 10.1214/11-STS337A eone at rejoinder ; This . aein ofiec,feunit statistical frequentist, confidence, Bayesian, in , 1 evsa onainfrinfer- for foundation a as serves , oiatcneprr hlspyo statistics. right of the to philosophy describe discipline contemporary to here dominant our attempt I prodding imbalance, lingering of a intervals. hope correctly confidence the recite of With to interpretation them theoret- long-run for of the than role the assumptions be- appreciate ical for to important students more is ginning it understanding reasoning, widespread statistical foster of to strive we that as separately; and more ground taken is common either it the than share; fundamental is statistics frequentist me, and to Bayesian seems data. of it of world mismatch real This, the or center with match the assumptions the in theoretical framework place logical to sug- sense our would more of I makes mistaken. it that be may gest they ways and many assumptions inference, the theoretical of of aware acutely modes are alternative but about view minded fundamental practice. a statistical articulate of to attitude uncer- caricatures opportunity these of an , expression random forfeit a the on introduce based tainty inference. to statistical used sometimes of nature When and the limited of view a long-run confusing students of terms give in of , confidence terms or in belief, subjective posterior of interpretations attitudes: pragmatism cal otmdr rciinr ae hn,a open- an think, I have, practitioners modern Most rps ocl hsmdr philosophy modern this call to propose I .SAITCLPRAGMATISM STATISTICAL 2. hn ti ae ntefollowing the on based is it think I . nce. ttsia in- statistical a practice, cal n. trrtre- nterpret eassump- he ru that argue 1 statisti- 2 R. E. KASS

1. Confidence, statistical significance, and are all valuable inferential tools. 2. Simple chance situations, where counting argu- ments may be based on symmetries that generate equally likely outcomes (six faces on a fair die; 52 cards in a shuffled deck), supply basic intuitions about probability. Probability may be built up to important but less immediately intuitive situ- ations using abstract , much the way Fig. 1. The big picture of . real numbers are defined abstractly based on in- Statistical procedures are abstractly defined in tuitions coming from fractions. Probability is use- terms of mathematics but are used, in conjunc- tion with scientific models and methods, to explain fully calibrated in terms of fair bets: another way observable phenomena. This picture emphasizes the hypothet- to say the probability of rolling a 3 with a fair ical link between variation in data and its description using die is 1/6 is that 5 to 1 odds against rolling a 3 statistical models. would be a fair bet. 3. Long-run frequencies are important mathemati- conclusions most often concern scientific models cally, interpretively, and pedagogically. However, (or theories), so that their “real world” implica- it is possible to assign to unique tions (involving new data) are somewhat indirect events, including rolling a 3 with a fair die or (the new data will involve new and different ex- having a confidence interval cover the true , periments). without considering long-run frequency. Long-run The statistical models in Figure 1 could involve frequencies may be regarded as consequences of large function spaces or other relatively weak prob- the law of large numbers rather than as part of abilistic assumptions. Careful consideration of the the definition of probability or confidence. connection between models and data is a core com- 4. Similarly, the subjective interpretation of poste- ponent of both the art of statistical practice and rior probability is important as a way of under- the science of statistical . The purpose standing , but it is not funda- of Figure 1 is to shift the grounds for discussion. mental to its use: in reporting a 95% posterior Note, in particular, that data should not be con- interval one need not make a statement such as, fused with random variables. Random variables live “My personal probability of this interval covering in the theoretical world. When we say things like, the mean is 0.95.” “Let us assume the data are normally distributed” 5. Statistical of all kinds use statistical and we proceed to make a statistical inference, we models, which embody theoretical assumptions. do not need to take these words literally as asserting As illustrated in Figure 1, like scientific models, that the data form a random sample. Instead, this statistical models exist in an abstract framework; kind of language is a convenient and familiar short- to distinguish this framework from the real world hand for the much weaker assertion that, for our inhabited by data we may call it a “theoretical specified purposes, the variability of the data is ade- world.” Random variables, confidence intervals, quately consistent with variability that would occur and posterior probabilities all live in this theo- in a random sample. This linguistic amenity is used retical world. When we use a to routinely in both frequentist and Bayesian frame- make a statistical inference we implicitly assert works. Historically, the distinction between data and that the variation exhibited by data is captured random variables, the match of the model to the reasonably well by the statistical model, so that data, was set aside, to be treated as a separate topic the theoretical world corresponds reasonably well apart from the foundations of inference. But once to the real world. Conclusions are drawn by ap- the data themselves were considered random vari- plying a statistical inference technique, which is ables, the frequentist-Bayesian debate moved into a theoretical construct, to some real data. Fig- the theoretical world: it became a debate about the ure 1 depicts the conclusions as straddling the best way to reason from random variables to infer- theoretical and real worlds. Statistical inferences ences about parameters. This was consistent with may have implications for the real world of new developments elsewhere. In other parts of science, observable phenomena, but in scientific contexts, the distinction between quantities to be measured STATISTICAL INFERENCE 3

(A)

(B)

Fig. 2. (A) BARS fits to a pair of peri-stimulus time displaying neural firing rate of a particular neuron under two alternative experimental conditions. (B) The two BARS fits are overlaid for ease of comparison. and their theoretical counterparts within a mathe- would report perception with probability p = 0.5. matical theory can be relegated to a different sub- Because the data reported by Hecht et al. involved ject—to a theory of errors. In statistics, we do not fairly large samples, we would obtain essentially the have that luxury, and it seems to me important, same answer if instead we applied Bayesian methods from a pragmatic viewpoint, to bring to center stage to get an interval having 95% posterior probability. the identification of models with data. The purpose But how should such an interval be interpreted? of doing so is that it provides different interpreta- A more recent example comes from DiMatteo, Gen- tions of both frequentist and Bayesian inference, in- ovese and Kass (2001), who illustrated a new non- terpretations which, I believe, are closer to the atti- parametric regression method called Bayesian adap- tude of modern statistical practitioners. tive regression splines (BARS) by analyzing neu- A familiar practical situation where these issues ral firing rate data from inferotemporal cortex of arise is binary regression. A classic example comes a macaque monkey. The data came from a study ul- from a psychophysical conducted by timately reported by Rollenhagen and Olson (2005), Hecht, Schlaer and Pirenne (1942), who investigated which investigated the differential response of indi- the sensitivity of the human visual system by con- vidual neurons under two experimental conditions. structing an apparatus that would emit flashes of Figure 2 displays BARS fits under the two condi- light at very low intensity in a darkened room. Those tions. One way to quantify the discrepancy between authors presented light of varying intensities repeat- the fits is to estimate the drop in firing rate from edly to several subjects and determined, for each in- peak (the maximal firing rate) to the trough im- tensity, the proportion of times each subject would mediately following the peak in each condition. Let respond that he or she had seen a flash of light. us call these peak minus trough differences, under For each subject the resulting data are repeated bi- the two conditions, φ1 and φ2. Using BARS, DiMat- nary observations (“yes” perceived versus “no” did teo, Genovese and Kass reported a posterior mean not perceive) at each of many intensities and, these of φˆ1 φˆ2 = 50.0 with posterior days, the standard statistical tool to analyze such ( 20.−8). In follow-up work, Wallstrom, Liebner and data is . We might, for instance, Kass± (2008) reported very good frequentist cover- use maximum likelihood to find a 95% confidence in- age probability of 95% posterior probability inter- terval for the intensity of light at which the subject vals based on BARS for similar quantities under 4 R. E. KASS conditions chosen to mimic such experi- The main point here is that we do not need a long- mental data. Thus, a BARS-based posterior interval run interpretation of probability, but we do have could be considered from either a Bayesian or fre- to be reminded that the unique-event probability quentist point of view. Again we may ask how such of 0.95 remains a theoretical statement because it an inferential interval should be interpreted. applies to random variables rather than data. Let us turn to the Bayesian case. 3. INTERPRETATIONS Bayesian assumptions. Suppose X1, X2, Statistical pragmatism involves mildly altered in- ...,Xn form a random sample from a N(µ, 1) distri- 2 terpretations of frequentist and Bayesian inference. bution and the prior distribution of µ is N(µ0,τ ), with τ 2 1 and 49τ 2 µ . For definiteness I will discuss the paradigm case of ≫ 49 ≫ | 0| confidence and posterior intervals for a normal mean The posterior distribution of µ is normal, the pos- based on a sample of size n, with the standard de- terior mean becomes viation being known. Suppose that we have n = 49 τ 2 1/49 observations that have a sample mean equal to 10.2. µ¯ = 10.2+ µ 1/49 + τ 2 1/49 + τ 2 0 Frequentist assumptions. Suppose X , X , 1 2 and the posterior is ...,Xn are i.i.d. random variables from a normal − distribution with mean µ and standard deviation 1 1 v = 49 + σ = 1. In other words, suppose X1, X2,...,Xn form  τ 2  a random sample from a N(µ, 1) distribution. 2 1 2 but because τ 49 and 49τ µ0 we have Noting thatx ¯ = 10.2 and √49 = 7 we define the ≫ ≫ | | µ¯ 10.2 inferential interval ≈ and I = (10.2 2 , 10.2+ 2 ). − 7 7 1 v . The interval I may be regarded as a 95% confidence ≈ 49 interval. I now contrast the standard frequentist in- Therefore, the inferential interval I defined above terpretation with the pragmatic interepretation. has posterior probability 0.95. Frequentist interpretation of confidence Bayesian interpretation of posterior in- interval. Under the assumptions above, if we we- terval. Under the assumptions above, the prob- re to draw infinitely many random samples from a ability that µ is in the interval I is 0.95. N(µ, 1) distribution, 95% of the corresponding con- fidence intervals (X¯ 2 , X¯ + 2 ) would cover µ. Pragmatic interpretation of posterior in- − 7 7 terval. If the data were a random sample for Pragmatic interpretation of confidence which (2) holds, that is,x ¯ = 10.2, and if the as- interval. If we were to draw a random sample sumptions above were to hold, then the probability according to the assumptions above, the resulting that µ is in the interval I would be 0.95. This refers confidence interval (X¯ 2 , X¯ + 2 ) would have prob- to a hypothetical valuex ¯ of the − 7 7 ability 0.95 of covering µ. Because the random sam- X¯ , and because X¯ lives in the theoretical world the ple lives in the theoretical world, this is a theoretical statement remains theoretical. Nonetheless, we are statement. Nonetheless, substituting able to draw useful conclusions from the data as long ¯ as our theoretical world is aligned well with the real (1) X =x ¯ world that produced the data. together with Here, although the Bayesian approach escapes the (2) x¯ = 10.2 indirectness of confidence within the theoretical world, it cannot escape it in the world of data anal- we obtain the interval I, and are able to draw use- ysis because there remains the additional layer of ful conclusions as long as our theoretical world is identifying data with random variables. According aligned well with the real world that produced the to the pragmatic interpretation, the posterior is not, data. literally, a statement about the way the observed STATISTICAL INFERENCE 5 data relate to the unknown parameter µ because inference, and my claim is that Figure 1 is more ac- those objects live in different worlds. The language curate. For instance, in the psychophysical example of Bayesian inference, like the language of frequen- of Hecht, Schlaer and Pirenne discussed in Section tist inference, takes a convenient shortcut by blur- 2, there is no population of “yes” or “no” replies ring the distinction between data and random vari- from which a random sample is drawn. We do not ables. need to struggle to make an analogy with a simple The commonality between frequentist and Baye- random sample. Furthermore, any thoughts along sian inferences is the use of theoretical assumptions, these lines may draw attention away from the most together with a subjunctive statement. In both ap- important theoretical assumptions, such as indepen- proaches a statistical model is introduced—in the dence among the responses. Figure 1 is supposed to Bayesian case the prior distributions become part of remind students to look for the important assump- what I am here calling the model—and we may say tions, and ask whether they describe the variation that the inference is based on what would happen in the data reasonably accurately. if the data were to be random variables distributed One of the reasons the population and sample pic- according to the statistical model. This modeling as- ture in Figure 3 is so attractive pedagogically is that sumption would be reasonable if the model were to it reinforces the fundamental distinction between describe accurately the variation in the data. parameters and statistics through the terms popula- tion mean and sample mean. To my way of thinking, 4. IMPLICATIONS FOR TEACHING this terminology, inherited from Fisher, is unfortu- nate. Instead of “population mean” I would much It is important for students in introductory statis- prefer theoretical mean, because it captures better tics courses to see the subject as a coherent, princi- the notion that a theoretical distribution is being pled whole. Instructors, and textbook authors, may introduced, a notion that is reinforced by Figure 1. try to help by providing some notion of a “big pic- I have found Figure 1 helpful in teaching basic ture.” Often this is done literally, with an illustra- statistics. For instance, when talking about random tion such as Figure 3 (e.g., Lovett, Meyer and Thille, variables I like to begin with a set of data, where 2008). This kind of illustration can be extremely use- variation is displayed in a , and then say ful if referenced repeatedly throughout a course. that probability may be used to describe such vari- Figure 3 represents a standard story about statis- ation. I then tell the students we must introduce tical inference. Fisher introduced the idea of a ran- mathematical objects called random variables, and dom sample drawn from a hypothetical infinite pop- in defining them and applying the concept to the ulation, and Neyman and Pearson’s work encour- data at hand, I immediately acknowledge that this aged subsequent mathematical to drop is an abstraction, while also stating that—as the stu- the word “hypothetical” and instead describe statis- dents will see repeatedly in many examples—it can tical inference as analogous to simple random sam- be an extraordinarily useful abstraction whenever pling from a finite population. This is the concept the theoretical world of random variables is aligned that Figure 3 tries to get across. My complaint is well with the real world of the data. that it is not a good general description of statistical I have also used Figure 1 in my classes when de- scribing attitudes toward that statisti- cal training aims to instill. Specifically, I define sta- tistical thinking, as in the article by Brown and Kass (2009), to involve two principles: 1. Statistical models of regularity and variability in data may be used to express knowledge and un- certainty about a signal in the presence of noise, via . 2. Statistical methods may be analyzed to deter- mine how well they are likely to perform. Fig. 3. The big picture of statistical inference according to the standard conception. Here, a random sample is pictured Principle 1 identifies the source of statistical infer- as a sample from a finite population. ence to be the hypothesized link between data and 6 R. E. KASS

Fig. 4. A more elaborate big picture, reflecting in greater detail the process of statistical inference. As in Figure 1, there is a hypothetical link between data and statistical models but here the data are connected more specifically to their representation as random variables. statistical models. In explaining, I explicitly distin- to simple mathematical descriptions or “laws”) and guish the use of probability to describe variation unexplained variability, which is usually taken to be and to express knowledge. A probabilistic descrip- “noise.” The figure also includes the components tion of variation would be “The probability of rolling exploratory data analysis—EDA—and algorithms, a 3 with a fair die is 1/6” while an expression of but the main message of Figure 4, given by the la- knowledge would be “I’m 90% sure the capital of bels of the two big boxes, is the same as that in Wyoming is Cheyenne.” These two sorts of state- Figure 1. ments, which use probability in different ways, are sometimes considered to involve two different kinds 5. DISCUSSION of probability, which have been called “aleatory prob- According to my understanding, laid out above, ability” and “epistemic probability.” Bayesians mer- statistical pragmatism has two main features: it is ge these, applying the laws of probability to go from eclectic and it emphasizes the assumptions that con- quantitative description to quantified belief, but in nect statistical models with observed data. The prag- every form of statistical inference aleatory probabil- matic view acknowledges that both sides of the fre- ity is used, somehow, to make epistemic statements. quentist-Bayesian debate made important points. This is Principle 1. Principle 2 is that the same sorts Bayesians scoffed at the artificiality in using sam- of statistical models may be used to evaluate sta- pling from a finite population to motivate all of tistical procedures—though in the classroom I also inference, and in using long-run behavior to define explain that performance of procedures is usually characteristics of procedures. Within the theoretical investigated under varying circumstances. world, posterior probabilities are more direct, and For somewhat more advanced audiences it is pos- therefore seemed to offer much stronger inferences. sible to elaborate, describing in more detail the pro- Frequentists bristled, pointing to the subjectivity of cess trained statisticians follow when reasoning from prior distributions. Bayesians responded by treating data. A big picture of the overall process is given subjectivity as a virtue on the grounds that all in- in Figure 4. That figure indicates the hypothetical ferences are subjective yet, while there is a kernel connection between data and random variables, be- of truth in this observation—we are all human be- tween key features of unobserved mechanisms and ings, making our own judgments—subjectivism was parameters, and between real-world and theoretical never satisfying as a logical framework: an impor- conclusions. It further indicates that data display tant purpose of the scientific enterprise is to go be- both regularity (which is often described in theo- yond personal decision-making. Nonetheless, from retical terms as a “signal,” sometimes conforming a pragmatic perspective, while the selection of prior STATISTICAL INFERENCE 7 probabilities is important, their use is not so prob- only did we not emphasize this dichotomy but we lematic as to disqualify Bayesian methods, and in did not even mention the distinction between the looking back on history the introduction of prior approaches or their inferential interpretations. distributions may not have been the central bother- In fact, in my first publication involving analysis some issue it was made out to be. Instead, it seems of neural data (Olson et al., 2001) we reported more to me, the really troubling point for frequentists than a dozen different statistical analyses, some fre- has been the Bayesian claim to a philosophical high quentist, some Bayesian. Furthermore, methods from ground, where compelling inferences could be de- the two approaches are sometimes glued together in livered at negligible logical cost. Frequentists have a single analysis. For example, to examine several always felt that no such thing should be possible. neural firing-rate intensity functions λ1(t), . . . , λp(t), The difficulty begins not with the introduction of assumed to be smooth functions of time t, Behseta et prior distributions but with the gap between models al. (2007) developed a frequentist approach to test- 1 p and data, which is neither frequentist nor Bayesian. ing the hypothesis H0 : λ (t)= = λ (t), for all t, Statistical pragmatism avoids this irritation by ac- that incorporated BARS smoothing.· · · Such hybrids knowledging explicitly the tenuous connection be- are not uncommon, and they do not force a prac- tween the real and theoretical worlds. As a result, titioner to walk around with mutually inconsistent its inferences are necessarily subjunctive. We speak interpretations of statistical inference. Figure 1 pro- of what would be inferred if our assumptions were vides a general framework that encompasses both to hold. The inferential bridge is traversed, by both of the major approaches to methodology while em- frequentist and Bayesian methods, when we act as if phasizing the inherent gap between data and mod- the data were generated by random variables. In the eling assumptions, a gap that is bridged through normal mean example discussed in Section 4, the key subjunctive statements. The advantage of the prag- step involves the conjunction of the two equations matic framework is that it considers frequentist and (1) and (2). Strictly speaking, according to statisti- Bayesian inference to be equally respectable and al- cal pragmatism, equation (1) lives in the theoretical lows us to have a consistent interpretation, without world while equation (2) lives in the real world; the feeling as if we must have split personalities in or- bridge is built by allowingx ¯ to refer to both the der to be competent statisticians. More to the point, theoretical value of the random variable and the ob- this framework seems to me to resemble more closely served data value. what we do in practice: statisticians offer inferences In pondering the nature of statistical inference I couched in a cautionary attitude. Perhaps we might am, like others, guided partly by past and present even say that most practitioners are subjunctivists. sages (for an overview see Barnett, 1999), but also I have emphasized subjunctive statements partly by my own experience and by watching many col- because, on the frequentist side, they eliminate any leagues in action. Many of the sharpest and most need for long-run interpretation. For Bayesian meth- vicious Bayes-frequentist debates took place during ods they eliminate reliance on subjectivism. The the dominance of pure theory in academia. Statis- Bayesian point of view was articulated admirably ticians are now more inclined to argue about the by Jeffreys (see Robert, Chopin and Rousseau, 2009, extent to which a method succeeds in solving a data and accompanying discussion) but it became clear, analytic problem. Much statistical practice revolves especially from the of Savage and sub- around getting good estimates and standard errors sequent investigations in the 1970s, that the only in complicated settings where statistical uncertainty solid foundation for Bayesianism is subjective (see is smaller than the unquantified aggregate of many Kass and Wasserman, 1996, and Kass, 2006). Sta- other uncertainties in scientific investigation. In such tistical pragmatism pulls us out of that solipsistic contexts, the distinction between frequentist and quagmire. On the other hand, I do not mean to im- Bayesian logic becomes unimportant and con tem- ply that it really does not matter what approach porary practitioners move freely between frequentist is taken in a particular instance. Current attention and Bayesian techniques using one or the other de- frequently focuses on challenging, high-dimensional pending on the problem. Thus, in a review of statis- datasets where frequentist and Bayesian methods tical methods in neurophysiology in which my col- may differ. Statistical pragmatism is agnostic on leagues and I discussed both frequentist and Baye- this. Instead, procedures should be judged according sian methods (Kass, Ventura and Brown, 2005), not to their performance under theoretical conditions 8 R. E. KASS thought to capture relevant real-world variation in or situations described by statistical or quantum a particular applied setting. This is where our juxta- physics. I believe the chance-mechanism conception position of the theoretical world with the real world errs in declaring that data are assumed to be random earns its keep. variables, rather than allowing the gap of Figure 1 I called the story about statistical inference told to be bridged2 by statements such as (2). In say- by Figure 3 “standard” because it is imbedded in ing this I am trying to listen carefully to the voice many introductory texts, such as the path-breaking in my head that comes from the late David Freed- book by Freedman, Pisani and Purves (2007) and man (see Freedman and Ziesel, 1988). I imagine he the excellent and very popular book by Moore and might call crossing this bridge, in the absence of an McCabe (2005). My criticism is that the standard explicit chance mechanism, a leap of faith. In a strict story misrepresents the way statistical inference is sense I am inclined to agree. It seems to me, how- commonly understood by trained statisticians, por- ever, that it is precisely this leap of faith that makes traying it as analogous to simple random statistical reasoning possible in the vast majority of from a finite population. As I noted, the population applications. versus sampling terminology comes from Fisher, but Statistical models that go beyond chance mecha- I believe the conception in Figure 1 is closer to Fi- nisms have been central to statistical inference since sher’s conception of the relationship between theory Fisher and Jeffreys, and their role in reasoning has and data. Fisher spoke pointedly of a hypothetical been considered by many authors (e.g., Cox, 1990; infinite population, but in the standard story of Fig- Lehmann, 1990). An outstanding issue is the ex- ure 3 the “hypothetical” part of this notion—which tent to which statistical models are like the theo- is crucial to the concept—gets dropped (confer also retical models used throughout science (see Stan- Lenhard, 2006). I understand Fisher’s “hypotheti- ford, 2006). I would argue, on the one hand, that cal” to connote what I have here called “theoreti- they are similar: the most fundamental belief of any cal.” Fisher did not anticipate the co-option of his scientist is that the theoretical and real worlds are framework and was, in large part for this reason, aligned. On the other hand, as observed in Section horrified by subsequent developments by Neyman 2, statistics is unique in having to face the gap be- and Pearson. The terminology “theoretical” avoids tween theoretical and real worlds every time a model this confusion and thus may offer a clearer represen- 1 is applied and, it seems to me, this is a big part of tation of Fisher’s idea. what we offer our scientific collaborators. Statisti- We now recognize Neyman and Pearson to have cal pragmatism recognizes that all forms of statisti- made permanent, important contributions to sta- cal inference make assumptions, assumptions which tistical inference through their introduction of hy- can only be tested very crudely (with such things as pothesis testing and confidence. From today’s van- goodness-of-fit methods) and can almost never be tage point, however, their behavioral interpretation verified. This is not only at the heart of statistical seems quaint, especially when represented by their inference, it is also the great wisdom of our field. famous dictum, “We are inclined to think that as far as a particular hypothesis is concerned, no test ACKNOWLEDGMENTS based upon the theory of probability can by itself provide any valuable evidence of the truth or false- This work was supported in part by NIH Grant hood of that hypothesis.” Nonetheless, that inter- MH064537. The author is grateful for comments on pretation seems to have inspired the attitude be- an earlier draft by Brian Junker, Nancy Reid, Steven hind Figure 3. In the extreme, one may be led to Stigler, Larry Wasserman and Gordon Weinberg. insist that statistical inferences are valid only when some chance mechanism has generated the data. The problem with the chance-mechanism conception is REFERENCES that it applies to a rather small part of the real Barnett, V. (1999). Comparative Statistical Inference, 3rd world, where there is either actual random sampling ed. Wiley, New York. MR0663189

1Fisher also introduced populations partly because he used 2Because probability is introduced with the goal of drawing long-run frequency as a foundation for probability, which sta- conclusions via statistical inference, it is, in a philosophical tistical pragmatism considers unnecessary. sense, “instrumental.” See Glymour (2001). STATISTICAL INFERENCE 9

Behseta, S., Kass, R. E., Moorman, D. and Olson, C. R. Lehmann, E. L. (1990). Model specification: The views of (2007). Testing equality of several functions: Analysis of Fisher and Neyman, and later developments. Statist. Sci. single-unit firing rate curves across multiple experimental 5 160–168. MR1062574 conditions. Statist. Med. 26 3958–3975. MR2395881 Lenhard, J. (2006). Models and statistical inference: The Brown, E. N. and Kass, R. E. (2009). What is statistics? controversy between Fisher and Neyman–Pearson. British 57 (with discussion). Amer. Statist. 63 105–123. J. Philos. Sci. 69–91. MR2209772 Lovett, M., Meyer, O. Thille, C. Cox, D. R. (1990). Role of models in statistical analysis. and (2008). The open learning initiative: Measuring the effectiveness of the OLI Statist. Sci. 5 169–174. MR1062575 statistics course in accelerating student learning. J. Inter- DiMatteo, I., Genovese, C. R. and Kass, R. E. (2001). act. Media Educ. 14. Bayesian curve-fitting with free-knot splines. Biometrika Moore, D. S. and McCabe, G. (2005). Introduction to the 88 1055–1071. MR1872219 Practice of Statistics, 5th ed. W. H. Freeman, New York. Freedman, D., Pisani, R. and Purves, R. (2007). Statistics, Olson, C. R., Gettner, S. N., Ventura, V., Carta, R. 4th ed. W. W. Norton, New York. and Kass, R. E. (2001). Neuronal activity in macaque Freedman, D. and Ziesel (1988). From mouse-to-man: The supplementary eye field during planning of saccades in re- quantitative assessment of cancer risks (with discussion). sponse to pattern and spatial cues. J. Neurophysiol. 84 Statist. Sci. 3 3–56. 1369–1384. Glymour, C. (2001). Instrumental probability. Monist 84 Robert, C. P., Chopin, N. and Rousseau, J. (2009). 284–300. Harold Jeffreys’ theory of probability revisited (with dis- Hecht, S., Schlaer, S. and Pirenne, M. H. (1942). Energy, cussion). Statist. Sci. 24 141–194. MR2655841 quanta and vision. J. Gen. Physiol. 25 819–840. Rollenhagen, J. E. and Olson, C. R. (2005). Low- Kass, R. E. (2006). Kinds of Bayesians (comment on articles frequency oscillations arising from competitive interactions by Berger and by Goldstein). Bayesian Anal. 1 437–440. between visual stimuli in macaque inferotemporal cortex. 94 MR2221277 J. Neurophysiol. 3368–3387. Stanford, P. K. (2006). Exceeding Our Grasp. Oxford Univ. Kass, R. E., Ventura, V. and Brown, E. N. (2005). Sta- Press. tistical issues in the analysis of neuronal data. J. Neuro- Wallstrom, G., Liebner, J. and Kass, R. E. (2008). physiol. 94 8–25. An implementation of Bayesian adaptive regression splines Kass, R. E. Wasserman, L. A. and (1996). The selection of (BARS) in C with S and R wrappers. J. Statist. Software prior distributions by formal rules. J. Amer. Statist. Assoc. 26 1–21. 91 1343–1370. MR1478684