Statistical Inference: The Big Picture¹
Robert E. Kass

Statistical Science 2011, Vol. 26, No. 1, 1–9. DOI: 10.1214/10-STS337. © Institute of Mathematical Statistics, 2011

Abstract. Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative “big picture” depiction.

Key words and phrases: Bayesian, confidence, frequentist, statistical education, statistical pragmatism, statistical significance.

Robert E. Kass is Professor, Department of Statistics, Center for the Neural Basis of Cognition and Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA (e-mail: [email protected]).

¹ Discussed in 10.1214/11-STS337C, 10.1214/11-STS337A, 10.1214/11-STS337D and 10.1214/11-STS337B; rejoinder at 10.1214/11-STS337REJ.

1. INTRODUCTION

The protracted battle for the foundations of statistics, joined vociferously by Fisher, Jeffreys, Neyman, Savage and many disciples, has been deeply illuminating, but it has left statistics without a philosophy that matches contemporary attitudes. Because each camp took as its goal exclusive ownership of inference, each was doomed to failure. We have all, or nearly all, moved past these old debates, yet our textbook explanations have not caught up with the eclecticism of statistical practice.

The difficulties go both ways. Bayesians have denied the utility of confidence and statistical significance, attempting to sweep aside the obvious success of these concepts in applied work. Meanwhile, for their part, frequentists have ignored the possibility of inference about unique events despite their ubiquitous occurrence throughout science. Furthermore, interpretations of posterior probability in terms of subjective belief, or confidence in terms of long-run frequency, give students a limited and sometimes confusing view of the nature of statistical inference. When used to introduce the expression of uncertainty based on a random sample, these caricatures forfeit an opportunity to articulate a fundamental attitude of statistical practice.

Most modern practitioners have, I think, an open-minded view about alternative modes of inference, but are acutely aware of theoretical assumptions and the many ways they may be mistaken. I would suggest that it makes more sense to place in the center of our logical framework the match or mismatch of theoretical assumptions with the real world of data. This, it seems to me, is the common ground that Bayesian and frequentist statistics share; it is more fundamental than either paradigm taken separately; and as we strive to foster widespread understanding of statistical reasoning, it is more important for beginning students to appreciate the role of theoretical assumptions than for them to recite correctly the long-run interpretation of confidence intervals. With the hope of prodding our discipline to right a lingering imbalance, I attempt here to describe the dominant contemporary philosophy of statistics.

2. STATISTICAL PRAGMATISM

I propose to call this modern philosophy statistical pragmatism. I think it is based on the following attitudes:

1. Confidence, statistical significance, and posterior probability are all valuable inferential tools.

2. Simple chance situations, where counting arguments may be based on symmetries that generate equally likely outcomes (six faces on a fair die; 52 cards in a shuffled deck), supply basic intuitions about probability. Probability may be built up to important but less immediately intuitive situations using abstract mathematics, much the way real numbers are defined abstractly based on intuitions coming from fractions. Probability is usefully calibrated in terms of fair bets: another way to say the probability of rolling a 3 with a fair die is 1/6 is that 5 to 1 odds against rolling a 3 would be a fair bet.

3. Long-run frequencies are important mathematically, interpretively, and pedagogically. However, it is possible to assign probabilities to unique events, including rolling a 3 with a fair die or having a confidence interval cover the true mean, without considering long-run frequency. Long-run frequencies may be regarded as consequences of the law of large numbers rather than as part of the definition of probability or confidence.

4. Similarly, the subjective interpretation of posterior probability is important as a way of understanding Bayesian inference, but it is not fundamental to its use: in reporting a 95% posterior interval one need not make a statement such as, “My personal probability of this interval covering the mean is 0.95.”

5. Statistical inferences of all kinds use statistical models, which embody theoretical assumptions. As illustrated in Figure 1, like scientific models, statistical models exist in an abstract framework; to distinguish this framework from the real world inhabited by data we may call it a “theoretical world.” Random variables, confidence intervals, and posterior probabilities all live in this theoretical world. When we use a statistical model to make a statistical inference we implicitly assert that the variation exhibited by data is captured reasonably well by the statistical model, so that the theoretical world corresponds reasonably well to the real world. Conclusions are drawn by applying a statistical inference technique, which is a theoretical construct, to some real data. Figure 1 depicts the conclusions as straddling the theoretical and real worlds. Statistical inferences may have implications for the real world of new observable phenomena, but in scientific contexts, conclusions most often concern scientific models (or theories), so that their “real world” implications (involving new data) are somewhat indirect (the new data will involve new and different experiments).

FIG. 1. The big picture of statistical inference. Statistical procedures are abstractly defined in terms of mathematics but are used, in conjunction with scientific models and methods, to explain observable phenomena. This picture emphasizes the hypothetical link between variation in data and its description using statistical models.

The statistical models in Figure 1 could involve large function spaces or other relatively weak probabilistic assumptions. Careful consideration of the connection between models and data is a core component of both the art of statistical practice and the science of statistical methodology. The purpose of Figure 1 is to shift the grounds for discussion.

Note, in particular, that data should not be confused with random variables. Random variables live in the theoretical world. When we say things like, “Let us assume the data are normally distributed” and we proceed to make a statistical inference, we do not need to take these words literally as asserting that the data form a random sample. Instead, this kind of language is a convenient and familiar shorthand for the much weaker assertion that, for our specified purposes, the variability of the data is adequately consistent with variability that would occur in a random sample. This linguistic amenity is used routinely in both frequentist and Bayesian frameworks. Historically, the distinction between data and random variables, the match of the model to the data, was set aside, to be treated as a separate topic apart from the foundations of inference. But once the data themselves were considered random variables, the frequentist-Bayesian debate moved into the theoretical world: it became a debate about the best way to reason from random variables to inferences about parameters. This was consistent with developments elsewhere. In other parts of science, the distinction between quantities to be measured and their theoretical counterparts within a mathematical theory can be relegated to a different subject, to a theory of errors. In statistics, we do not have that luxury, and it seems to me important, from a pragmatic viewpoint, to bring to center stage the identification of models with data. The purpose of doing so is that it provides different interpretations of both frequentist and Bayesian inference, interpretations which, I believe, are closer to the attitude of modern statistical practitioners.

A familiar practical situation where these issues arise is binary regression. A classic example comes from a psychophysical experiment conducted by Hecht, Schlaer and Pirenne (1942), who investigated the sensitivity of the human visual system by constructing an apparatus that would emit flashes of light at very low intensity in a darkened room. Those authors presented light of varying intensities repeatedly to several subjects and determined, for each intensity, the proportion of times each subject would respond that he or she had seen a flash of light. For each subject the resulting data are repeated binary observations (“yes” perceived versus “no” did not perceive) at each of many intensities

FIG. 2. (A) BARS fits to a pair of peri-stimulus time histograms displaying neural firing rate of a particular neuron under two alternative experimental conditions. (B) The two BARS fits are overlaid for ease of comparison.

monkey. The data came from a study ultimately reported by Rollenhagen and Olson (2005), which investigated the differential response of individual neurons under two experimental conditions. Figure 2 displays BARS fits under the two conditions. One way to quantify the discrepancy between the fits is to estimate the drop in firing rate from peak (the maximal firing rate) to the trough immediately following the peak in each condition. Let us call these peak minus trough differences, under the two conditions, $\phi_1$ and $\phi_2$. Using BARS, DiMatteo, Genovese and Kass reported a posterior mean of $\hat{\phi}_1 - \hat{\phi}_2 = 50.0$ with posterior standard deviation (±20.8).
