
7

REPLICATION

PETER R. KILLEEN

We come finally, however, to the relation of the ideal theory to the real world, or "real" probability. . . . To someone who wants [applications, a consistent mathematician] would say that the ideal system runs parallel to the usual theory: "If this is what you want, try it: it is not my business to justify application of the system; that can only be done by philosophizing; I am a mathematician." In practice he is apt to say: "try this; if it works that will justify it." But now he is not merely philosophizing; he is committing the characteristic fallacy. Inductive experience that the system works is not evidence.

Littlewood (1953, p. 73)

PROBABILITY AS A MODEL SYSTEM

For millennia, Euclidean geometry was a statement of fact about the world order. Only in the 19th century did it come to be recognized instead as a model system—an "ideal theory"—that worked exceedingly well when applied to many parts of the real world. It then stepped down from a truth about the world to its current place as first among equals as models of the world—the most useful of a cohort of geometries, each of differential service in particular cases, on spherical surfaces and relativistic universes and fractal percolates. In like manner, probability theory was born as an explanation of the contingent world—"'real' probability"—and, with the work of Kolmogorov among many others, it matured as a coherent model system, inheriting most features of the earlier versions of the probability calculus.

The abstraction of model systems from the world permits their development as coherent, clear, and concise logics. But the abstraction has another legacy: the eventual need for scientists to reconnect the model system to the empirical world. That such rapprochement is even possible is amazing; it stimulated Wigner's well-known allusion to "the unreasonable effectiveness of mathematics in describing the world." Realizing such "unreasonably effective" descriptions, however, can present reasonably formidable difficulties—difficulties that are sometimes overcome only by fiat, as noted by Littlewood, a mathematician of no mean ability, and M. Kline (1980), a scholar of comparable acuity.

Author's Note: The research was supported by National Science Foundation Grant IBN 0236821 and National Institute of Mental Health Grant 1R01MH066860.



The toolbox that helps us apply the "ideal theory" of probability to scientific questions is called inferential statistics. These tools are being continually sharpened, with new designs replacing old.

Intellectual ontogeny recapitulates its cultural phylogeny. Just as we must outgrow naive physics, we must outgrow naive statistics. The former is an easier transition than the latter. Not only must we as students of contingency deal with the gamblers' fallacies and exchange paradoxes; we must also cope with the academics' fallacies and statistical paradoxes that are visited upon us as idols of our theater, the university classroom. The first step, one already taken by most readers of this volume, is to recognize that we deal with model systems, some more useful than others, not with truths about real things. The second step is to understand the character of the most relevant tools for their application, their strengths and weaknesses, and attempt to determine in which cases their marriage to data is one of mere convenience and in which it is blessed with a deeper, Wignerian resonance. That step requires us to remain appreciative but critical craftsmen. It requires us to look through the halo of mathematics that surrounds all statistical inference to assess the fit between tool and task, to ask of each statistical technique whether it gives us leverage or just adds decoration.

This chapter briefly reviews—briefly, because there are so many good alternative sources (e.g., Harlow, Mulaik, & Steiger, 1997; R. B. Kline, 2004)—the most basic statistical technique we use, null hypothesis statistical testing (NHST), and its limits. It then describes an alternative statistic, p_rep, that predicts replicability. We remain mindful of Littlewood's (1953) observation that "inductive experience that the system works is not evidence [that it is true]." But then Littlewood was a mathematician, not a scientist. The search for truth about parameters has often befuddled the progress of science, which recognizes simpler goals as well: to understand and predict. If we "try [a tool, and] it works," that can be very good news and may constitute a significant advance over what has been. So, try this new tool, and see if it works for your inferential problems.

Connecting Probability to Data

You are faced with two columns of numbers, data collected from two groups of subjects. What do you want to know? Not, of course, "whether there's a significant difference between them." If they are identical, you would have looked for the clerical error. If they are different, they are different. You can review them 100 times, and they will continue to be different, hopefully 100 times; p = 1.0. "Significantly different," you might emphasize, irritated. But what does that mean? "That the probability that they would be so different by chance is less than 5%," you recite. OK. Progress. Now we just need clarification of probability, so, and chance.

Probability. Probability theory is a deductive calculus. One starts with probability generators, such as coins or cards or dice, and makes deductions about their behavior. The premises are precise: coins with a probability of heads of .50, perfectly balanced dice, perfectly shuffled cards. Then elegant theorems solve problems such as "Given an unbiased coin, what is the probability of flipping 6 heads in a row?" But scientists are never given such ideal objects. Their modal inferences are inductions, not deductions: An informant gives them a series of outcomes from flipping a coin that landed heads six times in a row, and they must determine what probability of heads should be assigned to the coin. They can solve this mystery either as Dr. Watson or as Mr. Holmes in the cherished tale, "The Case of the Hypothesis That Had No Teeth." As you well remember, Dr. Watson studiously purged his mind of all prior biases and opined that the probability of the coin being fair was manifestly (1/2)^6 < .025 and, further, that the best estimate of the probability of a heads was 1 − 2^−6. Mr. Holmes stuffed his pipe; examined the coin; spun it; asked about its origin, how the coin was released and caught, and how many sequences were required to get that run of six; and then inquired about the bank account of the informant and his recent associates. Dr. Watson objected that that was going beyond the information given; in any case, how could one ever combine all those diverse clues into a probability statement that was not intrinsically subjective? "Elementary," Mr. Holmes observed, "probability theory this is not, my dear Watson; nor is it deduction. When I infer a state of nature from evidence, the more evidence the better I infer. My colleague shall explain how to concatenate evidence in a later chapter of this sage book I saw you nodding over."


How do we infer probability from a situation in which there is no uncertainty—the six heads in a row, last week's soccer cup, your two columns of experimental data? There are two root metaphors for probability: For frequentists, probability is the long-run relative frequency of an outcome; it does not apply to novel events (no long run) or to accomplished events (faits accomplis support no probability other than unity). For Bayesians, probability is the relative odds that an individual gives to an outcome. You do not have resources or interest to reconduct your experiment thousands of times, to estimate the relative frequency of two columns being so different. And even if you did, you'd just be left with a much larger set of accomplished data. This seems to eliminate the frequentist solution. On the other hand, the odds that you give to your outcome will be different from the odds your reviewers or editor or your significant other gives to your outcome. This eliminates any unique Bayesian probability. What next? Just imagine.

Instead of conducting the experiment thousands of times, just imagine that it has been conducted thousands of times. This is Fisher's brilliant solution to the problem of connecting data to the "ideal theory" of probability. Well then, just how big an effect should we imagine that your experiment yielded? Here we must temper imagination with discipline: We must imagine that the experiment never worked—that there was never a real effect in all those imaginary trials but that the outcomes were distributed by that rogue called Chance. Next, graph the proportion of outcomes at various effect sizes to give a sampling distribution. If your measured outcome happens only rarely among this cohort of no real effects, you may conclude that it really is not of their type—it does not belong on the group null bench. It is so deviant, you infer, because more than Chance was at work—the experimental manipulation was at work! This last inference is, as we shall see, as common, and commonly sanctioned, as it is unjustified.

If the thousands-of-hypothetical-trials scenario taxes the computational resources of your imagination, then imagine instead that the data were drawn from a large, normally distributed population of data similar to those of the control group. Increase the power of the test by estimating the population variance from the variances of both the experimental and control groups. Then a theoretical distribution, such as the t distribution, can be directly used to infer how often so deviant an outcome as what you measured would have happened by chance under repeated sampling. This set of tactics is the paradigmatic modus operandi for NHST. In modern applications, the theoretical distribution may be replaced with an empirical one, obtained by Monte Carlo elaboration of the original empirical distribution function (e.g., Davison & Hinkley, 1997; Mooney & Duval, 1993).

So in this standard scenario, probability means the long-run relative frequency that you stipulate in your test of the behavior of an ideal object. It is against this that you will test your data. Unlike a Bayesian analysis, the parameters tested (e.g., the null hypothesis that the means of the two populations are equal) are not inferred from data. Indeed, authorities such as Kyburg (1987) argue that use of Bayesian inverse inference, leading up from statistics to parameters, undermines all direct inference thereafter, including NHST.

So. Let us assume your experiment recorded a standardized difference between the responses of 30 control subjects and 30 experimental subjects of d = 0.50, from which you calculated a p value of < .05. What does that mean? Does it mean that the probability of getting a d of 0.50 under the null is less than 5%? No, because we know a priori that the probability of getting exactly that value is always very close to zero; in fact, the more decimal places in your measurements, the closer to zero. That's true for any real number you might have recorded, even d = 0.00, which Chance favors.

Again an impasse, but again one that can be solved through imagination (providing inductive evidence supporting Einstein's chestnut "Imagination is more important than knowledge"). To get a probability requires an interval on the x-axis. We could take one around the observed value: say, d ± 0.05. This would work; as we let the interval shrink toward zero, comparison with a similar extent around 0 would give us a likelihood analysis of the null versus the observed (Royall, 1997).
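Fisher's thought experiment is easy to stage on a computer. The following minimal sketch (in Python; it is an illustration of the idea, not code from the chapter, and all names are mine) runs thousands of imaginary null experiments and asks how rare the observed effect size is among them:

```python
import numpy as np

rng = np.random.default_rng(1)
n_c = n_e = 30          # group sizes, as in the running example
d_obs = 0.50            # the effect size your experiment yielded

# Imagine the experiment run 10,000 times with no real effect:
null_ds = []
for _ in range(10_000):
    c = rng.normal(0.0, 1.0, n_c)       # control group, null world
    e = rng.normal(0.0, 1.0, n_e)       # experimental group, same null world
    sp = np.sqrt(((n_c - 1) * c.var(ddof=1) + (n_e - 1) * e.var(ddof=1))
                 / (n_c + n_e - 2))     # pooled standard deviation
    null_ds.append((e.mean() - c.mean()) / sp)

# Proportion of null experiments at least as extreme as the one observed:
p = np.mean(np.array(null_ds) >= d_obs)
print(f"one-tailed p under the null: {p:.3f}")   # roughly .03 here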


Fisher was a pioneer of likelihood analyses but could not get likelihoods to behave like probabilities, and so he suggested a different interval on the evidence axis: He gave the investigator credit for everything more extreme than the observed statistic (a benefice that amazed and pleased many generations of young researchers, who might have been told instead, "The null will give effects up to that size 95% of the time"). So, what so means here is "more extreme than" what you found, including d scores of 1, 2, . . . 100. . . . Don't ask why those extents of the x-axis never visited by data should play a role in the decision; take the p < .05 and run.

Chance. "The probability [relative frequency under random sampling] that your data would be so [at least that] extreme by chance." Here chance means your manipulation did not work and only the null did. How does the null work? Typically, thanks to the central limit theorem, in ways that result in a Gaussian distribution of effects (although for other test statistics or inferences, related distributions such as the t or F are the correct asymptotic distributions). Two parameters completely determine the Gaussian: its mean and variance. Under the null, the mean is zero. Its variance? Since we have no other way to determine it, we use the data you brought with you, the variance of your control group. Well . . . the experimental group could also provide information. Hoping that the experimental operations have not perturbed that too much, we will pool the information from both sources. Then chance means that "with no help from my experimental manipulation, the impotent null could have given rise to the observed difference by the luck of the draw [of a sample from our hypothetical population]." If it would happen in less than 5% of the samples we hypothetically take, we call the data significantly different from that expected under the null.

You knew all that from Stat 101, but it is worth the review. What do you conclude with a gratifying p < .05? Also as learned in Stat 101, you conclude that those are improbably extreme data. But what many of us thought we heard was that we could then reject the null. That, of course, is simplistic at best, false at worst.

FOUNDATIONAL PROBLEMS WITH STATISTICAL INFERENCE

The Inverse Inference Problem. Assume the probability of the statistic S given the null (N) is less than a prespecified critical number, p(S|N) < α, where the null may be a hypothesis such as μ_E − μ_C = 0. It does not then follow that the probability of the null given the statistic, p(N|S), is less than α. We can get from one to the other, however, via Bayes' theorem:

p(N|S) = p(S|N)p(N)/p(S).

By the (prior) probability of the null, p(N), we mean the probability of the mathematical implication of the null (for instance, that two population means are equal, μ_E − μ_C = 0). That prior must be assigned a value before looking at the new data (S). By the probability of the statistic, p(S), we mean its probability under relevant states of the world—in this case, the probability of the data given that the null is true, plus the probability of the data given that the null is false: p(S) = p(S|N)p(N) + p(S|~N)(1 − p(N)). By normalizing the right-hand side of Bayes's equation, p(S) makes the posterior probabilities sum to 1. Only in the unlikely case that the prior probability of the null equals the probability of the statistic, p(N) = p(S), does p(N|S) = p(S|N). Otherwise, to have any sense of the probability of the null (and, by implication, of the alternative we favor, that our manipulation was effective: μ_E − μ_C > 0), we must be able to estimate p(N)/p(S). This is not easy. Even if we could agree on assigning a prior probability to the null, Bayes would not give us the probability of our favored hypothesis (unless we defined it broadly as "anything but the null").
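A hypothetical numeric illustration (the figures below are mine, not the chapter's) shows how far p(N|S) can sit from p(S|N):

```python
# Hypothetical numbers: p(S|N) = .05, a prior p(N) = .5,
# and data twice as likely under the alternative, p(S|~N) = .10.
p_S_given_N, p_N, p_S_given_notN = 0.05, 0.5, 0.10

p_S = p_S_given_N * p_N + p_S_given_notN * (1 - p_N)   # = .075
p_N_given_S = p_S_given_N * p_N / p_S                  # Bayes' theorem
print(round(p_N_given_S, 2))   # 0.33: far from .05, despite "p < .05"
```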
In light of these difficulties, Fisher (1959) made it crystal clear that we generally cannot get from our p value to any statement about the probability of the hypotheses. But it was to make exactly such statements about our hypotheses, with the blessings of statistical rigor, that we attended all those statistics courses. We were misled, but it was not, we suspect, the first or last time that happened, which goes some distance to explaining the conflation of statistics with lies in the public's mind.

"It is important to remember that [relative frequency] is but one interpretation that can be given to the formal notion of probability" (Hays, 1963, p. 63); given our inferential imbroglio, we may well wonder if that interpreter ever spoke the language of science.

Replication Statistics 107

Neyman and Pearson solved the problem of inverse inference by emphasizing comparison with an alternate hypothesis, setting criterial regions into which our statistic would either fall, or not, and noting that, whereas we have no license to change our beliefs about the null even if our statistic falls into such a critical (p < α) region, it would nonetheless be prudent to change our behavior, absent a change in belief. Like Augustine's credo quia impossibile, this tergiversation does solve the problem—but only for those whose faith is stronger than their reason.

What did we ever do that got us into this mess? We wanted to know if our manipulation (or someone else's, perhaps Nature's, if this was an observational study) really worked. We wanted to know how much of the difference between groups was caused by the factor that we used to sort the numbers into two or more columns. We wanted to know if our results will replicate or if we are likely to be embarrassed by that most odious of situations, publication of unreplicable results. Null hypothesis statistical tests cannot get us from here to there. Let us go back to basics and see if we cannot find a viable alternate route to some of our valid goals.

DEFINING REPLICATION

Curious, isn't it, that we have license to use the measured variance (s²) in estimating a parameter of the population under the null hypothesis (σ²), but we do not use the measured mean to estimate a parameter? Why evaluate data with one hand tied behind our back? Assay the following hypothesis instead: H_A: δ = d, where δ is the value of the population effect size, and d is the effect size you measured in your experiment. But to ask the probability that this alternative to the null is true seems to be creating a new logical fallacy: post hoc, ergo hoc. (Gigerenzer [2004] called a similar solecism "The Feynman Fallacy" because Feynman was so exasperated when a young colleague asked him to calculate the probability of an accomplished event.) But we can ask a different, noncircular question, one that takes advantage of the first two moments of the observed data. What is the probability that, using the same experimental operations and population of subjects, another investigator can replicate those results? This can be computed once we agree on the meaning of replicate. Consider this definition:

To replicate means to repeat the empirical operations and to record data that support the original claim.

• If the claim is as modest as "This operation works [generates a positive effect]," then any replication attempt that finds a positive effect could be deemed a successful replication. Although this may seem a too-modest threshold for replication, put it in the context of what traditional significance tests test: In a one-tailed test, 1 − p gives us the probability that our statistic d has the same sign as the population parameter (Jones & Tukey, 2000).

• If the claim is "This operation generates an effect of at least d_L," then only replication attempts that return d ≥ d_L count as successful replications.

• If the claim is "This operation generates a significant effect," then only replication attempts that return a p < .05 are successful replications.

• If the claim is "This operation is essentially worthless, generating effect sizes less than d_U," then any replication attempt that returned a d < d_U could be deemed a successful replication. One would have to decide beforehand if a d less than, say, −0.5 was consistent with the claim or constituted evidence for a stronger alternative claim, such as "This operation could backfire."

In the following, we shall show how to compute such probabilities.

PREDICTING REPLICABILITY RATHER THAN INFERRING PARAMETERS

How. How to estimate these probabilities? The sampling distribution of effect sizes in replication, p(d2|d1), is required.

Effect size is calculated as

d = (M_E − M_C)/s_p,    (1)


with M_E the mean of the experimental group, M_C the mean of the control group, and s_p the estimate of the population standard deviation based on the pooled standard deviations of both groups (see the appendix for more information). Consider an effect size d = d1 based on a total of n observations randomly sampled from a population with unknown mean δ and variance σ². The effect size d2 for the next m observations constitutes the datum that we wish to predict based on the original observations: f(d2|d1). Because d1 provides information about d2, these statistics are not independent. They are only independent when considered as samples from a large population whose parameter is δ. Solution proceeds by considering the joint density of d2 and δ conditional on the primary observations. This is developed in the appendix, leading to the distribution shown as the flatter curve in Figure 7.1. In evaluating a claim concerning experimental results, calculate replicability by integrating that density between the appropriate limits:

p_rep = ∫_{d_L}^{d_U} n(d1, s²_dR) = ∫_{(d_L − d1)/s_dR}^{(d_U − d1)/s_dR} n(0, 1).    (2)

In cases where a positive effect has been claimed, and we stipulate that the hypothetical replication would have the same power as the accomplished experiment—everything is done exactly as in the original experiment—then d_L = 0, d_U = ∞, and s²_dR = 2s²_d. This is shown as the gray area of the predictive distribution on the right in Figure 7.1. If the claim is that the effect size is greater than d*, the probability of getting supportive evidence is predicted by p_rep with d_L = d* and d_U = ∞. The probability of finding a significant effect size in replication is given by p_rep with d_L = s_dR·z_α. Other claims take other limits.

For convenience in calculation of the standard case, Equation 2 may be rewritten as

p_rep = ∫_{−∞}^{d1/σ_dR} n(0, 1),    (3)

with σ_dR estimated by s_dR. The variance of d is

s²_d ≈ n²/[n_E·n_C·(n − 4)],

with n = n_E + n_C. When experimental and control groups are of equal size, n_E = n_C, then

s²_d ≈ 4/(n − 4),

and s²_dR = 2s²_d1. Predicting the effects in a different-size prospective sample requires adjusting the variance. These considerations are reviewed in Killeen (2005a), whose derivation was corrected along the lines of the appendix by Doros and Geier (2005).


Figure 7.1 The dark area to the right of the observed effect size d1 gives the probability of finding data more extreme than d1, given the null. The gray area to the right of 0 gives the probability of finding a positive effect in replication (Equation 2). The figure is reproduced from Killeen (2005a), with permission.
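Equations 1 through 3 are direct to compute. The sketch below (Python; an illustration under the standard case of d_L = 0, d_U = ∞, and equal group sizes—the function name is mine, not the chapter's) goes from raw scores to p_rep:

```python
import numpy as np
from scipy.stats import norm

def p_rep(exp_scores, ctl_scores):
    """Probability that an exact replication yields an effect of the
    same (positive) sign; Equations 1-3, standard case."""
    e = np.asarray(exp_scores, float)
    c = np.asarray(ctl_scores, float)
    ne, nc = len(e), len(c)
    sp = np.sqrt(((ne - 1) * e.var(ddof=1) + (nc - 1) * c.var(ddof=1))
                 / (ne + nc - 2))                  # pooled SD
    d1 = (e.mean() - c.mean()) / sp                # Equation 1
    n = ne + nc
    var_d = n**2 / (ne * nc * (n - 4))             # sampling variance of d
    s_dR = np.sqrt(2 * var_d)                      # replication doubles it
    return norm.cdf(d1 / s_dR)                     # Equation 3
```

For the chapter's running example (30 subjects per group, d1 = 0.50), this gives Φ(0.50/√(8/56)) ≈ .91.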


Relation to p. How does the probability of replication compare to traditional indices of replicability such as p? The p value is the probability of rejecting the null hypothesis given that the datum, d1, is sampled from a world in which the null is true. It is shown as the area to the right of d1 in Figure 7.1, under a normal density centered at 0 and having a variance of σ²_d, estimated from s²_d. The value of p_rep for the same data—the probability of finding a positive effect in replication—is the shaded area to the right of 0 in the normal curve centered on d1 and having a variance of σ²_dR, estimated from s²_dR = 2s²_d. As d1 moves to the right, or as the variance of d1, s²_d1, decreases, p_rep will increase and p decrease in complement. For use in spreadsheets such as Excel, Cumming (2005) suggested the notation p_rep = NORMSDIST([NORMSINV(1 − p)]/√2), where NORMSDIST is the standardized normal distribution function, and NORMSINV is the normal p-to-z transformation. Drawn in probability coordinates, the relation between these two probabilities is transparent: The z scores for p_rep decrease linearly with the z scores for p, z_prep = −k·z_p, with k = 1/√2.
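Cumming's spreadsheet formula translates directly to Python, where scipy's norm.ppf and norm.cdf play the roles of NORMSINV and NORMSDIST (a one-line sketch; the function name is mine):

```python
from scipy.stats import norm

def p_to_prep(p):
    """prep = NORMSDIST(NORMSINV(1 - p) / sqrt(2)), for a one-tailed p."""
    return norm.cdf(norm.ppf(1 - p) / 2**0.5)

print(round(p_to_prep(0.029), 2))   # ~ .91, close to the .907 quoted below
```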

Advantages of p_rep over p. Given the affinity between p and p_rep, why switch? Indeed, a first reading may give the impression that p is preferable: Sentences that contain it also contain hypotheses, and those are what scientists are interested in. Conversely, p_rep does not give the probability that your hypothesis is true, or that the null is true, or that one or the other or both are false. It gives the long-run probability that an exact replication will support a claim. Both the original and the replicate may work for the wrong reasons—widdershins sampling errors in both cases. Rational scientists would prefer to know whether their hypotheses are true or false, not just whether their data are likely to replicate. But that knowledge is not granted by statistical inference. Null hypothesis statistical tests never let us assign probabilities to hypotheses (Fisher, 1959). This is a manifestation of the unsolved problem of inverse statistical inference (Cohen, 1994; Killeen, 2005c). Students are admonished, "Never use the unfortunate expression 'accept the null hypothesis'" (Wilkinson & the Task Force on Statistical Inference, 1999, p. 599); without priors on the null, they should also be admonished, "Never use the equally unfortunate expression 'reject the null hypothesis.'" One can make no claims more interesting than "My data are absurdly improbable under the null," leaving the implication unsaid. Stipulating the null keeps its truth value off the table as a conclusion.

A researcher can say, "My hypothesis led me to predict a positive effect from this manipulation. I found one, and if you repeat my manipulation, you have a probability of approximately p_rep of also finding one." As ever, the investigator can assert that the manipulation caused the effect, and the hypothesis that birthed the manipulation was correct, only to the extent all confounds were eliminated, as other causes may have been more operative than the ones manipulated; those alternative causes may have been systematic, or possibly discernable only as "error [sampling] variance." In the long run (as n increases), nonetheless, exact replications should succeed with probability p_rep. Whereas failure to reject the null is not easily interpreted, a p_rep = .85 is just as interpretable as a p_rep = .95. Other advantages are discussed in Killeen (2005a), who did not adequately emphasize that p_rep, based on a single experiment, is only an estimate of replication probability (Cumming, 2005; Iverson, Myung, & Karabatsos, 2006). Like confidence intervals and p values, its accuracy depends on just how representative of the population the original sample happened to be. Any one estimate of replicability may be off, but in the long run, p_rep provides a reliable estimate of replicability. Evidence of this is seen in p_rep's ability to predict the proportion of replications in meta-analyses (Killeen, 2005a).

THE DISTRIBUTION OF P, PREP, AND LOG-LIKELIHOOD RATIOS

In order to compare p_rep with its cousins, p and the log-likelihood ratio (LLR), a simulation was conducted to generate an empirical sampling distribution of effect sizes d_i. Twenty thousand samples of size n = 60 were taken from a normal distribution with mean 0.5 and variance σ²_d = 4/(60 − 4). This corresponds to the sampling distribution of differences between the means of experimental and control groups of 30 observations each, when the true difference between them is half a standard deviation (δ = 0.5). This should yield a typical p_rep = .907 and p = .029. Values of p_rep, p, and LLR were calculated, along with the z score transformation of


p_rep, NORMSINV(p_rep). As expected, the distribution of p_rep is negatively skewed (coefficient of skewness [CS] = −1.60; mean = .859; median = .906; see Figure 7.2 and Cumming, 2005). This skew has been cited as a fault of p_rep (Iverson et al., 2006). However, the distribution of p is even more strongly skewed (CS = 2.41; mean = .093; median = .031). Even though the means are biased, in both cases the median values of the test statistics are very close to their predicted values. The sampling distribution for the z scores of p_rep closely approximated a normal density, having a CS of 0.02. To aggregate values of p_rep over studies, it is therefore the z transform of p_rep that should be averaged (inversely weighted by the number of observations in each sample, if those differ).
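A sketch of that simulation (Python; names mine) reproduces the medians just quoted:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
delta, n = 0.5, 60
sd_d = np.sqrt(4 / (n - 4))                  # sampling SD of d

d = rng.normal(delta, sd_d, 20_000)          # 20,000 simulated effect sizes
p = 1 - norm.cdf(d / sd_d)                   # one-tailed p for each
prep = norm.cdf(d / (sd_d * np.sqrt(2)))     # Equation 3
llr = -(d / sd_d) ** 2 / 2                   # log-likelihood ratio, -z^2/2

for name, x in [("p", p), ("p_rep", prep), ("LLR", llr)]:
    print(name, np.round(np.median(x), 3))
# medians come out near .031, .906, and -1.75, as reported in the text
```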

For comparison, consider the likelihood of the data (d_i) given that the parameter equals zero, divided by the likelihood given that the parameter equals that observed, here 0.5. In the present scenario, its expected value is the ratio of the ordinate of the normal density at d = 0.5 under the null (with mean at 0) to that of the density under the alternate (with mean at 0.5). The natural logarithm of this ratio is the LLR, which is simply computed as −z²/2. For the present exercise, this is −(.5)²/(4/(60 − 4))/2 = −1.75. The distribution of LLRs is shown in the right panel of Figure 7.2. The likelihood ratio is positively skewed (CS = 0.93) and, as visible in Figure 7.2, the LLR is negatively skewed, to about the same degree as p_rep (CS = −1.46; mean = −2.25, median = −1.75, just as predicted). Readers may experiment with these and related distributions at the excellent site maintained by Cumming (2006).

The Bayes factor is an analogous statistic favored by Bayesians such as Iverson et al. (2006) and M. D. Lee and Wagenmakers (2005). If the hypotheses are simple and their prior plausibilities equal, then the Bayes factor is the likelihood ratio (P. M. Lee, 2004). If these conditions are not met, then a prior distribution for the parameters must be chosen and then integrated out (see, e.g., Wagenmakers & Grünwald, 2006). The distribution of the Bayes factor will depend on the nature of that prior distribution.

[Figure 7.2 here: two panels of histograms, "Distribution of Probabilities" (p and p_rep, abscissa 0 to 1) and "Distribution of Log Likelihood Ratios" (LLR, abscissa −10 to 0); ordinates show % Total.]

Figure 7.2 Twenty thousand samples of a normally distributed effect size, with mean δ = 0.5 and variance σ²_d = 4/(60 − 4), yielded these distributions of three test statistics.


REFUTATION AND VINDICATION

Predicting a positive effect of any size in replication may seem too weak a prediction to merit attention. It is therefore useful to identify two kindred indices of replicability, the probability of strong support, p_sup, and the probability of strong contradiction, p_con. Let us measure the degree of support that a real replication gives as its own value of p_rep. Set some threshold for calling a result strong, say, p* = .8. Then a replication that returns a p_rep > p* is called "strong support," and one that returns a p_rep greater than p* in the wrong direction (i.e., the replication's p_rep < 1 − p*) is called "strong contradiction." The middling outcomes between these are called weak support or contradiction depending on their sign. The criteria for just what constitutes strong support and strong refutation are arbitrary; we could use p*s of .75 or .8 or .9 or any other value. It is easy to calculate the resulting probabilities of support and refutation:

p_sup = 1 − normsdist[normsinv(p*) − normsinv(p_rep)];

p_con = 1 − normsdist[normsinv(p*) + normsinv(p_rep)].

Selected values of these expectations are found in Table 7.1. The first criterion in that table is p* = .5. This returns the probability that a replication will have an effect of the same sign. This is obviously just p_rep. The probability of a contradiction—an effect of an opposite sign—is the complement of p_rep. Consider next the row indexed by p_rep = .95. This corresponds to the threshold LLR that Bayesians consider strong evidence, as well as to a one-tailed critical region for p of .01. The probability of strong support at p* = .8 is .789. The probability of a replication returning an effect that is significant at p < .05 is given by p* = .88, close to p* = .9. For p_rep = .95, this happens about 2/3 of the time.

The probability of bad news—strong contradiction—is also found in Table 7.1. For p_rep = .95, p* = .8, it is .006: Fewer than one attempted replication out of a hundred will go that far in the opposite direction from the original data. Results that yield a p_rep in excess of .9 are unlikely to be refuted (on statistical grounds, at least!). Variations in the execution of replication attempts (e.g., those deriving from changes in the instruments and other features of the experimental or observational context) will inevitably add realization variance and, to the extent that they do so, will make these estimates optimistic.

Readers familiar with signal detection theory will immediately see that effect size d is nothing other than their familiar index of discriminability d′. The criterion p* is analogous to bias: Changes in p* do not move the criterion from left to right but move two criteria in and out. The data in Table 7.1 can be represented as points along receiver operating characteristics (ROCs), with p_sup corresponding to hits and p_con to false alarms. The resulting isosensitivity functions for the first few columns of Table 7.1 are shown in Figure 7.3, with the criterion p* increasing from .5 to .98 in smaller steps than shown in that table to draw the curves.
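The two expectations above are one-liners to compute; the following sketch (Python; function names mine) reproduces the p_rep = .95 row of Table 7.1:

```python
from scipy.stats import norm

def p_sup(prep, p_star):
    """Probability that a replication's own prep exceeds p_star."""
    return 1 - norm.cdf(norm.ppf(p_star) - norm.ppf(prep))

def p_con(prep, p_star):
    """Probability of a prep beyond 1 - p_star in the wrong direction."""
    return 1 - norm.cdf(norm.ppf(p_star) + norm.ppf(prep))

print(round(p_sup(0.95, 0.8), 3), round(p_con(0.95, 0.8), 3))  # 0.789 0.006
```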

Table 7.1 The Probability That a Result Will Be Replicated or Refuted at Different Levels of Confidence Given the Strength of the Original Results

                         Criterion p* for strong support or contradiction
                  0.5             0.75            0.8             0.85            0.9
  p      p_rep  p_sup   p_con   p_sup   p_con   p_sup   p_con   p_sup   p_con   p_sup   p_con
0.1000   0.818  0.818   0.182   0.592   0.057   0.526   0.040   0.448   0.026   0.354   0.014
0.0500   0.878  0.878   0.122   0.687   0.033   0.626   0.022   0.550   0.014   0.453   0.007
0.0250   0.917  0.917   0.083   0.762   0.020   0.707   0.013   0.637   0.008   0.542   0.004
0.0100   0.950  0.950   0.050   0.834   0.010   0.789   0.006   0.729   0.004   0.642   0.002
0.0050   0.966  0.966   0.034   0.874   0.006   0.836   0.004   0.784   0.002   0.705   0.001
0.0025   0.976  0.976   0.024   0.905   0.004   0.874   0.002   0.829   0.001   0.759   0.001
0.0010   0.986  0.986   0.014   0.935   0.002   0.910   0.001   0.875   0.001   0.817   0.000


[Figure 7.3 here: ROC-style curves plotting the probability of strong support, p_SUP (ordinate, 0 to 1), against the probability of strong contradiction, p_CON (abscissa, 0 to 0.2); one curve for each (d′, p_rep) pair: (1.6, .95), (1.4, .92), (1.2, .88), (0.9, .82).]

Figure 7.3 The probability of strong support (ordinates) and probability of strong contradiction (abscissae) both increase as the criterion for "strong" (p*) is reduced from an austere .98 (lower left origin of curves) to a magnanimous .50. The parameter is p_rep and its corresponding z score, d′.

These are not the traditional yes/no ROCs, but yes/no/uncertain, with the last category corresponding to replications that will fall between p_sup and p_con in strength.

One may also calculate the probability of a nil effect in replication. This is analogous to the probability of the null. If one takes a nil effect to be one that falls within, say, 10% of chance (.4 < p_rep < .6), the probability of a nil effect in replication when p_rep = .95 is 5%. This probability may easily be calculated as p_nil = p^L_sup − p^U_sup, where p^L_sup corresponds to the probability of support returned by the lower limit (p* = .4 in the example), and p^U_sup corresponds to the probability of support returned by the upper limit (p* = .6 in the example). For further discussion, see Sanabria and Killeen (2007).

Credible Confidence Intervals. A small but increasing number of editors are encouraging researchers to report measures of effect size (Cumming & Finch, 2005). A convenient way to specify the margin of error in effect size is by bounding it with confidence intervals (CI; see Chapter 17). Unfortunately, most investigators do not really understand what confidence intervals mean (Fidler, Thomason, Cumming, Finch, & Leeman, 2004). This confusion can be remedied by study of Cumming and Finch (2001), Loftus and Masson (1994), and Thompson (1999), along with the handy chapters of this volume. An alternate approach provides a more intuitive measure of the margin of error in data. One such measure is the interval over which a replication will fall with some stipulated probability. A convenient interval to use is the standard error of the statistic. Approximately half the replication attempts will fall within ±1 standard error, centered on the measured statistic (Cumming, Williams, & Fidler, 2004). I call this kind of CI a replication interval (RI). Its interpretation is direct, its calculation routine, and its presentation as error bars hedging the datum unchallenged (Estes, 1997).
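Both quantities of this passage reduce to short computations; a sketch under the worked example's numbers (Python; names mine):

```python
from scipy.stats import norm

def p_sup(prep, p_star):
    return 1 - norm.cdf(norm.ppf(p_star) - norm.ppf(prep))

# Probability of a nil effect (.4 < prep < .6) when prep = .95:
p_nil = p_sup(0.95, 0.4) - p_sup(0.95, 0.6)
print(round(p_nil, 2))        # 0.05, as in the text

# Replication interval: about half of replications fall within
# +/- 1 standard error of the measured effect size.
d1, se_d = 0.50, (4 / (60 - 4)) ** 0.5    # running example, n = 60
print(f"RI: {d1 - se_d:.2f} to {d1 + se_d:.2f}")
```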


EVIDENCE, BELIEF, AND ACTION

What prior information should be incorporated in the evaluation of a research claim? If there is a substantial literature in relevant areas, then the replicability of a new claim can be predicted more accurately by incorporating that information. For some uses, this is an optimal tactic and constitutes a running meta-analysis of the relevant literature. The evaluation of the evidence at hand would then, however, be confounded with the particular prior information that was engaged to optimize predictions. Other consumers, with other background information, would then have to deconvolute the experimental results from those priors. For most audiences, it seems best to bypass this step, letting the data speak for themselves and letting the consumers of the results add their own qualification based on their own sense of the prior results in the area (Killeen, 2005b). This decision is equivalent to assuming that the prior distribution of the population effect size is flat and is consistent with some analysts' advice to use only likelihood ratios (e.g., Glover & Dixon, 2004; Royall, 1997), not Bayesian posteriors, thereby eschewing the difficulties in the choice of priors. Royall (2004) is perhaps the most cogent, noting three questions that our statistical toolbox can help us address:

1. How should I evaluate this evidence?
2. What should I believe?
3. What should I do?

1. The first question confronts us when data are first assembled. Here, considerations of the past (priors) and the future (different prospective populations) confound analysis of the data on the table. Such externals "can obfuscate formal tests by including information not specifically contained within the experiment itself" (Maurer, 2004, p. 17). Royall (2004) and Maurer (2004) argue for an evidential approach using the likelihood ratios. They eschew the use of priors to convert these into the probability of the null because this ties the analysis to a reference set that may not be shared by other interested consumers of the evidence.

2. Belief, however, should take into account prior information, even information that may be particular to the individual. Each of us carries a unique reference set with which we update our beliefs in light of evidence. This is why serious crimes are evaluated by large juries of peers: large, to accommodate a range of reference sets; peers, to relate those priors to ones most relevant to the defendant. The transformation of evidence into belief (concerning a hypothesis or proposition) transforms a public datum into a personal probability. Shared evaluation of strong evidence will bring those personal probabilities toward convergence, but they will be identical only for those with identical reference sets. Savage (1972) took such personal probabilities to be "the only probability concept essential to science" (p. 56). Royall (2004) and Maurer (2004) disagree. Belief is indeed best constructed with Bayesian updating and motivates the acceptance or rejection of scientific theories, but evidence evaluation must be kept insulated from priors. Once that evaluation is executed, belief adjustment is natural. But beliefs should concern claims and hypotheses; they should not contaminate evidence.

3. What we should do depends both on what we believe and what we value. Decision theory tells us how to combine these factors to determine the course of action yielding the greatest expected benefit. In particular, the posterior predictive distributions shown in Figures 7.1 and 7.4 estimate the probability of different effect sizes in replication. If we can assign utility to outcomes as a function of their expected effect size, we can determine what values of d1 fall above a threshold of action. The sigmoid function in Figure 7.4 represents such a utility function. Multiplying the probability of each effect size in replication by the utility function and summing gives the expected utility of replicating the original results.

Given a posterior predictive distribution, it is straightforward to construct a decision theory to guide our action. Winkler (2003) teaches the basics, and Killeen (2006) applies them to recover traditional practices and extends them in nontraditional ways. The execution may employ flat priors and identical prospective populations; it then will guide disposition of the evidence: whether it be admitted to a corpus or rejected, whether the paper be published or not. Or the execution may employ informative priors and may take into consideration the realization variance involved in generalizing to new prospective populations. It then becomes a pragmatic guide for action, an optimal tool to guide medical, industrial, and civic programs.
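A sketch of that multiply-and-sum calculation (Python; the logistic utility and its parameters are my stand-ins for the sigmoid of Figure 7.4, not values from the chapter):

```python
import numpy as np
from scipy.stats import norm

d1, s_dR = 0.5, (2 * 4 / (60 - 4)) ** 0.5   # observed effect; SD of predictive density
grid = np.linspace(-2.0, 2.5, 1000)          # effect sizes in replication

density = norm.pdf(grid, d1, s_dR)   # posterior predictive distribution
utility = np.tanh(2 * grid)          # a symmetric, saturating utility u(d)

expected_utility = np.sum(density * utility) * (grid[1] - grid[0])
print(round(expected_utility, 2))    # positive: replication looks worthwhile
```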


[Figure 7.4 here: probability density of the replication distribution and the sigmoid utility function u(d), plotted against effect size d from −1 to 2.]

Figure 7.4 The Gaussian density is the posterior predictive distribution for a result with an effect size of d1 = 0.5. The sigmoid is an example of a utility function on effect size that expresses decreasing marginal utility as a function of effect size and treats positive and negative effects symmetrically.

REALIZATION VARIANCE

Whenever an attempt is made to replicate, it is inevitable that details of the procedure and subject population will vary. Successful replication with different instruments, instructions, and kinds of subjects lends generality to the results—but it also increases the risk of different results. This risk may be represented as realization variance, σ²_δj, the uncertainty added by deviations from exact replication (Raudenbush, 1994; Rubin, 1981; van den Noortgate & Onghena, 2003). The subscript j indicates that this variance is indexed to a particular field of research. The estimated variance of effect size in replication becomes

s²_dR = (s²_d1 + s²_δj) + (s²_d2 + s²_δj).    (4)

If the replication involves the same number of subjects, then s²_d2 = s²_d1, and the estimated variance of replication is

s²_dR = 2(s²_d1 + s²_δj).    (5)

In a meta-analysis of studies involving 8 million participants, Richard, Bond, and Stokes-Zoota (2003) reported a mean within-literature variance of σ²_δ = 0.092 (median = 0.08; σ_δ = 0.30), after correction for sampling variance (Hedges & Vevea, 1998). This substantial realization variance puts an upper limit on the probability of replicating results, even with n approaching infinity. The limit is given by Equation 3 with d_L = −∞ and

d_U = d1/(√2·s_δj).

This limit on replicability is felt most severely when d1 is small: For the typical realization variance within a field (0.08), the asymptotic p_rep for d1 = 0.1, 0.2, and 0.3 is .60, .69, and .77. It requires an effect size of d1 = 0.5 to raise replicability into the respectable range (p_rep = .9). Alas, that is substantially larger than the effect size typical of the social psychological literature, found by Richard and associates to be d1 ≈ 0.3. It appears that we can expect a typical (d1 = 0.3) research finding to receive positive support at any level in replication only about 75% of the time.
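A sketch of Equations 4 and 5 and their n → ∞ limit (Python; function name mine), reproducing the asymptotes just quoted:

```python
import numpy as np
from scipy.stats import norm

def prep_with_realization(d1, n, var_delta):
    """prep for an equal-n replication, per Equations 3-5."""
    var_d = 4 / (n - 4)                        # sampling variance of d
    s_dR = np.sqrt(2 * (var_d + var_delta))    # Equation 5
    return norm.cdf(d1 / s_dR)                 # Equation 3

for d1 in (0.1, 0.2, 0.3, 0.5):
    # n -> infinity: only the realization variance (0.08) remains
    print(d1, round(norm.cdf(d1 / np.sqrt(2 * 0.08)), 2))
# prints .6, .69, .77, and .89 -- the limits cited in the text
```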


[Figure 7.5 here: histogram titled "Distribution of Meta-analytic Effect Sizes in the Social Psychology Literatures"; Number of Research Literatures (ordinate, 0 to 40) against Mean Effect Size d (abscissa, 0 to 2), showing the Richard, Bond, & Stokes-Zoota (2003) data with a fitted curve.]

Figure 7.5 The distribution of effect sizes measured in 474 research literatures in social psychology. The data were extracted from Richard et al. (2003, Figure 1). The curve is Gaussian with mean 0 and overall variance 0.3.

Validation. Simulations were conducted to validate the logic of p_rep and the accuracy of the normal approximation for the noncentral t sampling distribution of d for small n. The analysis of Richard et al. (2003) provided a representative set of parameters. These investigators displayed the distribution of effect sizes for social psychological research involving 457 literatures, comprising 25,000 studies. Their measure of effect size, r, was converted into d by the relation d = 2r(1 − r²)^(−1/2) (Rosenthal, 1994). The authors reported absolute values of effect sizes because the direction of effect often reflects an arbitrary coding of dependent variables; however, it also may inflate the impression of replicability. The smooth curve through the data in Figure 7.5 provides another perspective on their report. (The nonpublication of small or conflicting effects—the "file drawer effect"—might be responsible for the outlier near d = 0.) The data in Figure 7.5 are used to create a population from which the simulation will sample effect sizes. The within-literature standard deviation σ_δ was set to 0.30, consistent with Richard and associates' estimate. Given these fingerposts, the simulation is described in Figure 7.6.

The relative frequency of successful replication was gauged for values of n = n_C + n_E ranging from 6 to 200, for nine ranges of |d| starting at 0 with upper limits of 0.08, 0.16, 0.24, 0.33, 0.43, 0.53, 0.65, 0.81, and 1.10. These frequencies, expressed as probabilities of replication (p_rep), are the ordinates of Figure 7.7. The abscissae are from Equation 2, with d1 taken as the midpoints of the nine ranges. s²_d was calculated from the simulata (see the appendix). The only parametric information used in the predictions was the realization variance σ²_δ, set to 0.09. The predictions held good down through very small ns, with average absolute deviations of 1.2 percentage points, on the order of binomial variability around exact predictions.

[Figure 7.6 here: simulation flowchart. Sample δ from the literature distribution, N(0, 0.55); sample δ1 (original) and δ2 (replicate) from N(δ, 0.30); for each, sample control observations from N(0, 1) and experimental observations from N(δ_i, 1); compute d1 and d2; if their signs agree, increment the count of successes for δ, otherwise the count of failures; iterate.]

Figure 7.6 The simulation: A representative effect size δ was first sampled from the distribution shown in Figure 7.5. Then two instances of δ were selected, one for the original study and one for the replicate. The difference between these parameters arises from the realization variance of 0.09 (σ_δ = 0.30). The data were normally distributed random variables for the control and experimental groups, with the latter sampled from a population shifted by the δ it represents. The proportion of times the original study predicted the sign of the replicate was tallied; information about the magnitude of effect was not retained. The process was iterated 20,000 times for each n from the literature studied.
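A compact sketch of the Figure 7.6 procedure (Python; names and seed mine; the literature SD of 0.55 and realization SD of 0.30 follow the text):

```python
import numpy as np

rng = np.random.default_rng(2)

def replication_rate(n, trials=20_000):
    """Proportion of replicates whose d has the sign of the original d."""
    ne = nc = n // 2
    hits = 0
    for _ in range(trials):
        delta = rng.normal(0.0, 0.55)          # literature effect (Fig. 7.5)
        d_pair = []
        for _ in range(2):                     # original study, then replicate
            dj = rng.normal(delta, 0.30)       # realization variance 0.09
            c = rng.normal(0.0, 1.0, nc)       # control observations
            e = rng.normal(dj, 1.0, ne)        # experimental observations
            sp = np.sqrt((c.var(ddof=1) + e.var(ddof=1)) / 2)
            d_pair.append((e.mean() - c.mean()) / sp)
        hits += (d_pair[0] > 0) == (d_pair[1] > 0)
    return hits / trials

print(replication_rate(60))   # compare with the predictions of Equation 2
```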


[Figure 7.7 here: scatterplot titled "Predicted and Obtained Replicability in Simulation"; Obtained Replicability (ordinate, 0.50 to 1.00) against Predicted Replicability (abscissa, 0.50 to 1.00); symbols mark total n of 200, 100, 50, 40, 30, 20, 16, 12, 10, 8, and 6.]

Figure 7.7 Results of the simulation. The obtained proportions of successful replications are plotted against the area over the positive axis of N(d1, s²_dR). Symbols indicate the total number of observations in the experimental and control groups combined.

META-ANALYSIS

Despite the width of the distributions of p_rep shown in Figure 7.2, p_rep proves generally accurate in predicting replicability, both in simulations such as those above and in meta-analytic compendia. For any one study, conclusions should always be tempered by the image of Figure 7.2 and the recognition that our test statistics were sampled from one like it. This is why real replications are crucial for science. These replications are best aggregated using meta-analyses. Consider, for example, the recent meta-analysis of body satisfaction among women of different ethnic groups (Grabe & Hyde, 2006). Ninety-three studies contributed data, whose median effect size was 0.30, indicating that Black women were more satisfied with their body image than were White women. The authors of the meta-analysis reported a random effects variance of 0.09, which is the average value Richard et al. (2003) reported for social-psychological literatures in general. Using that realization variance, the median p_rep for these studies was 0.78, and the proportion predicted by the 20% trimmed mean (Wilcox, 1998) of the z_prep was p_rep = .80. The percentage of published studies showing a positive effect was close to these predictions, .85. These are unweighted predictions because the point is to demonstrate the predictive accuracy of single estimates of p_rep, not to combine studies in an optimal fashion. Note that with realization variance set to zero, p_rep would seriously overestimate replicability (p_rep = .93), as expected.

Based on comparison of published and unpublished studies, as well as of studies that focused on ethnicity and those not so focused, the authors concluded that there was little evidence of publication bias—the inflation of effect sizes due to nonpublication of nonsignificant effects. Another way of estimating the influence of studies left in the "file drawer" because they did not yield a significant p value is to use the posterior predictive distribution to estimate the number of studies that should have reported nonsignificant effects.


For the median number of observations in each group of the meta-analysis, the effect size would have to exceed 0.36 for significance (α = .05, assuming a fixed effect model with zero realization variance, as is typical in conducting tests of significance, and a one-tailed critical region). Integration of the posterior predictive distribution (Equation 3) from d_L = −∞ up to d_U = 0.36 gives an expected number of studies with nonsignificant effects equal to 56; 53 were found in the literature. This corroborates the authors' inference of little publication bias. Assuming instead that authors of the original studies used two-tailed tests requires us to compare the expected and observed number of studies with effect sizes falling between ±0.43. Integration between these limits predicts 66 such results, 10 greater than the 56 observed. If 10 additional studies are inferred to reside only in file drawers because their effect sizes averaged around 0, when these are added to the 93 analyzed, the median effect size decreases from 0.30 to 0.27, leaving all conclusions intact. The same would be the case if this process were iterated, assuming the same proportion were filed for the larger hypostatized population. This further corroborates the authors' inference of little effect from publication bias.

A serious attempt to generalize—to predict replicability—must incorporate realization variance, just as traditional random and mixed effects models and hierarchical Bayesian models internalize it. But few original studies attempt to incorporate such estimates, resulting in discomfiture when results do not replicate (Ioannidis, 2005a, 2005b). This is compounded by the prevalent notion that "to replicate" means getting a significant effect in replication, rather than increasing our confidence in the original claim or narrowing our confidence intervals around an estimate. The posterior predictive distribution of Figure 7.1 and its integration over relevant domains by Equation 2 provide useful tools in the meta-analysis of results.

Deployment

This development of p_rep concerns only the simplest scenario, corresponding to a t test. Most inferential questions are more complicated. How does one calculate and interpret p_rep values in more interesting scenarios, such as analyses of variance (ANOVAs) with multiple levels, multivariate ANOVAs (MANOVAs), and multiple regressions? The provisional answer, given users' familiarity with traditional analyses, is to calculate a p value and transform it to p_rep using the standard conversion, z(p_rep) = −z(p)/√2. The p values returned from ANOVA stat-packs are analogous to t² and thus allow deviation among scores in either direction (even though they employ just the right tail of the F distribution). They should therefore be halved before converting to p_rep. In the case of multiple independent comparisons, the probability of replicating each of the observed effects equals the product of the constituent p_reps. This may be contrasted with the probability of finding positive effects if the null was true in all k cases, 2^−k. Cortina and Nouri (2000) show how to adjust d for correlated measures, and Bakeman (2005) provides detailed recommendations for the use of generalized eta squared as the preferred effect size statistic for repeated-measure designs. Manly (1997) and Westfall and Young (1993) describe resampling techniques for multiple testing. It is trivial to adjust such resampling to calculate p_rep directly: (a) Resample from within the control data and independently from within the experimental data, (b) calculate the resampled statistics (e.g., mean difference, trimmed mean difference, etc.) over half the resampled numbers to double the variance for the predictive distribution, (c) count the number of statistics of the same sign as the measured test, and (d) divide by the total number of resamples to estimate p_rep (sketched in code below).

Realization variance (σ²_δ) inflates the sampling variance of the effect size. This is awkward to introduce into the resampling process, but an adjustment can be easily made after the fact. The variance of d in replication is approximately 8/(n − 4) + 2σ²_δ. The resampling operation is essentially a compound Bernoulli process that can, for these purposes, be approximated by the normal distribution. Compute the z score corresponding to the p_rep resampled as above with no allowance for realization variance, and divide it by

√(1 + σ²_δ(n − 4)/4).

The normal transform of this adjusted z score gives the p_rep that can be expected in realistic attempts to replicate.¹

Conversion of the voluminous statistical literature into p_rep-native applications remains a task for the future—one that will be most expeditiously and accurately accomplished with randomization techniques, discussed below.


PROBLEMS WITH PREP

Frequentists will object to the introduction of distributions of parameters needed for the present derivation of p_rep. Parameters are by their definition fixed, if unknown, quantities. Frequentists will also be concerned that the choice of any particular informative prior can introduce an element of subjectivity into the calculation of probability. The introduction of uninformative priors, on the other hand, will expose the arbitrariness of choosing the particular version of them: Should an ignorance prior for the mass of a box be uniform on the length of its side or uniform on the cube of that length (Seidenfeld, 1979)? This debate has a long and nuanced history.

Bayesians (e.g., Wagenmakers & Grünwald, 2006) object that p_rep distracts us from an opportunity to compare alternative hypotheses. If credible alternate hypotheses are available, both the Neyman-Pearson framework and Bayesian analyses are to be preferred to the present one. But if those are not available, postulating them reintroduces the very sources of subjectivity and context-sensitive perspectives that p_rep permits us to sidestep.

An inelegance in the above analyses is that they invoked the unknown population parameter δ, only to marginalize it. Why not go directly from d1 to d2? As noted by O'Hagan and Forster (2004), "If we take the view that all inference is directly or indirectly motivated by decision problems, then it can be argued that all inference should be predictive, since inference about unobservable parameters has no direct relevance to decisions. . . . An extreme version of the predictivist approach is to regard parameters as neither meaningful nor necessary" (p. 90). Alas, as these authors, as well as Cumming (2005) and Doros and Geier (2005), note, unless d1 and d2 are treated as random samples from a population, they are not independent of one another, and the requisite evaluation of joint distributions of nonindependent variables is difficult. Fisher spent many of his latter years attempting to accomplish such direct inference with his fiducial probability theory, but his work was judged unsuccessful (Macdonald, 2005; Seidenfeld, 1979). We must resort to the introduction and marginalization of parameters described in the appendix.

More important than the above objections is the invalidity of a fundamental assumption in all of the above analyses. Traditional statistical tests, as well as p_rep as developed to this point, assume that the data are randomly sampled from the population to which generalization is desired—from the population whose parameter under the null (e.g., δ_0) is typically assumed to be zero. But this is a feat that is rarely attempted in science. Ludbrook and Dudley (1998; cited in Lunneborg, 2000, p. 551) surveyed 252 studies from biomedical journals and found that experimental groups were constructed by random sampling in only 4% of the cases. Of the 96% that were randomly assigned (rather than randomly sampled), 84% of the analyses employed inappropriate t or F tests—inappropriate because those tests assume random samples, not convenience samples. The situation is unlikely to be different in the fields known to the readers of this book. One can speculate why analyses are so misaligned with data. The potential causes are multivariate and include bad education, limitations to otherwise convenient stat-packs, the desire for statistics that permit generalization to a population (even though the dependence on technique a fortiori prohibits such generalization), and the ubiquity of reviewers who recognize that, even though statistical tests are merely arbitrary conventional filters that cannot legitimately be used to reject null hypotheses, they retain pragmatic value for rejecting dull hypotheses (Nickerson, 2000).

PERMUTATION STATISTICS

There is another way. It was introduced by Fisher, developed for general cases by Pitman (Pitman, 1937a, 1937b), and realized in a practical manner by modern techniques. Randomization, or rerandomization, or permutation tests are ways of comparing distributions of scores. They do not, like their computerized siblings the bootstrap tests, attempt to estimate or conditionalize on population parameters. Instead, they ask how frequently a random reassignment of the observed scores would generate differences at least as large as those observed. In executing such tests, the observed scores are randomly reassigned (without replacement) to ad hoc groups, the relevant statistics (mean, trimmed mean, median, variance, t score, etc.) computed, another random reassignment executed, and so on, thousands of times to create


In executing such tests, the observed scores are randomly reassigned (without replacement) to ad hoc groups, the relevant statistics (mean, trimmed mean, median, variance, t score, etc.) computed, another random reassignment executed, and so on, thousands of times to create a distribution of the sampling error expected under chance. Calculate the percentile of the observed statistic in that distribution. One minus that percentile gives an analog to the p value. If the observed statistic is in the 95th percentile, only 5% of the time would random assignments give deviations that large or larger.
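The mechanics are easily coded. Here is a sketch in Python with NumPy for a two-group comparison of means; the statistic, the two-sided criterion, and the number of shuffles are the analyst's (illustrative) choices.

import numpy as np

def permutation_p(control, experimental, n_shuffles=10_000, seed=None):
    # Analog of the p value: the proportion of random reassignments
    # that yield a difference at least as large as the one observed.
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(control, dtype=float),
                             np.asarray(experimental, dtype=float)])
    n_c = len(control)
    observed = np.mean(experimental) - np.mean(control)

    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)            # random reassignment, without replacement
        diff = pooled[n_c:].mean() - pooled[:n_c].mean()
        extreme += abs(diff) >= abs(observed)
    return extreme / n_shuffles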
What the Permutation Distribution Means. In a deterministic world, no effects are uncaused, although the causes may be varied and complex and different for each observation (Hacking, 1990). Chance is the name for these otherwise unnamed, and generally unnamable, causes; it appears as the error variance that is added to our regression equations to permit them to balance. In resampling, we give the error variance free rein. The resulting randomization distributions may be viewed as random samples from thousands of these unnamed "hypotheses"—each corresponding to a different pattern in the data—that might account for the observations with more or less accuracy. The empirical distribution function generated by the random shuffles of data among groups gives the final distribution of hypotheses. All hypotheses except the investigator's are vague ("chance") and post hoc, so the investigator's are typically preferred, unless there are too many alternate reshuffles that sort the data into more extreme configurations—that is, unless they constitute more than, say, 5% of the distribution.

Another way of thinking about the partitioning is as the result of the random motion of particles (data) in space. This is a problem in thermodynamics, for analysis of which Boltzmann created the measure of disorder called entropy. Using similar logic and mathematics, we may calculate the amount by which entropy is reduced by learning to which group—experimental or control—each observation belongs. If there is no real effect, the information transmitted by the group designation will be approximately 0. The information transmitted by knowledge of the group grows with effect size and with the logarithm of the number of observations. Information-theoretic measures such as Kullback-Leibler (K-L) distance, as well as its unbiased realization in the Akaike information criterion, are modern extensions of Boltzmann's approach. The K-L distance is the average information that each observation adds toward discriminating the experimental and control groups. There is a natural affinity between permutation techniques and information-theoretic analyses: As the information gained from distinguishing the groups increases, there will be a corresponding decrease in the number of alternate hypotheses that will provide more information than the investigator's. Replicability may be measured in terms of the probability of finding effects in replication that continue to make the distinction between groups worthwhile.

Although these considerations can give deeper meaning to the analysis, most consumers of statistics will be content to understand the results of permutation analyses as an analog of the p value, the proportion of randomizations that provide a more informative sorting of the observations than the experimenter's labels "experimental" and "control." There are many good programs available to carry out this analysis; see Cai (2006) and the references in Good (2000), Higgins (2004), and Manly (1997).

Permutation tests ask, "How often would this happen by chance?" not "How likely is this to happen again by design?" The short answer to "How can I generalize this result?" is the same as that given to users of traditional statistical design who have not sampled randomly from the populations to which they would generalize: "At your own hazard." To predict replicability in an attempt with an n of the same size, follow the same steps as given for bootstrap techniques above, including within-group shuffling, half-sizing, and correction for realization variance (see the sketch below). For permutation techniques, however, the randomization is without replacement (whereas in bootstrapping, it is with replacement). The half-sizing makes this analysis a hybrid of permutation and bootstrap techniques—the resample is constructed not out of the unlimited population of the bootstrap but out of populations twice the size of the original investigation. It permits extrapolation to a replication that samples from a small population derived from individuals identical to those in the original experiment, one from which the original was also ostensibly sampled. Because permutation techniques are generally more powerful than bootstrap techniques (see Mielke & Berry, 2001, and Chapter 19, this volume), predictions of replicability will be higher than for bootstrap techniques, making inclusion of nonzero realization variance even more important for realistic projections.
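A sketch of that hybrid recipe, under the same illustrative assumptions as the bootstrap sketch above; the only changes are the within-group draws without replacement and the optional realization-variance correction.

import numpy as np
from scipy.stats import norm

def prep_permutation(control, experimental, sigma_delta_sq=0.0,
                     n_shuffles=10_000, seed=None):
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    sign = np.sign(experimental.mean() - control.mean())

    same_sign = 0
    for _ in range(n_shuffles):
        # half-sized draws WITHOUT replacement: in effect, sampling from
        # populations twice the size of the original investigation
        c = rng.choice(control, size=max(2, control.size // 2), replace=False)
        e = rng.choice(experimental, size=max(2, experimental.size // 2),
                       replace=False)
        same_sign += np.sign(e.mean() - c.mean()) == sign
    prep = same_sign / n_shuffles

    if sigma_delta_sq > 0:             # post hoc correction, as described above
        n = control.size + experimental.size
        z = norm.ppf(prep) / np.sqrt(1 + sigma_delta_sq * (n - 4) / 4)
        prep = norm.cdf(z)
    return prep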


The same post hoc correction described above may be used: Divide the z score of replicability by √(1 + σδ²(n − 4)/4). As noted by Higgins (2004), Lunneborg (2000), and others, computer-implemented permutation tests are the gold standard to which modern techniques such as ANOVA are an approximation; they are worth learning to use.

THE THREE PATHS OF STATISTICAL INFERENCE

Traditional Fisher/Neyman-Pearson statistics have been the primary mode of inference in the field for half a century, despite the fact that "frequentist theory is logically inadequate for the task of uncertain estimation (it provides right answers to wrong questions), ... the bridge from statistical technologies to actual working science remains sketchy" (Dempster, 1987, p. 2). Twenty years have not greatly changed that assessment. Littlewood, if you remember from the epigraph, did not see it as his "business to justify application of the system"—an Olympian view shared by some modern statisticians. Because NHST in particular cannot provide mathematical estimates of the probabilities of hypotheses, it is constitutionally unfit for deciding between null and alternate hypotheses. Introductory statistics texts should carry warning labels: "The Statistician General warns that use of the algorithms contained herein justifies no inferences about hypotheses and no generalization to populations unsampled. Their assumptions seldom match their applications. Their critical regions can displace critical judgments. Their peremptory authority can damage unborn hypotheses. Their punctilio shifts authority from scientists to stat-packs. Addictive."

Bayesian analysis (Chapter 33, this volume) is a step forward. It provides the machinery for deriving the posterior predictive distribution on which prep is based and on which a decision theory for science may be erected. It has a vigorous literature (e.g., Howson & Urbach, 1996; Jaynes & Bretthorst, 2003). As currently deployed, it is sometimes hobbled by continuing to maintain the frequentists' focus on parameters. It has been blamed for making probabilities subjective, but as long as reference sets are unique, probabilities must always be conditional on those priors.

Permutation techniques are another step forward, in that they more closely model the scientific process. Modern computer packages make them easy to implement and easier to teach than traditional statistical pedagogy based on Fisher/Neyman-Pearson inference.

What this chapter hopes to convey is that by setting our sights a bit lower—down from the heavens of Platonic parameters to the earthier Aristotelian enterprise of predicting replicability—our inferences will be simpler and more useful. Much work needs yet to be accomplished: generalizing replicability statistics to cases with multiple degrees of freedom in the numerator, securing their implementation with permutation tests, and utilizing those tests for predicting replicability in contexts involving substantial realization variance. But accomplishing these tasks should be straightforward, and their execution will bring us closer to the fundamental task of scientists: to validate observations through prediction and replication.

APPENDIX

The Statistics of Effect Size

Pooled variance is

sp² = [sC²(nC − 1) + sE²(nE − 1)] / (n − 2).   (A1)

The Route to Posterior Predictive Distributions

Bayesian statistics provides a standard way to calculate f(d2 | d1) (see, e.g., Bolstad, 2004; Winkler, 2003): Predicate the unknown parameter, such as the population mean effect size δ; update that predication with the observed data; and then calculate the posterior predictions over all possible values of the parameter, weighted by the probability of the parameter given the observed data. The predication is a nuisance, and eliminating the nuisance parameter by integrating it out in the last step is called marginalization.
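In code, Equation A1 is a weighted average of the two group variances. A minimal sketch follows; it also computes the effect size as the mean difference scaled by the pooled standard deviation, an assumed rendering of the chapter's Equation 1, which is not reproduced here.

import numpy as np

def pooled_variance(control, experimental):
    # Equation A1: group variances weighted by their degrees of freedom
    c = np.asarray(control, dtype=float)
    e = np.asarray(experimental, dtype=float)
    n = c.size + e.size
    return (c.var(ddof=1) * (c.size - 1) + e.var(ddof=1) * (e.size - 1)) / (n - 2)

def effect_size_d(control, experimental):
    # Assumed form of Equation 1: standardized mean difference
    return (np.mean(experimental) - np.mean(control)) / np.sqrt(
        pooled_variance(control, experimental))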


f(d2 | d1) = ∫ f(d2, δ | d1) dδ
           = ∫ f(d2 | δ, d1) f(δ | d1) dδ
           ∝ ∫ f(d2 | δ, d1) f(d1 | δ) f(δ) dδ,

where f(δ | d1) is the posterior distribution of the parameter in light of the observations, and f(δ) is the prior distribution of the parameter. Integration of the last line over all population parameters δ delivers the posterior predictive distribution. If f(δ) is assumed to have a very large variance (flat or ignorance priors), then the observed data dominate the result. If a credible prior distribution is available (it is generally sought by "empirical Bayesians"), then the final predictions of replicability will be more accurate.

Consider the case in which the prior on the population mean δ is normally distributed. Then its posterior f(δ | d1) is n(d1′, s1′²), where the primed variables are weighted averages of the priors and the observed statistics, with the weights proportional to the precisions (reciprocal variances) of the means (sprior⁻² and s1⁻²). If we are relatively ignorant a priori of the value of the parameter, its distribution is flat relative to that of the observed statistic (sprior² >> s1²), and then d1′ ≈ d1 and s1′ ≈ s1. This is the case developed here. When the sampling distribution of the statistic is also approximately normal—a reasonable assumption for measures of effect size even when n is relatively small (Hedges & Olkin, 1985)—then factoring, completing the square, and removing constants leads eventually (Bolstad, 2004; Winkler, 2003) to a normal density with mean d1 and variance sdR² = s1² + s2² = 2s1²:

f(d2 | d1) ∝ exp[−(d2 − d1)² / (2(s1² + s2²))].

Integration of this between appropriate limits, as in Equations 2 and 3, leads to the central results of this chapter.
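Concretely, integrating this normal predictive density over replicated effects on the same side of zero as d1 (the sign criterion used in the resampling recipe above) gives prep in closed form. A sketch, assuming the sampling-variance approximation sd² ≈ 4/(n − 4) used in the simulations below; the function name is illustrative.

from math import sqrt
from scipy.stats import norm

def prep_normal(d1, n):
    # Posterior predictive for d2 is normal with mean d1, variance 2*s1^2;
    # prep is its mass on the same side of zero as d1.
    s1_sq = 4.0 / (n - 4)              # approximate sampling variance of d
    return norm.cdf(abs(d1) / sqrt(2 * s1_sq))

For example, prep_normal(0.5, 20) returns about .76.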
The Simulations Understanding Statistics, 3, 299–311. In the simulations for Figures 7.6 and 7.7, d′ Davison, A. C., & Hinkley, D. V. (1997). Bootstrap was calculated using Equations 1 and A1. methods and their application. New York: Variance was calculated using Equation 5 and Cambridge University Press. Dempster, A. P. (1987). Probability and the future of statistics. In I. B. MacNeill & G. J. Umphrey 4 s2 ≈ . (Eds.), Foundations of statistical inference d n − 4 (Vol. 2, pp. 1–7). Dordrecht, Holland: D. Reidel. 07-Osborne (Best)-45409.qxd 10/9/2007 4:31 PM Page 123


Doros, G., & Geier, A. B. (2005). Comment on "An alternative to null-hypothesis significance tests." Psychological Science, 16, 1005–1006.
Estes, W. K. (1997). On the communication of information by displays of standard errors and confidence intervals. Psychonomic Bulletin & Review, 4, 330–341.
Fidler, F., Thomason, N., Cumming, G., Finch, S., & Leeman, J. (2004). Editors can lead researchers to confidence intervals, but can't make them think: Statistical reform lessons from medicine. Psychological Science, 15, 119–126.
Fisher, R. A. (1959). Statistical methods and scientific inference (2nd ed.). New York: Hafner.
Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33(5), 587–606.
Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791–806.
Good, P. (2000). Permutation tests: A practical guide to resampling methods for testing hypotheses (2nd ed.). New York: Springer-Verlag.
Grabe, S., & Hyde, J. S. (2006). Ethnicity and body satisfaction among women in the United States: A meta-analysis. Psychological Bulletin, 132, 622–640.
Hacking, I. (1990). The taming of chance. New York: Cambridge University Press.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Lawrence Erlbaum.
Hays, W. L. (1963). Statistics for psychologists. New York: Holt, Rinehart and Winston.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3, 486–504.
Higgins, J. J. (2004). An introduction to modern nonparametric statistics. Pacific Grove, CA: Brooks/Cole.
Howson, C., & Urbach, P. (1996). Scientific reasoning: The Bayesian approach (2nd ed.). Chicago: Open Court.
Ioannidis, J. P. A. (2005a). Contradicted and initially stronger effects in highly cited clinical research. Journal of the American Medical Association, 294(2), 218–228.
Ioannidis, J. P. A. (2005b). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Iverson, G., Myung, I. J., & Karabatsos, G. (2006, August). P-rep, p-values and … . Paper presented at the Society for Mathematical Psychology, Vancouver, BC.
Jaynes, E. T., & Bretthorst, G. L. (2003). Probability theory: The logic of science. Cambridge, UK: Cambridge University Press.
Jones, L. V., & Tukey, J. W. (2000). A sensible formulation of the significance test. Psychological Methods, 5, 411–414.
Killeen, P. R. (2005a). An alternative to null hypothesis significance tests. Psychological Science, 16, 345–353.
Killeen, P. R. (2005b). Replicability, confidence, and priors. Psychological Science, 16, 1009–1012.
Killeen, P. R. (2005c). Tea-tests. The General Psychologist, 40(2), 16–19.
Killeen, P. R. (2006). Beyond statistical inference: A decision theory for science. Psychonomic Bulletin & Review, 13, 549–562.
Kline, M. (1980). Mathematics, the loss of certainty. New York: Oxford University Press.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Kyburg, H. E., Jr. (1987). The basic Bayesian blunder. In I. B. MacNeill & G. J. Umphrey (Eds.), Foundations of statistical inference: Vol. 2 (pp. 219–232). Boston: D. Reidel.
Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112, 662–668.
Lee, P. M. (2004). Bayesian statistics: An introduction (3rd ed.). New York: Hodder/Oxford University Press.
Littlewood, J. E. (1953). A mathematician's miscellany. London: Methuen.
Loftus, G. R., & Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1, 476–490.
Ludbrook, J., & Dudley, H. (1998). Why permutation tests are superior to t and F tests in biomedical research. The American Statistician, 52, 127–132.
Lunneborg, C. E. (2000). Data analysis by resampling: Concepts and applications. Pacific Grove, CA: Brooks/Cole/Duxbury.
Macdonald, R. R. (2005). Why replication probabilities depend on prior probability distributions: A rejoinder to Killeen (2005). Psychological Science, 16(12), 1007–1008.
Manly, B. F. J. (1997). Randomization, bootstrap and Monte Carlo methods in biology. New York: Chapman & Hall/CRC.
Maurer, B. A. (2004). Models of scientific inquiry and statistical practice: Implications for the structure of scientific knowledge. In M. L. Taper & S. R. Lele (Eds.), The nature of scientific evidence: Statistical, philosophical, and empirical considerations (pp. 17–50). Chicago: University of Chicago Press.
Mielke, P. W., & Berry, K. J. (2001). Permutation methods: A distance function approach. New York: Springer.


Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A nonparametric approach to statistical inference (Vol. 95). Newbury Park, CA: Sage.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
O'Hagan, A., & Forster, J. (2004). Kendall's advanced theory of statistics: Vol. 2B. Bayesian inference (2nd ed.). New York: Oxford University Press.
Park, S. K., & Miller, K. W. (1988). Random number generators: Good ones are hard to find. Communications of the ACM, 31(10), 1192–1201.
Pitman, E. J. G. (1937a). Significance tests which may be applied to samples from any populations. Supplement to the Journal of the Royal Statistical Society, 4(1), 119–130.
Pitman, E. J. G. (1937b). Significance tests which may be applied to samples from any populations: II. The correlation coefficient test. Supplement to the Journal of the Royal Statistical Society, 4(2), 225–232.
Raudenbush, S. W. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 301–321). New York: Russell Sage Foundation.
Richard, F. D., Bond, C. F. J., & Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7, 331–363.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231–244). New York: Russell Sage Foundation.
Royall, R. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.
Royall, R. (2004). The likelihood paradigm for statistical evidence. In M. L. Taper & S. R. Lele (Eds.), The nature of scientific evidence: Statistical, philosophical, and empirical considerations (pp. 119–152). Chicago: University of Chicago Press.
Rubin, D. B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6, 377–400.
Sanabria, F., & Killeen, P. R. (2007). Better statistics for better decisions: Rejecting null hypothesis significance tests in favor of replication statistics. Psychology in the Schools, 44, 471–481.
Savage, L. J. (1972). The foundations of statistics (2nd ed.). New York: Dover.
Seidenfeld, T. (1979). Philosophical problems of statistical inference: Learning from R. A. Fisher. London: D. Reidel.
Thompson, B. (1999). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology, 9(2), 165–181.
van den Noortgate, W., & Onghena, P. (2003). Estimating the mean effect size in meta-analysis: Bias, precision, and mean squared error of different weighting methods. Behavior Research Methods, Instruments, & Computers, 35, 504–511.
Wagenmakers, E.-J., & Grünwald, P. (2006). A Bayesian perspective on hypothesis testing: A comment on Killeen (2005). Psychological Science, 17, 641–642.
Westfall, P. H., & Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustments. New York: John Wiley.
Wilcox, R. R. (1998). How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53, 300–314.
Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Winkler, R. L. (2003). An introduction to Bayesian inference and decision (2nd ed.). Gainesville, FL: Probabilistic Publishing.