http://cos.io/ Brian Nosek, University of Virginia, http://briannosek.com/
General Article
Psychological Science XX(X) 1–8
© The Author(s) 2011
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0956797611417632
http://pss.sagepub.com

False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant

Joseph P. Simmons¹, Leif D. Nelson², and Uri Simonsohn¹
¹The Wharton School, University of Pennsylvania, and ²Haas School of Business, University of California, Berkeley

Abstract
In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

Keywords
methodology, motivated reasoning, publication, disclosure

Received 3/17/11; Revision accepted 5/23/11

Our job as scientists is to discover truths about the world. We generate hypotheses, collect data, and examine whether or not the data are consistent with those hypotheses. Although we aspire to always be accurate, errors are inevitable.

Perhaps the most costly error is a false positive, the incorrect rejection of a null hypothesis. First, once they appear in the literature, false positives are particularly persistent. Because null results have many possible causes, failures to replicate previous findings are never conclusive. Furthermore, because it is uncommon for prestigious journals to publish null findings or exact replications, researchers have little incentive to even attempt them. Second, false positives waste resources: They inspire investment in fruitless research programs and can lead to ineffective policy changes. Finally, a field known for publishing false positives risks losing its credibility.

In this article, we show that despite the nominal endorsement of a maximum false-positive rate of 5% (i.e., p ≤ .05), current standards for disclosing details of data collection and analyses make false positives vastly more likely. In fact, it is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis.

The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?

It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.

This exploratory behavior is not the by-product of malicious intent, but rather the result of two factors: (a) ambiguity in how best to make these decisions and (b) the researcher’s desire to find a statistically significant result. A large literature documents that people are self-serving in their interpretation

Corresponding Authors:
Joseph P. Simmons, The Wharton School, University of Pennsylvania, 551 Jon M. Huntsman Hall, 3730 Walnut St., Philadelphia, PA 19104
E-mail: [email protected]
Leif D. Nelson, Haas School of Business, University of California, Berkeley, Berkeley, CA 94720-1900
E-mail: [email protected]
Uri Simonsohn, The Wharton School, University of Pennsylvania, 548 Jon M. Huntsman Hall, 3730 Walnut St., Philadelphia, PA 19104
E-mail: [email protected]

Electronic copy available at: http://ssrn.com/abstract=1850704

Open access, freely available online
Essay

Why Most Published Research Findings Are False

John P. A. Ioannidis

Summary
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships.

[…] is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R/(R − βR + α). A research finding is thus

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.

Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abbreviation: PPV, positive predictive value

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: [email protected]
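
Two quantitative statements in the excerpts above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from either article. The first function assumes the k analyses are statistically independent, which real analytic alternatives run on one data set usually are not, so it only shows why the chance of at least one false positive at the 5% level must exceed 5% once more than one analysis is tried. The other two functions evaluate the expressions quoted in the Ioannidis excerpt, R/(R + 1) and PPV = (1 − β)R/(R − βR + α), where R is read as the ratio of true to no relationships mentioned in the Summary. All function names and the example values are hypothetical.

```python
def chance_of_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k tests of true nulls is significant.

    Assumption (not from the source): the k tests are independent.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - alpha) ** k


def pre_study_probability(R: float) -> float:
    """Pre-study probability that a probed relationship is true: R / (R + 1)."""
    return R / (R + 1.0)


def positive_predictive_value(R: float, alpha: float = 0.05, beta: float = 0.20) -> float:
    """Post-study probability that a claimed finding is true.

    Implements PPV = (1 - beta) * R / (R - beta * R + alpha) as quoted above.
    The default beta = 0.20 (power of 0.80) is an illustrative choice.
    """
    return (1.0 - beta) * R / (R - beta * R + alpha)


if __name__ == "__main__":
    # One analysis keeps the nominal rate; each extra independent analysis raises it.
    for k in (1, 2, 4, 10):
        print(f"k={k:2d}  P(at least one false positive) = "
              f"{chance_of_any_false_positive(k):.3f}")

    # PPV falls as true relationships become rarer among those probed.
    for R in (1.0, 0.1, 0.01):
        print(f"R={R:<5}  pre-study = {pre_study_probability(R):.3f}  "
              f"PPV = {positive_predictive_value(R):.3f}")
```

With a single analysis the first function returns exactly the nominal α, and any k greater than 1 gives a larger value, which is the point made in the Simmons et al. excerpt. The PPV function makes the dependence on R, α, and β explicit, so other values can be substituted to explore the Ioannidis framework.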