Why Replications Do Not Fix the Reproducibility Crisis: a Model And

Total Page:16

File Type:pdf, Size:1020Kb

Why Replications Do Not Fix the Reproducibility Crisis: a Model And 1 63 2 Why Replications Do Not Fix the Reproducibility 64 3 65 4 Crisis: A Model and Evidence from a Large-Scale 66 5 67 6 Vignette Experiment 68 7 69 8 Adam J. Berinskya,1, James N. Druckmanb,1, and Teppei Yamamotoa,1,2 70 9 71 a b 10 Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139; Department of Political Science, Northwestern 72 University, 601 University Place, Evanston, IL 60208 11 73 12 This manuscript was compiled on January 9, 2018 74 13 75 14 There has recently been a dramatic increase in concern about collective “file drawer” (4). When publication decisions depend 76 15 whether “most published research findings are false” (Ioannidis on factors beyond research quality, the emergent scientific 77 16 2005). While the extent to which this is true in different disciplines re- consensus may be skewed. Encouraging replication seems to 78 17 mains debated, less contested is the presence of “publication bias,” be one way to correct a biased record of published research 79 18 which occurs when publication decisions depend on factors beyond resulting from this file drawer problem (5–7). Yet, in the 80 19 research quality, most notably the statistical significance of an ef- current landscape, one must also consider potential publication 81 20 fect. Evidence of this “file drawer problem” and related phenom- biases at the replication stage. While this was not an issue 82 21 ena across fields abounds, suggesting that an emergent scientific for OSC since they relied on over 250 scholars to replicate the 83 22 consensus may represent false positives. One of the most noted 100 studies, the reality is that individual replication studies 84 23 responses to publication bias has involved increased emphasis on also face a publication hurdle. 85 24 replication where a researcher repeats a prior research study with In what follows, we present a model and a survey exper- 86 25 different subjects and/or situations. But do replication studies suf- iment that captures the publication process for initial and 87 26 fer themselves from publication bias, and if so, what are the implica- replication studies. In so doing, we introduce a distinct type of 88 27 tions for knowledge accumulation? In this study, we contribute to the publication bias, what we call “gotcha bias.” This bias occurs 89 28 emerging literature on publication and replication practices in sev- only for replication studies such that the likelihood of publi- 90 29 eral important ways. First, we offer a micro-level model of the publi- cation increases if the replication contradicts the findings of 91 30 cation process involving an initial study and a replication. The model original study. We also show that the common metric used to 92 31 incorporates possible publication bias both at the initial and replica- assess the replication success – the “reproducibility” rate (i.e., 93 32 tion stages, enabling us to investigate the implications of such bias proportion of published replication results that successfully 94 33 on various statistical metrics of quality of evidence. Second, we es- reproduce the original positive finding) – is not affected by 95 34 timate the key parameters of the model with a large-scale vignette initial study publication bias (i.e., the file drawer) according to 96 35 experiment conducted with the universe of political science profes- our model. In other words, a low reproducibility rate provides 97 36 sors teaching at Ph.D.-granting institutions in the United States. Our no insight into whether there is a file drawer problem. Finally, 98 37 results show substantial evidence of both types of publication bias our empirical results from a survey of political scientists sug- 99 38 in the discipline: On average, respondents judged statistically signif- gest that publication biases occur with greater frequency at 100 39 icant results about 20 percentage points more likely to be published the replication phase and the gotcha bias may, in practice, 101 40 than statistically insignificant results, and contradictory replication exacerbate the false positive rate. 102 41 results about six percentage points more publishable than replica- 103 42 tions that are consistent with original results. Based on these find- Model of Publication Decisions 104 43 ings, we offer practical recommendations for the improvement of We consider two distinct types of publication bias. First, we 105 44 publication practices in political science. examine the well-known file drawer problem (4). File drawer 106 45 DRAFT 107 Keyword 1 | Keyword 2 | Keyword 3 | Keyword 4 | Keyword 5 bias occurs if a positive test result (i.e., a statistical hypothesis 46 test that rejects the null hypothesis of no effect) is more likely 108 47 to be published than a negative test result (i.e., a hypothesis 109 48 eplication is a hallmark of science. In the ideal, all empir- test that does not reject the null hypothesis), ceteris paribus. 110 49 Rical research findings would be subject to replication with In other words, the published record of research is skewed away 111 50 knowledge accumulating as replications proceed. There has from the true distribution of the research record, overstating 112 51 been increasing concern that such an “ideal” would paint an the collective strength of the findings. For example, if 1 out of 113 52 unflattering portrait of science – a recent survey of scientists 10 studies showed that sending varying types of health-related 114 53 found that 90% of respondents agreed there is a reproducibility text messages leads people to eat less fatty food, and only that 115 54 crisis (1). Evidence of such a crisis comes, in part, from the 1 is published, the result is a mis-portrayal of the effect of text 116 55 Open Science Collaboration (OSC) project that replicated messages. There is a large theoretical and empirical literature 117 56 just 36% of initially statistically significant results from 100 that documents the evidence and possible consequences of this 118 57 previously published psychology experiments (2). 119 58 One possible driver of the “replication crisis” is publication 120 Please provide details of author contributions here. 59 bias at the initial stage: that is, the published literature 121 Please declare any conflict of interest here. 60 overstates statistically significant results because those are 122 1 61 the only kind that survive the publication process (3). Non- A.J.B., J.N.D. and T.Y. contributed equally to this work. 123 62 significant results are instead relegated to the discipline’s 2To whom correspondence should be addressed. E-mail: [email protected] 124 www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX PNAS | January 9, 2018 | vol. XXX | no. XX | 1–7 125 type of publication bias (8–10). The file drawer bias reflects We note that our model is not intended to be an accurate, 187 126 an entrenched culture that prioritizes statistical significance descriptive portrayal of the actual scientific practice. For ex- 188 127 and novelty, as well as a funding system that rewards positive ample, not all published results will ever be replicated with 189 128 findings (3). fresh samples in reality, even with the current push towards 190 129 Second, we consider the possibility that the process of credible research. It is indeed possible that researchers might 191 130 replication itself could induce bias. We define gotcha bias to strategically choose what existing studies to replicate and 192 131 be the phenomenon that, ceteris paribus, a positive test result which replication results to submit for review, given their per- 193 132 is more likely to be published when there exists a prior study ception about publication biases. Instead, our goal here is to 194 133 that tested the same hypothesis and had a negative result than examine how the idealized model of replication science, as ex- 195 134 when a prior study showed a positive test result (and vice emplified by the OSC study (2), would differ if we “perturbed” 196 135 versa). That is, replications are more likely to be published if it by adding possible publication bias for replication studies 197 136 they overturn extant findings. The published record of research themselves. 198 137 therefore overly emphasizes replication that run counter to To study the consequences of the two types of publication 199 138 existing findings, as compared to the true distribution of biases, we specifically consider the following metrics of evidence 200 139 the research record. For example, there might be 10 similar quality in published studies. 201 140 202 studies of text messaging effects that each purport to replicate Definition 1 [Actual False Positive Rate (AFPR) in pub- 141 203 corresponding previous studies, and they all find significant lished replication studies] 142 effects but 9 of them are successful replications of the original 204 143 205 studies that found equally significant results. If only the one α˜2 = Pr(replication test significant | replication published, 144 206 study that finds a significant effect contrary to a previous non- the null is true) 145 significant study is published, the larger research community 207 146 will see a distorted portrayal of the research record. The Definition 2 [Reproducibility] 208 147 gotcha bias is a concept applicable to replication studies that 209 148 attempt to reproduce results of previous, original studies using R = Pr(replication test significant | original test significant 210 149 similar study designs. The hypothesized mechanism behind and published, replication published)
Recommended publications
  • Arxiv:2102.03380V2 [Cs.AI] 29 Jun 2021
    Reproducibility in Evolutionary Computation MANUEL LÓPEZ-IBÁÑEZ, University of Málaga, Spain JUERGEN BRANKE, University of Warwick, UK LUÍS PAQUETE, University of Coimbra, CISUC, Department of Informatics Engineering, Portugal Experimental studies are prevalent in Evolutionary Computation (EC), and concerns about the reproducibility and replicability of such studies have increased in recent times, reflecting similar concerns in other scientific fields. In this article, we discuss, within the context of EC, the different types of reproducibility and suggest a classification that refines the badge system of the Association of Computing Machinery (ACM) adopted by ACM Transactions on Evolutionary Learning and Optimization (https://dlnext.acm.org/journal/telo). We identify cultural and technical obstacles to reproducibility in the EC field. Finally, we provide guidelines and suggest tools that may help to overcome some of these reproducibility obstacles. Additional Key Words and Phrases: Evolutionary Computation, Reproducibility, Empirical study, Benchmarking 1 INTRODUCTION As in many other fields of Computer Science, most of the published research in Evolutionary Computation (EC) relies on experiments to justify their conclusions. The ability of reaching similar conclusions by repeating an experiment performed by other researchers is the only way a research community can reach a consensus on an empirical claim until a mathematical proof is discovered. From an engineering perspective, the assumption that experimental findings hold under similar conditions is essential for making sound decisions and predicting their outcomes when tackling a real-world problem. The “reproducibility crisis” refers to the realisation that many experimental findings described in peer-reviewed scientific publications cannot be reproduced, because e.g. they lack the necessary data, they lack enough details to repeat the experiment or repeating the experiment leads to different conclusions.
    [Show full text]
  • Experimental Evidence During a Global Pandemic
    Original research BMJ Glob Health: first published as 10.1136/bmjgh-2020-004222 on 23 October 2020. Downloaded from Is health politically irrelevant? Experimental evidence during a global pandemic Arnab Acharya,1 John Gerring,2 Aaron Reeves3 To cite: Acharya A, Gerring J, ABSTRACT Key questions Reeves A. Is health politically Objective To investigate how health issues affect voting irrelevant? Experimental behaviour by considering the COVID-19 pandemic, which What is already known? evidence during a global offers a unique opportunity to examine this interplay. pandemic. BMJ Global Health Political leaders in democracies are sensitive to cues Design We employ a survey experiment in which ► 2020;5:e004222. doi:10.1136/ from the electorate and are less likely to implement treatment groups are exposed to key facts about the bmjgh-2020-004222 unpopular policies. pandemic, followed by questions intended to elicit attitudes Electoral accountability is not automatic, however, toward the incumbent party and government responsibility ► Additional material is and it only exists if citizens connect specific policies ► for the pandemic. published online only. To view, to politicians and vote accordingly. Setting The survey was conducted amid the lockdown please visit the journal online ► We know little about how public health attitudes af- period of 15–26 April 2020 in three large democratic (http:// dx. doi. org/ 10. 1136/ fect voting intention or behaviour. bmjgh- 2020- 004222). countries with the common governing language of English: India, the United Kingdom and the United States. Due What are the new findings? to limitations on travel and recruitment, subjects were ► The majority of our respondents believe health is an Received 15 October 2020 recruited through the M-T urk internet platform and the important policy area and that government has some Accepted 16 October 2020 survey was administered entirely online.
    [Show full text]
  • Sensitivity and Specificity. Wikipedia. Last Modified on 16 November 2013
    Sensitivity and specificity - Wikipedia, the free encyclopedia Create account Log in Article Talk Read Edit View Sensitivity and specificity From Wikipedia, the free encyclopedia Main page Contents Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as Featured content classification function. Sensitivity (also called the true positive rate, or the recall rate in some fields) measures the proportion of Current events actual positives which are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having Random article the condition). Specificity measures the proportion of negatives which are correctly identified as such (e.g. the percentage of Donate to Wikipedia healthy people who are correctly identified as not having the condition, sometimes called the true negative rate). These two measures are closely related to the concepts of type I and type II errors. A perfect predictor would be described as 100% Interaction sensitive (i.e. predicting all people from the sick group as sick) and 100% specific (i.e. not predicting anyone from the healthy Help group as sick); however, theoretically any predictor will possess a minimum error bound known as the Bayes error rate. About Wikipedia For any test, there is usually a trade-off between the measures. For example: in an airport security setting in which one is testing Community portal for potential threats to safety, scanners may be set to trigger on low-risk items like belt buckles and keys (low specificity), in Recent changes order to reduce the risk of missing objects that do pose a threat to the aircraft and those aboard (high sensitivity).
    [Show full text]
  • A Randomized Control Trial Evaluating the Effects of Police Body-Worn
    A randomized control trial evaluating the effects of police body-worn cameras David Yokuma,b,1,2, Anita Ravishankara,c,d,1, and Alexander Coppocke,1 aThe Lab @ DC, Office of the City Administrator, Executive Office of the Mayor, Washington, DC 20004; bThe Policy Lab, Brown University, Providence, RI 02912; cExecutive Office of the Chief of Police, Metropolitan Police Department, Washington, DC 20024; dPublic Policy and Political Science Joint PhD Program, University of Michigan, Ann Arbor, MI 48109; and eDepartment of Political Science, Yale University, New Haven, CT 06511 Edited by Susan A. Murphy, Harvard University, Cambridge, MA, and approved March 21, 2019 (received for review August 28, 2018) Police body-worn cameras (BWCs) have been widely promoted as The existing evidence on whether BWCs have the anticipated a technological mechanism to improve policing and the perceived effects on policing outcomes remains relatively limited (17–19). legitimacy of police and legal institutions, yet evidence of their Several observational studies have evaluated BWCs by compar- effectiveness is limited. To estimate the effects of BWCs, we con- ing the behavior of officers before and after the introduction of ducted a randomized controlled trial involving 2,224 Metropolitan BWCs into the police department (20, 21). Other studies com- Police Department officers in Washington, DC. Here we show pared officers who happened to wear BWCs to those without (15, that BWCs have very small and statistically insignificant effects 22, 23). The causal inferences drawn in those studies depend on on police use of force and civilian complaints, as well as other strong assumptions about whether, after statistical adjustments policing activities and judicial outcomes.
    [Show full text]
  • The Reproducibility Crisis in Research and Open Science Solutions Andrée Rathemacher University of Rhode Island, [email protected] Creative Commons License
    University of Rhode Island DigitalCommons@URI Technical Services Faculty Presentations Technical Services 2017 "Fake Results": The Reproducibility Crisis in Research and Open Science Solutions Andrée Rathemacher University of Rhode Island, [email protected] Creative Commons License This work is licensed under a Creative Commons Attribution 4.0 License. Follow this and additional works at: http://digitalcommons.uri.edu/lib_ts_presentations Part of the Scholarly Communication Commons, and the Scholarly Publishing Commons Recommended Citation Rathemacher, Andrée, ""Fake Results": The Reproducibility Crisis in Research and Open Science Solutions" (2017). Technical Services Faculty Presentations. Paper 48. http://digitalcommons.uri.edu/lib_ts_presentations/48http://digitalcommons.uri.edu/lib_ts_presentations/48 This Speech is brought to you for free and open access by the Technical Services at DigitalCommons@URI. It has been accepted for inclusion in Technical Services Faculty Presentations by an authorized administrator of DigitalCommons@URI. For more information, please contact [email protected]. “Fake Results” The Reproducibility Crisis in Research and Open Science Solutions “It can be proven that most claimed research findings are false.” — John P. A. Ioannidis, 2005 Those are the words of John Ioannidis (yo-NEE-dees) in a highly-cited article from 2005. Now based at Stanford University, Ioannidis is a meta-scientist who conducts “research on research” with the goal of making improvements. Sources: Ionnidis, John P. A. “Why Most
    [Show full text]
  • Which Findings Should Be Published?∗
    Which findings should be published?∗ Alexander Frankel† Maximilian Kasy‡ December 5, 2020 Abstract Given a scarcity of journal space, what is the optimal rule for whether an empirical finding should be published? Suppose publications inform the public about a policy-relevant state. Then journals should publish extreme results, meaning ones that move beliefs sufficiently. This optimal rule may take the form of a one- or a two-sided test comparing a point estimate to the prior mean, with critical values determined by a cost-benefit analysis. Consideration of future studies may additionally justify the publication of precise null results. If one insists that standard inference remain valid, however, publication must not select on the study’s findings. Keywords: Publication bias, mechanism design, value of information JEL Codes: C44, D80, D83 1 Introduction Not all empirical findings get published. Journals may be more likely to publish findings that are statistically significant, as documented for instance by Franco et al. (2014), Brodeur et al. (2016), and Andrews and Kasy (2019). They may also be more likely to publish findings that are surprising, or conversely ones that confirm some prior belief. Whatever its form, selective publication distorts statistical inference. If only estimates with large effect sizes were to be written up and published, say, then ∗We thank Isaiah Andrews, Victoria Baranov, Ben Brooks, Gary Chamberlain, Zoe Hitzig, Emir Kamenica, Matthew Notowidigdo, Jesse Shapiro, and Glen Weyl for helpful discussions and com- ments. †University of Chicago Booth School of Business. Address: 5807 South Woodlawn Avenue Chicago, IL 60637. E-Mail: [email protected]. ‡Department of Economics, Oxford University.
    [Show full text]
  • 1 Maximizing the Reproducibility of Your Research Open
    1 Maximizing the Reproducibility of Your Research Open Science Collaboration1 Open Science Collaboration (in press). Maximizing the reproducibility of your research. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological Science Under Scrutiny: Recent Challenges and Proposed Solutions. New York, NY: Wiley. Authors’ Note: Preparation of this chapter was supported by the Center for Open Science and by a Veni Grant (016.145.049) awarded to Hans IJzerman. Correspondence can be addressed to Brian Nosek, [email protected]. 1 Alexander A. Aarts, Nuenen, The Netherlands; Frank A. Bosco, Virginia Commonwealth University; Katherine S. Button, University of Bristol; Joshua Carp, Center for Open Science; Susann Fiedler, Max Planck Institut for Research on Collective Goods; James G. Field, Virginia Commonwealth University; Roger Giner-Sorolla, University of Kent; Hans IJzerman, Tilburg University; Melissa Lewis, Center for Open Science; Marcus Munafò, University of Bristol; Brian A. Nosek, University of Virginia; Jason M. Prenoveau, Loyola University Maryland; Jeffrey R. Spies, Center for Open Science 2 Commentators in this book and elsewhere describe evidence that modal scientific practices in design, analysis, and reporting are interfering with the credibility and veracity of the published literature (Begley & Ellis, 2012; Ioannidis, 2005; Miguel et al., 2014; Simmons, Nelson, & Simonsohn, 2011). The reproducibility of published findings is unknown (Open Science Collaboration, 2012a), but concern that is lower than desirable is widespread -
    [Show full text]
  • Intro to Science Studies I
    PHIL 209A / SOCG 255A / HIGR 238 / COGR 225A Intro to Science Studies I Fall 2017 Instructor: Kerry McKenzie [email protected] Seminars: Tuesday 9.30-12.20pm, HSS 3027. Office Hours: Wednesday 2-4pm, HSS 8088. 1 Overview. This course is a philosophically slanted introduction to Science Studies. Our central question is a conceptual one, whose relevance to Science Studies should speak for itself – namely, what is science, and what distinguishes science from other fields? In grappling with this question we’ll familiarize ourselves with central works in the philosophy of science canon, and make glancing acquaintance with some more contemporary issues in scientific epistemology and metaphysics. But our guiding motif is a normative one: what, if anything, makes science entitled to the privileged status that it enjoys within culture? In more detail. The ‘question of demarcation’ – that of what, if anything, makes what we call ‘science’ science – was a central preoccupation of many of the major 20th century philosophers of science. While interest in this topic waned after the appearance of Larry Laudan’s ‘Demise of the Demarcation Problem’ in 1983, the question of what separates science from ‘pseudoscience’ is now making something of a comeback. In this course, we will review the approaches to demarcation offered by the philosophers Popper, Kuhn, and Lakatos – all of which serve as concise introductions to the dominant themes of their work as a whole – and then examine some more contemporary approaches more centred on pragmatics and the philosophy of language. We will then consider how homeopathy – for most a pseudoscience par excellence – fares with regard to the criteria we’ll have studied.
    [Show full text]
  • Psychology's Replication Crisis and the Grant Culture: Righting the Ship
    PPSXXX10.1177/1745691616687745LilienfeldRighting the Ship 687745research-article2017 Perspectives on Psychological Science 2017, Vol. 12(4) 660 –664 Psychology’s Replication Crisis and the © The Author(s) 2017 Reprints and permissions: Grant Culture: Righting the Ship sagepub.com/journalsPermissions.nav DOI:https://doi.org/10.1177/1745691616687745 10.1177/1745691616687745 www.psychologicalscience.org/PPS Scott O. Lilienfeld Department of Psychology, Emory University Abstract The past several years have been a time for soul searching in psychology, as we have gradually come to grips with the reality that some of our cherished findings are less robust than we had assumed. Nevertheless, the replication crisis highlights the operation of psychological science at its best, as it reflects our growing humility. At the same time, institutional variables, especially the growing emphasis on external funding as an expectation or de facto requirement for faculty tenure and promotion, pose largely unappreciated hazards for psychological science, including (a) incentives for engaging in questionable research practices, (b) a single­minded focus on programmatic research, (c) intellectual hyperspecialization, (d) disincentives for conducting direct replications, (e) stifling of creativity and intellectual risk taking, (f) researchers promising more than they can deliver, and (g) diminished time for thinking deeply. Preregistration should assist with (a), but will do little about (b) through (g). Psychology is beginning to right the ship, but it will need to confront the increasingly deleterious impact of the grant culture on scientific inquiry. Keywords replication, grants, preregistration, confirmation bias “Science is a struggle for truth against method­ psychological phenomena, such as behavioral priming, ological, psychological, and sociological obstacles” the effects of facial expressions on emotions, and the (Fanelli & Ioannidis, 2013, p.
    [Show full text]
  • Paradoxical Conclusions in Randomised Controlled Trials with 'Doubly Null
    Open access Original research BMJ Open: first published as 10.1136/bmjopen-2019-029596 on 9 January 2020. Downloaded from ‘Not clinically effective but cost- effective’ - paradoxical conclusions in randomised controlled trials with ‘doubly null’ results: a cross- sectional study James Raftery ,1 HC Williams,2 Aileen Clarke,3 Jim Thornton,2 John Norrie,4 Helen Snooks,5 Ken Stein6 To cite: Raftery J, Williams HC, ABSTRACT Strengths and limitations of this study Clarke A, et al. ‘Not clinically Objectives Randomised controlled trials in healthcare effective but cost- effective’ increasingly include economic evaluations. Some show - paradoxical conclusions in ► A strength of this study is the identification of a small differences which are not statistically significant. randomised controlled trials with problem to do with results on cost- effectiveness ‘doubly null’ results: a cross- Yet these sometimes come to paradoxical conclusions from randomised trials which is fairly common but sectional study. BMJ Open such as: ‘the intervention is not clinically effective’ but has not otherwise been reported or examined. ‘is probably cost- effective’. This study aims to quantify 2020;10:e029596. doi:10.1136/ ► A limitation was that the sample was confined bmjopen-2019-029596 the extent of non- significant results and the types of to that of trials funded and published by the UK’s conclusions drawn from them. ► Prepublication history and Health Technology Assessment (HTA) programme. additional material for this Design Cross- sectional retrospective analysis of ► The study was strengthened by using full reports paper are available online. To randomised trials published by the UK’s National of each study in the HTA monograph series, which view these files, please visit Institute for Health Research (NIHR) Health Technology requires conclusions and recommendations for the journal online (http:// dx.
    [Show full text]
  • A Bayesian Treatise on the Replication Crisis in Science
    A Bayesian Treatise on The Replication Crisis in Science by True Gibson A thesis submitted to Oregon State University Honors College in partial fulfillment of the requirements for the degree of Honors Baccalaureate of Science in Biochemistry and Biophysics Honors Scholar Presented May 25, 2018 Commencement June 2018 AN ABSTRACT OF THE THESIS OF True Gibson for the degree of Honors Baccalaureate of Science in Biochemistry and Biophysics presented on May 25, 2018. Title: A Bayesian Treatise on The Replication Crisis in Science. Abstract approved: __________________________________________________ Jonathan Kaplan Abstract Over the past decade, it has come to light that many published scientific findings cannot be reproduced. This has led to the replication crisis in science. Many researchers feel that they can no longer trust much of what they read in scientific journals, and the public is becoming ever more dubious of published research findings. I argue herein that the replication crisis is a result of the mistaken belief that the methods of classical statistical inference provide a coherent objective basis for evaluating scientific hypotheses. To solve the replication crisis, I suggest that the scientific community ought to invoke a Bayesian framework of statistical inference which integrates both successful and failed replication attempts to determine the confidence they should have in any given scientific hypothesis. Key Words: Inductive Reasoning, Replication Crisis, Reproducibility, Statistical Inference, Bayesian Statistics Corresponding
    [Show full text]
  • Researchers Overturn Landmark Study on the Replicability of Psychological Science
    Researchers overturn landmark study on the replicability of psychological science By Peter Reuell Harvard Staff Writer Category: HarvardScience Subcategory: Culture & Society KEYWORDS: psychology, psychological science, replication, replicate, reproduce, reproducibility, Center for Open Science, Gilbert, Daniel Gilbert, King, Gary King, Science, Harvard, FAS, Faculty of Arts and Sciences, Reuell, Peter Reuell Summary: A 2015 study claiming that more than half of all psychology studies cannot be replicated turns out to be wrong. Harvard researchers have discovered that the study contains several statistical and methodological mistakes, and that when these are corrected, the study actually shows that the replication rate in psychology is quite high – indeed, it is statistically indistinguishable from 100%. RELATED LINKS: According to two Harvard professors and their collaborators, a 2015 landmark study showing that more than half of all psychology studies cannot be replicated is actually wrong. In an attempt to determine the replicability of psychological science, a consortium of 270 scientists known as The Open Science Collaboration (OSC) tried to replicate the results of 100 published studies. More than half of them failed, creating sensational headlines worldwide about the “replication crisis” in psychology. But an in-depth examination of the data by Daniel Gilbert (Edgar Pierce Professor of Psychology at Harvard University), Gary King (Albert J. Weatherhead III University Professor at Harvard University), Stephen Pettigrew (doctoral student in the Department of Government at Harvard University), and Timothy Wilson (Sherrell J. Aston Professor of Psychology at the University of Virginia) has revealed that the OSC made some serious mistakes that make this pessimistic conclusion completely unwarranted: The methods of many of the replication studies turn out to be remarkably different from the originals and, according to Gilbert, King, Pettigrew, and Wilson, these “infidelities” had two important consequences.
    [Show full text]