Multiplicity and Reproducibility in Scientific Studies Summer 2006
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Reproducibility, Replicability, and Generalization in the Social, Behavioral, and Economic Sciences
REPRODUCIBILITY, REPLICABILITY, AND GENERALIZATION IN THE SOCIAL, BEHAVIORAL, AND ECONOMIC SCIENCES Report of the Subcommittee on Replicability in Science of the SBE Advisory Committee to the National Science Foundation 13 May 2015 Presentation at SBE AC Spring Meeting by K. Bollen. Report of the Subcommittee on Replicability in Science Advisory Committee to the NSF SBE Directorate Subcommittee Members: Kenneth Bollen (University of North Carolina at Chapel Hill, Cochair) John T. Cacioppo (University of Chicago, Cochair) Robert M. Kaplan (Agency for Healthcare Research and Quality) Jon A. Krosnick (Stanford University) James L. Olds (George Mason University) Staff Assistance Heather Dean (National Science Foundation) TOPICS I. Background II. Subcommittee Report III. Definitions IV. Recommendations V. Final Comments Background • Spring, 2013 o NSF SBE Advisory Committee establishes subcommittee on how SBE can promote robust research practices • Summer & Fall 2013 o Subcommittee proposal for workshop on “Robust Research in the Social, Behavioral, and Economic Sciences.” • February 20-21, 2014 o Workshop convened o Participants drawn from variety of universities, funding agencies, scientific associations, and journals o Cover broad range of issues from extent and cause of problems to possible solutions o Details are in the appendix of the circulated report • Post-workshop period o Document & digest workshop content o Discuss and propose recommendations o Complete report Subcommittee Report • DEFINITIONS o No consensus in science on the meanings -
Downloading of a Human Consciousness Into a Digital Computer Would Involve ‘A Certain Loss of Our Finer Feelings and Qualities’89
When We Can Trust Computers (and When We Can’t) Peter V. Coveney1,2* Roger R. Highfield3 1Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK, https://orcid.org/0000-0002-8787-7256 2Institute for Informatics, Science Park 904, University of Amsterdam, 1098 XH Amsterdam, Netherlands 3Science Museum, Exhibition Road, London SW7 2DD, UK, https://orcid.org/0000-0003-2507-5458 Keywords: validation; verification; uncertainty quantification; big data; machine learning; artificial intelligence Abstract With the relentless rise of computer power, there is a widespread expectation that computers can solve the most pressing problems of science, and even more besides. We explore the limits of computational modelling and conclude that, in the domains of science and engineering that are relatively simple and firmly grounded in theory, these methods are indeed powerful. Even so, the availability of code, data and documentation, along with a range of techniques for validation, verification and uncertainty quantification, are essential for building trust in computer generated findings. When it comes to complex systems in domains of science that are less firmly grounded in theory, notably biology and medicine, to say nothing of the social sciences and humanities, computers can create the illusion of objectivity, not least because the rise of big data and machine learning pose new challenges to reproducibility, while lacking true explanatory power. We also discuss important aspects of the natural world which cannot be solved by digital means. In the long-term, renewed emphasis on analogue methods will be necessary to temper the excessive faith currently placed in digital computation. -
Caveat Emptor: the Risks of Using Big Data for Human Development
1 Caveat emptor: the risks of using big data for human development Siddique Latif1,2, Adnan Qayyum1, Muhammad Usama1, Junaid Qadir1, Andrej Zwitter3, and Muhammad Shahzad4 1Information Technology University (ITU)-Punjab, Pakistan 2University of Southern Queensland, Australia 3University of Groningen, Netherlands 4National University of Sciences and Technology (NUST), Pakistan Abstract Big data revolution promises to be instrumental in facilitating sustainable development in many sectors of life such as education, health, agriculture, and in combating humanitarian crises and violent conflicts. However, lurking beneath the immense promises of big data are some significant risks such as (1) the potential use of big data for unethical ends; (2) its ability to mislead through reliance on unrepresentative and biased data; and (3) the various privacy and security challenges associated with data (including the danger of an adversary tampering with the data to harm people). These risks can have severe consequences and a better understanding of these risks is the first step towards mitigation of these risks. In this paper, we highlight the potential dangers associated with using big data, particularly for human development. Index Terms Human development, big data analytics, risks and challenges, artificial intelligence, and machine learning. I. INTRODUCTION Over the last decades, widespread adoption of digital applications has moved all aspects of human lives into the digital sphere. The commoditization of the data collection process due to increased digitization has resulted in the “data deluge” that continues to intensify with a number of Internet companies dealing with petabytes of data on a daily basis. The term “big data” has been coined to refer to our emerging ability to collect, process, and analyze the massive amount of data being generated from multiple sources in order to obtain previously inaccessible insights. -
Metabolic Sensor Governing Bacterial Virulence in Staphylococcus Aureus
Metabolic sensor governing bacterial virulence in PNAS PLUS Staphylococcus aureus Yue Dinga, Xing Liub, Feifei Chena, Hongxia Dia, Bin Xua, Lu Zhouc, Xin Dengd,e, Min Wuf, Cai-Guang Yangb,1, and Lefu Lana,1 aDepartment of Molecular Pharmacology and bChinese Academy of Sciences Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; cDepartment of Medicinal Chemistry, School of Pharmacy, Fudan University, Shanghai 201203, China; dDepartment of Chemistry and eInstitute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637; and fDepartment of Basic Sciences University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND 58203 Edited by Richard P. Novick, New York University School of Medicine, New York, NY, and approved October 14, 2014 (received for review June 13, 2014) An effective metabolism is essential to all living organisms, in- To survive and replicate efficiently in the host, S. aureus has cluding the important human pathogen Staphylococcus aureus.To developed exquisite mechanisms for scavenging nutrients and establish successful infection, S. aureus must scavenge nutrients adjusting its metabolism to maintain growth while also coping with and coordinate its metabolism for proliferation. Meanwhile, it also stress (6, 11). On the other hand, S. aureus produces a wide array must produce an array of virulence factors to interfere with host of virulence factors to evade host immune defenses and to derive defenses. However, the ways in which S. aureus ties its metabolic nutrition either parasitically or destructively from the host during state to its virulence regulation remain largely unknown. Here we infections (6). -
Principles of Validation of Diagnostic Assays for Infectious Diseases1
PRINCIPLES OF VALIDATION OF DIAGNOSTIC ASSAYS FOR INFECTIOUS DISEASES1 R. H. JACOBSON, New York State College of Veterinary Medicine, Cornell University, Ithaca, New York, USA Abstract PRINCIPLES OF VALIDATION OF DIAGNOSTIC ASSAYS FOR INFECTIOUS DISEASES. Assay validation requires a series of inter-related processes. Assay validation is an experimental process: reagents and protocols are optimized by experimentation to detect the analyte with accuracy and precision. Assay validation is a relative process: its diagnostic sensitivity and diagnostic specificity are calculated relative to test results obtained from reference animal populations of known infection/exposure status. Assay validation is a conditional process: classification of animals in the target population as infected or uninfected is conditional upon how well the reference animal population used to validate the assay represents the target population; accurate predictions of the infection status of animals from test results (PV+ and PV−) are conditional upon the estimated prevalence of disease/infection in the target population. Assay validation is an incremental process: confidence in the validity of an assay increases over time when use confirms that it is robust as demonstrated by accurate and precise results; the assay may also achieve increasing levels of validity as it is upgraded and extended by adding reference populations of known infection status. Assay validation is a continuous process: the assay remains valid only insofar as it continues to provide accurate and precise results as proven through statistical verification. Therefore, the work required for validation of diagnostic assays for infectious diseases does not end with a time-limited series of experiments based on a few reference samples rather, to assure valid test results from an assay requires constant vigilance and maintenance of the assay, along with reassessment of its performance characteristics for each unique population of animals to which it is applied. -
Reproducibility: Promoting Scientific Rigor and Transparency
Reproducibility: Promoting scientific rigor and transparency Roma Konecky, PhD Editorial Quality Advisor What does reproducibility mean? • Reproducibility is the ability to generate similar results each time an experiment is duplicated. • Data reproducibility enables us to validate experimental results. • Reproducibility is a key part of the scientific process; however, many scientific findings are not replicable. The Reproducibility Crisis • ~2010 as part of a growing awareness that many scientific studies are not replicable, the phrase “Reproducibility Crisis” was coined. • An initiative of the Center for Open Science conducted replications of 100 psychology experiments published in prominent journals. (Science, 349 (6251), 28 Aug 2015) - Out of 100 replication attempts, only 39 were successful. The Reproducibility Crisis • According to a poll of over 1,500 scientists, 70% had failed to reproduce at least one other scientist's experiment or their own. (Nature 533 (437), 26 May 2016) • Irreproducible research is a major concern because in valid claims: - slow scientific progress - waste time and resources - contribute to the public’s mistrust of science Factors contributing to Over 80% of respondents irreproducibility Nature | News Feature 25 May 2016 Underspecified methods Factors Data dredging/ Low statistical contributing to p-hacking power irreproducibility Technical Bias - omitting errors null results Weak experimental design Underspecified methods Factors Data dredging/ Low statistical contributing to p-hacking power irreproducibility Technical Bias - omitting errors null results Weak experimental design Underspecified methods • When experimental details are omitted, the procedure needed to reproduce a study isn’t clear. • Underspecified methods are like providing only part of a recipe. ? = Underspecified methods Underspecified methods Underspecified methods Underspecified methods • Like baking a loaf of bread, a “scientific recipe” should include all the details needed to reproduce the study. -
New NIH Guidelines to Improve Experimental Rigor and Reproducibility
New NIH guidelines to improve Experimental Rigor and Reproducibility Oswald Steward Professor, Department of Anatomy & Neurobiology Senior Associate Dean for Research University of California School of Medicine What I’ll tell you 1) Background: Increasing reports of failures to replicate 2) Previous actions by NIH to enhance rigor 3) New requirements for grants submitted after January 2016 4) What NIH will be looking for in new required sections 5) Some suggested best practices to meet new standards for scientific rigor • The Problem: Numerous recent reports of findings that can’t be replicated. Failure to replicate key results from preclinical cancer studies: 1) Prinz et al. (2011) Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery. Inconsistencies were found in 2/3 studies 2) Begley and Ellis (2012) Raise standards for preclinical cancer research. Nature. Only 6/53 “landmark” papers replicated 3) Begley (2013) Six red flags for suspect work. Nature. -Were experiments performed blinded? -Were basic experiments repeated? -Were all the results presented? -Were there positive and negative controls? -Were reagents validated? -Were statistical tests appropriate? • In ALS field, Scott et al., (2008) re-assessed 70 drugs that had been reported to increase lifespan in an animal model of ALS (SOD1-G93A knockout mice). • 221 separate studies were carried out over a five year period that involved over 18,000 mice. • Some drugs were already in clinical trials. • Outcome: There was no statistically significant positive effect for any of the 70 compounds tested. • Conclusion: Previously published effects were measurements of noise in the distribution of survival means as opposed to an actual drug effect. -
Rejecting Statistical Significance Tests: Defanging the Arguments
Rejecting Statistical Significance Tests: Defanging the Arguments D. G. Mayo Philosophy Department, Virginia Tech, 235 Major Williams Hall, Blacksburg VA 24060 Abstract I critically analyze three groups of arguments for rejecting statistical significance tests (don’t say ‘significance’, don’t use P-value thresholds), as espoused in the 2019 Editorial of The American Statistician (Wasserstein, Schirm and Lazar 2019). The strongest argument supposes that banning P-value thresholds would diminish P-hacking and data dredging. I argue that it is the opposite. In a world without thresholds, it would be harder to hold accountable those who fail to meet a predesignated threshold by dint of data dredging. Forgoing predesignated thresholds obstructs error control. If an account cannot say about any outcomes that they will not count as evidence for a claim—if all thresholds are abandoned—then there is no a test of that claim. Giving up on tests means forgoing statistical falsification. The second group of arguments constitutes a series of strawperson fallacies in which statistical significance tests are too readily identified with classic abuses of tests. The logical principle of charity is violated. The third group rests on implicit arguments. The first in this group presupposes, without argument, a different philosophy of statistics from the one underlying statistical significance tests; the second group—appeals to popularity and fear—only exacerbate the ‘perverse’ incentives underlying today’s replication crisis. Key Words: Fisher, Neyman and Pearson, replication crisis, statistical significance tests, strawperson fallacy, psychological appeals, 2016 ASA Statement on P-values 1. Introduction and Background Today’s crisis of replication gives a new urgency to critically appraising proposed statistical reforms intended to ameliorate the situation. -
Studies of Staphylococcal Infections. I. Virulence of Staphy- Lococci and Characteristics of Infections in Embryonated Eggs * WILLIAM R
Journal of Clinical Investigation Vol. 43, No. 11, 1964 Studies of Staphylococcal Infections. I. Virulence of Staphy- lococci and Characteristics of Infections in Embryonated Eggs * WILLIAM R. MCCABE t (From the Research Laboratory, West Side Veterans Administration Hospital, and the Department of Medicine, Research and Educational Hospitals, University of Illinois College of Medicine, Chicago, Ill.) Many of the determinants of the pathogenesis niques still require relatively large numbers of and course of staphylococcal infections remain staphylococci to produce infection (19). Fatal imprecisely defined (1, 2) despite their increas- systemic infections have been equally difficult to ing importance (3-10). Experimental infections produce in animals and have necessitated the in- in suitable laboratory animals have been of con- jection of 107 to 109 bacteria (20-23). A few siderable assistance in clarifying the role of host strains of staphylococci have been found that are defense mechanisms and specific bacterial virulence capable of producing lethal systemic infections factors with a variety of other infectious agents. with inocula of from 102 to 103 bacteria (24) and A sensitive experimental model would be of value have excited considerable interest (25-27). The in defining the importance of these factors in virulence of these strains apparently results from staphylococcal infections, but both humans and an unusual antigenic variation (27, 28) which, the usual laboratory animals are relatively re- despite its interest, is of doubtful significance in sistant. Extremely large numbers of staphylo- human staphylococcal infection, since such strains cocci are required to produce either local or sys- have been isolated only rarely from clinical in- temic infections experimentally. -
On the Reproducibility of Meta-Analyses.Docx
On the Reproducibility of Meta-Analyses 1 1 RUNNING HEAD: ON THE REPRODUCIBILITY OF META-ANALYSES 2 3 4 On the Reproducibility of Meta-Analyses: Six Practical Recommendations 5 6 Daniël Lakens 7 Eindhoven University of Technology 8 Joe Hilgard 9 University of Missouri - Columbia 10 Janneke Staaks 11 University of Amsterdam 12 13 In Press, BMC Psychology 14 15 Word Count: 6371 16 17 Keywords: Meta-Analysis; Reproducibility; Open Science; Reporting Guidelines 18 19 Author Note: Correspondence can be addressed to Daniël Lakens, Human Technology 20 Interaction Group, IPO 1.33, PO Box 513, 5600MB Eindhoven, The Netherlands. E-mail: 21 [email protected]. We would like to thank Bianca Kramer for feedback on a previous draft 22 of this manuscript. 23 24 Competing Interests: The authors declared that they had no competing interest with respect 25 to their authorship or the publication of this article. 26 27 Author Contributions: All authors contributed writing this manuscript. DL drafted the 28 manuscript, DL, JH, and JS revised the manuscript. All authors read and approved the final 29 manuscript. 30 31 Email Addresses: D. Lakens: [email protected] 32 Joe Hilgard: [email protected] 33 Janneke Staaks: [email protected] On the Reproducibility of Meta-Analyses 2 34 Abstract 35 Background 36 Meta-analyses play an important role in cumulative science by combining information 37 across multiple studies and attempting to provide effect size estimates corrected for 38 publication bias. Research on the reproducibility of meta-analyses reveals that errors are 39 common, and the percentage of effect size calculations that cannot be reproduced is much 40 higher than is desirable. -
10 Things to Know About Reproducibility and Replicability
10 Things to Know About Reproducibility and Replicability One of the pathways by which the scientific community confirms the validity of a new scientific discovery is by repeating the research that produced it. When a scientific effort fails to independently confirm the computations or results of a previous study, some argue that such an observed inconsistency can be an important precursor to new discovery while others fear it may be a symptom of a lack of rigor in science. When a newly reported scientific study has far-reaching implications for science or a major, potential impact on the public, the question of its reliability takes on heightened importance. 1 The terms reproducibility and replicability take on 6 Not all studies can be replicated. While scientists are a range of meanings in contemporary usage. The report able to test for replicability of most studies, it is impossible to do distinguishes and defines the terms as follows: Reproducibility so for studies of ephemeral phenomena. means obtaining consistent results using the same input data, computational steps, methods, and conditions of analysis; it is synonymous with computational reproducibility. Replicability 7 One type of scientific research tool, statistical means obtaining consistent results across studies aimed at inference, has an outsized role in replicability discussions answering the same scientific question, each of which has due to the frequent misuse of statistics and the use of a obtained its own data. p-value threshold for determining “statistical significance.” Biases in published research can occur due to the excess reliance on and misunderstanding of statistical significance. 2 Reproducibility and replicability matter. -
Lab Services Accuracy and Reproducibility
Accuracy and Reproducibility ABOUT The Laboratory Services Committee of the American Thyroid Association® (ATA) conducted a survey of ATA® members to identify areas of member interest for education in pathology and laboratory medicine. In response to the results of the survey, the Lab Service Committee developed a series of educational materials to share with the ATA® membership. The topics below were ranked as high educational priorities amongst the membership. FIGURE 1A: FIGURE 1B: Single perfectly accurate result. Three inaccurate results that ACCURACY AND PRECISION are accurate on average WHAT ARE THE INDICATORS OF TEST RELIABILITY? The most commonly used descriptors of analytical performance for laboratories are accuracy and precision. Accuracy is the degree to which the test results approximate the value that would be obtained by a reference method, and precision is the reproducibility of test results among measurements, days, and operating conditions. The metrics of analytical performance are distinct from the metrics of clinical performance, which are focused on sensitivity, specificity, and predictive values. FIGURE 1C: Three perfectly precise but ACCURACY (TRUENESS) inaccurate results. A test method is said to be accurate when the values it produces are close to the values that are obtained from a reference method. Thus, PRECISION (REPEATABILITY) the concept of accuracy is highly dependent on the correct choice of The description of accuracy above demonstrates why this metric alone a reference method as all judgments of accuracy treat the reference is insufficient to characterize how a test performs. Three arrows shown method as truth; deviations from the reference method are therefore in (Figure 1B) are all very inaccurate, but they average to be accurate.