Reproducibility from Discovery to Clinical Research: What Could Be the Way Out?


Bruno Boulanger, November 2019
© PharmaLex

Agenda
- The reproducibility crisis, i.e. the replicability crisis
- The p-value crisis
- How to make a decision? What is the question?
- The Bayesian learning process in pharmaceutical R&D
- The posterior predictive distribution
- The power, the Bayesian power and the assurance
- The missing component: the elephant in the room?
- Take-away message

The replicability crisis: the beginning (Nature, 2014)

As the American Statistical Association officially reminded us in a March 2016 press release, there are six key points to keep in mind about p-values:
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

Reproducibility is now a public concern: an emergency meeting, the ASA Symposium, was held in October 2017, followed by a special issue of The American Statistician in 2019. Maybe the key issue is the training in statistics (S. Goodman, The American Statistician, 2019); see also Nature (March 2019) and the National Academies report (May 2019).

To "p" or not to "p": what is the question? The objective is to decide whether my product is effective. Two different questions could drive that decision:
A. What is the probability of obtaining the observed data, if the product is not effective?
B. What is the probability that the product is effective, given the observed data?
Two different ways to make a decision:

A. Pr(observed data | product is not effective)
- Better known as the p-value concept
- Used in the null hypothesis test (or decision)
- This is the likelihood of the data assuming a hypothetical explanation (e.g. the "null hypothesis")
- Classical statistics perspective (frequentist)

B. Pr(product effective | observed data)
- Bayesian perspective
- It is the probability of efficacy given the data

Nothing has changed in 20 years (Nature, 2017, echoing the same debate from 1999).

What is the question? A new patient: cancer? A diagnostic test returns a positive result. How should it be interpreted? What is the probability that the patient has cancer given the observed positive result?

Context: a disease D with a low prevalence (1% of the population is diseased, D+) and major consequences if the disease is not detected. This is a problem of decision making. The accuracy of a diagnostic test is assessed as follows:
- Sensitivity: Pr(positive result | cancer)
- Specificity: Pr(negative result | no cancer)

In practice: given that the diagnostic test result is positive, what is the probability that you truly have cancer? Pr(cancer | positive result) = ?

Example: with prevalence = 1%, sensitivity = 86% and specificity = 88%,
Pr(cancer | positive result) = 1 / (12 + 1) = 0.077.
How can that be so low? The small proportion of errors among the large majority of women who do not have breast cancer swamps the large proportion of correct diagnoses among the few women who have it.

Agresti, A. (2007). An Introduction to Categorical Data Analysis. Wiley, 2nd ed.
Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. R. Soc. Open Sci. 1(3): 140216.

The clinical trial analogy: is the drug effective? Given the clinical trial data, Pr(drug effective | data) = ?
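The diagnostic-test arithmetic above is just Bayes' rule, and can be checked with a short sketch (the prevalence, sensitivity and specificity are the slide's numbers; the function name is ours):

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value, Pr(disease | positive result), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Slide's example: prevalence 1%, sensitivity 86%, specificity 88%.
exact = ppv(0.01, 0.86, 0.88)   # about 0.067
approx = 1 / (12 + 1)           # the slide's whole-numbers-per-100 tally, about 0.077
print(f"exact PPV = {exact:.3f}, slide's 1/13 tally = {approx:.3f}")
```

The exact value (about 0.067) and the slide's 1/13 = 0.077 differ slightly only because the slide rounds to whole women per 100 screened; both make the same point that a positive result is still far more likely to be a false alarm than a true case.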
The answer depends largely on the prior probability that there is a real effect.

"If you use p = 0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time." (Colquhoun, D. (2014). R. Soc. Open Sci. 1(3): 140216.)

With 80 true positives and 45 false positives among the significant results, Pr(real effect | p < 0.05) = 80 / (80 + 45) = 0.64.

"If you use p = 0.05" ... when you are in early discovery, with prior probability P(real) = 0.01: out of 1000 tests, 10 have a real effect (8 true positives, 2 false negatives) and 990 do not (940 true negatives, 50 false positives), so Pr(real effect | p < 0.05) = 8 / (8 + 50) = 0.14 !!

... but if the prior is good, P(real) = 0.7: out of 1000 tests, 700 have a real effect (560 true positives, 140 false negatives) and 300 do not (285 true negatives, 15 false positives), so Pr(real effect | p < 0.05) = 560 / (560 + 15) = 0.97.

[Figure: false "discovery" rate for p < 0.05 and power = 0.8, as a function of the prior probability (0 to 1). A second panel locates discovery research (low prior) and clinical research (higher prior) on the curve; a Tufts report estimates that 11.8% of products entering clinical development reach approval.]

The value of the Bayesian approach in drug development: Bayesian inference is the mechanism used to update the state of knowledge. The prior information p(θ) is combined with the data information p(data | θ) to give the posterior information p(θ | data); the process to arrive at a posterior distribution makes use of Bayes' formula.

The coin-flipping experiment: starting from a prior with Pr(θ < 0.5) ≈ 0.75 and observing 43 heads in 100 flips, the posterior gives Pr(θ < 0.5) = 0.94.

[Figure: posterior density of the mean differences in Delta E change from baseline, ranking the formulas against Rucinol (White Perfect vs Vehicle).]
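The three headline numbers above (0.64, 0.14 and 0.97) all come from one formula, Pr(real effect | p < 0.05) = power·prior / (power·prior + α·(1 − prior)). A minimal sketch, assuming α = 0.05 and power = 0.8 as on the slides (the function name is ours):

```python
def prob_real_given_significant(prior, power=0.8, alpha=0.05):
    """Pr(real effect | p < alpha): true positives over all significant results."""
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Priors: 0.10 matches the 80-vs-45 split above (our inference, not stated on
# the slide), 0.01 is early discovery, 0.70 is the "good prior" case.
for prior in (0.10, 0.01, 0.70):
    p = prob_real_given_significant(prior)
    print(f"P(real) = {prior:.2f} -> Pr(real effect | p < 0.05) = {p:.2f}")
```

Running it reproduces the slides' 0.64, 0.14 and 0.97 to two decimals.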
[Figure: posterior density of the mean difference in the clinical end-point (smaller is better), comparing White Perfect CD 7414 1% vs Vehicle and vs White Perfect. Bayesian model: change = base + treatment, with random effects for subject, study and study*formula; flat priors.]

Decision rules are based on the posterior probability.

Drug development is a learning process: discovery, pre-clinical, Phase I, Phase II, Phase III. We now have the computing power to apply Bayesian statistics.

Regulatory point of view (2010 & 2016): guidance for medical device clinical trials ("Guidance for Industry and FDA Staff: Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials", document issued on February 5, 2010). See also the FDA CID initiative and the use of historical controls.

Part II: Power, Bayesian power and Assurance

Power vs assurance. Assumptions: independent-samples t-test (H0: μ1 = μ2 vs H1: μ1 ≠ μ2).

Frequentist approach (power): a power calculation takes a particular value of the effect within the range of possible values given by H1 and poses the question: if this particular value happens to obtain, what is the probability of coming to the correct conclusion that there is a difference? Assumptions: μ1 = 100, μ2 = 120, σ1 = σ2 = 39 (very strong priors!).

Bayesian approach (assurance): in order to reflect the uncertainty, a large number of effect sizes, i.e. (μ1 − μ2)/σ_pooled, are generated using the prior distributions.
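The frequentist power under the slide's assumptions (μ1 = 100, μ2 = 120, σ1 = σ2 = 39, a standardized effect of about 0.51) can be sketched with a normal approximation to the two-sample test; the per-arm sample sizes tried below are our illustrative choices, not the slide's:

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test, n per arm."""
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_effect = delta / (sigma * sqrt(2 / n))
    return nd.cdf(z_effect - z_alpha)  # the far-tail term is negligible here

# Slide's assumptions: mu1 = 100, mu2 = 120 (delta = 20), sigma = 39.
for n in (30, 60, 90):
    print(f"n = {n:3d} per arm -> power = {power_two_sample(20, 39, n):.2f}")
```

Around 60 subjects per arm give roughly 80% power, but only if the true difference really is 20; the assurance discussion replaces that single assumed value with a prior distribution.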
A power curve is obtained for each effect size, and the expected (weighted by the prior beliefs) power curve is calculated.

Note 1: given those priors, using the frequentist power approach, the probability of success of the trial is 50%.
Note 2: about 50% of Phase III trials fail because of lack of efficacy (S. Wang, FDA, 2008).

(Frequentist) power: let R denote the rejection of the null hypothesis; the power, assuming parameter values θ = θ*, is
π(θ*, n) := Pr(R | θ*, n).
It is a conditional probability: conditional on the parameters of the model, e.g. the "true effect size" in a frequentist test, and on the sample size.

Assurance: "Assurance is the unconditional probability that a trial will lead to a specific outcome."
γ(n) := ∫ π(θ, n) f(θ) dθ = Pr(R) = E_θ[π(θ, n)].
It is thus also a function of n (and possibly of other nuisance parameters). The assurance is the expected power over all plausible parameter values, weighted by the prior f(θ).
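The assurance integral γ(n) = ∫ π(θ, n) f(θ) dθ is easy to approximate by Monte Carlo: draw the true mean difference from the prior, compute the conditional power at each draw, and average. A minimal sketch; the prior N(20, 15²) on the difference is purely our illustrative assumption (the slide does not state one):

```python
import random
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def power(delta, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test, n per arm."""
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(abs(delta) / (sigma * sqrt(2 / n)) - z_alpha)

def assurance(n, sigma=39, n_draws=100_000, seed=1):
    """gamma(n) = E_theta[power(theta, n)] under an assumed prior theta ~ N(20, 15^2)."""
    rng = random.Random(seed)
    total = sum(power(rng.gauss(20, 15), sigma, n) for _ in range(n_draws))
    return total / n_draws

n = 60  # per arm
print(f"conditional power at delta = 20: {power(20, 39, n):.2f}")
print(f"assurance (expected power over the prior): {assurance(n):.2f}")
```

With this prior the assurance lands well below the 80% conditional power: averaging over all plausible effect sizes, including small ones, pulls the unconditional probability of success down, which is exactly the gap between power and assurance that the slides emphasize.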