UC San Diego Electronic Theses and Dissertations

Total Pages: 16

File Type: pdf, Size: 1020 KB

Title: Multimodal Evidence
Permalink: https://escholarship.org/uc/item/2x54p2rw
Author: Stegenga, Jacob
Publication Date: 2011
Peer reviewed | Thesis/dissertation
eScholarship.org, powered by the California Digital Library, University of California

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Multimodal Evidence

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Philosophy

by

Jacob Stegenga

Committee in charge:

Professor Nancy Cartwright, Chair
Professor William Bechtel
Professor Craig Callender
Professor Naomi Oreskes
Professor Robert Westman

2011

© Jacob Stegenga, 2011. All rights reserved.

The Dissertation of Jacob Stegenga is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego
2011

For Skye, who teaches that there is much about which to argue;
For my Mother, whose arguments are sparing, and kind;
For Bob, who argues well;
For Alexa, who knows that one need not always argue.

It is the profession of philosophers to question platitudes. A dangerous profession, since philosophers are more easily discredited than platitudes.
David Lewis

TABLE OF CONTENTS

Signature Page
Dedication
Epigraph
Table of Contents
List of Tables
List of Terminology
List of Abbreviations and Symbols
Acknowledgements
Vita
Abstract
Chapter 1: Introduction
Chapter 2: Varieties of Evidential Experience
Chapter 3: Underdetermination of Evidence by Theory
Chapter 4: Independent Evidence
Chapter 5: Robustness, Discordance, and Relevance
Chapter 6: Amalgamating Multimodal Evidence
Chapter 7: Is Meta-Analysis the Platinum Standard of Evidence?
Chapter 8: An Impossibility Theorem for Amalgamating Evidence
References

LIST OF TABLES

Table 1. Features of Evidence
Table 2. Likelihoods of Evidence in La Jolla Murder Mystery
Table 3. Binary Outcomes in an Experiment and Control Group
Table 4. Analogy Between Amalgamating Preferences and Amalgamating Evidence
Table 5. Profiles Constructed in Proof of Impossibility Theorem for Amalgamating Evidence

LIST OF TERMINOLOGY

In this dissertation my use of some words departs slightly from standard philosophical or scientific usage, and for some concepts I have had to invent entirely new words or phrases. The following glossary provides informal definitions of such terms; for terms that have a corresponding formal definition, I indicate its location in the dissertation. Technical terms used in standard ways (such as 'meta-analysis') are defined in the body of the dissertation.

Amalgamation
The combination of multimodal evidence.

Amalgamation Method
A method to amalgamate multimodal evidence into a measure of overall support for a hypothesis.

Amalgamation Function
A type of amalgamation method, the inputs and output of which are limited to ordinal rankings of hypotheses (formal definition on pages 186-187).

Concordance
Consistency or agreement of evidence from multiple modes.

Conditional Probabilistic Independence
Probabilistic independence between multiple modes of evidence, conditional on a hypothesis (formal definition on page 96).

Confirmation Ordering
A confirmation relation, denoted by ≽i, where i is a mode (the confirmation ordering relation is indexed to the mode of evidence), such that H1 ≽i H2 means "evidence from mode i confirmation orders H1 equally to or above H2" (formal definition on pages 183-184).

Constraint
A desideratum of amalgamation methods which stipulates that AMs should constrain intersubjective assessment of hypotheses.

Dictatorship
A mode of evidence which always trumps other modes in an amalgamation function. One of the desiderata for an amalgamation function is 'non-dictatorship', which stipulates that no mode should be a dictator (formal definition on page 208).

Discordance
Inconsistency or disagreement of evidence from multiple modes.

Dyssynergistic Evidence
Multimodal evidence which confirms the negation of a hypothesis that is confirmed by evidence from each of the individual modes constituting the multimodal evidence (formal definition on page 89).

Independence of Irrelevant Alternatives
A desideratum for amalgamation functions which stipulates that an amalgamation function's ranking of two hypotheses relative to each other should depend only on how the individual modes rank those two hypotheses relative to each other, and not on how the modes rank them relative to other hypotheses (formal definition on pages 207-208).

Mode
A method of producing evidence, or a particular way of learning about the world: a technique, apparatus, or experiment. Modes are types of which there can be a plurality of tokens. A full account of modes requires a criterion of individuation of modes, which is the subject of Chapter 4.

Multimodal Evidence
The set of evidence produced by multiple independent modes relevant to a given hypothesis (formal notation on page 183).

Ontic Independence
The form of independence between multiple modes of evidence based on different materials constituting the modes, or on different assumptions or theories required by the modes.

Robustness
The state in which a hypothesis is supported by concordant multimodal evidence.

Unanimity
A desideratum for amalgamation functions which stipulates that if all modes confirm hypothesis H1 over H2, then the amalgamation function must do the same (formal definition on page 207).

Unrestricted Domain
A desideratum for amalgamation functions which stipulates that the amalgamation function can accept as input all possible confirmation orderings.

LIST OF ABBREVIATIONS AND SYMBOLS

AMM1944: Avery, McLeod, and McCarty (1944)
AEC: Atomic Energy Commission
AF: Amalgamation Function
AM: Amalgamation Method
B: Believability
BT: Bayes' Theorem
C: Concordance
CPI: Conditional Probabilistic Independence
D: Non-Dictatorship
DE: Dyssynergistic Evidence
DNA: Deoxyribonucleic Acid
e: Evidence
H: Hypothesis
I: Independence of Irrelevant Alternatives
JC: Jeffrey Conditionalization
NAS: National Academy of Sciences
NRS: Non-randomized Study
OI: Ontic Independence
P: Patterns
p(x): Probability of x
p(x|y): Probability of x conditional on y
Q: Quality
R: Relevance
RCT: Randomized Controlled Trial
RD: Risk Difference
RR: Risk Ratio
SC: Strict Conditionalization
T: Transparency
TS: Transforming Substance
U: Unanimity
UET: Underdetermination of Evidence by Theory
UV: Ultraviolet

ACKNOWLEDGEMENTS

The time and critical attention which Nancy Cartwright has dedicated to this dissertation is incredible; she has had an enormous influence on the material presented here. Nancy's influence goes far beyond her copious written feedback on multiple drafts of these chapters. Our rambles through the English countryside and hikes through California deserts were ideal ways to work through material in the following pages. Her own work is my exemplar, as is her work ethic; I am grateful for Nancy's supervision.

Graduate students are often asked who is on their supervisory committee. My response has always felt like a boast.
Bill Bechtel's leadership of the Philosophy of Biology Research Group allowed discussions which helped to develop nascent ideas into chapters. Craig Callender encouraged me to pursue the eccentric topics that constitute this dissertation, criticized my work when needed, and provided guidance on broader matters of graduate study. I am grateful to Bill and Craig for being proxy advisors.

During early discussions, and in her reading of my prospectus, Naomi Oreskes posed questions which have occupied me for two years. Bob Westman has provided thoughtful guidance since my first weeks at UCSD, when I took his seminar in history and philosophy of science. Moreover, Bob has facilitated the development of my dissertation, and my ability to share it in a dozen cities, via his directorship of the Science Studies Program. I know too well that this dissertation is not what Bill, Craig, Naomi, or Bob had hoped for; I wish that it could suffice as its own apology.

Several other faculty members in the philosophy department provided feedback on aspects of my dissertation, including Don Rutherford, Sam Rickless, Christian Wüthrich, Rick Grush, David Brink, and Clinton Tolley.

My fellow graduate students at UCSD have been heavily involved with this dissertation. Tarun Menon has encouraged and challenged me from the start, and has read most of this dissertation and spent many hours discussing it with me. Tarun is also a co-author of Chapter 4. I am fortunate to have Tarun as a colleague and friend. Eric Martin's paper on combining evidence was an early inspiration. Long bike rides through San Diego County with Charlie Kurth were a great way to discuss philosophy. Daniel Schwartz, Ioan Muntean, Cole Macke, Sindhuja Bhakthavatsalam, Nat Jacobs, Marta Halina, Matt Brown, and Joyce Havstad have read parts of this dissertation and provided critical feedback. My fellow students engaged this work when it was in early, painfully inchoate stages, and so their fortitude is admirable.