Tilburg University

“statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses
Nuijten, Michèle; Polanin, Joshua

Published in: Research Synthesis Methods

DOI: 10.1002/jrsm.1408

Publication date: 2020

Document Version: Publisher's PDF, also known as Version of Record

Link to publication in Tilburg University Research Portal

Citation for published version (APA): Nuijten, M., & Polanin, J. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574-579. https://doi.org/10.1002/jrsm.1408


Received: 13 August 2019 | Revised: 24 January 2020 | Accepted: 26 March 2020

COMPUTATIONAL TOOLS AND METHODS

“statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses

Michèle B. Nuijten1 | Joshua R. Polanin2

1 Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
2 Research & Evaluation, American Institutes for Research, Washington, DC

Correspondence
Michèle B. Nuijten, Department of Methodology and Statistics, Tilburg University, Warandelaan 2, 5037 AB Tilburg, The Netherlands.
Email: [email protected]

Funding information
Campbell Collaboration, Grant/Award Number: CMG2.02

Abstract
We present the R package and web app statcheck to automatically detect statistical reporting inconsistencies in primary studies and meta-analyses. Previous research has shown a high prevalence of reported p-values that are inconsistent, meaning that a recalculated p-value, based on the reported test statistic and degrees of freedom, does not match the author-reported p-value. Such inconsistencies affect the reproducibility and evidential value of published findings. The tool statcheck can help researchers to identify statistical inconsistencies so that they may correct them. In this paper, we provide an overview of the prevalence and consequences of statistical reporting inconsistencies. We also discuss the tool statcheck in more detail and give an example of how it can be used in a meta-analysis. We end with some recommendations concerning the use of statcheck in meta-analyses and make a case for better reporting standards for statistical results.

KEYWORDS meta-analysis, reporting standards, reproducibility, statcheck, statistical error

1 | INTRODUCTION

Researchers in the health and social sciences continue to draw conclusions based solely on Null Hypothesis Significance Tests (NHST).1-4 Primary study authors use these tests often, yet meta-analysts use them as well: NHST results in primary studies can also be used to calculate effect sizes to include in meta-analyses, and a recent review of meta-analyses published in the social sciences5 revealed that the average review conducted nearly 60 NHSTs. NHSTs can therefore lead to policy and practice decisions, and as such, their accuracy is paramount.

Extant evidence suggests that statistical reporting errors are widespread. A recent review of significance testing in primary studies found that one in eight primary studies published in eight high-profile journals had “grossly inconsistent p-values that may have affected the statistical conclusion”.6 The authors applied the phrase “grossly inconsistent” to represent cases in which conclusions of the significance test would change based on a recalculation of the p-value. For example, a study's author said a p-value was <.05, but the test statistic and degrees of freedom indicated the p-value was actually >.05, or vice versa. An alarmingly high number of impactful results of statistical significance tests were inconsistent and potentially misleadingly inaccurate, too: the results indicated that gross inconsistencies favored statistically significant results.

Detecting statistical reporting inconsistencies is time-consuming and, ironically, error-prone work. Because of that, Epskamp and Nuijten7 developed the R package statcheck: an automated tool to extract NHST results from articles and recalculate p-values.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.

Res Syn Meth. 2020;1–6. wileyonlinelibrary.com/journal/jrsm

Recently, Polanin and Nuijten8 extended statcheck's functionality to include tests often used in meta-analyses. In this paper, we elaborate on how statcheck can be useful in the context of meta-analysis. We give a brief overview of the prevalence and consequences of statistical reporting inconsistencies based on a review of 402 meta-analyses. We also discuss the tool statcheck in more detail and give an example of how it can be used in a meta-analysis. We end with some recommendations concerning the use of statcheck in meta-analyses and make a case for better reporting standards for statistical results.

Highlights
• Reporting inconsistencies, where the reported p-value does not match the degrees of freedom and test statistic, are widespread.
• The R package and web app statcheck can automatically detect statistical reporting inconsistencies in meta-analyses.
• If meta-analysts adhere to APA reporting style, statcheck provides a quick and easy tool to detect reporting inconsistencies and increase reproducibility.

2 | WHY SHOULD RESEARCH SYNTHESISTS CARE ABOUT STATISTICAL REPORTING INCONSISTENCIES?

We focus on a specific type of statistical error: statistical reporting inconsistencies, where the reported p-value does not match the accompanying test statistic and degrees of freedom. Statistical reporting inconsistencies are harmful for several reasons. First, these inconsistencies can lead to wrong substantive conclusions when the reported p-value is significant whereas the recalculated p-value is not, or vice versa. Second, statistical reporting inconsistencies can also be symptoms of deeper, underlying problems. Reporting inconsistencies, for example, could signal human error, sloppiness,9 or questionable research practices.10 Third, regardless of their cause, statistical inconsistencies affect the overall reproducibility of a paper: the ability to obtain the same numbers with the same data and analyses. Results that appear erroneous and that cannot be reproduced by reanalysis are unreliable and, worse, might be considered invalid.11

Statistical reporting inconsistencies can also affect the quality of meta-analyses in various ways. From the perspective of the primary studies included, reported NHST results can be used to calculate effect sizes to include in a meta-analysis: reported results of t tests or F tests can be converted to Cohen's d. However, if the results of these NHSTs are inconsistent, it is possible that the test statistics are incorrect (e.g., a typo in a t-value). If that erroneous test statistic is then used to calculate the effect size to include in the meta-analysis, the eventual meta-analytic effect size will also contain error.12 Furthermore, from the perspective of the meta-analytic results, the reported NHSTs of meta-analytic averages, heterogeneity tests, and moderator analyses remain widely reported and widely used when drawing conclusions. As a result, the results of these statistical tests require additional scrutiny.

3 | INTRODUCING “statcheck” AS A SOLUTION FOR META-ANALYSES

To detect statistical reporting inconsistencies, Epskamp and Nuijten7 developed the R package statcheck, with an accompanying web app at https://statcheck.io.13 statcheck is a free and easy-to-use tool that automatically extracts statistical results from articles and recomputes p-values to check their internal consistency. statcheck was developed to check results in primary studies, and we recently extended its functionality to meta-analyses.8

3.1 | How does statcheck work?

The algorithm behind statcheck consists of four steps. First, statcheck converts an article (or a folder of articles) from PDF or HTML to plain text. Second, using regular expressions, statcheck searches for specific combinations of letters, numbers, and symbols that signal the presence of an NHST result. Polanin and Nuijten8 updated statcheck to recognize Q tests in addition to the original recognition of t, F, χ2, Z, and correlations that are reported in the full text according to APA style (e.g., t(28) = 2.14, p = .04; see Ref. 14). Third, statcheck uses the reported test statistic and degrees of freedom to recalculate the p-value. Fourth, it compares the reported and computed p-value to see if they match. If they do not match, the result is flagged as an “inconsistency.” If the reported p-value is significant and the computed p-value is not, or vice versa, the result is flagged as a “gross inconsistency.” By default, statcheck assumes an α of .05, but this can be manually adjusted.

In flagging inconsistencies (or gross inconsistencies), statcheck takes rounding into account. A test statistic reported as t = 2.5, for example, could correspond to actual t-values ranging from 2.45 to 2.54. statcheck will consider all p-values as consistent if they belong to that range of possible test statistics. statcheck can also take one-tailed testing into account. If statcheck finds the word “one-tailed,” “one-sided,” or “directional” in the full text, and the reported p-value would have been correct if it belonged to a one-tailed test, statcheck flags the result as consistent.

3.2 | statcheck's accuracy and limitations

statcheck is specifically designed to recognize and check statistics reported in APA style in the full text. This means that statcheck will not recognize statistics reported with deviations from APA style. Furthermore, statcheck will often not recognize statistics reported in tables, because statistics in tables are often not fully reported (e.g., the degrees of freedom for the entire table are given in the table caption, rather than next to each test statistic and p-value).

statcheck can detect statistics in both PDF and HTML files. However, the conversion of PDF to plain text is less reliable than HTML to plain text. This has to do with the wide variety of typesetting and text encoding in different journals. We therefore recommend using HTML files, where possible.

In flagging (gross) inconsistencies, statcheck's accuracy is high. In a previous study,14 statcheck's performance was compared with manual coding, and it was concluded that statcheck's sensitivity (true positive rate) and specificity (true negative rate) were high: between 85.3% and 100%, and between 96.0% and 100%, respectively, depending on the assumptions and settings. The overall accuracy of statcheck ranged from 96.2% to 99.9% (for details, see Ref. 14).

It is important to note that statistical inconsistencies can arise when some (but not all) of the elements of a reported result are adjusted for multiple testing, post hoc testing, or possible violations of assumptions. For example, to correct for multiple testing, authors often multiply the p-value by the number of tests performed (a procedure tantamount to a Bonferroni correction). However, such a multiplied p-value is then no longer consistent with the original, uncorrected test statistic and degrees of freedom. Similar inconsistencies can arise when authors adjust for violations of the sphericity assumption by reporting corrected degrees of freedom in combination with the uncorrected test statistic and p-value. statcheck will flag such cases as inconsistencies. To avoid inconsistencies due to statistical corrections, we recommend that authors report the fully adjusted result (i.e., the corrected degrees of freedom and the accompanying corrected test statistic and p-value), or, in the case of a Bonferroni correction, divide their α by the number of tests performed, instead of multiplying the p-value.

3.3 | Using statcheck in meta-analyses

NHST results are ubiquitous in meta-analyses.5 It is imaginable that the high prevalence of statistical reporting inconsistencies in primary studies also translates to meta-analyses. To test this empirically, we adapted statcheck to also pick up NHST results in meta-analyses.8

The types of statistical significance test that occur most in meta-analyses are tests of the overall effect size, tests of homogeneity and heterogeneity, subgroup analyses, and meta-regressions. In most cases, the test statistics belonging to these analyses are Z, χ2, t, and F, which statcheck could theoretically already extract. One exception is the Q test for heterogeneity. Even though the Q test follows a χ2-distribution, previous versions of statcheck would not recognize it if it was reported with the statistic Q. To solve this, we adapted statcheck to recognize Q tests as well. statcheck recognizes the following types of Q tests: identifying heterogeneity (Q omnibus), and explaining heterogeneity (Qwithin or Qw, and Qbetween or Qb).

After updating statcheck, we used it to analyze 402 meta-analyses published in the social sciences. Our sample derived from three locations used in previous meta-reviews: (1) Campbell Collaboration reviews published on or before May 2017 (n = 135) and used in Polanin and Nuijten8; (2) reviews published in the Review of Educational Research or Psychological Bulletin on or before May 2013 and used in Polanin and Pigott5 (n = 137); and (3) reviews on intelligence and IQ, found by searching the ISI Web of Knowledge and published on or before August 2014, used in Nuijten and colleagues (2018)15 (n = 130). The results of using statcheck on this sample revealed that, of the 87 meta-analyses with NHST results reported in APA style in the full text, 39.1% contained at least one statistical inconsistency and 8% contained at least one gross inconsistency where the statistical conclusion may have changed. Previous analyses conducted on primary studies6 found a greater prevalence of inconsistencies (50%) and gross inconsistencies (13%); however, the prevalence of inconsistencies and gross inconsistencies in our sample remains concerning. The low prevalence of APA-reported statistics is also potentially problematic, because it seems to signal a lack of any formalized or consistent reporting style. See Polanin and Nuijten8 for a full explanation of the methods and results.

3.4 | How to use statcheck in R or in a browser

statcheck can be used as an R package7 or as a web app at https://statcheck.io.13 To use the statcheck R package, you first need to download a program called Xpdf, which converts PDF files into plain text. Xpdf is free and can be downloaded from http://www.xpdfreader.com/download.html. The binaries of this program need to be added to the system path. For detailed instructions on how to do this, see the statcheck manual at https://rpubs.com/michelenuijten/statcheckmanual.

After Xpdf is installed, statcheck can be installed from CRAN and loaded in R as follows:

install.packages(“statcheck”)
library(statcheck)

statcheck can be used on a string of text, on a PDF or HTML file, or on an entire folder of PDF and/or HTML files as follows:

# check a string of text
statcheck(“Qb(1) = 3.78, p < .05”)

# check a PDF or HTML article
checkPDF(“C:/MyDocuments/Research/Paper1.pdf”)
checkHTML(“C:/MyDocuments/Research/Paper1.html”)

# check all PDF and HTML articles in a directory
checkdir(“C:/MyDocuments/Research”)

All the functions above will print the same type of output to the console: a data frame in which each row represents an extracted statistic. The data frame contains the extracted statistics, the recomputed p-value, whether the result is a (gross) inconsistency or not, and some additional variables. Figure 1 shows an example of the statcheck output for an article called “Paper1,” in which statcheck detected four hypothesis tests. In addition to the base analyses, the user can specify several options. It is possible, for example, to be more or less stringent with what statcheck will count as an inconsistency by accounting for one-tailed testing, or to assume a different alpha level. The main variables of interest in the output are the extracted statistic (“Raw” in the output), the computed p-value (“Computed” in the output), and whether the result is an inconsistency (“Error” in the output) or a gross inconsistency (“DecisionError” in the output). Note that when “Error = TRUE,” this means that the result is inconsistent.

Alternatively, a meta-analyst could also use statcheck in a browser via http://statcheck.io.13 This user-friendly app requires no programming skills and merely asks the user to upload a paper to check for inconsistencies (see Figure 2). The app also accepts papers in .docx format in addition to PDF and HTML files, but cannot be used to check an entire directory at once.

Once the meta-analyst uploads a paper via “Browse,” a more concise version of the output, compared to the R package, is displayed (see Figure 3). The more extensive version of the output can be downloaded in CSV format with the button in the top right corner. The output in the browser identifies the source, the statistical test, the statcheck-computed p-value, and whether the computed p-value matches the reported p-value. For more information on both the browser and R package versions of statcheck, please see the statcheck manual at https://rpubs.com/michelenuijten/statcheckmanual.

3.5 | Plans for further development

We routinely update statcheck to improve its performance and increase functionality. Some concrete plans for future updates include a feature on the web app to allow users to simply copy-paste a statistical result they want to check, and the option to also check .docx files with the R package. Furthermore, a new PDF-to-text converter is being tested, so that users no longer have to download and install the program Xpdf when they want to install statcheck. The latest developments can be followed on GitHub at https://github.com/MicheleNuijten/statcheck.

4 | RECOMMENDATIONS

We make two broad recommendations for meta-analytic practice. The first is simply that meta-analysts should strive to report statistical results completely and systematically, preferably using widely adopted reporting guidelines such as the APA guidelines.16 If researchers always report statistics in the same way, it is easier for readers to quickly filter out important information and for meta-analysts to locate vital information.
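To make the recompute-and-compare procedure from Section 3.1 concrete, the core check can be sketched in a few lines. The sketch below is our own illustration in Python, not statcheck's actual implementation (statcheck is an R package and covers many more test types); it handles only APA-style Z tests, and the regex, function names, and rounding rules are simplified assumptions.

```python
import math
import re

# An APA-style Z test, e.g. "Z = 2.20, p = .03". statcheck's real regular
# expressions also cover t, F, chi-square, r, and Q statistics; this single
# pattern is a simplified stand-in.
APA_Z = re.compile(r"Z\s*=\s*(-?\d+\.\d+),\s*p\s*([<>=])\s*(\.\d+)")


def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check(sentence, alpha=0.05):
    """Classify one reported Z test as 'consistent', 'inconsistency', or
    'gross inconsistency'; return None if no APA-style result is found."""
    m = APA_Z.search(sentence)
    if m is None:
        return None
    z_txt, comparator, p_txt = m.groups()
    z, p_rep = abs(float(z_txt)), float(p_txt)

    # Rounding of the test statistic: "2.20" may stand for any value in
    # [2.195, 2.205), so recompute p at both ends of that interval.
    h = 0.5 * 10 ** -len(z_txt.split(".")[1])
    p_min, p_max = two_tailed_p(z + h), two_tailed_p(z - h)

    # Allow for rounding of the reported p-value as well.
    hp = 0.5 * 10 ** -len(p_txt.lstrip("."))
    if comparator == "=":
        consistent = (p_min <= p_rep + hp) and (p_rep - hp <= p_max)
    elif comparator == "<":
        consistent = p_min < p_rep
    else:  # ">"
        consistent = p_max > p_rep
    if consistent:
        return "consistent"

    # Gross inconsistency: reported and recomputed p disagree about
    # statistical significance at the given alpha.
    reported_sig = p_rep <= alpha if comparator != ">" else False
    computed_sig = two_tailed_p(z) <= alpha
    return "gross inconsistency" if reported_sig != computed_sig else "inconsistency"
```

For example, check("Z = 2.20, p = .03") is consistent (the recomputed two-tailed p is about .028), whereas check("Z = 1.20, p = .03") is a gross inconsistency: the recomputed p is about .23, so the significance conclusion flips.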

FIGURE 1 Example of the statcheck output for an article called “Paper1”

FIGURE 2 Screenshot of the statcheck web app at http://statcheck.io

FIGURE 3 Screenshot of the output of the statcheck web app

The second recommendation is to use statcheck as a way to double-check the reporting of results. While we recognize that recommending our product serves to further the use of the product and our research, we believe that statcheck, and perhaps additional programs like it, can help decrease the number of statistical reporting errors and increase the reliability of results. Editors of journals that focus on meta-analyses could also consider making statcheck a standard part of their process (following the journals Psychological Science and the Journal of Experimental Social Psychology).

Meta-analysts can use statcheck to detect potential inconsistencies in their meta-analysis, but also to detect inconsistencies in the primary studies they intend to include. Detecting inconsistencies in primary studies is especially relevant if the meta-analyst needs to calculate the effect size based on reported NHST results. However, even if the effect size could be literally copied from the primary paper, it could be useful to scan a paper for statistical inconsistencies. If statcheck flags many NHST results as inconsistent, it could reflect something about the overall statistical quality of the paper. Meta-analysts might then consider recalculating the effect size from the raw data, to avoid any errors in the included effect size.

CONFLICT OF INTEREST
The authors reported no conflict of interest.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.

ORCID
Michèle B. Nuijten https://orcid.org/0000-0002-1468-8585
Joshua R. Polanin https://orcid.org/0000-0001-5100-0164

REFERENCES
1. Cumming G, Fidler F, Leonard M, et al. Statistical reform in psychology: is anything changing? Psychol Sci. 2007;18(3):230-232.
2. Hubbard R, Ryan PA. The historical growth of statistical significance testing in psychology-and its future prospects. Educ Psychol Meas. 2000;60:661-681.
3. Sterling TD, Rosenbaum WL, Weinkam JJ. Publication decisions revisited: the effect of the outcome of statistical tests on the decision to publish and vice-versa. Am Stat. 1995;49(1):108-112.
4. Sterling TD. Publication decisions and their possible effects on inferences drawn from tests of significance, or vice versa. J Am Stat Assoc. 1959;54:30-34.
5. Polanin JR, Pigott TD. The use of meta-analytic statistical significance testing. Res Syn Meth. 2015;6(1):63-73.
6. Nuijten MB, Hartgerink CHJ, Van Assen MALM, Epskamp S, Wicherts JM. The prevalence of statistical reporting errors in psychology (1985-2013). Behav Res Methods. 2016;48(4):1205-1226.
7. Epskamp S, Nuijten MB. statcheck: extract statistics from articles and recompute p values. R package version 1.2.2. http://CRAN.R-project.org/package=statcheck. 2016. Accessed April 15, 2020.
8. Polanin JR, Nuijten MB. Verifying the accuracy of statistical significance testing in Campbell Collaboration systematic reviews through the use of the R package statcheck. Campbell Syst Rev. 2018;14(1):1-36.
9. Schuyt CJM, Bensing JM, Stoop IAL, Vandenbroucke JP, Zwaan GJ, Klis MBM. Responsible research data management and the prevention of scientific misconduct: advisory report by the Committee on Scientific Research Data. R Neth Acad Arts Sci. 2013.
10. John LK, Loewenstein G, Prelec D. Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychol Sci. 2012;23(5):524-532.
11. Nuijten MB, Bakker M, Maassen E, Wicherts JM. Verify original results through reanalysis before replicating: a commentary on “Making Replication Mainstream” by Rolf A. Zwaan, Alexander Etz, Richard E. Lucas, & M. Brent Donnellan. Behav Brain Sci. 2018;41:e143.
12. Bakker M, Wicherts JM. The (mis)reporting of statistical results in psychology journals. Behav Res Methods. 2011;43(3):666-678.
13. Rife SC, Epskamp S, Nuijten MB. statcheck: extract statistics from articles and recompute p-values [web application]. https://statcheck.io. 2016. Accessed April 15, 2020.
14. Nuijten MB, Van Assen MALM, Hartgerink CHJ, Epskamp S, Wicherts JM. The Validity of the Tool “statcheck” in Discovering Statistical Reporting Inconsistencies. 2017. Accessed April 15, 2020.
15. Nuijten MB, van Assen MALM, Augusteijn H, Crompvoets EAV, Wicherts JM. Effect Sizes, Power, and Biases in Intelligence Research: A Meta-Meta-Analysis. 2018. https://doi.org/10.31234/osf.io/ytsvw
16. American Psychological Association. Publication Manual of the American Psychological Association. 6th ed. Washington, DC: American Psychological Association; 2010.

How to cite this article: Nuijten MB, Polanin JR. “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Res Syn Meth. 2020;1–6. https://doi.org/10.1002/jrsm.1408