Working with statcheck

Michèle B. Nuijten
Tilburg University
May 30, 2013

Abstract

Conclusions in experimental psychology are often the result of null hypothesis significance testing. Unfortunately, as many as 18% of the resulting statistical conclusions are reported incorrectly, and such errors can make a nonsignificant result appear significant or vice versa (Bakker & Wicherts, 2011). Often these reporting errors are in line with the researchers' expectations, and thus they introduce systematic bias. To get an idea of the prevalence of reporting errors in various fields, journals, or other subcategories, we created the R package statcheck (Epskamp & Nuijten, in preparation). This package can be used to extract statistics from articles, recompute p values, and diagnose a plausible cause for any errors. In this manual you will find instructions for installing and using statcheck.

Many conclusions in experimental psychology depend on null hypothesis significance testing (NHST). It is therefore important that researchers perform and report these tests and their conclusions correctly. Unfortunately, recent research by Bakker and Wicherts (2011) has shown that as many as 18% of statistical results in the psychological literature are reported incorrectly and, even worse, that around 15% of the articles contained an error that made a nonsignificant result appear significant or vice versa. Bakker and Wicherts also found that these errors were often in line with the researchers' expectations. This means that the misreported statistical results introduce systematic bias into the psychology literature.

Systematic bias in psychology is a serious problem. To solve it, it is essential that we first get a clear idea of the prevalence of reporting errors in different fields, journals, etc. To this end we created the R package statcheck (Epskamp & Nuijten, in preparation).
statcheck can be used to extract statistics from articles, recompute p values, and diagnose a plausible cause for any inconsistencies.

1 What statcheck Can and Cannot Do

Before we get to the technical parts of installing and using statcheck, we should consider what statcheck can and cannot do. The package statcheck is a program that automatically extracts statistics from articles and recomputes their p values. It works as follows:

1. Convert PDF to plain text
   Articles that are HTML files do not need to be converted.
2. Scan text for statistical results
   statcheck searches for specific patterns and can only recognize statistical results that are reported in APA style. It recognizes t, F, r, χ² and Z values, and results from the Wald test.
3. Use test statistics and degrees of freedom to recompute the p value
4. Compare reported and recomputed p value
   This comparison takes into account how the results were reported, e.g. p < .05 is treated differently than p = .05.

statcheck is meant to give a rough overview of the number and type of statistical errors in a set of articles. It is not meant as a tool to draw conclusions about the error rate of a specific author or article, simply because statcheck is not sensitive or specific enough to justify such strong conclusions (or accusations, for that matter). For instance, if a statistical result is not reported in APA style, statcheck will not recognize it. Furthermore, it is entirely possible that statcheck labels a result as erroneous when in fact it is not. For example, a correctly reported one-sided test has a p value half as large as the two-sided p value that statcheck expects, so it will be incorrectly marked as an error. Scientific errors are a delicate subject, so you should be careful before labeling someone's results as erroneous.

2 Installation

There are several programs you need to install before you can start using statcheck. You need R itself, of course, and preferably the R environment RStudio.
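As a side note on steps 3 and 4 of Section 1, the recomputation can be reproduced by hand in base R. The sketch below is not statcheck's own code; it only illustrates the idea for a t test, using made-up numbers (t_value, df, and reported_p are hypothetical) and the base R distribution function pt(). It also shows why a correctly reported one-sided test can be flagged.

```r
# Illustration only: recompute the p value of a reported t test by hand.
# These values are made up; they do not come from any article.
t_value    <- 2.20   # reported test statistic
df         <- 28     # reported degrees of freedom
reported_p <- 0.04   # p value as printed in the (hypothetical) article

# Two-sided p value: twice the upper-tail area beyond |t|.
recomputed_two_sided <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# A one-sided test in the observed direction has half that p value.
recomputed_one_sided <- recomputed_two_sided / 2

# Reported p values are rounded, so compare at the reported precision
# (two decimals here). all.equal() avoids exact floating-point comparison.
matches_two_sided <- isTRUE(all.equal(round(recomputed_two_sided, 2), reported_p))
matches_one_sided <- isTRUE(all.equal(round(recomputed_one_sided, 2), reported_p))

print(c(two_sided = recomputed_two_sided, one_sided = recomputed_one_sided))
print(c(matches_two_sided = matches_two_sided,
        matches_one_sided = matches_one_sided))
```

The same pattern applies to the other statistics with the corresponding base R functions, for example pf() for F values and pchisq() for χ² values, each called with lower.tail = FALSE.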
Furthermore, you will need the program pdftotext so that statcheck can convert PDF articles to plain text files.

2.1 R and RStudio

To use statcheck you first need to install R. R is a free programming language and environment for statistical computing and graphics; you can download it from CRAN (https://cran.r-project.org). You can obtain the latest version of statcheck from me or download it at https://github.com/SachaEpskamp/statcheck. Beware: figuring out how to install an R package directly from GitHub can take some time, so if you are not familiar with GitHub I recommend that you email me.

If you want, you can run R from RStudio. RStudio is a prettier interface for R and it has several nifty tricks that can make programming in R easier. Also, the (not terribly important but still informative) statcheck function identify only works in RStudio. You can obtain the latest version of RStudio from rstudio.com.

2.2 pdftotext

statcheck relies on the program pdftotext, which, as its name suggests, converts PDF files to plain text files. To install pdftotext do the following:

2.2.1 Step 1

The first step is to download the precompiled binaries (see http://en.wikipedia.org/wiki/Binary_file) of pdftotext from http://www.foolabs.com/xpdf/download.html (see Figure 1).

2.2.2 Step 2

Unzip the precompiled binaries (see Figure 2). When you click on "Extract all files" you will enter the "Extraction Wizard". Just keep clicking on "Next" and finally "Finish". Now you have unzipped the binaries.

Figure 1: Download the precompiled binaries.
Figure 2: Unzip the precompiled binaries.
Figure 3: Getting to the PATH variable.

2.2.3 Step 3

You have now downloaded and unzipped the binaries of pdftotext, but we still have to make sure that R can find them. We can do this by adding the folder with the binaries to the PATH variable. You can find the PATH variable under Settings > Control Panel > System (see Figure 3). In the System menu, go to Advanced > Environment Variables (see Figure 4), select "Path" and press Edit (see Figure 5).
In the "Edit System Variable" menu you will find your PATH (see http://en.wikipedia.org/wiki/PATH_(variable)) next to "Variable value" (see Figure 6). Copy your PATH and paste it in a text editor (e.g. Notepad). I would advise you to save an untouched copy of your PATH as a backup in case things go wrong and you want to restore the default settings.

That said, you can now add the folder of the unzipped binaries of pdftotext to the PATH. To do this, copy the location of the binaries (32 or 64 bit, depending on your system; see http://pcsupport.about.com/od/fixtheproblem/f/32-bit-64-bit-windows.htm). In my case the location is C:\Program Files\xpdfbin-win-3.03\xpdfbin-win-3.03\bin32, because I saved the binaries in my Program Files folder. Next, paste this location into your PATH. In my case it looks like this:

   (...) C:\WINDOWS\system32\nls\ENGLISH;C:\Program Files\xpdfbin-win-3.03\xpdfbin-win-3.03\bin32;C:\Program Files\Novell\ZENworks\; (...)

Now that you have downloaded R and pdftotext and added pdftotext to your PATH, we can load statcheck in R.

Figure 4: Getting to the PATH variable.
Figure 5: Getting to the PATH variable.
Figure 6: Getting to the PATH variable.

2.3 Installing pdftotext on Mac

To use pdftotext on Mac you first need to install XQuartz. This is "a version of the X.Org X Window System that runs on OS X". You can download it from http://xquartz.macosforge.org/landing/.

After you have installed XQuartz, you have to download and install xpdf-tools-3.dmg. You can find it at http://en.sourceforge.jp/projects/sfnet_xpdf.mirror/downloads/xpdf-tools-3.dmg/. Furthermore, you need to install the R package tcltk that is available from http://cran.us.r-project.org/bin/macosx/tools/.

2.4 Loading the Package

Figure 7: Add statcheck to your library

To use statcheck in R we have to take two steps: first we have to install the package, next we have to load it.
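Before taking these two steps, it can be worth confirming that R can actually find pdftotext, since the PATH changes described in Section 2.2.3 are easy to get wrong. The following is a minimal sketch using the base R function Sys.which(); it is not part of statcheck, and the variable name pdftotext_path is just an illustration.

```r
# Look up the pdftotext executable on the PATH that this R session sees.
# Sys.which() returns an empty string when the program cannot be found.
pdftotext_path <- unname(Sys.which("pdftotext"))

if (nzchar(pdftotext_path)) {
  message("pdftotext found at: ", pdftotext_path)
} else {
  message("pdftotext was not found. Check the PATH entry from Section 2.2.3, ",
          "then restart R so that it picks up the changed PATH.")
}
```

If the program is not found even though the folder was added, keep in mind that a running R session still uses the PATH it was started with.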
Since statcheck is not on CRAN yet, you will have to install it by dragging its (unzipped) folder to the R library folder (see Figure 7). Next, you can load statcheck into R with the following command:

   library("statcheck")

statcheck is now ready to use.

3 Example Articles

We will use a set of example articles to practice working with statcheck. The articles we will use are both the HTML and PDF versions of Murayama, Elliot, and Yamagata (2011), Stapel and Suls (2004), and Van Zomeren, Spears, Fischer, and Leach (2004). You can either download these articles via e.g. Google Scholar (although Stapel and Suls (2004) might have been retracted due to fraud) or obtain the files from me.

4 Scanning the Articles

You can choose to scan either a specific article or an entire directory of articles. It is also possible to scan text that you saved in an R object, but since there are so few occasions on which you would want to do this we won't consider it now. If, however, you want to see how this works, check out the help file through the R command ?statcheck. The next sections explain in detail how you can scan single articles and directories of articles.

4.1 Scanning a Specific Article

If you want to analyze a specific article you can use the statcheck functions checkPDF() or checkHTML(), depending on the file type of the article. These functions need the path to the file, for instance:

   checkHTML("C:/users/michele/dropbox/articles/stapel.html")

or a vector of paths if you want to select multiple specific articles:

   checkPDF(c("C:/users/michele/dropbox/articles/stapel.pdf",
              "C:/users/michele/dropbox/articles/zomeren.pdf"))

If you want to further analyze or plot the statcheck output (and you probably do), it is most practical to save the entire output in an object:

   output <- checkHTML("C:/users/michele/dropbox/articles/zomeren.html")

4.2 Scanning Multiple Articles

It is also possible to scan multiple articles at once.
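One way to do this with only what Section 4.1 has already shown is to collect the file paths yourself and pass them to checkPDF() as a vector. The sketch below is an illustration under assumptions, not necessarily the approach this section goes on to describe: article_dir reuses the example folder from Section 4.1, and the call is guarded so that nothing happens when statcheck is not installed or the folder contains no PDF files.

```r
# Example folder taken from Section 4.1; replace it with your own folder.
article_dir <- "C:/users/michele/dropbox/articles"

# Collect the full paths of all PDF files in that folder (base R).
# The result is an empty character vector if no PDF files are found.
pdf_paths <- list.files(article_dir, pattern = "\\.pdf$",
                        full.names = TRUE, ignore.case = TRUE)

# Scan the articles only if there is something to scan and statcheck is
# installed; checkPDF() accepts a vector of paths (see Section 4.1).
if (length(pdf_paths) > 0 && requireNamespace("statcheck", quietly = TRUE)) {
  output <- statcheck::checkPDF(pdf_paths)
} else {
  message("Nothing scanned: no PDF files found or statcheck is not installed.")
}
```

For HTML articles the same pattern should work with pattern = "\\.html?$" and checkHTML().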
