Working with statcheck

Michèle B. Nuijten Tilburg University May 30, 2013

Abstract

Conclusions in experimental psychology are often the result of null hypothesis significance testing. Unfortunately, as many as 18% of the resulting statistical conclusions are reported incorrectly, and such errors can make an insignificant result appear significant or vice versa (Bakker & Wicherts, 2011). Often these reporting errors are in line with the researchers’ expectations, and thus they introduce systematic bias. To get an idea of the prevalence of reporting errors in various fields, journals, or other subcategories, we created the package statcheck (Epskamp & Nuijten, in preparation). This package can be used to extract statistics from articles, recompute p values, and diagnose a plausible cause for any errors. In this manual you will find instructions on the installation and use of statcheck.

Many conclusions in experimental psychology depend on null hypothesis significance testing (NHST). It is therefore important that researchers perform and report these tests and their conclusions correctly. Unfortunately, recent research by Bakker and Wicherts (2011) has shown that as many as 18% of statistical results in the psychological literature are reported incorrectly and, even worse, that around 15% of the articles contained an error that made an insignificant result appear significant or vice versa. Bakker and Wicherts also found that these errors were often in line with the researchers’ expectations. This means that the misreported statistical results introduce systematic bias into the psychology literature.

Systematic bias in psychology is a serious problem. To solve it, it is essential that we first get a clear idea of the prevalence of reporting errors in different fields, journals, etc. To this end we created the package statcheck (Epskamp & Nuijten, in preparation). statcheck can be used to extract statistics from articles, recompute p values, and diagnose a plausible cause for any inconsistencies.

1 What statcheck Can and Cannot Do

Before we get to the technical parts of installing and using statcheck, we should consider what statcheck can and cannot do. The package statcheck is a program that automatically extracts statistics from articles and recomputes their p values. It works as follows:

1. Convert PDF to plain text. If the articles are HTML files, they do not need to be converted.

2. Scan text for statistical results. statcheck searches for specific patterns and can only recognize statistical results that are reported in APA style. It recognizes t, F, r, χ2 and Z values, and results from the Wald test.

3. Use test statistics and degrees of freedom to recompute the p value.

4. Compare reported and recomputed p value. This comparison takes into account how the results were reported, e.g. p < .05 is treated differently than p = .05.

Contact: [email protected]
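To make steps 2 to 4 a bit more concrete, here is a minimal sketch in base R. It is not statcheck's internal code: the pattern below is a simplified assumption that only covers F tests, and the consistency rule is my own rough approximation. The example strings are the two results discussed later in this manual.

```r
# Hedged sketch, not statcheck's internal code: a simplified pattern that
# only covers APA-style F tests such as "F (1, 58) = 2.30, p = .14".
txt <- "The effect was not reliable, F (1, 58) = 2.30, p = .14."

pattern <- paste0(
  "F ?\\( ?([0-9]+) ?, ?([0-9]+) ?\\) ?= ?([0-9.]+), ?",  # F(df1, df2) = value
  "p ?([<>=]) ?([0-9]*\\.[0-9]+)"                          # p, comparison sign, value
)

# Step 2: extract the full match and the five captured parts
m <- regmatches(txt, regexec(pattern, txt))[[1]]
df1        <- as.numeric(m[2])
df2        <- as.numeric(m[3])
value      <- as.numeric(m[4])
comparison <- m[5]
reported   <- as.numeric(m[6])

# Step 3: recompute the p value from the test statistic and degrees of freedom
computed <- pf(value, df1, df2, lower.tail = FALSE)

# Step 4: compare, taking into account how the p value was reported
decimals <- nchar(sub("^[0-9]*\\.", "", m[6]))  # number of reported decimals
if (comparison == "=") {
  consistent <- isTRUE(all.equal(round(computed, decimals), reported))
} else if (comparison == "<") {
  consistent <- computed < reported
} else {
  consistent <- computed > reported
}

# The chi-square example from Section 5.1: chi2(31) = 38.78, reported p < .05
p_chisq <- pchisq(38.78, df = 31, lower.tail = FALSE)
chisq_consistent <- p_chisq < 0.05
```

In this sketch an exactly reported p value counts as consistent when the recomputed value rounds to it, and an inexactly reported one when the inequality holds. The real package may apply different rules.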

statcheck is meant to give a rough overview of the number and type of statistical errors in a set of articles. It is not meant as a tool to draw conclusions about the error rate of a specific author or article, simply because statcheck is not sensitive or specific enough to justify such strong conclusions (or accusations, for that matter). For instance, if a statistical result is not reported in APA style, statcheck will not recognize it. Furthermore, it is entirely possible that statcheck labels a result as erroneous when in fact it is not: a correctly reported one-sided test, for example, will have a p value half as large as the two-sided value statcheck expects, so it will be incorrectly marked as an error. Scientific errors are a delicate subject, so you should be careful before labeling someone’s results as erroneous.
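The one-sided case can be illustrated with a few lines of base R. The t value and degrees of freedom below are hypothetical and chosen only for illustration; they do not come from the example articles.

```r
# Hypothetical result: t(40) = 1.80
t_value <- 1.80
df <- 40

# Two-sided p value, which is what an automated check would usually expect
p_two_sided <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# One-sided p value, which a researcher may legitimately have reported
p_one_sided <- pt(abs(t_value), df, lower.tail = FALSE)

# The one-sided value is exactly half the two-sided value, so here the
# reported result is significant at .05 while the recomputed one is not
halves_match <- isTRUE(all.equal(p_two_sided, 2 * p_one_sided))
```

A result like this would look like an error to an automated check, even though the article may state clearly that the test was one-sided.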

2 Installation

There are several programs you need to install before you can start using statcheck. Of course you need to install R, and preferably the R environment RStudio. Furthermore, you will need the program pdftotext to enable statcheck to convert PDF articles to plain text files.

2.1 R and RStudio

To use statcheck you first need to install R. R is a free programming language and environment for statistical computing and graphics. You can obtain the latest version of statcheck from me or download it at https://github.com/SachaEpskamp/statcheck. Beware: figuring out how to install an R package directly from GitHub can take some time, so if you are not familiar with GitHub I recommend that you email me.

If you want, you can run R from RStudio. RStudio is a prettier interface for R and it has several nifty tricks that can make programming in R easier. Also, the (not terribly important but still informative) statcheck function identify only works in RStudio. You can obtain the latest version of RStudio from .com.

2.2 pdftotext

statcheck relies on the program pdftotext that, surprisingly, converts PDF files to plain text files. To install pdftotext do the following:

2.2.1 Step 1

The first step is to download the precompiled binaries of pdftotext from http://www.foolabs.com/xpdf/download.html (see Figure 1).

2.2.2 Step 2

Unzip the precompiled binaries (see Figure 2). When you click on “Extract all files” you will enter the “Extraction Wizard”. Just keep clicking on “Next” and finally “Finish”. Now you have unzipped the binaries.

http://en.wikipedia.org/wiki/Binary_file

Figure 1: Download the precompiled binaries.

Figure 2: Unzip the precompiled binaries.

Figure 3: Getting to the PATH variable.

2.2.3 Step 3

You have now downloaded and unzipped the binaries of pdftotext, but we still have to make sure that R can find them. We can do this by adding the folder with the binaries to the PATH variable. You can find the PATH variable under Settings > Control Panel > System (see Figure 3). In the System menu, go to Advanced > Environment Variables (see Figure 4), select “Path” and press Edit (see Figure 5). In the “Edit System Variable” menu you will find your PATH next to “Variable value” (see Figure 6). Copy your PATH and paste it in a text editor (e.g. Notepad). I would advise you to save an untouched copy of your PATH as a backup in case things go wrong and you want to restore the default settings.

That said, you can add the folder of the unzipped binaries of pdftotext to the PATH. To do this, copy the location of the binaries (32/64 bit, depending on your system). In my case the location is C:\Program Files\xpdfbin-win-3.03\xpdfbin-win-3.03\bin32, because I saved the binaries in my Program Files folder. Next, paste this location into your PATH. In my case it looks like this:

(...) C:\WINDOWS\system32\nls\ENGLISH; C:\Program Files\xpdfbin-win-3.03\xpdfbin-win-3.03\bin32; C:\Program Files\Novell\ZENworks\; (...)

Now that you have downloaded R and pdftotext and added pdftotext to your PATH, we can load statcheck in R.
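If you want to check from within R whether the PATH change worked, a small sketch like the following may help. It only uses base R functions. Note that R reads the PATH when it starts, so you will probably have to restart R (or RStudio) after editing the variable.

```r
# Ask the operating system where the pdftotext executable lives.
# Sys.which() returns an empty string when the program is not on the PATH.
pdftotext_path <- Sys.which("pdftotext")

if (nzchar(pdftotext_path)) {
  message("pdftotext found at: ", pdftotext_path)
} else {
  message("pdftotext not found; check your PATH and restart R")
}

# List the individual PATH entries as R sees them, one per element
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
print(path_entries)
```

If the folder with the binaries does not show up in the printed entries, the PATH was probably not saved or R was not restarted.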

http://en.wikipedia.org/wiki/PATH_(variable)
http://pcsupport.about.com/od/fixtheproblem/f/32-bit-64-bit-windows.htm

Figure 4: Getting to the PATH variable.

Figure 5: Getting to the PATH variable.

Figure 6: Getting to the PATH variable.

2.3 Installing pdftotext on Mac

To use pdftotext on Mac you first need to install XQuartz. This is “a version of the X.Org X Window System that runs on OS X”. You can download it from http://xquartz.macosforge.org/landing/. After you have installed XQuartz, you have to download and install xpdf-tools-3.dmg. You can find it at http://en.sourceforge.jp/projects/sfnet_xpdf.mirror/downloads/xpdf-tools-3.dmg/. Furthermore, you need to install the R package tcltk that is available from http://cran.us.r-project.org/bin/macosx/tools/.

Figure 7: Add statcheck to your library

2.4 Loading the Package

To use statcheck in R we have to take two steps: first we have to install the package, next we have to load it. Since statcheck is not on CRAN yet, you will have to install it by dragging its (unzipped) folder to the R library folder (see Figure 7). Next, you can load statcheck into R with the following command:

library("statcheck")

statcheck is now ready to use.
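If you are not sure where your R library folder is, or whether the package ended up in the right place, the following base R sketch can help. The messages are my own; nothing here is specific to statcheck apart from the package name.

```r
# Show the library folders that R searches for installed packages.
# The statcheck folder should sit inside one of these.
lib_dirs <- .libPaths()
print(lib_dirs)

# Check whether R can find statcheck before trying to load it
if (requireNamespace("statcheck", quietly = TRUE)) {
  library("statcheck")
  message("statcheck loaded")
} else {
  message("statcheck not found in: ", paste(lib_dirs, collapse = "; "))
}
```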

3 Example Articles

We will use a set of example articles to practice working with statcheck. The articles we will use are both the HTML and PDF versions of Murayama, Elliot, and Yamagata (2011), Stapel and Suls (2004) and Van Zomeren, Spears, Fischer, and Leach (2004). You can either download these articles via e.g. Google Scholar (although Stapel and Suls (2004) might have been retracted due to fraud) or obtain the files from me.

4 Scanning the Articles

You can choose to scan either a specific article or an entire directory of articles. It is also possible to scan text that you saved in an R object, but since there are so few occasions on which you would want to do this we won’t consider it now. If, however, you want to see how this works, check out the helpfile through the R command ?statcheck. The next sections explain in detail how you can scan single articles and directories with articles.

4.1 Scanning a Specific Article

If you want to analyze a specific article you can use the statcheck functions checkPDF() or checkHTML(), depending on the filetype of the article. These functions need the path to the file, for instance:

checkHTML("C:/users/michele/dropbox/articles/stapel.html")

or a vector of paths if you want to select multiple specific articles:

checkPDF(c("C:/users/michele/dropbox/articles/stapel.pdf",
           "C:/users/michele/dropbox/articles/zomeren.pdf"))

If you want to further analyze or plot the statcheck output (and you probably do), it is most practical to save the entire output in an object:

output <- checkHTML("C:/users/michele/dropbox/articles/zomeren.html")
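A mistyped path is an easy mistake to make, so it can be useful to check that the files exist before scanning them. The sketch below uses the example folder from this manual and two assumed file names (stapel.pdf and zomeren.pdf); adjust these to your own situation. The scan itself is only attempted when the files and the package are both available.

```r
# Example folder from this manual; the file names are assumptions
article_dir <- "C:/users/michele/dropbox/articles"
files <- file.path(article_dir, c("stapel.pdf", "zomeren.pdf"))

# Check which of the files actually exist on this computer
found <- file.exists(files)
if (any(!found)) {
  message("Missing files: ", paste(files[!found], collapse = ", "))
}

# Only scan when every file was found and statcheck is installed
if (all(found) && requireNamespace("statcheck", quietly = TRUE)) {
  output <- statcheck::checkPDF(files)
}
```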

4.2 Scanning Multiple Articles

It is also possible to scan multiple articles at once. Create a directory where all the articles of interest are saved. statcheck will scan every document in PDF or HTML (or both, depending on the function you choose), so make sure that the directory only contains the files you want to analyze. Next, use checkdir(), checkHTMLdir(), or checkPDFdir() to select the directory of choice. With the function checkdir() you will scan all HTML and PDF files in the directory, whereas checkHTMLdir() and checkPDFdir() will only scan HTML or PDF files respectively.

You can select the directory of choice in two ways. The first way is again by directly specifying the path of the folder, and the second way is by a point and click menu. Specifying the path is much like we did when we selected the specific article:

output <- checkHTMLdir("C:/users/michele/dropbox/articles")

When you want to use a point and click menu, you can use one of the following commands:

output1 <- checkdir()
output2 <- checkHTMLdir()
output3 <- checkPDFdir()

A pop-up window will appear (mind you, it can be hidden behind your R window). In this window you can select the directory of interest and statcheck will automatically analyze all articles of the chosen filetype in it.
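Because every matching file in the directory gets scanned, it can help to preview the contents of a directory first. The sketch below builds a small temporary directory with empty dummy files, purely for illustration. The file extension patterns are my assumption about what counts as an HTML or PDF file; statcheck may select files differently.

```r
# Build a throwaway directory with three empty dummy files
demo_dir <- file.path(tempdir(), "statcheck_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("stapel.html", "zomeren.pdf", "notes.txt")))

# Preview which files look like HTML or PDF articles (assumed patterns)
html_files <- list.files(demo_dir, pattern = "\\.html?$", ignore.case = TRUE)
pdf_files  <- list.files(demo_dir, pattern = "\\.pdf$", ignore.case = TRUE)

print(html_files)
print(pdf_files)
```

Anything that shows up here but should not be analyzed (drafts, supplementary files) is best moved to another folder before scanning.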

Figure 8: Part of the statcheck output of the example articles

4.3 PDF vs. HTML

statcheck can handle articles in both PDF and HTML, but if possible, use HTML. When you use HTML the article already is in plain text and you can be fairly sure that all statistical results are scanned properly. With PDF files, however, journals sometimes use images of special signs such as “=”, “<”, or “χ” instead of the actual sign. These images are not converted to text and are therefore not read by statcheck. You can test whether an article made use of images (bad) or signs (good) by selecting the sign, copying it, and pasting it in Word. If the copied sign still looks like the sign, statcheck will be able to read it as well. If the copied sign looks like an empty square in Word, it was an image and it cannot be read. Use an HTML version of the article if possible.

5 Interpreting the statcheck Output

statcheck returns a lot of information. You can choose to inspect the full output or a summary.

5.1 Full Output

Figure 8 shows the first part of the statcheck output of our example articles. The first columns contain the filenames of the articles. Next to that you find the details of the extracted statistics: which statistic it was, how many degrees of freedom, the reported test statistic and p value, and how the p value was reported. In the column Raw you can find the extracted string as a whole. The three columns after that (InExactError, ExactError, and DecisionError) indicate whether there are any incongruencies between the reported and recomputed p value.

Firstly, when InExactError is TRUE, it means that statcheck found an incongruency in an inexactly reported p value. Take a look at line 8 of the output. Here you see that one of the results of Murayama et al. (2011) contains an InExactError. The extracted statistics are: χ2(31) = 38.78, p < .05. However, if we recompute the p value we get p = .1589, which is not significant.

Secondly, when ExactError is TRUE, it means that statcheck found an incongruency in an exactly reported p value. Examples of this can be found at lines 1 (Murayama et al., 2011) and 22 (Stapel & Suls, 2004) of the output. At line 22 the reported statistics were F (1, 58) = 2.30, p = .14, whereas the recomputed p value is .1348. As you can see, this inconsistency is much smaller than in the previous example.

Finally, when DecisionError is TRUE, it means that a non-significant result was reported as significant, or vice versa. In our example output this again leads us to line 8 (Murayama et al., 2011). As we have seen before, the reported significant result proved non-significant after recalculation.

When you want to save an HTML article, make sure you just save the html part of the webpage, not the complete webpage.

5.2 Summary

You can also ask for a summary of the results per article as follows:

stat <- checkHTMLdir("C:/users/michele/dropbox/articles")
summary(stat)

This results in a dataframe in which you can look up the number of reported p values per article and the number of exact, inexact, and decision errors per article. If we take a look at the summary of our example in Figure 9, we see among other things that most of the errors were in Van Zomeren et al. (2004), that decision errors are the rarest kind of error, and that both Stapel and Suls (2004) and Van Zomeren et al. (2004) report more than twice as many statistics as Murayama et al. (2011).
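To give an idea of what such a per-article summary contains, here is a sketch that counts errors per article on a small mock dataframe. The error column names are those described in Section 5.1; the column name Source and all values are made up for illustration, and this is not how summary() is implemented in statcheck.

```r
# Mock output: one row per extracted statistic (values are illustrative)
mock <- data.frame(
  Source        = c("murayama", "murayama", "stapel", "stapel", "zomeren"),
  InExactError  = c(TRUE,  FALSE, FALSE, FALSE, TRUE),
  ExactError    = c(FALSE, TRUE,  TRUE,  FALSE, FALSE),
  DecisionError = c(TRUE,  FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Count each type of error per article (summing logicals counts the TRUEs)
per_article <- aggregate(
  cbind(InExactError, ExactError, DecisionError) ~ Source,
  data = mock, FUN = sum
)

# Add the number of extracted p values per article
counts <- table(mock$Source)
per_article$pValues <- as.vector(counts[as.character(per_article$Source)])

print(per_article)
```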

6 Plotting the Results

If you’re analyzing many articles (or few articles with many statistics, for that matter) it can be helpful to plot the results.

6.1 Simple Plot

You can plot statcheck as follows:

stat <- checkHTMLdir("C:/users/michele/dropbox/articles")
plot(stat)

The resulting figure is based on Bakker and Wicherts (2011) and shows you the reported p values against the recomputed p values. If a dot falls on the diagonal, the reported and recomputed p value are the same. Note: if a dot does not fall on the diagonal, it does not necessarily mean that it was reported incorrectly. For instance, if a value was reported as p < .05, and the recomputed value is .02, the dot will not lie on the diagonal, but it obviously was correctly reported. To distinguish between errors and inexactly reported p values, the exactly reported p values have a diamond around them. When we take a look at the plot of our example articles (Figure 10), it looks quite good: most of the results lie on the diagonal which means that they are congruent.
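If you want to build a similar figure by hand, for instance to change its appearance, a base R sketch could look like this. The column names Reported.P.Value and Computed are the ones used in Section 6.2; the column Comparison and the mock values are assumptions for illustration, and the styling is not identical to what plot(stat) produces.

```r
# Mock data: reported and recomputed p values (illustrative values)
mock <- data.frame(
  Reported.P.Value = c(0.05, 0.14, 0.01, 0.03),
  Computed         = c(0.1589, 0.1348, 0.004, 0.03),
  Comparison       = c("<", "=", "<", "="),   # hypothetical column name
  stringsAsFactors = FALSE
)

# Exactly reported p values are the ones reported with "="
exact <- mock$Comparison == "="

# Reported against recomputed p values, with the diagonal as reference
plot(mock$Reported.P.Value, mock$Computed,
     xlim = c(0, 1), ylim = c(0, 1), pch = 16,
     xlab = "reported p value", ylab = "recalculated p value")
abline(0, 1, lty = 2)

# Mark the exactly reported p values with a diamond
points(mock$Reported.P.Value[exact], mock$Computed[exact], pch = 5, cex = 1.8)
```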

6.2 Identify

A function based on the plot function is identify(). With identify you are able to identify (surprise, surprise) the points in the plot. When you type in

stat <- checkHTMLdir("C:/users/michele/dropbox/articles")
identify(stat)

your plotted results will appear just as with plot, but now you can click on specific points to obtain information about that specific result. You can click on as many points as you like. When you’re done, press Esc. The row number of the selected point will appear next to it (see Figure 11) and a dataframe will appear with the information on the specified points (Figure 12).

This identify trick only works properly in RStudio. If you’re working in R you can make use of R’s own identify function as follows:

stat <- checkHTMLdir("C:/users/michele/dropbox/articles")
plot(stat$Reported.P.Value, stat$Computed) # identify() needs an open plot to click on
identify(stat$Reported.P.Value, stat$Computed)

Next, click on the points of interest and press Esc when you’re done. The row number of that point in your statcheck output will appear next to it. You can then ask for the information on those points manually:

stat[c(1,8,65,66),]

Figure 9: The summary of the statcheck output of the example articles

Figure 10: Using plot on our example articles. [Scatter plot of the reported p value (x-axis, 0.0 to 1.0) against the recalculated p value (y-axis, 0.0 to 1.0). Labeled regions: underestimated, overestimated, non-sig reported as sig, sig reported as non-sig. Legend: p inconsistency, decision error, exact.]

7 Possible Causes of Incongruencies

We found several errors in our example articles. Now we want to find out what kind of errors they are and if they are in fact errors at all. We can do this with diagnose(). This function looks at every inconsistent pair of reported and computed p values and tries to calculate what the most likely mistake was that resulted in the mismatched pair. For instance, if the reported p value is exactly half the computed value, the reported test was probably one-sided and thus not necessarily erroneous. The error categories are based on Bakker and Wicherts (2011). We can use diagnose as follows:

stat <- checkHTMLdir("C:/users/michele/dropbox/articles")
diagnose(stat)

Figure 11: Using identify on our example articles

Figure 12: The resulting dataframe with information on the points that we selected with identify

Figure 13: The ErrorDiagnosis of the example articles

You will get a list with two dataframes: ErrorDiagnosis and CopyPaste. Another dataframe in this list is FullDiagnosis, but this is not printed by default. You can ask for a specific dataframe with the $ sign:

diag <- diagnose(stat) # save the diagnosis in an object
diag$ErrorDiagnosis
diag$CopyPaste
diag$FullDiagnosis

7.1 ErrorDiagnosis

We will now take a look at the diagnosis of our example articles. In Figure 13 we see the ErrorDiagnosis of the articles. In this dataframe you will find the diagnosis of all incongruent pairs of statistics. The first column indicates the article in which the inconsistency was found. Per error the full string of extracted statistics is printed with the recomputed p value next to it. Then we get to several error categories that can be either TRUE or FALSE. One error can belong to multiple categories.

We can see that the error on the first line of the dataframe ErrorDiagnosis falls in the category OneTail. This means that the reported p value is consistent with the one-sided computed p value. If the researchers report that they have conducted a one-sided test, such an incongruency is not necessarily an error. However, in this case a χ2 test was performed, which already is one-sided, and it makes no sense to divide the p value again. Note that you can indicate whether you want statcheck to classify one-sided results as errors (default) or not, with the argument OneSidedAsError.

Another example from ErrorDiagnosis is on line 64 (Van Zomeren et al., 2004). Here we see that the incongruency was diagnosed as both a RoundError and a SmallerInsteadEqual. When you look at the reported p value “< .07” and the computed value “.074”, you can imagine that the incongruency could have resulted from a rounding error as well as from switching the “=” sign for a “<” sign.

The dataframe also contains other error categories. Check the helpfile (?diagnose) for a full description of each category.
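The line 64 example can be worked through by hand in base R. The sketch below is only my reading of the idea behind these categories, not the definitions that diagnose() uses: it checks whether the result is consistent as reported, and whether it would have been consistent had it been reported with “=” instead of “<”.

```r
# The example from line 64: reported "p < .07", recomputed p = .074
reported   <- 0.07
comparison <- "<"
computed   <- 0.074

# As reported, the claim is that the p value lies below .07
consistent_as_reported <- computed < reported

# Had the authors written "p = .07", the recomputed value would round to it
consistent_if_equal <- isTRUE(all.equal(round(computed, 2), reported))

print(c(as_reported = consistent_as_reported, if_equal = consistent_if_equal))
```

So the result is inconsistent as reported, but a rounded value with the wrong sign is a plausible explanation.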

A list is a structure in R that can contain objects of different classes.

Figure 14: The CopyPaste diagnosis of the example articles

7.2 CopyPaste

With the second dataframe of our diagnosis, CopyPaste, we can check for copy-paste errors. Sometimes researchers copy the results of a test and use it as a template for other results. However, it can happen that they forget to adjust the numbers, and the same string of results will appear twice (or more) in the article. We labeled this a “copy-paste error”. Note that a copy-paste error does not have to be an inconsistent result, so statcheck will not necessarily label it an error. If we take a look at the CopyPaste dataframe for our articles, we see that one of the results of Stapel and Suls (2004) seems to be a copy-paste error (see Figure 14). The result F (1, 154) = 7.67, p < .01 appears twice in the article. This probably is a copy-paste error.
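The basic idea of finding repeated result strings can be sketched with base R's duplicated(). This is an illustration on a small mock vector of extracted strings, not the way diagnose() builds the CopyPaste dataframe; in practice you would also want to look within one article at a time.

```r
# Mock extracted strings from one article (the first and third are identical)
raw <- c("F(1, 154) = 7.67, p < .01",
         "F(1, 58) = 2.30, p = .14",
         "F(1, 154) = 7.67, p < .01")

# Flag every string that occurs more than once, including its first occurrence
is_repeated <- duplicated(raw) | duplicated(raw, fromLast = TRUE)

# Show the repeated strings once each
print(unique(raw[is_repeated]))
```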

7.3 FullDiagnosis

If you want to see the ErrorDiagnosis information for all the extracted statistics and not just the incongruent statistics, you can ask for the FullDiagnosis:

stat <- checkHTMLdir("C:/users/michele/dropbox/articles")
diag <- diagnose(stat)
diag$FullDiagnosis

The diagnose function can help you determine a plausible cause for an incongruency, but be careful with your conclusions. As mentioned before, statcheck is an automated program and it is possible that a result marked as incongruent comes with an in-text narrative that explains the inconsistency.

8 Tips and Tricks

When you are working with statcheck there are several tips and tricks that can make your life easier.

8.1 Helpfiles

First of all, take a look at statcheck’s helpfiles. As mentioned before, you can get to them by typing a question mark in front of a function’s name:

?statcheck
?checkHTMLdir
# etc.

In the helpfiles you can find information about the input and output of the functions.

8.2 General R Questions

If you have a general question about working with R there are several things you can do. First of all: Google is your friend. Many people work with R and there is a large community that devotes its time to answering R questions on different internet forums. Because of that you will probably find an answer if you Google your question.

Another very helpful website for R questions is www.stackoverflow.com. Stack Overflow is a free question and answer site for programming. It has a clear layout and answers to questions are graded on relevance and clarity. If you have specific R questions you can specify your question by adding the tag [r] to your question.

8.3 Writing Output to Excel

Sometimes it can be practical to write the statcheck results to an Excel file. The function you can use for this is write.csv2().

setwd("C:/users/michele/dropbox/results") # I want to save the result here
stat <- checkHTMLdir("C:/users/michele/dropbox/articles")
write.csv2(stat, "results", row.names = FALSE)

This function creates a semicolon-separated text file named results in your current working directory (check your current working directory with getwd()). Next, go to Excel, choose “Open”, and select the file you just created. You might get a warning like “The file you are trying to open is in a different format [...] Are you sure you want to open the file now?”. Yes, you want to open it now. A “Text Import Wizard” will pop up. Here you have to specify how the columns of your file are separated. In the first step of the wizard, indicate that the file type is “Delimited”. Click Next. Here you indicate which characters separate the columns. Check “semicolon” and click Next. In the next step we don’t have to change anything. Just click “Finish” and you will have your results in Excel.
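To see what write.csv2() actually writes, you can read the file straight back into R. The sketch below uses a small mock dataframe and a temporary file; the column names Raw and Computed follow the output described earlier, and the second row is a hypothetical result. Giving the file a .csv extension is a choice made for this sketch only.

```r
# Mock results; the p values are recomputed here rather than typed in
mock <- data.frame(
  Raw      = c("F(1, 58) = 2.30, p = .14", "t(28) = 2.20, p = .04"),
  Computed = c(pf(2.30, 1, 58, lower.tail = FALSE),
               2 * pt(2.20, 28, lower.tail = FALSE)),
  stringsAsFactors = FALSE
)

# write.csv2() separates columns with semicolons and uses a decimal comma
csv_file <- file.path(tempdir(), "results.csv")
write.csv2(mock, csv_file, row.names = FALSE)

# Look at the raw text, then read it back with the matching function
print(readLines(csv_file))
back <- read.csv2(csv_file, stringsAsFactors = FALSE)
```

Because the columns are separated by semicolons, the commas inside the result strings do not split them into extra columns.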

8.4 If Everything Else Fails...

If you come across an error that is impossible to solve, or you have a statcheck output that leaves you completely bewildered, or of course if you come across a bug, you can send me an e-mail and I’ll see what I can do!

References

Bakker, M., & Wicherts, J. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43(3), 666–678.

Epskamp, S., & Nuijten, M. B. (in preparation). statcheck: Extract and recompute significance values from articles. [Computer software manual]. (R package version 0.1.0)

Murayama, K., Elliot, A., & Yamagata, S. (2011). Separation of performance-approach and performance-avoidance achievement goals: A broader analysis. Journal of Educational Psychology, 103(1), 238.

Stapel, D., & Suls, J. (2004). Method matters: Effects of explicit versus implicit social comparisons on activation, behavior, and self-views. Journal of Personality and Social Psychology, 87(6), 860.

Van Zomeren, M., Spears, R., Fischer, A., & Leach, C. (2004). Put your money where your mouth is! Explaining collective action tendencies through group-based anger and group efficacy. Journal of Personality and Social Psychology, 87(5), 649.
