Running head: RESEARCH QUALITY IN BEHAVIORAL SCIENCE
1 Citation counts and journal impact factors do not capture key aspects of research quality
2 in the behavioral and brain sciences
3 Michael R. Dougherty
4 Department of Psychology, University of Maryland
5 Zachary Horne
6 Department of Psychology, University of Edinburgh
7 Abstract
8 Citation data and journal impact factors are important components of faculty dossiers and
9 figure prominently in both promotion decisions and assessments of a researcher’s broader
10 societal impact. Although these metrics play a large role in high-stakes decisions, the
11 evidence is mixed about whether they are strongly correlated with key aspects of research
12 quality. We use data from three large-scale studies to assess whether citation counts and
13 impact factors predict three indicators of research quality: (1) the number of statistical
14 reporting errors in a paper, (2) the evidential value of the reported data, and (3) the
15 replicability of a given experimental result. Both citation counts and impact factors were
16 weak and inconsistent predictors of research quality, so defined, and sometimes negatively
17 related to quality. Our findings impugn the validity of citation data and impact factors as
18 indices of research quality and call into question their usefulness in evaluating scientists
19 and their research. In light of these results, we argue that research evaluation should
20 instead focus on the process of how research is conducted and incentivize behaviors that
21 support open, transparent, and reproducible research.
22 Citation counts and journal impact factors do not capture key aspects of research quality
23 in the behavioral and brain sciences
24 Introduction
25 Researchers and administrators often assume that journal impact factors and citation
26 counts are indicators of research quality (e.g., McKiernan et al., 2019; Sternberg, 2016).
27 This assumption seems plausible: High impact journals may seem to have a more selective
28 and rigorous review process, thereby weeding out lower quality research as a consequence.
29 One might also view citation counts as reflecting something akin to the wisdom of the
30 crowd whereby high-quality research garners more citations than low-quality research. One
31 need not look far to see these assumptions on display: University libraries often promote
32 bibliometric indices such as citation counts and journal impact factors as indices of
33 “impact” or “quality”, academic departments use these metrics for important decisions
34 such as hiring, tenure, and promotion, and science journalists promote research from
35 high-impact journals. It is also common for authors to equate impact factors and citation
36 counts with quality (Ruscio, 2016; Sternberg, 2016) – an assumption that appears in
37 university promotion and tenure policies (McKiernan et al., 2019). The inclusion of these
38 metrics in high-stakes decisions starts with the assumption that there is a positive and
39 meaningful relation between the quality of one’s work on the one hand, and impact factors
40 and citations on the other. This raises the question: Are we justified in thinking that
41 high-impact journals or highly-cited papers are of higher quality?
42 Before proceeding, it is important to note that citation counts and journal impact
43 factors are often treated as variables to be maximized, under the assumption that high
44 citation counts and publishing in high-impact journals demonstrate that one’s work is of
45 high quality. This view implicitly places these variables on the left-hand side of the
46 prediction equation, as if the goal of research evaluation is to predict (and promote)
47 individuals who are more likely to garner high citation counts and publish in high-impact
48 factor journals. This view, whether implicitly or explicitly endorsed, is problematic for a
49 variety of reasons.
50 First, it neglects the fact that citation counts themselves are determined by a host of
51 factors unrelated to quality, or for that matter even unrelated to the science being
52 evaluated (Aksnes, Langfeldt, & Wouters, 2019; Bornmann & Daniel, 2008; Larivière &
53 Gingras, 2010). For instance, citation counts covary with factors such as the length of the
54 title and the presence of colons or hyphens in the title (Paiva, Lima, & Paiva, 2012; Zhou,
55 Tse, & Witheridge, 2019) and can be affected by other non-scientific variables, such as the
56 size of one’s social network (Mählck & Persson, 2000), the use of social media, and the
57 posting of preprints on searchable archives (Gargouri et al., 2010; Gentil-Beccot, Mele, &
58 Brooks, 2009; S. Lawrence, 2001; Luarn & Chiu, 2016; Peoples, Midway, Sackett, Lynch, &
59 Cooney, 2016). Citation counts also tend to be higher for papers on so-called “hot” topics
60 and for researchers associated with big-name faculty (Petersen et al., 2014). Not only are
61 researchers incentivized to maximize these, but it is also easy to find advice on how to game
62 them (e.g., https://www.natureindex.com/news-blog/studies-research-five-ways-increase-
63 citation-counts, https://www.nature.com/news/2010/100813/full/news.2010.406.html,
64 https://uk.sagepub.com/en-gb/eur/increasing-citations-and-improving-your-impact-factor).
65 Second, treating these variables in this way can lead to problematic inferences such as
66 inferring that mentorship quality varies by gender simply because students of male mentors
67 tend to enjoy a citation advantage (see the now retracted paper by AlShebli, Makovi, &
68 Rahwan, 2020), and even perpetuate systemic inequalities in career advancement and
69 mobility due to biases in citation patterns that disfavor women and persons from
70 underrepresented groups (Greenwald & Schuh, 1994; X. Wang et al., 2020b). Finally,
71 treating citations and impact factors as the to-be-maximized variables may alter
72 researchers’ behaviors in ways that can undermine science (Chapman et al., 2019). For
73 example, incentivizing researchers to maximize citations may lead researchers to focus on
74 topics that are in vogue regardless of whether doing so addresses key questions that will
75 advance their field.
76 If we instead think of citation counts and impact factors as predictors of success, we
77 can ask whether they are valid proxies for assessing key aspects of the quality of a
78 researcher. To be clear, these metrics are not and cannot be considered direct measures of
79 research quality, but of course, addressing this question requires a way of measuring quality
80 that is independent of citation counts. Past work on this topic has primarily addressed the
81 issue by relying on subjective assessments of quality provided by experts or peer reviews.
82 On balance, these studies have shown either weak or inconsistent relationships between
83 quality and citation counts (e.g., Nieminen, Carpenter, Rucker, & Schumacher, 2006;
84 Patterson & Harris, 2009; West & McIlwaine, 2002). One challenge in relying on subjective
85 assessments is that their use assumes that the judges can reliably assess quality—an
86 assumption that has been challenged by Bornmann, Mutz, and Daniel(2010), who showed
87 that the inter-rater reliability of peer review ratings is extremely poor. Indeed, controlled
88 studies of consistency across reviewers also indicate a surprisingly high level of arbitrariness
89 in the review process (see Francois, 2015; Langford & Guzdial, 2015; Peters & Ceci, 1982).
90 Peters and Ceci(1982), for instance, resubmitted 12 articles (after changing the author
91 names) that had previously been accepted for publication in psychology journals and found
92 that the majority of the articles that had been accepted initially were rejected the second
93 time around based on serious methodological errors. Similarly, N. Lawrence and Cortes
94 (2014) reported a high degree of arbitrariness in accepted papers submitted to the Neural
95 Information Processing Systems annual meeting, widely viewed as one of the premier peer
96 reviewed outlets in Machine Learning. In this study, a random sample of submissions went
97 through two independent review panels; the authors estimated that 60% of decisions
98 appeared to arise from an arbitrary process. By comparison, a purely random process
99 would have yielded an arbitrariness coefficient of 78%, whereas a process without an
100 arbitrary component would have yielded a coefficient of 0%. If reviewers who are
101 specifically tasked with judging quality cannot agree on the acceptability of research for
102 publication, then it is unlikely that impact factors or citation counts, which themselves are RESEARCH QUALITY IN BEHAVIORAL SCIENCE 6
103 dependent on the peer review process, are reliable indicators of quality.
104 Recent discussions have raised several issues with the use of bibliometrics for faculty
105 evaluation and the incentive structure for appointments, tenure, and promotion. First,
106 there is growing concern about the improper use of bibliometrics when evaluating faculty.
107 This concern has been expressed by numerous scholars (Aksnes et al., 2019; Hicks, Wouters,
108 Waltman, de Rijcke, & Rafols, 2015; Moher et al., 2018; Seglen, 1997) and has been codified
109 in the San Francisco Declaration on Research Assessment (DORA) – a statement that has
110 been endorsed by hundreds of organizations, including the Association for Psychological
111 Science. Second, several recent analyses challenge the traditional orthodoxy that bibliometrics
112 are valid indicators of research quality. For instance, Fraley and Vazire (2014) and Szucs
113 and Ioannidis (2017) report a moderate negative correlation (approximately −0.42) between
114 impact factor and statistical power in the domains of social psychology and cognitive
115 neuroscience, respectively (see also Brembs, 2018; Brembs, Button, & Munafò, 2013).
116 Despite these results and repeated calls for scaling back the use of bibliometrics in research
117 evaluation, their use is still widespread in evaluating faculty job candidates and promoting
118 them on the basis of their research trajectory (McKiernan et al., 2019).
119 The use of impact factors and citation counts as indices of a scholar’s future
120 performance also highlights that these metrics are not only used retrospectively, they are
121 also intended to be diagnostic of how scholars will perform in the future. One of the
122 principal functions of the hiring and tenure processes is prediction: The goal is to predict
123 (and select) scientists with the most promise for advancing science. Realizing this goal
124 requires both that we have access to valid predictors of future scientific accomplishments
125 and that we use those indicators appropriately; there is evidence for neither
126 (Chapman et al., 2019). As discussed, there are several reasons to think that these metrics
127 do not predict quality (Aksnes et al., 2019; Hicks et al., 2015; Moher et al., 2018; Seglen,
128 1997). For hiring and tenure committees to establish their usefulness, they would need to
129 formally model how these factors account for a scholar’s future performance – to our
130 knowledge, this is never done. Moreover, even if hiring and tenure committees
131 actually built such a model, search, hiring, and tenure committees would need to
132 calibrate their decision-making in accordance with the model. We suspect, again, that this
133 is not and would not be done.
134 According to the view that impact factors and citations are useful for assessing research
135 quality, one would expect reasonable, or even strong, positive statistical relations with
136 objective indices of research quality. Evidence to the contrary or even a lack of evidence
137 would undermine their use. Existing research already suggests that journal impact factors
138 and citation counts are given weight far beyond how well they could plausibly predict
139 future performance at the expense of other diagnostic measures. Although quality is a
140 complex construct, there are a handful of factors that we believe are indicative of critical
141 aspects of research quality (though, without question, not all aspects). These include
142 methodological features of an article such as use of random assignment (when feasible) and
143 the use of appropriate control groups, the appropriate reporting and transparency of
144 statistical analyses, the accuracy with which results are reported, the statistical power of
145 the study, the evidentiary value of the data (i.e., the degree to which the data provide
146 compelling support for a particular hypothesis, including the null hypothesis), and the
147 replicability of research findings. Some aspects of research quality can be assessed only
148 with close inspection by domain experts (e.g., Nieminen et al., 2006; Patterson & Harris,
149 2009; West & McIlwaine, 2002), but many others can be ascertained using automated data
150 mining tools.
151 In this paper, we take the latter approach by examining how citation counts and
152 journal impact factors relate to indices of key aspects of research quality: (1) the number
153 of errors in the reporting of statistics in a paper, (2) the strength of statistical evidence
154 provided by the data, as defined by the Bayes factor (Edwards, Lindman, & Savage, 1963),
155 and (3) the replication of empirical findings. Inasmuch as impact factors and citation
156 counts are indices of critical aspects of research quality, papers in higher impact journals RESEARCH QUALITY IN BEHAVIORAL SCIENCE 8
157 and highly cited papers should have fewer reporting errors, provide more credible evidence
158 to support claims, and be independently replicable at a higher rate than research appearing in
159 lower impact journals or articles with fewer citations. For these indices to be useful for
160 research evaluation, the associations should be reasonably large. The advantages of our
161 approach are that our chosen indices are quantifiable, verifiable, transparent, and not
162 reliant on subjective evaluations of reviewers who may not agree about the quality of a
163 paper.
164 The analyses reported herein bring together three data sources to address the validity
165 of citation counts and journal impact factors as indices of research quality. These data
166 sources include a data set used in a large-scale assessment of statistical reporting
167 inconsistencies across 50,845 articles in behavioral science and neuroscience journals
168 published between 1985 and 2016 (Hartgerink, 2016) and 190 replication studies from a
169 collection of repositories (see Reinero et al., 2020). Each data set was merged with the
170 2017 journal impact factors and/or article-level citation counts. The raw data and code
171 used to generate our analyses and figures are available on the Open Science Framework:
172 https://osf.io/hngwx/. Collectively, our analyses represent one of the most comprehensive
173 assessments of the validity of bibliometrics as indicators of research quality within the
174 behavioral and brain sciences.
175 Analytic strategy
176 Throughout, we perform a combination of Bayesian estimation and frequentist
177 analyses. For all Bayesian models, we set weakly regularizing priors in brms (Bürkner,
178 2017), which are detailed in our analysis scripts on the Open Science Framework. Crudely
179 put, weakly regularizing priors guide the MCMC estimation process but, particularly when
180 a data set is large, allow the posterior to primarily reflect the data. Posterior predictive
181 checks were conducted for each model to verify model fit. Posterior predictive checking
182 involves simulating the data from the fitted model and graphically comparing the RESEARCH QUALITY IN BEHAVIORAL SCIENCE 9
183 simulations to the observed distribution. This step is useful for diagnosing
184 mis-specifications of the statistical model and ensuring that the model adequately
185 reproduces the distribution. Code for conducting the posterior predictive checks is included
186 in the analysis scripts provided on OSF. Frequentist tests are provided for ordinal
187 correlations and proportion tests only.
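To make this workflow concrete, the following is a minimal sketch of the kind of brms call and posterior predictive check described above; the model formula, data set, and variable names are hypothetical stand-ins rather than the exact specifications used in our scripts.

library(brms)

# Hypothetical model: error counts predicted by (logged) bibliometric variables.
fit <- brm(
  n_errors ~ log_impact_factor + log_citations,
  data   = article_data,                           # hypothetical data frame
  family = poisson(),
  prior  = set_prior("normal(0, 1)", class = "b")  # weakly regularizing slopes
)

# Posterior predictive check: draw simulated data sets from the fitted model
# and compare them graphically to the observed outcome distribution.
pp_check(fit, ndraws = 100)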
188 We had specific questions we sought to answer about the relationship between
189 citation counts, impact factors, and measures of research quality, but our analyses are
190 nonetheless exploratory. Consequently, we checked the robustness of our inferences by
191 changing model parameterizations and prior choices, including or excluding different
192 covariates and group-level effects, and making alternative distributional assumptions (e.g.,
193 comparing model fits with Gaussian vs. Beta distributions). Throughout the paper, we will
194 report 95% credible intervals, but our focus is on characterizing the magnitude of the
195 effects, rather than on binary decisions regarding statistical significance (or not). To assess
196 the usefulness of the various covariates for prediction, we also computed the Bayesian
197 analogue of the R² statistic based on leave-one-out (LOO) cross validation (Gelman,
198 Goodrich, Gabry, & Vehtari, 2019). The LOO adjusted R² provides a better estimate of
199 out-of-sample prediction accuracy compared to the Bayesian R².
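As a sketch, both quantities can be obtained directly from a fitted brms model (here, fit is the hypothetical model object from the sketch above):

bayes_R2(fit)   # in-sample Bayesian R^2
loo_R2(fit)     # LOO-adjusted R^2; typically lower, and a better guide to out-of-sample accuracy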
200 Question 1: Do article citation counts and journal impact factors predict more
201 accurate statistical reporting?
202 Hartgerink (2016) used statcheck (Nuijten, Hartgerink, van Assen, Epskamp, &
203 Wicherts, 2016), an automated data mining tool, to identify statistical reporting
204 inconsistencies among 688,112 statistical tests appearing in 50,845 journal articles in
205 content areas related to the behavioral and brain sciences. Articles included in the data set
206 were published between 1985 and 2016. A full description of the data and initial analyses
207 are provided by Nuijten and colleagues (2016), but we detail critical aspects of the
208 statcheck data set which suggest it is well suited to addressing our research question. statcheck
209 is an R package that uses regular expressions to extract American Psychological
210 Association (APA) formatted results from the Null Hypothesis Significance Testing
211 framework (Epskamp & Nuijten, 2014). Among other things, statcheck recalculates the
212 p-values within a paper based on the test statistics and degrees of freedom reported with
213 each test. statcheck has been shown to be a reliable indicator of inconsistencies in APA
214 formatted papers. For instance, Nuijten and colleagues (2016) calculated the interrater
215 reliability between statcheck and hand coding, finding that statcheck and hand coders
216 had an interrater reliability of .76 for inconsistencies between p-values and reported test
217 statistics and .89 for the “gross inconsistencies.” We should note that although quite
218 reliable, naturalistic datasets—including the data set used here—will often have minor
219 limitations that can affect data quality. These data quality issues can arise for several
220 reasons including errors in converting PDF files to text files, or more systematic errors due
221 to some subfields reporting their analyses in ways which deviate from the American
222 Psychological Association Style guide. To consider one example, in reviewing the
223 statcheck data, we observed that many χ² statistics appeared to be miscoded. In
224 particular, we observed that statcheck sometimes mis-identified sample size as the degrees
225 of freedom for the χ² statistic. This leads to an incorrect computation of the p-value. To
226 address this issue, we ran analyses both including and excluding all χ² statistics. The
227 reported conclusions are based on analyses that included the χ² statistics, but we note here
228 that analyses with the χ² statistics converged with those without. The only analyses for
229 which the mis-coded χ² statistics are included are those reporting decision error rates.
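The core of what statcheck does can be illustrated with a small sketch (this is a conceptual illustration, not the package’s source code); the reported values below are the hypothetical example used later in the text (t = 1.90, df = 178).

# Conceptual illustration of a statcheck-style consistency check:
# recompute the two-tailed p-value from a reported t statistic and df,
# then compare it with the p-value the authors reported.
reported_t  <- 1.90
reported_df <- 178
reported_p  <- .04                        # hypothetical reported value

recomputed_p <- 2 * pt(abs(reported_t), df = reported_df, lower.tail = FALSE)
recomputed_p                              # approximately .059

# A decision error occurs when the reported and recomputed p-values fall on
# opposite sides of the .05 threshold.
(reported_p <= .05) != (recomputed_p <= .05)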
230 These issues notwithstanding, because of the established reliability of the statcheck
231 data set, we merged this open-source data set with article-level citation counts and the
232 2018 journal impact factors to examine how reporting errors relate to these two
233 metrics. Citation counts were obtained using the
234 rcrossref (Chamberlain, Zhu, Jahn, Boettiger, & Ram, 2019) package in R and the
235 impact factors were obtained using the scholar (Keirstead, 2016) package in R. Data
236 from each article were then summarized to reflect the number of statistical reporting errors
237 per article and the total number of statistical tests. Articles with fewer than two statistical
238 tests do not allow for computing variability across tests and thus we excluded these papers
239 from our analyses. The final data set used for our analysis included 45,144 articles and
240 667,208 statistical tests.
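As a sketch of the merging step, article-level citation counts can be retrieved by DOI with rcrossref; the DOI below is a placeholder, and the full merging and summarization code is in our scripts on the Open Science Framework.

library(rcrossref)

# Placeholder DOI; in practice this step is applied to every article in the
# statcheck data set and the result is joined back to the article records.
cr_citation_count(doi = "10.1037/xxx0000000")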
241 To address the above question, we examined the degree to which citation counts and
242 impact factors predict quality in three ways. First, we examined whether articles containing
243 at least one error were cited more or less than articles with no errors. Of the 45,144 articles
244 included in the analysis, roughly 12.6% (N = 5,710) included at least one statistical
245 reporting inconsistency that affected the decision of whether an effect was statistically
246 significant or not. The majority of the decision errors (N = 8,515) included in these
247 articles involved authors reporting a result as significant (p ≤ .05), when the recalculated
248 p-value based on the reported test-statistic (e.g., t = 1.90) and df (e.g., df = 178) was
249 greater than .05. Only N = 1,589 consisted of the reverse error, in which a p-value was
250 reported as non-significant or p > .05 when the test statistic indicated otherwise. This
251 asymmetry suggests that these reporting errors are unlikely to be random. Indeed, of the
252 reported tests in which the recomputed p-value was greater than .05 (N = 178,978), 4.76%
253 (N = 8,515) were incorrectly reported as significant, whereas only 0.32% (N = 1,589) of
254 the recomputed p-values that were significant (N = 488,154) were incorrectly reported as
255 non-significant – a difference that is statistically reliable (Z = 131.31, p < 0.0001 with a
256 proportion test). More interesting is the fact that articles containing at least one reporting
257 error (M = 51.1) garnered more citations than articles that did not contain any errors (M
258 = 45.6), Z = 6.88,p < .0001 (Mann-Whitney U), although the difference is small. Thus, at
259 least by this criterion, citation counts do not reflect quality.¹
¹ One question is whether researchers conceive of citation counts as an index of research quality in the first place rather than as an index of research impact. Replacing ‘quality’ with ‘impact’ raises the question: what is an appropriate index of impact independent of citations? If what we mean by ‘impact’ is societal impact, then it is clear that citation counts are not the right metric prima facie, because citation counts primarily reflect academic, not societal, consideration. We suggest that regardless of how else we might define impact, it is bound up with an initial assumption that the research in question is high-quality (i.e., replicable, robust, and so on).
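The asymmetry test reported above can be reproduced with a standard two-sample proportion test, using the counts given in the text:

# Errors reported as significant when the recomputed p > .05, versus errors
# reported as non-significant when the recomputed p <= .05.
prop.test(x = c(8515, 1589), n = c(178978, 488154))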
260 Although the above analysis of citation counts neglects the multilevel structure and
261 complexity of the data, we share it because it is likely that people use citation counts in a
262 similar simplistic way when inferring research quality, following the general principle of
263 “more is better” without adjusting appropriately for differences across subdisciplines or
264 accounting for other relevant factors. We also examined the relationship between frequency
265 of errors and impact factor and citations counts using a Bayesian multilevel zero-inflated
266 Poisson model. This model predicted the frequency of errors per paper on the basis of the
267 log of a journal’s impact factor and the log of an article’s citations (+1), controlling for the
268 year of publication and number of authors, with the total number of reported statistical
269 tests per paper included as the offset in the model. Year of publication was included
270 because there have been large changes in APA formatting of statistical analyses since 1985,
271 as well as changes in statistical software over this period of time. We also considered it
272 plausible that the number of authors on a paper could affect error rates in a paper (e.g.,
273 more authors may check over the work being submitted and so could reduce the errors in a
274 paper) and so we conducted our analyses with and without the number of authors as a
275 covariate in this model. The inclusion of this covariate did not materially change the
276 conclusion of this set of analyses. We also classified each journal into one of 14 content
277 areas within psychology (clinical, cognitive, counseling, developmental, education,
278 experimental, general, health, industrial/organizational, law, methodology, neuroscience,
279 social, and “other”). The classification of the journals into content areas allowed us to
280 control for potential variation in publication practices across areas by including content
281 area as a grouping-factor. However, journal was not treated as a grouping-factor in the
282 model because it is perfectly collinear with journal impact factor (i.e., the population and
283 grouping-level effects are identical).
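A minimal sketch of this model, assuming hypothetical column names for the merged article-level data set, is shown below; the exact parameterization and priors are in the analysis scripts on OSF.

library(brms)

fit_errors <- brm(
  n_errors ~ log(impact_factor) + log(citations + 1) + year + n_authors +
    offset(log(n_tests)) + (1 | content_area),
  data   = article_data,                 # hypothetical data frame
  family = zero_inflated_poisson()
)
summary(fit_errors)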
284 The Poisson regression revealed that only journal impact factor b = −0.210 (−0.280,
285 −0.139)² and year of publication b = −0.108 (−0.134, −0.082) predicted decision errors.
286 effect of citation counts was not credibly different from zero, b = −0.008 (−0.030, 0.015),
287 nor was number of authors b = 0.023 (−0.001, 0.046). Table 1 provides leave-one-out R²’s,
288 along with the parameter estimates for each predictor when modeled separately. For these
289 models, each predictor was entered into the model separately, including both content area
290 and the offset (number of statistical tests) in the model as control variables. These
291 analyses provide converging evidence that the conclusions based on the full model hold
292 when each predictor is modeled separately; however, they also illustrate that none of the
293 predictors are of “practical” significance, as evidenced by the R²’s: Articles in higher
294 impact journals were associated with fewer statistical decision errors, but this relationship
295 is weak. To put this in perspective, the Kendall’s tau rank-order correlation between
296 impact factor and percent decision errors per paper, an analysis that doesn’t control for the
297 other factors, is non-significant and inconsequentially small (τ = −.002, z = −0.506, p >
298 .05), while the rank order correlation between decision errors and citation counts is also
299 small but significantly positive (τ = .024, z = 6.403, p < 0.01). Adjusting for the necessary
300 grouping and population effects, however, decreases this small bivariate correlation. Thus,
301 even though citation counts and impact factors often figure prominently in, for instance,
302 tenure decisions, these metrics provide very little information relevant to judging research
303 quality, as measured by decision-errors. Analysis of non-decision errors are provided in the
304 supplemental material. Conclusions based on these analyses are generally consistent with
305 those based on the decision errors: The relationships are weak and not of practical
306 significance. In both cases, the negative relation between the number of errors and the
307 journal impact factor can be interpreted as reflecting the quality of the review
308 process (maybe reviewers or editors of higher-impact journals are more likely to catch
309 errors), the strength of the journal’s editorial team, or (perhaps) the possibility that
310 authors are more careful when submitting to higher impact factor journals. To address the
311 question of quality more directly, we next evaluate whether the strength of evidence
312 presented in papers varies with impact factors or citation counts.
² However, when considering only journals with impact factors of two or greater, we observed no relationship between journal impact factor and decision errors. This suggests that the negative relationship between impact and quality should be interpreted with considerable caution.
Figure 1. Number of statistical decision errors per 100 tests as a function of (a) the number of times the article was cited, (b) journal impact factor, (c) the number of authors on the paper, and (d) year of publication. Shaded regions indicate 95% credible intervals. Analyses are based on N = 45,144 published articles.
Figure 2. Bayesian estimated Bayes factors for all t-tests and one-degree-of-freedom F-tests that were statistically significant at the p < .05 level (N = 299,316) as a function of (a) the number of times the article was cited, (b) journal impact factor, (c) year of publication, (d) the number of authors on the paper, and (e) the reported degrees of freedom. Shaded regions indicate 95% credible intervals.
313 Question 2: Do highly cited papers and papers in high impact factor journals
314 report stronger statistical evidence to support their claims?
315 Carl Sagan noted that extraordinary claims require extraordinary evidence. From a
316 Bayesian perspective, this can be interpreted to mean that surprising or novel results, or
317 any result deemed a priori unlikely, require greater evidence. Many papers published in
318 high-impact journals are considered a priori unlikely according to both online betting
319 markets and survey data (Camerer et al., 2018). Moreover, editorial policies for some higher
320 impact journals emphasize novelty as a criterion for publication—authors understand that
321 research reports should be “new” and of “broad significance.” Assuming high-impact
322 journals are more likely to publish novel and surprising research, then one might also
323 expect these journals to require a higher standard of evidence. One way to quantify the
324 strength of the evidence is with the Bayes factor (Morey, Romeijn, & Rouder, 2016). The
325 Bayes factor provides a quantitative estimate of the degree to which the data support the
326 alternative hypothesis versus the null. For instance, a BF10 of seven indicates that the data
327 are seven times more likely under the alternative hypothesis than under the null as those
328 hypotheses are specified. All else being equal, studies that provide stronger statistical
329 evidence are more likely to produce replicable effects (Etz & Vandekerckhove, 2016), and
330 hold promise for greater scientific impact because they are more likely to be observable
331 across a wide variety of contexts and settings (Ebersole et al., 2016; Klein et al., 2018).
332 To conduct these analyses, we computed the Bayes factors for all t-tests, one-degree
333 of freedom F-tests, and Pearson r’s (N = 299,504 statistical tests from 34,252 journal
334 articles) contained within the Hartgerink(2016) dataset in which the recomputed p-value
335 based on the test-statistic and degrees of freedom was p ≤ .05. This procedure allows us to
336 evaluate the general strength of evidence presented by authors in support of their claims,
337 but does not allow us to evaluate cases in which authors’ theoretical conclusions are based
338 on a single statistic amongst many that might be presented.
339 The BF was computed using the meta.ttestBF function in the BayesFactor
340 package in R, which computes the Bayes factor based on a t-statistic and sample size.
341 Thus, we first converted the F and r statistics to t statistics before computing the Bayes
342 factor for a given test. Because our automated approach cannot differentiate between
343 within and between-subjects designs, we computed two separate BF’s for each statistic, one
344 assuming within subjects and one assuming between-subjects. Sample size for the
345 meta.ttestBF function was estimated from the degrees of freedom, with N = df + 1 for
346 the one-sample case and N1 = N2 = (df + 2)/2 for the two-sample case, rounded to the
347 nearest integer. Because the two estimates are highly correlated (r=0.99) and yield similar
348 findings, we report only the results of the two-sample case. It is also important to note
349 that while this procedure makes the intercept of our models difficult to interpret, our
350 primary interest was in the slope parameters – we sought to understand the relationship
351 between evidential strength and both journal impact factors and citation counts, not the
352 overall or average magnitude of Bayes factors in the papers in our dataset.
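A sketch of the computation for a single, hypothetical two-sample result is shown below; the standard conversions (t = sqrt(F) for 1-df F tests; t = r * sqrt(df / (1 - r^2)) for correlations) can be applied first to put all tests on the t scale.

library(BayesFactor)

# Hypothetical reported result: t(58) = 2.50 from a two-sample design.
t_stat <- 2.50
df     <- 58
n_per_group <- round((df + 2) / 2)        # two-sample case, as described above

bf <- meta.ttestBF(t = t_stat, n1 = n_per_group, n2 = n_per_group)
extractBF(bf)$bf                          # Bayes factor (BF10) for this test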
353 Because the distribution of the Bayes factor was extremely skewed, we converted
354 them to probabilities and used beta regression to analyze the data. The conversion to the
355 probability scale was not possible for a small number (N = 188) of the Bayes factors due to
356 their extremity, resulting in a total of 299,316 total observations being used in the final
357 analysis. As a robustness test, we also repeated all of the analysis after applying the rank
358 normal transformation to the Bayes factor. These analyses are provided on OSF, and are
359 consistent with the results of the beta regression models reported here.³
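A sketch of this step, assuming the conversion p(H1) = BF10 / (1 + BF10) and hypothetical column names (the exact transformations and model specifications are in the OSF scripts):

library(brms)

# Map Bayes factors onto the (0, 1) scale and model them with a beta regression.
test_data$prob_h1 <- test_data$bf10 / (1 + test_data$bf10)

fit_bf <- brm(
  prob_h1 ~ impact_factor + citations + n_authors + year + df +
    (1 | content_area),
  data   = test_data,                    # hypothetical data frame
  family = Beta()
)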
360 Under the common assumption that citation counts and impact factors reflect
361 scientific impact or quality, one would expect these indices to be positively correlated with
362 the evidentiary value of the data represented in the publications. However, this does not
363 seem to be the case. In fact, as we demonstrate next, there is a tendency for the
364 evidentiary value of the data to be lower in high-impact journals.
³ The log transformation of the Bayes factor did not sufficiently correct for the skewness; models other than the beta model or a Gaussian model applied to the rank normal transformation generally failed posterior predictive checks.
365 The “centrality” of a hypothesis can only be assessed indirectly in these analyses, but
366 we work from the assumption that tests reported in a paper, particularly those that are
367 statistically significant, will tend to be the tests most central to an author’s argument.
368 Focusing on significant tests, we used citation counts and impact factors as predictors of a
369 test’s Bayes factor, controlling for content area, year of publication, number of authors of
370 the paper, and the reported degrees of freedom (as a surrogate for sample size).⁴ We
371 observed that the magnitude of the Bayes factor was negatively related to the journal
372 impact factor, b = −0.033 (−0.046, −0.020), citation counts, b = −0.004 (−0.007, −0.0002),
373 and number of authors b = −0.014 (−0.018, −0.010). Both year of publication b = 0.019
374 (0.015, 0.023) and degrees of freedom b = 0.184 (0.124, 0.248) were positively related.
375 Table 1 provides the regression coefficients and LOO R²’s for each predictor modeled
376 separately. Although the magnitude of the relationship is too small to be of practical
377 relevance, the fact that higher impact journals and papers with more citations are
378 associated with less evidentiary value underscores the central thesis that neither the
379 impact factor nor citation counts are valid indicators of key aspects of research quality. That
380 is, it is unlikely that we would have observed a small negative relationship if the relationship between
381 evidentiary value and journal impact factor were, in fact, strong and positive.
382 The credible yet small negative relationship between evidentiary value and number of
383 authors indicates that papers with greater numbers of authors were also associated with a
384 given test providing less evidentiary value. To speculate, in the absence of preregistered
385 analysis plans, we conjecture that the number of alternative ways the data are analyzed
386 grows as a function of number of authors, a hypothesis consistent with the findings of
387 Silberzahn et al.(2018). In turn, it’s possible that weak evidence discovered during
388 extended exploratory data analysis is over-interpreted. However, further research would be
389 needed to confirm this hypothesis. As with analyses of the decision errors, the Bayesian R²’s
390 for these analyses are small, indicating that despite the fact that the results are reliable,
391 none of the predictors account for much variance in the magnitude of the Bayes factor (see
392 Table 1).
⁴ The inclusion of degrees of freedom also serves as a sanity check, as the Bayes factor should scale with sample size, ceteris paribus.
393 Another way to look at the data is to model the distribution of p-values directly. We
394 examined the degree to which the magnitude of the computed p-value co-varied with
395 impact factors, citation counts, and number of authors, controlling for year of publication,
396 number of statistical tests reported for each article, and content area within psychology.
397 We modeled only those p-values ≤ .05 under the assumption that these are the most
398 meaningful of authors’ findings: Of the 667,208 total statistical tests included in the
399 dataset, 488,151 of the re-computed p-values were less than .05. A Bayesian beta-regression
400 model again revealed that both the impact factor b=.041 (.032,.050) and the number of
401 authors b = .008 (.006, .011) were weak but reliable predictors of the magnitude of the
402 reported p-values: Journals with higher impact factors and papers with more authors were
403 associated with the reporting of higher p-values. Citation counts were also credibly related
404 to the magnitude of the p-value, but in this case negatively b=−.004 (−.007, −.002). Both
405 year of publication, b=−.017 (−.020, −.014), and the log of the total number of tests
406 included in each paper, b=−.007 (−.01, −.003), were also statistically reliable predictors. As
407 a robustness check, we fit a separate model without controlling for the total number of
408 reported statistical tests. This analysis did not alter our conclusions.
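A parallel sketch for this model, again with hypothetical column names and with the exact specification in the OSF scripts:

library(brms)

fit_p <- brm(
  p_recomputed ~ impact_factor + citations + n_authors + year +
    log(n_tests) + (1 | content_area),
  data   = subset(test_data, p_recomputed <= .05),
  family = Beta()
)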
409 Question 3: Do citation counts or journal impact factors predict replicability of
410 results?
411 To address this third question, we used an openly available data set that included
412 data from eight different replication projects, including the Reproducibility Project (Open
413 Science Collaboration, 2015), Many Labs 1 (Klein et al., 2014), Many Labs 2 (Klein et al.,
414 2018), Many Labs 3 (Ebersole et al., 2016), a special issue of Social Psychology
415 (Epstude, 2017), the Association for Psychological Science Registered Reports repository
416 (Simons, Holcombe, & Spellman, 2014), the pre-publication independent replication
417 repository (Schweinsberg et al., 2016), and Curate Science (Aarts & LeBel, 2016).
Figure 3. Violin plots of (a) the number of times each paper was cited, (b) the number of times the first author was cited, (c) rated institutional prestige, and (d) rated surprisingness of the experimental findings, split by studies that were and were not successfully replicated. Medians are given by boxed numbers; means are unboxed. Plots are based on the N = 100 replication attempts included in Open Science Collaboration (2015).
Reinero
418 et al.(2020) collated these data, which also includes the number of citations to each
419 original article, the number of authors on the original article, year of publication, and the
420 Altmetric score for the original article. The full dataset curated by Reinero et al.(2020) is
421 available on the Open Science Framework (https://osf.io/pc9xd/). We merged these data
422 with the 2017 journal impact factors obtained using the scholar (Keirstead, 2016) package
423 in R and coded each journal by subdiscipline (clinical, cognitive, social, judgment and
424 decision making, general, marketing, and ‘other’).⁵ Of the 196 studies included in Reinero
425 et al.(2020), we were unable to obtain journal impact factors for four of the publications;
426 an additional two papers were not coded for replication success. Thus, the total number of
427 studies included in our analysis was N = 190.
428 A subset of the replications included in this dataset are from the Open Science
429 Collaboration(2015) replication project and include a variety of other variables, including
430 ratings of the “surprisingness” of each original finding (the extent to which the original
431 finding was surprising, as judged by the authors who participated in the replication
432 project) as well as citations to the first author of the original paper and prestige of the
433 authors institution. Our original analyses provided in early drafts of this manuscript
434 (https://psyarxiv.com/9g5wk/) focused only on the 100 studies included in the Open
435 Science Collaboration(2015) repository. We expanded our analysis in March 2021 to
436 include all of the studies reported in Reinero et al.(2020). The Open Science Collaboration
437 (2015) dataset includes only three journals (Journal of Experimental Psychology: Learning,
438 Memory and Cognition; Journal of Personality and Social Psychology; and Psychological
439 Science), and the impact factors for these three journals are completely confounded with
440 content area within psychology. The expanded dataset includes 190 total publications in
441 twenty-seven different journals, thereby enabling us to include impact factor as a predictor
442 in our models.
443 To examine whether impact factors and citation counts predict replication success, we
444 conducted a Bayesian multi-level logistic regression predicting replication success from the
445 log(impact factor), log(citation count + 1), and the log(number of authors), with year of
446 publication and area within psychology included as random (grouping) factors, using
447 weakly informative priors. This analysis revealed that only impact factor, b = −0.822, 95%
448 CI (−1.433, −0.255), predicted replication success, albeit in the opposite direction from that
449 desired by proponents of their use. Neither citation counts, b = 0.237, 95% CI
450 (−0.010, 0.490), nor number of authors, b = 0.179, 95% CI (−0.178, 0.536), was credibly
451 related to replication success. As a robustness analysis, we fit alternative models excluding
452 ‘area’ as a grouping factor, modeling the predictors individually, and specifying alternative
453 prior distributions. These models all yielded consistent findings, with the slope of the
454 impact factor consistently credibly different from zero, ranging from −0.53 to −1.03, and
455 neither the slope for citation counts nor that for number of authors credibly different from zero.
⁵ We also manually looked up each impact factor on the journal website to obtain the most recently available impact factor, as the scholar package returned the 2017 impact factor. The manually obtained impact factors correlated r = 0.99 with impact factors obtained using the scholar package. We therefore report analyses based on the scholar data. However, we note here that the conclusions based on the manual impact factors are essentially the same.
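A minimal sketch of this model, with replication_data and its columns as hypothetical stand-ins for the merged Reinero et al. (2020) data set:

library(brms)

fit_rep <- brm(
  replicated ~ log(impact_factor) + log(citations + 1) + log(n_authors) +
    (1 | year) + (1 | area),
  data   = replication_data,
  family = bernoulli()
)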
456 The finding that the impact factor is credibly negative is inconsistent with the belief
457 that it is a valid index of quality. Still, proponents of its use may question our results on
458 grounds that our priors do not adequately capture their prior beliefs. This critique can be
459 easily addressed within the Bayesian framework by incorporating a prior that reflects the
460 assumption that papers in higher impact factor journals are of higher quality. We therefore
461 define a model with priors ∼ N (0.50, 1.00) for the slope parameters for impact factor,
462 citation counts, and number of authors to reflect the proponents’ belief that these all reflect
463 key aspects of quality. This model once again finds a credible negative slope for impact
464 factor b = −0.716, 95% CI (−1.276, −0.178) and no credible relation with citation counts b
465 = 0.225, 95% CI (−0.019, 0.475) or number of authors b = 0.193, 95% CI (−0.155, 0.534).
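Continuing the sketch above, the ‘proponent’ prior simply centers the slope priors on a positive value rather than on zero; refitting with brms might look like this:

library(brms)

proponent_prior <- set_prior("normal(0.5, 1)", class = "b")

# update() refits the model from the previous sketch with the new priors.
fit_rep_proponent <- update(fit_rep, prior = proponent_prior)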
466 Table 1 provides the parameter estimates for each predictor modeled separately. For these
467 models, both year of publication and ‘area’ within psychology are included as random
468 factors. Although the journal impact factor is negatively related to replication success, here
469 too the Bayesian R² is small and not of practical significance. Nevertheless, that the
470 relation is negative should serve to caution against its use as a positive indicator of
471 research quality.
472 As noted above, a subset of the data were drawn from the Open Science
473 Collaboration(2015) repository. This dataset included a number of additional variables
474 that might be viewed as reflective of quality, including institutional prestige, number of
475 citations garnered by the author, as well as number of citations to the original paper. As
476 an index of ‘novelty’, the dataset also includes ratings of surprisingness. Figure 3 displays
477 violin and boxplots for the subset of studies in Open Science Collaboration(2015), split by
478 whether they were successfully replicated. We conducted a Bayesian logistic regression
479 using number of citations to the original paper, number of citations to the first author,
480 institutional prestige and surprisingness as predictors of whether the original finding was
481 replicated, with replication success defined based on whether the replication study was
482 statistically significant. Of these indices, only surprisingness b = −0.663, 95% CI (−1.225,
483 −0.121) predicted whether a finding was successfully replicated. Consistent with results
484 from prediction markets (Camerer et al., 2018), studies with low prior probability—those
485 that would surprise us if true—were less likely to replicate (e.g., Extrasensory perception is
486 both highly surprising and the findings for it are not replicable). Neither the number of
487 citations to the original paper b = 0.005 95% CI (−0.003, 0.014), the number of citations
488 garnered by the first author of the original paper b = −0.00005 95% CI (−0.0002, 0.00005),
489 nor the institutional prestige of the first author were predictive of replication success b =
490 0.014 (−0.317, 0.274) amongst this subset of replications.⁶
491 The above analyses are consistent with the results presented for Questions 1 and 2, in
492 that neither journal impact factor nor citation counts reflect key aspects of research
493 quality. If anything, the evidence seems to suggest that higher impact factor journals
494 publish work that is less replicable. Thus, in regard to the question of whether citation
495 counts or journal impact factors are reflective of quality, as defined by replicability, the
496 answer appears to be “no”, at least given the data used to address this question.
⁶ We also conducted analyses using the total number of citations to the senior author of each paper and institutional prestige of the senior author’s institution. The conclusions are unchanged.
Table 1
Parameter estimates and leave-one-out (LOO) adjusted R² for each predictor modeled separately. Each model includes the predictor, the grouping factor (area of psychology), and, when appropriate, the offset. Question 3 also includes year as a random factor.

DV                             Predictor            Effect (95% CI)         Bayesian LOO R²
Question 1: Decision errors    JIF                  −.146 (−.214, −.079)    .021
                               Citations             .004 (−.016, .024)     .021
                               Authors              −.036 (−.079, .008)     .021
                               Year                 −.079 (−.11, −.06)      .022
Question 2: Bayes factor       JIF                  −.056 (−.067, −.044)    .0049
                               Citations            −.014 (−.017, −.011)    .0048
                               Authors              −.010 (−.014, −.006)    .0050
                               Year                  .021 (.018, .025)      .0045
                               Degrees of freedom    .190 (.131, .254)      .0044
Question 3: Replication        JIF                  −.640 (−1.202, −.115)   .022
                               Citations             .068 (−.141, .281)     −.007
                               Authors               .158 (−.187, .503)     −.005
498 Discussion
499 Goodhart’s Law states that a measure ceases to be a good measure when it becomes
500 a target (Goodhart, 1975). Even if citation counts and journal impact factors were once
501 reasonable proxies for quality (though there is no evidence to suggest this), it is now clear
502 that they have become targets and as a result lack validity as measures of faculty
503 performance (Fire & Guestrin, 2019). Our findings are consistent with this conclusion:
504 Neither citation counts nor impact factors were meaningfully related to research quality, as
505 defined by errors in statistical reporting (Question 1), strength of evidence as determined
506 by the Bayes factor (Question 2), or replicability (Question 3). Though there is some
507 evidence in Question 1 that impact factor is inversely related to statistical reporting errors
508 (fewer errors in higher impact journals), this finding comes with numerous caveats. For
509 example, the magnitude of this relationship was trivially small and completely absent for
510 the subset of articles in journals with impact factors greater than 2.0 and for articles with
511 fewer than 10% decision errors. Nevertheless, it is possible that some aspect of quality
512 could underlie this relation, whether it is due to the quality of the copy-editing team, better
513 reviewers catching obvious statistical reporting errors, or diligence on the part of authors.
514 More problematic, however, is that some of our analyses indicate that impact factors
515 and citation counts are associated with poorer quality. For instance, articles in journals
516 with higher impact factors are associated with lower evidentiary value (Question 2) and
517 are less likely to replicate (Question 3). Unlike the presence of statistical reporting errors,
518 these latter findings cannot be easily dismissed as typographical errors, as they speak
519 directly to the strength of the statistical evidence and are generally consistent with prior
520 results that show a negative relationship between impact factors and estimates of
521 statistical power in social psychology (Fraley & Vazire, 2014) and cognitive
522 neuroscience (Szucs & Ioannidis, 2017) journals. Regardless of the causal mechanisms
523 underlying our findings, they converge on the emerging narrative that bibliometric data are
524 weakly, if at all, related to fundamental aspects of research quality (e.g., Aksnes et al.,
525 2019; Brembs, 2018; Brembs et al., 2013; Fraley & Vazire, 2014; Nieminen et al., 2006;
526 Patterson & Harris, 2009; Ruano et al., 2018; Szucs & Ioannidis, 2017; J. Wang, Veugelers,
527 & Stephan, 2017; West & McIlwaine, 2002).
528 Practical relevance
529 Promotion and tenure is, fundamentally, a process for engaging in personnel selection,
530 and it is important to recognize it as such. In personnel psychology, researchers
531 often look for metrics that predict desired outcomes or behaviors without creating adverse
532 impact. Adverse impact occurs when a performance metric inherently favors some groups
533 over others. In this context, a good metric that does not create adverse impact might have
534 a modest correlation with work performance yet not produce differential preferences.
535 Conscientiousness, for example, has a validity coefficient of only about r = .2 (Barrick &
536 Mount, 1991), but is widely used because it produces minimal adverse impact (Hough &
537 Oswald, 2008). Using predictive validity and the potential for adverse impact to
538 benchmark the usefulness of citation counts and impact factors, it is clear that they fail on
539 both accounts. As suggested above, citation counts and journal impact factors likely play
540 an out-sized role in hiring and promotion and for predicting future research productivity
541 (see also McKiernan et al., 2019). Yet, it is clear from the research presented above and
542 elsewhere that these metrics do not capture key aspects of research quality. Moreover,
543 there is plenty of evidence to suggest that they hold the potential to produce adverse
544 impact. Citation counts are known to be lower for women and underrepresented minorities
545 (Dion, Sumner, & Mitchell, 2018; Dworkin et al., 2020; X. Wang et al., 2020a), and there is
546 some evidence for a negative relationship between impact factor and women authorship
547 (Bendels, Müller, Brueggmann, & Groneberg, 2018) and so hiring on their basis is likely to
548 perpetuate structural bias. Second, the present research highlights that the use of these
549 indicators in this context is predicated on the assumption that they reflect latent aspects of
550 either research quality or impact. But, inasmuch as the accuracy of statistical reporting
551 (decision errors), evidentiary value (Bayes factor), and replication are key components of
552 research quality, our results clearly undermine this assumption. More problematic,
553 however, is the potential for the misuse of these indicators to ultimately select for bad
554 science. Using an evolutionary computational model, Smaldino and McElreath(2016)
555 showed that the incentive and reward structure in academia can propagate poor scientific
556 methods. Insofar as impact factors and citation counts are used as the basis for hiring and
557 promotion, our analyses, as well as several other recent findings (e.g. Fraley & Vazire, 2014;
558 Szucs & Ioannidis, 2017; J. Wang et al., 2017) are consistent with this model.
559 The observed inverse relationships between the key indicators of quality on the one
560 hand, and citations and impact factors on the other, have been reported across numerous
561 publications. That is to say, decisions that reward researchers for publishing in high impact
562 journals and for having high citation counts may actually promote the evolution of poor
563 science (Smaldino & McElreath, 2016) as well as select for only certain types of science or
564 scientists (Chapman et al., 2019). Though our analyses indicate that the consequences
565 might be rather small for any given decision, these selection effects could theoretically
566 accumulate over time.
567 Still, a natural rebuttal to our observations that impact factors are either unrelated or
568 negatively related to quality is that our indices of quality (e.g., evidential strength) do not
569 capture other dimensions such as “theoretical importance” or novelty. One can interpret
570 “surprisingness” (see Question 3) as an indicator of novelty, and we are willing to concede
571 that higher impact journals may well publish more novel research on balance (but see J. Wang
572 et al., 2017, for data suggesting otherwise) — indeed, the ratings of “surprisingness” of
573 studies included in the OSF replication project were positively, though weakly, associated
574 with impact factors, b = 0.151, 95% CI = (.011, .293). However, acceptance of this
575 assumption only strengthens our argument. If papers in higher impact journals are a priori
576 less likely to be true (i.e., because they are novel), then it would require more evidence, not
577 less, to establish their validity. This follows from principles of Bayesian belief updating.
578 Our analyses suggest that the opposite is actually the case. This finding implies the
579 presence of a perverse trade-off in research evaluation: that reviewers and editors for
580 higher impact journals are more likely to trade strength of evidence for perceived novelty.
581 This sort of trade-off is at odds with rational belief updating and thus should be avoided.
582 Regardless of the underlying reasons for this pattern, papers in higher impact journals are more
583 likely to be read, cited, and propagated in the literature, despite the fact that they may rest on
584 less evidence, be less likely to be true, and therefore be less likely to stand the test of time.
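To make the belief-updating point concrete, the following is a minimal numerical sketch in R (the Bayes factors and prior probabilities are purely illustrative assumptions, not estimates from our data): because posterior odds equal the Bayes factor multiplied by the prior odds, an a priori less probable (more surprising) claim requires a larger Bayes factor to reach the same posterior credibility.

# Illustrative only: posterior odds = Bayes factor x prior odds.
posterior_odds <- function(bf, prior_prob) {
  bf * prior_prob / (1 - prior_prob)
}
posterior_odds(bf = 3, prior_prob = 0.50)   # unsurprising claim: posterior odds = 3.00
posterior_odds(bf = 3, prior_prob = 0.20)   # surprising claim:   posterior odds = 0.75
posterior_odds(bf = 12, prior_prob = 0.20)  # the surprising claim needs BF = 12 to reach odds of 3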
585 Nevertheless, it is important to acknowledge that each of the data sets used in our
586 analyses has certain limitations. For example, the program statcheck was used to curate
587 the data for Questions 1 and 2. This program only extracts statistics reported in written
588 text and in APA format (e.g., statistics reported in tables are excluded) and does not
589 differentiate between statistics reported for ‘central’ hypotheses versus more peripheral
590 hypotheses. Finally, the studies included in the replication projects for Question 3 were not
591 a representative sample of the broader psychology literature. Keeping these limitations in mind, all
592 of our analyses converge on the same point: Impact factors and citation counts may not be
593 useful measures of key aspects of research quality.
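For readers unfamiliar with statcheck, the following is a minimal sketch of what the program does (it is not our curation pipeline, and the results sentence is hypothetical): given APA-formatted text, statcheck recomputes the p value from the reported test statistic and degrees of freedom and flags inconsistencies, including decision errors.

# Minimal illustration of statcheck on a hypothetical, internally inconsistent result:
# t(28) = 1.50 implies a two-tailed p of about .14, not the reported p = .04.
library(statcheck)
txt <- "The effect was significant, t(28) = 1.50, p = .04."
res <- statcheck(txt)
res  # shows the extracted statistic, the reported and recomputed p values, and
     # error / decision-error flags (exact column names vary across statcheck versions)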
594 Future directions
595 As a field, we need to reconsider how research is evaluated. We are not alone in this
596 call. Indeed, several recent working groups have begun drafting recommendations for
597 change. For instance, Moher et al. (2018) provide six principles to guide research
598 evaluation. Chief amongst these principles are eliminating misleading bibliometric
599 indicators such as the journal impact factor and citation counts, reducing emphasis on
600 the quantity of publications, and developing new responsible indicators for assessing
601 scientists (RIASs) that place greater emphasis on good research practices such as
602 transparency and openness. Moher and colleagues’ recommendations also include the need
603 to incentivize intellectual risk-taking, the use of open science practices such as data
604 sharing, the sharing of research materials and analysis code, and the need to reward honest
605 and transparent publication of all research regardless of statistical significance. Some of
606 these recommendations may require fundamental change in how researchers and
607 administrators view publication.
608 Altogether, our analyses support the growing call to reduce or eliminate the role that
609 bibliometrics play in the evaluation system and line up with recommendations made in the
610 San Francisco Declaration On Research Assessment (DORA, https://sfdora.org/), the
611 Leiden Manifesto (Hicks et al., 2015), and by numerous scholars (Aksnes et al., 2019; Moher
612 et al., 2018). More concretely, we argue that evaluation should focus more on research
613 process and less on outcome, to incentivize behaviors that support open, transparent, and
614 reproducible science (Dougherty, Slevc, & Grand, 2019; Frankenhuis & Nettle, 2018; Moher
615 et al., 2018). Changing the focus of evaluation from what is produced to how faculty conduct
616 their work shifts the incentives to aspects of the research enterprise that are both
617 under the control of the researcher and promote good scientific practices.
618 References
619 Aarts, A., & LeBel, E. (2016). Curate science: A platform to gauge the replicability of
620 psychological science.
621 Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and
622 research quality: An overview of basic concepts and theories. SAGE Open, 9 (1),
623 1–17. Retrieved from https://doi.org/10.1177/2158244019829575
624 AlShebli, B., Makovi, K., & Rahwan, T. (2020). The association between early career
625 informal mentorship in academic collaborations and junior author performance.
626 Nature Communications, 11 (1), 1–8.
627 Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job
628 performance: A meta-analysis. Personnel Psychology, 44 (1), 1-26. Retrieved from
629 https://doi.org/10.1111/j.1744-6570.1991.tb00688.x doi:
630 10.1111/j.1744-6570.1991.tb00688.x
631 Bendels, M. H., Müller, R., Brueggmann, D., & Groneberg, D. A. (2018). Gender
632 disparities in high-quality research revealed by Nature Index journals. PLOS ONE,
633 13 (1), e0189136.
634 Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of
635 studies on citing behavior. Journal of Documentation, 64 (1), 45–80. Retrieved from
636 https://doi.org/10.1108/00220410810844150
637 Bornmann, L., Mutz, R., & Daniel, H.-D. (2010). A reliability-generalization study of
638 journal peer reviews: A multilevel meta-analysis of inter-rater reliability and its
639 determinants. PLOS ONE, 5 (12), Article e14331. Retrieved from
640 https://doi.org/10.1371/journal.pone.0014331
641 Brembs, B. (2018). Prestigious science journals struggle to reach even average reliability.
642 Frontiers in Human Neuroscience, 12 , Article 37. Retrieved from
643 https://www.frontiersin.org/article/10.3389/fnhum.2018.00037
644 Brembs, B., Button, K., & Munafò, M. (2013). Deep impact: Unintended consequences of
645 journal rank. Frontiers in Human Neuroscience, 7 , Article 291. Retrieved from
646 https://www.frontiersin.org/article/10.3389/fnhum.2013.00291
647 Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan.
648 Journal of Statistical Software, 80 (1), 1–28. doi: 10.18637/jss.v080.i01
649 Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., . . .
650 Wu, H. (2018). Evaluating the replicability of social science experiments in Nature
651 and Science between 2010 and 2015. Nature Human Behaviour, 2 (9), 637–644.
652 Retrieved from https://doi.org/10.1038/s41562-018-0399-z
653 Chamberlain, S., Zhu, H., Jahn, N., Boettiger, C., & Ram, K. (2019). rcrossref: Client for
654 various ’crossref’ ’apis’ [Computer software manual]. Retrieved from
655 https://CRAN.R-project.org/package=rcrossref (R package version 0.9.2)
656 Chapman, C. A., Bicca-Marques, J. C., Calvignac-Spencer, S., Fan, P., Fashing, P. J.,
657 Gogarten, J., . . . Chr. Stenseth, N. (2019). Games academics play and their
658 consequences: how authorship, h-index and journal impact factors are
659 shaping the future of academia. Proceedings of the Royal Society B: Biological
660 Sciences, 286 (1916), 20192047. Retrieved from
661 https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2019.2047 doi:
662 10.1098/rspb.2019.2047
663 Dion, M. L., Sumner, J. L., & Mitchell, S. M. (2018). Gendered citation patterns across
664 political science and social science methodology fields. Political Analysis, 26 (3),
665 312–327. doi: 10.1017/pan.2018.12
666 Dougherty, M. R., Slevc, L. R., & Grand, J. A. (2019). Making research evaluation more
667 transparent: Aligning research philosophy, institutional values, and reporting.
668 Perspectives on Psychological Science, 14 (3), 361–375. Retrieved from
669 https://doi.org/10.1177/1745691618810693
670 Dworkin, J. D., Linn, K. A., Teich, E. G., Zurn, P., Shinohara, R. T., & Bassett, D. S.
671 (2020, Aug 01). The extent and drivers of gender imbalance in neuroscience reference
672 lists. Nature Neuroscience, 23 (8), 918-926. Retrieved from
673 https://doi.org/10.1038/s41593-020-0658-y doi: 10.1038/s41593-020-0658-y
674 Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks,
675 J. B., . . . others (2016). Many labs 3: Evaluating participant pool quality across the
676 academic semester via replication. Journal of Experimental Social Psychology, 67 ,
677 68–82.
678 Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for
679 psychological research. Psychological Review, 70 (3), 193–242. Retrieved from
680 https://doi.org/10.1037/h0044139
681 Epskamp, S., & Nuijten, M. (2014). statcheck: Extract statistics from articles and
682 recompute p values (R package version 1.0.0).
683 Epstude, K. (2017). Towards a replicable and relevant social psychology. Hogrefe
684 Publishing.
685 Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility
686 project: Psychology. PLOS ONE, 11 (2), e0149794.
687 Fire, M., & Guestrin, C. (2019). Over-optimization of academic publishing metrics:
688 observing Goodhart’s Law in action. GigaScience, 8 (6). Retrieved from
689 https://doi.org/10.1093/gigascience/giz053
690 Fraley, R. C., & Vazire, S. (2014, 10). The N-pact factor: Evaluating the quality of
691 empirical journals with respect to sample size and statistical power. PLOS ONE,
692 9 (10), 1–12. Retrieved from https://doi.org/10.1371/journal.pone.0109019
693 Francois, O. (2015). Arbitrariness of peer review: A Bayesian analysis of the NIPS
694 experiment.
695 Frankenhuis, W. E., & Nettle, D. (2018). Open science is liberating and can foster
696 creativity. Perspectives on Psychological Science, 13 (4), 439–447.
697 Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Carr, L., & Brody, T. (2010).
698 Self-selected or mandated, open access increases citation impact for higher quality
699 research. PLOS ONE, 5 (10), e13636.
700 Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (2019). R-squared for bayesian
701 regression models. The American Statistician, 73 (3), 307-309. Retrieved from
702 https://doi.org/10.1080/00031305.2018.1549100 doi:
703 10.1080/00031305.2018.1549100
704 Gentil-Beccot, A., Mele, S., & Brooks, T. C. (2009). Citing and reading behaviours in
705 high-energy physics. Scientometrics, 84 (2), 345–355.
706 Goodhart, C. (1975). Problems of monetary management: The U.K. experience. In Papers
707 in monetary economics, 1975. Reserve Bank of Australia.
708 Greenwald, A. G., & Schuh, E. S. (1994). An ethnic bias in scientific citations. European
709 Journal of Social Psychology, 24 (6), 623-639. doi: 10.1002/ejsp.2420240602
710 Hartgerink, C. (2016). 688,112 statistical results: Content mining psychology articles for
711 statistical test results. Data, 1 (3), 14–20.
712 Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The
713 Leiden Manifesto for research metrics. Nature, 520 , 429–431.
714 Hough, L. M., & Oswald, F. L. (2008). Personality testing and industrial–organizational
715 psychology: Reflections, progress, and prospects. Industrial and Organizational
716 Psychology: Perspectives on Science and Practice, 1 (3), 272-290. Retrieved from
717 https://doi.org/10.1111/j.1754-9434.2008.00048.x doi:
718 10.1111/j.1754-9434.2008.00048.x
719 Keirstead, J. (2016). scholar: analyse citation data from google scholar [Computer software
720 manual]. Retrieved from http://github.com/jkeirstead/scholar (R package
721 version 0.1.5)
722 Klein, R. A., Ratliff, K. A., Vianello, M., Adams Jr, R. B., Bahník, Š., Bernstein, M. J., . . .
723 others (2014). Investigating variation in replicability. Social Psychology.
724 Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams Jr, R. B., Alper, S., . . .
725 others (2018). Many labs 2: Investigating variation in replicability across samples and
726 settings. Advances in Methods and Practices in Psychological Science, 1 (4), 443–490.
727 Langford, J., & Guzdial, M. (2015, March). The arbitrariness of reviews, and advice for
728 school administrators. Commun. ACM, 58 (4), 12–13. Retrieved from
729 https://doi.org/10.1145/2732417 doi: 10.1145/2732417
730 Lariviére, V., & Gingras, Y. (2010). The impact factor’s Matthew Effect: A natural
731 experiment in bibliometrics. Journal of the American Society for Information Science
732 and Technology, 61 (2), 424–427. Retrieved from
733 https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.21232
734 Lawrence, N., & Cortes, C. (2014). The NIPS experiment.
735 http://inverseprobability.com/2014/12/16/the-nips-experiment. (Accessed:
736 2021-03-03)
737 Lawrence, S. (2001). Free online availability substantially increases a paper’s impact.
738 Nature, 411 (6837), 521–521. Retrieved from https://doi.org/10.1038/35079151
739 Luarn, P., & Chiu, Y.-P. (2016). Influence of network density on information diffusion on
740 social network sites: The mediating effects of transmitter activity. Information
741 Development, 32 (3), 389-397. Retrieved from
742 https://doi.org/10.1177/0266666914551072
743 Mählck, P., & Persson, O. (2000). Socio-bibliometric mapping of intra-departmental
744 networks. Scientometrics, 49 (1), 81–91. Retrieved from
745 https://doi.org/10.1023/A:1005661208810
746 McKiernan, E. C., Schimanski, L. A., Muñoz Nieves, C., Matthias, L., Niles, M. T., &
747 Alperin, J. P. (2019). Meta-research: Use of the journal impact factor in academic
748 review, promotion, and tenure evaluations. eLife, 8 , e47338. Retrieved from
749 https://doi.org/10.7554/eLife.47338
750 Moher, D., Naudet, F., Cristea, I. A., Miedema, F., Ioannidis, J. P. A., & Goodman, S. N.
751 (2018). Assessing scientists for hiring, promotion, and tenure. PLOS Biology, 16 (3),
752 1–20. Retrieved from https://doi.org/10.1371/journal.pbio.2004089
753 Morey, R. D., Romeijn, J.-W., & Rouder, J. N. (2016). The philosophy of Bayes factors
754 and the quantification of statistical evidence. Journal of Mathematical Psychology,
755 72 , 6–18. Retrieved from https://doi.org/10.1016/j.jmp.2015.11.001 (Bayes
756 Factors for Testing Hypotheses in Psychological Research: Practical Relevance and
757 New Developments)
758 Nieminen, P., Carpenter, J., Rucker, G., & Schumacher, M. (2006). The relationship
759 between quality of research and citation frequency. BMC Medical Research
760 Methodology, 6 (1), 42. Retrieved from https://doi.org/10.1186/1471-2288-6-42
762 Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts,
763 J. M. (2016). The prevalence of statistical reporting errors in psychology
764 (1985–2013). Behavior Research Methods, 48 (4), 1205–1226. Retrieved from
765 https://doi.org/10.3758/s13428-015-0664-2
766 Open Science Collaboration. (2015). Estimating the reproducibility of psychological
767 science. Science, 349 (6251). Retrieved from
768 https://doi.org/10.1126/science.aac4716
769 Paiva, C. E., Lima, J. A. P. d. S. N., & Paiva, B. S. R. (2012). Articles with short titles
770 describing the results are cited more often. Clinics, 67 , 509–513.
771 Patterson, M., & Harris, S. (2009). The relationship between reviewers’ quality-scores and
772 number of citations for papers published in the journal physics in medicine and
773 biology from 2003–2005. Scientometrics, 80 , 343—349. Retrieved from
774 https://doi.org/10.1007/s11192-008-2064-1
775 Peoples, B. K., Midway, S. R., Sackett, D., Lynch, A., & Cooney, P. B. (2016). Twitter
776 predicts citation rates of ecological research. PLOS ONE, 11 (11), 1–11. Retrieved
777 from https://doi.org/10.1371/journal.pone.0166570
778 Peters, D. P., & Ceci, S. J. (1982). Peer-review practices of psychological journals: The
779 fate of published articles, submitted again. Behavioral and Brain Sciences, 5 (2),
780 187–255. doi: 10.1017/S0140525X00011183
781 Petersen, A. M., Fortunato, S., Pan, R. K., Kaski, K., Penner, O., Rungi, A., . . .
782 Pammolli, F. (2014). Reputation and impact in academic careers. Proceedings of the
783 National Academy of Sciences, 111 (43), 15316–15321.
784 Reinero, D. A., Wills, J. A., Brady, W. J., Mende-Siedlecki, P., Crawford, J. T., & Bavel,
785 J. J. V. (2020). Is the political slant of psychology research related to scientific
786 replicability? Perspectives on Psychological Science, 15 (6), 1310-1328. Retrieved
787 from https://doi.org/10.1177/1745691620924463 (PMID: 32812848) doi:
788 10.1177/1745691620924463
789 Ruano, J., Aguilar-Luque, M., Isla-Tejera, B., Alcalde-Mellado, P., Gay-Mimbrera, J.,
790 Hernández-Romero, J. L., . . . García-Nieto, A. V. (2018). Relationships between
791 abstract features and methodological quality explained variations of social media
792 activity derived from systematic reviews about psoriasis interventions. Journal of
793 Clinical Epidemiology, 101 , 35–43. Retrieved from
794 https://doi.org/10.1016/j.jclinepi.2018.05.015
795 Ruscio, J. (2016). Taking advantage of citation measures of scholarly impact: Hip hip h
796 index! Perspectives on Psychological Science, 11 (6), 905–908.
797 Schweinsberg, M., Madan, N., Vianello, M., Sommer, S. A., Jordan, J., Tierney, W., . . .
798 others (2016). The pipeline project: Pre-publication independent replications of a
799 single laboratory’s research pipeline. Journal of Experimental Social Psychology, 66 ,
800 55–67.
801 Seglen, P. (1997). Why the impact factor of journals should not be used for evaluating
802 research. BMJ (Clinical research ed.), 314 (7079), 498–502. Retrieved from
803 https://europepmc.org/articles/PMC2126010
804 Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., . . . others
805 (2018). Many analysts, one data set: Making transparent how variations in analytic
806 choices affect results. Advances in Methods and Practices in Psychological Science,
807 1 (3), 337–356.
808 Simons, D. J., Holcombe, A. O., & Spellman, B. A. (2014). An introduction to registered
809 replication reports at perspectives on psychological science. Perspectives on
810 Psychological Science, 9 (5), 552-555. Retrieved from
811 https://doi.org/10.1177/1745691614543974 (PMID: 26186757) doi:
812 10.1177/1745691614543974
813 Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal
814 Society Open Science, 3 (9), 160384. Retrieved from
815 https://royalsocietypublishing.org/doi/abs/10.1098/rsos.160384
816 Sternberg, R. J. (2016). “Am I Famous Yet?” Judging scholarly merit in psychological
817 science: An introduction. Perspectives on Psychological Science, 11 (6), 877–881.
818 Szucs, D., & Ioannidis, J. P. A. (2017). Empirical assessment of published effect sizes and
819 power in the recent cognitive neuroscience and psychology literature. PLOS Biology,
820 15 (3), 1–18. Retrieved from https://doi.org/10.1371/journal.pbio.2000797
821 Wang, J., Veugelers, R., & Stephan, P. (2017). Bias against novelty in science: A
822 cautionary tale for users of bibliometric indicators. Research Policy, 46 (8),
823 1416–1436. Retrieved from
824 http://www.sciencedirect.com/science/article/pii/S0048733317301038
825 Wang, X., Dworkin, J., Zhou, D., Stiso, J., Falk, E., Zurn, P., . . . Lydon-Staley, D. M.
826 (2020a). Gendered citation practices in the field of communication.
827 Wang, X., Dworkin, J., Zhou, D., Stiso, J., Falk, E. B., Zurn, P., . . . Lydon-Staley, D. M.
828 (2020b, Sep). Gendered citation practices in the field of communication. PsyArXiv.
829 Retrieved from psyarxiv.com/ywrcq doi: 10.31234/osf.io/ywrcq
830 West, R., & McIlwaine, A. (2002). What do citation counts count for in the field of
831 addiction? An empirical evaluation of citation counts and their link with peer ratings
832 of quality. Addiction, 97 (5), 501–504. Retrieved from
833 https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1360-0443.2002.00104.x
834 Zhou, Z. Q., Tse, T. H., & Witheridge, M. (2019). Metamorphic robustness testing:
835 Exposing hidden defects in citation statistics and journal impact factors. IEEE
836 Transactions on Software Engineering, 1–1. Retrieved from
837 https://doi.org/10.1109/TSE.2019.2915065