The Eigenfactor Metrics™: A Network Approach to Assessing Scholarly Journals
Jevin D. West¹, Theodore C. Bergstrom², Carl T. Bergstrom¹

July 16, 2009

¹ Department of Biology, University of Washington, Seattle, WA
² Department of Economics, University of California, Santa Barbara, CA

* The authors are the founders of the Eigenfactor Project. All of the rankings, algorithms, visual tools and maps of science described here are freely available at http://www.eigenfactor.org/. Correspondence can be sent to Jevin D. West at [email protected].

Keywords: Eigenfactor™ Score, Article Influence™ Score, Impact Factor, Bibliometrics, Citation Networks

Abstract

Limited time and budgets have created a legitimate need for quantitative measures of scholarly work. The well-known journal impact factor is the leading measure of this sort; here we describe an alternative approach based on the full structure of the scholarly citation network. The Eigenfactor Metrics (Eigenfactor Score and Article Influence Score) use an iterative ranking scheme similar to Google's PageRank algorithm. By this approach, citations from top journals are weighted more heavily than citations from lower-tier publications. Here we describe these metrics and the rankings that they provide.

1 The Need for Alternative Metrics

There is only one adequate approach to evaluating the quality of an individual paper: read it carefully, or talk to others who have done so. The same is largely true when it comes to evaluating any small collection of papers, such as the publications of an individual scholar. But as one moves toward assessment challenges that involve larger bodies of work across broader segments of scholarship, reading individual papers becomes infeasible and a legitimate need arises for quantitative metrics for research evaluation.

The impact factor measure is perhaps the best-known tool for this purpose.
Impact factor was originally conceived by Eugene Garfield as a way of selecting which journals to include in his Science Citation Index (Garfield 2006), but its use has expanded enormously: impact factor scores now affect hiring decisions, ad placement, promotion and tenure, university rankings and academic funding (Monastersky 2005). With so much at stake, we should be careful how aggregate, journal-level metrics like impact factor are used.¹

Impact factor has certain advantages as a citation measure: it is widely used and well understood. Moreover, it is simple to calculate and simple to explain. But this simplicity comes at a cost. Impact factor tallies the number of citations received, but ignores any information about the sources of those citations. A citation from a top-tier journal such as The American Economic Review is weighted the same as a citation from a journal that is rarely cited by anyone. Accounting for the source of each citation requires a more complicated computation, but the reward is a richer measure of quality. The Eigenfactor Metrics take this approach.

¹ Because of the large skew in the distribution of citations to papers in any given journal (Redner 1998), the quality or influence of a single paper is poorly estimated by the impact factor of the journal in which it has been published. For example, in 2005 the journal Nature reported that 89 percent of its impact factor came from 25 percent of its papers (Editor 2005). As a result, the journal's impact factor overstates the influence of most of its papers and greatly understates the influence of some.

2 The Eigenfactor Metrics

Each year, tens of thousands of scholarly journals publish hundreds of thousands of scholarly papers, collectively containing tens of millions of citations. As de Solla Price recognized in 1965 (de Solla Price 1965), these citations form a vast network linking up the collective research output of the scholarly community.
If we think of this network at the journal level, each node in the network represents an individual journal. Each link in the network represents citations from one journal to another. The links are weighted and directed: strong weights represent large numbers of citations, and the direction of the link indicates the direction of the citations (see Figure 1).

Figure 1: A small journal citation network. Arrows indicate citations from each of four journals, A, B, C, and D, to one another. The size of each node represents its centrality in the network, as determined by the Eigenfactor Algorithm. Larger, darker nodes are more highly connected to other highly connected nodes.

By viewing citation data as a network, we can use powerful algorithmic tools to mine valuable information from these data. The most famous of these tools, known as eigenvector centrality, was first introduced by sociologist Phillip Bonacich in 1972 as a way of quantifying an individual's status or popularity in communication networks (Bonacich 1972). Bonacich's aim was to use a network's structure to figure out who the important people in the network were. How do we tell who the important people are? They are the ones with important friends, of course. While this answer may sound circular, it turns out to be well-defined mathematically, and moreover the "importances" of individuals in a network are easy to compute in a recursive manner. The most prominent commercial application of eigenvector centrality is Google's PageRank algorithm, which ranks the importance of websites by looking at the hyperlink structure of the world wide web (Page et al. 1998). Researchers have likewise applied this approach to a number of other network types, including citation networks (Pinski and Narin 1976; Liebowitz and Palmer 1984; Kalaitzidakis, Mamuneas, and Stengos 2003; Palacios-Huerta and Volij 2004; Kodrzycki and Yu 2006; Bollen, Rodriguez, and Van de Sompel 2006).
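
The recursive idea described above (a node is important if important nodes point to it) can be illustrated with a short power iteration. The sketch below is only a minimal illustration of basic eigenvector centrality on a four-journal network. The citation counts are invented for the example and are not taken from Figure 1, the function name is hypothetical, and the code is not the Eigenfactor Algorithm itself, which adds further refinements.

```python
# Hypothetical citation counts (not data from the paper):
# cites[i][j] = number of citations from journal i to journal j.
journals = ["A", "B", "C", "D"]
cites = [
    [0, 4, 1, 0],  # A cites B four times and C once
    [3, 0, 2, 1],  # B cites A, C and D
    [1, 5, 0, 0],  # C cites A and B
    [0, 2, 1, 0],  # D cites B and C
]


def eigenvector_centrality(cites, tol=1e-12, max_iter=10_000):
    """Power iteration on the row-normalized citation matrix.

    Assumes every journal cites at least one other journal and that the
    network is strongly connected, so the iteration settles down.
    """
    n = len(cites)
    out_totals = [sum(row) for row in cites]
    score = [1.0 / n] * n  # start with equal importance
    for _ in range(max_iter):
        # A journal's new score is the sum of its citers' scores, each
        # weighted by the share of the citer's references it receives.
        new = [
            sum(score[i] * cites[i][j] / out_totals[i] for i in range(n))
            for j in range(n)
        ]
        total = sum(new)
        new = [x / total for x in new]  # keep scores summing to 1
        if max(abs(a - b) for a, b in zip(new, score)) < tol:
            return new
        score = new
    return score


scores = eigenvector_centrality(cites)
for name, value in zip(journals, scores):
    print(f"Journal {name}: {value:.3f}")
```

With these made-up counts, the journal that receives the most citations from other well-cited journals ends up with the largest score, which is the sense in which importance is defined recursively.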
The concept of eigenvector centrality is at the core of the Eigenfactor Metrics as well (Bergstrom 2007). The idea is to take a network like the one shown in Figure 1 and determine which journals are the important journals. The importance depends on where a journal resides in this mesh of citation links. The more citations a journal receives, especially from other well-connected journals, the more central the journal is in the network.

There are a number of ways to think about the recursive calculations by which importance scores are determined. For our purposes, it is particularly useful to think about the importance scores as coming from the result of a simple random process:

Imagine that a researcher is to spend all eternity in the library randomly following citations within scientific periodicals. The researcher begins by picking a random journal in the library. From this volume she selects a random citation. She then walks over to the journal referenced by this citation. From this new volume she now selects another random citation and proceeds to that journal. This process is repeated ad infinitum.

How often does the researcher visit each journal? The researcher will frequently visit journals that are highly cited by journals that are also highly cited. The Eigenfactor score of a journal is the percentage of the time that the model researcher visits that journal in her walk through the library.² So when we report that Nature had an Eigenfactor score of 2.0 in 2006, that means that two percent of the time, the model researcher would have been directed to Nature.

² The Eigenfactor Algorithm expands somewhat upon the basic eigenvector centrality approach to better estimate the influence of journals from citation data. Further details are provided at http://www.eigenfactor.org/methods.htm. The full mathematical description of the Eigenfactor Algorithm is available at http://www.eigenfactor.org/methods.pdf. In addition, a pseudocode description that provides the recipe for the calculation is available at http://www.eigenfactor.org/methods.htm.

Figure 1 provides an example network where this idea of centrality can be explored further. Because of the simplicity of the network, it is not difficult to see that in Figure 1 the most central node is Journal B. It receives more incoming links than any other node. The size of this node in Figure 1 reflects this centrality. If citations are a proxy for scientific importance, this journal would likely be a key component of a library's collection.

Real citation networks are much more complicated than the one in Figure 1. At Eigenfactor.org, we present metrics based on a network of 7,600 journals and over 8,500,000 citations, using data from the Thomson-Reuters Journal Citation Reports (JCR).³ With networks of this size, we need a fast computational approach to assess the importance of each journal. Fortunately, the Eigenfactor Algorithm computes the importance values for a network of this size in a matter of seconds on a standard desktop computer.

We use the Eigenfactor Algorithm to calculate two principal metrics that address two different questions: Eigenfactor™ Score and Article Influence™ Score. If one is interested in asking what the total value of a journal is (in other words, how often our model researcher is directed to any article within the journal by following citation chains), one would use the Eigenfactor score. When looking at the cost-effectiveness of a journal, it is therefore useful to compare subscription price with Eigenfactor score. Table 2 lists the top twenty journals by Eigenfactor Score in 2006.

The Eigenfactor Score is additive: to find the Eigenfactor of a group of journals, simply sum the Eigenfactors of each journal in the group. (One cannot do this with a measure such as impact factor or Article Influence, discussed below.) For example, the top five journals in Table 2 have an Eigenfactor sum of 8.909.
This means that the model researcher spends approximately 8.909 percent of her time at these five journals (and thus these five are an important backbone of a science library collection).
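
The model researcher and the additivity property can both be made concrete with a small simulation. The sketch below is a simplified illustration under stated assumptions: the four-journal network and its citation counts are invented, the function name `simulate_visits` is hypothetical, and the walk follows citations only, without the refinements that the Eigenfactor Algorithm adds to basic eigenvector centrality.

```python
import random

# Hypothetical citation counts (not data from the paper):
# cites[source][target] = citations from source journal to target journal.
cites = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 3, "C": 2, "D": 1},
    "C": {"A": 1, "B": 5},
    "D": {"B": 2, "C": 1},
}


def simulate_visits(cites, steps=200_000, seed=0):
    """Follow random citations and return the percentage of visits per journal."""
    rng = random.Random(seed)
    current = rng.choice(sorted(cites))  # start at a random journal
    visits = {journal: 0 for journal in cites}
    for _ in range(steps):
        targets = list(cites[current])
        weights = [cites[current][t] for t in targets]
        # Pick a citation at random; heavily cited targets are chosen more often.
        current = rng.choices(targets, weights=weights, k=1)[0]
        visits[current] += 1
    return {journal: 100.0 * count / steps for journal, count in visits.items()}


scores = simulate_visits(cites)
for journal, score in sorted(scores.items()):
    print(f"Journal {journal}: {score:.2f} percent of visits")

# Additivity: the share of time spent in a group of journals is the sum of
# the individual shares, because each step visits exactly one journal.
group = ["A", "B"]
group_score = sum(scores[journal] for journal in group)
print(f"Journals {group} together: {group_score:.2f} percent of visits")
```

Because every step of the walk lands in exactly one journal, the visit percentages sum to 100, and the score of any group is simply the sum of its members' scores. The same reasoning underlies summing the Eigenfactor Scores of the top five journals.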