
The Leiden Manifesto for research metrics
Use these ten principles to guide research evaluation, urge Diana Hicks, Paul Wouters and colleagues.

Data are increasingly used to govern science. Research evaluations that were once bespoke and performed by peers are now routine and reliant on metrics (ref. 1). The problem is that evaluation is now led by the data rather than by judgement. Metrics have proliferated: usually well intentioned, not always well informed, often ill applied. We risk damaging the system with the very tools designed to improve it, as evaluation is increasingly implemented by organizations without knowledge of, or advice on, good practice and interpretation.

Before 2000, there was the Science Citation Index on CD-ROM from the Institute for Scientific Information (ISI), used by experts for specialist analyses. In 2002, Thomson Reuters launched an integrated web platform, making the Web of Science database widely accessible. Competing citation indices were created: Elsevier's Scopus (released in 2004) and Google Scholar (beta version released in 2004). Web-based tools to easily compare institutional research productivity and impact were introduced, such as InCites (using the Web of Science) and SciVal (using Scopus), as well as software to analyse individual citation profiles using Google Scholar (Publish or Perish, released in 2007).

In 2005, Jorge Hirsch, a physicist at the University of California, San Diego, proposed the h-index, popularizing citation counting for individual researchers. Interest in the journal impact factor grew steadily after 1995 (see 'Impact-factor obsession').

Lately, metrics related to social usage and online comment have gained momentum: F1000Prime was established in 2002, Mendeley in 2008, and Altmetric.com (supported by Macmillan Science and Education, which owns Nature Publishing Group) in 2011.
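For readers unfamiliar with the measure tracked in 'Impact-factor obsession', the standard two-year journal impact factor is the number of citations received in a given year by items a journal published in the previous two years, divided by the number of citable items it published in those years. The sketch below is an illustrative calculation with invented numbers; it is not drawn from any citation database.

```python
def impact_factor(citations_to_recent_items, citable_items_published):
    """Two-year journal impact factor: average recent citations per citable item."""
    return citations_to_recent_items / citable_items_published

# Hypothetical journal: 210 citations received in 2014 to items it published in
# 2012 and 2013, which together comprised 140 citable items.
jif = impact_factor(210, 140)
print(round(jif, 1))  # 1.5, reported to one decimal as principle 8 below recommends
```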

As scientometricians, social scientists and research administrators, we have watched with increasing alarm the pervasive misapplication of indicators to the evaluation of scientific performance. The following are just a few of numerous examples. Across the world, universities have become obsessed with their position in global rankings (such as the Shanghai Ranking and Times Higher Education's list), even when such lists are based on what are, in our view, inaccurate data and arbitrary indicators.

Some recruiters request h-index values for candidates. Several universities base promotion decisions on threshold h-index values and on the number of articles in 'high-impact' journals. Researchers' CVs have become opportunities to boast about these scores, notably in biomedicine. Everywhere, supervisors ask PhD students to publish in high-impact journals and acquire external funding before they are ready.

In Scandinavia and China, some universities allocate research funding or bonuses on the basis of a number: for example, by calculating individual impact scores to allocate 'performance resources' or by giving researchers a bonus for a publication in a journal with an impact factor higher than 15 (ref. 2).

In many cases, researchers and evaluators still exert balanced judgement. Yet the abuse of research metrics has become too widespread to ignore.

We therefore present the Leiden Manifesto, named after the conference at which it crystallized (see http://sti2014.cwts.nl). Its ten principles are not news to scientometricians, although none of us would be able to recite them in their entirety because codification has been lacking until now. Luminaries in the field, such as Eugene Garfield (founder of the ISI), are on record stating some of these principles (refs 3, 4). But they are not in the room when evaluators report back to university administrators who are not expert in the relevant methodology. Scientists searching for literature with which to contest an evaluation find the material scattered in what are, to them, obscure journals to which they lack access.

We offer this distillation of best practice in metrics-based research assessment so that researchers can hold evaluators to account, and evaluators can hold their indicators to account.

TEN PRINCIPLES

1. Quantitative evaluation should support qualitative, expert assessment. Quantitative metrics can challenge bias tendencies in peer review and facilitate deliberation. This should strengthen peer review, because making judgements about colleagues is difficult without a range of relevant information. However, assessors must not be tempted to cede decision-making to the numbers. Indicators must not substitute for informed judgement. Everyone retains responsibility for their assessments.

2. Measure performance against the research missions of the institution, group or researcher. Programme goals should be stated at the start, and the indicators used to evaluate performance should relate clearly to those goals. The choice of indicators, and the ways in which they are used, should take into account the wider socio-economic and cultural contexts. Scientists have diverse research missions. Research that advances the frontiers of academic knowledge differs from research that is focused on delivering solutions to societal problems. Review may be based on merits relevant to policy, industry or the public rather than on academic ideas of excellence. No single evaluation model applies to all contexts.

3. Protect excellence in locally relevant research. In many parts of the world, research excellence is equated with English-language publication. Spanish law, for example, states the desirability of Spanish scholars publishing in high-impact journals. The impact factor is calculated for journals indexed in the US-based and still mostly English-language Web of Science. These biases are particularly problematic in the social sciences and humanities, in which research is more regionally and nationally engaged. Many other fields have a national or regional dimension, for instance HIV epidemiology in sub-Saharan Africa.

This pluralism and societal relevance tends to be suppressed to create papers of interest to the gatekeepers of high impact: English-language journals. The Spanish sociologists that are highly cited in the Web of Science have worked on abstract models or study US data. Lost is the specificity of sociologists in high-impact Spanish-language papers: topics such as local labour law, family health care for the elderly or immigrant employment (ref. 5). Metrics built on high-quality non-English literature would serve to identify and reward excellence in locally relevant research.

4. Keep data collection and analytical processes open, transparent and simple. The construction of the databases required for evaluation should follow clearly stated rules, set before the research has been completed. This was common practice among the academic and commercial groups that built bibliometric evaluation methodology over several decades. Those groups referenced protocols published in the peer-reviewed literature. This transparency enabled scrutiny. For example, in 2010, public debate on the technical properties of an important indicator used by one of our groups (the Centre for Science and Technology Studies at Leiden University in the Netherlands) led to a revision in the calculation of this indicator (ref. 6). Recent commercial entrants should be held to the same standards; no one should accept a black-box evaluation machine.

Simplicity is a virtue in an indicator because it enhances transparency. But simplistic metrics can distort the record (see principle 7). Evaluators must strive for balance: simple indicators true to the complexity of the research process.

5. Allow those evaluated to verify data and analysis. To ensure data quality, all researchers included in bibliometric studies should be able to check that their outputs have been correctly identified. Everyone directing and managing evaluation processes should assure data accuracy, through self-verification or third-party audit. Universities could implement this in their research information systems and it should be a guiding principle in the selection of providers of these systems. Accurate, high-quality data take time and money to collate and process. Budget for it.

6. Account for variation by field in publication and citation practices. Best practice is to select a suite of possible indicators and allow fields to choose among them. A few years ago, a European group of historians received a relatively low rating in a national peer-review assessment because they wrote books rather than articles in journals indexed by the Web of Science. The historians had the misfortune to be part of a psychology department. Historians and social scientists require books and national-language literature to be included in their publication counts; computer scientists require conference papers to be counted.

Citation rates vary by field: top-ranked journals in mathematics have impact factors of around 3; top-ranked journals in cell biology have impact factors of about 30. Normalized indicators are required, and the most robust normalization method is based on percentiles: each paper is weighted on the basis of the percentile to which it belongs in the citation distribution of its field (the top 1%, 10% or 20%, for example) (ref. 10). A single highly cited publication slightly improves the position of a university in a ranking that is based on percentile indicators, but may propel the university from the middle to the top of a ranking built on citation averages.
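As a concrete illustration of the percentile approach just described, the sketch below assigns each paper a percentile within the citation distribution of its own field. The data layout, field names and citation counts are hypothetical; this is a minimal sketch, not the normalization used in any particular ranking.

```python
# Minimal sketch of percentile-based field normalization (hypothetical data).
from bisect import bisect_left
from collections import defaultdict

def field_percentiles(records):
    """Map paper id to its percentile within its field's citation distribution."""
    by_field = defaultdict(list)
    for _, field, cites in records:
        by_field[field].append(cites)
    for field in by_field:
        by_field[field].sort()

    percentiles = {}
    for paper_id, field, cites in records:
        dist = by_field[field]
        # Share of papers in the same field with strictly fewer citations.
        percentiles[paper_id] = 100.0 * bisect_left(dist, cites) / len(dist)
    return percentiles

# Forty-five citations can mean a top paper in mathematics but a modest one in cell biology.
records = [
    ("m1", "mathematics", 45), ("m2", "mathematics", 12), ("m3", "mathematics", 2),
    ("c1", "cell biology", 45), ("c2", "cell biology", 160), ("c3", "cell biology", 300),
]
print(field_percentiles(records))
```

Evaluators can then report, for example, the share of a group's papers in the top 10% of their fields rather than a citation average.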

IMPACT-FACTOR OBSESSION (figure). Soaring interest in one crude measure (the average citation count of items published in a journal in the past two years) illustrates the crisis in research evaluation. Top panel: articles mentioning 'impact factor' in the title per 100,000 papers indexed in the Web of Science, 1984–2014, split into research articles and editorial material; annotations mark a special issue of Scientometrics on impact factors and the DORA declaration (San Francisco Declaration on Research Assessment), which calls for a halt to equating journal impact factor with research quality. Bottom panel: who is most obsessed? Papers published in 2005–14 mentioning 'impact factor' in the title, by discipline, per 100,000 papers (medical and life sciences, social sciences, physical sciences and multidisciplinary journals); bibliometric journals add a large number of research articles to social sciences. Data source: Thomson Reuters Web of Science; analysis: D.H., L.W.

7. Base assessment of individual researchers on a qualitative judgement of their portfolio. The older you are, the higher your h-index, even in the absence of new papers. The h-index varies by field: life scientists top out at 200; physicists at 100 and social scientists at 20–30 (ref. 8). It is database dependent: there are researchers in computer science who have an h-index of around 10 in the Web of Science but of 20–30 in Google Scholar (ref. 9). Reading and judging a researcher's work is much more appropriate than relying on one number. Even when comparing large numbers of researchers, an approach that considers more information about an individual's expertise, experience, activities and influence is best.
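Because the h-index recurs throughout this discussion, a minimal illustrative implementation may help: the h-index is the largest number h such that a researcher has h papers each cited at least h times. The citation counts below are invented.

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers with at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical record of one researcher's per-paper citation counts.
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3: three papers have at least 3 citations each
```

As the principle above notes, the same publication record can yield quite different values depending on which database supplies the citation counts.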

8. Avoid misplaced concreteness and false precision. Science and technology indicators are prone to conceptual ambiguity and uncertainty and require strong assumptions that are not universally accepted. The meaning of citation counts, for example, has long been debated. Thus, best practice uses multiple indicators to provide a more robust and pluralistic picture. If uncertainty and error can be quantified, for instance using error bars, this information should accompany published indicator values. If this is not possible, indicator producers should at least avoid false precision. For example, the journal impact factor is published to three decimal places to avoid ties. However, given the conceptual ambiguity and random variability of citation counts, it makes no sense to distinguish between journals on the basis of very small impact-factor differences. Avoid false precision: only one decimal is warranted.

9. Recognize the systemic effects of assessment and indicators. Indicators change the system through the incentives they establish. These effects should be anticipated. This means that a suite of indicators is always preferable; a single one will invite gaming and goal displacement (in which the measurement becomes the goal). For example, in the 1990s, Australia funded university research using a formula based largely on the number of papers published by an institute. Universities could calculate the 'value' of a paper in a refereed journal; in 2000, it was Aus$800 (around US$480 in 2000) in research funding. Predictably, the number of papers published by Australian researchers went up, but they were in less-cited journals, suggesting that article quality fell (ref. 7).

10. Scrutinize indicators and regularly update them. Research missions and the goals of assessment shift and the research system itself co-evolves. Once-useful metrics become inadequate; new ones emerge. Indicator systems have to be reviewed and perhaps modified. Realizing the effects of its simplistic formula, Australia in 2010 introduced its more complex Excellence in Research for Australia initiative, which emphasizes quality.

NEXT STEPS
Abiding by these ten principles, research evaluation can play an important part in the development of science and its interactions with society. Research metrics can provide crucial information that would be difficult to gather or understand by means of individual expertise. But this quantitative information must not be allowed to morph from an instrument into the goal.

The best decisions are taken by combining robust statistics with sensitivity to the aim and nature of the research that is evaluated. Both quantitative and qualitative evidence are needed; each is objective in its own way. Decision-making about science must be based on high-quality processes that are informed by the highest quality data.

Diana Hicks is professor of public policy at the Georgia Institute of Technology, Atlanta, Georgia, USA. Paul Wouters is professor of scientometrics and director, Sarah de Rijcke is assistant professor and Ludo Waltman is a researcher, at the Centre for Science and Technology Studies, Leiden University, the Netherlands. Ismael Rafols is a science-policy researcher at the Spanish National Research Council and the Polytechnic University of Valencia, Spain.
e-mail: [email protected]

1. Wouters, P. in Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact (eds Cronin, B. & Sugimoto, C.) 47–66 (MIT Press, 2014).
2. Shao, J. & Shen, H. Learned Publ. 24, 95–97 (2011).
3. Seglen, P. O. Br. Med. J. 314, 498–502 (1997).
4. Garfield, E. J. Am. Med. Assoc. 295, 90–93 (2006).
5. López Piñeiro, C. & Hicks, D. Res. Eval. 24, 78–89 (2015).
6. van Raan, A. F. J., van Leeuwen, T. N., Visser, M. S., van Eck, N. J. & Waltman, L. J. Informetrics 4, 431–435 (2010).
7. Butler, L. Res. Policy 32, 143–155 (2003).
8. Hirsch, J. E. Proc. Natl Acad. Sci. USA 102, 16569–16572 (2005).
9. Bar-Ilan, J. Scientometrics 74, 257–271 (2008).
10. Waltman, L. et al. J. Am. Soc. Inf. Sci. Technol. 63, 2419–2432 (2012).