Over-Optimization of Academic Publishing Metrics Draft Article


Over-Optimization of Academic Publishing Metrics: Observing Goodhart's Law in Action

Michael Fire¹,²* and Carlos Guestrin¹

Abstract
The academic publishing world is changing significantly, with ever-growing numbers of publications each year and shifting publishing patterns. However, the metrics used to measure academic success, such as the number of publications, citation number, and impact factor, have not changed for decades. Moreover, recent studies indicate that these metrics have become targets and follow Goodhart's Law, according to which "when a measure becomes a target, it ceases to be a good measure." In this study, we analyzed over 120 million papers to examine how the academic publishing world has evolved over the last century. Our study shows that the validity of citation-based measures is being compromised and their usefulness is lessening. In particular, the number of publications has ceased to be a good metric as a result of longer author lists, shorter papers, and surging publication numbers. Citation-based metrics, such as citation number and h-index, are likewise affected by the flood of papers, self-citations, and lengthy reference lists. Measures such as a journal's impact factor have also ceased to be good metrics due to the soaring numbers of papers that are published in top journals, particularly from the same pool of authors. Moreover, by analyzing properties of over 2,600 research fields, we observed that citation-based metrics are not beneficial for comparing researchers in different fields, or even in the same department. Academic publishing has changed considerably; now we need to reconsider how we measure success.

Multimedia Links: Interactive Data Visualization | Code Tutorials | Fields-of-Study Features Table

Keywords: Science of Science; Scientometrics; Goodhart's Law; Data Science; Big Data; Academic Publishing Metrics

¹Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
²The eScience Institute, University of Washington, Seattle, WA
*Corresponding author: fi[email protected]

Contents
1 Introduction
2 Related Work
  2.1 Changes in Publication Trends
  2.2 Success Metrics and Citation Trends
3 Methods and Experiments
  3.1 Datasets (The Microsoft Academic Graph (MAG) Dataset • The AMiner Dataset • The SCImago Journal Rank Dataset • The Join Dataset)
  3.2 Code Framework
  3.3 Analysis of Publication Trends (Paper Trends • Author Trends • Journal Trends • Field-of-Research Trends)
4 Results
  4.1 Results of Paper Trends
  4.2 Results of Author Trends
  4.3 Results of Journal Trends
  4.4 Results of Fields-of-Research Trends
5 Discussion
6 Conclusions
7 Data and Code Availability
Acknowledgments
References
A Supplementary Materials - Additional Results
  A.1 Additional Results

Introduction
In the last century, the academic publishing world has changed drastically in volume and velocity. The volume of papers has increased sharply, from about 174,000 papers published in 1950 to over 7 million papers published in 2014 (see Figure 2). Furthermore, the speed at which researchers can share and publish their studies has increased significantly. Today's researchers can publish not only in an ever-growing number of traditional venues, such as conferences and journals, but also in electronic preprint repositories and in mega-journals that provide rapid publication times [1].

Along with the exponential increase in the quantity of published papers, the number of ranked scientific journals has increased to over 20,000 (see Figures 15 and S17), and the number of published researchers has soared (see Figure S14). As part of this escalation, metrics such as the number of papers, number of citations, impact factor, h-index, and altmetrics are being used to compare the impact of papers, researchers, journals, and universities [2, 3, 4, 5]. Using quantitative metrics to rank researchers contributes to a hypercompetitive research environment, which is changing academic culture - and not in a positive direction [6].

Studies suggest that publication patterns have changed as a result of Goodhart's Law, according to which, "When a measure becomes a target, it ceases to be a good measure" [6, 7]. Goodhart's Law, and the closely related Campbell's Law [8], influence many systems in our everyday life, including economic [9], educational [8], and other decision-making systems [10]. As an example, Goodhart's Law can be found in the NYPD's manipulation of crime reports (the "measure") in order to improve crime statistics (the "target") [11]. Another example is found in the educational system, revealing that when "test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways" [8].

Recent studies indicate that when measures become targets in academic publishing, the effectiveness of the measures can be compromised, and unwelcome and unethical behaviors may develop, such as salami publications [12], ghost authorships [13], p-hacking [14], metrics manipulation [15], and even faking of peer reviews [16].

If the influence of Goodhart's Law on academia is indeed significant, then it should be possible to observe that academic entities, such as researchers and journals, will over-optimize their own measures to achieve a desired target. Similar to the consequences of making test scores a target, chasing after certain measures in the academic publishing world to desperately win the battle of "impact or perish" [7] can have undesirable effects.

In this study, our main goal was to utilize new advances in data science tools to perform an in-depth and precise bottom-up analysis of academic publishing over the decades. Our comprehensive analysis ranged from micro to macro levels as we studied individual researchers' behaviors as well as behavioral changes within large research domains. Additionally, we wanted to uncover how and if Goodhart's Law has changed academic publishing.

To achieve our research goals, we developed an open-source code framework to analyze data from several large-scale datasets containing over 120 million publications, with 528 million references and 35 million authors,¹ since the beginning of the 19th century. This provided a precise and full picture of how the academic publishing world has evolved. In our analysis, we uncovered a wide variety of underlying changes in academia at different levels (see Figure 1):

• Papers – We observed that on average papers became shorter, yet other features, such as titles, abstracts, and author lists, became longer (see Figures 4, 9, S2, S5, S6, and S7). Furthermore, both the number of references and the number of self-citations considerably increased (see Figures S9 and 8). Moreover, the total number of papers without any citations at all increased sharply over time (see Figure 10).

• Authors – We noticed a sharp increase in the number of new authors. Moreover, we observed a negative relation between researchers' career ages and the number of publications, i.e., early-career researchers tended to publish more quickly than those later in their careers (see Figure 12). Additionally, the average number of coauthors per author considerably increased over time (see Figure 13).

• Journals – We observed a drastic increase in the number of ranked journals, with several hundred new ranked journals appearing each year (see Figures 15, S17, and S18). In addition, we observed that journal ranking changed significantly as average citations per document and SCImago Journal Rank (SJR) measures increased, while the h-index measure decreased over time. Moreover, the percentage of papers with returning authors increased sharply in recent years (see Figure 19). For example, in the journal Cell, about 80% of all papers published in 2016 included at least one author who had published in the journal before, while in 1980 this rate stood at less than 40%.

• Fields of Research – We analyzed the properties of 19 major research domains, such as art, biology, and computer science, as well as the properties of 2,600 subdomains, such as photography, genetic diversity, and computer animation. Our analysis revealed that different domains had widely ranging properties (see Figures 21 and S24), even within subdomains. For example, different subdomains of biology, and even subdomains of genetics, had surprisingly different average numbers of citations (see Figures S33 and S34).

These observations support the hypothesis that commonly used measures, such as the author's number of papers, h-index, and citation number, have become targets. Researchers can increase their average number of papers by writing shorter papers with more coauthors. The h-index can be boosted by increasing the number of papers and self-citations. A higher citation number is attained with longer titles and abstracts, as well as more references (see Section 4.1 and Figure S13).

It is time to consider how we judge academic papers. Citation-based measures have been the standard for decades, but these measures are far from perfect. In fact, our study shows that the validity of citation-based measures is being compromised and their usefulness is lessening. Goodhart's Law is in action in the academic publishing world.

The remainder of the paper is organized as follows: In Section 2, we provide an overview of related studies.

Additionally, many studies have focused on how publication trends have changed over time, often focusing on specific geographical areas, various demographic character…

¹The number of authors was estimated according to the unique full names in the Microsoft Academic Graph dataset (see Section 4.2).
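The h-index discussed above has a simple definition: a researcher has index h if h of their papers each have at least h citations. A minimal Python sketch (with invented citation counts, not data from the paper's framework) makes the gaming mechanism concrete: counting self-citations alongside independent citations can raise h substantially.

    # Minimal sketch: how self-citations can inflate the h-index.
    # Citation counts are invented for illustration.

    def h_index(citation_counts):
        """Largest h such that h papers have at least h citations each."""
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, c in enumerate(counts, start=1):
            if c >= rank:
                h = rank
            else:
                break
        return h

    # Hypothetical author: (citations_from_others, self_citations) per paper.
    papers = [(9, 3), (7, 4), (5, 5), (4, 4), (2, 6), (1, 5), (0, 4)]

    without_self = [others for others, _ in papers]
    with_self = [others + own for others, own in papers]

    print(h_index(without_self))  # -> 4, h-index on independent citations only
    print(h_index(with_self))     # -> 6, inflated once self-citations count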
Recommended publications
  • Market Power in the Academic Publishing Industry
    Market Power in the Academic Publishing Industry

    What is an Academic Journal?
    • A serial publication containing recent academic papers in a certain field.
    • The main method for communicating the results of recent research in the academic community.

    Why is Market Power important to think about?
    • Commercial academic journal publishers use market power to artificially inflate subscription prices.
    • This practice drains the resources of libraries, to the detriment of the public.

    How Does Academic Publishing Work?
    • The author writes a paper and submits it to a journal.
    • The paper is evaluated by peer reviewers (other researchers in the field).
    • If accepted, the paper is published.
    • Libraries pay for subscriptions to the journal.

    The market does not serve the interests of the public
    • Universities are forced to "double-pay":
      1. The university funds research.
      2. The results of the research are given away for free to journal publishers.
      3. The university library must pay to get the research back in the form of journals.

    Subscription Prices are Outrageous
    • The highest-priced journals are those in the fields of science, technology, and medicine (the STM fields).
    • Since 1985, the average price of a journal has risen more than 215 percent - four times the average rate of inflation.
    • This rise in prices, combined with the California budget crisis, has caused UC Berkeley's library to cancel many subscriptions, threatening the library's reputation.

    Why are prices so high?
    • Commercial publishers use market power to charge inflated prices.

    Why do commercial publishers have market power?
    • They control the most prestigious, high-quality journals in many fields.
    • Demand is highly inelastic for high-quality journals.
  • Mapping the Future of Scholarly Publishing
    THE OPEN SCIENCE INITIATIVE WORKING GROUP

    Mapping the Future of Scholarly Publishing

    The Open Science Initiative (OSI) is a working group convened by the National Science Communication Institute (nSCI) in October 2014 to discuss the issues regarding improving open access for the betterment of science and to recommend possible solutions. The following document summarizes the wide range of issues, perspectives, and recommendations from this group's online conversation during November and December 2014 and January 2015.

    The 112 participants who signed up to participate in this conversation were drawn mostly from the academic, research, and library communities. Most of these 112 were not active in this conversation, but a healthy diversity of key perspectives was still represented. Individual participants may not agree with all of the viewpoints described herein, but participants agree that this document reflects the spirit and content of the conversation.

    The main body of this document was written by Glenn Hampson and edited by Joyce Ogburn and Laura Ada Emmett. Additional editorial input was provided by many members of the OSI working group. Kathleen Shearer is the author of Annex 5, with editing by Dominque Bambini and Richard Poynder.

    CC-BY 2015 National Science Communication Institute (nSCI)
    www.nationalscience.org
    [email protected]
    nSCI is a US-based 501(c)(3) nonprofit organization
    First edition, January 2015. Final version, April 2015.
    Recommended citation: Open Science Initiative Working Group, Mapping the Future of Scholarly…
  • A Quick Guide to Scholarly Publishing
    A QUICK GUIDE TO SCHOLARLY PUBLISHING
    GRADUATE WRITING CENTER • GRADUATE DIVISION
    UNIVERSITY OF CALIFORNIA • BERKELEY

    Belcher, Wendy Laura. Writing Your Journal Article in 12 Weeks: A Guide to Academic Publishing Success. Thousand Oaks, CA: SAGE Publications, 2009.
    Benson, Philippa J., and Susan C. Silver. What Editors Want: An Author's Guide to Scientific Journal Publishing. Chicago: University of Chicago Press, 2013.
    Derricourt, Robin. An Author's Guide to Scholarly Publishing. Princeton, NJ: Princeton University Press, 1996.
    Germano, William. From Dissertation to Book. 2nd ed. Chicago Guides to Writing, Editing, and Publishing. Chicago: University of Chicago Press, 2013.
    ———. Getting It Published: A Guide for Scholars and Anyone Else Serious about Serious Books. 3rd ed. Chicago: University of Chicago Press, 2016.
    Goldbort, Robert. Writing for Science. New Haven & London: Yale University Press, 2006.
    Harman, Eleanor, Ian Montagnes, Siobhan McMenemy, and Chris Bucci, eds. The Thesis and the Book: A Guide for First-Time Academic Authors. 2nd ed. Toronto: University of Toronto Press, 2003.
    Harmon, Joseph E., and Alan G. Gross. The Craft of Scientific Communication. Chicago: University of Chicago Press, 2010.
    Huff, Anne Sigismund. Writing for Scholarly Publication. Thousand Oaks, CA: SAGE Publications, 1999.
    Luey, Beth. Handbook for Academic Authors. 5th ed. Cambridge, UK: Cambridge University Press, 2009.
    Luey, Beth, ed. Revising Your Dissertation: Advice from Leading Editors. Updated ed. Berkeley: University of California Press, 2007.
    Matthews, Janice R., John M. Bowen, and Robert W. Matthews. Successful Scientific Writing: A Step-by-Step Guide for the Biological and Medical Sciences. 3rd ed. Cambridge, UK: Cambridge University Press, 2008.
    Moxley, Joseph M. PUBLISH, Don't Perish: The Scholar's Guide to Academic Writing and Publishing.
  • Using Bibliometric Big Data to Analyze Faculty Research Productivity in Health Policy and Management Christopher A. Harle
    Using Bibliometric Big Data to Analyze Faculty Research Productivity in Health Policy and Management

    Christopher A. Harle, PhD [corresponding author]
    Joshua R. Vest, PhD, MPH
    Nir Menachemi, PhD, MPH
    Department of Health Policy and Management, Richard M. Fairbanks School of Public Health, Indiana University, 1050 Wishard Blvd., Indianapolis, IN 46202-2872
    Phone: 317-274-5396

    Keywords: big data, faculty productivity, bibliometrics, citation analysis

    This is the author's manuscript of the article published in final edited form as: Harle, C. A., Vest, J. R., & Menachemi, N. (2016). Using Bibliometric Big Data to Analyze Faculty Research Productivity in Health Policy and Management. The Journal of Health Administration Education, 33(2), 285–293.

    ABSTRACT
    Bibliometric big data and social media tools provide new opportunities to aggregate and analyze researchers' scholarly impact. The purpose of the current paper is to describe the process and results we obtained after aggregating a list of public Google Scholar profiles representing researchers in Health Policy and Management or closely related disciplines. We extracted publication and citation data on 191 researchers who are affiliated with health administration programs in the U.S. With these data, we created a publicly available listing of faculty that includes each person's name, affiliation, year of first citation, total citations, h-index, and i10-index. The median of total citations per individual faculty member was 700, while the maximum was 46,363. The median h-index was 13, while the maximum was 91. We plan to update these statistics and add new faculty to our public listing as new Google Scholar profiles are created by faculty members in the field.
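    The profile statistics quoted in this abstract are easy to reproduce from any citation listing. The sketch below (hypothetical citation counts, not the authors' dataset) computes total citations, h-index, and i10-index per profile, and the median across profiles:

        # Sketch of profile-level aggregation with invented data.
        from statistics import median

        def h_index(counts):
            # h = number of ranks r (counts sorted descending) with count >= r
            return sum(1 for rank, c in enumerate(sorted(counts, reverse=True), start=1) if c >= rank)

        def i10_index(counts):
            # i10 = number of papers with at least 10 citations
            return sum(1 for c in counts if c >= 10)

        profiles = {  # hypothetical faculty profiles: citations per paper
            "Researcher A": [120, 60, 33, 12, 9, 3],
            "Researcher B": [45, 22, 18, 11, 10, 7, 2],
            "Researcher C": [300, 150, 80, 40, 15],
        }

        print("median total citations:", median(sum(c) for c in profiles.values()))
        for name, counts in profiles.items():
            print(name, "total:", sum(counts), "h:", h_index(counts), "i10:", i10_index(counts))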
  • Tipping Points: Cancelling Journals When Arxiv Access Is Good Enough
    Tipping points: cancelling journals when arXiv access is good enough

    Tony Aponte
    Sciences Collection Coordinator, UCLA Library
    ASEE ELD Lightning Talk, June 17, 2019

    Preprint explosion!
    • Brian Resnick and Julia Belluz. (2019). The war to free science. Vox. https://www.vox.com/the-highlight/2019/6/3/18271538/open-access-elsevier-california-sci-hub-academic-paywalls
    • arXiv. (2019). arXiv submission rate statistics. https://arxiv.org/help/stats/2018_by_area/index

    2018 Case Study: two physics journals and arXiv
    • UCLA: heavy users of arXiv. Not so heavy users of the version of record
    • Decent UC authorship
    • No UC editorial board members

                  2017 Usage   Annual cost   Cost per use   2017 Impact Factor
        Journal A        103        $8,315           ~$80                1.291
        Journal B         72        $6,344           ~$88                0.769

    Just how many of these articles are OA?
    • OAISSN.py - Enter a journal ISSN and a year, and this Python program will tell you how many DOIs from that year have an open access version.
    • Ryan Regier. (2018). OAISSN.py. https://github.com/ryregier/OAcounts

                  % OA articles from 2017   % OA articles from 2018
        Journal A                     68%                       64%
        Journal B                     11%                        8%

    arXiv e-prints are becoming closer to publisher versions of record, according to a UCLA similarity study of arXiv articles vs. versions of record:
    • Martin Klein, Peter Broadwell, Sharon E. Farb, Todd Grappone. 2018. Comparing Published Scientific Journal Articles to Their Pre-Print Versions -- Extended Version.
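    The slides do not show OAISSN.py's internals, but the stated idea - take an ISSN and a year, list that year's DOIs, and check each one for an open-access copy - can be sketched against the public Crossref and Unpaywall APIs. This is an independent sketch, not Regier's script; the email address and example ISSN are placeholders:

        # Re-implementation sketch of the OAISSN.py idea using the
        # public Crossref and Unpaywall web APIs (assumed available).
        import requests

        EMAIL = "you@example.com"  # Unpaywall asks callers to identify themselves

        def dois_for_year(issn, year, limit=200):
            """Fetch up to `limit` DOIs the journal published in `year` (Crossref)."""
            url = f"https://api.crossref.org/journals/{issn}/works"
            params = {
                "filter": f"from-pub-date:{year}-01-01,until-pub-date:{year}-12-31",
                "select": "DOI",
                "rows": limit,
            }
            items = requests.get(url, params=params, timeout=30).json()["message"]["items"]
            return [item["DOI"] for item in items]

        def is_oa(doi):
            """Ask Unpaywall whether any open-access copy of this DOI is known."""
            r = requests.get(f"https://api.unpaywall.org/v2/{doi}",
                             params={"email": EMAIL}, timeout=30)
            return bool(r.json().get("is_oa"))

        def oa_share(issn, year):
            dois = dois_for_year(issn, year)
            return sum(is_oa(d) for d in dois) / len(dois) if dois else 0.0

        # e.g. oa_share("0370-2693", 2017) -- illustrative ISSN, not one of the
        # anonymized journals in the talk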
  • Microsoft Academic Search and Google Scholar Citations
    Microsoft Academic Search and Google Scholar Citations: A comparative analysis of author profiles

    José Luis Ortega, VICYT-CSIC, Serrano, 113, 28006 Madrid, Spain, [email protected]
    Isidro F. Aguillo, Cybermetrics Lab, CCHS-CSIC, Albasanz, 26-28, 28037 Madrid, Spain, [email protected]

    Abstract
    This paper aims to make a comparative analysis between the personal profiling capabilities of the two most important free citation-based academic search engines, namely Microsoft Academic Search (MAS) and Google Scholar Citations (GSC). Author profiles can be very useful for evaluation purposes once the advantages and the shortcomings of these services are described and taken into consideration. A total of 771 personal profiles appearing in both the MAS and the GSC databases are analysed. Results show that the GSC profiles include more documents and citations than those in MAS, but with a strong bias towards the Information and Computing sciences, while the MAS profiles are disciplinarily better balanced. MAS shows technical problems, such as a higher number of duplicated profiles and a lower updating rate than GSC. It is concluded that both services could be used for evaluation purposes only if they are applied along with other citation indexes as a way to supplement that information.

    Keywords: Microsoft Academic Search; Google Scholar Citations; Web bibliometrics; Academic profiles; Research evaluation; Search engines

    Introduction
    In November 2009, Microsoft Research Asia started a new web search service specialized in scientific information. Even though Google (Google Scholar) had already introduced an academic search engine in 2004, the proposal of Microsoft Academic Search (MAS) went beyond a mere document retrieval service that counts citations.
  • Evolving Standards for Academic Publishing: a Q-R Theory
    Evolving Standards for Academic Publishing: A q-r Theory

    Glenn Ellison
    Massachusetts Institute of Technology and National Bureau of Economic Research

    This paper develops models of quality standards to examine two trends: academic journals increasingly require extensive revisions of submissions, and articles are becoming longer and changing in other ways. Papers are modeled as varying along two quality dimensions: q reflects the importance of the main ideas and r other aspects of quality. Observed trends are regarded as increases in r-quality. A static equilibrium model illustrates comparative statics explanations. A dynamic model in which referees (with a biased view of their own work) learn social norms for weighting q and r is shown to produce a long, gradual evolution of social norms.

    I. Introduction
    I encourage readers of this paper to first put it down for two minutes and thumb through a 30- or 40-year-old issue of an economics journal. This should convey better than I can with words how dramatically economics papers have changed over the last few decades. Papers today are much longer. They have longer introductions. They have more…

    I would like to thank the National Science Foundation (SBR-9818534), the Sloan Foundation, and the Center for Advanced Study in the Behavioral Sciences for their support. I would like to thank Robert Barro, John Cochrane, Ken Corts, Drew Fudenberg, John Geanakoplos, Robin Pemantle, Gary Saxonhouse, and various seminar audiences for helpful comments, and Christine Kiang and Nada Mora for outstanding research assistance.

    Ariel Rubinstein's 1982 Econometrica article, "Perfect Equilibrium in a Bargaining Model," e.g., is about as long as the average note in Econometrica in 2001.
  • Scholarly Metrics Recommendations for Research Libraries: Deciphering the Trees in the Forest Table of Contents
    EUROPE'S RESEARCH LIBRARY NETWORK

    Scholarly Metrics Recommendations for Research Libraries: Deciphering the Trees in the Forest

    Table of Contents
    INTRODUCTION
    ABOUT LIBER
    1. DISCOVERY & DISCOVERABILITY
      1A. Provide Contextual Information To Allow the Discovery of Related Work & Users
      1B. Exploit Rich Network Structures & Implement Bibliometric Methods To Enable Discovery
      1C. Encourage Sharing Of Library Collections Under Open Licenses
    2. SHOWCASING ACHIEVEMENTS
      2A. Incentivize Researchers To Share Scholarly Works, Promote Achievements Online & Engage With Audiences
      2B. Encourage Researchers To Showcase Scientific Contributions & Monitor Impact
    3. SERVICE DEVELOPMENT
      3A. Join Forces With Stakeholders To Provide Services Reusing Existing Resources, Tools, Methods & Data
      3B. Value Various Levels of Engagement; Favour Standardized, Well-Established Practices & Easy-To-Use Tools
      3C. …
      3D. Learn About Platforms Before Implementing Them In Services & Homogenize Different Sources
      3E. Work With Researchers To Build Awareness Of Benefits But Educate About Weaknesses of Scholarly Metrics
      3F. Expand Your Perspective When Developing Services; Avoid Single-Purpose Approaches
      3G. Teach Colleagues New Skills & Library Goals & Services
    4. RESEARCH ASSESSMENT
      4A. Establish Appropriate Goals for Assessment Exercises Before Selecting Databases & Metrics; Be Transparent About Use & Interpretation
      4B. Use Different Data Sources to Include Various Scholarly Works & Disciplinary Communication & Publication Cultures
      4C. Rely on Objective, Independent & Commonly-Accepted Data Sources To Provide Sound & Transparent Scholarly Metrics
      4D. Avoid Using Composite Indicators & Conflating Different Aspects Of Scholarly Works & Impact; Don't Lose The Multifaceted Nature of Metrics & Distort Interpretation
    CONCLUSION & CALL FOR ACTION
    REFERENCES
  • Scientometric Indicators and Their Exploitation by Journal Publishers
    Scientometric indicators and their exploitation by journal publishers

    Name: Pranay Parsuram
    Student number: 2240564
    Course: Master's Thesis (Book and Digital Media Studies)
    Supervisor: Prof. Fleur Praal
    Second reader: Dr. Adriaan van der Weel
    Date of completion: 12 July 2019
    Word count: 18177 words

    Contents
    1. Introduction
    2. Scientometric Indicators
      2.1. Journal Impact Factor
      2.2. h-Index
      2.3. Eigenfactor™
      2.4. SCImago Journal Rank
      2.5. Source Normalized Impact Per Paper
      2.6. CiteScore
      2.6. General Limitations of Citation Count
    3. Conceptual Framework …
  • Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and Opencitations’ COCI: a Multidisciplinary Comparison of Coverage Via Citations
    ✅ The content of this manuscript has been peer-reviewed and accepted for publication at Scientometrics (ISSN 1588-2861). Cite as: Martín-Martín, A., Thelwall, M., Orduna-Malea, E., & Delgado López-Cózar, E. (2021). Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: A multidisciplinary comparison of coverage via citations. Scientometrics, 126(1), 871-906. https://doi.org/10.1007/s11192-020-03690-4

    Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations

    Alberto Martín-Martín, Mike Thelwall, Enrique Orduna-Malea, Emilio Delgado López-Cózar

    Abstract (also in Spanish, Chinese)

    Introduction: New sources of citation data have recently become available, such as Microsoft Academic, Dimensions, and the OpenCitations Index of CrossRef open DOI-to-DOI citations (COCI). Although these have been compared to the Web of Science (WoS), Scopus, or Google Scholar, there is no systematic evidence of their differences across subject categories.

    Methods: In response, this paper investigates 3,073,351 citations found by these six data sources to 2,515 English-language highly-cited documents published in 2006 from 252 subject categories, expanding and updating the largest previous study.

    Results: Google Scholar found 88% of all citations, many of which were not found by the other sources, and nearly all citations found by the remaining sources (89%-94%). A similar pattern held within most subject categories. Microsoft Academic is the second largest overall (60% of all citations), including 82% of Scopus citations and 86% of Web of Science citations. In most categories, Microsoft Academic found more citations than Scopus and WoS (182 and 223 subject categories, respectively), but had coverage gaps in some areas, such as Physics and some Humanities categories.
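    Coverage percentages of this kind reduce to set arithmetic over the union of citations found by all sources. A toy sketch with invented citation IDs (not the study's data) shows the two computations quoted above - share of all citations, and pairwise overlap between sources:

        # Toy sketch: each source maps to the set of citing-document IDs it found.
        sources = {
            "Google Scholar": {1, 2, 3, 4, 5, 6, 7, 8, 9},
            "Microsoft Academic": {1, 2, 3, 4, 5, 6},
            "Scopus": {1, 2, 3, 4, 7},
            "WoS": {1, 2, 3, 8},
        }

        all_citations = set().union(*sources.values())

        for name, found in sources.items():
            print(f"{name}: {len(found) / len(all_citations):.0%} of all citations")

        # Pairwise coverage, e.g. how many Scopus citations Microsoft Academic also found:
        mas, scopus = sources["Microsoft Academic"], sources["Scopus"]
        print(f"MA covers {len(mas & scopus) / len(scopus):.0%} of Scopus citations")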
  • Citeseerx: 20 Years of Service to Scholarly Big Data
    CiteSeerX: 20 Years of Service to Scholarly Big Data

    Jian Wu, Old Dominion University, Norfolk, VA, [email protected]
    Kunho Kim, Pennsylvania State University, University Park, PA, [email protected]
    C. Lee Giles, Pennsylvania State University, University Park, PA, [email protected]

    ABSTRACT
    We overview CiteSeerX, the pioneer digital library search engine that has been serving academic communities for more than 20 years (first released in 1998), from three perspectives. The system perspective summarizes its architecture evolution in three phases over the past 20 years. The data perspective describes how CiteSeerX has created searchable scholarly big datasets and made them freely available for multiple purposes. In order to be scalable and effective, AI technologies are employed in all essential modules. To effectively train these models, a sufficient amount of data has been labeled, which can then be reused for training future models. Finally, we discuss the future of CiteSeerX.

    …access to a growing number of researchers. Mass digitization partially solved the problem by storing document collections in digital repositories. The advent of modern information retrieval methods significantly expedited the process of relevant search. However, documents are still saved individually by many users. In 1997, three computer scientists at the NEC Research Institute (now NEC Labs), New Jersey, United States – Steven Lawrence, Kurt Bollacker, and C. Lee Giles – conceived an idea to create a network of computer science research papers through citations, which was to be implemented by a search engine, the prototype CiteSeer. Their intuitive idea, automated citation indexing [8], changed the way researchers…
  • Tracking Self-Citations in Academic Publishing Kacem, Ameni; Flatt, Justin W.; Mayr, Philipp
    www.ssoar.info
    Tracking self-citations in academic publishing
    Kacem, Ameni; Flatt, Justin W.; Mayr, Philipp
    Published Version / journal article
    Provided in cooperation with: GESIS - Leibniz Institute for the Social Sciences

    Suggested citation: Kacem, A., Flatt, J. W., & Mayr, P. (2020). Tracking self-citations in academic publishing. Scientometrics, 123(2), 1157-1165. https://doi.org/10.1007/s11192-020-03413-9

    Terms of use: This document is made available under a CC BY Licence (Attribution). For more information see: https://creativecommons.org/licenses/by/4.0

    This version is citable under: https://nbn-resolving.org/urn:nbn:de:0168-ssoar-71640-3

    Scientometrics (2020) 123:1157–1165
    Tracking self-citations in academic publishing
    Ameni Kacem · Justin W. Flatt · Philipp Mayr
    Received: 26 February 2020 / Published online: 18 March 2020
    © The Author(s) 2020

    Abstract
    Citation metrics have value because they aim to make scientific assessment a level playing field, but urgent transparency-based adjustments are necessary to ensure that measurements yield the most accurate picture of impact and excellence. One problematic area is the handling of self-citations, which are either excluded or inappropriately accounted for when using bibliometric indicators for research evaluation. Here, in favor of openly tracking self-citations, we report on self-referencing behavior among various academic disciplines as captured by the curated Clarivate Analytics Web of Science database.
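    The bookkeeping behind such tracking can be sketched simply. Under the common convention that a citation is a self-citation when the citing and cited papers share at least one author, a toy example (invented records, not the Web of Science data used in the paper) computes a self-citation rate:

        # Toy sketch of self-citation tracking with invented records.
        papers = {  # paper id -> set of author names
            "P1": {"A. Alvarez", "B. Chen"},
            "P2": {"A. Alvarez"},
            "P3": {"C. Okafor"},
        }
        citations = [("P2", "P1"), ("P3", "P1"), ("P1", "P2")]  # (citing, cited) pairs

        def is_self_citation(citing, cited):
            # Self-citation: citing and cited papers share at least one author.
            return bool(papers[citing] & papers[cited])

        n_self = sum(is_self_citation(citing, cited) for citing, cited in citations)
        print(f"self-citation rate: {n_self / len(citations):.0%}")  # 67% in this toy example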