“Publish or Perish” in the Internet Age

A study of publication statistics in computer networking research

Dah Ming Chiu and Tom Z. J. Fu
Department of Information Engineering, CUHK
{dmchiu, zjfu6}@ie.cuhk.edu.hk

This article is an editorial note submitted to CCR. It has NOT been peer reviewed. The authors take full responsibility for this article’s technical content. Comments can be posted through CCR Online.

ABSTRACT

This study takes papers from a selected set of computer networking conferences and journals spanning the past twenty years (1989-2008) and produces various statistics to show how our community publishes papers, and how this process has been changing over the years. We observe rapid growth in the rate of publications, venues, citations, authors, and number of co-authors. We explain how these quantities are related, and in particular explore how they are related over time and the reasons behind the changes. The widely accepted model to explain the power-law distribution of paper citations is preferential attachment. We propose an extension and refinement which suggests that elapsed time is also a factor in determining which papers get cited. We try to compare the selected venues based on citation count, and discuss how we might think about these comparisons, in terms of the roles played by different venues and the ability of venues and citation counts to predict impact. The treatment of these issues is general and can be applied to study publication patterns in other research communities. The larger goal of this study is to generate discussion about our publication system, and to work towards a vision of transforming it for better scalability and effectiveness.

Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: General

General Terms
Documentation, Performance

Keywords
citation, h-index, g-index, co-authorship, academic publishing

1. INTRODUCTION

The academic publication machinery, taken as a whole, provides an archive for peer-reviewed academic papers. In the process, the meta-information associated with the publications, such as date and venue of publication, authorship and citations, can also be readily gathered from various sources [1–5]. Such publication records can be very useful for research, though they are perhaps most often used in performance evaluation for hiring, promotion and tenure cases in the academic world. In this paper, we study the publication records of a selected set of conferences and journals in the networking field over the past 10-20 years at an aggregate level, and summarize and discuss the statistics. Through these statistics, we hope to (1) share some interesting observations about the way our community publishes, and the characteristics of some familiar conferences and journals; (2) ask questions and generate interest for future studies; and (3) get feedback on methodologies and possible collaboration for this kind of study.

For the rest of the paper, we begin by describing (a) the paperset we use in the context of existing on-line sources, and (b) the most important previous works and our approach. Subsequently, we discuss the various results and observations one by one, ending with a conclusion section. For more detailed statistics and a detailed explanation of the methodology, see our technical report [9].

2. DATA MODEL

We begin with a brief definition of terminology and an explanation of the data we analyze.

A paper is published by a venue at a date. Venues refer to conferences (and workshops) or journals. Most of them publish papers on a periodic basis (e.g. every year, or every month). A paperset is a collection of papers we use to calculate various statistics. Each paper has one or more authors; each author may publish one or more papers. As a rule, each paper cites related papers published earlier. The citation count of a paper, which changes as time goes on, is the total number of citations received by the paper by a given time. This data model is used by various providers of academic publication statistics, e.g. Google Scholar [1], DBLP [2], IEEE [3], ACM [4], ISI [5], Citeseer [6], or Microsoft Libra [7]. These providers, however, tend to use different papersets. For example, Citeseer, DBLP and Libra focus mostly on computer science and related literature, but each has its own rules about which conferences/papers to include. The various statistics, e.g. citation counts of papers, are derived from the respective closed papersets, leaving them not cross-comparable. The providers also tend to use different metrics, e.g. ISI and Citeseer may have different definitions of impact factor for venues.

For our study, we decided to focus on a particular research field, in this case computer networking. We also decided to use the Google Scholar citation count (for each paper we consider) because Google Scholar’s paperset is arguably the largest. This decision has a couple of consequences: (a) The

ACM SIGCOMM Computer Communication Review 34 Volume 40, Number 1, January 2010

process of gathering citation counts from Google Scholar cannot be completely automated. For a given set of papers, it takes some time to gather the citation counts with some level of manual verification. (b) The citation count of a paper in Google Scholar is continuously updated and changing. This means we need to focus on a limited set of conferences. In this study, we tried to choose venues that we are more familiar with, and tried to pick different types to make them reasonably representative.

The paperset we consider comes from the set of venues listed in Table 1. We illustrate its relationship to the other papersets in Fig 1. The citation counts were gathered from Google Scholar in late summer of 2009, and can be considered a snapshot.

[Figure 1 appears here: a Venn-style diagram relating Google Scholar (size unknown), Microsoft Libra (3,314,754 papers), DBLP (1,293,019), CiteSeer (1,440,519), ISI (87+ million), and our data set (19,907 papers).]

Figure 1: Illustration of several existing data sets.

Venue name            #years  Total # papers
Sigcomm (88-08)           21             612
Infocom (89-08)           20            4069
Sigmetrics (89-08)        20             795
Elsevier CN (89-08)       20            2953
IEEE/ACM ToN (93-08)      16            1331
ICNP (93-08)              16             559
MobiCom (95-08)           14             385
ICC (98-08)               11            7721
WWW (01-08)                8            1063
IMC (01-08)                8             281
NSDI (04-08)               5             138

Table 1: Paper set.

The study of academic publication statistics is by no means new. Previous attention focused mostly on different areas of science, especially physics. In fact, this field of study is referred to as bibliometrics. The most influential work was published in 1965 by Derek de Solla Price [10], in which he considered papers and citations as a network, and noticed that the citation distribution (degree distribution) follows a power law. He tried to explain this phenomenon using a simple model, later referred to as the model of preferential attachment (i.e. a paper is more likely to cite another paper with more existing citations). Other authors also turned their attention to the network formed by authors who wrote papers together [13–17]. The difference in our study, other than the focus on the computer networking field, is mainly in the consideration of how things change over time. In the discussion of our results, we will make passing reference to the prior works as the opportunities arise.

3. PAPER INFLATION

It should not be surprising to observe that we are seeing a rapid increase in the rate at which papers are published in our field. We can account for the increase in terms of:

1. Many conferences increase the number of papers they publish, usually by increasing the number of tracks of presentation. Fig 2 shows the total rate of publication of the list of venues we consider.

2. Most successful conferences gradually add workshops before/after the main event, which allow more papers to be published. Fig 2 also shows the number of workshops associated with the conferences in our list.

3. The number of conferences and journals has increased significantly over the last 10-20 years. Fig 3 shows the number of venues in CiteSeerX’s venue impact factor report. CiteSeerX tracks a much larger field than networking; based on our experience, we are assuming the growth of the networking field is strongly correlated to that of the superset tracked by CiteSeerX.

[Figure 2 appears here: per-year counts of published papers (left axis, 0-2000) and associated workshops (right axis, 0-80), 1988-2010.]

Figure 2: Total number of papers and number of associated workshops changing by year.

Discussion - Likely reasons for paper inflation: We think the following are important reasons for paper inflation.

1. The number of authors is increasing. We will show some statistics in a later section on authorship. There may be various reasons for the increase, but two come to mind: (a) The success of the Internet and WWW has drastically raised people’s awareness of and interest in working on computer networking. This is sometimes referred to as the dotcom effect. (b) Due to the economic development and opening-up of many developing countries, most notably China, a large number of authors are joining the research community overall.

2. The rate of publication per author is edging up. In order to raise the level of measurable output, many academic units or individuals themselves strive to publish


[Figures 3 and 4 appear here: Fig 3 plots the number of listed venues per year (1992-2008) with a fitting curve over 1993-2005, R² = 0.9952; Fig 4 is a log-log plot of the number of papers with X citations against the citation count X.]

Figure 3: Number of listed venues by CiteSeerX in each year.

Figure 4: Distribution of number of citations (our data set).
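A distribution like the one in Figure 4 can be examined by plotting its complementary cumulative distribution function (CCDF): on log-log axes, a power law appears as a straight line. A minimal sketch, with invented citation counts (not the actual data):

```python
# Sketch: the empirical CCDF of a list of citation counts.
# The sample counts below are invented, for illustration only.
from collections import Counter

def ccdf(counts):
    """Return sorted (x, P[X >= x]) pairs for a list of citation counts."""
    n = len(counts)
    freq = Counter(counts)
    pairs = []
    tail = n                      # number of papers with >= x citations
    for x in sorted(freq):
        pairs.append((x, tail / n))
        tail -= freq[x]
    return pairs

citations = [0, 0, 1, 1, 2, 3, 5, 8, 13, 120]   # hypothetical paperset
for x, p in ccdf(citations):
    print(x, p)   # roughly straight on log-log axes would suggest a power law
```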

more. Sometimes a minimum number of publications is imposed before a student can obtain a graduate degree.

3. The Internet and WWW make publications more accessible. As a result, publications are shared more globally.

Discussion - Consequences of paper inflation: A rapid increase in publication puts great stress on the peer-review system. Given the decentralized way publication venues are run, the relevant papers for any research topic may become more spread out, making it harder for researchers to follow the literature, hence increasing the chances of re-inventing the wheel. There have been various proposals to revamp the whole publication system. It is indeed very exciting to think about how to design a global and scalable on-line system in which all researchers can share their research results, with all the accounting (of who did what) kept, and which remains highly usable (in terms of ease of finding useful and relevant information quickly).

4. CITATION INFLATION

First, we plot the distribution of the citation counts earned by papers in our paperset on a log-log scale, as shown in Fig 4.¹ The result is consistent with prior works based on larger data sets [17].

It is more interesting to observe what happens when we plot the total citation count over time, shown in Fig 5. Preferential attachment alone cannot explain this, as there seems to be a pronounced dependence on time.

[Figure 5 appears here: total citation count by year of publication (on the order of 10⁴), 1988-2008, for our data set.]

Figure 5: Total number of citations by year (our data set).

The total citation count seems to be an increasing function up to some point (7-8 years from the current time), then it starts decreasing. This curve can be explained intuitively. First, the fall in total citation count in more recent years is due to the fact that citation count takes time to build up. The more recent the year, the less the time for this build-up. The rise of the curve during the early years also has a good reason: it is a (subtle) consequence of paper inflation. By having fewer papers in earlier years, we lower the rate at which citations are collected in earlier years as well.

What is even more interesting is that we believe we discovered evidence that there is fashion in research. In other words, researchers may favor citing papers published in the recent past. This observation comes from our effort to create a model to explain the citation curve.

Let us try to explain the situation with a simple model. Let the number of papers published in year t be denoted by x_t, and the number of citations a paper makes be γ (a constant). The total number of citations made in year t is thus γ x_t. The distribution of these citations to the papers published previously is assumed to follow a probability distribution α(n), where n is the number of years separating the citing paper and the receiving paper. The sum of citations received by papers published n years before the current year t is denoted c(n, t). Assuming the time horizon for t is [s, f], then

    c(n, t) = \frac{\alpha(n) x_{t-n}}{\sum_{m=1}^{t-s} \alpha(m) x_{t-m}} \, \gamma x_t.

¹In order to ascertain that the distribution indeed follows a power law, we can plot it as a Complementary Cumulative Distribution Function (CCDF). The result shows the data is closer to a lognormal distribution, which is also approximately heavy-tailed. In any case, the paperset in our study is a relatively small and biased data set, so the exact citation distribution is not the focus of this study.


The total citations received by the papers published in year t, denoted c(t), is then

    c(t) = \sum_{i=t+1}^{f} c(i - t, i).

We then use different plausible distributions for α(n) and try to fit the curve we have. We tried the uniform distribution and the Poisson distribution with various parameters, and the Poisson distribution worked quite well. The paper-count statistics are plotted in Fig 6; we used a smoothened version to model x_t, with the dotted line a projection into the next few years. The resulting citation curve predicted by the model (with a mean for α of 4 years²) is shown in Fig 7. The fashion in research, based on this data set, is thus for research topics first published 3-4 years earlier. Finally, the dotted line in Fig 7 is a prediction of inflation down the road - the expected citation counts in the coming years based on the trend.

²Actually, the mean is three years if we allow citations to be made to other papers published in the same year.

[Figure 6 appears here: papers published per year, 1988-2012: real data, a quadratic fitting curve, and a four-year prediction (dotted).]

Figure 6: Total number of papers in each year versus quadratic fitting curve with four-year prediction.

[Figure 7 appears here: total citation count by year (up to about 12×10⁴): real data, the Poisson model, and a four-year prediction (dotted).]

Figure 7: Total citation number of published papers versus Poisson model with four-year prediction.

Discussion - model assumption: Needless to say, our model assumes a closed paperset whose papers cite papers internal to the set. In our analysis, our paperset is but a (small) subset of all the networking papers of the last 10-20 years, so there are inaccuracies due to papers in our paperset citing papers outside it, and outside papers citing ours. We are overlooking these issues in this simple analysis.

Discussion - adjusting to inflation: Citation count (and paper count) inflation is a phenomenon relevant to people making evaluations based on citations year after year. In year t, you may consider a paper (or a person) receiving a citation count of at least c as adequate/good. In year t+1, you would need to adjust your threshold up due to citation inflation. We have demonstrated how you might compute the inflation rate from a simple model, once you know the paper inflation rate and have some idea of the citation distribution function α. Similarly, when comparing two papers published at different times by citation count, you may consider using our model as a way to calibrate the comparison.

Discussion - research fashion? The model of preferential attachment alone does not seem rich enough to explain the aggregate behavior of citation counts. The model of research fashion seems very interesting to us. To further validate the model, we would need to use a paper database with more detailed information, e.g. including the times at which citations are made.

5. COMPARISON OF VENUES

Perhaps the most interesting question to readers is how the publication venues compare to each other in terms of citation count. This corresponds to what is known as the impact factor in academic circles. There is no standard definition of impact factor, so we will use a number of different metrics to compute the citation statistics for different venues over a period of time: (i) the top twentieth paper; (ii) the average; (iii) the percentiles; (iv) the h-index; (v) the g-index. The results are tallied in Table 2.

H-index [8,12] and g-index [11] are recently proposed metrics for computing an impact factor as a single number. To derive these numbers, you first sort all papers in descending order of citation count; the h-index is the highest index (h) of a paper whose citation count is at least h, and the g-index is the highest index (g) of a paper such that the sum of the citation counts from paper 1 to g is at least g². The h-index was originally proposed for evaluating the impact factor of a person; the g-index is a variation of it. Both can be used to evaluate the impact factor of any aggregate of papers.

Comparing publication venues is controversial. There are several issues: (a) there is no well-established metric; (b) the venues publish papers at different rates; (c) in our paperset, the data cover different time periods, and we know from the citation inflation discussion that it is not fair to compare citation statistics for papers published at different times. For (a), what we try to do is apply many different metrics and let the readers make their own judgements. For (b) and (c), we re-computed all the statistics for a window of three years common to all venues. Indeed, NSDI, the venue with the most negative bias, moved up in ranking under all metrics. The other conferences with only 8 years of papers (WWW and IMC) also moved up slightly. The detailed results can be found in the technical report [9].

To remove the time-dependent effects, we can also try to do the comparison in a year-by-year fashion. Instead of doing this for all metrics, we do it using only one metric: plot


Venue name   # papers  top 20th   Avg.  90th  80th  70th  60th  50th  40th  30th  20th  10th  h-index  g-index
Sigcomm           612      1155  233.5   513   281   181   114    79    58    34    16     6      179      362
MobiCom           385       962  201.3   484   230   142    93    62    39    22    12     5      124      276
ToN              1331      1026   99.4   206   102    61    39    24    16    10     5     2      167      333
NSDI              138       108   53.2   141    85    61    39    28    20    16    11     7       50       82
IMC               281       144   52.9   126    82    57    42    28    18    11     6     3       68      111
Infocom          4069       727   52.3   132    69    40    25    16    10     6     3     1      207      341
Sigmetrics        795       233   47.9   113    66    42    27    18    11     7     3     1       97      167
WWW              1063       293   40.8   120    51    32    20    11     7     3     2     0      110      176
ICNP              559       156   30.2    76    43    24    14    10     6     3     1     0       66      113
Elsevier CN      2953       453   29.2    54    25    15     9     6     4     2     1     0      127      241
ICC              7721       270    9.3    20    10     6     4     3     1     1     0     0      100      161

Table 2: Percentile- and index-based analysis on 11 selected venues.

the median citation count for all venues (in Fig 8). In this case, a couple of conferences consistently stand out - these are the ACM single-track conferences Sigcomm and MobiCom, with very low annual acceptance rates (about 10%).

[Figure 8 appears here: per-year median citation counts (0-350) for the 11 venues (Sigcomm, Infocom, ToN, MobiCom, Sigmetrics, ICNP, WWW, NSDI, IMC, Computer Networks, ICC), 1988-2008.]

Figure 8: Median citation number of 11 selected venues in the computer networking field, with all papers counted in each year.

Discussion - different needs: First, it is expected that there are many levels of publications. At one extreme, we have papers based on a new discovery or a new idea, or written by experienced researchers who exercise a certain amount of self-discipline when submitting papers. At the other extreme, we have many papers written mostly to fulfill graduation or job requirements. Different venues position themselves to satisfy different needs, in the process providing different kinds of service to the community.

Discussion - use of venue to judge impact: If the venue of a paper is used to predict the potential impact of that paper, the different venues have very different predictive power. For example, if we consider a (Google Scholar) citation count of 20 or higher to be of impact, then a Sigcomm paper is close to 80% likely to satisfy that threshold, whereas an ICC paper has only a 10% likelihood of doing so. For venues with low predictive power, the citation count should be considered as well. In some academic institutions, only papers from journals are counted. This may be a reasonable simplification for conferences like ICC, but it is not wise to exclude those conferences that can indicate high citation count reliably. For example, in our paperset, Infocom published a comparable number of papers as the journal Computer Networks, and at each percentile, the papers from the former had more citations than those of the latter.

Discussion - use of citation count to judge impact: Of course, citation count is not always a reliable indicator of impact. The citation counts produced by Google Scholar can have a lot of noise, since Google Scholar does not remove self-citations and includes some on-line articles that are not peer-reviewed; this is especially true when the value is lower than a certain threshold (of 10, for example). Finally, the correlation of the citation count of a paper with the quality and research value of the paper is complicated. Whether a paper gets cited seems to depend a lot on whether it gets introduced to and read by other researchers interested in the same topic, which can no longer be guaranteed these days, since the rate of publication is higher than the ability of a researcher to follow it. Many other factors, such as presenting the paper to more people on different occasions, and the prestige, social connections and activeness of the authors, all seem to bias the citation count. To remove the effect of the noise, perhaps a condensed scale should be established (e.g. 0-10, 10-20, 20-50, 50-100, 100-200 etc. count as 0, 1, 2, 3, 4 and so on)³. The other major problem with the use of citation counts is the initial time lag. For this, our model of deriving the effect of this lag on an average basis can be used as a rough predictor of the citation count of a young paper. This is a topic worthy of further study. In our study, we do not have the citation history of individual papers. Some repositories provide this, such as ISI [5], but Google Scholar’s output format does not make it easy to collect this information. It would be interesting to study what the history (the rate of citation over time, and perhaps where the citations come from) can predict. For example, what type of impact? Is it due to a controversy, or a short-lived hot topic? How broad and lasting is the impact?

³This method is used to score matches in contract bridge.

6. CONFERENCE-THEN-JOURNAL

It seems we often publish a paper in a conference, and then journalize it. How often do people try, and successfully manage, to journalize a paper? It is said that computer science people prefer to only look at conference papers, and usually do not bother to journalize papers. We compiled some statistics to compare papers that were published in a conference first and journalized subsequently, versus papers that were not journalized. The method of determining a journalized paper is somewhat


Conf.        Total  Journalized  Conf Cite  Jnl Cite  Total Cite  Non-Jnl Cite
Sigcomm        541        108       276.5     298.9       575.4        193.1
Infocom       3415        597        34.9      82.1       117.0         56.0
Sigmetrics     691         88        37.3      84.3       121.6         47.7
ICNP           414         58        41.2      55.4        96.6         31.9
MobiCom        314         91        72.9     205.0       277.9        245.9
ICC           5547        336         7.5      17.8        25.3         12.3
WWW            598         65        71.9      97.6       169.5         58.1
IMC            209         17        82.6      74.2       145.8         60.4

Table 3: Comparison of average citation numbers between journalized papers and non-journalized ones.

ad hoc, so you need to take that into account when considering the results. For each paper in our paperset, we use Google Scholar to search for papers with approximately the same title (based on a threshold percentage match of regular expressions) and the same authors (at least two common authors). The results are summarized in Table 3.

In this table, the second column is the total number of papers for each conference; the third column is the total number of journalized papers; the fourth column is the average citation count of those conference papers that are later journalized; the fifth column is the average citation count of the journal versions of those conference papers; the sixth column is the sum of both conference and journal versions; and the last column is the average citation count of those papers that are not journalized.

Discussion - percentage of papers journalized: Across the board, less than 20% of the papers are journalized, and the overall average is significantly lower than 10%. The relatively low percentage may be due to different reasons. For ACM conferences, some authors may consider that there is no need to journalize, so the limit may be caused by the authors. For other conferences, it can be limited by the quality of the papers.

Discussion - journalizing and citation count: First, again across the board, the total citation count (for the conference version plus the journal version) is, on average, higher than that of the non-journalized papers (last column). Apparently, two rounds of peer review can better sort out the work with more impact - this is reasonable and expected. Note that this applies to the case of the ACM conferences as well⁴. Second, the journal version, on average, receives more citations than the conference version. The exception is IMC. Our guess is that for a measurement conference, timeliness of a paper is more important than for other types of papers, possibly leading to the poorer citation count for the journalized versions. This can also be due to a glitch, since the sample size for IMC is quite small. For this topic, we have done more analysis, for example, to find out the distribution of the number of years it takes to journalize (most often 1-2 years, but there are cases from 0 to 8 years), and to find out which journals the conference papers go to. The readers are referred to our technical report for more details.

⁴Although for MobiCom, the non-journalized papers scored quite high compared to the journalized cases.

Discussion - archival versus quality differentiation: Originally, an important reason for journalizing was that it serves an archival need. Nowadays, it can be argued that conference papers are equally well archived. So the gain in journalization is mostly in allowing more time and more serious reviewing to achieve higher quality. But in reality, the review process for journals is not that much more extensive than for the better-quality conferences, and neither has consistent quality control. A scalable and consistent mechanism for quality differentiation of publications is a very interesting open problem.

7. AUTHORSHIP STATISTICS

We now turn to authorship statistics. Let us define a few quantities, each for a given period of time:

n: number of papers published;
m: number of distinct authors;
r: average number of papers published by each author;
q: average number of co-authors for each paper.

And there exists an invariant relationship connecting these quantities:

    nq = mr

We can call this the balancing equation of authorship. It can be easily verified that the equation is valid. There is one challenging problem - how to separate out distinct authors. We adopted the simplest solution - take the name from the paper as is. This method has two problems:

1. Name collision - more than one real person shares the same name. They will be treated as the same author by us.

2. Multiple name representations - for example, Dah Ming Chiu is sometimes represented as D. M. Chiu. In this case, one real person is considered as two separate authors by us.

We will come back to this issue later in the section.

We can consider 1989-2008, the twenty years of our paperset, as one period. But then the time aspect is lost. We are interested to see how the above statistics (n, m, r and q) changed over the twenty years. If we consider each year as a separate period, the sample size is small (hence variance would be large). So we divided the twenty years into four windows of five-year periods. The resulting quantities are tabulated in Table 4.

From the column for n, we see the number of papers published in successive 5-year windows increased very rapidly, as we discussed before. The number of distinct authors in the corresponding periods had equally rapid increases. Many of these authors (perhaps most students) wrote only a single paper, as shown in Fig 9.


Venue name    89-90  91-92  93-94  95-96  97-98  99-00  01-02  03-04  05-06  07-08
Sigcomm        1.90   2.27   2.45   2.47   2.78   3.01   3.12   3.51   3.63   4.35
Infocom        2.16   2.29   2.41   2.39   2.52   2.78   2.83   2.93   3.07   3.20
Sigmetrics     2.19   2.36   2.55   2.54   2.87   2.90   2.95   3.20   3.36   3.51
Elsevier CN    1.62   1.86   2.16   2.39   2.52   2.93   2.89   2.88   3.00   3.12
ToN               -      -   2.25   2.40   2.49   2.59   2.95   2.70   2.90   2.90
ICNP              -      -   2.53   2.82   2.63   2.62   3.00   3.05   3.26   3.32
MobiCom           -      -      -   2.66   2.87   3.07   3.21   3.34   3.36   3.41
ICC               -      -      -      -   2.57   2.69   2.66   2.80   2.92   3.05
WWW               -      -      -      -      -      -   3.17   3.50   3.07   3.35
IMC               -      -      -      -      -      -   3.19   3.25   3.81   3.57
NSDI              -      -      -      -      -      -      -   3.70   4.02   4.26

Table 5: Average number of co-authors per paper for each conference.
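Entries like those in Table 5 can be produced mechanically by grouping papers into two-year windows and averaging author counts per venue. A minimal sketch (with invented records, not the actual data pipeline):

```python
# Sketch: average co-authors per paper, per venue and two-year window,
# as in Table 5.  The records below are invented; real input would be
# (venue, year, author-list) rows from the paperset.
from collections import defaultdict

papers = [
    ("Sigcomm", 2007, ["A", "B", "C", "D"]),
    ("Sigcomm", 2008, ["E", "F", "G", "H", "I"]),
    ("Infocom", 2007, ["J", "K", "L"]),
    ("Infocom", 2008, ["M", "N", "O"]),
]

def window(year):
    """Map a year to its two-year window label, e.g. 2007 -> '07-08'."""
    start = year if year % 2 == 1 else year - 1
    return "%02d-%02d" % (start % 100, (start + 1) % 100)

totals = defaultdict(lambda: [0, 0])  # (venue, window) -> [author slots, papers]
for venue, year, authors in papers:
    slot = totals[(venue, window(year))]
    slot[0] += len(authors)
    slot[1] += 1

for key, (slots, count) in sorted(totals.items()):
    print(key, slots / count)         # average co-authors per paper
```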

[Figure 9 appears here: log-log distribution of the number of papers per author, one curve per 5-year window (1989-1993, 1994-1998, 1999-2003, 2004-2008).]

Figure 9: Distribution of number of papers per author (our data, in 5-year windows).

Window       n      m      q      r
1989-1993   1699   2678   2.161  1.371
1994-1998   2834   4843   2.489  1.457
1999-2003   5763   9873   2.817  1.644
2004-2008   9441  15731   3.074  1.845

Table 4: Window-based authorship analysis on our data.

It is interesting to note that the number of co-authors per paper has increased nearly 50%. At the same time, the number of papers per author also increased about 20-30%. These are all average numbers. If a large group of people maintain the same behavior (e.g. write only one paper), then the rest of the people must have undergone a much more significant change. In Table 5, we separately account for the number of co-authors per paper for each conference. We see that the number of co-authors stayed relatively more constant for some large conferences like Infocom. At the same time, some smaller conferences such as Sigcomm underwent a much more significant increase in the number of co-authors.

We found a way to double-check our method using another paperset. The DBLP [2] project opens its paperset data for the public to use. Their paperset is much larger, containing approximately 1.1 million papers (1989-2008) covering a broader set of disciplines. We extracted those published in computer networking venues in 1989-2008, a total of 105441 papers. For these papers, we performed the same (aggregate) authorship analysis. The results are shown in Table 6.

If we plot the distribution of papers per author for each of the five-year intervals, the result is basically the same as that shown in Fig 9: we see an increasing number of authors over time, but the distribution remains the same. This means a set of curves with the same (negative) slope, moved out slightly corresponding to the increased author numbers. In Fig 10, we plot the distribution for the entire period (1989-2008) for both papersets, for comparison.

Window       n       m      q      r
1989-1993   13075  14657   2.066  1.843
1994-1998   17643  20446   2.234  1.928
1999-2003   24372  30252   2.458  1.980
2004-2008   50351  64781   2.840  2.207

Table 6: Window-based authorship analysis on DBLP data.

[Figure 10 appears here: log-log distribution of the number of papers per author over the entire period, comparing our data set (1989-2008) and the DBLP data (1989-2008).]

Figure 10: Distribution of number of papers per author (20 years of two data sets).

For the DBLP paperset, they made an effort to solve the name collision problem by trying to distinguish authors based on their co-authorship patterns. The assumption is that two different real persons will have totally disjoint co-author groups. This does tend to eliminate name collisions, but it may also introduce another problem. When a person changes jobs, he/she is likely to build up totally disjoint

ACM SIGCOMM Computer Communication Review 40 Volume 40, Number 1, January 2010

authorship relationships, and will be treated as two separate authors. Despite a different (and much larger) paperset, and a somewhat different way of identifying distinct authors, the resultant statistics are quite similar to those from our smaller paperset.

Discussion - authorship observations: These results on authorship confirm our earlier speculation on the reasons for paper inflation. Indeed, we have all these causes: more authors, and authors writing more papers on average. It behooves us to further study the types of authors and their contribution to the load on the publication system. Such information can be very helpful in the discussion of how to transform the current system into a more scalable one.

Figure 11: Distribution of number of distinct collaborators (two data sets).

Discussion - co-authorship implications: The co-authorship patterns and statistics can help us understand the prevailing collaboration trends, inter-institution and intra-institution. For intra-institution collaboration, the traditional mode may be dominated by student-supervisor collaboration. The increased co-authorship numbers may indicate that group supervision, or hierarchical research groups, are becoming more common. As the pressure for more publications increases, there is also suspicion by some that some co-authors are free-riding. We are not able to ascertain this one way or the other.

Overall, the number of co-authors for computer networking (from our statistics) is similar to that for the fields of physics (average 2.53) and biology (average 3.75), but higher than that for mathematics (average 1.45), as reported by a 2004 study by Newman [14].

Another interesting co-authorship analysis is to look at the collaborator distribution - how many collaborators an author has - in both data sets (Fig 11). As shown in Fig 11, the collaborator distribution results match Newman's results (the physics and mathematics curves in [14]).

Discussion - relationship to other studies: Prior to this section, the analysis was based only on the paper citation network (node=paper, link=citation). For the authorship analysis, the network is extended to authors. There are two types of relationships: author-to-paper, and author-to-author (co-authorship). The author network is a special case of a social network. There is considerable related work on this, including the more recent interest in on-line social networks. A proper survey of this area is beyond the scope of our current study.

8. AUTHOR PRODUCTIVITY

Suppose we agree to take the number of published papers as a measure of author productivity. We are interested in finding out which factors have a statistical correlation with productivity. We were able to study a few factors, and report them here.

The first factor we consider is the number of distinct collaborators. The correlation (of number of papers to number of collaborators) is plotted in Fig 12. There is clearly some correlation.

Discussion - supervision of students: As we discussed earlier, there may be many patterns of collaboration. The collaboration between a supervisor and his/her students (or other hierarchical relationships) would lead to a clear-cut correlation: a straight line, with the slope corresponding to the average number of papers published involving a student. If other, lateral collaboration relationships can be separated out, it would be interesting to see how the behaviour differs.

The next factor we consider is the number of active years. There are at least two ways to define active years: (a) the number of years from a given author's first publication to his/her last; (b) the number of years in which a given author published at least one paper. We used the latter definition, and plot the result in Fig 13. Again, there is clearly a positive correlation, as expected. But the diversity is quite high - from authors who publish one paper per year to authors who publish about ten papers per year, on average.

Next we consider the correlation with the number of co-authors. The result is shown in Fig 14. In this case, there does not seem to be a noticeable correlation.

Finally, we show how the percentage of the most productive authors relates to the percentage of all papers published, in our paperset. In other words, we are interested in the minimum number of authors covering any given percentage of the papers published. This number can be computed approximately using a greedy algorithm, described as follows:

Initialization: Sort all the authors in descending order of their productivity (number of published papers);
Repeat: Remove the author with the most papers from the author list, and all his/her papers from the paper list; and plot (% authors removed, % papers removed).

The plot is shown in Fig 15, for both papersets (ours and the one from DBLP). Furthermore, we also plot (% authors, % citations), derived from the same greedy algorithm with papers replaced by citations. This is done only for our paperset, since its citation information is readily available. It is interesting to note that both paper coverage curves satisfy the 20/80 rule: twenty percent of the most active authors can take credit for eighty percent of the papers. The top 80% of the citations, however, can be credited to the top 5% of the authors.

Discussion - Partitioning authors: From Fig 15, it is perhaps reasonable to separate the authors into two broad classes: professional authors and occasional authors. For the purpose of studying author productivity, we may see certain properties for the class of professional authors which may



Figure 12: Productivity VS number of distinct collaborators (a) our data set, (b) DBLP data.
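The strength of the relationship in Fig 12 could be quantified with an ordinary Pearson correlation coefficient over per-author (number of papers, number of distinct collaborators) pairs. A minimal sketch, with purely hypothetical toy values (the paper reports only the scatter plots, not a coefficient):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy per-author pairs (papers, distinct collaborators) -- hypothetical:
papers        = [1, 2, 3, 5, 8]
collaborators = [2, 3, 5, 9, 14]
print(round(pearson_r(papers, collaborators), 3))   # 0.998
```

Note that for the heavy-tailed distributions seen here, the coefficient is best computed on log-transformed counts or replaced by a rank correlation.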


not be easily visible when considering all authors together. This will be considered in future work.

Figure 13: Productivity VS number of active years (a) our data set, (b) DBLP data.

9. RELATING CITATION COUNT TO CO-AUTHORSHIP

Finally, we study the relationship between the number of co-authors and citation count. In this case, we only used our own paperset: for the DBLP paperset, although the co-authorship information is available, the citation count information (based on Google Scholar) is not easily available. The result is plotted in Fig 16. The main figure is a scatter plot. In the inset, we also plot the average citation count for each number of co-authors. For higher numbers of co-authors, the sample size is very small, so the variance can be high. Between 1 and 7 co-authors, the average citation count seems almost flat, except for 5 authors. It is not clear if this is statistically significant, and if so, why.

10. CONCLUDING REMARKS

What drove us to this work is a strong feeling that our publication system is running into a huge scalability problem. It seems we have an endless number of deadlines for paper reviews and paper submissions, and yet we do not have enough time to read all the papers being published on the research problems we are working on. We also noticed how the numbers of publications and citations of researchers seem to be going through an inflation process, causing a lot of confusion about how to evaluate researchers for degrees and jobs.

We tried to pick a relatively small but representative set of conferences and journals, and all the papers published at these venues in the past twenty years. We relied on available data and tools as much as possible, and tried to produce some statistics meaningful to our community, with discussion based on our perspective. We believe the model for citation history and the balancing equation for authorship are useful tools, and the time-based viewpoint is worthy of further study. Our goal is to provoke more thinking by the whole community about how to improve our own system of publications, starting from how it is working now.

Acknowledgement

The authors wish to thank the student helpers, Kevin Lee and Ivy Ting, for their hard work and carefulness in data



Figure 14: Productivity VS average number of co-authors (a) our data set, (b) DBLP data.
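The coverage curves of Fig 15 come from the greedy procedure described in Section 8: sort authors by productivity, then repeatedly remove the most productive remaining author together with all of his/her papers, plotting the running (% authors, % papers) pairs. A minimal Python sketch; `papers_by_author`, mapping each author to the set of his/her paper ids, is a hypothetical input format:

```python
def greedy_coverage(papers_by_author):
    """Approximate the minimum number of authors needed to cover any
    fraction of the papers: repeatedly pick the author with the most
    still-uncovered papers and mark those papers covered.
    Returns (fraction of authors used, fraction of papers covered)
    points, one per removed author."""
    remaining = {a: set(ps) for a, ps in papers_by_author.items()}
    total_authors = len(remaining)
    total_papers = len(set().union(*remaining.values()))
    covered = set()
    curve = []
    while remaining and len(covered) < total_papers:
        # Greedy step: the author with the most uncovered papers.
        top = max(remaining, key=lambda a: len(remaining[a]))
        covered |= remaining.pop(top)
        for papers in remaining.values():
            papers -= covered   # discount papers already covered
        curve.append(((total_authors - len(remaining)) / total_authors,
                      len(covered) / total_papers))
    return curve

# Toy example with hypothetical authors A, B, C and paper ids 1-5:
print(greedy_coverage({"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}))
```

On this toy input, the single most productive author already covers 60% of the papers, mirroring the 20/80 behavior observed in Fig 15. Substituting per-paper citation counts for paper counts gives the citation-coverage curve.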

Figure 15: Approximate minimum number of authors for maximum paper and citation coverage.

Figure 16: Correlation of citation count and number of coauthors.

collection; and thank Don Towsley, Vishal Misra and John Lui for various suggestions. The work is partially supported by a CUHK direct grant 2050411.

11. REFERENCES
[1] "Google Scholar", http://scholar.google.com/.
[2] "DBLP", http://www.informatik.uni-trier.de/~ley/db/.
[3] "IEEE Digital Library", http://dl.comsoc.org/comsocdl/.
[4] "ACM Digital Library", http://portal.acm.org/dl.cfm/.
[5] "ISI Web of Knowledge", http://www.isiknowledge.com/.
[6] "CiteSeerX", http://citeseerx.ist.psu.edu/.
[7] " Search", http://libra.msra.cn/.
[8] P. Ball. Index aims for fair ranking of scientists. Nature, 436:900, 2005.
[9] D. M. Chiu and T. Z. J. Fu. A study of publication statistics in computer networking research. Technical report, 2009.
[10] D. J. de Solla Price. Networks of scientific papers. Science, 149(3683):510–515, 1965.
[11] L. Egghe. An improvement of the h-index: The g-index. ISSI Newsletter, 2(1):8–9, 2006.
[12] J. E. Hirsch. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102:16569–16572, 2005.
[13] R. K. Merton. The Matthew effect in science. Science, 159:56–63, 1968.
[14] M. E. J. Newman. Coauthorship networks and patterns of scientific collaboration. Proc. Natl. Acad. Sci. USA, 101:5200–5205, 2004.
[15] M. E. J. Newman. Who is the best connected scientist? A study of scientific coauthorship networks. Springer, pages 337–370, 2004.
[16] M. E. J. Newman. The first-mover advantage in scientific publication. Europhys. Lett., 86:68001, 2009.
[17] S. Redner. How popular is your paper? An empirical study of the citation distribution. Eur. Phys. J. B, 4:131–134, 1998.
