Venue Analytics: A Simple Alternative to Citation-Based Metrics

Leonid Keselman Carnegie Mellon University Pittsburgh, PA [email protected]

AI/ML HCI Robotics Theory Vision 12 9 7 12 6 8 10 6 10 5 7 8 5 8 6 NIPS (10.9) CHI (7.2) 4 RSS (6.2) STOC (11.7) CVPR (8.2) 4 5 6 ICML (9.5) CSCW (5.2) ICRA (5.2) 6 FOCS (11.0) ECCV (6.4) 3 4 AAAI (7.7) 3 UIST (5.0) WAFR (5.2) CCC (10.4) ICCV (6.2) 4 AISTATS (7.5) UbiComp (4.8) 2 IJRR (5.1) 4 SODA (10.2) 3 PAMI (5.5)

venue scores venue scores 2 venue scores venue scores venue scores UAI (5.8) ICWSM (4.6) HRI (4.1) APPROX-RANDOM (7.9) 2 IJCV (4.3) 2 2 IJCAI (5.5) 1 DIS (3.1) 1 IROS (3.8) COLT (7.8) 1 WACV (3.7) 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 year year year year year

Database IR/Web NLP Graphics Crypto 7 8 10 9 6 6 7 8 8 5 5 6 7 VLDB (7.8) WWW (6.8) EMNLP (6.6) 5 SIGGRAPH (7.6) 6 TCC (9.1) 6 4 4 SIGMOD (7.6) KDD (6.2) ACL (5.9) 4 SIGGRAPH Asia (7.4) 5 CRYPTO (7.2) ICDE (6.4) 3 CIKM (4.7) 3 NAACL (5.3) TOG (5.5) 4 EUROCRYPT (7.2) 4 3 PODS (5.9) ICWSM (4.6) TACL (4.8) TVCG (5.0) J. Cryptology (4.9) 2 2 3 venue scores CIDR (5.6) venue scores ICDM (4.5) venue scores COLING (3.3) venue scores 2 CGF (4.9) venue scores ASIACRYPT (4.0) 2 2 TDS (5.1) 1 SIGIR (4.2) 1 INTERSPEECH (1.7) 1 SCA (4.7) 1 PKC (3.8) 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 year year year year year

Figure 1: Results from our method, demonstrating differences and changes in conference quality over time in various subfields. These graphs includes size and year normalization and are the result of combining multiple different metrics of interest. ABSTRACT from citation measurements) can be inconsistent, even across these We present a method for automatically organizing and evaluating large, established providers [15]. the quality of different publishing venues in Computer Science. In this work, we propose a method for evaluating a comprehen- Since this method only requires paper publication data as its in- sive collection of published academic work by using an external put, we can demonstrate our method on a large portion of the evaluation metric. By taking a large collection of papers and using DBLP dataset, spanning 50 years, with millions of authors and thou- only information about their publication venue, who wrote them, sands of publishing venues. By formulating venue authorship as and when, we provide a largely automated way of discovering not a regression problem and targeting metrics of interest, we obtain only venue’s value. Further, we also develop a system for automatic venue scores for every conference and journal in our dataset. The organization of venues. This is motivated by the desire for an open, obtained scores can also provide a per-year model of conference reproducible, and objective metric that is not subject to some of the quality, showing how fields develop and change over time. Ad- challenges inherent to citation-based methods [10, 15, 22]. ditionally, these venue scores can be used to evaluate individual We accomplish this by setting up a linear regression from a academic authors and academic institutions. We show that using publication record to some metric of interest. We demonstrate three venue scores to evaluate both authors and institutions produces valid regression targets: status as a faculty member (a classification quantitative measures that are comparable to approaches using task), awarded grant amounts, and salaries. By using DBLP [32] as citations or peer assessment. In contrast to many other existing our source of publication data, NSF grants as our source of awards, evaluation metrics, our use of large-scale, openly available data University of California data for salaries, and CSRankings [6] for enables this approach to be repeatable and transparent. faculty affiliation status, we’re able to formulate these as large data arXiv:1904.12573v2 [cs.DL] 5 Jun 2019 To help others build upon this work, all of our code and data is regression tasks, with design matrix dimensions on the order of available at https://github.com/leonidk/venue_scores. a million in each dimension. However, since these matrices are sparse, regression weights can be obtained efficiently on a single laptop computer. Details of our method are explained in section4. 1 INTRODUCTION We call our results venue scores and validate their performance There exist many tools to evaluate professional academic scholar- in the tasks of evaluating conferences, evaluating professors, and ship. For example Elseiver’s Scorpus provides many author-level ranking universities. We show that our venue scores correlate and journal-level metrics to measure the impact of scholars and highly with other influence metrics, such as h-index24 [ ], cita- their work [14, 16]. Other publishers, such as the Public Library of tions or highly-influential citations [54]. 
Additionally, we show Science, provide article-level metrics for their published work [19]. that university rankings, derived from publication records correlate Large technology companies, such as Google and Microsoft provide highly with both established rankings [17, 37, 42] and with recently their own publicly available metrics for scholarship [11]. Even inde- published quantitative metrics [6,8, 12, 56]. pendent research institutes, such as the Allen Institute’s Semantic Scholar [2], manage their own corpus and metrics for scholarly productivity. However, these author-based metrics (often derived 2 RELATED WORK network-based techniques for ranking venues have included cita- Quantitative measures of academic productivity tend to focus on tion information [60], and are able to produce temporal models of methods derived from citation counts. By using citation count as the quality. In contrast, our proposed model doesn’t require citation primary method of scoring a paper, one can decouple an individual data to produce sensible venue scores. article from the authors who wrote it and the venue it was published Our work, in some ways, is most similar to that of CSRank- in. Then, robust citation count statistics, such as h-index [24], can ings [6]. CSRankings maintains a highly curated set of top-tier be used as a method of scoring either individual authors, or a venues; venues selected for inclusion are given 1 point per paper, specific venue. Specific critiques of h-index scores arose almost as while excluded venues are given 0 points. Additionally, university soon as the h-index was published, ranging from a claimed lack-of- rankings produced by CSRankings include a manually curated set utility [31], to a loss of discriminatory power [53]. of categories, and rankings are produced via a geometric mean Citations can also be automatically analyzed for whether or not over these categories. In comparison, in this work, we produce they’re highly influential to the citing paper, producing a "influen- unique scores for every venue and simply sum together scores for tial citations" metric used by Semantic Scholar [54]. Even further, evaluating authors and institutions. techniques from graph and network analysis can be used to under- In one of the formulations of our method, we use authors status stand systematic relationships in the citation graph [7]. Citation- as faculty (or not faculty) to generate our venue scores. We are based metrics can even be used to provide a ranking of different not alone in this line of analysis, as recent work has demonstrated universities [8]. that faculty hiring information can be used to generate university Citations-based metrics, despite their wide deployment in the prestige rankings [12]. scientometrics community, have several problems. For one, cita- Many existing approaches either focus only on journals [18], or tion behavior varies widely by fields10 [ ]. Additionally, citations do not have their rankings available online. For our dataset, we do often exhibits non-trivial temporal behavior [22], which also varies not have citation-level data available, so we are unable to compare greatly by sub-field. These issues highly affect one’s ability tocom- against certain existing methods on our dataset. However, as these pare across disciplines and produce different scores at different methods often deploy a variant of PageRank [39], we describe a times. Recent work suggests that citation-based metrics struggle PageRank baseline in Section 6.1 and report its results. 
to effectively capture a venue’s quality with a single number [57]. Comparing citation counts with statistical significance requires an 3 DATA order-of-magnitude difference in the citation counts [30], which lim- its their utility in making fine-grained choices. Despite these quality Our primary data is the dblp computer science bibliography [32]. issues, recent work [56] has demonstrated that citation-based met- DBLP contains millions of articles, with millions of authors across rics can be used to build a university ranking that correlates highly thousands of venues in Computer Science. We produced the results with peer assessment; we show that our method provides a similar in this paper by using the dblp-2019-01-01 snapshot. We restricted quality of correlation. ourselves to only consider conference and journal publications, Our use of straightforward publication data (Section3) enables skipping books preprints, articles below 6 pages and over 100 pages. a much simpler model. This simplicity is key, as the challenges in We also merged dblp entries corresponding to the same confer- maintaining good citation data have resulted in the major sources ence. This lead to a dataset featuring 2,965,464 papers, written by of h-index scores being inconsistent with one another [15]. 1,766,675 authors, across 11,255 uniquely named venues and 50 While there exist many forms of ranking journals such as Eigen- years of publications (1970 through 2019). factor [7] or SJR [18], these tend to focus on journal-level metrics Our first metric of interest is an individual’s status as a faculty while our work focuses on all venues, including conferences. member at a university. For this, we used the faculty affiliation data from CSRankings [6], which are manually curated and contain hundreds of universities across the world and about 15,000 profes- sors. For evaluation against other university rankings, we used the 2.1 Venue Metrics ScholarRank [56] data to obtain faculty affiliation, which contains We are not the first to propose that scholars and institutions can a more complete survey of American universities (including more be ranked by assigning scores to published papers[44]. However, than 50 not currently included in CS Rankings). While CSRank- in prior work the list of venues is often manually curated and ings data is curated to have correct DBLP names for faculty, the assigned equal credit, a trend that is true for studies in 1990s [23] ScholarRank data does not. To obtain a valid affiliation, the names and their modern online versions [6]. Instead, we propose a method were automatically aligned with fuzzy string matching, resulting for obtaining automatic scores for each venue, and in doing so, in about 4,000 faculty with good seemingly unique DBLP names requires no manual curation of valid venues. and correct university affiliation. Although those two methods are Previous work [58] has developed methods for ranking venues manually curated, automatic surveys of faculty affiliations have automatically, generating unique scores based only on author data recently been demonstrated [36]. by labeling and propagating notions of "good" papers and author- Our second metric of interest was National Science Foundation itative authors. However, this work required a manually curated grants, where we used awards from 1970 until 2018. This data is seed of what good work is. It was only demonstrated to work on available directly from the NSF [20]. 
We adjusted award amounts a small sub-field of conferences, as new publication cliques would using annual CPI inflation data. We restricted ourselves to awards require new labeling of "good" papers. Recent developments in that had finite amount, where we could match at least halfthe 2 Principal Investigators on the grant to DBLP names and the grant The resulting weights are often useful [13], and this methodology was above $20,000. Award amounts over 10 million dollars were is widely used in natural language processing [34, 41]. clipped in a smooth way to avoid matching to a few extreme outliers. This resulted in 407,012 NSF grants used in building our model. 4.1 Formal Setup Our third and final metric of interest was University of Califor- In general, we will obtain a score for each venue by setting up a nia salary data [28]. This was inspired by a paper that predicted linear regression task in the form of equation1. ACM/IEEE Fellowships for 87 professors and used salary data [38]. conf conf ... conf We looked at professors across the entire University of California 1 2 n system, matching their names to DBLP entries in an automated way. auth1 1 3 ... 0 1 x0 isProf1 © ... ª ©x ª © isProf ª We used the maximum salary amount for a given individual across auth2 ­ 1 0 1 1 ® ­ 1 ® ­ 2 ® . ­ . . . . . ® ­ . ® = ­ . ® (1) the 2015, 2016 and 2017 datasets, skipping individuals making less . ­ ...... ® ­ . ® ­ . ® . ­ . . . . ® ­ . ® ­ . ® than 120, 000 or over 800, 000 dollars. This resulted in 2,436 individ- authm 0 2 ... 4 1 xn isProfm uals, down from 3,102 names that we matched and an initial set of « ¬ « ¬ « ¬ about 20,000 initial professors. As DBLP contains some Chemistry, Authors are listed along the rows, and venues are listed along Biology, and Economics venues, we expect that some of these are the columns. The number of publications that an author has in a likely not Computer Science professors. conference is noted in the design matrix. There is an additional column of 1s to learn a bias offset in the regression. Different forms of counting author credit are discussed insec- Table 1: Spearman’s ρ correlation between rankings pro- tion 4.6, while different regression targets are discussed in section3. duced by targeting different metrics of interest. In equation1, the regression target is shown as a binary variable indicating whether or not that author is currently a professor. If Faculty NSF Salary this linear system, Ax = b is solved, then the vector x will con- tain real-valued scores for every single publishing venue. Since our Faculty 1.00 0.91 0.84 system is over-determined, there is generally no exact solution. NSF 0.91 1.00 0.86 Instead of solving this sparse linear system directly, we instead Salary 0.84 0.86 1.00 solve a regularized regression, using a robust loss and L2 regular- ization. That is, we iteratively minimize the following expression via stochastic gradient descent [45] 4 METHOD L(Ax,b) + λ||x||2 (2) Our basic model is that a paper in a given publication venue (either The L2 regularization enforces a Gaussian prior on the learned a conference or a journal), having passed peer review, is worth conference scores. We can perform this minimization in Python a certain amount of value. Certain venues are more prestigious, using common machine learning software packages [40]. We tend impactful or selective, and thus should have higher scores. 
Other to use a robust loss function, such as the Huber loss in the case of venues have less strict standards, or perhaps provide less opportu- regression [26], which is quadratic for errors of less than δ and nity to disseminate their ideas [35] and should be worth less. While linear for errors of larger than δ. It can be written as this model explicitly ignores the differences in paper quality ata ( 1 (y − yˆ)2, if |y − yˆ| ≤ δ publication venue, discarding this information enables the use of a ( ) 2 L yˆ,y = 1 2 (3) large quantity of data to develop a statistical scoring system. δ |y − yˆ| − 2δ , otherwise This methodology is not valid for all fields of science, nor all In the case of classification, we have labels y ∈ {−1, 1} and use the models of how impactful ideas are developed and disseminated. modified Huber loss [61], Our method requires individual authors have multiple publications ( across many different venues, which is more true in Computer max(0, 1 − yyˆ)2, if yyˆ ≥ −1 L(yˆ,y) = (4) Science than the natural sciences or humanities, where publishing −4yyˆ, otherwise rates are lower [33]. If instead we assume that all research ideas We experimented with other loss functions, such as the logistic produce only a single paper, or that passing peer-review is a noisy loss, and while they tended to produce similar rankings and results, measurement of quality [50], then our proposed method would not we found that the modified Huber loss provided better empirical work very well. Instead, the underlying process which supports our performance in our test metrics, even though the resulting curves methodology is that good research ideas produce multiple research looked very similar upon qualitative inspection. publications in selective venues; better ideas would produce more individual publications in higher quality venues. The concept of All models are wrong, but some are useful is our guide here. This assumption allows us to obtain venue scores in an automatic way, and then use these scores to evaluate both authors and institutions. Our approach to ranking venues is to construct a regression task, apply an optimization method to solve it, and use the resulting weights. Optimization’s ability to generate powerful representa- tions is well-studied in machine learning [47] and statistics [27]. 3 4.2 Metrics of Interest 0.5 0.5 As detailed in section3, we targeted three metrics of interest: status as a faculty member, NSF award sizes, and professor salaries. Each 0.4 0.4 of these metrics came from a completely independent data source, 0.3 0.3 and we found that they each had their own biases and strengths (more in section5). 0.2 0.2 For faculty status classification, we used the modified huber loss and CSRankings [6] faculty affiliations. To build venue scores 0.1 0.1 that reward top-tier conferences more highly, we only gave pro- 0.0 0.0 fessors in the top-k ranked universities positive labels. We tried 1970 1980 1990 2000 2010 2020 1970 1980 1990 2000 2010 2020 k = 5, 16, 40, 80, and found that we got qualitatively different results with quantitative similar performance. Unless otherwise stated, Figure 2: Truncated Gaussian ( = . ) used to splat a pub- we used k = 40. The university ranking used to select top-k was σ 4 5 lication’s value across multiple years. Examples centered at CSRankings itself, and included international universities, covering the year 2000 and the year 2018 the Americas, Europe and Asia. 
This classification is performed across all authors, leading to 1.7 million rows in our design matrix. For the NSF awards, every Principal Investigator on the award Normalization Methods had their papers up to the award year as the input features. We None Size 10 NIPS NIPS used a Huber loss, λ = 0.03, and experimented with normalizing 25 AAAI AAAI our award data to have zero mean and unit variance. Additionally, ICML 8 ICML IJCAI IJCAI 20 we built models for both raw award sizes and log of award sizes; the AISTATS AISTATS UAI 6 UAI raw award sizes seem to follow a power-law while the log award 15 sizes seem distributed as approximately a Gaussian. Another model 4 10 choice is whether to regress each NSF grant as an independent measurement or instead a marginal measurement which tracks the 5 2 venue scores (in standard deviations) venue scores (in standard deviations) cumulative total of NSF grants received by the authors. If not all 0 0 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 authors on the grant were matched to DBLP names, we only used year year the fraction of the award corresponding to the fraction of identified Year Year + Size

1 NIPS NIPS authors. This regression had ∼ million rows in its design matrix . 17.5 2 AAAI 8 AAAI ICML ICML For the salary data, we found that normalizing the salary data 15.0 IJCAI IJCAI AISTATS AISTATS to have zero mean and unit variance led to a very poor regres- 12.5 6 UAI UAI sion result, while having no normalization produced a good result. 10.0 4 This regression only had ∼ 2, 400 datapoints, and thus provided 7.5 information about fewer venues than the other metrics of interest. 5.0 2 2.5 venue scores (in standard deviations) venue scores (in standard deviations) 0.0 0 4.3 Modeling Change Over Time 1970 1980 1990 2000 2010 1970 1980 1990 2000 2010 year year In modeling conference values, we wanted to build a model that could adjust for different values in different years. For example, a venue may be considered excellent in the 1980s, but may have Figure 3: Results showing the effect of performing a normal- declined in influence and prestige since then. To account for this ization for venue year and size. See Sections 4.4 and 4.5. behavior, we break our dataset into chunks of n years and create a different regression variable for each conference for each chunk. 4.4 Normalizing Differences Across Years The non-temporal model is obtained simply by setting n ≥ 50. The temporal models described in the previous section have an We also examine a model that creates an independent variable inherent bias. Due to the temporal window of the DBLP publica- for each year, for each conference. After setting n = 1 in the block tion history, there is variation in distributed value due to changes model, we splat each publication as a Gaussian at a given year. annual NSF funding, the survivorship of current academic faculty, By modifying σ, we can control the smoothness of the obtained etc. To adjust for this bias, we scale each conference-year value weights. The Gaussian is applied via a sparse matrix multiply of the by the standard deviation of conference values in that year. This design matrix A with the appropriate band diagonal sparse matrix scaling can help or hurt performance, depending on which metric G. The use of a truncated Gaussian (where p < 0.05 is clipped of interest the model is built against. It generally produces flatter and the Gaussian is re-normalized) enables our matrix to maintain value scores over time but leads to some artifacts. The effects of this sparsity. We used a σ = 4.5, which produced an effective window normalization are shown in Figure3 and Table2. In our final ver- size of about 10 years. This can be seen visually in Figure2. sion, we instead normalize by the average of the top 10 conferences Our different temporal models are compared in Table5 by corre- in each year, which produced flatter results over time, lating against existing author-level, journal-level and university- level metrics. For evaluation details see Section6. 4 Table 2: Spearman correlation between our model and exist- Table 3: Correlation (Spearman’s ρ) between our model and ing metrics, showing the effect of different normalization Semantic Scholar [54], showing the properties of different schemes. See Sections 4.4 and 4.5 for model details. See Sec- authorship models. For details see section 4.6. tion6 for evaluation details. 
Evaluation Author Model Normalization influential h-index h-index h-index 1 2 3 4 citations (author) (university) (venue) 1 0.70 0.72 0.65 0.70 (author) 2 0.68 0.71 0.61 0.67 None 0.72 0.61 0.60 0.26 3 0.71 0.73 0.66 0.71 Year 0.72 0.66 0.58 0.18 Regression Model Size 0.74 0.63 0.61 0.25 Author 4 0.70 0.72 0.65 0.71 Year + Size 0.72 0.61 0.60 0.26

Table 4: Correlation between our model and traditional mea- 4.5 Normalizing Differences In Venue Size sures of scholarly output on the dataset of CMU faculty. For model details see Section 4.7. For evaluation details see Sec- Our model uses L2 regularization on the venue scores, which tends tion 6.4. to squash the value of variables with less explanatory power.This process often resulted in the under-valuing of smaller, more se- lective venues. To correct for this size bias, instead of giving each Model citations h-index [24] influential citations [54] 1 paper 1 point of value in the design matrix, we give each paper M Faculty 0.59 0.68 0.71 credit, where M is the number of papers at that venue in that year; NSF 0.63 0.66 0.67 this step is performed before Gaussian splatting. This produced Salary 0.36 0.36 0.41 rankings which emphasized small venues and provided a good no- Combined 0.69 0.77 0.75 tion of efficient venues. In practice, we instead used a blended result with a multiplier of √1 with α = 1.5849, the Hausdorff dimension α M of the Sierpinski triangle, an arbitrary constant. 4.7 Combining Models 4.6 Modeling Author Position Since the proposed metrics of interest (faculty status, NSF awards, Another question to consider is how credit for a paper is divided salaries) were generated from different independent regression up amongst the authors. We consider four models of authorship targets, with different sized design matrices, there may be value in credit assignment: combining them to produce a joint model. The value of ensemble 1 models is well documented in both theory [21] and practice [4]. In (1) Authors get n credit for each paper, where n is the number of authors on the paper. Used by [6]. the absence of a preferred metric with which to cross-validate our (2) All authors get full credit (1 point) for each paper model, we simply perform an unweighted average of our models to 1 1 1 1 obtain a gold model. To ensure that the weights are of similar scale, (3) Authors receive less credit for later positions ( 1 , 2 , 3 ··· n ), normalized so total credit sums to 1). This model assigns the conference scores are normalized to have zero mean and unit more credit to earlier authors and is used by certain practi- variance before combining them. Venue scores that are too large or tioners [25, 49]. too small are clipped at 12 standard deviations. For the temporal (4) The same as (3), except the last author is explicitly assigned models, this normalization is performed on a per-year basis. While equal credit with the first author before normalization. table4 shows results for a simple combination, one could average together many models with different choices of hyperparameters, Using Spearman correlation with Semantic Scholar’s "highly influ- regression functions, datasets, filters to scrub the data, etc. ential citations", an evaluation metric described in depth in Sec. 6.4, we can evaluate each of these models. Specifically, there are two places where venues scores require a selection of authorship model. 5 RESULTS The first is how much credit is assigned to each paper whenper- A visual example of some of venue scores is shown in Figure1. forming regression (in the case of our faculty metric of interest). The We kept the y-axis fixed across all the different Computer Science second is when evaluating authors with the obtained regression sub-disciplines to show how venue scores can be used to compare vector. See table3 for a summary of experimental results. 
For the different fields in a unified metric. There are additional resultsin purposes of evaluation, assigning full credit to authors (model 2) our qualitative demonstration of normalization methods, Figure3. produced the best results, while model 3 consistently produced the Due to the variation in rankings produced by one’s choice of lowest quality correlations. On the other hand, for the purposes of hyperparameters, and the large set of venues being evaluated, we do performing the classification task, the roles are flipped. Assigning not have a canonical set of rankings that can be presented succinctly full credit (model 2) consistently produces the worst quality correla- here. Instead, we will focus on quantitative evaluations of our results tions while using model 3 produces the highest quality correlations. in the following section. 5 Table 5: Correlation between rankings produced by our Table 6: Section 6.2. Kendall’s τ correlation across different model against rankings produced by traditional scholarly University Rankings. metrics. Different rows correspond to different hyperparam- eters choices for our model. Each column is a corresponds Ranking Correlation with to a traditional metric. AI = Author Highly Influential Ci- US News 2018 tations, AH = Author H-index, USN = US News 2018, VH = Venue H-index, VC = Venue Citations. USN2018 [37] 1.000 USN2010 [12] 0.928 Venue Scores 0.780 Years Metric AI AH USN VH VC ScholarRank [56] 0.768 σ = 4.5 Faculty 0.73 0.69 0.74 0.63 0.42 ScholarRankFull 0.757 10 0.67 0.57 0.76 0.57 0.35 CSMetrics [8] 0.746 50 0.75 0.68 0.76 0.38 0.21 CSRankings [6] 0.724 Times [17] 0.721 σ = 4.5 NSF 0.64 0.62 0.62 0.61 0.59 NRC95 [12] 0.713 10 0.68 0.68 0.60 0.59 0.60 t10Sum [56] 0.713 50 0.67 0.65 0.63 0.64 0.67 Prestige [12] 0.666 σ = 4.5 Salary 0.62 0.58 0.59 0.48 0.55 Citations [56] 0.665 10 0.65 0.62 0.57 0.45 0.55 Shanghai [43] 0.586 50 0.66 0.63 0.56 0.43 0.63 # of papers 0.585 BestPaper [25] 0.559 PageRankA 0.535 PageRankC 0.532 QS [42] 0.518 6 EVALUATION To validate the venue scores obtained by our regression methods, our evaluation consists of correlating our results against existing rankings and metrics. We consider three classes of existing scholarly measurements to correlate against: those evaluating universities, authors, and venues. Each of these classes has different standard 6.2 University Ranks techniques, and a different evaluation dataset, so they will be de- For this work, we produce university rankings simply as an evalu- scribed separately in Sections 6.2, 6.4, and 6.3. ation method to demonstrate the quality and utility of our venue In the case of our proposed method, venue scores, we have a scoring system. The reader is cautioned that university ranking simple way to turn them from a journal-based to an author-based systems can tend to produce undesirable gaming behavior [29], and or institution-based metric. Venues are evaluated directly with the are prone to manipulation. scores. Authors are evaluated as the dot product of venue scores We obtained and aligned many existing university rankings for and the publication vector of an author. Universities are evaluated Computer Science departments. These include rankings curated as the dot product of venue scores and the total publication vector by journalistic sources, such as the US News Rankings [37], the of all faculty affiliated with that university. QS Rankings [42], Shanghai Ranking [43], Times Higher Education Rankings [17] and the National Research Council report [12]. 
In addition, we consider purely quantitative evaluation systems such 6.1 PageRank Baseline as ScholarRank [56], CSRankings [6], CSMetrics [8], and Prestige Many existing approaches build on the idea of eigenvalue central- Rankings [12]. We additionally include ScholarRank’s t10sum across ity [7, 58, 60]. We implemented PageRank [39] using the power itera- the matched faculty that our venue scores result uses. tion method to compute a centrality measure to use for both author- We follow a recent paper [56], which demonstrated the efficacy level and venue-level metrics. Unlike most versions of PageRank, of a citation-based metric in producing rankings with large corre- which use citation counts, we implement two variants based solely lation against US News rankings. We extend these experiments to on co-authorship information. include more baselines. In contrast with [56], we use a rank corre- Author-level PageRank (PageRankA) is computed on the 1.7M x lation metric (namely Kendall’s τ ), which naturally handles ordinal 1.7M sized co-authorship graph, where an edge is added for every ranking systems. While ScholarRank [56] claimed a correlation of time two authors co-author a paper. We found that the authors > 0.9 with US News, this was under Pearson’s correlation coef- with highest centrality measures are often common names with ficient, and the result under Kendall’s τ is 0.768 in the published insufficient disambiguation information in DBLP. version and 0.757 using full precision ScholarRank. Journal-level PageRank (PageRankC) is computed on the 11,000 Our faculty-based regression is able to generate a result with x 11,000 co-authorship graph, where an edge is added for every the highest correlation against the US News rankings. We perform author who publishes in both venues. When ran on the unfiltered even better than ScholarRank, which was designed to optimize this DBLP data, the highest scoring venue was arXiv, an expected result. metric (although under a non-rank correlation metric). 6 Table 7: Spearman’s ρ correlation between different journal- paper count paper count (norm by annual paper totals) level metrics (N=1008). For details see Section 6.3. 35 8 30

25 papers citations h-index PageRank venue 6 20 C scores 15 4 papers 0.75 0.41 0.91 0.25 10 2 citations 0.75 0.60 0.69 0.27 5 h-index 0.41 0.60 0.37 0.65 annual value (4.5 sigma smoothing) 0 annual value (4.5 sigma smoothing) 0 1970 1980 1990 2000 2010 2020 1970 1980 1990 2000 2010 2020 PageRankC 0.91 0.69 0.37 0.27 year year venue scores 0.25 0.27 0.65 0.27 Judea Pearl Richard M. Karp Michael I. Jordan Leonidas J. Guibas Takeo Kanade Elisa Bertino

venue scores

6.3 Journal-level metrics 15.0 To evaluate the fidelity of our venue scores for journals and con- 12.5 ferences, we obtain the h-index [24] and citation count for 1,308 10.0 conferences from Microsoft Academic Graph [48]. We continue 7.5 to use Spearman’s ρ as our correlation metric, even though rank- 5.0 correlation metrics can be highly impacted by noisy data [1]. 2.5

Under this metric, venue scores correlated highly with h-index. annual value (4.5 sigma smoothing) 0.0 1970 1980 1990 2000 2010 2020 Notably, h-index and venue scores are highly correlated with each year other, even though venue scores are much less correlated with raw conference size. See Figure7 for detailed results. Figure 4: The career arcs of several accomplished Computer Scientists. The first row uses a simple model where all pa- 6.4 Author-level Metrics pers are all given equal weight; first using raw counts and To evaluate our venue scores in the application of generating author- then normalizing by the number of papers published each level metrics, we will use rank correlation (also known as Spear- year. The second row shows our model. man’s ρ)[52] between our venue scores and traditional author-level metrics such as h-index. Google Scholar was used to obtain cita- tions, h-index [24], i10-index, and Semantic Scholar used to obtain Venue scores have been shown to be robust against hyperparam- highly influential citations [54]. Prior work has critiqued the h- eter choices (Tables2,3,4,5). Even venue scores produced from index measure [59] and proposed an alternative metric, derived completely different data sources tend to look very similar (Table1). from a citation count. However, our use of a rank correlation means Additionally, venue scores can naturally capture the variation of that monotonically transformed approximations of citation counts conference quality over time (Figures1,3). would lead to identical scores. As with any inductive method, venue scores are data-driven For evaluation, we collected a dataset for the largest Computer and will be subject to past biases. For example, venue scores can Science department in CSRankings (n = 148). The results are shown clearly be biased by hiring practices, pay inequality and NSF funding in Table8. We can see that venue scores highly correlate with h- priorities. As these are the supervising metrics, bias in those datasets index, influential citations, i10-scores and CSRankings scores. The will be encoded in our results. For example, we found that the faculty results from the author-based PageRank are surprisingly similar hiring metric prioritized Theoretical Computer Science, while using to our venue scores. However, the conference-based PageRank NSF awards prioritized Robotics. The faculty classification task performed worse than venue scores on every correlation metric. may devalue publishing areas where candidates pursue industry jobs, while the NSF grant regression task may devalue areas with 7 DISCUSSION smaller capital requirements. By using large datasets and combining Our results show a medium to strong correlation of venue scores multiple metrics in a single model (Section 4.7), the final model against existing scholarly metrics, such as citation count and h- could reduce the biases in any individual dataset. index. For author metrics, venue scores correlate with influential Each of our metrics of interest has an inherent bias in timescale, citations [54] or h-index about as well as such measures correlate which our temporal normalization tries to correct for, but likely against each other or raw citation counts (see Table8). For venue does an incomplete job of. Salaries are often higher for senior fac- metrics, venue scores correlate with h-index (0.64) and citations ulty. NSF Awards can have a long response time and a preferences (0.61) nearly as well as citations correlate with h-index (0.66). 
For towards established researchers. Faculty classification prioritizes university metrics, venue scores correlate as well with measures of the the productive years of existing faculty. Additionally, faculty peer assessment as citation-based metrics do [56]. hiring as a metric will have a bias towards work from prestigious As h-index and citation counts have their flaws, obtaining perfect universities [12] and their venue preferences. Some of these issues correlation is not necessarily a desirable goal. Instead, these strong also exist in citation metrics, and may be why our uncorrected correlations serve as evidence for the viability of venue scores. models correlated better with them (Table2). 7 Table 8: Correlation between different author-level metrics for a dataset of professors (N=148). Details are inSection 6.4.

papers citations h-index i10 CSR [6] venue score PageRankA PageRankC influence [54] papers 0.66 0.79 0.81 0.71 0.94 0.94 0.89 0.76 citations 0.66 0.93 0.88 0.49 0.66 0.68 0.60 0.81 h-index 0.79 0.93 0.97 0.56 0.75 0.81 0.68 0.80 i10 0.81 0.88 0.97 0.53 0.75 0.82 0.69 0.73 CSR [6] 0.71 0.49 0.56 0.53 0.84 0.64 0.80 0.64 venue score 0.94 0.66 0.75 0.75 0.84 0.86 0.92 0.78 PageRankA 0.94 0.68 0.81 0.82 0.64 0.86 0.83 0.72 PageRankC 0.89 0.60 0.68 0.69 0.80 0.92 0.83 0.67 influence [54] 0.76 0.81 0.80 0.73 0.64 0.78 0.72 0.67

8 SIMILARITY METRICS metrics. As this system is based on easily obtainable, publicly avail- While the previous sections of this paper have focused on evalua- able data, it is transparent and reproducible. Our method builds tion, the same dataset can be used to organize venues into groups. on simple techniques and demonstrates that their application to For organization, we use a much smaller dataset, using data since large-scale data can produce surprisingly robust and useful tools 2005 and only evaluating the 1,155 venues that have at least 20 for scientometric analysis. R1 universities with faculty publishing in them. We then build the venue × author matrix, counting the number of papers each that ACKNOWLEDGMENTS author published in each venue. Performing a d-dimensional La- Martial Hebert suggested the use of correlation metrics as a tech- tent Dirichlet Allocation [9], we obtain a 50-dimensional vector nique for quantitative evaluation, thereby providing the framework representing each conference in a meaningful way. for every table of results in this paper. Emery Berger developed These vectors can then be clustered [3] to produce automatic CSRankings [6], which was highly influential in the design and categories for each conference. These high dimensional vectors can implementation of this project. Joseph Sill [51], Wayne Winston, also be embedded [55] into two dimensions to produce a visual map Jeff Sagarin and Dan Rosenbaum developed Adjusted Plus-Minus, a of Computer Science. See Figure5 for our result. These clusters sports analytics technique that partially inspired this work. Kevin- represent natural categories in Computer Science. For example, it is jeet Gill was unrelenting in advocating for a year-by-year regression easy to see groups that could be called Theory, Artificial Intelligence, model to avoid sampling and quantization artifacts. Machine Learning, Graphics, Vision, Parallel Computing, Software Engineering, Human-Computer Interaction, among many others. While some clusters are distinct and repeatable, others are not. When datasets contain challenging cases, ideal clustering can be hard to estimate [5]. Using silhouette scores [46], we can estimate how many natural clusters of publishing exist in Computer Science. In our experiments, silhouette scores were maximized with 20 to 28 clusters. As the clustering process is stochastic, we were unable to determine the optimal cluster number with statistical significance. By embedding each author with the weighted average of their publication’s vectors, we can also obtain a fingerprint that shows which areas of Computer Science each university focuses on. See Figure6 for an example of such fingerprints for many top depart- ments. The same clustering method can be used to analyze the focus areas of a single department, for an example see Figure7.

9 CONCLUSION We have presented a method for ranking and organizing a schol- arly field based only on simple publication information- namely a list of papers, each labeled with only their published venue, au- thors, and year. By regressing venue scores from metrics of interest, one obtains a plausible set of venue scores. These scores can be compared across sub-fields and aggregated into author-level and institution-level metrics. The scores provided by this system, and their resulting rankings, correlate highly with other established 8 Figure 5: Automatic clustering for venues in Computer Science, the largest venues in each cluster are labeled.

9 Carnegie Mellon University Massachusetts Institute of Technology Stanford University University of California - Berkeley Univ. of Illinois at Urbana-Champaign University of Michigan

Cornell University University of Washington Georgia Institute of Technology Tsinghua University ETH Zurich University of California - San Diego

University of Maryland - College Park University of Wisconsin - Madison National University of Singapore Columbia University University of Pennsylvania University of Toronto

University of Southern California University of Texas at Austin Northeastern University Princeton University University of California - Los Angeles Technion

Purdue University University of Edinburgh University of Massachusetts Amherst University of Waterloo EPFL Max Planck Institute

KAIST Peking University New York University Harvard University University of British Columbia University of California - Irvine

Figure 6: Heatmap showing the differences in research focus across different universities. Figure5 can be used as a guide.

CSD L Yao L Dabbish R Kraut N MichaelA Kelly J JForlizzi Zimmerman M Mason A Kittur RI H Choset S KieslerC Harrison M ErdmannW Whittaker G KaufmanJ Hammer H Geyer J BighamS Hudson MLD D WettergreenC Riviere M Goel D Held C Kulkarni C Atkeson J Hong M LikhachevN Pollard B Myers LTI I Nourbakhsh K Crane J Hodgins J Bagnell J McCann HCI A Dubrawski A Steinfeld ISR C Liu L Kara H Admoni BIO

AP RavikumarTalwalkar R Salakhutdinov S Balakrishnan T Berg-KirkpatrickY Tsvetkov B PóczosA Singh J Herbsleb A Black B Vasilescu J KolterS Kim E HovyG Neubig G Gordon E Nyberg C Goues R RosenfeldA Waibel C Kästner E Xing RB SternRaj M Shaw M Hebert M Balcan J Baker D Garlan T Kanade J Callan T Breaux K FragkiadakiK Kitani T Lee Y Yang A Gupta I Gkioulekas S Lucey D Ramanan T MitchellAT Procaccia Sandholm S NarasimhanY SheikhM O'Toole WJ Carbonell Cohen F Fang J MostowC Rosé K Sycara LJ Morency Cassell A Ogan C Faloutsos R ReddyN Sadeh K VKoedinger AlevenJ Stamper J Ma R Murphy K CarleyK Zhang Z CBar-Joseph Langmead R Dannenberg C Kingsford A Pfenning L Ahn

V Goyal R Rajkumar N Shah A Pavlo K Rashmi Y Agarwal D FSickerP Jahanian Narasimhan B HaeuplerV Guruswami P Steenkiste P Gibbons R O'Donnell D O'Hallaron G BlellochB MoseleyA GuptaD WoodruffM Shamos M Harchol-Balter GD Miller SleatorM BlumS Rudich M Satyanarayanan R BryantDS Siewiorek Goldstein J Morris L Blum J Hoe D Scott G Fedder G Ganger D MarculescuR Marculescu S Seshan DV Andersen Sekar J Sherry

W Scherlis J AldrichH MillerU RAcar Harper K Crary N BeckmannTB Mowry Lucia J Hoffmann E Clarke F PfenningA Platzer B Parno S Brookes LL Cranor BauerD Brumley A Datta A AcquistiM FredriksonV Gligor

Figure 7: An embedding of Carnegie Mellon University’s School of Computer Science with colors indicating sub- departments. For example, the Robotics Institute (RI) has clear clusters for Robotics, Graphics, CV and HRI.

10 REFERENCES [31] Sune Lehmann, Andrew D Jackson, and Benny E Lautrup. 2006. Measures for [1] Mokhtar Bin Abdullah. 1990. On a Robust Correlation Coefficient. Journal measures. Nature 444, 7122 (2006), 1003. of the Royal Statistical Society. Series D (The Statistician) 39, 4 (1990), 455–460. [32] Michael Ley. 2002. The DBLP Computer Science Bibliography: Evolution, Re- http://www.jstor.org/stable/2349088 search Issues, Perspectives. In String Processing and Information Retrieval, Alberto [2] Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Craw- H. F. Laender and Arlindo L. Oliveira (Eds.). Springer Berlin Heidelberg, 1–10. ford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, [33] Sarah Theule Lubienski, Emily K. Miller, and Evthokia Stephanie Saclarides. 2018. Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Sex Differences in Doctoral Student Publication Rates. Educational Researcher 47, Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Wang, Chris Will- 1 (2018), 76–81. https://doi.org/10.3102/0013189X17738746 helm, Zheng Yuan, Madeleine van Zuylen, and Oren Etzioni. 2018. Construction [34] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient of the Literature Graph in Semantic Scholar. In NAACL HLT. Association for estimation of word representations in vector space. ICLR (2013). Computational Linguistics, 84–91. https://doi.org/10.18653/v1/N18-3011 [35] Allison C. Morgan, Dimitrios J. Economou, Samuel F. Way, and Aaron Clauset. [3] David Arthur and Sergei Vassilvitskii. 2007. K-means++: The Advantages of 2018. Prestige drives epistemic inequality in the diffusion of scientific ideas. EPJ Careful Seeding. In SODA. SIAM, Philadelphia, PA, USA, 1027–1035. Data Science 7, 1 (19 Oct 2018), 40. [4] Robert M. Bell, Yehuda Koren, and Chris Volinsky. 2009. The BellKor solution to [36] Allison C. Morgan, Samuel F. Way, and Aaron Clauset. 2018. Automatically the Netflix Prize. assembling a full census of an academic field. PLOS ONE 13, 8 (08 2018), 1–18. [5] Shai Ben-David. 2015. Clustering is Easy When ....What? arXiv e-prints, Article https://doi.org/10.1371/journal.pone.0202223 arXiv:1510.05336 (Oct 2015), arXiv:1510.05336 pages. arXiv:stat.ML/1510.05336 [37] US News and World Report. 2018. Best Computer Science Schools. [6] Emery Berger. 2018. CSRankings: Computer Science Rankings. http://csrankings. https://www.usnews.com/best-graduate-schools/top-science-schools/ org/. computer-science-rankings. [7] Carl Bergstrom. 2007. Eigenfactor: Measuring the value and prestige of scholarly [38] Andrew Nocka, Danning Zheng, Tianran Hu, and Jiebo Luo. 2014. Moneyball for journals. College & Research Libraries News 68, 5 (2007), 314–316. Academia: Toward Measuring and Maximizing Faculty Performance and Impact. [8] Steve Blackburn, Carla Brodley, H. V. Jagadish, Kathryn S McKinley, Mario In 2014 IEEE International Conference on Data Mining Workshop. 193–197. Nascimento, Minjeong Shin, Sean Stockwel, Lexing Xie, and Qiongkai Xu. 2018. [39] Lawrence Page, Sergey Brin, , and Terry Winograd. 1999. The csmetrics.org: Institutional Publication Metrics for Computer Science. https: PageRank citation ranking: Bringing order to the web. Technical Report. Stanford. //github.com/csmetrics/csmetrics.org/blob/master/docs/Overview.md. [40] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. [9] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Blondel, P. Prettenhofer, R. 
Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour- Journal of machine Learning research 3, Jan (2003), 993–1022. napeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine [10] Lutz Bornmann and Hans-Dieter Daniel. 2008. What do citation counts measure? Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830. A review of studies on citing behavior. J. Doc. 64, 1 (2008), 45–80. [41] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: [11] D. Butler. 2011. Computing giants launch free science metrics. Nature 476, 7358 Global vectors for word representation. In EMNLP. 1532–1543. (Aug 2011), 18. [42] QS World University Rankings. 2018. Computer Science & Infor- [12] Aaron Clauset, Samuel Arbesman, and Daniel B Larremore. 2015. Systematic mation Systems. https://www.topuniversities.com/university-rankings/ inequality and hierarchy in faculty hiring networks. Science Advances (2015). university-subject-rankings/2018/computer-science-information-systems. [13] Adam Coates and Andrew Y Ng. 2011. The importance of encoding versus [43] Shanghai Rankings. 2015. Academic Ranking of World Universities in Computer training with sparse coding and vector quantization. In ICML. 921–928. Science. http://www.shanghairanking.com/SubjectCS2015.html. [14] Lisa Colledge, Félix de Moya-Anegón, Vicente P Guerrero-Bote, Carmen López- [44] Jie Ren and Richard N Taylor. 2007. Automatic and versatile publications ranking Illescas, Henk F Moed, et al. 2010. SJR and SNIP: two new journal metrics in for research institutions and scholars. Commun. ACM 50, 6 (2007), 81–85. Elsevier’s Scopus. Insights 23, 3 (2010), 215. [45] Herbert Robbins and Sutton Monro. 1951. A Stochastic Approximation Method. [15] Jaime A Teixeira da Silva and Judit Dobránszki. 2018. Multiple versions of the The Annals of Mathematical Statistics 22, 3 (1951), 400–407. h-index: Cautionary use for formal academic purposes. Scientometrics 115, 2 [46] Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation (2018), 1107–1113. and validation of cluster analysis. J. Comput. Appl. Math. 20 (1987), 53 – 65. [16] Jaime A Teixeira da Silva and Aamir Raoof Memon. 2017. CiteScore: A cite for https://doi.org/10.1016/0377-0427(87)90125-7 sore eyes, or a valuable, transparent metric? Scientometrics 111, 1 (2017), 553–556. [47] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning [17] Times Higher Education. 2018. World University Rankings, Computer Sci- representations by back-propagating errors. Nature 323, 6088 (1986), 533–536. ence. https://www.timeshighereducation.com/world-university-rankings/2018/ [48] Boris Schauerte. 2014. Microsoft Academic: conference field ratings. http://www. subject-ranking/computer-science. conferenceranks.com/visualization/msar2014.html. [18] Matthew E. Falagas, Vasilios D. Kouranos, Ricardo Arencibia-Jorge, and Drosos E. [49] Cagan H. Sekercioglu. 2008. Quantifying Coauthor Contributions. Science 322, Karageorgopoulos. 2008. Comparison of SCImago journal rank indicator with 5900 (2008), 371–371. https://doi.org/10.1126/science.322.5900.371a journal impact factor. The FASEB Journal 22, 8 (2008), 2623–2628. [50] Nihar B. Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon, and Ulrike [19] Martin Fenner. 2013. What can article-level metrics do for you? PLoS biology 11, von Luxburg. 2018. Design and Analysis of the NIPS 2016 Review Process. JMLR 10 (2013), e1001687. 19, 49 (2018), 1–34. 
http://jmlr.org/papers/v19/17-511.html [20] National Science Foundation. 2018. Download Awards by Year. https://www.nsf. [51] Joseph Sill. 2010. Improved NBA adjusted+/-using regularization and out-of- gov/awardsearch/download.jsp. sample testing. In MIT Sloan Sports Analytics Conference. [21] Yoav Freund and Robert E Schapire. 1997. A Decision-Theoretic Generalization [52] Charles Spearman. 1904. The proof and measurement of association between of On-Line Learning and an Application to Boosting. J. Comput. System Sci. 55, 1 two things. The American journal of psychology 15, 1 (1904), 72–101. (1997), 119 – 139. https://doi.org/10.1006/jcss.1997.1504 [53] Richard S. J. Tol. 2008. A rational, successive g-index applied to economics [22] Sebastian Galiani and Ramiro H Gálvez. 2017. The life cycle of scholarly articles departments in Ireland. J. Informetrics 2 (2008), 149–155. across fields of research. Technical Report. National Bureau of Economic Research. [54] Marco Valenzuela, Vu Ha, and Oren Etzioni. 2015. Identifying meaningful cita- [23] Robert Geist, Madhu Chetuparambil, Stephen Hedetniemi, and A. Joe Turner. tions. In AAAI Workshop: Scholarly Big Data. 1996. Computing Research Programs in the U.S. Commun. ACM 39, 12 (Dec. [55] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using 1996), 96–99. https://doi.org/10.1145/240483.240505 t-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605. [24] J E Hirsch. 2005. An index to quantify an individual’s scientific research output. [56] Slobodan Vucetic, Ashis Kumar Chanda, Shanshan Zhang, Tian Bai, and Anirud- Proc. Natl. Acad. Sci. U. S. A. 102, 46 (nov 2005), 16569–16572. https://doi.org/10. dha Maiti. 2018. Peer Assessment of CS Doctoral Programs Shows Strong Corre- 1073/pnas.0507655102 lation with Faculty Citations. Commun. ACM 61, 9 (Aug. 2018), 70–76. [25] Jeff Huang. 2018. Best Paper Awards in Computer Science (since 1996). https: [57] W. H. Walters. 2017. Citation-Based Journal Rankings: Key Questions, Metrics, //jeffhuang.com/best_paper_awards.html. and Data Sources. IEEE Access 5 (2017), 22036–22053. https://doi.org/10.1109/ [26] Peter J. Huber. 1964. Robust Estimation of a Location Parameter. Ann. Math. ACCESS.2017.2761400 Statist. 35, 1 (03 1964), 73–101. https://doi.org/10.1214/aoms/1177703732 [58] Su Yan and Dongwon Lee. 2007. Toward Alternative Measures for Ranking [27] Peter J. Huber. 1985. Projection Pursuit. The Annals of Statistics 13, 2 (1985), Venues: A Case of Database Research Community. In Proceedings of the 7th 435–475. http://www.jstor.org/stable/2241175 ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’07). ACM, New York, [28] Nevada Policy Research Institute. 2018. Transparent California. https:// NY, USA, 235–244. transparentcalifornia.com/agencies/salaries/#university-system. [59] Alexander Yong. 2014. Critique of Hirsch’s citation index: A combinatorial Fermi [29] Jill Johnes. 2018. University rankings: What do they really show? Scientometrics problem. Notices of the AMS 61, 9 (2014), 1040–1050. 115, 1 (01 Apr 2018), 585–606. https://doi.org/10.1007/s11192-018-2666-1 [60] Fang Zhang and Shengli Wu. 2018. Ranking Scientific Papers and Venues in [30] Michael J. Kurtz and Edwin A. Henneken. 2017. Measuring metrics - a 40- Heterogeneous Academic Networks by Mutual Reinforcement. In JCDL. ACM, year longitudinal cross-validation of citations, downloads, and peer review in New York, NY, USA, 127–130. astrophysics. JAIST 68, 3 (2017), 695–708. 
https://doi.org/10.1002/asi.23689 [61] Tong Zhang. 2004. Solving Large Scale Linear Prediction Problems Using Sto- chastic Gradient Descent Algorithms. In ICML. ACM, New York, NY, USA, 116–. 11 A CREDIT ASSIGNMENT Aging Curve for Authors In order to address issues of collinearity raised by having authors 2.0 who publish papers together, we wanted to solve a credit assignment problem. We adapted a well-known method for addressing this by 1.5 adapting regularized plus minus [51] from the sports analytics literature. In our case, we simply regress each publications values 1.0 from its authors, as shown below.

author1 author2 ... authorn 0.5 paper1 1 1 ... 0 x1 score1 paper2 © 1 0 ... 0 ª ©x2 ª ©score2 ª 0.0 ­ ® ­ ® = ­ ® 0 10 20 30 40 50 . ­ . . . . . ® ­ . ® ­ . ® average annual value generated . ­ . . . . ® ­ . ® ­ . ® years since first publication ­ ® ­ ® ­ ® paper 0 1 ... 1 x score m « ¬ « n¬ « n¬ This technique produced scores that correlated highly with total Figure 9: The average productivity of all DBLP authors for value scores. While this may be valuable for understanding an that year of their publishing career. individual’s contribution, but we were unable to find an evaluation for this method. The form of this model that considers n-wise or pairwise relationships was also considered, but these matrices were Optimal Number of Clusters too large for us to run trials. 0.675 0.650 B AGING CURVE 0.625 To evaluate if our model makes a sensible prediction over the timescale of a scholar’s career, we built a model to see what an 0.600 average academic career looks like, given that the author is still 0.575 publishing in those years. See Figure9. Our model suggests a rise Silhouette Score 0.550 in productivity for the first 20 years of one’s publishing history, and then a steady decline. 0.525 15 20 25 30 35 40 45 50 Number of Clusters

[Figure 8 plot: "Optimal Number of Clusters"; x-axis: number of clusters (15 to 50); y-axis: silhouette score.]

Figure 8: Curve showing the natural number of clusters in the dataset, using R1 authors.
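The cluster count in Figure 8 can be selected by sweeping the number of clusters and scoring each clustering with the silhouette coefficient; the sketch below illustrates the idea with a placeholder feature matrix standing in for our actual venue representation.

```python
# Sketch of choosing the number of clusters by silhouette score: sweep k,
# cluster the venue feature vectors, keep the k with the highest score.
# The random matrix stands in for real venue feature vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
venue_features = rng.normal(size=(300, 16))   # placeholder venue vectors

scores = {}
for k in range(15, 51):                       # the range shown in Figure 8
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=0).fit_predict(venue_features)
    scores[k] = silhouette_score(venue_features, labels)

best_k = max(scores, key=scores.get)          # "natural" cluster count
```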

[Figure 10 plots: venue scores over time (1970–2020) for each automatically discovered cluster; panels are labeled Systems, Data Mining, Computer Vision, Parallel, EE Theory, CAD, Robotics, HCI, Theory, Parallelism, Cyber Physical, AI, NLP, Graphics, Computer Science, Software Engineering, Computational Biology, Machine Learning, Networking, Programming Languages, Databases, Architecture, Computational Geometry, Crypto, and several smaller unlabeled clusters, with each panel listing its top-scoring venues.]
Figure 10: An automatic clustering and rating of the entirety of CS.

Table 9: Top 55 Venues

Name  Score  Size
STOC  11.71  128
FOCS  11.04  120
NIPS  10.94  484
Conference on Computational Complexity  10.41  53
SODA  10.17  224
SIAM J. Comput.  9.98  138
HotNets  9.53  46
ICML  9.46  303
TCC  9.06  76
NSDI  8.76  69
CCS  8.76  114
INFOCOM  8.59  434
ITCS  8.56  103
Allerton  8.53  331
ASPLOS  8.43  48
SIGCOMM  8.31  55
CVPR  8.22  606
PLDI  8.05  68
J. ACM  8.03  82
Theory of Computing  7.94  29
USENIX Security Symposium  7.91  62
NDSS  7.86  63
APPROX-RANDOM  7.86  74
COLT  7.78  82
VLDB  7.75  185
IEEE/ACM Trans. Netw.  7.74  226
AAAI  7.73  403
SIGMOD Conference  7.62  98
SIGGRAPH  7.59  125
AISTATS  7.52  138
OOPSLA  7.41  75
SIGGRAPH Asia  7.38  140
CRYPTO  7.24  80
EUROCRYPT  7.24  78
POPL  7.22  68
CHI  7.15  431
IEEE SSP  7.11  54
WWW  6.82  232
HotOS  6.75  27
EMNLP  6.61  263
EC  6.60  63
Commun. ACM  6.59  222
GetMobile  6.50  41
IMC  6.49  70
OSDI  6.46  29
ICDE  6.44  183
SIGSOFT FSE  6.42  93
ECCV  6.42  252
ICALP  6.40  148
ISIT  6.28  1032
KDD  6.23  240
ICCV  6.22  292
Robotics: Science and Systems  6.20  88
MICRO  6.19  62
ISCA  6.18  73

Table 10: Top 55 Universities

School  Authors  Papers  Venue Score  Size Normed
Carnegie Mellon University  174  17275  73126  14159
University of California - Berkeley  108  11011  51062  10884
Massachusetts Institute of Technology  108  9613  47037  10026
Univ. of Illinois at Urbana-Champaign  103  11248  44714  9627
Technion  96  8795  39601  8656
Stanford University  69  7028  36169  8513
Georgia Institute of Technology  108  9337  38083  8118
Tsinghua University  150  14643  40662  8104
University of California - Los Angeles  46  6589  30368  7888
University of Michigan  83  7897  33666  7598
Tel Aviv University  48  5468  28816  7404
University of California - San Diego  74  6646  30836  7142
University of Maryland - College Park  76  7751  30795  7089
ETH Zurich  39  6673  26120  7081
Cornell University  81  5918  30278  6871
University of Washington  69  5727  29087  6846
University of Southern California  58  7206  26044  6387
Columbia University  53  5064  25251  6330
EPFL  55  6927  25320  6290
Princeton University  47  4991  24203  6252
HKUST  57  6640  25134  6190
National University of Singapore  75  7210  25123  5801
University of Pennsylvania  53  5340  22696  5690
University of California - Irvine  71  6624  24070  5628
Pennsylvania State University  49  5668  21325  5451
Peking University  147  11858  27050  5413
University of Toronto  99  5955  24759  5376
University of Texas at Austin  52  4485  21225  5346
University of California - Santa Barbara  38  4698  19137  5224
University of Waterloo  105  7316  23781  5099
Max Planck Institute  32  4307  17808  5093
Purdue University  71  5659  21734  5082
New York University  67  4915  20752  4918
Rutgers University  63  4935  20311  4884
Nanyang Technological University  71  9143  20827  4870
University of Wisconsin - Madison  54  3988  18986  4738
University of Massachusetts Amherst  57  4129  19155  4717
Hebrew University of Jerusalem  42  3316  17478  4647
University of California - Riverside  43  4230  17114  4522
Shanghai Jiao Tong University  77  8220  19652  4511
Zhejiang University  89  7531  20232  4496
Ohio State University  42  4121  16740  4451
Duke University  23  2879  13936  4385
Chinese Academy of Sciences  46  5795  16497  4285
University of Minnesota  43  4346  15855  4190
Northwestern University  49  3993  16183  4137
Stony Brook University  47  4565  15819  4086
University of Edinburgh  115  6461  19292  4058
University of Tokyo  80  7524  17762  4042
University of Oxford  69  5549  17160  4039
Chinese University of Hong Kong  32  3927  13843  3959
TU Munich  53  6432  15521  3891
KAIST  81  5591  17114  3884
Harvard University  34  2471  13640  3837
Imperial College London  65  6622  15834  3779

Table 11: Top 50 Authors

Name  Score
Philip S. Yu  4354
H. Vincent Poor  3873
Kang G. Shin  3540
Jiawei Han 0001  3143
Micha Sharir  3086
Thomas S. Huang  2992
Don Towsley  2913
Xuemin Shen  2755
Luc J. Van Gool  2700
(name missing)  2596
Christos H. Papadimitriou  2531
Leonidas J. Guibas  2525
Mahmut T. Kandemir  2467
Jie Wu 0001  2452
Wen Gao 0001  2416
Shuicheng Yan  2373
Rama Chellappa  2286
Alberto L. Sangiovanni-Vincentelli  2266
Georgios B. Giannakis  2266
Yunhao Liu  2257
Mohamed-Slim Alouini  2238
Michael I. Jordan  2213
Christos Faloutsos  2173
Massoud Pedram  2153
Dacheng Tao  2149
Hans-Peter Seidel  2145
Luca Benini  2139
Moshe Y. Vardi  2092
Lajos Hanzo  2081
Yishay Mansour  2073
Sudhakar M. Reddy  2070
Sajal K. Das  2045
Pankaj K. Agarwal  2042
(name missing)  2024
Robert E. Tarjan  2019
Irith Pomeranz  2003
Elisa Bertino  2003
Zhu Han  1992
Shlomo Shamai  1976
David Peleg  1974
Victor C. M. Leung  1966
Kaushik Roy 0001  1949
Xiaoou Tang  1938
Hector Garcia-Molina  1923
Larry S. Davis  1918
Rafail Ostrovsky  1895
Eric P. Xing  1881
Ness B. Shroff  1879
Krishnendu Chakrabarty  1868
Hai Jin 0001  1867
