Statistical Modelling of Citation Exchange Between Statistics Journals
Total Page:16
File Type:pdf, Size:1020Kb
52 Discussion on the Paper by Varin, Cattelan and Firth Pengsheng Ji (University of Georgia, Athens), Jiashun Jin (Carnegie Mellon University, Pittsburgh) and Zheng Tracy Ke (University of Chicago) We congratulate Varin, Cattelan and Firth for a very stimulating paper. They use the Stigler model on cross-citation data and provide a mode-based method to rank statistical journals. Their approach allows for evaluation of uncertainty of rankings and sheds light on how to avoid overinterpretation of the insignificant difference between journal rankings. In a related context, we study social networks for authors (instead of journals) with a data set that we collect (based on all papers in the Annals of Statistics, Biometrika, the Journal of the American Statistical Association and the Journal of the Royal Statistical Society, Series B, 2003–2012). The data set will be publicly available soon. The data set provides a fertile ground for studying networks for statisticians. In Ji and Jin (2014) we have presented results including (a) ‘hot’ authors and papers, (b) many meaningful communities and (c) research trends. Here, we report results only on community detection of the citation network (for authors). Intuitively, network communities are groups of nodes that have more edges within than across (Jin, 2015). The goal of community detection is to identify such groups (i.e. clustering). We have analysed the citation network with the method of directed scores (Ji and Jin, 2014; Jin, 2015) and identified three meaningful communities; Fig. 16. The first community is ‘large-scale multiple testing’, including (a) a Bayesian group, James Berger and Peter Muller,¨ (b) a Carnegie Mellon group, Christopher Genovese, Jiashun Jin, Isabella Verdinelli and Larry Wasser- man, (c) a causal inference group, Donald Rubin and Paul Rosenbaum, (d) three Berkeley–Stanford groups, (i) Bradley Efron, David Siegmund and John Storey, (ii) David Donoho, Iain Johnstone, Mark Low (University of Pennsylvania) and John Rice and (iii) Eric Lehmann and Joseph Romano, and (e) a Tel Aviv group, Felix Abramovich, Yoav Benjamini, Abba Krieger (University of Pennsylvania) and Daniel Yekutieli. The second community is ‘spatial statistics’ and can be further split into three subgroups: (a) a non-parametric spatial statistics subgroup, including David Blei, Alan Gelfand, Yi Li and Trivel- lore Raghunathan; (b) a parametric spatial statistics subgroup, including Tilmann Gneiting, Douglas Nychka, Anthony O’Hagan, Adrian Raftery, Nancy Reid and Michael Stein; (c) a semiparametric–non-parametric statistics (subgroup), including Raymond Carroll, Ciprian Craniceanu, David Ruppert and Naisyin Wang. The third community is ‘variable selection’ including researchers on dimension reduction (Dennis Cook), quantile regression (Xuming He), variable selection (Peter Bickel, Peter Buhlmann,¨ Emmanuel Candes, Jianqing Fan, Peter Hall, Trevor Hastie, Runze Li, Terrence Tao, Robert Tibshirani, Alexandre Tsybakov, Ming Yuan, Cun-Hui Zhang, Ji Zhu and Hui Zou). Our results must be interpreted with caution, for the scope of the data set is limited. Also, it is not our intention to rank authors or papers. Jon R. Kettenring (Drew University, Madison) This paper provides an excellent comprehensive discussion of citation analysis with emphasis on ranking journals. Citations are a weak form of data. The hope is that the data will nevertheless be sufficiently rich to produce useful insights. With this in mind, my comments focus on Section 3, clustering journals, which the authors suggest ‘can help to establish relatively homogeneous subsets of journals that might reasonably be ranked together’. A complete linkage hierarchical clustering algorithm is used to produce the dendrogram shown in Fig. 2. Six clusters and two singletons are identified by cutting the tree. How well determined are they? For comparison, I repeated the analysis by using the average and minimax (Bien and Tibshirani, 2011) linkage Discussion on the Paper by Varin, Cattelan and Firth 53 John Rice David L Donoho David Siegmund John D Storey Joseph P Romano Bradley Efron Subhashis Ghosal Daniel Yekutieli Abba M Krieger Sanat K Sarkar Donald B Rubin Aad van der Vaart Jiashun Jin Larry Wasserman Zhiyi Chi Yoav Benjamini D R Cox Peter Muller E L Lehmann Christian P Robert James O Berger Mark G Low Christopher Genovese Paul R Rosenbaum Felix Abramovich Iain M Johnstone (a) Ming−Hui Chen Paul Fearnhead R Todd Ogden Yi Li Gareth Roberts David RuppertAmy H Herring Omiros Papaspiliopoulos Simon N Wood Gary L Rosner Jeffrey S Morris Joseph G Ibrahim Mohammad Hosseini−Nasab Ciprian M Crainiceanu Steven N MacEachern Ulrich Stadtmuller Athanasios Kottas Raymond J Carroll Alan E Gelfand Naisyin Wang Sudipto BanerjeeMark F J Steel Alan H Welsh Andrew O Finley Anthony OHagan Brian S Caffo Michael L SteinMontserrat Fuentes Yongtao Guan Hao Purdue Zhang Marc G Genton Michael Sherman Huiyan Sang Theo Gasser Tilmann Gneiting Douglas W Nychka Martin Schlather Adrian E Raftery N Reid Jonathan Tawn Laurens de Haan Robin Henderson (b) Michael Kosorok Bin Yu Terence Tao Jian HuangHansheng Wang Dennis Cook Alexandre Tsybakov Helen Zhang Emmanuel Candes Lixing Zhu Jinchi Lv Nicolai Meinshausen Hui Zou Trevor Hastie Cun−Hui Zhang Peter Bickel Yi Lin Heng Peng Peter Buhlmann Joel Horowitz Dan Yu Lin Jianqing Fan Ming Yuan Runze Li Elizaveta LevinaJi Zhu Robert Tibshirani Peter Hall Hua Liang Xuming He Tony Cai Bernard Silverman Jianhua Huang Jonathan Taylor Jane−Ling Wang Fang Yao Hans−Georg Muller Mohsen Pourahmadi Marina Vannucci Xihong Lin (c) Fig. 16. Communities found in the citation network: (a) ‘large-scale multiple testing’ (359 nodes; only 26 nodes with 24 or more citers are shown); (b) ‘spatial statistics’ (1010 nodes; only 42 nodes with 24 or more citers are shown); (c) ‘variable selection’ community (1285 nodes; only 40 nodes with 54 or more citers are shown).