Network Community Detection by SCORE

Network Community Detection by SCORE

Community Detection by SCORE with applications to Statisticians' Networks Jiashun Jin Statistics Department Carnegie Mellon University Collaborators: Pengsheng Ji (Univ. of Georgia) Zheng Tracy Ke (Univ. of Chicago) April 6, 2015 Jiashun Jin Community Detection by SCORE Network community detection I n = 1222 web blogs (nodes) I 16714 hyperlinks (edges) 2 I #fedgesg n : adjacency matrix X is very sparse I Two perceivable communities I Goal. Find the (unknown) community labels Political web blogs (Adamic and Glance; 2005) Jiashun Jin Coauthorship and Citation networks for Statisticians Jiashun Jin Community Detection by SCORE Abstraction (undirected) Data: adjacency matrix A of a network N = (V ; E) I V = f1; 2;:::; ng: nodes 1; an edge between nodes i and j A(i; j) = 0; otherwise I K perceivable \communities" V = V (1) [ V (2) ::: [ V (K) Goal. For each node, predict the community label. Diagonals of A are 0 for convenience Jiashun Jin Community Detection by SCORE Signal and noise decomposition Adjacency matrix : A = E[A]+W ; W ≡ (A−E[A]); \signal"+\noise" I W = A − E[A]: generalized Wigner matrix I upper triangles: independent centered-Bernoulli I Question: How to model Ω if we write E[X ] = Ω − diag(Ω) Jiashun Jin Community Detection by SCORE Box's wisdom \All models are wrong, but some are useful" George E.P. Box (1919{2013) Jiashun Jin Community Detection by SCORE Degree Corrected Block Model (DCBM) Ω(i; j) = P(k; `) () Ω = ΘLΘ θ(i) · θ(j) 2 θ(1) 3 a b 6 θ(2) 7 P = ; Θ = 6 . 7 b c 4 .. 5 θ(7) 2 abababa 3 2 aaaabbb 3 6 bcbcbcb 7 6 aaaabbb 7 6 7 6 7 6 abababa 7 6 aaaabbb 7 6 7 permute 6 7 L = 6 bcbcbcb 7 −−−−! 6 aaaabbb 7 6 abababa 7 6 bbbbccc 7 6 7 6 7 4 bcbcbcb 5 4 bbbbccc 5 abababa bbbbccc Karrer and Newman (2010) Jiashun Jin Community Detection by SCORE Tukey's suggestion \Which part of the sample contains the information" Tukey (1965), PNAS John W. Tukey (1915{2000) Jiashun Jin Community Detection by SCORE Where is the information? A = Ω − diag(Ω) + W ≈ Ω 0 SVD : Ω = ΘLΘ = Un;K DK;K (Un;K ) 2 3 s1 t1 2 3 θ(1) 6 s2 t2 7 6 7 6 θ(2) 7 6 s1 t1 7 Un;K = ΘTn;K = 6 . 7 6 . 7 4 .. 5 6 . 7 6 7 θ(n) 4 s1 t1 5 s2 t2 Jiashun Jin Community Detection by SCORE SCORE: algorithm SCORE: Spectral Clustering On Ratios-of-Eigenvectors Input: A and K I Obtain leading eigenvectorsη ^1,η ^2, :::,η ^K I Obtain n × (K − 1) matrix of entry-wise ratios η^ (i) R^(i; k) = k+1 ; 1 ≤ i ≤ n; 1 ≤ k ≤ K − 1 η^1(i) I Apply k-means to R^ (assume ≤ K clusters) Jiashun Jin Community Detection by SCORE Political weblog network (K = 2) x-axis: i = 1; 2;:::; n; y-axis: R^(i); 58 errors (lowest in literature) 1 0.5 0 −0.5 −1 −1.5 −2 −2.5 −3 −3.5 −4 0 200 400 600 800 1000 1200 Methods SCORE PCA normalized PCA NSC BCPL Errors 58 437 600 69 104:5 (SD: 145:4) Newman (2016), Bickel and Chen (2009), Zhao et al (2012) Jiashun Jin Community Detection by SCORE Regularity conditions K X 0 A = Ω − diag(Ω) + W ; Ω = ΘLΘ; L = P(k; `)1k 1` k;`=1 I (a). Eigen-spacing of DPD is ≥ a constant C X D(k; k)2 = θ(i)2=kθk2 i2V (k) 4 I (b). log(n)θmax kθk1=kθk ! 0, so that kW k kΩk; with prob. 1 − o(n−3) 2 3 I (c). log(n)θmax /θmin ≤ kθk3, so matrix-form Bernstein inequality holds (for the sum of random matrices) Jiashun Jin Community Detection by SCORE Consistency of SCORE n 3 n ^ −1 X ^ kθk3 X 1 1 kθk1 2 Hammp(`; `) = n min P `i 6= π(`i ) ; errn = 4 max ; 2 π kθk θ(i) θmin kθk i=1 i=1 Theorem. Consider DCBM where (a)-(c) hold. As n ! 1, if −1 n∗ log(n)errn ! 0, where n∗ is the minimum community size, ^score −1 3 then Hammp(` ; `) ≤ Cn log (n)errn. −1 Proof. Full analysis of Θ (^ηk − ηk ) I Spectral perturbation theory I Classical large deviations inequalities I Matrix-form Bernstein inequality (Tropp, 2012) iid Remark. If we assume θ(i) ∼ F as in Zhao et al (2012), then score −1 3 Hammp(`^ ; `) ≤ Cn log (n) Jiashun Jin Community Detection by SCORE Coauthor/Citation Networks (statisticians) I People most interested: statisticians/friends I We know \inside information" N/A to outsiders Scientific Problem: Dynamics of US-based statisticians in theory & methods of the HDDA era HDDA: High-Dimensional Data Analysis Data: All published research papers in AoS, Biometrika, JASA, and JRSS-B, 2003{2012 Jiashun Jin Community Detection by SCORE Disclaimer I Data and scope of scientific interests: limited I It is not our intention to I rank one author/paper/area over the others I label an author/paper to a certain area I We have to use real names because the networks are for real people (\us") Jiashun Jin Community Detection by SCORE Citation Network, I Large-Scale Multiple Testing by SCORE (359 nodes; 26 shown) 40 John● Rice David L● Donoho David Siegmund● John D● Storey 35 Joseph P● Romano Bradley● Efron Subhashis● Ghosal Daniel Yekutieli● 30 Abba M● Krieger Sanat K● Sarkar Donald ●B Rubin ● Aad van ●der Vaart Jiashun Jin 25 Larry Wasserman● Zhiyi● Chi Yoav Benjamini● D R● Cox 20 Peter ●Muller E L Lehmann● Christian● P Robert James O● Berger Mark ●G Low 15 Christopher● Genovese Paul R Rosenbaum● Felix Abramovich● 10 Iain M Johnstone● 0 5 10 15 20 Jiashun Jin Community Detection by SCORE Citation Network, II Spatial stat./nonparametric stat. by SCORE (1010 nodes; 42 shown) Ming−Hui Chen Paul Fearnhead R Todd Ogden Yi Li Gareth Roberts David RuppertAmy H Herring Omiros Papaspiliopoulos Simon N Wood Gary L Rosner Jeffrey S Morris Joseph G Ibrahim Mohammad Hosseini−Nasab Ciprian M Crainiceanu Steven N MacEachern Ulrich Stadtmuller Athanasios Kottas Raymond J Carroll Alan E Gelfand Naisyin Wang Sudipto BanerjeeMark F J Steel Alan H Welsh Andrew O Finley Anthony OHagan Brian S Caffo Michael L SteinMontserrat Fuentes Yongtao Guan Hao Purdue Zhang Marc G Genton Michael Sherman Huiyan Sang Theo Gasser Tilmann Gneiting Douglas W Nychka Martin Schlather Adrian E Raftery N Reid Jonathan Tawn Laurens de Haan Robin Henderson Jiashun Jin Community Detection by SCORE Citation Network, II (further split, I) Parametric Spatial Statistics by SCORE (304 nodes; 21 shown) Fadoua Balabdaoui● Laurens● de Haan Adrian E● Raftery Cristiano● Varin Jonathan● Tawn Martin Schlather● Anthony● OHagan Marc G● Genton Tilmann● Gneiting Nicolas● Chopin N Reid● Douglas W● Nychka Haavard● Rue Huiyan● Sang Michael● L Stein Paolo ●Vidoni Hao Zhang● (Purdue) Montserrat● Fuentes Leah J● Welty Sudipto ●Banerjee Andrew ●O Finley Jiashun Jin Community Detection by SCORE Citation Network, II (further split, II) Nonparametric Spatial Statistics by SCORE (212 nodes; 21 shown) Herbert ●K H Lee Natesh● Pillai Trivellore E ●Raghunathan Steven N MacEachern● ● Matthew J Beal ● ● Robert B Gramacy Gary L● RosnerAthanasios Kottas David ●M Blei ● Fernando A QuintanaMark F● J Steel Yee Whye● Teh Alan E ●Gelfand Ju−Hyun● Park Gareth ●Roberts Omiros Papaspiliopoulos● Paul Fearnhead● Radford● M Neal Pilar L ●Iglesias Alexandros● Beskos Yi● Li Jiashun Jin Community Detection by SCORE Citation Network, II (further split, III) Non-parametrics/semi-parametrics by SCORE (392 nodes; 24 shown) Brian S● Caffo Mohammad Hosseini−Nasab● Ulrich Stadtmuller● Jeffrey S● Morris Ciprian M Crainiceanu● David Ruppert● Naisyin● Wang Vic Patrangenaru● Thomas ●C M Lee ● Hongtu● Zhu Rui Paulo Ray Carroll● Nilanjan Chatterjee● Joseph G● Ibrahim Rabi Bhattacharya● ● Alan H Welsh Robert A● Rigby Michael A● Benjamin Hua Yun● Ming−HuiChen ● Chen Silvia Shimakura● D Mikis Stasinopoulos● Robin Henderson● Theo Gasser● Jiashun Jin Community Detection by SCORE Citation Network, III Variable Selection by SCORE (1285 nodes; 40 shown) Qiwei Yao Mohsen Pourahmadi Elizaveta Levina Ji Zhu Hans−Georg Muller Peter HallPeter J Bickel Hansheng Wang Robert J Tibshirani Jinchi Lv Cun−Hui Zhang Heng Peng Jianqing Fan Emmanuel J Candes Jianhua ZLixing Huang Zhu Jian Huang Ming Yuan Peter Buhlmann Runze Li Xuming He Terence TaoTrevor J HastieHui Zou Alexandre B Tsybakov Yi Lin Nicolai Meinshausen Joel L Horowitz Hao Helen Zhang R Dennis Cook Michael R Kosorok Dan Yu Lin L J Wei Jiashun Jin Community Detection by SCORE Coauthorship Network, I Objective Bayes by SCORE (64 nodes; 14 shown) John A● Cafeo ● ●● ● ● 11 Fei● Liu ●● Jerry ●Sacks ● Carlos M Carvalho ● ● ● ● ● Steven N MacEachern ● 10 ● ● ● Rui Paulo● ● ● ● ● ● ● 9 ● ● ● ● ● ● ● Alan E Gelfand ● James O● Berger ● Athanasios● Kottas 8 ● ● ● ● ● ● ● R J Parthasarathy● ● ● ● 7 ● ● ● ● Gonzalo● Garcia−Donato● ● ● ● Daniel● Walsh J Palomo● 6 M J Bayarri● ● 0 5 10 15 20 ● Jiashun Jin Community Detection by SCORE Coauthorship Network, II Biostatistics by SCORE (388 nodes; 16 shown) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Heping Zhang Steve Marron● ● ● ● ● ● ● Ji Zhu ● ● ● ● ● ● ● ● ● ● ● ● ● Hongtu● ● Zhu ● ● ● ● ● ● ● ● ● ● ● Louise● Ryan ●Weili● ● Lin ● ● ● ● ● ● ● ● ● ● ● ● ● Yimei●● ● ●Li Helen Zhang ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Joseph Ibrahim ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Eric Feuer Tapabrata● Maiti● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Debajyoti● ● Sinha ● ● Trivellore● Raghunathan ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Zhiliang● ● Ying ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Jun Liu ● ● ● ● ● ● ● ● ● ● ● ● ● L J Wei● ● ● ● ● ● ● David Dunson● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Jiashun Jin Community Detection by SCORE Coauthorship Network, III HDDA by SCORE (1811 nodes, 32 shown) Hans−Georg●

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    30 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us