Communies in Networks

1. Overview • Hub & Authority Scores 2. Methods • MGP Exchange Network 3. Temporal/Mulplex • Temporal Authority 4. Applicaons – Redux: treat identy arc between slices differently 5. Temporal Centrality • SVD stabilizes strongly- coupled singular limit • Perturbave expansion

With Sean Myers, Elizabeth Leicht, Aaron Clauset & Mason Porter Thanks to Mitch Keller, NSF & JSMF Hubs, Authories, and HITS (Kleinberg)

• Let Aij indicate links from i to j • A good hub points to good authories • A good authority is pointed to by good hubs • Seeming circularity is an eigenvector problem: find the leading eigenvectors of AA’ and A’A • “Hypertext-Induced Topic Search” CHAOS 21, 041104 (2011) Ph.D. Exchange Networks • Han Social Networks (2003): FIG. 1. (Color) Visualizations of a mathematics genealogy network. Lingua Franca’s annual Job Mathematical genealogy and department prestige Tracks compilaons, 1993-4 to Sean A. Myers,1 Peter J. Mucha,1 and Mason A. Porter2 1Department of Mathematics, University of North Carolina, Chapel Hill, North Carolina 27599, USA 1999-2000, for English, History, 2Mathematical Institute, University of Oxford, OX1 3LB, UK (Received 1 July 2011; published online 20 December 2011) [doi:10.1063/1.3668043] Mathemacs, Economics, PoliSci,

The Mathematics Genealogy Project (http://www. Psychology, Sociology. genealogy.ams.org/) is a database of over 150 000 scholars with advanced degrees in mathematics and related fields. Entries include dissertation titles, adviser(s), graduation years, degree-granting institutions, and advisees. The MGP FIG. 2. (Color) Rankings versus authority scores. • Burris ASR (2004): ~1993 rosters is popular among mathematicians, and it can be used to trace We use a “geographically inspired” layout to balance node academic lineages through luminaries like Courant, Hilbert, locations and node overlap. A Kamada-Kawai visualization4 and Wiener to historical predecessors such as Gauss, Euler, places the high-authority universities in the network’s center. for Sociology, History, PoliSci. and even Kant. For example, MGP data was used recently to 1 In Fig. 2,wecompareauthorityscoreswiththreerankings study the role of mentorship in prote´ge´ performance. of mathematics departments5–7 for the 58 universities that We consider recent branches of this mathematical fam- appear in the top 40 of at least one of the rankings or have one ily tree by projecting the MGP data for degrees granted since • Fowler, Grofman & Masuoka PS: of the top-40 authority scores. As expected, higher authority 1973 onto a network whose nodes represent academic insti- scores correlate with higher prestige (i.e., smaller rank numbers). tutions in the . An individual who earns a doc- However, scatter is obviously present, particularly with the 2010 torate from institution (during the selected period) and Polical Science & Polics (2007): A National Research Council (NRC) rankings. later advises students at institution B is represented by a We thank Mitch Keller at the MGP for providing data. directed edge of unit weight pointing from to . The total B A This work was supported by the NSF (PJM: DMS-0645369) edge weight from B to A counts the number of such advisers. PoliSci placements 1960-2000. and the James S. McDonnell Foundation (MAP: #220020177). This network representation can be used to estimate the mathematical prestige of each university using various 1R. D. Malmgren, J. M. Ottino, and L. A. N. Amaral, 465, 622 2 Kleinberg’s hub/authority à “centrality” scores of the corresponding node (see Fig. 1). (2010). We represent “hub” and “authority” scores3 using node size 2M. E. J. Newman, Networks: An Introduction (Oxford University Press, and color (red to blue), respectively. Institutions with high Oxford, UK, 2010). 3J. M. Kleinberg, J. ACM 46, 604 (1999). hiring/placement capacity. authority scores have high-valued hubs pointing to them, and 4T. Kamada and S. Kawai, Inf. Process. Lett. 31, 7 (1988). high-valued hub nodes point to high-valued authorities. A 5National Research Council 1995, http://www.ams.org/notices/199512/ university with a high authority score is a strong source of nrctables.pdf. 6 prestigious Ph.D. students and a university with a high hub National Research Council 2010 (rank order of medians of the S-ranking ranges; original release), http://graduate-school.phds.org/rankings/mathematics. score is a strong destination. In the legend of Fig. 1, we list 7US News & World Report 2010, http://grad-schools.usnews.rankingsand the top 20 institutions in order of their authority scores. reviews.com/best-graduate-schools/top-mathematics-programs/rankings.

1054-1500/2011/21(4)/041104/1/$30.00 21, 041104-1 VC 2011 American Institute of Physics

Chaos (2010): 1973-2010 Aij = #Ph.D.s from school j that later advise at school i (i nominates j’s “product”).

Kleinberg’s authority measures presge (“placement capacity”) Chaos (2010): 1973-2010 Aij = #Ph.D.s from school j that later advise at school i (i nominates j’s “product”).

Kleinberg’s authority measures presge (“placement capacity”) Mathemacs Genealogy Project: 1973-2010

UW

Minnesota Michigan Harvard Wisconsin Cornell MIT

Chicago Princeton Yale Berkeley Brown Purdue CMU Illinois NYU Maryland Stanford Caltech

UCLA (1973-2010) Mathemacs Genealogy Project: Absolute #s v. graduate normalizaon 0.45

Harvard 0.4

0.35

Princeton 0.3 Yale Stanford

0.25 MIT Chicago Berkeley

Cornell 0.2 Caltech CMU Normalized authority alternave: Normalized Authority 0.15 Brown Aij = fracon (cf. #) of granted doctorates Utah NYU Wisconsin from school j that later advise at school I UW 0.1 Illinois [threshold out schools with few graduates] UCLA Minimum = 20 0.05 Minimum = 100 Minimum = 200

0 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 Absolute Authority REPORTS

expected at random fails to provide any contribu- tion from these interslice couplings. Because they are specified by common identifications of nodes Extend centrality to in across slices, interslice couplings are either present or absent by definition, so when they do fall inside communities, their contribution in the count of intra- Time-Dependent, Multiscale, community edges exactly cancels that expected at random. In contrast, by formulating a null model in and Multiplex Networks terms of stability of communities under Laplacian dynamic network dynamics, we have derived a principled generaliza- Peter J. Mucha,1,2* Thomas Richardson,1,3 Kevin Macon,1 Mason A. Porter,4,5 Jukka-Pekka Onnela6,7 tion of community detection to multislice networks,

Network science is an interdisciplinary endeavor, with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the coupling = 0 Movaon: Mucha et al. (2010), algorithmic detection of tightly connected groups of nodes known as communities. We developed a generalized framework of network quality functions that allowed us to study the community 5 structure of arbitrary multislice networks, which are combinations of individual networks coupled through links that connect each node in one network slice to itself in other slices. This framework 10 Modularity-based community allows studies of community structure in a general setting encompassing networks that evolve over 15 time, have multiple types of links (multiplexity), and have multiple scales.

nodes 20 detecon (counng argument fails, he study of graphs, or networks, has a long piece together the structures obtained at different tradition in fields such as sociology and times (6–9)orhaveabandonedqualityfunctions 25 mathematics, and it is now ubiquitous in in favor of such alternatives as the Minimum T 30 identy arcs defined, null model academic and everyday settings. An important Description Length principle (10). Although tensor on May 13, 2010 tool in network analysisisthedetectionof decompositions (11)havebeenusedtocluster 1 2 3 4 mesoscopic structures known as communities (or network data with different types of connections, resolution parameters obtained by Laplacian dynamics a cohesive groups), which are defined intuitively as no quality-function method has been developed coupling = 0.1 groups of nodes that are more tightly connected to for such multiplex networks.

each other than they are to the rest of the network We developed a methodology to remove these 5 la Lambioe et al.). (1–3). One way to quantify communities is by a limits, generalizing the determination of commu- quality function that compares the number of nity structure via quality functions to multislice 10 intracommunity edges to what one would expect networks that are defined by coupling multiple at random. Given the network adjacency matrix A, adjacency matrices (Fig. 1). The connections 15 www.sciencemag.org where the element Aij details a direct connection encoded by the network slices are flexible; they nodes 20 between nodes i and j, one can construct a qual- can represent variations across time, variations 1. Identy arcs connect same ity function Q (4, 5) for the partitioning of nodes across different types of connections, or even 25 into communities as Q = ∑ ij (Aij − Pij)d(gi, gj), community detection ofthesamenetworkat 30 where d(gi, gj)=1ifthecommunityassignments different scales. However, the usual procedure for gi and gj of nodes i and j are the same and 0 establishing a quality function as a direct count of 1 2 3 4 actor across slices otherwise, and P is the expected weight of the the intracommunity edge weight minus that resolution parameters ij

edge between i and j under a specified null model. coupling = 1 Downloaded from The choice of null model is a crucial con- 2. Strength of identy coupling sideration in studying network community struc- 5 ture (2). After selecting a null model appropriate to the network and application at hand, one can 10 controls temporal smoothing use a variety of computational heuristics to assign nodes to communities to optimize the quality Q 15

(2, 3). However, such null models have not been nodes 1 20 3. What behavior do we get if we available for time-dependent networks; analyses have instead depended on ad hoc methods to 2 25

3 30 1Carolina Center for Interdisciplinary Applied Mathematics, apply Hub/Authority directly? Department of Mathematics, University of North Carolina, 4 2 1 2 3 4 Chapel Hill, NC 27599, USA. Institute for Advanced Materials, resolution parameters Nanoscience and Technology, University of North Carolina, Fig. 1. Schematic of a multislice network. Four slices 3 Chapel Hill, NC 27599, USA. Operations Research, North s ={1,2,3,4}representedbyadjacenciesAijs encode Fig. 2. Multislice community detection of the Carolina State University, Raleigh, NC 27695, USA. 4Oxford intraslice connections (solid lines). Interslice con- Zachary Karate Club network (22)acrossmultiple Centre for Industrial and Applied Mathematics, Mathematical 5 nections (dashed lines) are encoded by Cjrs,specifying resolutions. Colors depict community assignments of Institute, University of Oxford, Oxford OX1 3LB, UK. CABDyN the coupling of node j to itself between slices r and s. the 34 nodes (renumbered vertically to group (y) Complexity Centre, University of Oxford, Oxford OX1 1HP, UK. A ij = # graduated in year y from 6Department of Health Care Policy, Harvard Medical School, For clarity, interslice couplings are shown for only two similarly assigned nodes) in each of the 16 slices 7 Boston, MA 02115, USA. Harvard Kennedy School, Harvard nodes and depict two different types of couplings: (i) (with resolution parameters gs ={0.25,0.5,…,4}), University, Cambridge, MA 02138, USA. coupling between neighboring slices, appropriate for for w =0(top),w =0.1(middle),andw = school j that later advise at school i *To whom correspondence should be addressed. E-mail: ordered slices; and (ii) all-to-all interslice coupling, 1(bottom).Dashedlinesboundthecommunities [email protected] appropriate for categorical slices. obtained using the default resolution (g =1).

876 14 MAY 2010 VOL 328 SCIENCE www.sciencemag.org Mathemacs Genealogy Project: 1946-2010 as “mulslice” network (“Take 1”) Ordered

(y) A ij = # graduated in year y from school j that later advise at school i Calculate authority from the NT-by-NT adjacency matrix as the leading eigenvector of M(c) [N universies, T me slices = individual years] Large c: Emphasis on current/recent years Small c: Temporal smoothing of actor roles c à 0: Singular limit as mulslice network disconnects into N chains

MGP: 1946-2010 (“Take 1”) Total authority in each slice/year MGP: 1946-2010 (“Take 1”) Node authority relave to that year’s total

−1 10

c =0.0001 c =0.005 c =0.01

−2 10 Authority for MIT, Florida, and Maryland

−3 10 1950 1960 1970 1980 1990 2000 2010 Year MGP: 1946-2010 (“Take 1”) Node authority relave to that year’s total

−1 10

c =0.0001 c =0.005 • The strongly-coupled c =0.01 singular limit is N −2 10 idencal T-node lines Authority for MIT, Florida, and Maryland

−3 10 1950 1960 1970 1980 1990 2000 2010 Year MGP: 1946-2010 (“Take 1”) Node authority relave to that year’s total

−1 10

c =0.0001 c =0.005 • The strongly-coupled c =0.01 singular limit is N −2 10 idencal T-node lines Authority for MIT, Florida, and Maryland

−3 10 1950 1960 1970 1980 1990 2000 2010 Year

GOOD: Authority on a line is analycal calculaon BAD: C  0 limit decouples odd and even nodes MGP: 1946-2010 (“Take 2”) A beer definion of temporal authority

• Eigenvector Centrality of MGP: 1946-2010 (“Take 2”) A beer definion of temporal authority

• Eigenvector Centrality of MGP: 1946-2010 (“Take 2”) A beer definion of temporal authority SVD stabilizaon of singular limit

c à 0: Singular limit as mulslice network disconnects into N chains Reshaping values into N (schools) by T (years) matrix of “presge” P, expect P à outer product of me-averaged presge with temporal variability along chain (analycally obtained sine waves) as c à 0. First component of SVD at small c. Perturbave soluon around c=0

• Essenal issue of the singular nature: at c=0, the connected temporal network of N actors across T mes degenerates into N lines of length T, each with the sine wave me dependence; but the authories of chains relave to one another are only obtained in the limit as cà0+ Perturbave soluon around c=0 Base soluon around c=0 Base soluon around c=0

Solvability condion in next order: Base soluon around c=0

Solvability condion in next order: Base soluon around c=0

Solvability condion in next order: First correcon around c=0

λ2 is given directly by requirement to zero out component in null space: the solvability Solvability condion in next order: condion has its own solvability condion Perturbave soluon around c=0 Summary: Temporal Centrality • Authority (“placement capacity”) of Ph.D. exchange to measure academic “presge” • Hub/Authority scores applied in mulslice seng by block matrix coupling of authority matrices – While some other eigenvector centralies may be applied in mulslice seng without modificaon, temporal hub and authority again calls to need to treat identy arcs differently than links • Vary identy coupling to control temporal scale • Strong identy limit is mathemacally singular – Singular value decomposion extrapolaon to limit – Perturbave expansion, with the wrinkle of a solvability condion inside another solvability condion • Other centrality measures for temporal network data: e.g., Grindrod, Higham, Parsons & Estrada PRE 2011 Communies in Networks

Overview

Methods I: Clustering Network Data Methods II: Temporal & Mulplex Network Data

Applicaons I: Modeling Network Dynamics Applicaons II: Centrality in Temporal Network Data

THANK YOU!