Analysing Higher-Order Network Data Using Simplicial Complexes
Michael T. Schaub (1,2,*), Austin Benson (3), Rediet Abebe (3), Jon Kleinberg (3), Ali Jadbabaie (1)
1 Institute for Data, Systems and Society, Massachusetts Institute of Technology
2 Department of Engineering Science, University of Oxford
3 Department of Computer Science, Cornell University
* Email Address: [email protected]

Funding: Marie Sklodowska-Curie grant agreement No 702410 (MTS), Google / Facebook Scholarship (RA), Simons Investigator Award (JK), Vannevar Bush Fellowship (AJ)
We thank G. Lippner and P. Horn for earlier contributions, leading up to those reported here.

Part I: Higher-order Link Prediction and simplicial closure

1. Motivation: from networks to simplicial complexes

Networks provide a powerful formalism for modeling complex systems by representing the underlying set of pairwise interactions. But much of the structure within these systems involves interactions that take place among more than two nodes at once: for example, communication within a group rather than person-to-person, collaboration among a team rather than a pair of co-authors, or biological interaction between a set of molecules rather than just two.

Schematic: time-stamped interactions (sets), simplicial complex representation, graph based representation

We refer to these types of simultaneous interactions on sets of more than two nodes as higher-order interactions; they are ubiquitous, but the empirical study of them has lacked a general framework for evaluating higher-order models.

2. Link prediction and simplicial closure - synopsis

The traditional link prediction problem seeks to predict the appearance of new links in a network. Here we adapt it to predict which (larger) sets of elements will have future interactions. We study the temporal evolution of 19 datasets, and use our higher-order formulation of link prediction to assess the types of structural features that are most predictive of new multi-way interactions.

Schematic: temporal evolution leading to simplicial closure

Among our results, we find that different domains vary considerably in their distribution of higher-order structural parameters, and that the higher-order link prediction problem exhibits some fundamental differences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the appearance of new interactions.

Datasets analysed / Example of a 'simplicial lifecycle'
• 19 datasets
• 150 to 2.5 million nodes
• 680 to 14 million time-stamped simplices
Example: lifecycle of triangular motifs in co-authorship networks (history)
Question: can we predict simplicial closure (triads, other higher order structures)?

3. Higher-order link prediction: brief results

Simplicial closure probability: triangular closure
Figure: closure probability, comparison of 3-node configurations (log-scaled axes; edges in the configurations are labelled 1 or 2+). Panels: open wedge < open triangle; open triangle vs open triangle (stronger weights); strong wedge vs open triangle (data dependent).
Datasets shown: coauth-DBLP, coauth-MAG-Geology, coauth-MAG-History, music-rap-genius, tags-stack-overflow, tags-math-sx, tags-ask-ubuntu, threads-stack-overflow, threads-math-sx, threads-ask-ubuntu, NDC-substances, NDC-classes, DAWN, congress-bills, congress-committees, email-Eu, email-Enron, contact-high-school, contact-primary-school

• Many simplices remain open (counter to the triadic closure hypothesis).
• A simple independent null model can reproduce some of this behavior.
• Closure prediction (not shown): simple local statistics can perform well; however, there is no clear winner.

Part II: Diffusion on simplicial complexes and simplicial PageRank

1. Motivation: Diffusion processes on graphs

Diffusion processes and random walks are at the core of many influential data-mining and machine learning techniques, ranging from centrality measures and ranking (e.g., PageRank) to dimensionality reduction and manifold learning (e.g., diffusion maps).

Schematic: diffusion process evolving over time

They are intimately related to the theory of harmonic functions and algebraic topology via graph Laplacians, and there exists a well developed theory relating topological properties of the graph to features of the Laplacian / diffusion process. How does this theory extend to higher-order models such as simplicial complexes?

Hodge Laplacians by Algebraic Topology
• Space of k-chains, space of k-cochains
• Boundary / co-boundary maps (correspond to incidence matrices)
• Chain / cochain complex
• Hodge Laplacian (general)

Oriented simplicial complexes and Hodge Laplacians by Example
• Define reference orientations of simplices (choose basis for computations)
• Incidence matrices (nodes, edges, triangular faces, ...)
• Hodge Laplacian (1-Laplacian)
• Hodge decomposition of edge-flows

2. Diffusion processes on simplicial complexes?!

Graph-based diffusion (random walk on an undirected graph)
1. State space: nodes (no orientation)
2. Spectral properties inherited from Laplacian

Simplicial complex based diffusion?
1. State space: edges (orientation matters), with 'positive' and 'negative' flows
2. Spectral properties inherited from Laplacian

Problem: because of the orientation of edges, there is no interpretation in terms of probability.
Solution: consider lifting into a higher order state space.
In the higher order space: simplicial dynamics = diffusion!

Schematic: lifting to, and projection from, the higher order state space

Random walk on lifted complex
Schematic: lifted simplicial complex; random walk in the lifted state-space (edge view; self-loops not shown)
Transition types (each shown relative to an initial position): self-loop; aligned transition (lower adjacent); anti-aligned transition; transition (upper adjacent)

Example Application: Personalized PageRank vectors for edge-flows

PageRank (Graphs)
Personalized PageRank (Graphs) (PageRank matrix)
Personalized PageRank vectors measure the importance of a node w.r.t. its neighbors (in terms of a diffusion: related to zero homology / connected components).

Simplicial Personalized PageRank
Simplicial personalized PageRank vectors measure the importance of edges (in terms of edge-flows: related to first homology / cycle space).

Figure: analyzed simplicial complex (all triangles filled in), with three highlighted edges:
• 'bulk edge' ≈ low induced flow
• 'cyclic edge': high harmonic flow (+localized)
• 'cut edge': high gradient flow (+localized)

Simplicial PageRank vectors differentiate roles of edges.
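The incidence matrices and the Hodge 1-Laplacian named under "Oriented simplicial complexes and Hodge Laplacians by Example" can be illustrated with a minimal sketch. It assumes the standard unweighted form L1 = B1^T B1 + B2 B2^T, where B1 is the node-to-edge incidence matrix and B2 the edge-to-triangle incidence matrix. The toy complex (one filled triangle next to one open triangle), the function names and the flow h are illustrative assumptions and are not taken from the poster.

```python
# Toy sketch (assumed example): incidence matrices and the Hodge 1-Laplacian
# of a small oriented simplicial complex, in plain Python.

def boundary_matrices(num_nodes, edges, triangles):
    """Return (B1, B2) for reference orientations i < j and i < j < k."""
    edge_index = {e: idx for idx, e in enumerate(edges)}
    # B1 (nodes x edges): edge (i, j) leaves node i (-1) and enters node j (+1).
    B1 = [[0] * len(edges) for _ in range(num_nodes)]
    for col, (i, j) in enumerate(edges):
        B1[i][col] = -1
        B1[j][col] = 1
    # B2 (edges x triangles): boundary of (i, j, k) is (j, k) - (i, k) + (i, j).
    B2 = [[0] * len(triangles) for _ in range(len(edges))]
    for col, (i, j, k) in enumerate(triangles):
        B2[edge_index[(j, k)]][col] = 1
        B2[edge_index[(i, k)]][col] = -1
        B2[edge_index[(i, j)]][col] = 1
    return B1, B2

def matmul(A, B):
    """Matrix product of two non-empty list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def hodge_1_laplacian(B1, B2):
    """L1 = B1^T B1 (lower part) + B2 B2^T (upper part), entry by entry."""
    m = len(B2)  # number of edges
    L1 = [[0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            lower = sum(row[a] * row[b] for row in B1)
            upper = sum(x * y for x, y in zip(B2[a], B2[b]))
            L1[a][b] = lower + upper
    return L1

# Four nodes; triangle (0, 1, 2) is filled, the cycle 1-2-3 is left open.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]
B1, B2 = boundary_matrices(4, edges, triangles)
L1 = hodge_1_laplacian(B1, B2)

# The boundary of a boundary vanishes: B1 * B2 is the zero matrix.
boundary_of_boundary = matmul(B1, B2)

# h is divergence-free (B1 h = 0) and curl-free (B2^T h = 0), so it is a
# harmonic edge-flow and lies in the kernel of L1.
h = [-1, 1, 2, -3, 3]
L1_h = matvec(L1, h)
```

In this toy complex the kernel of L1 is one-dimensional, matching the single unfilled cycle 1-2-3; this is the sense in which edge-flows relate to first homology / cycle space. Filling that cycle as a second triangle would remove the harmonic flow.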