
Learning on Graphs: Spectral Theory and Clustering

Pan Li, Olgica Milenkovic

Coordinated Science Laboratory University of Illinois at Urbana-Champaign

March 12, 2019

Graphs are indispensable mathematical data models capturing pairwise interactions:

[Figures: a k-NN network; a publication network.]

Important learning-on-graphs problems: clustering (community detection), semi-supervised/active learning, representation learning (graph embedding), etc.

Beyond Pairwise Relations

A graph models pairwise relations.

Recent work has shown that high-order relations can be significantly more informative. Examples include:

Understanding the organization of networks (Benson, Gleich and Leskovec'16). Determining the topological connectivity between data points (Zhou, Huang, Schölkopf'07).

Graphs with high-order relations can be modeled as hypergraphs (formally defined later), or as meta-graphs and meta-paths in heterogeneous information networks.

Algorithmic methods for analyzing high-order relations and learning problems are still under development.

Beyond Pairwise Relations

Functional units in social and biological networks. High-order network motifs (Benson'16).

[Figure: a food-web motif over microfauna, pelagic fishes, crabs & benthic fishes, and macroinvertebrates.]

Meta-graphs and meta-paths in heterogeneous information networks (Zhou, Yu, Han'11).

Review of Graph Clustering: Notation and Terminology

Graph Clustering

Task: Cluster the vertices that are "densely" connected by edges.

Graph Partitioning and Conductance

A (weighted) graph G = (V, E, w): for e ∈ E, w_e is the weight. Partition V into two sets: V = S ∪ S̄.

Boundary of the set S: ∂S ≜ {e ∈ E : e ∩ S ≠ ∅, e ∩ S̄ ≠ ∅}. Edge e is cut by (S, S̄) if e ∈ ∂S.

Total cut cost: Cut(S) = Σ_{e ∈ ∂S} w_e.

Volume of the set S: Vol(S) = Σ_{v ∈ S} d_v = Σ_{v ∈ S} Σ_{e : v ∈ e} w_e.

Conductance of a set: ψ(S) = Cut(S) / min{Vol(S), Vol(S̄)}.

It is well known that sets with small conductance correspond to high-quality clusters: community detection in real networks [YL2012], [SM2002], etc.

Objective: min_S ψ(S), where ψ(S) ≜ Cut(S) / min{Vol(S), Vol(S̄)}.
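These quantities are easy to compute directly. A minimal Python sketch on a made-up weighted graph (two triangles joined by a weak bridge; none of this data is from the slides):

```python
# Toy illustration of Cut(S), Vol(S), and conductance psi(S) for a small
# weighted graph; the graph and the set S are invented examples.
edges = {  # (u, v): weight w_e
    (0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,   # dense triangle
    (2, 3): 0.1,                             # weak bridge
    (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,   # dense triangle
}
V = {v for e in edges for v in e}

def cut(S):
    """Total weight of edges with one endpoint in S and one outside."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

def vol(S):
    """Sum of weighted degrees d_v over v in S (edges inside S count twice)."""
    return sum(w for (u, v), w in edges.items() for x in (u, v) if x in S)

def conductance(S):
    Sbar = V - S
    return cut(S) / min(vol(S), vol(Sbar))

S = {0, 1, 2}
print(conductance(S))  # 0.1 / min(6.1, 6.1) ≈ 0.0164
```

The set S = {0, 1, 2} has conductance ≈ 0.016, reflecting the weak 0.1-weight bridge: exactly the kind of small-conductance set spectral methods aim to find.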

Algorithmic method of choice: spectral clustering

Spectral Graph Partitioning

Input: The adjacency matrix A and the diagonal degree matrix D.

Step 1: Compute the normalized Laplacian L = I − D^{−1/2} A D^{−1/2}.
Step 2: Compute the eigenvector u = (u_1, u_2, ..., u_n)^T corresponding to the second smallest eigenvalue of L.
Step 3: Partition the set of vertices according to D^{−1/2} u.

Performance Guarantees for Spectral Clustering
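The three steps above can be sketched in NumPy as follows; the 6-vertex graph is a made-up example, and the final sign-based split is a simplified stand-in for the usual sweep cut over D^{−1/2} u:

```python
# Minimal NumPy sketch of the three-step spectral partitioning recipe,
# on an invented graph: two dense triangles joined by one bridge edge.
import numpy as np

A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2),      # triangle 1
             (3, 4), (4, 5), (3, 5),      # triangle 2
             (2, 3)]:                     # bridge
    A[u, v] = A[v, u] = 1.0

deg = A.sum(axis=1)
Dinv_sqrt = np.diag(1.0 / np.sqrt(deg))

# Step 1: normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
L = np.eye(6) - Dinv_sqrt @ A @ Dinv_sqrt

# Step 2: eigenvector of the second smallest eigenvalue
# (np.linalg.eigh returns eigenvalues in ascending order)
eigvals, eigvecs = np.linalg.eigh(L)
u = eigvecs[:, 1]

# Step 3: partition according to D^{-1/2} u (here: a simple sign split;
# a sweep cut over the sorted entries is the standard choice)
x = Dinv_sqrt @ u
S = {v for v in range(6) if x[v] < 0}
print(S)  # recovers one of the two triangles
```

For this graph the second eigenvector changes sign exactly at the bridge, so the split recovers the two dense blocks.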

ψ̂: the conductance obtained via spectral graph partitioning; ψ*: the optimal conductance (Cheeger constant).

ψ̂ ≤ 2√ψ*  [C97]

A direct consequence of Cheeger's inequality: if λ is the second smallest eigenvalue of L, then

(ψ*)²/4 ≤ λ/2 ≤ ψ*.

From Graphs to Hypergraphs

Hypergraphs and Inhomogeneous Hypergraphs

Formal definition: A hypergraph is an ordered pair G = (V, E), where V is the vertex set and E comprises hyperedges e ⊆ V such that |e| ≥ 2.

[Figure: a 3-uniform hypergraph (|e| = 3) on vertices 1–10.]

Each hyperedge e is associated with a weight function w_e: 2^e → R≥0, with w_e(∅) = 0 and w_e(S) = w_e(e \ S).

w_e(S) describes the "contribution" of the set S to the high-order relation e. If w_e(S) = const. for all S ∉ {∅, e}, then we obtain standard hypergraphs.

[Figure: homogeneous vs. inhomogeneous hyperedges, with a metabolic-network example of inhomogeneity.]

Inhomogeneous Hypergraph Partitioning: Two Clusters

Inhomogeneous weights: w_e: 2^e → R≥0, w_e(∅) = 0, w_e(S) = w_e(e \ S).

Cost of a cut (inhomogeneous case): Cut(S) = Σ_{e ∈ ∂S} w_e(S ∩ e).

Volume of the set S: Vol(S) = Σ_{v ∈ S} d_v, where d: V → R≥0 is the "degree" d_v = Σ_{e : v ∈ e} ||w_e||_∞.

Objective (inhomogeneous hypergraph partitioning): minimize the conductance

min_S ψ(S), where ψ(S) ≜ Cut(S) / min{Vol(S), Vol(S̄)}.

New Algorithms for Inhomogeneous Hypergraph Partitioning

Major technical challenge: there is no matrix form for the Laplacian(s) of (inhomogeneous) hypergraphs.

Two variants of spectral clustering:

1) "Matrix approximation" of the Laplacian (projection) + standard graph spectral clustering. Pros: efficient algorithms. Cons: large distortion.

2) Nonlinear Laplacian spectrum approximation. Pros: small distortion. Cons: potentially large complexity.

"Matrix approximation" of the Laplacian (projection) + Standard graph spectral clustering

Projection-based Methods

The algorithm essentially follows a 3-step framework:

Spectral Hypergraph Partitioning

Step 1: Project each hyperedge onto a weighted clique.
Step 2: Merge the "projected cliques" into one graph.
Step 3: Perform classical spectral graph partitioning based on the normalized Laplacian.

Projection-based Methods

An example of projection-based methods for constant-weight hypergraphs [ZHS'07, BGL'16]:

Projection: w^(e)_{vṽ} = w_e/|e|;
Merging: w_{vṽ} ≜ Σ_{e ∈ E} w^(e)_{vṽ};

Spectral graph partitioning.

Distortion Caused by Projection

Consider a constant-weight hyperedge of size |e|: the weights in the projected clique are uniform, so different cuts of the same hyperedge receive different costs. Cutting off a single vertex cuts |e| − 1 clique edges, while a balanced cut cuts roughly |e|²/4 of them.

Projection may induce distortion even for constant-weight hyperedges.

Projection for Inhomogeneous Hyperedges
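A quick numerical check of this distortion, assuming a unit-weight hyperedge and the uniform projection w^(e)_{vṽ} = w_e/|e| from the earlier example: both splits cut the hyperedge exactly once, yet their clique-cut costs diverge.

```python
def clique_cut(size_e, size_s, edge_w):
    # A split (S, e\S) cuts |S| * (|e| - |S|) clique edges, each of weight edge_w.
    return size_s * (size_e - size_s) * edge_w

for k in (4, 10, 20):
    w = 1.0 / k                          # uniform projected weight w_e/|e|, w_e = 1
    single = clique_cut(k, 1, w)         # cut off one vertex: (|e| - 1)/|e|
    balanced = clique_cut(k, k // 2, w)  # balanced split: |e|/4
    print(k, single, balanced, balanced / single)
```

The balanced-to-lopsided ratio grows like |e|/4, matching the |e| − 1 versus |e|²/4 edge counts quoted above (those counts assume unit clique-edge weights).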

Key question: How does one project inhomogeneous hyperedges onto cliques, with optimal cost approximation?

min β^(e)
s.t. w_e(S) ≤ Σ_{v ∈ S, ṽ ∈ e\S} w^(e)_{vṽ} ≤ β^(e) w_e(S), for all S ∈ 2^e for which w_e(S) is defined,

where {w^(e)_{vṽ}}_{v,ṽ ∈ e} denote the projected edge weights. There are O(2^{|e|}) constraints, so the LP may be complex, and the problem may be infeasible.

Problems: 1) There may be no feasible β^(e) value for some hyperedges e.

2) One may have w_{vṽ} < 0 for some v, ṽ. Empirically, w_{vṽ} ← (w_{vṽ})₊ appears to work well, but no general analysis is available.

Theoretical Performance Guarantees

β^(e) is the approximation ratio when projecting e onto a clique.

Theorem: If there exist feasible constants β^(e) for all hyperedges e, and w_{vṽ} ≥ 0 for all {v, ṽ}, then the conductance ψ̂ obtained by the projection-based method satisfies

ψ̂ ≤ 2 β* √ψ*,

where ψ* is the Cheeger constant and β* = max_{e ∈ E} β^(e).

∗ ∗ (e) where ψ is the Cheeger constant, β = maxe∈E β . Theoretical Performance Guarantees

β(e) is the approximation ratio when projecting e into a clique. Theorem If there exist feasible constants β(e) for all hyperedges e and wvv˜ ≥ 0 for all {v, v˜}, then the underlying ψˆ satisfies

ψˆ ≤ 2(β∗)pψ∗,

∗ ∗ (e) where ψ is the Cheeger constant, β = maxe∈E β .

Problems: 1) There may be no feasible β(e) values for some hyperedges e.

2) One may have wvv˜ < 0 for some v, v˜. Empirically, wvv˜ = (wvv˜)+ appears to work well, but no general analysis available. Submodular Weights

Sufficient conditions to approximate a set-partition function w_e(S) by a graph-cut function? Submodular costs w_e(S)!

Definition: A function w_e: 2^e → R≥0 that satisfies

w_e(S₁) + w_e(S₂) ≥ w_e(S₁ ∩ S₂) + w_e(S₁ ∪ S₂) for all S₁, S₂ ∈ 2^e

is referred to as submodular.

Performance Guarantees under Submodular Costs

Theorem

Theorem: A symmetric submodular function w_e(·) with w_e(∅) = 0 has a constant-approximation graph-cut function with nonnegative weights {w^(e)_{vv′}}_{v,v′ ∈ e}.

Specific Questions

Is there a simple algorithm that avoids solving the optimization problem? Constrain w^(e)_{vṽ} = f_{vṽ}(w_e), where f_{vṽ}(·) is linear. Finding the optimal β becomes a min-max problem:

min_{f_{vṽ}, v,ṽ ∈ e} max_{submodular w_e} β
s.t. w_e(S) ≤ Σ_{v ∈ S, ṽ ∈ e\S} f_{vṽ}(w_e) ≤ β w_e(S), for all S ∈ 2^e.

f_{vṽ}(·) does not depend on w_e, but may depend on |e|.

Theoretical Performance Guarantees

Theorem: The min-max optimal solution (linear, 2 ≤ |e| ≤ 7) satisfies

w*^(e)_{vṽ} = Σ_{S ∈ 2^e \ {∅, e}} [ w_e(S)/(2|S|(|e| − |S|)) · 1{|{v,ṽ} ∩ S| = 1}
  − w_e(S)/(2(|S| + 1)(|e| − |S| − 1)) · 1{|{v,ṽ} ∩ S| = 0}
  − w_e(S)/(2(|S| − 1)(|e| − |S| + 1)) · 1{|{v,ṽ} ∩ S| = 2} ],

with constants β^(e):

|e|:    2  3  4    5  6  7
β^(e):  1  1  3/2  2  4  6
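The closed form can be checked numerically. Below is a hypothetical helper (not the authors' code) that evaluates w*^(e)_{vṽ} from the expression above; for a constant-weight hyperedge the result is a uniform clique, e.g. weight 1/2 when |e| = 3, so every split is reproduced exactly (β = 1, as in the table).

```python
from itertools import combinations

def minmax_clique_weights(e, w):
    """Evaluate w*_{v,vt} from the theorem; w maps frozenset(S) -> w_e(S)."""
    n = len(e)
    W = {}
    for v, vt in combinations(e, 2):
        total = 0.0
        for k in range(1, n):                 # proper nonempty subsets only
            for S in combinations(e, k):
                ws, m = w[frozenset(S)], len({v, vt} & set(S))
                if m == 1:
                    total += ws / (2 * k * (n - k))
                elif m == 0:                  # then k <= n - 2, denominator > 0
                    total -= ws / (2 * (k + 1) * (n - k - 1))
                else:                         # m == 2, so k >= 2
                    total -= ws / (2 * (k - 1) * (n - k + 1))
        W[frozenset((v, vt))] = total
    return W

# Constant-weight hyperedge of size 3: w_e(S) = 1 for all proper nonempty S.
e = (0, 1, 2)
w = {frozenset(S): 1.0 for k in (1, 2) for S in combinations(e, k)}
print(sorted(minmax_clique_weights(e, w).values()))  # [0.5, 0.5, 0.5]
```

With |e| = 4 constant weights, the same helper yields uniform pairwise weights 1/3, so a singleton split costs exactly 1 while a balanced split costs 4/3 ≤ β = 3/2, consistent with the table.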

Conjecture: The claim holds for all |e|. (Proof?)

Applications of Inhomogeneous Hypergraph Clustering in Foodweb Hierarchical Clustering, Category Learning in Rankings, and Subspace Segmentation

Application 1: Foodweb Hierarchical Clustering

[Figure: hierarchical communities in the Florida Bay food web — producers, primary consumers, …, high-level consumers.]

Foodweb Hierarchical Clustering I

If two species share the same food source, they are in the same niche. If two species share the same predators, they are in the same niche.

[Figure: the Florida Bay food web.]

Foodweb Hierarchical Clustering II

Motif of interest (on vertices v₁, v₂, v₃, v₄):

w_e({v_i}) = 1 for i = 1, 2, 3, 4;
w_e({v₁, v₃}) = 2 and w_e({v₁, v₄}) = 2;
w_e({v₁, v₂}) = 0.

The clustering result:

[Figure: clustering result with groups labeled producers, primary consumers, secondary consumers, invertebrates, forage fishes, predatory fishes & birds, and top-level predators.]

Only 5 links point in reverse directions.

Issues with Projection-based Methods

Large distortion for large |e|: if the weight projections are fixed to be linear as above, then for large |e| the distortion equals β = Ω(2^{|e|}/|e|²).

In a different context, Gomory-Hu trees were shown to give an approximation with distortion β = |e| − 1 [SSDD’13]. Gomory-Hu trees do not perform adequately when |e| ≤ 7.

Avoiding projections: How do we define appropriate Laplacian operators and derive Cheeger inequalities? How can we develop a spectral theory for inhomogeneous hypergraphs? The approach for avoiding projections: non-linear Laplacians with approximate spectra computations.

p-Laplacians of Submodular Hypergraphs

Let f_e be the Lovász extension of the weight w_e of a hyperedge e. Consider a potential x: V → R and suppose

x_{i₁} ≥ x_{i₂} ≥ ··· ≥ x_{i_|V|}. Then

f_e(x) = Σ_{j=1}^{|V|−1} w_e(S_j ∩ e)(x_{i_j} − x_{i_{j+1}}), where S_j = {i_k}_{k ≤ j}.

We define the p-Laplacian as

L_p(x) = Σ_e [f_e(x)]^{p−1} ∇f_e(x).
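Evaluating f_e(x) directly from this definition is straightforward; the sketch below uses a made-up constant-weight hyperedge, for which the Lovász extension reduces to max_{u,v∈e}(x_u − x_v):

```python
# Sketch of evaluating the Lovász extension f_e(x) from the sorted-potential
# formula above; w_e here is an invented example, not data from the slides.
import numpy as np

def lovasz_extension(x, e, w_e):
    """f_e(x) = sum_j w_e(S_j ∩ e) (x_{i_j} - x_{i_{j+1}}), x sorted decreasingly."""
    order = np.argsort(-x)               # i_1, ..., i_|V|: indices by decreasing x
    total, S = 0.0, set()
    for j in range(len(x) - 1):
        S.add(int(order[j]))             # S_j = {i_1, ..., i_j}
        total += w_e(S & set(e)) * (x[order[j]] - x[order[j + 1]])
    return total

# Constant-weight hyperedge e: w_e(T) = 1 for proper nonempty T ⊂ e.
e = {0, 1, 2}
w_e = lambda T: 1.0 if 0 < len(T) < len(e) else 0.0
x = np.array([0.9, -0.3, 0.2, 1.5])      # vertex 3 lies outside e
print(lovasz_extension(x, e, w_e))       # 1.2 = max_{u,v in e} (x_u - x_v)
```

Here f_e(x) = 0.9 − (−0.3) = 1.2, the spread of x over e, which is exactly the standard-hypergraph form of Q_p discussed next.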

Q_p(x) = ⟨x, L_p(x)⟩ is consistent with prior definitions of Laplacians of graphs and standard hypergraphs:

Graphs [Amghibech'03]: Σ_{uv ∈ E} w_uv (x_u − x_v)^p.
Hypergraphs [HSJR'13]: Σ_{e ∈ E} w_e max_{u,v ∈ e} (x_u − x_v)^p.
Submodular hypergraphs (this work): Σ_{e ∈ E} [f_e(x)]^p.

The Spectrum of L_p

L_p(x) = Σ_e [f_e(x)]^{p−1} ∇f_e(x): f_e(x) is in general not differentiable, so L_p(x) may be a set. A pair (λ, x) ∈ R × R^N \ {0} is called an eigenpair of the p-Laplacian if

(L_p(x))_v ∩ λ d_v sgn(x_v)|x_v|^{p−1} ≠ ∅, for all v ∈ V.

(0, 1) is a trivial eigenpair. The first non-trivial eigenpair (λ^(p), x^(p)) may be found by solving

x^(p) = arg min_x R_p(x) ≜ ⟨x, L_p(x)⟩ / min_c ||x − c·1||^p_{p,d}.

Then, λ^(p) = R_p(x^(p)). (Recall the definition of the Rayleigh-Ritz quotient of a matrix.)

Cheeger Inequalities

Theorem: Suppose that p ≥ 1, and let λ^(p) be the first non-trivial eigenvalue of the p-Laplacian. Then

(ψ*/p)^p ≤ λ^(p) ≤ 2^{p−1} ψ*.

For standard hypergraphs, a tighter bound holds:

2^{p−1} (ψ*/p)^p ≤ λ^(p) ≤ 2^{p−1} ψ*.

The connections between higher (k-th) eigenvalues and k-way Cheeger constants, as well as discrete nodal domain theorems, are detailed in the report. The best bounds are attained for p = 1.

Algorithms

We need to solve

min_x R_p(x) ≜ ⟨x, L_p(x)⟩ / min_c ||x − c·1||^p_{p,µ}.

For p = 2, one can use an SDP-based algorithm: it guarantees that ψ̂ ≤ O(√(max_{e∈E} |e| · ψ*)) with high probability.

For the case when p = 1, use the inverse power method (IPM): 1) it has guaranteed convergence; 2) it is efficient.

Other Relevant Research Results

The following results are also related to semi-supervised learning with high-order relations.

Min-cut of submodular hypergraphs ⇔ decomposable submodular function minimization

PageRank over submodular hypergraphs ⇔ quadratic decomposable submodular function minimization

The following result is related to representation learning with high-order relations: spectral embedding of heterogeneous information networks.

Summary of open problems I

Constrain w^(e)_{vṽ} = f_{vṽ}(w_e), where f_{vṽ}(·) is linear. Finding the optimal β becomes a min-max problem:

min_{f_{vṽ}, v,ṽ ∈ e} max_{submodular w_e} β
s.t. w_e(S) ≤ Σ_{v ∈ S, ṽ ∈ e\S} f_{vṽ}(w_e) ≤ β w_e(S), for all S ∈ 2^e.

Theorem: The min-max optimal solution (linear, 2 ≤ |e| ≤ 7) satisfies

w*^(e)_{vṽ} = Σ_{S ∈ 2^e \ {∅, e}} [ w_e(S)/(2|S|(|e| − |S|)) · 1{|{v,ṽ} ∩ S| = 1}
  − w_e(S)/(2(|S| + 1)(|e| − |S| − 1)) · 1{|{v,ṽ} ∩ S| = 0}
  − w_e(S)/(2(|S| − 1)(|e| − |S| + 1)) · 1{|{v,ṽ} ∩ S| = 2} ],

with constants β^(e):

|e|:    2  3  4    5  6  7
β^(e):  1  1  3/2  2  4  6

Summary of open problems II

These problems appear in the design of parallel min-cut/PageRank algorithms for submodular hypergraphs:

Suppose there are R subsets of V = [N], denoted S_r, r ∈ [R]; the sets S_r, S_r′ may overlap. Now consider a partition of these R subsets into k parts C₁, C₂, ..., C_k of almost equal size, i.e., |C_i| = ⌊R/k⌋ or ⌊R/k⌋ + 1. For each element i ∈ V, define d_i = max_{j ∈ [k]} d_ij, where d_ij = |{S ∈ C_j : i ∈ S}|. We want to find the partition (C₁, C₂, ..., C_k) that minimizes Σ_{i ∈ V} d_i.
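For tiny instances the objective can be explored by brute force. A sketch with made-up data, restricted to k = 2 and |C₁| = |C₂|:

```python
# Brute-force exploration of the open problem above: split R (possibly
# overlapping) subsets of V = [N] into two equal-size parts, minimizing
# sum_i max_j |{S in C_j : i in S}|. All data here is invented.
from itertools import combinations

def objective(parts, N):
    return sum(max(sum(i in S for S in Cj) for Cj in parts) for i in range(N))

def best_bipartition(subsets, N):
    """Exhaustive search over balanced bipartitions (tiny instances only)."""
    R = len(subsets)
    best, best_parts = float("inf"), None
    for idx in combinations(range(R), R // 2):
        C1 = [subsets[i] for i in idx]
        C2 = [subsets[i] for i in range(R) if i not in idx]
        val = objective([C1, C2], N)
        if val < best:
            best, best_parts = val, (C1, C2)
    return best, best_parts

subsets = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]
print(best_bipartition(subsets, 4)[0])  # 4: one copy of each subset per part
```

The optimum spreads the duplicate subsets across the two parts, so every element attains d_i = 1; stacking both copies of a subset in one part doubles its elements' contributions.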

This problem is closely related to equitable coloring: find the minimal k that makes all d_i ≤ 1 (consider the dual graph). But it is even harder.

Summary of open problems III

Results for the nodal domains of the 1-Laplacian for standard graphs may be generalized to those of submodular hypergraphs. Check the following paper: "Nodal domains of eigenvectors for 1-Laplacian on graphs", Chang et al., Advances in Mathematics, 2017.

Thanks! Questions?