Learning on Hypergraphs: Spectral Theory and Clustering
Pan Li, Olgica Milenkovic
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
March 12, 2019

Learning on Graphs

Graphs are indispensable mathematical data models capturing pairwise interactions: k-NN networks, social networks, publication networks. Important learning problems on graphs include clustering (community detection), semi-supervised/active learning, and representation learning (graph embedding).

Beyond Pairwise Relations

A graph models pairwise relations. Recent work has shown that high-order relations can be significantly more informative. Examples include:
- Understanding the organization of networks via high-order network motifs, which act as functional units in social and biological networks (Benson, Gleich, Leskovec '16); e.g., motifs relating microfauna, pelagic fishes, crabs & benthic fishes, and macroinvertebrates in a food web.
- Determining the topological connectivity between data points (Zhou, Huang, Schölkopf '07).
- Meta-graphs and meta-paths in heterogeneous information networks (Zhou, Yu, Han '11).
Graphs with high-order relations can be modeled as hypergraphs (formally defined later). Algorithmic methods for analyzing high-order relations and the associated learning problems are still under development.

Review of Graph Clustering: Notation and Terminology

Graph clustering task: cluster the vertices that are "densely" connected by edges.
Graph Partitioning and Conductance

A (weighted) graph G = (V, E, w): for e ∈ E, w_e is the edge weight. Partition V into two sets, V = S ∪ S̄.
- Boundary of set S: ∂S := {e ∈ E | e ∩ S ≠ ∅, e ∩ S̄ ≠ ∅}. Edge e is cut by (S, S̄) if e ∈ ∂S.
- Total cut cost: Cut(S) = Σ_{e ∈ ∂S} w_e.
- Volume of set S: Vol(S) = Σ_{v ∈ S} d_v = Σ_{v ∈ S} Σ_{e: v ∈ e} w_e.
- Conductance of a set: Φ(S) = Cut(S) / min{Vol(S), Vol(S̄)}.

Spectral Clustering

It is well known that sets with small conductance correspond to high-quality clusters: community detection in real networks [YL2012], image segmentation [SM2002], etc. The objective is

  min_S Φ(S),  Φ(S) := Cut(S) / min{Vol(S), Vol(S̄)}.

Algorithmic method of choice: spectral clustering.

Spectral Graph Partitioning

Input: the adjacency matrix A and the diagonal degree matrix D.
Step 1: Compute the normalized Laplacian L = I − D^{−1/2} A D^{−1/2}.
Step 2: Compute the eigenvector u = (u_1, u_2, ..., u_n)^T corresponding to the second smallest eigenvalue of L.
Step 3: Partition the set of vertices according to D^{−1/2} u.

Performance Guarantees for Spectral Clustering

Let Φ̂ be the conductance obtained via spectral graph partitioning and Φ* the optimal conductance (the Cheeger constant). Then Φ̂ ≤ 2√Φ* [C97]. This is a direct consequence of Cheeger's inequality: if λ is the second smallest eigenvalue of L, then

  (Φ*)²/4 ≤ λ/2 ≤ Φ*.

From Graphs to Hypergraphs: Hypergraphs and Inhomogeneous Hypergraphs

Formal definition: A hypergraph is an ordered pair G = (V, E), where V is the vertex set and E comprises hyperedges e ⊆ V with |e| ≥ 2. (A 3-uniform hypergraph is one with |e| = 3 for all hyperedges.)
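The three-step spectral graph partitioning procedure (Steps 1–3 above) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the final sweep cut, which chooses the prefix of the sorted embedding D^{−1/2}u with the smallest conductance, is the standard way to turn the embedding into a partition, though the slides leave that step implicit.

```python
import numpy as np

def spectral_partition(A):
    """Two-way spectral partitioning with the normalized Laplacian.

    A: symmetric weighted adjacency matrix (n x n).
    Returns (S, phi): a boolean indicator of one side of the cut and
    its conductance, found by a sweep over the spectral embedding.
    """
    d = A.sum(axis=1)                            # vertex degrees
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(d)) - D_isqrt @ A @ D_isqrt   # L = I - D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)
    u = vecs[:, 1]                # eigenvector of the 2nd smallest eigenvalue
    x = D_isqrt @ u               # embedding D^{-1/2} u
    order = np.argsort(x)
    vol_total = d.sum()
    best_S, best_phi = None, np.inf
    for k in range(1, len(order)):               # sweep cut over the ordering
        S = np.zeros(len(order), dtype=bool)
        S[order[:k]] = True
        cut = A[S][:, ~S].sum()
        phi = cut / min(d[S].sum(), vol_total - d[S].sum())
        if phi < best_phi:
            best_S, best_phi = S, phi
    return best_S, best_phi
```

For instance, on two triangles joined by a single edge, the best sweep cut separates the two triangles.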
Each hyperedge e is associated with a weight function

  w_e : 2^e → R≥0,  w_e(∅) = 0,  w_e(S) = w_e(e\S),

where w_e(S) describes the "contribution" of the set S to the high-order relation e. If w_e(S) = const for all S ∉ {∅, e}, we obtain a standard hypergraph. [Figure: a homogeneous hyperedge with a single weight w_e vs. an inhomogeneous hyperedge with distinct w_e(A), w_e(B), w_e(C); metabolic networks are an example.]

Inhomogeneous Hypergraph Partitioning: Two Clusters

Inhomogeneous weights: w_e : 2^e → R≥0, w_e(∅) = 0, w_e(S) = w_e(e\S).
- Cost of a cut (inhomogeneous case): Cut(S) = Σ_{e ∈ ∂S} w_e(S ∩ e).
- Volume of set S: Vol(S) = Σ_{v ∈ S} d_v, where d : V → R≥0 is the "degree" d_v = Σ_{e: v ∈ e} ‖w_e‖₁.

Objective of inhomogeneous hypergraph partitioning: minimize the conductance

  min_S Φ(S),  Φ(S) := Cut(S) / min{Vol(S), Vol(S̄)}.

New Algorithms for Inhomogeneous Hypergraph Partitioning

Major technical challenge: there is no matrix form for the Laplacian(s) of (inhomogeneous) hypergraphs. Two variants of spectral clustering:
1. "Matrix approximation" of the Laplacian (projection) + standard graph spectral clustering. Pros: efficient algorithms. Cons: large distortion.
2. Nonlinear Laplacian spectrum approximation. Pros: small distortion. Cons: potentially large complexity.

Projection-based Methods

The algorithm essentially follows a 3-step framework:
Step 1: Project each hyperedge onto a weighted clique.
Step 2: Merge the "projected cliques" into one graph.
Step 3: Perform classical spectral graph partitioning based on the normalized Laplacian.

An example of projection-based methods for constant-weight hypergraphs [ZHS'07, BGL'16]:
- Projection: w^(e)_{vṽ} = w_e/|e|;
- Merging: w_{vṽ} := Σ_{e ∈ E} w^(e)_{vṽ};
- Spectral graph partitioning.

Distortion Caused by Projection

Consider a constant-weight hyperedge of size |e|: the weights in the projected clique are uniform.
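The distortion can be made concrete with a toy computation: under uniform projection each of the |e|(|e|−1)/2 clique edges gets weight w_e/|e|, so the clique-cut cost of a split ranges from (|e|−1)·w_e/|e| (single-vertex split) to (|e|²/4)·w_e/|e| (balanced split), even though every split cuts the hyperedge itself at the same cost w_e. A small sketch (the function name is mine):

```python
def clique_cut_cost(m, k, w_e=1.0):
    """Clique-cut cost of a size-m constant-weight hyperedge under uniform
    projection (each of the m(m-1)/2 clique edges has weight w_e/m), when
    k of the m vertices fall on one side of the cut (0 < k < m)."""
    return k * (m - k) * (w_e / m)

m = 10
costs = [clique_cut_cost(m, k) for k in range(1, m)]
print(min(costs), max(costs))   # (m-1)/m = 0.9 versus m/4 = 2.5
```

All of these splits should cost w_e = 1 on the hyperedge itself, so the projected costs are distorted by a factor growing linearly in |e|.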
A single-vertex split cuts |e| − 1 clique edges, while a balanced split cuts up to |e|²/4, even though both splits cut the hyperedge itself at the same cost. Projection may therefore induce distortion even for constant-weight hyperedges.

Projection for Inhomogeneous Hyperedges

Key question: how does one project inhomogeneous hyperedges onto cliques with optimal cost approximation?

  min β^(e)
  s.t. w_e(S) ≤ Σ_{v ∈ S, ṽ ∈ e\S} w^(e)_{vṽ} ≤ β^(e) w_e(S), for all S ∈ 2^e for which w_e(S) is defined,

where {w^(e)_{vṽ}}_{v,ṽ ∈ e} stand for the projected edge weights. With O(2^{|e|}) constraints, the LP may be complex, and the problem may be infeasible.

Theoretical Performance Guarantees

β^(e) is the approximation ratio when projecting e onto a clique.

Theorem. If there exist feasible constants β^(e) for all hyperedges e and w_{vṽ} ≥ 0 for all {v, ṽ}, then the conductance Φ̂ obtained via projection satisfies

  Φ̂ ≤ 2 β* √Φ*,

where Φ* is the Cheeger constant and β* = max_{e ∈ E} β^(e).

Problems: 1) There may be no feasible β^(e) values for some hyperedges e. 2) One may have w_{vṽ} < 0 for some v, ṽ. Empirically, replacing w_{vṽ} with (w_{vṽ})₊ appears to work well, but no general analysis is available.

Submodular Weights

What are sufficient conditions for approximating a set-partition function w_e(S) by a graph-cut function? Submodular costs w_e(S)!

Definition. A function w_e : 2^e → R≥0 that satisfies

  w_e(S₁) + w_e(S₂) ≥ w_e(S₁ ∩ S₂) + w_e(S₁ ∪ S₂) for all S₁, S₂ ∈ 2^e

is referred to as submodular.

Performance Guarantees under Submodular Costs

Theorem. A symmetric submodular function w_e(·) with w_e(∅) = 0 has a constant graph-cut approximation function with nonnegative weights {w^(e)_{vv′}}_{v,v′ ∈ e}.
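The submodularity condition above can be verified by brute force for small hyperedges. A minimal sketch (the helper name is mine); it checks symmetry, w_e(∅) = 0, and the submodular inequality over all pairs of subsets:

```python
from itertools import combinations

def is_symmetric_submodular(e, w):
    """Brute-force check that w: 2^e -> R is symmetric (w(S) = w(e\\S)),
    satisfies w({}) = 0, and is submodular on every pair of subsets."""
    e = frozenset(e)
    subsets = [frozenset(c) for r in range(len(e) + 1)
               for c in combinations(sorted(e), r)]
    if w(frozenset()) != 0:
        return False
    if any(w(S) != w(e - S) for S in subsets):
        return False
    return all(w(A) + w(B) >= w(A & B) + w(A | B)
               for A in subsets for B in subsets)

# A constant-weight (standard) hyperedge has a submodular cut function:
const = lambda S: 0 if len(S) in (0, 3) else 1
print(is_symmetric_submodular({1, 2, 3}, const))   # True
```

By contrast, a symmetric weight function violating the "triangle-like" inequalities, e.g. w({1}) = 3 with w({2}) = w({3}) = 1 on a 3-vertex hyperedge, fails the check.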
Specific Questions

Is there a simple algorithm that avoids solving the optimization problem? Constrain w^(e)_{vṽ} = f_{vṽ}(w_e), where f_{vṽ}(·) is linear. Finding the optimal β then becomes a min-max problem:

  min_{{f_{vṽ}}_{v,ṽ ∈ e}} max_{submodular w_e} β
  s.t. w_e(S) ≤ Σ_{v ∈ S, ṽ ∈ e\S} f_{vṽ}(w_e) ≤ β w_e(S), for all S ∈ 2^e.

Here f_{vṽ}(·) does not depend on w_e, but may depend on |e|.

Theoretical Performance Guarantees

Theorem. The min-max optimal solution (linear, 2 ≤ |e| ≤ 7) satisfies

  w*_{vṽ} = Σ_{S ∈ 2^e \ {∅, e}} [ (w_e(S) / (2|S|(|e|−|S|))) · 1_{|{v,ṽ}∩S|=1}
            − (w_e(S) / (2(|S|+1)(|e|−|S|−1))) · 1_{|{v,ṽ}∩S|=0}
            − (w_e(S) / (2(|S|−1)(|e|−|S|+1))) · 1_{|{v,ṽ}∩S|=2} ],

with constants β^(e):

  |e|: 2  3  4    5  6  7
  β:   1  1  3/2  2  4  6

Conjecture: the claim holds for all |e|. (Proof?)

Applications of Inhomogeneous Hypergraph Clustering: Foodweb Hierarchical Clustering, Category Learning in Rankings, and Subspace Segmentation

Application 1: Foodweb Hierarchical Clustering

[Figure: hierarchical communities in the Florida Bay food web — producers, primary consumers, ..., high-level consumers.]

Foodweb Hierarchical Clustering I

- If two species share the same food source, they are in the same niche.
- If two species share the same predators, they are in the same niche.
[Figure: Florida Bay food web.]

Foodweb Hierarchical Clustering II

Motif of interest on vertices v₁, v₂, v₃, v₄:
  w_e({v_i}) = 1 for i = 1, 2, 3, 4;  w_e({v₁, v₃}) = 2 and w_e({v₁, v₄}) = 2;  w_e({v₁, v₂}) = 0.
The clustering result recovers the trophic hierarchy — producers, primary consumers (invertebrates, forage fishes), secondary consumers (predatory fishes & birds), and top-level predators — with only 5 links in reverse directions.
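The closed-form min-max solution from the theorem earlier in this section is straightforward to transcribe and sanity-check for small hyperedges. The sketch below (the function name is mine) computes w*_{vṽ} for a given symmetric submodular w_e; for a constant-weight hyperedge with |e| = 3 it assigns weight 1/2 to every clique edge, so each cut of the clique reproduces w_e(S) exactly, consistent with β = 1 in the table:

```python
from itertools import combinations

def minmax_clique_weights(e, w):
    """Closed-form min-max optimal clique weights w*_{v,u} (per the theorem,
    2 <= |e| <= 7) for a symmetric submodular weight w: 2^e -> R>=0."""
    e = sorted(e)
    m = len(e)
    proper = [frozenset(c) for r in range(1, m)
              for c in combinations(e, r)]          # S in 2^e \ {{}, e}
    W = {}
    for v, u in combinations(e, 2):
        total = 0.0
        for S in proper:
            k, inter = len(S), len({v, u} & S)
            if inter == 1:
                total += w(S) / (2 * k * (m - k))
            elif inter == 0:                        # both v, u outside S
                total -= w(S) / (2 * (k + 1) * (m - k - 1))
            else:                                   # both v, u inside S
                total -= w(S) / (2 * (k - 1) * (m - k + 1))
        W[frozenset((v, u))] = total
    return W

# Constant-weight hyperedge of size 3: each clique edge gets weight 1/2.
w = lambda S: 0 if len(S) in (0, 3) else 1
print(minmax_clique_weights({1, 2, 3}, w))
```

Note the denominators are always positive: |{v,ṽ}∩S| = 0 forces |e\S| ≥ 2, and |{v,ṽ}∩S| = 2 forces |S| ≥ 2.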