Submodular Hypergraphs: p-Laplacians, Cheeger Inequalities and Spectral Clustering

Pan Li 1 Olgica Milenkovic 1

Abstract

We introduce submodular hypergraphs, a family of hypergraphs that have different submodular weights associated with different cuts of hyperedges. Submodular hypergraphs arise in clustering applications in which higher-order structures carry relevant information. For such hypergraphs, we define the notion of p-Laplacians and derive corresponding nodal domain theorems and k-way Cheeger inequalities. We conclude with the description of algorithms for computing the spectra of 1- and 2-Laplacians that constitute the basis of new spectral hypergraph clustering methods.

1. Introduction

Spectral clustering algorithms are designed to solve a relaxation of the graph cut problem based on graph Laplacians that capture pairwise dependencies between vertices, and produce sets with small conductance that represent clusters. Due to their scalability and provable performance guarantees, spectral methods represent one of the most prevalent graph clustering approaches (Chung, 1997; Ng et al., 2002).

Many relevant problems in clustering, semisupervised learning and MAP inference (Zhou et al., 2007; Hein et al., 2013; Zhang et al., 2017) involve higher-order dependencies that require one to consider hypergraphs instead of graphs. To address spectral hypergraph clustering problems, several approaches have been proposed that typically operate by first projecting the hypergraph onto a graph via clique expansion and then performing spectral clustering on graphs (Zhou et al., 2007). Clique expansion involves transforming a weighted hyperedge into a weighted clique such that the graph cut weights approximately preserve the cut weights of the hyperedge. Almost exclusively, these approximations have been based on the assumption that each hyperedge cut has the same weight, in which case the underlying hypergraph is termed homogeneous.

However, in MAP inference on Markov random fields (Arora et al., 2012; Shanu et al., 2016), network motif studies (Li & Milenkovic, 2017; Benson et al., 2016; Tsourakakis et al., 2017) and rank learning (Li & Milenkovic, 2017), higher-order relations between vertices captured by hypergraphs are typically associated with different cut weights. In (Li & Milenkovic, 2017), Li and Milenkovic generalized the notion of hyperedge cut weights by assuming that different hyperedge cuts have different weights, and that consequently, each hyperedge is associated with a vector of weights rather than a single scalar weight. If the weights of the hyperedge cuts are submodular, then one can use a graph with nonnegative edge weights to efficiently approximate the hypergraph, provided that the largest size of a hyperedge is a relatively small constant. This property of the projected hypergraphs allows one to leverage spectral hypergraph clustering algorithms based on clique expansions with provable performance guarantees. Unfortunately, the clique expansion method in general has two drawbacks: the spectral clustering algorithm for graphs used in the second step is merely quadratically optimal, while the projection step can cause a large distortion.

To address the quadratic optimality issue in graph clustering, Amghibech (Amghibech, 2003) introduced the notion of p-Laplacians of graphs and derived Cheeger-type inequalities for the second smallest eigenvalue of a p-Laplacian, $p > 1$, of a graph. These results motivated Bühler and Hein's work (Bühler & Hein, 2009) on spectral clustering based on p-Laplacians that provided tighter approximations of the Cheeger constant. Szlam and Bresson (Szlam & Bresson, 2010) showed that the 1-Laplacian allows one to exactly compute the Cheeger constant, but at the cost of computational hardness (Chang, 2016). Very little is known about the use of p-Laplacians for hypergraph clustering and their spectral properties.

1 Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, USA. Correspondence to: Pan Li, Olgica Milenkovic.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

To address the clique expansion problem, Hein et al. (Hein et al., 2013) introduced a clustering method for homogeneous hypergraphs that avoids expansions and works directly with the total variation of homogeneous hypergraphs, without investigating the spectral properties of the operator. The only other line of work trying to mitigate the projection problem is due to Louis (Louis, 2015), who used a natural extension of 2-Laplacians for homogeneous hypergraphs, derived quadratically-optimal Cheeger-type inequalities and proposed a semidefinite programming (SDP) based algorithm whose complexity scales with the size of the largest hyperedge in the hypergraph.

Our contributions are threefold. First, we introduce submodular hypergraphs. Submodular hypergraphs allow one to perform hyperedge partitionings that depend on the subsets of elements involved in each part, thereby respecting higher-order and other constraints in graphs (see (Li & Milenkovic, 2017; Arora et al., 2012; Fix et al., 2013) for applications in food network analysis, learning to rank, subspace clustering and image segmentation).

Second, we define p-Laplacians for submodular hypergraphs and generalize the corresponding discrete nodal domain theorems (Tudisco & Hein, 2016; Chang et al., 2017) and higher-order Cheeger inequalities. Even for homogeneous hypergraphs, nodal domain theorems were not known, and only one low-order Cheeger inequality for 2-Laplacians was established by Louis (Louis, 2015). An analytical obstacle in the development of such a theory is the fact that p-Laplacians of hypergraphs are operators that act on vectors and produce sets of values. Consequently, operators and eigenvalues have to be defined in a set-theoretic manner.

Third, based on the newly established spectral hypergraph theory, we propose two spectral clustering methods that learn the second smallest eigenvalues of 2- and 1-Laplacians. The algorithm for 2-Laplacian eigenvalue computation is based on an SDP framework and can provably achieve quadratic optimality with an $O(\sqrt{\zeta(E)})$ approximation constant, where $\zeta(E)$ denotes the size of the largest hyperedge in the hypergraph. The algorithm for 1-Laplacian eigenvalue computation is based on the inverse power method (IPM) (Hein & Bühler, 2010) that only has convergence guarantees. The key novelty of the IPM-based method is that the critical inner-loop optimization problem of the IPM is efficiently solved by algorithms recently developed for decomposable submodular minimization (Jegelka et al., 2013; Ene & Nguyen, 2015; Li & Milenkovic, 2018). Although without performance guarantees, given that the 1-Laplacian provides the tightest approximation guarantees, the IPM-based algorithm, as opposed to the clique expansion method (Li & Milenkovic, 2017), performs very well empirically even when the size of the hyperedges is large. This fact is illustrated on several UC Irvine machine learning datasets available from (Asuncion & Newman, 2007).

The paper is organized as follows. Section 2 contains an overview of graph Laplacians and introduces the notion of submodular hypergraphs. The section also contains a description of hypergraph Laplacians and relevant concepts in submodular function theory. Section 3 presents the fundamental results in the spectral theory of p-Laplacians, while Section 4 introduces two algorithms for evaluating the second smallest eigenvalue of p-Laplacians needed for 2-way clustering. Section 5 presents experimental results. All proofs are relegated to the Supplementary Material.

2. Mathematical Preliminaries

A weighted graph $G = (V, E, w)$ is an ordered pair of two sets, the vertex set $V = [N] = \{1, 2, \ldots, N\}$ and the edge set $E \subseteq V \times V$, equipped with a weight function $w: E \to \mathbb{R}_+$.

A cut $C = (S, \bar{S})$ is a bipartition of the set $V$, while the cut-set (boundary) of the cut $C$ is defined as the set of edges that have one endpoint in $S$ and one in the complement of $S$, $\bar{S}$, i.e., $\partial S = \{(u, v) \in E \mid u \in S, v \in \bar{S}\}$. The weight of the cut induced by $S$ equals $\mathrm{vol}(\partial S) = \sum_{u \in S, v \in \bar{S}} w_{uv}$, while the conductance of the cut is defined as
$$c(S) = \frac{\mathrm{vol}(\partial S)}{\min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}},$$
where $\mathrm{vol}(S) = \sum_{u \in S} \mu_u$ and $\mu_u = \sum_{v \in V} w_{uv}$. Whenever clear from the context, for $e = (uv)$, we write $w_e$ instead of $w_{uv}$. Note that in this setting, the vertex weights $\mu_u$ are determined based on the weights of the edges $w_e$ incident to $u$. Clearly, one can use a different choice for these weights and make them independent from the edge weights, which is a generalization we pursue in the context of submodular hypergraphs. The smallest conductance of any bipartition of a graph $G$ is denoted by $h_2$ and referred to as the Cheeger constant of the graph.

A generalization of the Cheeger constant is the $k$-way Cheeger constant of a graph $G$. Let $P_k$ denote the set of all partitions of $V$ into $k$ disjoint nonempty subsets, i.e., $P_k = \{(S_1, S_2, \ldots, S_k) \mid S_i \subset V, S_i \neq \emptyset, S_i \cap S_j = \emptyset, \forall i, j \in [k], i \neq j\}$. The $k$-way Cheeger constant is defined as
$$h_k = \min_{(S_1, S_2, \ldots, S_k) \in P_k} \max_{i \in [k]} c(S_i).$$

Spectral graph theory provides a means for bounding the Cheeger constant using the (unnormalized or normalized) Laplacian matrix of the graph, defined as $L = D - A$ and $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$, respectively. Here, $A$ stands for the adjacency matrix of the graph, $D$ denotes the diagonal degree matrix, while $I$ stands for the identity matrix. The graph Laplacian is an operator $\triangle_2^{(g)}$ (Chung, 1997) that satisfies
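As a concrete illustration of the definitions above, the following sketch computes the conductance $c(S)$ and the Cheeger constant $h_2$ of a small weighted graph by brute-force enumeration. The graph, weights, and all names are hypothetical, not taken from the paper.

```python
# Hypothetical 4-vertex weighted graph: two heavy pairs joined by a light edge.
from itertools import combinations

V = [0, 1, 2, 3]
w = {(0, 1): 2.0, (2, 3): 2.0, (1, 2): 0.1}

def weight(u, v):
    return w.get((u, v), w.get((v, u), 0.0))

mu = {u: sum(weight(u, v) for v in V) for u in V}  # mu_u = sum_v w_uv

def conductance(S):
    Sbar = [v for v in V if v not in S]
    cut = sum(weight(u, v) for u in S for v in Sbar)  # vol(boundary of S)
    return cut / min(sum(mu[u] for u in S), sum(mu[v] for v in Sbar))

# Cheeger constant h2: minimum conductance over all nonempty proper subsets.
h2 = min(conductance(S) for k in range(1, len(V)) for S in combinations(V, k))
print(h2)  # attained by the balanced cut {0,1} | {2,3}
```

The enumeration is exponential in $|V|$; the point of the spectral machinery discussed below is precisely to avoid it.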

$$\langle x, \triangle_2^{(g)}(x)\rangle = \sum_{(uv) \in E} w_{uv} (x_u - x_v)^2.$$

A generalization of the above operator, termed the p-Laplacian operator of a graph $\triangle_p^{(g)}$, was introduced by Amghibech in (Amghibech, 2003), where
$$\langle x, \triangle_p^{(g)}(x)\rangle = \sum_{(uv) \in E} w_{uv} |x_u - x_v|^p.$$

The well-known Cheeger inequality asserts the following relationship between $h_2$ and $\lambda$, the second smallest eigenvalue of the normalized Laplacian $\triangle_2^{(g)}$ of a graph:
$$h_2 \le \sqrt{2\lambda} \le 2\sqrt{h_2}.$$
It can be shown that the cut $\hat{h}_2$ dictated by the elements of the eigenvector associated with $\lambda$ satisfies $\hat{h}_2 \le \sqrt{2\lambda}$, which implies $\hat{h}_2 \le 2\sqrt{h_2}$. Hence, spectral clustering provides a quadratically optimal graph partitioning.

2.1. Submodular Hypergraphs

A weighted hypergraph $G = (V, E, w)$ is an ordered pair of two sets, the vertex set $V = [N]$ and the hyperedge set $E \subseteq 2^V$, equipped with a weight function $w: E \to \mathbb{R}_+$. The relevant notions of cuts, boundaries and volumes for hypergraphs can be defined in a similar manner as for graphs. If each cut of a hyperedge $e$ has the same weight $w_e$, we refer to the cut as a homogeneous cut and to the corresponding hypergraph as a homogeneous hypergraph.

For a ground set $\Omega$, a set function $f: 2^{\Omega} \to \mathbb{R}$ is termed submodular if for all $S, T \subseteq \Omega$, one has $f(S) + f(T) \ge f(S \cup T) + f(S \cap T)$.

A weighted hypergraph $G = (V, E, \mu, w)$ is termed a submodular hypergraph with vertex set $V$, hyperedge set $E$ and positive vertex weight vector $\mu = \{\mu_v\}_{v \in V}$, if each hyperedge $e \in E$ is associated with a submodular weight function $w_e(\cdot): 2^e \to [0, 1]$. In addition, we require the weight function $w_e(\cdot)$ to be:

1) Normalized, so that $w_e(\emptyset) = 0$, and all cut weights corresponding to a hyperedge $e$ are normalized by $\vartheta_e = \max_{S \subseteq e} w_e(S)$. In this case, $w_e(\cdot) \in [0, 1]$;

2) Symmetric, so that $w_e(S) = w_e(e \setminus S)$ for any $S \subseteq e$.

The submodular hyperedge weight functions are summarized in the vector $w = \{(w_e, \vartheta_e)\}_{e \in E}$. If $w_e(S) = 1$ for all $S \in 2^e \setminus \{\emptyset, e\}$, submodular hypergraphs reduce to homogeneous hypergraphs. We omit the designation homogeneous whenever there is no context ambiguity.

Clearly, a vertex $v$ is in $e$ if and only if $w_e(\{v\}) > 0$: if $w_e(\{v\}) = 0$, the submodularity property implies that $v$ is not incident to $e$, as for any $S \subseteq e \setminus \{v\}$, $|w_e(S \cup \{v\}) - w_e(S)| \le w_e(\{v\}) = 0$.

We define the degree of a vertex $v$ as $d_v = \sum_{e \in E: v \in e} \vartheta_e$, i.e., as the sum of the max weights of edges incident to the vertex $v$. Furthermore, for any vector $y \in \mathbb{R}^N$, we define the projection weight of $y$ onto any subset $S \subseteq V$ as $y(S) = \sum_{v \in S} y_v$. The volume of a subset of vertices $S \subseteq V$ equals $\mathrm{vol}(S) = \sum_{v \in S} \mu_v$.

For any $S \subseteq V$, we generalize the notions of the boundary of $S$ and the volume of the boundary of $S$ according to $\partial S = \{e \in E \mid e \cap S \neq \emptyset, e \cap \bar{S} \neq \emptyset\}$, and
$$\mathrm{vol}(\partial S) = \sum_{e \in \partial S} \vartheta_e w_e(S) = \sum_{e \in E} \vartheta_e w_e(S), \qquad (1)$$
respectively. Then, the normalized cut induced by $S$, the Cheeger constant and the $k$-way Cheeger constant for hypergraphs are defined in an analogous manner as for graphs.

2.2. Laplacian Operators for Hypergraphs

We introduce next p-Laplacians of hypergraphs and a number of relevant notions associated with Laplacian operators. Hein et al. (Hein et al., 2013) connected p-Laplacians $\triangle_p^{(h)}$ for homogeneous hypergraphs with the total variation via
$$\langle x, \triangle_p^{(h)}(x)\rangle = \sum_{e \in E} w_e \max_{u, v \in e} |x_u - x_v|^p,$$
where $w_e$ denotes the weight of a homogeneous hyperedge $e$. They also introduced the inverse power method (IPM) to evaluate the spectrum of the hypergraph 1-Laplacian $\triangle_1^{(h)}$ (Hein et al., 2013), but did not establish any performance guarantees. In an independent line of work, Louis (Louis, 2015) introduced a quadratic variant of a hypergraph Laplacian,
$$\langle x, \triangle_2^{(h)}(x)\rangle = \sum_{e \in E} w_e \max_{u, v \in e} (x_u - x_v)^2.$$
He also derived a Cheeger-type inequality relating the second smallest eigenvalue $\lambda$ of $\triangle_2^{(h)}$ and the Cheeger constant of the hypergraph $h_2$ that reads as $\hat{h}_2 \le O(\sqrt{\log \zeta(E)})\sqrt{\lambda} \le O(\sqrt{\log \zeta(E)})\sqrt{h_2}$. Compared to the result (3.12) for graphs, for homogeneous hypergraphs the factor $\log \zeta(E)$ introduces an additional difficulty in approximating $h_2$. Learning the spectrum of generalizations of hypergraph Laplacians can be an even more challenging task.

2.3. Relevant Background on Submodular Functions

Given an arbitrary set function $F: 2^V \to \mathbb{R}$ satisfying $F(V) = 0$, the Lovász extension (Lovász, 1983) $f: \mathbb{R}^N \to \mathbb{R}$ of $F$ is defined as follows: for any vector $x \in \mathbb{R}^N$, we order its entries in nonincreasing order $x_{i_1} \ge x_{i_2} \ge \cdots \ge x_{i_N}$, breaking ties arbitrarily, and set
$$f(x) = \sum_{j=1}^{N-1} F(S_j)(x_{i_j} - x_{i_{j+1}}), \qquad (2)$$
with $S_j = \{i_1, i_2, \ldots, i_j\}$. For submodular $F$, the Lovász extension is a convex function (Lovász, 1983).
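Equation (2) can be evaluated directly by sorting. The following sketch, using a hypothetical symmetric hyperedge weight $w_e(S) = \min(|S|, |e \setminus S|)$ on $e = \{0, 1, 2\}$ (normalized, symmetric, and submodular; not an example from the paper), implements the extension and checks the identity $F(S) = f(1_S)$ mentioned in the text.

```python
def F(S):
    # Hypothetical submodular cut weight on the hyperedge e = {0, 1, 2}.
    S = set(S)
    return float(min(len(S), 3 - len(S)))

def lovasz(F, x, n):
    """Lovász extension of Eq. (2): sort coordinates nonincreasingly,
    breaking ties arbitrarily, and sum F(S_j)(x_{i_j} - x_{i_{j+1}})."""
    order = sorted(range(n), key=lambda i: -x[i])
    val, prefix = 0.0, []
    for j in range(n - 1):
        prefix.append(order[j])
        val += F(prefix) * (x[order[j]] - x[order[j + 1]])
    return val

# Sanity check: f(1_S) = F(S) for the indicator of S = {0, 2}.
x = [1.0, 0.0, 1.0]
print(lovasz(F, x, 3))  # equals F({0, 2}) = 1.0
```

Note that $F(V) = \min(3, 0) = 0$ here, so the $N - 1$ term form of (2) applies without a correction term.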

Let $1_S \in \mathbb{R}^N$ be the indicator vector of the set $S$. Hence, for any $S \subseteq V$, one has $F(S) = f(1_S)$. For a submodular $F$, we define a convex set termed the base polytope,
$$B = \{y \in \mathbb{R}^N \mid y(S) \le F(S) \text{ for all } S \subseteq V, \text{ and } y(V) = F(V) = 0\}.$$
According to the defining property of submodular functions (Lovász, 1983), we may write $f(x) = \max_{y \in B} \langle y, x \rangle$. The subdifferential $\nabla f(x)$ of $f$ is defined as
$$\{y \in \mathbb{R}^N \mid f(x') - f(x) \ge \langle y, x' - x \rangle, \ \forall x' \in \mathbb{R}^N\}.$$
An important result from (Bach et al., 2013) characterizes the subdifferentials $\nabla f(x)$: if $f(x)$ is the Lovász extension of a submodular function $F$ with base polytope $B$, then
$$\nabla f(x) = \arg\max_{y \in B} \langle y, x \rangle. \qquad (3)$$
Observe that $\nabla f(x)$ is a set and that the right-hand side of the definition represents a set of maximizers of the objective function. If $f(x)$ is the Lovász extension of a submodular function, then $\langle q, x \rangle = f(x)$ for all $q \in \nabla f(x)$.

For each hyperedge $e \in E$ of a submodular hypergraph, following the above notation, we let $B_e$, $E(B_e)$ and $f_e$ denote the base polytope, the set of extreme points of the base polytope, and the Lovász extension of the submodular hyperedge weight function $w_e$, respectively. Note that for any $S \subseteq V$, $w_e(S) = w_e(S \cap e)$. Consequently, for any $y \in B_e$, $y_v = 0$ for $v \notin e$. Since $\nabla f_e \subseteq B_e$, it also holds that $(\nabla f_e)_v = 0$ for $v \notin e$. When using formula (2) to explicitly describe the Lovász extension $f_e$, we can either use a vector $x$ of dimension $N$ or only those of its components that lie in $e$. Furthermore, in the latter case, $|E(B_e)| = |e|!$.

3. p-Laplacians of Submodular Hypergraphs

We start our discussion by defining the notion of a p-Laplacian operator for submodular hypergraphs. We find the following definitions useful for our subsequent exposition. Let $\mathrm{sgn}(\cdot)$ be the sign function defined as $\mathrm{sgn}(a) = 1$ for $a > 0$, $\mathrm{sgn}(a) = -1$ for $a < 0$, and $\mathrm{sgn}(a) = [-1, 1]$ for $a = 0$. For all $v \in V$, define the entries of a vector $\varphi_p$ over $\mathbb{R}^N$ according to $(\varphi_p(x))_v = |x_v|^{p-1}\,\mathrm{sgn}(x_v)$. Furthermore, let $U$ be an $N \times N$ diagonal matrix such that $U_{vv} = \mu_v$ for all $v \in V$.

Let $\|x\|_{\ell_p,\mu} = (\sum_{v \in V} \mu_v |x_v|^p)^{1/p}$ and $S_{p,\mu} = \{x \in \mathbb{R}^N \mid \|x\|_{\ell_p,\mu} = 1\}$. For a function $\Phi$ over $\mathbb{R}^N$, let $\Phi|_{S_{p,\mu}}$ stand for $\Phi$ restricted to $S_{p,\mu}$.

Definition 3.1. The p-Laplacian operator of a submodular hypergraph, denoted by $\triangle_p$ ($p \ge 1$), is defined for all $x \in \mathbb{R}^N$ according to
$$\langle x, \triangle_p(x)\rangle = Q_p(x) = \sum_{e \in E} \vartheta_e f_e(x)^p. \qquad (4)$$
Hence, $\triangle_p(x)$ may also be specified directly as an operator over $\mathbb{R}^N$ that reads as
$$\triangle_p(x) = \begin{cases} \sum_{e \in E} \vartheta_e f_e(x)^{p-1} \nabla f_e(x) & p > 1, \\ \sum_{e \in E} \vartheta_e \nabla f_e(x) & p = 1. \end{cases}$$

Definition 3.2. A pair $(\lambda, x) \in \mathbb{R} \times \mathbb{R}^N \setminus \{0\}$ is called an eigenpair of the p-Laplacian $\triangle_p$ if $\triangle_p(x) \cap \lambda U \varphi_p(x) \neq \emptyset$.

As $f_e(1) = 0$, we have $\triangle_p(1) = 0$, so that $(0, 1)$ is an eigenpair of the operator $\triangle_p$. A p-Laplacian operates on vectors and produces sets. In addition, since for any $t > 0$, $\triangle_p(tx) = t^{p-1}\triangle_p(x)$ and $\varphi_p(tx) = t^{p-1}\varphi_p(x)$, $(tx, \lambda)$ is an eigenpair if and only if $(x, \lambda)$ is an eigenpair. Hence, one only needs to consider normalized eigenpairs: in our setting, we choose eigenpairs that lie in $S_{p,\mu}$ for a suitable choice of the dimension of the space.

For linear operators, the Rayleigh-Ritz method (Gould, 1966) allows for determining approximate solutions to eigenproblems and provides a variational characterization of eigenpairs based on the critical points of functionals. To generalize the method, we introduce two even functions,
$$\tilde{Q}_p(x) = Q_p(x)|_{S_{p,\mu}}, \qquad R_p(x) = \frac{Q_p(x)}{\|x\|^p_{\ell_p,\mu}}.$$

Definition 3.3. A point $x \in S_{p,\mu}$ is termed a critical point of $R_p(x)$ if $0 \in \nabla R_p(x)$. Correspondingly, $R_p(x)$ is termed a critical value of $R_p(x)$. Similarly, $x$ is termed a critical point of $\tilde{Q}_p$ if there exists a $\sigma \in \nabla Q_p(x)$ such that $P(x)\sigma = 0$, where $P(x)\sigma$ stands for the projection of $\sigma$ onto the tangent space of $S_{p,\mu}$ at the point $x$. Correspondingly, $\tilde{Q}_p(x)$ is termed a critical value of $\tilde{Q}_p$.

The relationships between the critical points of $\tilde{Q}_p(x)$ and $R_p(x)$ and the eigenpairs of $\triangle_p$ relevant to our subsequent derivations are listed in Theorem 3.4.

Theorem 3.4. A pair $(\lambda, x)$ ($x \in S_{p,\mu}$) is an eigenpair of the operator $\triangle_p$
1) if and only if $x$ is a critical point of $\tilde{Q}_p$ with critical value $\lambda$, provided that $p \ge 1$;
2) if and only if $x$ is a critical point of $R_p$ with critical value $\lambda$, provided that $p > 1$;
3) if $x$ is a critical point of $R_p$ with critical value $\lambda$, provided that $p = 1$.

The critical points of $\tilde{Q}_p$ bijectively characterize eigenpairs for all choices of $p \ge 1$. However, $R_p$ has the same property only if $p > 1$. This is a consequence of the nonsmoothness of the set $S_{1,\mu}$, which has been observed for graphs as well (see the examples in Section 2.2 of (Chang, 2016)).
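A maximizer in (3) can be obtained by the classical greedy procedure for base polytopes (due to Edmonds): visit coordinates in nonincreasing order of $x$ and assign each the marginal gain $F(S_j) - F(S_{j-1})$. The sketch below illustrates this for a hypothetical symmetric hyperedge weight; it is an illustration of a standard fact, not code from the paper.

```python
def F(S):
    # Hypothetical symmetric cut weight on the hyperedge e = {0, 1, 2}.
    S = set(S)
    return float(min(len(S), 3 - len(S)))

def greedy_subgradient(F, x, n):
    """Edmonds' greedy algorithm: returns y in argmax_{y in B} <y, x>,
    i.e., an element of the subdifferential in Eq. (3)."""
    order = sorted(range(n), key=lambda i: -x[i])  # nonincreasing x
    y, prev, S = [0.0] * n, 0.0, []
    for i in order:
        S.append(i)
        cur = F(S)
        y[i] = cur - prev  # marginal gain F(S_j) - F(S_{j-1})
        prev = cur
    return y

x = [1.0, 0.0, 1.0]
y = greedy_subgradient(F, x, 3)
# The text's identity <q, x> = f(x) for q in grad f(x); here f(x) = 1.0.
print(sum(yi * xi for yi, xi in zip(y, x)))
```

The output also satisfies $y(V) = F(V) = 0$, as required for membership in the base polytope.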

3.1. Discrete Nodal Domain Theorems for p-Laplacians

Nodal domain theorems are essential for understanding the structure of eigenvectors of operators, and they have been the subject of intense study in geometry and graph theory alike (Bıyıkoğlu et al., 2007). The eigenfunctions of a Laplacian operator may take positive and negative values. The signs of the values induce a partition of the vertices in $V$ into maximal connected components on which the sign of the eigenfunction does not change: these components represent the nodal domains of the eigenfunction and approximate the clusters of the graph.

Davies et al. (Davies et al., 2001) derived the first discrete nodal domain theorem for the $\triangle_2^{(g)}$ operator. Chang et al. (Chang et al., 2017) and Tudisco and Hein (Tudisco & Hein, 2016) generalized these theorems to the $\triangle_1^{(g)}$ and $\triangle_p^{(g)}$ ($p > 1$) operators of graphs. In what follows, we prove that the discrete nodal domain theorem applies to $\triangle_p$ of submodular hypergraphs.

As every nodal domain theorem depends on some underlying notion of connectivity, we first define the relevant notion of connectivity for submodular hypergraphs. In a graph or a homogeneous hypergraph, vertices on the same edge or hyperedge are considered to be connected. However, this property does not generalize to submodular hypergraphs, as one can merge two nonoverlapping hyperedges into one without changing the connectivity of the hyperedges. To see why this is the case, consider two hyperedges $e_1$ and $e_2$ that are nonintersecting. One may transform the submodular hypergraph so that it includes a hyperedge $e = e_1 \cup e_2$ with weight $w_e = w_{e_1} + w_{e_2}$. This transformation essentially does not change the submodular hypergraph, but in the newly obtained hypergraph, according to the standard definition of connectivity, the vertices in $e_1$ and $e_2$ are connected. This problem may be avoided by defining connectivity based on the volume of the boundary set.

Definition 3.5. Two distinct vertices $u, v \in V$ are said to be connected if for any $S$ such that $u \in S$ and $v \notin S$, $\mathrm{vol}(\partial S) > 0$. A submodular hypergraph is connected if for any nonempty $S \subset V$, one has $\mathrm{vol}(\partial S) > 0$.

According to the following lemma, it is always possible to transform the weight functions of a submodular hypergraph in such a way as to preserve connectivity.

Lemma 3.6. Any submodular hypergraph $G = (V, E, w, \mu)$ can be reduced to another submodular hypergraph $G' = (V, E', w', \mu)$ without changing $\mathrm{vol}(\partial S)$ for any $S \subseteq V$, while ensuring that for any $e \in E'$ and $u, v \in e$, $u$ and $v$ are connected.

Definition 3.7. Let $x \in \mathbb{R}^N$. A positive (respectively, negative) strong nodal domain is the set of vertices of a maximally connected induced subgraph of $G$ such that $\{v \in V \mid x_v > 0\}$ (respectively, $\{v \in V \mid x_v < 0\}$). A positive (respectively, negative) weak nodal domain is defined in the same manner, except for changing the strict inequalities to $\{v \in V \mid x_v \ge 0\}$ (respectively, $\{v \in V \mid x_v \le 0\}$).

The following lemma establishes that for a connected submodular hypergraph $G$, all nonconstant eigenvectors of the operator $\triangle_p$ correspond to nonzero eigenvalues.

Lemma 3.8. If $G$ is connected, then all eigenvectors associated with the zero eigenvalue have constant entries.

We next state new nodal domain theorems for submodular hypergraph p-Laplacians. The results imply that the bounds on the numbers of nodal domains induced by eigenvectors of the p-Laplacian do not essentially change compared to those for graphs (Tudisco & Hein, 2016). We do not consider the case $p = 1$, although it is possible to adapt the methods for analyzing the $\triangle_1^{(g)}$ operators of graphs to the $\triangle_1$ operators of submodular hypergraphs. Such a generalization requires extensions of the critical-point theory to piecewise linear manifolds (Chang, 2016).

Theorem 3.9. Let $p > 1$ and assume that $G$ is a connected submodular hypergraph. Furthermore, let the eigenvalues of $\triangle_p$ be ordered as $0 = \lambda_1^{(p)} < \lambda_2^{(p)} \le \cdots \le \lambda_{k-1}^{(p)} < \lambda_k^{(p)} = \cdots = \lambda_{k+r-1}^{(p)} < \lambda_{k+r}^{(p)} \le \cdots \le \lambda_n^{(p)}$, with $\lambda_k^{(p)}$ having multiplicity $r$. Let $x$ be an arbitrary eigenvector associated with $\lambda_k^{(p)}$. Then $x$ induces at most $k + r - 1$ strong and at most $k$ weak nodal domains.

Lemma 3.10. Let $G$ be a connected submodular hypergraph. For $p > 1$, any nonconstant eigenvector has at least two weak (strong) nodal domains. Hence, the eigenvectors associated with the second smallest eigenvalue $\lambda_2^{(p)}$ have exactly two weak (strong) nodal domains. For $p = 1$, the eigenvectors associated with the second smallest eigenvalue $\lambda_2^{(1)}$ may have only a single weak (strong) nodal domain.

We next define the following three functions: $\mu_p^+(x) = \sum_{v \in V: x_v > 0} \mu_v |x_v|^{p-1}$, $\mu^0(x) = \sum_{v \in V: x_v = 0} \mu_v$, and $\mu_p^-(x) = \sum_{v \in V: x_v < 0} \mu_v |x_v|^{p-1}$.

Lemma 3.11. Let $G$ be a connected submodular hypergraph. Then, for any nonconstant eigenvector $x$ of $\triangle_p$, one has $\mu_p^+(x) - \mu_p^-(x) = 0$ for $p > 1$, and $|\mu_1^+(x) - \mu_1^-(x)| \le \mu^0(x)$ for $p = 1$. Consequently, $0 \in \arg\min_{c \in \mathbb{R}} \|x - c1\|_{\ell_p,\mu}$ for any $p \ge 1$.

The nodal domain theorem characterizes the structure of the eigenvectors of the operator, and the number of nodal domains determines the approximation guarantees in Cheeger-type inequalities relating the spectra of graphs and hypergraphs and the Cheeger constant. These observations are rigorously formalized in the next section.

3.2. Higher-Order Cheeger Inequalities

In what follows, we analytically characterize the relationship between the Cheeger constants and the eigenvalues $\lambda_k^{(p)}$ of $\triangle_p$ for submodular hypergraphs.

Theorem 3.12. Suppose that $p \ge 1$ and let $(\lambda_k^{(p)}, x_k)$ be the $k$-th eigenpair of the operator $\triangle_p$, with $m_k$ denoting the number of strong nodal domains of $x_k$. Then,
$$\left(\frac{1}{\tau}\right)^{p-1} \left(\frac{h_{m_k}}{p}\right)^p \le \lambda_k^{(p)} \le (\min\{\zeta(E), k\})^{p-1}\, h_k,$$
where $\tau = \max_v d_v / \mu_v$. For homogeneous hypergraphs, a tighter bound holds that reads as
$$\left(\frac{2}{\tau}\right)^{p-1} \left(\frac{h_{m_k}}{p}\right)^p \le \lambda_k^{(p)} \le 2^{p-1}\, h_k.$$

It is straightforward to see that setting $p = 1$ produces the tightest bounds on the eigenvalues, while the case $p = 2$ reduces to the classical Cheeger inequality. This motivates an in-depth study of algorithms for evaluating the spectrum of 1- and 2-Laplacians, described next.

4. Spectral Clustering Algorithms for Submodular Hypergraphs

The Cheeger constant is frequently used as an objective function for (balanced) graph and hypergraph partitioning (Zhou et al., 2007; Bühler & Hein, 2009; Szlam & Bresson, 2010; Hein & Bühler, 2010; Hein et al., 2013; Li & Milenkovic, 2017). Theorem 3.12 implies that $\lambda_k^{(p)}$ is a good approximation for the $k$-way Cheeger constant of submodular hypergraphs. Hence, to perform accurate hypergraph clustering, one has to be able to efficiently learn $\lambda_k^{(p)}$ (Ng et al., 2002; Von Luxburg, 2007). We outline next how to do so for $k = 2$.

In Theorem 4.1, we describe an objective function that allows us to characterize $\lambda_2^{(p)}$ in a computationally tractable manner; the choice of the objective function is related to the objective developed for graphs in (Bühler & Hein, 2009; Szlam & Bresson, 2010). Minimizing the proposed objective function produces a real-valued output vector $x \in \mathbb{R}^N$. Theorem 4.3 describes how to round the vector $x$ and obtain a partition which provably upper bounds $c(S)$. Based on these theorems, we propose two algorithms for evaluating $\lambda_2^{(2)}$ and $\lambda_2^{(1)}$. Since $\lambda_2^{(1)} = h_2$, the corresponding partition corresponds to the tightest approximation of the 2-way Cheeger constant. The eigenvalue $\lambda_2^{(2)}$ can be evaluated in polynomial time with provable performance guarantees. The problem of devising good approximations for the values $\lambda_k^{(p)}$, $k \neq 2$, is still open.

Let $Z_{p,\mu}(x, c) = \|x - c1\|^p_{\ell_p,\mu}$ and $Z_{p,\mu}(x) = \min_{c \in \mathbb{R}} Z_{p,\mu}(x, c)$, and define
$$R_p(x) = \frac{Q_p(x)}{Z_{p,\mu}(x)}. \qquad (5)$$

Theorem 4.1. For $p > 1$, $\lambda_2^{(p)} = \inf_{x \in \mathbb{R}^N} R_p(x)$. Moreover, $\lambda_2^{(1)} = \inf_{x \in \mathbb{R}^N} R_1(x) = h_2$.

Definition 4.2. Given a nonconstant vector $x \in \mathbb{R}^N$ and a threshold $\theta$, set $\Theta(x, \theta) = \{v : x_v > \theta\}$. The optimal conductance obtained from thresholding the vector $x$ equals
$$c(x) = \inf_{\theta \in [x_{\min}, x_{\max})} \frac{\mathrm{vol}(\partial \Theta(x, \theta))}{\min\{\mathrm{vol}(\Theta(x, \theta)), \mathrm{vol}(V \setminus \Theta(x, \theta))\}}.$$

Theorem 4.3. For any $x \in \mathbb{R}^N$ that satisfies $0 \in \arg\min_c Z_{p,\mu}(x, c)$, i.e., such that $Z_{p,\mu}(x, 0) = Z_{p,\mu}(x)$, one has $c(x) \le p\, \tau^{(p-1)/p}\, R_p(x)^{1/p}$, where $\tau = \max_{v \in V} d_v / \mu_v$.

In what follows, we present two algorithms. The first algorithm describes how to minimize $R_2(x)$, and hence provides a polynomial-time solution for submodular hypergraph partitioning with provable approximation guarantees, given that the size of the largest hyperedge is a constant. The result is summarized in Theorem 4.5. The algorithm is based on an SDP, and may be computationally too intensive for practical applications involving large hypergraphs with even moderately large hyperedges. The second algorithm is based on the IPM (Hein & Bühler, 2010) and aims to minimize $R_1(x)$. Although this algorithm does not come with performance guarantees, it provably converges (see Theorem 4.6) and has good heuristic performance. Moreover, the inner loop of the IPM involves solving a version of a proximal-type decomposable submodular minimization problem (see Theorem 4.7), which can be efficiently performed using a number of different algorithms (Kolmogorov, 2012; Jegelka et al., 2013; Nishihara et al., 2014; Ene & Nguyen, 2015; Li & Milenkovic, 2018).
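The rounding of Definition 4.2 is a sweep cut: scan thresholds over the entries of $x$ and keep the level set of smallest conductance (Theorem 4.3 bounds the quality of the best such cut). A brute-force sketch on a hypothetical weighted graph, with all names invented for illustration:

```python
# Hypothetical weighted graph: two heavy pairs joined by a light edge.
V = [0, 1, 2, 3]
w = {(0, 1): 2.0, (2, 3): 2.0, (1, 2): 0.1}

def weight(u, v):
    return w.get((u, v), w.get((v, u), 0.0))

mu = {u: sum(weight(u, v) for v in V) for u in V}

def conductance(S):
    Sbar = [v for v in V if v not in S]
    cut = sum(weight(u, v) for u in S for v in Sbar)
    return cut / min(sum(mu[u] for u in S), sum(mu[v] for v in Sbar))

def sweep_cut(x):
    best = (float("inf"), ())
    for theta in sorted(set(x))[:-1]:            # theta in [x_min, x_max)
        S = tuple(v for v in V if x[v] > theta)  # Theta(x, theta)
        best = min(best, (conductance(S), S))
    return best

# A vector separating {0, 1} from {2, 3} recovers the sparse cut.
c_best, S_best = sweep_cut([0.9, 0.8, -0.7, -0.9])
print(S_best)
```

Only $|V| - 1$ thresholds need to be examined, since the level set $\Theta(x, \theta)$ changes only at entries of $x$.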
An SDP Method for Minimizing R2(x) lows us to characterize λ2 in a computationally tractable manner; the choice of the objective function is related to The R2(x) minimization problem introduced in Equa- the objective developed for graphs in (Buhler¨ & Hein, 2009; tion (5) may be rewritten as Szlam & Bresson, 2010). Minimizing the proposed objec- Q2(x) tive function produces a real-valued output vector x ∈ RN . min , (6) x:Ux⊥1 kxk2 Theorem 4.3 describes how to round the vector x and ob- `2,µ tain a partition which provably upper bounds c(S). Based P 2 where we observe that Q2(x) = e∈E ϑefe (x) = on the theorems, we propose two algorithms for evaluating P 2 (2) (1) (1) e∈E ϑe maxy∈E(Be)hy, xi . This problem is, in turn, λ2 and λ2 . Since λ2 = h2, the corresponding parti- equivalent to the nonconvex optimization problem tion corresponds to the tightest approximation of the 2-way (2)  2 Cheeger constant. The eigenvalue λ2 can be evaluated X min ϑe max hy, xi (7) in polynomial time with provable performance guarantees. N x∈R y∈E(Be) The problem of devising good approximations for values e (p) X 2 X s.t. µvxv = 1, µvxv = 0. λk , k 6= 2, is still open. v∈V v∈V p Let Zp,µ(x, c) , kx − c1k` ,µ and Zp,µ(x) , p Following an approach proposed for homogeneous hyper- min Z (x, c), and define c∈R p,µ graphs (Louis, 2015), one may try to solve an SDP relax- Qp(x) ation of (7) instead. To describe the relaxation, let each Rp(x) , . (5) 0 n Zp,µ(x) vertex v of the graph be associated with a vector xv ∈ R , Submodular Hypergraphs: p-Laplacians, Cheeger Inequalities and Spectral Clustering n ≥ ζ(E). The assigned vectors are collected into a matrix Algorithm 2: IPM-based minimization of R1(x) 0 0 of the form X = (x1, .., xN ). The SDP relaxation reads as Input: A submodular hypergraph G = (V,E, w, µ) Find nonconstant x0 ∈ N s.t. 0 ∈ arg min kx0 − c1k X R c `1,µ 2 ˆ0 0 min ϑeηe (8) initialize λ ← R1(x ), k ← 0 X∈ n×N , η∈ |E| R R e 1: Repeat: 2 2 ( k k s.t. 
‖X y‖₂ ≤ η_e, ∀ y ∈ E(B_e), e ∈ E,
Σ_{v∈V} μ_v ‖x_v‖₂² = 1,  Σ_{v∈V} μ_v x_v = 0.

Note that E(B_e) is of size O(|e|!), and the above problem can be solved efficiently if ζ(E) is small.

Algorithm 1 lists the steps of an SDP-based algorithm for minimizing R₂(x), and it comes with approximation guarantees stated in Lemma 4.4. In contrast to homogeneous hypergraphs (Louis, 2015), for which the approximation factor equals O(log ζ(E)), the guarantees for general submodular hypergraphs are O(ζ(E)). This is due to the fact that the underlying base polytope B_e for a submodular function is significantly more complex than the corresponding polytope for the homogeneous case. We conjecture that this approximation guarantee is optimal for SDP methods.

Algorithm 1: Minimization of R₂(x) using SDP
Input: A submodular hypergraph G = (V, E, w, μ)
1: Solve the SDP (8).
2: Generate a random Gaussian vector g ∼ N(0, I_n), where I_n denotes the identity matrix of order n.
3: Output x = Xᵀ g.

Lemma 4.4. Let x be as in Algorithm 1, and let the optimal value of (8) be SDPopt. Then, with high probability,

R₂(x) ≤ O(ζ(E)) SDPopt ≤ O(ζ(E)) min R₂.

This result immediately leads to the following theorem.

Theorem 4.5. Suppose that x is the output of Algorithm 1. Then, c(x) ≤ O(√(ζ(E) τ)) h₂ with high probability.

We describe next Algorithm 2 for optimizing R₁(x), which has guaranteed convergence properties.

Algorithm 2: IPM-based minimization of R₁(x)
2: For v ∈ V,
   g_v^k ← sgn(x_v^k) μ_v,                          if x_v^k ≠ 0,
   g_v^k ← −((μ⁺(x^k) − μ⁻(x^k)) / μ⁰(x^k)) μ_v,    if x_v^k = 0
3: z^{k+1} ← arg min_{z: ‖z‖ ≤ 1} Q₁(z) − λ̂^k ⟨z, g^k⟩
4: c^{k+1} ← arg min_c ‖z^{k+1} − c 1‖_{ℓ₁,μ}
5: x^{k+1} ← z^{k+1} − c^{k+1} 1
6: λ̂^{k+1} ← R₁(x^{k+1})
7: Until |λ̂^{k+1} − λ̂^k| / λ̂^k < ε
8: Output x^{k+1}

Theorem 4.6. The sequence {x^k} generated by Algorithm 2 satisfies R₁(x^{k+1}) ≤ R₁(x^k).

The computationally demanding part of Algorithm 2 is the optimization procedure in Step 3. The optimization problem is closely related to the problem of submodular function minimization (SFM) due to the defining properties of the Lovász extension. Theorem 4.7 describes different equivalent formulations of the optimization problem in Step 3.

Theorem 4.7. If the norm of the vector z in Step 3 is ‖z‖₂, the underlying optimization problem is the dual of the following ℓ₂ minimization problem

min_{y_e} ‖ Σ_{e∈E} y_e − λ̂^k g^k ‖₂²,  y_e ∈ ϑ_e B_e, ∀ e ∈ E,    (9)

where the primal and dual variables are related according to

z = (λ̂^k g^k − Σ_{e∈E} y_e) / ‖ λ̂^k g^k − Σ_{e∈E} y_e ‖₂.

If the norm of the vector z in Step 3 is ‖z‖_∞, the underlying optimization problem is equivalent to the following SFM problem

min_{S⊆V} Σ_{e∈E} ϑ_e w_e(S) − λ̂^k g^k(S),    (10)

where the primal and dual variables are related according to z_v = 1 if v ∈ S, and z_v = −1 if v ∉ S.

For special forms of submodular weights, different algorithms for the optimization problems in Theorem 4.7 may be used instead. For graphs and homogeneous hypergraphs with hyperedges of small size, the min-cut algorithms of Karger and of Chekuri and Xu (Karger, 1993; Chekuri & Xu, 2017) allow one to efficiently solve the discrete problem (10). Continuous optimization methods such as alternating projections (AP) (Nishihara et al., 2014) and coordinate descent methods (CDM) (Ene & Nguyen, 2015) can be used to solve (9) by "tracking" minimum norm points of the base polytopes corresponding to individual hyperedges; for general submodular weights, Wolfe's algorithm (Wolfe, 1976) can be used. When the submodular weights have special properties, such as depending only on the cardinality of the input, there exist algorithms that operate efficiently even when |e| is extremely large (Jegelka et al., 2013).

In our experimental evaluations, we use a random coordinate descent method (RCDM) (Ene & Nguyen, 2015), which ensures an expected (1 + ε)-approximation by solving an expected number of O(|V|² |E| log(1/ε)) min-norm-point problems. Note that when performing continuous optimization, one does not need to solve the inner-loop optimization problem exactly and is allowed to exit the loop as long as the objective function value decreases. The underlying algorithm, Algorithm 3, is described in the Supplement.
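The discrete formulation (10) is a submodular function minimization over subsets of V. As a sanity check on what such a solver computes, the objective can be minimized by brute-force enumeration on a toy instance; the sketch below is ours (function names and toy data are illustrative, not from the paper), and practical instances should use the min-cut or SFM routines cited above:

```python
from itertools import combinations

def minimize_cut_objective(V, hyperedges, theta, w, lam, g):
    """Brute-force minimizer of F(S) = sum_e theta_e * w_e(S) - lam * g(S),
    mirroring problem (10).  Enumerates all 2^|V| subsets, so it is for
    illustration only; real solvers use min-cut or SFM algorithms."""
    best_S, best_val = set(), 0.0  # objective of the empty set, assuming w_e(∅) = 0
    for r in range(1, len(V) + 1):
        for S in combinations(sorted(V), r):
            S = set(S)
            val = sum(t * w(S & e, e) for t, e in zip(theta, hyperedges))
            val -= lam * sum(g[v] for v in S)
            if val < best_val:
                best_S, best_val = S, val
    return best_S, best_val

# Toy example with homogeneous weights: a hyperedge contributes 1 when cut.
w_hom = lambda S, e: 1.0 if 0 < len(S) < len(e) else 0.0
S, val = minimize_cut_objective({0, 1, 2, 3}, [{0, 1, 2}, {2, 3}],
                                [1.0, 1.0], w_hom, lam=1.0,
                                g={0: 1.0, 1: 1.0, 2: -1.0, 3: -1.0})
# → S = {0, 1}, val = -1.0
```

Here the minimizer {0, 1} cuts one hyperedge but collects all of the positive g-mass, which is exactly the trade-off the λ̂^k g^k(S) term in (10) encodes.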

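Step 3 of Algorithm 2 is tied to SFM through the defining properties of the Lovász extension, so it may help to recall how that extension is evaluated. The sketch below uses the standard sorting (greedy) formula; the function names are ours, and the toy cut function merely illustrates that the extension of a single-edge graph-cut weight recovers the total variation |x_0 − x_1|:

```python
def lovasz_extension(F, x):
    """Evaluate the Lovász extension of a submodular set function F
    (normalized so that F(empty set) = 0) at a vector x, using the
    standard formula: visit coordinates in decreasing order of x_v and
    weight each marginal gain F(S ∪ {v}) - F(S) by x_v."""
    total, S = 0.0, set()
    for v in sorted(x, key=lambda u: -x[u]):
        gain = F(S | {v}) - F(S)
        S.add(v)
        total += x[v] * gain
    return total

# Cut function of the single edge {0, 1}: weight 1 iff exactly one endpoint
# lies in S.  Its Lovász extension is the total variation |x_0 - x_1|.
edge_cut = lambda S: 1.0 if len(S & {0, 1}) == 1 else 0.0
```

For instance, lovasz_extension(edge_cut, {0: 2.0, 1: 0.5}) evaluates to 1.5 = |2.0 − 0.5|, matching the total-variation interpretation used throughout the paper.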
[Figure 1: eight panels plotting the clustering error (%) and the Cheeger constant versus α for the 20Newsgroups (news20), Mushrooms, Covertype45, and Covertype67 datasets, with curves for IPM-S, IPM-H, and CEM.]

Figure 1. Experimental clustering results for four UCI datasets, displayed in pairs of plots depicting the clustering error and the Cheeger constant versus α. Fine-tuning the parameter α may produce significant performance improvements on several datasets; for example, on the Covertype67 dataset, choosing α = 0.028 results in visible drops in both the clustering error and the Cheeger constant. Both the use of 1-Laplacians and of submodular weights may be credited for improving clustering performance.

5. Experiments

In what follows, we compare the algorithms for submodular hypergraph clustering described in the previous section to two methods: the IPM for homogeneous hypergraph clustering (Hein et al., 2013) and the clique expansion method (CEM) for submodular hypergraph clustering (Li & Milenkovic, 2017). We focus on 2-way graph partitioning problems related to the University of California Irvine (UCI) datasets selected for analysis in (Hein et al., 2013), described in Table 1 of the Supplementary Material. The datasets include 20Newsgroups, Mushrooms, and Covertype. In all datasets, ζ(E) was roughly 10³, and each of these datasets describes multiple clusters. Since we are interested in 2-way partitioning, we focused on two pairs of clusters in Covertype, denoted by (4, 5) and (6, 7), and paired the four 20Newsgroups clusters, one of which includes Comp. and Sci., and another one of which includes Rec. and Talk. The Mushrooms and 20Newsgroups datasets contain only categorical features, while Covertype also includes numerical features. We adopt the same approach as the one described in (Hein et al., 2013) to construct hyperedges: each feature corresponds to one hyperedge; hence, each categorical feature is captured by one hyperedge, while numerical features are first quantized into 10 bins of equal size and then mapped to hyperedges. To describe the submodular weights, we fix ϑ_e = 1 for all hyperedges and parametrize w_e using a variable α ∈ (0, 0.5]:

w_e(S; α) = 1/2 + (1/2) min{1, |S|/⌈α|e|⌉, |e \ S|/⌈α|e|⌉},  ∀ S ⊆ e.

The intuitive explanation behind our choice of weights is that it allows one to accommodate categorization errors and outliers: in contrast to the homogeneous case, in which any partition of a hyperedge has weight one, the chosen submodular weights allow a smaller weight to be used when the hyperedge is partitioned into small parts, i.e., when min{|S|, |e \ S|} < ⌈α|e|⌉. In practice, α is chosen to be relatively small; in all experiments, we set α ≤ 0.04, with α close to zero producing homogeneous hyperedge weights.

The results are shown in Figure 1. As may be observed, both in terms of the clustering error (i.e., the total number of erroneously classified vertices) and the values of the Cheeger constant, the IPM-based methods outperform CEM. This is due to the fact that for large hyperedge sizes, CEM incurs a high distortion when approximating the submodular weights (O(ζ(E)) (Li & Milenkovic, 2017)). Moreover, as w_e(S) depends merely on |S|, the submodular hypergraph CEM reduces to the homogeneous hypergraph CEM (Zhou et al., 2007), an issue that the IPM-based method does not face. Comparing the performance of IPM on submodular hypergraphs (IPM-S) with that on homogeneous hypergraphs (IPM-H), we see that IPM-S achieves better clustering performance on both 20Newsgroups and Covertype, and offers the same performance as IPM-H on the Mushrooms dataset. This indicates that it is practically useful to use submodular hyperedge weights for clustering purposes. A somewhat unexpected finding is that in certain cases, when α increases (and thus, when w_e decreases), the corresponding Cheeger constant increases. This may be caused by the fact that the IPM algorithm can get trapped in a local optimum.
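The parametric weight w_e(S; α) used above depends only on |S| and |e|, so it is straightforward to compute; a minimal sketch (function name ours) is:

```python
import math

def submodular_weight(S, e, alpha):
    """Hyperedge cut weight w_e(S; alpha) = 1/2 + 1/2 * min(1, |S|/t, |e\\S|/t)
    with t = ceil(alpha * |e|), following the parametrization in the text."""
    S = set(S) & set(e)            # only membership in e matters
    t = math.ceil(alpha * len(e))
    return 0.5 + 0.5 * min(1.0, len(S) / t, (len(e) - len(S)) / t)

# With |e| = 10 and alpha = 0.2, t = 2: a singleton side of the cut gets
# weight 0.75, while a balanced split saturates at the homogeneous weight 1.
```

This makes the outlier-accommodation effect concrete: cutting off a part smaller than ⌈α|e|⌉ is penalized strictly less than a balanced cut of the same hyperedge.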

References

Amghibech, S. Eigenvalues of the discrete p-Laplacian for graphs. Ars Combinatoria, 67:283–302, 2003.

Arora, Chetan, Banerjee, Subhashis, Kalra, Prem, and Maheshwari, S. N. Generic cuts: An efficient algorithm for optimal inference in higher order MRF-MAP. In Proceedings of the European Conference on Computer Vision, pp. 17–30. Springer, 2012.

Asuncion, Arthur and Newman, David. UCI machine learning repository, 2007.

Bach, Francis et al. Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373, 2013.

Benson, Austin R., Gleich, David F., and Leskovec, Jure. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.

Bıyıkoğlu, Türker, Leydold, Josef, and Stadler, Peter F. Laplacian eigenvectors of graphs. Lecture Notes in Mathematics, 1915, 2007.

Bühler, Thomas and Hein, Matthias. Spectral clustering based on the graph p-Laplacian. In Proceedings of the International Conference on Machine Learning, pp. 81–88. ACM, 2009.

Chang, Kung Ching. Spectrum of the 1-Laplacian and Cheeger's constant on graphs. Journal of Graph Theory, 81(2):167–207, 2016.

Chang, K. C., Shao, Sihong, and Zhang, Dong. Nodal domains of eigenvectors for 1-Laplacian on graphs. Advances in Mathematics, 308:529–574, 2017.

Chekuri, Chandra and Xu, Chao. Computing minimum cuts in hypergraphs. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 1085–1100. Society for Industrial and Applied Mathematics, 2017.

Chung, Fan R. K. Spectral graph theory. Number 92. American Mathematical Soc., 1997.

Davies, E. Brian, Gladwell, Graham M. L., Leydold, Josef, and Stadler, Peter F. Discrete nodal domain theorems. Linear Algebra and its Applications, 336(1-3):51–60, 2001.

Ene, Alina and Nguyen, Huy. Random coordinate descent methods for minimizing decomposable submodular functions. In Proceedings of the International Conference on Machine Learning, pp. 787–795, 2015.

Fix, Alexander, Joachims, Thorsten, Park, Sung Min, and Zabih, Ramin. Structured learning of sum-of-submodular higher order energy functions. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3104–3111. IEEE, 2013.

Gould, Sydney Henry. Variational methods for eigenvalue problems, volume 22. University of Toronto Press, 1966.

Hein, Matthias and Bühler, Thomas. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In Advances in Neural Information Processing Systems, pp. 847–855, 2010.

Hein, Matthias, Setzer, Simon, Jost, Leonardo, and Rangapuram, Syama Sundar. The total variation on hypergraphs - learning on hypergraphs revisited. In Advances in Neural Information Processing Systems, pp. 2427–2435, 2013.

Jegelka, Stefanie, Bach, Francis, and Sra, Suvrit. Reflection methods for user-friendly submodular optimization. In Advances in Neural Information Processing Systems, pp. 1313–1321, 2013.

Karger, David R. Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, volume 93, pp. 21–30, 1993.

Kolmogorov, Vladimir. Minimizing a sum of submodular functions. Discrete Applied Mathematics, 160(15):2246–2258, 2012.

Li, Pan and Milenkovic, Olgica. Inhomogeneous hypergraph clustering with applications. In Advances in Neural Information Processing Systems, pp. 2305–2315, 2017.

Li, Pan and Milenkovic, Olgica. Revisiting decomposable submodular function minimization with incidence relations. arXiv preprint arXiv:1803.03851, 2018.

Louis, Anand. Hypergraph Markov operators, eigenvalues and approximation algorithms. In Proceedings of the ACM Symposium on Theory of Computing, pp. 713–722. ACM, 2015.

Lovász, László. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pp. 235–257. Springer, 1983.

Ng, Andrew Y., Jordan, Michael I., and Weiss, Yair. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pp. 849–856, 2002.

Nishihara, Robert, Jegelka, Stefanie, and Jordan, Michael I. On the convergence rate of decomposable submodular function minimization. In Advances in Neural Information Processing Systems, pp. 640–648, 2014.

Shanu, Ishant, Arora, Chetan, and Singla, Parag. Min norm point algorithm for higher order MRF-MAP inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5365–5374, 2016.

Szlam, Arthur and Bresson, Xavier. Total variation and Cheeger cuts. In Proceedings of the International Conference on Machine Learning, pp. 1039–1046, 2010.

Tsourakakis, Charalampos E., Pachocki, Jakub, and Mitzenmacher, Michael. Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web, pp. 1451–1460. International World Wide Web Conferences Steering Committee, 2017.

Tudisco, Francesco and Hein, Matthias. A nodal domain theorem and a higher-order Cheeger inequality for the graph p-Laplacian. arXiv preprint arXiv:1602.05567, 2016.

Von Luxburg, Ulrike. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

Wolfe, Philip. Finding the nearest point in a polytope. Mathematical Programming, 11(1):128–149, 1976.

Zhang, Chenzi, Hu, Shuguang, Tang, Zhihao Gavin, and Chan, T-H. Hubert. Re-revisiting learning on hypergraphs: Confidence interval and subgradient method. In Proceedings of the International Conference on Machine Learning, volume 70, pp. 4026–4034, 2017.

Zhou, Denny, Huang, Jiayuan, and Schölkopf, Bernhard. Learning with hypergraphs: Clustering, classification, and embedding. In Advances in Neural Information Processing Systems, pp. 1601–1608, 2007.