
Community Detection in Hypergraphs: Optimal Statistical Limit and Efficient Algorithms

I (Eli) Chien (University of Illinois, Urbana-Champaign), Chung-Yi Lin (National Taiwan University), I-Hsiang Wang (National Taiwan University)

Abstract

In this paper, community detection in hypergraphs is explored. Under a generative model called the "d-wise hypergraph stochastic block model" (d-hSBM), which naturally extends the Stochastic Block Model (SBM) from graphs to d-uniform hypergraphs, the fundamental limit on the asymptotic minimax misclassified ratio is characterized. For proving the achievability, we propose a two-step polynomial time algorithm that provably achieves the fundamental limit in the sparse hypergraph regime. For proving the optimality, the lower bound of the minimax risk is set by finding a smaller parameter space which contains the most dominant error events, inspired by the analysis in the achievability part. It turns out that the minimax risk decays exponentially fast to zero as the number of nodes tends to infinity, and the rate function is a weighted combination of several divergence terms, each of which is the Rényi divergence of order 1/2 between two Bernoulli distributions. The Bernoulli distributions involved in the characterization of the rate function are those governing the random instantiation of hyperedges in d-hSBM. Experimental results on both synthetic and real-world data validate our theoretical finding.

Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain. PMLR: Volume 84. Copyright 2018 by the author(s).

1 INTRODUCTION

Community detection (clustering) has received great attention recently across many applications, including social science, biology, computer science, and machine learning, while it is usually an ill-posed problem due to the lack of ground truth. A prevalent way to circumvent the difficulty is to formulate it as an inverse problem on a graph G = {V, E}, where each node i ∈ V = [n] ≜ {1, ..., n} is assigned a community (label) σ(i) ∈ [K] ≜ {1, ..., K} that serves as the ground truth. The ground-truth community assignment σ: [n] → [K] is hidden while the graph G is revealed. Each edge in the graph models a certain kind of pairwise interaction between the two nodes. The goal of community detection is to determine σ from G, by leveraging the fact that different combinations of community relations lead to different likelihoods of edge connectivity. A canonical statistical model is the stochastic block model (SBM) (Holland et al., 1983), also known as the planted partition model (Condon and Karp, 2001), which generates randomly connected edges from a set of labeled nodes. The presence of the edges is governed by (n choose 2) independent Bernoulli random variables, and the parameter of each of them depends on the community assignments of the two nodes in the corresponding edge.

Through the lens of statistical decision theory, the fundamental statistical limits of community detection provide a way to benchmark various community detection algorithms. Under SBM, the fundamental statistical limits have been characterized recently. One line of work takes a Bayesian perspective, where the unknown labeling σ of nodes in V is assumed to be distributed according to a certain prior, the most common assumption being i.i.d. over nodes. Along this line, the fundamental limit for exact recovery is characterized (Abbe et al., 2016) in full generality, while partial recovery remains open in general. See the survey (Abbe, 2017) for more details and references therein. A second line of work takes a minimax perspective, and the goal is to characterize the minimax risk, which is typically the mismatch ratio between the true community assignment and the recovered one. In (Zhang and Zhou, 2016), a tight asymptotic characterization of the minimax risk for community detection in SBM is found. Along with these theoretical results, several algorithms have been proposed to achieve these limits, including degree-profiling comparison (Abbe and Sandon, 2015) for exact recovery, spectral MLE (Yun and Proutiere, 2015) for almost-exact recovery, and a two-step mechanism (Gao et al., 2017) under the minimax framework.

However, graphs can only capture pairwise relational information, and such a dyadic measure may be inadequate in many applications, such as the task of 3-D subspace clustering (Agarwal et al., 2005) and the higher-order graph matching problem in computer vision (Duchenne et al., 2011). Therefore, it is natural to model such beyond-pairwise interactions by hyperedges in a hypergraph and study the clustering problem in a hypergraph setting. Hypergraph partitioning has been investigated in computer science, and several algorithms have been proposed, including spectral methods based on clique expansion (Agarwal et al., 2006), the hypergraph Laplacian (Zhou et al., 2006), a tensor method (Ghoshdastidar and Dukkipati, 2015), and linear programming (Li et al., 2016), to name a few. Existing approaches, though, mainly focus on optimizing a certain score function based entirely on the connectivity of the observed hypergraph and do not view it as a statistical estimation problem.

In this paper, we investigate the community detection problem in hypergraphs through the lens of statistical decision theory. Our goal is to characterize the fundamental statistical limit and develop computationally feasible algorithms to achieve it. As for the generative model for hypergraphs, one natural extension of the SBM to the hypergraph setting is the hypergraph stochastic block model (hSBM), where the presence of an order-h hyperedge e ⊂ V (i.e. |e| = h ≤ M, the maximum edge cardinality) is governed by a Bernoulli random variable with parameter θ_e, and the presences of different hyperedges are mutually independent. Despite the success of the aforementioned algorithms on many practical datasets, it remains open how they perform under hSBM, since the fundamental limits have not been characterized and the probabilistic nature of hSBM has not been fully utilized.

The hypergraph stochastic block model is first introduced in (Ghoshdastidar and Dukkipati, 2014) as the planted partition model in random uniform hypergraphs, where each hyperedge has the same cardinality. The uniformity assumption is later relaxed in a follow-up work (Ghoshdastidar and Dukkipati, 2017), where a more general hSBM with mixed edge orders is considered. In (Angelini et al., 2015), the authors consider the sparse regime and propose a spectral method based on a generalization of the non-backtracking operator. Besides, a weak consistency condition is derived in (Ghoshdastidar and Dukkipati, 2017) for hSBM by using the hypergraph Laplacian. Departing from SBM, an extension of the censored block model to the hypergraph setting is considered in (Ahn et al., 2016), where an information theoretic limit on the sample complexity for exact recovery is characterized.

As a first step towards characterizing the fundamental limit of community detection in hypergraphs, in this work we focus on the "d-wise hypergraph stochastic block model" (d-hSBM), where all hyperedges generated in the hypergraph stochastic block model are of order d. Our main contributions are two-fold. First, we give a tight asymptotic characterization of the optimal minimax risk in d-hSBM for any d. Second, we propose a polynomial time algorithm which provably achieves the minimax risk under mild regularity conditions. Throughout the paper, the order d and the number of communities K are both treated as constants, while other parameters (the hyperedge connection probabilities) may be coupled with n.

The proposed algorithm consists of two steps. The first step is a global estimator that roughly recovers the hidden community assignment to a certain precision level, and the second step refines the estimated assignment based on the underlying probabilistic model. This refine-after-initialize concept has also been used in graph clustering (Abbe and Sandon, 2015; Yun and Proutiere, 2015; Gao et al., 2017) and ranking (Chen and Suh, 2015). The proposed algorithm performs well on both synthetic and real-world data. The experimental results validate the theoretical finding that not only is the refinement step critical in achieving the optimal statistical limit, but it is also significantly better to use hypergraphs rather than graphs for the community detection problem.

The characterized minimax risk in d-hSBM is an exponential rate, and the error exponent turns out to be a linear combination of Rényi divergences of order 1/2. Each divergence term in the sum corresponds to a pair of community relations that would be confused with one another when there is only one misclassification, and the weighting coefficient associated with it indicates the total number of such confusing patterns. Probabilistically, there may well be two or more misclassifications, with each confusing relation pair contributing a Rényi divergence when analyzing the error probability. However, we demonstrate technically that these situations are all dominated by the error event with a single misclassified node, which leaves out only the "neighboring" divergence terms in the asymptotic expression. The main technical challenge resolved in this work stems from the fact that the community relations become much more complicated as the order d increases, meaning that more error events may arise compared to the much simpler homogeneous graph SBM case. In the proof of achievability, we show that the refinement step is able to achieve the fundamental limit provided that the initialization step satisfies a certain weak consistency condition. The converse part of the minimax risk follows a standard approach in statistics by finding a smaller parameter space where we can analyze the risk.

Finally, we would like to note that an extended version of this paper (especially covering the case where K is allowed to scale with n) can be found online (Chien et al., 2018).

Notation. For a vector x = (x_1, ..., x_n) ∈ R^n and a permutation π ∈ S_n, where S_n is the symmetric group of degree n containing all permutations from [n] to itself, let x_π ≜ (x_{π(1)}, ..., x_{π(n)}) denote the permuted version of x. The adjacency tensor A defined in Section 2.2 satisfies two natural conditions that come from the hypergraph:

No self-loop: A_l ≠ 0 ⟹ |{l_1, ..., l_d}| = d.
Symmetry: A_l = A_{l_π} ∀π ∈ S_d.
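For concreteness, the two tensor conditions can be checked on a small example. The following Python sketch (ours, not from the paper; `adjacency_tensor` is an illustrative helper name) builds a d-dimensional adjacency tensor from a list of hyperedges so that both conditions hold by construction:

```python
import itertools
import numpy as np

def adjacency_tensor(n, hyperedges):
    """Build the d-dimensional adjacency tensor A from a list of
    d-element hyperedges (tuples of distinct nodes), enforcing
    no self-loops and symmetry under S_d."""
    d = len(next(iter(hyperedges)))
    A = np.zeros((n,) * d, dtype=np.int8)
    for e in hyperedges:
        assert len(set(e)) == d  # no self-loop: indices must be distinct
        for perm in itertools.permutations(e):  # symmetry: A_l = A_{l_pi}
            A[perm] = 1
    return A

A = adjacency_tensor(4, [(0, 1, 2), (1, 2, 3)])
# Symmetry: A is invariant under any permutation of the index tuple.
assert A[0, 1, 2] == A[2, 1, 0] == 1
# No self-loop: entries with repeated indices stay zero.
assert A[0, 0, 1] == 0
```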

2 PROBLEM FORMULATION

2.1 Community Relations

Let K_d ≜ {r_i} be the set of all possible community relations under d-hSBM and κ_d ≜ |K_d|. Contrary to the dichotomous situation (same community or not) concerning the appearance of an edge between two nodes in a symmetric graph SBM, there is a multitude of community relations in d-hSBM. In order not to mix them up as d increases, we use the idea of majorization to organize K_d, with each r_i represented in the form of a histogram. Specifically, the histogram operator hist(·) transforms a vector r ∈ [K]^d into its histogram vector hist(r). For convenience, we sort the histogram vector in descending order and append zeros so that hist(r) remains d-dimensional. The notion of majorization is introduced as follows. For any a, b ∈ R^d, we say that a majorizes b, written a ⪰ b, if Σ_{i=1}^{k} a_i^↓ ≥ Σ_{i=1}^{k} b_i^↓ for k = 1, ..., d and Σ_{i=1}^{d} a_i = Σ_{i=1}^{d} b_i, where the x_i^↓'s are the elements of x sorted in descending order. Observe that each relation r_i ∈ K_d can be uniquely represented by its histogram counterpart h_i. We arrange K_d in majorization (pre)order such that h_i ⪰ h_j ⇔ i < j.

Example 2.1 (K_4 in 4-hSBM). |K_4| = κ_4 = 5, with histogram vectors

Relation | Histogram | Connecting Probability
r_1 (all-same) | h_1 = (4, 0, 0, 0) | p_1
r_2 (only-1-diff) | h_2 = (3, 1, 0, 0) | p_2
r_3 | h_3 = (2, 2, 0, 0) | p_3
r_4 (only-2-same) | h_4 = (2, 1, 1, 0) | p_4
r_5 (all-diff) | h_5 = (1, 1, 1, 1) | p_5

2.2 Probabilistic Model: d-hSBM

In a d-uniform hypergraph, the adjacency relation is indicated by a d-dimensional n × ··· × n random tensor A ≜ [A_l] (the size of each dimension being n), where l = (l_1, ..., l_d) ∈ [n]^d is the access index of an element in the tensor.

In d-hSBM, A_l is a Bernoulli random variable with success probability Q_l for each l. The parameter tensor Q ≜ [Q_l] depends on the community assignment and forms a block structure. The block structure is characterized by a symmetric community connectivity d-dimensional tensor B ∈ [0, 1]^{K×···×K} with Q_l = B_{σ(l)}. Here σ(x) ≜ (σ(x_1), ..., σ(x_n)) is the community label vector assigned by σ to a node vector x = (x_1, ..., x_n). Let n_k = |{i | σ(i) = k}| be the size of the k-th community for k ∈ [K]. In addition, let p = (p_1, ..., p_{κ_d}) ∈ (0, 1)^{κ_d}, where p_i denotes the success probability of the Bernoulli random variable that corresponds to the appearance of a hyperedge with relation r_i ∈ K_d. We assume, without loss of generality, that p_i ≥ p_j ∀i < j: the more concentrated a group is, the higher the chance that its members are connected by a hyperedge. To guarantee the solvability of weak recovery in d-hSBM, we set the probability parameter p at least in the order of Ω(1/n^{d−1}). Therefore, we write p = (1/n^{d−1})(a_1, ..., a_{κ_d}) where a_i = Ω(1) ∀i = 1, ..., κ_d.

The sparse regime Ω(1/n^{d−1}) considered here is first motivated in (Lin et al., 2017). Under 3-hSBM, the authors in (Lin et al., 2017) consider p = Θ(1/n²), which is orderwise lower than the order (i.e. Θ(1/n)) required for partial recovery (Abbe and Sandon, 2015) and for the minimax risk (Zhang and Zhou, 2016) under SBM.

The parameter space that we consider is a homogeneous and approximately equal-sized case where each n_k is roughly n/K. Formally speaking (let n′ ≜ n/K),

Θ_d(n, K, p, η) ≜ { (B, σ) | σ: [n] → [K], n_k ∈ [(1 − η)n′, (1 + η)n′] ∀k ∈ [K] }

where B has the property that B_{σ(l)} = p_i if and only if hist(σ(l)) = h_i. In other words, only the histogram of the community labels within a group matters when it comes to connectivity. η is a parameter that controls how much n_k can vary. We assume the more interesting case η ≥ 1/n′, where the community sizes are not restricted to be exactly equal. Interchangeably, we write l ∼σ r_i to indicate the community relation within nodes l_1, ..., l_d under the assignment σ.
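The histogram representation and the majorization order are straightforward to compute. The following Python sketch (illustrative only; `hist` and `majorizes` are our own helper names) reproduces the five histograms of Example 2.1:

```python
from itertools import product

def hist(r, d):
    """Sorted (descending) histogram of a community-label tuple r,
    zero-padded to length d."""
    counts = sorted((r.count(k) for k in set(r)), reverse=True)
    return tuple(counts + [0] * (d - len(counts)))

def majorizes(a, b):
    """a majorizes b: every prefix sum of the descending rearrangement
    of a dominates that of b, and the totals agree."""
    a, b = sorted(a, reverse=True), sorted(b, reverse=True)
    pa = pb = 0
    for x, y in zip(a, b):
        pa, pb = pa + x, pb + y
        if pa < pb:
            return False
    return pa == pb

d, K = 4, 5
# Enumerate all community relations of 4 nodes via their histograms.
K_d = {hist(r, d) for r in product(range(K), repeat=d)}
assert sorted(K_d, reverse=True) == [
    (4, 0, 0, 0), (3, 1, 0, 0), (2, 2, 0, 0), (2, 1, 1, 0), (1, 1, 1, 1)
]  # kappa_4 = 5 relations, as in Example 2.1
assert majorizes((3, 1, 0, 0), (2, 2, 0, 0))
assert not majorizes((2, 2, 0, 0), (3, 1, 0, 0))
```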

2.3 Performance Measure

To gauge how good an estimator σ̂: G → [K]^n is, we use the mismatch ratio as the performance measure for the community detection problem. The un-permuted loss function is defined as ℓ_0(σ_1, σ_2) ≜ (1/n) d_H(σ_1, σ_2), where d_H is the Hamming distance. It directly counts the proportion of misclassified nodes between an estimator and the ground truth. To account for possible re-labeling, the mismatch ratio is defined as the loss function which maximizes the agreement between an estimator and the ground truth after an alignment by label permutation:

ℓ(σ̂, σ) ≜ min_{π ∈ S_K} ℓ_0(σ̂_π, σ).

As is conventional, we use R_σ(σ̂) ≜ E_σ ℓ(σ̂, σ) to denote the corresponding risk function. Finally, the minimax risk for the parameter space Θ_d(n, K, p, η) under d-hSBM is denoted by

R_d^* ≜ inf_{σ̂} sup_{(B,σ) ∈ Θ_d} R_σ(σ̂).

3 MAIN CONTRIBUTIONS

For the case d = 2, the asymptotic minimax risk R_2^* is characterized in (Zhang and Zhou, 2016); it decays to zero exponentially fast as n → ∞. In addition, the (negative) exponent of R_2^* is determined by n′ and the Rényi divergence of order 1/2 between two Bernoulli distributions Ber(p) and Ber(q),

I_{pq} ≜ −2 log( √(pq) + √((1 − p)(1 − q)) ).

The lower bound is established on a smaller parameter space Θ_d^L, where each community takes on only 3 possible sizes and there are exactly n′ + 1 members in the community to which the first node belongs. We pick a σ_0 in Θ_d^L and construct a new assignment σ[σ_0] based on σ_0: σ[σ_0](i) = σ_0(i) for 2 ≤ i ≤ n and σ[σ_0](1) = arg min_{2 ≤ k ≤ K} {n_k = n′}. In other words, σ[σ_0] and σ_0 only disagree on the label of the first node. For each "neighboring" relation pair (r_i, r_j) ∈ N_d, we define the weighting coefficient

m_{r_i r_j} ≜ | { l = (1, l_2, ..., l_d) | l ∼σ_0 r_i, l ∼σ[σ_0] r_j } |

as the number of r_i-edges that we would mistake as r_j-edges. Note that the above definition is independent of the choice of σ_0 ∈ Θ_d^L.

Example 3.1 (N_4 in 4-hSBM). |N_4| = 5, with elements

Relation Pair | Combinatorial Number
(r_1, r_2) | m_{r_1 r_2} ≍ (n′ choose 3)
(r_2, r_3) | m_{r_2 r_3} ≍ (n′ choose 2) n′
(r_3, r_4) | m_{r_3 r_4} ≍ n′ (K − 2) (n′ choose 2)
(r_2, r_4) | m_{r_2 r_4} ≍ (n′ choose 2) (K − 2) n′
(r_4, r_5) | m_{r_4 r_5} ≍ n′ (K − 2 choose 2) (n′)²

Note that m_{r_1 r_2} is the smallest while m_{r_4 r_5} is the largest. Here the asymptotic equality between two functions f(n) and g(n), denoted f ≍ g (as n → ∞), holds if lim_{n→∞} f(n)/g(n) = 1. The asymptotically optimal minimax risk for the parameter space Θ_d(n, K, p, η) under d-hSBM is characterized by the following theorem.

Theorem 3.1 (Main Theorem). Suppose that as n → ∞,

Σ_{i<j} m_{r_i r_j} I_{p_i p_j} → ∞,   (2)

where the sum runs over the pairs (r_i, r_j) ∈ N_d. Then

R_d^* = exp( −(1 + o(1)) Σ_{i<j} m_{r_i r_j} I_{p_i p_j} ).

It turns out that the worst-case risk of our proposed algorithm achieves this limit.
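Both the mismatch ratio and the Rényi divergence of order 1/2 are simple to evaluate numerically. Below is a minimal Python sketch (ours, not the paper's code; it brute-forces the K! label permutations, which is feasible since K is treated as a constant):

```python
from itertools import permutations
from math import log, sqrt

def renyi_half(p, q):
    """Rényi divergence of order 1/2 between Ber(p) and Ber(q):
    I_pq = -2 log( sqrt(p q) + sqrt((1-p)(1-q)) )."""
    return -2 * log(sqrt(p * q) + sqrt((1 - p) * (1 - q)))

def mismatch_ratio(sigma_hat, sigma, K):
    """l(sigma_hat, sigma): minimum over label permutations pi of the
    fraction of nodes where pi(sigma_hat) disagrees with sigma."""
    n = len(sigma)
    return min(
        sum(pi[a] != b for a, b in zip(sigma_hat, sigma)) / n
        for pi in permutations(range(K))
    )

# A globally relabeled estimate has zero mismatch ratio ...
assert mismatch_ratio([1, 1, 0, 0], [0, 0, 1, 1], K=2) == 0.0
# ... while one flipped node out of four gives 1/4.
assert mismatch_ratio([1, 1, 0, 1], [0, 0, 1, 1], K=2) == 0.25
assert renyi_half(0.5, 0.5) == 0.0  # the divergence vanishes when p = q
```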

4 TWO-STEP ALGORITHM

4.1 Refinement Step

The refinement step (Algorithm 1) comprises two major parts. First, for each node u ∈ [n], we generate an estimated assignment σ̂_u of all nodes except u by applying an initialization algorithm Alg_init on the sub-hypergraph without the node u. The sub-hypergraph is represented by A_{−u}, which is the (n−1) × ··· × (n−1) sub-tensor of A with the u-th coordinate removed in each dimension. Then, the label of u under σ̂_u is determined by maximizing a local likelihood function. Specifically, let us start with the global likelihood function defined as follows. Let

L(σ; A) ≜ Σ_{i | r_i ∈ K_d} Σ_{l | l ∼σ r_i} [ A_l log p_i + (1 − A_l) log(1 − p_i) ]

denote the log-likelihood of an adjacency tensor A when the hidden community structure is determined by σ. For each u ∈ [n], we use

L_u(σ, k; A) ≜ Σ_{i | r_i ∈ K_d} Σ_{l | l_1 = u, l ∼σ r_i} [ A_l log p_i + (1 − A_l) log(1 − p_i) ]

to denote those likelihood terms pertaining to the u-th node when its label is k. Since A is symmetric, we can assume without loss of generality that the first index in l is u. Based on the estimated assignment of the other n − 1 nodes, we make use of the following local maximum likelihood estimation,

σ̂(u) ≜ arg max_{k ∈ [K]} L_u(σ, k; A),

to predict the label of u. While the parameter B that governs the underlying random hypergraph model is unknown when evaluating the likelihood, we use L̂(σ; A) and L̂_u(σ_u, k_u; A) to denote the global and local likelihood functions with the true B replaced by its estimated counterparts B̂ and B̂^u, respectively. The superscript u indicates that the estimate B̂^u is calculated with node u taken out. The final step of the refinement algorithm is to form a consensus through majority neighbor voting. The consensus step seeks a jointly agreed community assignment among the n estimated assignments {σ̂_u : u ∈ [n]}, since they would all be close to the ground truth up to some permutation.

Algorithm 1: Refinement Scheme
Input: Adjacency tensor A ∈ {0,1}^{n×···×n}, number of communities K, initialization algorithm Alg_init.
Local MLE: for u = 1 to n do
    Apply Alg_init on A_{−u} to obtain σ̂_u(v) ∀v ≠ u.
    Estimate the entries of B by the sample mean B̂^u.
    Assign the label of node u by
        σ̂_u(u) = arg max_{k ∈ [K]} L̂_u(σ̂_u, k; A)   (3)
end
Consensus: Define σ̂(1) = σ̂_1(1). For u = 2, ..., n, define
    σ̂(u) = arg max_{k ∈ [K]} | {v | σ̂_1(v) = k} ∩ {v | σ̂_u(v) = σ̂_u(u)} |   (4)
Output: Community assignment σ̂.

4.2 Spectral Clustering

To devise a good initialization algorithm Alg_init, we develop a hypergraph version of unnormalized spectral clustering (Von Luxburg, 2007) with regularization (Chin et al., 2015). In particular, a modified version of the hypergraph Laplacian described below is employed. Let H = [H_ve] be the |V| × |E| incidence matrix, where each entry H_ve indicates whether or not node v belongs to hyperedge e. Note that the incidence matrix H contains the same amount of information as the adjacency tensor A. Let d_u denote the degree of the u-th node, and d̄ the average degree across the hypergraph. The unnormalized hypergraph Laplacian is defined as

L(A) ≜ H H^T − D   (5)

where D = diag(d_1, ..., d_n) is a diagonal matrix representing the degrees in the hypergraph with adjacency tensor A, and (·)^T is the usual matrix transpose. Note that L can be thought of as an encoding of the higher-dimensional connectivity relationship into a two-dimensional matrix.

Before directly applying the spectral method, high-degree nodes in the hypergraph are first trimmed to ensure the performance of the clustering algorithm. Specifically, we use A_τ to denote the modification of A where all coordinates pertinent to the set {u ∈ [n] | d_u ≥ τ} are replaced with all-zero vectors. Let H_τ and D_τ be the corresponding incidence matrix and degree matrix of A_τ. The spectrum we are looking for is that of the trimmed version of L, denoted as T_τ(L(A)) ≜ H_τ H_τ^T − D_τ, where the operator T_τ(·) is the trimming process with degree threshold τ. Let

SVD_K(T_τ(L(A))) ≜ Û = [u_1^T ··· u_n^T]^T ∈ R^{n×K}
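A minimal Python sketch of the Laplacian construction in (5) together with the degree trimming T_τ(·) (our own illustrative function names; a dense SVD stands in for the approximate one used in practice):

```python
import numpy as np

def hypergraph_laplacian(n, hyperedges, tau=None):
    """Unnormalized hypergraph Laplacian L = H H^T - D, optionally
    trimming nodes of degree >= tau before forming L (a simple
    sketch of T_tau: the rows of trimmed nodes are zeroed out)."""
    H = np.zeros((n, len(hyperedges)))
    for j, e in enumerate(hyperedges):
        H[list(e), j] = 1.0            # incidence: node v lies in hyperedge e
    if tau is not None:
        deg = H.sum(axis=1)
        H[deg >= tau, :] = 0.0         # zero out high-degree nodes
    D = np.diag(H.sum(axis=1))         # degree matrix of the (trimmed) hypergraph
    return H @ H.T - D

def spectral_embedding(L, K):
    """Rows of the n-by-K matrix of the K leading singular vectors,
    plus the K-th largest singular value lambda_K."""
    U, s, _ = np.linalg.svd(L)
    return U[:, :K], s[K - 1]

L = hypergraph_laplacian(4, [(0, 1, 2), (1, 2, 3)])
assert np.allclose(L, L.T)             # L is symmetric
assert np.allclose(np.diag(L), 0)      # (H H^T)_{vv} equals the degree d_v
U, lam_K = spectral_embedding(L, K=2)
assert U.shape == (4, 2)
```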

denote the matrix whose columns are the K leading singular vectors generated from the singular value decomposition of the trimmed matrix T_τ(L(A)). Also, let λ_K be its K-th largest singular value. Note that, as in a conventional spectral clustering method, each node is represented by a reduced K-dimensional row vector. The hypergraph spectral clustering algorithm is described in Algorithm 2.

Algorithm 2: Spectral Initialization
Input: Vectors SVD_K(T_τ(L(A))) = [u_1^T ··· u_n^T]^T, number of communities K, critical radius r = µ λ_K √(K/n), tuning parameter µ > 0.
Set S = [n].
for k = 1 to K do
    Let t_k = arg max_{i ∈ S} | {j ∈ S : ||u_j − u_i||_2 < r} |.
    Set Ĉ_k = {j ∈ S : ||u_j − u_{t_k}||_2 < r}.
    Label σ̂(i) = k ∀i ∈ Ĉ_k.
    Update S ← S \ Ĉ_k.
end

4.3 Time Complexity

Algorithm 2 has a time complexity of O(n³), the bottleneck being the SVD_K step. Still, the computation of the SVD can be done approximately in O(n² log n) time with high probability (Yun and Proutiere, 2015) if we are only interested in the first K singular vectors. As for the refinement scheme, the sparsity of the underlying hypergraph can be utilized to reduce the complexity, since the whole network structure can be stored in the incidence matrix H equivalently to the d-dimensional adjacency tensor A. As a result, the parameter estimation stage only requires O(dm), where m = |E| is the total number of hyperedges realized. Similarly, the time complexities are O(Kdm) for the calculation of the likelihood function and O(Kn²) for the consensus step. Hence, the overall complexity of Algorithm 1 and Algorithm 2 combined is O(n³ log n + nKdm + Kn²) for a constant order d. It further reduces to O(n³ log n) in the sparse regime p = O(log n / n^{d−1}), where m = O(n log n) with high probability.

5 THEORETICAL GUARANTEES

We first consider the theoretical guarantee for Algorithm 1, which requires that the first-step algorithm satisfy the following condition.

Condition 5.1. There exist constants C_0, δ > 0 and a positive sequence γ = γ_n such that

inf_{(B,σ) ∈ Θ_d} min_{u ∈ [n]} P_σ { ℓ(σ̂_u, σ) ≤ γ_n } ≥ 1 − C_0 n^{−(1+δ)}.

5.1 Refinement Step

We have the following upper bound on the risk attained by the refinement scheme, which serves as the achievability part of the minimax risk.

Theorem 5.1. As n → ∞, if

Σ_{i<j} m_{r_i r_j} I_{p_i p_j} → ∞   (6)

and Condition 5.1 is satisfied with

γ = o(1),   (7)

then, with Algorithm 1, there exists a positive sequence ζ_n → 0 as n → ∞ such that

R_d^* ≤ exp( −(1 − ζ_n) Σ_{i<j} m_{r_i r_j} I_{p_i p_j} ).

Lemma 5.1. Suppose that as n → ∞, m_{r_i r_j} I_{p_i p_j} → ∞ for each (r_i, r_j) ∈ N_d, and Condition 5.1 holds with γ satisfying (7) for some δ > 0. Then there exist a sequence ζ_n → 0 as n → ∞ and a constant C > 0 such that

inf_{(B,σ) ∈ Θ_d} min_{u ∈ [n]} P_σ { min_{π ∈ S_K} max_{s ∈ [K]^d} | B̂^u_s − B_{s_π} | ≤ ζ_n max_{(r_i,r_j) ∈ N_d} (p_i − p_j) } ≥ 1 − C n^{−(1+δ)}.

Based on Lemma 5.1, the next lemma shows that the local MLE method achieves a risk that decays exponentially fast.

Lemma 5.2. Suppose that as n → ∞, m_{r_i r_j} I_{p_i p_j} → ∞ for each (r_i, r_j) ∈ N_d. If there are two sequences γ = o(1) and ζ′_n = o(1), constants C, δ > 0, and permutations {π_u}_{u=1}^n ⊂ S_K such that

inf_{(B,σ) ∈ Θ_d} min_{u ∈ [n]} P_σ { ℓ_0((σ̂_u)_{π_u}, σ) ≤ γ, max_{s ∈ [K]^d} | B̂^u_s − B_{s_{π_u}} | ≤ ζ′_n max_{(r_i,r_j) ∈ N_d} (p_i − p_j) } ≥ 1 − C n^{−(1+δ)},

then for the local estimator σ̂_u(u) in (3), there exists a sequence ζ″_n = o(1) such that

sup_{(B,σ) ∈ Θ_d} max_{u ∈ [n]} P_σ { (σ̂_u(u))_{π_u} ≠ σ(u) } ≤ (K − 1) · exp( −(1 − ζ″_n) Σ_{i<j} m_{r_i r_j} I_{p_i p_j} ) + C n^{−(1+δ)}.
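The greedy ball-covering loop of Algorithm 2 above can be sketched as follows (our own illustrative implementation, not the paper's code; the critical radius r is passed in as a parameter, and any rows left uncovered simply keep the default label):

```python
import numpy as np

def greedy_cluster(U, K, r):
    """Greedy ball-covering from Algorithm 2: repeatedly pick the row
    whose r-ball covers the most remaining rows, and label that ball
    as one community."""
    n = U.shape[0]
    labels = np.zeros(n, dtype=int)
    S = list(range(n))
    for k in range(K):
        if not S:
            break
        # t_k: the remaining row with the most remaining neighbors within r
        counts = [sum(np.linalg.norm(U[j] - U[i]) < r for j in S) for i in S]
        t_k = S[int(np.argmax(counts))]
        ball = [j for j in S if np.linalg.norm(U[j] - U[t_k]) < r]
        for j in ball:
            labels[j] = k
        S = [j for j in S if j not in ball]
    return labels

# Two well-separated point clouds are recovered exactly.
U = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = greedy_cluster(U, K=2, r=1.0)
assert labels[0] == labels[1] and labels[2] == labels[3]
assert labels[0] != labels[2]
```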

Finally, we justify the usage of (4) as a consensus majority voting.

Lemma 5.3 (Lemma 4 in (Gao et al., 2017)). Consider any community assignments σ, σ′: [n] → [K] such that, for some constant C ≥ 1,

min_{k ∈ [K]} | {u | σ(u) = k} |, min_{k ∈ [K]} | {u | σ′(u) = k} | ≥ n / (CK)

and

min_{π ∈ S_K} ℓ_0(σ′_π, σ) < 1 / (CK).

Define the map ξ: [K] → [K] as

ξ(i) = arg max_{k ∈ [K]} | {u | σ(u) = k} ∩ {u | σ′(u) = i} |

for each i ∈ [K]. Then ξ ∈ S_K and ℓ_0(σ′_ξ, σ) is equal to min_{π ∈ S_K} ℓ_0(σ′_π, σ).

We are now ready to sketch the proof of Theorem 5.1.

Sketch of Proof of Theorem 5.1. First, we use the union bound to upper bound the risk as follows:

E_σ ℓ_0(σ̂, σ) ≤ (1/n) Σ_{u ∈ [n]} P_σ { (σ̂_u(u))_{π_u} ≠ σ(u) } + P_σ { π^CSS ≠ π_u },

where π^CSS denotes the correct permutation recovered by the consensus step. Compared to the analysis in graph SBM (Gao et al., 2017), we are dealing with more kinds of community relations and thus more kinds of random variables. Such complexity makes the generalization from a homogeneous SBM, with only two possible community relations, more difficult to analyze.

5.2 Spectral Clustering

Next, we show that our proposed spectral clustering algorithm achieves Condition 5.1. We have the following performance guarantee for Algorithm 2.

Theorem 5.2. Suppose

K a_1 / λ_K² ≤ C_1   (8)

for some sufficiently small constant C_1 ∈ (0, 1), where p_1 = a_1 / n^{d−1}. Apply Algorithm 2 with a sufficiently small constant µ > 0 and τ = C_2 d̄ for some sufficiently large constant C_2. Then for any constant C′ > 0, there exists some C > 0 depending only on C′, C_2 and µ such that

ℓ(σ̂, σ) ≤ C a_1 / λ_K²

with probability at least 1 − n^{−C′}.

Our technical contribution here is to generalize the analysis of the graph Laplacian to the hypergraph Laplacian (5) associated with a hypergraph. This is not a trivial task because the entries in a hypergraph Laplacian are no longer independent. Still, we successfully arrive at an expression similar to the lower bound under the d-hSBM model.

Combining Algorithm 1 and Algorithm 2, we have the following achievability counterpart to Theorem 3.1. The key step is to further lower bound λ_K in Theorem 5.2 and demonstrate that Algorithm 2 satisfies Condition 5.1.

Theorem 5.3. Suppose Σ_{i<j} m_{r_i r_j} I_{p_i p_j} → ∞. It turns out that Condition 5.1 is satisfied with Algorithm 2, and hence the risk bound of Theorem 5.1 holds with Algorithm 2 as the initialization.

Compared with (Ghoshdastidar and Dukkipati, 2017), we allow the observed hypergraph to be sparser (a lower connecting probability), at the price of a higher mismatch ratio. However, if we raise the probability p to the same order as considered in (Ghoshdastidar and Dukkipati, 2017), Algorithm 1 is guaranteed to have an (n′)²-times lower risk. In either case, we always have a success probability converging to 1 faster.
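The alignment map ξ of Lemma 5.3, which also underlies the consensus rule (4), can be sketched in a few lines (our own helper, not the paper's code):

```python
import numpy as np

def align_labels(sigma_ref, sigma_u, K):
    """The map xi from Lemma 5.3: send each label i of sigma_u to the
    label of sigma_ref with which it has the largest overlap."""
    xi = {}
    for i in range(K):
        members = [v for v, c in enumerate(sigma_u) if c == i]
        overlaps = [sum(sigma_ref[v] == k for v in members) for k in range(K)]
        xi[i] = int(np.argmax(overlaps))
    return xi

sigma_1 = [0, 0, 1, 1, 2, 2]   # reference assignment (e.g. sigma_hat_1)
sigma_u = [2, 2, 0, 0, 1, 1]   # the same partition, with permuted labels
xi = align_labels(sigma_1, sigma_u, K=3)
assert [xi[c] for c in sigma_u] == sigma_1  # xi undoes the permutation
```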

6 EXPERIMENTAL RESULTS

The advantage of clustering with a hypergraph representation over traditional graph-based approaches has been reported in the literature (Agarwal et al., 2005; Zhou et al., 2006). Here, we present a comparative study of our two-step algorithm with existing clustering methods on hypergraphs, especially the spectral non-uniform hypergraph partitioning algorithm (SnHP) in (Ghoshdastidar and Dukkipati, 2017), which uses the hypergraph Laplacian proposed in (Zhou et al., 2006), and the generalized tensor spectral method (GTS) in (Ghoshdastidar and Dukkipati, 2015). In order to have a fair comparison, we run the different algorithms on the same generated hypergraphs and calculate the corresponding mismatch ratio as the performance measure. We will not elaborate on how to choose or define a proper embedding, as the topic itself would require a whole line of research and is beyond the scope of this paper. In what follows, Algo 2 refers to our first-step spectral clustering algorithm and Algo 1 refers to the combined two-step workflow (i.e. Algorithm 1 on top of Algorithm 2).

[Figure 1: Simulation results on 3-hSBM. Mismatch ratio versus the nodes-per-community parameter n′ (from 20 to 100) for SnHP, Algo 2, and Algo 1; panel (a) K = 3, panel (b) K = 6.]

6.1 Synthetic Data

We implement the different algorithms on generative 3-hSBM data. The parameter spaces considered are homogeneous and exactly equal-sized, which means that each community has the same number of members. The nodes-per-community parameter n′ scales from 20, 30, ..., to 100, while the number of communities K varies from 2, 3, ..., to 10. We set the connecting probability parameter p to (60, 30, 10) · log n / n² for each possible value of n = K · n′. Note that the order of p is as prescribed for the sparse regime in Section 5. The choice of this particular triplet ensures that the generated hypergraphs are not too sparse; empirically, the total number of realized hyperedges is roughly 4n log n to 5n log n. The performance under each scenario, i.e. each pair (K, n′), is averaged over 25 realizations of the random hypergraph model, which is large enough for the mismatch ratio to converge for all algorithms implemented. Figure 1 summarizes our simulation results.

6.2 Real-World Data

The data analyzed in this work are obtained from the UCI repository (Lichman, 2013), which is widely used as a benchmark database admitting ground-truth community labels. To perform clustering on a hypergraph, we first embed the entities with various attributes into a hypergraph. One caveat is that the embedded hypergraphs are no longer homogeneous or approximately equal-sized, as assumed when deriving the theoretical guarantees. Nevertheless, the experimental results show that our two-step algorithm has performance comparable to or even better than existing methods on either graph or hypergraph models. More thorough experimental results are given in the supplementary material.
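The synthetic setup above can be reproduced with a short generator. The following Python sketch (ours; function and parameter names are illustrative) samples a 3-hSBM instance with exactly equal-sized communities and p = (60, 30, 10) · log n / n²:

```python
import itertools
import math
import random

def sample_3hsbm(n_prime, K, coeffs=(60, 30, 10), seed=0):
    """Sample a 3-uniform hypergraph from the homogeneous 3-hSBM of
    Section 6.1: the three connecting probabilities (all-same,
    only-2-same, all-diff) are coeffs * log(n) / n^2."""
    rng = random.Random(seed)
    n = K * n_prime
    sigma = [v // n_prime for v in range(n)]      # exactly equal-sized communities
    p = [c * math.log(n) / n**2 for c in coeffs]
    edges = []
    for e in itertools.combinations(range(n), 3):
        labels = {sigma[v] for v in e}
        i = len(labels) - 1   # 1 community -> p[0], 2 -> p[1], 3 -> p[2]
        if rng.random() < p[i]:
            edges.append(e)
    return sigma, edges

sigma, edges = sample_3hsbm(n_prime=20, K=3)
assert len(sigma) == 60
assert all(len(set(e)) == 3 for e in edges)   # 3-uniform, no self-loops
```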
As for the results in Figure 1: except for the first few scenarios where the total number of nodes n is quite small, we can see that Algo 2 performs roughly as well as the SnHP algorithm. This indicates that the weak consistency condition (Condition 5.1) can also be satisfied with the hypergraph Laplacian proposed by (Zhou et al., 2006) as the first step. Furthermore, the refinement scheme indeed outperforms the spectral clustering methods. Observe that the improvement due to the second step becomes larger as K (and hence n) increases; the performance gain should be even more evident for larger networks.

References

Abbe, E. and C. Sandon (2015). "Community Detection in General Stochastic Block Models: Fundamental Limits and Efficient Algorithms for Recovery". In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pp. 670–688.

Abbe, Emmanuel (2017). "Community detection and stochastic block models: recent developments". In: CoRR abs/1703.10146.

Abbe, Emmanuel, Afonso S. Bandeira, and Georgina Hall (2016). "Exact recovery in the stochastic block model". In: IEEE Transactions on Information Theory 62.1, pp. 471–487.

Agarwal, S. et al. (2005). "Beyond pairwise clustering". In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 2, pp. 838–845.

Agarwal, Sameer, Kristin Branson, and Serge Belongie (2006). "Higher Order Learning with Graphs". In: Proceedings of the International Conference on Machine Learning (ICML '06), pp. 17–24.

Ahn, Kwangjun, Kangwook Lee, and Changho Suh (2016). "Community Recovery in Hypergraphs". In: Allerton Conference on Communication, Control and Computing. UIUC.

Alon, Noga and Joel H. Spencer (2004). The Probabilistic Method. John Wiley & Sons.

Andritsos, Periklis et al. (2004). "LIMBO: Scalable clustering of categorical data". In: International Conference on Extending Database Technology. Springer, pp. 123–146.

Angelini, Maria Chiara et al. (2015). "Spectral detection on sparse hypergraphs". In: Communication, Control, and Computing (Allerton), 2015 53rd Annual Allerton Conference on. IEEE, pp. 66–73.

Chen, Yuxin and Changho Suh (2015). "Spectral MLE: Top-K Rank Aggregation from Pairwise Comparisons". In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pp. 371–380.

Chien, I (Eli), Chung-Yi Lin, and I-Hsiang Wang (2018). "On the Minimax Misclassification Ratio of Hypergraph Community Detection". In: ArXiv e-prints. arXiv: 1802.00926.

Chin, Peter, Anup Rao, and Van Vu (2015). "Stochastic Block Model and Community Detection in Sparse Graphs: A spectral algorithm with optimal rate of recovery". In: COLT, pp. 391–423.

Condon, Anne and Richard M. Karp (2001). "Algorithms for Graph Partitioning on the Planted Partition Model". In: Random Structures and Algorithms 18.2, pp. 116–140.

Davis, Chandler and William Morton Kahan (1970). "The rotation of eigenvectors by a perturbation. III". In: SIAM Journal on Numerical Analysis 7.1, pp. 1–46.

Duchenne, O. et al. (2011). "A Tensor-Based Algorithm for High-Order Graph Matching". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 33.12, pp. 2383–2395.

Gao, Chao et al. (2017). "Achieving Optimal Misclassification Proportion in Stochastic Block Models". In: Journal of Machine Learning Research 18.60, pp. 1–45.

Ghoshdastidar, Debarghya and Ambedkar Dukkipati (2014). "Consistency of Spectral Partitioning of Uniform Hypergraphs under Planted Partition Model". In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, pp. 397–405.

— (2015). "A Provable Generalized Tensor Spectral Method for Uniform Hypergraph Partitioning". In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pp. 400–409.

— (2017). "Consistency of spectral hypergraph partitioning under planted partition model". In: Ann. Statist. 45.1, pp. 289–315.

Holland, Paul W., Kathryn Blackmond Laskey, and Samuel Leinhardt (1983). "Stochastic Blockmodels: First Steps". In: Social Networks 5.2, pp. 109–137.

Lei, Jing and Alessandro Rinaldo (2015). "Consistency of spectral clustering in stochastic block models". In: The Annals of Statistics 43.1, pp. 215–237.

Li, Pan et al. (2016). "Motif Clustering and Overlapping Clustering for Social Network Analysis". In: arXiv preprint arXiv:1612.00895.

Lichman, M. (2013). UCI Machine Learning Repository. url: http://archive.ics.uci.edu/ml.

Lin, C. Y., I. E. Chien, and I. H. Wang (2017). "On the fundamental statistical limit of community detection in random hypergraphs". In: 2017 IEEE International Symposium on Information Theory (ISIT), pp. 2178–2182.

Rozovsky, L. V. (2003). "A lower bound of large-deviation probabilities for the sample mean under the Cramér condition". In: Journal of Mathematical Sciences 118.6, pp. 5624–5634.

Von Luxburg, Ulrike (2007). "A tutorial on spectral clustering". In: Statistics and Computing 17.4, pp. 395–416.

Yun, Se-Young and Alexandre Proutiere (2015). "Optimal Cluster Recovery in the Labeled Stochastic Block Model". In: ArXiv e-prints. arXiv: 1510.05956.

Zhang, Anderson Y. and Harrison H. Zhou (2016). "Minimax rates of community detection in stochastic block models". In: Ann. Statist. 44.5, pp. 2252–2280.

Zhou, Dengyong, Jiayuan Huang, and Bernhard Schölkopf (2006). "Learning with hypergraphs: Clustering, classification, and embedding". In: NIPS. Vol. 19, pp. 1633–1640.