Hypergraph Clustering Based on PageRank

Yuuki Takai∗ (RIKEN AIP, Tokyo, Japan), Atsushi Miyauchi∗ (University of Tokyo, Tokyo, Japan), Masahiro Ikeda (RIKEN AIP, Tokyo, Japan), Yuichi Yoshida (National Institute of Informatics, Tokyo, Japan)

arXiv:2006.08302v1 [cs.DS] 15 Jun 2020

ABSTRACT
A hypergraph is a useful combinatorial object to model ternary or higher-order relations among entities. Clustering hypergraphs is a fundamental task in network analysis. In this study, we develop two clustering algorithms based on personalized PageRank on hypergraphs. The first one is local in the sense that its goal is to find a tightly connected vertex set with a bounded volume including a specified vertex. The second one is global in the sense that its goal is to find a tightly connected vertex set. For both algorithms, we discuss theoretical guarantees on the conductance of the output vertex set. Also, we experimentally demonstrate that our clustering algorithms outperform existing methods in terms of both the solution quality and running time. To the best of our knowledge, ours are the first practical algorithms for hypergraphs with theoretical guarantees on the conductance of the output set.

CCS CONCEPTS
• Information systems → Web mining; • Mathematics of computing → Spectra of graphs.

KEYWORDS
hypergraph, clustering, personalized PageRank, Laplacian

1 INTRODUCTION
Graph clustering is a fundamental task in network analysis, where the goal is to find a tightly connected vertex set in a graph. It has been used in many applications such as community detection [12] and image segmentation [11]. Because of its importance, graph clustering has been intensively studied, and a plethora of clustering methods have been proposed. See, e.g., [12] for a survey.

There is no single best criterion to measure the quality of a vertex set as a cluster; however, one of the most widely used is the notion of conductance. Roughly speaking, the conductance of a vertex set is small if many edges reside within the vertex set, whereas only a few edges leave it (see Section 3 for details). Many clustering methods have been shown to have theoretical guarantees on the conductance of the output vertex set [3–6, 10, 14, 18, 20, 30].

Although a graph is suitable to represent binary relations over entities, many relations, such as coauthorship, goods that are purchased together, and chemicals involved in metabolic reactions, inherently comprise three or more entities. A hypergraph is a more natural object to model such relations. Indeed, if we represent a coauthorship network by a graph, an edge only indicates the existence of a co-authored paper, and information about the set of co-authors of a single paper is lost. Hypergraphs, in contrast, do not lose that information.

As with graphs, hypergraph clustering is also an important task, and it is natural to use an extension of conductance for hypergraphs [8, 32] to measure the quality of a vertex set as a cluster. Although several methods have been shown to have theoretical guarantees on the conductance of the output vertex set [8, 16, 32], none of them are efficient in practice.

To obtain efficient hypergraph clustering algorithms with a theoretical guarantee, we utilize the graph clustering algorithm developed by Andersen et al. [5]. Their algorithm is based on personalized PageRank (PPR), which is the stationary distribution of a random walk that jumps back to a specified vertex with a certain probability in each step of the walk. Roughly speaking, their algorithm first computes the PPR from a specified vertex, and then returns a set of vertices with high PPR values. They showed a theoretical guarantee on the conductance of the returned vertex set.

In this work, we propose two hypergraph clustering algorithms by extending Andersen et al.'s algorithm [5]. The first one is local in the sense that its goal is to find a vertex set of a small conductance and a bounded volume including a specified vertex. The second one is global in the sense that its goal is to find a vertex set of the smallest conductance. Although there could be various definitions of random walks on hypergraphs and corresponding PPRs, we adopt the PPR recently introduced by Li and Milenkovic [22] using a hypergraph Laplacian [8, 32]. Because the hypergraph Laplacian well captures the cut information of hypergraphs, the corresponding PPR can be used to find vertex sets with small conductance.

Because we can efficiently compute (a good approximation to) the PPR, our algorithms are efficient. Moreover, as with Andersen et al.'s algorithm [5], our algorithms have theoretical guarantees on the conductance of the output vertex set. To the best of our knowledge, no local clustering algorithm with a theoretical guarantee on the conductance of the output set was previously known for hypergraphs. Although, as mentioned previously, several algorithms have been proposed for global clustering on hypergraphs [8, 16, 32], we stress here that our global clustering algorithm is the first practical algorithm with a theoretical guarantee.
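Andersen et al.'s restart-based walk is easy to state concretely for ordinary graphs. As a toy illustration (the function name and data layout are ours, not the paper's), the PPR can be approximated by repeatedly applying one step of the lazy random walk with restart:

```python
# Minimal sketch (not the paper's hypergraph algorithm): personalized PageRank
# on an ordinary weighted graph, computed by iterating the fixed-point relation
#   pr = alpha * s + (1 - alpha) * W pr,
# where W is the lazy random-walk transition matrix. Names are illustrative.

def lazy_walk_ppr(adj, s, alpha, iters=200):
    """adj: dict u -> {v: weight}; s: seed distribution dict; returns a dict."""
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    pr = {u: s.get(u, 0.0) for u in adj}
    for _ in range(iters):
        nxt = {u: alpha * s.get(u, 0.0) for u in adj}
        for u, nbrs in adj.items():
            # lazy walk: stay with probability 1/2, move with probability w(uv)/(2 d_u)
            nxt[u] += (1 - alpha) * 0.5 * pr[u]
            for v, w in nbrs.items():
                nxt[v] += (1 - alpha) * 0.5 * (w / deg[u]) * pr[u]
        pr = nxt
    return pr
```

Since each iteration contracts the error by a factor of 1 − α, a few hundred iterations suffice on small examples; the mass of the returned vector stays 1 whenever the seed is a distribution.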

∗Both authors contributed equally to this research.

We experimentally demonstrate that our clustering algorithms outperform other baseline methods in terms of both the solution quality and running time.

Our contribution can be summarized as follows.
• We propose local and global hypergraph clustering algorithms based on the PPR for hypergraphs given by Li and Milenkovic [22].
• We give theoretical guarantees on the conductance of the vertex set output by our algorithms.
• Our algorithms are more efficient than existing algorithms with theoretical guarantees [8, 16, 32] (see Section 2 for the details).
• We experimentally demonstrate that our clustering algorithms outperform other baseline methods in terms of both the solution quality and running time.

This paper is organized as follows. We discuss related work in Section 2 and then introduce notions used throughout the paper in Section 3. After discussing basic properties of the PPR for hypergraphs in Section 4, we introduce a key lemma (Lemma 4) that connects conductance and PPR in Section 5. On the basis of the key lemma, we show our local and global clustering algorithms in Sections 6 and 7, respectively. We discuss our experimental results in Section 8, and conclude in Section 9. In the appendix, we give the proofs that could not be included in the main body because of the page restrictions.

2 RELATED WORK
As previously mentioned, even if we restrict our attention to graph clustering methods with theoretical guarantees on the conductance of the output vertex set, a plethora of methods have been proposed [3–6, 10, 14, 18, 20, 30]. Some of them run in time nearly linear in the size of the output vertex set, i.e., we do not have to read the entire graph.

Although several hypergraph clustering algorithms have been proposed [2, 21, 23, 29, 34], most of them are merely heuristics and have no guarantees on the conductance of the output vertex set. Indeed, we are not aware of any local clustering algorithms with a theoretical guarantee, and to the best of our knowledge, there are only two global clustering algorithms with theoretical guarantees.

The first one [8, 32] is based on Cheeger's inequality for hypergraphs. Given a hypergraph H = (V, E, w), it first computes an approximation x ∈ R^V to the second eigenvector of the normalized Laplacian 𝓛_H (see Section 3.3 for details), and then returns a sweep cut, i.e., a vertex set of the form {v ∈ V : x(v) > τ} for some τ ∈ R. We can guarantee that the output vertex set has conductance roughly O(√(ϕ_H log n)), where ϕ_H is the conductance of the hypergraph (see Section 3.2). However, to run the algorithm, we need to approximately compute the second eigenvalue of the normalized Laplacian, which requires solving an SDP; hence, it is impractical.

The second algorithm was developed by Ikeda et al. [16]. Their algorithm is based on simulating the heat equation

  dρ_t/dt ∈ −𝓛_H(ρ_t),  ρ_0 = χ_v,

where χ_v ∈ R^V is a vector with χ_v(u) = 1 if u = v and 0 otherwise. Then, they showed that one of the sweep cuts of the form {v ∈ V : ρ_t(v) > τ}, for some τ ∈ R and t ≥ 0, has conductance roughly O(√ϕ_H), under a mild condition. A drawback of their algorithm is that we need to calculate sweep cuts induced by ρ_t for every t ≥ 0, which cannot be efficiently implemented on Turing machines. Although our algorithm also computes sweep cuts, we need only consider one specific vector, i.e., the PPR.

If we do not need any theoretical guarantee on the quality of the output vertex set, we can think of a heuristic in which we first construct a graph from the input hypergraph H = (V, E, w) and then apply a known clustering algorithm for graphs on the resulting graph, e.g., Andersen et al.'s algorithm [5]. There are two popular methods for constructing graphs:

Clique expansion [2, 34]: For each hyperedge e ∈ E and each u, v ∈ e with u ≠ v, we add an edge uv of weight w(e) or w(e)/|e|².
Star expansion [35]: For each hyperedge e ∈ E, we introduce a new vertex v_e, and then for each vertex u ∈ e, we add an edge uv_e of weight w(e) or w(e)/|e|.

The obvious drawback of the clique expansion is that it introduces Θ(k²) edges for a hyperedge of size k, and hence the resulting graph is huge. The relation between these expansions and various Laplacians defined for hypergraphs [7, 28, 33] was discussed in [1].

Ghoshdastidar and Dukkipati [15] proposed a spectral method utilizing a tensor defined from the input hypergraph, and analyzed its performance when the hypergraph is generated from a planted partition model or one of its variants. A drawback of their method is that the hypergraph must be uniform, i.e., each hyperedge must be of the same size. Chien et al. [9] proposed another statistical method that also assumes uniformity of the hypergraph, and analyzed its performance on a stochastic block model.

Other iterative procedures for clustering hypergraphs have also been proposed [21, 23, 29]. However, these methods do not have any theoretical guarantee on the quality of the output vertex set.

3 PRELIMINARIES
We say that a vector x ∈ R^V is a distribution if Σ_{v∈V} x(v) = 1 and x(v) ≥ 0 for every v ∈ V. For a vector x ∈ R^V and a set S ⊆ V, we define x(S) = Σ_{v∈S} x(v).

We often use the symbols n and m to denote the number of vertices and the number of (hyper)edges of the (hyper)graph we are concerned with, which should be clear from the context.

3.1 Personalized PageRank
Let G = (V, E, w) be a weighted graph, where w : E → R₊ is a weight function over edges. For convenience, we define w(uv) = 0 when uv ∉ E. The degree of a vertex u ∈ V, denoted by d_u, is Σ_{e∈E : u∈e} w(e). We assume d_u > 0 for any u ∈ V. The degree matrix D_G ∈ R^{V×V} and the adjacency matrix A_G ∈ R^{V×V} are defined as D_G(u, u) = d_u and D_G(u, v) = 0 for u ≠ v, and A_G(u, v) = w(uv). The lazy random walk is a stochastic process that stays at the current vertex u ∈ V with probability 1/2 and moves to a neighbor v ∈ V with probability (1/2) · w(uv)/d_u. The transition matrix of the lazy random walk can be written as W_G = (I + A_G D_G^{−1})/2.

For a graph G = (V, E, w), a seed distribution s ∈ R^V, and a parameter α ∈ (0, 1], the personalized PageRank (PPR) is a vector pr_α(s) ∈ R^V [17] defined as the stationary distribution of a random walk that, with probability α, jumps to a vertex v ∈ V with probability s(v), and with the remaining probability 1 − α, moves as the lazy random walk.
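The two graph constructions mentioned above can be sketched in a few lines. This is an illustrative toy (data layout and names are ours); it uses the weighting w(e) for the clique expansion and w(e)/|e| for the star expansion:

```python
# Sketch of the clique and star expansions of a hypergraph. Hyperedges are
# frozensets mapped to weights; the returned graphs are edge->weight dicts.
# Illustrative only; the paper also mentions alternative weightings.

from itertools import combinations

def clique_expansion(hyperedges):
    """hyperedges: dict e (frozenset) -> weight. Theta(|e|^2) edges per hyperedge."""
    g = {}
    for e, w in hyperedges.items():
        for u, v in combinations(sorted(e), 2):
            key = frozenset((u, v))
            g[key] = g.get(key, 0.0) + w        # weight w(e) per vertex pair
    return g

def star_expansion(hyperedges):
    """Adds one fresh vertex ('star', i) per hyperedge; O(|e|) edges per hyperedge."""
    g = {}
    for i, (e, w) in enumerate(hyperedges.items()):
        center = ('star', i)
        for u in e:
            g[frozenset((u, center))] = w / len(e)   # weight w(e)/|e|
    return g
```

On a single hyperedge of size k, the clique expansion produces k(k−1)/2 edges while the star expansion produces only k, which is exactly the size blow-up discussed above.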

We can rephrase pr_α(s) ∈ R^V as the solution to the following equation:

  pr_α(s) = α s + (1 − α) W_G pr_α(s).

Using this equation, we can also define the PPR for any vector s ∈ R^V.

For later convenience, we rephrase the PPR in terms of Laplacians. For a graph G = (V, E, w), its Laplacian L_G ∈ R^{V×V} and (random-walk) normalized Laplacian 𝓛_G ∈ R^{V×V} are defined as L_G = D_G − A_G and 𝓛_G = L_G D_G^{−1} = I − A_G D_G^{−1}, respectively. Then, we have W_G = I − 𝓛_G/2. It is easy to check that the PPR pr_α(s) satisfies the equation

  (I + ((1 − α)/(2α)) 𝓛_G) pr_α(s) = s.  (1)

3.2 Hypergraphs
In this section, we introduce several notions for hypergraphs. We note that these notions can also be used for graphs, because a hypergraph is a generalization of a graph.

Let H = (V, E, w) be a (weighted) hypergraph, where w : E → R₊ is a weight function over hyperedges. The degree of a vertex u ∈ V, denoted by d_u, is Σ_{e∈E : u∈e} w(e). We assume d_u > 0 for any u ∈ V. The degree matrix D_H ∈ R^{V×V} is defined as D_H(u, u) = d_u and D_H(u, v) = 0 for u ≠ v. Let S ⊆ V be a vertex set and S̄ = V \ S. We define the set of boundary hyperedges ∂S by ∂S = {e ∈ E | e ∩ S ≠ ∅ and e ∩ S̄ ≠ ∅}. We define S° as the interior of S, that is, v ∈ S° if and only if v ∈ S and v ∉ e for any e ∈ ∂S. We define cut(S) as the total weight of hyperedges intersecting both S and S̄, that is, cut(S) = Σ_{e∈∂S} w(e). We say that H is connected if cut(S) > 0 for every ∅ ⊊ S ⊊ V.

Next, we define the volume of a vertex set S as vol(S) = Σ_{v∈S} d_v. Then, we define the conductance of S as

  ϕ_H(S) = cut(S) / min{vol(S), vol(V \ S)}.

We can say that a smaller ϕ_H(S) implies that the vertex set S is a better cluster. The conductance of H is ϕ_H = min_{∅⊊S⊊V} ϕ_H(S).

Finally, for S ⊆ V, we define the distribution π_S ∈ R^V as π_S(v) = d_v / vol(S) if v ∈ S, and π_S(v) = 0 otherwise. We define χ_v as the indicator vector for v ∈ V, i.e., χ_v = π_{{v}}.

3.3 Personalized PageRank for hypergraphs
In this section, we explain the notion of PPR for hypergraphs recently introduced by Li and Milenkovic [22].

We start by defining the Laplacian for hypergraphs and then use (1) to define the PPR. For a hypergraph H = (V, E, w), the (hypergraph) Laplacian L_H : R^V → 2^{R^V} of H is defined as

  L_H(x) = { Σ_{e∈E} w(e) b_e b_e^⊤ x | b_e ∈ argmax_{b∈B_e} b^⊤ x } ⊆ R^V,  (2)

where B_e for e ∈ E is the convex hull of the vectors {χ_u − χ_v | u, v ∈ e} [8, 31, 32]. Note that L_H is no longer a matrix, as opposed to the Laplacian of graphs. We then define the normalized Laplacian 𝓛_H : R^V → 2^{R^V} as x ↦ L_H(D_H^{−1} x). We also define A_H : R^V → 2^{R^V} as A_H = D_H − L_H, or more formally, x ↦ {D_H x − y | y ∈ L_H(x)}, and W_H : R^V → 2^{R^V} as x ↦ (x + A_H(D_H^{−1} x))/2. Note that when H is a graph, each of L_H(x), 𝓛_H(x), A_H(x), and W_H(x) defined above is a singleton consisting of the vector obtained by applying the corresponding matrix to x.

To understand how the Laplacian L_H acts on x when H is not a graph, suppose that the entries of the vector x ∈ R^V are pairwise distinct. Then for each hyperedge e ∈ E, we have b_e = χ_{s_e} − χ_{t_e}, where s_e = argmax_{v∈e} x(v) and t_e = argmin_{v∈e} x(v). Then, we construct a graph H_x = (V, E_x, w_x) on the vertex set V by, for each hyperedge e ∈ E, adding the edge s_e t_e of weight w(e) and the self-loop of weight w(e) at v for each v ∈ e \ {s_e, t_e}. Note that the degree of a vertex is unchanged between H and H_x. Then, L_H(x) = {L_{H_x} x}, where L_{H_x} is the Laplacian of the graph H_x. An example of the construction of H_x is shown in Figure 1.

[Figure 1: An example of a hypergraph H and the graph H_x. The values next to vertices represent a vector x ∈ R^V.]

For general x ∈ R^V, there could be many choices for b_e (e ∈ E) in (2). For any choice of b_e ∈ argmax_{b∈B_e} b^⊤ x for each e ∈ E, we can define a graph H_x = (V, E_x, w_x) so that L_{H_x} = Σ_{e∈E} w(e) b_e b_e^⊤. We remark that w_x can naturally output negative values. However, it is easy to see that there exists a graph H̃_x = (V, Ẽ_x, w̃_x) such that w̃_x is non-negative and L_{H̃_x} x = L_{H_x} x. This H̃_x is the same as the graph introduced in [8]. In the following, we assume that when we choose a graph H_x, the weight function w_x is non-negative.

Following (1), we define the PPR pr_α(s) of H as a solution (if it exists) to

  (I + ((1 − α)/(2α)) 𝓛_H)(x) ∋ s.  (3)

We can observe that pr_α(s) = s when α = 1 and pr_α(s) → π_V when α → 0. In Section 4, we show that (3) indeed has a solution and that it is unique for any s ∈ R^V and α ∈ (0, 1].

3.4 Sweep cuts
For a hypergraph H = (V, E, w) and a vector x ∈ R^V, we say that a set of the form {v ∈ V | x(v)/d_v > τ} for some τ ∈ R is a sweep cut induced by x.

To discuss local clustering, we consider the minimum conductance of a sweep cut with a restriction on the volume. First, we order the vertices v_1, v_2, ..., v_n in V so that

  x(v_1)/d_{v_1} ≥ x(v_2)/d_{v_2} ≥ ··· ≥ x(v_n)/d_{v_n}.

Then, we can represent sweep cuts explicitly as S_j^x = {v_1, v_2, ..., v_j} (j = 1, 2, ..., n).
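The collapsed graph H_x described in Section 3.3 is straightforward to build when the entries of x are pairwise distinct. The following toy sketch (names and data layout are ours) keeps, for each hyperedge, one edge between its argmax and argmin under x, and self-loops elsewhere so that degrees are preserved:

```python
# Sketch of the collapsed graph H_x: for each hyperedge, connect the vertices
# attaining max and min of x (ties broken arbitrarily by max/min), and give
# every other member a self-loop of the same weight. Illustrative only.

def collapse_graph(hyperedges, x):
    """hyperedges: dict e (frozenset) -> weight; x: dict v -> value.
    Returns (edges, loops): {(s_e, t_e): w} and {v: self-loop weight}."""
    edges, loops = {}, {}
    for e, w in hyperedges.items():
        se = max(e, key=lambda v: x[v])   # s_e = argmax_{v in e} x(v)
        te = min(e, key=lambda v: x[v])   # t_e = argmin_{v in e} x(v)
        edges[(se, te)] = edges.get((se, te), 0.0) + w
        for v in e - {se, te}:
            loops[v] = loops.get(v, 0.0) + w   # preserves the degree of v
    return edges, loops
```

For a single hyperedge {a, b, c} with x(a) > x(b) > x(c), this yields one edge ac and a self-loop at b, mirroring the Figure 1 construction.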

We set S_0^x = ∅ for convenience. For µ ∈ (0, 1/2], let ℓ_µ be the unique integer such that vol(S_{ℓ_µ−1}^x) < µ · vol(V) ≤ vol(S_{ℓ_µ}^x). Then, we define ϕ_H^µ(x) as ϕ_H^µ(x) = min{ϕ_H(S_j^x) | j = 1, 2, ..., ℓ_µ}.

4 PERSONALIZED PAGERANK FOR HYPERGRAPHS
In this section, we discuss basic properties and the computational aspects of the PPR for hypergraphs. The proofs of the theorem and propositions in this section are given in Appendix A.

4.1 Basic properties
Here we introduce some properties of the PPR. We first argue that (3) has a solution and that it is unique for any s ∈ R^V and α ∈ (0, 1].

Theorem 1. For any s ∈ R^V and α ∈ (0, 1], there exists a unique solution to (3), and hence the PPR pr_α(s) is well-defined. Moreover, if s is a distribution, then so is pr_α(s).

We next investigate the continuity of pr_α(s) with respect to α.

Proposition 2. For any s ∈ R^V, the PPR pr_α(s) is continuous with respect to α ∈ (0, 1].

The following proposition shows that the value of the PPR at each vertex can be bounded by the value of the distribution π_V.

Proposition 3. Let H = (V, E, w) be a connected hypergraph. Then for any α ∈ (0, 1] and a vertex u ∈ V, we have

  pr_α(χ_u)(v) ≥ π_V(u) if v = u,  and  pr_α(χ_u)(v) ≤ π_V(v) if v ≠ u.

4.2 Computation
It is not difficult to show that we can compute the PPR in polynomial time by directly solving (3) using the techniques developed in [13]. However, their algorithm involves the ellipsoid method, which is inefficient in practice. Hence, we consider a different approach here.

Lemma 15 in Appendix A indicates that pr_α(s) is a stationary solution to the following differential equation:

  dρ_t/dt ∈ β(s − ρ_t) − (1 − β) 𝓛_H(ρ_t),  (4)

where β := 2α/(1 + α). From the fact that 𝓛_H is a maximal monotone operator [16], it is easy to show that the operator

  x ↦ β(s − x) − (1 − β) 𝓛_H(x)  (5)

is also maximally monotone. This implies that (4) has a unique solution for any given initial state ρ_0. Moreover, if the hypergraph H is connected, then the solution ρ_t of (4) converges to the stationary solution pr_α(s) as t tends to infinity. Hence, we can compute the PPR by simulating (4).

It is not difficult to show that we can simulate (4) on a Turing machine with a certain error guarantee by time discretization using the techniques in [16]. In our experiments (Section 8), however, we simply simulated it using the Euler method for practical efficiency.

We remark that simulating (4) by the Euler method is equivalent to running the gradient descent algorithm on the convex function

  Q(x̃) = (β/2) ‖x̃ − s̃‖² + (1 − β) x̃^⊤ 𝓛̃_H(x̃),

where s̃ = D^{−1/2} s and 𝓛̃_H = D^{−1/2} L_H D^{−1/2}. To see this, note that the operator (5) can be rewritten as the operator x ↦ −D^{1/2} ∂Q(x̃), where x̃ = D^{−1/2} x and ∂Q(x̃) is the sub-differential of Q with respect to the variable x̃. Then, the differential inclusion (4) can be written as

  dρ̃_t/dt ∈ −∂Q(ρ̃_t)  (6)

for ρ̃_t = D^{−1/2} ρ_t, and by [27, Theorem 6.9], the Euler sequence {ρ̃_{t_i}}_{i=1}^∞ for (6) converges to an optimal solution ρ̃_∞, i.e., a solution satisfying 0 ∈ ∂Q(ρ̃_∞), under the assumption Σ_i ‖ρ̃_{t_{i+1}} − ρ̃_{t_i}‖² < ∞. Then, ρ_∞ = D^{1/2} ρ̃_∞ is nothing but the PPR pr_α(s).

5 PERSONALIZED PAGERANK AND CONDUCTANCE
In this section, we introduce the following lemma, which extends the result in [5] to hypergraphs and to the local setting. We prove this lemma in Appendix B.

Lemma 4. Let s ∈ R^V be a distribution and µ ∈ (0, 1/2]. If there is a vertex set S ⊆ V and a constant δ ≥ 4/√vol(V) such that vol(S)/vol(V) ≤ µ and pr_α(s)(S) − π_V(S) > δ, then we have

  ϕ_H^µ(pr_α(s)) < √(24 α log(4/δ) / δ).  (7)

This lemma indicates that if the PPR pr_α(s) is far from the stationary distribution π_V on some vertex set S with a small volume, then there is a sweep cut induced by pr_α(s) with small volume and conductance. Thus, we can reduce the clustering problem to showing the existence of a vertex set of a large PPR mass.

6 LOCAL CLUSTERING
For a hypergraph H = (V, E, w), v ∈ V, and µ ∈ (0, 1/2], we define

  ϕ_{H,v}^µ = min{ ϕ_H(C) | ∅ ⊊ C ⊊ V, v ∈ C, vol(C) ≤ µ · vol(V) }.  (8)

In the local clustering problem, given a hypergraph H = (V, E, w) and a vertex v ∈ V, the goal is to find a vertex set containing v whose volume is bounded by µ · vol(V) with a conductance close to ϕ_{H,v}^µ. Let C_v* be an arbitrary minimizer of (8). We define w_min = min_{e∈E} {w(e)} and w_max = max_{e∈E} {w(e)}. In this section, we show the following:

Theorem 5. Let H = (V, E, w) be a hypergraph, v ∈ V be a vertex with v ∈ (C_v*)°, and µ ∈ (0, 1/2]. Our algorithm (Algorithm 1) returns a vertex set S ⊆ V with v ∈ S, vol(S) ≤ µ · vol(V) + max_{v∈V} {d_v}, and ϕ_H(S) = O(√(ϕ_{H,v}^µ)) in O((T_H + Σ_{e∈E} |e|) log(w_max Σ_{e∈E} |e| / w_min)) time, where T_H is the time to compute the PPR on H.

We stress here again that Theorem 5 gives the first local clustering algorithm with a theoretical guarantee on the conductance. If we use the Euler method to compute the PPR, then we have T_H = O(Σ_{e∈E} |e| / ∆), where ∆ is the time span for one step. Hence, the total time complexity is O(Σ_{e∈E} |e| log(w_max Σ_{e∈E} |e| / w_min) / ∆). We also remark that we can relax the assumption v ∈ (C_v*)° if we admit using the values of the PPR in the assumption.
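The Euler simulation of the diffusion (4) can be sketched by combining the hyperedge-collapsing rule of Section 3.3 with a plain forward-Euler step. This is a rough toy illustration of the idea, not the authors' implementation; the step size, iteration count, and tie-breaking are ad hoc choices of ours:

```python
# Rough Euler-method sketch of d rho/dt = beta (s - rho) - (1 - beta) L_H(D^-1 rho):
# at each step, each hyperedge is collapsed onto its current argmax/argmin of
# the normalized vector y = D^-1 rho, and the resulting graph Laplacian is applied.

def hypergraph_ppr_euler(hyperedges, deg, s, alpha, dt=0.01, steps=2000):
    beta = 2 * alpha / (1 + alpha)
    rho = {v: s.get(v, 0.0) for v in deg}
    for _ in range(steps):
        y = {v: rho[v] / deg[v] for v in deg}      # D^{-1} rho
        grad = {v: 0.0 for v in deg}               # accumulates L_H(D^{-1} rho)
        for e, w in hyperedges.items():
            se = max(e, key=lambda v: y[v])        # argmax of y within e
            te = min(e, key=lambda v: y[v])        # argmin of y within e
            d = w * (y[se] - y[te])                # flow along the collapsed edge
            grad[se] += d
            grad[te] -= d
        for v in deg:
            rho[v] += dt * (beta * (s.get(v, 0.0) - rho[v]) - (1 - beta) * grad[v])
    return rho
```

Because the Laplacian term conserves total mass and the restart term pulls the total toward 1, a seed distribution stays a (near-)distribution throughout the simulation.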

Algorithm 1: Local clustering
1 Procedure LocalClustering(H, v, µ)
  Input: A hypergraph H = (V, E, w), a vertex v ∈ V, and µ ∈ (0, 1/2]
  Output: A vertex set S ⊆ V
2   A_cand ← { w_min (1 + ϵ)^i / (w_max Σ_{e∈E} |e|) | i ∈ Z₊ } ∩ [0, 1] for a constant ϵ ∈ (0, 1);
3   for each α ∈ A_cand do Compute pr_α(χ_v);
4   return argmin{ ϕ_H(S) : S ∈ ∪_{α∈A_cand} sweep^µ(pr_α(χ_v)) }.

6.1 Algorithm
We describe our algorithm for local clustering. Given a hypergraph H = (V, E, w) and a vertex v ∈ V, we first construct a set A_cand = { w_min (1 + ϵ)^i / (w_max Σ_{e∈E} |e|) | i ∈ Z₊ } ∩ [0, 1] for a constant ϵ ∈ (0, 1). As the minimum conductance of a set in a connected hypergraph is lower-bounded by w_min / (w_max Σ_{e∈E} |e|), we can guarantee that there exists α ∈ A_cand such that α ≤ ϕ_{H,v}^µ ≤ (1 + ϵ)α. Then for each α ∈ A_cand, we compute the PPR pr_α(χ_v). Finally, we return the set S with the minimum conductance in

  ∪_{α∈A_cand} sweep^µ(pr_α(χ_v)),

where sweep^µ(x) = {S_1^x, ..., S_{ℓ_µ}^x}. Proposition 3 implies that the normalized PPR value pr_α(χ_v)(v)/d_v takes its maximum value at v; hence the returned set S includes the specified vertex v. The pseudocode of our local clustering algorithm is given in Algorithm 1.

6.2 Leak of personalized PageRank
As we want to apply Lemma 4 on C_v*, we need to bound the amount of the leak of the PPR from C_v*. To this end, we start by showing the following upper bound.

Lemma 6. Let C ⊆ V be a vertex set, w ∈ C, and H_p = (V, E_p, w_p) be the graph defined for p = pr_α(χ_w) as shown in Section 3.3. Then for any α ∈ (0, 1], we have

  p(C̄) ≤ ((1 − α)/(2α)) Σ_{u∈C} Σ_{v∈C̄} (w_p(uv)/d_u) p(u).

Proof. By the definition of H_p, the PPR p satisfies

  p(C̄) = α χ_w(C̄) + (1 − α) W_{H_p}(p)(C̄)
       = (1 − α) p(C̄) + ((1 − α)/2) (A_{H_p} D_{H_p}^{−1} − I)(p)(C̄).

Here, the second equality follows from χ_w(C̄) = 0 and W_{H_p} = (I + A_{H_p} D_{H_p}^{−1})/2. Then, we have

  p(C̄) = ((1 − α)/(2α)) (A_{H_p} D_{H_p}^{−1} − I)(p)(C̄).  (9)

For a subset S ⊆ V, we define two sets of ordered pairs of vertices: in(S) = {(u, v) ∈ V × S} and out(S) = {(u, v) ∈ S × V}. For an ordered pair (u, v) ∈ V × V, we set p(u, v) := (w_p(uv)/d_u) p(u). Then, the term (A_{H_p} D_{H_p}^{−1} − I)(p)(C̄) can be rewritten as

  (A_{H_p} D_{H_p}^{−1} − I)(p)(C̄)
   = Σ_{u∈C̄} Σ_{v∈V} (w_p(uv)/d_v) p(v) − Σ_{u∈C̄} p(u)
   = Σ_{u∈C̄} Σ_{v∈V} (w_p(uv)/d_v) p(v) − Σ_{u∈C̄} Σ_{v∈V} (w_p(uv)/d_u) p(u)
   = Σ_{(u,v)∈in(C̄)\out(C̄)} p(u, v) − Σ_{(u,v)∈out(C̄)\in(C̄)} p(u, v)
   ≤ Σ_{(u,v)∈in(C̄)\out(C̄)} p(u, v) = Σ_{u∈C} Σ_{v∈C̄} (w_p(uv)/d_u) p(u).  (10)

The claim follows by combining (9) and (10). □

Next, we show that the amount of the leak of the PPR from a vertex set can be bounded using its conductance.

Lemma 7. Let C ⊆ V be a vertex set with vol(C) ≤ vol(V)/2. Then, for any α ∈ (0, 1] and a vertex v ∈ C°, we have

  pr_α(χ_v)(C̄) ≤ (1/(4α)) ϕ_H(C).

Proof. Let p = pr_α(χ_v). By Lemma 6, we have

  (2α/(1 − α)) p(C̄) ≤ Σ_{u′∈C} Σ_{v′∈C̄} (w_p(u′v′)/d_{u′}) p(u′).  (11)

As v ∈ C, by Proposition 3, the RHS of (11) is at most

  Σ_{u′∈C} Σ_{v′∈C̄} (w_p(u′v′)/d_{u′}) · (d_{u′}/vol(V)) + Σ_{v′∈C̄} (w_p(vv′)/d_v) · (p(v) − d_v/vol(V)).

From the assumption v ∈ C°, w_p(vv′) is zero for any v′ ∈ C̄. Hence, we have

  p(C̄) ≤ ((1 − α)/(2α)) Σ_{u′∈C} Σ_{v′∈C̄} w_p(u′v′)/vol(V)
       ≤ ((1 − α)/(2α)) ϕ_H(C) · vol(C)/vol(V)
       ≤ ((1 − α)/(4α)) ϕ_H(C) ≤ (1/(4α)) ϕ_H(C).

The second inequality follows from

  Σ_{u′∈C} Σ_{v′∈C̄} w_p(u′v′) ≤ Σ_{e∈∂C} w(e).  (12)

□

6.3 Proof of Theorem 5
First, we discuss the conductance of the output vertex set.

Lemma 8. Let µ ∈ (0, 1/2]. Suppose v ∈ (C_v*)°. Then for any ϵ ∈ (0, 1) and α ∈ (0, 1] with α ≤ ϕ_H(C_v*) ≤ (1 + ϵ)α, we have

  ϕ_H^µ(pr_α(χ_v)) < (8/√(3 − ϵ − 4µ)) √(6 ϕ_H(C_v*) log(2/(3 − ϵ − 4µ))).

Proof. Let C = C_v* and p = pr_α(χ_v). To apply Lemma 4, we want to lower-bound p(C) − π_V(C). By the fact that the PPR p is a distribution by Theorem 1, and by Lemma 7 for α ≥ ϕ_H(C)/(1 + ϵ), we have p(C) − π_V(C) = 1 − p(C̄) − π_V(C) ≥ 1 − (1 + ϵ)/4 − µ = (3 − ϵ − 4µ)/4.
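The sweep-cut step used by Algorithm 1 can be sketched directly from the definitions in Section 3.4: sort vertices by x(v)/d_v, scan prefixes S_j up to the volume budget, and keep the one with the smallest conductance. This brute-force toy (names are ours) recomputes the cut from scratch at each prefix for clarity:

```python
# Sketch of the volume-restricted sweep cut: returns the best prefix among
# S_1, ..., S_{l_mu} under the ordering x(v)/d(v), together with its conductance.
# Illustrative and O(n * |E|); a real implementation would update cuts incrementally.

def sweep_cut(hyperedges, deg, x, mu=0.5):
    total = sum(deg.values())
    order = sorted(deg, key=lambda v: x.get(v, 0.0) / deg[v], reverse=True)
    best, best_phi = None, float('inf')
    prefix, vol = set(), 0.0
    for v in order[:-1]:                  # proper subsets S_1, ..., S_{n-1}
        prefix.add(v)
        vol += deg[v]
        cut = sum(w for e, w in hyperedges.items()
                  if e & prefix and e - prefix)      # boundary hyperedges of the prefix
        phi = cut / min(vol, total - vol)
        if phi < best_phi:
            best, best_phi = set(prefix), phi
        if vol >= mu * total:             # reached the volume budget (index l_mu)
            break
    return best, best_phi
```

Feeding it a (normalized) PPR vector in place of x reproduces the inner loop of Algorithm 1 for one value of α.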

(3 − ϵ − 4µ)/4. Applying Lemma 4 with C = C_v*, δ = (3 − ϵ − 4µ)/4, and α ≤ ϕ_H(C), we obtain

  ϕ_H^µ(p) < (2/√(3 − ϵ − 4µ)) · √(6 ϕ_H(C_v*) log(8/(3 − ϵ − 4µ))). □

Proof of Theorem 5. The guarantee on conductance holds by Lemma 8, as some α ∈ A_cand satisfies the condition of Lemma 8 with ϵ ∈ (0, 1).

Now, we analyze the time complexity. As |A_cand| = O(log(w_max Σ_{e∈E} |e| / w_min)), it takes O(T log(w_max Σ_{e∈E} |e| / w_min)) time to compute all the PPRs. Computing the conductance values of the sweep cuts induced by a particular vector takes O(Σ_{e∈E} |e|) time. Hence, Line 4 takes O(Σ_{e∈E} |e| · log(w_max Σ_{e∈E} |e| / w_min)) time. Combining those, we obtain the claimed time complexity. □

7 GLOBAL CLUSTERING

In the global clustering problem, given a hypergraph H, the goal is to find a vertex set with conductance close to ϕ_H. In this section, we show the following:

Theorem 9. Let H = (V, E, w) be a hypergraph. Our algorithm (Algorithm 2) returns a vertex set S ⊆ V with ϕ_H(S) = O(√ϕ_H) in O((T_H + Σ_{e∈E} |e|) n log(w_max Σ_{e∈E} |e| / w_min)) time, where T_H is the time to compute PPR on H.

7.1 Algorithm

Our global clustering algorithm is quite simple. That is, given a hypergraph H = (V, E, w), we simply call LocalClustering(H, v, 1/2) for every v ∈ V, and then return the best of the returned sets in terms of conductance. The pseudocode is given in Algorithm 2.

Algorithm 2: Global clustering
  Input : A hypergraph H = (V, E, w)
  Output: A vertex set S ⊆ V
1 Procedure GlobalClustering(H)
2   for v ∈ V do S_v ← LocalClustering(H, v, 1/2);
3   return argmin{ϕ_H(S) : S ∈ {S_v : v ∈ V}};

7.2 Leak of personalized PageRank

Let C* be an arbitrary set of conductance ϕ_H. As we want to apply Lemma 4 for µ = 1/2 on C*, we need to bound the amount of the leak of PPR from C*. The following lemma is a counterpart to Lemma 7 for the global setting.

Lemma 10. Suppose that α ∈ (0, 1] and C ⊆ V satisfy

  (Σ_{w∈C} π_C(w) pr_α(χ_w))(v) ≤ π_C(v)   (13)

for every v ∈ C∖C◦. Then we have

  Σ_{w∈C} π_C(w) pr_α(χ_w)(C̄) ≤ ϕ_H(C)/(2α).

The assumption (13) means that the expected PPR of a boundary vertex v ∈ C∖C◦ is at most its probability mass in the distribution π_C. We note that (13) always holds as an equality when H is a graph. We discuss two sufficient conditions for this assumption in Lemmas 13 and 14 in Section 7.4. These conditions show that the assumption of Lemma 10 is quite mild.

Proof of Lemma 10. We set p_w = pr_α(χ_w) and let w_0 ∈ C be a vertex that maximizes Σ_{u∈C} Σ_{v∈C̄} w_{p_w}(uv) over w ∈ C. Then, by Lemma 6, we have

  ((A_{H_{p_w}} D_{H_{p_w}}^{-1} − I)(p_w))(C̄) ≤ Σ_{u∈C} Σ_{v∈C̄} (w_{p_{w_0}}(uv)/d_u) p_w(u).   (14)

By taking the average of the inequality (14) using the distribution π_C, we obtain the following inequality:

  Σ_{w∈C} π_C(w) ((A_{H_{p_w}} D_{H_{p_w}}^{-1} − I)(p_w))(C̄) ≤ Σ_{u∈C} Σ_{v∈C̄} (w_{p_{w_0}}(uv)/d_u) (Σ_{w∈C} π_C(w) p_w(u)).   (15)

Now, we have

  Σ_{w∈C} π_C(w) p_w(C̄) = Σ_{w∈C} π_C(w) · ((1 − α)/(2α)) ((A_{H_{p_w}} D_{H_{p_w}}^{-1} − I) p_w)(C̄)   (by (9))
    ≤ ((1 − α)/(2α)) Σ_{u∈C} Σ_{v∈C̄} (w_{p_{w_0}}(uv)/d_u) (Σ_{w∈C} π_C(w) p_w(u))   (by (15))
    ≤ ((1 − α)/(2α)) Σ_{u∈C} Σ_{v∈C̄} (w_{p_{w_0}}(uv)/d_u) · (d_u/vol(C))   (by the assumption)
    ≤ ((1 − α)/(2α)) ϕ_H(C) ≤ ϕ_H(C)/(2α).   (by (12)) □

7.3 Proof of Theorem 9

For a subset C ⊆ V and α ∈ (0, 1], we define a subset C_α ⊆ C as

  C_α = { v ∈ C : pr_α(χ_v)(C̄) ≤ ϕ_H(C)/α }.

We here show the following, from which Theorem 9 easily follows.

Theorem 11. Let C* ⊆ V be a set with ϕ_H = ϕ_H(C*) and assume vol(C*) ≤ vol(V)/2. Suppose that α ≤ 10ϕ_H ≤ (1 + ϵ)α and the condition (13) hold. Then, for any v ∈ (C*)_α, we have

  ϕ_H^{1/2}(pr_α(χ_v)) < (20/√(4 − ϵ)) · √(3 ϕ_H log(40/(4 − ϵ))).

Theorem 11 follows from Lemma 4 for µ = 1/2 and the following lemma. The proof of Lemma 12 is similar to the proof of Theorem 5.1 in [5] with Lemma 10. The details are in Appendix C.

Lemma 12. Suppose a set C ⊆ V and α > 0 satisfy the condition (13). Then, vol(C_α) ≥ vol(C)/2 holds.

Proof of Theorem 11. By instantiating Lemma 12 with C = C* and α ≤ 10ϕ_H ≤ (1 + ϵ)α, we have

  pr_α(χ_v)(C*) ≥ 1 − (1 + ϵ)/10 = (9 − ϵ)/10

for v ∈ (C*)_α. Because vol(C*) ≤ vol(V)/2,

  pr_α(χ_v)(C*) − π_V(C*) ≥ (9 − ϵ)/10 − 1/2 = (4 − ϵ)/10.

Instantiating Lemma 4 with µ = 1/2, δ = (4 − ϵ)/10, and α ≤ 10ϕ_H, we obtain

  ϕ_H^{1/2}(pr_α(χ_v)) < (20/√(4 − ϵ)) · √(3 ϕ_H log(40/(4 − ϵ))). □

Proof of Theorem 9. The bound on conductance follows from Theorem 11. The analysis of the time complexity is trivial. □

Table 1: Real-world hypergraphs used in our experiments.

  Name                        n        m       Σ_{v∈V} d_v / n   Σ_{e∈E} |e| / m
  Graph products              86       106     3.57              2.04
  Network theory              330      299     3.06              3.04
  DBLP KDD                    5,590    2,719   1.99              4.00
  arXiv cond-mat              13,861   13,571  3.87              3.05
  DBpedia Writers             54,909   18,232  1.80              5.29
  YouTube Group Memberships   88,490   21,974  3.24              12.91
  DBpedia Record Labels       158,385  9,827   1.40              22.51
  DBpedia Genre               253,968  4,934   1.80              92.84

7.4 Sufficient conditions

Here we discuss two useful sufficient conditions for the assumption (13) in Lemma 10. We give the proofs in Appendix D.

Lemma 13. Let α ∈ (0, 1] and C ⊆ V be a vertex set with vol(C) ≤ vol(V)/2. If pr_α(χ_v)(v) ≤ 1/2 holds for every v ∈ V, then the condition (13) holds for any v ∈ C∖C◦.

Figure 2: Effect of parameters ∆ and T. Average values over 50 initial vertices are plotted. Each panel plots ϕ_H(S) against T for ∆ ∈ {0.5, 1.0, 2.0}: (a) DBpedia Record Labels; (b) DBpedia Genre.

Lemma 14. Let d_max = max_{v∈V} d_v. If α ∈ (0, 1] satisfies

  α ≤ (1/2 − d_max/vol(V)) · (1 − d_max/vol(V))^{−1},

then the assumption (13) holds for any v ∈ C∖C◦.
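The bound of Lemma 14 is easy to evaluate in practice. Below is a small helper of our own (the name `alpha_bound` is not from the paper) that computes the largest α the lemma guarantees:

```python
def alpha_bound(degrees):
    """Largest alpha guaranteed by Lemma 14:
    alpha <= (1/2 - dmax/vol(V)) * (1 - dmax/vol(V))^(-1)."""
    vol = sum(degrees)
    r = max(degrees) / vol
    return (0.5 - r) / (1.0 - r)
```

For example, when d_max/vol(V) = 1/4, the bound evaluates to 1/3, so any α ≤ 1/4 satisfies it; as d_max/vol(V) tends to 0, the bound tends to 1/2.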

Lemma 14 implies that, if d_max/vol(V) ≤ 1/4 holds, then Lemma 10 holds for any α ≤ 1/4, which is usually the case in practice.

8 EXPERIMENTAL EVALUATION

In this section, we evaluate the performance of Algorithms 1 and 2 in terms of both the quality of solutions and the running time.

Dataset. Table 1 lists the real-world hypergraphs on which our experiments were conducted. Except for DBLP KDD, all hypergraphs were collected from KONECT¹. As these are originally unweighted bipartite graphs, we transformed each of them into a hypergraph (with edge weights) as follows: We take the left-hand-side vertices in a given bipartite graph as the vertices in the corresponding hypergraph, and then for each right-hand-side vertex, we add a hyperedge (with weight one) consisting of the neighboring vertices in the bipartite graph. DBLP KDD is a coauthorship hypergraph, which we compiled from the DBLP dataset². Specifically, the vertices correspond to the authors of the papers in KDD 1995–2019, while each hyperedge represents a paper and contains the vertices corresponding to the authors of the paper. As almost all hypergraphs generated as above were disconnected, we extracted the largest connected component of each hypergraph. All information in Table 1 is about the resulting hypergraphs.

¹ http://konect.uni-koblenz.de/
² https://dblp.uni-trier.de/

Parameter setting. We explain the parameter setting of our algorithms. Throughout the experiments, we set ϵ = 0.9. In our implementation, to make the algorithms more scalable, we try to keep the PageRank vector sparse; that is, we round every PageRank value less than 10⁻⁵ down to zero in each iteration of the Euler method. As a preliminary experiment, using the two largest hypergraphs, we observed the effect of the parameters in the Euler method, ∆ (i.e., time span) and T (i.e., total time), in terms of the solution quality. Specifically, we selected 50 initial vertices uniformly at random, and for each of them, we ran Algorithm 1 with µ = 1/2 for each pair of parameters (∆, T) ∈ {0.5, 1.0, 2.0} × {2, 4, . . . , 30}. The results are depicted in Figure 2. As can be seen, the effect of ∆ is drastic. The performance with ∆ = 0.5 is worse than that with ∆ = 1.0, which implies that, due to the above-mentioned sparsification of the PageRank vector, we have to spend some amount of time at once to obtain a non-negligible diffusion. Moreover, the performance with ∆ = 2.0 is not stable, meaning that we should not use an unnecessarily large value for ∆. The effect of T is also significant; when ∆ = 0.5 or 1.0, the conductance of the output solution becomes smaller as T increases. From these observations, in the following main experiments, we set ∆ = 1.0 and T = 30.

Baseline methods. In local clustering, we used two baselines called CLIQUE and STAR. CLIQUE first transforms a given hypergraph into a graph using the clique expansion, i.e., for each hyperedge e ∈ E and each pair u, v ∈ e with u ≠ v, it adds an edge uv of weight w(e). Then CLIQUE computes personalized PageRank with some α using the power method and produces sweep cuts. Finally, it outputs the best subset among them, in terms of the conductance in the original hypergraph. STAR is an analogue of CLIQUE, which employs the star expansion instead, i.e., for each hyperedge e ∈ E, it introduces a new vertex v_e, and then for each vertex u ∈ e, it adds an edge uv_e of weight w(e)/|e|. As the set of vertices has been changed in STAR, we leave out the additional vertices from the sweep cuts. As for the power method, the initial (PageRank) vector is set to be (1/n, . . . , 1/n) and the stopping condition is set to be ∥x − x_prev∥₁ ≤ 10⁻⁸, where x and x_prev are the vectors obtained in the current and previous iterations, respectively. Note that if we employ A_cand as the candidate set of α, the power method requires a huge number of iterations. This is because

Figure 3: Comparison of the solution quality of Algorithm 1 with those of CLIQUE and STAR. Each panel plots ϕ_H(S) against the number of observations for: (a) Graph products; (b) Network theory; (c) DBLP KDD; (d) arXiv cond-mat; (e) DBpedia Writers; (f) YouTube Group Memberships; (g) DBpedia Record Labels; (h) DBpedia Genre.

A_cand contains a lot of small values, e.g., w_min/(w_max Σ_{e∈E} |e|). Hence, we just used α = 0.05, 0.10, . . . , 0.95 in our experiments.

In global clustering, we used the well-known hypergraph clustering package hMETIS³ as well as the global counterparts of CLIQUE and STAR, which can be obtained in the same way as that of our algorithms. Note that hMETIS has a lot of parameters; among those, the parameters UBfactor, Nruns, CType, RType, and Vcycle may affect the solution quality and running time. To obtain a high-quality solution, we ran the algorithm for each of (UBfactor, Nruns, CType, RType, Vcycle) ∈ {5, 10, . . . , 45} × {10} × {1, 2, 3, 4, 5} × {1, 2, 3} × {0, 1, 2, 3} and returned the vertex set with the smallest conductance. We used hMETIS 1.5.3, the latest stable release. Finally, we remark that hMETIS is not applicable to local clustering because we cannot specify the volume of a subset containing the specified vertex.

³ http://glaros.dtc.umn.edu/gkhome/metis/hmetis/overview

8.1 Local clustering

First, we focus on local clustering and evaluate the performance of Algorithm 1. To this end, for each hypergraph in Table 1, we selected 50 initial vertices uniformly at random, and for each of them, we ran Algorithm 1, CLIQUE, and STAR. To focus on local structure, Algorithm 1 employs µ = 1/10, and so do CLIQUE and STAR. The code is written in C# and was run on a machine with a 2.8 GHz CPU and 16 GB RAM.

The quality of the solutions is illustrated in Figure 3. The results for CLIQUE are missing for the three largest hypergraphs because it could not be performed due to the memory limit. In those plots, once we fix a value on the vertical axis, we can see the cumulative number of solutions with conductance less than or equal to the fixed value; thus, an algorithm drawing a line toward the lower right has a better performance. As can be seen, Algorithm 1 outperforms CLIQUE and STAR. In fact, Algorithm 1 almost always has a larger cumulative number at a fixed conductance value. It can also be observed that CLIQUE performs better than STAR.

Table 2 presents the running time of the algorithms, averaged over the 50 initial vertices. The best results among the algorithms are written in bold. As is evident, Algorithm 1 is much more scalable than CLIQUE and comparable to STAR.

Table 2: Running time (s) of local algorithms.

  Name                        Algorithm 1   CLIQUE   STAR
  Graph products              0.02          0.01     0.03
  Network theory              0.06          0.18     0.11
  DBLP KDD                    1.42          1.84     1.54
  arXiv cond-mat              5.43          4.84     5.70
  DBpedia Writers             7.07          77.98    17.50
  YouTube Group Memberships   93.67         OM       57.64
  DBpedia Record Labels       36.69         OM       44.18
  DBpedia Genre               85.86         OM       97.47

8.2 Global clustering

Next, we focus on global clustering and evaluate the performance of Algorithm 2. As can be inferred from Table 2, Algorithm 2 itself does not scale to large hypergraphs. Therefore, we modified the algorithm as follows: we selected 50 initial vertices uniformly at random and replaced V with the set of selected vertices in Line 2 of Algorithm 2. We modified the global counterparts of CLIQUE and STAR in the same way. To compare the performance of the algorithms fairly, we used the same initial vertices for those three algorithms. The code is again written in C# and was run on the above-mentioned machine.

The results are summarized in Table 3, where the best results are again written in bold. As for the solution quality, Algorithm 2 is one of the best choices, but unlike the local setting, STAR performs well.
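The two expansions used by the baselines can be sketched as follows. This is our own minimal Python illustration of the constructions described in the baseline paragraph (the experiments themselves were implemented in C#):

```python
import itertools


def clique_expansion(hyperedges, weights):
    # CLIQUE: for each hyperedge e and each pair u != v in e,
    # add an edge uv of weight w(e) (summed over hyperedges).
    adj = {}
    for e, we in zip(hyperedges, weights):
        for u, v in itertools.combinations(sorted(e), 2):
            adj[(u, v)] = adj.get((u, v), 0.0) + we
    return adj


def star_expansion(hyperedges, weights, n):
    # STAR: for each hyperedge e, introduce a new vertex v_e = n + i and
    # add an edge (u, v_e) of weight w(e)/|e| for each u in e.
    adj = {}
    for i, (e, we) in enumerate(zip(hyperedges, weights)):
        ve = n + i
        for u in e:
            adj[(u, ve)] = we / len(e)
    return adj
```

In CLIQUE and STAR, PPR and sweep cuts are then computed on the expanded graph, while conductance is evaluated in the original hypergraph; STAR additionally drops the auxiliary vertices v_e from each sweep cut.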

Table 3: Performance of Algorithm 2, CLIQUE, STAR, and hMETIS.

  Name                        Algorithm 2          CLIQUE               STAR                 hMETIS
                              ϕ_H(S)    Time(s)    ϕ_H(S)    Time(s)    ϕ_H(S)    Time(s)    ϕ_H(S)    Time(s)
  Graph products              0.0234    1.01       0.0246    0.70       0.0288    1.47       0.0234    15.30
  Network theory              0.0064    3.03       0.0062    8.57       0.0074    5.15       0.0061    24.82
  DBLP KDD                    0.0116    68.91      0.0116    89.24      0.0116    74.98      0.0236    107.03
  arXiv cond-mat              0.0127    251.81     0.0127    237.31     0.0127    281.37     0.0259    418.08
  DBpedia Writers             0.0069    353.60     0.0043    3,812.72   0.0048    856.85     0.0010    721.36
  YouTube Group Memberships   0.0003    4,735.28   —         OM         0.0003    2,779.52   0.0029    3,696.24
  DBpedia Record Labels       0.0002    1,766.34   —         OM         0.0002    2,151.80   0.0052    1,655.88
  DBpedia Genre               0.0001    4,320.72   —         OM         0.0001    4,779.95   0.0023    8,670.49
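The overall strategy behind Algorithm 2, namely running a seeded clustering from every seed and keeping the lowest-conductance sweep cut, can be sketched in a self-contained toy form. This is our own illustration, not the paper's algorithm: for brevity, the seeded step computes exact PPR on the clique expansion (as in the CLIQUE baseline) rather than the hypergraph PPR, while conductance is evaluated in the hypergraph:

```python
import itertools
import numpy as np


def hyper_conductance(hedges, hw, deg, S):
    # phi_H(S) = w(cut hyperedges) / min(vol(S), vol(V \ S));
    # a hyperedge is cut if S contains some but not all of its vertices.
    S = set(S)
    cut = sum(w for e, w in zip(hedges, hw) if 0 < len(S & e) < len(e))
    volS = sum(deg[v] for v in S)
    return cut / min(volS, sum(deg) - volS)


def ppr_clique(hedges, hw, n, seed, alpha):
    # Exact PPR p = alpha * (I - (1-alpha) M)^{-1} chi_seed on the clique
    # expansion, where M is the column-stochastic walk matrix.
    A = np.zeros((n, n))
    for e, w in zip(hedges, hw):
        for u, v in itertools.combinations(e, 2):
            A[u, v] += w
            A[v, u] += w
    M = A / A.sum(axis=0)
    chi = np.zeros(n)
    chi[seed] = 1.0
    return alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * M, chi)


def global_clustering(hedges, hw, n, alpha=0.2):
    deg = [sum(w for e, w in zip(hedges, hw) if v in e) for v in range(n)]
    best, best_phi = None, float("inf")
    for seed in range(n):              # "for v in V do S_v <- LocalClustering"
        p = ppr_clique(hedges, hw, n, seed, alpha)
        order = sorted(range(n), key=lambda v: -p[v] / deg[v])
        for j in range(1, n):          # sweep cuts of the PPR vector
            phi = hyper_conductance(hedges, hw, deg, order[:j])
            if phi < best_phi:
                best, best_phi = sorted(order[:j]), phi
    return best, best_phi
```

On a hypergraph with two tight clusters joined by one light hyperedge, this sketch recovers one of the clusters as the minimum-conductance set.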

In fact, STAR achieves the same performance as that of Algorithm 2 for the three largest hypergraphs. On the other hand, the performance of hMETIS is unfavorable. In our experiments, hMETIS was run with almost all possible parameter settings; therefore, it is unlikely that we can obtain a comparable solution using hMETIS. The trend of the results for the running time is similar to that of the local setting: Algorithm 2 is much more scalable than CLIQUE and comparable to STAR.

9 CONCLUSION

In this work, we have developed local and global clustering algorithms based on personalized PageRank for hypergraphs, defined via the hypergraph Laplacian recently introduced by Li and Milenkovic [22]. To the best of our knowledge, ours are the first practical algorithms that have theoretical guarantees on the conductance of the output vertex set. Through experiments, we confirmed the usefulness of our clustering algorithms in practice.

ACKNOWLEDGMENTS

The authors thank Yu Kitabeppu for helpful comments. This work was done while A.M. was at RIKEN AIP, Japan. This work was supported by JSPS KAKENHI Grant Numbers 18H05291 and 19K20218.

REFERENCES
[1] Sameer Agarwal, Kristin Branson, and Serge Belongie. 2006. Higher order learning with graphs. In ICML. 17–24.
[2] Sameer Agarwal, Jongwoo Lim, Lihi Zelnik-Manor, Pietro Perona, David Kriegman, and Serge Belongie. 2005. Beyond pairwise clustering. In CVPR. 838–845.
[3] Noga Alon. 1986. Eigenvalues and expanders. Combinatorica 6, 2 (1986), 83–96.
[4] Noga Alon and V. D. Milman. 1985. λ₁, isoperimetric inequalities for graphs, and superconcentrators. Journal of Combinatorial Theory, Series B 38, 1 (1985), 73–88.
[5] Reid Andersen, Fan Chung, and Kevin Lang. 2007. Using PageRank to locally partition a graph. Internet Mathematics 4, 1 (2007), 35–64.
[6] Reid Andersen and Yuval Peres. 2009. Finding sparse cuts locally using evolving sets. In STOC. 235–244.
[7] Marianna Bolla. 1993. Spectra, Euclidean representations and clusterings of hypergraphs. Discrete Mathematics 117 (1993), 19–39.
[8] T-H. Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. 2018. Spectral properties of hypergraph Laplacian and approximation algorithms. J. ACM 65, 3 (2018), 15–48.
[9] I Chien, Chung-Yi Lin, and I-Hsiang Wang. 2018. Community detection in hypergraphs: Optimal statistical limit and efficient algorithms. In AISTATS. 871–879.
[10] Fan Chung. 2007. The heat kernel as the pagerank of a graph. Proceedings of the National Academy of Sciences of the United States of America 104, 50 (2007), 19735–19740.
[11] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. 2004. Efficient graph-based image segmentation. International Journal of Computer Vision 59, 2 (2004), 167–181.
[12] Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3-5 (2010), 75–174.
[13] Kaito Fujii, Tasuku Soma, and Yuichi Yoshida. 2018. Polynomial-time algorithms for submodular Laplacian systems. arXiv preprint arXiv:1803.10923 (2018).
[14] Shayan Oveis Gharan and Luca Trevisan. 2012. Approximating the expansion profile and almost optimal local graph clustering. In FOCS. 187–196.
[15] Debarghya Ghoshdastidar and Ambedkar Dukkipati. 2014. Consistency of spectral partitioning of uniform hypergraphs under planted partition model. In NIPS. 397–405.
[16] Masahiro Ikeda, Atsushi Miyauchi, Yuuki Takai, and Yuichi Yoshida. 2018. Finding Cheeger cuts in hypergraphs via heat equation. arXiv preprint arXiv:1809.04396 (2018).
[17] Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In WWW. 271–279.
[18] Kyle Kloster and David F. Gleich. 2014. Heat kernel based community detection. In KDD. 1386–1395.
[19] Yukio Komura. 1967. Nonlinear semi-groups in Hilbert space. Journal of the Mathematical Society of Japan 19, 4 (1967), 493–507.
[20] Tsz C. Kwok, Lap C. Lau, and Yin T. Lee. 2017. Improved Cheeger's inequality and analysis of local graph partitioning using vertex expansion and expansion profile. SIAM J. Comput. 46, 3 (2017), 890–910.
[21] Marius Leordeanu and Cristian Sminchisescu. 2012. Efficient hypergraph clustering. In AISTATS. 676–684.
[22] Pan Li and Olgica Milenkovic. 2018. Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering. In ICML. 3014–3023.
[23] Hairong Liu, Longin J. Latecki, and Shuicheng Yan. 2010. Robust clustering as ensembles of affinity relations. In NIPS. 1414–1422.
[24] László Lovász and Miklós Simonovits. 1990. The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume. In FOCS. 346–354.
[25] László Lovász and Miklós Simonovits. 1993. Random walks in a convex body and an improved volume algorithm. Random Structures & Algorithms 4, 4 (1993), 359–412.
[26] Isao Miyadera. 1992. Nonlinear Semigroups. Translations of Mathematical Monographs, Vol. 109. American Mathematical Society.
[27] Juan Peypouquet and Sylvain Sorin. 2010. Evolution equations for maximal monotone operators: Asymptotic analysis in continuous and discrete time. Journal of Convex Analysis 17, 3&4 (2010), 1113–1163.
[28] J. A. Rodríguez. 2002. On the Laplacian eigenvalues and metric parameters of hypergraphs. Linear and Multilinear Algebra 50, 1 (2002), 1–14.
[29] Samuel Rota Bulò and Marcello Pelillo. 2013. A game-theoretic approach to hypergraph clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 6 (2013), 1312–1327.
[30] Daniel A. Spielman and Shang-Hua Teng. 2013. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM J. Comput. 42, 1 (2013), 1–26.
[31] Yuichi Yoshida. 2016. Nonlinear Laplacian for digraphs and its applications to network analysis. In WSDM. 483–492.
[32] Yuichi Yoshida. 2019. Cheeger inequalities for submodular transformations. In SODA. 2582–2601.
[33] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. 2005. Beyond pairwise classification and clustering using hypergraphs. Max Planck Institute for Biological Cybernetics, Tübingen, Germany (2005).
[34] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. 2007. Learning with hypergraphs: Clustering, classification, and embedding. In NIPS. 1601–1608.
[35] Jason Y. Zien, Martin D. F. Schlag, and Pak K. Chan. 1999. Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 18, 9 (1999), 1389–1399.

A PROOFS OF SECTION 4.1

Proof of Theorem 1. We first prove the former claim. In [16], it was proven that L_H is a maximal monotone operator. (We omit the definition here because we do not need it.) Then, by the general theory of maximal monotone operators [19, 26], the operator I + λL_H is injective for any λ > 0. Hence, its inverse J_λ := (I + λL_H)^{-1} is single-valued, and is called a resolvent of L_H. Because the image of L_H is R^V, for any s ∈ R^V, there exists a unique vector J_λ(s), which is a solution to (3) when λ = (1 − α)/2α > 0.

We now prove the latter claim. For p = pr_α(s), we have

  p + ((1 − α)/(2α)) L_{H_p} p = s,

where H_p is the graph obtained from H and p, as defined in Section 3.3. This implies that p is a classical PPR on H_p, and hence the claim holds by the Perron–Frobenius theorem for the matrix α s 1^⊤ + (1 − α) W_{H_p}. □

Proof of Proposition 2. As mentioned in the proof of Theorem 1, the PPR pr_α(s) is equal to the resolvent J_λ(s) for λ = (1 − α)/2α. Thus, it suffices to prove the continuity of the resolvent J_λ(s) with respect to λ > 0. By [26], for λ ≥ µ > 0, the resolvents satisfy the following relation:

  J_λ(s) = J_µ( (µ/λ) s + ((λ − µ)/λ) J_λ(s) ).

On the other hand, it is well known that J_λ is a non-expansive map, i.e., for any s, t ∈ R^V and any λ > 0, ∥J_λ(s) − J_λ(t)∥_{D_H^{-1}} ≤ ∥s − t∥_{D_H^{-1}}, where ∥x∥_{D_H^{-1}} = √(x^⊤ D_H^{-1} x). By these relations, we can show the following:

  ∥J_λ(s) − J_µ(s)∥_{D_H^{-1}} ≤ ∥J_µ((µ/λ) s + ((λ − µ)/λ) J_λ(s)) − J_µ(s)∥_{D_H^{-1}} ≤ ∥(µ/λ) s + ((λ − µ)/λ) J_λ(s) − s∥_{D_H^{-1}} ≤ ((λ − µ)/λ) ∥J_λ(s) − s∥_{D_H^{-1}}.

Hence, lim_{λ→µ} J_λ(s) = J_µ(s). □

To prove Proposition 3, we prepare the following lemma:

Lemma 15. For β := 2α/(1 + α), we have

  β(s − pr_α(s)) − (1 − β) L_H(pr_α(s)) = 0.

Proof of Lemma 15. Let p = pr_α(s). Note that (1 − α)/2α = 1/β − 1. Then by (3) and an easy calculation, we have

  (I + ((1 − α)/(2α)) L_H)(p) ∋ s ⇔ β(s − p) − (1 − β) L_H(p) ∋ 0.

As p is unique by Theorem 1, we can replace the inclusion with equality. □

Proof of Proposition 3. Let p = pr_α(χ_u). Because p is a distribution by the latter claim of Theorem 1, we have p(v) ≥ 0 for any v ∈ V. Then by Lemma 15, we have

  p = χ_u − ((1 − β)/β) L_{H_p} p ≥ 0.

Hence for v ≠ u, we have −L_{H_p} p(v) ≥ 0. Note that −L_{H_p} p(v) is equal to the difference between the heat diffusing into v and that diffusing out from v in the heat equation:

  dρ_t/dt = −L_{H_p} ρ_t.

Because the heat diffuses to the stationary distribution π_V, the non-negativity of −L_{H_p} p(v) implies p(v) ≤ π_V(v). When v = u, we have p(u) − χ_u(u) ≤ 0. By a similar argument, we have the desired inequality. □

B PROOF OF LEMMA 4

B.1 Setup
Before proving Lemma 4, we first introduce several notations. Let G = (V, E, w) be an undirected graph. For a subset S ⊆ V, we define two sets of ordered pairs of vertices as follows:

  in(S) := {(u, v) ∈ V × S},   out(S) := {(u, v) ∈ S × V}.

For an ordered pair (u, v) ∈ V × V and a distribution p ∈ R^V, we set p(u, v) := (w(uv)/d_u) p(u). For a set of ordered pairs of vertices A ⊆ V × V and any distribution p ∈ R^V, we set p(A) := Σ_{(u,v)∈A} p(u, v). Then, by an argument for p and H_p similar to that in [5], we have the following:

Lemma 16. Let H = (V, E, w) be a hypergraph, p ∈ R^V be a distribution, H_p = (V, E_p, w_p) be the graph defined in Section 3, and S ⊆ V. Then, we have

  W_{H_p}(p)(S) = (1/2) (p(in(S) ∪ out(S)) + p(in(S) ∩ out(S))),

where p(·) is defined using the graph H_p.

B.2 Lovász–Simonovits function for weighted graphs
Next, we introduce a weighted version of the Lovász–Simonovits function p[x] [24, 25]. For a distribution p ∈ R^V, we define p[x] on x ∈ [0, vol(V)] as follows: If x = vol(S_j^p) for some j = 0, 1, . . . , n, we define p[x] := p(S_j^p). For general x ∈ [0, vol(V)], we extend p[x] as a piecewise linear function. More specifically, if x satisfies vol(S_j^p) ≤ x < vol(S_{j+1}^p), then p[x] is defined as

  p[x] = p(S_j^p) + (p(v_{j+1})/d_{v_{j+1}}) (x − vol(S_j^p)).

We remark that the function p[x] is concave. Then, an argument similar to that of [5] with Lemma 16 implies the following lemma:

Lemma 17. For p = pr_α(s) and j = 1, 2, . . . , n − 1, we have

  p[vol(S_j^p)] ≤ α s[vol(S_j^p)] + ((1 − α)/2) ( p[vol(S_j^p) − |∂(S_j^p)|] + p[vol(S_j^p) + |∂(S_j^p)|] ).

B.3 A mixing result for PPR and the proof of Lemma 4
Next, we show a local version of a mixing result for PPR. A global version for graphs was proven in [5].
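For ordinary graphs, the resolvent characterization used in the proof of Theorem 1 can be checked numerically: with λ = (1 − α)/(2α) and the lazy random walk, pr_α(s) solves (I + λ(I − A D^{-1})) p = s. The following self-contained sketch is our own code; it covers only the linear graph case, not the nonlinear hypergraph operator L_H, and compares the fixed-point iteration with the resolvent solve:

```python
import numpy as np


def ppr_fixed_point(A, s, alpha, iters=5000):
    # p = alpha*s + (1-alpha)*W p with the lazy walk W = (I + A D^{-1}) / 2.
    n = len(s)
    W = 0.5 * (np.eye(n) + A / A.sum(axis=0))
    p = s.copy()
    for _ in range(iters):
        p = alpha * s + (1 - alpha) * W @ p
    return p


def ppr_resolvent(A, s, alpha):
    # p = (I + lam * L)^{-1} s with lam = (1-alpha)/(2*alpha), L = I - A D^{-1}.
    n = len(s)
    lam = (1 - alpha) / (2 * alpha)
    L = np.eye(n) - A / A.sum(axis=0)
    return np.linalg.solve(np.eye(n) + lam * L, s)
```

On any connected weighted graph the two computations agree to numerical precision, which is exactly the identity p + ((1 − α)/(2α)) L p = s rearranged from the lazy fixed-point equation.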

Theorem 18. Let s be a distribution and ϕ ∈ [0, 1], µ ∈ (0, 1/2] be any constants. We set p = pr_α(s). Let ℓ_µ ∈ {1, 2, . . . , n} be the unique integer such that vol(S_{ℓ_µ−1}^p) < µ·vol(V) ≤ vol(S_{ℓ_µ}^p). We assume that ϕ_H(S_j^p) ≥ ϕ holds for any j = 1, 2, . . . , ℓ_µ. Then, for any S ⊆ V such that vol(S)/vol(V) ≤ µ and any t ∈ Z_+, the following holds:

  p(S) − π_V(S) ≤ αt + √vol(S) · (1 − ϕ²/8)^t.   (16)

Proof. We can show this theorem by an argument similar to that of [5]. Here, we show a sketch of the proof. We define a function f_t(x) on x ∈ [0, vol(S_{ℓ_µ}^p)] by

  f_t(x) := αt + √(min{x, vol(V) − x}) · (1 − ϕ²/8)^t.

Because p(S) ≤ p[vol(S)] for any vertex set S ⊆ V, if for a vertex set S ⊆ V such that vol(S)/vol(V) ≤ µ,

  p[x] − x/vol(V) ≤ f_t(x)   (17)

holds for x = vol(S), then the inequality (16) for S follows.

To prove Theorem 18, by induction on t, we show that if a sweep cut S_j^p satisfies ϕ_H(S_j^p) ≥ ϕ, then the inequality (17) holds for x_j = vol(S_j^p) and any t ∈ Z_+, as follows:

(1) We prove that the inequality (17) holds for t = 0 and any S ⊆ V.
(2) We assume that the inequality (17) holds for t = t₀ and x_j = vol(S_j^p). Then, by Lemma 17, the concavity of f_t(x_j), and the piecewise linearity of p[x_j] − x_j/vol(V), we prove that if S_j^p satisfies ϕ_H(S_j^p) ≥ ϕ, then the inequality (17) holds for x_j = vol(S_j^p) and t = t₀ + 1.

This argument implies that if every sweep cut S_j^p (j = 1, 2, . . . , ℓ_µ) satisfies ϕ_H(S_j^p) ≥ ϕ, the inequality (17) holds for x_j = vol(S_j^p) for all j ∈ {0, 1, . . . , ℓ_µ} and any t ∈ Z_+. Then, by the concavity of f_t(x) and the piecewise linearity of p[x] − x/vol(V) again, the inequality (17) holds for any x ∈ [0, µ·vol(V)] and any t ∈ Z_+. Because p(S) ≤ p[vol(S)], we obtain the inequality (16) for any S ⊆ V with vol(S)/vol(V) ≤ µ. □

We next prove the following as a consequence of Theorem 18.

Lemma 19. If there is a vertex set S ⊆ V and a constant δ ≥ 4/√vol(V) such that vol(S)/vol(V) ≤ µ and pr_α(s)(S) − π_V(S) > δ, then the following inequality holds:

  ϕ_H^µ(pr_α(s)) < √(12α log(vol(V)) / δ).   (18)

Proof. We apply Theorem 18 for ϕ = ϕ_H^µ(pr_α(s)). Then, because every sweep cut S_j^p (j = 1, 2, . . . , ℓ_µ) satisfies the assumption ϕ_H(S_j^p) ≥ ϕ in Theorem 18, any vertex set S ⊆ V with vol(S)/vol(V) ≤ µ satisfies the inequality (16) for any t ∈ Z_+. As in the argument of [5], we take t as

  t = ⌈ (8/ϕ²) log(4√vol(S)/δ) ⌉ ≤ (9/ϕ²) log(4√vol(S)/δ).

Here, ⌈x⌉ = min{a ∈ Z | x ≤ a}. Then by the inequality (16), we have

  p(S) − π_V(S) ≤ α (9/ϕ²) log(4√vol(S)/δ) + δ/4.

By the assumption δ < p(S) − π_V(S) and rearranging the obtained inequality, we have the desired inequality (18). □

Finally, we deduce Lemma 4 from Lemma 19.

Proof of Lemma 4. The statement of Lemma 19 is independent of the normalization of the weight function w: E → R_{≥0}, except for the factor vol(V). Hence, for the hypergraph H^c = (V, E, cw) scaled by c > 0, the statement analogous to Lemma 19, obtained by replacing vol(V) with c·vol(V), also holds. This means that the right-hand side of (18) can be decreased as long as the assumption δ ≥ 4/√(c·vol(V)) holds. We remark that the conductance ϕ_{H^c}(pr_α(s)) is independent of c. As a consequence of this argument, if a hypergraph H satisfies the assumption of Lemma 19, then Lemma 19 holds for H^c for any c that satisfies δ ≥ 4/√(c·vol(V)), hence also for c·vol(V) = 4²/δ². However, the left-hand side of (18) for H^c is the same as that for H. This implies the inequality (7). □

C PROOF OF LEMMA 12

Proof of Lemma 12. Let p_v = pr_α(χ_v) for v ∈ V. We consider a random variable X := p_v(C̄), where v ∈ V is sampled according to the distribution π_C. Then by Lemma 10, we have

  E_{π_C}[X] = Σ_{v∈V} π_C(v) p_v(C̄) ≤ ϕ_H(C)/(2α).

By Markov's inequality, we have

  Pr[v ∉ C_α] ≤ Pr[X > 2 E_{π_C}[X]] ≤ 1/2.

This implies that vol(C_α) is larger than vol(C)/2. □

D PROOFS OF SUFFICIENT CONDITIONS

Proof of Lemma 13. Let p_v = pr_α(χ_v) for v ∈ V. For a vertex v ∈ V, we have

  p_v(v) ≤ 1/2 ≤ 1 − vol(C)/vol(V).

Hence, by multiplying both sides by π_C(v) = d_v/vol(C) and transposing a term, we have

  π_C(v) p_v(v) + π_V(v) ≤ π_C(v).   (19)

Now, we have

  Σ_{u∈C} π_C(u) p_u(v) = π_C(v) p_v(v) + Σ_{u∈C∖{v}} π_C(u) p_u(v)
    ≤ π_C(v) p_v(v) + Σ_{u∈C∖{v}} π_C(u) π_V(v)   (by Proposition 3)
    ≤ π_C(v) p_v(v) + π_V(v) ≤ π_C(v).   (by (19))

This concludes the proof. □
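In the ordinary graph setting, the hypotheses of Lemmas 13 and 14 can be checked numerically. The following sketch is our own code, using standard non-lazy graph PPR as a stand-in for the hypergraph operator; it works on the 6-cycle, where every degree is 2 and vol(V) = 12, so the bound of Lemma 14 is (1/2 − 1/6)/(1 − 1/6) = 0.4:

```python
import numpy as np


def ppr_diag(A, alpha):
    # pr_alpha(chi_v)(v) for every v, using p = alpha*(I - (1-alpha)M)^{-1} chi_v;
    # column v of the inverse gives the PPR vector seeded at v.
    n = A.shape[0]
    M = A / A.sum(axis=0)
    P = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * M)
    return np.diag(P)


n = 6
A = np.zeros((n, n))
for v in range(n):                 # build the 6-cycle
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0

dmax, vol = 2.0, 12.0
bound = (0.5 - dmax / vol) / (1.0 - dmax / vol)   # Lemma 14 bound: 0.4
diag = ppr_diag(A, alpha=0.2)                     # 0.2 <= bound
```

At α = 0.2 ≤ 0.4, every diagonal value pr_α(χ_v)(v) comes out to about 0.34 ≤ 1/2, i.e., the hypothesis of Lemma 13 holds, as Lemma 14 predicts.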

Proof of Lemma 14. Let p = pr_α(χ_v). By an argument similar to the proof of Lemma 10 for the case C = {v}, we have

  α(1 − p(v)) = (1 − α) p(v) − (1 − α) Σ_{u≠v} (w_p(vu)/d_u) p(u).

Then, we have

  p(v) = α + (1 − α) Σ_{u≠v} (w_p(vu)/d_u) p(u).

Because p(u) ≤ d_u/vol(V) holds for any u ≠ v by Proposition 3, we have

  p(v) ≤ α + (1 − α) Σ_{u≠v} (w_p(vu)/d_u) · (d_u/vol(V)) ≤ α + (1 − α) d_max/vol(V).

By combining this inequality with Lemma 13, if α + (1 − α) d_max/vol(V) ≤ 1/2, then α satisfies the condition (13) for any v ∈ C∖C◦. □
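The Lovász–Simonovits function p[x] from Appendix B is straightforward to compute for a given distribution. A minimal sketch of our own (the helper name `ls_curve` is not from the paper), using the sweep order by p(v)/d_v:

```python
import numpy as np


def ls_curve(p, d):
    # Piecewise-linear Lovasz-Simonovits function x -> p[x]: prefix sums of p
    # taken in decreasing p(v)/d(v) order, interpolated in cumulative volume.
    p, d = np.asarray(p, float), np.asarray(d, float)
    order = np.argsort(-p / d)
    xs = np.concatenate(([0.0], np.cumsum(d[order])))
    ys = np.concatenate(([0.0], np.cumsum(p[order])))
    return lambda x: float(np.interp(x, xs, ys))
```

Because the segment slopes p(v_j)/d_{v_j} are non-increasing along the sweep order, the returned function is concave, as remarked in Section B.2.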