Hypergraph Clustering Based on Pagerank

Hypergraph Clustering Based on Pagerank

Hypergraph Clustering Based on PageRank Yuuki Takai∗ Atsushi Miyauchi∗ RIKEN AIP University of Tokyo Tokyo, Japan Tokyo, Japan [email protected] [email protected] Masahiro Ikeda Yuichi Yoshida RIKEN AIP National Institute of Informatics Tokyo, Japan Tokyo, Japan [email protected] [email protected] ABSTRACT natural object to model such relations. Indeed, if we represent coau- A hypergraph is a useful combinatorial object to model ternary or thorship network by a graph, an edge only indicates the existence of higher-order relations among entities. Clustering hypergraphs is co-authored paper, and information about co-authors in one paper a fundamental task in network analysis. In this study, we develop is lost. However, hypergraphs do not lose that information. two clustering algorithms based on personalized PageRank on hy- As with graphs, hypergraph clustering is also an important task, pergraphs. The first one is local in the sense that its goal is tofind and it is natural to use an extension of conductance for hyper- a tightly connected vertex set with a bounded volume including a graphs [8, 32] to measure the quality of a vertex set as a cluster. specified vertex. The second one is global in the sense that itsgoal Although several methods have been shown to have theoretical is to find a tightly connected vertex set. For both algorithms, we guarantees on the conductance of the output vertex set [8, 16, 32], discuss theoretical guarantees on the conductance of the output none of them are efficient in practice. vertex set. Also, we experimentally demonstrate that our clustering To obtain efficient hypergraph clustering algorithms with athe- algorithms outperform existing methods in terms of both the solu- oretical guarantee, we utilize the graph clustering algorithm devel- tion quality and running time. To the best of our knowledge, ours oped by Andersen et al. [5]. Their algorithm is based on personalized are the first practical algorithms for hypergraphs with theoretical PageRank (PPR), which is the stationary distribution of a random guarantees on the conductance of the output set. walk that jumps back to a specified vertex with a certain probability in each step of the walk. Roughly speaking, their algorithm first CCS CONCEPTS computes the PPR from a specified vertex, and then returns a set of vertices with high PPR values. They showed a theoretical guarantee • Information systems → Web mining; • Mathematics of com- on the conductance of the returned vertex set. puting → Spectra of graphs. In this work, we propose two hypergraph clustering algorithms KEYWORDS by extending Andersen et al.’s algorithm [5]. The first one is local in hypergraph, clustering, personalized PageRank, Laplacian the sense that its goal is to find a vertex set of a small conductance and a bounded volume including a specified vertex. The second 1 INTRODUCTION one is global in the sense that its goal is to find a vertex set of the smallest conductance. Graph clustering is a fundamental task in network analysis, where Although there could be various definitions of random walk on the goal is to find a tightly connected vertex set in a graph. Ithas hypergraphs and corresponding PPRs, we adopt the PPR recently been used in many applications such as community detection [12] introduced by Li and Milenkovic [22] using a hypergraph Lapla- and image segmentation [11]. Because of its importance, graph cian [8, 32]. Because the hypergraph Laplacian well captures the clustering has been intensively studied, and a plethora of clustering cut information of hypergraphs, the corresponding PPR can be used arXiv:2006.08302v1 [cs.DS] 15 Jun 2020 methods have been proposed. See, e.g., [12] for a survey. to find vertex sets with small conductance. There is no single best criterion to measure the quality of a vertex Because we can efficiently compute (a good approximation to) set as a cluster; however, one of the most widely used is the notion the PPR, our algorithms are efficient. Moreover, as with Ander- of conductance. Roughly speaking, the conductance of a vertex set sen et al.’s algorithm [5], our algorithms have theoretical guaran- is small if many edges reside within the vertex set, whereas only tees on the conductance of the output vertex set. To the best of a few edges leave it (see Section 3 for details). Many clustering our knowledge, no local clustering algorithm with a theoretical methods have been shown to have theoretical guarantees on the guarantee on the conductance of the output set is known for hyper- conductance of the output vertex set [3–6, 10, 14, 18, 20, 30]. graphs. Although, as mentioned previously, several algorithms have Although a graph is suitable to represent binary relations over been proposed for global clustering for hypergraphs [8, 16, 32], we entities, many relations, such as coauthorship, goods that are pur- stress here that our global clustering algorithm is the first practical chased together, and chemicals involved in metabolic reactions algorithm with a theoretical guarantee. inherently comprise three or more entities. A hypergraph is a more ∗Both authors contributed equally to this research. Yuuki Takai, Atsushi Miyauchi, Masahiro Ikeda, and Yuichi Yoshida We experimentally demonstrate that our clustering algorithms t ≥ 0, which cannot be efficiently implemented on Turing machines. outperform other baseline methods in terms of both the solution Although our algorithm also computes sweep cuts, we need only quality and running time. consider one specific vector, i.e., the PPR. Our contribution can be summarized as follows. If we do not need any theoretical guarantee on the quality of • We propose local and global hypergraph clustering algorithms the output vertex set, we can think of a heuristic in which we first based on PPR for hypergraphs given by Li and Milenkovic [22]. construct a graph from the input hypergraph H = ¹V ; E;wº and • We give theoretical guarantees on the conductance of the vertex then apply a known clustering algorithm for graphs on the resulting set output by our algorithms. graph, e.g., Andersen et al.’s algorithm [5]. There are two popular • Our algorithms are more efficient than existing algorithms with methods for constructing graphs: theoretical guarantees [8, 16, 32] (see Section 2 for the details). Clique expansion [2, 34]: For each hyperedge e 2 E and each • We experimentally demonstrate that our clustering algorithms u;v 2 e with u , v, we add an edge uv of weight w¹eº or outperform other baseline methods in terms of both the solution ¹ )/je j w e 2 . quality and running time. Star expansion [35]: For each hyperedge e 2 E, we introduce This paper is organized as follows. We discuss related work in a new vertex ve , and then for each vertex u 2 e, we add an Section 2 and then introduce notions used throughout the paper in edge uve of weight w¹eº or w¹e)/jej. Section 3. After discussing basic properties of PPR for hypergraphs The obvious drawback of the clique expansion is that it introduces in Section 4, we introduce a key lemma (Lemma 4) that connects Θ¹k2º edges for a hyperedge of size k, and hence the resulting graph conductance and PPR in Section 5. On the basis of the key lemma, we is huge. The relation of these expansions and various Laplacians show our local and global clustering algorithms in Sections 6 and 7, defined for hypergraphs [7, 28, 33] were discussed in [1]. respectively. We discuss our experimental results in Section 8, and Ghoshdastidar and Dukkipati [15] proposed a spectral method conclude in Section 9. In appendix, we give the proofs which could utilizing a tensor defined from the input hypergraph, and analyzed not be included in the main body because of the page restrictions. its performance when the hypergraph is generated from a planted 2 RELATED WORK partition model or a stochastic block model. A drawback of their method is that the hypergraph must be uniform, i.e., each hyper- As previously mentioned, even if we restrict our attention to graph edge must be of the same size. Chien et al. [9] proposed another clustering methods with theoretical guarantees on the conductance statistical method that also assumes uniformity of the hypergraph, of the output vertex set, a plethora of methods have been pro- and analyzed its performance on a stochastic block model. posed [3–6, 10, 14, 18, 20, 30]. Some of them run in time nearly Other iterative procedures for clustering hypergraphs have also linear in the size of the output vertex set, i.e., we do not have to been proposed [21, 23, 29]. However, these methods do not have read the entire graph. any theoretical guarantee on the quality of the output vertex set. Although several hypergraph clustering algorithms have been proposed [2, 21, 23, 29, 34], most of them are merely heuristics and have no guarantees on the conductance of the output vertex set. 3 PRELIMINARIES V Í Indeed, we are not aware of any local clustering algorithms with a We say that a vector x 2 R is a distribution if v 2V x¹vº = 1 and V theoretical guarantee, and to the best of our knowledge, there are x¹vº ≥ 0 for every v 2 V . For a vector x 2 R and a set S ⊆ V , we Í only two global clustering algorithms with theoretical guarantees. define x¹Sº = v 2S x¹vº. The first one [8, 32] is based on Cheeger’s inequality for hyper- We often use the symbols n and m to denote the number of graphs. Given a hypergraph H = ¹V ; E;wº, it first computes an vertices and number of (hyper)edges of the (hyper)graph we are approximation x 2 RV to the second eigenvector of the normal- concerned with, which should be clear from the context.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    12 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us