Practical Parallel Hypergraph Algorithms
Julian Shun, MIT CSAIL
PPoPP '20, February 22–26, 2020, San Diego, CA, USA. https://doi.org/10.1145/3332466.3374527

Abstract

While there has been significant work on parallel graph processing, there has been surprisingly little work on high-performance hypergraph processing. This paper presents a collection of efficient parallel algorithms for hypergraph processing, including algorithms for betweenness centrality, maximal independent set, k-core decomposition, hypertrees, hyperpaths, connected components, PageRank, and single-source shortest paths. For these problems, we either provide new parallel algorithms or more efficient implementations than prior work. Furthermore, our algorithms are theoretically-efficient in terms of work and depth. To implement our algorithms, we extend the Ligra graph processing framework to support hypergraphs, and our implementations benefit from graph optimizations including switching between sparse and dense traversals based on the frontier size, edge-aware parallelization, using buckets to prioritize processing of vertices, and compression. Our experiments on a 72-core machine show that our algorithms obtain excellent parallel speedups, and are significantly faster than algorithms in existing hypergraph processing frameworks.

CCS Concepts: • Computing methodologies → Parallel algorithms; Shared memory algorithms.

1 Introduction

A graph contains vertices and edges, where a vertex represents an entity of interest, and an edge between two vertices represents an interaction between the two corresponding entities. There has been significant work on developing algorithms and programming frameworks for efficient graph processing due to their applications in various domains, such as social network and Web analysis, cyber-security, and scientific computations. One limitation of modeling data using graphs is that only binary relationships can be expressed, which can lead to loss of information from the original data. Hypergraphs are a generalization of graphs where the relationships, represented as hyperedges, can contain an arbitrary number of vertices. Hyperedges correspond to group relationships among vertices (e.g., a community in a social network). An example of a hypergraph is shown in Figure 1a.

[Figure 1. An example hypergraph representing the groups {v0, v1, v2}, {v1, v2, v3}, and {v0, v3}, and its bipartite representation. (a) Hypergraph. (b) Bipartite representation.]
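To make the bipartite view in Figure 1b concrete, the following minimal C++ sketch stores the example hypergraph as two adjacency structures, one mapping each vertex to its incident hyperedges and one mapping each hyperedge to its member vertices. The struct and function names here (Hypergraph, buildExample, and the two adjacency fields) are illustrative only and are not Hygra's actual interface.

```cpp
#include <cstdio>
#include <vector>

// A hypergraph stored in bipartite form (cf. Figure 1b): the two sides are
// vertices and hyperedges, and a bipartite edge means "vertex v is a member
// of hyperedge e". Membership is stored from both sides.
struct Hypergraph {
  std::vector<std::vector<int>> vertexToHyperedges;  // hyperedges incident to each vertex
  std::vector<std::vector<int>> hyperedgeToVertices; // vertices contained in each hyperedge
};

// Build the example of Figure 1a: e0 = {v0,v1,v2}, e1 = {v1,v2,v3}, e2 = {v0,v3}.
Hypergraph buildExample() {
  Hypergraph h;
  h.hyperedgeToVertices = {{0, 1, 2}, {1, 2, 3}, {0, 3}};
  h.vertexToHyperedges.resize(4);
  for (int e = 0; e < (int)h.hyperedgeToVertices.size(); e++)
    for (int v : h.hyperedgeToVertices[e])
      h.vertexToHyperedges[v].push_back(e);  // mirror the membership on the vertex side
  return h;
}

int main() {
  Hypergraph h = buildExample();
  for (int v = 0; v < (int)h.vertexToHyperedges.size(); v++) {
    std::printf("v%d:", v);  // e.g., v1 is a member of e0 and e1
    for (int e : h.vertexToHyperedges[v]) std::printf(" e%d", e);
    std::printf("\n");
  }
  return 0;
}
```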
Hypergraphs have been shown to enable richer analysis of structured data in various domains, such as protein network analysis [76], machine learning [100], and image processing [15, 27]. Various graph algorithms have been extended to the hypergraph setting, and we list some examples of algorithms and their applications here. Betweenness centrality on hypergraphs has been used for hierarchical community detection [12] and for measuring the importance of hypergraph vertices [77]. k-core decomposition in hypergraphs can be applied to invertible Bloom lookup tables, low-density parity-check codes, and set reconciliation [45]. PageRank and random walks on hypergraphs have been used for image segmentation and spectral clustering, and have been shown to outperform graph-based methods [26, 27, 100]. Shortest paths, hyperpaths, and hypertrees have been used for solving optimization problems [30, 70], satisfiability problems and deriving functional dependencies in databases [30], and modeling information spread and finding important actors in social networks [31]. Independent sets on hypergraphs have been applied to routing problems [2] and determining satisfiability of boolean formulas [48].

Although there are many applications of hypergraphs, there has been little research on parallel hypergraph processing. The main contribution of this paper is a suite of efficient parallel hypergraph algorithms, including algorithms for betweenness centrality, maximal independent set, k-core decomposition, hypertrees, hyperpaths, connected components, PageRank, and single-source shortest paths. For these problems, we provide either new parallel hypergraph algorithms (e.g., betweenness centrality and k-core decomposition) or more efficient implementations than prior work. Additionally, we show that most of our algorithms are theoretically-efficient in terms of their work and depth complexities.

We observe that our parallel hypergraph algorithms can be implemented efficiently by taking advantage of graph processing machinery. To implement our parallel hypergraph algorithms, we made relatively simple extensions to the Ligra graph processing framework [81], and we call the extended framework Hygra. As with Ligra, Hygra is well-suited for frontier-based algorithms, where small subsets of elements (referred to as frontiers) are processed in parallel on each iteration. We use a bipartite graph representation to store hypergraphs, and use Ligra's data structures for representing subsets of vertices and hyperedges, as well as operators for mapping application-specific functions over these elements. The operators for processing subsets of vertices and hyperedges are theoretically-efficient, which enables us to implement parallel hypergraph algorithms with strong theoretical guarantees. Separating the operations on vertices from operations on hyperedges is crucial for efficiency, and requires carefully defining functions for vertices and hyperedges to preserve correctness. Hygra inherits from Ligra various optimizations developed for graphs, including switching between different traversal strategies based on the size of the frontier (direction optimization), edge-aware parallelization, bucketing for prioritizing the processing of vertices, and compression.
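As a rough illustration of the frontier-based style just described, the sketch below computes a hypertree (breadth-first distances) by alternating between a frontier of vertices and a frontier of hyperedges over the bipartite representation from the previous sketch. The function and variable names are ours for exposition, and the loops are written sequentially; Hygra's actual operators and their parallel, direction-optimizing implementation are introduced in Section 4.

```cpp
#include <cstdio>
#include <vector>

// Same bipartite representation as in the previous sketch.
struct Hypergraph {
  std::vector<std::vector<int>> vertexToHyperedges;
  std::vector<std::vector<int>> hyperedgeToVertices;
};

// Hypertree (BFS) from a source vertex. Each round maps the current vertex
// frontier to the unvisited hyperedges it touches, then maps that hyperedge
// frontier back to the unvisited vertices they contain. A frontier-based
// framework would run both maps in parallel and switch between a sparse
// (push) and a dense (pull) traversal based on the frontier size.
std::vector<int> hypertree(const Hypergraph& h, int source) {
  std::vector<int> vertexDist(h.vertexToHyperedges.size(), -1);
  std::vector<bool> hyperedgeVisited(h.hyperedgeToVertices.size(), false);
  std::vector<int> vertexFrontier = {source};
  vertexDist[source] = 0;
  int round = 0;
  while (!vertexFrontier.empty()) {
    // Phase 1: vertex frontier -> frontier of newly reached hyperedges.
    std::vector<int> hyperedgeFrontier;
    for (int v : vertexFrontier)
      for (int e : h.vertexToHyperedges[v])
        if (!hyperedgeVisited[e]) {
          hyperedgeVisited[e] = true;
          hyperedgeFrontier.push_back(e);
        }
    // Phase 2: hyperedge frontier -> frontier of newly reached vertices.
    std::vector<int> nextVertexFrontier;
    round++;
    for (int e : hyperedgeFrontier)
      for (int v : h.hyperedgeToVertices[e])
        if (vertexDist[v] == -1) {
          vertexDist[v] = round;
          nextVertexFrontier.push_back(v);
        }
    vertexFrontier = std::move(nextVertexFrontier);
  }
  return vertexDist;  // -1 marks vertices unreachable from the source
}

int main() {
  // The Figure 1 example: e0 = {v0,v1,v2}, e1 = {v1,v2,v3}, e2 = {v0,v3}.
  Hypergraph h;
  h.hyperedgeToVertices = {{0, 1, 2}, {1, 2, 3}, {0, 3}};
  h.vertexToHyperedges = {{0, 2}, {0, 1}, {0, 1}, {1, 2}};
  std::vector<int> d = hypertree(h, 0);  // yields distances 0, 1, 1, 1
  for (size_t v = 0; v < d.size(); v++) std::printf("v%zu: %d\n", v, d[v]);
  return 0;
}
```

The two phases reflect the separation of vertex operations from hyperedge operations noted above: the function applied when visiting hyperedges need not be the same as the one applied when visiting vertices.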
Our experiments on a variety of real-world and synthetic hypergraphs show that our algorithms implemented in Hygra achieve good parallel speedup and scalability with respect to input size. On 72 cores with hyper-threading, we achieve a parallel speedup of between 8.5–76.5x. We also find that the direction optimization improves performance for hypergraph algorithms compared to using a single traversal strategy. Compared to HyperX [46] and MESH [41], which are the only existing high-level programming frameworks for hypergraph processing that we are aware of, our results are significantly faster. For example, one iteration of PageRank on the Orkut community hypergraph with 2.3 million vertices and 15.3 million hyperedges [59] takes 0.083s on 72 cores and 3.31s on one thread in Hygra, while taking [...] usage and performance is significantly worse than that of Hygra (2.8x–30.6x slower while using 235x more space on the Friendster hypergraph).

Our work shows that high-performance hypergraph processing can be done using just a single multicore machine, on which we can process all existing publicly-available hypergraphs. Prior work has shown that graph processing can be done efficiently on just a single multicore machine (e.g., [24, 25, 67, 69, 81, 87, 96, 101]), and this work extends the observation to hypergraphs.

The rest of the paper is organized as follows. Section 2 discusses related work on graph and hypergraph processing. Section 3 describes hypergraph notation as well as the computational model and parallel primitives that we use in the paper. Section 4 introduces the Hygra framework. Section 5 describes our new parallel hypergraph algorithms implemented using Hygra. Section 6 presents our experimental evaluation of our algorithms and comparisons with alternative approaches. Finally, we conclude in Section 7.

2 Related Work

Graph Processing. There has been significant work on developing graph libraries and frameworks to reduce programming effort by providing high-level operators that capture common algorithm design patterns (e.g., [19, 21, 22, 28, 33–35, 38–40, 42, 47, 51, 56, 60, 61, 63–65, 68, 69, 71–73, 75, 78, 81, 84–87, 91, 94, 96, 99, 101], among many others; see [66, 79, 93] for surveys). Many of these frameworks can process large graphs efficiently, but none of them directly support hypergraph processing.

Hypergraph Processing. As far as we know, HyperX [46] and MESH [41] are the only existing high-level programming frameworks for hypergraph processing, and they are built for distributed memory on top of Spark [95]. Algorithms are written using hyperedge programs and vertex programs that are iteratively applied on hyperedges and vertices, respectively. HyperX stores hypergraphs using fixed-length tuples containing vertex and hyperedge identifiers, and MESH stores the hypergraph as a bipartite graph. HyperX includes algorithms for random walks, label propagation, and spectral learning. MESH includes PageRank, PageRank-Entropy (a variant of PageRank that also computes the entropy of vertex ranks