Hypergraph Motifs: Concepts, Algorithms, and Discoveries

Geon Lee (KAIST AI) [email protected]    Jihoon Ko (KAIST AI) [email protected]    Kijung Shin (KAIST AI & EE) [email protected]

ABSTRACT
Hypergraphs naturally represent group interactions, which

are omnipresent in many domains: collaborations of researchers, co-purchases of items, joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions in a systematic manner:

(Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare local structures of hypergraphs of different sizes? (Q3) how can we identify the domains which hypergraphs are from? We first define hypergraph motifs (h-motifs), which describe the connectivity patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its occurrences relative to those in properly randomized hypergraphs. Lastly, we define the characteristic profile (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that h-motifs' occurrences in 11 real-world hypergraphs from 5 domains are clearly distinguished from those of randomized hypergraphs. In addition, we demonstrate that CPs capture local structural patterns unique to each domain, and thus comparing CPs of hypergraphs addresses Q2 and Q3. Our algorithmic contribution is to propose MoCHy, a family of parallel algorithms for counting h-motifs' occurrences in a hypergraph. We theoretically analyze their speed and accuracy, and we show empirically that the advanced approximate version MoCHy-A+ is up to 25× more accurate and 32× faster than the basic approximate and exact versions, respectively.

Figure 1: Distributions of h-motifs' instances precisely characterize local structural patterns of real-world hypergraphs (top: co-authorship datasets coauth-DBLP, coauth-geology, coauth-history; bottom: tags datasets tags-ubuntu, tags-math; y-axis: normalized significance, x-axis: hypergraph motif index). Note that the hypergraphs from the same domains have similar distributions, while the hypergraphs from different domains do not. See Section 4.3 for details.

PVLDB Reference Format: Geon Lee, Jihoon Ko, and Kijung Shin. Hypergraph Motifs: Concepts, Algorithms, and Discoveries. PVLDB, 13(11): 2256-2269, 2020. DOI: https://doi.org/10.14778/3407790.3407823

arXiv:2003.01853v2 [cs.SI] 19 Jul 2020

1. INTRODUCTION
Complex systems consisting of pairwise interactions between individuals or objects are naturally expressed in the form of graphs. Nodes and edges, which compose a graph, represent individuals (or objects) and their pairwise interactions, respectively. Thanks to their powerful expressiveness, graphs have been used in a wide variety of fields, including social network analysis, the web, bioinformatics, and epidemiology. Global structural patterns of real-world graphs, such as the power-law degree distribution [10, 24] and six degrees of separation [33, 66], have been extensively investigated.

In addition to global patterns, real-world graphs exhibit patterns in their local structures, which differentiate graphs in the same domain from random graphs or those in other domains. Local structures are revealed by counting the occurrences of different network motifs [45, 46], which describe the patterns of pairwise interactions between a fixed number of connected nodes (typically 3, 4, or 5 nodes). As a fundamental building block, network motifs have played a key role in many analytical and predictive tasks, including community detection [13, 43, 62, 68], classification [20, 39, 45], and anomaly detection [11, 57].

Despite the prevalence of graphs, interactions in many complex systems are groupwise rather than pairwise: collaborations of researchers, co-purchases of items, joint interactions of proteins, and tags attached to the same web post, to name a few. These group interactions cannot be represented by edges in a graph. Suppose three or more researchers coauthor a publication.
This co-authorship cannot be rep- resented as a single edge, and creating edges between all This work is licensed under the Creative Commons Attribution- NonCommercial-NoDerivatives 4.0 International License. To view a copy pairs of the researchers cannot be distinguished from multi- of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For ple papers coauthored by subsets of the researchers. any use beyond those covered by this license, obtain permission by emailing This inherent limitation of graphs is addressed by hyper- [email protected]. Copyright is held by the owner/author(s). Publication rights graphs, which consist of nodes and hyperedges. Each hyper- licensed to the VLDB Endowment. edge is a subset of any number of nodes, and it represents a Proceedings of the VLDB Endowment, Vol. 13, No. 11 group interaction among the nodes. For example, the coau- ISSN 2150-8097. DOI: https://doi.org/10.14778/3407790.3407823 thorship relations in Figure 2(a) are naturally represented as Authors (Nodes) �� �� �� �� �� Jure Leskovec (L) Austin Benson (B) Three Jon Kleinberg (K) David Gleich (G) Connected �� Christos Faloutsos (F) Timos Sellis (S) B G Hyperedges �� �� � � � � Daniel Huttenlocher (H) Nick Roussopoulos (R) � � � � (instances) �� Publications (Hyperedges) L � � H K F ��: [(L,K,F) KDD’05] R 1 1 1 �� �� Hypergraph ��: [(L,H,K) WWW’10] S 4 5 4 5 7 7 4 7 5 Motifs � : [(B,G,L) Science’16] �� � 2 6 3 2 6 3 2 6 3 (h-motifs) ��: [(S,R,F) VLDB’87] (a) Example data: (b) Hypergraph (c) Projected (d) H-motifs and instances coauthorship relations representation graph Figure 2: (a) Example: co-authorship relations. (b) Hypergraph: the hypergraph representation of (a). (c) Projected Graph: the projected graph of (b). (d) Hypergraph Motifs: example h-motifs and their instances in (b). Table 1: Frequently-used symbols. • Hypergraphs from the same domains have similar CPs, Notation Definition while hypergraphs from different domains have distinct G = (V,E) hypergraph with nodes V and hyperedges E CPs (see Figure 1). In other words, CPs successfully cap- E = {e1, ..., e|E|} set of hyperedges ture local structure patterns unique to each domain. Ev set of hyperedges that contains a node v Our algorithmic contribution is to design MoCHy (Motif ∧ set of hyperwedges in G Counting in Hypergraphs), a family of parallel algorithms ∧ij hyperwedge consisting of ei and ej for counting h-motifs’ instances, which is the computational G¯ = (E, ∧, ω) projected graph of G bottleneck of the aforementioned process. Note that since ω(∧ij ) the number of nodes shared between ei and ej non-pairwise interactions are taken into consideration, count- N set of neighbors of e in G¯ ei i ing the instances of h-motifs is more challenging than count- h({ei, ej , ek}) h-motif corresponding to an instance {ei, ej , ek} M[t] count of h-motif t’s instances ing the instances of network motifs, which are defined solely based on pairwise interactions. We provide one exact ver- the hypergraph in Figure 2(b). In the hypergraph, seminar sion, named MoCHy-E, and two approximate versions, named work [40] coauthored by Jure Leskovec (L), Jon Kleinberg MoCHy-A and MoCHy-A+. Empirically, MoCHy-A+ is up to (K), and Christos Faloutsos (F) is expressed as the hyper- 25× more accurate than MoCHy-A, and it is up to 32× faster edge e1 = {L, K, F }, and it is distinguished from three pa- than MoCHy-E, with little sacrifice of accuracy. 
These em- pers coauthored by each pair, which, if they exist, can be pirical results are consistent with our theoretical analyses. represented as three hyperedges {K,L}, {F,L}, and {F,K}. In summary, our contributions are summarized as follow: The successful investigation and discovery of local struc- tural patterns in real-world graphs motivate us to explore • Novel Concepts: We propose h-motifs, the counts of local structural patterns in real-world hypergraphs. How- whose instances capture local structures of hypergraphs, ever, network motifs, which proved to be useful for graphs, independently of the sizes of hyperedges or hypergraphs. are not trivially extended to hypergraphs. Due to the flexi- • Fast and Provable Algorithms: We develop MoCHy, bility in the size of hyperedges, there can be infinitely many a family of parallel algorithms for counting h-motifs’ in- patterns of interactions among a fixed number of nodes, and stances. We show theoretically and empirically that the other nodes can also be associated with these interactions. advanced version significantly outperforms the basic ones, In this work, taking these challenges into consideration, providing a better trade-off between speed and accuracy. we define 26 hypergraph motifs (h-motifs) so that they de- • Discoveries in 11 Real-world Hypergraphs: We show scribe connectivity patterns of three connected hyperedges that h-motifs and CPs reveal local structural patterns that (rather than nodes). As seen in Figure 2(d), h-motifs de- are shared by hypergraphs from the same domains but scribe the connectivity pattern of hyperedges e1, e2, and e3 distinguished from those of random hypergraphs and hy- by the emptiness of seven subsets: e1 \ e2 \ e3, e2 \ e3 \ e1, pergraphs from other domains (see Figure 1). e3\e1\e2, e1∩e2\e3, e2∩e3\e1, e3∩e1\e2, and e1∩e2∩e3. As Reproducibility: The code and datasets used in this work a result, every connectivity pattern is described by a unique are available at https://github.com/geonlee0325/MoCHy. h-motif, independently of the sizes of hyperedges. While this In Section 2, we introduce h-motifs and characteristic pro- work focuses on connectivity patterns of three hyperedges, files. In Section 3, we present exact and approximate algo- h-motifs are easily extended to four or more hyperedges. rithms for counting instances of h-motifs, and we analyze We count the number of each h-motif’s instances in 11 their theoretical properties. In Section 4, we provide exper- real-world hypergraphs from 5 different domains. Then, imental results. After discussing related work in Section 5, we measure the significance of each h-motif in each hyper- we offer conclusions in Section 6. graph by comparing the count of its instances in the hy- pergraph against the counts in properly randomized hyper- 2. PROPOSED CONCEPTS graphs. Lastly, we compute the characteristic profile (CP) of In this section, we introduce the proposed concepts: hy- each hypergraph, defined as the vector of the normalized sig- pergraph motifs and characteristic profiles. Refer Table 1 nificance of every h-motif. Comparing the counts and CPs for the notations frequently used throughout the paper. of different hypergraphs leads to the following observations: 2.1 Preliminaries and Notations • Structural design principles of real-world hypergraphs that We define some preliminary concepts and their notations. 
are captured by frequencies of different h-motifs are clearly Hypergraph Consider a hypergraph G = (V,E), where V distinguished from those of randomized hypergraphs. and E := {e1, e2, ..., e|E|} are sets of nodes and hyperedges, h-motif 1 h-motif 2 h-motif 3 h-motif 4 h-motif 5 h-motif 6 h-motif 7 h-motif 8 h-motif 9 h-motif 10 h-motif 11 h-motif 12 h-motif 13

h-motif 14 h-motif 15 h-motif 16 h-motif 17 h-motif 18 h-motif 19 h-motif 20 h-motif 21 h-motif 22 h-motif 23 h-motif 24 h-motif 25 h-motif 26 Figure 3: The 26 h-motifs studied in this work. Note that h-motifs 17 - 22 are open, while the others are closed. instances is used to characterize the local structure of G, as discussed in the following sections. A h-motif is closed if all three hyperedges in its instances are adjacent to (i.e., Figure 4: The h-motifs whose instances contain duplicated overlapped with) each other. If its instances contain two hyperedges. non-adjacent (i.e., disjoint) hyperedges, a h-motif is open. respectively. Each hyperedge ei ∈ E is a non-empty subset In Figure 3, h-motifs 17 - 22 are open; the others are closed. of V , and we use |ei| to denote the number of nodes in it. For Properties of h-motifs: From the definition of h-motifs, each node v ∈ V , we use Ev := {ei ∈ E : v ∈ ei} to denote the following desirable properties are immediate: the set of hyperedges that include v. We say two hyperedges • Exhaustive: h-motifs capture connectivity patterns of ei and ej are adjacent if they share any member, i.e., if all possible three connected hyperedges. e ∩ e 6= . Then, for each hyperedge e , we denote the set i j ∅ i • Unique: connectivity pattern of any three connected hy- of hyperedges adjacent to e by N := {e ∈ E : e ∩e 6= } i ei j i j ∅ peredges is captured by at most one h-motif. and the number of such hyperedges by |Ne |. Similarly, we i • Size Independent: h-motifs capture connectivity pat- say three hyperedges ei, ej , and ek are connected if one of them is adjacent to two the others. terns independently of the sizes of hyperedges. Note that there can be infinitely many combinations of sizes of three Hyperwedges: We define a hyperwedge as an unordered connected hyperedges. pair of adjacent hyperedges. We denote the set of hyper- E wedges in G by ∧ := {{ei, ej } ∈ 2 : ei ∩ ej 6= ∅}. We use Note that the exhaustiveness and the uniqueness imply that ∧ij ∈ ∧ to denote the hyperwedge consisting of ei and ej . connectivity pattern of any three connected hyperedges is In the example hypergraph in Figure 2(b), there are four captured by exactly one h-motif. hyperwedges: ∧12, ∧13, ∧23, and ∧14. Why Non-pairwise Relations?: Non-pairwise relations Projected Graph: We define the projected graph of G = (e.g., the emptiness of e1 ∩ e2 ∩ e3 and e1 \ e2 \ e3) play a key (V,E) as G¯ = (E, ∧, ω) where ∧ is the set of hyperwedges role in capturing the local structural patterns of real-world and ω(∧ij ) := |ei ∩ ej |. That is, in the projected graph G¯, hypergraphs. Taking only the pairwise relations (e.g., the hyperedges in G act as nodes, and two of them are adjacent emptiness of e1 ∩ e2, e1 \ e2, and e2 \ e1) into account limits if and only if they share any member. Note that for each the number of possible connectivity patterns of three dis- ¯ 1 hyperedge ei ∈ E, Nei is the set of neighbors of ei in G, tinct hyperedges to just eight, significantly limiting their ¯ and |Nei | is its degree in G. Figure 2(c) shows the projected expressiveness and thus usefulness. Specifically, 12 (out of graph of the example hypergraph in Figure 2(b). 26) h-motifs have the same pairwise relations, while their oc- currences and significances vary substantially in real-world 2.2 Hypergraph Motifs hypergraphs. 
For example, in Figure 2, {e1, e2, e4} and {e1, e3, e4} have the same pairwise relations, while their con- We introduce hypergraph motifs, which are basic build- nectivity patterns are distinguished by h-motifs. ing blocks of hypergraphs, with related concepts. Then, we Generalization to More than 3 Hyperedges: The con- discuss their properties and generalization. cept of h-motifs is easily generalized to four or more hyper- Definition and Representation: Hypergraph motifs (or edges. For example, a h-motif for four hyperedges can be h-motifs in short) are for describing the connectivity pat- defined as a binary vector of size 15 indicating the emptiness terns of three connected hyperedges. Specifically, given a set of each region in the Venn diagram for four sets. After ex- {ei, ej , ek} of three connected hyperedges, h-motifs describe cluding disconnected ones, symmetric ones, and those with its connectivity pattern by the emptiness of the following duplicated hyperedges, there remain 1, 853 and 18, 656, 322 seven sets: (1) ei \ ej \ ek, (2) ej \ ek \ ei, (3) ek \ ei \ ej , (4) h-motifs for four and five hyperedges, respectively, as dis- ei ∩ej \ek, (5) ej ∩ek \ei, (6) ek ∩ei \ej , and (7) ei ∩ej ∩ek. cussed in detail in Appendix F of [1]. This work focuses on Formally, a h-motif is defined as a binary vector of size 7 the h-motifs for three hyperedges, which are already capable whose elements represent the emptiness of the above sets, of characterizing local structures of real-world hypergraphs, respectively, and as seen in Figure 2(d), h-motifs are natu- as shown empirically in Section 4. rally represented in the Venn diagram. While there can be 27 h-motifs, 26 h-motifs remain once we exclude symmetric 2.3 Characteristic Profile (CP) ones, those with duplicated hyperedges (see Figure 4), and What are the structural design principles of real-world hy- those cannot be obtained from connected hyperedges. The pergraphs distinguished from those of random hypergraphs? 26 cases, which we call h-motif 1 through h-motif 26, are Below, we introduce the characteristic profile (CP), which visualized in the Venn diagram in Figure 3. is a tool for answering the above question using h-motifs. Instances, Open h-motifs, and Closed h-motifs: Con- Randomized Hypergraphs: While one might try to char- sider a hypergraph G = (V,E). A set of three connected hy- acterize the local structure of a hypergraph by absolute 1 peredges is an instance of h-motif t if their connectivity pat- Note that using the conventional network motifs in projected graphs tern corresponds to h-motif t. The count of each h-motif’s limits this number to two. counts of each h-motif’s instances in it, some h-motifs may Algorithm 1: Hypergraph Projection (Preprocess) naturally have many instances. Thus, for more accurate Input : input hypergraph: G = (V,E) characterization, we need random hypergraphs to be com- Output: projected graph: G¯ = (E, ∧, ω) pared against real-world hypergraphs. We obtain such ran- dom hypergraphs by randomizing the compared real-world 1 ∧ ← ∅ hypergraph. To this end, we represent the hypergraph G = 2 ω ← map whose default value is 0 (V,E) as the G0 where V and E are the two 3 for each hyperedge ei ∈ E do subsets of nodes, and there exists an edge between v ∈ V 4 for each node v ∈ ei do 0 0 0 and e ∈ E if and only if v ∈ e That is, G = (V ,E ) where 5 for each hyperedge ej ∈ Ev where j > i do 0 0 V := V ∪ E and E := {(v, e) ∈ V × E : v ∈ e}. 
Then, we 6 ∧ ← ∧ ∪ {∧ij } use the Chung-Lu model to generate bipartite graphs where 7 ω(∧ij ) = ω(∧ij ) + 1 0 the of G is well preserved [6]. Reversely, 8 return G¯ = (E, ∧, ω) from each of the generated bipartite graphs, we can obtain a randomized hypergraph where the degree (i.e., the num- Remarks: The problem of counting h-motifs’ occurrences ber of hyperedges that each node belongs to) distribution of bears some similarity to the classic problem of counting net- nodes and the size distribution of hyperedges in G are well work motifs’ occurrences. However, different from network preserved. We provide the pseudocode and the distributions motifs, which are defined solely based on pairwise interac- in the randomized hypergraphs in Appendix D of [1]. tions, h-motifs are defined based on triple-wise interactions Significance of H-motifs: We measure the significance of (e.g., ei ∩ ej ∩ ek). One might hypothesize that our problem each h-motif in a hypergraph by comparing the count of can easily be reduced to the problem of counting the occur- its instances against the count of them in randomized hy- rences of network motifs, and thus existing solutions (e.g., pergraphs. Specifically, the significance of a h-motif t in a [18, 50]) are applicable to our problem. In order to examine hypergraph G is defined as this possibility, we consider the following two attempts: M[t] − Mrand[t] ∆t := , (1) (a) Represent pairwise relations between hyperedges using M[t] + Mrand[t] +  the projected graph, where each edge {ei, ej } indicates where M[t] is the number of instances of h-motif t in G, ei ∩ ej 6= ∅. and Mrand[t] is the average number of instances of h-motif (b) Represent pairwise relations between hyperedges using t in randomized hypergraphs. We fixed  to 1 throughout the directed projected graph where each directed edge this paper. This way of measuring significance was proposed ei → ej indicates ei ∩ej 6= ∅ and at the same time ei 6⊂ ej . for network motifs as an alternative of normalized Z scores, which heavily depend on the graph size [45]. The number of possible connectivity patterns (i.e., network Characteristic Profile (CP): By normalizing and con- motifs) among three distinct connected hyperedges is just catenating the significances of all h-motifs in a hypergraph, two (i.e., closed and open triangles) and eight in (a) and (b), we obtain the characteristic profile (CP), which summarizes respectively. In both cases, instances of multiple h-motifs the local structural pattern of the hypergraph. Specifically, are not distinguished by network motifs, and the occurrences the characteristic profile of a hypergraph G is a vector of of h-motifs can not be inferred from those of network motifs. size 26, where each t-th element is In addition, another computational challenge stems from the fact that triple-wise and even pair-wise relations between ∆t CPt := q . (2) hyperedges need to be computed from the input hypergraph, P26 2 t=1 ∆t while pairwise relations between edges are given in graphs. This challenge necessitates the precomputation of partial Note that, for each t, CPt is between −1 and 1. The CP is used in Section 4.3 to compare the local structural patterns relations, described in the next subsection. of real-world hypergraphs from diverse domains. 3.1 Hypergraph Projection (Algorithm 1) As a preprocessing step, every version of MoCHy builds 3. 
PROPOSED ALGORITHMS the projected graph G¯ = (E, ∧, ω) (see Section 2.1) of the Given a hypergraph, how can we count the instances of input hypergraph G = (V,E), as described in Algorithm 1. each h-motif? Once we count them in the original and ran- To find the neighbors of each hyperedge ei (line 3), the algo- domized hypergraphs, the significance of each h-motif and rithm visits each hyperedge ej that contains v and satisfies the CP are obtained immediately by Eq. (1) and Eq. (2). j > i (line 5) for each node v ∈ ei (line 4). Then for each In this section, we present MoCHy (Motif Counting in such ej , it adds ∧ij = {ei, ej } to ∧ and increments ω(∧ij ) Hypergraphs), which is a family of parallel algorithms for (lines 6 and 7). The time complexity of this preprocessing counting the instances of each h-motif in the input hyper- step is given in Lemma 1. graph. We first describe hypergraph projection, which is Lemma 1 (Complexity of Hypergraph Projection). The a preprocessing step of every version of MoCHy. Then, P time complexity of Algorithm 1 is O( |ei ∩ ej |). we present MoCHy-E, which is for exact counting. After ∧ij ∈∧ that, we present two different versions of MoCHy-A, which Proof. If all sets and maps are implemented using hash ta- are sampling-based algorithms for approximate counting. bles, lines 6 and 7 take O(1) time, and they are executed Lastly, we discuss parallel and on-the-fly implementations. |ei ∩ ej | times for each ∧ij ∈ ∧. Throughout this section, we use h({ei, ej , ek}) to denote P Since | ∧ | < e ∈E |Nei | and |ei ∩ ej | ≤ |ei|, Eq. (3) holds. the h-motif that describes the connectivity pattern of an h- i motif instance {e , e , e }. We also use M[t] to denote the X X i j k |ei ∩ ej | < (|ei| · |Nei |). (3) count of instances of h-motif t. ∧ij ∈∧ ei∈E Algorithm 2: MoCHy-E: Exact H-motif Counting Algorithm 3: MoCHy-EENUM for H-motif Enumeration Input : (1) input hypergraph: G = (V,E) Input : (1) input hypergraph: G = (V,E) (2) projected graph: G¯ = (E, ∧, ω) (2) projected graph: G¯ = (E, ∧, ω) Output: exact count of each h-motif t’s instances: M[t] Output: h-motif instances and their corresponding h-motifs

1 M ← map whose default value is 0 1 for each hyperedge ei ∈ E do Ne  2 for each hyperedge e ∈ E do 2 for each unordered hyperedge pair {e , e } ∈ i do i j k 2 Ne  3 for each unordered hyperedge pair {e , e } ∈ i do 3 j k 2 if ej ∩ ek = ∅ or i < min(j, k) then 4 if ej ∩ ek = ∅ or i < min(j, k) then 4 write(ei, ej , ek, h({ei, ej , ek})) 5 M[h({ei, ej , ek})] += 1

6 return M Algorithm 4: MoCHy-A: Approximate H-motif Counting 3.2 Exact H-motif Counting (Algorithm 2) Based on Hyperedge Sampling We present MoCHy-E (MoCHy Exact), which counts the Input : (1) input hypergraph: G = (V,E) instances of each h-motif exactly. The procedures of MoCHy- (2) projected graph: G¯ = (E, ∧, ω) E are described in Algorithm 2. For each hyperedge ei ∈ E (3) number of samples: s (line 2), each unordered pair {e , e } of its neighbors, where j k Output: estimated count of each h-motif t’s instances: M¯ [t] {ei, ej , ek} is an h-motif instance, is considered (line 3). If 1 M¯ [t] ← map whose default value is 0 ej ∩ ek = ∅ (i.e., if the corresponding h-motif is open), 2 for n ← 1...s do {ei, ej , ek} is considered only once. However, if ej ∩ ek 6= ∅ 3 ei ← sample a uniformly random hyperedge (i.e., if the corresponding h-motif is closed), {ei, ej , ek} is 4 for each hyperedge e ∈ Ne do considered two more times (i.e., when ej is chosen in line 2 j i 5 and when ek is chosen in line 2). Based on these observa- for each hyperedge ek ∈ (Nei ∪ Nej \{ei, ej }) do tions, given an h-motif instance {ei, ej , ek}, the correspond- 6 if ek 6∈ Nei or j < k then ¯ ing count M[h({ei, ej , ek})] is incremented (line 5) only if 7 M[h({ei, ej , ek})] += 1 ej ∩ ek = ∅ or i < min(j, k) (line 4). This guarantees that 8 for each h-motif t do |E| each instance is counted exactly once. The time complexity 9 M¯ [t] ← M¯ [t] · of MoCHy-E is given in Theorem 1, which uses Lemma 2. 3s 10 return M¯
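To make the pattern computation concrete, the following minimal C++ sketch (our own illustration, not the released MoCHy implementation; all names are ours) derives the seven region sizes of three hyperedges from their cardinalities and pairwise/triple intersection sizes, in the spirit of Lemma 2 below, and packs their emptiness into a 7-bit pattern:

```cpp
#include <cstdio>
#include <unordered_set>
#include <vector>

using Hyperedge = std::vector<int>;  // node ids of one hyperedge

// Number of nodes shared by two hyperedges.
static int intersection_size(const Hyperedge& a, const Hyperedge& b) {
    std::unordered_set<int> set_a(a.begin(), a.end());
    int cnt = 0;
    for (int v : b) cnt += set_a.count(v);
    return cnt;
}

// Number of nodes shared by all three hyperedges.
static int triple_intersection_size(const Hyperedge& a, const Hyperedge& b,
                                    const Hyperedge& c) {
    std::unordered_set<int> set_b(b.begin(), b.end()), set_c(c.begin(), c.end());
    int cnt = 0;
    for (int v : a) cnt += (set_b.count(v) && set_c.count(v));
    return cnt;
}

// 7-bit emptiness pattern of {e_i, e_j, e_k}: bit t is 1 iff region t is non-empty.
// Regions (Section 2.2): ei\ej\ek, ej\ek\ei, ek\ei\ej,
// ei∩ej\ek, ej∩ek\ei, ek∩ei\ej, ei∩ej∩ek.
static int hmotif_pattern(const Hyperedge& ei, const Hyperedge& ej,
                          const Hyperedge& ek) {
    int ij = intersection_size(ei, ej), jk = intersection_size(ej, ek),
        ki = intersection_size(ek, ei), ijk = triple_intersection_size(ei, ej, ek);
    int region[7] = {
        (int)ei.size() - ij - ki + ijk,  // |ei \ ej \ ek|
        (int)ej.size() - ij - jk + ijk,  // |ej \ ek \ ei|
        (int)ek.size() - ki - jk + ijk,  // |ek \ ei \ ej|
        ij - ijk, jk - ijk, ki - ijk,    // pairwise-only regions
        ijk };                           // |ei ∩ ej ∩ ek|
    int pattern = 0;
    for (int t = 0; t < 7; ++t)
        if (region[t] > 0) pattern |= (1 << t);
    return pattern;
}

int main() {
    // Hyperedges of Figure 2(b); nodes L,K,F,H,B,G,S,R mapped to 0..7.
    Hyperedge e1 = {0, 1, 2}, e2 = {0, 3, 1}, e3 = {4, 5, 0}, e4 = {6, 7, 2};
    std::printf("pattern(e1,e2,e3) = %d\n", hmotif_pattern(e1, e2, e3));
    std::printf("pattern(e1,e3,e4) = %d\n", hmotif_pattern(e1, e3, e4));
    return 0;
}
```

Patterns that differ only by a permutation of the roles of the three hyperedges describe the same h-motif; mapping a pattern to one of the 26 canonical indices of Figure 3 can be done, for example, by taking the smallest pattern over the six orderings.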

Lemma 2 (Time Complexity of Computing h({ei, ej , ek})). Given the input hypergraph G = (V,E) and its projected time. Thus, the total time complexity of Algorithm 2 is P 2 ¯ O( (|ei|·|Ne | )), which dominates that of the prepro- graph G = (E, ∧, ω), for each h-motif instance {ei, ej , ek}, ei∈E i computing h({ei, ej , ek}) takes O(min(|ei|, |ej |, |ek|)) time. cessing step (see Lemma 1 and Eq. (3)).
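The enumeration analyzed in Theorem 1 can be sketched as follows; this illustrative, single-threaded rendering of Algorithm 2 (again with our own names, not the authors' code) only separates open from closed instances, and plugging in a full pattern function such as the sketch above yields the per-h-motif counts M[t]:

```cpp
#include <algorithm>
#include <cstdio>
#include <set>
#include <unordered_set>
#include <vector>

using Hyperedge = std::vector<int>;

static bool disjoint(const Hyperedge& a, const Hyperedge& b) {
    std::unordered_set<int> s(a.begin(), a.end());
    for (int v : b) if (s.count(v)) return false;
    return true;
}

int main() {
    // Hyperedges of Figure 2(b): nodes L,K,F,H,B,G,S,R mapped to 0..7.
    std::vector<Hyperedge> E = {{0, 1, 2}, {0, 3, 1}, {4, 5, 0}, {6, 7, 2}};
    const int m = (int)E.size();

    // Projected graph (Algorithm 1): N[i] = hyperedges adjacent to e_i.
    std::vector<std::set<int>> N(m);
    for (int i = 0; i < m; ++i)
        for (int j = i + 1; j < m; ++j)
            if (!disjoint(E[i], E[j])) { N[i].insert(j); N[j].insert(i); }

    long long open_cnt = 0, closed_cnt = 0;
    for (int i = 0; i < m; ++i) {
        std::vector<int> nb(N[i].begin(), N[i].end());
        // Each unordered pair {e_j, e_k} of neighbors of e_i forms an instance
        // {e_i, e_j, e_k}. Open instances are seen only from their "center" e_i,
        // while closed instances are seen from all three hyperedges, hence the
        // i < min(j, k) filter counts every instance exactly once (Algorithm 2, line 4).
        for (size_t a = 0; a < nb.size(); ++a)
            for (size_t b = a + 1; b < nb.size(); ++b) {
                int j = nb[a], k = nb[b];
                bool open = disjoint(E[j], E[k]);
                if (open || i < std::min(j, k)) {
                    // Here one would compute h({e_i, e_j, e_k}) and increment M[h].
                    if (open) ++open_cnt; else ++closed_cnt;
                }
            }
    }
    std::printf("open: %lld, closed: %lld\n", open_cnt, closed_cnt);
    return 0;
}
```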

Proof. Assume |ei| = min(|ei|, |ej |, |ek|), without loss of gen- erality, and all sets and maps are implemented using hash Extension of MoCHy-E to H-motif Enumeration: tables. As defined in Section 2.2, h({ei, ej , ek}) is computed Since MoCHy-E visits all h-motif instances to count them, in O(1) time from the emptiness of the following sets: (1) it is extended to the problem of enumerating every h-motif ei \ ej \ ek, (2) ej \ ek \ ei, (3) ek \ ei \ ej , (4) ei ∩ ej \ ek, (5) instance (with its corresponding h-motif), as described in ej ∩ek \ei, (6) ek ∩ei \ej , and (7) ei ∩ej ∩ek. We check their Algorithm 3. The time complexity remains the same. emptiness from their cardinalities. We obtain ei, ej , and ek, which are stored in G, and their cardinalities in O(1) time. 3.3 Approximate H-motif Counting Similarly, we obtain |ei ∩ej |, |ej ∩ek|, and |ek ∩ei|, which are We present two different versions of MoCHy-A (MoCHy ¯ stored in G, in O(1) time. Then, we compute |ei ∩ ej ∩ ek| Approximate), which approximately count the instances of in O(|ei|) time by checking for each node in ei whether it is each h-motif. Both versions yield unbiased estimates of the also in both ej and ek. From these cardinalities, we obtain counts by exploring the input hypergraph partially through the cardinalities of the six other sets in O(1) time as follows: hyperedge and hyperwedge sampling, respectively.

(1) |ei \ ej \ ek| = |ei| − |ei ∩ ej | − |ek ∩ ei| + |ei ∩ ej ∩ ek|, MoCHy-A: Hyperedge Sampling (Algorithm 4): MoCHy-A (Algorithm 4) is based on hyperedge sampling. It (2) |e \ e \ e | = |e | − |e ∩ e | − |e ∩ e | + |e ∩ e ∩ e |, j k i j i j j k i j k repeatedly samples s hyperedges from the hyperedge set E (3) |ek \ ei \ ej | = |ek| − |ek ∩ ei| − |ej ∩ ek| + |ei ∩ ej ∩ ek|, uniformly at random with replacement (line 3). For each (4) |ei ∩ ej \ ek| = |ei ∩ ej | − |ei ∩ ej ∩ ek|, sampled hyperedge ei, the algorithm searches for all h-motif instances that contain ei (lines 4-7), and to this end, the (5) |ej ∩ ek \ ei| = |ej ∩ ek| − |ei ∩ ej ∩ ek|, 1-hop and 2-hop neighbors of ei in the projected graph G¯ (6) |ek ∩ ei \ ej | = |ek ∩ ei| − |ei ∩ ej ∩ ek|. are explored. After that, for each such instance {ei, ej , ek} ¯ Hence, the time complexity of computing h({e , e , e }) is of h-motif t, the corresponding count M[t] is incremented i j k ¯ O(|ei|) = O(min(|ei|, |ej |, |ek|)). (line 7). Lastly, each estimate M[t] is rescaled by multiply- ing it with |E| (lines 8-9), which is the reciprocal of the ex- Theorem 1 (Complexity of MoCHy-E). The time complex- 3s P 2 pected number of times that each of the h-motif t’s instances ity of Algorithm 2 is O( (|Ne | · |ei|)). ei∈E i is counted.2 This rescaling makes each estimate M¯ [t] unbi- Proof. Assume all sets and maps are implemented using ased, as formalized in Theorem 2. hash tables. The total number of triples {ei, ej , ek} con- P 2 sidered in line 3 is O( |Ne | ). By Lemma 2, for such 2 s ei∈E i Each hyperedge is expected to be sampled |E| times, and each h- a triple {ei, ej , ek}, computing h({ei, ej , ek}) takes O(|ei|) motif instance is counted whenever any of its 3 hyperedges is sampled. + Algorithm 5: MoCHy-A+: Approximate H-motif Count- MoCHy-A, which samples hyperedges, MoCHy-A is based on hyperwedge sampling. It selects r hyperwedges uniformly ing Based on Hyperwedge Sampling at random with replacement (line 3), and for each sampled Input : (1) input hypergraph: G = (V,E) hyperwedge ∧ij ∈ ∧, it searches for all h-motif instances (2) projected graph: G¯ = (E, ∧, ω) that contain ∧ij (lines 4-5). To this end, the hyperedges (3) number of samples: r that are adjacent to ei or ej in the projected graph G¯ are ˆ Output: estimated count of each h-motif t’s instances: M[t] considered (line 4). For each such instance {ei, ej , ek} of ˆ 1 M ← map whose default value is 0 h-motif t, the corresponding estimate Mˆ [t] is incremented 2 for n ← 1...r do (line 5). Lastly, each estimate Mˆ [t] is rescaled so that it 3 ∧ij ← a uniformly random hyperwedge unbiasedly estimates M[t], as formalized in Theorem 4. To

4 for each hyperedge ek ∈ (Nei ∪ Nej \{ei, ej }) do this end, it is multiplied by the reciprocal of the expected ˆ 3 5 M[h({ei, ej , ek})] += 1 number of times that each instance of h-motif t is counted. 6 for each h-motif t do Theorem 4 (Bias and Variance of MoCHy-A+). For every 7 if 17 ≤ t ≤ 22 then . open h-motifs h-motif t, Algorithm 5 provides an unbiased estimate Mˆ [t] ˆ ˆ |∧| 8 M[t] ← M[t] · 2r of the count M[t] of its instances, i.e., 9 else . closed h-motifs ˆ ˆ |∧| E[Mˆ [t]] = M[t]. (6) 10 M[t] ← M[t] · 3r 11 return Mˆ For every closed h-motif t, the variance of the estimate is 1 ˆ 1 1 X Theorem 2 (Bias and Variance of MoCHy-A). For every Var[M[t]] = ·M[t]·(|∧|−3)+ qn[t]·(n|∧|−9), (7) 3r 9r h-motif t, Algorithm 4 provides an unbiased estimate M¯ [t] n=0 of the count M[t] of its instances, i.e., where qn[t] is the number of pairs of h-motif t’s instances E[M¯ [t]] = M[t]. (4) that share n hyperwedges. For every open h-motif t, the variance is The variance of the estimate is 1 2 ˆ 1 1 X 1 1 X Var[M[t]] = ·M[t]·(|∧|−2)+ qn[t]·(n|∧|−4). (8) Var[M¯ [t]] = ·M[t]·(|E|−3)+ p [t]·(l|E|−9), (5) 2r 4r 3s 9s l n=0 l=0 Proof. See Appendix B. where pl[t] is the number of pairs of h-motif t’s instances that share l hyperedges. The time complexity of MoCHy-A+ is given in Theorem 5. Proof. See Appendix A. Theorem 5 (Complexity of MoCHy-A+). The average time r P 2 complexity of Algorithm 5 is O( (|ei| · |Ne | )). The time complexity of MoCHy-A is given in Theorem 3. |∧| ei∈E i Theorem 3 (Complexity of MoCHy-A). The average time Proof. Assume all sets and maps are implemented using s 2 P hash tables. For a sample hyperwedge ∧ij , computing Ne ∪ complexity of Algorithm 4 is O( |E| e ∈E (|ei| · |Nei | )). i i N takes O(|N ∪ N |) time, and by Lemma 2, comput- Proof. Assume all sets and maps are implemented using ej ei ej ing h({ei, ej , ek}) for all considered h-motif instances takes hash tables. For a sample hyperedge ei, computing Nei ∪Nej P O(min(|ei|, |ej |)·|Nei ∪Nej |) time. Thus, from |Nei ∪Nej | ≤ for every ej ∈ Ne takes O( (|Ne ∪ Ne |)) time, and i ej ∈Ne i j i |Nei | + |Nej |, the time complexity for processing a sample by Lemma 2, computing h({ei, ej , ek}) for all considered h- ∧ is O(min(|e |, |e |)·(|N |+|N |)) = O(|e |·|N |+|e |· P ij i j ei ej i ei j motif instances takes O(min(|ei|, |ej |) · |Nei ∪ Nej |) ej ∈Nei |Nej |), which can be written as time. Thus, from |N ∪ N | ≤ |N | + |N |, the time ei ej ei ej X complexity for processing a sample ei is O( (1(ei is included in the sample) · |ei| · |Nei |) ei∈E X X O(min(|ei|, |ej |) · (|Nei | + |Nej |)) + (1(ej is included in the sample) · |ej | · |Nej |)). ej ∈Nei ej ∈E 2 X = O(|ei| · |Nei | + (|ej | · |Nej |)), From this, linearity of expectation, E[1(ei is included in the ej ∈Nei |Nei | sample)] = |∧| , and E[1(ej is included in the sample)] = which can be written as |N | ej , the average time complexity per sample hyperwedge is X 2 |∧| O( (1(ei is sampled) · |ei| · |Nei | ) 1 P 2 ei∈E O( (|ei|·|Ne | )). Hence, the total time complexity |∧| ei∈E i X r P 2 for processing r samples is O( (|ei| · |Ne | )). + (1(ej is adjacent to the sample) · |ej | · |Nej |)). |∧| ei∈E i ej ∈E Comparison of MoCHy-A and MoCHy-A+: Empirically, From this, linearity of expectation, E[1(ei is sampled)] = + |N | MoCHy-A provides a better trade-off between speed and 1 ej |E| , and E[1(ej is adjacent to the sample)] = |E| , the av- accuracy than MoCHy-A, as presented in Section 4.5. 
We 1 erage time complexity per sample hyperedge becomes O( |E| provide an analysis that supports this observation. Assume P 2 (|ei| · |Ne | )). Hence, the total time complexity for that the numbers of samples in both algorithms are set so ei∈E i 3 s P 2 Note that each instance of open and closed h-motifs contains 2 and 3 processing s samples is O( (|ei| · |Ne | )). |E| ei∈E i hyperwedges, respectively. Each instance of closed h-motifs is counted MoCHy-A+: Hyperwedge Sampling (Algorithm 5): if one of the 3 hyperwedges in it is sampled, while that of open h- + motifs is counted if one of the 2 hyperwedges in it is sampled. Thus, on MoCHy-A (Algorithm 5) provides a better trade-off be- average, each instance of open and closed h-motifs is counted 3r/| ∧ | tween speed and accuracy than MoCHy-A. Different from and 2r/| ∧ | times, respectively. Table 2: Statistics of 11 real hypergraphs from 5 domains. of a hyperedge if no hyperwedge containing it is sampled. Dataset |V | |E| |e¯|* | ∧ | # H-motifs However, it can also be harmful if memoized results exceed coauth-DBLP 1,924,991 2,466,792 25 125M 26.3B ± 18M the memory budget and some parts of G¯ need to be rebuilt coauth-geology 1,256,385 1,203,895 25 37.6M 6B ± 4.8M multiple times. Then, given a memory budget in bits, how coauth-history 1,014,734 895,439 25 1.7M 83.2M should we prioritize hyperedges if all their neighborhoods contact-primary 242 12,704 5 2.2M 617M cannot be memoized? According to our experiments, despite contact-high 327 7,818 5 593K 69.7M their large size, memoizing the neighborhoods of hyperedges email-Enron 143 1,512 18 87.8K 9.6M ¯ + email-EU 998 25,027 25 8.3M 7B with high degree in G makes MoCHy-A faster than memo- izing the neighborhoods of randomly chosen hyperedges or tags-ubuntu 3,029 147,222 5 564M 4.3T ± 1.5B tags-math 1,629 170,476 5 913M 9.2T ± 3.2B least recently used (LRU) hyperedges. We experimentally threads-ubuntu 125,602 166,999 14 21.6M 11.4B examine the effects of the memory budget in Section 4.5. threads-math 176,445 595,749 21 647M 2.2T ± 883M ∗ The maximum size of a hyperedge. 4. EXPERIMENTS s r that α = |E| = |∧| . For each h-motif t, since both esti- In this section, we review our experiments that we design mates M¯ [t] of MoCHy-A and Mˆ [t] of MoCHy-A+ are unbi- for answering the following questions: ased (see Eqs. (4) and (6)), we only need to compare their • Q1. Comparison with Random: Does counting in- ¯ M[t]+p1[t]+p2[t] variances. By Eq. (5), Var[M[t]] = O( α ), and stances of different h-motifs reveal structural design prin- ˆ M[t]+q1[t] ciples of real-world hypergraphs distinguished from those by Eq. (7) and Eq. (8), Var[M[t]] = O( α ). By def- M[t]+q1[t] M[t]+p1[t]+p2[t] of random hypergraphs? inition, q1[t] ≤ p2[t], and thus α ≤ α . Moreover, in real-world hypergraphs, p1[t] tends to be sev- • Q2. Comparison across Domains: Do characteristic eral orders of magnitude larger than the other terms (i.e., profiles capture local structural patterns of hypergraphs p2[t], q1[t], and M[t]), and thus M¯ [t] of MoCHy-A tends to unique to each domain? have larger variance (and thus larger estimation error) than • Q3. Observations and Applications: What are in- Mˆ [t] of MoCHy-A+. Despite this fact, as shown in Theo- teresting discoveries and applications of h-motifs in real- rems 3 and 5, MoCHy-A and MoCHy-A+ have the same time world hypergraphs? P 2 + complexity, O(α · (|ei| · |Ne | )). Hence, MoCHy-A is ei∈E i • Q4. 
Performance of Counting Algorithms: How expected to give a better trade-off between speed and accu- fast and accurate are the different versions of MoCHy? racy than MoCHy-A, as confirmed empirically in Section 4.5. Does the advanced version outperform the basic ones? 3.4 Parallel and On-the-fly Implementations We discuss parallelization of MoCHy and then on-the-fly 4.1 Experimental Settings computation of projected graphs. Machines: We conducted all the experiments on a machine Parallelization: All versions of MoCHy and hypergraph with an AMD Ryzen 9 3900X CPU and 128GB RAM. projection are easily parallelized. Specifically, we can paral- Implementations: We implemented all versions of MoCHy lelize hypergraph projection and MoCHy-E by letting multi- using C++ and OpenMP. ple threads process different hyperedges (in line 3 of Algo- Datasets: We used the following eleven real-world hyper- rithm 1 and line 2 of Algorithm 2) independently in parallel. graphs from five different domains: Similarly, we can parallelize MoCHy-A and MoCHy-A+ by letting multiple threads sample and process different hyper- • co-authorship (coauth-DBLP, coauth-geology [58], and edges (in line 3 of Algorithm 4) and hyperwedges (in line 3 coauth-history [58]): A node represents an author. A hy- of Algorithm 5) independently in parallel. The estimated peredge represents all authors of a publication. counts of the same h-motif obtained by different threads are • contact (contact-primary [59] and contact-high [44]): A summed up only once before they are returned as outputs. node represents a person. A hyperedge represents a group We present some empirical results in Section 4.5. interaction among individuals. H-motif Counting without Projected Graphs: If the • email (email-Enron [37] and email-EU [40, 68]): A node input hypergraph G is large, computing its projected graph represents an e-mail account. A hyperedge consists of the G¯ (Algorithm 1) is time and space consuming. Specifically, P sender and all receivers of an email. building G¯ takes O( |ei∩ej |) time (see Lemma 1) and ∧ij ∈∧ • tags (tags-ubuntu and tags-math): A node represents a requires O(|E| + | ∧ |) space, which often exceeds O(P ei∈E tag. A hyperedge represents all tags attached to a post. |e |) space required for storing G. Thus, instead of pre- i • threads (threads-ubuntu and threads-math): A node rep- computing G¯ entirely, we can build it incrementally while resents a user. A hyperedge groups all users participating memoizing partial results within a given memory budget. in a thread. For example, in MoCHy-A+ (Algorithm 5), we compute the ¯ neighborhood of a hyperedge ei ∈ E in G (i.e., {(k, ω(∧ik)) : These hypergraphs are made public by the authors of [12], k ∈ Nei }) only if (1) a hyperwedge with ei (e.g., ∧ij ) is sam- and in Table 2 we provide some statistics of the hypergraphs pled (in line 3) and (2) its neighborhood is not memoized, as after removing duplicated hyperedges. We used MoCHy-E described in the pseudocode in Appendix G of [1]. Whether for the coauth-history dataset, the threads-ubuntu dataset, they are computed on the fly or read from memoized results, and all datasets from the contact and email domains. For we always use exact neighborhoods, and thus this change the other datasets, we used MoCHy-A+ with r = 2, 000, 000, does not affect the accuracy of the considered algorithm. unless otherwise stated. We used a single thread unless oth- This incremental computation of G¯ can be beneficial in erwise stated. 
We computed CPs based on five hypergraphs terms of speed since it skips projecting the neighborhood randomized as described in Section 2.2. Table 3: Real-world and random hypergraphs have distinct distributions of h-motif instances. We report the absolute counts of each h-motif’s instances in a real-world hypergraph from each domain and its corresponding random hypergraph. To compare the counts in both hypergraphs, we measure the relative count (RC) of each h-motif. We also rank the counts, and we report each h-motif’s rank difference (RD) in the real-world and corresponding random hypergraphs. coauth-DBLP contact-primary email-EU tags-math threads-math count (rank) count (rank) count (rank) count (rank) count (rank) RD RC RD RC RD RC RD RC RD RC h-motif real random real random real random real random real random 1 9.6E07 (7) 1.3E09 (4) 3 -0.86 4.8E04 (16) 2.8E07 (5) 11 -1.00 7.5E06 (13) 1.7E08 (7) 6 -0.91 9.0E08 (13) 2.2E11 (6) 7 -0.99 6.4E08 (7) 2.4E11 (4) 3 -0.99 2 7.0E09 (2) 7.2E09 (2) 0 -0.01 1.1E08 (3) 8.6E07 (3) 0 0.12 6.3E08 (2) 8.2E08 (3) 1 -0.13 1.6E12 (2) 1.6E12 (2) 0 0.02 1.1E12 (2) 7.7E11 (2) 0 0.16 3 2.2E06 (17) 6.1E03 (14) 3 0.99 2.8E03 (21) 1.7E05 (16) 5 -0.97 1.6E06 (21) 7.8E05 (17) 4 0.34 3.0E06 (20) 1.1E09 (15) 5 -0.99 1.7E05 (20) 1.7E08 (14) 6 -1.00 4 9.6E06 (11) 1.1E05 (12) 1 0.98 8.4E02 (24) 9.2E05 (12) 12 -1.00 4.3E06 (16) 1.5E07 (12) 4 -0.55 1.5E08 (17) 1.6E10 (12) 5 -0.98 3.1E06 (13) 1.2E09 (11) 2 -0.99 5 1.5E08 (6) 1.2E05 (11) 5 1.00 4.6E06 (5) 1.6E06 (11) 6 0.49 7.5E07 (7) 1.1E07 (13) 6 0.74 7.4E09 (8) 2.5E10 (8) 0 -0.54 4.1E08 (8) 1.7E09 (10) 2 -0.61 6 9.9E08 (3) 1.8E06 (9) 6 1.00 1.3E07 (4) 8.2E06 (7) 3 0.24 3.9E08 (4) 1.9E08 (6) 2 0.34 6.8E11 (3) 3.3E11 (4) 1 0.35 1.4E10 (4) 1.1E10 (8) 4 0.11 7 1.9E05 (23) 0.0E00 (20) 3 1.00 1.6E04 (17) 2.0E02 (24) 7 0.98 7.5E04 (24) 1.2E02 (25) 1 1.00 8.3E05 (25) 9.1E05 (25) 0 -0.05 8.8E03 (24) 1.7E04 (24) 0 -0.32 8 3.9E05 (22) 0.0E00 (20) 2 1.00 4.6E03 (20) 2.6E03 (22) 2 0.27 4.2E06 (17) 2.5E04 (21) 4 0.99 2.0E06 (23) 3.4E07 (22) 1 -0.89 2.2E04 (23) 3.5E05 (21) 2 -0.88 9 2.4E06 (16) 0.0E00 (20) 4 1.00 1.7E05 (12) 4.6E03 (20) 8 0.95 1.8E06 (20) 1.1E04 (22) 2 0.99 1.4E08 (18) 5.4E07 (21) 3 0.45 5.1E05 (17) 4.5E05 (20) 3 0.06 10 7.6E06 (13) 7.5E00 (18) 5 1.00 5.7E04 (15) 5.5E04 (17) 2 0.03 2.8E07 (10) 1.7E06 (14) 4 0.88 7.1E08 (14) 1.9E09 (14) 0 -0.45 2.3E06 (15) 9.4E06 (17) 2 -0.61 11 8.6E06 (12) 0.9E00 (19) 7 1.00 4.1E05 (11) 2.4E04 (18) 7 0.89 9.0E06 (11) 1.9E05 (19) 8 0.96 3.5E09 (10) 7.4E08 (16) 6 0.65 2.8E06 (14) 3.1E06 (18) 4 -0.05 12 6.4E07 (8) 1.9E02 (16) 8 1.00 1.7E05 (13) 2.7E05 (14) 1 -0.24 8.2E07 (6) 2.4E07 (10) 4 0.55 6.9E10 (6) 2.4E10 (10) 4 0.49 8.2E07 (10) 6.2E07 (15) 5 0.14 13 1.6E04 (26) 0.0E00 (20) 6 1.00 5.5E03 (19) 1.6E00 (26) 7 1.00 2.7E04 (26) 0.4E00 (26) 0 1.00 1.1E06 (24) 1.7E04 (26) 2 0.97 1.5E02 (26) 8.6E00 (26) 0 0.89 14 1.4E05 (24) 0.0E00 (20) 4 1.00 6.0E03 (18) 7.1E01 (25) 7 0.98 7.2E05 (22) 3.7E02 (24) 2 1.00 2.8E07 (19) 1.8E06 (24) 5 0.88 3.9E03 (25) 9.3E02 (25) 0 0.61 15 6.5E05 (19) 0.0E00 (20) 1 1.00 1.7E03 (22) 8.6E02 (23) 1 0.34 3.6E06 (19) 5.0E04 (20) 1 0.97 2.9E08 (15) 5.7E07 (20) 5 0.67 2.7E04 (22) 2.0E04 (23) 1 0.16 16 2.0E06 (18) 0.0E00 (20) 2 1.00 1.4E02 (25) 3.2E03 (21) 4 -0.92 6.7E06 (14) 1.7E06 (15) 1 0.60 1.9E09 (11) 5.8E08 (18) 7 0.53 2.4E05 (18) 1.3E05 (22) 4 0.29 17 4.2E05 (21) 2.0E06 (8) 13 -0.65 1.0E03 (23) 6.3E05 (13) 10 -1.00 3.8E04 (25) 8.7E05 (16) 9 -0.92 5.1E05 (26) 5.0E08 (19) 7 -1.00 2.3E05 (19) 9.2E08 (12) 7 -1.00 18 2.6E06 (15) 6.4E07 (7) 8 -0.92 1.2E02 (26) 7.0E06 
(8) 18 -1.00 6.0E06 (15) 4.0E07 (8) 7 -0.74 2.5E06 (22) 1.6E10 (13) 9 -1.00 8.3E05 (16) 1.3E10 (7) 9 -1.00 19 3.6E07 (9) 6.7E07 (6) 3 -0.30 2.0E06 (6) 1.2E07 (6) 0 -0.72 8.7E06 (12) 2.9E07 (9) 3 -0.54 9.4E08 (12) 2.4E10 (9) 3 -0.93 3.5E08 (9) 1.8E10 (6) 3 -0.96 20 3.4E08 (5) 2.2E09 (3) 2 -0.73 6.0E05 (10) 1.3E08 (2) 8 -0.99 2.2E08 (5) 1.2E09 (2) 3 -0.69 9.2E09 (7) 7.2E11 (3) 4 -0.97 1.9E09 (5) 2.4E11 (3) 2 -0.98 21 7.9E08 (4) 5.6E08 (5) 1 0.17 1.7E08 (2) 5.7E07 (4) 2 0.50 5.3E08 (3) 2.3E08 (4) 1 0.39 1.2E11 (5) 2.8E11 (5) 0 -0.40 2.8E10 (3) 8.6E10 (5) 2 -0.51 22 1.7E10 (1) 1.8E10 (1) 0 -0.03 3.1E08 (1) 5.8E08 (1) 0 -0.30 4.9E09 (1) 8.5E09 (1) 0 -0.27 6.6E12 (1) 7.6E12 (1) 0 -0.07 1.1E12 (1) 1.2E12 (1) 0 -0.02 23 2.4E04 (25) 1.5E01 (17) 8 1.00 1.2E05 (14) 5.4E03 (19) 5 0.91 8.8E04 (23) 4.0E03 (23) 0 0.91 2.6E06 (21) 7.9E06 (23) 2 -0.51 1.4E05 (21) 7.8E05 (19) 2 -0.70 24 4.4E05 (20) 1.4E03 (15) 5 0.99 7.7E05 (9) 1.8E05 (15) 6 0.63 4.2E06 (18) 5.4E05 (18) 0 0.77 2.2E08 (16) 7.2E08 (17) 1 -0.53 7.5E06 (12) 3.1E07 (16) 4 -0.61 25 3.8E06 (14) 4.6E04 (13) 1 0.98 1.7E06 (8) 1.8E06 (10) 2 -0.03 3.2E07 (9) 2.0E07 (11) 2 0.23 6.0E09 (9) 2.0E10 (11) 2 -0.54 8.0E07 (11) 4.2E08 (13) 2 -0.68 26 2.3E07 (10) 4.9E05 (10) 0 0.96 1.8E06 (7) 6.14E06 (9) 2 -0.54 7.5E07 (8) 2.1E08 (5) 3 -0.48 1.3E11 (4) 1.8E11 (7) 3 -0.14 1.2E09 (6) 1.9E09 (9) 3 -0.21 4.2 Q1. Comparison with Random H-motifs in Contact Hypergraphs: Instances of h-mo- We analyze the counts of different h-motifs’ instances in tifs 9, 13, and 14 are noticeably more common in both real and random hypergraphs. In Table 3, we report the (ap- contact datasets than in the corresponding random hyper- proximated) count of each h-motif t’s instances in each real graphs. As seen in Figure 3, in instances of h-motifs 9, hypergraph with the corresponding count averaged over five 13 and 14, hyperedges are tightly connected and nodes are random hypergraphs obtained as described in Section 2.2. mainly located in the intersections of all or some hyperedges. For each h-motif t, we measure its relative count, which we H-motifs in Email Hypergraphs: Both email datasets define as M[t]−Mrand[t] . We also rank h-motifs by the counts contain particularly many instances of h-motifs 8 and 10, M[t]+Mrand[t] of their instances and examine the difference between the compared to the corresponding random hypergraphs. As ranks in real and corresponding random hypergraphs. As seen in Figure 3, instances of h-motifs 8 and 10 consist of seen in the table, the count distributions in real hypergraphs three hyperedges one of which contains most nodes. are clearly distinguished from those of random hypergraphs. H-motifs in Tags Hypergraphs: In addition to instances H-motifs in Random Hypergraphs: We notice that in- of h-motif 11, which are common in most real hypergraphs, stances of h-motifs 17 and 18 appear much more frequently instances of h-motif 16, where all seven regions are not in random hypergraphs than in real hypergraphs from all empty (see Figure 3), are particularly frequent in both tags domains. For example, instances of h-motif 17 appear only datasets than in corresponding random hypergraphs. about 510 thousand times in the tags-math dataset, while H-motifs in Threads Hypergraphs: Lastly, in both data they appear about 500 million times (about 980× more of- sets from the threads domain, instances of h-motifs 12 and ten) in the corresponding randomized hypergraph. 
In the 24 are noticeably more frequent than expected from the cor- threads-math dataset, instances of h-motif 18 appear about responding random hypergrpahs. 830 thousand times, while they appear about 13 billion times In Appendix C.1 of [1], we analyze how the significance (about 15, 660× more often) in the corresponding random- of each h-motif and the rank difference for it are related to ized hypergraph. Instances of h-motifs 17 and 18 consist of global structural properties of hypergraphs. a hyperedge and its two disjoint subsets (see Figure 3). H-motifs in Co-authorship Hypergraphs: We observe 4.3 Q2. Comparison across Domains that instances of h-motifs 10, 11 and 12 appear more fre- We compare the characteristic profiles (CPs) of the real- quently in all three hypergraphs from the co-authorship world hypergraphs. In Figure 5, we present the CPs (i.e., domain than in the corresponding random hypergraphs. Al- the significances of the 26 h-motifs) of each hypergraph. though there are only about 190 instances of h-motif 12 in As seen in the figure, hypergraphs from the same domains the corresponding random hypergraphs, there are about 64 have similar CPs. Specifically, all three hypergraphs from million such instances (about 337, 000× more instances) in the co-authorship domain share extremely similar CPs, the coauth-DBLP dataset. As seen in Figure 3, in instances even when the absolute counts of h-motifs in them are sev- of h-motifs 10, 11, and 12, a hyperedge is overlapped with eral orders of magnitude different. Similarly, the CPs of the two other overlapped hyperedges in three different ways. both hypergraphs from the tags domain are extremely sim- ilar. However, the CPs of the three hypergraphs from the Co-authorship 0.2 Table 4: H-motifs give informative features. Using them in

HM26 or HM7 yields more accurate hyperedge predictions than using the baseline features in HC.

                          HM26    HM7     HC
Logistic Regression  ACC*  0.754   0.656   0.636
                     AUC†  0.813   0.693   0.691
Random Forest        ACC   0.768   0.741   0.639
                     AUC   0.852   0.779   0.692
Decision Tree        ACC   0.731   0.684   0.613
                     AUC   0.732   0.685   0.616
K-Nearest Neighbors  ACC   0.694   0.689   0.640
                     AUC   0.750   0.743   0.684
MLP Classifier       ACC   0.795   0.762   0.646
                     AUC   0.875   0.841   0.701
* accuracy, † area under the ROC curve.

(Figure 5 plot area: normalized significance versus hypergraph motif index for each of the 11 hypergraphs, grouped by domain: co-authorship, tags, contact, email, and threads.)

hyperedges, respectively, in the original hypergraph G, and each edge (v, e) in G0 indicates that the node v belongs to the hyperedge e in G. Then, we compute the CPs based on the network motifs consisting of 3 to 5 nodes, using [18]. Lastly, based on both CPs, we compute the similarity matrices (specifically, correlation coefficient matrices) of the real-world hypergraphs. As seen in Figure 6, the domains of the real-world hypergraphs are distinguished more clearly by the CPs based on h-motifs than by the CPs based on net-

Normalized threads-math Significance work motifs. Numerically, when the CPs based on h-motifs 1 5 10 15 20 25 Hypergraph Motif Index are used, the average correlation coefficient is 0.978 within Figure 5: Characteristic profiles (CPs) capture local struc- domains and 0.654 across domains, and the gap is 0.324. tural patterns of real-world hypergraphs accurately. The However, when the CPs based on network motifs are used, CPs are similar within domains but different across domains. the average correlation coefficient is 0.988 within domains Note that the significance of h-motif 3 distinguishes the con- and 0.919 across domains, and the gap is just 0.069. These tact hypergraphs from the email hypergraphs. results support that h-motifs play a key role in capturing local structural patterns of real-world hypergraphs. 1.0 1.0 coauth-DBLP Coefficient Correlation coauth-geology 0.9 4.4 Q3. Observations and Applications coauth-history contact-primary 0.8 0.9 We conduct two case studies on the coauth-DBLP dataset, contact-high email-Enron 0.7 which is a co-authorship hypergraph. email-EU 0.8 tags-ubuntu 0.6 Evolution of Co-authorship Hypergraphs: The data- tags-math set contains bibliographic information of computer science threads-ubuntu 0.5 0.7 publications. Using the publications in each year from 1984 threads-math 0.4 to 2016, we create 33 hypergraphs where each node corre- (a) Similarity matrix based on (b) Similarity matrix sponds to an author, and each hyperedge indicates the set hypergraph motifs based on network motifs of the authors of a publication. Then, we compute the frac- Figure 6: Characteristic profiles (CPs) based on hypergraph tion of the instances of each h-motif in each hypergraph to motifs (h-motifs) capture local structural patterns more ac- analyze patterns and trends in the formation of collabora- curately than CPs based on network motifs. The CPs based tions. As shown in Figure 7, over the 33 years, the fractions on h-motifs distinguishes the domains of the real-world hy- have changed with distinct trends. First, as seen in Fig- pergraphs better than the CPs based on network motifs. ure 7(b), the fraction of the instances of open h-motifs has co-authorship domain are clearly distinguished by them increased steadily since 2001, indicating that collaborations of the hypergraphs from the tags domain. While the CPs have become less clustered, i.e., the probability that two of the hypergraphs from the contact domain and the CPs collaborations intersecting with a collaboration also inter- of those from the email domain are similar for the most sect with each other has decreased. Notably, the fractions part, they are distinguished by the significance of h-motif of the instances of h-motif 2 (closed) and h-motif 22 (open) 3. These observations confirm that CPs accurately capture have increased rapidly, accounting for most of the instances. local structural patterns of real-world hypergraphs. In Ap- Hyperedge Prediction: As an application of h-motifs, we pendix ?? of [1], we analyze the importance of each h-motif consider the problem of predicting publications (i.e., hyper- in terms its contribution to distinguishing the domains. edges) in 2016 based on the publications from 2013 to 2015. To further verify the effectiveness of CPs based on h- As in [69], we formulate this problem as a binary classifica- motifs, we compare them with CPs based on network mo- tion problem where we aim to classify real hyperedges and tifs. Specifically, we represent each hypergraph G = (V,E) fake ones. 
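For reference, one entry of the correlation-coefficient similarity matrices of Figure 6 can be computed from two CP vectors as in the following sketch (a standard Pearson correlation; the function name and the toy inputs are ours):

```cpp
#include <array>
#include <cmath>
#include <cstdio>

constexpr int NUM_HMOTIFS = 26;

// Pearson correlation coefficient between the CPs of two hypergraphs,
// i.e., one entry of the similarity matrices shown in Figure 6.
double cp_similarity(const std::array<double, NUM_HMOTIFS>& x,
                     const std::array<double, NUM_HMOTIFS>& y) {
    double mx = 0.0, my = 0.0;
    for (int t = 0; t < NUM_HMOTIFS; ++t) { mx += x[t]; my += y[t]; }
    mx /= NUM_HMOTIFS; my /= NUM_HMOTIFS;
    double num = 0.0, vx = 0.0, vy = 0.0;
    for (int t = 0; t < NUM_HMOTIFS; ++t) {
        num += (x[t] - mx) * (y[t] - my);
        vx  += (x[t] - mx) * (x[t] - mx);
        vy  += (y[t] - my) * (y[t] - my);
    }
    return num / std::sqrt(vx * vy);
}

int main() {
    // Two toy CP vectors (placeholder values, for illustration only).
    std::array<double, NUM_HMOTIFS> a{}, b{};
    for (int t = 0; t < NUM_HMOTIFS; ++t) { a[t] = std::sin(t + 1.0); b[t] = std::sin(t + 1.2); }
    std::printf("correlation = %.3f\n", cp_similarity(a, b));
    return 0;
}
```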
To this end, we create fake hyperedges in both as a bipartite graph G0 = (V 0,E0) where V 0 := V ∪ E and training and test sets by replacing some fraction of nodes E0 := {(v, e) ∈ V × E : v ∈ e}, which is also known as in each real hyperedge with random nodes (see Appendix E star expansion [60]. That is, the two types of nodes in the of [1] for details). Then, we train five classifiers using the transformed bipartite graph G0 represent the nodes and hy- following three different sets of input hyperedge features: 1.001.00


Ratio Year h-motifh-motif 11711 h-motifh-motifh-motif 2026 2010 h-motif 19 Year 0.250.25 h-motifh-motif 12812 h-motifh-motifh-motif 2117 2111 h-motif 20 (a) Fraction0.25 of the instances of each h-motif in the coauth-DBLPh-motifh-motif 13913 dataseth-motifh-motifh-motif 2218 2212 overh-motif time. 21 (b) Open and closed h-motifs. h-motif 10 h-motifh-motif 19 13 h-motif 22 Figure 7: Trends in the formation of collaborations are capturedh-motif by11 h-motifs.h-motif 20 (a) The fractions of the instances of h-motifs 2 0.25and 22 have increased rapidly. (b) The fraction of the instancesh-motif of 12 openh-motif h-motifs 21 has increased steadily since 2001. h-motif 13 h-motif 22 0.000.00 0.06 0.03 0.08 0.04 0.00 0.04 0.15 19841984 19921992 20002000 0.04 20082008 20162016 0.06 0.03 0.02 0.03 1984 1992YearYear 2000 2008 2016 0.10 MoCHy-E 0.02 0.04 0.02 8.4X 5.9X 7.6X 0.02 25X Year 10.2X 24.6X 0.01 0.00 0.02 0.05 MoCHy-A+ 0.01 0.01 1984 0.001992 2000 0.00 2008 0.00 2016 0.00 0.00 0.00 MoChy-A Relative Error 32.5X Year Relative Error 24.2X Relative Error 28.2X Relative Error 11.1X Relative Error 26.3X Relative Error 7.6X 0 1000 2000 0 200 400 0 20 40 60 0 2 4 6 8 10 0 2 4 6 0.0 0.1 0.2 Elapsed Time (sec) Elapsed Time (sec) Elapsed Time (sec) Elapsed Time (sec) Elapsed Time (sec) Elapsed Time (sec) (a) threads-ubuntu (b) email-Eu (c) contact-primary (d) coauth-history (e) contact-high (f) email-Enron Figure 8: MoCHy-A+ gives the best trade-off between speed and accuracy. It yields up to 25× more accurate estimation than MoCHy-A, and it is up to 32.5× faster than MoCHy-E. The error bars indicate ± 1 standard error over 20 trials. • HM26 (∈ R26): The number of each h-motif’s instances of hyperwedge samples on 3 datasets. Even with a smaller that contain each hyperedge. number of samples, the CPs are estimated near perfectly. 7 Parallelization: We measure the running times of MoCHy- • HM7 (∈ R ): The seven features with the largest variance + among those in HM26. E and MoCHy-A with different numbers of threads on the threads-ubuntu dataset. As seen in Figure 10, both algo- • HC (∈ R7): The mean, maximum, and minimum degree4 rithms achieve significant speedups with multiple threads. and the mean, maximum, and minimum number of neigh- Specifically, with 8 threads, MoCHy-E and MoCHy-A+ (r = bors5 of the nodes in each hyperedge and its size. 1M) achieve speedups of 5.4 and 6.7, respectively. Effects of On-the-fly Computation on Speed: We an- We report the accuracy (ACC) and the area under the ROC alyze the effects of the on-the-fly computation of projected curve (AUC) in each setting in Table 4. Using HM7, which graphs (discussed in Section 3.4) on the speed of MoCHy-A+ is based on h-motifs, yields consistently better predictions under different memory budgets for memoization. To this than using an equal number of baseline features in HC. end, we use the threads-ubuntu dataset, and we set the mem- Using HM26 yields the best predictions. These results in- ory budgets so that up to {0%, 0.1%, 1%, 10%, 100%} of the dicate informative features can be obtained from h-motifs. edges in the projected graph can be memoized. The results are shown in Figure 11. As the memory budget increases, MoCHy-A+ becomes faster, avoiding repeated computation. 4.5 Q4. Performance of Counting Algorithms Due to our careful prioritization scheme based on degree, We test the speed and accuracy of all versions of MoCHy memoizing 1% of the edges achieves speedups of about 2. under various settings. 
4.5 Q4. Performance of Counting Algorithms

We test the speed and accuracy of all versions of MoCHy under various settings. To this end, we measure elapsed time and relative error, defined as
$$\frac{\sum_{t=1}^{26}\left|M[t]-\bar{M}[t]\right|}{\sum_{t=1}^{26}M[t]} \quad\text{and}\quad \frac{\sum_{t=1}^{26}\left|M[t]-\hat{M}[t]\right|}{\sum_{t=1}^{26}M[t]},$$
for MoCHy-A and MoCHy-A+, respectively.

Speed and Accuracy: In Figure 8, we report the elapsed time and relative error of all versions of MoCHy on the 6 datasets where MoCHy-E terminates within a reasonable time. The numbers of samples in MoCHy-A and MoCHy-A+ are set to {2.5 × k : 1 ≤ k ≤ 10} percent of the counts of hyperedges and hyperwedges, respectively. MoCHy-A+ provides the best trade-off between speed and accuracy. For example, on the threads-ubuntu dataset, MoCHy-A+ provides 24.6× lower relative error than MoCHy-A, consistently with our theoretical analysis (see the last paragraph of Section 3.3). Moreover, on the same dataset, MoCHy-A+ is 32.5× faster than MoCHy-E with little sacrifice of accuracy.

Effects of the Sample Size on CPs: In Figure 9, we report the CPs obtained by MoCHy-A+ with different numbers of hyperwedge samples on 3 datasets. Even with a small number of samples, the CPs are estimated nearly perfectly.

Parallelization: We measure the running times of MoCHy-E and MoCHy-A+ with different numbers of threads on the threads-ubuntu dataset. As seen in Figure 10, both algorithms achieve significant speedups with multiple threads. Specifically, with 8 threads, MoCHy-E and MoCHy-A+ (r = 1M) achieve speedups of 5.4× and 6.7×, respectively.

Effects of On-the-fly Computation on Speed: We analyze the effects of the on-the-fly computation of projected graphs (discussed in Section 3.4) on the speed of MoCHy-A+ under different memory budgets for memoization. To this end, we use the threads-ubuntu dataset, and we set the memory budgets so that up to {0%, 0.1%, 1%, 10%, 100%} of the edges in the projected graph can be memoized. The results are shown in Figure 11. As the memory budget increases, MoCHy-A+ becomes faster by avoiding repeated computation. Due to our careful prioritization scheme based on degree, memoizing only 1% of the edges achieves a speedup of about 2×.
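The relative-error metric defined at the start of this subsection can be computed directly from the exact and estimated count vectors. A minimal sketch follows; the function and variable names are ours, and the numbers in the example are made up for illustration only.

```python
def relative_error(exact_counts, estimated_counts):
    """Relative error over all 26 h-motifs: the sum of absolute count errors
    divided by the sum of exact counts (matching the definition above)."""
    assert len(exact_counts) == len(estimated_counts) == 26
    abs_err = sum(abs(m - m_est)
                  for m, m_est in zip(exact_counts, estimated_counts))
    return abs_err / sum(exact_counts)

# Toy example with made-up counts.
exact = [100.0] * 26
estimate = [95.0] * 13 + [105.0] * 13
print(relative_error(exact, estimate))  # 0.05
```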

Figure 9: Using MoCHy-A+, characteristic profiles (CPs) can be estimated accurately from a small number of samples. CPs estimated with r = |∧| · {0.001, 0.005, 0.01, 0.05} hyperwedge samples are compared with the exact CPs (MoCHy-E) on (a) email-EU, (b) contact-primary, and (c) coauth-history; each panel plots normalized significance against the hypergraph motif index.
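Figure 9 indicates that a CP computed from estimated counts closely tracks the exact one. As a rough, non-authoritative illustration of how a profile of normalized significances can be assembled from h-motif counts in the original and randomized hypergraphs, the sketch below uses the ratio-plus-L2-normalization form familiar from network-motif profiles; the smoothing constant eps, the exact formula, and all names are our assumptions, not a restatement of the paper's definition.

```python
import math

def characteristic_profile(counts_real, counts_rand, eps=4.0):
    """Toy sketch of a characteristic-profile-style vector.

    counts_real, counts_rand: 26 h-motif counts in the original and in
    (averaged) randomized hypergraphs. The ratio-based significance and the
    L2 normalization follow the network-motif ratio-profile convention and
    are assumptions here.
    """
    delta = [(r - q) / (r + q + eps) for r, q in zip(counts_real, counts_rand)]
    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
    return [d / norm for d in delta]

# Made-up counts for illustration only.
real = [120, 80, 5] + [50] * 23
rand = [100, 100, 20] + [50] * 23
print([round(x, 3) for x in characteristic_profile(real, rand)[:3]])
```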

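Figure 10 (caption below) reports multi-thread speedups. The reason this parallelizes well is that each sampled hyperwedge can be processed independently and the per-sample partial counts only need to be summed at the end. The sketch below illustrates that structure only; it is our own toy example (process-based rather than the paper's multi-threaded implementation), and the "instances" input and all names are assumptions.

```python
from concurrent.futures import ProcessPoolExecutor
import random

def partial_counts(args):
    """Process one chunk of samples independently and return partial counts.
    Purely illustrative: 'instances' stands in for the per-sample h-motif work."""
    seed, num_samples, num_hyperwedges, instances = args
    rng = random.Random(seed)
    counts = [0] * 26
    for _ in range(num_samples):
        hw = rng.randrange(num_hyperwedges)
        for motif_type, members in instances:   # (h-motif index, hyperwedge ids)
            if hw in members:
                counts[motif_type] += 1
    return counts

if __name__ == "__main__":
    # Toy input: 100 hyperwedges and a few hand-made "instances".
    instances = [(0, {1, 2}), (0, {2, 3}), (7, {4, 5, 6}), (25, {10, 11, 12})]
    chunks = [(seed, 25_000, 100, instances) for seed in range(8)]  # 8 workers
    with ProcessPoolExecutor() as pool:
        totals = [sum(col) for col in zip(*pool.map(partial_counts, chunks))]
    print(totals[0], totals[7], totals[25])
```

Because the workers never write to shared state, the only synchronization point is the final element-wise sum, which is why near-linear speedups are achievable.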

Figure 10: Both MoCHy-E and MoCHy-A+ (with r ∈ {1M, 2M, 4M, 8M} hyperwedge samples) achieve significant speedups with multiple threads. (a) Elapsed time and (b) speedup versus the number of threads.

Figure 11: Memoizing a small fraction of projected graphs leads to significant speedups of MoCHy-A+. (a) Elapsed time and (b) speedup versus the memory budget (% of the edges in the projected graph).

5. RELATED WORK

We review prior work on network motifs, algorithms for counting them, and hypergraphs. While the definition of a network motif varies among studies, here we define it as a connected graph composed of a predefined number of nodes.

Network Motifs. Network motifs were proposed as a tool for understanding the underlying design principles and capturing the local structural patterns of graphs [26, 55, 46]. The occurrences of motifs in real-world graphs are significantly different from those in random graphs [46], and they also vary depending on the domains of graphs [45]. The concept of network motifs has been extended to various types of graphs, including dynamic [49], bipartite [16], and heterogeneous [51] graphs. The occurrences of network motifs have been used in a wide range of graph applications: community detection [13, 68, 43, 62], ranking [73], graph embedding [52, 71], and graph neural networks [39], to name a few.

Algorithms for Network Motif Counting. We focus on algorithms for counting the occurrences of every network motif whose size is fixed or within a certain range [4, 5, 9, 18, 21, 25, 50], while many others are designed for a specific motif (e.g., the triangle of size 3) [3, 22, 27, 28, 31, 36, 38, 48, 54, 56, 57, 61, 63, 65]. Given a graph, these algorithms aim to count the instances of motifs with 4 or more nodes rapidly and accurately, despite the combinatorial explosion of the instances, using the following techniques:
(1) Combinatorics: For exact counting, [4, 50, 49] employ combinatorial relations between counts. That is, they deduce the counts of the instances of motifs from those of other smaller or equal-size motifs.
(2) MCMC: Most approximate algorithms sample motif instances from which they estimate the counts. Employing MCMC sampling, [15, 21, 25, 53, 64] perform a random walk over instances (i.e., connected subgraphs) until it reaches stationarity, in order to sample an instance from a fixed probability distribution (e.g., uniform).
(3) Color Coding: Instead of MCMC, [17, 18] employ color coding [7]. Specifically, they color each node uniformly at random among k colors, count the number of k-trees with k colors rooted at each node, and use these counts to sample instances from a fixed probability distribution.
In our problem, which focuses on h-motifs with only 3 hyperedges, sampling instances with fixed probabilities is straightforward without (2) or (3), and the combinatorial relations on graphs in (1) are not applicable. In algorithmic aspects, we address the computational challenges discussed at the beginning of Section 3 by answering (a) what to precompute (Section 3.1), (b) how to leverage it (Sections 3.2 and 3.3), and (c) how to prioritize it (Sections 3.4 and 4.5), with formal analyses (Lemma 2; Theorems 1, 3, and 5).
Hypergraphs. Hypergraphs naturally represent group interactions occurring in a wide range of fields, including computer vision [29, 70], bioinformatics [30], circuit design [34, 47], social network analysis [41, 67], and recommender systems [19, 42]. There has also been considerable attention on machine learning on hypergraphs, including clustering [2, 8, 35, 74], classification [32, 60, 70], and hyperedge prediction [12, 69, 72]. Recent studies on real-world hypergraphs revealed interesting patterns commonly observed across domains, including global structural properties (e.g., giant connected components and small diameter) [23] and temporal patterns regarding arrivals of the same or similar hyperedges [14]. Notably, Benson et al. [12] studied how several local features, including edge density, average degree, and probabilities of simplicial closure events for 4 or fewer nodes (i.e., the appearance of the first hyperedge that includes a set of nodes each of whose pairs co-appear in previous hyperedges, where the configuration of the pairwise co-appearances affects the probability), differ across domains. Our analysis using h-motifs is complementary to these approaches in that it (1) captures local patterns systematically without hand-crafted features, (2) captures static patterns without relying on temporal information, and (3) naturally uses hyperedges with any number of nodes without decomposing them into smaller ones.

6. CONCLUSIONS

In this work, we introduce hypergraph motifs (h-motifs), and using them, we investigate the local structures of 11 real-world hypergraphs from 5 different domains. We summarize our contributions as follows:
• Novel Concepts: We define 26 h-motifs, which describe connectivity patterns of three connected hyperedges in a unique and exhaustive way, independently of the sizes of hyperedges (Figure 3).
• Fast and Provable Algorithms: We propose 3 parallel algorithms for (approximately) counting every h-motif's instances, and we theoretically and empirically analyze their speed and accuracy. Both approximate algorithms yield unbiased estimates (Theorems 2 and 4), and especially the advanced one is up to 32× faster than the exact algorithm, with little sacrifice of accuracy (Figure 8).
• Discoveries in 11 Real-world Hypergraphs: We confirm the efficacy of h-motifs by showing that local structural patterns captured by them are similar within domains but different across domains (Figures 5 and 6).
Reproducibility: The code and datasets used in this work are available at https://github.com/geonlee0325/MoCHy.
Future directions include extending h-motifs to rich hypergraphs, such as temporal or heterogeneous hypergraphs, and incorporating h-motifs into various tasks, such as hypergraph embedding, ranking, and clustering.

Acknowledgements. This work was supported by National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1C1C1008296). This work was also supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)) and supported by the National Supercomputing Center with supercomputing resources including technical support (KSC-2020-INO-0004).
APPENDIX

A. PROOF OF THEOREM 2

We let $X_{ij}[t]$ be a random variable indicating whether the $i$-th sampled hyperedge (in line 3 of Algorithm 4) is included in the $j$-th instance of h-motif $t$ or not. That is, $X_{ij}[t] = 1$ if the hyperedge is included in the instance, and $X_{ij}[t] = 0$ otherwise. We let $\bar{m}[t]$ be the number of times that h-motif $t$'s instances are counted while processing $s$ sampled hyperedges. That is,
$$\bar{m}[t] := \sum_{i=1}^{s}\sum_{j=1}^{M[t]} X_{ij}[t]. \qquad (9)$$
Then, by lines 8-9 of Algorithm 4,
$$\bar{M}[t] = \bar{m}[t] \cdot \frac{|E|}{3s}. \qquad (10)$$
Proof of the Bias of $\bar{M}[t]$ (Eq. (4)): Since each h-motif instance contains three hyperedges, the probability that each $i$-th sampled hyperedge is contained in each $j$-th instance of h-motif $t$ is
$$P[X_{ij}[t] = 1] = E[X_{ij}[t]] = \frac{3}{|E|}. \qquad (11)$$
From linearity of expectation,
$$E[\bar{m}[t]] = \sum_{i=1}^{s}\sum_{j=1}^{M[t]} E[X_{ij}[t]] = \sum_{i=1}^{s}\sum_{j=1}^{M[t]} \frac{3}{|E|} = \frac{3s \cdot M[t]}{|E|}.$$
Then, by Eq. (10), $E[\bar{M}[t]] = \frac{|E|}{3s} \cdot E[\bar{m}[t]] = M[t]$.
Proof of the Variance of $\bar{M}[t]$ (Eq. (5)): From Eq. (11) and $X_{ij}[t] = X_{ij}[t]^2$, the variance of $X_{ij}[t]$ is
$$\mathrm{Var}[X_{ij}[t]] = E[X_{ij}[t]^2] - E[X_{ij}[t]]^2 = \frac{3}{|E|} - \frac{9}{|E|^2}. \qquad (12)$$
Consider the covariance between $X_{ij}[t]$ and $X_{i'j'}[t]$. If $i = i'$, then from Eq. (11),
$$\mathrm{Cov}(X_{ij}[t], X_{ij'}[t]) = E[X_{ij}[t] \cdot X_{ij'}[t]] - E[X_{ij}[t]]E[X_{ij'}[t]] = P[X_{ij}[t] = 1, X_{ij'}[t] = 1] - E[X_{ij}[t]]E[X_{ij'}[t]] = P[X_{ij}[t] = 1] \cdot P[X_{ij'}[t] = 1 \mid X_{ij}[t] = 1] - E[X_{ij}[t]]E[X_{ij'}[t]] = \frac{3}{|E|} \cdot \frac{l_{jj'}}{3} - \frac{9}{|E|^2} = \frac{l_{jj'}}{|E|} - \frac{9}{|E|^2}, \qquad (13)$$
where $l_{jj'}$ is the number of hyperedges that the $j$-th and $j'$-th instances share. However, since hyperedges are sampled independently (specifically, uniformly at random with replacement), if $i \neq i'$, then $\mathrm{Cov}(X_{ij}[t], X_{i'j'}[t]) = 0$. This observation, Eq. (9), Eq. (12), and Eq. (13) imply
$$\mathrm{Var}[\bar{m}[t]] = \mathrm{Var}\Big[\sum_{i=1}^{s}\sum_{j=1}^{M[t]} X_{ij}[t]\Big] = \sum_{i=1}^{s}\sum_{j=1}^{M[t]} \mathrm{Var}[X_{ij}[t]] + \sum_{i=1}^{s}\sum_{j \neq j'} \mathrm{Cov}(X_{ij}[t], X_{ij'}[t]) = s \cdot M[t] \cdot \Big(\frac{3}{|E|} - \frac{9}{|E|^2}\Big) + s \sum_{l=0}^{2} p_l[t]\Big(\frac{l}{|E|} - \frac{9}{|E|^2}\Big),$$
where $p_l[t]$ is the number of pairs of h-motif $t$'s instances sharing $l$ hyperedges. This and Eq. (10) imply Eq. (5).

B. PROOF OF THEOREM 4

A random variable $Y_{ij}[t]$ denotes whether the $i$-th sampled hyperwedge (in line 3 of Algorithm 5) is included in the $j$-th instance of h-motif $t$. That is, $Y_{ij}[t] = 1$ if the sampled hyperwedge is included in the instance, and $Y_{ij}[t] = 0$ otherwise. We let $\hat{m}[t]$ be the number of times that h-motif $t$'s instances are counted while processing $r$ sampled hyperwedges. That is,
$$\hat{m}[t] := \sum_{i=1}^{r}\sum_{j=1}^{M[t]} Y_{ij}[t]. \qquad (14)$$
We use $w[t]$ to denote the number of hyperwedges included in each instance of h-motif $t$. That is,
$$w[t] := \begin{cases} 2 & \text{if h-motif } t \text{ is open}, \\ 3 & \text{if h-motif } t \text{ is closed}. \end{cases} \qquad (15)$$
Then, by lines 6-10 of Algorithm 5,
$$\hat{M}[t] = \hat{m}[t] \cdot \frac{1}{w[t]} \cdot \frac{|\wedge|}{r}. \qquad (16)$$
Proof of the Bias of $\hat{M}[t]$ (Eq. (6)): Since each instance of h-motif $t$ contains $w[t]$ hyperwedges, the probability that each $i$-th sampled hyperwedge is contained in each $j$-th instance of h-motif $t$ is
$$P[Y_{ij}[t] = 1] = E[Y_{ij}[t]] = \frac{w[t]}{|\wedge|}. \qquad (17)$$
From linearity of expectation,
$$E[\hat{m}[t]] = \sum_{i=1}^{r}\sum_{j=1}^{M[t]} E[Y_{ij}[t]] = \sum_{i=1}^{r}\sum_{j=1}^{M[t]} \frac{w[t]}{|\wedge|} = \frac{w[t] \cdot r \cdot M[t]}{|\wedge|}.$$
Then, by Eq. (16), $E[\hat{M}[t]] = E[\hat{m}[t]] \cdot \frac{1}{w[t]} \cdot \frac{|\wedge|}{r} = M[t]$.
Proof of the Variance of $\hat{M}[t]$ (Eq. (7) and Eq. (8)): From Eq. (17) and $Y_{ij}[t] = Y_{ij}[t]^2$, the variance of each random variable $Y_{ij}[t]$ is
$$\mathrm{Var}[Y_{ij}[t]] = E[Y_{ij}[t]^2] - E[Y_{ij}[t]]^2 = \frac{w[t]}{|\wedge|} - \frac{w[t]^2}{|\wedge|^2}. \qquad (18)$$
Consider the covariance between $Y_{ij}[t]$ and $Y_{i'j'}[t]$. If $i = i'$, then from Eq. (17),
$$\mathrm{Cov}(Y_{ij}[t], Y_{ij'}[t]) = E[Y_{ij}[t] \cdot Y_{ij'}[t]] - E[Y_{ij}[t]]E[Y_{ij'}[t]] = P[Y_{ij}[t] = 1, Y_{ij'}[t] = 1] - E[Y_{ij}[t]]E[Y_{ij'}[t]] = P[Y_{ij}[t] = 1] \cdot P[Y_{ij'}[t] = 1 \mid Y_{ij}[t] = 1] - E[Y_{ij}[t]]E[Y_{ij'}[t]] = \frac{w[t]}{|\wedge|} \cdot \frac{n_{jj'}}{w[t]} - \frac{w[t]^2}{|\wedge|^2} = \frac{n_{jj'}}{|\wedge|} - \frac{w[t]^2}{|\wedge|^2}, \qquad (19)$$
where $n_{jj'}$ is the number of hyperwedges that the $j$-th and $j'$-th instances share. However, since hyperwedges are sampled independently (specifically, uniformly at random with replacement), if $i \neq i'$, then $\mathrm{Cov}(Y_{ij}[t], Y_{i'j'}[t]) = 0$. This observation, Eq. (14), Eq. (18), and Eq. (19) imply
$$\mathrm{Var}[\hat{m}[t]] = \mathrm{Var}\Big[\sum_{i=1}^{r}\sum_{j=1}^{M[t]} Y_{ij}[t]\Big] = \sum_{i=1}^{r}\sum_{j=1}^{M[t]} \mathrm{Var}[Y_{ij}[t]] + \sum_{i=1}^{r}\sum_{j \neq j'} \mathrm{Cov}(Y_{ij}[t], Y_{ij'}[t]) = r \cdot M[t] \cdot \Big(\frac{w[t]}{|\wedge|} - \frac{w[t]^2}{|\wedge|^2}\Big) + r \sum_{n=0}^{1} q_n[t] \cdot \Big(\frac{n}{|\wedge|} - \frac{w[t]^2}{|\wedge|^2}\Big),$$
where $q_n[t]$ is the number of pairs of h-motif $t$'s instances that share $n$ hyperwedges. This and Eq. (16) imply Eq. (7) and Eq. (8).
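To make the structure of these estimators concrete, the following toy Monte Carlo check (ours, not part of the paper's code) mirrors the proof of Theorem 2: it samples hyperedges uniformly with replacement, counts how many given 3-hyperedge instances contain each sample, rescales by $|E|/(3s)$, and compares the average estimate with the true number of instances. The instance list and all names are made up for illustration; no actual h-motif enumeration is performed.

```python
import random

def estimate_num_instances(num_hyperedges, instances, s, rng):
    """One MoCHy-A-style estimate: sample s hyperedge ids uniformly with
    replacement, count instance memberships, and rescale by |E| / (3s)."""
    m_bar = 0
    for _ in range(s):
        e = rng.randrange(num_hyperedges)
        m_bar += sum(e in inst for inst in instances)
    return m_bar * num_hyperedges / (3 * s)

# Toy setup: 10 hyperedges (ids 0..9) and 4 "instances", each a set of 3 ids.
instances = [{0, 1, 2}, {1, 2, 3}, {4, 5, 6}, {0, 7, 9}]
rng = random.Random(0)
estimates = [estimate_num_instances(10, instances, s=30, rng=rng)
             for _ in range(2000)]
print(sum(estimates) / len(estimates))  # concentrates near the true count, 4
```

Averaged over many trials, the estimate concentrates around 4, consistent with the unbiasedness shown above; the analogous check for Theorem 4 would replace hyperedges with hyperwedges and the factor 3 with $w[t]$.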
C. REFERENCES
[1] Supplementary document. Available online: https://github.com/geonlee0325/MoCHy, 2020.
[2] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In CVPR, 2005.
[3] N. K. Ahmed, N. Duffield, T. L. Willke, and R. A. Rossi. On sampling from massive graph streams. PVLDB, 10(11):1430–1441, 2017.
[4] N. K. Ahmed, J. Neville, R. A. Rossi, and N. Duffield. Efficient graphlet counting for large networks. In ICDM, 2015.
[5] N. K. Ahmed, J. Neville, R. A. Rossi, N. G. Duffield, and T. L. Willke. Graphlet decomposition: Framework, algorithms, and applications. Knowledge and Information Systems, 50(3):689–722, 2017.
[6] S. G. Aksoy, T. G. Kolda, and A. Pinar. Measuring and modeling bipartite graphs with community structure. Journal of Complex Networks, 5(4):581–603, 2017.
[7] N. Alon, R. Yuster, and U. Zwick. Color-coding. JACM, 42(4):844–856, 1995.
[8] I. Amburg, N. Veldt, and A. R. Benson. Hypergraph clustering with categorical edge labels. In WWW, 2020.
[9] C. Aslay, M. A. U. Nasir, G. De Francisci Morales, and A. Gionis. Mining frequent patterns in evolving graphs. In CIKM, 2018.
[10] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[11] L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient algorithms for large-scale local triangle counting. TKDD, 4(3):1–28, 2010.
[12] A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, and J. Kleinberg. Simplicial closure and higher-order link prediction. PNAS, 115(48):E11221–E11230, 2018.
[13] A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
[14] A. R. Benson, R. Kumar, and A. Tomkins. Sequences of sets. In KDD, 2018.
[15] M. A. Bhuiyan, M. Rahman, M. Rahman, and M. Al Hasan. GUISE: Uniform sampling of graphlets for large graph analysis. In ICDM, 2012.
[16] S. P. Borgatti and M. G. Everett. Network analysis of 2-mode data. Social Networks, 19(3):243–270, 1997.
[17] M. Bressan, F. Chierichetti, R. Kumar, S. Leucci, and A. Panconesi. Counting graphlets: Space vs time. In WSDM, 2017.
[18] M. Bressan, S. Leucci, and A. Panconesi. Motivo: Fast motif counting via succinct color coding and adaptive sampling. PVLDB, 12(11):1651–1663, 2019.
[19] J. Bu, S. Tan, C. Chen, C. Wang, H. Wu, L. Zhang, and X. He. Music recommendation by unified hypergraph: Combining social media information and music content. In MM, 2010.
[20] L. Chen, X. Qu, M. Cao, Y. Zhou, W. Li, B. Liang, W. Li, W. He, C. Feng, X. Jia, et al. Identification of breast cancer patients based on human signaling network motifs. Scientific Reports, 3:3368, 2013.
[21] X. Chen, Y. Li, P. Wang, and J. C. Lui. A general framework for estimating graphlet statistics via random walk. PVLDB, 10(3):253–264, 2016.
[22] L. De Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. In KDD, 2016.
[23] M. T. Do, S.-e. Yoon, B. Hooi, and K. Shin. Structural patterns and generative models of real-world hypergraphs. In KDD, 2020.
[24] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. ACM SIGCOMM Computer Communication Review, 29(4):251–262, 1999.
[25] G. Han and H. Sethu. Waddling random walk: Fast and accurate mining of motif statistics in large graphs. In ICDM, 2016.
[26] P. W. Holland and S. Leinhardt. A method for detecting structure in sociometric data. In Social Networks, pages 411–432. Elsevier, 1977.
[27] X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, 2013.
[28] X. Hu, Y. Tao, and C.-W. Chung. I/O-efficient algorithms on triangle listing and counting. TODS, 39(4):1–30, 2014.
[29] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas. Image retrieval via probabilistic hypergraph ranking. In CVPR, 2010.
[30] T. Hwang, Z. Tian, R. Kuang, and J.-P. Kocher. Learning on weighted hypergraphs to integrate protein interactions and gene expressions for cancer outcome prediction. In ICDM, 2008.
[31] M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In KDD, 2013.
[32] J. Jiang, Y. Wei, Y. Feng, J. Cao, and Y. Gao. Dynamic hypergraph neural networks. In IJCAI, 2019.
[33] U. Kang, C. E. Tsourakakis, A. P. Appel, C. Faloutsos, and J. Leskovec. Radius plots for mining tera-byte scale graphs: Algorithms, patterns, and observations. In SDM, 2010.
[34] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: Applications in VLSI domain. TVLSI, 7(1):69–79, 1999.
[35] G. Karypis and V. Kumar. Multilevel k-way hypergraph partitioning. VLSI Design, 11(3):285–300, 2000.
[36] J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In SIGMOD, 2014.
[37] B. Klimt and Y. Yang. The Enron corpus: A new dataset for email classification research. In ECML PKDD, 2004.
[38] S. Ko and W.-S. Han. TurboGraph++: A scalable and fast graph analytics system. In SIGMOD, 2018.
[39] J. B. Lee, R. A. Rossi, X. Kong, S. Kim, E. Koh, and A. Rao. Graph convolutional networks with motif-based attention. In CIKM, 2019.
[40] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In KDD, 2005.
[41] D. Li, Z. Xu, S. Li, and X. Sun. Link prediction in social networks based on hypergraph. In WWW, 2013.
[42] L. Li and T. Li. News recommendation via hypergraph learning: Encapsulation of user behavior and news content. In WSDM, 2013.
[43] P.-Z. Li, L. Huang, C.-D. Wang, and J.-H. Lai. EdMot: An edge enhancement approach for motif-aware community detection. In KDD, 2019.
[44] R. Mastrandrea, J. Fournet, and A. Barrat. Contact patterns in a high school: A comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE, 10(9), 2015.
[45] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon. Superfamilies of evolved and designed networks. Science, 303(5663):1538–1542, 2004.
[46] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
[47] M. Ouyang, M. Toulouse, K. Thulasiraman, F. Glover, and J. S. Deogun. Multilevel cooperative search for the circuit/hypergraph partitioning problem. TCAD, 21(6):685–693, 2002.
[48] R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a MapReduce implementation. Information Processing Letters, 112(7):277–281, 2012.
[49] A. Paranjape, A. R. Benson, and J. Leskovec. Motifs in temporal networks. In WSDM, 2017.
[50] A. Pinar, C. Seshadhri, and V. Vishal. ESCAPE: Efficiently counting all 5-vertex subgraphs. In WWW, 2017.
[51] R. A. Rossi, N. K. Ahmed, A. Carranza, D. Arbour, A. Rao, S. Kim, and E. Koh. Heterogeneous network motifs. arXiv preprint arXiv:1901.10026, 2019.
[52] R. A. Rossi, N. K. Ahmed, and E. Koh. Higher-order network representation learning. In WWW Companion, 2018.
[53] T. K. Saha and M. Al Hasan. Finding network motifs using MCMC sampling. In Complex Networks VI, pages 13–24. Springer, 2015.
[54] S.-V. Sanei-Mehri, A. E. Sariyuce, and S. Tirthapura. Butterfly counting in bipartite networks. In KDD, 2018.
[55] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31(1):64–68, 2002.
[56] K. Shin. WRS: Waiting room sampling for accurate triangle counting in real graph streams. In ICDM, 2017.
[57] K. Shin, S. Oh, J. Kim, B. Hooi, and C. Faloutsos. Fast, accurate and provable triangle counting in fully dynamic graph streams. TKDD, 14(2):1–39, 2020.
[58] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. Hsu, and K. Wang. An overview of Microsoft Academic Service (MAS) and applications. In WWW, 2015.
[59] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J.-F. Pinton, M. Quaggiotto, W. Van den Broeck, C. Régis, B. Lina, et al. High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE, 6(8), 2011.
[60] L. Sun, S. Ji, and J. Ye. Hypergraph spectral learning for multi-label classification. In KDD, 2008.
[61] C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. DOULION: Counting triangles in massive graphs with a coin. In KDD, 2009.
[62] C. E. Tsourakakis, J. Pachocki, and M. Mitzenmacher. Scalable motif-aware graph clustering. In WWW, 2017.
[63] P. Wang, P. Jia, Y. Qi, Y. Sun, J. Tao, and X. Guan. REPT: A streaming algorithm of approximating global and local triangle counts in parallel. In ICDE, 2019.
[64] P. Wang, J. C. Lui, B. Ribeiro, D. Towsley, J. Zhao, and X. Guan. Efficiently estimating motif statistics of large networks. TKDD, 9(2):1–27, 2014.
[65] P. Wang, Y. Qi, Y. Sun, X. Zhang, J. Tao, and X. Guan. Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. PVLDB, 11(2):162–175, 2017.
[66] D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.
[67] D. Yang, B. Qu, J. Yang, and P. Cudre-Mauroux. Revisiting user mobility and social relationships in LBSNs: A hypergraph embedding approach. In WWW, 2019.
[68] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich. Local higher-order graph clustering. In KDD, 2017.
[69] S.-e. Yoon, H. Song, K. Shin, and Y. Yi. How much and when do we need higher-order information in hypergraphs? A case study on hyperedge prediction. In WWW, 2020.
[70] J. Yu, D. Tao, and M. Wang. Adaptive hypergraph learning and its application in image classification. TIP, 21(7):3262–3272, 2012.
[71] Y. Yu, Z. Lu, J. Liu, G. Zhao, and J.-r. Wen. RUM: Network representation learning using motifs. In ICDE, 2019.
[72] M. Zhang, Z. Cui, S. Jiang, and Y. Chen. Beyond link prediction: Predicting hyperlinks in adjacency space. In AAAI, 2018.
[73] H. Zhao, X. Xu, Y. Song, D. L. Lee, Z. Chen, and H. Gao. Ranking users in social networks with higher-order structures. In AAAI, 2018.
[74] D. Zhou, J. Huang, and B. Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In NIPS, 2007.