Community Detection Algorithm Using Hypergraph Modularity

Bogumił Kamiński, Paweł Prałat, and François Théberge
Updated: 2021/01/08
Department of Mathematics, Ryerson University
File: JMM-Hypergraphs

Overview

1. Hypergraphs
2. Modularity Function
3. Algorithms
4. Conclusion

Hypergraphs

From Graphs to Hypergraphs

• Graphs are commonly used to model pairwise relations
  ◦ There are many tools to deal with graphs
• Hypergraphs can represent relations beyond pairwise
  ◦ Hyperedges have arbitrary size
  ◦ More complex: often reduced to graphs
  ◦ Recent software packages handle hypergraphs
• Our main goal: develop more hypergraph-aware tools

From Graphs to Hypergraphs: Detecting Communities

• Clustering: partition vertices into communities
• Graphs: each edge is either within a community or between two communities (noise)
• Hypergraphs: which are community edges?

Starting point, a simple hypergraph benchmark:

• A hyperedge of size $d$ is a community edge if more than $d/2$ of its vertices are from the same community (to avoid multi-class)
• Other edges are noise edges
• Community edges can be non-homogeneous
• Homogeneity (proportion of vertices from the dominant community) for community edges: $\tau \in [\tau_{\min}, \tau_{\max}]$
• $0.5 < \tau_{\min} \le \tau_{\max} \le 1$

Modularity Function

Graph Modularity

For a graph $G = (V, E)$ and a partition $\mathbf{A} = \{A_1, \ldots, A_k\}$ of $V$:

$$q_G(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e_G(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \left( \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)} \right)^2$$

• $e_G(A_i) = |\{\{v_j, v_k\} \in E : v_j, v_k \in A_i\}|$
• The first term is the edge contribution: the fraction of edges that fall within one of the parts
• The second term is the degree tax: the expected fraction of edges that do the same in the corresponding null model (Chung-Lu random graph)

Hypergraph Modularity

For a hypergraph $H = (V, E)$:

• let $E_d$ be the set of hyperedges of size $d$
• null model (generalized Chung-Lu random graph): $e \in E_d$ is a multiset (generalization of loops)

There are several possible definitions of edge contribution for hypergraphs.
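
Before the hypergraph definitions, the graph modularity function from the Graph Modularity slide can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function name `graph_modularity`, the edge-list input, and the list-of-sets partition are assumptions made for illustration, and the graph is assumed to be simple and unweighted.

```python
from collections import defaultdict


def graph_modularity(edges, parts):
    """Edge contribution minus degree tax for an unweighted graph.

    edges: list of (u, v) pairs; parts: list of disjoint vertex sets
    covering every vertex that appears in an edge (hypothetical format).
    """
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    internal = 0                 # edges with both endpoints in the same part
    vol = defaultdict(int)       # vol(A_i): sum of degrees of vertices in A_i
    for u, v in edges:
        vol[part_of[u]] += 1
        vol[part_of[v]] += 1
        if part_of[u] == part_of[v]:
            internal += 1
    vol_total = 2 * len(edges)   # vol(V)
    edge_contribution = internal / len(edges)
    degree_tax = sum((x / vol_total) ** 2 for x in vol.values())
    return edge_contribution - degree_tax


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = [{0, 1, 2}, {3, 4, 5}]
print(graph_modularity(edges, parts))  # about 0.357 (= 6/7 - 1/2)
```

Putting all vertices into a single part gives an edge contribution of 1 and a degree tax of 1, so the modularity is 0, which is the usual sanity check for this function.
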
Two examples of edge contribution definitions:

• strict: all $d$ vertices need to be from the same community
• majority: more than $d/2$ vertices need to be from the same community

Generalized Hypergraph Modularity

Let $e_H^{c,d}(A_i)$ be the number of hyperedges of size $d$ that have exactly $c$ vertices in $A_i$, with $d/2 < c \le d$, and

$$q_H^{c,d}(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e_H^{c,d}(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \frac{|E_d|}{|E|} \, \mathbb{P}\left( \mathrm{Bin}\left( d, \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)} \right) = c \right)$$

This leads to the generalized H-modularity:

$$q_H(\mathbf{A}) = \sum_{d \ge 2} \sum_{c = \lfloor d/2 \rfloor + 1}^{d} w_{c,d} \, q_H^{c,d}(\mathbf{A}),$$

controlled by hyper-parameters $w_{c,d} \in [0, 1]$:

• strict: all $w_{c,d} = 0$ except $w_{d,d} = 1$
• majority: $w_{c,d} = 1$

• the strict definition is convenient and leads to interesting results (2019 PLOS ONE paper)
  ◦ ... but it is likely too strict in practice
• empirical results point to some choices for $q_H(\mathbf{A})$:
  ◦ use increasing functions of $c$ such as $w_{c,d} = (c/d)$
  ◦ set $w_{c,d} = 0$ for $c/d < \hat{\tau}_{\min}$, where $\hat{\tau}_{\min}$ is estimated from the data
• further adapting the $w_{c,d}$ to the data is work in progress

Algorithms

Clustering

Given a hypergraph $H = (V, E)$:

• To partition $V$, one may reduce the problem by considering its 2-section graph $G = H_{[2]}$
• Algorithms such as Louvain can then be used
• Kumar et al. (Complex Networks 2019) proposed a nice refinement:
  ◦ build $G = H_{[2]}$ and run Louvain
  ◦ re-weight the edges based on a measure of homogeneity to favour purer edges
  ◦ repeat until convergence
• We propose two ways to include the hypergraph-based objective $q_H(\cdot)$

Algorithm #1 – Last Step (LS)

Consider the new objective $q_H(\cdot)$ only in the last step:

1. partition $V$ via graph clustering on $G = H_{[2]}$
2. for every vertex (in random order)
   2.1 compute the change in $q_H(\cdot)$ if we move it to each of its neighbours' communities in turn
   2.2 apply the best move
3. repeat step 2 until convergence

Algorithm #2 – Hybrid Algorithm (HA)

Here we introduce the hypergraph-based objective $q_H(\cdot)$ sooner.

1. form small clumps of vertices by running level-1 Louvain or a similar algorithm on $G = H_{[2]}$
2.
   merge clumps if $q_H(\cdot)$ improves; repeat until no more improvement is possible
3. run steps 2-3 of the LS algorithm (move nodes)

Example: Benchmarks

A few general observations:

• $q_H$-based algorithms generally help, given non-homogeneity
• LS, HA: not as good with noise edges
• LS+, HA+: refinement using the estimate $\hat{\tau}_{\min}$

Example: Game of Thrones Scenes [1]

[Figure: $q_H$-based clusters for two main characters]

[1] https://github.com/jeffreylancaster/game-of-thrones

Conclusion

Using a hypergraph-based objective is useful, in particular when non-homogeneity is present.

Future work:

• Further adjust the weights $w_{c,d}$ in $q_H(\cdot)$ based on data
• More realistic benchmark models
• More tests on real hypergraphs
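
As a supplementary illustration of the objective used by both algorithms, the generalized H-modularity from the Modularity Function part can be sketched in a few lines of Python. This is a hedged sketch under stated assumptions, not the authors' code: the names `h_modularity`, `strict`, and `majority` are hypothetical, hyperedges are given as tuples of vertices, every hyperedge has unit weight, and the degree of a vertex is taken to be the number of hyperedge slots it occupies.

```python
from collections import Counter
from math import comb


def binom_pmf(d, p, c):
    """P(Bin(d, p) = c)."""
    return comb(d, c) * p**c * (1 - p) ** (d - c)


def strict(c, d):
    """Strict weights: only hyperedges fully inside one part count."""
    return 1.0 if c == d else 0.0


def majority(c, d):
    """Majority weights: every c > d/2 counts equally."""
    return 1.0


def h_modularity(hyperedges, parts, weight):
    """Weighted edge contribution minus weighted degree tax, divided by |E|.

    hyperedges: list of vertex tuples; parts: list of disjoint vertex sets;
    weight(c, d): hyper-parameter w_{c,d} in [0, 1] (hypothetical interface).
    """
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    vol = Counter()         # vol(A_i)
    size_count = Counter()  # |E_d| for each hyperedge size d
    edge_contribution = 0.0
    for e in hyperedges:
        d = len(e)
        size_count[d] += 1
        counts = Counter(part_of[v] for v in e)
        for i, c in counts.items():
            vol[i] += c
        # At most one part can contain more than d/2 of the vertices.
        _, c = counts.most_common(1)[0]
        if 2 * c > d:
            edge_contribution += weight(c, d)
    vol_total = sum(vol.values())
    degree_tax = 0.0
    for d, n_d in size_count.items():
        for c in range(d // 2 + 1, d + 1):
            w = weight(c, d)
            if w:
                degree_tax += w * n_d * sum(
                    binom_pmf(d, vol[i] / vol_total, c)
                    for i in range(len(parts))
                )
    return (edge_contribution - degree_tax) / len(hyperedges)


hyperedges = [(0, 1, 2), (0, 1, 3), (3, 4, 5)]
parts = [{0, 1, 2}, {3, 4, 5}]
print(h_modularity(hyperedges, parts, strict))
print(h_modularity(hyperedges, parts, majority))
```

When every hyperedge has size 2, the strict and majority weights coincide and the function reduces to ordinary graph modularity, which gives a simple consistency check. A weight function such as `lambda c, d: c / d` can be passed in the same way to try the increasing weights mentioned on the slides.
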
