Community Detection Algorithm Using Hypergraph Modularity

Bogumił Kamiński, Paweł Prałat, and François Théberge
Updated: 2021/01/08
Department of Mathematics, Ryerson University
File: JMM-Hypergraphs

Overview
1. Hypergraphs
2. Modularity Function
3. Algorithms
4. Conclusion

Hypergraphs

From Graphs to Hypergraphs
- Graphs are commonly used to model pairwise relations
  - There are many tools to deal with graphs
- Hypergraphs can represent relations beyond pairwise
  - Hyperedges have arbitrary size
  - More complex: often reduced to graphs
  - Recent software packages handle hypergraphs
- Our main goal: develop more hypergraph-aware tools

From Graphs to Hypergraphs: Detecting Communities
- Clustering: partition vertices into communities
- Graphs: each edge is either within a community or between two communities (noise)
- Hypergraphs: which are community edges?

From Graphs to Hypergraphs: Detecting Communities
Starting point, a simple hypergraph benchmark:
- A hyperedge of size $d$ is a community edge if $> d/2$ vertices are from the same community (to avoid multi-class)
- Other edges are noise edges
- Community edges can be non-homogeneous
- Homogeneity (proportion from the dominant community) for community edges lies in $[\tau_{\min}, \tau_{\max}]$
- $0.5 < \tau_{\min} \le \tau_{\max} \le 1$

Modularity Function

Graph Modularity
For a graph $G = (V, E)$ and a partition $\mathbf{A} = \{A_1, \ldots, A_k\}$ of $V$:

$$q_G(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e_G(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \left( \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)} \right)^2$$

- $e_G(A_i) = |\{\{v_j, v_k\} \in E : v_j \in A_i, v_k \in A_i\}|$
- The first term is the edge contribution: the fraction of edges that fall within one of the parts
- The second term is the degree tax: the expected fraction of edges that do the same in the corresponding null model (Chung-Lu random graph)

Hypergraph Modularity
For a hypergraph $H = (V, E)$:
- let $E_d$ be the set of hyperedges of size $d$
- null model (generalized Chung-Lu random graph): $e \in E_d$ is a multiset (generalization of loops)
There are several possible definitions of edge contribution for hypergraphs.
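As a quick numeric check of the graph modularity described on the Graph Modularity slide (edge contribution minus degree tax), here is a minimal Python sketch. The function name `graph_modularity`, the edge-list input format, and the toy graph (two triangles joined by one bridge edge) are illustrative assumptions, not part of the original slides.

```python
def graph_modularity(edges, partition):
    """Graph modularity of a partition: edge contribution minus degree tax.

    edges: list of (u, v) pairs; partition: list of disjoint vertex sets.
    Both input formats are assumptions made for this sketch.
    """
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    inside = [0] * len(partition)  # e_G(A_i): edges with both ends in part i
    vol = [0] * len(partition)     # vol(A_i): sum of degrees in part i
    for u, v in edges:
        vol[part_of[u]] += 1
        vol[part_of[v]] += 1
        if part_of[u] == part_of[v]:
            inside[part_of[u]] += 1
    vol_total = sum(vol)           # vol(V) = 2|E| for a graph
    edge_contribution = sum(inside) / len(edges)
    degree_tax = sum((a / vol_total) ** 2 for a in vol)
    return edge_contribution - degree_tax


# Toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = [{0, 1, 2}, {3, 4, 5}]
q = graph_modularity(edges, partition)  # 6/7 from the edges, 1/2 degree tax
print(q)
```

Placing all vertices in a single part gives an edge contribution of 1 and a degree tax of 1, so the modularity is 0; the two-triangle split scores higher because only the bridge edge crosses parts.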
For example, two possible definitions of the edge contribution are:
- strict: all $d$ vertices need to be from the same community
- majority: $> d/2$ vertices need to be from the same community

Generalized Hypergraph Modularity
Let $e_H^{c,d}(A_i)$ be the number of hyperedges of size $d$ that have exactly $c$ vertices in $A_i$, with $d/2 < c \le d$, and

$$q_H^{c,d}(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e_H^{c,d}(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \frac{|E_d|}{|E|} \, \mathbb{P}\left( \mathrm{Bin}\left(d, \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)}\right) = c \right)$$

This leads to the generalized H-modularity:

$$q_H(\mathbf{A}) = \sum_{d \ge 2} \; \sum_{c = \lfloor d/2 \rfloor + 1}^{d} w_{c,d} \, q_H^{c,d}(\mathbf{A}),$$

controlled by hyper-parameters $w_{c,d} \in [0, 1]$.
- strict: all $w_{c,d} = 0$ except $w_{d,d} = 1$
- majority: $w_{c,d} = 1$

Generalized Hypergraph Modularity
- the strict definition is convenient and leads to interesting results (2019 PLOS ONE paper)
  - ... but it is likely too strict in practice
- empirical results point to some choices for $q_H(\mathbf{A})$:
  - use increasing functions of $c$ such as $w_{c,d} = (c/d)$
  - set $w_{c,d} = 0$ for $c/d < \hat{\tau}_{\min}$, where $\hat{\tau}_{\min}$ is estimated from the data
- further adapting the $w_{c,d}$ to the data is work in progress

Algorithms

Clustering
Given hypergraph $H = (V, E)$:
- To partition $V$, one may reduce the problem by considering its 2-section graph $H_{[2]}$
- Algorithms such as Louvain can then be used
- Kumar et al. (Complex Networks 2019) proposed a nice refinement:
  - build $H_{[2]}$ and run Louvain
  - re-weight edges in $H_{[2]}$ based on a measure of homogeneity to favour purer edges
  - repeat until convergence
- We propose two ways to include the hypergraph-based objective $q_H()$

Algorithm #1: Last Step (LS)
Consider the new objective $q_H()$ only in the last step:
1. partition $V$ via graph clustering on $H_{[2]}$
2. for every vertex (in random order)
   2.1 compute the change in $q_H()$ if we move it to each of its neighbours' communities in turn
   2.2 apply the best move
3. repeat step 2 until convergence

Algorithm #2: Hybrid Algorithm (HA)
Here we introduce the hypergraph-based objective $q_H()$ sooner.
1. form small clumps of vertices by running level-1 Louvain or a similar algorithm on $H_{[2]}$
2. merge clumps if $q_H()$ improves; repeat until no more improvement is possible
3. run steps 2-3 of the LS algorithm (move nodes)

Example: Benchmarks
A few general observations:
- $q_H()$-based algorithms generally help, given non-homogeneity
- LS, HA: not as good with noise edges
- LS+, HA+: refinement using an estimate for $\hat{\tau}_{\min}$

Example: Game of Thrones Scenes [1]
[Figure: $q_H()$-based clusters for two main characters]
[1] https://github.com/jeffreylancaster/game-of-thrones

Conclusion
Using a hypergraph-based objective is useful, in particular when non-homogeneity is present.
Future work:
- Further adjust the weights $w_{c,d}$ in $q_H()$ based on data
- More realistic benchmark models
- More tests on real hypergraphs
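
To make the generalized hypergraph modularity from the Modularity Function part concrete, the sketch below evaluates it on a toy hypergraph for the strict and majority weight choices. It is a minimal illustration under stated assumptions: the function name `hypergraph_modularity`, the input format (hyperedges as tuples of distinct vertices, the partition as a list of sets), and the toy data are all hypothetical, and multiset hyperedges are not handled.

```python
from collections import Counter
from math import comb


def hypergraph_modularity(hyperedges, partition, weight):
    """Weighted sum over (c, d) of edge contribution minus degree tax.

    weight(c, d) plays the role of the hyper-parameter for hyperedges of
    size d with exactly c > d/2 vertices in one part (assumed interface).
    """
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    n_edges = len(hyperedges)

    # vol(A_i): sum of vertex degrees in part i; vol(V): sum of edge sizes.
    vol = [0] * len(partition)
    for edge in hyperedges:
        for v in edge:
            vol[part_of[v]] += 1
    vol_total = sum(vol)

    # Number of hyperedges per size d, and number of size-d hyperedges
    # whose dominant part holds exactly c > d/2 of their vertices.
    size_count = Counter(len(edge) for edge in hyperedges)
    dominant_count = Counter()
    for edge in hyperedges:
        d = len(edge)
        c = max(Counter(part_of[v] for v in edge).values())
        if 2 * c > d:
            dominant_count[(c, d)] += 1

    q = 0.0
    for d, n_d in size_count.items():
        if d < 2:
            continue  # the sum runs over sizes d >= 2
        for c in range(d // 2 + 1, d + 1):
            edge_contribution = dominant_count[(c, d)] / n_edges
            # Binomial probability that exactly c of d slots fall in a part.
            degree_tax = sum(
                (n_d / n_edges) * comb(d, c)
                * (a / vol_total) ** c * (1 - a / vol_total) ** (d - c)
                for a in vol
            )
            q += weight(c, d) * (edge_contribution - degree_tax)
    return q


def strict(c, d):
    return 1.0 if c == d else 0.0


def majority(c, d):
    return 1.0


# Toy hypergraph: two size-3 community edges and one size-2 crossing edge.
hyperedges = [(0, 1, 2), (3, 4, 5), (2, 3)]
partition = [{0, 1, 2}, {3, 4, 5}]
q_strict = hypergraph_modularity(hyperedges, partition, strict)
q_majority = hypergraph_modularity(hyperedges, partition, majority)
print(q_strict, q_majority)
```

On this toy input the majority weights score lower than the strict ones because they also pay the degree tax for the case of two out of three vertices in one part, which no hyperedge here realizes. A linear choice such as `lambda c, d: c / d` can be passed in the same way.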
