Community Detection Algorithm Using Hypergraph Modularity
Bogumił Kamiński, Paweł Prałat, and François Théberge
Updated: 2021/01/08
Department of Mathematics, Ryerson University
File: JMM-Hypergraphs
Overview
1. Hypergraphs
2. Modularity Function
3. Algorithms
4. Conclusion
Hypergraphs
From Graphs to Hypergraphs
- Graphs are commonly used to model pairwise relations
  - There are many tools to deal with graphs
- Hypergraphs can represent relations beyond pairwise
  - Hyperedges have arbitrary size
  - More complex: often reduced to graphs
  - Recent software packages handle hypergraphs
- Our main goal: develop more hypergraph-aware tools
From Graphs to Hypergraphs: Detecting Communities
- Clustering: partition vertices into communities
- Graphs: each edge is either within a community or between two communities (noise)
- Hypergraphs: which are community edges?
From Graphs to Hypergraphs: Detecting Communities
Starting point — simple hypergraph benchmark:
- A hyperedge of size $d$ is a community edge if more than $d/2$ vertices are from the same community (to avoid multi-class)
- Other edges are noise edges
- Community edges can be non-homogeneous
- Homogeneity (proportion of vertices from the dominant community) for community edges: $\tau \in [\tau_{\min}, \tau_{\max}]$, with $0.5 < \tau_{\min} \le \tau_{\max} \le 1$
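The community-edge rule and the homogeneity of an edge can be checked directly from a vertex labelling. The following is a minimal sketch, not the authors' benchmark code; the function name `classify_hyperedge` and the toy labelling are assumptions made for illustration.

```python
from collections import Counter

def classify_hyperedge(edge, community):
    """Return (is_community_edge, dominant_community, homogeneity) for one hyperedge."""
    counts = Counter(community[v] for v in edge)
    dominant, c = counts.most_common(1)[0]   # c = vertices in the dominant community
    d = len(edge)
    # Community edge: strictly more than d/2 vertices share a community.
    return 2 * c > d, dominant, c / d

# Hypothetical toy labelling of six vertices into three communities.
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}

for edge in [("a", "b", "c"), ("a", "b", "d"), ("a", "d", "f"), ("a", "b", "d", "e")]:
    print(edge, classify_hyperedge(edge, community))
```

The strict inequality matters for even sizes: an edge of size 4 split 2/2 has no dominant community and is treated as noise.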
Modularity Function
Graph Modularity
For a graph $G = (V, E)$ and a partition $\mathbf{A} = \{A_1, \ldots, A_k\}$ of $V$:
$$q(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \left( \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)} \right)^2$$
- $e(A_i) = |\{\{v_j, v_k\} \in E : v_j, v_k \in A_i\}|$
- The first term is the edge contribution: the fraction of edges that fall within one of the parts
- The second term is the degree tax: the expected fraction of edges that do the same in the corresponding null model (Chung-Lu random graph)
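As a concrete instance of the two terms, the sketch below computes the edge contribution and the degree tax for a small unweighted graph, taking $\mathrm{vol}(A_i)$ to be the sum of the degrees of the vertices in $A_i$. The function name `graph_modularity` and the example graph (two triangles joined by one edge) are illustrative assumptions.

```python
from collections import defaultdict

def graph_modularity(edges, community):
    """Edge contribution minus degree tax for an unweighted graph."""
    m = len(edges)
    inside = 0                 # edges with both endpoints in the same part
    vol = defaultdict(int)     # vol(A_i): sum of degrees in part A_i
    for u, v in edges:
        cu, cv = community[u], community[v]
        vol[cu] += 1
        vol[cv] += 1
        if cu == cv:
            inside += 1
    total_vol = 2 * m          # vol(V)
    edge_contribution = inside / m
    degree_tax = sum((x / total_vol) ** 2 for x in vol.values())
    return edge_contribution - degree_tax

# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(graph_modularity(edges, community))   # 6/7 - 1/2
```

Here six of the seven edges are internal and each part holds half of the total volume, so the value is $6/7 - 2 \cdot (1/2)^2$.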
Hypergraph Modularity
For a hypergraph $H = (V, E)$:
- let $E_d$ be the set of hyperedges of size $d$
- null model (generalized Chung-Lu random graph): $e \in E_d$ is a multiset (generalization of loops)
There are several possible definitions of edge contribution for hypergraphs. For example,
- strict: all $d$ vertices need to be from the same community
- majority: more than $d/2$ vertices need to be from the same community
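The two definitions can give different edge contributions on the same hypergraph. The sketch below counts the fraction of hyperedges that qualify under each rule; the names `is_community_edge` and `edge_contribution`, and the toy data, are assumptions for illustration only.

```python
from collections import Counter

def is_community_edge(edge, community, rule="majority"):
    """Apply the strict or the majority rule to one hyperedge."""
    d = len(edge)
    c = max(Counter(community[v] for v in edge).values())
    if rule == "strict":
        return c == d          # all d vertices in one community
    if rule == "majority":
        return 2 * c > d       # more than d/2 vertices in one community
    raise ValueError(f"unknown rule: {rule}")

def edge_contribution(hyperedges, community, rule):
    """Fraction of hyperedges counted as community edges under the given rule."""
    hits = sum(is_community_edge(e, community, rule) for e in hyperedges)
    return hits / len(hyperedges)

community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
hyperedges = [("a", "b", "c"), ("a", "b", "d"), ("a", "d", "f"), ("d", "e")]
print(edge_contribution(hyperedges, community, "strict"))    # 0.5
print(edge_contribution(hyperedges, community, "majority"))  # 0.75
```

The edge `("a", "b", "d")` is the one that separates the two rules: two of its three vertices share a community, so it counts under majority but not under strict.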
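Generalized Hypergraph Modularity
Let $e^{c,d}(A_i)$ be the number of hyperedges of size $d$ that have exactly $c$ vertices in $A_i$, with $d/2 < c \le d$, and
$$q^{c,d}(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e^{c,d}(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \frac{|E_d|}{|E|} \, \mathbb{P}\left( \mathrm{Bin}\left( d, \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)} \right) = c \right)$$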
This leads to the generalized H-modularity:
$$q(\mathbf{A}) = \sum_{d \ge 2} \sum_{c = \lfloor d/2 \rfloor + 1}^{d} w_{c,d} \, q^{c,d}(\mathbf{A}),$$
controlled by hyper-parameters $w_{c,d} \in [0, 1]$.
- strict: all $w_{c,d} = 0$ except $w_{d,d} = 1$
- majority: $w_{c,d} = 1$
Generalized Hypergraph Modularity
- the strict definition is convenient and leads to interesting results (2019 PLOS ONE paper)
  - ... but it is likely too strict in practice
- empirical results point to some choices for $q(\mathbf{A})$:
  - use increasing functions of $c$ such as $w_{c,d} = c/d$
  - set $w_{c,d} = 0$ for $c/d < \hat{\tau}_{\min}$, where $\hat{\tau}_{\min}$ is estimated from the data
- further adapting the $w_{c,d}$ to the data is work in progress
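The weighted sum of the $q^{c,d}$ terms can be evaluated in one pass over the hyperedges plus a loop over sizes $d$ and counts $c$. The sketch below is one straightforward way to do this under the definitions above, not the authors' implementation; the function names, the `thresholded` helper and the toy hypergraph are assumptions. Hyperedges are tuples so that repeated vertices (multisets) are allowed.

```python
from collections import Counter, defaultdict
from math import comb

def binom_pmf(d, p, c):
    """P(Bin(d, p) = c)."""
    return comb(d, c) * p**c * (1 - p) ** (d - c)

def hypergraph_modularity(hyperedges, community, weight):
    """q(A) = sum over d and c of weight(c, d) * q^{c,d}(A)."""
    m = len(hyperedges)
    vol = defaultdict(int)     # vol(A_i): total degree of the vertices in part A_i
    size_count = Counter()     # |E_d| for each hyperedge size d
    edge_term = 0.0
    for e in hyperedges:
        d = len(e)
        size_count[d] += 1
        counts = Counter(community[v] for v in e)
        for part, k in counts.items():
            vol[part] += k
        c = max(counts.values())
        if 2 * c > d:          # at most one part can hold more than d/2 vertices
            edge_term += weight(c, d)
    total_vol = sum(vol.values())
    tax = 0.0
    for d, n_d in size_count.items():
        for c in range(d // 2 + 1, d + 1):
            w = weight(c, d)
            if w:
                tax += w * n_d * sum(binom_pmf(d, x / total_vol, c) for x in vol.values())
    return (edge_term - tax) / m

def strict(c, d):
    return 1.0 if c == d else 0.0

def majority(c, d):
    return 1.0

def linear(c, d):
    return c / d

def thresholded(tau_min):
    """Majority weights, set to zero when c/d falls below tau_min."""
    return lambda c, d: 1.0 if c / d >= tau_min else 0.0

hyperedges = [(0, 1, 2), (0, 1, 3), (3, 4, 5), (4, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
for name, w in [("strict", strict), ("majority", majority),
                ("linear", linear), ("thresholded", thresholded(0.75))]:
    print(name, round(hypergraph_modularity(hyperedges, community, w), 4))
```

A useful sanity check on any such implementation: on a hypergraph whose edges all have size 2, the strict weights reduce the formula to ordinary graph modularity.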
Algorithms
Clustering
Given hypergraph $H = (V, E)$:
- To partition $V$, one may reduce the problem by considering its 2-section graph $G = H_{[2]}$
- Algorithms such as Louvain can then be used
- Kumar et al. (Complex Networks 2019) proposed a nice refinement:
  - build $G = H_{[2]}$ and run Louvain
  - re-weight the edges based on a measure of homogeneity to favour purer edges
  - repeat until convergence
- We propose two ways to include the hypergraph-based objective $q()$
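The 2-section graph replaces every hyperedge by a clique on its vertices. The sketch below builds it as a dictionary of weighted edges. The slides do not state how clique edges are weighted, so the $1/(d-1)$ split used here (which keeps each vertex's weighted degree equal to its hypergraph degree) is an assumption, as are the function name `two_section` and the requirement that vertex labels be sortable.

```python
from collections import defaultdict
from itertools import combinations

def two_section(hyperedges, split_weight=True):
    """Return {(u, v): weight} for the 2-section graph, with u < v."""
    weights = defaultdict(float)
    for e in hyperedges:
        vertices = sorted(set(e))      # ignore repeated vertices within an edge
        d = len(vertices)
        if d < 2:
            continue                   # a single vertex yields no graph edge
        # Assumed convention: spread one unit of weight so degrees are preserved.
        w = 1.0 / (d - 1) if split_weight else 1.0
        for u, v in combinations(vertices, 2):
            weights[(u, v)] += w
    return dict(weights)

print(two_section([(0, 1, 2), (1, 2)]))
# {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.5}
```

The resulting weighted edge list can be passed to any graph clustering routine, such as a Louvain implementation.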
Algorithm #1 – Last Step (LS)
Consider the new objective $q()$ only in the last step:
1. partition $V$ via graph clustering on $G = H_{[2]}$
2. for every vertex (in random order)
   2.1 compute the change in $q()$ if we move it to each of its neighbours' communities in turn
   2.2 apply the best move
3. repeat step 2 until convergence
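Steps 2 and 3 can be sketched as a local-move loop around any objective `q(hyperedges, community)`. The version below is a simplified illustration, not the authors' code: it recomputes $q()$ from scratch for every candidate move (an efficient implementation would update it incrementally), uses the strict hypergraph modularity as a stand-in objective, and takes the initial partition from step 1 as given. All function names and the toy data are assumptions.

```python
import random
from collections import defaultdict

def strict_modularity(hyperedges, community):
    """Strict hypergraph modularity (w_{d,d} = 1, other weights 0), used here as q()."""
    m = len(hyperedges)
    vol, n_d, inside = defaultdict(int), defaultdict(int), 0
    for e in hyperedges:
        n_d[len(e)] += 1
        for v in e:
            vol[community[v]] += 1
        if len({community[v] for v in e}) == 1:
            inside += 1
    total = sum(vol.values())
    tax = sum(n * (x / total) ** d for d, n in n_d.items() for x in vol.values())
    return (inside - tax) / m

def last_step(hyperedges, community, q=strict_modularity, seed=0):
    """Move vertices between neighbouring communities while q() improves."""
    rng = random.Random(seed)
    community = dict(community)            # leave the caller's partition untouched
    neighbours = defaultdict(set)
    for e in hyperedges:
        for v in e:
            neighbours[v].update(u for u in e if u != v)
    vertices = list(community)
    improved = True
    while improved:                        # step 3: repeat until no vertex moves
        improved = False
        rng.shuffle(vertices)              # step 2: random order
        for v in vertices:
            original = community[v]
            current = q(hyperedges, community)
            best_gain, best_label = 0.0, original
            candidates = {community[u] for u in neighbours[v]} - {original}
            for label in sorted(candidates):          # step 2.1
                community[v] = label
                gain = q(hyperedges, community) - current
                if gain > best_gain + 1e-12:
                    best_gain, best_label = gain, label
            community[v] = best_label                 # step 2.2
            improved = improved or best_label != original
    return community

hyperedges = [(0, 1, 2), (0, 1, 2), (3, 4, 5), (3, 4, 5), (2, 3)]
start = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}   # vertex 2 starts in the wrong part
print(last_step(hyperedges, start))
```

In this toy example the only improving move is to shift vertex 2 into the community of vertices 0 and 1, which raises the strict modularity from about 0.17 to 0.5.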
Algorithm #2 – Hybrid Algorithm (HA)
Here we introduce the hypergraph-based objective $q()$ sooner.
1. form small clumps of vertices by running level-1 Louvain or a similar algorithm on $G = H_{[2]}$
2. merge clumps if $q()$ improves; repeat until no more improvement is possible
3. run steps 2-3 in the LS algorithm (move nodes)
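Step 2 can be illustrated with a greedy merge loop. The slides do not say in which order clumps are merged, so the best-pair-first strategy below is an assumption, as are the function names and the toy clumps; the clumps themselves are taken as given (step 1), and the strict hypergraph modularity again stands in for $q()$.

```python
from collections import defaultdict
from itertools import combinations

def strict_modularity(hyperedges, community):
    """Strict hypergraph modularity (w_{d,d} = 1, other weights 0), used here as q()."""
    m = len(hyperedges)
    vol, n_d, inside = defaultdict(int), defaultdict(int), 0
    for e in hyperedges:
        n_d[len(e)] += 1
        for v in e:
            vol[community[v]] += 1
        if len({community[v] for v in e}) == 1:
            inside += 1
    total = sum(vol.values())
    tax = sum(n * (x / total) ** d for d, n in n_d.items() for x in vol.values())
    return (inside - tax) / m

def relabel(community, old, new):
    """Partition obtained by merging part `old` into part `new`."""
    return {v: (new if lab == old else lab) for v, lab in community.items()}

def merge_clumps(hyperedges, community, q=strict_modularity):
    """Repeatedly apply the single merge that raises q() the most (assumed strategy)."""
    community = dict(community)
    while True:
        base = q(hyperedges, community)
        best_gain, best_pair = 1e-12, None
        for a, b in combinations(sorted(set(community.values())), 2):
            gain = q(hyperedges, relabel(community, b, a)) - base
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:              # no merge improves q(): stop
            return community
        a, b = best_pair
        community = relabel(community, b, a)

hyperedges = [(0, 1, 2), (0, 1, 2), (3, 4, 5), (3, 4, 5), (2, 3)]
clumps = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}   # hypothetical level-1 clumps
print(merge_clumps(hyperedges, clumps))
```

Starting from clumps rather than single vertices matters for objectives like the strict one: merging two vertices of a size-3 edge gains nothing until the whole edge sits in one part, so a merge loop started from singletons may never get going.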
Example: Benchmarks
A few general observations:
- $q()$-based algorithms generally help, given non-homogeneity
- LS, HA: not as good with noise edges
- LS+, HA+: refinement using an estimate $\hat{\tau}_{\min}$ of the minimum homogeneity $\tau_{\min}$
Example: Game of Thrones Scenes [1]
$q()$-based clusters for two main characters:
[1] https://github.com/jeffreylancaster/game-of-thrones
Conclusion
Using a hypergraph-based objective is useful, in particular when non-homogeneity is present.
Future work:
- Further adjust the weights $w_{c,d}$ in $q()$ based on data
- More realistic benchmark models
- More tests on real hypergraphs
THE END