Community Detection Algorithm Using Modularity

Bogumił Kamiński, Paweł Prałat, and François Théberge

Updated: 2021/01/08

Department of Mathematics, Ryerson University
File: JMM-

Overview

1. Hypergraphs

2. Modularity Function

3. Algorithms

4. Conclusion

1. Hypergraphs

From Graphs to Hypergraphs

• Graphs are commonly used to model pairwise relations
  ◦ There are many tools to deal with graphs
• Hypergraphs can represent relations beyond pairwise
  ◦ Hyperedges have arbitrary size
  ◦ More complex: often reduced to graphs
  ◦ Recent software packages handle hypergraphs
• Our main goal: develop more hypergraph-aware tools

From Graphs to Hypergraphs: Detecting Communities

• Clustering: partition vertices into communities
• Graphs: each edge is either within a community or between two communities (noise)
• Hypergraphs: which are community edges?

From Graphs to Hypergraphs: Detecting Communities (continued)

Starting point: a simple hypergraph benchmark.

• A hyperedge of size $d$ is a community edge if more than $d/2$ of its vertices are from the same community (to avoid multi-class)
• Other edges are noise edges
• Community edges can be non-homogeneous
• Homogeneity (proportion of vertices from the dominant community) for community edges lies in [min, max], with 0.5 < min ≤ max ≤ 1
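The benchmark rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `classify_edge`, the `community` dictionary, and the toy edges are assumptions, not part of the original slides.

```python
from collections import Counter

def classify_edge(edge, community):
    """Return (is_community_edge, homogeneity) for one hyperedge.

    `edge` is an iterable of vertices and `community` maps vertex -> label.
    Both names are hypothetical and chosen for illustration.
    """
    counts = Counter(community[v] for v in edge)
    d = sum(counts.values())            # hyperedge size
    dominant = max(counts.values())     # vertices in the dominant community
    homogeneity = dominant / d          # proportion from the dominant community
    return dominant > d / 2, homogeneity

# Toy data (assumed): two communities, four hyperedges of sizes 2 to 4.
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
edges = [(0, 1, 2), (0, 1, 3), (0, 3), (0, 1, 3, 4)]
labels = [classify_edge(e, community) for e in edges]
```

A tie, such as two vertices from each of two communities in an edge of size 4, fails the strict "more than $d/2$" test and is treated as a noise edge.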

2. Modularity Function

Graph Modularity

For a graph $G = (V, E)$ and a partition $\mathbf{A} = \{A_1, \ldots, A_k\}$ of $V$:

\[
q_G(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e_G(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \left( \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)} \right)^2
\]

• $e_G(A_i) = |\{\{v_j, v_k\} \in E : v_j, v_k \in A_i\}|$
• The first term is the edge contribution: the fraction of edges that fall within one of the parts
• The second term is the tax: the expected fraction of edges that do the same in the corresponding null model (Chung-Lu)
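As a worked instance of the formula, the following sketch computes the edge contribution and the tax separately for a small graph. The function name and the toy graph (two triangles joined by one edge) are assumptions made for illustration.

```python
from collections import defaultdict

def graph_modularity(edges, community):
    """q_G(A) = sum_i e_G(A_i)/|E| - sum_i (vol(A_i)/vol(V))^2."""
    m = len(edges)
    inside = defaultdict(int)   # e_G(A_i): edges with both ends in A_i
    vol = defaultdict(int)      # vol(A_i): sum of degrees of vertices in A_i
    for u, v in edges:
        vol[community[u]] += 1
        vol[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    total_vol = 2 * m           # vol(V) for a graph
    edge_contribution = sum(inside.values()) / m
    tax = sum((x / total_vol) ** 2 for x in vol.values())
    return edge_contribution - tax

# Toy graph (assumed): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = graph_modularity(edges, community)
```

Here six of the seven edges fall within a part and each part holds half of the total volume, so the value is 6/7 minus 1/2.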

Hypergraph Modularity

For a hypergraph $H = (V, E)$:

• let $E_d$ be the set of hyperedges of size $d$
• null model (generalized Chung-Lu random graph): $e \in E_d$ is a multiset (generalization of loops)

There are several possible definitions of edge contribution for hypergraphs. For example,

• strict: all $d$ vertices need to be from the same community
• majority: more than $d/2$ vertices need to be from the same community
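The difference between the two definitions shows up when counting which hyperedges contribute. The sketch below is a minimal illustration that covers only the edge-contribution term, not the tax. The function name, the `rule` parameter, and the toy data are assumptions.

```python
from collections import Counter

def edge_contribution(edges, community, rule="majority"):
    """Fraction of hyperedges counted as falling within a community."""
    counted = 0
    for edge in edges:
        d = len(edge)
        # c = number of vertices of the edge in its dominant community
        c = max(Counter(community[v] for v in edge).values())
        if rule == "strict":
            counted += (c == d)       # all d vertices in one community
        elif rule == "majority":
            counted += (c > d / 2)    # more than d/2 vertices in one community
        else:
            raise ValueError(f"unknown rule: {rule}")
    return counted / len(edges)

# Toy data (assumed).
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
edges = [(0, 1, 2), (0, 1, 3), (3, 4), (0, 3)]
strict = edge_contribution(edges, community, "strict")
majority = edge_contribution(edges, community, "majority")
```

The edge (0, 1, 3) is the one that separates the two rules: it has two of its three vertices in one community, so it counts under majority but not under strict.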

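Generalized Hypergraph Modularity

Let $e_H^{c,d}(A_i)$ be the number of hyperedges of size $d$ that have exactly $c$ vertices in $A_i$, with $d/2 < c \le d$, and

\[
q_H^{c,d}(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e_H^{c,d}(A_i)}{|E|} - \frac{|E_d|}{|E|} \sum_{A_i \in \mathbf{A}} \mathbb{P}\left( \mathrm{Bin}\left(d, \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)}\right) = c \right)
\]

This leads to the generalized H-modularity:

\[
q_H(\mathbf{A}) = \sum_{d \ge 2} \; \sum_{c = \lfloor d/2 \rfloor + 1}^{d} w_{c,d} \, q_H^{c,d}(\mathbf{A}),
\]

controlled by hyper-parameters $w_{c,d} \in [0, 1]$.

• strict: all $w_{c,d} = 0$ except $w_{d,d} = 1$
• majority: all $w_{c,d} = 1$

Generalized Hypergraph Modularity (continued)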

• the strict definition is convenient and leads to interesting results (2019 PLOS ONE paper)
  ◦ ... but it is likely too strict in practice
• empirical results point to some choices for $q_H(\mathbf{A})$:
  ◦ use increasing functions of $c$, such as $w_{c,d} = (c/d)$
  ◦ set $w_{c,d} = 0$ when $c/d$ is below the minimum homogeneity (min) estimated from the data
• further adapting the $w_{c,d}$ to the data is work in progress
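To make the role of the weights $w_{c,d}$ concrete, here is a minimal sketch of the generalized H-modularity with a pluggable weight function. The function names, the weight callables, and the toy hypergraph are assumptions; the computation is written for clarity, not efficiency.

```python
from collections import Counter, defaultdict
from math import comb

def binom_pmf(d, p, c):
    """P(Bin(d, p) = c)."""
    return comb(d, c) * p**c * (1 - p) ** (d - c)

def h_modularity(edges, community, weight):
    """q_H(A) = sum over d and c > d/2 of weight(c, d) * q_H^{c,d}(A)."""
    m = len(edges)
    vol = defaultdict(int)    # vol(A_i): sum of vertex degrees in A_i
    size_count = Counter()    # |E_d| for each edge size d
    hits = Counter()          # (c, d) -> sum over i of e_H^{c,d}(A_i)
    for edge in edges:
        d = len(edge)
        size_count[d] += 1
        for v in edge:
            vol[community[v]] += 1
        c = max(Counter(community[v] for v in edge).values())
        if c > d / 2:         # at most one part can hold more than d/2 vertices
            hits[(c, d)] += 1
    total_vol = sum(vol.values())
    q = 0.0
    for d, n_d in size_count.items():
        for c in range(d // 2 + 1, d + 1):
            tax = n_d * sum(binom_pmf(d, x / total_vol, c) for x in vol.values())
            q += weight(c, d) * (hits[(c, d)] - tax) / m
    return q

def strict(c, d):
    return 1.0 if c == d else 0.0

def majority(c, d):
    return 1.0

def linear(c, d):
    return c / d   # one possible increasing function of c

# Toy hypergraph (assumed).
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
edges = [(0, 1, 2), (0, 1, 3), (3, 4, 5), (2, 3), (4, 5)]
q_strict = h_modularity(edges, community, strict)
q_majority = h_modularity(edges, community, majority)
q_linear = h_modularity(edges, community, linear)
```

When every hyperedge has size 2, the only admissible value is $c = d = 2$ and the binomial term reduces to $(\mathrm{vol}(A_i)/\mathrm{vol}(V))^2$, so the sketch reproduces graph modularity.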

3. Algorithms

Clustering

Given a hypergraph $H = (V, E)$:

• To partition $V$, one may reduce the problem by considering its 2-section graph $G = H_{[2]}$
• Algorithms such as Louvain can then be used
• Kumar et al. (Complex Networks 2019) proposed a nice refinement:
  ◦ build $G = H_{[2]}$ and run Louvain
  ◦ re-weight edges in $H$ based on a measure of homogeneity to favour purer edges
  ◦ repeat until convergence
• We propose two ways to include the hypergraph-based objective $q_H()$
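The 2-section reduction mentioned above replaces each hyperedge by a clique on its vertices. The sketch below is one way to build it. The function name is hypothetical, and the $1/(d-1)$ edge weighting is an assumption (a common choice that preserves vertex degrees), not something stated on the slides.

```python
from collections import Counter
from itertools import combinations

def two_section(edges, weighted=True):
    """Return the 2-section graph as {(u, v): weight} with u < v."""
    graph = Counter()
    for edge in edges:
        vertices = sorted(set(edge))
        d = len(vertices)
        if d < 2:
            continue                      # a single vertex yields no graph edge
        # Assumed weighting: each hyperedge spreads weight 1/(d-1) per pair.
        w = 1.0 / (d - 1) if weighted else 1.0
        for u, v in combinations(vertices, 2):
            graph[(u, v)] += w
    return dict(graph)

# Toy hypergraph (assumed): one edge of size 3 and one of size 2.
edges = [(0, 1, 2), (1, 2)]
g_weighted = two_section(edges)
g_plain = two_section(edges, weighted=False)
```

The resulting weighted edge list can then be passed to any graph clustering routine, such as a Louvain implementation.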

Algorithm #1 – Last Step (LS)

Consider the new objective $q_H()$ only in the last step:

1. partition $V$ via graph clustering on $G = H_{[2]}$
2. for every vertex $v \in V$ (in random order)
   2.1 compute the change in $q_H()$ if we move it to each of its neighbours' communities in turn
   2.2 apply the best move
3. repeat step 2 until convergence
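Steps 2 and 3 of LS can be sketched as a local-move loop. This is a simplified illustration under several assumptions: the initial partition from step 1 is taken as given, the objective is the strict variant of hypergraph modularity (chosen only for brevity), the objective is recomputed from scratch for every candidate move, and all function names are hypothetical.

```python
import random
from collections import defaultdict

def strict_modularity(edges, community):
    """Strict hypergraph modularity: only edges fully inside one part count."""
    m = len(edges)
    vol, sizes, inside = defaultdict(int), defaultdict(int), 0
    for edge in edges:
        sizes[len(edge)] += 1
        inside += len({community[v] for v in edge}) == 1
        for v in edge:
            vol[community[v]] += 1
    total = sum(vol.values())
    # P(Bin(d, p) = d) = p^d, summed over parts and scaled by |E_d|
    tax = sum(n_d * (x / total) ** d
              for d, n_d in sizes.items() for x in vol.values())
    return (inside - tax) / m

def last_step(edges, community, objective, seed=0):
    """Move single vertices to neighbouring communities while q improves."""
    rng = random.Random(seed)
    community = dict(community)
    neighbours = defaultdict(set)
    for edge in edges:
        for v in edge:
            neighbours[v].update(u for u in edge if u != v)
    vertices = sorted(community)
    improved = True
    while improved:                          # step 3: repeat until convergence
        improved = False
        rng.shuffle(vertices)                # step 2: random order
        for v in vertices:
            original = community[v]
            best_label, best_q = original, objective(edges, community)
            candidates = {community[u] for u in neighbours[v]} - {original}
            for label in sorted(candidates):     # step 2.1: try each move
                community[v] = label
                q = objective(edges, community)
                if q > best_q + 1e-12:
                    best_label, best_q = label, q
            community[v] = best_label            # step 2.2: apply best move
            improved |= best_label != original
    return community

# Toy hypergraph (assumed); vertex 2 starts in the "wrong" community.
edges = [(0, 1, 2), (0, 1), (1, 2), (3, 4, 5), (3, 4), (4, 5), (2, 3)]
start = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}
result = last_step(edges, start, strict_modularity)
```

A practical implementation would update the objective incrementally rather than recomputing it for each trial move.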

Algorithm #2 – Hybrid Algorithm (HA)

Here we introduce the hypergraph-based objective $q_H()$ sooner.

1. form small clumps of vertices by running level-1 Louvain or a similar algorithm on $G = H_{[2]}$
2. merge clumps if $q_H()$ improves; repeat until no more improvement is possible
3. run steps 2-3 in LS algorithm (move nodes)
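Step 2 of HA, the clump-merging phase, can be sketched as a greedy loop. The slides only say that clumps are merged if the objective improves. Picking the best-gain pair in each round is an assumption made here, as are the function names, the strict objective used for the demonstration, and the toy clumps.

```python
from collections import defaultdict
from itertools import combinations

def strict_modularity(edges, community):
    """Strict hypergraph modularity: only edges fully inside one part count."""
    m = len(edges)
    vol, sizes, inside = defaultdict(int), defaultdict(int), 0
    for edge in edges:
        sizes[len(edge)] += 1
        inside += len({community[v] for v in edge}) == 1
        for v in edge:
            vol[community[v]] += 1
    total = sum(vol.values())
    tax = sum(n_d * (x / total) ** d
              for d, n_d in sizes.items() for x in vol.values())
    return (inside - tax) / m

def merge_clumps(edges, community, objective):
    """Greedily merge pairs of parts while the objective improves."""
    community = dict(community)
    while True:
        current = objective(edges, community)
        best_gain, best_pair = 1e-12, None
        labels = sorted(set(community.values()))
        for a, b in combinations(labels, 2):
            # Trial partition with part b absorbed into part a.
            trial = {v: (a if lab == b else lab) for v, lab in community.items()}
            gain = objective(edges, trial) - current
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return community          # no merge improves the objective
        a, b = best_pair
        community = {v: (a if lab == b else lab) for v, lab in community.items()}

# Toy hypergraph and initial clumps (assumed).
edges = [(0, 1, 2), (0, 1), (1, 2), (3, 4, 5), (3, 4), (4, 5), (2, 3)]
clumps = {0: "c0", 1: "c0", 2: "c1", 3: "c2", 4: "c3", 5: "c3"}
merged = merge_clumps(edges, clumps, strict_modularity)
```

On this toy input the four clumps merge into two parts. Merging those into a single part would lower the objective, so the loop stops.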

Example: Benchmarks

A few general observations:

• $q_H()$-based algorithms generally help, given non-homogeneity
• LS, HA: not as good with noise edges
• LS+, HA+: refinement using an estimate of the minimum homogeneity (min)

Example: Game of Thrones Scenes [1]

$q_H()$-based clusters for two main characters:

[1] https://github.com/jeffreylancaster/game-of-thrones

4. Conclusion

Using a hypergraph-based objective is useful, in particular when non-homogeneity is present.

Future work:

• Further adjust the weights $w_{c,d}$ in $q_H()$ based on data
• More realistic benchmark models
• More tests on real hypergraphs

THE END