Linear Algebraic Louvain Method in Python
Total Page:16
File Type:pdf, Size:1020Kb
c c 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. 1 Linear Algebraic Louvain Method in Python Tze Meng Low, Daniele G. Spampinato Scott McMillan Michel Pelletier Electrical and Computer Engineering Department Software Engineering Institute Graphegon, Inc. Carnegie Mellon University Carnegie Mellon University Portland, OR, USA Pittsburgh, PA, USA Pittsburgh, PA, USA [email protected] [email protected], [email protected] [email protected] Abstract—We show that a linear algebraic formulation of Step 2) A new graph is created from the newly par- • the Louvain method for community detection can be derived titioned graph. Identified communities form the set of systematically from the linear algebraic definition of modularity. new vertices, and edges between vertices in different Using the pygraphblas interface, a high-level Python wrapper for the GraphBLAS C Application Programming Interface (API), we communities are aggregated into a new set of edges. Step demonstrate that the linear algebraic formulation of the Louvain 1 is then applied to this newly created graph. method can be rapidly implemented. The two steps above are repeated until no further improvement Index Terms—Louvain Method, Community Detection, Graph in modularity is obtained. Algorithms, GraphBLAS A. Definition of Modularity I. INTRODUCTION Modularity is a scalar measure used to quantify the strengh Community detection is a critical component in the anal- of community structures within a graph [3]. It is computed as isys of real-world network systems with important appli- 1 k k Q = A i j δ(c , c ), cations in social media analytics, biology, recommendation 2m ij − 2m i j ij systems, and telecommunications among others [1]. The Lou- X vain method [2] based on modularity optimization [3] is a where Aij is the edge weight between vertices vi, vj V , m ∈ commonly adopted technique to identify communities. is the sum of all edge weights, ki is the sum of the weights Current implementations of the Louvain method adopt a of all edges incident on vertex vi, ci C is the community ∈ traditional vertex-based approach in their algorithmic for- to which vi is currently assigned, and δ(ci, cj) = 1 if vertices mulation despite there being a linear algebraic formulation vi and vj are in the same community (i.e., ci = cj) and 0 for the computation of modularity [4]. In this paper, we otherwise. show that by starting with the linear algebraic formulation We will build on the matrix formulation of Q from [4]: of modularity, a linear algebraic algorithm for computing 1 Q = Γ(ST BS), (1) the Louvain method can be determined. This formulation 2m can leverage recent community-driven efforts behind linear where Γ( ) computes the trace of a matrix, S is a V C algebraic graph interfaces, such as the GraphBLAS [5], to · | | × | | matrix that captures the community membership of each rapidly develop an implementation of the Louvain method. vertex, and B is the V V modularity matrix where element Productivity is enhanced through the use of pygraphblas [6], | |×| | Bij is defined as a python interface for the GraphBLAS effort. k k B = A i j . ij ij − 2m II. BACKGROUND Each column si of S represents a community of vertices, th We assume the goal is to identify communities within a where a one in the j position of si indicates that vertex vj graph graph = (V, E) where V is the set of vertices G is in community ci. We denote vertex vj as a column basis and E is the set of weighted edges connecting them. The vector e of length V , and s is the sum of all vertices that j | | i Louvain method is an iterative algorithm where each iteration belong to the same community ci, i.e., is composed of two major steps: Step 1) Vertices are moved between communities such si = ej. • j:v c that each move increases the overall modularity of G. Xj ∈ i Step 1 is initialized by creating V communities where ~ | | An empty community is simply the zero vector, i.e. si = 0. each vertex is in its own community. Vertices are se- quentially reassigned to new communities as long as III. LINEAR ALGEBRAIC LOUVAIN METHOD modularity can be increased. The choice of communities In this section we develop a fully linear algebraic formula- for a vertex is based on the community that yields the tion of the Louvain algorithm starting from the linear algebraic highest modularity increase. definition of modularity. A. Modularity of a Community The maximum increase in modularity is computed as κ = Starting with Eqn. (1) and expanding on the definition of max(q1), and the community c` associated with κ can be the trace of a matrix, we express the modularity Q as identified by t = (q1 == κ), 1 T T T Q = s0 Bs0 + s1 Bs1 + ... + s C 1Bs C 1 2m | |− where t is a boolean vector of length V indicating the | |− | | C 1 C 1 v 1 | |− 1 | |− new community for vertex j. The original community = sT Bs = Q , c for vertex v can be computed by extracting the jth 2m i i 2m i i j i=0 i=0 S X X row of . Using the original community and the vector where Qi is the modularity of community ci. t, the community matrix S can be updated as Consider an arbitrary vertex vj where vj is a vertex T T S = S + ejt ejej S. (possibly, the only one) in community ci. We can define − a community vector si in terms of the vertex vj and the Step 1 is repeated for all vertices until no increase in remaining set of vertices si j as follows modularity can be achieved. \{ } Step 2) A new graph 0 = (V 0,E0) is constructed, where si = si j + ej. • G \{ } the set of vertices V are the communities identified in Using the above definition, we can derive the following defini- 0 Step 1. The new adjacency matrix A is computed as tion of modularity for community i in terms of the contribution 0 T to modularity by introducing vertex vj into community i: A0 = S AS, T Qi = si Bsi where the weight at location Aij0 is the sum of edge = (e + (s e ))T B(e + (s e )) weights in A between vertices in community ci and j i − j j i − j T T T T vertices in community cj. The diagonal elements Aii0 are = ej Bej + si j Bej + ej Bsi j + si j Bsi j \{ } \{ } \{ } \{ } the sum of edge weights in A belonging to the same T T T = Bjj + si j Bej + ej Bsi j +si j Bsi j communities ci. \{ } \{ } \{ } \{ } Contribution by vertex vj IV. IMPLEMENTATION CONSIDERATIONS B. Linear| Algebraic Louvain{z Method } In this section, we focus on Step 1 of the previously We address the two steps of Louvain separately. presented algorithm and discuss several optimizations. Step 1) Initially, vertices are assigned to their own • 1) Removing redundant computation. Recall the objective community and the community matrix S is the identity in Step 1 is to find the maximum increase in modularity. matrix, I. Since the expression eT Bs +sT Be is removed Each vertex, v is evaluated for the change in modularity j i j i j j j from all modularity computation,\{ } it\{ does} not contribute when reassigned from community ci to community c`. to the decision of which community should vertex vj Mathematically, the change in modularity due to such T be reassigned to. Thus, the expression ej Bsi j + movement (∆Q j , where i = `) can be computed as: \{ } i ` 6 sT Be does not need to be computed. Hence, the −→ i j j T T following\{ } computation can be performed instead: ∆Q j = (Bjj + ej Bs` j + s` j Bej) i ` \{ } \{ } −→ T T ˜ T T (Bjj + ej Bsi j + si j Bej) ∆Q j = ej Bs` j + s` j Bej. − \{ } \{ } i ` \{ } \{ } T T −→ T T = ej Bs` j + s` j Bej = e (B + B )s (3) \{ } \{ } j ` j T T \{ } (ej Bsi j + si j Bej). − \{ } \{ } 2) Bulk computation of q. Notice that Eqn. (3) can be Essentially, ∆Q j is computed by subtracting the con- computed at once for all values of ` as: i ` tribution of vertex−→vj to community ci from the contri- T T T ¯ q = ej (B + B )S. bution of vertex vj to community c`. Let q be a vector of length V defined as follows: where | | ¯ T S = S ejej S T q = ∆Q j ... ∆Q j . − i 0 i V 1 = (I e eT )S, (4) −→ −→| |− − j j As the Louvain algorithm only considers moving a vertex which is the community matrix after removing vertex v to a community that includes at least one neighbor, we j from its original community. can restrict the modularity computation to the commu- 3) Eliminating modularity matrix B. Modularity matrix B nities containing neighbors of v via a mask in the j is a dense V V matrix. In order to reduce the following manner: | | × | | memory requirements, recall that qT = qT ((eT (A + AT ))S = 0). (2) 1 ◦ j 6 1 B = A kkT , tq − 2m | {z } where k is a vector where the ith value is the node B. Discussion degree of vertex vi. Therefore, In terms of productivity, our pygraphblas implementation 1 1 can be mapped nearly one-to-one with the linear algebraic B + BT = (A kkT ) + (A kkT )T − 2m − 2m formulation of the algorithm, and requires less than 40 lines 1 of code (LOC).