c c 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted of this work in other works.

1 Linear Algebraic Louvain Method in Python

Tze Meng Low, Daniele G. Spampinato Scott McMillan Michel Pelletier Electrical and Computer Engineering Department Software Engineering Institute Graphegon, Inc. Carnegie Mellon University Carnegie Mellon University Portland, OR, USA Pittsburgh, PA, USA Pittsburgh, PA, USA [email protected] [email protected], [email protected] [email protected]

Abstract—We show that a linear algebraic formulation of Step 2) A new graph is created from the newly par- • the Louvain method for community detection can be derived titioned graph. Identified communities form the set of systematically from the linear algebraic definition of . new vertices, and edges between vertices in different Using the pygraphblas interface, a high-level Python wrapper for the GraphBLAS C Application Programming Interface (API), we communities are aggregated into a new set of edges. Step demonstrate that the linear algebraic formulation of the Louvain 1 is then applied to this newly created graph. method can be rapidly implemented. The two steps above are repeated until no further improvement Index Terms—Louvain Method, Community Detection, Graph in modularity is obtained. Algorithms, GraphBLAS A. Definition of Modularity I.INTRODUCTION Modularity is a scalar measure used to quantify the strengh Community detection is a critical component in the anal- of community structures within a graph [3]. It is computed as isys of real-world network systems with important appli- 1 k k Q = A i j δ(c , c ), cations in social media analytics, biology, recommendation 2m ij − 2m i j ij systems, and telecommunications among others [1]. The Lou- X   vain method [2] based on modularity optimization [3] is a where Aij is the edge weight between vertices vi, vj V , m ∈ commonly adopted technique to identify communities. is the sum of all edge weights, ki is the sum of the weights Current implementations of the Louvain method adopt a of all edges incident on vi, ci C is the community ∈ traditional vertex-based approach in their algorithmic for- to which vi is currently assigned, and δ(ci, cj) = 1 if vertices mulation despite there being a linear algebraic formulation vi and vj are in the same community (i.e., ci = cj) and 0 for the computation of modularity [4]. In this paper, we otherwise. show that by starting with the linear algebraic formulation We will build on the matrix formulation of Q from [4]: of modularity, a linear algebraic algorithm for computing 1 Q = Γ(ST BS), (1) the Louvain method can be determined. This formulation 2m can leverage recent community-driven efforts behind linear where Γ( ) computes the trace of a matrix, S is a V C algebraic graph interfaces, such as the GraphBLAS [5], to · | | × | | matrix that captures the community membership of each rapidly develop an implementation of the Louvain method. vertex, and B is the V V modularity matrix where element Productivity is enhanced through the use of pygraphblas [6], | |×| | Bij is defined as a python interface for the GraphBLAS effort. k k B = A i j . ij ij − 2m II.BACKGROUND Each column si of S represents a community of vertices, th We assume the goal is to identify communities within a where a one in the j position of si indicates that vertex vj graph graph = (V,E) where V is the set of vertices G is in community ci. We denote vertex vj as a column basis and E is the set of weighted edges connecting them. The vector e of length V , and s is the sum of all vertices that j | | i Louvain method is an iterative algorithm where each iteration belong to the same community ci, i.e., is composed of two major steps: Step 1) Vertices are moved between communities such si = ej. • j:v c that each move increases the overall modularity of G. Xj ∈ i Step 1 is initialized by creating V communities where ~ | | An empty community is simply the zero vector, i.e. si = 0. each vertex is in its own community. Vertices are se- quentially reassigned to new communities as long as III.LINEAR ALGEBRAIC LOUVAIN METHOD modularity can be increased. The choice of communities In this section we develop a fully linear algebraic formula- for a vertex is based on the community that yields the tion of the Louvain algorithm starting from the linear algebraic highest modularity increase. definition of modularity. A. Modularity of a Community The maximum increase in modularity is computed as κ = Starting with Eqn. (1) and expanding on the definition of max(q1), and the community c` associated with κ can be the trace of a matrix, we express the modularity Q as identified by t = (q1 == κ), 1 T T T Q = s0 Bs0 + s1 Bs1 + ... + s C 1Bs C 1 2m | |− where t is a boolean vector of length V indicating the | |− | | C 1 C 1  v 1 | |− 1 | |− new community for vertex j. The original community = sT Bs = Q , c for vertex v can be computed by extracting the jth 2m i i 2m i i j i=0 i=0 S X X row of . Using the original community and the vector where Qi is the modularity of community ci. t, the community matrix S can be updated as Consider an arbitrary vertex vj where vj is a vertex T T S = S + ejt ejej S. (possibly, the only one) in community ci. We can define − a community vector si in terms of the vertex vj and the Step 1 is repeated for all vertices until no increase in remaining set of vertices si j as follows modularity can be achieved. \{ } Step 2) A new graph 0 = (V 0,E0) is constructed, where si = si j + ej. • G \{ } the set of vertices V are the communities identified in Using the above definition, we can derive the following defini- 0 Step 1. The new A is computed as tion of modularity for community i in terms of the contribution 0 T to modularity by introducing vertex vj into community i: A0 = S AS, T Qi = si Bsi where the weight at location Aij0 is the sum of edge = (e + (s e ))T B(e + (s e )) weights in A between vertices in community ci and j i − j j i − j T T T T vertices in community cj. The diagonal elements Aii0 are = ej Bej + si j Bej + ej Bsi j + si j Bsi j \{ } \{ } \{ } \{ } the sum of edge weights in A belonging to the same T T T = Bjj + si j Bej + ej Bsi j +si j Bsi j communities ci. \{ } \{ } \{ } \{ }

Contribution by vertex vj IV. IMPLEMENTATION CONSIDERATIONS B. Linear| Algebraic Louvain{z Method } In this section, we focus on Step 1 of the previously We address the two steps of Louvain separately. presented algorithm and discuss several optimizations. Step 1) Initially, vertices are assigned to their own • 1) Removing redundant computation. Recall the objective community and the community matrix S is the identity in Step 1 is to find the maximum increase in modularity. matrix, I. Since the expression eT Bs +sT Be is removed Each vertex, v is evaluated for the change in modularity j i j i j j j from all modularity computation,\{ } it\{ does} not contribute when reassigned from community ci to community c`. to the decision of which community should vertex vj Mathematically, the change in modularity due to such T be reassigned to. Thus, the expression ej Bsi j + movement (∆Q j , where i = `) can be computed as: \{ } i ` 6 sT Be does not need to be computed. Hence, the −→ i j j T T following\{ } computation can be performed instead: ∆Q j = (Bjj + ej Bs` j + s` j Bej) i ` \{ } \{ } −→ T T ˜ T T (Bjj + ej Bsi j + si j Bej) ∆Q j = ej Bs` j + s` j Bej. − \{ } \{ } i ` \{ } \{ } T T −→ T T = ej Bs` j + s` j Bej = e (B + B )s (3) \{ } \{ } j ` j T T \{ } (ej Bsi j + si j Bej). − \{ } \{ } 2) Bulk computation of q. Notice that Eqn. (3) can be Essentially, ∆Q j is computed by subtracting the con- computed at once for all values of ` as: i ` tribution of vertex−→vj to community ci from the contri- T T T ¯ q = ej (B + B )S. bution of vertex vj to community c`. Let q be a vector of length V defined as follows: where | | ¯ T S = S ejej S T q = ∆Q j ... ∆Q j . − i 0 i V 1 = (I e eT )S, (4) −→ −→| |− − j j As the Louvain algorithm only considers moving a vertex which is the community matrix after removing vertex v to a community that includes at least one neighbor, we j from its original community. can restrict the modularity computation to the commu- 3) Eliminating modularity matrix B. Modularity matrix B nities containing neighbors of v via a mask in the j is a dense V V matrix. In order to reduce the following manner: | | × | | memory requirements, recall that qT = qT ((eT (A + AT ))S = 0). (2) 1 ◦ j 6 1 B = A kkT , tq − 2m | {z } where k is a vector where the ith value is the node B. Discussion of vertex vi. Therefore, In terms of productivity, our pygraphblas implementation 1 1 can be mapped nearly one-to-one with the linear algebraic B + BT = (A kkT ) + (A kkT )T − 2m − 2m formulation of the algorithm, and requires less than 40 lines 1 of code (LOC). In contrast, the vertex-based NetworkX im- = (A + AT ) kkT . (5) − m plementation requires 4 as many (158) lines of code (LOC). × Instead of storing B, the above formulation allows us to For a graph algorithm developer, the use of the pygraphblas use a sparse matrix format to store G = (A + AT ), and interface greatly simplifies the implementation; resulting in a a dense vector k, thus reducing the memory footprint. boost in algorithmic development productivity. 4) Handling same change in modularity. Often, moving The ability to port between CPU and GPU platforms to take advantage of different hardware while maintaining the vertex vj into different communities yields the same change in modularity. In such cases, a tie-breaking same high-level interface (similar to Blanco et. al [10]) is also heuristics has to be applied to favor one community over another advantage of using GraphBLAS. the rest. In this paper, a community is picked at random. Performance-wise, due to the use of SuiteSparse as a black-box library that implement the GraphBLAS API, it Applying these optimizations, the final linear algebraic Lou- is unclear how much of the performance is related to the vain method algorithm is shown in Fig. 1 (left). linear algebraic approach or specific implementation decisions. V. EXPERIMENTS For example, parallelism was performed by the underlying SuiteSparse library, which may choose to use less than the In this section, we discuss preliminary experimental results allocated number of threads. Selecting the chuck size used in implementing our formulation of the Louvain method allowed us some limited form of control over this decision. discussing both productivity and performance aspects provided Nonetheless, we highlight key performance-related features of by the use of pygraphblas [6], a Python wrapper to the our implementation. GraphBLAS [5] API for graph algorithms. Lines 23-25 in Fig. 1 (right) are essentially the equivalent TABLE I of the AXPY operation in the Level 1 BLAS [11]. This can COMPARISON BETWEEN TWO PYTHON IMPLEMENTATIONS OF THE be implemented with a single GraphBLAS apply operation LOUVAIN METHOD:NETWORKX(VERTEX-BASED, SEQUENTIAL) AND PYGRAPHBLAS (PRESENTWORK, LINEARALGEBRAIC, BOTH in the most recent GraphBLAS 1.3 as follows: SEQUENTIAL, 1T, AND PARALLEL, 4T). THENUMBEROFCOMMUNITIES k.apply(FP64.TIMES, first=(-k_j/m), (GROUNDTRUTH) ARE REPORTED IN COLUMN 3. BOTH accum=FP64.PLUS, out=v) IMPLEMENTATIONS REPORTED 100% ACCURACY. However, as this capability is not supported by the version of Graph NetworkX pygraphblas SuiteSparse used, three separate function calls, each traversing VEC Time (s) Time (s) Time (s) through a vector of length V , have to be used. w/o Mask w/ Mask | | 1T 1T 4T 1T 4T It was interesting to note that while the masking operation 50 319 3 0.02 0.06 0.07 0.07 0.09 as described by Eqn. (2) was meant to reduce the computation 100 778 5 0.03 0.07 0.08 0.08 0.10 cost by considering only neighbor communities when deter- 500 9.4k 8 0.31 0.41 0.47 0.48 0.58 1,000 20.1k 11 0.78 0.79 0.89 0.93 1.11 mining changes in modularity, the computation of the mask 5,000 101.9k 19 5.59 7.55 8.19 8.63 9.06 was a significant cost in our implementation. As such, it was 20k 408.8k 32 26.29 103.99 70.03 113.43 77.17 generally faster for our implementation to compute the change 50k 1.0M 44 88.12 696.27 387.86 711.83 380.24 in modularity for all existing communities. This cost could A. Experimental Setup potentially be amortized with larger graphs. A considerable fraction of the runtime was spent extracting We implemented our linear algebraic Louvain method using from or assigning to single matrix columns or rows. While we pygraphblas [6], a Python wrapper of the SuiteSparse [7] have implemented the computation of the change in modularity library v3.2.0 that implements the C GraphBLAS API, and as a bulk operation, the sequential vertex approach remains a compared with the implementation of the original vertex-based large cost in this implementation. Louvain method for the NetworkX package [8] v.2.4. Our implementation is available at [6]. VI.CONCLUSION We ran our experiments on an Intel Core i7-4770K In this paper, we derived a linear algebraic formulation of (Haswell) CPU with four cores, 8 MB LLC cache, and 32 the Louvain method for community detection starting from GB of memory. We provide both sequential (1T) and parallel the linear algebraic definition of modularity. We also imple- (4T, GxB_CHUNK=1024) results in Table I. mented the derived algorithm using the pygraphblas interface We report time to solutions (mean of seven runs) for to enhance algorithm design productivity. implementations using Static Graphs with known truth for the We believe that alternative modularity-based clustering al- 2017 Partitioning Challenge provided on the Graph Challenge gorithms (e.g. the synchronized Louvain algorithm [12]), website [9]. All implementations correctly identified commu- should enable better performance when expressed in terms of nities when compared to the ground truth. bulk linear algebra operations. 1 1 def louvain_cluster(graph): 2 2 rows= graph.nrows 3 3 S_row= Vector.from_type(BOOL, rows) 4 4 empty= Vector.from_type(BOOL, rows) 5 5 6 6 G= graph.dup() T 7 G = A + A 7 graph.transpose(out=G, accum=FP64.PLUS) 8 8 G_rows=[G.extract_row(i) for i in range(rows) ] 9 k = A vec(1) 9 k= graph.reduce_vector() 1 T 10 m = 2 k vec(1) 10 inv_m= 2.0/k.reduce_int() 11 S = I 11 S= Matrix.identity(BOOL, rows, format=lib.GxB_BY_COL) 12 12 13 v changed = True 13 vertices_changed= len(k) 14 while v changed: 14 while vertices_changed>0: 15 v changed = False 15 16 for j in range( V ): 16 for j,k_j in k: | | 17 17 v= G_rows[j] T 18 t_q=v.vxm(S, semiring=BOOL.ANY_PAIR) 18 tq = (ej G)S 19 19 T T 20 S.extract_row(j, out=S_row) 20 tj = ej S ¯ T 21 S[j,:]= empty 21 S = (I ej ej )S − 22 22 23 23 q=k.dup() 24 24 q.assign_scalar(-k_j * inv_m, accum=FP64.TIMES, mask=k) T T T 25 q+=v 25 q = ej G + ( kj /m)k T T ¯ − 26 q1=q.vxm(S, mask=t_q) 26 q1 = (q S) (tq = 0) ◦ 6 27 27 28 max_q1= q1.reduce_float(MAX_MONOID) 28 κq = max(q1) 29 t= q1.select("==", max_q) 29 t = (q1 == κq) 30 while len(t) !=1: 30 while t.nvals() != 1: 31 p=t.apply(random_scaler) 31 p = random() t ◦ 32 max_p=p.reduce_float(MAX_MONOID) 32 κp = max(p) 33 t=p.select("==", max_p) 33 t = (p == κp) T 34 S[j,:]=t 34 S = S¯ + ej t 35 35 36 vertices_changed-=1 36 37 if len(S_row * t) ==0: 37 if t != tj : 38 vertices_changed= len(k)-1 38 v = True changed 39 if vertices_changed<=0: 39 40 break 40 41 41 42 return S 42

Fig. 1. (Left) Formulation of of the linear algebraic Louvain method described in Sec. III and (right) its pygraphblas implementation. ACKNOWLEDGEMENT [5] J. Kepner, P. Aaltonen, D. A. Bader, A. Buluc¸, F. Franchetti, J. R. Gilbert, D. Hutchison, M. Kumar, A. Lumsdaine, This material is based upon work funded and supported by H. Meyerhenke, S. McMillan, J. E. Moreira, J. D. Owens, C. Yang, the Department of Defense under Contract No. FA8702-15- M. Zalewski, and T. G. Mattson, “Mathematical foundations of the GraphBLAS,” CoRR, vol. abs/1606.05790, 2016. [Online]. Available: D-0002 with Carnegie Mellon University for the operation of http://arxiv.org/abs/1606.05790 the Software Engineering Institute, a federally funded research [6] M. Pelletier, “pygraphblas: GraphBLAS for Python.” [Online]. and development center [DM20-0213]. References herein to Available: https://github.com/michelp/pygraphblas [7] T. Davis, “Algorithm 9xx: SuiteSparse:GraphBLAS: graph algorithms in any specific commercial product, process, or service by trade the language of sparse linear algebra,” Submitted to ACM Transactions name, trade mark, manufacturer, or otherwise, does not nec- on Mathematical Software (TOMS), 2018. essarily constitute or imply its endorsement, recommendation, [8] T. Aynaud, “Community detection for NetworkX.” [Online]. Available: https://python-louvain.readthedocs.io or favoring by Carnegie Mellon University or its Software [9] “Graph Challenge,” 2017. [Online]. Available: Engineering Institute. https://graphchallenge.mit.edu REFERENCES [10] M. Blanco, T.-M. Low, and K. Kim, “Exploration of fine-grained parallelism for load balancing eager K-truss on GPU and CPU,” in High [1] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. Performance Extreme Computing (HPEC), 2019. 486, no. 3, pp. 75–174, 2010. [11] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, “Basic [2] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast linear algebra subprograms for Fortran usage,” ACM Transactions on unfolding of community hierarchies in large networks,” Journal of Mathematical Software (TOMS), vol. 5, no. 3, pp. 308–323, 1979. Statistical Mechanics: Theory and Experiment, vol. 10, 2008. [12] A. Browet, P.-A. Absil, and P. Van Dooren, “Fast community detection [3] M. E. J. Newman and M. Girvan, “Finding and evaluating community using local neighbourhood search,” preprint arXiv:1308.6276, 2013. structure in networks,” Physical review E, vol. 69, p. 026113, 2004. [4] M. E. J. Newman, “Modularity and in networks,” Proceedings of the National Academy of Sciences (PNAS), vol. 103, no. 23, p. 85778582, 2006.