Modularity Based Community Detection with Deep Learning
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Modularity Based Community Detection with Deep Learning 1,2,3 1 3, 1 4 5,6 Liang Yang, Xiaochun Cao, Dongxiao He, ⇤ Chuan Wang, Xiao Wang, Weixiong Zhang 1State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences 2School of Information Engineering, Tianjin University of Commerce 3School of Computer Science and Technology, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University 4Department of Computer Science and Technology, Tsinghua University 5College of Math and Computer Science, Institute for Systems Biology, Jianghan University 6Department of Computer Science and Engineering, Washington University in St. Louis yangliang,caoxiaochun,wangchuan @iie.ac.cn, [email protected], { [email protected], [email protected]} Abstract model [Psorakis et al., 2011] focuses on deriving generative models of networks. Such a generative model in essence maps Identification of module or community structures a network to an embedding in a low-dimensional latent space. is important for characterizing and understanding The mapping can be done by, e.g., nonnegative matrix fac- complex systems. While designed with different ob- torization (NMF) [Wang et al., 2008]. Unlike the stochastic jectives, i.e., stochastic models for regeneration and model, the modularity maximization model [Newman, 2006], modularity maximization models for discrimination, as the name suggests, attempts to maximize a modularity func- both these two types of model look for low-rank tion on network substructures. The optimization can be done embedding to best represent and reconstruct net- by eigenvalue decomposition (EVD), which is equivalent to work topology. However, the mapping through such reconstructing a low-rank modularity matrix. embedding is linear, whereas real networks have var- In short, while appeared in different forms, the stochastic ious nonlinear features, making these models less model and modularity maximization model share an essential effective in practice. Inspired by the strong represen- commonality, i.e., mapping a network to a low-dimensional, tation power of deep neural networks, we propose a latent space embedding. Motivated by the strong discrimi- novel nonlinear reconstruction method by adopting nation power of the modularity maximization model and the deep neural networks for representation. We then relationship between maximizing modularity and reconstruct- extend the method to a semi-supervised community ing modularity matrix by finding low-dimensional embedding, detection algorithm by incorporating pairwise con- we aim to seek a more effective reconstruction algorithm for straints among graph nodes. Extensive experimental modularity optimization. However, all these types of embed- results on synthetic and real networks show that ding adopted in the two popular models are linear, e.g., NMF the new methods are effective, outperforming most in the stochastic model and EVD in the modularity maximiza- state-of-the-art methods for community detection. tion model. This is in sharp contrast to the fact that real-world networks are full of nonlinear properties, e.g., a relationship 1 Introduction (e.g., distance) among nodes may not necessarily be linear. As a result, the representation power of these linear mapping Real-world systems often appear in networks, e.g., social net- based models is limited on real-world networks. works in Facebook media, protein interaction networks, power Neural networks, particularly that with deep structures, are grids and the Internet. Real-world networks often consist of well known to provide nonlinear low-dimensional represen- functional units, which manifest in the form of network mod- tations [Bourlard and Kamp, 1988]. They have been success- ules or communities, subnetworks with nodes more tightly fully applied to complex problems and systems in practice, connected with respect to the rest of the networks. Finding such as image classification, speech recognition, and playing network communities is, therefore, critical for characteriz- an ancient strategic board game of Go. To the best of our ing the organizational structures and understanding complex knowledge, however, deep neural networks have not yet been systems. successfully applied to community detection. A great deal of effort has been devoted to developing net- work community finding methods, among which, two are Taking advantage of the nonlinear representation power of widely adopted and thus worth mentioning. The stochastic deep neural networks, we propose in this paper a nonlinear re- construction (DNR) algorithm for community detection using ⇤Corresponding author. deep neural networks. 2252 It is known that information beyond network topology can The index of the largest element in the ith row of H indicates greatly aid network community identification [Zhang, 2013; the community that node i belongs to. Yang et al., 2015]. Examples of such information include There are many variations to the stochastic model, such semantics on nodes, e.g., names and labels, and constraints on as nonnegative matrix tri-factorization and stochastic block relationships among nodes, e.g., community membership con- model. Nearly all of these models can be intuitively viewed as straints between adjacent nodes (pairwise constraints). In finding new representations in a low-dimensional space that the current study, we extend our DNR method to a semi- can best represent and reconstruct the adjacency matrix. supervised DNR (semi-DNR) algorithm to explicitly incor- porates pairwise constraints among nodes to further improve 2.2 Modularity Maximization Model community detection. This model was introduced by Newman [Newman, 2006] to maximize a modularity function Q, which is defined as the dif- 2 Reconstruction based Community Detection ference between the number of edges within communities and the expected number of such edges over all pairs of vertices. We consider an undirected and unweighted graph G =(V,E), For example, consider a network with two communities, then where V = v1,v2,...,vN is the set of N vertices, and E = { } 1 kikj eij the set of edges each of which connects two vertices in Q = a (h h ), { } 4m ij − 2m i j V . The adjacency matrix of G is a nonnegative symmetric ij N N X ⇣ ⌘ matrix A =[aij] R+ ⇥ where aij =1if there is an edge where h equals to 1 (or -1) if vertex i belongs to the first (or between vertices i2and j, or a =0otherwise, and a =0 i ij ii second) group, kikj is the expected number of edges between for all 1 i N. The degree of vertex i is defined as 2m vertices i and j if edges are placed randomly, ki is the degree ki = j aij. The problem of community detection is to find 1 K of vertex i and m = 2 i ki is the total number of edges in the K modules or communities Vi i=1 that are subgraphs whose N N P { } network. By defining modularity matrix B =[bij] R ⇥ vertices are more tightly connected with one another than with P 2 whose element is b = a kikj , modularity Q can be outside vertices. Here, we consider disjoint communities, i.e., ij ij − 2m Vi Vj = for i = j. written as \ ; 6 1 Q = hT Bh, (1) 2.1 Stochastic Model 4m N In stochastic model [Psorakis et al., 2011; He et al., 2015; where h =[hi] R is a community membership indicator 2 Jin et al., 2015], aij can be viewed as the probability that vector. Maximizing Eq. (1) is NP-hard, for which many vertices i and j are connected. This probability can be further optimization algorithms have been proposed, such as extremal considered to be determined by the probabilities that vertices optimization [Duch and Arenas, 2005]. In practice, we can i and j generate edges belonging to the same community. relax the problem by allowing variable hi to take any real value N K T We introduce latent variables H =[hik] R+ ⇥ with hik and h h = N. To generalize Eq. (1) to K>2 communities, 2 N K representing the probability that node i generates an edge we can define an indicator matrix H =[hij] R ⇥ and 2 belonging to community k. This latent variable also captures obtain T the probability that node i belongs to community k, and each MOD(H, B)=Q =Tr(H BH), row of H can be considered as a community membership L T distribution of a vertex. The probability that vertices i and j is s.t. Tr(H H)=N, connected by a link belonging to community k is then hikhjk, where Tr(.) is the trace of a matrix. Based on Rayleigh Quo- and the probability that they are connected is: tient, the solution to this problem is the largest K eigenvectors of the modularity matrix B. Each row of matrix H can be K regarded as a new representation of the corresponding vertex aˆij = hikhjk. in the latent space, and clustering algorithms, such as k-means, kX=1 can be used to classify the nodes into disjoint groups in the As a result, the community detection problem can be formu- latent space. lated as a nonnegative matrix factorization A Aˆ = HHT . The Eckart-Young-Mirsky Theorem [Eckart and Young, The NMF-based community detection approaches⇡ [Psorakis et 1936] explores the relationship between the reconstruction and al., 2011] aim to find a nonnegative membership matrix H to singular value decomposition (SVD). reconstruct adjacency matrix A. There are two common objec- Theorem 1. [Eckart and Young, 1936] For a matrix D m n T 2 tive (loss) functions to quantify the reconstruction error. The R ⇥ (m > n), if D = U⌃V is the singular value decom- first is based on the square loss function [Wang et al., 2008; position of D, and U, V, ⌃ = diag(σ1,σ2,...,σm) where Zhang et al., 2007] which is equivalent to the square of the σ1 > σ2 > ... > σm are as follows: Frobenius norm of the difference between two matrices ⌃1 0 T T U =[U1U2], ⌃ = , V =[V1V2], (A, HH )= A HH 2 .