IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL.XX, NO.XX, 2021 1
A Comprehensive Survey on Community Detection with Deep Learning Xing Su, Shan Xue, Fanzhen Liu, Jia Wu, Senior Member, IEEE, Jian Yang, Chuan Zhou, Wenbin Hu, Cecile Paris, Surya Nepal, Di Jin, Quan Z. Sheng, and Philip S. Yu, Fellow, IEEE
Abstract—A community reveals the features and connections of its members that are different from those in other communities C1 C2 in a network. Detecting communities is of great significance in 2 1 5 3 1 6 network analysis. Despite the classical spectral clustering and 5 statistical inference methods, we notice a significant develop- 2 3 ment of deep learning techniques for community detection in 6 7 7 4 recent years with their advantages in handling high dimensional 4 network data. Hence, a comprehensive overview of community (a) graph (b) communities detection’s latest progress through deep learning is timely to both academics and practitioners. This survey devises and proposes a new taxonomy covering different categories of the state-of-the- Fig. 1: (a) An illustration of graph structures with nodes denoting users in a social network and their edges. (b) An illustration of two communities (C art methods, including deep learning-based models upon deep 1 and C2) based on the prediction of users’ occupations. The prediction utilizes neural networks, deep nonnegative matrix factorization and deep users’ closeness in online activities (topological structures) and account sparse filtering. The main category, i.e., deep neural networks, profiles (node attributes). is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders. The survey also summarizes the popular benchmark data sets, model evaluation metrics, and open-source implementations to and large networks [15]–[17]. Increasingly graph-based ap- address experimentation settings. We then discuss the practical proaches are developed to detect communities in environments applications of community detection in various domains and with complex data structures [18], [19]. With community point to implementation scenarios. Finally, we outline future detection, the dynamics and impacts of communities in a directions by suggesting challenging topics in this fast-growing network can be analyzed in details, for example, for rumor deep learning field. spread, virus outbreak, and tumour evolution. Index Terms—Community Detection, Deep Learning, Social The existence of communities drives the development of Networks, Network Representation, Graph Neural Networks community detection studies and a research area with in- creasing practical significance. As the saying goes, Birds of I.INTRODUCTION a feather flock together [20]. Based on the theory of Six Degrees of Separation, any person in the world can know Communities have been investigated as early as 1920s in anyone else through six acquaintances [21]. Indeed, our world sociology and social anthropology [1]. However, it is only is a huge network formed by a series of communities. For after the 21st century that researchers started to detect com- example, by detecting communities in social networks [22]– munities with powerful mathematical tools and large-scale data [24], as shown in Fig.1, platform sponsors can promote their manipulation to address challenging problems [2]. Since 2002 products to targeted users. Community detection in citation [3], Girvan and Newman brought the graph partition problem networks [25] determines the importance, interconnectedness, arXiv:2105.12584v1 [cs.SI] 26 May 2021 to the broader attention. In the past 10 years, researchers evolution of research topics and identifies research trends. In from computer science have extensively studied the problem metabolic networks [26], [27] and Protein-Protein Interaction of community detection [4] by utilizing network topological (PPI) networks [28], community detection reveals metabolisms structures [5]–[8] and entities’ semantic information [9]–[11], and proteins with similar biological functions. Similarly, com- for both static and dynamic networks [12]–[14], and for small munity detection in brain networks [19], [29] reflects the X. Su, S. Xue, F. Liu, J. Wu, J. Yang, Q. Z. Sheng are with De- functional and anatomical segregation of brain regions. partment of Computing, Macquarie University, Sydney, Australia. E-mail: Many traditional techniques, such as spectral clustering {xing.su2@students., emma.xue, fanzhen.liu@students., jia.wu, jian.yang, [30], [31] and statistical inference [32]–[35], are employed michael.sheng}mq.edu.au. C. Zhou is with Academy of Mathematics and Systems Science, Chinese for small networks and simple scenarios. However, they do Academy of Sciences, Beijing, China. E-mail: [email protected]. not scale up for large networks or networks with high dimen- W. Hu is with School of Computer Science, Wuhan University, Wuhan, sional features because of their significant computational and China. E-mail: [email protected]. S. Xue, C. Paris, S. Nepa are with CSIRO’s Data61, Sydney, Australia. space costs. Rich nonlinear structure information in real-world E-mail: {emma.xue, surya.nepal, cecile.paris}@data61.csiro.au networks makes traditional models less applicable to practical D. Jin is with School of Computer Science and Technology Tianjin applications. Thus, more powerful techniques with good com- University, China. E-mail: [email protected]. P.S. Yu is with Department of Computer Science, University of Illinois at putation performance are required. For now, deep learning Chicago, Chicago, USA. Email: [email protected]. offers the most flexible solution because deep learning models: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL.XX, NO.XX, 2021 2
(1) learn nonlinear network properties, such as relationship be- C1 tween nodes, (2) provide a low-dimensional network represen- 1 C2 C1 1 C2 tation preserving the complicated network structures, and (3) 5 3 5 2 achieve an improved performance in detecting communities 2 3 6 6 from various pieces of information. Therefore, deep learning 7 4 7 for community detection is a new trend that demands a timely 4 1 comprehensive survey . (a) disjoint (b) overlapping To the best of our knowledge, this paper is the first compre- hensive survey focusing on deep learning contributions in com- Fig. 2: An illustrative comparison between (a) disjoint communities and (b) munity detection. Previous surveys focused on traditional com- overlapping communities (overlapping nodes are in blue). munity detection, reviewing its significant impact in discover- ing the inherent network patterns and functions [36], [37]. The techniques. SectionXI summarizes the popular experimental surveys on specific techniques are also summarized but not data sets, evaluation metrics and open-source implementing limited to: partial detection based on Stochastic Block Models codes. We survey real-world applications in Section XII. (SBMs) [38], Label Propagation Algorithm (LPAs) [39], [40], Lastly, Section XIII discusses the current challenges, suggest- and evolutionary computation for single– and multi–objective ing future research directions. Section XIV summarizes the optimizations [13], [14]. In terms of network types, researchers paper. have provided overviews of community detection methods in dynamic networks [12], directed networks [41] and multi- II.DEFINITIONS AND PRELIMINARIES layer networks [5]. Moreover, a series of overviews to disjoint Preliminaries include primary definitions, notations (TABLE and overlapping community defections are reviewed [6], [7]. I), and general inputs and outputs of deep learning models. Focusing on application scenarios, previous surveys review community detection techniques in social networks [9], [42]. Definition 1: Network. A basic network structure is repre- This paper aims to support researchers and practitioners to sented as G = (V,E), where V is the set of nodes and E is the understand the past, current and future trends of the commu- set of edges. Let vi ∈ V denote a node and eij = (vi, vj) ∈ E nity detection field from: an edge between vi and vj. The neighborhood of a node vi is • Systematic Taxonomy and Comprehensive Review. We defined as N(vi) = {u ∈ V |(vi, u) ∈ E}. The adjacency propose a new systematic taxonomy for this survey (see matrix A = [aij] is an n × n dimensional matrix, where Fig.3). For each category, we review, summarize and aij = 1 if eij ∈ E, and aij = 0 if eij ∈/ E. If aij 6= aji, compare the representative works. We also briefly intro- G is a directed network, otherwise it is an undirected network. duce community detection applications in the real world. If aij is weighted by wij ∈ W , G = (V,E,W ) is a weighted These scenarios provide horizons for future community network, otherwise it is an unweighted network. If aij’s value detection research and practices. differs in +1 and −1, G is a signed network representing • Abundant Resources and High-impact References. positive (1) and negative (−1) edges. If nodes V have attributes n The survey is not only a literature review but also a X = {xi}1 , G = (V,E,X) is an attributed network where d collection of resources of benchmark data sets, evalu- xi ⊆ R represents an attribute vector of vi, otherwise it is an ation metrics, open-source implementations and practi- unattributed network. When the network changes over time t, cal applications. We widely survey community detection it is a dynamic network denoted G(t) = (V(t),E(t)) or temporal publications in the latest high-impact international confer- network denoted G(t) = (V,E,X(t)). ences and high-quality peer-reviewed journals, covering Definition 2: Community. Given a set of communities C = domains of artificial intelligence, machine learning, data {C1,C2, ··· ,Ck}, each community Ci is a partition of the mining and data discovery. network G which keeps regional structures and clusters’ com- • Future Directions. As deep learning is a new research mon properties. A node vi clustered into community Ci should trend, we discuss current limitations, critical challenges satisfy the condition that the internal node degree (inside the and open issues for future directions. community) exceeds its external degree. Suppose Ci ∩Cj = ∅, The rest of this survey is organized as follows: SectionII (∀i, j), then C is a set of disjoint communities; otherwise it introduces notations used in this paper, gives definitions of includes a set of overlapping communities in which nodes basic concepts, and specifies the input and output of deep belong to more than one community. learning-based community detection. Section III provides an Community Detection Input. Deep learning-based commu- overview of traditional community detection approaches and nity detection models take as input the network structure explains why uncovering communities should rely on deep and other attributed information, including node attributes learning. SectionIV introduces a taxonomy which is used in and signed edges. Network structure represents topological the rest of the paper. SectionsV-X provide a comprehensive relationships by nodes and edges. Weights may apply on edges overview of community detection with various deep learning indicating the connection strength. Node attributes denote semantic information on nodes, e.g., user’s account profiles 1This paper is an extended vision of our published survey [4] in IJCAI- 20, which is the first published work on the review of community detection in online social networks. Signed edges indicate connection approaches with deep learning. statuses, i.e., positive (+) and negative (−) connections. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL.XX, NO.XX, 2021 3
TABLE I: Notations and their descriptions in this paper. traditional approaches are often sub-optimal. We briefly review Notations Descriptions their representative methods in this section. Deep learning R A data space methods (right part in Fig.3) uncover deep network informa- G A graph tion, complex relationships and handle high-dimensional data. V A set of nodes E A set of edges Graph Partition. These methods, widely known as graph C A set of communities vi The i-th node clustering [36], partition a network into k communities, where eij The edge between vi and vj k is a predefined number. The number of edges in a cluster Ck The k-th community is denser than the number of edges between the clusters. N(vi) A neighborhood of vi n The number of nodes Kernighan-Lin [43] is a representative heuristic algorithm for m The number of edges finding partitions of graphs. It initially divides a graph into two A An adjacency matrix arbitrary sub-graphs and then optimizes nodes’ participants A(+, −) The adjacency matrix of a signed network until it reaches the maximization of the gain function. Another Aij The anchor links between graphs (Gi, Gj ) X A set of heterogeneous node attributes representative method is spectral bisection [44] which applies X A node attribute matrix the spectrum Laplacian matrix for graph partitioning. These xi The node attribute vector of vi methods are still used in deep learning methods. yi The node label of vi k yi The binary community label of vi in Ck Statistical Inference. Stochastic Block Model (SBM) [32] is ck The community label of Ck a widely applied generative model by assigning nodes into d The attribute dimension of xi D A degree matrix communities and controlling their probabilities of likelihood. L A Laplacian matrix The variants include Degree-Corrected SBM (DCSBM) [33], l The layer index of DNN Mixed Membership SBM (MMB) [34] and Overlapping SBM (l) W The weight matrix of the l-th layer (OSBM) [35]. σ(·) The activation function (l) H The matrix of activations in the l-th layer Hierarchical Clustering. This group of methods discover h(l) The representation vector of v in the l-th layer i i multi-level community structures in three ways: divisive, ag- Z Processed features (e.g., autoencoded, noised) glomerative and hybrid. Girvan-Newman (GN) algorithm finds zi A vector of processed features of vi B A modularity matrix community structure in a divisive way by a successively bij The modularity value of (vi, vj ) removing edges such that a new community occurs [2], [45]. Q The modularity evaluation metric M A Markov matrix Its output is a dendrogram representing a nested hierarchy S A similarity matrix of community structures. Fast Modularity (FastQ)[3], [46], sij The pairwise node similarity value of (vi, vj ) an agglomerative algorithm, gradually merges nodes, each O A community membership matrix of which is initially regarded as a community. Community oij The community membership value of (vi, vj ) U/P The nonnegative matrix Detection Algorithm based on Structural Similarity (CDASS) pij Community membership probability of (vi,Cj ) [47] jointly applies divisive and agglomerative strategies in L A loss function a hybrid way, which divides the graph based on structural Ω A sparsity penalty similarity and merges it into hierarchical communities. | · | The length of a set k · k The norm operator Dynamical Methods. Random walks utilize the tendency of a Θ Trainable parameters P r(·) The probability distribution walker being trapped within a community during a short walk, φg A generator which is the most exploited procedure to dynamical detect φd A discriminator communities. For example, WalkTrap [48] uses a random φ An encoder e walk to calculate the probability and distance in measuring φr A decoder node’s similarities within communities. Information Mapping (InfoMap) [49] detects communities by describing paths of Community Detection Output. In general, community mod- random walks using the minimal-length encoding. Label Prop- els output a set of communities that group nodes and edges. agation Algorithm (LPA) [50] identifies communities through The communities can be either disjoint or overlapping. As an information propagation mechanism to detect the diffusion shown in Fig.2, overlapping communities share nodes, while community, denoting a set of nodes grouped by the same disjoint communities do not. We consider both of these com- propagation. A combination with the modularity forms a munities. variation – Modularity-specialized LPA (LPAm) [51]. Spectral Clustering. The property of spectrum in a network III.ADEVELOPMENTOF COMMUNITY DETECTION can be used to detect communities. Spectral clustering [30] Community detection has been significant in network anal- partitions nodes based on the network’s normalized Laplacian ysis and data mining. Fig.4 illustrates its development of matrix from regularized adjacency matrix, and then fits parti- traditional and deep learning methods. Traditional methods tions to a SBM using a pseudo-likelihood algorithm. By exam- explore communities upon network structures. These methods ining the spectra of normalized Laplacian matrices, Siemon et in seven categories (left part in Fig.3) only capture shallow al. [31] identified an integrative community structure in neural connections in a straightforward way. The detection results of brain networks to partition three kinds of animals. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL.XX, NO.XX, 2021 4
Traditional Methods Deep Learning-based Methods Community Deep Learning Detection Convolutional Network-based Graph Partition CNN-based GCN-based
1 1 2 4 2 4 Statistical Inference CNN GCN 3 3 … 5 7 5 7 Graph Attention Network-based … … 6 6 … 1 1 1 2 4 2 4 5 3 3 5 7 … 5 7 … 6 6 2 Hierarchical Clustering … Generative Adversarial Network-based 3 6 7 AE GAT 4 h4 Z h2 Autoencoder-based … … … … …
… 1 Dynamical Methods … h5 h3 h DenoisingAE-based
h6 StackedAE-based Min Loss(X, X’) GCN AE-based h7 h’3 Spectral Clustering SparseAE-based GAN Real Samples GAT AE-based
1 C C2 φg φd VariationalAE-based 2 1 5 3 Density-based Algorithms Fake Samples 6 Finetune Deep Nonnegative Matrix Factorization-based DNMF NMF DSF 4 7 i Pi-1 Optimizations × A Ui Pi Deep Sparse Filtering-based
Fig. 3: Traditional community detection methods and the taxonomy of deep learning-based methods.
1970s 1980s 1990s 2000s 2010 — 2014 2016 2017 2018 2019 2020
Graph Graph Density-based Hierarchical Graph Optimizations Density-based Hierarchical Optimizations Algorithms Algorithms Clustering ComNet-R Partition Partition Clustering Partition CNN MAGA-Net CE-MOEA spectral GN 2002 clustering KL 1970 DBSCAN 1996 LCCD CDASS bisection FastQ 2004 2010 LGNN 1982 CommDGI MRFasGCN semi-DRN TINs CayleyNets GCLN AE GCN Statistical CNN Optimizations DNGR GCN NOCD
GCN IPGDN Statistical Inference AGC Extremal 2002 AGE Inference DNE-SBP, Modularity 2004 OSBM, DCSBM DIME SBM 1983 2011 UWMNE, Louvain 2008 AE GRACE, ProGAN MGAE DANE DMGI AE GAN CommunityGAN
Spectral Dfuzzy GAT MAGNN Optimizations Dynamical Clustering VGAECD, simulated Methods ARVGA 2013 Transfer-CDDTA annealing 1983 SEAL WalkTrap 2005 2014 DAEGC LPA 2007 DR-GCN
DGLFRM, GAN InfoMap 2008 DANMF AE VGECLE, JANE
Optimizations DNMF LPAm 2009 VGAECD-OPT, spectral 2013 LGVG sE-Autoencoder Density-based Combo 2014 CDMEC DSFCD Algorithms DSF GUCD, SDCN, AE O2MAC SCAN 2007 Graph MAGCN, DMGC AE Encoder TVGA Traditional Methods Statistical Deep Learning Methods Inference MDNMF
MMB 2008 DNMF
Fig. 4: A timeline of community detection development.
Density-based Algorithms. This group includes the following Ci = Cj and zero otherwise, the sum runs over all pairs of significant clustering algorithms: Density-Based Spatial Clus- nodes (vi and vj) on adjacency matrix aij ∈ A, m denotes tering of Applications with Noise (DBSCAN) [52], Structural the number of edges, and P = [pij] indicates the average Clustering Algorithm for Networks (SCAN) [53] and Locating adjacency matrix of an ensemble of randomizations of the Structural Centers for Community Detection (LCCD) [54]. original network. P preserves the network features, such as They identify communities, hubs and outliers by measuring bipartiteness, correlations, signed edges and space embedded- entities’ density. LCCD particularly uses the density peak ness. The standard P [45] denotes pij = kikj/2m, where ki clustering algorithm [55] to locate the structural centres from and kj are node degrees. Louvain [56] is another well-known networks and then expands communities from the identified optimization algorithm that employs the node-moving strategy centres to the borders through a local search procedure. to extract community structures with the network’s optimized modularity. The extensions of greedy optimization methods Optimizations. Community detection methods employ opti- include simulated annealing [57], extremal optimization [58] mizations to reach an extremum value, which usually expects and spectral optimization [59]. Effective in local learning maximization indicating a likelihood over communities. The and global searching [60], the evolutionary community de- most classic optimization function is Modularity (Q)[45] tection methods can be divided into two classes: single– and and its variant FastQ [3], [46]. They estimate the quality multi–objective optimizations. Single-objective optimizations, of a partition of the network in communities. The general such as Multi-Agent Genetic Algorithm (MAGA-Net) [61], expression of modularity [37] is: apply the modularity function. Multi-objective optimizations 1 X Q = (aij − pij )δ(Ci,Cj ), (1) combine multiple objective functions. For example, Combo 2m ij [62] mixes Normalized Mutual Information (NMI) [63] and where C and C indicates the community of node v and i j i Conductance (CON) [64]. Continuous Encoding Multi-Object vj, δ is the Kronecker delta function which yields one if IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL.XX, NO.XX, 2021 5
Graph Images Embeddings Labels Communities Convolutions … n=7 1 … d Feature Maps 1 1 Nodes 2 4 C1 1 1 Nodes … Class 1 3 4 4 Class 2 7 2 … 2 … 5
… 6 3 or … d … 1 C1 3 … 1 2 4 … m=9 2 Edges 2 4 C2 7 2 Edges Pooling Remove 7