Community Detection in Complex Networks Via Clique Conductance Zhenqi Lu1, Johan Wahlström2 & Arye Nehorai1
Total Page:16
File Type:pdf, Size:1020Kb
www.nature.com/scientificreports Correction: Publisher Correction OPEN Community Detection in Complex Networks via Clique Conductance Zhenqi Lu1, Johan Wahlström2 & Arye Nehorai1 Network science plays a central role in understanding and modeling complex systems in many areas Received: 8 November 2017 including physics, sociology, biology, computer science, economics, politics, and neuroscience. Accepted: 20 March 2018 One of the most important features of networks is community structure, i.e., clustering of nodes Published online: 13 April 2018 that are locally densely interconnected. Communities reveal the hierarchical organization of nodes, and detecting communities is of great importance in the study of complex systems. Most existing community-detection methods consider low-order connection patterns at the level of individual links. But high-order connection patterns, at the level of small subnetworks, are generally not considered. In this paper, we develop a novel community-detection method based on cliques, i.e., local complete subnetworks. The proposed method overcomes the defciencies of previous similar community- detection methods by considering the mathematical properties of cliques. We apply the proposed method to computer-generated graphs and real-world network datasets. When applied to networks with known community structure, the proposed method detects the structure with high fdelity and sensitivity. When applied to networks with no a priori information regarding community structure, the proposed method yields insightful results revealing the organization of these complex networks. We also show that the proposed method is guaranteed to detect near-optimal clusters in the bipartition case. Networks are a standard representation of complex interactions among multiple objects, and network analysis has become a crucial part of understanding the features of a variety of complex systems1–10. One way to analyze net- works is to identify communities, mesoscopic structures consisting of groups of nodes that are relatively densely connected to each other but sparsely connected to other dense groups in the network11. Communities, also called clusters or modules, mark groups of nodes which could, for example, share common properties, exchange infor- mation frequently, or have similar functions within the network12. Te existence of communities is evident in many networked systems from a great many areas, including physics, sociology, biology, computer science, engi- neering, economics, politics, and neuroscience13–20. Community detection is important for many reasons. It allows classifcation of the functions of nodes in accordance with their structural positions in their communities21–23. It reveals the hierarchical organization that exists in many real-world networks24. Moreover, it improves the performance and efciency of processing, ana- lyzing, and storing networked data25,26. Communities also have concrete applications. In social networks, com- munities represent groups of individuals with mutual interests and backgrounds, and imply patterns of real social groupings15. In purchase networks, communities represent groups of customers with similar purchase habits, and can help establish efcient recommendation systems26. In citation networks, communities represent groups of related papers in one research direction, and identify scholars sharing research interests27. In brain networks, communities represent groups of nodes that are intricately interconnected and that could perform local compu- tations, and they give insights into structural units of the brain28. Te mathematical synonym of networks is graphs, and in the context of graph theory, one of the mathemati- cal formalizations of community detection is graph partitioning. Guided by spectral graph theory29, the method of spectral graph partitioning arose by relating network properties to the spectrum of the Laplacian matrix30. Te earliest method in this category minimized connections between diferent communities31,32. In practice, this optimization problem can be efciently solved, but it favors non-optimal solutions involving cutting a small part from the graph. One way to circumvent this drawback is to introduce balancing factors to the objective functions in order to enforce a reasonably large size for each community33,34. However, introducing balancing factors makes 1Preston M. Green Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO, USA. 2Department of Computer Science, University of Oxford, Oxford, United Kingdom. Correspondence and requests for materials should be addressed to A.N. (email: [email protected]) SCIENTIFIC REPORTS | (2018) 8:5982 | DOI:10.1038/s41598-018-23932-z 1 www.nature.com/scientificreports/ these optimization problems NP-hard35. Hence relaxed versions of these problems are solved by taking advantage of the properties of the Laplacian matrix. Despite thousands of publications in the literature on spectral partitioning, these methods are constrained to conventional graph-based models. Tese models involve a set of vertices, which represent objects of interest, and a set of edges, which encode the existence or non-existence of a relationship between each pair of objects. However, in many real-world systems, the complex and rich nature of systems cannot be captured by such dyadic relationships. More importantly, recent computer innovations have greatly increased the size of the real networks that one can potentially handle. As a result, the way to process and understand graphs has been changed, and polyadic interactions are becoming more and more important. In particular, a community is intuitively a cohesive group of vertices that are “more densely” connected within the community than across communities11. Te precise defnition and characterization of “more densely” relies on polyadic interactions among multiple vertices. In order to quantitatively characterize polyadic structures, we employ the high-order structures of cliques, defned to be local complete subgraphs. In the context of networks, cliques are groups of objects that rapidly and efectively interact. Tis paper presents a graph-partitioning method that identifes clusters of cliques. One line of related work is the method of k-clique percolation36,37. Tis method defnes the k-clique com- munity to be the union of “adjacent” k-cliques, which by defnition share k − 1 vertices, where k is any positive integer. However, this defnition is too stringent because it rules out other possible communities that are not so well-connected. Its performance also relies heavily on the choice of k: A small k leads to a single giant community, and a large k leads to multiple small and possibly distant communities. In addition, this defnition includes top- ological cavities38, which enclose holes in networks and mark local lacks of connectivity. However, this feature is not an expected property of communities. In a recent paper, Benson et al. devised a community-detection method based on high-order connectivity patterns called network motifs39,40, and proposed a generalized framework for identifying clusters of network motifs41. Cliques are certainly one special kind of network motif, and Benson et al. provide numerical simulations for applying this framework to cliques. However, this framework has several drawbacks. First, the framework fails to consider the nested nature of cliques and so sufers from unnecessary computational cost, since it needs to take into consideration non-maximal cliques. Second, the method requires pre-specifcation of the sizes of the cliques involved, instead of considering all clique sizes occurring in the network. Tird, the conductance function merely counts the number of cliques and ignores other properties infuenced by partitions. Lastly, the perfor- mance guarantee works only for 3-cliques. We overcome all these drawbacks by designing a novel conductance function specifcally for cliques. In this paper, we propose a novel community-detection method that minimizes a new objective function, called the clique conductance function. We encode in this objective function the number and sizes of cliques, and the numbers of edges in the cliques. Finding a partition that exactly minimizes the clique conductance is computationally intractable. Tus we extend the spectral graph partitioning methodology, and devise a com- putationally tractable solution that approximately minimizes the clique conductance. In addition, we derive a performance guarantee for the bipartition case, showing that the resulting bipartition is near-optimal. Finally, we apply the proposed method to computer-generated graphs and real-world network datasets. When applied to networks with known community structure, the proposed method achieves excellent agreement with the ground-truth communities. When applied to networks with no a priori information regarding community structure, the proposed method yields insightful results that help us understand the structures embedded in these complex networks. Methods In this section, we describe our proposed graph-partitioning method. We begin by introducing several graph nota- tions, and then state the formulation of our proposed graph-partitioning method based on clique-conductance minimization. We conclude this section by proposing a computationally efcient algorithm that approximately solves the optimization problem. Graph Notations. An undirected weighted graph is an ordered triplet (,VE,)π consisting of a set of verti-