Discovering Community Structure in Multilayer Networks

2017 International Conference on Data Science and Advanced Analytics Discovering Community Structure in Multilayer Networks Soumajit Pramanik1, Raphael Tackx2, Anchit Navelkar1, Jean-Loup Guillaume3 and Bivas Mitra1 1Department of Computer Science & Engineering, IIT Kharagpur, India 2Sorbonne Universites,´ UPMC Universite´ Paris 06, CNRS, LIP6 UMR 7606, 4 place Jussieu 75005 Paris, France 3L3I, University of La Rochelle, France [email protected], [email protected], [email protected] [email protected], [email protected] Abstract —Community detection in single layer, isolated net- Proximity Links works has been extensively studied in the past decade. However, Location/ many real-world systems can be naturally conceptualized as L2 L5 L1 Business multilayer networks which embed multiple types of nodes and L Layer L3 4 relations. In this paper, we propose algorithm for detecting communities in multilayer networks. The crux of the algorithm is Review Links based on the multilayer modularity index QM , developed in this U U paper. The proposed algorithm is parameter-free, scalable and U1 U3 5 7 User adaptable to complex network structures. More importantly, it Layer U U can simultaneously detect communities consisting of only single U2 4 6 type, as well as multiple types of nodes (and edges). We develop Friendship Links a methodology to create synthetic networks with benchmark multilayer communities. We evaluate the performance of the Fig. 1. A sample multilayer (Yelp) network. proposed community detection algorithm both in the controlled environment (with synthetic benchmark communities) and on the reveal complex interactions between multi-type nodes and empirical datasets (Yelp and Meetup datasets); in both cases, the heterogeneous links. They are also found to be beneficial for proposed algorithm outperforms the competing state-of-the-art algorithms. different data mining tasks such as context-sensitive search, prediction and recommendation [3] etc. Community detection Keywords -Multilayer Network, Modularity, Community Detec- in multilayer networks is challenging as the detected com- tion munities have possibility to contain only single or multiple types of nodes. Most of the recent endeavors concentrated I. INTRODUCTION on the multiplex networks [4], [5] where all layers share Communities are defined as groups of nodes that are more the identical set of nodes but may have multiple types of densely connected to each other than to the rest of the network. interactions. In multiplex network, some of the approaches The goal of the community detection algorithms, consequently, propose new quality metrics [4] to measure the goodness of the is to partition the networks into groups of nodes; large detected communities whereas a few other approaches utilize body of work exists on community detection in single and random walk [5] or frequent-pattern mining techniques [6] isolated network [1]. Recently, many real networks, including to obtain structurally similar components. In principle, most communication, social, infrastructural and biological ones, are of the aforementioned algorithms transform the problem to often represented as multilayer networks [2]. A multilayer the classical community detection in a monoplex network network is comprised of multiple interdependent networks, leveraging on the fact that in multiplex network, one-to-one where each network layer represents one aspect of interaction. cross layer links connect the copies of the same nodes in Moreover, the functionality of a node in one network layer is multiple layers. Unfortunately, the presence of heterogeneous dependent on the role of nodes in other layers. For instance, a nodes across multiple layers and cross layer dependency links location based social network (say, Yelp) can be represented as make the aforementioned solutions inadequate for multilayer a multilayer network (see Fig. 1) where in one layer customers networks. (visitors) are connected via social links and in the other layer Attempts have been made in bits and pieces to detect location nodes are connected through proximity links. The communities in multilayer networks; novel methodologies (coupling) link connecting a customer with a location node have been introduced such as Dirichlet process [7], tensor represents the visit of a customer to a location. factorization [3], subspace clustering [8], non-negative matrix Community detection in complex multilayer networks is an factorization [9] etc. However, most of these approaches suffer important research problem. The communities in multilayer from several limitations. First of all, some of the aforesaid networks help to identify functionally cohesive sub-units and algorithms only work on a specific type of multilayer networks 978-1-5090-5004-8/17 $31.00 © 2017 IEEE 611 DOI 10.1109/DSAA.2017.71 layer algorithms on the empirical datasets (Yelp and Meetup) L1 and demonstrate that they outperform the state of the art baselines in correctly discovering the communities (sec. VII). II. REPRESENTATION &PROBLEM STATEMENT L2 We start with formally representing the multilayer network Multilayer Ground-Truth Single-Layer Ground- and defining the respective communities. Next, we state the Communities Truth Communities problem of detecting communities in multilayer network and Fig. 2. Network configurations with two different types of ground truth the key challenges. communities. A. Representation (say, star-type [7] etc.). Secondly, some of them are forced G =(G G ) to detect communities comprising only multiple types of We represent a multilayer network as a tuple U , B G = { : ∈{1 2 }} nodes [3], hence introducing bias. Third, the desired number where U Li i , ,...,M is a family of M G G = { : of communities are required to be fixed apriori for most of uni-partite graphs (called layers of ) and B Lij ∈{1 2 } = } them [3], [8], limiting their capability to discover the true i, j , ,...,M ,i j is the family of bipartite set of communities. Finally, a proper framework to generate graphs containing nodes from individual layers and the cross benchmark communities for generic multilayer networks is not layer interconnections among them. We denote each layer =( ) available in any of them. Li Vi,Ei where Vi and Ei are respectively the set Recent endeavors directed towards development of modular- of nodes and intra-layer edges present in Li. In the same ( ) ity index for heterogeneous networks. For instance, composite line, we can represent Lij as a triplet Vi,Vj,Eij where { ⊆{ × } : ∈{1 2 } = } modularity [10] calculates the modularity of a multi-relational Eij Vi Vj i, j , ,...,M ,i j is the network as the integration of modularities calculated for each set of coupling edges between nodes of layers Li & Lj. G single-relational subnetwork. However, due to the deficiency Definition: A community C in a multilayer network is (C C ) G in definition, the composite modularity can only produce defined as a cohesive module U , B of containing a subset communities with single type of nodes. On the other side, of nodes from one or more layers and all the edges having both C C modularity proposed in [11] in the context of gene-chemical endpoints incident on them. Mathematically, U and B can C = { C =( C C ): C ⊆ C = interaction network fails to conceive the role of coupling links be defined as U Li Vi ,Ei Vi Vi,Ei { ∩ ( C × C )} ∈{1 2 }} C = { C = in communities characterization. The detailed exploration of Ei Vi Vi ,i , ,...,M and B Lij ( C C C ): C ⊆ C ⊆ C = { ∩ ( C × prior art reveals the importance of the multilayer community Vi ,Vj ,Eij Vi Vi,Vj Vj,Eij Eij Vi C )} ∈{1 2 } = } detection algorithm which is free from (a) any external param- Vj ,i,j , ,...,M ,i j . G eter, say total number of communities (b) any bias towards Importantly, communities of a multilayer network can be communities with only single type or only multiple types of divided into two types (see Fig. 2) (a) cross layer communities |C | =Φ nodes. Developing a suitable modularity index should be the (containing multiple types of nodes) for which B ; first step towards this direction. (b) single layer communities (containing only single type of |C | =Φ In this paper, we propose a community detection algorithm nodes) for which B . for multilayer networks which is able to detect communities B. Problem Statement comprising both single type as well as multiple types of The problem of multilayer community detection algorithm nodes, depending on the network structure. First we represent G the multilayer network with proper notations and define the is to divide the network into a set of disjoint cohesive modules C1,C2,...,CK which is a cover of the nodes in problem of community detection (sec. II). Next, we develop G a methodology to construct synthetic multilayer network with such that each module Ci is comprised of a group of ground truth communities and evaluate it rigorously (sec. III). nodes densely connected inside & loosely connected outside The major contribution of this paper is to propose a modu- the community. The key challenges of this problem are two-fold - (a) dealing larity index QM for characterizing communities in multilayer networks. Subsequently, we develop the multilayer community with multilayer network which contains multiple types of links detection algorithms GN-Q and Louvain-Q incorporating (of different densities) & nodes and (b) detecting both cross M M layer & single layer communities simultaneously without any the modularity index QM . We present the convergence proof for both the proposed algorithms along with their complexity additional parameter. analysis (sec. IV). We first evaluate the performance of the III. DATASET proposed modularity as a community scoring metric (sec. V) A. Synthetic Dataset Generation and then tested the performance of the developed algorithms against the competing algorithms. Controlled experiments, In this section, we propose a methodology to generate performed on the synthetic network, exhibit the ability of benchmark multilayer networks with ground truth communi- the GN-QM and Louvain-QM algorithms to efficiently detect ties.

Load more