Generalized Louvain Method for Community Detection in Large Networks

Pasquale De Meo∗, Emilio Ferrara§, Giacomo Fiumara∗, Alessandro Provetti∗,∗∗
∗Dept. of Physics, Informatics Section. §Dept. of Mathematics. University of Messina, Italy.
∗∗Oxford-Man Institute, University of Oxford, UK.
pdemeo, eferrara, gfiumara, [email protected]

Abstract: In this paper we present a novel strategy to discover the community structure of (possibly large) networks. The approach is based on the well-known concept of network modularity optimization. Our algorithm exploits a novel measure of edge centrality based on κ-paths, which allows us to compute an edge ranking in large networks efficiently, in near linear time. Once the centrality ranking is calculated, the algorithm computes the pairwise proximity between nodes of the network. Finally, it discovers the community structure by adopting a strategy inspired by the well-known state-of-the-art Louvain method (henceforth, LM), efficiently maximizing the network modularity. The experiments we carried out show that our algorithm outperforms other techniques and slightly improves on the results of the original LM, providing reliable results. Another advantage is that, unlike the LM, it extends naturally to unweighted networks.

Keywords: complex networks; community structure

I. INTRODUCTION

The investigation of the community structure inside networks has acquired great relevance in recent years, in particular in the context of Social Network Analysis (SNA). This is due in part to the unexpected success of Online Social Networks (OSNs). In fact, social phenomena such as Facebook and Twitter, amongst others, bring together millions of users in a single network whose features are a goldmine for social scientists. Several works focus on the Social Network analysis of these OSNs; others describe the strategies of analysis themselves.

In this paper we focus on the possible strategies of community detection. To date, two paradigms exist to discover the community structure of a network. The former is based on the analysis of the global features of the network, for example its topology. These approaches are characterized by high computational complexity and high-quality results. The latter paradigm relies on exploiting local information, for example the information that can be acquired from nodes and their neighborhoods. The computational cost of these techniques is lower than that of techniques exploiting global features, but their reliability is lower as well.

In this work, we propose a novel strategy to discover the inner community structure of a network. The main characteristics of our approach are the following: i) it exploits global information of the network, establishing which edges of the network contribute to the creation of the community structure; ii) to do so, it adopts a novel measure of edge centrality, in order to rank all the edges of the network with respect to their proclivity to propagate information through the network itself; iii) its computational cost is low, making it feasible even for large network analysis; iv) it is able to produce reliable results, even when compared with those calculated by using more complex techniques, when such a comparison is possible; in fact, because of computational constraints, the adoption of some existing techniques is not viable when considering large networks, and their application is limited to small case studies.

This paper is organized as follows: in the next Section we provide some background information about the community detection problem. Section III introduces the main objectives of this work and gives an intuitive sketch of the novel community detection strategy we propose. In Section IV the key concept of κ-path edge centrality is recalled, as a novel and efficient strategy for ranking edges with respect to their centrality in the network. All the pieces are put together in Section V, where we describe our strategy to detect the community structure, inspired by the well-known state-of-the-art LM [1], which is computationally suitable even when large networks are analyzed. The experiments that have been carried out are discussed in Section VI. Finally, Section VII concludes, outlining some future directions of research.

II. BACKGROUND

Several techniques to investigate the community structure of networks have been proposed in the literature in recent years. There exist numerous comprehensive surveys of this problem, such as [2], [3].

In its general formulation, the problem of finding communities in a network is intended as a data clustering problem: it can be solved by assigning each node of the network to a cluster in a meaningful way. Two approaches have been widely investigated: i) spectral clustering based techniques, and ii) network modularity optimization strategies. The former relies on optimizing the process of cutting the graph representing the given network. The latter is based on the maximization of a benefit function, called network modularity. We briefly recall them separately.

The problem of minimizing the number of cuts in a given graph has been proved to be NP-hard. To address it, different approximate techniques have been proposed. An example is spectral clustering [4], which exploits the eigenvectors of the Laplacian matrix of the network. We recall that the Laplacian matrix L of a given graph has components $L_{ij} = k_i \delta(i, j) - A_{ij}$, where $k_i$ is the degree of node i, $\delta(i, j)$ is the Kronecker delta (that is, $\delta(i, j) = 1$ if i = j and 0 otherwise) and $A_{ij}$ are the entries of the adjacency matrix representing the graph connections. Another approach relies on the strategy of ratio cut partitioning [5], [6]. The ratio cut is a function that, if minimized, allows the identification of large clusters with a minimum number of outgoing interconnections. The principal issue of spectral clustering based techniques is that one has to know in advance the number and the size of the communities in the given network. This makes the strategy unfeasible if the purpose is to discover the unknown community structure of a network.

The strategy exploited in this paper adopts the second paradigm, the one relying on the concept of network modularity. It can be explained as follows: let us consider a network, represented by means of a graph G = (V, E), partitioned into m communities; denoting by $l_s$ the number of edges between nodes belonging to the s-th community and by $d_s$ the sum of the degrees of the nodes in the s-th community, the network modularity Q is given by

$$Q = \sum_{s=1}^{m} \left[ \frac{l_s}{|E|} - \left( \frac{d_s}{2|E|} \right)^2 \right] \qquad (1)$$

Intuitively, high values of Q imply high values of $l_s$ for each discovered community; thus, detected communities are dense within their structure and weakly coupled with each other. Equation 1 reveals a possible maximization strategy: in order to increase the value of the first term (namely, the coverage), the highest possible number of edges should fall within each given community, whereas the minimization of the second term is obtained by dividing the network into several communities with small total degrees.

The problem of maximizing the network modularity has been proved to be NP-complete [7]. For this reason, several heuristic strategies to maximize the network modularity Q have been proposed to date. Probably the most popular one is the Girvan-Newman strategy [8], [9]. This approach works in two steps: i) ranking edges by using the betweenness centrality as a measure of importance; ii) deleting [...] betweenness centrality, which is itself very costly to compute (even if the most efficient algorithm [10] is adopted).

Several variants of this strategy have been proposed over the years, such as the fast clustering algorithm provided by Clauset, Newman and Moore [11], which runs in $O(n \log^2 n)$ on sparse graphs; the extremal optimization method proposed by Duch and Arenas [12], based on a fast agglomerative approach, with $O(n^2 \log n)$ time complexity; the Newman-Leicht [13] mixture model based on statistical inference; and other maximization techniques by Newman [14] based on eigenvectors and matrices.

The state-of-the-art technique is called the Louvain method (LM) [1]. This strategy is based on local information and is well suited for analyzing large weighted networks. It is based on two simple steps: i) each node is assigned to a community chosen in order to maximize the network modularity Q; the gain derived from moving a node i into a community C can simply be calculated as [1]

$$\Delta Q = \left[ \frac{\Sigma_C + k_i^C}{2m} - \left( \frac{\Sigma_{\hat{C}} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_C}{2m} - \left( \frac{\Sigma_{\hat{C}}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right] \qquad (2)$$

where $\Sigma_C$ is the sum of the weights of the edges inside C, $\Sigma_{\hat{C}}$ is the sum of the weights of the edges incident to nodes in C, $k_i$ is the sum of the weights of the edges incident to node i, $k_i^C$ is the sum of the weights of the edges from i to nodes in C, and m here denotes the sum of the weights of all the edges in the network (not the number of communities, as in Equation 1); ii) the second step simply builds a new network whose nodes are the communities previously found. The process then iterates as long as it yields a significant improvement of the network modularity.

In this paper we present an efficient community detection algorithm which represents a generalization of the LM. In fact, it can be applied even to unweighted networks and, most importantly, it exploits both global and local information. To make this possible, our strategy computes the pairwise distance between nodes of the network. To do so, edges are weighted by using a global feature which represents their aptitude to propagate information through the network. The edge weighting is based on the κ-path edge centrality, a novel measure whose calculation requires a near linear computational cost [15]. The partition of the network is then obtained through an improved version of the LM.
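
As a rough illustration of Equation 1 and of the first LM step, the following Python sketch computes Q for an unweighted, undirected graph given as an edge list, and then tries moving a single node into each neighbouring community. The function names `modularity` and `best_move`, the toy graph and the dictionary-based partition are assumptions made for this example and do not come from the paper. For simplicity, `best_move` recomputes Q from scratch for every candidate instead of using the incremental gain of Equation 2, so it only mirrors the LM step on small examples.

```python
from collections import defaultdict


def modularity(edges, community):
    """Network modularity Q of Equation 1 for an unweighted, undirected graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to a community label
    """
    n_edges = len(edges)
    internal = defaultdict(int)    # l_s: edges with both endpoints in community s
    degree_sum = defaultdict(int)  # d_s: sum of the degrees of the nodes in s
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[s] / n_edges - (degree_sum[s] / (2 * n_edges)) ** 2
        for s in degree_sum
    )


def best_move(edges, community, node):
    """Try `node` in each neighbouring community; return (best label, best Q).

    Hypothetical helper: a brute-force stand-in for the first LM step.
    """
    neighbours = {v for u, v in edges if u == node}
    neighbours |= {u for u, v in edges if v == node}
    candidates = {community[n] for n in neighbours} | {community[node]}
    best_label, best_q = community[node], modularity(edges, community)
    for label in candidates:
        trial = dict(community)
        trial[node] = label
        q = modularity(edges, trial)
        if q > best_q:
            best_label, best_q = label, q
    return best_label, best_q


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

print(modularity(edges, two_triangles))               # 5/14, about 0.357
print(modularity(edges, {n: "all" for n in range(6)}))  # 0.0, a single community
print(best_move(edges, {n: n for n in range(6)}, 0))
```

On this toy graph the partition into the two triangles scores higher than the single-community partition, which matches the intuition given for Equation 1: many internal edges and small total degrees per community. A practical implementation would keep running totals per community and evaluate the incremental gain rather than recomputing Q for every trial move.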