Communities in Networks Mason A
Total Page:16
File Type:pdf, Size:1020Kb
Communities in Networks Mason A. Porter, Jukka-Pekka Onnela, and Peter J. Mucha Introduction: Networks and Communities for an example). Each node has a degree given by “But although, as a matter of history, statistical the number of edges connected to it and a strength mechanics owes its origin to investigations in given by the total weight of those edges. Graphs thermodynamics, it seems eminently worthy of can represent either man-made or natural con- an independent development, both on account of structs, such as the World Wide Web or neuronal the elegance and simplicity of its principles, and synaptic networks in the brain. Agents in such because it yields new results and places old truths networked systems are like particles in traditional in a new light in departments quite outside of statistical mechanics that we all know and (pre- thermodynamics.” sumably) love, and the structure of interactions — Josiah Willard Gibbs, between agents reflects the microscopic rules that Elementary Principles in Statistical Mechanics, govern their behavior. The simplest types of links 1902 [47] are binary pairwise connections, in which one only cares about the presence or absence of a tie. How- rom an abstract perspective, the term ever, in many situations, links can also be assigned network is used as a synonym for a math- a direction and a (positive or negative) weight to ematical graph. However, to scientists designate different interaction strengths. across a variety of fields, this label means Traditional statistical physics is concerned with so much more [13,20,44,83,88,120,124]. the dynamics of ensembles of interacting and FIn sociology, each node (or vertex) of a network noninteracting particles. Rather than tracking the represents an agent, and a pair of nodes can be motion of all of the particles simultaneously, which connected by a link (or edge) that signifies some is an impossible task due to their tremendous num- social interaction or tie between them (see Figure 1 ber, one averages (in some appropriate manner) the microscopic rules that govern the dynamics Mason A. Porter, Oxford Centre for Industrial and Ap- of individual particles to make precise statements plied Mathematics, Mathematical Institute, University of of macroscopic observables such as temperature Oxford, and CABDyN Complexity Centre, University of Oxford. His email address is [email protected]. and density [112]. It is also sometimes possible to make comments about intermediate (mesoscop- Jukka-Pekka Onnela, Harvard Kennedy School, Harvard ic) structures, which lie between the microscopic University; Department of Physics, University of Oxford; and macroscopic worlds; they are large enough CABDyN Complexity Centre, University of Oxford; and that it is reasonable to discuss their collective Department of Biomedical Engineering and Computa- tional Science, Helsinki University of Technology. His email properties but small enough so that those proper- address is [email protected]. ties are obtained through averaging over smaller numbers of constituent items. One can similarly Peter J. Mucha, Carolina Center for Interdisciplinary take a collection of interacting agents, such as Applied Mathematics, Department of Mathematics, University of North Carolina, and Institute for Advanced the nodes of a network, with some set of micro- Materials, Nanoscience and Technology, University of scopic interaction rules and attempt to derive the North Carolina. His email address is [email protected]. resulting mesoscopic and macroscopic structures. 1082 Notices of the AMS Volume 56, Number 9 One mesoscopic structure, called a community, consists of a group of nodes that are relatively densely connected to each other but sparsely con- nected to other dense groups in the network [39]. We illustrate this idea in Figure 2 using a well- known benchmark network from the sociology literature [131]. The existence of social communities is intu- itively clear, and the grouping patterns of humans have been studied for a long time in both sociol- ogy [25, 44, 79] and social anthropology [66, 113]. For example, Stuart Rice clustered data by hand to investigate political blocs in the 1920s [106], and George Homans illustrated the usefulness of rearranging the rows and columns of data matrices to reveal their underlying structure in 1950 [60]. Robert Weiss and Eugene Jacobson per- formed (using organizational data) what might have been the first analysis of network commu- nity structure in 1955 [126], and Herbert Simon espoused surprisingly modern views on communi- ty structure and complex systems in general in the 1960s [117]. Social communities are ubiquitous, arising in the flocking of animals and in so- Figure 1. The largest connected component of cial organizations in every type of human society: a network of network scientists. This network groups of hunter-gatherers, feudal structures, roy- was constructed based on the coauthorship of al families, political and business organizations, papers listed in two well-known review families, villages, cities, states,nations, continents, articles [13,83] and a small number of and even virtual communities such as Facebook additional papers that were added groups [39, 88]. Indeed, the concept of commu- manually [86]. Each node is colored according nity is one of everyday familiarity. We are all to community membership, which was connected to relatives, friends, colleagues, and determined using a leading-eigenvector acquaintances who are in turn connected to each spectral method followed by Kernighan-Lin other in groups of different sizes and cohesions. node-swapping steps [64,86,107]. To The goals of studying social communities have determine community placement, we used the aligned unknowingly with the statistical physics Fruchterman-Reingold graph paradigm. As sociologist Mark Granovetter wrote visualization [45], a force-directed layout in his seminal 1973 paper[51] on weak ties, “Large- method that is related to maximizing a quality scale statistical, as well as qualitative, studies offer function known as “modularity” [92]. To apply a good deal of insight into such macro phenomena this method, we treated the communities as if as social mobility, community organization, and they were themselves the nodes of a political structure... But how interaction in small (significantly smaller) network with groups aggregates to form large-scale patterns connections rescaled by inter-community eludes us in most cases.” links. We then used the Kamada-Kawaii Sociologists recognized early that they need- spring-embedding graph visualization ed powerful mathematical tools and large-scale algorithm [62] to place the nodes of each data manipulation to address this challenging individual community (ignoring problem. An important step was taken in 2002, inter-community links) and then to rotate and when Michelle Girvan and Mark Newman brought flip the communities for optimal placement graph-partitioning problems to the broader atten- (including inter-community links). See the tion of the statistical physics and mathematics main text for further details on some of the communities [48]. Suddenly, community detec- ideas in this caption. (We gratefully tion in networks became hip among physicists acknowledge Amanda Traud for preparing this and applied mathematicians all over the world, figure.) and numerous new methods were developed to try to attack this problem. The amount of research in this area has become massive over the past seven years (with new discussions or algorithms posted on the arXiv preprint server almost every day), and the study of what has become known October 2009 Notices of the AMS 1083 context, the term module is typically used to re- fer to a single cluster of nodes. Given a network that has been partitioned into nonoverlapping modules in some fashion (although some meth- ods also allow for overlapping communities), one can continue dividing each module in an iterative fashion until each node is in its own singleton community. This hierarchical partitioning process can then be represented by a tree, or dendro- gram (see Figure 2). Such processes can yield a hierarchy of nested modules (see Figure 3), or a collection of modules at one mesoscopic 32 29 5 6 level might be obtained in an algorithm inde- 2 8 7 26 11 pendently from those at another level. However 17 25 obtained, the community structure of a network 1 24 refers to the set of graph partitions obtained at 2 each “reasonable” step of such procedures. Note 34 3 that community structure investigations rely im- 33 4 plicitly on using connected network components. 31 8 (We will assume such connectedness in our discus- 30 12 sion of community-detection algorithms below.) 13 27 Community detection can be applied individually 14 23 to separate components of networks that are not 18 21 2 connected. 0 19 22 9 16 Many real-world networks possess a natural hi- 15 10 erarchy. For example, the committee assignment Figure 2. (Top) The Zachary Karate Club network of the U.S. House of Representatives in- network [131], visualized using the cludes the House floor, groups of committees, com- Fruchterman-Reingold method [45]. Nodes are mittees, groups of subcommittees within larger colored black or white depending on their committees, and individual subcommittees [100, later club affiliation (after a disagreement 101]. As shown in Figure 4, different House com- prompted the organization’s breakup). The mittees are resolved into distinct modules within dashed lines separate different communities, this network. At a different hierarchical level, small which were determined using a groups of committees belong to larger but less leading-eigenvector spectral maximization of densely connected modules. To give an example modularity [86] with subsequent closer to home, let’s consider the departmental Kernighan-Lin node-swapping steps (see the organization at a university and suppose that discussion in the main text). (Bottom) Polar the network in Figure 3 represents collaborations coordinate dendrogram representing the among professors. (It actually represents grass- results of applying this community-detection land species interactions [23].) At one level of algorithm to the network.