Extracting the Hierarchical Organization of Complex Systems Marta Sales-Pardo ∗, Roger Guimer`A ∗ , Andr´Ea

Extracting the hierarchical organization of complex systems Marta Sales-Pardo ∗, Roger Guimerà ∗ , AndréA. Moreira ∗ , and Lu´ısA. Nunes Amaral ∗ y ∗Department of Chemical and Biological Engineering and Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL 60208, USA Submitted to Proceedings of the National Academy of Sciences of the United States of America Extracting understanding from the growing \sea" of biological and a network into modules, which are comprised of sub-modules socio-economic data is one of the most pressing scientific challenges and so on1. facing us. Here, we introduce and validate an unsupervised method A method for the identification of the hierarchical orga- that is able to accurately extract the hierarchical organization of complex biological, social, and technological networks. We define nization of nodes in a network must fulfill two requirements: an ensemble of hierarchically nested random graphs, which we use (i) it must be accurate for many types of networks, and (ii) to validate the method. We then apply our method to real-world it must identify the different levels in the hierarchy as well as networks, including the air-transportation network, an electronic cir- the number of modules and their composition at each level. cuit, an email exchange network, and metabolic networks. We find The first condition may appear as trivial, but we make it ex- that our method enables us to obtain an accurate multi-scale de- scriptions of a complex system. plicit to exclude algorithms that only work for a particular network or family of networks, but that will otherwise fail. complex networks j hierarchical organization j multi-scale representation j cel- The second condition is more restrictive, as it excludes meth- lular metabolism j transportation networks ods whose output is subject to interpretation. Specifically, a method does not fulfill the second condition if it organizes he high-throughput methods available for probing bio- nodes into a tree structure, but it is up to the researcher to Tlogical samples have drastically increased our ability to find a \sensible" criterion to establish which are the different gather comprehensive molecular-level information on an ever levels in that tree. An implication of the previous two re- growing number of organisms. These data show that these quirements is that any method for the identification of node systems are connected through a dense network of nonlinear organization must have a null output for networks, such as interactions among its components [1, 2], and that this in- Erd}os-Rényi random graphs, which do not have an internal terconnectedness is responsible for their efficiency and adapt- structure. ability. At the same time, however, such interconnectedness To our knowledge, there is no procedure that enables one poses significant challenges to researchers trying to interpret to simultaneously assess whether a network is organized in a empirical data and to extract the \systems biology" princi- hierarchical fashion and to identify the different levels in the ples that will enable us to build new theories and to make hierarchy in an unsupervised way. Ravasz et al. [12] studied new predictions [3]. the hierarchical structure of metabolic networks, but in their A central idea in biology is that life processes are hierarchi- analysis the authors put emphasis on detecting \global sig- cally organized [2, 4, 5, 6] and that this hierarchical structure natures" of a hierarchical network architecture. Specifically, plays an important role in their dynamics [7]. However, given they reported that, for the metabolic networks studied and for a set of genes, proteins, or metabolites and their interactions, certain hierarchical network models, the clustering coefficient −1 we still do not have an objective manner to assess whether of nodes appears to scale with the connectivity as C(k) ∼ k . such hierarchical organization does indeed exist, or to objec- This scaling, however, is neither a necessary nor a sufficient tively identify the different levels in the hierarchy. condition for a network to be hierarchical [13]. Here, we report a new method that identifies the levels More direct methods to investigate the hierarchical or- in the organization of complex systems and extracts the rele- ganization of the nodes in a network have also been recently vant information at each level. To illustrate the potetial of proposed [14, 15, 16]. Although useful in some contexts, these our method, it is useful to think of electronic maps as in methods do not clearly identify hierarchical levels and thus fail http://maps.google.com (Fig. S1). Electronic maps are powerful to satisfy condition (ii) above. Furthermore, all these methods arXiv:0705.1679v1 [physics.soc-ph] 11 May 2007 tools because they present information in a scalable manner, yield a tree even for networks with no internal structure. that is, despite the increase in the amount of information as In the following, we define inclusion hierarchies in complex we \zoom out," the representation is able to extract the in- networks and describe an ensemble of hierarchically nested formation that is relevant at the new scale. In a similar spirit, random graphs. We then introduce a method that is able to our method will enable researchers to characterize each scale accurately extract the hierarchical organization of hierarchi- with the relevant information at that scale. This achievement cal random graphs. Finally, we apply our method to several is key for the development of systems biology, but will en- real-world networks. counter application in many other areas. yTo whom correspondence should be addressed. E-mail: [email protected] Background c 2006 by The National Academy of Sciences of the USA Complex networks are convenient representations of the in- 1We do not consider other hierarchical schemes that classify nodes according to, for instance, their importance [9]. Another issue that we do not address here is that of \overlapping" modules. In the teractions within complex systems [8]. Here, we focus on the literature, some authors refer to the existence of \soft" boundaries between communities [10, 11]. identification of inclusion hierarchies in complex neworks, that However, there has been so far no rigorous connection between the soft boundaries and the overlap between communities. Moreover, at present, there is no theoretical model that includes overlapping is, to the unraveling of the nested organization of the nodes in modules, that is, modules that share nodes, as opposed to communities that share edges. www.pnas.org | | PNAS Issue Date Volume Issue Number 1{6 Inclusion hierarchies els are more cohesive, whereas for ρ > 1, they are less cohesive Consider the ensemble of networks comprised of N nodes, (Supplementary Information). N = f ni : i = 1;:::;N g, that hold membership in a set of nested groups, G =f g(k1 ::: k`) : ` = 1; 2 ::: g, where ` is the Extracting the hierarchical organization of networks level at which the group is defined, and the labels k : : : k 1 `−1 Our method consists of two major steps (Fig. 1): (i) measuring indicate the groups at higher levels in the hierarchy within the \proximity" in the hierarchy between all pairs of nodes, which the group is nested. For instance, group g is a group 111 which we call node affinity; and (ii) uncovering the overall hi- defined at ` = 3 that is nested inside group g defined at 11 erarchical organization of node affinities, or, in other words, ` = 2, which in turn is a subgroup of group g defined at 1 detecting the underlying organization of group memberships. ` = 1. Node affinity| A standard approach for quantifying the affin- Let Gi ⊂ G be the set of groups in which node ni holds ity between a pair of nodes in a network is to measure their membership. Here, we consider that node ni holds membership in only one group per level, and that membership to \topological overlap" [12, 21, 22], which is defined as the ratio between the number of common neighbors of the two nodes groups follows a nested hierarchy. Therefore, for node ni to and the minimum degree of the two nodes. This measure hold membership in group g11, node ni must also hold mem- identifies affinity between nodes with a dense pattern of local bership in group g1. connections. Because topological overlap is a local measure, it We assume that the probability pij of the edge (ni; nj ) being present in a network is a function solely of the set of will fail to detect any structure when a network is not locally dense (Fig. 2). co-memberships Mij = Gi \Gj of the two nodes. Note that We propose a new affinity measure based on surveying of our assumptions imply that: (i) Mij obeys transitivity, so the modularity landscape [23], a collective property of the net- that if Mij = Mik, then Mij = Mjk; and (ii) node member- work. Our definition of affinity between nodes draws upon the ships in groups f gk1k2 g at the second level are uniquely and completely defined by the sub-network of connections of all idea that modules correspond to sets of nodes which are more strongly interconnected than one would expect from chance nodes holding membership in group gk1 , that is, information at deeper levels in the hierarchy is totally decoupled from the alone [23, 24]. We show below that our affinity measure de- information at higher levels in the hierarchy. tects the modular structure even in the absence of a dense pattern of local connections. In the simplest scenario, pij is a non-decreasing function of Consider the ensemble P of all partitions of a network into the cardinality x of Mij , which implies that groups of nodes holding membership in the same groups will be more densely modules [23, 25], and assign to each partition P the modular- connected than a randomly selected group of nodes.

Extracting the Hierarchical Organization of Complex Systems Marta Sales-Pardo ∗, Roger Guimer`A ∗ , Andr´Ea

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support