PHYSICAL REVIEW X 6, 031005 (2016)
Detectability Thresholds and Optimal Algorithms for Community Structure in Dynamic Networks
Amir Ghasemian,1 Pan Zhang,2 Aaron Clauset,1,3,4 Cristopher Moore,4 and Leto Peel5,6 1Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA 2CAS Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China 3BioFrontiers Institute, University of Colorado, Boulder, Colorado 80305, USA 4Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA 5ICTEAM, Université catholique de Louvain, Avenue George Lemaître 4, B-1348 Louvain-la-Neuve, Belgium 6naXys, University of Namur, Rempart de la Vierge 8, 5000 Namur, Belgium (Received 2 November 2015; revised manuscript received 17 May 2016; published 13 July 2016) The detection of communities within a dynamic network is a common means for obtaining a coarse- grained view of a complex system and for investigating its underlying processes. While a number of methods have been proposed in the machine learning and physics literature, we lack a theoretical analysis of their strengths and weaknesses, or of the ultimate limits on when communities can be detected. Here, we study the fundamental limits of detecting community structure in dynamic networks. Specifically, we analyze the limits of detectability for a dynamic stochastic block model where nodes change their community memberships over time, but where edges are generated independently at each time step. Using the cavity method, we derive a precise detectability threshold as a function of the rate of change and the strength of the communities. Below this sharp threshold, we claim that no efficient algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this threshold. The first uses belief propagation, which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the belief propagation equations. These results extend our understanding of the limits of community detection in an important direction, and introduce new mathematical tools for similar extensions to networks with other types of auxiliary information.
DOI: 10.1103/PhysRevX.6.031005 Subject Areas: Complex Systems, Interdisciplinary Physics, Statistical Physics
I. INTRODUCTION coarse graining of the network, revealing the large-scale structure of the system. In the simple case of static networks, Many complex systems can be represented as networks, we now have rigorous methods for accomplishing this that is, as a set of elements characterized by pairwise task, using Bayesian techniques and probabilistic generative interactions. Examples of networks are plentiful and models [9–15]. Importantly, we also have a precise math- include friendships or communication in a social network, ematical understanding of when they can or cannot succeed regulatory interactions among genes, transportation [14,16]. However, real-world networks are rarely simple, between cities, and hyperlinks between documents or and are often accompanied by auxiliary data, such as weights websites. Furthermore, many, perhaps even most, of these on edges [17] or metadata on nodes [18–22]. Extending the networks are dynamic in nature, and their evolving struc- – rigorous results for static networks to these richer graph ture is often represented as a sequence of graphs [1 8]. structures remains an important direction of study. Here, we A common step in analyzing the structure of such net- focus on the question of dynamic networks, in which each works is the detection of communities, in which we seek to node’s connections may change over time. divide a network into groups of nodes that play similar Community detection in dynamic networks inherits structural roles. A good division should provide a structural many of the challenges of static networks, including learning the number of communities, their sizes and node membership, and the pattern of connections among com- munities, e.g., assortative or disassortative (or, in physical Published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License. Further distri- terms, ferromagnetic or antiferromagnetic). However, it bution of this work must maintain attribution to the author(s) and also poses new challenges, as both the network topology the published article’s title, journal citation, and DOI. and the community memberships may evolve over time.
2160-3308=16=6(3)=031005(9) 031005-1 Published by the American Physical Society GHASEMIAN, ZHANG, CLAUSET, MOORE, and PEEL PHYS. REV. X 6, 031005 (2016)
Community detection in dynamic networks has a long network snapshot aggregates these calls over a month. In history, and a number of techniques have been previously this case, belief propagation (BP) algorithms, like the ones developed. For instance, there are variants of multilayer or we develop here, are often asymptotically optimal, and we temporal modularity maximization [5,23,24], non-negative may use the cavity method to compute the detectability matrix or tensor factorization [3,6,8,25,26], minimum threshold exactly. While our results are not mathematically description length [27,28], and probabilistic models rigorous, we believe that they can be made so using the [4,7,29–33]. References [34,35] provide more comprehen- techniques of Refs. [16,48–50]. sive reviews of this work. Approaches for detecting Finally, we give two principled and efficient algorithms for communities in multiplex networks are also relevant detecting communities in real dynamic networks. The first [36–45], as dynamic networks are a special case of algorithm uses BP to pass messages between neighbors both multiplex networks, in which the layers are organized in within a network at a particular time and between consecutive a linear sequence. However, despite these varied efforts, up networks in order to integrate information over the network’s to now we have lacked a theoretical understanding of the time series in an optimal way. We then linearize BP to obtain optimality of these techniques, when or how they tend to a second spectral algorithm, based on a dynamical version fail, or whether there are fundamental limits to detecting of the nonbacktracking matrix [50,51]. Through numerical community structure in dynamic networks. experiments, we confirm our theoretical calculations by Here, we answer these questions by deriving a precise showing that these algorithms accurately recover the true threshold on the detectability of communities in dynamic community structure in dynamic networks all the way down networks, whose location depends only on the rate of to the generalized detectability threshold. change of the community structure and on its strength. Below this sharp threshold, we claim no efficient algorithm II. DYNAMIC STOCHASTIC BLOCK MODEL can recover the true communities better than chance. Furthermore, we give two algorithms that are optimal in The stochastic block model is a classic model of the sense that they succeed all the way down to this community structure in static networks [46,47]. To obtain threshold. These results generalize the theoretical insights a theoretical understanding of detectability in dynamic of Refs. [14,15] for community detection in static networks networks, we use a variant of the stochastic block model in to the dynamic setting, in which detectability depends on which the community labels of nodes change over time, but where edges are independent conditioned on these labels. both spatial and temporal coupling between nodes. The This particular model is also a special case of several mathematical tools we use to obtain these results are also models previously introduced for community detection in general, and could be used to obtain similar extensions to dynamic networks [4,7,29–31]. A crucial feature of the networks with other types of auxiliary information. variant we study is that it captures the dynamic behavior of Our approach exploits the powerful tools of probabilistic changing community labels but is analytically tractable. generative models and Bayesian inference, which we use to Our model generates a dynamic sequence of graphs study the limits of the community detection problem using G t (V;E t ), with 1 ≤ t ≤ T. There are V n the cavity method of statistical physics. We begin with the ð Þ¼ ð Þ j j¼ nodes divided into k groups. Each graph has its own well-known stochastic block model [46,47], a generative group assignment, represented by an n-dimensional vector model for static networks with community structure. We of labels fg ðtÞ ∈ f1; …;kgji ∈ Vg. To generate this note that there are several dynamic variants of this model i sequence, we start by drawing g ð1Þ from a prior distribu- [29–31] and its mixed-membership version [7], and the i tion, where each node has initial probability q of being in variant that we analyze here is a special case of some of r community 1 ≤ r ≤ k. In successive steps t>1, each node these models. Specifically, we mathematically study a updates its label according to a transition matrix τ, moving model in which nodes change their community member- to group r from group s with probability τ . Finally, the ship over time according to a Markov process, and edges rs edges EðtÞ are generated independently for each t accord- are generated independently at each time step. As a result, ing to a k × k matrix p, connecting each pair of nodes i, j the network of connections between nodes at different at time t with probability p . The likelihood of the times is locally treelike. giðtÞ;gjðtÞ In many real-world systems, edge occurrence can corre- graph sequence is then late across time [1]. In this case, however, our model can τ still be applied if the edges have a time scale that is short PðfGðtÞg; fgðtÞgjp; q; Þ relative to the time windows over which the interactions YT are aggregated, which returns us to a setting in which the ¼ P(gð1Þ) P(gðtÞjgðt − 1Þ) t¼2 dynamic network will be locally treelike. For instance, consider a network of phone calls or emails where the YT Y Y p 1 − p ; autocorrelation time governing conversations (or sequences × giðtÞ;gjðtÞ ð giðtÞ;gjðtÞÞ of successive calls) is on the order of days, but where each t¼1 ði;jÞ∈EðtÞ ði;jÞ∉EðtÞ
031005-2 DETECTABILITY THRESHOLDS AND OPTIMAL … PHYS. REV. X 6, 031005 (2016) and as its root: each node in this tree has “children” consisting Y of its spatial and temporal neighbors. Using the cavity ( 1 ) P gð Þ ¼ qg ð1Þ; method, we can think of inference as a reconstruction i ’ Yi problem on this tree, where each child s label is transmitted, ( − 1 ) τ with some noise, to its parent. As stated above, we P gðtÞjgðt Þ ¼ giðtÞ;giðt−1Þ: i assume that node labels are copied along temporal edges with probability η and replaced with uniformly random In our analysis, we focus on a uniform initial prior labels with probability 1 − η. Similarly, since each edge in 1 qr ¼ =k and the popular special case, where prs ¼ cin=n EðtÞ exists with probability cin=n if the labels are the ≠ ’ if r ¼ s and cout=n if r s for constants cin, cout. same, and with probability cout=n otherwise, Bayes s rule The average degree of each graph is then implies that labels are copied along a spatial edge with − 1 c ¼½cin þðk Þcout =k. For simplicity, we also assume probability the transition matrix τ has a special form, where the node − keeps its label with probability η and chooses a uniformly cin cout λ ¼ random label with probability 1 − η. In that case, kc
J and replaced with a random label with probability 1 − λ. τ ¼ η1 þð1 − ηÞ ; ð1Þ k Thus, we can think of the labels on spatial and temporal edges as following a Markov process with stochastic where 1 is the identity matrix and J is the all-1’s matrix. transition matrices σ and τ, respectively, where
III. GENERALIZED DETECTABILITY np J σ ¼ ¼ λ1 þð1 − λÞ ; ð2Þ THRESHOLD kc k In this context, the community detection task consists of τ η and is given by Eq. (1). recovering the labels fgiðtÞg given the parameters p, q, We now consider the question of whether information and the sequence of graphs fGðtÞg. We now consider under from distant leaves on this tree is transmitted to the root. what conditions we can perform this task better than The tree is generated by a two-type branching process: chance. For static networks, previous work has shown that following a temporal edge leads to a node with one there exists a phase transition below which no algorithm temporal child, while following a spatial edge leads to a can succeed [14,15]; for the case k 2, this is now known ¼ node with two temporal children, and in both cases the rigorously [16]. This threshold occurs at a critical value number of spatial children is Poisson distributed with of c − c , which depends on the average degree, in out pffiffiffi mean c. The transition matrix describing the expected namely, c − c k c. j in outj¼ number of children of each type is then ðc cÞ. On the other In a dynamic network where community memberships 2 1 hand, besides the trivial eigenvalue 1 corresponding to the change slowly, we can learn more about a network and its uniform distribution, the eigenvalues of the transition large-scale structure by integrating its edges over time. matrices σ and τ are λ and η, respectively. The results of Summing GðtÞ to form a single graph yields a denser Ref. [52] then imply that the detectability transition occurs network, in which case we would expect to be able to 2 2 cλ cλ − ≠ 0 when the largest eigenvalue of the matrix ð2η2 η2 Þ crosses detect its community structure whenever cin cout .On the other hand, if node labels at successive steps are unity. This yields uncorrelated, we can do no better than to treat each graph sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi in GðtÞ separately as a static graph. We thus expect the 1 − η2 1 − η2 λ2 − community detection threshold in dynamic networks to c ¼ 1 η2 or jcin coutj¼k c 1 η2; ð3Þ interpolate between its static value at η ¼ 0 and zero þ þ at η ¼ 1. pffiffiffi To facilitate our analysis, we define a spatiotemporal which ranges from the static threshold k c when η ¼ 0 to graph with Tn vertices iðtÞ, one for each node at each zero when η ¼ 1, as expected. time step. In addition to the “spatial” edges (iðtÞ;jðtÞ) ∈ The expression of Eq. (3) holds in the limit T → ∞. EðtÞ for each t, we add “temporal” edges (iðtÞ;iðt 1Þ) We can compute the corresponding finite-time threshold connecting each node with its time-adjacent copies. Since for a fixed T by diagonalizing a ð3T − 2Þ-dimensional the spatial edges EðtÞ are independent and sparse, short matrix, where we have a branching process with states loops in this spatiotemporal graph are rare, implying that it corresponding to moving along spatial, forward-temporal, is locally treelike. or backward-temporal edges at each time step. In particular, Now consider the neighborhood of a particular node iðtÞ. the threshold thenpffiffiffiffiffiffiffiffi ranges from the static value for η ¼ 0 to − η 1 Moving outward in space and time, there is a tree with iðtÞ jcin coutj¼k c=T for ¼ .
031005-3 GHASEMIAN, ZHANG, CLAUSET, MOORE, and PEEL PHYS. REV. X 6, 031005 (2016)
In spin glass theory, this type of threshold is called the Almeida-Thouless line [53]; in probability and information theory, it is known as the Kesten-Stigum bound or the robust reconstruction threshold [52]. In the static case, it has been shown rigorously for k ¼ 2 that below this point community detection is information theoretically impos- sible [16].Fork>4 groups (and k ¼ 4 in the disassortative − 0 or antiferromagnetic case cin cout < ), it was first con- jectured [14,15], and later proved [54,55], that there is an additional region below the threshold where community detection is information theoretically possible, but expo- nentially hard, so that no efficient algorithm can do FIG. 1. A schematic representation of belief propagation better than chance. We make the same claims for the messages [see Eqs. (5) and (6)] being passed along spatial and dynamic case. temporal edges in the spatiotemporal graph.
IV. BAYESIAN INFERENCE AND μi→j ’ BELIEF PROPAGATION messages r ðtÞ: this is i s estimate, sent to j, of the probability that i belongs to group r at time t. Similarly, Given an observed graph sequence GðtÞ, we wish to infer iðtÞ→iðt 1Þ the temporal message μr is i’s estimate of this the posterior distribution of group assignments fgðtÞg.For probability sent to its past and future selves. fixed p, q, and η, Bayes’s rule gives us For general values of the connection probabilities prs and the transition matrix τ, the update equation for the P PðfEðtÞg; fgðtÞgÞ spatial messages is PðfgðtÞgjfEðtÞgÞ ¼ 0 : ð4Þ fg0ðtÞgPðfEðtÞg; fg ðtÞgÞ 1 X X μi→j τ μiðt−1Þ→iðtÞ τ μiðtþ1Þ→iðtÞ We are especially interested in the one-point marginals of r ðtÞ¼ i→j rs s sr s Z ðtÞ s s this distribution, i.e., the probability distribution of giðtÞ for Y X Y X l→i l→i each node i at each step t. We denote this as × crsμs ðtÞ ð1−prsÞμs ðtÞ; l∶ði;lÞ∈EðtÞ s l∶ði;lÞ∉EðtÞ s l≠j l≠j μi ðtÞ¼Pðg ðtÞ¼rjfEðtÞgÞ r Xi ð5Þ δ ¼ PðfgðtÞgjfEðtÞgÞ giðtÞ;r: fgðtÞg and the update equation for the temporal messages is X As always, computing the denominator in Eq. (4) is → 1 1 −1 → μiðtÞ iðtþ Þ ¼ τ μiðt Þ iðtÞ difficult, as it is a sum over kTn terms. In physical terms, r iðtÞ→iðtþ1Þ rs s Z s this quantity is a partition function and thus we do not Y X μl→i expect to be able to compute PðfgðtÞgjfEðtÞg;p;ηÞ × crs s ðtÞ l∶ l ∈ s exactly. However, we can approximate it with variational ði;YÞ EðtÞ X methods. Since the spatiotemporal graph is locally treelike, l→i × ð1 − prsÞμs ðtÞ; ð6Þ we can make a Bethe approximation, corresponding to the l∶ði;lÞ∉EðtÞ s cavity method in physics or belief propagation in machine learning. This allows us to approximate the marginals and where Zi→jðtÞ and ZiðtÞ→iðt 1Þ are normalization factors, the free energy of the model in an efficient and asymp- → −1 and similarly for μiðtÞ iðt Þ with τ transposed. Finally, once totically optimal way. r the messages reach a fixed point, we compute the marginals In belief propagation, vertices send their neighbors at each vertex by taking all its incoming messages into “messages” consisting of estimates of their marginal account: distributions. Each vertex updates its message to each of its neighbors, based on the message it receives from its 1 X X i iðt−1Þ→iðtÞ iðtþ1Þ→iðtÞ other neighbors. It does this using Bayes’s rule, assuming μ ðtÞ¼ τ μs τ μs r ZiðtÞ rs sr that its neighbors are independent of each other (condi- Y s X s tioned on that vertex’s state). We then update the messages l→i × crsμs ðtÞ until they reach a fixed point. l∶ði;lÞ∈EðtÞ s In our dynamic setting, we have two kinds of messages, Y X 1 − μl→i passing along the spatial and temporal edges of the × ð prsÞ s ðtÞ: ð7Þ spatiotemporal graphs (Fig. 1). We denote the spatial l∶ði;lÞ∉EðtÞ s
031005-4 DETECTABILITY THRESHOLDS AND OPTIMAL … PHYS. REV. X 6, 031005 (2016)
Of course, when t ¼ 1 or t ¼ T, we remove the term For k ≤ 3, the detectability transition is second order, corresponding to the temporal edge coming from outside with the optimal accuracy going to zero continuously at the the domain of t. transition. In analogy with the static block model [14,15], In the update equations given here, we have OðTn2Þ we believe that for k>4 (or k ≥ 4 for the disassortative messages, with spatial messages between both neighboring case) the detectability transition becomes first order. Then and non-neighboring pairs of nodes. As in Refs. [13–15],in there is an additional regime where there are at least two the sparse case prs ¼ crs=n, we can approximate the effect competing fixed points, the trivial one and an accurate one, of non-neighboring pairs with an external field, so that both of which are locally stable. However, the basin of we only need to keep track of OðTnÞ messages between attraction of the accurate fixed point is exponentially small, nodes and their spatiotemporal neighbors. This amounts to so that BP with random initial messages will almost always writing converge to the trivial fixed point. In this regime, commu- Y X nity detection is information theoretically possible, but it l→i −hrðtÞ ð1 − prsÞμs ðtÞ¼e ; would require exponential time to search the space of l s possible fixed points. Physically, there is a free-energy barrier between the trivial fixed point, which corresponds to where a paramagnetic phase, and the accurate fixed point, which corresponds to a ferromagnetic one. 1 X X h ðtÞ¼ c μl→iðtÞ: ð8Þ r n rs s l s V. SPECTRAL CLUSTERING
Furthermore, in our special case where τ and σ are given by In dense networks, a common approach to detecting Eqs. (1) and (2), we have communities is spectral clustering, which is accomplished by examining the eigenvectors of either the adjacency or X −1 → −1 → 1 − η Laplacian matrix. In sparse networks, approach fails strictly τ μiðt Þ iðtÞ ¼ ημiðt Þ iðtÞ þ ; rs s r k above the detectability threshold [51,57].However,itis s known that this difficulty can be circumvented, in static X 1 − η iðtþ1Þ→iðtÞ iðtþ1Þ→iðtÞ networks, by using a nonbacktracking matrix or Hashimoto τsrμs ¼ ημr þ ; s k operator [50,51], which prevents localization of the eigen- X vectors to the high-degree vertices. Here, we extend these l→ l→ 1 − λ c μ iðtÞ¼λμ iðtÞþ : techniques to derive a spectral algorithm for sparse rs s r k s dynamic networks. The idea is simply to linearize the BP update equations As in Refs. [14,15], the BP equations have a trivial fixed around the trivial fixed point described above, expanding μi→j point where all the messages are uniform: ðtÞ¼ all the messages to first order around 1=k.Inthestatic → 1 μiðtÞ iðt Þ ¼ 1=k for all i, j, and t. The Kesten-Stigum case, this linearization yields the nonbacktracking matrix. transition computed above is precisely where this fixed Its second through kth eigenvectors are correlated with point becomes unstable. Below the transition, BP con- the true community structure all the way down to the verges to the trivial fixed point, all marginals are uniform, detectability transition [50,51], so that we can label nodes and the algorithm performs no better than chance. using a clustering technique in Rk−1 such as the k-means However, above this transition, the trivial fixed point is algorithm. unstable, and BP converges to a nontrivial fixed point. If the For the dynamic block model, linearizing the BP update network is generated by our model and we know the correct equations around the trivial fixed point gives a 2km × 2km parameters, we expect this nontrivial fixed point to give an matrix, where m is the total number of edges in the asymptotically correct estimate of the marginals Eq. (7) up spatiotemporal graph. Analogous to Ref. [51], this is the to a permutation of the groups. In physical terms, we are on tensor product of a k × k matrix with a 2m × 2m matrix, the Nishimori line, so there is no static replica symmetry which we can simplify further by writing it in terms of the breaking and no spin glass phase [56]. Then, if we assign total incoming and outgoing messages at each vertex. This each node its most likely label at each time, setting gives a 4nT × 4nT matrix: d μi giðtÞ¼argmaxr rðtÞ, this assignment maximizes the 0 1 fraction of correct labels. Thus, our BP algorithm succeeds λAspatial −λ1 λAspatial 0 all the way down to the detectability threshold given by B C B spatial spatial C Eq. (3), and is asymptotically optimal in terms of its B λðD − 1Þ 0 λD 0 C B ¼ B C: ð9Þ B temp temp C accuracy. We expect that this can be made rigorous, @ ηA 0 ηA −η1 A at least in the case k ¼ 2, using the techniques of ηDtemp 0 ηðDtemp − 1Þ 0 Refs. [16,48,49].
031005-5 GHASEMIAN, ZHANG, CLAUSET, MOORE, and PEEL PHYS. REV. X 6, 031005 (2016)
Here, 1 denotes the nT-dimensional identity matrix, Atemp the network at each time is an Erdős-Rényi random graph is the adjacency matrix of temporal edges, Dtemp is the with no community structure. In terms of ϵ, the detect- diagonal matrix of temporal degrees, Aspatial is the adja- ability transition in Eq. (3) occurs at cency matrix of spatial edges, and Dspatial is the diagonal sffiffiffiffiffiffiffiffiffiffiffiffiffi matrix of spatial degrees. That is, 1 − ϵ 1 1 − η2 λ ffiffiffi : ¼ 1 − 1 ϵ ¼ p 1 η2 ð10Þ Atemp δ δ δ þðk Þ c þ 0 ¼ uvð t;t0þ1 þ t;t0−1Þ; ðu;tÞ;ðv;t Þ 2 if 1