
Entropic effects in a non–equilibrium system: Flocks of birds

Michele Castellana,1 William Bialek,1,2 Andrea Cavagna,2,3 and Irene Giardina2,3

1 Joseph Henry Laboratories of Physics and Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
2 Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Ave., New York, New York 10016, United States
3 Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Rome, Italy and Dipartimento di Fisica, Università Sapienza, Rome, Italy

When birds come together to form a flock, the distribution of their individual velocities narrows around the mean velocity of the flock. We argue that, in a broad class of models for the joint distribution of positions and velocities, this narrowing generates an entropic force that opposes the cohesion of the flock. The strength of this force depends strongly on the nature of the interactions among birds: if birds are coupled to a fixed number of neighbors, the entropic forces are weak, while if they couple to all other birds within a fixed distance, the entropic forces are sufficient to tear a flock apart. Similar entropic forces should occur in other non–equilibrium systems. For the joint distribution of protein structures and amino–acid sequences, these forces favor the occurrence of “highly designable” structures.

I. INTRODUCTION

Entropic forces are a familiar concept in equilibrium statistical mechanics. From the ideal gas to the elasticity of random polymers and the effective forces between molecules in solution, we know that changing the entropy of a system generates a force that is just as “real” as the forces that result from changes in energy. Does this intuition carry over into complex, non–equilibrium systems?

Consider a flock of birds, or a school of fish. As the animals come close to one another, they interact in ways that cause their velocities to align. If we imagine constructing the joint distribution of velocities for all the birds in the flock, alignment means that the entropy of this distribution goes down. Is there a resulting entropic force that pushes the birds apart, allowing the entropy to increase? If this were an equilibrium system, the answer would be yes. But this is not an equilibrium system, by any means. Are there still entropic forces?

In recent years there has been renewed interest in the use of maximum entropy methods to describe the collective behavior of biological networks, with applications spanning scales from the network of amino acids in a family of proteins [1–5], to biochemical and genetic networks [6, 7], networks of neurons [8–16], and flocks of birds [17, 18]. The idea of the maximum entropy method is to construct the least structured model of a system that is consistent with certain measured average properties [19]. If the only quantity that we measure is the energy, then constructing the maximum entropy distribution is exactly the construction of the thermal equilibrium, Boltzmann distribution. But in these complex biological systems, the quantities that we can measure are not the energy, and typically there are many such quantities. The maximum entropy distribution that we construct then is not at all an equilibrium distribution for the system we are studying, although it is mathematically equivalent to the equilibrium statistical mechanics of some other system.

In the maximum entropy framework, it is relatively easy to show that even non–equilibrium systems are subject to entropic forces. In the context of flocking, this really does mean that birds are repelled by the loss of entropy associated with their mutual orientation. But, in detail, we will see that this effect depends dramatically on the nature of the interactions and ordering in the system. If we imagine that the flight direction of each bird maintains a certain level of correlation with the average direction of its $n_c$ nearest neighbors (“topological interactions” [20]), then the entropic forces are weak, and (in a sense that we will make precise) flocks can cohere even without explicit forces holding them together. On the other hand, if flocks are characterized by correlation between a bird and its neighbors within some characteristic distance $r_c$ (“metric interactions”), then the entropic effects are strong enough that almost all reasonable flocks will be broken into multiple disconnected pieces unless there are other explicitly cohesive forces. Although such forces surely exist, observations on real flocks of starlings show that the positional correlations among birds are weak [21], so if there are strong repulsive forces from the entropy of flight directions, these would have to be finely balanced by attractive interactions. Such fine tuning is unnecessary in the case of topological interactions.

Although we can make analytic progress on evaluating the entropic forces that result from directional ordering in the flock, computing the impact of these forces on the distribution of the birds’ positions must be done numerically. This becomes challenging for large flocks, and so we have formulated a simpler problem in which the birds live not in the full three-dimensional space, but on a graph, and we carry out Monte Carlo (MC) simulations in the space of these graphs. We can see that the two problems have similar structure, and in particular the differences between metric and topological interactions arise in both cases; the graph model allows us to follow these differences out to larger systems.

II. ENTROPIC FORCES IN MAXIMUM ENTROPY MODELS

The essential intuition that we use in building maximum entropy models for flocks of birds is that the dominant interactions are local, and hence if we want to characterize the nature of order in the flock we should measure the degree of correlation between the flight velocities $\vec{v}_i$ of birds and their near neighbors [17, 18]. To be concrete, we will consider only normalized velocities, neglecting variations in speed, so each bird $i$ is described by a unit vector $\vec{s}_i \equiv \vec{v}_i/|\vec{v}_i|$ and is located at position $\vec{x}_i$. Around bird $i$ we define a neighborhood $N_i$, and within this neighborhood there are $n_i$ neighbors; details of how this neighborhood is defined are discussed below. To measure the correlation of each bird’s direction with the average of its neighbors, we compute

$$C_{\rm int} = \frac{1}{N} \sum_{i=1}^{N} \left\langle \vec{s}_i \cdot \left( \frac{1}{n_i} \sum_{j \in N_i} \vec{s}_j \right) \right\rangle, \tag{1}$$

where the sum over index $i$ runs over birds with $n_i \neq 0$, and $\langle \cdot \rangle$ denotes the average with respect to the joint distribution of directions $s \equiv \{\vec{s}_i\}$. It will be crucial in what follows that, although $C_{\rm int}$ depends explicitly on flight directions, it depends implicitly on the birds’ positions $x \equiv \{\vec{x}_i\}$. We can make this explicit by defining an adjacency matrix $n_{ij}(x)$ such that $n_{ij} = 1$ if $j \in N_i$, and zero otherwise. Then we have

$$C_{\rm int} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{n_{ij}(x)}{n_i(x)} \left\langle \vec{s}_i \cdot \vec{s}_j \right\rangle, \tag{2}$$

$$n_i(x) = \sum_{j=1}^{N} n_{ij}(x). \tag{3}$$

If the local correlations $C_{\rm int}$ characterize the nature of ordering in the flock, then we should obtain a good approximation to the full, joint distribution of flight directions by building the maximum entropy distribution consistent with the value of $C_{\rm int}$ observed in real flocks. Indeed, this works: the maximum entropy distribution that matches $C_{\rm int}$ provides accurate, parameter–free predictions for the behavior of two– and four–point correlations as a function of distance, out to length scales comparable to the size of the flock itself [17].

There are two different points of view that we can take on the maximum entropy construction. In the first view, the positions of the birds are known, and we are constructing the distribution of flight directions given these positions. This maximum entropy distribution is

$$P(s|x) = \frac{1}{Z(x;J)} \exp\left[ J \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{n_{ij}(x)}{n_i(x)}\, \vec{s}_i \cdot \vec{s}_j \right], \tag{4}$$

where, as usual, the partition function is given by

$$Z(x;J) = \int ds\, \exp\left[ J \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{n_{ij}(x)}{n_i(x)}\, \vec{s}_i \cdot \vec{s}_j \right], \tag{5}$$

and $\int ds \equiv \int d\vec{s}_1 \cdots \int d\vec{s}_N$. The parameter $J$ is determined by the condition that $C_{\rm int}$ computed from this distribution matches what we observe for the real flock, $C_{\rm int}^{\rm obs}$ [17], and this is equivalent to solving the equation

$$\frac{\partial \ln Z(x;J)}{\partial J} = N C_{\rm int}^{\rm obs}. \tag{6}$$

This description of flight directions given positions is useful in part because the neighbor relations among birds in the flock change slowly compared to flight directions [22], and fluctuations in $C_{\rm int}$ from moment to moment in a single flocking event are small.

In the second view, we imagine that we observe a flock for a very long time, long enough for the birds to rearrange substantially, exchanging neighbors. Then when we compute the average involved in defining $C_{\rm int}$, Eq. (2), we are averaging not just over flight directions but also over positions. Now we can ask for the maximum entropy distribution of (jointly) positions and flight directions that is consistent with $C_{\rm int}^{\rm obs}$, and the answer is

$$P(x,s) = \frac{1}{Z_0(J)} \exp\left[ J \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{n_{ij}(x)}{n_i(x)}\, \vec{s}_i \cdot \vec{s}_j \right]. \tag{7}$$

Of course we might know more about the flock than just $C_{\rm int}^{\rm obs}$. For example, we might have some information about the distribution of pairwise distances between birds, in which case the maximum entropy distribution becomes

$$P(x,s) = \frac{1}{Z_1(J)} \exp\left[ -\sum_{i=1}^{N} \sum_{j=1}^{N} V(|\vec{x}_i - \vec{x}_j|) + J \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{n_{ij}(x)}{n_i(x)}\, \vec{s}_i \cdot \vec{s}_j \right], \tag{8}$$

where the effective potential $V(r)$ must be tuned to match the distribution of pairwise distances.
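As an illustration of the local correlation in Eqs. (1)–(3), the following minimal sketch evaluates the quantity inside the average for a single snapshot of flight directions and a given adjacency matrix. The function name `local_correlation` and the toy inputs are assumptions made for this example, not part of the original analysis; in practice the snapshot values would be averaged over many configurations.

```python
def local_correlation(directions, adjacency):
    """Single-snapshot estimate of C_int, following Eqs. (1)-(3).

    directions: list of unit vectors s_i (tuples of floats)
    adjacency:  N x N list of 0/1 entries n_ij (1 if j is a neighbor of i)
    """
    n_birds = len(directions)
    total = 0.0
    for i, s_i in enumerate(directions):
        neighbors = [j for j in range(n_birds) if adjacency[i][j]]
        if not neighbors:
            # Birds with n_i = 0 do not contribute to the sum in Eq. (1).
            continue
        # Average direction of the neighbors of bird i.
        mean_dir = [
            sum(directions[j][a] for j in neighbors) / len(neighbors)
            for a in range(len(s_i))
        ]
        total += sum(u * v for u, v in zip(s_i, mean_dir))
    # The prefactor is 1/N with N the total number of birds.
    return total / n_birds


# Toy usage: three perfectly aligned birds, each seeing the other two.
aligned = [(1.0, 0.0, 0.0)] * 3
all_to_all = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(local_correlation(aligned, all_to_all))
```

For a perfectly aligned flock in which every bird has at least one neighbor, the estimate equals one, which is the ordered limit around which the spin–wave expansion is later carried out.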

Once we have a model for the joint distribution of positions and velocities, we can integrate out the velocities to give the distribution of positions alone. We will refer to this as the “motional distribution”, $P_{\rm mot}(x)$, since if we start in the simplest case of Eq. (7) all the nontrivial structure of this distribution arises from the motion of the birds. We have

$$P_{\rm mot}(x) \equiv \int ds\, P(x,s) = \frac{Z(x;J)}{Z_0(J)}. \tag{9}$$

Thus, allowing ourselves the usual language of statistical mechanics, the free energy $F(x) = -\ln Z(x;J)$ acts as an effective potential for the flock,

$$P_{\rm mot}(x) \propto e^{-F(x)}. \tag{10}$$

Notice that if the flock is perfectly ordered, so that all $\vec{s}_i$ are equal, then the exponent in Eq. (7) is just $JN$, independent of $x$. In fact, real flocks are highly polarized, and we can compute $Z(x;J)$ with an expansion around this perfectly ordered state [17, 18] by means of the spin–wave approximation, a method used in solid-state physics to study perturbations in fully ordered ferromagnetic states [23]. In this approximation, the free energy $F(x)$ is dominated by the entropy of the fluctuations in the flight directions, so that gradients in this free energy constitute an entropic force on the birds’ spatial configuration. If we have other constraints on the distribution as in Eq. (8), then the free energy $F(x)$ of the flight directions just adds another term to the effective potential, as usual.

We conclude this discussion with a cautionary remark about the interpretation of maximum entropy models, and their relation to equilibrium statistical physics. When we look at Eq. (8), it is tempting to note the equivalence with a Boltzmann distribution and interpret the term in the exponential as the Hamiltonian of the system, and we will sometimes lapse into this language ourselves. But this mapping obviously does not mean that the effective Hamiltonian is really the energy of the system, nor does it even mean that the dynamics correspond to Brownian motion in the effective potential. In more biologically motivated models of flocks, one speaks of “social forces” that drive cohesion and orientational ordering [24–28], and one might be tempted to identify these social forces with derivatives of the two terms in the Hamiltonian, but this need not be correct. As is well known, there are infinitely many dynamical processes that can give rise to the same stationary distribution. The maximum entropy method aims to characterize this distribution directly, incorporating only the structure needed to match a small set of experimental observations. The choice of which observations to match is based on intuition, and must be tested by checking that the resulting maximum entropy distributions actually provide an accurate description of the system, as in Ref. [17]. The equivalence to equilibrium statistical–physics models means that we can carry over much of what we know about expectation values, correlation functions and, as we have seen, entropic forces. But we cannot jump from this probabilistic description back to a model of the underlying dynamics.

III. THE MOTIONAL FREE ENERGY

Our task now is to compute the partition function in Eq. (5). It is useful to note that this can be rewritten more symmetrically, as

$$Z(x;J) = \int ds\, \exp\left[ \sum_{i,j=1}^{N} J_{ij}(x)\, \vec{s}_i \cdot \vec{s}_j \right], \tag{11}$$

$$J_{ij}(x) = \frac{J}{2} \left[ \frac{n_{ij}(x)}{n_i(x)} + \frac{n_{ji}(x)}{n_j(x)} \right]. \tag{12}$$

A configuration $x$ defines a graph $G(x)$ with $N$ vertices, where each nonzero element of the adjacency matrix $n_{ij}(x)$ corresponds to an edge between vertices $i$ and $j$ in $G$. We denote by $k$ the number of connected components in $G$, and by $N_l$ the number of birds in the $l$-th connected component, with $l = 1, \ldots, k$. We can relabel the birds so that the matrix $J_{ij}$ consists of $k$ uncoupled blocks. To evaluate $Z(x;J)$, we are going to use the spin–wave approximation, which is valid at large $J$, such that each connected component of the flock is strongly polarized. For each block we define the net polarization

$$\vec{S}_l \equiv \frac{1}{N_l} \sum_{i \in N_l} \vec{s}_i \equiv S_l\, \hat{n}_l, \tag{13}$$

where $S_l$, $\hat{n}_l$ are the norm and the direction of $\vec{S}_l$, respectively, and the sum for $i \in N_l$ runs over all birds in the $l$-th connected component. We now decompose the velocity $\vec{s}_i$, with $i \in N_l$, into components parallel and perpendicular to $\vec{S}_l$

$$\vec{s}_i = s_i^L\, \hat{n}_l + \vec{\pi}_i. \tag{14}$$

Substituting into Eq. (11), we obtain

$$Z(x;J) = \int \prod_{l=1}^{k} \left\{ d\vec{S}_l\, d^{N_l} s^L\, d^{N_l}\vec{\pi} \left[ \prod_{i \in N_l} \frac{\delta\!\left( s_i^L - \sqrt{1 - |\vec{\pi}_i|^2} \right)}{2\sqrt{1 - |\vec{\pi}_i|^2}} \right] \exp\left[ \sum_{i,j \in N_l} J_{ij}(x) \left( s_i^L s_j^L + \vec{\pi}_i \cdot \vec{\pi}_j \right) \right] \delta\!\left( \vec{S}_l - \frac{1}{N_l} \sum_{i \in N_l} \vec{s}_i \right) \right\}, \tag{15}$$

where in Eq. (15) the first Dirac delta results from integration over the unit sphere, we used the fact that $J_{ij}$ is a block matrix, we inserted a factor of unity

$$\int d\vec{S}_l\, \delta\!\left( \vec{S}_l - \frac{1}{N_l} \sum_{i \in N_l} \vec{s}_i \right) = 1,$$

and we rewrote the dot product $\vec{s}_i \cdot \vec{s}_j$ in terms of the $s^L$, $\vec{\pi}$ coordinates by using Eq. (14).

Guided by the experimental observation that birds in a block fly in directions mostly parallel to $\vec{S}_l$ [17], we assume that the perpendicular velocity components $\vec{\pi}_i$ are small, and we will thus expand the right–hand side of Eq. (15) in powers of $\vec{\pi}_i$ with the spin–wave approximation. Specifically, we manipulate Eq. (15) as follows: we neglect $|\vec{\pi}_i|^2$ in the square root in the denominator, we integrate with respect to $\{s_i^L\}$, and we expand the term in parentheses in the exponent to leading order as

$$s_i^L s_j^L = \sqrt{1 - |\vec{\pi}_i|^2}\, \sqrt{1 - |\vec{\pi}_j|^2} \approx 1 - \frac{|\vec{\pi}_i|^2}{2} - \frac{|\vec{\pi}_j|^2}{2}.$$

In addition, we rewrite the Dirac delta function in the last line of Eq. (15) in terms of the $s^L$, $\vec{\pi}$ coordinates as

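$$\delta\!\left( \vec{S}_l - \frac{1}{N_l} \sum_{i \in N_l} \vec{s}_i \right) = \delta\!\left( S_l - \frac{1}{N_l} \sum_{i \in N_l} s_i^L \right) \delta\!\left( \frac{1}{N_l} \sum_{i \in N_l} \vec{\pi}_i \right), \tag{16}$$

and we perform the integration with respect to $\vec{S}_l$ in spherical coordinates. The result is

$$Z(x;J) = \prod_{l=1}^{k} 4\pi \int d^{N_l}\vec{\pi}\, \exp\left[ \sum_{i,j \in N_l} J_{ij}(x) \left( 1 - \frac{|\vec{\pi}_i|^2}{2} - \frac{|\vec{\pi}_j|^2}{2} + \vec{\pi}_i \cdot \vec{\pi}_j \right) \right] \delta\!\left( \frac{1}{N_l} \sum_{i \in N_l} \vec{\pi}_i \right). \tag{17}$$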

Note that the factors of $4\pi$ in Eq. (17) arise from the angular integration over all possible directions of the mean velocity $\vec{S}_l$ of the $l$-th connected component: these are explicitly entropic terms. In what follows, we will drop the first term in the exponent in Eq. (17), because it gives rise to a multiplicative factor $e^{NJ}$ which is independent of the positional configuration $x$, as noted above. Then, since $\vec{\pi}_i$ is a two-dimensional vector, the integral with respect to $\vec{\pi}_i$ can be rewritten as a product of two identical integrals

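$$Z(x;J) = \prod_{l=1}^{k} 4\pi \left\{ \int d^{N_l}\pi\, \exp\left[ -\sum_{i,j \in N_l} \pi_i\, \Lambda_{ij}(x)\, \pi_j \right] \delta\!\left( \frac{1}{N_l} \sum_{i \in N_l} \pi_i \right) \right\}^2, \tag{18}$$

where $\pi_i$ denotes one component of $\vec{\pi}_i$, and we introduced the Laplacian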

$$\Lambda_{ij}(x) \equiv \delta_{ij} \sum_{l=1}^{N} J_{li}(x) - J_{ij}(x); \tag{19}$$

since $J_{ij}$ has a block structure, so does $\Lambda_{ij}$. We denote the eigenvalues and eigenvectors of the $l$-th block of the Laplacian by $\{\lambda_p^l\}$ and $\{\vec{v}_p^{\,l}\}$, respectively, so that

$$\Lambda_{ij} = \sum_{p=1}^{N_l} v_{p,i}^l\, \lambda_p^l\, v_{p,j}^l, \qquad i, j \in N_l. \tag{20}$$

Summing both sides of Eq. (19) with respect to $j \in N_l$, we find that the vector $\vec{v}_1^{\,l} = (1, \cdots, 1)/\sqrt{N_l}$ is an eigenvector with eigenvalue $\lambda_1^l = 0$ [29]. Setting $u_{l,p} \equiv \sum_{i=1}^{N_l} v_{p,i}^l\, \pi_i$, we finally obtain

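$$Z(x;J) = \prod_{l=1}^{k} 4\pi \left\{ \int d^{N_l}u\, \exp\left[ -\sum_{p=2}^{N_l} \lambda_p^l(x)\, u_{l,p}^2 \right] \delta\!\left( \frac{u_{l,1}}{\sqrt{N_l}} \right) \right\}^2 = \prod_{l=1}^{k} 4\pi N_l \prod_{p=2}^{N_l} \frac{\pi}{\lambda_p^l(x)}. \tag{21}$$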
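The chain from the adjacency matrix to the spin–wave partition function can be checked numerically on the four-bird topological example of Fig. 2(a). The sketch below builds $J_{ij}$ from Eq. (12) and the Laplacian from Eq. (19), counts zero modes, and evaluates $\ln Z$ from the final expression in Eq. (21), which omits the configuration-independent factor $e^{NJ}$. It assumes NumPy is available; the function names, the tolerance `tol` used to identify zero modes, and the depth-first search for component sizes are choices made for this illustration rather than the authors' implementation.

```python
import numpy as np


def interaction_matrix(n, J=1.0):
    """Symmetrized couplings J_ij of Eq. (12) from a 0/1 adjacency matrix."""
    n = np.asarray(n, dtype=float)
    deg = n.sum(axis=1)[:, None]  # n_i of Eq. (3), as a column
    # n_ij / n_i, with rows of isolated birds (n_i = 0) left at zero.
    w = np.divide(n, deg, out=np.zeros_like(n), where=deg > 0)
    return 0.5 * J * (w + w.T)


def laplacian(Jmat):
    """Laplacian of Eq. (19)."""
    return np.diag(Jmat.sum(axis=0)) - Jmat


def component_sizes(Jmat):
    """Sizes N_l of the connected components, by depth-first search."""
    n_birds = len(Jmat)
    seen = [False] * n_birds
    sizes = []
    for start in range(n_birds):
        if seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:
            i = stack.pop()
            size += 1
            for j in np.nonzero(Jmat[i])[0]:
                if not seen[j]:
                    seen[j] = True
                    stack.append(j)
        sizes.append(size)
    return sizes


def log_partition(n, J=1.0, tol=1e-9):
    """Return (ln Z from Eq. (21), number of zero modes, eigenvalues)."""
    Jmat = interaction_matrix(n, J)
    lam = np.linalg.eigvalsh(laplacian(Jmat))  # ascending order
    nonzero = lam[lam > tol]
    log_z = (
        sum(np.log(4.0 * np.pi * s) for s in component_sizes(Jmat))
        + len(nonzero) * np.log(np.pi)
        - np.log(nonzero).sum()
    )
    return log_z, int((lam <= tol).sum()), lam


# Adjacency matrix of Fig. 2(a): N = 4, n_c = 2, J = 1.
n_topological = [[0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
log_z, n_zero_modes, eigenvalues = log_partition(n_topological)
print(n_zero_modes, eigenvalues, log_z)
```

The number of zero modes equals the number of connected components, so the same routine doubles as a connectivity test, and the nonzero eigenvalues for this adjacency matrix agree with those quoted in Fig. 2(a).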

In what follows, we will consider the impact of the motional free energy alone, uncompensated by any other knowledge of the distribution of the birds’ positions. That is, we will follow Eq. (9) and write

$$P_{\rm mot}(x) \propto Z(x;J) = \prod_{l=1}^{k} 4\pi N_l \prod_{p=2}^{N_l} \frac{\pi}{\lambda_p^l(x)}. \tag{22}$$

Equation (22) tells us that if the flock with $N$ birds consists of one connected cluster, then there are $N - 1$ powers of $J/\pi$ in the denominator of $P_{\rm mot}$, since all $\lambda_p^l \propto J$, and an overall factor of $4\pi N$. If the flock is cut in two halves, then we lose one power of $J/\pi$, and the factor $4\pi N \to (4\pi N/2)^2$; the net result is to multiply $P_{\rm mot}$ by $NJ$. This factor is essentially the increase in entropy associated with the creation of a new zero mode in the joint distribution; evidently for large $N$ and large $J$, it favors breaking the flock in half. There is some subtlety, however, since when we break the flock in half we also shift the spectrum of all the modes in each half, and it is not clear how this balances against the zero mode. We will see that the answer depends on the nature of the interactions between birds.

IV. METRIC VS TOPOLOGICAL INTERACTIONS

In what follows we will consider a flock of $N$ birds in a volume $V$, hence at mean density $\rho = N/V$. We can imagine the interaction between birds having two very different forms [20]. In the first model, birds interact with other birds within a characteristic distance $r_c$. In the second model, birds interact with their $n_c$ nearest neighbors, independent of distance. The first model is referred to as a “metric” interaction, while the second is summarized as a “topological” interaction. We can compare the two models by equating the mean number of interacting neighbors in the metric case with the fixed number of neighbors in the topological case, $n_c = 4\pi\rho\, r_c^3/3$. In both models this is the only relevant dimensionless parameter at large $N$ and $V$.

For a topological network, the adjacency matrix is

$$n_{ij}(x) \equiv \begin{cases} 1 & \text{if } j \neq i \text{ is amongst the first } n_c \text{ nearest neighbors of } i, \\ 0 & \text{otherwise.} \end{cases} \tag{23}$$

Here we have $n_i = n_c$ for all $i$, and Eq. (12) takes the simple form

$$J_{ij}(x) = \frac{J}{2 n_c} \left[ n_{ij}(x) + n_{ji}(x) \right]. \tag{24}$$

For a metric network, the adjacency matrix is

$$n_{ij}(x) \equiv \begin{cases} 1 & \text{if } j \neq i \text{ and } |\vec{x}_j - \vec{x}_i| \leq r_c, \\ 0 & \text{otherwise,} \end{cases} \tag{25}$$

and the interaction matrix is given by Eq. (12).

In the first models for collective behavior, the metric structure of interactions seemed obvious. Later on, quantitative studies on flocks of starlings [20] led to the idea of topological interactions, and this has been supported by further analyses of these data [17]. The spatial arrangements of starling flocks are relatively uniform, however, so that the evidence for topological vs. metric interactions is based on the comparison of different flocking events, in which birds assemble at significantly different densities; across many such events, the data are described very accurately by a fixed number of neighbors $n_c$, rather than by a fixed interaction range $r_c$. In what follows, we will allow model flocks to explore a much wider range of configurations driven by the entropic forces, and so we expect the distinction between topological and metric interactions to become clearer.

V. MONTE CARLO SAMPLING OF THE MOTIONAL DISTRIBUTION

We are interested in exploring the entropic forces generated by the alignment of the birds, as described by Eq. (22). Since we can write the distribution of positions analytically, it is natural to generate samples from this distribution with MC simulations. We will do this directly for both metric and topological interactions in Section V A. Although there are clear results, we find that it is difficult to push these simulations to very large $N$, essentially because calculating the eigenvalues of the Laplacian requires $O(N^3)$ operations. To make progress, in Section V B we introduce a simpler version of the problem, in which we sample interaction graphs rather than the underlying positions. While not quite the same problem, we will see consistent results that extend to much larger $N$. The essential point to emerge from both analyses is that topological interactions lead to flocks that stay connected for realistic values of the number of neighbors, while in metric networks the repulsive entropic force is strong enough to rip the flock apart into multiple components for all sensible values of the metric–interaction range.

A. Monte Carlo on positions

Equation (22) gives us a model for the distribution of birds’ positions in a flock with only two parameters: the strength of the interactions $J$ and their range $n_c$. Before we study the effects of these interactions on the spatial configurations of the flock, we should start by asking what happens if the $N$ birds are simply in random positions, drawn uniformly throughout a box of volume $V$. Even in this simple case there is a question about whether the resulting network of interactions (metric or topological) supports a single, connected cluster of birds, thus allowing for the possibility of coherent flocking behavior.
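The two neighbor rules of Eqs. (23) and (25) can be made concrete with a short sketch. The function names below are illustrative assumptions, as is the helper `matched_range`, which inverts the relation $n_c = 4\pi\rho\, r_c^3/3$ to pick a metric range comparable to a given number of topological neighbors. Ties in distance are broken by bird index, a detail that Eq. (23) does not specify.

```python
import math


def topological_adjacency(positions, n_c):
    """n_ij of Eq. (23): 1 if j is among the n_c nearest neighbors of i."""
    n_birds = len(positions)
    n = [[0] * n_birds for _ in range(n_birds)]
    for i in range(n_birds):
        others = sorted(
            (j for j in range(n_birds) if j != i),
            key=lambda j: math.dist(positions[i], positions[j]),
        )
        for j in others[:n_c]:
            n[i][j] = 1  # directed edge: j need not list i in return
    return n


def metric_adjacency(positions, r_c):
    """n_ij of Eq. (25): 1 if j != i lies within distance r_c of i."""
    n_birds = len(positions)
    return [
        [
            1 if j != i and math.dist(positions[i], positions[j]) <= r_c else 0
            for j in range(n_birds)
        ]
        for i in range(n_birds)
    ]


def matched_range(n_c, density):
    """Metric range r_c such that n_c = 4 * pi * density * r_c**3 / 3."""
    return (3.0 * n_c / (4.0 * math.pi * density)) ** (1.0 / 3.0)


# Toy usage: four birds on a line with growing gaps.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(topological_adjacency(positions, n_c=1))
print(metric_adjacency(positions, r_c=2.0))
```

In this toy configuration the topological matrix is not symmetric (bird 2 selects bird 1, but bird 1 selects bird 0), while the metric matrix is symmetric and leaves the most distant bird with no neighbors at all. This is the structural difference between the directed graph of the topological case and the undirected graph of the metric case.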

[Figure 1: two panels showing the connection probability $P_c$ (vertical axis, from 0 to 1). (a) Topological case, as a function of $n_c$, with curves for $P_c^{\rm ran}$ and $P_c^{\rm mot}$ at $N = 16, 32, 64$. (b) Metric case, as a function of $\frac{4\pi}{3} r_c^3 \rho$, with curves for $P_c^{\rm ran}$ and $P_c^{\rm mot}$ at $N = 16, 32$.]

FIG. 1: Probability of finding a connected flock. Connection probabilities $P_c^{\rm mot}$, $P_c^{\rm ran}$ from the motional potential and from random positions, respectively, as functions of the number of neighbors $n_c$, both in the topological and metric case. In the metric case, the average number of neighbors $n_c$ is obtained from the interaction range $r_c$ from the relation $n_c = \frac{4\pi}{3} r_c^3 \rho$, where $\rho$ is the average density of the flock. Error bars have been estimated with the bootstrap method [30].
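The caption states only that error bars come from the bootstrap method [30]. As a rough sketch of how such an error bar might be obtained for a connection probability, one can resample the list of connected/disconnected outcomes with replacement and take the spread of the resampled means. The function name, the number of resamples, and the use of the standard deviation as the error bar are all assumptions of this illustration, not details given in the text.

```python
import random
import statistics


def bootstrap_error(outcomes, n_resamples=2000, seed=0):
    """Bootstrap standard error of the mean of 0/1 outcomes.

    outcomes: list with 1 for each sampled configuration that is connected
              and 0 otherwise, so that mean(outcomes) estimates P_c.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    resampled_means = []
    for _ in range(n_resamples):
        # Draw n outcomes with replacement and record their mean.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(sum(resample) / n)
    return statistics.pstdev(resampled_means)


# Toy usage: 100 configurations, half of them connected.
outcomes = [1] * 50 + [0] * 50
print(sum(outcomes) / len(outcomes), bootstrap_error(outcomes))
```

For independent samples this should approach the binomial estimate $\sqrt{P_c(1-P_c)/n}$; the resampling approach becomes more useful when the estimator is less simple than a plain mean.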

Concretely, for any spatial configuration we can construct the adjacency matrix $n_{ij}(x)$, see Eq. (23) or (25), the resulting interaction matrix $J_{ij}(x)$ in Eq. (12), and finally the Laplacian $\Lambda_{ij}(x)$ in Eq. (19). Then we count the zero modes of the Laplacian: if there is just one, corresponding to freedom in the overall flight direction, then the flock is connected; if there is more than one, then the flock has broken into disconnected pieces.

In Fig. 1 we show results on the probability $P_c$ of finding a single connected cluster for random configurations, as a function of $n_c$ for metric and topological interactions. We see that, for topological interactions, connectedness is guaranteed by very modest values of $n_c$. Perhaps surprisingly, this is not the case for metric interactions. When interactions are limited to a fixed distance, even random fluctuations are enough to prevent the formation of a single connected cluster, unless the range of interactions […]

[…] entropic forces are ripping the flock apart even when the range of metric interactions is so large that $n_c > N$.

B. Monte Carlo on graphs

The adjacency matrix $n_{ij}(x)$ defines a graph $G(x)$, as shown for a (very) small flock in Fig. 2. In the topological case, any positional configuration $x$ corresponds to a graph $G(x)$ with a fixed number $n_c$ of outgoing edges per vertex and thus a fixed total number of edges. Conversely, in the metric case the number of edges connected to a vertex may vary depending on the positional configuration $x$, but the total number of edges $M$ is almost fixed, with

$$M = \sum_{i<j} n_{ij} = \frac{N}{2} \left( \frac{1}{N} \sum_{i=1}^{N} n_i \right) \approx \frac{N}{2}\, n_c. \tag{26}$$
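The statement that the total number of edges in a metric graph stays close to $N n_c/2$ can be checked by direct counting for uniformly random positions. The sketch below does this at unit density; the function name, the parameter values, and in particular the periodic (minimum-image) boundary conditions are assumptions made here to avoid edge effects in a small box, and the text does not say how boundaries were treated.

```python
import math
import random


def count_metric_edges(positions, r_c, box):
    """Number of undirected edges M: pairs i < j closer than r_c.

    Distances use the minimum-image convention in a periodic cubic box
    of side `box` (an assumption of this sketch).
    """

    def periodic_distance(a, b):
        d2 = 0.0
        for u, v in zip(a, b):
            d = abs(u - v)
            d = min(d, box - d)
            d2 += d * d
        return math.sqrt(d2)

    n_birds = len(positions)
    return sum(
        1
        for i in range(n_birds)
        for j in range(i + 1, n_birds)
        if periodic_distance(positions[i], positions[j]) <= r_c
    )


rng = random.Random(1)
N, n_c = 200, 8
box = N ** (1.0 / 3.0)  # cubic box with density rho = N / V = 1
r_c = (3.0 * n_c / (4.0 * math.pi)) ** (1.0 / 3.0)  # from n_c = 4*pi*rho*r_c^3/3
positions = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(N)]
M = count_metric_edges(positions, r_c, box)
print(M, N * n_c / 2)
```

The count fluctuates from one random configuration to the next, but its relative deviation from $N n_c/2$ is expected to shrink as $N$ grows, which is what makes a fixed-$M$ ensemble of graphs a reasonable stand-in for the metric case.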

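[Figure 2, panel (a): topological case with $N = 4$ birds, $k = 1$ connected component.]

$$n_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}, \qquad \Lambda_{ij} = \begin{pmatrix} \frac{1}{2} & -\frac{1}{4} & -\frac{1}{4} & 0 \\ -\frac{1}{4} & \frac{5}{4} & -\frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{4} & -\frac{1}{2} & \frac{5}{4} & -\frac{1}{2} \\ 0 & -\frac{1}{2} & -\frac{1}{2} & 1 \end{pmatrix}, \qquad \{\lambda_p\} = \left\{ 0,\ \frac{9 - \sqrt{17}}{8},\ \frac{9 + \sqrt{17}}{8},\ \frac{7}{4} \right\}.$$

[Figure 2, panel (b): metric case with $N = 4$ birds and interaction range $r_c$, $k = 2$ connected components.]

$$n_{ij} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}, \qquad \Lambda_{ij} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & -\frac{1}{2} & 1 & -\frac{1}{2} \\ 0 & -\frac{1}{2} & -\frac{1}{2} & 1 \end{pmatrix}, \qquad \{\lambda_p^1\} = \{0\}, \quad \{\lambda_p^2\} = \left\{ 0,\ \frac{3}{2},\ \frac{3}{2} \right\}.$$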

FIG. 2: Graphs corresponding to a two-dimensional flock with $N = 4$ birds and $J = 1$. The number of connected components, the adjacency and Laplacian matrices, and the Laplacian eigenvalues are also shown. (a) Topological case: Here $n_c = 2$, and the graph $G$ is a directed graph with one connected component. The Laplacian has only one null eigenvalue, corresponding to one connected component in the graph [29]. (b) Metric case: $r_c$ is the radius of the dashed circle centered around bird 4, and the graph $G$ is an undirected graph with two connected components. The first connected component is given by bird 1 (in red), and the second connected component is given by birds 2, 3, 4 (in green). The adjacency and Laplacian matrices are block matrices composed of two blocks corresponding to the first and second connected component depicted in red and green, respectively. The Laplacian has two null eigenvalues, corresponding to two connected components in $G$.

[…] of these moves on the eigenvalues of the Laplacian can be computed in $O(N^2)$ operations by using the LDL matrix factorization method [32, 33] (see Appendix A). As a result, we will be able to study values of $N$ comparable to those of natural flocks, i.e. $N \sim 1000$.

As before, we focus on the probability that the flock is in a single connected cluster, $P_c$; results are shown in Fig. 3. In the topological case, this probability is one for all values of $n_c$, both in the case where graphs are chosen at random ($P_c^{\rm ran}$) and when the graphs are chosen from the motional distribution ($P_c^{\rm mot}$). In contrast, for the metric case the connection probability for random graphs $P_c^{\rm ran}$ is close to one only for $n_c \gtrsim 10$, while even larger values of $n_c \gtrsim 15$ are needed for the connection probability from the motional potential to be close to one.

We note that our simplified MC method includes graphs which do not have a three-dimensional layout: while any configuration $x$ can be mapped onto a graph $G$ (Fig. 2), not all graphs $G$ correspond to a configuration $x$. If we break the graph into disconnected pieces, it is always easier to find a mapping into a configuration $x$, and so we expect that sampling graphs will overestimate the probability that flocks are connected. This is confirmed by a detailed comparison between Figs. 1 and 3. Despite this quantitative difference, the MC on graphs confirms the qualitative scenario from the MC on positions: the metric potential has a strong repulsive effect which rips the flock apart into multiple components.

Figure 1 gives a hint that, with metric interactions, larger flocks have a lower probability of being connected, and this trend continues to larger $N$ for the simulation on graphs in Fig. 3. Put another way, larger flocks require a larger range of interaction $n_c$ in order to stay connected. Marking the crossover $n_c^*$ between connected and disconnected regimes by $P_c^{\rm mot}(n_c^*, N) = 1/2$, we see in Fig. 4 that $n_c^*$ increases with $N$ along an approximately logarithmic trajectory, such that a factor of two increase in $N$ requires an extra contact to maintain coherence.

VI. DISCUSSION

The past decade has seen considerable interest in the use of maximum entropy models to describe biological

networks, from single protein molecules up to groups of organisms. The maximum entropy method has deep connections to equilibrium statistical mechanics: these connections are a source of intuition, but also create opportunities for confusion. The derivation of entropic forces thus requires some care.

[Figure 3: metric case, connection probability $P_c$ (vertical axis, from 0 to 1) as a function of $n_c$ (from 4 to 16), with curves for $P_c^{\rm ran}$ and $P_c^{\rm mot}$ at $N = 128, 256, 512, 1024$.]

FIG. 3: Probability of finding a connected flock with the simplified Monte Carlo method on graphs: connection probabilities $P_c^{\rm mot}$, $P_c^{\rm ran}$ from the motional potential and from random graphs, respectively, as functions of the number of neighbors $n_c$ in the metric case. In the topological case, $P_c^{\rm mot}$ is equal to one for all $n_c > 1$, and it is not shown here.

The maximum entropy distribution that is consistent with pairwise correlations among the variables in a network (e.g. the normalized velocities of the birds) has the form of a Boltzmann distribution in which the “energy” is built out of pairwise interactions among these variables. This is not, of course, the actual energy, and there is no reason to think that the interactions out of which the energy is built correspond to microscopic interactions. If, as in the case of flocks, we can build a successful maximum entropy model by matching only local correlations [17, 18, 34], it is plausible that the underlying interactions are local, but the precise statement is that the full correlation structure in the flock as a whole is the minimal consequence of the local correlations.

In the case of proteins, maximum entropy methods have been used to describe the ensemble of amino–acid sequences that are consistent with being a member of a particular protein family, and hence having a particular three–dimensional structure [1–5]. This is the inverse of the usual protein folding problem, where we are given the sequence and asked to predict the structure. In the (forward) folding problem, it is widely believed that interactions are local and that the structures we see often really are at thermal equilibrium. Approximate solutions of the inverse problem show that the effective interactions in the maximum entropy model extend over a much shorter range than the correlations in amino–acid substitutions, to the point that one can identify physical contacts [3] and hence go quite some way toward structure prediction from the sequence ensemble alone [4, 5].

[Figure 4: crossover values $n_c^{*\,{\rm ran}}$ and $n_c^{*\,{\rm mot}}$ (vertical axis, from 4 to 12) as functions of $N$ (128 to 1024), with fits $a \log N + b$ for each.]

FIG. 4: Crossover values $n_c^*$ of the number of neighbors in the metric case, given by $P_c(n_c^*, N) = 1/2$, for random graphs and for graphs drawn from the motional potential, as functions of the flock size $N$. The fitting functions $n_c^* = a \log N + b$ are shown to guide the eye.

If we could write a successful maximum entropy model for the joint distribution of amino–acid sequences and the associated protein structures, then with the sequence held fixed it should reduce to a Boltzmann distribution over structures, with local interactions. But if we sum over sequences to obtain the distribution of structures, then, by the arguments in Section II, there will be an effective entropic potential that favors structures which can be stabilized by many different amino–acid sequences. These are precisely the “highly designable structures” identified long ago by Li and colleagues [35, 36].

In a flock of birds, there is no part of the problem that is in thermal equilibrium, but nonetheless we can write a maximum entropy approximation to the joint distribution of positions and velocities for all the birds in the flock, as in Eqs. (7) or (8). Once we integrate out the velocities, the resulting motional distribution of positions has a term in the exponential that is exactly the logarithm of the partition function for the velocities at fixed positions, that is, the free energy. For flocks that are strongly polarized, as in real flocks, this free energy is dominated by the entropy of the birds’ velocities, and in this sense the flock is subject to entropic forces. We expect these entropic forces to be repulsive, since disconnected groups of birds have more freedom to reorient their flight directions, and this intuition is borne out by detailed simulations. The surprise is that the strength of this repulsion depends dramatically on the form of the correlations that we constrain.

If we imagine that the essential correlations are between a bird and its $n_c$ nearest neighbors (topological interactions), then the entropic forces are quite weak, and leave flocks fully connected with high probability at reasonable values of $n_c$. In contrast, if the essential correlations are between a bird and all the other birds within fixed distance $r_c$ (metric interactions), then the entropic forces are so large that flocks are almost always disconnected.

Given the complicated form of the motional distribution, Monte Carlo (MC) simulations are computationally demanding, and they are thus limited to small flocks. To address this problem, we explored a slightly different formulation in which we sample graphs of bird–bird interaction rather than the positions of the birds themselves; this allows for using an efficient MC update algorithm based on LDL matrix factorization [32], with which we could analyze realistically large flocks. Our main result is in line with the one obtained by sampling the positions: for a topological network, the configurations generated by the motional distribution are connected for all values of the number of nearest neighbors. On the other hand, for a metric network the positional configurations are disconnected with high probability unless the metric interaction range is increased to unrealistically large values.

In real flocks, the absence of large local density fluctuations means that the distinction between topological and metric interactions must be based on comparisons across different flocking events [17]. Our analysis of entropic forces provides a different path to comparing these two models. While the strongly repulsive entropic effects of metric interactions could be compensated by explicit cohesive forces, the fact that positional correlations in flocks are weak [21] means that these strong opposing forces would have to be carefully balanced. Such fine tuning is not needed if the essential correlations are topological rather than metric.

[…] and US–AFOSR Grant No. FA95501010250 (through the University of Maryland).

Appendix A: Monte Carlo update with LDL factorization

Here we discuss the MC simulations with the probability $p_{\rm mot}(G)$ from Eq. (27). In Appendix A 1 we show that $p_{\rm mot}(G)$ can be related to the LDL factorization of the Laplacian matrix (19), and in Appendix A 2 we show that a MC step can be performed efficiently by using a known update algorithm for LDL factorizations. For the sake of simplicity, we consider the metric case, and we assume that $G$ is connected. The results below can then be easily extended to the topological case and to graphs with multiple connected components.

1. Relation between the motional probability and the Laplacian LDL factorization

Since $G$ has only one connected component, there is a single zero eigenvalue of $\Lambda$ [29], which we denote by $\lambda_1$, and Eq. (27) shows that the probability $p_{\rm mot}(G)$ is determined by the product of the nonzero eigenvalues of the Laplacian (19). Here, we will show that this product is related to the LDL factorization of $\Lambda$ [33], which reads

$$A \equiv P \Lambda P^T = L D L^T, \tag{A1}$$

where in Eq. (A1) $P$ is a permutation matrix, $L$ is a lower-triangular matrix with unit diagonal elements, $D$ is a diagonal matrix whose diagonal elements will be denoted by $\{d_i\}$, and the matrix $A$ has the same eigenvalues as $\Lambda$. Since $\Lambda$ has one zero eigenvalue, there is a single zero diagonal entry in $D$, which we denote by $d_1$. We will now establish the connection between the spectrum of $\Lambda$ and its LDL factorization by proving the following identity

$$\prod_{i=2}^{N} \lambda_i = N \prod_{i=2}^{N} d_i. \tag{A2}$$

To prove Eq. (A2), let us consider the characteristic polynomial of $A$: by using Sylvester’s theorem, we have

$$f(\lambda) \equiv \det(A - \lambda I) = \det(D L^T L - \lambda I). \tag{A3}$$

The characteristic polynomial (A3) can be rewritten as

N N Acknowledgments f(λ) = aN − λaN−1 + ··· + (−1) λ , (A4)

T where ai is the sum of all diagonal minors of DL L MC is grateful to TA Davis and WW Hager for ad- containing i rows and i columns [37]. Since d1 = 0, the vice and support on sparse LDL factorizations. Work first row in DLT L is zero, thus the only nonzero diagonal at Princeton and CUNY was supported in part by NSF minor with one row and one column is the one obtained Grants PHY–1305525 and CCF–0939370, by the Human by deleting the first row and the first column from DLT L. Frontiers Science Program, and by the Simons and Swartz It follows that Foundations. Work in Rome was supported in part by IIT grant Seed Artswarm, ERC–StG Grant No. 257126, aN−1 = det(B), (A5) 10 where B is a (N − 1) × (N − 1) matrix with entries Bij = where in the first line of Eq. (A11) we used Eq. (A10), PN T T −1 Dil(L L)lj, i, j = 2, ··· ,N. Since D is diagonal, and [(L L) ]1 1 denotes the entry in the first row and l=1 T −1 B is given by the product of the matices obtained by first column of (L L) . By using Cramer’s rule, the last removing the first row and the first column in D and LT L line in Eq. (A11) can be rewritten as respectively: hence, from Eq. (A5) we obtain det(C ) [(LT L)−1] = 1 1 = det(C ), (A12) N ! 1 1 T 1 1 Y det(L L) aN−1 = di det(C1 1), (A6) T 2 i=2 where we use the identity det(L L) = [det(L)] = 1. Equations (A11) and (A12) imply that det(C1 1) = N. T where C1 1 denotes the matrix obtained from L L by Substituting into Eq. (A8), we obtain Eq. (A2). deleting the first row and the first column. Since the eigenvalues of A are 0 = λ1 < λ2 ≤ · · · ≤ λN , the eigenvalues of A − λI are −λ, λ2 − λ, ··· , λN − λ. Thus, the characteristic polynomial (A3) reads

f(λ) = (−λ)(λ2 − λ) ··· (λN − λ) (A7) N Y 2 = −λ λi + O(λ ). 2. Monte Carlo with LDL-factorization update i=2 Comparing the coefficient of λ in the right–hand side of Since we intend to sample the space of graphs with a Eq. (A4) with that in the right–hand side of Eq. (A7) constant total number of edges, a MC move is given by and using Eq. (A6), we obtain one edge insertions and one edge deletion. We use Eqs. (19) and (24) to rewrite the Laplacian of G as N N Y Y " N ! # λi = det(C1 1) di. (A8) J X i=2 i=2 Λij = nil δij − nij . (A13) nc l=1 To derive Eq. (A2), let us compute det(C1 1). Equa- tions (19), (A1) show that the vector u ≡ (1, ··· , 1) We then take two vertices i and j in G that are not is an eigenvector of A with eigenvalue zero. Setting connected, and we insert an edge between them. As a T −1 G0 0 e1 ≡ (1, 0, ··· , 0), we have that (L ) e1 is also an eigen- result, we obtain a new graph with Laplacian Λ , vector of A with eigenvalue zero: J Λ0 = Λ + v · vT , (A14) T −1 T T −1 A(L ) e1 = LDL (L ) e1 (A9) nc LDe = 1 where the vector v is given by vl = δil − δjl. Since Λ is = 0, sparse and Λ0 is related to Λ by a transformation of the form (A14), it can be shown [32] that the LDL factor- T where in the second line of Eq. (A9) we have De1 = 0 ization of the Laplacian Λ0 = L0D0L0 can be computed because d1 = 0. Given that Λ is symmetric, the geometric from the LDL factorization of Λ in a number of steps multiplicity of λ1 is equal to its algebraic multiplicity, the proportional to the number of nonzero entries in L that latter being equal to one. It follows that there is only change upon the update, which is bounded above by one eigenvector of Λ with zero eigenvalue, thus only one 2 0 O(N ). The probability pmot(G ) is then obtained from eigenvector of A with zero eigenvalue: hence, u must be 0 T −1 T −1 D according to Eq. (A2). An edge insertion can be proportional to (L ) e1. 
Also, (L ) is upper triangu- thus performed with not more than O(N 2) operations; lar with unit diagonal entries, thus the first component T −1 by the same argument, an edge deletion—and thus a full of (L ) e1 is equal to one, implying that MC move—can be also performed with this number of − operations. u = (LT ) 1e . (A10) 1 Finally, it is important to point out that replacing the To prove Eq. (A2), we will relate the norm of u to L: denominators ni, nj in Eq. (12) with their average value nc—see SectionVB—is crucial for this efficient update N = uT u (A11) method to work: indeed, without this simplification the − − Laplacian would not have the simple form (A13), and a = eT L 1(LT ) 1e 1 1 Laplacian update upon edge insertion or deletion would T T −1 = e1 (L L) e1 not be of the form (A14), thus preventing us from using T −1 = [(L L) ]1 1, the LDL factorization update algorithm above. 11
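As a quick numerical sanity check of the identity (A2), the following minimal sketch builds the Laplacian of a small connected graph, factorizes it, and compares the product of the nonzero eigenvalues with $N$ times the product of the nonzero pivots. Everything here is an illustrative assumption rather than the authors' code: the helper names `ldl_no_pivot` and `random_connected_laplacian` are hypothetical, the factorization is a plain dense elimination without permutation ($P = I$), and the prefactor $J/n_c$ is set to one. In this elimination order the zero pivot appears in the last position, so the sketch takes the product over all the other pivots.

```python
import numpy as np

def ldl_no_pivot(A):
    """Dense LDL^T without pivoting: A = L @ diag(d) @ L.T, L unit lower triangular."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = A[k, k] - np.sum(L[k, :k] ** 2 * d[:k])
        for i in range(k + 1, n):
            L[i, k] = (A[i, k] - np.sum(L[i, :k] * L[k, :k] * d[:k])) / d[k]
    return L, d

def random_connected_laplacian(n, extra_edges, rng):
    """Unweighted graph Laplacian; a path 0-1-...-(n-1) guarantees connectivity."""
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    for _ in range(extra_edges):
        i, j = rng.choice(n, size=2, replace=False)  # two distinct vertices
        adj[i, j] = adj[j, i] = 1.0
    return np.diag(adj.sum(axis=1)) - adj

rng = np.random.default_rng(0)
N = 12
Lam = random_connected_laplacian(N, extra_edges=15, rng=rng)

L, d = ldl_no_pivot(Lam)
eigs = np.linalg.eigvalsh(Lam)      # ascending order; eigs[0] is the zero mode

lhs = np.prod(eigs[1:])             # product of the nonzero eigenvalues
rhs = N * np.prod(d[:-1])           # N times the product of the nonzero pivots
print(lhs, rhs)
```

For a connected graph the two printed numbers should agree to rounding error. Restoring a common prefactor $J/n_c$ multiplies each nonzero eigenvalue and each nonzero pivot by the same constant, so it does not affect the comparison.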

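The edge-insertion step of Eq. (A14) can be sketched with a dense rank-one update of an existing LDL factorization. This is only an illustration under stated assumptions: it uses a textbook dense rank-one update that costs $O(N^2)$ operations, not the sparse algorithm of Ref. [32]; the function names and the values of `J`, `n_c`, and `N` are hypothetical; and no permutation is applied, so the zero pivot sits in the last position and the loop stops before dividing by it.

```python
import numpy as np

def ldl_no_pivot(A):
    """Dense LDL^T without pivoting: A = L @ diag(d) @ L.T, L unit lower triangular."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = A[k, k] - np.sum(L[k, :k] ** 2 * d[:k])
        for i in range(k + 1, n):
            L[i, k] = (A[i, k] - np.sum(L[i, :k] * L[k, :k] * d[:k])) / d[k]
    return L, d

def ldl_rank_one_update(L, d, alpha, z):
    """Factors of L diag(d) L^T + alpha * z z^T, computed from (L, d) in O(n^2)."""
    L, d = L.copy(), d.copy()
    w = np.asarray(z, dtype=float).copy()
    n = d.size
    a = alpha
    for j in range(n):
        p = w[j]
        d_old = d[j]
        d[j] = d_old + a * p * p
        if j == n - 1:
            break                          # last pivot is the zero mode: nothing left to update
        b = a * p / d[j]
        a = a * d_old / d[j]
        w[j + 1:] -= p * L[j + 1:, j]      # uses the old column j of L
        L[j + 1:, j] += b * w[j + 1:]
    return L, d

def laplacian(adj, J, n_c):
    """Laplacian in the form of Eq. (A13) for a 0/1 adjacency matrix."""
    return (J / n_c) * (np.diag(adj.sum(axis=1)) - adj)

# Hypothetical parameters and a small connected graph (a path plus two chords).
J, n_c, N = 1.0, 4.0, 10
adj = np.zeros((N, N))
for k in range(N - 1):
    adj[k, k + 1] = adj[k + 1, k] = 1.0
adj[0, 5] = adj[5, 0] = 1.0
adj[2, 7] = adj[7, 2] = 1.0

i, j = 3, 8                                # two vertices that are not connected
v = np.zeros(N)
v[i], v[j] = 1.0, -1.0                     # v_l = delta_il - delta_jl

Lam = laplacian(adj, J, n_c)
adj_new = adj.copy()
adj_new[i, j] = adj_new[j, i] = 1.0
Lam_new = laplacian(adj_new, J, n_c)

L, d = ldl_no_pivot(Lam)
L_up, d_up = ldl_rank_one_update(L, d, J / n_c, v)   # update the old factors
L_ref, d_ref = ldl_no_pivot(Lam_new)                 # factorize from scratch
```

Here `L_up, d_up` should match `L_ref, d_ref` to rounding error, and `Lam_new - Lam` should equal `(J / n_c) * np.outer(v, v)`. In the same spirit, an edge deletion would correspond to a negative `alpha`, although a dense downdate of this kind can be numerically more delicate.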
[1] W. Bialek and R. Ranganathan. Rediscovering the power of pairwise interactions. arXiv:0712.4397 [q-bio.QM], 2007.
[2] F. Seno, A. Trovato, J. R. Banavar, and A. Maritan. Maximum entropy approach for deducing amino acid interactions in proteins. Phys. Rev. Lett., 100(7):078102, 2008.
[3] M. Weigt, R. A. White, H. Szurmant, J. A. Hoch, and T. Hwa. Identification of direct residue contacts in protein–protein interaction by message passing. P. Natl. Acad. Sci. USA, 106(1):67–72, 2009.
[4] D. S. Marks, L. J. Colwell, R. Sheridan, T. A. Hopf, A. Pagnani, R. Zecchina, and C. Sander. Protein 3D structure computed from evolutionary sequence variation. PloS one, 6(12):e28766, 2011.
[5] J. I. Sulkowska, F. Morcos, M. Weigt, T. Hwa, and J. N. Onuchic. Genomics–aided structure prediction. P. Natl. Acad. Sci. USA, 109(26):10340–10345, 2012.
[6] T. R. Lezon, J. R. Banavar, M. Cieplak, A. Maritan, and N. V. Fedoroff. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. P. Natl. Acad. Sci. USA, 103(50):19033–19038, 2006.
[7] G. Tkačik. Information Flow in Biological Networks. PhD thesis, Princeton University, 2007.
[8] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007–1012, 2006.
[9] G. Tkačik, E. Schneidman, M. J. Berry II, and W. Bialek. Ising models for networks of real neurons. arXiv:q-bio/0611072 [q-bio.NC], 2006.
[10] J. Shlens, G. D. Field, J. L. Gauthier, M. I. Grivich, D. Petrusca, A. Sher, A. M. Litke, and E. J. Chichilnisky. The structure of multi–neuron firing patterns in primate retina. J. Neurosci., 26(32):8254–8266, 2006.
[11] A. Tang, D. Jackson, J. Hobbs, W. Chen, J. L. Smith, H. Patel, A. Prieto, D. Petrusca, M. I. Grivich, A. Sher, P. Hottowy, W. Dabrowski, A. M. Litke, and J. M. Beggs. A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. J. Neurosci., 28(2):505–518, 2008.
[12] G. Tkačik, E. Schneidman, M. J. Berry II, and W. Bialek. Spin glass models for a network of real neurons. arXiv:0912.5409, 2009.
[13] I. E. Ohiorhenuan, F. Mechler, K. P. Purpura, A. M. Schmid, Q. Hu, and J. D. Victor. Sparse coding and higher–order correlations in fine–scale cortical networks. Nature, 466(7306):617–621, 2010.
[14] E. Ganmor, R. Segev, and E. Schneidman. Sparse low–order interaction network underlies a highly correlated and learnable neural population code. P. Natl. Acad. Sci. USA, 108(23):9679–9684, 2011.
[15] E. Granot-Atedgi, G. Tkačik, R. Segev, and E. Schneidman. Stimulus–dependent maximum entropy models of neural population codes. PLoS Comput. Biol., 9(3):e1002922, 2013.
[16] G. Tkačik, T. Mora, O. Marre, D. Amodei, M. J. Berry II, and W. Bialek. Thermodynamics for a network of neurons: Signatures of criticality. arXiv:1407.5946 [q-bio.NC], 2014.
[17] W. Bialek, A. Cavagna, I. Giardina, T. Mora, E. Silvestri, M. Viale, and A. M. Walczak. Statistical mechanics for natural flocks of birds. P. Natl. Acad. Sci. USA, 109(13):4786–4791, 2012.
[18] W. Bialek, A. Cavagna, I. Giardina, T. Mora, O. Pohl, E. Silvestri, M. Viale, and A. M. Walczak. Social interactions dominate speed control in poising natural flocks near criticality. P. Natl. Acad. Sci. USA, 111(20):7212–7217, 2014.
[19] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106(4):620–630, 1957.
[20] M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, and Z. Zdravkovic. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. P. Natl. Acad. Sci. USA, 105(4):1232–1237, 2008.
[21] A. Cavagna, A. Cimarelli, I. Giardina, A. Orlandi, G. Parisi, A. Procaccini, R. Santagati, and F. Stefanini. New statistical tools for analyzing the structure of animal groups. Math. Biosci., 214(1–2):32–37, 2008.
[22] A. Cavagna, S. M. D. Queiros, I. Giardina, F. Stefanini, and M. Viale. Diffusion of individual birds in starling flocks. Proc. Roy. Soc. B-Biol. Sci., 280(1756):20122484, 2013.
[23] F. J. Dyson. General theory of spin–wave interactions. Phys. Rev., 102(5):1217–1230, 1956.
[24] I. Aoki. A simulation study on the schooling mechanism in fish. Bull. Jpn. Soc. Sci. Fish, 48(8):1081–1088, 1982.
[25] T. Vicsek, A. Czirok, E. Ben-Jacob, I. Cohen, and O. Shochet. Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett., 75(6):1226–1229, 1995.
[26] I. D. Couzin, J. Krause, R. James, G. D. Ruxton, and N. R. Franks. Collective memory and spatial sorting in animal groups. J. Theor. Biol., 218(1):1–11, 2002.
[27] G. Grégoire, H. Chaté, and Y. Tu. Moving and staying together without a leader. Physica D–Nonlinear Phenomena, 181(3–4):157–170, 2003.
[28] H. Hildenbrandt, C. Carere, and C. K. Hemelrijk. Self-organized aerial displays of thousands of starlings: a model. Behav. Ecol., 21(6):1349–1359, 2010.
[29] J. S. Caughman and J. J. P. Veerman. Kernels of directed graph Laplacians. Electron. J. Comb., 13(1):R39, 2006.
[30] M. E. J. Newman and G. T. Barkema. Monte Carlo Methods in Statistical Physics. Clarendon Press, 1999.
[31] In Eq. (4), with topological interactions we have $n_i(x) = n_c$ for all $i$, and hence

$$P(s|x) = \frac{1}{Z(x; J)} \exp\left[ \frac{J}{n_c} \sum_{i=1}^{N} \sum_{j=1}^{N} n_{ij}(x)\, \vec{s}_i \cdot \vec{s}_j \right]. \tag{A15}$$

Comparing with Ref. [17], we see that what was $J$ in the previous work is $J/n_c$ in the current formulation. In the metric case, we compare Eq. (4) with the flock configurations in Ref. [17] with a nearly homogeneous density $\rho$ by setting $n_i(x) \approx n_c = 4\pi\rho\, r_c^3/3$, and we obtain the same relation between the interaction strengths.
[32] T. A. Davis and W. W. Hager. Modifying a sparse Cholesky factorization. SIAM Journal on Matrix Analysis and Applications, 20(3):606–627, 1999.
[33] T. A. Davis. User guide for CHOLMOD: a sparse Cholesky factorization and modification package. http://www.suitesparse.com.
[34] A. Cavagna, L. Del Castello, S. Dey, I. Giardina, S. Melillo, L. Parisi, and M. Viale. Short–range interaction vs. long–range correlation in bird flocks. arXiv:1407.6887 [physics.bio–ph], 2014.
[35] H. Li, R. Helling, C. Tang, and N. S. Wingreen. Emergence of preferred structures in a simple model of protein folding. Science, 273(5275):666–669, 1996.
[36] H. Li, C. Tang, and N. S. Wingreen. Are protein folds atypical? P. Natl. Acad. Sci. USA, 95(9):4987–4991, 1998.
[37] N. Jacobson. Basic Algebra, volume I. Freeman, 1985.