arXiv:cond-mat/0205405v1 [cond-mat.dis-nn] 20 May 2002

Assortative mixing in networks

M. E. J. Newman
Department of Physics, University of Michigan, Ann Arbor, MI 48109–1120
and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501

A network is said to show assortative mixing if the nodes in the network that have many connections tend to be connected to other nodes with many connections. We define a measure of assortative mixing for networks and use it to show that social networks are often assortatively mixed, but that technological and biological networks tend to be disassortative. We propose a model of an assortative network, which we study both analytically and numerically. Within the framework of this model we find that assortative networks tend to percolate more easily than their disassortative counterparts and that they are also more robust to vertex removal.

Many systems take the form of networks (sets of vertices joined together by edges), including social networks, computer networks, and biological networks [1, 2, 3]. A variety of models of networks have been proposed and studied in the physics literature, many of which have been successful at reproducing the features of real-world networks [4, 5, 6]. One particularly well-studied model is the cumulative advantage or preferential attachment model [6, 7, 8, 9, 10], in which the probability of a source vertex forming a connection to a target vertex is some (usually increasing) function of the degree of the target vertex. (The degree of a vertex is the number of other vertices to which it is attached.) Preferential attachment processes are widely accepted as the probable explanation for the power-law and other skewed degree distributions seen in many networks [11, 12, 13, 14].

However, there is an important element missing from these models, as well as from other network models: in none of them does the probability of attachment to the target vertex depend also on the degree of the source vertex. In the real world, on the other hand, such dependencies are common. Many networks show "assortative mixing" on their degrees, i.e., a preference for high-degree vertices to attach to other high-degree vertices. Others show disassortative mixing: high-degree vertices attach to low-degree ones. In this paper we first demonstrate the presence of assortative mixing in a variety of networks by direct measurement, and then argue, using exactly solvable models and numerical simulations, that assortative mixing can have a substantial effect on the behavior of networked systems. Models that do not take it into account will necessarily fail to reproduce correctly many of the behaviors of real-world networked systems.

Consider then a network, represented in the simplest case by an undirected graph of $N$ vertices and $M$ edges, with degree distribution $p_k$. That is, $p_k$ is the probability that a randomly chosen vertex on the graph will have degree $k$. Now consider a vertex reached by following a randomly chosen edge on the graph. The degree of this vertex is not distributed according to $p_k$. Instead it is biased in favor of vertices of high degree, since more edges end at a high-degree vertex than at a low-degree one. This means that the degree distribution for the vertex at the end of a randomly chosen edge is proportional to $k p_k$ rather than just $p_k$. In this paper, we will usually be interested not in the total degree of such a vertex, but in its remaining degree, i.e., the number of edges leaving the vertex other than the one we arrived along. This number is one less than the total degree and hence is distributed in proportion to $(k+1)p_{k+1}$. The correctly normalized distribution $q_k$ of the remaining degree is then

$$ q_k = \frac{(k+1)\,p_{k+1}}{\sum_j j\,p_j}. \tag{1} $$

Following Callaway et al. [15], we now define the quantity $e_{jk}$ to be the joint probability distribution of the remaining degrees of the two vertices at either end of a randomly chosen edge [34]. This quantity is symmetric in its indices on an undirected graph and obeys the sum rules

$$ \sum_{jk} e_{jk} = 1, \qquad \sum_j e_{jk} = q_k. \tag{2} $$

In a network with no assortative (or disassortative) mixing, $e_{jk}$ takes the value $q_j q_k$. If there is assortative mixing, $e_{jk}$ will differ from this value, and the amount of assortative mixing can be quantified by the connected degree-degree correlation function $\langle jk\rangle - \langle k\rangle^2 = \sum_{jk} jk\,(e_{jk} - q_j q_k)$, where $\langle\ldots\rangle$ indicates an average over edges [15]. This correlation function is zero for no assortative mixing and positive or negative for assortative or disassortative mixing, respectively. For the purposes of comparing different networks, it is convenient to normalize it by dividing by its maximal value, which it achieves on a perfectly assortative network, i.e., one with $e_{jk} = q_k \delta_{jk}$. This value is equal to the variance $\sigma_q^2 = \sum_k k^2 q_k - \bigl[\sum_k k\,q_k\bigr]^2$ of the distribution $q_k$, and hence the normalized correlation function is

$$ r = \frac{1}{\sigma_q^2} \sum_{jk} jk\,(e_{jk} - q_j q_k), \tag{3} $$

which is simply the Pearson correlation coefficient of the degrees at either end of an edge and lies in the range $-1 \le r \le 1$ [35]. For the practical purpose of evaluating $r$ on an observed network, we can rewrite (3) as

$$ r = \frac{M^{-1}\sum_i j_i k_i - \bigl[M^{-1}\sum_i \tfrac12 (j_i + k_i)\bigr]^2}{M^{-1}\sum_i \tfrac12 (j_i^2 + k_i^2) - \bigl[M^{-1}\sum_i \tfrac12 (j_i + k_i)\bigr]^2}, \tag{4} $$

where $j_i$, $k_i$ are the degrees of the vertices at the ends of the $i$th edge, with $i = 1 \ldots M$ [36].
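
Equations (1), (3) and (4) map directly onto a few lines of code. The Python sketch below is illustrative only: the function names are ours, a degree distribution is assumed to be a plain list indexed by degree, $e_{jk}$ a square nested list indexed by remaining degree, and Eq. (4) is evaluated with total degrees, which note [36] states gives the same $r$ as remaining degrees.

```python
from collections import Counter


def remaining_degree_distribution(p):
    """Eq. (1): q_k = (k+1) p_{k+1} / sum_j j p_j, with p[k] the degree distribution."""
    z = sum(k * pk for k, pk in enumerate(p))  # mean degree
    return [(k + 1) * p[k + 1] / z for k in range(len(p) - 1)]


def r_from_edge_distribution(e):
    """Eq. (3) for a symmetric matrix e[j][k] over remaining-degree pairs."""
    n = len(e)
    q = [sum(e[j][k] for j in range(n)) for k in range(n)]  # sum rule, Eq. (2)
    mean = sum(k * q[k] for k in range(n))
    var = sum(k * k * q[k] for k in range(n)) - mean ** 2  # sigma_q^2
    cov = sum(j * k * (e[j][k] - q[j] * q[k]) for j in range(n) for k in range(n))
    return cov / var


def r_from_edge_list(edges):
    """Eq. (4) for an undirected edge list [(u, v), ...], using total degrees (note [36])."""
    degree = Counter(v for edge in edges for v in edge)
    m = len(edges)
    mean_prod = sum(degree[u] * degree[v] for u, v in edges) / m
    mean_half = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    mean_sq = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m
    return (mean_prod - mean_half ** 2) / (mean_sq - mean_half ** 2)


star = [(0, 1), (0, 2), (0, 3)]  # one hub attached only to leaves
print(r_from_edge_list(star))    # -1.0: perfectly disassortative
```

On a path of four vertices the same function gives $r = -1/2$, since each degree-2 interior vertex is attached to a degree-1 end as well as to the other interior vertex.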

TABLE I: Size $n$ and assortativity coefficient $r$ for a number of different networks: collaboration networks of (a) scientists in physics and biology [16], (b) mathematicians [17], (c) film actors [4], and (d) businesspeople [18]; (e) connections between autonomous systems on the Internet [19]; (f) undirected hyperlinks between Web pages in a single domain [6]; (g) protein-protein interaction network in yeast [20]; (h) undirected (and unweighted) synaptic connections in the neural network of the nematode C. elegans [4]; (i) undirected trophic relations in the food web of Little Rock Lake, Wisconsin [21]. The last three lines give analytic results for model networks in the limit of large network size: (u) the random graph of Erdős and Rényi [22]; (v) the grown graph model of Callaway et al. [15]; (w) the model of Barabási and Albert [6].

| | network | $n$ | $r$ |
|---|---|---:|---:|
| real-world networks | physics coauthorship (a) | 52 909 | 0.363 |
| | biology coauthorship (a) | 1 520 251 | 0.127 |
| | mathematics coauthorship (b) | 253 339 | 0.120 |
| | film actor collaborations (c) | 449 913 | 0.208 |
| | company directors (d) | 7 673 | 0.276 |
| | Internet (e) | 10 697 | −0.189 |
| | World-Wide Web (f) | 269 504 | −0.065 |
| | protein interactions (g) | 2 115 | −0.156 |
| | neural network (h) | 307 | −0.163 |
| | food web (i) | 92 | −0.276 |
| models | random graph (u) | | 0 |
| | Callaway et al. (v) | | $\delta/(1+2\delta)$ |
| | Barabási and Albert (w) | | 0 |

In Table I we show values of $r$ for a variety of real-world networks. As the table shows, all of the social networks studied (the top five entries in the table) have significant assortative mixing, which accords with accepted wisdom within the sociological community. By contrast, the technological and biological networks studied (the middle five entries) all have disassortative mixing: high-degree vertices preferentially connect with low-degree ones and vice versa. Various explanations for this observation suggest themselves. In the case of the Internet, for example, it appears that the high-degree vertices mostly represent connectivity providers (telephone companies and other communications carriers), who typically have a large number of connections to clients who themselves have only a single connection [19]. Thus the high-degree vertices do indeed tend to be connected to the low-degree ones.

We have also calculated $r$ analytically for three models of networks: (1) the random graph of Erdős and Rényi [22], in which edges are placed at random between a fixed set of vertices; (2) the grown graph model of Callaway et al. [15], in which both edges and vertices are added at random at constant but possibly different rates, the ratio of the rates being denoted $\delta$; (3) the grown graph model of Barabási and Albert [6], in which both edges and vertices are added, and one end of each edge is added with linear preferential attachment. For the random graph, since edges are placed at random without regard to vertex degree, it follows trivially that $r = 0$ in the limit of large graph size. The model of Callaway et al., however, although apparently similar in construction, gives a markedly different result. From Eq. (21) of Ref. 15, $e_{jk}$ for this model satisfies the recurrence relation

$$ (1 + 4\delta)\, e_{jk} = 2\delta\,(e_{j-1,k} + e_{j,k-1}) + p_j p_k, \tag{5} $$

and the degree distribution is $p_k = (2\delta)^k/(1+2\delta)^{k+1}$. Substituting into Eq. (3) and making use of Eq. (2), we then find that $r = \delta/(1+2\delta)$. Thus the model shows significant assortative mixing, with a maximum value of $r = \tfrac12$ in the limit of large $\delta$. This agrees with intuition [15]: in the grown graph the older vertices have higher degree and also tend to have higher probability of being connected to one another, simply by virtue of being around for longer. Thus one would expect positive assortative mixing.

The model of Barabási and Albert [6] provides an interesting counter-example to this intuition. Although this is a grown graph model, in which again older vertices have higher degree [23], it shows no assortative mixing at all. Making use of Eq. (42) of Ref. 24, we can show that $e_{jk}$ for the model of Barabási and Albert goes asymptotically as $1/(j^2 k^2) - 6/(j+k)^4$ in the limit of large $j$ and $k$, which implies that $r \to 0$ as $(\log^2 N)/N$ as $N$ becomes large. The model of Barabási and Albert has been used as a model of the structure of the Internet and the World-Wide Web. Since these networks show significant disassortative mixing, however (Table I), it is clear that the model is incomplete. It is an interesting open question what type of network evolution processes could explain the values of $r$ observed in real-world networks.

Turning now to theoretical developments, we propose a simple model of an assortatively mixed network, which is exactly solvable for many of its properties in the limit of large graph size. Consider the ensemble of graphs in which the distribution $e_{jk}$ takes a specified value. This defines a random graph model similar in concept to the random graphs with specified degree sequence [5, 25, 26], except for the added element of assortative mixing.

Consider a typical member of this ensemble in the limit of large graph size, and consider a randomly chosen edge in that graph, one end of which is attached to a vertex of degree $j$. We ask what the probability distribution is of the number of other vertices reachable by following that edge. Let this probability distribution be generated by a generating function $G_j(x)$, which depends in general on the degree $j$ of the starting vertex. By arguments similar to those of Ref. 5, we can show that $G_j(x)$ must satisfy a self-consistency condition of the form

$$ G_j(x) = x\,\frac{\sum_k e_{jk}\,[G_k(x)]^k}{\sum_k e_{jk}}. \tag{6} $$

Meanwhile, the number of vertices reachable from a randomly chosen vertex is generated by

$$ H(x) = x\,p_0 + x \sum_{k=1}^{\infty} p_k\,[G_{k-1}(x)]^k. \tag{7} $$

The average size of the component to which such a vertex belongs is given by the derivative of $H$: $\langle s\rangle = H'(1) = 1 + \sum_k k\,p_k\,G'_{k-1}(1)$. Differentiating Eq. (6), we then get

$$ \langle s\rangle = 1 - z\,\mathbf{q}\cdot\mathbf{A}^{-1}\cdot\mathbf{q}, \tag{8} $$

where $z$ is the mean degree, $\mathbf{q}$ is the vector whose elements are the $q_k$, and $\mathbf{A}$ is the asymmetric matrix with elements $A_{jk} = k\,e_{jk} - q_k\,\delta_{jk}$.

Equation (8) diverges at the point at which the determinant of $\mathbf{A}$ is zero. This point marks the phase transition at which a giant component forms in our graph. By considering the behavior of Eq. (8) close to the transition, where $\langle s\rangle$ must be large and positive in the absence of a giant component, we deduce that a giant component exists in the network when $\det\mathbf{A} > 0$. This is the appropriate generalization, to networks with assortative mixing, of the criterion of Molloy and Reed [26] for the existence of a giant component.

To calculate the size $S$ of the giant component, we define $u_k$ to be the probability that an edge connected to a vertex of remaining degree $k$ leads to another vertex that does not belong to the giant component. Then

$$ S = 1 - p_0 - \sum_{k=1}^{\infty} p_k\,u_{k-1}^k, \qquad u_j = \frac{\sum_k e_{jk}\,u_k^k}{\sum_k e_{jk}}. \tag{9} $$

As with most other random graph models, including the original model of Erdős and Rényi, it is usually not possible to solve for $S$ in closed form, but we can determine it by numerical iteration from a suitable set of starting values for $u_k$.

To test these results and to help form a more complete picture of the properties of assortatively mixed networks, we have also performed computer simulations, generating networks with given values of $e_{jk}$ and measuring their properties directly. Generating such networks is not entirely trivial. One cannot simply draw a set of degree pairs $(j_i, k_i)$ for edges $i$ from the distribution $e_{jk}$, since such a set would almost certainly fail to satisfy the basic topological requirement that the number of edges ending at vertices of degree $k$ must be a multiple of $k$. Instead, therefore, we propose the following Monte Carlo algorithm for generating graphs.

First, we generate a random graph with the desired degree distribution according to the prescription given in Ref. 26. Then we apply a Metropolis dynamics to the graph in which on each step we choose at random two edges, denoted by the vertex pairs $(v_1, w_1)$ and $(v_2, w_2)$ that they connect. We measure the remaining degrees $(j_1, k_1)$ and $(j_2, k_2)$ for these vertex pairs, and then replace the edges with two new ones $(v_1, v_2)$ and $(w_1, w_2)$ with probability $\min\bigl(1, (e_{j_1 j_2} e_{k_1 k_2})/(e_{j_1 k_1} e_{j_2 k_2})\bigr)$. This dynamics conserves the degree sequence, is ergodic on the set of graphs having that degree sequence, and, with the choice of acceptance probability above, satisfies detailed balance for state probabilities $\prod_i e_{j_i k_i}$, and hence has the required edge distribution $e_{jk}$ as its fixed point.

As an example, consider the symmetric binomial form

$$ e_{jk} = \mathcal{N} e^{-(j+k)/\kappa} \left[ \binom{j+k}{j} p^j q^k + \binom{j+k}{k} p^k q^j \right], \tag{10} $$

where $p + q = 1$, $\kappa > 0$, and $\mathcal{N} = \tfrac12 (1 - e^{-1/\kappa})$ is a normalizing constant. (The binomial probabilities $p$ and $q$ should not be confused with the quantities $p_k$ and $q_k$ introduced earlier.) This distribution is chosen for analytic tractability, although its behavior is also quite natural: the distribution of the sum $j + k$ of the degrees at the ends of an edge falls off as a simple exponential, while that sum is distributed between the two ends binomially, the parameter $p$ controlling the assortative mixing. From Eq. (3), the value of $r$ is

$$ r = \frac{8pq - 1}{2e^{1/\kappa} - 1 + 2(p-q)^2}, \tag{11} $$

which can take both positive and negative values, passing through zero when $p = p_0 = \tfrac12 - \tfrac14\sqrt{2} = 0.1464\ldots$

In Fig. 1 we show the size of the giant component for graphs of this type as a function of the degree scale parameter $\kappa$, from both our numerical simulations and the exact solution above. As the figure shows, the two are in good agreement. The three curves in the figure are for $p = 0.05$, where the graph is disassortative; $p = p_0$, where it is neutral (neither assortative nor disassortative); and $p = 0.5$, where it is assortative.

As $\kappa$ becomes large we see the expected phase transition at which a giant component forms. There are two important points to notice about the figure. First, the position of the phase transition moves lower as the graph becomes more assortative. That is, the graph percolates more easily, creating a giant component, if the high-degree vertices preferentially associate with other high-degree ones. Second, notice that, by contrast, the size of the giant component for large $\kappa$ is smaller in the assortatively mixed network.

These findings are intuitively reasonable. If the network mixes assortatively, then the high-degree vertices will tend to stick together in a subnetwork or core group of higher mean degree than the network as a whole. It is reasonable to suppose that percolation would occur earlier within such a subnetwork. Conversely, since percolation will be restricted to this subnetwork, it is not surprising that the giant component has a smaller size in this case than when the network is disassortative. These results could have implications, for example, for the spread of disease on social networks [27], social networks being assortatively mixed in many cases, as Table I shows. The core group of an assortatively mixed network could form a "reservoir" for disease, sustaining an epidemic even in cases in which the network is not sufficiently dense on average for the disease to persist. On the other hand, one would expect the disease to be restricted to a smaller segment of the population in such cases than for diseases spreading on neutral or disassortative networks.

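The Monte Carlo graph-generation procedure described above can be prototyped in a few dozen lines. This Python sketch is our reconstruction, not the author's code: `e_binomial` implements Eq. (10), `r_closed_form` implements Eq. (11), and `r_from_eq10` evaluates Eq. (3) on a truncated copy of Eq. (10) as a check on Eq. (11). `configuration_model` is a simple stub-matching stand-in for the prescription of Ref. 26. We assume an even total degree, pick the orientation of the second edge at random so that both possible pairings can be proposed, and do not forbid self-loops or multi-edges, a detail the text does not address.

```python
import functools
import math
import random
from collections import Counter


def e_binomial(j, k, p, kappa):
    """Eq. (10): symmetric binomial e_jk over remaining degrees j, k (needs 0 < p < 1)."""
    q = 1.0 - p
    norm = 0.5 * (1.0 - math.exp(-1.0 / kappa))
    return (norm * math.exp(-(j + k) / kappa) * math.comb(j + k, j)
            * (p ** j * q ** k + p ** k * q ** j))


def r_closed_form(p, kappa):
    """Eq. (11)."""
    q = 1.0 - p
    return (8 * p * q - 1) / (2 * math.exp(1 / kappa) - 1 + 2 * (p - q) ** 2)


def r_from_eq10(p, kappa, kmax=100):
    """Eq. (3) applied to Eq. (10) truncated at j, k < kmax, as a check on Eq. (11)."""
    e = [[e_binomial(j, k, p, kappa) for k in range(kmax)] for j in range(kmax)]
    q = [sum(row) for row in e]  # marginals, Eq. (2); e is symmetric
    mean = sum(k * qk for k, qk in enumerate(q))
    var = sum(k * k * qk for k, qk in enumerate(q)) - mean ** 2
    cov = sum(j * k * e[j][k] for j in range(kmax) for k in range(kmax)) - mean ** 2
    return cov / var


def configuration_model(degrees, rng):
    """Stand-in for Ref. 26: uniform stub matching, which may yield a multigraph."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs) - 1, 2)]


def metropolis_rewire(edges, e, steps, rng):
    """Swap (v1,w1),(v2,w2) -> (v1,v2),(w1,w2) with prob. min(1, e_j1j2 e_k1k2 / (e_j1k1 e_j2k2))."""
    edges = list(edges)
    degree = Counter(v for edge in edges for v in edge)  # unchanged by every swap

    def remaining(v):
        return degree[v] - 1

    for _ in range(steps):
        a, b = rng.sample(range(len(edges)), 2)
        (v1, w1), (v2, w2) = edges[a], edges[b]
        if rng.random() < 0.5:  # assumption: random orientation of the second edge
            v2, w2 = w2, v2
        j1, k1, j2, k2 = remaining(v1), remaining(w1), remaining(v2), remaining(w2)
        ratio = e(j1, j2) * e(k1, k2) / (e(j1, k1) * e(j2, k2))
        if rng.random() < min(1.0, ratio):
            edges[a], edges[b] = (v1, v2), (w1, w2)
    return edges


# Usage with an arbitrary toy degree sequence (not the one implied by Eq. (10)).
rng = random.Random(42)
degrees = [1 + rng.randrange(6) for _ in range(1000)]
if sum(degrees) % 2:  # stub matching needs an even number of stubs
    degrees[0] += 1
start = configuration_model(degrees, rng)
target = functools.partial(e_binomial, p=0.5, kappa=2.0)
graph = metropolis_rewire(start, target, steps=20000, rng=rng)
print(r_closed_form(0.5, 2.0), r_from_eq10(0.5, 2.0))
```

The toy degree sequence in the usage lines is only a placeholder; to sample graphs whose edge statistics approach Eq. (10) itself, one would presumably draw the initial degrees from the $p_k$ that Eq. (10) implies through Eq. (1).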
[Figure 1: giant-component size $S$ (0 to 1) against the exponential parameter $\kappa$ on a logarithmic axis from 1 to 100, with curves labeled assortative, neutral, and disassortative.]

FIG. 1: Size of the giant component as a fraction of graph size for graphs with the edge distribution given in Eq. (10). The points are simulation results for graphs of $N = 100\,000$ vertices, while the solid lines are the numerical solution of Eq. (9). Each point is an average over ten graphs; the resulting statistical errors are smaller than the symbols. The values of $p$ are 0.5 (circles), $p_0 = 0.146\ldots$ (squares), and 0.05 (triangles).

Assortative mixing also has implications for questions of network resilience, the subject of much discussion in the recent literature [28, 29, 30, 31, 32]. It has been found that the connectivity of many networks (i.e., the existence of paths between pairs of vertices) can be destroyed by the removal of just a few of the highest-degree vertices, a result that may have applications in, for example, vaccination strategies [33]. In assortatively mixed networks, however, we find numerically that removing high-degree vertices is a relatively inefficient strategy for destroying network connectivity, presumably because these vertices tend to be clustered together in the core group, so that removing them is somewhat redundant. In a disassortative network with a similarly sized giant component, attacks on the highest-degree vertices are much more effective, these vertices being broadly distributed over the network and presumably therefore forming links on many paths between other vertices. For networks of the type described by Eq. (10), we find that the number of high-degree vertices that need to be removed to destroy similarly sized giant components is greater by a factor of about five to ten in an assortative network ($p = 0.5$) than in a disassortative one ($p = 0.05$) for the typical parameter values studied here.

These considerations paint rather a grim picture: the networks that we might want to break up, such as the social networks that spread disease, appear to be assortative, and therefore are resilient, at least against simple targeted attacks such as attacks on the highest-degree vertices. And yet at the same time the networks that we would wish to protect, including technological networks such as the Internet, appear to be disassortative, and are hence particularly vulnerable.

To conclude, in this paper we have studied assortative mixing by degree in networks, that is, the tendency for high-degree vertices to associate preferentially with other high-degree vertices. We have defined a scalar measure of assortative mixing and used it to show that many social networks have significant assortative mixing, while technological and biological networks seem to be disassortative. We have also proposed a model of an assortatively mixed network, which we have solved exactly using generating function techniques, and also simulated using a Monte Carlo graph sampling method. Within this model we find that assortative networks percolate more easily and that they are also more robust to removal of their highest-degree vertices, while disassortative networks percolate less easily and are more vulnerable. This suggests that social networks may be robust to intervention and attack while technological networks are not.

The author thanks Duncan Callaway, Michelle Girvan, Cris Moore, and Martina Morris for helpful comments, and Albert-László Barabási, Jerry Davis, Jerry Grossman, Hawoong Jeong, Neo Martinez, and Duncan Watts for providing network data used in the calculations for Table I. This work was funded in part by the National Science Foundation under grant DMS–0109086.

[1] S. H. Strogatz, Exploring complex networks. Nature 410, 268–276 (2001).
[2] R. Albert and A.-L. Barabási, Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002).
[3] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of networks. Advances in Physics 51, 1079 (2002).
[4] D. J. Watts and S. H. Strogatz, Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998).
[5] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118 (2001).
[6] A.-L. Barabási and R. Albert, Emergence of scaling in random networks. Science 286, 509–512 (1999).
[7] H. A. Simon, On a class of skew distribution functions. Biometrika 42, 425–440 (1955).
[8] D. J. de S. Price, A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soc. Inform. Sci. 27, 292–306 (1976).
[9] P. L. Krapivsky, S. Redner, and F. Leyvraz, Connectivity of growing random networks. Phys. Rev. Lett. 85, 4629–4632 (2000).
[10] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Structure of growing networks with preferential linking. Phys. Rev. Lett. 85, 4633–4636 (2000).

[11] D. J. de S. Price, Networks of scientific papers. Science 149, 510–515 (1965).
[12] R. Albert, H. Jeong, and A.-L. Barabási, Diameter of the world-wide web. Nature 401, 130–131 (1999).
[13] M. Faloutsos, P. Faloutsos, and C. Faloutsos, On power-law relationships of the internet topology. Computer Communications Review 29, 251–262 (1999).
[14] L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley, Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97, 11149–11152 (2000).
[15] D. S. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. J. Newman, and S. H. Strogatz, Are randomly grown graphs really random? Phys. Rev. E 64, 041902 (2001).
[16] M. E. J. Newman, The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA 98, 404–409 (2001).
[17] J. W. Grossman and P. D. F. Ion, On a portion of the well-known collaboration graph. Congressus Numerantium 108, 129–131 (1995).
[18] G. F. Davis, M. Yoo, and W. E. Baker, The small world of the corporate elite. Preprint, University of Michigan Business School (2001).
[19] Q. Chen, H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger, The origin of power laws in Internet topologies revisited. In Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies, IEEE Computer Society (2002).
[20] H. Jeong, S. Mason, A.-L. Barabási, and Z. N. Oltvai, Lethality and centrality in protein networks. Nature 411, 41–42 (2001).
[21] N. D. Martinez, Artifacts or attributes? Effects of resolution on the Little Rock Lake food web. Ecological Monographs 61, 367–392 (1991).
[22] B. Bollobás, Random Graphs. Academic Press, New York, 2nd edition (2001).
[23] L. A. Adamic and B. A. Huberman, Power-law distribution of the world wide web. Science 287, 2115a (2000).
[24] P. L. Krapivsky and S. Redner, Organization of growing random networks. Phys. Rev. E 63, 066123 (2001).
[25] E. A. Bender and E. R. Canfield, The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory A 24, 296–307 (1978).
[26] M. Molloy and B. Reed, A critical point for random graphs with a given degree sequence. Random Structures and Algorithms 6, 161–179 (1995).
[27] M. Morris, Telling tails explain the discrepancy in sexual partner reports. Nature 365, 437–440 (1993).
[28] R. Albert, H. Jeong, and A.-L. Barabási, Attack and error tolerance of complex networks. Nature 406, 378–382 (2000).
[29] R. Cohen, K. Erez, D. ben-Avraham, and S. Havlin, Resilience of the Internet to random breakdowns. Phys. Rev. Lett. 85, 4626–4628 (2000).
[30] R. Cohen, K. Erez, D. ben-Avraham, and S. Havlin, Breakdown of the Internet under intentional attack. Phys. Rev. Lett. 86, 3682–3685 (2001).
[31] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Network robustness and fragility: Percolation on random graphs. Phys. Rev. Lett. 85, 5468–5471 (2000).
[32] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han, Attack vulnerability of complex networks. Phys. Rev. E 65, 056109 (2002).
[33] R. Pastor-Satorras and A. Vespignani, Immunization of complex networks. Phys. Rev. E 65, 036104 (2002).
[34] A related quantity has been studied by Krapivsky and Redner [24] in the context of the model of Barabási and Albert [6]. That quantity, however, which is denoted $n_{kl}$, is more complex than the one used here, being asymmetric in its indices, because one index is designated as being the "ancestral" index with respect to the order in which the graph was grown.
[35] The quantity $r$ can easily be generalized to the case of a directed network, where $e_{jk}$ is asymmetric and $r = \sum_{jk} jk\,(e_{jk} - q_j q_k)/(\sigma_{\rm in}\sigma_{\rm out})$, with $\sigma_{\rm in}$ and $\sigma_{\rm out}$ being the standard deviations of the remaining degrees at the in-going and out-going ends of the edge, respectively.
[36] One can use either the total degrees or the remaining degrees to evaluate Eq. (4); the answer is the same either way. Note also that we have written Eq. (4) in a form manifestly symmetric in $j_i$ and $k_i$, so that it doesn't matter which end of an edge is which.