Multiscale mixing patterns in networks

Leto Peel^a,b,1, Jean-Charles Delvenne^a,c,1, and Renaud Lambiotte^d,1

^a Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université Catholique de Louvain, Louvain-la-Neuve B-1348, Belgium; ^b Namur Institute for Complex Systems (naXys), Université de Namur, Namur B-5000, Belgium; ^c Center for Operations Research and Econometrics (CORE), Université Catholique de Louvain, Louvain-la-Neuve B-1348, Belgium; and ^d Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom

Edited by Kenneth W. Wachter, University of California, Berkeley, CA, and approved March 1, 2018 (received for review July 21, 2017)

Assortative mixing in networks is the tendency for nodes with the same attributes, or metadata, to link to each other. It is a property often found in social networks, manifesting as a higher tendency of links occurring between people of the same age, race, or political belief. Quantifying the level of assortativity or disassortativity (the preference of linking to nodes with different attributes) can shed light on the organization of complex networks. It is common practice to measure the level of assortativity according to the assortativity coefficient, or modularity in the case of categorical metadata. This global value is the average level of assortativity across the network and may not be a representative statistic when mixing patterns are heterogeneous. For example, a social network spanning the globe may exhibit local differences in mixing patterns as a consequence of differences in cultural norms. Here, we introduce an approach to localize this global measure so that we can describe the assortativity, across multiple scales, at the node level. Consequently, we are able to capture and qualitatively evaluate the distribution of mixing patterns in the network. We find that, for many real-world networks, the distribution of assortativity is skewed, overdispersed, and multimodal. Our method provides a clearer lens through which we can more closely examine mixing patterns in networks.

networks | assortativity | multiscale | node metadata

Significance

A central theme of network science is the heterogeneity present in real-life systems, for instance through the absence of a characteristic degree for the nodes. Despite their small-worldness, networks may present other types of heterogeneous patterns, with different parts of the network exhibiting different behaviors. Here we focus on assortativity, a network analogue of correlation used to describe how the presence and absence of edges covaries with the properties of nodes. We design a method to characterize the heterogeneity and local variations of assortativity within a network and exhibit, in a variety of empirical data, rich mixing patterns that would be obscured by summarizing assortativity with a single statistic.

Author contributions: L.P., J.-C.D., and R.L. designed research; L.P. analyzed data; and L.P., J.-C.D., and R.L. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
^1 To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1713019115/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1713019115

Networks are used as a common representation for a wide variety of complex systems, spanning social (1–3), biological (4, 5), and technological (6, 7) domains. Nodes are used to represent entities or components of the system, and links between them are used to indicate pairwise interactions. The link formation processes in these systems are still largely unknown, but the broad variety of observed structures suggests that they are diverse. One approach to characterize the network structure is based on the correlation, or assortative mixing, of node attributes (or "metadata") across edges. This analysis allows us to make generalizations about whether we are more likely to observe links between nodes with the same characteristics (assortativity) or between those with different ones (disassortativity). Social networks frequently contain positive correlations of attribute values across connections (8). These correlations occur as a result of the complementary processes of influence (or "contagion") and selection (or "homophily") (9). For example, assortativity has frequently been observed with respect to age, race, and social status (10), as well as behavioral patterns such as smoking and drinking habits (11, 12). Examples of disassortative networks include heterosexual dating networks (gender), ecological food webs (metabolic category), and technological and biological networks (node degree) (13). It is important to note that, just as correlation does not imply causation, observations of assortativity are insufficient to imply a specific generative process for the network.

The standard approach to quantifying the level of assortativity in a network is by calculating the assortativity coefficient (13). Such a summary statistic is useful to capture the average mixing pattern across the whole network. However, such a generalization is only really meaningful if it is representative of the population of nodes in the network, i.e., if the assortativity of most of the individuals is concentrated around the mean. When networks are heterogeneous and contain diverse mixing patterns, a single global measure may not present an accurate description. Furthermore, it does not provide a means for quantifying the diversity or identifying anomalous or outlier patterns of interaction.

Quantifying diversity and measuring how mixing may vary across a network becomes a particularly pertinent issue with modern advances in technology that have enabled us to capture, store, and process massive-scale networks. Previously, social interaction data were collected via time-consuming manual processes of conducting surveys or observations. For practical reasons, these were often limited to a specific organization or group (1, 2, 15, 16). Summarizing the pattern of assortative mixing as a single value may be reasonable for these small-scale networks that tend to focus on a single social dimension (e.g., a specific working environment or common interest). Now, technology such as online social media platforms allows for the automatic collection of increasingly larger amounts of social interaction data. For instance, the largest connected component of the Facebook network was previously reported to account for ∼10% of the global population (17). These vast multidimensional social networks present more opportunities for heterogeneous mixing patterns, which could conceivably arise, for example, due to differences in demographic and cultural backgrounds. Fig. 1 shows, using the methods we will introduce, an example of this variation in mixing on a subset of nodes in the Facebook social network (14). A high variation in mixing patterns indicates that the global assortativity may be a poor representation of the entire population.

APPLIED MATHEMATICS

To address this issue, we develop a node-centric measure of the assortativity within a local neighborhood. Varying the size of the neighborhood allows us to interpolate from the mixing pattern between an individual node and its neighbors to the global assortativity coefficient. In a number of real-world networks, we find that the global assortativity is not representative of the collective patterns of mixing.

Fig. 1. Local assortativity of gender in a sample of Facebook friendships (14). Different regions of the graph exhibit strikingly different patterns, suggesting that a single variable, e.g., global assortativity, would provide a poor description of the system.

1. Mixing in Networks
Currently, the standard approach to measure the propensity of links to occur between similar nodes is to use the assortativity coefficient introduced by Newman (13). Here we will focus on undirected networks and categorical node attributes, but assortativity and the methods we propose naturally extend to directed networks and scalar attributes (Supporting Information).

The global assortativity coefficient r_global for categorical attributes compares the proportion of links connecting nodes with the same attribute value, or type, with the proportion expected if the edges in the network were randomly rewired. The difference between these proportions is commonly known as modularity Q, a measure frequently used in the task of community detection (18). The assortativity coefficient is normalized such that r_global = 1 if all edges only connect nodes of the same type (i.e., maximum modularity Q_max) and r_global = 0 if the number of edges connecting nodes of the same type is equal to the expected number for a randomly rewired network in which the total number of edges incident on each type of node is held constant. The global assortativity r_global is given by (13)

\[ r_{\mathrm{global}} = \frac{Q}{Q_{\max}} = \frac{\sum_g e_{gg} - \sum_g a_g^2}{1 - \sum_g a_g^2}, \qquad [1] \]

in which e_gh is half the proportion of edges in the network that connect nodes with type y_i = g to nodes with type y_j = h (or the proportion of edges if g = h) and a_g = Σ_h e_gh = Σ_{i∈g} k_i/2m is the sum of degrees (k_i) of nodes with type g, normalized by twice the number of edges, m. We calculate e_gh as

\[ e_{gh} = \frac{1}{2m} \sum_{i:\, y_i = g} \; \sum_{j:\, y_j = h} A_{ij}, \qquad [2] \]

where A_ij is an element of the adjacency matrix. The normalization constant Q_max = 1 − Σ_g a_g² ensures that the assortativity coefficient lies in the range −1 ≤ r_global ≤ 1 (see Supporting Information).

Local Patterns of Mixing. The summary statistic r_global describes the average mixing pattern over the whole network. However, as with all summary statistics, there may be cases where it provides a poor representation of the network, e.g., if the network contains localized heterogeneous patterns. Fig. 2 illustrates an analogy to Anscombe's quartet of bivariate datasets with identical correlation coefficients (19).

Fig. 2. Five networks (Top) of n = 40 nodes and m = 160 edges with the same global assortativity r_global = 0, but with different local mixing patterns generated by varying how edges are assigned across subgroups of node types (Middle). The different local mixing patterns can be seen by the distributions of r_multi (Bottom). In A, the mixing pattern is homogeneous across all nodes, and this is reflected in the distribution of r_multi by a unimodal distribution that is peaked around 0. In B–E, there are different heterogeneous mixing patterns, with E being the most extreme case in which half the nodes are highly assortative and half the nodes are highly disassortative. These differences can be observed in the different distributions of r_multi.
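
As a minimal sketch of how the global coefficient of Eqs. 1 and 2 can be evaluated, the following Python function builds the mixing matrix e_gh from a symmetric adjacency matrix and a vector of categorical labels. The function name `global_assortativity` and the dense NumPy representation are assumptions made for illustration; this is not the authors' implementation, and it assumes an undirected network without self-loops.

```python
import numpy as np

def global_assortativity(A, y):
    """Sketch of Eq. 1 for an undirected graph (hypothetical helper).

    A : symmetric adjacency matrix (n x n), no self-loops assumed
    y : length-n sequence of categorical node types
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y)
    types = np.unique(y)
    two_m = A.sum()  # every undirected edge appears twice in A
    # Indicator matrix: Z[i, g] = 1 if node i has type g
    Z = (y[:, None] == types[None, :]).astype(float)
    e = Z.T @ A @ Z / two_m      # mixing matrix e_gh (Eq. 2)
    a = e.sum(axis=1)            # a_g = sum_h e_gh
    q_max = 1.0 - np.sum(a ** 2)
    return (np.trace(e) - np.sum(a ** 2)) / q_max

# Toy example: a cycle of four nodes, 0-1-2-3-0
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
print(global_assortativity(cycle, [0, 0, 1, 1]))  # half of the edges join like types
print(global_assortativity(cycle, [0, 1, 0, 1]))  # every edge joins unlike types
```

On this toy cycle, the first labeling places exactly half of the edges between nodes of the same type, which matches the random expectation and gives r_global = 0, as for the networks in Fig. 2; the second labeling is perfectly disassortative and gives r_global = −1.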

Each of the five networks in the top row have the same number of nodes (n = 40) and edges (m = 160) and have been constructed to have the same r_global = 0 with respect to a binary attribute, indicated by a cross (c) or a diamond (d). All five networks have m_cc + m_dd = 80 edges between nodes of the same type and m_cd = 80 edges between nodes of different types, such that each has r_global = 0. Local patterns of mixing are formed by splitting each of the types further into two equally sized subgroups {c1, c2} and {d1, d2}. The middle row depicts the placement of edges within and between the four subgroups. Distributing edges uniformly between subgroups creates a network with homogeneous mixing (Fig. 2A).

We propose a local measure of assortativity r(ℓ) that captures the mixing pattern within the local neighborhood of a given node of interest ℓ. Trivially, one could calculate the local assortativity by adjusting Eq. 2 to only consider the immediate neighbors of ℓ. However, this approach can encounter problems. For nodes with low degree, we would be calculating assortativity based on only a small sample, providing a potentially poor estimate of the node's mixing preference. Also, when all of ℓ's neighbors are of the same type, then we would assign r(ℓ) = ∞ because 1 − Σ_g a_g² = 0.

We face similar issues in time series analysis when we wish to interpret how a noisy signal varies over time. Direct analysis of the series may be more descriptive of the noise process than of the underlying signal we are interested in. Averaging over the whole series provides an accurate estimate of the mean, but treats all variation as noise and ignores any important trends. A common solution to this problem is to use a local filter such as the exponential weighted moving average, in which values farther in time from the point of interest are weighted less. We adopt a similar strategy in calculating the local assortativity.

To make the connection with time series analysis concrete, we define a random time series where each value is the attribute y_i of the node i visited in a random walk on the graph. A simple random walker at node i jumps to node j by selecting an outgoing edge with equal probability, A_ij/k_i, and, in an undirected network, the stationary probability π_i = k_i/2m of being at node i is proportional to its degree. Then, every edge of the network is traversed in each direction with equal probability π_i A_ij/k_i = 1/2m. In this context, a key observation is that we can equivalently rewrite Eq. 2 as

\[ e_{gh} = \sum_{i:\, y_i = g} \; \sum_{j:\, y_j = h} \pi_i \frac{A_{ij}}{k_i}, \qquad [3] \]

which is the total probability that a simple random walker will jump from a node with type g to one with type h. We can then interpret the global assortativity of the network as the autocorrelation (with time lag of 1) of this random time series (see SI Text, section D for details).

Global assortativity counts all edges in the network equally, just as the stationary random walker visits all edges with equal probability. To create our local measure of assortativity, we instead reweight the edges in the network based on how local they are to the node of interest, ℓ. We do so by replacing the stationary distribution π in Eq. 3 with an alternative distribution over the nodes w(i; ℓ),

\[ e_{gh}(\ell) = \sum_{i:\, y_i = g} \; \sum_{j:\, y_j = h} w(i; \ell) \frac{A_{ij}}{k_i}, \qquad [4] \]

and compare the proportion of links between nodes of the same type in the local neighborhood to the global value. Then we can calculate the local assortativity as the deviation from the global assortativity,

\[ r(\ell) = \frac{1}{Q_{\max}} \sum_g \left( e_{gg}(\ell) - a_g^2 \right). \qquad [5] \]

All that remains is to define a distribution w(i; ℓ). We choose the well-known personalized PageRank vector, the stationary distribution w_α(i; ℓ) of a simple random walk, modified so that, at each time step, we return to the node of interest ℓ with probability (1 − α). In the special case of a network consisting of nodes linked in a line, w_α(i; ℓ) corresponds to an exponential distribution (Fig. 3C) and is analogous to the previously mentioned exponential filter commonly used in time series analysis. The personalized PageRank vector is an intuitive choice, given its role in local community detection (20) and connections to the stochastic block model (21). It is, however, not the only way to define a local neighborhood [e.g., a number of suitable graph kernels may be used (22)].

Fig. 3. Example of the local assortativity measure for categorical attributes. (A) Assortativity is calculated (as in Eq. 1) according to the actual proportion of links in the network connecting nodes of the same type relative to the expected proportion of links between nodes of the same type. (B) The nodes in the network are weighted according to a random walk with restarts for different values of the restart probability 1 − α. The blue bars show the stationary distribution (w_α(i; ℓ)) of the random walk with restarts at ℓ. (C) An example of the local assortativity applied to a simple line network with two types of nodes: yellow or green. Underneath each distribution, the nodes in the line network are colored according to their local assortativity value.

We can now calculate a local assortativity r_α(ℓ) for each node and use α to interpolate from the trivial local neighborhood assortativity (α = 0, the random walker never leaves the initial node) to the global assortativity (α = 1, the random walker never restarts),

\[ r_\alpha(\ell) = \frac{1}{Q_{\max}} \sum_g \left( \sum_{i:\, y_i = g} \; \sum_{j:\, y_j = g} w_\alpha(i; \ell) \frac{A_{ij}}{k_i} - a_g^2 \right). \qquad [6] \]
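
The local measure r_α(ℓ) can be sketched in a few lines of Python under stated assumptions. Here the personalized PageRank vector is obtained by solving the dense linear system (I − αPᵀ)w = (1 − α)e_ℓ, where P is the row-normalized adjacency matrix and e_ℓ is the indicator vector of node ℓ; the helper names `personalized_pagerank` and `local_assortativity` are hypothetical, and the sketch assumes an undirected network with no isolated nodes and α < 1. It illustrates Eqs. 4–6 and is not the authors' code.

```python
import numpy as np

def personalized_pagerank(A, ell, alpha):
    """Stationary distribution of a walk that returns to node ell
    with probability 1 - alpha at each step (requires alpha < 1)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    restart = np.zeros(n)
    restart[ell] = 1.0
    # w = (1 - alpha) * restart + alpha * P^T w
    return np.linalg.solve(np.eye(n) - alpha * P.T, (1.0 - alpha) * restart)

def local_assortativity(A, y, ell, alpha):
    """Sketch of r_alpha(ell) for categorical node types y."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y)
    k = A.sum(axis=1)                      # node degrees
    same_type = (y[:, None] == y[None, :]).astype(float)
    # Fraction of each node's edges that lead to a node of its own type
    same_frac = (A * same_type).sum(axis=1) / k
    # Global a_g: share of edge ends attached to each type
    a = np.array([k[y == g].sum() for g in np.unique(y)]) / k.sum()
    sum_a2 = np.sum(a ** 2)
    w = personalized_pagerank(A, ell, alpha)
    # sum_g e_gg(ell) equals the w-weighted average of same_frac
    return (w @ same_frac - sum_a2) / (1.0 - sum_a2)

# Toy example: a path 0-1-2-3 with types (0, 0, 1, 1)
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
for alpha in (0.0, 0.5, 0.9):
    print(alpha, local_assortativity(path, [0, 0, 1, 1], ell=0, alpha=alpha))
```

The sum over g of e_gg(ℓ) reduces to a weighted average, under w_α, of the fraction of same-type neighbors of each node, which is why a single dot product suffices. A dense solve is chosen only for brevity; for large networks, an iterative or sparse solver would be the more natural choice.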

We can also view this local assortativity as a (normalized) autocovariance of the random time series of node attributes, defined as before but now generated by a stationary random walker with restarts and only when traversing edges of the original network.

Choice of α. We can use α to interpolate from the global measure at α = 1 to the local measure based only on the neighbors of ℓ when α = 0. As previously mentioned, either extreme can be problematic: r_1 is uniform across the network, while r_0 may be based on a small sample (particularly in the case of low-degree nodes) and is therefore subject to overfitting. Moreover, both extremes are blind to the possible existence of coherent regions of assortativity inside the network, as r_1 considers the network as a whole, while r_0 considers the local assortativities of the nodes as independent entities.

To circumvent these issues, we consider calculating the assortativity across multiple scales by calculating a "multiscale" distribution w_multi by integrating over all possible values of α (23),

\[ w_{\mathrm{multi}}(i; \ell) = \int_0^1 w_\alpha(i; \ell) \, d\alpha, \qquad [7] \]

which is effectively the same as treating α as an unknown with a uniform prior distribution (see Supporting Information for details). Using this distribution, we can calculate a multiscale measure r_multi that captures the assortativity of a given node across all scales.

As a simple demonstration, we return to Fig. 2 in which the distribution of r_multi for each synthetic network is shown in the bottom row. We see, under homogeneous mixing (Fig. 2A), a unimodal distribution peaked around 0, confirming that the global measure r_global is representative of the mixing patterns in the network. However, when mixing is heterogeneous (Fig. 2 B–E), we observe multimodal distributions of r_multi that allow us to disambiguate between different local mixing patterns.

2. Real Networks
Next we use r_multi to evaluate the mixing patterns in some real networks: an ecological network and a set of online social networks. In both cases, nodes have multiple attributes assigned to them, providing different dimensions of analysis.

Weddell Sea Food Web. We first examine a network of ecological consumer interactions between species dwelling in the Weddell Sea (4). Fig. 4 shows the distributions (green) of local assortativity for five different categorical node attributes. For comparison, we present a null distribution (black) obtained by randomly rewiring the edges such that the attribute values, degree sequence, and global assortativity are all preserved (see Supporting Information). In each case, we observe skewed and/or multimodal distributions. The empirical distributions appear overdispersed compared with the null distributions.

It may be surprising to see that, for some attributes, the null distribution appears to be multimodal. Closer inspection reveals that the different modes are correlated with the attribute values and that multimodality arises from the unbalanced distribution of nodes and incident edges across different node types. This effect is particularly pronounced for the attribute "Metabolic Category," for which we observe two distinct peaks in the distribution. The larger peak that occurs around r_multi ≈ 0 represents all species that belong to the metabolic category "plant" and accounts for the majority (348/492) of the species in the network, upon which approximately two-thirds of the edges are incident. This bias in the distribution of edges across the different node types means that randomly assigned edges are more likely to connect two nodes of the majority class than any other pair of nodes. In fact, it is impossible to assign edges such that nodes in each Metabolic Category exhibit (approximately) the same assortativity as the global value. Specifically, to achieve r_global = −0.13, it is necessary that more than half of the edges connect species from different metabolic categories. However, this is impossible for the plant category without changing the distribution of edges over categories.

Facebook 100. We next consider a set of online social networks collected from the Facebook platform at a time when it was only open to 100 US universities (3). The process of incrementally providing these universities access to the platform meant that, at this point in time, very few links existed between each of the universities' networks, which provides the opportunity to study each of these social systems in a relatively independent manner. One of the original studies on this dataset examined the assortativity of each demographic attribute in each of the networks (3). This study found some common patterns that occurred in many of the networks, such as a tendency to be assortative by matriculation year and dormitory of residence, with some variation around the magnitude of assortativity for each of the attributes across the different universities.

In this case, it makes sense to analyze the universities separately, since it is reasonable to assume that university membership played an important and restricting role in the organization of the network. However, a modern version of this dataset might contain a higher density of interuniversity links, making it less reasonable to treat them independently; in general, partitioning networks based on attributes without careful consideration can be problematic (24).

Fig. 5 depicts the distributions of r_multi for each of the 100 networks according to dormitory. For many of the networks, we observe a positively skewed distribution. The surrounding subplots show details for four universities with approximately the same global assortativity r_global ≈ 0.13, but with qualitatively different distributions of r_multi. Common across these distributions is that all of the empirical distributions exhibit a positive skew beyond that of the null distribution. Closer inspection reveals that, in all four networks, the nodes associated with a higher local assortativity belong to a community of nodes more loosely connected to the rest of the network. These nodes also correspond to first-year students, which suggests that residence is more relevant to friendship among new students than it is for the rest of the student body. We see this pattern in many of the other schools too: First-year students are more assortative by year (for all schools except one) and by residence (more than 75% of schools); see Supporting Information for details.

Fig. 4. Multiscale assortativity for different attributes in the Weddell Sea Food Web. The observed distribution is depicted by the solid green bars. The black outline shows the null distribution.
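
The multiscale measure behind distributions such as those in Fig. 4 combines the personalized PageRank weights with the integral over α in Eq. 7. The sketch below approximates that integral with a midpoint rule on a grid of α values in (0, 1), which avoids the singular case α = 1. The function name `multiscale_assortativity`, the grid size `n_alpha`, and the dense linear solves are all assumptions chosen for illustration; the cited TotalRank work (23) treats the integral more carefully, and this is not the authors' implementation.

```python
import numpy as np

def multiscale_assortativity(A, y, n_alpha=200):
    """Approximate r_multi for every node of an undirected graph.

    Averages the personalized PageRank weights over a midpoint grid of
    alpha in (0, 1), then applies the same normalization as Eq. 1.
    Assumes no isolated nodes (hypothetical helper).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y)
    n = A.shape[0]
    k = A.sum(axis=1)
    P = A / k[:, None]                               # row-stochastic matrix
    same_type = (y[:, None] == y[None, :]).astype(float)
    same_frac = (A * same_type).sum(axis=1) / k      # same-type edge share per node
    a = np.array([k[y == g].sum() for g in np.unique(y)]) / k.sum()
    sum_a2 = np.sum(a ** 2)

    alphas = (np.arange(n_alpha) + 0.5) / n_alpha    # midpoints of the grid
    W = np.zeros((n, n))                             # W[:, l] ~ w_multi(. ; l)
    for alpha in alphas:
        # Column l of the solution is the vector w_alpha(. ; l)
        W += np.linalg.solve(np.eye(n) - alpha * P.T, (1.0 - alpha) * np.eye(n))
    W /= n_alpha

    return (same_frac @ W - sum_a2) / (1.0 - sum_a2)

# Toy example: a path 0-1-2-3 with types (0, 0, 1, 1)
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
print(multiscale_assortativity(path, [0, 0, 1, 1]))
```

Because r_α(ℓ) is linear in the weights w_α(i; ℓ), averaging the weights first and then normalizing gives the same result as averaging r_α(ℓ) over the grid. On the toy path, the two end nodes, whose only neighbor shares their type, receive a higher value than the two middle nodes.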

Fig. 5. Distributions of the local assortativity by residence (dorm) for each of the schools in the Facebook 100 dataset (3). Dotted black lines indicate the 10th and 90th percentiles, while the solid black lines show the interquartile range. The global assortativity is indicated by the blue square markers. The distributions for four schools (Dartmouth, Wellesley, Haverford, and Wesleyan) are shown in detail (A–D). Each of them has approximately the same global assortativity (r_global ≈ 0.13), but the distributions indicate different levels of heterogeneity in the pattern of mixing by residence. While the distributions are different, there exists a common trend that the first-year students tend to be more loosely connected to the rest of the network and exhibit higher values of assortativity (nodes to the right of the dashed cyan line).

Fig. 6. A scatter plot in which each school is a point, indicating the correlation of local assortativities by dorm and matriculation year (x axis) and proportion of nodes which are more assortative by dorm than by year (y axis). For reference, the blue vertical line indicates zero assortativity (random mixing), and the red horizontal line indicates zero difference in r_global. Joint distributions of the dorm and year assortativities for four of the schools are shown in the surrounding plots (A–D), in which each dot represents a student (yellow = first-year; black = others). Blue lines indicate r_global.

We can also use local assortativity to compare how the mixing of multiple attributes covaries across a network. This may be of interest, as a positive correlation could suggest a relationship between attributes, while a negative correlation indicates that assortativity of one attribute may replace another. Note that differences in the normalization mean that the actual assortativity values between attributes may not be directly comparable, which is why we focus on correlation. Fig. 6 compares r_multi for year of study and place of residence. The central scatter plot shows, for each university, the correlation between local assortativities of the two attributes (x axis) against the difference in the two global assortativities for each network (y axis), which was previously the only way to compare assortativities (3). The four surrounding subplots (A–D) show the joint distribution of dorm and year local assortativity for specific universities. The yellow points indicate students in their first year. In most universities, we observed that first-year students were the most assortative by either year, residence, or both. In both Auburn and Pepperdine, there is a negative correlation between year and dorm assortativity, suggesting that many friendships are associated with either being in the same year or sharing a dorm. For Simmons and Rice, we observe a positive correlation between dorm and year local assortativity.

However, in Simmons, we see that the first-year students form a separate cluster, while, in Rice, they are much more interspersed. This difference may relate to how students are placed in university dorms. At Simmons, all first-year students live on campus (www.simmons.edu/student-life/life-at-simmons/housing/residence-halls) and form the majority of residents in the few dorms they occupy. Rice houses its new intakes according to a different strategy, by spreading them evenly across all of the available dorms. The fact that students are mixed across years and that the vast majority [almost 78% (campushousing.rice.edu/)] of students reside in university accommodation offers a possible explanation for why we observe a smooth variation in values of assortativity without a distinction between new students and the rest of the population.

3. Discussion
Characterizing the level of assortativity plays an important role in understanding the organization of complex systems. However, the global assortativity may not be representative, given the variation present in the network. We have shown that the distribution of mixing in real networks can be skewed, overdispersed, and possibly multimodal. In fact, for certain network configurations, we have seen that a unimodal distribution may not even be possible.

As network data grow bigger, there is a greater possibility for heterogeneous subgroups to coexist within the overall population. The presence of these subpopulations adds further to the ongoing discussions of the interplay between node metadata and network structure (24) and suggests that, while we may observe a relationship between particular node properties and existence of links in part of a network, it does not imply that this relationship exists across the network as a whole. This heterogeneity has implications for how we make generalizations in network data, as what we observe in a subgraph might not necessarily apply to the rest of the network. However, it may also present new opportunities. Recent results show that, with an appropriately constructed learning algorithm, it is still possible to make accurate predictions about node attributes in networks with heterogeneous mixing patterns (25) and, in some cases, even utilize the heterogeneity to further improve performance (26). Quantifying local assortativity offers a new dimension to study this predictive performance. Heterogeneous mixing also offers a potential new perspective for the community detection problem (27), i.e., to identify sets of nodes with similar assortativity, which may be useful in the study of "echo chambers" in social networks (28).

Our approach to quantifying local mixing could easily be applied to any global network measure, such as clustering coefficient or mean degree. It may also be used to capture the local correlation between node attributes and their degree, a relationship that plays a definitive role in network phenomena such as the majority illusion (29) and the generalized friendship paradox (30).

ACKNOWLEDGMENTS. This work was supported by Concerted Research Action (ARC) supported by the Federation Wallonia-Brussels Contract ARC 14/19-060 (to L.P., J.-C.D., and R.L.); Fonds de la Recherche Scientifique-Fonds National de la Recherche Scientifique (L.P.); and Flagship European Research Area Network (FLAG-ERA) Joint Transnational Call "FuturICT 2.0" (J.-C.D. and R.L.).

1. Krackhardt D (1999) The ties that torture: Simmelian tie analysis in organizations. Res Sociol Organ 16:183–210.
2. Lazega E (2001) The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership (Oxford Univ Press, Oxford, UK).
3. Traud AL, Mucha PJ, Porter MA (2012) Social structure of Facebook networks. Phys A 391:4165–4180.
4. Brose U, et al. (2005) Body sizes of consumers and their resources. Ecology 86:2545.
5. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL (2000) The large-scale organization of metabolic networks. Nature 407:651–654.
6. Albert R, Jeong H, Barabási A-L (1999) Internet: Diameter of the worldwide web. Nature 401:130–131.
7. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442.
8. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: Homophily in social networks. Annu Rev Sociol 27:415–444.
9. Aral S, Muchnik L, Sundararajan A (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc Natl Acad Sci USA 106:21544–21549.
10. Moody J (2001) Race, school integration, and friendship segregation in America. Am J Sociol 107:679–716.
11. Cohen JM (1977) Sources of peer group homogeneity. Sociol Educ 50:227–241.
12. Kandel DB (1978) Homophily, selection, and socialization in adolescent friendships. Am J Sociol 84:427–436.
13. Newman MEJ (2003) Mixing patterns in networks. Phys Rev E 67:026126.
14. McAuley JJ, Leskovec J (2012) Learning to discover social circles in ego networks. Adv Neural Inf Process Syst 25:539–547.
15. Sampson SF (1968) A novitiate in a period of change: An experimental and case study of social relationships. PhD thesis (Cornell Univ, Ithaca, NY).
16. Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33:452–473.
17. Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the Facebook social graph. arXiv:1111.4503.
18. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113.
19. Anscombe FJ (1973) Graphs in statistical analysis. Am Stat 27:17–21.
20. Andersen R, Chung F, Lang K (2006) Local graph partitioning using PageRank vectors. Foundations of Computer Science (FOCS’06) (Inst Electr Electron Eng, New York), pp 475–486.
21. Kloumann IM, Ugander J, Kleinberg J (2016) Block models and personalized PageRank. Proc Natl Acad Sci USA 114:33–38.
22. Fouss F, Saerens M, Shimbo M (2016) Algorithms and Models for Network Data and Link Analysis (Cambridge Univ Press, Cambridge, UK).
23. Boldi P (2005) Totalrank: Ranking without damping. Special Interest Tracks and Posters of the 14th International Conference on World Wide Web (WWW) (Assoc Comput Machinery, New York), pp 898–899.
24. Peel L, Larremore DB, Clauset A (2017) The ground truth about metadata and community detection in networks. Sci Adv 3:e1602548.
25. Peel L (2017) Graph-based semi-supervised learning for relational networks. SIAM International Conference on Data Mining (Soc Industrial Appl Math, Philadelphia), pp 435–443.
26. Altenburger KM, Ugander J (2017) Bias and variance in the social structure of gender. arXiv:1705.04774.
27. Schaub MT, Delvenne J-C, Rosvall M, Lambiotte R (2017) The many facets of community detection in complex networks. Appl Netw Sci 2:4.
28. Colleoni E, Rozza A, Arvidsson A (2014) Echo chamber or public sphere? Predicting political orientation and measuring political homophily in Twitter using big data. J Commun 64:317–332.
29. Lerman K, Yan X, Wu X-Z (2016) The “majority illusion” in social networks. PLoS One 11:e0147617.
30. Eom Y-H, Jo H-H (2014) Generalized friendship paradox in complex networks: The case of scientific collaboration. Sci Rep 4:4603.
31. Yule GU (1912) On the methods of measuring association between two attributes. J R Stat Soc 75:579–652.
32. Ferguson GA (1941) The factorial interpretation of test difficulty. Psychometrika 6:323–329.
33. Guilford JP (1950) Fundamental Statistics in Psychology and Education (McGraw-Hill, New York).
34. Davenport EC, Jr, El-Sanhurry NA (1991) Phi/Phimax: Review and synthesis. Educ Psychol Meas 51:821–828.
35. Cureton EE (1959) Note on ϕ/ϕ max. Psychometrika 24:89–91.
36. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46.
37. Boldi P, Santini M, Vigna S (2007) A deeper investigation of PageRank as a function of the damping factor. Web Information Retrieval and Linear Algebra Algorithms (Internationales Begegnungs- und Forschungszentrum für Informatik, Dagstuhl, Germany), 07071.
38. Fosdick BK, Larremore DB, Nishimura J, Ugander J (2016) Configuring random graph models with fixed degree sequences. arXiv:1608.00607.
