arXiv:1708.01236v2 [cs.SI] 18 Apr 2018
Multiscale mixing patterns in networks

Leto Peel,1,2 Jean-Charles Delvenne,1,3 and Renaud Lambiotte4
1 ICTEAM, Université catholique de Louvain, Louvain-la-Neuve, B-1348, Belgium
2 naXys, Université de Namur, Namur, B-5000, Belgium
3 CORE, Université catholique de Louvain, Louvain-la-Neuve, B-1348, Belgium
4 Mathematical Institute, University of Oxford, Oxford, UK

Assortative mixing in networks is the tendency for nodes with the same attributes, or metadata, to link to each other. It is a property often found in social networks manifesting as a higher tendency of links occurring between people with the same age, race, or political belief. Quantifying the level of assortativity or disassortativity (the preference of linking to nodes with different attributes) can shed light on the organisation of complex networks. It is common practice to measure the level of assortativity according to the assortativity coefficient, or modularity in the case of categorical metadata. This global value is the average level of assortativity across the network and may not be a representative statistic when mixing patterns are heterogeneous. For example, a social network spanning the globe may exhibit local differences in mixing patterns as a consequence of differences in cultural norms. Here, we introduce an approach to localise this global measure so that we can describe the assortativity, across multiple scales, at the node level. Consequently we are able to capture and qualitatively evaluate the distribution of mixing patterns in the network. We find that for many real-world networks the distribution of assortativity is skewed, overdispersed and multimodal. Our method provides a clearer lens through which we can more closely examine mixing patterns in networks.

Networks are used as a common representation for a wide variety of complex systems, spanning social [1–3], biological [4, 5] and technological [6, 7] domains.
Nodes are used to represent entities or components of the system and links between them used to indicate pairwise interactions. The link formation processes in these systems are still largely unknown, but the broad variety of observed structures suggest that they are diverse. One approach to characterise the network structure is based on the correlation, or assortative mixing, of node attributes (or "metadata") across edges. This analysis allows us to make generalisations about whether we are more likely to observe links between nodes with the same characteristics (assortativity) or between those with different ones (disassortativity). Social networks frequently contain positive correlations of attribute values across connections [8]. These correlations occur as a result of the complementary processes of selection (or "homophily") and influence (or "contagion") [9]. For example, assortativity has frequently been observed with respect to age, race and social status [10], as well as behavioural patterns such as smoking and drinking habits [11, 12]. Examples of disassortative networks include heterosexual dating networks (gender), ecological food webs (metabolic category), and technological and biological networks (node degree) [13]. It is important to note that just as correlation does not imply causation, observations of assortativity are insufficient to imply a specific generative process for the network.

FIG. 1. Local assortativity of gender in a sample of Facebook friendships [14]. Different regions of the graph exhibit strikingly different patterns, suggesting that a single variable, e.g. global assortativity, would provide a poor description of the system.

The standard approach to quantifying the level of assortativity in a network is by calculating the assortativity coefficient [13]. Such a summary statistic is useful to capture the average mixing pattern across the whole network. However, such a generalisation is only really meaningful if it is representative of the population of nodes in the network, i.e., if the assortativity of most individuals is concentrated around the mean. But when networks are heterogeneous and contain diverse mixing patterns, a single global measure may not present an accurate description. Furthermore it does not provide a means for quantifying the diversity or identifying anomalous or outlier patterns of interaction.

Quantifying diversity and measuring how mixing may vary across a network becomes a particularly pertinent issue with modern advances in technology that have enabled us to capture, store and process massive-scale networks. Previously social interaction data was collected via time-consuming manual processes of conducting surveys or observations. For practical reasons these were often limited to a specific organisation or group [1, 2, 15, 16]. Summarising the pattern of assortative mixing as a single value may be reasonable for these small-scale networks that tend to focus on a single social dimension (e.g., a specific working environment or common interest). Now, technology such as online social media platforms allow for the automatic collection of increasingly larger amounts of social interaction data. For instance, the largest connected component of the Facebook network was previously reported to account for approximately 10% of the global population [17]. These vast multi-dimensional social networks present more opportunities for heterogeneous mixing patterns, which could conceivably arise, for example, due to differences in demographic and cultural backgrounds. Figure 1 shows, using the methods we will introduce, an example of this variation in mixing on a subset of nodes in the Facebook social network [14]. A high variation in mixing patterns indicates that the global assortativity may be a poor representation of the entire population.

To address this issue, we develop a node-centric measure of the assortativity within a local neighbourhood. Varying the size of the neighbourhood allows us to interpolate from the mixing pattern between an individual node and its neighbours to the global assortativity coefficient. In a number of real-world networks we find that the global assortativity is not representative of the collective patterns of mixing.

I. MIXING IN NETWORKS

Currently the standard approach to measure the propensity of links to occur between similar nodes is to use the assortativity coefficient introduced by Newman [13]. Here we will focus on undirected networks and categorical node attributes, but assortativity and the methods we propose naturally extend to directed networks and scalar attributes (see Appendix A & B).

The global assortativity coefficient $r_{\text{global}}$ for categorical attributes compares the proportion of links connecting nodes with same attribute value, or type, relative to the proportion expected if the edges in the network were randomly rewired. The difference between these proportions is commonly known as modularity $Q$, a measure frequently used in the task of community detection [18]. The assortativity coefficient is normalised such that $r_{\text{global}} = 1$ if all edges only connect nodes of the same type (i.e., maximum modularity $Q_{\max}$) and $r_{\text{global}} = 0$ if the number of edges is equal to the expected number for a randomly rewired network in which the total number of edges incident on each type of node is held constant. The global assortativity $r_{\text{global}}$ is given by [13]

$$r_{\text{global}} = \frac{Q}{Q_{\max}} = \frac{\sum_g e_{gg} - \sum_g a_g^2}{1 - \sum_g a_g^2}, \tag{1}$$

in which $e_{gh}$ is half the proportion of edges in the network that connect nodes with type $y_i = g$ to nodes with type $y_j = h$ (or the proportion of edges if $g = h$) and $a_g = \sum_h e_{gh} = \sum_{i \in g} k_i / 2m$ is the sum of degrees ($k_i$) of nodes with type $g$, normalised by twice the number of edges, $m$. We calculate $e_{gh}$ as:

$$e_{gh} = \frac{1}{2m} \sum_{i : y_i = g} \; \sum_{j : y_j = h} A_{ij}, \tag{2}$$

where $A_{ij}$ is an element of the adjacency matrix. The normalisation constant $Q_{\max} = 1 - \sum_g a_g^2$ ensures that the assortativity coefficient lies in the range $-1 \le r \le 1$ (see Appendix C).

A. Local patterns of mixing

The summary statistic $r_{\text{global}}$ describes the average mixing pattern over the whole network. But as with all summary statistics there may be cases where it provides a poor representation of the network e.g., if the network contains localised heterogeneous patterns. Figure 2 illustrates an analogy to Anscombe's quartet of bivariate datasets with identical correlation coefficients [19].

FIG. 2. Five networks (top) of $n = 40$ nodes and $m = 160$ edges with the same global assortativity $r_{\text{global}} = 0$, but with different local mixing patterns as shown by the distributions of $r_{\text{multi}}$ (bottom).

[...] graph. A simple random walker at node $i$ jumps to node $j$ by selecting an outgoing edge with equal probability, $A_{ij}/k_i$, and, in an undirected network, the stationary probability $\pi_i = k_i / 2m$ of being at node $i$ is proportional to its degree. Then, every edge of the network is traversed in each direction with equal probability $\pi_i A_{ij} / k_i = 1/2m$. In this context, a key observation is that we can equivalently rewrite (2) as,

$$e_{gh} = \sum_{i : y_i = g} \pi_i \sum_{j : y_j = h} \frac{A_{ij}}{k_i}, \tag{3}$$

which is the total probability that a simple random walker will jump from a node with type $g$ to one with type $h$. We can then interpret the global assortativity of the network as the autocorrelation (with time lag of 1) of this random time-series (see Appendix D for details). Global assortativity counts all edges in the network equally just as the stationary random walker visits all
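
Equations (1) to (3) can be checked numerically on a toy graph. The following is a minimal sketch, not the authors' implementation: it assumes a simple undirected graph without self-loops given as an edge list, a dict mapping each node to a categorical type, and at least two types present (otherwise the denominator of Eq. (1) is zero). The function names `mixing_matrix`, `mixing_matrix_walk` and `global_assortativity` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mixing_matrix(edges, types):
    """Eq. (2): e[(g, h)] = (1/2m) * sum of A_ij over i of type g, j of type h."""
    m = len(edges)
    e = defaultdict(float)
    for i, j in edges:
        # An undirected edge contributes A_ij and A_ji, one term per direction.
        e[(types[i], types[j])] += 1.0 / (2 * m)
        e[(types[j], types[i])] += 1.0 / (2 * m)
    return dict(e)


def mixing_matrix_walk(edges, types):
    """Eq. (3): stationary probability pi_i times jump probability A_ij / k_i."""
    m = len(edges)
    neighbours = defaultdict(list)
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    e = defaultdict(float)
    for i, nbrs in neighbours.items():
        k_i = len(nbrs)
        pi_i = k_i / (2 * m)  # stationary probability of being at node i
        for j in nbrs:
            e[(types[i], types[j])] += pi_i * (1.0 / k_i)  # A_ij = 1 here
    return dict(e)


def global_assortativity(e):
    """Eq. (1): (sum_g e_gg - sum_g a_g^2) / (1 - sum_g a_g^2)."""
    labels = {g for g, _ in e}
    a = {g: sum(v for (g2, _), v in e.items() if g2 == g) for g in labels}
    trace = sum(e.get((g, g), 0.0) for g in labels)
    expected = sum(a_g ** 2 for a_g in a.values())
    return (trace - expected) / (1.0 - expected)


# Illustrative data (an assumption): two triangles of different type
# joined by a single edge, so m = 7.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
types = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

e_edges = mixing_matrix(edges, types)
e_walk = mixing_matrix_walk(edges, types)
print(global_assortativity(e_edges))  # 5/7 in exact arithmetic
print(global_assortativity(e_walk))   # same value up to rounding
```

In this example the two mixing matrices agree entry by entry, which is the equivalence between Eqs. (2) and (3): each directed edge carries probability 1/2m whether it is counted directly or reached by a stationary random walker.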