COMPLEX NETWORKS
NODE MEASURES

• We can measure node importance using so-called centrality.

• Bad term: in general, it has nothing to do with being central

• Usage:
‣ Some have a straightforward interpretation
‣ Centralities can be used as node features for machine learning on graphs (classification, …)

Connectivity-based centrality measures

Degree centrality - recap

• Number of connections of a node

• Undirected network

k_i = A_i1 + A_i2 + … + A_iN = Σ_{j=1}^{N} A_ij

Σ_i k_i = 2m, where m = |E|

mean degree: ⟨k⟩ = (1/N) Σ_{i=1}^{N} k_i

• Directed network

In-degree: k_i^in = Σ_{j=1}^{N} A_ij

Out-degree: k_j^out = Σ_{i=1}^{N} A_ij

m = Σ_i k_i^in = Σ_j k_j^out = Σ_{ij} A_ij

⟨k^in⟩ = (1/N) Σ_{i=1}^{N} k_i^in = (1/N) Σ_{j=1}^{N} k_j^out = ⟨k^out⟩

NODE DEGREE

• Degree: how many neighbors
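The degree formulas above can be sketched directly from an adjacency matrix; the two small example graphs below are illustrative:

```python
# Degree quantities from an adjacency matrix (example graphs are illustrative).

# Undirected: k_i = sum_j A_ij; the sum of degrees is 2m (each edge counted twice).
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
N = len(A)
k = [sum(row) for row in A]   # degree of each node
m = sum(k) // 2               # number of edges, m = |E|
mean_k = sum(k) / N           # <k> = (1/N) * sum_i k_i

print(k, m, mean_k)           # [2, 2, 3, 1] 4 2.0

# Directed (convention here: A[i][j] = 1 for an edge j -> i, as in the notes):
# the in-degree of i sums row i, the out-degree of j sums column j.
D = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
k_in = [sum(row) for row in D]
k_out = [sum(D[i][j] for i in range(len(D))) for j in range(len(D))]
print(k_in, k_out, sum(k_in) == sum(k_out))   # [1, 1, 1] [1, 1, 1] True
```

The last check mirrors the identity above: total in-degree and total out-degree both count every edge once.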

• Often enough to find important nodes
‣ Main characters of a series talk with the most people
‣ The largest airports have the most connections
‣ …

• But not always
‣ Facebook users with the most friends are often spam accounts
‣ Webpages/Wikipedia pages with the most links are simple lists of references
‣ …

NODE CLUSTERING COEFFICIENT

• Clustering coefficient: closed triangles/triads

• Tells you if the neighbors of the node are connected

• Be careful!
‣ Degree 2: value is 0 or 1
‣ Degree 1000: not 0 or 1 (usually)
‣ Ranking nodes of different degrees by it is not meaningful
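A minimal sketch of the local clustering coefficient; the graph and function name are illustrative, assuming an unweighted undirected graph stored as adjacency sets:

```python
# Local clustering coefficient: fraction of pairs of a node's neighbors
# that are themselves connected (closed triads around the node).
def clustering(adj, i):
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; reported as 0 here
    closed = sum(1 for a in range(k) for b in range(a + 1, k)
                 if nbrs[b] in adj[nbrs[a]])
    return closed / (k * (k - 1) / 2)

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering(adj, 0))            # 1.0: node 0's two neighbors are connected
print(round(clustering(adj, 2), 3))  # 0.333: one of three neighbor pairs connected
```

Note the caveat above in action: node 3 has degree 1, so its value (0 here) carries no information.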

• Can be used as a proxy for “community” belonging:
‣ If a node belongs to a single group: high CC
‣ If a node belongs to several groups: lower CC

RECURSIVE DEFINITIONS

• Recursive importance: ‣ Important nodes are those connected to important nodes

• Several centralities are based on this idea:
‣ Eigenvector centrality
‣ PageRank
‣ Katz centrality
‣ …

RECURSIVE DEFINITION

Eigenvector centrality

Other popular versions are the Katz centrality and hub/authority scores, which we will not cover here.

• We would like scores such that:
‣ Each node has a score (centrality)
‣ If every node “sends” its score to its neighbors, the sum of all scores received by each node will be equal to its original score

Taking the recursive idea about importance at face value, we may write down the following equation:

x_i^(t+1) = Σ_{j=1}^{n} A_ij x_j^(t) ,   (1)

where A_ij is an element of the adjacency matrix (and thus selects contributions to i’s importance based on whether i and j are connected; A_ij = 1 if there is an edge, 0 otherwise), x_i^(t) is the centrality of node i, and with the initial condition x_i^(0) = 1 for all i.

This formulation is a model in which each vertex “votes” for the importance of its neighbors by transferring some of its importance to them. By iterating the equation, with the iteration number indexed by t, importance flows across the network. However, this equation by itself will not produce useful importance estimates because the values x_i increase with t. But absolute values are not of interest themselves, and relative values may be derived by normalizing at any (or every) step.⁴

Applying this method to the karate club for different choices of t yields the following table. Notice that by the t = 15th iteration, the vector x has essentially stopped changing, indicating convergence on a fixed point. (Convergence here is particularly fast in part because the network has a small diameter.)

vertex   x(1)    x(5)    x(10)   x(15)   x(20)   degree, k
1        0.103   0.076   0.071   0.071   0.071   16
2        0.058   0.055   0.053   0.053   0.053   9
3        0.064   0.065   0.064   0.064   0.064   10
4        0.038   0.043   0.042   0.042   0.042   6
5        0.019   0.015   0.015   0.015   0.015   3
6        0.026   0.016   0.016   0.016   0.016   4
7        0.026   0.016   0.016   0.016   0.016   4
8        0.026   0.034   0.034   0.034   0.034   4
9        0.032   0.044   0.046   0.046   0.046   5
10       0.013   0.020   0.021   0.021   0.021   2
11       0.019   0.015   0.015   0.015   0.015   3
12       0.006   0.010   0.011   0.011   0.011   1
13       0.013   0.017   0.017   0.017   0.017   2
14       0.032   0.044   0.046   0.045   0.045   5
15       0.013   0.019   0.021   0.020   0.020   2
16       0.013   0.019   0.021   0.020   0.020   2
17       0.013   0.005   0.005   0.005   0.005   2
18       0.013   0.018   0.019   0.019   0.019   2
19       0.013   0.019   0.021   0.020   0.020   2
20       0.019   0.028   0.030   0.030   0.030   3
21       0.013   0.019   0.021   0.020   0.020   2
22       0.013   0.018   0.019   0.019   0.019   2
23       0.013   0.019   0.021   0.020   0.020   2
24       0.032   0.029   0.030   0.030   0.030   5
25       0.019   0.012   0.011   0.011   0.011   3
26       0.019   0.013   0.012   0.012   0.012   3
27       0.013   0.015   0.015   0.015   0.015   2
28       0.026   0.026   0.027   0.027   0.027   4
29       0.019   0.026   0.026   0.026   0.026   3
30       0.026   0.026   0.027   0.027   0.027   4
31       0.026   0.034   0.035   0.035   0.035   4
32       0.038   0.037   0.039   0.038   0.038   6
33       0.077   0.066   0.062   0.062   0.062   12
34       0.109   0.082   0.074   0.075   0.075   17

Illustrating the close relationship between degree and eigenvector centrality, the centrality scores here are larger among the high-degree vertices, e.g., 1, 34, 33, 3 and 2.

⁴Large values of t will tend to produce overflow errors in most matrix computations, and thus normalizing is a necessary component of a complete calculation.

RECURSIVE DEFINITION

Eigenvector centrality

• This problem can be solved by what is called the power method:
‣ 1) We initialize all scores to random values
‣ 2) Each score is updated according to the desired rule, until reaching a stable point (after normalization)

• Why does it converge?
‣ Perron-Frobenius theorem: holds for real, irreducible square matrices with non-negative entries
‣ => True for undirected graphs with a single connected component
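The power method can be sketched in a few lines of plain Python; the 4-node undirected graph below is illustrative:

```python
# Power method for eigenvector centrality (illustrative 4-node graph).
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]
n = len(A)
x = [1.0] * n  # initial condition x_i(0) = 1 (random values also work)

for _ in range(100):
    # each node receives the sum of its neighbors' current scores...
    x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # ...then we normalize, so the values do not grow without bound
    s = sum(x)
    x = [v / s for v in x]

print([round(v, 3) for v in x])  # [0.219, 0.281, 0.281, 0.219]
```

As expected, the two degree-3 vertices end up with the larger scores, and further iterations no longer change the (normalized) vector.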


EIGENVECTOR CENTRALITY

• What we just described is called the Eigenvector centrality

• A pair of an eigenvector (x) and an eigenvalue (λ) is defined by the following relation: Ax = λx
‣ x is a vector of size n, which can be interpreted as the scores of the nodes
‣ Ax yields a new vector of size n, which corresponds, for each node, to receiving the sum of the scores of its neighbors (as in the power method)
‣ The equality means that the new scores are proportional to the previous scores
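The relation Ax = λx can be checked directly with a standard eigendecomposition (numpy; the small symmetric matrix is illustrative):

```python
import numpy as np

# Leading eigenvector of a symmetric adjacency matrix = eigenvector centrality.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

vals, vecs = np.linalg.eigh(A)   # eigh: for symmetric (undirected) matrices
lam = vals[-1]                   # eigh returns eigenvalues in ascending order
x = vecs[:, -1]
x = x / x.sum()                  # fix the sign and normalize scores to sum to 1

# Ax = lambda * x: sending scores to neighbors only rescales them
assert np.allclose(A @ x, lam * x)
print(np.round(x, 3), round(lam, 3))  # [0.219 0.281 0.281 0.219] 2.562
```

The scores match the fixed point of the power method, which is exactly the point of the Perron-Frobenius guarantee discussed here.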

• What the Perron-Frobenius theorem says is that the power method will always converge to the leading eigenvector, i.e., the eigenvector associated with the highest eigenvalue

EIGENVECTOR CENTRALITY

=> (normalized) Eigenvector Centrality

Eigenvector centrality (Bonacich centrality): I am important if my friends are important too.

Some problems in the case of directed networks:
• A is asymmetric
• 2 sets of eigenvectors (left & right)
• 2 leading eigenvectors
• Use the right eigenvectors: consider the nodes that are pointing towards you
• But there is a problem with source nodes (0 in-degree):
‣ Vertex A is connected but has only outgoing links => its centrality will be 0
‣ Vertex B has outgoing and incoming links, but its incoming link comes from A => its centrality will be 0
‣ etc.
• Scores are non-zero only in a strongly connected component
• Note: acyclic networks (e.g., citation networks) do not have a strongly connected component

PageRank Centrality

• Eigenvector centrality generalised for directed networks

PageRank

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Brin, S. and Page, L. (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Seventh International World-Wide Web Conference (WWW 1998), April 14-18, 1998, Brisbane, Australia.

PageRank Centrality (Side notes)

- “We chose our system name, Google, because it is a common spelling of googol, or 10^100, and fits well with our goal of building very large-scale search engines.”

-“[…] at the same time, search engines have migrated from the academic domain to the commercial. Up until now most search engine development has gone on at companies with little publication of technical details. This causes search engine technology to remain largely a black art and to be advertising oriented (see Appendix A). With Google, we have a strong goal to push more development and understanding into the academic realm.”

- “[...] we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.”

PageRank Centrality (Side notes)

Sergey Brin received his B.S. degree in mathematics and computer science from the University of Maryland at College Park in 1993. Currently, he is a Ph.D. candidate in computer science at Stanford University where he received his M.S. in 1995. He is a recipient of a National Science Foundation Graduate Fellowship. His research interests include search engines, information extraction from unstructured sources, and data mining of large text collections and scientific data.

Lawrence Page was born in East Lansing, Michigan, and received a B.S.E. in Computer Engineering at the University of Michigan Ann Arbor in 1995. He is currently a Ph.D. candidate in Computer Science at Stanford University. Some of his research interests include the link structure of the web, human computer interaction, search engines, scalability of information access interfaces, and personal data mining.

Appendix A: Advertising and Mixed Motives

Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is "The Effect of Cellular Phone Use Upon Driver Attention", a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries [Marchiori 97]. This type of bias is much more insidious than advertising, because it is not clear who "deserves" to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. Furthermore, advertising income often provides an incentive to provide poor quality search results.

PAGERANK

PageRank is another kind of eigenvector centrality,⁷ but which has some nicer features than the Bonacich (and Katz) definitions. In particular, the classic eigenvector centrality performs poorly when applied to directed networks. In general, centralities will be zero for all vertices not within a strongly connected component, even if those vertices have high in-degree. Moreover, in a directed acyclic graph, there are no strongly connected components larger than a single vertex, and thus only vertices with no out-going edges (k^out = 0) will have non-zero centrality. These are not desirable behaviors for a centrality score.

PageRank solves these problems by adding two features to our vertex voting model. First, it assigns every vertex a small amount of centrality regardless of its network position. This eliminates the problems caused by vertices with zero in-degree—who have no other way of gaining any centrality—and allows them to contribute to the centrality of vertices they link to. As a result, vertices with high in-degree will tend to have higher centrality as a result of being linked to, regardless of whether those neighbors themselves have any links to them. Second, it divides the centrality contribution of a vertex by its out-degree. This eliminates the problematic situation in which a large number of vertex centralities are increased merely because they are pointed to by a single high-centrality vertex.

• 2 main improvements over eigenvector centrality:
‣ In directed networks, the problem of source nodes => add a constant centrality gain for every node
‣ Nodes with very high centralities give very high centralities to all their neighbors (even if that is their only in-coming link) => what each node “is worth” is divided equally among its neighbors (normalization by the out-degree)

Mathematically, the addition of these features modifies Eq. (1) to become

x_i = α Σ_{j=1}^{n} A_ij x_j / k_j^out + β ,   (3)

where α and β are positive constants. The first term represents the contribution from the classic (Bonacich) eigenvector centrality, while the second is the “free” or uniform centrality that every vertex receives. The value of β is a simple scaling constant and thus by convention we will choose β = 1; as a result, α alone scales the relative contributions of the eigenvector and uniform centrality components (α is a parameter, usually 0.85). Further, we must choose a resolution method for the case of k^out = 0, which would result in a divide-by-zero in the calculation. This problem is solved by artificially setting k^out = 1 for each such vertex.

As a matrix formulation, PageRank is equivalent to

x = α A D⁻¹ x + β 1    (4)
  = D (D − αA)⁻¹ 1 ,   (5)

where D is a diagonal matrix with D_ii = max(k_i^out, 1), as described above, and where we have set β = 1.
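A small sketch of the update in Eq. (3) by direct iteration; the 3-node directed graph, α = 0.85, and the A_ij = "edge from j to i" convention are illustrative:

```python
import numpy as np

# PageRank by iterating x_i = alpha * sum_j A_ij x_j / k_j^out + beta.
# Convention: A[i][j] = 1 for an edge j -> i (node i receives from j).
A = np.array([
    [0, 0, 1],   # node 0 receives from node 2
    [1, 0, 0],   # node 1 receives from node 0
    [1, 1, 0],   # node 2 receives from nodes 0 and 1
], dtype=float)

alpha, beta = 0.85, 1.0
k_out = np.maximum(A.sum(axis=0), 1.0)  # out-degree = column sum; 0 mapped to 1

x = np.ones(3)                           # x_i(0) = 1
for _ in range(200):
    x = alpha * A @ (x / k_out) + beta   # Eq. (3) in vector form

print(np.round(x / x.sum(), 3))          # [0.388 0.215 0.397]
```

Node 0 splits its vote between nodes 1 and 2, which is why node 1, with a single in-link carrying half a vote, ends up last.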



PAGERANK

Intuition:
• The left term of Eq. (3) dominates when nodes have many neighbors; the right term dominates for nodes with few neighbors.

Matrix interpretation: PageRank is the principal eigenvector of the “Google Matrix” G:
‣ First, define the matrix S by normalizing the columns of A
‣ Columns containing only 0s receive the uniform value 1/n instead
‣ Finally, G_ij = αS_ij + (1 − α)/n
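A sketch of this construction on a small illustrative 3-node graph (α = 0.85; S and G follow the definitions above, with the A_ij = "edge from j to i" convention):

```python
import numpy as np

A = np.array([          # A[i][j] = 1 for an edge j -> i
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
], dtype=float)
n, alpha = A.shape[0], 0.85

# S: columns of A normalized to sum to 1; all-zero columns become uniform 1/n
col = A.sum(axis=0)
S = np.where(col > 0, A / np.where(col > 0, col, 1.0), 1.0 / n)

G = alpha * S + (1 - alpha) / n   # Google matrix: every column sums to 1

# PageRank as the principal eigenvector of G, via power iteration
x = np.ones(n) / n
for _ in range(200):
    x = G @ x

print(np.round(x, 3))   # [0.388 0.215 0.397]
```

Because every column of G sums to 1, the iteration preserves the total score, so no renormalization step is needed here.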

⁷PageRank is usually attributed to S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine.” Computer Networks and ISDN Systems 30, 107–117 (1998). However, as is often the case with good ideas, it has been reinvented a number of times, and PageRank is, arguably, one of these reinventions. The idea of using eigenvectors in a manner very similar to PageRank goes back at least as far as G. Pinski and F. Narin, “Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics.” Information Processing & Management 12(5): 297–312 (1976), but may even go back further than that.

PageRank as a Random Walk

Main idea: The PageRank computation can be interpreted as a Random Walk process with restart

• In the Google Matrix, the elements of each column sum up to 1 • It is a stochastic matrix, which can be interpreted as the transition matrix of a random walk process

• The probability that the RW will be at node i in the next step depends only on the current node j and the transition probability j ➝ i determined by the stochastic matrix

• Consequently this is a first-order Markov process

• The stationary probability (i.e., as the walk length tends towards ∞) of the RW being at node i gives the PageRank of that node

Teleportation probability: the parameter α gives the probability that the next step of the RW follows the Markov process; with probability 1−α it jumps to a random node
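The teleportation process can be sketched as a simulation (illustrative graph; the visit frequencies approach the PageRank scores only in the long-walk limit):

```python
import random

# Random walk with teleportation: follow an out-link with probability alpha,
# otherwise jump to a node chosen uniformly at random.
out = {0: [1, 2], 1: [2], 2: [0]}   # out[j] lists the nodes j points to
n, alpha = 3, 0.85

random.seed(0)
counts = [0] * n
node = 0
steps = 200_000
for _ in range(steps):
    if out[node] and random.random() < alpha:
        node = random.choice(out[node])   # Markov step along an edge
    else:
        node = random.randrange(n)        # teleport (also rescues k_out = 0)
    counts[node] += 1

print([round(c / steps, 3) for c in counts])  # approximately the PageRank scores
```

The empirical frequencies converge (slowly, as Monte Carlo estimates do) to the same stationary vector the power iteration produces.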

• If α < 1, the RW can never get stuck at nodes with k^out = 0, since it can restart from a randomly selected other node

PAGERANK

• Then how does Google rank pages when we do a search?

• Compute the PageRank scores (using the power method, for scalability)

• Create a subgraph of documents related to our topic

• Of course, it is certainly much more complex now, but we don’t really know: “Most search engine development has gone on at companies with little publication of technical details. This causes search engine technology to remain largely a black art” [Page, Brin, 1997]

Katz Centrality

It measures the relative degree of influence of a node within a network

C_Katz(i) = Σ_{k=1}^{∞} Σ_{j=1}^{N} α^k (A^k)_ij

‣ (A^k)_ij: connected pairs of nodes at distance k
‣ α^k: attenuation factor to penalise influence by distance

• Attenuation factor α must be smaller than 1/|λ0|, i.e. the reciprocal of the absolute value of the largest eigenvalue of A.

Matrix form:

C⃗_Katz = ((I − αAᵀ)⁻¹ − I) I⃗

• where I is the identity matrix, and I⃗ is the identity vector (the vector of all ones)
• Katz centrality is useful for directed networks (citation nets, WWW) where eigenvector centrality fails

KATZ CENTRALITY

Reading the formula term by term:
‣ C_Katz(i): Katz centrality of node i
‣ (A^k)_ij: number of different paths from i to j of length k
‣ Σ_j: sum for each node j
‣ α: a parameter; its strength decreases at each iteration (increased distance)
‣ Σ_k: repeat for all distances as long as possible (convergence)
‣ Overall: the sum of paths to all other nodes at each distance, multiplied by a factor decreasing with distance
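The matrix form can be sketched directly (numpy; the 3-node DAG and α = 0.5 are illustrative, and the Aᵀ follows the convention A_ij = 1 for an edge i → j):

```python
import numpy as np

# Katz centrality via C = ((I - alpha*A^T)^-1 - I) @ 1.
A = np.array([          # A[i][j] = 1 for an edge i -> j
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
], dtype=float)         # a DAG: eigenvector centrality would be all zeros here
n = A.shape[0]
alpha = 0.5             # must satisfy alpha < 1/|lambda_0|
I = np.eye(n)

c = (np.linalg.inv(I - alpha * A.T) - I) @ np.ones(n)
print(c)                # [0.   0.5  1.25]
```

Node 2, reachable from node 0 both directly and via node 1, gets the highest score even though this network has no strongly connected component, which is exactly the case where eigenvector centrality fails.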

Geometric centrality measures

CLOSENESS CENTRALITY

C_cl(i) = (n − 1) / Σ_{j: d_ij < ∞} d_ij

• Farness: average length of the shortest paths to all other nodes.

• Closeness: inverse of the farness (normalized by the number of nodes)
‣ Higher closeness = more central
‣ Closeness = 1: directly connected to all other nodes

• Well defined only on connected networks
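A BFS-based sketch of the definition; the path graph and function names are illustrative:

```python
from collections import deque

# Closeness: (n - 1) divided by the sum of shortest-path distances (farness).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path graph 0-1-2-3

def bfs_distances(adj, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def closeness(adj, i):
    dist = bfs_distances(adj, i)
    farness = sum(d for j, d in dist.items() if j != i)  # finite distances only
    return (len(adj) - 1) / farness

print(closeness(adj, 1))   # 3 / (1 + 1 + 2) = 0.75: an inner node
print(closeness(adj, 0))   # 3 / (1 + 2 + 3) = 0.5: an end node
```

As the bullets above note, on a disconnected graph this quantity needs care; the harmonic variant below handles that case directly.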

CLOSENESS CENTRALITY

Network Analysis and Modeling, CSCI 5352 (Prof. Aaron Clauset), Lecture 2, 2017

C_cl(i) = (n − 1) / Σ_{j: d_ij < ∞} d_ij

[Figure: a 12-node example network; each node is labeled with its shortest-path distance from vertex i: three nodes at distance 1, seven at distance 2, one at distance 3.]

C_cl(i) = (12 − 1) / (3 × 1 + 7 × 2 + 1 × 3) = 11/20 = 0.55,

which is the maximal score in the network, but one other vertex has the same closeness (which one?). Its harmonic centrality is 0.6212..., which is the second largest value (what is the largest?). The minimal scores are 0.316 (closeness) and 0.417 (harmonic), which illustrates the narrow range of variation of closeness (less than a factor of 2). (Do you see which vertex produces these scores?)

Applying the harmonic centrality calculation to the karate club network yields the figure on the next page (with circle size scaled to be proportional to the score). The small size of this network tends to compress the centrality scores into a narrow range. Comparing the harmonic scores to degrees, we observe several differences. For instance, the centrality of vertex 17, the only vertex in group 1 that does not connect to the hub vertex 1, is lower than that of vertex 12, which has the lowest degree but connects to the high-degree vertex 1. And, vertex 3 has a harmonic centrality close to that of the main hubs 1 and 34, by virtue of it being “between” the two groups and thus having short paths to all members of each.

[Figure: the karate club network drawn twice, with circle size proportional to degree (left) and harmonic centrality (right).]

Relationship to degree-based centralities

In fact, degree-based centrality measures are related to geodesic-based measures like closeness and harmonic centrality, although they do emphasize different aspects of network structure.

HARMONIC CENTRALITY

• Harmonic mean of the geodesic (shortest-path) distances from a given node to all others:

C_h(i) = (1/(n − 1)) Σ_{i≠j} 1/d_ij

• In case of no path between nodes i and j: d_ij = ∞ (contributing 1/∞ = 0 to the sum)
• Well defined on disconnected networks
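A sketch showing the disconnected case (illustrative graph with an isolated node):

```python
from collections import deque

# Harmonic centrality: average of 1/d_ij; unreachable nodes contribute 0.
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}   # node 3 is isolated

def harmonic(adj, i):
    dist = {i: 0}
    q = deque([i])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    n = len(adj)
    # nodes with d_ij = infinity never enter dist, adding 1/inf = 0 to the sum
    return sum(1 / d for j, d in dist.items() if j != i) / (n - 1)

print(harmonic(adj, 1))   # (1/1 + 1/1 + 0) / 3
print(harmonic(adj, 0))   # (1/1 + 1/2 + 0) / 3 = 0.5
print(harmonic(adj, 3))   # 0.0, whereas closeness would be ill-defined here
```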

[Figure: the same 12-node example network as above, with each node labeled by its distance from vertex i.]

C_h(i) = (1/(12 − 1)) (3 × 1/1 + 7 × 1/2 + 1 × 1/3) = 41/66 = 0.6212

Betweenness Centrality

Assumption: important vertices are bridges over which information flows

Practically: if information spreads via shortest paths, important nodes are found on many shortest paths

Notation: σ_jk(i) = number of geodesic paths from j to k via i: j → … → i → … → k

σ_jk = number of geodesic paths from j to k: j → … → k

Definition:

C_b(i) = Σ_{j≠k} #{geodesic paths j → … → i → … → k} / #{geodesic paths j → … → k} = Σ_{j≠k} σ_jk(i) / σ_jk

Normalised definition:

C_b(i) = (1/n²) Σ_{j≠k} σ_jk(i) / σ_jk ,   where C_b ∈ [0,1]

• n² is the total number of ordered vertex pairs
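A brute-force sketch of the normalized definition on a tiny graph; enumerating all geodesics this way is only feasible for small n, and the graph and names are illustrative:

```python
from collections import deque
from itertools import permutations

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path graph 0-1-2-3
n = len(adj)

def geodesics(j, k):
    """All shortest paths from j to k, via BFS with predecessor lists."""
    dist, preds = {j: 0}, {j: []}
    q = deque([j])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], preds[v] = dist[u] + 1, [u]
                q.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)

    def unwind(v):
        return [[j]] if v == j else [p + [v] for u in preds[v] for p in unwind(u)]

    return unwind(k) if k in dist else []

def betweenness(i):
    total = 0.0
    for j, k in permutations(range(n), 2):   # ordered pairs, j != k
        if i in (j, k):
            continue
        paths = geodesics(j, k)
        total += sum(1 for p in paths if i in p) / len(paths)  # sigma_jk(i)/sigma_jk
    return total / n**2                      # normalization by n^2

print(betweenness(1))   # 0.25: node 1 lies on all geodesics between {0} and {2, 3}
print(betweenness(0))   # 0.0: an end node bridges nothing
```

The end-node case previews the point made at the end of this section: a vertex can have zero betweenness while still scoring well on distance-based measures.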

The following table compares the relative rankings derived from the different measures defined in this lecture, along with a ranking by degree, for the top few most central vertices. (The last column gives the betweenness estimated by using only a single SSSP tree from each vertex, in order to illustrate the differences this approximation produces.) A few details are worth pointing out. For instance, closeness scores are all very similar, differing only in their second decimal place, while other scores have greater variability, with betweenness having the broadest range. Although all measures generally agree on which vertices are among the most important (mainly 1, 3, 32 and 34), they disagree on the precise ordering of their importance. Consider applying these measures to a novel network: how would such disagreements complicate your interpretation of which vertices are most important? What if there were more disagreement about which vertices were in the top five?

Zachary's karate club network:

             degree k/m    closeness    harmonic     betweenness   betweenness*
                           Eq. (7)      Eq. (8)      Eq. (11)      Eq. (11)
1st largest  34 (0.1090)   1 (0.5862)   34 (0.7045)  1 (0.4577)    1 (0.4939)
2nd largest  1 (0.1026)    3 (0.5763)   1 (0.7020)   34 (0.3357)   34 (0.2708)
3rd largest  33 (0.0769)   34 (0.5667)  3 (0.6364)   33 (0.1906)   32 (0.2638)
4th largest  3 (0.0641)    32 (0.5574)  33 (0.6338)  3 (0.1892)    33 (0.2439)
5th largest  2 (0.0577)    9 (0.5312)   32 (0.5859)  32 (0.1843)   3 (0.1912)

BETWEENNESS CENTRALITY

Exact computation:
‣ Floyd–Warshall: O(n³) time complexity, O(n²) space complexity

Approximate computation:
‣ Dijkstra: O(n(m + n log n)) time complexity

Finally, we return to the question of correlation between measures, as both harmonic and betweenness centrality are functions of geodesic paths, which may produce correlated rankings. On the other hand, such a correlation is not a foregone conclusion. Consider a vertex v that has only a single connection to another, highly central vertex. This vertex would have a minimal betweenness score, as it lies on no geodesic paths that do not begin or terminate at v, but its path lengths are all very short, being a single step longer than the paths to the highly central node to which it connects. Thus, v would have high closeness or harmonic centrality, but low betweenness.

Applying the harmonic centrality to the karate club network yields the figure on the next page (with circle size scaled to be proportional to the score). The small size of this network tends to compress the centrality scores into a narrow range. Comparing the harmonic scores to degrees, we observe several differences. For instance, the centrality of vertex 17, the only vertex in group 1 that does not connect to the hub vertex 1, is lower than that of vertex 12, which has the lowest degree but connects to the high-degree vertex 1. And vertex 3 has a harmonic centrality close to that of the main hubs 1 and 34, by virtue of it being "between" the two groups and thus having short paths to all members of each.
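The approximate computation mentioned above (building shortest-path trees from only a sample of sources) is exposed in networkx through the `k` parameter of `betweenness_centrality`; a sketch, with the sample size 10 chosen arbitrarily:

```python
import networkx as nx

G = nx.karate_club_graph()

exact = nx.betweenness_centrality(G)                 # Brandes: one SSSP tree per node
approx = nx.betweenness_centrality(G, k=10, seed=1)  # only 10 sampled SSSP trees

# The sampled estimate is noisy but cheaper: O(k) trees instead of O(n).
print(max(exact, key=exact.get), max(approx, key=approx.get))
```

On a small graph the two rankings usually agree at the top, but the sampled scores fluctuate from seed to seed, as the betweenness* column of the table illustrates.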


BETWEENNESS CENTRALITY

• Can you guess the node/edge of highest betweenness in the European rail network?

OTHERS

• Many other centralities have been proposed

• The problem: how to interpret them?

• Can be used as a supervised tool: ‣ Compute many centralities on all nodes ‣ Learn how to combine them to find chosen nodes ‣ Discover new similar nodes ‣ (roles in social networks, key elements in an infrastructure, …)

Caveats of centrality measures

• Each centrality measure is a proxy for an underlying network process
• If this process is irrelevant for the actual network, then the centrality measure makes no sense
• E.g. if information does not flow along the shortest paths in a network, betweenness centrality is irrelevant
• Centrality measures should be used with caution, (a) for exploratory purposes and (b) for characterisation

SOME EXAMPLES ON REAL NETWORKS

WIKIPEDIA

• What are the most important pages on Wikipedia?

• Wikipedia network: ‣ Nodes are pages ‣ Links are hypertext links

• Wikipedia in English: cultural bias!

• Results from http://wikirank-2019.di.unimi.it

WIKIPEDIA

Table 1

Rank | Page views | Harmonic centrality | Indegree | PageRank
0 | Main Page | United States | United States | United States
1 | Hyphen-minus | World War II | Association football | Association football
2 | Louis Tomlinson | United Kingdom | World War II | France
3 | Darth Vader | Association football | France | Iran
4 | Lists of deaths by year | World War I | Germany | World War II
5 | Exo (band) | France | India | Germany
6 | List of stand-up comedians from the United Kingdom | Catholic Church | New York City | India
7 | List of United States stand-up comedians | Germany | United Kingdom | Moth
8 | List of stand-up comedians | China | Iran | United Kingdom
9 | List of Australian stand-up comedians | India | London | Australia

WIKIPEDIA

Table 1

Rank | PageRank | Harmonic centrality | Indegree | Page views
0 | Gone with the Wind (film) | Avatar (2009 film) | The Wizard of Oz (1939 film) | Black Panther (film)
1 | The Wizard of Oz (1939 film) | Gone with the Wind (film) | Star Wars (film) | Deadpool 2
2 | Cinema of Japan | The Wizard of Oz (1939 film) | Titanic (1997 film) | Venom (2018 film)
3 | Star Wars (film) | The Godfather | Gone with the Wind (film) | A Quiet Place (film)
4 | Titanic (1997 film) | Citizen Kane | Avatar (2009 film) | The Shape of Water
5 | The Godfather | Casablanca (film) | The Godfather | Avengers: Endgame
6 | Citizen Kane | Lawrence of Arabia (film) | The Lion King | Ant-Man and the Wasp
7 | Avatar (2009 film) | On the Waterfront | The Matrix | The Greatest Showman
8 | Casablanca (film) | Titanic (1997 film) | The Dark Knight (film) | A Star Is Born (2018 film)
9 | Blade Runner | Schindler's List | Blade Runner | Ready Player One (film)

Similarity measures

Node similarity

Similarity between nodes based on their neighborhood: how similarly two nodes are connected.

• What does it mean that they have 3 neighbours in common? • It is relative to their degree (different meaning for nodes with 3 or 100 neighbours)

➡ Normalisation to penalise nodes with small degrees

We can define it using existing measures:

• Cosine similarity

• Pearson coefficient

Cosine similarity

» Number of common neighbours:

n_ij = ∑_k A_ik A_kj

» What does it mean to have 3 vertices in common? We need a normalisation.

» Cosine similarity between two non-zero vectors x and y:

cos θ = x·y / (|x||y|)

» Take the vectors to be the rows of the adjacency matrix:

σ_ij = cos θ = (∑_k A_ik A_kj) / (√(∑_k A²_ik) · √(∑_k A²_jk))

» For a 0/1 adjacency matrix, A²_ik = A_ik, so ∑_k A²_ik = ∑_k A_ik = k_i, and therefore

σ_ij = n_ij / √(k_i k_j)

» i.e. the number of common neighbours, normalised by the geometric mean of the two degrees.

Pearson coefficient

Correlation between the rows of the adjacency matrix:

r_ij = cov(A_i, A_j) / (σ_i σ_j) = ∑_k (A_ik − ⟨A_i⟩)(A_jk − ⟨A_j⟩) / (√(∑_k (A_ik − ⟨A_i⟩)²) · √(∑_k (A_jk − ⟨A_j⟩)²))

cov: covariance, the expected product of deviations from the individual expected values
σ: standard deviation, the square root of the expected squared deviation from the mean

Intuition (numerator): the number of common neighbours compared to the expected number of common neighbours. Since ⟨A_i⟩ = k_i/n,

∑_k (A_ik − ⟨A_i⟩)(A_jk − ⟨A_j⟩) = ∑_k A_ik A_jk − n ⟨A_i⟩⟨A_j⟩ = ∑_k A_ik A_jk − (k_i k_j)/n

i.e. the actual number of common neighbours minus the number we would expect by chance.

Properties:
• r(i,j) = 0 if the number of common neighbours is exactly as many as we expect by chance
• r(i,j) > 0 if i and j have more neighbours in common than expected
• r(i,j) < 0 if i and j have fewer neighbours in common than expected
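In code this is simply the row-wise Pearson correlation of the adjacency matrix, which numpy computes directly; a sketch (karate club again, as an arbitrary example):

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

# r_ij = cov(A_i, A_j) / (sigma_i * sigma_j): positive when nodes i and j
# share more neighbours than expected by chance, negative when fewer.
R = np.corrcoef(A)
print(round(R[0, 1], 3))
```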

Homophily - Assortative mixing

"birds of a feather flock together"

• Property of (social) networks that nodes with the same attributes tend to be connected with a higher probability than expected

• It appears as a correlation between the vertex properties x(i) and x(j) for (i,j) ∈ E

Vertex properties • age • gender • nationality • political beliefs • socioeconomic status • habitual place • obesity • …

• Homophily can be a link creation mechanism or a consequence of social influence (and it is difficult to distinguish the two)

[Figure: high-school friendship network colored by race (J. Moody)]

? Are people of the same political opinion connected because they were a priori similar (homophily), or did they become similar after they became connected (social influence)?

Homophily - Assortative mixing

Disassortative mixing

• The contrary of homophily: dissimilar nodes tend to be connected

Examples
• Sexual networks
• Predator–prey ecological networks

Homophily - Assortative mixing

To quantify homophily: discrete properties

A. Measuring discrete assortative mixing

TABLE I: the mixing matrix e_ij and the values of a_i and b_i for sexual partnerships in the study of Catania et al. [23] (rows: men, columns: women). After Morris [24].

           black    hispanic  white    other   | a_i
black      0.258    0.016     0.035    0.013   | 0.323
hispanic   0.012    0.157     0.058    0.019   | 0.247
white      0.013    0.023     0.306    0.035   | 0.377
other      0.005    0.007     0.024    0.016   | 0.053
b_i        0.289    0.204     0.423    0.084

To quantify the level of assortative mixing in a network we define an assortativity coefficient thus:

r = (∑_i e_ii − ∑_i a_i b_i) / (1 − ∑_i a_i b_i) = (Tr e − ||e²||) / (1 − ||e²||),   (2)

where ||x|| means the sum of all elements of the matrix x. This formula gives r = 0 when there is no assortative mixing, since e_ij = a_i b_j in that case, and r = 1 when there is perfect assortative mixing and ∑_i e_ii = 1. If the network is perfectly disassortative, i.e., every edge connects two vertices of different types, then r is negative, with value

r_min = − (∑_i a_i b_i) / (1 − ∑_i a_i b_i),   (3)

which lies in general in the range −1 ≤ r_min < 0.

No assortative mixing: r = 0
Perfectly assortative mixing: r = 1
Perfectly disassortative mixing: r = r_min (negative, but not necessarily −1)
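Eq. (2) applied to the mixing matrix of Table I takes only a few lines; a sketch (values transcribed from the table, so the result inherits their rounding):

```python
import numpy as np

# Mixing matrix e_ij for sexual partnerships (Catania et al., after Morris):
# rows = men, columns = women, categories [black, hispanic, white, other].
e = np.array([
    [0.258, 0.016, 0.035, 0.013],
    [0.012, 0.157, 0.058, 0.019],
    [0.013, 0.023, 0.306, 0.035],
    [0.005, 0.007, 0.024, 0.016],
])
a = e.sum(axis=1)  # a_i: fraction of edge ends in each group (men)
b = e.sum(axis=0)  # b_i: fraction of edge ends in each group (women)

# Eq. (2): r = (Tr e - sum_i a_i b_i) / (1 - sum_i a_i b_i)
r = (np.trace(e) - a @ b) / (1 - a @ b)
print(round(r, 3))  # about 0.62: strongly assortative mixing
```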

Homophily - Assortative mixing

To quantify homophily: scalar properties

As before, we can use the matrix e_xy to define a measure of assortativity, where e_xy is the fraction of edges joining vertices with values x and y (the values may be discrete, e.g. age to the nearest year, or continuous, e.g. exact age; for simplicity we concentrate on the discrete case). e_xy satisfies the sum rules

∑_xy e_xy = 1,   ∑_y e_xy = a_x,   ∑_x e_xy = b_y,   (20)

where a_x and b_y are, respectively, the fractions of edges that start and end at vertices with values x and y. (On an undirected, unipartite graph, a_x = b_x.) If there is no assortative mixing, e_xy = a_x b_y. If there is assortative mixing, one can measure it by calculating the standard Pearson correlation coefficient:

r = (∑_xy x y (e_xy − a_x b_y)) / (σ_a σ_b),   (21)

where σ_a and σ_b are the standard deviations of the distributions a_x and b_y. The value of r lies in the range −1 ≤ r ≤ 1, with r = 1 indicating perfect assortativity and r = −1 indicating perfect disassortativity (i.e., perfect negative correlation between x and y).

• r = 0: no assortative mixing
• r > 0: assortative mixing
• r < 0: disassortative mixing

[Figure: scatter plot of the ages at time of marriage of 1141 married couples, from the 1995 US National Survey of Family Growth [37], and a histogram of the age differences (male minus female).]

There is a strong positive correlation between the ages, with most of the density lying along a rough diagonal: people, it appears, prefer to marry others of about the same age, although there is some bias towards husbands being older than their wives. For these age data r = 0.574, indicating strong assortative mixing.

Mixing by vertex degree: scalar assortative mixing as described above requires that the vertices have suitable scalar properties attached, such as age or income, but in many cases such data are not available. One scalar property that is always available is the vertex degree: as long as we know the network structure, we also know the degree of every vertex, and we can ask whether vertices of high degree preferentially associate with other vertices of high degree. Do the gregarious people hang out with other gregarious people? Many real-world networks do show significant assortative (or disassortative) mixing by vertex degree, and it can be quantified in exactly the same way as for other scalar properties, using Eq. (21).

Degree-degree correlation

• A particular type of application is the degree correlation:
• Are important nodes connected to other important nodes with a higher probability than expected?
• The degree can be used as any other scalar property

PEARSON-CORRELATION

If there are degree correlations, e_jk will differ from q_j q_k. The magnitude of the correlation is captured by the difference

∑_jk jk (e_jk − q_j q_k)

which is expected to be:
‣ positive for assortative networks,
‣ zero for neutral networks,
‣ negative for disassortative networks.

To compare different networks, we should normalise it with its maximum value; the maximum is reached for a perfectly assortative network, i.e. e_jk = q_k δ_jk:

σ_r² = max ∑_jk jk (e_jk − q_j q_k) = ∑_jk jk (q_k δ_jk − q_j q_k)

r = (∑_jk jk (e_jk − q_j q_k)) / σ_r²,   −1 ≤ r ≤ 1
‣ r < 0: disassortative
‣ r = 0: neutral
‣ r > 0: assortative

TABLE II: size n, degree assortativity coefficient r, and expected error σ_r on the assortativity, for a number of social, technological, and biological networks, both directed and undirected. [M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002); Network Science: Degree Correlations, March 7, 2011]

network                            type        size n      assortativity r   error σ_r
social
 (a) physics coauthorship          undirected  52 909       0.363            0.002
 (a) biology coauthorship          undirected  1 520 251    0.127            0.0004
 (b) mathematics coauthorship      undirected  253 339      0.120            0.002
 (c) film actor collaborations     undirected  449 913      0.208            0.0002
 (d) company directors             undirected  7 673        0.276            0.004
 (e) student relationships         undirected  573         −0.029            0.037
 (f) email address books           directed    16 881       0.092            0.004
technological
 (g) power grid                    undirected  4 941       −0.003            0.013
 (h) Internet                      undirected  10 697      −0.189            0.002
 (i) World-Wide Web                directed    269 504     −0.067            0.0002
 (j) software dependencies         directed    3 162       −0.016            0.020
biological
 (k) protein interactions          undirected  2 115       −0.156            0.010
 (l) metabolic network             undirected  765         −0.240            0.007
 (m) neural network                directed    307         −0.226            0.016
 (n) marine food web               directed    134         −0.263            0.037
 (o) freshwater food web           directed    92          −0.326            0.031

Social networks: coauthorship networks of (a) physicists and biologists [43] and (b) mathematicians [44], in which authors are connected if they have coauthored one or more articles in learned journals; (c) collaborations of film actors, in which actors are connected if they have appeared together in one or more movies [5, 7]; (d) directors of Fortune 1000 companies for 1999, in which two directors are connected if they sit on the board of directors of the same company [45]; (e) romantic (not necessarily sexual) relationships between students at a US high school [46]; (f) network of email address books of computer users on a large computer system, in which an edge from user A to user B indicates that B appears in A's address book [47]. Technological networks: (g) network of high voltage transmission lines in the Western States Power Grid of the United States [5]; (h) network of direct peering relationships between autonomous systems on the Internet, April 2001 [48]; (i) network of hyperlinks between pages in the World-Wide Web domain nd.edu, circa 1999 [49]; (j) network of dependencies between software packages in the GNU/Linux operating system, in which an edge from package A to package B indicates that A relies on components of B for its operation. Biological networks: (k) protein-protein interaction network in the yeast S. Cerevisiae [50]; (l) metabolic network of the bacterium E. Coli [51]; (m) neural network of the nematode worm C. Elegans [5, 52]; trophic interactions between species in the food webs of (n) Ythan Estuary, Scotland [53] and (o) Little Rock Lake, Wisconsin [54].
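For mixing by degree, networkx implements this coefficient directly; a sketch on the karate club network (my example choice, not one of the networks in Table II):

```python
import networkx as nx

G = nx.karate_club_graph()

# Pearson correlation of the degrees found at the two ends of each edge.
r = nx.degree_assortativity_coefficient(G)
print(round(r, 3))  # negative: the hubs connect mostly to low-degree nodes
```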

B. Models of assortative mixing by degree

In Ref. 22 we studied the ensemble of graphs that have a specified value of the matrix e_jk and solved exactly for its average properties using generating function methods similar to those of Section II B. We showed that the phase transition at which a giant component first appears in such networks occurs at a point given by det(I − m) = 0, where m is the matrix with elements m_jk = k e_jk / q_j. One can also calculate exactly the size of the giant component, and the distribution of sizes of the small components below the phase transition. While these developments are mathematically elegant, however, their usefulness is limited by the fact that the generating functions involved are rarely calculable in closed form for arbitrary specified e_jk, and the determinant of the matrix I − m almost never is. In this paper, therefore, we take an alternative approach, making use of computer simulation.

We would like to generate on a computer a random network having, for instance, a particular value of the matrix e_jk. (This also fixes the degree distribution, via Eq. (23).) In Ref. 22 we discussed one possible way of doing this using an algorithm similar to that of Section II C. One would draw edges from the desired distribution e_jk and then join the degree-k ends randomly in groups of k to create the network. (This algorithm has also been discussed recently by Dorogovtsev et al. [40].) As we pointed out, however, this algorithm is flawed because in order to create a network without any dangling edges the number of degree-k ends must be a multiple of k for all k. It is very unlikely that these constraints will be satisfied by chance, and there does not appear to be any simple way of arranging for them to be satisfied without introducing bias into the ensemble of graphs. Instead, therefore, we use a Monte Carlo sampling scheme which is essentially equivalent to the Metropolis–Hastings method widely used in the mathematical and social sciences for generating model networks [55, 56]. The algorithm is as follows.

1. Given the desired edge distribution e_jk, we first calculate the corresponding distribution of excess degrees q_k from Eq. (23), and then invert Eq. (22) to find the degree distribution:

   p_k = \frac{q_{k-1}/k}{\sum_j q_{j-1}/j}.   (27)

   Note that this equation cannot tell us how many vertices there are of degree zero in the network. This information is not contained in the edge distribution e_jk since no edges connect to degree-zero vertices, and so must be specified separately.
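A minimal numeric sketch of the inversion in Eq. (27), recovering the degree distribution p_k from the excess-degree distribution q_k. The values of q here are made up purely for illustration:

```python
# Toy excess-degree distribution q_k (hypothetical values, must sum to 1)
q = {0: 0.2, 1: 0.5, 2: 0.3}

# Eq. (27): p_k is proportional to q_{k-1} / k, for k >= 1
unnorm = {k + 1: q[k] / (k + 1) for k in q}
Z = sum(unnorm.values())                    # normalisation sum_j q_{j-1}/j
p = {k: w / Z for k, w in unnorm.items()}   # degree distribution p_k
```

As the excerpt notes, p_0 is not determined by this inversion and would have to be specified separately.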

AVERAGE NEAREST-NEIGHBOUR DEGREE

R. Pastor-Satorras, A. Vázquez, A. Vespignani, Phys. Rev. E 65, 066130 (2001)

• More detailed characterisation of degree-degree correlations
• k_annd(k): average nearest-neighbour degree, i.e. the average degree of the first neighbours of nodes with degree k

• k_annd can be written as:

k_{annd}(k) = \sum_{k'} k' P(k'|k) = \frac{\sum_{k'} k' e_{kk'}}{\sum_{k'} e_{kk'}}

(Worked example: a node v whose four neighbours have degrees 4, 3, 3 and 1 has k_annd(v) = (4+3+3+1)/4 = 2.75.)

• where P(k'|k) is the conditional probability that an edge of a node with degree k points to a node with degree k'

• No degree correlations: e_{kk'} = q_k q_{k'}, so

k_{annd}(k) = \sum_{k'} k' \frac{e_{kk'}}{q_k} = \sum_{k'} k' q_{k'} = \sum_{k'} k' \frac{k' p(k')}{\langle k \rangle} = \frac{\langle k^2 \rangle}{\langle k \rangle}

• If there are no degree correlations, k_annd(k) is independent of k.

k_{annd}(k) = \frac{\langle k^2 \rangle}{\langle k \rangle}

• k_annd is then independent of k: nodes of any degree have the same average nearest-neighbour degree
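A rough sanity check of this flat baseline, using a stub-matching configuration model with degrees drawn uniformly from 1 to 5 (a sketch: self-loops are simply dropped, which perturbs the degree sequence slightly):

```python
import random
from collections import defaultdict

random.seed(1)

# Degree sequence with an even total number of stubs
n = 2000
degrees = [random.choice([1, 2, 3, 4, 5]) for _ in range(n - 1)]
degrees.append(1 + sum(degrees) % 2)   # last degree fixes the parity

# Configuration model: pair stubs uniformly at random
stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
random.shuffle(stubs)
neigh = defaultdict(list)
for u, v in zip(stubs[::2], stubs[1::2]):
    if u != v:                         # drop self-loops (multi-edges kept)
        neigh[u].append(v)
        neigh[v].append(u)

deg = {u: len(vs) for u, vs in neigh.items()}
ref = sum(k * k for k in deg.values()) / sum(deg.values())   # <k^2>/<k>

# k_annd(k): average neighbour degree over nodes of each degree class
by_k = defaultdict(list)
for u, vs in neigh.items():
    by_k[deg[u]].append(sum(deg[v] for v in vs) / deg[u])
kannd = {k: sum(xs) / len(xs) for k, xs in by_k.items()}
# every kannd[k] should sit close to ref, independently of k
```

With no correlations built in, k_annd(k) stays within a few percent of ⟨k²⟩/⟨k⟩ for every degree class.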

• If the network is assortative, k_annd(k) is an increasing function of k

• If the network is disassortative, k_annd(k) is a decreasing function of k
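To make the definition concrete, here is a minimal sketch computing k_annd(k) from an undirected edge list (the toy graph is made up for illustration):

```python
from collections import defaultdict

def kannd(edges):
    """Average nearest-neighbour degree k_annd(k) from an undirected edge list."""
    neigh = defaultdict(set)
    for u, v in edges:
        neigh[u].add(v)
        neigh[v].add(u)
    deg = {u: len(vs) for u, vs in neigh.items()}
    # per-node average neighbour degree, then average over each degree class
    per_node = {u: sum(deg[v] for v in vs) / deg[u] for u, vs in neigh.items()}
    by_k = defaultdict(list)
    for u, x in per_node.items():
        by_k[deg[u]].append(x)
    return {k: sum(xs) / len(xs) for k, xs in by_k.items()}

# toy example: a path 1-2-3-4 plus the chord 2-4
edges = [(1, 2), (2, 3), (3, 4), (2, 4)]
# degree-1 nodes see 3.0, degree-2 nodes see 2.5, the degree-3 node sees 5/3:
# a decreasing k_annd(k), i.e. this tiny graph is disassortative
result = kannd(edges)
```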

kannd(k) FOR REAL NETWORKS

[Figure: k_annd(k) vs k on log-log axes for two real networks. Left: astrophysics co-authorship network, increasing, exponent 0.37 ± 0.11 (assortative). Right: yeast PPI network, decreasing, exponent −0.27 ± 0.03 (disassortative).]


/!\ These definitions assume a finite variance of the degree distribution. A key property of power-law degree distributions, however, is that their variance can be infinite (the second moment diverges when the degree exponent is 3 or less).

Imagine a star network with one node of degree 10 and 10 nodes of degree 1: by construction, the two degree classes cannot have the same average neighbour degree. The hub's neighbours all have degree 1, while each leaf's single neighbour has degree 10.
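The star example can be checked directly; the hub and the leaves necessarily see different average neighbour degrees:

```python
from collections import defaultdict

# a star: hub 0 connected to leaves 1..10
edges = [(0, leaf) for leaf in range(1, 11)]
neigh = defaultdict(set)
for u, v in edges:
    neigh[u].add(v)
    neigh[v].add(u)
deg = {u: len(vs) for u, vs in neigh.items()}
avg_neigh = {u: sum(deg[v] for v in vs) / deg[u] for u, vs in neigh.items()}
# hub: all 10 neighbours have degree 1 -> 1.0
# each leaf: its only neighbour is the hub of degree 10 -> 10.0
```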

Other measures need to be applied (see for instance https://arxiv.org/pdf/1704.05707.pdf)

RICH-CLUB COEFFICIENT

• How well connected are high-degree vertices among themselves? (How well connected are the well connected?)

• It is calculated on the list of node degrees sorted in ascending order. Algorithm:
 ‣ rank nodes by degree
 ‣ remove nodes in ascending degree order
 ‣ N_{>k} = number of nodes with degree higher than k; E_{>k} = number of links between these nodes
 ‣ measure the density of the remaining network:

Φ(k) = \frac{2 E_{>k}}{N_{>k} (N_{>k} − 1)}

• Values should be compared to some random reference or null model

• Usually the configuration model is used: a synthetic network with as many nodes as the original and exactly the same degrees, connected randomly (exchange endpoints of randomly chosen pairs of links until the whole network has been rewired, see lecture 1)

• One then plots the ratio Φ_orig(k) / Φ_rand(k) as a function of degree k

Colizza et al., Nature Physics 2, 2006

[Figure (Colizza et al.): ratio Φ_orig/Φ_rand vs degree k for two example networks.]
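A minimal sketch of the raw coefficient Φ(k), the link density among nodes of degree greater than k; the comparison against a rewired null model described above is left out:

```python
from collections import defaultdict

def rich_club(edges):
    """Phi(k) = 2 E_{>k} / (N_{>k} (N_{>k} - 1)) for degree values present."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    phi = {}
    # evaluate only at degree values present (a sketch; a full version
    # would sweep every integer k up to the maximum degree)
    for k in sorted(set(deg.values()))[:-1]:
        club = {u for u, d in deg.items() if d > k}
        if len(club) < 2:
            continue
        e = sum(1 for u, v in edges if u in club and v in club)
        phi[k] = 2 * e / (len(club) * (len(club) - 1))
    return phi

# toy example: two connected hubs, each with three private leaves
edges = [(0, 1)] + [(0, i) for i in (2, 3, 4)] + [(1, i) for i in (5, 6, 7)]
phi = rich_club(edges)   # the two degree-4 hubs form a fully connected club
```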
