
Measures of centrality
Complex Networks, CSYS/MATH 303, Spring, 2010

Prof. Peter Dodds
Department of Mathematics & Statistics
Center for Complex Systems
Vermont Advanced Computing Center
University of Vermont

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Outline
- Background
- Centrality measures
  - Degree centrality
  - Closeness centrality
  - Betweenness
  - Eigenvalue centrality
  - Hubs and Authorities
- References

How big is my node?
- Basic question: how ‘important’ are specific nodes and edges in a network?
- An important node or edge might:
  1. handle a relatively large amount of the network’s traffic (e.g., cars, information);
  2. bridge two or more distinct groups (e.g., liaison, interpreter);
  3. be a source of important ideas, knowledge, or judgments (e.g., supreme court decisions, an employee who ‘knows where everything is’).
- So how do we quantify such a slippery concept as importance?
- We generate ad hoc, reasonable measures, and examine their utility...

Centrality
- One possible reflection of importance is centrality.
- Presumption is that nodes or edges that are (in some sense) in the middle of a network are important for the network’s function.
- Idea of centrality comes from social networks literature [7].
- Many flavors of centrality...
  1. Many are topological and quasi-dynamical;
  2. Some are based on dynamics (e.g., traffic).
- We will define and examine a few...
- (Later: see centrality useful in identifying communities in networks.)

Degree centrality
- Naively estimate importance by node degree. [7]
- Doh: assumes linearity. (If node i has twice as many friends as node j, it’s twice as important.)
- Doh: doesn’t take in any non-local information.

Closeness centrality
- Idea: Nodes are more central if they can reach other nodes ‘easily.’
- Measure average shortest path from a node to all other nodes.
- Define Closeness Centrality for node i as
  $$ \frac{N - 1}{\sum_{j,\, j \neq i} (\text{distance from } i \text{ to } j)}. $$
- Range is 0 (no friends) to 1 (single hub).
- Unclear what the exact values of this measure tell us because of its ad-hocness.
- General problem with simple centrality measures: what do they exactly mean?
- Perhaps, at least, we obtain an ordering of nodes in terms of ‘importance.’
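As a minimal sketch of the two measures above, the following computes degree and closeness centrality for an unweighted network stored as a dictionary of neighbor lists. The function names, the adjacency-dictionary representation, and the choice to assign closeness 0 to a node that cannot reach every other node are assumptions made for illustration, not part of the lecture.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj):
    """Node degree, the naive importance estimate."""
    return {u: len(neighbors) for u, neighbors in adj.items()}

def closeness_centrality(adj):
    """(N - 1) divided by the sum of distances from i to all other nodes."""
    n = len(adj)
    closeness = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        total = sum(d for j, d in dist.items() if j != i)
        # Assumption: a node that cannot reach every other node gets closeness 0.
        closeness[i] = (n - 1) / total if len(dist) == n and total > 0 else 0.0
    return closeness

# Star network: hub 0 connected to leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
degree = degree_centrality(star)
closeness = closeness_centrality(star)
```

For the star, the hub reaches every leaf in one step and so attains the maximum closeness of 1, while each leaf has distances 1, 2, 2 and closeness 3/5.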

Betweenness centrality
- Betweenness centrality is based on shortest paths in a network.
- Idea: If the quickest way between any two nodes on a network disproportionately involves certain nodes, then they are ‘important’ in terms of global cohesion.
- For each node i, count how many shortest paths pass through i.
- In the case of ties, divide counts between paths.
- Call frequency of shortest paths passing through node i the betweenness of i, $B_i$.
- Note: Exclude shortest paths between i and other nodes.
- Note: works for weighted and unweighted networks.

Betweenness centrality
- Consider a network with N nodes and m edges (possibly weighted).
- Computational goal: Find $\binom{N}{2}$ shortest paths between all pairs of nodes.
- Traditionally use Floyd-Warshall algorithm.
- Computation time grows as $O(N^3)$.
- See also:
  1. Dijkstra’s algorithm for finding shortest path between two specific nodes,
  2. and Johnson’s algorithm which outperforms Floyd-Warshall for sparse networks: $O(mN + N^2 \log N)$.
- Newman (2001) [4, 5] and Brandes (2001) [1] independently derive much faster algorithms.
- Computation times grow as:
  1. $O(mN)$ for unweighted graphs;
  2. and $O(mN + N^2 \log N)$ for weighted graphs.

Shortest path between node i and all others:
- Consider unweighted networks.
- Use breadth-first search:
  1. Start at node i, giving it a distance d = 0 from itself.
  2. Create a list of all of i’s neighbors and label them being at a distance d = 1.
  3. Go through list of most recently visited nodes and find all of their neighbors.
  4. Exclude any nodes already assigned a distance.
  5. Increment distance d by 1.
  6. Label newly reached nodes as being at distance d.
  7. Repeat steps 3 through 6 until all nodes are visited.
- Record which nodes link to which nodes moving out from i (former are ‘predecessors’ with respect to i’s shortest path structure).
- Runs in $O(m)$ time and gives N − 1 shortest paths.
- Find all shortest paths in $O(mN)$ time.
- Much, much better than naive estimate of $O(mN^2)$.

Newman’s Betweenness algorithm: [4]
1. Set all nodes to have a value $c_{ij} = 0$, j = 1, ..., N (c for count).
2. Select one node i.
3. Find shortest paths to all other N − 1 nodes using breadth-first search.
4. Record the number of equal shortest paths reaching each node.
5. Move through nodes according to their distance from i, starting with the furthest.
6. Travel back towards i from each starting node j, along shortest path(s), adding 1 to every value of $c_{i\ell}$ at each node $\ell$ along the way.
7. Whenever more than one possibility exists, apportion according to total number of shortest paths coming through predecessors.
8. Exclude starting node j and i from increment.
9. Repeat steps 2–8 for every node i and obtain betweenness as $B_j = \sum_{i=1}^{N} c_{ij}$.
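The breadth-first search and back-accumulation steps above can be sketched as follows. This is an illustrative implementation in the style of the Newman and Brandes algorithms, not a transcription of either paper; the function name, the adjacency-dictionary input, and the convention that each ordered pair of endpoints contributes separately (so an undirected pair is counted twice) are assumptions of this sketch.

```python
from collections import deque

def betweenness(adj):
    """Shortest-path betweenness for an unweighted graph {node: [neighbors]}."""
    B = {v: 0.0 for v in adj}
    for i in adj:
        # Breadth-first search from i: distances, shortest-path counts, predecessors.
        dist = {i: 0}
        n_paths = {v: 0 for v in adj}
        n_paths[i] = 1
        preds = {v: [] for v in adj}
        order = []                       # nodes in order of non-decreasing distance
        queue = deque([i])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:        # first time v is reached
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                    n_paths[v] += n_paths[u]
                    preds[v].append(u)
        # Walk back towards i, furthest nodes first, apportioning counts among
        # predecessors in proportion to their number of shortest paths.
        c = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                c[u] += (n_paths[u] / n_paths[w]) * (1.0 + c[w])
            if w != i:                   # exclude the source node itself
                B[w] += c[w]
    return B

# Path 0 - 1 - 2: only the middle node lies between others.
B_path = betweenness({0: [1], 1: [0, 2], 2: [1]})
# 4-cycle: opposite nodes are joined by two tied shortest paths.
B_cycle = betweenness({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]})
```

On the path, node 1 scores 2 (once for each direction of the pair 0, 2). On the 4-cycle, each tied path receives weight 1/2, so every node ends up with a score of 1.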

Newman’s Betweenness algorithm: [4]
- For a pure tree network, $c_{ij}$ is the number of nodes beyond j from i’s vantage point.
- Same algorithm for computing drainage area in river networks (with 1 added across the board).
- For edge betweenness, use exact same algorithm but now
  1. j indexes edges,
  2. and we add one to each edge as we traverse it.
- For both algorithms, computation time grows as $O(mN)$.
- For sparse networks with relatively small average degree, we have a fairly digestible time growth of $O(N^2)$.

Newman’s Betweenness algorithm: [4]
[Figure: page excerpt from M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004) [5], Sec. III (Implementation), A (Shortest-path betweenness). The excerpt explains that edge betweenness for all edges can be computed in time O(mn) for a graph with m edges and n vertices, first for the case where shortest paths from a source form a tree (score each edge as 1 plus the sum of the scores of the edges immediately below it, working up from the leaves), and then for the general case where multiple shortest paths of equal length exist and are given equal weights summing to 1.
Fig. 4 caption: Calculation of shortest-path betweenness. (a) When there is only a single shortest path from a source vertex s (top) to all other reachable vertices, those paths necessarily form a tree, which makes the calculation of the contribution to betweenness from this set of paths particularly simple. (b) For cases in which there is more than one shortest path to some vertices, the calculation is more complex: first calculate the number of distinct paths from the source s to each vertex (numbers on vertices), and then use these to weight the path counts. In either case, the results can be checked by confirming that the sum of the betweennesses of the edges connected to the source vertex equals the total number of reachable vertices, six in each of the cases illustrated.]
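The $O(N^2)$ growth quoted for sparse networks follows from writing the edge count in terms of the average degree. A short worked step, where the symbol $\langle k \rangle$ for average degree is notation assumed here rather than taken from the slides:

```latex
m = \tfrac{1}{2} N \langle k \rangle
\quad\Longrightarrow\quad
O(mN) = O\!\left( \tfrac{1}{2} \langle k \rangle N^{2} \right) = O(N^{2})
\quad \text{when } \langle k \rangle \text{ does not grow with } N.
```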

Important nodes have important friends:
- Define $x_i$ as the ‘importance’ of node i.
- Idea: $x_i$ depends (somehow) on $x_j$ if j is a neighbor of i.
- Recursive: importance is transmitted through a network.
- Simplest possibility is a linear combination:
  $$ x_i \propto \sum_j a_{ji} x_j $$
- Assume further that constant of proportionality, c, is independent of i.
- Above gives $\vec{x} = c A^{\mathrm{T}} \vec{x}$ or $A^{\mathrm{T}} \vec{x} = c^{-1} \vec{x} = \lambda \vec{x}$.
- Eigenvalue equation based on adjacency matrix...
- Note: Lots of despair over size of the largest eigenvalue. [7] Lose sight of original assumption’s non-physicality.

Important nodes have important friends:
- So... solve $A^{\mathrm{T}} \vec{x} = \lambda \vec{x}$.
- But which eigenvalue and eigenvector?
- We, the people, would like:
  1. A unique solution. ✓
  2. $\lambda$ to be real. ✓
  3. Entries of $\vec{x}$ to be real. ✓
  4. Entries of $\vec{x}$ to be non-negative. ✓
  5. $\lambda$ to actually mean something... (maybe too much)
  6. Values of $x_i$ to mean something (what does an observation that $x_3 = 5 x_7$ mean?) (maybe only ordering is informative...) (maybe too much)
  7. $\lambda$ to equal 1 would be nice... (maybe too much)
  8. Ordering of $\vec{x}$ entries to be robust to reasonable modifications of linear assumption (maybe too much)
- We rummage around in bag of tricks and pull out the Perron-Frobenius theorem...
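One common way to solve $A^{\mathrm{T}} \vec{x} = \lambda \vec{x}$ for the dominant eigenvalue is power iteration, which the slides do not prescribe; the sketch below is offered only as an illustration. The function name, the list-of-lists matrix, the normalization by the largest entry, and the example network (a triangle with one pendant node) are all assumptions. Power iteration can fail to settle on bipartite networks, so the example is deliberately non-bipartite.

```python
def eigenvector_centrality(A, iterations=200, tol=1e-12):
    """Power iteration for the dominant eigenpair of A^T, with A[i][j] = a_ij."""
    n = len(A)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        # (A^T x)_i = sum_j a_ji x_j
        new = [sum(A[j][i] * x[j] for j in range(n)) for i in range(n)]
        lam = max(new)   # since max(x) = 1, this estimates the dominant eigenvalue
        if lam == 0:
            break        # no edges: nothing to normalize
        new = [v / lam for v in new]
        converged = max(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    return lam, x

# Triangle 0-1-2 with a pendant node 3 attached to node 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
lam, x = eigenvector_centrality(A)
```

In this example node 0 receives the largest score, nodes 1 and 2 tie by symmetry, and the pendant node 3 scores lowest, which illustrates that the ordering of entries is the most readily interpretable output.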

Perron-Frobenius theorem:
If an N×N matrix A has non-negative entries then:
1. A has a real eigenvalue $\lambda_1 \ge |\lambda_i|$ for i = 2, ..., N.
2. $\lambda_1$ corresponds to left and right 1-d eigenspaces for which we can choose a basis vector that has non-negative entries.
3. The dominant real eigenvalue $\lambda_1$ is bounded by the minimum and maximum row sums of A:
   $$ \min_i \sum_{j=1}^{N} a_{ij} \le \lambda_1 \le \max_i \sum_{j=1}^{N} a_{ij} $$
4. All other eigenvectors have one or more negative entries.
5. The matrix A can make toast.
6. Note: Proof is relatively short for symmetric matrices that are strictly positive [6] and just non-negative [3].

Other Perron-Frobenius aspects:
- Assuming our network is irreducible, meaning there is only one component, is reasonable: just consider one component at a time if more than one exists.
- Irreducibility means largest eigenvalue’s eigenvector has strictly positive entries.
- Analogous to notion of ergodicity: every state is reachable.
- (Another term: Primitive graphs and matrices.)
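As a small worked check of the row-sum bound, consider a star network with one hub and three leaves. This example is chosen here for illustration and does not appear in the slides.

```latex
A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix},
\qquad
A \begin{pmatrix} \sqrt{3} \\ 1 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 3 \\ \sqrt{3} \\ \sqrt{3} \\ \sqrt{3} \end{pmatrix}
= \sqrt{3} \begin{pmatrix} \sqrt{3} \\ 1 \\ 1 \\ 1 \end{pmatrix},
\qquad
\min_i \sum_{j} a_{ij} = 1 \;\le\; \lambda_1 = \sqrt{3} \approx 1.73 \;\le\; 3 = \max_i \sum_{j} a_{ij}.
```

The eigenvector has strictly positive entries, and the hub's score exceeds each leaf's by a factor of $\sqrt{3}$ rather than the factor of 3 that degree alone would suggest.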

Hubs and Authorities
- Generalize eigenvalue centrality to allow nodes to have two attributes:
  1. Authority: how much knowledge, information, etc., held by a node on a topic.
  2. Hubness (or Hubosity or Hubbishness): how well a node ‘knows’ where to find information on a given topic.
- Original work due to the legendary Jon Kleinberg. [2]
- Best hubs point to best authorities.
- Recursive: nodes can be both hubs and authorities.
- More: look for dense links between sets of good hubs pointing to sets of good authorities.
- Known as the HITS algorithm (Hyperlink-Induced Topic Search).

Hubs and Authorities
- Give each node two scores:
  1. $x_i$ = authority score for node i
  2. $y_i$ = hubtasticness score for node i
- As for eigenvector centrality, we connect the scores of neighboring nodes.
- New story I: a good authority is linked to by good hubs.
- Means $x_i$ should increase as $\sum_{j=1}^{N} a_{ji} y_j$ increases.
- Note: indices are ji meaning j has a directed link to i.
- New story II: good hubs point to good authorities.
- Means $y_i$ should increase as $\sum_{j=1}^{N} a_{ij} x_j$ increases.
- Linearity assumption:
  $$ \vec{x} \propto A^{\mathrm{T}} \vec{y} \quad \text{and} \quad \vec{y} \propto A \vec{x} $$
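The two proportionalities above can be iterated directly. The sketch below is a minimal illustration rather than Kleinberg's full procedure: the function name, the fixed iteration count, and the normalization that makes each score vector sum to 1 are assumptions, and the small directed network is hypothetical.

```python
def hits(A, iterations=100):
    """Iterate x proportional to A^T y and y proportional to A x.

    A[i][j] = 1 means node i has a directed link to node j.
    Returns (authority scores x, hub scores y).
    """
    n = len(A)
    x = [1.0] * n   # authority scores
    y = [1.0] * n   # hub scores
    for _ in range(iterations):
        # Authority update: x_i = sum_j a_ji y_j (good hubs point to i).
        x = [sum(A[j][i] * y[j] for j in range(n)) for i in range(n)]
        # Hub update: y_i = sum_j a_ij x_j (i points to good authorities).
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Assumed convention: rescale so each score vector sums to 1.
        sx, sy = sum(x), sum(y)
        if sx == 0 or sy == 0:
            break
        x = [v / sx for v in x]
        y = [v / sy for v in y]
    return x, y

# Node 0 links to nodes 2 and 3; node 1 links to node 2 only.
A = [[0, 0, 1, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
authority, hub = hits(A)
```

Here node 2 is the better authority because both hubs point to it, and node 0 is the better hub because it points to both authorities.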

Hubs and Authorities
- So let’s say we have
  $$ \vec{x} = c_1 A^{\mathrm{T}} \vec{y} \quad \text{and} \quad \vec{y} = c_2 A \vec{x} $$
  where $c_1$ and $c_2$ must be positive.
- Above equations combine to give
  $$ \vec{x} = c_1 A^{\mathrm{T}} c_2 A \vec{x} = \lambda A^{\mathrm{T}} A \vec{x} $$
  where $\lambda = c_1 c_2 > 0$.
- It’s all good: we have the heart of singular value decomposition before us...

We can do this:
- $A^{\mathrm{T}} A$ is symmetric.
- $A^{\mathrm{T}} A$ is positive semi-definite so its eigenvalues are all ≥ 0.
- $A^{\mathrm{T}} A$’s eigenvalues are the squares of A’s singular values.
- $A^{\mathrm{T}} A$’s eigenvectors form a joyful orthogonal basis.
- Perron-Frobenius tells us that only the dominant eigenvalue’s eigenvector can be chosen to have non-negative entries.
- So: linear assumption leads to a solvable system.
- What would be very good: find networks where we have independent measures of node ‘importance’ and see how importance is actually distributed.
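Starting from the relations $\vec{x} = c_1 A^{\mathrm{T}} \vec{y}$ and $\vec{y} = c_2 A \vec{x}$, the same substitution can be run in the other direction to obtain the companion equation for the hub scores. A short derivation, added here as a worked step:

```latex
\vec{y} = c_2 A \vec{x}
        = c_2 A \left( c_1 A^{\mathrm{T}} \vec{y} \right)
        = \lambda A A^{\mathrm{T}} \vec{y},
\qquad \lambda = c_1 c_2 > 0 .
```

With this use of $\lambda$, $\vec{x}$ is an eigenvector of $A^{\mathrm{T}} A$ and $\vec{y}$ is an eigenvector of $A A^{\mathrm{T}}$, each with eigenvalue $1/\lambda$.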

References

[1] U. Brandes. A faster algorithm for betweenness centrality. J. Math. Sociol., 25:163–177, 2001.

[2] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.

[3] K. Y. Lin. An elementary proof of the Perron-Frobenius theorem for non-negative symmetric matrices. Chinese Journal of Physics, 15:283–285, 1977.

[4] M. E. J. Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E, 64(1):016132, 2001.

[5] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):026113, 2004.

[6] F. Ninio. A simple proof of the Perron-Frobenius theorem for positive symmetric matrices. J. Phys. A: Math. Gen., 9:1281–1282, 1976.

[7] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK, 1994.