Centrality Measures and Link Analysis

Centrality Measures and Link Analysis Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester [email protected] http://www.ece.rochester.edu/~gmateosb/ February 21, 2020 Network Science Analytics Centrality Measures and Link Analysis 1 Centrality measures Centrality measures Case study: Stability of centrality measures in weighted graphs Centrality, link analysis and web search A primer on Markov chains PageRank as a random walk PageRank algorithm leveraging Markov chain structure Network Science Analytics Centrality Measures and Link Analysis 2 Quantifying vertex importance I In network analysis many questions relate to vertex importance Example I Q1: Which actors in a social network hold the `reins of power'? I Q2: How authoritative is a WWW page considered by peers? I Q3: The `knock-out' of which genes is likely to be lethal? I Q4: How critical to the daily commute is a subway station? I Measures of vertex centrality quantify such notions of importance ) Degrees are simplest centrality measures. Let's study others Network Science Analytics Centrality Measures and Link Analysis 3 Closeness centrality I Rationale: `central' means a vertex is `close' to many other vertices I Def: Distance d(u; v) between vertices u and v is the length of the shortest u − v path. Oftentimes referred to as geodesic distance I Closeness centrality of vertex v is given by 1 cCl (v) = P u2V d(u; v) ∗ I Interpret v = arg maxv cCl (v) as the most approachable node in G Network Science Analytics Centrality Measures and Link Analysis 4 Normalization, computation and limitations I To compare with other centrality measures, often normalize to [0; 1] Nv − 1 cCl (v) = P u2V d(u; v) I Computation: need all pairwise shortest path distances in G 2 ) Dijkstra's algorithm in O(Nv log Nv + Nv Ne ) time I Limitation 1: sensitivity, values tend to span a small dynamic range ) Hard to discriminate between central and less central nodes I Limitation 2: assumes connectivity, if not cCl (v) = 0 for all v 2 V ) Compute centrality indices in different components Network Science Analytics Centrality Measures and Link Analysis 5 Betweenness centrality I Rationale: `central' node is (in the path) `between' many vertex pairs I Betweenness centrality of vertex v is given by X σ(s; tjv) c (v) = Be σ(s; t) s6=t6=v2V I σ(s; t) is the total number of s − t shortest paths I σ(s; tjv) is the number of s − t shortest paths through v 2 V ∗ I Interpret v = arg maxv cBe (v) as the controller of information flow Network Science Analytics Centrality Measures and Link Analysis 6 Computational considerations I Notice that a s − t shortest path goes through v if and only if d(s; t) = d(s; v) + d(v; t) I Betweenness centralities can be naively computed for all v 2 V by: Step 1: Use Dijkstra to tabulate d(s; t) and σ(s; t) for all s; t Step 2: Use the tables to identify σ(s; tjv) for all v 3 Step 3: Sum the fractions to obtain cBe (v) for all v (O(Nv ) time) I Cubic complexity can be prohibitive for large networks I O(Nv Ne )-time algorithm for unweighted graphs in: U. Brandes, \A faster algorithm for betweenness centrality," Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163-177, 2001 Network Science Analytics Centrality Measures and Link Analysis 7 Eigenvector centrality I Rationale: `central' vertex if ìn-neighbors' are themselves important ) Compare with ìmportance-agnostic' degree centrality I Eigenvector centrality of vertex v is implicitly defined as X cEi (v) = α cEi (u) (u;v)2E I No one points to 1 5 4 I Only 1 points to 2 I Only 2 points to 3, but 2 1 6 more important than 1 I 4 as high as 5 with less links 2 3 I Links to 5 have lower rank I Same for 6 Network Science Analytics Centrality Measures and Link Analysis 8 Eigenvalue problem I Recall the adjacency matrix A and X cEi (v) = α cEi (u) (u;v)2E > I Vector cEi = [cEi (1);:::; cEi (Nv )] solves the eigenvalue problem −1 AcEi = α cEi ) Typically α−1 chosen as largest eigenvalue of A [Bonacich'87] I If G is undirected and connected, by Perron's Theorem then ) The largest eigenvalue of A is positive and simple ) All the entries in the dominant eigenvector cEi are positive −1 2 I Can compute cEi and α via O(Nv ) complexity power iterations AcEi (k) cEi (k + 1) = ; k = 0; 1;::: kAcEi (k)k Network Science Analytics Centrality Measures and Link Analysis 9 Example: Comparing centrality measures I Q: Which vertices are more central? A: It depends on the context I Each measure identifies a different vertex as most central ) None is `wrong', they target different notions of importance Network Science Analytics Centrality Measures and Link Analysis 10 4 4 Example: Comparing centrality measures I Q: Which vertices are more central? A: It depends on the context 4 4 (a)(a) (b) (b) (a)(a) (b) (b) (c)(c) (d) (d) Fig.Fig. 4.4 4.4IllustrationIllustration of (b) of (b) closeness, closeness, (c) (c) betweenness, betweenness, and and (d) (d) eigenvector eigenvector centrality centrality measures measures on on Closeness thethe graph graphBetweenness in (a). in (a). Example Example and andfiguresfigures courtesy courtesy of Ulrik of Ulrik Brandes. Brandes.Eigenvector I Small green vertices are arguably more peripheral ) Less clear how the yellow, dark blue and red vertices compare (c)(c) (d) (d) Network Science Analytics Centrality Measures and Link Analysis 11 Fig.Fig. 4.4 4.4IllustrationIllustration of of (b) (b) closeness, closeness, (c) (c) betweenness, betweenness, and and (d) (d) eigenvector eigenvector centrality centrality measures measures on on thethe graph graph in in (a). (a). Example Example and andfiguresfigures courtesy courtesy of of Ulrik Ulrik Brandes. Brandes. Case study Centrality measures Case study: Stability of centrality measures in weighted graphs Centrality, link analysis and web search A primer on Markov chains PageRank as a random walk PageRank algorithm leveraging Markov chain structure Network Science Analytics Centrality Measures and Link Analysis 12 Centrality measures robustness I Robustness to noise in network data is of practical importance I Approaches have been mostly empirical ) Find average response in random graphs when perturbed ) Not generalizable and does not provide explanations I Characterize behavior in noisy real graphs ) Degree and closeness are more reliable than betweenness I Q: What is really going on? ) Framework to study formally the stability of centrality measures I S. Segarra and A. Ribeiro, \Stability and continuity of centrality measures in weighted graphs," IEEE Trans. Signal Process., 2015 Network Science Analytics Centrality Measures and Link Analysis 13 Definitions for weighted digraphs 5 I Weighted and directed graphs G(V ; E; W ) a b ) Set V of Nv vertices 2 ) Set E ⊆ V × V of edges 3 4 ) Map W : E ! of weights in each edge R++ c I Path P(u; v) is an ordered sequence of nodes from u to v I When weights represent dissimilarities ) Path length is the sum of the dissimilarities encountered I Shortest path length sG (u; v) from u to v `−1 X sG (u; v) := min W (ui ; ui+1) P(u;v) i=0 Network Science Analytics Centrality Measures and Link Analysis 14 Stability of centrality measures I Space of graphs G(V ;E) with (V ; E) as vertex and edge set I Define the metric d(V ;E)(G; H): G(V ;E) × G(V ;E) ! R+ X d(V ;E)(G; H) := jWG (e) − WH (e)j e2E I Def: A centrality measure c(·) is stable if for any vertex v 2 V in any two graphs G; H 2 G(V ;E), then G H c (v) − c (v) ≤ KG d(V ;E)(G; H) I KG is a constant depending on G only I Stability is related to Lipschitz continuity in G(V ;E) I Independent of the definition of d(V ;E) (equivalence of norms) I Node importance should be robust to small perturbations in the graph Network Science Analytics Centrality Measures and Link Analysis 15 Degree centrality I Sum of the weights of incoming arcs X cDe (v) := W (u; v) uj(u;v)2E I Applied to graphs where the weights in W represent similarities I High cDe (v) ) v similar to its large number of neighbors Proposition 1 For any vertex v 2 V in any two graphs G; H 2 G(V ;E), we have that G H jcDe (v) − cDe (v)j ≤ d(V ;E)(G; H) i.e., degree centrality cDe is a stable measure I Can show closeness and eigenvector centralities are also stable Network Science Analytics Centrality Measures and Link Analysis 16 Betweenness centrality I Look at the shortest paths for every two nodes distinct from v ) Sum the proportion that contains node v X σ(s; tjv) c (v) := Be σ(s; t) s6=v6=t2V I σ(s; t) is the total number of s − t shortest paths I σ(s; tjv) is the number of those paths going through v Proposition 2 The betweenness centrality measure cBe is not stable Network Science Analytics Centrality Measures and Link Analysis 17 Instability of betweenness centrality I Compare the value of cBe (v) in graphs G and H G H 1 1 v 1 1 1 1 + v 1 + 1 1 1 1 1 1 1 1 1 G H cBe (v) = 9 cBe (v) = 0 H ) Centrality value cBe (v) = 0 remains unchanged for any > 0 I For small values of , graphs G and H become arbitrarily similar G H 9 = jcBe (v) − cBe (v)j ≤ KG d(V ;E)(G; H) ! 0 ) Inequality is not true for any constant KG Network Science Analytics Centrality Measures and Link Analysis 18 Stable betweenness centrality v v v v v v v I Define G =(V ; E ; W ), V =V nfvg, E =EjV v ×V v , W =W jE v ) G v obtained by deleting from G node v and edges connected to v I Stable betweenness centrality cSBe (v) X cSBe (v) := sG v (s; t) − sG (s; t) s6=v6=t2V ) Captures impact of deleting v on the shortest paths I If v is (not) in the s − t shortest path, sG v (s; t) − sG (s; t) > (=)0 ) Same notion as (traditional)

Load more