Centrality Measures Survey and Comparisons

Authors: Antonio Esposito, Emanuele Pesce
Supervisors: Prof. Vincenzo Auletta, Ph.D. Diodato Ferraioli
April 2015
University of Salerno, Department of Computer Science

outline
∙ Introduction
∙ Centrality measures
∙ Geometric measures
∙ Path-based measures
∙ Spectral measures
∙ Effectiveness of centrality measures
∙ Axioms for centrality
∙ Information retrieval
∙ Conclusions

introduction

centrality of a network
What is a centrality measure?
∙ Given a network, centrality is a quantitative measure that aims at revealing the importance of a node
∙ The more central a node is, the more important it is
∙ Formally, a centrality measure is a real-valued function on the nodes of a graph
What do you mean by center?
∙ There are many intuitive ideas about what a center is, so there are many different centrality measures

definition of center
The center of a star is at the same time:
∙ the node with the largest degree
∙ the node that is closest to the other nodes
∙ the node through which most shortest paths pass
∙ the node with the largest number of incoming paths
∙ the node that maximizes the dominant eigenvector of the graph matrix
Several centrality indices
∙ Different centrality indices capture different properties of a network

centrality: some applications
Centrality is often used to assess:
∙ how influential a person is in a social network
∙ how well used a road is in a transportation network
∙ how important a web page is
∙ how important a room is in a building

centrality measures
Geometric measures
∙ Indegree
∙ Closeness
∙ Harmonic
∙ Lin's index
Path-based measures
∙ Betweenness
Spectral measures
∙ The left dominant eigenvector
∙ Seeley's index
∙ Katz's index
∙ PageRank
∙ HITS
∙ SALSA

different centrality measures
Example of different centrality measures applied to the same network

geometric measures
The idea
∙ In geometric measures the importance is a function of distances.
∙ A geometric centrality depends on how many nodes exist at every distance

geometric measures: indegree centrality
∙ Indegree centrality is defined as the number of incoming arcs of a node x:
$$C_{\text{indegree}}(x) = d^-(x) \qquad (1)$$
∙ The node with the highest indegree is the most important
When to use it?
∙ To identify people whom you can talk to
∙ To identify people who will do favors for you

indegree centrality: examples
Indegree measure applied to different networks

indegree centrality: examples
Indegree centrality can be deceiving because it is a local measure
Indegree centrality does not work well for:
∙ detecting nodes that are brokers between two groups
∙ predicting whether a piece of information reaches a node

geometric measures: closeness centrality
∙ Closeness centrality of x is defined by:
$$C_{\text{closeness}}(x) = \frac{1}{\sum_{d(y,x)<\infty} d(y,x)} \qquad (2)$$
∙ To normalize the closeness centrality, divide the sum of distances by the maximum number of other nodes (n − 1)
∙ Nodes with an empty coreachable set have centrality 0
∙ The closer a node is to all others, the more important it is
When to use it?
∙ To identify people who tend to be very influential within their local network
∙ They may often not be public figures, but they are often respected locally
∙ To measure how long it will take to spread information from node x to all other nodes

closeness centrality: example
Closeness measure applied to different networks

geometric measures: harmonic centrality
∙ Harmonic centrality of x, with the convention $\infty^{-1} = 0$, is defined by:
$$C_{\text{harmonic}}(x) = \sum_{y \neq x} \frac{1}{d(y,x)} \qquad (3)$$
∙ It is correlated with closeness centrality in simple networks, but it also accounts for nodes y that cannot reach x
When to use it?
∙ In the same situations as closeness, but it can also be applied to graphs that are not connected

harmonic centrality: examples
Harmonic and indegree measures applied to the same network (Zachary's karate club)

lin's index
∙ Lin's index of x:
$$C_{\text{lin}}(x) = \frac{|\{y \mid d(y,x) < \infty\}|^2}{\sum_{d(y,x)<\infty} d(y,x)} \qquad (4)$$
∙ Like closeness, but here nodes with a larger coreachable set are more important
A fact
∙ Surprisingly, Lin's index was ignored in the literature, even though it seems to provide a reasonable solution for detecting centers in networks

path-based measures
The idea
∙ Path-based measures exploit not only the existence of shortest paths but actually take into account all shortest paths (or all paths) coming into a node

path-based measures: betweenness centrality
∙ The intuition behind betweenness centrality is to measure the probability that a random shortest path passes through a given node. Betweenness of x is defined as:
$$C_{\text{betweenness}}(x) = \sum_{y,z \neq x,\ \alpha_{yz} \neq 0} \frac{\alpha_{yz}(x)}{\alpha_{yz}} \qquad (5)$$
∙ $\alpha_{yz}$ is the number of shortest paths going from y to z
∙ $\alpha_{yz}(x)$ is the number of those shortest paths that pass through x
∙ The higher the fraction of shortest paths passing through a node, the more important the node is
When to use it?
∙ To identify nodes which have a large influence on the transfer of items through the network

betweenness centrality: examples
Betweenness applied to different networks

betweenness and indegree
Betweenness and indegree measures applied to the same network (Zachary's karate club)

betweenness and closeness
∙ Betweenness and closeness measures applied to the same network
∙ The nodes are sized by degree and colored by betweenness

spectral measures
The idea
∙ In spectral measures the importance is related to the iterated computation of the left dominant eigenvector of the adjacency matrix.
∙ In spectral centrality the importance of a node is given by the importance of its neighbourhood
∙ The more important the nodes pointing at you are, the more important you are

spectral measures
How many of them?
∙ The dominant eigenvector
∙ Seeley's index
∙ Katz's index
∙ PageRank
∙ HITS
∙ SALSA

spectral measures: some useful notation
Given the adjacency matrix A we can compute:
∙ The $\ell_1$-normalized matrix $\bar{A}$
  ∙ Each element of row i is divided by the sum of the elements of that row
∙ The symmetric graph $G'$ of the given graph G
∙ The transpose $A^T$ of the adjacency matrix A
∙ The number of paths of length k from a node i to another node j
  ∙ $A^k$: in such a matrix, each element $a_{ij}$ is the number of paths of length k from node i to node j

spectral measures: the left dominant eigenvector
Dominant eigenvector
∙ Taking the left dominant eigenvector into consideration means considering the incoming edges of a node.
∙ To find out a node's importance, we perform an iterated computation of:
$$x_j^{t+1} = \frac{1}{\lambda} \sum_{i=1}^{n} A_{ij}\, x_i^{t} \qquad (6)$$
where:
∙ $x_i^{0} = 1 \ \forall i$ at step 0
∙ $x^{t}$ is the score after t iterations
∙ $\lambda$ is the dominant eigenvalue of the adjacency matrix A
∙ After that, the vector x is normalized and the process is iterated until convergence
∙ Each node starts with the same score. Then, at each iteration, it receives the sum of the scores of the neighbours pointing to it

eigenvector centrality: example
Figure 1 shows degree and eigenvector centrality applied to the same graph
Figure 1: Degree and eigenvector centrality

spectral measures: seeley's index
∙ Why give away all of our importance?
∙ It would make more sense to divide our importance equally among our successors
∙ The process remains the same, but from an algebraic point of view this means normalizing each row of the adjacency matrix:
$$x_j^{t+1} = \frac{1}{\lambda} \sum_{i=1}^{n} \bar{A}_{ij}\, x_i^{t} \qquad (7)$$
where:
∙ $x_i^{0} = 1 \ \forall i$ at step 0
∙ $x^{t}$ is the score after t iterations
∙ $\lambda$ is the dominant eigenvalue of the adjacency matrix A
∙ $\bar{A}$ is the normalized form of the adjacency matrix
∙ In a graph that is not strongly connected, isolated nodes end up with a null score over the iterations

spectral measures: katz's index
Katz's index weighs all incoming paths to a node and then computes:
$$x = \mathbf{1} \sum_{i=0}^{\infty} \beta^i A^i \qquad (8)$$
where:
∙ x is the output score vector
∙ $\mathbf{1}$ is the weight vector (for example all 1)
∙ $\beta$ is an attenuation factor ($\beta < \frac{1}{\lambda}$)
∙ $A^i$ contains in its generic element $a_{jk}$ the number of paths of length i from j to k

spectral measures: pagerank
PageRank - a little overview
∙ It is supposedly how Google's search engine works
∙ It is the unique vector p satisfying
$$p = (1 - \alpha)\, v\, (I - \alpha \bar{A})^{-1}$$
∙ where:
  ∙ $\alpha \in [0, 1)$ is a damping factor
  ∙ v is a preference vector (a distribution)
  ∙ $\bar{A}$ is the $\ell_1$-normalized adjacency matrix
∙ As shown, PageRank and Katz's index differ only by a constant factor and by the $\ell_1$ normalization of the adjacency matrix A

spectral measures: eigenvector and pagerank
Figure 2 shows eigenvector and PageRank centrality applied to the same graph
Figure 2: Degree and eigenvector centrality

spectral measures: hits
HITS - a little overview by Kleinberg
∙ The key here is mutual reinforcement
∙ A node (such as a page) is authoritative if it is pointed to by many good hubs
∙ Hubs: pages containing good lists of authoritative pages
∙ Then a hub is good if it points to many authoritative pages
∙ We iteratively compute:
  ∙ $a_i$: authoritativeness score (where $a_0 = \mathbf{1}$)
  ∙ $h_i$: hubbiness score
as follows:
$$h_{i+1} = a_i A^T, \qquad a_{i+1} = h_{i+1} A$$
∙ This process converges to the left dominant eigenvector of the matrix $A^T A$, giving the final score of authoritativeness, called "HITS"

spectral measures: salsa
SALSA was devised by Lempel and Moran
∙ Based on the same mutual reinforcement between authoritativeness and hubbiness, but $\ell_1$-normalizing the matrices A and $A^T$.
∙ Starting value: $a_0 = \mathbf{1}$
∙ $h_{i+1} = a_i \overline{A^T}$
∙ $a_{i+1} = h_{i+1} \bar{A}$
∙ Contrary to HITS, SALSA does not need a large number of iterations

spectral measures: some applications
∙ Left dominant eigenvector: the idea on which network structure analysis is based
∙ Seeley's index: feedback networks
∙ Katz's index:
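
The spectral definitions in these slides (left dominant eigenvector, Seeley's index, Katz's index, PageRank) can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the authors' implementation: the 4-node toy graph, the helper names (`vec_mat`, `row_normalize`, `dominant_left_eigenvector`, `attenuated_path_sum`), the values β = 0.5 and α = 0.85, the uniform preference vector, and the truncation of the infinite series at 200 terms are all choices made for the example. Vectors are treated as row vectors multiplied on the left of the matrix, as in equation (8) and in the PageRank formula.

```python
def vec_mat(x, M):
    """Row vector times matrix: (xM)_j = sum_i x_i * M[i][j]."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def row_normalize(A):
    """l1-normalize every row of A; all-zero rows are left as zeros."""
    return [[a / sum(row) if sum(row) else 0.0 for a in row] for row in A]


def dominant_left_eigenvector(A, iters=200):
    """Power iteration x <- xA, rescaled to sum to 1 after each step.

    Assumption: the iteration converges and never yields an all-zero
    vector. This holds for the toy graph below, not for every graph.
    """
    x = [1.0] * len(A)            # every node starts with the same score
    for _ in range(iters):
        x = vec_mat(x, A)
        total = sum(x)
        x = [v / total for v in x]
    return x


def attenuated_path_sum(weights, M, beta, terms=200):
    """Truncated series: weights * sum_{k=0}^{terms-1} beta^k M^k."""
    term = list(weights)          # k = 0 term: weights * M^0
    total = list(term)
    for _ in range(terms - 1):
        term = [beta * v for v in vec_mat(term, M)]
        total = [t + v for t, v in zip(total, term)]
    return total


# Hypothetical toy graph: A[i][j] = 1 if there is an arc i -> j.
# Arcs: 0->1, 0->2, 1->2, 2->0, 3->2 (node 3 has no incoming arcs).
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
n = len(A)
A_bar = row_normalize(A)

# Left dominant eigenvector of A, and Seeley's index (same process on A_bar).
eigen = dominant_left_eigenvector(A)
seeley = dominant_left_eigenvector(A_bar)

# Katz's index, equation (8), with an all-ones weight vector.
# beta = 0.5 is below 1/lambda for this graph (lambda is roughly 1.32).
katz = attenuated_path_sum([1.0] * n, A, beta=0.5)

# PageRank as the same kind of series on A_bar with weights (1 - alpha) * v.
alpha = 0.85
v = [1.0 / n] * n
pagerank = attenuated_path_sum([(1 - alpha) * vi for vi in v], A_bar, beta=alpha)

for name, scores in [("eigenvector", eigen), ("seeley", seeley),
                     ("katz", katz), ("pagerank", pagerank)]:
    print(name, [round(s, 4) for s in scores])
```

Reusing one series routine for both Katz's index and PageRank mirrors the remark in the slides that the two differ by a constant factor and by the normalization of the adjacency matrix. A production implementation would more likely solve the corresponding linear system or iterate to a tolerance instead of using a fixed number of terms.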
