
centrality measures
Survey and comparisons

Authors: Antonio Esposito, Emanuele Pesce
Supervisors: Prof. Vincenzo Auletta, Ph.D. Diodato Ferraioli
April 2015
University of Salerno, Department of Computer Science

outline
∙ Introduction
∙ Centrality measures
∙ Geometric measures
∙ Path-based measures
∙ Spectral measures
∙ Effectiveness of centrality measures
∙ Axioms for centrality
∙ Information retrieval
∙ Conclusions

introduction

centrality of a network
What is a centrality measure?
∙ Given a network, centrality is a quantitative measure that aims at revealing the importance of a node
∙ The more central a node is, the more important it is
∙ Formally, a centrality measure is a real-valued function on the nodes of a graph
What do you mean by center?
∙ There are many intuitive ideas about what a center is, so there are many different centrality measures

definition of center
The center of a star is at the same time:
∙ the node with the largest degree
∙ the node that is closest to the other nodes
∙ the node through which most shortest paths pass
∙ the node with the largest number of incoming paths
∙ the node that maximizes the dominant eigenvector of the graph matrix
Several centrality indices
∙ Different centrality indices capture different properties of a network

centrality: some applications
Centrality is often used to determine:
∙ how influential a person is in a social network
∙ how heavily used a road is in a transportation network
∙ how important a web page is
∙ how important a room is in a building

centrality measures

centrality measures
Geometric measures
∙ Indegree
∙ Closeness
∙ Harmonic
∙ Lin's index
Path-based measures
∙ Betweenness
Spectral measures
∙ The left dominant eigenvector
∙ Seeley's index
∙ Katz's index
∙ PageRank
∙ HITS
∙ SALSA

different centrality measures
Example of different centrality measures applied to the same network

geometric measures
The idea
∙ In geometric measures, importance is a function of distances.
∙ A geometric centrality depends on how many nodes exist at every distance

geometric measures: indegree centrality
∙ Indegree centrality is defined as the number of incoming arcs of a node x:
\[ C_{indegree}(x) = d^{-}(x) \tag{1} \]
∙ The node with the highest indegree is the most important
When to use it?
∙ To identify people whom you can talk to
∙ To identify people who will do favors for you

indegree centrality: examples
Indegree measure applied to different networks

indegree centrality: examples
Indegree centrality can be deceiving because it is a local measure
Indegree centrality does not work well for:
∙ detecting nodes that are brokers between two groups
∙ predicting whether a piece of information reaches a node

geometric measures: closeness centrality
∙ Closeness centrality of x is defined by:
\[ C_{closeness}(x) = \frac{1}{\sum_{d(y,x) < \infty} d(y,x)} \tag{2} \]
∙ To normalize closeness centrality, multiply it by n − 1, the maximum number of other nodes (equivalently, divide the sum of distances by n − 1)
∙ Nodes with an empty coreachable set have centrality 0
∙ The closer a node is to all others, the more important it is
When to use it?
∙ To identify people who tend to be very influential within their local network
∙ They may often not be public figures, but they are often respected locally
∙ To measure how long it will take to spread information from node x to all other nodes

closeness centrality: example
Closeness measure applied to different networks

geometric measures: harmonic centrality
∙ Harmonic centrality of x, with the convention $\infty^{-1} = 0$, is defined by:
\[ C_{harmonic}(x) = \sum_{y \neq x} \frac{1}{d(y,x)} \tag{3} \]
∙ It is correlated with closeness centrality in simple networks, but it also accounts for nodes y that cannot reach x
When to use it?
∙ In the same situations as closeness, but it can also be applied to graphs that are not connected

harmonic centrality: examples
Harmonic and indegree measures applied to the same network (Zachary's karate club)

lin's index
∙ Lin's index of x:
\[ C_{lin}(x) = \frac{|\{y \mid d(y,x) < \infty\}|^{2}}{\sum_{d(y,x) < \infty} d(y,x)} \tag{4} \]
∙ Like closeness, but here nodes with a larger coreachable set are more important
A fact
∙ Surprisingly, Lin's index has been ignored in the literature, even though it seems to provide a reasonable solution for detecting centers in networks

path-based measures
The idea
∙ Path-based measures exploit not only the existence of shortest paths but also take into account all shortest paths (or all paths) coming into a node

path-based measures: betweenness centrality
∙ The intuition behind betweenness centrality is to measure the probability that a random shortest path passes through a given node. Betweenness of x is defined as:
\[ C_{betweenness}(x) = \sum_{y,z \neq x,\ \alpha_{yz} \neq 0} \frac{\alpha_{yz}(x)}{\alpha_{yz}} \tag{5} \]
∙ $\alpha_{yz}$ is the number of shortest paths going from y to z
∙ $\alpha_{yz}(x)$ is the number of those shortest paths that pass through x
∙ The higher the fraction of shortest paths passing through a node, the more important the node is
When to use it?
∙ To identify nodes that have a large influence on the transfer of items through the network

betweenness centrality: examples
Betweenness applied to different networks

betweenness and indegree
Betweenness and indegree measures applied to the same network (Zachary's karate club)

betweenness and closeness
∙ Betweenness and closeness measures applied to the same network
∙ The nodes are sized by degree and colored by betweenness

spectral measures
The idea
∙ In spectral measures, importance is related to the iterated computation of the left dominant eigenvector of the adjacency matrix.
∙ In spectral centrality, the importance of a node is given by the importance of its neighbourhood
∙ The more important the nodes pointing at you are, the more important you are

spectral measures
How many of them?
∙ The dominant eigenvector
∙ Seeley's index
∙ Katz's index
∙ PageRank
∙ HITS
∙ SALSA

spectral measures: some useful notation
Given the adjacency matrix A we can compute:
∙ The $\ell_1$-normalized matrix $\bar{A}$
  ∙ Each element of row i is divided by the sum of the elements of that row
∙ The symmetric graph G′ of the given graph G
  ∙ The transpose $A^T$ of the adjacency matrix A
∙ The number of paths of length k from a node i to another node j
  ∙ $A^k$: in this matrix, each element $a_{ij}$ is the number of paths of length k from node i to node j

spectral measures: the left dominant eigenvector
Dominant eigenvector
∙ Considering the left dominant eigenvector means considering the incoming edges of a node.
∙ To find a node's importance, we perform an iterated computation of:
\[ x_i^{(t+1)} = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ji}\, x_j^{(t)} \tag{6} \]
where:
∙ $x_i^{(0)} = 1 \;\; \forall i$ at step 0
∙ $x^{(t)}$ is the score after t iterations
∙ λ is the dominant eigenvalue of the adjacency matrix A
∙ After each step, the vector x is normalized, and the process is iterated until convergence
∙ Each node starts with the same score. Then, at each iteration, it receives the sum of the scores of the neighbors pointing to it

eigenvector centrality: example
Figure 1 shows degree and eigenvector centrality applied to the same graph
Figure 1: Degree and eigenvector centrality

spectral measures: seeley's index
∙ Why give away all of our importance?
∙ It would make more sense to divide our importance equally among our successors
∙ The process remains the same, but from an algebraic point of view this means normalizing each row of the adjacency matrix:
\[ x_i^{(t+1)} = \frac{1}{\lambda} \sum_{j=1}^{n} \bar{A}_{ji}\, x_j^{(t)} \tag{7} \]
where:
∙ $x_i^{(0)} = 1 \;\; \forall i$ at step 0
∙ $x^{(t)}$ is the score after t iterations
∙ λ is the dominant eigenvalue of the normalized matrix $\bar{A}$
∙ $\bar{A}$ is the row-normalized form of the adjacency matrix
∙ In a graph that is not strongly connected, isolated nodes end up with a null score over the iterations

spectral measures: katz's index
Katz's index weighs all incoming paths to a node and then computes:
\[ x = \mathbf{1} \sum_{i=0}^{\infty} \beta^{i} A^{i} \tag{8} \]
where:
∙ x is the output score vector
∙ $\mathbf{1}$ is the weight vector (for example, all 1s)
∙ β is an attenuation factor ($\beta < 1/\lambda$)
∙ $A^i$ contains, in its generic element (j, k), the number of paths of length i from node j to node k

spectral measures: pagerank
PageRank - a little overview
∙ It is supposed to be how Google's search engine works
∙ It is the unique vector p satisfying
\[ p = (1 - \alpha)\, v\, (I - \alpha \bar{A})^{-1} \]
∙ where:
  ∙ $\alpha \in [0, 1)$ is a damping factor
  ∙ v is a preference vector (a distribution)
  ∙ $\bar{A}$ is the $\ell_1$-normalized adjacency matrix
  ∙ I is the identity matrix
∙ As shown, PageRank and Katz's index differ by a constant factor and by the $\ell_1$ normalization of the adjacency matrix A

spectral measures: eigenvector and pagerank
Figure 2 shows eigenvector and PageRank centrality applied to the same graph
Figure 2: Degree and eigenvector centrality

spectral measures: hits
HITS - a little overview, by Kleinberg
∙ The key here is mutual reinforcement
∙ A node (such as a page) is authoritative if it is pointed to by many good hubs
∙ Hubs: pages containing good lists of authoritative pages
∙ Thus a hub is good if it points to many authoritative pages
∙ We iteratively compute:
  ∙ $a_i$: authoritativeness score (where $a_0 = \mathbf{1}$)
  ∙ $h_i$: hubbiness score
  as follows:
\[ h_{i+1} = a_i A^{T}, \qquad a_{i+1} = h_{i+1} A \]
∙ This process converges to the left dominant eigenvector of the matrix $A^T A$, giving the final score of authoritativeness, called "HITS"

spectral measures: salsa
SALSA was devised by Lempel and Moran
∙ Based on the same mutual reinforcement between authoritativeness and hubbiness, but $\ell_1$-normalizing the matrices A and $A^T$.
∙ Starting value: $a_0 = \mathbf{1}$
∙ $h_{i+1} = a_i \overline{A^{T}}$
∙ $a_{i+1} = h_{i+1} \bar{A}$
∙ Contrary to HITS, SALSA does not need a large number of iterations

spectral measures: some applications
∙ Left dominant eigenvector: the idea on which network structure analysis is based
∙ Seeley's index: feedback networks
∙ Katz's index:
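The spectral measures surveyed above (left dominant eigenvector, Katz's index, PageRank) can be sketched in a few lines of plain Python. This is only a minimal illustration, not the authors' implementation: the 4-node graph, the function names, the fixed iteration count and the values β = 0.25 and α = 0.85 are all assumptions chosen for the example. The dominant eigenvalue of this toy matrix is about 1.618, so β = 0.25 satisfies β < 1/λ.

```python
# Illustrative sketch of three spectral measures on a toy directed graph.
# The graph, the function names and the parameter values are assumptions
# made for this example; they do not come from the slides.

# A[i][j] = 1 means there is an arc from node i to node j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]


def vec_mat(x, M):
    """Row vector times matrix: (xM)_j = sum_i x_i * M[i][j]."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def row_normalize(M):
    """l1-normalize each row (assumes every node has an outgoing arc)."""
    return [[a / sum(row) for a in row] for row in M]


def left_dominant_eigenvector(M, iters=200):
    """Power iteration x <- xM, rescaled to sum 1 after every step."""
    x = [1.0] * len(M)
    for _ in range(iters):
        x = vec_mat(x, M)
        total = sum(x)
        x = [xi / total for xi in x]
    return x


def katz(M, beta, iters=200):
    """Truncated series x = sum_i beta^i * 1 * M^i (needs beta < 1/lambda)."""
    term = [1.0] * len(M)  # the i = 0 term of the series
    x = term[:]
    for _ in range(iters):
        term = [beta * t for t in vec_mat(term, M)]
        x = [xi + ti for xi, ti in zip(x, term)]
    return x


def pagerank(M, alpha=0.85, iters=200):
    """Iterate p <- alpha * p * M_bar + (1 - alpha) * v, with uniform v."""
    n = len(M)
    M_bar = row_normalize(M)
    v = [1.0 / n] * n
    p = v[:]
    for _ in range(iters):
        p = [alpha * s + (1 - alpha) * vi
             for s, vi in zip(vec_mat(p, M_bar), v)]
    return p


eig = left_dominant_eigenvector(A)
kz = katz(A, beta=0.25)
pr = pagerank(A)
print("eigenvector:", [round(s, 3) for s in eig])
print("katz:       ", [round(s, 3) for s in kz])
print("pagerank:   ", [round(s, 3) for s in pr])
```

On this toy graph, node 2 (the node with the most incoming arcs) receives the highest score under all three measures. The PageRank loop iterates the fixed-point form p = αp$\bar{A}$ + (1 − α)v, which has the same solution as the closed form with the matrix inverse.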