Centralities (4)


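By: Ralucca Gera, NPS
Excellence Through Knowledge

Some slides from last week that we didn't talk about in class:

PageRank algorithm

• Eigenvector centrality: i's Rank score is the sum (scaled by a constant $\alpha$) of the Rank scores of all pages j that point to i:
  $$x_i = \alpha \sum_{(j,i) \in E} x_j$$
• Then Katz centrality adds the teleportation by adding a small-weight edge to each node (using a weight of $\beta$):
  $$x_i = \alpha \sum_{(j,i) \in E} x_j + \beta$$
• BUT, since a page j may point to many other pages, its prestige score should be shared among these pages (for example, NPS pointing to many sites):
  $$x_i = \alpha \sum_{(j,i) \in E} \frac{x_j}{\deg^{out}_j} + \beta$$

Matrix notation (1)

• Let $\mathbf{x}$ be an n-dimensional column vector of PageRank values, i.e., $\mathbf{x} = (x_1, x_2, \dots, x_n)^T$.
• Let A be the adjacency matrix of our digraph with entries $A_{ij} = 1$ if $(j,i) \in E$ and $A_{ij} = 0$ otherwise.
• Then the PageRank centrality of node i is given by:
  $$x_i = \alpha \sum_j A_{ij} \frac{x_j}{\deg^{out}_j} + \beta \quad \text{or} \quad \mathbf{x} = \alpha A D^{-1} \mathbf{x} + \beta \mathbf{1}$$
  where D is the diagonal matrix of out-degrees, $\mathbf{1}$ is the all-ones vector, and $\alpha$ is the damping factor, generally set to $\alpha = 0.85$ (more on the next slide).

Matrix notation (2)

So the PageRank centrality is given by:
  $$\mathbf{x} = \alpha A D^{-1} \mathbf{x} + \beta \mathbf{1}$$
where $\alpha$ is the damping factor (generally $\alpha = 0.85$).

Recall from eigenvector centrality: $A\mathbf{x} = \lambda\mathbf{x}$, or $\mathbf{x} = \frac{1}{\lambda} A\mathbf{x}$.

• Small values of $\alpha$ (close to 0): the contribution given by paths longer than one hop is small, so centrality scores are mainly influenced by in-degrees.
• Large values of $\alpha$ (close to $1/\lambda$): long paths are devalued smoothly, and centrality scores are influenced by the topology of G.
• Recommendation: choose $\alpha \in (0, 1/\lambda)$; the centrality diverges at $\alpha = 1/\lambda$. Here $\lambda$ is the largest eigenvalue of the matrix in the formula (for PageRank that matrix is $AD^{-1}$, whose largest eigenvalue is 1 once every column sums to 1). The default is usually 0.85.

Overview

Eigenvector centrality
– What makes a vertex central in a network? Lots of one-hop connections to high-centrality vertices.
– How do you describe it mathematically? A weighted degree centrality based on the weight of the neighbors.
– When is it appropriate to use it? For example, when the people you are connected to matter.
– How can we capture it? $x_i = \alpha \sum_j A_{ij} x_j$, where A is the in-degree matrix.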
Katz centrality
– What makes a vertex central in a network? Lots of one-hop connections to high out-degree vertices.
– How do you describe it mathematically? A weighted degree centrality based on the out-degree of the neighbors.
– When is it appropriate to use it? Directed graphs that are not strongly connected.
– How can we capture it? $x_i = \alpha \sum_j A_{ij} x_j + \beta$, where $\beta$ is some small weight for each node.

PageRank
– What makes a vertex central in a network? As above, but distribute the weight that a node has to the nodes it points to.
– How do you describe it mathematically? As above, but distributing the wealth of a node to the ones it points to.
– How can we capture it? $x_i = \alpha \sum_j A_{ij} \frac{x_j}{\deg^{out}_j} + \beta$, or $\mathbf{x} = \alpha A D^{-1} \mathbf{x} + \beta \mathbf{1}$.

PageRank is one of the best-known and most influential algorithms for computing the relevance of web pages.

An example as just described: problem vertex (no outgoing links)

Recall that the problem with vertices of in-degree 0 was solved by using $\beta$.

In-degree matrix (each row shows the in-degree, each column shows the out-degree):
$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}$$

$$x_i = \alpha \sum_j A_{ij} \frac{x_j}{\deg^{out}_j} + \beta \quad \text{or} \quad \mathbf{x} = \alpha A D^{-1} \mathbf{x} + \beta \mathbf{1}$$

Is the formula above well defined? If not, how could we fix the formula or the matrix? (Column 5 of A is all zeros, so node 5 has out-degree 0 and the division by $\deg^{out}_5$ is undefined.)

How can we fix the problem?

1. Remove those pages with no out-links during the PageRank computation, as these pages do not affect the ranking of any other page directly (these pages will get outgoing links in the future).
2. Add a complete set of outgoing links from each such page i to all the pages on the Web.

The second choice is used in PageRank, since the matrix may get updated. In-degree matrix after the fix (each row shows the in-degree, each column shows the out-degree):
$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}$$

How can we fix the out-degree = 0?

With the in-degree matrix A fixed as above, the inverse of the out-degree matrix exists:
$$D^{-1} = \begin{pmatrix}
1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/2 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/6 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/2
\end{pmatrix}$$
so the PageRank centrality formula $\mathbf{x} = \alpha A D^{-1} \mathbf{x} + \beta \mathbf{1}$ is well defined.

By multiplying the in-degree matrix by the inverse of the out-degree matrix we obtain a matrix that:
1. Captures the in- and out-degree per vertex.
2. Divides the centrality of each vertex by its out-degree.

$$A D^{-1} = \begin{pmatrix}
0 & 1/2 & 0 & 0 & 1/6 & 0 \\
1/2 & 0 & 1 & 0 & 1/6 & 0 \\
1/2 & 1/2 & 0 & 1/3 & 1/6 & 0 \\
0 & 0 & 0 & 0 & 1/6 & 1/2 \\
0 & 0 & 0 & 1/3 & 1/6 & 1/2 \\
0 & 0 & 0 & 1/3 & 1/6 & 0
\end{pmatrix}$$

The contribution of node 5 is insignificant, and the formula is now well defined.

Transition probability matrix

• This modified matrix is called the state transition probability matrix. Denote its entries by $p_{ij}$:
$$A D^{-1} = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn}
\end{pmatrix}$$
• $p_{ij}$ represents the transition probability that the surfer in state j (page j) will move to state i (page i). Each column of $AD^{-1}$ therefore sums to 1.
• Here is an example:

A small Internet consisting of just 4 websites

$p_{ij}$ represents the transition probability that the surfer on page j will move to page i:
$$A D^{-1} = (p_{ij}) = \begin{pmatrix}
0 & 0 & 1 & 1/2 \\
1/3 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 1/2 \\
1/3 & 1/2 & 0 & 0
\end{pmatrix}$$

Random surfer: each page has equal probability 1/4 to be chosen as a starting point, so the starting vector is $\mathbf{x} = (1/4, 1/4, 1/4, 1/4)^T$.

The probability that page i will be visited after k steps (i.e., that the random surfer ends up at page i) is equal to the i-th entry of $(AD^{-1})^k \mathbf{x}$.

Simplification for this example: no $\beta$ was involved, since the in-degree of i is greater than 0 for all i.

Source: http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html

Overview (updated)

Eigenvector centrality
– What makes a vertex central in a network? Lots of one-hop connections to high-centrality vertices.
– How do you describe it mathematically? A weighted degree centrality based on the weight of the neighbors.
– When is it appropriate to use it? For example, when the people you are connected to matter.
– How can we capture it? $x_i = \alpha \sum_{(j,i) \in E} x_j$
Katz centrality
– What makes a vertex central in a network? Lots of one-hop connections to high out-degree vertices.
– How do you describe it mathematically? A weighted degree centrality based on the out-degree of the neighbors.
– When is it appropriate to use it? Directed graphs that are not strongly connected.
– How can we capture it? $x_i = \alpha \sum_{(j,i) \in E} x_j + \beta$, where $\beta$ is some initial weight.

PageRank
– What makes a vertex central in a network? As above, but distribute the weight that a node has to the nodes it points to.
– How do you describe it mathematically? As above, but distributing the wealth of a node to the ones it points to.
– How can we capture it? $x_i = \alpha \sum_{(j,i) \in E} \frac{x_j}{\mathrm{outdeg}_j} + \beta$, where $\mathrm{outdeg}_j = \max\{1, \text{out-degree of node } j\}$.

Some comments

• Newman's book gives:
  $$x_i = \alpha \sum_j A_{ij} \frac{x_j}{k_j^{out}} + \beta$$
  where $\alpha$ is called the damping factor, which can be set between 0 and 1 (in general, below the reciprocal of the largest eigenvalue of the matrix in the formula).
• And the formula in the original PageRank is:
  $$x_i = (1 - d) + d \sum_{(j,i) \in E} \frac{x_j}{\mathrm{outdeg}_j}$$
  where d is the damping factor (d = 0.85 as default).
• Gephi: the default value for the Probability parameter is 0.85, and Epsilon is the stopping criterion for eigenvector convergence based on the power method.

Final Points on PageRank

• Fighting spam.
  – A page is important if the pages pointing to it are important.
  – Since it is not easy for a Web page owner to add in-links to his/her page from other important pages, it is not easy to influence PageRank.
• PageRank is a global measure and is query independent.
  – The PageRank values of all the pages are computed and saved off-line rather than at query time, so lookups are fast.
• Criticism:
  – There are companies that can increase your PageRank by adding your page to a cluster and increasing its in-degree.
  – It cannot distinguish between pages that are authoritative in general and pages that are authoritative on the query topic.
• But it works based on the keyword search.

Betweenness Centrality

Some pages are adapted from Dan Ryan, Mills College.

Different types of centralities: Betweenness Centrality, Closeness Centrality, Eigenvector Centrality, Degree Centrality.
Source: Discovering Sets of Key Players in Social Networks, Daniel Ortiz-Arroyo, Springer 2010.

• Intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
• Interactions between two individuals depend on the other individuals in the set of nodes. The nodes in the middle have some control over the paths in the graph.
• Useful for flow, such as information or data packets.

Assumptions

• When there is more than one geodesic, all geodesics are equally likely to be used.
• Flow takes the shortest path (we'll look at alternatives).
• Every pair of nodes in G exchanges a message with equal probability per unit time.
• Question: How many messages, on average, will have passed through each vertex en route to their destination?
  – A node's betweenness is computed over all pairs of nodes, including pairs that contain the node in question.

Meaning of betweenness centrality

• Vertices with high betweenness centrality have influence in the network by virtue of their control over information passing between others.
  – They get to see the messages as they pass through.
  – They could get paid for passing the message along.

Thus they get a lot of power: their removal would disrupt communication.

How would you capture it in a mathematical formula?

Formula for betweenness centrality

$$x_i = \sum_{s,t \in V(G)} n^i_{st}$$
where $n^i_{st}$ is the number of s-t geodesics that i belongs to. When a pair s, t has several geodesics, the assumption that they are equally likely means each one is weighted by $1/g_{st}$, with $g_{st}$ the total number of s-t geodesics: $x_i = \sum_{s,t} n^i_{st} / g_{st}$.

• By default i could equal s or t; in other versions it cannot, and that's where you see 0 values.
• In an undirected graph, an s-t geodesic is the same as a t-s geodesic, so each path gets counted twice.
• It is applicable to directed networks as well.
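
The counting behind the betweenness formula can be tried out on a small graph. The following is a minimal brute-force sketch, not the method used in the lecture: it assumes an unweighted graph stored as a dictionary of neighbor lists (every node must appear as a key), sums over ordered pairs s ≠ t, and weights each pair by the number of s-t geodesics through i divided by the total number of s-t geodesics. The names `bfs_counts`, `betweenness`, and `include_endpoints` are hypothetical, chosen for this illustration; the flag switches between the two conventions mentioned above (i may or may not equal s or t).

```python
from collections import deque


def bfs_counts(adj, source):
    """Breadth-first search from `source`.

    Returns (dist, sigma): hop distance to every reachable node and the
    number of geodesics (shortest paths) from `source` to that node.
    """
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # v is reached for the first time
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on some geodesic
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, include_endpoints=True):
    """Brute-force betweenness over ordered pairs (s, t) with s != t.

    Node i lies on an s-t geodesic exactly when d(s,i) + d(i,t) = d(s,t);
    the number of such geodesics is sigma_s(i) * sigma_i(t).
    """
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in dist_s:                # only targets reachable from s
            if t == s:
                continue
            for i in dist_s:
                if not include_endpoints and i in (s, t):
                    continue
                dist_i, sigma_i = info[i]
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    score[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return score


# A 4-cycle: opposite corners are joined by two geodesics, each weighted 1/2.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness(square, include_endpoints=False))
# {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
```

Because ordered pairs are used, every undirected path is counted twice, which matches the remark above. With `include_endpoints=True` each node of the 4-cycle scores 7.0 instead (6 from the pairs it is an endpoint of, plus 1). The triple loop is cubic in the number of nodes, so this is only meant for small examples; library implementations use faster algorithms and may normalize differently.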