Comparison of Different Generalizations of Clustering Coefficient and Local Efficiency for Weighted Undirected Graphs
Total Page:16
File Type:pdf, Size:1020Kb
LETTER Communicated by Mika Rubinov Comparison of Different Generalizations of Clustering Coefficient and Local Efficiency for Weighted Undirected Graphs Yu Wang Eshwar Ghumare [email protected] Laboratory for Cognitive Neurology, Department of Neurosciences, KU Leuven, Leuven 3000, Belgium Rik Vandenberghe [email protected] Laboratory for Cognitive Neurology, Department of Neurosciences, KU Leuven, Leuven 3000, Belgium, and Alzheimer Research Centre, KU Leuven, Leuven Institute for Neuroscience and Disease, KU Leuven, Leuven 3000, Belgium Patrick Dupont [email protected] Laboratory for Cognitive Neurology, Department of Neurosciences, KU Leuven, Leuven 3000, Belgium; Alzheimer Research Centre, KU Leuven, Leuven Institute for Neuroscience and Disease, KU Leuven, Leuven 3000, Belgium; and Medical Imaging Research Center, KU Leuven and University Hospitals Leuven, Leuven 3000, Belgium Binary undirected graphs are well established, but when these graphs are constructed, often a threshold is applied to a parameter describing the connection between two nodes. Therefore, the use of weighted graphs is more appropriate. In this work, we focus on weighted undirected graphs. This implies that we have to incorporate edge weights in the graph mea- sures, which require generalizations of common graph metrics. After re- viewing existing generalizations of the clustering coefficient and the local efficiency, we proposed new generalizations for these graph measures. To be able to compare different generalizations, a number of essential and useful properties were defined that ideally should be satisfied. We ap- plied the generalizations to two real-world networks of different sizes. As a result, we found that not all existing generalizations satisfy all es- sential properties. Furthermore, we determined the best generalization for the clustering coefficient and local efficiency based on their properties and the performance when applied to two networks. We found that the best generalization of the clustering coefficient is CM,hm, defined in Miya- jima and Sakuragawa (2014), while the best generalization of the local Neural Computation 29, 313–331 (2017) c 2017 Massachusetts Institute of Technology. doi:10.1162/NECO_a_00914 Published under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license. Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00914 by guest on 01 October 2021 314 Y.Wang,E.Ghumare,R.Vandenberghe,andP.Dupont P efficiency is Eloc, proposed in this letter. Depending on the application and the relative importance of sensitivity and robustness to noise, other gen- eralizations may be selected on the basis of the properties investigated in this letter. 1 Introduction A complex system can be modeled as a graph or network, which is com- posed of nodes and edges connecting them. Analysis over a wide range of complex systems has led to a fundamental insight: many complex systems often share certain topological characteristics, and these can be captured by graph-theoretical metrics (Barabasi & Oltvai, 2004; Amaral & Ottino, 2004; Zhang & Horvath, 2005; Bullmore & Sporns, 2009; Fornito, Zalesky, & Breakspear, 2013). The small-world topology for example, has been found in many real-world networks (He, Chen, & Evans, 2007; Opsahl & Panzarasa, 2009; Batalle et al., 2012; Vandenberghe et al., 2013), and is an indication of the cost-efficiency of these networks. While traditional graph analysis uses binary edges to enhance contrast between strong and weak connections, there is an increasing demand for using edge weight, which entails potentially important information. Incor- porating edge weights in the graph analysis calls for generalizations of the graph metrics. While some of these measures can be naturally generalized to a weighted version (e.g., node degree to node strength), others cannot be generalized in a straightforward way. The generalization of clustering coefficient and local efficiency, used to quantify the small-world topology (Watts & Strogatz, 1998; Achard and Bullmore, 2007; Batalle et al., 2012), is far from trivial. The clustering coefficient reflects the tendency that neighbors of a node are also neighbors to each other (Rubinov & Sporns, 2010). The clustering coefficient is high in small-world networks compared to random networks (Watts & Strogatz, 1998). Local efficiency is a measure for the fault toler- ance of the system: it measures how efficient the communication is between neighbors of a node when that node is removed (Latora & Marchiori, 2003). A small-world network features a local efficiency intermediate to that of regular (lattice) and random network (Achard & Bullmore, 2007; Batalle et al., 2012). The two measures are related in a way that the clustering coef- ficient in an undirected network is found to be a reasonable approximation of local efficiency (Latora & Marchiori, 2003). Although it is straightforward to find the neighbors of a node, the ques- tion of how to define their weighted surrogates is far from obvious. Several generalizations have been proposed (Barrat, Barthelemy, Pastor-Satorras, & Vespignani, 2004; Onnela, Saramaki,¨ Kertesz,´ & Kaski, 2005; Zhang & Horvath, 2005; Saramaki,¨ Kivela,¨ Onnela, Kaski, & Kertesz, 2007; Opsahl & Panzarasa, 2009; Miyajima & Sakuragawa, 2014; Rubinov & Sporns, 2010). Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00914 by guest on 01 October 2021 Generalizations of Clustering Coefficient and Local Efficiency 315 Different definitions capture slightly different aspects of the network, yet some of the generalizations are not designed for fully weighted networks (Rubinov & Sporns, 2010; Barrat et al., 2004; Onnela et al., 2005). These generalizations require the removal of the weak or noisy connections be- forehand. A preferable solution is to adapt the equations in such a way that they can be used for a fully weighted network (Zhang & Horvath, 2005; Saramaki¨ et al., 2007; Opsahl & Panzarasa, 2009; Miyajima & Sakuragawa, 2014). In this letter, we first define a number of essential and useful properties that ideally should be satisfied when using a generalized graph measure and explain how we will evaluate them. Then we review the existing gen- eralizations, and we propose new generalizations for the local efficiency for fully weighted undirected networks with no self-connections. Finally, we make a thorough comparison of the different generalizations and apply them to two real-world networks. 2 Methods Assume an undirected weighted network with N nodes and an N × N ad- ( , ) jancy matrix A for which the i j th element aij is 1 if an edge between i and j exists and 0 otherwise. In this work, we assume that no self-connections are = present: aii 0. For a binary undirected network, the clustering coefficient for node i is given by 1 C(i) = a a a , (2.1) k (k − 1) ij ih jh i i j,h with ki the degree of node i. The node degree ki is defined as the number of nodes connected to node i. The local efficiency is defined as 1 − E (i) = a a [d (N )] 1, (2.2) loc k (k − 1) ij ih jh i i i j,h; j=h in which Ni is the subgraph consisting of the neighbors of i excluding node ( ) i itself, and d jh Ni is the length of the shortest path between nodes j and h containing only neighbors of i. If no path containing these neighbors is ( ) =∞ found, d jh Ni . In a weighted network, we define the weight matrix W in which each w element represents the weight ij between node i and node j. All weights are assumed to be positive. If no connection is present, the weight is 0. ∀ ,w = In this work, we focus on networks without self-connections: i ii 0. The node degree ki is calculated based on the presence of connections with nonzero weights irrespective of the amplitude. In case of a fully connected Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00914 by guest on 01 October 2021 316 Y.Wang,E.Ghumare,R.Vandenberghe,andP.Dupont Figure 1: Six possible triangle configurations for node i filled in black. Normal and dashed lines represent strong and weak edges, respectively. = − weighted network with N nodes and no self-connections, ki N 1. The node strength, which takes into account the weight of the connection, is defined by = w , si ij (2.3) j w = Note that for a binary network in which ij 1 if a connection is present = between nodes i and j and 0 otherwise, si ki. 2.1 Properties of Generalized Graph Measures. A generalization ( ) ( ) Gw W of a binary graph measure Gb A should ideally satisfy some prop- erties. Some of these properties are essential, while others may depend on the application. The essential properties are: • General versatility. If the input is given as a binary network, the output of the generalizations should give the same results as the = = binary version: Gw Gb when W A (Miyajima & Sakuragawa, 2014). • Continuity. The graph measure should be continuous, that is, ( ) = ( ) = lim→0 Gw W Gw W in which W W except for one connec- tion weight, which differs by (Miyajima & Sakuragawa, 2014). • Sensitivity. The graph measure should be able to make a distinction between different cases for which the graph measure is designed. Because the clustering coefficient and the local efficiency are defined using triangles, we will evaluate the six possible cases in which the weight is either low or high in one of the connections of the triangle (see Figure 1). • Robustness to noise. The graph measure Gw (W)(i) for node i should be robust when adding noise to the weights of the connections. If we assume a noise matrix ν, which represents additive noise on each wν = w + ν connection weight ij ij ij, and define the mean relative error (in percent) with respect to the noise-free measures as Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00914 by guest on 01 October 2021 Generalizations of Clustering Coefficient and Local Efficiency 317 N ( + ν)( ) − ( )( ) 100 Gw W i Gw W i (ν) = , (2.4) N Gw (W)(i) i=1 then a small value of ||ν|| should lead to a small error value (ν).