Arxiv:2105.06318V1 [Cs.SI] 13 May 2021

arXiv:2105.06318v2 [cs.SI] 29 Aug 2021

Weighted directed clustering: interpretations and requirements for heterogeneous, inferred, and measured networks

Tanguy Fardet (1,2) and Anna Levina (1,2)
(1) University of Tübingen, Tübingen, Germany
(2) Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Weights and directionality of the edges carry a large part of the information we can extract from a complex network. However, many network measures were formulated initially for undirected binary networks. The need to incorporate information about the weights led to multiple extensions, particularly for the definition of the local clustering coefficient discussed here. We uncover that not all of these extensions are fully-weighted; some depend on the degree and thus change substantially when an edge of infinitely small weight is exchanged for the absence of an edge, a feature that is not always desirable. We call these methods "hybrid" and argue that, in many situations, one should prefer fully-weighted definitions. After listing the requirements a method must meet to properly analyze a wide variety of weighted networks, we propose a fully-weighted continuous clustering coefficient that satisfies all the previously proposed criteria while also being continuous with respect to vanishing weights. We demonstrate that the Zhang–Horvath clustering and our new continuous definition provide complementary results and significantly outperform other definitions in multiple relevant conditions. Using synthetic and real-world examples, we show that when the network is inferred, noisy, or very heterogeneous, it is essential to use the fully-weighted clustering definitions.

CONTENTS

I. Introduction
II. Interpretation and purpose of weighted clustering
   A. Desired properties of weighted clustering coefficients
   B. State of the art for weighted clustering
   C. A continuous definition for weighted clustering and closure
   D. Directed weighted clustering
III. The advantages of fully-weighted definitions
   A. Sensitivity to weight-encoded topological features
   B. Continuity and resilience to noise
IV. Application to real world networks
   A. Mouse mesoscale connectome
   B. Decentralized social media: the Fediverse
   C. Using local clustering to infer dynamical properties
V. Discussion
Acknowledgments
References
A. Limitations of other fully-weighted definitions
   1. Holme et al. (2007)
   2. Miyajima and Sakuragawa (2014)
B. Comparison of clustering properties
C. Derivation of the evolution of hybrid clustering coefficients
   1. Barrat
   2. Onnela
   3. Directed versions of the clustering coefficients
D. Closure
   1. Undirected weighted closure
   2. Directed weighted closure
E. Network generation algorithms
   1. Core-periphery network
   2. Watts–Strogatz
F. Real-world networks
   1. Mouse mesoscale connectome
   2. Fediverse mesoscale network
   3. Closure in the shuffled networks
   4. Networks with a high number of single-node triangles

I. INTRODUCTION

The clustering coefficient (CC) was originally introduced for binary undirected networks to quantify strong connectedness within a local neighborhood. It was defined as the fraction of all possible triangles that were realized, i.e. the ratio between the number of triangles in which node i participates ($n_{\Delta,i}$) and the total number of triangles that could theoretically be formed given its degree $d_i$, which is the number of triplets ($n_{T,i}$):

\[
C_i^{\mathrm{bin}} = \frac{n_{\Delta,i}}{n_{T,i}} = \frac{n_{\Delta,i}}{d_i (d_i - 1)} \tag{1}
\]

From a neighbor-centric perspective, it can be seen, perhaps more intuitively, as the probability that two neighbors of a node are connected. However, as network science expanded, more and more graphs were encountered in which directedness and edge weights play a central role. Generalizations of the clustering coefficient were therefore introduced to account for asymmetry in the connections between pairs of nodes or heterogeneity in their strength.

The importance of clustering, including its directed variants, for understanding complex dynamics on networks has been stressed in multiple studies [1–4]. This is notably the case for the middleman motif, which is a marker of feedforward loops in transcriptional networks and of information transfer redundancy, e.g. in neuroscience. More generally, such motifs influence the evolution of dynamical processes on the networks, for instance synchronization patterns, and have been shown to characterize families of networks such as transcription or language networks [2]. Finally, clustering is used in other measurements to assess the small-world propensity of networks [5], and the choice of a specific definition can therefore influence whether the network of interest will register as small-world or not.

In many applications, network topology and weights are measured only up to a certain precision [6, 7]. For example, in neuroscience, functional connectivity networks are measured by indirectly inferring connections from the recorded activity [8, 9]. Accepting the inevitability of noise in a network brings forward new requirements on the network measures, namely that they are stable to the noise and do not change dramatically if the weights are perturbed or weak connections are randomly omitted.

There is no agreement among researchers on which weighted extension of the clustering coefficient is most appropriate. The three predominantly used methods at the moment [10–12] differ in many properties of their definitions. Part of the reason for the absence of a single best weighted clustering lies in the different interpretations of weights in various datasets. Consequently, a different weighted extension might be most appropriate for different data and specific scientific questions. However, to understand which method to use, when, and why, we need to understand their differences precisely.

The difficulty of extending graph measures to weighted networks is not specific to the clustering coefficient but can occur whenever ratios of degrees or path-lengths are involved. We will therefore also discuss a second clustering-related measure, called the closure coefficient and introduced as the fraction of all open walks of length 2 starting from node i that are part of a triangle [13]. This will also enable us to discuss the complementarity of closure and clustering, as the former provides an important complement to analyze the tendency of nodes to form 3- and 4-cliques.

We introduce here a distinction between fully-weighted and hybrid definitions and discuss why, for several classes of networks, fully-weighted and directed definitions should be preferred to other clustering definitions that are currently used for network analysis. We also propose a new definition that obeys additional conditions, including continuity of the results with respect to infinitesimal changes in edge weights, which has significant consequences for the resilience to noise in inferred networks. We demonstrate why fully-weighted methods are essential for measured and inferred networks, which are pervasive in biological fields such as neuroscience, and for networks dealing with flows of information, money, or goods that display a very broad weight distribution.

II. INTERPRETATION AND PURPOSE OF WEIGHTED CLUSTERING

A. Desired properties of weighted clustering coefficients

Weighted measures are crucial for many network types where the binary connectivity is either uninformative (fully connected network) or displays similar or lower heterogeneity compared to the weighted structure. In this study, we focus on two classes of real-world networks: inferred or measured networks, where there can be a large number of spurious (false positive) edges with small weights; and networks associated with flows of information or goods, which often display broad weight distributions. This is notably the case for many networks in neuroscience, and more generally in information, transportation, or other social and economic networks. Weights are essential to understand the dynamical processes that occur in these networks, requiring measures that go beyond the binary structure.

There could be multiple requirements for weighted clustering coefficients [14], depending on the particular question of interest and on the network properties. The main requirements that we considered necessary for a weighted clustering coefficient are:

• normalization ($C_i \in [0, 1]$),
• consistency with the binary definition (for binary networks, it should give back the classical result),
• linearity (scaling by α all edges involving node i and all edges in triangles including node i scales $C_i$ by α),
• continuity (weak influence of the addition or deletion of edges having very small weights, meaning that an edge with infinitesimally small weight should be equivalent to the absence of that edge).

Compared to a previously proposed list of conditions [14], we added a continuity condition but did not include a requirement of a specific normalization factor (the global max(w)): as long as the normalization condition is fulfilled, only the normalization matters. We omitted the last two conditions of Saramäki's paper (invariance under weight permutation and ignorance of weights not participating in any triangle). Although they might be of interest for some specific applications, we do not consider them to be generally desired properties for a clustering coefficient. We also did not require that all weights in a triangle should be accounted for, because this condition is necessarily met if the continuity condition is fulfilled.

Continuity can be expressed mathematically as follows: for a graph $G(V, E)$, if a weighted edge $(u, v, w)$ with $u, v \in V$ and weight $w \in \mathbb{R}$ is added to this graph to form a new graph $G'(V, E')$, with $E' = E + \{(u, v, w)\}$, then the clustering measure is continuous if and only if

\[
\forall i \in V, \quad C_i^{(G')} \xrightarrow[w \to 0]{} C_i^{(G)}.
\]

… triangles in which node i participates. In terms of triangle intensity, this definition was originally written:

\[
C_i^{B} = \frac{1}{s_i (d_i - 1)} \sum_{j \neq k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij} a_{ik} a_{jk}
        = \frac{1}{d_i (d_i - 1)} \sum_{j \neq k} \frac{w_{ij} + w_{ik}}{2 \bar{w}_i}\, a_{ij} a_{ik} a_{jk}, \tag{3}
\]

thus defining the intensity of triangle $\Delta_{ijk}$ as $I^{B}_{\Delta_{ijk}} = \frac{w_{ij} + w_{ik}}{2 \bar{w}_i}\, a_{ij} a_{ik} a_{jk}$, a function of two of the triangle's weights and of the average weight $\bar{w}_i$ of the edges connected to node i.
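The continuity condition can be probed numerically on the degree-dependent form of Eq. (3). The following is a minimal sketch, not the authors' implementation: it assumes an undirected network stored as a symmetric NumPy weight matrix `W` with zero diagonal, takes $a_{ij} = 1$ whenever $w_{ij} > 0$, and sums over ordered pairs $j \neq k$. The function names, the toy graph, and the value of `eps` are illustrative choices that do not come from the paper.

```python
import numpy as np


def binary_clustering(W, i):
    """Eq. (1) with n_triangles counted over ordered neighbor pairs j != k."""
    a = (W > 0).astype(float)
    n = a.shape[0]
    d = a[i].sum()
    if d < 2:
        return 0.0
    n_tri = sum(a[i, j] * a[i, k] * a[j, k]
                for j in range(n) for k in range(n) if j != k)
    return n_tri / (d * (d - 1))


def barrat_clustering(W, i):
    """Second form of Eq. (3): triangle intensities normalized by d_i (d_i - 1)."""
    a = (W > 0).astype(float)
    n = a.shape[0]
    d = a[i].sum()
    if d < 2:
        return 0.0
    w_mean = W[i].sum() / d  # average weight of the edges attached to node i
    total = 0.0
    for j in range(n):
        for k in range(n):
            if j != k:
                intensity = (W[i, j] + W[i, k]) / (2.0 * w_mean)
                total += intensity * a[i, j] * a[i, k] * a[j, k]
    return total / (d * (d - 1))


def add_edge(W, u, v, w):
    """Return a copy of W with the undirected edge (u, v) set to weight w."""
    W_new = W.copy()
    W_new[u, v] = W_new[v, u] = w
    return W_new


# Toy graph G: a triangle 0-1-2 with unit weights, plus an isolated node 3.
W = np.zeros((4, 4))
for u, v in [(0, 1), (0, 2), (1, 2)]:
    W = add_edge(W, u, v, 1.0)

# G': the same graph with one extra edge (0, 3) of vanishing weight.
eps = 1e-9
W_eps = add_edge(W, 0, 3, eps)

c_base = barrat_clustering(W, 0)
c_eps = barrat_clustering(W_eps, 0)
print("Barrat clustering of node 0 in G :", c_base)
print("Barrat clustering of node 0 in G':", c_eps)
```

On this toy graph the value at node 0 is 1 in G and 1/(2 + eps) in G', so it stays close to 1/2 however small `eps` becomes. The jump comes from the degree $d_i$ changing from 2 to 3, which is the kind of degree dependence the text associates with hybrid definitions; a measure satisfying the continuity condition would instead return to its value in G as the added weight vanishes.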
