Weighted directed clustering: interpretations and requirements for heterogeneous, inferred, and measured networks

Tanguy Fardet¹,² and Anna Levina¹,²
¹ University of Tübingen, Tübingen, Germany
² Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Weights and directionality of the edges carry a large part of the information we can extract from a complex network. However, many network measures were formulated initially for undirected binary networks. The necessity to incorporate information about the weights led to the development of multiple extensions, particularly for the definitions of the local clustering coefficient discussed here. We show that not all of these extensions are fully-weighted; some depend on the degree and thus change substantially when an edge of infinitesimally small weight is replaced by the absence of an edge, a feature that is not always desirable. We call these methods “hybrid” and argue that, in many situations, one should prefer fully-weighted definitions. After listing the requirements a method must meet to properly analyze a wide variety of weighted networks, we propose a fully-weighted continuous clustering coefficient that satisfies all the previously proposed criteria while also being continuous with respect to vanishing weights. We demonstrate that the Zhang–Horvath clustering and our new continuous definition provide complementary results and significantly outperform other definitions in multiple relevant conditions. Using synthetic and real-world examples, we show that when the network is inferred, noisy, or very heterogeneous, it is essential to use the fully-weighted clustering definitions.

arXiv:2105.06318v2 [cs.SI] 29 Aug 2021

CONTENTS

I. Introduction
II. Interpretation and purpose of weighted clustering
   A. Desired properties of weighted clustering coefficients
   B. State of the art for weighted clustering
   C. A continuous definition for weighted clustering and closure
   D. Directed weighted clustering
III. The advantages of fully-weighted definitions
   A. Sensitivity to weight-encoded topological features
   B. Continuity and resilience to noise
IV. Application to real world networks
   A. Mouse mesoscale connectome
   B. Decentralized social media: the Fediverse
   C. Using local clustering to infer dynamical properties
V. Discussion
Acknowledgments
References
A. Limitations of other fully-weighted definitions
   1. Holme et al. (2007)
   2. Miyajima and Sakuragawa (2014)
B. Comparison of clustering properties
C. Derivation of the evolution of hybrid clustering coefficients
   1. Barrat
   2. Onnela
   3. Directed versions of the clustering coefficients
D. Closure
   1. Undirected weighted closure
   2. Directed weighted closure
E. Network generation algorithms
   1. Core-periphery network
   2. Watts–Strogatz
F. Real-world networks
   1. Mouse mesoscale connectome
   2. mesoscale network
   3. Closure in the shuffled networks
   4. Networks with a high number of single-node triangles

I. INTRODUCTION

The clustering coefficient (CC) was originally introduced for binary undirected networks to quantify strong connectedness within a local neighborhood. It was defined as the fraction of all possible triangles that were realized, i.e. the ratio between the number of triangles in which node $i$ participates ($n_{\Delta,i}$) and the total number of triangles that could theoretically be made given its degree $d_i$, which is the number of triplets ($n_{T,i}$):

$$C_i^{bin} = \frac{n_{\Delta,i}}{n_{T,i}} = \frac{n_{\Delta,i}}{d_i(d_i - 1)} \tag{1}$$

From a neighbor-centric perspective, it can be seen perhaps more intuitively as the probability that two neighbors of a node are connected. However, as network science expanded, more and more graphs were encountered in which directedness and edge weights play a central role. Generalizations of the clustering coefficient were therefore introduced to account for asymmetry in the connections between pairs of nodes or heterogeneity in their strength.

The importance of clustering, including its directed variants, for understanding complex dynamics on networks has been stressed in multiple studies [1–4]. This is notably the case for the middleman motif, which is a marker of feedforward loops in transcriptional networks and of information transfer redundancy, e.g. in neuroscience. More generally, such motifs will influence the evolution of dynamical processes on the networks, for instance synchronization patterns, and have been shown to characterize families of networks such as transcription or language networks [2]. Finally, clustering is used in other measurements to assess the small-world propensity of networks [5], and the choice of a specific definition can therefore influence whether the network of interest will register as small-world or not.

In many applications network topology and weights are measured only up to a certain precision [6, 7]. For example, in neuroscience, functional connectivity networks are measured through the indirect inference of connections from the recorded activity [8, 9]. Accepting the inevitability of noise in a network brings forward new requirements on the network measures, namely that they are stable to the noise and do not change dramatically if the weights are perturbed or weak connections are randomly omitted.

There is no agreement among researchers on which weighted extension of the clustering coefficient is most appropriate. The three predominantly used methods at the moment [10–12] differ in many properties of their definitions. Part of the reason for the absence of a single best weighted clustering lies in the different interpretations of weights in various datasets. Consequently, a different weighted extension might be most appropriate for different data and specific scientific questions. However, to understand which method to use, when, and why, we need to understand their differences precisely.

The difficulty of extending graph measures to weighted networks is not specific to the clustering coefficient but can occur whenever ratios of degrees or path-lengths are involved. We will therefore also discuss a second clustering-related measure, called the closure coefficient and introduced as the fraction of all open walks of length 2 starting from node $i$ that are part of a triangle [13]. This will also enable us to discuss the complementarity of closure and clustering, as the former provides an important complement to analyze the tendency of nodes to form 3- and 4-cliques.

We introduce here a distinction between fully-weighted and hybrid definitions and discuss why, for several classes of networks, fully-weighted and directed definitions should be preferred to other clustering definitions that are currently used for network analysis. We also propose a new definition that obeys additional conditions, including continuity of the results with respect to infinitesimal changes in edge weights, which has significant consequences for the resilience to noise in inferred networks. We demonstrate why fully-weighted methods are essential for measured and inferred networks, which are pervasive in biological fields such as neuroscience, and for networks dealing with flows of information, money, or goods that display a very broad weight distribution.

II. INTERPRETATION AND PURPOSE OF WEIGHTED CLUSTERING

A. Desired properties of weighted clustering coefficients

Weighted measures are crucial for many network types where the binary connectivity is either uninformative (fully connected network) or displays similar or lower heterogeneity compared to the weighted structure. In this study, we focus on two classes of real-world networks: inferred or measured networks where there can be a large number of spurious (false positive) edges with small weights; and networks associated with flows of information or goods, which often display broad weight distributions. This is notably the case for many networks in neuroscience, and more generally in information, transportation, or other social and economic networks. Weights are essential to understand the dynamical processes that occur in these networks, requiring measures that go beyond the binary structure.

There could be multiple requirements for weighted clustering coefficients [14] depending on the particular question of interest and on the network properties. The main requirements that we considered necessary for a weighted clustering coefficient are:

• normalization ($C_i \in [0, 1]$),
• consistency with the binary definition (for binary networks, it should give back the classical result),
• linearity (scaling by $\alpha$ all edges involving node $i$ and all edges in triangles including node $i$ scales $C_i$ by $\alpha$),
• continuity (weak influence of the addition or deletion of edges having very small weights, meaning that an edge with infinitesimally small weight should be equivalent to the absence of that edge).

Compared to a previously proposed list of conditions [14], we added a continuity condition but did not include a requirement of a specific normalization factor (the global $\max(w)$) as long as the normalization condition is fulfilled, since only the normalization matters. We omitted the last two conditions of Saramäki's paper (invariance under weight permutation and ignorance of weights not participating in any triangle). Although they might be of interest for some specific applications, we do not consider them to be generally desired properties for a clustering coefficient. We also did not require that all weights in a triangle should be accounted for because this condition is necessarily met if the continuity condition is fulfilled.

Continuity can be expressed mathematically as follows: for a graph $G(V, E)$, if a weighted edge $(u, v, w)$ with $u, v \in V$ and weight $w \in \mathbb{R}$ is added to this graph to form a new graph $G'(V, E')$, with $E' = E + \{(u, v, w)\}$, then the clustering measure is continuous if and only if

$$\forall i \in V, \quad C_i^{(G')} \xrightarrow[w \to 0^+]{} C_i^{(G)}.$$

This condition is crucial to ensure a reasonable behavior of the clustering coefficient in inferred networks.

Though some of the previously proposed weighted clustering coefficient definitions obey most of the required properties, none of them completely fulfills the continuity condition, despite previous claims [15]. This is why we will later propose a new definition that fulfills all aforementioned conditions. An extensive comparison of the properties fulfilled by different clustering definitions can be found in Appendix B.

B. State of the art for weighted clustering

First we introduce and classify the existing weighted clustering coefficient definitions. For all clustering definitions in the main text, we use the following notation: $A$ is an adjacency matrix, $W = \{w_{ij}\}$ is the normalized weight matrix, obtained from the original weight matrix $\tilde{W} = \{\tilde{w}_{ij}\}$ by $w_{ij} = \tilde{w}_{ij} / \max_{i,j}(\tilde{w}_{ij})$.

Hybrid definitions were the first extensions of the binary clustering coefficient definitions. They combine properties associated with the weighted connectivity matrix (i.e. intensity of the triangle) with properties that could already be obtained from the adjacency matrix (i.e. node degrees). These definitions move from an integer counting the number of triangles ($n_\Delta$) to a sum of real numbers (computed as a function of edge weights) that we call “intensities” of triangles ($I_\Delta$). The choice of a particular function for the intensity of the triangles determines the properties of the clustering coefficient.

Two popular hybrid weighted clustering definitions were given by the teams of Barrat [10] and Onnela [11].

For a node $i$ in a graph, the definition from [10] quantifies the fraction of the node's strength that is invested in triangles (see Appendix C 1 for more details):

$$C_i^B = \frac{\left(W A^2\right)_{ii}}{2 s_i (d_i - 1)} = C_i^{bin}\, \frac{\bar{w}_i^{\Delta}}{\bar{w}_i} \tag{2}$$

where $s_i = \sum_{j \neq i} w_{ij}$ is the strength of the node, $\bar{w}_i$ is the average weight of the edges involving $i$, and $\bar{w}_i^{\Delta} = \sum_{j \neq k} \frac{w_{ij} + w_{ik}}{2 n_{\Delta,i}}\, a_{ij} a_{ik} a_{jk}$ is the average weight of edges involving $i$ that are part of a triangle, computed over all triangles in which node $i$ participates. In terms of triangle intensity, this definition was originally written:

$$C_i^B = \frac{\sum_{j \neq k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij} a_{ik} a_{jk}}{2 s_i (d_i - 1)} = \frac{1}{d_i (d_i - 1)} \sum_{j \neq k} \frac{w_{ij} + w_{ik}}{2 \bar{w}_i}\, a_{ij} a_{ik} a_{jk} \tag{3}$$

thus defining the intensity of triangle $\Delta_{ijk}$ as $I^B_{\Delta ijk} = \frac{w_{ij} + w_{ik}}{2 \bar{w}_i}\, a_{ij} a_{ik} a_{jk}$, a function of two of the triangle's weights and of the average weight of the edges connected to node $i$, $\bar{w}_i$.

Proposed a bit later, Onnela's definition [11] scales the binary clustering by the average intensity of the triangles (see Appendix C 2 for more details):

$$C_i^O = \frac{\left(W^{[\frac{1}{3}]}\right)^3_{ii}}{d_i (d_i - 1)} = C_i^{bin}\, \bar{I}^O_{\Delta ijk} \tag{4}$$

with the triangle intensity defined as $I^O_{\Delta ijk} = (w_{ij} w_{ik} w_{jk})^{1/3}$ and the average intensity taken over all triangles in which $i$ participates.

For all hybrid methods the denominator relies on the node's degree, meaning that the addition or deletion of edges will always significantly affect the clustering coefficient even if the edge has an infinitely small weight. Such methods can thus lead to inaccurate results when applied to inferred networks, where a significant fraction of edges are false positives with small weights. In the following we will also demonstrate that they cannot reliably detect the most strongly clustered nodes in structured networks. Only fully-weighted definitions can rise to these challenges.

Fully-weighted definitions are variants of the clustering coefficient that do not include any binary measures (anything that can be derived from the adjacency matrix alone, e.g. degrees). In addition to substituting the number of triangles by the sum of triangle intensities, they also move away from counting triplets, which defined the maximum number of possible triangles $\max(n_{\Delta,i}) = n_{T,i} = d_i(d_i - 1)$. Instead, they introduce the triplet intensity ($I_T$) such that, for a node $i$, $I_{T,i}$ is a real-valued function of the weights associated with $i$.

One of the first fully-weighted definitions of the clustering coefficient was provided by Zhang and Horvath [12] to analyze gene co-expression networks:

$$C_i^Z = \frac{\sum_{j \neq k} w_{ij} w_{ik} w_{jk}}{\sum_{j \neq k} w_{ij} w_{ik}} = \frac{\bar{I}^Z_{\Delta ijk}}{\bar{I}^Z_{T ijk}}\, C_i^{bin}. \tag{5}$$

Note that the fact that the definition can be expressed as a function of the binary clustering does not contradict the fully-weighted nature of the measure, as it stems from a simple recombination of the terms.

This definition can be interpreted as the ratio of the summed intensities $I^Z_{\Delta ijk} = w_{ij} w_{ik} w_{jk}$ of the triangles

$\Delta_{ijk} = (i, j, k)$ to the maximal possible summed intensities $I^{Z(max)}_{\Delta ijk} = I^Z_{T ijk} = w_{ij} w_{ik}$ if all existing triplets $T_{ijk}$ were closed by an edge of weight 1 (the maximal possible weight in the normalized network). This way, if $i$ is involved in a single triangle, the clustering coefficient is equal to the weight of the edge closing the triplet centered on $i$ (see also Table I).

| Coefficient | Config. 1 | Config. 2 | Config. 3 | Config. 4 | Config. 5 | Config. 6 | Config. 7 | Config. 8 |
|---|---|---|---|---|---|---|---|---|
| $C^B$ | 1 | 1 | 1 | 1 | 1/2 | 0 | 1/3 | 1/2 |
| $C^O$ | 0 | 0 | 0 | 0 | 1/3 | 0 | 0 | 0 |
| $C^Z$ | 0 | 1 | 0 | 1 | 1 | 0 | 1/3 | 0 |
| $C$ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |

Table I. Limit values for undirected weighted clustering coefficients of vertex $i$ (full circle) for different weight configurations in graphs with vanishing weights (the column headers of the original table are graph drawings). Solid lines depict edges of weight $w = \max(w) = 1$, dotted lines denote edges with vanishing weight $\epsilon$. Only the new continuous clustering (bottom row) returns values consistent with the continuity condition, whereas the definitions of Barrat ($C^B$), Onnela ($C^O$), and Zhang–Horvath ($C^Z$) deviate from it.

Though this definition does not fulfill the continuity property, we will show that it still provides a consistent interpretation of weighted clustering, as discussed in [16], and is well suited to tackle networks with a large fraction of false positives.

Other fully-weighted definitions that were proposed and discussed since [15, 17, 18] do not bring significant additions compared to Zhang–Horvath's definition while actually losing some of its properties and its straightforward interpretation. They are not considered further in this study; see Appendix A for further explanations.

C. A continuous definition for weighted clustering and closure

For an undirected graph $G$, we define the new continuous clustering of node $i$ as:

$$C_i = \frac{\sum_{j \neq k} I^2_{\Delta ijk}}{\sum_{j \neq k} I_{T ijk}} = \frac{\sum_{j \neq k} \left(\sqrt[3]{w_{ij} w_{jk} w_{ik}}\right)^2}{\sum_{j \neq k} \sqrt{w_{ji} w_{ik}}}. \tag{6}$$

We define the weighted intensity of triangles and triplets, respectively $I_{\Delta ijk} = \sqrt[3]{w_{ij} w_{jk} w_{ik}}$ and $I_{T ijk} = \sqrt{w_{ji} w_{ik}}$, using the geometric mean of the weights involved. Thanks to this, one strong weight in a triangle or triplet cannot compensate for the presence of smaller weights, contrary to what may happen if one uses the arithmetic mean. This provides the desired property that the intensity of triangles and triplets will go to zero if even a single edge weight goes to zero. Note that, though the triangle intensity is defined as the geometric mean of the three weights involved, it is the square of this intensity that is used in the definition. The reason for this choice is twofold: to assign higher influence to large triangles and to ensure the linearity of the coefficient. Importantly, this definition of the triangle intensity $I_\Delta$ assigns the same role to all participating edges. The new clustering could be interpreted as the ratio of the triangle intensity that is invested in strong triangles (given by the sum of the squared intensities of triangles, which increases the importance of strong triangles) to the triplet intensity, which would represent the maximum possible triangle intensity if all weights connecting adjacent nodes were equal to one.

The new continuous definition fulfills all the conditions we put forward above, and to the best of our knowledge it is the only one to do so. Similarly to previous definitions, the new clustering coefficient can also be rewritten in terms of node properties:

$$C_i = \frac{\left(W^{[\frac{2}{3}]}\right)^3_{ii}}{\left(s_i^{[\frac{1}{2}]}\right)^2 - s_i} \tag{7}$$

where $s_i = \sum_{j \neq i} w_{ij}$ is the normalized strength of node $i$, and $W^{[\alpha]} = \{w_{ij}^{\alpha}\}$ and $s_i^{[\alpha]} = \sum_{j \neq i} w_{ij}^{\alpha}$ are the fractional weight matrix and strength for any $\alpha \in \mathbb{R}$.

As for the previous definitions, the continuous clustering can be interpreted as a function of intensities and the binary clustering:

$$C_i = \frac{n_\Delta\, \overline{I^2_{\Delta ijk}}}{n_T\, \bar{I}_{T ijk}} = \frac{\overline{I^2_{\Delta ijk}}}{\bar{I}_{T ijk}}\, C_i^{bin} = \frac{\mathrm{Var}(I_{\Delta ijk}) + \bar{I}^{\,2}_{\Delta ijk}}{\bar{I}_{T ijk}}\, C_i^{bin}, \tag{8}$$

with the means $\bar{I}_{\Delta ijk}$ and $\bar{I}_{T ijk}$ and the variance taken over all the triangles or triplets, respectively, in which node $i$ participates. In the limit where all triangles associated with node $i$ have similar intensities, we can neglect the variance term, leaving $\frac{\bar{I}^{\,2}_{\Delta ijk}}{\bar{I}_{T ijk}}\, C_i^{bin}$. In this case, contrary to the Zhang–Horvath definition (Eq. 5), the absolute value of the intensity matters, not only its ratio to the maximum possible intensity. For a given average triangle intensity, the positive contribution of the variance implies that nodes with more variable intensities, i.e. at least one triangle with a high intensity, will have higher clustering coefficients than nodes with identical triangles of average intensity.

Finally, the global clustering can also be defined in a straightforward fashion. For simplicity, we define $I_{\Delta,i} = \sum_{j \neq k} I^2_{\Delta ijk}$ and $I_{T,i} = \sum_{j \neq k} I_{T ijk}$, leading to $C_i = I_{\Delta,i} / I_{T,i}$. Using this definition, the continuous global clustering is obtained via the formula:

$$C_g = \frac{\sum_i I_{\Delta,i}}{\sum_i I_{T,i}} \tag{9}$$

D. Directed weighted clustering

Fagiolo [19] proposed how to generalize clustering to

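| Mode | Pattern | $n^{(m)}_{\Delta,i}$ | $n^{(m)}_{T,i}$ | $I^{(m)}_{\Delta,i}$ | $I^{(m)}_{T,i}$ |
|---|---|---|---|---|---|
| Cycle | (drawing) | $\left(A^3\right)_{ii}$ | $d_{i,in} d_{i,out} - d_i^{\leftrightarrow} = \sum_{j \neq k} a_{ij} a_{ki}$ | $\left(\left(W^{[\frac{2}{3}]}\right)^3\right)_{ii}$ | $s^{[\frac{1}{2}]}_{i,in}\, s^{[\frac{1}{2}]}_{i,out} - s_i^{\leftrightarrow}$ |
| Middleman | (drawing) | $\left(A A^T A\right)_{ii}$ | $d_{i,in} d_{i,out} - d_i^{\leftrightarrow} = \sum_{j \neq k} a_{ij} a_{ki}$ | $\left(W^{[\frac{2}{3}]}\, W^{[\frac{2}{3}]T}\, W^{[\frac{2}{3}]}\right)_{ii}$ | $s^{[\frac{1}{2}]}_{i,in}\, s^{[\frac{1}{2}]}_{i,out} - s_i^{\leftrightarrow}$ |
| Fan-in | (drawing) | $\left(A^T A^2\right)_{ii}$ | $d_{i,in}(d_{i,in} - 1) = \sum_{j \neq k} a_{ji} a_{ki}$ | $\left(W^{[\frac{2}{3}]T} \left(W^{[\frac{2}{3}]}\right)^2\right)_{ii}$ | $\left(s^{[\frac{1}{2}]}_{i,in}\right)^2 - s_{i,in}$ |
| Fan-out | (drawing) | $\left(A^2 A^T\right)_{ii}$ | $d_{i,out}(d_{i,out} - 1) = \sum_{j \neq k} a_{ij} a_{ik}$ | $\left(\left(W^{[\frac{2}{3}]}\right)^2 W^{[\frac{2}{3}]T}\right)_{ii}$ | $\left(s^{[\frac{1}{2}]}_{i,out}\right)^2 - s_{i,out}$ |

Table II. Definitions of the continuous intensities for each partial mode pattern in directed graphs. Column 1: pattern names; column 2: pattern illustrations; column 3: number of triangles for node $i$; column 4: number of triplets for node $i$; column 5: continuous intensities of the triangles for node $i$; column 6: continuous intensities of the triplets for node $i$. The clustering coefficients associated with each mode $m$ are given by $C_i^{(m)bin} = n^{(m)}_{\Delta,i} / n^{(m)}_{T,i}$ for binary networks and $C_i^{(m)} = I^{(m)}_{\Delta,i} / I^{(m)}_{T,i}$ for the continuous definition.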

directed networks. He defined the different patterns or motifs (shown in Table II) that can exist in these networks, and adapted Onnela's definition [11] as the first weighted directed clustering coefficient. Barrat's definition [10] was generalized to directed graphs in [20] following the same distinction into Fagiolo's cycle, middleman, fan-in, and fan-out motifs.

Similarly, the Zhang–Horvath [12] and continuous definitions can be generalized in a straightforward manner for directed networks. For this, we only need to redefine the intensities of each directed triangle and triplet motif as shown in Table II for the continuous definition and in Table VII for Zhang–Horvath. This simply requires replacing $A$ by $W$ in all expressions of $n^{(m)}_{\Delta,i}$ and $a$ by $w$ in all expressions of $n^{(m)}_{T,i}$. As the total directed clustering is defined as the sum of all modes, we can write it as:

$$C_i^{Z(tot)} = \frac{I^{Z(tot)}_{\Delta,i}}{I^{Z(tot)}_{T,i}} = \frac{\left(W + W^T\right)^3_{ii}}{\sum_{j \neq k} (w_{ij} + w_{ji})(w_{ik} + w_{ki})} \tag{10}$$

Finally, the continuous clustering can be extended for each directed mode (see Table II) and, for the total directed clustering, this leads to:

$$C_i^{(tot)} = \frac{I^{(tot)}_{\Delta,i}}{I^{(tot)}_{T,i}} = \frac{\frac{1}{2}\left(W^{[\frac{2}{3}]} + W^{[\frac{2}{3}]T}\right)^3_{ii}}{\left(s^{[\frac{1}{2}]}_{i,tot}\right)^2 - s_{i,tot} - 2 s_i^{\leftrightarrow}}, \tag{11}$$

with $s_{i,tot} = \sum_j (w_{ij} + w_{ji})$ the total strength, $s^{[\frac{1}{2}]}_{i,tot} = \sum_j \left(w_{ij}^{\frac{1}{2}} + w_{ji}^{\frac{1}{2}}\right)$ the total root strength, and $s_i^{\leftrightarrow} = \sum_j \sqrt{w_{ij} w_{ji}}$ the reciprocal strength.

As for the undirected case, the global clustering coefficient associated with each directed pattern can be obtained via the formula: $C_g^{(m)} = \sum_i I^{(m)}_{\Delta,i} / \sum_i I^{(m)}_{T,i}$.

III. THE ADVANTAGES OF FULLY-WEIGHTED DEFINITIONS

In this section we discuss, for the different clustering methods, the sensitivity to weight-encoded topological features and the stability to noise in network measurements. A previous study [14] already noted that previous definitions did not fulfill the continuity condition by analyzing the behavior of the different coefficients for nodes that participate in a single triangle. Table I illustrates some of these cases and shows that the new continuous definition is the only one to behave as expected.

Yet, we note that the definition of Zhang and Horvath is also very resilient to noise because, except for the corner cases associated with single triangles, its behavior is continuous in all other situations. Moreover, contrary to what was asserted in [14], it provides a perfectly sensible behavior given its interpretation of clustering as the ratio of the triangle intensity $I^Z_{\Delta ijk} = w_{ij} w_{ik} w_{jk}$ to its maximum possible intensity given the weights of node $i$: $I^{Z(max)}_{\Delta ijk} = I^Z_{T ijk} = w_{ij} w_{ik}$ if $w_{jk} = 1$.

Because the definitions from Barrat's and Onnela's teams [10, 11] are the most well-known and (to the best of our knowledge) the only methods implemented in popular graph libraries, we restrict our comparison to Zhang–Horvath's and these two definitions. A more comprehensive discussion of other definitions of the weighted clustering coefficient can be found in Appendix A.
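As a rough illustration of the undirected definitions compared in this section, the matrix forms of Eqs. 4, 5 and 7 can be sketched with NumPy as follows. This is a minimal sketch, not the authors' implementation: the function names (`normalized_weights`, `onnela`, `zhang_horvath`, `continuous`) and the convention of returning 0 when the denominator vanishes are assumptions made here, and the input is assumed to be a symmetric, non-negative weight matrix with at least one edge.

```python
import numpy as np

def normalized_weights(W_raw):
    """Return W = W_raw / max(W_raw) with a zero diagonal (no self-loops)."""
    W = np.array(W_raw, dtype=float)
    np.fill_diagonal(W, 0.0)
    return W / W.max()  # assumes at least one non-zero weight

def _ratio(num, den):
    """Elementwise num / den; 0 where the denominator vanishes (assumed convention)."""
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 1e-12)
    return out

def onnela(W_raw):
    """Hybrid definition, Eq. 4: (W^[1/3])^3_ii / (d_i (d_i - 1))."""
    W = normalized_weights(W_raw)
    d = (W > 0).sum(axis=1).astype(float)  # binary degrees
    num = np.diagonal(np.linalg.matrix_power(np.cbrt(W), 3))
    return _ratio(num, d * (d - 1))

def zhang_horvath(W_raw):
    """Fully-weighted definition, Eq. 5: (W^3)_ii / sum_{j != k} w_ij w_ik."""
    W = normalized_weights(W_raw)
    num = np.diagonal(np.linalg.matrix_power(W, 3))
    den = W.sum(axis=1) ** 2 - (W ** 2).sum(axis=1)
    return _ratio(num, den)

def continuous(W_raw):
    """Continuous definition, Eq. 7: (W^[2/3])^3_ii / ((s_i^[1/2])^2 - s_i)."""
    W = normalized_weights(W_raw)
    num = np.diagonal(np.linalg.matrix_power(W ** (2 / 3), 3))
    den = np.sqrt(W).sum(axis=1) ** 2 - W.sum(axis=1)
    return _ratio(num, den)

# Binary toy graph: a triangle (0, 1, 2) plus a pendant node 3 attached to 1.
A = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]
print(onnela(A), zhang_horvath(A), continuous(A))
```

On a binary graph the three functions should coincide with the classical clustering of Eq. 1, which is the "consistency with the binary definition" requirement listed in Section II A.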

[Figure 1. Panel A: network views for the Zhang, Continuous, Onnela, and Barrat definitions, with a clustering color scale. Panel B: probability density versus clustering for each definition, for center, outer-core, and periphery nodes.]

Figure 1. Only the continuous clustering coefficient uncovers the true structure in the weighted core-periphery network. The network has 11 strongly connected core nodes (black edges) that interact with well-clustered periphery nodes through weaker connection strengths (light-brown edges); see Appendix E 1 for details on the network. A. Graphical view of the network; edge width gives the strength of the connection, node color gives its clustering coefficient. B. Distribution of clustering coefficients for the three types of nodes over 10 realizations of such a core-periphery network. Only the continuous definition differentiates between the central, the outer-core, and the periphery nodes. In all other methods, the clustering coefficients of the 10 “outer-core” and the 22 periphery nodes overlap: Onnela's definition only distinguishes the central node, while the Barrat and Zhang–Horvath definitions do not hint at a core-periphery structure.
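One way to see why a purely ratio-based measure can miss weight-encoded structure is to evaluate Eqs. 5 and 6 by explicit sums on a single triangle in which node 0 has two weak edges while the closing edge is strong. The sketch below is illustrative only: the helper `local_clusterings` and the toy weights are assumptions made here, and the node is assumed to have at least two neighbors so that both denominators are non-zero.

```python
import itertools
import numpy as np

def local_clusterings(W, i):
    """Zhang-Horvath (Eq. 5) and continuous (Eq. 6) clustering of node i.

    Both are computed by explicit sums over ordered pairs j != k of other
    nodes; absent edges simply contribute zero weight.
    """
    others = [n for n in range(len(W)) if n != i]
    zh_num = zh_den = c_num = c_den = 0.0
    for j, k in itertools.permutations(others, 2):
        zh_num += W[i][j] * W[i][k] * W[j][k]                # I^Z of the triangle
        zh_den += W[i][j] * W[i][k]                          # I^Z of the triplet
        c_num += np.cbrt(W[i][j] * W[j][k] * W[i][k]) ** 2   # squared geometric mean
        c_den += np.sqrt(W[i][j] * W[i][k])                  # triplet geometric mean
    return zh_num / zh_den, c_num / c_den

# Node 0 has weak edges (0.1) to nodes 1 and 2, which share a strong edge (1.0).
W = [[0.0, 0.1, 0.1],
     [0.1, 0.0, 1.0],
     [0.1, 1.0, 0.0]]
zh, cont = local_clusterings(W, 0)
print(f"Zhang-Horvath: {zh:.3f}, continuous: {cont:.3f}")
```

For node 0 the Zhang–Horvath value equals the weight of the closing edge, whereas the continuous value is lowered by the weak weights of the node's own edges, so the two coefficients rank weakly and strongly connected nodes differently.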

A. Sensitivity to weight-encoded topological features

Here we investigate how weighted structures can be detected or missed using different clustering definitions. As an example we consider a weighted core-periphery graph (Figure 1). There, core nodes are characterized by both a dense binary connectivity and large weights, whereas periphery nodes display both sparser connectivity (though they still have large degrees) and weaker weights; see Appendix E 1 for more details. We generate 10 realizations of the network and consider the distribution of clustering coefficients for the different types of nodes. The continuous definition leads to distinct clustering values for the different types of nodes, making the true structure of the network clearly visible already in the clustering distributions. Because of their hybrid nature, Barrat's and Onnela's definitions cannot capture this underlying structure, as it is mostly encoded in the weights and not at all in the degrees (core nodes are not binary hubs). Though it is purely weighted, the Zhang–Horvath definition is also not suited to detect this type of weighted structure because its interpretation of a node's clustering only accounts for the relative triangle intensity given the node's weights.

The continuous clustering is sensitive to any topological property that is encoded via specific weight distributions. Furthermore, contrary to the Zhang–Horvath definition, it accounts not only for the ratio of the triangle intensity over the triplet intensity (how strong the triangles are compared to the maximum possible value given the node's weights) but also for the absolute value of the intensity: a weak triangle, even if it corresponds to the highest possible value given the node's weights, will decrease the node's clustering in the continuous definition whereas it increases it with the Zhang–Horvath method. In that sense, the continuous clustering provides a more global evaluation of the clustering coefficient, whereas Zhang–Horvath provides more local information.

The definition from Barrat et al. has several limitations because it is close to being weight-insensitive [11, 21]. It is particularly unsuitable for assessing networks with a potentially large number of low-weight spurious connections or very heterogeneous weight distributions. For this reason, we will mostly leave this clustering aside in the rest of this study.

B. Continuity and resilience to noise

Stability of a network measure to noise is of particular importance for networks that are obtained via experimental measurements, since these are often subject to noise and statistical biases, notably for inferred networks. Methods abiding by the continuity condition are especially resilient to the presence of low-weight spurious edges. Violation of continuity can have significant and pervasive consequences for the inference of network properties in many network structures. We have already seen the simple examples in Table I; here we demonstrate that they are not just corner cases, but occur in larger, real-world networks.

[Figure 2. Panels A, B: ground-truth (G.T.) network combined with ER or SF noise to give the measured network. Panels C, D: edge count versus weight. Panels E–G: distributions of clustering for the Zhang, Onnela, and Continuous definitions. Panel H: R² per clustering method (Z, C, O, B) for conditions A+C, A+D, B+C, B+D.]

Figure 2. Fully-weighted methods are less sensitive to spurious edges. A “measured network” can be represented as the union of a “ground truth” (G.T.), here a Watts–Strogatz network in dark brown, and spurious small-weight connections (“noise” graphs with: A random, or B scale-free connectivity, in red). We assess the influence of the weight distribution of spurious connections (red) by checking weights that are: C all equal and small, or D following an exponential distribution and overlapping with the real weights (dark brown). E–G: ground-truth clustering distribution (filled dark brown) compared to the distributions associated with the measured networks for each method (dashed lines). Weight and noise types, from (A+C) to (B+D), are associated with colors from brown to orange in the same order as in H. H: correlation between the ground-truth clustering and the clustering in measured networks for the indicated spurious edge topology and weights. Fully-weighted clusterings retain most of the correlation for (A+D), with $R^2 > 0.55$, and only lose the original information for (B+D). The results were obtained for 10 realizations of the spurious edges; error bars give confidence intervals.

We illustrate the impact of spurious edges on measured clustering coefficients using the example of Watts–Strogatz small-world networks. We consider different topologies for the subnetwork formed by the spurious edges: either an Erdős–Rényi random network (Figure 2 A), associated with uncorrelated noise, or a scale-free network (Figure 2 B), which would correlate noise with certain nodes in the network. Additionally, weights on the spurious edges could be much smaller than the weights of the actual edges (Figure 2 C) or have an overlapping distribution (Figure 2 D). Both fully-weighted methods are unaffected by low noise (conditions A+C and B+C) and are also less influenced by the spurious edges when their weights are large enough to overlap with the real weight distribution (conditions A+D and B+D). On the other hand, because hybrid methods explicitly depend on the nodes' degrees, they are very susceptible to the presence of spurious edges (Figure 2 G, H).

The difference in behavior between the methods can be easily explained by a first-order expansion. We consider the change in the clustering coefficient of a node $i$ with degree $d_i$ after addition of a spurious edge $e = (i, v)$ with weight $\epsilon \ll 1$. For Barrat's and Onnela's methods, the new clustering coefficient becomes:

$$C_i^{O\prime} = \frac{I^O_{\Delta,i} + O\!\left(\epsilon^{\frac{1}{3}}\right)}{d_i (d_i + 1)} = \frac{d_i - 1}{d_i + 1}\, C_i^O + O\!\left(\epsilon^{\frac{1}{3}}\right) \;\underset{\epsilon \to 0}{\nrightarrow}\; C_i^O$$

$$C_i^{B\prime} = \frac{I^B_{\Delta,i} + \sum_k \frac{\epsilon + w_{ik}}{2}\, a_{vk} a_{ik}}{I^B_{T,i} + s_i + \epsilon\, d_i} = \frac{d_i - 1}{d_i}\, C_i^B + O(\epsilon) \;\underset{\epsilon \to 0}{\nrightarrow}\; C_i^B$$

meaning that, for both methods, the coefficients will deviate from the original clustering $C_i^{O/B}$ by a non-infinitesimal value, even when the perturbation was infinitesimal; see Appendix C for the complete derivation.
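The jump predicted for a hybrid definition can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: a complete graph on four nodes with unit weights, to which a fifth node is attached through a single spurious edge of weight `eps` on node 0; the helper `onnela_and_continuous` is hypothetical and evaluates Eq. 4 and Eq. 7 for one node of an already normalized weight matrix.

```python
import numpy as np

def onnela_and_continuous(W, i):
    """Return (C_i^O from Eq. 4, C_i from Eq. 7) for a normalized symmetric W."""
    W = np.asarray(W, dtype=float)
    d = np.count_nonzero(W[i])  # binary degree of node i
    c_onnela = np.linalg.matrix_power(np.cbrt(W), 3)[i, i] / (d * (d - 1))
    num = np.linalg.matrix_power(W ** (2 / 3), 3)[i, i]
    den = np.sqrt(W[i]).sum() ** 2 - W[i].sum()
    return c_onnela, num / den

eps = 1e-8  # weight of the spurious edge

# Ground truth: complete graph K4 with unit weights, so d_0 = 3 and C_0 = 1.
K4 = np.ones((4, 4)) - np.eye(4)
before = onnela_and_continuous(K4, 0)

# Measured network: same graph plus a spurious edge (0, 4) of weight eps.
W = np.zeros((5, 5))
W[:4, :4] = K4
W[0, 4] = W[4, 0] = eps
after = onnela_and_continuous(W, 0)

print("Onnela     before/after:", before[0], after[0])
print("continuous before/after:", before[1], after[1])
```

In this configuration the spurious neighbor closes no triangle, so Onnela's coefficient should drop by the factor $(d_i - 1)/(d_i + 1) = 1/2$ regardless of how small `eps` is, while the coefficient of Eq. 7 should change only by a term of order $\sqrt{\epsilon}$.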

On the other hand, the continuous clustering becomes

$$\begin{aligned}
C_i' &= \frac{I_{\Delta,i} + \sum_{k \sim i} \left(\sqrt[3]{\epsilon\, w_{vk} w_{ki}}\right)^2}{I_{T,i} + 2 s_i^{[\frac{1}{2}]} \sqrt{\epsilon}} \\
&= C_i \left(1 - \frac{2 s_i^{[\frac{1}{2}]} \sqrt{\epsilon}}{I_{T,i}} + O\!\left(\epsilon^{\frac{2}{3}}\right)\right) \\
&= C_i + O\!\left(\sqrt{\epsilon}\right) \xrightarrow[\epsilon \to 0^+]{} C_i
\end{aligned} \tag{12}$$

showing only an infinitesimal deviation in response to the similarly infinitesimal perturbation.

Similarly, except for the single-triangle cases discussed in Table I, the Zhang–Horvath clustering becomes

$$C_i^{Z\prime} = \frac{I^Z_{\Delta,i} + O(\epsilon)}{I^Z_{T,i} + O(\epsilon)} = C_i^Z + O(\epsilon) \xrightarrow[\epsilon \to 0^+]{} C_i^Z \tag{13}$$

It is worth noting that one continuity issue, associated with nodes participating in only one triangle, occurs for all definitions but the continuous one. Since this situation is pervasive in networks with low degree or low binary clustering, using the continuous clustering definition can be of particular importance in such cases; see Appendix F 4.

IV. APPLICATION TO REAL WORLD NETWORKS

A. Mouse mesoscale connectome

In neuroscience, networks at different scales provide vital information for understanding the brain. Unsurprisingly, connectomics, the mapping of the connections in the nervous system, has gained significant attention and developed dramatically over recent years. Most of the networks in neuroscience are weighted, and all obtained connectivities are either measured or inferred, making them a typical example of the challenges discussed above. The mouse mesoscale connectome [22] is a fascinating example of such networks, both because it provides information about the entire mouse brain and because it contains an evaluation of the probability of false positives for all connections.

Here, we investigate how the choice of the clustering coefficient definition can alter the results. The network is very inhomogeneous, with broadly varying degrees (Figure 3 A). The edges in the mesoscale connectome are assigned p-values that quantify how likely they are to correspond to a real physical connection ($p$ denotes the probability of the connection being a spurious edge). They therefore have different significance levels, with only 13% of all edges having p-values smaller than 0.01, i.e. only 13% of all edges have a probability of being spurious that is lower than 1%. We consider a thresholding procedure where, at each level of the threshold ($p_{max}$), only the edges with smaller p-values are kept. This procedure can be seen as an attempt to remove spurious edges, though no correct threshold level is known. Compared to the hybrid method, both fully-weighted definitions are much less sensitive to thresholding (Figure 3 B, C). Thus we see that the resilience to noise we showed analytically and on toy networks is relevant for a real-world network. Furthermore, the resilience is not limited to the general shape of the distribution but indeed preserves the precise values and ranks: a larger fraction of the nodes displaying high clustering in the full graph are still among the highest ranking nodes in the thresholded graphs when fully-weighted methods are used compared to the hybrid method (Figure 3 E).

The clustering coefficient is a network measure that captures features beyond purely local parameters, such as degree or strength. However, as we see for the mouse connectome, the hybrid method strongly correlates with the average weight associated with a node $i$: $s_i / d_i$. At the same time, the fully-weighted definitions are much less correlated with it (Figure 3 D). As a result, they can bring more independent information regarding weighted network structure than the hybrid method. This trend is even stronger for the Zhang–Horvath definition, which does not account for the absolute intensity of triangles. In contrast, the continuous definition provides an intermediate behavior, as the intensity of triangles often correlates, if only in part, with the average weight associated with the node.

Finally, though the continuous and Zhang–Horvath definitions often provide somewhat similar results, they may differ significantly, e.g. for the cerebellar cortex in Figure 3 F. Combining the results of both methods can thus be informative, for example to single out nodes that possess only weak connections (and will therefore register as weakly clustered for the continuous definition) yet connect to other nodes that are strongly connected (thus registering as strongly clustered for Zhang–Horvath).

B. Decentralized social media: the Fediverse

The Fediverse is a set of federated social media that can communicate via a collection of common protocols, the most well-known being ActivityPub. This network can be seen as a set of alternatives to corporate platforms such as [...] or [...]. Social media on the Fediverse usually promote ideas of decentralization, interoperability, free/libre and open-source software (FLOSS), and the absence of algorithmic filters in favor of human curation and moderation.

We analyze here a snapshot of this network that was obtained in 2018 by Zignani et al. [23] (footnote 1: data available at https://dataverse.mpi-sws.org/dataset.xhtml?persistentId=doi:10.5072/FK2/AMYZGS). Contrary to the original publication, we chose here to look at the

mesoscale level, i.e. at connections between instances (the equivalent of a community on the Fediverse, where at least one, but up to several thousand users can have an account). This mesoscale view leads to a network of weighted directed interactions between communities of strongly connected users. Indeed, users of a single instance (the technical name for a community on the Fediverse) can see and interact with all public messages posted by other members on that same server. At the same time, they can only see a subset of the posts from people on other servers (either because they follow their author or because other members of the instance follow the author or shared this specific post).

[Figure 3 graphic. Panel B: counts versus CC (Zhang), CC (Continuous), and CC (Onnela) for $p_{max}$ = $10^{-4}$, $10^{-2}$, 0.5, and 1. Panel C: % of p-invariant nodes versus fraction of top-clustered nodes (5 %, 10 %, 25 %). Panel D: CC versus average weight ($s_i/d_i$), with reported $R^2$ values of 0.00, 0.20, and 0.79. Panel E: common nodes versus $p_{max}$. Panel F: rank per brain region (Pons, Cortical Subplate, Pallidum, Medulla, Olfactory Areas, Isocortex, Cerebellar Nuclei, Cerebellar Cortex).]

Figure 3. For the mouse connectome different clustering methods give significantly different results. A. Top view of the mouse brain, node size: total-degree, node color: out-degree (lighter colors for higher degrees). B. Distribution of the total clustering coefficients if only edges with the p-value $p < p_{max}$ are preserved. Changes in the clustering distribution are smaller for the fully-weighted definitions than for Onnela. From yellow to black, thresholding keeps respectively 9, 13, 32, and 100% of the original network. C. The fraction of the nodes with highest total clustering (top 5, 10 and 25%) that are preserved across all subsamplings in B for Zhang (brown), continuous (orange), and Onnela (pale yellow). D. Correlation of the three total clustering definitions with average total node weight ($s_i/d_i$) shows that fully-weighted definitions capture additional information beyond degree and strength. E. Fraction of the 10% highest clustering nodes that are common between two of the definitions (full markers on right panel include the central region of the left panel) or among all three definitions (black crosses, central region) as shown on the Venn diagram. F. Clustering ranks of the areas within brain regions (telling which regions contain nodes with high clustering coefficients) can significantly vary depending on the definition (Zhang: brown, continuous: orange, and Onnela: pale yellow).

For each instance $I_1$, an edge towards another instance $I_2$ means that at least one user on $I_1$ follows at least one member of $I_2$. The precise value of the weight associated to this edge gives the fraction of all followers from the source instance that are associated to members of the target instance. The weights are thus a proxy to characterize the fraction of the community's attention that is associated to content produced by another community; this means of course that, for each node, its outgoing strength $s_o$ is equal to one. Note that in this network, information flow occurs in the direction opposite to that of the edge, because the directed edge denotes that the source is paying attention to what the target posts.

The network snapshot contains 3,825 nodes corresponding mostly to instances running , one of the most prominent platforms on the Fediverse. These 3,825 nodes represent more than half of the entire network and are connected via 81,371 edges. Chord diagram of connections between locations shows

that, besides the fact that the position of most instances is unspecified (UNS), the largest hosting countries are Japan, the USA, and France, with Japanese and French communities interacting mostly among themselves, Figure 4 A, B. The network is both very sparse and strongly heterogeneous, with a median degree of 5 but node degrees and sizes varying over 3 to 5 orders of magnitude. This broad distribution has notable implications for the different clustering definitions. For the Zhang–Horvath definition, it increases the likelihood of running into the corner cases of single triangles, increasing the average clustering compared to the other methods. For Onnela's definition, it strongly correlates clustering values to the degree (and thus to the size) of the instance.

As for the mouse network, all weighted methods lead to results that differ strongly from the binary clustering, Figure 4 (D and E). Some of the results from the hybrid method tend to correlate strongly with some first-order properties of the nodes (Figure 4 E) while the fully-weighted methods bring more independent information. The Fediverse network displays a peculiar feature as its fan-out and fan-in clustering differ significantly despite the usual correlation between these two patterns, as can be seen from the comparison of Figure 4 E and F.

Figure 4. Properties of the network of Fediverse instances. A. Chord diagram of connections between locations. B. Spatial representations of the network showing connections that amount to at least 1‰ of the strongest connection; nodes are placed on each country's capital and their sizes represent the number of instances hosted in that country. The zoom on Europe shows all connections in the European subnetwork. C. The count of users per instance follows a heavy-tailed distribution. D. The network displays strong structural clustering, most nodes with non-zero in- and out-degrees displaying binary clustering values close to one, whereas the expected value for an Erdős–Rényi network with the same number of edges would be almost zero (dotted line). E. The median values of the fan-out clustering for different instance sizes show that the strong heterogeneity of the network can have a notable influence for the Onnela method (yellow) whereas Zhang–Horvath (dark brown) and continuous methods (orange) display much weaker correlation with the size of the instance. F. The fan-in motifs have intensities that are several orders of magnitude lower than fan-out and display weaker dependency on the instance size. The binary (gray) clustering coefficient misses the difference between fan-out and fan-in, which predominantly relies on the weights' effect.

C. Using local clustering to infer dynamical properties

Analysis of the clustering coefficient for different structured patterns offers a way to obtain a precise idea of the critical dynamical patterns within a network. To determine the significance of a particular pattern we compare its prominence in the original network with its prominence in null-model networks obtained via appropriate randomization. For the mouse brain, as a randomized control, we take the original network and only shuffle the weights, thus preserving the weight distribution and binary structure. Comparing the actual values in the original graphs to those of randomized graphs, we can see the importance of looking at fully-weighted measures.

Both Zhang–Horvath and continuous definitions identify the preference of the mouse brain network for redundant information flows, whereas the hybrid method of Onnela does not capture this feature. Redundant information transfer in the brain is indeed associated to situations where a signal can be transferred not only directly from one node to another, but also indirectly via a third (middleman) node, as can be seen in middleman, fan-in and fan-out patterns (cf. Table II). This overexpression of

redundant patterns is visible, for the fully-weighted methods, in the high values of middleman and fan-in/fan-out motifs in Figure 5 A.

Figure 5. Different structures of clustering patterns and information flow in the mouse brain and the Fediverse. The original clustering values are compared to graphs with the same adjacency matrix but shuffled edge weights (mouse, 10 uniformly shuffled networks; Fediverse, 200 graphs shuffled across all out-going edges of each node to preserve out-strength normalization). The contours show different density levels of the point clouds associated to the original/shuffled pairs of clustering coefficients. A. The Zhang–Horvath and continuous definitions capture the redundancy of information flow in the mouse brain (middleman and fan-in/fan-out motifs are higher in the original graph). However, the Onnela method does not capture this feature. B. For the Fediverse, only fan-out and middleman motifs are significantly stronger than in the random graphs. Patterns where the original values are significantly greater (resp. smaller) than the randomized ones are marked by the + (resp. −) and the initial of the method (brown, Z for Zhang; orange, C for continuous; yellow, O for Onnela). The original clustering is considered to be significantly higher (resp. lower) if 75% of the points were above (resp. below) the dotted identity line. Additional * denotes one-sided fractions greater than 95%.

The situation for the Fediverse is significantly more complex. Indeed, 98 % of all edges belong to at least one triangle, and some are involved in up to thousands of triangles. A deeper understanding of the clustering requires a precise investigation of how the weight distribution correlates with specific patterns of triangles. For instance, though fan-in and fan-out patterns co-occur and are thus usually correlated, the Fediverse displays an unexpected discrepancy between the weights associated to both patterns, as seen on Figure 5 B, which notably differentiates it from the mouse connectome (see also Appendix F 3).

V. DISCUSSION

In this work, we introduced new directed weighted definitions for clustering analysis. Using analytic derivations, generated network models and real data, we showed that fully-weighted measures display enhanced sensitivity, selectivity, and robustness compared to hybrid measures. To facilitate access to these measures, all clustering methods were implemented in Python and made compatible with the three main libraries in this language (networkx, igraph, and graph-tool) [24].

We highlighted the importance of continuity as a crucial notion for networks with highly heterogeneous weight distribution or numerous spurious edges with low weights. In the previous studies, slight variations on this notion had been introduced mathematically [15] and hinted at by the study of corner cases [14]. However, the continuity property was inadvertently considered only in particular classes of networks [15] (disregarding, for example, small degree cases), erroneously marking previous methods as continuous. The other study [14] asserted discontinuity of previous measures simply as a feature, without discussing its implications or ways to avoid it. Our new proposal of fully continuous clustering methods based on simple mathematical principles and requirements is, therefore, the first to fully solve the issue of continuity. In addition, we show in Appendix D that these principles can be extended to other measures such as the local closure.

We discussed how each weighted definition is associated to a specific interpretation of weighted clustering as a function of the binary clustering, triangle, and triplet intensities (see Appendix B and Table VI for a summary). In turn, these interpretations are associated to specific properties for each clustering coefficient. Combining mathematical analysis and concrete examples, we asserted that fully-weighted measures outperform hybrid ones to evaluate clustering in networks with large numbers of spurious edges with small weights. We expanded the results from previous studies comparing existing weighted definitions [14–16, 21], extending their definition to directed networks and providing complete mathematical justifications to previous observations regarding linearity and continuity.

Our analyses show that either of the fully-weighted definitions may be preferred depending on the network properties. For networks with large degrees, the Zhang–Horvath definition may be preferred if the number of low-weight spurious edges is very high as it is least susceptible to noise. In networks with heterogeneous weight distributions where the absolute value of triangle strength is of interest, we showed that the continuous definition provides more relevant results than that of Zhang and Horvath. Indeed, in networks involving fluxes of goods or matter as well as for information processing (e.g. in brain or telecommunication networks), one may be interested in the absolute amount that can be transferred between nodes, making nodes with small weights of little relevance. A similar issue occurs for networks where many nodes participate in a single triangle (see Appendix F 4), making the Zhang–Horvath clustering non-continuous. In such networks, the use of the Zhang–Horvath definition is likely to assign high clustering to single-triangle nodes with low weights, whereas the continuous definition will not, as was shown on Figure 1.

Finally, we illustrated the usefulness of weighted clustering methods on the example of a connectome and a decentralized social network. The fully-weighted methods were especially suitable to reveal key differences in their weighted structural properties. Indeed, we showed that middleman, fan-in, and fan-out patterns, characteristic of pathways enabling redundant information transfer between nodes, were overexpressed in the mouse brain. In the Fediverse, the methods revealed an unexpected discrepancy between fan-in and fan-out modes, probably associated with social interaction patterns that would mandate further investigation.

Acknowledgments

The research was funded by a Humboldt Research Fellowship for Postdoctoral Researchers and a Sofja Kovalevskaja Award from the Alexander von Humboldt Foundation, endowed by the Federal Ministry of Education and Research.

[1] O. Mason and M. Verwoerd, Graph theory and networks in Biology, IET Systems Biology 1, 89 (2007).
[2] S. E. Ahnert and T. M. A. Fink, Clustering signatures classify directed networks, Physical Review E 78, 036112 (2008).
[3] X.-J. Zhang, B. Gu, X.-M. Guan, Y.-B. Zhu, and R.-L. Lv, Cascading failure in scale-free networks with tunable clustering, International Journal of Modern Physics C 27, 1650093 (2016).
[4] Y.-W. Li, Z.-H. Zhang, D. Fan, Y.-R. Song, and G.-P. Jiang, Influence of Clustering on Network Robustness Against Epidemic Propagation, in Science of Cyber Security, Lecture Notes in Computer Science, edited by F. Liu, S. Xu, and M. Yung (Springer International Publishing, 2018) pp. 19–33.
[5] S. F. Muldoon, E. W. Bridgeford, and D. S. Bassett, Small-World Propensity and Weighted Brain Networks, Scientific Reports 6, 22057 (2016).
[6] J.-G. Young, G. T. Cantwell, and M. E. J. Newman, Bayesian inference of network structure from unreliable data, Journal of Complex Networks 8, 10.1093/comnet/cnaa046 (2021), https://academic.oup.com/comnet/article-pdf/8/6/cnaa046/36509950/cnaa046.pdf.
[7] R. Guimerà and M. Sales-Pardo, Missing and spurious interactions and the reconstruction of complex networks, Proceedings of the National Academy of Sciences 106, 22073 (2009), https://www.pnas.org/content/106/52/22073.full.pdf.
[8] M. Fox and M. Greicius, Clinical applications of resting state functional connectivity, Frontiers in Systems Neuroscience 4, 19 (2010).
[9] J. G. Orlandi, O. Stetter, J. Soriano, T. Geisel, and D. Battaglia, Transfer entropy reconstruction and labeling of neuronal connections from simulated calcium imaging, PloS one 9, e98842 (2014).
[10] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani, The architecture of complex weighted networks, Proceedings of the National Academy of Sciences 101, 3747 (2004).
[11] J.-P. Onnela, J. Saramäki, J. Kertész, and K. Kaski, Intensity and coherence of motifs in weighted complex networks, Physical Review E 71, 065103 (2005), arXiv:cond-mat/0408629.
[12] B. Zhang and S. Horvath, A General Framework for Weighted Gene Co-Expression Network Analysis, Statistical Applications in Genetics and Molecular Biology 4, 10/bc8bw3 (2005).
[13] H. Yin, A. R. Benson, and J. Leskovec, The Local Closure Coefficient: A New Perspective On Network Clustering, in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (ACM, Melbourne VIC Australia, 2019) pp. 303–311.
[14] J. Saramäki, M. Kivelä, J.-P. Onnela, K. Kaski, and J. Kertész, Generalizations of the clustering coefficient to weighted complex networks, Physical Review E 75, 027105 (2007).
[15] Y. Wang, E. Ghumare, R. Vandenberghe, and P. Dupont, Comparison of Different Generalizations of Clustering Coefficient and Local Efficiency for Weighted Undirected Graphs, Neural Computation 29, 313 (2016).
[16] G. Kalna and D. J. Higham, Clustering Coefficients for Weighted Networks (2006) p. 7.
[17] P. Holme, S. M. Park, B. J. Kim, and C. R. Edling, Korean university life in a network perspective: Dynamics of a large affiliation network, Physica A: Statistical Mechanics and its Applications 373, 821 (2007), arXiv:cond-mat/0411634.
[18] K. Miyajima and T. Sakuragawa, Continuous and robust clustering coefficients for weighted and directed networks, arXiv:1412.0059 [physics] (2014).
[19] G. Fagiolo, Clustering in complex directed networks, Physical Review E 76, 026107 (2007), arXiv:physics/0612169.
[20] G. Clemente and R. Grassi, Directed clustering in weighted networks: A new perspective, Chaos, Solitons & Fractals 107, 26 (2018), arXiv:1706.07322.
[21] I. E. Antoniou and E. T. Tsompa, Statistical Analysis of Weighted Networks, Discrete Dynamics in Nature and Society 2008, 1 (2008).

[22] S. W. Oh, J. A. Harris, L. Ng, B. Winslow, N. Cain, S. Mihalas, Q. Wang, C. Lau, L. Kuan, A. M. Henry, M. T. Mortrud, B. Ouellette, T. N. Nguyen, S. A. Sorensen, C. R. Slaughterbeck, W. Wakeman, Y. Li, D. Feng, A. Ho, E. Nicholas, K. E. Hirokawa, P. Bohn, K. M. Joines, H. Peng, M. J. Hawrylycz, J. W. Phillips, J. G. Hohmann, P. Wohnoutka, C. R. Gerfen, C. Koch, A. Bernard, C. Dang, A. R. Jones, and H. Zeng, A mesoscale connectome of the mouse brain, Nature 508, 207 (2014).
[23] M. Zignani, C. Quadri, S. Gaito, H. Cherifi, and G. P. Rossi, The footprints of a "mastodon": How a decentralized architecture influences online social relationships, in IEEE Conference on Computer Communications Workshops 2019 (IEEE, 2019) pp. 472–477.
[24] T. Fardet, NNGT 2.3.0 (2021).
[25] T. Opsahl and P. Panzarasa, Clustering in weighted networks, Social Networks 31, 155 (2009).
[26] D. J. Watts and S. H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393, 440 (1998).
[27] H. Yin, A. R. Benson, and J. Ugander, Measuring directed triadic closure with closure coefficients, Network Science 8, 551 (2020).
[28] A. Cho, J. Shin, S. Hwang, C. Kim, H. Shim, H. Kim, H. Kim, and I. Lee, WormNet v3: A network-assisted hypothesis-generating server for Caenorhabditis elegans, Nucleic Acids Research 42, W76 (2014).

Appendix A: Limitations of other fully-weighted definitions

See Table III for a complete comparison of the fully-weighted clustering definitions.

1. Holme et al. (2007)

Most studies consider the definition of Holme et al. [17] to be:

$$C_i^{H} = \frac{\sum_{jk} w_{ij} w_{ik} w_{jk}}{\max_{ij}(w_{ij}) \sum_{jk} w_{ij} w_{jk}} \tag{A1}$$

which would make it inconsistent with the binary definition. However, the discussion in their paper states that consistency was one of their requirements, which leads us to think that they actually meant to define it as

$$C_i^{H} = \frac{\sum_{j\neq k} w_{ij} w_{ik} w_{jk}}{\max_{ij}(w_{ij}) \sum_{j\neq k} w_{ij} w_{jk}} \tag{A2}$$

which would make it equal to the definition from Zhang and Horvath [12]; we will therefore not consider A1 here.

2. Miyajima and Sakuragawa (2014)

The authors define a multitude of generalized clustering coefficients based on the use of an arbitrary function $h: \mathbb{R}^2 \to \mathbb{R}$ such that:

$$C_i^{M,h} = \frac{\sum_{jk} h\left(h(w_{ij}, w_{ik}), w_{jk}\right)}{\sum_{j\neq k} h\left(h(w_{ij}, w_{jk}), \max_{lm}(w_{lm})\right)} \tag{A3}$$

More specifically, Wang et al. [15] argue in favor of the use of a specific version using the harmonic mean:

$$C_i^{M,hm} = \frac{\sum_{jk} \dfrac{2}{\frac{1}{2}\left(\frac{1}{w_{ij}} + \frac{1}{w_{ik}}\right) + \frac{1}{w_{jk}}}}{\sum_{j\neq k} \dfrac{2}{\frac{1}{2}\left(\frac{1}{w_{ij}} + \frac{1}{w_{ik}}\right) + \frac{1}{\max_{lm}(w_{lm})}}} \tag{A4}$$

However, this definition suffers from three major shortcomings:

• despite what is asserted by the authors, it does not fulfill the continuity condition;

• it is not locally linear, meaning that two nodes that have the same neighborhood but with all weights differing by a factor λ will not have the ratio of their clustering coefficients equal to λ;

• it introduces an undesired asymmetry in the definition of the triangle intensity: for a given triangle $\Delta_{ijk}$, the computed intensity will be different for each node as it depends on which one is considered as $i$.

| Definition | Weight configurations (left to right) |
| $C^Z$ | 0, 1, 0, 1, 1, 0, 1/3, 0 |
| $C^H$ | 0, 1, 0, 1, 1, 0, 1/3, 0 |
| $C^{M,hm}$ | 0, 1, 0, 1, 1, 0, 1/3, 0 |
| $C$ | 0, 0, 0, 0, 1, 0, 0, 0 |

Table III. Undirected weighted clustering coefficients of vertex $i$ (full circle) for different weight configurations. Solid lines depict edges of weight $w = \max(w) = 1$, whereas dotted lines are associated to edges with vanishing weight $\epsilon$. Only the new continuous clustering (bottom row) displays the required properties, compared to the definitions of Zhang ($C^Z$), Holme ($C^H$), and Miyajima ($C^{M,hm}$).

Appendix B: Comparison of clustering properties

Multiple properties of the main definitions for the clustering coefficient are listed in Table IV. These properties depend on the definitions of triangle and triplet intensity that are recapitulated in Table V.

Property Barrat Onnela Miyajima Zhang Continuous Consistent with binary definition X X X X X Normalized (C ∈ [0, 1]) X X X X X All weights participate to I∆ XXXX All weights participate equally to I∆ XXX Linear against local scaling of the weights X X X Sensitive to weight permutations X X X X For a triangle ∆ = (i, j, k), I∆ = IT f(wjk)X Continuous X

Table IV. Comparison of the different properties of the different clustering coefficients. Zhang–Horvath and the continuous definitions fulfill the maximum number of desirable properties. An X means that the method possesses this property.
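The continuity property listed in Table IV can be illustrated numerically. The following is a minimal sketch, not the NNGT implementation: it assumes a symmetric weight matrix with zero diagonal, weights normalized by the maximum weight, the Zhang–Horvath intensities $I_{\Delta,i} = (W^3)_{ii}$ and $I_{T,i} = s_i^2 - \sum_j w_{ij}^2$, and the continuous intensities $I_{\Delta,i} = \left((W^{[2/3]})^3\right)_{ii}$ and $I_{T,i} = (s_i^{[1/2]})^2 - s_i$. The function names are hypothetical.

```python
import numpy as np

def _normalized(W):
    """Return W as a float array rescaled so that the largest weight is 1."""
    W = np.asarray(W, dtype=float)
    return W / W.max()

def _safe_ratio(num, den):
    """Element-wise num / den, set to 0 where the triplet intensity is not positive."""
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)
    return out

def zhang_horvath_clustering(W):
    """Zhang-Horvath clustering of every node of an undirected weighted graph."""
    W = _normalized(W)
    triangles = np.diagonal(W @ W @ W)            # sum_{j != k} w_ij w_jk w_ki
    strength = W.sum(axis=1)
    triplets = strength**2 - (W**2).sum(axis=1)   # sum_{j != k} w_ij w_ik
    return _safe_ratio(triangles, triplets)

def continuous_clustering(W):
    """Continuous clustering of every node of an undirected weighted graph."""
    W = _normalized(W)
    W23 = W ** (2 / 3)                            # element-wise power
    triangles = np.diagonal(W23 @ W23 @ W23)      # sum (w_ij w_jk w_ki)^(2/3)
    triplets = np.sqrt(W).sum(axis=1) ** 2 - W.sum(axis=1)  # sum sqrt(w_ij w_ik)
    return _safe_ratio(triangles, triplets)

# Node 0 takes part in a single triangle through two vanishing edges.
eps = 1e-6
W = np.array([[0.0, eps, eps],
              [eps, 0.0, 1.0],
              [eps, 1.0, 0.0]])
c_zh = zhang_horvath_clustering(W)
c_cont = continuous_clustering(W)
print(c_zh[0], c_cont[0])
```

For node 0, the Zhang–Horvath value stays at 1 however small `eps` is, while the continuous value scales as `eps ** (1/3)` and therefore vanishes with the weights, which is the single-triangle behavior discussed for the two fully-weighted definitions.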

| Definition | Triangle ($I_{\Delta ijk}$) | Triplet ($I_{Tijk}$) |
| Barrat | $a_{ij} a_{ik} a_{jk} \dfrac{w_{ij} + w_{ik}}{2\bar{w}_i}$ | $d_i(d_i - 1)$ |
| Onnela | $(w_{ij} w_{ik} w_{jk})^{1/3}$ | $d_i(d_i - 1)$ |
| Miyajima (hm) | $\dfrac{2}{\frac{1}{2}\left(\frac{1}{w_{ij}} + \frac{1}{w_{ik}}\right) + \frac{1}{w_{jk}}}$ | $\dfrac{2}{\frac{1}{2}\left(\frac{1}{w_{ij}} + \frac{1}{w_{ik}}\right) + \frac{1}{\max_{lm}(w_{lm})}}$ |
| Zhang | $w_{ij} w_{ik} w_{jk}$ | $w_{ij} w_{ik}$ |
| Continuous | $\sqrt[3]{w_{ij} w_{jk} w_{ik}}^{\,2}$ | $\sqrt{w_{ij} w_{ik}}$ |

Table V. Comparison of the formula for triangle and triplet intensity among the different clustering definitions for undirected networks.

The different formulas for the intensities lead to different interpretations of weighted clustering (Table VI). Barrat and Zhang only quantify ratios of triangle and triplet strength, while Onnela and the continuous definitions are sensitive to the absolute value of the triangle intensity. The latter provides an intermediate between Zhang and Onnela as it reacts to both the ratio of intensities and the absolute value of the triangle intensity.

| Definition | Barrat | Onnela | Zhang | Continuous |
| Formula | $C_i^{bin}\,\dfrac{\bar{w}_i^{\Delta}}{\bar{w}_i}$ | $C_i^{bin}\,\bar{I}^{O}_{\Delta ijk}$ | $C_i^{bin}\,\dfrac{\bar{I}^{Z}_{\Delta ijk}}{\bar{I}^{Z}_{Tijk}}$ | $C_i^{bin}\,\dfrac{\bar{I}^{2}_{\Delta ijk}}{\bar{I}_{Tijk}}$ |

Table VI. Comparison of the interpretation of the different clustering definitions for undirected networks.

Appendix C: Derivation of the evolution of hybrid clustering coefficients

1. Barrat

Upon addition to a graph $G(N, E)$ of an edge $(i, v)$ of weight $\epsilon$, giving $G'(N, E' = E + \{(i, v)\})$, one can compute the evolution of the initial clustering coefficient $C_i^{B}$ to its new value $C_i^{B\prime}$ by rewriting the definition from [10] as in [14]:

$$C_i^{B} = \frac{1}{d_i(d_i - 1)} \sum_{j\neq k} \frac{w_{ij} + w_{ik}}{2\bar{w}_i}\, a_{ij} a_{ik} a_{jk} = \frac{1}{d_i(d_i - 1)} \sum_{j\neq k} \frac{w_{ij}}{\bar{w}_i}\, a_{ij} a_{ik} a_{jk} \tag{C1}$$

From this, we can deduce the following relationship between Barrat's definition and the binary clustering coefficient $C_i^{bin}$:

$$C_i^{B} = \frac{n_{\Delta,i}\, \bar{w}_i^{\Delta} / \bar{w}_i}{d_i(d_i - 1)} = C_i^{bin}\, \frac{\bar{w}_i^{\Delta}}{\bar{w}_i} \tag{C2}$$

with $n_{\Delta,i}$ the number of triangles in which node $i$ participates and $\bar{w}_i^{\Delta}$ the average weight associated to edges connected to $i$ and participating in a triangle.

Notice that this expression also explains why the definition from Barrat is so close to the binary clustering: for networks where weights are either rather homogeneous or where they are not strongly correlated to triangles, $\bar{w}_i^{\Delta}$ and $\bar{w}_i$ become very close as the number of triangles per node increases.

Using equation C2, the new clustering can be defined as:

$$\begin{aligned}
C_i^{B\prime} &= C_i^{bin\prime}\, \frac{\bar{w}_i^{\Delta\prime}}{\bar{w}_i'} \\
&= \frac{n'_{\Delta,i}}{d_i(d_i + 1)} \cdot \frac{\dfrac{n_{\Delta,i}\, \bar{w}_i^{\Delta} + (n'_{\Delta,i} - n_{\Delta,i})\,\epsilon}{n'_{\Delta,i}}}{\dfrac{d_i \bar{w}_i + \epsilon}{d_i + 1}} \\
&= \frac{n_{\Delta,i}\, \bar{w}_i^{\Delta}}{d_i^2\, \bar{w}_i} + O(\epsilon) \\
&= \frac{d_i - 1}{d_i}\, C_i^{B} + O(\epsilon)
\end{aligned} \tag{C3}$$
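The limit obtained for Barrat's definition can be checked numerically. The sketch below is only an illustration under stated assumptions: an undirected graph given by a symmetric weight matrix with zero diagonal, Barrat's coefficient written as $\frac{1}{s_i(d_i-1)}\sum_{j\neq k}\frac{w_{ij}+w_{ik}}{2}a_{ij}a_{ik}a_{jk}$, and a new edge of weight $\epsilon$ attached to a previously isolated node so that no new triangle is created. The function name and the example weights are hypothetical.

```python
import numpy as np

def barrat_clustering(W):
    """Barrat clustering for a symmetric weight matrix with zero diagonal."""
    W = np.asarray(W, dtype=float)
    A = (W > 0).astype(float)               # binary adjacency matrix
    strength = W.sum(axis=1)
    degree = A.sum(axis=1)
    # By symmetry in (j, k), sum_{j != k} (w_ij + w_ik)/2 a_ij a_ik a_jk = (W A A)_ii
    triangles = np.diagonal(W @ A @ A)
    triplets = strength * (degree - 1)
    out = np.zeros_like(triangles)
    np.divide(triangles, triplets, out=out, where=triplets > 0)
    return out

# Node 0 has degree 3 and belongs to triangles (0, 1, 2) and (0, 2, 3).
W = np.zeros((5, 5))
for (a, b), w in {(0, 1): 0.5, (0, 2): 1.0, (0, 3): 0.8,
                  (1, 2): 0.7, (2, 3): 0.3}.items():
    W[a, b] = W[b, a] = w
d0 = 3
c_before = barrat_clustering(W)[0]

# Add an edge of infinitesimal weight between node 0 and the isolated node 4.
eps = 1e-9
W_perturbed = W.copy()
W_perturbed[0, 4] = W_perturbed[4, 0] = eps
c_after = barrat_clustering(W_perturbed)[0]

print(c_before, c_after, (d0 - 1) / d0 * c_before)
```

The perturbed value matches $(d_i - 1)/d_i$ times the original one rather than the original coefficient itself, which is the finite jump that the derivation predicts for this hybrid definition.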

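2. Onnela

As for the other definitions, the clustering from [11] can be defined as a function of the binary clustering:

$$C_i^{O} = \frac{n_{\Delta}\, \bar{I}^{O}_{\Delta ijk}}{d_i(d_i - 1)} = C_i^{bin}\, \bar{I}^{O}_{\Delta ijk} \tag{C4}$$

From this, one can define the evolution upon addition of an edge $(i, v)$ of weight $\epsilon$ as:

$$\begin{aligned}
C_i^{O\prime} &= C_i^{bin\prime}\, \bar{I}^{O\prime}_{\Delta ijk} \\
&= \frac{n'_{\Delta}}{d_i(d_i + 1)} \cdot \frac{n_{\Delta}\, \bar{I}^{O}_{\Delta ijk} + O\left(\epsilon^{1/3}\right)}{n'_{\Delta}} \\
&= \frac{n_{\Delta}}{d_i(d_i + 1)}\, \bar{I}^{O}_{\Delta ijk} + O\left(\epsilon^{1/3}\right) \\
&= \frac{d_i - 1}{d_i + 1}\, C_i^{O} + O\left(\epsilon^{1/3}\right)
\end{aligned} \tag{C5}$$

3. Directed versions of the clustering coefficients

To generalize the Zhang–Horvath definition of clustering [12] to directed graphs, we use the same approach as proposed in Fagiolo [19]. These definitions are visible in Table VII together with the directed definitions associated to Barrat [10] and Onnela [11], respectively defined in [20] and [19].

Appendix D: Closure

Closure was introduced in [13] as a complementary measure of clustering for binary undirected networks.

1. Undirected weighted closure

From the Zhang–Horvath definition of clustering, closure can be generalized in a fully-weighted but non-continuous way as

$$H_i^{0} = \frac{\sum_{j\neq k} w_{ij} w_{jk} w_{ki}}{\sum_{j\neq k\neq i} w_{ij} w_{jk}} = \frac{\left(W^3\right)_{ii}}{\sum_j w_{ij}(s_j - w_{ij})} \tag{D1}$$

again comparing the triangle intensities to their maximum possible value if all open walks of length two were closed into a triangle by an edge of weight 1.

Finally, it can also be defined in a continuous way as:

$$H_i = \frac{\sum_{j\neq k} \sqrt[3]{w_{ij} w_{jk} w_{ki}}^{\,2}}{\sum_{j\neq k\neq i} \sqrt{w_{ij} w_{jk}}} = \frac{\left(\left(W^{[2/3]}\right)^3\right)_{ii}}{\sum_j \left(\left(W^{[1/2]}\right)^2\right)_{ij} - s_i} \tag{D2}$$

2. Directed weighted closure

For directed networks, contrary to [27], we only consider the extension of the undirected measure to directed walks and therefore define only four variants of the directed closure, two for outgoing walks (cycle-out, CO, and fan-out, FO) and two for incoming walks (cycle-in, CI, and fan-in, FI).

We define the weighted version either directly using the weights, as in the Zhang–Horvath definition for the clustering coefficient:

$$H^{0}_{i,CO} = \frac{\sum_{j\neq k} w_{ij} w_{jk} w_{ki}}{\sum_{j\neq k\neq i} w_{ij} w_{jk}} = \frac{\left(W^3\right)_{ii}}{\sum_j w_{ij}(s_{j,out} - w_{ij})} \tag{D3}$$

$$H^{0}_{i,CI} = \frac{\sum_{j\neq k} w_{kj} w_{ji} w_{ik}}{\sum_{j\neq k\neq i} w_{kj} w_{ji}} = \frac{\left(\left(W^T\right)^3\right)_{ii}}{\sum_j w_{ji}(s_{j,in} - w_{ji})} \tag{D4}$$

$$H^{0}_{i,FO} = \frac{\sum_{j\neq k} w_{ij} w_{jk} w_{ik}}{\sum_{j\neq k\neq i} w_{ij} w_{jk}} = \frac{\left(W^2 W^T\right)_{ii}}{\sum_j w_{ij}(s_{j,out} - w_{ij})} \tag{D5}$$

$$H^{0}_{i,FI} = \frac{\sum_{j\neq k} w_{kj} w_{ji} w_{ki}}{\sum_{j\neq k\neq i} w_{kj} w_{ji}} = \frac{\left(W^T W^2\right)_{ii}}{\sum_j w_{ji}(s_{j,in} - w_{ji})} \tag{D6}$$

Or via the continuous definition:

$$H^{c}_{i,CO} = \frac{\sum_{j\neq k} \sqrt[3]{w_{ij} w_{jk} w_{ki}}^{\,2}}{\sum_{j\neq k\neq i} \sqrt{w_{ij} w_{jk}}} = \frac{\left(\left(W^{[2/3]}\right)^3\right)_{ii}}{\sum_j \left(\left(W^{[1/2]}\right)^2\right)_{ij} - s_{i,out}} \tag{D7}$$

$$H^{c}_{i,CI} = \frac{\sum_{j\neq k} \sqrt[3]{w_{kj} w_{ji} w_{ik}}^{\,2}}{\sum_{j\neq k\neq i} \sqrt{w_{kj} w_{ji}}} = \frac{\left(\left(W^{[2/3],T}\right)^3\right)_{ii}}{\sum_j \left(\left(W^{[1/2],T}\right)^2\right)_{ij} - s_{i,in}} \tag{D8}$$

$$H^{c}_{i,FO} = \frac{\sum_{j\neq k} \sqrt[3]{w_{ij} w_{jk} w_{ik}}^{\,2}}{\sum_{j\neq k\neq i} \sqrt{w_{ij} w_{jk}}} = \frac{\left(\left(W^{[2/3]}\right)^2 W^{[2/3],T}\right)_{ii}}{\sum_j \left(\left(W^{[1/2]}\right)^2\right)_{ij} - s_{i,out}} \tag{D9}$$

$$H^{c}_{i,FI} = \frac{\sum_{j\neq k} \sqrt[3]{w_{kj} w_{ji} w_{ki}}^{\,2}}{\sum_{j\neq k\neq i} \sqrt{w_{kj} w_{ji}}} = \frac{\left(W^{[2/3],T} \left(W^{[2/3]}\right)^2\right)_{ii}}{\sum_j \left(\left(W^{[1/2],T}\right)^2\right)_{ij} - s_{i,in}} \tag{D10}$$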
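As an illustration of the undirected closure definitions (D1) and (D2), the following sketch evaluates both on two small graphs. It is written under explicit assumptions rather than taken from the authors' code: the weight matrix is symmetric with zero diagonal, weights are normalized by the maximum weight, and nodes without any open walk of length two are assigned a closure of 0. Function names are hypothetical.

```python
import numpy as np

def _ratio(num, den):
    """num / den element-wise, with 0 where there is no walk of length two."""
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)
    return out

def zhang_horvath_closure(W):
    """Fully-weighted (non-continuous) closure, matrix form of (D1)."""
    W = np.asarray(W, dtype=float)
    W = W / W.max()
    triangles = np.diagonal(W @ W @ W)
    strength = W.sum(axis=1)
    # sum_j w_ij (s_j - w_ij): walks i -> j -> k with k != i
    walks = (W * (strength[None, :] - W)).sum(axis=1)
    return _ratio(triangles, walks)

def continuous_closure(W):
    """Continuous closure, matrix form of (D2)."""
    W = np.asarray(W, dtype=float)
    W = W / W.max()
    W23, W12 = W ** (2 / 3), np.sqrt(W)      # element-wise powers
    triangles = np.diagonal(W23 @ W23 @ W23)
    walks = (W12 @ W12).sum(axis=1) - W.sum(axis=1)
    return _ratio(triangles, walks)

# Complete triangle: every walk of length two is closed by an edge of weight 1.
triangle = np.ones((3, 3)) - np.eye(3)
# Path 0 - 1 - 2: no triangle at all.
path = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])

print(zhang_horvath_closure(triangle), continuous_closure(triangle))
print(zhang_horvath_closure(path), continuous_closure(path))
```

Both definitions give 1 on the complete unit-weight triangle and 0 on the path, which is the consistency with the binary closure that one would expect from these formulas.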

O,(m) B,(m) B,(m) Z,(m) Z,(m) Mode I∆,i I∆,i IT,i I∆,i IT,i 3  1   T  3 [2] [ 3 ] 1 2 2 1 B Cycle W 2 WA + WA 2 (si,indi,out + si,outdi,in) − si,↔ (W )ii si,insi,out − si,↔ ii ii  1 1 1  [2] [ 3 ] [ 3 ]T [ 3 ] 1 T T T  1 B T  Middleman W W W 2 WA A + W AA 2 (si,indi,out + si,outdi,in) − si,↔ WW W si,insi,out − si,↔ ii ii ii  2 1 T  1  1 T T  T  2 [2] Fan-in W [ 3 ] W [ 3 ] W (A + A )A s (s − 1) W WW (s ) − s 2 ii i,in i,in ii i,in i,in ii  2   1  1 T 1 T T  T  2 [2] Fan-out W [ 3 ] W [ 3 ] W (A + A )A s (s − 1) WWW (s ) − s 2 ii i,out i,out ii i,out i,out ii

Table VII. Definitions of the Barrat, Onnela, and Zhang–Horvath intensities for each partial mode pattern in directed graphs. Column 1: pattern names; column 2: patterns illustration; column 3: Barrat triangle intensity for node $i$; column 4: Onnela triangle intensity for node $i$; column 5: Zhang–Horvath triangle intensity for node $i$; column 6: Zhang–Horvath triplet intensity for node $i$. The clustering coefficients associated to each mode $m$ are given by $C_i^{O,(m)} = I_{\Delta,i}^{O,(m)} / n_{T,i}^{(m)}$ for Onnela, by $C_i^{B,(m)} = I_{\Delta,i}^{B,(m)} / I_{T,i}^{B,(m)}$ for Barrat, and by $C_i^{Z,(m)} = I_{\Delta,i}^{Z,(m)} / I_{T,i}^{Z,(m)}$ for Zhang–Horvath. Note that, for Barrat, the reciprocal strength has been defined in [20] as $s^{B}_{i,\leftrightarrow} = \sum_{i\neq j} \frac{1}{2}(w_{ij} + w_{ji})$.
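To make the directed Zhang–Horvath ratios more concrete, here is a hedged sketch restricted to the cycle and middleman modes. It assumes the Fagiolo-style matrix forms $I_{\Delta,i} = (W^3)_{ii}$ (cycle) and $I_{\Delta,i} = (W W^T W)_{ii}$ (middleman), a shared triplet intensity $s_{i,in}\, s_{i,out} - s^{[2]}_{i,\leftrightarrow}$, and reads $s^{[2]}_{i,\leftrightarrow}$ as $\sum_j w_{ij} w_{ji}$; that last reading, the convention that `W[i, j]` is the weight of the edge from $i$ to $j$, and the function name are assumptions of this example.

```python
import numpy as np

def directed_zhang_horvath(W, mode="cycle"):
    """Directed Zhang-Horvath clustering for the 'cycle' or 'middleman' mode.

    W[i, j] is the weight of the edge i -> j (zero diagonal); weights are
    normalized by the maximum weight before computing the intensities.
    """
    W = np.asarray(W, dtype=float)
    W = W / W.max()
    if mode == "cycle":
        triangles = np.diagonal(W @ W @ W)
    elif mode == "middleman":
        triangles = np.diagonal(W @ W.T @ W)
    else:
        raise ValueError("mode must be 'cycle' or 'middleman'")
    s_in, s_out = W.sum(axis=0), W.sum(axis=1)
    s_recip = np.diagonal(W @ W)             # assumed s^[2]: sum_j w_ij w_ji
    triplets = s_in * s_out - s_recip
    out = np.zeros_like(triangles)
    np.divide(triangles, triplets, out=out, where=triplets > 0)
    return out

# Directed 3-cycle 0 -> 1 -> 2 -> 0.
cycle = np.zeros((3, 3))
cycle[0, 1] = cycle[1, 2] = cycle[2, 0] = 1.0

# Node 0 is a middleman: 1 -> 0 -> 2 with the direct shortcut 1 -> 2.
shortcut = np.zeros((3, 3))
shortcut[1, 0] = shortcut[0, 2] = shortcut[1, 2] = 1.0

print(directed_zhang_horvath(cycle, "cycle"))
print(directed_zhang_horvath(shortcut, "middleman"))
```

On these two toy graphs each mode only responds to its own pattern: the 3-cycle gives a cycle clustering of 1 and a middleman clustering of 0 for every node, while the shortcut graph gives node 0 a middleman clustering of 1 and a cycle clustering of 0.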

Appendix E: Network generation algorithms

All networks were generated using the NNGT library [24].

1. Core-periphery network

The core-periphery network on Figure 1 contains:

• 1 central core node (CCN)

• 10 outer-core nodes (OCNs)

• 20 periphery nodes (PNs)

The nodes are connected as follows:

• the 10 OCNs form a circular graph with fully reciprocal connections to their 4 nearest neighbors,

• the OCNs all connect to the CCN with reciprocal connections.

The weights associated to these connections are drawn from U(5, 10). The connections with the PNs are as follows:

• each OCN receives one connection from every other PN, starting with the first or second PN depending on the OCN's evenness,

• the OCNs reciprocate the connections with probability 0.5,

• the PNs are connected among themselves following an Erdős–Rényi scheme of density 0.05.

Weights associated to connections involving PNs are drawn from U(0.05, 0.5).

2. Watts–Strogatz

The original Watts–Strogatz network [26] consists of a regular lattice basis (characterized by a coordination number $k$) that is then modified, rewiring each edge with a probability $p$. For directed networks, we used a generalization of that method implemented in NNGT which is strictly equivalent except for the fact that edges are now directed:

1. start from a directed regular lattice with coordination number $k$ and reciprocity $r$ (taken as 1 in this paper),

2. rewire each edge with probability $p$.

The original lattice $L(N, k, r)$ has $\frac{1}{2} N k (1 + r)$ edges, leading to the limit cases

• $\frac{1}{2} N k$ edges if $r = 0$, like the undirected lattice,

• $N k$ edges if $r = 1$, with all connections being reciprocal.

In the network used on Figure 2, we used a coordination number $k = 20$ and a rewiring probability $p = 0.03$.

Appendix F: Real-world networks

1. Mouse mesoscale connectome

The network was obtained from [22] and contains 426 nodes corresponding to brain regions of intermediate scale connected by 65,465 edges. It is a symmetric version of the original network that separates nodes from both hemispheres. Each node is associated with a "name" property corresponding to an abbreviated denomination for the corresponding brain region, as well as a suffix (left or right) corresponding to the hemisphere. Edges are associated with three attributes: a "weight", a "pvalue", and a "distance" (corresponding to the Euclidean distance from the source to the target node).

| | Mean | STD | Median | Min | Max |
| In-degree | 153.7 | 25.6 | 152 | 104 | 223 |
| Out-degree | 153.7 | 77.6 | 151 | 14 | 357 |
| $CC^{bin}_{tot}$ | 0.43 | 0.02 | 0.43 | 0.38 | 0.49 |
| Weights | 0.076 | 0.36 | 0.038 | $3.9\cdot 10^{-16}$ | 20.4 |

Table VIII. Some characteristic properties of the mesoscale mouse brain.

[Figure 6 graphic: node count versus degree and edge count versus weight.]

Figure 6. Distribution of in/out-degrees (respectively in orange/yellow) and weights in the mouse connectome.

2. Fediverse mesoscale network

The network was obtained from [23]². It contains 3,825 nodes representing servers (instances) that are connected via 81,371 edges. Each node is associated with two attributes: a "name", corresponding to the server domain, and a "size", $S_i$, corresponding to the number of users registered on that server. Edges are associated with four attributes: "num follows", defined as $F_{ij}$, the number of followers from $i$ to $j$; "follow/size", defined as $F_{ij}/S_i$; a "weight" given by $w_{ij} = F_{ij}/F_i$, the ratio between the number of followers from $i$ to $j$ and the total number of followers from $i$; and a "distance" giving the Euclidean distance between the two servers on the latitude/longitude plane.

| | Mean | STD | Median | Min | Max |
| In-degree | 21.3 | 82.6 | 4 | 0 | 2271 |
| Out-degree | 21.3 | 77.8 | 5 | 0 | 2038 |
| $CC^{bin}_{tot}$ | 0.68 | 0.37 | 0.86 | 0 | 1 |
| Weights | 0.045 | 0.14 | 0.0029 | $6.0\cdot 10^{-7}$ | 1 |

Table IX. Some characteristic properties of the Fediverse mesoscale network.

[Figure 7 graphic: node count versus degree and edge count versus weight.]

Figure 7. Distribution of in/out-degrees (respectively in orange/yellow) and weights in the Fediverse instances.

3. Closure in the shuffled networks

The results obtained for the local closure confirm those obtained using the clustering coefficient for the mouse brain and Fediverse networks, as shown on Figure 8. Namely, we see an overexpression of fan-in and fan-out patterns for the mouse and a discrepancy between those two patterns for the Fediverse.

4. Networks with a high number of single-node triangles

The presence of nodes participating in a single triangle is frequent in very sparse networks such as collaboration or gene/protein interaction networks. Table X details several such networks:

NetScience: co-authorship in the network science community³

CompGeo: collaborations in computational geometry⁴

CE-CX: a graph of gene associations inferred from the co-expression pattern of two genes (based on high-dimensional gene expression data) for C. elegans

CE-HT: gene associations inferred from high-throughput protein-protein interactions for C. elegans

HS-HT: gene associations inferred from high-throughput protein-protein interactions for Homo sapiens

2 data available at https://dataverse.mpi- 3 https://networkrepository.com/netscience.php sws.org/dataset.xhtml?persistentId=doi:10.5072/FK2/AMYZGS 4 http://vlado.fmf.uni-lj.si/pub/networks/data/collab/geom.htm 18

SC-HT: gene associations inferred from high- throughput protein-protein interactions for S. cerevisiae Table X. Number of nodes (N), edges (E), and fraction of single-triangle (1-T) nodes in several collaboration and gene Gene functional association networks [28] were down- association networks. loaded from https://networkrepository.com/bio.php.

Network N E 1-T nodes (%) NetScience 1,589 2,742 23 CompGeo 7,343 11,898 21 CE-CX 15,229 24,5952 8 CE-HT 2,617 2,985 2 HS-HT 2,570 13,691 7 SC-HT 2,084 63,027 5
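As an illustration of the quantity reported in Table X, the fraction of single-triangle nodes can be sketched in a few lines of plain Python. This is a minimal sketch, not the code used to produce the table: the function names are hypothetical, the graph is treated as undirected and unweighted, and the fraction is assumed to be taken over all nodes that appear in the edge list.

```python
from itertools import combinations

def triangles_per_node(edges):
    """Count, for each node, the undirected triangles it belongs to."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {}
    for node, neigh in adj.items():
        # a triangle through `node` is a pair of its neighbours that are linked
        counts[node] = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
    return counts

def single_triangle_fraction(edges):
    """Fraction of nodes that take part in exactly one triangle.

    Assumption: the denominator is the number of nodes present in `edges`
    (isolated nodes are not represented in an edge list).
    """
    counts = triangles_per_node(edges)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# toy graph: one triangle (0, 1, 2) plus a pendant path 2-3-4
toy = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(triangles_per_node(toy))
print(single_triangle_fraction(toy))
```

In the toy graph, nodes 0, 1, and 2 each belong to exactly one triangle, while nodes 3 and 4 belong to none.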

Figure 8. Different structures of local closure in the mouse brain and Fediverse. The panels compare the local closure of directed patterns for the mouse brain (A) and the Fediverse (B). The original values are compared to the averages over an ensemble of graphs with the same adjacency matrix but shuffled edge weights: 10 uniformly shuffled networks for the mouse brain, and 200 networks with weights shuffled across the outgoing edges of each node (to preserve the out-strength normalization) for the Fediverse. A. The local closure also shows an increase for patterns promoting redundancy of information flow in the mouse brain (fan-in and fan-out motifs are higher in the original graph). B. For the Fediverse, only fan-out is significantly stronger than in the random graphs. Patterns where the original values are significantly greater (resp. smaller) than the randomized ones are marked by a + (resp. −) and the initial of the method (brown, Z for Zhang; orange, C for continuous; yellow, O for Onnela). The original closure is considered significantly higher (resp. lower) if 75% of the points were above (resp. below) the dotted identity line. An additional * denotes one-sided fractions greater than 95%.
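The two shuffling schemes and the significance rule described in the caption of Figure 8 can be sketched as follows. This is a hedged illustration in plain Python rather than the NNGT-based code used for the figure: the function names, the edge-list representation, and the reading of "above the identity line" as "original value larger than the randomized average" are assumptions.

```python
import random

def shuffle_weights(edges, rng, per_source=False):
    """Return a copy of `edges` [(source, target, weight), ...] with the same
    adjacency but permuted weights.

    per_source=False: weights are permuted over all edges (uniform shuffle).
    per_source=True:  weights are permuted only among the outgoing edges of
                      each source node, which preserves every out-strength.
    """
    groups = {}
    for idx, (src, _tgt, _w) in enumerate(edges):
        key = src if per_source else None
        groups.setdefault(key, []).append(idx)
    shuffled = list(edges)
    for idxs in groups.values():
        weights = [edges[i][2] for i in idxs]
        rng.shuffle(weights)
        for i, w in zip(idxs, weights):
            shuffled[i] = (edges[i][0], edges[i][1], w)
    return shuffled

def significance_mark(original, randomized, initial):
    """Return '+X', '-X' (with a trailing '*' above 95%) or '' for a method
    whose initial is `initial`, from paired per-node values.

    Assumption: a point is 'above the identity line' when the original
    value exceeds the average over the shuffled ensemble.
    """
    pairs = list(zip(original, randomized))
    above = sum(o > r for o, r in pairs) / len(pairs)
    below = sum(o < r for o, r in pairs) / len(pairs)
    for frac, sign in ((above, "+"), (below, "-")):
        if frac >= 0.75:
            return sign + initial + ("*" if frac > 0.95 else "")
    return ""

# toy directed weighted graph (hypothetical data)
toy_edges = [(0, 1, 0.2), (0, 2, 0.8), (1, 2, 1.0), (2, 0, 0.5), (2, 1, 0.5)]
print(shuffle_weights(toy_edges, random.Random(0), per_source=True))
print(significance_mark([0.9, 0.8, 0.7, 0.2], [0.5, 0.5, 0.5, 0.5], "Z"))
```

Shuffling within each node's outgoing edges keeps weights that are normalized by out-strength, such as the Fediverse weights w_ij = F_ij/F_i, summing to the same value per node, which a uniform shuffle over all edges would not guarantee.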