<<

PREDICTING TRIADIC CLOSURE IN NETWORKS USING COMMUNICABILITY DISTANCE FUNCTIONS

ERNESTO ESTRADA† AND FRANCESCA ARRIGO ‡

Abstract. We propose a communication-driven mechanism for predicting triadic closure in complex networks. It is mathematically formulated on the basis of communicability distance func- tions that account for the “goodness” of communication between nodes in the network. We study 25 real-world networks and show that the proposed method predicts correctly 20% of triadic closures in these networks, in contrast to the 7.6% predicted by a random mechanism. We also show that the communication-driven method outperforms the random mechanism in explaining the clustering coefficient, average path length, average communicability, and degree heterogeneity of a network. The new method also displays some interesting features toward its use for optimizing communication and degree heterogeneity in networks.

Key words. network analysis; triangles; triadic closure; communicability distance; adjacency matrix; matrix functions; quadrature rules.

AMS subject classifications. 05C50, 15A16, 91D30, 05C82, 05C12.

1. Introduction. Complex networks are ubiquitous in many real-world scenar- ios, ranging from the biomolecular—those representing gene transcription, protein interactions, and metabolic reactions—to the social and infrastructural organization of modern society [12, 18, 44]. Mathematically, these networks are represented as graphs, where the nodes represent the entities of the system and the edges represent the “relations” among those entities. A vast accumulation of empirical evidence has left little doubts about the fact that most of these real-world networks are dissimilar from their random counterparts in many structural and functional aspects [18]. In particular, it is well-documented nowadays that the relative number of triangles is significantly higher in those networks arising in natural or man-made contexts that what one would expect from a random wiring of nodes [44]. Here, “relative” means that we are comparing the number of existing triangles to the total possible number of such structure that can exist in a given network. In jargon such relative number of triangles is commonly known as the clustering coefficient (see [52]). The fact that triangles are abundant in real-world networks was well known in social sciences, where Simmel [51] theorized that people with common friends are more likely to create friendships. This “friendship transitivity” definitively implies a social mech- anism for triadic closure in social networks, which may then be applied to explain the evolution of triangle closures as done previously by Krackhardt and Handcock [31]. This Simmelian principle of triadic closure due to friendship transitivity assumes that individuals may benefit from cooperative relations, and hence that individuals may choose new acquaintances among their friends’ friends. The high degree of transitivity is not a unique feature of social networks; in- deed it is a common characteristic of many other types of networks which includes biomolecular, cellular, ecological, infrastructural, and technological (see [18] and ref- erences therein). It can then be though that similar cooperative principles, as the ones proposed by Simmel for social networks, could be applied to find mechanisms

†Department of Mathematics and Statistics, University of Strathclyde, Glasgow, U. K. ([email protected]) ‡Department of Science and High Technology, University of Insubria, Como I-22100, Italy ([email protected]) 1 2 Ernesto Estrada and Francesca Arrigo

that explain triadic closure in these other types of networks. Although intuitive, this simple idea has some fundamental drawbacks. First, it is not always true that nodes benefit from cooperative relations, and therefore the Simmelian principle is useless in such situations. Secondly, it is evident that not every pair of nodes separated by two edges participates in a triangle in a real-world network. Thus, some kind of selective process has taken place, closing some of the triads in a network and leaving many others open. The goal of this paper is to propose a general mechanism that account for such selective process of triadic closure in networks. The importance of triadic closure in networks goes beyond the prediction of tran- sitive relations. It has been shown that models of network growth based on simple triadic closure naturally lead to the emergence of , together with fat-tailed degree distributions, and high clustering coefficients [7]. Similar results by Klimek and Thurner [28] show that triadic closure is the main responsible for the , scaling of the attachment kernel, and clustering coefficients as a function of node degrees in networks. A common characteristic of these works is that the mechanism used for predicting triadic closure is based on the random se- lection of nodes. There have also been attempts to predict triadic closures by using nonrandom mechanisms. An exhaustive computational analysis was performed by Leskovec et al. [32] by considering several strategies to model how a node u selects a node v, two steps from it, to form a triangle. The first strategy is that u selects the node v randomly among all nodes separated by two edges from u. An improved strategy was to consider that u first selects a neighbor node w according to some mechanism, and then w selects a neighbor v according to some (possibly different) mechanism. Then, the edge (u,v) is formed and the triangle uwv is closed. The selection of a neighbor w for u (or v for w) was carried out using△ one of the follow- ing techniques: (i) uniformly at random; (ii) proportional to degree of w raised to a power; (iii) proportional to the number of friends that u and w have in common; (iv) proportional to the time passed since w last created an edge raised to a power; (v) proportional to the product of the number of common friends with u and the last activity time, raised to a power. After the analysis of these mechanisms for real-world social networks, Leskovec et al. [32] concluded that “while degree helps marginally, for all the networks, the random-random model gives a sizable chunk of the performance gain over the baseline (10%)”. This means that in no case the methods proposed display improvement over the log-likelihood of picking a random node two hops away by more than 10%. This example illustrates the high degree of complexity of the task of predicting triadic closure in real-world networks and emphasizes the necessity of developing new mathematical and computational tools for facing this important problem. Here we propose a different strategy for predicting triadic closure in networks. Our approach is based on the idea that triadic closure is a communication-driven process in which a triad is closed either (i) if the addition of a new link enhances significantly the communication between the nodes u and v, or (ii) the new link is added because of the highly empathic relations existing between the nodes u and v. This paradigm is formulated on the basis of communicability distance functions that account for the “goodness” of communication between pairs of nodes using one-hop or two-hops steps in the network. The method requires a calibration step in which we determine which of the two mechanisms, enhancing or empathic, dominates the triadic closure in the network. We study 25 real-world networks and show that the proposed method predicts correctly 20% of the triadic closures in these networks. In Prediction of triadic closure 3

some cases these percentage of correct prediction reaches values of more than 40%. In contrast, the random closure of triads identifies only 7.6% of the real triangles existing in those networks. The differences between the current method and a random process could be as high as 30% as observed for a few of the networks studied. We finally use this method to analyze how the communication-driven triadic closure determines a few properties of a given network. We study the clustering coefficient, average path length, average communicability, and degree heterogeneity of the network. We see that in general the communication-driven method outperforms the random mechanism in explaining these network properties and displays some interesting features toward its use for optimizing communication and degree heterogeneity in networks. This paper starts with the definition of all the mathematical concepts used in order to make the paper self-contained, then we introduce the new method of triadic closure, and we finish with the presentation of the results and their discussion.

2. Preliminaries. We recall in this section some definitions, notations, and properties associated with networks and we set the notation used throughout the work. A graph Γ=(V,E) is defined by a set of n nodes (vertices) V and a set of m edges E = (u,v) u,v V between the nodes. An edge is said to be incident to a vertex u if{ there exists| ∈ a node} v = u such that either (u,v) E or (v,u) E. The 6 ∈ ∈ degree of a vertex, denoted by du, is the number of edges incident to u in Γ. The graph is said to be undirected if the edges are formed by unordered pairs of vertices. A walk of length k in Γ is a set of nodes u ,u ,...,uk,uk such that for all 1 l k, 1 2 +1 ≤ ≤ (ul,ul+1) E. A closed walk is a walk for which u1 = uk+1. A path is a walk with no repeated∈ nodes. A closed walk of length 3 is called a triangle. We will call potential triangle every triplet of nodes u, v, and w such that (u,v), (v,w) E but (u,w) E. A potential triangle is hence a triangle where one edge is missing.∈ We shall call6∈ this missing edge a potential edge. A graph is connected if there is a path joining u and v for every u,v V . A graph with unweighted edges, no edges from a node to itself, and no multiple∈ edges is said to be simple. We will always consider undirected, simple, and connected networks. It is well known that a graph Γ = (V,E) n n can be represented as a matrix A =(auv) R × called the adjacency matrix of the graph. It is worth noting that for undirected,∈ simple, and connected networks the associated adjacency matrix is symmetric, binary, hollow (i. e. has zeros on the main diagonal), and irreducible (see [26, 18]) and hence its entries are

1 if(u,v) E auv = ∈ u,v V. 0 otherwise ∀ ∈  It is possible to define several distance measures on networks. The most common is the shortest-path (or geodesic) distance between two nodes u,v V , which is defined as the length of the shortest path connecting these nodes. We∈will write d(u,v) to denote the geodesic distance between u and v. Here we will call, as usual in network theory, average path length the average of the shortest path distance in the graph [18, 44]:

1 l = d(u,v). 2m u,v V X∈ Another measure used for characterizing the structure of networks is the so-called local clustering coefficient of a node u, which quantifies the degree of transitivity of 4 Ernesto Estrada and Francesca Arrigo

local relations in a network and which is defined as [52]:

2 (v,w): v,w Nu;(v,w) E Cu = |{ ∈ ∈ }| du(du 1) − where Nu = v : (u,v) E is the set of neighbors of node u and S denotes the cardinality of{ the set S.∈ Taking} the mean of these values as u varies| among| all the nodes in Γ, one gets the clustering coefficient of the network Γ: n 1 C = C . n u u=1 X One of the most commonly used methods to characterize the structure of complex networks is the study of their degree distributions [2]. This feature is useful for distinguishing certain structural characteristics of networks, specially in those cases where the distribution is very skewed, like in fat-tailed degree distributions. However, there are a few problems in using degree distributions to analyze complex networks. Of special relevance here is the fact that we cannot easily compare networks with different degree distributions; this makes this measure inappropriate for analyzing a network evolving in time. Indeed, for such networks the type of distribution may change over time. In [16] it has been introduced a quantitative measure of the degree heterogeneity of a network which accounts for the information contained in the degree distribution in a more precise way. The degree heterogeneity of the graph Γ = (V,E) has been defined in [16] as: 1 1 1 2 ρ (Γ) = . n 2√n 1 √du − √dv (u,v) E   − − X∈ This index is bounded as 0 ρ(Γ) 1. The lower bound is attained by any regular graph, i. e. , any graph≤ whose nodes≤ have all the same degree (the most homogeneous degree distribution), and the upper bound is reached by the star graph, i. e. , the graph having one node of degree n 1 and all the other n 1 node of degree 1 (the most skewed degree distribution). − − An important quantity to be considered when studying communication processes in networks is the so-called communicability function [15, 19, 17], which is defined as: k n ∞ A A uv λk Guv = e = = e qk(u)qk(v), u,v V, uv k! ∀ ∈ k=0  k=1  X X where A = QΛQT is the spectral decomposition of the adjacency matrix (see [26]), with Λ a diagonal matrix containing the eigenvalues of A and Q = [q1,..., qn] an orthogonal matrix containing the associated eigenvectors. The communicability counts the total number of walks starting at node u and 1 ending at node v, weighting their length by a factor k! , hence considering shorter walks more influential than longer ones (see [15, 19, 17]). The Guu terms of the communicability function, which are usually called subgraph of the nodes, characterize the degree of participation of a node in all subgraphs of the network, giving more weight to the smallest ones. Here we will use the average communicabil- ity as a way to characterize the goodness of the communication taking place in the network as a whole: 1 G = G n(n 1) uv u=v − X6 Prediction of triadic closure 5

The communicability function has been recently used to propose the following quantification of the goodness of communication between nodes in a network. When two nodes u and v are exchanging information, the quality of their communication de- pends on two factors: (i) how much information departing from a source node reaches its target (Guv), and (ii) how much information departing from the node returns to it without arriving at its destination (Guu). That is, the goodness of communication in- creases with the amount of information which departs from the originator and arrives at its destination, and decreases with the amount of information which is frustrated due to the fact that the information returns to its source without being delivered to the target. In [20] the communicability distance has been introduced. It is defined as:

(2.1) ξuv = Guu + Gvv 2Guv − and it has been proved to be an Euclideanp distance between the nodes u and v in Γ (see [20, 21]). From its definition, it is clear that ξuv characterizes the quality of the communication taking place between nodes u and v. 3. Model. Let us consider a triad u,w,v, such that the edges (u,w) E , ∈ (w,v) E and (u,v) / E, i. e. , let us consider a potential triangle. Let fR(u,v) be a function∈ that characterizes∈ the goodness of communication between the nodes u and v. It considers (as usual) that the information departing from node u arrives at node v by taking one-hop steps between the nodes in any of the existing walks that communicate both nodes. Let fH (u,v) be another function that characterizes the goodness of communication that would exists between nodes u and v if the information travels using two-hops steps between the nodes of the network. That is, fH (u,v) assumes that the information is flowing by using only those links that can be created by closing all the triads existing in the network. We define the gain of creating a new link between the nodes u and v as:

(3.1) ∆uv = fR(u,v) fH (u,v). − Now we have the following modeling scenarios. Enhancer link: a new link connecting the nodes u and v represents a sig- • nificant increment in the communicability between them in comparison to the existing one-hop communicability. Thus, we assume that this new link is added as a way to enhance the communicability between the nodes u and v. Consequently, a new link between u and v is created if and only if ∆uv < 0. Empathic link: a new link connecting the nodes u and v is added in spite • of the fact that the existing one-hop communicability between the nodes is larger than the one obtained by traveling by the two-hops steps in the network. Thus, we assume that this new link is added because of the highly empathic relations existing between the nodes u and v. Consequently, a new link between u and v is created if and only if ∆uv > 0. Suppose that we want to predict whether the potential triangles existing in a network will evolve into new triangles. Since we do not know whether the existing triads will close by using enhancer or empathic links, the previous method is of little use. The only way we have to make some inference is to assume that the network as a whole has been evolving following a general mechanism. That is, that most of the potential triangles in that network have been closed either by enhancer or by empathic links. Such information can be obtained by empirically estimating the values of ∆uv for all existing triangles in the network. For this purpose we need the definition of some em- 2 pirical parameters that capture this characteristic of a network. Let fR(u,v)= αξuv, 6 Ernesto Estrada and Francesca Arrigo

where α is an empirical parameter and ξuv is the communicability distance as defined in equation 2.1. This function accounts for the closeness in communication between two nodes assuming that the information is traveling by one-hop steps through the 2 network. Let fH (u,v) = βηuv, where β is an empirical parameter and ηuv is a new communicability distance defined to account for the two-hops propagation of the in- formation in the network. A2 We define this new distance as follows. Let Guv = e− be the communica- uv bility function between the nodes u and v assuming that the information is hopping by two-hops steps in the network [22]. It is straightforwarde to realize that

p(2) u = v A2 = uv uv d u 6= w  u  (2) where puv is the number of paths of length 2 between the nodes u and v and du is the degree of the node u. Hence, the entries of this matrix take into account the information derived from the degree of the nodes and from the paths of length two between them, which are exactly the information we want to consider. Then, the two-hops communicability distance function is defined as:

(3.2) ηuv = Guu + Gvv 2Guv − q The proof that ηuv is an Euclidean distancee e betweene the nodes u and v follows the same lines as in [20, 21] and is not given here. The following result states the condition under which the gain function

2 2 (3.3) ∆uv(α,β) := αξ βη , u,v V uv − uv ∀ ∈ is also a distance. Lemma 3.1. Let Γ=(V,E) be an undirected, simple, and connected network. T Let A = QΛQ , with Q =[q1, q2,..., qn] orthogonal and Λ= diag(λ1,λ2,...,λn) be the spectral decomposition of the adjacency matrix A and let ∆uv(α,β) be defined as in (3.3) for all u,v V . Then, ∆uv(α,β) is a squared Euclidean distance if and only λ2 λ ∈ if αe j + j β for all j = 1, 2,...,n. ≥ Proof. Using the spectral decomposition of the adjacency matrix one gets:

2 2 ∆uv(α,β)= α (ξuv) β (ηuv) n − 2 2 λj λ = [qj(u) qj(v)] αe βe− j − − j=1 X  

q1(u) q2(u) Define now su =  .  for every u V . . ∈   q (u)  n  Then  

T Λ Λ2 ∆uv(α,β)=(su sv) αe βe− (su sv) − − − T   =(su sv) Σ(su sv). − − Prediction of triadic closure 7

Λ Λ2 where Σ = αe βe− is a diagonal matrix. This matrix is positive semidefinite if 2 1 λ +−λj and only if αe j β for all λj . If this condition holds, by defining xu = Σ 2 su one gets: ≥

T 2 ∆uv(α,β)=(xu xv) (xu xv)= xu xv . − − k − k and this concludes the proof. At this point the only remaining ingredient needed to implement a method for predicting triadic closure in networks is the determination of the empirical coefficients α and β. Due to the lack a priori of information about the nature (enhancer or empathic) of the links that have formed triangles in the network, we need a calibration method to determine these parameters. This calibration algorithm is described in the following section. 3.1. Calibration method. The aim of this calibration method is to find the empirical parameters α and β in the gain function ∆uv(α,β) for a given network. There is a precomputing phase in which we detect all the existing and potential triangles in the selected network and we build a list P containing the potential edges. This operation is performed by exhaustively searching the adjacency matrix (or the adjacency list [45]) and it is very costly. However, if a smaller percentage of all the potential edges is needed instead, one can simply select them randomly. The general calibration method is then described as follows: 1. Randomly remove one edge from each triangle in the network and keep track of which edge was selected. Denote the set of these edges by R and its cardinality by r; 2. Create a new list of edges L by the union of R and P ; 3. Compute the values for ∆uv(α,β), for all (u,v) L and for values of α,β varying in an interval I R. ∈ ⊂ 4. Sort these values of ∆uv(α,β) in non-decreasing order; 5. Count the number of elements of R which are among the top r elements of the sorted list. These are denoted as the detected edges; 6. Select the values of β and α which return the highest percentage of detected edges (α∗, β∗); 7. Repeat this procedure T times and report the average values α∗ and β∗ and the average percentage of detected edges. h i h i Once we have obtained the parameters α and β for a given network we can use them to predict which new potential triangles will close in the next steps of evolution of this network. For instance, suppose we have a version of Facebook in which the nodes represent the users and the links represents friendship relations. Using this version of Facebook one can determine the values of α and β which better describe the triadic closure in the network. Now, one can take a step forward and predict which new triads will close in the future, allowing a forecasting of the most probable structure of this network in a near future. Another feature of this method is that it also allows to detect the presence of actual edges which have not been accounted for by the exploratory techniques used to generate the network. For instance, in biomolecular networks, like protein-protein interaction ones, there is a significant number of interactions (i. e. , edges) which are missing due to experimental errors inherent to the techniques used to find them. Thus, the current method could help in predicting which edges closing triangles are missing because of experimental errors. Finally, an added value of the current model is that it allows to know whether a network has been evolving via an enhancing or empathic mechanism. All these topics will be discussed in the section of 8 Ernesto Estrada and Francesca Arrigo

Results, but before we describe in the next section some mathematical characteristics of the measures used in the proposed method. 3.2. Bounds for communicability distance functions. Although in this work we will use the exact values of the communicability distance functions in or- der to obtain the values of ∆uv, we will introduce in this section some bounds for ξuv and ηuv, which are of great importance in calculations of these quantities for extremely large networks. As it appears from the definitions given in (2.1), (3.2), and (3.3) for large matrices these values may be too costly to compute. To avoid 2 2 the computation of the matrix exponential, we derive bounds for ξuv and ηuv (and therefore for ∆uv(α,β)) by means of the Gauss–Radau quadrature rule. Before giving these bounds, in order to make the present paper self-contained, we summarize the approach used as described in [5, 4, 24]. It is well known that the problem of computing bilinear expressions of the form uT f(A)v can be reduced to the approximation of a Riemann–Stieltjes integral with respect to a certain measure using quadrature rules. Indeed, in a series of papers, Golub and collaborators give bounds on the entries of f(A) based on Gauss-type quadrature rules when f is a strictly completely monotonic (s.c.m.) function on an interval containing the spectrum of A by working on a 2 2 matrix derived from 1 step ofI the symmetric Lanczos iteration. Recall that a func× tion is s.c.m. on if f (2k)(x) > 0 and f (2k+1)(x) < 0 for all x and k 0, where f (k) denotes the Ikth derivative of f and f (0) f. Since g(x)∈I = ex is∀ not≥ s. c. m. , we need to work on x ≡ f(x)= e− to derive bounds on the quantities of interest here. The key result that allows to easily compute a priori bounds using Gauss-type quadrature rules is that these can be obtained as the element in position (1, 1) of the matrix f(Jp+1) (see Theorem 6.6 in [24]), where

ω1 γ1 γ1 ω2 γ2  . . .  Jp+1 = ......    γp 1 ωp γp   −   γ ω   p p+1    is a tridiagonal matrix whose eigenvalues are the nodes of the quadrature rule, whereas the weights are given by the squares of the first entries of its normalized eigenvectors. Our results are summarized in the following theorems. Theorem 3.2. Let A be the adjacency matrix of an unweighted and undirected network. Then

γ2 (ξ )2 γ2 (3.4a) Φ b,ω + 1 uv Φ a,ω + 1 1 ω b ≤ 2 ≤ 1 ω a  1 −   1 −  where

x y y x c (e− e− )+ xe− ye− (3.4b) Φ(x,y)= − − , c = ω , x y 1 −

ω1 = auv; 1 . du+dv 2 2 ( γ1 = ω1 Auv 2 − −   Prediction of triadic closure 9

and [a,b] is an interval containing the spectrum of A. − Remark 1. If (u,v) E the bounds considerably simplify. Indeed, in this case 6∈ ω1 = 0 and hence

γ2 γ2 2 1 2 b 2 2 1 2 a b e b + γ1 e− (ξuv) a e a + γ1 e− 2 2 2 2 b + γ1 ≤ 2 ≤ a + γ1

2 Before proceeding with the proof of the result, note that (ξuv) can be written as

2 T A (ξuv) =(eu ev) e (eu ev), − −  where eu and ev are the uth and vth vectors of the canonical basis, respectively. Proof. Using the Lagrange interpolation formula for the evaluation of matrix functions [25] one can easily derive [4] that

µ1 µ2 µ2 µ1 T C c11(e− e− )+ µ1e− µ2e e e− e = − − . 1 1 µ µ 1 − 2  where µ1, µ2 are the distinct eigenvalues of the matrix C. ω γ We now want to build explicitly the matrix J = 1 1 in such a way that 2 γ ω  1 2  τ1 = a or τ1 = b is a prescribed eigenvalue. The values of ω1 and γ1 are derived explicitly applying one step of Lanczos iteration to the matrix A with starting eu ev − vectors x 1 = 0 and x0 = − . − √2 Note that if γ1 = 0 we simply take ω2 = τ1 and the matrix J2 is diagonal with eigenvalues µ1 = ω1 and µ2 = τ1. Thus, let us assume γ1 = 0. Using the three-term recurrence for orthogonal polynomials: 6

γj pj(λ)=(λ ωj)pj 1(λ) γj 1pj 2(λ), j = 1, 2,...,p, − − − − −

γ1 with p 1(λ) 0, p0(λ) 1 we derive ω2 = τ1 . Using the same recurrence, we − ≡ ≡ − p1(τ1) τ1 ω1 also find that p1(τ1)= − = 0, since the zeros of orthogonal polynomials satisfying γ1 6 the three-term recurrence are distinct and lie in the interior of (see [24, Theorem 2.14]). I The matrix

ω1 γ1 J = 2 2 γ τ γ1 1 1 τ1 ω1 ! − −

2 has (distinct) eigenvalues µ = τ and µ = ω + γ1 . The result then follows by 1 1 2 1 ω1 τ1 applying Theorems 6.4 and 6.6 from [24]. − 2 Similar bounds can be computed for ηuv and are described in the following theo- rem, whose proof goes as that of theorem 3.2. Theorem 3.3. Let A be the adjacency matrix of an unweighted and undirected network. Then

2 2 2 γ˜1 (ηuv) γ˜1 Φ ˜b, ω˜1 + Φ a,˜ ω˜1 + ω˜ ˜b ≤ 2 ≤ ω˜1 a˜  1 −   −  10 Ernesto Estrada and Francesca Arrigo

ξ2 η2 ∆ (α,β) uv uv uv 0 0 10 −10 2 10 −1 10 1 −10 −2 1 10 10

−3 2 10 −10

0 −4 10 10 0 200 400 0 200 400 0 200 400

2 2 Fig. 1. Bounds on ξuv, ηuv and ∆uv for all the elements (u,v) 6∈ E for the network Zachary. Continuous line: value of the measure, dotted lines: upper and lower bounds obtained using the Gauss–Radau quadrature rule.

where Φ is defined as in equation (3.4b) with c =ω ˜1, ˜ = [˜a, ˜b] is an interval containing the spectrum of A2, and I

2 ω˜1 = γ1 + ω1; 1 1 n 2 2 2 2 2 .  γ˜1 = A A ω˜1  2 w=1 uw − wv − h P  i with ω1 and γ1 as in theorem 3.2. Remark 2. Since the behavior of the eigenvalues of A is known (see [40]), we may take a˜ = 0 and ˜b = a2 as the square of the approximation to the largest eigenvalue of A. For these values, the bounds simplify as

2 2 ω˜1 +˜γ1 2 2 2 2 2 γ˜ (η ) γ˜ ω˜ e− ω˜1 +γ ˜ Φ a2, ω˜ + 1 uv Φ 0, ω˜ + 1 = 1 1 . 1 ω˜ a2 ≤ 2 ≤ 1 ω˜ ω˜2 +γ ˜2  1 −   1  1 1 Combining the results described in the previous theorems, one easily get bounds ∆uv (γ,β) for the values of 2 . Indeed, the computation is straightforward, since the new bounds are linear combination of the previously introduced ones. For example, if we have γ 0 and β 0 we get as lower bound for ∆uv (γ,β) ≥ ≤ 2 γ2 γ˜2 γΦ b,ω + 1 + βΦ a,˜ ω˜ + 1 1 ω b 1 ω˜ a˜  1 −   1 −  and as upper bound

2 2 γ1 γ˜1 γΦ a,ω1 + + βΦ ˜b, ω˜1 + ω1 a ω˜ ˜b  −   1 −  where ω1, γ1,ω ˜1, andγ ˜1 depend on the choice of u and v. 2 Figure 1 displays the behavior of the bounds derived for the distances ξuv and 2 ηuv, and for the gain function ∆uv for any pair of nodes u, v such that (u,v) E for 1 2 2 6∈ the network Zachary . We plot the values for ξuv, ηuv, and ∆uv and of the derived

1See the Appendix for a description of the network. Prediction of triadic closure 11

bounds versus the elements (u,v) E. The indexing of these elements is the same in all the subfigures and it has been6∈ obtained by ordering in non-decreasing order the values of ∆uv(α,β). The values α = 1.696 and β = 0.392 (see table 1) have been determined empirically using the technique− previously described. 4. Results and Discussion. We study 25 networks representing complex sys- tems from a wide variety of environments, such as social, ecological, biomolecular, technological, infrastructural, and informational. A brief description of all these net- works is given in the Appendix. For the computational results we limited ourselves to T = 100 repetitions in order to find the optimal values of the empirical parameters α and β. Also, in order to prove the effectiveness of this method, we have run a random test in which we randomly order the edges in L and count how many of the r removed edges in the list R are found in the first r positions (rand). The extremes for the interval I in which α and β vary have been determined empirically: I = [ 2.1, 2.1]. Indeed, smaller intervals led to worse results than those obtained with [ 2.1−, 2.1]. On the other hand, larger intervals did not improve the results. − We start by considering the possibility of fixing one of the parameters and letting the other varying in the bounded interval [ 2.1, 2.1]. Then, we set α = 1 and let β I vary. This seems reasonable, since this− choice allows us to tune the disturbance ∈ inserted by the repulsion in the values of ∆uv. However, the results obtained for 10 of the studied networks discouraged us from proceeding with this approach. As average the use of the two parameters α and β makes predictions of triadic closure which are 7% higher than those using only one parameter, with maximum differences of up to 20% for one network (results not shown here). Thus, we will use the more general approach of calibrating both empirical parameters. 4.1. Predicting and interpreting triadic closure. The first series of results refers to the finding of the optimal values of α and β for the studied networks and the comparison of the percentage of triadic closures correctly predicted by the method in comparison with the random process. The results of the tests are listed in table 1. The columns α∗ and β∗ contain the average best values for the parameters, where the averageh isi takenh overi the 100 iterations we run. As it can be seen there is a large variety of values for both parameters, which in general can have positive or negative values. We will return to the interpretation of these values later on when we discuss the types of processes (enhancing or empathic) taking place on the networks analyzed. As can be seen in the table 1, as average the method based on the communicability distance functions predicts correctly 20% of the triad closures in the real-world networks studied. In some cases this percentage of correct prediction reaches values of more than 40%. In contrast, the random closure of potential triangles identifies only 7.6% of the real triangles existing in those networks. The differences between the current method and the random process could be as high as 30% as observed for a few of the networks studied. Then, to conclude this part, we stress the fact that these results illustrate the great potentialities that the current method has for predicting triadic closures in real-world networks. As we have remarked when explaining the method we are proposing in this work, it is impossible for most of the real-world networks to know a priori whether the process which drives the triad closure is dominated by enhancer or empathic links. This is the main reason why we had to implement the calibration procedure to obtain the values of the empirical parameters α and β. However, an interesting side-effect of this process is that we can obtain information about such process a posteriori. In fact, if we have obtained relatively high percentages of correct triad closure for a given 12 Ernesto Estrada and Francesca Arrigo network, we can then analyze the sign of ∆uv and infer whether these triangles have been closed by an enhancer or by an empathic link (see 3.1 for details). There are a few important remarks that have to be made here. First,§ for the sake of simplicity we are considering the averages ∆uv . Using these values to infer the type of process which has driven the triadic closureh assumesi that all the potential triangles had been closed by following the same mechanism. However, we know that this is hardly the case in most of the real-world situations and thus such inferences have to be read with care in most of the cases. Despite of this, they can still give some hints about the “average” kind of process taking place in the network and such information may be useful for understanding more complex phenomena taking place on the network. Secondly, it has to be said that the terms “enhancer” and “empathic” have not necessarily any meaning for some of the networks studied. Hence, again these terms have to be considered metaphorically here before making any speculation about the real driving forces of triadic closure in some networks. With these clarifications in mind, we have classified the networks studied into two groups. The first group correspond to networks in which the potential triangles are closed preferentially by enhancer links ∆uv < 0 and the second by empathic links, ∆uv > 0. h i h i In table 1 we give the values of ∆uv for all the networks, which have already been grouped according to the sign of thish index.i We also report the standard deviations σ(∆) for the values of ∆uv. The networks with the most negative values of ∆uv are social networks (Sawmill, Social3, MathMethod, Galesburg, and Prison) ash welli as a food web (Grassland). On the other hand, the networks for which the triads have been closed preferentially by empathic links do not realize large values in this index. This is perhaps an indication that in these networks there is a mix up of positive and negative values of ∆uv for the individual edges (see further). Also the types of networks with the largest positive values of ∆uv are more heterogeneous than the ones in the first class, including social, foodh webs,i biological, and technological networks.

Let us first analyze the networks with the largest negative values of ∆uv . Among them we find a communication network within a small enterprise: a sawmill,h i where all employees were asked to indicate the frequency with which they discussed work matters with each of their colleagues on a five-point scale ranging from less than once a week to several times a day. Two employees were linked in the communication network if they rated their contact as three or more. This appears as a cooperative process in which both advisers and advisees cooperate to share information which is needed for improving their skills and knowledge. Thus, closing the potential triangles in order to enhance the communication between the individuals involved seems a very reasonable mechanism. The other social networks included in the top of the list of this class all share a common characteristic. In the networks Social3 (a network of social contacts among college students participating in a leadership course), Galesburg (a network of friendship among physicians) and MathMethod (a network of friendship among school superintendents) the participants in the respective studies were asked the following questions: Choose the three members they wished to include in a committee; • Nominate three doctors with whom they would choose to discuss medical • matters; Name their best friends among the chief school administrators in Allegheny • County. In the first two cases it is very clear that the participants were asked to nominate Prediction of triadic closure 13

some individuals they would easily cooperate with, e. g. , members of a committee or colleagues with whom to discuss medical matters. The last one looks more like a general kind of friendship relation, but the question was formulated in the context of analyzing the diffusion of a new mathematical method among High Schools in the county. Thus, selecting your best friends among other chief school administrators also means selecting those with whom you would easily cooperate in technical matters. In these situations closing triads by enhancer links benefit the goal of the network created among these groups of individuals. In the class of networks in which triads have been closed by empathic links we find the animal of Dolphins at the top and three other social networks which do not display very high values of ∆uv . These include a social network in a small high-tech computer firm which sells,h installs,i and maintains computer systems, and the friendship network among boys in a High School. The links in the first one were obtained by asking the employees: “Who do you consider to be a personal friend?”. In the second network, the students were asked: “What fellows here in school do you go around with most often?”. The network of Dolphins represents a group of bottlenose dolphins living in the waters of the fiord of Doubtful Sound (New Zealand). This network is very well know for the fact that after the departure of a leading dolphin the network split into factions, a situation which may create the necessity of empathic triadic closure among the dolphins in the same faction. It could be argued that also the network Zachary, representing the friendship ties among the members of a karate club, should display similar characteristics to this network. Indeed, in this network a conflict between the administrator and the trainer in the karate club forces the network to split into two factions. Although the karate club network is classified in the first class, it is clear from the standard deviation σ(∆) that not every triad was closed by an enhancer link, but several were closed by empathic ones. Thus, the fact that one network belongs to one class or the other is just a matter of the number and magnitude of both empathic and enhancer links. The other three social networks mentioned are in the same situation with both empathic and enhancer links closing triads. It is difficult and risky to extend this analysis to networks different from the social ones. However, it is interesting to note that the three protein-protein interaction networks analyzed are all in the class in which triads have been closed by empathic links. Then, it would be the case that triads in these networks are closed more likely because of similarity among the functions of proteins than because of the necessity of enhancing the communication among them. It is well known that these networks are highly modular and that the modules represent proteins with similar functions in which a high number of intra-modular triangles exists. Further specialized studies are needed for deducing more general conclusions about these mechanisms. However, the current work clearly points out the many possibilities that the proposed method has for predicting and understanding triad closure in real-world networks.

4.2. Network evolution under triadic closure. Finally in this section we use the results of the calibrated method for modeling the triadic closure evolution in a given network. Although we can model the future evolution of a network from its current state, we prefer to consider a network in an early stage of its evolution and predicting how it has evolved towards its current structure. This method allows to contrast the predictions made by the current method with some control parameters obtained for the real-world network at its current state. For this experiment we se- lected the network of injecting drug users (Drugs) for which we consider the clustering coefficient, the average path length, the heterogeneity index, and the average com- 14 Ernesto Estrada and Francesca Arrigo

Table 1 Results of the calibration method for the 25 complex networks studied in this work.

Network r detected rand. hα∗i hβ∗i h∆uvi σ(∆) Sawmill 18 27% 10% −1.906 1.25 −0.197 0.084 social3 32 24% 11% −1.164 1.258 −0.143 0.039 MathMethod 19 25% 10% −1.196 0.574 −0.125 0.051 Grassland 30 25% 9% −1.833 1.203 −0.127 0.058 Galesburg 29 23% 11% −0.902 0.648 −0.099 0.037 Prison 58 30% 12% −0.294 1.492 −0.088 0.081 Zachary 45 42% 10% −1.696 0.392 −0.053 0.085 BridgeBrook 774 13% 3% −1.977 1.046 −0.019 0.017 Colorado 17 20% 1% −0.754 0.044 −0.015 0.018 Transc yeast 72 4% 1% −0.221 0.544 −0.006 0.002 USAir97 12181 45% 18% −1.452 −0.63 −0.002 0.005 Dolphins 95 24% 13% 0.364 −0.586 0.038 0.013 ScotchBroom 358 31% 4% 0.372 −0.660 0.014 0.048 StMartin 278 16% 11% 0.232 −0.335 0.012 0.005 PIN Ecoli 478 10% 5% 1.025 −0.137 0.009 0.011 High tech 77 31% 16% −0.198 −0.288 0.007 0.012 Drugs 3598 27% 16% −0.526 −1.048 0.005 0.006 High School 199 28% 18% 0.654 0.434 0.004 0.019 Neurons 2808 16% 8% −0.526 −0.978 0.004 0.003 Geom 12325 12% 6% −0.14 −1.149 0.003 0.002 Ythan1 507 10% 4% −0.248 −0.492 0.003 0.004 PIN Yeast 3530 13% 4% 1.53 −1.842 0.002 0.005 Internet 2331 26% 0% −0.1 −1.842 0.002 0.001 PIN Human 1047 5% 2% 0.203 −0.291 0.001 0.001 Roget 1550 7% 6% 0.305 −0.008 0.001 0.002

municability of the actual network. In order to perform these experiments we select 50% of the total number of triangles existing in the network and we remove one edge from each of them. Such edge is selected uniformly at random among those present in the corresponding triangle. As before, let L be the list of edges obtained as the union of the potential edges and of those we removed. The values for α∗ and β∗ are those determined empirically in 4 and listed in table 1. § The iteration process goes as follows. We select the edge having the smallest value for ∆uv(α∗,β∗) and we add this edge to the network. Then, we compute the values for the parameters of interest. Finally, the values for ∆uv(α∗,β∗) are updated. Here every addition of an edge is considered as a time step. This iteration is run as many times as the number of edges we have removed. That is, if we removed r edges, we consider a discrete time evolution for 0 t r. We then repeat this experiment 10 times, taking the average and standard deviations≤ ≤ of the corresponding parameter. In order to compare the results we simulate the same process by adding such edges randomly. The results of this experiment are illustrated in Figure 2. The x-axis represents the time evolution of the network and the y-axis gives the average values of the corresponding property at each time step together with the corresponding error bar. The horizontal dotted line represents the actual value of the property for the real- world network. As can be seen in the Figure 2 the current method outperforms the random one for predicting the clustering coefficient of the network. The current value of C for this network is 0.549, while the one predicted by ∆uv is 0.486, which contrasts with that of 0.183 obtained by the random method. We remark here that this is the most important parameter to be considered in this experiment as it is the one which accounts more directly for the ratio of triangles to paths of length two in the network. Prediction of triadic closure 15

Both methods predict very well the average path length of the network, returning values very close to the actual one of the network, which is l = 5.287. The other two parameters considered are not well reproduced neither by the current method nor by the random closure of triads. However, the differences between the two methods give some important insights about what the ∆uv method of triadic closure brings to the analysis of network evolution. The first important remark is that the current method increases more significantly the average communicability of the network than the random triadic closure. This would be important in those cases in which we are interested in maximizing the total average communicability of a network, which is equivalent to increasing the goodness of communication among the nodes in the network. The analysis of the heterogeneity index also gives important insights. The current method overestimates the heterogeneity by about 47%, while the random one underestimates it by about 45%. The trends in the evolution of this parameter for both methods is totally opposed to each other. While the random evolution reduces the heterogeneity at each time step, the current method continuously increases it. This result is of enormous importance. The random method is making the network more and more random-like. This is reflected in its lack of heterogeneity. In contrast, the current method is increasing the heterogeneity and departing the network from the typical homogeneity of Poisson-like degree distribution networks. This result can be used to generate new networks with high heterogeneity, like the ones usually generated by mechanisms.

5. Conclusions and future developments. The prediction of triadic closure is a very important and far from trivial problem in network theory. The fact that most of real-world complex networks have more triangles than expected makes the problem interesting per se. In addition, there is a large amount of evidence that points out how triadic closure in (social) networks is a major driver for other impor- tant structural characteristics of networks, such as degree distributions, clustering, and community structure. Although this is an evidence of its importance, the lack of a general mechanism for predicting triadic closure in networks makes this a very hard problem indeed. The assumption that cooperation is the main driving force for closing triads is hard to be maintained. Indeed, there is evidence that not all individ- ual entities in a complex system benefit from such cooperative relations. Moreover, usually only a fraction of triads becomes triangles in the real-world networks. It is then not surprising that a large number of the works dealing with triadic closure treats the problem as any variant of a random process. With this work, we have made a serious attempt of escaping from this “randomness” assumption and consider that the triadic closure is a communication-driven process. On this basis, a potential triangle is closed on any of two distinct processes which account for the enhancement of communication between nodes at distance two and for the empathy of relations between such nodes. This new paradigm is mathematically formulated in terms of communicability distance functions that account for the “goodness” of communica- tion between nodes. Our results here are very encouraging as they show that the communicability-based method outperforms the random process of triadic closure in all the networks studied. Moreover, the validity of the results is strengthened by the wide variety of networks studied, which indeed arise form different scenarios. In some cases the differences in the prediction between the two methods is larger than 40% in favor of the new approach. Also importantly, the communicability-based method for triadic closure outperforms the random one in explaining a few important network properties and displays some interesting features toward its use for communication 16 Ernesto Estrada and Francesca Arrigo

Average Clustering coeff Average path length 0.6 20

0.5 15 0.4

0.3 10 0.2

0.1 5 0 0 500 1000 1400 0 500 1000 1400 ∆ uv random

Average Heterogeneity index Average communicability

0.4

0.35 5 10

0.3

0.25

0.2 0 10

0.15 0 500 1000 1400 0 500 1000 1400

Fig. 2. Illustration of the evolution of the clustering coefficient, average path length, network heterogeneity, and average communicability for the network of injecting drug users (Drugs) as a result of the triadic closure obtained by the current method and at random (see the text for expla- nations). and degree heterogeneity optimization. The new method is, obviously, more compu- tationally demanding than the random process, but the improvement in the results very well justifies its use. There are a few immediate directions of research in this area which we will briefly describe here. It would be interesting to further explore the possibilities of this communicability-driven mechanism for the generation of networks with large de- gree heterogeneity, such as those having fat-tailed degree distributions. For instance, one could start with an Erd¨os–R`enyi and then proceed closing triads by using this communicability-driven approach. As shown in this work, this strat- egy should increase the degree heterogeneity of the network, departing it from the low values characterizing random graphs and approaching those observed in networks obtained by preferential attachment models. The reader should be alerted that the aim of these experiments is not only of practical interest but of conceptual relevance. Nowadays it is believed that such fat-tailed degree distributions observed in real- world networks are mostly the consequence of preferential attachments mechanisms [2]. Thus, a positive result in this direction will question the validity of such statement. Recently, in [1] several mathematical techniques to tune the total communicability (see [6]) of a network have been introduced; these allow to obtain more robust networks with the highest possible goodness of communication among the nodes. The current Prediction of triadic closure 17

communicability-driven mechanism for triadic closure may help in this effort as well. Indeed, as we have shown here, the average communicability of the network increases as a secondary effect of closing triads using this method. Finally, a third direction of exploration should include the so-called network tem- perature, which has been previously used in the study of network communicability [14]. This parameter will allow the tuning of the communicability functions used in the definition of ∆uv, increasing the flexibility of the method to improve the calibra- tion results. In closing, the current communicability-driven method for predicting triadic clo- sure in networks displays very interesting properties and excellent perspectives for future developments. Appendix. In this section we give a brief description of the networks used for the tests throughout the paper. Brain networks Neurons: Neuronal synaptic network of the nematode C. elegans. Included • all data except muscle cells and using all synaptic connections [53, 41]. Ecological networks BridgeBrook: pelagic species from the largest of a set of 50 New York Adiron- • dack lake food webs [47]; Grassland: all vascular plants and all insects and trophic interactions found • inside stems of plants collected from 24 sites distributed within England and Wales [36]; ScotchBroom: trophic interactions between the herbivores, parasitoids, preda- • tors and pathogens associated with broom, Cytisus scoparius, collected in Silwood Park, Berkshire, England, UK [37]; StMartin: birds and predators and arthropod prey of Anolis lizards on the • island of St. Martin, which is located in the northern Lesser Antilles [35]; Ythan1: mostly birds, fishes, invertebrates, and metazoan parasites in a Scot- • tish Estuary [27]. Informational networks Roget: vocabulary network of words related by their definitions in Rogets • Thesaurus of English. Two words are connected if one is used in the definition of the other [49]. PPI networks PIN Ecoli: protein-protein interaction network in Escherichia coli [9]; • PIN Human: protein-protein interaction network in human [50]; • PIN Yeast: protein-protein interaction network in S. cerevisiae (yeast) [8, 38]. • Social and economic networks Colorado: the risk network of persons with HIV infection during its early • epidemic phase in Colorado Spring, USA, using analysis of community wide HIV/AIDS contact tracing records (sexual and injecting drugs partners) from 1985-1999 [48]; Dolphins: social network of frequent association between 62 bottlenose dol- • phins living in the waters off New Zealand [33]; Drugs: social network of injecting drug users (IDUs) that have shared a needle • in the last six months [43]. Galesburg: friendship ties among 31 physicians [11, 29, 46]; • Geom: collaboration network of scientists in the field of Computational Ge- • ometry [3]; 18 Ernesto Estrada and Francesca Arrigo

High School: network of relations in a high school. The students choose the • three members they wanted to have in a committee [55]; High tech: friendship ties among the employees in a small high-tech computer • firm which sells, installs, and maintain computer systems [30, 46]; MathMethod: this network concerns the diffusion of a new mathematics • method in the 1950s. It traces the diffusion of the modern mathematical method among school systems that combine elementary and secondary pro- grams in Allegheny County (Pennsylvania, U.S.) [10, 46]; Prison: social network of inmates in prison who chose What fellows on the • tier are you closest friends with? [34]; Sawmill: social communication network within a sawmill, where employees • were asked to indicate the frequency with which they discussed work matters with each of their colleagues [39, 46]; social3: social network among college students in a course about leadership. • The students choose which three members they wanted to have in a committee [55]; Zachary: social network of friendship among the members of a karate club • [54]. Technological networks Internet: the Internet at the Autonomous System (AS) level as of September • 1997 and of April 1998 [23]; USAir97: airport transportation network between airports in US in 1997 [3]. • Transcription networks Transc yeast: direct transcriptional regulation between genes in Saccaromyces • cerevisiae [41, 42].

REFERENCES

[1] F. Arrigo and M. Benzi, Updating and downdating techniques for optimizing network com- municability, arXiv:1410.5303, 2014. [2] A. L. Barabasi´ and R. Albert, Emergence of scaling in random networks, Science, 286 (1999), pp. 509–512 [3] V. Batagelj and A. Mrvar, Pajek datasets, http://vlado.fmf.uni-lj.si/pub/networks/data/. [4] M. Benzi and G. H. Golub, Bounds for the entries of matrix functions with application to preconditioning, BIT 39 (1999), pp. 417–438. [5] M. Benzi and P. Boito, Quadrature rule-based bounds for functions of adjacency matrices, Linear Algebra Appl. 433 (2010), pp. 637–652. [6] M. Benzi and C. Klymko, Total communicability as a measure, J. Complex Net. , 1(2) (2013), pp. 124–149. [7] G. Bianconi, R. K. Darst, J. Iacovacci and S. Fortunato, Triadic closure as a basic generating mechanism of communities in complex networks, Phys. Rev. E, 90 (4) (2014), 042806. [8] D. Bu Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, G. Li, and R. Chen, Topological structure analysis of the protein-protein interaction network in budding yeast, Nucleic Acids Res. 31 (2003), pp. 2443–2450. [9] G. Butland, J. M. Peregrn–Alvarez, J. Li, W. Yang, X. Yang, V. Canadien, A. Staros- tine, D. Richards, B. Beattie, N. Krogan, M. Davey, J. Parkinson, J. Greenblatt, and A. Emili, Interaction network containing conserved and essential protein complexes in Escherichia coli, Nature 433.7025 (2005), pp. 531–537. [10] R. O. Carlson, Adoption of Educational Innovations, Eugene: University of Oregon, Center for the Advanced Study of Educational Administration (1965), p. 19). [11] J. S. Coleman, E. Katx, H. Menzel, Medical Innovation. A Diffusion Study, Indianapolis: Bobbs–Merrill Company, 1966. [12] L. F. Costa, O. N. Oliveira Jr, G. Travieso, F. A. Rodrigues,P. R. Villas Boas, L. An- tiqueira, M. P. Viana, and L. E. C. Rocha, Analyzing and modeling real-world phenom- Prediction of triadic closure 19

ena with complex networks: a survey of applications, Advances in Physics 60 (3) (2011), pp. 329–412. [13] E. Estrada and J. A. Rodr´ıguez-Velazquez´ , Subgraph centrality in complex networks, Phys. Rev. E 71 (2005), 056103. [14] E. Estrada and N. Hatano, Statistical-mechanical approach to subgraph centrality in complex networks, Chem. Phy. Lett. 439 (2007), pp. 247–251 [15] E. Estrada and N. Hatano, Communicability in Complex Networks, Phys. Rev. E, 77 (2008), 036111. [16] E. Estrada, Quantifying network heterogeneity, Phys. Rev. E 82 (6) (2010), 066102. [17] E. Estrada, D. J. Higham, Network properties revealed through matrix functions, SIAM Rev. 52 (2010), pp. 696–714. [18] E. Estrada, The Structure of Complex Networks. Theory and Applications, Oxford University Press, 2011; [19] E. Estrada, N. Hatano, M. Benzi, The physics of communicability in complex networks, Phys. Rep. 514 (2012), pp. 89–119. [20] E. Estrada, The communicability distance in graphs, Linear Algebra Appl. , 436 (2012), pp. 4317–4328. [21] E. Estrada, Complex networks in the Euclidean space of communicability distances, Phys. Rev. E, 85 (2012), 066122. [22] E. Estrada and M. Benzi, Atomic displacements due to spin-spin repulsion in conjugated alternant hydrocarbons, Chem. Phys. Lett. , 568–569 (2013), pp. 184–189. [23] M. Faloutsos, P. Faloutsos, and C. Faloutsos, On power-law relationships of the internet topology, Comp. Comm. Rev. 29 (1999), pp. 251–262. [24] G. H. Golub and G. Meurant, Matrices, Moments and Quadrature with Applications, Prince- ton University Press, Princeton, NJ 2010. [25] N. J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2008. [26] R. A. Horn and C. R. Johnson, Matrix Analysis. Second Edition, Cambridge University Press, 2013. [27] M. Huxman, S. Beany, and D. Raffaelli, Do parasites reduce the chances of triangulation in a real food web?, Oikos 76 (1996), pp. 284–300. [28] P. Klimek and S. Thurner, Triadic closure dynamics drives scaling laws in social multiplex networks, New Journal of Physics 15.6 (2013): 063008. [29] D. Knoke and R. S. Burt, Prominence, Applied network analysis (1983): pp. 195–222. [30] D. Krackhardt, The ties that torture: Simmelian tie analysis in organizations, Res. So- ciol. Org. 16 (1999), pp. 183–210. [31] D. Krackhardt and M. Handcock, Heider vs. Simmel: Emergent features in dynamic structure, Statistical Network Analysis: Models, Issues, and New Directions (2007), pp. 14– 27. [32] J. Leskovec, L. Backstrom, R. Kumar, A. Tomkins, Microscopic evolution of social net- works, Y. Li, B. Liu, S. Sarawagi (eds.) KDD, pp. 462–470, ACM (2008). [33] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, The bottlenose dolphin community in Doubtful Sound features a large proportion of long-lasting associations, Behavioral Ecology and Sociobiology 54 (2003), pp. 396–405. [34] D. MacRae, Direct factor analysis of sociometric data, Sociometry 23 (1960), pp. 360–371. [35] N. D. Martinez Artifacts or attributes? Effects of the resolution on the Little Rock lake food web, Ecol. Monogr. 61 (1991), pp. 367–392. [36] N. D. Martinez, B. A. Hawkins, H. A. Dawah, and B. P. Feifarek, Effects of sampling efforts on characterization of food web structure, Ecology 80 (1999), pp. 1044–1055. [37] J. Memmott, N. D. Martinez, and J. E. Cohen, Predators, parasitoids and pathogens: species richness, trophic generality and body sizes in a natural food web, J. Anim. Ecol. 69. 1 (2000), pp. 1–15. [38] C. von Mering, R. Krause, B. Snel, M. Cornell, S. G. Oliver, S. Fields, and P. Bork, Comparative assessment of large-scale data sets of protein-protein interactions, Nature 417 (2002), pp. 399–403. [39] J. H. Michael and J. G. Massey, Modeling the communication network in a sawmill, Forest Prod. J. 47 (1997), pp. 25–30. [40] P. Van Mieghem, Graph Spectra for Complex Networks, Cambridge University Press, 2011. [41] R. Milo, S. Shen–Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon Network motifs: simple building blocks of complex networks, Science, vol. 298 no. 5594 (2002), pp. 824–827. [42] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. S. Shen–Orr, I. Ayzenshtat, M. Shef- 20 Ernesto Estrada and Francesca Arrigo

fer, and U. Alon, Superfamilies of evolved and designed networks, Science 303.5663 (2004), pp. 1538–1542. [43] Data for this project were provided, in part, by NIH grants DA12831 and HD41877, and copies can be obtained from James Moody (moody.77@.osu.edu). [44] M. E. J. Newman, The structure and function of complex networks, SIAM Rev. 45 (2003), pp. 167–256; [45] M. E. J. Newman, Networks. An Introduction, Oxford University Press, 2010. [46] W. De Nooy, A. Mrvar, and V. Batagelj, Exploratory with Pajek, Cambridge University Press, 2005. [47] G. A. Polis and R. S. Donald, Food web complexity and community dynamics, Am. Nat. (1996): pp. 813-846. [48] J. J. Potterat, L. Philips–Plummer, S. Q. Muth, R. B. Rothenberg, D. E. Woodhouse, T. S. Maldonado–Long, H. P. Zimmermann, J. B. Muth, Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs, Sex. Transm. Infect. 78 (2002), pp. i159–i163. [49] Roget’s thesaurus of english words and phrases, Project Gutenberg (2002), http://www.gutenberg.org/etxt/22. [50] J. F. Rual, K. Venkatesan, T. Hao, T. Hirozante–Kishikawa, A. Dricot, L. Ning, G. F. Berriz, F. D. Gibbons, M. Dreze, N. Ayivi–Guedehoussou, N. Klitgord, C. Si- mon, M. Boxem, S. Milstein, J. Rosenberg, D. S. Goldberg, L. V. Zhang, S. L. Wong, G. Franklin, S. Li, J. S. Albala, J. Lim, C. Fraughton, E. Llamosas, S. Cevik, P. Lamesch, R. S. sikoroski, J. Andenhaute, H. Y. Zoghbi, A. Smolyar, S. Bosak, R. Sequerra, L. Doucette–Stamm, M. E. Cusick, D. E. Hill, F. P. Roth, and M. Vi- dal, Towards a proteome-scale map of the human protein-protein interaction networks, Nature 437 (2005), pp. 1059–1069. [51] G. Simmel, Conflict and The Web of Group Affiliations, Free Press, New York (1922). [52] D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks, Nature 393 (1998), pp. 440–442. [53] J. G. White, E. Southgate, J. N. Thomson, and S. Brenner The structure of the nervous system of the nematode Caenorhabditis elegans, Philos. T. Roy. Soc. B, 314.1165 (1986), pp. 1-340. [54] W. W. Zachary, An information flow model for conflict and fission in small groups, J. An- thropol. Res., 33 (1977), pp. 452–473. [55] L. D. Zeleny, Adaptation of research findings in social leadership to college classroom proce- dures, Sociometry, 13 (4) (1950), pp. 314–328