Centrality measures for graphons: Accounting for uncertainty in networks

Marco Avella-Medina, Francesca Parise, Michael T. Schaub, and Santiago Segarra, Member, IEEE

Abstract—As relational datasets modeled as graphs keep increasing in size and their data acquisition is permeated by uncertainty, graph-based analysis techniques can become computationally and conceptually challenging. In particular, node centrality measures rely on the assumption that the graph is perfectly known, a premise not necessarily fulfilled for large, uncertain networks. Accordingly, centrality measures may fail to faithfully extract the importance of nodes in the presence of uncertainty. To mitigate these problems, we suggest a statistical approach based on graphon theory: we introduce formal definitions of centrality measures for graphons and establish their connections to classical graph centrality measures. A key advantage of this approach is that centrality measures defined at the modeling level of graphons are inherently robust to stochastic variations of specific graph realizations. Using the theory of linear integral operators, we define degree, eigenvector, Katz and PageRank centrality functions for graphons and establish concentration inequalities demonstrating that graphon centrality functions arise naturally as limits of their counterparts defined on sequences of graphs of increasing size. The same concentration inequalities also provide high-probability bounds between the graphon centrality functions and the centrality measures on any sampled graph, thereby establishing a measure of uncertainty of the measured centrality score.

Index Terms—Graph theory, Networks, Graphons, Centrality measures

1 INTRODUCTION

Many biological [1], social [2], and economic [3] systems can be better understood when interpreted as networks, comprising a large number of individual components that interact with each other to generate a global behavior. These networks can be aptly formalized by graphs, in which nodes denote individual entities, and edges represent pairwise interactions between those nodes. Consequently, a surge of studies concerning the modeling, analysis, and design of networks has appeared in the literature, using graphs as modeling devices.

A fundamental task in network analysis is to identify salient features in the underlying system, such as key nodes or agents in the network. To identify such important agents, researchers have developed centrality measures in various contexts [4], [5], [6], [7], [8], each of them capturing different aspects of node importance. Prominent examples of the utility of centrality measures include the celebrated PageRank algorithm [9], [10], employed in the search of relevant sites on the web, as well as the identification of influential agents in social networks to facilitate viral marketing campaigns [11].

A crucial assumption for the applicability of these centrality measures is that the observation of the underlying network is complete and noise free. However, for many systems we might be unable to extract a complete and accurate graph-based representation, e.g., due to computational or measurement constraints, errors in the observed data, or because the network itself might be changing over time. For such reasons, some recent approaches have considered the issue of robustness of centrality measures [12], [13], [14], [15] and general network features [16], as well as their computation in dynamic graphs [17], [18], [19]. The closest work to ours is [20], where convergence results are derived for eigenvector and Katz centrality in the context of random graphs generated from a stochastic block model.

As the size of the analyzed systems continues to grow, traditional tools for network analysis have been pushed to their limit. For example, systems such as the world wide web, the brain, or social networks can consist of billions of interconnected agents, leading to computational challenges and the irremediable emergence of uncertainty in the observations. In this context, graphons have been suggested as an alternative framework to analyze large networks [21], [22]. While graphons have initially been studied as limiting objects of large graphs [23], [24], [25], they also provide a rich non-parametric modeling tool for networks of any size [26], [27], [28], [29]. In particular, graphons encapsulate a broad class of network models including the stochastic block model [30], [31], random dot-product graphs [32], the infinite relational model [33], and others [34]. A testament to the practicality of graphons is their use in applied disciplines such as signal processing [35], collaborative learning [36], and control [37].

In this work we aim at harnessing the additional flexibility provided by the graphon framework to suggest a statistical approach to agents' centralities that inherently accounts for network uncertainty, as detailed next.

Authors are ordered alphabetically. All authors contributed equally. M. Avella-Medina is with the Department of Statistics, Columbia University. F. Parise is with the Laboratory for Information and Decision Systems, MIT. M. Schaub is with the Institute for Data, Systems, and Society, MIT and with the Department of Engineering Science, University of Oxford, UK. S. Segarra is with the Department of Electrical and Computer Engineering, Rice University. Emails: [email protected], {parisef, mschaub}@mit.edu, [email protected]. Funding: This work was supported by the Swiss National Science Foundation grants P2EGP1 168962, P300P1 177746 (M. Avella-Medina) and P2EZP2 168812, P300P2 177805 (F. Parise); the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 702410 (M. Schaub); the Spanish MINECO TEC2013-41604-R and the MIT IDSS Seed Fund Program (S. Segarra).

1.1 Motivation

Most existing applications of network centrality measures follow the paradigm in Fig. 1a: a specific graph, such as a social network with friendship connections, is observed, and conclusions about the importance of each agent are then drawn based on this graph, e.g., which individuals have more friends or which have the most influential connections. Mathematically, these notions of importance are encapsulated in a centrality measure that ranks the nodes according to the observed network structure. For instance, the idea that importance derives from having the most friends is captured by degree centrality. Since the centrality of any node is computed solely from the network structure [4], [5], [6], [7], [8], a crucial assumption hidden in this analysis is that the empirically observed network captures all the data we care about. However, in many instances in which centrality measures are employed, this assumption is arguably not fulfilled: we typically do not observe the complete network at once. Further, even those parts we observe contain measurement errors, such as false positive or false negative links, and other forms of uncertainty.

The key question is therefore how to identify crucial nodes via network-based centrality measures without having access to an accurate depiction of the 'true' latent network. One answer to this problem is to adopt a statistical inference-based viewpoint towards centrality measures, by assuming that the observed graph is a specific realization of an underlying stochastic generative process; see Fig. 1b. In this work, in particular, we use graphons to model such an underlying generative process, because they provide a rich non-parametric statistical framework. Further, it has been recently shown that graphons can be efficiently estimated from one (or multiple) noisy graph observations [28], [38]. Our main contribution is to show that, based on the inferred graphon, one can compute a latent centrality profile of the nodes that we term the graphon centrality function. This graphon centrality may be seen as a fundamental measure of node importance, irrespective of the specific realization of the graph at hand. This leads to a robust estimate of the centrality profiles of all nodes in the network. In fact, we provide high-probability bounds on the distance between such latent graphon centrality functions and the centrality profiles in any realized network, in terms of the network size.

Fig. 1: Schematic of network centrality analysis. (a) Classical centrality analysis computes a centrality measure purely based on the observed network. (b) If networks are subject to uncertainty, we may adopt a statistical perspective on centrality, by positing that the observed network is but one realization of a true, unobserved latent model. Inference of the model then would lead to a centrality estimate that accounts for the uncertainty in the data in a well-defined manner. (c) Illustrative example. Left: A network of 100 nodes is generated according to a graphon model with a well-defined increasing connectivity pattern. Right: This graphon model defines a latent (expected) centrality for each node (blue curve). The centralities of a single realization of the model (red curve) will in general not be equivalent to the latent centrality, but deviate from it. Estimating the graphon-based centrality thus allows us to decompose the observed centrality into an expected centrality score (blue), and a fluctuation that is due to randomness.

To illustrate the dichotomy between the standard approach towards centrality and the one outlined here, let us consider the graphon in Fig. 1c. Graphons will be formally defined in Section 2.2, but the fundamental feature is that a graphon defines a random graph model from which graphs of any pre-specified size can be obtained. If we generate one of these graphs with 100 nodes, we can apply the procedure in Fig. 1a to obtain a centrality value for each agent, as shown in the red curve in Fig. 1c. In the standard paradigm we would then sort the centrality values to find the most central nodes, which in this case would correspond to the node marked as v1 in Fig. 1c. On the other hand, if we have access to the generative graphon model (or an estimate thereof), then we can compute the continuous graphon centrality function and compare the deviations from it in the specific graph realization; see blue and red curves in Fig. 1c.

The result is that while v1 is the most central node in the specific graph realization (see red curve), we would expect it to be less central within the model-based framework, since the two nodes to its right have higher latent centrality (see blue curve). Stated differently, in this specific realization, v1 benefited from the random effects in terms of its centrality. If another random graph is drawn from the same graphon model, the rank of v1 might change, e.g., node v1 might become less central. Based on a centrality analysis akin to Fig. 1a, we would conclude that the centrality of this node decreased relative to other agents in the network. However, this difference is exclusively due to random variations and thus not statistically significant. The approach outlined in Fig. 1b and, in particular, centrality measures defined on graphons thus provide a statistical framework to analyze centralities in the presence of uncertainty, shielding us from drawing the wrong conclusion about the change in centrality of node v1 if the network is subject to uncertainty.

A prerequisite to apply the perspective outlined above is to have a consistent theory of centrality measures for graphons, with well-defined limiting behaviors and well-understood convergence rates. Such a theory is developed in this paper, as we detail in the next section.

1.2 Contributions and article structure

Our contributions are listed below.
1) We develop a theoretical framework and definitions for centrality measures on graphons. Specifically, using the existing spectral theory of linear integral operators, we define the degree, eigenvector, Katz and PageRank centrality functions (see Definition 3).
2) We discuss and illustrate three different analytical approaches to compute such centrality functions (see Section 4).
3) We derive concentration inequalities showing that our newly defined graphon centrality functions are natural limiting objects of centrality measures for finite graphs. These concentration inequalities improve the current state of the art and constitute the main technical results of this paper (see Theorems 1 and 2).
4) We illustrate how such bounds can be used to quantify the distance between the latent graphon centrality function and the centrality measures of finite graphs sampled from the graphon.

The remainder of the article is structured as follows. In Section 2 we review preliminaries regarding graphs, graph centralities, and graphons. Subsequently, in Section 3 we recall the definition of the graphon operator and use it to introduce centrality measures for graphons. Section 4 discusses how centrality measures for graphons can be computed using different strategies, and provides some detailed numerical examples. Thereafter, in Section 5, we derive our main convergence results. Section 6 provides concluding remarks. Appendix A contains proofs omitted in the paper. Appendix B, provided in the supplementary material, presents some auxiliary results and discussions.

Notation: The entries of a matrix X and a (column) vector x are denoted by Xij and xi, respectively; however, in some cases [X]ij and [x]i are used for clarity. The notation (·)ᵀ stands for transpose. diag(x) is a diagonal matrix whose ith diagonal entry is xi. ⌈x⌉ denotes the ceiling function that returns the smallest integer larger than or equal to x. Sets are represented by calligraphic capital letters, and 1_B(·) denotes the indicator function over the set B. 0, 1, e_i, and I refer to the all-zero vector, the all-one vector, the i-th canonical basis vector, and the identity matrix, respectively. The symbols v, ϕ, and λ are reserved for eigenvectors, eigenfunctions, and eigenvalues, respectively. Additional notation is provided at the beginning of Section 3.

2 PRELIMINARIES

In Section 2.1 we introduce basic graph-theoretic concepts as well as the notion of node centrality measures for finite graphs, emphasizing the four measures studied throughout the paper. A brief introduction to graphons and their relation to random graph models is given in Section 2.2.

2.1 Graphs and centrality measures

An undirected and unweighted graph G = (V, E) consists of a set V of N nodes or vertices and an edge set E of unordered pairs of elements in V. An alternative representation of such a graph is through its adjacency matrix A ∈ {0,1}^{N×N}, where A_ij = A_ji = 1 if (i,j) ∈ E and A_ij = 0 otherwise. In this paper we consider simple graphs (i.e., without self-loops), so that A_ii = 0 for all i.

Node centrality is a measure of the importance of a node within a graph. This importance is not based on the intrinsic nature of each node, but rather on the location that the nodes occupy within the graph. More formally, a centrality measure assigns a nonnegative centrality value to every node such that the higher the value, the more central the node is. The centrality ranking imposed on the node set V is in general more relevant than the absolute centrality values. Here, we focus on four centrality measures, namely, the degree, eigenvector, Katz and PageRank centrality measures overviewed next; see [8] for further details.

Degree centrality is a local measure of the importance of a node within a graph. The degree centrality c^d_i of a node i is given by the number of nodes connected to it, that is,

c^d := A1,  (1)

where the vector c^d collects the values of c^d_i for all i ∈ V.

Eigenvector centrality, just as degree centrality, depends on the neighborhood of each node. However, the centrality measure c^e_i of a given node i does not depend only on the number of neighbors, but also on how important those neighbors are. This recursive definition leads to an equation of the form Ac^e = λc^e, i.e., the vector of centralities c^e is an eigenvector of A. Since A is symmetric, its eigenvalues are real and can be ordered as λ_1 ≥ λ_2 ≥ ... ≥ λ_N. The eigenvector centrality is then defined via the principal eigenvector v_1 associated with λ_1:

c^e := √N v_1.  (2)

For connected graphs, the Perron-Frobenius theorem guarantees that λ_1 is a simple eigenvalue, and that there is a unique associated (normalized) eigenvector v_1 with positive real entries. As will become apparent later, the √N normalization introduced in (2) facilitates the comparison of the eigenvector centrality on a graph to the corresponding centrality measure defined on a graphon.

Katz centrality measures the importance of a node based on the number of immediate neighbors in the graph as well as the number of two-hop neighbors, three-hop neighbors, and so on. The effect of nodes further away is discounted at each step by a factor α > 0. Accordingly, the vector of centralities is computed as c^k_α = 1 + (αA)1 + (αA)²1 + ..., where we add the number of k-hop neighbors weighted by α^k. By choosing α such that 0 < α < 1/λ_1(A), the above series converges and we can write the Katz centrality compactly as

c^k_α := (I − αA)^{-1} 1.  (3)

Notice that if α is close to zero, the relative weight given to neighbors further away decreases fast, and c^k_α is driven mainly by the one-hop neighbors just like degree centrality. In contrast, if α is close to 1/λ_1(A), the solution to (3) is almost a scaled version of c^e. Intuitively, for intermediate values of α, Katz centrality captures a hybrid notion of importance by combining elements from degree and eigenvector centralities. We remark that Katz centrality is sometimes defined as c^k_α − 1. Since a constant shift does not alter the centrality ranking, we here use formula (3). We also note that Katz centrality is sometimes referred to as Bonacich centrality in the literature.

PageRank measures the importance of a node in a recursive way based on the importance of the neighboring nodes (weighted by their degree). Mathematically, the vector of PageRank centralities is given by

c^pr_β := (1 − β)(I − βAD^{-1})^{-1} 1,  (4)

where 0 < β < 1 and D is the diagonal matrix of the degrees of the nodes. Note that the above formula corresponds to the stationary distribution of a random 'surfer' on the graph, who follows the links on the graph with probability β and with probability (1 − β) jumps ('teleports') to a uniformly at random selected node in the graph. See [10] for further details on PageRank.
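As a concrete companion to (1)-(4), the following sketch (our own illustration, not code from the paper) computes all four measures with standard linear algebra. It assumes a connected simple graph, so that the Perron-Frobenius argument behind (2) applies and all degrees are positive in (4); the parameter defaults are illustrative.

```python
import numpy as np

def centralities(A, alpha=0.1, beta=0.85):
    """Degree (1), eigenvector (2), Katz (3), and PageRank (4) centralities
    of a connected simple graph with symmetric adjacency matrix A."""
    N = A.shape[0]
    one = np.ones(N)
    c_deg = A @ one                                  # (1): c^d = A 1
    lam, V = np.linalg.eigh(A)                       # real spectrum, A symmetric
    v1 = V[:, -1]                                    # principal eigenvector
    v1 = v1 if v1.sum() > 0 else -v1                 # fix sign: positive entries
    c_eig = np.sqrt(N) * v1                          # (2): c^e = sqrt(N) v_1
    assert alpha < 1.0 / lam[-1], "Katz requires alpha < 1/lambda_1"
    c_katz = np.linalg.solve(np.eye(N) - alpha * A, one)          # (3)
    Dinv = np.diag(1.0 / c_deg)                      # degrees > 0 by connectivity
    c_pr = (1 - beta) * np.linalg.solve(np.eye(N) - beta * A @ Dinv, one)  # (4)
    return c_deg, c_eig, c_katz, c_pr
```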

2.2 Graphons

A graphon is the limit of a convergent sequence of graphs of increasing size, that preserves certain desirable features of the graphs contained in the sequence [21], [22], [23], [24], [25], [39], [40], [41]. Formally, a graphon is a measurable function W : [0,1]² → [0,1] that is symmetric, W(x,y) = W(y,x). Intuitively, one can interpret the value W(x,y) as the probability of existence of an edge between x and y. However, the 'nodes' x and y no longer take values in a finite node set as in classical finite graphs but rather in the continuous interval [0,1]. Based on this intuition, graphons also provide a natural way of generating random graphs [39], [42], as introduced in the seminal paper [23] under the name W-random graphs. In this paper we will make use of the following model, in which the symmetric adjacency matrix S^(N) ∈ {0,1}^{N×N} of a simple random graph of size N constructed from a graphon is such that for all i, j ∈ {1,...,N}

Pr[S^(N)_ij = 1 | u_i, u_j] = κ_N W(u_i, u_j),  (5)

where u_i and u_j are latent variables selected uniformly at random from [0,1], and κ_N is a constant regulating the sparsity of the graph (see also Definition 7).¹

1. Throughout this paper we adopt the terminology of sparse graphs for graphs generated following (5) with parameter κ_N → 0 and Nκ_N → ∞ as N → ∞, even though this does not imply a bounded degree. This terminology is consistent with common usage in the literature [43], [44], [45]. Note also that [46] proposed an interesting graph limit framework for graph sequences with bounded degree.

This means that, when conditioned on the latent variables (u_1, u_2, ..., u_N), the off-diagonal entries of the symmetric matrix S^(N) are independent Bernoulli random variables with success probability given by κ_N W. In this sense, when κ_N = 1, the constant graphon W(x,y) = p gives rise to Erdős-Rényi random graphs with edge probability p. Analogously, a piecewise constant graphon gives rise to stochastic block models [30], [31]; for more details see Section 4.1. Interestingly, it can be shown that the distribution of any simple exchangeable random graph [34], [39] is characterized by a function W as discussed above [39], [47], [48]. Finally, observe that for any measure preserving map π : [0,1] → [0,1], the graphons W(x,y) and W^π(x,y) := W(π(x), π(y)) define the same probability distribution on random graphs. A precise characterization of the equivalence classes of graphons defining the same probability distribution can be found in [22, Ch. 10].
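In code, the sampling scheme (5) amounts to drawing latent variables and thresholding uniform random numbers against κ_N W(u_i, u_j). The sketch below is our own illustration (function names and defaults are ours, not from the paper):

```python
import numpy as np

def sample_graph(W, N, kappa=1.0, rng=np.random.default_rng(0)):
    """Draw S^(N) per (5): latent u_i ~ Unif[0,1], then independent
    Bernoulli edges with success probability kappa * W(u_i, u_j)."""
    u = rng.uniform(size=N)                   # latent variables u_1, ..., u_N
    P = kappa * W(u[:, None], u[None, :])     # conditional edge probabilities
    U = rng.uniform(size=(N, N))
    S = np.triu((U < P).astype(int), k=1)     # sample upper triangle only
    return S + S.T, u                         # symmetric, no self-loops

# Example: the constant graphon W(x,y) = 0.3 with kappa_N = 1 yields an
# Erdos-Renyi graph with edge probability 0.3, as noted in the text.
S, u = sample_graph(lambda x, y: np.full_like(x * y, 0.3), N=100)
```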

3 EXTENDING CENTRALITIES TO GRAPHONS

In order to introduce centrality measures for graphons we first introduce a linear integral operator associated with a graphon and recall its spectral properties. From here on, we denote by L²([0,1]) the Hilbert function space with inner product ⟨f_1, f_2⟩ := ∫_0^1 f_1(x)f_2(x)dx for f_1, f_2 ∈ L²([0,1]), and norm ‖f_1‖ := √⟨f_1, f_1⟩. The elements of L²([0,1]) are the equivalence classes of Lebesgue integrable functions f : [0,1] → R, that is, we identify two functions f and g with each other if they differ only on a set of measure zero (i.e., f ≡ g ⇔ ‖f − g‖ = 0). 1_{[0,1]} is the identity function in L²([0,1]). We use blackboard bold symbols (such as L) to denote linear operators acting on L²([0,1]), with the exception of N and R that denote the sets of natural and real numbers. The induced (operator) norm is defined as |||L||| := sup_{f ∈ L²([0,1]) s.t. ‖f‖=1} ‖Lf‖.

3.1 The graphon integral operator and its properties

Following [22], we introduce a linear operator that is fundamental to derive the notions of centrality for graphons.

Definition 1 (Graphon operator). For a given graphon W, we define the associated graphon operator W as the linear integral operator W : L²([0,1]) → L²([0,1])

f(y) → (Wf)(x) = ∫_0^1 W(x,y)f(y)dy.

From an operator theory perspective, the graphon W is the integral kernel of the linear operator W. Given the key importance of W, we review its spectral properties in the next definition and lemma.

Definition 2 (Eigenvalues and eigenfunctions). A complex number λ is an eigenvalue of W if there exists a nonzero function ϕ ∈ L²([0,1]), called the eigenfunction, such that

(Wϕ)(x) = λϕ(x).  (6)

It follows from the above definition that the eigenfunctions are only defined up to a rescaling parameter. Hence, from now on we assume all eigenfunctions are normalized such that ‖ϕ‖ = 1. We next recall some known properties of the graphon operator.

Lemma 1. The graphon operator W has the following properties.
1) W is self-adjoint, bounded, and continuous.
2) W is diagonalizable. Specifically, W has countably many eigenvalues, all of which are real and can be ordered as λ_1 ≥ λ_2 ≥ λ_3 ≥ .... Moreover, there exists an orthonormal basis for L²([0,1]) of eigenfunctions {ϕ_i}_{i=1}^∞. That is, (Wϕ_i)(x) = λ_iϕ_i(x), ⟨ϕ_i, ϕ_j⟩ = δ_{i,j} for all i, j, and any function f ∈ L²([0,1]) can be decomposed as f(x) = Σ_{i=1}^∞ ⟨f, ϕ_i⟩ϕ_i(x). Consequently,

(Wf)(x) = Σ_{i=1}^∞ λ_i ⟨f, ϕ_i⟩ϕ_i(x).

If the set of nonzero eigenvalues is infinite, then 0 is its unique accumulation point.
3) Let W^k denote k consecutive applications of the operator W. Then, for any k ∈ N,

(W^k f)(x) = Σ_{i=1}^∞ λ_i^k ⟨f, ϕ_i⟩ϕ_i(x).

4) The maximum eigenvalue λ_1 is positive and there exists an associated eigenfunction ϕ_1 which is positive, that is, ϕ_1(x) > 0 for all x ∈ [0,1]. Moreover, λ_1 = |||W|||.

Points 1 to 3 of the lemma above can be found in [24], while part 4) follows from the Krein-Rutman theorem [49, Theorem 19.2], upon noticing that the graphon operator W is positive with respect to the cone K defined by the set of nonnegative functions in L²([0,1]).

3.2 Definitions of centrality measures for graphons

We define centrality measures for graphons based on the graphon operator introduced in the previous section. These definitions closely parallel the construction of centrality measures in finite graphs; see Section 2.1. The main difference is that the linear operator defining the respective centralities is an infinite dimensional operator, rather than a finite dimensional matrix.

Definition 3 (Centrality measures for graphons). Given a graphon W and its associated operator W, we define the following centrality functions:
1) Degree centrality: We define c^d : [0,1] → R_+ as

c^d(x) := (W1_{[0,1]})(x) = ∫_0^1 W(x,y)dy.  (7)

2) Eigenvector centrality: For W with a simple largest eigenvalue λ_1, let ϕ_1 (‖ϕ_1‖ = 1) be the associated positive eigenfunction. The eigenvector centrality function c^e : [0,1] → R_+ is

c^e(x) := ϕ_1(x).  (8)

3) Katz centrality: Consider the operator M_α where (M_α f)(x) := f(x) − α(Wf)(x). For any 0 < α < 1/|||W|||, we define the Katz centrality function c^k_α : [0,1] → R_+ as

c^k_α(x) := (M_α^{-1} 1_{[0,1]})(x).  (9)

4) PageRank centrality: Consider the operator

(L_β f)(x) = f(x) − β ∫_0^1 W(x,y)f(y)(c^d(y))† dy,

where (c^d(y))† = (c^d(y))^{-1} if c^d(y) ≠ 0 and (c^d(y))† = 0 if c^d(y) = 0. For any 0 < β < 1, we define c^pr_β : [0,1] → R_+ as

c^pr_β(x) := (1 − β)(L_β^{-1} 1_{[0,1]})(x).  (10)

Note that c^d(y) = ∫_0^1 W(y,z)dz = 0 implies W(x,y) = 0 for almost every x.

Remark 1. The Katz centrality function is well defined, since for 0 < α < 1/|||W||| the operator M_α is invertible [50, Theorem 2.2]. Moreover, denoting the identity operator by I, it follows that M_α = I − αW. Hence, by using a Neumann series representation and the properties of the higher order powers of W we obtain the equivalent representation

(M_α^{-1} f)(x) = ((I − αW)^{-1} f)(x) = Σ_{k=0}^∞ α^k (W^k f)(x)
= f(x) + Σ_{k=1}^∞ α^k Σ_{i=1}^∞ λ_i^k ⟨ϕ_i, f⟩ϕ_i(x)
= f(x) + Σ_{i=1}^∞ (αλ_i/(1 − αλ_i)) ⟨ϕ_i, f⟩ϕ_i(x),

where we used that |λ_i| ≤ |||W||| for all i. Using an analogous series representation it can be shown that PageRank is well defined. Note also that eigenvector centrality is well-defined by Lemma 1, part 4).

Since a graphon describes the limit of an infinite dimensional graph, there is a subtle difference in the semantics of the centrality measure compared to the finite graph setting. Specifically, in the classical setting the network consists of a finite number of nodes and thus for a graph of N nodes we obtain an N-dimensional vector with one centrality value per node. In the graphon setting, we may think of each real x ∈ [0,1] as corresponding to one of infinitely many nodes, and thus the centrality measure is described by a function.
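Although Definition 3 is stated at the operator level, each function can be approximated numerically by discretizing W on a uniform grid, which turns the integral operator into a matrix acting on function samples. The sketch below is our own illustration (not from the paper) and assumes midpoint quadrature is adequate and, for (10), that degrees are bounded away from zero:

```python
import numpy as np

def graphon_centralities(W, N=500, alpha=0.5, beta=0.85):
    """Grid approximation of the centrality functions (7)-(10).
    K[i,j] = W(x_i, x_j)/N discretizes the integral operator, so
    matrix-vector products approximate (Wf)(x) at the grid points."""
    x = (np.arange(N) + 0.5) / N              # midpoints of a uniform grid
    K = W(x[:, None], x[None, :]) / N         # discretized kernel
    one = np.ones(N)
    c_deg = K @ one                           # (7): degree centrality
    lam, V = np.linalg.eigh(K)                # K symmetric since W is
    v1 = V[:, -1]
    v1 = v1 if v1.sum() > 0 else -v1          # positive principal eigenfunction
    c_eig = np.sqrt(N) * v1                   # (8), scaled so ||phi_1|| ~ 1 in L2
    assert alpha < 1 / lam[-1], "Katz requires alpha < 1/lambda_1"
    c_katz = np.linalg.solve(np.eye(N) - alpha * K, one)          # (9)
    c_pr = (1 - beta) * np.linalg.solve(
        np.eye(N) - beta * K / c_deg[None, :], one)               # (10)
    return x, c_deg, c_eig, c_katz, c_pr

# e.g., graphon_centralities(lambda x, y: (x**2 + y**2) / 2) for the
# finite-rank example of Section 4.2.1
```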
4 COMPUTING CENTRALITIES ON GRAPHONS

We illustrate how to compute centrality measures for graphons by studying three examples in detail. The graphons we consider are ordered by increasing complexity of their respective eigenspaces and by the generality of the methods used in the computation of the centralities.

4.1 Stochastic block model graphons

We consider a class of piecewise constant graphons that may be seen as the equivalent of a stochastic block model (SBM). Such graphons play an important role in practice, as they enable us to approximate more complicated graphons in a 'stepwise' fashion. This approximation idea has been exploited to estimate graphons from finite data [28], [51], [52]. In fact, optimal statistical rates of convergence can be achieved over smooth graphon classes [45], [53]. The SBM graphon is defined as follows

W_SBM(x,y) := Σ_{i=1}^m Σ_{j=1}^m P_ij 1_{B_i}(x) 1_{B_j}(y),  (11)

where P_ij ∈ [0,1], P_ij = P_ji, ∪_{i=1}^m B_i = [0,1] and B_i ∩ B_j = ∅ for i ≠ j. We define the following m-dimensional vector of indicator functions

1(x) := [1_{B_1}(x), ..., 1_{B_m}(x)]ᵀ,  (12)

enabling us to compactly rewrite the graphon in (11) as

W_SBM(x,y) = 1(x)ᵀ P 1(y).  (13)

We also define the following auxiliary matrices.

Definition 4. Let us define the effective measure matrix Q_SBM ∈ R^{m×m} and the effective connectivity matrix E_SBM ∈ R^{m×m} for SBM graphons as follows

Q_SBM := ∫_0^1 1(x)1(x)ᵀ dx,   E_SBM := P Q_SBM.  (14)

Notice that Q_SBM is a diagonal matrix with entries collecting the sizes of each block.

Fig. 2: Illustrative example of a graphon with stochastic block model structure. (a) Graphon W_SBM with block model structure as described in the text. (b-e) Degree, eigenvector, Katz, and PageRank centralities for the graphon depicted in (a).

Similarly, the matrix E_SBM is obtained by weighting the probabilities in P by the sizes of the different blocks. Hence, the effective connectivity from block B_i to two blocks B_j and B_k may be equal even if the latter block B_k has twice the size (Q_kk = 2Q_jj), provided that it has half the probability of edge appearance (2P_ik = P_ij). Notice also that the matrix E_SBM need not be symmetric. As will be seen in Section 4.2, the definitions in (14) are specific examples of more general constructions.

The following lemma relates the spectral properties of E_SBM to those of the operator W_SBM induced by W_SBM. We do not prove this lemma since it is a special case of Lemma 3, introduced in Section 4.2 and shown in Appendix A.

Lemma 2. Let λ_i and v_i denote the eigenvalues and eigenvectors of E_SBM in (14), respectively. Then, all the nonzero eigenvalues of W_SBM are given by λ_i and the associated eigenfunctions are of the form ϕ_i(x) = 1(x)ᵀ v_i.

Using the result above, we can compute the centrality measures for stochastic block model graphons based on the effective connectivity matrix.

Proposition 1 (Centrality measures for SBM graphons). Let λ_i and v_i denote the eigenvalues and eigenvectors of E_SBM in (14), respectively, and define the diagonal matrix D_E := diag(E_SBM 1). The centrality functions c^d, c^e, c^k_α, and c^pr_β of the graphon W_SBM can be computed as follows

c^d(x) = 1(x)ᵀ E_SBM 1,   c^e(x) = 1(x)ᵀ v_1 / √(v_1ᵀ Q_SBM v_1),  (15)
c^k_α(x) = 1(x)ᵀ (I − αE_SBM)^{-1} 1,
c^pr_β(x) = (1 − β) 1(x)ᵀ (I − βE_SBM D_E^{-1})^{-1} 1.

We next illustrate this result with an example. Its proof is given in Appendix A.

4.1.1 Example of a stochastic block model graphon

Consider the stochastic block model graphon W_SBM depicted in Fig. 2-(a), with corresponding symmetric matrix P [cf. (11)] as in (16). Let us define the vector of indicator functions specific to this graphon, 1(x) := [1_{B_1}(x), ..., 1_{B_5}(x)]ᵀ, where the blocks coincide with those in Fig. 2-(a), that is, B_1 = [0, 0.1), B_2 = [0.1, 0.4), B_3 = [0.4, 0.6), B_4 = [0.6, 0.9), and B_5 = [0.9, 1]. To apply Proposition 1 we need to compute the effective measure and effective connectivity matrices [cf. (14)], which for our example are given by

Q_SBM = diag([0.1, 0.3, 0.2, 0.3, 0.1]),
P = [[1, 1, 1, 0, 0],
     [1, 0.5, 0, 0, 0],
     [1, 0, 0.25, 0, 1],
     [0, 0, 0, 0.5, 1],
     [0, 0, 1, 1, 1]],  (16a)

E_SBM = [[0.1, 0.3, 0.2, 0, 0],
         [0.1, 0.15, 0, 0, 0],
         [0.1, 0, 0.05, 0, 0.1],
         [0, 0, 0, 0.15, 0.1],
         [0, 0, 0.2, 0.3, 0.1]].  (16b)

The principal eigenvector of E_SBM is given by v_1 ≈ [0.59, 0.28, 0.38, 0.28, 0.59]ᵀ. Furthermore, from (15) we can compute the graphon centrality functions to obtain

c^d(x) = 1(x)ᵀ [0.6, 0.25, 0.25, 0.25, 0.6]ᵀ,
c^e(x) ≈ 1(x)ᵀ [1.56, 0.72, 0.99, 0.72, 1.56]ᵀ,
c^k_α(x) ≈ 1(x)ᵀ [1.36, 1.15, 1.16, 1.15, 1.36]ᵀ if α = 0.5,
c^k_α(x) ≈ 1(x)ᵀ [2.86, 1.84, 2.01, 1.84, 2.86]ᵀ if α = 1.5,
c^pr_{0.85}(x) ≈ 1(x)ᵀ [1.77, 0.82, 0.78, 0.82, 1.77]ᵀ,

where for illustration purposes we have evaluated the Katz centrality for two specific choices of α, and we have set β = 0.85 for the PageRank centrality. These four centrality functions are depicted in Fig. 2(b)-(e).

Note that these functions are piecewise constant according to the block partition {B_i}_{i=1}^m. Moreover, as expected from the functional form of W_SBM in Fig. 2-(a), blocks B_1 and B_5 are the most central as measured by any of the four studied centralities. Regarding the remaining three blocks, degree centrality deems them as equally important whereas eigenvector centrality considers B_3 to be more important than B_2 and B_4. To understand this discrepancy, notice that in any finite realization of the graphon W_SBM, most of the edges from a node in block B_3 will go to nodes in B_1 and B_5, which are the most central ones. On the other hand, for nodes in blocks B_2 and B_4, most of the edges will be contained within their own block. Hence, even though nodes corresponding to blocks B_2, B_3, and B_4 have the same expected number of neighbors (thus, the same degree centrality), the neighbors of nodes in B_3 tend to be more central, entailing a higher eigenvector centrality. As expected, an intermediate situation occurs with Katz centrality, whose form is closer to degree centrality for lower values of α (cf. α = 0.5) and closer to eigenvector centrality for larger values of this parameter (cf. α = 1.5). For the case of PageRank, on the other hand, block B_3 is deemed as less central than B_2 and B_4. This can be attributed to the larger size of these latter blocks. Indeed, the classical PageRank centrality measure is partially driven by size [10].
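Proposition 1 reduces everything to m × m linear algebra, so the numbers in this example are easy to reproduce. The following sketch is our own illustration (not from the paper); the resulting vectors agree with the text up to rounding:

```python
import numpy as np

# Block sizes and connection probabilities of the example, cf. (16a)
q = np.array([0.1, 0.3, 0.2, 0.3, 0.1])
P = np.array([[1, 1, 1, 0, 0],
              [1, 0.5, 0, 0, 0],
              [1, 0, 0.25, 0, 1],
              [0, 0, 0, 0.5, 1],
              [0, 0, 1, 1, 1]], dtype=float)
Q, E = np.diag(q), P @ np.diag(q)            # Q_SBM and E_SBM = P Q_SBM, cf. (14)
one = np.ones(5)
D_E = np.diag(E @ one)

c_deg = E @ one                              # [0.6, 0.25, 0.25, 0.25, 0.6]
lam, V = np.linalg.eig(E)                    # E need not be symmetric
v1 = np.abs(V[:, np.argmax(lam.real)].real)  # principal eigenvector of E_SBM
c_eig = v1 / np.sqrt(v1 @ Q @ v1)            # ~ [1.56, 0.72, 0.99, 0.72, 1.56]
c_katz = np.linalg.solve(np.eye(5) - 0.5 * E, one)                    # alpha = 0.5
c_pr = 0.15 * np.linalg.solve(np.eye(5) - 0.85 * E @ np.linalg.inv(D_E), one)
```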

Fig. 3: Illustrative example of a graphon with finite rank. (a) The graphon W_FR(x,y) = (x² + y²)/2 is decomposable into a finite number of components, inducing a finite-rank graphon operator. (b-e) Degree, eigenvector, Katz, and PageRank centralities for the graphon depicted in (a).

4.2 Finite-rank graphons

We now consider a class of finite-rank (FR) graphons that can be written as a finite sum of products of integrable functions. Specifically, we consider graphons of the form

W_FR(x,y) := Σ_{i=1}^m g_i(x)h_i(y) = g(x)ᵀ h(y),  (17)

where m ∈ N and we have defined the vectors of functions g(x) = [g_1(x), ..., g_m(x)]ᵀ and h(y) = [h_1(y), ..., h_m(y)]ᵀ. Observe that g(x) and h(y) must be chosen so that W_FR is symmetric, and W_FR(x,y) ∈ [0,1] for all (x,y) ∈ [0,1]². Based on g(x) and h(y) we can define the generalizations of Q_SBM and E_SBM introduced in Section 4.1, for this class of finite-rank graphons.

Definition 5. The effective measure matrix Q and the effective connectivity matrix E for a finite-rank graphon W_FR as defined in (17) are given by

Q := ∫_0^1 g(x)g(x)ᵀ dx,   E := ∫_0^1 h(x)g(x)ᵀ dx.  (18)

The stochastic block model graphon introduced in (11) is a special case of the class in (17). More precisely, we recover the SBM graphon by choosing g_i(x) = 1_{B_i}(x) and h_i(y) = Σ_{j=1}^m P_ij 1_{B_j}(y) for i = 1, ..., m. The matrices defined in (14) are recovered when specializing Definition 5 to this choice of g_i(x) and h_i(y). We may now relate the eigenfunctions of the FR graphon with the spectral properties of E, as explained in the following lemma.

Lemma 3. Let λ_i and v_i denote the eigenvalues and eigenvectors of E in (18), respectively. Then, all the nonzero eigenvalues of W_FR, the operator associated with (17), are given by λ_i and the associated eigenfunctions are of the form ϕ_i(x) = g(x)ᵀ v_i.

Lemma 3 is proven in Appendix A and shows that the graphon in (17) is of finite rank since it has at most m non-zero eigenvalues. Notice that Lemma 2 follows from Lemma 3 when specializing the finite-rank operator to the SBM case, as explained after Definition 5. Moreover, we can leverage the result in Lemma 3 to find closed-form expressions for the centrality functions of FR graphons. To write these expressions compactly, we define the vectors of integrated functions ḡ := ∫_0^1 g(y)dy and h̄ := ∫_0^1 h(y)dy, as well as the following normalized versions of h̄ and E

h_nor = ∫_0^1 h(y)/(ḡᵀh(y)) dy,   E_nor = ∫_0^1 h(y)g(y)ᵀ/(ḡᵀh(y)) dy.  (19)

With this notation in place, we can establish the following result, which is proven in Appendix A.

Proposition 2 (Centrality measures for FR graphons). Let v_1 be the principal eigenvector of E in (18). Then, the centrality functions c^d, c^e, c^k_α, and c^pr_β of the graphon W_FR can be computed as follows

c^d(x) = g(x)ᵀ h̄,   c^e(x) = g(x)ᵀ v_1 / √(v_1ᵀ Q v_1),  (20)
c^k_α(x) = 1 + αg(x)ᵀ (I − αE)^{-1} h̄,
c^pr_β(x) = (1 − β)(1 + βg(x)ᵀ (I − βE_nor)^{-1} h_nor).

In the next subsection we illustrate the use of Proposition 2 for the computation of graphon centralities.

4.2.1 Example of a finite-rank graphon

Consider the FR graphon given by

W_FR(x,y) = (x² + y²)/2,

and illustrated in Fig. 3-(a). Notice that this FR graphon can be written in the canonical form (17) by defining the vectors g(x) = [x², 1/2]ᵀ and h(y) = [1/2, y²]ᵀ. From (18) we then compute the relevant matrices Q and E, as well as the relevant normalized quantities, to obtain

Q = [[1/5, 1/6], [1/6, 1/4]],   E = [[1/6, 1/4], [1/5, 1/6]],   ḡ = [1/3, 1/2]ᵀ,
h̄ = [1/2, 1/3]ᵀ,   h_nor ≈ [1.81, 0.79]ᵀ,   E_nor ≈ [[0.40, 0.91], [0.40, 0.40]].

A simple computation reveals that the principal eigenvector of E is v_1 = [√10/3, 2√2/3]ᵀ. We now leverage the result in Proposition 2 to obtain

c^d(x) = [x², 1/2][1/2, 1/3]ᵀ = x²/2 + 1/6,
c^e(x) = (3√3/(2√(3+√5)))[x², 1/2][√10/3, 2√2/3]ᵀ ≈ 1.20x² + 0.54,
c^k_α(x) = 1 + α[x², 1/2]([[1, 0], [0, 1]] − α[[1/6, 1/4], [1/5, 1/6]])^{-1}[1/2, 1/3]ᵀ ≈ 10.19x² + 5.44,
c^pr_{0.85}(x) ≈ 1.31x² + 0.56,

where we have set β = 0.85 in the PageRank centrality. Moreover, we have evaluated the Katz centrality for α = 0.9/λ_1, where λ_1 is the largest eigenvalue of E. The four centrality functions are depicted in Fig. 3-(b) through (e). As anticipated from the form of W_FR, there is a simple monotonicity in the centrality ranking for all the measures considered. More precisely, the highest centrality values are located close to 1 in the interval [0,1], whereas low centralities are localized close to 0. Unlike in the example of the stochastic block model in Section 4.1.1, all centralities here have the same functional form of a quadratic term with a constant offset.
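These computations can be checked numerically. The sketch below is our own illustration (not from the paper) and evaluates the formulas of Proposition 2 for g(x) = [x², 1/2]ᵀ and h(y) = [1/2, y²]ᵀ on a grid:

```python
import numpy as np

# W_FR(x,y) = g(x)^T h(y) with g(x) = [x^2, 1/2], h(y) = [1/2, y^2]
Q = np.array([[1/5, 1/6], [1/6, 1/4]])   # Q = int g g^T dx, cf. (18)
E = np.array([[1/6, 1/4], [1/5, 1/6]])   # E = int h g^T dx, cf. (18)
h_bar = np.array([1/2, 1/3])             # integrated h

lam, V = np.linalg.eig(E)
i = np.argmax(lam.real)
v1 = np.abs(V[:, i].real)                # proportional to [sqrt(10)/3, 2*sqrt(2)/3]

x = np.linspace(0, 1, 101)
g = np.vstack([x**2, 0.5 * np.ones_like(x)])     # g(x) on a grid, shape (2, len(x))
c_deg = g.T @ h_bar                              # x^2/2 + 1/6
c_eig = g.T @ v1 / np.sqrt(v1 @ Q @ v1)          # ~ 1.20 x^2 + 0.54
alpha = 0.9 / lam.real[i]
c_katz = 1 + alpha * g.T @ np.linalg.solve(np.eye(2) - alpha * E, h_bar)
# c_katz ~ 10.19 x^2 + 5.44, matching the values reported above
```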


Fig. 4: Illustrative example of a general smooth graphon. (a) The graphon W_G induces an operator that has a countably infinite number of nonzero eigenvalues and corresponding eigenfunctions. (b-e) Degree, eigenvector, Katz, and PageRank centralities for the graphon depicted in (a).

4.3 General smooth graphons

In general, a graphon W need not induce a finite-rank operator as in the preceding Sections 4.1 and 4.2. However, as shown in Lemma 1, a graphon always induces a diagonalizable operator with countably many nonzero eigenvalues. In most cases, obtaining the degree centrality function is immediate since it entails the computation of an integral [cf. (7)]. On the other hand, for eigenvector, Katz, and PageRank centralities that depend on the spectral decomposition of W, there is no universal technique available. Nonetheless, a procedure that has shown to be useful in practice to obtain the eigenfunctions ϕ and corresponding eigenvalues λ of smooth graphons is to solve a set of differential equations obtained by successive differentiation, when possible, of the eigenfunction equation in (6), that is, by considering

(d^k/dx^k) ∫_0^1 W(x,y)ϕ(y)dy = λ d^kϕ(x)/dx^k,  (21)

for k ∈ N. In the following section we illustrate this technique on a specific smooth graphon that does not belong to the finite-rank class.

4.3.1 Example of a general smooth graphon

Consider the graphon W_G depicted in Fig. 4-(a) and with the following functional form

W_G(x,y) = min(x,y)[1 − max(x,y)].

Specializing the differential equations in (21) for the graphon W_G we obtain

(d^k/dx^k)[(1−x)∫_0^x yϕ(y)dy + x∫_x^1 (1−y)ϕ(y)dy] = λ d^kϕ(x)/dx^k.  (22)

First notice that without differentiating (i.e. for k = 0) we can determine the boundary conditions ϕ(0) = ϕ(1) = 0. Moreover, by computing the second derivatives in (22), we obtain that −ϕ(x) = λϕ''(x). From the solution of this differential equation subject to the boundary conditions it follows that the eigenvalues and eigenfunctions of the operator W_G are

λ_n = 1/(π²n²) and ϕ_n(x) = √2 sin(nπx) for n ∈ N.  (23)

Notice that W_G has an infinite, but countable, number of nonzero eigenvalues, with an accumulation point at zero. Thus, W_G cannot be written in the canonical form for finite-rank graphons (17). Nevertheless, having obtained the eigenfunctions we can still compute the centrality measures for W_G. For degree centrality, a simple integration gives us

c^d(x) = (1−x)∫_0^x y dy + x∫_x^1 (1−y) dy = x(1−x)/2.

From (23) it follows that the principal eigenfunction is achieved when n = 1. Thus, the eigenvector centrality function [cf. (8)] is given by

c^e(x) = √2 sin(πx).

Finally, for the Katz centrality we leverage Remark 1 and the eigenfunction expressions in (23) to obtain

c^k_α(x) = 1 + Σ_{k=1}^∞ Σ_{n=1}^∞ α^k (1/(n²π²))^k · 2 sin(nπx) · (1−(−1)^n)/(πn)
= 1 + Σ_{n=1}^∞ (2α/(n²π² − α)) · (1−(−1)^n)/(πn) · sin(nπx),

which is guaranteed to converge as long as α < 1/λ_1 = π². We plot these three centrality functions in Fig. 4-(b) through (d), where we selected α = 0.9π² for the Katz centrality. Also, in Fig. 4-(e) we plot an approximation to the PageRank centrality function c^pr_β obtained by numerically solving the integral in its definition. According to all centralities, the most important nodes within this graphon are those located in the center of the interval [0,1], in line with our intuition. Likewise, nodes at the boundary have low centrality values. Note that while the ranking according to all centrality functions is consistent, unlike in the example in Section 4.2.1, here there are some subtle differences in the functional forms. In particular, degree centrality is again a quadratic function whereas the eigenvector and Katz centralities are of sinusoidal form.
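These closed forms lend themselves to direct numerical evaluation. The sketch below is ours (not from the paper); it truncates the Katz series at a finite number of terms, which is justified since the coefficients decay like 1/n³:

```python
import numpy as np

x = np.linspace(0, 1, 201)
c_deg = x * (1 - x) / 2                      # degree centrality of W_G
c_eig = np.sqrt(2) * np.sin(np.pi * x)       # eigenvector centrality of W_G

def katz(x, alpha=0.9 * np.pi**2, terms=2000):
    """Truncated series for c^k_alpha; valid for alpha < pi^2 = 1/lambda_1."""
    n = np.arange(1, terms + 1)
    coef = (2 * alpha / (n**2 * np.pi**2 - alpha)) * (1 - (-1.0)**n) / (np.pi * n)
    return 1 + np.sin(np.outer(x, n * np.pi)) @ coef

c_katz = katz(x)
```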

5 CONVERGENCE OF CENTRALITY MEASURES

In this section we derive concentration inequalities relating the newly defined graphon centrality functions with standard centrality measures. To this end, we start by noting that while graphon centralities are functions, standard centrality measures are vectors. To be able to compare such objects, we first show that there is a one to one relation between any finite graph with adjacency matrix A and a suitably constructed stochastic block model graphon W_SBM|A. Consequently, any centrality measure of A is in one to one relation with the corresponding centrality function of W_SBM|A. To this end, for each N ∈ N, we define a partition of [0,1] into the intervals B_i^N, where B_i^N = [(i−1)/N, i/N) for i = 1, ..., N−1, and B_N^N = [(N−1)/N, 1]. We denote the associated indicator-function vector by 1_N(x) := [1_{B_1^N}(x), ..., 1_{B_N^N}(x)]ᵀ, consistently with (12).

Lemma 4. For any adjacency matrix A ∈ {0,1}^{N×N} define the corresponding stochastic block model graphon as

W_SBM|A(x,y) = Σ_{i=1}^N Σ_{j=1}^N A_ij 1_{B_i^N}(x) 1_{B_j^N}(y).

Then the centrality function c_N(x) corresponding to the graphon W_SBM|A is given by

c_N(x) = 1_N(x)ᵀ c_A,

where c_A is the centrality measure of the graph with rescaled adjacency matrix (1/N)A.

Proof. The graphon W_SBM|A has the stochastic block model structure described in Section 4.1, with N uniform blocks. By selecting m = N and B_i = B_i^N for each i ∈ {1,...,N}, we obtain Q_SBM = (1/N)I. Consequently, the formulas in Proposition 1 simplify as given in the statement of this lemma. □

Remark 2. Note that the scaling factor 1/N does not affect the centrality rankings of the nodes in A, but only the magnitude of the centrality measures. This re-scaling is needed to avoid diverging centrality measures for graphs of increasing size. Observe further that c_N(x) is the piecewise-constant function corresponding to the vector c_A. We finally remark that graphons of the type W_SBM|A have appeared before in the literature using a different notation [24].

By using the previous lemma, we can compare centralities of graphons and graphs by working in the function space. Using this equivalence, we demonstrate that the previously defined graphon centrality functions are not only defined analogously to the centrality measures on finite graphs, but also emerge as the limit of those centrality measures for a sequence of graphs of increasing size. Stated differently, just like the graphon provides an appropriate limiting object for a growing sequence of finite graphs, the graphon centrality functions can be seen as the appropriate limiting objects of the finite centrality measures as the size of the graphs tends to infinity. In this sense, the centralities presented here may be seen as a proper generalization of the finite setting, just like a graphon provides a generalized framework for finite graphs. Most importantly, we show that the distance (in L² norm) between the graphon centrality function and the step-wise constant function associated with the centrality vector of any graph sampled from the graphon can be bounded, with high probability, in terms of the sampled graph size N.

Definition 6 (Sampled graphon). Given a graphon W and a size N ∈ N, fix the latent variables {u_i}_{i=1}^N by choosing either:
- 'deterministic latent variables': u_i = i/N, or
- 'stochastic latent variables': u_i = U_(i), where U_(i) is the i-th order statistic of N random samples from Unif[0,1].
Utilizing such latent variables, construct
- the 'probability' matrix P^(N) ∈ [0,1]^{N×N} with P^(N)_ij := W(u_i, u_j) for all i, j ∈ {1,...,N};
- the sampled graphon W_N(x,y) := Σ_{i=1}^N Σ_{j=1}^N P^(N)_ij 1_{B_i^N}(x) 1_{B_j^N}(y);
- the operator W_N of the sampled graphon, (W_N f)(x) := Σ_{j=1}^N P^(N)_ij ∫_{B_j^N} f(y)dy for any x ∈ B_i^N.

The sampled graphon W_N obtained when working with deterministic latent variables can intuitively be seen as an approximation of the graphon W by a stochastic block model graphon with N blocks, as the one described in Section 4.1, and is useful to study graphon centrality functions as limits of graph centrality measures. On the other hand, the sampled graphon W_N obtained when working with stochastic latent variables is useful as an intermediate step to analyze the relation between the graphon centrality function and the centrality measure of graphs sampled from the graphon according to the following procedure.

Definition 7 (Sampled graph). Given the 'probability' matrix P^(N) of a sampled graphon we define
- the sampled matrix S^(N) ∈ {0,1}^{N×N} as the adjacency matrix of a symmetric (random) graph obtained by taking N isolated vertices i ∈ {1,...,N}, and adding undirected edges between vertices i and j at random with probability κ_N P^(N)_ij for all i > j. Note that E[S^(N)] = κ_N P^(N);
- the associated (random) linear operator

(S_N f)(x) := Σ_{j=1}^N S^(N)_ij ∫_{B_j^N} f(y)dy for any x ∈ B_i^N,  (24)

and its associated (random) graphon:

S_N(x,y) = 1_N(x)ᵀ S^(N) 1_N(y).  (25)

In general, we denote the centrality functions associated with the sampled graphon operator W_N by c_N(x), whereas the centrality functions associated with the operator κ_N^{-1}S_N are denoted by ĉ_N(x). Note that thanks to Lemma 4 such centrality functions are in one to one correspondence with the centrality measures of the finite graphs P^(N) and κ_N^{-1}S^(N), respectively. Consequently, studying the relation between c(x), c_N(x) and ĉ_N(x) allows us to relate the graphon centrality function with the centrality measure of graphs sampled from the graphon. We note that previous works on graph limit theory imply the convergence of W_N and κ_N^{-1}S_N to the graphon operator W [22], [25], [54]. Intuitively, convergence of c_N(x) and ĉ_N(x) to c(x) then follows from continuity of spectral properties of the graphon operator.
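Definitions 6 and 7 and Lemma 4 translate directly into code: evaluate W at the latent variables, threshold uniform random numbers, and read off centrality functions as step functions. A sketch under our own naming (not from the paper):

```python
import numpy as np

def sampled_graphon(W, N, stochastic=False, rng=np.random.default_rng(0)):
    """P^(N) of Definition 6: u_i = i/N, or order statistics of Unif[0,1]."""
    u = np.sort(rng.uniform(size=N)) if stochastic else np.arange(1, N + 1) / N
    return W(u[:, None], u[None, :]), u

def sampled_graph(P, kappa=1.0, rng=np.random.default_rng(1)):
    """S^(N) of Definition 7: independent edges with probability kappa * P_ij."""
    N = P.shape[0]
    S = np.triu((rng.uniform(size=(N, N)) < kappa * P).astype(float), k=1)
    return S + S.T                        # symmetric, zero diagonal

def step_function(c_vec, x):
    """Lemma 4: the piecewise-constant function 1_N(x)^T c on uniform blocks."""
    idx = np.minimum((np.asarray(x) * len(c_vec)).astype(int), len(c_vec) - 1)
    return np.asarray(c_vec)[idx]
```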

In the following, we make this argument precise and, more importantly, we provide a more refined analysis by establishing sharp rates of convergence of the sampled graphs, under the following smoothness assumption on W. A more detailed discussion of related convergence results can be found in Appendix B (see supplementary material).

Assumption 1 (Piecewise Lipschitz graphon). There exists a constant L and a sequence of non-overlapping intervals I_k = [α_{k−1}, α_k) defined by 0 = α_0 < ··· < α_{K+1} = 1, for a (finite) K ∈ N, such that for any k, l, any set I_kl = I_k × I_l and pairs (x,y) ∈ I_kl, (x',y') ∈ I_kl we have that

|W(x,y) − W(x',y')| ≤ L(|x − x'| + |y − y'|).

This assumption has also been used in the context of graphon estimation [28], [53] and is typically fulfilled for most of the graphons of interest.

Fig. 5: Schematic for the sets A_N and A_N^c in the case of stochastic latent variables for K = 1, threshold α_1 and 4 Lipschitz blocks. The plot on the left shows the original graphon W(x,y), the 4 Lipschitz blocks and some representative latent variables. The plot on the right shows the sampled graphon W_N(x,y), which is a piecewise constant graphon with uniform square blocks of side 1/N. The 1/N-grid is illustrated in gray. The constant value in each block B_i^N × B_j^N corresponds to the value of the original graphon sampled at the point (u_i, u_j) (as illustrated by the arrows). The set B_i^N × B_i^N in the bottom left is an example where all the points (x,y) ∈ B_i^N × B_i^N belong to the same Lipschitz block as their corresponding sample, which is (u_i, u_i), so that B_i^N × B_i^N = S_ii and therein |D(x,y)| ≤ 2Ld_N. The set B_k^N × B_l^N instead is one of the problematic ones, since part of its points (cyan) belong to the same Lipschitz block as (u_k, u_l) and are thus in S_kl, but part of its points (red) do not and therefore belong to A_N^c. Note that by construction, with probability 1 − δ₀, all such problematic points are contained in the set D_N (which is illustrated in light red). This figure also illustrates that in general D_N is a strict superset of A_N^c (hence the bound in (30) is conservative).

Theorem 1 (Convergence of graphon operators). For a graphon fulfilling Assumption 1, it holds with probability 1 − δ₀ that:

|||W_N − W||| := sup_{‖f‖=1} ‖W_N f − W f‖ ≤ 2√((L² − K²)d_N² + Kd_N) =: ρ(N),  (26)

where δ₀ = 0 and d_N = 1/N in the case of deterministic latent variables, and δ₀ = δ ∈ (Ne^{−N/5}, e^{−1}) and d_N := 1/N + √(8 log(N/δ))/(N+1) in the case of stochastic latent variables. Moreover, if N is large enough, as specified in Lemma 5, then with probability at least 1 − δ − δ₀

|||κ_N^{-1}S_N − W||| ≤ √(4κ_N^{-1} log(2N/δ)/N) + ρ(N).  (27)

In particular, if κ_N = 1/N^τ with τ ∈ [0,1), then:

lim_{N→∞} |||κ_N^{-1}S_N − W||| = 0, almost surely.  (28)

Remark 3. To control the error induced by the random sampling, we derive a concentration inequality for uniform order statistics, as reported next, that can be of independent interest. As this result is not central to the discussion of our paper, we relegate its proof to Appendix B. In addition, we make use of a lower bound on the maximum expected degree, as reported in Lemma 5 and proven in Appendix B.

Proposition 3. Let U_(i) be the order statistics of N points sampled from a standard uniform distribution. Suppose that N ≥ 20 and δ ∈ (Ne^{−N/5}, e^{−1}). With probability at least 1 − δ,

|U_(i) − i/(N+1)| ≤ √(8 log(N/δ)/(N+1)²) for all i.

Lemma 5. If N is such that

2d_N < Δ_MIN^(α) := min_{k∈{1,...,K+1}} (α_k − α_{k−1}),  (29a)

(1/(κ_N N)) log(2N/δ) + d_N(2K + 3L) < C^d := max_x c^d(x),  (29b)

then C_N^d := max_i (Σ_{j=1}^N P^(N)_ij) ≥ (4/(9κ_N)) log(2N/δ).

Proof of Theorem 1: We prove the three statements sequentially.

Proof of (26). First of all note that by definition, for any (x,y) ∈ B_i^N × B_j^N it holds W_N(x,y) = W(u_i, u_j), but it is not necessarily true that (u_i, u_j) ∈ B_i^N × B_j^N. Let us define k_i, k_j ∈ {1,...,K+1} such that the point (u_i, u_j) belongs to the Lipschitz block I_{k_i k_j}, as defined in Assumption 1 and illustrated in Fig. 5. We define as S_ij the subset of points in B_i^N × B_j^N that belong to the same Lipschitz block I_{k_i k_j} as (u_i, u_j). Mathematically,

S_ij = {(x,y) ∈ B_i^N × B_j^N | (u_i, u_j) ∈ I_{k_i k_j} and (x,y) ∈ I_{k_i k_j}}.

In the following, we partition the set [0,1]² into the set A_N := ∪_{ij} S_ij and its complement A_N^c := [0,1]² \ A_N. In words, A_N is the set of points for which (x,y) and its corresponding sample (u_i, u_j) belong to the same Lipschitz block. We now prove that, with probability 1 − δ₀, A_N^c has area

Area(A_N^c) ≤ Area(D_N) = 4Kd_N − 4K²d_N².  (30)

To prove the above, we define the set D_N by constructing a stripe of width 2d_N centered at each discontinuity of the graphon, as specified in Assumption 1. This guarantees that any point in [0,1]² \ D_N has distance more than d_N (component-wise) from a discontinuity.

Note that in the case of deterministic latent variables, for any i ∈ {1,...,N} and any x ∈ B_i^N it holds by construction that |x − u_i| = |x − i/N| ≤ d_N = 1/N, and similarly |y − u_j| ≤ d_N. In the case of stochastic latent variables, Proposition 3 guarantees that with probability at least 1 − δ, for any i, j ∈ {1,...,N} and any x ∈ B_i^N, y ∈ B_j^N, it holds |x − u_i| = |x − U_(i)| ≤ d_N and |y − u_j| ≤ d_N.

In both cases, with probability 1 − δ₀, all the points in [0,1]² \ D_N are less than d_N close to their sample (u_i, u_j) and more than d_N far from any discontinuity (component-wise), hence they surely belong to A_N.

Consequently, with probability 1 − δ₀ we have A_N^c ⊆ D_N. Each stripe in D_N has width 2d_N, length 1, and there are 2K stripes in total. Formula (30) is then immediate by noticing that multiplying 2d_N times 2K counts twice the K² intersections between horizontal and vertical stripes.

Consider now any f ∈ L²([0,1]) such that ‖f‖ = 1. Let D(x,y) := W_N(x,y) − W(x,y) and note that |D(x,y)| ≤ 1. Then we get

‖W_N f − W f‖² = ∫_0^1 (W_N f − W f)²(x)dx
= ∫_0^1 (∫_0^1 D(x,y)f(y)dy)² dx
≤ ∫_0^1 (∫_0^1 D(x,y)²dy)(∫_0^1 f²(y)dy) dx  (31)
= ∫_0^1 (∫_0^1 D(x,y)²dy)‖f‖² dx = ∫_0^1 ∫_0^1 D(x,y)² dy dx
= ∫∫_{A_N} D(x,y)² dy dx + ∫∫_{A_N^c} D(x,y)² dy dx.  (32)

Expression (31) follows from the Cauchy-Schwarz inequality; we used ‖f‖ = 1 and, in the last equation, we split the interval [0,1]² into the sets A_N and A_N^c, as described above and illustrated in Fig. 5.

We can now bound both terms in (32). For the first term, note that for all the points (x,y) in A_N the corresponding sample (u_i, u_j) belongs to the same Lipschitz block and is at most d_N apart (component-wise). Consequently, for these points |D(x,y)| ≤ 2Ld_N. Overall, we get

∫∫_{A_N} D(x,y)² dy dx ≤ 4L²d_N² ∫∫_{A_N} 1 dy dx ≤ 4L²d_N².

For the second term in (32), we use (30) and the fact that |D(x,y)| ≤ 1 to get

∫∫_{A_N^c} D(x,y)² dy dx ≤ ∫∫_{A_N^c} 1 dy dx = Area(A_N^c).

Substituting these two terms into (32) yields

‖W_N f − W f‖² ≤ 4L²d_N² + 4Kd_N − 4K²d_N².

Since this bound holds for all functions f with unit norm, we recover (26).

Proof of (27). From the triangle inequality we get that

|||κ_N^{-1}S_N − W||| ≤ |||κ_N^{-1}S_N − W_N||| + |||W_N − W|||.  (33)

We have already bounded the second term on the right hand side of (33), so we now concentrate on the first term. The operator κ_N^{-1}S_N − W_N can be seen as the graphon operator of an SBM graphon with matrix κ_N^{-1}S^(N) − P^(N). By Lemma 2 we then have that its eigenvalues coincide with the eigenvalues of the corresponding E_SBM matrix, which is (1/N)(κ_N^{-1}S^(N) − P^(N)) since in this case Q_SBM = (1/N)I_N (given that all the intervals B_i^N have length 1/N).² Consequently,

|||κ_N^{-1}S_N − W_N||| = λ_max(κ_N^{-1}S_N − W_N) = λ_max((1/N)(κ_N^{-1}S^(N) − P^(N))) = (1/N)‖κ_N^{-1}S^(N) − P^(N)‖.

Hence, to bound the norm of the difference between the random SBM graphon operator κ_N^{-1}S_N, based on the graphon κ_N^{-1}S_N(x,y) = κ_N^{-1}1_N(x)ᵀS^(N)1_N(y) with S^(N)_ij = Ber(κ_N P^(N)_ij), and its expectation W_N, defined via the graphon W_N(x,y) = 1_N(x)ᵀP^(N)1_N(y), we can employ matrix concentration inequalities. Specifically, we use [55, Theorem 1] in order to bound the deviations of ‖S^(N) − κ_N P^(N)‖.

By Lemma 5, for N large enough, the maximum expected degree κ_N C_N^d := κ_N max_i(Σ_{j=1}^N P^(N)_ij) of the random graph represented by S^(N) grows at least as (4/9) log(2N/δ). Consequently, all the conditions of [55, Theorem 1] are met and we get that with probability 1 − δ − δ₀

κ_N^{-1}|||S_N − W_N||| = (κ_N^{-1}/N)‖S^(N) − κ_N P^(N)‖ ≤ (κ_N^{-1}/N)√(4κ_N C_N^d log(2N/δ)) ≤ √(4κ_N^{-1} log(2N/δ)/N),

where we used that C_N^d ≤ N since each element in P^(N) belongs to [0,1].

Proof of (28). We finally show that (27) implies almost sure convergence. We start by restating (27) as

Pr[|||κ_N^{-1}S_N − W||| ≤ √(4κ_N^{-1} log(2N/δ)/N) + ρ(N)] ≥ 1 − 2δ.  (34)

Further, pick any γ > 0 and define the infinite sequence of events

E_N := {|||κ_N^{-1}S_N − W||| ≥ γ + ρ(N)},

for each N ≥ 1. From (34) it follows that Pr[E_N] ≤ 4N exp(−κ_N Nγ²/4). Consequently, if κ_N = 1/N^τ with τ ∈ [0,1), then:

Σ_{N=1}^∞ Pr[E_N] ≤ 4 Σ_{N=1}^∞ N exp(−κ_N Nγ²/4) < ∞

and by the Borel-Cantelli lemma there exists a positive integer N_γ such that for all N ≥ N_γ the complement of E_N, i.e., |||κ_N^{-1}S_N − W||| ≤ γ + ρ(N), holds almost surely. To see that |||κ_N^{-1}S_N − W||| → 0 almost surely we follow the ensuing argument. For any given deterministic sequence {a_N}_{N=1}^∞, the fact that for each γ > 0 there is a positive integer N_γ such that for all N ≥ N_γ, |a_N| ≤ γ + ρ(N), implies that a_N → 0. In fact, for all ε > 0, if we set γ = ε/2 and N_ε := max{N_γ, N_ρ} (where N_ρ is the smallest N such that ρ(N) ≤ ε/2), then we get that for all N > N_ε, |||κ_N^{-1}S_N − W||| ≤ ε. Hence, we can conclude that |||κ_N^{-1}S_N − W||| → 0 almost surely. □

2. Note that Lemma 2 is formulated for graphon operators (i.e. linear integral operators with nonnegative kernels). An identical proof shows that the result holds also if the kernel assumes negative values.

The previous theorem provides us with convergence rates for the graphon operators. Based on these we are able to show a similar convergence result for centrality measures of graphons with a simple dominant eigenvalue.

Assumption 2 (Simple dominant eigenvalue). Let the eigenvalues of W be ordered such that λ_1 ≥ λ_2 ≥ λ_3 ≥ ... and assume that λ_1 > λ_2.

We note that in most empirical studies degeneracy of the dominant eigenvalue is not observed, justifying the above assumption. A noteworthy exception in which a non-unique dominant eigenvalue may arise is if the graph consists of multiple components. In this case, however, one can treat each component separately.

For the proof in the case of PageRank we will make the following additional assumption on the graphon.

Assumption 3 (Minimal degree assumption). There exists η > 0 such that W(x,y) ≥ η for all x, y ∈ [0,1].

Note that while this assumption is not fulfilled, e.g., for an SBM graphon with a block of zero connection probability, it can be further relaxed to accommodate such cases as well. However, to simplify the proof and avoid additional technical details, we invoke Assumption 3 in the following theorem.

Theorem 2 (Convergence of centrality measures). The following statements hold:
1) For any N > 1, the centrality functions c_N(x) and ĉ_N(x) corresponding to the operators W_N and κ_N^{-1}S_N, respectively, are in one to one relation with the centrality measures c_{P̄^(N)} ∈ R^N, c_{S̄^(N)} ∈ R^N of the graphs with rescaled adjacency matrices P̄^(N) := (1/N)P^(N) and S̄^(N) := (1/(Nκ_N))S^(N), via the formulas^a

c_N(x) = 1_N(x)ᵀ c_{P̄^(N)},   ĉ_N(x) = 1_N(x)ᵀ c_{S̄^(N)}.

2) Under Assumptions 1, 2 and (for PageRank) 3, and N sufficiently large, with probability at least 1 − δ₀

‖c_N − c‖ ≤ Cρ(N)

for some constant C and ρ(N), δ₀ defined as in Theorem 1.
3) Under Assumptions 1, 2 and (for PageRank) 3, and N sufficiently large, with probability at least 1 − 2δ

‖ĉ_N − c‖ ≤ C′(√(4κ_N^{-1} log(2N/δ)/N) + ρ(N)),

for some constant C′.
4) Under Assumptions 1, 2 and (for PageRank) 3, if κ_N = 1/N^τ with τ ∈ [0,1), then:

lim_{N→∞} ‖ĉ_N − c‖ = 0, almost surely.

a. Note that P̄^(N), S̄^(N) belong to R_{≥0}^{N×N} as opposed to {0,1}^{N×N}. Nonetheless, the definitions of centrality measures given in Section 2.1 can be extended to this case in a straightforward manner.

Proof. 1) Follows immediately from Lemma 4, since 𝕎_N and κ_N⁻¹ 𝕊_N are the operators of the graphons corresponding to P^(N) and κ_N⁻¹ S^(N), respectively.

2) We showed in Theorem 1 that, under Assumption 1, |||𝕎_N − 𝕎||| ≤ ρ(N) with probability 1 − δ′. This fact can be exploited to prove convergence of the centrality measures c_N to c. All the subsequent statements hold with probability 1 − δ′.

For degree centrality: c_N(x) = (𝕎_N 1_{[0,1]})(x) and c(x) = (𝕎 1_{[0,1]})(x). Since ‖1_{[0,1]}‖ = 1 we get

  ‖c_N − c‖ = ‖(𝕎_N − 𝕎) 1_{[0,1]}‖ ≤ |||𝕎_N − 𝕎||| ≤ ρ(N).   (35)

For eigenvector centrality: Let {λ_k, φ_k}_{k≥1} and {λ_k^(N), φ_k^(N)}_{k≥1} be the ordered eigenvalues and eigenfunctions of 𝕎 and 𝕎_N, respectively. Note that |λ₁^(N) − λ₁| ≤ ρ(N), since

  λ₁^(N) = |||𝕎_N||| ≤ |||𝕎||| + |||𝕎_N − 𝕎||| ≤ λ₁ + ρ(N)  and  λ₁ = |||𝕎||| ≤ |||𝕎_N||| + |||𝕎_N − 𝕎||| ≤ λ₁^(N) + ρ(N).

Furthermore, since by Assumption 2 we have that λ₁ > λ₂, there exists a large enough N̄ such that for all N > N̄ it holds that λ₁^(N) > λ₂^(N) and |λ₁ − λ₂| > |λ₁^(N) − λ₁|. Therefore, by Lemma 8 in Appendix B, we obtain

  ‖φ₁^(N) − φ₁‖ ≤ √2 |||𝕎_N − 𝕎||| / ( |λ₁ − λ₂| − |λ₁^(N) − λ₁| ).   (36)

From the facts that λ₁ ≠ λ₂ (by Assumption 2), |λ₁^(N) − λ₁| ≤ ρ(N), and |||𝕎_N − 𝕎||| ≤ ρ(N), it follows that (36) implies, for N > N̄,

  ‖φ₁^(N) − φ₁‖ ≤ √2 ρ(N) / ( |λ₁ − λ₂| − ρ(N) ) = O(ρ(N)).   (37)

For Katz centrality: Take any value of α < 1/|||𝕎|||, so that 𝕄_α = 𝕀 − α𝕎 is invertible and c(x) = (𝕄_α⁻¹ 1_{[0,1]})(x) is well defined. Since |||𝕎_N − 𝕎||| → 0 as N → ∞, there exists N_α > 0 such that α < 1/|||𝕎_N||| for all N > N_α. This implies that for any N > N_α, [𝕄_N]_α := 𝕀 − α𝕎_N is invertible and c_N(x) = ([𝕄_N]_α⁻¹ 1_{[0,1]})(x) is well defined. Note that |||𝕎_N − 𝕎||| ≤ ρ(N) implies |||[𝕄_N]_α − 𝕄_α||| = O(ρ(N)). We now prove that

  |||[𝕄_N]_α⁻¹ − 𝕄_α⁻¹||| = O(ρ(N)).   (38)

To this end, note that L²([0,1]) is a Hilbert space, the inverse operator 𝕄_α⁻¹ is bounded, and for N large enough it holds that |||[𝕄_N]_α − 𝕄_α||| < 1/|||𝕄_α⁻¹|||, since |||[𝕄_N]_α − 𝕄_α||| → 0. It then follows by [56, Theorem 2.3.5], with L := 𝕄_α and M := [𝕄_N]_α, that

  |||[𝕄_N]_α⁻¹ − 𝕄_α⁻¹||| ≤ |||𝕄_α⁻¹|||² |||[𝕄_N]_α − 𝕄_α||| / ( 1 − |||𝕄_α⁻¹||| |||[𝕄_N]_α − 𝕄_α||| ) = O(ρ(N)),

thus proving (38). Finally, since ‖1_{[0,1]}‖ = 1,

  ‖c_N − c‖ = ‖[𝕄_N]_α⁻¹ 1_{[0,1]} − 𝕄_α⁻¹ 1_{[0,1]}‖ ≤ |||[𝕄_N]_α⁻¹ − 𝕄_α⁻¹||| = O(ρ(N)).

For PageRank centrality: Consider β ∈ (0,1) such that 𝕃_β is invertible and c^pr(x) is well defined. Similar to the argument used to show (38) in the proof for Katz centrality, it suffices to show that |||[𝕃_N]_β − 𝕃_β||| = O(ρ(N)). To show this, note that under Assumption 3 it holds that

  c^d(x) = ∫₀¹ W(x,y) dy ≥ η,  c_N^d(x) = ∫₀¹ W_N(x,y) dy ≥ η.   (39)

Hence (c^d(x))† = (c^d(x))⁻¹ ≤ 1/η and (c_N^d(x))† = (c_N^d(x))⁻¹ ≤ 1/η. For any f ∈ L²([0,1]) such that ‖f‖ = 1,

  ‖[𝕃_N]_β f − 𝕃_β f‖ = β ‖𝕎_N (f · (c_N^d)⁻¹) − 𝕎 (f · (c^d)⁻¹)‖
    ≤ β ‖(𝕎_N − 𝕎)(f · (c^d)⁻¹)‖ + β ‖𝕎_N (f · (c_N^d)⁻¹ − f · (c^d)⁻¹)‖
    ≤ β |||𝕎_N − 𝕎||| ‖(c^d)⁻¹‖ + β |||𝕎_N||| ‖(c_N^d)⁻¹ − (c^d)⁻¹‖
    ≤ β ρ(N) ‖(c^d)⁻¹‖ + 2β |||𝕎||| ‖(c_N^d)⁻¹ − (c^d)⁻¹‖,   (40)

where we used that for N large enough |||𝕎_N||| ≤ 2|||𝕎|||. In (40), the notation (c^d)⁻¹ is used to denote the function that takes values (c^d(x))⁻¹, and similarly for (c_N^d)⁻¹. Observe that equation (39) implies

  ‖(c^d)⁻¹‖ ≤ 1/η   (41)

and

  ‖(c_N^d)⁻¹ − (c^d)⁻¹‖² = ∫₀¹ ( (c_N^d(y))⁻¹ − (c^d(y))⁻¹ )² dy = ∫₀¹ ( (c_N^d(y) − c^d(y)) / (c_N^d(y) c^d(y)) )² dy ≤ (1/η⁴) ‖c_N^d − c^d‖².   (42)

Combining (35), (40), (41) and (42) yields the desired result

  ‖[𝕃_N]_β f − 𝕃_β f‖ ≤ (β/η) ( 1 + (2/η) |||𝕎||| ) ρ(N) = O(ρ(N)).   (43)

3) It suffices to mimic the argument made above for ‖c_N − c‖, adapting it for the case ‖ĉ_N − c‖ by making use of (27). The proof is omitted to avoid redundancy.

4) By Theorem 1 we have that |||κ_N⁻¹ 𝕊_N − 𝕎||| → 0 almost surely.³ This means that the set of realizations {S̃_N}_{N=1}^∞ of {𝕊_N}_{N=1}^∞ for which |||κ_N⁻¹ S̃_N − 𝕎||| → 0 has probability one. For each of these realizations it can be proven (exactly as in part 2) that

  lim_{N→∞} ‖c̃_N − c‖ = 0,

where c̃_N(x) is the deterministic sequence of centrality measures associated with the realization {S̃_N}_{N=1}^∞. Consequently, Pr[ lim_{N→∞} ‖ĉ_N − c‖ = 0 ] = 1 and lim_{N→∞} ‖ĉ_N − c‖ = 0 almost surely. ∎

3. In part 2 the rate of convergence of 𝕎_N to 𝕎 was used. Nonetheless, the same statement holds under the less stringent condition |||𝕎_N − 𝕎||| → 0, since this is sufficient to prove that λ₁^(N) → λ₁.

Fig. 6: Convergence of the eigenvector centrality function for the FR graphon in Section 4.2.1. (a–b) Eigenvector centrality functions computed from a sampled graphon with deterministic latent variables (black; one realization shown in cyan for visualization purposes) and the eigenvector centrality function of the continuous graphon (blue). The examples shown correspond to a resolution of (a) N = 68 and (b) N = 489 grid points. In each case 20 realizations were drawn from the discretized graphon P^(N). (c) Convergence of the error ‖ĉ_N − c‖ as a function of the number of grid points N, corresponding to the number of nodes in the sampled graph. For each point we plot the sample mean ± one standard deviation.

To sum up, Theorem 2 shows that, on the one hand, the centrality functions of the finite-rank operators 𝕎_N and κ_N⁻¹ 𝕊_N can be computed by simple interpolation of the centrality vectors of the corresponding finite-size graphs with adjacency matrices P^(N) and κ_N⁻¹ S^(N) (suitably rescaled). On the other hand, such centrality functions c_N(x) and ĉ_N(x) become better approximations of the centrality function c(x) of the graphon W as N increases. As alluded to above, the importance of this result derives from the fact that it establishes that the centrality functions introduced here are the appropriate limits of the finite centrality measures, thus validating the presented framework. We finally note that, as an immediate corollary of the previous theorem, we obtain the following robustness result for the centrality measures of different realizations.
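To make the interpolation in part 1 of Theorem 2 concrete, the following Python sketch builds the step-function eigenvector centrality ĉ_N(x) = 1_N(x)ᵀ c_{S̄^(N)} from one sampled graph and compares it with the graphon centrality of a hypothetical rank-one graphon W(x,y) = e^{−(x+y)}; the graphon, grid, and normalization details are our own illustrative assumptions, not the paper's experiments.

# Sketch of Theorem 2, part 1: interpolate the eigenvector centrality of a
# sampled graph (kappa_N = 1, deterministic latent variables u_i = i/N) into
# a step function and compare with the known centrality of the hypothetical
# rank-one graphon W(x, y) = exp(-(x + y)), whose single eigenfunction is
# exp(-x) (normalized here to unit L2 norm).
import numpy as np

rng = np.random.default_rng(1)
N = 2000
u = np.arange(1, N + 1) / N
P = np.exp(-(u[:, None] + u[None, :]))        # discretized graphon P^(N)
S = np.triu((rng.random((N, N)) < P).astype(float), 1)
S = S + S.T                                   # sampled adjacency matrix

S_bar = S / N                                 # rescaled adjacency S_bar^(N)
_, vecs = np.linalg.eigh(S_bar)
v = np.abs(vecs[:, -1])                       # dominant (Perron) eigenvector
c_hat = v * np.sqrt(N)                        # step-function values, unit L2 norm

c_true = np.exp(-u) / np.sqrt((1 - np.exp(-2)) / 2)
err = np.sqrt(np.mean((c_hat - c_true) ** 2))  # grid approximation of the L2 error
print(f"N = {N}: ||c_hat_N - c|| ~ {err:.3f}")

The factor √N converts the unit Euclidean-norm eigenvector into a step function with unit L²([0,1]) norm, matching the normalization ‖c^e‖ = 1 used for graphon eigenvector centrality.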
Corollary 1. Consider two graphs S_{N₁} and S_{N₂} sampled from a graphon W satisfying Assumptions 1 and 2. Assume without loss of generality that N₁ ≤ N₂ and let c_{S̄^(N_i)} ∈ ℝ^{N_i} be the centrality measures of the graphs with rescaled adjacency matrices S̄^(N_i) := (1/(N_i κ_{N_i})) S^(N_i), for i ∈ {1,2}. Then, for N₁ sufficiently large, with probability at least 1 − 4δ,

  ‖ 1_{N₁}(x)ᵀ c_{S̄^(N₁)} − 1_{N₂}(x)ᵀ c_{S̄^(N₂)} ‖ ≤ 2C′ ( √( 4 κ_{N₁}⁻¹ log(2N₁/δ) / N₁ ) + ρ(N₁) ),

for some constant C′ and ρ(N), δ defined as in Theorem 1.

To check our analytical results, we performed numerical experiments as illustrated in Figs. 6 and 7. In Fig. 6, we consider again the finite-rank graphon from our example in Section 4.2.1 and assess the convergence of the eigenvector centrality function ĉ_N^e from the sampled networks (with deterministic latent variables and κ_N = 1) to the true underlying graphon centrality measure c^e. As this graphon is smooth, we have K = 0, i.e., there is only a single Lipschitz block, and we observe a smooth decay of the error when increasing the number of grid points N, corresponding to the number of nodes in the sampled graph.

For the stochastic block model graphon W_SBM from our example in Section 4.1.1, however, we have K = 4 and thus 25 Lipschitz blocks, which are delimited by discontinuous jumps in the value of the graphon. The effect of these jumps is clearly noticeable when assessing the convergence of the centrality measures, as illustrated in Fig. 7 for the example of Katz centrality. If the deterministic sampling grid of the discretized graphon W_N is aligned with the structure of the stochastic block model W_SBM, there is no mismatch introduced by the sampling procedure and thus the approximation error of the centrality measure c_N is smaller (also in the sampled version ĉ_N). Stated differently, if the (1/N)-grid is exactly aligned with the Lipschitz blocks of the underlying graphon, we are effectively in a situation in which the area A_N^c is zero, which is analogous to the case of K = 0 (see Fig. 5). In contrast, if there is a misalignment between the Lipschitz blocks and the (1/N)-grid, then additional errors are introduced, leading to an overall slower convergence, which is consistent with our results above; a simple numerical sketch of this alignment effect is given below.
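The following sketch imitates the aligned-versus-mismatched comparison just described, using a hypothetical five-block SBM graphon as an illustrative stand-in for the W_SBM of Section 4.1.1; a fine aligned grid serves as a proxy for the true Katz centrality function c(x).

# Grid-alignment effect for Katz centrality on a hypothetical 5-block SBM
# graphon: grids with N divisible by 5 are aligned with the blocks, other
# grids are mismatched. All numerical choices below are illustrative.
import numpy as np

B = np.array([[0.8, 0.2, 0.1, 0.2, 0.1],
              [0.2, 0.7, 0.2, 0.1, 0.1],
              [0.1, 0.2, 0.6, 0.2, 0.1],
              [0.2, 0.1, 0.2, 0.7, 0.2],
              [0.1, 0.1, 0.1, 0.2, 0.8]])       # hypothetical block matrix

def katz_values(N, alpha=0.5):
    # Katz centrality of the discretized graphon W_N on an N-point grid:
    # solve (I - alpha * W_N) c = 1, where W_N acts as (1/N) * P^(N).
    u = (np.arange(N) + 0.5) / N                # grid midpoints in [0, 1]
    blk = np.minimum((u * 5).astype(int), 4)    # block index of each point
    P = B[np.ix_(blk, blk)]                     # P^(N): W evaluated on the grid
    return np.linalg.solve(np.eye(N) - alpha * P / N, np.ones(N))

def step_eval(vals, x):
    # Evaluate the step function with equal-width pieces at points x.
    N = len(vals)
    return vals[np.minimum((x * N).astype(int), N - 1)]

x = (np.arange(100_000) + 0.5) / 100_000
c_ref = step_eval(katz_values(4500), x)         # fine-grid proxy for c(x)
for N in (50, 52):                              # 50 aligned, 52 mismatched
    err = np.sqrt(np.mean((step_eval(katz_values(N), x) - c_ref) ** 2))
    tag = "aligned" if N % 5 == 0 else "mismatch"
    print(f"N = {N} ({tag}): L2 error ~ {err:.4f}")

For an aligned grid the discretization W_N coincides with W, so the error essentially vanishes, whereas the mismatched grid incurs an error driven by the cells straddling block boundaries.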

Fig. 7: Convergence of the Katz centrality function for the SBM graphon in Section 4.1.1. (a–b) Katz centrality functions computed from a sampled graphon with deterministic latent variables (black; one realization shown in cyan for visualization purposes) and the Katz centrality function of the continuous graphon (blue). The examples shown correspond to a resolution of (a) N = 58 and (b) N = 960 grid points. In each case 20 realizations were drawn from the discretized graphon P^(N). (c) Convergence of the error ‖ĉ_N − c‖ as a function of the number of grid points N, corresponding to the number of nodes in the sampled graph. For each point we plot the sample mean ± one standard deviation. For visualization purposes we connect the data points (red) in which the grid approximation was aligned with the piece-wise constant changes of the SBM (i.e., each grid point is within one Lipschitz block of the SBM graphon – cf. Fig. 5). For this graphon, this happens for all N that are divisible by 10. Likewise, we connected those data points where there was a mismatch between the sampling grid and the SBM graphon structure (blue).

6 DISCUSSION AND FUTURE WORK

In many applications of centrality-based network analysis, the system of interest is subject to uncertainty. In this context, a desirable trait for a centrality measure is that the relative importance of agents should be impervious to random fluctuations contained in a particular realization of a network. In this paper, we formalized this intuition by extending the notion of centrality to graphons. More precisely, we proposed a departure from the traditional concept of centrality measures applied to deterministic graphs in favor of a graphon-based, probabilistic interpretation of centralities. Thus, we 1) introduced suitable definitions of centrality measures for graphons, 2) showed how such measures can be computed for specific classes of graphons, 3) proved that the standard centrality measures defined for graphs of finite size converge to our newly defined graphon centralities, and 4) bounded the distance between graphon centrality functions and centrality measures over sampled graphs.

The results presented here constitute a first step towards a systematic analysis of centralities in graphons, and several questions remain unanswered. In particular, we see two main challenges that need to be addressed to widen the scope of applications of our methods. First, in most practical scenarios the graphon will need to be estimated from (finite) data. The validity (error terms) of the centrality scores will accordingly be contingent on the errors made in this estimation [28], [53]. It would therefore be very interesting to explore how to best leverage existing graphon estimation results in order to estimate our proposed graphon centrality functions. Second, the parameter κ_N allows for the analysis of sparse networks as introduced in [26], [27], [43], [44], [45], but not for networks with asymptotically bounded degrees [46]. An extension of our results to the analysis of networks with finite degrees is thus of future interest.

Additionally, there are a number of other generalizations that are worth investigating.

First, the generalization of centralities for graphons beyond the four cases studied in this paper. In particular, the extension to centralities that do not rely directly on spectral graph properties, such as closeness and betweenness centralities, appears to be a challenging task. Indeed, a suitable notion of path for infinite-size networks needs to be defined first, and bounds on the possible fluctuations of such a notion of path would need to be derived. Enriching the class of graphon centrality measures would also contribute to the characterization of the relative robustness of certain classes of centralities, which would allow us to better assess their utility for the analysis of uncertain networks in practice.

Second, the identification of classes of graphons (other than the ones discussed here) for which explicit and efficient formulas for the centrality functions can be derived.

Third, the determination of whether the convergence rates provided in Theorem 2 are also optimal. A related question in this context is to derive convergence results for the exact ordering of the nodes. This is particularly relevant for applications where we would like to know by how much the ranking of an individual node might have changed as a result of the uncertainty of the network. One possible avenue to tackle this kind of question would be to start investigating ℓ∞-norm bounds [57] for centralities, which enable us to control the maximal fluctuations of each individual entry of the centrality measures.

Finally, the extension of the centrality definitions from regular graphons to more complex objects, such as asymmetric networks or time-varying graphons [58], [59].

APPENDIX A: OMITTED PROOFS

Proof of Proposition 1

Proof. This proof is a consequence of Proposition 2. We notice that we can specialize the formulas therein to the case of stochastic block models to obtain the relations: h̄ = E_SBM 1, Q = Q_SBM, E = E_SBM, E_nor = E_SBM D_E⁻¹, h_nor = E_SBM D_E⁻¹ 1. To see that these equivalences are true one can check, for instance, that

  [h_nor]_i = Σ_{j=1}^m P_ij [Q_SBM]_jj / ( Σ_{k=1}^m [Q_SBM]_kk P_kj ) = Σ_{j=1}^m P_ij [Q_SBM]_jj / ( Σ_{k=1}^m [Q_SBM]_kk P_jk ) = Σ_{j=1}^m [E_SBM]_ij / [D_E]_jj = Σ_{j=1}^m [E_SBM D_E⁻¹]_ij.

From the above equivalences, the formulas for c^d(x) and c^e(x) follow immediately. For c^α(x) we obtain

  c^α(x) = 1 + α 1(x)ᵀ ( Σ_{k=0}^∞ α^k E_SBM^k ) E_SBM 1 = 1 + 1(x)ᵀ ( Σ_{k=1}^∞ α^k E_SBM^k ) 1 = 1(x)ᵀ ( Σ_{k=0}^∞ α^k E_SBM^k ) 1 = 1(x)ᵀ (I − α E_SBM)⁻¹ 1.

Finally, c^pr_β(x) can be proven similarly. ∎
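The Katz formula above is immediate to evaluate in code, since it only involves an m × m linear system. A minimal sketch follows; consistent with the computation of h_nor above, we take [E_SBM]_ij = P_ij [Q_SBM]_jj with Q_SBM = diag(block lengths), and the block matrix and lengths are hypothetical illustrative values.

# Closed-form SBM Katz centrality from Proposition 1:
# c_alpha(x) = 1(x)^T (I - alpha * E_SBM)^{-1} 1, piecewise constant per block.
import numpy as np

P = np.array([[0.7, 0.2],
              [0.2, 0.5]])           # block connection probabilities (m = 2)
q = np.array([0.4, 0.6])             # block lengths, summing to one
alpha = 0.8                          # requires alpha < 1 / |||W|||

E_sbm = P * q[None, :]               # [E_SBM]_ij = P_ij * q_j
c_blocks = np.linalg.solve(np.eye(2) - alpha * E_sbm, np.ones(2))
print(c_blocks)                      # c_alpha(x) = c_blocks[k] for x in block k

The degree and eigenvector formulas of Proposition 1 can be evaluated in the same blockwise fashion, replacing the resolvent by E_SBM 1 and by the dominant eigenvector of E_SBM, respectively.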

Proof of Lemma 3

Proof. Assume that v is an eigenvector of E such that Ev = λv with λ ≠ 0. We now show that this implies that φ(x) = Σ_{j=1}^m v_j g_j(x) is an eigenfunction of 𝕎 with eigenvalue λ. From an explicit computation of (𝕎φ)(x) we have that

  (𝕎φ)(x) = Σ_{i=1}^m g_i(x) ∫₀¹ h_i(y) Σ_{j=1}^m v_j g_j(y) dy = Σ_{i=1}^m g_i(x) Σ_{j=1}^m v_j ∫₀¹ h_i(y) g_j(y) dy.

Recalling the definition of E from (18), it follows that

  (𝕎φ)(x) = Σ_{i=1}^m g_i(x) Σ_{j=1}^m v_j E_ij = Σ_{i=1}^m g_i(x) λ v_i = λ φ(x),

where we used the fact that v is an eigenvector of E for the second equality. In order to show the converse statement, let us assume that φ is an eigenfunction of 𝕎 with associated eigenvalue λ ≠ 0. Then, we may write

  (𝕎φ)(x) = Σ_{i=1}^m g_i(x) ∫₀¹ h_i(y) φ(y) dy = λ φ(x),

from which it follows that

  φ(x) = λ⁻¹ Σ_{i=1}^m g_i(x) ∫₀¹ h_i(y) φ(y) dy = Σ_{i=1}^m g_i(x) v_i,   (44)

where we have implicitly defined v_i := λ⁻¹ ∫₀¹ h_i(y) φ(y) dy. Substituting (44) into this definition yields, for all i = 1, …, m,

  λ v_i = ∫₀¹ h_i(y) Σ_{j=1}^m g_j(y) v_j dy = Σ_{j=1}^m v_j E_ij = [Ev]_i.

By writing the above equality in vector form we get that Ev = λv, thus completing the proof. ∎

Proof of Proposition 2

Proof. The proof for degree centrality follows readily from (7), i.e., c^d(x) = ∫₀¹ g(x)ᵀ h(y) dy = g(x)ᵀ ∫₀¹ h(y) dy = g(x)ᵀ h̄.

Based on Lemma 3, for c^e it is sufficient to prove that ‖c^e‖ = 1. To this end, note that

  ‖c^e‖² = (1/(v₁ᵀ Q v₁)) ∫₀¹ (g(x)ᵀ v₁)² dx = (1/(v₁ᵀ Q v₁)) ∫₀¹ (v₁ᵀ g(x))(g(x)ᵀ v₁) dx = (1/(v₁ᵀ Q v₁)) v₁ᵀ ( ∫₀¹ g(x) g(x)ᵀ dx ) v₁ = (v₁ᵀ Q v₁)/(v₁ᵀ Q v₁) = 1.

For Katz centrality, we first prove by induction that

  (𝕎_FR^k f)(x) = ∫₀¹ g(x)ᵀ E^{k−1} h(y) f(y) dy,   (45)

for all finite-rank operators 𝕎_FR. The equality holds trivially for k = 1. Now suppose that it holds for k − 1; we can then compute

  (𝕎_FR^k f)(x) = (𝕎_FR 𝕎_FR^{k−1} f)(x) = ∫₀¹ g(x)ᵀ h(z) ∫₀¹ g(z)ᵀ E^{k−2} h(y) f(y) dy dz
    = ∫₀¹ g(x)ᵀ ( ∫₀¹ h(z) g(z)ᵀ dz ) E^{k−2} h(y) f(y) dy = ∫₀¹ g(x)ᵀ E^{k−1} h(y) f(y) dy,

where we used Definition 5 for the last equality. We leverage (45) to compute c^α using the expression in Remark 1:

  c^α(x) = 1 + Σ_{k=1}^∞ α^k (𝕎^k 1_{[0,1]})(x) = 1 + Σ_{k=1}^∞ α^k ∫₀¹ g(x)ᵀ E^{k−1} h(y) dy = 1 + Σ_{k=1}^∞ α^k g(x)ᵀ E^{k−1} h̄
    = 1 + α Σ_{k=0}^∞ α^k g(x)ᵀ E^k h̄ = 1 + α g(x)ᵀ ( Σ_{k=0}^∞ α^k E^k ) h̄ = 1 + α g(x)ᵀ (I − αE)⁻¹ h̄,

as we wanted to show.

Finally, the methodology to prove the result for PageRank is similar to the one used for Katz, thus we sketch the proof to avoid redundancy. First, we recall the definition of 𝔾 from the proof of Proposition 1 and use induction to show that for every finite-rank graphon

  (𝔾^k 1_{[0,1]})(x) = g(x)ᵀ E_nor^{k−1} h_nor

for all integers k ≥ 1. We then compute the inverse in the definition of PageRank (10) via an infinite sum, as done above for Katz, but using the derived expression for (𝔾^k 1_{[0,1]})(x). ∎
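Lemma 3 can be checked numerically along the lines of its proof. The sketch below uses a hypothetical rank-2 symmetric graphon W(x,y) = g₁(x)g₁(y) + g₂(x)g₂(y) (so h_i = g_i and m = 2), builds E_ij = ∫ h_i(y) g_j(y) dy by quadrature, and verifies that φ(x) = Σ_j v_j g_j(x) satisfies 𝕎φ = λφ; all specifics are our own illustrative choices.

# Numerical check of Lemma 3 for a hypothetical rank-2 graphon.
import numpy as np

g = [lambda x: np.ones_like(x), lambda x: x]     # hypothetical basis g_1, g_2
y = (np.arange(100_000) + 0.5) / 100_000         # midpoint quadrature grid
dy = 1.0 / len(y)

E = np.array([[np.sum(g[i](y) * g[j](y)) * dy for j in range(2)]
              for i in range(2)])                # E_ij = <h_i, g_j>
lam, V = np.linalg.eig(E)
k = np.argmax(np.abs(lam))                       # dominant eigenpair of E
v = V[:, k]

phi = v[0] * g[0](y) + v[1] * g[1](y)            # candidate eigenfunction
W_phi = g[0](y) * np.sum(g[0](y) * phi) * dy \
      + g[1](y) * np.sum(g[1](y) * phi) * dy     # (W phi)(x) on the grid
print("max |W phi - lambda phi| =", np.max(np.abs(W_phi - lam[k] * phi)))

Up to quadrature error, the printed residual is essentially zero, in agreement with the lemma.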
REFERENCES

[1] D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, and N. Zhang, "Topological structure analysis of the protein–protein interaction network in budding yeast," Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.
[2] J. Kleinberg, "Authoritative sources in a hyperlinked environment," J. ACM, vol. 46, no. 5, pp. 604–632, Sep. 1999.
[3] A. Garas, P. Argyrakis, C. Rozenblat, M. Tomassini, and S. Havlin, "Worldwide spreading of economic crisis," New Journal of Physics, vol. 12, no. 11, p. 113043, 2010.
[4] N. E. Friedkin, "Theoretical foundations for centrality measures," American Journal of Sociology, vol. 96, no. 6, pp. 1478–1504, 1991.
[5] L. C. Freeman, "A set of measures of centrality based on betweenness," Sociometry, pp. 35–41, 1977.
[6] ——, "Centrality in social networks conceptual clarification," Social Networks, vol. 1, no. 3, pp. 215–239, 1978.
[7] P. Bonacich, "Power and centrality: A family of measures," American Journal of Sociology, vol. 92, no. 5, pp. 1170–1182, 1987.
[8] S. P. Borgatti and M. G. Everett, "A graph-theoretic perspective on centrality," Social Networks, vol. 28, no. 4, pp. 466–484, 2006.
[9] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford InfoLab, Technical Report 1999-66, November 1999.
[10] D. F. Gleich, "PageRank beyond the web," SIAM Review, vol. 57, no. 3, pp. 321–363, 2015.
[11] D. Kempe, J. Kleinberg, and E. Tardos, "Maximizing the spread of influence through a social network," in Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), 2003, pp. 137–146.
[12] E. Costenbader and T. W. Valente, "The stability of centrality measures when networks are sampled," Social Networks, vol. 25, no. 4, pp. 283–307, 2003.
[13] S. P. Borgatti, K. M. Carley, and D. Krackhardt, "On the robustness of centrality measures under conditions of imperfect data," Social Networks, vol. 28, no. 2, pp. 124–136, 2006.
[14] M. Benzi and C. Klymko, "On the limiting behavior of parameter-dependent network centrality measures," SIAM Journal on Matrix Analysis and Applications, vol. 36, no. 2, pp. 686–706, 2015.
[15] S. Segarra and A. Ribeiro, "Stability and continuity of centrality measures in weighted graphs," IEEE Transactions on Signal Processing, vol. 64, no. 3, pp. 543–555, Feb. 2016.
[16] P. Balachandran, E. Airoldi, and E. Kolaczyk, "Inference of network summary statistics through network denoising," arXiv preprint arXiv:1310.0423, 2013.
[17] K. Lerman, R. Ghosh, and J. H. Kang, "Centrality metric for dynamic networks," in Proceedings of the Workshop on Mining and Learning with Graphs (MLG), 2010, pp. 70–77.
[18] R. K. Pan and J. Saramäki, "Path lengths, correlations, and centrality in temporal networks," Physical Review E, vol. 84, no. 1, p. 016105, 2011.

[19] P. Grindrod and D. J. Higham, "A matrix iteration for dynamic network summaries," SIAM Review, vol. 55, no. 1, pp. 118–128, 2013.
[20] K. Dasaratha, "Distributions of centrality on networks," arXiv preprint arXiv:1709.10402, 2017.
[21] C. Borgs and J. T. Chayes, "Graphons: A nonparametric method to model, estimate, and design algorithms for massive networks," arXiv preprint arXiv:1706.01143, 2017.
[22] L. Lovász, Large Networks and Graph Limits. American Mathematical Society, Providence, 2012, vol. 60.
[23] L. Lovász and B. Szegedy, "Limits of dense graph sequences," Journal of Combinatorial Theory, Series B, vol. 96, no. 6, pp. 933–957, 2006.
[24] C. Borgs, J. T. Chayes, L. Lovász, V. T. Sós, and K. Vesztergombi, "Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing," Advances in Mathematics, vol. 219, no. 6, pp. 1801–1851, 2008.
[25] ——, "Convergent sequences of dense graphs II: Multiway cuts and statistical physics," Annals of Mathematics, vol. 176, no. 1, pp. 151–219, 2012.
[26] P. J. Bickel and A. Chen, "A nonparametric view of network models and Newman–Girvan and other modularities," Proceedings of the National Academy of Sciences, vol. 106, no. 50, pp. 21068–21073, 2009.
[27] P. J. Bickel, A. Chen, and E. Levina, "The method of moments and degree distributions for network models," The Annals of Statistics, vol. 39, no. 5, pp. 2280–2301, 2011.
[28] E. M. Airoldi, T. B. Costa, and S. H. Chan, "Stochastic blockmodel approximation of a graphon: Theory and consistent estimation," in Advances in Neural Information Processing Systems, 2013, pp. 692–700.
[29] J. Yang, C. Han, and E. Airoldi, "Nonparametric estimation and testing of exchangeable graph models," in Artificial Intelligence and Statistics, 2014, pp. 1060–1067.
[30] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[31] T. A. Snijders and K. Nowicki, "Estimation and prediction for stochastic blockmodels for graphs with latent block structure," Journal of Classification, vol. 14, no. 1, pp. 75–100, 1997.
[32] S. J. Young and E. R. Scheinerman, "Random dot product graph models for social networks," in International Workshop on Algorithms and Models for the Web-Graph. Springer, 2007, pp. 138–149.
[33] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda, "Learning systems of concepts with an infinite relational model," in Proceedings of the National Conference on Artificial Intelligence (AAAI), 2006, pp. 381–388.
[34] A. Goldenberg, A. X. Zheng, S. E. Fienberg, E. M. Airoldi et al., "A survey of statistical network models," Foundations and Trends in Machine Learning, vol. 2, no. 2, pp. 129–233, 2010.
[35] M. W. Morency and G. Leus, "Signal processing on kernel-based random graphs," in Proceedings of the European Signal Processing Conference (EUSIPCO), Kos Island, Greece, 2017.
[36] C. E. Lee and D. Shah, "Unifying framework for crowd-sourcing via graphon estimation," arXiv preprint arXiv:1703.08085, 2017.
[37] S. Gao and P. E. Caines, "The control of arbitrary size networks of linear systems via graphon limits: An initial investigation," in 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Dec. 2017, pp. 1052–1057.
[38] Y. Zhang, E. Levina, and J. Zhu, "Estimating network edge probabilities by neighbourhood smoothing," Biometrika, vol. 104, no. 4, pp. 771–783, 2017.
[39] P. Orbanz and D. M. Roy, "Bayesian models of graphs, arrays and other exchangeable random structures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 437–461, 2015.
[40] C. Borgs, J. Chayes, and L. Lovász, "Moments of two-variable functions and the uniqueness of graph limits," Geometric and Functional Analysis, vol. 19, no. 6, pp. 1597–1619, 2010.
[41] L. Lovász and B. Szegedy, "The automorphism group of a graphon," Journal of Algebra, vol. 421, pp. 136–166, 2015.
[42] P. Diaconis and S. Janson, "Graph limits and exchangeable random graphs," arXiv preprint arXiv:0712.2749, 2007.
[43] C. Borgs, J. Chayes, and A. Smith, "Private graphon estimation for sparse graphs," in Advances in Neural Information Processing Systems, 2015, pp. 1369–1377.
[44] C. Gao, Y. Lu, Z. Ma, and H. Zhou, "Optimal estimation and completion of matrices with biclustering structures," The Journal of Machine Learning Research, vol. 17, no. 1, pp. 5602–5630, 2016.
[45] O. Klopp, A. Tsybakov, and N. Verzelen, "Oracle inequalities for network models and sparse graphon estimation," The Annals of Statistics, vol. 45, no. 1, pp. 316–354, 2017.
[46] V. Veitch and D. M. Roy, "The class of random graphs arising from exchangeable random measures," arXiv preprint arXiv:1512.03099, 2015.
[47] D. J. Aldous, "Representations for partially exchangeable arrays of random variables," Journal of Multivariate Analysis, vol. 11, no. 4, pp. 581–598, 1981.
[48] D. N. Hoover, "Relations on probability spaces and arrays of random variables," Preprint, Institute for Advanced Study, Princeton, NJ, vol. 2, 1979.
[49] K. Deimling, Nonlinear Functional Analysis. Dover Publications, 1985.
[50] P. H. Bezandry and T. Diagana, Almost Periodic Stochastic Processes. Springer Science & Business Media, 2011.
[51] P. Latouche and S. Robin, "Variational Bayes model averaging for graphon functions and motif frequencies inference in W-graph models," Statistics and Computing, vol. 26, no. 6, pp. 1173–1185, Nov. 2016.
[52] P. J. Wolfe and S. C. Olhede, "Nonparametric graphon estimation," arXiv preprint arXiv:1309.5936, 2013.
[53] C. Gao, Y. Lu, and H. H. Zhou, "Rate-optimal graphon estimation," The Annals of Statistics, vol. 43, no. 6, pp. 2624–2652, 2015.
[54] B. Szegedy, "Limits of kernel operators and the spectral regularity lemma," European Journal of Combinatorics, vol. 32, no. 7, pp. 1156–1167, 2011.
[55] F. Chung and M. Radcliffe, "On the spectra of general random graphs," The Electronic Journal of Combinatorics, vol. 18, no. 1, p. 215, 2011.
[56] K. Atkinson and W. Han, Theoretical Numerical Analysis: A Functional Analysis Framework. Springer, 2009, vol. 39.
[57] J. Fan, W. Wang, and Y. Zhong, "An l-infinity eigenvector perturbation bound and its application to robust covariance estimation," arXiv preprint arXiv:1603.03516, 2016.
[58] H. Crane, "Dynamic random networks and their graph limits," The Annals of Applied Probability, vol. 26, no. 2, pp. 691–721, 2016.
[59] M. Pensky, "Dynamic network models and graphon estimation," arXiv preprint arXiv:1607.00673, 2016.
[60] R. M. Dudley, Real Analysis and Probability. Cambridge University Press, 2002, vol. 74.
[61] S. Janson, Graphons, Cut Norm and Distance, Couplings and Rearrangements. NYJM Monographs, 2013.
[62] C. Davis and W. M. Kahan, "The rotation of eigenvectors by a perturbation," SIAM Journal on Numerical Analysis, vol. 7, no. 1, pp. 1–46, 1970.
[63] M. Avella-Medina, F. Parise, M. T. Schaub, and S. Segarra, "Centrality measures for graphons: Accounting for uncertainty in networks," arXiv preprint arXiv:1707.09350v3, 2018.
[64] S. Boucheron and M. Thomas, "Concentration inequalities for order statistics," Electronic Communications in Probability, vol. 17, 2012.
[65] L. Devroye, Non-Uniform Random Variate Generation. Springer-Verlag, New York, 1986.
[66] J. B. Conway, A Course in Functional Analysis. Springer Science & Business Media, 1997, vol. 96.
[67] F. Sauvigny, Partial Differential Equations 2: Functional Analytic Methods. Springer Science & Business Media, 2012.

Marco Avella-Medina holds a B.A. degree in Economics (2009), an M.Sc. degree in Statistics (2011) and a Ph.D. in Statistics (2016) from the University of Geneva, Switzerland. From August 2016 to June 2018 he was a postdoctoral researcher with the Statistics and Data Science Center and the Sloan School of Management at the Massachusetts Institute of Technology. Since July 2018, he has been an Assistant Professor in the Department of Statistics at Columbia University. His research interests include robust statistics, high-dimensional statistics and statistical machine learning.

Francesca Parise was born in Verona, Italy, in 1988. She received the B.Sc. and M.Sc. degrees (cum laude) in Information and Automation Engineering from the University of Padova, Italy, in 2010 and 2012, respectively. She conducted her master thesis research at Imperial College London, UK, in 2012. She graduated from the Galilean School of Excellence, University of Padova, Italy, in 2013. She defended her PhD at the Automatic Control Laboratory, ETH Zurich, Switzerland, in 2016 and she is currently a postdoctoral researcher at the Laboratory for Information and Decision Systems, MIT, USA. Her research focuses on identification, analysis and control of complex systems, with application to distributed multi-agent networks, game theory and systems biology.

Michael T. Schaub obtained a B.Sc. in Electrical Engineering and Information Technology from ETH Zurich (2009), an M.Sc. in Biomedical Engineering (2010) from Imperial College London, and a Ph.D. in Applied Mathematics (2014) from Imperial College. After a postdoctoral stay at the Université catholique de Louvain and the University of Namur (Belgium), he has been a Postdoctoral Researcher at the Institute for Data, Systems, and Society (IDSS) at the Massachusetts Institute of Technology since November 2016. Presently, he is a Marie Skłodowska-Curie Fellow at IDSS and the Department of Engineering Science, University of Oxford, UK. He is broadly interested in interdisciplinary applications of applied mathematics in engineering, social and biological systems. His research interests include, in particular, network science, data science, machine learning, and dynamical systems.

Santiago Segarra (M'16) received the B.Sc. degree in industrial engineering with highest honors (Valedictorian) from the Instituto Tecnológico de Buenos Aires (ITBA), Argentina, in 2011, the M.Sc. in electrical engineering from the University of Pennsylvania (Penn), Philadelphia, in 2014, and the Ph.D. degree in electrical and systems engineering from Penn in 2016. From September 2016 to June 2018 he was a postdoctoral research associate with the Institute for Data, Systems, and Society at the Massachusetts Institute of Technology. Since July 2018, Dr. Segarra has been an Assistant Professor in the Department of Electrical and Computer Engineering at Rice University. His research interests include network theory, data analysis, machine learning, and graph signal processing. Dr. Segarra received the ITBA's 2011 Best Undergraduate Thesis Award in industrial engineering, the 2011 Outstanding Graduate Award granted by the National Academy of Engineering of Argentina, the 2017 Penn's Joseph and Rosaline Wolf Award for Best Doctoral Dissertation in electrical and systems engineering, as well as four best conference paper awards.

APPENDIX B: ADDITIONAL RESULTS

Invariance of centrality measures under permutations

Just as the topology of a graph is invariant with respect to relabelings or permutations of its nodes, graphons are defined only up to measure preserving transformations. We show in the next lemma that the linear operator 𝕎^π associated with any such 'permutation' π (formalized via a measure preserving transformation) of a graphon W shares the same eigenvalues of 𝕎 and 'permuted' eigenfunctions.

Lemma 6. Consider the graphon W^π(x,y) := W(π(x), π(y)) obtained by transforming W using the measure preserving function π : [0,1] → [0,1]. Let 𝕎 and 𝕎^π be the associated linear integral operators. If (λ, φ) is an eigenvalue–eigenfunction pair of 𝕎, then (λ, φ∘π) is an eigenvalue–eigenfunction pair of 𝕎^π.

Proof. From a direct computation we obtain that

  (𝕎^π (φ∘π))(x) = ∫₀¹ W^π(x,y) φ(π(y)) dy = ∫₀¹ W(π(x), π(y)) φ(π(y)) dy = ∫₀¹ W(π(x), y) φ(y) dy = (𝕎φ)(π(x)) = λ φ(π(x)).

The third equality uses the fact that π is a measure preserving transformation and the ergodic theorem [60, Ch. 8]. ∎

Lemma 6 complements the discussion at the end of Section 2.2 by showing the effect of measure preserving transformations on the spectral properties of the graphon.

Discussion on related graphon convergence results

We introduce some additional definitions in order to compare our work with previous results on graphon convergence. In particular, let us start by introducing the cut norm, which is typically used for the statement of graphon convergence results. For a graphon W in the graphon space 𝒲, the cut norm is denoted by ‖W‖_□ and is defined as

  ‖W‖_□ := sup_{U,V} | ∫_U ∫_V W(x,y) dx dy |,

where U and V are measurable subsets of [0,1]. The cut metric between two graphons W, W′ ∈ 𝒲 is

  d_□(W, W′) := inf_{φ ∈ Π[0,1]} ‖W − W′^φ‖_□,

where W′^φ(x,y) := W′(φ(x), φ(y)) and Π[0,1] is the class of measure preserving permutations φ : [0,1] → [0,1]. Intuitively, the function φ performs a node relabeling to find the best match between W and W′. Because of such relabeling, d_□(W, W′) is not a well defined metric in 𝒲, since we might have that d_□(W, W′) = 0 even if W ≠ W′. To avoid such a problem, we define the space 𝒲̃ as the space where we identify graphons up to measure preserving transformations, so that d_□ is a well defined metric in 𝒲̃. It can be shown that the metric space (𝒲̃, d_□) is complete [22]. The following lemma is instrumental for our comparison to previous work, as it establishes the equivalence between the L^{p,q} norms and the cut norm. Recall that for any p, q ≥ 1, the L^{p,q} operator norm is defined as |||𝕎|||_{p,q} := sup_{f ∈ L^p, ‖f‖_p = 1} ‖𝕎f‖_q.

Lemma 7 ([61]). For any W ∈ 𝒲 and all p, q ∈ [1, ∞],

  ‖W‖_□ ≤ |||𝕎|||_{p,q} ≤ 2 (4‖W‖_□)^{min(1−1/p, 1/q)}.

In particular, for p = q = 2 we get

  ‖W‖_□ ≤ |||𝕎||| ≤ √(8‖W‖_□).

It follows from Lemma 7 that convergence of the graphon operator can be deduced from previous works establishing graphon convergence in cut norm. For example, one can use the results of [25], [54] combined with Lemma 7 to easily conclude that the graphon operator converges in operator norm. However, taking this approach one typically does not obtain rates of convergence of the sampled graph's operator to the graphon operator. A convergence rate is instead provided in [22, Lemma 10.16] for general graphons. We next show that, for graphons satisfying Assumption 1 with K = 0, the result in [22, Lemma 10.16] leads to a slower rate of convergence than the one provided in Theorem 1. More precisely, combining [22, Lemma 10.16] and Lemma 7, we get that with probability at least 1 − exp(−N/(2 log N)) it holds that

  |||𝕊_N − 𝕎||| ≤ √176 / (log N)^{1/4}.   (46)

By defining δ = exp(−N/(2 log N)), the bound provided in Eq. (27) (for κ_N = 1 and K = 0) leads to

  |||𝕊_N − 𝕎||| ≤ O( 1/(log N)^{1/2} ),   (47)

thus proving faster convergence. Finally, we note that the bounds provided in Theorem 1 are not only tighter but also more flexible. In fact, by introducing the parameter δ we establish a trade-off between sharper error bounds and the probability that such bounds hold, as typically done in concentration inequality results.

A useful variant of the Davis-Kahan theorem

The following technical lemma is used to prove the convergence of the eigenvector centrality for graphons and is a consequence of the Davis-Kahan sin θ theorem for compact operators in Hilbert space [62].

Lemma 8. Consider two linear integral operators 𝕃 and 𝕃̂, with ordered eigenvalues {λ_k}_{k≥1}, {λ̂_k}_{k≥1}. Let φ̂₁, φ₁ be the eigenfunctions associated with the dominant eigenvalues λ̂₁ and λ₁ (normalized to norm one) and suppose that |λ₁ − λ₂| > |λ̂₁ − λ₁|. Then

  ‖φ̂₁ − φ₁‖ ≤ √2 |||𝕃 − 𝕃̂||| / ( |λ₁ − λ₂| − |λ̂₁ − λ₁| ).   (48)

The proof can be found in [63].
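Although Lemma 8 is stated for operators, its content is easy to visualize with finite symmetric matrices, which are a special case of compact self-adjoint operators. The following sketch constructs an artificial spectrum so that the gap condition clearly holds and checks the bound (48) numerically; all numerical choices are our own.

# Finite-dimensional sanity check of (48) for symmetric matrices.
import numpy as np

rng = np.random.default_rng(2)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random orthonormal basis
w = np.concatenate([np.linspace(0.0, 1.0, n - 1), [2.0]])
L = (Q * w) @ Q.T                                    # top eigengap equal to 1
H = rng.standard_normal((n, n))
H = 0.05 * (H + H.T) / (2 * np.sqrt(n))              # small symmetric perturbation

def top_two(M):
    vals, vecs = np.linalg.eigh(M)
    return vals[-1], vals[-2], vecs[:, -1]

lam1, lam2, phi1 = top_two(L)
lam1h, _, phi1h = top_two(L + H)
phi1h *= np.sign(phi1h @ phi1)                       # resolve sign ambiguity

lhs = np.linalg.norm(phi1h - phi1)
rhs = np.sqrt(2) * np.linalg.norm(H, 2) / (abs(lam1 - lam2) - abs(lam1h - lam1))
print(f"eigenvector error {lhs:.3e} <= bound {rhs:.3e}")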
Let ϕˆ1, ϕ1 be the 0 ˆ metric between two graphons W, W ∈ W is eigenfunctions associated with the dominant eigenvalues λ1 and λ1 ˆ 0 φ 0 (normalized to norm one) and suppose that |λ1 − λ2| > |λ1 − λ1|. d(W, W ) := inf kW − W k, φ∈Π[0,1] Then √ ˆ 2|||L−L||| φ kϕˆ1 − ϕ1k ≤ ˆ . (48) where W (x, y) := W (φ(x, ), φ(y)) and Π[0,1] is the class of |λ1−λ2|−|λ1−λ1| φ : [0, 1] 7→ [0, 1] measure preserving permutations . Intu- The proof can be found in [63]. itively, the function φ performs a node relabeling to find the best match between W and W 0. Because of such relabeling, 0 Concentration of uniform order statistics d(W, W ) is not a well defined metric in W since we might 0 0 have that d(W, W ) = 0 even if W 6= W . To avoid such The goal of this subsection is to derive a uniform deviation a problem, we define the space W as the space where we bound for order statistics sampled from a standard uniform identify graphons up to measure preserving transformations, distribution, as detailed in Proposition 3 in the main text. This so that d is a well defined metric in W. It can be shown result is required in the proof of Theorem 1. Although it is that the metric space (W, d) is complete [22]. The following intuitive to expect subgaussian deviations for uniform order lemma is instrumental for our comparison to previous work, statistics, we could not find the desired statement explicitly as it establishes the equivalence between the Lp,q norms and in the literature and believe it could be of interest on its own the cut norm. Recall that for any p, q ≥ 1, the Lp,q operator right. From a technical point of view, the key ingredient in ||| ||| := sup k fk norm is defined as W p,q f∈Lp,kfkp=1 W q. our argument is to use the exponential Efron-Stein inequality derived in [64]. Lemma 7. [61] For any W ∈ W and all p, 1 ∈ [1, ∞] √ Let U1,...,UN ∼ Unif(0, 1) and define their correspon- min(1−1/p,1/q) kW k ≤ |||W |||p,q ≤ 2(4kW k) . dent order statistics U(1) ≤ U(2) ≤ · · · ≤ U(N) and spacings 19

∆_i = U₍ᵢ₎ − U₍ᵢ₋₁₎ for i = 1, …, N+1, with the convention U₍₀₎ = 0 and U₍N+1₎ = 1. It is shown in [65] that:

- Each U₍ᵢ₎ is distributed according to Beta(i, N+1−i) and thus has mean i/(N+1);
- The joint survival function of the spacings is

  Pr[ ∆₁ > s₁, …, ∆_{N+1} > s_{N+1} ] = ( 1 − Σ_{i=1}^{N+1} s_i )₊^N.

Consequently, the spacings are identically (but not independently) distributed with cumulative distribution function F_∆(s) = 1 − (1−s)^N and marginal density f_∆(s) = N(1−s)^{N−1}.
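Both facts quoted from [65] are easy to confirm by simulation; the following quick Monte Carlo check (with arbitrary illustrative values of N, i and s) is a sketch of our own, not part of the original development.

# Monte Carlo check: U_(i) has mean i/(N+1) (Beta(i, N+1-i)), and a single
# spacing Delta_i has survival function P(Delta_i > s) = (1 - s)^N.
import numpy as np

rng = np.random.default_rng(3)
N, reps, i = 50, 20_000, 10
U = np.sort(rng.random((reps, N)), axis=1)

print("mean of U_(i):", U[:, i - 1].mean(), "theory:", i / (N + 1))
spacing = U[:, i - 1] - U[:, i - 2]            # Delta_i = U_(i) - U_(i-1)
s = 0.02
print("P(Delta_i > s):", (spacing > s).mean(), "theory:", (1 - s) ** N)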

The following lemma is a key intermediate step in the derivation of concentration inequalities for order statistics drawn from a uniform distribution.

Lemma 9. For any λ ≥ 0 it holds that

  log E[ e^{λ|U₍ᵢ₎ − E U₍ᵢ₎|} ] ≤ (λN/2) E[ ∆_i (e^{λ∆_i} − 1) ].

Proof. We show this result by first proving

  (a) log E[ e^{λ(U₍ᵢ₎ − E U₍ᵢ₎)} ] ≤ λ (N−i+1)/2 · E[ ∆_i (e^{λ∆_i} − 1) ];
  (b) log E[ e^{λ(E U₍ᵢ₎ − U₍ᵢ₎)} ] ≤ λ i/2 · E[ ∆_i (e^{λ∆_i} − 1) ].

To this end, note that the hazard rate of a uniform distribution is increasing, since it has the form h(x) = 1/(1−x). Therefore, applying Theorem 2.9 of [64] shows (a). Note that therein the result is proven only for i ≥ N/2 + 1 (equivalently, in the notation of [64], k := N + 1 − i ≤ N/2), but such condition on i is never used in the proof. Indeed, in order to prove claim (2.1) of Theorem 2.9 in [64] one only needs to show that

  Ent[ e^{λX₍ₖ₎} ] ≤ k E[ e^{λX₍ₖ₊₁₎} ψ(λ(X₍ₖ₎ − X₍ₖ₊₁₎)) ]   (49)

for 1 ≤ k ≤ N, where Ent[Y] = E[Y log Y] − E[Y] log E[Y] is the entropy of a non-negative random variable Y. This follows easily from the arguments of Proposition 2.3 in [64]. Note that the authors only consider k := N + 1 − i ≤ N/2 in this proposition because for k > N/2 the bound can be improved.

Let us now turn to the proof of (b). Note that the beta distribution is reflection symmetric, i.e., if X ∼ Beta(α, β) then 1 − X ∼ Beta(β, α) for α, β > 0. Therefore U₍ᵢ₎ ∼ 1 − U₍N−i+1₎ and E U₍ᵢ₎ − U₍ᵢ₎ ∼ U₍N−i+1₎ − E U₍N−i+1₎. Hence by (a) we have that

  log E[ e^{λ(E U₍ᵢ₎ − U₍ᵢ₎)} ] = log E[ e^{λ(U₍N−i+1₎ − E U₍N−i+1₎)} ] ≤ λ (N−(N−i+1)+1)/2 · E[ ∆_{N−i+1} (e^{λ∆_{N−i+1}} − 1) ] = λ i/2 · E[ ∆_i (e^{λ∆_i} − 1) ],

where in the last step we used the fact that the spacings ∆₁, …, ∆_N have the same marginal distribution. Finally, the statement of the lemma is an immediate consequence of (a) and (b). ∎

Lemma 10. Suppose that N > 5 and δ_i ∈ (e^{−N/5}, e^{−1}). Then, for i = 1, …, N, with probability at least 1 − δ_i,

  | U₍ᵢ₎ − i/(N+1) | ≤ √( 8 log(1/δ_i) / (N+1) ).

Proof. We note that by Chernoff's inequality

  Pr[ U₍ᵢ₎ − i/(N+1) > t ] ≤ E[ e^{λ|U₍ᵢ₎ − i/(N+1)|} ] / e^{λt},   (50)

and from Lemma 9 we see that

  E[ e^{λ|U₍ᵢ₎ − i/(N+1)|} ] ≤ e^{ (λN/2) E[∆_i (e^{λ∆_i} − 1)] }.   (51)

From the marginal density of the spacings we get

  E[∆_i^k] = N ∫₀¹ s_i^k (1 − s_i)^{N−1} ds_i = N B(k+1, N) = k! N! / (N+k)!,

where we used the definition of the beta function B(x,y) = ∫₀¹ t^{x−1}(1−t)^{y−1} dt = (x−1)!(y−1)!/(x+y−1)! for integers x, y. Then

  E[ ∆_i (e^{λ∆_i} − 1) ] = E[ ∆_i ( Σ_{k=0}^∞ (λ∆_i)^k/k! − 1 ) ] = Σ_{k=1}^∞ (λ^k/k!) E[∆_i^{k+1}] = Σ_{k=1}^∞ λ^k (k+1) N! / (N+k+1)!
    ≤ Σ_{k=1}^∞ λ^k (k+1) N! / ( N^k (N+1) N! ) = (1/(N+1)) Σ_{k=1}^∞ (λ/N)^k (k+1)
    = (1/(N+1)) [ Σ_{k=0}^∞ (λ/N)^k (k+1) − 1 ] = (1/(N+1)) [ 1/(1 − λ/N)² − 1 ]
    = (1/(N+1)) ( 1 − (1 − λ/N)² ) / (1 − λ/N)² = (λ/(N(N+1))) ( 2 − λ/N ) / (1 − λ/N)²,   (52)

where we used Σ_{k=0}^∞ α^k (k+1) = 1/(1−α)² (obtained by differentiating the geometric sum, valid for λ/N < 1). If we set λ < 0.35N, then

  E[ ∆_i (e^{λ∆_i} − 1) ] ≤ 4λ / (N(N+1)),

since y < (7 − √(49−32))/8 ≈ 0.36 implies (2−y)/(1−y)² < 4. Combining (50), (51) and (52) yields

  Pr[ U₍ᵢ₎ − i/(N+1) > t ] ≤ e^{2λ²/(N+1)} e^{−λt}.

Minimizing over λ leads to the choice λ = t(N+1)/4 and thus

  Pr[ U₍ᵢ₎ − i/(N+1) > t ] ≤ exp( −t²(N+1)/8 ).

The proof is concluded if we select t = √( 8 log(1/δ_i)/(N+1) ). Note that for this choice

  λ = t(N+1)/4 = √( 8 log(1/δ_i)/(N+1) ) · (N+1)/4 = √( log(1/δ_i)(N+1)/2 ).

We need to verify that λ < 0.35N, or equivalently that 2(0.35)²N² − log(1/δ_i)N − log(1/δ_i) > 0. A sufficient condition is

  N > ( log(1/δ_i) + √( log(1/δ_i)² + 8 log(1/δ_i)(0.35)² ) ) / ( 4(0.35)² ) =: N̄.

Note that N̄ < (1 + √(1 + 8(0.35)²)) / (4(0.35)²) · log(1/δ_i) < 5 log(1/δ_i), since log(1/δ_i) > 1 for δ_i < e^{−1}. Hence a simpler sufficient condition is N > 5 log(1/δ_i). ∎

Proof of Proposition 3: From Lemma 10 we know that, for each i = 1, …, N, if we set δ_i = δ/N, then with probability at least 1 − δ/N it holds that G_i := | U₍ᵢ₎ − i/(N+1) | ≤ √( 8 log(N/δ)/(N+1) ) =: t. It then follows from the union bound that

  Pr[ ∩_{1≤i≤N} {G_i ≤ t} ] = 1 − Pr[ ∪_{1≤i≤N} {G_i > t} ] ≥ 1 − Σ_{i=1}^N Pr[ G_i > t ] ≥ 1 − Σ_{i=1}^N δ/N = 1 − δ. ∎
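The uniform bound of Proposition 3 can also be verified empirically. The short simulation below (our own sketch, with arbitrary illustrative parameters) estimates how often the maximal deviation exceeds the stated bound.

# Monte Carlo check of Proposition 3: with probability at least 1 - delta,
# max_i |U_(i) - i/(N+1)| <= sqrt(8 * log(N/delta) / (N+1)).
import numpy as np

rng = np.random.default_rng(4)
N, delta, reps = 500, 0.05, 2000
bound = np.sqrt(8 * np.log(N / delta) / (N + 1))

U = np.sort(rng.random((reps, N)), axis=1)
dev = np.abs(U - np.arange(1, N + 1) / (N + 1)).max(axis=1)
print(f"empirical P(max dev > bound) = {(dev > bound).mean():.4f} (target <= {delta})")

In practice the empirical failure frequency is far below δ, reflecting the slack introduced by the union bound.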

Proof of Lemma 5: A lower bound on the maximum expected degree

Using the definition of W_N and the reverse triangle inequality yields

  (1/N) C_N^d = max_i ( (1/N) Σ_{j=1}^N P_ij^(N) ) = max_i ( (1/N) Σ_{j=1}^N W(u_i, u_j) ) = max_{x∈[0,1]} ∫₀¹ W_N(x,y) dy
    ≥ max_{x ∈ C_N^c} ∫₀¹ W_N(x,y) dy ≥ max_{x ∈ C_N^c} ( ∫₀¹ W(x,y) dy − ∫₀¹ |D(x,y)| dy ),   (53)

where C_N := { x ∈ [0,1] | ∃ k ∈ {1, …, K} s.t. |x − α_k| ≤ d_N } is the subset of points in [0,1] that are up to d_N close to a discontinuity. Note that for any x ∈ C_N^c, with probability 1 − δ′ (see part 1 of Theorem 1),

  ∫₀¹ |D(x,y)| dy = ∫_{C_N} |D(x,y)| dy + ∫_{C_N^c} |D(x,y)| dy ≤ 2L d_N + Area(C_N) = 2L d_N + 2K d_N.

Substituting in (53) we get

  (1/N) C_N^d ≥ max_{x ∈ C_N^c} ∫₀¹ W(x,y) dy − 2(L+K) d_N.

Finally, note that Assumption 1 implies that the degree c^d(x) is piece-wise Lipschitz continuous, that is, for any k ∈ {1, …, K+1} and any x, x′ ∈ I_k it holds that |c^d(x) − c^d(x′)| ≤ L|x − x′|. If ∆_MIN^(α) > 2d_N, this implies that

  | max_{x ∈ C_N^c} ∫₀¹ W(x,y) dy − C^d | ≤ L d_N,

since there must be at least one point in C_N^c which belongs to the same Lipschitz block as argmax_x c^d(x) and has distance at most d_N from it. Overall, we have proven that, for N large enough,

  C_N^d ≥ N C^d − N(3L + 2K) d_N ≥ log(2N/δ) ≥ (4/9) log(2N/δ). ∎

APPENDIX C: MATHEMATICAL BACKGROUND

For completeness, we provide a self-contained review of the mathematical tools required in the proofs of our results. The subsection on bounded linear operators is a condensed overview of concepts detailed in, e.g., [49], [66], [67]. The subsection on perturbation theory introduces concepts necessary for a formal statement of the sin θ theorem of [62] in the case of compact operators.

Bounded linear operators in Hilbert space

Let us start by introducing some basic notions regarding linear operators in metric spaces.

Definition 8. Let 𝒳, 𝒴 be normed linear spaces and let 𝕃 : 𝒳 → 𝒴 be a linear operator.
(a) 𝕃 is continuous at a point f ∈ 𝒳 if f_n → f in 𝒳 implies 𝕃f_n → 𝕃f in 𝒴.
(b) 𝕃 is continuous if it is continuous at every point, i.e., if f_n → f in 𝒳 implies 𝕃f_n → 𝕃f in 𝒴 for every f.
(c) 𝕃 is bounded if there exists a finite M ≥ 0 such that, for all f ∈ 𝒳, ‖𝕃f‖ ≤ M‖f‖. Note that ‖𝕃f‖ is the norm of 𝕃f in 𝒴, while ‖f‖ is the norm of f in 𝒳.
(d) The operator norm of 𝕃 is |||𝕃||| := sup_{‖f‖=1} ‖𝕃f‖.
(e) We let B(𝒳, 𝒴) denote the set of all bounded linear operators mapping 𝒳 into 𝒴, that is, B(𝒳, 𝒴) = {𝕃 : 𝒳 → 𝒴 | 𝕃 is bounded and linear}. If 𝒳 = 𝒴 we write B(𝒳) = B(𝒳, 𝒳).

The following proposition shows that (a), (b) and (c) are equivalent.

Proposition 4. Let 𝕃 : 𝒳 → 𝒴 be a linear operator. Then the following conditions are equivalent.
(a) 𝕃 is continuous at every point of 𝒳.
(b) 𝕃 is continuous at 0 ∈ 𝒳.
(c) ‖𝕃f‖ is bounded on the unit ball {f ∈ 𝒳 : ‖f‖ ≤ 1}.

Let us now focus on linear operators acting on Hilbert spaces.

Proposition 5 (Adjoint). Let 𝕃 ∈ B(𝒳, 𝒴), where 𝒳 and 𝒴 are Hilbert spaces. Then there exists a unique bounded linear map 𝕃* : 𝒴 → 𝒳 such that ⟨𝕃x, y⟩ = ⟨x, 𝕃*y⟩ for all x ∈ 𝒳, y ∈ 𝒴.

Definition 9. Let 𝒳 be a Hilbert space and 𝕃 ∈ B(𝒳).
(a) 𝕃 is self-adjoint if 𝕃 = 𝕃*, i.e., ⟨𝕃x, y⟩ = ⟨x, 𝕃y⟩ for all x, y ∈ 𝒳.
(b) 𝕃 is compact if it maps the unit ball in 𝒳 to a set with compact closure.

We are now ready to state the spectral theorem for compact operators.

Theorem 1 (Spectral theorem, [67, Theorem 2, Chapter 8 §7]). Let 𝕃 : 𝒳 → 𝒳 be a compact self-adjoint operator on the Hilbert space 𝒳 satisfying 𝕃 ≠ 0. Then we have a finite or countably infinite system of orthonormal elements {φ_k}_{k≥1} in 𝒳 such that
(a) The elements φ_k are eigenfunctions associated with the eigenvalues λ_k ∈ ℝ, i.e., 𝕃φ_k = λ_k φ_k for k = 1, 2, …. If the set of nonzero eigenvalues is infinite, then 0 is the unique accumulation point.
(b) The operator 𝕃 has the representation

  𝕃 = Σ_{k≥1} λ_k φ_k φ_k*,  i.e.,  𝕃f = Σ_{k≥1} λ_k ⟨φ_k, f⟩ φ_k for all f ∈ 𝒳.

The following useful result shows that linear integral operators are compact.

Proposition 6 ([66, Chapter 2, Proposition 4.7]). If K ∈ L²([0,1]²), then (𝕂f)(x) = ∫₀¹ K(x,y) f(y) dy is a compact operator.

We conclude this subsection with a generalization of the Perron–Frobenius theorem to linear operators in Hilbert space. Let us first introduce some additional notions used in the statement of the result. A closed convex set 𝒦 ⊂ 𝒳 is called a cone if λ𝒦 ⊂ 𝒦 for all λ ≥ 0 and 𝒦 ∩ (−𝒦) = {0}. If the set {u − v : u, v ∈ 𝒦} is dense in 𝒳, then 𝒦 is called a total cone.

Theorem 2 (Krein–Rutman theorem, [49, Theorem 19.2]). Let 𝒳 be a Hilbert space, 𝒦 ⊂ 𝒳 a total cone and 𝕃 : 𝒳 → 𝒳 a compact linear operator that is positive (i.e., 𝕃(𝒦) ⊂ 𝒦) with positive spectral radius r(𝕃). Then r(𝕃) is an eigenvalue with an eigenvector φ ∈ 𝒦 \ {0}: 𝕃φ = r(𝕃)φ.
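As a concrete illustration of Theorem 1 and Proposition 6, the sketch below discretizes the kernel K(x,y) = min(x,y) on a uniform grid and recovers its known eigenvalues 1/((k−1/2)²π²). This textbook kernel is our own illustrative choice and is not one of the graphons used in the paper.

# Spectral theorem illustration: grid discretization of a compact integral
# operator and comparison with the kernel's known eigenvalues.
import numpy as np

n = 4000
x = (np.arange(n) + 0.5) / n
K = np.minimum(x[:, None], x[None, :])
w = np.linalg.eigvalsh(K / n)                   # eigenvalues of the operator
top = w[::-1][:3]                               # three largest eigenvalues
theory = [1 / (((k - 0.5) * np.pi) ** 2) for k in (1, 2, 3)]
print(np.round(top, 5), np.round(theory, 5))

The same discretize-and-diagonalize recipe underlies the computation of the graphon centrality functions c_N(x) used throughout the paper.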

Perturbation theory for compact self-adjoint operators

The natural definition of the angle between two nonzero vectors φ and φ̃ in a Hilbert space 𝒳 is the number

  Θ(φ, φ̃) = cos⁻¹( ⟨φ, φ̃⟩ / (‖φ‖ ‖φ̃‖) ).   (54)

Note that the above concept is well defined because of the Cauchy–Schwarz inequality. Consider now the two subspaces spanned by the two nonzero vectors φ and φ̃, that is, [φ] := φφ*𝒳 and [φ̃] := φ̃φ̃*𝒳. One can extend (54) to define an angle between the two subspaces [φ] and [φ̃] as

  Θ([φ], [φ̃]) := inf_{u,v} { Θ(u, v) : u ∈ [φ], v ∈ [φ̃] }.

More generally, one can extend this definition of angle to subspaces spanned by eigenfunctions. This will be particularly useful in situations where we are interested in a compact self-adjoint operator 𝕃 but we only have access to a modified operator 𝕃̃ = 𝕃 + ℍ. Indeed, in this case one way to measure how close these operators are is to measure the angle between subspaces spanned by their eigenfunctions. Let us introduce some notation in order to formalize this. We write the subspace (eigenspace) spanned by the eigenfunctions {φ_k}_{k=1}^m of 𝕃 as [E₀] := [φ₁ φ₂ … φ_m]. We denote the projector onto [E₀] by ℙ₀ = E₀E₀* = Σ_{k=1}^m φ_k φ_k* and its complementary projector by ℙ₁ = E₁E₁*. Now any vector x ∈ 𝒳 can be written as

  x = [E₀ E₁] (x₀; x₁) = E₀x₀ + E₁x₁,

where x₀ = E₀*x and x₁ = E₁*x. We therefore say that x is represented by (x₀; x₁). The corresponding notation for an operator 𝕃 : 𝒳 → 𝒳 is

  𝕃 = [E₀ E₁] ( L₀ 0 ; 0 L₁ ) (E₀*; E₁*) = E₀L₀E₀* + E₁L₁E₁*,

where 𝕃E₀ = E₀L₀ and 𝕃E₁ = E₁L₁. Similarly, we can consider the eigenspace [F₀] spanned by the eigenfunctions {φ̃_k}_{k=1}^m of 𝕃 + ℍ and write

  𝕃̃ = 𝕃 + ℍ = F₀L̃₀F₀* + F₁L̃₁F₁*.

The problem of measuring the closeness between the eigenspaces [E₀] = ℙ₀𝒳 and [F₀] = ℙ̃₀𝒳 can be tackled by looking at the angle between these subspaces. To do so, we can define a diagonal operator Θ₀ using the principal angles between E₀ and F₀, i.e., cos⁻¹(s₁), …, cos⁻¹(s_m), where s₁ ≥ ⋯ ≥ s_m are the singular values of E₀*F₀, or equivalently the square-root of the nonzero eigenvalues of E₀*F₀F₀*E₀. Then, writing S := diag(s₁, …, s_m), we can define Θ₀ = Θ(E₀, F₀) as

  Θ₀ = cos⁻¹(S),  i.e.,  Θ₀f = Σ_{k=1}^m cos⁻¹(s_k) ⟨φ_k, f⟩ φ_k

for all f ∈ 𝒳 and any basis {φ_k}_{k=1}^∞. We are now ready to state the Davis–Kahan sin θ theorem.

Theorem 3 (Davis–Kahan sin θ theorem [62]). Let 𝕃 and 𝕃̃ = 𝕃 + ℍ be two self-adjoint operators acting on the Hilbert space 𝒳 such that 𝕃 = E₀L₀E₀* + E₁L₁E₁* and 𝕃 + ℍ = F₀L̃₀F₀* + F₁L̃₁F₁*, with [E₀, E₁] and [F₀, F₁] orthogonal. If the eigenvalues of L₀ are contained in an interval (a, b), and the eigenvalues of L̃₁ are excluded from the interval (a − δ, b + δ) for some δ > 0, then

  ‖ sin Θ(E₀, F₀) ‖ = ‖ F₁*E₀ ‖ ≤ ‖ F₁*ℍE₀ ‖ / δ

for any unitarily invariant operator norm ‖·‖.

Note that the above theorem holds even for non-compact operators. Indeed, one might consider more general orthogonal subspaces defined through their projectors, which in turn might not be written as countable sums of the product of the elements of an orthogonal basis [62].
Indeed, in this case one way to measure how close these operators are is to measure the angle between subspaces spanned by their eigenfunctions. Let us introduce some notation in order to formalize this. We write the subspace (eigenspace) spanned by the eigenfunctions m {ϕk}k=1 of L by [E0] := [ϕ1 ϕ2 . . . ϕm]. We denote ∗ Pm ∗ the projector of [E0] by P0 = E0E0 = k=1 ϕkϕk and its ∗ complementary projector by P1 = E1E1. Now any vector x ∈ X can be written as    x0 x = E0 E1 = E0x0 + E1x1, x1 ∗ ∗ where x0 = E0x and x1 = E1x. We therefore say that x x  is represented by 0 . The corresponding notation for an x1 operator L : X → X is    ∗  L0 0 E0 ∗ ∗ L = E0 E1 ∗ = E0L0E0 + E1L1E1, 0 L1 E1