Blind Estimation of Eigenvector Centrality from Graph Signals: Beyond Low-Pass Filtering T
Total Page:16
File Type:pdf, Size:1020Kb
1 Blind Estimation of Eigenvector Centrality from Graph Signals: Beyond Low-pass Filtering T. Mitchell Roddenberry and Santiago Segarra Abstract—This paper characterizes the difficulty of estimating Related work. One can frame the general problem of inferring a network’s eigenvector centrality only from data on the nodes, the complete set of edges of an underlying network from data i.e., with no information about the topology of the network. We on the nodes under the concept of network topology inference. model this nodal data as graph signals generated by passing white noise through generic (not necessarily low-pass) graph This is commonly studied through a statistical lens, where filters. Leveraging the spectral properties of graph filters, we each node represents a random variable, and edges encode the estimate the eigenvectors of the adjacency matrix of the un- dependence structure between these random variables [10]– derlying network. To this end, a simple selection algorithm is [12]. Alternative approaches include those based on partial proposed, which chooses the correct eigenvector of the signal correlations [10], structural equation models [13], [14], and covariance matrix with minimal assumptions on the underlying graph filter. We then present a theoretical characterization of the Granger causality [15]. Graph signal processing provides a asymptotic and non-asymptotic performance of this algorithm, different view on network topology inference, where the nodal thus providing a sample complexity bound for the centrality data is assumed to be the output of a latent process on the estimation and revealing key elements driving this complexity. hidden graph [16]–[19]. These approaches solve the topology Finally, we illustrate the developed insights through a set of inference problem by making different assumptions on the pro- numerical experiments on different random graph models. cess that generates the graph signals, e.g., kernel models [20], signal smoothing [16], or consensus dynamics [21], [22]. I. INTRODUCTION Motivated by the high sampling and computational require- The representation of data as graphs, or networks, has ments of network topology inference, the framework of blind become an increasingly prominent approach in science and network inference was proposed. More precisely, recent works engineering [1], [2], allowing one to uncover community have considered the estimation of network characteristics – structure [3], common connection patterns [4], and node such as community structure and centralities – directly from importance [5]. In many settings, there is an assumed network nodal data, i.e., circumventing the intermediate step of con- structure lying underneath a set of interacting agents, but the structing the network. In this context, [23] considers the ob- precise connections in this structure are unobserved. However, servation of a simple finite-length diffusion process with white we would still like to use graph-based analysis tools, such as noise input on a planted partition graph, and characterizes the centrality measures [6]–[9], to draw conclusions about the role relationship between the diffusion time and the difficulty of of the agents in the unobserved interconnected structure. identifying the latent communities from the observed graph As a motivating example, consider a social network where signals. Alternatively, [24] models the observed signals as the set of connections is not known precisely, but one has being low-rank, while [25] models them as a function of a measurements of opinion dynamics among all of the indi- latent time-series. Expanding on this, [26] considers the blind viduals. One then seeks to infer who the most influential community detection problem where there is no fixed graph, individual in the network is. In the network topology inference but rather a family of graphs with shared latent partitions. Most arXiv:2005.00659v1 [cs.SI] 2 May 2020 framework, one would use this collection of opinions to infer related to this work, [27], [28] implement this blind inference a graph structure, and then compute the eigenvector centrality approach for the estimation of eigenvector centralities from of the constructed network. This approach requires the costly – graph signals. Both of these works assume that the graph both in the data and computational sense – construction of an signals are smooth, and characterize the error between the intermediate network, even though the ultimate interest is just estimated and true centralities. We depart from this assumption in the resulting centrality. This leads to the guiding question of in the current paper. this work: How can we estimate the eigenvector centrality of Contributions. The contributions of this paper are three- a graph with hidden edges directly from data supported on the fold: 1) We provide a simple algorithm to select an estimate nodes? Working in the framework of graph signal processing, of the eigenvector centrality from a set of graph signals, we model this data as a set of graph signals, obtained via the 2) We derive sampling requirements for this algorithm, and output of a graph filter applied to white noise. We then explore 3) We illustrate our theoretical findings through experiments the difficulty of this problem, characterized by the distribution on different random graph models. of centrality over the nodes as well as the spectral properties II. PRELIMINARIES of the graph and graph filter. A. Graphs and eigenvector centrality T.M. Roddenberry and S. Segarra are with the Dept. of ECE, Rice University. TMR partially received funding from the Ken Kennedy 2019/20 An undirected graph G consists of a set V of n := jVj nodes, AMD Graduate Fellowship. Emails: [email protected], [email protected]. and a set E ⊆ V × V of edges, corresponding to unordered 2 pairs of nodes. Networks can be represented by an adjacency where H is a polynomial of the graph adjacency matrix A, and (`) m matrix A defined by first setting an arbitrary indexing of the the set w `=1 consists of i.i.d. samples from a zero-mean nodes with the integers 1; : : : ; n, and then assigning A = 1 (`) h (`) (`)>i ij distribution obeying Cov w := E w w = I. (i; j) 2 E A = 0 if and ij otherwise. Unlike existing works, we do not assume the filter H in (2) eigenvector centrality i G The of a node in a graph is to be low-pass. ith A given by the entry of the leading eigenvector of , By considering (1), the covariance matrix of signals follow- u which we denote for convenience. The Perron-Frobenius ing (2) shares a set of eigenvectors with the adjacency matrix. Theorem [29, Theorem 8.3.1] guarantees that every element Specifically, of u has the same sign. Due to this property, the set of all (`) (`) possible eigenvector centralities (ignoring the stipulation of Cy := Cov y = H (A) Cov w H (A) having unit norm) can be characterized as a symmetric convex T + − n (3) cone C = C [C 2 R where 2 X 2 > = [H (A)] = [H (λk)] vkvk : + n − n C = fx 2 R : xi ≥ 0g ; C = fx 2 R : xi ≤ 0g : k=0 Consequently, one can study the eigenvectors of the covariance The boundary of this set, then, is matrix Cy to gain insights into the spectral structure of the ( n ) Y adjacency matrix A. @C = x 2 C : xi = 0 : i=1 III. PROBLEM STATEMENT AND ALGORITHM That is, @C consists of vectors with same-signed entries and at Consider a set of m graph signals obtained as the output least one entry that takes value 0, while C consists of vectors of an unknown graph filter, as in (2). Then, based on (3), one with same-signed entries. In fact, for connected, undirected can analyze the spectral structure of a graph strictly from the graphs, the leading eigenvector of A lies in the interior of observation of such signals, without knowledge of the graph C, since no node can have centrality exactly equal to 0. Of itself. More precisely, we aim to extract the best estimate of the interest in this work is the projection of a vector v onto C, eigenvector centrality u from the eigenvectors of the sample denoted by Π (v), which essentially takes the positive-signed C covariance. This leads to the eigenvector centrality selection or negative-signed part of v, depending on which is closer to problem, stated next. v in the `2-norm. Problem 1 Given the observation of m graph signals follow- B. Graph signals and graph filters ing the model in (2), estimate the eigenvector centrality u. Graph signals, analogously to discrete time signals, are Based on the shared set of eigenvectors between C and A, it functions mapping the nodes to the reals, i.e., x: V! R. y For an indexing of V with [n], a graph signal x is represented makes intuitive sense to estimate the eigenvector centrality by n selecting an eigenvector from the sample covariance matrix. as a vector in R , where xi = x (i). A graph filter H is a linear map between graph signals representable as a polynomial of However, due to noise induced by finite sampling and the H the adjacency matrix1 fact that is unknown, it is not immediately clear which eigenvector should be chosen. T T X k X > Given certain assumptions on the graph filter H, one could H (A) = γkA := H (λk) vkvk ; (1) select the leading eigenvector of the empirical covariance k=0 k=0 matrix m where γk are real-valued coefficients, H (λ) is the extension m 1 X (`) (`) > Cby = (y − y¯)(y − y¯) (4) of the polynomial H to scalar-valued arguments, and (λk; vk) m `=1 denote the eigenpairs of A. as an estimate of u, as done in [27], [28]. In this work, we make no assumptions on the structure of the graph or the C.