Learning Spectral Graph Transformations for Link Prediction

Jérôme Kunegis [email protected]
Andreas Lommatzsch [email protected]
DAI-Labor, Technische Universität Berlin, Ernst-Reuter-Platz 7, 10587 Berlin, Germany

Abstract

We present a unified framework for learning link prediction and edge weight prediction functions in large networks, based on the transformation of a graph's algebraic spectrum. Our approach generalizes several graph kernels and dimensionality reduction methods and provides a method to estimate their parameters efficiently. We show how the parameters of these prediction functions can be learned by reducing the problem to a one-dimensional regression problem whose runtime only depends on the method's reduced rank and that can be inspected visually. We derive variants that apply to undirected, weighted, unweighted, unipartite and bipartite graphs. We evaluate our method experimentally using examples from social networks, collaborative filtering, trust networks, citation networks, authorship graphs and hyperlink networks.

1. Introduction

In the area of graph mining, several problems can be reduced to the problem of link prediction. These problems include the prediction of social links, collaborative filtering and predicting trust.

Approaching the problem algebraically, we can consider a graph's adjacency matrix A and look for a function F(A) returning a matrix of the same size whose entries can be used for prediction. Our approach consists of computing a matrix decomposition A = UDV^T and considering functions of the form F(A) = U F(D) V^T, where F(D) applies a function on reals to each element of the graph spectrum D separately.

We show that a certain number of common link and edge weight prediction algorithms can be mapped to this form. As a result, the method we propose provides a mechanism for estimating any parameters of such link prediction algorithms. Analogously, we also consider a network's Laplacian matrix as the basis for link prediction.

Recently, several link prediction methods have been studied: weighted sums of path counts between nodes (Liben-Nowell & Kleinberg, 2003), the matrix exponential (Wu & Chang, 2004), the von Neumann kernel and diffusion processes (Kandola et al., 2002), the commute time and resistance distance kernels (Fouss et al., 2007), random forest kernels (Chebotarev & Shamis, 1997) and the heat diffusion kernel (Ito et al., 2005). Similarly, rank reduction of the adjacency matrix has been proposed to implement edge weight prediction (Sarwar et al., 2000). Our main contribution is to generalize these link prediction functions to a common form and to provide a way to reduce the high-dimensional problem of learning the various kernels' parameters to a one-dimensional curve fitting problem that can be solved efficiently. The runtime of the method only depends on the chosen reduced rank and is independent of the original graph size.

We show that this generalization is possible under the assumption that the chosen training and test sets are simultaneously diagonalizable, which we show to be an assumption made by all link prediction methods we studied. Our framework can be used to learn the parameters of several graph prediction algorithms, including the reduced rank for dimensionality reduction methods. Since we reduce the parameter estimation problem to a one-dimensional curve fitting problem that can be plotted and inspected visually to compare the different prediction algorithms, an informed choice can be made about them without having to evaluate each algorithm on a test set separately.

We begin by describing the method for undirected, unipartite graphs, and then extend it to weighted and bipartite graphs. As an experimental evaluation, we then apply our method to several large network datasets and show which link prediction algorithm performs best for each.
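To make the notation concrete, here is a minimal sketch (not taken from the paper) of how a spectral transformation F(A) = U F(Λ) U^T is applied to a small symmetric adjacency matrix; the choice of the exponential and of α = 0.5 is arbitrary and only for illustration.

```python
# Minimal sketch: applying a spectral transformation F(A) = U f(Lambda) U^T
# to a small symmetric adjacency matrix.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

lam, U = np.linalg.eigh(A)          # eigenvalue decomposition A = U diag(lam) U^T

def spectral_transform(U, lam, f):
    """Apply the scalar function f to each eigenvalue separately."""
    return U @ np.diag(f(lam)) @ U.T

# Example: exponential kernel with alpha = 0.5; larger entries = more likely links.
F = spectral_transform(U, lam, lambda x: np.exp(0.5 * x))
print(np.round(F, 3))
```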

2. Link Prediction in Undirected Graphs

In this section, we review common link prediction methods in undirected graphs that we generalize in the next section.

We will use the term link prediction in the general sense, referring to any problem defined on a graph in which the position or weight of edges has to be predicted. The networks in question are usually large and sparse, for instance social networks, bipartite rating graphs, trust networks, citation graphs and hyperlink networks. The link prediction problems we consider can be divided into two classes: in unweighted graphs, the task consists of predicting where edges will form in the future; in weighted graphs, the task consists of predicting the weight of such edges. While many networks are directed in practice, we restrict this study to undirected graphs. Applying this method to directed graphs can be achieved by ignoring the edge directions, or by reducing them to bipartite graphs, mapping each vertex to two new vertices containing the inbound and outbound edges respectively.

Let A ∈ {0, 1}^{n×n} be the adjacency matrix of a simple, undirected, unweighted and connected graph on n vertices, and F(A) a function that maps A to a matrix of the same dimension.

The following subsections describe link prediction functions F(A) that result in matrices of the same dimension as A and whose entries can be used for link prediction. Most of these methods result in a positive-semidefinite matrix and can be qualified as graph kernels. The letter α will be used to denote parameters of these functions.

2.1. Functions of the Adjacency Matrix

Let D ∈ R^{n×n} be the diagonal degree matrix with D_ii = Σ_j A_ij. Then 𝒜 = D^{−1/2} A D^{−1/2} is the normalized adjacency matrix. Transformations of the adjacency matrices A and 𝒜 give rise to the exponential and von Neumann graph kernels (Kondor & Lafferty, 2002; Ito et al., 2005).

    F_EXP(A) = exp(αA)              (1)
    F_EXP(𝒜) = exp(α𝒜)              (2)
    F_NEU(A) = (I − αA)^{−1}        (3)
    F_NEU(𝒜) = (I − α𝒜)^{−1}        (4)

Here, α is a positive parameter. Additionally, the von Neumann kernels require α < 1.

2.2. Laplacian Kernels

L = D − A is the combinatorial Laplacian of the graph, and 𝓛 = I − 𝒜 = D^{−1/2} L D^{−1/2} is the normalized Laplacian. The Laplacian matrices are singular and positive-semidefinite. Their Moore–Penrose pseudoinverse is called the commute time or resistance distance kernel (Fouss et al., 2007). The combinatorial Laplacian matrix is also known as the Kirchhoff matrix, due to its connection to electrical resistance networks.

    F_COM(L) = L^+                  (5)
    F_COM(𝓛) = 𝓛^+                  (6)

By regularization, we arrive at the regularized Laplacian kernels (Smola & Kondor, 2003):

    F_COMR(L) = (I + αL)^{−1}       (7)
    F_COMR(𝓛) = (I + α𝓛)^{−1}       (8)

As a special case, the non-normalized regularized Laplacian kernel is called the random forest kernel for α = 1 (Chebotarev & Shamis, 1997). The normalized regularized Laplacian kernel is equivalent to the normalized von Neumann kernel, by noting that (I + α𝓛)^{−1} = (1 + α)^{−1}(I − (α/(1+α))𝒜)^{−1}.

(Ito et al., 2005) define the heat diffusion kernels as

    F_HEAT(L) = exp(−αL)            (9)
    F_HEAT(𝓛) = exp(−α𝓛)            (10)

The normalized heat diffusion kernel is equivalent to the normalized exponential kernel: exp(−α𝓛) = e^{−α} exp(α𝒜) (Smola & Kondor, 2003).

2.3. Rank Reduction

Using the eigenvalue decomposition A = UΛU^T, a rank-k approximation of A, L, 𝒜 and 𝓛 is given by a truncation leaving only k eigenvalues and eigenvectors in Λ and U.

    F_(k)(A) = U_(k) Λ_(k) U_(k)^T  (11)

For A and 𝒜, the biggest eigenvalues are used, while the smallest eigenvalues are used for the Laplacian matrices. F_(k)(A) can be used for prediction itself, or serve as the basis for any of the graph kernels (Sarwar et al., 2000). In practice, only rank-reduced versions of graph kernels can be computed for large networks.
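As an illustration of the kernels in Sections 2.1 and 2.2, the sketch below computes them directly for a small dense graph; the value of α is an arbitrary choice here, and for large sparse networks one would work with the rank-reduced forms of Section 2.3 instead.

```python
# Illustrative sketch for a small dense graph; the kernel definitions follow
# Equations (1)-(10), but the alpha value is an arbitrary choice.
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
n = A.shape[0]
d = A.sum(axis=1)
D = np.diag(d)
A_norm = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)   # normalized adjacency matrix
L = D - A                                              # combinatorial Laplacian

alpha = 0.3
F_exp  = expm(alpha * A)                               # exponential kernel, Eq. (1)
F_neu  = np.linalg.inv(np.eye(n) - alpha * A_norm)     # normalized von Neumann kernel, Eq. (4)
F_com  = np.linalg.pinv(L)                             # commute time kernel, Eq. (5)
F_comr = np.linalg.inv(np.eye(n) + alpha * L)          # regularized Laplacian kernel, Eq. (7)
F_heat = expm(-alpha * L)                              # heat diffusion kernel, Eq. (9)
```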

2.4. Path Counting

Another way of predicting links consists of computing the proximity between nodes, measured by the number and length of paths between them.

One can exploit the fact that powers A^n of the adjacency matrix of an unweighted graph contain the number of paths of length n connecting all node pairs. On the basis that nodes connected by many paths should be considered nearer to each other than nodes connected by few paths, we compute a weighted mean of powers of A as a link prediction function.

    F_P(A) = Σ_{i=0}^{d} α_i A^i    (12)

The result is a matrix polynomial of degree d. The coefficients α_i should be decreasing to reflect the assumption that links are more likely to arise between nodes that are connected by short paths than between nodes connected by long paths. Thus, such a function takes both path lengths and path counts into account.

We note that the exponential and von Neumann kernels can be expressed as infinite series of matrix powers:

    exp(αA) = Σ_{i=0}^{∞} (α^i / i!) A^i     (13)
    (I − αA)^{−1} = Σ_{i=0}^{∞} α^i A^i      (14)

We discuss polynomials of weighted graphs later.

3. Generalization

We now describe a formalism that generalizes the link prediction methods of the previous section and introduce an efficient algorithm to choose the optimal method and to estimate any parameter α.

We note that all these link prediction methods can be written as F = F(X), where X is one of {A, 𝒜, L, 𝓛} and F is either a matrix polynomial, matrix (pseudo)inversion, the matrix exponential or a function derived piecewise linearly from one of these. Such functions F have the property that for a symmetric matrix A = UΛU^T, they can be written as F(A) = U F(Λ) U^T, where F(Λ) applies the corresponding function on reals to each eigenvalue separately. In other words, these link prediction methods result in prediction matrices that are simultaneously diagonalizable with the known adjacency matrix. We will call such functions spectral transformations and write F ∈ S.

In the following, we study the problem of finding suitable functions F.

3.1. Finding F

Given a graph G, we want to find a spectral transformation F that performs well at link prediction for this particular graph. To that end, we divide the edge set of G into a training set and a test set, and then look for an F that maps the training set to the test set with minimal error.

Formally, let A and B be the adjacency matrices of the training and test set respectively. We will call A the source matrix and B the target matrix. The solution to the following optimization problem gives the optimal spectral transformation for the task of predicting the edges in the test set.

Problem 1. Let A and B be two adjacency matrices over the same vertex set. A spectral transformation that maps A to B with minimal error is given by the solution to

    min_F  ‖F(A) − B‖_F
    s.t.   F ∈ S

where ‖·‖_F denotes the Frobenius norm.

The Frobenius norm corresponds to the root mean squared error (RMSE) of the mapping from A to B. While the RMSE is common in link prediction problems, other error measures exist, but give more complex solutions to our problem. We will thus restrict ourselves to the Frobenius norm.

Problem 1 can be solved by computing the eigenvalue decomposition A = UΛU^T and using the fact that the Frobenius norm is invariant under multiplication by an orthogonal matrix:

    ‖F(A) − B‖_F = ‖U F(Λ) U^T − B‖_F = ‖F(Λ) − U^T B U‖_F    (15)

The Frobenius norm in Expression (15) can be decomposed into the sum of squares of the off-diagonal entries of F(Λ) − U^T B U, which is independent of F, and the sum of squares of its diagonal entries. This leads to the following least-squares problem, equivalent to Problem 1.

Problem 2. If UΛU^T is the eigenvalue decomposition of A, then the solution to Problem 1 is given by F(Λ)_ii = f(Λ_ii), where f(x) is the solution to the following minimization problem:

    min_f  Σ_i (f(Λ_ii) − U_{·i}^T B U_{·i})^2
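A sketch of this reduction (an assumed interface, not the authors' code): with a truncated eigendecomposition of the source matrix, the matrix regression of Problem 1 collapses to the k pairs (Λ_ii, U_·i^T B U_·i) that Problem 2 fits by one-dimensional least squares.

```python
# Sketch of the reduction from Problem 1 to Problem 2: compute the curve
# fitting data (lambda_i, u_i^T B u_i) from a truncated eigendecomposition.
import numpy as np
from scipy.sparse.linalg import eigsh

def curve_fitting_data(A, B, k):
    """Return (x, y): x are the k largest-magnitude eigenvalues of the source
    matrix A, y the corresponding diagonal entries of U^T B U."""
    lam, U = eigsh(A, k=k, which='LM')    # truncated eigendecomposition of A
    y = np.einsum('ij,ij->j', U, B @ U)   # y_i = u_i^T B u_i
    return lam, y

# Example with random symmetric matrices standing in for training/test adjacency:
rng = np.random.default_rng(0)
A = np.triu((rng.random((50, 50)) < 0.10).astype(float), 1); A = A + A.T
B = np.triu((rng.random((50, 50)) < 0.05).astype(float), 1); B = B + B.T
x, y = curve_fitting_data(A, B, k=10)
```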

Table 1. Link prediction functions and their corresponding real functions. For functions that apply to the adjacency matrix, we show their odd component used for rectangular matrices, as described in Section 5.

    Link prediction function           | Real function                                  | Odd component
    F_P(A) = Σ_{i=0}^{d} α_i A^i       | f(x) = Σ_{i=0}^{d} α_i x^i                     | f(x) = Σ_{i=0}^{d} α_i x^{2i+1}
    F_EXP(A) = exp(αA)                 | f(x) = e^{αx}                                  | f(x) = sinh(αx)
    F_NEU(A) = (I − αA)^{−1}           | f(x) = 1/(1 − αx)                              | f(x) = αx/(1 − α²x²)
    F_COM(L) = L^+                     | f(x) = x^{−1} when x > 0, f(x) = 0 otherwise   |
    F_COMR(L) = (I + αL)^{−1}          | f(x) = 1/(1 + αx)                              |
    F_HEAT(L) = exp(−αL)               | f(x) = e^{−αx}                                 |
    F_(k)(A) = U_(k) Λ_(k) U_(k)^T     | f(x) = x when |x| ≥ |Λ_kk|, f(x) = 0 otherwise |
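For reference, the real functions of Table 1 can be written as scalar callables, as in the sketch below; the function names are mine, not the paper's, and the odd variants are the ones used for the bipartite case of Section 5.

```python
# The real functions from Table 1 as Python callables (a sketch).
import numpy as np

def f_poly(x, *coeffs):                 # sum_i alpha_i x^i
    return sum(a * x**i for i, a in enumerate(coeffs))

def f_exp(x, alpha):                    # exponential kernel
    return np.exp(alpha * x)

def f_neu(x, alpha):                    # von Neumann kernel
    return 1.0 / (1.0 - alpha * x)

def f_comr(x, alpha):                   # regularized Laplacian kernel
    return 1.0 / (1.0 + alpha * x)

def f_heat(x, alpha):                   # heat diffusion kernel
    return np.exp(-alpha * x)

def f_sinh(x, alpha):                   # odd component of the exponential
    return np.sinh(alpha * x)

def f_neu_odd(x, alpha):                # odd component of the von Neumann kernel
    return alpha * x / (1.0 - alpha**2 * x**2)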

This problem is a one-dimensional least-squares curve fitting problem of size n. Since each function F(A) corresponds to a function f(x), we can choose a link prediction function F and learn its parameters by inspecting the corresponding curve fitting problem. This method also allows us to check which graph kernel is best suited to the underlying link prediction problem.

3.2. Curve Fitting

We have thus reduced the general matrix regression problem to a one-dimensional least-squares curve fitting problem of size k. In analogy to the various graph kernels and rank reduction methods of the previous section, we propose the following curve fitting models to find suitable functions f.

For each link prediction method, we take its matrix function F(A) and derive the corresponding function f(x) on reals. For each function on reals, we additionally insert a multiplicative parameter β > 0 that is to be learned along with the other parameters. The resulting functions are summarized in Table 1.

The normalized Laplacian can be derived from the normalized adjacency matrix by the spectral transformation 𝓛 = F(𝒜) = I − 𝒜, corresponding to the real function f(x) = 1 − x. Thus, the normalized commute time kernel reduces to the normalized von Neumann kernel, and the heat diffusion kernel reduces to the normalized exponential kernel. We will therefore restrict the set of source matrices to {A, 𝒜, L}.

Since both Λ and U^T B U are available after having computed the eigenvalue decomposition of A, the main computational part of our method lies in the eigenvalue decomposition of A, which is performed anyway for the usual application of the link prediction methods. Additionally, the computation of U^T B U takes runtime O(kr), where r is the number of nonzero entries of B, and the curve fitting runtime depends only on k.

As the target matrix, we may also use A + B instead of B, under the assumption that a link prediction algorithm should return high values for known edges. With this modification, we compute U^T (A + B) U instead of U^T B U.

Finally, the problem of scaling has to be addressed. Since we only train F on a source matrix A but intend to apply the learned function to A + B, the function F would be applied to spectra of different extent. Therefore, we normalize the spectra we encounter by dividing all eigenvalues by the largest eigenvalue, replacing Λ by Λ_11^{−1} Λ.

3.3. Illustrative Examples

To illustrate our method, we show four examples of finding link prediction functions in unweighted graphs. We use the datasets described in Table 2, with the adjacency matrix, the normalized adjacency matrix and the Laplacian as source matrices.
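A sketch of the fitting step under the assumptions above: the spectrum is normalized by its largest eigenvalue as described, and one of the scalar models (here the exponential, with the multiplicative parameter β) is fitted by one-dimensional least squares; the function and parameter names are mine.

```python
# Sketch: normalize the spectrum, then fit a scalar model by least squares.
import numpy as np
from scipy.optimize import curve_fit

def f_exp(x, alpha, beta):
    # beta * e^{alpha x}: the exponential model of Table 1 with the extra factor beta
    return beta * np.exp(alpha * x)

def fit_spectral_model(lam, y, model, p0):
    # Replace Lambda by Lambda_11^{-1} Lambda (divide by the largest eigenvalue).
    lam_scaled = lam / np.max(np.abs(lam))
    params, _ = curve_fit(model, lam_scaled, y, p0=p0, maxfev=10000)
    return params

# Example usage with the (x, y) pairs computed from the eigendecomposition of A:
# alpha, beta = fit_spectral_model(x, y, f_exp, p0=[1.0, 1.0])
```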

Figure 1. Graphical representation of the one-dimensional curve fitting problem on a selection of datasets from Table 2: (a) advogato (A → B); (b) dblp (A → A + B); (c) wt10g (A → B); (d) www (L → B). The plots show the diagonal elements of Λ (the eigenvalues of the source matrix) on the x-axis and the diagonal elements of U^T B U or U^T (A + B) U on the y-axis. The source and target matrices are shown in parentheses. The results of least-squares curve fitting using several functions f are shown as follows: red: polynomial of degree 4; red dashed: nonnegative polynomial of degree 7; green: the rational function 1/(1 − αx) in (d) and αx/(1 − α²x²) in (b); magenta: the exponential function in (a) and (c), the hyperbolic sine in (b) and the inverse exponential in (d); black: the linear map αx.

4. Weighted Graphs

In this section, we show how our method can be extended to graphs with weighted edges, including the case of edges with negative weights.

If edge weights are all positive, the method of the previous sections applies trivially. In some networks however, edges have positive as well as negative weights. Such signed graphs arise for instance in social networks where negative edges denote enmity instead of friendship, or in bipartite rating graphs where the two vertex classes represent users and items, and edges represent ratings that admit positive and negative values.

The link prediction functions based on the adjacency matrix can be interpreted as weighted sums of powers of the adjacency matrix, which denote path counts in the network. If some edges have negative weight, the total weight of a path is counted as the product of the edges' weights. This corresponds to the assumption of multiplicative transitivity, which can be summarized as "the enemy of my enemy is my friend" (Hage & Harary, 1983).

The Laplacian graph kernels can also be applied to signed graphs by using the absolute diagonal degree matrix D_ii = Σ_j |A_ij| (Hou, 2005). Like the unsigned Laplacian, the signed Laplacian is positive-semidefinite. Unlike the unsigned Laplacian, the signed Laplacian can be positive-definite. This is the case when each connected component of the graph contains a cycle with an odd number of negatively weighted edges.

We also use this alternative definition of D to define the signed normalized adjacency matrix, giving again 𝓛 = I − 𝒜. This definition of the signed normalized adjacency matrix is also justified by interpreting each edge as a vote and giving each node's votes equal weight. Figure 2 shows examples of curve fitting for networks with negative edge weights.

Figure 2. Curve fitting in networks with signed edge weights: (a) slashdot (A → A + B); (b) epinions (L → B). The following functions are estimated: red: polynomial; red dashed: nonnegative polynomial; magenta: exponential function; cyan: rank reduction.

5. Bipartite Graphs

In this section, we extend the method of the previous sections to bipartite networks, and show that in such networks we can restrict our curve fitting problem to odd functions.

The adjacency matrix A of a bipartite graph can be written in block form as A = [0 R; R^T 0], where R is rectangular. Observing that even powers of A can be written as A^{2n} = [(RR^T)^n 0; 0 (R^T R)^n], we see that they result in predictions of zero for edges connecting the two vertex classes. Therefore, we only need to consider odd powers, having the form A^{2n+1} = [0 (RR^T)^n R; R^T (RR^T)^n 0]. It follows that to predict (bipartite) edges in a bipartite graph, we have to compute F(RR^T)R for a matrix function F. Using the singular value decomposition R = UΛV^T, this can be written as

    F(RR^T)R = F(UΛV^T VΛU^T) UΛV^T = U F(Λ²) Λ V^T    (16)

where F(Λ²)Λ corresponds to an odd function in Λ.

We can therefore restrict the least-squares regression problem to odd functions, as summarized in Table 1. The exponential function is replaced by the hyperbolic sine, and 1/(1 − αx) is replaced by αx/(1 − α²x²).

The use of odd functions can also be justified by noting that if {λ_i} is the set of (positive) singular values of R, then A has the eigenvalues {±λ_i}. By symmetry, the function f(λ) has to fulfill f(−λ) = −f(λ) and thus be odd. Yet another interpretation is given by the observation that in bipartite graphs, paths connecting two vertices in different partitions are necessarily of odd length, and therefore the coefficients of even powers of A can be ignored in any matrix polynomial used for link prediction.
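A sketch of the bipartite case (assumed interface, not the authors' code): predictions for the rectangular matrix R are computed as U f(Λ) V^T with an odd scalar function f, here the hyperbolic sine corresponding to the exponential kernel; the rank k and the value of α are arbitrary illustrative choices.

```python
# Sketch: bipartite link prediction via the SVD of R and an odd function.
import numpy as np

def bipartite_predict(R, f, k):
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]        # rank-k truncation
    return U @ np.diag(f(s)) @ Vt                # U f(Sigma) V^T

# Example on a random 0/1 rating-like matrix, using the hyperbolic sine:
R = np.random.default_rng(1).integers(0, 2, size=(8, 5)).astype(float)
P = bipartite_predict(R, lambda s: np.sinh(0.5 * s), k=3)
```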

Figure 3. Fitting of curves in several bipartite datasets: (a) jester (A → A + B); (b) movielens (A → B); (c) jester (A → B); (d) advogato (A → A + B). The curves fitted are: red: odd polynomial; red dashed: nonnegative odd polynomial; magenta: the hyperbolic sine; green: the rational function αx/(1 − α²x²); cyan: rank reduction.

The method for bipartite graphs cannot be used with the Laplacian L, because the Laplacian has nonzero diagonal entries, and thus its eigenvalues do not correspond to the singular values of R.

Figure 3 illustrates the case of bipartite networks using several example datasets from Table 2. We make the following observations. The plots (b) and (c), computed by using only B as the target matrix (without A), map many eigenvalues to negative values, and the best curve fitting is attained by a polynomial with negative coefficients. This suggests that the search for link prediction methods should not be reduced to kernels, these being positive-semidefinite. Plot (b) can be interpreted as suggesting to ignore all eigenvalues smaller than a certain threshold, justifying the use of reduction to a smaller rank than would be possible computationally. The advogato dataset in (d) provides an example that justifies the von Neumann kernel.

6. Experimental Evaluation

We investigate eight network datasets and study two separate link prediction tasks: for unweighted networks, we study the task of link prediction, and for weighted graphs, the task of predicting the edge weights. The network datasets we inspected are summarized in Table 2. DBLP (http://dblp.uni-trier.de/) is a citation graph. Hep-th is the citation graph of Arxiv's high energy physics/theory section (Newman, 2001). Advogato is a trust network with three levels of trust (Stewart, 2005). Slashdot is a technology news site where users tag each other as friends and foes (Kunegis et al., 2009). Epinions is an opinion site where users can agree or disagree with each other (Guha et al., 2004). WWW and WT10G are hyperlink network datasets extracted from a subset of the World Wide Web (Albert et al., 1999; Bailey et al., 2003). Eowiki is the bipartite user-article authorship graph of the Esperanto Wikipedia (http://eo.wikipedia.org/). Jester is a dataset of rated jokes (Goldberg et al., 2001). MovieLens is a dataset of movie ratings (http://www.grouplens.org/node/73).

6.1. Setup

For each network dataset, we split the set of edges into a test and a training set, with the test set containing a third of all edges. For the bipartite datasets where edge weights correspond to ratings (MovieLens, Jester), we then subtract the mean of all ratings in the training set from the edge weights in the training set. We then reduce the training set of edges to its biggest connected component, and then split the resulting set into source and target adjacency matrices A and B, where B contains one third of all edges in the reduced training set. We then use our method to find functions F that minimize the Frobenius norm of the difference between F(A) and B. For the rank-reduced decomposition of A, we use a reduced rank k dependent on the size of the dataset, as given in Table 2. We apply the curve fitting methods described in the previous sections to learn several link prediction kernels for each network. We then use these link prediction functions to compute predictions for the edges in the test set.

As a measure of performance for the link prediction methods, we choose to compute the Pearson correlation coefficient between the predicted and known ratings in the test set. We choose the correlation because it represents a trade-off between the root mean squared error that we minimize and more dataset-dependent measures such as precision and recall, which cannot be applied to all datasets. For the unweighted datasets, we add to the test set a set of non-edges of equal size with weight zero. The results of our evaluation are shown in Table 3.
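A sketch of the evaluation measure under the setup described above (the data layout is assumed, not specified by the paper): the Pearson correlation between predicted scores and the known test values, where for unweighted datasets the test set also contains zero-weight non-edges.

```python
# Sketch: Pearson correlation between predicted and known test edge values.
import numpy as np
from scipy.stats import pearsonr

def evaluate(F_pred, test_edges, test_values):
    """F_pred: prediction matrix; test_edges: list of (i, j) pairs;
    test_values: known weights (zero for the added non-edges)."""
    predicted = np.array([F_pred[i, j] for i, j in test_edges])
    corr, _ = pearsonr(predicted, np.array(test_values))
    return corr
```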

Table 2. Summary of the network datasets used in our experiments and examples.

    Name       | Vertices        | Edges      | Weights           | k   | Description
    dblp       | 12,563          | 49,779     | {1}               | 126 | Citation graph
    hep-th     | 27,766          | 352,807    | {1}               | 54  | Citation graph
    advogato   | 7,385           | 57,627     | {0.6, 0.8, 1.0}   | 192 | Trust network
    slashdot   | 71,523          | 488,440    | {−1, +1}          | 24  | Friend/foe network
    epinions   | 131,828         | 841,372    | {−1, +1}          | 14  | Trust/distrust network
    www        | 325,729         | 1,497,135  | {1}               | 49  | Hyperlink graph
    wt10g      | 1,601,787       | 8,063,026  | {1}               | 49  | Hyperlink graph
    eowiki     | 2,827+168,320   | 803,383    | {1}               | 26  | Authorship graph
    jester     | 24,938+100      | 616,912    | [−10, +10]        | 100 | Joke ratings
    movielens  | 6,040+3,706     | 1,000,209  | {1, 2, 3, 4, 5}   | 202 | Movie ratings

6.2. Observations

We observe that the highest accuracy is generally achieved by source matrices having a more "spread" spectrum: the singular values are further apart relatively, making the curve fitting algorithms more accurate. In some cases however, the curves fit well to a tight spectrum. This is the case for the normalized adjacency matrix of the Advogato dataset.

We also observe that there is not a single graph kernel that performs best for all datasets. Comparing the performance of the various link prediction methods to the curve fitting precision in Figures 1, 2 and 3, we observe that the best link prediction algorithms are those that provide a well-fitting curve. We interpret this as an indication that inspecting the curve fitting plots visually can provide useful insight into each dataset by observing which model for f fits best.

The analysis of bipartite networks, in particular the MovieLens rating dataset, suggests that the matrix hyperbolic sine may be used for collaborative filtering.

7. Related Work

Graph kernels and other link prediction methods are usually defined with parameters, and the choice of any parameter is left to the implementation.

Simultaneously diagonalizable matrices have been considered in the context of joint diagonalization, where, given a set of matrices {A_i}, a matrix U is searched that makes the matrices {U A_i U^T} as diagonal as possible according to a given matrix norm. See for instance (Wax & Sheinvald, 1997). In that context, the task consists of finding an orthogonal matrix U, whereas our approach consists of optimizing the fit by varying the spectrum.

8. Conclusion and Discussion

We have presented a method for learning link and rating prediction functions in large networks. We have shown that a certain number of graph kernels and other link prediction algorithms can be interpreted as spectral transformations of the underlying graph's adjacency or Laplacian matrix. Restricting our search for accurate link prediction methods to kernels of this form, we derived a minimization problem that can be reduced to a one-dimensional least-squares regression problem. This formalism makes it possible to estimate the parameters of the various graph kernels and allows one to compare graph kernels visually using curve fitting.

Although the normalized adjacency matrices have eigenvalues all less than one and very near to each other, and one could expect curve fitting to be less precise, the prediction functions learned from the normalized adjacency matrix perform well in many cases, and even very well in a few cases. By applying the exponential graph kernel, we arrived at the matrix hyperbolic sine, which our results suggest performs well for collaborative rating prediction.

By experimenting with various types of network datasets, we found that the various link prediction methods in current use are justified in specific cases. We hope that the method presented here can help make an informed choice as to the right method.

For future work, we intend to use other matrix decompositions, with the constraint that the decomposition include a diagonal matrix that can be transformed. By using another norm than the Frobenius norm, we may extend the method to cases where other norms could be more accurate measures of the prediction error. Finally, other graph kernels and link prediction methods not covered in this paper could be inspected as we did here with the more common ones.

Table 3. The results of our experimental evaluation. For each dataset, we show the source and target matrices, the curve fitting model and the link prediction method that perform best.

    Dataset    | Best transformation | Best fitting curve           | Best graph kernel | Correlation
    dblp       | L → B               | Polynomial                   | Sum of powers     | 0.563
    hep-th     | A → B               | Exponential                  | Heat diffusion    | 0.453
    advogato   | L → B               | Rational                     | Commute time      | 0.554
    slashdot   | A → B               | Nonnegative odd polynomial   | Sum of powers     | 0.263
    epinions   | A → A + B           | Nonnegative odd polynomial   | Sum of powers     | 0.354
    www        | L → A + B           | Polynomial                   | Sum of powers     | 0.739
    wt10g      | A → B               | Linear function              | Rank reduction    | 0.293
    eowiki     | A → A + B           | Nonnegative odd polynomial   | Sum of powers     | 0.482
    jester     | A → A               | Odd polynomial               | Sum of powers     | 0.528
    movielens  | A → A + B           | Hyperbolic sine              | Hyperbolic sine   | 0.549

References

Albert, R., Jeong, H., & Barabási, A.-L. (1999). The diameter of the World Wide Web. Nature, 401, 130.

Bailey, P., Craswell, N., & Hawking, D. (2003). Engineering a multi-purpose test collection for Web retrieval experiments. Information Processing and Management, 39, 853–871.

Chebotarev, P., & Shamis, E. (1997). The matrix-forest theorem and measuring relations in small social groups. Automation and Remote Control, 58, 1505–1514.

Fouss, F., Pirotte, A., Renders, J.-M., & Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. Trans. on Knowledge and Data Engineering, 19, 355–369.

Goldberg, K., Roeder, T., Gupta, D., & Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4, 133–151.

Guha, R., Kumar, R., Raghavan, P., & Tomkins, A. (2004). Propagation of trust and distrust. Proc. Int. Conf. on World Wide Web (pp. 403–412).

Hage, P., & Harary, F. (1983). Structural models in anthropology. Cambridge University Press.

Hou, Y. (2005). Bounds for the least Laplacian eigenvalue of a signed graph. Acta Mathematica Sinica, 21, 955–960.

Ito, T., Shimbo, M., Kudo, T., & Matsumoto, Y. (2005). Application of kernels to link analysis. Proc. Int. Conf. on Knowledge Discovery and Data Mining (pp. 586–592).

Kandola, J., Shawe-Taylor, J., & Cristianini, N. (2002). Learning semantic similarity. Advances in Neural Information Processing Systems (pp. 657–664).

Kondor, R., & Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. Proc. Int. Conf. on Machine Learning (pp. 315–322).

Kunegis, J., Lommatzsch, A., & Bauckhage, C. (2009). The Slashdot Zoo: Mining a social network with negative edges. Proc. Int. Conf. on World Wide Web (pp. 741–750).

Liben-Nowell, D., & Kleinberg, J. (2003). The link prediction problem for social networks. Proc. Int. Conf. on Information and Knowledge Management (pp. 556–559).

Newman, M. E. J. (2001). The structure of scientific collaboration networks. Proc. National Academy of Sciences, 98, 404–409.

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2000). Application of dimensionality reduction in recommender systems – a case study. Proc. ACM WebKDD Workshop.

Smola, A., & Kondor, R. (2003). Kernels and regularization on graphs. Proc. Conf. on Learning Theory and Kernel Machines (pp. 144–158).

Stewart, D. (2005). Social status in an open-source community. American Sociological Review, 70, 823–842.

Wax, M., & Sheinvald, J. (1997). A least-squares approach to joint diagonalization. IEEE Signal Processing Letters, 4, 52–53.

Wu, Y., & Chang, E. Y. (2004). Distance-function design and fusion for sequence data. Proc. Int. Conf. on Information and Knowledge Management (pp. 324–333).