Graph Comparison Via the Non-Backtracking Spectrum

Andrew Mellor* and Angelica Grusovin
Mathematical Institute
University of Oxford
*[email protected]

December 14, 2018

arXiv:1812.05457v1 [cs.SI] 13 Dec 2018

ABSTRACT

The comparison of graphs is a vitally important yet difficult task which arises across a number of diverse research areas, including biological and social networks. There have been a number of approaches to defining graph distance; however, these are often not metrics (rendering standard data-mining techniques infeasible) or are computationally infeasible for large graphs. In this work we define a new metric based on the spectrum of the non-backtracking graph operator and show that it can not only be used to compare graphs generated through different mechanisms, but can also reliably compare graphs of varying size. We observe that the family of Watts-Strogatz graphs lies on a manifold in the non-backtracking spectral embedding, and show how this metric can be used in a standard classification problem of empirical graphs.

1 Introduction

Comparing graph structures is a fundamental task in graph theory. In particular, the need to identify similar graph structure arises in order to determine common function, behaviour, or generative process. This need is universal across disciplines, and applications include image processing [11], chemistry [1, 26], and social network analysis [24, 29].

For small graphs, identifying two identical graph structures (up to an isomorphism, or relabelling of vertices) is a trivial and computationally feasible task due to the limited number of possible configurations. For connected graphs with four vertices there are six possible configurations. However, for connected graphs with ten vertices there are 11,716,571 possible configurations. Although there is no closed formula for the number of graph configurations with $n$ vertices, it is clear that it soon becomes intractable to enumerate or compare large graphs by brute force.

The problem of determining whether two graphs are isomorphic is equivalent to finding a permutation matrix $P$ such that $AP = PB$, where $A$ and $B$ are the adjacency matrices of the two graphs. Despite the difficulty of the task (see footnote 1), there have been a number of attempts to define graph distance by minimising $\|AP - PB\|$ over permutation matrices $P$ for a suitably defined matrix norm $\|\cdot\|$. The matrices $A$ and $B$ can also take different forms to shift from a local perspective (adjacency matrix) to global properties, where $A$ and $B$ capture the number of paths between vertices. Examples of such distances include the chemical distance [26], CKS distance [8], and edit distance [18, 39]. Recent work has focused on reducing the space over which to search for a permutation matrix [6], which makes the problem tractable, although still computationally restrictive for large graphs.

Beyond scalability issues, one of the drawbacks of these methods is that they require graphs to be of the same order (i.e. the same number of vertices). For many applications this may be desirable, especially where the addition of a single vertex or edge can have drastic consequences on graph function (in graphs of chemical compounds, for example). However, for other applications we may be interested in the structure at a coarser level. For example, a ring graph of 100 vertices is structurally similar to a ring graph of 200 vertices, yet the two cannot be compared using traditional isomorphism arguments.

Footnote 1: It is currently unknown whether this problem lies in the class of P or NP-hard problems [4].

Contrasting approaches use a number of graph properties, or features, to characterise graph structure [43, 17, 23, 35, 33]. The most promising of these make use of the graph spectrum, either of the adjacency matrix directly, or of a version of the graph Laplacian [13, 44, 46, 16, 28].
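As an illustration of the permutation-based distances described earlier in this section, the following is a minimal brute-force sketch of minimising $\|AP - PB\|$ over permutation matrices. The function name `permutation_distance` and the choice of the Frobenius norm are assumptions made for illustration; none of the cited distances is reproduced here. The loop visits all $n!$ permutations, which is exactly the scalability problem noted above, so it is only usable for very small graphs.

```python
from itertools import permutations
from math import sqrt


def permutation_distance(A, B):
    """Brute-force minimum over permutation matrices P of ||AP - PB||_F.

    A and B are adjacency matrices (lists of lists) of the same order n.
    Illustrative only: the loop visits all n! permutations.
    """
    n = len(A)
    if len(B) != n:
        raise ValueError("graphs must have the same number of vertices")
    best = float("inf")
    for sigma in permutations(range(n)):
        # With P[i][sigma[i]] = 1, the squared Frobenius norm of AP - PB
        # equals the sum over i, k of (A[i][k] - B[sigma[i]][sigma[k]])^2.
        squared = sum(
            (A[i][k] - B[sigma[i]][sigma[k]]) ** 2
            for i in range(n)
            for k in range(n)
        )
        best = min(best, squared)
    return sqrt(best)


# Path 0-1-2-3.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
# The same path with vertices relabelled: 2-0-3-1.
relabelled = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
# Star with centre 0.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

print(permutation_distance(path, relabelled))  # 0.0: the graphs are isomorphic
print(permutation_distance(path, star))        # 2.0: one edge must be moved
```

The function rejects graphs of different order, which reflects the limitation discussed above: this family of distances is only defined when both graphs have the same number of vertices.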
The spectra of graphs or graph operators are useful since the eigenvalues and eigenvectors characterise the topological structure of a graph in a way which can be interpreted physically. Eigenvalues of the graph Laplacian describe how a quantity (information, heat, people, etc.) localised at a vertex can spread across the graph. These eigenvalues also dictate the stability of dynamics acting across the graph [32]. Beyond the Laplacian, recent work has investigated the spectral properties of higher-order operators such as the non-backtracking graph operator (described in the next section) [40]. For a more detailed (but not complete) examination of graph distances we refer the reader to a recent survey [15].

In this article we present a new method to compare graph structure using the distribution of eigenvalues of the non-backtracking operator in the complex plane. In Section 2 we describe the non-backtracking operator and investigate some of its spectral properties before introducing the distributional non-backtracking spectral distance (d-NBD) in Section 3. We show empirical results for both synthetic and real graphs in Section 4 before discussing the significance of the results and future research in Section 5.

2 The Non-backtracking Operator and Spectrum

Consider a simple undirected binary graph $G = (V, E)$ where $V \subset \mathbb{N}$ is a set of vertices and $E \subset V^2$ is a set of edges. Here we use the word simple to mean that the graph contains no self-loops or multiple edges between vertices. Let $n = |V|$ be the number of vertices and $m = |E|$ be the number of edges. Spectral analysis of graphs is typically conducted using the Laplacian matrix $L = A - D$, where $A$ is the adjacency matrix which encodes $G$, and $D$ is a diagonal matrix of vertex degrees with $D_{ii} = \sum_j A_{ij}$ and zeros elsewhere. In this work we choose instead to use a different linear graph operator, namely the non-backtracking operator.
The non-backtracking (or Hashimoto) matrix $B$ is a $2m \times 2m$ matrix defined on the set of directed edges of $G$. Here each undirected edge $u \leftrightarrow v$ is replaced by two directed edges $u \to v$ and $v \to u$. The non-backtracking matrix is then given by

$$B_{(u \to v),(x \to y)} = \begin{cases} 1 & \text{if } x = v \text{ and } y \neq u, \\ 0 & \text{otherwise.} \end{cases}$$

The non-backtracking matrix is asymmetric and the entries of $B^k$ describe the number of non-backtracking walks of length $k$ across $G$. Of special note is $(B^k)_{ii}$, which counts the non-backtracking cycles of length $k$ which start and end at edge $i$ (see footnote 2). The total number of non-backtracking cycles of length $k$ is therefore captured in the trace of $B^k$.

2.1 Spectral Properties

The spectral properties of the non-backtracking matrix are well understood, especially for regular graphs [31, 3, 38]. As $B$ is asymmetric the eigenvalues can also take complex values. In Figure 1 we show the spectrum in the complex plane of three real-world graphs and three synthetic graphs of varying size.

The first observation is that the bulk of the eigenvalues $\lambda_k$ lie within the disk $|\lambda_k| < \sqrt{\rho(B)}$, where $\rho(B) = \max_k |\lambda_k|$ is the spectral radius of $B$. This is commonly used as a heuristic for the bulk of the spectrum; however, in particular cases more is known. For random regular graphs of degree $d$ there are $n$ pairs of conjugate eigenvalues which lie exactly on a circle of radius $\sqrt{d - 1}$, and for a stochastic block model the expected magnitude of the eigenvalues can be shown to be less than or equal to $\sqrt{\rho(B)}$ in the limit $n \to \infty$ [25].

Another property of the spectrum of $B$ is that any tree structure (whether connected to the graph, or disconnected) contributes zero eigenvalues to the spectrum. This is a consequence of any non-backtracking walk becoming 'stuck' at the leaves of the tree. Furthermore, any unicyclic component gives rise to eigenvalues in $\{0\} \cup \{\lambda : |\lambda| = 1\}$. These two properties are easily explained as we make the connection between the number of non-backtracking cycles in the graph and the eigenvalues of $B$.
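The definition of $B$ and the remarks on trees and cycles can be checked on small graphs. The following is a minimal pure-Python sketch; the helper names (`non_backtracking_matrix`, `mat_mul`, `trace_of_power`, `is_nilpotent`, `is_permutation_matrix`) are hypothetical and not taken from the paper. It relies on two standard facts: a matrix has only zero eigenvalues exactly when it is nilpotent, and the eigenvalues of a permutation matrix are roots of unity and so lie on the unit circle.

```python
def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected simple graph.

    Each undirected edge (u, v) is replaced by the directed edges u->v and
    v->u. B[i][j] = 1 when directed edge i is u->v, directed edge j is x->y,
    x == v and y != u; otherwise B[i][j] = 0.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    B = [
        [1 if (x == v and y != u) else 0 for (x, y) in directed]
        for (u, v) in directed
    ]
    return directed, B


def mat_mul(X, Y):
    """Multiply two square integer matrices given as lists of lists."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def trace_of_power(B, k):
    """Return tr(B^k) for k >= 1."""
    P = B
    for _ in range(k - 1):
        P = mat_mul(P, B)
    return sum(P[i][i] for i in range(len(B)))


def is_nilpotent(B):
    """True if B^n is the zero matrix, where n is the size of B."""
    P = B
    for _ in range(len(B) - 1):
        P = mat_mul(P, B)
    return all(entry == 0 for row in P for entry in row)


def is_permutation_matrix(B):
    """True if every row and every column of the 0/1 matrix B sums to 1."""
    return all(sum(row) == 1 for row in B) and all(
        sum(col) == 1 for col in zip(*B)
    )


# A tree (path 0-1-2-3): every non-backtracking walk gets stuck at a leaf.
_, B_tree = non_backtracking_matrix([(0, 1), (1, 2), (2, 3)])
print(is_nilpotent(B_tree))  # True, so every eigenvalue of B_tree is zero

# A single cycle (triangle): each directed edge has exactly one successor.
_, B_cycle = non_backtracking_matrix([(0, 1), (1, 2), (2, 0)])
print(is_permutation_matrix(B_cycle))  # True, so all eigenvalues have modulus 1
print([trace_of_power(B_cycle, k) for k in range(1, 7)])  # [0, 0, 6, 0, 0, 6]
```

For the triangle, the trace is non-zero only when $k$ is a multiple of 3: each of the six directed edges returns to itself after going once around the cycle.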
For any $n \times n$ matrix $A$ the trace of $A^k$ can be calculated as

$$\operatorname{tr} A^k = \sum_{i=1}^{n} \lambda_i^k,$$

where $(\lambda_i)_{i=1}^{n}$ are the eigenvalues of $A$. This result is immediate for symmetric matrices, which can be diagonalised; however, the result also holds for asymmetric matrices. We can therefore write

$$\operatorname{tr} B^k = \sum_{i=1}^{2m} \lambda_i^k, \qquad (1)$$

noting that $\operatorname{tr} B^k$ captures a count of all non-backtracking cycles of length $k$. This means that the eigenvalues of $B$ encode all information regarding the number of non-backtracking cycles in $G$.

[Figure 1: The non-backtracking spectrum of real and synthetic networks in the complex plane. Each eigenvalue $\lambda_k = \alpha_k + \beta_k i$ is represented by the point $(\alpha_k, \beta_k)$. (Top, left to right) The Karate Club graph [45], Davis Southern Women graph [12], and Florentine Families graph [7]. (Bottom, left to right) An Erdős-Rényi graph with edge connection probability $p = 0.2$ for 50, 100 and 150 vertices. The red circle marks the curve $\sqrt{\alpha^2 + \beta^2} = \sqrt{\rho(B)}$, where $\rho(B)$ is the spectral radius of $B$.]

Footnote 2: Non-backtracking does not mean that a walk only visits distinct vertices, only that it does not immediately return to a vertex it has just arrived from.
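Equation (1) can be checked numerically on a small example. The sketch below assumes NumPy is available and uses the complete graph on four vertices as an arbitrary test case; the function name `non_backtracking_matrix` and the variable names are illustrative and not taken from the paper. It compares the trace of $B^k$, computed with exact integer arithmetic, against the sum of the $k$-th powers of the numerically computed eigenvalues.

```python
import numpy as np


def non_backtracking_matrix(edges):
    """Build the 2m x 2m non-backtracking matrix of an undirected simple graph."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    size = len(directed)
    B = np.zeros((size, size), dtype=np.int64)
    for i, (u, v) in enumerate(directed):
        for j, (x, y) in enumerate(directed):
            # Edge u->v may be followed by x->y if x == v and y != u.
            if x == v and y != u:
                B[i, j] = 1
    return B


# Complete graph on four vertices: m = 6 edges, so B is 12 x 12.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
B = non_backtracking_matrix(k4_edges)
eigenvalues = np.linalg.eigvals(B)

for k in range(1, 7):
    # Left-hand side of Equation (1): exact integer trace of B^k.
    trace = int(np.trace(np.linalg.matrix_power(B, k)))
    # Right-hand side: sum of k-th powers of the (complex) eigenvalues.
    power_sum = np.sum(eigenvalues ** k)
    print(k, trace, abs(power_sum - trace) < 1e-6)
```

For this graph the trace at $k = 3$ is 24: each of the four triangles is counted once for each of its three starting edges and once for each of its two directions.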
