A Property Testing Framework for the Theoretical Expressivity of Graph Kernels

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) A Property Testing Framework for the Theoretical Expressivity of Graph Kernels Nils M. Kriege, Christopher Morris, Anja Rey, Christian Sohler TU Dortmund University, Dortmund, Germany fnils.kriege,christopher.morris,anja.rey,[email protected] Abstract cal information of the input graphs adequately. Newly proposed graph kernels are often justified by their ability to take structural Graph kernels are applied heavily for the classifica- graph properties into account that were ignored by previous tion of structured data. However, their expressivity is kernels. Yet, to the best of our knowledge, this argument has not assessed almost exclusively from experimental stud- been formalized. Moreover, there is no theoretical justification ies and there is no theoretical justification why one why certain kernels perform better than others, but merely exper- kernel is in general preferable over another. We in- imental evaluations. We address this by introducing a theoretical troduce a theoretical framework for investigating the framework for the analysis of the expressivity of graph kernels expressive power of graph kernels, which is inspired motivated by concepts from property testing, see, e.g., [Gol- by concepts from the area of property testing. We dreich, 2017]. We consider normalized kernels, which measure introduce the notion of distinguishability of a graph similarity in terms of angles in a feature space. We say that a property by a graph kernel. For several established graph kernel identifies a property if no two graphs are mapped to graph kernels we show that they cannot distinguish the same normalized feature vector unless they both have or both essential graph properties. In order to overcome this, do not have the property. A positive angle between two such we consider a kernel based on k-disc frequencies. We feature vectors can be helpful to classify the property. As the show that this efficiently computable kernel can distin- graph size increases, on the one hand, this angle can become very guish fundamental graph properties. Finally, we obtain small (dependent on the graph size), which is hindering when learning guarantees for nearest neighbor classifiers in applying this knowledge to a learning setting. On the other hand, our framework. we observe that a constant angle between any two feature vectors of two graphs with complementing properties can only rarely be 1 Introduction the case, since only a marginal change in a graph’s features can Linked data arises in various domains such as chem- and bioin- change its property. If a graph can be edited slightly to obtain formatics, social network analysis and pattern recognition. Such a property, it can, however, be viewed as close enough to the data can naturally be represented by graphs. Therefore, machine property to be ignored. Thus, in the sense of property testing, it learning on graphs has become an active research area of increas- is desirable to differentiate between the graph set far away from ing importance. The prevalent approach to classify graphs is a property and the property itself, which motivates the following distinguishes to design kernels on graphs in order to employ standard kernel concept. We say that a graph kernel a property if methods such as support vector machines. Consequently, in the it guarantees a constant angle (independent of the graph size) past two decades a large number of graph kernels have been pro- between the feature vectors of any two graphs, one of which posed, see, e.g., [Vishwanathan et al., 2010]. Most graph kernels has the property and the other is far away from doing so. We decompose graphs and add up the pairwise similarities between study well-known graph kernels and their ability to identify and their substructures following the seminal concept of convolution distinguish fundamental properties such as connectivity. kernels [Haussler, 1999]. Here, substructures may be walks [Gart-¨ The significance of our framework is demonstrated by ad- ner et al., 2003] or certain subgraphs [Ramon and Gartner,¨ 2003; dressing several current research questions. In the graph kernels Shervashidze et al., 2009]. Considering the large number of literature it has been argued that many kernels take either local available graph kernels and the wealth of available benchmark or global graph properties into account, but not both [Kondor data sets [Kersting et al., 2016], it becomes increasingly difficult and Pan, 2016; Morris et al., 2017]. Recent property testing to perform a fair experimental comparison of kernels and to results, however, suggest that under mild assumptions local graph assess their advantages and disadvantages for specific data sets. features are sufficient to derive global properties [Newman and Indeed, current experimental comparisons cannot give a complete Sohler, 2013]. We consider a graph kernel based on local k-discs picture and are of limited help to a practitioner who has to choose which can, in contrast to previous kernels, distinguish global a kernel for a particular application. properties such as planarity in bounded-degree graphs. For a con- Graph kernels are developed with the (possibly conflicting) stant dimensional feature space, we obtain learning guarantees goals of being efficiently computable and capturing the topologi- for kernels that distinguish the class label property. 2348 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) 1.1 Related Work 2 Preliminaries We summarize related work on graph kernels, graph isomorphism, An (undirected) graph G is a pair (V; E) with a finite set of and property testing. vertices V and a set of edges E ⊆ ffu; vg ⊆ V j u 6= vg. We et al. [ ] et al. [ ] Gartner¨ 2003 and Kashima 2003 simultane- denote the set of vertices and the set of edges of G by V (G) ously proposed graph kernels based on random walks, which and E(G), respectively. A walk in a graph G is a sequence of count the number of walks two graphs have in common. Since vertices such that for each pair of consecutive vertices there then, random walk kernels have been studied intensively, see, exists an edge in E(G).A path is a walk that contains each [ et al. e.g., Sugiyama and Borgwardt, 2015; Vishwanathan , 2010; vertex at most once; a cycle is a walk that ends in the starting et al. ] [ ] Kriege , 2014 . Borgwardt and Kriegel 2005 have intro- vertex. Moreover, N(v) denotes the neighborhood of v in V (G), [ ] duced kernels based on shortest paths; Costa and De Grave 2010 i.e., N(v) = fu 2 V (G) j fu; vg 2 E(G)g. The k-disc of a based on neighborhood subgraphs. Recently, graph kernels using vertex v in V (G) is the subgraph induced by all vertices u such [ et al. ] [ matchings Kriege , 2016 and geometric embeddings Jo- that there exists a path of length at most k between u and v. We ] hansson and Dubhashi, 2015 have been proposed. Furthermore, say that two graphs G and H are isomorphic if there exists an [ ] spectral approaches were explored Kondor and Pan, 2016 . edge preserving bijection ' : V (G) ! V (H), i.e., fu; vg in A different line in the development of graph kernels focused E(G) if and only if ('(u);'(v)) in E(H). The equivalence [ et al. on scalable graph kernels, see, e.g., Shervashidze , 2011; classes of the isomorphism relation are called isomorphism types. et al. ] Morris , 2016; Hido and Kashima, 2009 . We denote the set of graphs on n vertices by G . There are few works which investigate graph kernels from a n Let χ be a non-empty set and let κ: χ × χ ! R be a function. theoretical viewpoint. Gartner¨ et al. [2003] introduced the con- Then, κ is a kernel on χ if there is a Hilbert space H and a cept of a complete graph kernel as a graph kernel with an injective κ mapping φ: χ !H such that κ(x; y) = hφ(x); φ(y)i for all feature map. The concept of completeness is too strict for the com- κ x and y in χ, where h·; ·i denotes the inner product of H . We parison of graph kernels and none of the numerous graph kernels κ call φ a feature map, and H a feature space of the kernel κ. proposed for practical applications actually is complete. Two mea- κ Let κ^ be the cosine normalized version of a kernel κ and denote sures of expressivity of kernels from statistical learning theory its normalized feature map by φ^, i.e., where proposed and applied to graph kernels [Oneto et al., 2017] . However, these measures are not specific to graph structured D E φ(x) φ(y) data and cannot be interpreted in terms of distinguishable graph κ^(x; y) = φ^(x); φ^(y) = ; kφ(x)k kφ(y)k properties. The ability of the Weisfeiler–Lehman test to recognize 2 2 (1) non-isomorphic graphs has been studied extensively and the class κ(x; y) = 2 [−1; 1]: of identifiable graphs characterized recently [Kiefer et al., 2015; pκ(x; x) · κ(y; y) Arvind et al., 2015]. Goldreich et al. [1998] formally established the study of The normalized kernel κ^(x; y) is equal to the cosine of the angle property testing, where a central aim is to decide with high between φ(x) and φ(y) in the feature space. Let G be the set of probability in sublinear time whether a property is satisfied all graphs, then a kernel κ: G × G ! R is called graph kernel. or whether it is far from doing so. Goldreich and Ron [2002] initiated a growing line of research of property testers in the 2.1 Definitions from Property Testing bounded degree graph model.

A Property Testing Framework for the Theoretical Expressivity of Graph Kernels

On the Tree–Depth of Random Graphs Arxiv:1104.2132V2 [Math.CO] 15 Feb 2012

Testing Isomorphism in the Bounded-Degree Graph Model (Preliminary Version)

Practical Verification of MSO Properties of Graphs of Bounded Clique-Width

Harary Polynomials

Every Monotone Graph Property Is Testable∗

Graph Properties and Hypergraph Colourings

Isomorphism and Embedding Problems for Infinite Limits of Scale

Distributed Testing of Graph Isomorphism in the CONGEST Model

Tree-Decomposition Graph Minor Theory and Algorithmic Implications

Testing Graph Isomorphism ∗

Elimination Graphs⋆

Spectral Graph Theory