
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19)

Graph Convolutional Networks using Heat Kernel for Semi-supervised Learning

Bingbing Xu, Huawei Shen, Qi Cao, Keting Cen and Xueqi Cheng
CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
{xubingbing, shenhuawei, caoqi, cenketing, cxq}@ict.ac.cn

Abstract

Graph convolutional networks gain remarkable success in semi-supervised learning on graph-structured data. The key to graph-based semi-supervised learning is capturing the smoothness of labels or features over nodes exerted by graph structure. Previous methods, both spectral and spatial, are devoted to defining graph convolution as a weighted average over neighboring nodes, and then learning graph convolution kernels that leverage this smoothness to improve the performance of graph-based semi-supervised learning. One open challenge is how to determine an appropriate neighborhood that reflects the relevant information of smoothness manifested in graph structure. In this paper, we propose GraphHeat, leveraging heat kernel to enhance low-frequency filters and enforce smoothness in the signal variation on the graph. GraphHeat leverages the local structure of the target node under heat diffusion to determine its neighboring nodes flexibly, without the constraint of order suffered by previous methods. GraphHeat achieves state-of-the-art results in the task of graph-based semi-supervised classification across three benchmark datasets: Cora, Citeseer and Pubmed.

1 Introduction

Convolutional neural networks (CNNs) [LeCun et al., 1998] have been successfully used in various machine learning problems, such as image classification [He et al., 2016] and speech recognition [Hinton et al., 2012], where there is an underlying Euclidean structure. However, in many research areas, data are naturally located in a non-Euclidean space, with graph or network being one typical case. The success of CNNs motivates researchers to design convolutional neural networks on graphs. Existing methods fall into two categories, spatial methods and spectral methods, according to the way that convolution is defined.

Spatial methods define convolution directly in the vertex domain, following the practice of the conventional CNN. For each node, convolution is defined as a weighted average function over its neighboring nodes, with the weighting function characterizing the influence exerted on the target node by its neighboring nodes. GraphSAGE [Hamilton et al., 2017] defines the weighting function as various aggregators over neighboring nodes. Graph attention network (GAT) proposes learning the weighting function via a self-attention mechanism [Velickovic et al., 2017]. MoNet [Monti et al., 2017] offers a general framework for designing spatial methods, taking graph convolution as a weighted average of multiple weighting functions defined over neighboring nodes. One open challenge for spatial methods is how to determine an appropriate neighborhood for the target node when defining graph convolution.

Spectral methods define graph convolution via the convolution theorem. As the pioneering work of spectral methods, Spectral CNN [Bruna et al., 2014] leverages the graph Fourier transform to convert signals defined in the vertex domain into the spectral domain, and defines the convolution kernel as a set of learnable coefficients associated with the Fourier bases, i.e., the eigenvectors of the graph Laplacian. Given that the magnitude of an eigenvalue reflects the smoothness over the graph of the associated eigenvector, only the eigenvectors associated with smaller eigenvalues are used. Unfortunately, this method relies on the eigendecomposition of the Laplacian matrix, resulting in high computational complexity. ChebyNet [Defferrard et al., 2016] introduces a polynomial parametrization of the convolution kernel, i.e., the convolution kernel is taken as a polynomial function of the diagonal matrix of eigenvalues. Subsequently, Kipf and Welling [Kipf and Welling, 2017] proposed the graph convolutional network (GCN) via a localized first-order approximation to ChebyNet. However, both ChebyNet and GCN fail to filter out high-frequency noise carried by eigenvectors associated with high eigenvalues. GWNN [Xu et al., 2019] leverages graph wavelets to implement localized convolution. In sum, to the best of our knowledge, previous works lack an effective graph convolution method to capture the smoothness manifested in the structure of the network.

In this paper, we propose a graph convolutional network with heat kernel, namely GraphHeat, for graph-based semi-supervised learning. Different from existing spectral methods, GraphHeat uses heat kernel to assign larger importance to low-frequency filters, explicitly discounting the effect of high-frequency variation of signals on the graph. In this way, GraphHeat performs well at capturing the smoothness of labels or features over nodes exerted by graph structure. From the perspective of spatial methods, GraphHeat leverages the process of heat diffusion to determine neighboring nodes that reflect the local structure of the target node and the relevant information of smoothness manifested in graph structure. Experiments show that GraphHeat achieves state-of-the-art results in the task of graph-based semi-supervised classification across benchmark datasets: Cora, Citeseer and Pubmed.

2 Preliminary

2.1 Graph Definition

G = {V, E, A} denotes an undirected graph, where V is the set of nodes with |V| = n, E is the set of edges, and A is the adjacency matrix with A_{i,j} = A_{j,i} defining the connection between node i and node j. The graph Laplacian matrix is defined as L = D − A, where D is the diagonal degree matrix with D_{i,i} = \sum_j A_{i,j}, and the normalized Laplacian matrix is L = I_n − D^{−1/2} A D^{−1/2}, where I_n is the identity matrix. Since L is a real symmetric matrix, it has a complete set of orthonormal eigenvectors U = (u_1, u_2, ..., u_n), known as Laplacian eigenvectors. These eigenvectors have associated real, non-negative eigenvalues {λ_l}_{l=1}^n, identified as the frequencies of the graph. Without loss of generality, we have λ_1 ≤ λ_2 ≤ ··· ≤ λ_n. Using the eigenvectors and eigenvalues of L, we have L = U Λ U^⊤, where Λ = diag({λ_l}_{l=1}^n).

2.2 Graph Fourier Transform

Taking the eigenvectors of the normalized Laplacian matrix as a set of bases, the graph Fourier transform of a signal x ∈ R^n on graph G is defined as x̂ = U^⊤ x, and the inverse graph Fourier transform is x = U x̂ [Shuman et al., 2013]. According to the convolution theorem, the graph Fourier transform offers a way to define the graph convolution operator, denoted as ∗_G. With f denoting the convolution kernel in the spatial domain, ∗_G is defined as

x ∗_G f = U ((U^⊤ f) ⊙ (U^⊤ x)),    (1)

where ⊙ is the element-wise Hadamard product.

2.3 Graph Convolutional Networks

With the vector U^⊤ f replaced by a diagonal matrix g_θ, the Hadamard product can be written in the form of matrix multiplication. Filtering the signal x by the convolution kernel g_θ, Spectral CNN [Bruna et al., 2014] is obtained as

y = U g_θ U^⊤ x = (θ_1 u_1 u_1^⊤ + θ_2 u_2 u_2^⊤ + ··· + θ_n u_n u_n^⊤) x,    (2)

where g_θ = diag({θ_i}_{i=1}^n) is defined in the spectral domain. This non-parametric convolution kernel is not localized in space and its parameter complexity scales up to O(n). To combat these issues, ChebyNet [Defferrard et al., 2016] parameterizes g_θ with a polynomial expansion

g_θ = \sum_{k=0}^{K−1} α_k Λ^k,    (3)

where K is a hyper-parameter and the parameters α_k are the polynomial coefficients. The graph convolution operation in ChebyNet is then defined as

y = U g_θ U^⊤ x = U (\sum_{k=0}^{K−1} α_k Λ^k) U^⊤ x
  = {α_0 I + α_1 (λ_1 u_1 u_1^⊤ + ··· + λ_n u_n u_n^⊤) + ··· + α_{K−1} (λ_1^{K−1} u_1 u_1^⊤ + ··· + λ_n^{K−1} u_n u_n^⊤)} x    (4)
  = (α_0 I + α_1 L + α_2 L^2 + ··· + α_{K−1} L^{K−1}) x.

GCN [Kipf and Welling, 2017] simplifies ChebyNet by only considering the first-order polynomial approximation, i.e., K = 2, and setting α = α_0 = −α_1. Graph convolution is then defined as

y = U g_θ U^⊤ x = U (\sum_{k=0}^{1} α_k Λ^k) U^⊤ x
  = {α I − α (λ_1 u_1 u_1^⊤ + ··· + λ_n u_n u_n^⊤)} x    (5)
  = α (I − L) x.

Note that the above three methods all take {u_i u_i^⊤}_{i=1}^n as a set of basic filters. Spectral CNN directly learns the coefficient of each filter. ChebyNet and GCN parameterize the convolution kernel g_θ to obtain combined filters, e.g., L, accelerating the computation and reducing parameter complexity.
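To make Eqs. (1)-(5) concrete, the following NumPy sketch (our illustration, not the authors' released code) contrasts the Spectral CNN filter with the polynomial filters of ChebyNet and GCN; helper names such as normalized_laplacian and chebynet_filter are our own assumptions.

```python
# Spectral filtering sketches for Eqs. (2)-(5): Spectral CNN filters in the
# Fourier domain, while ChebyNet and GCN reduce to polynomials of the
# normalized Laplacian L and need no eigendecomposition.
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectral_cnn_filter(L, x, theta):
    """Eq. (2): y = U diag(theta) U^T x, one learnable coefficient per eigenvector."""
    lam, U = np.linalg.eigh(L)          # lam: graph frequencies, U: Fourier basis
    return U @ (theta * (U.T @ x))      # filter the graph Fourier coefficients of x

def chebynet_filter(L, x, alpha):
    """Eq. (4): y = sum_k alpha_k L^k x, a polynomial of L."""
    y, Lk_x = np.zeros_like(x), x.copy()
    for a_k in alpha:                   # alpha = (alpha_0, ..., alpha_{K-1})
        y += a_k * Lk_x
        Lk_x = L @ Lk_x                 # next power of L applied to x
    return y

def gcn_filter(L, x, alpha):
    """Eq. (5): the first-order special case y = alpha (I - L) x."""
    return alpha * (x - L @ x)
```

Note that chebynet_filter and gcn_filter avoid the eigendecomposition entirely, which is exactly the computational advantage of the parameterized kernels noted above.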


3 GraphHeat for Semi-supervised Learning

Graph smoothness is the prior information over a graph that connected nodes tend to have the same label or similar features. Graph-based semi-supervised learning, e.g., node classification, gains success by leveraging the smoothness of labels or features over nodes exerted by graph structure. Graph convolutional neural networks offer a promising and flexible framework for graph-based semi-supervised learning. In this section, we analyze the weakness of previous graph convolutional neural networks at capturing the smoothness manifested in graph structure, and propose GraphHeat to circumvent this problem for graph-based semi-supervised learning.

3.1 Motivation

Given a signal x defined on a graph, its smoothness with respect to the graph is measured by

x^⊤ L x = \sum_{(a,b)∈E} A_{a,b} ( x(a)/\sqrt{d_a} − x(b)/\sqrt{d_b} )^2,    (6)

where a, b ∈ V, x(a) represents the value of signal x on node a, and d_a is the degree of node a. The smoothness of signal x characterizes how likely it is to have similar values, normalized by degree, on connected nodes.

For an eigenvector u_i of the normalized Laplacian matrix L, its associated eigenvalue λ_i captures the smoothness of u_i [Shuman et al., 2013] as

λ_i = u_i^⊤ L u_i.    (7)

According to Eq. (7), eigenvectors associated with small eigenvalues are smooth with respect to graph structure, i.e., they carry low-frequency variation of signals on the graph. In contrast, eigenvectors associated with large eigenvalues are high-frequency, i.e., less smooth, signals.

Note that the eigenvectors U = (u_1, u_2, ··· , u_n) are orthonormal, i.e.,

u_i^⊤ u_i = 1 and u_i^⊤ u_j = 0, j ≠ i.    (8)

Given a signal x defined on the graph, we have

x = α_1 u_1 + α_2 u_2 + ··· + α_n u_n,    (9)

where {α_i}_{i=1}^n are the coefficients of the eigenvectors. Each eigenvector u_i offers a basic filter, i.e., u_i u_i^⊤. Only the component α_i u_i of x can pass this filter:

u_i u_i^⊤ x = α_1 u_i u_i^⊤ u_1 + ··· + α_n u_i u_i^⊤ u_n = α_i u_i.    (10)

Therefore, {u_i u_i^⊤}_{i=1}^n form a set of basic filters, and the eigenvalue associated with u_i represents the frequency of the signal component that can pass the filter u_i u_i^⊤, i.e., α_i u_i.

We now use these basic filters to analyze the weakness of previous methods at capturing the smoothness manifested in graph structure. Spectral CNN, Eq. (2), directly learns the coefficient of each basic filter to implement graph convolution. ChebyNet, Eq. (4), defines graph convolution based on a set of combined filters, i.e., {L^k}_{k=0}^{K−1}, which are obtained by assigning higher weights to high-frequency basic filters. For example, for a combined filter L^k, the weight assigned to u_i u_i^⊤ is λ_i^k, which increases with respect to λ_i. GCN only uses the first-order approximation, i.e., L, for semi-supervised learning. In sum, these three methods do not suppress high-frequency signals by assigning lower weights to high-frequency basic filters, and thus fail to capture, or do not capture well, the smoothness manifested in graph structure.
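The claim in Eqs. (9)-(10) that each basic filter u_i u_i^⊤ passes only one frequency component can be checked numerically; the snippet below is a small self-contained verification on a toy graph of our own construction (not from the paper), with illustrative variable names.

```python
# Decompose a signal x in the Laplacian eigenbasis and verify that the basic
# filter u_i u_i^T passes only the component alpha_i u_i (Eqs. (9)-(10)).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)       # a toy undirected graph
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))     # normalized Laplacian
lam, U = np.linalg.eigh(L)                      # frequencies and eigenvectors

x = rng.standard_normal(4)                      # a signal on the 4 nodes
alpha = U.T @ x                                 # Eq. (9): x = sum_i alpha_i u_i

i = 2                                           # pick one basic filter u_i u_i^T
filtered = np.outer(U[:, i], U[:, i]) @ x
assert np.allclose(filtered, alpha[i] * U[:, i])  # Eq. (10): only alpha_i u_i passes
```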

3.2 GraphHeat: Graph Convolution using Heat Kernel

We propose GraphHeat to capture the smoothness of labels or features over nodes exerted by graph structure. GraphHeat comprises a set of combined filters which discount high-frequency basic filters via heat kernel (see [Chung and Graham, 1997], Chapter 10). The heat kernel is defined as

f(λ_i) = e^{−s λ_i},    (11)

where s ≥ 0 is a scaling hyper-parameter. For clarity, we denote Λ_s = diag({e^{−s λ_i}}_{i=1}^n). Using the heat kernel, we define the convolution kernel as

g_θ = \sum_{k=0}^{K−1} θ_k (Λ_s)^k,    (12)

where θ_k are the parameters. For a signal x, graph convolution is achieved by

y = U g_θ U^⊤ x = U (\sum_{k=0}^{K−1} θ_k (Λ_s)^k) U^⊤ x
  = {θ_0 I + θ_1 (e^{−s λ_1} u_1 u_1^⊤ + ··· + e^{−s λ_n} u_n u_n^⊤) + ··· + θ_{K−1} (e^{−(K−1) s λ_1} u_1 u_1^⊤ + ··· + e^{−(K−1) s λ_n} u_n u_n^⊤)} x    (13)
  = (θ_0 I + θ_1 e^{−sL} + θ_2 e^{−2sL} + ··· + θ_{K−1} e^{−(K−1)sL}) x.

The key insight of GraphHeat is that it achieves a smooth graph convolution by discounting high-frequency basic filters. The weight assigned to the basic filter u_i u_i^⊤ is e^{−k s λ_i}, which decreases with respect to λ_i. This distinguishes GraphHeat from existing methods that promote high-frequency basic filters.

To reduce parameter complexity for semi-supervised learning, we only retain the first two terms in Eq. (13):

y = (θ_0 I + θ_1 e^{−sL}) x.    (14)

The computation of e^{−sL} is completed via Chebyshev polynomials without eigendecomposition of L [Hammond et al., 2011]. The computational complexity is O(m × |E|), where |E| is the number of edges and m is the order of the Chebyshev polynomials. Such linear complexity makes GraphHeat applicable to large-scale networks.

Figure 1: The connection and difference between GraphHeat and previous graph convolution methods. GraphHeat captures the smoothness over graph by suppressing high-frequency signals, acting like a low-pass filter. In contrast, previous methods can be viewed as high-pass filters or uniform filters.

Figure 1 illustrates the connection and difference between our GraphHeat and previous graph convolution methods. GraphHeat captures the smoothness over the graph by suppressing high-frequency signals, acting like a low-pass filter. In contrast, previous methods can be viewed as high-pass filters or uniform filters, lacking the desirable capability to filter out high-frequency variations over the graph. Therefore, GraphHeat is expected to perform better on graph-based semi-supervised learning.

3.3 GraphHeat: Defining Neighboring Nodes under Heat Diffusion

In the previous subsection, GraphHeat is formulated as a kind of spectral method for graph convolution, and its benefit is highlighted through comparing it with existing spectral methods from the perspective of frequency and graph signal processing. We now offer an understanding of GraphHeat by taking it as a kind of spatial method.

Spatial methods define graph convolution as a weighted average over the neighboring nodes of the target node. A general framework for spatial methods is to define graph convolution as a weighted average of a set of weighting functions, with each weighting function characterizing a certain influence exerted on the target node by its neighboring nodes [Monti et al., 2017]. For GraphHeat, the weighting functions are defined via heat kernels, i.e., e^{−ksL}, and the graph convolution kernel is the set of coefficients θ_k.
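A minimal sketch, under our own assumptions, of the simplified GraphHeat filter in Eq. (14) and of the heat-diffusion neighborhood elaborated in the rest of this subsection: a dense matrix exponential is used here only for clarity, whereas the paper approximates e^{−sL} with Chebyshev polynomials [Hammond et al., 2011]; all function names (graphheat_filter, heat_kernel_matrix, heat_neighbors) are illustrative, not the authors' implementation.

```python
# GraphHeat filter, Eq. (14): y = (theta_0 I + theta_1 e^{-sL}) x,
# plus the heat-diffusion neighborhood obtained by thresholding e^{-sL}.
import numpy as np
from scipy.linalg import expm

def heat_kernel_matrix(L, s, eps=0.0):
    """e^{-sL}; entries below the threshold eps are zeroed (cf. Sec. 4.3)."""
    H = expm(-s * L)
    if eps > 0.0:
        H[H < eps] = 0.0
    return H

def graphheat_filter(L, x, theta0, theta1, s, eps=1e-4):
    """Eq. (14): y = (theta_0 I + theta_1 e^{-sL}) x."""
    H = heat_kernel_matrix(L, s, eps)
    return theta0 * x + theta1 * (H @ x)

def heat_neighbors(L, target, s, eps):
    """Nodes whose heat-diffusion similarity to `target` exceeds eps (Sec. 3.3)."""
    H = heat_kernel_matrix(L, s)
    return np.flatnonzero(H[target] > eps)
```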


In GraphHeat, each e^{−ksL} actually corresponds to a similarity metric among nodes under heat diffusion. The similarity between node i and node j characterizes the amount of energy received by node j when a unit of heat flux or energy is offered to node i, and vice versa. Indeed, the scaling parameter s acts like the length of time during which diffusion proceeds, and k represents different energy levels.

With this heat-diffusion similarity, GraphHeat offers a way to define neighboring nodes for a target node. Specifically, for a target node, nodes with similarity higher than a threshold ε are regarded as neighboring nodes. Such a way of defining neighboring nodes is fundamentally different from previous graph convolution methods, which usually adopt an order-style definition, i.e., neighboring nodes are within K hops of the target node.

Figure 2: Neighboring nodes defined by GraphHeat and previous methods that are based on shortest-path distance. The target node is colored in red, and the 1-st, 2-nd, 3-rd order neighboring nodes are colored in green, purple, and yellow, respectively. The small circle marks neighboring nodes of GraphHeat with a small s. The big circle includes neighboring nodes when s is larger.

Figure 2 illustrates the difference between the two ways of defining neighboring nodes. Previous methods define neighboring nodes according to the shortest-path distance, K, away from the target node (the red node in Figure 2). When K = 1, only green nodes are included, and this may lead to ignoring some relevant nodes. When K = 2, all purple neighbors are included. This may bring noise to the target node, since connections with high-degree nodes may represent the popularity of those high-degree nodes instead of correlation. Compared with constraining neighboring nodes via the shortest-path distance, our method has the following benefits based on heat diffusion: (1) GraphHeat defines neighboring nodes in a continuous manner, via tuning the scaling parameter s; (2) it is flexible to leverage high-order neighbors while discarding some irrelevant low-order neighbors; (3) the range of neighboring nodes varies across target nodes, which is demonstrated in Section 4.6; (4) parameter complexity does not increase with the order of neighboring nodes, since e^{−sL} can include the relation of all neighboring nodes in one matrix.

3.4 Architecture

Semi-supervised node classification assigns labels to unlabeled nodes according to the feature matrix of nodes and the graph structure, supervised by the labels of a small set of nodes. In this paper, we consider a two-layer GraphHeat for semi-supervised node classification. The architecture of our model is

first layer:  X_j^2 = ReLU( \sum_{i=1}^{p} (θ_{0,i,j}^1 I + θ_{1,i,j}^1 e^{−sL}) X_i^1 ),  j = 1, ··· , q,

second layer: Z_j = softmax( \sum_{i=1}^{q} (θ_{0,i,j}^2 I + θ_{1,i,j}^2 e^{−sL}) X_i^2 ),  j = 1, ··· , c,    (15)

where p is the number of input features, q is the number of output features in the first layer, X_i^m is the i-th feature of the m-th layer, θ_{0,i,j}^m and θ_{1,i,j}^m are parameters of the m-th layer, c is the number of classes in node classification, and Z_j is an n-dimensional vector representing the prediction result for all nodes in class j. The loss function is the cross-entropy error over all labeled nodes:

Loss = − \sum_{i ∈ y_L} \sum_{j=1}^{c} Y_{ij} ln Z_j(i),    (16)

where y_L is the set of labeled nodes, Y_{ij} = 1 if the label of node i is j, and Y_{ij} = 0 otherwise. The parameters θ are trained using gradient descent. Figure 3 shows the architecture of our model; a sketch of this two-layer forward pass follows the figure caption below.

Figure 3: Two-layer GraphHeat for semi-supervised node classification. (a) Select neighboring nodes (solid nodes) for target node a via e^{−sL}. (b-c) The solid line represents an edge in the graph; the dotted line represents the weights between selected nodes and target node a. The representation of a is updated through a weighted average of the solid nodes and itself. Node color reflects the update of node representation. (d) Label prediction using the updated representation in the last layer.
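The following NumPy sketch gives one possible reading of the two-layer model in Eqs. (15)-(16); it is not the authors' implementation, and it collapses the per-feature parameters θ_{0,i,j}, θ_{1,i,j} into weight matrices W0, W1 for compactness. Training by gradient descent is omitted, and all names are illustrative.

```python
# Two-layer GraphHeat forward pass and labeled-node cross-entropy (Eqs. (15)-(16)).
# Each layer computes X W0 + e^{-sL} X W1, i.e. (theta_0 I + theta_1 e^{-sL})
# aggregation applied feature-wise.
import numpy as np
from scipy.linalg import expm

def relu(X):
    return np.maximum(X, 0.0)

def softmax(X):
    E = np.exp(X - X.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def graphheat_layer(H_sL, X, W0, W1):
    """One GraphHeat layer: identity term plus heat-kernel aggregation."""
    return X @ W0 + H_sL @ (X @ W1)

def two_layer_graphheat(L, X, weights, s=3.5):
    """Forward pass; weights = (W0_1, W1_1, W0_2, W1_2), X is the n x p feature matrix."""
    W0_1, W1_1, W0_2, W1_2 = weights
    H_sL = expm(-s * L)                                  # heat kernel e^{-sL}
    H1 = relu(graphheat_layer(H_sL, X, W0_1, W1_1))      # first layer, q hidden units
    Z = softmax(graphheat_layer(H_sL, H1, W0_2, W1_2))   # class probabilities per node
    return Z

def cross_entropy(Z, labels, labeled_idx):
    """Eq. (16): cross-entropy over the labeled nodes only."""
    return -np.log(Z[labeled_idx, labels[labeled_idx]] + 1e-12).sum()
```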

4 Experiments

We evaluate the effectiveness of GraphHeat on three benchmarks. A detailed analysis of the influence of the hyper-parameters is also conducted. Lastly, we show a case study to intuitively demonstrate the strengths of our method.

4.1 Datasets

To evaluate the proposed method, we conduct experiments on three benchmark datasets, namely Cora, Citeseer and Pubmed [Sen et al., 2008]. In these citation network datasets, nodes represent documents and edges are citation links. Table 1 shows an overview of the three datasets. Label rate denotes the proportion of labeled nodes used for training.

Datasets   Nodes    Edges    Classes  Features  Label Rate
Cora       2,708    5,429    7        1,433     0.052
Citeseer   3,327    4,732    6        3,703     0.036
Pubmed     19,717   44,338   3        500       0.003

Table 1: The statistics of datasets

4.2 Baselines

We compare with traditional graph-based semi-supervised learning methods, including MLP (which only leverages node features for classification), manifold regularization (ManiReg) [Belkin et al., 2006], semi-supervised embedding (SemiEmb) [Weston et al., 2012], label propagation (LP) [Zhu et al., 2003], graph embeddings (DeepWalk) [Perozzi et al., 2014], the iterative classification algorithm (ICA) [Lu and Getoor, 2003] and Planetoid [Yang et al., 2016]. Furthermore, since graph convolutional networks have proved effective in semi-supervised learning, we also compare against the state-of-the-art spectral graph convolutional networks, i.e., ChebyNet [Defferrard et al., 2016] and GCN [Kipf and Welling, 2017], and the state-of-the-art spatial convolutional networks, i.e., MoNet [Monti et al., 2017] and GAT [Velickovic et al., 2017].

4.3 Experimental Settings

We train a two-layer GraphHeat with 16 hidden units, and prediction accuracy is evaluated on a test set of 1,000 labeled nodes. The partition of the datasets is the same as in GCN [Kipf and Welling, 2017], with an additional validation set of 500 labeled samples to determine hyper-parameters. Weights are initialized following [Glorot and Bengio, 2010]. We adopt the Adam optimizer [Kingma and Ba, 2014] for parameter optimization with an initial learning rate lr = 0.01. If an entry of e^{−sL} is smaller than a given threshold ε, we set it to zero to accelerate the computation and avoid noise. The optimal hyper-parameters, e.g., the scaling parameter s and the threshold ε, are chosen on the validation set: for Cora, s = 3.5 and ε = 1e−4; for Citeseer, s = 4.5 and ε = 1e−5; for Pubmed, s = 3.0 and ε = 1e−5. To avoid overfitting, dropout [Srivastava et al., 2014] is applied with a rate of 0.5. The training process is terminated if the validation loss does not decrease for 200 consecutive epochs.
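For reference, the hyper-parameter settings reported above can be collected into a single configuration; the dictionary layout and key names below are our own illustrative choice, not part of any released code.

```python
# Hyper-parameter settings reported in Sec. 4.3, gathered in one place.
GRAPHHEAT_SETTINGS = {
    "hidden_units": 16,
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "dropout": 0.5,
    "early_stopping_patience": 200,   # epochs without validation-loss decrease
    "per_dataset": {
        "Cora":     {"s": 3.5, "eps": 1e-4},
        "Citeseer": {"s": 4.5, "eps": 1e-5},
        "Pubmed":   {"s": 3.0, "eps": 1e-5},
    },
}
```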


4.4 Performance on Node Classification Task

We now validate the effectiveness of GraphHeat on node classification. As in previous work, we use classification accuracy for quantitative evaluation. Experimental results are reported in Table 2.

Method      Cora        Citeseer    Pubmed
MLP         55.1%       46.5%       71.4%
ManiReg     59.5%       60.1%       70.7%
SemiEmb     59.0%       59.6%       71.7%
LP          68.0%       45.3%       63.0%
DeepWalk    67.2%       43.2%       65.3%
ICA         75.1%       69.1%       73.9%
Planetoid   75.7%       64.7%       77.2%
ChebyNet    81.2%       69.8%       74.4%
GCN         81.5%       70.3%       79.0%
MoNet       81.7±0.5%   —           78.8±0.3%
GAT         83.0±0.7%   72.5±0.7%   79.0±0.3%
GraphHeat   83.7%       72.5%       80.5%

Table 2: Results of node classification

Graph convolutional networks, including spatial methods and spectral methods, all perform much better than the earlier methods. This is because graph convolutional networks are trained in an end-to-end manner and update representations via graph structure under the guidance of labels.

As for our GraphHeat model, it outperforms all baseline methods, achieving state-of-the-art results on all three datasets. Note that the key to graph-based semi-supervised learning is to capture the smoothness of labels or features over nodes exerted by graph structure. Different from all previous graph convolutional networks, our GraphHeat achieves such a smooth graph convolution via discounting high-frequency basic filters. In addition, we define neighboring nodes via heat diffusion and modulate the parameter s to suit diverse networks. On Cora and Pubmed, especially Pubmed which has the smallest label rate, GraphHeat achieves its superiority significantly. On Citeseer, which is sparser than the other two datasets, our method matches GAT. This may be because GAT learns the weights based on the representations of nodes in the hidden layer, while our model uses the Laplacian matrix, which only depends on the sparser graph.

4.5 Influence of Hyper-parameters s and ε

GraphHeat uses e^{−sL} as the combined filter, where s is a modulating parameter. As s becomes larger, the range of feature diffusion becomes larger. Additionally, e^{−sL} is truncated via ε. This setting can speed up computation and remove noise.

Figure 4: The influence of hyper-parameters s and ε on classification accuracy.

Figure 4 demonstrates the influence of the hyper-parameters s and ε on the classification accuracy on Cora. As s becomes larger, the range of neighboring nodes becomes larger, capturing more information. Figure 4 shows that the classification accuracy indeed increases with s at first. However, as s continues to increase, some irrelevant nodes are leveraged to update the target node's features, which violates the smoothness of the graph and leads to a drop in accuracy. The classification accuracy also decreases considerably as ε increases. Using a small ε as the threshold can speed up computation and remove noise. However, as ε becomes large, the graph structure is overlooked and some relevant nodes are discarded, so the classification accuracy decreases.

4.6 Case Study

We conduct a case study to illustrate the strengths of our method over the competing method, i.e., GCN. Specifically, we select one node in Cora from the set of nodes that are correctly classified by our method and incorrectly classified by GCN. Figure 5 depicts the two-order ego network of the target node, marked as node a.

Figure 5: Ego network of node a, including first-order and second-order neighbors. The value of each node represents its label. The color of each node represents the strength of the signal diffused from node a following heat diffusion. The deeper the color, the larger the value.

GCN leverages the information of the first-order neighbors of node a to assign a label to node a. Checking the labels of these neighboring nodes, we find that this is a challenging task for GCN, since two neighboring nodes have label 0 and the other two have label 3.

Actually, smoothness does not depend on order solely. GraphHeat leverages heat kernel to capture smoothness and determine neighboring nodes. When applying our GraphHeat to this case, the neighboring nodes are obtained according to heat diffusion. As shown in Figure 5, some second-order neighbors, included in the black circle, are used to update the target node. Moreover, the weight of a second-order neighbor can be greater than that of a first-order neighbor, e.g., [e^{−sL}]_{a,c} > [e^{−sL}]_{a,b}. These second-order neighbors can enhance our method. At a glimpse of these nodes, we can easily see why GraphHeat correctly assigns label 3 to node a.

In order to capture smoothness, our method enhances the low-frequency filters and discounts the high-frequency filters. Instead of depending on order solely, our method exploits the local structure of the target node via heat diffusion. A densely connected local structure can represent strong correlation among nodes. In contrast, although some high-degree nodes are low-order neighbors of the target node, their connections to the target node may represent popularity rather than correlation, which suggests weak correlation.

Finally, we use the Cora dataset to illustrate that the range of neighboring nodes in GraphHeat varies across target nodes. Given a scaling parameter s, for some nodes only a part of their immediate neighbors are included as neighboring nodes, while other nodes can reach a large range of neighboring nodes. To offer an intuitive understanding of this varying range, for a target node a we compare the range of its neighboring nodes with the potential maximum range, defined as

max_{b ∈ V} d_G(a, b),    (17)

where d_G(a, b) is the shortest-path distance between node a and node b. Table 3 shows the results over some randomly selected target nodes, confirming the flexibility of GraphHeat at defining various ranges of neighboring nodes. This flexibility is desired in an array of applications [Xu et al., 2018].

Index of target node   12  75  26  1284  1351  1385  1666
Neighboring range      1   2   3   4     5     6     7
Maximum range          1   2   3   13    13    11    11

Table 3: Varying range of neighboring nodes in GraphHeat

5 Conclusion

The key to graph-based semi-supervised learning is capturing smoothness over the graph; however, eigenvectors associated with higher eigenvalues are unsmooth. Our method applies the heat kernel to the eigenvalues, discounting high-frequency filters and assigning larger importance to low-frequency filters, thereby enforcing smoothness on the graph. Moreover, we simplify our model to adapt it to semi-supervised learning. Instead of restricting the range of neighboring nodes to depend on order solely, our method uses heat diffusion to determine neighboring nodes. The superiority of GraphHeat is shown on three benchmarks, where it consistently outperforms previous methods.

Acknowledgements

This work is funded by the National Natural Science Foundation of China under grant numbers 61433014, 61425016 and 91746301. This work is also partly funded by Beijing NSF (No. 4172059). Huawei Shen is also funded by K.C. Wong Education Foundation.


References

[Belkin et al., 2006] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(Nov):2399–2434, 2006.

[Bruna et al., 2014] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR 2014), 2014.

[Chung and Graham, 1997] Fan R. K. Chung and Fan Chung Graham. Spectral Graph Theory. Number 92. American Mathematical Society, 1997.

[Defferrard et al., 2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.

[Glorot and Bengio, 2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.

[Hamilton et al., 2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.

[Hammond et al., 2011] David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150, 2011.

[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[Hinton et al., 2012] Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

[Kingma and Ba, 2014] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[Kipf and Welling, 2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.

[LeCun et al., 1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[Lu and Getoor, 2003] Qing Lu and Lise Getoor. Link-based classification. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 496–503, 2003.

[Monti et al., 2017] Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proc. CVPR, volume 1, page 3, 2017.

[Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710. ACM, 2014.

[Sen et al., 2008] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93, 2008.

[Shuman et al., 2013] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.

[Srivastava et al., 2014] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[Velickovic et al., 2017] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.

[Weston et al., 2012] Jason Weston, Frédéric Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. Springer, 2012.

[Xu et al., 2018] Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. arXiv preprint arXiv:1806.03536, 2018.

[Xu et al., 2019] Bingbing Xu, Huawei Shen, Qi Cao, Yunqi Qiu, and Xueqi Cheng. Graph wavelet neural network. In International Conference on Learning Representations, 2019.

[Yang et al., 2016] Zhilin Yang, William Cohen, and Ruslan Salakhudinov. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pages 40–48, 2016.

[Zhu et al., 2003] Xiaojin Zhu, Zoubin Ghahramani, and John D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 912–919, 2003.
