Graph Convolutional Networks
Yunsheng Bai

Overview

1. Improve GCN itself
   a. Filters
      i. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
      ii. Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
      iii. Dynamic Filters in Graph Convolutional Networks (2017)
   b. Pooling (Unpooling)
      i. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
2. Apply to new/larger datasets/graphs
   a. Use GCN as an auxiliary module
      i. Structured Sequence Modeling with Graph Convolutional Recurrent Networks (2017)
   b. Use GCN only
      i. Node/link classification/prediction: http://tkipf.github.io/misc/GCNSlides.pdf
         1. Directed graphs
            a. Modeling Relational Data with Graph Convolutional Networks (2017)
      ii. Graph classification, e.g. MNIST (with or without pooling)

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
   b. Improvement 2: Deconvolution

Define Convolution for Graph

Laplace

http://www.norbertwiener.umd.edu/Research/lectures/2014/MBegue_Prelim.pdf

Graph Laplacian

[Figure: a labeled graph with its degree matrix and adjacency matrix]

https://en.wikipedia.org/wiki/Laplacian_matrix

Graph

L: (Normalized) Graph Laplacian

D: Degree Matrix

W: Adjacency Matrix

U: Eigenvectors of L (orthonormal because L is symmetric PSD)

Λ: Eigenvalues of L

x̂: Fourier Transform of x
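Putting the notation together, these are the standard graph-signal-processing definitions the rest of the deck relies on (using the symmetric normalized Laplacian):

$$ L = I_N - D^{-1/2} W D^{-1/2} = U \Lambda U^T, \qquad \hat{x} = U^T x, \qquad x = U \hat{x} $$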

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)

1-D Convolution

Ex. x = [0 1 2 3 4 5 6 7 8]

f = [4 -1 0]

y = [0*4 1*4+0*(-1) 2*4+1*(-1)+0*0 3*4+2*(-1)+1*0 ...]

= [0 4 7 10 ...]
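As a quick sanity check (not from the original slides), the same numbers fall out of NumPy's convolution:

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
f = np.array([4, -1, 0])
y = np.convolve(x, f)   # full linear convolution
print(y[:4])            # [ 0  4  7 10] -- matches the worked example above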

I made this based on my EECS 351 lecture notes.

Convolution <--> Multiplication in Fourier Domain

View X and F as vectors
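A minimal NumPy sketch of the convolution theorem (my own addition; note the zero-padding, so that the circular convolution implied by the DFT matches the linear one):

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
f = np.array([4, -1, 0])
n = len(x) + len(f) - 1                    # pad to the full output length
y = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(f, n)).real
print(np.allclose(y, np.convolve(x, f)))   # True: multiplication in the Fourier domain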

I made this based on my EECS 351 lecture notes.

Spectral Filtering

Analogous to the convolution on the previous slide.

Filter a signal:

"As we cannot express a meaningful translation operator in the vertex domain, the convolution operator on graph G is defined in the Fourier domain."

$$ y = U \,\mathrm{diag}\big(\hat g(\lambda_1), \hat g(\lambda_2), \hat g(\lambda_3)\big)\, U^T x $$

(left to right: inverse Fourier transform U, non-parametric filter, Fourier transform Uᵀx)

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)

Spectral Filtering

$$ U\,\mathrm{diag}\big(\hat g(\lambda_1), \hat g(\lambda_2), \hat g(\lambda_3)\big)\,U^T x \;=\; \hat g(\lambda_1)\,\hat x(1)\,e_1 \;+\; \hat g(\lambda_2)\,\hat x(2)\,e_2 \;+\; \hat g(\lambda_3)\,\hat x(3)\,e_3 $$

where e₁, e₂, e₃ — the columns of U — form the Fourier basis.

The result of the convolution is the original signal: (1) first Fourier transformed, (2) then multiplied by a filter, (3) finally inverse Fourier transformed.
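To make the pipeline concrete, here is a small NumPy sketch on a 3-node path graph (my own example, using the unnormalized Laplacian for simplicity):

import numpy as np

# Path graph 1 - 2 - 3
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W      # graph Laplacian
lam, U = np.linalg.eigh(L)          # eigenvalues and orthonormal eigenvectors

x = np.array([1., 2., 3.])          # a signal on the nodes
g_hat = np.array([1.0, 0.5, 0.0])   # non-parametric filter: one value per eigenvalue

x_hat = U.T @ x                     # (1) Fourier transform
y = U @ (g_hat * x_hat)             # (2) filter, (3) inverse Fourier transform
print(y)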

Spectral Filtering

Convolution:

$$ x *_G g = U\big((U^T x) \odot (U^T g)\big) $$

Filter a signal:

$$ y = U\,\mathrm{diag}\big(\hat g(\lambda_1), \hat g(\lambda_2), \hat g(\lambda_3)\big)\,U^T x $$

(inverse Fourier transform · non-parametric filter · Fourier transform of x)

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)

Better Filters

Convolution:

$$ x *_G g_\theta = U\, g_\theta(\Lambda)\, U^T x $$

Filter a signal:

$$ y = U\, g_\theta(\Lambda)\, U^T x, \qquad g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k $$

(inverse Fourier transform · localized & polynomial filter · Fourier transform of x)

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)

Better Filters: Localized

[Figure: a labeled graph with its degree matrix, adjacency matrix, and Laplacian matrix]

L²: powers of the Laplacian are localized — (L^K)_{ij} = 0 whenever nodes i and j are more than K hops apart, so a degree-K polynomial filter is exactly K-localized.

Wavelets on Graphs via Spectral Graph Theory (2011)

Better Filters: Localized

Filter:

$$ g_\theta(L) = \sum_{k=0}^{K-1} \theta_k L^k $$

Filter a signal (K = 3):

$$ y = \theta_0\, x \;+\; \theta_1\,(L\, x) \;+\; \theta_2\,(L^2\, x) $$

(L x mixes in 1-step neighbors; L² x mixes in 2-step neighbors)

Fixed θ_k for every neighbor :( (Dynamic Filters in Graph Convolutional Networks (2017))
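A small NumPy sketch (my own example) of this polynomial filtering, which also shows the K-hop localization — and needs no eigendecomposition:

import numpy as np

# Path graph 1 - 2 - 3 - 4
W = np.diag([1., 1., 1.], 1)
W = W + W.T                                   # adjacency matrix
L = np.diag(W.sum(axis=1)) - W                # graph Laplacian

print((L @ L)[0])    # node 1's row of L^2: nonzero only up to 2-step neighbors

x = np.array([0., 1., 2., 3.])                # a signal on the nodes
theta = [0.5, -1.0, 0.25]                     # K = 3 polynomial coefficients
y = theta[0]*x + theta[1]*(L @ x) + theta[2]*(L @ (L @ x))
print(y)                                      # filtered signal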

Better Filters, but O(n²)

Convolution:

Filter a signal:

Filter: [figure: the filter s(1), s(2), s(3) expressed over the eigenvectors e1, e2, e3]

Computing eigenvectors: O(n³) :(

I am actually confused: they used Chebyshev polynomials to approximate the filter, but at the end of the day the filtered signal is the same as on the previous slide, which is O(n²) :( In fact, the authors of Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017) set K = 1, so no Chebyshev at all.

Approximations

If K = 1, filtering becomes:

$$ g_\theta \star x \;\approx\; \theta'_0\, x + \theta'_1\,(L - I_N)\, x \;=\; \theta'_0\, x - \theta'_1\, D^{-1/2} A D^{-1/2}\, x $$

If we set θ = θ'₀ = −θ'₁ (further approximation), then filtering becomes:

$$ g_\theta \star x \;\approx\; \theta\,\big(I_N + D^{-1/2} A D^{-1/2}\big)\, x $$

(with the renormalization trick I_N + D^{-1/2} A D^{-1/2} → D̃^{-1/2} Ã D̃^{-1/2}, where Ã = A + I_N and D̃ᵢᵢ = Σⱼ Ãᵢⱼ)

If the input is a matrix X ∈ ℝ^{N×C}, then filtering becomes:

$$ Z = \tilde D^{-1/2} \tilde A \tilde D^{-1/2} X \Theta $$

Filter parameters: Θ ∈ ℝ^{C×F}

Convolved signal matrix: Z ∈ ℝ^{N×F}

Filtering complexity: O(|E|·F·C)

Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)

Illustration of D^{-1/2} · A · D^{-1/2} · X · Θ

[Figure: Z = L̂ · X · Θ, where L̂ = D̃^{-1/2} Ã D̃^{-1/2}; each row of X is a node's feature vector (e.g. feature 2 of node 1 is part of the word embedding of node 1, and likewise for node 6), and each column of Θ is one filter producing one output feature of Z.]
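A minimal NumPy sketch of this propagation rule (my own example; the graph and features are random placeholders):

import numpy as np

rng = np.random.default_rng(0)
N, C, F = 6, 3, 2                       # nodes, input channels, filters

A = (rng.random((N, N)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T          # symmetric adjacency, no self-loops

A_tilde = A + np.eye(N)                 # renormalization trick: add self-loops
d = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(d, d))   # D̃^{-1/2} Ã D̃^{-1/2}

X = rng.normal(size=(N, C))             # node feature matrix
Theta = rng.normal(size=(C, F))         # filter parameters
Z = A_hat @ X @ Theta                   # convolved signal matrix
print(Z.shape)                          # (6, 2)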

Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
   b. Improvement 2: Deconvolution

Architecture of Graph Convolutional Networks

Schematic Depiction

Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
   b. Improvement 2: Deconvolution

Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
   b. Improvement 2: Deconvolution

Improvement 1: Dynamic Filters -> Generalizable

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
      i. Basics
      ii. Ideas
      iii. Ordering
      iv. Example
   b. Improvement 2: Deconvolution

Baseline Filter: Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)

[Figure: the 6-node example graph and its 6×6 filter matrix; nonzero entries sit exactly at the adjacency positions, and within a channel they all share the same parameter.]

Poor Filter: Parameters All Different (No Sharing)

[Figure: the same 6×6 filter matrix, but every nonzero entry θ₁₁ᵢⱼ is a distinct parameter for each edge (i, j) — no sharing at all. Annotation: "What if they share?"]

Proposed Filter: Parameters Shared across Nodes with Same # of Neighbors

[Figure: the same matrix where all nodes with the same number of neighbors share one row of parameters — e.g. every degree-2 node shares θ'₁₁₂, and every degree-4 node shares θ'₁₁₄.]

Proposed Filter: Total Size O(N²·F·C)

[Figure: the full parameter matrix viewed without adjacency info (N = 6, F = 1, C = 2); row i holds i distinct parameters, so the total size is O(N²·F·C).]

Proposed Filter: Total Size O(n_max²·F·C) ≤ O(N²·F·C)

[Figure: keeping only the rows for node degrees that actually occur (here degrees 2 and 4; N = 6, F = 1, C = 2, n_max = 4), the total size drops to O(n_max²·F·C).]

Proposed Filter: Generalizable to Regular CNN

[Figure: on a regular 2-D image (a zero-padded grid), every pixel has the same number of neighbors, so the per-degree shared weights collapse to a single filter moving over the image with stride 1 — a regular CNN.]

Proposed Filter: More Sharing of Weights

[Figure: the same per-degree filter matrix (view without adjacency info); idea — weights from previous (smaller) rows are related to later (larger) rows.]

Proposed Filter: More Sharing of Weights

[Figure: same matrix; open question — should smaller rows copy weights from the larger ones, or relate to them some other way? If copy, randomly copy?]

Proposed Filter: Soft Assignment (Dynamic Filters in Graph Convolutional Networks (2017))

[Figure: each weight is a weighted sum of later ones — essentially a soft assignment.]

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
      i. Basics
      ii. Ideas
      iii. Ordering
      iv. Example
   b. Improvement 2: Deconvolution

Proposed Filter: Summary and More Ideas

- Make them look equal, so treat them the same:
  - Add 2-step neighbors as less important 1-step neighbors.
  - Duplicate 1-step neighbors as less important dummy neighbors.
  - Convert all the 1-step neighbors into one neighbor.
  - ...
- Nodes are not created equal — they have different # of neighbors. Respect diversity; treat them differently:
  - Share weights if same # of neighbors; nodes with a small # of neighbors have weights independent of the large.
  - Share weights if same # of neighbors; nodes with a small # of neighbors randomly copy weights from the large.
  - All nodes kind of share the same weights but actually have different weights: nodes with a small # of neighbors have weights softly assigned from the large.
  - ...
- Respect diversity, yet treat the same:
  - Sequences from random walks and neighbors are fed to an LSTM.
  - Graph LSTM: a variant of LSTM on graphs.
  - ...

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
      i. Basics
      ii. Ideas
      iii. Ordering
      iv. Example
   b. Improvement 2: Deconvolution

Proposed Filter: Ordering of Neighbors

[Figure: node 1 with unordered neighbors marked "?"]

1. No ordering.

No generalizability :(

But it works well in Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016) and Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017).

Proposed Filter: Ordering of Neighbors

[Figure: node 1 with unordered neighbors marked "?"]

2. Soft assign every neighbor to all weights. Assignments are learnable.

Implicit ordering.

O(N*M) additional parameters to learn.
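A toy NumPy sketch of the soft-assignment idea (my own illustration, not the paper's exact formulation): each neighbor carries a learnable assignment over the M weight slots, and its effective weight is the resulting mixture.

import numpy as np

rng = np.random.default_rng(0)
M = 4                                # weight slots in the shared filter
w = rng.normal(size=M)               # shared weights, one per slot

q = rng.normal(size=(5, M))          # learnable logits: node 1 has 5 neighbors
a = np.exp(q) / np.exp(q).sum(axis=1, keepdims=True)   # softmax -> soft assignments

eff_w = a @ w                        # each neighbor's effective weight (a mixture)
print(eff_w)                         # an implicit, learnable ordering of neighbors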

Dynamic Filters in Graph Convolutional Networks (2017)

Proposed Filter: Ordering of Neighbors

[Figure: node 1 with unordered neighbors marked "?"]

3. Hard assign every neighbor to an ordering. Assignments are learnable.

[Figure: each neighbor of node 1 (nodes 2, 3, 4, 5, 6) is assigned a one-hot vector, e.g. [0 0 1 0 0 0 0], [0 0 0 0 0 1 0], [0 0 0 1 0 0 0], [0 0 0 0 1 0 0], [0 0 0 0 0 0 1].]

Explicit ordering ([0 0 0 … 0 1 0 … 0 0 0]).

O(N*M)/O(N) additional parameters to learn. (5 neighbors already admit 5! = 120 possible orderings.)

Proposed Filter: Ordering of Neighbors

[Figure: node 1 with unordered neighbors marked "?"]

4. Hard assign every neighbor to an ordering. Assignments are fixed, e.g. by rank.

[Figure: the 6-node example graph with node 1's neighbors in a fixed order.]

Explicit ordering.

No additional parameters to learn.

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
      i. Basics
      ii. Ideas
      iii. Ordering
      iv. Example
   b. Improvement 2: Deconvolution

Example: Share Weights If Same # of Neighbors

import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

# Filters: shape (F=2, C=2, N=6, N=6). For clarity, only node 1's row is
# nonzero, with one shared weight per (filter, channel) at its neighbors
# (nodes 2 and 6).
matrix1 = tf.constant([
    [[[0, 1, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
     [[0, 2, 0, 0, 0, 2], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]],
    # Another filter/feature
    [[[0, 3, 0, 0, 0, 3], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
     [[0, 4, 0, 0, 0, 4], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]]])

Shape: (2, 2, 6, 6)

[Figure: Filter 1 stores weights 1 (channel 1) and 2 (channel 2) at node 1's adjacency positions; Filter 2 stores weights 3 and 4.]

# Input data: node n carries value n in channel 1 and n + 6 in channel 2.
matrix2 = tf.constant([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
matrix2 = tf.reshape(matrix2, (2, 1, 6))

Shape: (2, 1, 6)

Convolution: Step 1

product = tf.multiply(matrix1, matrix2)   # element-wise, broadcasting the data over the filters
with tf.Session() as sess:
    result = sess.run(product)

[Figure: the element-wise product, shape (2, 2, 6, 6); in node 1's row, filter 1 gives [0 2 0 0 0 6] (channel 1) and [0 16 0 0 0 24] (channel 2), filter 2 gives [0 6 0 0 0 18] and [0 32 0 0 0 48].]

Convolution: Step 2

reduced = tf.transpose(tf.reduce_sum(product, [1, 3]))   # sum over channel and column axes
with tf.Session() as sess:

    result = sess.run(reduced)
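For readers without TensorFlow 1.x at hand, a NumPy equivalent (my own translation) reproduces the same numbers as the result shown next:

import numpy as np

W = np.zeros((2, 2, 6, 6))             # (filter, channel, node, node)
W[0, 0, 0, [1, 5]] = 1                 # filter 1, channel 1: node 1's neighbors
W[0, 1, 0, [1, 5]] = 2                 # filter 1, channel 2
W[1, 0, 0, [1, 5]] = 3                 # filter 2, channel 1
W[1, 1, 0, [1, 5]] = 4                 # filter 2, channel 2

X = np.arange(1, 13).reshape(2, 1, 6)  # (channel, 1, node)

Z = (W * X).sum(axis=(1, 3)).T         # multiply, sum over channel and column, transpose
print(Z[0])                            # node 1's output features: [ 48. 104.]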

[Figure: the result has one row per node and one column per output feature (shape (6, 2)); node 1's outputs are feature 1 = 48 and feature 2 = 104, and all other rows are 0.]

Roadmap

1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with Deconvolutional Layers
   a. Improvement 1: Dynamic Filters -> Generalizable
   b. Improvement 2: Deconvolution

Improvement 2: Deconvolution

Why Deconvolution?

To visualize/understand/probe an existing GCN.

Pooling <--> Unpooling

https://www.quora.com/How-does-a-deconvolutional-neural-network-work

In progress.