
Convolutional Networks on (Directed) Graphs via the Graph Laplacian

Michael Perlmutter

Department of Mathematics University of California, Los Angeles

Graph-Structured Data

A natural model for many phenomena: social networks, molecules, email networks, citation networks.

Overview

- Convolutional Neural Networks on Graphs
  - Motivation
  - Example tasks
  - Definitions - convolution on graphs?
- Limitations (and how we overcome them)
  - Oversmoothing (graph scattering transform)
  - Unable to handle directed graphs (MagNet)

Graph-Level Tasks

Graph Classification: Is a molecule related to cancer?

Graph Regression: What is the formation energy of a molecule?

Graph Synthesis: Can we generate new molecules for, e.g., drug development?

Node-Level Tasks

Node Classification: Is a member of a social network a Republican or a Democrat?

Node Clustering: Divide customers into meaningful subgroups.

Edge-Level Tasks

Link Prediction: Suggest "people you may know" on Facebook, LinkedIn, etc.

Deep Learning on Graphs

Convolutional neural networks (CNNs) are very good at extracting information from signal data by leveraging the underlying Euclidean geometry. Graph data has structure, but it is non-Euclidean. How should we extend CNNs to graphs?

Graph Convolutional Networks

Goal: Translate CNN principles to graph-structured data.

Graph Convolutional Networks:
- Spectral Approach: Define convolution through the spectral decomposition of the (normalized or unnormalized) graph Laplacian.
- Spatial Approach: Define convolution as a localized averaging operator.
Construct graph neural networks as an alternating sequence of convolutions and nonlinearities.

Notation

G = (V, E) is a graph, V = {v_1, ..., v_N}, E ⊆ V × V.

Adjacency matrix A:

A(j, k) = 1 if (v_j, v_k) ∈ E, and 0 otherwise.

Degree vector and matrix:

D = diag(d), d(j) = degree of vertex v_j.

Neighborhoods: N(v_k) = {v_j : (v_j, v_k) ∈ E}.

Fourier

Euclidean setting: A signal (e.g., sound) can be seen as a function over time. Signals can be decomposed into waves/oscillations thanks to Fourier.

On graphs: Functions f : V → R can be identified with signals x ∈ R^N, x(k) := f(v_k).

Question: Can we use Fourier on graph domains?
Yes: We can use Fourier to study diffusion on graphs via a graph Laplacian.

Figure: Frequency decomposition of a sampled signal (hearinghealthmatters.org).

10 Perlmutter(UCLA) Deep Learning on (Directed) Graphs Graph Laplacians

Multiple Different Laplacians:

- Unnormalized: L_U := D − A
- Symmetric Normalized: L_N := I − D^{-1/2} A D^{-1/2}
- Random Walk Normalized: L_RW := I − D^{-1} A

The first two are more commonly used since they are symmetric and positive semidefinite. Commonly denoted by L, 𝓛, or ∆.
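To make the definitions concrete, here is a minimal NumPy sketch (not from the slides; the path graph and variable names are illustrative choices) that builds all three Laplacians and checks that the first two have nonnegative spectra.

```python
import numpy as np

# Path graph on 4 vertices: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = A.sum(axis=1)                  # degree vector
D = np.diag(d)                     # degree matrix
D_inv_sqrt = np.diag(d ** -0.5)    # assumes no isolated vertices
D_inv = np.diag(1.0 / d)

L_U = D - A                                      # unnormalized
L_N = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt    # symmetric normalized
L_RW = np.eye(4) - D_inv @ A                     # random walk normalized

# The first two are symmetric positive semidefinite: all eigenvalues >= 0.
print(np.linalg.eigvalsh(L_U).round(4))
print(np.linalg.eigvalsh(L_N).round(4))
```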

The Graph Laplacian On the Cycle Graph

 2 −1 0 ... 0 −1 −1 2 −1 0 ... 0    ......   ......    L = D − A =   . U ......   ......    ......   ......  −1 0 ... 0 −1 2 Eigenvectors are Fourier modes: 2πink/N uk (n) = e . 12 Perlmutter(UCLA) Deep Learning on (Directed) Graphs Graph Fourier Transform

Fourier Modes

Eigenvectors u_1, ..., u_N of L can be interpreted as Fourier modes:

L u_k = λ_k u_k,   L = U Λ U^T.

Frequency is reframed in terms of the smoothness and variation of the modes; increasing variation is quantified by λ_k = ⟨u_k, L u_k⟩.

Fourier Transform and Inverse:

x̂(k) = ⟨x, u_k⟩,   x = Σ_{k=1}^{N} x̂(k) u_k,   i.e.,   x̂ = U^T x,   x = U x̂.

Figure: (taken from [Khalid et al., 2011])
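A small sketch of the graph Fourier transform as defined above, using the unnormalized Laplacian of a cycle graph; the graph, signal, and names are illustrative choices, not part of the slides.

```python
import numpy as np

N = 8
A = np.zeros((N, N))
for k in range(N):                     # cycle graph: vertex k connected to k+1 (mod N)
    A[k, (k + 1) % N] = A[(k + 1) % N, k] = 1.0
L = np.diag(A.sum(axis=1)) - A         # unnormalized Laplacian

lam, U = np.linalg.eigh(L)             # Fourier modes u_k and frequencies lambda_k
x = np.random.default_rng(0).standard_normal(N)   # a graph signal

x_hat = U.T @ x                        # forward transform: x_hat(k) = <x, u_k>
x_rec = U @ x_hat                      # inverse transform
print(np.allclose(x, x_rec))           # True

# lambda_k = <u_k, L u_k> measures the variation of mode u_k.
print(np.allclose(lam, np.diag(U.T @ L @ U)))     # True
```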

Spectral Convolution

Convolutional Filter: The filtering of a signal x by a filter y can be defined in the Fourier domain via

(y ⋆ x)^(k) := ŷ(k) x̂(k),

i.e., convolution becomes pointwise multiplication of Fourier coefficients. This yields

y ⋆ x = Σ_{k=1}^{N} ŷ(k) x̂(k) u_k = Σ_{k=1}^{N} ŷ(k) u_k u_k^T x =: U Ŷ U^T x,   Ŷ = diag(ŷ).

This class of convolutions was used in [Bruna et al., 2014]. There is no direct relationship between Ŷ and Λ.
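As a sanity check on the definition y ⋆ x = U Ŷ U^T x, here is a small sketch (illustrative only) that filters a signal on a path graph by zeroing out all but its two smoothest Fourier modes.

```python
import numpy as np

def spectral_conv(L, x, y_hat):
    """Spectral convolution: multiply by y_hat in the Laplacian eigenbasis."""
    lam, U = np.linalg.eigh(L)
    return U @ (y_hat * (U.T @ x))

# Path graph on 5 nodes.
A = np.diag(np.ones(4), 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A

y_hat = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # keep the two smoothest modes
x = np.random.default_rng(1).standard_normal(5)
print(spectral_conv(L, x, y_hat).round(3))
```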

Avoid Diagonalization?

Drawbacks of diagonalization: computational cost, overfitting, stability.

Polynomials of the Laplacian: For any polynomial p,

p(L) x = Σ_{k=1}^{N} p(λ_k) x̂(k) u_k,   and therefore   p(L) x = U p(Λ) U^T x.

Punchline: We can perform spectral convolution without actually computing eigenvalues or eigenvectors.
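A short sketch verifying the punchline: evaluating p(L) x by matrix-vector products gives the same result as filtering in the eigenbasis with p(Λ). The polynomial and graph are arbitrary illustrative choices.

```python
import numpy as np

A = np.diag(np.ones(5), 1); A = A + A.T       # path graph on 6 nodes
L = np.diag(A.sum(axis=1)) - A
x = np.random.default_rng(2).standard_normal(6)
coeffs = [0.5, -0.3, 0.1]                     # p(t) = 0.5 - 0.3 t + 0.1 t^2

# Direct evaluation: no eigendecomposition needed.
p_direct = sum(c * np.linalg.matrix_power(L, k) @ x for k, c in enumerate(coeffs))

# Spectral evaluation: U p(Lambda) U^T x.
lam, U = np.linalg.eigh(L)
p_spectral = U @ (np.polyval(coeffs[::-1], lam) * (U.T @ x))

print(np.allclose(p_direct, p_spectral))      # True
```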

ChebNet and GCN

ChebNet [Defferrard et al., 2016]

Let T_k be the k-th Chebyshev polynomial and use filters:

y_θ ⋆ x = Σ_{k=0}^{K} θ_k T_k(L) x.

(This idea was previously applied to graph wavelets in [Hammond et al., 2011].)

GCN [Kipf and Welling, 2016]

Make numerous simplifications / approximations:

y_θ ⋆ x = θ D̃^{-1/2} Ã D̃^{-1/2} x =: θ Â x,   Ã = A + I,   D̃ = diag(Ã 1).

With multiple channels and filters:

Z = Â X Θ,   X = (x_1 ... x_C).

Important Note: Θ is learned but Â is designed.
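A minimal sketch of the GCN propagation rule Z = Â X Θ from the slide; in practice Θ is learned by gradient descent, so a random matrix simply stands in for it here, and the toy graph and feature dimensions are illustrative.

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """One GCN layer: Z = A_hat X Theta, A_hat = D_tilde^{-1/2} A_tilde D_tilde^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # designed, not learned
    return A_hat @ X @ Theta                      # Theta is the learned part

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)            # toy 3-node graph
X = rng.standard_normal((3, 4))                   # C = 4 input channels
Theta = rng.standard_normal((4, 2))               # 2 output channels
print(gcn_layer(A, X, Theta).shape)               # (3, 2)
```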

Semi-Supervised Node Classification

Setup
- All vertices and edges
- Node features x_1, ..., x_C for all graph nodes
- Labels for some nodes (≤ 5%)

Popular methods such as GCN suffer from oversmoothing [Li et al., 2018].

Figure: Visualization of benchmark datasets, taken from [Mernyei and Cangea, 2020].

Oversmoothing

Graph Convolutional Network [Kipf and Welling, 2016]

Z = Â X Θ, where

Â ≈ T = (1/2)(I + D^{-1/2} A D^{-1/2}) = I − (1/2) L_N.

T and L_N have the same eigenvectors:

L_N u_i = λ_i u_i,   0 = λ_0 < λ_1 ≤ ... ≤ λ_N ≤ 2,

T u_i = ω_i u_i,   1 = ω_0 > ω_1 ≥ ... ≥ ω_N ≥ 0.

Low-pass filter: Multiplying by T leaves the bottom eigenvector unchanged; all other frequencies are damped, and repeated applications increasingly damp the high frequencies. "Deep" graph neural nets typically use only 2 layers.
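The low-pass behavior is easy to see numerically. In this sketch (illustrative, not from the slides), repeatedly applying T to a random signal on a path graph drives it toward the bottom eigenvector D^{1/2}1, i.e., a function of the degree vector.

```python
import numpy as np

N = 8
A = np.diag(np.ones(N - 1), 1); A = A + A.T           # connected path graph
d = A.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
T = 0.5 * (np.eye(N) + D_inv_sqrt @ A @ D_inv_sqrt)   # low-pass matrix

u0 = np.sqrt(d) / np.linalg.norm(np.sqrt(d))          # bottom eigenvector of L_N
x = np.random.default_rng(3).standard_normal(N)

for _ in range(200):                                  # many "layers" of smoothing
    x = T @ x

# x is now essentially parallel to u0: all other frequencies have been damped.
print(abs(u0 @ x) / np.linalg.norm(x))                # close to 1
```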

Bandpass Filters

Diffusion Wavelets

W_J x = {Ψ_j x, Φ_J x}_{0 ≤ j ≤ J}, where

Ψ_j = K^{2^j} − K^{2^{j+1}},   Φ_J = K^{2^{J+1}},

and K is either T, the lazy random walk P = (1/2)(I + A D^{-1}), or another diffusion matrix.

Theorem: ([Perlmutter et al., 2019])

W_J is a non-expansive frame on a suitable weighted space.
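A sketch of the wavelet filter bank under my reading of the slide (Ψ_j = K^{2^j} − K^{2^{j+1}}, Φ_J = K^{2^{J+1}}), with K taken to be the lazy diffusion matrix T; the graph and the scale J are illustrative.

```python
import numpy as np

def diffusion_wavelets(A, J):
    """Dyadic wavelets Psi_j = K^{2^j} - K^{2^{j+1}} and low-pass Phi_J = K^{2^{J+1}}."""
    N = A.shape[0]
    D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
    K = 0.5 * (np.eye(N) + D_inv_sqrt @ A @ D_inv_sqrt)     # lazy diffusion T
    K_pow = [np.linalg.matrix_power(K, 2 ** j) for j in range(J + 2)]
    psis = [K_pow[j] - K_pow[j + 1] for j in range(J + 1)]  # band-pass filters
    return psis, K_pow[J + 1]                               # ..., low-pass Phi_J

A = np.diag(np.ones(7), 1); A = A + A.T         # path graph on 8 nodes
psis, phi = diffusion_wavelets(A, J=3)
x = np.random.default_rng(0).standard_normal(8)
coeffs = [Psi @ x for Psi in psis] + [phi @ x]  # W_J x = {Psi_j x, Phi_J x}
print([round(float(np.linalg.norm(c)), 3) for c in coeffs])
```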

The Graph Scattering Transform

Scattering Channels

Use many paths of the form p = (j_1, ..., j_m):

U_p x := Ψ_{j_m} σ(Ψ_{j_{m−1}} σ( ... σ(Ψ_{j_2} σ(Ψ_{j_1} x)) ... )).

Layer-wise update rule:

H_sct^ℓ := σ(U_p H^{ℓ−1} Θ + B).

The hybrid network of [Min et al., 2020a] uses both GCN channels and scattering channels in each layer. GCN channels focus on low-frequency information; scattering channels retain high-frequency information.
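Putting the pieces together, here is a sketch of a single scattering channel U_p x; σ is taken to be the absolute value (a common choice, though the slides leave σ generic), and the wavelets are the lazy-diffusion wavelets sketched above.

```python
import numpy as np

def lazy_diffusion(A):
    D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
    return 0.5 * (np.eye(A.shape[0]) + D_inv_sqrt @ A @ D_inv_sqrt)

def wavelets(K, J):
    pows = [np.linalg.matrix_power(K, 2 ** j) for j in range(J + 2)]
    return [pows[j] - pows[j + 1] for j in range(J + 1)]

def scattering_channel(psis, x, path):
    """U_p x = Psi_{j_m} sigma( ... sigma(Psi_{j_1} x) ... ) with sigma = abs."""
    out = psis[path[0]] @ x
    for j in path[1:]:
        out = psis[j] @ np.abs(out)
    return out

A = np.diag(np.ones(9), 1); A = A + A.T               # path graph on 10 nodes
psis = wavelets(lazy_diffusion(A), J=3)
x = np.random.default_rng(1).standard_normal(10)
print(scattering_channel(psis, x, path=(0, 2)).round(3))   # a second-order channel
```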

Overcoming Oversmoothing

Several theoretical results in [Min et al., 2020a] demonstrate the advantages of band-pass filters in conjunction with conventional GCNs. Repeated applications of a low-pass filter cause a signal to converge to its projection onto the bottom eigenvector which is either constant or a function of the degree vector.

Graph Scattering Transforms

Untrained Variations
- [Zou and Lerman, 2019] - Original - Shannon wavelets
- [Gama et al., 2018] - Diffusion wavelets based on T, invariance and stability analysis
- [Gao et al., 2019] - Diffusion wavelets based on P, statistical moments, graph classification (no theoretical guarantees)
- [Perlmutter et al., 2019] - Invariance and stability analysis for general diffusion wavelets

Trained Networks
- [Min et al., 2020a] - Hybrid network, node classification
- [Min et al., 2020b] - Attention mechanism
- [Tong et al., 2020a] - Learns scales based on data
- [Castro et al., 2020] - Autoencoder for molecule generation

Scattering as a Model of Neural Nets

Original Scattering Transform [Mallat, 2012] vs. CNNs
- CNNs learn relatively arbitrary filters, which in practice resemble wavelets.
- Scattering serves as a wavelet-based model for studying stability and invariance.

Graph Scattering vs. GNNs
- Both scattering and, e.g., GCN use predefined filters.
- Learnable weights can be incorporated into a graph scattering framework, e.g., [Min et al., 2020a].
- The primary difference is the type of predefined filters, i.e., low-pass vs. band-pass.
- Untrained versions of the graph scattering transform have stability and invariance guarantees similar to those of the Euclidean version.

Part 2: Directed Graphs

Setup: G = (V, E), where E is a set of directed edges. The adjacency matrix A is asymmetric:

A(j, k) ≠ 0 ⇔ (v_j, v_k) ∈ E.

Examples: Email, Citations, Traffic, Websites, Twitter, Sports

Does Direction Matter?

Common Approach: Symmetrization - preprocess the data by symmetrizing the adjacency matrix.

Sometimes this is reasonable:
- Node classification on citation networks: papers with the label 'data science' are likely to both cite, and be cited by, other data science papers.

Other times it's not:
- Node classification on email networks
- Link prediction on citation networks: a paper with many citations is not the same as a paper with many references.

Goal: A flexible model which can incorporate directional information automatically, in a data-driven manner (or choose not to when appropriate).

Directed Stochastic Block Model

Figure: (a) Directed adjacency, (b) symmetrized adjacency. Directed edges point from cluster 1 to cluster 2 90% of the time.

Meta Graphs

Figure: Cyclic meta-graph with noise - five blocks (0 through 4) connected by majority-flow edges, plus noise edges.

Telegram Dataset

Activity of the far right: [Bovet and Grindrod, 2020] constructed a directed graph of "influence" on Telegram and analyzed its core-periphery structure (channels in red, websites in blue).

Extending GNNs to Directed Graphs

Spatial Approaches
- Local neighborhoods N(v_k) = {v_j : (v_j, v_k) ∈ E} are still well-defined.
- However, N(v_k) ≠ {v_j : (v_k, v_j) ∈ E}.
- Symmetrizing typically improves performance.

Spectral Approaches
- It is not clear how to generalize the graph Laplacian.
- Many different methods have been proposed.
- We use a Hermitian matrix known as the Magnetic Laplacian.

Hermitian Adjacency Matrix

Phase Matrix: Θ^(q)(j, k) := 2πq(A(j, k) − A(k, j)),   0 ≤ q ≤ 0.25.

Hermitian Adjacency Matrix: H^(q) := A_s ⊙ exp(i Θ^(q)) (entrywise product), where A_s = (1/2)(A + A^T).
- Undirected geometry is captured by the magnitude of the entries.
- Directional information is encoded by the phase.

Two Special Values of q
- q = 0 treats the graph as undirected.
- q = 0.25 implies H^(q)(j, k) = −H^(q)(k, j) if (v_j, v_k) ∈ E and (v_k, v_j) ∉ E.
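A small sketch of the phase matrix and Hermitian adjacency matrix; the two-node example simply checks the q = 0 and q = 0.25 behavior stated above (the graph is an illustrative choice).

```python
import numpy as np

def hermitian_adjacency(A, q):
    A_s = 0.5 * (A + A.T)                # magnitude: undirected geometry
    Theta = 2 * np.pi * q * (A - A.T)    # phase: directional information
    return A_s * np.exp(1j * Theta)      # entrywise product

# A single directed edge 0 -> 1, no edge 1 -> 0.
A = np.array([[0, 1],
              [0, 0]], dtype=float)

H0 = hermitian_adjacency(A, q=0.0)       # q = 0: purely real, graph treated as undirected
H25 = hermitian_adjacency(A, q=0.25)     # q = 0.25: H(0,1) = 0.5i = -H(1,0)

print(np.allclose(H0, H0.T))                     # True (real and symmetric)
print(np.allclose(H25[0, 1], -H25[1, 0]))        # True (purely imaginary, antisymmetric)
```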

The Magnetic Laplacian

Unnormalized and Normalized Magnetic Laplacians

L_U^(q) := D_s − H^(q) = D_s − A_s ⊙ exp(i Θ^(q))
L_N^(q) := I − (D_s^{-1/2} A_s D_s^{-1/2}) ⊙ exp(i Θ^(q))

Setting q = 0 recovers the undirected Laplacians. Both L_U^(q) and L_N^(q) are Hermitian and positive semidefinite.

History
- First appears in the physics literature in 1993 (Lieb and Loss)
- Ongoing research in the graph signal processing community
- Different q's highlight different motifs
- Has been used for clustering and community detection
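A sketch that assembles the normalized magnetic Laplacian for a directed 3-cycle and verifies the two properties claimed above (Hermitian, real nonnegative spectrum); the graph and the value of q are illustrative.

```python
import numpy as np

def magnetic_laplacian(A, q):
    """L_N^(q) = I - (D_s^{-1/2} A_s D_s^{-1/2}) * exp(i Theta^(q)), entrywise product."""
    A_s = 0.5 * (A + A.T)
    Theta = 2 * np.pi * q * (A - A.T)
    D_inv_sqrt = np.diag(A_s.sum(axis=1) ** -0.5)   # assumes no isolated vertices
    return np.eye(A.shape[0]) - (D_inv_sqrt @ A_s @ D_inv_sqrt) * np.exp(1j * Theta)

# Directed 3-cycle: 0 -> 1 -> 2 -> 0.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)

L = magnetic_laplacian(A, q=0.25)
print(np.allclose(L, L.conj().T))            # Hermitian: True
print(np.linalg.eigvalsh(L).round(4))        # real, nonnegative eigenvalues
```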

Directed Cycle

Eigenvectors and Eigenvalues
- Eigenvectors are the classical Fourier modes, independent of q.
- Eigenvalues depend on q.

Directed Stars

Figure: (a) G^(in), (b) G^(out).

Eigenvectors and Eigenvalues
- Eigenvectors depend on q and satisfy u_k^(out) = conj(u_k^(in)).
- Eigenvalues do not depend on q, and λ_k^(in) = λ_k^(out).

MagNet [Zhang et al., 2021]

Other Approaches - [Tong et al., 2020b]

First-order proximity

A_F(i, j) = A_s(i, j)

Second-order proximity

A_in(j, k) > 0 if v_j and v_k have a common incoming neighbor
A_out(j, k) > 0 if v_j and v_k have a common outgoing neighbor

Multi-channel convolution:

Z_F = GCN_F(X, Ã_F)
Z_Sin = GCN_Sin(X, Ã_Sin)
Z_Sout = GCN_Sout(X, Ã_Sout)
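As a rough illustration of the second-order proximities (my reading of the idea, not the exact normalized construction of [Tong et al., 2020b]): two nodes are second-order in-neighbors when some third node points to both of them, and second-order out-neighbors when both point to some third node.

```python
import numpy as np

# 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

A_in = (A.T @ A) > 0      # (j, k): some node has edges into both v_j and v_k
A_out = (A @ A.T) > 0     # (j, k): both v_j and v_k have edges into some node
np.fill_diagonal(A_in, False)     # ignore a node's proximity to itself
np.fill_diagonal(A_out, False)

print(A_in.astype(int))   # nodes 1 and 2 share the incoming neighbor 0
print(A_out.astype(int))  # nodes 1 and 2 share the outgoing neighbor 3
```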

Graph Random Walks and PageRank

Random Walk: Take a step to a neighbor chosen at random.
PageRank: Teleport to a random location with probability α; otherwise, perform a random walk step.
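A sketch of the PageRank transition matrix built from the description above; the teleport probability α and the handling of dangling nodes (nodes with no out-edges) are assumptions made for illustration.

```python
import numpy as np

def pagerank_matrix(A, alpha=0.15):
    """Teleport uniformly with probability alpha, otherwise step to a random out-neighbor."""
    N = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    P = np.zeros_like(A)
    np.divide(A, out_deg, out=P, where=out_deg > 0)    # row-stochastic random walk
    P[out_deg[:, 0] == 0] = 1.0 / N                    # dangling nodes: always teleport
    return alpha / N + (1 - alpha) * P

# 0 -> {1, 2}, 1 -> 2, and node 2 has no out-edges.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)

G = pagerank_matrix(A)
print(G.sum(axis=1))          # each row sums to 1 (a valid transition matrix)
```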

Random Walk Based GNNs

DGCN [Ma et al., 2019]
- Uses the algebraic relationship between the graph Laplacian, the random walk matrix, and its stationary distribution to define a Laplacian on directed graphs.
- Makes certain assumptions on the graph.
- Requires explicitly computing the stationary distribution.

DiGraph Inception [Tong et al., 2020c]
- Replaces the random walk with PageRank.
- Allows for applications to more general graphs.

Node Classification Results - Real Data

Table: Node classification accuracy (%). The best results are in bold and the second are underlined.

Type       | Method    | Cornell  | Texas     | Wisconsin | Cora-ML  | Telegram
Spectral   | ChebNet   | 79.8±5.0 | 79.2±7.5  | 81.6±6.3  | 80.0±1.8 | 70.2±6.8
Spectral   | GCN       | 59.0±6.4 | 58.7±3.8  | 55.9±5.4  | 82.0±1.1 | 73.4±5.8
Spectral   | APPNP     | 58.7±4.0 | 57.0±4.8  | 51.8±7.4  | 82.6±1.4 | 67.3±3.0
Spatial    | SAGE      | 80.0±6.1 | 84.3±5.5  | 83.1±4.8  | 82.3±1.2 | 56.6±6.0
Spatial    | GIN       | 57.9±5.7 | 65.2±6.5  | 58.2±5.1  | 78.1±2.0 | 74.4±8.1
Spatial    | GAT       | 57.6±4.9 | 61.1±5.0  | 54.1±4.2  | 81.9±1.0 | 72.6±7.5
Directed   | DGCN      | 67.3±4.3 | 71.7±7.4  | 65.5±4.7  | 81.3±1.4 | 90.4±5.6
Directed   | DiGraph   | 66.8±6.2 | 64.9±8.1  | 59.6±3.8  | 79.4±1.8 | 82.0±3.1
Directed   | DiGraphIB | 64.4±9.0 | 64.9±13.7 | 64.1±7.0  | 79.3±1.2 | 64.1±7.0
This paper | MagNet    | 84.3±7.0 | 83.3±6.1  | 85.7±3.2  | 79.8±2.5 | 87.6±2.9
This paper | Best q    | 0.25     | 0.15      | 0.05      | 0.0      | 0.15

Node Classification Results - Synthetic Data

Table: Node classification on DSBM graphs with varying net flow 1 − β∗.

Method / β* | .05      | .10       | .15       | .20       | .25      | .30      | .35
ChebNet     | 19.9±0.7 | 20.1±0.6  | 20.0±0.6  | 20.1±0.8  | 19.9±0.9 | 20.0±0.5 | 19.7±0.9
GCN-D       | 68.6±2.2 | 74.1±1.8  | 75.5±1.3  | 74.9±1.3  | 72.0±1.4 | 65.4±1.6 | 58.1±2.4
APPNP-D     | 97.4±1.8 | 94.3±2.4  | 89.4±3.6  | 79.8±9.0  | 69.4±3.9 | 59.6±4.9 | 51.8±4.5
SAGE-D      | 20.2±1.2 | 20.0±1.0  | 20.0±0.8  | 20.0±0.7  | 19.6±0.9 | 19.8±0.7 | 19.9±0.9
GIN-D       | 57.9±6.3 | 48.0±11.4 | 32.7±12.9 | 26.5±10.0 | 23.8±6.0 | 20.6±3.0 | 20.5±2.8
GAT-D       | 42.0±4.8 | 32.7±5.1  | 25.6±3.8  | 19.9±1.4  | 20.0±1.0 | 19.8±0.8 | 19.6±0.2
DGCN        | 81.4±1.1 | 84.7±0.7  | 85.5±1.0  | 86.2±0.8  | 84.2±1.1 | 78.4±1.3 | 69.6±1.5
DiGraph     | 82.5±1.4 | 82.9±1.9  | 81.9±1.1  | 79.7±1.3  | 73.5±1.9 | 67.4±2.8 | 57.8±1.6
DiGraphIB   | 99.2±0.4 | 97.9±0.6  | 94.1±1.7  | 88.7±2.0  | 82.3±2.7 | 70.0±2.2 | 57.8±6.4
MagNet      | 99.6±0.2 | 99.0±1.0  | 97.5±0.8  | 94.2±1.6  | 88.7±1.9 | 79.4±2.9 | 68.8±2.4
Best q      | 0.25     | 0.20      | 0.20      | 0.25      | 0.20     | 0.20     | 0.20

Link Prediction Results - Real Data

Table: Link prediction accuracy (%). The best results are in bold and the second are underlined.

           | Direction prediction                          | Existence prediction
Method     | Cornell   | Wisconsin | Cora-ML  | CiteSeer   | Cornell  | Wisconsin | Cora-ML  | CiteSeer
ChebNet    | 71.0±5.5  | 67.5±4.5  | 72.7±1.5 | 68.0±1.6   | 80.1±2.3 | 82.5±1.9  | 80.0±0.6 | 77.4±0.4
GCN        | 56.2±8.7  | 71.0±4.0  | 79.8±1.1 | 68.9±2.8   | 75.1±1.4 | 75.1±1.9  | 81.6±0.5 | 76.9±0.5
APPNP      | 69.5±9.0  | 75.1±3.5  | 83.7±0.7 | 77.9±1.6   | 74.9±1.5 | 75.7±2.2  | 82.5±0.6 | 78.6±0.7
SAGE       | 75.2±11.0 | 72.0±3.5  | 68.2±0.8 | 68.7±1.5   | 79.8±2.4 | 77.3±2.9  | 75.0±0.0 | 74.1±1.0
GIN        | 69.3±6.0  | 74.8±3.7  | 83.2±0.9 | 76.3±1.4   | 74.5±2.1 | 76.2±1.9  | 82.5±0.7 | 77.9±0.7
GAT        | 67.9±11.1 | 53.2±2.6  | 50.0±0.1 | 50.6±0.5   | 77.9±3.2 | 74.6±0.0  | 75.0±0.0 | 75.0±0.0
DGCN       | 80.7±6.3  | 74.5±7.2  | 79.6±1.5 | 78.5±2.3   | 80.0±3.9 | 82.8±2.0  | 82.1±0.5 | 81.2±0.4
DiGraph    | 79.3±1.9  | 82.3±4.9  | 80.8±1.1 | 81.0±1.1   | 80.6±2.5 | 82.8±2.6  | 81.8±0.5 | 82.2±0.6
DiGraphIB  | 79.8±4.8  | 82.0±4.9  | 83.4±1.1 | 82.5±1.3   | 80.5±3.6 | 82.4±2.2  | 82.2±0.5 | 81.0±0.5
MagNet     | 80.7±2.7  | 83.6±2.8  | 86.1±0.9 | 85.1±0.8   | 80.6±3.8 | 82.9±2.6  | 82.8±0.7 | 79.9±0.5
Best q     | 0.10      | 0.05      | 0.05     | 0.15       | 0.25     | 0.25      | 0.05     | 0.05

Conclusion

- Graphs naturally model a variety of phenomena.
- Most GNNs suffer from two limitations: oversmoothing and an inability to process directional information.
- The graph scattering transform avoids oversmoothing via bandpass filters.
- MagNet extends popular spectral GNNs to directed graphs.

THANK YOU!

References I

Asad A. Khalid, Muhammad Tanveer Afzal, and Maisaa I. Abdul Qadir. Citation network visualization of CiteSeer dataset. In 2011 6th International Conference on Computer Sciences and Convergence Information Technology (ICCIT), pages 367–370, 2011.

Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and deep locally connected networks on graphs. In International Conference on Learning Representations (ICLR), 2014.

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.

David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150, 2011.

References II

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2016.

Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

Péter Mernyei and Cătălina Cangea. Wiki-CS: A Wikipedia-based benchmark for graph neural networks. arXiv preprint arXiv:2007.02901, 2020.

Michael Perlmutter, Feng Gao, Guy Wolf, and Matthew Hirn. Understanding graph neural networks with asymmetric geometric scattering transforms. arXiv preprint arXiv:1911.06253, 2019.

Yimeng Min, Frederik Wenkel, and Guy Wolf. Scattering GCN: Overcoming oversmoothness in graph convolutional networks. Advances in Neural Information Processing Systems, 33, 2020a.

References III

Dongmian Zou and Gilad Lerman. Graph convolutional neural networks via scattering. Applied and Computational Harmonic Analysis, 2019.

Fernando Gama, Alejandro Ribeiro, and Joan Bruna. Diffusion scattering transforms on graphs. arXiv preprint arXiv:1806.08829, 2018.

Feng Gao, Guy Wolf, and Matthew Hirn. Geometric scattering for graph data analysis. In International Conference on Machine Learning, pages 2122–2131, 2019.

Yimeng Min, Frederik Wenkel, and Guy Wolf. Geometric scattering attention networks. arXiv preprint arXiv:2010.15010, 2020b.

Alexander Tong, Frederik Wenkel, Kincaid MacDonald, Smita Krishnaswamy, and Guy Wolf. Data-driven learning of geometric scattering networks. arXiv preprint arXiv:2010.02415, 2020a.

Egbert Castro, Andrew Benz, Alexander Tong, Guy Wolf, and Smita Krishnaswamy. Uncovering the folding landscape of RNA secondary structure using deep graph embeddings. In 2020 IEEE International Conference on Big Data (Big Data), pages 4519–4528, 2020. doi: 10.1109/BigData50022.2020.9378305.

References IV

Stéphane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.

Alexandre Bovet and Peter Grindrod. The activity of the far right on Telegram. https://www.researchgate.net/publication/346968575_The_Activity_of_the_Far_Right_on_Telegram_v21, 2020.

Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, and Andrew Lim. Directed graph convolutional network. arXiv preprint arXiv:2004.13970, 2020b.

Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, and Guangyong Chen. Spectral-based graph convolutional network for directed graphs. arXiv preprint arXiv:1907.08990, 2019.

Zekun Tong, Yuxuan Liang, Changsheng Sun, Xinke Li, David S. Rosenblum, and Andrew Lim. Digraph inception convolutional networks. In NeurIPS, 2020c.
