Graph Structure of Neural Networks

Jiaxuan You¹, Jure Leskovec¹, Kaiming He², Saining Xie²
¹Stanford University, ²Facebook AI Research

Neural networks consist of neurons. We know that connections between neurons affect NN performance. But how?


Underneath a NN, there is a graph. We want to find a proper graph representation of NNs to answer:
§ Is there a link between the graph structure and NN performance?
§ If so, what are the structural signatures of well-performing NNs?
§ Can these signatures generalize across tasks and datasets?

Overview: Methodology

§ A novel representation of neural networks: relational graphs

§ Relational graphs can represent diverse neural architectures

§ Tools from network science → Graph structure vs NN performance

[Figure: the same relational graph translated to an MLP and to a ResNet; WS-flex graphs (up to the complete graph) translated to 5-layer MLPs]

Overview: Key Findings

§ Consistent “Sweet Spot” for top NNs across architectures

Top-1 error (lower is better):
5-layer MLP on CIFAR-10:  complete graph 33.34 ± 0.36, best graph 32.05 ± 0.14
8-layer CNN on ImageNet:  complete graph 48.27 ± 0.14, best graph 46.73 ± 0.16
ResNet-34 on ImageNet:    complete graph 25.73 ± 0.07, best graph 24.57 ± 0.10
• Graphs with certain structure measures consistently perform well (controlling for computational budgets)

§ Top artificial NNs are similar to real biological NNs
• The graph structure of the best 5-layer MLP we found is similar to the macaque whole-cortex network

Overview: Methodology

§ A novel representation of neural networks: relational graphs ←
  § Computational vs. relational graphs
  § Neural computation as message exchange on relational graphs


§ Relational graphs can represent diverse neural architectures

§ Tools from network science → Graph structure vs NN performance

Background: NNs as Computational Graphs

[Figure: an MLP model drawn as a computational graph, where each MLP layer is a set of directed edges from the neurons of one layer to the neurons of the next]

Background: Related Work

Existing graph-based architecture design approaches focus on computational graphs

Deep Expander Networks (Prabhu et al., 2018); RandWire (Xie et al., 2019); NAS-Bench-101 (Ying et al., 2019)

These approaches generate bipartite graphs over neurons or directed acyclic graphs over NN layers.

Limitations: NNs as Computational Graphs

Limitations:
• Lack of flexibility
  • Restricted to directed acyclic graphs
• Disconnection with neuroscience
  • Brain networks have flexible structure
  • Bi-directional information exchange

Our Approach: Relational Graphs

Relational graph definition:
• Nodes are neurons
• Edges specify (undirected) connectivity between neurons
• Computation is conducted by message exchange over the graph structure, where a node exchanges messages with its neighbors (one round of message exchange)
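As a concrete illustration (not the authors' code), one round of message exchange can be sketched as below; the function name, the scalar features, and the choice of tanh as the aggregation nonlinearity are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of one round of message exchange on an undirected relational
# graph: `adj` is a symmetric 0/1 adjacency matrix, `x` holds one scalar
# feature per node, and `weights[i, j]` is the weight on edge (i, j).
def message_exchange_round(adj, x, weights):
    n = len(x)
    new_x = np.zeros(n)
    for i in range(n):
        # each neighbor j (self included) sends the message weights[i, j] * x[j];
        # aggregation follows the slides' AGG(.) = sigma(sum(.))
        msgs = [weights[i, j] * x[j] for j in range(n) if adj[i, j] or i == j]
        new_x[i] = np.tanh(sum(msgs))
    return new_x
```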


Computational graph ⟺ relational graph: one layer of the computational graph corresponds to one round of message exchange on the relational graph.

Benefits of Relational Graphs

Benefits:
• Flexibility
  • No restrictions on graph structure (we focus on undirected graphs)
• Connections with neuroscience
  • Bi-directional information exchange

[Figure: biological neural network (macaque whole cortex) vs. artificial neural network (best 5-layer MLP)]

Diverse Relational Graphs

[Figure: two different relational graphs and their corresponding computational graphs (1 layer ⟺ 1 round of message exchange)]

Neural Computation as Message Exchange

Computational graphs have directed message flow; relational graphs have bi-directional message exchange.

Neural Network as Rounds of Message Exchange

A relational graph with 5 rounds of message exchange ⟺ a 5-layer neural network

This is how Graph Neural Networks compute embeddings!

Side Note: Connections with GNNs

Specialty of GNNs:
(1) The graph structure is regarded as the input, not as the neural architecture;
(2) Message functions are shared across all edges to respect the input graph's invariance properties.
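To make the contrast concrete, here is a toy sketch (illustrative only, not from the paper): a GNN reuses one shared message weight on every edge, while the relational-graph view of a neural network gives each edge its own weight.

```python
import numpy as np

def gnn_round(adj, x, shared_w):
    # GNN-style: the same weight is applied on every edge
    return np.array([np.tanh(sum(shared_w * x[j]
                     for j in range(len(x)) if adj[i, j]))
                     for i in range(len(x))])

def relational_round(adj, x, W):
    # relational-graph-style: W[i, j] is a distinct weight per edge
    return np.array([np.tanh(sum(W[i, j] * x[j]
                     for j in range(len(x)) if adj[i, j]))
                     for i in range(len(x))])
```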

Overview: Methodology

§ A novel representation of neural networks: relational graphs

§ Relational graphs can represent diverse neural architectures ←
  § Can represent architectures from MLP to ResNet

[Figure: the same relational graph translated to an MLP (MLP layers 1-2) and to a ResNet (3×3 convs)]

§ Tools from network science → Graph structure vs NN performance

Relational Graphs → Diverse Architectures

The same relational graph → diverse architectures, determined by 4 key components:
• Node features x_i
• Message function f_i(⋅)
• Aggregation function AGG(⋅)
• Number of rounds of message exchange R


Relational graph → 4-layer 4-dim MLP (scalar node features):
• Node feature: x_i ∈ ℝ (scalar)
• Message: f_i(x_j) = w_ij · x_j
• Aggregation: AGG(⋅) = σ(∑ ⋅)
• Rounds: R = 4 (one round per MLP layer)
MLP as relational graph
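A hypothetical sketch of this translation: one MLP layer whose weight matrix is masked by the relational graph, so that block (i, j) is nonzero only when nodes i and j are connected (self-loops included). Function and variable names here are made up for illustration.

```python
import numpy as np

def relational_graph_to_mlp_layer(adj, d, rng):
    """One masked MLP layer: node i's d features connect only to its neighbors."""
    n = adj.shape[0]
    mask = np.kron(adj + np.eye(n), np.ones((d, d)))  # expand node-level mask to feature blocks
    return rng.standard_normal((n * d, n * d)) * mask

# Usage: a 4-node complete relational graph with 1-dim node features and
# R = 4 rounds of message exchange corresponds to a 4-layer 4-dim MLP.
rng = np.random.default_rng(0)
adj = np.ones((4, 4)) - np.eye(4)
x = np.ones(4)
for _ in range(4):  # one round of message exchange per layer
    x = np.maximum(relational_graph_to_mlp_layer(adj, d=1, rng=rng) @ x, 0.0)  # sigma = ReLU
```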


Relational graph → 4-layer 64-dim MLP (vector node features):
• Node feature: x_i ∈ ℝ^16 (the 64-dim layer split evenly over the 4 nodes)
• Message: f_i(x_j) = W_ij x_j (a weight matrix)
• Aggregation: AGG(⋅) = σ(∑ ⋅)
• Rounds: R = 4
MLP as relational graph


Relational graph → 4-layer 65-dim MLP (variable node dimensions):
• Node feature: x_i ∈ ℝ^{d_i}, where d_i may differ across nodes (e.g. 65 = 17 + 16 + 16 + 16)
• Message: f_i(x_j) = W_ij x_j
• Aggregation: AGG(⋅) = σ(∑ ⋅)
• Rounds: R = 4
MLP as relational graph


Relational graph → ResNet-34 (tensor node features, 64 channels):
• Node feature: X_i is a tensor of feature maps (e.g. a 64-channel layer split across nodes)
• Message: f_i(X_j) = W_ij ∗ X_j, where W_ij is a 3×3 conv kernel and ∗ denotes convolution
• Aggregation: AGG(⋅) = σ(∑ ⋅)
• Rounds: R = 34
ResNet-34 as relational graph
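A hedged PyTorch sketch (not the paper's implementation) of one round of message exchange with convolutional messages: every edge of the relational graph carries its own 3×3 convolution, and each node sums the messages from its neighbors. The class name and interface are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConvMessageRound(nn.Module):
    """One round of message exchange where messages are 3x3 convolutions."""
    def __init__(self, adj, channels_per_node):
        super().__init__()
        self.adj = adj  # symmetric 0/1 numpy array, self-loops included
        n = adj.shape[0]
        self.convs = nn.ModuleDict({
            f"{i}_{j}": nn.Conv2d(channels_per_node, channels_per_node, 3, padding=1)
            for i in range(n) for j in range(n) if adj[i, j]
        })

    def forward(self, xs):  # xs: list of per-node feature maps, each [B, C, H, W]
        out = []
        for i in range(len(xs)):
            msgs = [self.convs[f"{i}_{j}"](xs[j])
                    for j in range(len(xs)) if self.adj[i, j]]
            out.append(torch.relu(torch.stack(msgs).sum(dim=0)))
        return out
```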

Overview

§ A novel representation of neural networks: relational graphs

§ Relational graphs can represent diverse neural architectures

§ Network science → Graph structure vs NN performance ←

  § Graph measures that characterize graph properties
  § Graph generators that generate diverse graphs (WS-flex)
  § Control of the computational budget

Measuring Graph Structure

§ Graph measures:
  § Global: average path length (L): the average shortest path distance between any pair of nodes
  § Local: clustering coefficient (C): a measure of the degree to which nodes in a graph tend to cluster together

[Figure: example graphs with L = 2.3, 2.0, 1.4 and C = 0, 0.1, 0.5]
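For reference, both measures are available in NetworkX; the small wrapper below is an assumed convenience helper, not part of the paper's code.

```python
import networkx as nx

def graph_measures(G):
    C = nx.average_clustering(G)             # local: clustering coefficient
    L = nx.average_shortest_path_length(G)   # global: average path length
    return C, L

# Usage on a small-world graph with 64 nodes and average degree 8:
G = nx.connected_watts_strogatz_graph(n=64, k=8, p=0.1, seed=0)
print(graph_measures(G))
```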

Generating Diverse Graphs

§ WS-flex graphs have much better coverage of the space of graph measures (C, L)

[Figure: graphs sampled from classic graph families vs. the proposed WS-flex graphs, plotted in the (C, L) space]
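A rough sketch of a WS-flex-style generator: the Watts-Strogatz construction relaxed so that the average degree k need not be an integer. The exact construction in the paper may differ in details (e.g. how leftover edges are placed), so treat this as an approximation.

```python
import random
import networkx as nx

def ws_flex(n, k, p, seed=0):
    rng = random.Random(seed)
    G = nx.empty_graph(n)
    target_edges = round(n * k / 2)        # k is the (possibly fractional) average degree
    hop = 1
    while G.number_of_edges() + n <= target_edges:   # add full "rings" while they fit
        for i in range(n):
            G.add_edge(i, (i + hop) % n)
        hop += 1
    extra = [(i, (i + hop) % n) for i in range(n)]   # place the remaining edges at random
    rng.shuffle(extra)
    for u, v in extra:
        if G.number_of_edges() >= target_edges:
            break
        G.add_edge(u, v)
    for u, v in list(G.edges()):                     # Watts-Strogatz-style rewiring
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and not G.has_edge(u, w):
                G.remove_edge(u, v)
                G.add_edge(u, w)
    return G
```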


§ Examples of WS-flex graphs
[Figure: sample WS-flex graphs, ranging from sparse graphs up to the complete graph]

We Control Computational Budget

Relational graphs with the same layer width (total 8-dim, 1-dim per node) but different structures have different computational FLOPs, since FLOPs = O(number of edges): FLOPs_A < FLOPs_B.

By controlling each node's dimension (e.g. a 10-dim layer with some 2-dim nodes vs. an 8-dim layer with all 1-dim nodes), computational FLOPs are matched: FLOPs_A ≈ FLOPs_B.
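A toy sketch of the budget-matching idea (illustrative names and greedy rule, not the paper's exact procedure): per-round FLOPs scale with the per-edge products of node dimensions, so sparser graphs get larger node dimensions until the budgets roughly agree.

```python
import numpy as np

def round_flops(adj, dims):
    """FLOPs of one message-exchange round ~ sum over edges of d_i * d_j."""
    n = adj.shape[0]
    return sum(dims[i] * dims[j]
               for i in range(n) for j in range(n) if adj[i, j] or i == j)

def match_budget(adj, base_dim, target_flops):
    """Greedily grow node dimensions until the FLOPs budget is (roughly) met."""
    dims = np.full(adj.shape[0], base_dim)
    i = 0
    while round_flops(adj, dims) < target_flops:
        dims[i % len(dims)] += 1
        i += 1
    return dims
```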

Experimental Setup

§ 5-layer 512-dim MLPs on CIFAR-10
  § 3942 graphs, results averaged over 5 seeds
§ CNNs, ResNet families & EfficientNet-B0 on ImageNet
  § 52 graphs per experiment, results averaged over 3 seeds

§ Computational budgets in all experiments are controlled

Overview: Findings

§ Finding 1: Consistent Sweet Spot for top NNs across architectures

§ Finding 2: NN Performance as a smooth function over graph measures

§ Finding 3: Sweet spot can be quickly identified

§ Finding 4: Top artificial NNs are similar to real NNs

Finding 1: Sweet Spot for Top NNs

WS-flex graphs are translated to 5-layer MLPs on CIFAR-10.
Best graph: (n=64, k=8.19, p=0.13). Complete graph top-1 error: 33.34 ± 0.36; best graph top-1 error: 32.05 ± 0.14.


5-layer MLP on CIFAR-10. Best graph: (n=64, k=8.19, p=0.13).
• Bin the results over the graphs (by their graph measures) and identify the best bin.


• Use a t-test to statistically compare the performance of all bins vs. the best bin: bins with p-value < 0.05 are significantly worse; bins with p-value > 0.05 are not significantly worse.


• The bins that are not significantly worse than the best bin form the "sweet spot".
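A hedged sketch of this analysis: bin graphs by their (C, L) measures and t-test each bin's errors against the best bin. The 0.05 threshold follows the slides; the bin count and helper names are illustrative.

```python
import numpy as np
from scipy import stats

def sweet_spot(C, L, err, n_bins=8, alpha=0.05):
    c_bins = np.digitize(C, np.linspace(C.min(), C.max(), n_bins))
    l_bins = np.digitize(L, np.linspace(L.min(), L.max(), n_bins))
    bins = {}
    for cb, lb, e in zip(c_bins, l_bins, err):
        bins.setdefault((cb, lb), []).append(e)
    best = min(bins, key=lambda key: np.mean(bins[key]))
    sweet = [best]
    for key, errs in bins.items():
        if key == best:
            continue
        _, p = stats.ttest_ind(errs, bins[best])  # not significantly worse -> in the sweet spot
        if p > alpha:
            sweet.append(key)
    return best, sweet
```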

Finding 1: Consistent Sweet Spot for Top NNs

5-layer MLP on CIFAR-10:  complete graph 33.34 ± 0.36, best graph 32.05 ± 0.14
8-layer CNN on ImageNet:  complete graph 48.27 ± 0.14, best graph 46.73 ± 0.16
ResNet-34 on ImageNet:    complete graph 25.73 ± 0.07, best graph 24.57 ± 0.10

A consistent sweet spot: C ∈ [0.43, 0.50], L ∈ [1.82, 2.28]


The 5-layer MLP (2.89e6 FLOPS) and ResNet-34 (3.66e9 FLOPS) differ by ~1000× in FLOPS, but have a similar sweet spot!


Quantitative consistency across architectures


Finding 2: NN performance as a smooth function over graph measures

[Figure: (a) 5-layer MLP on CIFAR-10, (b) graph measures vs. performance, (c) ResNet-34 on ImageNet, (d) graph measures vs. performance]



Finding 3: Sweet spot can be quickly identified

(1) Few graphs are needed to locate the sweet spot: sampling 52 of the 3942 graphs gives correlation = 0.93 with the full set (5-layer MLP on CIFAR-10).
(2) Few epochs are needed to locate the sweet spot: training for 3 epochs instead of 100 gives correlation = 0.90 (ResNet-34 on ImageNet).
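A tiny illustration of how such a correlation is computed; the error values below are made up, only the recipe (Pearson correlation between the cheap proxy and the full protocol) matches the slide.

```python
import numpy as np

cheap = np.array([33.1, 32.4, 32.9, 34.0, 32.2])  # e.g. errors after 3 epochs (made-up numbers)
full = np.array([32.8, 32.1, 32.6, 33.7, 32.0])   # errors after 100 epochs (made-up numbers)
print(f"correlation = {np.corrcoef(cheap, full)[0, 1]:.2f}")
```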


Finding 4: Top artificial NNs are similar to real NNs


(1) The best graphs we found are similar to biological neural networks.
[Figure: biological neural network (macaque whole cortex) vs. artificial neural network (best 5-layer MLP)]


(2) Translating biological networks to MLPs yields good performance.


More advanced biological networks → better-performing deep networks?

Conclusions

§ A new transition from studying conventional computational architectures to studying the graph structure of neural networks.
§ Well-established methodologies from network science and neuroscience could contribute to understanding and designing deep neural networks.

[Figure: summary — neural networks as relational graphs (1 layer ⟺ 1 round of message exchange); WS-flex graphs translated to 5-layer MLPs on CIFAR-10; best graph (C=0.45, L=2.48) with top-1 error 32.05 ± 0.14 vs. complete graph 33.34 ± 0.36]

Graph Structure of Neural Networks (ICML 2020)

Jiaxuan You Jure Leskovec Kaiming He Saining Xie
