Graph Structure of Neural Networks
Jiaxuan You 1   Jure Leskovec 1   Kaiming He 2   Saining Xie 2

1 Department of Computer Science, Stanford University   2 Facebook AI Research. Correspondence to: Jiaxuan You <[email protected]>, Saining Xie <[email protected]>.

Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

Neural networks are often represented as graphs of connections between neurons. However, despite their wide use, there is currently little understanding of the relationship between the graph structure of a neural network and its predictive performance. Here we systematically investigate how the graph structure of neural networks affects their predictive performance. To this end, we develop a novel graph-based representation of neural networks called relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. Using this representation we show that: (1) a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance; (2) a neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph; (3) our findings are consistent across many different tasks and datasets; (4) the sweet spot can be identified efficiently; (5) top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks. Our work opens new directions for the design of neural architectures and the understanding of neural networks in general.

1. Introduction

Deep neural networks consist of neurons organized into layers and connections between them. The architecture of a neural network can be captured by its "computational graph", where neurons are represented as nodes and directed edges link neurons in different layers. Such a graphical representation demonstrates how the network passes and transforms information from its input neurons, through hidden layers, all the way to the output neurons (McClelland et al., 1986).

While it has been widely observed that the performance of neural networks depends on their architecture (LeCun et al., 1998; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016), there is currently little systematic understanding of the relation between a neural network's accuracy and its underlying graph structure. This is especially important for neural architecture search, which today exhaustively searches over all possible connectivity patterns (Ying et al., 2019). From this perspective, several open questions arise: Is there a systematic link between the network structure and its predictive performance? What are structural signatures of well-performing neural networks? How do such structural signatures generalize across tasks and datasets? Is there an efficient way to check whether a given neural network is promising or not?

Establishing such a relation is both scientifically and practically important because it would have direct consequences for designing more efficient and more accurate architectures. It would also inform the design of new hardware architectures that execute neural networks. Understanding the graph structures that underlie neural networks would also advance the science of deep learning.

However, establishing the relation between network architecture and its accuracy is nontrivial, because it is unclear how to map a neural network to a graph (and vice versa). The natural choice would be to use the computational graph representation, but it has many limitations: (1) lack of generality: computational graphs are constrained by the allowed graph properties, e.g., these graphs have to be directed and acyclic (DAGs), bipartite at the layer level, and single-in-single-out at the network level (Xie et al., 2019). This limits the use of the rich tools developed for general graphs.
(2) Disconnection with biology/neuroscience: biological neural networks have a much richer and less templatized structure (Fornito et al., 2013). There are information exchanges, rather than just single-directional flows, in brain networks (Stringer et al., 2018). Such biological or neurological models cannot be simply represented by directed acyclic graphs.

Figure 1: Overview of our approach. (a) A layer of a neural network can be viewed as a relational graph where we connect nodes that exchange messages. (b) More examples of neural network layers and relational graphs. (c) We explore the design space of relational graphs according to their graph measures, including average path length and clustering coefficient, where the complete graph corresponds to a fully-connected layer. (d) We translate these relational graphs to neural networks and study how their predictive performance depends on the graph measures of their corresponding relational graphs.

Here we systematically study the relationship between the graph structure of a neural network and its predictive performance. We develop a new way of representing a neural network as a graph, which we call relational graph. Our key insight is to focus on message exchange, rather than just on directed data flow. As a simple example, for a fixed-width fully-connected layer, we can represent one input channel and one output channel together as a single node, and an edge in the relational graph represents the message exchange between the two nodes (Figure 1(a)). Under this formulation, using an appropriate message exchange definition, we show that the relational graph can represent many types of neural network layers (a fully-connected layer, a convolutional layer, etc.), while getting rid of many constraints of computational graphs (such as directed, acyclic, bipartite, single-in-single-out). One neural network layer corresponds to one round of message exchange over a relational graph, and to obtain deep networks, we perform message exchange over the same graph for several rounds. Our new representation enables us to build neural networks that are richer and more diverse and analyze them using well-established tools of network science (Barabási & Pósfai, 2016).
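To make this correspondence concrete, here is a minimal sketch (not the authors' implementation) of the fixed-width MLP case: each node holds a scalar feature, the message function is multiplication by a per-edge weight, and aggregation is a sum followed by a ReLU nonlinearity. The graph, sizes, and function names below are illustrative assumptions.

```python
import numpy as np

def message_exchange_round(x, adjacency, weights):
    """One round of message exchange: x_i <- sigma(sum over j in N(i) of w_ij * x_j)."""
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        total = 0.0
        for j in range(n):
            if adjacency[i, j]:                 # j is a neighbour of i (self-edges included)
                total += weights[i, j] * x[j]   # message function: scalar multiplication
        out[i] = max(total, 0.0)                # aggregation: sum followed by ReLU
    return out

rng = np.random.default_rng(0)
adjacency = np.array([[1, 1, 0, 1],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1],
                      [1, 0, 1, 1]], dtype=bool)   # a 4-node relational graph with self-edges
x = rng.standard_normal(4)                         # scalar feature per node
for _ in range(5):                                 # a 5-layer fixed-width MLP = 5 rounds over the same graph
    weights = rng.standard_normal((4, 4))          # in practice, each round has its own learnable weights
    x = message_exchange_round(x, adjacency, weights)
print(x)
```

In this sketch, a complete graph with self-edges recovers an ordinary fully-connected layer, while sparser relational graphs yield sparser connectivity patterns at the same depth.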
We then design a graph generator named WS-flex that allows us to systematically explore the design space of neural networks (i.e., relational graphs). Based on insights from neuroscience, we characterize neural networks by the clustering coefficient and average path length of their relational graphs (Figure 1(c)). Furthermore, our framework is flexible and general, as we can translate relational graphs into diverse neural architectures, including Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), ResNets, etc., with controlled computational budgets (Figure 1(d)).

Using standard image classification datasets CIFAR-10 and ImageNet, we conduct a systematic study on how the architecture of neural networks affects their predictive performance. We make several important empirical observations:

• A "sweet spot" of relational graphs leads to neural networks with significantly improved performance under controlled computational budgets (Section 5.1).
• A neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph (Section 5.2).
• Our findings are consistent across many architectures (MLPs, CNNs, ResNets, EfficientNet) and tasks (CIFAR-10, ImageNet) (Section 5.3).
• The sweet spot can be identified efficiently. Identifying a sweet spot only requires a few samples of relational graphs and a few epochs of training (Section 5.4).
• Well-performing neural networks have graph structure surprisingly similar to those of real biological neural networks (Section 5.5).

Our results have implications for designing neural network architectures, advancing the science of deep learning, and improving our understanding of neural networks in general.

2. Neural Networks as Relational Graphs

To explore the graph structure of neural networks, we first introduce the concept of our relational graph representation and its instantiations. We demonstrate how our representation can capture diverse neural network architectures under a unified framework. Using the language of graphs in the context of deep learning helps bring the two worlds together and establishes a foundation for our study.

2.1. Message Exchange over Graphs

We start by revisiting the definition of a neural network from the graph perspective. We define a graph G = (V, E) by its node set V = {v_1, ..., v_n} and edge set E ⊆ {(v_i, v_j) | v_i, v_j ∈ V}. We assume each node v has a node feature scalar/vector x_v.

Table 1: Diverse neural architectures expressed in the language of relational graphs.

| | Fixed-width MLP | Variable-width MLP | ResNet-34 | ResNet-34-sep | ResNet-50 |
|---|---|---|---|---|---|
| Node feature x_i | Scalar: 1 dimension of data | Vector: multiple dimensions of data | Tensor: multiple channels of data | Tensor: multiple channels of data | Tensor: multiple channels of data |
| Message function f_i(·) | Scalar multiplication | (Non-square) matrix multiplication | 3×3 Conv | 3×3 depth-wise and 1×1 Conv | 3×3 and 1×1 Conv |
| Aggregation function AGG(·) | σ(∑(·)) | σ(∑(·)) | σ(∑(·)) | σ(∑(·)) | σ(∑(·)) |
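As a rough illustration of how the design space in Figure 1(c) is parameterized, the sketch below samples small-world graphs and computes their clustering coefficient and average path length with networkx. It uses the standard Watts-Strogatz generator as a stand-in: the paper's WS-flex generator additionally relaxes the constraint that all nodes have the same degree, which this sketch does not reproduce, and all parameter values here are illustrative.

```python
import networkx as nx

def graph_measures(n_nodes=64, k=8, p=0.1, seed=0):
    """Sample a connected small-world graph and return (clustering coefficient, average path length)."""
    g = nx.connected_watts_strogatz_graph(n_nodes, k, p, seed=seed)
    return nx.average_clustering(g), nx.average_shortest_path_length(g)

# Sweeping the rewiring probability covers a range of (C, L) values; each sampled
# graph could then be translated into a neural network and trained.
for p in (0.0, 0.1, 0.5, 1.0):
    c, l = graph_measures(p=p, seed=1)
    print(f"p={p:.1f}  clustering C={c:.3f}  avg path length L={l:.3f}")
```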