
Neural Trees for Learning on Graphs
Rajat Talak, Siyi Hu, Lisa Peng, and Luca Carlone

Abstract—Graph Neural Networks (GNNs) have emerged as a flexible and powerful approach for learning over graphs. Despite this success, existing GNNs are constrained by their local message-passing architecture and are provably limited in their expressive power. In this work, we propose a new GNN architecture – the Neural Tree. The neural tree architecture does not perform message passing on the input graph, but on a tree-structured graph, called the H-tree, that is constructed from the input graph. Nodes in the H-tree correspond to subgraphs in the input graph, and they are reorganized in a hierarchical manner such that a parent-node of a node in the H-tree always corresponds to a larger subgraph in the input graph. We show that the neural tree architecture can approximate any smooth probability distribution function over an undirected graph, as well as emulate the junction tree algorithm. We also prove that the number of parameters needed to achieve an ε-approximation of the distribution function is exponential in the treewidth of the input graph, but linear in its size. We apply the neural tree to semi-supervised node classification in 3D scene graphs, and show that these theoretical properties translate into significant gains in prediction accuracy over the more traditional GNN architectures.

Index Terms—graph neural networks, universal function approximation, tree-decomposition, semi-supervised node classification, 3D scene graphs.

(arXiv:2105.07264v1 [cs.LG] 15 May 2021. The authors are with the Laboratory of Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA 02139, USA. Corresponding author: Rajat Talak (email: [email protected]). This work was partially funded by the Office of Naval Research under the ONR RAIDER program (N00014-18-1-2828).)

I. INTRODUCTION

Graph-structured learning problems arise in several disciplines, including biology (e.g., molecule classification [Duvenaud et al., 2015]), computer vision (e.g., action recognition [Guo et al., 2018], image classification [Satorras and Estrach, 2018], shape and pose estimation [Kolotouros et al., 2019]), computer graphics (e.g., mesh and point cloud classification and segmentation [Hanocka et al., 2019], [Li et al., 2019]), and social networks (e.g., fake news detection [Monti et al., 2019]), among others [Bronstein et al., 2017]. In this landscape, Graph Neural Networks (GNN) have gained popularity as a flexible and effective approach for regression and classification over graphs.

Despite this growing research interest, recent work has pointed out several limitations of existing GNN architectures [Xu et al., 2019], [Morris et al., 2019], [Maron et al., 2019a], [Bouritsas et al., 2021]. Local message passing GNNs are no more expressive than the Weisfeiler-Lehman (WL) graph isomorphism test [Xu et al., 2019], neither can they serve as universal approximators to all G-invariant functions, i.e., functions defined over a graph G that remain unchanged with node permutation. [Chen et al., 2019] proves an equivalence between the ability to do graph isomorphism testing and the ability to approximate any G-invariant function.

Various GNN architectures have been proposed, that go beyond local message passing, in order to improve expressivity. Graph isomorphism testing, G-invariant function approximation, and the generalized k-order WL (k-WL) tests have served as the end objectives and guided recent progress of this inquiry. For example, the k-order linear GNN [Maron et al., 2019b] and the k-order folklore GNN [Maron et al., 2019a] have expressive powers equivalent to the k-WL and (k+1)-WL test, respectively [Azizian and Lelarge, 2021]. While these architectures can theoretically approximate any G-invariant function (as k → ∞), they use k-order tensors for representations, rendering them impractical for any k ≥ 3.

There is a need for a new way to look at constructing GNN architectures, with better end objectives to guide theoretical progress. Such an attempt can result in new and expressive GNNs that are provably tractable – if not in general, at least in reasonably constrained settings.

A GNN, by its very definition, operates on graph-structured data. The graph structure of the data determines the inter-dependency between nodes and their features. Probabilistic graphical models present a reasonable and well-established way of articulating and working with such inter-dependencies in the data. Prior to the advent of neural networks, inference algorithms on such graphical models were successfully applied to many real-world problems. Therefore, we pose that a GNN architecture operating on a graph should have the expressive power of at least a probabilistic graphical model, i.e., it should be able to approximate any distribution defined by a probabilistic graphical model.

This is not a trivial requirement, as exact inference (akin to learning the distribution or its marginals) on a probabilistic graphical model, without any structural constraints on the input graph, is known to be an NP-hard problem [Cooper, 1990]. Even approximate inference on a probabilistic graphical model, without structural constraints, is known to be NP-hard [Roth, 1996]. A common trick, for exact inference, consists in constructing a junction tree for an input graph and performing message passing on the junction tree instead. In the junction tree, each node corresponds to a subset of nodes of the input graph. The junction tree algorithm remains tractable for graphs with bounded treewidth, while [Chandrasekaran et al., 2008] shows that treewidth is the only structural parameter, bounding which, allows for tractable inference on graphical models.

This paper proposes a novel GNN architecture – the Neural Tree. Neural trees do not perform message passing on the input graph, but on a tree-structured graph, called the H-tree, that is constructed from the input graph. Each node in the H-tree corresponds to a subgraph of the input graph. These subgraphs are arranged hierarchically in the H-tree such that a parent-node, of a node in the H-tree, will always correspond to a larger subgraph in the input graph. Thus, only the leaf nodes in the H-tree correspond to singleton subsets (i.e., individual nodes) of the input graph. The H-tree is constructed by recursively computing tree decompositions of the input graph and its subgraphs, and attaching them to one another to form a hierarchy. Neural message passing on the H-tree generates representations for all the nodes and important subgraphs of the input graph.

We show that the neural tree architecture can approximate any smooth probability distribution function defined over a given undirected graph. We also bound the number of parameters required by a neural tree architecture to obtain an ε-approximation of an arbitrary (smooth) distribution function over a graphical model. We show that the number of parameters increases exponentially in the treewidth of the input graph, but only linearly in the input graph's size. Thus, for graphs with bounded treewidth, the neural tree architecture can tractably approximate any smooth distribution function.

The proposed neural tree architecture has a message passing structure that can emulate the junction tree algorithm. This is unlike the standard, local message passing GNN architectures. [Xu et al., 2020] suggests that such algorithmic alignment of neural architectures leads to lower sample complexity and better generalizability. Thus, for the task of inference on graphs, the neural tree architecture may also provide better generalizability.

We use the neural tree architecture for semi-supervised node classification in 3D scene graphs. A 3D scene graph [Armeni et al., 2019], [Rosinol et al., 2020] is a hierarchical graphical model of a 3D environment, where nodes correspond to entities in the environment (e.g., objects, rooms, buildings), while edges correspond to relations among these entities (e.g., "room i contains object j"). In many robotics and computer vision applications, the scene graph is partially labeled (e.g., a robot can predict the label of a subset of known objects, but may not have direct access to the labels of rooms and buildings). We use our neural trees to predict the missing node labels. Our experiments on 3D scene graphs demonstrate that neural trees outperform standard, local message passing GNNs by a large margin.

A. Organization

The paper is organized as follows. Section II discusses related work. Section III states the problem of semi-supervised node classification and provides other preliminaries. Section IV defines G-compatible functions and argues that approximating G-compatible functions is equivalent to approximating any probability distribution on a probabilistic graphical model. Section V describes the construction of the H-tree and the neural tree architecture. Section VI derives the approximation result. Section VII presents the experimental results on 3D scene graphs. Section VIII concludes the paper.

II. RELATED WORK

Graph Neural Networks. GNNs were proposed in [Gori et al., 2005], [Scarselli et al., 2008] as a generalization of the popular Recurrent Neural Networks to graph-structured data. The development of new architectures, such as Graph Convolutional Networks (GCN) [Kipf and Welling, 2017], Message Passing Neural Networks (MPNN) [Gilmer et al., 2017], GraphSAGE [Hamilton et al., 2017], and Graph Attention Networks (GAT) [Veličković et al., 2018], [Lee et al., 2019], and their successful applications have led to a renewed interest towards GNNs. In particular, GCNs extend convolutional neural networks to graph-structured data, and have been developed in several works [Henaff et al., 2015], [Defferrard et al., 2016], [Kipf and Welling, 2017], [Bronstein et al., 2017], where they are often motivated as a first-order approximation to spectral convolutions.

The GCN architecture has well-known limitations. For example, the GCN architecture treats all the neighbors of a node in the same way, irrespective of their features. GAT was proposed to address this limitation [Veličković et al., 2018], [Lee et al., 2019], [Busbridge et al., 2019]. In it, different weights are assigned to different edges, and these weights are also trained. [Hamilton et al., 2017] propose a GNN architecture to efficiently generate node embeddings for a graph. [Gilmer et al., 2017] propose a message-passing neural network architecture that encompasses all the pre-existing GNN models, and show state-of-the-art results on a molecular property prediction benchmark.

Expressive Power of GNNs. [Xu et al., 2019] and [Morris et al., 2019] show that local-message-passing GNNs can be as expressive as the 1-Weisfeiler-Lehman (WL) test in distinguishing between two non-isomorphic graphs. [Garg et al., 2020] further show that local-message-passing GNNs cannot compute simple graph properties such as longest or shortest cycle, diameter, or clique information. Many GNN architectures have been proposed to circumvent such problems of limited expressivity. [Zhang et al., 2020] show the limitations of GNNs in representing probabilistic logic reasoning. [Morris et al., 2019] propose a k-GNN, with an architecture inspired by the k-WL test, which is shown to be provably more powerful than local-message-passing GNNs. In k-GNNs, message passing is performed among a selected subset of nodes in the input graph. GNNs with expressive power equivalent to the k-WL test have also been proposed in [Maron et al., 2019a]. [Bouritsas et al., 2021] propose an architecture based on subgraph isomorphism counting. It is generally understood that to improve the expressivity of GNNs one has to extract features corresponding to important subgraphs or node subsets, and operate on them. [Ying et al., 2018] propose a hierarchical architecture that pools a representation vector from a subset of nodes in a graph, at every layer. [Jin et al., 2018] propose a junction-tree-based message passing GNN for molecular graph generation, where message passing occurs over clique nodes. This is unlike the proposed neural tree, in which we construct a hierarchy of tree decompositions and prove a novel approximation result.

[Scarselli et al., 2009] introduce the notion of unfolding equivalence and derive a universal approximation result for graph neural networks. While several questions regarding function approximation abilities and generalization of GNNs remain open, some progress has been made in recent years. [Garg et al., 2020] prove generalization bounds for message passing GNNs. [Scarsellia et al., 2018] obtain Vapnik–Chervonenkis dimension upper bounds for GNNs.

Scene Graphs. Scene graphs are a popular model to abstract information in images or model 3D environments. 2D scene graphs have been used in image retrieval [Johnson et al., 2015], caption generation [Karpathy and Fei-Fei, 2015], [Anderson et al., 2016], visual question answering [Ren et al., 2015], [Krishna et al., 2016], and relationship detection [Lu et al., 2016]. GNNs are a popular tool for joint object label and/or relationship inference on scene graphs [Xu et al., 2017], [Li et al., 2017], [Yang et al., 2018], [Zellers et al., 2017]. Recently, there has been a growing interest towards 3D scene graphs, which are constructed from 3D data, such as meshes [Armeni et al., 2019], point clouds [Wald et al., 2020], or raw sensor data [Kim et al., 2019], [Rosinol et al., 2020]. GNNs have been very recently applied to 3D scene graphs for scene layout prediction [Wald et al., 2020] or object search [Kurenkov et al., 2020].

III. PROBLEM STATEMENT AND PRELIMINARIES

In this section, we state the node classification problem, review standard graph neural networks, and introduce the notation used throughout the paper.

Problem. We focus on the standard problem of semi-supervised node classification [Kipf and Welling, 2017]. We are given a large graph G = (V, E) along with node features X = (x_v)_{v ∈ V}, where x_v denotes the node feature of node v ∈ V. The graph is not necessarily connected. A subset of nodes A ⊂ V in G is labeled, i.e., {z_v ∈ L | v ∈ A} is given; here z_v denotes the label for node v and L the finite set of label classes. We need to design a model that correctly predicts the labels of all the unlabeled nodes v ∈ V \ A in the graph G.

Graph Neural Networks (GNN). Various GNN architectures have been successfully applied to solve the node classification problem [Hamilton et al., 2017], [Kipf and Welling, 2017], [Veličković et al., 2018], [Xu et al., 2019], [Jin et al., 2018]. Standard GNN architectures construct representation vectors for each node in G by iteratively aggregating the representation vectors of its neighboring nodes. At iteration t, the representation vector of node v is generated as follows:

  h_v^t = AGG_t( h_v^{t-1}, { h_u^{t-1} | u ∈ N_G(v) } ),   (1)

with h_v^0 := x_v for all v ∈ V, where N_G(v) denotes the set of neighbors of node v in graph G. This process of sharing and aggregating representation vectors among neighboring nodes in G is often called message passing. The procedure runs for a predetermined number of iterations T. The node labels are then generated from the representation vectors at the final iteration T. Node labels are extracted as

  y_v = READ( h_v^T ),   (2)

for all v ∈ V. The functions AGG_t and READ are modeled as single or multi-layer perceptrons.

Notation. All graphs in this paper are undirected and simple, i.e., they do not contain multiple edges between two nodes. For a graph G, we also use V(G) and E(G) to denote the set of nodes and edges, respectively. We use G[A] to denote the subgraph of G induced by a subset of nodes A ⊂ V(G). |A| denotes the size of the set A. As described, we use x_v to denote the feature vector of v, while the space of all node features of v is denoted by X_v. The |V|-tuple of all node features of G is denoted by X = (x_v)_{v ∈ V}. For a set of nodes A ⊂ V, the |A|-tuple of node features, corresponding to nodes A, is denoted by X_A = (x_v)_{v ∈ A}.

IV. GRAPH COMPATIBLE FUNCTIONS

We first define a class of functions called graph compatible functions, or G-compatible functions. These functions will play an important role throughout the paper.

Definition 1 (G-compatible functions): We say that a function f : ×_{v ∈ V} X_v → R is compatible with graph G, or G-compatible, if it can be factorized as

  f(X) = Σ_{C ∈ C(G)} θ_C(x_C),   (3)

where C(G) denotes the collection of all maximal cliques in G and θ_C is some function that maps ×_{v ∈ C} X_v (the set of node features in the clique C) to a real number.

Compatible functions arise in probabilistic graphical models. The joint probability distribution of a probabilistic graphical model, on an undirected graph G, is given by

  p(X | G) = Π_{C ∈ C(G)} ψ_C(x_C),   (4)

where C(G) is the collection of maximal cliques in G and ψ_C are some functions, often referred to as clique potentials [Jordan, 2002], [Koller and Friedman, 2009]. This can be written as

  p(X | G) = exp{ f(X) },   (5)

where f(X) is the compatible function in (3) with θ_C(x_C) = log ψ_C(x_C). Thus, the ability to approximate any compatible function is equivalent to the ability to approximate any distribution function on a probabilistic graphical model.

A. Examples of G-Compatible Functions

Compatible functions arise naturally when performing inference on probabilistic graphical models. Here we provide two examples where we have to learn graph compatible functions to compute maximum likelihood estimates over graphs.

Graph Classification. Given a graph G and its label y ∈ L, suppose that the node features X are distributed according to a probabilistic graphical model on the undirected graph G. This induces a natural correlation between the observed node features, which is dictated by the graph G. Then, the conditional probability density p(X | y, G) of the node features X, given label y and graph G, is given by

  p(X | y, G) = Π_{C ∈ C(G)} ψ_C(x_C, y),   (6)

where C(G) is the set of all maximal cliques in graph G and ψ_C are the clique potentials. A maximum likelihood estimator for the graph label will predict:

  ŷ = arg max_{y ∈ L} log p(X | y, G)   (7)
    = arg max_{y ∈ L} Σ_{C ∈ C(G)} log ψ_C(x_C, y),   (8)

which is a compatible function on graph G. In practice, we do not know the functions ψ_C or the conditional distribution p(X | y, G), and will have to learn them from the data to make the predictions in (8) feasible.
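To make Definition 1 concrete, the following is a minimal Python sketch (ours, not part of the paper) of how a G-compatible function as in (3) can be evaluated with networkx; the graph, features, and the placeholder potential theta are purely illustrative.

```python
import networkx as nx

def evaluate_compatible(G, X, theta):
    """Evaluate f(X) = sum over maximal cliques C of theta_C(x_C), as in (3).

    G     : undirected networkx.Graph
    X     : dict mapping node -> feature x_v
    theta : callable (clique_tuple, features_tuple) -> float, standing in
            for the clique functions theta_C (illustrative placeholder).
    """
    total = 0.0
    for clique in nx.find_cliques(G):        # enumerates the maximal cliques C(G)
        C = tuple(sorted(clique))
        x_C = tuple(X[v] for v in C)         # node features restricted to the clique
        total += theta(C, x_C)
    return total

# Toy usage: a triangle plus a pendant node, with scalar features.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])
X = {1: 0.1, 2: 0.5, 3: 0.2, 4: 0.9}
theta = lambda C, x_C: sum(x_C) / len(C)     # placeholder clique potential
print(evaluate_compatible(G, X, theta))
```

In a learning setting the potentials theta_C would of course not be given; the point of Sections V-VI is that the neural tree can learn such a factorized function from data.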

Node Classification. Given a graph G and node labels y = {y_v}_{v ∈ V(G)}, suppose the node features X are distributed according to a probabilistic graphical model on graph G. We, therefore, have

  p(X | y, G) = Π_{C ∈ C(G)} ψ_C(x_C, y).   (9)

A maximum likelihood estimator that estimates the node labels y by observing the node features will predict:

  ŷ = arg max_{y_v ∈ L} log p(X | y, G)   (10)
    = arg max_{y_v ∈ L} Σ_{C ∈ C(G)} log ψ_C(x_C, y),   (11)

which is a compatible function on G.

B. Relation with G-Invariant Functions

Recent interest in developing expressive GNN architectures has been directed towards approximating graph invariant functions, also referred to as G-invariant functions. A G-invariant function on graph G is one that is invariant to permutation of nodes in the graph G. A recent paper shows the equivalence between designing GNN architectures for graph isomorphism testing and designing GNN architectures as universal approximators of G-invariant functions [Chen et al., 2019]. While invariance seems a desirable property, the problem of designing GNNs that are universal approximators of G-invariant functions or perform graph isomorphism testing has been difficult. Recently, the k-order linear GNN [Maron et al., 2019b] and the k-order folklore GNN [Maron et al., 2019a] have been proposed, which in theory can approximate any G-invariant function as k → ∞ [Azizian and Lelarge, 2021]. However, they become impractical even for modest k ≥ 3, due to their reliance on k-order tensors for representations.

We note here that the class of G-compatible functions is not a subset of all G-invariant functions, i.e., there exist G-compatible functions that are not G-invariant. To see this, consider the example in Figure 1. The G-compatible function f is not G-invariant, i.e., switching nodes 1 and 3 (along with node features x_1 and x_3) – though it produces the same graph – does not produce the same output, assuming θ_a ≠ θ_b.

[Fig. 1. A G-compatible function that is not G-invariant (θ_a ≠ θ_b).]

In the next section, we describe the proposed neural tree architecture. We later show that this neural tree architecture is expressive enough to approximate any (smooth) G-compatible function.

V. NEURAL TREE ARCHITECTURE

The key idea behind the neural tree architecture is to construct a tree-structured graph from the input graph and perform message passing on the resulting tree instead of the input graph. This tree-structured graph is such that every node in it represents either a node or a subset of nodes in the input graph. Trees are known to be more amenable to message passing [Jordan, 2002], [Koller and Friedman, 2009], and indeed the proposed architecture enables the derivation of strong approximation results, which we present in Section VI.

In the following, we first review the notion of tree decomposition (Section V-A). We then show how to construct a H-tree for a graph, by successively applying tree decomposition on a given graph G and its subgraphs (Section V-B). Finally, we discuss the proposed neural tree architecture for node classification, which performs neural message passing on the H-tree (Section V-C).

A. Tree Decomposition

For a graph G, a tree decomposition is a tuple (T, B) where T is a tree graph and B = {B_τ}_{τ ∈ V(T)} is a family of bags, where B_τ ⊂ V(G) for every tree node τ ∈ V(T), such that the tuple (T, B) satisfies the following two properties:

(1) Connectedness Property: for every graph node v ∈ V(G), the subgraph of T induced by the tree nodes τ whose bag contains node v is connected, i.e., T_v := T[{τ ∈ V(T) | v ∈ B_τ}] is a connected subgraph of T for every v ∈ V(G).

(2) Covering Property: for every edge {u, v} ∈ E(G) there exists a node τ ∈ V(T) such that u, v ∈ B_τ.

The simplest tree decomposition of any graph G is a tree with a single node, whose bag contains all the nodes in G. However, in practical applications, it is desirable to obtain decompositions where the size of the largest bag is small. This is captured by the notion of treewidth. The treewidth of a tree decomposition (T, B) is defined as the size of the largest bag minus one:

  tw[(T, B)] := max_{τ ∈ V(T)} |B_τ| − 1.   (12)

The treewidth of a graph G is defined as the minimum treewidth that can be achieved among all tree decompositions of G. While finding a tree decomposition with minimum treewidth is NP-hard, many algorithms exist that generate tree decompositions with small enough treewidth [Becker and Geiger, 1996], [Bodlaender, 2006], [Thomas and Green, 2009], [Bodlaender and Koster, 2010], [Bodlaender and Koster, 2011].

One of the most popular tree decompositions is the junction tree decomposition, which was introduced in [Jensen and Jensen, 1994]. In it, the graph G is first triangulated. Triangulation is done by adding a chord between any two nodes in every cycle of length 4 or more. This eliminates all the cycles of length 4 or more in the graph G and produces a chordal graph G_c. The collection of bags B = {B_τ}_τ in the junction tree is chosen as the set of all maximal cliques in the chordal graph G_c. Then, an intersection graph I on B is built, which has a node for every bag in B and an edge between two bags B_τ and B_μ if they have a non-empty intersection, i.e., |B_τ ∩ B_μ| ≥ 1. The weight of every link {τ, μ} in the intersection graph I is set to |B_τ ∩ B_μ|. Finally, the desired junction tree is obtained by extracting a maximum-weight spanning tree of the weighted intersection graph I. It is known that this extracted tree T, with the bags B, is a valid tree decomposition of G that satisfies the connectedness and covering properties.
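The following is a minimal sketch (ours, not from the paper) of the junction tree construction just described, using networkx and assuming a recent NetworkX version: triangulate, collect the maximal cliques of the chordal graph as bags, build the weighted intersection graph, and take a maximum-weight spanning tree. The helper name junction_tree_decomposition is hypothetical.

```python
import itertools
import networkx as nx

def junction_tree_decomposition(G):
    """Return (T, bags): a junction tree T whose integer nodes index the bags."""
    # 1. Triangulate G to obtain a chordal graph G_c.
    G_c, _ = nx.complete_to_chordal_graph(G)
    # 2. Bags = maximal cliques of the chordal graph.
    bags = [frozenset(C) for C in nx.find_cliques(G_c)]
    # 3. Weighted intersection graph I over the bags.
    I = nx.Graph()
    I.add_nodes_from(range(len(bags)))
    for i, j in itertools.combinations(range(len(bags)), 2):
        w = len(bags[i] & bags[j])
        if w >= 1:
            I.add_edge(i, j, weight=w)
    # 4. Junction tree = maximum-weight spanning tree of I.
    T = nx.maximum_spanning_tree(I, weight="weight")
    return T, bags

# Example: the treewidth of the resulting decomposition, as in (12).
G = nx.cycle_graph(5)
T, bags = junction_tree_decomposition(G)
print(max(len(B) for B in bags) - 1)   # treewidth of this decomposition
```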

[Fig. 2. Generating H-trees for graph G and its subgraphs. Rows show: the graph and its subgraphs, their junction trees, and the resulting H-trees.]

We denote by (T, B) = junction-tree(G) the algorithm that takes an undirected graph G and outputs a junction tree decomposition (T, B) of G. The junction tree decomposition is only one way of constructing a tree decomposition. In practice, some other tree decompositions may be more suitable for a given input graph. While in this paper we use the junction tree decomposition, we also use the notation (T, B) = tree-decomposition(G) to stress that the derivation applies to any choice of tree decomposition.

B. H-tree

In this section, we propose and construct a new structure called the H-tree for a given graph. A H-tree is a tuple (J_G, R), where J_G is a tree and R is the set of root nodes. The key insight behind a H-tree is to capture nodes and subgraphs in the input graph and organize them in a suitable hierarchy – from nodes (as leaves) to large subgraphs (the largest being the root nodes).

We know that the tree decomposition (T, B) of a graph captures the important subgraphs (induced by bags B ∈ B) of the graph. In the H-tree, we further decompose each of these subgraphs with its own tree decomposition, and iterate this process till all the remaining subgraphs (induced by bags) contain only a single node. We start by defining the H-tree of a complete graph.

Definition 2 (Complete graph): Let S_n denote a star graph with n + 1 nodes and n edges, where there is one center node and all edges are between the center node and every other node. The H-tree (J_G, R) of a complete graph G with n nodes is a star graph J_G = S_n, where the center node represents the single maximal clique in G and each of the leaf nodes in S_n corresponds to one node in G. Furthermore, |R| = 1 and the only element in R corresponds to the center node in J_G = S_n.

The H-tree for a complete graph of three nodes is illustrated in Fig. 2, rightmost column. In it, the unique clique in the graph, which contains nodes {3, 4, 5}, is labeled as C = (345). From the figure, it is clear that the main role of the H-tree is to explicitly include the nodes of the input graph as part of the tree.

The H-tree for a graph that is not complete is constructed recursively as described in Algorithm 1.

Definition 3 (Generic graph): The H-tree (J_G, R) for an undirected graph G is defined as the output of Algorithm 1.

Algorithm 1 takes an undirected graph G and outputs a H-tree J_G with a set of root nodes R. To understand the algorithm, let (T, B) denote a tree decomposition of graph G (line 1). The H-tree J is initialized to J = T (line 2) and the set of root nodes equals the root nodes of this tree, namely R = V(T) (line 3). For B ∈ B, let τ(B) denote the node corresponding to bag B in J. Then for each bag B ∈ B we construct a H-tree of the induced subgraph G[B] (lines 4-17).

If G[B] is not complete, we attach its root nodes to τ(B) (lines 9-15). Specifically, if (J′, R′) denotes the H-tree for the induced subgraph G[B], then we attach the graph J′ to J by linking all root nodes of J′, namely R′, to the node τ(B) (lines 13-14). To avoid cycles, we also remove the edges between the root nodes R′ in J′ (line 12).

If the induced subgraph G[B] is complete, then from Definition 2 we know that its H-tree is a star graph with a single clique node, call it C. In this case, we attach the star graph to τ(B) by merging the two nodes – C and τ(B) – into one. Doing this avoids an unnecessary edge (τ(B), C) in the H-tree.
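Before the formal listing in Algorithm 1 below, here is a rough Python sketch (ours, heavily simplified and not the authors' implementation) of the recursion just described. It reuses the hypothetical junction_tree_decomposition helper from Section V-A; node naming and the leaf-to-node map κ are handled crudely through string prefixes.

```python
import networkx as nx

def h_tree(G, prefix=""):
    """Recursively build an (H-tree, roots) pair for an undirected graph G (sketch)."""
    J = nx.Graph()
    if G.number_of_nodes() == 1 or nx.density(G) == 1.0:          # complete graph
        center = prefix + "C" + "".join(map(str, sorted(G.nodes)))
        J.add_node(center)
        for v in G.nodes:                                          # star graph S_n
            J.add_edge(center, prefix + "leaf_" + str(v))
        return J, {center}
    T, bags = junction_tree_decomposition(G)                       # line 1
    name = {t: prefix + "B" + "".join(map(str, sorted(bags[t]))) for t in T}
    J.add_nodes_from(name.values())
    J.add_edges_from((name[a], name[b]) for a, b in T.edges)       # lines 2-3
    roots = set(name.values())
    for t in T:                                                    # lines 4-17
        sub = G.subgraph(bags[t])
        J_sub, R_sub = h_tree(sub, prefix=name[t] + "/")
        if len(R_sub) == 1 and next(iter(R_sub)).split("/")[-1].startswith("C"):
            # complete bag: merge its clique node C with tau(B)
            J = nx.contracted_nodes(nx.compose(J, J_sub),
                                    name[t], next(iter(R_sub)), self_loops=False)
        else:
            J_sub.remove_edges_from(list(J_sub.subgraph(R_sub).edges))  # line 12
            J = nx.compose(J, J_sub)                                # line 13
            J.add_edges_from((name[t], r) for r in R_sub)           # line 14
    return J, roots
```

In this sketch, a leaf name ending in "leaf_v" plays the role of κ(l) = v; multiple leaves with different prefixes can map to the same input node, as in Remark 4 below.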

Algorithm 1: H-tree
  input:  Graph G
  output: H-tree (J_G, R)
  1:  (T, B) ← tree-decomposition(G)
  2:  J ← T
  3:  R ← V(T)
  4:  for each bag B in B do
  5:    if G[B] is a complete graph then
  6:      Update J:
  7:        V(J) ← V(J) ∪ B
  8:        E(J) ← E(J) ∪ {{τ(B), b}}_{b ∈ B}
  9:    else
  10:     (J′, R′) ← H-tree(G[B])
  11:     Update J:
  12:       E(J′) ← E(J′) \ E(J′[R′])
  13:       J ← J ∪ J′
  14:       E(J) ← E(J) ∪ {{τ(B), r}}_{r ∈ R′}
  15:   end
  16:   J_G ← J
  17:   return (J_G, R)
  18: end

Example. Figure 2 shows the construction of a H-tree for a graph with 5 nodes and 6 edges. Here, we have used the junction-tree algorithm to perform the tree decomposition. The first column shows the graph G and its junction tree, which has three nodes corresponding to the three cliques in the chordal graph G_c.[1] The remaining columns show the three subgraphs of G corresponding to each of the three maximal cliques in G_c, along with their junction trees and H-trees. The H-tree of each of these subgraphs is then attached to the junction tree of G to get the required H-tree for G. The H-tree for graph G is shown in the last row of the first column in Figure 2. Also illustrated are the two edges deleted (in red) when merging the two H-trees of the subgraphs to the junction tree of G.

[1] The chordal graph G_c is used in the junction tree construction and is obtained from G after graph triangulation, which in this case consists in adding the dashed blue line in Figure 2.

Remark 4 (Leaves and features): The H-tree of a graph G is such that every node in it corresponds to a subset of nodes in graph G. Further, every leaf node l in J_G corresponds to exactly one node v in G. We denote this node by κ(l) for every leaf node l of J_G. In the construction of the H-tree, we also assign the node input feature x_v to every node l in J_G for which κ(l) = v. Note that multiple leaf nodes may correspond to a single node v in the graph G, i.e., we can have κ(l) = v for many leaf nodes l in J_G. Fig. 2 illustrates the input node features by node coloring. We show the input node features being replicated on (multiple) leaf nodes of the H-tree. The input node features for all other nodes in J_G are set to 0.

C. Message Passing on H-tree

Given a graph G with input node features, we construct a H-tree (J_G, R) and perform message passing on J_G. We call this the neural tree architecture. Representation vectors are generated for each node in the H-tree J_G by aggregating representation vectors of neighboring nodes in J_G. The message passing starts with h_l^0 = x_{κ(l)} for all leaf nodes l in J_G and h_u^0 = 0 for non-leaf nodes u in J_G. These representation vectors are then updated as

  h_u^t = AGG_t( h_u^{t-1}, { h_w^{t-1} | w ∈ N_{J_G}(u) } ),   (13)

for each iteration t ∈ {1, 2, ..., T}. The aggregation function AGG_t can be modeled in numerous ways. Many of the message passing GNN architectures in the literature, such as GCN [Kipf and Welling, 2017], GraphSAGE [Hamilton et al., 2017], GIN [Xu et al., 2019], and GAT [Veličković et al., 2018], [Lee et al., 2019], can be used to perform message passing on J_G. After T iterations of message passing, we extract the label y_v for node v ∈ G by combining the representation vectors of the leaf nodes l of J_G which correspond to node v in G, i.e., κ(l) = v:

  y_v = COMB( { h_l^T | l leaf node in J_G s.t. κ(l) = v } ),   (14)

for every v ∈ V, where h_l^T denotes the representation vector generated at leaf node l in J_G after T iterations. COMB can be modeled by using any of the standard neural network models. In our experiments, we model COMB with a mean pooling function followed by a softmax. This neural tree architecture is shown in Figure 3.

[Fig. 3. Neural tree architecture for node classification.]

Remark 5 (Mutatis mutandis): The neural tree architecture is inspired by the junction tree message passing algorithm [Jordan, 2002]. In the junction tree message passing algorithm, the clique potentials are first constructed for all nodes in the junction tree (T, B) of G. This is followed by message passing between nodes on T, which updates the clique potentials. Once this converges, the marginals are computed for each node from the clique potentials. The proposed neural tree has a similar structure. The forming of clique potentials, in the junction tree algorithm, is parallel to message passing from the leaf nodes in the H-tree to the root nodes. The message passing in the junction tree algorithm is parallel to the message passing between root nodes in the H-tree J_G. And, computing the marginals from the clique potentials, in the junction tree algorithm, is parallel to the messages passing back from the root nodes to the leaf nodes in the H-tree. [Xu et al., 2020] suggests that such algorithmic alignment of the neural architecture leads to lower sample complexity and better generalizability. We leave the question of generalization power to future work.
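To ground (13)-(14), here is a minimal PyTorch sketch (ours, not the authors' code) of one neural tree message-passing iteration with a simple mean-style aggregation standing in for AGG_t, and a mean-pooling COMB readout over the leaf copies of each input node; the tensors, adjacency, and the κ bookkeeping are illustrative placeholders.

```python
import torch
import torch.nn as nn

class SimpleHTreeLayer(nn.Module):
    """One message-passing iteration on the H-tree, i.e., a simple instance of (13)."""
    def __init__(self, dim):
        super().__init__()
        self.lin_self = nn.Linear(dim, dim)
        self.lin_nbr = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: [num_htree_nodes, dim]; adj: dense 0/1 adjacency matrix of J_G.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        nbr_mean = adj @ h / deg                       # average over N_{J_G}(u)
        return torch.relu(self.lin_self(h) + self.lin_nbr(nbr_mean))

def comb_readout(h_final, leaf_groups, classifier):
    """COMB in (14): mean-pool the leaf copies of each input node, then classify."""
    logits = []
    for leaves in leaf_groups:                         # indices of leaves l with kappa(l) = v
        pooled = h_final[leaves].mean(dim=0)
        logits.append(classifier(pooled))
    return torch.stack(logits).log_softmax(dim=-1)
```

Any off-the-shelf convolution (GCN, GraphSAGE, GIN, GAT) can replace SimpleHTreeLayer, since the H-tree is just another graph as far as the aggregation step is concerned; this is exactly what the experiments in Section VII do.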

VI. EXPRESSIVE POWER OF NEURAL TREES

We now establish some theoretical results regarding the expressive power of the neural tree architecture.

We next show that neural trees can learn any compatible function, provided it is smooth enough. For simplicity, let the input node features and the representation vectors be real numbers, i.e., x_v ∈ R and h_u^t ∈ R for all v ∈ V(G) and nodes u ∈ J_G. Let us implement the aggregation function AGG_t in (13) as a shallow network:

  h_u^t = AGG_t( h_u^{t-1}, { h_w^{t-1} | w ∈ N_{J_G}(u) } )   (15)
        = ReLU( Σ_{k=1}^{N_u} a_{u,t}^k ⟨ w_{u,t}^k, h_{N̄(u)}^{t-1} ⟩ + b_{u,t}^k ),   (16)

where N̄(u) = {u} ∪ N_{J_G}(u) denotes the set containing node u and its neighbors N_{J_G}(u).[2] The representation vectors h_u^t are initialized as discussed in Section V-C. We fix a node v_0 in graph G and extract our output from v_0:

  y_{v_0} = COMB( { h_l^T | l leaf node in J_G s.t. κ(l) = v_0 } ),   (17)

where T is the number of message passing iterations. We also assume the COMB function to be a shallow network.

Let N denote the total number of parameters used in the neural tree architecture. Consider the space of functions g that map the input node features X to the output y_{v_0} in (16)-(17):

  F(G, N) = { X ↦ y_{v_0} : y_{v_0} is given by (16)-(17), for some T > 0 }.

We now show that any graph-compatible function – that is smooth enough – can be approximated by a function in F(G, N) to an arbitrary precision.

Theorem 6: Let f : [0, 1]^n → [0, 1] be a function compatible with a graph G with n nodes. Let each of the clique functions θ_C in f (see Definition 1) be 1-Lipschitz and be bounded to [0, 1]. Then, for any ε > 0, there exists a g ∈ F(G, N) such that ||f − g||_∞ < ε, while the number of parameters N is bounded by

  N = O( Σ_{u ∈ V(J_G)} (d_u − 1) ( ε / (d_u − 1) )^{−(d_u − 1)} ),   (18)

where d_u denotes the degree of node u in J_G, and the summation is over all the non-leaf nodes in J_G.
Proof: See Appendix A.

We can further develop the bound in Theorem 6 to expose the dependence of the number of parameters N on the treewidth of the tree decomposition of the graph.

Corollary 7: The number of parameters N in Theorem 6 can be upper-bounded by

  N = O( n × (tw[J_G] + 1)^{2 tw[J_G] + 3} × ε^{−(tw[J_G] + 1)} ),

where tw[J_G] denotes the treewidth of the tree-decomposition of G formed by the root nodes R of the H-tree J_G.[3]
Proof: See Appendix B.

Remark 8 (Efficient approximations): Corollary 7 shows that the number of parameters needed to obtain an ε-approximation with neural trees increases exponentially in only the treewidth of the tree-decomposition, and is linear in the number of nodes n in the graph. Thus, for graphs with bounded treewidth, neural trees will be able to approximate any graph compatible function efficiently.

Remark 9 (Comparing with shallow and deep neural networks): If we were to take a single-layer neural network, as in (16), that aggregates all the input features X to approximate f without any regard to its compatibility structure, we would need N = O(ε^{−n}) parameters to achieve the same ε-approximation [Poggio et al., 2017a]. In this case, we see N growing exponentially in the graph size n. It can be argued that the same advantage is shared by a deep neural network of depth tw[J_G] [Poggio et al., 2017a]. However, a general multi-layer perceptron would have to train many more parameters (a constant factor more) to first zone in on the exact message passing structure for computing the compatible function f. In our case, however, this structure is given by the construction of the H-tree.

Remark 10 (Data efficiency): The value of N also affects the data required to train the model: the larger the N, the more samples are required for training. In particular, Corollary 7 provides the reassuring result that if the training dataset contains graphs of small treewidth, then the amount of data required for training scales only linearly in the number of nodes n.

[2] We assume a different AGG_t function for different nodes u, for the same t. This choice is more general than our architecture in Section V-C. However, our results extend to the case where the AGG_t function is the same across nodes u in each iteration t.
[3] By construction, the root nodes R of J_G induce a subgraph T that is a tree-decomposition of G.
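To make Remarks 8-9 concrete, the following back-of-the-envelope comparison (ours; constants and lower-order factors are ignored, so the numbers are only indicative) contrasts the ε^{-n} scaling of an unstructured shallow network with the n · (tw+1)^{2tw+3} · ε^{-(tw+1)} scaling of Corollary 7, for a hypothetical bounded-treewidth graph family.

```python
# Rough parameter-count scalings, ignoring constants (illustrative only).
def shallow_unstructured(eps, n):
    return eps ** (-n)                                             # O(eps^-n), Remark 9

def neural_tree(eps, n, tw):
    return n * (tw + 1) ** (2 * tw + 3) * eps ** (-(tw + 1))       # Corollary 7

eps, tw = 0.1, 2
for n in (10, 20, 40):
    print(n, f"{shallow_unstructured(eps, n):.1e}", f"{neural_tree(eps, n, tw):.1e}")
```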

[Fig. 5. Accuracy for increasing amount of training data (% of labeled nodes). Axes: average accuracy (%) vs. nodes used for training (%).]

[Fig. 6. Accuracy vs. number of iterations, overlaid with the weighted diameter distribution. Axes: average accuracy (%) and frequency vs. number of iterations / diameter.]

VII. EXPERIMENTS

We now use the proposed neural tree architecture for the task of node classification in a 3D scene graph, and show that it outperforms standard GNN architectures. A 3D scene graph [Armeni et al., 2019], [Rosinol et al., 2020] is a hierarchical graphical model of a 3D environment. The nodes in the graph correspond to entities in the environment (e.g., objects, rooms, buildings), while edges correspond to relations among these entities (e.g., "room i contains object j"). The model is hierarchical, in that nodes are grouped into layers corresponding to different levels of abstraction (Fig. 4).

[Fig. 4. A room-object graph in the 3D scene graph.]

Dataset. We run semi-supervised node classification experiments on Stanford's 3D scene graph dataset [Armeni et al., 2019]. The dataset includes 35 3D scene graphs with verified semantic labels, each containing building, room, and object nodes in a residential unit. Since there is only a single class of building nodes (residential), we remove the building node and obtain 482 room-object graphs, where each graph contains a room and at least one object in that room, as shown in Fig. 4. The resulting dataset has 482 room nodes with 15 semantic labels, and 2338 objects with 35 labels. Each object node is connected to the room node it belongs to. In addition, we add 4920 edges to connect adjacent objects in the same room. We use the centroid and bounding box dimensions as features for each node.

Approaches and Setup. We implement the neural tree architecture with four different aggregation functions AGG_t, specified in: GCN [Kipf and Welling, 2017], GraphSAGE [Hamilton et al., 2017], GAT [Veličković et al., 2018], and GIN [Xu et al., 2019]. We use the ReLU activation function and also implement dropout at each iteration. The READ function for the standard GNN (see (2)) is implemented as a single linear layer followed by a softmax. On the other hand, the COMB function (see (14)) for neural trees is implemented as a mean pooling operation, followed by a single linear layer and a softmax. We use different READ (similarly, COMB) functions for the room nodes and the object nodes. We train the architectures using the standard cross entropy loss function. The experiments are implemented using the PyTorch Geometric library.

We randomly select 10% of the nodes for validation, 20% for testing, and remove the corresponding labels during training. The hyper-parameters of the two approaches are separately tuned based on the best validation accuracy, while using all 70% of the remaining nodes for training; see Appendix C for the details on hyper-parameter tuning.

Results and Ablation. We compare the proposed neural tree architecture with the standard GNN architectures, which do message passing on the input graph. Table I compares the test accuracies for the standard GNN architectures and the corresponding neural tree architecture, while using the same type of aggregation function. The reported test accuracies are averaged over 100 runs. We see that the neural tree architecture always yields a better prediction model than the standard GNN, for a given aggregation function.

TABLE I: TEST ACCURACY
  Model      | Input graph      | Neural Tree
  GCN        | 40.88 ± 2.28 %   | 50.63 ± 2.25 %
  GraphSAGE  | 59.54 ± 1.35 %   | 63.57 ± 1.54 %
  GAT        | 46.56 ± 2.21 %   | 62.16 ± 2.03 %
  GIN        | 49.25 ± 1.15 %   | 63.53 ± 1.38 %

To further analyze the proposed architecture, we carry out a series of experiments to see how the test accuracy varies as a function of the amount of training data and the number of message passing iterations T. For simplicity, we only show the neural tree that uses the GCN aggregation function in comparison with the standard GCN.

Figure 5 plots the average test accuracy as a function of the training data used. We vary the number of training nodes from 10% (282 nodes) to 70% (1974 nodes) of the entire dataset. The test accuracy is averaged over 10 runs. We see that the average test accuracy – for both the neural tree (NT+GCN) and GCN – improves with increasing training data. However, this increase is sharper for the neural tree architecture. With 10% of the nodes for training, the neural tree behaves slightly worse than GCN. However, the test accuracy rapidly improves for the neural tree architecture with the increase in training data, eventually outperforming GCN. This shows the higher expressive power of the proposed neural tree architecture.

We also study the impact of the number of message passing iterations T (see (13)-(14)) on the test accuracy in our experiments. Figure 6 plots the average test accuracy as a function of the number of message passing iterations T. As before, the test accuracy is averaged over 10 runs. We see that the average test accuracy increases, hits a maximum, and then decreases, with increasing number of iterations T. This holds for both the neural trees and the standard GNN. We see that the peak test accuracy attained by the neural tree is much higher than the peak attained by GCN, again validating the higher expressive power of the proposed architecture.

In order to explain this peaking behavior, we plot the histogram of diameters of the H-trees (weighted by the number of nodes) of all (room-object) scene graphs in the dataset. We see that the maximum test accuracy occurs for a T that nearly equals the average diameter. This is intuitive: when propagating messages on the H-tree, we would need the number of iterations to equal the tree diameter for each node feature to propagate across the entire tree. Therefore, our algorithm achieves the best performance when the number of iterations is roughly equal to the average diameter (shown as a vertical dashed line in Fig. 6). It must be noted here that the diameter of the H-tree is bounded by twice the treewidth of the H-tree, namely 2 × tw[J_G].

Figure 6 also shows the criticality of choosing the correct number of iterations T in training a neural tree. The performance of the trained neural tree model degrades very rapidly with any further increase in the number of iterations T. This degradation is much more severe for the neural tree architecture than for the standard GNN. We attribute this to over-parametrization in the neural tree architecture. The increase in the number of parameters, with the number of iterations T, is much more drastic for the neural trees than for the standard GNN.
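For readers who want to reproduce the setup described under Approaches and Setup, here is a heavily simplified PyTorch Geometric sketch (ours, not the released code) of the standard-GNN baseline: a two-layer GCN with a linear READ layer and softmax, trained with cross entropy on the labeled nodes only; the data object, masks, and hyper-parameters are placeholders.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCNNodeClassifier(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes, dropout=0.5):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.read = torch.nn.Linear(hidden_dim, num_classes)   # READ in (2)
        self.dropout = dropout

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = F.relu(self.conv2(x, edge_index))
        return self.read(x).log_softmax(dim=-1)

def train(model, data, train_mask, epochs=200, lr=0.01):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        model.train()
        opt.zero_grad()
        out = model(data.x, data.edge_index)
        loss = F.nll_loss(out[train_mask], data.y[train_mask])  # cross entropy on labeled nodes
        loss.backward()
        opt.step()
    return model
```

The same skeleton applies to the neural tree variant by running the convolutions on the H-tree instead of the input graph and replacing the final linear READ with the leaf-pooling COMB readout sketched in Section V-C.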

TABLE II REFERENCES TIME REQUIREMENTS:COMPUTE,TRAIN, AND TEST [Anderson et al., 2016] Anderson, P., Fernando, B., Johnson, M., and Gould, Model Computing H-tree Training (per epoch) Testing S. (2016). Spice: Semantic propositional image caption evaluation. In European Conf. on Computer Vision (ECCV), pages 382–398. GCN n/a 0.072 s 0.048 s [Armeni et al., 2019] Armeni, I., He, Z.-Y., Gwak, J., Zamir, A. R., Fischer, NT+GCN 2.08 s 0.305 s 0.058 s M., Malik, J., and Savarese, S. (2019). 3D scene graph: A structure for unified semantics, 3D space, and camera. In Intl. Conf. on Computer Vision increase in the number of parameters, with the number of (ICCV), pages 5664–5673. [Azizian and Lelarge, 2021] Azizian, W. and Lelarge, M. (2021). Expressive iterations T , is much more drastic for the neural trees than for power of invariant and equivariant graph neural networks. In Intl. Conf. on the standard GNN. Learning Representations (ICLR). Time Requirements. Table II reports the time required for [Becker and Geiger, 1996] Becker, A. and Geiger, D. (1996). A sufficiently fast algorithm for finding close to optimal junction trees. In Conf. on training and testing our model over the 3D scene graph dataset. Uncertainty in Artificial Intelligence (UAI), pages 81–89. Also reported is the cumulative time required to compute and [Bodlaender, 2006] Bodlaender, H. L. (2006). Treewidth: Characterizations, construct the H-trees for the 482 room-object scene graphs. applications, and computations. In Graph-Theoretic Concepts in Computer Science, pages 1–14. Springer Berlin Heidelberg. It takes about 2 seconds to construct H-trees for all the 482 [Bodlaender and Koster, 2010] Bodlaender, H. L. and Koster, A. M. (2010). room-object scene graphs. The reported times are measured Treewidth computations I: Upper bounds. Information and Computation, when implementing the respective models on an Nvidia Quadro 208(3):259 – 275. P4000 GPU processor. [Bodlaender and Koster, 2011] Bodlaender, H. L. and Koster, A. M. (2011). Treewidth computations II: Lower bounds. Information and Computation, In Table II, we see that neural tree (NT+GCN) takes about 209(7):1103 – 1119. 4.25x more time to train than the standard GCN.4 We observe [Bouritsas et al., 2021] Bouritsas, G., Frasca, F., Zafeiriou, S., and Bronstein, similar results when using other GNN models (see Appendix C). M. M. (2021). Improving graph neural network expressivity via subgraph isomorphism counting. arXiv preprint arXiv:2006.09252. This is expected because the H-tree is much larger than the input [Bronstein et al., 2017] Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., graph, and as a consequence, the neural tree architecture needs and Vandergheynst, P. (2017). Geometric deep learning: going beyond to train more weights than a standard GNN. The testing time euclidean data. IEEE Signal Process. Mag., 34(4):18–42. [Busbridge et al., 2019] Busbridge, D., Sherburn, D., Cavallo, P., and Ham- for the neural tree, on the other hand, remains comparable to merla, N. Y. (2019). Relational graph attention networks. arXiv preprint the standard GNN architectures. This makes the more accurate arXiv:1904.05811. neural trees architecture amenable for real-time deployment. [Chandrasekaran et al., 2008] Chandrasekaran, V., Srebro, N., and Harsha, P. (2008). Complexity of inference in graphical models. In Conf. on Uncertainty in Artificial Intelligence (UAI), page 70–78. VIII.CONCLUSION [Chen et al., 2019] Chen, Z., Villar, S., Chen, L., and Bruna, J. (2019). 
On the We proposed a novel graph neural network architecture – equivalence between graph isomorphism testing and function approximation the neural tree. In it, we perform neural message passing, with gnns. In Advances in Neural Information Processing Systems (NIPS), volume 32. not on the input graph, but over a constructed H-tree. The [Cooper, 1990] Cooper, G. (1990). The computational complexity of prob- H-tree encodes nodes and subgraphs of the input graph in a abilistic inference using Bayesian belief networks. Artificial Intelligence, hierarchical manner. Performing neural message passing on 42(2-3):393–405. [Defferrard et al., 2016] Defferrard, M., Bresson, X., and Vandergheynst, P. the H-tree allows us to get representation vectors for all the (2016). Convolutional neural networks on graphs with fast localized spectral nodes and the important subgraphs of the input graph. filtering. In Advances in Neural Information Processing Systems (NIPS), We showed that the neural tree architecture can approximate volume 29, pages 3844–3852. any graph compatible function. The relevance of this result is [Diestel, 2005] Diestel, R. (2005). Graph Theory. Springer, 3ed edition. [Duvenaud et al., 2015] Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., two-fold. First, graph compatible functions arise naturally in Bombarell, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. (2015). probabilistic graphical models, hence the result establishes a Convolutional networks on graphs for learning molecular fingerprints. In stronger connection between graph neural networks and exact Advances in Neural Information Processing Systems (NIPS), volume 28, pages 2224–2232. inference in graphical models. Second, the result guarantees [Garg et al., 2020] Garg, V., Jegelka, S., and Jaakkola, T. (2020). General- that the number of parameters required to obtain a desired ization and representational limits of graph neural networks. In Intl. Conf. approximation grows linearly with the number of nodes and on (ICML), volume 119, pages 3419–3430. [Gilmer et al., 2017] Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., exponentially in the treewidth of the input graph. This renders and Dahl, G. E. (2017). Neural message passing for quantum chemistry. the proposed architecture more parsimonious for large graphs In Intl. Conf. on Machine Learning (ICML), volume 70, pages 1263–1272. with small treewidth. We also saw that the neural tree, unlike [Gori et al., 2005] Gori, M., Monfardini, G., and Scarselli, F. (2005). A new model for learning in graph domains. In IEEE Intl. J. Conf. Neural Netw., the local message passing GNN architectures, can emulate the volume 2, pages 729–734. junction tree algorithm used for exact inference on probabilistic [Guo et al., 2018] Guo, M., Chou, E., Huang, D.-A., Song, S., Yeung, S., graphical models. and Fei-Fei, L. (2018). Neural graph matching networks for fewshot 3D Application of the neural tree architecture to the problem of action recognition. In European Conf. on Computer Vision (ECCV), pages 673–689. node classification in 3D scene graphs shows promising results. [Hamilton et al., 2017] Hamilton, W. L., Ying, R., and Leskovec, J. (2017). In future work, we look at overcoming the complexity barrier Inductive representation learning on large graphs. In Advances in Neural posed by the treewidth of the graph, investigate a neural tree Information Processing Systems (NIPS), page 1025–1035. 
[Hanocka et al., 2019] Hanocka, R., Hertz, A., Fish, N., Giryes, R., Fleish- architecture for the task of graph classification, and establish man, S., and Cohen-Or, D. (2019). MeshCNN: a network with an edge. generalizability of the neural tree architecture. ACM Trans. Graph., 38(4):1–12. [Henaff et al., 2015] Henaff, M., Bruna, J., and LeCun, Y. (2015). Deep 4For GCN and NT+GCN training we require no more than 1000 epochs convolutional networks on graph-structured data. arXiv preprint of SGD to attain reasonable convergence. arXiv:1506.05163. 10

[Jensen and Jensen, 1994] Jensen, F. V. and Jensen, F. (1994). Optimal junction trees. In Conf. on Uncertainty in Artificial Intelligence (UAI), pages 360-366.
[Jin et al., 2018] Jin, W., Barzilay, R., and Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. In Intl. Conf. on Machine Learning (ICML), volume 80, pages 2323-2332.
[Johnson et al., 2015] Johnson, J., Krishna, R., Stark, M., Li, L.-J., Shamma, D. A., Bernstein, M. S., and Fei-Fei, L. (2015). Image retrieval using scene graphs. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 3668-3678.
[Jordan, 2002] Jordan, M. (2002). An introduction to probabilistic graphical models. Unpublished Lecture Notes.
[Karpathy and Fei-Fei, 2015] Karpathy, A. and Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
[Kim et al., 2019] Kim, U.-H., Park, J.-M., Song, T.-J., and Kim, J.-H. (2019). 3-D scene graph: A sparse and semantic representation of physical environments for intelligent agents. IEEE Trans. Cybern., PP:1-13.
[Kipf and Welling, 2017] Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Intl. Conf. on Learning Representations (ICLR).
[Koller and Friedman, 2009] Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
[Kolotouros et al., 2019] Kolotouros, N., Pavlakos, G., and Daniilidis, K. (2019). Convolutional mesh regression for single-image human shape reconstruction. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
[Krishna et al., 2016] Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., Bernstein, M., and Fei-Fei, L. (2016). Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332.
[Kurenkov et al., 2020] Kurenkov, A., Martín-Martín, R., Ichnowski, J., Goldberg, K., and Savarese, S. (2020). Semantic and geometric modeling with neural message passing in 3D scene graphs for hierarchical mechanical search. arXiv preprint arXiv:2012.04060.
[Lee et al., 2019] Lee, J. B., Rossi, R. A., Kim, S., Ahmed, N. K., and Koh, E. (2019). Attention models in graphs: A survey. ACM Trans. Knowl. Discov. Data, 13(6).
[Li et al., 2019] Li, G., Müller, M., Thabet, A., and Ghanem, B. (2019). DeepGCNs: Can GCNs go as deep as CNNs? In Intl. Conf. on Computer Vision (ICCV).
[Li et al., 2017] Li, Y., Ouyang, W., Zhou, B., Wang, K., and Wang, X. (2017). Scene graph generation from objects, phrases and region captions. In International Conference on Computer Vision (ICCV).
[Lu et al., 2016] Lu, C., Krishna, R., Bernstein, M., and Li, F.-F. (2016). Visual relationship detection with language priors. In European Conference on Computer Vision, pages 852-869.
[Maron et al., 2019a] Maron, H., Ben-Hamu, H., Serviansky, H., and Lipman, Y. (2019a). Provably powerful graph networks. In Advances in Neural Information Processing Systems (NIPS), volume 32, pages 2156-2167.
[Maron et al., 2019b] Maron, H., Ben-Hamu, H., Shamir, N., and Lipman, Y. (2019b). Invariant and equivariant graph networks. In Intl. Conf. on Learning Representations (ICLR).
[Monti et al., 2019] Monti, F., Frasca, F., Eynard, D., Mannion, D., and Bronstein, M. (2019). Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673.
[Morris et al., 2019] Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. (2019). Weisfeiler and Leman go neural: Higher-order graph neural networks. Nat. Conf. on Artificial Intelligence (AAAI), 33(01):4602-4609.
[Poggio et al., 2017a] Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., and Liao, Q. (2017a). Why and when can deep - but not shallow - networks avoid the curse of dimensionality: A review. Int. J. Autom. Comput., 14:503-519.
[Poggio et al., 2017b] Poggio, T. A., Mhaskar, H., Rosasco, L., Miranda, B., and Liao, Q. (2017b). Why and when can deep - but not shallow - networks avoid the curse of dimensionality: A review. arXiv preprint arXiv:1611.00740.
[Ren et al., 2015] Ren, M., Kiros, R., and Zemel, R. S. (2015). Image question answering: A visual semantic embedding model and a new dataset. arXiv preprint arXiv:1505.02074.
[Rosinol et al., 2020] Rosinol, A., Gupta, A., Abate, M., Shi, J., and Carlone, L. (2020). 3D dynamic scene graphs: Actionable spatial perception with places, objects, and humans. In Robotics: Science and Systems (RSS).
[Roth, 1996] Roth, D. (1996). On the hardness of approximate reasoning. Artificial Intelligence, 82(1):273-302.
[Satorras and Estrach, 2018] Satorras, V. G. and Estrach, J. B. (2018). Few-shot learning with graph neural networks. In Intl. Conf. on Learning Representations (ICLR).
[Scarselli et al., 2008] Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2008). The graph neural network model. IEEE Trans. Neural Netw., 20(1):61-80.
[Scarselli et al., 2009] Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2009). Computational capabilities of graph neural networks. IEEE Transactions on Neural Networks, 20(1):81-102.
[Scarsellia et al., 2018] Scarsellia, F., Tsoi, A., and Hagenbuchner, M. (2018). The Vapnik–Chervonenkis dimension of graph and recursive neural networks. Neural Networks, 108:248-259.
[Shchur et al., 2019] Shchur, O., Mumme, M., Bojchevski, A., and Günnemann, S. (2019). Pitfalls of graph neural network evaluation. In Relational Representation Learning Workshop (R2L), NeurIPS.
[Thomas and Green, 2009] Thomas, A. and Green, P. J. (2009). Enumerating the junction trees of a decomposable graph. J. Comput. Graph. Stat., 18(4):930-940.
[Veličković et al., 2018] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). Graph attention networks. In Intl. Conf. on Learning Representations (ICLR).
[Wald et al., 2020] Wald, J., Dhamo, H., Navab, N., and Tombari, F. (2020). Learning 3D semantic scene graphs from 3D indoor reconstructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3961-3970.
[Xu et al., 2017] Xu, D., Zhu, Y., Choy, C. B., and Fei-Fei, L. (2017). Scene graph generation by iterative message passing. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 3097-3106.
[Xu et al., 2019] Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2019). How powerful are graph neural networks? In Intl. Conf. on Learning Representations (ICLR).
[Xu et al., 2020] Xu, K., Li, J., Zhang, M., Du, S. S., Kawarabayashi, K., and Jegelka, S. (2020). What can neural networks reason about? In Intl. Conf. on Learning Representations (ICLR).
[Yang et al., 2018] Yang, J., Lu, J., Lee, S., Batra, D., and Parikh, D. (2018). Graph R-CNN for scene graph generation. In European Conf. on Computer Vision (ECCV).
[Ying et al., 2018] Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. (2018). Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems (NIPS), pages 4800-4810.
[Zellers et al., 2017] Zellers, R., Yatskar, M., Thomson, S., and Choi, Y. (2017). Neural motifs: Scene graph parsing with global context. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
[Zhang et al., 2020] Zhang, Y., Chen, X., Yang, Y., Ramamurthy, A., Li, B., Qi, Y., and Song, L. (2020). Efficient probabilistic logic reasoning with graph neural networks. arXiv preprint arXiv:2001.11850.

APPENDIX

A. Proof of Theorem 6

The proof is divided into four sub-sections. Here is a brief outline:

1. In Section A1, we first prove an aggregation lemma. It (roughly) states the following: if the representation vectors at the root nodes of the H-tree J_G are {h_r}_{r ∈ R} at some iteration t, then in finitely many more message passing iterations it is possible to output a label y_{v_0} = Σ_{r ∈ R} h_r.

2. In Section A2, we then prove that any G-compatible function f can be written as a sum f(X) = Σ_{r ∈ R} γ_r of component functions γ_r.

3. In Section A3, we establish that the component functions γ_r have a compositional structure that matches the sub-tree T_r of the H-tree J_G formed by the root node r and its descendants. This helps in the efficient computation of the component function γ_r on the sub-tree T_r.

4. The goal is to first estimate each component γ_r, by message passing on T_r, and then aggregate by applying the aggregation lemma. In Section A4, we put it all together to argue that it is indeed possible to approximate any (adequately smooth and bounded) compatibility function f, to arbitrary precision ε, by the message passing described in (16). We obtain a bound on the number of parameters N required to approximate any such function in Section A4.

1) Aggregation: Let the COMB function be a simple average:
$$ y_{v_0} = \text{COMB}\left( \{ h_l^T \mid l \text{ leaf node in } J_G \text{ s.t. } \kappa(l) = v_0 \} \right) \triangleq \frac{1}{|\{ l \mid \kappa(l) = v_0 \}|} \sum_{l : \kappa(l) = v_0} h_l^T, \tag{19} $$
for some $T$, where the index $l$ runs over the set of leaf nodes in $J_G$. We first prove the following lemma.

Lemma 11 (Aggregation): Let $h_r^t$ denote the representation vectors of the root nodes $r \in R$ at some iteration $t$. If $h_r^t \in [0,1]$ for all $r \in R$ and $\sum_{r \in R} h_r^t \in [0,1]$, then there exist $t_0$ message passing iterations such that
$$ y_{v_0} = \sum_{r \in R} h_r^t, \tag{20} $$
for $T = t + t_0$. Further, the parameters used in this message passing and the number of iterations $t_0$ do not depend on $\{h_r^t\}_{r \in R}$.

Proof: We first make a few assertions about the message passing described in (16) in the paper. The proof of the lemma follows directly from them. The assertions are self-evident, and we only give a one-line descriptive proof after each statement.

Assertion 1. Let $(v, u)$ be an edge in the H-tree $J_G$. If $h_v^{t-1} \in [0,1]$, then there exist parameters $N_u$, $a_{u,t}^k$, $b_{u,t}^k$, and $w_{u,t}^k$ in (16) such that
$$ h_u^t = \text{AGG}_t\left( h_u^{t-1}, \{ h_w^{t-1} \mid w \in N_{J_G}(u) \} \right) \tag{21} $$
$$ = \text{ReLU}\left( \sum_{k=1}^{N_u} a_{u,t}^k \langle w_{u,t}^k, h_{\bar{N}(u)}^{t-1} \rangle + b_{u,t}^k \right) \tag{22} $$
$$ = \text{ReLU}\left( h_v^{t-1} \right) = h_v^{t-1}. \tag{23} $$
The last equality holds only because $h_v^{t-1} \in [0,1]$.

Assertion 2. Let $(v, u)$ be an edge in $J_G$. If $h_u^{t-1} + h_v^{t-1} \in [0,1]$, then there exist parameters $N_u$, $a_{u,t}^k$, $b_{u,t}^k$, and $w_{u,t}^k$ in (16) such that
$$ h_u^t = \text{AGG}_t\left( h_u^{t-1}, \{ h_w^{t-1} \mid w \in N_{J_G}(u) \} \right) \tag{24} $$
$$ = \text{ReLU}\left( \sum_{k=1}^{N_u} a_{u,t}^k \langle w_{u,t}^k, h_{\bar{N}(u)}^{t-1} \rangle + b_{u,t}^k \right) \tag{25} $$
$$ = \text{ReLU}\left( h_u^{t-1} + h_v^{t-1} \right) = h_u^{t-1} + h_v^{t-1}. \tag{26} $$
The last equality holds only because $h_u^{t-1} + h_v^{t-1} \in [0,1]$.

Assertion 3. Let $h_r^t$ denote the representation vectors at the root nodes $r \in R$ of the H-tree $J_G$ at some iteration $t$. If $h_r^t \in [0,1]$ and $\sum_{r \in R} h_r^t \in [0,1]$, then for any $r_0 \in R$ there exist $t_0$ message passing iterations, for some $t_0 > 0$, on the root nodes in $J_G$ such that $h_{r_0}^{t+t_0} = \sum_{r \in R} h_r^t$. Further, the parameters used in this message passing are independent of $\{h_r^t\}_{r \in R}$. This can be established by looking at $\mathcal{T} = J_G[R]$ as a tree rooted at $r_0$ and performing message aggregation from the leaf nodes of $\mathcal{T}$ to the root node $r_0$ using Assertion 2.

Assertion 4. If $h_r^t \in [0,1]$ for some $r \in R$, then there exist $t_0$ message passing iterations from the root node $r$ to all the leaf nodes $l$ in the H-tree $J_G$ such that $h_l^{t+t_0} = h_r^t$ for all leaf nodes $l$. This can be done by using Assertion 1 and successively passing the representation vector $h_r^t$ from $r$ to all the leaf nodes $l$ in $J_G$.

From Assertions 3 and 4 it is clear that, given $\{h_r^t\}_{r \in R}$ at some $t$ (bounded in $[0,1]$ as described in the statement of the lemma), there exist $t_0$ message passing iterations such that $h_l^{t+t_0} = \sum_{r \in R} h_r^t$ at all the leaf nodes $l$ in $J_G$. Since the COMB operation computes a simple average (see (19)), we have the result. □
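Assertions 1 and 2 rely only on the fact that ReLU acts as the identity on $[0,1]$. The following minimal NumPy sketch is our illustration, not the paper's implementation: representations are treated as scalars for simplicity, and the weight layout mirrors (22). It checks that a single unit of the form in (22) can copy a neighbor's representation, or add two of them, provided the result stays in $[0,1]$.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def agg_unit(h_bar, w, a=1.0, b=0.0):
    # One unit of the shallow update in (22): ReLU(a * <w, h_bar> + b).
    return relu(a * np.dot(w, h_bar) + b)

# Concatenated neighborhood vector [h_u, h_v, h_w], with entries in [0, 1].
h_bar = np.array([0.3, 0.6, 0.9])

# Assertion 1: copy h_v by placing weight 1 on its coordinate and 0 elsewhere.
copy_v = agg_unit(h_bar, w=np.array([0.0, 1.0, 0.0]))
assert np.isclose(copy_v, 0.6)

# Assertion 2: add h_u and h_v; their sum 0.9 stays in [0, 1], so ReLU is the identity.
sum_uv = agg_unit(h_bar, w=np.array([1.0, 1.0, 0.0]))
assert np.isclose(sum_uv, 0.9)
```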

2) Factorization: Next, we show that any compatibility function
$$ f(X) = \sum_{C \in \mathcal{C}(G)} \theta_C(x_C), \tag{27} $$
can be broken down into component functions $\{\gamma_r\}_{r \in R}$ such that
$$ f(X) = \sum_{r \in R} \gamma_r, \tag{28} $$
where
$$ \gamma_r = \sum_{C \in \mathcal{C}_r} \theta_C(x_C), \tag{29} $$
for all $r \in R$; the $\mathcal{C}_r$ are subsets of $\mathcal{C}(G)$ which form its partition, and $R$ is the set of root nodes in the H-tree $J_G$. (We omit the explicit dependence of the functions $\gamma_r$ on $X$ to ease the notation.)

Lemma 12 (Factorization): Let $f$ be a graph compatible function given in (27) with its clique functions $\theta_C$. Then, for every $r \in R$ there exists a subset $\mathcal{C}_r \subset \mathcal{C}(G)$ such that
$$ \gamma_r = \sum_{C \in \mathcal{C}_r} \theta_C(x_C), \tag{30} $$
and $f(X) = \sum_{r \in R} \gamma_r$. Further, the collection of subsets $\{\mathcal{C}_r\}_{r \in R}$ forms a partition of $\mathcal{C}(G)$, i.e., $\mathcal{C}_r \cap \mathcal{C}_{r'} = \emptyset$ whenever $r \neq r'$ and $\cup_{r \in R} \mathcal{C}_r = \mathcal{C}(G)$.

Proof: Let $f$ be a graph compatible function given in (27) with its clique functions $\theta_C$, and let
$$ (\mathcal{T}, \mathcal{B}) = \text{tree-decomposition}(G) \tag{31} $$
be the tree decomposition of graph $G$. Note that the set of root nodes $R$ in the H-tree $J_G$ is in fact all the nodes in $\mathcal{T}$, namely $R = \mathcal{V}(\mathcal{T})$. Further, for every $r \in R$, $B_r \in \mathcal{B}$ is a bag of nodes $B_r \subset \mathcal{V}(G)$ associated with $r$.

It is known that for any clique $C$ in graph $G$, i.e., $C \in \mathcal{C}(G)$, there exists an $r \in R$ such that all nodes in $C$ are in the bag $B_r$, i.e., $\mathcal{V}(C) \subset B_r$ [Diestel, 2005]. However, it is possible that two bags $B_r$ and $B_{r'}$, for $r \neq r'$, may contain all the nodes of the same clique $C$.

Ideally, we would define
$$ \mathcal{C}_r \triangleq \{ C \in \mathcal{C}(G) \mid \mathcal{V}(C) \subset B_r \}, \tag{32} $$

which is the set of all cliques $C$ in $G$ whose nodes are all contained in the bag $B_r$, and define the functions $\gamma_r$ to be
$$ \gamma_r = \sum_{C \in \mathcal{C}_r} \theta_C(x_C), \tag{33} $$
for all $r \in R$. However, this can lead the sum $\sum_{r \in R} \gamma_r$ to overestimate the function $f$, because two bags $B_r$ and $B_{r'}$ may contain all the nodes of the same clique $C$.

In order to avoid double counting of clique functions, we order the nodes in $R$ as $R = \{r_1, r_2, \ldots, r_{|R|}\}$. We then iterate over these ordered nodes of the tree decomposition to generate $\mathcal{C}_{r_k}$ and $\gamma_{r_k}$ (for $k = 1, 2, \ldots, |R|$) as follows. Initialize $\mathcal{M}_1 = \emptyset$ and iterate over $k = 1, 2, \ldots, |R|$:

$$ \mathcal{C}_{r_k} = \{ C \in \mathcal{C}(G) \setminus \mathcal{M}_k \mid \mathcal{V}(C) \subset B_{r_k} \}, \tag{34} $$
$$ \mathcal{M}_{k+1} = \mathcal{M}_k \cup \mathcal{C}_{r_k}, \tag{35} $$
and set
$$ \gamma_{r_k} = \sum_{C \in \mathcal{C}_{r_k}} \theta_C(x_C), \tag{36} $$
for $k = 1, 2, \ldots, |R|$. This procedure ensures that we do not overestimate $f$ and have $f(X) = \sum_{r \in R} \gamma_r$. Furthermore, the collection $\{\mathcal{C}_r\}_{r \in R}$, by its very construction in (34), is pairwise disjoint and spans the entire $\mathcal{C}(G)$, thereby forming its partition. □
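The construction in (34)-(36) is a greedy sweep over the ordered bags: each clique is charged to the first bag that contains it and is then marked as used. The Python sketch below is illustrative only; `cliques`, `ordered_bags`, and `theta` are hypothetical stand-ins for $\mathcal{C}(G)$, the ordered bags $B_{r_1}, \ldots, B_{r_{|R|}}$, and the clique functions $\theta_C$, with cliques represented as frozensets of node ids.

```python
def partition_cliques(cliques, ordered_bags):
    """Assign every clique to the first bag (in the given order) that contains it.

    cliques: list of frozensets of node ids (stands in for C(G)).
    ordered_bags: list of (root_id, set_of_nodes) pairs, ordered as r_1, ..., r_|R|.
    Returns a dict root_id -> list of cliques, i.e., the partition {C_r}.
    """
    used = set()          # M_k: cliques already assigned to an earlier bag
    partition = {}
    for root_id, bag in ordered_bags:
        assigned = [C for C in cliques if C not in used and C <= bag]  # eq. (34)
        used.update(assigned)                                          # eq. (35)
        partition[root_id] = assigned
    return partition

def gamma(root_id, partition, theta, x):
    # Component function gamma_{r_k} in (36): sum of clique potentials in C_{r_k}.
    return sum(theta[C](x) for C in partition[root_id])
```

Every clique ends up in exactly one $\mathcal{C}_{r_k}$, which is precisely the pairwise-disjoint, spanning property used to conclude the proof.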

[Fig. 7: three panels showing an example graph with its clique functions, its H-tree $J_G$, and the directed H-tree $\vec{J}_G$.] Fig. 7. Shows the H-tree $J_G$ and the directed H-tree $\vec{J}_G$ of a graph. Computation of a compatible function $f$ is shown on the H-tree.

3) Compositional Structure: Fig. 7 illustrates the computation of a compatible function on the H-tree. We see how the computation of $f$ splits as $f = \gamma_{r_1} + \gamma_{r_2} + \gamma_{r_3}$, where $\gamma_{r_1} = \theta_{12} + \theta_{13}$, $\gamma_{r_2} = \theta_{24}$, and $\gamma_{r_3} = \theta_{345}$. It is interesting to note that the functions $\gamma_r$, further, have a compositional structure that matches the sub-tree induced by the root node $r$ and its descendants. For example, the compositional structure of $\gamma_{r_1}$ matches the sub-tree formed by the root node $r_1$ and its descendants in the H-tree $J_G$. This turns out to be true in general for any compatible function $f$ and its factorization $\{\gamma_r\}_{r \in R}$ (in Lemma 12).

In order to make this precise, we introduce a few definitions that are inspired by [Poggio et al., 2017b], [Poggio et al., 2017a]. Let $\vec{D}$ be a directed acyclic graph (DAG) with a single root node $\alpha$, i.e., all the directed paths in $\vec{D}$ end at $\alpha$. We use the term DAG to refer to a single-root DAG in this section.

Definition 13: A function $f : X = (x_i)_{i \in [n]} \to f(X) \in \mathbb{R}$ is said to have a compositional structure that matches a DAG $\vec{D}$, with root node $\alpha$, if the following holds:
1. Each leaf node $l$ of $\vec{D}$ embeds one component of the input feature, i.e., $h_l = x_i$ for some $i \in [n]$.
2. For every non-leaf node $u$ there exists some function $H_u$ such that
$$ h_u = H_u\left( \{ h_w \mid w \in N_{\text{in}}(u) \} \right), \tag{37} $$
where $N_{\text{in}}(u)$ denotes the set of all incoming neighbors of node $u$.
3. $f(X) = h_\alpha$, where $\alpha$ is the single root node of $\vec{D}$.

Let $T_r$ denote the sub-tree of $J_G$ induced by the root $r \in R$ and its descendants in $J_G$. Further, let $\vec{T}_r$ denote a directed version of $T_r$ in which every edge $e$ in $T_r$ is turned into a directed edge pointing in the direction of the root node $r$. Note that $\vec{T}_r$ is a DAG with node $r$ functioning as the single root node.

We now show that the components $\{\gamma_r\}_{r \in R}$ of the compatible function $f$ in Lemma 12 have a compositional structure that matches $\vec{T}_r$.

Lemma 14 (Compositional Structure): The function $\gamma_r$ in Lemma 12 has a compositional structure that matches the directed sub-tree $\vec{T}_r$, for all $r \in R$.

Proof: In Lemma 12, the function $\gamma_r$ is given by
$$ \gamma_r = \sum_{C \in \mathcal{C}_r} \theta_C(x_C), \tag{38} $$
where $\mathcal{C}_r$ is given by
$$ \mathcal{C}_r = \{ C \in \mathcal{C}(G) \setminus \mathcal{M} \mid \mathcal{V}(C) \subset B_r \}, \tag{39} $$
for some set $\mathcal{M} \subset \mathcal{C}(G)$. The set $\mathcal{C}_r$ can be thought of as a collection of cliques in the subgraph of $G$ induced by the bag $B_r$, namely $G[B_r]$. Therefore, the function $\gamma_r$ is a compatible function on $G[B_r]$.

Note that, in Lemma 12, we showed that a $G$-compatible function can be factored as a sum of $|R|$ functions, call them $\{\gamma_r\}_{r \in R}$, where $R$ is the set of nodes in the tree decomposition $(\mathcal{T}, \mathcal{B})$. We have now argued that the functions $\gamma_r$ are compatible functions on the subgraphs $G[B_r]$.

Note that the set of all children of node $r$ in the sub-tree $T_r$ (of the H-tree $J_G$) forms a tree decomposition of $G[B_r]$. This indicates, by Lemma 12, that the function $\gamma_r$ should also split as a sum of functions, one corresponding to each node in the tree decomposition of $G[B_r]$.

Thus, by successively applying Lemma 12, we see that the compositional structure of $\gamma_r$ matches the directed sub-tree $\vec{T}_r$ constructed out of the H-tree $J_G$. □
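The notion of compositional structure in Definition 13 can be read as an evaluation rule: process the DAG in topological order, embed the inputs at the leaves, and apply $H_u$ at every internal node. The sketch below is our illustration under that reading; the dictionary-based DAG representation and the toy example are hypothetical and not taken from the paper.

```python
def evaluate_compositional(dag_in_edges, leaf_values, H, topo_order, root):
    """Evaluate a function with a compositional structure matching a single-root DAG.

    dag_in_edges: dict node -> list of incoming neighbors N_in(node).
    leaf_values: dict leaf node -> input component x_i embedded at that leaf.
    H: dict non-leaf node -> function of the in-neighbor values (eq. (37)).
    topo_order: nodes listed so that every node appears after its in-neighbors.
    root: the single root node alpha; the returned value is h_alpha = f(X).
    """
    h = dict(leaf_values)  # h_l = x_i at the leaves (Definition 13, item 1)
    for u in topo_order:
        if u in h:         # leaf, already embedded
            continue
        h[u] = H[u]([h[w] for w in dag_in_edges[u]])  # item 2: h_u = H_u({h_w})
    return h[root]         # item 3: f(X) = h_alpha

# Toy example: f(x1, x2) = x1 * x2 on a DAG with leaves 'l1', 'l2' and root 'a'.
f_val = evaluate_compositional(
    dag_in_edges={'a': ['l1', 'l2']},
    leaf_values={'l1': 0.5, 'l2': 0.8},
    H={'a': lambda vals: vals[0] * vals[1]},
    topo_order=['l1', 'l2', 'a'],
    root='a')
assert abs(f_val - 0.4) < 1e-12
```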

4) Approximation: Lemmas 11 and 12 suggest that, in order to approximate a compatible function $f(X) = \sum_{C \in \mathcal{C}(G)} \theta_C(x_C)$, with $f$ and $\theta_C$ bounded between $[0,1]$, it suffices to generate representation vectors
$$ h_r^t \approx \gamma_r = \sum_{C \in \mathcal{C}_r} \theta_C(x_C), \tag{40} $$
at each root node $r \in R$ of the H-tree $J_G$, for some $t$. The approximation in (40) must be such that
$$ \left| \sum_{r \in R} h_r^t - \sum_{r \in R} \gamma_r \right| = \left| \sum_{r \in R} h_r^t - f(X) \right| < \epsilon. \tag{41} $$
Once such representation vectors $h_r^t$ are generated at the root nodes of the H-tree, by Lemma 11, their sum can be propagated to generate the node label $y_{v_0} = \sum_{r \in R} h_r^t$, with message passing that is independent of the function being approximated. Next, we show that the message passing defined in (16) in the main paper can indeed produce the approximation given in (41). The number of parameters required to attain this approximation will be an upper bound on $N$.

To prove this, we consider a directed version of the H-tree $J_G$, where each edge in $J_G$ is turned into a directed edge pointing in the direction that leads to the root nodes $R$ in $J_G$. We also remove the edges between the root nodes $R$, and add another final node that aggregates information from all the root nodes. We call this final node the aggregator and denote it by $\alpha$. We call this directed graph $\vec{J}_G$. A directed H-tree graph is illustrated in Figure 7; the red colored edges between root nodes show the deleted edges between the root nodes $R$ in $J_G$ used to obtain $\vec{J}_G$.

We assume that the messages propagate only in one direction, i.e., from the leaf nodes, where the input node features are embedded, to the aggregator node $\alpha$. We implement a shallow neural network at every non-leaf node in $\vec{J}_G$, which takes input from all its incoming edges and propagates its output through its single outgoing edge, directed towards the root nodes. This can be implemented in the original message passing (16) by setting the weight (i.e., parameter $w_{u,t}^k$) components corresponding to the parent node in the directed $\vec{J}_G$ to zero. The final aggregation layer in $\vec{J}_G$ is only for mathematical purposes, so that we can prove an $\epsilon$-approximation result, as in (41).

With this, in the new message passing architecture on $\vec{J}_G$, each non-leaf node $u$ in $\vec{J}_G$ implements the following shallow neural network:
$$ h_u = \text{ReLU}\left( \sum_{k=1}^{N_u} a_u^k \langle w_u^k, h_{N_{\text{in}}(u)} \rangle + b_u^k \right), \tag{42} $$
where $h_{N_{\text{in}}(u)} \triangleq (h_{u'} \mid u' \in N_{\text{in}}(u))$ denotes the vector formed by concatenating the representation vectors $h_{u'}$ of all nodes $u'$ that have an incoming edge to $u$ in $\vec{J}_G$. Here, $a_u^k$ and $b_u^k$ are constants and $w_u^k$ is a vector of size $d_u - 1$, which is the total number of incoming links to node $u$ in $\vec{J}_G$, where $d_u$ is the total number of links that node $u$ has in $J_G$. Thus, for every non-leaf node $u \in \vec{J}_G$ we have $(d_u - 1) \times N_u$ parameters that model the shallow network. The aggregator node generates the output by simply summing the representation vectors at the root nodes.
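For concreteness, the sketch below implements one per-node shallow ReLU network of the form (42) in NumPy and produces the aggregator output by summing the root representations, as described above. It is an illustration only: the random parameters, the function names, and the tiny two-leaf H-tree are our own and are not taken from the paper.

```python
import numpy as np

def shallow_node(h_in, a, W, b):
    """Per-node shallow network of eq. (42): ReLU( sum_k a_k <w_k, h_in> + b_k ).

    h_in: concatenated in-neighbor representations, shape (d_u - 1,).
    a, b: shape (N_u,);  W: shape (N_u, d_u - 1), whose rows are the vectors w_u^k.
    """
    return max(0.0, float(np.sum(a * (W @ h_in) + b)))

# Tiny directed H-tree: two leaves feed a single root node r; r feeds the aggregator.
rng = np.random.default_rng(0)
x = {"leaf1": 0.2, "leaf2": 0.7}   # input features embedded at the leaves
a_r, W_r, b_r = rng.normal(size=3), rng.normal(size=(3, 2)), rng.normal(size=3)

h_r = shallow_node(np.array([x["leaf1"], x["leaf2"]]), a_r, W_r, b_r)
output = h_r   # aggregator: sum of root representations (only one root here)
print(output)
```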
Note that, in (42), $h_u$ depends on the input node features $X$. We omit this dependence in the notation for ease of presentation. We now define the space of functions that the above message passing on $\vec{J}_G$ produces:
$$ \mathcal{F}(G, N) = \left\{ X \to \sum_{u \in R} h_u \;\middle|\; h_u \text{ given in (42)} \right\}, \tag{43} $$
where $N = \sum_u (d_u - 1) N_u$ is the sum of all the parameters used in (42). In the following, we restrict ourselves to the DAG $\vec{J}_G$ and argue that any (smooth enough) function $f$ that has a compositional structure matching $\vec{J}_G$ can be approximated by a $g \in \mathcal{F}(G, N)$ (see (43)) with arbitrary precision.

Theorem 15: Let $f : [0,1]^n \to [0,1]$ be a function that has a compositional structure that matches the DAG $\vec{J}_G$. Let every constituent function $H_u$ of $f$ (see Definition 13) be $L_u$-Lipschitz with respect to the infinity norm. Then, for every $\epsilon > 0$ there exists a neural network $g \in \mathcal{F}(G, N)$ such that $\|f - g\|_\infty < \epsilon$ and the number of parameters $N$ is bounded by
$$ N = O\left( \sum_{u \in \mathcal{V}(\vec{J}_G) \setminus \{\alpha\}} (d_u - 1) \left( \frac{\epsilon}{L_u} \right)^{-(d_u - 1)} \right), \tag{44} $$
where $d_u$ denotes the degree (counting incoming and outgoing edges) of node $u$ in $\vec{J}_G$.

Proof: The proof of this result follows directly from the arguments presented for Theorem 3, Theorem 4, and Proposition 6 in [Poggio et al., 2017b], [Poggio et al., 2017a]. The first modification we make is the constant factor term $(d_u - 1)$ for each node $u$ in the summation in (44). This factor appears here, but not in [Poggio et al., 2017b], [Poggio et al., 2017a], because there the node degree was considered a constant. Here, the degree relates to the treewidth of the graph, and is an important parameter for tracking the scalability of the architecture. The second modification is that we allow for different Lipschitz constants $L_u$ for different constituent functions. However, the arguments in [Poggio et al., 2017b], [Poggio et al., 2017a] work for this case as well.

We now apply Theorem 15 to the function $f$ given in the statement of Theorem 6. In it, $f$ is compatible with respect to $G$. Thus, using Lemma 12 and Lemma 14, we can deduce that $f$ also has a compositional structure that matches the directed H-tree $\vec{J}_G$. In Figure 7, we illustrate this for a simple example. Thus, we can apply Theorem 15 on $f$ in order to seek an approximation $g \in \mathcal{F}(G, N)$.

In applying Theorem 15, we see that the functions $\theta_C$ are 1-Lipschitz. Thus, for all the nodes $u \in \vec{J}_G$ at which we compute a $\theta_C$, we have $L_u = 1 \leq d_u - 1$. The remaining functions that are to be approximated on $\vec{J}_G$ are the addition functions (see Figure 7 to see how they arise in computing a compatible function).

In order to derive our result, it suffices to argue that a simple sum of $k$ variables, taking values in the unit cube $[0,1]^k$, is $k$-Lipschitz with respect to the sup norm. This is indeed true and can be verified by simple arguments in analysis. Thus, for all the nodes $u$ at which we have to compute an addition, we have $L_u = d_u - 1$, where $d_u$ is the degree of node $u$ (counting both incoming and outgoing edges). Putting all this together and applying Theorem 15, we obtain the result.

B. Proof of Corollary 7

We first obtain upper bounds on the number of nodes $|\mathcal{V}(J_G)|$ in the H-tree $J_G$ and on the node degrees $d_u$ for $u \in \mathcal{V}(J_G)$. We prove the desired result by substituting these bounds in Theorem 6.

First, note that the subgraph of the H-tree $J_G$ induced by the set of root nodes $R$ is a tree decomposition $(\mathcal{T} = J_G[R], \mathcal{B})$ of $G$, by construction; see Algorithm 1 (lines 1-3). Let $\text{tw}[J_G]$ denote the treewidth of the tree decomposition $(\mathcal{T} = J_G[R], \mathcal{B})$. Then the size of each bag $B_\tau \in \mathcal{B}$ is bounded by $\text{tw}[J_G] + 1$ (see (12)). Let $T_\tau$ denote the sub-tree in $J_G$ that is formed by all the descendants of, and including, the node $\tau$ in $\mathcal{T} = J_G[R]$. Then, the number of nodes in $J_G$ is given by
$$ |\mathcal{V}(J_G)| = \sum_{\tau \in R} |\mathcal{V}(T_\tau)|. \tag{45} $$
Note that the size of each sub-tree $|\mathcal{V}(T_\tau)|$ is bounded by
$$ |\mathcal{V}(T_\tau)| \leq 1 + (\text{tw}[J_G] + 1)^{\text{tw}[J_G] + 1}. \tag{46} $$

This is because the depth of the tree $T_\tau$ is bounded by the bag size $|B_\tau|$, which is upper bounded by $\text{tw}[J_G] + 1$. Further, no node in $T_\tau$ has a bag size larger than $|B_\tau|$, and therefore the number of children at each non-leaf node in $T_\tau$ is bounded by $\text{tw}[J_G] + 1$. The additional "+1" in (46) accounts for the root node $\tau$ in $T_\tau$.

Finally, the number of nodes in the tree decomposition $\mathcal{T} = J_G[R]$ (or equivalently, the number of root nodes $|R|$) is upper bounded by $n$, the total number of nodes in graph $G$. This, along with (45)-(46), implies
$$ |\mathcal{V}(J_G)| \leq n + n\,(\text{tw}[J_G] + 1)^{\text{tw}[J_G] + 1}. \tag{47} $$
Note that the degree minus one, $d_u - 1$, is the size of a bag in a tree decomposition of some subgraph of $G$. Since the size of the largest bag in the tree decomposition of the entire graph is bounded by $\text{tw}[J_G] + 1$, we have
$$ d_u - 1 \leq \text{tw}[J_G] + 1, \tag{48} $$
for all $u \in \mathcal{V}(J_G)$. Substituting (47)-(48) in (18) of Theorem 6, we obtain the result.
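To get a feel for these bounds, the short sketch below (our illustration, not code from the paper) evaluates the right-hand sides of (47) and (48) for a given graph size $n$ and treewidth, making the exponential dependence on treewidth and the linear dependence on $n$ explicit.

```python
def h_tree_size_bound(n, tw):
    # Bound (47): |V(J_G)| <= n + n * (tw + 1)^(tw + 1).
    return n + n * (tw + 1) ** (tw + 1)

def degree_bound(tw):
    # Bound (48): d_u - 1 <= tw + 1, i.e., d_u <= tw + 2.
    return tw + 2

for tw in [1, 2, 3]:
    print(tw, h_tree_size_bound(n=1000, tw=tw), degree_bound(tw))
# Increasing the treewidth blows up the size bound exponentially,
# while doubling n only doubles it.
```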

C. Addendum to Experiments

We discuss the methods we use for tuning our hyper-parameters, report more results on train and test time requirements, and provide the list of semantic labels in the dataset.

Hyper-parameter Tuning. We tune the hyper-parameters in the following order, as recommended by [Shchur et al., 2019]:
• Iterations: [1, 2, 3, 4, 5, 6]
• Hidden dimension: [16, 32, 64, 128, 256]
• Learning rate: [0.0005, 0.001, 0.005, 0.01]
• Dropout probability: [0.25, 0.5, 0.75]
• L2 regularization strength: [0, 1e-4, 1e-3, 1e-2]
We first tune the number of iterations, the hidden dimension, and the learning rate using a grid search, while keeping dropout and L2 regularization at their lowest values. For both the standard GNNs and the neural tree, a single choice of the triplet (number of iterations, hidden dimension, learning rate) yields significantly higher accuracy than the others. With this triplet fixed, we then tune dropout and L2 regularization using another grid search.

In training, we notice that the batch size does not have a noticeable impact on the training and test accuracy. After having experimented with various batch sizes between 32 and 512, we recommend and use a batch size of 128 in our experiments.
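A minimal sketch of this two-stage grid search is shown below (illustrative only; `train_and_evaluate` is a hypothetical helper that trains a model with the given hyper-parameters and returns its validation accuracy).

```python
import itertools

iterations   = [1, 2, 3, 4, 5, 6]
hidden_dims  = [16, 32, 64, 128, 256]
lrs          = [0.0005, 0.001, 0.005, 0.01]
dropouts     = [0.25, 0.5, 0.75]
l2_strengths = [0.0, 1e-4, 1e-3, 1e-2]

def tune(train_and_evaluate):
    # Stage 1: grid search over (iterations, hidden dim, learning rate),
    # keeping dropout and L2 regularization at their lowest values.
    best_triplet = max(
        itertools.product(iterations, hidden_dims, lrs),
        key=lambda t: train_and_evaluate(*t, dropout=min(dropouts), l2=min(l2_strengths)),
    )
    # Stage 2: with the best triplet fixed, grid search over (dropout, L2).
    best_reg = max(
        itertools.product(dropouts, l2_strengths),
        key=lambda r: train_and_evaluate(*best_triplet, dropout=r[0], l2=r[1]),
    )
    return best_triplet, best_reg
```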

TABLE III
TUNED HYPER-PARAMETERS FOR VARIOUS MODELS

Model          hidden dim.   iter.   regularization   learning rate
GCN            64            3       0.0              0.01
NN + GCN       128           4       0.0              0.01
G-SAGE         128           3       1e-3             0.005
NN + G-SAGE    128           4       1e-3             0.005
GAT            128           2       1e-4             0.001
NN + GAT       128           4       1e-4             0.0005
GIN            64            3       1e-3             0.005
NN + GIN       128           4       1e-3             0.005

Table I in Section VII reported the test accuracies for various standard GNNs and neural tree models. The tuned hyper-parameters for these models are given in Table III. These hyper-parameters were tuned using the procedure described above. A dropout ratio of 0.25 turns out to be the optimal choice in each case. The optimization is run for no more than 1000 epochs of SGD (using the Adam optimizer) to achieve reasonable convergence during training.

Apart from the four listed hyper-parameters (hidden dimension, number of iterations, L2 regularization, learning rate), some of the implemented architectures (GAT, G-SAGE, GIN) have their own specific design choices and hyper-parameters. In the case of GAT, for example, we use 6 attention heads and the ELU activation function (instead of ReLU) to be consistent with the original paper. For GraphSAGE (G-SAGE in Table III), we use GraphSAGE-mean from the original paper, which does mean pooling after each convolution operation. In the case of GIN, we use the more general GIN-$\epsilon$ and train $\epsilon$ for better performance.

Time Requirements. In Table II we reported the compute, train, and test time required for the standard GCN and the neural tree with GCN message passing. We now report the train and test time for the other standard GNN architectures - GraphSAGE (G-SAGE), GAT, GIN - and the corresponding neural trees in Table IV. Note that the time to compute the H-trees remains the same across these implementations and is given in Table II.

TABLE IV
TIME REQUIREMENTS: TRAIN AND TEST

Model          Training (per epoch)   Testing
G-SAGE         0.068 s                0.042 s
NT + G-SAGE    0.311 s                0.060 s
GAT            0.089 s                0.049 s
NT + GAT       0.872 s                0.107 s
GIN            0.079 s                0.043 s
NT + GIN       0.348 s                0.059 s

Semantic Labels in the Dataset. In the 482 room-object scene graphs we used for testing, the room labels are: bathroom, bedroom, corridor, dining room, home office, kitchen, living room, storage room, utility room, lobby, playroom, staircase, closet, gym, garage. The object labels are: bottle, toilet, sink, plant, vase, chair, bed, tv, skateboard, couch, dining table, handbag, keyboard, book, clock, microwave, oven, cup, bowl, refrigerator, cell phone, laptop, bench, sports ball, backpack, tie, suitcase, wine glass, toaster, apple, knife, teddy bear, remote, orange, bicycle.