
The general theory of permutation equivariant neural networks and higher order graph variational encoders

Erik Henning Thiede 1 2   Truong Son Hy 1   Risi Kondor 1 2

arXiv:2004.03990v1 [cs.LG] 8 Apr 2020

Abstract

Previous work on symmetric group equivariant neural networks generally only considered the case where the group acts by permuting the elements of a single vector. In this paper we derive formulae for general permutation equivariant layers, including the case where the layer acts on matrices by permuting their rows and columns simultaneously. This case arises naturally in graph learning and relation learning applications. As a specific case of higher order permutation equivariant networks, we present a second order graph variational encoder, and show that the latent distribution of equivariant generative models must be exchangeable. We demonstrate the efficacy of this architecture on the tasks of link prediction in citation graphs and molecular graph generation.

1. Introduction

Generalizing from the success of convolutional neural networks in computer vision, equivariance has emerged as a core organizing principle of deep neural network architectures. Classical CNNs are equivariant to translations (LeCun et al., 1989). In recent years, starting with (Cohen & Welling, 2016), researchers have also constructed networks that are equivariant to the three dimensional rotation group (?)(Kondor et al., 2018a), the Euclidean group of translations and rotations (Cohen & Welling, 2017)(Weiler et al., 2018), and other symmetry groups (Ravanbakhsh et al., 2017). Closely related are generalizations of convolution to manifolds (Marcos et al., 2017)(Worrall et al., 2017). Gauge equivariant CNNs form an overarching framework that connects the two domains (Cohen et al., 2019a).

The set of all permutations of n objects also forms a group, called the symmetric group of degree n, commonly denoted Sn. The concept of permutation equivariant neural networks was proposed in (Guttenberg et al., 2016), and discussed in depth in "Deep sets" by (Zaheer et al., 2017). Since then, permutation equivariant models have found applications in a number of domains, including understanding the compositional structure of language (Gordon et al., 2020), and such models were analyzed from a theoretical point of view in (Keriven & Peyré, 2019)(Sannai et al., 2019). The common feature of all of these approaches, however, is that they only consider one specific way that permutations can act on vectors, namely (f_1, . . . , f_n) ↦ (f_{σ^{-1}(1)}, . . . , f_{σ^{-1}(n)}). This is not sufficient to describe certain naturally occurring situations, for example, when Sn permutes the rows and columns of an adjacency matrix.

In the present paper we generalize the notion of permutation equivariant neural networks to other actions of the symmetric group, and derive the explicit form of the corresponding equivariant layers, including how many learnable parameters they can have. In this sense our paper is similar to recent works such as (Cohen et al., 2019b; Kondor & Trivedi, 2018; Yarotsky, 2018) which examined the algebraic aspects of equivariant nets, but with a specific focus on the symmetric group.

On the practical side, higher order permutation equivariant neural networks appear naturally in graph learning and graph generation (Maron et al., 2019; Hy et al., 2018). More generally, we argue that this symmetry is critical for encoding relations between pairs, triples, quadruples etc. of entities rather than just whether a given object is a member of a set or not. As a specific example of our framework we present a second order equivariant graph variational encoder and demonstrate its use on link prediction and graph generation tasks. The distribution on the latent layer of such a model must be exchangeable (but not necessarily IID), forming an interesting connection to Bayesian nonparametric models (Bloem-Reddy & Teh, 2019).

1 Department of Computer Science, University of Chicago, Chicago, Illinois, USA. 2 Flatiron Institute, New York City, New York, USA. Correspondence to: Erik Henning Thiede, Risi Kondor.

2. Equivariance to permutations

A permutation of order n is a bijective map σ : {1, 2, . . . , n} → {1, 2, . . . , n}. The product of one permutation σ1 with another permutation σ2 is the permutation that we get by first performing σ1, then σ2, i.e., (σ2σ1)(i) := σ2(σ1(i)). It is easy to see that with respect to this notion of product, the set of all n! permutations of order n forms a group. This group is called the symmetric group of degree n, and denoted Sn.

Now consider a feed-forward neural network consisting of s neurons, n_1, n_2, . . . , n_s. We will denote the activation of the i'th neuron f^i. Each activation may be a scalar, a vector, a matrix or a tensor. As usual, we assume that the input to our network is a fixed size vector/matrix/tensor x, and the output is a fixed sized vector/matrix/tensor y.

Figure 1: The symmetric group acts on vectors by permuting their elements (left). However, in this paper we also consider equivariance to other types of Sn–actions, such as the way that a single permutation σ ∈ Sn permutes the rows and columns of an adjacency matrix simultaneously (right). The most general types of Sn–actions are best expressed in Fourier space.

The focus of the present paper is to study the behavior of neural networks under the action of Sn on the input x. This encompasses a range of special cases, relevant to different applications:

1. Trivial action. The simplest case is when Sn acts on x trivially, i.e., σ(x) = x, so permutations don't change x at all. This case is not very interesting for our purposes.

2. First order permutation action. The simplest nontrivial Sn–action is when x is an n dimensional vector and Sn permutes its elements:

   [σ(x)]_i = x_{σ^{-1}(i)}.   (1)

   This is the case that was investigated in (Zaheer et al., 2017) because it arises naturally when learning from sets, in particular when x_i relates to the i'th element of a set S of n objects {o_1, . . . , o_n}. Permuting the numbering of the objects does not change S as a set, but it does change the ordering of the elements of x exactly as in (1). When learning from sets the goal is to construct a network which, as a whole, is invariant to the permutation action. A natural extension allowing us to describe each object with more than just a single number is when x is an n×d dimensional matrix on which Sn acts by permuting its rows, [σ(x)]_{i,j} = x_{σ^{-1}(i), j}.

3. Second order permutation action. The second level in the hierarchy of permutation actions is the case when x is an n×n matrix on which the symmetric group acts by permuting both its rows and its columns:

   [σ(x)]_{i,j} = x_{σ^{-1}(i), σ^{-1}(j)}.

   While this might look exotic at first sight, it is exactly the case faced by graph neural networks, where x is the adjacency matrix of a graph. More generally, this case encompasses any situation involving learning from binary relations on a set.

4. Higher order cases. Extending the above, if x is a tensor of order k, the k'th order permutation action of Sn on x transforms it as

   [σ(x)]_{i_1,...,i_k} = x_{σ^{-1}(i_1),...,σ^{-1}(i_k)}.   (2)

   This case arises, for example, in problems involving rankings, and was also investigated in (Maron et al., 2019).

5. Other actions. Not all actions of Sn can be reduced to actually permuting the elements of a tensor. We will discuss more general cases in Section 4.
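To make the first and second order actions concrete, the following sketch (ours, not from the paper; it assumes only NumPy) applies a permutation to a feature vector and to an adjacency matrix, and checks that acting twice agrees with acting once by the product permutation defined above.

```python
import numpy as np

def act_first_order(sigma, x):
    # [sigma(x)]_i = x_{sigma^{-1}(i)}
    sigma_inv = np.argsort(sigma)
    return x[sigma_inv]

def act_second_order(sigma, A):
    # [sigma(A)]_{ij} = A_{sigma^{-1}(i), sigma^{-1}(j)}
    sigma_inv = np.argsort(sigma)
    return A[np.ix_(sigma_inv, sigma_inv)]

n = 5
rng = np.random.default_rng(0)
sigma = rng.permutation(n)          # sigma as an array: i -> sigma[i]
tau = rng.permutation(n)
x = rng.normal(size=n)
A = rng.normal(size=(n, n))

# The product (tau sigma)(i) = tau(sigma(i)) acts the same way as acting twice.
tau_sigma = tau[sigma]
assert np.allclose(act_first_order(tau_sigma, x),
                   act_first_order(tau, act_first_order(sigma, x)))
assert np.allclose(act_second_order(tau_sigma, A),
                   act_second_order(tau, act_second_order(sigma, A)))
```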

2.1. Invariance vs. equivariance

Neural networks learning from sets or graphs must be invariant to the action of the symmetric group on their inputs. However, even when a network is overall invariant, its internal activations are often expected to be equivariant (covariant) rather than invariant. In graph neural networks, for example, the output of the ℓ'th layer is often a matrix f^ℓ whose rows are indexed by the vertices (Bruna et al., 2014). If we permute the vertices, f^ℓ will change to f^{ℓ'}, where [f^{ℓ'}]_{i,j} = [f^ℓ]_{σ^{-1}(i), j}. It is only at the top of the network that invariance is enforced, typically by summing over the i index of the penultimate layer.

Other applications demand that the output of the network be covariant with the permutation action. Consider, for example, the case of learning to fill in the missing edges of a graph of n vertices. In this case both the inputs and the outputs of the network are n×n adjacency matrices, so the output must transform according to the same action as the input. Naturally, the internal nodes of such a network must also co-vary with the permutation action on the inputs and cannot just be invariant.

In this paper we assume that every activation of the network covaries with permutations. However, each f^i may transform according to a different action of the symmetric group. The general term for how these activations transform in a coordinated way is equivariance, which we define formally below. Note that invariance is a special case of equivariance corresponding to the trivial Sn–action σ(f^i) = f^i. Finally, we note that in this paper we use the terms covariant and equivariant essentially interchangeably: in general, the former is more commonly used in the context of compositional architectures, whereas the latter is the generic term used for convolutional nets.

2.2. General definition of permutation equivariance

To keep our discussion as general as possible, we start with a general definition of symmetric group actions.

Definition 1. Let f be a vector, matrix or tensor representing the input to a neural network or the activation of one of its neurons. We say that the symmetric group Sn acts linearly on f if under a permutation σ ∈ Sn, f changes to T_σ(f) for some fixed collection of linear maps {T_σ | σ ∈ Sn}.

This definition is more general than the first, second and k'th order permutation actions described above, because it does not constrain T_σ to just permute the entries of f. Rather, T_σ can be any linear map. Our general notion of permutation equivariant networks is then the following.

Definition 2. Let N be a neural network whose input is x and whose activations are f^1, f^2, . . . , f^s. Assume that the symmetric group Sn acts on x linearly by the maps {T^in_σ | σ ∈ Sn}. We say that N is equivariant to permutations if each of its neurons has a corresponding Sn–action {T^i_σ | σ ∈ Sn} such that when x ↦ T^in_σ(x), f^i will correspondingly transform as f^i ↦ T^i_σ(f^i).

Note that (Zaheer et al., 2017) only considered the case of first order permutation actions

(f^i_1, . . . , f^i_n) ↦ (f^i_{σ^{-1}(1)}, . . . , f^i_{σ^{-1}(n)}),   (3)

whereas our definition also covers richer forms of equivariance, including the second order case

(f^i_{1,1}, f^i_{1,2}, . . . , f^i_{n,n}) ↦ (f^i_{σ^{-1}(1),σ^{-1}(1)}, f^i_{σ^{-1}(1),σ^{-1}(2)}, . . . , f^i_{σ^{-1}(n),σ^{-1}(n)})   (4)

that is relevant to graph learning and relation learning.

3. Convolutional architectures

Classical convolutional networks and their generalizations to groups are characterized by the following three features:

1. The neurons of the network are organized into distinct layers ℓ = 0, 1, . . . , L. We will use f^ℓ to collectively denote the activations of all the neurons in layer ℓ.

2. The output of layer ℓ can be expressed as f^ℓ = ξ_ℓ(φ_ℓ(f^{ℓ−1})), where φ_ℓ is a learnable linear function, while ξ_ℓ is a fixed nonlinearity.

3. The entire network is equivariant to the action of a global symmetry group G in the sense that each f^ℓ has a corresponding G–action that is equivariant to the action of G on the inputs.

Generalized convolutional networks have found applications in a wide range of domains and their properties have been thoroughly investigated (Cohen & Welling, 2016)(Ravanbakhsh et al., 2017)(Cohen et al., 2019b).

Kondor & Trivedi (2018) proved that in any neural network that follows the above axioms, the linear operation φ_ℓ must be a generalized form of convolution, as long as f^{ℓ−1} and f^ℓ can be conceived of as functions on quotient spaces G/H_{ℓ−1} and G/H_ℓ with associated actions

f ↦ f_g,   f_g(u) = f(g^{-1}u).

In this section we show that this result can be used to derive the most general form of convolutional networks whose activations transform according to k'th order permutation actions. Our key tools are Propositions 5–7 in Appendix A, which show that the quotient spaces corresponding to the first, second and k'th order permutation actions are Sn/Sn−1, Sn/Sn−2 and Sn/Sn−k:

1. First order action. If f ∈ R^n is a vector transforming as f ↦ f^σ with f^σ = (f_{σ^{-1}(1)}, . . . , f_{σ^{-1}(n)}), then the corresponding quotient space function is

   f(µ) = f_{µ(n)},   µ ∈ Sn/Sn−1.

2. Second order action. If f ∈ R^{n×n} is a matrix with zero diagonal transforming as f ↦ f^σ where f^σ_{i,j} = f_{σ^{-1}(i), σ^{-1}(j)}, then the corresponding quotient space function is

   f(µ) = f_{µ(n), µ(n−1)},   µ ∈ Sn/Sn−2.

3. k'th order action. Let f ∈ R^{n×···×n} be a k'th order tensor such that f_{i_1,...,i_k} = 0 unless i_1, . . . , i_k are all distinct. If f transforms under permutations as f ↦ f^σ where f^σ_{i_1,...,i_k} = f_{σ^{-1}(i_1),...,σ^{-1}(i_k)}, then the corresponding quotient space function is

   f(µ) = f_{µ(n),...,µ(n−k+1)},   µ ∈ Sn/Sn−k.

Theorem 1 of (Kondor & Trivedi, 2018) states that if the equivariant activations f^{ℓ−1} and f^ℓ both correspond to the same quotient space Sn/S, then the linear operation φ_ℓ must be of the form

φ_ℓ(f^{ℓ−1}) = f^{ℓ−1} ∗ h_ℓ,

where ∗ denotes convolution on Sn, and h_ℓ is a function S\Sn/S → R. Mapping φ_ℓ(f^{ℓ−1}) back to the original vector/matrix/tensor domain we get the following results.

Theorem 1. If the ℓ'th layer of a convolutional neural network maps a first order Sn–equivariant vector f^in ∈ R^n to a first order Sn–equivariant vector f^out ∈ R^n, then the functional form of the layer must be

f^out_i = ξ( w_0 f^in_i + w_1 f^in_* ),

where f^in_* = Σ_{k=1}^n f^in_k and w_0, w_1 ∈ R are learnable weights.

Theorem 2. If the ℓ'th layer of a convolutional neural network maps a second order Sn–equivariant activation f^in ∈ R^{n×n} (with zero diagonal) to a second order Sn–equivariant activation f^out ∈ R^{n×n} (with zero diagonal), then the functional form of the layer must be

f^out_{i,j} = ξ( w_0 f^in_{i,j} + w_1 f^in_{j,i} + w_2 f^in_{i,*} + w_3 f^in_{*,i} + w_4 f^in_{*,j} + w_5 f^in_{j,*} + w_6 f^in_{*,*} ),

where f^in_{p,*} = Σ_k f^in_{p,k}, f^in_{*,p} = Σ_k f^in_{k,p}, f^in_{*,*} = Σ_{k,l} f^in_{k,l}, and w_0, . . . , w_6 ∈ R are learnable weights.

Theorem 1 is a restatement of the main result of (Zaheer et al., 2017), which here we prove without relying on the Kolmogorov–Arnold theorem. Theorem 2 is its generalization to second order permutation actions. The form of third and higher order equivariant layers can be derived similarly, but the corresponding formulae are more complicated. These results also generalize naturally to the case when f^in and f^out have multiple channels.

Having to map each activation to a homogeneous space does put some restrictions on its form. For example, Theorem 2 requires that f have zeros on its diagonal, because the diagonal and off-diagonal parts of f actually form two separate homogeneous spaces. The Fourier formalism of the next section exposes the general case and removes these limitations.
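The following sketch (ours, not code from the paper; single channel, NumPy only, arbitrary weights and nonlinearity) implements the layers of Theorems 1 and 2 and numerically checks their equivariance under a random permutation.

```python
import numpy as np

def first_order_layer(f, w0, w1, xi=np.tanh):
    # Theorem 1: f_out_i = xi(w0 * f_i + w1 * sum_k f_k)
    return xi(w0 * f + w1 * f.sum())

def second_order_layer(F, w, xi=np.tanh):
    # Theorem 2: f_out_{ij} = xi(w0 f_{ij} + w1 f_{ji} + w2 f_{i*} + w3 f_{*i}
    #                             + w4 f_{*j} + w5 f_{j*} + w6 f_{**})
    # for an n x n activation F with zero diagonal.
    r = F.sum(axis=1)                      # r[p] = f_{p,*}
    c = F.sum(axis=0)                      # c[p] = f_{*,p}
    out = (w[0] * F + w[1] * F.T
           + w[2] * r[:, None] + w[3] * c[:, None]
           + w[4] * c[None, :] + w[5] * r[None, :]
           + w[6] * F.sum())
    np.fill_diagonal(out, 0.0)             # stay in the zero-diagonal space
    return xi(out)

def permute1(sigma, f):                    # first order action
    return f[np.argsort(sigma)]

def permute2(sigma, F):                    # second order action
    inv = np.argsort(sigma)
    return F[np.ix_(inv, inv)]

n = 6
rng = np.random.default_rng(0)
sigma, w = rng.permutation(n), rng.normal(size=7)
f = rng.normal(size=n)
F = rng.normal(size=(n, n)); np.fill_diagonal(F, 0.0)

# Permuting the input and then applying the layer gives the same result
# as applying the layer and then permuting the output.
assert np.allclose(first_order_layer(permute1(sigma, f), w[0], w[1]),
                   permute1(sigma, first_order_layer(f, w[0], w[1])))
assert np.allclose(second_order_layer(permute2(sigma, F), w),
                   permute2(sigma, second_order_layer(F, w)))
```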

4. Fourier space activations

Given any Sn–action {T_σ | σ ∈ Sn}, for any σ1, σ2 we must have that T_{σ2σ1} = T_{σ2} T_{σ1}. This implies that {T_σ}_σ is a representation of the symmetric group, bringing the full power of representation theory to bear on our problem (Serre, 1977)(Fulton & Harris, 1991). In particular, representation theory tells us that there is a unitary transformation U that simultaneously block diagonalizes all T_σ operators:

U T_σ U^† = ⊕_{λ⊢n} ⊕_{i=1}^{κ(λ)} ρ_λ(σ),   (5)

where the ρ_λ(σ) matrix valued functions are the so-called irreducible representations (irreps) of Sn. Here λ ⊢ n means that λ ranges over the so-called integer partitions of n, i.e., non-increasing sequences of positive integers λ = (λ_1, . . . , λ_k) such that Σ_{i=1}^k λ_i = n. It is convenient to depict integer partitions with so-called Young diagrams, such as the diagram with rows of length 4, 3 and 1 for λ = (4, 3, 1). It is a peculiar facet of the symmetric group that its irreps are best indexed by these combinatorial objects (Sagan, 2001). The integer κ(λ) tells us how many times ρ_λ appears in the decomposition of the given action. Finally, we note that while in general the irreps of finite groups are complex valued, in the special case of the symmetric group they can be chosen to be real, allowing us to formulate everything in terms of real numbers.

It is advantageous to put activations in the same basis that block diagonalizes the action, i.e., to use Fourier space activations f̂ = U f. Similar Fourier space ideas have proved crucial in the context of other group equivariant architectures (Cohen et al., 2018)(Kondor et al., 2018b). Numbering integer partitions and hence irreps according to inverse lexicographic order (n) < (n−1, 1) < (n−2, 2) < (n−2, 1, 1) < . . ., we define the type τ = (τ_1, τ_2, . . .) of a given activation as the vector specifying how many times each irrep appears in the decomposition of the corresponding action. It is easy to see that if f is a first order permutation equivariant activation, then its type is (1, 1), whereas if it is a second order equivariant activation (with zero diagonal) then its type is (1, 2, 1, 1). The Fourier formalism allows further refinements. For example, if f is second order and symmetric, then its type reduces to (1, 2, 1). On the other hand, if it has a non-zero diagonal, then its type will be (2, 3, 1, 1).

It is natural to collect all parts of f̂ that correspond to the same irrep ρ_{λ_i} into a d_{λ_i} × τ_i matrix F_i, where d_{λ_i} is the dimension of ρ_{λ_i} (i.e., ρ_λ(σ) ∈ R^{d_λ×d_λ}). The real power of the Fourier formalism manifests in the following theorem, which gives a complete characterization of permutation equivariant convolutional architectures.

Theorem 3. Assume that the input to a given layer of a convolutional type Sn–equivariant network is of type τ = (τ_1, τ_2, . . .), and the output is of type τ' = (τ'_1, τ'_2, . . .). Then φ, the linear part of the layer, in Fourier space must be of the form

F_i ↦ F_i W_i,

where {F_i ∈ R^{d_{λ_i} × τ_i}}_i are the Fourier matrices of the input to the layer, and {W_i ∈ R^{τ_i × τ'_i}}_i are (learnable) weight matrices. In particular, the total number of learnable parameters is Σ_i τ_i × τ'_i.

This theorem is similar to the results of (Kondor & Trivedi, 2018), but is more general because it does not restrict the activations to be functions on individual homogeneous spaces.
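A quick numeric check (ours; the type vectors are the ones quoted above) of the parameter counts implied by Theorem 3: a layer needs Σ_i τ_i τ'_i weights, which reproduces the 2 weights of Theorem 1, the 7 weights of Theorem 2, and the 15 weights of Corollary 4 below.

```python
def num_params(tau_in, tau_out):
    # Theorem 3: one weight matrix W_i of shape tau_in[i] x tau_out[i] per irrep
    return sum(a * b for a, b in zip(tau_in, tau_out))

first_order          = (1, 1)         # vector in R^n
second_order_offdiag = (1, 2, 1, 1)   # n x n activation, zero diagonal
second_order_full    = (2, 3, 1, 1)   # n x n activation, non-zero diagonal

print(num_params(first_order, first_order))                    # 2   (Theorem 1)
print(num_params(second_order_offdiag, second_order_offdiag))  # 7   (Theorem 2)
print(num_params(second_order_full, second_order_full))        # 15  (Corollary 4)
```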

In particular, it makes it easy to count the number of allowable parameters in any layer, and to derive the form of the layer, such as in the following corollary.

Corollary 4. If a given layer of a convolutional neural network maps a second order Sn–equivariant activation f^in ∈ R^{n×n} (with non-zero diagonal) to a second order Sn–equivariant activation f^out ∈ R^{n×n} (with non-zero diagonal), then the layer's operation must be

f^out_{i,j} = ξ( w_0 f^in_{i,j} + w_1 f^in_{j,i} + w_2 f^in_{i,*} + w_3 f^in_{*,i} + w_4 f^in_{*,j} + w_5 f^in_{j,*} + w_6 f^in_{*,*} + w_7 f^in_* + w_8 f^in_{i,i} + w_9 f^in_{j,j} + δ_{i,j} ( w_10 f^in_i + w_11 f^in_* + w_12 f^in_{*,*} + w_13 f^in_{i,*} + w_14 f^in_{*,i} ) ),

where f^in_{p,*} = Σ_k f^in_{p,k}, f^in_{*,p} = Σ_k f^in_{k,p}, f^in_{*,*} = Σ_{k,l} f^in_{k,l}, f^in_p = f^in_{p,p}, and f^in_* = Σ_k f^in_{k,k}. Here w_0, . . . , w_14 are learnable parameters.

This case was also discussed in (Maron et al., 2019), Appendix A.

In the general case, computing the F_1, . . . , F_p matrices from f^in requires a (partial) Sn–Fourier transform, whereas computing f^out from the F_i W_i matrices requires an inverse Fourier transform. Fast Sn Fourier transforms have been developed that can accomplish these transformations in O(k s n^2) operations, where k is the order of the action and s = Σ_i τ_i is the total size of the input (or output) activation (Clausen, 1989)(Maslen & Rockmore, 1997). Unfortunately this topic is beyond the scope of the present paper. We note that since the Fourier transform is a unitary transformation, the notion of Sn–type of an activation and the statement in Theorem 3 regarding the number of learnable parameters remains valid no matter whether a given neural network actually stores activations in Fourier form.
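A sketch of the fifteen-parameter layer of Corollary 4 (ours; single channel, NumPy, arbitrary nonlinearity), with the diagonal handled exactly as in the formula above.

```python
import numpy as np

def second_order_layer_with_diag(F, w, xi=np.tanh):
    # Corollary 4: n x n activation with non-zero diagonal -> same, 15 weights w[0..14].
    n = F.shape[0]
    r = F.sum(axis=1)          # f_{p,*}
    c = F.sum(axis=0)          # f_{*,p}
    tot = F.sum()              # f_{*,*}
    d = np.diag(F).copy()      # f_p = f_{p,p}
    dsum = d.sum()             # f_* = sum_k f_{k,k}
    out = (w[0] * F + w[1] * F.T
           + w[2] * r[:, None] + w[3] * c[:, None]
           + w[4] * c[None, :] + w[5] * r[None, :]
           + w[6] * tot + w[7] * dsum
           + w[8] * d[:, None] + w[9] * d[None, :])
    idx = np.arange(n)
    out[idx, idx] += (w[10] * d + w[11] * dsum + w[12] * tot
                      + w[13] * r + w[14] * c)
    return xi(out)
```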

5. Compositional networks

Most prior work on Sn–equivariant networks considers the case where each activation is acted on by the entire group Sn. In many cases, however, a given neuron's output only depends on a subset of objects, and therefore we should only consider a subgroup Sk of Sn. Most notably, this is the case in message passing graph neural networks (MPNNs) (Gilmer et al., 2017), where the output of a given neuron in the ℓ'th layer only depends on a neighborhood of radius ℓ of the corresponding vertex in the graph. Of course, as stated in the Introduction, in most existing MPNNs the activations are simply invariant to permutations. However, there have been attempts in the literature to build covariance into graph neural networks, such as in (Hy et al., 2018), where, in the language of the present paper, the internal activations are first or second order permutation equivariant. Another natural example is networks that learn rankings by fusing partial rankings.

Definition 3. Let O = {o_1, . . . , o_n} be a set of n objects that serve as inputs to a feed-forward neural network N. Given a neuron n, we define the domain dom(n) of n as the largest ordered subset of O such that the output of n does not depend on O \ dom(n). A permutation covariant compositional neural network is a neural network in which:

1. The neurons form a partially ordered set such that if n_{c_1}, . . . , n_{c_k} form the inputs to n, then dom(n) = ∪_i dom(n_{c_i}).

2. Under the action of σ ∈ Sn on O, the network transforms to N', such that for each n ∈ N there is a corresponding n' ∈ N' with dom(n') = π(σ(dom(n))) for some π ∈ S_{|dom(n)|}.

3. Each neuron has a corresponding action {T^n_π | π ∈ S_{|dom(n)|}} such that f_{n'} = T^n_π(f_n).

The above definition captures the natural way in which any network that involves a hierarchical decomposition of a complex object into parts is expected to behave. Informally, the definition says that each neuron is only responsible for a subset of objects (dom(n)), and if the objects are permuted by σ, then the output of the neuron is going to transform according to the restriction of σ to this subset, which is π. For example, message passing neural networks follow this schema with dom(n) being the 1-neighborhoods, 2-neighborhoods, etc., of each vertex. However, in most MPNNs the activations are invariant under permutations, which corresponds to {T^n_π} being the trivial action of S_{|dom(n)|}. If we want to capture a richer set of interactions, specifically interactions where the output of n retains information about the individual identities of the objects in its domain, relations between the objects, or their relative rankings, for example, then it is essential that the outputs of the neurons be allowed to vary with higher order permutation actions (Figure 2).

Figure 2: In a compositional network each neuron n is only equivariant to the permutations of a subset of objects dom(n) ⊆ O. Here, for example, n_1 captures information from vertices {v_1, v_2, v_3}; n_2 from {v_2, v_3, v_4}; and n_3 from {v_2, v_3, v_4, v_5}. When the outputs of these three neurons are aggregated in a higher level neuron whose domain covers all five vertices, the outputs of n_1, n_2 and n_3 must be promoted to transform equivariantly wrt. S_5.

Each neuron in a compositional architecture aggregates information related to dom(n_{c_1}), . . . , dom(n_{c_k}) ⊆ dom(n) to dom(n) itself. Being able to do this without losing equivariance requires knowing the correspondence between each element of dom(n_{c_p}) and dom(n). To be explicit, we will denote which element of dom(n) any given i ∈ dom(n_{c_p}) corresponds to by χ^n_p(i), and we will call promotion the act of reindexing each incoming activation f^{n_{c_p}} to transform covariantly with permutations of the larger set dom(n). The promoted version of f^{n_{c_p}} we denote f̃^{n_{c_p}}. In the case of first, second, etc., order activations stored in vector/matrix/tensor form, promotion corresponds to simple reindexing:

[f̃^{n_{c_p}}]_{χ^n_p(i)} = [f^{n_{c_p}}]_i,    [f̃^{n_{c_p}}]_{χ^n_p(i), χ^n_p(j)} = [f^{n_{c_p}}]_{i,j}.   (6)

In the case of Fourier space activations, with the help of FFT methods, computing each F̃^{n_{c_p}}_i takes O((|dom(n)| − |dom(n_{c_p})|) d_{λ_i} τ_i) operations. Describing this process in detail is unfortunately beyond the scope of the present paper.

Once promoted, f̃^{n_{c_1}}, . . . , f̃^{n_{c_k}} must be combined in a way that is itself covariant with permutations of dom(n). The simplest way to do this is just to sum them. For example, in a second order network, using Theorem 2 we might have

f^out_{i,j} = ξ( w_0 f̄_{i,j} + w_1 f̄_{j,i} + w_2 f̄_{i,*} + w_3 f̄_{*,i} + w_4 f̄_{*,j} + w_5 f̄_{j,*} + w_6 f̄_{*,*} ),

where f̄ = Σ_k f̃^{n_{c_k}}. This type of aggregation rule is similar to the summation operation used in MPNNs. In particular, while it correctly accounts for the overlap between the domains of upstream neurons, it does not take into account which f̃^{n_{c_i}} relates to which subset of dom(n). A richer class of models is obtained by forming potentially higher order covariant products of the promoted incoming activations rather than just sums. Such networks are still covariant but go beyond the mold of convolutional nets because, by virtue of the nonlinear interaction, one input to the given neuron can modulate the other inputs.
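The reindexing in (6) is simple to express in code. The sketch below (ours; the integer array `chi` plays the role of the map χ^n_p, and all names are illustrative) promotes a second order activation from a child domain into the parent domain, filling the entries the child does not see with zeros; promoted activations from several children can then simply be summed.

```python
import numpy as np

def promote_second_order(f_child, chi, parent_size):
    # chi[i] = position in dom(n) of element i of dom(n_cp), i.e. chi_p^n in Eq. (6)
    f_promoted = np.zeros((parent_size, parent_size))
    f_promoted[np.ix_(chi, chi)] = f_child
    return f_promoted

# Example from Figure 2: a child neuron over {v2, v3, v4} inside a parent over {v1,...,v5}
chi = np.array([1, 2, 3])                 # 0-based positions of v2, v3, v4 in the parent
f_child = np.arange(9.0).reshape(3, 3)
f_bar = promote_second_order(f_child, chi, parent_size=5)   # one term of the sum f_bar
```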

6. Second order permutation equivariant variational graph auto-encoder

To demonstrate the power of higher-order permutation-equivariant layers, we apply them to the problem of graph learning, where the meaning of both equivariance and compositionality is very intuitive. We focus on generative models rather than just learning from graphs, as the exact structure of the graph is uncertain. Consequently, edges over all possible pairs of nodes must, at least in theory, be considered. This makes a natural connection with the permutation group.

Variational auto-encoders (VAEs) consist of an encoder/decoder pair where the encoder learns a low dimensional representation of the input, while the decoder learns to reconstruct the input using a simple probabilistic model on a latent representation (Kingma & Welling, 2013). The objective function combines two terms: the first term measures how well the decoder can reconstruct each input in the training set from its latent representation, whereas the second term aims to ensure that the distribution corresponding to the training data is as close as possible to some fixed target distribution such as a multivariate normal. After training, sampling from the target distribution, possibly with constraints, will generate samples that are similar to those in the training set.

Equivariance is important when constructing VAEs for graphs because, in order to compare each input graph to the output graph generated by the decoder, the network needs to know which input vertex corresponds to which output vertex. This allows the quality of the reconstruction to be measured without solving an expensive graph-matching problem. The first such architectures were based on spectral ideas, in which case equivariance comes "for free" (Kipf & Welling, 2016). However, spectral VGAEs are limited in the extent to which they can capture the combinatorial structure of graphs because they draw every edge independently (albeit with different parameters).

An alternative approach is to generate the output graph in a sequential manner, somewhat similarly to how RNNs and LSTMs generate word sequences (You et al., 2018)(Liao et al., 2019). These methods can incorporate rich local information in the generative process, such as a library of known functional groups in chemistry applications. On the other hand, this can break the direct correspondence between the vertices of the input graph and the output graph, so the loss function can only compare them via global graph properties such as the frequency of certain small subgraphs, etc.

6.1. The latent layer

To ensure end-to-end equivariance, we set the latent layer to be a matrix P ∈ R^{n×C}, whose first index is first order equivariant with permutations. At training time, P is computed from f^{top}_{i,j,a} simply by summing over the j index. For generating new graphs, however, we must specify a distribution for P that is invariant to permutations. In the language of probability theory, such distributions are called exchangeable (Kallenberg, 2006).

For infinite exchangeable sequences de Finetti's theorem states that any such distribution is a mixture of iid. distributions (De Finetti, 1930). For finite dimensional sequences the situation is somewhat more complicated (Diaconis & Freedman, 1980). Recently a number of researchers have revisited exchangeability from the perspective of machine learning, and specifically graph generation (Orbanz & Roy, 2015)(Cai et al., 2016)(Bloem-Reddy & Teh, 2019).
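A minimal sketch (ours) of the two operations described above: at training time the latent matrix P is obtained from the top second order activation by summing out one index, and at generation time P is drawn from an exchangeable prior, here simply iid standard normal entries.

```python
import numpy as np

def encode_latent(f_top):
    # f_top: shape (n, n, C); summing over the second index leaves a first order
    # equivariant latent matrix P of shape (n, C)
    return f_top.sum(axis=1)

def sample_latent(n, C, rng):
    # iid N(0,1) entries: the rows of P are exchangeable, so the induced
    # distribution over decoded graphs is invariant to vertex relabelling
    return rng.normal(size=(n, C))

P = sample_latent(n=10, C=16, rng=np.random.default_rng(0))
```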

Link prediction on citation graphs. To demonstrate the efficacy of higher order Sn-equivariant layers, we compose them with the original VGAE architecture of (Kipf & Welling, 2016). While this architecture is a key historical benchmark on link-prediction tasks, it has the drawback that the decoder takes the form ξ(z z^⊤), where z is the value of the graph encoded in the latent layer. Effectively, the decoder of the graph has no hidden layers, which limits its expressive power. However, we show that this can be mitigated by composing the same architecture with Sn-convolutional layers. Specifically, we place additional convolutional layers between the outer product z z^⊤ and the final sigmoid, interspersed with ReLU nonlinearities. We then apply the resulting network to link prediction on the citation network datasets Cora and Citeseer (Sen et al., 2008). At training time, 15% of the citation links (edges) have been removed while all node features are kept. The models are trained on an incomplete graph Laplacian constructed from the remaining 85% of the edges. From the previously removed edges, we sample the same number of pairs of unconnected nodes (non-edges). We form validation and test sets that contain 5% and 10% of edges with an equal amount of non-edges, respectively. Hyperparameter optimization (e.g. number of layers, dimension of the latent representation, etc.) is done on the validation set.

We then compare the architecture against the original (variational) graph autoencoder (Kipf & Welling, 2016), as well as spectral clustering (SC) (Tang & Liu, 2011), deep walk (DW) (Perozzi et al., 2014), and GraphStar (Haonan et al., 2019), and compare the ability to correctly classify edges and non-edges using two metrics: area under the ROC curve (AUC) and average precision (AP). Numerical results of SC, DW, GAE and VGAE and experimental settings are taken from (Kipf & Welling, 2016). We initialize weights by Glorot initialization (Glorot & Bengio, 2010). We train for 2,048 epochs using Adam optimization (Kingma & Ba, 2015) with a starting learning rate of 0.01. The number of layers ranges from 1 to 4. The size of the latent representation is 64.

We found that these additional layers make VGAE so expressive that it quickly overfits. We consequently regularize the models through early stopping. Tables 1 and 2 show our numerical results on the Cora and Citeseer datasets. We give the performance of our best model for each task, as well as the performance from only adding a single layer to VGAE. Our results improve over the original VGAE architecture in all categories. For Cora, VGAE becomes competitive with more recent architectures such as GraphStar. Moreover, we see that already a single second-order equivariant layer gives a tangible improvement over the previous VGAE results.

Table 1: Cora link prediction results (AUC & AP)

Method                        AUC           AP
SC                            84.6 ± 0.01   88.5 ± 0.00
DW                            83.1 ± 0.01   85.0 ± 0.00
GAE                           91.0 ± 0.02   92.0 ± 0.03
VGAE                          91.4 ± 0.01   92.6 ± 0.01
GraphStar                     95.65 ± ?     96.15 ± ?
2nd order VGAE (our method)   96.1 ± 0.07   96.4 ± 0.06

Table 2: Citeseer link prediction results (AUC & AP)

Method                        AUC           AP
SC                            80.5 ± 0.01   85.0 ± 0.01
DW                            80.5 ± 0.02   83.6 ± 0.01
GAE                           89.5 ± 0.04   89.9 ± 0.05
VGAE                          90.8 ± 0.02   92.0 ± 0.02
GraphStar                     97.47 ± ?     97.93 ± ?
2nd order VGAE (our method)   95.3 ± 0.02   94.3 ± 0.02

Molecular generation. We next explore the ability of Sn-equivariant layers to build highly structured graphs by applying them to the task of molecular generation. We do not incorporate any domain knowledge in our architecture; the only input to the network is a one-hot vector of the atom identities and the adjacency matrix. As we intend to only test the expressive power of the convolutional layers, here we construct both the encoder and the decoder using activations of the form in (4) with ReLU nonlinearities. Node features can be incorporated naturally as second-order features with zero off-diagonal elements. Similar to (V)GAE, we use a latent layer of shape n × c, where c is the size of the latent embedding for each node. To form the values in the latent layer, we take a linear combination of the first-order terms in Corollary 4,

f^latent_i = ξ( w_1 f^in_{i,*} + w_2 f^in_{*,i} + w_3 f^in_{*,*} + w_4 f^in_* + w_5 f^in_{i,i} ).   (7)

We take our target distribution in the variational autoencoder to be N(0, 1) on this space, independently on all of the entries. To reconstruct a second-order feature from the latent layer in the decoder, we take the outer product of each channel of f^latent to build a collection of rank one second-order features. The rest of the decoder then consists of layers formed as described in Corollary 4. In the final layer we apply a sigmoid to a predicted tensor of size n × n × 3. This tensor represents our estimate of the labelled adjacency matrix, which should be 1 in the (i, j, k)'th element if atoms i and j are connected by a bond of type k in the (kekulized) molecule. Similarly, we also predict atom labels by constructing a first-order feature as in (7) with the number of channels equal to the number of possible atoms, and apply a softmax across channels. The accuracy of our estimate is then evaluated using a binary cross-entropy loss on the predicted bonds and a negative log-likelihood loss on the predicted atom labels.
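A schematic sketch (ours; NumPy, a single feature channel after mixing, arbitrary weights; a real implementation would carry many channels through every layer) of the decoder pipeline described above: outer products of the latent channels give rank-one second order features, which are passed through layers of the form of Corollary 4 with ReLU nonlinearities and a final sigmoid. Here `second_order_layer_with_diag` refers to the sketch given after Corollary 4.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_edges(P, mix, layer_weights):
    # P: (n, C) latent matrix; mix: length-C vector combining the rank-one
    # outer products P[:, a] P[:, a]^T into a single second order feature.
    n, C = P.shape
    F = sum(mix[a] * np.outer(P[:, a], P[:, a]) for a in range(C))
    for w in layer_weights[:-1]:                 # each w holds 15 weights (Corollary 4)
        F = second_order_layer_with_diag(F, w, xi=relu)
    F = second_order_layer_with_diag(F, layer_weights[-1], xi=lambda x: x)
    return sigmoid(F)                            # edge probabilities, thresholded at 0.5
```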

Table 3: ZINC results (reconstruction accuracy & validity)

Method                               Accuracy   Validity
CVAE                                 44.6%      0.7%
GVAE                                 53.7%      7.2%
SD-VAE                               76.2%      43.5%
GraphVAE                             -          13.5%
Atom-by-Atom LSTM                    -          89.2%
JT-VAE                               76.7%      100.0%
2nd order Sn-Conv VAE (our method)   98.4%      33.4%

To actually generate the molecule, we choose the atom label with the largest value in the softmax, and connect it with all other atoms where our bond prediction tensor is larger than 0.5.

To test this architecture, we apply it to the ZINC dataset (Sterling & Irwin, 2015) and attempt to both encode the molecules in the dataset and generate new molecules. Both the encoder and the decoder have 4 layers with 100 channels, with a ReLU nonlinearity, and we set the latent space to be of dimension n_max × 20, where n_max is the largest molecule in the dataset (smaller molecules are treated using disconnected ghost atoms). We train for 256 epochs, again using the Adam algorithm (Kingma & Ba, 2015). Interestingly, we achieved better reconstruction accuracy and generated molecules when disabling the KL divergence term in the VAE loss function. We discuss potential reasons for this in the supplement.

To evaluate the accuracy of the encodings, we evaluate the reconstruction accuracy using the testing split described in (Kusner et al., 2017). In Table 3, we report numerical results for two metrics: reconstruction accuracy of molecules in the test set, and validity of new molecules generated from the prior, and compare with other algorithms.

Our algorithm gives the best reconstruction among the accuracies considered. Moreover, we note that our algorithm requires considerably less domain knowledge than many of the other algorithms in Table 3. The SD-VAE, Grammar VAE (GVAE) (Kusner et al., 2017), and Character VAE (CVAE) (Gomez-Bombarelli et al., 2016) treat the task of molecular graph generation as text generation in Natural Language Processing by explicitly using SMILES strings: structured representations of molecules that, in their construction, encode rules about chemical connectivity. JT-VAE explicitly draws from a vocabulary of chemical structures and explicitly ensures that the output obeys known chemical rules. In contrast, our model is given no knowledge of basic chemical principles such as valency or of functional groups, and must learn them from first principles.

To test the validity of newly generated molecules, we take 5000 samples from our target distribution, feed them into the decoder, and determine how many correspond to valid SMILES strings using RDKit. Our model does not learn to construct a single, connected graph; consequently we take the largest connected component of the graph as our predicted molecule. Perhaps as a result of our procedure for reading the continuous bond features into a single molecule with discrete bonds, we find that our results are biased towards small molecules: roughly 20 percent of the valid molecules we generate have 5 heavy atoms or less. In Figure 3 we give a random sample of 12 of the generated molecules that are syntactically valid and have more than 10 atoms. Comparing with the molecules in the ZINC dataset, our network constructs many molecules that are sterically strained. Many generated molecules have 4-atom rings, features which are rarely seen in stable organic molecules. Indeed, the ability to make inferences over types of molecular rings is an example of higher-order information, requiring direct comparison over multiple atoms at once. However, we note that our model is able to construct molecules with relatively reasonable bonding patterns and carbon backbones. In conjunction with our reconstruction results, we believe that even simple higher-order information can be used to infer more complex graph structure.

Figure 3: A sample of 12 molecules generated by the second-order graph variational autoencoder. Molecules are constrained to be syntactically valid (can be converted into valid SMILES strings) and have more than 10 atoms.
7. Conclusions

In this paper we argued that to fully take into account the action of permutations on neural networks, one must consider not just the defining representation of Sn, but also some of the higher order irreps. This situation is similar to, for example, recent work on spherical CNNs, where images on the surface of the sphere are expressed in terms of spherical harmonics up to a specific order (Cohen et al., 2018). Specifically, to correctly learn k'th order relations between members of a set requires a k'th order permutation equivariant architecture.

Graphs are but the simplest case of this, since whether or not two vertices are connected by an edge is a second order relationship. Already in this case, our experiments on link prediction and molecule generation show that equivariance gives us an edge over other algorithms, which is noteworthy given that the other algorithms use hand crafted chemical features (or rules). Going further, a 4th order equivariant model for example could learn highly specific chemical rules by itself relating to quadruples of atoms, such as "when do these four atoms form a peptide group?". Higher order relation learning would work similarly.

Interestingly, permutation equivariant generative models are also related to the theory of finite exchangeable sequences, which we plan to explore in future work.

References

Bloem-Reddy, B. and Teh, Y. W. Probabilistic symmetry and invariant neural networks, 2019.

De Finetti, B. Funzione caratteristica di un fenomeno aleatorio, mem. Accad. Naz. Lincei. Cl. Sci. Fis. Mat. Natur. ser, 6, 1930.

Diaconis, P. and Freedman, D. Finite exchangeable sequences. The Annals of Probability, pp. 745-764, 1980.

Fulton, W. and Harris, J. Representation Theory. Graduate Texts in Mathematics. Springer Verlag, 1991.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. Proceedings of International Conference on Machine Learning (ICML), 2017.

Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral Glorot, X. and Bengio, Y. Understanding the difficulty of networks and locally connected networks on graphs. 3, training deep feedforward neural networks. Proceedings 2014. of the 13th International Conference on Artificial Intelli- gence and Statistics (AISTATS), 9 (JMLR), 2010. Cai, D., Campbell, T., and Broderick, T. Edge-exchangeable graphs and sparsity. In Lee, D. D., Sugiyama, M., Gomez-Bombarelli, Wei, R., Duvenaud, J. N., Hernandez- Luxburg, U. V., Guyon, I., and Garnett, R. (eds.), Ad- Lobato, D., Sanchez-Lengeling, J. M., Sheberla, B., vances in Neural Information Processing Systems 29, pp. Aguilera-Iparraguirre, D., Hirzel, J., Adams, T. D., P., 4249–4257. Curran Associates, Inc., 2016. R., , and A., A.-G. Automatic chemical design using a data-driven continuous representation of molecules. ACS Clausen, M. Fast generalized Fourier transforms. Theor. Central Science, 2016. doi: 10.1021/acscentsci.7b00572. Comput. Sci., 67(1):55–63, 1989. Gordon, J., Lopez-Paz, D., Baroni, M., and Bouchacourt, Cohen, T. and Welling, M. Group equivariant convolutional D. Permutation equivariant models for compositional networks. In Balcan, M. F. and Weinberger, K. Q. (eds.), generalization in language. In International Conference Proceedings of The 33rd International Conference on on Learning Representations, 2020. Machine Learning, volume 48 of Proceedings of Machine Guttenberg, N., Virgo, N., Witkowski, O., Aoki, H., and Learning Research, pp. 2990–2999, New York, New York, Kanai, R. Permutation-equivariant neural networks ap- USA, 20–22 Jun 2016. PMLR. plied to dynamics prediction. CoRR, 2016. Cohen, T., Weiler, M., Kicanaoglu, B., and Welling, M. Haonan, L., Huang, S. H., Ye, T., and Xiuyan, G. Graph star Gauge equivariant convolutional networks and the icosa- net for generalized multi-task learning. arXiv preprint hedral CNN. In Chaudhuri, K. and Salakhutdinov, R. arXiv:1906.12330, 2019. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Ma- Hy, T. S., Trivedi, S., Pan, H., Anderson, B. M., and Kondor, chine Learning Research, pp. 1321–1330, Long Beach, R. Predicting molecular properties with covariant compo- California, USA, 09–15 Jun 2019a. PMLR. sitional networks. The Journal of Chemical Physics, 148 (24):241745, 2018. doi: 10.1063/1.5024797. Cohen, T. S. and Welling, M. Steerable CNNs. In Interna- tional Conference on Learning Representations (ICLR), Kallenberg, O. Probabilistic Symmetries and Invariance 2017. Principles. Probability and Its Applications. Springer New York, 2006. ISBN 9780387288611. Cohen, T. S., Geiger, M., Kohler,¨ J., and Welling, M. Spher- ical CNNs. In International Conference on Learning Keriven, N. and Peyre,´ G. Universal invariant and equivari- Representations (ICLR), 2018. ant graph neural networks. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc,´ F., Fox, E., and Garnett, R. Cohen, T. S., Geiger, M., and Weiler, M. A general theory (eds.), Advances in Neural Information Processing Sys- of equivariant cnns on homogeneous spaces. In Wallach, tems 32, pp. 7090–7099. Curran Associates, Inc., 2019. H., Larochelle, H., Beygelzimer, A., d'Alche-Buc,´ F., Fox, E., and Garnett, R. (eds.), Advances in Neural Infor- Kingma, D. P. and Ba, L. Adam: A method for stochas- mation Processing Systems 32, pp. 9142–9153. Curran tic optimization. International Conference on Learning Associates, Inc., 2019b. Representations (ICLR) 2015, 2015. 

Kingma, D. P. and Welling, M. Auto-encoding variational Maslen, D. K. and Rockmore, D. N. Separation of Variables bayes. arXiv preprint arXiv:1312.6114, 2013. and the Computation of Fourier Transforms on Finite Groups, I. Journal of the American Mathematical Society, Kipf, T. N. and Welling, M. Variational graph auto-encoders. 10:169–214, 1997. arXiv preprint arXiv:1611.07308, 2016. Orbanz, P. and Roy, D. M. Bayesian models of graphs, Kondor, R. and Trivedi, S. On the generalization of equiv- arrays and other exchangeable random structures. IEEE ariance and convolution in neural networks to the action Transactions on Pattern Analysis and Machine Intelli- of compact groups. In Dy, J. and Krause, A. (eds.), Pro- gence, 37(2):437–461, Feb 2015. ISSN 1939-3539. doi: ceedings of the 35th International Conference on Ma- 10.1109/TPAMI.2014.2334607. chine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 2747–2755, Stockholmsmassan,¨ Perozzi, B., Al-Rfou, R., and Skiena, S. Deepwalk: Online Stockholm Sweden, 10–15 Jul 2018. PMLR. learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Kondor, R., Lin, Z., and Trivedi, S. Clebsch–gordan nets: a Discovery and Data Mining (KDD), pp. 701–710, 2014. fully fourier space spherical convolutional neural network. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Ravanbakhsh, S., Schneider, J., and Poczos, B. Equivariance Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in through parameter-sharing. In Proceedings of Interna- Neural Information Processing Systems 31, pp. 10117– tional Conference on Machine Learning (ICML), 2017. 10126. Curran Associates, Inc., 2018a. Sagan, B. E. The Symmetric Group. Graduate Texts in Kondor, R., Lin, Z., and Trivedi, S. Clebsch–gordan nets: a Mathematics. Springer, 2001. fully fourier space spherical convolutional neural network. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Sannai, A., Takai, Y., and Cordonnier, M. Universal approx- Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in imations of permutation invariant/equivariant functions Neural Information Processing Systems 31, pp. 10117– by deep neural networks. CoRR, abs/1903.01939, 2019. 10126. Curran Associates, Inc., 2018b. Sen, P., Namata, G. M., Bilgic, M., Getoor, L., Gallagher, B., Kusner, M. J., Paige, B., and Hernandez-Lobato,´ J. M. , and Eliassi-Rad, T. Collective classification in network Grammar variational autoencoder. In Precup, D. and Teh, data. AI Magazine, 29(3):93–106, 2008. Y. W. (eds.), Proceedings of the 34th International Con- ference on Machine Learning, volume 70 of Proceedings Serre, J.-P. Linear Representations of Finite Groups, vol- of Machine Learning Research, pp. 1945–1954, Interna- ume 42 of Graduate Texts in Mathamatics. Springer- tional Convention Centre, Sydney, Australia, 06–11 Aug Verlag, 1977. 2017. PMLR. Sterling, T. and Irwin, J. J. Zinc 15 – ligand discovery for LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, everyone. J Chem Inf Model, 55(11):2324–2337, 2015. R. E., Hubbard, W., and Jackel, L. D. Backpropaga- doi: 10.1021/acs.jcim.5b00559. tion applied to handwritten zip code recognition. Neural Computation, 1989. Tang, L. and Liu, H. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, Liao, R., Li, Y., Song, Y., Wang, S., Hamilton, W., Duve- 23(3):447–478, 2011. naud, D. K., Urtasun, R., and Zemel, R. Efficient graph generation with graph recurrent attention networks. 
In Weiler, M., Geiger, M., Welling, M., Boomsma, W., and Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-´ Cohen, T. 3D steerable CNNs: Learning rotationally Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neu- equivariant features in volumetric data. ArXiv e-prints, ral Information Processing Systems 32, pp. 4257–4267. 1807.02547, July 2018. Curran Associates, Inc., 2019. Worrall, D. E., Garbin, S. J., Turmukhambetov, D., and Marcos, D., Volpi, M., Komodakis, N., and Tuia, D. Rota- Brostow, G. J. Harmonic networks: Deep translation and tion equivariant vector field networks. In International rotation equivariance. In Proceedings of International Conference on Computer Vision (ICCV), 2017. Conference on Computer Vision and Pattern Recognition (CVPR), 2017. Maron, H., Ben-Hamu, H., Shamir, N., and Lipman, Y. Invariant and equivariant graph networks. In International Yarotsky, D. Universal approximations of invariant maps by Conference on Learning Representations, 2019. neural networks. CoRR, abs/1804.10306, 2018. The general theory of permutation equivarant neural networks and higher order graph variational encoders

You, J., Liu, B., Ying, Z., Pande, V., and Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 31, pp. 6410-6421. Curran Associates, Inc., 2018.

Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. Deep sets. In Advances in Neural Information Processing Systems 30. 2017.

8. Appendix A: Supporting propositions

Proposition 5. Let f ∈ R^n be any vector on which Sn acts by the first order permutation action

f ↦ f^σ with f^σ = (f_{σ^{-1}(1)}, . . . , f_{σ^{-1}(n)}),   σ ∈ Sn.

Let us associate to f the function f : Sn/Sn−1 → R defined as

f(µ) = f_{µ(n)},   µ ∈ R,   (8)

where R is a complete set of Sn/Sn−1 coset representatives. Then the mapping f ↦ f is bijective. Moreover, under permutations f ↦ f^σ with f^σ(µ) = f(σ^{-1}µ).

Proof. Recall that Sn/Sn−1 is defined as the collection of left cosets {µSn−1}_µ with µSn−1 := {µν | ν ∈ Sn−1} ⊂ Sn. Any member of a given coset can be used to serve as the representative of that coset. Therefore, we first need to verify that (8) is well defined, i.e., that µ(n) = (µν)(n) for any ν ∈ Sn−1. Since (µν)(n) = µ(ν(n)) and ν fixes n, this is clearly true.

Now, for any i ∈ {1, 2, . . . , n} there is exactly one coset µSn−1 such that µ(n) = i. Therefore, the mapping f ↦ f is bijective. Finally, to show that f transforms correctly, consider that f^σ(µ) := (f^σ)_{µ(n)} = f_{σ^{-1}(µ(n))} = f_{(σ^{-1}µ)(n)} = f(σ^{-1}µ). ∎

The second order case is analogous with the only added complication that to ensure bijectivity we need to require that the activation as a matrix be symmetric and have zero diagonal. For the adjacency matrices of simple graphs these conditions are satisfied automatically.

Proposition 6. Let F ∈ R^{n×n} be a matrix with zero diagonal on which Sn acts by the second order permutation action

F ↦ F^σ with F^σ_{i,j} = F_{σ^{-1}(i), σ^{-1}(j)},   σ ∈ Sn.

Let us associate to F the function f : Sn/Sn−2 → R defined as

f(µ) = F_{µ(n), µ(n−1)},   µ ∈ R,   (9)

where R is a complete set of Sn/Sn−2 coset representatives. Then the mapping F ↦ f is bijective. Moreover, under permutations f ↦ f^σ with f^σ(µ) = f(σ^{-1}µ).

Proof. The proof is analogous to the first order case. First, Sn/Sn−2 is the set of left cosets of the form µSn−2 := {µν | ν ∈ Sn−2}. Since any ν ∈ Sn−2 fixes {n, n−1}, for any µν ∈ µSn−2 we have (µν)(n) = µ(n) and (µν)(n−1) = µ(n−1). Therefore f is well defined. It is also easy to see that for any pair (i, j) with i, j ∈ {1, 2, . . . , n} and i ≠ j there is exactly one µSn−2 coset satisfying µ(n) = i and µ(n−1) = j, so the mapping F ↦ f is bijective. Finally, to show that f transforms correctly,

f^σ(µ) = (F^σ)_{µ(n), µ(n−1)} = F_{σ^{-1}(µ(n)), σ^{-1}(µ(n−1))} = F_{(σ^{-1}µ)(n), (σ^{-1}µ)(n−1)} = f(σ^{-1}µ). ∎

Proposition 7. Let F ∈ R^{n×···×n} be a k'th order tensor satisfying F_{i_1,...,i_k} = 0 unless i_1, . . . , i_k are all distinct. Assume that Sn acts on F by the k'th order permutation action

F ↦ F^σ with F^σ_{i_1,...,i_k} = F_{σ^{-1}(i_1),...,σ^{-1}(i_k)},   σ ∈ Sn.

Let us associate to F the function f : Sn/Sn−k → R defined as

f(µ) = F_{µ(n),...,µ(n−k+1)},   µ ∈ R,   (10)

where R is a complete set of Sn/Sn−k coset representatives. Then the mapping F ↦ f is bijective. Moreover, under permutations f ↦ f^σ with f^σ(µ) = f(σ^{-1}µ).

Proof. Analogous to the first and second order cases. ∎

9. Appendix B: Proofs

Proof of Theorem 1. Let f^in and f^out be the quotient space functions corresponding to f^in and f^out. Then, by Theorem 1 of (Kondor & Trivedi, 2018),

f^out(µ) = ξ( Σ_{ν∈Sn} f^in(µν^{-1}) χ(ν) )

for some appropriate pointwise nonlinearity ξ and some (learnable) filter χ : Sn−1\Sn/Sn−1 → R. Alternatively, mapping f^out back to vector form, we can write

f^out_i = ξ(f̃_i),   f̃_i = Σ_{ν∈Sn} f^in(µ_i ν^{-1}) χ(ν),

where µ_i denotes the representative of the coset that maps n ↦ i.

Let e denote the identity element of Sn and τ_{n,n−1} ∈ Sn denote the element that swaps n and n−1. There are only two Sn−1\Sn/Sn−1 double cosets:

S_0 = Sn−1 e Sn−1 = { µ ∈ Sn | µ(n) = n },
S_1 = Sn−1 τ_{n,n−1} Sn−1 = { µ ∈ Sn | µ(n) ≠ n }.

Assume that χ takes on the value χ_0 on S_0 and the value χ_1 on S_1. Then

f̃_i = Σ_{ν∈S_0} f^in(µ_i ν^{-1}) χ(ν) + Σ_{ν∈S_1} f^in(µ_i ν^{-1}) χ(ν).

For the ν ∈ S_0 case note that |S_0| = (n−1)! and that µ_i and µ_i ν^{-1} will always fall in the same left Sn−1-coset. Therefore,

Σ_{ν∈S_0} f^in(µ_i ν^{-1}) χ(ν) = χ_0 Σ_{ν∈S_0} f^in(µ_i ν^{-1}) = (n−1)! χ_0 f^in(µ_i) = (n−1)! χ_0 f^in_i.

For the ν ∈ S_1 case note that as ν traverses S_1, µ_i ν^{-1} will never fall in the same coset as µ_i, but will fall in each of the other cosets exactly (n−1)! times. Therefore,

Σ_{ν∈S_1} f^in(µ_i ν^{-1}) χ(ν) = χ_1 Σ_{ν∈S_1} f^in(µ_i ν^{-1}) = (n−1)! χ_1 Σ_{π∈Sn/Sn−1, π∉µ_i Sn−1} f^in(π) = (n−1)! χ_1 Σ_{j≠i} f^in_j.

Combining the above,

f^out_i = ξ[ (n−1)! ( χ_0 f^in_i + χ_1 Σ_{j≠i} f^in_j ) ] = ξ[ (n−1)! ( χ_0 f^in_i + χ_1 Σ_j f^in_j − χ_1 f^in_i ) ].

Setting w_0 = (n−1)!(χ_0 − χ_1) and w_1 = (n−1)! χ_1 proves the result. ∎
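The identity derived above can be checked by brute force for small n: summing f^in(µ_i ν^{-1}) χ(ν) over all ν ∈ Sn with a filter that is constant on the two double cosets reproduces w_0 f^in_i + w_1 f^in_*. The sketch below (ours; a permutation is stored as a tuple p with p[k] = p(k) in 0-based indexing) verifies the weight identification w_0 = (n−1)!(χ_0 − χ_1), w_1 = (n−1)! χ_1 for n = 4.

```python
import itertools, math
import numpy as np

n = 4
rng = np.random.default_rng(0)
f_in = rng.normal(size=n)
chi0, chi1 = 0.7, -0.3          # arbitrary filter values on the two double cosets

perms = list(itertools.permutations(range(n)))   # p[k] = p(k), 0-based

def compose(p, q):               # (p q)(k) = p(q(k))
    return tuple(p[q[k]] for k in range(n))

def inverse(p):
    inv = [0] * n
    for k, pk in enumerate(p):
        inv[pk] = k
    return tuple(inv)

def mu(i):                       # coset representative sending n (index n-1) to i
    rep = list(range(n))
    rep[n - 1], rep[i] = rep[i], rep[n - 1]
    return tuple(rep)

def chi(nu):                     # constant on the double cosets S_{n-1}\S_n/S_{n-1}
    return chi0 if nu[n - 1] == n - 1 else chi1

f_conv = np.array([
    sum(chi(nu) * f_in[compose(mu(i), inverse(nu))[n - 1]] for nu in perms)
    for i in range(n)
])

w0 = math.factorial(n - 1) * (chi0 - chi1)
w1 = math.factorial(n - 1) * chi1
assert np.allclose(f_conv, w0 * f_in + w1 * f_in.sum())
```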

in out Proof of Theorem 2. Let f , f , τi,j and χ be as in the proof of Theorem 1. Now, however, f is indexed by two indices, i, j ∈ {1, 2, . . . , n} and i 6= j. Correspondingly, we let µi denote the representative of the Sn−2–coset consisting of permutations that take n 7→ i and n−1 7→ j. Similarly to the proof of Theorem 1, we set out ˜ ˜ X in −1 fi,j = ξ(fi,j) fi,j = f (µi,jν ) χ(ν), ν∈Sn where now χ: Sn−2\Sn/Sn−2 → R.

There are a rotal of seven Sn−2\Sn/Sn−2 → R cosets:

S0 = Sn−2 e Sn−2 = { µ ∈ Sn | µ(n) = n, µ(n − 1) = n−1 } S1 = Sn−2 τn,n−1 Sn−2 = { µ ∈ Sn | µ(n) = n−1, µ(n − 1) = n } S2 = Sn−1 τn−1,n−2 Sn−1 = { µ ∈ Sn | µ(n) = n, µ(n − 1) ∈ {1, . . . , n − 2}} , S3 = Sn−1 τn,n−1 τn−1,n−2 Sn−1 = { µ ∈ Sn | µ(n) = n−1, µ(n − 1) ∈ {1, . . . , n − 2}} , S4 = Sn−1 τn,n−2 Sn−1 = { µ ∈ Sn | µ(n) ∈ {1, . . . , n − 2}, µ(n − 1) = n−1 } , S5 = Sn−1 τn,n−1 τn,n−2 Sn−1 = { µ ∈ Sn | µ(n) ∈ {1, . . . , n − 2}, µ(n − 1) = n } , S6 = Sn−1 τn,n−2 τn−1,n−3 Sn−1 = { µ ∈ Sn | µ(n) ∈ {1, . . . , n − 2}, µ(n−1) ∈ {1, . . . , n − 2}} . The general theory of permutation equivarant neural networks and higher order graph variational encoders

Assuming that χ takes on the values χ0, . . . , χ6 on these seven cosets,

6 ˜ X X in −1 fi,j = χp f (µi,jν ) .

p=0 ν∈Sp

| {zp } hi,j Now we analyze the p = 0, 1,..., 6 cases separately.

−1 ◦ In the p = 0 case |S0| = (n−2)! and µi,jν ∈ µi,j Sn−1, therefore

0 X in −1 in hi,j = f (µi,jν ) = (n − 2)! fi,j.

ν∈S0

−1 ◦ In the p = 1 case |S1| = (n−2)! and µi,jν ∈ µj,i Sn−1, therefore

1 X in −1 in hi,j = f (µi,jν ) = (n − 2)! fj,i.

ν∈S1

−1 ◦ In the p = 2 case, as ν traverses S2, µi,jν will hit each µi,k Sn−2 coset with k 6∈ {i, j} exactly (n − 2)! times, therefore 2 X in −1 X in hi,j = f (µi,jν ) = (n − 2)! fi,k.

ν∈S2 k6∈{i,j}

−1 ◦ In the p = 3 case, as ν traverses S3, µi,jν will hit each µk,i Sn−2 coset with k 6∈ {i, j} exactly (n − 2)! times, therefore 3 X in −1 X in hi,j = f (µi,jν ) = (n − 2)! fk,i.

ν∈S3 k6∈{i,j}

−1 ◦ In the p = 4 case, as ν traverses S4, µi,jν will hit each µk,j Sn−2 coset with k 6∈ {i, j} exactly (n − 2)! times, therefore 4 X in −1 X in hi,j = f (µi,jν ) = (n − 2)! fk,j.

ν∈S4 k6∈{i,j}

−1 ◦ In the p = 5 case, as ν traverses S5, µi,jν will hit each µj,k Sn−2 coset with k 6∈ {i, j} exactly (n − 2)! times, therefore 5 X in −1 X in hi,j = f (µi,jν ) = (n − 2)! fj,k.

ν∈S5 k6∈{i,j}

◦ In the p = 6 case, as ν traverses S_6, μ_{i,j} ν^{-1} hits each μ_{k,l} S_{n-2} coset with k ∉ {i, j} and l ∉ {i, j} exactly (n−2)! times, therefore
\[
h^{6}_{i,j} = \sum_{\nu \in S_6} f^{\mathrm{in}}(\mu_{i,j}\, \nu^{-1}) = (n-2)! \sum_{k,l \notin \{i,j\}} f^{\mathrm{in}}_{k,l}.
\]

Summing each of the above terms gives

\[
f^{\mathrm{out}}_{i,j} = \xi\Big( (n-2)! \big[\, \chi_0 f^{\mathrm{in}}_{i,j} + \chi_1 f^{\mathrm{in}}_{j,i} + \chi_2 \big(f^{\mathrm{in}}_{i,*} - f^{\mathrm{in}}_{i,j}\big) + \chi_3 \big(f^{\mathrm{in}}_{*,i} - f^{\mathrm{in}}_{j,i}\big) + \chi_4 \big(f^{\mathrm{in}}_{*,j} - f^{\mathrm{in}}_{i,j}\big) + \chi_5 \big(f^{\mathrm{in}}_{j,*} - f^{\mathrm{in}}_{j,i}\big) + \chi_6 \big(f^{\mathrm{in}}_{*,*} - f^{\mathrm{in}}_{i,*} - f^{\mathrm{in}}_{*,j} + f^{\mathrm{in}}_{i,j} - f^{\mathrm{in}}_{j,*} - f^{\mathrm{in}}_{*,i} + f^{\mathrm{in}}_{j,i}\big) \,\big] \Big).
\]

The result follows by setting (the overall factor of (n−2)! can be absorbed into the weights)

\[
\begin{aligned}
w_0 &= \chi_0 - \chi_2 - \chi_4 + \chi_6,\\
w_1 &= \chi_1 - \chi_3 - \chi_5 + \chi_6,\\
w_2 &= \chi_2 - \chi_6,\\
w_3 &= \chi_3 - \chi_6,\\
w_4 &= \chi_4 - \chi_6,\\
w_5 &= \chi_5 - \chi_6,\\
w_6 &= \chi_6.
\end{aligned}
\]
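For intuition, here is a minimal NumPy sketch of the resulting second-order layer acting on the off-diagonal part of an n×n activation, using seven weights w_0, …, w_6 multiplying the terms f_{i,j}, f_{j,i}, f_{i,*}, f_{*,i}, f_{*,j}, f_{j,*}, f_{*,*}. The function name and the identity nonlinearity are illustrative assumptions.

    import numpy as np

    def second_order_equivariant_layer(F, w, xi=lambda x: x):
        # F: n x n array holding f^in_{i,j}; only the off-diagonal part is used.
        # w: seven scalars (w0, ..., w6).
        F = F - np.diag(np.diag(F))          # keep only the off-diagonal part
        row = F.sum(axis=1)                  # f_{i,*}
        col = F.sum(axis=0)                  # f_{*,i}
        tot = F.sum()                        # f_{*,*}
        out = (w[0] * F + w[1] * F.T
               + w[2] * row[:, None] + w[3] * col[:, None]
               + w[4] * col[None, :] + w[5] * row[None, :]
               + w[6] * tot)
        return xi(out)

    # Equivariance check: permuting rows and columns of the input simultaneously
    # permutes the output in the same way.
    rng = np.random.default_rng(1)
    A = rng.normal(size=(6, 6)); np.fill_diagonal(A, 0.0)
    p = rng.permutation(6)
    w = rng.normal(size=7)
    assert np.allclose(second_order_equivariant_layer(A, w)[np.ix_(p, p)],
                       second_order_equivariant_layer(A[np.ix_(p, p)], w))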

Proof of Corollary 4. The general theory of harmonic analysis on S_n tells us that the off-diagonal part of f^in transforms as a representation of type (1, 2, 1, 1), whereas the diagonal part transforms as a representation of type (1, 1). Putting these together, the overall type of f^in is (2, 3, 1, 1). Therefore, the Fourier transform of f^in consists of four matrices F_1, F_2, F_3, F_4, and the corresponding W_i weight matrices are of size 2×2, 3×3, 1×1 and 1×1, respectively. By Theorem 3, f^out_{i,j} = ξ(\tilde f_{i,j}), where the Fourier transform of \tilde f is (F_1 W_1, F_2 W_2, F_3 W_3, F_4 W_4). Therefore, for a given f^in, any given component \tilde f_{i,j} of \tilde f lives in a 15 dimensional space, parametrized by the 15 entries of the W_i weight matrices. It is easy to see that the various terms f^in_{i,j}, f^in_{j,i}, f^in_{i,*}, etc. appearing in Corollary 4 are all equivariant and linearly independent of each other. Since there are exactly 15 of them, the expression for f^out_{i,j} appearing in the Corollary is indeed the most general possible.
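For illustration, one way to realize such a 15-parameter second-order layer is sketched below, splitting the output into its off-diagonal part (10 contraction terms per entry) and its diagonal part (5 terms per entry), for a total of 15 weights. The particular enumeration and ordering of terms here is an assumption for illustration and need not match the ordering used in Corollary 4.

    import numpy as np

    def full_second_order_layer(F, w_off, w_diag):
        # F: n x n input activation (diagonal and off-diagonal both used).
        # w_off: 10 weights for the off-diagonal output, w_diag: 5 weights for the
        # diagonal output; 10 + 5 = 15 parameters in total.
        n = F.shape[0]
        d = np.diag(F)                      # f_{k,k}
        row, col = F.sum(1), F.sum(0)
        tot, trace = F.sum(), d.sum()
        I = np.eye(n)
        off = (w_off[0] * F + w_off[1] * F.T
               + w_off[2] * d[:, None] + w_off[3] * d[None, :]
               + w_off[4] * row[:, None] + w_off[5] * col[:, None]
               + w_off[6] * row[None, :] + w_off[7] * col[None, :]
               + w_off[8] * tot + w_off[9] * trace)
        diag = (w_diag[0] * d + w_diag[1] * row + w_diag[2] * col
                + w_diag[3] * tot + w_diag[4] * trace)
        return off * (1 - I) + np.diag(diag)

    # Equivariance check under simultaneous row/column permutation.
    rng = np.random.default_rng(2)
    F = rng.normal(size=(5, 5)); p = rng.permutation(5)
    wo, wd = rng.normal(size=10), rng.normal(size=5)
    assert np.allclose(full_second_order_layer(F, wo, wd)[np.ix_(p, p)],
                       full_second_order_layer(F[np.ix_(p, p)], wo, wd))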

10. Appendix C: Third order Layer

Here we describe the form of the third-order layer. The functional form for a layer that maps a third-order S_n-equivariant activation f^in ∈ R^{n×n×n} to another third-order S_n-equivariant activation f^out ∈ R^{n×n×n} can be derived in a manner similar to Corollary 4.
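As a point of reference for what equivariance means here, σ ∈ S_n acts on a third-order activation by permuting all three indices simultaneously. The hypothetical helper below (not from the paper) illustrates this action.

    import numpy as np

    def permute_third_order(f, p):
        # Apply a permutation p (array of length n, with sigma(i) = p[i]) to all three
        # indices of f, i.e. (sigma . f)[i, j, k] = f[sigma^{-1}(i), sigma^{-1}(j), sigma^{-1}(k)].
        inv = np.argsort(p)
        return f[np.ix_(inv, inv, inv)]

    # A layer Phi is S_n-equivariant iff Phi(permute_third_order(f, p)) equals
    # permute_third_order(Phi(f), p) for every permutation p.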

10.1. The number of parameters

Just as the second order activation could be decomposed into a second order off-diagonal component and a first order diagonal, here we decompose a third-order tensor f_{ijk} into five types of elements:

1. Elements where i, j, k are all distinct,

2. Elements where i = j ≠ k,

3. Elements where i ≠ j = k,

4. Elements where i = k ≠ j,

5. Elements where i = j = k.

Each type of element is band-limited in Fourier space, consisting only of irreducibles labeled by Young diagrams [n − 3, 1, 1, 1] or greater. Specifically, elements of type 1 transform as representations of type (1, 3, 3, 3, 1, 2, 1), elements of types 2–4 as representations of type (1, 2, 1, 1, 0, 0, 0), and elements of type 5 as representations of type (1, 1, 0, 0, 0, 0, 0). By the same arguments as before, the number of free parameters in the convolutional layer is given by

\[
(1 + 3\times 1 + 1)^2 + (3 + 3\times 2 + 1)^2 + (3 + 3\times 1)^2 + (3 + 3\times 1)^2 + 1^2 + 2^2 + 1^2 = 203.
\]
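As a quick numerical cross-check (illustrative only), the count can be reproduced by summing the type vectors of the five element types and then summing the squares of the resulting multiplicities:

    import numpy as np

    # Representation types (multiplicity vectors) of the five element types, as listed above.
    type_distinct = np.array([1, 3, 3, 3, 1, 2, 1])   # type 1: all indices distinct
    type_pair     = np.array([1, 2, 1, 1, 0, 0, 0])   # types 2-4: one pair of indices equal
    type_diag     = np.array([1, 1, 0, 0, 0, 0, 0])   # type 5: all indices equal

    total_type = type_distinct + 3 * type_pair + type_diag   # = [5, 10, 6, 6, 1, 2, 1]
    assert int((total_type ** 2).sum()) == 203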

10.2. Constructing the layer

Due to the large number of parameters, we will not explicitly give the form of the layer. However, we will describe its construction algorithmically. We begin by writing down the possible contractions that can be added to f_{ijk}.

1. Contractions giving third order tensors:

(a) Keep the original value f_{ijk}. This can be done in only one way.

2. Contractions giving second order tensors:

(a) Sum over a single index, e.g. Σ_i f_{ijk} or Σ_j f_{ijk}. This gives three tensors: one for each index.
(b) Set one index equal to a second one, e.g. f_{iik} or f_{ijj}. This gives three tensors: one for each unchanged index.

3. Contractions giving first order tensors:

(a) Sum over two indices, e.g. Σ_{i,j} f_{ijk} or Σ_{i,k} f_{ijk}. This gives three tensors: one for each index that isn't summed over.
(b) Set one index equal to a second one and then sum over the third, e.g. Σ_k f_{iik}. This gives three tensors, each corresponding to the summed index.
(c) Set one index equal to a second one and then sum over it, e.g. Σ_i f_{iik} or Σ_j f_{ijj}. This gives three tensors, each corresponding to the untouched index.
(d) Set all indices equal to each other, e.g. f_{iii}. This gives one tensor.

4. Contractions giving zeroth order tensors:

(a) Sum over all indices, Σ_{i,j,k} f_{ijk}. This gives one tensor.
(b) Set one index equal to a second one, sum over it, and then sum over the third, e.g. Σ_{i,k} f_{iik}. This gives three tensors, each corresponding to the index that is not set equal before summation.
(c) Set all indices equal to each other and then sum over them, e.g. Σ_i f_{iii}. This gives one tensor.

Table 4: How many terms in a given component of f^out come from contractions of each order.

f^out component         3rd order contr.   2nd order contr.   1st order contr.   0th order contr.   Total
≥ 0 indices shared          6 × 1              6 × 6              3 × 10             1 × 5            77
≥ 1 index shared            0 × 1              2 × 6              2 × 10             1 × 5            37
≥ 2 indices shared          0 × 1              0 × 6              1 × 10             1 × 5            15

In summary, the contractions yield 1 third-order tensor, 6 second-order tensors, 10 first-order tensors, and 5 zeroth-order tensors. We now add the elements of the contracted tensors together to form f^out. As we can permute the indices of the contracted tensors without breaking equivariance, we consider all possible permutations as well. For instance, when i, j, k are all distinct, we have
\[
f^{\mathrm{out}}_{ijk} = w_0 f^{\mathrm{in}}_{ijk} + w_1 f^{\mathrm{in}}_{jik} + w_2 f^{\mathrm{in}}_{ikj} + \cdots + w_6 A_{ij} + w_7 A_{ji} + w_8 A_{ik} + \cdots + w_{42} B_i + w_{43} B_j + \cdots
\]
where A is a generic second-order contraction and B is a generic first-order contraction. This means that, for a generic element of f^out, we have 6 terms from the single third-order contraction, 6 terms from each second-order contraction, 3 terms from each first-order contraction, and 1 term from each zeroth-order contraction. However, as in Corollary 4, elements where two or more indices match have additional flexibility, as they can be treated as separate, lower-order tensors with their own linear terms. Consequently, we can add additional contractions to those components, of equal or lower order. The total number of terms that contribute to each part of f^out is summarized in Table 4. This gives a total of 77 + 3 × 37 + 15 = 203 linear weights, which matches the result of Subsection 10.1.
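To make the enumeration above concrete, the following NumPy sketch builds the contracted tensors for a random third-order input and checks the counts 1, 6, 10 and 5. The grouping and helper names are illustrative, not taken from the paper.

    import numpy as np

    def third_order_contractions(f):
        # Enumerate the contractions of an n x n x n tensor f described in the text,
        # grouped by the order of the resulting tensor.
        third = [f]
        second = [f.sum(0), f.sum(1), f.sum(2),                 # sum over one index
                  np.einsum('iik->ik', f),                      # set i = j
                  np.einsum('ijj->ij', f),                      # set j = k
                  np.einsum('iji->ij', f)]                      # set i = k
        first = [f.sum((0, 1)), f.sum((0, 2)), f.sum((1, 2)),   # sum over two indices
                 np.einsum('iik->i', f), np.einsum('ijj->j', f), np.einsum('iji->i', f),
                 np.einsum('iik->k', f), np.einsum('ijj->i', f), np.einsum('iji->j', f),
                 np.einsum('iii->i', f)]                        # all indices equal
        zeroth = [f.sum(),
                  np.einsum('iik->', f), np.einsum('ijj->', f), np.einsum('iji->', f),
                  np.einsum('iii->', f)]
        return third, second, first, zeroth

    groups = third_order_contractions(np.random.default_rng(3).normal(size=(4, 4, 4)))
    assert [len(g) for g in groups] == [1, 6, 10, 5]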

11. Effect of Loss Function on Molecular Generation

When training the Sn-Conv VAE, we disabled the KL term of the VAE loss function by setting β = 0. This removes the penalty that keeps the amortized distribution close to the prior. Initially, we might expect that turning off the KL divergence would increase reconstruction accuracy, as the model has fewer constraints on the latent layer; at the same time, it would decrease molecular validity, as the amortized distribution would drift further from the N(0, 1) prior. At first glance, our results, shown in Table 5, support this hypothesis: we see higher reconstruction accuracy, but lower validity, when we disable the KL divergence term.

Table 5: ZINC results with latent layer of dimension 20.

Method                              Accuracy   Validity   Unique
2nd order Sn-Conv VAE w. KL          87.1%      93.%       7.3%
2nd order Sn-Conv VAE w.o. KL        98.4%      33.4%      82.7%

However, closer inspection shows that the high validity of the 2nd-order VAE with the KL term is an artifact: the model is merely choosing to predict many small molecules with five atoms or fewer. This is reflected in the considerably lower uniqueness of the valid molecules produced when the KL divergence is used. Indeed, when the KL divergence is included, the architecture only produces 8 valid molecules with more than 10 atoms in 5000 attempts; omitting the KL divergence, in contrast, produces 355. One explanation for these results might be mode collapse. However, the fact that the architecture with the KL divergence can still successfully reconstruct encoded molecules suggests this is not the case. Rather, these results may point towards the limits of simple, IID latent distributions, as forcing the amortized distribution towards such a prior seems to worsen the generative results of the network. This suggests that constructing networks that leverage more complex, exchangeable distributions may lead to improved results.
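For reference, a minimal sketch of a generic β-weighted VAE objective of the kind ablated here, with a standard normal prior on the latent code and a diagonal Gaussian amortized posterior. The function and variable names are illustrative assumptions and this is not the paper's implementation.

    import numpy as np

    def beta_vae_loss(recon_nll, mu, log_var, beta=1.0):
        # recon_nll: reconstruction negative log-likelihood of the decoded graph.
        # mu, log_var: parameters of the diagonal Gaussian amortized posterior q(z|x).
        # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).
        kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
        return recon_nll + beta * kl

    # Setting beta = 0 drops the KL penalty, so the amortized posterior is free to
    # drift away from the N(0, I) prior (the ablation discussed above).
    mu, log_var = np.zeros(20), np.zeros(20)
    assert beta_vae_loss(1.5, mu, log_var, beta=0.0) == 1.5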