Arxiv Preprint Arxiv:1905.11460, 2019

Universal Equivariant Multilayer Perceptrons Siamak Ravanbakhsh 1 2 Abstract necessity of parameter-sharing for invariance was used to prove the limitation of a single layer Perceptron. The follow- Group invariant and equivariant Multilayer Per- up work showed how parameter symmetries can be used ceptrons (MLP), also known as Equivariant Net- to achieve invariance to finite and infinite groups (Shawe- works and Group Group Convolutional Neural Taylor, 1989; Wood & Shawe-Taylor, 1996; Shawe-Taylor, Networks (G-CNN) have achieved remarkable 1993; Wood, 1996). These fundamental early works went success in learning on a variety of data structures, unnoticed during the resurgence of neural network research such as sequences, images, sets, and graphs. This and renewed attention to symmetry (Hinton et al., 2011; paper proves the universality of a broad class of Mallat, 2012; Bruna & Mallat, 2013; Gens & Domingos, equivariant MLPs with a single hidden layer. In 2014; Jaderberg et al., 2015; Dieleman et al., 2016; Cohen particular, it is shown that having a hidden layer & Welling, 2016a). on which the group acts regularly is sufficient for universal equivariance (invariance). For example, When equivariance constraints are imposed on feed-forward some types of steerable-CNNs become universal. layers in an MLP, the linear maps in each layer is constrained Another corollary is the unconditional universal- to use tied parameters (Wood & Shawe-Taylor, 1996; Ravan- ity of equivariant MLPs for all Abelian groups. bakhsh et al., 2017b). This model that we call an equivariant A third corollary is the universality of equivari- MLP appears in deep learning with sets (Zaheer et al., 2017; ant MLPs with a high-order hidden layer, where Qi et al., 2017), exchangeable tensors (Hartford et al., 2018), we give both group-agnostic bounds and group- graphs (Maron et al., 2018), relational data (Graham & Ra- specific bounds on the order of the hidden layer vanbakhsh, 2019), and sets of symmetric elements (Maron that guarantees universal equivariance. et al., 2020). Universality results for some of these models exists (Zaheer et al., 2017; Segol & Lipman, 2019; Keriven & Peyre´, 2019; Maron et al., 2020). Broader results for high 1. Introduction order invariant MLPs appears in (Maron et al., 2019). Uni- versality results for “non-standard” architectures appears Invariance and equivariance properties constrain the out- in (Yarotsky, 2018; Sannai et al., 2019). In addition to put of a function under various transformations of its input. proving universality of networks using polynomial layer, This constraint serves as a strong learning bias that has Yarotsky(2018) also prove universality for standard MLPs proven useful in sample efficient learning for a wide range equivariant to Abelian groups. A similar results follows as of structured data. In this work, we are interested in uni- a corollary to our main theorem. versality results for Multilayer Perceptrons (MLPs) that are constrained to be equivariant or invariant. This type of result A parallel line of work in equivariant deep learning studies guarantees that the model can approximate any continuous linear action of a group beyond permutations. The resulting equivariant (invariant) function with an arbitrary precision, equivariant linear layers can be written using convolution in the same way an unconstrained MLP can approximate an operations (Cohen & Welling, 2016b; Kondor & Trivedi, arbitrary continuous function (Hornik et al., 1989; Cybenko, 2018). When limited to permutation groups, group convolu- 1989; Funahashi, 1989). tion is simply another expression of parameter-sharing (Ra- vanbakhsh et al., 2017b); see also Section 2.3. However, Study of invariance in neural networks goes back to the in working with linear representations, one may move be- book of Perceptrons (Minsky & Papert, 2017), where the yond finite groups (Cohen et al., 2019a; Kondor & Trivedi, 2018); see also (Wood & Shawe-Taylor, 1996). Some appli- 1School of Computer Science, McGill University, Montreal Canada. 2Mila - Quebec AI Institute.. Correspondence to: Siamak cations include equivariance to isometries of the Euclidean Ravanbakhsh <[email protected]>. space (Weiler & Cesa, 2019; Worrall et al., 2017), and sphere (Cohen et al., 2018). Extension of this view to mani- th Proceedings of the 37 International Conference on Machine folds is proposed in (Cohen et al., 2019b). Finally, a third Learning, Online, PMLR 119, 2020. Copyright 2020 by the au- line of work in equivariant deep learning that involves a thor(s). Universal Equivariant Multilayer Perceptrons specialized architecture and learning procedure is that of Capsule networks (Sabour et al., 2017; Hinton et al., 2018); see (Lenssen et al., 2018) for a group theoretic generaliza- tion. 2. Preliminaries Let G = fgg be a finite group. We define the action of this group on two finite sets N and M of input and output units in a feedforward layer. Using these actions which define permutation groups we then define equivariance and invariance. In detail, G-action on the set N is a structure Figure 1. The equivariant MLP of (16). The symbol y indicates G-action on the units, W and W0 for all channels of the hidden preserving map (homomorphism) a : G ! SN, into the c c layer c = 1;:::;C are constrained by the parameter-sharing of (3). symmetric group SN, the group of all permutations of N. The image of this map is a permutation group G ≤ S . If G-action on the hidden layer is regular, the number of channels N N can grow to approximate any continuous G-equivariant function Instead of writing [a(g)](n) for g 2 G and n 2 N, we use with an arbitrary accuracy. Bias terms are not shown. the short notation g · n = g−1n to denote this action. Let M be another G-set, where the corresponding permutation action GM ≤ SM is defined by b : G ! SM. G-action on HnG = fHg; g 2 Gg, form a partition of G. G naturally RN : : N naturally extends to x 2 by g · xn = xg·n 8g 2 GN: acts on the right coset space, where g0 · (Hg) = H(gg0) More conveniently, we also write this action as Agx, where sends one coset to another. The significance of this action is Ag is the permutation matrix form of a(g; ·): N ! N. that “any” transitive G-action is isomorphic to G-action on some right coset space. To see why, note that in this action 2.1. Invariant and Equivariant Linear Maps any h 2 H stabilizes the coset He, because h · He = He.1 Let the real matrix W 2 RjN|×|Mj denote a linear map Therefore in any action the stabilizer identifies the coset W : RjNj ! RjMj. We say this map is G-equivariant iff space. N BgWx = WAgx 8x 2 R ; g 2 G: (1) 2.3. Parameter-Sharing and Group-CNNs View where similar to Ag, the permutation matrix Bg is defined Consider the equivariance condition of (1). Since the equal- N based on the action b(·; g): M ! M. In this definition, we ity holds for all x 2 R , and using the fact that the inverse assume that the group action on the input is faithful – that of a permutation matrix is its transpose, the equivariance ∼ is a is injective, or GN = G. If the action on the output constraint reduces to index set M is not faithful, then the kernel of this action > is a non-trivial normal subgroup of G, ker(b) / G. In this BgWAg = W 8g 2 G: (2) ∼ case GM = G= ker(b) is a quotient group, and it is more accurate to say that W is invariant to ker(b) and equivariant The equation above ties the parameters within the orbits of to G= ker(b). Using this convention G-equivariance and G-action on rows and columns of W: G-invariance correspond to extreme cases of ker(b) = G and ker(b) = feg. Moreover, composition of such invariant- W(m; n) = W(g · m; g · n)8g 2 G; n; m 2 N × M (3) equivariant functions preserves this property, motivating design of deep networks by stacking equivariant layers. where W(g · m; g · n) is an element of the matrix W. This type of group action on Cartesian product space is some- 2.2. Orbits and Homogeneous Spaces times called the diagonal action. In this case, the action is on the Cartesian product of rows and columns of W. GN partitions N into orbits N1;:::; NO, where GN is transitive on each orbit, meaning that for each pair n1; n2 2 No, We saw that any homogenous G-space is isomorphic to ∼ ∼ there is at least one g 2 GN such that g · n1 = n2. If GN a coset space. Using N = HnG and M = KnG, the has a single orbit, it is transitive, and N is called a homoge- 1More generally, when G acts on the coset Ha 2 HnG, all neous space for G. If moreover the choice of g 2 GN with g 2 a−1Ha stabilize Ha. Since g = a−1ha for some h 2 H, g · n1 = n2 is unique, then GN is called regular. we have (a−1ha) · Ha = H(aa−1ha) = Ha. This means that any transitive G-action on a set N may be identified with the Given a subgroup H ≤ G and g 2 G, the right coset : : stabilizer subgroup Gn = fg 2 G s:t: g · n = ng, for a choice of of H in G, defined as Hg = fhg; h 2 Hg is a subset n 2 N. This gives a bijection between N and the right coset space of G. For a fixed H ≤ G, the set of these right-cosets, GnnG. Universal Equivariant Multilayer Perceptrons parameter-sharing constraint of (2) becomes In practice, it is common to construct invariant networks by first constructing an equivariant network followed by 0 −1 −1 0 W(Kg; Hg ) = W(g · Kg; g · Hg ) (4) pooling over H(L).

Load more