Arxiv:1612.04530V1 [Cs.CV] 14 Dec 2016 Try Into the Functional Form of the Network at Some a Function of the Form F(T [X]) = T [F(X)]
Total Page:16
File Type:pdf, Size:1020Kb
Permutation-equivariant neural networks applied to dynamics prediction Nicholas Guttenberg Araya, Tokyo and Earth-Life-Science Institute, Tokyo Nathaniel Virgo and Olaf Witkowski Earth-Life-Science Institute, Tokyo Hidetoshi Aoki and Ryota Kanai Araya, Tokyo The introduction of convolutional layers greatly advanced the performance of neural networks on image tasks due to innately capturing a way of encoding and learning translation-invariant operations, matching one of the underlying symmetries of the image domain. In comparison, there are a number of problems in which there are a number of different inputs which are all 'of the same type' | multiple particles, multiple agents, multiple stock prices, etc. The corresponding symmetry to this is permutation symmetry, in that the algorithm should not depend on the specific ordering of the input data. We discuss a permutation-invariant neural network layer in analogy to convolutional layers, and show the ability of this architecture to learn to predict the motion of a variable number of interacting hard discs in 2D. In the same way that convolutional layers can generalize to different image sizes, the permutation layer we describe generalizes to different numbers of objects. INTRODUCTION invariances was proposed in [11], where the authors gen- eralized convolutional operations to affine invariant com- Problems with a high degree of symmetry create large putations and showed a way to construct similar things inefficiencies for machine learning algorithms which do for arbitrary symmetry spaces. For permutation sym- not take those symmetries into account. When the prob- metry, work along this line has been done in specifying lem has symmetries (such as translation or rotation in- an expansion in terms of a neural net applied to spe- variance, or in the case of this paper, permutation invari- cific symmetry functions for estimation of potentials in ab ance), then the algorithm must independently learn the initio quantum mechanics calculations[12]. In addition, same pattern multiple times over, leading to a massive in- [13] uses pooling over instances to construct permutation- crease in the amount of training data and time required, invariant summary statistics for different data distribu- and creating many opportunities for spurious biases to tions. creep in. One approach is to handle this at the level of Instead of invariance, which generates a result that the data, removing the redundant symmetries from the does not change under transformations of the input, one input and aligning things into a canonical form. Exam- might want to obtain a property called `equivariance'. In ples of this range from things like data augmentation[1], this case, the goal is not to make the final outcome con- SIFT features[2] and spatial transformer networks[3]. At- stant despite the application of some set of transforma- tentional models also do this implicitly, by aligning the tions, but rather to create a structure such that the input location of attention to features rather than coordinates and output of the network both behave the same way un- in the input data[4{8]. der transformation. That is to say, if invariance targets a The other possibility is to build the desired symme- function of the form f(x) = f(T [x]), equivariance targets arXiv:1612.04530v1 [cs.CV] 14 Dec 2016 try into the functional form of the network at some a function of the form f(T [x]) = T [f(x)]. This is use- level. For example, convolutional neural networks do this ful for things such as image segmentation, where apply- with respect to translation invariance. Because the same ing a rotation to the input image should not change the shared filter is applied at each point in the image space, pixel classes, but should change the positions at which and then (eventually) aggregated through a hierarchy those classes occur (in the same way as the input was of pooling operations, the final classification result be- rotated). A general recipe for constructing equivariant haves in a locally translation-invariant way at each scale. convolutions was proposed in [14], where this technique This kind of convolutional approach has been extended was used to generate a convolutional layer equivariant to to other domains and other symmetries. For example, 90 degree rotations and to mirror symmetry. graph convolutional networks allow for convolution op- In this paper, we're interested in making networks erations to be performed on local neighborhoods of ar- which predict the future dynamics of sets of similar, in- bitrary graphs[9, 10] (this kind of approach can be used teracting objects | for example, learning to predict and for both invariance and, see later, equivariance). A gen- synthesize the future dynamics of a set of particles just eral framework for constructing these kinds of specified by observing their trajectories. This could be something 2 Element 1 Element 2 Element 3 Element 4 PERMUTATIONAL LAYERS The common theme to constructing group-invariant or Properties group-equivariant functions is to perform a pooling over all combinations of the symmetry transforms. In the case x1 x1 x1 x2 x1 x3 x1 x4 of a symmetry group, applying a transformation associ- ated with that symmetry only shuffles around the mem- bers of the group | it does not add or remove a member. That means that as long as one does a calculation that is applied identically to all members of the symmetry group, the output will be invariant to transformations applied to the input. Equivariance can be constructed in a similar fashion by combining two members of the input space and only doing this pooling with respect to transformations applied to one of the two. +++ For permutation equivariance, there is some existing + work, which differ in how interactions between elements are considered. Variadic networks[15, 16] implement per- mutation equivariance through the addition of a kind of mean-field approximation of the interactions between el- 4 ements to otherwise single-element neural networks (but 3 2 the mean-field effect extends to hidden variables as well, Output 1 so may indirectly capture higher order interactions). Re- cent work on set convolutions[17] also provides an equiv- FIG. 1. Structure of a permutational layer. Each output ariant layer for interchangeable objects, in which sum- y is calculated with a function that separately combines the i mation or maximization over pairs of objects is used to correspondingly indexed input x with each of the other inputs i construct the equivariant functions. xj , and then is summed over j. The same weights are used for all i; j combinations. Both of these approaches have in common that there is a kind of wrapper which guarantees that the influence of the rest of the elements on a particular member of the set is pooled in an invariant way, but generally the structure outside of the wrapper is made complex and the struc- ture within the wrapper is made simple (being either an unstructured mean over the population or a maximiza- along the line of deriving effective equations of motions tion or summation of linear functions of the population for complex particles such as animals in a swarm or play- features). ers of a sport, or apply at a more abstract level to some- thing such as relationships between variations in different When considering interacting particles, this may run sectors of the economy or in the interaction between spe- into difficulties | interaction potentials may be, rela- cific stocks. The associated symmetry to this sort of data tively speaking, fairly complex functions. But once we is permutation symmetry, and we are looking for meth- apply a pooling operation, we will lose the ability to track ods which let us build permutation equivariant neural exactly what particles were interacted with in order to networks. contribute to a given hidden layer feature. In that sense, it might require an excessive number of layers to recon- struct the correct interactions, because the network has We present numerical experiments assessing the fea- to learn how to use the hidden variables to encode not sibility and performance of various approaches to this just the interaction but also to store some kind of in- problem, and show that a combination of multi-layer em- dexing information over its neighbors (similar to the way bedded functions inside a max pooling permutation in- that pooling forces an autoencoder in the image domain variant wrapper can achieve significantly higher accuracy to store positional information inside the learned features on predicting the dynamics of interacting hard discs. Fur- so that it can be reconstructed later during the decoding thermore, networks using this combination demonstrate process). the ability to generalize to a different number of particles The ability to wrap a function with a pooling operation than seen during training, and can also handle cases in such that the output is invariant is quite general though, which the particles are not identical by way of including so it should be possible to put a complex function inside an auxiliary random label feature. the interaction | a multi-layer neural network. This 3 State at time t plications to both input elements and then sum the result x to create the object which is aggregated to form the out- x x y put of the layer (their Eq. 3, paraphrased here): y v y x v v x y vx vy x y x vy x v X vx y ~y = σ( W ~x + W ~x ) vy i 1 j 2 i vy j NH Hidden, ReLU ... where W and W are matrices of parameters, σ is an Permutational 1 2 element-wise nonlinearity, and the summation over j can Layer NF Outputs, ReLU be replaced with anything with the appropriate proper- ties over the set of elements (for example, maximization).