Covariance in Physics and Convolutional Neural Networks
Total Page:16
File Type:pdf, Size:1020Kb
Covariance in Physics and Convolutional Neural Networks Miranda C. N. Cheng 1 2 3 Vassilis Anagiannis 2 Maurice Weiler 4 Pim de Haan 5 4 Taco S. Cohen 5 Max Welling 5 Abstract change of coordinates. In his words, the general principle In this proceeding we give an overview of the of covariance states that “The general laws of nature are idea of covariance (or equivariance) featured in to be expressed by equations which hold good for all sys- the recent development of convolutional neural tems of coordinates, that is, are covariant with respect to networks (CNNs). We study the similarities and any substitutions whatever (generally covariant)” (Einstein, differencesbetween the use of covariance in theo- 1916). The rest is history: the incorporation of the math- retical physics and in the CNN context. Addition- ematics of Riemannian geometry in order to achieve gen- ally, we demonstrate that the simple assumption eral covariance and the formulation of the general relativity of covariance, together with the required proper- (GR) theory of gravity. It is important to note that the seem- ties of locality, linearity and weight sharing, is ingly innocent assumption of general covariance is in fact sufficient to uniquely determine the form of the so powerful that it determines GR as the unique theory of convolution. gravity compatible with this principle, and the equivalence principle in particular, up to short-distance corrections1. In a completely different context, it has become clear 1. Covariance and Uniqueness in recent years that a coordinate-independent description It is well-known that the principle of covariance, or coordi- is also desirable for convolutional networks. A covari- nate independence, lies at the heart of the theory of relativ- ant inference process is particularly useful in situations ity. The theory of special relativity was constructed to de- where the distribution of characteristic patterns is sym- scribe Maxwell’s theory of electromagnetism in a way that metric. Important practical examples include satellite im- satisfies the special principle of covariance, which states agery or biomedical microscopy imagery which often do not exhibit a preferred global rotation or chirality. In or- “If a system of coordinates K is chosen so that, in relation to it, physical laws hold good in their simplest form, the der to ensure that the inferred information of a network is equivalent for transformed samples, the network archi- same laws hold good in relation to any other system of co- 2 ′ tecture has to be designed to be equivariant under the ordinates K moving in uniform translation relatively to K” 3 (Einstein, 1916). corresponding group action . A wide range of equivari- ant models has been proposed for signals on flat Euclidean The transformation between K and K′, in other words spaces Rd. In particular, equivariance w.r.t. (subgroups between different inertial frames, can always be achieved of) the Euclidean groups E(d) of translations, rotations d arXiv:1906.02481v1 [cs.LG] 6 Jun 2019 through an element of the (global) Lorentz group. With the and mirrorings of R has been investigated for planar im- benefit of hindsight, there is no good reason why physics ages (d =2)(Cohen & Welling, 2016; 2017; Worrall et al., should only be covariant under a global change of coordi- 2017; Weiler et al., 2018b; Hoogeboom et al., 2018). and nates. Indeed, soon after the development of special rela- volumetric signals (d = 3) (Winkels & Cohen, 2018; tivity, Einstein started to develop his ideas for a theory that 1 is covariant with respect to a local, spacetime-dependent, The uniqueness argument roughly goes as follows. In order to achieve full general covariance, the only ingredients for the *Equal contribution 1Korteweg-de Vries Institute for Math- gravitational part of the action are the Riemann tensor and its ematics, University of Amsterdam, Amsterdam, the Nether- derivatives, with all the indices contracted. From a simple scal- lands 2Institute of Physics, University of Amsterdam, Am- ing argument one can show that all of them apart from the Ricci sterdam, the Netherlands 3On leave from CNRS, France scalar have subleading effects at large distances. Note that this 4Qualcomm-University of Amsterdam (QUVA) Lab, Amster- universality makes no assumptions on the matter fields that may dam, the Netherlands 5Qualcomm AI Research, Amsterdam, be coupled to gravity. 2 the Netherlands. Correspondence to: Vassilis Anagiannis In this article we will use the words “equivariant” and “co- <[email protected]>. variant” interchangeably as they convey the same concept. 3 To define the equivariance of a map requires the definition of Presented at the ICML 2019 Workshop on Theoretical Physics for a group action on the domain and codomain. One specific choice Deep Learning. Copyright 2019 by the author(s). of group action is the co/contravariant transformation of tensors. Covariance in Physics and Convolutional Neural Networks Worrall & Brostow, 2018; Weiler et al., 2018a) and has to be the orthonormal frame bundle, the structure group generally been found to outperform non-equivariant mod- is reduced to O(d) and in our analogy this corresponds to els in accuracy and data efficiency. Equivariance has fur- the vierbein formulation of GR where the group G is the ther proven to be a powerful principle in generalizing con- Lorentz group O(1, 3). volutional networks to more general spaces like the sphere Note that the parallel between the two problems regarding (Cohen et al., 2018b). In general, it has been shown in general covariance forces us to employ the same mathemat- (Kondor & Trivedi, 2018; Cohen et al., 2018c;a) that (glob- ical language of (psuedo) Riemannian geometry. Interest- ally) H-equivariant networks can be generalized to arbi- ingly, we will argue in the next section that our formalism is trary homogeneous spaces H/G where G H is a sub- basically unique once general covariance along with some group4. The feature spaces of such networks≤ are formal- basic assumptions is demanded. This can be comparedwith ized as spaces of sections of vector bundles over H/G, as- the long-distance uniqueness of GR, once covariance is re- sociated to the principal bundle H H/G. Our previous quired. examples are in this setting interpreted→ as E(d)-equivariant networks on Euclidean space Rd = E(d)/ O(d) and SO(3)- equivariant networks on the sphere S2 = SO(3)/ SO(2). 2. The Covariant Convolution This description includes Poincar´e-equivariant networkson In CNNs we are interested in devising a linear map between Minkowski spacetime since Minkowski space R1,3 arises the input feature space and the output space between every as quotient of the Poincar´egroup R1,3 ⋊ O(1, 3) w.r.t. the pair of subsequent layers in the convolutional network. In Lorentz group O(1, 3). this section we will argue that the four properties of 1) lin- Note that the change of coordinates required here is a earity, 2) locality, 3) covariance, and 4) weight sharing is global one. Global symmetries are extremely natural and sufficient to uniquely determine its form, which we give in readily applicable when the underlying space is homoge- (4) and(5) below. neous, i.e. the group action is transitive, meaning the space In mathematical terms, we describe the feature space in contains only a single orbit. At the same time, it is clearly the i-th layer in terms of a fiber bundle Ei with fiber Fi , desirable to have an effective CNN on an arbitrary surface, that is associated to the principal bundle P with the repre- often not equipped with a global symmetry. If the previ- sentation ρ of the structure group G, with the projection ous work on homogeneous spaces is based on an equiv- π : Ei M. The bundle structure captures the transfor- ariance requirement analogous to the special principle of mation properties→ of the feature fields under a change of covariance, then what one needs for general surfaces is an coordinates of the manifold M. For now we focus on a analogue of the general principle of covariance. In other sub-region U of the surface which admits a single coordi- words, we would like to have covariance with respect to a nate chart and a local trivialisation of the bundles. In this local, location-dependent coordinate transformations. language, the feature field corresponds to a local section6 This requirement for local transformation equivariance of fi Γi := Γ(Ei ,U) of the fiber bundle and the linear map ∈ convolutional networks on general manifolds has been rec- is between the space of sections: m Hom(Γin , Γout ). ognized and described in (Cohen et al., 2019). A choice Moreover, we require the linear map to∈ satisfy the follow- of local coordinates is thereby formalized as a gauge wx : ing locality condition: given the distance function on d 5 k k R TxM of the tangent space . Similar to the general the manifold M, which in our case will be supplied by → theory of equivariant networks on homogeneous spaces, the metric, we have (m f)(x) = (m f˜)(x) for all ◦ ◦ the feature fields of these networks are realized as sec- f, f˜ Γin with the property that f(y) = f˜(y) for all y tions of vector bundles over M, this time associated to the with ∈y, x < R for some (fixed) positive number R. The frame bundle FM of the manifold. Local transformations linearityk andk the locality of the map immediately leads to are described as position-dependent gauge transformations the following form of the map. To illustrate this, consider d wx wxgx, where gx G GL(R ) is an element the simplified scenario when the in- and output features are of the7→ structure group.