
arXiv:1906.02481v1 [cs.LG] 6 Jun 2019

Covariance in Physics and Convolutional Neural Networks

Miranda C. N. Cheng *1 2 3, Vassilis Anagiannis *2, Maurice Weiler 4, Pim de Haan 4 5, Taco S. Cohen 5, Max Welling 4 5

1 Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Amsterdam, the Netherlands
2 Institute of Physics, University of Amsterdam, Amsterdam, the Netherlands
3 On leave from CNRS, France
4 Qualcomm-University of Amsterdam (QUVA) Lab, Amsterdam, the Netherlands
5 Qualcomm AI Research, Amsterdam, the Netherlands
* Equal contribution. Correspondence to: Vassilis Anagiannis <[email protected]>.

Presented at the ICML 2019 Workshop on Theoretical Physics for Deep Learning. Copyright 2019 by the author(s).

Abstract

In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality, linearity and weight sharing, is sufficient to uniquely determine the form of the convolution.

1. Covariance and Uniqueness

It is well-known that the principle of covariance, or coordinate independence, lies at the heart of the theory of relativity. The theory of special relativity was constructed in a way that satisfies the special principle of covariance, which states: "If a system of coordinates K is chosen so that, in relation to it, physical laws hold good in their simplest form, the same laws hold good in relation to any other system of coordinates K′ moving in uniform translation relatively to K." (Einstein, 1916). In other words, the transformation between different inertial frames, K and K′, can always be achieved through an element of the (global) Lorentz group. With the benefit of hindsight, there is no good reason why physics should only be covariant under a global change of coordinates. Indeed, soon after the development of special relativity, Einstein started to develop his ideas for a theory that is covariant with respect to a local, x-dependent, change of coordinates. In his words, the general principle of covariance states that "The general laws of nature are to be expressed by equations which hold good for all systems of coordinates, that is, are covariant for any substitutions whatever (generally covariant)" (Einstein, 1916). The rest is history: the incorporation of the mathematics of Riemannian geometry in order to achieve general covariance, and the formulation of the general relativity (GR) theory of gravity. It is important to note that the seemingly innocent assumption of general covariance is so powerful that it in fact determines GR as the unique theory of gravity compatible with this principle, and the equivalence principle in particular, up to short-distance corrections².

In a completely different context, it has become clear in recent years that a coordinate-independent description is also desirable for convolutional networks. A covariant¹ inference process is particularly useful in situations where the distribution of characteristic patterns is symmetric. Important practical examples include satellite imagery or biomedical microscopy imagery, which often do not exhibit a preferred global rotation or chirality. In order to ensure that the inferred information is equivalent for transformed samples, the corresponding network architecture has to be designed to be equivariant under the group action³. A wide range of equivariant models has been proposed for signals on flat Euclidean spaces R^d.

¹ In this article we will use the words "equivariant" and "covariant" interchangeably, as they convey the same concept.
² The uniqueness argument goes roughly as follows. In order to achieve full general covariance, the only ingredients for the gravitational part of the action with derivatives are the Riemann tensors. From a simple scaling argument one can show that all of them, apart from the Ricci scalar (the Riemann tensor with all the indices contracted), have subleading effects at large distances. Note that this universality makes no assumptions on the matter fields that may be coupled to gravity.
³ To define the equivariance of a map requires the definition of a group action on the domain and codomain. One specific choice of group action is the co/contravariant transformation of tensors.

In particular, equivariance w.r.t. (subgroups of) the Euclidean groups E(d) of translations, rotations and mirrorings has been investigated for planar images (d = 2) (Cohen & Welling, 2016; 2017; Worrall et al., 2017; Weiler et al., 2018b; Hoogeboom et al., 2018) and volumetric signals (d = 3) (Winkels & Cohen, 2018; Worrall & Brostow, 2018; Weiler et al., 2018a), and such models have generally been found to outperform non-equivariant models in accuracy and data efficiency. Equivariance has further proven to be a powerful principle in generalizing convolutional networks to more general spaces like the sphere (Cohen et al., 2018b). In general, it has been shown in (Kondor & Trivedi, 2018; Cohen et al., 2018c;a) that (globally) H-equivariant networks can be generalized to arbitrary homogeneous spaces H/G, where G ≤ H is a subgroup⁴. The feature spaces of such networks are formalized as spaces of sections of vector bundles over H/G, associated to the principal bundle H → H/G. Our previous examples are in this setting interpreted as E(d)-equivariant networks on R^d = E(d)/O(d) and SO(3)-equivariant networks on the sphere S² = SO(3)/SO(2). This description includes Poincaré-equivariant networks on Minkowski spacetime, since R^{1,3} arises as a quotient of the Poincaré group R^{1,3} ⋊ O(1,3) w.r.t. the Lorentz group O(1,3).

Note that the change of coordinates required here is a global one. Global symmetries are extremely natural and readily applicable when the underlying space is homogeneous, i.e. the group action is transitive, meaning that the space contains only a single orbit. At the same time, it is clearly desirable to have an effective CNN on an arbitrary surface, often not equipped with a global symmetry. If the previous work on homogeneous spaces is based on an equivariance requirement analogous to the special principle of covariance, then what one needs for general surfaces is an analogue of the general principle of covariance. In other words, we would like to have covariance with respect to local, location-dependent coordinate transformations.

This requirement for local transformation equivariance of convolutional networks on general manifolds has been recognized and described in (Cohen et al., 2019). A choice of local coordinates is thereby formalized as a gauge w_x : R^d → T_xM of the tangent space⁵. Similar to the general theory of equivariant networks on homogeneous spaces, the feature fields of these networks are realized as sections of vector bundles over M, this time associated to the frame bundle FM of the manifold. Local transformations are described as position-dependent gauge transformations w_x ↦ w_x g_x, where g_x ∈ G ≤ GL(R^d) is an element of the structure group. When the frame bundle is chosen to be the orthonormal frame bundle, the structure group is reduced to O(d), and in our analogy this corresponds to the vierbein formulation of GR, where the group G is the Lorentz group O(1,3).

Note that the parallel between the two problems forces us to employ the same mathematical language of (pseudo-)Riemannian geometry. Interestingly, we will argue in the next section that our formalism is basically unique once general covariance, along with some basic assumptions, is demanded. This can be compared with the long-distance uniqueness of GR once covariance is required.

2. The Covariant Convolution

In CNNs we are interested in devising a linear map between the input feature space and the output feature space of every pair of subsequent layers in the convolutional network. In this section we will argue that the four properties of 1) linearity, 2) locality, 3) covariance, and 4) weight sharing are sufficient to uniquely determine its form, which we give in (4) and (5) below.

In mathematical terms, we describe the feature space in the i-th layer in terms of a fiber bundle E_i with fiber F_i, that is associated to the principal bundle P with the representation ρ of the structure group G, with the projection π : E_i → M. The bundle structure captures the transformation properties of the feature fields under a change of coordinates of the manifold M. For now we focus on a sub-region U of the surface which admits a single coordinate chart and a local trivialisation of the bundles. In this language, the feature field corresponds to a local section⁶ f_i ∈ Γ_i := Γ(E_i, U) of the fiber bundle, and the linear map is between the spaces of sections: m ∈ Hom(Γ_in, Γ_out). Moreover, we require the linear map to satisfy the following locality condition: given the distance function ∥·, ·∥ on the manifold M, which in our case will be supplied by the metric, we have (m ◦ f)(x) = (m ◦ f̃)(x) for all f, f̃ ∈ Γ_in with the property that f(y) = f̃(y) for all y with ∥y, x∥ < R, for some (fixed) positive number R. The linearity and the locality of the map immediately lead to the following form of the map. To illustrate this, consider the simplified scenario when the in- and output features are just numbers (scalars), which do not transform under coordinate transformations, and M is replaced with a set S with finitely many elements equipped with the distance function. Then the above requirements immediately lead to the matrix form of the map

  (m f)(x) = ∑_{y ∈ S, ∥y, x∥ < R} K(x, y) f(y).   (1)

⁴ Note that we are using an inverted definition of H and G w.r.t. the original paper, to stay compatible with the convention used in (Cohen et al., 2019), discussed below.
⁵ The gauge is equivalent to choosing a basis {ẽ_i}_{i=1}^d := {w_x(e_i)}_{i=1}^d of the tangent space T_xM, where {e_i} is the standard basis of R^d.
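The finite toy example above can be sketched numerically. The following NumPy snippet is our own illustration (not code from the paper): linearity alone gives an arbitrary matrix, the locality condition makes it banded, and, on this homogeneous toy space (a discrete circle), weight sharing collapses the band to a single kernel, turning the map into an ordinary circular convolution that commutes with translations.

```python
import numpy as np

# Toy version of the argument: the "manifold" is a discrete circle
# S = {0, ..., n-1} with the wrap-around distance, and features are scalars.
n, R = 12, 3

def dist(y, x):
    return min((y - x) % n, (x - y) % n)

def signed_offset(y, x):
    s = (y - x) % n
    return s - n if s > n // 2 else s

rng = np.random.default_rng(0)
f = rng.standard_normal(n)

# Linearity alone gives an arbitrary n x n matrix; the locality condition
# kills every entry K(x, y) with dist(y, x) >= R, leaving a banded matrix.
K_local = np.array([[rng.standard_normal() if dist(y, x) < R else 0.0
                     for y in range(n)] for x in range(n)])

# Locality: perturbing f far away from x = 0 does not change the output there.
f_far = f.copy()
f_far[n // 2] += 1.0                      # dist(n//2, 0) = 6 >= R
assert np.isclose((K_local @ f)[0], (K_local @ f_far)[0])

# Weight sharing: K(x, y) may depend only on the offset y - x, so a single
# kernel w of size 2R - 1 fills the whole band and the map becomes a
# circular convolution.
w = rng.standard_normal(2 * R - 1)
K_shared = np.array([[w[signed_offset(y, x) + R - 1] if dist(y, x) < R else 0.0
                      for y in range(n)] for x in range(n)])

# The shared-weight local linear map commutes with translations of the circle.
assert np.allclose(K_shared @ np.roll(f, 1), np.roll(K_shared @ f, 1))
```

The point of the sketch is that each assumption removes degrees of freedom: locality leaves only the band of width 2R − 1, and weight sharing leaves a single row of weights, which is exactly the discrete analogue of the uniqueness claim.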

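The weight sharing condition formulated in this section rests on parallel transport along geodesics. As a toy numerical illustration (our own sketch, not part of the paper's construction), the snippet below transports a tangent vector along great-circle arcs on S² and exhibits holonomy: around a closed geodesic triangle the vector returns rotated by the enclosed solid angle. This is the phenomenon behind the statement below that a point reached from the reference point by several geodesics inherits the kernel only up to a transformation dictated by covariance.

```python
import numpy as np

def transport_along_geodesic(a, b, v):
    """Parallel transport of a tangent vector v at a to b along the
    great circle through the unit vectors a and b.

    Along a geodesic the components of v in the frame
    (arc direction, great-circle normal) are constant.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    cos_t = np.dot(a, b)
    u = b - cos_t * a                 # unit tangent at a pointing toward b
    u /= np.linalg.norm(u)
    n = np.cross(a, u)                # unit normal of the great-circle plane
    sin_t = np.sqrt(1.0 - cos_t**2)
    t_b = -sin_t * a + cos_t * u      # arc direction at the endpoint b
    return np.dot(v, u) * t_b + np.dot(v, n) * n

# Geodesic triangle covering one octant of S^2:
# north pole -> x-axis -> y-axis -> back to the north pole.
p0 = np.array([0.0, 0.0, 1.0])
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])

v = np.array([1.0, 0.0, 0.0])         # tangent vector at the north pole
for a, b in [(p0, p1), (p1, p2), (p2, p0)]:
    v = transport_along_geodesic(a, b, v)

# Holonomy: v comes back rotated by the enclosed solid angle pi/2 about p0.
assert np.allclose(v, [0.0, 1.0, 0.0])
```

Transport preserves the norm of the vector and keeps it tangent to the sphere, but the result of going around the loop is a nontrivial rotation, so kernels shared by transport along different paths agree only up to a local change of frame.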
We write the components f_out^{μ1 μ2 ... μNo} for the output feature field, and similarly f_in for the input; the transformation property (2) is succinctly summarised by the tensor and index structure of the kernel function, which we write as K^{μ1 ... μNo}_{ν1 ... νNi}(x, v). In other words, for a fixed x and v ∈ T_xM we have K(x, v) ∈ (T_xM)^{⊗No} ⊗ (T*_xM)^{⊗Ni}. Explicitly, we now have for these cases

  f_out^{μ1 μ2 ... μNo}(x) = ∫_{B_x} √|g(x)| K^{μ1 ... μNo}_{ν1 ... νNi}(x, v) f_in|_{exp_x v}^{ν1 ... νNi}(x) d^d v,   (4)

where B_x ⊂ T_xM is the ball of radius R around the origin.

[Figure 1. Parallel transport of the feature field.]

Finally, we would like to impose the weight sharing condition, which we phrase in the following way: when the (localised) input signal is parallel transported along a curve, the output signal should also equal the parallel transport of the previous result. First, we need to explain what we mean by parallel transporting the input feature field along a curve γ̃ : [0, 1] → M with γ̃(0) = x and γ̃(1) = x′. For the point x itself it is clear that we can simply parallel transport f_in(x) to f_in(γ̃(t)). Supposing that y = exp_x v is connected to x by the geodesic flow γ_v starting from x, we also transport v ∈ T_xM to v(γ̃(t)) ∈ T_{γ̃(t)}M, and transport f_in|_y(x) by transporting it to f_in|_y(γ̃(t)) and then further transporting it along the geodesic flow γ_{v(γ̃(t))} starting from γ̃(t). After this prescription, depicted in Figure 1, it is clear how one should define K(x′, v) such that the weight sharing condition holds. Recall that for a given v ∈ T_xM, K_v(x) := K(x, v) ∈ (T_xM)^{⊗No} ⊗ (T*_xM)^{⊗Ni}. Now parallel transport it along γ̃ to obtain K_v(γ̃(t)), and define

  K(γ̃(t), v(γ̃(t))) := K_v(γ̃(t)).   (5)

In other words, we simultaneously parallel transport the dependence on the tangent vector. The vanishing of the covariant derivative of the output feature along the curve, γ̃*(∇) f_out(γ̃(t)), then just comes from the vanishing of the covariant derivative of the volume form and the definition (5). Note that (5) means that, along the path, the kernel is completely determined by the kernel K(γ(t₀), v) at any point t₀ on the path. Suppose further that we select a reference point x* ∈ M on the manifold. For any point y ∈ M that is connected to x* by a unique geodesic, parallel transport with respect to the geodesic then unambiguously "shares" the kernel at x* with y. On the other hand, when y is connected to x* by more than one geodesic, general covariance then dictates the relation between the outputs corresponding to the different geodesics. Moreover, this covariance also holds for transporting along different paths (not necessarily geodesics) in general. More precisely, we see how different kernels, related again by a local change of coordinates, can be compensated by a transformation of the input and output feature fields. We hence see that our simple and general assumptions in fact completely determine the form of the convolution map.

3. Discussion

After pointing out the parallels between special and general relativity and equivariant CNNs, it is also important to point out the crucial differences. In the CNN setup, the geometry is always held fixed and we do not consider dynamics of the metric. From this point of view the closer analogy is perhaps the study of field theories in a fixed curved spacetime, where the back-reaction of the matter fields on the spacetime geometry is ignored. It would be interesting to explore equivariant CNNs with geometry that evolves between layers in future work. It is certainly tempting to treat the direction of different layers as a part of the spacetime, either as the temporal or the holographic direction ('t Hooft, 1993). This interpretation is particularly relevant if all feature spaces carry the same group representation.

Acknowledgements

The work of MC is supported by ERC starting grant H2020 ERC StG #640159 and an NWO Vidi grant. The work of VA is supported by ERC starting grant #640159.

References

Cohen, T., Geiger, M., and Weiler, M. A General Theory of Equivariant CNNs on Homogeneous Spaces. 2018a.

Cohen, T. S. and Welling, M. Group equivariant convolutional networks. In ICML, 2016.

Cohen, T. S. and Welling, M. Steerable CNNs. In ICLR, 2017.

Cohen, T. S., Geiger, M., Koehler, J., and Welling, M. Spherical CNNs. In ICLR, 2018b.

Cohen, T. S., Geiger, M., and Weiler, M. Intertwiners between Induced Representations (with Applications to the Theory of Equivariant Neural Networks). 2018c.

Cohen, T. S., Weiler, M., Kicanaoglu, B., and Welling, M. Gauge equivariant convolutional networks and the icosahedral CNN. arXiv preprint arXiv:1902.04615, 2019.

Einstein, A. The foundation of the general theory of relativity. Annalen der Physik, 49(7):769–822, 1916.

Hoogeboom, E., Peters, J. W. T., Cohen, T. S., and Welling, M. HexaConv. In ICLR, 2018.

Kondor, R. and Trivedi, S. On the generalization of equivariance and convolution in neural networks to the action of compact groups. arXiv preprint arXiv:1802.03690, 2018.

't Hooft, G. Dimensional reduction in quantum gravity. Conf. Proc., C930308:284–296, 1993.

Weiler, M., Geiger, M., Welling, M., Boomsma, W., and Cohen, T. S. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. In NIPS, 2018a.

Weiler, M., Hamprecht, F. A., and Storath, M. Learning Steerable Filters for Rotation Equivariant CNNs. In CVPR, 2018b.

Winkels, M. and Cohen, T. S. 3D G-CNNs for Pulmonary Nodule Detection. In International Conference on Medical Imaging with Deep Learning (MIDL), 2018.

Worrall, D. E. and Brostow, G. J. CubeNet: Equivariance to 3D rotation and translation. In ECCV, 2018.

Worrall, D. E., Garbin, S. J., Turmukhambetov, D., and Brostow, G. J. Harmonic Networks: Deep Translation and Rotation Equivariance. In CVPR, 2017.