arXiv:1402.4437v2 [cs.LG] 25 May 2014

Learning the Irreducible Representations of Commutative Lie Groups

Taco Cohen (T.S.COHEN@UVA.NL), Max Welling (M.WELLING@UVA.NL)
Machine Learning Group, University of Amsterdam

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

We present a new probabilistic model of compact commutative Lie groups that produces invariant-equivariant and disentangled representations of data. To define the notion of disentangling, we borrow a fundamental principle from physics that is used to derive the elementary particles of a system from its symmetries. Our model employs a newfound Bayesian conjugacy relation that enables fully tractable probabilistic inference over compact commutative Lie groups -- a class that includes the groups that describe the rotation and cyclic translation of images. We train the model on pairs of transformed image patches, and show that the learned invariant representation is highly effective for classification.

1. Introduction

Recently, the field of deep learning has produced some remarkable breakthroughs. The hallmark of the deep learning approach is to learn multiple layers of representation of data (Bengio & Lecun, 2014), and much work has gone into the development of representation learning modules such as RBMs and their generalizations (Welling et al., 2005), and autoencoders (Vincent et al., 2008). However, at this point it is not quite clear what characterizes a good representation. Various desiderata for learned representations have been expressed: representations should be meaningful, invariant (Goodfellow et al., 2009), abstract and disentangled (Bengio et al., 2013), but so far most of these notions have not been defined in a mathematically precise way. Here we focus on the notions of invariance and disentangling, leaving the search for meaning for future work. In this paper, we take a fresh look at the basic principles behind unsupervised representation learning from the perspective of Lie group theory.(1)

What do we mean, intuitively, when we speak of invariance and disentangling? A disentangled representation is one that explicitly represents the distinct factors of variation in the data. For example, visual data (i.e. pixels) can be thought of as a composition of object identity, position and pose, lighting conditions, etc. Once disentangling is achieved, invariance follows easily: to build a representation that is invariant to the transformation of a factor of variation (e.g. object position) that is considered a nuisance for a particular task (e.g. object classification), one can simply ignore the units in the representation that encode the nuisance factor.

To get a mathematical handle on the concept of disentangling, we borrow a fundamental principle from physics, which we refer to as Weyl's principle, following Kanatani (1990). In physics, this idea is used to tease apart (i.e. disentangle) the elementary particles of a physical system from mere measurement values that have no inherent physical significance. We apply this principle to the area of vision, for after all, pixels are nothing but physical measurements.

Weyl's principle presupposes a symmetry group that acts on the data. By this we mean a set of transformations that does not change the "essence" of the measured phenomenon, although it may change the "superficial appearance", i.e. the measurement values. As a concrete example that we will use throughout this paper, consider the group known as SO(2), acting on images by 2D rotation about the origin. A transformation from this group (a rotation) may change the value of every pixel in the image, but leaves the identity of the imaged object invariant. Weyl's principle states that the elementary components of this system are given by the irreducible representations of the symmetry group -- a concept that will be explained in this paper. Although this theoretical principle is widely applicable, we demonstrate it for real-valued compact commutative groups only.

(1) We will at times assume a passing familiarity with Lie groups, but the main ideas of this paper should be accessible to a broad audience.
We introduce a probabilistic model that describes a representation of such a group, and show how it can be learned from pairs of images related by arbitrary and unobserved transformations in the group. Compact commutative groups are also known as toroidal groups, so we refer to this model as Toroidal Subgroup Analysis (TSA). Using a novel conjugate prior, the model integrates probability theory and Lie group theory in a very elegant way. All the relevant probabilistic quantities such as normalization constants, moments, KL-divergences, the posterior density over the transformation group, the marginal density in data space, and their gradients can be obtained in closed form.

1.1. Related work

The first to propose a model and algorithm for learning Lie group representations from data were Rao & Ruderman (1999). This model deals only with one-parameter groups, a limitation that was later lifted by Miao and Rao (2007). Both works rely on MAP-inference procedures that can only deal with infinitesimally small transformations. This problem was solved by Sohl-Dickstein et al. (2010) using an elegant adaptive smoothing technique, making it possible to learn from large transformations. This model uses a general linear transformation to diagonalize a one-parameter group, and combines multiple one-parameter groups multiplicatively.

Other, non-group-theoretical approaches to learning transformations and invariant representations exist (Memisevic & Hinton, 2010). These gating models were found to perform a kind of joint eigenspace analysis (Memisevic, 2012), which is somewhat similar to the irreducible reduction of a toroidal group.

Motivated by a number of statistical phenomena observed in natural images, Cadieu & Olshausen (2012) describe a model that decomposes a signal into invariant amplitudes and covariant phase variables.

None of the mentioned methods take into account the full uncertainty over transformation parameters, as does TSA. Due to exact or approximate symmetries in the data, there is in general no unique transformation relating two images, so that only a multimodal posterior distribution over the group gives a complete description of the geometric situation. Furthermore, posterior inference in our model is performed by a very fast feed-forward procedure, whereas the MAP inference algorithm by Sohl-Dickstein et al. requires a more expensive iterative optimization.

2. Preliminaries

2.1. Equivalence, Invariance and Reducibility

In this section, we discuss three fundamental concepts on which the analysis in the rest of this paper is based: equivalence, invariance and reducibility.

Consider a function $\Phi: \mathbb{R}^D \rightarrow X$ that assigns to each possible data point $x \in \mathbb{R}^D$ a class label ($X = \{1, \ldots, L\}$) or some distributed representation (e.g. $X = \mathbb{R}^L$). Such a function induces an equivalence relation on the input space $\mathbb{R}^D$: we say that two vectors $x, y \in \mathbb{R}^D$ are $\Phi$-equivalent if they are mapped onto the same representation by $\Phi$. Symbolically, $x \equiv_\Phi y \Leftrightarrow \Phi(x) = \Phi(y)$.

Every equivalence relation on the input space fully determines a symmetry group acting on the space. This group, call it $G$, contains all invertible transformations $\rho: \mathbb{R}^D \rightarrow \mathbb{R}^D$ that leave $\Phi$ invariant: $G = \{\rho \mid \forall x \in \mathbb{R}^D: \Phi(\rho(x)) = \Phi(x)\}$. $G$ describes the symmetries of $\Phi$, or, stated differently, the label function/representation $\Phi$ is invariant to transformations in $G$. Hence, we can speak of $G$-equivalence: $x \equiv_G y \Leftrightarrow \exists \rho \in G: \rho(x) = y$. For example, if some elements of $G$ act by rotating the image, two images are $G$-equivalent if they are rotations of each other.

Before we can introduce Weyl's principle, we need one more concept: the reduction of a group representation (Kanatani, 1990). Let us restrict our attention to linear representations of Lie groups: $\rho$ becomes a matrix-valued function $\rho_g$ of an abstract group element $g \in G$, such that $\forall g, h \in G: \rho_{g \circ h} = \rho_g \rho_h$. In general, every coordinate $y_i$ of $y = \rho_g x$ can depend on every coordinate $x_j$ of $x$. Now, since $x$ is $G$-equivalent to $y$, it makes no sense to consider the coordinates $x_i$ as separate quantities; we can only consider the vector $x$ as a single unit because the symmetry transformations $\rho_g$ tangle all coordinates. In other words, we cannot say that coordinate $x_i$ is an independent part of the aggregate $x$, because a mapping $x \rightarrow x' = \rho_g x$ that is supposed to leave the intrinsic properties of $x$ unchanged, will in fact induce a functional dependence between all supposed parts $x'_i$ and $x_j$.

However, we are free to change the basis of the measurement space. It may be possible to use a change of basis to expose an invariant subspace, i.e. a subspace $V \subset \mathbb{R}^D$ that is mapped onto itself by every transformation in the group: $\forall g \in G: x \in V \Rightarrow \rho_g x \in V$. If such a subspace exists and its orthogonal complement $V^\perp \subset \mathbb{R}^D$ is also an invariant subspace, then it makes sense to consider the two parts of $x$ that lie in $V$ and $V^\perp$ to be distinct, because they remain distinct under symmetry transformations.

Let $W$ be a change of basis matrix that exposes the invariant subspaces, that is,

$$\rho_g = W \begin{pmatrix} \rho_g^1 & \\ & \rho_g^2 \end{pmatrix} W^{-1}, \qquad (1)$$

for all $g \in G$. Both $\rho_g^1$ and $\rho_g^2$ form a representation of the same abstract group as represented by $\rho$. The group representations $\rho_g^1$ and $\rho_g^2$ describe how the individual parts $x_1 \in V$ and $x_2 \in V^\perp$ are transformed by the elements of the group. As is common in group representation theory, we refer to both the group representations $\rho_g^1, \rho_g^2$ and the subspaces $V$ and $V^\perp$ corresponding to these group representations as "representations".

The process of reduction can be applied recursively to $\rho_g^1$ and $\rho_g^2$. If at some point there is no more (non-trivial) invariant subspace, the representation is called irreducible. Weyl's principle states that the elementary components of a system are the irreducible representations of the symmetry group of the system. Properly understood, it is not a physics principle at all, but generally applicable to any situation where there is a well-defined notion of equivalence.(2) It is completely abstract and therefore agnostic about the type of data (images, optical flows, sound, etc.), making it eminently useful for representation learning.

In the rest of this paper, we will demonstrate Weyl's principle in the simple case of a compact commutative subgroup of the special orthogonal group in D dimensions. We want to stress though, that there is no reason the basic ideas cannot be applied to non-commutative groups acting on non-linear latent representation spaces.

(2) The picture becomes a lot more complicated, though, when the group does not act linearly or is not completely reducible.
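To make the reduction concept concrete, here is a small numerical sketch of ours (not part of the paper's model); it assumes numpy, an arbitrary 4-dimensional example, and a random orthogonal basis. It builds a representation that is block-diagonalized by a basis $W$ as in eq. (1) and checks that the subspace $V$ spanned by the first two columns of $W$ is invariant.

```python
import numpy as np

def rot2(a):
    """2x2 rotation matrix."""
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

rng = np.random.default_rng(0)

# Random orthogonal change of basis W (4x4), obtained from a QR decomposition.
W, _ = np.linalg.qr(rng.standard_normal((4, 4)))

def rho(a, b):
    """A 2-parameter commutative group representation, block-diagonalized by W (eq. 1)."""
    block = np.zeros((4, 4))
    block[:2, :2] = rot2(a)
    block[2:, 2:] = rot2(b)
    return W @ block @ W.T        # W is orthogonal, so W^{-1} = W^T

# V is spanned by the first two columns of W. Check that rho maps V into V:
v = W[:, :2] @ rng.standard_normal(2)      # a vector in V
v_t = rho(0.7, -1.3) @ v                   # transformed vector
# Its component in the orthogonal complement should vanish.
print(np.allclose(W[:, 2:].T @ v_t, 0))    # True: V is an invariant subspace
```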
2.2. Maximal Tori in the Orthogonal Group

In order to facilitate analysis, we will from here on consider only compact commutative subgroups of the special orthogonal group SO(D). For reasons that will become clear shortly, such groups are called toroidal subgroups of SO(D). Intuitively, the toroidal subgroups of general compact Lie groups can be thought of as the "commutative part" of these groups. This fact, combined with their analytic tractability (evidenced by the results in this paper), makes them suitable as the starting point of a theory of probabilistic Lie-group representation learning.

Imposing the constraint of orthogonality will make the computation of matrix inverses very cheap, because for orthogonal $Q$, $Q^{-1} = Q^T$. Orthogonal matrices also avoid numerical problems, because their condition number is always equal to 1. Another important property of orthogonal transformations is that they leave the Euclidean metric invariant: $\|Qx\| = \|x\|$. Therefore, orthogonal matrices cannot express transformations such as contrast scaling, but they can still model the interesting structural changes in images (Bethge et al., 2007). For example, since 2D image rotation and (cyclic) translation are linear and do not change the total energy (norm) of the image, they can be represented by orthogonal matrices acting on vectorized images.

As is well known, commuting matrices can be simultaneously diagonalized, so one could represent a toroidal group in terms of a basis of eigenvectors shared by every element in the group, and one set of eigenvalues for each element of the group, as was done in (Sohl-Dickstein et al., 2010) for 1-parameter Lie groups. However, orthogonal matrices do not generally have a complete set of real eigenvectors. One could use a complex basis instead, but this introduces redundancies because the eigenvalues and eigenvectors of an orthogonal matrix come in complex conjugate pairs. For machine learning applications, this is clearly an undesirable feature, so we opt for a joint block-diagonalization of the elements of the toroidal group: $\rho_\varphi = W R(\varphi) W^T$, where $W$ is orthogonal and $R(\varphi)$ is a block-diagonal rotation matrix(3):

$$R(\varphi) = \begin{pmatrix} R(\varphi_1) & & \\ & \ddots & \\ & & R(\varphi_J) \end{pmatrix}. \qquad (2)$$

The diagonal of $R(\varphi)$ contains $2 \times 2$ rotation matrices

$$R(\varphi_j) = \begin{pmatrix} \cos(\varphi_j) & -\sin(\varphi_j) \\ \sin(\varphi_j) & \cos(\varphi_j) \end{pmatrix}. \qquad (3)$$

In this parameterization, the real, orthogonal basis $W$ identifies the group representation, while the vector of rotation angles $\varphi$ identifies a particular element of the group. It is now clear why such groups are called "toroidal": the parameter space $\varphi$ is periodic in each element $\varphi_j$ and hence is a topological torus. For a $J$-parameter toroidal group, all the $\varphi_j$ can be chosen freely. Such a group is known as a maximal torus in SO(D), for which we write $\mathbb{T}^J = \{\varphi \mid \varphi_j \in [0, 2\pi],\ j = 1, \ldots, J\}$.

To gain insight into the structure of toroidal groups with fewer parameters, we rewrite eq. 2 using the matrix exponential:

$$R(\varphi) = \exp\Big(\sum_{j=1}^J \varphi_j A_j\Big). \qquad (4)$$

The anti-symmetric matrices $A_j = \frac{d}{d\varphi_j} R(\varphi)\big|_{\varphi=0}$ are known as the Lie algebra generators, and the $\varphi_j$ are Lie-algebra coordinates.

(3) For ease of exposition, we assume an even dimensional space D = 2J, but the equations are easily generalized.
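As a concrete illustration of eqs. (2)-(4), the following sketch (ours; the dimension, the angles, and the random orthogonal basis are arbitrary choices, and it assumes numpy/scipy) builds the block-diagonal rotation $R(\varphi)$, applies $\rho_\varphi = W R(\varphi) W^T$ to a vector, and verifies the matrix-exponential form of eq. (4).

```python
import numpy as np
from scipy.linalg import block_diag, expm

def rot2(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

def R(phi):
    """Block-diagonal rotation matrix of eq. (2)."""
    return block_diag(*[rot2(p) for p in phi])

def generator(j, J):
    """Lie algebra generator A_j = dR/dphi_j at phi = 0 (eq. 4)."""
    A = np.zeros((2 * J, 2 * J))
    A[2*j:2*j+2, 2*j:2*j+2] = np.array([[0.0, -1.0], [1.0, 0.0]])
    return A

J = 3
phi = np.array([0.3, 1.1, -0.5])
W, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((2*J, 2*J)))  # orthogonal basis

rho = W @ R(phi) @ W.T                     # a group element acting on data space
x = np.random.default_rng(2).standard_normal(2*J)
print(np.isclose(np.linalg.norm(rho @ x), np.linalg.norm(x)))  # True: norm-preserving

# eq. (4): R(phi) equals the exponential of a linear combination of generators.
A_sum = sum(p * generator(j, J) for j, p in enumerate(phi))
print(np.allclose(expm(A_sum), R(phi)))    # True
```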

The Lie algebra is a structure that largely determines the structure of the corresponding Lie group, while having the important advantage of forming a linear space. That is, all linear combinations of the generators belong to the Lie algebra, and each element of the Lie algebra corresponds to an element of the Lie group, which itself is a non-linear manifold. Furthermore, every subgroup of the Lie group corresponds to a subalgebra (not defined here) of the Lie algebra. All toroidal groups are the subgroup of some maximal torus, so we can learn a general toroidal group by first learning a maximal torus and then learning a subalgebra of its Lie algebra. Due to commutativity, the structure of the Lie algebra of toroidal groups is such that any subspace of the Lie algebra is in fact a subalgebra. The relevance of this observation to our machine learning problem is that to learn a toroidal group with $I$ parameters ($I < J$), we only need to learn a maximal torus together with an $I$-dimensional subspace of its Lie algebra. In particular, a one-parameter subgroup is obtained by coupling the rotation angles through a single scalar $s$, setting $\varphi_j = \omega_j s$:

$$R(s) = \exp\Big(s \sum_{j=1}^J \omega_j A_j\Big) = \bigoplus_{j=1}^J R(\omega_j s). \qquad (5)$$

Each block $R(\omega_j s)$ is periodic with period $2\pi/\omega_j$, but their direct sum $R(s)$ need not be. When $\omega_1$ and $\omega_2$ are not commensurate, all values of $s \in \mathbb{R}$ will produce different $R(s)$, and hence the parameter space is not bounded. To obtain a compact one-parameter group with parameter space $s \in [0, 2\pi]$, we restrict the frequencies $\omega_j$ to be integers, so that $R(s) = R(s + 2\pi)$ (see figure 1).(4)

Figure 1. Parameter space $\varphi = (\omega_1 s, \omega_2 s)$ of two toroidal subgroups, for $\omega_1 = 2, \omega_2 = 3$ (left) and $\omega_1 = 1, \omega_2 = 2\pi$ (right). The point $\varphi$ moves over the dark green line as $s$ is changed. Wrapping around is indicated in light gray. In the incommensurable case (right), coupling does not add structure to the model, because all transformations (points in the plane) can still be constructed by an appropriate choice of $s$.

It is easy to see that each block of $R(s)$ forms a real-valued irreducible representation of the toroidal group, making $R(s)$ a direct sum of irreducible representations. From the point of view expounded in section 2.1, we should view the vector $x$ as a tangle of elementary components $u_j = W_j^T x$, where $W_j = (W_{(:,2j-1)}, W_{(:,2j)})$ denotes the $D \times 2$ submatrix of $W$ corresponding to the $j$-th block in $R(s)$. Each one of the elementary parts $u_j$ is functionally independent of the others under symmetry transformations.

The variable $\omega_j$ is known as the weight of the representation (Kanatani, 1990). When the representations are equivalent (i.e. they have the same weight), the parts are "of the same kind" and are transformed identically. Elementary components with different weights transform differently.

In the following section, we show how a maximal toroidal group and a 1-parameter subgroup can be learned from correspondence pairs, and how these can be used to generate invariant representations.

(4) The main reason for this restriction is that compact groups are simpler and better understood than non-compact groups. In practice, many non-compact groups can be compactified, so not much is lost.

3. Toroidal Subgroup Analysis

We will start by modelling a maximal torus. A data pair $(x, y)$ is related by a transformation $\rho_\varphi = W R(\varphi) W^T$:

$$y = W R(\varphi) W^T x + \epsilon, \qquad (6)$$

where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ represents isotropic Gaussian noise. In other symbols, $p(y \mid x, \varphi) = \mathcal{N}(y \mid W R(\varphi) W^T x, \sigma^2)$.

We use the following notation for indexing invariant subspaces. As before, $W_j = (W_{(:,2j-1)}, W_{(:,2j)})$. Let $u_j = W_j^T x$ and $v_j = W_j^T y$. If we want to access one of the coordinates of $u_j$ or $v_j$, we write $u_{j1} = W_{(:,2j-1)}^T x$, and similarly for the other coordinates.
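A minimal sketch of ours illustrating the observation model of eq. (6); the dimension D = 6, the noise level sigma = 0.1, and the random orthogonal W are stand-ins for learned quantities, not part of the paper.

```python
import numpy as np
from scipy.linalg import block_diag

def R(phi):
    """Block-diagonal rotation matrix of eq. (2)."""
    return block_diag(*[[[np.cos(p), -np.sin(p)], [np.sin(p), np.cos(p)]] for p in phi])

rng = np.random.default_rng(0)
D, sigma = 6, 0.1                                  # assumed dimension and noise level
W, _ = np.linalg.qr(rng.standard_normal((D, D)))   # stand-in for a learned orthogonal basis

phi = rng.uniform(0.0, 2 * np.pi, size=D // 2)     # a random element of the maximal torus
x = rng.standard_normal(D)
y = W @ R(phi) @ W.T @ x + sigma * rng.standard_normal(D)   # eq. (6)

# Invariant-subspace coordinates used throughout section 3:
u = (W.T @ x).reshape(-1, 2)                       # rows are u_j = W_j^T x
v = (W.T @ y).reshape(-1, 2)                       # rows are v_j = W_j^T y
```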

To complete the model, we place a von-Mises prior on each rotation angle:

$$p(\varphi_j) = \mathcal{M}(\varphi_j \mid \mu_j, \kappa_j) = \frac{1}{2\pi I_0(\kappa_j)} \exp\big(\kappa_j \cos(\varphi_j - \mu_j)\big). \qquad (7)$$

The function $I_0$ that appears in the normalizing constant is known as the modified Bessel function of order 0.

Since the von-Mises distribution is an exponential family, we can write it in terms of natural parameters $\eta_j = (\eta_{j1}, \eta_{j2})^T$ as follows:

$$p(\varphi_j) = \frac{1}{2\pi I_0(\|\eta_j\|)} \exp\big(\eta_j^T T(\varphi_j)\big), \qquad (8)$$

where $T(\varphi_j) = (\cos(\varphi_j), \sin(\varphi_j))^T$ are the sufficient statistics. The natural parameters can be computed from the conventional parameters using

$$\eta_j = \kappa_j [\cos(\mu_j), \sin(\mu_j)]^T, \qquad (9)$$

and vice versa,

$$\kappa_j = \|\eta_j\|, \qquad \mu_j = \tan^{-1}(\eta_{j2} / \eta_{j1}). \qquad (10)$$
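A small self-contained sketch (ours, assuming numpy/scipy) of the two parameterizations in eqs. (7)-(10); it checks that the natural-parameter form reproduces the conventional von-Mises density.

```python
import numpy as np
from scipy.special import i0   # modified Bessel function of order 0

def vonmises_pdf(phi, mu, kappa):
    """Conventional parameterization, eq. (7)."""
    return np.exp(kappa * np.cos(phi - mu)) / (2 * np.pi * i0(kappa))

def to_natural(mu, kappa):
    """eq. (9): eta = kappa * [cos(mu), sin(mu)]."""
    return kappa * np.array([np.cos(mu), np.sin(mu)])

def to_conventional(eta):
    """eq. (10): kappa = ||eta||, mu = atan2(eta_2, eta_1)."""
    return np.linalg.norm(eta), np.arctan2(eta[1], eta[0])

def vonmises_pdf_natural(phi, eta):
    """Natural parameterization, eq. (8)."""
    T = np.array([np.cos(phi), np.sin(phi)])
    return np.exp(eta @ T) / (2 * np.pi * i0(np.linalg.norm(eta)))

mu, kappa = 0.8, 2.5
eta = to_natural(mu, kappa)
print(np.isclose(vonmises_pdf(1.3, mu, kappa), vonmises_pdf_natural(1.3, eta)))  # True
print(to_conventional(eta))   # approximately (2.5, 0.8)
```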
Using the natural parameterization, it is easy to see that the prior is conjugate to the likelihood, so that the posterior $p(\varphi \mid x, y)$ is again a product of von-Mises distributions. Such conjugacy relations are of great utility in Bayesian statistics, because they simplify sequential inference. To our knowledge, this conjugacy relation has not been described before. To derive this result, first observe that the likelihood term splits into a sum over the invariant subspaces indexed by $j$:

$$p(\varphi \mid x, y) \propto p(y \mid x, \varphi)\, p(\varphi) \propto \exp\Big(-\frac{1}{2\sigma^2}\|y - W R(\varphi) W^T x\|^2\Big) p(\varphi) \propto \exp\Big(\sum_{j=1}^J \Big[\frac{v_j^T R(\varphi_j) u_j}{\sigma^2} + \eta_j^T T(\varphi_j)\Big]\Big).$$

Both the bilinear forms $v_j^T R(\varphi_j) u_j$ and the prior terms $\eta_j^T T(\varphi_j)$ are linear functions of $\cos(\varphi_j)$ and $\sin(\varphi_j)$, so that they can be combined into a single dot product:

$$p(\varphi \mid x, y) \propto \exp\Big(\sum_{j=1}^J \hat{\eta}_j^T T(\varphi_j)\Big), \qquad (11)$$

which we recognize as a product of von-Mises in natural form. The parameters $\hat{\eta}_j$ of the posterior are given by:

$$\hat{\eta}_j = \eta_j + \frac{1}{\sigma^2}\,[u_{j1} v_{j1} + u_{j2} v_{j2},\ u_{j1} v_{j2} - u_{j2} v_{j1}]^T = \eta_j + \frac{\|u_j\| \|v_j\|}{\sigma^2}\,[\cos(\theta_j), \sin(\theta_j)]^T, \qquad (12)$$

where $\theta_j$ is the angle between $u_j$ and $v_j$. Geometrically, we can interpret the Bayesian updating procedure in eq. 12 as follows. The orientation of the natural parameter vector $\eta_j$ determines the mean of the von-Mises, while its magnitude determines the precision. To update this parameter with new information obtained from data $u_j, v_j$, one should add the vector $(\cos(\theta_j), \sin(\theta_j))^T$ to the prior, using a scaling factor that grows with the magnitude of $u_j$ and $v_j$ and declines with the square of the noise level $\sigma$. The longer $u_j$ and $v_j$ and the smaller the noise level, the greater the precision of the observation. This geometrically sensible result follows directly from the consistent application of the rules of probability.

Observe that when using a uniform prior (i.e. $\eta_j = 0$), the posterior mean $\hat{\mu}_j$ (computed from $\hat{\eta}_j$ by eq. 10) will be exactly equal to the angle $\theta_j$ between $u_j$ and $v_j$. We will use this fact in section 3.1 when we derive the formula for the orbit distance in a toroidal group.

Previous approaches to Lie group learning only provide point estimates of the transformation parameters, which have to be obtained using an iterative optimization procedure (Sohl-Dickstein et al., 2010). In contrast, TSA provides a full posterior distribution which is obtained using a simple feed-forward computation. Compared to the work of Cadieu & Olshausen (2012), our model deals well with low-energy subspaces, by simply describing the uncertainty in the estimate instead of providing inaccurate estimates that have to be discarded.

3.1. Invariant Representation and Metric

One way of doing invariant classification is by using an invariant metric known as the manifold distance. This metric $d(x, y)$ is defined as the minimum distance between the orbits $O_x = \{\rho_\varphi x \mid \varphi \in G\}$ and $O_y = \{\rho_\varphi y \mid \varphi \in G\}$. Observe that this is only a true metric that satisfies the coincidence axiom $d(x, y) = 0 \Leftrightarrow x = y$ if we take the condition $x = y$ to mean "equivalence up to symmetry transformations", or $x \equiv_G y$, as discussed in section 2.1.

In practice, it has proven difficult to compute this distance exactly, so approximations such as tangent distance have been invented (Simard et al., 2000). But for a maximal torus, we can easily compute the exact manifold distance:

$$d^2(x, y) = \min_\varphi \|y - W R(\varphi) W^T x\|^2 = \sum_j \min_{\varphi_j} \|v_j - R(\varphi_j) u_j\|^2 = \sum_j \|v_j - R(\hat{\mu}_j) u_j\|^2, \qquad (13)$$

where $\hat{\mu}_j$ is the mean of the posterior $p(\varphi_j \mid x, y)$, obtained using a uniform prior ($\kappa_j = 0$). The last step of eq. 13 follows, because as we saw in the previous section, $\hat{\mu}_j$ is simply the angle between $u_j$ and $v_j$ when using a uniform prior. Therefore, $R(\hat{\mu}_j)$ aligns $u_j$ and $v_j$, minimizing the distance between them.
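The posterior update of eq. (12) and the manifold distance of eq. (13) amount to a few lines of numpy. The sketch below is ours (a uniform prior is assumed; the example data are random), not code from the paper.

```python
import numpy as np

def posterior_natural_params(u, v, sigma, eta_prior=None):
    """eq. (12): per-subspace update of the von-Mises natural parameters.
    u, v: arrays of shape (J, 2) holding u_j = W_j^T x and v_j = W_j^T y."""
    eta_prior = np.zeros_like(u) if eta_prior is None else eta_prior
    dots = u[:, 0] * v[:, 0] + u[:, 1] * v[:, 1]
    crosses = u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0]
    return eta_prior + np.stack([dots, crosses], axis=1) / sigma**2

def manifold_distance_sq(u, v):
    """eq. (13): align each u_j with v_j by the posterior mean angle and sum the residuals."""
    eta_hat = posterior_natural_params(u, v, sigma=1.0)   # uniform prior; sigma drops out of the mean
    mu_hat = np.arctan2(eta_hat[:, 1], eta_hat[:, 0])     # eq. (10)
    c, s = np.cos(mu_hat), np.sin(mu_hat)
    ru = np.stack([c * u[:, 0] - s * u[:, 1], s * u[:, 0] + c * u[:, 1]], axis=1)  # R(mu_hat_j) u_j
    return np.sum((v - ru) ** 2)

# Example with a noiseless rotated pair: the orbit distance should be ~0.
rng = np.random.default_rng(1)
u = rng.standard_normal((3, 2))
a = rng.uniform(0, 2 * np.pi, size=3)
v = np.stack([np.cos(a) * u[:, 0] - np.sin(a) * u[:, 1],
              np.sin(a) * u[:, 0] + np.cos(a) * u[:, 1]], axis=1)
print(manifold_distance_sq(u, v))   # ~0
```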

Another approach to invariant classification is through an invariant representation. Although the model presented above aims to describe the transformation between observations $x$ and $y$, an invariant-equivariant representation appears automatically in terms of the parameters of the posterior over the group. To see this, consider all the transformations in the learned toroidal group $G$ that take an image $x$ to itself. This set is known as the stabilizer $\mathrm{stab}_G(x)$ of $x$. It is a subgroup of $G$ and describes the symmetries of $x$ with respect to $G$. When a transformation $\varphi \in G$ is applied to $x$, the stabilizer subgroup is left invariant, for if $\theta \in \mathrm{stab}_G(x)$ then $\rho_\theta \rho_\varphi x = \rho_\varphi \rho_\theta x = \rho_\varphi x$ and hence $\rho_\theta \in \mathrm{stab}_G(\rho_\varphi x)$.

The posterior of $x$ transformed into itself, $p(\varphi \mid x, x, \mu, \kappa = 0) = \prod_j \mathcal{M}(\varphi_j \mid \hat{\mu}_j, \hat{\kappa}_j)$, gives a probabilistic description of the stabilizer of $x$, and hence must be invariant. Clearly, the angle between $x$ and $x$ is zero, so $\hat{\mu} = 0$. On the other hand, $\hat{\kappa}$ contains information about $x$ and is invariant. To see this, recall that $\hat{\kappa}_j = \|\hat{\eta}_j\|$. Using eq. 12 we obtain $\hat{\kappa}_j = \|u_j\|^2 \sigma^{-2} = \|W_j^T x\|^2 \sigma^{-2}$. Since every transformation $\varphi$ in the toroidal group acts on the 2D vector $u_j$ by rotation, the norm of $u_j$ is left invariant.

We recognize the computation of $\hat{\kappa}$ as the square pooling operation often applied in convolutional networks to gain invariance: project an image onto filters $W_{:,2j-1}$ and $W_{:,2j}$ and sum the squares. This computation is a direct consequence of our model setup. In section 3.3, we will find that the model for non-maximal tori is even more informative about the proper pooling scheme.

Since we want to use $\hat{\kappa}$ as an invariant representation, we should try to find an appropriate metric on $\hat{\kappa}$-space. Let $\hat{\kappa}(x)$ be defined by $p(\varphi \mid x, x, \kappa = 0) = \prod_j \mathcal{M}(\varphi_j \mid \hat{\mu}_j, \hat{\kappa}_j(x))$. We suggest using the Hellinger distance:

$$H^2(\hat{\kappa}(x), \hat{\kappa}(y)) = \frac{1}{2} \sum_j \Big(\sqrt{\hat{\kappa}_j(x)} - \sqrt{\hat{\kappa}_j(y)}\Big)^2 = \frac{1}{2\sigma^2} \sum_j \Big(\|u_j\|^2 + \|v_j\|^2 - 2\|u_j\|\|v_j\|\Big) = \frac{1}{2\sigma^2} \sum_j \|v_j - R(\hat{\mu}_j) u_j\|^2,$$

which is equal to the exact manifold distance (eq. 13) up to a factor of $\frac{1}{2\sigma^2}$. The first step of this derivation uses eq. 12 under a uniform prior ($\eta_j = 0$), while the second step again makes use of the fact that $\hat{\mu}_j$ is the angle between $u_j$ and $v_j$, so that $\|u_j\|\|v_j\| = u_j^T R(\hat{\mu}_j) v_j$.
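To illustrate the invariance of $\hat{\kappa}$, here is a self-contained sketch of ours (random W and an arbitrary noise level, not learned quantities): it computes the square-pooled representation $\hat{\kappa}_j = \|W_j^T x\|^2 / \sigma^2$ and checks that it is unchanged when $x$ is transformed by an arbitrary element of the torus.

```python
import numpy as np
from scipy.linalg import block_diag

def R(phi):
    return block_diag(*[[[np.cos(p), -np.sin(p)], [np.sin(p), np.cos(p)]] for p in phi])

def kappa_hat(x, W, sigma):
    """Invariant representation: kappa_hat_j = ||W_j^T x||^2 / sigma^2 (square pooling)."""
    u = (W.T @ x).reshape(-1, 2)
    return np.sum(u ** 2, axis=1) / sigma**2

rng = np.random.default_rng(2)
D, sigma = 6, 0.1
W, _ = np.linalg.qr(rng.standard_normal((D, D)))
x = rng.standard_normal(D)
x_rot = W @ R(rng.uniform(0, 2 * np.pi, size=D // 2)) @ W.T @ x   # a transformed copy of x

print(np.allclose(kappa_hat(x, W, sigma), kappa_hat(x_rot, W, sigma)))   # True: invariant
```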
3.2. Relation to the Discrete Fourier Transform

We show that the DFT is a special case of TSA. The DFT of a discrete 1D signal $x = (x_1, \ldots, x_D)$ is defined:

$$X_j = \sum_{n=0}^{D-1} x_n \rho^{-jn}, \qquad (14)$$

where $\rho = e^{2\pi i / D}$ is the $D$-th primitive root of unity. If we choose a basis of sinusoids for the filters in $W$,

$$W_{(:,2j-1)} = \Re(\rho^{-j}, \ldots, \rho^{-j(D-1)})^T = (\cos(2\pi j / D), \ldots, \cos(2\pi j (D-1)/D))^T,$$
$$W_{(:,2j)} = \Im(\rho^{-j}, \ldots, \rho^{-j(D-1)})^T = (\sin(-2\pi j / D), \ldots, \sin(-2\pi j (D-1)/D))^T,$$

then the change of basis performed by $W$ is a DFT. Specifically, $\Re(X_j) = u_{j1}$ and $\Im(X_j) = u_{j2}$.

Now suppose we are interested in the transformation taking some arbitrary fixed vector $e = W(1, 0, \ldots, 1, 0)^T$ to $x$. The posterior over $\varphi_j$ is $p(\varphi_j \mid e, x, \eta_j = 0, \sigma = 1) = \mathcal{M}(\varphi_j \mid \hat{\eta}_j)$, where (by eq. 12) we have $\hat{\eta}_j = \|u_j\| [\cos(\theta_j), \sin(\theta_j)]^T$, $\theta_j$ being the angle between $u_j$ and the "real axis" $e_j = (1, 0)^T$. In conventional coordinates, the precision of the posterior is equal to the modulus of the DFT, $\hat{\kappa}_j = \|u_j\| = |X_j|$, and the mean of the posterior is equal to the phase of the Fourier transform, $\hat{\mu}_j = \theta_j = \arg(X_j)$. Therefore, TSA provides a probabilistic interpretation of the DFT coefficients, and makes it possible to learn an appropriate generalized transform from data.

3.3. Modeling a Lie subalgebra

Typically, one is interested in learning groups with fewer than $J$ degrees of freedom. As we have seen, for one-parameter compact subgroups of a maximal torus, the weights of the irreducible representations must be integers. We model this using a coupled rotation matrix, as follows:

$$\rho_s = W \begin{pmatrix} R(\omega_1 s) & & \\ & \ddots & \\ & & R(\omega_J s) \end{pmatrix} W^T, \qquad (15)$$

where $s \in [0, 2\pi]$ is the scalar parameter of this subgroup. The likelihood then becomes $y \sim \mathcal{N}(y \mid \rho_s x, \sigma^2)$.

We have found that the conjugate prior for this likelihood is the generalized von-Mises (GvM) (Gatto & Jammalamadaka, 2007):

$$p(s) = \mathcal{M}^+(s \mid \eta^+) = \exp\big(\eta^+ \cdot T^+(s)\big)/Z^+ = \exp\Big(\sum_{j=1}^{K} \kappa_j^+ \cos(js - \mu_j^+)\Big)/Z^+,$$

where $T^+(s) = [\cos(s), \sin(s), \ldots, \cos(Ks), \sin(Ks)]^T$. This conjugacy relation $p(s \mid x, y) \propto \exp(\hat{\eta}^+ \cdot T^+(s))$ is obtained using similar reasoning as before, yielding the update equation,

$$\hat{\eta}_j^+ = \eta_j^+ + \sum_{k:\, \omega_k = j} \hat{\eta}_k, \qquad (16)$$

where $\hat{\eta}_k$ is obtained from eq. 12 using a uniform prior $\eta_k = 0$. The sum in this update equation performs a pooling operation over a weight space, which is defined as the span of those invariant subspaces $k$ whose weight $\omega_k = j$. As it turns out, this is exactly the right thing to do in order to summarize the data while maintaining invariance. As explained by Kanatani (1990), the norm $\|\hat{\eta}_j^+\|$ of a linear combination of same-weight representations is always invariant (as is the maximum of any two $\|\hat{\eta}_j\|$ or $\|\hat{\eta}_j^+\|$). The similarity to sum-pooling and max-pooling in convnets is quite striking.

Figure 2. Posterior distribution over $s$ for three image pairs.

In the maximal torus model, there are $J = D/2$ degrees of freedom in the group and the invariant representation $(\hat{\kappa}_1, \ldots, \hat{\kappa}_J)$ is $D - J = J$-dimensional. This representation of a datum $x$ identifies a toroidal orbit, and hence the vector $\hat{\kappa}$ is a maximal invariant with respect to this group (Soatto, 2009). For the coupled model, there is only one degree of freedom in the group, so the invariant representation should be $D - 1$ dimensional. When all $\omega_k$ are distinct, we have $J$ variables $\kappa_1^+, \ldots, \kappa_J^+$ that are invariant. Furthermore, from eq. 15 we see that as $x$ is transformed, the angle between $x$ and an arbitrary fixed reference vector in subspace $j$ transforms as $\theta_j(s) = \delta_j + \omega_j s$ for some data-dependent initial phase $\delta_j$. It follows that $\omega_j \theta_k(s) - \omega_k \theta_j(s) = \omega_j(\delta_k + \omega_k s) - \omega_k(\delta_j + \omega_j s) = \omega_j \delta_k - \omega_k \delta_j$ is invariant. In this way, we can easily construct another $J - 1$ invariants, but unfortunately these are not stable because the angle estimates can be inaccurate for low-energy subspaces. Finding a stable maximal invariant, and doing so in a way that will generalize to other groups, is an interesting problem for future work.

The normalization constant $Z^+$ for the GvM has so far only been described for the case of $K = 2$ harmonics, but we have found a closed form solution in terms of the so-called modified Generalized Bessel Functions (GBF) of $K$ variables $\kappa^+ = \kappa_1^+, \ldots, \kappa_K^+$ and parameters(5) $\exp(-i\mu^+) = \exp(-i\mu_1^+), \ldots, \exp(-i\mu_K^+)$ (Dattoli et al., 1991):

$$Z^+(\kappa^+, \mu^+) = 2\pi I_0(\kappa^+; e^{-i\mu^+}). \qquad (17)$$

We have developed a novel, highly scalable algorithm for the computation of GBF of many variables, which is described in the supplementary material.

Figure 2 shows the posterior over $s$ for three image pairs related by different rotations and containing different symmetries. The weights $W$ and $\omega$ were learned by the procedure described in the next section. It is quite clear from this figure that MAP inference does not give a complete description of the possible transformations relating the images when the images have a degree of rotational symmetry. The posterior distribution of our model provides a sensible way to deal with this kind of uncertainty, which (in the case of 2D translations) is at the heart of the well known aperture problem in vision. Having a tractable posterior is particularly important if the model is to be used to estimate longer sequences (akin to HMM/LDS models, but non-linear), where one may encounter multiple high-density trajectories.

If required, accurate MAP inference can be performed using the algorithm of Sohl-Dickstein et al. (2010), as described in the supplementary material. This allows us to compute the exact manifold distance for the coupled model.

(5) As described in the supplementary material, we use a slightly different parameterization of the GBF.
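A sketch of ours of the pooled update of eq. (16), assuming positive integer weights between 1 and K for simplicity: per-subspace natural parameters computed under a uniform prior (eq. 12) are summed into the harmonic of the generalized von-Mises posterior that matches their weight.

```python
import numpy as np

def gvm_posterior_natural_params(u, v, omega, sigma, K):
    """eq. (16): pool the per-subspace updates of eq. (12) (uniform prior) by weight omega_k.
    Assumes positive integer weights 1..K."""
    dots = u[:, 0] * v[:, 0] + u[:, 1] * v[:, 1]
    crosses = u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0]
    eta_hat = np.stack([dots, crosses], axis=1) / sigma**2   # hat_eta_k, eq. (12)
    eta_plus = np.zeros((K, 2))                              # harmonics j = 1..K of the GvM posterior
    for k, w in enumerate(omega):
        eta_plus[int(w) - 1] += eta_hat[k]
    return eta_plus

# Example: three subspaces, two of which share weight 2.
rng = np.random.default_rng(3)
u, v = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
eta_plus = gvm_posterior_natural_params(u, v, omega=[1, 2, 2], sigma=0.1, K=2)
kappa_plus = np.linalg.norm(eta_plus, axis=1)   # invariant per-weight amplitudes kappa^+_j
```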

3.4. Maximum Marginal Likelihood Learning

We train the model by gradient descent on the marginal likelihood. Perhaps surprisingly given the non-linearities in the model, the integrations required for the evaluation of the marginal likelihood can be obtained in closed form for both the coupled and decoupled models. For the decoupled model we obtain:

$$p(y \mid x) = \int_{\varphi \in \mathbb{T}^J} \mathcal{N}(y \mid W R(\varphi) W^T x) \prod_j \mathcal{M}(\varphi_j \mid \eta_j)\, d\varphi = \frac{\exp\big(-\frac{1}{2\sigma^2}(\|x\|^2 + \|y\|^2)\big)}{(\sqrt{2\pi}\,\sigma)^D} \prod_j \frac{I_0(\hat{\kappa}_j)}{I_0(\kappa_j)}. \qquad (18)$$

Observing that $I_0(\hat{\kappa}_j) / I_0(\kappa_j)$ is the ratio of normalization constants of regular von-Mises distributions, the analogous expression for the coupled model is easily seen to be equal to eq. 18, only replacing $\prod_j I_0(\hat{\kappa}_j) / I_0(\kappa_j)$ by $Z^+(\hat{\kappa}^+, \hat{\mu}^+) / Z^+(\kappa^+, \mu^+)$. The derivation of this result can be found in the supplementary material.

The gradient of the log marginal likelihood of the uncoupled model w.r.t. a batch $X, Y$ (both storing $N$ vectors in the columns) is:

$$\frac{d}{dW} \ln p(Y \mid X) = X\big(R^T(\mu) W^T Y \odot A + W^T X \odot B^y\big)^T + Y\big(R(\mu) W^T X \odot A + W^T Y \odot B^x\big)^T,$$

where we have used $(P \odot Q)_{2j,n} = Q_{j,n} P_{2j,n}$ and $(P \odot Q)_{2j-1,n} = Q_{j,n} P_{2j-1,n}$ as a "subspace weighting" operation. $A$, $B^{(x)}$ and $B^{(y)}$ are $D \times N$ matrices with elements

$$a_{jn} = \frac{I_1(\hat{\kappa}_{jn})\, \kappa_j}{I_0(\hat{\kappa}_{jn})\, \hat{\kappa}_{jn}\, \sigma^2}, \qquad b_{jn}^{(y)} = \frac{\|W_j^T y^{(n)}\|^2}{\kappa_j\, \sigma^2},$$

where $\hat{\kappa}_{jn}$ is the posterior precision in subspace $j$ for image pair $x^{(n)}, y^{(n)}$ (the $n$-th column of $X$, resp. $Y$). The gradient of the coupled model is easily computed using the differential recurrence relations that hold for the GBF (Dattoli et al., 1991).

We use minibatch Stochastic Gradient Descent (SGD) on the log-likelihood of the uncoupled model. After every parameter update, we orthogonalize $W$ by setting all singular values to 1: let $U, S, V = \mathrm{svd}(W)$, then set $W := UV$. This procedure and all previous derivations still work when the basis is undercomplete, i.e. has fewer columns (filters) than rows (dimensions in data space). To learn $\omega_j$, we estimate the relative angular velocity $\omega_j = \theta_j / \delta$ from a batch of image patches rotated by a sub-pixel amount $\delta = 0.1°$.

Figure 3. Filters learned by TSA, sorted by absolute frequency $|\omega_j|$. The learned $\omega_j$-values range from $-11$ to $12$.
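A schematic training step of ours: `grad_log_marginal_likelihood` is a hypothetical placeholder for the gradient above (not an implementation of it), while the SVD re-orthogonalization and the weight estimation follow the procedure described in the text.

```python
import numpy as np

def orthogonalize(W):
    """Project W back onto the (possibly undercomplete) orthogonal bases: set all singular values to 1."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def sgd_step(W, X, Y, alpha, grad_log_marginal_likelihood):
    """One minibatch update on the log marginal likelihood, followed by re-orthogonalization."""
    W = W + alpha * grad_log_marginal_likelihood(W, X, Y)
    return orthogonalize(W)

def estimate_weights(W, X, X_rot, delta):
    """Estimate omega_j = theta_j / delta from patches X and copies X_rot rotated by a small angle delta (radians)."""
    U = (W.T @ X).reshape(-1, 2, X.shape[1])
    V = (W.T @ X_rot).reshape(-1, 2, X.shape[1])
    theta = np.arctan2(U[:, 0] * V[:, 1] - U[:, 1] * V[:, 0],
                       U[:, 0] * V[:, 0] + U[:, 1] * V[:, 1]).mean(axis=1)
    return np.round(theta / delta).astype(int)
```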

Figure 4. Results of classification experiment: 1-NN accuracy versus training set size (100 to 60k), for ED, TD, $\sqrt{\hat{\kappa}}$, MD and ED-NR. See text for details.

4. Experiments

We trained a TSA model with 100 filters on a stream of 250,000 $16 \times 16$ image patches $x^{(t)}, y^{(t)}$. The patches $x^{(t)}$ were drawn from a standard normal distribution, and $y^{(t)}$ was obtained by rotating $x^{(t)}$ by an angle $s$ drawn uniformly at random from $[0, 2\pi]$. The learning rate $\alpha$ was initialized at $\alpha_0 = 0.25$ and decayed as $\alpha = \alpha_0 / \sqrt{T}$, where $T$ was incremented by one with each pass through the data. Each minibatch consisted of 100 data pairs. After learning $W$, we estimate the weights $\omega_j$ and sort the filter pairs by increasing absolute value for visualization. As can be seen in fig. 3, the filters are very clean and the weights are estimated correctly except for a few filters on row 1 and 2 that are assigned weight 0 when in fact they have a higher frequency.

We tested the utility of the model for invariant classification on a rotated version of the MNIST dataset, using a 1-Nearest Neighbor classifier. Each digit was rotated by a random angle and rescaled to $16 \times 16$ pixels, resulting in 60k training examples and 10k testing examples, with no rotated duplicates. We compared the Euclidean distance (ED) in pixel space, tangent distance (TD) (Simard et al., 2000), Euclidean distance on the space of $\sqrt{\hat{\kappa}}$ (equivalent to the exact manifold distance for the maximal torus, see section 3.1), the true manifold distance for the 1-parameter 2D rotation group (MD), and the Euclidean distance on the non-rotated version of the MNIST dataset (ED-NR). The results in fig. 4 show that TD outperforms ED, but is outperformed by $\sqrt{\hat{\kappa}}$ and MD by a large margin. In fact, the MD-classifier is about as accurate as ED on a much simpler dataset, demonstrating that it has almost completely modded out the variation caused by rotation.

5. Conclusions and outlook

We have presented a novel principle for learning disentangled representations, and worked out its consequences for a simple type of symmetry group. This leads to a completely tractable model with potential applications to invariant classification and Bayesian estimation of motion. The model reproduces the pooling operations used in convolutional networks from probabilistic and Lie-group theoretic principles, and provides a probabilistic interpretation of the DFT and its generalizations.

The type of disentangling obtained in this paper is contingent upon the rather minimalist assumption that all that can be said about images is that they are equivalent (rotated copies) or inequivalent. However, the universal nature of Weyl's principle bodes well for future applications to deep, non-linear and non-commutative forms of disentangling.

Acknowledgments

We would like to thank prof. Giuseppe Dattoli for help with derivations related to the generalized Bessel functions.

References

Bengio, Y. and Lecun, Y. International Conference on Learning Representations, 2014. URL https://sites.google.com/site/representationlearning2014/.

Bengio, Y., Courville, A., and Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-30, February 2013.

Bethge, M., Gerwinn, S., and Macke, J. H. Unsupervised learning of a steerable basis for invariant image representations. Proceedings of SPIE Human Vision and Electronic Imaging XII (EI105), February 2007.

Cadieu, C. F. and Olshausen, B. A. Learning intermediate-level representations of form and motion from natural movies. Neural Computation, 24(4):827-66, April 2012.

Soatto, S. Actionable information in vision. In 2009 IEEE 12th International Conference on Computer Vision, pp. 2138-2145. IEEE, September 2009.

Sohl-Dickstein, J., Wang, J. C., and Olshausen, B. A. An unsupervised algorithm for learning Lie group transformations. arXiv preprint, 2010.

Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P. Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, pp. 1096-1103, 2008.

Welling, M., Rosen-Zvi, M., and Hinton, G. Exponential Family Harmoniums with an Application to Information Retrieval. Advances in Neural Information Processing Systems, 17, 2005.

Dattoli, G., Chiccoli, C., Lorenzutta, S., Maino, G., Richetta, M., and Torre, A. A Note on the Theory of n-Variable Generalized Bessel Functions. Il Nuovo Cimento B, 106(10):1159-1166, October 1991.

Gatto, R. and Jammalamadaka, S. R. The generalized von Mises distribution. Statistical Methodology, 4(3):341-353, 2007.

Goodfellow, I. J., Le, Q. V., Saxe, A. M., Lee, H., and Ng, A. Y. Measuring invariances in deep networks. Advances in Neural Information Processing Systems, 2009.

Kanatani, K. Group Theoretical Methods in Image Under- standing. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1990. ISBN 0387512535.

Memisevic, R. On multi-view feature learning. International Conference on Machine Learning, 2012.

Memisevic, R. and Hinton, G. E. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines. Neural Computation, 22(6):1473-1492, June 2010. ISSN 1530-888X. doi: 10.1162/neco.2010.01-09-953.

Miao, X. and Rao, R. P. N. Learning the Lie groups of visual invariance. Neural computation, 19(10):2665–93, October 2007.

Rao, R. P. N. and Ruderman, D. L. Learning Lie groups for invariant visual perception. Advances in Neural Information Processing Systems, 816:810-816, 1999.

Simard, P. Y., Le Cun, Y. A., Denker, J. S., and Victorri, B. Transformation invariance in pattern recognition: Tangent distance and propagation. International Journal of Imaging Systems and Technology, 11(3):181-197, 2000.