On the Continuity of Rotation Representations in Neural Networks

Yi Zhou∗ (University of Southern California), Connelly Barnes∗ (Adobe Research), Jingwan Lu (Adobe Research), Jimei Yang (Adobe Research), Hao Li (University of Southern California, Pinscreen, USC Institute for Creative Technologies)

Abstract

In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate what are continuous and discontinuous representations for 2D, 3D, and n-dimensional rotations. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses.

1. Introduction

Recently, there has been an increasing number of applications in graphics and vision, where deep neural networks are used to perform regressions on rotations. This has been done for tasks such as pose estimation from images [13, 31] and from point clouds [15], structure from motion [29], and skeleton motion synthesis, which generates the rotations of joints in skeletons [30]. Many of these works represent 3D rotations using 3D or 4D representations such as quaternions, axis-angles, or Euler angles.

However, for 3D rotations, we found that 3D and 4D representations are not ideal for network regression when the full rotation space is required. Empirically, the converged networks still produce large errors at certain rotation angles. We believe that this actually points to deeper topological problems related to the continuity in the rotation representations. Informally, all else being equal, discontinuous representations should in many cases be "harder" to approximate by neural networks than continuous ones. Theoretical results suggest that functions that are smoother [33] or have stronger continuity properties, such as in the modulus of continuity [32, 10], have lower approximation error for a given number of neurons.

Based on this insight, we first present in Section 3 our definition of the continuity of representation in neural networks. We illustrate this definition based on a simple example of 2D rotations. We then connect it to key topological concepts such as homeomorphism and embedding.

Next, we present in Section 4 a theoretical analysis of the continuity of rotation representations. We first investigate in Section 4.1 some discontinuous representations, such as Euler angle and quaternion representations. We show that for 3D rotations, all representations are discontinuous in four or fewer dimensional real Euclidean space with the Euclidean topology. We then investigate in Section 4.2 some continuous rotation representations. For the n-dimensional rotation group SO(n), we present a continuous n² − n dimensional representation. We additionally present an option to reduce the dimensionality of this representation by an additional 1 to n − 2 dimensions in a continuous way. We show that these allow us to represent 3D rotations continuously in 6D and 5D. While we focus on rotations, we show how our continuous representations can also apply to other groups such as orthogonal groups O(n) and similarity transforms.

Finally, in Section 5 we test our ideas empirically. We conduct experiments on 3D rotations and show that our 6D and 5D continuous representations always outperform

∗Authors have equal contribution.

the discontinuous ones for several tasks, including a rotation autoencoder "sanity test," rotation estimation for 3D point clouds, and 3D human pose inverse kinematics learning. We note that in our rotation autoencoder experiments, discontinuous representations can have up to 6 to 14 times higher mean errors than continuous representations. Furthermore, they tend to converge much slower, while still producing large errors over 170° at certain rotation angles even after convergence, which we believe are due to the discontinuities being harder to fit. This phenomenon can also be observed in the experiments on different rotation representations for homeomorphic variational auto-encoding in Falorsi et al. [14], and in practical applications, such as 6D object pose estimation in Xiang et al. [31].

We also show that one can perform direct regression on 3x3 rotation matrices. Empirically, this approach introduces larger errors than our 6D representation, as shown in Section 5.2. Additionally, for some applications such as inverse and forward kinematics, it may be important for the network itself to produce orthogonal matrices. We therefore require an orthogonalization procedure in the network. In particular, if we use a Gram-Schmidt orthogonalization, we then effectively end up with our 6D representation.

Our contributions are: 1) a definition of continuity for rotation representations, which is suitable for neural networks; 2) an analysis of discontinuous and continuous representations for 2D, 3D, and n-D rotations; 3) new formulas for continuous representations of SO(3) and SO(n); 4) empirical results supporting our theoretical views and that our continuous representations are more suitable for learning.

2. Related Work

In this section, we will first establish some context for our work in terms of neural network approximation theory. Next, we discuss related works that investigate the continuity properties of different rotation representations. Finally, we will report the types of rotation representations used in previous learning tasks and their performance.

Neural network approximation theory. We review a brief sampling of results from neural network approximation theory. Hornik [17] showed that neural networks can approximate functions in the Lp space to arbitrary accuracy if the Lp norm is used. Barron et al. [6] showed that if a function has certain properties in its Fourier transform, then at most O(ε⁻²) neurons are needed to obtain an order of approximation ε. Chapter 6.4.1 of LeCun et al. [23] provides a more thorough overview of such results. We note that results for continuous functions indicate that functions that have better smoothness properties can have lower approximation error for a particular number of neurons [32, 10, 33]. For discontinuous functions, Llanas et al. [24] showed that a real and piecewise continuous function can be approximated in an almost uniform way. However, Llanas et al. [24] also noted that piecewise continuous functions, when trained with gradient descent methods, require many neurons and training iterations, and yet do not give very good results. These results suggest that continuous rotation representations might perform better in practice.

Continuity for rotations. Grassia et al. [16] pointed out that Euler angles and quaternions are not suitable for orientation differentiation and integration operations and proposed the exponential map as a more robust rotation representation. Saxena et al. [28] observed that Euler angles and quaternions cause learning problems due to discontinuities. However, they did not propose general rotation representations other than direct regression of 3x3 matrices, since they focus on learning representations for objects with specific symmetries.

Neural networks for 3D shape pose estimation. Deep networks have been applied to estimate the 6D poses of object instances from RGB images, depth maps, or scanned point clouds. Instead of directly predicting 3x3 matrices that may not correspond to valid rotations, they typically use more compact rotation representations such as quaternions [31, 21, 20] or axis-angle [29, 15, 13]. In PoseCNN [31], the authors reported a high percentage of errors between 90° and 180°, and suggested that this is mainly caused by the rotation ambiguity for some symmetric shapes in the test set. However, as illustrated in their paper, the proportion of errors between 90° and 180° is still high even for non-symmetric shapes. In this paper, we argue that discontinuity in these representations could be one cause of such errors.

Neural networks for inverse kinematics. Recently, researchers have been interested in training neural networks to solve inverse kinematics equations. This is because such networks are faster than traditional methods and differentiable, so that they can be used in more complex learning tasks such as motion re-targeting [30] and video-based human pose estimation [19]. Most of these works represented rotations using quaternions or axis-angle [18, 19]. Some works also used other 3D representations such as Euler angles and Lie algebra [19, 34], and penalized the joint position errors. Csiszar et al. [11] designed networks to output the sine and cosine of the Euler angles for solving inverse kinematics problems in robotic control. Euler angle representations are discontinuous for SO(3) and can result in large regression errors, as shown in the empirical test in Section 5. However, those authors limited the rotation angles to be within a certain range, which avoided the discontinuity points and thus achieved very low joint alignment errors in their test. However, many real-world tasks require the networks to be able to output the full range of rotations. In such cases, continuous rotation representations will be a better choice.

3. Definition of Continuous Representation

In this section, we begin by defining the terminology we will use in the paper. Next, we analyze a simple motivating example of 2D rotations. This allows us to develop our general definition of continuity of representation in neural networks. We then explain how this definition of continuity is related to concepts in topology.

Terminology. To denote a matrix, we typically use M, and Mij refers to its (i, j) entry. We use the term SO(n) to denote the special orthogonal group, the space of n dimen-

Figure 1. A simple 2D example, which motivates our definition of continuity of representation: the mapping g takes a connected set of rotations in S¹ (the original space) to a disconnected set of angular representations in [0, 2π]. See Section 3 for details.

Figure 2. Our definition of continuous representation, as well as how it can apply in a neural network: an input signal passes through the network to a representation in the representation space R; the mapping f takes R to the original space X, and the mapping g takes X to R. See the body for details.

sional rotations. This group is defined on the set of n × n real matrices with MMᵀ = MᵀM = I and det(M) = 1. The group operation is multiplication, which results in the concatenation of rotations. We denote the n-dimensional unit sphere as Sⁿ = {x ∈ Rⁿ⁺¹ : ||x|| = 1}.

Motivating example: 2D rotations. We now consider the representation of 2D rotations. For any 2D rotation M ∈ SO(2), we can also express the matrix as:

M = [ cos(θ)  −sin(θ) ]
    [ sin(θ)   cos(θ) ]   (1)

We can represent any rotation matrix M ∈ SO(2) by choosing θ ∈ R, where R is a suitable set of angles, for example, R = [0, 2π]. However, this particular representation intuitively has a problem with continuity. The problem is that if we define a mapping g from the original space SO(2) to the angular representation space R, then this mapping is discontinuous. In particular, the limit of g at the identity matrix, which represents zero rotation, is undefined: one directional limit gives an angle of 0 and the other gives 2π. We depict this problem visually in Figure 1. On the right, we visualize a connected set of rotations C ⊂ SO(2) by visualizing their first column vector [cos(θ), sin(θ)]ᵀ on the unit sphere S¹. On the left, after mapping them through g, we see that the angles are disconnected. In particular, we say that this representation is discontinuous because the mapping g from the original space to the representation space is discontinuous. We argue that these kinds of discontinuous representations can be harder for neural networks to fit. Contrarily, if we represent the 2D rotation M ∈ SO(2) by its first column vector [cos(θ), sin(θ)]ᵀ, then the representation would be continuous.

Continuous representation: We can now define what we consider a continuous representation. We illustrate our definitions graphically in Figure 2. Let R be a subset of a real vector space equipped with the Euclidean topology. We call R the representation space: in our context, a neural network produces an intermediate representation in R. This neural network is depicted on the left side of Figure 2. We will come back to this neural network shortly. Let X be a compact topological space. We call X the original space. In our context, any intermediate representation in R produced by the network can be mapped into the original space X. Define the mapping to the original space f : R → X, and the mapping to the representation space g : X → R. We say (f, g) is a representation if for every x ∈ X, f(g(x)) = x, that is, f is a left inverse of g. We say the representation is continuous if g is continuous.

Connection with neural networks: We now return to the neural network on the left side of Figure 2. We imagine that inference runs from left to right. Thus, the neural network accepts some input signals on its left-hand side, outputs a representation in R, and then passes this representation through the mapping f to get an element of the original space X. Note that in our context, the mapping f is implemented as a mathematical function that is used as part of the forward pass of the network at both training and inference time. Typically, at training time, we might impose losses on the original space X. We now describe the intuition behind why we ask that g be continuous. Suppose that we have some connected set C in the original space, such as the one shown on the right side of Figure 1. Then if we map C into representation space R, and g is continuous, then the set g(C) will remain connected. Thus, if we have continuous training data, then this will effectively create a continuous training signal for the neural network. Contrarily, if g is not continuous, as shown in Figure 1, then a connected set in the original space may become disconnected in the representation space. This could create a discontinuous training signal for the network. We note that the units in the neural network are typically continuous, as defined on Euclidean topology spaces. Thus, we require the representation space R to have Euclidean topology because this is consistent with the continuity of the network units.

Domain of the mapping f: We additionally note that for neural networks, it is specifically beneficial for the mapping f to be defined almost everywhere on a set where the neural network outputs are expected to lie. This enables f to map arbitrary representations produced by the network back to the original space X.

Connection with topology: Suppose that (f, g) is a continuous representation. Note that g is a continuous one-to-one function from a compact topological space to a Hausdorff space. From a theorem in topology [22], this implies that if we restrict the codomain of g to g(X) (and use the subspace topology for g(X)), then the resulting mapping is a homeomorphism. A homeomorphism is a continuous bijection with a continuous inverse. For geometric intuition, a homeomorphism is often described as a continuous and invertible stretching and bending of one space to another, with also a finite number of cuts allowed if one later glues back together the same points. One says that two spaces are topologically equivalent if there is a homeomorphism between them. Additionally, g is a topological embedding of the original space X into the representation space R. Note that we also have the inverse of g: if we restrict f to the domain g(X), then the resulting function f|g(X) is simply the inverse of g. Conversely, if the original space X is not homeomorphic to any subset of the representation space R

then there is no possible continuous representation (f, g) on these spaces. We will return to this later when we show that there is no continuous representation for the 3D rotations in four or fewer dimensions.

4. Rotation Representation Analysis

Here we provide examples of rotation representations that could be used in networks. We start by looking in Section 4.1 at some discontinuous representations for 3D rotations, then look in Section 4.2 at continuous rotation representations in n dimensions, and show how for the 3D rotations these become 6D and 5D continuous rotation representations. We believe this analysis can help one to choose suitable rotation representations for learning tasks.

4.1. Discontinuous Representations

Case 1: Euler angle representation for the 3D rotations. Let the original space X = SO(3), the set of 3D rotations. Then we can easily show discontinuity in an Euler angle representation by considering the azimuth angle θ and reducing this to the motivating example for 2D rotations shown in Section 3. In particular, the identity rotation I occurs at a discontinuity, where one directional limit gives θ = 0 and the other directional limit gives θ = 2π. We visualize the discontinuities in this representation, and all the other representations, in the supplemental Section F.

Case 2: Quaternion representation for the 3D rotations. Define the original space X = SO(3), and the representation space Y = R⁴, which we use to represent the quaternions. We can now define the mapping to the representation space as:

gq(M) = [M32 − M23, M13 − M31, M21 − M12, t]ᵀ              if t ≠ 0
gq(M) = [√(M11 + 1), c2 √(M22 + 1), c3 √(M33 + 1), 0]ᵀ      if t = 0   (2)

t = Tr(M) + 1,   ci = { 1 if Mi,1 + Mi,2 > 0; −1 otherwise }   (3)

Likewise, one can define the mapping to the original space SO(3) as in [2]:

fq([x0, y0, z0, w0]) =
[ 1 − 2y² − 2z²   2xy − 2zw       2xz + 2yw     ]
[ 2xy + 2zw       1 − 2x² − 2z²   2yz − 2xw     ]   (4)
[ 2xz − 2yw       2yz + 2xw       1 − 2x² − 2y² ]

(x, y, z, w) = N([x0, y0, z0, w0])

Here the normalization function is defined as N(q) = q/||q||. By expanding in terms of the axis-angle representation for the matrix M, one can verify that for every M ∈ SO(3), f(g(M)) = M.

However, we find that the representation is not continuous. Geometrically, this can be seen by taking different directional limits around the matrices with 180 degree rotations, which are defined by Rπ = {M ∈ SO(3) : Tr(M) = −1}. Specifically, in the top case of Equation (2), where t ≠ 0, the limit of gq as we approach a 180 degree rotation is [0, 0, 0, 0], and meanwhile, the first three coordinates of gq(r) for r ∈ Rπ are nonzero. Note that our definition of continuous representation from Section 3 requires a Euclidean topology for the representation space Y, in contrast to the usual topology for the quaternions of the real projective space RP³, which we discuss in the next paragraph. In a similar way, we can show that other popular representations for the 3D rotations, such as axis-angle, have discontinuities, e.g. the axis in axis-angle has discontinuities at the 180 degree rotations.

Representations for the 3D rotations are discontinuous in four or fewer dimensions. The 3D rotation group SO(3) is homeomorphic to the real projective space RP³. The space RPⁿ is defined as the quotient space of Rⁿ⁺¹ \ {0} under the equivalence relation that x ∼ λx for all λ ≠ 0. In a graphics and vision context, it may be most intuitive to think of RP³ as being the homogeneous coordinates in R⁴ with the previous equivalence relation used to construct an appropriate topology via the quotient space. Based on standard embedding and non-embedding results in topology [12], we know that RP³ (and thus SO(3)) embeds in R⁵ with the Euclidean topology, but does not embed in R^d for any d < 5. By the definition of embedding, there is no homeomorphism from SO(3) to any subset of R^d for any d < 5, but a continuous representation requires this. Thus, there is no such continuous representation.

4.2. Continuous Representations

In this section, we develop two continuous representations for the n-dimensional rotations SO(n). We then explain how for the 3D rotations SO(3) these become 6D and 5D continuous rotation representations.

Case 3: Continuous representation with n² − n dimensions for the n-dimensional rotations. The rotation representations we have considered thus far are all not continuous. One possibility to make a rotation representation continuous would be to just use the identity mapping, but this would result in matrices of size n × n for the representation, which can be excessive, and would still require orthogonalization, such as a Gram-Schmidt process in the mapping f to the original space, if we want to ensure that network outputs end up back in SO(n). Based on this observation, we propose to perform an orthogonalization process in the representation itself. Let the original space X = SO(n), and the representation space be R = R^(n×(n−1)) \ D (D will be defined shortly). Then we can define a mapping g to the representation space that simply drops the last column vector of the input matrix:

gGS([a1 ... an]) = [a1 ... an−1]   (5)

where ai, i = 1, 2, ..., n, are column vectors. We note that the set gGS(X) is a Stiefel manifold [3]. Now for the mapping fGS to the original space, we can define the following Gram-Schmidt-like process:

fGS([a1 ... an−1]) = [b1 ... bn]   (6)

bi = N(a1)                                   if i = 1
bi = N(ai − Σ_{j=1}^{i−1} (bj · ai) bj)       if 2 ≤ i < n     (7)
bi = det([ b1 ... bn−1  e ])                  if i = n

Here N(·) denotes a normalization function, the same as before, e denotes the formal column (e1, ..., en)ᵀ, and e1, ..., en are the n canonical basis vectors of the Euclidean space. The only difference of fGS from an ordinary Gram-Schmidt process is that the last column is computed by a generalization of the cross product to n dimensions. Now clearly, gGS is continuous. To check that for every M ∈ SO(n), fGS(gGS(M)) = M, we can use induction and the properties of the orthonormal basis vectors in the columns of M to show that the Gram-Schmidt process does not modify the first n − 1 components. Lastly, we can use theorems for the generalized cross product, such as Theorem 5.14.7 of Bloom [8], to show that the last component of fGS(gGS(M)) agrees with M. Finally, we can define the set D as that where the above Gram-Schmidt-like process does not map back to SO(n): specifically, this is where the dimension of the span of the n − 1 vectors input to g is less than n − 1.

6D representation for the 3D rotations: For the 3D rotations, Case 3 gives us a 6D representation. The generalized cross product for bn in Equation (7) simply reduces to the ordinary cross product b1 × b2. We give the detailed equations in Section B in the supplemental document. We specifically note that using our 6D representation in a network can be beneficial because the mapping fGS in Equation (7) ensures that the resulting 3x3 matrix is orthogonal. In contrast, suppose a direct prediction for 3x3 matrices is used. Then either the orthogonalization can be done in-network or as a postprocess. If orthogonalization is done in-network, the last 3 components of the matrix will be discarded by the Gram-Schmidt process in Equation (7), so the 3x3 matrix representation is effectively our 6D representation plus 3 useless parameters. If orthogonalization is done as a postprocess, then this prevents certain applications such as forward kinematics, and the error is also higher, as shown in Section 5.

Group operations such as multiplication: Suppose that the original space is a group such as the rotation group, and we want to multiply two representations r1, r2 ∈ R. In general, we can do this by first mapping to the original space, multiplying the two elements, and then mapping back: r1r2 = g(f(r1)f(r2)). However, for the proposed representation here, we can gain some computational efficiency as follows. Since the mapping to the representation space in Equation (5) drops the last column, when computing f(r2), we can simply drop the last column and compute the product representation as the product of an n × n and an n × (n − 1) matrix.

Figure 3. An illustration of stereographic projection in 2D. We are given as input a point p on the unit sphere S¹. We construct a ray from a fixed projection point N0 = (0, 1) through p and find the intersection of this ray with the plane y = 0. The resulting point p′ is the stereographic projection of p.

Case 4: Further reducing the dimensionality for the n-dimensional rotations. For n ≥ 3 dimensions, we can reduce the dimension of the representation in the previous case, while still keeping a continuous representation. Intuitively, a lower dimensional representation that is less redundant could be easier to learn. However, we found in our experiments that the dimension-reduced representation does not outperform the Gram-Schmidt-like representation from Case 3. However, we still develop this representation because it allows us to show that continuous rotation representations can outperform discontinuous ones.

We show that we can perform such dimension reduction using one or more stereographic projections combined with normalization. We show an illustration of a 2D stereographic projection in Figure 3, which can be easily generalized to higher dimensions. Let us first normalize the input point, so it projects to a sphere, and then stereographically project the result using a projection point of (1, 0, ..., 0). We call this combined operation a normalized projection, and define it as P : R^m → R^(m−1):

P(u) = [ v2/(1 − v1), v3/(1 − v1), ..., vm/(1 − v1) ]ᵀ,   v = u/||u||   (8)

Now define a function Q : R^(m−1) → R^m, which does a stereographic un-projection:

Q(u) = (1/||u||) [ ½(||u||² − 1), u1, ..., um−1 ]ᵀ   (9)

Note that the un-projection is not actually back to the sphere, but in a way that coordinates 2 through m are a unit vector. Now we can use between 1 and n − 2 normalized projections on the representation from the previous case, while still preserving continuity and one-to-one behavior.

For simplicity, we will first demonstrate the case of one stereographic projection. The idea is that we can flatten the representation from Case 3 to a vector and then stereographically project the last n + 1 components of that vector. Note that we intentionally project as few components as possible, since we found that nonlinearities introduced by the projection can make the learning process more difficult. These nonlinearities are due to the square terms and the division in Equation (9). If u is a vector of length m, define the slicing notation ui:j = (ui, ui+1, ..., uj), and ui: = ui:m. Let M(i) be the ith column of matrix M. Define a vectorized representation γ(M) by dropping the last column of M like in Equation (5): γ(M) = [Mᵀ(1), ..., Mᵀ(n−1)]. Now we can define the mapping to the representation space as:

gP(M) = [γ1:n²−2n−1, P(γn²−2n:)]   (10)
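The normalized projection P of Equation (8) and the un-projection Q of Equation (9) can be sketched in a few lines of NumPy. This is an illustrative sketch under our own naming, not the authors' released code. The check at the end mirrors the property needed for the round trip: when the trailing coordinates of the input form a unit vector, as in the Case 4 construction, Q exactly undoes P.

```python
import numpy as np

def P(u):
    # Normalized projection (Eq. 8): normalize u onto the unit sphere,
    # then stereographically project from the point (1, 0, ..., 0).
    v = u / np.linalg.norm(u)
    return v[1:] / (1.0 - v[0])

def Q(w):
    # Stereographic un-projection (Eq. 9). The result is generally not
    # on the sphere; instead, coordinates 2 through m form a unit vector.
    n = np.linalg.norm(w)
    return np.concatenate(([0.5 * (n ** 2 - 1.0) / n], w / n))

# One entry of the first basis vector prepended to a unit column of M,
# as in the single-projection construction.
b = np.array([0.36, 0.48, 0.8])    # a unit 3-vector
u = np.concatenate(([0.25], b))
assert np.allclose(Q(P(u)), u)     # P followed by Q recovers u exactly
```

The sketch also makes concrete why the argument v ≠ (1, 0, ..., 0) matters: the division by 1 − v1 in Equation (8) is undefined exactly at the projection point.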

5 Legend T T 1 2 3 4 5 6 n = 3: No projection orthogonal n × n matrices M with MM = M M = I. n = 4: 1 2 3 4 5 6 7 8 9 10 11 12 Projection #1 We can also generalize to the similarity transforms, which Projection #2 n = 5: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Projection #3 we denote as Sim(n), defined as the affine maps ρ(x) on Rn, ρ(x) = αRx + u, where α > 0, R is an n × n or- Figure 4. An illustration of how n − 2 normalized projections n can be made to reduce the dimensionality for the representation thogonal matrix, and u ∈ R [4]. For the orthogonal group of SO(n) from Case 3 by n − 2. In each row we show the dimen- O(n), we can use any of the representations in Case 3 or sion n, and the elements of the vectorized representation γ(M) 4, but with an additional component in the representation containing the first n − 1 columns of M ∈ SO(n). Each column that indicates whether the determinant is +1 or -1. Then the is length n: the columns are grouped by the thick black rectangles. Gram-Schmidt process in Equation (7) needs to be modified Each unique color specifies a group of inputs for the “normalized slightly: if the determinant is -1 then the last vector bn needs projection” of Equation (8). The white regions are not projected. to be negated. Meanwhile, for the similarity transforms, the translation component u can easily be represented as- is. The matrix component αR of the similarity transform Here we have dropped the implicit argument M to γ for can be represented using any of the options in Case 3 or brevity. Define the mapping to the original space as: 4. The only needed change is that the Gram-Schmidt pro- cess in Equations 7 or 11 should multiply the final resulting  (n×(n−1)) matrix by α. The term α is simply the norm of any of the fP(u) = fGS [u1:n2−2n−1,Q(un2−2n:)] basis vectors input to the Gram-Schmidt process, e.g. ||a1|| (11) in Equation (7). 
Here the superscript (n × (n − 1)) indicates that the vector is reshaped to a matrix of the specified size before going through the Gram-Schmidt function f_GS. We can now see why Equation (9) is normalized the way it is: this is so that projection followed by un-projection can preserve the unit length property of the basis vector that is a column of M, and also correctly recover the first component of Q(·), so we get f_P(g_P(M)) = M for all M ∈ SO(3). We can show that g_P is defined on its domain and continuous by using properties of the orthonormal basis produced by the Gram-Schmidt process g_GS. For example, we can show that γ ≠ 0 because components 2 through n + 1 of γ are an orthonormal basis vector, and N(γ) will never be equal to the projection point [1, 0, ..., 0, 0] that the stereographic projection is made from. It can also be shown that for all M ∈ SO(3), f_P(g_P(M)) = M. We show some of these details in the supplemental material.

As a special case, for the 3D rotations, this gives us a 5D representation. This representation is made by using the 6D representation from Case 3, flattening it to a vector, and then using a normalized projection on the last 4 dimensions.

We can actually make up to n − 2 projections in a similar manner, while maintaining continuity of the representation, as follows. As a reminder, the length of γ, the vectorized result of the Gram-Schmidt process, is n(n − 1): it contains n − 1 basis vectors, each of dimension n. Thus, we can make n − 2 projections, where each projection i = 1, ..., n − 2 selects the basis vector i + 1 from γ(M), prepends to it an appropriately selected element from the first basis vector of γ(M), such as γ_(n+1−i), and then projects the result. The resulting projections are then concatenated as a row vector along with the two unprojected entries to form the representation. Thus, after doing n − 2 projections, we can obtain a continuous representation for SO(n) in n² − 2n + 2 dimensions. See Figure 4 for a visualization of the grouping of the elements that can be projected. Clearly, if the projections of Case 4 are used, at least one basis vector must remain not projected, so α can be determined. In the supplemental material, we also explain how one might adapt an existing network that outputs 3D or 4D rotation representations so that it can use our 6D or 5D representations.

Other groups: O(n), similarity transforms, quaternions. In this paper, we focus mainly on the representation of rotations. However, we note that the preceding representations can easily be generalized to O(n), the group of n × n orthogonal matrices.

5. Empirical Results

We investigated different rotation representations and found that those with better continuity properties work better for learning. We first performed a sanity test and then experimented on two real-world applications to show how the continuity properties of rotation representations influence the learning process.

5.1. Sanity Test

We first perform a sanity test using an auto-encoder structure. We use a multi-layer perceptron (MLP) network as an encoder to map SO(3) to the chosen representation R. We test our proposed 6D and 5D representations, quaternions, axis-angle, and Euler angles. The encoder network contains four fully-connected layers, where the hidden layers have 128 neurons and Leaky ReLU activations. The fixed "decoder" mapping f : R → SO(3) is defined in Section 4. For training, we compute the loss using the L2 distance between the input SO(3) matrix M and the output SO(3) matrix M′; note that this is invariant to the particular representation used, such as quaternions, axis-angle, etc. We use Adam optimization with batch size 64 and a learning rate of 10⁻⁵ for the first 10⁴ iterations and 10⁻⁶ for the remaining iterations. For sampling the input rotation matrices during training, we uniformly sample axes and angles. We test the networks using 10⁵ rotation matrices generated by randomly sampling axes and angles, and calculate geodesic errors between the input and the output rotation matrices. The geodesic error is defined as the minimal angular difference between two rotations, written as

    L_angle = cos⁻¹((M″_00 + M″_11 + M″_22 − 1) / 2)    (12)

    M″ = M M′⁻¹    (13)
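Equations (12)-(13) amount to reading the rotation angle off the trace of M″. As a minimal numpy sketch (the function name is ours; the clamp guards arccos against floating-point noise):

```python
import numpy as np

def geodesic_error(M, M_prime):
    """Minimal angular difference (radians) between two rotation matrices.

    Implements L_angle = arccos((M''_00 + M''_11 + M''_22 - 1) / 2) with
    M'' = M M'^{-1}; for a rotation matrix the inverse is the transpose.
    """
    M_pp = M @ M_prime.T
    # The sum of the diagonal entries is the trace of M''.
    cos_angle = np.clip((np.trace(M_pp) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_angle)

I = np.eye(3)
# A rotation compared with itself has zero geodesic error.
assert abs(geodesic_error(I, I)) < 1e-12

# A 90-degree rotation about z versus the identity gives pi/2.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
assert abs(geodesic_error(Rz, I) - np.pi / 2) < 1e-9
```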

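For the 3D case, the fixed decoder for the 6D representation is the Gram-Schmidt mapping of Section 4. A numpy sketch of this idea follows; the function name and the layout of the 6D vector as two stacked 3-vectors are our illustrative conventions, and for n = 3 the third column can be taken as the cross product of the first two, which guarantees determinant +1:

```python
import numpy as np

def gs_decode_6d(rep6d):
    """Map a 6D representation (two stacked 3-vectors a1, a2) to SO(3).

    b1 = N(a1); b2 = N(a2 - (b1 . a2) b1); b3 = b1 x b2, where N is
    vector normalization. The columns [b1 b2 b3] are a right-handed
    orthonormal basis, i.e., a rotation matrix.
    """
    a1, a2 = rep6d[:3], rep6d[3:]
    b1 = a1 / np.linalg.norm(a1)
    u2 = a2 - np.dot(b1, a2) * b1        # remove the component along b1
    b2 = u2 / np.linalg.norm(u2)
    b3 = np.cross(b1, b2)                # already unit length
    return np.stack([b1, b2, b3], axis=1)

# A (non-degenerate) 6D input decodes to a valid rotation matrix.
M = gs_decode_6d(np.array([1.0, 2.0, 0.5, -0.3, 0.8, 1.1]))
assert np.allclose(M.T @ M, np.eye(3))       # orthonormal columns
assert np.isclose(np.linalg.det(M), 1.0)     # determinant +1

# Decoding the first two columns of a rotation recovers that rotation,
# i.e., f_GS inverts the representation g_GS that drops the last column.
rep = M[:, :2].T.flatten()
assert np.allclose(gs_decode_6d(rep), M)
```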
Figure 5. Empirical results. In (b), (e), (h) we plot on the x axis a percentile p and on the y axis the error at the given percentile p.

Sanity Test: (a) Mean errors during iterations. (b) Percentile of errors at 500k iterations. (c) Errors at 500k iterations:

           Mean(°)   Max(°)    Std(°)
    6D     0.49      1.98      0.27
    5D     0.49      1.99      0.27
    Quat   3.32      179.93    5.97
    AxisA  3.69      179.22    5.99
    Euler  6.98      179.95    17.31

3D Point Cloud Pose Estimation Test: (d) Mean errors during iterations. (e) Percentile of errors at 2600k iterations. (f) Errors at 2600k iterations:

           Mean(°)   Max(°)    Std(°)
    6D     2.85      179.83    9.16
    5D     4.78      179.87    12.25
    Quat   9.03      179.66    16.33
    AxisA  11.93     179.7     21.35
    Euler  14.13     179.67    23.8
    Matrix 4.21      180.0     9.44

Human Body Inverse Kinematics Test: (g) Mean errors during iterations. (h) Percentile of errors at 1960k iterations. (i) Errors at 1960k iterations:

           Mean(cm)  Max(cm)   Std(cm)
    6D     1.9       28.7      1.2
    5D     2.0       33.3      1.4
    Quat   3.3       87.1      3.1
    AxisA  3.0       120.0     2.3
    Euler  2.7       48.7      2.1
    Matrix 22.9      53.6      4.0

Figure 5(a) illustrates the mean geodesic errors for different representations as training progresses. Figure 5(b) illustrates the percentiles of the errors at 500k iterations. The results show that the 6D and 5D representations have similar performance to each other. They converge much faster than the other representations and produce the smallest mean, maximum, and standard deviation of errors. The Euler angle representation performs the worst, as shown in Table (c) in Figure 5. For the quaternion, axis-angle, and Euler angle representations, the majority of the errors fall under 25°, but certain test samples still produce errors of up to 180°. The proposed 6D and 5D representations do not produce errors higher than 2°. We conclude that using continuous rotation representations for network training leads to lower errors and faster convergence.

In Appendix G.2, we report additional results, where we trained using a geodesic loss, uniformly sampled SO(3), and compared to 3D Rodriguez vectors and quaternions constrained to one hemisphere [20]. Again, our continuous representations outperform common discontinuous ones.

5.2. Pose Estimation for 3D Point Clouds

In this experiment, we test different rotation representations on the task of estimating the rotation of a target point cloud from a reference point cloud. The inputs of the networks are the reference and target point clouds P_r, P_t ∈ R^(N×3), where N is the number of points. The network output is the estimated rotation R ∈ R^D between P_r and P_t, where D is the dimension of the chosen representation.

We employ a weight-sharing Siamese network where each half is a simplified PointNet structure [27], Φ : R^(N×3) → R^1024. The simplified PointNet uses a 4-layer MLP of size 3 × 64 × 128 × 1024 to extract features for each point and then applies max pooling across all points to produce a single feature vector z. One half of the Siamese network maps the reference point cloud to a feature vector z_r = Φ(P_r), and the other half maps the target point cloud to z_t = Φ(P_t). Then we concatenate z_r and z_t and pass this through another MLP of size 2048 × 512 × 512 × D to produce the D-dimensional rotation representation. Finally, we transform the rotation representation to SO(3) with one of the mapping functions f defined in Section 4.
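The shape flow of this architecture can be sketched with plain numpy using random weights; this illustrates only the tensor shapes (per-point shared MLP, max pooling, concatenation, regression MLP), not the trained model, and all names and the weight initialization are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_features(P, weights):
    """Simplified PointNet Phi: shared per-point MLP, then max pool -> R^1024."""
    h = P                                      # (N, 3) point cloud
    for W in weights:
        h = np.maximum(h @ W, 0.0)             # shared per-point layer + ReLU
    return h.max(axis=0)                       # max pool over the N points

# Per-point MLP of size 3 x 64 x 128 x 1024, shared between the two halves.
phi_weights = [rng.standard_normal((a, b)) * 0.1
               for a, b in [(3, 64), (64, 128), (128, 1024)]]

N, D = 500, 6                                  # D = dimension of the representation
P_ref = rng.standard_normal((N, 3))            # reference point cloud
P_tgt = rng.standard_normal((N, 3))            # target point cloud

# Concatenate the two 1024-dim features into a 2048-dim vector.
z = np.concatenate([pointnet_features(P_ref, phi_weights),
                    pointnet_features(P_tgt, phi_weights)])
assert z.shape == (2048,)

# Regression MLP of size 2048 x 512 x 512 x D.
mlp_weights = [rng.standard_normal((a, b)) * 0.1
               for a, b in [(2048, 512), (512, 512), (512, D)]]
h = z
for W in mlp_weights[:-1]:
    h = np.maximum(h @ W, 0.0)
rep = h @ mlp_weights[-1]                      # D-dim rotation representation
assert rep.shape == (D,)
```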

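Training pairs are built by rotating a reference cloud with randomly sampled rotation matrices. A minimal numpy sketch of one such sampler (a uniformly sampled axis and angle turned into a matrix via Rodrigues' rotation formula [7]; the paper does not prescribe this exact implementation) could look like:

```python
import numpy as np

def random_rotation(rng):
    """Rotation matrix from a uniformly sampled axis and angle.

    The axis is uniform on the unit sphere, the angle uniform on [0, 2*pi);
    Rodrigues' formula gives R = I + sin(t) K + (1 - cos(t)) K^2,
    where K is the cross-product matrix of the axis.
    """
    axis = rng.standard_normal(3)
    axis /= np.linalg.norm(axis)               # uniform direction on S^2
    theta = rng.uniform(0.0, 2.0 * np.pi)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

rng = np.random.default_rng(0)
R = random_rotation(rng)
assert np.allclose(R.T @ R, np.eye(3))         # orthonormal
assert np.isclose(np.linalg.det(R), 1.0)       # proper rotation

# A target cloud is the reference cloud under a sampled rotation.
P_ref = rng.standard_normal((100, 3))
P_tgt = P_ref @ R.T
```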
We train the network with 2,290 airplane point clouds from ShapeNet [9], and test it with 400 held-out point clouds augmented with 100 random rotations. At each training iteration, we randomly select a reference point cloud and transform it with 10 randomly-sampled rotation matrices to get 10 target point clouds. We feed the paired reference-target point clouds into the Siamese network and minimize the L2 loss between the output and the ground-truth rotation matrices.

We trained the network for 2.6 × 10⁶ iterations. Plot (d) in Figure 5 shows the mean geodesic errors as training progresses. Plot (e) and Table (f) show the percentile, mean, max, and standard deviation of errors. Again, the 6D representation has the lowest mean and standard deviation of errors, with around 95% of errors lower than 5°, while the Euler representation is the worst, with around 10% of errors higher than 25°. Unlike in the sanity test, the 5D representation here performs worse than the 6D representation, but it outperforms the 3D and 4D representations. We hypothesize that the distortion in the gradients caused by the stereographic projection makes it harder for the network to do the regression.

Since the ground-truth rotation matrices are available, we can also directly regress the 3 × 3 matrix using the L2 loss. During testing, we use the Gram-Schmidt process to transform the predicted matrix into SO(3) and then report the geodesic error (see the bottom row of Table (f) in Figure 5). We hypothesize that the reason for the worse performance of the 3 × 3 matrix compared to the 6D representation is the orthogonalization post-process, which introduces errors.

5.3. Inverse Kinematics for Human Poses

In this experiment, we train a neural network to solve human pose inverse kinematics (IK) problems. Similar to the methods of Villegas et al. [30] and Hsu et al. [18], our network takes the joint positions of the current pose as inputs and predicts the rotations from the T-pose to the current pose. We use a fixed forward kinematics function to transform the predicted rotations back to joint positions and penalize their L2 distance from the ground truth. Previous work for this task used quaternions. We instead test different rotation representations and compare their performance.

The input contains the 3D positions of the N joints on the skeleton, written as P = (p_1, p_2, p_3, ..., p_N), p_i = (x, y, z)ᵀ. The output of the network is the rotations of the joints in the chosen representation, R = (r_1, r_2, r_3, ..., r_N), r_i ∈ R^D, where D is the dimension of the representation.

We train a four-layer MLP network with 1024 neurons in its hidden layers using the L2 reconstruction loss L = ||P − P′||₂², where P′ = Π(T, R). Here Π is the forward kinematics function, which takes as inputs the "T" pose of the skeleton and the predicted joint rotations, and outputs the 3D positions of the joints. Due to the recursive computational structure of forward kinematics, the accuracy of the hip orientation is critical for the overall skeleton pose prediction, and thus the joints adjacent to the hip contribute more weight to the loss (10 times higher than the other joints).

We use the CMU Motion Capture Database [25] for training and testing because it contains complex motions like dancing and martial arts, which cover a wide range of joint rotations. We picked in total 865 motion clips from 37 motion categories. We randomly chose 73 clips for testing and used the rest for training. We fix the global position of the hip so that we do not need to worry about predicting the global translation. The whole training set contains 1.14 × 10⁶ frames of human poses and the test set contains 1.07 × 10⁵ frames of human poses. We train the networks for 1,960k iterations with batch size 64. During training, we augmented the poses with random rotations along the y-axis. We augmented each instance in the test set with three random rotations along the y-axis as well. The results, as displayed in subplots (g), (h), and (i) of Figure 5, show that the 6D representation performs the best, with the lowest errors and fastest convergence. The 5D representation has similar performance to the 6D one. On the contrary, the 4D and 3D representations have higher average errors and higher percentages of big errors that exceed 10 cm.

We also perform a test of using the 3 × 3 matrix without orthogonalization during training and using the Gram-Schmidt process to transform the predicted matrix into SO(3) during testing. We find that this method creates huge errors, as reported in the bottom line of Table (i) in Figure 5. One possible reason for this bad performance is that the 3 × 3 matrix may cause the bone lengths to scale during the forward kinematics process. In Appendix G.1, we additionally visualize some human body poses for quaternions and our 6D representation.

6. Conclusion

We investigated the use of neural networks to approximate the mappings between various rotation representations. We found empirically that neural networks can better fit continuous representations. For 3D rotations, the commonly used quaternion and Euler angle representations have discontinuities and can cause problems during learning. We present continuous 5D and 6D rotation representations and demonstrate their advantages using an auto-encoder sanity test, as well as real-world applications such as 3D pose estimation and human inverse kinematics.

7. Acknowledgements

We thank Noam Aigerman, Kee Yuen Lam, and Sitao Xiang for fruitful discussions, and Fangjian Guo, Xinchen Yan, and Haoqi Li for helping with the presentation. This research was conducted at USC and Adobe and was funded in part by the ONR YIP grant N00014-17-S-FO14, the CONIX Research Center (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), the Andrew and Erna Viterbi Early Career Chair, the U.S. Army Research Laboratory (ARL) under contract number W911NF-14-D-0005, Adobe, and Sony. This project was not funded by Pinscreen, nor has it been conducted at Pinscreen or by anyone else affiliated with Pinscreen. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

[1] Cayley transform. https://en.wikipedia.org/wiki/Cayley_transform#Matrix_map.
[2] Rotation matrix. https://en.wikipedia.org/wiki/Rotation_matrix#Quaternion.
[3] Stiefel manifold. https://en.wikipedia.org/wiki/Stiefel_manifold.
[4] C. Allen-Blanchette, S. Leonardos, and J. Gallier. Motion interpolation in SIM(3). 2014.
[5] M. J. Baker. Maths: Conversion matrix to quaternion. http://www.euclideanspace.com/maths/geometry/rotations/conversions/matrixToQuaternion/. Accessed: 2018-11-21.
[6] A. R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930-945, 1993.
[7] S. Belongie. Rodrigues' rotation formula. http://mathworld.wolfram.com/RodriguesRotationFormula.html. Accessed: 2019-04-04.
[8] D. M. Bloom. Linear Algebra and Geometry. CUP Archive, 1979.
[9] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. ShapeNet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
[10] Z. Chen and F. Cao. The construction and approximation of neural networks operators with gaussian activation function. Mathematical Communications, 18(1):185-207, 2013.
[11] A. Csiszar, J. Eilers, and A. Verl. On solving the inverse kinematics problem using neural networks. In Mechatronics and Machine Vision in Practice (M2VIP), 2017 24th International Conference on, pages 1-6. IEEE, 2017.
[12] D. M. Davis. Embeddings of real projective spaces. Bol. Soc. Mat. Mexicana (3), 4:115-122, 1998.
[13] T.-T. Do, M. Cai, T. Pham, and I. Reid. Deep-6dpose: Recovering 6d object pose from a single rgb image. arXiv preprint arXiv:1802.10367, 2018.
[14] L. Falorsi, P. de Haan, T. R. Davidson, N. De Cao, M. Weiler, P. Forré, and T. S. Cohen. Explorations in homeomorphic variational auto-encoding. arXiv preprint arXiv:1807.04689, 2018.
[15] G. Gao, M. Lauri, J. Zhang, and S. Frintrop. Occlusion resistant object rotation regression from point cloud segments. arXiv preprint arXiv:1808.05498, 2018.
[16] F. S. Grassia. Practical parameterization of rotations using the exponential map. Journal of Graphics Tools, 3(3):29-48, 1998.
[17] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251-257, 1991.
[18] H.-W. Hsu, T.-Y. Wu, S. Wan, W. H. Wong, and C.-Y. Lee. Quatnet: Quaternion-based head pose estimation with multi-regression loss. IEEE Transactions on Multimedia, 2018.
[19] A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik. End-to-end recovery of human shape and pose. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[20] A. Kendall, R. Cipolla, et al. Geometric loss functions for camera pose regression with deep learning. In Proc. CVPR, volume 3, page 8, 2017.
[21] A. Kendall, M. Grimes, and R. Cipolla. Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2938-2946, 2015.
[22] C. Kosniowski. A First Course in Algebraic Topology. CUP Archive, page 53, 1980.
[23] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436, 2015.
[24] B. Llanas, S. Lantarón, and F. J. Sáinz. Constructive approximation of discontinuous functions by neural networks. Neural Processing Letters, 27(3):209-226, 2008.
[25] M. LLC. CMU graphics lab motion capture database. http://mocap.cs.cmu.edu/.
[26] X. Perez-Sala, L. Igual, S. Escalera, and C. Angulo. Uniform sampling of rotations for discrete and continuous learning of 2d shape models. In Robotic Vision: Technologies for Machine Learning and Vision Applications, pages 23-42. IGI Global, 2013.
[27] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017.
[28] A. Saxena, J. Driemeyer, and A. Y. Ng. Learning 3-d object orientation from images. In Robotics and Automation, 2009. ICRA'09. IEEE International Conference on, pages 794-800. IEEE, 2009.
[29] B. Ummenhofer, H. Zhou, J. Uhrig, N. Mayer, E. Ilg, A. Dosovitskiy, and T. Brox. Demon: Depth and motion network for learning monocular stereo. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 5, page 6, 2017.
[30] R. Villegas, J. Yang, D. Ceylan, and H. Lee. Neural kinematic networks for unsupervised motion retargetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8639-8648, 2018.
[31] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox. Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes. Robotics: Science and Systems (RSS), 2018.
[32] Z. Xu and F. Cao. The essential order of approximation for neural networks. Science in China Series F: Information Sciences, 47(1):97-112, 2004.
[33] Z.-B. Xu and F.-L. Cao. Simultaneous lp-approximation order for neural networks. Neural Networks, 18(7):914-923, 2005.
[34] X. Zhou, X. Sun, W. Zhang, S. Liang, and Y. Wei. Deep kinematic pose regression. In European Conference on Computer Vision, pages 186-201. Springer, 2016.
