Orthogonal Recurrent Neural Networks with Scaled Cayley Transform


Kyle E. Helfrich*, Devin Willmott*, Qiang Ye

(*Equal contribution. Department of Mathematics, University of Kentucky, Lexington, Kentucky, USA. Correspondence to: Kyle Helfrich <[email protected]>, Devin Willmott <[email protected]>. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s). arXiv:1707.09520v3 [stat.ML] 19 Jun 2018.)

Abstract

Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) has been used to address this issue and, in some cases, to exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform; such a parametrization is unable to represent matrices with negative one eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.

1. Introduction

Deep neural networks have been used to solve numerical problems of varying complexity. RNNs have parameters that are reused at each time step of a sequential data point and have achieved state of the art performance on many sequential learning tasks. Nearly all optimization algorithms for neural networks involve some variant of gradient descent. One major obstacle to training RNNs with gradient descent is due to vanishing or exploding gradients, as described in Bengio et al. (1993) and Pascanu et al. (2013). This problem refers to the tendency of gradients to grow or decay exponentially in size, resulting in gradient descent steps that are too small to be effective or so large that the network oversteps the local minimum. This issue significantly diminishes RNNs' ability to learn time-based dependencies, particularly in problems with long input sequences.

A variety of architectures have been introduced to overcome this difficulty. The current preferred RNN architectures are those that introduce gating mechanisms to control when information is retained or discarded, such as LSTMs (Hochreiter & Schmidhuber, 1997) and GRUs (Cho et al., 2014), at the cost of additional trainable parameters. More recently, the unitary evolution RNN (uRNN) (Arjovsky et al., 2016) uses a parametrization that forces the recurrent weight matrix to remain unitary throughout training, and exhibits superior performance to LSTMs on a variety of testing problems. For clarity, we follow the convention of Wisdom et al. (2016) and refer to this network as the restricted-capacity uRNN.

Since the introduction of uRNNs, orthogonal and unitary RNN schemes have increased in both popularity and complexity. Wisdom et al. (2016) use a multiplicative update method detailed in Tagare (2011) and Wen & Yin (2013) to expand uRNNs' capacity to include all unitary matrices. These networks are referred to as full-capacity uRNNs. Jing et al. (2016) and Mhammedi et al. (2017) parametrize the space of unitary/orthogonal matrices with Givens rotations and Householder reflections, respectively, but typically optimize over a subset of this space by restricting the number of parameters. Another complex parametrization has also been explored in Hyland & Gunnar (2017). There is also work using unitary matrices in GRUs (i.e. in GORU of Jing et al. (2017)) or near-unitary matrices in RNNs by restricting the singular values of the recurrent matrix to an interval around 1 (see Vorontsov et al. (2017)). For other work in addressing the vanishing and exploding gradient problem, see Henaff et al. (2017) and Le et al. (2015).

In this paper, we consider RNNs with a recurrent weight matrix taken from the set of all orthogonal matrices. To construct the orthogonal weight matrix, we parametrize it with a skew-symmetric matrix through a scaled Cayley transform. This scaling allows us to avoid the singularity issue occurring for -1 eigenvalues that may arise in the standard Cayley transform. By tuning this scaling matrix, the network can reach an appropriate orthogonal matrix using a relatively simple gradient descent update step. The resulting method achieves superior performance on various sequential data tasks.

The method we present in this paper works entirely with real matrices, and as such, our results deal only with orthogonal and skew-symmetric matrices. However, the method and all related theory remain valid for unitary and skew-Hermitian matrices in the complex case. The experimental results in this paper indicate that state of the art performance can be achieved without using complex matrices to optimize along the Stiefel manifold.

2. Background

2.1. Recurrent Neural Networks

A recurrent neural network (RNN) is a function with input parameters U ∈ R^{n×m}, recurrent parameters W ∈ R^{n×n}, recurrent bias b ∈ R^n, output parameters V ∈ R^{p×n}, and output bias c ∈ R^p, where m is the data input size, n is the number of hidden units, and p is the output data size. From an input sequence x = (x_1, x_2, ..., x_T) where x_i ∈ R^m, the RNN returns an output sequence y = (y_1, y_2, ..., y_T), where each y_i ∈ R^p is given recursively by

    h_t = σ(U x_t + W h_{t-1} + b)
    y_t = V h_t + c

where h = (h_0, ..., h_{T-1}), h_i ∈ R^n, is the hidden layer state at time i and σ(·) is the activation function, which is often a pointwise nonlinearity such as a hyperbolic tangent function or rectified linear unit (Nair & Hinton, 2010).
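As a concrete illustration of this recurrence (ours, not part of the paper), here is a minimal NumPy sketch of the forward pass; all names are illustrative.

    import numpy as np

    def rnn_forward(x_seq, U, W, b, V, c, sigma=np.tanh):
        # h_t = sigma(U x_t + W h_{t-1} + b), y_t = V h_t + c, with h_0 = 0.
        h = np.zeros(W.shape[0])
        outputs = []
        for x_t in x_seq:
            h = sigma(U @ x_t + W @ h + b)
            outputs.append(V @ h + c)
        return outputs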
2.2. Unitary RNNs

A real matrix W is orthogonal if it satisfies W^T W = I. The complex analog of orthogonal matrices is unitary matrices, which satisfy W*W = I, where * denotes the conjugate transpose. Orthogonal and unitary matrices have the desirable property that ||Wx||_2 = ||x||_2 for any vector x. This property motivates the use of orthogonal or unitary matrices in RNNs to avoid vanishing and exploding gradients, as detailed in Arjovsky et al. (2016).

Arjovsky et al. (2016) follow the framework of the previous section for their restricted-capacity uRNN, but introduce a parametrization of the recurrent matrix W using a product of simpler matrices. This parametrization is given by a product consisting of diagonal matrices with complex norm 1, complex Householder reflection matrices, discrete Fourier transform matrices, and a fixed permutation matrix, with the resulting product being unitary. The Efficient Unitary RNN (EURNN) by Jing et al. (2016) and the orthogonal RNN (oRNN) by Mhammedi et al. (2017) parametrize in a similar manner with products of Givens rotation matrices and Householder reflection matrices, respectively. This can also be seen in the parametrization through matrix exponentials in Hyland & Gunnar (2017), which does not appear to perform as well as the restricted-capacity uRNN.

Wisdom et al. (2016) note that this representation has only 7n parameters, which is insufficient to represent all unitary matrices for n > 7. In response, they present the full-capacity uRNN, which uses a multiplicative update step that is able to reach all unitary matrices of order n.

The full-capacity uRNN aims to construct a unitary matrix W^(k+1) from W^(k) by moving along a curve on the Stiefel manifold {W ∈ C^{n×n} | W*W = I}. For the network optimization, it is necessary to use a curve that is in a descent direction of the cost function L := L(W). In Tagare (2011), Wen & Yin (2013), Wisdom et al. (2016), and Vorontsov et al. (2017), a descent direction is constructed as B^(k) W^(k), which is a representation of the derivative operator DL(W^(k)) in the tangent space of the Stiefel manifold at W^(k). Then, with B^(k) W^(k) defining the direction of a descent curve, an update along the Stiefel manifold is obtained using the Cayley transform as

    W^(k+1) = (I + (λ/2) B^(k))^(-1) (I - (λ/2) B^(k)) W^(k)        (1)

where λ is the learning rate.
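A rough sketch of this multiplicative update in the real (orthogonal) case, assuming the common construction B = G W^T - W G^T used by Wisdom et al. (2016), where G is the usual gradient ∂L/∂W; this is illustrative, not the authors' code.

    import numpy as np

    def stiefel_cayley_update(W, G, lr):
        # B = G W^T - W G^T is skew-symmetric, so the Cayley transform of
        # (lr/2) B is orthogonal and the update keeps W orthogonal.
        B = G @ W.T - W @ G.T
        I = np.eye(W.shape[0])
        Y = np.linalg.solve(I + (lr / 2) * B, I - (lr / 2) * B)
        return Y @ W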
3. Scaled Cayley Orthogonal RNN

3.1. Cayley Transform

The Cayley transform gives a representation of orthogonal matrices without -1 eigenvalues using skew-symmetric matrices (i.e., matrices where A^T = -A):

    W = (I + A)^(-1) (I - A),        A = (I + W)^(-1) (I - W).

This bijection parametrizes the set of orthogonal matrices without -1 eigenvalues with skew-symmetric matrices. This direct and simple parametrization is attractive from a machine learning perspective because it is closed under addition: the sum or difference of two skew-symmetric matrices is also skew-symmetric, so we can use gradient descent algorithms like RMSprop (Tieleman & Hinton, 2012) or Adam (Kingma & Ba, 2014) to train parameters.

However, this parametrization cannot represent orthogonal matrices with -1 eigenvalues, since in this case I + W is not invertible. Theoretically, we can still represent matrices with eigenvalues that are arbitrarily close to -1; however, it can require large entries of A. For example, a 2 x 2 orthogonal matrix W with eigenvalues ≈ -0.99999 ± 0.00447i and its parametrization A by the Cayley transform are given below, where α = 0.99999:

    W = [ -α          -√(1-α²) ]        A ≈ [    0       447.212 ]
        [ √(1-α²)     -α       ],           [ -447.212      0    ].

Gradient descent algorithms will learn this A matrix very slowly, if at all. This difficulty can be overcome through a suitable diagonal scaling, according to results from Kahan (2006); the resulting update of the skew-symmetric parameter takes the simple form

    A^(k+1) = A^(k) - λ ∂L(W(A^(k)))/∂A.
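A minimal NumPy sketch (ours, not from the paper) reproducing this 2 x 2 example and the scaled parametrization W = (I + A)^(-1)(I - A) D with a diagonal D of ±1 entries; the choice D = -I below is illustrative.

    import numpy as np

    alpha = 0.99999
    beta = np.sqrt(1 - alpha**2)
    W = np.array([[-alpha, -beta],
                  [ beta,  -alpha]])             # orthogonal, eigenvalues near -1
    I = np.eye(2)

    A = np.linalg.solve(I + W, I - W)            # standard Cayley parametrization
    print(A)                                     # entries near +/- 447.2

    D = np.diag([-1.0, -1.0])                    # scaling matrix of +/- 1 entries
    A_s = np.linalg.solve(I + W @ D, I - W @ D)  # skew-symmetric, small entries
    W_back = np.linalg.solve(I + A_s, I - A_s) @ D
    print(np.allclose(W_back, W))                # True: W recovered from (A_s, D)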
Recommended publications
  • Continuous Cayley Transform
    Continuous Cayley Transform
    A. Jadczyk

    In complex analysis, the Cayley transform is a mapping w of the complex plane to itself, given by

        w(z) = (z - i)/(z + i).

    This is a linear fractional transformation of the type z ↦ (az + b)/(cz + d), given by the matrix

        C = [ a  b ]  =  [ 1  -i ]
            [ c  d ]     [ 1   i ].

    The determinant of this matrix is 2i, and, noticing that (1 + i)² = 2i, we can as well replace C by C/(1 + i) in order to have C of determinant one, and thus in the group SL(2, C). The transformation itself does not change after such a replacement. The Cayley transform is singular at the point z = -i. It maps the upper complex half-plane {z : Im(z) > 0} onto the interior of the unit disk D = {w : |w| < 1}, and the lower half-plane {z : Im(z) < 0}, except for the singular point, onto the exterior of the closure of D. The real line {z : Im(z) = 0} is mapped onto the unit circle {w : |w| = 1}, the boundary of D. The singular point z = -i can be thought of as being mapped to the point at infinity of the Riemann sphere, the one-point compactification of the complex plane. By solving the equation w(z) = z we easily find that there are two fixed points (z₁, z₂) of the Cayley transform:

        z₁ = (1/2 + √3/2) + i(-1/2 - √3/2),
        z₂ = (1/2 - √3/2) + i(-1/2 + √3/2).

    In order to interpolate between the identity mapping and the Cayley transform we will choose a one-parameter group of transformations in SL(2, C) sharing these two fixed points.
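    A quick numerical check of these mapping properties (our illustration, not part of the excerpt):

        import numpy as np

        w = lambda z: (z - 1j) / (z + 1j)

        upper = np.array([2 + 1j, -1 + 0.5j, 3j])     # upper half-plane points
        reals = np.array([-2.0, 0.0, 5.0])            # real line points
        print(np.all(np.abs(w(upper)) < 1))           # True: inside unit disk
        print(np.allclose(np.abs(w(reals)), 1.0))     # True: on the unit circle

        z1 = (1 + np.sqrt(3)) / 2 * (1 - 1j)          # one fixed point of w
        print(np.isclose(w(z1), z1))                  # True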
  • Characterization and Computation of Matrices of Maximal Trace Over Rotations
    Characterization and Computation of Matrices of Maximal Trace over Rotations

    Javier Bernal (1), Jim Lawrence (1, 2)
    (1) National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
    (2) George Mason University, 4400 University Dr, Fairfax, VA 22030, USA
    {javier.bernal, james.lawrence}@nist.gov

    Abstract: The constrained orthogonal Procrustes problem is the least-squares problem that calls for a rotation matrix that optimally aligns two corresponding sets of points in d-dimensional Euclidean space. This problem generalizes to the so-called Wahba's problem, which is the same problem with nonnegative weights. Given a d x d matrix M, solutions to these problems are intimately related to the problem of finding a d x d rotation matrix U that maximizes the trace of UM, i.e., that makes UM a matrix of maximal trace over rotation matrices, and it is well known this can be achieved with a method based on the computation of the singular value decomposition (SVD) of M. As the main goal of this paper, we characterize d x d matrices of maximal trace over rotation matrices in terms of their eigenvalues, and for d = 2, 3, we show how this characterization can be used to determine whether a matrix is of maximal trace over rotation matrices. Finally, although depending only slightly on the characterization, as a secondary goal of the paper, for d = 2, 3, we identify alternative ways, other than the SVD, of obtaining solutions to the aforementioned problems.

    MSC: 15A18, 15A42, 65H17, 65K99, 93B60
    Keywords: Eigenvalues, orthogonal, Procrustes, rotation, singular value decomposition, trace, Wahba

    Contents: 1 Introduction; 2 Reformulation of Problems as Maximizations of the Trace of Matrices over Rotations; 3 Characterization of Matrices of Maximal Trace over Rotations; 4 The Two-Dimensional Case: Computation without SVD; 5 The Three-Dimensional Case: Computation without SVD; 6 The Three-Dimensional Case Revisited; Summary; References.

    1 Introduction. Suppose P = p₁, ..., pₙ and Q = q₁, ..., qₙ are each sets of n points in R^d.
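    As a rough illustration of the SVD-based method the abstract mentions (names and details are ours, not the paper's):

        import numpy as np

        def max_trace_rotation(M):
            # Rotation U maximizing trace(U M), via the SVD M = P diag(s) Q^T.
            P, s, Qt = np.linalg.svd(M)
            Q = Qt.T
            signs = np.ones(M.shape[0])
            signs[-1] = np.sign(np.linalg.det(Q @ P.T))  # force det(U) = +1
            return Q @ np.diag(signs) @ P.T

        rng = np.random.default_rng(1)
        M = rng.standard_normal((3, 3))
        U = max_trace_rotation(M)
        print(np.isclose(np.linalg.det(U), 1.0))         # True: U is a rotation
        print(np.trace(U @ M) >= np.trace(M))            # True: beats U = I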
  • Rotation Matrix - Wikipedia, the free encyclopedia
    Rotation matrix
    From Wikipedia, the free encyclopedia

    In linear algebra, a rotation matrix is a matrix that is used to perform a rotation in Euclidean space. For example the matrix

        R = [ cos θ   -sin θ ]
            [ sin θ    cos θ ]

    rotates points in the xy-Cartesian plane counterclockwise through an angle θ about the origin of the Cartesian coordinate system. To perform the rotation, the position of each point must be represented by a column vector v, containing the coordinates of the point. A rotated vector is obtained by using the matrix multiplication Rv (see below for details).

    In two and three dimensions, rotation matrices are among the simplest algebraic descriptions of rotations, and are used extensively for computations in geometry, physics, and computer graphics. Though most applications involve rotations in two or three dimensions, rotation matrices can be defined for n-dimensional space. Rotation matrices are always square, with real entries. Algebraically, a rotation matrix in n dimensions is an n × n special orthogonal matrix, i.e. an orthogonal matrix whose determinant is 1: R^T R = R R^T = I and det(R) = 1.

    The set of all rotation matrices forms a group, known as the rotation group or the special orthogonal group. It is a subset of the orthogonal group, which includes reflections and consists of all orthogonal matrices with determinant 1 or -1, and of the special linear group, which includes all volume-preserving transformations and consists of matrices with determinant 1.

    Contents: 1 Rotations in two dimensions; 1.1 Non-standard orientation
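    A two-line numerical example of such a rotation (ours, for illustration):

        import numpy as np

        theta = np.pi / 3                            # 60 degrees counterclockwise
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        print(R @ np.array([1.0, 0.0]))              # [0.5, 0.866...]
        print(np.allclose(R.T @ R, np.eye(2)),
              np.isclose(np.linalg.det(R), 1.0))     # True True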
  • Generalization of Rotational Mechanics and Application to Aerospace Systems
    GENERALIZATION OF ROTATIONAL MECHANICS AND APPLICATION TO AEROSPACE SYSTEMS

    A Dissertation by ANDREW JAMES SINCLAIR, submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, May 2005. Major Subject: Aerospace Engineering.

    Approved as to style and content by: John L. Junkins (Co-Chair of Committee), John E. Hurtado (Co-Chair of Committee), Srinivas R. Vadali (Member), John H. Painter (Member), Helen L. Reed (Head of Department).

    ABSTRACT

    Generalization of Rotational Mechanics and Application to Aerospace Systems. (May 2005) Andrew James Sinclair, B.S., University of Florida; M.S., University of Florida. Co-Chairs of Advisory Committee: Dr. John L. Junkins and Dr. John E. Hurtado.

    This dissertation addresses the generalization of rigid-body attitude kinematics, dynamics, and control to higher dimensions. A new result is developed that demonstrates the kinematic relationship between the angular velocity in N dimensions and the derivative of the principal-rotation parameters. A new minimum-parameter description of N-dimensional orientation is directly related to the principal-rotation parameters. The mapping of arbitrary dynamical systems into N-dimensional rotations and the merits of new quasi velocities associated with the rotational motion are studied.
  • Medical Issue 2012 4
    Journal of Science and Technology, Vol. 13, June 2012, ISSN 1605-427X
    Natural and Medical Sciences (NMS No. 1), www.sustech.edu

    On the Development and Recent Work of Cayley Transforms [I]
    Adam Abdalla Abbaker Hassan
    Nile Valley University, Faculty of Education, Department of Mathematics

    ABSTRACT: This paper deals with the recent work and developments of Cayley transforms, which have been rapidly developed.

    (Arabic abstract, translated: This scientific paper presents the recent work and the rapid developments of Cayley transforms.)

    INTRODUCTION: In this study we shall give the basic important transforms and the historical background of the subject, then introduce the Cayley transform V = (A - iI)(A + iI)^(-1) of an accretive operator and a maximal accretive and dissipative operator, and relate this with the Cayley transform, which yields the homography in function calculus. After this we will study the fractional powers and maximal accretive operators; indeed all this had been developed by Sz.-Nagy and C. Foias (1). The method of extending an accretive (or dissipative) operator to a maximal one via Cayley transforms, modeled on von Neumann's theory of symmetric operators, is due to Phillips (1); the treatment of fractional powers of operators A in Hilbert space, or even in Banach space, such that -A is the infinitesimal generator of a continuous one-parameter semi-group of contractions, is a slightly simplified variant of that of Langer (1, 2).

    First we introduce the transformations B = (I + T*T)^(-1) and C = T(I + T*T)^(-1) in order to study the Cayley transforms. If the linear transform T is bounded, it is clear that the transformation B appearing above is also bounded, symmetric, and such that 0 ≤ B ≤ I (3). C = TB is then bounded too. If T is a linear transformation with dense domain, we know that T*, and consequently also T*T, exist, but we know nothing of their domains of definition (4).
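    As a finite-dimensional illustration of the Cayley transform V = (A - iI)(A + iI)^(-1) (our example, not from the paper): for a self-adjoint A, the transform V is unitary.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.standard_normal((4, 4))
        A = (X + X.T) / 2                              # symmetric, hence self-adjoint
        I = np.eye(4)

        V = (A - 1j * I) @ np.linalg.inv(A + 1j * I)   # Cayley transform of A
        print(np.allclose(V.conj().T @ V, I))          # True: V is unitary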
  • Complex Variables Visualized
    Diploma Thesis (revised version, June 17, 2014)
    Computer Algebra and Analysis: Complex Variables Visualized

    Carried out at the Research Institute for Symbolic Computation (RISC) of the Johannes Kepler University Linz, under the supervision of Univ.-Prof. Dr. Peter Paule, by Thomas Ponweiser, Stockfeld 15, 4283 Bad Zell.

    Abstract (translated from the German Kurzfassung): The aim of this diploma thesis is the visualization of some fundamental results from the theory of the modular group and of modular functions, with the aid of the computer algebra software Mathematica. The thesis is structured in three parts. The first chapter summarizes the notions from group theory relevant to this work; Möbius transformations are introduced and their basic geometric properties are examined. The second chapter is devoted to the study of the modular group from an algebraic and geometric point of view. The canonical fundamental domain of the modular group, and the tiling of the upper half-plane derived from it, are introduced; furthermore, a general method for finding fundamental domains for subgroups of the modular group is presented, which relies centrally on the concepts of two-dimensional hyperbolic geometry. In the third chapter we give some concrete examples of how the developed theory can be applied to the visualization of particular mathematical facts; besides the visualization of graphs of modular functions, the connection between modular transformations and continued fractions stands out as a particularly beautiful result.

    Abstract: The aim of this diploma thesis is the visualization of some fundamental results in the context of the theory of the modular group and modular functions. For this purpose the computer algebra software Mathematica is utilized. The thesis is structured in three parts.
  • Metaplectic Group, Symplectic Cayley Transform, and Fractional Fourier Transforms
    J. Math. Anal. Appl. 416 (2014) 947-968
    Journal of Mathematical Analysis and Applications, www.elsevier.com/locate/jmaa

    Metaplectic group, symplectic Cayley transform, and fractional Fourier transforms
    Maurice A. de Gosson (a, *), Franz Luef (b)
    (a) University of Vienna, Faculty of Mathematics (NuHAG), 1090 Vienna, Austria
    (b) Department of Mathematics, Norwegian University of Science and Technology, 7491 Trondheim, Norway

    Article history: received 15 February 2013; available online 14 March 2014; submitted by P.G. Lemarie-Rieusset.
    Keywords: Metaplectic and symplectic group; Fractional Fourier transform; Weyl operator; Feichtinger algebra

    Abstract: We begin with a survey of the standard theory of the metaplectic group with some emphasis on the associated notion of Maslov index. We thereafter introduce the Cayley transform for symplectic matrices, which allows us to study in detail the spreading functions of metaplectic operators, and to prove that they are basically quadratic chirps. As a non-trivial application we give new formulae for the fractional Fourier transform in arbitrary dimension. We also study the regularity of the solutions to the Schrödinger equation in the Feichtinger algebra.

    © 2014 The Authors. Published by Elsevier Inc. Open access under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
  • Superconnection Character Forms and the Cayley Transform
    SUPERCONNECTION CHARACTER FORMS AND THE CAYLEY TRANSFORM

    DANIEL QUILLEN (received June 1987)

    §0. INTRODUCTION

    Let E = E⁺ ⊕ E⁻ be a super vector bundle equipped with inner product, and let D be a connection on E compatible with the grading and inner product. Let

        X = [ 0   T* ]
            [ T   0  ]

    where T : E⁺ → E⁻ is a vector bundle map. In [11] we introduced the superconnection character form

        (1)    ω(E, D, X) = tr_s exp u(X² + [D, X] + D²)

    which generalizes the well-known definition of the Chern character in de Rham cohomology. This construction was motivated by the problem of proving a local version of the Atiyah-Singer index theorem for families of Dirac operators. The formula (1) is used in an infinite dimensional context in Bismut's solution of this problem [3].

    From the cohomological viewpoint the differential form (1) is not very interesting. Its cohomology class remains unchanged as X is contracted to zero by tX, so the class doesn't depend on X. In the first part of this paper we remedy this defect by showing that (1) has a meaning when X is allowed to become infinite in a certain sense. We show that this formula can be written in terms of the (ungraded) subbundle of E given by the graph of T. Moreover the resulting expression makes sense for an arbitrary subbundle W. Thus superconnection character forms can be defined for any correspondence between E⁺ and E⁻. We also show that the corresponding cohomology class is the Chern character of the difference W - E⁻.
  • Cayley Transformation and Numerical Stability of Calibration Equation
    Int J Comput Vis (2009) 82: 156-184. DOI 10.1007/s11263-008-0193-x

    Cayley Transformation and Numerical Stability of Calibration Equation
    F.C. Wu, Z.H. Wang, Z.Y. Hu
    Received: 25 May 2008 / Accepted: 22 October 2008 / Published online: 5 December 2008. © US Government 2008

    Abstract: The application of Cayley transformation to enhance the numerical stability of camera calibration is investigated. First, a new calibration equation, called the standard calibration equation, is introduced using the Cayley transformation, and its analytical solution is obtained. The standard calibration equation is equivalent to the classical calibration equation, but it exhibits remarkably better numerical stability. Second, a one-parameter calibration family, called the Cayley calibration family, which is equivalent to the standard calibration equation, is obtained using also the Cayley transformation, and it is found that this family is composed of those infinite homographies whose rotation has the same axis as the rotation between the two given views. The condition number of equations in the Cayley calibration family varies with the parameter value, and an algorithm to determine the best parameter value is provided. It is also proved that the standard calibration equation, the Cayley calibration family and the S-Cayley calibration family are all some special cases of this generic calibration family.

    Keywords: Camera calibration · The absolute conic · Calibration equation · Cayley transformation · Numerical stability

    1 Introduction. Camera calibration is a necessary step to recover 3D metric information from 2D images. Many methods and techniques for camera calibration have been proposed in the last decades. The existing methods can be roughly classified into two categories: methods based on a reference calibration
  • Spectral Theory of Unbounded Self-Adjoint Operators in Hilbert Spaces
    University of Helsinki, Department of Mathematics and Statistics
    Pro Gradu Thesis

    Spectral Theory of Unbounded Self-adjoint Operators in Hilbert Spaces
    Timo Weckman (013626675)
    Instructor: Jari Taskinen
    April 30, 2020

    Thesis information: Faculty of Science, Department of Mathematics and Statistics. Author: Timo Weckman. Title: Spectral Theory of Unbounded Self-adjoint Operators in Hilbert Spaces. Subject: Mathematics. Level: Pro gradu thesis. Date: April 2020. Number of pages: 44.

    Abstract: In this work we present a derivation of the spectral theorem of unbounded spectral operators in a Hilbert space. The spectral theorem has several applications, most notably in the theory of quantum mechanics. The theorem allows a self-adjoint linear operator on a Hilbert space to be represented in terms of simpler operators, projections. The focus of this work is on the self-adjoint operators in Hilbert spaces. The work begins with the introduction of vector and inner product spaces and the definition of the complete inner product space, the Hilbert space. Three classes of bounded linear operators relevant for this work are introduced: self-adjoint, unitary and projection operators. The unbounded self-adjoint operator and its properties are also discussed. For the derivation of the spectral theorem, the basic spectral properties of operators in Hilbert space are presented. The spectral theorem is first derived for bounded operators. With the definition of basic spectral properties and the introduction of the spectral family, the spectral theorem for bounded self-adjoint operators is presented with a proof.
  • Hybrid Attitude Control and Estimation on SO(3)
    Western University, Scholarship@Western: Electronic Thesis and Dissertation Repository, 11-29-2017

    Hybrid Attitude Control and Estimation On SO(3)
    Soulaimane Berkane, The University of Western Ontario
    Supervisor: Tayebi, Abdelhamid, The University of Western Ontario
    Graduate Program in Electrical and Computer Engineering
    A thesis submitted in partial fulfillment of the requirements for the degree in Doctor of Philosophy
    © Soulaimane Berkane 2017

    Recommended Citation: Berkane, Soulaimane, "Hybrid Attitude Control and Estimation On SO(3)" (2017). Electronic Thesis and Dissertation Repository. 5083. https://ir.lib.uwo.ca/etd/5083

    Abstract: This thesis presents a general framework for hybrid attitude control and estimation design on the Special Orthogonal group SO(3). First, the attitude stabilization problem on SO(3) is considered. It is shown that, using a min-switch hybrid control strategy designed from a family of potential functions on SO(3), global exponential stabilization on SO(3) can be achieved when this family of potential functions satisfies certain properties. Then, a systematic methodology to construct these potential functions is developed. The proposed hybrid control technique is applied to the attitude tracking problem for rigid body systems.
  • Math 512 / Problem Set 11 (three pages)
    Due: Wednesday, December 3, at 4 PM

    Math 512 / Problem Set 11 (three pages)

    • Study/read: Inner product spaces & Operators on inner product spaces
      - Ch. 5 & 6 from LADW by Treil.
      - Ch. 8 & 9 from Linear Algebra by Hoffman & Kunze.
      (!) Make sure that you understand perfectly the definitions, examples, theorems, etc.

    Legendre Polynomials. Recall that (f, g) = ∫₋₁¹ f(x)g(x) dx is an inner product on C(I, R), where I = [-1, 1]. The system of polynomials (L₀(t), L₁(t), ..., L_{n-1}(t)) resulting from the Gram-Schmidt orthonormalization of the standard basis E := (1, t, ..., t^{n-1}) of P_n(R) are called the Legendre polynomials. Google the term "Legendre polynomials" and learn about their applications.

    1) Answer / do the following:
    a) Compute L₀, L₁, L₂, L₃.
    b) Write t³ - t + 1 as a linear combination of L₀, L₁, L₂, L₃.
    c) Show that in general, one has deg(L_k) = k.
    d) What is the coefficient of t^k in L_k(t)?

    2) Which of the following pairs of matrices are unitarily equivalent:
    a) [1 0; 0 1]  &  [0 1; 1 0]
    b) [0 1 0; -1 0 0; 0 0 1]  &  [2 0 0; 0 -1 0; 0 0 0]
    c) [0 1 0; -1 0 0; 0 0 1]  &  [1 0 0; 0 -i 0; 0 0 i]
    [Hint: Unitarily equivalent matrices have the same characteristic polynomial (WHY), thus the same eigenvalues including multiplicities, the same trace, determinant, etc.]

    3) For each of the following matrices over F = R decide whether they are unitarily diagonalizable, and if so, find the unitary matrix P over R such that P A P⁻¹ is diagonal.
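    A small numerical sketch of the Gram-Schmidt construction behind the Legendre polynomials, using NumPy's polynomial class (our illustration; the problem of course asks for a hand computation):

        import numpy as np
        from numpy.polynomial import Polynomial as P

        def inner(f, g):
            F = (f * g).integ()                  # antiderivative of f*g
            return F(1.0) - F(-1.0)              # integral over [-1, 1]

        ortho = []
        for k in range(4):                       # basis 1, t, t^2, t^3
            p = P([0.0] * k + [1.0])
            for q in ortho:                      # subtract projections
                p = p - inner(p, q) * q
            ortho.append(p / np.sqrt(inner(p, p)))

        for L in ortho:                          # L0 = 1/sqrt(2), L1 = sqrt(3/2) t, ...
            print(L)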