Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group


Mario Lezcano-Casado 1, David Martínez-Rubio 2

1 Mathematical Institute, University of Oxford, Oxford, United Kingdom. 2 Department of Computer Science, University of Oxford, Oxford, United Kingdom. Correspondence to: Mario Lezcano Casado <[email protected]>.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

arXiv:1901.08428v3 [cs.LG] 30 May 2019

Abstract

We introduce a novel approach to perform first-order optimization with orthogonal and unitary constraints. This approach is based on a parametrization stemming from Lie group theory through the exponential map. The parametrization transforms the constrained optimization problem into an unconstrained one over a Euclidean space, for which common first-order optimization methods can be used. The theoretical results presented are general enough to cover the special orthogonal group, the unitary group and, in general, any connected compact Lie group. We discuss how this and other parametrizations can be computed efficiently through an implementation trick, making numerically complex parametrizations usable at a negligible runtime cost in neural networks. In particular, we apply our results to RNNs with orthogonal recurrent weights, yielding a new architecture called EXPRNN. We demonstrate how our method constitutes a more robust approach to optimization with orthogonal constraints, showing faster, accurate, and more stable convergence in several tasks designed to test RNNs.[1]

[1] Implementation can be found at https://github.com/Lezcano/expRNN

1. Introduction

Training deep neural networks presents many difficulties. One of the most important is the exploding and vanishing gradient problem, as first observed and studied in (Bengio et al., 1994). This problem arises from the ill-conditioning of the function defined by a neural network as the number of layers increases. This issue is particularly problematic in Recurrent Neural Networks (RNNs). In RNNs the eigenvalues of the gradient of the recurrent kernel explode or vanish exponentially fast with the number of time-steps whenever the recurrent kernel does not have unitary eigenvalues (Arjovsky et al., 2016). This behavior is the same as the one encountered when computing the powers of a matrix, and results in very slow convergence (vanishing gradient) or a lack of convergence (exploding gradient).

The seminal paper (Arjovsky et al., 2016) notes that unitary matrices have properties that would solve the exploding and vanishing gradient problems. These matrices form a group called the unitary group and they have been studied extensively in the fields of Lie group theory and Riemannian geometry. Optimization methods over the unitary and orthogonal group have found rather fruitful applications in RNNs in recent years (cf. Section 2).

In parallel to the work on unitary RNNs, there has been an increasing interest in optimization over the orthogonal group and the Stiefel manifold in neural networks (Harandi & Fernando, 2016; Ozay & Okatani, 2016; Huang et al., 2017; Bansal et al., 2018). As shown in these papers, orthogonal constraints in linear and CNN layers can be rather beneficial for the generalization of the network as they act as a form of implicit regularization. The main problem encountered while using these methods in practice was that optimization with orthogonality constraints was neither simple nor computationally cheap. We aim to close that gap.

In this paper we present a simple yet effective way to approach problems that present orthogonality or unitary constraints. We build on results from Riemannian geometry and Lie group theory to introduce a parametrization of these groups, together with theoretical guarantees for it.

This parametrization has several advantages, both theoretical and practical:

1. It can be used with general purpose optimizers.
2. The parametrization does not create additional minima or saddle points in the main parametrization region.
3. It is possible to use a structured initializer to take advantage of the structure of the eigenvalues of the orthogonal matrix.
4. Other approaches need to enforce hard orthogonality constraints; ours does not.

Most previous approaches fail to satisfy one or more of these points. The parametrizations in (Helfrich et al., 2018) and (Maduranga et al., 2018) comply with most of these points, but they suffer from degeneracies that ours solves (cf. Remark in Section 4.2). We compare our architecture with other methods to optimize over SO(n) and U(n) in the remarks in Sections 3 and 4.

High-level idea. The matrix exponential maps skew-symmetric matrices to orthogonal matrices, transforming an optimization problem with orthogonal constraints into an unconstrained one. We use Padé approximants and the scale-squaring trick to compute machine-precision approximations of the matrix exponential and its gradient. We can implement the parametrization with negligible overhead by observing that it does not depend on the batch size.

Structure of the Paper. In Section 3, we introduce the parametrization and present the theoretical results that support the efficiency of the exponential parametrization. In Section 4, we explain the implementation details of the layer. Finally, in Section 5, we present the numerical experiments confirming the numerical advantages of this parametrization.

2. Related Work

Riemannian gradient descent. There is a vast literature on optimization methods on Riemannian manifolds, and in particular for matrix manifolds, both in the deterministic and the stochastic setting. Most of the classical convergence results from the Euclidean setting have been adapted to the Riemannian one (Absil et al., 2009; Bonnabel, 2013; Boumal et al., 2016; Zhang et al., 2016; Sato et al., 2017). On the other hand, the problem of adapting popular optimization algorithms like RMSPROP (Tieleman & Hinton, 2012), ADAM (Kingma & Ba, 2014) or ADAGRAD (Duchi et al., 2011) is a topic of current research (Kumar Roy et al., 2018; Becigneul & Ganea, 2019).

Optimization over the Orthogonal and Unitary groups. The first formal study of optimization methods on manifolds with orthogonal constraints (Stiefel manifolds) is found in the thesis (Smith, 1993). These ideas were later simplified in the seminal paper (Edelman et al., 1998), where they were generalized to Grassmannian manifolds and extended to get the formulation of the conjugate gradient algorithm and the Newton method for these manifolds. After that, optimization with orthogonal constraints has been a central topic of study in the optimization community. A rather in-depth literature review of existing methods for optimization with orthogonality constraints can be found in (Jiang & Dai, 2015). When it comes to the unitary case, the algorithms used in practice are similar to those used in the real case, cf. (Manton, 2002; Abrudan et al., 2008).

Unitary RNNs. The idea of parametrizing the matrix that defines an RNN by a unitary matrix was first proposed in (Arjovsky et al., 2016). Their parametrization centers on a matrix-based fast Fourier transform-like (FFT) approach. As pointed out in (Jing et al., 2017), this representation, although efficient in memory, does not span the whole space of unitary matrices, giving the model reduced expressiveness. This second paper solves this issue in the same way it is solved when computing the FFT, using log(n) iterated butterfly operations. A different approach to perform this optimization was presented in (Wisdom et al., 2016; Vorontsov et al., 2017). Although not mentioned explicitly in either of the papers, this second approach consists of a retraction-based Riemannian gradient descent via the Cayley transform. The paper (Hyland & Rätsch, 2017) proposes to use the exponential map on the complex case, but they do not perform an analysis of the algorithm or provide a way to approximate the map or the gradients. A third approach has been presented in (Mhammedi et al., 2017) via the use of Householder reflections. Finally, in (Helfrich et al., 2018) and the follow-up (Maduranga et al., 2018), a parametrization of the orthogonal group via the use of the Cayley transform is proposed. We will have a closer look at these methods and their properties in Sections 3 and 4.

3. Parametrization of Compact Lie Groups

For a much broader introduction to Riemannian geometry and Lie group theory see Appendix A. We will restrict our attention to the special orthogonal[2] and unitary case, but the results in this section can be generalized to any connected compact matrix Lie group equipped with a bi-invariant metric. We prove the results for general connected compact matrix Lie groups in Appendix C.

[2] Note that we consider just matrices with determinant equal to one, since the full group of orthogonal matrices O(n) is not connected, and hence, not amenable to gradient descent algorithms.

3.1. The Lie algebras of SO(n) and U(n)

We are interested in the study of parametrizations of the special orthogonal group

$$\mathrm{SO}(n) = \{ B \in \mathbb{R}^{n \times n} \mid B^\top B = I,\ \det(B) = 1 \}$$

and the unitary group

$$\mathrm{U}(n) = \{ B \in \mathbb{C}^{n \times n} \mid B^* B = I \}.$$

These two sets are compact and connected Lie groups. Furthermore, when seen as submanifolds of $\mathbb{R}^{n \times n}$ (resp. $\mathbb{C}^{n \times n}$) equipped with the metric induced from the ambient space $\langle X, Y \rangle = \operatorname{tr}(X^\top Y)$ (resp. $\operatorname{tr}(X^* Y)$), they inherit a bi-invariant metric, meaning that the metric is invariant with respect to left and right multiplication by matrices of the group. This is clear given that the matrices of the two groups

3.3. From Riemannian to Euclidean optimization

In this section we describe some properties of the exponential parametrization which make it a sound choice for optimization with orthogonal constraints in neural networks.