Optimization Algorithms on Matrix Manifolds
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Grassmann Manifold
The Grassmann Manifold 1. For vector spaces V and W denote by L(V; W ) the vector space of linear maps from V to W . Thus L(Rk; Rn) may be identified with the space Rk£n of k £ n matrices. An injective linear map u : Rk ! V is called a k-frame in V . The set k n GFk;n = fu 2 L(R ; R ): rank(u) = kg of k-frames in Rn is called the Stiefel manifold. Note that the special case k = n is the general linear group: k k GLk = fa 2 L(R ; R ) : det(a) 6= 0g: The set of all k-dimensional (vector) subspaces ¸ ½ Rn is called the Grassmann n manifold of k-planes in R and denoted by GRk;n or sometimes GRk;n(R) or n GRk(R ). Let k ¼ : GFk;n ! GRk;n; ¼(u) = u(R ) denote the map which assigns to each k-frame u the subspace u(Rk) it spans. ¡1 For ¸ 2 GRk;n the fiber (preimage) ¼ (¸) consists of those k-frames which form a basis for the subspace ¸, i.e. for any u 2 ¼¡1(¸) we have ¡1 ¼ (¸) = fu ± a : a 2 GLkg: Hence we can (and will) view GRk;n as the orbit space of the group action GFk;n £ GLk ! GFk;n :(u; a) 7! u ± a: The exercises below will prove the following n£k Theorem 2. The Stiefel manifold GFk;n is an open subset of the set R of all n £ k matrices. There is a unique differentiable structure on the Grassmann manifold GRk;n such that the map ¼ is a submersion. -
Cheap Orthogonal Constraints in Neural Networks: a Simple Parametrization of the Orthogonal and Unitary Group
Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group Mario Lezcano-Casado 1 David Mart´ınez-Rubio 2 Abstract Recurrent Neural Networks (RNNs). In RNNs the eigenval- ues of the gradient of the recurrent kernel explode or vanish We introduce a novel approach to perform first- exponentially fast with the number of time-steps whenever order optimization with orthogonal and uni- the recurrent kernel does not have unitary eigenvalues (Ar- tary constraints. This approach is based on a jovsky et al., 2016). This behavior is the same as the one parametrization stemming from Lie group theory encountered when computing the powers of a matrix, and through the exponential map. The parametrization results in very slow convergence (vanishing gradient) or a transforms the constrained optimization problem lack of convergence (exploding gradient). into an unconstrained one over a Euclidean space, for which common first-order optimization meth- In the seminal paper (Arjovsky et al., 2016), they note that ods can be used. The theoretical results presented unitary matrices have properties that would solve the ex- are general enough to cover the special orthogo- ploding and vanishing gradient problems. These matrices nal group, the unitary group and, in general, any form a group called the unitary group and they have been connected compact Lie group. We discuss how studied extensively in the fields of Lie group theory and Rie- this and other parametrizations can be computed mannian geometry. Optimization methods over the unitary efficiently through an implementation trick, mak- and orthogonal group have found rather fruitful applications ing numerically complex parametrizations usable in RNNs in recent years (cf. -
What's in a Name? the Matrix As an Introduction to Mathematics
St. John Fisher College Fisher Digital Publications Mathematical and Computing Sciences Faculty/Staff Publications Mathematical and Computing Sciences 9-2008 What's in a Name? The Matrix as an Introduction to Mathematics Kris H. Green St. John Fisher College, [email protected] Follow this and additional works at: https://fisherpub.sjfc.edu/math_facpub Part of the Mathematics Commons How has open access to Fisher Digital Publications benefited ou?y Publication Information Green, Kris H. (2008). "What's in a Name? The Matrix as an Introduction to Mathematics." Math Horizons 16.1, 18-21. Please note that the Publication Information provides general citation information and may not be appropriate for your discipline. To receive help in creating a citation based on your discipline, please visit http://libguides.sjfc.edu/citations. This document is posted at https://fisherpub.sjfc.edu/math_facpub/12 and is brought to you for free and open access by Fisher Digital Publications at St. John Fisher College. For more information, please contact [email protected]. What's in a Name? The Matrix as an Introduction to Mathematics Abstract In lieu of an abstract, here is the article's first paragraph: In my classes on the nature of scientific thought, I have often used the movie The Matrix to illustrate the nature of evidence and how it shapes the reality we perceive (or think we perceive). As a mathematician, I usually field questions elatedr to the movie whenever the subject of linear algebra arises, since this field is the study of matrices and their properties. So it is natural to ask, why does the movie title reference a mathematical object? Disciplines Mathematics Comments Article copyright 2008 by Math Horizons. -
Multivector Differentiation and Linear Algebra 0.5Cm 17Th Santaló
Multivector differentiation and Linear Algebra 17th Santalo´ Summer School 2016, Santander Joan Lasenby Signal Processing Group, Engineering Department, Cambridge, UK and Trinity College Cambridge [email protected], www-sigproc.eng.cam.ac.uk/ s jl 23 August 2016 1 / 78 Examples of differentiation wrt multivectors. Linear Algebra: matrices and tensors as linear functions mapping between elements of the algebra. Functional Differentiation: very briefly... Summary Overview The Multivector Derivative. 2 / 78 Linear Algebra: matrices and tensors as linear functions mapping between elements of the algebra. Functional Differentiation: very briefly... Summary Overview The Multivector Derivative. Examples of differentiation wrt multivectors. 3 / 78 Functional Differentiation: very briefly... Summary Overview The Multivector Derivative. Examples of differentiation wrt multivectors. Linear Algebra: matrices and tensors as linear functions mapping between elements of the algebra. 4 / 78 Summary Overview The Multivector Derivative. Examples of differentiation wrt multivectors. Linear Algebra: matrices and tensors as linear functions mapping between elements of the algebra. Functional Differentiation: very briefly... 5 / 78 Overview The Multivector Derivative. Examples of differentiation wrt multivectors. Linear Algebra: matrices and tensors as linear functions mapping between elements of the algebra. Functional Differentiation: very briefly... Summary 6 / 78 We now want to generalise this idea to enable us to find the derivative of F(X), in the A ‘direction’ – where X is a general mixed grade multivector (so F(X) is a general multivector valued function of X). Let us use ∗ to denote taking the scalar part, ie P ∗ Q ≡ hPQi. Then, provided A has same grades as X, it makes sense to define: F(X + tA) − F(X) A ∗ ¶XF(X) = lim t!0 t The Multivector Derivative Recall our definition of the directional derivative in the a direction F(x + ea) − F(x) a·r F(x) = lim e!0 e 7 / 78 Let us use ∗ to denote taking the scalar part, ie P ∗ Q ≡ hPQi. -
Handout 9 More Matrix Properties; the Transpose
Handout 9 More matrix properties; the transpose Square matrix properties These properties only apply to a square matrix, i.e. n £ n. ² The leading diagonal is the diagonal line consisting of the entries a11, a22, a33, . ann. ² A diagonal matrix has zeros everywhere except the leading diagonal. ² The identity matrix I has zeros o® the leading diagonal, and 1 for each entry on the diagonal. It is a special case of a diagonal matrix, and A I = I A = A for any n £ n matrix A. ² An upper triangular matrix has all its non-zero entries on or above the leading diagonal. ² A lower triangular matrix has all its non-zero entries on or below the leading diagonal. ² A symmetric matrix has the same entries below and above the diagonal: aij = aji for any values of i and j between 1 and n. ² An antisymmetric or skew-symmetric matrix has the opposite entries below and above the diagonal: aij = ¡aji for any values of i and j between 1 and n. This automatically means the digaonal entries must all be zero. Transpose To transpose a matrix, we reect it across the line given by the leading diagonal a11, a22 etc. In general the result is a di®erent shape to the original matrix: a11 a21 a11 a12 a13 > > A = A = 0 a12 a22 1 [A ]ij = A : µ a21 a22 a23 ¶ ji a13 a23 @ A > ² If A is m £ n then A is n £ m. > ² The transpose of a symmetric matrix is itself: A = A (recalling that only square matrices can be symmetric). -
Building Deep Networks on Grassmann Manifolds
Building Deep Networks on Grassmann Manifolds Zhiwu Huangy, Jiqing Wuy, Luc Van Goolyz yComputer Vision Lab, ETH Zurich, Switzerland zVISICS, KU Leuven, Belgium fzhiwu.huang, jiqing.wu, [email protected] Abstract The popular applications of Grassmannian data motivate Learning representations on Grassmann manifolds is popular us to build a deep neural network architecture for Grassman- in quite a few visual recognition tasks. In order to enable deep nian representation learning. To this end, the new network learning on Grassmann manifolds, this paper proposes a deep architecture is designed to accept Grassmannian data di- network architecture by generalizing the Euclidean network rectly as input, and learns new favorable Grassmannian rep- paradigm to Grassmann manifolds. In particular, we design resentations that are able to improve the final visual recog- full rank mapping layers to transform input Grassmannian nition tasks. In other words, the new network aims to deeply data to more desirable ones, exploit re-orthonormalization learn Grassmannian features on their underlying Rieman- layers to normalize the resulting matrices, study projection nian manifolds in an end-to-end learning architecture. In pooling layers to reduce the model complexity in the Grass- summary, two main contributions are made by this paper: mannian context, and devise projection mapping layers to re- spect Grassmannian geometry and meanwhile achieve Eu- • We explore a novel deep network architecture in the con- clidean forms for regular output layers. To train the Grass- text of Grassmann manifolds, where it has not been possi- mann networks, we exploit a stochastic gradient descent set- ble to apply deep neural networks. -
On Manifolds of Tensors of Fixed Tt-Rank
ON MANIFOLDS OF TENSORS OF FIXED TT-RANK SEBASTIAN HOLTZ, THORSTEN ROHWEDDER, AND REINHOLD SCHNEIDER Abstract. Recently, the format of TT tensors [19, 38, 34, 39] has turned out to be a promising new format for the approximation of solutions of high dimensional problems. In this paper, we prove some new results for the TT representation of a tensor U ∈ Rn1×...×nd and for the manifold of tensors of TT-rank r. As a first result, we prove that the TT (or compression) ranks ri of a tensor U are unique and equal to the respective seperation ranks of U if the components of the TT decomposition are required to fulfil a certain maximal rank condition. We then show that d the set T of TT tensors of fixed rank r forms an embedded manifold in Rn , therefore preserving the essential theoretical properties of the Tucker format, but often showing an improved scaling behaviour. Extending a similar approach for matrices [7], we introduce certain gauge conditions to obtain a unique representation of the tangent space TU T of T and deduce a local parametrization of the TT manifold. The parametrisation of TU T is often crucial for an algorithmic treatment of high-dimensional time-dependent PDEs and minimisation problems [33]. We conclude with remarks on those applications and present some numerical examples. 1. Introduction The treatment of high-dimensional problems, typically of problems involving quantities from Rd for larger dimensions d, is still a challenging task for numerical approxima- tion. This is owed to the principal problem that classical approaches for their treatment normally scale exponentially in the dimension d in both needed storage and computa- tional time and thus quickly become computationally infeasable for sensible discretiza- tions of problems of interest. -
Geometric-Algebra Adaptive Filters Wilder B
1 Geometric-Algebra Adaptive Filters Wilder B. Lopes∗, Member, IEEE, Cassio G. Lopesy, Senior Member, IEEE Abstract—This paper presents a new class of adaptive filters, namely Geometric-Algebra Adaptive Filters (GAAFs). They are Faces generated by formulating the underlying minimization problem (a deterministic cost function) from the perspective of Geometric Algebra (GA), a comprehensive mathematical language well- Edges suited for the description of geometric transformations. Also, (directed lines) differently from standard adaptive-filtering theory, Geometric Calculus (the extension of GA to differential calculus) allows Fig. 1. A polyhedron (3-dimensional polytope) can be completely described for applying the same derivation techniques regardless of the by the geometric multiplication of its edges (oriented lines, vectors), which type (subalgebra) of the data, i.e., real, complex numbers, generate the faces and hypersurfaces (in the case of a general n-dimensional quaternions, etc. Relying on those characteristics (among others), polytope). a deterministic quadratic cost function is posed, from which the GAAFs are devised, providing a generalization of regular adaptive filters to subalgebras of GA. From the obtained update rule, it is shown how to recover the following least-mean squares perform calculus with hypercomplex quantities, i.e., elements (LMS) adaptive filter variants: real-entries LMS, complex LMS, that generalize complex numbers for higher dimensions [2]– and quaternions LMS. Mean-square analysis and simulations in [10]. a system identification scenario are provided, showing very good agreement for different levels of measurement noise. GA-based AFs were first introduced in [11], [12], where they were successfully employed to estimate the geometric Index Terms—Adaptive filtering, geometric algebra, quater- transformation (rotation and translation) that aligns a pair of nions. -
Determinants Math 122 Calculus III D Joyce, Fall 2012
Determinants Math 122 Calculus III D Joyce, Fall 2012 What they are. A determinant is a value associated to a square array of numbers, that square array being called a square matrix. For example, here are determinants of a general 2 × 2 matrix and a general 3 × 3 matrix. a b = ad − bc: c d a b c d e f = aei + bfg + cdh − ceg − afh − bdi: g h i The determinant of a matrix A is usually denoted jAj or det (A). You can think of the rows of the determinant as being vectors. For the 3×3 matrix above, the vectors are u = (a; b; c), v = (d; e; f), and w = (g; h; i). Then the determinant is a value associated to n vectors in Rn. There's a general definition for n×n determinants. It's a particular signed sum of products of n entries in the matrix where each product is of one entry in each row and column. The two ways you can choose one entry in each row and column of the 2 × 2 matrix give you the two products ad and bc. There are six ways of chosing one entry in each row and column in a 3 × 3 matrix, and generally, there are n! ways in an n × n matrix. Thus, the determinant of a 4 × 4 matrix is the signed sum of 24, which is 4!, terms. In this general definition, half the terms are taken positively and half negatively. In class, we briefly saw how the signs are determined by permutations. -
Matrices and Tensors
APPENDIX MATRICES AND TENSORS A.1. INTRODUCTION AND RATIONALE The purpose of this appendix is to present the notation and most of the mathematical tech- niques that are used in the body of the text. The audience is assumed to have been through sev- eral years of college-level mathematics, which included the differential and integral calculus, differential equations, functions of several variables, partial derivatives, and an introduction to linear algebra. Matrices are reviewed briefly, and determinants, vectors, and tensors of order two are described. The application of this linear algebra to material that appears in under- graduate engineering courses on mechanics is illustrated by discussions of concepts like the area and mass moments of inertia, Mohr’s circles, and the vector cross and triple scalar prod- ucts. The notation, as far as possible, will be a matrix notation that is easily entered into exist- ing symbolic computational programs like Maple, Mathematica, Matlab, and Mathcad. The desire to represent the components of three-dimensional fourth-order tensors that appear in anisotropic elasticity as the components of six-dimensional second-order tensors and thus rep- resent these components in matrices of tensor components in six dimensions leads to the non- traditional part of this appendix. This is also one of the nontraditional aspects in the text of the book, but a minor one. This is described in §A.11, along with the rationale for this approach. A.2. DEFINITION OF SQUARE, COLUMN, AND ROW MATRICES An r-by-c matrix, M, is a rectangular array of numbers consisting of r rows and c columns: ¯MM.. -
28. Exterior Powers
28. Exterior powers 28.1 Desiderata 28.2 Definitions, uniqueness, existence 28.3 Some elementary facts 28.4 Exterior powers Vif of maps 28.5 Exterior powers of free modules 28.6 Determinants revisited 28.7 Minors of matrices 28.8 Uniqueness in the structure theorem 28.9 Cartan's lemma 28.10 Cayley-Hamilton Theorem 28.11 Worked examples While many of the arguments here have analogues for tensor products, it is worthwhile to repeat these arguments with the relevant variations, both for practice, and to be sensitive to the differences. 1. Desiderata Again, we review missing items in our development of linear algebra. We are missing a development of determinants of matrices whose entries may be in commutative rings, rather than fields. We would like an intrinsic definition of determinants of endomorphisms, rather than one that depends upon a choice of coordinates, even if we eventually prove that the determinant is independent of the coordinates. We anticipate that Artin's axiomatization of determinants of matrices should be mirrored in much of what we do here. We want a direct and natural proof of the Cayley-Hamilton theorem. Linear algebra over fields is insufficient, since the introduction of the indeterminate x in the definition of the characteristic polynomial takes us outside the class of vector spaces over fields. We want to give a conceptual proof for the uniqueness part of the structure theorem for finitely-generated modules over principal ideal domains. Multi-linear algebra over fields is surely insufficient for this. 417 418 Exterior powers 2. Definitions, uniqueness, existence Let R be a commutative ring with 1. -
INTRODUCTION to ALGEBRAIC GEOMETRY 1. Preliminary Of
INTRODUCTION TO ALGEBRAIC GEOMETRY WEI-PING LI 1. Preliminary of Calculus on Manifolds 1.1. Tangent Vectors. What are tangent vectors we encounter in Calculus? 2 0 (1) Given a parametrised curve α(t) = x(t); y(t) in R , α (t) = x0(t); y0(t) is a tangent vector of the curve. (2) Given a surface given by a parameterisation x(u; v) = x(u; v); y(u; v); z(u; v); @x @x n = × is a normal vector of the surface. Any vector @u @v perpendicular to n is a tangent vector of the surface at the corresponding point. (3) Let v = (a; b; c) be a unit tangent vector of R3 at a point p 2 R3, f(x; y; z) be a differentiable function in an open neighbourhood of p, we can have the directional derivative of f in the direction v: @f @f @f D f = a (p) + b (p) + c (p): (1.1) v @x @y @z In fact, given any tangent vector v = (a; b; c), not necessarily a unit vector, we still can define an operator on the set of functions which are differentiable in open neighbourhood of p as in (1.1) Thus we can take the viewpoint that each tangent vector of R3 at p is an operator on the set of differential functions at p, i.e. @ @ @ v = (a; b; v) ! a + b + c j ; @x @y @z p or simply @ @ @ v = (a; b; c) ! a + b + c (1.2) @x @y @z 3 with the evaluation at p understood.