Acta Numerica (2021), pp. 1–208
doi:10.1017/S0962492921000076
© Cambridge University Press, 2021. Printed in the United Kingdom.
arXiv:2106.08090v1 [math.NA] 15 Jun 2021

Tensors in computations

Lek-Heng Lim
Computational and Applied Mathematics Initiative, University of Chicago, Chicago, IL 60637, USA
E-mail: [email protected]

The notion of a tensor captures three great ideas: equivariance, multilinearity, separability. But trying to be three things at once makes the notion difficult to understand. We will explain tensors in an accessible and elementary way through the lens of linear algebra and numerical linear algebra, elucidated with examples from computational and applied mathematics.

CONTENTS
1 Introduction
2 Tensors via transformation rules
3 Tensors via multilinearity
4 Tensors via tensor products
5 Odds and ends
References

1. Introduction

We have two goals in this article: the first is to explain in detail and in the simplest possible terms what a tensor is; the second is to discuss the main ways in which tensors play a role in computations. The two goals are interwoven: what defines a tensor is also what makes it useful in computations, so it is important to gain a genuine understanding of tensors. We will take the reader through the three common definitions of a tensor: as an object that satisfies certain transformation rules, as a multilinear map, and as an element of a tensor product of vector spaces. We will explain the motivations behind these definitions, how one definition leads to the next, and how they all fit together. All three definitions are useful in computational mathematics but in different ways; we will intersperse our discussions of each definition with considerations of how it is employed in computations, using the latter as impetus for the former.

Tensors arise in computations in one of three ways: equivariance under coordinate changes, multilinear maps and separation of variables, each corresponding to one of the aforementioned definitions.

Separability. The last of the three is the most prevalent, arising in various forms in problems, solutions and algorithms. One may exploit separable structures in ordinary differential equations, in integral equations, in Hamiltonians, in objective functions, etc. The separation-of-variables technique may be applied to solve partial differential and integral equations, or even finite difference and integro-differential equations. Separability plays an indispensable role in Greengard and Rokhlin’s fast multipole method, Grover’s quantum search algorithm, Hartree–Fock approximations of various stripes, Hochbaum and Shanthikumar’s non-linear separable convex optimization, Smolyak’s quadrature, and beyond. It also underlies tensor product construction of various objects – bases, frames, function spaces, kernels, multiresolution analyses, operators and quadratures among them.

Multilinearity. Many common operations, ranging from multiplying two complex numbers or matrices to the convolution or bilinear Hilbert transform of functions, are bilinear operators. Trilinear functionals arise in self-concordance and numerical stability, and in fast integer and fast matrix multiplications. The last leads us to the matrix multiplication exponent, which characterizes the computational complexity of not just matrix multiplication but also inversion, determinant, null basis, linear systems, LU/QR/eigenvalue/Hessenberg decompositions, and more.
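To make the bilinearity above concrete, here is a minimal sketch, not taken from the article, of how a bilinear operator is encoded by a three-index coefficient array (a hypermatrix) once bases are fixed. We use complex multiplication, viewed as a bilinear map from R^2 × R^2 to R^2; the array name T and the function name beta are our own.

    import numpy as np

    # Coefficient hypermatrix of complex multiplication viewed as a bilinear map
    # beta : R^2 x R^2 -> R^2, using (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
    T = np.zeros((2, 2, 2))
    T[0, 0, 0], T[1, 1, 0] = 1.0, -1.0   # real part:       ac - bd
    T[0, 1, 1], T[1, 0, 1] = 1.0, 1.0    # imaginary part:  ad + bc

    def beta(x, y):
        # beta(x, y)_k = sum over i, j of T[i, j, k] * x_i * y_j
        return np.einsum('ijk,i,j->k', T, x, y)

    x, y = np.array([2.0, 3.0]), np.array([-1.0, 4.0])   # 2 + 3i and -1 + 4i
    print(beta(x, y))             # [-14.   5.]
    print((2 + 3j) * (-1 + 4j))   # (-14+5j), the same product

Every bilinear operator in the list above admits such a coefficient array in coordinates, and fast bilinear algorithms, including those governing the matrix multiplication exponent, amount to evaluating sums of this kind with fewer scalar multiplications than the defining formula suggests.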
Higher-order multilinear maps arise in the form of multidimensional Fourier/Laplace/Z/discrete cosine transforms and as cryptographic multilinear maps. Even if a multivariate function is not multilinear, its derivatives always are, and thus Taylor and multipole series are expansions in multilinear maps.

Equivariance. Equivariance is an idea as old as tensors and has been implicitly used throughout algorithms in numerical linear algebra. Every time we transform the scalars, vectors or matrices in a problem from one form to another via a sequence of Givens or Jacobi rotations, Householder reflectors or Gauss transforms, or through a Krylov or Fourier or wavelet basis, we are employing equivariance in the form of various 0-, 1- or 2-tensor transformation rules (summarized in a short display below). Beyond numerical linear algebra, the equivariance of tools from interior point methods to deep neural networks is an important reason for their success. The equivariance of Newton’s method allows it to solve arbitrarily ill-conditioned optimization problems in theory and highly ill-conditioned ones in practice. AlphaFold 2 conquered the protein structure prediction problem with an equivariant neural network.

Incidentally, out of the so-called ‘top ten algorithms of the twentieth century’ (Dongarra and Sullivan 2000), fast multipole (Board and Schulten 2000), fast Fourier transform (Rockmore 2000), Krylov subspaces (van der Vorst 2000), Francis’s QR (Parlett 2000) and matrix decompositions (Stewart 2000) all draw from these three aforementioned tensorial ideas in one way or another. Nevertheless, by ‘computations’ we do not mean just algorithms, although these will constitute a large part of our article; the title of our article also includes the use of tensors as a tool in the analysis of algorithms (e.g. self-concordance in polynomial-time convergence), providing intractability guarantees (e.g. cryptographic multilinear maps), reducing complexity of models (e.g. equivariant neural networks), quantifying computational complexity (e.g. exponent of matrix multiplication) and in yet other ways. It would be prudent to add that while tensors are an essential ingredient in the aforementioned computational tools, they are rarely the only ingredient: it is usually in combination with other concepts and techniques – calculus of variations, Gauss quadrature, multiresolution analysis, the power method, reproducing kernel Hilbert spaces, singular value decomposition, etc. – that they become potent tools.

The article is written with accessibility and simplicity in mind. Our exposition assumes only standard undergraduate linear algebra: vector spaces, linear maps, dual spaces, change of basis, etc. Knowledge of numerical linear algebra is a plus since this article mainly targets computational mathematicians. As physicists have played an outsize role in the development of tensors, it is inevitable that motivations for certain aspects, explanations for certain definitions, etc., are best understood from a physicist’s perspective, and to this end we will include some discussions to provide context. When discussing applications or the physics origin of certain ideas, it is inevitable that we have to assume slightly more background, but we strive to be self-contained and limit ourselves to the most pertinent basic ideas. In particular, it is not our objective to provide a comprehensive survey of all things tensorial.
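For concreteness, the 0-, 1- and 2-tensor transformation rules invoked in the equivariance paragraph above can be summarized as follows; this compact display is our own and is stated in one common convention (Section 2 develops transformation rules properly). If X is the change-of-basis matrix, then a scalar α, the coordinate vector v of a vector, and the matrix A representing a linear operator transform as

    α′ = α,    v′ = X⁻¹v,    A′ = X⁻¹AX,

while the matrix of a bilinear form transforms as A′ = XᵀAX; which rule applies is determined by the covariant and contravariant orders of the tensor at hand.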
We focus on the foundational and fundamental; results from current research make an appearance only if they illuminate these basic aspects. We have avoided a formal textbook-style treatment and opted for a more casual exposition that hopefully makes for easier reading (and in part to keep open the possibility of a future book project).

1.1. Overview

There are essentially three ways to define a tensor, reflecting the chronological evolution of the notion through the last 140 years or so:

➀ as a multi-indexed object that satisfies certain transformation rules,
➁ as a multilinear map,
➂ as an element of a tensor product of vector spaces.

The key to the coordinate-dependent definition in ➀ is the emphasized part: the transformation rules are what define a tensor when one chooses to view it as a multi-indexed object, akin to the group laws in the definition of a group. The more modern coordinate-free definitions ➁ and ➂ automatically encode these transformation rules. The multi-indexed object, which could be a hypermatrix, a polynomial, a differential form, etc., is then a coordinate representation of either ➁ or ➂ with respect to a choice of bases, and the corresponding transformation rule is a change-of-basis theorem.

Take the following case familiar to anyone who has studied linear algebra. Let V and W be finite-dimensional vector spaces over R. A linear map f : V → W is an order-2 tensor of covariant order 1 and contravariant order 1; this is the description according to definition ➁. We may construct a new vector space V∗ ⊗ W, where V∗ denotes the dual space of V, and show that f corresponds to a unique element T ∈ V∗ ⊗ W; this is the description according to definition ➂. With a choice of bases ℬ_V and ℬ_W, f or T may be represented as a matrix A ∈ R^{m×n}, where m = dim V and n = dim W. Note, however, that the matrix A is not unique and depends on our choice of bases, and different bases ℬ_V′ and ℬ_W′ would give a different matrix representation A′ ∈ R^{m×n}. The change-of-basis theorem states that the two matrices are related via the transformation rule A′ = X⁻¹AY, where X and Y are the change-of-basis matrices on W and V respectively. Definition ➀ is essentially an attempt to define a linear map using its change-of-basis theorem – possible but awkward. The reason for such an awkward definition is one of historical necessity: definition ➀ had come before any of the modern notions we now take for granted in linear algebra – vector space, dual space, linear map, bases, etc. – were invented. We stress that if one chooses to work with definition ➀, then it is the transformation rule/change-of-basis theorem and not the multi-indexed object that