Pseudospectral Shattering, the Sign Function, and Diagonalization in Nearly Matrix Multiplication Time
2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS)

Jess Banks, Jorge Garza-Vargas, Archit Kulkarni, Nikhil Srivastava
Mathematics, UC Berkeley

Abstract—We exhibit a randomized algorithm which, given a square matrix $A \in \mathbb{C}^{n \times n}$ with $\|A\| \le 1$ and $\delta > 0$, computes with high probability an invertible $V$ and diagonal $D$ such that
$$\|A - VDV^{-1}\| \le \delta$$
in $O(T_{\mathrm{MM}}(n)\log^2(n/\delta))$ arithmetic operations on a floating point machine with $O(\log^4(n/\delta)\log n)$ bits of precision. The computed similarity $V$ additionally satisfies $\|V\|\|V^{-1}\| \le O(n^{2.5}/\delta)$. Here $T_{\mathrm{MM}}(n)$ is the number of arithmetic operations required to multiply two $n \times n$ complex matrices numerically stably, known to satisfy $T_{\mathrm{MM}}(n) = O(n^{\omega+\eta})$ for every $\eta > 0$, where $\omega$ is the exponent of matrix multiplication [1]. The algorithm is a variant of the spectral bisection algorithm in numerical linear algebra [2] with a crucial Gaussian perturbation preprocessing step. Our running time is optimal up to polylogarithmic factors, in the sense that verifying that a given similarity diagonalizes a matrix requires at least matrix multiplication time. It significantly improves the previously best known provable running times of $O(n^{10}/\delta^2)$ arithmetic operations for diagonalization of general matrices [3] and (with regards to the dependence on $n$) $O(n^3)$ arithmetic operations for Hermitian matrices [4], and is the first algorithm to achieve nearly matrix multiplication time for diagonalization in any model of computation (real arithmetic, rational arithmetic, or finite arithmetic).

The proof rests on two new ingredients. (1) We show that adding a small complex Gaussian perturbation to any matrix splits its pseudospectrum into $n$ small well-separated components. In particular, this implies that the eigenvalues of the perturbed matrix have a large minimum gap, a property of independent interest in random matrix theory. (2) We give a rigorous analysis of Roberts' [5] Newton iteration method for computing the sign function of a matrix in finite arithmetic, itself an open problem in numerical analysis since at least 1986 [6]. This is achieved by controlling the evolution of the pseudospectra of the iterates using a carefully chosen sequence of shrinking contour integrals in the complex plane.

Keywords—Numerical Analysis, Random Matrix Theory.

I. INTRODUCTION

We study the algorithmic problem of approximately finding all of the eigenvalues and eigenvectors of a given arbitrary $n \times n$ complex matrix. While this problem is quite well-understood in the special case of Hermitian matrices (see, e.g., [4]), the general non-Hermitian case has remained mysterious from a theoretical standpoint even after several decades of research. In particular, the currently best known provable algorithms for this problem run in time $O(n^{10}/\delta^2)$ [3] or $O(n^c \log(1/\delta))$ [7] with $c \ge 12$, where $\delta > 0$ is an error parameter, depending on the model of computation and notion of approximation considered.¹ To be sure, the non-Hermitian case is well-motivated: coupled systems of differential equations, linear dynamical systems in control theory, transfer operators in mathematical physics, and the nonbacktracking matrix in spectral graph theory are but a few situations where finding the eigenvalues and eigenvectors of a non-Hermitian matrix is important.

¹A detailed discussion of these and other related results appears in Section I-C.

The key difficulties in dealing with non-normal matrices are the interrelated phenomena of non-orthogonal eigenvectors and spectral instability, the latter referring to extreme sensitivity of the eigenvalues and invariant subspaces to perturbations of the matrix. Non-orthogonality slows down convergence of standard algorithms such as the power method, and spectral instability can force the use of very high precision arithmetic, also leading to slower algorithms. Both phenomena together make it difficult to reduce the eigenproblem to a subproblem by "removing" an eigenvector or invariant subspace, since this can only be done approximately and one must control the spectral stability of the subproblem.

In this paper, we overcome these difficulties by identifying and leveraging a phenomenon we refer to as pseudospectral shattering: adding a small complex Gaussian perturbation to any matrix yields a matrix with well-conditioned eigenvectors and a large minimum gap between the eigenvalues, implying spectral stability. This result builds on the recent solution of Davies' conjecture [8], and is of independent interest in random matrix theory, where minimum eigenvalue gap bounds in the non-Hermitian case were previously only known for i.i.d. models [9], [10].
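To make the shattering phenomenon concrete, the following numpy sketch (a minimal illustration, not taken from the paper's algorithm or analysis) perturbs a nilpotent Jordan block, whose eigenvalues all coincide, by a small complex Ginibre matrix. The $1/\sqrt{2n}$ normalization of the Ginibre entries and the parameter choices $n$ and $\gamma$ are illustrative assumptions.

```python
import numpy as np

def complex_ginibre(n, rng):
    # i.i.d. complex Gaussian entries, scaled so ||G|| = O(1) with high probability.
    return (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)

def min_gap(A):
    # gap(A): minimum pairwise distance between computed eigenvalues.
    lam = np.linalg.eigvals(A)
    d = np.abs(lam[:, None] - lam[None, :])
    np.fill_diagonal(d, np.inf)
    return d.min()

rng = np.random.default_rng(0)
n, gamma = 32, 1e-3

# A worst case for spectral stability: a nilpotent Jordan block.
# Every eigenvalue is 0, so gap(J) = 0 in exact arithmetic; roundoff alone
# scatters the computed eigenvalues, a symptom of the instability itself.
J = np.diag(np.ones(n - 1), k=1).astype(complex)

A = J + gamma * complex_ginibre(n, rng)  # the Gaussian perturbation step

_, V = np.linalg.eig(A)
print("gap(A)    =", min_gap(A))         # strictly positive after perturbation
print("kappa_V  <=", np.linalg.cond(V))  # finite; upper-bounds the infimum defining kappa_V
```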
We complement the above by proving that a variant of the well-known spectral bisection algorithm in numerical linear algebra [2] is both fast and numerically stable (i.e., can be implemented using a polylogarithmic number of bits of precision) when run on a pseudospectrally shattered matrix. The key step in the bisection algorithm is computing the sign function of a matrix, a problem of independent interest in many areas including control theory and approximation theory [11]. Our main algorithmic contribution is a rigorous analysis of the well-known Newton iteration method [5] for computing the sign function in finite arithmetic, showing that it converges quickly and numerically stably on matrices for which the sign function is well-conditioned, in particular on pseudospectrally shattered ones.

The end result is an algorithm which reduces the general diagonalization problem to a polylogarithmic (in the desired accuracy and dimension $n$) number of invocations of standard numerical linear algebra routines (multiplication, inversion, and QR factorization), each of which is reducible to matrix multiplication [12], yielding a nearly matrix multiplication runtime for the whole algorithm. This improves on the previously best known running time of $O(n^3 + n^2\log(1/\delta))$ arithmetic operations even in the Hermitian case [4].
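Roberts' Newton iteration mentioned above is compact enough to state in a few lines. The numpy sketch below is an exact-arithmetic stand-in, not the finite-precision variant whose pseudospectral evolution the paper analyzes; the tolerance, iteration cap, and test matrix are illustrative choices. The iteration $X_{k+1} = (X_k + X_k^{-1})/2$ converges to $\mathrm{sgn}(A)$ whenever $A$ has no eigenvalues on the imaginary axis.

```python
import numpy as np

def sign_function(A, tol=1e-12, max_iter=100):
    """Roberts' Newton iteration X_{k+1} = (X_k + X_k^{-1}) / 2.

    Converges to sgn(A): the matrix with the same invariant subspaces as A
    but with each eigenvalue mapped to +1 or -1 according to the sign of
    its real part. Requires that A have no purely imaginary eigenvalues.
    """
    X = np.asarray(A, dtype=complex)
    for _ in range(max_iter):
        X_next = (X + np.linalg.inv(X)) / 2
        if np.linalg.norm(X_next - X, 2) <= tol * np.linalg.norm(X_next, 2):
            return X_next
        X = X_next
    return X

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))

S = sign_function(A)
# Spectral bisection uses sgn(A) through the spectral projector
# P = (I + sgn(A)) / 2 onto the eigenvalues with positive real part,
# splitting the eigenproblem into two smaller subproblems.
P = (np.eye(8) + S) / 2
print(np.linalg.norm(S @ S - np.eye(8), 2))  # sgn(A)^2 = I, up to roundoff
```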
We now proceed to give precise mathematical formulations of the eigenproblem and computational model, followed by statements of our results and a detailed discussion of related work.

A. Problem Statement

An eigenpair of a matrix $A \in \mathbb{C}^{n \times n}$ is a tuple $(\lambda, v) \in \mathbb{C} \times \mathbb{C}^n$ such that
$$Av = \lambda v,$$
where $v$ is normalized to be a unit vector. The eigenproblem is the problem of finding a maximal set of linearly independent eigenpairs $(\lambda_i, v_i)$ of a given matrix $A$; note that an eigenvalue may appear more than once if it has geometric multiplicity greater than one. In the case when $A$ is diagonalizable, the solution consists of exactly $n$ eigenpairs, and if $A$ has distinct eigenvalues then the solution is unique, up to the phases of the $v_i$.

1) Accuracy and Conditioning: Due to the Abel-Ruffini theorem, it is impossible to have a finite-time algorithm which solves the eigenproblem exactly using arithmetic operations and radicals. Thus, all we can hope for is approximate eigenvalues and eigenvectors, up to a desired accuracy $\delta > 0$. There are two standard notions of approximation. We assume $\|A\| \le 1$ for normalization, where throughout this work $\|\cdot\|$ denotes the spectral norm (the $\ell^2 \to \ell^2$ operator norm).

Forward Approximation. Compute pairs $(\lambda_i', v_i')$ such that
$$|\lambda_i - \lambda_i'| \le \delta \quad\text{and}\quad \|v_i - v_i'\| \le \delta$$
for the true eigenpairs $(\lambda_i, v_i)$, i.e., find a solution close to the exact solution. This makes sense in contexts where the exact solution is meaningful; e.g., the matrix is of theoretical/mathematical origin, and unstable (in the entries) quantities such as eigenvalue multiplicity can have a significant meaning.

Backward Approximation. Compute $(\lambda_i', v_i')$ which are the exact eigenpairs of a matrix $A'$ satisfying
$$\|A' - A\| \le \delta,$$
i.e., find the exact solution to a nearby problem. This is the appropriate and standard notion in scientific computing, where the matrix is of physical or empirical origin and is not assumed to be known exactly (and even if it were, roundoff error would destroy this exactness). Note that since diagonalizable matrices are dense in $\mathbb{C}^{n \times n}$, one can hope to always find a complete set of eigenpairs for some nearby $A' = VDV^{-1}$, yielding an approximate diagonalization of $A$:
$$\|A - VDV^{-1}\| \le \delta. \tag{1}$$

Note that the eigenproblem in either of the above formulations is not easily reducible to the problem of computing eigenvalues, since they can only be computed approximately and it is not clear how to obtain approximate eigenvectors from approximate eigenvalues. We now introduce a condition number for the eigenproblem, which measures the sensitivity of the eigenpairs of a matrix to perturbations and allows us to relate its forward and backward approximate solutions.

Condition Numbers. For diagonalizable $A$, the eigenvector condition number of $A$, denoted $\kappa_V(A)$, is defined as
$$\kappa_V(A) := \inf_V \|V\|\|V^{-1}\|, \tag{2}$$
where the infimum is over all invertible $V$ such that $A = VDV^{-1}$ for some diagonal $D$, and its minimum eigenvalue gap is defined as
$$\mathrm{gap}(A) := \min_{i \ne j} |\lambda_i(A) - \lambda_j(A)|,$$
where the $\lambda_i$ are the eigenvalues of $A$ (with multiplicity). We define the condition number of the eigenproblem to be²
$$\kappa_{\mathrm{eig}}(A) := \frac{\kappa_V(A)}{\mathrm{gap}(A)} \in [0, \infty]. \tag{3}$$

It follows from the following proposition (whose proof appears in [14]) that a $\delta$-backward approximate solution of the eigenproblem is a $6n\kappa_{\mathrm{eig}}(A)\delta$-forward approximate solution.³

²This quantity is inspired by but not identical to the "reciprocal of the distance to ill-posedness" for the eigenproblem considered by Demmel [13], to which it is polynomially related.

³In fact, it can be shown that $\kappa_{\mathrm{eig}}(A)$ is related by a $\mathrm{poly}(n)$ factor to the smallest constant for which (4) holds for all sufficiently small $\delta > 0$.
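As a concrete reading of the definitions above, the sketch below numerically estimates $\kappa_V$, $\mathrm{gap}$, and $\kappa_{\mathrm{eig}}$, and compares the eigenvalue error of a backward approximate solution against the $6n\kappa_{\mathrm{eig}}(A)\delta$ bound. Two caveats: the eigenvector matrix returned by the solver only upper-bounds the infimum in (2), and pairing perturbed with unperturbed eigenvalues by lexicographic sorting is a crude simplification.

```python
import numpy as np

def kappa_eig(A):
    """Crude estimate of kappa_eig(A) = kappa_V(A) / gap(A) from (2)-(3).

    np.linalg.eig returns one particular eigenvector matrix, so cond(V)
    only upper-bounds the infimum in the definition of kappa_V.
    """
    lam, V = np.linalg.eig(A)
    d = np.abs(lam[:, None] - lam[None, :])
    np.fill_diagonal(d, np.inf)
    return np.linalg.cond(V) / d.min()

rng = np.random.default_rng(2)
n, delta = 16, 1e-8
A = rng.standard_normal((n, n)) / np.sqrt(n)

# A delta-backward approximate solution: the exact eigenpairs of a nearby
# matrix A' with ||A' - A|| <= delta, as in (1).
E = rng.standard_normal((n, n))
A_prime = A + delta * E / np.linalg.norm(E, 2)

lam = np.sort_complex(np.linalg.eigvals(A))
lam_prime = np.sort_complex(np.linalg.eigvals(A_prime))

forward_error = np.max(np.abs(lam - lam_prime))  # crude pairing via sorting
print("forward eigenvalue error:", forward_error)
print("6 n kappa_eig delta bound:", 6 * n * kappa_eig(A) * delta)
```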