
arXiv:1912.08805v3 [math.NA] 13 Sep 2020

Pseudospectral Shattering, the Sign Function, and Diagonalization in Nearly Matrix Multiplication Time

Jess Banks∗    Jorge Garza-Vargas    Archit Kulkarni∗    Nikhil Srivastava†
UC Berkeley
[email protected]  [email protected]  [email protected]  [email protected]

September 15, 2020

Abstract

We exhibit a randomized algorithm which, given a square matrix A ∈ ℂ^{n×n} with ‖A‖ ≤ 1 and δ > 0, computes with high probability an invertible V and diagonal D such that ‖A − VDV⁻¹‖ ≤ δ, in O(T_MM(n) log²(n/δ)) arithmetic operations on a floating point machine with O(log⁴(n/δ) log n) bits of precision. The computed similarity V additionally satisfies ‖V‖‖V⁻¹‖ ≤ O(n^{2.5}/δ). Here T_MM(n) is the number of arithmetic operations required to multiply two n × n complex matrices numerically stably, known to satisfy T_MM(n) = O(n^{ω+η}) for every η > 0, where ω is the exponent of matrix multiplication [DDHK07]. The algorithm is a variant of the spectral bisection algorithm in numerical linear algebra [BJD74] with a crucial Gaussian perturbation preprocessing step. Our running time is optimal up to polylogarithmic factors, in the sense that verifying that a given similarity diagonalizes a matrix requires at least matrix multiplication time. It significantly improves the previously best known provable running times of O(n^{10}/δ²) arithmetic operations for diagonalization of general matrices [ABB+18] and (with regards to the dependence on n) O(n³) arithmetic operations for Hermitian matrices [Par98], and is the first algorithm to achieve nearly matrix multiplication time for diagonalization in any model of computation (real arithmetic, rational arithmetic, or finite arithmetic).

The proof rests on two new ingredients. (1) We show that adding a small complex Gaussian perturbation to any matrix splits its pseudospectrum into n small well-separated components. In particular, this implies that the eigenvalues of the perturbed matrix have a large minimum gap, a property of independent interest in random matrix theory. (2) We give a rigorous analysis of Roberts' Newton iteration method [Rob80] for computing the sign function of a matrix in finite arithmetic, itself an open problem in numerical analysis since at least 1986 [Bye86]. This is achieved by controlling the evolution of the pseudospectra of the iterates using a carefully chosen sequence of shrinking contour integrals in the complex plane.

∗Supported by the NSF Graduate Research Fellowship Program under Grant DGE-1752814.
†Supported by NSF Grant CCF-1553751.

Contents

1 Introduction
  1.1 Problem Statement
    1.1.1 Accuracy and Conditioning
    1.1.2 Models of Computation
  1.2 Results and Techniques
  1.3 Related Work

2 Preliminaries
  2.1 Spectral Projectors and Holomorphic Functional Calculus
  2.2 Pseudospectrum and Spectral Stability
  2.3 Finite-Precision Arithmetic
  2.4 Sampling Gaussians in Finite Precision
  2.5 Black-box Error Assumptions for Multiplication, Inversion, and QR

3 Pseudospectral Shattering
  3.1 Smoothed Analysis of Gap and Eigenvector Condition Number
  3.2 Shattering

4 Matrix Sign Function
  4.1 Circles of Apollonius
  4.2 Exact Arithmetic
  4.3 Finite Arithmetic

5 Spectral Bisection Algorithm
  5.1 Proof of Theorem 5.5

6 Conclusion and Open Questions

A Deferred Proofs from Section 4

B Analysis of SPLIT

C Analysis of DEFLATE
  C.1 Smallest Singular Value of the Corner of a Haar Unitary
  C.2 Sampling Haar Unitaries in Finite Precision
  C.3 Preliminaries of RURV
  C.4 Exact Arithmetic Analysis of DEFLATE
  C.5 Finite Arithmetic Analysis of DEFLATE

1 Introduction

We study the algorithmic problem of approximately finding all of the eigenvalues and eigenvectors of a given arbitrary complex matrix. While this problem is quite well-understood in the special case of Hermitian n × n matrices (see, e.g., [Par98]), the general non-Hermitian case has remained mysterious from a theoretical standpoint even after several decades of research. In

particular, the currently best known provable algorithms for this problem run in time O(n^{10}/δ²) [ABB+18] or O(n^c log(1/δ)) [Cai94] with c ≥ 12, where δ > 0 is an error parameter, depending on the model of computation and notion of approximation considered.¹ To be sure, the non-Hermitian case is well-motivated: coupled systems of differential equations, linear dynamical systems in control theory, transfer operators in mathematical physics, and the nonbacktracking matrix in spectral graph theory are but a few situations where finding the eigenvalues and eigenvectors of a non-normal matrix is important.

The key difficulties in dealing with non-normal matrices are the interrelated phenomena of non-orthogonal eigenvectors and spectral instability, the latter referring to extreme sensitivity of the eigenvalues and invariant subspaces to perturbations of the matrix. Non-orthogonality slows down convergence of standard algorithms such as the power method, and spectral instability can force the use of very high precision arithmetic, also leading to slower algorithms. Both phenomena together make it difficult to reduce the eigenproblem to a subproblem by "removing" an eigenvector or invariant subspace, since this can only be done approximately and one must control the spectral stability of the subproblem.

In this paper, we overcome these difficulties by identifying and leveraging a phenomenon we refer to as pseudospectral shattering: adding a small complex Gaussian perturbation to any matrix yields a matrix with well-conditioned eigenvectors and a large minimum gap between the eigenvalues, implying spectral stability. This result builds on the recent solution of Davies' conjecture [BKMS19], and is of independent interest in random matrix theory, where minimum eigenvalue gap bounds in the non-Hermitian case were previously only known for i.i.d. models [SJ12, Ge17].
We complement the above by proving that a variant of the well-known spectral bisection algorithm in numerical linear algebra [BJD74] is both fast and numerically stable (i.e., it can be implemented using a polylogarithmic number of bits of precision) when run on a pseudospectrally shattered matrix. The key step in the bisection algorithm is computing the sign function of a matrix, a problem of independent interest in many areas including control theory and approximation theory [KL95]. Our main algorithmic contribution is a rigorous analysis of the well-known Newton iteration method [Rob80] for computing the sign function in finite arithmetic, showing that it converges quickly and numerically stably on matrices for which the sign function is well-conditioned, in particular on pseudospectrally shattered ones.

The end result is an algorithm which reduces the general diagonalization problem to a polylogarithmic (in the desired accuracy and the dimension n) number of invocations of standard numerical linear algebra routines (multiplication, inversion, and QR factorization), each of which is reducible to matrix multiplication [DDH07], yielding a nearly matrix multiplication runtime for the

1A detailed discussion of these and other related results appears in Section 1.3.

whole algorithm. This improves on the previously best known running time of O(n³ + n² log(1/δ)) arithmetic operations even in the Hermitian case [Par98].

We now proceed to give precise mathematical formulations of the eigenproblem and computational model, followed by statements of our results and a detailed discussion of related work.

1.1 Problem Statement

An eigenpair of a matrix A ∈ ℂ^{n×n} is a tuple (λ, v) ∈ ℂ × ℂⁿ such that

Av = λv,

and v is normalized to be a unit vector. The eigenproblem is the problem of finding a maximal set of linearly independent eigenpairs (λᵢ, vᵢ) of a given matrix A; note that an eigenvalue may appear more than once if it has geometric multiplicity greater than one. In the case when A is diagonalizable, the solution consists of exactly n eigenpairs, and if A has distinct eigenvalues then the solution is unique, up to the phases of the vᵢ.

1.1.1 Accuracy and Conditioning

Due to the Abel-Ruffini theorem, it is impossible to have a finite-time algorithm which solves the eigenproblem exactly using arithmetic operations and radicals. Thus, all we can hope for is approximate eigenvalues and eigenvectors, up to a desired accuracy δ > 0. There are two standard notions of approximation. We assume ‖A‖ ≤ 1 for normalization throughout this work, where ‖⋅‖ denotes the spectral norm (the ℓ² → ℓ² operator norm).

Forward Approximation. Compute pairs (λᵢ′, vᵢ′) such that

|λᵢ − λᵢ′| ≤ δ  and  ‖vᵢ − vᵢ′‖ ≤ δ

for the true eigenpairs (λᵢ, vᵢ), i.e., find a solution close to the exact solution. This makes sense in contexts where the exact solution is meaningful; e.g., the matrix is of theoretical/mathematical origin, and unstable (in the entries) quantities such as eigenvalue multiplicity can have a significant meaning.

Backward Approximation. Compute (λᵢ′, vᵢ′) which are the exact eigenpairs of a matrix A′ satisfying

‖A′ − A‖ ≤ δ,

i.e., find the exact solution to a nearby problem. This is the appropriate and standard notion in scientific computing, where the matrix is of physical or empirical origin and is not assumed to be known exactly (and even if it were, roundoff error would destroy this exactness). Note that since diagonalizable matrices are dense in ℂ^{n×n}, one can hope to always find a complete set of eigenpairs for some nearby A′ = VDV⁻¹, yielding an approximate diagonalization of A:

‖A − VDV⁻¹‖ ≤ δ.  (1)

Note that the eigenproblem in either of the above formulations is not easily reducible to the problem of computing eigenvalues, since they can only be computed approximately and it is not clear how to obtain approximate eigenvectors from approximate eigenvalues. We now introduce a condition number for the eigenproblem, which measures the sensitivity of the eigenpairs of a matrix to perturbations and allows us to relate its forward and backward approximate solutions.
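In this language, the output of a standard dense eigensolver can be checked as an approximate diagonalization in the sense of (1). The snippet below is an illustrative check only (it is not the paper's algorithm EIG): it measures the backward error ‖A − VDV⁻¹‖ for the factors returned by NumPy.

```python
import numpy as np

# Illustrative check of an approximate diagonalization in the sense of (1):
# given (V, D) from a dense eigensolver, measure ||A - V D V^{-1}|| in the
# spectral norm. (Not the paper's algorithm; just the error notion.)
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) / np.sqrt(n)  # a random test matrix, ||A|| = O(1)

evals, V = np.linalg.eig(A)
D = np.diag(evals)
backward_error = np.linalg.norm(A - V @ D @ np.linalg.inv(V), ord=2)
```

For a generic matrix like this one, `backward_error` is within a modest multiple of machine precision times the conditioning of V, illustrating that the solver produces an exact diagonalization of a nearby A′.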

Condition Numbers. For diagonalizable A, the eigenvector condition number of A, denoted κ_V(A), is defined as:

κ_V(A) := inf_V ‖V‖ ‖V⁻¹‖,  (2)

where the infimum is over all invertible V such that A = VDV⁻¹ for some diagonal D,

and its minimum eigenvalue gap is defined as:

gap(A) := min_{i≠j} |λᵢ(A) − λⱼ(A)|,

where λᵢ(A) are the eigenvalues of A (with multiplicity). We define the condition number of the eigenproblem to be²:

κ_eig(A) := κ_V(A) / gap(A) ∈ [0, ∞].  (3)

It follows from the following proposition (whose proof appears in Section 2.2) that a δ-backward approximate solution of the eigenproblem is a 6nδκ_eig(A)-forward approximate solution.³

Proposition 1.1. If ‖A‖, ‖A′‖ ≤ 1, ‖A − A′‖ ≤ δ, and {(vᵢ, λᵢ)}_{i≤n}, {(vᵢ′, λᵢ′)}_{i≤n} are eigenpairs of A, A′ with distinct eigenvalues, and δ < gap(A)/(8κ_V(A)), then

‖vᵢ − vᵢ′‖ ≤ 6nκ_eig(A)δ  and  |λᵢ − λᵢ′| ≤ κ_V(A)δ ≤ 2κ_eig(A)δ  ∀i = 1, …, n,  (4)

after possibly multiplying the vᵢ by phases. Note that κ_eig = ∞ if and only if A has a double eigenvalue; in this case, a relation like (4) is not possible since different infinitesimal changes to A can produce macroscopically different eigenpairs.

In this paper we will present a backward approximation algorithm for the eigenproblem with running time scaling polynomially in log(1/δ), which by (4) yields a forward approximation algorithm with running time scaling polynomially in log(nκ_eig/δ).
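These quantities are easy to estimate numerically. The helper below (a hypothetical name, not from the paper) bounds κ_V(A) using the eigenvector matrix returned by a dense solver (the infimum in (2) is over all diagonalizing V, so this yields only an upper bound), and evaluates gap(A) and the corresponding bound on κ_eig(A) from (3).

```python
import numpy as np

def eig_condition_numbers(A):
    """Estimate the quantities in (2) and (3).

    kappa_V is UPPER-bounded via the eigenvector matrix returned by
    np.linalg.eig (the definition takes an infimum over all diagonalizing
    V, which this does not optimize); gap(A) is computed exactly from the
    computed eigenvalues."""
    evals, V = np.linalg.eig(A)
    kappa_V = np.linalg.cond(V, 2)  # ||V|| ||V^{-1}|| in the spectral norm
    diffs = np.abs(evals[:, None] - evals[None, :])
    gap = diffs[~np.eye(len(evals), dtype=bool)].min()
    return kappa_V, gap, kappa_V / gap

# A normal matrix attains kappa_V = 1; here gap = 1 and kappa_eig = 1.
kV, g, keig = eig_condition_numbers(np.diag([1.0, 2.0, 4.0]))
```

For non-normal matrices the returned κ_V bound can be far above 1, which is exactly the regime where the smoothed analysis of Section 3 becomes relevant.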

Remark 1.2 (Multiple Eigenvalues). A backward approximation algorithm for the eigenproblem can be used to accurately find bases for the eigenspaces of matrices with multiple eigenvalues, but quantifying the forward error requires introducing condition numbers for invariant subspaces rather than eigenpairs. A standard treatment of this can be found in any numerical linear algebra textbook, e.g. [Dem97], and we do not discuss it further in this paper for simplicity of exposition.

²This quantity is inspired by but not identical to the "reciprocal of the distance to ill-posedness" for the eigenproblem considered by Demmel [Dem87], to which it is polynomially related.
³In fact, it can be shown that κ_eig(A) is related by a poly(n) factor to the smallest constant for which (4) holds for all sufficiently small δ > 0.

1.1.2 Models of Computation

These questions may be studied in various computational models: exact real arithmetic (i.e., infinite precision), variable precision rational arithmetic (rationals are stored exactly as numerators and denominators), and finite precision arithmetic (real numbers are rounded to a fixed number of bits which may depend on the input size and accuracy). Only the last two models yield actual Boolean complexity bounds, but they introduce a second source of error stemming from the fact that computers cannot exactly represent real numbers. We study the third model in this paper, axiomatized as follows.

Finite Precision Arithmetic. We use the standard axioms from [Hig02]. Numbers are stored and manipulated approximately up to some machine precision u := u(δ, n) > 0, which for us will depend on the instance size n and desired accuracy δ. This means every number x ∈ ℂ is stored as fl(x) = (1 + Δ)x for some adversarially chosen Δ ∈ ℂ satisfying |Δ| ≤ u, and each arithmetic operation ◦ ∈ {+, −, ×, ÷} is guaranteed to yield an output satisfying

fl(x ◦ y) = (x ◦ y)(1 + Δ),  |Δ| ≤ u.

It is also standard and convenient to assume that we can evaluate √x for any x ∈ ℝ, where again fl(√x) = √x (1 + Δ) for |Δ| ≤ u. Thus, the outcomes of all operations are adversarially noisy due to roundoff. The bit lengths of numbers stored in this form remain fixed at lg(1/u), where lg denotes the logarithm base 2. The bit complexity of an algorithm is therefore the number of arithmetic operations times O*(log(1/u)), the running time of standard floating point arithmetic, where the * suppresses log log(1/u) factors. We will state all running times in terms of arithmetic operations accompanied by the required number of bits of precision, which thereby immediately imply bit complexity bounds.
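For example, IEEE double precision corresponds to u = 2⁻⁵³, and both the rounding behavior and the multiplicative-error axiom can be observed directly (a small sanity check, not part of the paper's development):

```python
import fractions

u = 2.0 ** -53  # unit roundoff of IEEE double precision

# fl(1 + u) rounds back to 1 under round-to-nearest-even, while fl(1 + 2u) does not:
assert 1.0 + u == 1.0
assert 1.0 + 2 * u > 1.0

# A single division returns fl(x / y) = (x / y)(1 + Delta) with |Delta| <= u.
# Exact rational arithmetic lets us measure Delta for x = 1, y = 3:
x, y = 1.0, 3.0
exact = fractions.Fraction(x) / fractions.Fraction(y)
delta = (fractions.Fraction(x / y) - exact) / exact
assert abs(delta) <= fractions.Fraction(1, 2 ** 53)
```

The last assertion is exactly the axiom above for ◦ = ÷, verified in exact arithmetic.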

Remark 1.3 (Overflow, Underflow, and Additive Error). Using p bits for the exponent in the floating-point representation allows one to represent numbers with magnitude in the range [2^{−2^p}, 2^{2^p}]. It can be easily checked that all of the nonzero numbers, norms, and condition numbers appearing during the execution of our algorithms lie in the range [2^{−lg^c(n/δ)}, 2^{lg^c(n/δ)}] for some small constant c, so overflow and underflow do not occur. In fact, we could have analyzed our algorithm in a computational model where every number is simply rounded to the nearest rational with denominator 2^{lg^c(n/δ)}, corresponding to additive arithmetic errors. We have chosen to use the multiplicative error floating point model since it is the standard in numerical analysis, but our algorithms do not exploit any subtleties arising from the difference between the two models.

The advantages of the floating point model are that it is realistic and potentially yields very fast algorithms by using a small number of bits of precision (polylogarithmic in n and 1/δ), in contrast to rational arithmetic, where even a simple operation such as inverting an n × n integer matrix requires n extra bits of precision (see, e.g., Chapter 1 of [GLS12]). An iterative algorithm that can be implemented in finite precision (typically, with a number of bits polylogarithmic in the input size and desired accuracy) is called numerically stable, and corresponds to a dynamical system whose trajectory to the approximate solution is robust to adversarial noise (see, e.g., [Sma97]).

6 The disadvantage of the model is that it is only possible to compute forward approximations of quantities which are well-conditioned in the input — in particular, discontinuous quantities such as eigenvalue multiplicity cannot be computed in the floating point model, since it is not even assumed that the input is stored exactly.

1.2 Results and Techniques

In addition to κ_eig, we will need some more refined quantities measuring the stability of the eigenvalues and eigenvectors of a matrix to perturbations, in order to state our results. The most important of these is the ε-pseudospectrum, defined for any ε > 0 and M ∈ ℂ^{n×n} as:

Λ_ε(M) := {λ ∈ ℂ : λ ∈ Λ(M + E) for some ‖E‖ < ε}  (5)
        = {λ ∈ ℂ : ‖(λ − M)⁻¹‖ > 1/ε},  (6)

where Λ(⋅) denotes the spectrum of a matrix. The equivalence of (5) and (6) is simple and can be found in the excellent book [TE05].
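Characterization (6) gives a practical membership test, since ‖(λ − M)⁻¹‖ = 1/σ_min(λI − M). The sketch below (illustrative, not from the paper) tests whether a point lies in Λ_ε(M):

```python
import numpy as np

def in_pseudospectrum(lam, M, eps):
    """Test lam in Lambda_eps(M) via (6): ||(lam - M)^{-1}|| > 1/eps,
    equivalently sigma_min(lam*I - M) < eps."""
    n = M.shape[0]
    smin = np.linalg.svd(lam * np.eye(n) - M, compute_uv=False)[-1]
    return smin < eps

# For a normal matrix, Lambda_eps is exactly the union of eps-disks
# around the eigenvalues:
M = np.diag([0.0, 1.0])
print(in_pseudospectrum(0.05, M, 0.1))  # True: distance 0.05 to eigenvalue 0
print(in_pseudospectrum(0.5, M, 0.1))   # False: distance 0.5 to both eigenvalues
```

For non-normal M the set Λ_ε(M) can be far larger than a union of ε-disks, which is precisely the instability that pseudospectral shattering controls.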

Eigenvalue Gaps, κ_V, and Pseudospectral Shattering. The key probabilistic result of the paper is that a random complex Gaussian perturbation of any matrix yields a nearby matrix with large minimum eigenvalue gap and small κ_V.

Theorem 1.4 (Smoothed Analysis of gap and κ_V). Suppose A ∈ ℂ^{n×n} with ‖A‖ ≤ 1, and γ ∈ (0, 1/2). Let G_n be an n × n matrix with i.i.d. complex Gaussian N(0, 1_ℂ/n) entries, and let X := A + γG_n. Then

κ_V(X) ≤ n²/γ,  gap(X) ≥ γ⁴/n⁵,  and  ‖G_n‖ ≤ 4,

with probability at least 1 − 12/n².

The proof of Theorem 1.4 appears in Section 3.1. The key idea is to first control κ_V(X) using [BKMS19], and then observe that for a matrix X with small κ_V, two eigenvalues of X near a complex number z imply a small second-least singular value of z − X, which we are able to control.

In Section 3.2 we develop the notion of pseudospectral shattering, which is implied by Theorem 1.4 and says roughly that the pseudospectrum consists of n components that lie in separate squares of an appropriately coarse grid in the complex plane. This is useful in the analysis of the spectral bisection algorithm in Section 5.
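The phenomenon in Theorem 1.4 is easy to observe numerically. The sketch below (illustrative only; the parameters are not those of the theorem, and `np.linalg.eig` yields only an upper bound on κ_V) perturbs a nilpotent Jordan block, which has gap(A) = 0 and κ_V(A) = ∞:

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 16, 1e-3

A = np.diag(np.ones(n - 1), 1)  # nilpotent Jordan block: gap(A) = 0, kappa_V = inf
# i.i.d. complex Gaussian entries with E|G_ij|^2 = 1/n:
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
X = A + gamma * G

evals, V = np.linalg.eig(X)
diffs = np.abs(evals[:, None] - evals[None, :])
gap_X = diffs[~np.eye(n, dtype=bool)].min()       # minimum eigenvalue gap of X
kappa_V_upper = np.linalg.cond(V, 2)              # upper bound on kappa_V(X)
```

After the perturbation the eigenvalues spread out (for a Jordan block, roughly onto a circle of radius γ^{1/n}), so `gap_X` is genuinely positive and `kappa_V_upper` is finite, in the spirit of the theorem.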

Matrix Sign Function. The sign function of a number z ∈ ℂ with Re(z) ≠ 0 is defined as +1 if Re(z) > 0 and −1 if Re(z) < 0. The matrix sign function of a matrix A with Jordan normal form

A = V [ N 0 ; 0 P ] V⁻¹,

where N (resp. P) has eigenvalues with strictly negative (resp. positive) real part, is defined as

sgn(A) = V [ −I_N 0 ; 0 I_P ] V⁻¹,

where I_N and I_P denote identity matrices of the same size as N and P, respectively. The sign function is undefined for matrices with eigenvalues on the imaginary axis. Quantifying this discontinuity, Bai and Demmel [BD98] defined the following condition number for the sign function:

κ_sign(M) := inf{ 1/ε² : Λ_ε(M) does not intersect the imaginary axis },  (7)

and gave perturbation bounds for sgn(M) depending on κ_sign(M).

Roberts [Rob80] showed that the simple iteration

A_{k+1} = (A_k + A_k⁻¹)/2  (8)

converges globally and quadratically to sgn(A) in exact arithmetic, but his proof relies on the fact that all iterates of the algorithm are simultaneously diagonalizable, a property which is destroyed in finite arithmetic since inversions can only be done approximately.⁴ In Section 4 we show that this iteration is indeed convergent when implemented in finite arithmetic for matrices with small κ_sign, given a numerically stable matrix inversion algorithm. This leads to the following result:

Theorem 1.5 (Sign Function Algorithm). There is a deterministic algorithm SGN which on input an n × n matrix A with ‖A‖ ≤ 1, a number K with K ≥ κ_sign(A), and a desired accuracy β ∈ (0, 1/12), outputs an approximation SGN(A) with

‖SGN(A) − sgn(A)‖ ≤ β,

in

O((log K + log log(1/β)) T_INV(n))  (9)

arithmetic operations on a floating point machine with

lg(1/u) = O(log n log³K (log K + log(1/β)))

bits of precision, where T_INV(n) denotes the number of arithmetic operations used by a numerically stable matrix inversion algorithm (satisfying Definition 2.7).

The main new idea in the proof of Theorem 1.5 is to control the evolution of the pseudospectra Λ_{ε_k}(A_k) of the iterates with appropriately decreasing (in k) parameters ε_k, using a sequence of carefully chosen shrinking contour integrals in the complex plane. The pseudospectrum provides a richer induction hypothesis than scalar quantities such as condition numbers, and allows one to control all quantities of interest using the holomorphic functional calculus. This technique is introduced in Sections 4.1 and 4.2, and carried out in finite arithmetic in Section 4.3, yielding Theorem 1.5.
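For concreteness, iteration (8) in plain double precision looks as follows. This toy version (hypothetical helper name) is not the paper's algorithm SGN, which additionally controls roundoff and uses the convergence analysis of Theorem 1.5 to choose the number of steps:

```python
import numpy as np

def newton_sign(A, iters=30):
    """Roberts' Newton iteration (8): A_{k+1} = (A_k + A_k^{-1}) / 2.

    A minimal sketch run in ordinary double precision; unlike the paper's
    SGN it has no roundoff control and a fixed iteration count."""
    X = A.astype(complex)
    for _ in range(iters):
        X = (X + np.linalg.inv(X)) / 2
    return X

# A has eigenvalues 2 and -1, which sgn maps to +1 and -1; here
# sgn(A) = [[1, 2/3], [0, -1]], and sgn(A) is an involution.
A = np.array([[2.0, 1.0],
              [0.0, -1.0]])
S = newton_sign(A)
```

One can check that S² = I and that S has the same invariant subspaces as A, which is exactly what the bisection algorithm of Section 5 exploits.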

Diagonalization by Spectral Bisection. Given an algorithm for computing the sign function, there is a natural and well-known approach to the eigenproblem pioneered in [BJD74]. The idea is

⁴Doing the inversions exactly in rational arithmetic could require numbers of bit length n^k for k iterations, which will typically not even be polynomial.

that the matrices (I ± sgn(A))/2 are spectral projectors onto the invariant subspaces corresponding to the eigenvalues of A in the left and right open half planes, so if some shifted matrix A + z or iA + z has roughly half its eigenvalues in each half plane, the problem can be reduced to two smaller subproblems appropriate for recursion.

The two difficulties in carrying out the above approach are: (a) efficiently computing the sign function, and (b) finding a balanced splitting along an axis that is well-separated from the spectrum. These are nontrivial even in exact arithmetic, since the iteration (8) converges slowly if (b) is not satisfied, even without roundoff error. We use Theorem 1.4 to ensure that a good splitting always exists after a small Gaussian perturbation of order γ, and Theorem 1.5 to compute splittings efficiently in finite precision. Combining this with well-understood techniques such as rank-revealing QR factorization, we obtain the following theorem, whose proof appears in Section 5.1.

Theorem 1.6 (Backward Approximation Algorithm). There is a randomized algorithm EIG which on input any matrix A ∈ ℂ^{n×n} with ‖A‖ ≤ 1 and a desired accuracy parameter δ > 0 outputs a diagonal D and an invertible V such that

‖A − VDV⁻¹‖ ≤ δ  and  κ(V) ≤ 32 n^{2.5}/δ

in

O(T_MM(n) log²(n/δ))

arithmetic operations on a floating point machine with

O(log⁴(n/δ) log n)

bits of precision, with probability at least 1 − 1/n − 12/n². Here T_MM(n) refers to the running time of a numerically stable matrix multiplication algorithm (detailed in Section 2.5).

Since there is a correspondence in terms of the condition number between backward and forward approximations, and as is customary in numerical analysis, our discussion revolves around backward approximation guarantees. For the convenience of the reader, we write down below

the explicit guarantees that one gets by using (4) and invoking EIG with accuracy δ/(6nκ_eig).

Corollary 1.7 (Forward Approximation Algorithm). There is a randomized algorithm which on input any matrix A ∈ ℂ^{n×n} with ‖A‖ ≤ 1, a desired accuracy parameter δ > 0, and an estimate K ≥ κ_eig(A), outputs a forward approximate solution to the eigenproblem for A in

O(T_MM(n) log²(nK/δ))

arithmetic operations on a floating point machine with

O(log⁴(nK/δ) log n)

bits of precision, with probability at least 1 − 1/n − 12/n². Here T_MM(n) refers to the running time of a numerically stable matrix multiplication algorithm (detailed in Section 2.5).
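The bisection step underlying EIG can be sketched as follows (hypothetical helper names; the paper's algorithm uses SGN together with a randomized rank-revealing QR, DEFLATE, rather than the plain SVD of the projector below, and recurses on both halves):

```python
import numpy as np

def matrix_sign(M, iters=40):
    # Roberts' Newton iteration (8), in exact-arithmetic spirit.
    X = M.astype(complex)
    for _ in range(iters):
        X = (X + np.linalg.inv(X)) / 2
    return X

def split_spectrum(A, z):
    """One spectral bisection step at the vertical line Re(w) = z.

    (I - sgn(A - z)) / 2 is the spectral projector for eigenvalues with
    Re < z; compressing A onto an orthonormal basis of its range and onto
    the orthogonal complement yields two smaller subproblems. Illustrative
    only: the paper uses SGN plus rank-revealing QR instead of the SVD."""
    n = A.shape[0]
    S = matrix_sign(A - z * np.eye(n))
    P_minus = (np.eye(n) - S) / 2
    k = int(round(np.trace(P_minus).real))  # number of eigenvalues with Re < z
    U = np.linalg.svd(P_minus)[0]
    Qm, Qp = U[:, :k], U[:, k:]             # basis of range(P_minus) and complement
    return Qm.conj().T @ A @ Qm, Qp.conj().T @ A @ Qp

# Example: a triangular matrix with eigenvalues {-2, -1, 1, 2}, split at Re = 0.
A = np.array([[-2.0, 1.0, 0.0, 0.0],
              [ 0.0,-1.0, 1.0, 0.0],
              [ 0.0, 0.0, 1.0, 1.0],
              [ 0.0, 0.0, 0.0, 2.0]])
A_minus, A_plus = split_spectrum(A, 0.0)
```

Because range(P_minus) is an invariant subspace of A, the compression onto the orthogonal complement is the trailing block of a block-triangularization, so `A_minus` and `A_plus` carry exactly the eigenvalues on each side of the splitting line.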

Remark 1.8 (Accuracy vs. Precision). The gold standard of "backward stability" in numerical analysis postulates that

log(1/u) = log(1/δ) + log(n),

i.e., the number of bits of precision is linear in the number of bits of accuracy. The relaxed notion of "logarithmic stability" introduced in [DDHK07] requires

log(1/u) = log(1/δ) + O(log^c(n) log(κ))

for some constant c, where κ is an appropriate condition number. In comparison, Theorem 1.6 obtains the weaker relationship

log(1/u) = O(log⁴(1/δ) log(n) + log⁵(n)),

which is still polylogarithmic in n in the regime δ = 1/poly(n).

1.3 Related Work

Minimum Eigenvalue Gap. The minimum eigenvalue gap of random matrices has been studied in the case of Hermitian and unitary matrices, beginning with the work of Vinson [Vin11], who proved an Ω(n^{−4/3}) lower bound on this gap in the case of the Gaussian Unitary Ensemble (GUE) and the Circular Unitary Ensemble (CUE). Bourgade and Ben Arous [AB13] derived exact limiting formulas for the distributions of all the gaps for the same ensembles. Nguyen, Tao, and Vu [NTV17] obtained non-asymptotic inverse polynomial bounds for a large class of non-integrable Hermitian models with i.i.d. entries (including Bernoulli matrices).

In a different direction, Aizenman et al. proved an inverse-polynomial bound [APS+17] in the case of an arbitrary Hermitian matrix plus a GUE matrix or a Gaussian Orthogonal Ensemble (GOE) matrix, which may be viewed as a smoothed analysis of the minimum gap. Theorem 3.6 may be viewed as a non-Hermitian analogue of the last result.

In the non-Hermitian case, Ge [Ge17] obtained an inverse polynomial bound for i.i.d. matrices with real entries satisfying some mild moment conditions, and [SJ12]⁵ proved an inverse polynomial lower bound for the complex Ginibre ensemble. Theorem 3.6 may be seen as a generalization of these results to non-centered complex Gaussian matrices.

Smoothed Analysis and Free Probability. The study of numerical algorithms on Gaussian random matrices (i.e., the case A = 0 of smoothed analysis) dates back to [VNG47, Sma85, Dem88, Ede88]. The powerful idea of improving the conditioning of a numerical computation by adding a small amount of Gaussian noise was introduced by Spielman and Teng in [ST04], in the context of the simplex algorithm. Sankar, Spielman, and Teng [SST06] showed that adding real Gaussian noise to any matrix yields a matrix with polynomially-bounded condition number; [BKMS19] can be seen as an extension of this result to the condition number of the eigenvector matrix, where the proof crucially requires that the Gaussian perturbation is complex rather than real. The main difference between our results and most of the results on smoothed analysis (including

5At the time of writing, the work [SJ12] is still an unpublished arXiv preprint.

[ABB+18]) is that our running time depends logarithmically rather than polynomially on the size of the perturbation.

The broad idea of regularizing the spectral instability of a nonnormal matrix by adding a random matrix can be traced back to the work of Śniady [Śni02] and Haagerup and Larsen [HL00] in the context of Free Probability theory.

Matrix Sign Function. The matrix sign function was introduced by Zolotarev in 1877. It became a popular topic in numerical analysis following the work of Beavers and Denman [BD73, BJD74, DBJ76] and Roberts [Rob80], who used it first to solve the algebraic Riccati and Lyapunov equations and then as an approach to the eigenproblem; see [KL95] for a broad survey of its early history. The numerical stability of Roberts' Newton iteration was investigated by Byers [Bye86], who identified some cases where it is and is not stable. Malyshev [Mal93], Byers, He, and Mehrmann [BHM97], Bai, Demmel, and Gu [BDG97], and Bai and Demmel [BD98] studied the condition number of the matrix sign function, and showed that if the Newton iteration converges then it can be used to obtain a high-quality invariant subspace⁶, but did not prove convergence in finite arithmetic and left this as an open question.⁷ The key issue in analyzing the convergence of the iteration is to bound the condition numbers of the intermediate matrices that appear, as N. Higham remarks in his 2008 textbook:

    Of course, to obtain a complete picture, we also need to understand the effect of rounding errors on the iteration prior to convergence. This effect is surprisingly difficult to analyze. . . . Since errors will in general occur on each iteration, the overall error will be a complicated function of sign(X_k) and E_k for all k. . . . We are not aware of any published rounding error analysis for the computation of sign(A) via the Newton iteration. –[Hig08, Section 5.7]

This is precisely the problem solved by Theorem 1.5, which is as far as we know the first provable algorithm for computing the sign function of an arbitrary matrix which does not require computing the Jordan form.

In the special case of Hermitian matrices, Higham [Hig94] established efficient reductions between the sign function and the polar decomposition.
Byers and Xu [BX08] proved backward stability of a certain scaled version of the Newton iteration for Hermitian matrices, in the context of computing the polar decomposition. Higham and Nakatsukasa [NH13] (see also the improvement [NF16]) proved backward stability of a different iterative scheme for computing the polar decomposition, and used it to give backward stable spectral bisection algorithms for the Hermitian eigenproblem with O(n³)-type complexity.

Non-Hermitian Eigenproblem, Floating Point Arithmetic. The eigenproblem has been thoroughly studied in the numerical analysis community, in the floating point model of computation. While there are provably fast and accurate algorithms in the Hermitian case (see the next subsection) and a large body of work for various structured matrices (see, e.g., [BCD+05]), the general

⁶This is called an a fortiori bound in numerical analysis.
⁷[BHM97] states: "A priori backward and forward error bounds for evaluation of the matrix sign function remain elusive."

Result         Error     Arithmetic Ops                 Boolean Ops                              Restrictions
[Par98]        Backward  n³ + n² log(1/δ)               n³ log(n/δ) + n² log(1/δ) log(n/δ)ᵃ      Hermitian
[ABB+18]       Backward  n¹⁰/δ²                         n¹⁰/δ² · polylog(n/δ)
[BOE18]        Backward  n^{ω+1} polylog(n) log(1/δ)    n^{ω+1} polylog(n) log(1/δ)              Hermitian
Theorem 1.6    Backward  T_MM(n)ᵇ log²(n/δ)             T_MM(n) log⁶(n/δ) log(n)
Corollary 1.7  Forward   T_MM(n) log²(nκ_eig/δ)         T_MM(n) log⁶(nκ_eig/δ) log(n)

ᵃ Does not specify a particular bound on precision.
ᵇ T_MM(n) = O(n^{ω+η}) for every η > 0, see Definition 2.6 for details.

Table 1: Results for finite-precision floating-point arithmetic

case is not nearly as well-understood. As recently as 1997, J. Demmel remarked in his well-known textbook [Dem97]: ". . . the problem of devising an algorithm [for the non-Hermitian eigenproblem] that is numerically stable and globally (and quickly!) convergent remains open."

Demmel's question remained entirely open until 2015, when it was answered in the following sense by Armentano, Beltrán, Bürgisser, Cucker, and Shub in the remarkable paper [ABB+18]. They exhibited an algorithm (see their Theorem 2.28) which given any A ∈ ℂ^{n×n} with ‖A‖ ≤ 1 and σ > 0 produces, in expected O(n⁹/σ²) arithmetic operations, the diagonalization of the nearby random perturbation A + σG, where G is a matrix with standard complex Gaussian entries. By setting σ sufficiently small, this may be viewed as a backward approximation algorithm for diagonalization, in that it solves a nearby problem essentially exactly⁸; in particular, by setting σ = δ/√n and noting that ‖G‖ = O(√n) with very high probability, their result implies a running time of O(n¹⁰/δ²) in our setting. Their algorithm is based on homotopy continuation methods, which they argue informally are numerically stable and can be implemented in finite precision arithmetic. Our algorithm is similar on a high level in that it adds a Gaussian perturbation to the input and then obtains a high accuracy forward approximate solution to the perturbed problem. The difference is that their overall running time depends polynomially rather than logarithmically on the accuracy δ desired with respect to the original unperturbed problem.

Other Models of Computation. If we relax the requirements further and ask for any provable algorithm in any model of Boolean computation, there is only one more positive result with a polynomial bound on the number of bit operations: Jin-Yi Cai showed in 1994 [Cai94] that given a rational n × n matrix A with integer entries of bit length a, one can find a δ-forward approximation to its Jordan Normal Form A = VJV⁻¹ in time poly(n, a, log(1/δ)), where the degree of the polynomial is at least 12. This algorithm works in the rational arithmetic model of computation, so it does not quite answer Demmel's question since it is not a numerically stable algorithm.
However, it enjoys the significant advantage of being able to compute forward approximations to discontinuous quantities such as the Jordan structure. As far as we are aware, there are no other published provably polynomial-time algorithms for the general eigenproblem. The two standard references for diagonalization appearing most

^8 The output of their algorithm is n vectors, on each of which Newton's method converges quadratically to an eigenvector; they refer to this as "approximation à la Smale".

Result    Model      Error       Arithmetic Ops                  Boolean Ops                              Restrictions
[Cai94]   Rational   Forward^a   poly(a, n, log(1/ε))            poly(a, n, log(1/ε))^b                   ^c
[PC99]    Rational   Forward     n^ω + n log log(1/ε)            n^{ω+1} a + n^2 log(1/ε) log log(1/ε)    Eigenvalues only^c
[LV16]    Finite     Forward     n^ω log(n) log(1/ε)             n^ω log^4(n) log^2(n/ε)                  Hermitian, λ_1 only

^a Actually computes the Jordan Normal Form. The degree of the polynomial is not specified, but is at least 12 in n.
^b In the bit operations, a denotes the bit length of the input entries.
^c Uses a custom bit representation of intermediate quantities.

Table 2: Results for other models of arithmetic

often in theoretical computer science papers do not meet this criterion. In particular, the widely cited work of Pan and Chen [PC99] proves that one can compute the eigenvalues of A in O(n^ω + n log log(1/ε)) (suppressing logarithmic factors) arithmetic operations by finding the roots of its characteristic polynomial, which becomes a bound of O(n^{ω+1} a + n^2 log(1/ε) log log(1/ε)) bit operations if the characteristic polynomial is computed exactly in rational arithmetic and the matrix has entries of bit length a. However, that paper does not give any bound on the time taken to find approximate eigenvectors from approximate eigenvalues, and states this as an open problem.^9 Finally, the important work of Demmel, Dumitriu, and Holtz [DDH07] (see also the followup [BDD10]), which we rely on heavily, does not claim to provably solve the eigenproblem either: it bounds the running time of one iteration of a specific algorithm, and shows that such an iteration can be implemented numerically stably, without proving any bound on the number of iterations required in general.

Hermitian Eigenproblem. For comparison, the eigenproblem for Hermitian matrices is much better understood. We cannot give a complete bibliography of this huge area, but mention one relevant landmark result: the work of Wilkinson [Wil68] and Hoffman-Parlett [HP78] in the 60's and 70's, which shows that the Hermitian eigenproblem can be solved with backward error δ in O(n^3 + n^2 log(1/δ)) arithmetic operations with O(log(n/δ)) bits of precision. There has also recently been renewed interest in this problem in the theoretical computer science community, with the goal of bringing the runtime close to O(n^ω): Louis and Vempala [LV16] show how to find a δ-approximation of just the largest eigenvalue in O(n^ω log^4(n) log^2(1/δ)) bit operations, and Ben-Or and Eldar [BOE18] give an O(n^{ω+1} polylog(n))-bit-operation algorithm for finding a 1/poly(n)-approximate diagonalization of an n×n Hermitian matrix normalized to have ‖A‖ ≤ 1.

Remark 1.9 (Davies' Conjecture). The beautiful paper [Dav07] introduced the idea of approximating a matrix function f(A) for nonnormal A by f(A + E) for some well-chosen E regularizing the eigenvectors of A. This directly inspired our approach to solving the eigenproblem via regularization.

^9 "The remaining nontrivial problems are, of course, the estimation of the above output precision p [sufficient for finding an approximate eigenvector from an approximate eigenvalue], . . . . We leave these open problems as a challenge for the reader." - [PC99, Section 12].

The existence of an approximate diagonalization (1) for every A with a well-conditioned similarity V (i.e., κ(V) depending polynomially on 1/δ and n) was precisely the content of Davies' conjecture [Dav07], which was recently solved by some of the authors and Mukherjee in [BKMS19]. The existence of such a V is a prerequisite for proving that one can always efficiently find an approximate diagonalization in finite arithmetic, since if ‖V‖‖V^{-1}‖ is very large it may require many bits of precision to represent. Thus, Theorem 1.6 can be viewed as an efficient algorithmic answer to Davies' question.

Reader Guide. This paper contains a lot of parameters and constants. On first reading, it may be good to largely ignore the constants not appearing in exponents, and to keep in mind the typical setting δ = 1/poly(n) for the accuracy, in which case the important auxiliary parameters ω, 1−α, β, ε are all 1/poly(n), and the machine precision is log(1/u) = polylog(n).

2 Preliminaries

Let M ∈ ℂ^{n×n} be a complex matrix, not necessarily normal. We will write matrices and vectors with uppercase and lowercase letters, respectively. Let us denote by Λ(M) the spectrum of M and by λ_i(M) its individual eigenvalues. In the same way we denote the singular values of M by σ_i(M), and we adopt the convention σ_1(M) ≥ σ_2(M) ≥ ⋯ ≥ σ_n(M). When M is clear from the context we will simplify notation and just write Λ, λ_i, or σ_i, respectively.

Recall that the operator norm of M is

    ‖M‖ = σ_1(M) = sup_{‖x‖=1} ‖Mx‖.

As usual, we will say that M is diagonalizable if it can be written as M = VDV^{-1} for some diagonal matrix D whose diagonal entries are the eigenvalues of M. In this case we have the spectral expansion

    M = ∑_{i=1}^n λ_i v_i w_i^*,    (10)

where the right and left eigenvectors v_i and w_j^* are the columns and rows of V and V^{-1} respectively, normalized so that w_i^* v_i = 1.

2.1 Spectral Projectors and Holomorphic Functional Calculus

Let M ∈ ℂ^{n×n}, with eigenvalues λ_1, ..., λ_n. We say that a matrix P is a spectral projector for M if MP = PM and P^2 = P. For instance, each of the terms v_i w_i^* appearing in the spectral expansion (10) is a spectral projector, as M v_i w_i^* = λ_i v_i w_i^* = v_i w_i^* M and w_i^* v_i = 1. If Γ_i is a simple closed positively oriented rectifiable curve in the complex plane separating λ_i from the rest of the spectrum, then it is well-known that

    v_i w_i^* = (1/2πi) ∮_{Γ_i} (z − M)^{-1} dz,

by taking the Jordan normal form of the resolvent (z − M)^{-1} and applying Cauchy's integral formula.
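This contour integral is easy to approximate numerically by discretizing a circle around the target eigenvalue with the trapezoidal rule. The sketch below is our own illustration (the test matrix and the radius are arbitrary choices, assumed to separate the eigenvalue from the rest of the spectrum), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = np.diag([0.0, 2.0, 3.0, 4.0]) + 0.1 * rng.standard_normal((n, n))

evals, V = np.linalg.eig(M)
W = np.linalg.inv(V)                        # rows are left eigenvectors, w_i^* v_i = 1
i = int(np.argmin(np.abs(evals)))           # target the eigenvalue near 0
P_exact = np.outer(V[:, i], W[i, :])        # the rank-one projector v_i w_i^*

# (1/2*pi*i) * contour integral of (z - M)^{-1} over a circle of radius 0.8
# around the eigenvalue, discretized by the trapezoidal rule (which converges
# geometrically for periodic analytic integrands).
center, radius, m = evals[i], 0.8, 200
P = np.zeros((n, n), dtype=complex)
for theta in 2 * np.pi * np.arange(m) / m:
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / m)
    P += np.linalg.inv(z * np.eye(n) - M) * dz
P /= 2j * np.pi
assert np.linalg.norm(P - P_exact) < 1e-8
```

Because the other eigenvalues lie far outside the contour, 200 quadrature nodes already recover the projector to roundoff accuracy.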

Since every spectral projector P commutes with M, its range agrees exactly with an invariant subspace of M. We will often find it useful to choose some region of the complex plane bounded by a simple closed positively oriented rectifiable curve Γ, and compute the spectral projector onto the invariant subspace spanned by those eigenvectors whose eigenvalues lie inside Γ. Such a projector can be computed by a contour integral analogous to the above.

Recall that if f is any function and M is diagonalizable, then we can meaningfully define f(M) := V f(D) V^{-1}, where f(D) is simply the result of applying f to each diagonal entry of D. The holomorphic functional calculus gives an equivalent definition that extends to the case when M is non-diagonalizable. As we will see, it has the added benefit that bounds on the norm of the resolvent of M can be converted into bounds on the norm of f(M).

Proposition 2.1 (Holomorphic Functional Calculus). Let M be any matrix, B ⊃ Λ(M) be an open neighborhood of its spectrum (not necessarily connected), and Γ_1, ..., Γ_k be simple closed positively oriented rectifiable curves in B whose interiors together contain all of Λ(M). Then if f is holomorphic on B, the definition

    f(M) := (1/2πi) ∑_{j=1}^k ∮_{Γ_j} f(z) (z − M)^{-1} dz

is an algebra homomorphism, in the sense that (fg)(M) = f(M) g(M) for any f and g holomorphic on B.

Finally, we will frequently use the resolvent identity

    (z − M)^{-1} − (z − M')^{-1} = (z − M)^{-1} (M − M') (z − M')^{-1}

to analyze perturbations of contour integrals.
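The identity can be verified term by term on any point away from both spectra; the following quick numerical check (our illustration, with arbitrary random matrices) confirms it to machine precision.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Mp = M + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
z = 10.0 + 0.0j                       # a point far from both spectra

I = np.eye(n)
R = np.linalg.inv(z * I - M)          # resolvent of M at z
Rp = np.linalg.inv(z * I - Mp)        # resolvent of M' at z

# Resolvent identity: (z-M)^{-1} - (z-M')^{-1} = (z-M)^{-1} (M - M') (z-M')^{-1}
lhs = R - Rp
rhs = R @ (M - Mp) @ Rp
assert np.linalg.norm(lhs - rhs) < 1e-10
```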

2.2 Pseudospectrum and Spectral Stability

The ε-pseudospectrum of a matrix is defined in (5). Directly from this definition, we can relate the pseudospectra of a matrix and a perturbation of it.

Proposition 2.2 ([TE05, Theorem 52.4]). For any n×n matrices M and E and any ε > 0,

    Λ_{ε−‖E‖}(M) ⊆ Λ_ε(M + E).

It is also immediate that Λ(M) ⊂ Λ_ε(M), and in fact a stronger relationship holds as well:

Proposition 2.3 ([TE05, Theorem 4.3]). For any n×n matrix M, any bounded connected component of Λ_ε(M) must contain an eigenvalue of M.

Several other notions of stability will be useful to us as well. If M has distinct eigenvalues λ_1, …, λ_n and spectral expansion as in (10), we define the eigenvalue condition number of λ_i to be

    κ(λ_i) := ‖v_i w_i^*‖ = ‖v_i‖ ‖w_i‖.
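In NumPy these quantities can be read off directly from an eigendecomposition; the sketch below (our illustration, not code from the paper) forms the spectral expansion (10) and the eigenvalue condition numbers just defined.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

evals, V = np.linalg.eig(M)     # columns of V are right eigenvectors v_i
W = np.linalg.inv(V)            # rows of W are left eigenvectors w_i^*, with w_i^* v_i = 1

# Spectral expansion (10): M = sum_i lambda_i v_i w_i^*
M_rebuilt = sum(evals[i] * np.outer(V[:, i], W[i, :]) for i in range(n))
assert np.linalg.norm(M - M_rebuilt) < 1e-8

# Eigenvalue condition numbers kappa(lambda_i) = ||v_i|| * ||w_i||
kappas = [np.linalg.norm(V[:, i]) * np.linalg.norm(W[i, :]) for i in range(n)]
assert all(k >= 1 - 1e-12 for k in kappas)   # always >= 1, since 1 = |w_i^* v_i|
```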

By considering the scaling of V in (2) in which its columns v_i have unit length, so that κ(λ_i) = ‖w_i‖, we obtain the useful relationship

    κ_V(M) ≤ ‖V‖ ‖V^{-1}‖ ≤ ‖V‖_F ‖V^{-1}‖_F ≤ √( n ∑_{i≤n} κ(λ_i)^2 ).    (11)

Note also that the eigenvector condition number and pseudospectrum are related as follows:

Lemma 2.4 ([TE05]). Let D(z, r) denote the open disk of radius r centered at z ∈ ℂ. For every M ∈ ℂ^{n×n},

    ⋃_i D(λ_i, ε) ⊂ Λ_ε(M) ⊂ ⋃_i D(λ_i, ε κ_V(M)).    (12)

In this paper we will repeatedly use that assumptions about the pseudospectrum of a matrix can be turned into stability statements about functions applied to the matrix via the holomorphic functional calculus. Here we describe an instance of particular importance.

Let λ_i be a simple eigenvalue of M and let Γ_i be a contour in the complex plane, as in Section 2.1, separating λ_i from the rest of the spectrum of M, and assume Λ_ε(M) ∩ Γ_i = ∅. Then, for any M' with ‖M − M'‖ < ε, a combination of Proposition 2.2 and Proposition 2.3 implies that there is a unique eigenvalue λ_i' of M' in the region enclosed by Γ_i, and furthermore Λ_{ε−‖M−M'‖}(M') ∩ Γ_i = ∅. If v_i' and w_i'^* are the right and left eigenvectors of M' corresponding to λ_i', we have

    ‖v_i' w_i'^* − v_i w_i^*‖ = (1/2π) ‖ ∮_{Γ_i} ( (z − M)^{-1} − (z − M')^{-1} ) dz ‖
                              = (1/2π) ‖ ∮_{Γ_i} (z − M)^{-1} (M − M') (z − M')^{-1} dz ‖
                              ≤ ℓ(Γ_i) ‖M − M'‖ / ( 2π ε (ε − ‖M − M'‖) ).    (13)

We have introduced enough tools to prove Proposition 1.1.
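The chain of inequalities in (11) is easy to check numerically in the unit-column scaling; the following is an illustrative sketch (our experiment, on an arbitrary random matrix).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

evals, V = np.linalg.eig(M)
V = V / np.linalg.norm(V, axis=0)     # scale columns to unit length, as in (2)
W = np.linalg.inv(V)
kappa_sq = np.sum(np.linalg.norm(W, axis=1) ** 2)   # sum_i kappa(lambda_i)^2

# kappa(V) = ||V|| ||V^{-1}|| is an upper bound for kappa_V(M); the chain (11):
kv_upper = np.linalg.norm(V, 2) * np.linalg.norm(W, 2)
assert kv_upper <= np.linalg.norm(V, 'fro') * np.linalg.norm(W, 'fro') + 1e-9
assert np.linalg.norm(V, 'fro') * np.linalg.norm(W, 'fro') <= np.sqrt(n * kappa_sq) + 1e-9
```

With unit columns ‖V‖_F = √n exactly, so the last inequality in (11) holds with equality in this scaling.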

Proof of Proposition 1.1. For t ∈ [0, 1] define A(t) := (1 − t)A + tA'. Since ‖A − A'‖ < gap(A)/(8κ_V(A)), the Bauer-Fike theorem implies that A(t) has distinct eigenvalues for all t, and in fact gap(A(t)) ≥ 3 gap(A)/4. Standard results in perturbation theory [GLO20] imply that for every i = 1, …, n, A(t) has a unique eigenvalue λ_i(t) such that λ_i(t) is a differentiable trajectory, λ_i(0) = λ_i, and λ_i(1) = λ_i'. Let v_i(t) and w_i(t) be the right and left eigenvectors of A(t) corresponding to λ_i(t), with ‖v_i(t)‖ = 1.

Let Γ_i be the positively oriented contour forming the boundary of the disk centered at λ_i with radius gap(A)/2, and define ε := gap(A)/(8κ_V(A)). Lemma 2.4 implies Λ_ε(A) ∩ Γ_i = ∅, and for fixed t ∈ [0, 1], since ‖A − A(t)‖ ≤ ‖A − A'‖, Proposition 2.2 gives Λ_{ε−‖A−A(t)‖}(A(t)) ∩ Γ_i = ∅. By (13),

    |κ(λ_i) − κ(λ_i(t))| ≤ ‖v_i(t) w_i^*(t) − v_i w_i^*‖ ≤ ℓ(Γ_i) ‖A − A(t)‖ / ( 2π ε (ε − ‖A − A(t)‖) ) ≤ 2 κ_V(A),

and hence κ(λ_i(t)) ≤ κ(λ_i) + 2κ_V(A) ≤ 3κ_V(A). Combining this with (11) we obtain

    κ_V(A(t)) ≤ 2 √( n ∑_i κ(λ_i)^2 ) < 4n κ_V(A).

On the other hand, from standard perturbation theory we know that the phases of the v_i(t) may be chosen so that v_i(t) is a differentiable function, and moreover one can show that

    ‖v̇_i(t)‖ ≤ κ_V(A(t)) ‖Ȧ(t)‖ / gap(A(t));

see Section 2 of [GLO20] or the references therein for a derivation of these facts. Now, using that κ_V(A(t)) ≤ 4n κ_V(A) and gap(A(t)) ≥ 3 gap(A)/4, the above inequality yields

    ‖v̇_i(t)‖ ≤ 16n κ_V(A) ‖A' − A‖ / (3 gap(A)).

The desired result is then obtained by integrating v̇_i(t) from 0 to 1.

2.3 Finite-Precision Arithmetic

We briefly elaborate on the axioms for floating-point arithmetic given in Section 1.1. Similar guarantees to the ones appearing in that section for scalar-scalar operations also hold for operations such as matrix-matrix addition and matrix-scalar multiplication. In particular, if A is an n×n complex matrix, fl(A) = A + A∘Δ with |Δ_{i,j}| < u. It will be convenient for us to write such errors in additive, as opposed to multiplicative form. We can convert the above to additive error as follows. Recall that for any n×n matrix, the ℓ^2 → ℓ^2 operator norm (the spectral norm) is at most √n times the ℓ^1 → ℓ^2 operator norm, i.e. the maximal ℓ^2 norm of a column. Thus we have

    ‖A∘Δ‖ ≤ √n max_i ‖(A∘Δ) e_i‖ ≤ √n max_{i,j} |Δ_{i,j}| max_i ‖A e_i‖ ≤ √n u ‖A‖.    (14)

For more complicated operations such as matrix-matrix multiplication and matrix inversion, we use existing error guarantees from the literature. This is the subject of Section 2.5.

We will also need to compute the trace of a matrix A ∈ ℂ^{n×n}, and normalize a vector x ∈ ℂ^n. Error analysis of these is standard (see for instance the discussion in [Hig02, Sections 3.1-3.4, 4.1]) and the results in this paper are highly insensitive to the details. For simplicity, calling x̂ := x/‖x‖, we will assume that

    |fl(Tr A) − Tr A| ≤ n ‖A‖ u    (15)
    ‖fl(x̂) − x̂‖ ≤ n u.    (16)

Each of these can be achieved by assuming that nu ≤ ε_0 for some suitably chosen ε_0, independent of n, a requirement which will be superseded shortly by several tighter assumptions on the machine precision.

Throughout the paper, we will take the pedagogical perspective that our algorithms are games played between the practitioner and an adversary who may additively corrupt each operation. In particular, we will include explicit error terms (always denoted by E(⋅)) in each appropriate step of every algorithm. In many cases we will first analyze a routine in exact arithmetic (in which case the error terms will all be set to zero) and subsequently determine the machine precision u necessary to make the errors small enough to guarantee convergence.
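As a quick empirical check of the multiplicative-to-additive conversion (14), one can round a double-precision matrix to single precision (unit roundoff u = 2^-24) and measure the resulting perturbation; the sketch below is our illustration, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
A = rng.standard_normal((n, n))

# Rounding A to single precision gives fl(A) = A + A*Delta (entrywise) with
# |Delta_ij| <= u = 2^-24, the unit roundoff of IEEE binary32.
u = 2.0 ** -24
A32 = A.astype(np.float32).astype(np.float64)
Delta_matrix = A32 - A                    # the additive error A*Delta

# The additive bound (14): ||A*Delta|| <= sqrt(n) * u * ||A||.
assert np.linalg.norm(Delta_matrix, 2) <= np.sqrt(n) * u * np.linalg.norm(A, 2)
```

The bound is deterministic: it follows from ‖·‖_2 ≤ ‖·‖_F and the entrywise guarantee, so the assertion holds for every input matrix.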

2.4 Sampling Gaussians in Finite Precision

For various parts of the algorithm, we will need to sample from normal distributions. For our model of arithmetic, we assume that the complex normal distribution can be sampled up to machine precision in O(1) arithmetic operations. To be precise, we assume the existence of the following sampler:

Definition 2.5 (Complex Gaussian Sampling). A c_N-stable Gaussian sampler N(σ) takes as input σ ∈ ℝ_{≥0} and outputs a sample of a random variable G̃ = N(σ) with the property that there exists G ∼ N_ℂ(0, σ^2) satisfying

    |G̃ − G| ≤ c_N σ · u

with probability one, in at most T_N arithmetic operations for some universal constant T_N > 0.

We will only sample O(n^2) Gaussians during the algorithm, so this sampling will not contribute significantly to the runtime. Here as everywhere in the paper, we will omit issues of underflow or overflow. Throughout this paper, to simplify some of our bounds, we will also assume that c_N ≥ 1.

2.5 Black-box Error Assumptions for Multiplication, Inversion, and QR

Our algorithm uses matrix-matrix multiplication, matrix inversion, and QR factorization as primitives. For our analysis, we must therefore assume some bounds on the error and runtime costs incurred by these subroutines. In this section, we first formally state the kind of error and runtime bounds we require, and then discuss some implementations known in the literature that satisfy each of our requirements with modest constants.

Our definitions are inspired by the definition of logarithmic stability introduced in [DDH07]. Roughly speaking, they say that implementing the algorithm with floating point precision u yields an accuracy which is at most polynomially or quasipolynomially in n worse than u (possibly also depending on the condition number in the case of inversion).
Their definition has the property that while a logarithmically stable algorithm is not strictly speaking backward stable, it can attain the same forward error bound as a backward stable algorithm at the cost of increasing the bit length by a polylogarithmic factor. See Section 3 of their paper for a precise definition and a more detailed discussion of how their definition relates to standard numerical stability notions.
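For intuition (this is our experiment, not a claim from [DDH07]), the shape of such multiplicative error models is easy to observe by comparing a single-precision matrix product against a double-precision reference: the measured error divided by u‖A‖‖B‖ stays a small polynomial in n.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

u = 2.0 ** -24   # unit roundoff of IEEE binary32
C = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float64)
err = np.linalg.norm(C - A @ B, 2)

# Classical matmul obeys a bound of the form ||C - AB|| <= mu(n) * u * ||A|| ||B||
# with mu(n) polynomial in n; measure the empirical mu for this instance.
mu_empirical = err / (u * np.linalg.norm(A, 2) * np.linalg.norm(B, 2))
assert mu_empirical < n ** 1.5
```

In practice the empirical μ here is far below n^{3/2}; the assertion is just a generous sanity bound.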

Definition 2.6. A μ_MM(n)-stable multiplication algorithm MM(·, ·) takes as input A, B ∈ ℂ^{n×n} and a precision u > 0 and outputs C = MM(A, B) satisfying

    ‖C − AB‖ ≤ μ_MM(n) · u ‖A‖ ‖B‖,

on a floating point machine with precision u, in T_MM(n) arithmetic operations.

Definition 2.7. A (μ_INV(n), c_INV)-stable inversion algorithm INV(·) takes as input A ∈ ℂ^{n×n} and a precision u and outputs C = INV(A) satisfying

    ‖C − A^{-1}‖ ≤ μ_INV(n) · u · κ(A)^{c_INV log n} ‖A^{-1}‖,

on a floating point machine with precision u, in T_INV(n) arithmetic operations.

Definition 2.8. A μ_QR(n)-stable QR factorization algorithm QR(·) takes as input A ∈ ℂ^{n×n} and a precision u, and outputs [Q, R] = QR(A) such that

1. R is exactly upper triangular.

2. There is a unitary Q' and a matrix A' such that

    Q' A' = R,    (17)

and

    ‖Q' − Q‖ ≤ μ_QR(n) u,  and  ‖A' − A‖ ≤ μ_QR(n) u ‖A‖,

on a floating point machine with precision u. Its running time is T_QR(n) arithmetic operations.

Remark 2.9. Throughout this paper, to simplify some of our bounds, we will assume that

    1 ≤ μ_MM(n), μ_INV(n), μ_QR(n), c_INV log n.

The above definitions can be instantiated with traditional O(n^3)-complexity algorithms, for which μ_MM, μ_QR, μ_INV are all O(n) and c_INV = 1 [Hig02]. This yields easily-implementable practical algorithms with running times depending cubically on n.

In order to achieve O(n^ω)-type efficiency, we instantiate them with fast-matrix-multiplication-based algorithms, with μ(n) taken to be a low-degree polynomial [DDH07]. Specifically, the following parameters are known to be achievable.

Theorem 2.10 (Fast and Stable Instantiations of MM, INV, QR).

1. If ω is the exponent of matrix multiplication, then for every η > 0 there is a μ_MM(n)-stable multiplication algorithm with μ_MM(n) = n^{c_η} and T_MM(n) = O(n^{ω+η}), where c_η does not depend on n.

2. Given an algorithm for matrix multiplication satisfying (1), there is a (μ_INV(n), c_INV)-stable inversion algorithm with

    μ_INV(n) ≤ O(μ_MM(n) n^{lg(10)}),  c_INV ≤ 8,

and T_INV(n) ≤ T_MM(3n) = O(T_MM(n)).

3. Given an algorithm for matrix multiplication satisfying (1), there is a μ_QR(n)-stable QR factorization algorithm with

    μ_QR(n) = O(n^{c_QR} μ_MM(n)),

where c_QR is an absolute constant, and T_QR(n) = O(T_MM(n)).

In particular, all of the running times above are bounded by T_MM(n) for an n×n matrix.

Proof. (1) is Theorem 3.3 of [DDHK07]. (2) is Theorem 3.3 (see also equation (9) above its statement) of [DDH07]. The final claim follows by noting that T_MM(3n) = O(T_MM(n)), by dividing a 3n×3n matrix into nine n×n blocks and proceeding blockwise, at the cost of a factor of 9 in μ_INV(n). (3) appears in Section 4.1 of [DDH07].

We remark that for specific existing fast matrix multiplication algorithms such as Strassen's algorithm, specific small values of μ_MM(n) are known (see [DDHK07] and its references for details), so these may also be used as a black box, though we will not do this in this paper.

3 Pseudospectral Shattering

This section is devoted to our central probabilistic result, Theorem 1.4, and the accompanying notion of pseudospectral shattering which will be used extensively in our analysis of the spectral bisection algorithm in Section 5.

3.1 Smoothed Analysis of Gap and Eigenvector Condition Number

As is customary in the literature, we will refer to an n×n random matrix G_n whose entries are independent complex Gaussians drawn from N_ℂ(0, 1/n) as a normalized complex Ginibre random matrix. To be absolutely clear, and because other choices of scaling are quite common, we mean that E G_{i,j} = 0 and E |G_{i,j}|^2 = 1/n.

In the course of proving Theorem 1.4, we will need to bound the probability that the second-smallest singular value of an arbitrary matrix with small Ginibre perturbation is atypically small. We begin with a well-known lower tail bound on the singular values of a Ginibre matrix alone.

Theorem 3.1 ([Sza91, Theorem 1.2]). For an n×n normalized complex Ginibre matrix G_n and for any ε ≥ 0 it holds that

    ℙ[ σ_j(G_n) < ε(n − j + 1)/n ] ≤ ( √(2e) ε )^{2(n−j+1)^2}.

As in several of the authors' earlier work [BKMS19], we can transfer this result to the case of a Ginibre perturbation via a remarkable coupling result of P. Śniady.

Theorem 3.2 (Śniady [Śni02]). Let A_1 and A_2 be n×n complex matrices such that σ_i(A_1) ≤ σ_i(A_2) for all 1 ≤ i ≤ n. Assume further that σ_i(A_1) ≠ σ_j(A_1) and σ_i(A_2) ≠ σ_j(A_2) for all i ≠ j. Then for every t ≥ 0, there exists a joint distribution on pairs of n×n complex matrices (G_1, G_2) such that

1. the marginals G_1 and G_2 are distributed as normalized complex Ginibre matrices, and

2. almost surely σ_i(A_1 + √t G_1) ≤ σ_i(A_2 + √t G_2) for every i.

Corollary 3.3. For any fixed matrix M ∈ ℂ^{n×n} and parameters γ, ε > 0,

    ℙ[ σ_j(M + γ G_n) < ε γ (n − j + 1)/n ] ≤ ( √(2e) ε )^{2(n−j+1)^2}.
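A normalized complex Ginibre matrix in this convention can be sampled as in the sketch below (our illustration; the factor 1/√(2n) makes the real and imaginary parts each have variance 1/(2n), so E|G_ij|^2 = 1/n).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
# Normalized complex Ginibre: i.i.d. entries with E G_ij = 0, E |G_ij|^2 = 1/n.
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)

assert abs(np.mean(np.abs(G) ** 2) * n - 1.0) < 0.1   # entry variance is 1/n
assert np.linalg.norm(G, 2) < 2.5                     # ||G_n|| concentrates near 2
```

The operator norm of G_n converges to 2 as n → ∞, consistent with the ‖G_n‖ < 4 event used repeatedly below.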

Theorem 3.4 ([BKMS19, Theorem 1.5]). Suppose A ∈ ℂ^{n×n} with ‖A‖ ≤ 1 and γ ∈ (0, 1). Let G_n be a complex Ginibre matrix, and let λ_1, …, λ_n ∈ ℂ be the (random) eigenvalues of A + γG_n. Then for every measurable open set B ⊂ ℂ,

    E ∑_{λ_i ∈ B} κ(λ_i)^2 ≤ (n^2 / (π γ^2)) vol(B).

Our final lemma before embarking on the proof in earnest shows that bounds on the j-th smallest singular value and eigenvector condition number are sufficient to rule out the presence of j eigenvalues in a small region. For our particular application we will take j = 2.

Lemma 3.5. Let D(z_0, r) := {z ∈ ℂ : |z − z_0| < r}, and suppose M ∈ ℂ^{n×n} is diagonalizable with at least j eigenvalues in D(z_0, r). Then

    σ_{n−j+1}(z_0 − M) ≤ κ_V(M) · r.

Proof. Write M = VDV^{-1} with D diagonal, taking V so that κ(V) is arbitrarily close to κ_V(M). Using the variational characterization of singular values,

    σ_{n−j+1}(z_0 − M) = min_{S: dim(S)=j} max_{x ∈ S∖{0}} ‖V(z_0 − D)V^{-1} x‖ / ‖x‖
                       = min_{S: dim(S)=j} max_{y ∈ V^{-1}(S)∖{0}} ‖V(z_0 − D) y‖ / ‖V y‖        (setting x = Vy)
                       = min_{S: dim(S)=j} max_{y ∈ S∖{0}} ‖V(z_0 − D) y‖ / ‖V y‖                (since V is invertible)
                       ≤ min_{S: dim(S)=j} max_{y ∈ S∖{0}} ‖V‖ ‖(z_0 − D) y‖ / (σ_n(V) ‖y‖)
                       ≤ κ(V) · σ_{n−j+1}(z_0 − D).

Since z_0 − D is diagonal, its singular values are just |z_0 − λ_i|, so the j-th smallest is at most r, finishing the proof.

We now present the main tail bound that we use to control the minimum gap and eigenvector condition number.

Theorem 3.6 (Multiparameter Tail Bound). Let A ∈ ℂ^{n×n}. Assume ‖A‖ ≤ 1 and γ < 1/2, and let X := A + γG_n where G_n is a complex Ginibre matrix. For every t, r > 0:

    ℙ[ κ_V(X) < t, gap(X) > r, ‖G_n‖ < 4 ] ≥ 1 − ( (144/r^2) · (4trn/γ)^8 + 9n^3/(γ^2 t^2) + 2e^{−2n} ).    (18)

Proof. Write Λ(X) := {λ_1, …, λ_n} for the (random) eigenvalues of X = A + γG_n, in increasing order of magnitude (there are no ties almost surely). Let N ⊂ ℂ be a minimal r/2-net of B := D(0, 3), recalling the standard fact that one exists of size no more than (3 · 4/r)^2 = 144/r^2. The most useful feature of such a net is that, by the triangle inequality, for any two points z, z' ∈ B with |z − z'| ≤ r, there is a point y ∈ N with |y − z| ≤ r/2, satisfying z, z' ∈ D(y, 2r). In particular, defining the events

    E_gap := {gap(X) < r},  E_κ := {κ_V(X) ≥ t},  E_D := {Λ(X) ⊄ D(0, 3)},  E_y := {σ_{n−1}(y − X) < 2tr},

Lemma 3.5 (applied with j = 2 to the points y ∈ N) reveals that

    E_gap ⊆ E_D ∪ E_κ ∪ ⋃_{y∈N} E_y,

whence

    E_gap ∪ E_κ ⊆ E_D ∪ E_κ ∪ ⋃_{y∈N} E_y.

By a union bound, we have

    ℙ[E_gap ∪ E_κ] ≤ ℙ[E_D ∪ E_κ] + |N| · max_{y∈N} ℙ[E_y].    (19)

From the tail bound on the operator norm of a Ginibre matrix in [BKMS19, Lemma 2.2],

    ℙ[E_D] ≤ ℙ[‖G_n‖ ≥ 4] ≤ 2e^{−2n(4−2√2)^2} ≤ 2e^{−2n}.    (20)

Observe that by (11),

    κ_V(X) ≤ √( n ∑_i κ(λ_i)^2 ),

which implies that

    E_κ ⊆ E_D ∪ { ∑_{λ_i ∈ D(0,3)} κ(λ_i)^2 ≥ t^2/n }.

Theorem 3.4 and Markov's inequality yield

    ℙ[ ∑_{λ_i ∈ D(0,3)} κ(λ_i)^2 ≥ t^2/n ] ≤ (9n^2/γ^2) · (n/t^2) = 9n^3/(γ^2 t^2).

Thus, we have

    ℙ[E_D ∪ E_κ] ≤ 9n^3/(γ^2 t^2) + 2e^{−2n}.

Corollary 3.3 applied to M = −y + A gives the bound

    ℙ[E_y] ≤ ( √(2e) · trn/γ )^8 ≤ (4trn/γ)^8

for each y ∈ N, and plugging these estimates back into (19) we have

    ℙ[E_gap ∪ E_D ∪ E_κ] ≤ (144/r^2) · (4trn/γ)^8 + 9n^3/(γ^2 t^2) + 2e^{−2n},

as desired.

A specific setting of parameters in Theorem 3.6 immediately yields Theorem 1.4.

Proof of Theorem 1.4. Applying Theorem 3.6 with parameters t := n^2/γ and r := γ^4/n^5, we have

    ℙ[ κ_V(X) < n^2/γ, gap(X) > γ^4/n^5, Λ(X) ⊂ D(0, 3) ] ≥ 1 − ( 144 · 4^8 γ^8/n^6 + 9/n + 2e^{−2n} ) ≥ 1 − 12/n,    (21)

as desired, where in the last step we use the assumption γ < 1/2.

Since it is of independent interest in random matrix theory, we record the best bound on the gap alone that it is possible to extract from the theorem above.

Corollary 3.7 (Minimum Gap Bound). For X as in Theorem 3.6,

    ℙ[ gap(X) < r ] ≤ C n^4 r^{6/5} γ^{−16/5} + 2e^{−2n},

where C > 0 is a universal constant, obtained by optimizing the choice of t in (18).
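This smoothed-analysis phenomenon is easy to see experimentally (our experiment, with an arbitrary choice of test matrix): a nilpotent Jordan block has minimum gap exactly 0 and infinite κ_V, yet every Ginibre-perturbed sample has its eigenvalues well separated.

```python
import numpy as np

rng = np.random.default_rng(7)
n, gamma, trials = 8, 0.4, 200

def min_gap(evals):
    d = np.abs(evals[:, None] - evals[None, :])
    np.fill_diagonal(d, np.inf)       # ignore the zero diagonal
    return d.min()

A = np.diag(np.ones(n - 1), 1)        # nilpotent Jordan block, ||A|| = 1
gaps = []
for _ in range(trials):
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
    gaps.append(min_gap(np.linalg.eigvals(A + gamma * G)))

assert min(gaps) > 1e-6               # no trial produced a tiny minimum gap
```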

Definition 3.8 (Grid). A grid in the complex plane consists of the boundaries of a lattice of squares with lower edges parallel to the real axis. We will write

    g = grid(z_0, ω, s_1, s_2) ⊂ ℂ

to denote an s_1 × s_2 grid of ω×ω-sized squares with lower left corner at z_0 ∈ ℂ. Write diam(g) := ω √(s_1^2 + s_2^2) for the diameter of the grid.

Definition 3.9 (Shattering). A pseudospectrum Λ_ε(A) is shattered with respect to a grid g if:

1. Every square of g has at most one eigenvalue of A.

2. Λ_ε(A) ∩ g = ∅.

Observation 3.10. As Λ_ε(A) contains a ball of radius ε about each eigenvalue of A, shattering of the ε-pseudospectrum with respect to a grid with side length ω implies ε ≤ ω/2.

As a warm-up for more sophisticated arguments later on, we give here an easy consequence of the shattering property.

Lemma 3.11. If Λ_ε(M) is shattered with respect to a grid g with side length ω, then every eigenvalue condition number satisfies κ(λ_i) ≤ 2ω/(πε).

Proof. Let v, w^* be a right/left eigenvector pair for some eigenvalue λ_i of M, normalized so that w^* v = 1. Letting Γ_i be the positively oriented boundary of the square of g containing λ_i, we can extract the projector vw^* by integrating, and pass norms inside the contour integral to obtain

    κ(λ_i) = ‖vw^*‖ = (1/2π) ‖ ∮_{Γ_i} (z − M)^{-1} dz ‖ ≤ (1/2π) ∮_{Γ_i} ‖(z − M)^{-1}‖ |dz| ≤ 4ω/(2πε) = 2ω/(πε).    (22)

In the final step we have used the fact that, given the definition of pseudospectrum (6) above, Λ_ε(M) ∩ g = ∅ means ‖(z − M)^{-1}‖ ≤ 1/ε on g.

The theorem below quantifies the extent to which perturbing by a Ginibre matrix results in a shattered pseudospectrum. See Figure 1 for an illustration in the case where the initial matrix is poorly conditioned. In general, not all eigenvalues need move so far upon such a perturbation, in particular if the respective κ(λ_i) are small.
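Lemma 3.11 can be checked numerically on a toy example (our construction, with an arbitrary nearly normal matrix and grid): estimate a valid ε as the smallest value of σ_min(z − X) over a sampling of the grid lines, since ‖(z − X)^{-1}‖ = 1/σ_min(z − X), and compare against the condition-number bound.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
X = np.diag([0.25 + 0.25j, 0.75 + 0.25j, 0.25 + 0.75j, 0.75 + 0.75j]) \
    + 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

omega = 0.5   # grid lines Re z, Im z in {0, 0.5, 1.0}; one eigenvalue per square
ticks = (0.0, 0.5, 1.0)
zs = [complex(a, b) for a in np.linspace(0, 1, 41) for b in ticks] \
   + [complex(a, b) for b in np.linspace(0, 1, 41) for a in ticks]
# eps <= sigma_min(z - X) on the grid makes Lambda_eps(X) avoid the grid.
eps = min(np.linalg.svd(z * np.eye(n) - X, compute_uv=False)[-1] for z in zs)

# Lemma 3.11: under shattering, every kappa(lambda_i) <= 2*omega/(pi*eps).
evals, V = np.linalg.eig(X)
W = np.linalg.inv(V)
kappas = [np.linalg.norm(V[:, i]) * np.linalg.norm(W[i, :]) for i in range(n)]
assert max(kappas) <= 2 * omega / (np.pi * eps)
```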

Theorem 3.12 (Exact Arithmetic Shattering). Let A ∈ ℂ^{n×n} and X := A + γG_n for G_n a complex Ginibre matrix. Assume ‖A‖ ≤ 1 and 0 < γ < 1/2. Let g := grid(z, ω, ⌈8/ω⌉, ⌈8/ω⌉) with ω := γ^4/(4n^5), and z chosen uniformly at random from the square of side ω cornered at −4 − 4i. Then κ_V(X) ≤ n^2/γ, ‖A − X‖ ≤ 4γ, and Λ_ε(X) is shattered with respect to g for

    ε := γ^5/(16 n^9),

with probability at least 1 − 1/n − 12/n.


Figure 1: T is a sample of an upper triangular 10 × 10 Toeplitz matrix with zeros on the diagonal and an independent standard real Gaussian repeated along each diagonal above the main diagonal. G is a sample of a 10 × 10 complex Ginibre matrix with unit variance entries. Using the MATLAB package EigTool [WT02], the boundaries of the ε-pseudospectrum of T (left) and T + 10^{−6} G (right) for ε = 10^{−6} are plotted along with the spectra. The latter pseudospectrum is shattered with respect to the pictured grid.

Proof. Condition on the event in Theorem 1.4, so that

    κ_V(X) ≤ n^2/γ,  ‖X − A‖ ≤ 4γ,  and  gap(X) ≥ γ^4/n^5 = 4ω.

Consider the random grid g. Since D(0, 3) is contained in the square of side length 8 centered at the origin, every eigenvalue of X is contained in one square of g with probability 1. Moreover, since gap(X) ≥ 4ω, no square can contain two eigenvalues. Let

    dist_g(z) := min_{y ∈ g} |z − y|,

and let λ_i := λ_i(X). We now have for each i and every s < ω/2:

    ℙ[ dist_g(λ_i) > s ] = (ω − 2s)^2/ω^2 = 1 − 4s/ω + 4s^2/ω^2 ≥ 1 − 4s/ω,

since the distribution of λ_i inside its square is uniform with respect to Lebesgue measure. Setting s = ω/(4n^2), this probability is at least 1 − 1/n^2, so by a union bound

    ℙ[ min_{i≤n} dist_g(λ_i) > ω/(4n^2) ] > 1 − 1/n,    (23)

i.e., every eigenvalue is well-separated from g with probability 1 − 1/n. We now recall from (12) that

    Λ_ε(X) ⊂ ⋃_{i≤n} D(λ_i, ε κ_V(X)).

Thus, on the events (21) and (23), we see that Λ_ε(X) is shattered with respect to g as long as

    ε κ_V(X) < ω/(4n^2),

which is implied by

    ε < (γ^4/(4n^5)) · (γ/n^2) · (1/(4n^2)) = γ^5/(16 n^9).

Thus, the advertised claim holds with probability at least

    1 − 1/n − 12/n,

as desired.

Finally, we show that the shattering property is retained when the Gaussian perturbation is added in finite precision rather than exactly. This also serves as a pedagogical warmup for our presentation of more complicated algorithms later in the paper: we use E(⋅) to represent an adversarial roundoff error (as in step 2), and for simplicity neglect roundoff error completely in computations whose size does not grow with n (such as steps 3 and 4, which set scalar parameters).
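In exact (double-precision) arithmetic the construction of Theorem 3.12 can be sketched in a few lines; the routine below is our NumPy translation of the perturbation-plus-grid recipe, not the paper's finite-precision SHATTER.

```python
import numpy as np

def shatter(A, gamma, rng):
    # Sketch of the exact-arithmetic construction behind SHATTER: perturb by a
    # normalized complex Ginibre matrix, then draw the random grid of Theorem 3.12.
    n = A.shape[0]
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
    X = A + gamma * G                        # Ginibre perturbation
    omega = gamma ** 4 / (4 * n ** 5)        # grid side length
    z0 = complex(-4, -4) + omega * (rng.uniform() + 1j * rng.uniform())  # random corner
    eps = 0.5 * gamma ** 5 / (16 * n ** 9)   # shattering parameter
    return X, (z0, omega), eps

rng = np.random.default_rng(9)
A = np.diag(np.ones(3), 1)                   # 4x4 nilpotent Jordan block, ||A|| = 1
X, (z0, omega), eps = shatter(A, 0.4, rng)
ev = np.linalg.eigvals(X)
d = np.abs(ev[:, None] - ev[None, :])
np.fill_diagonal(d, np.inf)
assert d.min() > 1e-6                        # perturbed eigenvalues are well separated
```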

SHATTER
Input: Matrix A ∈ ℂ^{n×n}, Gaussian perturbation size γ ∈ (0, 1/2).
Requires: ‖A‖ ≤ 1.
Algorithm: (X, g, ε) = SHATTER(A, γ)

1. G_{i,j} ← N(1/√n) for i, j = 1, …, n.
2. X ← A + γG + E.
3. Let g be a random grid with ω = γ^4/(4n^5) and bottom left corner chosen as in Theorem 3.12.

4. ε ← (1/2) · γ^5/(16 n^9).

Output: Matrix X ∈ ℂ^{n×n}, grid g, shattering parameter ε > 0.
Ensures: ‖X − A‖ ≤ 4γ, κ_V(X) ≤ n^2/γ, and Λ_ε(X) is shattered with respect to g, with probability at least 1 − 1/n − 12/n.

Theorem 3.13 (Finite Arithmetic Shattering). Assume there is a c_N-stable Gaussian sampling algorithm N satisfying the requirements of Definition 2.5. Then SHATTER has the advertised guarantees as long as the machine precision satisfies

    u ≤ (1/((3 + c_N)√n)) · (1/2) · γ^5/(16 n^9),    (24)

and runs in

    n^2 T_N + 2n^2 = O(n^2)

arithmetic operations.

Proof. The two sources of error in SHATTER are:

1. An additive error from N of operator norm at most γ · n · c_N (1/√n) u ≤ c_N √n u, by Definition 2.5.

2. An additive error E of norm at most √n ‖X‖ u ≤ 3√n u, with probability at least 1 − 1/n, from the roundoff in step 2.

Thus, as long as the precision satisfies (24), we have

    ‖SHATTER(A, γ) − shatter(A, γ)‖ ≤ (1/2) · γ^5/(16 n^9),

where shatter(A, γ) refers to the (exact arithmetic) outcome of Theorem 3.12. The correctness of SHATTER now follows from Proposition 2.2. Its running time is bounded by

    n^2 T_N + 2n^2

arithmetic operations, as advertised.

4 Matrix Sign Function

The algorithmic centerpiece of this work is the analysis, in finite arithmetic, of a well-known iterative method for approximating the matrix sign function. Recall from Section 1 that if A is a matrix whose spectrum avoids the imaginary axis, then

    sgn(A) = P_+ − P_−,

where P_+ and P_− are the spectral projectors corresponding to eigenvalues of A in the open right and left half-planes, respectively. The iterative algorithm we consider approximates the matrix sign function by repeated application to A of the function

    g(z) := (1/2)(z + z^{-1}).    (25)

This is simply Newton's method to find a root of z^2 − 1, but one can verify that the function g fixes the left and right half-planes, and thus we should expect it to push those eigenvalues in the former towards −1, and those in the latter towards +1. We denote the specific finite-arithmetic implementation used in our algorithm by SGN; the pseudocode is provided below.

In Subsection 4.1 we briefly discuss the specific preliminaries that will be used throughout this section. In Subsection 4.2 we give a pseudospectral proof of the rapid global convergence of this iteration when implemented in exact arithmetic. In Subsection 4.3 we show that the proof provided in Subsection 4.2 is robust enough to handle the finite arithmetic case; a formal statement of this main result is the content of Theorem 4.9.
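In exact arithmetic the iteration is a few lines of code; the following is a minimal sketch (our illustration, without the scaling or finite-precision safeguards that the analysis of SGN addresses), on a test matrix with a known sign function.

```python
import numpy as np

def newton_sign(A, iters=30):
    # Newton iteration X <- (X + X^{-1})/2 for the matrix sign function.
    # Assumes Lambda(A) avoids the imaginary axis.
    X = A.astype(complex)
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X

rng = np.random.default_rng(10)
n = 6
D = np.diag([1.5, 2.0 + 1j, -1.0, -0.5 + 0.5j, -2.0, -3.0 + 1j])
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))     # mildly conditioned similarity
A = V @ D @ np.linalg.inv(V)                          # known spectrum in both half-planes

S = newton_sign(A)
signs = np.sign(np.diag(D).real)                      # sgn acts on the eigenvalues
S_exact = V @ np.diag(signs) @ np.linalg.inv(V)
assert np.linalg.norm(S - S_exact) < 1e-8
assert np.linalg.norm(S @ S - np.eye(n)) < 1e-8       # sgn(A)^2 = I
```

Convergence is quadratic in the Apollonius parameter introduced in Section 4.1, so a handful of iterations already reaches roundoff accuracy.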

SGN
Input: Matrix A ∈ ℂ^{n×n}, pseudospectral guarantee ε_0, circle parameter α, and desired accuracy β.
Requires: Λ_{ε_0}(A) ⊂ C_α.
Algorithm: S = SGN(A, ε_0, α, β)

1. N ← ⌈ lg(1/(1 − α)) + 3 lg lg(1/(1 − α)) + lg lg(1/(β ε_0)) + 7.59 ⌉
2. A_0 ← A
3. For k = 1, ..., N,

    (a) A_k ← (1/2)(A_{k−1} + A_{k−1}^{-1}) + E_k

4. S ← A_N

Output: Approximate matrix sign function S.
Ensures: ‖S − sgn(A)‖ ≤ β.

4.1 Circles of Apollonius

It has been known since antiquity that a circle in the plane may be described as the set of points with a fixed ratio of distances to two focal points. By fixing the focal points and varying the ratio in question, we get a family of circles named for the Greek geometer Apollonius of Perga. We will exploit several interesting properties enjoyed by these circles of Apollonius in the analysis below.

More precisely, we analyze the Newton iteration map in terms of the family of Apollonian circles whose foci are the points ±1 ∈ ℂ. For the remainder of this section we will write

    m(z) = (1 − z)/(1 + z)

for the Möbius transformation taking the right half-plane to the unit disk, and for each α ∈ (0, 1) we denote by

    C_α^+ = {z ∈ ℂ : |m(z)| ≤ α},    C_α^− = {z ∈ ℂ : |m(z)^{-1}| ≤ α}

the closed regions in the right (respectively, left) half-plane bounded by such a circle. Write ∂C_α^+ and ∂C_α^− for their boundaries, and C_α = C_α^+ ∪ C_α^− for their union. See Figure 2 for an illustration.

The region C_α^+ is a disk centered at (1 + α^2)/(1 − α^2) ∈ ℝ, with radius 2α/(1 − α^2), and whose intersection with the real line is the interval (m(α), m(α)^{-1}); C_α^− can be obtained by reflecting C_α^+ with respect to the imaginary axis. For θ > 0, we will write

> A , + = C + C + A+ , , A− , for the Apollonian annulus lying inside C+ and outside⧵ C+ ; note that the circles are not concentric so this is not strictly speaking an annulus, and note also that in our notation this set does not include )C+ . In the same way define A− , for the left half-plane and write A , = A+ , ∪ A− , .

Observation 4.1 ([Rob80]). The Newton map is a two-to-one map from C+ to C+ , and a two- 2 to-one map from C− to C − . 2 g

28 3

2

1

1 2 3 4 5 −1

−2

−3

Figure 2: Apollonian circles appearing in the analysis of the Newton iteration. Depicted are C for + and , with smaller circles corresponding to larger . 2 = 0 8 = 0 1 2 3 ) k . k , , , k Proof. This follows from the fact that for each in the right half-plane,

1− ( + 1/z ) (1 − ) ó 1 ó 2 ( ( )) = ó 2 ó = ó ó = ( ) 2 | | ó1+ 1 ( + 1/ )ó ó( + 1)2 ó | | ó z z ó ó z ó m g z ó 2 ó ó ó m z and similarly for the left half-plane. z z z It follows from Observation 4.1 that under repeated application of the Newton map , any point in the right or left half-plane converges to +1 or −1, respectively. g 4.2 Exact Arithmetic

In this section, we set and k k for all . In the case of exact arithmetic, k Observation 4.1 implies0 global convergence+1 of the Newton iteration when is diagonalizable. For the convenience ofA the∶= readerA weA provide∶= g this(A ) argumentk ≥ (due0 to [Rob80]) below. A A Proposition 4.2. Let be a diagonalizable matrix and assume that C for some . Then for every we have the guarantee A ℕ n × n Λ(A) ⊂ , N ∈ (0 1) ∈ 2 N 4 N V ‖ sgn( )‖ 2 +1 ( ) N + 1 A A ≤ ⋅  A . Moreover, when does not have eigenvalues− on the imaginary axis the minimum for which C is given by Λ( ) A 4 i A ⊂ 2 max 1− i n Re( ( )) = < i 2 = 1 (| ) − sgn(A |)

≤ ≤ | A A |

29 n N Proof. Consider the spectral decomposition i i iwi∗, and denote by i( ) the eigenvalues =1 of AN . = A ∑  v N By Observation 4.1 we have that AN ⊂ C and i i( ) . Moreover, AN and Λ( ) 2 sgn( ) = sgn( ) A have the same eigenvectors. Hence N sgn( )

N N AN A i( ) viwi∗ i( ) viwi∗ . (26) − sgn( ) ô  > ( − 1) ô + ô  < ( + 1) ô ôRe( ) 0 ô ôRe( ) 0 ô ‖ ‖≤ ô ∑ ô ô ∑ ô ô i ô ô i ô Now we will use that for any matrixô X we have thatô Xô V X spr X whereô spr X denotes the of X. Observe that the spectral radii of the( two) matrices( ) appearing( ) on the right hand side of (26) are bounded by maxi i − i ‖, which‖≤ in turn is bounded by the radius of the circle C , namely +1 . On thesgn( other) hand, the eigenvector condition number + 2 2 2 2 N /( N + 1) | | of these matricesN is bounded by V A . This concludes the first part of the statement. In order to compute note that( if)z with , then =x+ x > 0

2 iy 2 2 (1−x) + y 4x ( ) = 2 2 =1− 2 2 , (1+x) + y (1+x) + y |m z | and analogously when < and we evaluate −2. x 0 ( ) The above analysis becomes useless when|m tryingz | to prove the same statement in the frame- work of finite arithmetic. This is due to the fact that at each step of the iteration the roundoff error can make the eigenvector condition numbers of the k grow. In fact, since V k is sensitive to infinitesimal perturbations whenever k has a multiple eigenvalue, it seems difficult( ) to control it against adversarial perturbations as the iteration convergesA to k (which hasA very high mul- tiplicity eigenvalues). A different approach,A also due to [Rob80sgn(], yields) a proof of convergence in exact arithmetic even when is not diagonalizable. However, thatA proof relies heavily on the fact that N is an exact power of , or more precisely, it requires the sequence k to have the same generalized( ) eigenvectors,A which( 0) is again not the case in the finite arithmetic setting. Therefore,m A a robust version, tolerantm A to perturbations, of the above proof is needed. ToA this end, instead of simultaneously keeping track of the eigenvector condition number and the spectrumof the matrices k, we will just show that for certain k , the k pseudospectra of these matrices are contained in a certain shrinking region dependent0 on . This− invariant is inherently robust to perturbationsA smaller than k, unaffected by clustering > of eigenvalues due to convergence, and allows us to bound the accuracy and other quantities ofk interest via the functional calculus. For example, the following lemma shows how to obtain a bound on N solely using information from the pseudospectrum of N . − sgn( ) ‖A A ‖ Lemma 4.3 (Pseudospectral Error Bound)A. Let be any matrix and let N be the th iterate of the Newton iteration under exact arithmetic. Assume× that N and N satisfy  N C . 
Then we have the guarantee A n n 0 A∈ (0 1) N Λ ( )  > , N N A ⊂ N2 N 8 (27) − sgn( ) N 2 N N (1 − ) (1 + ) ‖A A ‖≤ .  30 Proof. Note that N . Using the functional calculus we get sgn( ) = sgn( ) A A N N 1 z z AN −1 dz 1 z AN −1 dz z AN −1 dz − sgn( ) = ô i )C ( − ) − i )C ( − ) − )C ( − ) ô ô2 2 + − ô ‖A A ‖ ô ô ô  N  N  N ô ô H Iô ô 1 z z AN −1 z AN −1 dz 1 z z AN −1 z AN −1 dzô i )C i )C = ô + ( − ) −( − ) + − ( − ) +( − ) ô ô2 2 ô ô ô ô  N  N ô ô1 z z AN −1 dz 1 z z AN −1 dz ô  )C  )C ô + ( − 1)( − ) ô + ô − ( + 1)( − ) ô 2 ô ô 2 ô ô ≤ ô ô ô ô ô NC ô C ô N ô ô1 ) + z z ô + ô1 ô 2  ( ) sup{ − 1 ∶ ∈ }N 2 N N ≤ ⋅ N ‹ N | | 4 1+ 1 = N2 N − 1 N 1− 1− N2 08 1. = N 2 N N (1 − ) (1 + )

In view of Lemma 4.3, we would now like to find sequences k and k such that

 Ak ⊂ C Λ ( ) k k and k2 k converges rapidly to zero. The dependence of this quantity on the square of k turns out to be/ crucial. As we will see below, we can find such a sequence with k shrinking roughly at the same rate as k. This yields quadratic convergence, which will be necessary for our bound on the required machine precision in the finite arithmetic analysis of Section 4.3. The lemma below is instrumental in determining the sequences k,k.

Lemma 4.4 (Key Lemma). If  A ⊂ C , then for every ¨ > 2, we have  C where Λ ( ) Λ ¨ ( ( )) ¨ ¨ 2 2 g A ⊂ ¨ ( − )(1 − ) ∶= 8   .

Proof. From the definition of pseudospectrum, our hypothesis implies −1 <  for every ( − ) 1/ z outside of C . The proof will hinge on the observation that, for each ¨ 2, , this resolvent bound allows us to bound the resolvent of everywhere in the Appolonian‖ z∈(A annulus‖) A . , ¨ Let w A ; see Figure 3 for an illustration.( ) We must show that w . Since , ¨ ¨ w C , Observation∈ 4.1 ensures no z gCA satisfies w; in other words,∉ Λ ( the( )) function ∉ 2 ∈ ( ) = g A w −1 is holomorphic in on C . As  C , Observation 4.1 also guarantees that( − ( )) C . Thus for w in the unionΛ( of) theΛ two( g) Appolonianz annuli in question, we can 2 calculateΛ(g z( the)) resolvent of atz w using the holomorphicA ⊂ A ⊂ functional calculus: g A ⊂ ( ) g A w −1 1 w −1 −1d ( − ( )) = i )C ( − ( )) ( − ) 2 g A g z z A z,  31 +

C 0 5 + . ¨ + ( ) C2 1 2 g z C w −0 5 z .

Figure 3: Illustration of the proof of Lemma 4.4

where by this we mean to sum the integrals over C+ and C− , both positively oriented. Taking norms, passing inside the integral, and applying Observation 4.1 one final time, we get: ) )

w −1 1 w −1 −1 d ( − ( ))  )C ( − ( )) ( − ) 2 ô g A ô ≤ C | g zw |⋅‖y z A )‖C z w y ô ô + y C+ −1 − y C− −1 ( )sup ∈ 2 ( − ) + (( )sup ∈ 2 ( − )  ‹ ) | 2| ‹ | | ≤ 1 8 .  ¨ 2 2 ( − )(1 − ) ≤ In the last step we also use the forthcoming Lemma 4.5. Thus, with ¨ defined as in the theorem statement, A , contains none of the  -pseudospectrum of . Since C , Theorem ¨ ¨ 2 2.3 tells us that therecan be no -pseudospectrum in the remainder( ) of Λ(C ,( as)) such a connected ¨ ¨ component would need to contain an eigenvalue of . g A g A ⊂  ( ) ℂ⧵ Lemma 4.5. Let > be given. Then for any g AC and C , we have . 1 0 x ∈ ) y ∈ ) x−y ( − )/2 Proof. Without loss> of , generality C+ and C+ . Then we have | |≥ x ∈ ) y ∈ )

2 x − y − = (x) − ( ) (x) − ( ) = 2 x − y 1 +| x 1 +| y | | ||m | |m y || ≤ |m m y | ≤ | |. | || |

Lemma 4.4 will also be useful in bounding the condition numbers of the k , which is necessary for the finite arithmetic analysis. A Corollary 4.6 (Condition Number Bound). Using the notation of Lemma 4.4, if  C , then Λ ( ) A ⊂ −1 1 and 4 2 (1 − ) ‖A ‖≤ ‖A‖ ≤ .  32  Proof. The bound −1 follows from the fact that C  In order to bound we use the contour integral bound1/ 0 ∉ Λ ( ) ‖A ‖ ≤  ⊃ A . A

1 z z A −1 dz = ô i )C ( − ) ô ô2 ô ‖A‖ ô C ô ô )  ô ô ( ) z 1 ô  zsup)C  ‹ 2 ∈ ≤ | | 4 1+0 1 . 1 = 2  1− 1−

Another direct application of Lemma 4.4 yields the following.

Lemma 4.7. Let  > . If  A ⊂ C , and >D> then for every N we have the guarantee 0 Λ ( ) 1/ 1  AN ⊂ C , Λ ( ) NN N  D 2 for N D 2 D and N ( −1)(1−D ) . N =( ) / = N 8  Proof. Define recursively ,  , k D k2 and k 1 k k D 2 . It is easy to , 0 0 +1 +1 8 0 0 0 see by induction that this definition= = is consistent= with the definition= ( of− 1)(1N and − N) given in the statement. We will now show by induction that  Ak ⊂ C . Assume the statement is true for k, so from Lemma 4.4 we have that the statementΛ ( is also) true for Ak if we pick the pseudospectral k k parameter to be +1 k k2 k2 ¨ k ( +1 − )(1 − ) 1k k D k2 . = k = ( − 1)(1 − ) 8 8 On the other hand

1k k D k2 1k k D 2 k , ( − 1)(1 − ) ( − 1)(1 − 0 )= +1 8 8 which concludes the proof of the first statement.≥ We are now ready to prove the main result of this section, a pseudospectral version of Propo- sition 4.2.

n n Proposition 4.8. Let A × be a diagonalizable matrix and assume that  A ⊂ C for some ∈ Λ ( ) , . Then, for any

33 Proof. Using the choice of k and k given in the proof of Lemma 4.7 and the bound (27), we get that

 N2 AN A 8 − sgn( ) N 2 N N (1 − ) (1 + ) N ‖ ‖≤  N D 8 0 8 =  N 2 N D 2 0(1 − ) (1 + ) ( − 1)(1 − 0 ) N 0D3 1 D D 2 8 0 8 =( 0) N D D 2 2 D D 2  D 2 ( −( 0) N ) ( +( 0) N ) 0N ( − 1)(1 − 0 ) D2 D 0 1 D 2 8 0 8 ( 0) N D 2 D 2 ( − 1) 0 ( − 1)(1 − 0 ) ≤ N  202 D 1 +2 D 2 0(1 − 0 ) 8 , =( 0) N  D 2 8 0 ( − 1)(1 − 0 ) where the last inequality was taken solely to make0 the expression1more intuitive, since not much is lost by doing so.

4.3 Finite Arithmetic Finally, we turn to the analysis of SGN in finite arithmetic. By making the machine precision small enough, we can bound the effect of roundoff to ensure that the parameters k, k are not too far from what they would have been in the exact arithmetic analysis above. We will stop the iteration before any of the quantities involved become exponentially small, so we will only need , , bits of precision, where is the accuracy parameter. polylog(1 − 0 0 ) In exact arithmetic, recall that the Newton iteration is given by Ak k 1 k −1k Here we will consider the finite arithmetic version G of the Newton map+1 = , defined( ) = 2 as( G+ ) A where A is an adversarial perturbation coming from the round-offg A error. HencA ( e,A) the. G g A sequence of interest is given by and k G k . ∶= k g(AIn)+ thisE subsectionE we will prove0 the following+1 theorem concerning the runtime and precision ž ž ž ž of SGN. Our assumptions on theA size∶= A of theA parameters∶= (A ) are in place only to simplify the A analysis of constants; these assumptions are not required for0 the execution of the algorithm. , Theorem 4.9 (Main guarantees for SGN). Assume INV is a INV n ,cINV -stable matrix inversion algorithm satisfying Definition 2.7. Let  , , , , and assume A Až has its  - 0 0 0 pseudospectrum contained in C where < < . Run( SGN( )with) 0 0 ∈ (0 1) ∈ (0 1/12) = N 0 1− 1/100  . 0 0 0 ¡ iterations (as specified in⌈ the statement of the algorithm). Then AN SGN A satisfies⌉ the advertised accuracy guarantee = lg(1/(1 − )) + 3lglg(1/(1 − )) + lg lg(1/( ))+7 59 ¡ AN A = ( ) sgn( ) ‖ − ‖≤ 34 when run with machine precision satisfying

+1 cINV n 2 ( log +3) u 0 N , INV n nN 2 ( ) ≤ √ corresponding to at most

u O n 3  lg(1/ )= (log log (1/(1 − 0))(log(1/ ) + log(1/ 0))) required bits of precision. The number of arithmetic operations is at most

N n2 TINV n . (4 + ( )) Later on, we will need to call SGN on a matrix with shattered pseudospectrum; the lemma below calculates acceptable parameter settings for shattering so that the pseudospectrum is con- tained in the required pair of Appolonian circles, satisfying the hypothesis of Theorem 4.9.

Lemma 4.10. If A has -pseudospectrum shattered with respect to a grid g grid z ,!,s , s that 0 1 2 includes the imaginary axis as a grid line, then one has  A ⊆ C where =  ( and ) Λ 0 ( ) 0 0 = /2  . 0 =1− g 2 diam( ) In particular, if  is at least n and !s and !s are at most n), then  and are also at least n . 1/poly( ) 1 2 poly( 0 1− 0 1/poly( ) Proof. First, because it is shattered, the  -pseudospectrum of A is at least distance  from g. Recycling the calculation from Proposition/24.2, it suffices to take /2

4 z 2 max 1− . z A Re 0 = z z 2 ∈Λ /2( ) −| sgn(| )  From what we just observed about the pseudospectrum,0 | we| can1 take z  . To bound the denominator, we can use the crude bound that any two points inside theRe grid are/2 at distance no more than g . Finally, we use for any . | | ≥ diam( ) √1 − x 1 − x/2 x ∈ (0,1) The proof of Theorem 4.9 will proceed≤ as in the exact arithmetic case, with the modification that k must be decreased by an additional factor after each iteration to account for roundoff. At each step, we set the machine precision u small enough so that the k remain close to what they would be in exact arithmetic. For the analysis we will introduce an exp licit auxiliary sequence k that lower bounds the k, provided that u is small enough. e Lemma 4.11 (One-step additive error). Assume the matrix inverse is computed by an algorithm INV satisfying the guarantee in Definition 2.7. Then G for some error matrix with norm ( ) = ( )+ AcINV gn A E E −1 INV log −1 u (28) + + ( ) ( ) 4√ ‖E‖ ≤ ‖A‖ ‖A ‖  n  A ‖A ‖ n . 35 The proof of this lemma is deferred to Appendix A. With the error bound for each step in hand, we now move to the analysis of the whole iteration. It will be convenient to define , which should be thought of as a small parameter. As 0 in the exact arithmetic case, for ∶= 1−we will recursively define decreasing sequences k and k maintaining the property s 1 s k ≥ , k C for all (29) Λ ( ) 0 by induction as follows: k Až ⊂ k k ≥

1. The base case holds because by assumption,  C . = 0 Λ 0 0 2. Here we recursivelyk define k . Set ⊂ k +1

k k2 +1 ∶= (1+ /4) In the notation of Subsection 4.2, this correspondss to. setting . This definition = 1+ /4 ensures that k2 k k for all , and also gives us the bound . We also have the closed+1 form D (1 + s/4) 0 1− /2 ≤ ≤ k s ≤ s k 2 −1 2 = (1+ /4) k 0 k which implies the useful bound s , k 2 (30) (1 − /2) k 3. Here we recursively define k . Combining ≤ Lemmas 4.4. , the recursive definition of k , and the fact that +1 , we find that C , where +1 k2 2 ¨ k 1− 1− 0 1− 0 = Λ ( ) +1 ≥ ≥ s g Až ⊂ k k k2 k2 k k2  k 2 ¨ k +1 − (1 − ) k (1 − ) k = k = 8  s 32 32s    ≥ . Thus in particular k C  s2 Λ /32 ( ) +1 Since G , fork k some error matrixk arising from roundoff, Proposi- k k k k g Až  ⊂ . k k tion 2.2 ensures+1 = ( that) = if we( set)+ Až Až g Až E 2 k E E k k k (31) k +1 ∶= − s32 we will have  k C as desired. ‖E ‖  Λ +1 ( +1) +1 k k We now need to showAž that⊂ the k, do not decrease too fast as increases. In view of (31), it will be helpful to set the machine precision small enough to guarantee that k is a small fraction s  k of k 2 . 32k ‖E ‖ First, we need to control the quantities k , k , and k k k appearing in our  −1 −1 upper bound (28) on k from Lemma 4.11, as functions of k.( By) Corollary = 4.6, we have ‖Až ‖ ‖Až ‖  Až ‖Až ‖‖Až ‖ ‖E ‖ k  k−1 1 and k 4 k 4 k 2 k 2 k (1 − ) ‖Až ‖≤ ‖Až ‖≤ ≤ .   s  36 Thus, we may write the coefficient of u in the bound (28) as

cINV n log  4 1 INV 4 1  ∶= 2 k + k + ( ) 2 k2 k 4 K k  n √n K k so that Lemma 4.11 reads s   0s  1  L M k  u (32)

Plugging this into the definition (31) of k ,we havek +1‖E ‖ ≤ K . 2 k k k  u (33) +1 − s32 k Now suppose we take u small enough ≥ so that K .

2 k  u 1 k (34)

k 3 s32 For such u, we then have K ≤  . 2 k k 2 k (35) +1 3 s32 which implies  ≥  , k 1 k (36) +1 this bound is loose but sufficient for our purposes.2 Inductively, we now have the following bound ‖E ‖ ≤  ; on k in terms of k:

Lemma 4.12 (Preliminary lower bound on k). Let , and for all , assume u satisfies the requirement (34):  k ≥ ≤i≤k 2 i 0 0 − 1  u i

Then we have i s K ≤ 1 .k 3 322 k k k k 0 s In fact, it suffices to assume the hypothesis ≥e only for . . e ∶= 0 1 50 Proof. The last statement follows from the fact thati ikis decreasing in and  is increasing in . Since (34) implies (35), we may apply (35) repeatedly= − to 1 obtain  i K i i k k −1 k 2 i 0 i =0  ≥ s k k ( 2/48) ∏ 2 −1− 2 −1 by the definition of i 0 k k k 0 2 k =  (s /48) (1 + s/4) 0 s 0  k = 0 2 s 1 48(1 + k/4) < 0 0 s ≥ . ≤ , s 0 1 1 1/8 50 37 We now show that the conclusion of Lemma 4.12 still holds if we replace i everywhere in the hypothesis by ei, which is an explicit function of  and defined in Lemma 4.12. Note that we 0 0 do not know i ei a priori, so to avoid circularity we must use a short inductive argument.

Corollary 4.13≥(Lower bound on k with explicit hypothesis). Let k , and for all i k , assume u satisfies ≥ ≤ ≤ s2 i 0 0 − 1 Ke u ei (37)

i where ei is defined in Lemma 4.12. Then we have≤ 1 3 32 k ek.

In fact, it suffices to assume the hypothesis only for≥ i k .

Proof. The last statement follows from the fact that ei is decreasing in i and Ke is increasing in i. Assuming the full hypothesis of this lemma, we= prove− 1 e for i k by induction on i. i i i For the base case, we have  e  . 0 0 0 0 ≥ ≤ ≤ For the inductive step, assume i ei. Then as long as i k , the0 hypothesis of this lemma implies ≥ = ≥ ≤ s2 i − 1 K u i ,

i so we may apply Lemma 4.12 to obtain i ≤ei 1 , as desired. +1 +1 3 32 Lemma 4.14 (Main accuracy bound). Suppose≥ u satisfies the requirement (34) for all k N . Then N N k ≤ ≤ ž −1 0 AN A 8 8 N50 2 (38) sgn( ) s k k2 + 2 +2 (1 − /2) N =0 ‖E+1‖ ⋅ 0 Proof. Since ,‖ for− every we‖≤ have∑ s . sgn = sgn  s  k ◦g k k k k k k k sgn( +1) − sgn( ) = sgn( +1) − sgn( ( )) = sgn( +1) − sgn( +1 − ) From the‖ holomorphicA¡ functionalAž ‖ ‖ calculusA¡ we cang rewriteAž ‖ ‖ kA¡ kA¡ k E as‖. the norm of a certain contour integral, which in turn can be boundedsgn( as follows+1)−sgn(: +1 − ) ‖ A¡ A¡ E ‖ ¡ ¡ 1 z Ak −1 z Ak k −1 k −1 k k −1  )C +1 +1 )C +1 +1 ô + [( − ) −( −( − )) ] − − [( − ) −( −( − )) ] ô 2 ô +1 +1 ô ô E dz z A¡ z A¡ E dzô ô k ¡  k ô 1 ô z Ak k −1 k k −1 k k −1 k k −1 ô  )C +1 +1 )C +1 +1 = ô + [( −( − )) ( − ) ] − − [( −( − )) ( − ) ] ô 2 ô +1 +1 ô ô E E z A¡ dz z A¡ E E z A¡ dzô ô k ¡  k ô 1 ô z Ak k −1 k k −1 ô  C +1 +1 ) + ( −( − )) ( − ) ≤ +1 ‖ E ‖‖E ‖‖ z A¡ ‖ dz  k 1 )C + k 1 1  ( +1 ) k k k k +1 − +1 ≤ ‹ k ‖E ‖ 4 +1 k  1 ‖E ‖1 = k2 k k k 1− +1 +1 − +1 ‖E ‖ ,  ‖E ‖  38 where we use the definition (6) of pseudospectrum and Proposition 2.2, together with the property (29). Ultimately, this chain of inequalities implies

k k k k 4 +1 k 1 1 sgn( +1) − sgn( +1 − ) k2 k k k 1− +1 +1 − +1 ‖ A¡ A¡ E ‖≤ ‖E ‖ . Summing over all and using the triangle inequality, we obtain ‖E ‖ 

N k −1 k N 4 +1 k 1 1 sgn( ) − sgn( 0) k k2 k k k =1 +1 +1 N 1− +1 − ‖ A¡ Až ‖≤ ∑ k ‖E ‖ 8 −1  ‖E ‖  k k2 =0 ‖E+1‖ ≤ ∑ , where in the last step we use k and sk2  , as well as (36). By Lemma 4.3, we have 1 1− +1 ≤ ≥ s N2 N N 8 − sgn( ) N 2 N N (1 − ) (1 + ) ‖A¡ A¡ ‖≤ N 8 N  N 2 ≤ N s8 N 1 50 2 2 0 ≤ N s  0 s 1 8 2 50 2 (1 − /2) N 2 0 ≤ N s 0 1 s8 N50 2 s 2 +2 (1 − /2) N ⋅ 0 ≤ s . where we use < in the last step. s  Combining the1/2 above with the triangle inequality, we obtain the desired bound. s

We would like to apply Lemma 4.14 to ensure A¡N A is at most , the desired accuracy parameter. The upper bound (38) in Lemma . is the− sum sgn( of) two terms; we will make each term less than . The bound for the second term4 14 will‖ yield a sufficient‖ condition on the number of iterations N/2. Given that, the bound on the first term will then give a sufficient condition on the machine precision u. This will be the content of Lemmas 4.16 and 4.17. We start with the second term. The following preliminary lemma will be useful: Lemma 4.15. Let >t> and >c> be given. Then for 1/800 0 1/2 0 j t t c . , lg(1/ ) + 2 lg lg(1/ ) + lg lg(1/ ) + 1 62 we have ≥ t 2 (1 − j ) j < c. t2 39 The proof is deferred to Appendix A. Lemma 4.16 (Bound on second term of (38)). Suppose we have

N s s s2 . . lg(8/ ) + 2 lg lg(8/ ) + lg lg(16/( 0))+1 62 Then ≥ N

8 N50 s 2 . s2 +2 (1 − /2) N /2 ⋅ 0 Proof. It is sufficient that ≤ N

8 N64 s 2 . s2 +2 (1 − /8) N /2 ⋅ 0 The result now follows from applying Lemma 4.15 with≤ c s2 and t s . = 0/16 = /8

Now we move to the first term in the bound of Lemma 4.14. Lemma 4.17 (Bound on first term of (38)). Suppose

N s s s2 . , lg(8/ ) + 2 lg lg(8/ ) + lg lg(16/( 0))+1 62 and suppose the machine precision≥ u satisfies

+1 cINV n s 2 ( log +3) u (1 − ) N . INV n nN 2 ( ) Then we have ≤ √ N k 8 −1 . s k k2 /2 =0 ‖E+1‖ ∑ ≤ Proof. It suffices to show that for all k N , 0 − 1 ≤ ≤  s k2 . k N+1 16 ‖E ‖≤ In view of (32), which says k  u, it is sufficient to have for all 0 − 1 k ‖E ‖ ≤ K k2 s ≤k≤N u 1 +1 . (39)  N 16 ≤ For this, we claim it is sufficient to have for allK k k N 0 − 1 e≤ k2 ≤s u 1 +1 . (40) Ke N 16 ≤ k Indeed, on the one hand, since < and by the loose bound ek < s k < s k we have that +1 +1 s2e 1/6 (40) implies u K1 , which means that the assumption in Corollary 4.13 is satisfied. On the 3 32k ek ≤ 40 other hand Corollary 4.13 yields ek k for all k N , which in turn, combined with (40) would give (39) and conclude the proof. 0 We now show that (40) holds for≤ all k N ≤ . Because≤ Ke and ek are decreasing in k, 0 − 1 1/ it is sufficient to have the single condition k ≤ ≤ eN2 s u 1 . Ke N 16 We continue the chain of sufficient conditions≤ onN u, where each line implies the line above:

eN2 s u 1 Ke N 16 ≤ N e s u N2 1 cINV n log N s 4e e1 INV n s 4e e1 n 16 ≤ 2 + + ( ) 2 2 4 N N  N √ N e s u 4  N2 5 1 cINV n log +1 N INV n s 4e n 6 ( ) 2 4 16 ≤ cINV n N e √s log +3 u  N 2 . INV n nN 6 4 16 ( ) 4 ≤ √ where we use the bound e1 ⋅ ⋅ s 4e without much0 loss,1 and we also assume INV n and 2 2 ( ) 1 cINV n for simplicity.N ≤ N ≥ Substitutinglog 1 the value of eN as defined in Lemma 4.12, we get the sufficient condition ≥ N cINV n  s2 N s2 log +3 u 0( /50) . INV n nN 384 ( ) 4 ≤ √ Replacing N by the smaller quantity s and cleaning up the constants yields the 2N0 2N 1 sufficient condition 0 = (1− )

N cINV n  s2 s 2 s2 log +3 u 0( /50) (1 − ) N . INV n nN 400 ( ) 4 Now we finally will≤ use our hypothesis√ on the size of N to simplify this expression. Applying 0 1 Lemma 4.16, we have s N 2N  s2 4(1 − ) . 0( /50) /4 Thus, our sufficient condition becomes ≥

cINV n +1 s 2 log +3 u 4(1 − ) N . INV n nN 400 ( ) ≤ √ To make the expression simpler, since cINV0 n 1we may pull out a factor of 4 > and remove the occurrences of to yield the sufficientlog + 3 condition4 4 200 ≥ 41 +1 cINV n s 2 ( log +3) u (1 − ) N . INV n nN 2 ( ) ≤ √

Matching the statement of Theorem 4.9, we give a slightly cleaner sufficient condition on N that implies the hypothesis on N appearing in the above lemmas. The proof is deferred to Appendix A.

Lemma 4.18 (Final sufficient condition on N ). If

N s s  . , = lg(1/ ) + 3 lg lg(1/ ) + lg lg(1/( 0))+7 59 then ⌈ ⌉

N s s s2 . . lg(8/ ) + 2 lg lg(8/ ) + lg lg(16/( 0))+1 62 Taking the logarithm of≥ the machine precision yields the number of bits required: Lemma 4.19 (Bit length computation). Suppose

N s s  . = lg(1/ ) + 3 lg lg(1/ ) + lg lg(1/( 0))+7 59 and ⌈ ⌉ +1 cINV n s 2 ( log +3) u (1 − ) N . INV n nN 2 ( ) Then ≤ √

u O n s 3  . log(1/ )= log log(1/ ) (log(1/ ) + log(1/ 0)) Proof. Immediately we have 

N u O INV n n N n +1 s . log(1/ )= log(1/ ) + log ( ) + log + log + (log )2 log(1/(1 − )) N We first focus on the term +1 s . Note that s O s . Thus,  2 log(1/(1 − )) log(1/(1 − )) = ( ) N s  . +1 s s 3 lglg(1/ )+lglg(1/( 0))+9 59 O s O s 3  . 2 log(1/(1 − )) = (1/ ) 2 ( )= (log(1/ ) (log(1/ ) + log(1/ 0))) Using that INV n n and⋅ discarding subdominant⋅ terms, we obtain the desired bound. ( )=poly( )

This completes the proof of Theorem 4.9. Finally, we may prove the theorem advertised in Section 1.

Proof of Theorem 1.5. Set  K1 , . Then  A does not intersect the imaginary axis, and min{ 1} Λ ( ) furthermore  A ⊆ D , because A . Thus, we mayapplyLemma 4.10 with g to obtain parametersΛ ( ) (0,2)with∶= the property1 that and  arediam( both O) = 4K√2. Theorem 4.9 now yields0 the0 desired conclusion.‖ ‖≤ log(1/(1 − 0)) log(1/ 0) (log )

42 5 Spectral Bisection Algorithm

In this section we will prove Theorem 1.6. As discussed in Section 1, our algorithm is not new, and in its idealized form it reduces to the two following tasks:

Split: Given an n n matrix A, find a partition of the spectrum into pieces of roughly equal size, and output× spectral projectors onto each of these pieces. ± Deflate: Given an rank- projector , outputP an matrix with orthogonal columns that span the× range of . × n n k P n k Q These routines in hand, on inputP one can compute and the corresponding , and then ± ± find the eigenvectors and eigenvalues of ∗ . The observation below verifies that this recursion is sound. A ± ∶= ± ± P Q A Q AQ Observation 5.1. The spectrum of is exactly , and every eigenvector of is of the form for some eigenvector of one of Λ(. +) Λ( −) ± A ± A ⊔ A A The difficulty,Q v of course, is thatv neither ofA these routines can be executed exactly: we will never have access to true projectors , nor to the actual orthogonal matrices whose columns span their range, and must instead make± do with approximations. Because our± algorithm is re- cursive and our matrices nonnormal,P we must take care that the errors in theQ sub-instances do not corrupt the eigenvectors and eigenvalues we are hoping to find. Additionally, the Newton± iteration we will use to split the spectrum behaves poorly when an eigenvalue is close to theA imaginary axis, and it is not clear how to find a splitting which is balanced. Our tactic in resolving these issues will be to pass to our algorithms a matrix and a grid with respect to which its -pseudospectrum is shattered. To find an approximate eigenvalue, then, one can settle for locating the grid square it lies in; containment in a grid square is robust to perturbations of size smaller than . The shattering property is robust to small perturbations, inherited by the subproblems we pass to, and—because the spectrum is quantifiably far from the grid lines—allows us to run the Newton iteration in the first place. Let us now sketch the implementations and state carefully the guarantees for SPLIT and DEFLATE; the analysis of these will be deferred to Appendices B and C. 
Our splitting algorithm is presented a matrix whose -pseudospectrum is shattered with respect to a grid g. For any vertical grid line with real part , gives the difference between the number of eigen- values lying to its leftA and right. AsTr sgn( − ) ℎ A ℎ SGN SGN Tr ( − ) − Trsgn( − ) ( − ) − sgn( − ) we can determine| these eigenvalueA ℎ countsAexactlyℎ |≤n‖by runningA SGNℎ to accuracyA ℎ ‖, and round- ing SGN to the nearest integer. We will show in Appendix B that, by mounting(1/ ) a binary searchTr over( horizontal− ) and vertical lines of g, we will always arrive at a partitionO ofn the eigenval- ues into twoA partsℎ with size at least . Having found it, we run SGN one final time at the desired precision to find the approximatemin{ /5 spectral1} projectors. n ,

43 SPLIT n n Input: Matrix × , pseudospectral parameter , grid g grid , and desired accuracy ∈ = ( 0 1 2) Requires:  A is shattered with respect to g, and . n Algorithm:Λ (žA)g ℂ SPLIT g  0 05/ z ,!,s , s ( ± ± ±) = ( ) ≤ 1. ExecuteP a, binary, n search overA, ,horizontal, grid shifts ℎ until

 SGN A ℎ,  , , n . Tr − /4 1 − g 3 /5 2 diam( )2 ≤ 2. If this fails, set A iA and repeat0 with vertical grid shifts 1 ← 3. Once a shift is found,

ž 1 SGN I , ± ← 2 − /4 1 − g ± 2 diam( )2 P A ℎ,  , , and g are set to the two subgrids0 0 1 1 ± ž n n Output: Two matrices × , two subgrids g , and two numbers Ensures: Each subgrid g± ∈contains eigenvalues± of , , and± , where are the true spectral projectorsP for± theℂ eigenvalues± in the subgrids ±g respectively./5 n ± − ± ± n A n ±≥ n ‖Pž P ‖ ≤ P

Theorem 5.2 (Guarantees for SPLIT). Assume INV is a INV INV -stable matrix inversion algo- rithm satisfying Definition 2.7. Let , . n, and( A ) and g have side lengths of at most , and define 0 5 0 05/  ,c 4 8  ≤ . ≤ ‖ ‖ ≤ NSPLIT 256 256 4 . . NSPLIT ∶= lg  + 3 lg lg  + lg lg  + 7 59 Then SPLIT has the advertised guarantees when run on a floating point machine with precision

SPLIT  +1 cINV n ⎧ 2N ( log +3)  u uSPLIT 1− 256 , , uSPLIT ∶= min INV n nNSPLIT n ⎪ 2 ( ) 100 ⎪⎫ ≤ ⎪  ⎪ Using at most ⎨ √ ⎬ ⎪ ⎪ TSPLIT n, g,, ⎩ 1 NSPLIT TINV n O n⎭2 TSPLIT ( ) 12 lg ! g ( )+ ( ) ( ) arithmetic operations. The number of≤ bits required⋅ is ⋅  uSPLIT O n 3 256 1 4 . lg 1/ = log log  log + log  Deflation of the approximate projectors we obtain from SPLIT amounts to a standard rank- revealing QR factorization. This can be0 achieved deterministically0 in11O n3 time with the classic algorithm of Gu and Eisenstat [GE96], or probabilistically in matrix-multiplication( ) time with a variant of the method of [DDH07]; we will use the latter.

44 DEFLATE n n Input: Matrix × , desired rank , input precision , and desired accuracy Requires: ∈ for some rank- projector . › 1 Algorithm: −P DEFLATEℂ 4 k ‖ = ‖ ≤ ≤ ( ) k P  › 1. PQž P Haar unitaryP, k, × + ,1 2. n nQR ∗ E H( ←) ( ) 3. U , R first columnsP of . ← H Output: A tall matrixk n kU ž × Ensures:Q ←There exists a matrix∈ n k whose orthogonal columns span range , such that , Qž ℂ × n 3 ∈ ( ) − with probability at least (20 ) . ž 1 − Q2 √ ℂ P ‖Q Q‖  ≤ Theorem 5.3 (Guarantees for DEFLATE). Assume MM and QR are matrix multiplication and QR factorization algorithms satisfying Definitions 2.6 and 2.8. Then DEFLATE has the advertised guarantees when run on a machine with precision:

u u DEFLATE › ∶= min max( QR( ) MM( )) 2 QR( ) <  = uDEFLATE ≤ 4 , . The number of arithmetic operations is at most: n , n  n ‖P‖  DEFLATE( )= 2 N + 2 QR( )+ MM( ) DEFLATE

Remark 5.4. The proof ofT the aboven theorem,n T whichT n is deferredT n . to Appendix C, closely fol- T lows and builds on the analysis of the randomized rank revealing factorization algorithm (RURV) introduced in [DDH07] and further studied in [BDDR19]. The parameters in the theorem are optimized for the particular application of finding a basis for a deflating subspace given an ap- proximate spectral projector. The main difference with the analysis in [DDH07] and [BDDR19] is that here, to make it applicable to complex matrices, we make use of Haar unitary random matrices instead of Haar orthogonal random matrices. In our analysis of the unitary case, we discovered a strikingly simple formula (Corollary C.6) for the density of the smallest singular value of an × sub-matrix of an × Haar unitary; this formula is leveraged to obtain guarantees that work for any and , and not only for when − 30, aswas thecasein[BDDR19]. Finally, we explicitlyr r account for finiten arithmeticn considerations in the Gaussian randomness used in the algorithm, wheren truer Haar unitary matrices cann r never ≥ be produced. We are ready now to state completely an algorithm EIG which accepts a shattered matrix and grid and outputs approximate eigenvectors and eigenvalues with a forward-error guarantee. Aside from the a priori un-motivated parameter settings in lines 2 and 3—which we promise to justify in the analysis to come—EIG implements an approximate version of the split and deflate framework that began this section.

45 EIG

Input: Matrix × , desired eigenvector accuracy , grid g grid , pseudospectral 0 1 2 guarantee , acceptable∈ m failurem probability , and global instance size= ( ) Requires: Ais shatteredℂ with respect to g, and . z ,!,s , s Algorithm:ΛEIG( ) g  n  A ( ) m n 1. If is A,, , ,,,n ≤ 1 × 1 ( ) (1 ) 2. A 2 V,ž Dž ,A 200 ← 3.  4 2 ← 6 8 (20 ) 4 4. gn gn SPLIT g ( + ← − + − + −) ( ) 5. DEFLATEn , n A, , , P›± , P› , , , ( ± ←± ) 6. ∗ n , Qž± ← ± ± + 6P›± , , 7. EIG , g . (Až ±←±Q)ž AžQž (E ± 4 /5 ± 4 /5 ) 8. n Vž , Dž ←+ + −Až −, + ,8 ,  , , 9. Vž normalizeQž Vž Qž Vž  E ← ( ) + 9 Vž Vž E 10. ← + ž ž D − D ž Output: ←Eigenvectors0 D 1 and eigenvalues Ensures: With probability at least ( , each) entry lies in the same square as exactly one ž ž eigenvalue , and each column1 − V,ofD has norm = u, and satisfies for some exact ži ži,i unit right eigenvector∈ Λ( ) .   1 ± D − i A = v›i Vž n ‖v›i vi‖  Avi ivi ≤ Theorem 5.5 (EIG: Finite Arithmetic Guarantee). Assume MM QR, and INV are numerically stable algorithms for matrix multiplication, QR factorization, and inversion satisfying Definitions , 2.6, 2.8, and 2.7. Let < 1, A ∈ × have A 3.5 and, for some  < 1, have -pseudospectrum shattered with respect to a grid g =ngridn (z ,!,s , s ) with side lengths at most 8 and ! 1. Define  ℂ 0 1 ≤2 ‖ ‖ n n n 26 ≤ NEIG . NEIG   2 49 256 256 (5 ) Then EIG has the advertised guarantees∶= lg when+ 3 run lg lg on a floating+ lg lg point machine with precision satis- fying:

lg(1/u) ≥ 2^{14.83} ( c_INV log n + 3 ) lg³( 2⁴⁸(5n)²⁶/(ϵδ) ) + lg( (5n)³⁰ max{ μ_MM(n), μ_QR(n), n } ) = O( log³( n/(ϵδ) ) log n ).

The number of arithmetic operations is at most

T_EIG(n, δ, g, ϵ, θ, n) ≤ 60 N_EIG lg( 1/(ω(g)ϵ) ) ( T_INV(n) + O(n²) ) + 10 T_QR(n) + 25 T_MM(n)
 = O( log(1/(ω(g)ϵ)) ( log(1/(ω(g)ϵ)) + log log(1/u) ) T_MM(n) ).

Remark 5.6. We have not fully optimized the large constant 2^{14.83} appearing in the bit length above.

Theorem 5.5 easily implies Theorem 1.6 when combined with SHATTER.

Theorem 5.7 (Restatement of Theorem 1.6). There is a randomized algorithm EIG which on input

any matrix A ∈ ℂⁿˣⁿ with ‖A‖ ≤ 1 and a desired accuracy parameter δ ∈ (0, 1), outputs a diagonal D and an invertible V such that

‖A − V D V⁻¹‖ ≤ δ  and  κ(V) = ‖V‖ ‖V⁻¹‖ ≤ 32 n^{2.5}/δ

in

O( T_MM(n) log²(n/δ) )

arithmetic operations on a floating point machine with

O( log⁴(n/δ) log n )

bits of precision, with probability at least 1 − 1/n − 12/n². Here T_MM(n) refers to the running time of a numerically stable matrix multiplication algorithm (detailed in Section 2.5).

Proof. Given A and δ, consider the following two step algorithm:

1. (X, g, ϵ) ← SHATTER(A, δ/8).
2. (V, D) ← EIG(X, δ′, g, ϵ, 1/n, n), where

δ′ := δ³ / (6 ⋅ 128² ⋅ n⁵).

We will show that this choice of δ′ guarantees

‖X − V D V⁻¹‖ ≤ δ/2.

Theorem 3.13 implies that X is diagonalizable with probability one; write X = W C W⁻¹ with C diagonal. Moreover,

κ(W) = ‖W‖ ‖W⁻¹‖ ≤ 8n²/δ

when W is normalized to have unit columns, by (11) (where we are using the proof of Theorem 3.6), with probability at least 1 − 12/n².

Since ‖X‖ ≤ ‖A‖ + ‖A − X‖ ≤ 1 + δ/2 ≤ 3.5 from Theorem 3.13, the hypotheses of Theorem 5.5 are satisfied. Thus EIG succeeds with probability at least 1 − 1/n. Taking a union bound with the success of SHATTER, we have V = W + E for some ‖E‖ ≤ δ′√n, so

‖V‖ ≤ ‖W‖ + δ′√n ≤ 2√n,

as well as

σ_min(V) ≥ σ_min(W) − ‖E‖ ≥ δ/(8n²) − δ′√n ≥ δ/(16n²),

since our choice of δ′ satisfies δ′√n ≤ δ/(16n²). This implies that

κ(V) = ‖V‖ ‖V⁻¹‖ ≤ 2√n ⋅ 16n²/δ = 32 n^{2.5}/δ,

establishing the last item of the theorem. We can control the perturbation of the inverse as:

‖V⁻¹ − W⁻¹‖ = ‖W⁻¹(W − V)V⁻¹‖ ≤ ‖W⁻¹‖ ‖V⁻¹‖ ‖E‖ ≤ (8n²/δ)(16n²/δ) δ′√n = 128 δ′ n^{4.5}/δ².

Combining this with ‖D − C‖ ≤ δ′ from Theorem 5.5, and using ‖D‖, ‖C‖ ≤ 5, we have:

‖V D V⁻¹ − W C W⁻¹‖ ≤ ‖(V − W) D V⁻¹‖ + ‖W (D − C) V⁻¹‖ + ‖W C (V⁻¹ − W⁻¹)‖
 ≤ δ′√n ⋅ 5 ⋅ (16n²/δ) + √n ⋅ δ′ ⋅ (16n²/δ) + √n ⋅ 5 ⋅ 128 δ′ n^{4.5}/δ²
 ≤ 6 ⋅ 128 ⋅ δ′ n⁵/δ²,

which is at most δ/2 for δ′ chosen as above. We conclude that

‖A − V D V⁻¹‖ ≤ ‖A − X‖ + ‖X − V D V⁻¹‖ ≤ δ/2 + δ/2 = δ

with probability 1 − 1/n − 12/n², as desired.

To compute the running time and precision, we observe that SHATTER outputs a grid with parameters

ω = Ω( δ⁴/n⁵ ),  ϵ = Ω( δ⁵/n⁹ ).

Plugging this into the guarantees of EIG, we see that it takes

O( log(1/(ωϵ)) ( log(1/(ωϵ)) + log log(1/u) ) T_MM(n) ) = O( T_MM(n) log²(n/δ) )

arithmetic operations, on a floating point machine with

O( log³( n/(ϵδ) ) log n ) = O( log⁴(n/δ) log n )

bits, as advertised.

5.1 Proof of Theorem 5.5

A key stepping-stone in our proof will be the following elementary result controlling the spectrum, pseudospectrum, and eigenvectors after perturbing a shattered matrix.

Lemma 5.8 (Eigenvector Perturbation for a Shattered Matrix). Let Λ_ϵ(A) be shattered with respect to a grid whose squares have side length ω, and assume that ‖Ã − A‖ ≤ η < ϵ. Then, (i) each eigenvalue of Ã lies in the same grid square as exactly one eigenvalue of A, (ii) Λ_{ϵ−η}(Ã) is shattered with respect to the same grid, and (iii) for any right unit eigenvector ṽ of Ã, there exists a right unit eigenvector v of A corresponding to the same grid square, and for which

‖ṽ − v‖ ≤ √( 8ωη / (ϵ(ϵ − η)) ).

Proof. For (i), consider Aₜ = A + t(Ã − A) for t ∈ [0, 1]. By continuity, the entire trajectory of each eigenvalue is contained in a unique connected component of Λ_η(A) ⊂ Λ_ϵ(A). For (ii), Λ_{ϵ−η}(Ã) ⊂ Λ_ϵ(A), which is shattered by hypothesis. Finally, for (iii), let w∗ and w̃∗ be the corresponding left eigenvectors to v and ṽ respectively, normalized so that w∗v = w̃∗ṽ = 1. Let Γ be the boundary of the grid square containing the eigenvalues associated to v and ṽ respectively. Then, using a contour integral along Γ as in (13) above, one gets

‖ṽw̃∗ − vw∗‖ ≤ 2ωη / (π ϵ(ϵ − η)).

Thus, using that ‖v‖ = 1 and w∗v = 1,

‖ṽw̃∗ − vw∗‖ ≥ ‖(ṽw̃∗ − vw∗)v‖ = ‖(w̃∗v)ṽ − v‖.

Now, since (ṽ∗v)ṽ is the orthogonal projection of v onto the span of ṽ, we have that

‖(w̃∗v)ṽ − v‖ ≥ ‖(ṽ∗v)ṽ − v‖ = √( 1 − (ṽ∗v)² ).

Multiplying v by a phase we can assume without loss of generality that ṽ∗v ≥ 0, which implies that

1 − (ṽ∗v)² = (1 − ṽ∗v)(1 + ṽ∗v) ≥ 1 − ṽ∗v.

The above discussion can now be summarized in the following chain of inequalities:

1 − ṽ∗v ≤ √( 1 − (ṽ∗v)² ) ≤ ‖(w̃∗v)ṽ − v‖ ≤ ‖ṽw̃∗ − vw∗‖ ≤ 2ωη / (π ϵ(ϵ − η)).

Finally, note that ‖v − ṽ‖ = √( 2 − 2 ṽ∗v ) ≤ √( 8ωη / (ϵ(ϵ − η)) ), as we wanted to show.

The algorithm EIG works by recursively reducing to subinstances of smaller size, but requires a pseudospectral guarantee to ensure speed and stability. We thus need to verify that the pseudospectrum does not deteriorate too substantially when we pass to a sub-problem.
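The contour-integral representation of spectral projectors used in the proof above can be reproduced numerically. This sketch is ours (toy matrix, midpoint quadrature, numpy assumed): it forms P = (1/2πi) ∮ (zI − A)⁻¹ dz over a square contour around a single eigenvalue and checks that the result is a rank-one spectral projector.

```python
import numpy as np

def spectral_projector(A, center, half_width, m=1000):
    """Approximate P = (1/2*pi*i) * contour integral of (zI - A)^{-1} dz over
    the boundary of the square of given center and half-width, using the
    midpoint rule with m nodes per side."""
    n = A.shape[0]
    I = np.eye(n)
    corners = center + half_width * np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j])
    P = np.zeros((n, n), dtype=complex)
    for a, b in zip(corners[:-1], corners[1:]):
        dz = (b - a) / m
        for k in range(m):
            z = a + (k + 0.5) * (b - a) / m   # midpoint node on this side
            P += np.linalg.solve(z * I - A, I) * dz
    return P / (2j * np.pi)

# Toy matrix with eigenvalues 0.5+0.5j and -0.7, lying in separate squares.
A = np.array([[0.5 + 0.5j, 1.0], [0.0, -0.7]], dtype=complex)
P = spectral_projector(A, 0.5 + 0.5j, 0.4)
```

P satisfies P² ≈ P and tr P ≈ 1; perturbing A by η well below the distance to the contour moves P by O(η), which is the mechanism behind the bound on ‖ṽw̃∗ − vw∗‖ above.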

Lemma 5.9 (Compressing a Shattered Matrix). Suppose P is a spectral projector of A ∈ ℂⁿˣⁿ of rank k, and Q is an n × k matrix with Q∗Q = I_k and QQ∗ = P. Then for every ϵ > 0,

Λ_ϵ( Q∗AQ ) ⊂ Λ_ϵ( A ).

Proof. Take z ∈ Λ_ϵ(Q∗AQ). Then there exists v ∈ ℂᵏ satisfying

‖(z − Q∗AQ)v‖ ≤ ϵ‖v‖.

Since Q∗Q = I, the map Q is an isometry onto range(P), and since range(P) is an invariant subspace of A, we have (z − A)Qv ∈ range(Q), and hence

QQ∗(z − A)Qv = (z − A)Qv,  so  ‖(z − A)Qv‖ = ‖Q∗(z − A)Qv‖ = ‖(z − Q∗AQ)v‖ ≤ ϵ‖v‖ = ϵ‖Qv‖,

showing that z ∈ Λ_ϵ(A).

Observation 5.10. Since ω(g) ≤ 1, our assumption on η in Line 2 of the pseudocode of EIG implies the following bounds on η, which we will use below:

η = ω(g) δϵ/200 ≤ min{ 0.02 ω(g), 2ϵ/75, δ/100 }.

Initial lemmas in hand, let us begin to analyze the algorithm. At several points we will make an assumption on the machine precision in the margin. These will be collected at the end of the proof, where we will verify that they follow from the precision hypothesis of Theorem 5.5.
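A quick numerical sanity check of Lemma 5.9 (our own toy instance, numpy assumed): for Q an isometry onto an invariant subspace, σ_min(zI − Q∗AQ) ≥ σ_min(zI − A) at every point z, which is exactly the inclusion Λ_ϵ(Q∗AQ) ⊆ Λ_ϵ(A).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Build an isometry Q whose range is the invariant subspace for the
# k eigenvalues of smallest real part (so QQ* is a spectral projector's range).
w, V = np.linalg.eig(A)
idx = np.argsort(w.real)[:k]
Q, _ = np.linalg.qr(V[:, idx])
B = Q.conj().T @ A @ Q                  # the compressed k x k matrix

def smin(M):
    """Smallest singular value of M."""
    return np.linalg.svd(M, compute_uv=False)[-1]

zs = rng.standard_normal(25) + 1j * rng.standard_normal(25)
for z in zs:
    # Pseudospectra can only shrink under compression to an invariant subspace.
    assert smin(z * np.eye(k) - B) >= smin(z * np.eye(n) - A) - 1e-8
```

The slack of 1e-8 only absorbs roundoff; in exact arithmetic the inequality is strict in the direction shown, as in the proof above.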

Correctness.

Lemma 5.11 (Accuracy of Ã±). When DEFLATE succeeds, each eigenvalue of A shares a square of g with a unique eigenvalue of either Ã₊ or Ã₋, and furthermore Λ_{4ϵ/5}(Ã±) ⊂ Λ_ϵ(A).

Proof. Let P± be the true projectors onto the two bisection regions found by SPLIT, Q± be the matrices whose orthonormal columns span their ranges, and A± := Q±∗ A Q±. From Theorem 5.3, on the event that DEFLATE succeeds, the approximation Q̃± that it outputs satisfies ‖Q̃± − Q±‖ ≤ η,

so in particular ‖Q̃±‖ ≤ 1 + η ≤ 1.02, as η ≤ 0.02. The error E₆ from performing the matrix multiplications necessary to compute Q̃±∗ A Q̃± admits the bound

‖E₆‖ ≤ μ_MM(n) u ‖Q̃±∗‖ ‖A Q̃±‖ + μ_MM(n) u ‖Q̃±∗ A‖ ‖Q̃±‖ + μ_MM(n)² u² ‖Q̃±‖² ‖A‖ ≤ 16 μ_MM(n) u ≤ η/10,

the last step being one of our margin assumptions on the precision u.

, ± − ± 6 ± + ( ± − ±) ± + ± ( ± − ±) ž ž ž ž A A ≤ 3E + 8 +Q 4 Q AQ Q A Q Q ± − ± ‖ ‖ ‖ ‖ ‖ ‖ ‖ ‖ ≤ /5   Qž /75Q ≤ ‖ ‖ We can now apply Lemma≤ 5.8. ≤ . Everything is now in place to show that, if every call to DEFLATE succeeds, EIG has the advertised accuracy guarantees. After we show this, we will lower bound this success probability and compute the running time.

When A ∈ ℂ^{1×1}, the algorithm works as promised. Assume inductively that EIG has the desired guarantees on instances of size strictly smaller than n. In particular, maintaining the notation from the above lemmas, we may assume that

(Ṽ±, D̃±) = EIG(Ã±, 4δ/5, g±, 4ϵ/5, θ, n)

satisfy (i) each eigenvalue of D̃± shares a square of g with exactly one eigenvalue of Ã±, and (ii) each column of Ṽ± is 4δ/5-close to a true eigenvector of Ã±. From Lemma 5.8, each eigenvalue of Ã± shares a grid square with exactly one eigenvalue of A±, and thus the output

D̃ = diag( D̃₊, D̃₋ )

satisfies the eigenvalue guarantee.

To verify that the computed eigenvectors are close to the true ones, let ṽ± be some approximate right unit eigenvector of Ã± output by EIG (with norm 1 ± nu), v̂± the exact unit eigenvector of Ã± that it approximates, and v the corresponding exact unit eigenvector of A. Recursively, EIG(A, δ, g, ϵ, θ, n) will output the approximate unit eigenvector

v̌ := (Q̃± ṽ± + e)/‖Q̃± ṽ± + e‖ + e′,

whose proximity to the actual eigenvector v we now need to quantify. The error term e is a column of the error matrix E₈, whose norm we can crudely bound by

‖e‖ ≤ ‖E₈‖ ≤ μ_MM(n) u ‖Q̃±‖ ‖Ṽ±‖ ≤ 4 μ_MM(n) u √n,

and e′ is a column incurred by performing the normalization in floating point; in our initial discussion of floating point arithmetic we assumed in (16) that ‖e′‖ ≤ nu. The distance between Q̃±ṽ± + e and its normalization is just the difference in their norms, since they are parallel, so

| ‖Q̃±ṽ± + e‖ − 1 | ≤ (1 + nu)(1 + μ_MM(n) u) − 1 + 4 μ_MM(n) u √n ≤ 4 μ_MM(n) u (1 + √n).

Inductively, ‖ṽ± − v̂±‖ ≤ 4δ/5, and since ‖Ã± − A±‖ ≤ ϵ/5 and A± has shattered ϵ-pseudospectrum by Lemma 5.9, Lemma 5.8 together with our choice of η in Line 2 ensures

‖v̂± − v±‖ ≤ δ/10.
Thus, iterating the triangle inequality and using ‖Q±‖ = 1,

‖v̌ − v‖ ≤ ‖e′‖ + | ‖Q̃±ṽ± + e‖ − 1 | + ‖e‖ + ‖(Q̃± − Q±)ṽ±‖ + ‖Q±(ṽ± − v̂±)‖ + ‖Q±(v̂± − v±)‖
 ≤ nu + 4 μ_MM(n) u (1 + √n) + 4 μ_MM(n) u √n + η + 4δ/5 + δ/10
 ≤ 9 n μ_MM(n) u + η + 4δ/5 + δ/10
 ≤ δ/200 + δ/100 + 4δ/5 + δ/10 ≤ δ.

This concludes the proof of correctness of EIG.

Running Time and Failure Probability. Let's begin with a simple lemma bounding the depth of EIG's recursion tree.

Lemma 5.12 (Recursion Depth). The recursion tree of EIG has depth at most log_{5/4} n, and every branch ends with an instance of size 1 × 1.

Proof. By Theorem 5.2, SPLIT can always find a bisection of the spectrum into two regions containing n± eigenvalues respectively, with n₊ + n₋ = n and n± ≤ 4n/5 when n ≥ 5, and when n ≤ 5 can always peel off at least one eigenvalue. Thus the depth d(n) satisfies

d(n) = 1 + max_{t ∈ [1/5, 4/5]} d(tn),  n > 5.   (41)

As d(n) ≤ n ≤ log_{5/4} 5 for 1 < n ≤ 5, the result is immediate from induction.

We pause briefly and verify that the assumptions ϵ < 1/2, δ < 1, and ‖A‖ ≤ 3.5 in Theorem 5.5 ensure that every call to SPLIT throughout the algorithm satisfies the hypotheses in Theorem 5.2. Since ϵ, δ are non-increasing as we travel down the recursion tree of EIG, we need only verify this for their initial settings. Theorem 5.2 needs β < 1/2, which is satisfied immediately, and we additionally have 4β = η⁴ϵ²/(2(20n)⁶) ≤ 0.05/n. Finally, we need that every matrix passed to SPLIT throughout the course of the algorithm has norm at most 4. Lemma 5.11 shows that if ‖A‖ is bounded and A has its ϵ-pseudospectrum shattered, then ‖Ã±‖ ≤ ‖A±‖ + ϵ/5 ≤ ‖A‖ + ϵ/5. Thus each time we pass to a subproblem, the norm of the matrix we pass to EIG (and thus to SPLIT) increases by at most ϵ/5. Since ϵ decreases by a factor of 4/5 on each recursion, this means that by the end of the algorithm the norm of the matrix passed to EIG will be at most

3.5 + ϵ/(5(1 − 4/5)) = 3.5 + ϵ ≤ 3.5 + 1/2 ≤ 4.

Thus we will be safe if our initial matrix has norm at most 3.5, as assumed.

Lemma 5.13 (Lower Bounds on the Parameters). The input parameters given to every recursive call EIG(A′, δ′, g′, ϵ′, θ, n) and SPLIT(A′ − h′, g′, ϵ′, β′) satisfy

ϵ′ ≥ ϵ/n,  δ′ ≥ δ/n,  η′ ≥ ω(g)δϵ/(200 n²),  β′ ≥ ω(g)⁴ δ⁴ ϵ⁶ / ( 2⁴⁸ (5n)²⁶ ).

Proof.
Along each branch of the recursion tree, we replace ϵ ← 4ϵ/5 and δ ← 4δ/5 at most log_{5/4} n times, so each can only decrease by a factor of n from their initial settings.

Lemma 5.14 (Failure Probability). EIG fails with probability no more than θ.

Proof. Since each recursion splits into at most two subproblems, and the recursion tree has depth log_{5/4} n, there are at most

2^{log_{5/4} n} = n^{log_{5/4} 2} ≤ n⁴

calls to DEFLATE. We have set every β and θ so that the failure probability of each call is at most θ/(2n⁴), so a crude union bound finishes the proof.

The arithmetic operations required for EIG satisfy the recursive relationship

T_EIG(n, δ, g, ϵ, θ, n) ≤ T_SPLIT(n, ϵ, β) + T_DEFLATE(n, η) + 2 T_MM(n)
 + T_EIG(n₊, 4δ/5, g₊, 4ϵ/5, θ, n) + T_EIG(n₋, 4δ/5, g₋, 4ϵ/5, θ, n)
 + 2 T_MM(n) + O(n²).

Each of the T∘ terms is of the form poly(n) polylog(n), where both polynomials have nonnegative coefficients, and the exponent on n is at least 2. Thus, when we split into problems of sizes n₊ + n₋ = n with n± ≤ 4n/5, by convexity

T∘(n₊, ...) + T∘(n₋, ...) ≤ ((4² + 1²)/5²) T∘(n, ...) = (17/25) T∘(n, ...).

Recursively then, if we were to keep all accuracy parameters fixed, the total cost of the operations we perform in each layer is at most 17/25 times the cost of the previous one. Using our parameter lower

bounds from Lemma 5.13, and these geometrically decreasing bit operations, we then have

T_EIG(n, δ, g, ϵ, θ, n) ≤ (25/8) ( T_SPLIT( n, g, ϵ/n, ω(g)⁴δ⁴ϵ⁶/(2⁴⁸(5n)²⁶) ) + T_DEFLATE( n, ω(g)⁴δ⁴ϵ⁶/(2⁴⁸(5n)²⁶), θ/(2n⁴) ) + 4 T_MM(n) + O(n²) )
 = (25/8) ( 12 N_EIG lg( 1/(ω(g)ϵ) ) ( T_INV(n) + O(n²) ) + 2 T_QR(n) + 5 T_MM(n) + n² T_N + O(n²) )

 ≤ 60 N_EIG lg( 1/(ω(g)ϵ) ) ( T_INV(n) + O(n²) ) + 10 T_QR(n) + 25 T_MM(n),

where

N_EIG := lg( 256n/(ω(g)ϵ) ) + 3 lg lg( 256n/(ω(g)ϵ) ) + lg lg( 2⁴⁹(5n)²⁶/(δϵ) ).

In the final expression for T_EIG we have used the fact that T_N = O(1). Thus we have

T_EIG(n, δ, g, ϵ, θ, n) = O( log(1/(ω(g)ϵ)) ( log(1/(ω(g)ϵ)) + log log(1/u) ) T_MM(n) )

by Theorem 2.10.

Required Bits of Precision. We will need the following bound on the norms of all spectral projectors.

Lemma 5.15 (Sizes of Spectral Projectors). Throughout the algorithm, every approximate spectral projector P̃ given to DEFLATE satisfies ‖P̃‖ ≤ 10n/ϵ.

Proof. Every such P̃ is β-close to a true spectral projector P of a matrix whose (ϵ/n)-pseudospectrum is shattered with respect to the initial unit grid g. Since we can generate P by a contour integral around the boundary of a rectangular subgrid of dimensions at most 8 × 8, we have

‖P̃‖ ≤ β + ‖P‖ ≤ 2 + (32/2π)(n/ϵ) ≤ 10n/ϵ,

with the last inequality following from ϵ < 1 ≤ n.

Collecting the assumptions on the machine precision made in the margins above, and using the lower bounds from Lemma 5.13, we need u to satisfy

u ≤ min{ (1 − 2^{−1/(256 n N_EIG)}) / ( 2 μ_INV(n) n^{c_INV log n + 3} ),
  η²ϵ⁴δ⁸ / ( (5n)²⁶ max{ μ_QR(n), μ_MM(n) } ),
  ϵ / ( 100 n³ ⋅ 4 μ_QR(n) ‖P̃‖ ),
  ϵ / ( 100 n³ max{ 4 μ_MM(n), 2 μ_QR(n) } ) }.

From Lemma 5.15, ‖P̃‖ ≤ 10n/ϵ, so the conditions in the second two lines are all satisfied if we make the crass upper bound

u ≤ η²ϵ⁴δ⁸ / ( (5n)³⁰ max{ μ_QR(n), μ_MM(n), n } ),

i.e. if lg(1/u) ≥ lg( (5n)³⁰ max{ μ_QR(n), μ_MM(n), n } / (η²ϵ⁴δ⁸) ). Unpacking the first requirement, using the definition of N_EIG and the fact that 1 − 2^{−x} ≥ x/2 for x ∈ (0, 1), we have

u ≤ 1 / ( 2^{10} n √n N_EIG ⋅ μ_INV(n) n^{c_INV log n + 3} ),

so the final expression is a sufficient upper bound on u. This gives

lg(1/u) ≥ 2^{14.83} ( c_INV log n + 3 ) lg³( 2⁴⁸(5n)²⁶ / (η²ϵ⁴δ⁸) ) + lg( n √n N_EIG ) = O( log³( n/(ϵδ) ) log n ).

This dominates the precision requirement above, and completes the proof of Theorem 5.5.

6 Conclusion and Open Questions

In this paper, we reduced the approximate diagonalization problem to a polylogarithmic number of matrix multiplications, QR factorizations, and inversions on a floating point machine with precision depending only polylogarithmically on n and 1/δ. The key phenomena enabling this were: (a) every matrix is δ-close to a matrix with well-behaved pseudospectrum, and such a matrix can be found by a complex Gaussian perturbation; and (b) the spectral bisection algorithm can be shown to converge rapidly to a forward approximate solution on such a well-behaved matrix, using an amount of precision and number of iterations polylogarithmic in n and 1/δ. The combination of these facts yields a δ-backward approximate solution for the original problem.

Using fast matrix multiplication, we obtain algorithms with nearly optimal asymptotic computational complexity (as a function of n, compared to matrix multiplication), for general complex matrices with no assumptions. Using naive matrix multiplication, we get easily implementable algorithms with O(n³)-type complexity and much better constants, which are likely faster in practice. The constants in our bit complexity and precision estimates, while not huge, are likely suboptimal. The reasonable practical performance of spectral bisection based algorithms is witnessed by the many empirical papers (see e.g. [BDG97]) which have studied it. The more recent of these works further show that such algorithms are communication-avoiding and have good parallelizability properties.
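For concreteness, here is the classical exact-arithmetic core of the spectral bisection method discussed above: Roberts' Newton iteration for the matrix sign function, and the spectral projector it yields. This is a bare textbook sketch of ours (numpy assumed), with none of the finite-precision or pseudospectral safeguards that the paper's SGN and SPLIT require.

```python
import numpy as np

def matrix_sign(A, iters=40):
    """Roberts' Newton iteration X <- (X + X^{-1})/2, which converges
    quadratically to sgn(A) when A has no purely imaginary eigenvalues."""
    X = A.astype(complex)
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = matrix_sign(A)
P_right = 0.5 * (np.eye(n) + S)          # spectral projector onto Re(lambda) > 0
k = int(round(np.trace(P_right).real))   # eigenvalue count in the right half-plane
```

Splitting along other vertical or horizontal lines amounts to applying the same iteration to a shifted or rotated matrix (e.g. −i(A − ih) for the horizontal line Im z = h); a rank-revealing factorization of the resulting projector then produces the deflating basis.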

Remark 6.1 (Hermitian Matrices). A curious feature of our algorithm is that even when the input matrix is Hermitian or real symmetric, it begins by adding a complex non-Hermitian perturbation to regularize the spectrum. If one is only interested in this special case, one can replace this first

step by a Hermitian GUE or symmetric GOE perturbation and appeal to the result of [APS+17] instead of Theorem 1.4, which also yields a polynomial lower bound on the minimum gap of the perturbed matrix. It is also possible to obtain a much stronger analysis of the Newton iteration in the Hermitian case, since the iterates are all Hermitian and κ(V) = 1 for such matrices. By combining these observations, one can obtain a running time for Hermitian matrices which is significantly better (in logarithmic factors) than our main theorem. We do not pursue this further since our main goal was to address the more difficult non-Hermitian case.

We conclude by listing several directions for future research.

1. Devise a deterministic algorithm with similar guarantees. The main bottleneck to doing this is deterministically finding a regularizing perturbation, which seems quite mysterious. Another bottleneck is computing a rank-revealing QR factorization in near matrix multipli-

cation time deterministically (all of the currently known algorithms require Ω(n³) time).

2. Determine the correct exponent for smoothed analysis of the eigenvalue gap of A + γG, where G is a complex Ginibre matrix. We currently obtain roughly (γ/n)^{8/3} in Theorem 3.6. Is it possible to match the n^{−4/3} type dependence [Vin11] which is known for a pure Ginibre matrix?

3. Reduce the dependence of the running time and precision to a smaller power of log(1/δ). The bottleneck in the current algorithm is the number of bits of precision required for stable convergence of the Newton iteration for computing the sign function. Other, "inverse-free" iterative schemes have been proposed for this, which conceivably require lower precision.

4. Study the convergence of “scaled Newton iteration” and other rational approximation meth- ods (see [Hig08, NF16]) for computing the sign function on non-Hermitian matrices. Per- haps these have even faster convergence and better stability properties?
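As a small illustration of question 4, here is a hedged sketch of ours (numpy assumed) comparing the plain Newton sign iteration with Byers' determinantal scaling μ = |det X|^{−1/n}, one of the scalings surveyed in [Hig08] and analyzed in [BX08]: on a badly scaled input the scaled variant needs noticeably fewer iterations.

```python
import numpy as np

def sign_iteration(A, scaled, tol=1e-12, maxit=100):
    """Newton iteration for sgn(A); with scaled=True, each step uses
    Byers' determinantal scaling mu = |det X|^(-1/n)."""
    X = A.astype(complex)
    n = X.shape[0]
    for it in range(1, maxit + 1):
        mu = abs(np.linalg.det(X)) ** (-1.0 / n) if scaled else 1.0
        X_new = 0.5 * (mu * X + np.linalg.inv(mu * X))
        if np.linalg.norm(X_new - X) <= tol * np.linalg.norm(X_new):
            return X_new, it
        X = X_new
    return X, maxit

rng = np.random.default_rng(3)
n = 8
A = 500.0 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
S_plain, it_plain = sign_iteration(A, scaled=False)
S_scaled, it_scaled = sign_iteration(A, scaled=True)
```

Determinantal scaling normalizes the geometric mean of the eigenvalue moduli to 1 at every step, which removes the slow initial phase in which the plain iteration merely halves the norm; whether such schemes also admit better finite-precision guarantees in the non-Hermitian setting is exactly the open question above.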

More broadly, we hope that the techniques introduced in this paper—pseudospectral shattering and pseudospectral analysis of matrix iterations using contour integrals—are useful in attacking other problems in numerical linear algebra.

Acknowledgments

We thank Peter Bürgisser for introducing us to this problem, and Ming Gu, Olga Holtz, Vishesh Jain, Ravi Kannan, Pravesh Kothari, Lin Lin, Satyaki Mukherjee, Yuji Nakatsukasa, and Nick Trefethen for helpful conversations. We also thank the Institute for Pure and Applied Mathematics, where part of this work was carried out.

References

[AB13] Gérard Ben Arous and Paul Bourgade. Extreme gaps between eigenvalues of random matrices. The Annals of Probability, 41(4):2648–2681, 2013.

[ABB+18] Diego Armentano, Carlos Beltrán, Peter Bürgisser, Felipe Cucker, and Michael Shub. A stable, polynomial-time algorithm for the eigenpair problem. Journal of the Euro- pean Mathematical Society, 20(6):1375–1437, 2018.

[APS+17] Michael Aizenman, Ron Peled, Jeffrey Schenker, Mira Shamis, and Sasha Sodin. Ma- trix regularizing effects of Gaussian perturbations. Communications in Contemporary Mathematics, 19(03):1750028, 2017.

[BCD+05] David Bindel, Shivkumar Chandresekaran, James Demmel, David Garmire, and Ming Gu. A fast and stable nonsymmetric eigensolver for certain structured matrices. Technical report, University of California, Berkeley, CA, 2005.

[BD73] A. N. Beavers and E. D. Denman. A computational method for eigenvalues and eigen- vectors of a matrix with real eigenvalues. Numerische Mathematik, 21(5):389–396, 1973.

[BD98] Zhaojun Bai and James Demmel. Using the matrix sign function to compute invariant subspaces. SIAM Journal on Matrix Analysis and Applications, 19(1):205–225, 1998.

[BDD10] Grey Ballard, James Demmel, and Ioana Dumitriu. Minimizing communication for eigenproblems and the singular value decomposition. arXiv preprint arXiv:1011.3077, 2010.

[BDDR19] Grey Ballard, James Demmel, Ioana Dumitriu, and Alexander Rusciano. A generalized randomized rank-revealing factorization. arXiv preprint arXiv:1909.06524, 2019.

[BDG97] Zhaojun Bai, James Demmel, and Ming Gu. An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems. Numerische Mathematik, 76(3):279–308, 1997.

[BHM97] Ralph Byers, Chunyang He, and Volker Mehrmann. The matrix sign function method and the computation of invariant subspaces. SIAM Journal on Matrix Analysis and Applications, 18(3):615–632, 1997.

[BJD74] A. N. Beavers Jr. and E. D. Denman. A new similarity transformation method for eigenvalues and eigenvectors. Mathematical Biosciences, 21(1-2):143–169, 1974.

[BKMS19] Jess Banks, Archit Kulkarni, Satyaki Mukherjee, and Nikhil Srivastava. Gaus- sian regularization of the pseudospectrum and Davies’ conjecture. arXiv preprint arXiv:1906.11819, to appear in Communications on Pure and Applied Mathematics, 2019.

[BOE18] Michael Ben-Or and Lior Eldar. A quasi-random approach to matrix spectral analysis. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

[BX08] Ralph Byers and Hongguo Xu. A new scaling for Newton’s iteration for the polar decomposition and its backward stability. SIAM Journal on Matrix Analysis and Ap- plications, 30(2):822–843, 2008.

[Bye86] Ralph Byers. Numerical stability and instability in matrix sign function based al- gorithms. In Computational and Combinatorial Methods in Systems Theory. Citeseer, 1986.

[Cai94] Jin-yi Cai. Computing Jordan normal forms exactly for commuting matrices in poly- nomial time. International Journal of Foundations of Computer Science, 5(03n04):293– 302, 1994.

[CGH+96] Robert M. Corless, Gaston H. Gonnet, David E. G. Hare, David J. Jeffrey, and Donald E. Knuth. On the Lambert W function. Advances in Computational Mathematics, 5(1):329–359, 1996.

[Dav07] E Brian Davies. Approximate diagonalization. SIAM Journal on Matrix Analysis and Applications, 29(4):1051–1064, 2007.

[DBJ76] Eugene D. Denman and Alex N. Beavers Jr. The matrix sign function and computations in systems. Applied Mathematics and Computation, 2(1):63–94, 1976.

[DDH07] James Demmel, Ioana Dumitriu, and Olga Holtz. Fast linear algebra is stable. Nu- merische Mathematik, 108(1):59–91, 2007.

[DDHK07] James Demmel, Ioana Dumitriu, Olga Holtz, and Robert Kleinberg. Fast matrix mul- tiplication is stable. Numerische Mathematik, 106(2):199–224, 2007.

[Dem87] James Weldon Demmel. On condition numbers and the distance to the nearest ill- posed problem. Numerische Mathematik, 51(3):251–289, 1987.

[Dem88] James W. Demmel. The probability that a numerical analysis problem is difficult. Mathematics of Computation, 50(182):449–480, 1988.

[Dem97] James W. Demmel. Applied numerical linear algebra, volume 56. SIAM, 1997.

[Dum12] Ioana Dumitriu. Smallest eigenvalue distributions for two classes of β-Jacobi ensembles. Journal of Mathematical Physics, 53(10):103301, 2012.

[Ede88] Alan Edelman. Eigenvalues and condition numbers of random matrices. SIAM Journal on Matrix Analysis and Applications, 9(4):543–560, 1988.

[ER05] Alan Edelman and N. Raj Rao. Random matrix theory. Acta Numerica, 14:233–297, 2005.

[ES08] Alan Edelman and Brian D. Sutton. The beta-Jacobi matrix model, the CS decom- position, and generalized singular value problems. Foundations of Computational Mathematics, 8(2):259–285, 2008.

[For10] Peter J. Forrester. Log-gases and random matrices (LMS-34). Princeton University Press, 2010.

[GE96] Ming Gu and Stanley C. Eisenstat. Efficient algorithms for computing a strong rank- revealing QR factorization. SIAM Journal on Scientific Computing, 17(4):848–869, 1996.

[Ge17] Stephen Ge. The Eigenvalue Spacing of IID Random Matrices and Related Least Singular Value Results. PhD thesis, UCLA, 2017.

[GLO20] Anne Greenbaum, Ren-cang Li, and Michael L Overton. First-order perturbation theory for eigenvalues and eigenvectors. SIAM Review, 62(2):463–482, 2020.

[GLS12] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric algorithms and combinatorial optimization, volume 2. Springer Science & Business Media, 2012.

[Hig94] Nicholas J. Higham. The matrix sign decomposition and its relation to the polar decomposition. Linear Algebra and its Applications, 212:3–20, 1994.

[Hig02] Nicholas J. Higham. Accuracy and stability of numerical algorithms, volume 80. SIAM, 2002.

[Hig08] Nicholas J. Higham. Functions of matrices: theory and computation, volume 104. SIAM, 2008.

[HJ12] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press, 2012.

[HL00] Uffe Haagerup and Flemming Larsen. Brown's spectral distribution measure for R-diagonal elements in finite von Neumann algebras. Journal of Functional Analysis, 176(2):331–367, 2000.

[HP78] Walter Hoffmann and Beresford N. Parlett. A new proof of global convergence for the tridiagonal QL algorithm. SIAM Journal on Numerical Analysis, 15(5):929–937, 1978.

[KL95] Charles S. Kenney and Alan J. Laub. The matrix sign function. IEEE Transactions on Automatic Control, 40(8):1330–1348, 1995.

[LV16] Anand Louis and Santosh S. Vempala. Accelerated Newton iteration for roots of black box polynomials. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 732–740. IEEE, 2016.

[Mal93] Alexander N. Malyshev. Parallel algorithm for solving some spectral problems of linear algebra. Linear Algebra and its Applications, 188:489–520, 1993.

[Mez06] Francesco Mezzadri. How to generate random matrices from the classical compact groups. arXiv preprint math-ph/0609050, 2006.

[NF16] Yuji Nakatsukasa and Roland W. Freund. Computing fundamental matrix decom- positions accurately via the matrix sign function in two iterations: The power of Zolotarev’s functions. SIAM Review, 58(3):461–493, 2016.

[NH13] Yuji Nakatsukasa and Nicholas J. Higham. Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM Journal on Scientific Computing, 35(3):A1325–A1349, 2013.

[NTV17] Hoi Nguyen, Terence Tao, and Van Vu. Random matrices: tail bounds for gaps be- tween eigenvalues. Probability Theory and Related Fields, 167(3-4):777–816, 2017.

[Par98] Beresford N. Parlett. The symmetric eigenvalue problem, volume 20. SIAM, 1998.

[PC99] Victor Y. Pan and Zhao Q. Chen. The complexity of the matrix eigenproblem. In Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 507–516. ACM, 1999.

[Rob80] John Douglas Roberts. Linear model reduction and solution of the algebraic Riccati equation by use of the sign function. International Journal of Control, 32(4):677–687, 1980.

[SJ12] Dai Shi and Yunjiang Jiang. Smallest gaps between eigenvalues of random matrices with complex Ginibre, Wishart and universal unitary ensembles. arXiv preprint arXiv:1207.4240, 2012.

[Sma85] Steve Smale. On the efficiency of algorithms of analysis. Bulletin (New Series) of The American Mathematical Society, 13(2):87–121, 1985.

[Sma97] Steve Smale. Complexity theory and numerical analysis. Acta Numerica, 6:523–551, 1997.

[Śni02] Piotr Śniady. Random regularization of Brown spectral measure. Journal of Functional Analysis, 193(2):291–313, 2002.

[SST06] Arvind Sankar, Daniel A. Spielman, and Shang-Hua Teng. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM Journal on Matrix Analysis and Applications, 28(2):446–476, 2006.

[ST04] Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.

[Sun91] Ji-Guang Sun. Perturbation bounds for the Cholesky and QR factorizations. BIT Numerical Mathematics, 31(2):341–352, 1991.

[Sza91] Stanislaw J. Szarek. Condition numbers of random matrices. Journal of Complexity, 7(2):131–149, 1991.

[TE05] Lloyd N. Trefethen and Mark Embree. Spectra and pseudospectra: the behavior of nonnormal matrices and operators. Princeton University Press, 2005.

[Vin11] Jade P. Vinson. Closest spacing of eigenvalues. arXiv preprint arXiv:1111.2743, 2011.

[VNG47] John Von Neumann and Herman H. Goldstine. Numerical inverting of matrices of high order. Bulletin of the American Mathematical Society, 53(11):1021–1099, 1947.

[Wil68] James Hardy Wilkinson. Global convergence of tridiagonal QR algorithm with origin shifts. Linear Algebra and its Applications, 1(3):409–420, 1968.

[WT02] Thomas G. Wright and Lloyd N. Trefethen. EigTool. Software available at http://www.comlab.ox.ac.uk/pseudospectra/eigtool, 2002.

A Deferred Proofs from Section 4

Lemma A.1 (Restatement of Lemma 4.11). Assume the matrix inverse is computed by an algorithm INV satisfying the guarantee in Definition 2.7. Then

G(A) = g(A) + E   (42)

for some error matrix E with norm

‖E‖ ≤ 4 ( ‖A‖ + μ_INV(n) n^{c_INV log n} ‖A⁻¹‖ ) u √n.

Proof. The computation of G(A) consists of three steps:

1. Form INV(A) according to Definition 2.7. This incurs an additive error E_INV with ‖E_INV‖ ≤ μ_INV(n) ⋅ n^{c_INV log n} ⋅ ‖A⁻¹‖ u. The result is INV(A) = A⁻¹ + E_INV.

2. Add A to INV(A). This incurs an entry-wise relative error of size u: the result is

( A + A⁻¹ + E_INV ) ∘ ( J + E_add ),  ‖E_add‖_max ≤ u,

where J denotes the all-ones matrix, and ∘ denotes the entrywise

(Hadamard) product of matrices.

G 1 −1 INV ( )= ( + + ) ( + ) ( + ) 2 iv where max u. A A A E ◦ J Eadd ◦ J Ed Finally, recallE thatdiv for≤ any matrices and , we have the relation (14) ‖ ‖ × n n M E

ma Putting it all together, we have M◦E ≤ M E x √n. ‖ ‖ ‖ ‖‖ ‖ −1 2 2 G 1 u u INV u ( )− ( ) + (2 + ) + (1 + ) 2 A g A ≤ A A−1 2 n E n INV 1 u u √ INV u √ log −1 u 2 ‖ ‖ ‖ ‖ + ‖ ‖ (2 + ) + ‖ (‖ ) ( ) n (1 + ) 2 INV c ≤ A −1A INV √n log n−1 ⋅ ⋅ uA A √n ‖ +‖ ‖ +‖ ( ) ( )c n 4 ‖ ‖ where we use u < in≤ theA lastA line.  n  A A √n 1 ‖ ‖ ‖ ‖ ‖ ‖ In what remains of this section we will repeatedly use the following simple calculus fact. Lemma A.2. Let , then 0 1 x,ylog( > + ) log( )+ and lg( + ) lg( )+ log 2 Proof. This follows directly from≤ the concavityy of the logarithm.≤ y x y x x y x . x x Lemma A.3 (Restatement of Lemma 4.15). Let and be given. Then for 1/800 0 1/2 0 t > >c> lg(1/ ) + 2 lg lg(1/ ) + lg> lg(1/ ) + 1 62 we have j ≥ t t c . , 2 (1 − ) j < c. 2t Proof of Lemma 4.15. An exact solution for j canj be written in terms of the Lambert W -function; t see [CGH+96] for further discussion and a useful series expansion. For our purposes, it is simpler to derive the necessary quantitative bound from scratch. Immediately from the assumption t < , we have j > t . First let us solve the case c . We will1/800 prove the contrapositive,log(1/ ) 9 so assume = 1/2 ≥ t 2 (1 − ) j . t2 1/2 Then taking on both sides, we have j ≥ log j t t t. 2 log(1/ ) + 1 −2j log(1 − ) 2j ≥ 62 ≥ Taking and applying Lemma A.2, we obtain lg j t 1 1 j t. 1+lg + lg log(1/ )+ j t + lg log 2 2 log(1/ ) ≥ Since t < we have 1 1 < . , so 1/800 log2 2 log(1/ ) 0 01 j j tj t t . t t . K. − lg lg(1/ ) + lg log(1/ ) + 1 01 lg(1/ ) + lg lg(1/ ) + 0 49 =∶ But since j , we have≤ j j . j, so ≤ 9 − lg 0 64 ≥ ≥ j 1 j j 1 K . ( − lg ) . 0 64 0 64 which implies ≤ ≤ j K j K . K K K . . + lg + lg(1 57 )= + lg + 0 65 Note K . t , because K t t . . t for t . Thus 1 39 lg(1/ ) ≤ −≤ lg(1/ ) = lg lg(1/ ) + 0 49 0 39 lg(1/ ) 1/800 ≤ K . t t ≤. , ≤ lg lg(1 39 lg(1/ )) lg lg(1/ ) + 0 48 so for the case c we conclude≤ the proof of the≤ contrapositive of the lemma: = 1/2 j K K . + lg + 0 65 t t . t . . 
\[
j \le K + \lg K + 0.65 \le \lg(1/t) + \lg\lg(1/t) + 0.49 + \big(\lg\lg(1/t) + 0.48\big) + 0.65 = \lg(1/t) + 2\lg\lg(1/t) + 1.62.
\]
For the general case, note that once $2^j(1-t)^{2^j}/(2t) \le 1/2$, incrementing $j$ has the effect of squaring the left hand side and then multiplying it by $2t\cdot 2^{1-j} \le 1$, which makes it even smaller. At most $\lg\lg(1/c)$ increments are required to bring the left hand side down to $c$, since $(1/2)^{2^{\lg\lg(1/c)}} = c$. This gives the value of $j$ stated in the lemma, as desired.

Lemma A.4 (Restatement of Lemma 4.18). If
\[
N = \Big\lceil \lg(1/s) + 3\lg\lg(1/s) + \lg\lg\big(1/(s\epsilon_0)\big) + 7.59 \Big\rceil,
\]
then

\[
N \ge \lg(8/s) + 2\lg\lg(8/s) + \lg\lg\big(16/(s^2\epsilon_0)\big) + 1.62.
\]

Proof of Lemma 4.18. We aim to provide a slightly cleaner sufficient condition on $N$ than the condition

\[
N \ge \lg(8/s) + 2\lg\lg(8/s) + \lg\lg\big(16/(s^2\epsilon_0)\big) + 1.62.
\]
Repeatedly using Lemma A.2, as well as the cruder fact that $\lg\lg(ab) \le \lg\lg a + \lg\lg b$ provided $a, b \ge 4$, we have
\begin{align*}
\lg\lg\big(16/(s^2\epsilon_0)\big) &\le \lg\lg(16/s) + \lg\lg\big(1/(s\epsilon_0)\big) \\
&\le 1 + \lg\big(3 + \lg(1/s)\big) + \lg\lg\big(1/(s\epsilon_0)\big) \\
&\le 1 + \lg\lg(1/s) + \frac{3}{\log 2\,\lg(1/s)} + \lg\lg\big(1/(s\epsilon_0)\big) \\
&\le \lg\lg(1/s) + \lg\lg\big(1/(s\epsilon_0)\big) + 1.66,
\end{align*}
where in the last line we use the assumption $s < 1/100$. Similarly,
\begin{align*}
\lg(8/s) + 2\lg\lg(8/s) &\le 3 + \lg(1/s) + 2\lg\big(3 + \lg(1/s)\big) \\
&\le 3 + \lg(1/s) + 2\lg\lg(1/s) + \frac{6}{\log 2\,\lg(1/s)} \\
&\le \lg(1/s) + 2\lg\lg(1/s) + 4.31.
\end{align*}
Thus, a sufficient condition is

\[
N = \Big\lceil \lg(1/s) + 3\lg\lg(1/s) + \lg\lg\big(1/(s\epsilon_0)\big) + 7.59 \Big\rceil.
\]

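As a quick numerical sanity check of the chain of estimates above (a sketch only — the interpretation of the symbols $s$ and $\epsilon_0$, and the ranges tested, follow the statement of Lemma A.4 as reconstructed here), one can test the claimed implication on random inputs:

```python
import math
import random

def lhs(s, eps0):
    # N as defined in Lemma A.4 (without the ceiling, which only helps)
    lg = lambda x: math.log2(x)
    return lg(1 / s) + 3 * lg(lg(1 / s)) + lg(lg(1 / (s * eps0))) + 7.59

def rhs(s, eps0):
    # the sufficient condition that N must dominate
    lg = lambda x: math.log2(x)
    return lg(8 / s) + 2 * lg(lg(8 / s)) + lg(lg(16 / (s ** 2 * eps0))) + 1.62

random.seed(0)
for _ in range(10000):
    s = 10 ** random.uniform(-8, -2.01)      # s < 1/100, as the proof assumes
    eps0 = 10 ** random.uniform(-8, -0.01)   # eps0 < 1
    assert lhs(s, eps0) >= rhs(s, eps0) - 1e-9
```

The margin between the two sides is exactly the slack (1.66 + 4.31 + 1.62 versus 7.59) collected in the proof.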
B Analysis of SPLIT

Although it has many potential uses in its own right, the purpose of the approximate matrix sign function in our algorithm is to split the spectrum of a matrix into two roughly equal pieces, so that approximately diagonalizing $A$ may be recursively reduced to two subproblems of smaller size. First, we need a lemma ensuring that a shattered pseudospectrum can be bisected by a grid line with at least $n/5$ eigenvalues on each side.

Lemma B.1. Let $A$ have $\epsilon$-pseudospectrum shattered with respect to some grid $\mathsf{g}$. Then there exists a horizontal or vertical grid line of $\mathsf{g}$ partitioning $\mathsf{g}$ into two grids $\mathsf{g}_\pm$, each containing at least $\min\{n/5, 1\}$ eigenvalues.

Proof. We will view $\mathsf{g}$ as an $s_1\times s_2$ array of squares. Write $r_1, r_2, \dots, r_{s_1}$ for the number of eigenvalues in each row of the grid. Either there exists $i < s_1$ such that $r_1 + \cdots + r_i \ge n/5$ and $r_{i+1} + \cdots + r_{s_1} \ge n/5$ — in which case we can bisect at the grid line dividing the $i$th from the $(i+1)$st rows — or there exists some $i$ for which $r_i \ge 3n/5$. In the latter case, we can always find a vertical grid line so that at least $n/5$ of the eigenvalues in the $i$th row are on each of the left and right sides. Finally, if $n \le 5$, we may trivially pick a grid line to bisect along so that both sides contain at least one eigenvalue.

Proof of Theorem 5.2. We'll prove first that SPLIT has the advertised guarantees. The main observation is that, given any matrix $A$, we can determine how many eigenvalues are on either side of any horizontal or vertical line by approximating the matrix sign function. In particular, $\mathrm{Tr}\,\mathrm{sgn}(A - h) = n_+ - n_-$, where $n_\pm$ are the eigenvalue counts on either side of the line $\mathrm{Re}\,z = h$. Running SGN to a final accuracy of $\beta$,

we have
\[
\big|\,\mathrm{Tr}\,\mathsf{SGN}(A-h) - \mathrm{Tr}\,\mathrm{sgn}(A-h)\,\big|
\;\le\; n\,\big\|\mathsf{SGN}(A-h) - \mathrm{sgn}(A-h)\big\| + n\,\mathbf{u}\,\big\|\mathsf{SGN}(A-h)\big\|
\;\le\; n\big(\beta + \mathbf{u} + \mathbf{u}\,\|\mathrm{sgn}(A-h)\|\big),
\]
using (15) in the first inequality.

SPLIT

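The eigenvalue-counting identity $\mathrm{Tr}\,\mathrm{sgn}(A - h) = n_+ - n_-$ that drives SPLIT can be illustrated with a minimal sketch. Here the sign function is computed by the plain Newton iteration $M \leftarrow (M + M^{-1})/2$ in ordinary double precision, rather than by the finite-arithmetic SGN analyzed in Section 4, and the test matrix and shift $h$ are hypothetical:

```python
import numpy as np

def matrix_sign(M, iters=60):
    # Newton iteration M <- (M + M^{-1})/2, which converges to sgn(M)
    # when M has no eigenvalues on the imaginary axis.
    for _ in range(iters):
        M = (M + np.linalg.inv(M)) / 2
    return M

rng = np.random.default_rng(0)
n, h = 8, 0.5                     # count eigenvalues on either side of Re z = h
d = np.array([-1.0, -0.6, -0.2, -0.8, 0.9, 1.2, 1.7, 1.4])
A = np.diag(d).astype(complex) + 0.02 * rng.standard_normal((n, n))
S = matrix_sign(A - h * np.eye(n))
n_diff = round(np.trace(S).real)  # = n_plus - n_minus
n_plus = (n + n_diff) // 2
assert n_plus == int(np.sum(np.linalg.eigvals(A).real > h))
```

The eigenvalues here sit well away from the line $\mathrm{Re}\,z = h$, which is exactly the role the shattering assumption plays in the analysis above.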
Input: Matrix $A \in \mathbb{C}^{n\times n}$, grid $\mathsf{g} = \mathsf{grid}(z_0, \omega, s_1, s_2)$, pseudospectral guarantee $\epsilon$, and a desired accuracy $\beta$.
Requires: $\Lambda_\epsilon(A)$ is shattered with respect to $\mathsf{g}$, and $\beta \le 0.05/n$.
Algorithm: $(\tilde P_+, \tilde P_-, \mathsf{g}_+, \mathsf{g}_-, n_+, n_-) = \mathsf{SPLIT}(A, \mathsf{g}, \epsilon, \beta)$
1. Execute a binary search over horizontal grid-line shifts $h$ of $\mathsf{g}$: set $M \leftarrow A - h$, compute $\tilde S \leftarrow \mathsf{SGN}(M, \epsilon_0, \alpha_0, \beta/4)$, and let $\tilde n \leftarrow \mathrm{round}\big(\mathrm{Tr}\,\tilde S\big)$, until $\min\{(n + \tilde n)/2,\ (n - \tilde n)/2\} \ge \min\{n/5, 1\}$.
2. If this fails, set $A \leftarrow iA$ and execute the analogous binary search among vertical shifts from the original grid.
3. Output the sub-grids $\mathsf{g}_\pm$ on either side of the successful shift, the approximate spectral projectors $\tilde P_\pm \leftarrow (1 \pm \tilde S)/2$, and the ranks $n_\pm = (n \pm \tilde n)/2$.
Output: Sub-grids $\mathsf{g}_\pm$, approximate spectral projectors $\tilde P_\pm$, and ranks $n_\pm$.
Ensures: There exist true spectral projectors $P_\pm$ satisfying (i) $P_+ + P_- = 1$, (ii) $\mathrm{rank}(P_\pm) = n_\pm$, (iii) $\|\tilde P_\pm - P_\pm\| \le \beta$, and (iv) $n_\pm \ge \min\{n/5, 1\}$; here $P_\pm$ are the spectral projectors onto the interiors of $\mathsf{g}_\pm$.

Since we can form $\mathrm{sgn}(A - h)$ by integrating around the boundary of the portions of $\mathsf{g}$ on either side of the line $\mathrm{Re}\,z = h$, the fact that $\Lambda_\epsilon(A)$ is shattered means that
\[
\|\mathrm{sgn}(A - h)\| \le \frac{1}{2\pi}\cdot\frac{\big(2(s_1 + s_2) + 4\big)\,\omega}{\epsilon} \le \frac{8}{\epsilon};
\]
in the last inequality we have used that $\mathsf{g}$ has side lengths of at most 8. Since we have run SGN to accuracy $\beta$, this gives a total additive error of $n(\beta + \mathbf{u} + 8\mathbf{u}/\epsilon)$ in computing the trace. If $\beta \le 0.1/n$ and $\mathbf{u} \le \epsilon/(100 n)$, then this error will be strictly less than $0.5$ and we can round $\mathrm{Tr}\,\mathsf{SGN}(A - h)$ to the nearest real integer. Horizontal bisections work similarly, with $iA - ih$ in place of $A - h$.

Since we need only modify the diagonal entries of $A$ when creating $M$, we incur a diagonal error matrix $E$ of norm at most $\max_i |A_{i,i} - h|\,\mathbf{u}$ when creating $M$.
Using $|h| \le 4$ and $|A_{i,i}| \le 4$, the fact that $\mathbf{u} \le \epsilon/(100 n)$ ensures $\|E\| \le \epsilon/16$, so the $3\epsilon/4$-pseudospectrum of $M$ will still be shattered with respect to a translation of the original grid $\mathsf{g}$ that includes a segment of the imaginary axis. Using Lemma 4.10 and the fact that $\mathrm{diam}(\mathsf{g})^2 = 128$, we can safely call SGN with parameters
\[
\epsilon_0 = \epsilon/4 \qquad\text{and}\qquad \alpha_0 = 1 - \epsilon^2/256.
\]
Plugging these in to Theorem 4.9 ($\epsilon_0 < 1/2$, $1 - \alpha_0 \le 1/100$, and $\beta \le 0.05/n \le 1/12$, so the hypotheses are satisfied), for final accuracy $\beta/4$ a sufficient number of iterations is
\[
N_{\mathsf{SPLIT}} := \Big\lceil \lg\frac{256}{\epsilon^2} + 3\lg\lg\frac{256}{\epsilon^2} + \lg\lg\frac{4}{\beta} + 7.59 \Big\rceil.
\]
In the course of these binary searches, we make at most $\lg s_1 + \lg s_2$ calls to SGN at accuracy $\beta/4$. These require at most
\[
\lg(s_1 s_2)\; T_{\mathsf{SGN}}(n, \epsilon/4, \beta/4)
\]
arithmetic operations. In addition, creating $M$ and computing the trace of the approximate sign function cost us $O(n\,\lg(s_1 s_2))$ scalar addition operations. We are assuming that $\mathsf{g}$ has side lengths at most 8, so $s_1, s_2 \le 8/\omega(\mathsf{g})$ and $\lg(s_1 s_2) \le 12\lg(1/\omega(\mathsf{g}))$. Combining all of this with the runtime analysis and machine precision of SGN appearing in Theorem 4.9, we obtain
\[
T_{\mathsf{SPLIT}}(n, \mathsf{g}, \epsilon, \beta) \;\le\; 12\,\lg\frac{1}{\omega(\mathsf{g})}\cdot N_{\mathsf{SPLIT}}\cdot T_{\mathsf{INV}}(n) + O(n^2).
\]

C Analysis of DEFLATE

The algorithm DEFLATE, defined in Section 5, can be viewed as a small variation of the randomized rank-revealing algorithm introduced in [DDH07] and revisited subsequently in [BDDR19]. Following these works, we will call this algorithm RURV. Roughly speaking, in finite arithmetic, RURV takes a matrix $A$ with $\sigma_r(A)/\sigma_{r+1}(A) \gg 1$, for some $1 \le r \le n-1$, and finds nearly unitary matrices $U, V$ and an upper triangular matrix $R$ such that $A \approx URV$. Crucially, $R$ has the block decomposition
\[
R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix} \tag{43}
\]
where $R_{11} \in \mathbb{C}^{r\times r}$ has smallest singular value close to $\sigma_r(A)$, and $R_{22}$ has largest singular value roughly $\sigma_{r+1}(A)$. We will use and analyze the following implementation of RURV.

RURV
Input: Matrix $A \in \mathbb{C}^{n\times n}$
Algorithm: $(U, R) = \mathsf{RURV}(A)$
1. $G \leftarrow$ $n\times n$ complex Ginibre matrix
2. $(V, R') \leftarrow \mathsf{QR}(G)$
3. $B \leftarrow AV^*$
4. $(U, R) \leftarrow \mathsf{QR}(B)$
Output: A pair of matrices $(U, R)$.
Ensures: With probability $1 - \theta^2$, $\|R_{22}\| \le \sqrt{r(n-r)}\,\sigma_{r+1}(A)/\theta$, for every $1 \le r \le n-1$ and $\theta > 0$, where $R_{22}$ is the $(n-r)\times(n-r)$ lower-right corner of $R$.

As discussed in Section 5, we hope to use DEFLATE to approximate the range of a projector $P$ with $\mathrm{rank}(P) = r < n$, given an approximation $\tilde P$ close to $P$ in operator norm. We will show that from the output of $\mathsf{RURV}(\tilde P)$ we can obtain a good approximation to such a subspace. More specifically, under certain conditions, if $(U, R) = \mathsf{RURV}(\tilde P)$, then the first $r$ columns of $U$ carry all the information we need. For a formal statement see Proposition C.12 and Proposition C.18 below.

Since it may be of broader use, we will work in somewhat greater generality, and define the subroutine DEFLATE, which receives a matrix $A$ and an integer $r$ and returns a matrix $S \in \mathbb{C}^{n\times r}$ with nearly orthonormal columns. Intuitively, if $A$ is diagonalizable, then under the guarantee that $r$ is the smallest integer such that $\sigma_r(A)/\sigma_{r+1}(A) \gg 1$, the columns of the output span a space close to the span of the top $r$ eigenvectors of $A$.
Our implementation of DEFLATE is as follows. Throughout this section we use $\mathsf{rurv}(\cdot)$ and $\mathsf{deflate}(\cdot,\cdot)$ to denote the exact arithmetic versions of $\mathsf{RURV}(\cdot)$ and $\mathsf{DEFLATE}(\cdot,\cdot)$, respectively. In Subsection C.1 we present a random matrix result that will be needed in the analysis of DEFLATE. In Subsection C.3 we state the properties of RURV and rurv that will be needed. Finally, in Subsections C.4 and C.5 we prove the main guarantees of deflate and DEFLATE, respectively, that are used throughout this paper.

C.1 Smallest Singular Value of the Corner of a Haar Unitary

We recall the defining property of the Haar measure on the unitary group:

Definition C.1. A random $n\times n$ matrix $V$ is Haar-distributed if, for any other unitary matrix $W$, both $WV$ and $VW$ are Haar-distributed as well. For short, we will often refer to such a matrix as a Haar unitary.

Let $n > r$ be positive integers. In what follows we will consider an $n\times n$ Haar unitary matrix $V$ and denote by $X$ its $r\times r$ upper-left corner. The purpose of the present subsection is to derive a tail bound for the random variable $\sigma_r(X)$. We begin with the well-known fact that we can always reduce our analysis to the case $r \le n/2$.

Observation C.2. Let $n > r > n/2$, let $V$ be a unitary matrix, and denote by $V_{11}$ its $r\times r$ upper-left corner and by $V_{22}$ its $(n-r)\times(n-r)$ lower-right corner. Then $2r - n$ of the singular values of $V_{11}$ are equal to 1, while the remaining $n - r$ are equal to those of $V_{22}$.

Proposition C.3 ($\sigma_{\min}$ of a submatrix of a Haar unitary). Let $n > r > 0$ and let $V$ be an $n\times n$ Haar unitary. Let $X$ be the upper-left $r\times r$ corner of $V$. Then, for all $t > 0$,
\[
\mathbb{P}\big[\sigma_r(X) \le t\big] = 1 - (1 - t^2)^{r(n-r)}. \tag{44}
\]

DEFLATE

Input: Matrix $\tilde A \in \mathbb{C}^{n\times n}$ and parameter $r \in \mathbb{N}$.
Requires: $\|\tilde A - A\| \le \eta \le 1/4$ for some $A \in \mathbb{C}^{n\times n}$ with $\mathrm{rank}(A) = \mathrm{rank}(A^2) = r$ and $\|A\| \le 1$, as well as the precision assumptions (59) on $\mathbf{u}$, $\mu_{\mathsf{MM}}(n)$, $\mu_{\mathsf{QR}}(n)$ and $c_{\mathsf N}$.
Algorithm: $\tilde S = \mathsf{DEFLATE}(\tilde A, r)$
1. $(U, R) \leftarrow \mathsf{RURV}(\tilde A)$
2. $\tilde S \leftarrow$ first $r$ columns of $U$
3. Output $\tilde S$
Output: Matrix $\tilde S \in \mathbb{C}^{n\times r}$.
Ensures: There exists a matrix $S \in \mathbb{C}^{n\times r}$ whose orthonormal columns span $\mathrm{range}(A)$, such that $\|\tilde S - S\|$ is small with high probability; the precise bound and failure probability are given in Proposition C.18, Theorem 5.3 and Remark C.19.

In particular, for every $t > 0$ we have

\[
\mathbb{P}\bigg[\sigma_r(X) \ge \frac{t}{\sqrt{r(n-r)}}\bigg] \ge 1 - t^2. \tag{45}
\]
This exact formula for the CDF of the smallest singular value of $X$ is remarkably simple, and we have not seen it anywhere in the literature. It is an immediate consequence of substantially more general results of Dumitriu [Dum12], from which one can extract and simplify the density of $\sigma_r(X)$. We will begin by introducing the relevant pieces of [Dum12], deferring the final proof

until the end of this subsection. Some of the formulas presented here are written in terms of the generalized hypergeometric function, which we denote by ${}_2F_1(a, b;\, c;\, (x_1, \dots, x_m))$. For our application it is sufficient to know that
\[
{}_2F_1\big(0, b;\, c;\, (x_1, \dots, x_m)\big) = 1 \tag{46}
\]
whenever $c > 0$ and ${}_2F_1$ is well defined. The above equation can be derived directly from the definition of ${}_2F_1$ (see Definition 13.1.1 in [For10], or Definition 2.2 in [Dum12]). The generic results in [Dum12] concern the $\beta$-Jacobi random matrix ensembles, which we have no cause here to define in full. Of particular use to us will be [Dum12, Theorem 3.1], which expresses the density of the smallest singular value of such a matrix in terms of the generalized

hypergeometric function:

Theorem C.4 ([Dum12]). The density of the probability distribution of the smallest eigenvalue of the $\beta$-Jacobi ensemble of parameters $a, b$ and size $m$, which we denote by $f_{\beta,a,b}^{m}(\lambda)$, is given by
\[
f_{\beta,a,b}^{m}(\lambda) = C_{\beta,a,b}^{m}\,\lambda^{\frac{\beta}{2}(a+1)-1}\,(1-\lambda)^{\frac{\beta}{2}m(b+m)-1}\;{}_2F_1\Big(1 - \tfrac{\beta}{2},\ \tfrac{\beta}{2}(b+m-1);\ \tfrac{\beta}{2}(b+2m-1)+1;\ (1-\lambda, \dots, 1-\lambda)\Big) \tag{47}
\]

for some normalizing constant $C_{\beta,a,b}^{m}$.

For a particular choice of parameters, the above theorem can be applied to describe the distribution of $\sigma_r(X)^2$. The connection between singular values of corners of Haar unitary matrices and $\beta$-Jacobi ensembles is the content of [ES08, Theorem 1.5], which we rephrase below to match our context.

Theorem C.5 ([ES08]). Let $V$ be an $n\times n$ Haar unitary matrix and let $r \le n/2$. Let $X$ be the $r\times r$ upper-left corner of $V$. Then the eigenvalues of $XX^*$ are distributed as the eigenvalues of a $\beta$-Jacobi matrix of size $r$ with parameters $\beta = 2$, $a = 0$ and $b = n - 2r$.

In view of the above result, Theorem C.4 gives a formula for the density of $\sigma_r(X)^2$.

Corollary C.6 (Density of $\sigma_r(X)$). Let $V$ be an $n\times n$ Haar unitary and let $X$ be its $r\times r$ upper-left corner with $r \le n/2$. In this case $\beta = 2$, $m = r$, $a = 0$ and $b = n - 2r$, and the density (47) becomes
\[
f(x) = C\,(1-x)^{r(n-r)-1}\;{}_2F_1\big(0,\ n-r-1;\ n;\ (1-x, \dots, 1-x)\big) = C\,(1-x)^{r(n-r)-1}, \tag{49}
\]
where the last equality follows from (46). Using the relation between the distribution of $\sigma_r(X)^2$ and the distribution of the minimum eigenvalue of the respective $\beta$-Jacobi ensemble described in Theorem C.5, we have $f_{\sigma_r(X)^2}(x) = f(x)$ on $[0, 1]$. By integrating the right side of (49) we find $C = r(n-r)$.

Proof of Proposition C.3. From (49) we have that
\[
\mathbb{P}\big[\sigma_r(X)^2 \le t\big] = \int_0^t r(n-r)\,(1-x)^{r(n-r)-1}\,dx = 1 - (1-t)^{r(n-r)},
\]
from where (44) follows. To prove (45), note that $g(t) := (1-t)^{r(n-r)}$ is convex on $[0, 1]$, and hence
\[
g(t) \ge g(0) + t\,g'(0) = 1 - t\,r(n-r)
\]
for every $t \in [0, 1]$.

C.2 Sampling Haar Unitaries in Finite Precision

It is a well-known fact that Haar unitary matrices can be numerically generated from complex Ginibre matrices. We refer the reader to [ER05, Section 4.6] and [Mez06] for a detailed discussion. In this subsection we carefully analyze this process in finite arithmetic. The following fact (see [Mez06, Section 5]) is the starting point of our discussion.
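The recipe just mentioned — QR of a complex Ginibre matrix, with the diagonal of $R$ normalized to be nonnegative — can be sketched together with a Monte Carlo check of the distribution function (44) of Proposition C.3. This is a sketch only; the sample size and tolerance are arbitrary, and the exact formula used is the one reconstructed in (44):

```python
import numpy as np

def haar_unitary(n, rng):
    # QR of a complex Ginibre matrix; multiplying the columns of Q by the
    # phases of diag(R) makes R's diagonal nonnegative (cf. Lemma C.7 below),
    # which makes the factorization unique and Q exactly Haar-distributed.
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    d = np.diag(R)
    return Q * (d / np.abs(d))

n, r, t = 4, 2, 0.5
rng = np.random.default_rng(2)
samples = [np.linalg.svd(haar_unitary(n, rng)[:r, :r], compute_uv=False)[-1]
           for _ in range(4000)]
empirical = float(np.mean(np.array(samples) <= t))
exact = 1 - (1 - t**2) ** (r * (n - r))   # Proposition C.3, eq. (44)
assert abs(empirical - exact) < 0.05
```

For $r = 1$ this reduces to the classical fact that the squared modulus of one entry of a Haar unitary is $\mathrm{Beta}(1, n-1)$-distributed.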

69 Lemma C.7 (Haar from Ginibre). Let be a complex Ginibre matrix and ℂ × be n n defined implicitly, as a function of , by then equation and the constraints that is unitary G n 10n U , R and is upper-triangular with nonnegativen diagonal entriesn × . Then, is Haar distributed∈ in the unitary group. G G = U R U R U The above lemma suggests that QR can be used to generate random matrices that are ap- proximately Haar unitaries. While doing this, one should keep in mind that when working with finite arithmetic, the matrix passed to(⋅QR) is not exactly Ginibre-distributed, and the algorithm QR itself incurs round-off errors.n Following the discussionGž in Section 2.4 we canassume that we have accessto a random matrix , with

Gžn n n n where is a complex Ginibre matrixž and ℂ × is an adversarial perturbation whose ž G = G + E, G entries are bounded by Nu. Hence, we have n n Nu. n √1 G n n E QR In what follows we usen× to denote the exact arithmetic∈ F version of . Furthermore, we assume that for any ℂcQR(, ) returns a pairE ≤ E ≤with√nc the property( that) has nonneg- QR × ‖ ‖ ‖ ‖ ative entries on the diagonal.∈ n n Since⋅QR( we) want to compare( ) with QR ⋅ it is necessary to A A U , R QR( ) ( ) R have a bound on the condition number of the decomposition.n For this, wen cite the following consequence of a result of Sun [Sun91, Theorem 1.6]: G Gž QR Lemma C.8 (Condition number for the decomposition [Sun91]). Let ℂ × with ∈ n n invertible. Furthermore assume that −1 1 . If and , then QR 2 ( ) = QR( ) ( )A, =E QR( + ) A ž › E A ≤ U−1 , R A U, R A E ‖ ‖‖ − ‖ 4 F F We are now ready to prove the mainUž resultU of≤ thisA subsection.E . As in the other sections devoted ‖ ‖ ‖ ‖‖ ‖ to finite arithmetic analysis, we will assume that u is small compared to QR ; precisely, let us assume that ( ) u QR  n (50) ( ) 1 Proposition C.9 (Guarantees for finite-arithmetic n ≤ Haar. unitary matrices). Suppose that QR sat- isfies the assumptions in Definition 2.8 and that it is designed to output upper triangular matrices with nonnegative entries on the diagonal11. If QR , then there is a Haar unitary matrix ( )= ( ) and a random matrix such that . Moreover, forn every and we have = + V , R Gž 1 0 2 2 + 1 U E tn 3 Vž U E n > > t > √ 2 2 2 ℙ < 8 cNQR n u 10 cNu e 2 e− . ( ) + 1 − 2 − 2 t n E ž ≥ Proof. From our Gaussian‖ ‖ sampling assumption, G G5 where Nu. Also, by the 4 = + assumptions on QR from Definition 2.8, there are matricesn n and such that√ , ‖ ‖ ≤ nc( ) = QR( ) 10 is almost surely invertible and under this event and are uniquelyžEn determinedE by these conditions. 
žn G U R Gž Vž Vž , R Gž 11Any algorithm that yields the decomposition can be modified in a stable way to satisfy this last condition at n QR the cost of u operations O∗(n log(1/ ))

70 and

<QR n u − ( ) ž ž ž G‖Vž GV‖ QR n u G QR n u G ncNu . − ( ) ( ) + n n n n √ The latter inequality implies,‖ using‖≤ (50), that‖ ‖≤ ‖ ‖  ž Gž G QR n u G ncNu ncNu QR n u G ncNu. (51) − ( ) + + ( ) + 2 n n n √ √ n √ Let U, ¨‖ ‖≤ . From Lemma‖ ‖ C.7 we know that≤ is Haar‖ distributed‖ on the unitary group, so( using) ∶= (51 QR() and Lemma) C.8, and the fact that for any matrix n × , we know that √ R G ‖M‖ ≤U ‖M‖F ≤ n‖M‖ n n

M QR u N QR u −1 Nu −1 (52) − − ( ) − − − − 4√ ( ) + 10 ž ž n n n Now,‖U fromV‖  −1n ≤ ‖U V‖and‖ fromV V‖≤‖U Theorem V‖≤3.1 we havenc  thatn ‖G ‖‖G ‖ nc ‖G ‖. = 1/ ( ) ‖Gn ‖ n Gn −1 2 2 ( 2 ) = 2 n √ P ‖Gn ‖≥ ≤ e e . 2 On the other hand, from Lemma 2.2 of [BKMS19  ] we have − . Hence, 2 2+ nt under the events −1 and , inequality (52) yieldsn √ n 2 2+ P ‖G ‖> t ≤ e n n √ ‖G ‖≤ ‖G ‖≤3 t 2 2 4 N QR u 10 Nu − ( ) 2 2+ + 1 + n √ n ‖U V‖≤ c  n t  c . Finally, if we can exchange the term for in the bound. Then, using a union bound we2√2 obtain + 1 the advertised guarantee. 2√2+ + 1 2 t > t t C.3 Preliminaries of RURV

Let ℂ × and . As will become clear later, in order to analyze DEFLATE it is of fundamental∈ n n ( importance) = rur to bound the quantity , where is the lower-right 22 22 22 blockA of . To thisU end, , R it willv( sufficeA) to use Corollary C.11 below, which is the complex analog(A, r) to the upper bound given in equation (4) of [BDDR19‖R, Theorem‖ 5.1].R Actually, Corollary(n−rC.11)×(nis−r a) R direct consequenceR of Lemma 4.1 in the aforementioned paper and Proposition C.3 proved above. We elaborate below.

Lemma C.10 ([BDDR19]). Let , ℂ × and ∗ be its singular value decompo- sition. Let , be the lower rightn n corner of , and be such that 22 n>r> A A P Q . Then, if ∗ ∗, 0 ∈ = Σ (U , R) = rurv(A) R (n − r)×(n − r) R V A U RV X Q V +1 = = 22 r  A11 where is the upper left block of ‖R. ‖≤ n ( ), 11  (X ) X r r X × 71 This lemma reduces the problem to obtaining a lower bound on . But, since is a Haar 11 unitary matrix by construction and ∗ with ∗ unitary, we haven that is distributed as a Haar unitary. Combining Lemma C.10 and Proposition C.3 gives the following(X ) result.V X = Q V Q X Corollary C.11. Let , ℂ × , and be the lower right corner of . Then for any n n 22 n>r> 0 A ∈ (U , R) = rurv(A) R (n −r)×(n −r) R  > 0ℙ 2 22 +1 tr n r ‖R ‖≤ ( − )r A ≥  . L  ( ) 1− C.4 Exact Arithmetic Analysis of DEFLATEM It is a standard consequence of the properties of the decomposition that if is a matrix of rank , then almost surely is a matrix with orthonormal columns that span the range of . As a warm-updef let’s recall this argument. QR A r A, r n r Let andlate(be the) unitary× matrix used by the algorithm to produce this output. SinceA we are working in exact arithmetic, is a Haar unitary matrix, and hence it is U , R A V almost surely( ) invertible. = rurv( ) Therefore, with probability 1 we have ∗ , so if ∗ we V rank( )= = will have and ℂ × , where and are as in (43). Writing 22 = 0 11 ∈ r r 11 22 AV r U R AV R R R R 11 12 = U21 U22 U for the block decomposition of with ℂU× , noteU that 11 ∈0 r r 1 U U ∗ 11 11 11 12 + 12 22 (53) = = U21R11 U21R12 + U22R22 AV U R . On the other hand, almost surely the first UcolumnsR U ofR ∗UspanR the range of . Using the right side of equation (53) we see that this subspace0 also coincides with1 the span of the first columns of , since is invertible. r AV A We will now11 prove a robust version of the above observation for a large class ofr matrices, U R 12 namely those for which 2 . 
We make this precise below and defer the proof to the end of the subsection.rank( ) = rank( ) A A A Proposition C.12 (Main guarantee for ). Let 0 and ∈ ℂ × be such that − def n n and 2 . Denote and . Then, for any rank( ) = rank( ) = ∶= def ≤ (0 1), with probability 1− 2 there existslate a unitary ∈ ℂ × suchž that ž > A,T A A, r ‖A A‖ late( ž ) r r ∶= deflate( )  , A A r  S 8 (A,−W r ) ∈ − (54) ∗ z ( ) tr n∗ r w Remark C.13 (The projector case)S . InTW the case≤ inr which the⋅ matrix of Proposition C.12 is a (not ‖ ‖  T AT . necessarily orthogonal) projector, ∗ = , and the term in the denominator of (54) becomes a 1. r r A 12 T AT For example, diagonalizable matrices satisfy thisI criterion. 

72 We begin by recalling a result about the stability of singular values which will be important throughout this section. This fact is a consequence of Weyl’s inequalities; see for example [HJ12, Theorem 3.3.16] .

Lemma C.14 (Stability of singular values). Let ∈ ℂ × . Then, for any = 1 … we have n n ( + )− ( ) k , ,n X, E k k We now show that the orthogonal X projectionE  X ≤ E . ∗ is close to a projection onto the range of , in the| sense that ≈| . ‖def‖ P A,ž r A,ž r ∶= late( )deflate( ) Lemma C.15. Let 0 andA ∈ ℂ × be suchPA that A and . Let and . Then, almostn n surely rank( ) = − ( ) ∶= rur def ≤ ž ‖ ž‖ > A, A ∗ A r A A U , R (55) v(Až) S ∶= late(A,rž ) 22 where is the lower right blockn of≤ . 22 ‖(SS − I )A‖ ‖R ‖+ , Proof. We will begin by showingn r thatn r ∗ R is small. Let be the unitary matrix that ( − )×( − ) was usedR to generate . As outputsn the first columns of , we have the block decomposition , where ℂSS and ℂ . ¨ ‖( × − I )Až‖¨ ×( − ) V On the other hand we have ⋅,n⋅ sor n n r (U , R) deflate( , ) r U U = S U  S ∈ U ∈ ∗ ∗ ¨ 0 − ¨ = 0 − ¨ Až = U RV 2 2 n , Since = = 1 from the above equation we get ( − ) . Now we can conclude ¨ ( − ) ž =( − ) = ∗ that SS I A SS I S U  RV U  RV 22U R  V. n ≤ ‖ ‖ ‖ ‖ ( ∗ − ) ( ∗ − ) + ( ∗ −‖ )( − )‖ ‖ ‖ + U V SS I Až R22 n ≤ n n ≤ ‖ ‖ ‖ ž‖ ‖ ž ‖ ‖ ‖ The inequality (55SS) canI beA appliedSS to quantifyI A theSS distanceI A betweenA R the ranges . of and in terms of , as the following result shows. def 22 late( ž ) Lemma C.16 (Bound in terms of ). Let 0 and ∈ ℂ × be such that A, r deflate( ) ‖ ‖ 22 n n rank( ) = 2 A, rand R . Denote by , and rank( ) = − ( ) ∶= rur def . Then, almost surely there‖ exists‖ a unitary ℂ מ such that ≤ R > A, A TA ‖ ž‖ v(r žr ) ∶= late( ž ) ∶= A A, r r A A U , R W A S A,r deflate( ) ∗ 22 ∈ (56) w R ∗ S TW ≤ ‖ ‖+ , where is the lower right ‖ − block‖ 2 of .r T AT 22  ( ) Proof. RFrom Lemma C.15 wen knowr thatn r almost surelyR ∗ . We will use this ( − )×( − ) 22 to show that ∗ ∗ is small, which can be interpreted asn ∗ being close to unitary. First note that SS ≤ r ‖( − ) ‖ ‖ ‖+ T SS T I AT R ‖ − ‖ ∗ ∗ I ∗ ∗ r w S T ∗ SS∗ Ir w . 
(57) w sup, w ( − ) =w supA , w ( − ) ∈ℂ ‖ ‖=1 ∈range( ) ‖ ‖=1 T SS T r r T SS T ‖ − I ‖= ‖ I73 ‖ ‖ ‖ Now, since A A2 , if w A then w Av for some v A . So by the Courant-Fischerrank( formula) = rank( ) ∈ range( ) = ∈ range( ) w Av Au r T ∗AT . v v u A u = ∈range(inf ) = ( ) We can then revisit (57) and‖ get‖ ‖ ‖ ≥ ‖ ‖ ‖ ‖ ‖ ‖ ‖ ‖ T ∗ SS∗ Ir Av T ∗ SS∗ Ir AT T ∗ SS∗ Ir w ( − ) ( − ) . (58) w supA , w ( − ) =v supA , v r T ∗AT r T ∗AT ∈range( ) ‖ ‖=1 ∈range( ) ‖ ‖ 1 ( ) ( ) ‖ ‖ ≤ ‖ ‖ ≤ On the other hand‖ T ∗ SS∗ Ir AT‖ SS∗ Ir A , so combining this fact with (57) and (58) we obtain ( − ) ( − ) 22 + ≤ ≤ ‖ ‖∗ ‖∗ r ‖ 22 ‖R+ ‖ − r ∗ ‖ ( ‖ ) Now define , 22 TandSS letT ≤ be the. polar decomposition of . Observe ∗ ¨ ‖ ‖+ ‖ ‖ R T AT ∶= ∶= ( ∗ ) I= that r  X S T RT AT  |2 | ∗ ¨ − 1( ) − 1 X 1(W)X− 1 = − X Thus r Finally note that r ∗ ≤ ≤ ¨ ≤ − = −‖| | = ‖( − ) | | ‖ ‖ X I  Xn  X X X I . T W X ∗ 2W ∗ X ∗ ≤ ∗ ‖S ‖− ‖ =‖ ( ‖ −| | I )(W−‖ . ) ∗ ∗ ∗ TW = 2S − W T S− TW ‖S ‖ ‖ r ∗ ∗ ∗ ‖ ∗ ∗ ∗ ∗ = 2 − TW( +W T −S )−( + − ) ‖ Ir S∗ ∗ ∗ ‖ ∗ ∗ ∗ ∗ ¨ 2 − T T S + W ( T S− S)T+ (W −S T )T S 4 ‖ Ir S ‖ which concludes the proof.≤ T T S S T W T S W S T T S ≤ ‖I S ‖ ‖ ‖ ‖ ‖ , Note that so far our results have been deterministic. The possibility of failure of the guarantee given in Proposition C.12 comes from the non-deterministic bound on . 22 Proof of Proposition C.12. From Lemma C.14 we have . Now combine Lemma C.16 with Corollary C.11. +1( ) ‖R ‖ r ≤  Až C.5 Finite Arithmetic Analysis of DEFLATE In what follows we will have an approximation of a matrix of rank with the guarantee that . −For the sake of readability we will not present optimal bounds for the error induced by round- Až A r off, and≤ we will assume that ‖A Až‖ 1 max{ N MM( )u N QR( )u} MM QR N (59) 4 4 and 1 min{ ( ) ( ) } ⋅ n ,c  n ≤ RURV≤ ≤ ≤ n , n ,c . We‖ ‖ begin by analyzing the subroutine ‖in‖ finite arithmetic. This was done in [DDH07, LemmaA 5.4]. 
Herec  we make the constants arising fromA this analysis expl icit and take into consid- eration that Haar unitary matrices cannot be exactly generated in finite arithmetic.

74 Lemma C.17 (RURV analysis). Assume that QR and MM satisfy the guarantees in Definitions 2.6 and 2.8. Also suppose that the assumptions in (59) hold. Then, if RURV and is the matrix used to produce such output, there are unitary matrices ( and) ∶= a matrix( ) such that and the following guarantees hold: U , R A V = U,ž Vž Až Až 1.Už RVž QR u. − ( ) 2. Uis HaarUž ≤ distributedn in the unitary group. ‖ ‖ 3.V Forž every and , the event: 1 0 2 2 + 1 tn> >3 t > n√ tn 3 n 2 2 ž 2 2 < 8 cNQR n u 10 u A A < A 9 cNQR n u MM n u 10 cNu − ( ) + and − ( ) + 2 ( ) + Vž V (60) ‖ ‖ ‖ 2 ‖ ‖ ‖ occurs with probability at least e 2 e− . 0 1 1 − 2 − 2 t n Proof. By definition V QR Gž with Gž G , where is an Ginibre matrix and u. A direct application= ( of) the guarantees= + on each step yields the× following: n n n n n n E 1.≤ √ Fromn Proposition C.9, we know that there isE a Haar unitaryG and a random matrix , ‖ ‖ such that and 0 = + 0 Vž E V Vž E tn 3 n 2 2 2 ℙ < 8 cNQR n u 10 cNu e 2 e− . (61) 0 ( ) + 1 − 2 − 2 t n E ≥ ‖ ‖ 2. If B MM A, V ∗ 4 AV ∗ , then from the guarantees5 for MM we have MM u. 1 1 Now∶= from the( guarantees)= + for QR we know that is QR u away from a unitary, and hence( ) ( ) ≤ n ‖ ‖ ‖ ‖‖ ‖ E V  n E A V  MM u QR u MM u 5 MM u ( ) (1 + ( ) ) ( ) ( ) 4 V  n ≤  n  n ≤  n where the last inequality‖ ‖ follows from the assumptions in (59). This translates into

QR u 5 + 1 (1 + ( ) ) + 1 + 1 4 B ≤ A V E ≤  n A E ≤ A E . Putting the above‖ together‖ ‖ ‖‖ and‖ ‖ using‖ (59) again, we‖ get‖ ‖ ‖ ‖ ‖ ‖ ‖

5 MM u and 5 MM u < A . (62) 1 ( ) (1 + ( ) ) 2 4 4 E ≤ A  n B≤ A  n 3. Let U, QR‖ .‖ Then‖ there‖ is a unitary and a‖ matrix‖ such that ‖ ‖ , , 2 3 and( )=, with( ) error bounds QR u and QR u. Using= + (62) we= obtain+ = 2 ( ) 3 ( ) R B Už B› U Už E B B› E ≤ QR n u < AEQR≤ nBu. n (63) B› Už R 3 ‖E ‖  ( ) 2 ‖ ‖ (‖ )‖ E ≤ B  n ‖ ‖ ‖ ‖ ‖ ‖ 75 4. Finally, define Až B›Vž. Note that Až Už and ∶= =

∗ ∗ ∗ = =( − 3) =( + 1 − 3) =(RVž ( + 0) + 1 − 3) = +( 0 + 1 − 3) which translates into Až B›Vž B E Vž AV E E Vž A Vž E E E Vž A AE E E V,ž − 0 + 1 + 3 Hence, on the event described in the left≤ side of (61), we have ‖A Až‖ ‖A‖‖E ‖ ‖E ‖ ‖E ‖. 3 2 2 8 N QR u 10 Nu 5 MM u QR u − ( ) + + ( ) + 2 ( ) tn n 4 ≤ c  n c  n  n , and using some‖ crude‖ ‖ bounds,‖ the above inequality yields the advertised bound. A Až A 0 1

We can now prove a finite arithmetic version of Proposition C.12.

Proposition C.18 (Main guarantee for DEFLATE). Let be positive integers, and let 0 and ℂ × be such that and 2 . Let DEFLATE and ∈ n n . If QR and− MM satisfyrank( the guarantees) =n>r rank( in) Definitions = 2.6∶=and 2.8, and( (59) ∶= def ≤ , > holds, then,ž for every ‖ 1 therež‖ exist a unitary ∈ ℂ × such that ž A,T A A, r A A A A r S A, r late( ) r r t > √ W 2 2+ 2 ( − ) − QR( )u + 12 (64) ∗ z ( ) tn tr ∗n r w 2 S TW ≤ n r . ‖ ‖ 2 with probability at least 1 − 7 2 − 2 − .  T AT , t n  Proof. Let ( )= RURV( ). From Lemma C.17 we know that there exist ∈ ℂ , such that  e × − and − are small, and ( )= for the respective realization of ann n exact Haar unitary matrix.U , R Then, fromAž and (60), for every and U,ž Ažž we have ‖ ž‖ ‖ ž žž‖ ž rurv(žž) U U A A ≤ U , R A3 t > √ + 2 1 0 2 2 + 1 ‖Až‖ ‖A‖ QR Nu > >MM u Nu (65) ž ž tn n ôA Ažô ≤ ôAž Ažô Až A ≤ A 9  n c  n 10 c ô − ô ô − ô + ‖ − ‖ 2 (‖ ‖ + ) ( ) + 2 ( ) + + withô probabilityô ô ô 2 − . 0 1 , t n Now, from (59) we have u 1 and N u for QR MM , so we can bound the respective terms1 − 2 in (65−) 2 by : 4 e e ≤ ≤ ≤ n , n ‖ ‖ = ( ) ( ) 3 c A    3 2 2 2 2 N QR u MM u Nu tn n tn n 9 c  n  n 10 c ≤ 9 10 (‖ ‖ + ) ( ) + 2 ( ) + + (1 + ) 2 + 2 + +(66) A 0 1 0 1 t ≤ (12 + 16)n where the last crude bound uses 3 and . 2 2 5 , 4 ≤n ≤n , ≤ t > 1 1+76 2 Observe that is the matrix formed by the first columns of , and that by def Proposition C.12 we know that for every 0, with probability 1− 2 there exists a unitary S› A,ržž r Už such that = late( )  >  W 8 ( − ) − − ∗ (67) z ( ∗ ) { žž tr n r ôA Aô On the other hand, is the matrixS› TW formed≤ by ther first .columnsô ofô. . Hence ‖ ‖  T AT ô  ô

S − − rQR( )u U

Putting the above together we get thatS S› under≤ U thisUž event≤ n . ‖ ‖ ‖ ‖

8 ( − ) − − ∗ − + − ∗ QR( )u + (68) z ( ∗ ) { žž tr n r ôA Aô S TW ≤ S S› S› TW ≤ n r . ô ô. Now, taking ‖ = , we‖ note‖ that‖ both‖ events‖ in (65) and (67) happenT AT withô probability ô at least 2 1−(2 +1) 2 −2 − . The result follows from replacing the constant 2 +1 with 7, using 2 2+1 and replacing 8(12t n + 16) with 144 , and combining the inequalities (65), (66) and (68). e  e e t > √ We end by provingt Theorem 5.3t, the guarantees on DEFLATE that we will use when analyzing the main algorithm. Proof of Theorem 5.3. As Remark C.13 points out, in the context of this theorem we are passing to DEFLATE an approximate projector , and the above result simplifies. Using this fact, as well as the upper bound ( − ) 2/4, we get that P› r n r ≤n 12 3 − ∗ QR( )u + ttn S TW ≤ n 2 with probability at least 1 − 7 2 −‖ 2 − for‖ every 2 2. If our desired. quality of approximation t n is − ∗ = , then some basic algebra gives the success probability as at least t > √ S TW   e 3 2 ‖ ‖ 1 − 1008 − 2 − ( − QR( )u) n t 2 t n e . Since 1/4, we can safely set = 2/ , giving n   ≤ t t 3 1 − 1426 − 2 −2 / ( − QR( )u)2 n n t e . n 3 To simplify even further, we’d like to use the upper bound 2 −2 / . These two terms   QR u 2 have opposite curvature in on the interval (0 1), and are equaln at( zero,− n ( so) ) it suffices to check √n e ≤  

, 77 that the inequality holds when = 1. The terms only become closer by setting = 1 everywhere except in the argument of QR( ), so we need only check that n  ⋅ 2 1

2 ( − QR( )u)2 ≤ . Under our assumptions QR( )u 1,e the right  handn side is greater than one, and the left hand less. Thus we can make the replacement, use u , and round for readability to a success , n ≤ 2  ( ) probability of no worse than QR ≤  n 1 − 6000 3 n t2 the constant here is certainly not optimal. ; Finally, for the running time, we need to sample 2 complex Gaussians, perform two QR decompositions, and one matrix multiplication; this gives the total bit operations as n

DEFLATE 2 N QR MM

T (n)= n T + 2T (n)+ T (n). Remark C.19. Note that the exact same proof of Theorem 5.3 goes through in the more general case where the matrix in question is not necessarily a projection, but any matrix close to a rank-

deficient matrix . In this case an extra ∗ term appears in the probability of success (see DEFLATE the guarantee given in the box for the Algorithmr that appears in this appendix). A  (T AT )

78