A New Analysis of Iterative Refinement and its Application to Accurate Solution of Ill-Conditioned Sparse Linear Systems

Carson, Erin and Higham, Nicholas J.

2017

MIMS EPrint: 2017.12

Manchester Institute for Mathematical Sciences, School of Mathematics

The University of Manchester

Reports available from: http://eprints.maths.manchester.ac.uk/
And by contacting: The MIMS Secretary, School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK

ISSN 1749-9097

A NEW ANALYSIS OF ITERATIVE REFINEMENT AND ITS APPLICATION TO ACCURATE SOLUTION OF ILL-CONDITIONED SPARSE LINEAR SYSTEMS∗

ERIN CARSON† AND NICHOLAS J. HIGHAM‡

Abstract. Iterative refinement is a long-standing technique for improving the accuracy of a computed solution to a nonsingular linear system Ax = b obtained via LU factorization. It makes use of residuals computed in extra precision, typically at twice the working precision, and existing results guarantee convergence if the matrix A has condition number safely less than the reciprocal of the unit roundoff, u. We identify a mechanism that allows iterative refinement to produce solutions with normwise relative error of order u to systems with condition numbers of order u^{-1} or larger, provided that the update equation is solved with a relative error sufficiently less than 1. A new rounding error analysis is given and its implications are analyzed. Building on the analysis, we develop a GMRES-based iterative refinement method (GMRES-IR) that makes use of the computed LU factors as preconditioners. GMRES-IR exploits the fact that even if A is extremely ill conditioned the LU factors contain enough information that preconditioning can greatly reduce the condition number of A. Our rounding error analysis and numerical experiments show that GMRES-IR can succeed where standard refinement fails, and that it can provide accurate solutions to systems with condition numbers of order u^{-1} and greater. Indeed in our experiments with such matrices—both random and from the University of Florida Sparse Matrix Collection—GMRES-IR yields a normwise relative error of order u in at most 3 steps in every case.

1. Introduction. Ill-conditioned linear systems Ax = b arise in a wide variety of science and engineering applications, ranging from geomechanical problems [11] to computational number theory [4]. When A is ill conditioned the solution x to Ax = b is extremely sensitive to changes in A and b. Indeed, when the condition number κ(A) = ‖A‖‖A^{-1}‖ is of order u^{-1}, where u is the unit roundoff, we cannot expect any accurate digits in a solution computed by standard techniques. This poses an obstacle in applications where an accurate solution is required, of which there is a growing number [3], [13], [21], [22], [32].

Iterative refinement (Algorithm 1.1) is frequently used to obtain an accurate solution to a linear system Ax = b. Typically, one computes an initial approximate solution x̂ using Gaussian elimination (GE), saving the factorization A = LU. Here and throughout, to simplify the notation we assume that A is a nonsingular matrix for which any required row or column interchanges have been carried out in advance (that is, "A ≡ PAQ", where P and Q are appropriate permutation matrices). After computing the residual r = b − Ax̂ in higher precision ū, where typically ū = u²,¹ one reuses L and U to solve the system Ad = r, rewritten as LUd = r, by substitution. The original approximate solution is then refined by adding the corrective term, x̂ ← x̂ + d. This process is repeated until a desired backward or forward error criterion is satisfied. If A is very ill conditioned, however, the iterative refinement process may not converge; indeed, all existing results on the convergence of iterative refinement require κ(A) to be safely less than u^{-1}.

∗Version of March 27, 2017. Funding: The work of the second author was supported by MathWorks, European Research Council Advanced Grant MATFUN (267526), and Engineering and Physical Sciences Research Council grant EP/I01912X/1. The opinions and views expressed in this publication are those of the authors, and not necessarily those of the funding bodies.
†Courant Institute of Mathematical Sciences, New York University, New York, NY ([email protected], http://math.nyu.edu/~erinc).
‡School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK ([email protected], http://www.maths.manchester.ac.uk/~higham).
¹We are not concerned here with iterative refinement in fixed precision, which can also benefit accuracy, though to a lesser extent [15, Chap. 12], [33].

Algorithm 1.1 Iterative Refinement

Input: n × n matrix A; right-hand side b; maximum number of refinement steps imax.
Output: Approximate solution x̂ to Ax = b.
1: Compute LU factorization A = LU.
2: Solve Ax_0 = b by substitution.
3: for i = 0 : imax − 1 do
4:   Compute r_i = b − Ax_i in precision ū; store in precision u.
5:   Solve Ad_i = r_i.
6:   Compute x_{i+1} = x_i + d_i.
7:   if converged then return x_{i+1}, quit, end if
8: end for
9: % Iteration has not converged.
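For concreteness, the following MATLAB sketch implements Algorithm 1.1 in the case u = single, ū = double, so that the extra-precision residual is an ordinary double-precision computation. The function name and convergence test are illustrative assumptions, not part of the algorithm above.

```matlab
% Minimal sketch of Algorithm 1.1 with u = single and ubar = double.
function x = sir(A, b, imax, tol)
[L, U, p] = lu(A, 'vector');       % step 1: LU factorization in precision u
x = U \ (L \ b(p));                % step 2: initial solve by substitution
for i = 1:imax                     % steps 3-8
    r = single(double(b) - double(A)*double(x));  % residual in ubar, stored in u
    d = U \ (L \ r(p));            % step 5: reuse the LU factors
    x = x + d;                     % step 6: update
    if norm(d, inf) <= tol*norm(x, inf), break, end  % illustrative convergence test
end
end
```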

[Figure 1.1 appears here: two panels, invhilb (top) and randsvd (bottom), plotting κ_∞(Û^{-1}L̂^{-1}A) and 1 + κ_∞(A)u against the matrix dimension (10 to 25).]

Fig. 1.1. Comparison of κ_∞(Û^{-1}L̂^{-1}A) and 1 + κ_∞(A)u for a computed LU factorization with partial pivoting, for matrices of dimension shown on the x-axis. These matrices are very ill conditioned, with κ_∞(A) ranging from 10^{13} to 10^{35}. Computations were in double precision arithmetic. Top: inverse Hilbert matrix. Bottom: matrices generated as gallery('randsvd',n,10^(n+5)) in MATLAB.

Nevertheless, despite the ill conditioning of A, there is still useful information contained in the LU factors and their inverses (perhaps implicitly applied). It has been observed that if L̂ and Û are the computed factors of A from LU factorization with partial pivoting then κ(L̂^{-1}AÛ^{-1}) ≈ 1 + κ(A)u even for κ(A) ≫ u^{-1}, where L̂^{-1}AÛ^{-1} is computed by substitution; for discussion and further experiments see [24], [28], [29]. This approximation can also be made in the case where left preconditioning is used, that is, κ(Û^{-1}L̂^{-1}A) ≈ 1 + κ(A)u. The experimental results shown in Figure 1.1 illustrate the quality of this approximation.

The results in Figure 1.1 lead us to question the conventional wisdom that iterative refinement cannot work in the regime where κ(A) > u^{-1}. If the LU factors contain useful information in that regime, can iterative refinement be made to work there too? We will give a new rounding error analysis that identifies a mechanism by which iterative refinement can indeed work when κ(A) > u^{-1}, provided that we can solve the equations for the updates on line 5 of Algorithm 1.1 with some relative accuracy. We will then use the analysis to develop an implementation of Algorithm 1.1 that enables accurate solution of sparse, very ill conditioned systems. We use the generalized minimal residual (GMRES) method [31] preconditioned by the computed LU factors to solve for the corrections. We refer to this method as GMRES-based iterative refinement (GMRES-IR).

We show that in the case where the condition number κ_∞(A) is around u^{-1}, our approach can obtain a solution x̂ to Ax = b for which the forward error ‖x̂ − x‖_∞/‖x‖_∞ is of order u. Extra precision (assumed to be with a unit roundoff of u²) need only be used in computing the residual, in the triangular solves involved in applying the preconditioner, and in matrix-vector multiplication with A.

An advantage of our approach is that it can succeed when standard iterative refinement (using the LU factors to solve Ad = r) fails or is, in the words of Ma et al. [22], "on the brink of failure". Ma et al. [22] need to solve linear programming problems arising in a biological application and their attempts to use iterative refinement in the underlying linear system solutions were only partially successful. GMRES-IR offers the potential for better results in this application.

We note that there is growing interest in using lower precisions such as single or even half precision in climate and weather modeling [27] and machine learning [12], but this brings an increased likelihood that the problems encountered will be very ill conditioned relative to the working precision. Our work, which is applicable for any u, could be especially relevant in these contexts.
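The approximation κ_∞(Û^{-1}L̂^{-1}A) ≈ 1 + κ_∞(A)u is easy to check numerically; the following MATLAB sketch is in the spirit of the Figure 1.1 experiment but is not the authors' exact code.

```matlab
% Illustrative check of kappa(Uhat^{-1} Lhat^{-1} A) ~ 1 + kappa(A)*u in double.
n = 20;
A = gallery('randsvd', n, 10^(n+5));   % very ill-conditioned matrix, as in Fig. 1.1
[L, U, P] = lu(A);                     % LU factorization with partial pivoting
Atilde = U \ (L \ (P*A));              % Uhat^{-1} Lhat^{-1} A, computed by substitution
u = eps/2;                             % unit roundoff for double precision
fprintf('kappa(Atilde) = %.1e, 1 + kappa(A)*u = %.1e\n', ...
        cond(Atilde, inf), 1 + cond(A, inf)*u);
```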
In Section 2 we present the new rounding error analysis and investigate its implications. In Section 3 we explain how we use preconditioned GMRES within iterative refinement and give, using the results of Section 2, theoretical justification that this GMRES-based iterative refinement can yield an error of the order of u even in cases where κ_∞(A) ≥ u^{-1}. Since we are primarily concerned with sparse linear systems, we discuss in Section 3.1 various choices of pivoting strategy and how these affect the numerical behavior. In Section 4 we discuss other, related work on iterative refinement. Numerical experiments presented in Section 5 confirm our theoretical analysis. Our numerical experiments motivate a two-stage iterative refinement approach, which we briefly discuss in Section 5.3, that first attempts the less expensive standard iterative refinement and switches to a GMRES-IR stage in the case of slow convergence or divergence.

2. Error analysis of iterative refinement. The most general rounding error analysis of iterative refinement is that of Higham [15, Sec. 12.1], which appeared first in [14]. That analysis cannot provide the result we want; convergence is guaranteed only when κ_∞(A) ≤ u^{-1}. We therefore carry out a new analysis with different assumptions. A key observation is that an inequality used without comment in previous analyses can be very weak. We introduce a quantity μ_i^{(p)}, in (2.2) below, that captures the sharpness of the inequality and allows us to draw stronger conclusions.

We first define some notation that will be used in the remaining text. Given an integer k, we define

γ_k = ku/(1 − ku),   γ̄_k = kū/(1 − kū),   γ̃_k = cku/(1 − cku),

where c is some small constant independent of the problem size. For a matrix A and vector x, we define the condition numbers

κ_p(A) = ‖A^{-1}‖_p ‖A‖_p,   cond_p(A) = ‖ |A^{-1}||A| ‖_p,   cond_p(A, x) = ‖ |A^{-1}||A||x| ‖_p / ‖x‖_p,

where |A| = (|a_ij|). If p is not specified, the ∞-norm is implied.

Let A ∈ R^{n×n} be nonsingular and let x̂ be a computed solution to Ax = b. Define x_0 = x̂ and consider the following iterative refinement process:

r_i = b − Ax_i   (compute in precision ū and round result to precision u),
solve Ad_i = r_i   (precision u),
x_{i+1} = x_i + d_i   (precision u),

for i = 0, 1, 2, .... For traditional iterative refinement, ū = u². From this point until the statement of Theorem 2.1 we define r_i, d_i, and x_i to be the computed quantities, in order to avoid a profusion of hats. The only assumption we will make on the solver for Ad_i = r_i is that the computed solution ŷ to a system Ay = f satisfies

(2.1)   ‖y − ŷ‖_∞ / ‖y‖_∞ = θu,   θu ≤ 1,

where θ is a constant depending on A, f, n, and u. Thus the solver need not be LU factorization, or even a factorization method. For any p-norm we define μ_i^{(p)} by

(2.2)   ‖A(x − x_i)‖_p = μ_i^{(p)} ‖A‖_p ‖x − x_i‖_p

and note that

κ_p(A)^{-1} ≤ μ_i^{(p)} ≤ 1.
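Given a reference solution x (computed, say, in higher precision), μ_i^{(∞)} for an iterate x_i can be evaluated directly; a hypothetical MATLAB check is

```matlab
% Hypothetical numerical evaluation of mu_i in (2.2) for the infinity norm;
% x is a reference solution and xi the current iterate.
mu = norm(A*(x - xi), inf) / (norm(A, inf) * norm(x - xi, inf));
```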

As we argue in the next subsection, μ_i^{(p)} may be far below its upper bound, and this is the key reason why iterative refinement can work when κ(A) ≳ u^{-1}.

Consider first the computation of r_i. There are two stages. First, s_i = fl(b − Ax_i) = b − Ax_i + Δs_i is formed in precision ū, so that |Δs_i| ≤ γ̄_{n+1}(|b| + |A||x_i|) [15, Sec. 3.5]. Second, the residual is rounded to the working precision: r_i = fl(s_i) = s_i + f_i, where |f_i| ≤ u|s_i|. Hence

(2.3)   r_i = b − Ax_i + Δr_i,   |Δr_i| ≤ u|b − Ax_i| + (1 + u)γ̄_{n+1}(|b| + |A||x_i|).

For the second step we have, by (2.1) and (2.2),

‖d_i − A^{-1}r_i‖_∞ ≤ θ_i u ‖A^{-1}r_i‖_∞
                    = θ_i u ‖A^{-1}(b − Ax_i + Δr_i)‖_∞
                    ≤ θ_i u ( ‖x − x_i‖_∞ + u μ_i^{(∞)} κ_∞(A) ‖x − x_i‖_∞ + (1 + u)γ̄_{n+1} ‖ |A^{-1}|(|b| + |A||x_i|) ‖_∞ )
(2.4)               ≤ θ_i u (1 + u μ_i^{(∞)} κ_∞(A)) ‖x − x_i‖_∞ + θ_i u (1 + u)γ̄_{n+1} ‖ |A^{-1}|(|b| + |A||x_i|) ‖_∞.

For the last step, using the variant [15, Eq. (2.5)] of the rounding error model we have

x_{i+1} = x_i + d_i + Δx_i,   |Δx_i| ≤ u|x_{i+1}|.

Rewriting gives

x_{i+1} = x_i + A^{-1}r_i + (d_i − A^{-1}r_i) + Δx_i
        = x + A^{-1}Δr_i + (d_i − A^{-1}r_i) + Δx_i.

Hence, using (2.3) and (2.4),

‖x_{i+1} − x‖_∞ ≤ ‖ |A^{-1}|( u|A(x − x_i)| + (1 + u)γ̄_{n+1}(|b| + |A||x_i|) ) ‖_∞
(2.5)            + θ_i u (1 + u μ_i^{(∞)} κ_∞(A)) ‖x − x_i‖_∞
(2.6)            + θ_i u (1 + u)γ̄_{n+1} ‖ |A^{-1}|(|b| + |A||x_i|) ‖_∞ + u‖x_{i+1}‖_∞
                 ≤ u ( μ_i^{(∞)} κ_∞(A) + θ_i (1 + u μ_i^{(∞)} κ_∞(A)) ) ‖x − x_i‖_∞
(2.7)            + γ̄_{n+1}(1 + u)(1 + θ_i u) ‖ |A^{-1}|(|b| + |A||x_i|) ‖_∞ + u‖x_{i+1}‖_∞.

We summarize the analysis in the following theorem.

Theorem 2.1. Let iterative refinement in precisions u and ū ≤ u, and with a solver satisfying assumption (2.1), be applied to a linear system Ax = b with nonsingular A ∈ R^{n×n} and a given approximation x_0 to x. Then for i ≥ 0 the computed iterate x̂_{i+1} satisfies

(2.8)   ‖x̂_{i+1} − x‖_∞ ≲ ( 2μ_i^{(∞)} κ_∞(A)u + θ_i u ) ‖x − x̂_i‖_∞ + nū(1 + θ_i u) ‖ |A^{-1}|(|b| + |A||x̂_i|) ‖_∞ + u‖x̂_{i+1}‖_∞.

Proof. The result follows from (2.7) on dropping second order terms, since θ_i u ≤ 1 by assumption.

We conclude that as long as

(2.9)   2μ_i^{(∞)} κ_∞(A)u + θ_i u < 1

for all i, the relative error will contract until a limiting normwise relative accuracy of order

nū(1 + θu) ‖ |A^{-1}|(|b| + |A||x|) ‖_∞ / ‖x‖_∞ + u ≤ 2nū(1 + θu) cond_∞(A, x) + u

is achieved, where θ is an upper bound on the θ_i terms.

To achieve (2.9) we need θ_i u to be sufficiently less than 1, which is a condition on the solver, and μ_i^{(∞)} κ_∞(A)u to be sufficiently less than 1, which is essentially a condition on the iteration. In the next subsection we consider the latter condition. Note that the limiting accuracy is essentially independent of θ, as long as θu < 1. Therefore it is not necessary to solve the correction equation Ad_i = r_i to high accuracy in order to achieve a final relative error of order u.

2.1. Bounding μ_i^{(p)}. Now we consider the size of μ_i^{(p)} in (2.2). We will focus on the 2-norm, but by equivalence of norms our conclusions also apply to the ∞-norm. Let A have the singular value decomposition A = UΣV^T and denote the jth columns of the matrices of left singular vectors U and right singular vectors V by u_j and v_j, respectively. (Note that in this subsection only, U denotes the matrix of left singular vectors of A rather than the upper triangular factor from an LU factorization of A.) Since we are interested in the case where A is ill conditioned (but nonsingular) we can assume that the singular values satisfy 0 < σ_n ≪ σ_1.

Denote by r_i = b − Ax̂_i = A(x − x̂_i) the exact residual for the computed x̂_i. Then we can rewrite (2.2) for p = 2 as

(2.10)   ‖r_i‖_2 = μ_i^{(2)} ‖A‖_2 ‖x − x̂_i‖_2.

We have

x − x̂_i = VΣ^{-1}U^T r_i = Σ_{j=1}^{n} (u_j^T r_i) v_j / σ_j,

and so

‖x − x̂_i‖_2^2 ≥ Σ_{j=n+1−k}^{n} (u_j^T r_i)^2 / σ_j^2 ≥ (1/σ_{n+1−k}^2) Σ_{j=n+1−k}^{n} (u_j^T r_i)^2 = ‖P_k r_i‖_2^2 / σ_{n+1−k}^2,

where P_k = U_k U_k^T with U_k = [u_{n+1−k}, ..., u_n]. Hence from (2.10) we have

μ_i^{(2)} ≤ (‖r_i‖_2 / ‖P_k r_i‖_2) (σ_{n+1−k} / σ_1).

The bound tells us that μ_i^{(2)} will be much less than 1 if r_i contains a significant component in the subspace span(U_k) for any k such that σ_{n+1−k} ≈ σ_n.

This argument says that we can expect μ_i^{(2)} ≪ 1 when r_i is a "typical" vector—one having sizeable components in the direction of every left singular vector of A—in which case x − x̂_i is not typical, in that it has large components in the direction of the right singular vectors of A corresponding to small singular values. We cannot prove that r_i is typical, but we can verify it numerically, which we do in Section 5.

We can gain further insight from backward error considerations. For any backward stable solver (such as LU factorization with appropriate pivoting for stability, or GMRES) we know that the backward error ‖r_i‖_2/(‖A‖_2‖x̂_i‖_2) of the computed solution x̂_i to Ax = b will be small, yet the forward error may be large. For the refinement, the initial backward error will be small and the same will be true for each iterate x̂_i, as refinement does not degrade the backward error. So for an ill-conditioned system we would expect to see that

‖r_i‖_2 / (‖A‖_2 ‖x̂_i‖_2) ≈ u ≪ ‖x − x̂_i‖_2 / ‖x‖_2,

or equivalently μ_i^{(2)} ≪ 1 assuming ‖x̂_i‖_2 ≈ ‖x‖_2, at least in the early stages of the refinement when x̂_i is not very accurate. However, close to convergence both the residual and the error will be small, so that

‖r_i‖_2 ≈ ‖A‖_2 ‖x − x̂_i‖_2,

or μ_i^{(2)} ≈ 1. Therefore we can expect μ_i^{(2)} to increase as the refinement steps progress, and this could result in a slowing of the convergence. We note that Wilkinson [36] comments that "The successive r derived during the course of iterative refinement become progressively more deficient in components corresponding to the smaller singular values of A". This claim is equivalent to saying that μ_i^{(2)} will increase with i, but Wilkinson does not justify the claim or make any further use of it.

2.2. The role of θ_i. In standard iterative refinement the LU factorization of A is reused to solve Ad_i = r̂_i by substitution in each refinement step, where here and in the remaining text r̂_i denotes the computed residual vector. We will now show that no matter how much precision is used in the substitutions, a relative error less than 1 for the computed solution d̂_i cannot be guaranteed. Indeed, it suffices to assume that the substitutions with the computed LU factors L̂ and Û are carried out exactly. Then

d̂_i = Û^{-1}L̂^{-1}r̂_i = (A + ΔA)^{-1}r̂_i,

where |ΔA| ≤ γ_n|L̂||Û| [15, Thm. 9.3]. Hence

(2.11)   ‖d̂_i − A^{-1}r̂_i‖_∞ / ‖A^{-1}r̂_i‖_∞ ≈ ‖A^{-1}ΔA A^{-1}r̂_i‖_∞ / ‖A^{-1}r̂_i‖_∞ ≤ γ_n ‖ |A^{-1}||L̂||Û| ‖_∞.

The term ‖ |A^{-1}||L̂||Û| ‖_∞, which is at least as large as cond_∞(A), will usually be of similar size to κ_∞(A), unless A has poor row scaling. Therefore if κ_∞(A) ≥ u^{-1} then (2.11) does not guarantee any relative accuracy in d̂_i, so we have θ_i u > 1 in (2.1) and our analysis suggests that iterative refinement may fail. The culprit is the ΔA term, which comes from the LU factorization in precision u. We conclude that if the correction equation is solved using the original LU factors then standard iterative refinement may fail to converge for very ill conditioned A, no matter how much precision is used in the triangular solves and regardless of the size of the μ_i values.

One way to satisfy (2.1) is to use higher precision in computing the LU factorization, but this is very expensive. In the following section we present an alternative approach. We show that the correction equations can be solved with some relative accuracy even for numerically singular A by using a different solver: GMRES preconditioned by the existing (precision u) LU factors. This approach can be motivated by the observation, mentioned in Section 1, that even if A is very ill conditioned the computed LU factors still contain useful information.

3. GMRES iterative refinement. In this section we will show that we can use GMRES [31] to solve Ad_i = r̂_i in iterative refinement in such a way that Theorem 2.1 guarantees accurate solution of ill-conditioned systems. We will use the computed LU factors as left preconditioners, so that GMRES solves the preconditioned system

(3.1)   Ãd_i = s_i,

where Ã = Û^{-1}L̂^{-1}A and s_i = Û^{-1}L̂^{-1}r̂_i. The GMRES method presented in Algorithm 3.1 is a simplified variant in which no restarting is used and we assume that the iteration is started with the zero vector as the initial guess. Additionally, the method uses precision ū = u² in the triangular solves with L̂ and Û and in matrix-vector multiplication with A. The remaining computations are performed in precision u and all quantities are stored in precision u. For clarity, in this section we use hats to decorate all quantities computed in finite precision. To avoid confusion, in the remaining text we will use the word iterations and indices j and k in association with GMRES and the word steps and index i in association with the iterative refinement process.

Our analysis proceeds in three main steps. First, we show that κ_∞(Ã) is small. Then we show that the error ‖ŝ_i − s_i‖_∞ in the computed right-hand side ŝ_i = fl(Û^{-1} fl(L̂^{-1}r̂_i)) is small. Then we use the analysis of [26] to show that our GMRES variant provides a backward stable solution to Ãd_i = ŝ_i.

Algorithm 3.1 Left-preconditioned GMRES
Input: n × n matrix A; right-hand side b; maximum number of iterations m; tolerance τ; approximate LU factors L and U.
Output: Approximate solution x̂ to Ax = b.
1: Compute r_0 = U^{-1}(L^{-1}b) in precision ū; store in precision u.
2: β = ‖r_0‖_2, v_1 = r_0/β
3: for j = 1, ..., m do
4:   Compute z = U^{-1}(L^{-1}(Av_j)) in precision ū; store in precision u.
5:   for ℓ = 1, ..., j do
6:     h_{ℓ,j} = z^* v_ℓ
7:     z = z − h_{ℓ,j} v_ℓ
8:   end for
9:   h_{j+1,j} = ‖z‖_2, v_{j+1} = z/h_{j+1,j}
10:  Let V = [v_1, ..., v_j] and H = {h_{i,ℓ}}, 1 ≤ i ≤ j+1, 1 ≤ ℓ ≤ j.
11:  Update the QR factorization H = QR (via Givens rotations).
12:  g = Q^T(βe_1)
13:  if |g(j + 1)| ≤ τβ then break, end if
14: end for
15: Solve y = argmin_ȳ ‖g − Rȳ‖_2.
16: x̂ = V y
17: Return x̂.
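The extra-precision operations on lines 1 and 4 are the only places where ū enters Algorithm 3.1. As a hedged illustration for u = single, ū = double (the helper name is an assumption, and the row permutation is taken to have been applied to A in advance):

```matlab
% Sketch of the preconditioned product on line 4 of Algorithm 3.1, with
% u = single and ubar = double; L, U are the computed single-precision factors.
function z = precond_matvec(A, L, U, v)
w = double(A) * double(v);       % w = A*v in precision ubar
y = double(L) \ w;               % forward substitution in precision ubar
z = single(double(U) \ y);       % back substitution; round and store in precision u
end
```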

These three results allow us to conclude that Ãd_i = s_i can be solved with some degree of relative accuracy, that is, (2.1) is satisfied. To simplify the analysis we assume in this section that κ_∞(A) is not too much larger than u^{-1}, although our experiments in Section 5 suggest that the GMRES-based approach can work even when κ_∞(A) is a few orders of magnitude larger than u^{-1}.

We begin by showing that the matrix Ã is well conditioned. We can write

Ã = Û^{-1}L̂^{-1}A = (A + ΔA)^{-1}A ≈ I − A^{-1}ΔA,
Ã^{-1} = A^{-1}L̂Û = A^{-1}(A + ΔA) = I + A^{-1}ΔA,

which give the bounds

‖Ã‖_∞ ≲ 1 + γ_n ‖ |A^{-1}||L̂||Û| ‖_∞,
‖Ã^{-1}‖_∞ ≲ 1 + γ_n ‖ |A^{-1}||L̂||Û| ‖_∞,

and then

κ_∞(Ã) ≲ (1 + γ_n ‖ |A^{-1}||L̂||Û| ‖_∞)^2.

Therefore even if A is so ill conditioned that γ_n ‖ |A^{-1}||L̂||Û| ‖_∞ is of order 100 (say), we still expect κ_∞(Ã) to be of modest size. (Note that by comparison with the observation in Section 1 that κ(Û^{-1}L̂^{-1}A) ≈ 1 + κ(A)u for the computed matrix, here we have a strict bound for the corresponding exact matrix.)

Of course, the matrix Ã is not explicitly formed in preconditioned GMRES. Since GMRES only requires matrix-vector products with the preconditioned coefficient matrix, we compute Ãv_i by forming w_i = Av_i and performing the triangular solves L̂y_i = w_i and Ûz_i = y_i, all at precision ū = u².

Unlike Ã, the right-hand side s_i is explicitly formed at the beginning of the preconditioned GMRES algorithm. Using precision ū, this computation yields

ŝ_i = (Û + ΔU)^{-1}(L̂ + ΔL)^{-1}r̂_i,

where |ΔU| ≤ γ̄_n|Û| and |ΔL| ≤ γ̄_n|L̂|. Some manipulation gives

ŝ_i ≈ (Û^{-1} − Û^{-1}ΔU Û^{-1})(L̂^{-1} − L̂^{-1}ΔL L̂^{-1}) r̂_i
    ≈ Û^{-1}L̂^{-1}r̂_i − Û^{-1}L̂^{-1}ΔL L̂^{-1}r̂_i − Û^{-1}ΔU Û^{-1}L̂^{-1}r̂_i
    = s_i − Û^{-1}L̂^{-1}(ΔL Û + L̂ ΔU) s_i
    = s_i − (A + ΔA)^{-1}(ΔL Û + L̂ ΔU) s_i,

so

s_i − ŝ_i ≈ A^{-1}(ΔL Û + L̂ ΔU) s_i.

Hence

(3.2)   ‖s_i − ŝ_i‖_∞ ≲ γ̄_{2n} ‖ |A^{-1}||L̂||Û| ‖_∞ ‖s_i‖_∞.

Again, the quantity ‖ |A^{-1}||L̂||Û| ‖_∞ can be as large as κ_∞(A). Nevertheless, since precision ū is used, we still expect ‖s_i − ŝ_i‖_∞ ≲ γ̃_n ‖s_i‖_∞ as long as κ_∞(A) is not too much larger than u^{-1}.

We now want to show that GMRES provides a backward stable solution to Ãd_i = ŝ_i. We will use the analysis of [26], where it is proved that the variant of GMRES that uses modified Gram-Schmidt orthogonalization (MGS-GMRES) is backward stable. This proof relies on the observation that, given a matrix A and right-hand side b, carrying out k − 1 iterations of the Arnoldi process is equivalent to applying k steps of modified Gram-Schmidt to the matrix

[b, fl(AV̂_{k−1})] = [b, AV_{k−1}] + [0, ΔV_{k−1}],

where V̂_{k−1} = [v̂_1, ..., v̂_{k−1}] is the matrix of computed basis vectors and V_{k−1} = [v_1, ..., v_{k−1}] is V̂_{k−1} with its columns correctly normalized, that is, for j ≤ k − 1,

(3.3)   v̂_j = v_j + Δv_j′,   ‖Δv_j′‖_2 ≤ γ̃_n.

The term ΔV_{k−1} = [Δv_1, ..., Δv_{k−1}] contains both errors in applying the matrix A to vectors v̂_j and errors in normalizing v̂_j, for j ≤ k − 1. It is shown in [26] that ‖ΔV_{k−1}‖_F ≤ k^{1/2}γ̃_n‖A‖_F. This result is then combined with results on the finite precision behavior of MGS, including the loss of orthogonality in the MGS process and the backward stability of MGS for solving linear least squares problems, to show the backward stability of MGS-GMRES for solving Ax = b.

We now consider the case where MGS-GMRES is used to solve Ãd_i = ŝ_i. The only thing that will change computationally is the error in applying Ã to a vector, which is done in this case without explicitly forming Ã. Other aspects of the MGS-GMRES algorithm, such as the MGS orthogonalization process and least squares solve, remain unchanged. Therefore if we can show that

(3.4)   ‖ΔV_{k−1}‖_F ≤ k^{1/2}γ̃_n‖Ã‖_F,

then carrying through the remaining analysis in [26] shows that the MGS-GMRES backward error results of [26] hold for the left-preconditioned GMRES method run with Ã and ŝ_i, that is, for some k ≤ n, we have

(3.5)   (Ã + ΔÃ)d̂_i = ŝ_i + Δŝ_i,   ‖ΔÃ‖_F ≤ γ̃_{kn}‖Ã‖_F,   ‖Δŝ_i‖_2 ≤ γ̃_{kn}‖ŝ_i‖_2.

We now show that if precision ū is used in implicitly applying Ã to v̂_j, then ΔV_{k−1} indeed satisfies the required bound (3.4). Using precision ū, we compute

(A + ΔA)v̂_j = ŵ_j,   |ΔA| ≤ γ̄_n|A|,
(L̂ + ΔL)ŷ_j = ŵ_j,   |ΔL| ≤ γ̄_n|L̂|,
(Û + ΔU)ẑ_j = ŷ_j,   |ΔU| ≤ γ̄_n|Û|.

The computed vector ẑ_j can therefore be written

ẑ_j = (Û + ΔU)^{-1}(L̂ + ΔL)^{-1}(A + ΔA)v̂_j
    ≈ (Û^{-1} − Û^{-1}ΔU Û^{-1})(L̂^{-1} − L̂^{-1}ΔL L̂^{-1})(A + ΔA)v̂_j
    = (Ã + ΔÃ′)v̂_j,

where

ΔÃ′ ≈ Û^{-1}L̂^{-1}ΔA − Û^{-1}L̂^{-1}ΔL L̂^{-1}A − Û^{-1}ΔU Û^{-1}L̂^{-1}A
    = ÃA^{-1}ΔA − Û^{-1}L̂^{-1}ΔL Û Ã − Û^{-1}ΔU Ã,

from which we obtain

(3.6)   ‖ΔÃ′‖_F ≤ γ̄_n ( κ_F(A) + κ_F(Û)κ_F(L̂) + κ_F(Û) ) ‖Ã‖_F.

If κ_F(A) ≈ u^{-1} and κ_F(L̂) is modestly sized (which will usually be the case, as L̂ is unit triangular with off-diagonal elements bounded by 1), we then have that ‖ΔÃ′‖_F ≲ γ̃_n‖Ã‖_F. Accounting for the errors in normalization, with (3.3) we have

ẑ_j ≈ (Ã + ΔÃ′)(v_j + Δv_j′) ≈ Ãv_j + ΔÃ′v_j + ÃΔv_j′ = Ãv_j + Δv_j,

with Δv_j = ΔÃ′v_j + ÃΔv_j′. Using (3.6), this gives

‖Δv_j‖_2 ≤ ‖ΔÃ′‖_2‖v_j‖_2 + ‖Ã‖_2‖Δv_j′‖_2 ≲ γ̃_n‖Ã‖_F.

Then after k − 1 iterations,

Ẑ_{k−1} = [ẑ_1, ..., ẑ_{k−1}] = ÃV_{k−1} + ΔV_{k−1},   ‖ΔV_{k−1}‖_F ≤ k^{1/2}γ̃_n‖Ã‖_F.

Therefore (3.4) is satisfied, and so the backward error result (3.5) holds, that is, at some iteration k ≤ n,

(3.7)   (Ã + ΔÃ)d̂_i = ŝ_i + Δŝ_i,   ‖ΔÃ‖_F ≤ γ̃_{kn}‖Ã‖_F,   ‖Δŝ_i‖_2 ≤ γ̃_{kn}‖ŝ_i‖_2.

We now want to show that the computed d̂_i is a backward stable solution to Ãd_i = s_i (with s_i rather than ŝ_i as the right-hand side). Writing ŝ_i = s_i + (ŝ_i − s_i), we have

s_i − Ãd̂_i = ΔÃd̂_i − (ŝ_i − s_i) − Δŝ_i,

which, using (3.2) and (3.7), gives the bound

‖s_i − Ãd̂_i‖_∞ ≤ ‖ΔÃ‖_∞‖d̂_i‖_∞ + ‖ŝ_i − s_i‖_∞ + ‖Δŝ_i‖_∞
               ≲ nγ̃_{kn}‖Ã‖_∞‖d̂_i‖_∞ + γ̃_n‖s_i‖_∞ + n^{1/2}γ̃_{kn}‖ŝ_i‖_∞
               ≲ nγ̃_{kn}‖Ã‖_∞‖d̂_i‖_∞ + γ̃_n‖s_i‖_∞ + n^{1/2}γ̃_{kn}(1 + γ̃_n)‖s_i‖_∞

               ≲ nγ̃_{kn}( ‖Ã‖_∞‖d̂_i‖_∞ + ‖s_i‖_∞ ).

Thus the normwise relative backward error for the system (3.1) is

‖s_i − Ãd̂_i‖_∞ / ( ‖Ã‖_∞‖d̂_i‖_∞ + ‖s_i‖_∞ ) ≲ nγ̃_{kn},

and therefore the relative error of the computed d̂_i can be bounded by

‖d_i − d̂_i‖_∞ / ‖d_i‖_∞ ≲ nγ̃_{kn}κ_∞(Ã).

Thus in (2.1) we can take θ_i u = nγ̃_{kn}κ_∞(Ã). Since, as we have shown, κ_∞(Ã) is small, we expect that θ_i u ≪ 1.

We conclude that this variant of preconditioned GMRES can solve for the correction vector with sufficient accuracy to allow convergence of the iterative refinement process. Thus we define a new iterative refinement scheme, where in Algorithm 1.1 the solve in line 5 is performed by invoking GMRES (Algorithm 3.1) with input matrix A, right-hand side r_i, preconditioners L̂ and Û, a specified tolerance τ, and a maximum number of iterations m. We call this method GMRES-based iterative refinement (GMRES-IR); a schematic driver is sketched below. In Section 5, we show experimentally that GMRES-IR can indeed converge to an accurate solution to Ax = b even when κ_∞(A) is a few orders of magnitude larger than u^{-1}.

In discussing the backward stability of GMRES, we have used results from [26] that are specific to the MGS variant of GMRES. We conjecture however that one could prove similar results (that is, show θ_i u ≪ 1) when certain other GMRES variants are used to solve for the corrective term with LU preconditioning, such as Householder GMRES [35] and flexible GMRES (FGMRES) [30], which were proved to be backward stable in [8] and [1], respectively.
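As a concrete (and hedged) illustration, the schematic driver mentioned above might look as follows for u = single, ū = double; pgmres is a hypothetical routine implementing Algorithm 3.1 and is not code from this paper.

```matlab
% Schematic GMRES-IR driver with u = single and ubar = double.
function x = gmres_ir(A, b, imax, tau)
[L, U, p] = lu(A, 'vector');      % precision-u LU factorization
x = U \ (L \ b(p));               % initial solve
for i = 1:imax
    r = single(double(b) - double(A)*double(x));   % residual in ubar, stored in u
    d = pgmres(A, r, L, U, p, tau);  % Algorithm 3.1 with factors L, U, permutation p
    x = x + d;
end
end
```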

As an alternative to Ã we could use right preconditioning or split preconditioning. Consider the split preconditioning case, where Ā = L̂^{-1}AÛ^{-1}. It is straightforward to show that

|Ā − I| ≤ γ_n |L̂^{-1}||L̂||Û||Û^{-1}|,
|Ã − I| ≤ γ_n |Û^{-1}||L̂^{-1}||L̂||Û|.

The first of these two bounds is the more favorable, as it allows the diagonal of Û, which has elements of potentially widely varying magnitude, to cancel, whereas in the second bound the L̂-based term intervenes. A related observation is that for the exact LU factors we have AD = LUD for diagonal D, and D does not affect the pivot sequence. Therefore, since |U||U^{-1}| = |UD||(UD)^{-1}|, the bound for Ā − I has the desirable property of being insensitive to the column scaling of A, so split preconditioning might be the best choice when the matrix is badly scaled.

3.1. Pivoting strategies for sparse LU. In the sparse case, it is common to use a pivoting strategy that allows for minimizing fill-in of the triangular factors and preallocating data structures. A common technique is static pivoting [9], [10], [20], in which a strict pivot ordering is decided during a structural analysis phase. If a pivot is encountered that is too small then a small perturbation can be added to the diagonal in order to limit the element growth. Another technique for sparse matrices is threshold pivoting, in which an entry a_pq is selected as a pivot only if |a_pq| ≥ φ max_j |a_pj|, where 0 < φ < 1. This limits the growth factor to (1 + 1/φ)^{n−1}.

Another point of interest is the use of an incomplete LU factorization, where the nonzero structure of L and U is restricted based on the nonzero structure of A^k for some fill level k ≥ 0. One possibility is to use the complete LU factors for the initial solve and to drop entries from L and U for their use as preconditioners in GMRES-IR. This could allow the preconditioned system to remain reasonably well-conditioned while reducing the cost of applying the preconditioner in some cases. The investigation of incomplete LU factorizations for our purposes remains future work.

4. Related work. Kobayashi and Ogita [18], [19] have designed an iterative refinement method for linear systems Ax = b with ill-conditioned A. They compute an LU factorization with partial pivoting of A^T, perform an initial solve, then carry out standard iterative refinement. If iterative refinement fails to converge in a set number of steps then a second phase is entered: W = U^{-T} is computed, the products Z = WA and d = Wb are formed using special techniques that yield greater accuracy, and the system Zx = d is solved by LU factorization with partial pivoting followed by iterative refinement. An alternative approach requiring fewer flops is given in [19], in which the preconditioned matrix is constructed by computing an accurate residual of the LU factorization. However, both methods require explicit construction of the preconditioned system, making them unsuitable for sparse problems, and they need a second LU factorization of the preconditioned coefficient matrix.

No analysis is given in [18], [19] to support the method. However, our analysis is applicable, as we briefly indicate. We need to determine a bound on θu in (2.1) for solution of the update equation Ay = f, which is actually solved via (WA)y = Wf. Here, both WA and Wf are effectively computed at precision ū and then rounded to precision u, and an LU factorization of WA is used. Relative errors of order roughly κ(WA)u + u‖Z^{-1}‖‖W‖‖A‖ are incurred. It is an assumption of this method that κ(WA) ≪ κ(A), and if this inequality is true we can expect θu ≪ 1.

Ogita [24] and Oishi, Ogita, and Rump [25] develop algorithms for accurate solution of ill-conditioned linear systems that build approximate inverses of A or its LU factors.
These algorithms are very different from that developed here and are not applicable to sparse matrices because of the need to form explicit approximations to inverses.

Arioli and Duff [1] show that FGMRES implemented in double precision and preconditioned with an LU factorization computed in single precision can deliver backward stability at double precision, even for ill conditioned systems. This work builds on the earlier work of Arioli et al. [2], which focuses on the symmetric indefinite case. Based on this work, Hogg and Scott [16] have implemented an algorithm for symmetric indefinite systems that computes a solution using a direct solver in single precision, performs iterative refinement using the factorization of A, and then uses mixed precision FGMRES preconditioned by the direct solver to solve the original system. The stopping criteria are backward error-based.

Turner and Walker [34] frame restarted GMRES as a type of "abstract improvement algorithm". They show that restarted GMRES can be viewed as an iterative refinement process where, in each step, the corrective term is found using a fixed number of GMRES iterations. They use this connection with standard iterative refinement to justify the use of high-accuracy computations in selected parts of restarted GMRES. They do not, however, give any supporting analysis, nor do they consider preconditioned versions of GMRES.

Our approach is related to those in [1], [2], and [16] in the sense that restarted GMRES can be viewed as an iterative refinement process (see [34]). However our approach differs from those in [1], [2], and [16] in a number of ways. First, we analyze the convergence of the iterative refinement process where a preconditioned GMRES solver is used for refinement, rather than analyze the convergence of GMRES (right) preconditioned by the triangular factors. Second, our emphasis is on solving sparse nonsymmetric linear systems, whereas the algorithms in [2] and [16] are aimed at the sparse symmetric case. Finally—and most importantly—our focus is on the forward error as opposed to the backward error. Our goal is to obtain a forward error of order the unit roundoff, u, whereas a backward error of order u only guarantees a forward error of order κ_∞(A)u.

5. Numerical experiments. In this section we compare the convergence of the forward error in standard iterative refinement and GMRES-IR for problems where the matrix is very ill conditioned. Our test problems include both random dense matrices generated in MATLAB and real problems from the University of Florida Sparse Matrix Collection [5], [6]. We test two combinations of u and ū: single/double precision (u = 2^{-24}, ū = 2^{-53}) and double/quadruple precision (u = 2^{-53}, ū = 2^{-113}). Single and double quantities and computations use built-in MATLAB datatypes and routines. For quadruple precision, we use the Advanpix Multiprecision Computing Toolbox [23] with the setting mp.Digits(34), which is compliant with the IEEE 754-2008 standard [17].

For all the test problems in this section, the right-hand side is generated in MATLAB by b = randn(n,1) and then normalized so that ‖b‖_∞ ≈ 1. This results in a true solution x for which ‖x‖_∞ is large. We also carried out the same experiments using a small-normed x, by choosing x as a random vector and constructing the right-hand side b = Ax using extra precision. The results were similar to the results for large-normed x presented in this section.
At the start of each experiment we use the MATLAB command rng(1) to seed the random number generator for reproducibility. We use the MATLAB lu function to compute the LU factorization with partial pivoting. All quantities are stored in the working precision u. The computation of the residual at the start of each refinement step is done in precision ū. Within the GMRES method, the matrix-vector multiplication with A and the triangular solves with L̂ and Û are also performed in precision ū (as explained in Section 3). All other computations are performed in precision u.

In the figures, plots on the left show the relative error e_i = ‖x − x̂_i‖_∞/‖x‖_∞ for standard iterative refinement (red line and circles) and GMRES-IR (blue line and squares), both started from the initial solution obtained via LU factorization with partial pivoting. Here, x is a reference solution computed in precision u² (and stored in precision u). We let the process run until the forward error converges to the level ε = n^{1/2}u (indicated by a dashed black line) or the maximum number of iterations is reached. Plots in the middle show the computed values of μ_i^{(∞)} (in (2.2)), and plots on the right show the computed values of θ_i u (in (2.1)) for the solves for the correction terms. In these plots, the dashed black line marks 1, which is an upper bound on μ_i^{(∞)} and a constraint on θ_i u for convergence of the iterative refinement process.

In all tests in this section, we set the maximum number of iterative refinement steps (parameter imax in Algorithm 1.1) to 15. For GMRES-IR, the maximum number of GMRES iterations m in each iterative refinement step is set to n, although convergence always occurs well before n iterations. The GMRES convergence tolerance (the parameter τ in Algorithm 3.1) is set to 10^{-4}. As discussed just after Theorem 2.1, it is not necessary to solve the correction equation to high accuracy. Since we expect the preconditioned matrix Ã to be very well conditioned, the forward error of the correction will be not too much larger than the backward error, so GMRES can be terminated long before the backward error is at the level O(u). In these tests, we found that τ = 10^{-4} provided a good balance between ensuring convergence of the GMRES-IR process and minimizing the number of GMRES iterations required. In practice, this parameter may be adjusted depending on the application, the conditioning of A, and the relative costs of standard iterative refinement and GMRES-IR steps.
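For illustration, one single/double test in the setup just described might be driven as follows, where gmres_ir is the hypothetical driver sketched in Section 3 (this is not the authors' experimental code):

```matlab
% Sketch of one single/double experiment from the setup described above.
rng(1);                                     % seed for reproducibility
n = 100;
A = single(gallery('randsvd', n, 1e8, 3));  % ill-conditioned random test matrix
b = single(randn(n, 1));
b = b / norm(b, inf);                       % normalize so that norm(b, inf) ~ 1
x = gmres_ir(A, b, 15, 1e-4);               % imax = 15, GMRES tolerance tau = 1e-4
```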

5.1. Random dense matrices. We begin by testing random dense matrices of dimension n = 100 using u = 2^{-24} (single precision) and ū = 2^{-53} (double precision). The test matrices were generated using the MATLAB command gallery('randsvd', n, kappa(i), 3), where kappa is a list of the desired 2-norm condition numbers 10^7, 10^8, 10^9, and 10^{10}.

Our test results are shown in Figure 5.1. Table 5.1 shows the number of standard iterative refinement (SIR) steps, the number of GMRES-IR steps, and the number of GMRES iterations summed over all GMRES-IR steps. In the parenthetical list next to the number of GMRES iterations, element i gives the number of GMRES iterations in GMRES-IR step i. Dashes in the table indicate that the method did not converge to the level ε = n^{1/2}u within the maximum number of refinement steps (15 in all experiments).

From Figure 5.1, we can see that when the condition number of A is close to but still less than u^{-1} (Test 1), standard iterative refinement converges within 4 steps. GMRES-IR converges in 2 steps, each of which consists of 3 iterations of preconditioned GMRES. When the condition number of A grows to u^{-1} and larger, however, standard iterative refinement no longer converges within 15 steps (in fact it diverges in Tests 2, 3, and 4). From the plots on the right we can see that θ_i u > 1 for standard iterative refinement in these tests, and so by Theorem 2.1 we should not expect standard iterative refinement to converge. For GMRES-IR, however, θ_i u < 1 for all tests, and GMRES-IR converges in at most 3 refinement steps, though Table 5.1 shows that as κ_∞(A) grows larger more GMRES iterations are required for convergence in each refinement step.

The middle plots show the values of μ_i^{(∞)} for each iterative refinement step. We see that μ_i^{(∞)} starts out close to κ_∞(A)^{-1} and grows at a rate proportional to the rate of the decrease of the error e_i. So if the iterative refinement process is converging, the error grows small and μ_i^{(∞)} increases towards 1. In Tests 2, 3, and 4, where standard iterative refinement does not converge, μ_i^{(∞)} stays small.

Failure of GMRES-IR is possible if κ_∞(A) becomes large enough relative to u^{-1}, but the algorithm often does better than we might hope. For the single/double experiments with this class of matrices, GMRES-IR exhibits slower convergence and/or stagnation of the error once κ_∞(A) ≳ 5 · 10^{10}.

Table 5.1
Comparison of refinement steps for each method shown in Figure 5.1.

Test   κ_∞(A)        κ_∞(Ã)       SIR steps   GMRES-IR steps   GMRES its
1      6.7 · 10^7    2.3          4           2                6 (3,3)
2      1.0 · 10^9    6.2 · 10^1   -           2                12 (5,7)
3      1.8 · 10^{10} 6.4 · 10^3   -           2                37 (16,21)
4      1.3 · 10^{10} 1.4 · 10^4   -           3                104 (33,36,35)

Table 5.2
Comparison of refinement steps for each method shown in Figure 5.2.

Test   κ_∞(A)        κ_∞(Ã)       SIR steps   GMRES-IR steps   GMRES its
1      5.3 · 10^{15} 1.1          6           3                6 (2,2,2)
2      4.8 · 10^{16} 2.5          10          3                9 (3,3,3)
3      2.9 · 10^{17} 2.7 · 10^1   -           3                15 (5,5,5)
4      1.6 · 10^{18} 8.5 · 10^2   -           3                34 (10,12,12)

We now perform an analogous experiment using u = 2^{-53} (double precision) and ū = 2^{-113} (quadruple precision). The problems are the same size (n = 100) and are generated in the same way as before using the MATLAB gallery('randsvd') function, but now with kappa values 10^{15}, 10^{16}, 10^{17}, and 10^{18}.

The results are shown in Figure 5.2. We give the total number of standard iterative refinement steps, GMRES-IR steps, and GMRES iterations required for convergence in each test in Table 5.2.

The observations from the single/double experiments hold for the double/quad case as well. In these tests, standard iterative refinement converged in Tests 1 and 2 but not in Tests 3 and 4. In Test 3, it appears that standard iterative refinement may eventually converge to the level ε = n^{1/2}u if allowed enough refinement steps. Interestingly, the corresponding plot shows that θ_i u is close to but safely less than 1 (it is around 0.4 in each step), confirming that the iterative refinement process can still make progress despite not having θ_i u ≪ 1.

GMRES-IR converges in 3 refinement steps in all the double/quad tests. Table 5.2 shows that the number of GMRES iterations required per GMRES-IR step increases with κ_∞(A), although when both standard iterative refinement and GMRES-IR converge, the total number of GMRES iterations is about the same as the number of standard iterative refinement steps. In Test 4, we see that GMRES-IR can converge to a relative error of order n^{1/2}u even when κ_∞(A) is orders of magnitude larger than u^{-1} (in this test, κ_∞(A) = 1.6 · 10^{18}).

5.2. University of Florida Sparse Matrix Collection tests. We now test the two iterative refinement schemes on some ill-conditioned problems from the University of Florida Sparse Matrix Collection [5], [6]. Properties of the test matrices are listed in Table 5.3.

We first test matrices that are close to numerically singular for u = 2^{-24} (single precision), with ū = 2^{-53} (double precision); see the first four rows of Table 5.3. Figure 5.3 and Table 5.4 show the results in the same format as previous experiments. For the matrices adder_dcop_06 and adder_dcop_26, standard iterative refinement does not converge, as we would expect since θ_i u ≥ 1.

Table 5.3
Properties of the test matrices. The quantities cond(A), κ_∞(A), and κ_∞(Ã) given in the table were computed in single precision for the first four rows and double precision for the last four rows (corresponding to the working precision in the corresponding tests).

Matrix          Application    n      cond(A)       κ_∞(A)        κ_∞(Ã)
radfr1          chem. eng.     1048   2.1 · 10^8    1.0 · 10^{11} 1.6 · 10^3
adder_dcop_06   circuit sim.   1813   1.3 · 10^{10} 7.2 · 10^{12} 2.8
adder_dcop_19   circuit sim.   1813   2.8 · 10^8    9.1 · 10^{11} 1.0
adder_dcop_26   circuit sim.   1813   4.3 · 10^8    7.9 · 10^{11} 4.5 · 10^1
mhda416         MHD            416    1.1 · 10^{19} 1.9 · 10^{25} 7.3 · 10^9
oscil_dcop_06   circuit sim.   430    1.7 · 10^{18} 1.1 · 10^{21} 4.5 · 10^1
oscil_dcop_42   circuit sim.   430    6.7 · 10^{17} 5.0 · 10^{20} 2.3
oscil_dcop_43   circuit sim.   430    1.0 · 10^{18} 7.7 · 10^{20} 2.1

Table 5.4
Comparison of refinement steps for each method shown in Figure 5.3.

Matrix name     SIR steps   GMRES-IR steps   GMRES its
radfr1          13          1                1 (1)
adder_dcop_06   -           1                2 (2)
adder_dcop_19   -           1                1 (1)
adder_dcop_26   -           1                3 (3)

For matrices radfr1 and adder_dcop_19, θ_i u is close to but still less than 1, and standard iterative refinement converges slowly. In all tests, GMRES-IR converges in only a single refinement step consisting of at most 3 iterations of GMRES.

The last four matrices in Table 5.3 were tested using u = 2^{-53} (double precision) and ū = 2^{-113} (quadruple precision). The results can be found in Figure 5.4 and Table 5.5. Again, we see that the behavior of standard iterative refinement and GMRES-IR is as expected. In short, GMRES-IR enables the accurate solution of very ill conditioned problems even when standard iterative refinement fails.

5.3. Two-stage iterative refinement. From our numerical experiments, we can see that in some cases, even though A is close to numerically singular, standard iterative refinement still converges (see, e.g., Test 1 in Figures 5.1–5.4). Since a step of standard iterative refinement is likely to be less expensive than a step of GMRES-IR (how much less expensive depends on the number of GMRES iterations in each GMRES-IR step), it may be preferable in such cases to use standard iterative refinement. We therefore propose a two-stage iterative refinement process, which starts by trying standard iterative refinement and switches to GMRES-IR (and makes use of the existing LU factorization) in case of slow convergence or divergence; a schematic is sketched below. The decision of whether to switch from standard iterative refinement to GMRES-IR can be based on the stopping criteria suggested by Demmel et al. [7], which detect when standard refinement is converging too slowly or not at all. The optimal parameters to use in these stopping criteria will be dependent on the particular application.
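A minimal sketch of such a two-stage process follows, assuming hypothetical helpers sir_step and gmres_ir_step that perform one step of each method and return the norm of the computed correction; the switching test is a crude stand-in for the stopping criteria of [7].

```matlab
% Two-stage refinement sketch: cheap standard steps first, then switch to
% GMRES-IR (reusing the factors L, U and permutation p) if progress stalls.
function x = two_stage_ir(A, L, U, p, b, x, imax, tol)
dprev = inf;
for i = 1:imax
    [x, dnorm] = sir_step(A, L, U, p, b, x);     % one standard refinement step
    if dnorm <= tol*norm(x, inf), return, end    % converged
    if dnorm > 0.5*dprev, break, end             % slow convergence or divergence
    dprev = dnorm;
end
for i = 1:imax
    [x, dnorm] = gmres_ir_step(A, L, U, p, b, x);  % GMRES-IR step, same factors
    if dnorm <= tol*norm(x, inf), return, end
end
end
```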

Table 5.5
Comparison of refinement steps for each method shown in Figure 5.4.

Matrix name     SIR steps   GMRES-IR steps   GMRES its
mhda416         5           2                3 (1,2)
oscil_dcop_06   -           2                7 (3,4)
oscil_dcop_42   -           3                9 (2,3,4)
oscil_dcop_43   -           3                10 (2,4,4)

6. Conclusions and future work. There is an argument in numerical analysis that a nearly singular problem does not deserve to be solved accurately, because if the data is inexact there may be an exactly singular problem within the region of uncertainty. The problem should therefore be reformulated or regularized. While this argument is often valid, there is an increasing number of applications where very ill conditioned problems do arise and an accurate solution is warranted, as explained in Section 1. Moreover, the trend towards trading precision for performance (single precision for double precision, or half precision for single precision) means that problems that are only moderately ill conditioned at one precision become extremely ill conditioned at the reduced precision.

We have shown that, contrary to the conventional wisdom, iterative refinement can provide a highly accurate solution to a linear system Ax = b with condition number of order u^{-1}. Our new rounding error analysis shows that it is sufficient to obtain some correct significant digits in solving the correction equation, thanks to a special property of the residuals of the iterates that enables much smaller error bounds to be obtained. Our use of GMRES to solve the correction equation preconditioned by the LU factors (GMRES-IR) yields the necessary accuracy for refinement to work, so it expands the range of accurately solvable square linear systems.

More work is required on practical implementation of GMRES-IR. As noted in Section 3.1, various pivoting strategies as well as incomplete LU factorization might be used, and the two-stage process proposed in Section 5.3 requires various choices of parameters.

[Figure 5.1 appears here: four rows of three panels (Test 1: gallery('randsvd',100,1e7,3); Test 2: gallery('randsvd',100,1e8,3); Test 3: gallery('randsvd',100,1e9,3); Test 4: gallery('randsvd',100,1e10,3)), each showing e_i, μ_i^{(∞)}, and θ_i u versus refinement step i.]

Fig. 5.1. Relative error e_i = ‖x − x̂_i‖_∞/‖x‖_∞ (left), μ_i^{(∞)} (middle), and θ_i u (right) versus refinement step i for tests generated using the MATLAB function randsvd, with condition numbers (from top to bottom) 10^7, 10^8, 10^9, and 10^{10}. Here u = 2^{-24} (single precision) and ū = 2^{-53} (double precision). Red circles correspond to standard iterative refinement and blue squares correspond to GMRES-IR.

[Figure 5.2 appears here: four rows of three panels (Tests 1–4, generated as gallery('randsvd',100,1e15,3) through gallery('randsvd',100,1e18,3)), each showing e_i, μ_i^{(∞)}, and θ_i u versus refinement step i.]

Fig. 5.2. Relative error e_i = ‖x − x̂_i‖_∞/‖x‖_∞ (left), μ_i^{(∞)} (middle), and θ_i u (right) versus refinement step i for tests generated using the MATLAB function randsvd, with condition numbers (from top to bottom) 10^{15}, 10^{16}, 10^{17}, and 10^{18}. Here u = 2^{-53} (double precision) and ū = 2^{-113} (quadruple precision). Red circles correspond to standard iterative refinement and blue squares correspond to GMRES-IR.

[Figure 5.3 appears here: four rows of three panels (Test 1: radfr1; Test 2: adder_dcop_06; Test 3: adder_dcop_19; Test 4: adder_dcop_26), each showing e_i, μ_i^{(∞)}, and θ_i u versus refinement step i.]

Fig. 5.3. Relative error e_i = ‖x − x̂_i‖_∞/‖x‖_∞ (left), μ_i^{(∞)} (middle), and θ_i u (right) versus refinement step i for tests from the University of Florida Sparse Matrix Collection. Here u = 2^{-24} (single precision) and ū = 2^{-53} (double precision). Red circles correspond to standard iterative refinement and blue squares correspond to GMRES-IR.

[Figure 5.4 appears here: four rows of three panels (Test 1: mhda416; Test 2: oscil_dcop_06; Test 3: oscil_dcop_42; Test 4: oscil_dcop_43), each showing e_i, μ_i^{(∞)}, and θ_i u versus refinement step i.]

Fig. 5.4. Relative error e_i = ‖x − x̂_i‖_∞/‖x‖_∞ (left), μ_i^{(∞)} (middle), and θ_i u (right) versus refinement step i for tests from the University of Florida Sparse Matrix Collection. Here u = 2^{-53} (double precision) and ū = 2^{-113} (quadruple precision). Red circles correspond to standard iterative refinement and blue squares correspond to GMRES-IR.

REFERENCES

[1] M. Arioli and I. S. Duff. Using FGMRES to obtain backward stability in mixed precision. Electron. Trans. Numer. Anal., 33:31–44, 2009.
[2] M. Arioli, I. S. Duff, S. Gratton, and S. Pralet. A note on GMRES preconditioned by a perturbed LDL^T decomposition with static pivoting. SIAM J. Sci. Comput., 29(5):2024–2044, 2007.
[3] David H. Bailey and Jonathan M. Borwein. High-precision arithmetic in mathematical physics. Mathematics, 3(2):337–367, 2015.
[4] Gleb Beliakov and Yuri Matiyasevich. A parallel algorithm for calculation of determinants and minors using arbitrary precision arithmetic. BIT, 56(1):33–50, 2015.
[5] Timothy A. Davis. University of Florida Sparse Matrix Collection. http://www.cise.ufl.edu/research/sparse/matrices.
[6] Timothy A. Davis and Yifan Hu. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Software, 38(1):1:1–1:25, 2011.
[7] James Demmel, Yozo Hida, William Kahan, Xiaoye S. Li, Sonil Mukherjee, and E. Jason Riedy. Error bounds from extra-precise iterative refinement. ACM Trans. Math. Software, 32(2):325–351, 2006.
[8] J. Drkošová, A. Greenbaum, M. Rozložník, and Z. Strakoš. Numerical stability of GMRES. BIT, 35:309–330, 1995.
[9] Iain S. Duff. MA57—A code for the solution of sparse symmetric definite and indefinite systems. ACM Trans. Math. Software, 30(2):118–144, 2004.
[10] Iain S. Duff and Stéphane Pralet. Towards stable mixed pivoting strategies for the sequential and parallel solution of sparse symmetric indefinite systems. SIAM J. Matrix Anal. Appl., 29(3):1007–1024, 2007.
[11] Massimiliano Ferronato, Carlo Janna, and Giorgio Pini. Parallel solution to ill-conditioned FE geomechanical problems. International Journal for Numerical and Analytical Methods in Geomechanics, 36(4):422–437, 2012.
[12] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of JMLR: Workshop and Conference Proceedings, 2015, pages 1737–1746.
[13] Yun He and Chris H. Q. Ding. Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J. Supercomputing, 18(3):259–277, 2001.
[14] Nicholas J. Higham. Iterative refinement for linear systems and LAPACK. IMA J. Numer. Anal., 17(4):495–509, 1997.
[15] Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms. Second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. xxx+680 pp. ISBN 0-89871-521-0.
[16] J. D. Hogg and J. A. Scott. A fast and robust mixed-precision solver for the solution of sparse symmetric linear systems. ACM Trans. Math. Software, 37(2):17:1–17:24, 2010.
[17] IEEE Standard for Floating-Point Arithmetic, IEEE Std 754-2008 (revision of IEEE Std 754-1985). IEEE Computer Society, New York, 2008. 58 pp. ISBN 978-0-7381-5752-8.
[18] Yuka Kobayashi and Takeshi Ogita. A fast and efficient algorithm for solving ill-conditioned linear systems. JSIAM Letters, 7:1–4, 2015.
[19] Yuka Kobayashi and Takeshi Ogita. Accurate and efficient algorithm for solving ill-conditioned linear systems by preconditioning methods. Nonlinear Theory and Its Applications, IEICE, 7(3):374–385, 2016.
[20] Xiaoye S. Li and James W. Demmel. Making sparse Gaussian elimination scalable by static pivoting. In Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, IEEE Computer Society, Washington, DC, USA, 1998, pages 1–17. CD ROM.
[21] Ding Ma and Michael Saunders. Solving multiscale linear programs using the simplex method in quadruple precision. In Numerical Analysis and Optimization, Mehiddin Al-Baali, Lucio Grandinetti, and Anton Purnama, editors, number 134 in Springer Proceedings in Mathematics and Statistics, Springer-Verlag, Berlin, 2015, pages 223–235.
[22] Ding Ma, Laurence Yang, Ronan M. T. Fleming, Ines Thiele, Bernhard O. Palsson, and Michael A. Saunders. Quadruple-precision solution of genome-scale models of metabolism and macromolecular expression, May 2016. ArXiv preprint 1606.00054.
[23] Multiprecision Computing Toolbox. Advanpix, Tokyo. http://www.advanpix.com.
[24] Takeshi Ogita. Accurate matrix factorization: Inverse LU and inverse QR factorizations. SIAM J. Matrix Anal. Appl., 31(5):2477–2497, 2010.
[25] Shin'ichi Oishi, Takeshi Ogita, and Siegfried M. Rump. Iterative refinement for ill-conditioned linear systems. Japan J. Indust. Appl. Math., 26(2-3):465–476, 2009.
[26] Christopher C. Paige, Miro Rozložník, and Zdeněk Strakoš. Modified Gram-Schmidt (MGS), least squares, and backward stability of MGS-GMRES. SIAM J. Matrix Anal. Appl., 28(1):264–284, 2006.
[27] T. N. Palmer. More reliable forecasts with less precise computations: A fast-track route to cloud-resolved weather and climate simulators? Phil. Trans. R. Soc. A, 372(2018), 2014.
[28] Siegfried M. Rump. Approximate inverses of almost singular matrices still contain useful information. Technical Report 90.1, Hamburg University of Technology, 1990.
[29] Siegfried M. Rump. Inversion of extremely ill-conditioned matrices in floating-point. Japan Journal of Industrial and Applied Mathematics, 26(2-3):249–277, 2009.
[30] Youcef Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Comput., 14(2):461–469, 1993.
[31] Youcef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7(3):856–869, 1986.
[32] Scott A. Sarra. Radial basis function approximation methods with extended precision floating point arithmetic. Engineering Analysis with Boundary Elements, 35(1):68–76, 2011.
[33] Robert D. Skeel. Iterative refinement implies numerical stability for Gaussian elimination. Math. Comp., 35(151):817–832, 1980.
[34] Kathryn Turner and Homer F. Walker. Efficient high accuracy solutions with GMRES(m). SIAM J. Sci. Statist. Comput., 13(3):815–825, 1992.
[35] Homer F. Walker. Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Statist. Comput., 9(1):152–163, 1988.
[36] J. H. Wilkinson. The use of the single-precision residual in the solution of linear systems. Unpublished manuscript, 1977. 11 pp.
