Three-Precision GMRES-Based Iterative Refinement for Least Squares Problems

Carson, Erin and Higham, Nicholas J. and Pranesh, Srikara

2020

MIMS EPrint: 2020.5

Manchester Institute for Mathematical Sciences
School of Mathematics
The University of Manchester

Reports available from: http://eprints.maths.manchester.ac.uk/
And by contacting: The MIMS Secretary, School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK

ISSN 1749-9097

THREE-PRECISION GMRES-BASED ITERATIVE REFINEMENT FOR LEAST SQUARES PROBLEMS∗

ERIN CARSON†, NICHOLAS J. HIGHAM‡, AND SRIKARA PRANESH‡

Abstract. The standard iterative refinement procedure for improving an approximate solution to the least squares problem $\min_x \|b - Ax\|_2$, where $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ has full rank, is based on solving the $(m+n) \times (m+n)$ augmented system with the aid of a QR factorization. In order to exploit multiprecision arithmetic, iterative refinement can be formulated to use three precisions, but the resulting algorithm converges only for a limited range of problems. We build an iterative refinement algorithm called GMRES-LSIR, analogous to the GMRES-IR algorithm developed for linear systems [SIAM J. Sci. Comput., 40 (2019), pp. A817--A847], that solves the augmented system using GMRES preconditioned by a matrix based on the computed QR factors. We explore two preconditioners: the first has full off-diagonal blocks, and the second is block diagonal and can be applied in either left-sided or split form. We prove that for a wide range of problems the first preconditioner yields backward and forward errors for the augmented system of order the working precision, under suitable assumptions on the precisions and the problem conditioning. Our proof does not extend to the block diagonal preconditioner, but our numerical experiments show that with this preconditioner the algorithm performs about as well in practice.

Key words.
least squares, iterative refinement, GMRES, preconditioning, mixed precision, half precision arithmetic

AMS subject classifications. 65F05, 65F08, 65F20

1. Introduction. We consider the linear least squares problem $\min_x \|b - Ax\|_2$, where $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) has full rank. A common method of solution uses the QR factorization

$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \equiv QU,$$

where $Q = [Q_1, Q_2] \in \mathbb{R}^{m \times m}$ is an orthogonal matrix with $Q_1 \in \mathbb{R}^{m \times n}$ and $Q_2 \in \mathbb{R}^{m \times (m-n)}$, and $R \in \mathbb{R}^{n \times n}$ is upper triangular. The unique least squares solution is $x = R^{-1} Q_1^T b$, with residual $\|b - Ax\|_2 = \|Q_2^T b\|_2$.

Least squares problems may be ill conditioned in practice, and so rounding errors may result in an insufficiently accurate solution. In this case, iterative refinement may be used to improve accuracy, and it also improves stability. Two different approaches can be used for iterative refinement of least squares problems. When the overdetermined system is nearly compatible (i.e., there exists an $x$ such that $Ax = b$ exactly or nearly exactly), an approach analogous to iterative refinement for square linear systems can be employed. After computing the initial approximate solution $x_0$, the solution is refined via the process whose $(i+1)$st step is:

1. Compute $r_i = b - A x_i$.
2. Solve the least squares problem $\min_{d_i} \|A d_i - r_i\|_2$.
3. Update $x_{i+1} = x_i + d_i$.

If a QR factorization was used to compute the initial approximate solution $x_0$, then the QR factors can be reused to solve for the correction $d_i$ in step 2.

∗Funding: The first author was supported by Charles University Primus program project PRIMUS/19/SCI/11. The second author was supported by the Royal Society. The second and third authors were supported by Engineering and Physical Sciences Research Council grant EP/P020720/1.
†Faculty of Mathematics and Physics, Charles University, 186 75, Praha 8, Czech Republic ([email protected]).
‡Department of Mathematics, University of Manchester, Manchester, M13 9PL, UK ([email protected], [email protected]).
This approach was used by Golub [10] and analyzed by Golub and Wilkinson [11]. A generalization of this approach that works even when $Ax = b$ is inconsistent was suggested by Björck [2]. Refinement is performed on the augmented system

$$(1.1) \qquad \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix},$$

which is equivalent to the normal equations. Given $x_0$ and $r_0 = b - A x_0$, the $(i+1)$st refinement step is as follows.

1. Compute the residual vector for the augmented system:

$$(1.2) \qquad \begin{bmatrix} f_i \\ g_i \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix} - \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r_i \\ x_i \end{bmatrix} = \begin{bmatrix} b - r_i - A x_i \\ -A^T r_i \end{bmatrix}.$$

2. Solve

$$(1.3) \qquad \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} \delta r_i \\ \delta x_i \end{bmatrix} = \begin{bmatrix} f_i \\ g_i \end{bmatrix}.$$

3. Update the solution to the augmented system:

$$(1.4) \qquad \begin{bmatrix} r_{i+1} \\ x_{i+1} \end{bmatrix} = \begin{bmatrix} r_i \\ x_i \end{bmatrix} + \begin{bmatrix} \delta r_i \\ \delta x_i \end{bmatrix}.$$

In this way, the solution $x_i$ and residual $r_i$ for the least squares problem are simultaneously refined. Björck [2] shows that the linear system (1.3) can be solved by reusing the QR factors of $A$ via the process

$$(1.5) \qquad h = R^{-T} g_i,$$
$$(1.6) \qquad \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = [Q_1, Q_2]^T f_i,$$
$$(1.7) \qquad \delta r_i = Q \begin{bmatrix} h \\ d_2 \end{bmatrix},$$
$$(1.8) \qquad \delta x_i = R^{-1} (d_1 - h).$$

Existing analyses of the convergence and accuracy of this approach in finite precision assume that at most two precisions are used: the working precision $u$ is used to compute the QR factorization, solve the system (1.3), and compute the update (1.4), and a second precision $u_r \le u$ is used to compute the residuals in (1.2). Typically $u_r = u^2$, in which case it can be shown that as long as the condition number of the augmented system matrix is smaller than $u^{-1}$, the refinement process will converge with a limiting forward error on the order of $u$; see [3] and [15, sect. 20.5] and the references therein.

Motivated by the emergence of multiprecision capabilities in hardware, Carson and Higham [5] have recently analyzed iterative refinement for (square) linear systems in the case where three different precisions are used: $u_f$ for the matrix factorization, $u$ for the working precision, and $u_r$ for the computation of residuals, where $u_f \ge u \ge u_r$.
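As an illustration (not code from the paper), the solve procedure (1.5)--(1.8) for the augmented correction system can be sketched in NumPy as follows, where `Q` is the full $m \times m$ orthogonal factor and `R` the $n \times n$ triangular factor of $A$:

```python
import numpy as np

def solve_augmented(Q, R, f, g):
    """Solve [I A; A^T 0] [dr; dx] = [f; g] by reusing the QR factors
    of A, following the procedure (1.5)-(1.8)."""
    n = R.shape[0]
    h = np.linalg.solve(R.T, g)           # (1.5)  h = R^{-T} g
    d = Q.T @ f                           # (1.6)  [d1; d2] = Q^T f
    d1, d2 = d[:n], d[n:]
    dr = Q @ np.concatenate([h, d2])      # (1.7)  dr = Q [h; d2]
    dx = np.linalg.solve(R, d1 - h)       # (1.8)  dx = R^{-1}(d1 - h)
    return dr, dx
```

The factors can be obtained once via `np.linalg.qr(A, mode='complete')` (taking the leading $n$ rows of the returned triangular factor) and then reused for every refinement step, so each correction costs only $O(mn)$ work.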
The analysis additionally uses a fourth "precision", denoted $u_s$, which is essentially a parameter that describes how accurately the correction equation is solved ($u_s$ takes the value $u$ or $u_f$ in the cases of interest).

As the factorization of the system matrix is often the most expensive part of the computation, it is desirable from a performance standpoint that low precision be used in the factorization, ideally without affecting convergence, numerical stability, or accuracy. The results in [5] show that this is possible under certain constraints on the condition number of the matrix, which depend on the particular method of solving the correction equations. For example, with single precision as the working precision, the matrix factorization computed in half precision, residuals computed in double precision, and the correction equation solved by GMRES preconditioned with the computed LU factors, it is possible to solve the square linear system $Ax = b$ to full single precision accuracy for condition numbers $\kappa_\infty(A) = \|A^{-1}\|_\infty \|A\|_\infty \le 10^7$, with the $O(n^3)$ part of the computations carried out entirely in half precision. Therefore significant speedups can be obtained on hardware that supports half precision. In [12], [13], using the tensor cores on an NVIDIA V100 GPU, Haidar et al. obtain a speedup of 4 over the state-of-the-art double precision solver.

In this work we wish to extend the results of [5] to least squares problems. In [17] a mixed-precision least squares solver based on the normal equations was proposed for the case where $A$ is well conditioned. The present work differs in that it is not based on the normal equations and it is applicable to a wider range of problems. This work also differs from that in [9], which focuses on the use of higher precision arithmetic and does not use a precision lower than the working precision.
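The three-precision scheme of [5] for square systems can be illustrated with a small sketch (our own, not from the paper). NumPy has no LAPACK half-precision factorization, so here `float32` stands in for the low factorization precision $u_f$ and `float64` for the working and residual precisions $u = u_r$; the correction solve is performed entirely in the low precision, roughly the $u_s = u_f$ case:

```python
import numpy as np

def ir_three_precision(A, b, iters=5):
    """Iterative refinement for square Ax = b: QR factorization and
    correction solves in float32 (simulating the low precision u_f),
    residuals and updates accumulated in float64. Illustrative only."""
    Q32, R32 = np.linalg.qr(A.astype(np.float32))   # O(n^3) work, low precision

    def lowprec_solve(r):
        r32 = r.astype(np.float32)                  # round residual down
        return np.linalg.solve(R32, Q32.T @ r32).astype(np.float64)

    x = lowprec_solve(b)                            # initial low-precision solve
    for _ in range(iters):
        r = b - A @ x                               # residual in working precision
        x = x + lowprec_solve(r)                    # correction via reused factors
    return x
```

Despite the factors carrying only single-precision accuracy, the refinement loop recovers a solution accurate to roughly the working precision, provided the condition number is not too large relative to $u_f^{-1}$.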
Define

$$(1.9) \qquad \widetilde{A} = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix}, \quad \widetilde{b} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \quad y_i = \begin{bmatrix} r_i \\ x_i \end{bmatrix}, \quad \delta y_i = \begin{bmatrix} \delta r_i \\ \delta x_i \end{bmatrix}, \quad s_i = \begin{bmatrix} f_i \\ g_i \end{bmatrix}.$$

Three-precision iterative refinement based on the augmented system is written in Algorithm 1.1 in an analogous way as for linear systems.

Algorithm 1.1 (LSIR). Iterative refinement in three precisions for the augmented system defined in (1.9).
1: Solve $\widetilde{A} y_0 = \widetilde{b}$ in precision $u_f$ and store $y_0$ at precision $u$.
2: for $i = 0, 1, 2, \dots$ do
3:   Compute $s_i = \widetilde{b} - \widetilde{A} y_i$ at precision $u_r$ and round $s_i$ to precision $u_s$.
4:   Solve $\widetilde{A} \delta y_i = s_i$ at precision $u_s$ and store $\delta y_i$ at precision $u$.
5:   $y_{i+1} = y_i + \delta y_i$ at precision $u$.
6: end for

The theorems developed in [5] regarding the forward error and normwise and componentwise backward error for iterative refinement of linear systems are thus applicable. The only thing that will change is the analysis of the method for solving the correction equation in line 4, since we now have a QR factorization of $A$, which can be used in various ways, including the procedure (which we will refer to as the "standard" method) outlined in (1.5)--(1.8). For a given solver, in order to apply the analysis from [5] we need to show that

$$(1.10) \qquad \widehat{\delta y}_i = (I + u_s E_i)\,\delta y_i, \quad u_s \|E_i\|_\infty < 1,$$
$$(1.11) \qquad \|\widehat{s}_i - \widetilde{A}\,\widehat{\delta y}_i\|_\infty \le u_s \big( c_1 \|\widetilde{A}\|_\infty \|\widehat{\delta y}_i\|_\infty + c_2 \|\widehat{s}_i\|_\infty \big),$$
$$(1.12) \qquad |\widehat{s}_i - \widetilde{A}\,\widehat{\delta y}_i| \le u_s G_i |\widehat{\delta y}_i|.$$

Table 1.1. For Algorithm 1.1, under assumptions (1.10)--(1.12), conditions for convergence and the limiting size of the forward error, normwise backward error, and componentwise backward error for the solution of the augmented system $\widetilde{A} y = \widetilde{b}$ (1.1).
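A working-precision sketch of Algorithm 1.1 (our own illustration, with simulated precisions) can be assembled from the pieces above. As before, `float32` stands in for the low factorization precision $u_f$, while the refinement loop and the correction solves via the standard method (1.5)--(1.8) run in `float64`, roughly the $u_s = u$, $u_r = u$ case:

```python
import numpy as np

def lsir(A, b, iters=10):
    """Sketch of Algorithm 1.1 (LSIR): refinement of the augmented system,
    with the QR factorization computed in float32 (simulating u_f) and all
    other computations in float64 (working precision)."""
    m, n = A.shape
    Q32, R32full = np.linalg.qr(A.astype(np.float32), mode='complete')
    Q = Q32.astype(np.float64)          # low-precision factors, computed once
    R = R32full[:n].astype(np.float64)

    def solve_correction(f, g):         # standard method (1.5)-(1.8)
        h = np.linalg.solve(R.T, g)
        d = Q.T @ f
        dr = Q @ np.concatenate([h, d[n:]])
        dx = np.linalg.solve(R, d[:n] - h)
        return dr, dx

    # Line 1: initial solve of the augmented system (f = b, g = 0).
    r, x = solve_correction(b, np.zeros(n))
    for _ in range(iters):
        f = b - r - A @ x               # line 3: residual components (1.2)
        g = -A.T @ r
        dr, dx = solve_correction(f, g) # line 4: correction
        r, x = r + dr, x + dx           # line 5: update
    return x, r
```

The refinement corrects the errors introduced by the low-precision factors, so for moderately conditioned problems `x` converges to the double-precision least squares solution and `r` to its residual.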