
For: Acta Numerica

Preconditioning

A. J. Wathen
Mathematical Institute, University of Oxford,
Andrew Wiles Building, Radcliffe Observatory Quarter,
Woodstock Road, Oxford OX2 6GG, UK
E-mail: [email protected]

The computational solution of problems can be restricted by the availability of solution methods for linear(ized) systems of equations. In conjunction with iterative methods, preconditioning is often the vital component in enabling the solution of such systems when the dimension is large. We attempt a broad review of preconditioning methods.

CONTENTS
1 Introduction
2 Krylov subspace iterative methods
3 Symmetric and positive definite matrices
4 Semidefinite matrices
5 Symmetric and indefinite matrices
6 Nonsymmetric matrices
7 Some comments on symmetry
8 Some comments on practical computing
9 Summary
References

1. Introduction

This review aims to give a comprehensive view of the current state of preconditioning for general linear(ized) systems of equations. However, we start with a specific example.

Consider a symmetric matrix with constant diagonals:
\[
A = \begin{pmatrix}
a_0 & a_1 & \cdots & \cdots & a_{n-1}\\
a_1 & a_0 & a_1 & & \vdots\\
\vdots & a_1 & a_0 & \ddots & \vdots\\
\vdots & & \ddots & \ddots & a_1\\
a_{n-1} & \cdots & \cdots & a_1 & a_0
\end{pmatrix} \in \mathbb{R}^{n\times n}.
\]
Such a matrix is called a symmetric Toeplitz matrix. It is clearly defined by just n real numbers, but what is the best solution method for linear systems involving A? The fastest known stable factorisation (direct) methods for solving a linear system Ax = b for x ∈ R^n given b ∈ R^n require O(n^2) floating point computations—this is still better than the O(n^3) computations required for a general real n × n matrix using the well-known Gauss elimination algorithm. Strang (1986) observed that one could approximate A by the circulant matrix
\[
C = \begin{pmatrix}
a_0 & a_1 & \cdots & a_{\lfloor n/2\rfloor} & a_{\lfloor n/2\rfloor-1} & \cdots & a_2 & a_1\\
a_1 & a_0 & a_1 & \cdots & a_{\lfloor n/2\rfloor} & a_{\lfloor n/2\rfloor-1} & \cdots & a_2\\
\vdots & a_1 & a_0 & \ddots & & \ddots & & \vdots\\
a_{\lfloor n/2\rfloor} & \cdots & \ddots & \ddots & \ddots & & \ddots & a_{\lfloor n/2\rfloor-1}\\
a_{\lfloor n/2\rfloor-1} & \ddots & & \ddots & \ddots & \ddots & \cdots & a_{\lfloor n/2\rfloor}\\
\vdots & & \ddots & & \ddots & a_0 & a_1 & \vdots\\
a_2 & \cdots & a_{\lfloor n/2\rfloor-1} & a_{\lfloor n/2\rfloor} & \cdots & a_1 & a_0 & a_1\\
a_1 & a_2 & \cdots & a_{\lfloor n/2\rfloor-1} & a_{\lfloor n/2\rfloor} & \cdots & a_1 & a_0
\end{pmatrix} \in \mathbb{R}^{n\times n}
\]
where the outer diagonals of A have been overwritten by the central diagonals which are 'wrapped around' in a periodic manner. Circulant matrices are diagonalised by a Fourier basis (Van Loan 1992) and it follows that for any given b ∈ R^n, the fast Fourier transform (FFT) (Cooley and Tukey 1965) enables the computation of the matrix-vector product Cb and of the solution y ∈ R^n of Cy = b in O(n log n) arithmetic operations. The importance of Strang's observation is that C is close to A if the discarded entries a_{⌈n/2⌉}, ..., a_{n−1} are small; precise statements will describe this in terms of decay in the entries of A along rows/columns moving away from the diagonal. With such a property, one can show that not only are the matrices A and C close entrywise, but that they are spectrally close, i.e. that the eigenvalues of C^{-1}A (equivalently the eigenvalues λ of the matrix 'pencil' A − λC) are mostly close to 1. As a consequence, an appropriate Krylov subspace iterative method (it would be the conjugate gradient method if A is positive definite: see below) will compute a sequence of vectors x_1, x_2, ... which converge rapidly from any starting guess, x_0, to the solution x of Ax = b. At each iteration only a matrix-vector product with A and the solution of a single system with C as well as vector operations need be computed. For any desired accuracy, a fixed number of iterations not dependent on the dimension n will be required. The outcome is that use of the FFT for solving systems with C and for doing multiplications of vectors by A (since A can in an obvious way be embedded in a circulant matrix of dimension (2n − 1) × (2n − 1)) guarantees that the iterative method will require only O(n log n) operations to compute the solution. Strang's 'proposal for Toeplitz matrix calculations' is thus to employ the circulant matrix C as a preconditioner for linear systems involving A. One can naively think of the preconditioning as converting a linear system Ax = b which is not so readily solved into the system C^{-1}Ax = C^{-1}b for which there is a fast solution method. This is not exactly what is done: generally C^{-1}A will be nonsymmetric even though A and C are symmetric, but it is possible to precondition with C whilst preserving symmetry of the preconditioned system matrix; see Saad (2003, Algorithm 9.1). This example captures the essence of preconditioning. One does not always have matrices which are so obviously structured (nor with potentially so many non-zero entries), nor does one always have a clever and efficient tool like the FFT, but, especially for matrices of large dimension, it can be extremely beneficial to have a good matrix approximation, i.e. a good preconditioner, when solving linear systems of equations. Indeed, often such a preconditioner is necessary in order to enable practical computation of large-scale problems at all within reasonable time on any given computational platform. The reduction from O(n^2) to O(n log n) in the above example brings some large-dimensional Toeplitz systems into the range of practical computation.
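To make the example concrete, the following sketch (in Python with NumPy/SciPy, which are assumed here purely for illustration and are not part of the paper) builds a symmetric Toeplitz matrix with decaying entries, forms Strang's circulant approximation, applies C^{-1} with the FFT, and passes it as the preconditioner to a conjugate gradient iteration. The decay rate 1/(1+k)^3 is an illustrative choice only.
\begin{verbatim}
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, cg

n = 2048
col = 1.0 / (1.0 + np.arange(n)) ** 3      # decaying Toeplitz entries (illustrative)
A = toeplitz(col)                          # dense only for clarity of the example

# Strang's circulant: keep the central diagonals and wrap them around periodically
c = col.copy()
c[n // 2 + 1:] = col[1:n - n // 2][::-1]   # c_k = a_{n-k} for k > n/2
eig = np.fft.fft(c).real                   # eigenvalues of C (real since C is symmetric)

def solve_circulant(b):
    # y = C^{-1} b in O(n log n) operations via the FFT
    return np.real(np.fft.ifft(np.fft.fft(b) / eig))

P = LinearOperator((n, n), matvec=solve_circulant)
b = np.random.default_rng(0).standard_normal(n)

iters = []
x, info = cg(A, b, M=P, callback=lambda xk: iters.append(1))
print(info, len(iters))                    # info == 0 on convergence; few iterations expected
\end{verbatim}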
One does not necessarily need to have a preconditioner as a matrix; a linear operator (i.e. a computational procedure which applies a linear operation to a vector) is what is required. This article is a personal perspective on the subject of preconditioning. It addresses preconditioning only in the most common context of the solution of linear systems of equations. Much research on this problem has been directed at sparse matrices coming from the discretization of partial differential equations (PDEs) by finite element, finite volume or finite difference methods. This is certainly one major source for such problems, but far from the only area where large dimensional linear systems arise. It is true, however, that underlying properties of a PDE problem are often reflected in the matrices which arise from discretization, so that matrices arising from PDEs can inherit structural properties which allow them to not simply be considered as arrays of numbers. Indeed there are many PDE analysts who consider PDE matrices as simply members of families of discrete operators with many of the properties of the underlying continuous operators from which they are derived through discretization; see for example Mardal and Winther (2011), or the recent book by Málek and Strakoš (2014). This can be a very valuable viewpoint. Indeed from this perspective, not applying any preconditioning is seen to coincide with rather an unnatural choice of mappings on the underlying spaces (Günnel et al. 2014). Sometimes one comes across the perception that multigrid must be the most efficient method to solve any PDE problem: this is far from the truth; however, multigrid methods and ideas remain very important both as solvers for certain simpler problems, and, in particular, as components of preconditioners for a wider class of problems as we shall describe. Whether or not a matrix system arises from PDEs, knowledge of the source problem can be helpful in designing a good preconditioner. Purely algebraic approaches which simply take the numerical matrix entries as input can have merit in situations where little is known about the underlying problem—they certainly have the useful property that it is usually possible to apply such a method—but the generated preconditioners can be poor. This division between approaches to preconditioning essentially comes down to whether one is 'given a problem' or 'given a matrix' (Neytcheva 1999). Though we emphasise much about the former, we also give pointers to and brief descriptions of the latter. Preconditioners are overwhelmingly used together with Krylov subspace iterative methods (van der Vorst 2003, Greenbaum 1997, Saad 2003, Meurant 1999, Hackbusch 1994, Elman et al. 2014b, Liesen and Strakoš 2013, Olshanskii and Tyrtyshnikov 2014), and the applicable iterative methods make different requirements of a preconditioner: for example, one could employ a nonsymmetric preconditioner for a symmetric matrix, but in general one would then lose the possibility to use the very effective conjugate gradient or minres Krylov subspace iterative methods. Therefore, we have structured this paper into sections based on the symmetry and definiteness properties of the presented coefficient matrix. In essence, this subdivision can be thought of also as in terms of the Krylov subspace method of choice. The coefficient matrix will usually be denoted by A as in the example above (though different fonts will be used).
For any invertible preconditioning matrix or linear operator, P, there is an iterative method that can be used, but it is usually desirable not to ignore structural properties such as symmetry when choosing/deriving a preconditioner. We have generally restricted consideration to real square matrices, A, but it will be evident that some of the different preconditioning ideas could be applied to matrices over different algebraic fields, in particular the complex numbers (for which 'symmetric' → 'Hermitian' etc.). It is also true that some of the ideas presented apply to rectangular matrices such as arise in least squares problems, but, except tangentially, this paper does not consider such situations. Our general strategy/intention is to describe most preconditioning approaches that we are aware of, and to guide the reader towards selecting an appropriate strategy for any particular problem. Inevitably, some preconditioning ideas have arisen in the specific context of particular problems, so a brief review of such problems is included. Over ten years ago, Benzi (2002) wrote an excellent survey of preconditioning techniques which contains much wisdom and remains an excellent source today, in particular with regard to incomplete factorization and sparse approximate inverse methods. The text by Chen (2005) describes several approaches and includes many example applications. In section 3 preconditioning approaches for symmetric and positive definite matrices are discussed. There follows in section 4 some brief comments on symmetric semidefinite matrices. This is followed in section 5 by preconditioning for symmetric and indefinite matrices and finally for nonsymmetric matrices in section 6. We start, however, with a brief review of iterative methods in order to orient ourselves.

2. Krylov subspace iterative methods

For any vector r ∈ R^n, if it is easy to compute the matrix-vector product Ar, then it is easy to generate vectors in the Krylov subspaces
\[
\mathcal{K}_k(A, r) = \operatorname{span}\{r, Ar, A^2 r, \ldots, A^{k-1} r\}, \quad k = 1, 2, \ldots
\]
by forming A(Ar) etc. In particular, when A is sparse (i.e. has few non-zero entries in any row), then such matrix-vector products are readily computed. For general dense matrices, a matrix-vector multiplication takes O(n^2) operations, hence there is less possibility for preconditioning to reduce this complexity unless some other structure can be used such as in the Toeplitz example above. Krylov subspace iterative methods for finding x ∈ R^n which satisfies Ax = b compute iterates x_k for which

\[
x_k - x_0 \in \mathcal{K}_k(A, r_0), \quad k = 1, 2, \ldots \tag{2.1}
\]
from some initial guess x_0 and thus require just one matrix-vector product computation at each iteration. The vector r_0 = b − Ax_0 is the residual associated with x_0; in general r_j = b − Ax_j, j = 0, 1, 2, .... If x_0 = 0 then

\[
x_k \in \mathcal{K}_k(A, b), \quad k = 1, 2, \ldots.
\]

Thus the iterates and residuals of every Krylov subspace method satisfy

\[
x_k - x_0 = \sum_{j=0}^{k-1} \alpha_j A^j r_0
\]
for some coefficients α_j, j = 0, 1, 2, ..., k − 1. Hence

\[
x_k = x_0 + q(A) r_0 \tag{2.2}
\]
where q is the polynomial of degree k − 1 with q(z) = \sum_{j=0}^{k-1} \alpha_j z^j. Premultiplying (2.2) by A and subtracting from b we obtain

\[
b - Ax_k = b - Ax_0 - A q(A) r_0,
\]
thus

\[
r_k = r_0 - A q(A) r_0 = p(A) r_0 \tag{2.3}
\]
where
\[
p(z) = 1 - z \sum_{j=0}^{k-1} \alpha_j z^j = 1 - \sum_{j=1}^{k} \alpha_{j-1} z^j
\]
is a polynomial of degree k which satisfies p(0) = 1. All Krylov subspace methods are thus described by (2.3) and each different Krylov subspace method is characterised by an optimality or (bi-)orthogonality property.

When A is symmetric and positive definite (so that ‖v‖_A = (v^T A v)^{1/2} defines a vector norm), the conjugate gradient method (cg) (Hestenes and Stiefel 1952), (Saad 2003, algorithm 6.18) requires only the one matrix-vector product with A as well as five simple vector operations at each iteration to compute iterates satisfying (2.1) which minimise ‖x − x_k‖_A over the Krylov subspace. When A is symmetric but indefinite, r^T A r takes both positive and negative values so that a norm cannot be defined as for the conjugate gradient method. The Krylov subspace method of choice for symmetric indefinite systems, the minimum residual (minres) method (Paige and Saunders 1975), (Fischer 2011, page 187) takes one matrix-vector product with A as well as seven simple vector operations at each iteration to compute iterates satisfying (2.1) which minimise the Euclidean norm of the residual, ‖r_k‖_I = (r_k^T r_k)^{1/2}.

When A is nonsymmetric, there is not so obvious a method of choice, hence several Krylov subspace methods are widely used. The most popular is gmres (Saad and Schultz 1986), (Saad 2003, algorithm 6.9) which, similarly to minres, computes iterates which minimise the Euclidean norm of the residual, but by contrast to minres requires an increasing amount of computation and storage at each successive iteration to achieve this. Thus gmres can be a good method if only few iterations are needed to achieve acceptable convergence—this might be the case if one has a good preconditioner—but it is not practical if many iterations are required. Restarting gmres at (regular) intervals reduces the computational burden, but also in most cases severely slows convergence. Thus use is made of Krylov subspace methods for nonsymmetric A which require only a fixed amount of work at each iteration, but which necessarily therefore have more erratic convergence (Faber and Manteuffel 1984). These are generally based on the nonsymmetric Lanczos method: we mention bicgstab (van der Vorst 1992), bicgstab(ℓ) (Sleijpen and Fokkema 1993), qmr (Freund and Nachtigal 1991), idr(s) (Sonneveld and van Gijzen 2008), though there are many others. There are also several variants, see Simoncini and Szyld (2007).

For any nonsingular nonsymmetric matrix, it is always possible to attack the symmetric and positive definite system of normal equations
\[
A^T A x = A^T b, \tag{2.4}
\]
(or A A^T y = b, x = A^T y) and in some instances this has been proposed; see for example Laird and Giles (2002). A related approach is to solve a symmetric indefinite system in a form such as
\[
\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix}
\begin{pmatrix} r \\ x \end{pmatrix} =
\begin{pmatrix} b \\ 0 \end{pmatrix}. \tag{2.5}
\]
However, whichever form is employed, convergence can be rather slow; A^{-T} is not generally such a good preconditioner for A! To give an idea of the convergence that might be expected: if A were already symmetric and positive definite, then generally the number of cg iterations required for (2.4) would be approximately the square of the number of cg iterations for the system Ax = b.
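As a small illustration of the two alternatives just mentioned, the following Python/SciPy sketch (an assumed, illustrative setup; the nonsymmetric tridiagonal test matrix is not from the paper) applies gmres directly to Ax = b and cg to the normal equations (2.4), with the product A^T A applied only as an operator, and counts iterations via callbacks.
\begin{verbatim}
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, cg, gmres

# Simple nonsymmetric tridiagonal test matrix (illustrative only)
n = 500
A = diags([-1.3, 2.0, -0.7], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

gmres_iters, cg_iters = [], []

# GMRES applied directly to Ax = b (restarted by default in SciPy)
x1, info1 = gmres(A, b, callback=lambda res: gmres_iters.append(res),
                  callback_type="pr_norm")

# CG applied to A^T A x = A^T b, with A^T A never formed explicitly
AtA = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v))
x2, info2 = cg(AtA, A.T @ b, callback=lambda xk: cg_iters.append(1))

print(len(gmres_iters), "gmres iterations;",
      len(cg_iters), "cg iterations on the normal equations")
\end{verbatim}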
There are situations in which this may not be such a difficulty; however, it is generally harder to find good preconditioners for A^T A (or A A^T) than for A and this may be the main reason that more attention and usage is made of the nonsymmetric Krylov subspace iterative methods above. An important and underappreciated general observation in this regard is due to Braess and Peisker (1986):

Remark 1. P^T P can be an arbitrarily bad preconditioner for A^T A regardless of the quality of P as a preconditioner for A.

The above remark applies even if A, P are symmetric, as is shown by the elementary example in Braess and Peisker (1986). We will see in section 5 situations where one can build a particular preconditioner P for A so that P^T P is also a good preconditioner for A^T A; indeed the main result in Braess and Peisker (1986) provides sufficient conditions for this to occur.

Now, cg for (2.4) will generate a Krylov space of the form K(A^T A, A^T b), for example, when x_0 = 0. Perhaps the most suitable implementation algebraically equivalent to a symmetric Krylov subspace method for the normal equations is the lsqr algorithm of Paige and Saunders (1982) which also computes least squares solutions for rectangular A. An advantage of these methods is that the convergence theory of cg and minres applies. The reason that there are methods of choice (cg, minres) for symmetric matrices, but a whole range of possibilities for nonsymmetric matrices, arises because of convergence guarantees. In the case of symmetric matrices there are descriptive convergence bounds which depend only on the eigenvalues of the coefficient matrix, thus the number of iterations which will be needed for convergence to a given tolerance can be estimated and bounded a priori; a good preconditioner will ensure fast convergence. These comments apply also for normal equation approaches for nonsymmetric matrices. By contrast, to date there are no generally applicable and descriptive convergence bounds even for gmres; for any of the nonsymmetric methods without a minimisation property, convergence theory is extremely limited. The convergence of gmres has attracted a lot of attention, but the absence of any reliable way to guarantee (or bound) the number of iterations needed to achieve convergence to any given tolerance in most situations means that there is currently no good a priori way to identify what are the desired qualities of a preconditioner. This is a major theoretical difficulty, but heuristic ideas abound. All Krylov subspace methods are to an extent affected by rounding errors in a computer if run for a large enough number of iterations, see Liesen and Strakoš (2013) or the discussion in Greenbaum (1997, Chapter 4). In this paper we ignore any such effects since an important consequence of good preconditioning is that few iterations should be necessary for acceptable convergence, thus round-off effects are not usually able to build up and become a significant issue. Similarly, there is theory which describes the eventual (i.e. asymptotic) convergence of Krylov subspace methods such as cg, but this is also not generally relevant to preconditioned iterations—it is rapid error or residual reduction in early iterations that is typically desired.

3. Symmetric and positive definite matrices

The standard cg convergence bound for iterative solution of Ax = b when A is symmetric and positive definite comes straight from (2.3) and the minimisation condition described above and is

\[
\frac{\|x - x_k\|_A}{\|x - x_0\|_A} \le \min_{p_k \in \Pi_k,\, p_k(0) = 1} \ \max_{\lambda \in \sigma(A)} |p_k(\lambda)|,
\]
where ‖y‖_A^2 = y^T A y, Π_k is the set of real polynomials of degree less than or equal to k and σ(A) is the eigenvalue spectrum of A (Greenbaum 1997, section 3.1). One immediately deduces that if A has a small number of distinct eigenvalues, say s, then cg will terminate at the sth iteration with the correct solution since there is a degree s polynomial which has these eigenvalues as its roots. Iteration error can already be acceptably small after few iterations when the eigenvalues are clustered. In particular if one considers all eigenvalues to lie in a single interval λ^A_{min} ≤ λ ≤ λ^A_{max} then use of Chebyshev polynomials yields the convergence bound
\[
\frac{\|x - x_k\|_A}{\|x - x_0\|_A} \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k, \tag{3.1}
\]
where κ = κ_A = λ^A_{max}/λ^A_{min} is the Euclidean norm (or 2-norm) condition number for such a symmetric positive definite matrix; see again Greenbaum (1997, section 3.1). An equivalent statement is that it takes approximately (1/2)√κ |log(ε/2)| iterations to ensure that ‖x − x_k‖_A/‖x − x_0‖_A ≤ ε. The origin of the name preconditioning is generally now associated with this convergence bound in the sense that reduction of the condition number, κ, guarantees faster convergence (though for historical accuracy, see Benzi (2002)); what we now understand as preconditioning is, however, a much broader concept. If P is also symmetric and positive definite and the extreme generalised eigenvalues of the 'pencil' Av = λPv (equivalently the extreme eigenvalues of P^{-1}A) satisfy

\[
\kappa_A = \frac{\lambda^A_{max}}{\lambda^A_{min}} \gg \frac{\lambda^{P^{-1}A}_{max}}{\lambda^{P^{-1}A}_{min}} = \kappa_{P^{-1}A},
\]
then cg convergence for the preconditioned system is guaranteed in many fewer iterations than for the original unpreconditioned system since (3.1) holds now with κ = κ_{P^{-1}A}. Note that for cg even with preconditioning, it remains the relative magnitude of the error in ‖·‖_A which arises in the bound (3.1); a symmetric positive definite preconditioner affects the eigenvalues and thus the speed of convergence, but not the norm in which convergence occurs. Use of a nonsymmetric preconditioner, P, would prevent use of cg; a nonsymmetric Krylov subspace method would in that case have to be used in general (but see Section 7). To establish the quality of P as a preconditioner for A, it is sufficient to find (and a priori prove if possible!) upper and lower bounds on the eigenvalues of the form
\[
\ell \le \lambda^{P^{-1}A}_{min}, \qquad \lambda^{P^{-1}A}_{max} \le \Upsilon \tag{3.2}
\]
or equivalently—and usually more tractably—upper and lower bounds on the generalised Rayleigh quotient
\[
\ell \le \frac{u^T A u}{u^T P u} \le \Upsilon, \quad \text{for all } u \in \mathbb{R}^n \setminus \{0\}. \tag{3.3}
\]

In either case κ_{P^{-1}A} ≤ Υ/ℓ leads directly to a convergence bound through (3.1) since the right hand side varies monotonically with κ. One does not actually replace A with P^{-1}A in the cg algorithm to implement preconditioning in the case of symmetric A since this would generally destroy symmetry, but even when symmetry is preserved (as in Saad (2003, Algorithm 9.1)), the action of P^{-1} is required on a different vector at each iteration. One thus has clear sufficient criteria for identifying a good preconditioner, P, in this symmetric positive definite case:
• it should be easy to solve systems of equations with the preconditioner P as coefficient matrix, that is, the action of P^{-1} on a vector should be readily computed;
• κ_{P^{-1}A} should be small.
These are competing aims; P = I satisfies the first but not the second unless A is already well conditioned, and P = A satisfies the second but not the first unless solution of systems with A is already easy!
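The following sketch of a standard preconditioned conjugate gradient iteration (in Python/NumPy, written for this review and in the spirit of, though not copied from, Saad (2003, Algorithm 9.1)) makes explicit that symmetry is preserved and that only the action of P^{-1} on a single vector is needed per iteration; the callables apply_A and apply_Pinv are illustrative names.
\begin{verbatim}
import numpy as np

def pcg(apply_A, apply_Pinv, b, x0=None, tol=1e-8, maxiter=500):
    """Preconditioned conjugate gradients for SPD A and SPD preconditioner P.
    apply_A and apply_Pinv are callables returning A @ v and P^{-1} @ v."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - apply_A(x)                    # residual
    z = apply_Pinv(r)                     # preconditioned residual
    p = z.copy()                          # search direction
    rz = r @ z
    for k in range(maxiter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = apply_Pinv(r)                 # one application of P^{-1} per iteration
        rz_new = r @ z
        beta = rz_new / rz
        p = z + beta * p
        rz = rz_new
    return x, maxiter

# Example use: 1D Laplacian with Jacobi (diagonal) preconditioning
n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
d = np.diag(A)
x, its = pcg(lambda v: A @ v, lambda v: v / d, np.ones(n))
print(its, "iterations")
\end{verbatim}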

3.1. Diagonal scaling

The earliest preconditioning ideas came from consideration of the first of these criteria and are largely in the 'given a matrix' category. Taking P to be a diagonal matrix is very simple; if in fact P = diag(A) then this is widely called Jacobi preconditioning. For a well conditioned matrix which is just badly scaled, this is generally all that is needed, but in general diagonal preconditioning is likely to achieve very little in terms of reducing iterations or computation time except in specific situations as below. There are two related types of problem where Jacobi preconditioning (or just diagonal scaling) is important. Firstly, the 'identity' matrix for finite element computation is the mass matrix, Q, which is the Gram matrix of the finite element basis functions {φ_i, i = 1, ..., n}, i.e. it has entries
\[
Q = \{q_{i,j},\ i, j = 1, \ldots, n\}, \qquad q_{i,j} = \int_\Omega \rho\, \phi_i(x)\, \phi_j(x)\, dx \tag{3.4}
\]
for a domain Ω. Here the potentially variable coefficient ρ might, for example, represent material density in a structure, and if this is highly variable or if the discretization has a large variation in the size (length, area, volume) of the elements—as is inevitable on an irregular or adapted mesh—then this variability is also represented in the diagonal entries of Q. Thus P = diag(Q) is provably a good preconditioner in this case (Wathen 1986). For example the gallery('wathen',n,n) matrix in MATLAB is a finite element mass matrix with randomly variable density for a so-called serendipity

finite element. The precise eigenvalue bounds of the form (3.2) or (3.3), together with tabulated values for ℓ and Υ for many types of finite elements, are derived in Wathen (1986). We briefly comment that in problems as above with analytic a priori eigenvalue bounds ℓ and Υ which are (reasonably) tight, the possibility also arises to use a fixed but small number of Chebyshev (semi-)iterations as a (linear) preconditioner—see Wathen and Rees (2009). The second important situation arises also through highly variable coefficients; the classic example arises from the operator ∇ · (K∇u) where the permeability K has large jumps. For continuous finite element approximation of such terms, Graham and Hagger (1999) were the first to identify the utility of diagonal scaling to render the preconditioned problem more homogeneous. For such an elliptic PDE, Jacobi preconditioning is highly unlikely to be an adequate preconditioner on its own, but when used together with an appropriate preconditioner for a homogeneous elliptic PDE operator—see below—the combination can be much better than just the elliptic preconditioner by itself. In the literature there are several developments of the idea of Graham and Hagger, including for non-isotropic permeabilities.
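A small numerical illustration of the basic effect of diagonal scaling follows (Python/NumPy, on a made-up badly scaled SPD matrix rather than an actual finite element mass matrix): symmetric Jacobi scaling recovers the good conditioning hidden by the bad scaling.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 200
G = rng.standard_normal((n, n))
S = G @ G.T + n * np.eye(n)                 # reasonably well-conditioned SPD "core"
d = 10.0 ** rng.uniform(-4, 4, size=n)      # wildly varying row/column scales
A = (d[:, None] * S) * d[None, :]           # badly scaled SPD matrix

# Symmetric Jacobi scaling D^{-1/2} A D^{-1/2}, which has the same eigenvalues
# as P^{-1} A with P = diag(A)
s = 1.0 / np.sqrt(np.diag(A))
B = (s[:, None] * A) * s[None, :]

print("condition number of A:          %.2e" % np.linalg.cond(A))
print("condition number after scaling: %.2e" % np.linalg.cond(B))
\end{verbatim}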

3.2. Incomplete Cholesky factorization

It is well known that Cholesky factorization (LL^T or LDL^T factorization where L is lower triangular and D is diagonal) of a symmetric and positive definite matrix leads immediately via forward and backward substitution to direct solution of a matrix system. For sparse matrices, unfortunately, the factor L is usually much less sparse (i.e. has many more non-zeros) than A. Such fill-in is usually unavoidable, though widely used sparse direct methods use sophisticated orderings in attempting to minimize it (HSL 2013, Davis 2004, Amestoy et al. 2000, Li 2005). In the context of PDE problems, the essential rule is that a sparse direct method can be a good solver for a problem on a 2-dimensional spatial domain, but is generally computationally infeasible for a 3-dimensional problem. If one preselects (or dynamically selects) the sparsity pattern of L and drops terms which would fill outside of this pattern, then one can form an incomplete Cholesky factorization A = LL^T + E where E represents the error due to the dropped terms. One can additionally threshold so that small entries are not kept in the incomplete factors. With P = LL^T, it is clear that forward and backward substitution still enable easy solution of systems with the preconditioner, and the main remaining question is whether the preconditioner leads to fast convergence of cg. In general many different variants of this basic idea exist and software for various incomplete factorization-based algorithms is widely available. This is certainly a plus point, but unless A is diagonally dominant, incomplete Cholesky algorithms do not always work well, indeed they do not always exist without modification, see Meijerink and van der Vorst (1977), and see Benzi (2002) for discussion of this as well as many other aspects of incomplete factorization preconditioning. With regard to robustness, there are certain advantages to computing an incomplete triangular factorization not using Cholesky but via orthogonalization in the inner product generated by A; see Benzi and Tuma (2003). This approach is called RIF (robust incomplete factorization) and shares some similarity with the AINV sparse approximate inverse discussed below. Let us make some of these statements precise in the context of solving a simple discretized elliptic boundary value problem, namely the common 5-point finite difference approximation of the Dirichlet problem for the (negative) Laplacian
\[
-\nabla^2 u = f \ \text{in } \Omega, \qquad u \ \text{given on } \partial\Omega,
\]
which arises from second order central differencing on a regular grid with spacing h in each of the two spatial dimensions. The choice of sign ensures that the operator and thus the arising matrix are positive rather than negative definite. If the domain, Ω, is the unit square (so that the boundary ∂Ω is four coordinate-aligned line segments) then the arising matrix A is of dimension N ≈ h^{-2} for small h and also κ_A = O(h^{-2}), see Elman et al. (2014b, section 1.6). Thus (3.1) predicts that O(h^{-1}) cg iterations would be required for convergence of unpreconditioned cg. The simplest incomplete Cholesky preconditioner for which L has the same sparsity pattern as the lower triangular part of A also leads to κ_{P^{-1}A} = O(h^{-2}) though there is significant clustering of a large number of the eigenvalues around 1; the bound (3.1) based only on the extreme eigenvalues is pessimistic in this situation. A variant, modified incomplete factorization (due to Gustafsson (1978) but related to rather earlier work by Dupont et al.
(1968)) reduces this to κ_{P^{-1}A} = O(h^{-1}), and generally leads to slightly faster cg convergence. Applying (3.1) in this case predicts that O(h^{-1/2}) preconditioned cg iterations would be required for convergence, and this is close to what is seen in practical computation. This implies that for a sequence of such problems with smaller and smaller h, it will take an increasing number of iterations to achieve acceptable convergence as h gets smaller (and so the matrix dimension gets larger) – a rather undesirable feature. Fortunately, there are much better preconditioners for this and related elliptic PDE problems which guarantee κ_{P^{-1}A} = O(1), so that the number of required cg iterations is essentially constant for larger dimensional discrete problems coming from approximation on finer grids. Such preconditioning methods are said to have P spectrally equivalent to A. The most important class of such methods arises from multigrid ideas which we consider next. Of course, even in the more general context of problems not coming from PDEs, incomplete Cholesky and its variants can be of considerable utility; in particular, algebraic preconditioners of this form can very often be applied. Before moving on, we comment that preconditioning techniques have been introduced that turn out to be related to incomplete factorization. A particular example is the 'support graph' preconditioner introduced originally in a conference talk by Vaidya, who wrote commercial software but did not publish on the subject. The support graph idea comes out of graph-theoretic notions and has a somewhat discrete mathematical flavour. Later analysis both clarified the scope of the ideas—they are now essentially extended to cover most symmetric and diagonally dominant matrices—and made connections with incomplete factorization: see Bern et al. (2006), Spielman and Teng (2014) and references therein.
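To make the zero-fill idea described in this subsection concrete, here is a minimal dense-storage sketch of IC(0) in Python/NumPy (an illustration written for this review, not taken from any of the cited software; it keeps only entries where A itself is nonzero and assumes the factorization does not break down, e.g. for a diagonally dominant matrix).
\begin{verbatim}
import numpy as np

def ichol0(A):
    """Incomplete Cholesky with zero fill-in: L has the sparsity of tril(A)."""
    n = A.shape[0]
    L = np.tril(A).astype(float)
    pattern = L != 0                      # positions allowed to hold nonzeros
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        for i in range(k + 1, n):
            if pattern[i, k]:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if pattern[i, j]:         # drop any update that would cause fill
                    L[i, j] -= L[i, k] * L[j, k]
    return L

# Tridiagonal 1D Laplacian: no fill occurs, so IC(0) reproduces the exact factor
n = 12
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L = ichol0(A)
print("||A - L L^T|| =", np.linalg.norm(A - L @ L.T))
\end{verbatim}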

3.3. Multigrid

Since the groundbreaking paper by Brandt (1977), multigrid methods have been at the core of scientific computing. Earlier contributions, notably by Fedorenko (1964) and Bakhvalov (1966), are now considered to have introduced the key ideas of combining a simple smoothing iteration (such as Jacobi or Gauss-Seidel iteration) with coarse grid correction on successively coarser grids for the finite difference Laplacian, but Brandt's vision was much broader: he showed just how fast multigrid solvers could be and introduced many ideas to keep researchers busy for decades! For a positive definite elliptic PDE operator, it has been clear for a long time that an appropriate multigrid method will provide an optimal solver, that is, a solver such that the work to compute the solution scales linearly with the dimension of the discretized problem. Classically, multigrid is a stationary iteration (albeit a non-trivial one) for solving Ax = b when A is a discrete Laplacian matrix (arising from finite differences or finite elements). For V- or W-cycles, iterates satisfy single cycle contraction of the form

\[
\|x - x_k\|_A \le \eta\, \|x - x_{k-1}\|_A \tag{3.5}
\]
where the contraction factor, η, is typically around 0.1 (see, for example, Elman et al. (2014b, section 2.5)). It follows that the linear multigrid operator (represented by a complicated and fairly dense matrix which one would never want to use in computations) is an excellent candidate for a preconditioner since from (3.5) it readily follows that
\[
1 - \eta \le \frac{x^T A x}{x^T P x} \le 1 + \eta,
\]
see Elman et al. (2014b, Lemma 4.2). That is, applying one multigrid V-cycle, say, is equivalent to solution of a preconditioner system with preconditioner P which guarantees rapid cg convergence because κ_{P^{-1}A} ≤ (1 + η)/(1 − η). There is little point in using multigrid cycles as a preconditioner for cg in this way for simple problems; one might as well simply use multigrid iteration to solve, but for more complicated problems this observation that a single multigrid cycle is an efficient spectrally equivalent approximation of an elliptic PDE operator like the Laplacian is very useful; see section 5. Note also the important point that arises here for the first time in this review: a preconditioner can be a linear operation rather than specifically a known matrix. Thus {3 V-cycles}, for example, is a suitable preconditioner. On the other hand, iterating multigrid (or any other iteration) to some residual tolerance will likely ensure that different numbers of V-cycles are used at different Krylov subspace iterations and so the preconditioner is not a fixed linear operator and the above convergence analysis cannot apply. Multigrid – and multilevel methodology more generally – has now become a huge subject (see for example Trottenberg et al. (2000)), with specialized techniques for the treatment of several specific features which might arise in a PDE (or other) problem. In the past couple of decades, algebraic multigrid methods (AMG), which try to mimic a geometric multigrid approach but use only information from the entries of a matrix rather than geometric information regarding grids, boundary conditions, etc. have attracted much attention and there is now widely available software: e.g. HSL MI20 (Boyle et al. 2010), ML (Gee et al. 2006), BoomerAMG (Henson and Yang 2000). In particular, in situations where a problem is fundamentally of the character of an elliptic differential operator but there are complicating features, an AMG cycle can be an excellent preconditioner. This can be true even if AMG is non-convergent since it may contract on all but a few modes; the few corresponding outlier eigenvalues are easily dealt with by cg. It can simply be much more convenient to employ an AMG cycle as (part of) a preconditioner for a Krylov subspace method than to have to 'hard code' bespoke techniques for each different variant problem that comes up. So common is this combined use of AMG and a Krylov subspace method that it is rarely even mentioned that AMG is used as a preconditioner, but this is almost always the case in practice. We should mention that multigrid ideas have been tried in other contexts than just for PDE problems; see for example Giles (2008), Livne and Brandt (2012).

It also needs mention in this section that a multigrid V- or W-cycle can usually be quite easily made into a symmetric linear operator—see Elman et al. (2014b, page 98)—though the so-called full multigrid cycle, for example, cannot. Positive definiteness is not usually an issue for an underlying positive definite problem.
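In practice the combined use of an AMG cycle and cg described above takes only a few lines; the following sketch assumes the PyAMG package (pyamg.gallery.poisson, ruge_stuben_solver and aspreconditioner belong to that package's interface, not to anything described in this paper) and uses a single V-cycle as a fixed linear preconditioner.
\begin{verbatim}
import numpy as np
import pyamg                                  # assumed external package
from scipy.sparse.linalg import cg

A = pyamg.gallery.poisson((128, 128), format='csr')   # 2D 5-point Laplacian
b = np.ones(A.shape[0])

ml = pyamg.ruge_stuben_solver(A)              # classical (Ruge-Stuben) AMG hierarchy
M = ml.aspreconditioner(cycle='V')            # one V-cycle = fixed linear operator

iters = []
x, info = cg(A, b, M=M, callback=lambda xk: iters.append(1))
print(len(iters), "preconditioned cg iterations, info =", info)
\end{verbatim}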

3.4. Domain Decomposition

For a very large dimensional problem, it might be a fairly natural idea to split it into separate subproblems and solve each separately (and usually in parallel) to give an approximation to the solution of the overall problem. In the original context of solving an elliptic boundary value problem on a domain Ω, this is the basic idea of domain decomposition: the subproblems being problems expressed on subdomains Ω_k, k = 1, ..., N with ∪Ω_k = Ω. In its simplest form, it corresponds to a block diagonal preconditioner with each separate subproblem represented by a single diagonal block which may or may not be overlapping, that is, which may or may not share some variables with other subproblems. An important issue is how to treat the variables either in the overlap regions or on the introduced boundaries between subdomains. Domain decomposition was first employed abstractly by Hermann Schwarz in the nineteenth century to prove an existence result for an elliptic PDE, hence methods of this type are often called Schwarz methods. Like multigrid, domain decomposition is a huge subject which we have no intention of covering with any depth here; see for example the books (Quarteroni and Valli 1999), (Smith et al. 1996), (Toselli and Widlund 2004), (Mathew 2008) or the earlier survey (Chan and Mathew 1994). What is clear, however, even from this brief description is that domain decomposition gives rise to preconditioners which are well suited to parallel computation. To give a brief idea of what can be achieved on a simple problem, we consider the Poisson equation on a domain Ω ⊂ R^d approximated by standard (conforming) finite elements or finite differences with a mesh spacing h. If Ω is split into non-overlapping subdomains of width H, then there are basic block diagonal preconditioners without coarse grid solve for which one can establish that
\[
\kappa_{P^{-1}A} \le C H^{-2} \left(1 + \log^2(H/h)\right).
\]
The more sophisticated domain decomposition schemes which include also a coarse grid (or 'wire basket') solve can achieve

\[
\kappa_{P^{-1}A} \le C \left(1 + \log^2(H/h)\right).
\]
In either case, C is a generic constant not dependent on H, h and the bounds translate into convergence estimates via (3.1). The size H can be considered to be the coarse grid size in essence; the bounds above assume exact solution of the problems on subdomains. With the importance of parallel and distributed computing, the domain decomposition concept remains very important and much is known for more advanced methods; see the regular Domain Decomposition conference proceedings available at ddm.org (2014).
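The simplest form described above, a non-overlapping block-diagonal (additive Schwarz, no coarse solve) preconditioner, can be sketched as follows in Python/SciPy; the 5-point Laplacian test problem, the contiguous index blocks standing in for subdomains, and the exact subdomain solves by sparse LU are all illustrative assumptions.
\begin{verbatim}
import numpy as np
from scipy.sparse import diags, identity, kron, csc_matrix
from scipy.sparse.linalg import LinearOperator, splu, cg

def laplace_2d(m):
    """Standard 5-point finite difference Laplacian on an m-by-m grid."""
    T = diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
    I = identity(m)
    return (kron(I, T) + kron(T, I)).tocsr()

m = 64
A = laplace_2d(m)
n = A.shape[0]
b = np.ones(n)

# Split the unknowns into contiguous blocks ("subdomains") and factorize each
# diagonal block once; applying the preconditioner solves each block exactly.
nblocks = 8
splits = np.array_split(np.arange(n), nblocks)
factors = [splu(csc_matrix(A[idx, :][:, idx])) for idx in splits]

def apply_block_jacobi(r):
    z = np.zeros_like(r)
    for idx, lu in zip(splits, factors):
        z[idx] = lu.solve(r[idx])
    return z

M = LinearOperator((n, n), matvec=apply_block_jacobi)
iters = []
x, info = cg(A, b, M=M, callback=lambda xk: iters.append(1))
print(len(iters), "cg iterations with block-Jacobi preconditioning")
\end{verbatim}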

3.5. Sparse approximate inverses

For any nonsingular matrix A, minimisation of ‖I − AR‖ (in any norm) over all matrices R ∈ R^{n×n} will clearly yield R = A^{-1}. Minimization with restrictions on the entries where R may have nonzero values leads to sparse approximate inverses. That is,
\[
R = \operatorname*{argmin}_{R \in \mathcal{S}_R} \|I - AR\|,
\]
where S_R describes the subset of entries of R which are allowed to be non-zero, is a sparse approximate inverse for A. To be consistent with the nomenclature in this article, P = R^{-1} is the preconditioner here; solution of systems with P is, of course, simply multiplication by R in this situation. S_R is strictly a subspace of matrices with restricted sparsity pattern. The Frobenius norm
\[
\|B\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{i,j}^2
\]
is by far the most widely considered since the minimisation problem then reduces to least squares problems for the columns of R separately (thus computable in parallel). Further, these least squares problems are all of small dimension when the specification of S_R ensures that R is a sparse matrix. We again refer to Benzi (2002) for a thorough description. As posed in this simple form, the sparse approximate inverse, R, is not generally symmetric when A is symmetric; however, we introduce this type of preconditioner in this section because symmetry can be enforced by computing R in factored form R = LL^T where L is sparse and lower triangular; see Kolotilina and Yeremin (1993). Symmetry of a sparse approximate inverse can also be achieved by the AINV algorithm of Benzi et al. (1996) and the more robust (stabilized) SAINV variant (Benzi et al. 2000) which is guaranteed not to break down if A is positive definite. See again Benzi (2002). The main issue in the construction of any sparse approximate inverse is the choice of the sparsity pattern, S_R. The simplest idea is to use the sparsity of A, whilst there have been several attempts to dynamically define S_R, most notably in the SPAI algorithm of Grote and Huckle (1997). An extension which can be used in a number of different scenarios is to use a target matrix, T, and hence minimise ‖T − AR‖_F; see Holland et al. (2005). A further comment is that sparse approximate inverse algorithms are clearly quite widely applicable, but generally do not give spectrally equivalent preconditioners for elliptic PDEs. They do however make excellent smoothers for use with multigrid (Tang and Wan 2000, Bröker et al. 2001).
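A minimal sketch of the Frobenius-norm construction with a fixed sparsity pattern (that of A itself) follows, in Python/SciPy; each column of R is an independent small least-squares problem. This is only the static-pattern variant, not the adaptive SPAI algorithm of Grote and Huckle, and the tridiagonal test matrix is an illustrative assumption.
\begin{verbatim}
import numpy as np
from scipy.sparse import diags, lil_matrix

n = 100
A = diags([-1, 2.5, -1], [-1, 0, 1], shape=(n, n)).tocsc()

R = lil_matrix((n, n))
for j in range(n):
    J = A[:, j].nonzero()[0]                 # allowed nonzero rows in column j of R
    I = np.unique(A[:, J].nonzero()[0])      # rows of A touched by those columns
    e = np.zeros(len(I))
    e[np.searchsorted(I, j)] = 1.0           # unit vector e_j restricted to rows I
    sub = A[I, :][:, J].toarray()            # small dense least-squares matrix
    r, *_ = np.linalg.lstsq(sub, e, rcond=None)
    for row, val in zip(J, r):
        R[row, j] = val
R = R.tocsr()

print("||I - AR||_F =", np.linalg.norm(np.eye(n) - (A @ R).toarray(), "fro"))
\end{verbatim}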

3.6. Hierarchical representations

For symmetric and positive definite matrices arising via low order finite element approximation from elliptic PDEs, Yserentant (1986) introduced nearly three decades ago the hierarchical basis preconditioner. This uses the idea that the coefficient matrix would have a much smaller condition number if a different (but less computationally convenient) basis for the approximation space was employed. By using a hierarchical basis—which would give rise to a fairly dense matrix—both this good conditioning and a rapid change of basis matrix based on a simple tree structure allow the sparse matrix arising from the usual locally defined finite element basis functions to be effectively preconditioned. For the discrete Laplacian on a two-dimensional domain, Yserentant's hierarchical preconditioner reduces κ = O(h^{-2}) to κ = O((log h^{-1})^2) (see Yserentant (1986) or Elman and Zhang (1995) for a more linear algebraic proof). Unfortunately, for three-dimensional domains only κ = O(h^{-1}) is possible with the straightforward use of the hierarchical basis (Ong 1997), though an improved variant of this idea due to Bramble et al. (1990) can even yield κ = O(1) in three dimensions as well. These ideas, originally introduced as based on the multilevel splitting of finite element spaces, were the precursor to a whole abstract theory based on such splittings; the review article by Xu (1992) remains a largely comprehensive description of this abstract theory which covers many approaches including multigrid and domain decomposition discussed above for symmetric and positive definite matrices arising from elliptic PDEs. A different hierarchical structure can also be of utility. We wish to mention some ideas and the associated computational technology which enables certain dense matrices—such as the inverse matrices of discrete positive definite elliptic PDE operators—to be represented to machine accuracy in a highly efficient format. There are a number of related but differing approaches (Chandrasekaran et al. (2005), Greengard and Rokhlin (1987), Hackbusch (1999), Grasedyck and Hackbusch (2003), Hackbusch et al. (2000), Bebendorf (2008)), but the essential observation can be thought of in terms of the Green's function for an elliptic operator: nearby entries need accurate representation, but variables distant from a given entry can be treated together. This notion is most directly represented in the H-matrix representation of Hackbusch: off-diagonal blocks are represented as low rank blocks using outer products of vectors or matrices with very few columns. By employing a blocking hierarchy, the computational complexity of storing (and also multiplying with) such matrix approximations can be as low as O(n log n). We mention these ideas here since such approximate representations of dense matrices have been suggested as preconditioners rather in the same vein as sparse approximate inverses. They also can be utilised in other more demanding situations as we describe later.
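A tiny numerical illustration of the observation behind H-matrices follows (Python/NumPy; the 1D Laplacian is chosen only because its inverse, the discrete Green's function, is easy to form densely): an off-diagonal block of the inverse has rapidly decaying singular values, so it is well approximated by a low rank outer product.
\begin{verbatim}
import numpy as np

n = 400
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Dirichlet Laplacian
Ainv = np.linalg.inv(A)                                 # dense discrete Green's function

block = Ainv[: n // 2, n // 2 :]                        # off-diagonal block
s = np.linalg.svd(block, compute_uv=False)
print("leading normalised singular values:", s[:5] / s[0])
\end{verbatim}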

4. Semidefinite matrices

It might seem rather odd to be interested in preconditioners for singular matrices; one might believe that only well-posed problems were worthy of attention. However, there are different situations in which approximations to certain singular operators/matrices are valuable and we mention some ideas in this brief section. The simplest PDE problem which gives rise to a singular matrix is probably the Neumann problem for a simple operator like the Laplacian; specifying only boundary conditions for a potential-like variable leaves a 1-dimensional null space comprising the constants, as is well known. Such a small dimensional null space generally causes little significant difficulty in terms of iterative methods or preconditioning and methods for definite matrices generally apply: see (Elman et al. 2014b, section 2.3). Similarly, graph analysis and Markov chain problems respectively lead to symmetric and nonsymmetric singular systems with simple, low-dimensional kernels; again, methods for definite matrices should generally be applied without explicit 'fixes' to ensure nonsingularity for such problems. For problems arising in contexts other than PDEs, singularity tends to imply that a description has not been completely specified. The issue of semidefiniteness is certainly serious, however, for equations involving the curl or div operators applied to vector fields as arise, for example, in the Maxwell equations describing electromagnetics. In this context, the operator curl curl has the gradients of all scalar fields in its kernel, thus under most discretizations one would expect that the matrices arising will be semidefinite with a kernel of some significant dimension. Similarly, the curl of any smooth enough vector field is in the kernel of the div operator. On the relevant spaces of functions, the curl curl operator is however positive definite and self-adjoint (symmetric), and it seems that it may be possible to use convenient preconditioners like standard AMG implementations if appropriate account is taken of the kernel. This is the purpose of 'Auxiliary Space' preconditioning which aims to take account of the natural decompositions of function spaces that arise. The ideas have evolved over some years; see Hiptmair and Xu (2007) and references therein. The main practical issue is representing mapping operators between the various spaces in the discrete setting.

5. Symmetric and indefinite matrices

For real symmetric matrices which do not have to be definite, the Krylov subspace method of choice is the minres algorithm due to Paige and Saunders (1975). In the same paper they introduced the alternative but less used symmlq method. As its name implies, minres minimises the norm of the residual vector for x_k in the k-dimensional Krylov subspace at each iteration. Thus since the symmetry of A ensures that the eigenvalues are real and the eigenvectors are orthogonal, the convergence of minres is described by

\[
\frac{\|r_k\|_I}{\|r_0\|_I} \le \min_{p_k \in \Pi_k,\, p_k(0) = 1} \ \max_{\lambda \in \sigma(A)} |p_k(\lambda)|, \tag{5.1}
\]
where the notation is as used before in the positive definite case. The difference from that case is not only that the residual rather than the error is minimised (it is really a difference of norm since r_k = A(x − x_k) implies that ‖r_k‖_I = ‖x − x_k‖_{A^2}), but the spectrum, σ(A), now contains both positive and negative eigenvalues. As in the symmetric and positive definite case, (5.1) immediately indicates that termination of the iteration will occur with the solution at the sth iteration if A has just s distinct eigenvalues. Correspondingly, the eigenvalues lying in a small number of clusters should also lead to rapid convergence if none of the clusters is too far from, nor too close to, the origin. If the eigenvalues of A are all contained in two intervals [−a, −b] ∪ [c, d], where a, b, c, d are all positive, then it is less straightforward in general than in the positive definite case to derive a simple convergence bound (see Wathen et al. (1995)). One situation where a simple expression is possible, however, is when d − c = a − b; the minres residuals then satisfy the convergence bound
\[
\frac{\|r_{2k}\|_I}{\|r_0\|_I} \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k, \tag{5.2}
\]
where κ = ad/(bc) (see Elman et al. (2014b, section 4.2.4)). Note that κ here plays a similar role to λ_max/λ_min in the positive definite case, though it is not the matrix condition number here; since the Euclidean norm condition number, ‖A‖ ‖A^{-1}‖, is max{a, d}/min{b, c} in this situation, κ can be as large as the square of this condition number when a = d, b = c, but rather less when the two intervals are not symmetrically placed about the origin. When a = c, for example, κ = d/b is again exactly the Euclidean norm condition number. When A is symmetric and indefinite, any preconditioner for minres must be symmetric and positive definite. This is necessary since otherwise there is no equivalent symmetric system for the preconditioned matrix. Thus a nonsymmetric iterative method must be used even if a symmetric and indefinite preconditioner is employed for a symmetric and indefinite matrix in general. An indefinite preconditioner is therefore not usually desirable, though there are exceptions: see below. A preconditioner for a symmetric indefinite matrix A for use with minres therefore cannot be an approximation of the inverse of A, since this is also indefinite. This simple observation rules out several approaches described above in the positive definite case. With a symmetric and positive definite preconditioner, the preconditioned minres convergence bounds as above become

\[
\frac{\|r_k\|_{P^{-1}}}{\|r_0\|_{P^{-1}}} \le \min_{p_k \in \Pi_k,\, p_k(0) = 1} \ \max_{\lambda \in \sigma(P^{-1}A)} |p_k(\lambda)|
\le \min_{p_k \in \Pi_k,\, p_k(0) = 1} \ \max_{\lambda \in [-a,-b] \cup [c,d]} |p_k(\lambda)| \tag{5.3}
\]
when [−a, −b] ∪ [c, d] is now an inclusion set for all of the eigenvalues of P^{-1}A rather than A. A further aspect introduced by preconditioning for minres which does not arise for cg in the positive definite case is now apparent: the norm in which convergence naturally occurs is affected by the preconditioner; it is not just the eigenvalue distribution and hence the speed of convergence which is affected. The only reason we bring up this issue is because one now has the undesirable possibility that a preconditioner which apparently gives rapid minres convergence might give inaccurate solution iterates simply because ‖·‖_{P^{-1}} is a highly distorted norm. The following trivial example will make the point: let A be any indefinite diagonal matrix and suppose we precondition with P = diag(ε, 1/ε, ..., 1/ε) for some small positive ε. Convergence to some small tolerance will appear to occur after only 1 iteration since there is only 1 large entry of P^{-1} and the remaining entries are very small. Because of this distorted weighting, the first component of the first solution iterate will be accurate, but the rest will generally not be. For further consideration of this issue see Wathen (2007). These different issues, which do not arise when A is symmetric and positive definite, lead some authors to the conclusion that preconditioning for symmetric indefinite matrices is much more difficult than the definite case. However it is still much more tractable than the nonsymmetric case! For a PDE problem, for example, the convergence bound (5.3) guarantees that preconditioned minres convergence will be in a number of iterations independent of any mesh size parameter, h, given a preconditioner which ensures that a, b, c, d are bounded, bounded away from zero and independent of h. This is like the situation for spectrally equivalent preconditioners in the symmetric positive definite case. Such guarantees rarely exist for non-symmetric iterative methods; see section 6 below. We mention that the sqmr Krylov subspace iterative method (Freund and Nachtigal 1994) is designed for indefinite matrices with indefinite preconditioning; this method, however, has no convergence guarantees, being a variant of the nonsymmetric qmr iterative method for which symmetry allows a more streamlined implementation. General preconditioning strategies for symmetric indefinite matrices are not so available as in the positive definite case. Amongst the 'given a matrix' approaches, simple application of scaling and incomplete factorization are not contenders and there is little significant work on sparse approximate inverses since these appear unlikely to provide much benefit except possibly when A has only a few negative eigenvalues. 'Given a problem' is more promising and there are two general classes of problem which give rise to symmetric indefinite matrices for which helpful—sometimes excellent—preconditioning approaches are known.

5.1. Shifted elliptic operators: Helmholtz problems

It might seem that taking a symmetric and positive definite matrix, K, and simply shifting it by subtracting some multiple of the identity should lead to matrices for which easy solution algorithms exist. In general, however, this is not the case. Certainly the matrix K − σI can have resonances (and thus be singular) when σ is an eigenvalue of K, but the difficulty lies deeper than just this observation. The classic (and important) PDE problem here is the Helmholtz equation
\[
-\nabla^2 u - \sigma u = f
\]
which arises when looking for periodic (wave-like) solutions of the linear wave equation; the parameter σ is then related to the frequency. Supposing that K is the discretised form of the positive definite operator −∇^2 (i.e. the negative Laplacian), then for σ > λ_min(K) it is clear that
\[
A = K - \sigma I \tag{5.4}
\]
is indefinite and symmetric. Many of the important physical applications involve high frequencies and thus large values of σ. Having a good preconditioner for K—we have identified many in the sections above—is of little help in constructing a preconditioner for A. For finite element discretization, the identity operator becomes the mass matrix, Q (3.4), but via diagonal approximation or since the grids employed are generally fairly regular for such wave problems, there is little lost in description by considering (5.4).

In fact for Helmholtz problems, there is usually a radiation boundary condition for outgoing waves or an absorbing boundary condition (or 'perfectly matched layer') is applied so that there is little reflection of outgoing waves back into the computational domain. These add some complex entries in the matrix which generally render it complex and symmetric (not Hermitian). Nevertheless, it is the indefiniteness of Helmholtz problems which remains the most important feature, hence why we consider them in this section. This aspect can, however, mean that nonsymmetric Krylov subspace iterative methods are required—sqmr is quite popular in this context. Magolu Monga Made et al. (2000) and Laird and Giles (2002) were perhaps the first to recognize the potential of preconditioning (5.4) with a matrix of the form K + µI where |µ| is positive. In the case of real µ, such a positive definite matrix is readily approximated by standard multigrid cycles which can be used as a positive definite preconditioner for (5.4). Complex values for µ were also explored (see also van Gijzen et al. (2007), for example), and good choices of the parameter can be found from Fourier analysis of the spectrum of the preconditioned operator/matrix. As well as indicating the speed of Krylov subspace iteration convergence, such analysis can also assist in the choice of multigrid components (Erlangga et al. 2006). Elman et al. (2001) also propose a multigrid approach for Helmholtz problems employing gmres as a smoother. Domain decomposition approaches have also been suggested (Gander et al. 2002). A more recent proposal by Engquist and Ying (2011a) for variable and high-frequency regimes involves 'sweeping' from the absorbing boundary, eliminating the variables locally in layers. The resulting preconditioner is thus a block incomplete LDL^T factorization. A key aspect of the implementation is that hierarchical (H-)matrix compression techniques as described in section 3 above can be employed for the intermediate layers; this leads to significant reductions in storage and computational work and leads to excellent overall complexities like O(n log^2 n) for computation of P = LDL^T and application of P^{-1} to a vector for a two-dimensional domain problem. Krylov subspace iterative convergence is observed to occur in few iterations making this a very competitive technique. Unfortunately, though the idea—and a closely related one employing frontal sparse factorization (Engquist and Ying 2011b)—is observed to work well for three-dimensional domains as well, the H-matrix compression does not appear to yield the same small computational complexity: algebraic factors in n replace logarithmic factors in estimates of the complexity for setting up the preconditioner (Poulson et al. 2013). That is, for a three-dimensional domain problem, the computation of P = LDL^T takes time which grows like n^γ for some constant γ > 1 whereas, once P has been computed, the action of P^{-1} can be computed in O(n log n) work at each iteration (similarly to the FFT). For a problem with many different right hand side vectors there could thus be some amelioration in the work required to set up the preconditioner. We should alert the reader to the article by Ernst and Gander (2012) which explains the general difficulty of iterative solution of Helmholtz problems. The Maxwell equations describing electromagnetics have wave-like solutions and hence some of the character of Helmholtz problems.
However, they also (in various formulations) have additional semidefinite features as discussed in the section above, and additional indefinite saddle point structure, and are therefore considered below. The idea of sweeping through the discrete variables for a problem in a particular order is also important for preconditioner construction for nonsymmetric matrices which derive from PDEs that involve advection/convection (or transport) as described below.
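To illustrate the shifted (complex) Laplacian idea discussed above in the simplest possible setting, the following Python/SciPy sketch preconditions a 1D Dirichlet Helmholtz matrix A = K − σI with P = K − (1 + 0.5i)σI; the complex shift 1 + 0.5i is one commonly quoted choice and is an assumption here, and P is applied by a direct sparse factorization purely for illustration where in practice a multigrid cycle applied to P would be used instead.
\begin{verbatim}
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import splu, LinearOperator, gmres

n = 400
h = 1.0 / (n + 1)
K = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc") / h**2
sigma = 2000.0                                       # wavenumber squared (illustrative)

A = (K - sigma * identity(n)).astype(complex).tocsc()        # indefinite Helmholtz matrix
P = (K - (1.0 + 0.5j) * sigma * identity(n)).tocsc()         # complex shifted Laplacian
Plu = splu(P)
M = LinearOperator(A.shape, matvec=Plu.solve, dtype=complex)

b = np.zeros(n, dtype=complex)
b[n // 2] = 1.0 / h                                  # point source

iters = []
x, info = gmres(A, b, M=M, callback=lambda res: iters.append(res),
                callback_type="pr_norm")
print(len(iters), "gmres iterations, info =", info)
\end{verbatim}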

5.2. Saddle point systems

The second main class of symmetric and indefinite problems comes from minimization of quadratic functionals subject to linear constraints. In fact, linearizations of much more general constrained optimization problems quite generally result in linear systems of the same form

\mathcal{A}x = \begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix} \begin{pmatrix} u \\ p \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix}, \qquad (5.5)

which are called saddle point systems; see Benzi et al. (2005) for a comprehensive treatment of such problems. The block A ∈ R^{n×n} is usually positive definite on the kernel of B ∈ R^{m×n} with m < n. This is a sufficient condition for invertibility of \mathcal{A} provided B is of full rank. Often A is positive definite, though in the context of sequential quadratic approximations in Optimization (and possibly elsewhere), indefiniteness can arise; see Gould and Simoncini (2009) for spectral bounds in this less usual case. The variables u are the primary variables and p are the Lagrange multipliers. Sometimes the zero block is replaced with a regularization matrix −C ∈ R^{m×m} where C is symmetric and positive semidefinite; if C ≠ 0 then some authors refer to a regularized or generalized saddle point problem.

Many problems give rise to matrices of this type: as well as the ubiquitous appearance of such problems in Optimization, several important PDE problems, including the Stokes problem describing slow viscous incompressible flow, are of this form. The incompressible Navier-Stokes equations give matrices of similar form but in which A is nonsymmetric. For these applications, the primary variables are the fluid velocity and the Lagrange multiplier variables are the fluid pressure. For other applications and references, see again Benzi et al. (2005) or Pestana and Wathen (2015).

The appearance of the zero diagonal block in (5.5) causes problems for most of the preconditioning approaches identified in section 3 above. This is also generally true with regularisation, C ≠ 0. Multigrid approaches, for example, require more complex smoothers; see for example John and Tobiska (2000) and Manservisi (2006). Domain decomposition can be applied, but then the subdomain problems remain indefinite. Incomplete factorization ideas can sometimes be employed if, for example, A and C are diagonal or nearly so (Bergamaschi et al. 2004).

A simple observation in the case that A is nonsingular is that if

P = \begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix}, \qquad S = BA^{-1}B^T, \qquad (5.6)

then the preconditioned matrix P^{-1}\mathcal{A} has only three distinct eigenvalues independently of m, n; thus minres (or gmres in the nonsymmetric case) will converge in 3 iterations to the solution (Murphy et al. 2000, Korzak 1999). P in (5.6) is rarely a practical preconditioner for \mathcal{A} in its exact form, but this observation does give a paradigm for the construction of preconditioners and, in particular, highlights the utility of approximations to the Schur complement, S. An important point to note is that P is nothing like an inverse of \mathcal{A} (nor an approximation thereof). Nevertheless, good approximations (preconditioners) Â for the block A and Ŝ for the Schur complement, S, lead to block diagonal preconditioners

P = \begin{pmatrix} \hat{A} & 0 \\ 0 & \hat{S} \end{pmatrix} \qquad (5.7)

for \mathcal{A} for which minres converges in few iterations. Precisely, if

\ell \le \frac{u^T A u}{u^T \hat{A} u} \le \Upsilon \quad \text{and} \quad \gamma \le \frac{p^T S p}{p^T \hat{S} p} \le \Gamma \quad \text{for all } u \in \mathbb{R}^n \setminus \{0\},\ p \in \mathbb{R}^m \setminus \{0\}

for positive constants \ell, \Upsilon, \gamma and \Gamma, then there are positive constants a, b, c and d satisfying the preconditioned minres convergence bound (5.3).
In the context of PDE problems, for example, this means that preconditioned minres is a solver of optimal computational complexity provided that Â is spectrally equivalent to A and Ŝ is spectrally equivalent to S. A multigrid cycle can be an excellent choice for Â. See Pestana and Wathen (2015) for a general proof which highlights the role of inf-sup stability associated with the minimal singular value of B (in appropriate norms).

For many problems there are natural Schur complement approximations which arise from the problem formulation and do not require separate approximation of the matrices B and A—see Pestana and Wathen (2015). In the case of the Stokes equations, the natural Schur complement approximation is particularly simple: it is the mass matrix (3.4) (Verfürth 1984, Silvester and Wathen 1994). For any such problem with a natural Schur complement approximation, (5.7) is an excellent preconditioning approach since Â can usually be any of the approximations in section 3. It should be noted that once such approximations Â and Ŝ are identified, then there are more involved ways to make use of them than the simple block diagonal preconditioner (5.7). When matrix symmetry is not an issue, a block triangular preconditioner can be employed—see section 6.

Sometimes it is actually useful to directly approximate the matrices B and A in constructing Ŝ, for example in the context of optimization with constraints which are PDEs (Rees et al. 2010a), though care must be exercised to ensure that the sufficient conditions of Braess and Peisker (1986) are satisfied.

For saddle point problems arising particularly in the more general context of optimization, constraint preconditioning has attracted significant attention (Keller et al. 2000). Here

P = \begin{pmatrix} H & B^T \\ B & 0 \end{pmatrix}

is used as a preconditioner for \mathcal{A}. Note that the blocks B and B^T which come from the constraints are kept, whilst A is replaced by another symmetric and (usually) positive definite matrix H. It might appear that solution of systems with P may not be significantly easier than with \mathcal{A}; however, by leaving H implicitly defined, this can be achieved (Dollar and Wathen 2006, Dollar et al. 2006). One significant advantage of keeping the constraint blocks in the preconditioner is that the preconditioned problem is now equivalent to a positive definite problem on the constraint manifold, hence a projected cg algorithm can be used (Gould et al. 2001); this is rather useful since a constraint preconditioner is symmetric and indefinite, precluding simple use of minres. Keller et al. (2000) prove that when H is positive definite then P^{-1}\mathcal{A} has 2m eigenvalues of 1, each of geometric multiplicity 2 (thus the minimum polynomial generically has a factor (λ − 1)^2), and that the remaining n − m eigenvalues are real and interlace those of H^{-1}A. Thus if H is a good preconditioner for A and systems with P can be readily solved, then this can be an attractive strategy.

Different challenges arise and different strategies are needed for saddle point problems (5.5) with a semidefinite block A. The constraint preconditioning paradigm can be applied, in which case finding a matrix H for the semidefinite matrix A may require ideas as in section 4 above. The saddle point matrix is still invertible provided A is positive definite on the kernel of B and B is of full rank, as has been mentioned.
One possible approach is therefore via an augmentation or augmented system preconditioner

P = \begin{pmatrix} A + B^T W^{-1} B & 0 \\ 0 & W \end{pmatrix}

for some appropriate symmetric and positive definite matrix W. This approach is described and analysed by Greif and Schötzau (2006), who were motivated by the solution of Maxwell equations describing electromagnetic phenomena—a key PDE problem with this structure. For related but earlier ideas associated with the semidefiniteness of the div operator, see Arnold et al. (1997).

For the various Maxwell equation formulations, semidefiniteness arises through curl-curl terms, indefiniteness because of a div constraint, and wave-like solutions are always expected, which means that fine computational grids must be used, especially for high frequency problems. Fortunately, Maxwell's equations are linear and self-adjoint in most formulations, which is why we bring them up in this section. Mixed finite element approximation is widely used for these PDEs. Representative of the large and still growing literature on preconditioning for such problems are Greif and Schötzau (2007), who develop block preconditioners, and Tsuji et al. (2012), who extend sweeping preconditioner ideas to these problems.

A slightly more detailed review of preconditioning for saddle point problems can be found in Benzi and Wathen (2009).
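The following is a minimal sketch of the block diagonal preconditioning (5.7) with minres on a small artificial saddle point system. The choices Â = diag(A) and Ŝ = diag(B diag(A)^{-1} B^T) are purely illustrative stand-ins for the multigrid cycles and problem-specific Schur complement approximations (such as a mass matrix) one would use for a PDE problem.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, m = 200, 50
# Artificial saddle point system (5.5): A symmetric positive definite, B full rank.
R = sp.random(n, n, density=0.05, format='csr', random_state=0)
A = (R @ R.T + 10 * sp.identity(n)).tocsr()
B = sp.random(m, n, density=0.2, format='csr', random_state=1)
K = sp.bmat([[A, B.T], [B, None]], format='csr')
rhs = np.random.default_rng(2).standard_normal(n + m)

# Block diagonal preconditioner (5.7) with cheap illustrative blocks:
# Ahat = diag(A) and Shat = diag(B diag(A)^{-1} B^T).
a_diag = A.diagonal()
s_diag = (B @ sp.diags(1.0 / a_diag) @ B.T).diagonal()

def apply_P_inv(v):
    # apply the inverse of the block diagonal preconditioner
    return np.concatenate([v[:n] / a_diag, v[n:] / s_diag])

M = spla.LinearOperator(K.shape, matvec=apply_P_inv)
x, info = spla.minres(K, rhs, M=M)
print('minres info:', info, ' residual:', np.linalg.norm(rhs - K @ x))
```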

5.3. A word of caution

As has already been mentioned in section 2, for any nonsingular square matrix A, the solution of a linear system can be achieved by using an indefinite symmetric system of twice the dimension of the form (2.5) or even

\begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}.

Since the eigenvalues of the coefficient matrix here are exactly symmetric about the origin (they are ±σ_i(A) where σ_i(A) are the singular values of A), generically the convergence of minres would stagnate at every other iteration, so the computational effort in solution is exactly the same as for the normal equations (2.4) (Fischer et al. 1998). For (2.5) a similar issue arises.

The artificial construction of indefinite symmetric systems is therefore unlikely to achieve anything other than equivalent problem statements, except possibly in the solution of least squares problems by augmented matrix methods (Björck 1996, Chapter 7). To go further, a symmetric indefinite matrix of any form which has eigenvalues symmetrically arranged about the origin will generically lead to staircasing behaviour where progress is made only on alternate minres iterations; cg for (2.4) generally requires the same computational effort for equation solution in such circumstances. In this sense, it is highly unfortunate that the tractable minres convergence bound (5.2) applies only when the positive and negative eigenvalues extend over intervals of equal length. If in fact a = d and b = c, so that these intervals are centred on the origin, then the undesirable staircasing effect is always implied! In general, it is advantageous in terms of convergence speed if there is some asymmetry in the distribution of positive and negative eigenvalues of preconditioned indefinite symmetric matrices. However, it is then considerably more complicated to take account of this asymmetry in a minres convergence bound (but see Wathen et al. (1995)).
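A small numerical illustration of this staircasing behaviour, using a random dense matrix chosen purely for demonstration: the residual norms recorded in the minres callback typically decrease only on alternate iterations.

```python
import numpy as np
import scipy.sparse.linalg as spla

rng = np.random.default_rng(1)
n = 60
A = rng.standard_normal((n, n))

# Augmented symmetric indefinite matrix with eigenvalues +/- sigma_i(A).
K = np.block([[np.zeros((n, n)), A], [A.T, np.zeros((n, n))]])
b = np.concatenate([rng.standard_normal(n), np.zeros(n)])

res = []
spla.minres(K, b, maxiter=40,
            callback=lambda xk: res.append(np.linalg.norm(b - K @ xk)))
print(np.round(res, 4))   # progress is typically made only every other step
```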

6. Nonsymmetric matrices

As already mentioned, all preconditioning for nonsymmetric problems is heuristic, since descriptive convergence bounds for gmres or any of the other applicable nonsymmetric Krylov subspace iterative methods do not presently exist (but see Pestana and Wathen (2013b)). The most straightforward convergence bound for gmres for diagonalisable matrices, A = XΛX^{-1} where Λ is a diagonal matrix of eigenvalues, comes directly from (2.3) and the residual minimization property:

\begin{align*}
\|r_k\| &= \min_{p \in \Pi_k,\, p(0)=1} \|p(A) r_0\| \\
&\le \min_{p \in \Pi_k,\, p(0)=1} \|p(X \Lambda X^{-1})\| \, \|r_0\| \\
&= \min_{p \in \Pi_k,\, p(0)=1} \|X p(\Lambda) X^{-1}\| \, \|r_0\| \\
&\le \|X\| \, \|X^{-1}\| \min_{p \in \Pi_k,\, p(0)=1} \|p(\Lambda)\| \, \|r_0\| \\
&\le \|X\| \, \|X^{-1}\| \min_{p \in \Pi_k,\, p(0)=1} \max_{z \in R} |p(z)| \, \|r_0\|, \qquad (6.1)
\end{align*}

where R is any region containing the eigenvalues. Since it is sometimes possible to find regions R which contain all of the eigenvalues, but rarely possible to obtain any reasonable estimates of the condition number \|X\|\,\|X^{-1}\| of the eigenvector matrix X, very often in the literature the eigenvalues of P^{-1}A are estimated or bounded as a guide to the quality of a preconditioner. It is certainly usually true that if the eigenvalues are widely spread through a large region of C then slow convergence can follow!

There are alternative approaches for obtaining bounds for gmres convergence; see for example (Greenbaum 1997, Section 3.2). One can generate such bounds, for example, with use of the field of values or numerical range

F(A) = \{ x^* A x : x \in \mathbb{C}^n, \ x^* x = 1 \},

provided 0 ∉ F(A). The obtained bounds can be rather pessimistic, but field-of-values convergence bounds have been used, for example, to rigorously establish the convergence of gmres in a number of iterations independent of the mesh size, h, for certain preconditioned PDE problems; see for example Klawonn and Starke (1999), Loghin and Wathen (2004) and Benzi and Olshanskii (2011).

gmres convergence bounds can be based on other sets in the complex plane; ε-pseudospectral sets, for example, have been used to analyse initial stagnation of (unpreconditioned) gmres for convection-diffusion problems (Trefethen and Embree 2005, Chapter 26).

One aspect which arises in the nonsymmetric but not the symmetric case is the possibility to left precondition, P^{-1}Ax = P^{-1}b, to right precondition, AP^{-1}y = b, x = P^{-1}y, or, if P is available in split form P = P_1 P_2, to use P_1^{-1} A P_2^{-1} y = P_1^{-1} b, x = P_2^{-1} y. In every case the coefficient matrix is generally again nonsymmetric, and obvious similarity transformations involving P or its factors show that all three coefficient matrices are mathematically similar and so have the same eigenvalues. There is little evidence, however, that any one of these is better than the others in terms of its effect on the convergence rate for a Krylov subspace method, even though, theoretically, this could be the case. Traditionally, right preconditioning is favoured since then gmres, for example, minimizes the residual r_k = b − Ax_k for the original linear system; left preconditioning would lead to minimization of the preconditioned residual P^{-1}r_k. It is certainly not always clear which is preferable; a good preconditioner fortunately seems to give fast convergence for all three forms in practice.

In the context of PDEs, convection-diffusion problems present significant challenges for stable discretization—without excessive numerical diffusion or non-physical oscillations—as well as for preconditioning and iterative solvers. A key aspect is simply equation ordering. Even such a simple method as stationary Gauss-Seidel iteration can be good for such problems if the variables are ordered in the direction of the convection, but it can be very poor otherwise (see Elman and Chernesky (1993), Elman et al. (2014b, chapter 7)).

We mention also that there is a 'flexible' version of gmres which allows for a preconditioner that varies between iterations (Saad 1993). This can be useful, for example, when employing another iteration as a preconditioner, but this flexible gmres method does not necessarily work as well as regular gmres. It can sometimes be better to just use a fixed number of inner iterations so that the preconditioner is a fixed operator for every iteration; if such inner iterative steps represent a fixed linear process, then at least the limited guarantees that come with gmres then remain in place.
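As a sketch of the mechanics of right preconditioning, the following applies gmres to the operator product A P^{-1} explicitly, so that the residual being driven down is that of the original system, and then recovers x = P^{-1}y. The model problem is a 1D diffusion-dominated convection-diffusion discretization and the preconditioner is simply its diffusion part; both are illustrative choices, not recommendations.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 1D convection-diffusion model: -eps*u'' + u' on (0,1), upwind differences.
# eps = 1 gives a diffusion-dominated problem for which the diffusion part
# alone is a reasonable preconditioner.
n, eps = 500, 1.0
h = 1.0 / (n + 1)
D = eps / h**2 * sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
C = sp.diags([-1.0, 1.0], [-1, 0], shape=(n, n)) / h      # upwind convection
A = (D + C).tocsc()
b = np.ones(n)

# Illustrative preconditioner: a direct factorization of the diffusion part.
P_solve = spla.splu(D.tocsc()).solve

# Right preconditioning: solve (A P^{-1}) y = b, then recover x = P^{-1} y.
APinv = spla.LinearOperator(A.shape, matvec=lambda v: A @ P_solve(v))
y, info = spla.gmres(APinv, b)
x = P_solve(y)
print('gmres info:', info, ' true residual:', np.linalg.norm(b - A @ x))
```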

6.1. Incomplete Factorization

Just as a Cholesky factorization for a symmetric and positive definite matrix can be incompletely formed, an LU factorization (equivalent to Gauss elimination) for a non-symmetric matrix can be performed incompletely by prespecifying allowable fill or thresholding or both. Perhaps such ILU preconditioning methods and their variants are the most widely used for non-symmetric problems, and software is widely available. Because the product of factors is computed for ILU, for problems with an implied directionality like convection-diffusion, ordering of the variables can lead to one of the factors being less important (and perhaps more nearly diagonal).

Although, in general, there is little we can say about ILU preconditioning for non-symmetric matrices, gmres with ILU is a 'go-to' technique for very many practical problems, so much largely empirical research continues to explore every aspect of the various possibilities, for example multilevel variants, block variants and parallel implementation issues. We again refer to Benzi (2002), where the huge literature is fairly evenly cited. Because of its significance, particularly in the important area of petroleum reservoir modelling, we specifically mention the nested ILU methods originally derived by Appleyard and Cheshire (1983), which use partial eliminations and are designed to take advantage of parallel computation; related ideas were developed, for example, by Saad (1996). See also Vassilevski (2008) for a comprehensive treatment of these methods in a more general framework.
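A minimal sketch of threshold-based ILU preconditioning for gmres; the matrix is an illustrative sparse nonsymmetric discretization, and the drop tolerance and fill factor shown are arbitrary choices that trade set-up cost against preconditioner quality.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative nonsymmetric sparse matrix: 2D diffusion plus convection in x.
n = 50
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
C = 0.5 * sp.diags([-1.0, 1.0], [-1, 0], shape=(n, n))
A = (sp.kron(sp.identity(n), T + C) + sp.kron(T, sp.identity(n))).tocsc()
b = np.ones(A.shape[0])

# Incomplete LU with dropping; larger fill_factor / smaller drop_tol give a
# more accurate (and more expensive) preconditioner.
ilu = spla.spilu(A, drop_tol=1e-3, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M)
print('gmres info:', info, ' residual:', np.linalg.norm(b - A @ x))
```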

6.2. Multigrid

There is no inherent obstacle to applying multigrid or multilevel ideas for nonsymmetric problems. However, smoothing is generally not so easy and requires more work than for symmetric problems, and care must be taken so that coarse grid correction does not reintroduce high frequencies. Without attention to these aspects, the methods simply do not work at all well. Since such issues vary with differing applications, we briefly discuss only the important problem associated with the convection-diffusion equation.

Smoothing – or perhaps more appropriately in this setting, 'sweeping' – should ideally be something like a Gauss-Seidel iteration with the variables ordered in the direction of the convection. For problems with recirculation, one can try multidirectional sweeping, so that all parts of the flow have at least one of the directional sweeps that is approximately in the convection direction. Coarse grid correction then requires care (in grid transfers and/or coarse grid operator). In the convection-diffusion geometric multigrid method due to Ramage (1999), for example, the coarse grid operator is recomputed on each coarse grid with—crucially—streamline upwinding appropriate to that grid size. The point is that when a grid under-resolves any layer-like phenomenon which arises with convection—this must happen at some level if one has a true multigrid concept—then oscillatory solutions are commonly the result. The reintroduction of such high frequencies would require more significant sweeping or smoothing, and it is preferable to provide stabilisation appropriate to each level of grid so that smooth coarse grid corrections result.

Multigrid for hyperbolic problems is thought about rather differently than for elliptic problems and the idea of directional sweeping rather than smoothing is key. The coarse grids allow one to sweep errors out of the boundary of the domain more rapidly than using just a single grid. Again, transport is the main issue, not smoothing.

Algebraic multigrid ideas have been tried and can work well for some problems. The computation of 'coarse grids' (or in reality, spaces of coarse variables) by purely algebraic means can identify layer phenomena, for example, and automatically semi-coarsen. In particular, widely used coarsening algorithms seem to detect anisotropy and lead to appropriate coarse spaces/grids. All is not rosy, however, and AMG software can quite simply fail. There is a particular danger that simply does not arise with geometric multigrid ideas, namely that the coarse grid can have a significant proportion of the variables of the fine grid; that is, coarsening does not lead to a sufficiently significant reduction in the number of variables. If this occurs, then many levels are required and efficiency can be severely compromised. Active research continues on AMG for nonsymmetric problems; see for example Notay (2012).

Multigrid methodology is generally applied as a preconditioner for nonsymmetric problems.
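To illustrate the role of ordering in such sweeps, here is a small sketch comparing Gauss-Seidel iterations taken with and against the flow direction for a 1D convection-dominated upwind discretization; it is an illustrative dense implementation, not an efficient smoother.

```python
import numpy as np

# Upwind discretization of -eps*u'' + u' = 1 on (0,1); the flow runs left to right.
n, eps = 200, 1e-3
h = 1.0 / (n + 1)
A = (np.diag((2 * eps / h**2 + 1 / h) * np.ones(n))
     + np.diag((-eps / h**2 - 1 / h) * np.ones(n - 1), -1)
     + np.diag((-eps / h**2) * np.ones(n - 1), 1))
b = np.ones(n)

def gauss_seidel_residual(order, sweeps=5):
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in order:    # one Gauss-Seidel sweep visiting rows in 'order'
            x[i] = (b[i] - A[i, :] @ x + A[i, i] * x[i]) / A[i, i]
    return np.linalg.norm(b - A @ x)

print('sweep with the flow   :', gauss_seidel_residual(range(n)))
print('sweep against the flow:', gauss_seidel_residual(range(n - 1, -1, -1)))
```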

6.3. Sparse Approximate Inverses

Norm minimization readily leads to sparse approximate inverses similarly to the symmetric case. A biconjugation-based method (AINV) and its more robust stabilized form (SAINV) are, however, perhaps the most widely considered of such methods in the non-symmetric case (Benzi and Tuma 1998, Benzi et al. 2000).

6.4. Symmetric/Skew splitting

Every nonsymmetric real matrix can be uniquely split into its symmetric and skew-symmetric parts:

A = \underbrace{\tfrac{1}{2}(A + A^T)}_{H} + \underbrace{\tfrac{1}{2}(A - A^T)}_{S}.

(The H stands for Hermitian, since the same idea obviously applies in the complex case when the transpose is replaced by the conjugate transpose.) When H is positive definite, preconditioning of A by H gives a shifted skew-symmetric matrix for which a special three-term recurrence Krylov subspace method exists (Concus and Golub 1976, Widlund 1978) and convergence depends on the size of 1 + |λ_max(H^{-1}S)| (Widlund 1978, Freund and Ruscheweyh 1986). Although this approach has been considered for some applications—see for example Elman and Schultz (1986) in the context of convection-diffusion problems—the so-called HSS (Hermitian Skew-Hermitian Splitting) methods originally introduced by Bai et al. (2003) have been rather more widely investigated. They proved the convergence of an alternate iteration using shifted Hermitian and shifted skew-Hermitian parts of the form

(αI + H) x^{(k+1/2)} = (αI − S) x^{(k)} + b,
(αI + S) x^{(k+1)} = (αI − H) x^{(k+1/2)} + b

for Ax = b. Though convergence is guaranteed independently of the choice of the parameter α > 0, it can be slow. Nevertheless, many variants of using this idea as a preconditioner have been suggested, as have related structural splitting ideas, and good results are reported for some; see for example Benzi et al. (2011a).
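Below is a minimal sketch of the alternating HSS iteration on an illustrative nonsymmetric sparse matrix, with the two shifted solves done by sparse direct factorization; as noted above, the plain iteration is guaranteed to converge for any α > 0 but may do so slowly, which is one reason it is more often used as a preconditioner.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative nonsymmetric matrix: symmetric diffusion plus skew convection.
n = 400
A = (sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
     + 0.5 * sp.diags([-1.0, 1.0], [-1, 1], shape=(n, n))).tocsc()
H = 0.5 * (A + A.T)          # symmetric (Hermitian) part, positive definite here
S = 0.5 * (A - A.T)          # skew-symmetric part
b = np.ones(n)

alpha = 1.0                  # any alpha > 0 gives a convergent iteration
I = sp.identity(n, format='csc')
solve_H = spla.splu((alpha * I + H).tocsc()).solve
solve_S = spla.splu((alpha * I + S).tocsc()).solve

x = np.zeros(n)
for _ in range(50):          # the two half-steps of the HSS iteration
    x_half = solve_H((alpha * I - S) @ x + b)
    x = solve_S((alpha * I - H) @ x_half + b)
print('residual after 50 HSS sweeps:', np.linalg.norm(b - A @ x))
```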

6.5. Nonsymmetric saddle point systems

Saddle point problems are essentially symmetric in structure, as in section 5.2, but matrices of the structure (5.5) arise in which the block A is non-symmetric. A key example is provided by linearizations of the incompressible Navier-Stokes equations (see Elman et al. (2014b, Chapters 8 and 9)). It is these problems that we refer to as non-symmetric saddle point problems.

For a nonsymmetric saddle point matrix, the results of Murphy et al. (2000) still hold and block diagonal preconditioners can still be applied. However, since there is no symmetry to lose, the block triangular preconditioners

P = \begin{pmatrix} \hat{A} & 0 \\ B & \pm\hat{S} \end{pmatrix} \quad \text{or} \quad P = \begin{pmatrix} \hat{A} & B^T \\ 0 & \pm\hat{S} \end{pmatrix}

will give, in the 'ideal' case Â = A, Ŝ = S = BA^{-1}B^T, a preconditioned system with either just the two distinct eigenvalues ±1 or even just the single eigenvalue 1, depending on whether the + or the − sign is selected, respectively (Murphy et al. 2000). Even with the single eigenvalue 1, P is not the inverse, so something has to give! In fact it is diagonalisability that is lost in selecting the − sign; the minimum polynomial is then (λ − 1)^2, so two iterations would be required exactly as for the + case. However, in the practical case when the approximations Â, Ŝ are not ideal, it is usually the − sign which is chosen so that all eigenvalues are clustered around 1 rather than the 2 clusters around ±1 which arise with the + choice, though it is not clear that this is to be preferred (Ymbert et al. 2015).

Approximations of the Schur complement remain a key aspect of any such block preconditioning approach. For the common Oseen linearization of the incompressible Navier-Stokes problem, the matrix block A comes from a vector convection-diffusion operator, so that an appropriate multigrid approach as outlined above or any other such good approximation of (preconditioner for) convection-diffusion is required as Â. To identify an approximate Schur complement requires more in-depth consideration, but there are now two excellent candidates: the pressure convection-diffusion (PCD) (Kay et al. 2002) and the least-squares commutator (LSC) preconditioners. Both are derived, described and analysed in Elman et al. (2014b, Chapter 9). The PCD method is in some ways simpler, but it requires the construction of a convection-diffusion operator on the pressure space (the space of Lagrange multipliers), which is not required for the problem formulation itself. A purely algebraic way to compute this operator based on sparse approximate inverse technology has, however, been described by Elman et al. (2006). The LSC method is derived via a commutator argument and is defined purely in terms of the matrix blocks which arise naturally for the Oseen problem. For a comparison and full consideration of advantages and disadvantages of these two approaches, see Elman et al. (2014b, Chapter 9).

The augmentation or augmented Lagrangian approach has also been applied to non-symmetric saddle point problems. Although some concerns have been expressed regarding possible distortion of the norm in which convergence occurs, good results have been reported even for large Reynolds number (Benzi et al. 2011b).

Earlier approaches for this problem involved segregated treatment of pressure and velocity and predate the preconditioned Krylov subspace iteration technology. It is perhaps worth commenting here that this example is one of the first where block preconditioners comprising blocks designed for different sets of variables allow treatment of different 'physics' within the preconditioner whilst gmres or some other Krylov subspace method is applied to the complete coupled problem. This is an important paradigm.

Such is its importance that it is in the context of nonsymmetric saddle point systems arising from approximation of the Oseen problem that perhaps the most refined analysis of preconditioned gmres convergence has been pursued. There are several eigenvalue analyses for different preconditioners, see for example Elman and Silvester (1996), although these do not lead to rigorous estimates as explained above. It is in the context of this problem, however, that field-of-values convergence analysis has been successfully employed to rigorously bound preconditioned gmres convergence and to elucidate its dependence on or independence of the mesh size, h, and of the Reynolds number; see for example the analysis of Loghin and Wathen (2004) for PCD, and Benzi and Olshanskii (2011) for block triangular preconditioning of an augmented Lagrangian formulation.
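A minimal sketch of applying a block upper triangular preconditioner with the − sign is given below. Here the actions of Â^{-1} and Ŝ^{-1} are exact solves with an explicitly formed (small, dense) Schur complement, purely to illustrate the mechanics; in practice they would be replaced by multigrid and PCD/LSC-type approximations.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, m = 300, 80
A = (sp.random(n, n, density=0.03, format='csr', random_state=4)
     + 5 * sp.identity(n)).tocsc()                     # nonsymmetric (1,1) block
B = sp.random(m, n, density=0.1, format='csr', random_state=5)
K = sp.bmat([[A, B.T], [B, None]], format='csr')       # nonsymmetric saddle point matrix
rhs = np.random.default_rng(6).standard_normal(n + m)

A_lu = spla.splu(A)
S = B @ A_lu.solve(B.T.toarray())                      # dense Schur complement (m is small)

def apply_P_inv(r):
    """Action of P^{-1} for P = [[A, B^T], [0, -S]] (the '-' sign choice)."""
    r_u, r_p = r[:n], r[n:]
    p = -np.linalg.solve(S, r_p)
    u = A_lu.solve(r_u - B.T @ p)
    return np.concatenate([u, p])

M = spla.LinearOperator(K.shape, matvec=apply_P_inv)
# With these 'ideal' blocks the preconditioned matrix has minimum polynomial
# (lambda - 1)^2, so gmres converges in at most two iterations.
x, info = spla.gmres(K, rhs, M=M)
print('gmres info:', info, ' residual:', np.linalg.norm(rhs - K @ x))
```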

6.6. Circulant and semi-circulant preconditioning

Our opening example already indicates the potential for circulant preconditioning, at least for symmetric Toeplitz matrices. In fact the applicability of such ideas is rather wider, though the theoretical justification is less complete for non-symmetric problems.

The analysis for symmetric and positive definite Toeplitz matrices is well described in the books by Chan and Jin (2007) and Ng (2004), who also indicate how similar ideas follow for block Toeplitz matrices with Toeplitz blocks. Indeed, the use of circulant preconditioners for non-symmetric Toeplitz matrices does not really depend on symmetry. However, unless the normal equations are used, convergence guarantees are not readily available, simply because of the general lack of descriptive convergence bounds for non-symmetric matrices. Good computational results have been observed in many situations, and in fact even proved for certain problems via bounding of the eigenvector condition number as well as the eigenvalues and using (6.1) (Otto 1996). A recent observation of Pestana and Wathen (2014), that guaranteed convergence bounds can be achieved for minres for circulant preconditioned non-symmetric Toeplitz matrices via a variable reordering, is also relevant.
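A minimal sketch of Strang-type circulant preconditioning for a nonsymmetric Toeplitz matrix: the central diagonals are wrapped around periodically, the circulant is diagonalized by the FFT, and its inverse is applied in O(n log n) work inside gmres. The decaying diagonals chosen here are purely illustrative.

```python
import numpy as np
import scipy.linalg as sla
import scipy.sparse.linalg as spla

n = 512
col = np.r_[3.0, 1.0 / np.arange(2, n + 1) ** 2]    # first column of A (decaying)
row = np.r_[3.0, 0.5 / np.arange(2, n + 1) ** 2]    # first row of A (decaying)
A = sla.toeplitz(col, row)
b = np.ones(n)

# Strang-type circulant: keep the central diagonals of A and wrap them around.
c = np.zeros(n)
c[:n // 2] = col[:n // 2]                 # lower diagonals 0, 1, ..., n/2 - 1
c[n // 2 + 1:] = row[1:n // 2][::-1]      # upper diagonals wrapped periodically
eigs = np.fft.fft(c)                      # eigenvalues of the circulant C

def apply_C_inv(v):                       # solve Cy = v via the FFT
    return np.real(np.fft.ifft(np.fft.fft(v) / eigs))

M = spla.LinearOperator((n, n), matvec=apply_C_inv)
x, info = spla.gmres(A, b, M=M)
print('gmres info:', info, ' residual:', np.linalg.norm(b - A @ x))
```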

7. Some comments on symmetry

We have emphasised the importance of matrix symmetry in this review, in particular that it is rarely sensible to throw away such a property by preconditioning. In the reverse direction, it would be highly advantageous for non-symmetric matrices if it were easily possible to precondition to obtain symmetry of the preconditioned problem without recourse to the normal equations. This possibility seems unlikely without the computation of expensive matrix factorizations. However, there are situations in which a broader consideration of self-adjointness allows many of the attractive properties of symmetric matrices, such as the use of the symmetric Krylov subspace iterative methods cg and minres.

A symmetric matrix, A = A^T, is simply one which is self-adjoint in the usual Euclidean inner product, that is,

⟨Au, v⟩ = v^T A u = u^T A v = ⟨Av, u⟩ = ⟨u, Av⟩.

In fact given any inner product

⟨u, v⟩_H = v^T H u \qquad (7.1)

defined by a symmetric and positive definite matrix H (all inner products on R^n arise in this way), the distinction between symmetric and non-symmetric matrices is really the distinction between matrices which are self-adjoint and those which are non-self-adjoint with respect to the inner product. Indeed, in the standard situation when A and P are both symmetric and P is positive definite, then the left-preconditioned matrix P^{-1}A is self-adjoint in ⟨u, v⟩_P, since

⟨P^{-1}Au, v⟩_P = v^T P P^{-1} A u = v^T A u = v^T A^T P^{-T} P u = ⟨u, P^{-1}Av⟩_P.

This description of symmetric preconditioning for symmetric matrices without destroying self-adjointness provides an alternative to the approach in section 3. The cg method, for example, can be employed in such a general inner product (7.1) for any matrix which is self-adjoint in this inner product and satisfies the positive definiteness condition ⟨Au, u⟩_H > 0 for all u ∈ R^n − {0}.

In general, any matrix A is self-adjoint in (7.1) if and only if, for all u, v ∈ R^n,

v^T H A u = ⟨Au, v⟩_H = ⟨u, Av⟩_H = v^T A^T H u, \qquad (7.2)

which is the same condition as HA = A^T H, so we must have A^T = HAH^{-1} for some positive definite symmetric matrix H. It can be shown that such a matrix H exists if and only if the real matrix A is diagonalisable and has real eigenvalues (Taussky 1972). Thus the use of self-adjointness in non-standard inner products is limited, but interesting examples do exist where, via left- or right-preconditioning, non-symmetric preconditioned matrices arise which are nevertheless self-adjoint in some convenient non-standard inner product; see Stoll and Wathen (2008). Since the inner product must be used for computation of scalar products (dot products) arising in Krylov subspace methods, there is additional arithmetic overhead in the use of H ≠ I unless other structures mitigate this.

We present just one example—perhaps the most well-known—of the use of non-standard inner products; exceptionally, it allows the robust application of cg in a non-standard inner product for an indefinite symmetric saddle point problem by creating a self-adjoint and positive definite matrix in that inner product (Bramble and Pasciak 1988). For other examples, see Klawonn (1998) and Benzi and Simoncini (2006), and see Stoll and Wathen (2008) and Pestana and Wathen (2013a) for how examples of this structure can be combined to yield further cases of possible utility (so-called combination preconditioning).

In the Bramble and Pasciak method one preconditions the symmetric and indefinite saddle point matrix as given in (5.5) on the left with the block lower triangular matrix

P = \begin{pmatrix} \hat{A} & 0 \\ B & -I \end{pmatrix},

resulting in the non-symmetric matrix

P^{-1}\mathcal{A} = \begin{pmatrix} \hat{A}^{-1}A & \hat{A}^{-1}B^T \\ B\hat{A}^{-1}A - B & B\hat{A}^{-1}B^T \end{pmatrix}.

This turns out to be self-adjoint in the inner product (7.1) (i.e. it satisfies the condition (7.2)) where

H = \begin{pmatrix} A - \hat{A} & 0 \\ 0 & I \end{pmatrix}.

Moreover P^{-1}\mathcal{A} is positive definite in this inner product. Note that it is necessary for A − Â to be positive definite in order to have an inner product (theoretically – and quite often in practice – this can be achieved by scaling of Â), and in fact the positive definiteness also requires this condition. Perhaps this appears to be a rather exotic structure, but much practical use has been made of this particular example; the possibility for other really useful examples remains unrealised but rather intriguing.

A final comment concerns the possibility of non-symmetric preconditioning of symmetric systems. We have earlier noted that such preconditioning would destroy symmetry in general and thus is undesirable. However, in the conceivable, but presumably rare, situation that A = A^T, P ≠ P^T but P^{-1}A = AP^{-T}—that is, P^{-T} is one of the matrices that is self-adjoint in the (possibly indefinite) inner product related to the symmetric matrix A—then it would be possible to use cg for P^{-1}Ax = P^{-1}b when P^{-1}A is positive definite or minres if P^{-1}A is indefinite. The cg error would converge in the norm ‖·‖_{P^{-1}A} and for minres it would be the residual P^{-1}(b − Ax), which would converge in the norm ‖·‖_I.
We do not know of any instance of this structure for non-symmetric P; however, the symmetric preconditioner P^{-1} = q(A) for any polynomial q clearly does commute with A, and there is some literature on such polynomial preconditioning (see for example O'Leary (1991)).
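As a small numerical check of the self-adjointness condition (7.2) for the Bramble and Pasciak example above, the following sketch builds a random saddle point matrix, takes Â as a scaled identity so that both Â and A − Â are certainly positive definite (an assumption of this construction), and verifies that H(P^{-1}𝒜) equals its transpose.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 40, 15
X = rng.standard_normal((n, n))
A = X @ X.T + n * np.eye(n)                      # symmetric positive definite (1,1) block
B = rng.standard_normal((m, n))
K = np.block([[A, B.T], [B, np.zeros((m, m))]])  # saddle point matrix as in (5.5)

# Ahat scaled so that both Ahat and A - Ahat are positive definite.
Ahat = 0.5 * np.min(np.linalg.eigvalsh(A)) * np.eye(n)
P = np.block([[Ahat, np.zeros((n, m))], [B, -np.eye(m)]])
H = np.block([[A - Ahat, np.zeros((n, m))], [np.zeros((m, n)), np.eye(m)]])

T = np.linalg.solve(P, K)                        # the preconditioned matrix P^{-1} K
print('self-adjointness check ||H T - (H T)^T|| =',
      np.linalg.norm(H @ T - (H @ T).T))
print('smallest eigenvalue of H T (should be > 0):',
      np.min(np.linalg.eigvalsh(H @ T)))
```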

8. Some comments on practical computing

Some important general ideas remain to be mentioned that do not in themselves comprise a particular preconditioning approach, but which have broad significance.

First we mention deflation. It can be helpful – sometimes extremely helpful – to remove certain subspaces from a Krylov subspace. For example, if one has a symmetric and positive definite matrix with a reasonable spread of eigenvalues except for a few very small eigenvalues about which one has some knowledge of the corresponding eigenspace, then cg convergence could be much improved if the effect of this eigenspace could be removed or ignored. This is the idea of deflation; if the eigenspace spanned by the eigenvectors corresponding to the m smallest eigenvalues λ_1 ≤ λ_2 ≤ ... ≤ λ_m can be deflated, then the effective condition number is κ = λ_max/λ_{m+1} and it is this which appears in the cg convergence bound (3.1). The crucial step is to identify the deflating subspace.

The deflation idea is most effective when the problem naturally presents such a space. For example, see Brandt and Ta'asan (1986), who deflate the low frequency eigenspaces which correspond to negative eigenvalues for a Helmholtz problem, and Jonsthovel et al. (2012), who describe a prototypical application involving composite materials. However, various subspaces which arise from iterations of a previous linear system solve, or other sources, can be used. In particular, when restarting gmres, one can use information from previous iterates via deflation. There are many papers describing various approaches: for symmetric and positive definite matrices, see for example Frank and Vuik (2001), and for nonsymmetric matrices, see for example Erhel et al. (1995), Sifuentes et al. (2013) and Giraud et al. (2010). When solving sequences of linear systems, the related idea of 'recycling' can be considered (Parks et al. 2006). Deflation is central to the iterative computation of eigenvalues (Lehoucq and Sorensen 1996).

As parallel computing becomes more widespread, preconditioners which are parallelizable or inherently parallel become more important. After all, it is large scale computation for which preconditioned iterative solution technology is generally being developed. Very simple preconditioning such as diagonal scaling is obviously parallel, but the utility of such an approach is limited, as we have described. Methods based on incomplete factorization are more sequential in design, and though there are parallel variants, often speed of convergence is compromised by ordering and separating variables, etc. to obtain the necessary independence of calculations. Such is the importance of multigrid and AMG methods that parallel forms which give up the least of the potency of sequential implementation are critical. There are, however, compromises to be made in going parallel (Worley 1991). There is much to be considered here, and this is not an article on parallel multigrid. The paradigm of domain decomposition is clearly ideal for parallel computation and this is precisely what it was invented for. Sparse approximate inverses were precisely developed because they appeared to be a tractable way to compute a purely algebraic ('given a matrix') preconditioner in parallel. Their effectiveness, however, remains unclear.
One beneficial use of these ideas appears to be in contexts where one is not seeking to approximate an inverse, but rather seeking some more limited goal like minimising a commutator—see, for example, Elman et al. (2006).

Since preconditioning is used with iterative methods, sensible criteria for stopping an iteration must be chosen. In the 'given a matrix' context, some fairly arbitrary small tolerance for a residual norm is usually specified. When one is 'given a problem', however, there can be criteria appropriate to the problem. In particular for PDE problems, reducing iteration error to orders of magnitude less than discretization error is untenable and rather wasteful; practical criteria are given by Arioli (2004) and Arioli et al. (2005). An important point is that a larger-dimensional matrix for a given problem comes from a more accurate discretization, so any stopping tolerance should also in general be smaller on finer grids; see Elman et al. (2014b, Section 2.1.2).

In the general context of solution of linear systems of equations, it is often regarded as desirable to reduce the number of variables/degrees of freedom if this is easy; such thinking is associated completely with direct solution methods. In the context of iterative methods, it can be preferable to have a larger dimensional problem if it is then easier to identify and apply an appropriate preconditioner. That is, it can be convenient even to enlarge the dimension of given systems if the resulting matrices are more readily preconditioned, as long as the storage requirement for the enlarged matrix and the associated vectors does not become unacceptable. In finite element analysis, for example, the idea of 'static condensation' is precisely the elimination of variables with only local connections, but although it slightly reduces the number of free variables, and hence the matrix dimension, the resulting smaller matrices can be less easy to precondition even though they often possess a slightly reduced condition number. More radical is the idea of the first-order system least-squares (FOSLS) approach, where extra dependent variables are explicitly introduced making a rather larger-dimensional system in general, but a system for which multigrid solvers can be readily applied. For an application of this idea, see de Sterck et al. (2004), for example.

A further example of the freedom and possibilities is the 'all-at-once', 'one-shot' or 'monolithic' approach, where preconditioning for separate parts of a problem – perhaps separate physical processes – are employed together with a Krylov subspace method for the whole of a coupled problem for which the required matrix-vector products are computable in a matrix-free way, that is, without explicit storage and operation on the complete two-dimensional array. See, for example, Heil et al. (2008), Haber and Ascher (2001) or Rees et al. (2010b). When matrix-vector products are readily computed, there is always a Krylov subspace method that can be applied to the monolithic problem; the representation of the component parts can very conveniently be the job of a block-structured preconditioner. Good examples of this kind of approach are recent ideas for preconditioning problems coming from magnetohydrodynamics (Phillips et al. 2014, Wathen 2014) and from geodynamics (Rhebergen et al. 2014).

A final comment is aimed at anyone who would like to experiment and see the potential of preconditioning in an easy computational setting. The freely downloadable ifiss software (Elman et al. 2007, Elman et al. 2014a), which runs under basic matlab or octave, enables such experiments.

9. Summary

We conclude with a few general guidelines on preconditioner design and selection.

• Keep to the original structure when preconditioning (e.g., preserve symmetry, partitioning into blocks, etc.)
• The more knowledge about a problem that can be represented in a preconditioner, the better.
• The work in applying the preconditioner should ideally be commensurate with the work in matrix–vector multiplication: that would be O(n) for a sparse matrix and O(n^2) for a general full matrix.
• It may be easier to precondition a matrix of larger dimension, for which structures are clearer, than to eliminate variables, giving a less structured matrix of smaller dimension.

There is, of course, no such concept as a best preconditioner: the only two candidates for this would be P = I, for which the preconditioning takes no time at all, and P = A, for which only one iteration would be required for solution by any iterative method. However, every practitioner knows when they have a good preconditioner which enables feasible computation and solution of problems. In this sense, preconditioning will always be an art rather than a science.

Acknowledgements

I am grateful to Michele Benzi, Howard Elman and David Silvester for reading an early draft; their comments and suggestions have led to significant improvements.

Parts of this survey were written whilst visiting CERFACS, Toulouse, France and the Memorial University of Newfoundland, Canada. I am grateful to Serge Gratton, Xavier Vasseur, Hermann Brunner and Scott MacLachlan for their support and encouragement.

REFERENCES P.R. Amestoy, I.S. Duff, J-Y. L’Excellent and J. Koster (2000), ’MUMPS: a general purpose distributed memory sparse solver’, In Proc. PARA2000, 5th Interna- tional Workshop on Applied Parallel Computing, Springer-Verlag, 122–131. J.R. Appleyard and I.M. Cheshire ’Nested factorization’, Society of Petroleum Engineers, SPE 12264, presented at the Seventh SPE Symposium on Reservoir Simulation, San Francisco, 1983. M. Arioli (2004), ’A stopping criterion for the conjugate gradient algorithm in a finite element method framework’, Numer. Math. 97, 1–24. M. Arioli, D. Loghin and A.J. Wathen (2005), ’Stopping criteria for iterations in finite element methods’, Numer. Math. 99, 381–410. D.N. Arnold, R.S. Falk and R. Winther (1997), ’Preconditioning in H( div) and applications’, Math. Comput. 66, 957–984. Z.-Z. Bai, G.H. Golub and M.K. Ng (2003), ’Hermitian and skew-Hermitian split- ting methods for non-Hermitian positive definite linear systems’, SIAM J. Matrix Anal. Appl. 24, 603–626. N.S. Bakhvalov (1966), ’On the convergence of a relaxation method with natural constraints on the elliptic operator’, USSR Comp. Math. and Math. Phys. 6, 101–135. M. Bebendorf (2008), Hierarchical Matrices: A Means to Efficiently Solve Ellip- tic Boundary Value Problems, Lecture Notes in Computational Science and Engineering, vol. 63, Springer-Verlag. M. Benzi (2002), ’Preconditioning techniques for large linear systems: A survey’, J. Comput. Phys. 182, 418–477. M. Benzi, J.K. Cullum and M. Tuma (2000), ’Robust approximate inverse precon- ditioning for the conjugate gradient method’, SIAM J. Sci. Comput. 22,1318– 1332. M. Benzi, G.H. Golub and J. Liesen (2005), ’Numerical solution of saddle point problems’, Acta Numerica 14, 1–137. M. Benzi, C.D. Meyer and M. Tuma (1996), ’A sparse approximate inverse precon- ditioner for the conjugate gradient method’, SIAM J. Sci. Comput. 17, 1135– 1149. M. Benzi, M. Ng, Q. Niu and Z. Wang (2011a), ’A relaxed dimensional factorization preconditioner for the incompressible Navier-Stokes equations’, J. Comput. Phys. 230, 6185–6202. M. Benzi and M.A. Olshanskii (2011), ’-of-values convergence analysis of aug- mented Lagrangian preconditioners for the linearized Navier-Stokes problem’, SIAM J. Numer. Anal. 49, 770–788. M. Benzi, M.A. Olshanskii and Z. Wang (2011b), ’Modified augmented Lagrangian preconditioners for the incompressible Navier-Stokes equations’, Int. J. Nu- mer. Meth. Fluids 66, 486–508. M. Benzi and V. Simoncini (2006), ’On the eigenvalues of a class of saddle point matrices’, Numer. Math. 103, 173–196. M. Benzi and M. Tuma (1998), ’A sparse approximate inverse preconditioner for nonsymmetric linear systems’, SIAM J. Sci. Comput. 19, 968–994. M. Benzi and M. Tuma (2003), ’A robust incomplete factorization preconditioner for positive definite matrices’, Numer. Appl. 10, 385–400. 40 A. J. Wathen

M. Benzi and A.J. Wathen (2008), ’Some preconditioning techniques for saddle point problems’, in Model order reduction: theory, research aspects and ap- plications, 195–211. L. Bergamaschi, J. Gondzio and G. Zilli (2004), ’Preconditioning indefinite systems in interior point methods for optimization’, Comput. Optim. Appl. 28, 149– 171. M. Bern, J.R. Gilbert, B. Hendrickson, N. Nguyen and S. Toledo (2006), ’Support- Graph preconditioners’, SIAM J. Matrix Anal. Appl. 27, 930–951. A. Bj¨orck (1996), Numerical Methods for Least Squares Problems, Society for In- dustrial and Applied Mathematics. J.W. Boyle, M.D. Mihajlovic and J.A. Scott (2010), ’HSL MI20: an efficient AMG preconditioner for finite element problems in 3D’, Int. J. Numer. Meth. En- gnrg 82, 64–98. D. Braess and P. Peisker (1986), ’On the numerical solution of the biharmonic equa- tion and the role of squaring matrices for preconditioning’, IMA J. Numer. Anal. 6, 393–404. J. Bramble and J. Pasciak (1988), ’A preconditioning technique for indefinite sys- tem resulting from mixed approximations of elliptic problems’, Math. Com- put. 50, 1–17. J.H. Bramble, J.E. Pasciak and J. Xu (1990), ’Parallel multilevel preconditioners’, Math. Comput. 55,1–22. A. Brandt (1977), ’Multilevel adaptive solutions to boundary-value problems’, Math. Comput. 31,333–390. A. Brandt and S. Ta’asan (1986), ’Multigrid methods for nearly singular and slightly indefinite problems’, in Multigrid Methods II, W. Hackbusch and U. Trottenberg, eds., Lecture Notes in Math. 1228, Springer-Verlag, Berlin, pp. 99–121. O. Br¨oker, M.J. Grote, C. Mayer and A. Reusken (2001), ’Robust parallel smooth- ing for multigrid via sparse approximate inverses’, SIAM J. Sci Comp. 23, 1395–1416. R.H.-F. Chan and X.-Q. Jin (2007), An Introduction to Iterative Toeplitz Solvers, Society for Industrial and Applied Mathematics. T.F. Chan and T.P. Mathew (1994), ’Domain decomposition algorithms’, Acta Numerica 3, 61–143. S. Chandrasekaran, P. Dewilde, M. Gu, T. Pals, X. Sun, A.J. van der Veen and D. White (2005), ’Some fast algorithms for sequentially semiseparable repre- sentation’, SIAM J. Matrix Anal. Appl. 27, 341–364. K. Chen (2005), Matrix preconditioning techniques and applications, Cambridge University Press. P. Concus and G.H. Golub (1976), ’A generalized conjugate gradient method for nonsymmetric systems of linear equations’, in R. Glowinski and P.L. Lions eds., Lecture Notes in Economics and Mathematical Systems 134, 56–65, Springer. J.W. Cooley and J.W. Tukey (1965), ’An algorithm for the machine calculation of complex Fourier series’, Math. Comput. 19, 297–301. T.A. Davis (2004), ’Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern multifrontal method’, ACM Trans. Math. Softw. 30, 196–199. Preconditioning 41

’International Conference on Domain Decomposition Methods’, http://www.ddm.org/conferences.html . H. de Sterck, T. Manteuffel, S. McCormick and L. Olson (2004), ’Least-squares finite element methods and algebraic multigrid solvers for linear hyperbolic PDEs’, SIAM J. Sci Comp. 26, 31–54. H.S. Dollar, N.I.M. Gould, W.H.A. Schilders and A.J. Wathen (2006), ’Implicit- factorization preconditioning and iterative solvers for regularized saddle-point systems’, SIAM J. Matrix Anal. Appl. 28, 170–189. H.S. Dollar and A.J. Wathen (2006), ’Approximate factorization constraint precon- ditioners for saddle-point matrices’, SIAM J. Sci. Comput. 27, 1555–1572. T.F. Dupont, R.P. Kendall and H.H. Rachford (1968), ’An approximate factoriza- tion procedure for solving self-adjoint elliptic difference equations’, SIAM J. Numer. Anal. 5, 554–573. H.C. Elman and M.P. Chernesky (1993), ’Ordering effects on relaxation methods applied to the discrete one-dimensional convection-diffusion equation’, SIAM J. Numer. Anal. 30, 1268–1290. H.C. Elman, O.G. Ernst and D.P. O’Leary (2001), ’A multigrid method enhanced by Krylov subspace iteration for discrete Helmholtz equations’, SIAM J. Sci. Comput. 23, 1291–1315. H.C. Elman, V.E. Howle, J.N. Shadid, R. Shuttleworth and R. Tuminaro (2006), ’Block preconditioners based on approximate commutators’, SIAM J. Sci. Comput. 27, 1651–1668. H.C. Elman, A. Ramage and D.J. Silvester (2007), ’Algorithm 886: IFISS, A Mat- lab toolbox for modelling incompressible flow’, ACM Trans. Math. Softw. 33(2), article no. 14. H.C. Elman, A. Ramage and D.J. Silvester (2014a), ’IFISS: A computational lab- oratory for investigating incompressible flow problems’, SIAM Rev. 56, 261– 273. H.C. Elman and M.H. Schultz (1986), ’Preconditioning by fast direct methods for non-self-adjoint nonseparable elliptic equations’, SIAM J. Numer. Anal. 23, 44–57. H.C. Elman and D.J. Silvester (1996), Fast nonsymmetric iterations and precon- ditioning for the Navier-Stokes equations’, SIAM J. Sci. Comput. 17, 33–46. H.C. Elman, D.J. Silvester and A.J. Wathen (2014b), Finite Elements and Fast Iterative Solvers: with applications in incompressible fluid dynamics, Second Edition, Oxford University Press. H.C. Elman, D.J. Silvester and A.J. Wathen (2005), Finite Elements and Fast Iterative Solvers: with applications in incompressible fluid dynamics, Oxford University Press. H.C. Elman and X. Zhang (1995), ’Algebraic analysis of the hierarchical basis preconditioner’, SIAM J. Matrix Anal. Appl. 16, 192–206. B. Engquist and L. Ying (2011a), ’Sweeping preconditioner for the Helmholtz equa- tion: Hierarchical matrix representation’, Comm. Pure Appl. Math. 64, 697– 735. B. Engquist and L. Ying (2011b), ’Sweeping preconditioner for the Helmholtz equa- tion: moving perfectly matched layers’, Multiscale Model. Simul. 9, 686–710. 42 A. J. Wathen

J. Erhel, K. Burrage and B. Pohl (1995), ’Restarted GMRES preconditioned by deflation’, J. Comput. Appl. Math. 69, 303–318. Y.A. Erlangga, C.W. Oosterlee and C, Vuik (2006), ’A novel multigrid based pre- conditioner for heterogeneous Helmholtz problems’, SIAM J. Sci. Comput. 27, 1471–1492. O.G. Ernst and M.J. Gander (2012), ’Why it is difficult to solve Helmholtz prob- lems with classical iterative methods’, in of multiscale problems, 325–363, Springer. V. Faber and T. Manteuffel (1984), ’Necessary and sufficient conditions for the existence of a conjugate gradient method’, SIAM J. Numer. Anal. 21, 315– 339. R. P. Federenko (1964), ’The speed of convergence of one iterative process’, USSR Comp. Math. and Math. Phys. 4, 227–235. B. Fischer (2011), Polynomial Based Iteration Methods for Symmetric Linear Sys- tems, Society for Industrial and Applied Mathematics. B. Fischer, A. Ramage, D.J. Silvester and A.J. Wathen (1998), ’Minimum residual methods for augmented systems’, BIT 38, 527–543. J. Frank and C. Vuik (2001), ’On the construction of deflation-based precondition- ers’, SIAM J. Sci. Comput. 23, 442–462. R.W. Freund and N.M. Nachtigal (1991), ’QMR a quasi-minimal residual method for non-Hermitian linear systems’, Numer. Math. 60, 315–339. R.W. Freund and N.M. Nachtigal (1994), ’A new Krylov-subspace method for symmetric indefinite linear systems’, Proceedings of the 14th IMACS World Congress on Computational and Applied Mathematics, W.F. Ames(Ed.), IMACS, 1253–1256. R.W. Freund and S. Ruscheweyh (1986), ’On a class of Chebyshev approximation problems which arise in connection with a conjugate gradient type method’, Numer. Math. 48, 525–542. M.J. Gander, F. Magoul`esand F. Nataf (2002), ’Optimized Schwarz methods with- out overlap for the Helmholtz equation’, SIAM J. Sci. Comput. 24, 38–60. M.W. Gee, C.M. Siefert, J.J. Hu, R.S. Tuminaro and M.G. Sala (2006), ’ML 5.0 Smoothed Aggregation User’s Guide’, Sandia National Laboratories, Report number SAND2006-2649. M.B. Giles (2008), ’Multilevel Monte Carlo path simulation’, Oper. Res. 56, 607– 617. L. Giraud, S. Gratton, X. Pinel and X. Vasseur (2010), ’Flexible GMRES with deflated restarting’, SIAM J. Sci. Comput. 32, 1858–1878. N.I.M. Gould, M.E. Hribar and J. Nocedal (2001), ’On the solution of equality constrained problems arising in optimization’, SIAM J. Sci. Comput. 23, 1376–1395. N.I.M. Gould and V. Simoncini (2009), ’Spectral analysis of saddle point matrices with indefinite leading blocks’, SIAM J. Matrix Anal. Appl. 31, 1152–1171. I.G. Graham and M.J. Hagger (1999), ’Unstructured additive Schwarz-conjugate gradient method for elliptic problems with highly discontinuous coefficients’, SIAM J. Sci. Comput. 20, 2041–2066. L. Grasedyck and W. Hackbusch (2003), ’Construction and arithmetics of H- matrices’, Computing 70, 295–334. Preconditioning 43

A. Greenbaum (1997), Iterative Methods for Solving Linear Systems, Society for Industrial and Applied Mathematics. L, Greengard and V. Rokhlin (1987), ’A fast algorithm for particle simulations’, J. Comput. Phys. 73, 325–348. C. Greif and D. Sch¨otzau(2006), ’Preconditioners for saddle point linear systems with highly singular (1,1) blocks’, Elect. Trans. Numer. Anal. 22, 114–121. C. Greif and D. Sch¨otzau(2007), ’Preconditioners for the discretized time-harmonic Maxwell equations in mixed form’, Numer. Lin. Alg. Appl. 14, 281–297. M.J. Grote and T. Huckle (1997), ’Parallel preconditioning with sparse approxi- mate inverses’, SIAM J. Sci. Comput. 18, 83–853. A. G¨unnel, R. Herzog and E. Sachs (2014), ’A note on preconditioners and scalar products in Krylov subspace methods for self-adjoint problems in ’, Elect. Trans. Numer. Anal. 41, 13–20. I. Gustafsson (1978), ’A class of first order factorization methods’, BIT 18, 142– 156. E. Haber and U.M. Ascher (2001), ’Preconditioned all-at-once methods for large, sparse parameter estimation problems’, Inverse Probl. 17, 1847–1864. W. Hackbusch (1994), Iterative Solution of Large Sparse Systems of Equations, Springer-Verlag. W. Hackbusch (1999), ’A sparse matrix arithmetic based on H-matrices. I. Intro- duction to H-matrices’, Computing 62, 89–108 W. Hackbusch, B.K. Khoromskij and S. Sauter (2000), ’On H2-matrices’, in Lec- tures on Applied Mathematics, 9–29, H. Bungartz, R. Hoppe and C. Zenger, eds.. M. Heil, A.L. Hazel and J. Boyle (2008), ’Solvers for large-displacement fluid- structure interaction problems: segregated versus monolithic approaches’, Comput. Mech. 43, 91–101. V.E. Henson and U.M. Yang (2000), ’BoomerAMG: a parallel algebraic multigrid solver and preconditioner’, Appl. Numer. Math. 41, 155–177. M.R. Hestenes and E. Stiefel (1952), ’Methods of conjugate gradients for solving linear problems’, J. Res. Nat. Bur. Stand. 49, 409–436. R. Hiptmair and J. Xu (2007), ’Nodal auxiliary space preconditioning in H(curl) and H(div) spaces’, SIAM J. Numer. Anal. 45, 2483–2509. R.M. Holland, A.J. Wathen and G.J. Shaw, (2005), ’Sparse approximate inverses and target matrices’, SIAM J. Sci. Comput. 26, 1000–1011. ’HSL(2013). A collection of fortran codes for large scale scientific computation’, http://www.hsl.rl.ac.uk . V. John and L. Tobiska (2000), ’Numerical performance of smoothers in coupled multigrid methods for the parallel solution of the incompressible Navier- Stokes equations’, Int. J. Numer. Meth. Fluids 33, 453–473. T.B. Jonsthovel, M.B. van Gijzen, S.C. Maclachlan, C. Vuik and A. Scarpas (2012), ’Comparison of the deflated preconditioned conjugate gradient method and algebraic multigrid for composite materials’, Comput. Mech. 50, 321–333. D. Kay, D. Loghin and A.J. Wathen (2002), ’A preconditioner for the steady-state Navier-Stokes equations’, SIAM J. Sci. Comput. 24, 237–256. C. Keller, N.I.M. Gould and A.J. Wathen (2000), ’Constraint preconditioning for indefinite linear systems’, SIAM J. Matrix Anal. Appl. 21, 1300–1317. 44 A. J. Wathen

A. Klawonn (1998), ’Block-triangular preconditioners for saddle point problems with a penalty term’, SIAM J. Sci. Comput. 19, 172–184. A. Klawonn and G. Starke (1999), ’Block triangular preconditioners for nonsym- metric saddle point problems: field-of-value analysis’, Numer. Math. 81, 577– 594. L.Y. Kolotilina and A.Y. Yeremin (1993), ’Factorized sparse approximate inverse preconditioning. I. Theory’, SIAM J. Matrix Anal. Appl. 14, 45–58. J. Korzak (1999), ’Eigenvalue relations and conditions of matrices arising in ’, Computing 62, 45–54. A.L. Laird and M.B. Giles (2002), ’Preconditioned iterative solution of the 2D Helmholtz equation’, report NA-02/12, Oxford University Computer Labora- tory, Oxford, UK. R.B. Lehoucq and D.C. Sorensen (1996), ’Deflation techniques for an implicitly restarted ’, SIAM J. Matrix Anal. Appl. 17, 789–821. X.S. Li (2005), ’An overview of SuperLU: Algorithms, implementation and user interface’, ACM Trans. Math. Software 31, 302–325. J. Liesen and Z. Strakoˇs(2013), Krylov subspace methods : principles and analysis, Oxford University Press. O.E. Livne and A. Brandt (2012), ’Lean Algebraic Multigrid (LAMG): Fast graph Laplacian linear solver’, SIAM J. Sci. Comput. 34, B499–B522. D. Loghin and A.J. Wathen (2004), ’Analysis of preconditioners for saddle-point problems’, SIAM J. Sci. Comput. 25, 2029–2049. J. M´alekand Z. Strakoˇs (2014), Preconditioning and the Conjugate Gradient Method in the Context of Solving PDEs, Society for Industrial and Applied Mathematics. S. Manservisi (2006), ’Numerical analysis of Vanka-type solvers for steady Stokes and Navier-Stokes flows’, SIAM J. Numer. Anal. 44, 2025–2056. K.-A. Mardal and R. Winther (2011), ’Preconditioning discretizations of systems of partial differential equations’, Numer. Linear Algebra Appl. 18, 1–40. T.P. Mathew (2008), Domain Decomposition Methods for the Numerical Solution of Partial Differential Equations, Springer. J.A. Meijerink and H.A. van der Vorst (1977), ’An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix’, Math. Comput. 31, 148–162. G.A. Meurant (1999), Computer solution of large linear systems, North-Holland. M. Magolu Monga Made, R. Beauwens and G. Warz´ee(2000), ’Preconditioning of discrete Helmholtz operators perturbed by a diagonal complex matrix’, Commun. Numer. Methods Engrg. 16, 801–817. M.F. Murphy, G.H. Golub and A.J. Wathen (2000), ’A note on preconditioning for indefinite linear systems’, SIAM J. Sci. Comput. 21, 1969–1972. M. Neytcheva (1999), talk presented at the first International Conference on Pre- conditioning, Minneapolis, 1999. M.K. Ng (2004), Iterative Methods for Toeplitz Systems, Oxford University Press. Y. Notay (2012), ’Aggregation-based algebraic multigrid for convection-diffusion equations’, SIAM J. Sci. Comput. 34, A2288–A2316. D.P. O’Leary ’Yet another polynomial preconditioner for the conjugate gradient algorithm’, Linear Algebra Appl. 154–156, 377–388. Preconditioning 45
M.A. Olshanskii and E.E. Tyrtyshnikov (2014), Iterative Methods for Linear Systems: Theory and Applications, Society for Industrial and Applied Mathematics.
M.E.G. Ong (1997), 'Hierarchical basis preconditioners in three dimensions', SIAM J. Sci. Comput. 18, 479–498.
K. Otto (1996), 'Analysis of preconditioners for hyperbolic partial differential equations', SIAM J. Numer. Anal. 33, 2131–2165.
C.C. Paige and M.A. Saunders (1975), 'Solution of sparse indefinite systems of linear equations', SIAM J. Numer. Anal. 12, 617–629.
C.C. Paige and M.A. Saunders (1982), 'LSQR: An algorithm for sparse linear equations and sparse least squares', ACM Trans. Math. Software 8, 43–71.
M.L. Parks, E. de Sturler, G. Mackey, D.D. Johnson and S. Maiti (2006), 'Recycling Krylov subspaces for sequences of linear systems', SIAM J. Sci. Comput. 28, 1651–1674.
J. Pestana and A.J. Wathen (2014), 'A preconditioned MINRES method for nonsymmetric Toeplitz matrices', Numerical Analysis Report 14/11, Oxford University Mathematical Institute, Oxford, UK.
J. Pestana and A.J. Wathen (2013a), 'Combination preconditioning of saddle point systems for positive definiteness', Numer. Linear Algebra Appl. 20, 785–808.
J. Pestana and A.J. Wathen (2013b), 'On choice of preconditioner for minimum residual methods for non-Hermitian matrices', J. Comput. Appl. Math. 249, 57–68.
J. Pestana and A.J. Wathen (2015), 'Natural preconditioning and iterative methods for saddle point systems', SIAM Rev., to appear.
E.G. Phillips, H.C. Elman, E.C. Cyr, J.N. Shadid and R.P. Pawlowski (2014), 'A block preconditioner for an exact penalty formulation for stationary MHD', University of Maryland, Computer Science Report CS-TR-5032.
J. Poulson, B. Engquist, S. Li and L. Ying (2013), 'A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations', SIAM J. Sci. Comput. 35, C194–C212.
A. Quarteroni and A. Valli (1999), Domain Decomposition Methods for Partial Differential Equations, Oxford University Press.
A. Ramage (1999), 'A multigrid preconditioner for stabilised discretisations of advection-diffusion problems', J. Comput. Appl. Math. 101, 187–203.
T. Rees, H.S. Dollar and A.J. Wathen (2010a), 'Optimal solvers for PDE-constrained optimization', SIAM J. Sci. Comput. 32, 271–298.
T. Rees, M. Stoll and A.J. Wathen (2010b), 'All-at-once preconditioning in PDE-constrained optimization', Kybernetika 46, 341–360.
S. Rhebergen, G.N. Wells, R.F. Katz and A.J. Wathen (2014), 'Analysis of block-preconditioners for models of coupled magma/mantle dynamics', SIAM J. Sci. Comput. 36, A1960–A1977.
Y. Saad (1993), 'A flexible inner-outer preconditioned GMRES algorithm', SIAM J. Sci. Comput. 14, 461–469.
Y. Saad (1996), 'ILUM: a multi-elimination ILU preconditioner for general sparse matrices', SIAM J. Sci. Comput. 17, 830–847.
Y. Saad (2003), Iterative Methods for Sparse Linear Systems, second edition, Society for Industrial and Applied Mathematics.
Y. Saad and M.H. Schultz (1986), 'GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems', SIAM J. Sci. Statist. Comput. 7, 856–869.
J.A. Sifuentes, M. Embree and R.B. Morgan (2013), 'GMRES convergence for perturbed coefficient matrices, with application to approximate deflation preconditioning', SIAM J. Matrix Anal. Appl. 34, 1066–1088.
D.J. Silvester and A.J. Wathen (1994), 'Fast iterative solution of stabilised Stokes systems Part II: Using general block preconditioners', SIAM J. Numer. Anal. 31, 1352–1367.
V. Simoncini and D.B. Szyld (2007), 'Recent computational developments in Krylov subspace methods for linear systems', Numer. Linear Algebra Appl. 14, 1–59.
G.L.G. Sleijpen and D.R. Fokkema (1993), 'BiCGstab(ℓ) for linear equations involving unsymmetric matrices with complex spectrum', Elect. Trans. Numer. Anal. 1, 11–32.
B. Smith, P. Bjørstad and W. Gropp (1996), Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press.
P. Sonneveld and M.B. van Gijzen (2008), 'IDR(s): A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations', SIAM J. Sci. Comput. 31, 1035–1062.
D.A. Spielman and S.-H. Teng (2014), 'Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems', SIAM J. Matrix Anal. Appl. 35, 835–885.
M. Stoll and A.J. Wathen (2008), 'Combination preconditioning and the Bramble-Pasciak+ preconditioner', SIAM J. Matrix Anal. Appl. 30, 582–608.
G. Strang (1986), 'A proposal for Toeplitz matrix calculations', Stud. Appl. Math. 74, 171–176.
W.-P. Tang and W.L. Wan (2000), 'Sparse approximate inverse smoother for multigrid', SIAM J. Matrix Anal. Appl. 21, 1236–1252.
O. Taussky (1972), 'The role of symmetric matrices in the study of general matrices', Linear Algebra Appl. 5, 147–154.
A. Toselli and O. Widlund (2004), Domain Decomposition Methods: Algorithms and Theory, Springer Series in Computational Mathematics, Vol. 34.
L.N. Trefethen and M. Embree (2005), Spectra and Pseudospectra, Princeton University Press.
U. Trottenberg, C.W. Oosterlee and A. Schüller (2000), Multigrid, Academic Press.
P. Tsuji, B. Engquist and L. Ying (2012), 'A sweeping preconditioner for time-harmonic Maxwell's equations with finite elements', J. Comput. Phys. 231, 3770–3783.
M.B. van Gijzen, Y.A. Erlangga and C. Vuik (2007), 'Spectral analysis of the discrete Helmholtz operator preconditioned with a shifted Laplacian', SIAM J. Sci. Comput. 29, 1942–1958.
H.A. van der Vorst (1992), 'Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems', SIAM J. Sci. Statist. Comput. 13, 631–644.
H.A. van der Vorst (2003), Iterative Krylov Methods for Large Linear Systems, Cambridge University Press.
C.F. Van Loan (1992), Computational Frameworks for the Fast Fourier Transform, Society for Industrial and Applied Mathematics.
P.S. Vassilevski (2008), Multilevel Block Factorization Preconditioners: Matrix-based Analysis and Algorithms for Solving Finite Element Equations, Springer.
R. Verfürth (1984), 'A combined conjugate gradient-multigrid algorithm for the numerical solution of the Stokes problem', IMA J. Numer. Anal. 4, 441–455.
A.J. Wathen (1986), 'Realistic eigenvalue bounds for the Galerkin mass matrix', IMA J. Numer. Anal. 7, 449–457.
A.J. Wathen (2007), 'Preconditioning and convergence in the right norm', Int. J. Comput. Math. 84, 1199–1209.
A.J. Wathen, B. Fischer and D.J. Silvester (1995), 'The convergence rate of the minimum residual method for the Stokes problem', Numer. Math. 71, 121–134.
A.J. Wathen and T. Rees (2009), 'Chebyshev semi-iteration in preconditioning for problems including the mass matrix', Elect. Trans. Numer. Anal. 34, 125–135.
M.P. Wathen (2014), 'Iterative solution of a mixed finite element discretisation of an incompressible magnetohydrodynamics problem', MSc dissertation, Department of Computer Science, University of British Columbia.
O. Widlund (1978), 'A Lanczos method for a class of nonsymmetric systems of linear equations', SIAM J. Numer. Anal. 15, 801–812.
P.H. Worley (1991), 'Limits on parallelism in the numerical solution of linear partial differential equations', SIAM J. Sci. Stat. Comput. 12, 1–35.
J. Xu (1992), 'Iterative methods by space decomposition and subspace correction', SIAM Rev. 34, 581–613.
G. Ymbert, M. Embree and J. Sifuentes (2015), 'Approximate Murphy-Golub-Wathen preconditioning for saddle point problems', in preparation.
H. Yserentant (1986), 'On the multi-level splitting of finite element spaces', Numer. Math. 49, 379–412.