Numerical Linear Algebra and Matrix Analysis
Nicholas J. Higham†

† Author's final version, before copy editing and cross-referencing, of: N. J. Higham. Numerical linear algebra and matrix analysis. In N. J. Higham, M. R. Dennis, P. Glendinning, P. A. Martin, F. Santosa, and J. Tanner, editors, The Princeton Companion to Applied Mathematics, pages 263–281. Princeton University Press, Princeton, NJ, USA, 2015. Also issued as MIMS EPrint 2015.103, Manchester Institute for Mathematical Sciences, School of Mathematics, The University of Manchester (reports available from http://eprints.maths.manchester.ac.uk/). ISSN 1749-9097.

Matrices are ubiquitous in applied mathematics. Ordinary differential equations (ODEs) and partial differential equations (PDEs) are solved numerically by finite difference or finite element methods, which lead to systems of linear equations or matrix eigenvalue problems. Nonlinear equations and optimization problems are typically solved using linear or quadratic models, which again lead to linear systems.

Solving linear systems of equations is an ancient task, undertaken by the Chinese around 1 AD, but the study of matrices per se is relatively recent, originating with Arthur Cayley's 1858 "A Memoir on the Theory of Matrices". Early research on matrices was largely theoretical, with much attention focused on the development of canonical forms, but in the 20th century the practical value of matrices started to be appreciated. Heisenberg used matrix theory as a tool in the development of quantum mechanics in the 1920s. Early proponents of the systematic use of matrices in applied mathematics included Frazer, Duncan, and Collar, whose 1938 book Elementary Matrices and Some Applications to Dynamics and Differential Equations emphasized the important role of matrices in differential equations and mechanics. The continued growth of matrices in applications, together with the advent of mechanical and then digital computing devices, allowing ever larger problems to be solved, created the need for greater understanding of all aspects of matrices from theory to computation.

This article treats two closely related topics: matrix analysis, which is the theory of matrices with a focus on aspects relevant to other areas of mathematics, and numerical linear algebra (also called matrix computations), which is concerned with the construction and analysis of algorithms for solving matrix problems as well as related topics such as problem sensitivity and rounding error analysis.

Important themes that are discussed in this article include the matrix factorization paradigm, the use of unitary transformations for their numerical stability, exploitation of matrix structure (such as sparsity, symmetry, and definiteness), and the design of algorithms to exploit evolving computer architectures.

Throughout the article, uppercase letters are used for matrices and lower case letters for vectors and scalars. Matrices and vectors are assumed to be complex, unless otherwise stated, and $A^* = (\bar{a}_{ji})$ denotes the conjugate transpose of $A = (a_{ij})$. An unsubscripted norm $\|\cdot\|$ denotes a general vector norm and the corresponding subordinate matrix norm. Particular norms used here are the 2-norm $\|\cdot\|_2$ and the Frobenius norm $\|\cdot\|_F$. The notation "$i = 1\colon n$" means that the integer variable $i$ takes on the values $1, 2, \ldots, n$.

1 Nonsingularity and Conditioning

Nonsingularity of a matrix is a key requirement in many problems, such as in the solution of $n$ linear equations in $n$ unknowns. For some classes of matrices, nonsingularity is guaranteed. A good example is the diagonally dominant matrices. The matrix $A \in \mathbb{C}^{n\times n}$ is strictly diagonally dominant by rows if
\[
  \sum_{j\ne i} |a_{ij}| < |a_{ii}|, \qquad i = 1\colon n,
\]
and strictly diagonally dominant by columns if $A^*$ is strictly diagonally dominant by rows. Any matrix that is strictly diagonally dominant by rows or columns is nonsingular (a proof can be obtained by applying Gershgorin's theorem in section 5.1).

Since data is often subject to uncertainty we wish to gauge the sensitivity of problems to perturbations, which is done using condition numbers. An appropriate condition number for the matrix inverse is
\[
  \lim_{\varepsilon\to 0}\, \sup_{\|\Delta A\| \le \varepsilon\|A\|} \frac{\|(A+\Delta A)^{-1} - A^{-1}\|}{\varepsilon\|A^{-1}\|}.
\]
This expression turns out to equal $\kappa(A) = \|A\|\,\|A^{-1}\|$, which is called the condition number of $A$ with respect to inversion. This condition number occurs in many contexts. For example, suppose $A$ is contaminated by errors and we perform a similarity transformation $X^{-1}(A+E)X = X^{-1}AX + F$. Then $\|F\| = \|X^{-1}EX\| \le \kappa(X)\|E\|$, and this bound is attainable for some $E$. Hence the errors can be multiplied by a factor as large as $\kappa(X)$. We therefore prefer to carry out similarity and other transformations with matrices that are well conditioned, that is, ones for which $\kappa(X)$ is close to its lower bound of $1$. By contrast, a matrix for which $\kappa$ is large is called ill conditioned. For any unitary matrix $X$, $\kappa_2(X) = 1$, so in numerical linear algebra transformations by unitary or orthogonal matrices are preferred and usually lead to numerically stable algorithms.

In practice we often need an estimate of the matrix condition number $\kappa(A)$ but do not wish to go to the expense of computing $A^{-1}$ in order to obtain it. Fortunately, there are algorithms that can cheaply produce a reliable estimate of $\kappa(A)$ once a factorization of $A$ has been computed.

Note that the determinant, $\det(A)$, is rarely computed in numerical linear algebra. Its magnitude gives no useful information about the conditioning of $A$, not least because of its extreme behavior under scaling: $\det(\alpha A) = \alpha^n \det(A)$.
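To make these definitions concrete, here is a minimal NumPy sketch (the helper name and example matrix are illustrative, not from the article) that tests strict diagonal dominance by rows and evaluates $\kappa_2(A)$ directly; as noted above, in practice one would use a cheap estimator rather than anything requiring $A^{-1}$.

```python
import numpy as np

def is_sdd_by_rows(A):
    """True if sum_{j != i} |a_ij| < |a_ii| for every row i,
    which guarantees that A is nonsingular."""
    absA = np.abs(np.asarray(A))
    d = np.diag(absA)
    return bool(np.all(absA.sum(axis=1) - d < d))

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 3.0]])

print(is_sdd_by_rows(A))     # True, so A is nonsingular
print(np.linalg.cond(A, 2))  # kappa_2(A) = ||A||_2 ||A^{-1}||_2, computed via the SVD
```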
2 Matrix Factorizations

The method of Gaussian elimination (GE) for solving a nonsingular linear system $Ax = b$ of $n$ equations in $n$ unknowns reduces the matrix $A$ to upper triangular form and then solves for $x$ by substitution. GE is typically described by writing down the equations
\[
  a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)} a_{kj}^{(k)} / a_{kk}^{(k)}
\]
(and similarly for $b$) that describe how the starting matrix $A = A^{(1)} = (a_{ij}^{(1)})$ changes on each of the $n-1$ steps of the elimination in its progress towards upper triangular form $U$. Working at the element level in this way leads to a profusion of symbols, superscripts, and subscripts that tend to obscure the mathematical structure and hinder insights being drawn into the underlying process.

One of the key developments in the last century was the recognition that it is much more profitable to work at the matrix level. Thus the basic equation above is written as $A^{(k+1)} = M_k A^{(k)}$, where $M_k$ agrees with the identity matrix except below the diagonal in the $k$th column, where its $(i,k)$ element is $m_{ik} = -a_{ik}^{(k)}/a_{kk}^{(k)}$, $i = k+1\colon n$. Recurring the matrix equation gives $U := A^{(n)} = M_{n-1}\cdots M_1 A$. Taking the $M_k$ matrices over to the left-hand side leads, after some calculations, to the equation $A = LU$, where $L$ is unit lower triangular, with $(i,k)$ element $-m_{ik}$. The prefix "unit" means that $L$ has ones on the diagonal.

GE is therefore equivalent to factorizing the matrix $A$ as the product of a lower triangular matrix and an upper triangular matrix—something that is not at all obvious from the element-level equations. Once the LU factorization is available, $Ax = b$ is solved by substitution in two triangular systems: $Ly = b$ followed by $Ux = y$. It is then clear how to solve efficiently several systems $Ax_i = b_i$, $i = 1\colon r$, with different right-hand sides but the same coefficient matrix $A$: compute the LU factors once and then re-use them to solve for each $x_i$ in turn.

This matrix factorization viewpoint dates from around the 1940s and has been extremely successful in matrix computations. In general, a factorization is a representation of a matrix as a product of "simpler" matrices. Factorization is a tool that can be used to solve a variety of problems, as we will see below.

Two particular benefits of factorizations are unity and modularity. GE, for example, can be organized in several different ways, corresponding to different orderings of the three nested loops that it comprises, as well as the use of different blockings of the matrix elements. Yet all of them compute the same LU factorization, carrying out the same mathematical operations in a different order. Without the unifying concept of a factorization, reasoning about these GE variants would be difficult.

Modularity refers to the way that a factorization breaks a problem down into separate tasks, which can be analyzed or programmed independently. To carry out a rounding error analysis of GE we can analyze the LU factorization and the solution of the triangular systems by substitution separately and then put the analyses together. The rounding error analysis of substitution can be re-used in the many other contexts in which triangular systems arise.

An important example of the use of LU factorization is in iterative refinement. Suppose we have used GE to obtain a computed solution $\hat{x}$ to $Ax = b$ in floating-point arithmetic. If we form $r = b - A\hat{x}$ and solve $Ae = r$, then in exact arithmetic $y = \hat{x} + e$ is the true solution. In computing $e$ we can reuse the LU factors of $A$, so obtaining $y$ from $\hat{x}$ is inexpensive. In practice, the computation of $r$, $e$, and $y$ is subject to rounding errors, so the computed $\hat{y}$ is not equal to $x$. But under suitable assumptions $\hat{y}$ will be an improved approximation and we can iterate this refinement process. Iterative refinement is particularly effective if $r$ can be computed using extra precision.
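The following sketch mirrors this discussion, assuming SciPy is available: lu_factor computes an LU factorization with partial pivoting once, lu_solve reuses the factors for several right-hand sides, and a hypothetical helper refine (not code from the article) performs refinement steps with the residual formed in higher precision.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# Factor once; reuse the factors for several right-hand sides.
lu_piv = lu_factor(A)            # combined LU factors plus pivot indices
for _ in range(3):
    b = rng.standard_normal(6)
    x = lu_solve(lu_piv, b)      # only cheap triangular solves per system
    assert np.allclose(A @ x, b)

def refine(A, b, lu_piv, iters=2):
    """Iterative refinement: residuals in extra precision (longdouble),
    corrections obtained by reusing the already-computed LU factors."""
    x = lu_solve(lu_piv, b)
    for _ in range(iters):
        # r = b - A x, accumulated in higher precision than the working one
        r = b.astype(np.longdouble) - A.astype(np.longdouble) @ x.astype(np.longdouble)
        e = lu_solve(lu_piv, r.astype(A.dtype))  # solve A e = r cheaply
        x = x + e
    return x
```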
Without the unifying concept of a in n unknowns reduces the matrix A to upper trian- factorization, reasoning about these GE variants would gular form and then solves for x by substitution. GE be difficult. is typically described by writing down the equations Modularity refers to the way that a factorization (k+1) (k) (k) (k) (k) aij = aij − aik akj =akk (and similarly for b) that breaks a problem down into separate tasks, which can (1) (1) be analyzed or programmed independently. To carry describe how the starting matrix A = A = (aij ) changes on each of the n − 1 steps of the elimina- out a rounding error analysis of GE we can analyze the tion in its progress towards upper triangular form U. LU factorization and the solution of the triangular sys- Working at the element level in this way leads to a pro- tems by substitution separately and then put the analy- fusion of symbols, superscripts, and subscripts that ses together. The rounding error analysis of substitu- tend to obscure the mathematical structure and hin- tion can be re-used in the many other contexts in which der insights being drawn into the underlying process. triangular systems arise. One of the key developments in the last century was An important example of the use of LU factoriza- the recognition that it is much more profitable to work tion is in iterative refinement. Suppose we have used at the matrix level. Thus the basic equation above is GE to obtain a computed solution xb to Ax = b in (k+1) (k) floating-point arithmetic. If we form r = b − Ax and written as A = MkA , where Mk agrees with the b identity matrix except below the diagonal in the kth solve Ae = r , then in exact arithmetic y = xb + e is (k) (k) the true solution. In computing e we can reuse the LU column, where its (i, k) element is mik = −aik =akk , i = k + 1: n. Recurring the matrix equation gives factors of A, so obtaining y from xb is inexpensive. In (n) practice, the computation of r , e, and y is subject to U := A = Mn−1 :::M1A. Taking the Mk matrices over to the left-hand side leads, after some calculations, to rounding errors so the computed yb is not equal to x. the equation A = LU, where L is unit lower triangular, But under suitable assumptions yb will be an improved approximation and we can iterate this refinement pro- with (i, k) element mik. The prefix “unit” means that L cess. Iterative refinement is particularly effective if has ones on the diagonal. r can be computed using extra precision. GE is therefore equivalent to factorizing the matrix Two other key factorizations are: A as the product of a lower triangular matrix and an upper triangular matrix—something that is not at all • Cholesky factorization: for Hermitian positive def- obvious from the element-level equations.