CHAPTER 4

Systems of Linear Equations

In this chapter we’ll examine both iterative and direct methods for solving equations of the form
\[
Ax = b \tag{4.1}
\]
where x, b ∈ Rⁿ, A ∈ MₙR is an n × n matrix, x is an unknown vector (to be found), and b and A are known. Solving systems of linear equations is still the most important problem in computational mathematics, because it is used as a sub-problem in solving other problems. Algorithms that solve non-linear systems commonly use linear approximations, which give rise to systems of linear equations. Algorithms that optimise over feasible sets given by linear and non-linear equalities and inequalities commonly solve systems related to first-order optimality conditions iteratively, which again give rise to systems of linear equations.

In undergraduate mathematics you have learned the mathematical ideas behind direct and iterative approaches to solving (4.1). You should understand:

Theorem 4.1. The following are equivalent for any n × n matrix A:
• Ax = b has a unique solution for all b ∈ Rⁿ.
• Ax = 0 implies x = 0.
• A⁻¹ exists.
• det(A) ≠ 0.
• rank(A) = n.

The full rank of A is also our assumption throughout the chapter. Here, we build on these facts and analyse the related algorithms, focussing first on conditioning, second on the stability of direct methods, and third on the convergence and stability of iterative methods. This still leaves much unexplained, including conjugate gradients (CG), generalised minimal residuals (GMRES), and preconditioning, i.e. methods for changing the condition. See Liesen and Strakos [2012] for much more.

1. Condition of a System of Linear Equations

Condition describes how changes in A and b, the “instance” of the problem, affect the solution x, whichever algorithm is used. We will examine errors in A and b separately. It turns out that in both cases the condition number of the matrix A plays a role.

Example 4.2 (An Ill-Conditioned Matrix). Consider the system of linear equations

x₁ + 0.99x₂ = 1.99
0.99x₁ + 0.98x₂ = 1.97.

The true solution is x₁ = 1 and x₂ = 1, but x₁ = 3.0000 and x₂ = −1.0203 gives

x₁ + 0.99x₂ = 1.989903
0.99x₁ + 0.98x₂ = 1.970106.

Thus, a small change in the problem data, a change in the vector b from (1.99, 1.97)ᵀ to (1.989903, 1.970106)ᵀ,


Figure 4.1. The system x1 + 0.99x2 = 1.99 (solid), 0.99x1 + 0.98x2 = 1.97 (dashed).

leads to a large change in the solution: this is our criterion for ill-conditioning. Intuitively, it is easy to see what is going wrong. See the illustration in Figure 4.1.

♦

1.1. Perturbation of b. Let the right-hand side b be perturbed by δb. Then we want to find the solution of
\[
A(x + \delta x) = b + \delta b. \tag{4.2}
\]
Let ‖·‖ denote a vector or matrix norm, according to context. By (4.2), Aδx = δb, so
\[
\delta x = A^{-1}\delta b \;\Rightarrow\; \|\delta x\| \le \|A^{-1}\|\,\|\delta b\| \quad \text{(a sharp bound).} \tag{4.3}
\]
Since b = Ax, the properties of matrix norms again give
\[
\|b\| \le \|A\|\,\|x\|. \tag{4.4}
\]
Hence, combining (4.3) and (4.4) (each left-hand side is at most the corresponding right-hand side, so the product of the left-hand sides is at most the product of the right-hand sides):
\[
\|\delta x\|\,\|b\| \le \|A\|\,\|A^{-1}\|\,\|x\|\,\|\delta b\|,
\]
and assuming b ≠ 0 we get
\[
\frac{\|\delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|\delta b\|}{\|b\|},
\]
i.e., (rel. error in x) ≤ ‖A‖ ‖A⁻¹‖ (rel. error in b).
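To make the bound concrete, here is a minimal numerical sketch (numpy is assumed; the matrix, right-hand side, and perturbation are those of Example 4.2) comparing the actual relative change in x with the bound ‖A‖ ‖A⁻¹‖ · ‖δb‖/‖b‖ in the 2-norm.

from numpy import array
from numpy.linalg import solve, norm, inv

A = array([[1.00, 0.99],
           [0.99, 0.98]])
b = array([1.99, 1.97])
db = array([-0.000097, 0.000106])    # the perturbation of b from Example 4.2

x = solve(A, b)                      # exact solution (1, 1)
x_pert = solve(A, b + db)            # solution of the perturbed system

rel_change_x = norm(x_pert - x) / norm(x)
rel_change_b = norm(db) / norm(b)
bound = norm(A, 2) * norm(inv(A), 2) * rel_change_b

print(rel_change_x, "<=", bound)     # a relative change of order 1 in x, caused by
                                     # a relative change of order 5e-5 in b

The actual relative change is close to the bound here, which illustrates that the bound is sharp.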

2. The Condition Number of a Matrix

Thus, the quantity ‖A‖ ‖A⁻¹‖ measures the relative change in solution for a given relative change in the problem data: it measures the relative condition of the system of linear equations problem. This leads to:

Definition 4.3. Given a matrix norm ‖·‖, the condition number of the matrix A is
\[
\operatorname{cond}_{\mathrm{rel}}(A) = \|A\|\,\|A^{-1}\|.
\]
This depends on the norm used; but, since the underlying vector norms only differ by a fixed multiplicative constant for a given n (all norms on Rⁿ are equivalent), all measures of condition number are equally good.

We can interpret cond_rel(A) as:
• the amount a relative error in b is magnified in the solution vector x; or
• the distortion A produces when applied to the unit sphere; or
• how “close” A (and indeed A⁻¹) is to being a singular matrix.

Definition 4.4. We also define the spectral condition number of A as

\[
\operatorname{cond}^*_{\mathrm{rel}}(A) := \frac{\max_{\lambda\in\sigma(A)} |\lambda|}{\min_{\lambda\in\sigma(A)} |\lambda|}.
\]

Here σ(A), the spectrum of A, is the set of all eigenvalues of A. Recall that max_{λ∈σ(A)} |λ| = ρ(A) = r_σ(A), the spectral radius of A.

If λ is an eigenvalue of A, its modulus (absolute value) |λ| is the factor by which a λ-eigenvector is expanded (if |λ| > 1) or contracted (if |λ| < 1). Thus
• ρ(A) = max_{λ∈σ(A)} |λ| is the largest factor by which A multiplies an eigenvector, while
• min_{λ∈σ(A)} |λ| is the smallest factor by which A multiplies an eigenvector.
The ratio cond*_rel(A) = max_{λ∈σ(A)} |λ| / min_{λ∈σ(A)} |λ| is thus a measure of the distortion produced by A: it measures how great is the difference in expansion/contraction of eigenvectors that A can cause. See Example 4.2 above.

In the example above, the related matrix is
\[
A = \begin{bmatrix} 1.00 & 0.99 \\ 0.99 & 0.98 \end{bmatrix},
\]
whose eigenvalues λ₁ ≈ 1.98 and λ₂ ≈ −0.00005 are the roots of the characteristic equation
\[
\det(A - \lambda I) = \det\begin{bmatrix} 1.00-\lambda & 0.99 \\ 0.99 & 0.98-\lambda \end{bmatrix}
= (1-\lambda)(0.98-\lambda) - 0.99^2
= \lambda^2 - 1.98\lambda + 0.98 - 0.9801
= \lambda^2 - 1.98\lambda - 0.0001.
\]
Thus the spectral condition number is cond*_rel(A) = |1.98| / |−0.00005| ≈ 39,600. Hence, this matrix is very ill-conditioned.

The condition number cond_rel(A) is bounded below by 1: this is seen by noting that ‖I‖ = 1 for any induced norm and
\[
1 = \|I\| = \|AA^{-1}\| \le \|A\|\,\|A^{-1}\| = \operatorname{cond}_{\mathrm{rel}}(A).
\]
Fact 4.5. Each norm-based condition number is also bounded below by the spectral condition number of A:
\[
1 \le \operatorname{cond}^*_{\mathrm{rel}}(A) \le \operatorname{cond}_{\mathrm{rel}}(A)
\]
for any norm.
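A quick numerical check of these numbers (a sketch using numpy; for this symmetric matrix the 2-norm condition number coincides with the spectral one, so Fact 4.5 holds with near-equality):

from numpy import array, abs
from numpy.linalg import eigvals, norm, inv

A = array([[1.00, 0.99],
           [0.99, 0.98]])

moduli = abs(eigvals(A))                       # |λ| for each eigenvalue
spectral_cond = moduli.max() / moduli.min()    # cond*_rel(A), roughly 3.9e4

cond_2 = norm(A, 2) * norm(inv(A), 2)          # norm-based condition number (2-norm)
print(spectral_cond, "<=", cond_2)             # Fact 4.5: spectral <= norm-based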

Thus the spectral condition number is the smallest measure of relative condition of the system of linear equations problem.

2.1. Perturbation of A. Having looked at changes in b, and seen how the condition number arises naturally there, we now look at changes in A alone. If A is perturbed by δA then we have
\[
b = (A + \delta A)(x + \delta x) = Ax + A\,\delta x + \delta A\,x + \delta A\,\delta x
\;\Rightarrow\; A\,\delta x = -\,\delta A\,(x + \delta x)
\;\Rightarrow\; \delta x = -A^{-1}\,\delta A\,(x + \delta x).
\]
Taking norms and using the triangle inequality we have
\[
\|\delta x\| = \|A^{-1}\,\delta A\,(x + \delta x)\| \le \|A^{-1}\|\,\|\delta A\|\,(\|x\| + \|\delta x\|)
\;\Rightarrow\; \|\delta x\|\bigl(1 - \|A^{-1}\|\,\|\delta A\|\bigr) \le \|A^{-1}\|\,\|\delta A\|\,\|x\|.
\]
Thus
\[
\frac{\|\delta x\|}{\|x\|} \le \frac{\|A^{-1}\|\,\|\delta A\|}{1 - \|A^{-1}\|\,\|\delta A\|}
= \frac{\|A\|\,\|A^{-1}\|\,\|\delta A\|}{\bigl(1 - \|A^{-1}\|\,\|\delta A\|\bigr)\,\|A\|}.
\]
Since 1 = ‖I‖ = ‖AA⁻¹‖ ≤ ‖A‖ ‖A⁻¹‖ we have 1/‖A‖ ≤ ‖A⁻¹‖. Thus, if ‖A⁻¹‖ ‖δA‖ ≪ 1 (and so ‖δA‖/‖A‖ ≪ 1), then 1 − ‖A⁻¹‖ ‖δA‖ ≈ 1 and we have
\[
\frac{\|\delta x\|}{\|x\|} \le \operatorname{cond}_{\mathrm{rel}}(A)\,\frac{\|\delta A\|}{\|A\|}.
\]
Thus, for a small perturbation of A, we again have that the condition number measures the relative condition of the system of linear equations problem. A similar result can be derived for the case where both A and b are perturbed.

Theorem 4.6. If A is non-singular, and
\[
\frac{\|\delta A\|}{\|A\|} < \frac{1}{\operatorname{cond}_{\mathrm{rel}}(A)},
\]
then A + δA is also non-singular.

This theorem tells us that the condition number measures the distance from A to the nearest singular matrix: it is a better measure than the determinant of “how close to singularity” a matrix is. It also says that the set GLₙR of invertible n × n matrices is an open set¹: about any matrix in GLₙR there is an ε-neighbourhood (or ε-ball) contained wholly in GLₙR.

2.2. Errors and Residuals. There are two common ways to measure the discrepancy between the true solution x and the computed solution x̂:

Error: δx = x − x̂
Residual: r = b − Ax̂

If A is invertible, and either δx or r is zero, then both must be zero. In many applications of linear equations, we want to solve Ax = b so that the residual r, the difference between the left- and right-hand sides, is small, i.e., so that ‖r‖ = ‖b − Ax̂‖ is small.

Intuitively, we can think of the residual as follows: if you have a computed solution x̂ to a system of linear equations and you know the exact solution x, then you know the error δx = x − x̂; but if you don’t know the solution x beforehand, then the residual r = b − Ax̂ is a measure, along a different axis, of how close you are.

We now examine the relationship between r and errors in x. Let x̂ be the computed solution to Ax = b. Then δx = x − x̂ and r = b − Ax̂, giving
\[
A\,\delta x = Ax - A\hat{x} = b - A\hat{x} = r \;\Rightarrow\; \delta x = A^{-1}r.
\]
Thus
‖δx‖ ≤ ‖A⁻¹‖ ‖r‖  (property of matrix norms)  [1].
Similarly ‖b‖ ≤ ‖A‖ ‖x‖, so
1/‖x‖ ≤ ‖A‖/‖b‖  [2].
It follows that
‖δx‖/‖x‖ ≤ ‖A‖ ‖A⁻¹‖ ‖r‖/‖b‖  (combining [1] and [2]).
Thus
\[
\frac{\|\delta x\|}{\|x\|} \le \operatorname{cond}_{\mathrm{rel}}(A)\,\frac{\|r\|}{\|b\|}.
\]
The conclusion is: if A is ill-conditioned then small ‖r‖ does not imply small ‖δx‖/‖x‖. (We’ll see there is a similar conclusion for solutions of non-linear equations: if the problem is ill-conditioned, then a small “residual” |f(x_k)| does not mean that |x_k − x_{k−1}| is small.)

¹ GLₙR is a group under matrix multiplication, since: (a) the product of two invertible matrices is invertible (so GLₙR is closed under multiplication); (b) matrix multiplication is associative; (c) there is an identity matrix I such that AI = A = IA for all A ∈ GLₙR; and (d) each matrix in GLₙR has an inverse under multiplication (by definition of GLₙR). Since GLₙR is also a smooth manifold (locally there is a smooth mapping from Euclidean space R^{n²}, with the overlaps consistent and smooth), GLₙR is a Lie group.

Solutions with small residuals² can, because of the above, lead to large errors in x̂ if A is ill-conditioned.
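A minimal sketch of this phenomenon, using the matrix from Example 4.2 (the approximate solution x̂ is the one given there): the residual is tiny while the error is of order 1.

from numpy import array, dot
from numpy.linalg import norm

A = array([[1.00, 0.99],
           [0.99, 0.98]])
b = array([1.99, 1.97])

x_true = array([1.0, 1.0])          # exact solution
x_hat = array([3.0000, -1.0203])    # approximate "solution" from Example 4.2

residual = b - dot(A, x_hat)
error = x_true - x_hat

print(norm(residual))   # about 1.4e-4: a small residual...
print(norm(error))      # about 2.8: ...but a large error, since A is ill-conditioned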

3. Methods for Solving Ax = b: An Overview

3.1. Special Systems. Let us consider the following types of matrices:

• Symmetric: A = Aᵀ.
• Positive definite: xᵀAx > 0 for all x ≠ 0; for an eigenpair (λ, x) this gives xᵀAx = λxᵀx > 0, so all eigenvalues satisfy λ > 0.
• Diagonally dominant (DD): in each row, the diagonal element is larger than or equal to the sum of the absolute values of the other elements in the row, i.e., |a_ii| ≥ Σ_{j≠i} |a_ij|.
• Strictly DD: the same, except with strict inequality, i.e., |a_ii| > Σ_{j≠i} |a_ij|.
• Upper triangular (with a_ii ≠ 0):
\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & \cdots & a_{1n} \\
0 & a_{22} & & & \vdots \\
\vdots & 0 & \ddots & & \vdots \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \cdots & a_{nn}
\end{bmatrix}
\]

Recall that if a matrix is in echelon form (e.g., upper triangular) the first non-zero entry in a row is called the pivot for that row: here a_kk is the pivot for the k-th row.

3.2. The Algorithms. Direct methods for solving Ax = b apply elementary matrix operations to A and b, giving a transformed problem A′x′ = b′ which is easily solved for x′. Within direct methods:

• In Gauss-Jordan, multiples of a pivot row are subtracted from other rows, so that one obtains first an upper triangular matrix, and then an identity matrix. Gauss-Jordan works (with appropriate pivoting) on any non-singular matrix, but is stable only for diagonally dominant or positive-definite matrices.
• Gauss-Jordan is also closely related to the LU and LUP decompositions, where U stands for an upper triangular matrix and L stands for a lower triangular matrix.
• On symmetric positive-definite matrices, one can also use other decomposition methods (e.g., Cholesky, QR), which are stable and faster.

Iterative methods successively improve an initial guess until it becomes satisfactory. Iterative methods for systems of linear equations are best understood as means of solving an associated optimisation problem. Let f be the quadratic f(x) := ½ xᵀAx − bᵀx + c with A symmetric positive definite. Whenever the first-order optimality conditions of min_{x∈Rⁿ} f(x) are satisfied, i.e. ∇f(x) = Ax − b = 0, we have Ax = b (a small numerical check follows the list below). Within iterative methods:

• The Jacobi method is guaranteed to converge if A is strictly diagonally dominant.
• Gauss-Seidel is guaranteed to converge if A is either strictly diagonally dominant or symmetric positive definite.
• Many other algorithms apply here too: CG requires a symmetric positive-definite matrix, while GMRES handles general non-singular matrices.
In a number of applications, iterative methods are preferred to direct methods, especially when the coefficient matrix A is sparse or structured.
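Here is the promised check, a minimal sketch (numpy; the 2 × 2 symmetric positive-definite matrix is chosen purely for illustration) confirming that the gradient of f vanishes exactly at the solution of Ax = b.

from numpy import array, dot
from numpy.linalg import solve, norm

A = array([[4.0, 1.0],
           [1.0, 3.0]])       # symmetric positive definite
b = array([1.0, 2.0])

def grad_f(x):
    # gradient of f(x) = 0.5*x^T A x - b^T x  (A symmetric)
    return dot(A, x) - b

x_star = solve(A, b)          # direct solution of Ax = b
print(norm(grad_f(x_star)))   # essentially 0: the minimiser of f solves the system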

2A very important fact in numerical computation.

4. Direct Methods

4.1. Gauss-Jordan. Recall that this method uses a sequence of elementary matrix operations to transform the square system Ax = b into an upper triangular system Ux = b′, which is then solved using back substitution.

We use a superscript in parentheses to denote the stage: x_i^{(k)} denotes the value for x_i at the k-th stage and A^{(k)} denotes the matrix A at this stage. At stage k we have
\[
\bigl[\,A^{(k)} \,\big|\, b^{(k)}\bigr] =
\left[\begin{array}{cccccc|c}
a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1k}^{(1)} & \cdots & a_{1n}^{(1)} & b_1^{(1)} \\
0 & a_{22}^{(2)} & \cdots & a_{2k}^{(2)} & \cdots & a_{2n}^{(2)} & b_2^{(2)} \\
\vdots & & \ddots & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & a_{kk}^{(k)} & \cdots & a_{kn}^{(k)} & b_k^{(k)} \\
\vdots & & & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & a_{nk}^{(k)} & \cdots & a_{nn}^{(k)} & b_n^{(k)}
\end{array}\right].
\]
The elements a_{k+1,k}^{(k)}, a_{k+2,k}^{(k)}, …, a_{nk}^{(k)} are eliminated by subtracting the following multiples of row k from rows k+1, k+2, …, n:
\[
m_{k+1,k} := \frac{a_{k+1,k}^{(k)}}{a_{kk}^{(k)}}, \quad
m_{k+2,k} := \frac{a_{k+2,k}^{(k)}}{a_{kk}^{(k)}}, \quad \ldots, \quad
m_{n,k} := \frac{a_{n,k}^{(k)}}{a_{kk}^{(k)}}.
\]
In general, assuming that a_{kk}^{(k)} ≠ 0, the (i, k) multiplier is
\[
m_{ik} := \frac{a_{ik}^{(k)}}{a_{kk}^{(k)}}, \quad i = k+1, \ldots, n,
\]
and, for all i, j = k+1, …, n,
\[
a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik}\, a_{kj}^{(k)}, \qquad
b_i^{(k+1)} = b_i^{(k)} - m_{ik}\, b_k^{(k)}.
\]

Pictorially, at stage k the matrix will look as in Figure 4.2. Note that rows 1, . . . , k will not change from stage k + 1 onwards.

from numpy import dot

def noPivot(A, row):
    # default pivoting strategy: keep the current row (no row interchange)
    return row

def GaussJordan(A, b, pivoting = noPivot):
    # A and b are numpy arrays of floats; both are overwritten in place
    (rows, cols) = A.shape
    for row in range(0, rows-1):
        pivot = pivoting(A, row)
        if abs(A[pivot, row]) < 1e-8:
            raise ValueError()
        if pivot != row:
            A[[row, pivot], :] = A[[pivot, row], :]
            b[[row, pivot]] = b[[pivot, row]]
        for i in range(row+1, rows):
            if abs(A[row, row]) < 1e-8:
                raise ValueError()
            # eliminate A[i, row] by subtracting a multiple of the pivot row
            factor = A[i, row] / A[row, row]
            A[i, row+1:rows] = A[i, row+1:rows] - factor*A[row, row+1:rows]
            b[i] = b[i] - factor*b[row]
    # back substitution on the (implicitly) upper triangular system
    for k in range(rows-1, -1, -1):
        b[k] = (b[k] - dot(A[k, k+1:rows], b[k+1:rows])) / A[k, k]
    return b
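A usage sketch (the 3 × 3 system is made up for illustration; since A and b are overwritten, copies are passed):

from numpy import array, allclose
from numpy.linalg import solve

A = array([[2.0, 1.0, 1.0],
           [4.0, -6.0, 0.0],
           [-2.0, 7.0, 2.0]])
b = array([5.0, -2.0, 9.0])

x = GaussJordan(A.copy(), b.copy())      # no pivoting needed for this matrix
print(allclose(x, solve(A, b)))          # True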

Figure 4.2. Gauss-Jordan: changes at stage k (the pivot a_kk, the entries below it that are reduced to zero, and the trailing part of the matrix that changes).

4.1.1. Analysis of Gauss-Jordan. Observe that the elimination update (the line updating A[i, row+1:rows] above) performs O(n) “multiply–accumulate” operations, and it is executed once for each remaining row at each stage. If we count a “multiply–accumulate” as one operation, the number S(n) of operations performed is:
\[
S(n) = \sum_{k=1}^{n-1} \sum_{i=k+1}^{n} \sum_{j=k+1}^{n} 1
= \sum_{k=1}^{n-1} \sum_{i=k+1}^{n} (n-k)
= \sum_{k=1}^{n-1} (n-k)^2
= (n-1)^2 + (n-2)^2 + \cdots + 2^2 + 1^2
= \frac{n(n-1)(2n-1)}{6}
\approx \frac{n^3}{3} \quad \text{for large } n.
\]
Hence Gauss-Jordan is a Θ(n³) process. (One can show Σ_{k=1}^{n−1} k² = n(n−1)(2n−1)/6 by induction.)

To put Θ(n³) into perspective, consider a single computer which can sustain a performance of 10¹¹ operations per second (“100 gigaFLOPS”). For a 10000 × 10000 matrix, you need about 10¹² operations, or 10 seconds. For a 100000 × 100000 matrix, you need about 10¹⁵ operations, or under 3 hours, if you can store the 80 GB in RAM. For a 1000000 × 1000000 matrix, you need about 10¹⁸ operations, or over 115 days, if you can store the 8 TB in RAM. As you can test using your own laptop, this is a very optimistic estimate.

Gauss-Jordan transforms the original system Ax = b to upper triangular form:
\[
Ux = \begin{bmatrix}
a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1n}^{(1)} \\
0 & a_{22}^{(2)} & & \vdots \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & a_{nn}^{(n)}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1^{(1)} \\ b_2^{(2)} \\ \vdots \\ b_n^{(n)} \end{bmatrix}.
\]
This system of equations can now be solved using back substitution.

• The algorithm assumes a_kk^{(k)} ≠ 0: but in fact, since A is invertible, we could always swap row k with a later row to get a_kk^{(k)} ≠ 0 (see later).
• A and b are overwritten.
• The 0’s beneath the pivot element are not calculated. They are ignored, as they are known to be zero. Thus the storage space used for these zeros could be used for something else. . .
• An extra matrix is not needed to store the m_ik’s. They can be stored in place of the zeros.
• The operations on b can be done separately, once we have stored the m_ik’s.
• Because of the last observation, we may now solve for any b without going through the elimination calculations again.

We solved Ax = b using Gauss-Jordan, which required elementary row operations to be performed on both A and b. If we are required to solve the equation Ax = b′ then we would need to perform exactly the same operations on A, because these are determined by the elements of A only, and A is the same in both equations.

Hence, if we have stored the multipliers m_ik, we need to perform only the update of b from Gauss-Jordan, i.e.,

\[
b_i := b_i - m_{ik}\, b_k, \qquad k = 1, \ldots, n-1, \quad i = k+1, \ldots, n.
\]
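A sketch of this re-use (the helper name apply_stored_multipliers is hypothetical; it assumes the multipliers m_ik have been stored in the strictly lower part of a matrix M, e.g. in place of the zeros as suggested above):

def apply_stored_multipliers(M, b):
    # M holds the multipliers m_ik strictly below the diagonal, as left
    # behind by the elimination; b is a new right-hand side (numpy array)
    b = b.copy()
    n = b.shape[0]
    for k in range(n - 1):
        for i in range(k + 1, n):
            b[i] = b[i] - M[i, k] * b[k]   # b_i := b_i - m_ik * b_k
    return b                               # now ready for back substitution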

4.1.2. The LU Decomposition of A. If at each stage k of Gauss-Jordan we store m_ik in those cells of A that become zero, then the A matrix after elimination would be as follows:
\[
\begin{bmatrix}
a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1n}^{(1)} \\
m_{21} & a_{22}^{(2)} & & \vdots \\
\vdots & & \ddots & \vdots \\
m_{n1} & m_{n2} & \cdots & a_{nn}^{(n)}
\end{bmatrix}.
\]
We define the upper and unit lower triangular parts as
\[
U = (u_{ij}) = \begin{bmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
0 & u_{22} & & \vdots \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & u_{nn}
\end{bmatrix}
= \begin{bmatrix}
a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1n}^{(1)} \\
0 & a_{22}^{(2)} & & \vdots \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & a_{nn}^{(n)}
\end{bmatrix},
\qquad
L = (\ell_{ij}) = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
m_{21} & 1 & & \vdots \\
\vdots & & \ddots & \vdots \\
m_{n1} & m_{n2} & \cdots & 1
\end{bmatrix}.
\]
That is, for all i, j ∈ {1, …, n},
\[
u_{ij} = \begin{cases} a_{ij}^{(i)} & \text{if } i \le j \\ 0 & \text{otherwise,} \end{cases}
\qquad
\ell_{ij} = \begin{cases} m_{ij} & \text{if } i > j \\ 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}
\]

Theorem 4.7 (LU Decomposition). If L = (ℓ_ij) and U = (u_ij) are the unit lower and upper triangular matrices generated by Gauss-Jordan, assuming a_kk^{(k)} ≠ 0 at each stage, then
\[
A = (a_{ij}) = LU, \quad \text{that is,} \quad a_{ij} = \sum_{k=1}^{n} \ell_{ik} u_{kj},
\]
where
\[
u_{kj} = a_{kj}^{(k)} \text{ for } k \le j \quad (\text{in particular, } u_{kk} = a_{kk}^{(k)}),
\qquad
\ell_{ik} = m_{ik} \text{ for } k < i, \quad \ell_{kk} = 1,
\]
and this decomposition is unique.

For a proof, cf. [Watkins, 2004, pp. 51–53]. We can now interpret Gauss-Jordan as a process which decomposes A into L and U, and hence we have
\[
Ax = LUx = L(Ux) = Ly = b.
\]
This represents two triangular systems of equations, Ly = b and Ux = y, whose solutions are
\[
y = L^{-1}b, \qquad Ux = L^{-1}b, \qquad x = U^{-1}L^{-1}b.
\]

Overall, we solve Ly = b for y first (“forward substitution”), and then solve Ux = y for x (“backward substitution”). The revised code, using scipy’s LU factorisation, is:

from numpy import zeros_like, dot
from scipy.linalg import lu

def LU(A, b):
    # scipy's lu returns P, L, U with A = P L U; handling the permutation
    # explicitly keeps L genuinely lower triangular
    P, L, U = lu(A)
    b = dot(P.T, b)                  # L U x = P^T b
    # forward substitution: solve L y = P^T b
    y = zeros_like(b)
    for m, bi in enumerate(b.flatten()):
        y[m] = bi
        for n in range(m):
            y[m] -= y[n] * L[m, n]
        y[m] /= L[m, m]
    # backward substitution: solve U x = y
    x = zeros_like(b)
    for midx in range(b.size):
        m = b.size - 1 - midx
        x[m] = y[m]
        for nidx in range(midx):
            n = b.size - 1 - nidx
            x[m] -= x[n] * U[m, n]
        x[m] /= U[m, m]
    return x
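A usage sketch (assuming the LU function above; the 2 × 2 system is made up for illustration):

from numpy import array, allclose
from numpy.linalg import solve

A = array([[4.0, 3.0],
           [6.0, 3.0]])
b = array([10.0, 12.0])

print(LU(A, b))                          # [1. 2.]
print(allclose(LU(A, b), solve(A, b)))   # True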

Example 4.8 (Trefethen and Schreiber [1990]). Consider the LU decomposition of the following matrix:
\[
\begin{bmatrix}
1 & & & & 1 \\
-1 & 1 & & & 1 \\
-1 & -1 & 1 & & 1 \\
-1 & -1 & -1 & 1 & 1 \\
-1 & -1 & -1 & -1 & 1
\end{bmatrix}
\]
What is the largest element in the U matrix? What if we constructed larger matrices with the same structure? ♦

Example 4.9 (Hilbert’s Matrix). Consider the following matrix:
\[
A = \begin{bmatrix}
1 & 1/2 & 1/3 & \cdots \\
1/2 & 1/3 & 1/4 & \\
1/3 & 1/4 & 1/5 & \\
\vdots & & & \ddots
\end{bmatrix}
\]
How does Gauss-Jordan do when you try to solve the system where b_i = Σ_{j=1}^{n} A_{ij}? Why? ♦

4.1.3. The LDU Decomposition of A. Gauss-Jordan also provides the decomposition A = LDU′,

where L and U′ are unit lower and unit upper triangular and D = diag(u_ii), the diagonal matrix with u_11, …, u_nn as the diagonal entries.

To see this, decompose A = LU and let U′ = D⁻¹U. Since U is non-singular, u_ii ≠ 0 for i = 1, 2, …, n and hence D⁻¹ exists. It is easy to show that U′ := D⁻¹U is a unit upper triangular matrix. Thus,
\[
A = LU = LDD^{-1}U = LDU'.
\]
If A is symmetric then A = LDU′ = LDLᵀ, where L is unit lower triangular.

If A is symmetric and positive definite (that is, xᵀAx > 0 for all x ≠ 0) then each u_ii is positive and
\[
A = LDL^{\mathsf T} = L\sqrt{D}\sqrt{D}L^{\mathsf T} = CC^{\mathsf T},
\quad \text{where } C = L\sqrt{D} \text{ and } \sqrt{D} = \operatorname{diag}(\sqrt{u_{ii}}).
\]
This is called the Cholesky Factorization of A.
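A quick numerical illustration (numpy’s built-in Cholesky routine; the symmetric positive-definite matrix is chosen only for illustration): numpy returns the lower triangular factor C with A = CCᵀ.

from numpy import array, dot, allclose
from numpy.linalg import cholesky

A = array([[4.0, 2.0, 0.0],
           [2.0, 5.0, 1.0],
           [0.0, 1.0, 3.0]])    # symmetric positive definite

C = cholesky(A)                 # lower triangular factor
print(allclose(dot(C, C.T), A)) # True: A = C C^T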

4.2. Pivoting in Gauss-Jordan. In the preceding discussion of Gauss-Jordan we assumed that a_kk^{(k)} ≠ 0 at each stage of the process. If a_kk^{(k)} = 0 then we can interchange rows of the matrix A^{(k)} so that a_kk^{(k)} ≠ 0. In fact, we need only find a row i > k for which a_ik^{(k)} ≠ 0 and then interchange rows i and k. It can be easily shown that if A is non-singular then such a row exists. Hence, theoretically, zero pivots cause no difficulty.

However, there is a much more important reason for interchanging rows: if a_kk^{(k)} is small (even if a_kk^{(k)} ≠ 0) then division by a_kk^{(k)} would cause problems because of roundoff. We can see this in the next example. The problem with roundoff in Gauss-Jordan is that it propagates and is amplified from stage to stage because there is no contraction of error. For this reason, roundoff error control is absolutely essential in Gauss-Jordan. We will indicate briefly a couple of approaches to this, known as Partial Pivoting and Complete Pivoting.

Note: It can be shown that the step A^{(k)} → A^{(k+1)} in Gauss-Jordan may be viewed as multiplication by a matrix M^{(k)}, where M^{(k)} is a product of elementary matrices (by an elementary matrix we mean the matrix associated to an elementary row operation such as R_i^{(k+1)} := R_i^{(k)} − m_{ik} R_k^{(k)}, multiplication of a row by a constant, or swapping two rows). We will not cover this in detail.

It can be shown that if all the multipliers have magnitude (absolute value) < 1, then the final result will be accurate, as in our second approach to the above example. This is the basic idea of partial pivoting.

4.2.1. Partial Pivoting. At stage k we choose as the pivot the entry of largest absolute value in column k among rows k, …, n, and interchange that row with row k; every multiplier then satisfies |m_ik| ≤ 1.
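A minimal sketch of this strategy as a pivot-selection function compatible with the GaussJordan routine above (the name partialPivot is ours; the pivoting(A, row) interface is the one assumed by that code):

from numpy import argmax, abs

def partialPivot(A, row):
    # among rows row..n-1, pick the one whose entry in column `row`
    # has the largest absolute value; GaussJordan swaps it into place
    return row + int(argmax(abs(A[row:, row])))

It can then be used as GaussJordan(A, b, pivoting=partialPivot).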

4.3. Scaled Partial Pivoting. Scaled partial pivoting is a variation of standard partial pivoting. In scaled partial pivoting, at stage k, we choose as the pivot the entry in column k which is of greatest absolute value relative to the entries in its row (as before, we only consider rows k, …, n). The scaled pivoting approach is useful when entries have large differences in absolute value, since this causes propagation of roundoff error. We use it for systems of linear equations where the row entries vary greatly in magnitude, e.g.,
\[
\begin{bmatrix} 10 & 10^5 & 10^6 \\ 1 & 1 & -3 \end{bmatrix}.
\]
Here, it is worth interchanging the two rows, since the current pivot, 10, is larger than 1 but is very small relative to the other entries 10⁵ and 10⁶ in the first row. Without a row swap, roundoff errors will lead to loss of accuracy as in Example ??.
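The corresponding pivot-selection rule, again as a sketch with the same pivoting(A, row) interface (here each candidate entry is scaled by the largest absolute value in its own row; variants differ in exactly which entries are used for the scale factors):

from numpy import argmax, abs

def scaledPartialPivot(A, row):
    # scale factor for each remaining row: its largest absolute entry
    scales = abs(A[row:, :]).max(axis=1)        # assumes no zero rows (A non-singular)
    # pick the row whose entry in column `row` is largest relative to its scale
    return row + int(argmax(abs(A[row:, row]) / scales))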

4.4. Complete Pivoting. Complete pivoting (also called maximal pivoting) is a natural extension of partial pivoting whereby we find i* and j* such that
\[
|a_{i^*j^*}| = \max_{k \le i, j \le n} |a_{ij}|.
\]
This means that we interchange rows i* and k and columns j* and k. The row interchange does not have any effect (theoretically) on the solution, but the column interchange swaps the variable names (labels), i.e., x_{j*} ↔ x_k. These interchanges of columns must be recorded so that the correct variable is associated with the corresponding solution value at the end of the algorithm.

Complete pivoting is an O(n²) process at each stage k. Thus it adds O(n³) steps to Gauss-Jordan, which is a substantial increase, although the elimination is still Θ(n³). It is rarely used because it has been found in practice that partial pivoting is entirely adequate to ensure numerical stability, except in isolated cases: in these cases, complete pivoting may be needed to attain acceptable accuracy.

Exercise 4.10 (A Russian Matrix, Faddeev and Faddeeva [1960]). This innocent-looking matrix comes from a book by Faddeev and Faddeeva [1960]:
\[
A = \begin{bmatrix}
5 & 7 & 6 & 5 \\
7 & 10 & 8 & 7 \\
6 & 8 & 10 & 9 \\
5 & 7 & 9 & 10
\end{bmatrix}
\]
Solve the system Ax = b = (23, 32, 33, 31)ᵀ.

4.5. Direct Methods: Conclusions.
• In theory, the complexity can be decreased to that of matrix–matrix multiplication, cf. Ibarra et al. [1982].
• Complete pivoting is safe (proven), but so computationally expensive that it is not used.
• Partial pivoting is safe with high probability, particularly if the scaled version is used (experimental result, Trefethen and Schreiber [1990]).
• In practice, the various decompositions (LU, LDU, LUP, Cholesky, etc.) are of particular importance, as they often allow for elegant solutions of non-trivial problems.

5. Iterative methods for solving linear equations

Iterative methods successively improve an initial guess until it becomes satisfactory. The iterative solution of Ax = b requires the equation to be re-arranged into fixed point form as follows: x = T (x) := Cx + d.

Since subscripts are traditionally used to indicate components of a vector, we will use a superscript on the vector x to denote the iteration: x^k is the k-th “guess” or iterate of the solution vector x. Then x_i^k denotes the value of the i-th component x_i at the k-th iteration.

The convergence of iterative methods for solving Ax = b is usually restricted to diagonally dominant matrices, because:
• T is a contraction mapping ⟺ the spectral radius ρ(C) < 1, where ρ(C) is the largest modulus among C’s eigenvalues.
• A sufficient condition for this is that, for some matrix norm ‖·‖, we have ‖C‖ < 1. This is the case for strictly diagonally dominant matrices.
• Then Banach’s Fixed Point Theorem tells us that the sequence (x^k) defined by x^{k+1} := T(x^k) will converge to a unique limit x, the solution of Ax = b.

Assume the sequence (x^k) converges to the fixed point x and define e^{k+1} = x − x^{k+1}, the error at the (k+1)-th iteration. Then we have
\[
x - x^{k+1} = x - (Cx^k + d) = Cx + d - (Cx^k + d) = C(x - x^k),
\]
using that x is a fixed point and the linearity of matrix multiplication. Hence ‖e^{k+1}‖ ≤ ‖C‖ ‖e^k‖, i.e., linear order of convergence. It is obvious that the smaller ‖C‖ is, the faster the iterations converge to a solution.

5.1. Transforming Ax = b to x = Cx + d. A can be split to rewrite Ax = b in fixed point form x = Cx + d in a number of ways, including the Jacobi and Gauss-Seidel splittings.³ In both cases, because of the way C is derived from A, it turns out that if A is strictly diagonally dominant then ‖C‖∞ < 1 (or ‖C‖₁ < 1, for column dominance), and so our sufficient condition for convergence of the sequence (x^k) holds true.

5.2. Jacobi Method. This splits A as follows: Ax = (A − D + D)x = b, where D is the diagonal matrix formed from the diagonal elements of A. This leads to
\[
C = -D^{-1}(A - D) \quad \text{and} \quad d = D^{-1}b.
\]
Each component of the new vector x^{k+1} can be calculated using the original A and b as follows:
for i := 1 to n do
\[
x_i^{k+1} := \frac{1}{a_{ii}}\left(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{k} - \sum_{j=i+1}^{n} a_{ij}x_j^{k}\right).
\]
This iteration formula can be written in correction form as: for i := 1 to n do
\[
x_i^{k+1} := x_i^{k} + \frac{1}{a_{ii}}\left(b_i - \sum_{j=1}^{n} a_{ij}x_j^{k}\right).
\]

In terms of code:

from numpy import zeros_like, dot, allclose

def Jacobi(A, b, tol = 1e-10, limit = 100):
    x = zeros_like(b)
    for iteration in range(limit):
        next = zeros_like(x)
        for i in range(A.shape[0]):
            # split the i-th row into the parts before and after the diagonal
            s1 = dot(A[i, :i], x[:i])
            s2 = dot(A[i, i + 1:], x[i + 1:])
            next[i] = (b[i] - s1 - s2) / A[i, i]
        if allclose(x, next, atol=tol):
            break
        x = next
    return x
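A usage sketch on a strictly diagonally dominant system (made up for illustration; b must be a float array so that zeros_like produces floats):

from numpy import array, allclose
from numpy.linalg import solve

A = array([[10.0, 2.0, 1.0],
           [1.0, 8.0, 2.0],
           [2.0, 1.0, 9.0]])
b = array([13.0, 11.0, 12.0])

x = Jacobi(A, b)
print(allclose(x, solve(A, b), atol=1e-8))   # True: Jacobi converges here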

5.3. Gauss-Seidel Method. This splits A as follows: Ax = (L + D + U)x = b, where L, U and D are the matrices formed from the sub-diagonal, super-diagonal, and diagonal elements of A, respectively. This leads to
\[
C = -(D + L)^{-1}U \quad \text{and} \quad d = (D + L)^{-1}b.
\]
Each component of the new vector x^{k+1} can be calculated using the original A and b as follows:
for i := 1 to n do
\[
x_i^{k+1} := \frac{1}{a_{ii}}\left(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{k+1} - \sum_{j=i+1}^{n} a_{ij}x_j^{k}\right).
\]

In terms of code:

3The Jacobi iteration method is attributed to Carl Jacobi (1804–1851). The Gauss-Seidel method is attributed to Johann Carl Friedrich Gauss (1777–1855) and Philipp Ludwig von Seidel (1821–1896).

from numpy import zeros_like, dot, allclose

def GaussSeidel(A, b, tol = 1e-10, limit = 100):
    x = zeros_like(b)
    for iteration in range(limit):
        next = zeros_like(x)
        for i in range(A.shape[0]):
            # use the already-updated components next[:i] (Gauss-Seidel),
            # and the previous iterate for the remaining components
            s1 = dot(A[i, :i], next[:i])
            s2 = dot(A[i, i + 1:], x[i + 1:])
            next[i] = (b[i] - s1 - s2) / A[i, i]
        if allclose(x, next, rtol=tol):
            break
        x = next
    return x

Thus Gauss-Seidel uses a new component of x as soon as it becomes available, in contrast to the Jacobi method, which waits for all n new components before using any of them. The correction form of the Gauss-Seidel iteration formula is: for i := 1 to n do
\[
x_i^{k+1} := x_i^{k} + \frac{1}{a_{ii}}\left(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{k+1} - \sum_{j=i}^{n} a_{ij}x_j^{k}\right).
\]
In vector-matrix form this is
\[
x^{k+1} = x^k + D^{-1}\bigl(b - Lx^{k+1} - (D + U)x^k\bigr) = x^k + D^{-1}r^{k,k+1},
\]
where r^{k,k+1} is the ‘residual’ after the k-th iteration.
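The convergence condition ρ(C) < 1 from the start of this section can be checked directly. A sketch (numpy; the strictly diagonally dominant matrix is the one from the Jacobi usage example above) building both iteration matrices and comparing their spectral radii:

from numpy import array, diag, tril, triu, dot, abs
from numpy.linalg import eigvals, inv

A = array([[10.0, 2.0, 1.0],
           [1.0, 8.0, 2.0],
           [2.0, 1.0, 9.0]])

D = diag(diag(A))               # diagonal part
L = tril(A, -1)                 # strictly lower part
U = triu(A, 1)                  # strictly upper part

C_jacobi = -dot(inv(D), L + U)
C_gs = -dot(inv(D + L), U)

rho = lambda C: abs(eigvals(C)).max()
print(rho(C_jacobi), rho(C_gs))   # both < 1; Gauss-Seidel's is typically smaller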

Exercise 4.11. Consider the system 2x1 + 3x2 = 11, 5x1 + 7x2 = 13. Does Jacobi converge? Does Gauss-Seidel converge? Why?

Exercise 4.12 (Trefethen and Bau III [1997]). Consider the system x₂ = 1, x₁ + x₂ = 1. Does Gauss-Jordan converge? Why? Consider the system 10⁻²⁰x₁ + x₂ = 1, −x₁ = 1. What does Gauss-Jordan converge to? Why?

5.4. Iterative Methods: In Place of a Conclusion. All of the above methods have first order convergence, i.e., ‖e^{k+1}‖ ≤ ‖C‖ ‖e^k‖, where C depends on the method used. The similarities between the methods can be seen most easily if we write them in matrix correction form:
• x^{k+1} = x^k + D⁻¹(b − Ax^k) = x^k + D⁻¹r^k (Jacobi: here r^k is the residual after the k-th iteration);
• x^{k+1} = x^k + D⁻¹(b − Lx^{k+1} − (D + U)x^k) = x^k + D⁻¹r^{k,k+1} (Gauss-Seidel: here r^{k,k+1} is the ‘residual’ after the k-th iteration).
Thus the Jacobi and Gauss-Seidel methods use different approximations to the matrix A⁻¹. In both cases, the rate of convergence slows down as the condition number increases.

There are much more sophisticated iterative methods, including conjugate gradients (CG), generalised minimal residuals (GMRES), and numerous randomised methods; see Liesen and Strakos [2012], Gower and Richtárik [2015]. More importantly, there are sophisticated means of preconditioning, i.e., lowering the condition number. These fall outside of our scope, but we will provide the briefest of overviews of each.

If one draws an i.i.d. random matrix S ∈ R^{m×q} at each iteration, one can apply a projection step, where x^{k+1} is the best approximation of x* in a random affine space passing through x^k:
\[
x^{k+1} = \arg\min_{x\in\mathbb{R}^n} \|x - x^*\|_B^2 \quad \text{subject to} \quad x = x^k + B^{-1}A^{\mathsf T} S y, \; y \text{ free}, \tag{4.5}
\]
where B is an n × n positive definite matrix used to define the B-inner product and the induced B-norm by
\[
\langle x, y\rangle_B := \langle Bx, y\rangle, \qquad \|x\|_B := \sqrt{\langle x, x\rangle_B}, \tag{4.6}
\]
where ⟨·, ·⟩ is the standard Euclidean inner product. As it turns out, one can prove very strong convergence results for such methods.

CG and GMRES can be explained as Krylov subspace methods [Liesen and Strakos, 2012, Chapter 1], with iteration
\[
x^{k+1} := \arg\min_{x\in\mathbb{R}^n} \|x - x^*\|_B^2 \quad \text{subject to} \quad x \in x^0 + \mathcal{K}_{k+1}, \tag{4.7}
\]
where K_{k+1} ⊂ Rⁿ is a (k+1)-dimensional subspace and the constraint x ∈ x⁰ + K_{k+1} is an affine space that contains x⁰. GMRES uses B = AᵀA in the objective ‖x − x*‖²_B and CG uses B = A. Alternatively, one can think in terms of the Cayley–Hamilton theorem: for any invertible A there exists a polynomial q of degree n − 1 such that q(A) = A⁻¹. In each iteration, we increase the allowable degree by 1.

Many people solve P⁻¹(Ax − b) = 0 instead of Ax − b = 0, with the hope that P⁻¹A has a lower condition number than A:
\[
x^{k+1} = x^k - \gamma_k P^{-1}(Ax^k - b). \tag{4.8}
\]
A non-singular preconditioner P is often problem-specific and applied in a matrix-free fashion, i.e., without ever instantiating P. For example, the Jacobi preconditioner uses P = diag(A). Many other preconditioners approximate A⁻¹.
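A minimal sketch of the preconditioned iteration (4.8) with the Jacobi preconditioner P = diag(A) (the step size γ_k is fixed to a constant here purely for illustration; practical methods choose it adaptively):

from numpy import array, diag, dot
from numpy.linalg import norm, solve

A = array([[10.0, 2.0, 1.0],
           [1.0, 8.0, 2.0],
           [2.0, 1.0, 9.0]])
b = array([13.0, 11.0, 12.0])

P_inv = diag(1.0 / diag(A))        # Jacobi preconditioner: P = diag(A)
x = array([0.0, 0.0, 0.0])
gamma = 0.9                        # fixed step size (an assumption for this sketch)

for k in range(200):
    x = x - gamma * dot(P_inv, dot(A, x) - b)

print(norm(x - solve(A, b)))       # tiny: the preconditioned iteration converges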

Bibliography

D. K. Faddeev and V. N. Faddeeva. Computational Methods of Linear Algebra [Vychislitel’nye metody lineinoi algebry]. Fizmatgiz, Moscow, 1960.

Robert M. Gower and Peter Richtárik. Randomized iterative methods for linear systems. SIAM Journal on Matrix Analysis and Applications, 2015. Also arXiv:1506.03296.

Oscar H. Ibarra, Shlomo Moran, and Roger Hui. A generalization of the fast LUP matrix decomposition algorithm and applications. Journal of Algorithms, 3(1):45–56, 1982. ISSN 0196-6774. doi:10.1016/0196-6774(82)90007-4. URL http://www.sciencedirect.com/science/article/pii/0196677482900074.

Jörg Liesen and Zdeněk Strakoš. Krylov Subspace Methods: Principles and Analysis. Oxford University Press, 2012.

Lloyd N. Trefethen and David Bau III. Numerical Linear Algebra. SIAM, 1997.

Lloyd N. Trefethen and Robert S. Schreiber. Average-case stability of Gaussian elimination. SIAM Journal on Matrix Analysis and Applications, 11(3):335–360, 1990.

David S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons, 2004.
