6. Iterative Methods for Linear Systems
The stepwise approach to the solution ... by minimization
Miriam Mehl, TU München, January 14, 2013

6.3. Large Sparse Systems of Linear Equations II – Minimization Methods

Formulation as a Problem of Minimization

• One of the best-known solution methods, the method of conjugate gradients (cg), is based on a different principle than relaxation or smoothing. To see that, we will tackle the problem of solving systems of linear equations via a detour.

• In the following, let $A \in \mathbb{R}^{n \times n}$ be a symmetric and positive definite matrix, i.e. $A = A^T$ and $x^T A x > 0$ for all $x \neq 0$. In this case, solving the linear system $Ax = b$ is equivalent to minimizing the quadratic function
  $$f(x) := \frac{1}{2} x^T A x - b^T x + c$$
  for an arbitrary scalar constant $c \in \mathbb{R}$.

• As $A$ is positive definite, the hypersurface given by $z := f(x)$ defines a paraboloid in $\mathbb{R}^{n+1}$ with $n$-dimensional ellipsoids as isosurfaces $f(x) = \text{const}$, and $f$ has a global minimum at the solution $x$. The equivalence of the two problems is obvious:
  $$f'(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b = Ax - b = -r(x) = 0 \iff Ax = b.$$

A Simple Two-Dimensional Example

Compute the intersection point of the two lines given by $2x = 0$ and $y = 0$. This corresponds to the linear system
  $$\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
This is equivalent to minimizing the quadratic function
  $$f(x, y) = x^2 + \frac{1}{2} y^2.$$

[Figure: graph of $f(x)$ and isolines $f(x) = c$]

Clearly: every disturbance $e \neq 0$ of the solution $x$ of $Ax = b$ increases the value of $f$. Thus, the point $x$ is indeed the minimum of $f$:
  $$f(x + e) = \frac{1}{2} x^T A x + e^T A x + \frac{1}{2} e^T A e - b^T x - b^T e + c
             = f(x) + e^T (Ax - b) + \frac{1}{2} e^T A e
             = f(x) + \frac{1}{2} e^T A e > f(x).$$

The Method of Steepest Descent

• What do we gain by reformulating? We can now also apply optimization methods and thereby enlarge the set of possible solvers.

• Let's examine techniques for finding minima. An obvious possibility is provided by the method of steepest descent.

• The method of steepest descent tries to find an update in the direction of the steepest descent of our quadratic function
  $$f(x) = \frac{1}{2} x^T A x - b^T x + c,$$
  i.e., in the direction of the negative gradient
  $$-f'(x^{(i)}) = b - A x^{(i)} = r^{(i)}.$$
  Of course, it would be better to search in the direction of the error $e^{(i)} := x^{(i)} - x$, but the latter is unfortunately unknown.

• The steepest descent method searches for the one-dimensional minimum of $f$ in the direction $r^{(i)}$:
  $$\min_{\alpha_i > 0} f\bigl(x^{(i)} + \alpha_i r^{(i)}\bigr).$$

• Thus, $\alpha_i$ has to be chosen such that
  $$\frac{\partial}{\partial \alpha_i} f\bigl(x^{(i)} + \alpha_i r^{(i)}\bigr) = 0
    \iff \bigl(b - A(x^{(i)} + \alpha_i r^{(i)})\bigr)^T r^{(i)} = 0
    \iff \alpha_i = \frac{r^{(i)T} r^{(i)}}{r^{(i)T} A r^{(i)}}.$$
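As a quick numerical check of this optimal step length (a minimal sketch, assuming NumPy; the $2 \times 2$ matrix is the example from above, while the nonzero right-hand side and the test point are illustrative choices), one can verify that the step $x^{(i)} + \alpha_i r^{(i)}$ decreases $f$ and makes the new residual orthogonal to the search direction:

    import numpy as np

    # 2x2 example matrix from above, with a nonzero right-hand side for illustration
    A = np.array([[2.0, 0.0],
                  [0.0, 1.0]])
    b = np.array([1.0, 1.0])

    def f(x):
        """Quadratic functional f(x) = 1/2 x^T A x - b^T x (with c = 0)."""
        return 0.5 * x @ A @ x - b @ x

    x_i = np.array([2.0, -1.0])            # some current iterate x^(i)
    r = b - A @ x_i                        # residual = negative gradient of f
    alpha = (r @ r) / (r @ A @ r)          # optimal step length along r
    x_next = x_i + alpha * r

    print(f(x_next) < f(x_i))                      # True: the step decreases f
    print(np.isclose((b - A @ x_next) @ r, 0.0))   # True: new residual is orthogonal to r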
• Summarizing our results, we get the following steepest descent method: for $i = 0, 1, \dots$, repeat
  $$r^{(i)} := b - A x^{(i)}, \qquad
    \alpha_i := \frac{r^{(i)T} r^{(i)}}{r^{(i)T} A r^{(i)}}, \qquad
    x^{(i+1)} := x^{(i)} + \alpha_i r^{(i)}.$$

• Alternatively, start with $r^{(0)} := b - A x^{(0)}$ and repeat for $i = 0, 1, \dots$
  $$\alpha_i := \frac{r^{(i)T} r^{(i)}}{r^{(i)T} A r^{(i)}}, \qquad
    x^{(i+1)} := x^{(i)} + \alpha_i r^{(i)}, \qquad
    r^{(i+1)} := r^{(i)} - \alpha_i A r^{(i)},$$
  which saves one of the two matrix-vector products (the only really expensive step of the algorithm); a short code sketch of this variant is given below.

• Remember: the relaxation methods (Richardson, Jacobi, Gauss-Seidel, SOR) also use the residual as a direction for improving the solution:
  – Richardson: $x^{(i+1)} = x^{(i)} + r^{(i)}$,
  – Jacobi: $x^{(i+1)} = x^{(i)} + D_A^{-1} r^{(i)}$,
  – Gauss-Seidel: $x^{(i+1)} = x^{(i)} + (D_A + L_A)^{-1} r^{(i)}$,
  – SOR: $x^{(i+1)} = x^{(i)} + \bigl(\tfrac{1}{\alpha} D_A + L_A\bigr)^{-1} r^{(i)}$.

• In contrast to the relaxation methods, which multiply the residual with a matrix $M^{-1}$ before using it to update $x$, steepest descent updates $x^{(i)}$ via
  $$x^{(i+1)} = x^{(i)} + \alpha_i r^{(i)},$$
  i.e., it multiplies the residual only with a scalar, whose value is determined by a one-dimensional minimization in the direction of the residual.

Discussion of the Steepest Descent

• The convergence behavior of the method of steepest descent is poor. One of the few trivial special cases is the identity matrix: here, the isosurfaces are spheres, the gradient always points towards the center (the minimum), and we reach our goal in only one step! In general, we eventually get arbitrarily close to the minimum, but that can also take arbitrarily long (as we can always destroy part of what has already been achieved).

• To eliminate this drawback, we stick to our minimization approach but keep looking for better search directions. If all search directions were orthogonal, and if the error were orthogonal to all previous search directions after $i$ steps, then the results already achieved would never be lost again, and after at most $n$ steps we would be at the minimum – just as with a direct solver.
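The following is a minimal code sketch of the second (residual-updating) variant of steepest descent described above, assuming NumPy and a dense symmetric positive definite matrix; the function name steepest_descent, the tolerance tol and the iteration cap max_iter are illustrative choices, not part of the slides:

    import numpy as np

    def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
        """Steepest descent for A x = b with symmetric positive definite A.

        Uses the residual update r <- r - alpha * A r, so only one
        matrix-vector product is needed per iteration.
        """
        x = x0.copy()
        r = b - A @ x                      # r^(0) := b - A x^(0)
        for _ in range(max_iter):
            if np.linalg.norm(r) < tol:
                break
            Ar = A @ r                     # the only matrix-vector product
            alpha = (r @ r) / (r @ Ar)     # one-dimensional minimization along r
            x = x + alpha * r              # x^(i+1) := x^(i) + alpha_i r^(i)
            r = r - alpha * Ar             # r^(i+1) := r^(i) - alpha_i A r^(i)
        return x

    # Usage on the small example matrix from above, with a nonzero right-hand side:
    A = np.array([[2.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    print(steepest_descent(A, b, np.zeros(2)))     # ~ [0.5, 1.0]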
Conjugate Directions

• We will now look for alternative search directions $d^{(i)}$, $i = 0, 1, \dots$, instead of the residuals or negative gradients $r^{(i)}$:
  $$x^{(i+1)} := x^{(i)} + \alpha_i d^{(i)}.$$
  – For a short moment, let's think about the ideal case from above: let the current error $e^{(i)}$ be orthogonal to the subspace spanned by $d^{(0)}, \dots, d^{(i-1)}$. Then the new update $\alpha_i d^{(i)}$ has to be chosen appropriately in such a way that $d^{(i)}$ is orthogonal to $d^{(0)}, \dots, d^{(i-1)}$ as well and that the new error $e^{(i+1)}$ is additionally orthogonal to $d^{(i)}$, i.e.
    $$0 = d^{(i)T} e^{(i+1)} = d^{(i)T} \bigl(e^{(i)} + \alpha_i d^{(i)}\bigr)
      \;\Rightarrow\; \alpha_i := -\frac{d^{(i)T} e^{(i)}}{d^{(i)T} d^{(i)}}.$$
  – Even if someone gave us the new direction $d^{(i)}$ as a gift, it would be of no use, as we do not know $e^{(i)}$. Therefore, we once again consider the residuals instead of the errors and construct A-orthogonal or conjugate search directions instead of orthogonal ones.
  – Two vectors $u \neq 0$ and $v \neq 0$ are called A-orthogonal or conjugate if $u^T A v = 0$. The corresponding requirement for $e^{(i+1)}$ is now
    $$0 = d^{(i)T} A e^{(i+1)} = d^{(i)T} A \bigl(e^{(i)} + \alpha_i d^{(i)}\bigr)
      \;\Rightarrow\; \alpha_i := -\frac{d^{(i)T} A e^{(i)}}{d^{(i)T} A d^{(i)}} = \frac{d^{(i)T} r^{(i)}}{d^{(i)T} A d^{(i)}}.$$
  – Effectively, $A$ is "inserted" into the usual definition of orthogonality – in the case of the identity ($A = I$), nothing changes.

• Interestingly, the mentioned constraint is equivalent to looking for the minimum in the direction $d^{(i)}$, the way we did it for steepest descent:
  $$0 = \frac{\partial f}{\partial \alpha_i}\bigl(x^{(i+1)}\bigr)
      = f'\bigl(x^{(i+1)}\bigr)^T \frac{\partial x^{(i+1)}}{\partial \alpha_i}
      = -r^{(i+1)T} d^{(i)} = d^{(i)T} A e^{(i+1)}.$$

Conjugate Directions (2)

• In contrast to what has been said before, here all terms are known. Therefore, we get the following preliminary algorithm – 'preliminary' because we still don't know how to get the conjugate search directions $d^{(i)}$:
  – We begin with $d^{(0)} := r^{(0)} = b - A x^{(0)}$ and iterate for $i = 0, 1, \dots$:
    $$\alpha_i := \frac{d^{(i)T} r^{(i)}}{d^{(i)T} A d^{(i)}}, \qquad
      x^{(i+1)} := x^{(i)} + \alpha_i d^{(i)}, \qquad
      r^{(i+1)} := r^{(i)} - \alpha_i A d^{(i)}.$$
  – A comparison of this method with steepest descent shows the close relation between the two methods.

• Therefore, the only remaining problem is how to construct the conjugate directions. This can be done, for example, with a Gram-Schmidt A-process, which works completely analogously to the common Gram-Schmidt process: we start with a basis $u^{(0)}, \dots, u^{(n-1)}$ and subtract from $u^{(i)}$ all components that are not A-orthogonal to $u^{(0)}, \dots, u^{(i-1)}$.
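To illustrate the mechanics, here is a minimal sketch of a conjugate directions method combined with a Gram-Schmidt A-process (assuming NumPy). Note the assumptions: the slides start with $d^{(0)} := r^{(0)}$, which leads towards cg, whereas this sketch simply A-orthogonalizes the canonical unit vectors as the basis $u^{(0)}, \dots, u^{(n-1)}$; the name conjugate_directions is illustrative, and the code stores all previous directions, so it is meant to show the idea, not to be an efficient implementation:

    import numpy as np

    def conjugate_directions(A, b, x0):
        """Method of conjugate directions for A x = b (A symmetric positive definite).

        The search directions are built from the canonical basis by a
        Gram-Schmidt A-process: from u^(i) we subtract the components that are
        not A-orthogonal to the previous directions d^(0), ..., d^(i-1).
        """
        n = A.shape[0]
        x = x0.copy()
        r = b - A @ x
        directions = []
        for i in range(n):
            u = np.zeros(n)
            u[i] = 1.0                                 # starting basis vector u^(i)
            d = u.copy()
            for d_old in directions:
                d -= (u @ A @ d_old) / (d_old @ A @ d_old) * d_old   # A-orthogonalize
            directions.append(d)
            alpha = (d @ r) / (d @ A @ d)              # alpha_i = d^T r / d^T A d
            x = x + alpha * d                          # x^(i+1) := x^(i) + alpha_i d^(i)
            r = r - alpha * (A @ d)                    # r^(i+1) := r^(i) - alpha_i A d^(i)
        return x                                       # exact (up to round-off) after n steps

    # Usage on the small example matrix from above:
    A = np.array([[2.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    print(conjugate_directions(A, b, np.zeros(2)))     # ~ [0.5, 1.0]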