The Conjugate Gradient Method for Solving Linear Systems of Equations

Mike Rambo

Mentor: Hans de Moor

May 2016

Department of Mathematics, Saint Mary's College of California

Contents

1 Introduction
2 Relevant Definitions and Notation
3 Linearization of Partial Differential Equations
4 The Mechanisms of the Conjugate Gradient Method
  4.1 Steepest Descent
  4.2 Conjugate Directions
  4.3 Conjugate Gradient Method
5 Preconditioning
6 Efficiency
7 Conclusion
8 References

Abstract

The Conjugate Gradient Method is an iterative technique for solving large sparse systems of linear equations. As a linear algebra and matrix manipulation technique, it is a useful tool for approximating solutions to linearized partial differential equations. The fundamental concepts are introduced and serve as the foundation for the derivation of the Conjugate Gradient Method. Alongside these concepts, an intuitive geometric interpretation of the method's details is included for clarity. A detailed and rigorous analysis of the theorems underlying the Conjugate Gradient Method is presented. Finally, preconditioning of the system, an extension which improves the efficiency of the Conjugate Gradient Method, is discussed.

1 Introduction

The Method of Conjugate Gradients (cg-method) was initially introduced as a direct method for solving large systems of n simultaneous equations in n unknowns. Eduard Stiefel of the Institute of Applied Mathematics at Zürich and Magnus Hestenes of the Institute for Numerical Analysis met at a symposium held at the Institute for Numerical Analysis in August 1951, where one of the topics discussed was the solution of systems of simultaneous linear equations. Hestenes and Stiefel collaborated to write the first papers on the method, published as early as December 1952. During the next two decades the cg-method was utilized heavily, but it was not accepted by the numerical analysis community, due largely to a lack of understanding of the method's convergence behavior and speed. Hestenes and Stiefel had suggested the use of the cg-method as an iterative technique for well-conditioned problems; however, it was not until the early 1970s that researchers were drawn to the cg-method as an iterative method. In current research the cg-method is understood as an iterative technique, with a focus on improving and extending it to different classes of problems such as unconstrained optimization or singular matrices.

2 Relevant Definitions and Notation

Throughout this paper, the following definitions and terminology will be utilized for the cg-method.

Definition 2.1. Partial Differential Equation (PDE): an equation whose solution is a function of two or more independent variables and which involves the partial derivatives of that function.

Remark. The cg-method is utilized to estimate solutions to linear PDEs. Linear PDEs are PDEs in which the unknown function and its derivatives appear only to the first power.

A solution to a PDE can be found by linearizing it into a system of linear equations:

a_11 x_1 + a_12 x_2 + ··· + a_1n x_n = b_1
a_21 x_1 + a_22 x_2 + ··· + a_2n x_n = b_2
        ⋮
a_n1 x_1 + a_n2 x_2 + ··· + a_nn x_n = b_n

This system of equations will be written in matrix form Ax = b, where

A = [ a_11  a_12  ···  a_1n ]        x = [ x_1 ]        b = [ b_1 ]
    [ a_21  a_22  ···  a_2n ]            [ x_2 ]            [ b_2 ]
    [  ⋮     ⋮     ⋱    ⋮   ]            [  ⋮  ]            [  ⋮  ]
    [ a_n1  a_n2  ···  a_nn ]            [ x_n ]            [ b_n ]

The matrix A is referred to as the Coefficient Matrix, and x as the Solution Matrix.

Definition 2.2. A matrix is defined to be sparse if a large proportion of its entries are zero.

Definition 2.3. An n × n matrix A is defined to be symmetric if A = A^T.

Definition 2.4. An n × n matrix A is Positive Definite if x^T A x > 0 for every n-dimensional column matrix x ≠ 0, where x^T is the transpose of the vector x.

Remark. The notation ⟨x, Ax⟩ is commonly used. This notation is called the inner product, and ⟨x, Ax⟩ = x^T A x.

Typically, systems of equations arising from linearizing PDEs yield coefficient matrices which are sparse. However, when discussing large sparse matrix solutions, it is not sufficient to know only that the coefficient matrix is sparse. Systems arising from linear PDEs have coefficient matrices whose non-zero entries lie on a few diagonals, with all other entries zero. When programming the cg-method, a programmer can exploit this sparse structure and store only the non-zero diagonals, reducing storage and improving efficiency. In order for the cg-method to yield a solution, the coefficient matrix needs to be symmetric and positive definite. Because inner product notation is utilized throughout the rest of this paper, it is useful to know some of the basic properties of the inner product.

Theorem 2.1. [4] For any vectors x, y, and z and any real number α, the following are true:

1. ⟨x, y⟩ = ⟨y, x⟩

2. ⟨αx, y⟩ = ⟨x, αy⟩ = α⟨x, y⟩

3. ⟨x + z, y⟩ = ⟨x, y⟩ + ⟨z, y⟩

4. ⟨x, x⟩ ≥ 0

5. ⟨x, x⟩ = 0 if and only if x = 0
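As a brief illustration of these definitions (not part of the original development), the following Python sketch checks symmetry, the positive definite condition ⟨x, Ax⟩ > 0 for one sample vector, and properties 1-3 of Theorem 2.1; the matrix, vectors, and scalar are arbitrary choices made for the example.

import numpy as np

# A small symmetric positive definite sample matrix (arbitrary choice).
A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])

def inner(u, v):
    # <u, v> written as the matrix product u^T v.
    return float(u.T @ v)

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
z = np.array([2.0, 0.0, -3.0])
alpha = 1.7

assert np.allclose(A, A.T)                                       # A is symmetric
assert inner(x, A @ x) > 0                                       # x^T A x > 0 for this nonzero x
assert np.isclose(inner(x, y), inner(y, x))                      # property 1
assert np.isclose(inner(alpha * x, y), alpha * inner(x, y))      # property 2
assert np.isclose(inner(x + z, y), inner(x, y) + inner(z, y))    # property 3
print("inner-product checks passed")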

Definition 2.5. The Gradient of a function F is ∇F, where ∇ (called the Del operator) is the vector of partial derivative operators with respect to each variable of F.

E.g. For a function F(x, y, z), the operator ∇ is x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z, and ∇F can be written as

∇F = ( x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z ) F
   = (∂F/∂x) x̂ + (∂F/∂y) ŷ + (∂F/∂z) ẑ.

Remark. The gradient will be used to find the direction in which a scalar-valued function of many variables increases most rapidly.

Theorem 2.2. [9] Let the scalar σ be defined by

σ = b^T x,

where b and x are n × 1 matrices and b is independent of x. Then

∂σ/∂x = b.

Theorem 2.3. [9] Let σ be defined by

σ = x^T A x,

where A is an n × n matrix, x is an n × 1 matrix, and A is independent of x. Then

∂σ/∂x = (A^T + A) x.
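The two matrix-derivative results above can be sanity-checked numerically. The sketch below (an illustration, not from [9]) compares the stated gradients with central finite-difference approximations; the helper numeric_gradient and the sample A, b, x are hypothetical choices for the demonstration.

import numpy as np

def numeric_gradient(f, x, h=1e-6):
    # Central-difference approximation of the gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([20.0, 10.0, -10.0])
x = np.array([1.0, 2.0, 3.0])

# Theorem 2.2: the gradient of b^T x is b.
g1 = numeric_gradient(lambda v: b @ v, x)
# Theorem 2.3: the gradient of x^T A x is (A^T + A) x.
g2 = numeric_gradient(lambda v: v @ A @ v, x)

print(np.allclose(g1, b, atol=1e-4))             # True
print(np.allclose(g2, (A.T + A) @ x, atol=1e-4)) # True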

3 Linearization of Partial Differential Equations

There are several established methods to linearize PDEs. In the following example, the one-dimensional heat equation is linearized using the Forward Difference Method; note that this process works for all linear PDEs. Consider the one-dimensional heat equation:

∂φ/∂t (x, t) = α ∂²φ/∂x² (x, t);   0 < x < l,  t > 0

This boundary value problem has the property that the endpoints remain constant at 0 throughout time, so φ(0, t) = φ(l, t) = 0. Also, at time t = 0, φ(x, 0) = f(x), which describes the whole system for 0 ≤ x ≤ l. The forward difference method utilizes Taylor series expansion. A Taylor expansion yields infinitely many terms, so Taylor's formula with remainder, which allows the truncation of the Taylor series at n derivatives with an error R, is also needed.

Theorem 3.1. [6] (Taylor's formula with remainder) If f and its first n derivatives are continuous on a closed interval I : {a ≤ x ≤ b}, and if f^(n+1)(x) exists at each point of the open interval J : {a < x < b}, then for each x in I there is a β in J for which

R_{n+1} = [ f^(n+1)(β) / (n + 1)! ] (x − a)^(n+1).
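As a small numerical illustration of this remainder (not part of the original text), the check below truncates the Taylor series of sin x about a = 0 after n = 3 derivatives and confirms that the actual truncation error is within the bound |x|^4/4! implied by Theorem 3.1, since every derivative of sin is bounded by 1.

import math

# Truncate the Taylor series of sin(x) about a = 0 after n = 3 derivatives.
# Theorem 3.1 gives R_4 = f''''(beta)/4! * x^4 for some beta, and |f''''| <= 1,
# so the truncation error is at most |x|^4 / 4!.
x = 0.5
taylor = x - x**3 / 6                      # sin(x) through the x^3 term
error = abs(math.sin(x) - taylor)
bound = abs(x)**4 / math.factorial(4)
print(error <= bound)                      # True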

First, the x-axis is split into s equal segments, giving a step size of Δx = l/s. For a time step of size Δt, note that (x_i, t_j) = (iΔx, jΔt) for 0 ≤ i ≤ s and j ≥ 0. Expanding φ using a Taylor series in t,

φ(x_i, t_j + Δt) = φ(x_i, t_j) + Δt ∂φ/∂t |_{t_j} + (Δt²/2!) ∂²φ/∂t² |_{t_j} + ···

Solving for ∂φ/∂t,

∂φ/∂t |_{t_j} = [φ(x_i, t_j + Δt) − φ(x_i, t_j)] / Δt − (Δt/2!) ∂²φ/∂t² |_{t_j} − ···

By Taylor's formula with remainder, the higher order terms can be replaced with (Δt/2) ∂²φ/∂t² (x_i, τ_j) for some τ_j ∈ (t_j, t_{j+1}), so ∂φ/∂t is reduced to

∂φ/∂t |_{t_j} = [φ(x_i, t_j + Δt) − φ(x_i, t_j)] / Δt − (Δt/2) ∂²φ/∂t² |_{τ_j}

By a similar Taylor expansion,

∂²φ/∂x² = [φ(x_i + Δx, t_j) − 2φ(x_i, t_j) + φ(x_i − Δx, t_j)] / Δx² − (Δx²/12) ∂⁴φ/∂x⁴ (χ, t_j),   χ ∈ (x_{i−1}, x_{i+1})

By substituting back into the heat equation and writing φ(x_i, t_j) as Φ_{i,j},

(Φ_{i,j+1} − Φ_{i,j}) / Δt = α (Φ_{i+1,j} − 2Φ_{i,j} + Φ_{i−1,j}) / Δx²

with an error of

ξ = (Δt/2) ∂²φ/∂t² (x_i, τ_j) − α (Δx²/12) ∂⁴φ/∂x⁴ (χ_i, t_j)

Solving for Φ_{i,j+1} in terms of the values at time step j gives

Φ_{i,j+1} = (1 − 2αΔt/Δx²) Φ_{i,j} + α (Δt/Δx²) (Φ_{i+1,j} + Φ_{i−1,j})   for each i = 1, 2, ···, s − 1 and j = 0, 1, ···

Now, at the initial time level j = 0, the first vector for the given system is:

Φ_{0,0} = f(x_0),  Φ_{1,0} = f(x_1),  ...,  Φ_{s,0} = f(x_s)

Building the next vectors,

Φ_{0,1} = φ(0, t_1) = 0

Φ_{1,1} = (1 − 2αΔt/Δx²) Φ_{1,0} + α (Δt/Δx²) (Φ_{2,0} + Φ_{0,0})
   ⋮
Φ_{s−1,1} = (1 − 2αΔt/Δx²) Φ_{s−1,0} + α (Δt/Δx²) (Φ_{s,0} + Φ_{s−2,0})

Φ_{s,1} = φ(l, t_1) = 0

By continuing to build these vectors in s, there is a pattern which can be written as the matrix equation θ^(j) = Aθ^(j−1) for each successive j > 0, where A is the (s − 1) × (s − 1) matrix

A = [ 1 − 2β     β        0      ···      0     ]
    [    β     1 − 2β     β      ···      ⋮     ]
    [    0        β     1 − 2β    ⋱       0     ]
    [    ⋮        ⋮       ⋱       ⋱       β     ]
    [    0       ···      0       β     1 − 2β  ]

where β = αΔt/Δx². A acts on the vector θ^(j) = (Φ_{1,j}, Φ_{2,j}, ···, Φ_{s−1,j})^T and produces θ^(j+1), the vector after the next time step. A is a tri-diagonal matrix; its remaining upper and lower triangular entries are all zero. For any linear PDE, the matrix equations derived using finite difference methods will have a coefficient matrix A which is symmetric, positive definite and sparse. Recall that because the non-zero terms of A lie on three diagonals and the rest of the entries are zero, a programmer can store just the diagonals as vectors, which makes the space complexity of this matrix for the cg-method small.
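To make the discretization concrete, the following Python sketch assembles the (s − 1) × (s − 1) tridiagonal matrix above and advances θ^(j) = Aθ^(j−1) for a few time steps. The values of α, l, s, Δt and the initial condition f(x) = sin(πx/l) are illustrative assumptions, not taken from the paper.

import numpy as np

# Illustrative parameters: alpha, rod length l, s spatial segments, time step dt.
alpha, l, s, dt = 1.0, 1.0, 10, 0.004
dx = l / s
beta = alpha * dt / dx**2            # beta = alpha*dt/dx^2; beta <= 1/2 keeps the explicit scheme stable

# (s-1) x (s-1) tridiagonal matrix: (1 - 2*beta) on the diagonal, beta beside it.
A = np.diag((1 - 2 * beta) * np.ones(s - 1)) \
  + np.diag(beta * np.ones(s - 2), 1) \
  + np.diag(beta * np.ones(s - 2), -1)

x_interior = np.linspace(dx, l - dx, s - 1)
theta = np.sin(np.pi * x_interior / l)    # theta^(0): interior values of f(x) = sin(pi x / l)

for j in range(50):                       # advance 50 time steps
    theta = A @ theta                     # theta^(j+1) = A theta^(j)

print(theta.round(4))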

4 The Mechanisms of the Conjugate Gradient Method

4.1 Steepest Descent

There are a few fundamental techniques utilized to find solutions to simultaneous systems of equations derived from linear PDEs. A first attempt may be to try direct methods, such as Gaussian elimination, which yield exact solutions to the system. However, for large systems, Gaussian elimination destroys the sparse characteristic of the matrix, making the computation very time and memory intensive. The next theorem illustrates the geometry underlying the cg-method; a proof will be given later.

Theorem 4.1. [4] The solution vector x* to the system Ax = b, where A is positive definite and symmetric, is the minimizer of the quadratic form f(x) = (1/2)⟨x, Ax⟩ − ⟨x, b⟩.

As an illustration, consider a simple 2 × 2 system. The example provided by equation (1) is pictured in (figure 1).

[ 2  1 ] [ x ]   [ 3 ]
[ 1  1 ] [ y ] = [ 2 ]        (1)

This system represents the lines

2x + y = 3
x + y = 2

For this example, the function f(x) from Theorem 4.1 traces out a paraboloid in a three-dimensional graph (figure 2). Note that no generality is lost for higher dimensions; the cg-method applies in dimensions that cannot be represented geometrically.

Figure 1: A system of two equations and two unknowns; the red equation is 2x + y = 3, the blue equation is x + y = 2

Note also that Theorem 4.1 leads to the fact that the minimization of f(x) yields the solution to the system of interest. For a 2 × 2 system, the vertex is the minimum value of the paraboloid (figure 3). As noted earlier, direct methods are very time consuming; however, the paraboloid suggests less direct, iterative routes to the solution. Several iterative algorithms produce the minimal value. The method presented here utilizes the method of Steepest Descent. The concept of steepest descent is simple: start at an arbitrary point anywhere on the paraboloid and follow the steepest curve down to the minimum of the paraboloid. Start at an arbitrary point x^(0) and then take a series of iterations x^(1), x^(2), ···. Stop when x^(i) is sufficiently close to the solution x*.

Figure 2: The paraboloid formed from the quadratic form of Theorem 4.1 for example (1)

For any estimate x_i, the solution x* can be found by taking the direction in which the quadratic form decreases most rapidly, that is, by taking the negative gradient −∇f(x_i). By a direct computation,

−∇f(x) = −∇( (1/2)⟨x, Ax⟩ − ⟨b, x⟩ )

       = −∇( (1/2) x^T A x − b^T x )

       = −( (1/2) A^T x + (1/2) A x − b )        (by Theorem 2.2 and Theorem 2.3)

Since A is symmetric, A^T x = Ax, so this simplifies further to

       = −(Ax − b)

−∇f(x) = b − Ax        (2)

Figure 3: Contour map of the paraboloid for the linear system from example (1), with the solution at the vertex of the paraboloid

So for each x_i, −∇f(x_i) = b − Ax_i. Note that this direction of steepest descent is exactly the residual. A residual is the difference between the real value and the estimated value. In this case the real value is b and the estimated value is Ax_i, so the residual is r_i = b − Ax_i. This residual indicates how far the approximate step is from the value of b. For the best convergence to the solution it is ideal to travel in the direction of steepest descent, so each subsequent step is some scalar multiple of the direction of steepest descent; how far to travel in that direction must still be determined. The equation is then

x_1 = x_0 + s_0 r_0

where s is the scalar for how far each iteration descends in the direction of the residual. For successive steps, this equation is

x_{i+1} = x_i + s_i r_i

In each successive step the search direction d_{i+1} = r_i is chosen. In order to solve for each s_i, a procedure known as a line search evaluates s to minimize the value of f along the line in the direction of steepest descent. First, consider f(x_{i+1}) and minimize over s_i, that is, take the derivative of f(x_{i+1}) with respect to s_i:

(d/ds_i) f(x_{i+1}) = f′(x_{i+1})^T (d/ds_i) x_{i+1}        (Chain Rule)

                    = f′(x_{i+1})^T (d/ds_i) (x_i + s_i r_i)

                    = f′(x_{i+1})^T r_i        (3)

To find the minimizing s_i, set equation (3) equal to zero:

f′(x_{i+1})^T r_i = 0

Recognize that because f′(x_{i+1}) and r_i are vectors whose product is 0, either one vector is zero or they are orthogonal. Since neither vector can be zero, they must be orthogonal. As indicated earlier,

−f′(x_{i+1}) = b − Ax_{i+1}, so

f′(x_{i+1})^T r_i = 0

(b − Ax_{i+1})^T r_i = 0

[b − A(x_i + s_i r_i)]^T r_i = 0

[b − Ax_i − s_i (A r_i)]^T r_i = 0

(b − Ax_i)^T r_i − s_i (A r_i)^T r_i = 0

(b − Ax_i)^T r_i = s_i (A r_i)^T r_i

(b − Ax_i)^T r_i / (A r_i)^T r_i = s_i

r_i^T r_i / (A r_i)^T r_i = s_i

Another way of writing s_i is in inner product notation:

s_i = ⟨r_i, r_i⟩ / ⟨A r_i, r_i⟩        (4)
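The step size in equation (4) can be checked directly: along the residual direction it should do at least as well as any other step length. The short Python sketch below does this for the 2 × 2 example system (1); the brute-force scan and its range are arbitrary choices for the illustration.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
b = np.array([3.0, 2.0])

def f(v):
    # Quadratic form from Theorem 4.1: f(x) = 1/2 <x, Ax> - <x, b>.
    return 0.5 * v @ A @ v - v @ b

x = np.array([0.0, 0.0])          # arbitrary starting point
r = b - A @ x                     # residual = direction of steepest descent
s = (r @ r) / (r @ (A @ r))       # step size from equation (4)

# A coarse scan over other step lengths should never beat s.
candidates = np.linspace(-2.0, 2.0, 2001)
best = min(f(x + t * r) for t in candidates)
print(f(x + s * r) <= best + 1e-12)     # True: s minimizes f along r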

First, r_0 is calculated; then the process is repeated using r_i = b − Ax_i in order to calculate x_{i+1}, which is computed by x_{i+1} = x_i + s_i r_i where s_i = ⟨r_i, r_i⟩ / ⟨A r_i, r_i⟩. A quick consideration of the computational algorithm shows that it is dominated by two matrix products, first Ax_i, then Ar_i. The product Ax_i can be eliminated as follows:

x_{i+1} = x_i + s_i r_i

−A x_{i+1} = −A (x_i + s_i r_i)

−A x_{i+1} = −A x_i − s_i A r_i

b − A x_{i+1} = b − A x_i − s_i A r_i

r_{i+1} = r_i − s_i A r_i

Now it is only necessary to compute the single product A r_i in both equations. What the algorithm gains in efficiency it loses in accuracy: the residual is no longer computed directly from x_i, so the error may increase. This suggests that recomputing r_i = b − Ax_i at regular intervals helps reduce the error.
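A minimal Python sketch of the steepest descent iteration described above is given below, including the recursive residual update r_{i+1} = r_i − s_i A r_i and an occasional direct recomputation of r_i = b − Ax_i to limit accumulated error. The tolerance, iteration cap, and recomputation interval are illustrative choices.

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000, recompute_every=50):
    # Steepest descent for Ax = b with A symmetric positive definite.
    x = x0.astype(float).copy()
    r = b - A @ x                          # initial residual
    for i in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r
        s = (r @ r) / (r @ Ar)             # step size from equation (4)
        x = x + s * r
        if (i + 1) % recompute_every == 0:
            r = b - A @ x                  # occasional exact residual (controls drift)
        else:
            r = r - s * Ar                 # recursive update r_{i+1} = r_i - s_i A r_i
    return x

A = np.array([[2.0, 1.0], [1.0, 1.0]])
b = np.array([3.0, 2.0])
print(steepest_descent(A, b, np.zeros(2)))   # approaches the solution [1, 1]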

Before moving on to improvements, the proof of Theorem 4.1 follows.

Theorem 4.1. [4] The solution vector x* to the system Ax = b, where A is positive definite and symmetric, is the minimizer of the quadratic form f(x) = (1/2)⟨x, Ax⟩ − ⟨x, b⟩.

Proof: Let x and d be fixed vectors not equal to 0 and let m ∈ R. Then

f(x + md) = (1/2)⟨x + md, A(x + md)⟩ − ⟨x + md, b⟩

          = (1/2)⟨x, Ax⟩ + (m/2)⟨d, Ax⟩ + (m/2)⟨x, Ad⟩ + (m²/2)⟨d, Ad⟩ − ⟨x, b⟩ − m⟨d, b⟩

          = (1/2)⟨x, Ax⟩ − ⟨x, b⟩ + m⟨d, Ax⟩ − m⟨d, b⟩ + (m²/2)⟨d, Ad⟩        (since A is symmetric, ⟨x, Ad⟩ = ⟨d, Ax⟩)

Notice that (1/2)⟨x, Ax⟩ − ⟨x, b⟩ = f(x) and m⟨d, Ax⟩ − m⟨d, b⟩ = m⟨d, Ax − b⟩ = −m⟨d, b − Ax⟩, so this can be rewritten as

f(x + md) = f(x) − m⟨d, b − Ax⟩ + (m²/2)⟨d, Ad⟩

Because x and d are fixed vectors, the quadratic function g(m) is defined by

g(m) = f(x + md)

The minimum value of g(m) occurs where g′(m) = 0, because the coefficient of m² is positive. Then

g′(m) = 0 = −⟨d, b − Ax⟩ + m⟨d, Ad⟩

solving for m,

m_0 = ⟨d, b − Ax⟩ / ⟨d, Ad⟩

Substituting m_0 into g(m),

g(m_0) = f(x + m_0 d)

       = f(x) − m_0⟨d, b − Ax⟩ + (m_0²/2)⟨d, Ad⟩

       = f(x) − ( ⟨d, b − Ax⟩ / ⟨d, Ad⟩ ) ⟨d, b − Ax⟩ + (1/2) ( ⟨d, b − Ax⟩ / ⟨d, Ad⟩ )² ⟨d, Ad⟩

       = f(x) − (1/2) ⟨d, b − Ax⟩² / ⟨d, Ad⟩

Because ⟨d, b − Ax⟩² / ⟨d, Ad⟩ is always positive or zero, for any vector d ≠ 0 either f(x + m_0 d) < f(x), or, when ⟨d, b − Ax⟩ = 0, f(x + m_0 d) = f(x). Suppose that x* satisfies Ax* = b. Then ⟨d, b − Ax*⟩ = 0 for every d, so f(x* + m_0 d) = f(x*) for every direction d, and f cannot be made any smaller than f(x*). Thus, x* minimizes f(x).

Next, suppose that x* minimizes f(x). Then for any vector d, f(x* + m_0 d) ≥ f(x*), so ⟨d, b − Ax*⟩ = 0. Since d is arbitrary and non-zero, this implies that b − Ax* = 0. Therefore Ax* = b.
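Theorem 4.1 can also be observed numerically: the solution of Ax = b should give a value of the quadratic form no larger than that of any nearby point. The sketch below checks this for the 3 × 3 matrix used later in Section 4.3; the random perturbations are an arbitrary testing choice.

import numpy as np

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([20.0, 10.0, -10.0])

def f(v):
    # Quadratic form from Theorem 4.1.
    return 0.5 * v @ A @ v - v @ b

x_star = np.linalg.solve(A, b)            # exact solution of Ax = b

# Random perturbations never push f below f(x_star) (Theorem 4.1).
rng = np.random.default_rng(0)
perturbed = [f(x_star + rng.normal(size=3)) for _ in range(1000)]
print(all(val >= f(x_star) for val in perturbed))    # True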

4.2 Conjugate Directions

The algorithm just derived is the steepest descent algorithm. This algorithm is not sufficient for the cg-method because its convergence is slow. The convergence is slow because, by following the negative gradient, the search directions are often chosen in the same directions as previous search directions. In order to improve the convergence, the condition that a search direction is never in the same direction as any previous search direction is imposed on the algorithm. The condition is to choose a set of nonzero vectors {d^(1), ···, d^(n)} such that ⟨d^(i), Ad^(j)⟩ = 0 if i ≠ j. This is known as the A-orthogonality condition. Simply choosing the vectors to be orthogonal is not sufficient: after choosing a starting point and an initial search direction, there would be only a single vector orthogonal to the search direction which leads to the solution, and in order to choose that orthogonal vector and both scalars it would be necessary to know the solution already. Clearly, this is not a viable approach.

Imposing A-orthogonality on the steepest descent algorithm and utilizing Theorem 4.1, the new s_i are given by

s_i = ⟨d^(i), r^(i)⟩ / ⟨d^(i), Ad^(i)⟩   for r^(i) = b − Ax^(i)

The next step is still calculated by

x^(i+1) = x^(i) + s_i d^(i)

The next theorem shows the full power of this choice of search directions.

Theorem 4.2. [4] Let {d(1), ··· , d(n)} be an A-orthogonal set of nonzero vectors associated with the positive definite symmetric matrix A, and let x(0) be arbitrary. Define

s_k = ⟨d^(k), b − Ax^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩        x^(k) = x^(k−1) + s_k d^(k)

for k = 1, 2, ··· , n. Then, assuming exact arithmetic, Ax(n) = b.

Proof : Consider the A-orthogonal set associated with a positive definite symmetric matrix A, {d(1), ··· , d(n)}. Let x(0) be arbitrary. So, for the nth vector,

x^(n) = x^(n−1) + s_n d^(n)        (5)

The x^(n−1) in equation (5) can be expanded to x^(n−1) = x^(n−2) + s_{n−1} d^(n−1). By continuing this process, the x^(k) terms can be continually replaced with x^(k−1) + s_k d^(k) to obtain

x^(n) = x^(0) + s_1 d^(1) + s_2 d^(2) + ··· + s_n d^(n)

multiplying both sides by matrix A, and subtracting b from both sides,

Ax^(n) − b = Ax^(0) − b + s_1 Ad^(1) + s_2 Ad^(2) + ··· + s_n Ad^(n)

Now take the inner product of both sides with the vector d^(k). The A-orthogonality property says that for any m ≠ k with 1 ≤ m ≤ n, ⟨Ad^(m), d^(k)⟩ = 0. So,

⟨Ax^(n) − b, d^(k)⟩ = ⟨Ax^(0) − b, d^(k)⟩ + s_k⟨Ad^(k), d^(k)⟩

By the fact that A is symmetric this can be rewritten as

⟨Ax^(n) − b, d^(k)⟩ = ⟨Ax^(0) − b, d^(k)⟩ + s_k⟨d^(k), Ad^(k)⟩

Since s_k = ⟨d^(k), b − Ax^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩, then s_k⟨d^(k), Ad^(k)⟩ = ⟨d^(k), b − Ax^(k−1)⟩. Expanding the right hand side,

⟨d^(k), b − Ax^(k−1)⟩ = ⟨d^(k), b − Ax^(0) + Ax^(0) − Ax^(1) + Ax^(1) − ··· + Ax^(k−2) − Ax^(k−1)⟩

= ⟨d^(k), b − Ax^(0)⟩ + ⟨d^(k), Ax^(0) − Ax^(1)⟩ + ··· + ⟨d^(k), Ax^(k−2) − Ax^(k−1)⟩        (6)

For any i

x^(i) = x^(i−1) + s_i d^(i)

Ax^(i) = Ax^(i−1) + s_i Ad^(i)

Ax^(i−1) − Ax^(i) = −s_i Ad^(i)        (7)

Substituting equation (7) into equation (6),

⟨d^(k), b − Ax^(k−1)⟩ = ⟨d^(k), b − Ax^(0)⟩ − s_1⟨d^(k), Ad^(1)⟩ − ··· − s_{k−1}⟨d^(k), Ad^(k−1)⟩

By A-orthogonality, ⟨d^(k), Ad^(m)⟩ = 0 if k ≠ m, so

s_k⟨d^(k), Ad^(k)⟩ = ⟨d^(k), b − Ax^(k−1)⟩ = ⟨d^(k), b − Ax^(0)⟩

Thus,

⟨Ax^(n) − b, d^(k)⟩ = ⟨Ax^(0) − b, d^(k)⟩ + s_k⟨Ad^(k), d^(k)⟩

                    = ⟨Ax^(0) − b, d^(k)⟩ + ⟨d^(k), b − Ax^(0)⟩

                    = ⟨Ax^(0) − b, d^(k)⟩ − ⟨d^(k), Ax^(0) − b⟩

                    = ⟨Ax^(0) − b, d^(k)⟩ − ⟨Ax^(0) − b, d^(k)⟩

                    = 0

Therefore ⟨Ax^(n) − b, d^(k)⟩ = 0 for every k = 1, ···, n; that is, Ax^(n) − b is orthogonal to the entire A-orthogonal set of vectors. Because the d^(k) are nonzero and A-orthogonal with respect to a positive definite matrix, they are linearly independent and span R^n, so the only vector orthogonal to all of them is the zero vector. Hence Ax^(n) − b = 0, and therefore Ax^(n) = b.
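Theorem 4.2 is easy to observe numerically. The Python sketch below first builds an A-orthogonal set by a Gram-Schmidt-style A-orthogonalization of the standard basis (one possible construction; the theorem only assumes such a set exists) and then verifies that n conjugate-direction steps starting from an arbitrary x^(0) reproduce the solution of Ax = b.

import numpy as np

def a_orthogonalize(A, vectors):
    # Gram-Schmidt-style process producing an A-orthogonal set
    # (an illustrative construction, not prescribed by the theorem).
    ds = []
    for v in vectors:
        d = v.astype(float).copy()
        for dj in ds:
            d -= (d @ A @ dj) / (dj @ A @ dj) * dj
        ds.append(d)
    return ds

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([20.0, 10.0, -10.0])
d_set = a_orthogonalize(A, list(np.eye(3)))

x = np.array([1.0, -1.0, 2.0])           # arbitrary x^(0)
for d in d_set:                          # n conjugate-direction steps
    s = (d @ (b - A @ x)) / (d @ A @ d)  # s_k from Theorem 4.2
    x = x + s * d

print(np.allclose(A @ x, b))             # True: Ax^(n) = b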

4.3 Conjugate Gradient Method

A method utilizing an A-orthogonal set of vectors is called a Conjugate Direction method. The cg-method is a conjugate direction method which chooses the search directions so that the residual vectors are mutually orthogonal. Recall that the residual is defined by r^(k) = b − Ax^(k). Hestenes and Stiefel chose the search directions d^(k+1) in the cg-method so that the residual vectors r^(k) are mutually orthogonal. As in steepest descent, the algorithm starts with an initial approximation x^(0) and takes the first direction to be the steepest descent residual, d^(1) = r^(0) = b − Ax^(0). Each approximation is computed by x^(k) = x^(k−1) + s_k d^(k). Applying the A-orthogonality condition from conjugate directions in order to speed up convergence, ⟨d^(i), Ad^(j)⟩ = 0 and ⟨r^(i), r^(j)⟩ = 0 when i ≠ j. There are two stopping conditions. The first is calculating a residual equal to zero, which means

r^(k) = b − Ax^(k)

0 = b − Ax^(k)

Ax^(k) = b

This implies that the solution is x^(k). The second stopping condition is met when the residual is sufficiently close to 0, within some specified tolerance. The next search direction d^(k+1) is computed using the residual from the previous iteration r^(k) by

d^(k+1) = r^(k) + c_k d^(k)

In order for the A-orthogonality condition to hold, c_k must be chosen in such a way that ⟨d^(k+1), Ad^(k)⟩ = 0. To figure out what the choice of c_k should be, multiply the defining equation for d^(k+1) by A and take the inner product with the previous search direction d^(k) on the left to obtain:

⟨d^(k), Ad^(k+1)⟩ = ⟨d^(k), Ar^(k)⟩ + c_k⟨d^(k), Ad^(k)⟩

By the A-orthogonality condition, ⟨d^(k), Ad^(k+1)⟩ = 0, so

0 = ⟨d^(k), Ar^(k)⟩ + c_k⟨d^(k), Ad^(k)⟩

Then solving for ck,

c_k = −⟨d^(k), Ar^(k)⟩ / ⟨d^(k), Ad^(k)⟩

Next solving for the scalar sk, from conjugate directions,

s_k = ⟨d^(k), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩

Applying the chosen definition d^(k) = r^(k−1) + c_{k−1} d^(k−1),

s_k = ⟨d^(k), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩

    = ⟨r^(k−1) + c_{k−1} d^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩

    = ⟨r^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩ + c_{k−1} ⟨d^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩

Substituting for d^(k−1) using the definition d^(k−1) = r^(k−2) + c_{k−2} d^(k−2) gives

s_k = ⟨r^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩ + c_{k−1} ⟨r^(k−2), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩ + c_{k−1} c_{k−2} ⟨d^(k−2), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩

Because of the orthogonality condition on the residuals, ⟨r^(k−2), r^(k−1)⟩ = 0. By continuing to expand the d^(i) terms, every remaining inner product pairs r^(k−1) with an earlier residual, and all of these vanish. So every term except the ⟨r^(k−1), r^(k−1)⟩ term is zero, and

s_k = ⟨r^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩

Lastly, calculating the residuals, start with the definition of x^(k),

x^(k) = x^(k−1) + s_k d^(k)

Multiplying by −A and adding b, the equation becomes

b − Ax^(k) = b − Ax^(k−1) − s_k Ad^(k)

Now, utilizing the definition of r^(k),

r^(k) = r^(k−1) − s_k Ad^(k)

These formulas complete the Conjugate Gradient method. The initial assumptions are therefore

r^(0) = b − Ax^(0),        d^(1) = r^(0)

and for each successive step k = 1, ···, n,

s_k = ⟨r^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩        x^(k) = x^(k−1) + s_k d^(k)        r^(k) = r^(k−1) − s_k Ad^(k)

c_k = −⟨d^(k), Ar^(k)⟩ / ⟨d^(k), Ad^(k)⟩        d^(k+1) = r^(k) + c_k d^(k)

When running the algorithm, it is clear that a few calculations are carried out more than once. By eliminating the dependence on these repeated calculations, the algorithm becomes more efficient. The matrix multiplications in the c_k formula can both be removed as follows.

Starting with the r^(k) formula and taking the inner product of each side with r^(k) on the right,

r^(k) = r^(k−1) − s_k Ad^(k)

⟨r^(k), r^(k)⟩ = ⟨r^(k−1), r^(k)⟩ − s_k⟨Ad^(k), r^(k)⟩

⟨r^(k), r^(k)⟩ = −s_k⟨Ad^(k), r^(k)⟩        (since ⟨r^(k−1), r^(k)⟩ = 0)

−(1/s_k)⟨r^(k), r^(k)⟩ = ⟨Ad^(k), r^(k)⟩

−(1/s_k)⟨r^(k), r^(k)⟩ = ⟨d^(k), Ar^(k)⟩

Next considering the sk formula,

s_k = ⟨r^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩

s_k⟨d^(k), Ad^(k)⟩ = ⟨r^(k−1), r^(k−1)⟩

⟨d^(k), Ad^(k)⟩ = (1/s_k)⟨r^(k−1), r^(k−1)⟩

Substituting into the ck formula,

c_k = −⟨d^(k), Ar^(k)⟩ / ⟨d^(k), Ad^(k)⟩

    = −( −(1/s_k)⟨r^(k), r^(k)⟩ ) / ( (1/s_k)⟨r^(k−1), r^(k−1)⟩ )

    = ⟨r^(k), r^(k)⟩ / ⟨r^(k−1), r^(k−1)⟩

This eliminates the redundancy in the algorithm, which increases the efficiency substantially. The following is the pseudocode for the cg-method.

r^(0) = b − Ax^(0)
d^(1) = r^(0)
k = 1
while (k ≤ n)
    s_k = ⟨r^(k−1), r^(k−1)⟩ / ⟨d^(k), Ad^(k)⟩
    x^(k) = x^(k−1) + s_k d^(k)
    r^(k) = r^(k−1) − s_k Ad^(k)
    if ||r^(k)|| is smaller than a tolerance, quit
    c_k = ⟨r^(k), r^(k)⟩ / ⟨r^(k−1), r^(k−1)⟩
    d^(k+1) = r^(k) + c_k d^(k)
    k = k + 1
end
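For reference, the pseudocode above translates directly into a short NumPy routine. The sketch below is a minimal implementation under the paper's assumptions (A symmetric positive definite); the tolerance and iteration cap are illustrative defaults.

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    # Conjugate Gradient method following the pseudocode above.
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                    # r^(0) = b - A x^(0)
    d = r.copy()                     # d^(1) = r^(0)
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        rr_old = r @ r
        if np.sqrt(rr_old) < tol:    # stop when the residual is small enough
            break
        Ad = A @ d
        s = rr_old / (d @ Ad)        # s_k = <r^(k-1), r^(k-1)> / <d^(k), A d^(k)>
        x = x + s * d                # x^(k) = x^(k-1) + s_k d^(k)
        r = r - s * Ad               # r^(k) = r^(k-1) - s_k A d^(k)
        c = (r @ r) / rr_old         # c_k = <r^(k), r^(k)> / <r^(k-1), r^(k-1)>
        d = r + c * d                # d^(k+1) = r^(k) + c_k d^(k)
    return x

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([20.0, 10.0, -10.0])
print(conjugate_gradient(A, b))      # approximately [6, 5, -3]

In exact arithmetic the loop terminates after at most n steps (Theorem 4.2), which is why the iteration cap defaults to n.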

For an example, consider the system

[  5  −2   0 ] [ x_1 ]   [  20 ]
[ −2   5   1 ] [ x_2 ] = [  10 ]
[  0   1   5 ] [ x_3 ]   [ −10 ]

where the matrix

A = [  5  −2   0 ]
    [ −2   5   1 ]
    [  0   1   5 ]

is positive definite and symmetric. For the cg-method, first set the starting point

x^(0) = [ 0 ]
        [ 0 ]
        [ 0 ]

and then calculate the initial assumptions:

r^(0) = b − Ax^(0) = b = [  20 ]
                         [  10 ]
                         [ −10 ]

Next, set d^(1) = r^(0), so that

d^(1) = [  20 ]
        [  10 ]
        [ −10 ]

Then running through the first iteration, k = 1,

s_1 = ⟨r^(0), r^(0)⟩ / ⟨d^(1), Ad^(1)⟩ = 3/10

x^(1) = x^(0) + s_1 d^(1) = [  6 ]
                            [  3 ]
                            [ −3 ]

r^(1) = r^(0) − s_1 Ad^(1) = [ −4 ]
                             [ 10 ]
                             [  2 ]

c_1 = ⟨r^(1), r^(1)⟩ / ⟨r^(0), r^(0)⟩ = 1/5

d^(2) = r^(1) + c_1 d^(1) = [  0 ]
                            [ 12 ]
                            [  0 ]

Next, for the second iteration k = 2,

s_2 = ⟨r^(1), r^(1)⟩ / ⟨d^(2), Ad^(2)⟩ = 1/6

x^(2) = x^(1) + s_2 d^(2) = [  6 ]
                            [  5 ]
                            [ −3 ]

r^(2) = r^(1) − s_2 Ad^(2) = [ 0 ]
                             [ 0 ]
                             [ 0 ]

Since the residual r^(2) = [0 0 0]^T, the solution is exactly x^(2). Since this is a small system it is easy to find the exact solution. Unfortunately, for larger systems the cg-method suffers from a significant weakness: round-off error. The solution to the round-off error is Preconditioning.
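The worked example above can be reproduced step by step in a few lines of NumPy; the script below recomputes s_1 = 3/10, c_1 = 1/5, s_2 = 1/6 and the final x^(2) = (6, 5, −3)^T, up to floating-point round-off.

import numpy as np

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([20.0, 10.0, -10.0])

x = np.zeros(3)
r = b - A @ x                     # r^(0) = b
d = r.copy()                      # d^(1) = r^(0)

# Iteration k = 1
s1 = (r @ r) / (d @ A @ d)        # 0.3 = 3/10
x = x + s1 * d
r_new = r - s1 * (A @ d)
c1 = (r_new @ r_new) / (r @ r)    # 0.2 = 1/5
d = r_new + c1 * d
r = r_new

# Iteration k = 2
s2 = (r @ r) / (d @ A @ d)        # 0.1666... = 1/6
x = x + s2 * d
r = r - s2 * (A @ d)

print(s1, c1, s2)                 # 0.3 0.2 0.1666...
print(np.round(x, 12))            # [ 6.  5. -3.]
print(np.linalg.norm(r))          # residual is zero to machine precision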

5 Preconditioning

Typically, mathematicians are accustomed to round-off error in numerical solutions, and usually it is not that problematic. However, round-off error from early steps can accumulate and potentially force the cg-method to fail. So what can be done in order to create a more accurate set of formulas? The main cause of round-off error is a poorly conditioned matrix A. A new n × n matrix P which is non-singular is introduced to the system; non-singular means that the determinant is nonzero. Since the cg-method requires the matrix of the system to be positive definite, and multiplying A on the left by P and on the right by P^T preserves this property, the new matrix PAP^T is still positive definite. So define

A′ = PAP^T

In order for the new matrix equation to make sense, set x′ = (P^T)^{−1} x and b′ = P b. The new matrix equation A′x′ = b′ is now as follows:

A′x′ = b′        (8)

By substituting,

A′x′ = b′

(PAP^T)((P^T)^{−1} x) = P b

PAx = P b

Clearly, the original system is left unchanged. Utilizing these preconditioned matrices the formulas for the cg-method are as follows:

s′_{k+1} = ⟨P^{−1} r^(k), P^{−1} r^(k)⟩ / ⟨d^(k+1), Ad^(k+1)⟩        x^(k+1) = x^(k) + s′_{k+1} d^(k+1)        r^(k+1) = r^(k) − s′_{k+1} Ad^(k+1)

λ′_{k+1} = ⟨P^{−1} r^(k+1), P^{−1} r^(k+1)⟩ / ⟨P^{−1} r^(k), P^{−1} r^(k)⟩        d^(k+1) = P^{−T} P^{−1} r^(k) + λ′_k d^(k)
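In practice the preconditioned iteration is usually implemented without forming P explicitly: a symmetric positive definite preconditioner is applied to the residual at each step. The Python sketch below shows one common formulation of preconditioned CG with a simple Jacobi (diagonal) preconditioner; this is an illustrative variant related to, but not written in, the PAP^T notation above.

import numpy as np

def preconditioned_cg(A, b, M_inv_apply, x0=None, tol=1e-10, max_iter=None):
    # A common preconditioned CG formulation: M_inv_apply applies an
    # approximation of A^{-1} to the residual each step (a sketch, not the
    # paper's exact P A P^T formulas).
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    z = M_inv_apply(r)
    d = z.copy()
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        rz_old = r @ z
        Ad = A @ d
        s = rz_old / (d @ Ad)
        x = x + s * d
        r = r - s * Ad
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_apply(r)
        c = (r @ z) / rz_old
        d = z + c * d
    return x

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([20.0, 10.0, -10.0])
jacobi = 1.0 / np.diag(A)                               # Jacobi (diagonal) preconditioner
print(preconditioned_cg(A, b, lambda r: jacobi * r))    # approximately [6, 5, -3]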

6 Efficiency

Method                  Time complexity
Conjugate Gradient      O(m√K)
Gaussian Elimination    O(n³)

Table 1: Time complexity of the cg-method vs. Gaussian elimination

Table 1 compares the time complexity of the cg-method and Gaussian elimination, where m is the number of non-zero terms in the coefficient matrix, K is the condition number of the coefficient matrix A, and n is the dimension of the coefficient matrix. Calculating the condition number exactly requires knowledge of A^{−1}, so in practice it is estimated by other means. Preconditioning a matrix has the effect of reducing the value of K; the goal is for K to be as close to 1 as possible so that it is not a significant factor in the time complexity of the cg-method. Most numerical analysts agree that a preconditioner should always be used with the cg-method.
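As a small illustration of the role of K (not drawn from the paper), the snippet below computes the condition number of an arbitrary ill-conditioned symmetric positive definite test matrix and of its symmetric Jacobi rescaling D^{−1/2} A D^{−1/2}, one simple preconditioning choice; the specific numbers are incidental.

import numpy as np

# An ill-conditioned symmetric positive definite test matrix (arbitrary example).
A = np.diag([1.0, 10.0, 100.0, 1000.0]) + 0.1 * np.ones((4, 4))

K = np.linalg.cond(A)                       # condition number of A

# Symmetric Jacobi scaling: diag(d) A diag(d) with d = 1/sqrt(diag(A)).
d = 1.0 / np.sqrt(np.diag(A))
A_pre = (A * d).T * d                       # equivalent to diag(d) @ A @ diag(d)

print(round(K, 1), round(np.linalg.cond(A_pre), 1))   # the scaled matrix has a much smaller K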

Comparing the time complexities, the cg-method is more efficient for these systems because only a small proportion of the coefficient matrix entries are non-zero, so m is small. Gaussian elimination, on the other hand, destroys the sparse characteristic of the system as it proceeds, so it cannot exploit sparsity and its time complexity is worse for larger systems.

7 Conclusion

The Conjugate Gradient algorithm is used in engineering optimization problems, neural-net training, and image restoration. The Conjugate Gradient algorithm is now a grandfather algorithm for a whole class of methods for solving various matrix systems. However, there is still research into expanding the Conjugate Gradient method to more complex settings such as unconstrained optimization or singular matrices.

8 References

[1] R. Chandra, S. C. Eisenstat, and M. H. Schultz, "Conjugate Gradient Methods for Partial Differential Equations," June 1975.

[2] Gerald W. Recktenwald, "Finite-Difference Approximations to the Heat Equation," 6 March 2011.

[3] Magnus R. Hestenes and Eduard Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," Journal of Research of the National Bureau of Standards, Vol. 49, No. 6, December 1952.

[4] Richard L. Burden and J. Douglas Faires, "Numerical Analysis," ninth edition, 2011.

[5] Jonathan Richard Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain," 4 August 1994.

[6] Watson Fulks, "Advanced Calculus: An Introduction to Analysis," John Wiley & Sons, Inc., Hoboken, NJ, 1961.

[7] Mark S. Gockenbach, "Partial Differential Equations," Society for Industrial and Applied Mathematics, Philadelphia, PA, 2011.

[8] Gene H. Golub and Dianne P. O'Leary, "Some History of the Conjugate Gradient and Lanczos Algorithms: 1948-1976," SIAM Review, Vol. 31, No. 1, March 1989.

[9] Randal J. Barnes, "Matrix Differentiation," University of Minnesota, Spring 2006.
