 0   a2,1 0     . . ..  A = diag (a1,1, a2,2, ··· , an,n) +  . . .     an−1,1 an−1,2 ··· 0  an,1 an,2 ··· an,n−1 0   0 a1,2 ··· a1,n−1 a1,n  0 ··· a2,n−1 a2,n     .. . .  +  . . .     0 an−1,n  0             def       = D − L − U =   −   −   .      

§7.3 The Jacobi and Gauss-Siedel Iterative Techniques n×n I Problem: To solve Ax = b for A ∈ R . I Methodology: Iteratively approximate solution x. No GEPP.             def       = D − L − U =   −   −   .      

§7.3 The Jacobi and Gauss-Siedel Iterative Techniques n×n I Problem: To solve Ax = b for A ∈ R . I Methodology: Iteratively approximate solution x. No GEPP.  0  Matrix splitting  a2,1 0     . . ..  A = diag (a1,1, a2,2, ··· , an,n) +  . . .     an−1,1 an−1,2 ··· 0  an,1 an,2 ··· an,n−1 0   0 a1,2 ··· a1,n−1 a1,n  0 ··· a2,n−1 a2,n     .. . .  +  . . .     0 an−1,n  0 §7.3 The Jacobi and Gauss-Siedel Iterative Techniques n×n I Problem: To solve Ax = b for A ∈ R . I Methodology: Iteratively approximate solution x. No GEPP.  0  Matrix splitting  a2,1 0     . . ..  A = diag (a1,1, a2,2, ··· , an,n) +  . . .     an−1,1 an−1,2 ··· 0  an,1 an,2 ··· an,n−1 0   0 a1,2 ··· a1,n−1 a1,n  0 ··· a2,n−1 a2,n     .. . .  +  . . .     0 an−1,n  0             def       = D − L − U =   −   −   .        10 −1 2 0   −1 11 −1 3  Ex: Matrix splitting for A =    2 −1 10 −1  0 3 −1 8

                  A =   −   −        

 0   0 1 −2 0   1 0   0 1 −3  = diag (10, 11, 10, 8) −   −    −2 1 0   0 1  0 −3 1 0 0 Gauss-Siedel Method: Rewrite

The Jacobi and Gauss-Seidel Methods for solving Ax = b

Jacobi Method: With matrix splitting $A = D - L - U$, rewrite
$$x = D^{-1}(L + U)\,x + D^{-1}b.$$
Jacobi iteration with given $x^{(0)}$:
$$x^{(k+1)} = D^{-1}(L + U)\,x^{(k)} + D^{-1}b, \quad \text{for } k = 0, 1, 2, \cdots.$$

Gauss-Seidel Method: Rewrite
$$x = (D - L)^{-1}U\,x + (D - L)^{-1}b.$$
Gauss-Seidel iteration with given $x^{(0)}$:
$$x^{(k+1)} = (D - L)^{-1}U\,x^{(k)} + (D - L)^{-1}b, \quad \text{for } k = 0, 1, 2, \cdots.$$
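Both iterations are a few lines of NumPy; a hedged sketch (function names are illustrative, and the systems are solved rather than forming inverses):

```python
import numpy as np

def jacobi(A, b, x0, iters=50):
    """Jacobi iteration x <- D^{-1}((L+U)x + b)."""
    D = np.diag(np.diag(A))
    LU = D - A                                  # L + U
    x = x0.copy()
    for _ in range(iters):
        x = np.linalg.solve(D, LU @ x + b)
    return x

def gauss_seidel(A, b, x0, iters=50):
    """Gauss-Seidel iteration x <- (D-L)^{-1}(Ux + b)."""
    DL = np.tril(A)                             # D - L: lower triangle of A
    U = -np.triu(A, 1)
    x = x0.copy()
    for _ in range(iters):
        x = np.linalg.solve(DL, U @ x + b)
    return x
```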

Ex: Jacobi Method for $Ax = b$, with
$$A = \begin{bmatrix} 10 & -1 & 2 & 0 \\ -1 & 11 & -1 & 3 \\ 2 & -1 & 10 & -1 \\ 0 & 3 & -1 & 8 \end{bmatrix}, \qquad b = \begin{bmatrix} 6 \\ 25 \\ -11 \\ 15 \end{bmatrix},$$
and the splitting $A = D - L - U$ above. Jacobi iteration with $x^{(0)} = 0$, for $k = 0, 1, 2, \cdots$:
$$x_J^{(k+1)} = D^{-1}(L + U)\,x_J^{(k)} + D^{-1}b = \begin{bmatrix} 0 & \frac{1}{10} & -\frac{2}{10} & 0 \\ \frac{1}{11} & 0 & \frac{1}{11} & -\frac{3}{11} \\ -\frac{2}{10} & \frac{1}{10} & 0 & \frac{1}{10} \\ 0 & -\frac{3}{8} & \frac{1}{8} & 0 \end{bmatrix} x_J^{(k)} + \begin{bmatrix} \frac{6}{10} \\ \frac{25}{11} \\ -\frac{11}{10} \\ \frac{15}{8} \end{bmatrix}.$$

Ex: Gauss-Seidel Method for $Ax = b$, with the same $A$ and $b$. Gauss-Seidel iteration with $x^{(0)} = 0$, for $k = 0, 1, 2, \cdots$:
$$x_{GS}^{(k+1)} = (D - L)^{-1}U\,x_{GS}^{(k)} + (D - L)^{-1}b = \begin{bmatrix} 10 & & & \\ -1 & 11 & & \\ 2 & -1 & 10 & \\ 0 & 3 & -1 & 8 \end{bmatrix}^{-1} \left( \begin{bmatrix} 0 & 1 & -2 & 0 \\ & 0 & 1 & -3 \\ & & 0 & 1 \\ & & & 0 \end{bmatrix} x_{GS}^{(k)} + \begin{bmatrix} 6 \\ 25 \\ -11 \\ 15 \end{bmatrix} \right).$$

Jacobi vs. Gauss-Seidel: both iterations converge to the solution $x = (1, 2, -1, 1)^T$.
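Reusing `A` from the splitting sketch and the two iteration sketches above, a quick run reproduces this solution (illustrative):

```python
b = np.array([6., 25, -11, 15])
x0 = np.zeros(4)
print(jacobi(A, b, x0, iters=25))        # -> approximately [ 1.  2. -1.  1.]
print(gauss_seidel(A, b, x0, iters=25))  # -> approximately [ 1.  2. -1.  1.]
```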

[Figure: "Convergence Comparison, Jacobi vs. G-S" — residual norm on a log scale ($10^0$ down to $10^{-16}$) versus iteration count (0 to 50), one curve for Jacobi and one for G-S.]

General Iteration Methods

To solve $Ax = b$ with matrix splitting $A = D - L - U$:

▶ Jacobi Method:
$$x_J^{(k+1)} = D^{-1}(L + U)\,x_J^{(k)} + D^{-1}b.$$
▶ Gauss-Seidel Method:
$$x_{GS}^{(k+1)} = (D - L)^{-1}U\,x_{GS}^{(k)} + (D - L)^{-1}b.$$

General Iteration Method: for $k = 0, 1, 2, \cdots$,
$$x^{(k+1)} = T\,x^{(k)} + c.$$

Next: convergence analysis of the General Iteration Method.

General Iteration: $x^{(k+1)} = T\,x^{(k)} + c$ for $k = 0, 1, 2, \cdots$

Thm: The following statements are equivalent:
▶ $\rho(T) < 1$.
▶ The equation $x = T\,x + c$ (1) has a unique solution and $\{x^{(k)}\}$ converges to this solution from any $x^{(0)}$.

Proof: Assume $\rho(T) < 1$. Then (1) has a unique solution $x^{(*)}$, and
$$x^{(k+1)} - x^{(*)} = T\left(x^{(k)} - x^{(*)}\right) = T^2\left(x^{(k-1)} - x^{(*)}\right) = \cdots = T^{k+1}\left(x^{(0)} - x^{(*)}\right) \Longrightarrow 0.$$
Conversely, if ··· (omitted)
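The theorem is easy to check numerically; a sketch that computes $\rho(T)$ for the Jacobi and Gauss-Seidel iteration matrices of the 4×4 example above (both should come out below 1):

```python
import numpy as np

A = np.array([[10., -1, 2, 0],
              [-1, 11, -1, 3],
              [2, -1, 10, -1],
              [0, 3, -1, 8]])
D = np.diag(np.diag(A))
L = -np.tril(A, -1)
U = -np.triu(A, 1)

T_J = np.linalg.solve(D, L + U)       # Jacobi iteration matrix D^{-1}(L+U)
T_GS = np.linalg.solve(D - L, U)      # Gauss-Seidel iteration matrix (D-L)^{-1}U

rho = lambda T: max(abs(np.linalg.eigvals(T)))
print(rho(T_J), rho(T_GS))            # both < 1, so both iterations converge
```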

Jacobi on random upper triangular matrix

▶ $A = D - U$, so $T = D^{-1}U$ with $\rho(T) = 0$.
▶ $A$ is randn upper triangular with $n = 50$. [Figure: sparsity pattern (spy plot) of $A$; nz = 1275.]
▶ Convergence plot:

[Figure: "G-S Convergence on upper triangular matrix" — residual norm on a log scale ($10^{-4}$ up to $10^{14}$) versus iteration count (0 to 2000).]

§7.4 Relaxation Techniques for Solving Linear Systems

To solve $Ax = b$ with matrix splitting $A = D - L - U$, rewrite (for the solution $x$ and any $\omega$)
$$D\,x = D\,x, \qquad \omega L\,x = \omega(D - U)\,x - \omega b.$$
Taking the difference of the two equations,
$$(D - \omega L)\,x = \left((1 - \omega)D + \omega U\right)x + \omega b.$$
Successive Over-Relaxation (SOR), for $k = 0, 1, 2, \cdots$:
$$x_{SOR}^{(k+1)} = (D - \omega L)^{-1}\left((1 - \omega)D + \omega U\right)x_{SOR}^{(k)} + \omega\,(D - \omega L)^{-1}b \stackrel{\rm def}{=} T_{SOR}\,x_{SOR}^{(k)} + c_{SOR}.$$
SOR converges if $\rho(T_{SOR}) < 1$. A good choice of $\omega$ is tricky, but critical for accelerated convergence.
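A minimal SOR sketch under the same splitting conventions ($\omega = 1$ reduces to Gauss-Seidel; the function name is illustrative):

```python
import numpy as np

def sor(A, b, omega, x0, iters=100):
    """SOR iteration x <- (D - omega*L)^{-1}(((1-omega)*D + omega*U)x + omega*b)."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)
    U = -np.triu(A, 1)
    M = D - omega * L                   # lower triangular, cheap to solve against
    N = (1 - omega) * D + omega * U
    x = x0.copy()
    for _ in range(iters):
        x = np.linalg.solve(M, N @ x + omega * b)
    return x
```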

Optimal SOR parameters

Thm: If $A$ is symmetric positive definite and tridiagonal, then
$$\rho(T_{GS}) = \left(\rho(T_J)\right)^2 < 1,$$
and the optimal choice of $\omega$ for the SOR method is
$$\omega_{OPT} = \frac{2}{1 + \sqrt{1 - \left(\rho(T_J)\right)^2}},$$
with
$$\rho(T_{SOR}) = \omega_{OPT} - 1 = \left(\frac{\rho(T_J)}{1 + \sqrt{1 - \left(\rho(T_J)\right)^2}}\right)^2.$$

Ex: $A = \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, with solution $x = \frac{1}{3}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.

▶ $A$ is symmetric positive definite and tridiagonal:
$$\det(A) = 24, \qquad \det\begin{bmatrix} 4 & 3 \\ 3 & 4 \end{bmatrix} = 7, \qquad 4 > 0.$$
▶ $\rho(T_J) = \sqrt{0.625}$, where
$$T_J = D^{-1}(L + U) = \frac{1}{4}\begin{bmatrix} 0 & -3 & 0 \\ -3 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.$$
▶ Optimal $\omega$:
$$\omega_{OPT} = \frac{2}{1 + \sqrt{1 - \left(\rho(T_J)\right)^2}} = \frac{2}{1 + \sqrt{0.375}} \approx 1.24.$$
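Checking these numbers numerically (a sketch; $\rho(T_J)^2$ should print as 0.625 and $\omega_{OPT}$ as roughly 1.24):

```python
import numpy as np

A = np.array([[4., 3, 0], [3, 4, -1], [0, -1, 4]])
D = np.diag(np.diag(A))
T_J = np.linalg.solve(D, D - A)             # D^{-1}(L+U), since L + U = D - A
rho_J = max(abs(np.linalg.eigvals(T_J)))    # approx 0.7906 = sqrt(0.625)
omega_opt = 2 / (1 + np.sqrt(1 - rho_J**2))
print(rho_J**2, omega_opt, omega_opt - 1)   # rho(T_GS), omega_OPT, rho(T_SOR)
```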

[Figure: "Convergence Comparison, G-S vs. SOR" — residual norm on a log scale ($10^0$ down to $10^{-18}$) versus iteration count (0 to 150), one curve for G-S and one for SOR.]

§7.5 Error Bounds and Iterative Refinement

Assume that $\hat{x}$ is an approximation to the solution $x$ of $Ax = b$.
▶ Residual $\hat{r} \stackrel{\rm def}{=} b - A\hat{x} = A(x - \hat{x})$. Thus a small $\|x - \hat{x}\|$ implies a small $\|\hat{r}\|$.
▶ However, a big $\|x - \hat{x}\|$ can still lead to a small $\|\hat{r}\|$. Ex:
$$\begin{bmatrix} 1 & 2 \\ 1 + 10^{-\tau} & 2 \end{bmatrix} x = \begin{bmatrix} 3 \\ 3 + 10^{-\tau} \end{bmatrix}.$$
The exact solution is $x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. The bad approximation $\hat{x} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$ has a small residual for large $\tau$:
$$\hat{r} = \begin{bmatrix} 3 \\ 3 + 10^{-\tau} \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ 1 + 10^{-\tau} & 2 \end{bmatrix}\begin{bmatrix} 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -2 \times 10^{-\tau} \end{bmatrix}.$$

Near Linear Dependence

For $\tau = 4$, the equations define two nearly parallel lines
$$\ell_1:\ x_1 + 2x_2 = 3, \qquad \ell_2:\ 1.0001\,x_1 + 2x_2 = 3.0001.$$
Exactly parallel lines have no intersection; nearly parallel lines intersect at a point that is extremely sensitive to small changes in either line.

Let $Ax = b$ with non-singular $A$ and non-zero $b$

Thm: Assume $\hat{x}$ is an approximate solution with $\hat{r} = b - A\hat{x}$. Then for any natural norm,
$$\|\hat{x} - x\| \le \left\|A^{-1}\right\| \|\hat{r}\|, \qquad \frac{\|\hat{x} - x\|}{\|x\|} \le \kappa(A)\,\frac{\|\hat{r}\|}{\|b\|},$$
where $\kappa(A) \stackrel{\rm def}{=} \|A\|\left\|A^{-1}\right\|$ is the condition number of $A$.

▶ $A$ is well-conditioned if $\kappa(A) = O(1)$: a small residual implies a small solution error.
▶ $A$ is ill-conditioned if $\kappa(A) \gg 1$: a small residual may still allow a large solution error.

Ex: Condition number for $A = \begin{bmatrix} 1 & 2 \\ 1 + 10^{-\tau} & 2 \end{bmatrix}$.

Solution: For $\tau > 0$, $\|A\|_\infty = 3 + 10^{-\tau}$. Since
$$A^{-1} = -\frac{10^\tau}{2}\begin{bmatrix} 2 & -2 \\ -(1 + 10^{-\tau}) & 1 \end{bmatrix},$$
we have $\left\|A^{-1}\right\|_\infty = 2 \times 10^\tau$. Thus
$$\kappa_\infty(A) = \|A\|_\infty \left\|A^{-1}\right\|_\infty = 6 \times 10^\tau + 2.$$
$\kappa(A)$ grows exponentially in $\tau$, so $A$ is ill-conditioned for large $\tau$.
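A quick numerical check of this growth (a sketch; `np.linalg.cond` with `np.inf` computes $\kappa_\infty$):

```python
import numpy as np

for tau in [2, 4, 8]:
    A = np.array([[1.0, 2.0], [1.0 + 10.0**(-tau), 2.0]])
    print(tau, np.linalg.cond(A, np.inf))   # approx 6e2 + 2, 6e4 + 2, 6e8 + 2
```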

Iterative Refinement (I)

▶ Let $Ax = b$ with non-singular $A$ and non-zero $b$.
▶ Let $F(\cdot)$ be an in-exact equation solver, so $F(b)$ is an approximate solution.
▶ Assume $F(\cdot)$ is accurate enough that there exists a $\rho < 1$ so that
$$\frac{\|b - A\,F(b)\|}{\|b\|} \le \rho \quad \text{for any } b \ne 0.$$

In practice,
▶ $F(\cdot)$ could be from an (in-exact) LU factorization, $F(b) = U^{-1}\left(L^{-1}b\right)$.
▶ Inaccuracies in the LU factorization could be due to rounding error, $A \approx LU$.

Ex: A = randn(n, n), b = randn(n, 1), n = 3000
▶ LU factorize A to get L, U (LU without pivoting)
▶ $x_0 = U^{-1}\left(L^{-1}b\right)$
▶ $r_0 = b - A\,x_0$
▶ $\Delta x_1 = U^{-1}\left(L^{-1}r_0\right)$
▶ $r_1 = r_0 - A\,\Delta x_1$
▶ $x = x_0 + \Delta x_1$
▶ disp(norm(r0), norm(r1)) → 2.6606e-07   1.0996e-16

Iterative Refinement (II)

Given a tolerance $\tau > 0$ and $x^{(0)}$:
▶ Initialize $r^{(0)} = b - A\,x^{(0)}$.
▶ For $k = 0, 1, \cdots$, compute
$$\Delta x^{(k)} = F\left(r^{(k)}\right), \qquad x^{(k+1)} = x^{(k)} + \Delta x^{(k)}, \qquad r^{(k+1)} = r^{(k)} - A\,\Delta x^{(k)}.$$
▶ If $\left\|r^{(k+1)}\right\| \le \tau \|b\|$, stop.

Convergence Proof:
$$\left\|r^{(k+1)}\right\| \le \rho \left\|r^{(k)}\right\| \le \rho^2 \left\|r^{(k-1)}\right\| \le \cdots \le \rho^{k+1} \left\|r^{(0)}\right\|.$$
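A sketch of Iterative Refinement (II) with $F(\cdot)$ taken from an LU factorization, assuming SciPy is available (`lu_factor` pivots, unlike the randn experiment above; names are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A, b, tol=1e-14, max_iter=10):
    F = lu_factor(A)          # the in-exact solver: rounding makes A ~ LU
    x = lu_solve(F, b)        # x^(0) = F(b)
    r = b - A @ x             # r^(0)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        dx = lu_solve(F, r)   # Delta x^(k) = F(r^(k))
        x = x + dx            # x^(k+1)
        r = r - A @ dx        # r^(k+1)
    return x
```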

Perturbation Theory

Thm: Let $x$ and $\hat{x}$ be solutions to
$$Ax = b \qquad \text{and} \qquad (A + \Delta A)\,\hat{x} = b + \Delta b$$
with perturbations $\Delta A$ and $\Delta b$. Then
$$\frac{\|x - \hat{x}\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa(A)\frac{\|\Delta A\|}{\|A\|}} \left(\frac{\|\Delta A\|}{\|A\|} + \frac{\|\Delta b\|}{\|b\|}\right),$$
with $\kappa(A) = \|A\|\left\|A^{-1}\right\|$.

§7.6 The Conjugate Gradient Method (CG) for Ax = b

Assumption: $A$ is symmetric positive definite (SPD):
▶ $A^T = A$,
▶ $x^T A\,x \ge 0$ for any $x$,
▶ $x^T A\,x = 0$ if and only if $x = 0$.

Thm: The vector $x^*$ solves the SPD equations $Ax = b$ if and only if it minimizes the function
$$g(x) \stackrel{\rm def}{=} x^T A\,x - 2\,x^T b.$$

Proof: Let $A\,x^* = b$. Then
$$g(x) = x^T A\,x - 2\,x^T A\,x^* = (x - x^*)^T A\,(x - x^*) - (x^*)^T A\,x^* = (x - x^*)^T A\,(x - x^*) + g(x^*).$$
Thus $g(x) \ge g(x^*)$ for all $x$; and $g(x) = g(x^*)$ iff $x = x^*$.

CG for Ax = b: search direction and line search

▶ The CG Idea: Starting from an initial vector $x^{(0)}$, quickly compute new vectors $x^{(1)}, \cdots, x^{(k)}, \cdots$ with
$$g\left(x^{(0)}\right) > g\left(x^{(1)}\right) > g\left(x^{(2)}\right) > \cdots > g\left(x^{(k)}\right) > \cdots$$
so that the sequence $\{x^{(k)}\}$ converges to $x^*$.
▶ Descent Method: Assume a search direction $v^{(k)}$ at the iterate $x^{(k-1)}$; the next iterate, with step-size $t_k$, is
$$x^{(k)} \stackrel{\rm def}{=} x^{(k-1)} + t_k\,v^{(k)}, \qquad \text{where } t_k \text{ minimizes } g\left(x^{(k-1)} + t\,v^{(k)}\right).$$
▶ Optimality Condition:
$$0 = \frac{d}{dt}\,g\left(x^{(k-1)} + t\,v^{(k)}\right) = \left(v^{(k)}\right)^T \nabla g\left(x^{(k-1)} + t\,v^{(k)}\right) = \left(v^{(k)}\right)^T \left(2A\left(x^{(k-1)} + t\,v^{(k)}\right) - 2b\right),$$
which gives
$$t_k = \frac{\left(v^{(k)}\right)^T r^{(k-1)}}{\left(v^{(k)}\right)^T A\,v^{(k)}}, \qquad r^{(k-1)} \stackrel{\rm def}{=} b - A\,x^{(k-1)} \quad \text{(residual)}.$$
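One descent step with this exact line search, as a minimal NumPy sketch (the function name `descent_step` and its interface are illustrative):

```python
import numpy as np

def descent_step(A, b, x, v):
    """One step x <- x + t*v with t minimizing g(x + t*v) for g(x) = x^T A x - 2 x^T b."""
    r = b - A @ x                 # residual r^(k-1)
    t = (v @ r) / (v @ (A @ v))   # optimal step size t_k
    return x + t * v
```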

search direction choices

For a small step-size $t$:
$$g\left(x^{(k-1)} + t\,v^{(k)}\right) \approx g\left(x^{(k-1)}\right) + t\left(v^{(k)}\right)^T \nabla g\left(x^{(k-1)}\right).$$
▶ Steepest descent: the greatest decrease in the value of $g\left(x^{(k-1)} + t\,v^{(k)}\right)$ comes from
$$v^{(k)} = -\nabla g\left(x^{(k-1)}\right).$$
▶ A-orthogonal directions: non-zero vectors $\{v^{(i)}\}_{i=1}^n$ with
$$\left(v^{(i)}\right)^T A\,v^{(j)} = 0 \quad \text{for all } i \ne j.$$
A-orthogonal vectors associated with the positive definite matrix $A$ are linearly independent.

A-orthogonality Craft

Thm: Let non-zero vectors $\{v^{(k)}\}$ be A-orthogonal with $v^{(1)} = -r^{(0)}$, and for $k = 1, \cdots, n$ let
$$t_k = \frac{\left(v^{(k)}\right)^T r^{(k-1)}}{\left(v^{(k)}\right)^T A\,v^{(k)}}, \qquad r^{(k-1)} \stackrel{\rm def}{=} b - A\,x^{(k-1)} \quad \text{(residual)}.$$
Then for $g(x) = x^T A\,x - 2\,x^T b$ and for $k = 1, \cdots, n$,
$$\min_{\tau_1, \cdots, \tau_k} g\left(x_0 + \tau_1 v^{(1)} + \cdots + \tau_k v^{(k)}\right) = g\left(x_0 + t_1 v^{(1)} + \cdots + t_k v^{(k)}\right).$$

Magic (I):
$$\min_{\tau_1} g\left(x_0 + \tau_1 v^{(1)}\right) = g\left(x_0 + t_1 v^{(1)}\right),$$
$$\min_{\tau_1, \tau_2} g\left(x_0 + \tau_1 v^{(1)} + \tau_2 v^{(2)}\right) = g\left(x_0 + t_1 v^{(1)} + t_2 v^{(2)}\right),$$
$$\min_x g(x) = \min_{\tau_1, \cdots, \tau_n} g\left(x_0 + \tau_1 v^{(1)} + \cdots + \tau_n v^{(n)}\right) = g\left(x_0 + t_1 v^{(1)} + \cdots + t_n v^{(n)}\right).$$
Thus $x = x_0 + t_1 v^{(1)} + \cdots + t_n v^{(n)}$ is the solution to $Ax = b$.

Proof (I): Let $t = (\tau_1, \cdots, \tau_k)^T$. Then
$$g\left(x_0 + \tau_1 v^{(1)} + \cdots + \tau_k v^{(k)}\right) = g(x_0) + t^T \left(v^{(1)}, \cdots, v^{(k)}\right)^T A \left(v^{(1)}, \cdots, v^{(k)}\right) t - 2\,t^T \left(v^{(1)}, \cdots, v^{(k)}\right)^T r^{(0)},$$
$$\nabla_t\,g = 2\left(\left(v^{(1)}, \cdots, v^{(k)}\right)^T A \left(v^{(1)}, \cdots, v^{(k)}\right) t - \left(v^{(1)}, \cdots, v^{(k)}\right)^T r^{(0)}\right),$$
so
$$\min_{\tau_1, \cdots, \tau_k} g\left(x_0 + \tau_1 v^{(1)} + \cdots + \tau_k v^{(k)}\right) \iff \nabla_t\,g = 0.$$

Proof (II): Since the vectors $\{v^{(k)}\}$ are A-orthogonal, the Gram matrix above is diagonal:
$$\nabla_t\,g = 2\left(\mathrm{diag}\left(\left(v^{(1)}\right)^T A\,v^{(1)}, \cdots, \left(v^{(k)}\right)^T A\,v^{(k)}\right) t - \left(v^{(1)}, \cdots, v^{(k)}\right)^T r^{(0)}\right),$$
so
$$\nabla_t\,g = 0 \iff t = \begin{bmatrix} \dfrac{\left(v^{(1)}\right)^T r^{(0)}}{\left(v^{(1)}\right)^T A\,v^{(1)}} \\ \vdots \\ \dfrac{\left(v^{(k)}\right)^T r^{(0)}}{\left(v^{(k)}\right)^T A\,v^{(k)}} \end{bmatrix}.$$

Proof (III): Since
$$\left(v^{(k)}\right)^T r^{(k-1)} = \left(v^{(k)}\right)^T \left(r^{(0)} - \sum_{j=1}^{k-1} t_j\,A\,v^{(j)}\right) = \left(v^{(k)}\right)^T r^{(0)},$$
we have
$$t_k = \frac{\left(v^{(k)}\right)^T r^{(0)}}{\left(v^{(k)}\right)^T A\,v^{(k)}} = \frac{\left(v^{(k)}\right)^T r^{(k-1)}}{\left(v^{(k)}\right)^T A\,v^{(k)}}.$$

A-orthogonality vectors (I)

Thm: Set $v^{(1)} = -r^{(0)}$, and for $k = 2, \cdots, n$
$$v^{(k)} = -r^{(k-1)} + \sum_{j=1}^{k-1} \frac{\left(v^{(j)}\right)^T A\,r^{(k-1)}}{\left(v^{(j)}\right)^T A\,v^{(j)}}\,v^{(j)}.$$
Assume that the $\{v^{(k)}\}$ are non-zero. Then they are A-orthogonal.

Induction Proof: For all $1 \le i < k$,
$$\left(v^{(k)}\right)^T A\,v^{(i)} = -\left(r^{(k-1)}\right)^T A\,v^{(i)} + \sum_{j=1}^{k-1} \frac{\left(v^{(j)}\right)^T A\,r^{(k-1)}}{\left(v^{(j)}\right)^T A\,v^{(j)}} \left(v^{(j)}\right)^T A\,v^{(i)} = -\left(r^{(k-1)}\right)^T A\,v^{(i)} + \left(v^{(i)}\right)^T A\,r^{(k-1)} = 0,$$
since by the induction hypothesis only the $j = i$ term survives in the sum, and $A^T = A$ makes the two remaining terms cancel.

A-orthogonality vectors (II)

Thm: Set $v^{(1)} = -r^{(0)}$ and define $v^{(k)}$, $k = 2, \cdots, n$, as above. Let $x^{(k)} = x^{(0)} + t_1 v^{(1)} + \cdots + t_k v^{(k)}$ and $r^{(k)} = b - A\,x^{(k)}$. Then
$$\left(v^{(j)}\right)^T r^{(k)} = 0, \quad j = 1, \cdots, k; \qquad \left(r^{(j)}\right)^T r^{(k)} = 0, \quad j = 1, \cdots, k-1.$$

Proof: Due to the optimality property of $x^{(k)}$, for all $\tau$ and for $1 \le j \le k$,
$$g\left(x^{(k)}\right) \le g\left(x^{(k)} + \tau\,v^{(j)}\right) = g\left(x^{(k)}\right) - 2\tau\left(r^{(k)}\right)^T v^{(j)} + \tau^2 \left(v^{(j)}\right)^T A\,v^{(j)}.$$
This can hold for all $\tau$ only when $\left(r^{(k)}\right)^T v^{(j)} = 0$. The residual vector orthogonality follows since $r^{(j)}$ is a linear combination of $v^{(1)}, \cdots, v^{(j+1)}$.

A-orthogonality vectors (III)

Thm: With $v^{(1)} = -r^{(0)}$, $v^{(k)}$ as above, $x^{(k)} = x^{(0)} + t_1 v^{(1)} + \cdots + t_k v^{(k)}$ and $r^{(k)} = b - A\,x^{(k)}$,
$$\left(v^{(k)}\right)^T r^{(j)} = -\left(r^{(k-1)}\right)^T r^{(k-1)}, \quad j = 1, \cdots, k-1.$$

Proof (I): For $j = k-1$,
$$\left(v^{(k)}\right)^T r^{(k-1)} = \left(-r^{(k-1)} + \sum_{j=1}^{k-1} \frac{\left(v^{(j)}\right)^T A\,r^{(k-1)}}{\left(v^{(j)}\right)^T A\,v^{(j)}}\,v^{(j)}\right)^T r^{(k-1)} = -\left(r^{(k-1)}\right)^T r^{(k-1)},$$
since $\left(v^{(j)}\right)^T r^{(k-1)} = 0$ for $j \le k-1$.

Proof (II): For $j < k-1$,
$$\left(v^{(k)}\right)^T r^{(j)} = \left(v^{(k)}\right)^T r^{(k-1)} + \left(v^{(k)}\right)^T \left(r^{(j)} - r^{(k-1)}\right) = \left(v^{(k)}\right)^T r^{(k-1)} + \left(v^{(k)}\right)^T \left(\sum_{i=j+1}^{k-1} t_i\,A\,v^{(i)}\right) = -\left(r^{(k-1)}\right)^T r^{(k-1)},$$
where the sum vanishes by A-orthogonality.

A-orthogonality: A Gift from Math God

Set $v^{(1)} = -r^{(0)}$, and for $k = 2, \cdots, n$ expand $v^{(k)}$ in the orthogonal residual basis:
$$v^{(k)} = \sum_{j=0}^{k-1} \frac{\left(v^{(k)}\right)^T r^{(j)}}{\left(r^{(j)}\right)^T r^{(j)}}\,r^{(j)}.$$
Then
$$v^{(k)} = -\sum_{j=0}^{k-1} \frac{\left(r^{(k-1)}\right)^T r^{(k-1)}}{\left(r^{(j)}\right)^T r^{(j)}}\,r^{(j)} = -r^{(k-1)} - \frac{\left(r^{(k-1)}\right)^T r^{(k-1)}}{\left(r^{(k-2)}\right)^T r^{(k-2)}} \sum_{j=0}^{k-2} \frac{\left(r^{(k-2)}\right)^T r^{(k-2)}}{\left(r^{(j)}\right)^T r^{(j)}}\,r^{(j)} = -r^{(k-1)} + s_{k-1}\,v^{(k-1)},$$
with
$$s_{k-1} = \frac{\left(r^{(k-1)}\right)^T r^{(k-1)}}{\left(r^{(k-2)}\right)^T r^{(k-2)}}.$$

Thm: Let $\{v^{(i)}\}_{i=1}^n$ be A-orthogonal with $v^{(1)} = -r^{(0)}$ and for $k = 1, \cdots, n$
$$t_k = \frac{\left(v^{(k)}\right)^T r^{(k-1)}}{\left(v^{(k)}\right)^T A\,v^{(k)}} = -\frac{\left(r^{(k-1)}\right)^T r^{(k-1)}}{\left(v^{(k)}\right)^T A\,v^{(k)}}, \qquad x^{(k)} \stackrel{\rm def}{=} x^{(k-1)} + t_k\,v^{(k)}.$$
Then $A\,x^{(n)} = b$ in exact arithmetic.

Conjugate Gradient Algorithm

Thm: For $k = 1, \cdots, n$, define
$$v^{(k)} = -r^{(k-1)} + s_{k-1}\,v^{(k-1)} \quad \text{with } s_{k-1} = \frac{\left(r^{(k-1)}\right)^T r^{(k-1)}}{\left(r^{(k-2)}\right)^T r^{(k-2)}},$$
$$x^{(k)} = x^{(k-1)} + t_k\,v^{(k)} \quad \text{with } t_k = -\frac{\left(r^{(k-1)}\right)^T r^{(k-1)}}{\left(v^{(k)}\right)^T A\,v^{(k)}}.$$
Then the vectors $\{v^{(k)}\}$ are A-orthogonal and $A\,x^{(n)} = b$ in exact arithmetic.

The CG Algorithm: C is for Craft, G is for Gift.

Algorithm 1: Conjugate Gradient Algorithm
Input: symmetric positive definite $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, initial guess $x^{(0)} \in \mathbb{R}^n$, and tolerance $\tau > 0$.
Output: approximate solution $x$.
Initialize: $r^{(0)} = b - A\,x^{(0)}$, $v^{(1)} = -r^{(0)}$, $k = 1$.
while $\left\|r^{(k-1)}\right\|_2 \ge \tau$ do
  $t_k = -\left(r^{(k-1)}\right)^T r^{(k-1)} \big/ \left(v^{(k)}\right)^T \left(A\,v^{(k)}\right)$
  $x^{(k)} = x^{(k-1)} + t_k\,v^{(k)}$
  $r^{(k)} = r^{(k-1)} - t_k\,A\,v^{(k)}$
  $s_k = \left(r^{(k)}\right)^T r^{(k)} \big/ \left(r^{(k-1)}\right)^T r^{(k-1)}$
  $v^{(k+1)} = -r^{(k)} + s_k\,v^{(k)}$
  $k = k + 1$
end while
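A direct transcription of Algorithm 1 into NumPy (a sketch; in exact arithmetic CG terminates within $n$ steps, while in floating point the tolerance test governs):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    x = x0.copy()
    r = b - A @ x                 # r^(0)
    v = -r                        # v^(1) = -r^(0)
    rr = r @ r
    while np.sqrt(rr) >= tol:
        Av = A @ v
        t = -rr / (v @ Av)        # t_k = -(r^T r)/(v^T A v)
        x = x + t * v             # x^(k)
        r = r - t * Av            # r^(k) = r^(k-1) - t_k A v^(k)
        rr_new = r @ r
        s = rr_new / rr           # s_k
        v = -r + s * v            # v^(k+1) = -r^(k) + s_k v^(k)
        rr = rr_new
    return x

# Ex: the SPD tridiagonal system from §7.4
A = np.array([[4., 3, 0], [3, 4, -1], [0, -1, 4]])
b = np.array([1., 1, 1])
print(conjugate_gradient(A, b, np.zeros(3)))   # -> approximately [0, 1/3, 1/3]
```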