§7.3 The Jacobi and Gauss-Seidel Iterative Techniques

• Problem: To solve A x = b for A ∈ R^{n×n}.
• Methodology: Iteratively approximate the solution x. No GEPP.

Matrix splitting:

A = diag(a_{1,1}, a_{2,2}, \cdots, a_{n,n})
    + \begin{bmatrix} 0 & & & \\ a_{2,1} & 0 & & \\ \vdots & \ddots & \ddots & \\ a_{n,1} & \cdots & a_{n,n-1} & 0 \end{bmatrix}
    + \begin{bmatrix} 0 & a_{1,2} & \cdots & a_{1,n} \\ & 0 & \ddots & \vdots \\ & & \ddots & a_{n-1,n} \\ & & & 0 \end{bmatrix}
  \stackrel{def}{=} D − L − U,

so L and U are the negatives of the strictly lower and strictly upper triangular parts of A.

Ex: Matrix splitting for

A = \begin{bmatrix} 10 & -1 & 2 & 0 \\ -1 & 11 & -1 & 3 \\ 2 & -1 & 10 & -1 \\ 0 & 3 & -1 & 8 \end{bmatrix}:

A = D − L − U
  = diag(10, 11, 10, 8)
    − \begin{bmatrix} 0 & & & \\ 1 & 0 & & \\ -2 & 1 & 0 & \\ 0 & -3 & 1 & 0 \end{bmatrix}
    − \begin{bmatrix} 0 & 1 & -2 & 0 \\ & 0 & 1 & -3 \\ & & 0 & 1 \\ & & & 0 \end{bmatrix}.
The Jacobi and Gauss-Seidel Methods for solving A x = b

Jacobi Method: With matrix splitting A = D − L − U, rewrite

x = D^{-1} (L + U) x + D^{-1} b.

Jacobi iteration with given x^{(0)}:

x^{(k+1)} = D^{-1} (L + U) x^{(k)} + D^{-1} b, for k = 0, 1, 2, ···.

Gauss-Seidel Method: Rewrite

x = (D − L)^{-1} U x + (D − L)^{-1} b.

Gauss-Seidel iteration with given x^{(0)}:

x^{(k+1)} = (D − L)^{-1} U x^{(k)} + (D − L)^{-1} b, for k = 0, 1, 2, ···.
Ex: Jacobi Method for A x = b, with

A = \begin{bmatrix} 10 & -1 & 2 & 0 \\ -1 & 11 & -1 & 3 \\ 2 & -1 & 10 & -1 \\ 0 & 3 & -1 & 8 \end{bmatrix},
b = \begin{bmatrix} 6 \\ 25 \\ -11 \\ 15 \end{bmatrix},

and the matrix splitting A = D − L − U as above.

Jacobi iteration with x^{(0)} = 0, for k = 0, 1, 2, ···:

x_J^{(k+1)} = D^{-1} (L + U) x_J^{(k)} + D^{-1} b
            = \begin{bmatrix} 0 & 1/10 & -2/10 & 0 \\ 1/11 & 0 & 1/11 & -3/11 \\ -2/10 & 1/10 & 0 & 1/10 \\ 0 & -3/8 & 1/8 & 0 \end{bmatrix} x_J^{(k)}
              + \begin{bmatrix} 6/10 \\ 25/11 \\ -11/10 \\ 15/8 \end{bmatrix}.
Ex: Gauss-Seidel Method for A x = b, with A = D − L − U:

D − L = \begin{bmatrix} 10 & 0 & 0 & 0 \\ -1 & 11 & 0 & 0 \\ 2 & -1 & 10 & 0 \\ 0 & 3 & -1 & 8 \end{bmatrix},
U = \begin{bmatrix} 0 & 1 & -2 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.

Gauss-Seidel iteration with x^{(0)} = 0, for k = 0, 1, 2, ···:

x_GS^{(k+1)} = (D − L)^{-1} U x_GS^{(k)} + (D − L)^{-1} b,

i.e., each step solves the lower triangular system

\begin{bmatrix} 10 & 0 & 0 & 0 \\ -1 & 11 & 0 & 0 \\ 2 & -1 & 10 & 0 \\ 0 & 3 & -1 & 8 \end{bmatrix} x_GS^{(k+1)}
= \begin{bmatrix} 0 & 1 & -2 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} x_GS^{(k)}
+ \begin{bmatrix} 6 \\ 25 \\ -11 \\ 15 \end{bmatrix}.

Jacobi vs. Gauss-Seidel: solution x = (1, 2, −1, 1)^T.
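For concreteness, a minimal MATLAB sketch (not from the original slides; the variable names and the iteration count 25 are illustrative) that runs both iterations on this example:

    % Jacobi vs. Gauss-Seidel on the 4x4 example; sign convention A = D - L - U
    A = [10 -1 2 0; -1 11 -1 3; 2 -1 10 -1; 0 3 -1 8];
    b = [6; 25; -11; 15];
    D = diag(diag(A));      % diagonal part
    L = -tril(A, -1);       % minus the strictly lower triangular part
    U = -triu(A, 1);        % minus the strictly upper triangular part
    xJ = zeros(4, 1);  xGS = zeros(4, 1);
    for k = 1:25
        xJ  = D \ ((L + U)*xJ + b);      % Jacobi:       D x_{k+1} = (L+U) x_k + b
        xGS = (D - L) \ (U*xGS + b);     % Gauss-Seidel: (D-L) x_{k+1} = U x_k + b
    end
    disp([xJ xGS])          % both columns approach x = [1; 2; -1; 1]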
[Figure: Convergence Comparison, Jacobi vs. G-S — error on a log scale (10^0 down to 10^{-16}) versus iteration count (0 to 50) for both methods.]

General Iteration Methods
To solve A x = b with matrix splitting A = D − L − U:

• Jacobi Method:

  x_J^{(k+1)} = D^{-1} (L + U) x_J^{(k)} + D^{-1} b.

• Gauss-Seidel Method:

  x_GS^{(k+1)} = (D − L)^{-1} U x_GS^{(k)} + (D − L)^{-1} b.

General Iteration Method: for k = 0, 1, 2, ···

x^{(k+1)} = T x^{(k)} + c.
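A hedged MATLAB sketch of the general iteration (the function name general_iter and the relative stopping test are illustrative assumptions, not from the slides):

    % General iteration x^{k+1} = T x^k + c; stops when successive iterates
    % agree to a relative tolerance, or after maxit steps.
    function x = general_iter(T, c, x0, tol, maxit)
        x = x0;
        for k = 1:maxit
            xnew = T*x + c;
            if norm(xnew - x) <= tol*norm(xnew)
                x = xnew;  return;
            end
            x = xnew;
        end
    end

For example, Jacobi corresponds to T = D\(L+U) and c = D\b.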
Next: convergence analysis of the General Iteration Method.

General Iteration: x^{(k+1)} = T x^{(k)} + c for k = 0, 1, 2, ···

Thm: The following statements are equivalent:
• ρ(T) < 1.
• The equation x = T x + c (1) has a unique solution and {x^{(k)}} converges to this solution from any x^{(0)}.

Proof: Assume ρ(T) < 1. Then (1) has a unique solution x^{(∗)}, and

x^{(k+1)} − x^{(∗)} = T (x^{(k)} − x^{(∗)}) = T^2 (x^{(k−1)} − x^{(∗)}) = ··· = T^{k+1} (x^{(0)} − x^{(∗)}) ⟹ 0.

Conversely, if ··· (omitted).

Jacobi on a random upper triangular matrix:
• A = D − U. T = D^{-1} U with ρ(T) = 0.
[Figure: sparsity pattern (spy plot) of A — randn upper triangular with n = 50, nz = 1275.]

• Convergence plot:
[Figure: G-S convergence on the upper triangular matrix — residual on a log scale (from about 10^{14} down to 10^{-4}) over 2000 iterations.]
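As a sanity check on the claim ρ(T) = 0, a small MATLAB sketch (illustrative, not from the slides); note that for upper triangular A the Jacobi and Gauss-Seidel iteration matrices coincide, since L = 0:

    % Iteration matrix for upper triangular A = D - U:
    % T = D^{-1} U is strictly upper triangular, hence nilpotent, so rho(T) = 0.
    n = 50;
    A = triu(randn(n));         % random upper triangular; diagonal nonzero w.p. 1
    D = diag(diag(A));
    U = -triu(A, 1);
    T = D \ U;
    disp(max(abs(eig(T))))      % spectral radius: exactly 0 (triangular, zero diagonal)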
§7.4 Relaxation Techniques for Solving Linear Systems

To solve A x = b with matrix splitting A = D − L − U, rewrite

D x = D x, and ω L x = ω (D − U) x − ω b, for any ω.

Taking the difference of the two equations,

(D − ω L) x = ((1 − ω) D + ω U) x + ω b.

Successive Over-Relaxation (SOR), for k = 0, 1, 2, ···:

x_SOR^{(k+1)} = (D − ω L)^{-1} ((1 − ω) D + ω U) x_SOR^{(k)} + ω (D − ω L)^{-1} b
              \stackrel{def}{=} T_SOR x_SOR^{(k)} + c_SOR.

SOR converges if ρ(T_SOR) < 1.

A good choice of ω is tricky, but critical for accelerated convergence.
Optimal SOR parameters

Thm: If A is symmetric positive definite and tridiagonal, then

ρ(T_GS) = (ρ(T_J))^2 < 1,

and the optimal choice of ω for the SOR method is

ω_OPT = \frac{2}{1 + \sqrt{1 − (ρ(T_J))^2}},

with

ρ(T_SOR) = ω_OPT − 1 = \frac{(ρ(T_J))^2}{(1 + \sqrt{1 − (ρ(T_J))^2})^2}.

Ex:

A = \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix},
b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},
x = \begin{bmatrix} 0 \\ 1/3 \\ 1/3 \end{bmatrix}.

• A is symmetric positive definite and tridiagonal: the leading principal minors are 4 > 0, det([4 3; 3 4]) = 7 > 0, det(A) = 24 > 0.

• ρ(T_J) = \sqrt{0.625}, where

T_J = D^{-1} (L + U) = \frac{1}{4} \begin{bmatrix} 0 & -3 & 0 \\ -3 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.

• Optimal ω:

ω_OPT = \frac{2}{1 + \sqrt{1 − (ρ(T_J))^2}} = \frac{2}{1 + \sqrt{0.375}} ≈ 1.24.
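A short MATLAB sketch (setup assumed, not from the slides) running SOR with the optimal ω on this example:

    % SOR on the SPD tridiagonal example, using the optimal omega
    A = [4 3 0; 3 4 -1; 0 -1 4];   b = [1; 1; 1];
    D = diag(diag(A));  L = -tril(A, -1);  U = -triu(A, 1);
    rhoJ  = max(abs(eig(D \ (L + U))));    % rho(T_J) = sqrt(0.625) here
    omega = 2 / (1 + sqrt(1 - rhoJ^2));    % omega_OPT, about 1.24
    x = zeros(3, 1);
    for k = 1:50
        x = (D - omega*L) \ (((1 - omega)*D + omega*U)*x + omega*b);
    end
    disp(x')                               % approaches x = [0, 1/3, 1/3]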
[Figure: Convergence Comparison, G-S vs. SOR — error on a log scale (10^0 down to 10^{-18}) versus iteration count (0 to 150).]
§7.5 Error Bounds and Iterative Refinement

Assume that x̂ is an approximation to the solution x of A x = b.

• Residual: r̂ \stackrel{def}{=} b − A x̂ = A (x − x̂). Thus small ‖x − x̂‖ implies small ‖r̂‖.
• However, big ‖x − x̂‖ can still lead to small ‖r̂‖. Ex:

\begin{bmatrix} 1 & 2 \\ 1 + 10^{-τ} & 2 \end{bmatrix} x = \begin{bmatrix} 3 \\ 3 + 10^{-τ} \end{bmatrix}.

Exact solution x = (1, 1)^T. The bad approximation x̂ = (3, 0)^T has a small residual for large τ:

r̂ = \begin{bmatrix} 3 \\ 3 + 10^{-τ} \end{bmatrix}
  − \begin{bmatrix} 1 & 2 \\ 1 + 10^{-τ} & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 0 \end{bmatrix}
  = \begin{bmatrix} 0 \\ −2 × 10^{-τ} \end{bmatrix}.
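A quick MATLAB check of this example (τ = 4 chosen for illustration):

    % Small residual, large error, for the nearly singular 2x2 system
    tau = 4;
    A = [1 2; 1 + 10^(-tau) 2];   b = [3; 3 + 10^(-tau)];
    xhat = [3; 0];                     % bad approximation
    disp(norm(b - A*xhat))             % residual ~ 2e-4: small
    disp(norm(xhat - [1; 1]))          % error ~ 2.24: large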
Near Linear Dependence

For τ = 4, the two equations define two nearly parallel lines:

ℓ_1: x_1 + 2 x_2 = 3, and ℓ_2: 1.0001 x_1 + 2 x_2 = 3.0001.

Exactly parallel lines would have no intersection at all; nearly parallel lines intersect at a point that small perturbations can move far. This is why a nearly dependent system admits large solution errors despite small residuals.
Let A x = b with non-singular A and non-zero b.

Thm: Assume x̂ is an approximate solution with r̂ = b − A x̂. Then for any natural norm,

‖x̂ − x‖ ≤ ‖A^{-1}‖ ‖r̂‖, and \frac{‖x̂ − x‖}{‖x‖} ≤ κ(A) \frac{‖r̂‖}{‖b‖},

where κ(A) \stackrel{def}{=} ‖A‖ ‖A^{-1}‖ is the condition number of A.

• A is well-conditioned if κ(A) = O(1): a small residual implies a small solution error.
• A is ill-conditioned if κ(A) ≫ 1: a small residual may still allow a large solution error.

Ex: Condition Number for A = \begin{bmatrix} 1 & 2 \\ 1 + 10^{-τ} & 2 \end{bmatrix}
Solution: For τ > 0, ‖A‖_∞ = 3 + 10^{-τ}. Since

A^{-1} = −\frac{10^τ}{2} \begin{bmatrix} 2 & -2 \\ -(1 + 10^{-τ}) & 1 \end{bmatrix},

we have ‖A^{-1}‖_∞ = 2 × 10^τ. Thus

κ(A) = ‖A‖_∞ ‖A^{-1}‖_∞ = 6 × 10^τ + 2.

κ(A) grows exponentially in τ; A is ill-conditioned for large τ.
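A quick numerical confirmation in MATLAB (the range of τ is illustrative):

    % Condition number of A grows like 6*10^tau + 2
    for tau = 1:6
        A = [1 2; 1 + 10^(-tau) 2];
        fprintf('tau = %d: cond_inf = %.4g, formula = %.4g\n', ...
                tau, cond(A, inf), 6*10^tau + 2);
    end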
Iterative Refinement (I)

• Let A x = b with non-singular A and non-zero b.
• Let F(·) be an in-exact equation solver, so that F(b) is an approximate solution.
• Assume F(·) is accurate enough that there exists a ρ < 1 so that

\frac{‖b − A F(b)‖}{‖b‖} ≤ ρ for any b ≠ 0.

In practice,
• F(·) could come from an (in-exact) LU factorization: F(b) = U^{-1} (L^{-1} b).
• Inaccuracies in the LU factorization could be due to rounding error: A ≈ L U.

Ex: A = randn(n, n), b = randn(n, 1), n = 3000
• LU factorize A to get L, U (LU without pivoting),
• x_0 = U^{-1} (L^{-1} b),
• r_0 = b − A x_0,
• Δx_1 = U^{-1} (L^{-1} r_0),
• r_1 = r_0 − A Δx_1,
• x = x_0 + Δx_1,
• disp([norm(r0), norm(r1)]) prints

  2.6606e-07   1.0996e-16
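A hedged MATLAB sketch of this experiment (MATLAB's built-in lu pivots by default, so the no-pivoting elimination is written out explicitly; the residual norms above depend on the random matrix, so your numbers will differ):

    % LU without pivoting, then one step of iterative refinement
    n = 3000;
    A = randn(n);   b = randn(n, 1);
    Lf = eye(n);    Uf = A;
    for j = 1:n-1                            % Gaussian elimination, no pivoting
        rows = j+1:n;
        Lf(rows, j) = Uf(rows, j) / Uf(j, j);
        Uf(rows, :) = Uf(rows, :) - Lf(rows, j) * Uf(j, :);
    end
    x0  = Uf \ (Lf \ b);    r0 = b - A*x0;   % in-exact solve and its residual
    dx1 = Uf \ (Lf \ r0);   r1 = r0 - A*dx1; % one refinement step
    x1  = x0 + dx1;                          % refined solution
    disp([norm(r0), norm(r1)])               % refinement shrinks the residual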
Iterative Refinement (II)

Given a tolerance τ > 0 and x^{(0)}:
• Initialize r^{(0)} = b − A x^{(0)}.
• For k = 0, 1, ···:
  • Compute Δx^{(k)} = F(r^{(k)}), x^{(k+1)} = x^{(k)} + Δx^{(k)}, r^{(k+1)} = r^{(k)} − A Δx^{(k)}.
  • If ‖r^{(k+1)}‖ ≤ τ ‖b‖, stop.
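A minimal MATLAB sketch of this loop (the function name iter_refine and the function-handle interface for F are assumptions, not from the slides):

    % Iterative refinement with an in-exact solver F, e.g.
    % F = @(r) Uf \ (Lf \ r) from the LU factorization above.
    function x = iter_refine(A, b, F, x0, tau, maxit)
        x = x0;   r = b - A*x;
        for k = 1:maxit
            dx = F(r);
            x  = x + dx;
            r  = r - A*dx;
            if norm(r) <= tau*norm(b), return; end
        end
    end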
Convergence Proof: Since r^{(k+1)} = r^{(k)} − A Δx^{(k)} = r^{(k)} − A F(r^{(k)}), the accuracy assumption on F gives

‖r^{(k+1)}‖ ≤ ρ ‖r^{(k)}‖ ≤ ρ^2 ‖r^{(k−1)}‖ ≤ ··· ≤ ρ^{k+1} ‖r^{(0)}‖ ⟹ 0.

Perturbation Theory
Thm: Let x and x̂ be solutions to

A x = b and (A + ΔA) x̂ = b + Δb

with perturbations ΔA and Δb. Then

\frac{‖x − x̂‖}{‖x‖} ≤ \frac{κ(A)}{1 − κ(A) \frac{‖ΔA‖}{‖A‖}} \left( \frac{‖ΔA‖}{‖A‖} + \frac{‖Δb‖}{‖b‖} \right),

with κ(A) = ‖A‖ ‖A^{-1}‖.
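A small MATLAB sketch (sizes and perturbation levels are illustrative) checking the bound empirically in the 2-norm:

    % Empirical check of the perturbation bound
    n = 100;
    A = randn(n);   b = randn(n, 1);   x = A \ b;
    dA = 1e-8*randn(n);   db = 1e-8*randn(n, 1);
    xhat = (A + dA) \ (b + db);
    kap  = cond(A);
    lhs  = norm(x - xhat)/norm(x);
    rhs  = kap/(1 - kap*norm(dA)/norm(A)) * (norm(dA)/norm(A) + norm(db)/norm(b));
    fprintf('relative error %.3g <= bound %.3g\n', lhs, rhs);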
§7.6 The Conjugate Gradient Method (CG) for A x = b

Assumption: A is symmetric positive definite (SPD):
• A^T = A,
• x^T A x ≥ 0 for any x,
• x^T A x = 0 if and only if x = 0.

Thm: The vector x^∗ solves the SPD equations A x = b if and only if it minimizes the function

g(x) \stackrel{def}{=} x^T A x − 2 x^T b.

Proof: Let A x^∗ = b. Then

g(x) = x^T A x − 2 x^T A x^∗
     = (x − x^∗)^T A (x − x^∗) − (x^∗)^T A x^∗
     = (x − x^∗)^T A (x − x^∗) + g(x^∗).

Thus, g(x) ≥ g(x^∗) for all x; and g(x) = g(x^∗) iff x = x^∗.

CG for A x = b
Recall: x^∗ solves the SPD equations A x = b if and only if it minimizes g(x) = x^T A x − 2 x^T b.

The CG Idea: Starting from an initial vector x^{(0)}, quickly compute new vectors x^{(1)}, ···, x^{(k)}, ···, with

g(x^{(0)}) > g(x^{(1)}) > g(x^{(2)}) > ··· > g(x^{(k)}) > ···,

so that the sequence {x^{(k)}} converges to x^∗.

• Descent Method: Assume a search direction v^{(k)} at the iterate x^{(k−1)}; the next iterate is taken with the step size t_k such that

x^{(k)} \stackrel{def}{=} x^{(k−1)} + t_k v^{(k)} minimizes g(x^{(k−1)} + t v^{(k)}).

• Optimality Condition:

0 = \frac{d}{dt} g(x^{(k−1)} + t v^{(k)}) = (v^{(k)})^T ∇g(x^{(k−1)} + t v^{(k)}) = (v^{(k)})^T (2 A (x^{(k−1)} + t v^{(k)}) − 2 b),

which gives the step size

t_k = \frac{(v^{(k)})^T (b − A x^{(k−1)})}{(v^{(k)})^T A v^{(k)}}.
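A short MATLAB sketch of one descent step with this t_k (the function name descent_step is illustrative; with v = r, the residual, this is the steepest-descent building block behind CG):

    % One exact line-search step for g(x) = x'*A*x - 2*x'*b along direction v
    function x = descent_step(A, b, x, v)
        r = b - A*x;                 % current residual
        t = (v'*r) / (v'*(A*v));     % optimal step size t_k from the optimality condition
        x = x + t*v;                 % updated iterate
    end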