Conjugate Gradient Method (Part 4): Pre-Conditioning and Nonlinear Conjugate Gradient Method
18-660: Numerical Methods for Engineering Design and Optimization

Xin Li
Department of ECE
Carnegie Mellon University
Pittsburgh, PA 15213

Slide 1

Overview

- Conjugate Gradient Method (Part 4)
  - Pre-conditioning
  - Nonlinear conjugate gradient method

Slide 2

Conjugate Gradient Method

- Step 1: start from an initial guess $X^{(0)}$, and set $k = 0$
- Step 2: calculate
  $$D^{(0)} = R^{(0)} = B - AX^{(0)}$$
- Step 3: update solution
  $$X^{(k+1)} = X^{(k)} + \mu^{(k)} D^{(k)} \quad \text{where} \quad \mu^{(k)} = \frac{D^{(k)T} R^{(k)}}{D^{(k)T} A D^{(k)}}$$
- Step 4: calculate residual
  $$R^{(k+1)} = R^{(k)} - \mu^{(k)} A D^{(k)}$$
- Step 5: determine search direction
  $$D^{(k+1)} = R^{(k+1)} + \beta_{k+1,k} D^{(k)} \quad \text{where} \quad \beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{D^{(k)T} R^{(k)}}$$
- Step 6: set $k = k + 1$ and go to Step 3

Slide 3

Convergence Rate

$$\left\| X^{(k+1)} - X \right\| \le \left[ \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right]^{k} \cdot \left\| X^{(0)} - X \right\|$$

Here $X$ is the exact solution and $\kappa(A)$ is the condition number of $A$. Stated precisely, the bound holds in the $A$-norm up to a constant factor of 2.

- Conjugate gradient method has slow convergence if $\kappa(A)$ is large
  - I.e., $AX = B$ is ill-conditioned
- In this case, we want to improve the convergence rate by pre-conditioning

Slide 4

Pre-Conditioning

- Key idea
  - Convert $AX = B$ to another equivalent equation $\tilde{A}\tilde{X} = \tilde{B}$
  - Solve $\tilde{A}\tilde{X} = \tilde{B}$ by conjugate gradient method
- Important constraints to construct $\tilde{A}\tilde{X} = \tilde{B}$
  - $\tilde{A}$ is symmetric and positive definite, so that we can solve it by conjugate gradient method
  - $\tilde{A}$ has a small condition number, so that we can achieve fast convergence

Slide 5

Pre-Conditioning

$$AX = B$$
$$L^{-1} A \cdot X = L^{-1} B$$
$$\underbrace{L^{-1} A L^{-T}}_{\tilde{A}} \cdot \underbrace{L^{T} X}_{\tilde{X}} = \underbrace{L^{-1} B}_{\tilde{B}}$$

- $L^{-1} A L^{-T}$ is symmetric and positive definite, if $A$ is symmetric and positive definite
  $$\left( L^{-1} A L^{-T} \right)^{T} = L^{-1} A L^{-T}$$
  $$X^{T} L^{-1} A L^{-T} X = \left( L^{-T} X \right)^{T} \cdot A \cdot \left( L^{-T} X \right) > 0 \quad \text{for } X \ne 0$$

Slide 6

Pre-Conditioning

- $L^{-1} A L^{-T}$ has a small condition number, if $L$ is properly selected
- In theory, $L$ can be optimally found by Cholesky decomposition
  $$A = L L^{T}$$
  $$L^{-1} A L^{-T} = L^{-1} \cdot L L^{T} \cdot L^{-T} = I \quad \text{(identity matrix)}$$
- However, Cholesky decomposition is not efficient for large, sparse problems
- If we know the Cholesky decomposition, we have almost solved the equation, and there is no need to use conjugate gradient method

Slide 7

Pre-Conditioning

- In practice, $L$ can be constructed in many possible ways
- Diagonal pre-conditioning (or Jacobi pre-conditioning)
  - Scale $A$ along coordinate axes
  $$L = \begin{bmatrix} \sqrt{a_{11}} & & \\ & \sqrt{a_{22}} & \\ & & \ddots \end{bmatrix}$$

Slide 8

Pre-Conditioning

- Incomplete Cholesky pre-conditioning
  - $L$ is lower-triangular
  - Few or no fill-ins are allowed
  - $A \approx L L^{T}$ (not exactly equal)
  - For example, with $\times$ marking the allowed nonzero entries:
  $$L = \begin{bmatrix} \times & & \\ \times & \times & \\ & \times & \times \end{bmatrix}$$

Slide 9

Pre-Conditioning

- Step 1: start from an initial guess $\tilde{X}^{(0)}$, and set $k = 0$
- Step 2: calculate
  $$\tilde{D}^{(0)} = \tilde{R}^{(0)} = L^{-1} B - L^{-1} A L^{-T} \tilde{X}^{(0)}$$
- Step 3: update solution
  $$\tilde{X}^{(k+1)} = \tilde{X}^{(k)} + \tilde{\mu}^{(k)} \tilde{D}^{(k)} \quad \text{where} \quad \tilde{\mu}^{(k)} = \frac{\tilde{D}^{(k)T} \tilde{R}^{(k)}}{\tilde{D}^{(k)T} L^{-1} A L^{-T} \tilde{D}^{(k)}}$$
- Step 4: calculate residual
  $$\tilde{R}^{(k+1)} = \tilde{R}^{(k)} - \tilde{\mu}^{(k)} L^{-1} A L^{-T} \tilde{D}^{(k)}$$
- Step 5: determine search direction
  $$\tilde{D}^{(k+1)} = \tilde{R}^{(k+1)} + \tilde{\beta}_{k+1,k} \tilde{D}^{(k)} \quad \text{where} \quad \tilde{\beta}_{k+1,k} = \frac{\tilde{R}^{(k+1)T} \tilde{R}^{(k+1)}}{\tilde{D}^{(k)T} \tilde{R}^{(k)}}$$
- Step 6: set $k = k + 1$ and go to Step 3

Slide 10

Pre-Conditioning

- The formulas in Steps 2 to 5 involve $L^{-1}$ and $L^{-T}$
- $L^{-1}$ should not be explicitly computed
- Instead, $Y = L^{-1} W$ or $Y = L^{-T} W$ (where $W$ is a vector) should be computed by solving the linear equation $LY = W$ or $L^{T} Y = W$

Slide 11

Pre-Conditioning

- Diagonal pre-conditioning
  - $L$ is a diagonal matrix
  - $Y = L^{-1} W$ or $Y = L^{-T} W$ can be found by simply scaling
  $$\begin{bmatrix} \sqrt{a_{11}} & & \\ & \sqrt{a_{22}} & \\ & & \ddots \end{bmatrix} \cdot Y = W$$
  $$y_1 = \frac{w_1}{\sqrt{a_{11}}}, \qquad y_2 = \frac{w_2}{\sqrt{a_{22}}}, \qquad \ldots$$

Slide 12

Pre-Conditioning

- Incomplete Cholesky pre-conditioning
  - $L$ is lower-triangular
  - $Y = L^{-1} W$ can be found by forward substitution, and $Y = L^{-T} W$ by backward substitution
  $$\begin{bmatrix} l_{11} & & \\ l_{21} & l_{22} & \\ l_{31} & l_{32} & \ddots \end{bmatrix} \cdot Y = W$$
  $$y_1 = \frac{w_1}{l_{11}}, \qquad y_2 = \frac{w_2 - l_{21} y_1}{l_{22}}, \qquad \ldots$$

Slide 13

Pre-Conditioning

- Once $\tilde{X}$ is known, $X$ is calculated as $X = L^{-T} \tilde{X}$
  - Solve the linear equation $L^{T} X = \tilde{X}$ by backward substitution
- Diagonal pre-conditioning
  $$\begin{bmatrix} \sqrt{a_{11}} & 0 & 0 \\ & \sqrt{a_{22}} & 0 \\ & & \ddots \end{bmatrix} \cdot X = \tilde{X}$$
  $$x_1 = \frac{\tilde{x}_1}{\sqrt{a_{11}}}, \qquad x_2 = \frac{\tilde{x}_2}{\sqrt{a_{22}}}, \qquad \ldots$$
- Incomplete Cholesky pre-conditioning
  $$\begin{bmatrix} l_{11} & l_{21} & l_{31} \\ & l_{22} & l_{32} \\ & & \ddots \end{bmatrix} \cdot X = \tilde{X}$$
  $$x_N = \frac{\tilde{x}_N}{l_{NN}}, \qquad x_{N-1} = \frac{\tilde{x}_{N-1} - l_{N,N-1}\, x_N}{l_{N-1,N-1}}, \qquad \ldots$$

Slide 14

Nonlinear Conjugate Gradient Method

- Conjugate gradient method can be extended to general (i.e., non-quadratic) unconstrained nonlinear optimization
  - Quadratic programming:
    $$\min_{X} \; \frac{1}{2} X^{T} A X - B^{T} X + C \qquad \Rightarrow \qquad X = A^{-1} B$$
  - Nonlinear programming:
    $$\min_{X} \; f(X)$$
- A number of changes must be made to solve nonlinear optimization problems

Slide 15

Nonlinear Conjugate Gradient Method

- Starting point: the conjugate gradient method for $AX = B$ (Steps 1 to 6 as listed under "Conjugate Gradient Method")

Slide 16

Nonlinear Conjugate Gradient Method

- New definition of residual
  - Quadratic programming:
    $$D^{(0)} = R^{(0)} = B - AX^{(0)}, \qquad R^{(k+1)} = R^{(k)} - \mu^{(k)} A D^{(k)}$$
  - Nonlinear programming:
    $$R^{(k)} = -\nabla f\left[ X^{(k)} \right]$$
- "Residual" is defined by the gradient of $f(X)$
  - If $X^{*}$ is optimal, $\nabla f(X^{*}) = 0$
  - $-\nabla f(X) = B - AX$ for quadratic programming

Slide 17

Nonlinear Conjugate Gradient Method

- New formula for conjugate search directions
  - Quadratic programming:
    $$D^{(k+1)} = R^{(k+1)} + \beta_{k+1,k} D^{(k)} \quad \text{where} \quad \beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{D^{(k)T} R^{(k)}}$$
- Ideally, search directions should be computed by Gram-Schmidt conjugation of residuals
- In practice, we often use approximate formulas
  - Fletcher-Reeves formula:
    $$\beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{R^{(k)T} R^{(k)}}$$
  - Polak-Ribiere formula:
    $$\beta_{k+1,k} = \frac{R^{(k+1)T} \cdot \left[ R^{(k+1)} - R^{(k)} \right]}{R^{(k)T} R^{(k)}}$$

Slide 18

Nonlinear Conjugate Gradient Method

- Optimal step size calculated by one-dimensional search
  - Quadratic programming:
    $$X^{(k+1)} = X^{(k)} + \mu^{(k)} D^{(k)} \quad \text{where} \quad \mu^{(k)} = \frac{D^{(k)T} R^{(k)}}{D^{(k)T} A D^{(k)}}$$
- For a general $f(X)$, $\mu^{(k)}$ cannot be calculated analytically
- Optimize $\mu^{(k)}$ by one-dimensional search
  $$\min_{\mu^{(k)}} \; f\left[ X^{(k+1)} \right] = f\left[ X^{(k)} + \mu^{(k)} D^{(k)} \right]$$

Slide 19

Nonlinear Conjugate Gradient Method

- Step 1: start from an initial guess $X^{(0)}$, and set $k = 0$
- Step 2: calculate
  $$D^{(0)} = R^{(0)} = -\nabla f\left[ X^{(0)} \right]$$
- Step 3: update solution
  $$\min_{\mu^{(k)}} \; f\left[ X^{(k)} + \mu^{(k)} D^{(k)} \right], \qquad X^{(k+1)} = X^{(k)} + \mu^{(k)} D^{(k)}$$
- Step 4: calculate residual
  $$R^{(k+1)} = -\nabla f\left[ X^{(k+1)} \right]$$
- Step 5: determine search direction (Fletcher-Reeves formula)
  $$\beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{R^{(k)T} R^{(k)}}, \qquad D^{(k+1)} = R^{(k+1)} + \beta_{k+1,k} D^{(k)}$$
- Step 6: set $k = k + 1$ and go to Step 3

Slide 20

Nonlinear Conjugate Gradient Method

- Gradient method, conjugate gradient method and Newton method
  - Conjugate gradient method is often preferred for many practical large-scale engineering problems

                           Gradient   Conjugate Gradient   Newton
  1st-Order Derivative     Yes        Yes                  Yes
  2nd-Order Derivative     No         No                   Yes
  Pre-conditioning         No         Yes                  No
  Cost per Iteration       Low        Low                  High
  Convergence Rate         Slow       Fast                 Fast
  Preferred Problem Size   Large      Large                Small

Slide 21

Summary

- Conjugate gradient method (Part 4)
  - Pre-conditioning
  - Nonlinear conjugate gradient method

Slide 22
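
The pre-conditioned iteration (Steps 1 to 6 under "Pre-Conditioning") can be sketched in code for the diagonal (Jacobi) case. This is a minimal illustration, not the course's reference implementation. It assumes L = diag(sqrt(a_ii)), a dense list-of-rows matrix, and a zero initial guess; the names `jacobi_pcg`, `matvec`, and `dot`, and the tolerance and iteration limit, are hypothetical choices made for this example.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A by running CG on
    (L^-1 A L^-T)(L^T x) = L^-1 b, with the assumed choice L = diag(sqrt(a_ii))."""
    n = len(b)
    s = [math.sqrt(A[i][i]) for i in range(n)]     # diagonal entries of L

    def apply_A_tilde(v):
        # Computes L^-1 A L^-T v by scaling; no inverse matrix is formed.
        y = [vi / si for vi, si in zip(v, s)]      # y = L^-T v
        z = matvec(A, y)                           # z = A y
        return [zi / si for zi, si in zip(z, s)]   # L^-1 z

    x_t = [0.0] * n                                # X~(0) = 0 (assumed initial guess)
    r = [bi / si for bi, si in zip(b, s)]          # R~(0) = L^-1 b - A~ X~(0)
    d = r[:]                                       # D~(0) = R~(0)
    iterations = 0
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:             # stop once the residual is small
            break
        Ad = apply_A_tilde(d)
        dr = dot(d, r)
        mu = dr / dot(d, Ad)                       # Step 3: step size
        x_t = [xi + mu * di for xi, di in zip(x_t, d)]
        r = [ri - mu * adi for ri, adi in zip(r, Ad)]   # Step 4: residual update
        beta = dot(r, r) / dr                      # Step 5: new R^T R over old D^T R
        d = [ri + beta * di for ri, di in zip(r, d)]
        iterations = k + 1
    x = [xi / si for xi, si in zip(x_t, s)]        # recover X = L^-T X~
    return x, iterations

# Example call on a small symmetric positive definite system.
solution, n_iter = jacobi_pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Wrapping the scaling inside `apply_A_tilde` keeps the loop body identical to the unpreconditioned method, which mirrors how the slides obtain the pre-conditioned algorithm by substituting the transformed matrix and vectors.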
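
For a lower-triangular L, such as an incomplete Cholesky factor, the two triangular solves used in pre-conditioning can be sketched as follows: forward substitution for LY = W, and backward substitution for L^T X = W using the entries of L directly. The function names and the dense list-of-rows storage are assumptions made for illustration; a practical sparse implementation would only visit the stored nonzeros.

```python
def forward_substitution(L, w):
    """Solve L y = w for lower-triangular L (list of rows), top row first."""
    n = len(w)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))   # contributions of known y_j
        y[i] = (w[i] - s) / L[i][i]
    return y

def backward_substitution_transposed(L, w):
    """Solve L^T x = w for lower-triangular L, bottom row first."""
    n = len(w)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Entry (i, j) of L^T equals L[j][i], so L^T is never formed.
        s = sum(L[j][i] * x[j] for j in range(i + 1, n))
        x[i] = (w[i] - s) / L[i][i]
    return x

# Example calls with a 2-by-2 lower-triangular factor.
L_example = [[2.0, 0.0], [1.0, 3.0]]
y_example = forward_substitution(L_example, [4.0, 11.0])
x_example = backward_substitution_transposed(L_example, [7.0, 9.0])
```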
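
The nonlinear conjugate gradient steps with the Fletcher-Reeves formula can be sketched as below. The slides do not specify which one-dimensional search to use; this example assumes a golden-section search on a bracket found by doubling, which is appropriate only when f is unimodal along each search direction. All names (`nonlinear_cg`, `golden_section`, `example_f`, `example_grad`), the tolerances, and the test function are hypothetical and chosen for illustration.

```python
import math

def golden_section(phi, lo, hi, iters=60):
    """Shrink [lo, hi] around the minimizer of a unimodal 1-D function phi."""
    g = (math.sqrt(5.0) - 1.0) / 2.0               # inverse golden ratio
    for _ in range(iters):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if phi(c) < phi(d):
            hi = d                                 # minimizer lies in [lo, d]
        else:
            lo = c                                 # minimizer lies in [c, hi]
    return 0.5 * (lo + hi)

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=200):
    """Minimize f from x0 using Fletcher-Reeves conjugate gradient directions."""
    x = list(x0)
    r = [-gi for gi in grad(x)]                    # R(0) = -grad f(X(0))
    d = r[:]                                       # D(0) = R(0)
    for k in range(max_iter):
        rr = sum(ri * ri for ri in r)
        if math.sqrt(rr) < tol:                    # gradient is nearly zero
            return x, k
        phi = lambda mu: f([xi + mu * di for xi, di in zip(x, d)])
        hi = 1.0
        while phi(2.0 * hi) < phi(hi) and hi < 1e6:
            hi *= 2.0                              # expand until f stops decreasing
        mu = golden_section(phi, 0.0, 2.0 * hi)    # Step 3: one-dimensional search
        x = [xi + mu * di for xi, di in zip(x, d)]
        r = [-gi for gi in grad(x)]                # Step 4: new residual
        beta = sum(ri * ri for ri in r) / rr       # Step 5: Fletcher-Reeves formula
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x, max_iter

def example_f(p):
    """A convex, non-quadratic test function with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 4 + (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 2.0) ** 2

def example_grad(p):
    """Gradient of example_f."""
    return [4.0 * (p[0] - 1.0) ** 3 + 2.0 * (p[0] - 1.0), 4.0 * (p[1] + 2.0)]

minimizer, n_iter = nonlinear_cg(example_f, example_grad, [0.0, 0.0])
```

Swapping in the Polak-Ribiere formula only changes the line that computes `beta`, provided the previous residual is kept alongside the new one.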