18-660: Numerical Methods for Engineering Design and Optimization
Xin Li, Department of ECE, Carnegie Mellon University, Pittsburgh, PA 15213
(Slide 1)

Slide 2: Overview
- Conjugate gradient method (Part 4)
  - Pre-conditioning
  - Nonlinear conjugate gradient method

Slide 3: Conjugate Gradient Method
- Step 1: start from an initial guess X^{(0)}, and set k = 0
- Step 2: calculate
    D^{(0)} = R^{(0)} = B - A X^{(0)}
- Step 3: update the solution
    X^{(k+1)} = X^{(k)} + \mu^{(k)} D^{(k)},  where  \mu^{(k)} = \frac{D^{(k)T} R^{(k)}}{D^{(k)T} A D^{(k)}}
- Step 4: calculate the residual
    R^{(k+1)} = R^{(k)} - \mu^{(k)} A D^{(k)}
- Step 5: determine the search direction
    D^{(k+1)} = R^{(k+1)} + \beta_{k+1,k} D^{(k)},  where  \beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{D^{(k)T} R^{(k)}}
- Step 6: set k = k + 1 and go to Step 3

Slide 4: Convergence Rate
    \| X^{(k+1)} - X \| \le \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{k} \cdot \| X^{(0)} - X \|
- The conjugate gradient method converges slowly if \kappa(A) is large, i.e., if AX = B is ill-conditioned
- In this case, we want to improve the convergence rate by pre-conditioning

Slide 5: Pre-Conditioning
- Key idea
  - Convert AX = B to another, equivalent equation \tilde{A}\tilde{X} = \tilde{B}
  - Solve \tilde{A}\tilde{X} = \tilde{B} by the conjugate gradient method
- Important constraints when constructing \tilde{A}\tilde{X} = \tilde{B}
  - \tilde{A} is symmetric and positive definite, so that we can solve it by the conjugate gradient method
  - \tilde{A} has a small condition number, so that we can achieve fast convergence

Slide 6: Pre-Conditioning
    A X = B
    L^{-1} A X = L^{-1} B
    \underbrace{L^{-1} A L^{-T}}_{\tilde{A}} \cdot \underbrace{L^{T} X}_{\tilde{X}} = \underbrace{L^{-1} B}_{\tilde{B}}
- L^{-1} A L^{-T} is symmetric and positive definite if A is symmetric and positive definite:
    (L^{-1} A L^{-T})^{T} = L^{-1} A^{T} L^{-T} = L^{-1} A L^{-T}
    X^{T} L^{-1} A L^{-T} X = (L^{-T} X)^{T} \cdot A \cdot (L^{-T} X) > 0  for X \ne 0

Slide 7: Pre-Conditioning
- L^{-1} A L^{-T} has a small condition number if L is properly selected
- In theory, L can be optimally found by the Cholesky decomposition A = L L^{T}:
    L^{-1} A L^{-T} = L^{-1} \cdot L L^{T} \cdot L^{-T} = I  (identity matrix)
- However, the Cholesky decomposition is not efficient for large, sparse problems
  - If we know the Cholesky decomposition, we have almost solved the equation already, so there is no need to use the conjugate gradient method

Slide 8: Pre-Conditioning
- Diagonal pre-conditioning (or Jacobi pre-conditioning)
  - Scale A along the coordinate axes:
    L = \mathrm{diag}\left( \sqrt{a_{11}}, \sqrt{a_{22}}, \ldots \right)

Slide 9: Pre-Conditioning
- Incomplete Cholesky pre-conditioning
  - L is lower-triangular
  - Few or no fill-ins are allowed, so L stays nearly as sparse as A
  - A \approx L L^{T}  (not exactly equal)

Slide 10: Pre-Conditioning
- Step 1: start from an initial guess \tilde{X}^{(0)}, and set k = 0
- Step 2: calculate
    \tilde{D}^{(0)} = \tilde{R}^{(0)} = L^{-1} B - L^{-1} A L^{-T} \tilde{X}^{(0)}
- Step 3: update the solution
    \tilde{X}^{(k+1)} = \tilde{X}^{(k)} + \tilde{\mu}^{(k)} \tilde{D}^{(k)},  where  \tilde{\mu}^{(k)} = \frac{\tilde{D}^{(k)T} \tilde{R}^{(k)}}{\tilde{D}^{(k)T} L^{-1} A L^{-T} \tilde{D}^{(k)}}
- Step 4: calculate the residual
    \tilde{R}^{(k+1)} = \tilde{R}^{(k)} - \tilde{\mu}^{(k)} L^{-1} A L^{-T} \tilde{D}^{(k)}
- Step 5: determine the search direction
    \tilde{D}^{(k+1)} = \tilde{R}^{(k+1)} + \tilde{\beta}_{k+1,k} \tilde{D}^{(k)},  where  \tilde{\beta}_{k+1,k} = \frac{\tilde{R}^{(k+1)T} \tilde{R}^{(k+1)}}{\tilde{D}^{(k)T} \tilde{R}^{(k)}}
- Step 6: set k = k + 1 and go to Step 3

Slide 11: Pre-Conditioning
(The update equations of Slide 10 are repeated here, with the key implementation notes:)
- L^{-1} should not be explicitly computed
- Instead, Y = L^{-1} W or Y = L^{-T} W (where W is a vector) should be computed by solving the linear equation L Y = W or L^{T} Y = W
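To make Slides 10 and 11 concrete, here is a minimal numerical sketch (not taken from the slides) of the pre-conditioned conjugate gradient iteration. It assumes A is symmetric positive definite and L is a lower-triangular pre-conditioner (e.g., an incomplete Cholesky factor); following Slide 11, L^{-1} is never formed, and every product with L^{-1} or L^{-T} is carried out as a triangular solve. The function names, the convergence tolerance, and the use of scipy.linalg.solve_triangular are illustrative choices, not part of the course material.

```python
import numpy as np
from scipy.linalg import solve_triangular

def precond_cg(A, B, L, X0_tilde=None, tol=1e-10, max_iter=1000):
    """Pre-conditioned CG on (L^-1 A L^-T) X~ = L^-1 B (Slides 10-11).

    L^-1 is never formed explicitly; each application of L^-1 or L^-T
    is a triangular solve, as recommended on Slide 11.
    """
    n = B.shape[0]

    def apply_Atilde(V):
        # Compute (L^-1 A L^-T) V via two triangular solves and one mat-vec.
        W = solve_triangular(L, V, lower=True, trans='T')   # W = L^-T V
        return solve_triangular(L, A @ W, lower=True)        # L^-1 (A W)

    B_tilde = solve_triangular(L, B, lower=True)              # B~ = L^-1 B
    X_tilde = np.zeros(n) if X0_tilde is None else X0_tilde.copy()

    R = B_tilde - apply_Atilde(X_tilde)     # Step 2: R~(0) = B~ - A~ X~(0)
    D = R.copy()                            #          D~(0) = R~(0)
    for _ in range(max_iter):
        AD = apply_Atilde(D)
        mu = (D @ R) / (D @ AD)             # Step 3: step size mu~(k)
        X_tilde = X_tilde + mu * D          #          update X~(k+1)
        R_new = R - mu * AD                 # Step 4: residual update
        if np.linalg.norm(R_new) < tol:
            break
        beta = (R_new @ R_new) / (D @ R)    # Step 5: beta_{k+1,k}
        D = R_new + beta * D                #          new search direction
        R = R_new

    # Recover the original unknown X = L^-T X~ (this step appears on Slide 14).
    return solve_triangular(L, X_tilde, lower=True, trans='T')
```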
Slide 12: Pre-Conditioning
- Diagonal pre-conditioning: L is a diagonal matrix
- Y = L^{-1} W or Y = L^{-T} W can be found by simple scaling:
    y_1 = w_1 / \sqrt{a_{11}},   y_2 = w_2 / \sqrt{a_{22}},   \ldots

Slide 13: Pre-Conditioning
- Incomplete Cholesky pre-conditioning: L is lower-triangular
- Y = L^{-1} W is found by forward substitution (and Y = L^{-T} W by backward substitution):
    y_1 = w_1 / l_{11},   y_2 = (w_2 - l_{21} y_1) / l_{22},   \ldots

Slide 14: Pre-Conditioning
- Once \tilde{X} is known, X is calculated as X = L^{-T} \tilde{X}
- Solve the linear equation L^{T} X = \tilde{X} by backward substitution
  - Diagonal pre-conditioning:
      x_1 = \tilde{x}_1 / \sqrt{a_{11}},   x_2 = \tilde{x}_2 / \sqrt{a_{22}},   \ldots
  - Incomplete Cholesky pre-conditioning:
      x_N = \tilde{x}_N / l_{NN},   x_{N-1} = (\tilde{x}_{N-1} - l_{N,N-1} x_N) / l_{N-1,N-1},   \ldots

Slide 15: Nonlinear Conjugate Gradient Method
- The conjugate gradient method can be extended to general (i.e., non-quadratic) unconstrained nonlinear optimization
    Quadratic programming:  \min_X \; \frac{1}{2} X^{T} A X - B^{T} X + C,   solved by  X = A^{-1} B
    Nonlinear programming:  \min_X \; f(X)
- A number of changes must be made to solve nonlinear optimization problems

Slide 16: Nonlinear Conjugate Gradient Method
(This slide repeats the quadratic conjugate gradient algorithm of Slide 3 as the starting point for the changes below.)

Slide 17: Nonlinear Conjugate Gradient Method
- New definition of the residual
    Quadratic programming:  D^{(0)} = R^{(0)} = B - A X^{(0)},   R^{(k+1)} = R^{(k)} - \mu^{(k)} A D^{(k)}
    Nonlinear programming:  R^{(k)} = -\nabla f[X^{(k)}]
- The "residual" is defined by the gradient of f(X)
  - If X^{*} is optimal, \nabla f(X^{*}) = 0
  - For quadratic programming, -\nabla f(X) = B - A X, so the two definitions coincide

Slide 18: Nonlinear Conjugate Gradient Method
- New formula for conjugate search directions
    Quadratic programming:  D^{(k+1)} = R^{(k+1)} + \beta_{k+1,k} D^{(k)},  where  \beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{D^{(k)T} R^{(k)}}
- Ideally, the search directions should be computed by Gram-Schmidt conjugation of the residuals
- In practice, we often use approximate formulas:
    Fletcher-Reeves formula:  \beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{R^{(k)T} R^{(k)}}
    Polak-Ribiere formula:  \beta_{k+1,k} = \frac{R^{(k+1)T} \left[ R^{(k+1)} - R^{(k)} \right]}{R^{(k)T} R^{(k)}}

Slide 19: Nonlinear Conjugate Gradient Method
- Optimal step size calculated by one-dimensional search
    Quadratic programming:  X^{(k+1)} = X^{(k)} + \mu^{(k)} D^{(k)},  where  \mu^{(k)} = \frac{D^{(k)T} R^{(k)}}{D^{(k)T} A D^{(k)}}
- For a general nonlinear f(X), \mu^{(k)} cannot be calculated analytically
- Instead, optimize \mu^{(k)} by a one-dimensional search:
    \min_{\mu^{(k)}} \; f[X^{(k+1)}] = f[X^{(k)} + \mu^{(k)} D^{(k)}]

Slide 20: Nonlinear Conjugate Gradient Method
- Step 1: start from an initial guess X^{(0)}, and set k = 0
- Step 2: calculate
    D^{(0)} = R^{(0)} = -\nabla f[X^{(0)}]
- Step 3: update the solution
    X^{(k+1)} = X^{(k)} + \mu^{(k)} D^{(k)},  where  \mu^{(k)} = \arg\min_{\mu} f[X^{(k)} + \mu D^{(k)}]
- Step 4: calculate the residual
    R^{(k+1)} = -\nabla f[X^{(k+1)}]
- Step 5: determine the search direction (Fletcher-Reeves formula)
    D^{(k+1)} = R^{(k+1)} + \beta_{k+1,k} D^{(k)},  where  \beta_{k+1,k} = \frac{R^{(k+1)T} R^{(k+1)}}{R^{(k)T} R^{(k)}}
- Step 6: set k = k + 1 and go to Step 3

Slide 21: Nonlinear Conjugate Gradient Method
- Comparison of the gradient method, conjugate gradient method, and Newton method
- The conjugate gradient method is often preferred for many practical large-scale engineering problems

                            Gradient   Conjugate Gradient   Newton
    1st-Order Derivative    Yes        Yes                  Yes
    2nd-Order Derivative    No         No                   Yes
    Pre-conditioning        No         Yes                  No
    Cost per Iteration      Low        Low                  High
    Convergence Rate        Slow       Fast                 Fast
    Preferred Problem Size  Large      Large                Small
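The sketch below is a rough, minimal illustration of the nonlinear conjugate gradient algorithm of Slides 17-20; it is not code from the course. It uses the Fletcher-Reeves formula for beta and scipy.optimize.minimize_scalar as the one-dimensional search for the step size; the function names, tolerance, iteration limit, and the Rosenbrock test problem are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad_f, x0, tol=1e-6, max_iter=200):
    """Nonlinear CG with Fletcher-Reeves directions (Slides 17-20).

    f      : objective, f(X) -> scalar
    grad_f : gradient, grad f(X) -> vector
    """
    x = np.asarray(x0, dtype=float)
    r = -grad_f(x)              # Step 2: "residual" = -grad f(X) (Slide 17)
    d = r.copy()
    for _ in range(max_iter):
        # Step 3: one-dimensional search for the step size (Slide 19).
        mu = minimize_scalar(lambda m: f(x + m * d)).x
        x = x + mu * d
        r_new = -grad_f(x)      # Step 4: new residual
        if np.linalg.norm(r_new) < tol:
            break
        # Step 5: Fletcher-Reeves formula (Slide 18).
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d
        r = r_new
    return x

# Illustrative usage: minimize the Rosenbrock function from (-1.2, 1).
rosen = lambda X: (1 - X[0])**2 + 100 * (X[1] - X[0]**2)**2
rosen_grad = lambda X: np.array([-2 * (1 - X[0]) - 400 * X[0] * (X[1] - X[0]**2),
                                 200 * (X[1] - X[0]**2)])
print(nonlinear_cg(rosen, rosen_grad, [-1.2, 1.0]))   # approaches (1, 1)
```

Switching from Fletcher-Reeves to the Polak-Ribiere formula of Slide 18 only changes the single line that computes beta.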
Slide 22: Summary
- Conjugate gradient method (Part 4)
  - Pre-conditioning
  - Nonlinear conjugate gradient method
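To round out the pre-conditioning portion of the summary, the following sketch (again not from the slides) spells out how the operations of Slides 12-14 are typically carried out without ever forming L^{-1}: element-wise scaling for the diagonal (Jacobi) pre-conditioner, forward substitution for a lower-triangular incomplete-Cholesky factor, and backward substitution to recover X = L^{-T} X~. The helper names are hypothetical.

```python
import numpy as np

def apply_Linv_diagonal(A_diag, W):
    """Diagonal (Jacobi) pre-conditioner, L = diag(sqrt(a_ii)) (Slide 12):
    Y = L^-1 W is element-wise scaling, y_i = w_i / sqrt(a_ii)."""
    return W / np.sqrt(A_diag)

def forward_substitution(L, W):
    """Solve L Y = W for lower-triangular L (Slide 13):
    y_1 = w_1 / l_11,  y_2 = (w_2 - l_21 y_1) / l_22,  ..."""
    n = len(W)
    Y = np.zeros(n)
    for i in range(n):
        Y[i] = (W[i] - L[i, :i] @ Y[:i]) / L[i, i]
    return Y

def backward_substitution(L, X_tilde):
    """Recover X = L^-T X~ by solving L^T X = X~ (Slide 14):
    x_N = x~_N / l_NN,  x_{N-1} = (x~_{N-1} - l_{N,N-1} x_N) / l_{N-1,N-1}, ..."""
    n = len(X_tilde)
    U = L.T                      # upper-triangular system U X = X~
    X = np.zeros(n)
    for i in range(n - 1, -1, -1):
        X[i] = (X_tilde[i] - U[i, i + 1:] @ X[i + 1:]) / U[i, i]
    return X
```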