Review for Nonlinear Programming
Total Page:16
File Type:pdf, Size:1020Kb
Review for Nonlinear Programming Zhiting Xu November 30, 2008 1 1 Line Search Methods In line search method, each iteration computes a search direction pk and then decides how far to move along that direction. That is, xk+1 = xk + αkpk (1.1) The search direction pk often has the form −1 pk = −Bk fk (1.2) where Bk is a symmetric and nonsingular matrix. 1.1 Step Length Define φ(α) = f(xk + αpk), α > 0. The ideal choice of α would be global minimizer of φ. However, it is usually too expensive to identify this value. Line search methods try to find a reasonable value α that meets some con- ditions by try out a sequence of candidate α. A popular condition is Wolfe Conditions, which has two inequality: T f(xk + αpk) ≤ f(xk) + c1α∇fk pk (1.3) T T ∇f(xk + αkpk) pk ≥ c2∇fk pk (1.4) where 0 < c1 < c2 < 1. The right-hand-side of 1.3 is a linear function with negative slop, so the first Wolfe Condition is that the function value of the new point should be sufficient small. To avoid a too short step, the Wolfe Condition also requires a smooth slope, which is the second Wolfe Condition. 0 The Strong Wolfe conditions requires that the derivative φ (αk) can’t be too positive: T f(xk + αkpk) ≤ f(xk) + c1αk∇fk pk (1.5a) T T |∇f(xk + αkfk) pk| ≤ c2|∇fk pk| (1.5b) 1.2 Convergence of Line Search To see the convergence of line search, discuss the angle θk between pk and the steepest descent direction, define by T −∇fk pk cos θk = (1.6) ||∇fk||||pk|| 2 Theorem 1.1 Consider any iteration of the form 1.1, where pk is a descent direction and αk satisfies the Wolfe conditions 1.3, 1.4. Suppose that f is n bounded below in R and that f is continuously differentiable in an open def set N containing the level set L = {x : f(x) ≤ f(x0)},where x0 is the starting point of the iteration. Assume also that the gradient ∇f is Lipschitz continuous on N , that is, there exists a constant L > 0 such that ||∇f(x) − ∇f(˜x)|| ≤ L||x − x˜||, for all x, x˜ ∈ N (1.7) Then X 2 2 cos θk||∇fk|| < ∞ (1.8) k≥0 0 If the angle between pk and −∇fk is bounded away from 90 , that is, cos θk ≥ δ > 0, for all k, then limk→∞ ||∇fk|| = 0. If Newton and quasi- −1 Newton methods require Bk bounded, that is, ||Bk||||Bk || ≤ M, then cos θk ≥ 1/M. Therefore, these methods are globally convergent. For conjugate gradient methods, we can only get weaker result: lim infk→∞ ||∇fk|| = 0. 1.3 Rate of Convergence Basic concepts: ∗ ||xk+1 − x || Q-linear: ∗ ≤ r, for all k sufficiently large (1.9) ||xk − x || ∗ ||xk+1 − x || Q-superlinear: lim ∗ = 0 (1.10) k→∞ ||xk − x || ∗ ||xk+1 − x || Q-quadratic: ∗ 2 ≤ M (1.11) ||xk − x || R-linear: if there is a sequence of nonnegative scalars {vk} such that ||xk − ∗ x || ≤ vk for all k, and {xk} converges Q-linearly to zero. The sequence ∗ ||xk − x || is said to be dominated by {vk}. Steepest Descent Theorem 1.2 When the steepest descent method with exact line searches T ∇fk ∇fk xk+1 = xk − T ∇fk is applied to the strongly convex quadratic function ∇fk Q∇fk 1 T T 1 ∗ 2 ∗ f(x) = 2 x Qx − b x, the error norm 2 ||x − x ||Q = f(x) − f(x) satisfies 2 ∗ 2 λn − λ1 ∗ 2 ||xk+1 − x ||Q ≤ ||xk − x ||Q (1.12) λn + λ1 where 0 < λ1 ≤ λ1 ≤ · · · ≤ λn are the eigenvalues of Q. 3 Newton’s Method In Newton iteration, the search pk is given by: N 2 −1 pk = −∇ fk ∇fk (1.13) 2 pk may not be the descent direction as ∇ fk may not be positive definite. Theorem 1.3 Suppose that f is twice differentiable and that the Hessian ∇2f(x) is Lipschitz continuous in a neighborhood of a solution x∗ at which the sufficient conditions are satisfied. Consider the iteration xk+1 = xk +pk, where pk is given by 1.13. Then • 1- if the starting point x0 is sufficiently close to x∗, the sequence of iterates converges to x∗ • 2- the rate of convergence of {xk} is quadratic; and • 3- the sequence of gradient norms {||∇fk||} converges quadratically to zero. Quasi-Newton Methods In Quasi-Newton method, pk is: −1 pk = −Bk ∇fk (1.14) where Bk is symmetric and positive definite. n Theorem 1.4 Suppose that f : R → R is twice continuously differentiable. Consider the iteration xk+1 = xk +akpk, where pk is a descent direction and αk satisfies the Wolfe conditions 1.3, 1.4 with c1 ≤ 1/2. If the sequence ∗ ∗ 2 ∗ {xk} converges to a point x such that ∇f(x ) = 0 and ∇ f(x ) is positive definite, and if the search direction satisfies ||∇f + ∇2f p || lim k k k = 0 (1.15) k→∞ ||pk|| then • 1- the step length αk is admissible for all k greater than a certain index k0: and ∗ • 2- if αk = 1 for all k > k0,{xk} converges to x superlinearly. If pk is a quasi-Newton search direction of the from 1.14, then 1.15 is equivalent to ||(B − ∇2f(x∗))p || lim k k = 0 (1.16) k→∞ ||pk|| This is both necessary and sufficient for the superlinear convergence of quasi-Newton methods. 4 n Theorem 1.5 Suppose that f : R → R is twice continuously differentiable. Consider the iteration xk+1 = xk +pk(that is, the step length αk is uniformly 1) and that pk is given by 1.14. Let us assume also that {xk} converges to a ∗ ∗ 2 ∗ point x such that ∇f(x ) = 0 and ∇ f(x ) is positive definite. Then {xk} converges superlinearly if and only if 1.16 holds. 1.4 Step-Length Selection Algorithms Line search uses an initial estimate α0 and generates a sequence {αi} that either terminates at a step that meets some conditions (like Wolfe condi- tion), or determines that such a step length does not exist. Basically, the precedure consists of two phases: a bracketing phase that finds an interval [a, b] that contains acceptable step lengths, and then a selection phase taht zooms in to locate the final step length. The selection phase usually reduces the bracketing interval and interpolates some of the function and derivative information gathered on earlier steps to guess the location of the minimizer. Interpolation At a guess αi, if we have 0 φ(αi) ≤ φ(0) + c1αiφ (0) (1.17) Then this step length satisfies the condition. Otherwise, we know that [0, αi] contains acceptable step lengths. Perform a quadratic approximation φq(α) to φ by interpolating the three pieces of information available - φ(0), φ0(0), and φ(αi)- to obtain 0 φ(αi) − φ(0) − α0φ (0) 0 φq(α) = ( 2 ) + φ (0)α + φ(0) (1.18) α0 Initial Step Length For Newton and quasi-Newton methods, the step α0 = 1 should always be used as the initial trial step length. For methods that do not produce well scaled search directions, such as the steepest descent and conjugate gradient methods, use current information about the problem and the algorithm to make the initial guess. 1.5 Barzilai-Borwein sk = xk − xk−1 yk = ∇f(xk) − ∇f(xk−1) 2 In Newton method, pk = −∇f (xk)∇f(xk).From Taylor theorem, we have 2 ∇f (xk)(xk − xk−1) ≈ ∇f(xk) − ∇f(xk−1) (1.19) ,which is secant condition. 5 Algorithm 1 Line Search Algorithm Set α0 ← 0, choose αmax > 0 and α1 ∈ (0, αmax) i ← 1 1: repeat 2: Evaluate φ(αi); 0 3: if φ(αi) > φ(0) + c1αiφ (0)or[φ(αi) ≥ φ(αi−1) and i > 1] then 4: α∗ ← zoom(αi−1, αi) and stop; 5: end if 0 6: Evaluate φ (αi) 0 0 7: if |φ (αi)| ≤ −c2φ (0) then 8: set α∗ ← αi and stop 9: end if 0 10: if φ (αi) ≥ 0 then 11: set α∗ ← zoom(αi, αi−1) and stop 12: end if 13: Choose αi+1 ∈ (αi, αmax) 14: i ← i + 1 15: until Algorithm 2 zoom 1: repeat 2: Interpolate to find a trial step length αj between αlo and αhi 3: Evaluate φ(αj) 0 4: if φ(αj) > φ(0) + c1αjφ (0)or φ(αj) ≥ φ(αlo) then 5: αhi ← αj 6: else 0 7: Evaluate φ (αj) 0 0 8: if |φ (αj)| ≤ −c2φ (0) then 9: Set α∗ ← αj and stop 10: end if 0 11: if φ (αj)(αhi − αlo) ≥ 0 then 12: αhi ← αlo 13: end if 14: αlo ← αj 15: end if 16: until In quasi Newton method, use B instead of H. In Barzilai-Borwein, use Bk = αkI, choose αk > 0 that Bksk ≈ yk, that 6 is, αs ≈ y. min ||αs − y||2 α 2 min(αs − y)T (αs − y) α sT y α = sT s Then we have αkpk = −∇f(xk) 1 pk = − ∇f(xk) αk T sk sk xk+1 = xk − T ∇f(xk) (1.20) sk yk Alternative BB formula 2 −1 2 Try to approximate ∇ f(xk) rather than ∇ f(xk) −1 2 −1 State secant condition as: sk ≈ f(xk) yk. Let τkI = ∇ f(xk) , so 2 τk = argmin||sk − τkyk||2 T sk yk = T (1.21) yk yk Switched BB Take BB step when k is even, and take BBalt step when k is odd.