Lecture 13: Gradient Methods for Constrained Optimization
October 16, 2008

Outline
• Gradient Projection Algorithm
• Convergence Rate

Constrained Minimization

  minimize f(x) subject to x ∈ X

• Assumption 1:
  • The function f is convex and continuously differentiable over Rⁿ
  • The set X is closed and convex
  • The optimal value f* = inf_{x∈X} f(x) is finite
• Gradient projection algorithm:
  x_{k+1} = P_X[x_k − α_k ∇f(x_k)],
  starting with x_0 ∈ X.

Bounded Gradients

Theorem 1 Let Assumption 1 hold, and suppose that the gradients of f are uniformly bounded over the set X by a constant L. Then the gradient projection method generates a sequence {x_k} ⊂ X such that
• when the constant stepsize α_k ≡ α is used,
  lim inf_{k→∞} f(x_k) ≤ f* + αL²/2;
• when a diminishing stepsize is used with Σ_k α_k = +∞,
  lim inf_{k→∞} f(x_k) = f*.
Proof: We use projection properties and a line of analysis similar to that for the unconstrained method. HWK 6.

Lipschitz Gradients

• Lipschitz Gradient Lemma: For a differentiable convex function f with Lipschitz continuous gradients, we have for all x, y ∈ Rⁿ,
  (1/L) ‖∇f(x) − ∇f(y)‖² ≤ (∇f(x) − ∇f(y))ᵀ(x − y),
  where L is a Lipschitz constant of ∇f.
• Theorem 2 Let Assumption 1 hold, and assume that the gradients of f are Lipschitz continuous over X. Suppose that the optimal solution set X* is not empty. Then the method with a constant stepsize α_k ≡ α, where 0 < α < 2/L, converges to an optimal point, i.e.,
  lim_{k→∞} ‖x_k − x*‖ = 0 for some x* ∈ X*.

Proof:
Fact 1: If z = P_X[z − v] for some v ∈ Rⁿ, then z = P_X[z − τv] for any τ > 0.
Fact 2: z ∈ X* if and only if z = P_X[z − ∇f(z)].
These facts imply that z ∈ X* if and only if z = P_X[z − τ∇f(z)] for any τ > 0.
By the definition of the method and the preceding relation with τ = α, we obtain for any z ∈ X*,
  ‖x_{k+1} − z‖² = ‖P_X[x_k − α∇f(x_k)] − P_X[z − α∇f(z)]‖².
By the non-expansiveness of the projection, it follows that
  ‖x_{k+1} − z‖² ≤ ‖x_k − z − α(∇f(x_k) − ∇f(z))‖²
      = ‖x_k − z‖² − 2α(x_k − z)ᵀ(∇f(x_k) − ∇f(z)) + α²‖∇f(x_k) − ∇f(z)‖².
Using the Lipschitz Gradient Lemma, we obtain for any z ∈ X*,
  ‖x_{k+1} − z‖² ≤ ‖x_k − z‖² − (α/L)(2 − αL)‖∇f(x_k) − ∇f(z)‖².   (1)
Hence, for all k,
  (α/L)(2 − αL)‖∇f(x_k) − ∇f(z)‖² ≤ ‖x_k − z‖² − ‖x_{k+1} − z‖².
Summing these relations from an arbitrary K to N, with K < N, we obtain
  (α/L)(2 − αL) Σ_{k=K}^{N} ‖∇f(x_k) − ∇f(z)‖² ≤ ‖x_K − z‖² − ‖x_{N+1} − z‖² ≤ ‖x_K − z‖².
In particular, setting K = 0 and letting N → ∞, we see that
  (α/L)(2 − αL) Σ_{k=0}^{∞} ‖∇f(x_k) − ∇f(z)‖² ≤ ‖x_0 − z‖² < ∞.   (2)
As a consequence, we also have
  lim_{k→∞} ∇f(x_k) = ∇f(z).   (3)
Since the last term on the right-hand side of Eq. (1) is non-positive, relation (1) in particular implies that for any z ∈ X* and all k,
  ‖x_{k+1} − z‖² ≤ ‖x_k − z‖² + (α/L)(2 − αL)‖∇f(x_k) − ∇f(z)‖².
Summing these relations over k = K, ..., N for arbitrary K and N with K < N, we obtain
  ‖x_{N+1} − z‖² ≤ ‖x_K − z‖² + (α/L)(2 − αL) Σ_{k=K}^{N} ‖∇f(x_k) − ∇f(z)‖².
Taking the limsup as N → ∞ gives
  lim sup_{N→∞} ‖x_{N+1} − z‖² ≤ ‖x_K − z‖² + (α/L)(2 − αL) Σ_{k=K}^{∞} ‖∇f(x_k) − ∇f(z)‖².
Now, taking the liminf as K → ∞ yields
  lim sup_{N→∞} ‖x_{N+1} − z‖² ≤ lim inf_{K→∞} ‖x_K − z‖² + (α/L)(2 − αL) lim_{K→∞} Σ_{k=K}^{∞} ‖∇f(x_k) − ∇f(z)‖²
      = lim inf_{K→∞} ‖x_K − z‖²,
where the equality follows from relation (2): the tail sums of a convergent series vanish. Thus, the sequence {‖x_k − z‖} is convergent for every z ∈ X*.
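As a side note before continuing the proof, the following minimal sketch (not from the lecture; the least-squares data, the unit-ball constraint, and all numbers are made up for illustration) checks relation (1) numerically along the iterates of the gradient projection method, with an optimal point z approximated by running the method to high accuracy.

```python
import numpy as np

# Hypothetical toy instance: f(x) = 0.5 ||A x - c||^2 over the Euclidean unit ball X.
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
c = np.array([3.0, 0.0])

def grad(x):
    return A.T @ (A @ x - c)

def proj(x):                                  # projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

L = np.linalg.eigvalsh(A.T @ A).max()         # Lipschitz constant of grad f
alpha = 1.0 / L                               # stepsize in (0, 2/L)

# Approximate an optimal point z in X* by running the method to high accuracy.
z = np.zeros(2)
for _ in range(20000):
    z = proj(z - alpha * grad(z))

# Verify relation (1) along the first iterates.
x = np.array([1.0, -1.0]) / np.sqrt(2.0)      # starting point x_0 in X
for k in range(8):
    x_next = proj(x - alpha * grad(x))
    lhs = np.linalg.norm(x_next - z) ** 2
    rhs = np.linalg.norm(x - z) ** 2 \
          - (alpha / L) * (2 - alpha * L) * np.linalg.norm(grad(x) - grad(z)) ** 2
    print(f"k={k}: {lhs:.6f} <= {rhs:.6f}  ->  {lhs <= rhs + 1e-12}")
    x = x_next
```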
By the inequality in Eq. (1), we also have ‖x_k − z‖ ≤ ‖x_0 − z‖ for all k. Hence, the sequence {x_k} is bounded, and it has an accumulation point. Since the scalar sequence {‖x_k − z‖} is convergent for every z ∈ X*, the sequence {x_k} must be convergent. Suppose now that x_k → x̄. By the definition of the iterate x_{k+1}, we have
  x_{k+1} = P_X[x_k − α∇f(x_k)].
Letting k → ∞, and using x_k → x̄ together with the continuity of the gradient ∇f and of the projection P_X, we obtain
  x̄ = P_X[x̄ − α∇f(x̄)].
In view of Facts 1 and 2, the preceding relation is equivalent to x̄ ∈ X*.

Modes of Convexity: Strict and Strong

• Def. f is strictly convex if for all x ≠ y and all α ∈ (0, 1) we have
  f(αx + (1 − α)y) < αf(x) + (1 − α)f(y).
• Def. f is strongly convex if there exists a scalar ν > 0 such that
  f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) − (ν/2) α(1 − α)‖x − y‖²
  for all x, y ∈ Rⁿ and any α ∈ [0, 1]. The scalar ν is referred to as the strong convexity constant, and the function is said to be strongly convex with constant ν.

Modes of Convexity: Differentiable Function

• Let f : Rⁿ → R be continuously differentiable.
• The modes of convexity can be equivalently characterized in terms of the linearizations of f.
• We have:
  • f is convex if and only if f(x) + ∇f(x)ᵀ(y − x) ≤ f(y) for all x, y ∈ Rⁿ
  • f is strictly convex if and only if f(x) + ∇f(x)ᵀ(y − x) < f(y) for all x ≠ y
  • f is strongly convex with constant ν if and only if f(x) + ∇f(x)ᵀ(y − x) + (ν/2)‖y − x‖² ≤ f(y) for all x, y ∈ Rⁿ

Modes of Convexity: Gradient Mapping

• Let f : Rⁿ → R be continuously differentiable.
• The modes of convexity can be equivalently characterized in terms of the monotonicity properties of the gradient mapping ∇f : Rⁿ → Rⁿ.
• We have:
  • f is convex if and only if (∇f(x) − ∇f(y))ᵀ(x − y) ≥ 0 for all x, y ∈ Rⁿ
  • f is strictly convex if and only if (∇f(x) − ∇f(y))ᵀ(x − y) > 0 for all x ≠ y
  • f is strongly convex with constant ν if and only if (∇f(x) − ∇f(y))ᵀ(x − y) ≥ ν‖x − y‖² for all x, y ∈ Rⁿ

Modes of Convexity: Twice Differentiable Function

• Let f : Rⁿ → R be twice continuously differentiable.
• The modes of convexity can be equivalently characterized in terms of the definiteness of the Hessians ∇²f(x) for x ∈ Rⁿ.
• We have:
  • f is convex if and only if ∇²f(x) ≥ 0 for all x ∈ Rⁿ
  • f is strictly convex if ∇²f(x) > 0 for all x ∈ Rⁿ
  • f is strongly convex with constant ν if and only if ∇²f(x) ≥ νI for all x ∈ Rⁿ

Strong Convexity: Implications

Let f be continuously differentiable and strongly convex* over Rⁿ with constant m.
• Implications:
  • Lower bound on f over Rⁿ: for all x, y ∈ Rⁿ,
    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖².   (4)
  • Minimizing the right-hand side of (4) with respect to y gives
    f(y) ≥ f(x) − (1/(2m))‖∇f(x)‖² for all y ∈ Rⁿ.
  • Taking the minimum over y ∈ Rⁿ on the left-hand side then yields
    f(x) − f* ≤ (1/(2m))‖∇f(x)‖².
  • Useful as a stopping criterion (if you know m); see the sketch below.
* Strong convexity over Rⁿ can be replaced by strong convexity over a set X; then all of the relations remain valid over that set.

• Relation (4) with x = x_0, applied to points y with f(y) ≤ f(x_0), implies that the level set L_f(f(x_0)) = {y : f(y) ≤ f(x_0)} is bounded.
• Relation (4) also yields, for an optimal x* and any x ∈ Rⁿ,
  (m/2)‖x − x*‖² ≤ f(x) − f(x*).
• The last two bullets are a HWK 6 assignment.
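To make the stopping criterion concrete, here is a minimal sketch (not part of the lecture; the quadratic data Q, b and the tolerance are made up for illustration): it runs unconstrained gradient descent and stops once ‖∇f(x)‖²/(2m) drops below a tolerance, which by the bound f(x) − f* ≤ (1/(2m))‖∇f(x)‖² certifies the optimality gap.

```python
import numpy as np

# Hypothetical problem data: f(x) = 0.5 x^T Q x - b^T x over R^2, which is strongly
# convex with constant m equal to the smallest eigenvalue of Q.
Q = np.array([[4.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x

def grad(x):
    return Q @ x - b

m = np.linalg.eigvalsh(Q).min()   # strong convexity constant
L = np.linalg.eigvalsh(Q).max()   # Lipschitz constant of the gradient

# Plain gradient descent, stopped once ||grad f(x)||^2 / (2m) <= tol,
# which certifies f(x) - f* <= tol by the strong convexity bound above.
tol = 1e-8
x = np.zeros(2)
while np.linalg.norm(grad(x)) ** 2 / (2 * m) > tol:
    x = x - (1.0 / L) * grad(x)

f_star = f(np.linalg.solve(Q, b))   # exact optimal value of the quadratic, for comparison only
print("certified bound on f(x) - f*:", np.linalg.norm(grad(x)) ** 2 / (2 * m))
print("actual gap f(x) - f*        :", f(x) - f_star)
```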
Convergence Rate: Once Differentiable

Theorem 3 Let Assumption 1 hold, and assume that the gradients of f are Lipschitz continuous over X with constant L > 0. Suppose that the function f is strongly convex with constant m > 0. Then:
• A solution x* exists and is unique.
• The iterates generated by the gradient projection method with α_k ≡ α and 0 < α < 2/L converge to x* at a geometric rate, i.e.,
  ‖x_{k+1} − x*‖² ≤ q ‖x_k − x*‖² for all k,
  with q ∈ (0, 1) depending on m and L.
Proof: HWK 6.

Convergence Rate: Twice Differentiable

Theorem 4 Let Assumption 1 hold. Assume that the function f is twice continuously differentiable and strongly convex with constant m > 0. Assume also that ∇²f(x) ≤ L I for all x ∈ X. Then:
• A solution x* exists and is unique.
• The iterates generated by the gradient projection method with α_k ≡ α and 0 < α < 2/L converge to x* at a geometric rate, i.e.,
  ‖x_{k+1} − x*‖ ≤ q ‖x_k − x*‖ for all k,
  with q = max{|1 − αm|, |1 − αL|}.

Proof: The q here is different from the one in the preceding theorem.
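To illustrate the gradient projection method and the geometric rate of Theorem 4, here is a minimal sketch (hypothetical problem data, not part of the lecture): it minimizes a strongly convex quadratic over a box and compares the observed per-step contraction of ‖x_k − x*‖ with q = max{|1 − αm|, |1 − αL|}.

```python
import numpy as np

# Hypothetical instance: f(x) = 0.5 x^T Q x - b^T x over the box X = [-1, 1]^2.
Q = np.array([[3.0, 0.5],
              [0.5, 1.0]])          # symmetric positive definite Hessian
b = np.array([1.0, -2.0])
lo, hi = -1.0, 1.0

def grad(x):
    return Q @ x - b                 # gradient of the quadratic

def proj(x):
    return np.clip(x, lo, hi)        # projection P_X onto the box

m = np.linalg.eigvalsh(Q).min()      # strong convexity constant
L = np.linalg.eigvalsh(Q).max()      # Lipschitz constant of grad f
alpha = 1.0 / L                      # constant stepsize, 0 < alpha < 2/L
q = max(abs(1 - alpha * m), abs(1 - alpha * L))

# Reference solution: run the method long enough to converge to x*.
x_star = np.zeros(2)
for _ in range(10000):
    x_star = proj(x_star - alpha * grad(x_star))

# Gradient projection iterates; the distance to x* should shrink by a factor <= q per step.
x = np.array([1.0, 1.0])
for k in range(12):
    x_next = proj(x - alpha * grad(x))
    d, d_next = np.linalg.norm(x - x_star), np.linalg.norm(x_next - x_star)
    print(f"k={k:2d}  ||x_k - x*|| = {d:.3e}  contraction = {d_next / max(d, 1e-16):.3f}  (q = {q:.3f})")
    x = x_next
```

With the choice α = 1/L in this sketch, the factor becomes q = 1 − m/L, so the better conditioned the problem (m closer to L), the faster the contraction.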