Lecture 18 Subgradients

November 3, 2008

Outline
• Existence of subgradients
• Subdifferential properties
• Optimality conditions

Convex-Constrained Non-differentiable Minimization

  minimize f(x) subject to x ∈ C

Characteristics:
• The function f : Rⁿ → R is convex and possibly non-differentiable
• The set C ⊆ Rⁿ is nonempty, closed, and convex
• The optimal value f* is finite
• Our focus here is non-differentiability

Definition of Subgradient and Subdifferential

Def. A vector s ∈ Rⁿ is a subgradient of f at x̂ ∈ dom f when
  f(x) ≥ f(x̂) + sᵀ(x − x̂) for all x ∈ dom f
Def. The subdifferential of f at x̂ ∈ dom f is the set of all subgradients of f at x̂.
• The subdifferential of f at x̂ is denoted by ∂f(x̂)
• When f is differentiable at x̂, we have ∂f(x̂) = {∇f(x̂)} (the subdifferential is a singleton)
• Examples:
  f(x) = |x|:  ∂f(x) = {sign(x)} for x ≠ 0,  ∂f(0) = [−1, 1]
  f(x) = x² + 2|x| − 3 for |x| > 1,  f(x) = 0 for |x| ≤ 1

Subgradients and Epigraph

• Let s be a subgradient of f at x̂:
  f(x) ≥ f(x̂) + sᵀ(x − x̂) for all x ∈ dom f
• The subgradient inequality is equivalent to
  −sᵀx̂ + f(x̂) ≤ −sᵀx + f(x) for all x ∈ dom f
• Let f(x) > −∞ for all x ∈ Rⁿ. Then
  epi f = {(x, w) | f(x) ≤ w, x ∈ Rⁿ}
Thus,
  −sᵀx̂ + f(x̂) ≤ −sᵀx + w for all (x, w) ∈ epi f,
which is equivalent to
  (−s, 1)ᵀ(x̂, f(x̂)) ≤ (−s, 1)ᵀ(x, w) for all (x, w) ∈ epi f
Therefore, the hyperplane
  H = {(x, γ) ∈ Rⁿ⁺¹ | (−s, 1)ᵀ(x, γ) = (−s, 1)ᵀ(x̂, f(x̂))}
supports epi f at the vector (x̂, f(x̂)).

Subdifferential Set Properties

Theorem 1 A subdifferential set ∂f(x̂) is convex and closed.
Proof: HW 7.
Theorem 2 (Existence) Let f be convex with a nonempty dom f. Then:
(a) For x ∈ relint(dom f), we have ∂f(x) ≠ ∅.
(b) ∂f(x) is nonempty and bounded if and only if x ∈ int(dom f).
Implications
• The subdifferential ∂f(x̂) is a nonempty compact convex set for every x̂ in the interior of dom f.
• When dom f = Rⁿ, ∂f(x) is a nonempty compact convex set for all x.

Partial Proof: If x̂ ∈ int(dom f), then ∂f(x̂) is nonempty and bounded.
• ∂f(x̂) nonempty. Let x̂ be in the interior of dom f. The vector (x̂, f(x̂)) does not belong to the interior of epi f. The epigraph epi f is convex, and by the Supporting Hyperplane Theorem there is a vector (d, β) ∈ Rⁿ⁺¹, (d, β) ≠ 0, such that
  dᵀx̂ + βf(x̂) ≤ dᵀx + βw for all (x, w) ∈ epi f
We have epi f = {(x, w) | f(x) ≤ w, x ∈ dom f}. Hence,
  dᵀx̂ + βf(x̂) ≤ dᵀx + βw for all x ∈ dom f and all w with f(x) ≤ w
We must have β ≥ 0, since w can be taken arbitrarily large. We cannot have β = 0: the inequality would then read dᵀx̂ ≤ dᵀx for all x ∈ dom f, and since x̂ is an interior point of dom f this forces d = 0, contradicting (d, β) ≠ 0. Dividing by β (and taking w = f(x)), we see that −d/β is a subgradient of f at x̂.
• ∂f(x̂) bounded. By the subgradient inequality, we have
  f(x) ≥ f(x̂) + sᵀ(x − x̂) for all x ∈ dom f
Suppose that the subdifferential ∂f(x̂) is unbounded. Let {sₖ} be a sequence of subgradients in ∂f(x̂) with ‖sₖ‖ → ∞. Since x̂ lies in the interior of the domain, there exists a δ > 0 such that x̂ + δy ∈ dom f for every y with ‖y‖ ≤ 1. Letting x = x̂ + δ sₖ/‖sₖ‖ for each k, we have
  f(x̂ + δ sₖ/‖sₖ‖) ≥ f(x̂) + δ‖sₖ‖ for all k
As k → ∞, we have f(x̂ + δ sₖ/‖sₖ‖) − f(x̂) → ∞. However, this contradicts the continuity of f at x̂. [Recall, a convex function is continuous over the interior of its domain.]
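To make the subgradient inequality concrete, here is a minimal Python sketch (illustrative only; the test grid and the candidate slopes are arbitrary choices, and checking on a finite grid is only a necessary condition) that tests whether a scalar s satisfies f(x) ≥ f(x̂) + s(x − x̂) for f(x) = |x| at x̂ = 0; slopes with |s| ≤ 1 pass and slopes with |s| > 1 fail, matching ∂f(0) = [−1, 1].

import numpy as np

def is_subgradient(f, x_hat, s, xs, tol=1e-12):
    # Check the subgradient inequality f(x) >= f(x_hat) + s*(x - x_hat)
    # on a finite grid of test points xs (a necessary check only).
    return all(f(x) >= f(x_hat) + s * (x - x_hat) - tol for x in xs)

f = abs                              # f(x) = |x|; the slides give ∂f(0) = [-1, 1]
grid = np.linspace(-2.0, 2.0, 401)   # arbitrary test grid around x_hat = 0

for s in (-1.5, -1.0, 0.0, 0.3, 1.0, 1.5):
    print(f"s = {s:+.1f}: subgradient of |x| at 0? {is_subgradient(f, 0.0, s, grid)}")
# Expected output: False for |s| > 1 and True for |s| <= 1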
Example Consider f(x) = −√x with dom f = {x | x ≥ 0}. We have ∂f(0) = ∅. Note that 0 is not in the interior of the domain of f.

Boundedness of the Subdifferential Sets

Theorem 3 Let f : Rⁿ → R be convex and let X be a bounded set. Then, the set ∪_{x∈X} ∂f(x) is bounded.
Proof Assume to the contrary that there is an unbounded sequence of subgradients {sₖ}, i.e.,
  lim_{k→∞} ‖sₖ‖ = ∞,
where sₖ ∈ ∂f(xₖ) for some xₖ ∈ X. The sequence {xₖ} is bounded, so it has a convergent subsequence, say {xₖ}_K, converging to some x ∈ Rⁿ. Consider dₖ = sₖ/‖sₖ‖ for k ∈ K. This is a bounded sequence, and it has a convergent subsequence, say {dₖ}_{K′} with K′ ⊆ K. Let d be the limit of {dₖ}_{K′}. Since sₖ ∈ ∂f(xₖ), we have for each k,
  f(xₖ + dₖ) ≥ f(xₖ) + sₖᵀdₖ = f(xₖ) + ‖sₖ‖.
By letting k → ∞ with k ∈ K′, we see that
  limsup_{k→∞, k∈K′} [f(xₖ + dₖ) − f(xₖ)] ≥ limsup_{k→∞, k∈K′} ‖sₖ‖.
By continuity of f,
  lim_{k→∞, k∈K′} [f(xₖ + dₖ) − f(xₖ)] = f(x + d) − f(x),
hence the left-hand side is finite, implying that {‖sₖ‖} is bounded along K′. This is a contradiction ({sₖ} was assumed to be unbounded).

Continuity of the Subdifferential

Theorem 4 Let f : Rⁿ → R be convex and let {xₖ} converge to some x ∈ Rⁿ. Let sₖ ∈ ∂f(xₖ) for all k. Then, the sequence {sₖ} is bounded and every one of its limit points is a subgradient of f at x.
Proof: HW 7.

Subdifferential and Directional Derivatives

Definition The directional derivative f′(x; d) of f at x along direction d is the limiting value
  f′(x; d) = lim_{α↓0} [f(x + αd) − f(x)] / α.
• When f is convex, the ratio [f(x + αd) − f(x)] / α is a nondecreasing function of α > 0, so as α decreases to zero, the ratio either converges to some value or decreases to −∞. (HW)
Theorem 5 Let x ∈ int(dom f). Then, the directional derivative f′(x; d) is finite for all d ∈ Rⁿ. In particular, we have
  f′(x; d) = max_{s∈∂f(x)} sᵀd.
Proof When x ∈ int(dom f), the subdifferential ∂f(x) is nonempty and compact. Using the subgradient defining relation, we can see that f′(x; d) ≥ sᵀd for all s ∈ ∂f(x). Therefore,
  f′(x; d) ≥ max_{s∈∂f(x)} sᵀd.
To show that equality actually holds, we rely on the Separating Hyperplane Theorem. Define
  C₁ = {(z, w) | z ∈ dom f, f(z) < w},
  C₂ = {(y, v) | y = x + αd, v = f(x) + αf′(x; d), α ≥ 0}.
These sets are nonempty, convex, and disjoint (HW). By the Separating Hyperplane Theorem, there exists a nonzero vector (a, β) ∈ Rⁿ⁺¹ such that
  aᵀ(x + αd) + β(f(x) + αf′(x; d)) ≤ aᵀz + βw,   (1)
for all α ≥ 0, z ∈ dom f, and f(z) < w. We must have β ≥ 0 (why?). We cannot have β = 0 (why?). Thus, β > 0 and we can divide the relation in (1) by β, obtaining with ã = a/β,
  ãᵀ(x + αd) + f(x) + αf′(x; d) ≤ ãᵀz + w,   (2)
for all α ≥ 0, z ∈ dom f, and f(z) < w. Choosing α = 0 and letting w ↓ f(z), we see
  ãᵀx + f(x) ≤ ãᵀz + f(z),
implying that
  f(x) + (−ã)ᵀ(z − x) ≤ f(z) for all z ∈ dom f.
Therefore −ã ∈ ∂f(x). Letting z = x, w ↓ f(x), and α = 1 in (2), we obtain
  ãᵀ(x + d) + f(x) + f′(x; d) ≤ ãᵀx + f(x),
implying f′(x; d) ≤ (−ã)ᵀd. In view of f′(x; d) ≥ max_{s∈∂f(x)} sᵀd, it follows that
  f′(x; d) = max_{s∈∂f(x)} sᵀd,
where the maximum is attained at the "constructed" subgradient −ã.
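As a quick numerical sanity check of the formula f′(x; d) = max_{s∈∂f(x)} sᵀd, the following Python sketch uses a pointwise maximum of affine functions f(x) = maxᵢ(aᵢᵀx + bᵢ) with arbitrarily chosen data aᵢ, bᵢ, point x, and direction d; the gradients aᵢ of the active pieces are subgradients at x, and a one-sided difference quotient with a small step should agree with the maximum of aᵢᵀd over the active indices.

import numpy as np

# f(x) = max_i (a_i^T x + b_i); at x the gradients a_i of the active pieces
# are subgradients, and the theorem gives f'(x; d) = max over active i of a_i^T d.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])   # illustrative data
b = np.array([0.0, 0.0, -1.0])

def f(x):
    return np.max(A @ x + b)

def one_sided_quotient(x, d, alpha=1e-7):
    # Approximates lim_{alpha -> 0+} [f(x + alpha d) - f(x)] / alpha.
    return (f(x + alpha * d) - f(x)) / alpha

x = np.array([0.0, 0.0])     # the first two pieces are active (tie) at this point
d = np.array([-0.3, 1.0])    # arbitrary direction

vals = A @ x + b
active = np.isclose(vals, vals.max())   # indices of the active affine pieces
formula = np.max(A[active] @ d)         # max of s^T d over subgradients s = a_i

print("one-sided difference quotient:", one_sided_quotient(x, d))
print("max over active subgradients: ", formula)
# Both should be (approximately) 0.3 for this data.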
Optimality Conditions: Unconstrained Case

Unconstrained optimization:
  minimize f(x)
Assumption
• The function f is convex (possibly non-differentiable) and proper [f proper means f(x) > −∞ for all x and dom f ≠ ∅]
Theorem Under this assumption, a vector x* minimizes f over Rⁿ if and only if
  0 ∈ ∂f(x*)
• The result is a generalization of ∇f(x*) = 0
• Proof x* is optimal if and only if f(x) ≥ f(x*) for all x, or equivalently
  f(x) ≥ f(x*) + 0ᵀ(x − x*) for all x ∈ Rⁿ
Thus, x* is optimal if and only if 0 ∈ ∂f(x*).

Examples

• The function f(x) = |x|:
  ∂f(x) = {sign(x)} for x ≠ 0,  ∂f(0) = [−1, 1]
The minimum is at x* = 0, and evidently 0 ∈ ∂f(0).
• The function f(x) = ‖x‖:
  ∂f(x) = {x/‖x‖} for x ≠ 0,  ∂f(0) = {s | ‖s‖ ≤ 1}
Again, the minimum is at x* = 0 and 0 ∈ ∂f(0).
• The function f(x) = max{x² + 2x − 3, x² − 2x − 3, 0}:
  f(x) = x² − 2x − 3 for x < −1,  f(x) = 0 for x ∈ [−1, 1],  f(x) = x² + 2x − 3 for x > 1
  ∂f(x) = {2x − 2} for x < −1,  [−4, 0] for x = −1,  {0} for x ∈ (−1, 1),  [0, 4] for x = 1,  {2x + 2} for x > 1
• The optimal set is X* = [−1, 1]
• For every x* ∈ X*, we have 0 ∈ ∂f(x*)

Optimality Conditions: Constrained Case

Constrained optimization:
  minimize f(x) subject to x ∈ C
Assumption
• The function f is convex (possibly non-differentiable) and proper
• The set C is nonempty, closed, and convex
Theorem Under this assumption, a vector x* ∈ C minimizes f over the set C if and only if there exists a subgradient d ∈ ∂f(x*) such that
  dᵀ(x − x*) ≥ 0 for all x ∈ C
• The result is a generalization of ∇f(x*)ᵀ(x − x*) ≥ 0 for all x ∈ C
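The unconstrained optimality condition 0 ∈ ∂f(x*) can be checked directly for the piecewise example above, f(x) = max{x² + 2x − 3, x² − 2x − 3, 0}. The short Python sketch below hard-codes the interval form of ∂f from the piecewise formulas (an illustrative check for this one function, not a general routine) and reports whether 0 ∈ ∂f(x) at a few test points; it prints True exactly for x in the optimal set X* = [−1, 1].

# f(x) = max{x^2 + 2x - 3, x^2 - 2x - 3, 0}, the piecewise example above; its
# subdifferential is an interval at every x, hard-coded from the piecewise formulas.
def subdifferential(x):
    if x < -1.0:
        g = 2.0 * x - 2.0            # differentiable piece: f'(x) = 2x - 2
        return (g, g)
    if x == -1.0:
        return (-4.0, 0.0)           # kink: interval between one-sided derivatives
    if x < 1.0:
        return (0.0, 0.0)            # flat piece: f'(x) = 0
    if x == 1.0:
        return (0.0, 4.0)
    g = 2.0 * x + 2.0                # differentiable piece: f'(x) = 2x + 2
    return (g, g)

def is_minimizer(x):
    lo, hi = subdifferential(x)
    return lo <= 0.0 <= hi           # unconstrained optimality: 0 in the subdifferential

for x in (-2.0, -1.0, 0.0, 0.5, 1.0, 1.5):
    print(f"x = {x:+.1f}: 0 in subdifferential? {is_minimizer(x)}")
# Expected: True exactly for x in the optimal set X* = [-1, 1]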
