Lecture 18 Subgradients
November 3, 2008

Outline
• Existence of Subgradients
• Subdifferential Properties
• Optimality Conditions

Convex-Constrained Non-differentiable Minimization

    minimize f(x) subject to x ∈ C

Characteristics:
• The function f : R^n → R is convex and possibly non-differentiable
• The set C ⊆ R^n is nonempty, closed, and convex
• The optimal value f* is finite
• Our focus here is non-differentiability

Definition of Subgradient and Subdifferential

Def. A vector s ∈ R^n is a subgradient of f at x̂ ∈ dom f when

    f(x) ≥ f(x̂) + s^T (x − x̂) for all x ∈ dom f

Def. The subdifferential of f at x̂ ∈ dom f is the set of all subgradients s of f at x̂.

• The subdifferential of f at x̂ is denoted by ∂f(x̂)
• When f is differentiable at x̂, we have ∂f(x̂) = {∇f(x̂)} (the subdifferential is a singleton)
• Examples:

$$f(x) = |x|, \qquad \partial f(x) = \begin{cases} \{\mathrm{sign}(x)\} & \text{for } x \neq 0 \\ [-1,\, 1] & \text{for } x = 0 \end{cases}$$

$$f(x) = \begin{cases} x^2 + 2|x| - 3 & \text{for } |x| > 1 \\ 0 & \text{for } |x| \leq 1 \end{cases}$$

Subgradients and Epigraph

• Let s be a subgradient of f at x̂:

    f(x) ≥ f(x̂) + s^T (x − x̂) for all x ∈ dom f

• The subgradient inequality is equivalent to

    −s^T x̂ + f(x̂) ≤ −s^T x + f(x) for all x ∈ dom f

• Let f(x) > −∞ for all x ∈ R^n. Then

    epi f = {(x, w) | f(x) ≤ w, x ∈ R^n}

Thus,

    −s^T x̂ + f(x̂) ≤ −s^T x + w for all (x, w) ∈ epi f,

which is equivalent to

$$\begin{bmatrix} -s \\ 1 \end{bmatrix}^T \begin{bmatrix} \hat{x} \\ f(\hat{x}) \end{bmatrix} \leq \begin{bmatrix} -s \\ 1 \end{bmatrix}^T \begin{bmatrix} x \\ w \end{bmatrix} \quad \text{for all } (x, w) \in \mathrm{epi}\, f$$

Therefore, the hyperplane

    H = {(x, γ) ∈ R^{n+1} | (−s, 1)^T (x, γ) = (−s, 1)^T (x̂, f(x̂))}

supports epi f at the vector (x̂, f(x̂)).

Subdifferential Set Properties

Theorem 1 The subdifferential set ∂f(x̂) is convex and closed.
Proof: H7.

Theorem 2 (Existence) Let f be convex with a nonempty dom f. Then:
(a) For x ∈ relint(dom f), we have ∂f(x) ≠ ∅.
(b) ∂f(x) is nonempty and bounded if and only if x ∈ int(dom f).

Implications
• The subdifferential ∂f(x̂) is a nonempty, compact, convex set for every x̂ in the interior of dom f
• When dom f = R^n, ∂f(x) is a nonempty, compact, convex set for all x

Partial Proof: If x̂ ∈ int(dom f), then ∂f(x̂) is nonempty and bounded.

• ∂f(x̂) nonempty. Let x̂ be in the interior of dom f. The vector (x̂, f(x̂)) does not belong to the interior of epi f. The epigraph epi f is convex, so by the Supporting Hyperplane Theorem, there is a vector (d, β) ∈ R^{n+1}, (d, β) ≠ 0, such that

    d^T x̂ + β f(x̂) ≤ d^T x + β w for all (x, w) ∈ epi f.

We have epi f = {(x, w) | f(x) ≤ w, x ∈ dom f}. Hence,

    d^T x̂ + β f(x̂) ≤ d^T x + β w for all x ∈ dom f and all w with f(x) ≤ w.

We must have β ≥ 0, since w can be taken arbitrarily large. We cannot have β = 0: since x̂ is an interior point of dom f, β = 0 would imply d = 0, contradicting (d, β) ≠ 0. Dividing by β, we see that −d/β is a subgradient of f at x̂.

• ∂f(x̂) bounded. By the subgradient inequality, we have

    f(x) ≥ f(x̂) + s^T (x − x̂) for all x ∈ dom f.

Suppose that the subdifferential ∂f(x̂) is unbounded. Let {s_k} be a sequence of subgradients in ∂f(x̂) with ‖s_k‖ → ∞. Since x̂ lies in the interior of the domain, there exists a δ > 0 such that x̂ + δy ∈ dom f for any y ∈ R^n with ‖y‖ ≤ 1. Letting x = x̂ + δ s_k/‖s_k‖ for each k, we have

$$f\!\left(\hat{x} + \delta \frac{s_k}{\|s_k\|}\right) \geq f(\hat{x}) + \delta \|s_k\| \quad \text{for all } k.$$

As k → ∞, we have f(x̂ + δ s_k/‖s_k‖) − f(x̂) → ∞. However, this relation contradicts the continuity of f at x̂. [Recall that a convex function is continuous over the interior of its domain.]

Example Consider f(x) = −√x with dom f = {x | x ≥ 0}. We have ∂f(0) = ∅. Note that 0 is not in the interior of the domain of f.
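This last example can be probed numerically. Below is a minimal sketch in Python (my own illustration, not part of the lecture; the grid xs, the tolerance, and the helper name violated_at are assumptions of the sketch): for f(x) = −√x, every candidate slope s, no matter how negative, violates the subgradient inequality f(x) ≥ f(0) + s(x − 0) somewhere near 0, consistent with ∂f(0) = ∅.

```python
# A minimal numeric sketch (not from the lecture): for f(x) = -sqrt(x)
# with dom f = {x | x >= 0}, no slope s satisfies the subgradient
# inequality f(x) >= f(0) + s*x at xhat = 0, i.e. ∂f(0) = ∅.
import numpy as np

xs = np.linspace(0.0, 1.0, 100001)   # sample grid inside dom f

def violated_at(s):
    """Return some grid point x where -sqrt(x) < s*x, if one exists."""
    bad = xs[-np.sqrt(xs) < s * xs - 1e-12]
    return bad[0] if bad.size else None

# Even very negative slopes fail: near 0 the graph drops like -sqrt(x),
# which is steeper than any line through the origin.
for s in [0.0, -1.0, -10.0, -100.0]:
    print(f"s = {s:7.1f} violated at x = {violated_at(s):.6f}")
```

The check makes the geometric point of the example concrete: a violation always appears at sufficiently small x > 0, so no supporting line with finite slope exists at 0.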
Boundedness of the Subdifferential Sets

Theorem 3 Let f : R^n → R be convex and let X be a bounded set. Then the set ∪_{x∈X} ∂f(x) is bounded.

Proof: Assume to the contrary that there is an unbounded sequence of subgradients {s_k}, i.e.,

    lim_{k→∞} ‖s_k‖ = ∞,

where s_k ∈ ∂f(x_k) for some x_k ∈ X. The sequence {x_k} is bounded, so it has a convergent subsequence, say {x_k}_K, converging to some x ∈ R^n. Consider d_k = s_k/‖s_k‖ for k ∈ K. This is a bounded sequence, and it has a convergent subsequence, say {d_k}_{K′} with K′ ⊆ K. Let d be the limit of {d_k}_{K′}. Since s_k ∈ ∂f(x_k), we have for each k,

    f(x_k + d_k) ≥ f(x_k) + s_k^T d_k = f(x_k) + ‖s_k‖.

By letting k → ∞ with k ∈ K′, we see that

    lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] ≥ lim sup_{k→∞, k∈K′} ‖s_k‖.

By continuity of f,

    lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] = f(x + d) − f(x),

which is finite, implying that {‖s_k‖} is bounded. This is a contradiction ({s_k} was assumed to be unbounded).

Continuity of the Subdifferential

Theorem 4 Let f : R^n → R be convex and let {x_k} converge to some x ∈ R^n. Let s_k ∈ ∂f(x_k) for all k. Then the sequence {s_k} is bounded, and every one of its limit points is a subgradient of f at x.
Proof: H7.

Subdifferential and Directional Derivatives

Definition The directional derivative f′(x; d) of f at x along direction d is the limiting value

    f′(x; d) = lim_{α↓0} [f(x + αd) − f(x)]/α.

• When f is convex, the ratio [f(x + αd) − f(x)]/α is a nondecreasing function of α > 0, and as α decreases to zero, the ratio converges to some value or decreases to −∞. (HW)

Theorem 5 Let x ∈ int(dom f). Then the directional derivative f′(x; d) is finite for all d ∈ R^n. In particular, we have

    f′(x; d) = max_{s∈∂f(x)} s^T d.

Proof: When x ∈ int(dom f), the subdifferential ∂f(x) is nonempty and compact. Using the subgradient defining relation, we can see that f′(x; d) ≥ s^T d for all s ∈ ∂f(x). Therefore,

    f′(x; d) ≥ max_{s∈∂f(x)} s^T d.

To show that equality actually holds, we rely on the Separating Hyperplane Theorem. Define

    C_1 = {(z, w) | z ∈ dom f, f(z) < w},
    C_2 = {(y, v) | y = x + αd, v = f(x) + αf′(x; d), α ≥ 0}.

These sets are nonempty, convex, and disjoint (HW). By the Separating Hyperplane Theorem, there exists a nonzero vector (a, β) ∈ R^{n+1} such that

    a^T (x + αd) + β(f(x) + αf′(x; d)) ≤ a^T z + βw,    (1)

for all α ≥ 0, z ∈ dom f, and f(z) < w. We must have β ≥ 0 (why?). We cannot have β = 0 (why?). Thus β > 0, and dividing relation (1) by β, we obtain, with ã = a/β,

    ã^T (x + αd) + f(x) + αf′(x; d) ≤ ã^T z + w,    (2)

for all α ≥ 0, z ∈ dom f, and f(z) < w. Choosing α = 0 and letting w ↓ f(z), we see that

    ã^T x + f(x) ≤ ã^T z + f(z),

implying that

    f(x) − ã^T (z − x) ≤ f(z) for all z ∈ dom f.

Therefore −ã ∈ ∂f(x). Letting z = x, w ↓ f(z), and α = 1 in (2), we obtain

    ã^T (x + d) + f(x) + f′(x; d) ≤ ã^T x + f(x),

implying f′(x; d) ≤ −ã^T d. In view of f′(x; d) ≥ max_{s∈∂f(x)} s^T d, it follows that

    f′(x; d) = max_{s∈∂f(x)} s^T d,

where the maximum is attained at the "constructed" subgradient −ã.
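The max formula of Theorem 5 can be sanity-checked numerically for f(x) = ‖x‖₁, whose subdifferential is known coordinatewise: {sign(x_i)} if x_i ≠ 0 and [−1, 1] if x_i = 0. The sketch below (again my own illustration, not from the slides; the helper names dir_deriv_estimate and max_formula_l1 are made up here) compares a one-sided difference quotient with the maximum of s^T d over ∂f(x):

```python
# A small sketch (not from the slides): compare a finite-difference
# estimate of f'(x; d) with max_{s in ∂f(x)} s^T d for f(x) = ||x||_1.
import numpy as np

def dir_deriv_estimate(f, x, d, alpha=1e-7):
    # One-sided difference quotient; for convex f it is nondecreasing
    # in alpha > 0 and converges to f'(x; d) as alpha decreases to 0.
    return (f(x + alpha * d) - f(x)) / alpha

def max_formula_l1(x, d):
    # max s^T d over the l1-norm subdifferential at x, coordinatewise:
    # sign(x_i) * d_i where x_i != 0, and |d_i| where x_i = 0.
    return sum(np.sign(xi) * di if xi != 0 else abs(di)
               for xi, di in zip(x, d))

f = lambda x: np.abs(x).sum()
x = np.array([1.0, 0.0, -2.0])
d = np.array([0.5, -3.0, 1.0])

print(dir_deriv_estimate(f, x, d))   # ~ 0.5 + 3.0 - 1.0 = 2.5
print(max_formula_l1(x, d))          # 2.5 exactly
```

Since the difference quotient is nonincreasing as α ↓ 0 for convex f, a small fixed α already approximates f′(x; d) well; here both quantities come out to 2.5.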
Optimality Conditions: Unconstrained Case

Unconstrained optimization:

    minimize f(x)

Assumption
• The function f is convex (non-differentiable) and proper [f proper means f(x) > −∞ for all x and dom f ≠ ∅]

Theorem Under this assumption, a vector x* minimizes f over R^n if and only if

    0 ∈ ∂f(x*)

• The result is a generalization of ∇f(x*) = 0
• Proof: x* is optimal if and only if f(x) ≥ f(x*) for all x, or equivalently,

    f(x) ≥ f(x*) + 0^T (x − x*) for all x ∈ R^n.

Thus, x* is optimal if and only if 0 ∈ ∂f(x*).

Examples
• The function f(x) = |x|:

$$\partial f(x) = \begin{cases} \{\mathrm{sign}(x)\} & \text{for } x \neq 0 \\ [-1,\, 1] & \text{for } x = 0 \end{cases}$$

The minimum is at x* = 0, and evidently 0 ∈ ∂f(0).

• The function f(x) = ‖x‖:

$$\partial f(x) = \begin{cases} \{x/\|x\|\} & \text{for } x \neq 0 \\ \{s \mid \|s\| \leq 1\} & \text{for } x = 0 \end{cases}$$

Again, the minimum is at x* = 0 and 0 ∈ ∂f(0).

• The function f(x) = max{x^2 + 2x − 3, x^2 − 2x − 3, 0}:

$$f(x) = \begin{cases} x^2 - 2x - 3 & \text{for } x < -1 \\ 0 & \text{for } x \in [-1, 1] \\ x^2 + 2x - 3 & \text{for } x > 1 \end{cases}$$

$$\partial f(x) = \begin{cases} \{2x - 2\} & \text{for } x < -1 \\ [-4,\, 0] & \text{for } x = -1 \\ \{0\} & \text{for } x \in (-1, 1) \\ [0,\, 4] & \text{for } x = 1 \\ \{2x + 2\} & \text{for } x > 1 \end{cases}$$

• The optimal set is X* = [−1, 1]
• For every x* ∈ X*, we have 0 ∈ ∂f(x*)

Optimality Conditions: Constrained Case

Constrained optimization:

    minimize f(x) subject to x ∈ C

Assumption
• The function f is convex (non-differentiable) and proper
• The set C is nonempty, closed, and convex

Theorem Under this assumption, a vector x* ∈ C minimizes f over the set C if and only if there exists a subgradient d ∈ ∂f(x*) such that

    d^T (x − x*) ≥ 0 for all x ∈ C

• The result is a generalization of the condition ∇f(x*)^T (x − x*) ≥ 0 for all x ∈ C
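As a closing illustration of the constrained condition, consider minimizing f(x) = |x| over C = [1, 2]. Away from 0 the subdifferential is the singleton {sign(x)}, so the condition can be tested exhaustively on a sample of C. This is a toy check of my own (the grid and the helper name satisfies_condition are assumptions of the sketch), not part of the lecture:

```python
# A toy check (my construction, not the lecture's): the constrained
# optimality condition for minimizing f(x) = |x| over C = [1, 2].
import numpy as np

C = np.linspace(1.0, 2.0, 101)   # sample points of the feasible set

def satisfies_condition(xstar):
    # For xstar != 0, the only subgradient of |x| is d = sign(xstar);
    # test d * (x - xstar) >= 0 at every sample point of C.
    d = np.sign(xstar)
    return all(d * (x - xstar) >= -1e-12 for x in C)

print(satisfies_condition(1.0))   # True:  x* = 1 minimizes |x| over [1, 2]
print(satisfies_condition(1.5))   # False: 1.5 is not a minimizer
```

At x* = 1 the subgradient d = 1 satisfies d(x − x*) ≥ 0 for every x ∈ C, so the condition certifies the minimizer; at x = 1.5 the only available subgradient fails it.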