
Lecture 18 Subgradients

November 3, 2008

Outline

• Existence of Subgradients

• Subdifferential Properties

• Optimality Conditions

Convex-Constrained Non-differentiable Minimization

minimize f(x)
subject to x ∈ C

• Characteristics:

• The function f : R^n → R is convex and possibly non-differentiable
• The set C ⊆ R^n is nonempty, closed, and convex
• The optimal value f∗ is finite

• Our focus here is non-differentiability

Definition of Subgradient and Subdifferential

Def. A vector s ∈ R^n is a subgradient of f at x̂ ∈ dom f when

f(x) ≥ f(x̂) + s^T (x − x̂)   for all x ∈ dom f

Def. The subdifferential of f at x̂ ∈ dom f is the set of all subgradients s of f at x̂

• The subdifferential of f at x̂ is denoted by ∂f(x̂)

• When f is differentiable at x̂, we have ∂f(x̂) = {∇f(x̂)} (the subdifferential is a singleton)

• Examples

f(x) = |x|:   ∂f(x) = sign(x) for x ≠ 0,   ∂f(0) = [−1, 1]

f(x) = x^2 + 2|x| − 3 for |x| > 1,   f(x) = 0 for |x| ≤ 1
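• A minimal numerical sketch of the subgradient inequality for f(x) = |x| at x̂ = 0 (the sample grid and candidate slopes below are arbitrary illustrative choices, not part of the lecture):

    import numpy as np

    # Subgradient inequality f(x) >= f(xhat) + s*(x - xhat) for f(x) = |x|
    # at xhat = 0; every s in [-1, 1] should satisfy it, slopes outside should not.
    f = lambda x: np.abs(x)
    xhat = 0.0
    xs = np.linspace(-2.0, 2.0, 401)

    for s in (-1.0, -0.3, 0.0, 0.7, 1.0):          # candidate subgradients at 0
        assert np.all(f(xs) >= f(xhat) + s * (xs - xhat))

    # a slope outside [-1, 1] violates the inequality at some x
    assert np.any(f(xs) < f(xhat) + 1.5 * (xs - xhat))
    print("every s in [-1, 1] is a subgradient of |x| at 0")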

Subgradients and Epigraph

• Let s be a subgradient of f at x̂:

f(x) ≥ f(x̂) + s^T (x − x̂)   for all x ∈ dom f

• The subgradient inequality is equivalent to

−s^T x̂ + f(x̂) ≤ −s^T x + f(x)   for all x ∈ dom f

• Let f(x) > −∞ for all x ∈ R^n. Then

epi f = {(x, w) | f(x) ≤ w, x ∈ R^n}

Thus,

−s^T x̂ + f(x̂) ≤ −s^T x + w   for all (x, w) ∈ epi f,

equivalent to

(−s, 1)^T (x̂, f(x̂)) ≤ (−s, 1)^T (x, w)   for all (x, w) ∈ epi f

Therefore, the hyperplane

H = {(x, γ) ∈ R^{n+1} | (−s, 1)^T (x, γ) = (−s, 1)^T (x̂, f(x̂))}

supports epi f at the vector (x̂, f(x̂))
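• A small numerical check of this supporting-hyperplane property (a sketch, assuming f(x) = |x|, x̂ = 0, and the subgradient s = 0.5 purely for illustration):

    import numpy as np

    # For f(x) = |x|, xhat = 0, and the subgradient s = 0.5, the vector (-s, 1)
    # should satisfy (-s, 1)^T (xhat, f(xhat)) <= (-s, 1)^T (x, w) for every
    # point (x, w) of epi f, i.e. whenever w >= |x|.
    s, xhat = 0.5, 0.0
    normal = np.array([-s, 1.0])
    lhs = normal @ np.array([xhat, abs(xhat)])

    rng = np.random.default_rng(0)
    x = rng.uniform(-5.0, 5.0, 1000)
    w = np.abs(x) + rng.uniform(0.0, 5.0, 1000)    # random points of epi f
    assert np.all(lhs <= normal @ np.vstack([x, w]))
    print("(-s, 1) supports epi f at (xhat, f(xhat))")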

Subdifferential Set Properties

Theorem 1 A subdifferential set ∂f(x̂) is convex and closed

Proof H7.

Theorem 2 (Existence) Let f be convex with a nonempty dom f. Then:

(a) For x ∈ relint(dom f), we have ∂f(x) ≠ ∅.

(b) ∂f(x) is nonempty and bounded if and only if x ∈ int(dom f).

Implications

• The subdifferential ∂f(x̂) is nonempty and compact for every x̂ in the interior of dom f.

• When dom f = R^n, ∂f(x) is a nonempty, compact, and convex set for all x


Partial Proof: If x̂ ∈ int(dom f), then ∂f(x̂) is nonempty and bounded.

• ∂f(x̂) Nonempty. Let x̂ be in the interior of dom f. The vector (x̂, f(x̂)) does not belong to the interior of epi f. The epigraph epi f is convex, so by the Supporting Hyperplane Theorem there is a vector (d, β) ∈ R^{n+1}, (d, β) ≠ 0, such that

d^T x̂ + β f(x̂) ≤ d^T x + β w   for all (x, w) ∈ epi f

We have epi f = {(x, w) | f(x) ≤ w, x ∈ dom f}. Hence,

d^T x̂ + β f(x̂) ≤ d^T x + β w   for all x ∈ dom f and all w with f(x) ≤ w

We must have β ≥ 0 (otherwise, letting w → ∞ would violate the inequality). We cannot have β = 0: since x̂ ∈ int(dom f), the point x = x̂ − δd lies in dom f for small δ > 0, and β = 0 would then give d^T x̂ ≤ d^T x̂ − δ‖d‖^2, forcing d = 0 and contradicting (d, β) ≠ 0. Dividing by β, we see that −d/β is a subgradient of f at x̂


• ∂f(x̂) Bounded. By the subgradient inequality, we have

f(x) ≥ f(x̂) + s^T (x − x̂)   for all x ∈ dom f

Suppose that the subdifferential ∂f(x̂) is unbounded. Let {s_k} be a sequence of subgradients in ∂f(x̂) with ‖s_k‖ → ∞. Since x̂ lies in the interior of the domain, there exists a δ > 0 such that x̂ + δy ∈ dom f for any y ∈ R^n with ‖y‖ ≤ 1. Letting x = x̂ + δ s_k/‖s_k‖ for each k, we have

f(x̂ + δ s_k/‖s_k‖) ≥ f(x̂) + δ‖s_k‖   for all k

As k → ∞, we have f(x̂ + δ s_k/‖s_k‖) − f(x̂) → ∞. However, this relation contradicts the continuity of f at x̂. [Recall that a convex function is continuous over the interior of its domain.]

Example Consider f(x) = −√x with dom f = {x | x ≥ 0}. We have ∂f(0) = ∅. Note that 0 is not in the interior of the domain of f
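• A numerical sketch of this example (the candidate slopes and sample grid below are arbitrary illustrative choices):

    import numpy as np

    # f(x) = -sqrt(x) on dom f = [0, inf) has no subgradient at xhat = 0:
    # a slope s would need -sqrt(x) >= s*x for all x > 0, i.e. s <= -1/sqrt(x),
    # and letting x -> 0 rules out every finite s.
    f = lambda x: -np.sqrt(x)
    x = np.logspace(-16, 0, 200)                    # points approaching 0 from the right

    for s in (-1.0, -10.0, -1e3, -1e6):             # ever steeper candidate slopes
        violated = bool(np.any(f(x) < f(0.0) + s * x))
        print(f"s = {s:>10.0e}: inequality violated somewhere? {violated}")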

Boundedness of the Subdifferential Sets

Theorem 2 Let f : R^n → R be convex and let X be a bounded set. Then, the set ∪_{x∈X} ∂f(x) is bounded.

Proof

Assume that there is an unbounded sequence of subgradients {s_k}, i.e.,

lim_{k→∞} ‖s_k‖ = ∞,

where s_k ∈ ∂f(x_k) for some x_k ∈ X. The sequence {x_k} is bounded, so it has a convergent subsequence, say {x_k}_K, converging to some x ∈ R^n. Consider d_k = s_k/‖s_k‖ for k ∈ K. This is a bounded sequence, and it has a convergent subsequence, say {d_k}_{K′} with K′ ⊆ K. Let d be the limit of {d_k}_{K′}.

Since s_k ∈ ∂f(x_k), we have for each k,

f(x_k + d_k) ≥ f(x_k) + s_k^T d_k = f(x_k) + ‖s_k‖.

By letting k → ∞ with k ∈ K′, we see that

lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] ≥ lim sup_{k→∞, k∈K′} ‖s_k‖.

By continuity of f,

lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] = f(x + d) − f(x),

hence the left-hand side is finite, implying that ‖s_k‖ is bounded. This is a contradiction ({s_k} was assumed to be unbounded).
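• A small numerical illustration of the theorem (a sketch; the example f(x) = |x| + x^2 and the set X = [−1, 1] are assumptions chosen only for illustration):

    import numpy as np

    # For f(x) = |x| + x**2 over the bounded set X = [-1, 1], every subgradient
    # has the form 2*x + t with t = sign(x) (any t in [-1, 1] at x = 0), so
    # |s| <= 3 over X and the union of the subdifferentials is bounded.
    rng = np.random.default_rng(1)
    xs = rng.uniform(-1.0, 1.0, 10_000)             # points of X
    ts = rng.uniform(-1.0, 1.0, 10_000)             # arbitrary choice at x = 0
    subgrads = 2.0 * xs + np.where(xs != 0.0, np.sign(xs), ts)
    print("largest |s| over X:", np.abs(subgrads).max())   # stays below 3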

Continuity of the Subdifferential

Theorem 3 Let f : R^n → R be convex and let {x_k} converge to some x ∈ R^n. Let s_k ∈ ∂f(x_k) for all k. Then, the sequence {s_k} is bounded and every one of its limit points is a subgradient of f at x.

Proof H7.

Subdifferential and Directional Derivatives

Definition The directional derivative f'(x; d) of f at x along direction d is the following limiting value

f'(x; d) = lim_{α↓0} [f(x + αd) − f(x)] / α.

• When f is convex, the ratio [f(x + αd) − f(x)]/α is a nondecreasing function of α > 0, and as α decreases to zero, the ratio converges to some value or decreases to −∞. (HW)

Theorem 4 Let x ∈ int(dom f). Then, the directional derivative f'(x; d) is finite for all d ∈ R^n. In particular, we have

f'(x; d) = max_{s ∈ ∂f(x)} s^T d.
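• A minimal numerical sketch of this formula (assuming f(x) = ‖x‖_1 with a particular x and d chosen only for illustration):

    import numpy as np

    # Check f'(x; d) = max over ∂f(x) of s^T d for f(x) = ||x||_1 at x = (1, 0, -2):
    # ∂f(x) = {(1, t, -1) : t in [-1, 1]}, so the maximum equals d[0] + |d[1]| - d[2].
    f = lambda x: np.abs(x).sum()
    x = np.array([1.0, 0.0, -2.0])
    d = np.array([0.3, -0.8, 0.5])

    alpha = 1e-7                                     # small one-sided difference quotient
    directional = (f(x + alpha * d) - f(x)) / alpha
    max_over_subdiff = d[0] + abs(d[1]) - d[2]
    print(directional, max_over_subdiff)             # both approximately 0.6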

Proof When x ∈ int(dom f), the subdifferential ∂f(x) is nonempty and compact. Using the subgradient defining relation, we can see that f'(x; d) ≥ s^T d for all s ∈ ∂f(x). Therefore,

f'(x; d) ≥ max_{s ∈ ∂f(x)} s^T d.

To show that equality actually holds, we rely on the Separating Hyperplane Theorem. Define

C_1 = {(z, w) | z ∈ dom f, f(z) < w},

C_2 = {(y, v) | y = x + αd, v = f(x) + αf'(x; d), α ≥ 0}.

These sets are nonempty, convex, and disjoint (HW). By the Separating Hyperplane Theorem, there exists a nonzero vector (a, β) ∈ R^{n+1} such that

a^T (x + αd) + β(f(x) + αf'(x; d)) ≤ a^T z + βw,   (1)

for all α ≥ 0, z ∈ dom f, and f(z) < w. We must have β ≥ 0 (why?). We cannot have β = 0 (why?). Thus, β > 0 and we can divide the relation in (1) by β, obtaining, with ã = a/β,

ã^T (x + αd) + f(x) + αf'(x; d) ≤ ã^T z + w,   (2)

for all α ≥ 0, z ∈ dom f, and f(z) < w. Choosing α = 0 and letting w ↓ f(z), we see ã^T x + f(x) ≤ ã^T z + f(z), implying that f(x) − ã^T (z − x) ≤ f(z) for all z ∈ dom f. Therefore −ã ∈ ∂f(x).


Letting z = x, w ↓ f(z) and α = 1 in (2), we obtain

ã^T (x + d) + f(x) + f'(x; d) ≤ ã^T x + f(x),

implying f'(x; d) ≤ −ã^T d. In view of f'(x; d) ≥ max_{s ∈ ∂f(x)} s^T d, it follows that

f'(x; d) = max_{s ∈ ∂f(x)} s^T d,

where the maximum is attained at the “constructed” subgradient −ã.

Optimality Conditions: Unconstrained Case

Unconstrained optimization

minimize f(x)

Assumption
• The function f is convex (possibly non-differentiable) and proper [f proper means f(x) > −∞ for all x and dom f ≠ ∅]

Theorem Under this assumption, a vector x∗ minimizes f over R^n if and only if

0 ∈ ∂f(x∗)

• The result is a generalization of ∇f(x∗) = 0

• Proof x∗ is optimal if and only if f(x) ≥ f(x∗) for all x, or equivalently

f(x) ≥ f(x∗) + 0^T (x − x∗)   for all x ∈ R^n

Thus, x∗ is optimal if and only if 0 ∈ ∂f(x∗)

Examples

• The function f(x) = |x|

∂f(x) = sign(x) for x ≠ 0,   ∂f(0) = [−1, 1]

The minimum is at x∗ = 0, and evidently 0 ∈ ∂f(0)

• The function f(x) = ‖x‖

∂f(x) = x/‖x‖ for x ≠ 0,   ∂f(0) = {s | ‖s‖ ≤ 1}

Again, the minimum is at x∗ = 0 and 0 ∈ ∂f(0)


• The function f(x) = max{x^2 + 2x − 3, x^2 − 2x − 3, 0}

f(x) =
  x^2 − 2x − 3   for x < −1
  0              for x ∈ [−1, 1]
  x^2 + 2x − 3   for x > 1

∂f(x) =
  2x − 2    for x < −1
  [−4, 0]   for x = −1
  0         for x ∈ (−1, 1)
  [0, 4]    for x = 1
  2x + 2    for x > 1

• The optimal set is X∗ = [−1, 1]

• For every x∗ ∈ X∗, we have 0 ∈ ∂f(x∗)
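• A quick numerical check of 0 ∈ ∂f(x∗) for this example (a sketch; the grid and test points are arbitrary illustrative choices):

    import numpy as np

    # Check 0 in ∂f(x*) for f(x) = max{x^2 + 2x - 3, x^2 - 2x - 3, 0} at points
    # x* of the optimal set [-1, 1]: the subgradient inequality with s = 0 reads
    # f(x) >= f(x*) for all x, which we test on a grid.
    f = lambda x: np.maximum(np.maximum(x**2 + 2*x - 3, x**2 - 2*x - 3), 0.0)
    xs = np.linspace(-4.0, 4.0, 2001)

    for xstar in (-1.0, 0.0, 0.5, 1.0):
        assert np.all(f(xs) >= f(xstar))             # s = 0 is a subgradient at x*
    print("0 is a subgradient at every tested x* in [-1, 1]")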

Optimality Conditions: Constrained Case

Constrained optimization

minimize f(x)
subject to x ∈ C

Assumption
• The function f is convex (possibly non-differentiable) and proper

• The set C is nonempty, closed, and convex

Theorem Under this assumption, a vector x∗ ∈ C minimizes f over the set C if and only if there exists a subgradient d ∈ ∂f(x∗) such that

d^T (x − x∗) ≥ 0   for all x ∈ C

• The result is a generalization of ∇f(x∗)^T (x − x∗) ≥ 0 for x ∈ C
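• A small illustration of this condition (a sketch on a hypothetical instance, minimize |x − 2| over C = [−1, 1], which is not from the slides):

    import numpy as np

    # Hypothetical instance: minimize f(x) = |x - 2| over C = [-1, 1].
    # The minimizer is x* = 1 with ∂f(1) = {-1}; the subgradient d = -1
    # satisfies d*(x - x*) >= 0 for every x in C, certifying optimality.
    f = lambda x: np.abs(x - 2.0)
    C = np.linspace(-1.0, 1.0, 201)                  # grid over the constraint set
    xstar, d = 1.0, -1.0

    assert np.all(d * (C - xstar) >= 0.0)            # optimality condition holds on C
    assert np.all(f(C) >= f(xstar))                  # indeed x* minimizes f over C
    print("d^T (x - x*) >= 0 for all x in C")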
