CSC 576: Optimization Foundations
Ji Liu Department of Computer Sciences, University of Rochester
April 15, 2016
1 Introduction
We are interested in the following optimization problem:
$$\min_{x} \; f(x) + g(x) \quad \text{s.t.} \quad x \in \Omega,$$
where $f(x)$ is a smooth convex function, $g(x)$ is a closed convex function, and $\Omega$ is a closed convex set. The following sections introduce some basic definitions in optimization.
2 Closed / Open Set, Bounded Set, Convex Set
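There are several equivalent ways to define open and closed sets. Here we introduce only one of them. $\Omega$ is an open set if $\forall x \in \Omega$, $\exists \epsilon > 0$ such that
$$B_x(\epsilon) \subset \Omega,$$
where $B_x(\epsilon) := \{y \mid \|y - x\| \le \epsilon\}$ defines a ball with center $x$ and radius $\epsilon$. $\Omega$ is a closed set if its complement $\Omega^c$ is an open set. $\emptyset$ and $\mathbb{R}^n$ are simultaneously closed and open. $\Omega$ is bounded if $\exists \epsilon > 0$ such that $\Omega \subset B_0(\epsilon)$. A set $\Omega$ is convex if $\forall x, y \in \Omega$ and $\forall \theta \in [0, 1]$, the following holds:
$$\theta x + (1 - \theta) y \in \Omega.$$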
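The definition of a convex set can be probed numerically. The sketch below is only an illustration under assumptions not in the notes: it uses the unit Euclidean ball in $\mathbb{R}^3$ as the convex set and an annulus as a non-convex contrast, and the helper names (`in_ball`, `sample_in_ball`, `in_annulus`) are hypothetical. Random sampling can refute convexity but cannot prove it.

```python
import numpy as np

def in_ball(y, center, radius):
    """Membership test for B_center(radius) = {y : ||y - center|| <= radius}."""
    return np.linalg.norm(y - center) <= radius + 1e-12  # small slack for rounding

def sample_in_ball(rng, center, radius):
    """Draw a point inside the ball by scaling a random unit direction."""
    d = rng.normal(size=center.shape)
    d /= np.linalg.norm(d)
    return center + radius * rng.uniform() * d

def convex_combinations_stay_inside(trials=1000, seed=0):
    """Check theta*x + (1-theta)*y stays in the ball for random x, y, theta."""
    rng = np.random.default_rng(seed)
    center, radius = np.zeros(3), 1.0
    for _ in range(trials):
        x = sample_in_ball(rng, center, radius)
        y = sample_in_ball(rng, center, radius)
        theta = rng.uniform()
        if not in_ball(theta * x + (1 - theta) * y, center, radius):
            return False
    return True

def in_annulus(y):
    """A non-convex set for contrast: {y : 0.5 <= ||y|| <= 1}."""
    return 0.5 <= np.linalg.norm(y) <= 1.0

# Two points of the annulus whose midpoint (the origin) falls outside it.
x_a = np.array([1.0, 0.0, 0.0])
y_a = np.array([-1.0, 0.0, 0.0])
midpoint_in_annulus = in_annulus(0.5 * x_a + 0.5 * y_a)
```

The annulus fails the definition with $\theta = 1/2$, which is enough to show it is not convex.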
Two important properties for closed sets:
• The intersection of arbitrarily many closed sets is still closed;
• The union of a finite number of closed sets is still closed. Note that the union of infinitely many closed sets may no longer be closed. For example,
$$\bigcup_{n=1}^{\infty} [1/n, 1] = (0, 1].$$
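To see why the union of the intervals $[1/n, 1]$ is $(0, 1]$, note that every $x \in (0, 1]$ lies in $[1/n, 1]$ once $n \ge 1/x$, while $0$ lies in none of them even though it is a limit point of the union. A minimal sketch of that argument, using exact rational arithmetic (the function name `covering_index` is hypothetical):

```python
from fractions import Fraction
from math import ceil

def covering_index(x):
    """Return an n with x in [1/n, 1], or None if no such n exists."""
    x = Fraction(x)
    if x <= 0 or x > 1:
        return None              # 0 (and anything outside (0, 1]) is in no interval
    return ceil(1 / x)           # smallest integer n with n >= 1/x, hence 1/n <= x
```

For instance, `covering_index(Fraction(3, 10))` returns an `n` with `Fraction(1, n) <= Fraction(3, 10)`, whereas `covering_index(0)` returns `None`.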
3 Convex Function, Closed Function
There are several ways to define a convex function. We introduce two equivalent definitions. Let $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$. $f(x)$ is a convex function if $\forall x, y$, $\forall \theta \in [0, 1]$, the following inequality holds:
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$
The epigraph of the function $f(x)$ is defined as the set
$$\mathrm{epi} f = \{(x, t) \mid f(x) \le t\}.$$
“$f(x)$ is a convex function” is equivalent to saying “$\mathrm{epi} f$ is a convex set”. The function $f(x)$ is closed if its epigraph $\mathrm{epi} f$ is closed. If $f$ is a continuous function, then $f$ is closed. A smooth function is continuous, and thus closed. A few examples of convex functions: A linear function
$$f_1(x) = a^T x,$$
which is also closed. An indicator function
$$f_2(x) = \begin{cases} 0 & x \in \Omega \\ +\infty & \text{otherwise,} \end{cases}$$
where $\Omega$ is a convex set. This function is closed if $\Omega$ is closed. A third example is
$$f_3(x) = \begin{cases} -x & x \in [-1, 1) \\ 1 & x \in \{1\} \\ +\infty & \text{otherwise.} \end{cases}$$
This function is convex but not closed (its epigraph does not contain the limit point $(1, -1)$). If the function $f(x)$ is smooth (or differentiable), then the following three statements are equivalent:
• $f(x)$ is convex;
• $\forall x, y$, we have $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$;
• $\forall x, y$, we have $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.
If the function $f(x)$ is twice differentiable, then the following two statements are equivalent:
• $f(x)$ is convex;
• $\nabla^2 f(x)$ is positive semidefinite (PSD) for any $x$.
Some important properties for preserving convexity:
• If $f_1(x)$ and $f_2(x)$ are convex, then $f_1(x) + f_2(x)$ is also convex;
• If $f_1(x)$ and $f_2(x)$ are convex, then $\max\{f_1(x), f_2(x)\}$ is also convex;
• If $f(x)$ is convex, then $g(z) = f(Az + b)$ is also convex;
• If $f(x, y)$ is convex, then $g(x) = \min_y f(x, y)$ is convex.
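The four convexity-preserving operations can be spot-checked against the defining inequality $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$. The sketch below is an illustration under assumptions: the concrete functions ($\|x\|^2$, $\|x\|_1$, the matrix `A`, the vector `b`, and the jointly convex $f(x, y) = (x - y)^2 + y^2$ whose partial minimum over $y$ is attained at $y = x/2$) are chosen here and do not come from the notes. Passing a random test does not prove convexity; failing it disproves it.

```python
import numpy as np

def jensen_gap(f, x, y, theta):
    """theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y); >= 0 if f is convex."""
    return theta * f(x) + (1 - theta) * f(y) - f(theta * x + (1 - theta) * y)

def looks_convex(f, dim, trials=2000, seed=0, tol=1e-9):
    """Return False as soon as a random triple (x, y, theta) violates convexity."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        theta = rng.uniform()
        if jensen_gap(f, x, y, theta) < -tol:
            return False
    return True

# Two convex building blocks (illustrative choices).
f1 = lambda x: float(x @ x)              # squared Euclidean norm
f2 = lambda x: float(np.abs(x).sum())    # l1 norm

A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])

f_sum = lambda x: f1(x) + f2(x)          # sum of convex functions
f_max = lambda x: max(f1(x), f2(x))      # pointwise maximum
f_affine = lambda z: f1(A @ z + b)       # composition with an affine map, z in R^2

def f_joint(x, y):
    """Jointly convex in (x, y); minimizing over y gives y = x/2."""
    return (x - y) ** 2 + y ** 2

g_partial = lambda x: float(f_joint(x[0], x[0] / 2))  # equals x^2 / 2

f_concave = lambda x: -float(x @ x)      # a non-convex contrast
```

Calling `looks_convex(f_sum, dim=3)`, `looks_convex(f_max, dim=3)`, `looks_convex(f_affine, dim=2)`, and `looks_convex(g_partial, dim=1)` exercises each rule in turn, and `looks_convex(f_concave, dim=3)` shows the test rejecting a concave function.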
4 Convexity for Differentiable Functions
Theorem 1. If a function $f(x)$ is smooth (or differentiable), the following statements are equivalent, $\forall x, y \in \mathrm{dom}(f)$:
1. $f(x)$ is convex;
2. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$;
3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.
Proof. We complete the proof by first showing that (1) and (2) are equivalent, and then that (2) and (3) are equivalent.
(1)⇒(2): Choose any $x, y \in \mathrm{dom}(f)$ and consider $f$ restricted to the line segment between them, i.e., the function defined by $g(t) = f(x + t(y - x))$. Since $f$ is convex, $g$ is also convex. Notice that $g(0) = f(x)$, $g(1) = f(y)$, and $g'(t) = \langle \nabla f(x + t(y - x)), y - x \rangle$. From the definition of $g'(0)$, we have
$$g'(0) = \lim_{t \to 0^+} \frac{g(t) - g(0)}{t} \le \lim_{t \to 0^+} \frac{(1 - t) g(0) + t g(1) - g(0)}{t} = g(1) - g(0),$$
where the inequality uses the convexity of $g$. Hence we have $g(1) \ge g(0) + g'(0)$, which implies $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.
(2)⇒(1): Choose any $x, y \in \mathrm{dom}(f)$ and $0 \le t \le 1$, and let $z = tx + (1 - t)y$. Applying (2) yields
$$f(x) \ge f(z) + \langle \nabla f(z), x - z \rangle, \qquad f(y) \ge f(z) + \langle \nabla f(z), y - z \rangle.$$
Multiplying the first inequality by $t$, the second by $1 - t$, and adding them together (the gradient terms cancel because $t(x - z) + (1 - t)(y - z) = 0$) yields
$$t f(x) + (1 - t) f(y) \ge f(z),$$
which proves that $f(x)$ is convex.
(2)⇒(3): Exchanging the roles of $x$ and $y$ in (2) yields
$$f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle.$$
Adding this to (2) gives
$$0 \ge \langle \nabla f(x) - \nabla f(y), y - x \rangle,$$
which yields (3) after negating the inner product.
(3)⇒(2): Use the same definition of $g$ as above. Applying the Fundamental Theorem of Calculus (FTC) yields
$$g(1) = g(0) + \int_0^1 g'(t) \, dt.$$
By our definition, that is
$$\begin{aligned} f(y) &= f(x) + \int_0^1 \langle \nabla f(x + t(y - x)), y - x \rangle \, dt \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt. \end{aligned}$$
Let $z = x + t(y - x)$. The integrand of the last term becomes $\langle \nabla f(z) - \nabla f(x), \frac{1}{t}(z - x) \rangle$, which is greater than or equal to 0 for every $t \in (0, 1]$ because of (3), and then we have $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.
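Statements 2 and 3 of Theorem 1 can be checked numerically on a concrete smooth convex function. The sketch below assumes the log-sum-exp function $f(x) = \log \sum_i e^{x_i}$ as the example (a choice made here, not taken from the notes); its gradient is the softmax vector. Both gaps should be nonnegative at every pair of points.

```python
import numpy as np

def f(x):
    """Log-sum-exp, a smooth convex function used only as an illustration."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())   # shift by the max for numerical stability

def grad_f(x):
    """Gradient of log-sum-exp, i.e. softmax(x)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def first_order_gap(x, y):
    """f(y) - f(x) - <grad f(x), y - x>; statement 2 says this is >= 0."""
    return f(y) - f(x) - grad_f(x) @ (y - x)

def monotonicity_gap(x, y):
    """<grad f(x) - grad f(y), x - y>; statement 3 says this is >= 0."""
    return (grad_f(x) - grad_f(y)) @ (x - y)

def check_theorem1(trials=1000, dim=5, seed=0, tol=1e-10):
    """Evaluate both gaps at random pairs and report whether any is negative."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        if first_order_gap(x, y) < -tol or monotonicity_gap(x, y) < -tol:
            return False
    return True
```

Geometrically, `first_order_gap` measures how far the graph of $f$ lies above its tangent plane at $x$, which is why it vanishes when $y = x$.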
5 Convexity of Twice Differentiable Functions
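Theorem 2. If a function $f(x)$ is twice differentiable, the following statements are equivalent, $\forall x \in \mathrm{dom}(f)$:
1. $f(x)$ is convex;
2. $\nabla^2 f(x)$ is PSD.
Proof. (1)⇒(2): Since $f$ is differentiable, Theorem 1 shows that
$$\langle \nabla f(y) - \nabla f(x), y - x \rangle \ge 0.$$
Define $s = y - x$. Since $y$ is arbitrary, we may replace $s$ by $ts$ with $t \in (0, 1]$, which gives $\langle \nabla f(x + ts) - \nabla f(x), s \rangle \ge 0$. Applying the FTC to the gradient difference and dividing by $t$ yields
$$h(t) := \frac{1}{t} \int_0^t \langle \nabla^2 f(x + \lambda s) s, s \rangle \, d\lambda \ge 0$$
for every $t \in (0, 1]$. As $t$ approaches 0, we have
$$\lim_{t \to 0^+} h(t) = \langle \nabla^2 f(x) s, s \rangle \ge 0.$$
Since $s$ is arbitrary, $\nabla^2 f(x)$ is PSD.
(2)⇒(1): The mean value theorem shows there exists a $\lambda \in [0, 1]$ such that
$$\begin{aligned} f(y) &= f(x) + \langle \nabla f(x + \lambda(y - x)), y - x \rangle \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x \rangle. \end{aligned}$$
Define $g$ to be the last term above, i.e., $g = \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x \rangle$. By Theorem 1, $f$ is convex if we can prove that $g$ is greater than or equal to 0. Applying the FTC to the gradient difference in $g$ yields
$$g = \left\langle \int_0^{\lambda} \nabla^2 f(x + t(y - x))(y - x) \, dt, \; y - x \right\rangle = \int_0^{\lambda} \langle \nabla^2 f(x + t(y - x))(y - x), y - x \rangle \, dt.$$
Since $\nabla^2 f(x + t(y - x))$ is PSD for every $t$, the integrand is nonnegative, so $g \ge 0$. This completes the proof.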
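The second-order condition of Theorem 2 suggests a practical test: evaluate the Hessian and inspect its smallest eigenvalue. The sketch below again assumes log-sum-exp as the example function (an illustrative choice), whose Hessian is $\mathrm{diag}(p) - p p^T$ with $p = \mathrm{softmax}(x)$, and contrasts it with the Hessian of the non-convex $x_1^2 - x_2^2$. The tolerance `tol` is an assumption that absorbs floating-point rounding.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hessian_logsumexp(x):
    """Hessian of log-sum-exp: diag(p) - p p^T with p = softmax(x)."""
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

def min_eigenvalue(H):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(H).min()

def hessian_is_psd_everywhere(trials=500, dim=4, seed=0, tol=1e-12):
    """Check PSD-ness of the Hessian at random points (a spot check, not a proof)."""
    rng = np.random.default_rng(seed)
    return all(
        min_eigenvalue(hessian_logsumexp(rng.normal(size=dim))) >= -tol
        for _ in range(trials)
    )

# Hessian of the saddle x1^2 - x2^2, which is not convex.
saddle_hessian = np.diag([2.0, -2.0])
```

The log-sum-exp Hessian always has a zero eigenvalue along the all-ones direction, so it is PSD but not positive definite; the saddle Hessian has a strictly negative eigenvalue.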
6 Convexity of Smooth Convex Functions with Lipschitzian Gradient
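A function $f(x)$ has “$L$”-Lipschitzian gradient if it satisfies the following condition:
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|x - y\|^2, \quad \forall x, y.$$
Theorem 3. If a convex function $f(x)$ has “$L$”-Lipschitzian gradient, the following statements are equivalent, $\forall x, y \in \mathrm{dom}(f)$:
1. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \le \frac{L}{2} \|y - x\|^2$;
2. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|^2$;
3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2$;
4. $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$.
Proof. (1)⇒(2): Define $\phi(\cdot) = f(\cdot) - \langle \nabla f(x), \cdot \rangle$. Since $\nabla \phi(y) = \nabla f(y) - \nabla f(x)$, proving (2) is equivalent to proving
$$\phi(x) \le \phi(y) - \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|^2 = \phi(y) - \frac{1}{2L} \|\nabla \phi(y)\|^2.$$
It is easy to see that $x = \arg\min_y \phi(y)$, because $\phi$ is convex and $\nabla \phi(x) = 0$. From (1), we know that $\phi(\cdot)$ also has $L$-Lipschitzian gradient. Therefore, we have
$$\phi(x) = \min_z \phi(z) \le \min_z \left\{ \phi(y) + \langle \nabla \phi(y), z - y \rangle + \frac{L}{2} \|z - y\|^2 \right\} = \phi(y) - \frac{1}{2L} \|\nabla \phi(y)\|^2,$$
where the last minimum is attained at $z = y - \frac{1}{L} \nabla \phi(y)$.
(2)⇒(3): Exchange $x$ and $y$ in (2) and sum the two inequalities to obtain (3).
(3)⇒(4): Applying the Cauchy-Schwarz inequality, we have
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\| \cdot \|x - y\|.$$
Combined with (3), this gives $\frac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2 \le \|\nabla f(x) - \nabla f(y)\| \cdot \|x - y\|$, i.e., $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$.
(4)⇒(1): We have
$$\begin{aligned} f(y) &= f(x) + \int_0^1 \langle \nabla f(x + t(y - x)), y - x \rangle \, dt \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt \\ &\le f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 t L \|y - x\|^2 \, dt \quad \text{(due to Cauchy-Schwarz and (4))} \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2. \end{aligned}$$
This proves (1).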
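For a convex quadratic, the four statements of Theorem 3 can be evaluated directly. The sketch below assumes $f(x) = \frac{1}{2} x^T A x - b^T x$ with a random symmetric PSD matrix $A$ and takes $L = \lambda_{\max}(A)$; the quadratic and the helper names (`make_quadratic`, `theorem3_gaps`) are illustrative assumptions, not part of the notes. Each returned quantity is "right-hand side minus left-hand side" (or the reverse), arranged so that the theorem predicts a nonnegative value.

```python
import numpy as np

def make_quadratic(dim=4, seed=0):
    """Convex quadratic f(x) = 0.5 x^T A x - b^T x with A symmetric PSD."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(dim, dim))
    A = M.T @ M                          # symmetric PSD by construction
    b = rng.normal(size=dim)
    L = np.linalg.eigvalsh(A).max()      # gradient A x - b is Lipschitz with L = lambda_max(A)
    f = lambda x: 0.5 * x @ A @ x - b @ x
    grad = lambda x: A @ x - b
    return f, grad, L

def theorem3_gaps(f, grad, L, x, y):
    """Four quantities that Theorem 3 predicts are nonnegative."""
    d = y - x
    gd = grad(x) - grad(y)
    bregman = f(y) - f(x) - grad(x) @ d
    return (
        0.5 * L * (d @ d) - bregman,                  # statement 1
        bregman - (gd @ gd) / (2 * L),                # statement 2
        gd @ (x - y) - (gd @ gd) / L,                 # statement 3
        L * np.linalg.norm(d) - np.linalg.norm(gd),   # statement 4
    )

def check_theorem3(trials=500, dim=4, seed=1, tol=1e-9):
    f, grad, L = make_quadratic(dim=dim)
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        if min(theorem3_gaps(f, grad, L, x, y)) < -tol:
            return False
    return True
```

Replacing `L` by a value smaller than $\lambda_{\max}(A)$ makes some of the gaps negative for suitable pairs, which is a quick way to see that the constant matters.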
7 Properties of Strongly Convex Functions
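A function $f(x)$ is $l$-strongly convex if it satisfies the following condition:
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{l}{2} \|x - y\|^2, \quad \forall x, y.$$
Theorem 4. If a convex function $f(x)$ is “$l$”-strongly convex, the following statements hold, $\forall x, y \in \mathrm{dom}(f)$:
1. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{l}{2} \|y - x\|^2$;
2. $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2l} \|\nabla f(x) - \nabla f(y)\|^2$;
3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \frac{1}{l} \|\nabla f(x) - \nabla f(y)\|^2$;
4. $\|\nabla f(x) - \nabla f(y)\| \ge l \|x - y\|$.
Statement 1 is the definition. Statements 2 and 3 are consequences of it rather than characterizations: for example, $f \equiv 0$ satisfies both but is not strongly convex.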
Proof. You can follow the proof for Theorem 3.
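The strong-convexity inequalities can be checked on a quadratic in the same way as the Lipschitz-gradient ones. The sketch below assumes $f(x) = \frac{1}{2} x^T A x - b^T x$ with $A$ positive definite and $l = \lambda_{\min}(A)$ (illustrative choices). It tests the definition $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{l}{2}\|y - x\|^2$ together with three standard consequences, written with the constants $\frac{1}{2l}$, $\frac{1}{l}$, and $l$ as stated in the code comments.

```python
import numpy as np

def make_strongly_convex_quadratic(dim=4, seed=0):
    """f(x) = 0.5 x^T A x - b^T x with A = M^T M + I, so lambda_min(A) >= 1."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(dim, dim))
    A = M.T @ M + np.eye(dim)            # symmetric positive definite
    b = rng.normal(size=dim)
    l = np.linalg.eigvalsh(A).min()      # strong-convexity modulus of this quadratic
    f = lambda x: 0.5 * x @ A @ x - b @ x
    grad = lambda x: A @ x - b
    return f, grad, l

def strong_convexity_gaps(f, grad, l, x, y):
    """Four quantities that should be nonnegative for an l-strongly convex f."""
    d = y - x
    gd = grad(x) - grad(y)
    bregman = f(y) - f(x) - grad(x) @ d
    return (
        bregman - 0.5 * l * (d @ d),                  # bregman >= (l/2) ||y - x||^2
        (gd @ gd) / (2 * l) - bregman,                # bregman <= (1/(2l)) ||grad diff||^2
        (gd @ gd) / l - gd @ (x - y),                 # <grad diff, x - y> <= (1/l) ||grad diff||^2
        np.linalg.norm(gd) - l * np.linalg.norm(d),   # ||grad diff|| >= l ||x - y||
    )

def check_strong_convexity(trials=500, dim=4, seed=1, tol=1e-9):
    f, grad, l = make_strongly_convex_quadratic(dim=dim)
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        if min(strong_convexity_gaps(f, grad, l, x, y)) < -tol:
            return False
    return True
```

For a quadratic, all four inequalities reduce to the eigenvalue bound $\lambda_i(A) \ge l$, which is why $l = \lambda_{\min}(A)$ is the largest modulus that passes.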