CSC 576: Optimization Foundations

Ji Liu Department of Computer Sciences, University of Rochester

April 15, 2016

1 Introduction

We are interested in the following optimization problem:

$$\min_{x} \; f(x) + g(x) \quad \text{s.t.} \quad x \in \Omega,$$

where $f(x)$ is a smooth function, $g(x)$ is a closed convex function, and $\Omega$ is a closed set. The following sections introduce some basic definitions in optimization.
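To make the template concrete, here is a minimal sketch of one possible instance. The choices below are assumptions for illustration and are not taken from the notes: a least-squares loss for the smooth part, an l1 penalty for the closed convex part, and a box for the closed set. The names `f`, `g`, `in_omega`, and `objective` are hypothetical.

```python
import math


def f(x, a=(1.0, 2.0), b=3.0):
    """Smooth part: 0.5 * (a^T x - b)^2 (an assumed example)."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return 0.5 * r * r


def g(x, lam=0.1):
    """Closed convex part: lam * ||x||_1 (an assumed example)."""
    return lam * sum(abs(xi) for xi in x)


def in_omega(x, lo=-1.0, hi=1.0):
    """Membership in the closed box Omega = [lo, hi]^n (an assumed example)."""
    return all(lo <= xi <= hi for xi in x)


def objective(x):
    """Value of f(x) + g(x), or +inf when x lies outside Omega."""
    return f(x) + g(x) if in_omega(x) else math.inf
```

Returning `math.inf` outside the box is one common way to fold the constraint into the objective; it corresponds to adding the indicator function of the set.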

2 Closed / Open Set, Bounded Set, Convex Set

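There are several equivalent ways to define open and closed sets; here we introduce only one of them. $\Omega$ is an open set if, $\forall x \in \Omega$, $\exists \epsilon > 0$ such that

$$B_x(\epsilon) \subset \Omega,$$

where $B_x(\epsilon) := \{y \mid \|y - x\| \le \epsilon\}$ defines a ball with center $x$ and radius $\epsilon$. $\Omega$ is a closed set if its complement $\Omega^c$ is an open set. $\emptyset$ and $\mathbb{R}^n$ are simultaneously closed and open. $\Omega$ is bounded if $\exists \epsilon > 0$ such that $\Omega \subset B_0(\epsilon)$. A set $\Omega$ is convex if, $\forall x, y \in \Omega$ and $\forall \theta \in [0, 1]$, the following holds:

$$\theta x + (1 - \theta) y \in \Omega.$$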
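The convexity condition can be probed numerically. The sketch below is a randomized search for a violating triple, under the assumption that the set lies inside the square [-1, 1]^2; the helper names `in_ball`, `in_annulus`, and `find_convexity_violation` are hypothetical. Finding no violation is evidence of convexity, not a proof.

```python
import math
import random


def in_ball(p, radius=1.0):
    """Membership in the closed Euclidean ball of the given radius centred at 0."""
    return math.hypot(*p) <= radius


def in_annulus(p):
    """Membership in the annulus {p : 0.5 <= ||p|| <= 1}, which is not convex."""
    return 0.5 <= math.hypot(*p) <= 1.0


def find_convexity_violation(member, trials=20000, seed=0):
    """Search for x, y in the set and theta in [0, 1] whose combination leaves the set.

    Returns a witness (x, y, theta), or None if no violation was found.
    """
    rng = random.Random(seed)

    def sample():
        # Rejection sampling from the square [-1, 1]^2.
        while True:
            p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
            if member(p):
                return p

    for _ in range(trials):
        x, y, theta = sample(), sample(), rng.random()
        z = tuple(theta * xi + (1 - theta) * yi for xi, yi in zip(x, y))
        if not member(z):
            return x, y, theta
    return None
```

For the ball the search should come back empty, while for the annulus it should quickly find two points whose connecting segment crosses the hole.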

Two important properties for closed sets:

• The intersection of arbitrarily many closed sets is still closed;

• The union of a finite number of closed sets is still closed.

Note that the union of an infinite number of closed sets may no longer be closed. For example,

$$\bigcup_{n=1}^{\infty} [1/n, 1] = (0, 1].$$
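A small sketch of why the point 0 is missing from this infinite union even though the left endpoints 1/n come arbitrarily close to it. The function names `in_union` and `left_endpoint` are hypothetical.

```python
def in_union(x, n_max):
    """True if x lies in [1/n, 1] for some n = 1, ..., n_max."""
    return any(1.0 / n <= x <= 1.0 for n in range(1, n_max + 1))


def left_endpoint(n_max):
    """Left endpoint of the union of [1/n, 1] over n = 1, ..., n_max."""
    return 1.0 / n_max
```

Every finite union is the closed interval [1/n_max, 1], but 0 is a limit of the left endpoints and belongs to none of the intervals, so the infinite union fails to contain one of its limit points.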

3 Convex Function, Closed Function

There are several ways to define a convex function. We introduce two equivalent definitions. Let $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$. $f(x)$ is a convex function if, $\forall x, y$ and $\forall \theta \in [0, 1]$, the following inequality holds:

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$

The epigraph of the function $f(x)$ is defined as the set

$$\mathrm{epi} f = \{(x, t) \mid f(x) \le t\}.$$

"$f(x)$ is a convex function" is equivalent to saying "$\mathrm{epi} f$ is a convex set". The function $f(x)$ is closed if its epigraph $\mathrm{epi} f$ is closed. If $f$ is a continuous function, then $f$ is closed. A smooth function is continuous, and thus closed.

A few examples of convex functions:

A linear function

$$f_1(x) = a^T x,$$

which is also closed.

An indicator function

$$f_2(x) = \begin{cases} 0 & x \in \Omega \\ +\infty & \text{otherwise,} \end{cases}$$

where $\Omega$ is a convex set. This function is closed if $\Omega$ is closed.

The function

$$f_3(x) = \begin{cases} -x & x \in [-1, 1) \\ 1 & x \in \{1\} \\ +\infty & \text{otherwise.} \end{cases}$$
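The behaviour of the example f3 can be explored numerically. The sketch below (helper names `convex_on_grid` and `in_epigraph` are hypothetical) checks the convexity inequality on a finite grid inside [-1, 1] and tests epigraph membership along a sequence approaching the point (1, -1).

```python
import math


def f3(x):
    """The example f3: -x on [-1, 1), 1 at x = 1, +inf elsewhere."""
    if -1.0 <= x < 1.0:
        return -x
    if x == 1.0:
        return 1.0
    return math.inf


def convex_on_grid(f, points, thetas, tol=1e-12):
    """Check f(th*x + (1-th)*y) <= th*f(x) + (1-th)*f(y) on a finite grid.

    The grid points are assumed to lie where f is finite.
    """
    for x in points:
        for y in points:
            for th in thetas:
                lhs = f(th * x + (1 - th) * y)
                rhs = th * f(x) + (1 - th) * f(y)
                if lhs > rhs + tol:
                    return False
    return True


def in_epigraph(f, x, t):
    """True if (x, t) belongs to the epigraph of f."""
    return f(x) <= t
```

The points (x_k, -x_k) with x_k = 1 - 2^(-k) all lie in the epigraph, while their limit (1, -1) does not, since f3(1) = 1 > -1. That missing limit point is what makes the epigraph fail to be closed.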

The function $f_3$ is convex but not closed.

If the function $f(x)$ is smooth (or differentiable), then the following three statements are equivalent:

• $f(x)$ is convex;

• $\forall x, y$, we have $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$;

• $\forall x, y$, we have $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.

If the function $f(x)$ is twice differentiable, then the following two statements are equivalent:

• $f(x)$ is convex;

• $\nabla^2 f(x)$ is positive semidefinite (PSD) for any $x$.

Some important properties for preserving convexity:

• If f1(x) and f2(x) are convex, then f1(x) + f2(x) is also convex;

• If f1(x) and f2(x) are convex, then max{f1(x), f2(x)} is also convex;

• If f(x) is convex, then g(z) = f(Az + b) is also convex;

• If f(x, y) is jointly convex in (x, y), then g(x) = min_y f(x, y) is convex.
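The four rules above can be sanity-checked on simple one-dimensional examples. This is a minimal randomized sketch; the test functions and the helper `is_convex_1d` are assumptions chosen for illustration. The pointwise minimum is included as a contrast, since it is not among the listed rules.

```python
import random


def is_convex_1d(h, lo=-3.0, hi=3.0, trials=5000, seed=0, tol=1e-9):
    """Randomized test of h(th*x + (1-th)*y) <= th*h(x) + (1-th)*h(y) on [lo, hi]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, th = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        if h(th * x + (1 - th) * y) > th * h(x) + (1 - th) * h(y) + tol:
            return False
    return True


def f1(x):
    return x * x            # convex


def f2(x):
    return abs(x - 1.0)     # convex


def f_xy(x, y):
    return x * x + x * y + y * y   # jointly convex: Hessian [[2, 1], [1, 2]] is PSD


def h_sum(x):
    return f1(x) + f2(x)


def h_max(x):
    return max(f1(x), f2(x))


def h_affine(z):
    return f1(2.0 * z - 1.0)       # f(Az + b) with A = 2, b = -1


def h_partial_min(x):
    return f_xy(x, -0.5 * x)       # min over y, attained where x + 2y = 0


def h_min(x):
    return min(f1(x), f2(x))       # pointwise minimum: not covered by the rules
```

The first four constructions should pass the test, while the pointwise minimum of two convex functions generally does not.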

4 Convexity for Differentiable Functions

Theorem 1. If a function $f(x)$ is smooth (or differentiable), the following statements are equivalent, $\forall x, y \in \mathrm{dom}(f)$:

1. $f(x)$ is convex;

2. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$;

3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.

Proof. We complete the proof by first showing that (1) and (2) are equivalent, and then that (2) and (3) are equivalent.

(1)⇒(2): Choose any $x, y \in \mathrm{dom}(f)$ and consider $f$ restricted to the line segment between them, i.e., the function defined by $g(t) = f(x + t(y - x))$. Since $f$ is convex, $g$ is also convex. Notice that $g(0) = f(x)$, $g(1) = f(y)$, and $g'(t) = \langle \nabla f(x + t(y - x)), y - x \rangle$. From the definition of $g'(0)$, we have

$$g'(0) = \lim_{t \to 0^+} \frac{g(t) - g(0)}{t} \le \lim_{t \to 0^+} \frac{(1 - t) g(0) + t g(1) - g(0)}{t} = g(1) - g(0),$$

where the inequality uses the convexity of $g$. Hence we have $g(1) \ge g(0) + g'(0)$, which implies $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.

(2)⇒(1): Choose any $x, y \in \mathrm{dom}(f)$ and $0 \le t \le 1$, and let $z = t x + (1 - t) y$. Applying (2) at $z$ yields

$$f(x) \ge f(z) + \langle \nabla f(z), x - z \rangle, \qquad f(y) \ge f(z) + \langle \nabla f(z), y - z \rangle.$$

Multiplying the first inequality by $t$, the second by $1 - t$, and adding them together yields

$$t f(x) + (1 - t) f(y) \ge f(z),$$

because the inner-product terms cancel: $t(x - z) + (1 - t)(y - z) = 0$. This proves that $f(x)$ is convex.

(2)⇒(3): Exchanging the roles of $x$ and $y$ in (2) yields

$$f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle.$$

Adding this inequality to (2) gives

$$0 \ge \langle \nabla f(x) - \nabla f(y), y - x \rangle,$$

which becomes (3) once we negate the inner product.

(3)⇒(2): Use the same definition of $g$ as above. Applying the Fundamental Theorem of Calculus (FTC) yields

$$g(1) = g(0) + \int_0^1 g'(t) \, dt.$$

By our definition, that is

$$\begin{aligned} f(y) &= f(x) + \int_0^1 \langle \nabla f(x + t(y - x)), y - x \rangle \, dt \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt. \end{aligned}$$

Let $z = x + t(y - x)$. For $t \in (0, 1]$, the integrand of the last term becomes $\langle \nabla f(z) - \nabla f(x), \tfrac{1}{t}(z - x) \rangle$, which is greater than or equal to 0 because of (3), and then we have $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.
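Conditions 2 and 3 of Theorem 1 can be checked on random pairs of points. The sketch below uses the log-sum-exp function in two variables as an assumed example of a smooth convex function and sin(x1) + x2^2 as an assumed non-convex contrast; all function names are hypothetical.

```python
import math
import random


def f(x):
    """log(exp(x1) + exp(x2)), a smooth convex function on R^2."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def grad_f(x):
    """Gradient of f: the softmax of x."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]


def f_bad(x):
    """sin(x1) + x2^2, smooth but not convex."""
    return math.sin(x[0]) + x[1] * x[1]


def grad_f_bad(x):
    return [math.cos(x[0]), 2.0 * x[1]]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def check_first_order(func, grad, trials=2000, seed=0, tol=1e-9):
    """Return True if conditions 2 and 3 of Theorem 1 hold on all sampled pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(2)]
        y = [rng.uniform(-5, 5) for _ in range(2)]
        gx, gy = grad(x), grad(y)
        d = [yi - xi for xi, yi in zip(x, y)]                 # y - x
        cond2 = func(y) >= func(x) + dot(gx, d) - tol
        # <grad f(x) - grad f(y), x - y> >= 0, using x - y = -d
        cond3 = dot([a - b for a, b in zip(gx, gy)], [-di for di in d]) >= -tol
        if not (cond2 and cond3):
            return False
    return True
```

A passing check is only numerical evidence on the sampled box; a single failing pair, as expected for the non-convex contrast, is a genuine counterexample.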

5 Convexity of Twice Differentiable Functions

Theorem 2. If a function f(x) is twice differentiable, the following statements are equivalent, ∀x ∈ dom(f):

1. f(x) is convex;

2. $\nabla^2 f(x)$ is PSD.

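Proof. (1)⇒(2): Since $f$ is differentiable, Theorem 1 shows that

$$\langle \nabla f(y) - \nabla f(x), y - x \rangle \ge 0.$$

Define $s = y - x$, so that $\langle \nabla f(x + s) - \nabla f(x), s \rangle \ge 0$. Applying the same inequality to the points $x$ and $x + ts$ and dividing by $t^2 > 0$ gives $\tfrac{1}{t} \langle \nabla f(x + ts) - \nabla f(x), s \rangle \ge 0$ for every $t \in (0, 1]$. Applying the FTC to the gradient difference yields

$$h(t) := \frac{1}{t} \int_0^t \langle \nabla^2 f(x + \lambda s) s, s \rangle \, d\lambda \ge 0$$

for every $t \in (0, 1]$. As $t$ approaches 0, we have

$$\lim_{t \to 0} h(t) = \langle \nabla^2 f(x) s, s \rangle \ge 0.$$

Since $s$ is arbitrary, this means $\nabla^2 f(x)$ is PSD.

(2)⇒(1): The mean value theorem shows there exists a $\lambda \in [0, 1]$ such that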

$$\begin{aligned} f(y) &= f(x) + \langle \nabla f(x + \lambda(y - x)), y - x \rangle \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x \rangle. \end{aligned}$$

Define $g$ to be the last term above, i.e., $g = \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x \rangle$. By Theorem 1, $f$ is convex if we can prove that $g$ is greater than or equal to 0. Applying the FTC to $g$ yields

$$g = \int_0^{\lambda} \langle \nabla^2 f(x + t(y - x))(y - x), y - x \rangle \, dt.$$

Since $\nabla^2 f(x + t(y - x))$ is PSD for every $t$, the integrand is nonnegative, so $g \ge 0$. This completes the proof.
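The second-order condition can be tested by computing the smallest eigenvalue of the Hessian. The sketch below assumes the two-variable log-sum-exp function as an example and uses the closed-form eigenvalues of a symmetric 2x2 matrix; the names `hessian_logsumexp`, `min_eig_sym2`, and `is_psd` are hypothetical.

```python
import math


def hessian_logsumexp(x):
    """Hessian of log(exp(x1) + exp(x2)): diag(p) - p p^T with p = softmax(x)."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    p = [ei / s for ei in e]
    return [[p[0] - p[0] * p[0], -p[0] * p[1]],
            [-p[1] * p[0], p[1] - p[1] * p[1]]]


def min_eig_sym2(H):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)


def is_psd(H, tol=1e-12):
    """True if the symmetric 2x2 matrix H is PSD up to a small tolerance."""
    return min_eig_sym2(H) >= -tol
```

The tolerance matters here: this Hessian has an exact zero eigenvalue along the direction (1, 1), so rounding can push the computed value slightly below zero. By contrast, the Hessian of x1^2 - x2^2 is diag(2, -2), which is clearly not PSD.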

6 Convexity of Smooth Convex Functions with Lipschitzian Gradient

A function $f(x)$ has an $L$-Lipschitzian gradient if it satisfies the following condition:

$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|x - y\|^2, \quad \forall x, y.$$

Theorem 3. If a convex function $f(x)$ has an $L$-Lipschitzian gradient, the following statements are equivalent, $\forall x, y \in \mathrm{dom}(f)$:

1. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \le \frac{L}{2} \|y - x\|^2$;

2. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|^2$;

3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2$;

4. $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$.

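Proof. (1)⇒(2): Fix $x$ and define $\phi(\cdot) = f(\cdot) - \langle \nabla f(x), \cdot \rangle$, so that $\nabla \phi(y) = \nabla f(y) - \nabla f(x)$. Proving (2) is equivalent to proving

$$\phi(x) \le \phi(y) - \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|^2 = \phi(y) - \frac{1}{2L} \|\nabla \phi(y)\|^2.$$

It is easy to see that $x = \arg\min_y \phi(y)$, since $\phi$ is convex and $\nabla \phi(x) = 0$. From (1), we know that $\phi(\cdot)$ has an $L$-Lipschitzian gradient. Therefore, we have

$$\phi(x) = \min_z \phi(z) \le \min_z \left\{ \phi(y) + \langle \nabla \phi(y), z - y \rangle + \frac{L}{2} \|z - y\|^2 \right\} = \phi(y) - \frac{1}{2L} \|\nabla \phi(y)\|^2,$$

where the minimum on the right is attained at $z = y - \frac{1}{L} \nabla \phi(y)$.

(2)⇒(3): Exchange $x$ and $y$ in (2) and sum the two inequalities to obtain (3).

(3)⇒(4): Applying the Cauchy-Schwarz inequality, we have

$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\| \cdot \|x - y\|.$$

Combining this with (3) gives $\frac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2 \le \|\nabla f(x) - \nabla f(y)\| \cdot \|x - y\|$, and hence $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$.

(4)⇒(1): We have

$$\begin{aligned} f(y) &= f(x) + \int_0^1 \langle \nabla f(x + t(y - x)), y - x \rangle \, dt \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt \\ &\le f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 t L \|y - x\|^2 \, dt \quad \text{(due to the Cauchy-Schwarz inequality and (4))} \\ &= f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2, \end{aligned}$$

which proves (1).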
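The four statements of Theorem 3 can be checked numerically on a simple example. The sketch below assumes a diagonal quadratic f(x) = 0.5 * sum(d_i * x_i^2), whose gradient is Lipschitz with constant max(d_i); the diagonal `D` and all function names are hypothetical choices made for illustration.

```python
import math
import random

D = (0.5, 2.0, 4.0)   # assumed diagonal of the (constant) Hessian


def f(x):
    return 0.5 * sum(di * xi * xi for di, xi in zip(D, x))


def grad(x):
    return [di * xi for di, xi in zip(D, x)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def norm(u):
    return math.sqrt(dot(u, u))


def check_theorem3(L, trials=2000, seed=0, tol=1e-9):
    """Return True if statements 1-4 of Theorem 3 hold with constant L on all samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-2, 2) for _ in D]
        y = [rng.uniform(-2, 2) for _ in D]
        gx, gy = grad(x), grad(y)
        dg = norm(sub(gx, gy))                    # ||grad f(x) - grad f(y)||
        dx = norm(sub(x, y))                      # ||x - y||
        gap = f(y) - f(x) - dot(gx, sub(y, x))    # f(y) - f(x) - <grad f(x), y - x>
        inner = dot(sub(gx, gy), sub(x, y))
        ok = (gap <= 0.5 * L * dx ** 2 + tol      # statement 1
              and gap >= dg ** 2 / (2 * L) - tol  # statement 2
              and inner >= dg ** 2 / L - tol      # statement 3
              and dg <= L * dx + tol)             # statement 4
        if not ok:
            return False
    return True
```

With `L = max(D)` all four inequalities should hold on every sample, whereas a constant that is too small, such as `L = 1.0`, should fail.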

7 Properties of Strongly Convex Functions

A function $f(x)$ is $l$-strongly convex if it satisfies the following condition:

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{l}{2} \|x - y\|^2, \quad \forall x, y.$$

Theorem 4. If a convex function $f(x)$ is $l$-strongly convex, the following statements are equivalent, $\forall x, y \in \mathrm{dom}(f)$:

1. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{l}{2} \|y - x\|^2$;

2. $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2l} \|\nabla f(x) - \nabla f(y)\|^2$;

3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge l \|x - y\|^2$;

4. $\|\nabla f(x) - \nabla f(y)\| \ge l \|x - y\|$.

Proof. You can follow the proof for Theorem 3.
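As a numerical sanity check, the sketch below tests three inequalities on an assumed diagonal quadratic f(x) = 0.5 * sum(d_i * x_i^2), which is strongly convex with constant min(d_i). It checks the defining lower bound f(y) - f(x) - <grad f(x), y - x> >= (l/2)||y - x||^2, the strong monotonicity inequality <grad f(x) - grad f(y), x - y> >= l ||x - y||^2, and the gradient bound ||grad f(x) - grad f(y)|| >= l ||x - y||. The diagonal `D` and the function names are hypothetical.

```python
import math
import random

D = (0.5, 2.0, 4.0)   # assumed diagonal of the (constant) Hessian


def f(x):
    return 0.5 * sum(di * xi * xi for di, xi in zip(D, x))


def grad(x):
    return [di * xi for di, xi in zip(D, x)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def check_strong_convexity(l, trials=2000, seed=0, tol=1e-9):
    """Return True if the three inequalities hold with constant l on all samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-2, 2) for _ in D]
        y = [rng.uniform(-2, 2) for _ in D]
        gx, gy = grad(x), grad(y)
        dg = math.sqrt(dot(sub(gx, gy), sub(gx, gy)))   # ||grad f(x) - grad f(y)||
        dx = math.sqrt(dot(sub(x, y), sub(x, y)))       # ||x - y||
        gap = f(y) - f(x) - dot(gx, sub(y, x))
        inner = dot(sub(gx, gy), sub(x, y))
        ok = (gap >= 0.5 * l * dx ** 2 - tol            # defining lower bound
              and inner >= l * dx ** 2 - tol            # strong monotonicity
              and dg >= l * dx - tol)                   # gradient bound
        if not ok:
            return False
    return True
```

With `l = min(D)` every sample should pass, while a constant that is too large, such as `l = 3.0`, should fail.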
