CSC 576: Optimization Foundations
Ji Liu
Department of Computer Sciences, University of Rochester
April 15, 2016

1 Introduction

We are interested in the following optimization problem
\[ \min_{x} \; f(x) + g(x) \quad \text{s.t.} \quad x \in \Omega, \]
where $f(x)$ is a smooth convex function, $g(x)$ is a closed convex function, and $\Omega$ is a closed convex set. The following introduces some basic definitions in optimization.

2 Closed / Open Set, Bounded Set, Convex Set

There are several equivalent ways to define open and closed sets. Here we only introduce one of them. $\Omega$ is an open set if $\forall x \in \Omega$, $\exists \epsilon > 0$ such that $B_x(\epsilon) \subset \Omega$, where
\[ B_x(\epsilon) := \{ y \mid \|y - x\| \le \epsilon \} \]
defines a ball with center $x$ and radius $\epsilon$. $\Omega$ is a closed set if its complement $\Omega^c$ is an open set. $\emptyset$ and $\mathbb{R}^n$ are closed and open sets simultaneously.

$\Omega$ is bounded if $\exists \epsilon > 0$ such that $\Omega \subset B_0(\epsilon)$.

A set $\Omega$ is convex if $\forall x, y \in \Omega$ and $\forall \theta \in [0, 1]$, the following holds:
\[ \theta x + (1 - \theta) y \in \Omega. \]

Two important properties of closed sets:
• The intersection of arbitrarily many closed sets is still closed;
• The union of a finite number of closed sets is still closed.

Note that the union of an infinite number of closed sets may not be closed any more. For example,
\[ \bigcup_{n=1}^{\infty} [1/n, 1] = (0, 1]. \]

3 Convex Function, Closed Function

There are several ways to define a convex function. We introduce two equivalent definitions. Let $f(x): \mathbb{R}^n \mapsto \mathbb{R} \cup \{+\infty\}$. $f(x)$ is a convex function if $\forall x, y$, $\forall \theta \in [0, 1]$, the following inequality holds:
\[ f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y). \]
The epigraph of the function $f(x)$ is defined as the set
\[ \mathrm{epi} f = \{ (x, t) \mid f(x) \le t \}. \]
"$f(x)$ is a convex function" is equivalent to saying "$\mathrm{epi} f$ is a convex set".

The function $f(x)$ is closed if its epigraph $\mathrm{epi} f$ is closed. If $f$ is a continuous (finite-valued) function on $\mathbb{R}^n$, then $f$ is closed. A smooth function is continuous, and thus closed.

A few examples of convex functions:

A linear function
\[ f_1(x) = a^T x, \]
which is also closed.

An indicator function
\[ f_2(x) = \begin{cases} 0 & x \in \Omega \\ +\infty & \text{otherwise,} \end{cases} \]
where $\Omega$ is a convex set. This function is closed if $\Omega$ is closed.
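The definition of convexity and the linear and indicator examples above can be spot-checked numerically. The following is only a sketch: it works in one dimension, the interval $[-1, 1]$, the slope $a = 3$, the sampling range, and all function names are illustrative assumptions rather than part of the notes, and random sampling can expose a violation but never proves convexity.

```python
import math
import random


def indicator(x, lo=-1.0, hi=1.0):
    """Indicator of the closed convex set [lo, hi]: 0 inside, +inf outside."""
    return 0.0 if lo <= x <= hi else math.inf


def indicator_nonconvex(x):
    """Indicator of the non-convex set [-2, -1] U [1, 2] (for contrast)."""
    return 0.0 if 1.0 <= abs(x) <= 2.0 else math.inf


def linear(x, a=3.0):
    """One-dimensional linear function a * x (slope a is an assumption)."""
    return a * x


def violates_convexity(f, x, y, theta, tol=1e-9):
    """True if f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y)."""
    fx, fy = f(x), f(y)
    # If an endpoint value is +inf, the inequality holds trivially, and
    # skipping this case avoids the floating-point product 0 * inf = nan.
    if math.isinf(fx) or math.isinf(fy):
        return False
    z = theta * x + (1.0 - theta) * y
    return f(z) > theta * fx + (1.0 - theta) * fy + tol


def count_violations(f, trials=10_000, seed=0):
    """Count sampled triples (x, y, theta) that violate the inequality."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x, y = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        theta = rng.random()
        if violates_convexity(f, x, y, theta):
            count += 1
    return count
```

For the two convex examples, `count_violations(indicator)` and `count_violations(linear)` find no violation, while the single triple `violates_convexity(indicator_nonconvex, -1.5, 1.5, 0.5)` already fails the inequality because the midpoint 0 lies outside the set.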
A third example is the function
\[ f_3(x) = \begin{cases} -x & x \in [-1, 1) \\ 1 & x \in \{1\} \\ +\infty & \text{otherwise.} \end{cases} \]
This function is convex but not closed: its epigraph does not contain the limit point $(1, -1)$ of the points $(x, -x)$ as $x \to 1^-$.

If the function $f(x)$ is smooth (or differentiable), then the following three statements are equivalent:
• $f(x)$ is convex;
• $\forall x, y$, we have $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$;
• $\forall x, y$, we have $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.

If the function $f(x)$ is twice differentiable, then the following two statements are equivalent:
• $f(x)$ is convex;
• $\nabla^2 f(x)$ is positive semidefinite (PSD) for any $x$.

Some important properties for preserving convexity:
• If $f_1(x)$ and $f_2(x)$ are convex, then $f_1(x) + f_2(x)$ is also convex;
• If $f_1(x)$ and $f_2(x)$ are convex, then $\max\{f_1(x), f_2(x)\}$ is also convex;
• If $f(x)$ is convex, then $g(z) = f(Az + b)$ is also convex;
• If $f(x, y)$ is jointly convex in $(x, y)$, then $g(x) = \min_y f(x, y)$ is convex.

4 Convexity for Differentiable Functions

Theorem 1. If a function $f(x)$ is smooth (or differentiable), the following statements are equivalent, $\forall x, y \in \mathrm{dom}(f)$:
1. $f(x)$ is convex;
2. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$;
3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.

Proof. We will complete the proof by first showing that (1) and (2) are equivalent, and then (2) and (3).

(1)⇒(2): Choose any $x, y \in \mathrm{dom}(f)$ and consider $f$ restricted to the line through them, i.e., the function defined by $g(t) = f(x + t(y - x))$. Since $f$ is convex, $g$ is also convex. Notice that $g(0) = f(x)$, $g(1) = f(y)$, and $g'(t) = \langle \nabla f(x + t(y - x)), y - x \rangle$. From the definition of $g'(0)$, we have
\[ g'(0) = \lim_{t \to 0^+} \frac{g(t) - g(0)}{t} \le \lim_{t \to 0^+} \frac{(1 - t) g(0) + t g(1) - g(0)}{t} = g(1) - g(0), \]
where the inequality uses the convexity of $g$. Hence we have $g(1) \ge g(0) + g'(0)$, which implies $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.

(2)⇒(1): Choose any $x, y \in \mathrm{dom}(f)$ and $0 \le t \le 1$, and let $z = tx + (1 - t)y$. Applying (2) at the point $z$ yields
\[ f(x) \ge f(z) + \langle \nabla f(z), x - z \rangle, \qquad f(y) \ge f(z) + \langle \nabla f(z), y - z \rangle. \]
Multiplying the first inequality by $t$, the second by $1 - t$, and adding them together yields
\[ t f(x) + (1 - t) f(y) \ge f(z) + \langle \nabla f(z), t x + (1 - t) y - z \rangle = f(z), \]
which proves that $f(x)$ is convex.
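The equivalence of (1) and (2) in Theorem 1 says that a differentiable convex function lies above each of its tangent lines. A minimal numerical sketch of that statement follows. The test function (softplus, $\log(1 + e^x)$), the sampling range, and the helper names are assumptions chosen for illustration and do not come from the notes.

```python
import math
import random


def softplus(x):
    """log(1 + e^x), a smooth convex function on the real line."""
    return math.log1p(math.exp(x))


def softplus_grad(x):
    """Derivative of softplus: the logistic sigmoid 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def first_order_gap(func, grad, x, y):
    """f(y) - [f(x) + f'(x) * (y - x)]; nonnegative when f is convex."""
    return func(y) - (func(x) + grad(x) * (y - x))


def min_gap(func, grad, trials=10_000, seed=0, lo=-5.0, hi=5.0):
    """Smallest first-order gap over randomly sampled pairs (x, y)."""
    rng = random.Random(seed)
    return min(
        first_order_gap(func, grad, rng.uniform(lo, hi), rng.uniform(lo, hi))
        for _ in range(trials)
    )
```

For softplus, `min_gap(softplus, softplus_grad)` stays nonnegative up to rounding error. For the non-convex function $\sin$, the single pair `first_order_gap(math.sin, math.cos, 0.0, math.pi)` is negative (about $-\pi$), so condition (2) fails there.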
(2)⇒(3): Exchanging the roles of $x$ and $y$ in (2) yields
\[ f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle. \]
Adding this to (2) gives
\[ 0 \ge \langle \nabla f(x) - \nabla f(y), y - x \rangle, \]
which is (3) once we negate the inner product.

(3)⇒(2): Use the same definition of $g$ as above. Applying the Fundamental Theorem of Calculus (FTC) yields
\[ g(1) = g(0) + \int_0^1 g'(t) \, dt. \]
By our definition, that is
\[ f(y) = f(x) + \int_0^1 \langle \nabla f(x + t(y - x)), y - x \rangle \, dt = f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt. \]
Let $z = x + t(y - x)$. The integrand of the last term becomes $\langle \nabla f(z) - \nabla f(x), \frac{1}{t}(z - x) \rangle$, which is greater than or equal to 0 for $t \in (0, 1]$ because of (3), and then we have $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.

5 Convexity of Twice Differentiable Functions

Theorem 2. If a function $f(x)$ is twice differentiable, the following statements are equivalent, $\forall x \in \mathrm{dom}(f)$:
1. $f(x)$ is convex;
2. $\nabla^2 f(x)$ is PSD.

Proof. (1)⇒(2): Since $f$ is differentiable and convex, Theorem 1 shows that
\[ \langle \nabla f(y) - \nabla f(x), y - x \rangle \ge 0. \]
Define $s = y - x$ and we have
\[ \langle \nabla f(x + s) - \nabla f(x), s \rangle \ge 0. \]
Applying the same inequality to the pair $x$, $x + ts$, dividing by $t^2$, and using the FTC yields
\[ h(t) := \frac{1}{t} \langle \nabla f(x + ts) - \nabla f(x), s \rangle = \frac{1}{t} \int_0^t \langle \nabla^2 f(x + \lambda s) s, s \rangle \, d\lambda \ge 0 \]
for every $t \in (0, 1]$. As $t$ approaches 0, we have
\[ \lim_{t \to 0} h(t) = \langle \nabla^2 f(x) s, s \rangle \ge 0. \]
Since $s$ is arbitrary, $\nabla^2 f(x)$ is PSD.

(2)⇒(1): The mean value theorem shows there exists a $\lambda \in [0, 1]$ such that
\[ f(y) = f(x) + \langle \nabla f(x + \lambda(y - x)), y - x \rangle = f(x) + \langle \nabla f(x), y - x \rangle + \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x \rangle. \]
Define $g$ to be the last term above, i.e., $g = \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x \rangle$. By Theorem 1, $f$ is convex if we can prove that $g$ is greater than or equal to 0. Applying the FTC to $g$ yields
\[ g = \Big\langle \int_0^{\lambda} \nabla^2 f(x + t(y - x))(y - x) \, dt, \; y - x \Big\rangle = \int_0^{\lambda} \langle \nabla^2 f(x + t(y - x))(y - x), y - x \rangle \, dt. \]
Since $\nabla^2 f(x + t(y - x))$ is PSD for every $t$, the integrand is nonnegative, so $g \ge 0$. This completes our proof.

6 Convexity of Smooth Convex Functions with Lipschitzian Gradient

A function $f(x)$ has "$L$"-Lipschitzian gradient if it satisfies the following condition:
\[ f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|x - y\|^2 \quad \forall x, y. \]

Theorem 3.
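For a convex differentiable function $f(x)$, the following statements, each of which characterizes an "$L$"-Lipschitzian gradient, are equivalent, $\forall x, y \in \mathrm{dom}(f)$:
1. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \le \frac{L}{2} \|y - x\|^2$;
2. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|^2$;
3. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2$;
4. $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$.

Proof. (1)⇒(2): Fix $x$ and define $\phi(\cdot) = f(\cdot) - \langle \nabla f(x), \cdot \rangle$, so that $\nabla \phi(y) = \nabla f(y) - \nabla f(x)$. Proving (2) is equivalent to proving
\[ \phi(x) \le \phi(y) - \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|^2 = \phi(y) - \frac{1}{2L} \|\nabla \phi(y)\|^2. \]
It is easy to see that $x = \arg\min_y \phi(y)$, because $\phi$ is convex and $\nabla \phi(x) = 0$. Since $\phi$ differs from $f$ only by a linear term, (1) implies that $\phi(\cdot)$ also has $L$-Lipschitzian gradient. Therefore, we have
\[ \phi(x) = \min_z \phi(z) \le \min_z \Big\{ \phi(y) + \langle \nabla \phi(y), z - y \rangle + \frac{L}{2} \|z - y\|^2 \Big\} = \phi(y) - \frac{1}{2L} \|\nabla \phi(y)\|^2, \]
where the minimum on the right is attained at $z = y - \frac{1}{L} \nabla \phi(y)$.

(2)⇒(3): Write (2) once as stated and once with $x$ and $y$ exchanged, then sum the two inequalities to obtain (3).

(3)⇒(4): Applying the Cauchy-Schwarz inequality, we have
\[ \langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\| \cdot \|x - y\|. \]
Combining this with (3) gives $\frac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2 \le \|\nabla f(x) - \nabla f(y)\| \cdot \|x - y\|$, and hence $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$.

(4)⇒(1): We have
\[ f(y) = f(x) + \int_0^1 \langle \nabla f(x + t(y - x)), y - x \rangle \, dt \]
\[ = f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt \]
\[ \le f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 t L \|y - x\|^2 \, dt \quad \text{(due to (4) and the Cauchy-Schwarz inequality)} \]
\[ = f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} L \|y - x\|^2. \]
This proves (1).

7 Properties of Strongly Convex Functions

A function $f(x)$ is $l$-strongly convex if it satisfies the following condition:
\[ f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{l}{2} \|x - y\|^2 \quad \forall x, y. \]

Theorem 4. If a convex function $f(x)$ is "$l$"-strongly convex, the following statements are equivalent, $\forall x, y \in \mathrm{dom}(f)$:
1. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{l}{2} \|y - x\|^2$;
2.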