Convex Optimization Scai Lab Study Group
Total Page:16
File Type:pdf, Size:1020Kb
Convex Optimization ScAi Lab Study Group Zhiping (Patricia) Xiao University of California, Los Angeles Winter 2020 Outline 2 Introduction: Convex Optimization Convexity Convex Functions' Properties Definition of Convex Optimization Convex Optimization General Strategy Learning Algorithms Convergence Analysis Examples Source of Materials 3 Textbook: I Convex Optimization and Intro to Linear Algebra by Prof. Boyd and Prof. Vandenberghe Course Materials: I ECE236B, ECE236C offered by Prof. Vandenberghe I CS260 Lecture 12 offered by Prof. Quanquan Gu Notes: I My previous ECE236B notes and ECE236C final report. I My previous CS260 Cheat Sheet. Related Papers: I Accelerated methods for nonconvex optimization I Lipschitz regularity of deep neural networks: analysis and efficient estimation Introduction: Convex Optimization q Notations 5 I iff: if and only if I R+ = fx 2 R j x ≥ 0g I R++ = fx 2 R j x > 0g I int K: interior of set K, not its boundary. I Generalized inequalities (textbook 2.4), based on a proper cone K (convex, closed, solid, pointed | if x 2 K and −x 2 K then x = 0): I x K y () y − x 2 K I x ≺K y () y − x 2 int K n n T I Positive semidefinite matrix X 2 S+, 8y 2 R , y Xy ≥ 0 () X 0. Convexity of Set 6 Set C is convex iff the line segment between any two points in C lies in C, i.e. 8x1; x2 2 C and 8θ 2 [0; 1], we have: θx1 + (1 − θ)x2 2 C Both convex and nonconvex sets have convex hull, which is defined as: k k X X conv C = f θixi j xi 2 C; θi ≥ 0; i = 1; 2; : : : ; k; θi = 1g i=1 i=1 Convexity of Set (Examples from Textbook) 7 Figure: Left: convex, middle & right: nonconvex. Figure: Left: convex hull of the points, right: convex hull of the kidney-shaped set above. Operations that Preserve Convexity of SetsI 8 The most common operations that preserve convexity of convex sets include: I Intersection I Image / inverse image under affine function I Cartesian Product, Minkowski sum, Projection I Perspective function I Linear-fractional functions Operations that Preserve Convexity of SetsII 9 Convexity is preserved under intersection: I S1;S2 are convex sets then S1 \ S2 is also convex set. I If Sα is convex for 8α 2 A, then \α2ASα is convex. Proof: Intersection of a collection of convex sets is convex set. If the intersection is empty, or consists of only a single point, then proved by definition. Otherwise, for any two points A, B in the intersection, line AB must lie wholly within each set in the collection, hence must lie wholly within their intersection. Operations that Preserve Convexity of Sets III 10 n m An affine function f : R ! R is a sum of a linear function and a constant, i.e., if it has the form f(x) = Ax + b, where m×n m A 2 R , b 2 R , thus f represents a hyperplane. n Suppose that S ⊆ R is convex and then the image of S under f is convex: f(S) = ff(x) j x 2 Sg m n Also, if f : R ! R is an affine function, the inverse image of S under f is convex: f −1(S) = fx j f(x) 2 Sg Examples include scaling αS = ff(x) j αx; x 2 Sg (α 2 R) and n translation S + a = ff(x) j x + a; x 2 Sg (a 2 R ); they are both convex sets when S is convex. Operations that Preserve Convexity of SetsIV 11 Proof: the image of convex set S under affine function f(x) = Ax + b is also convex. If S is empty or contains only one point, then f(S) is obviously convex. Otherwise, take xS; yS 2 f(S). xS = f(x) = Ax + b, xS = f(y) = Ay + b. Then 8θ 2 [0; 1], we have: θxS + (1 − θ)yS = A(θx + (1 − θ)y) + b = f(θx + (1 − θ)y) Since x; y 2 S, and S is convex set, then θx + (1 − θ)y 2 S, and thus f(θx + (1 − θ)y) 2 f(S). Operations that Preserve Convexity of SetsV 12 n m The Cartesian Product of convex sets S1 ⊆ R , S2 ⊆ R is obviously convex: S1 × S2 = f(x1; x2) j x1 2 S1; x2 2 S2g The Minkowski sum of the two sets is defined as: S1 + S2 = fx1 + x2 j x1 2 S1; x2 2 S2g and it is also obviously convex. The projection of a convex set onto some of its coordinates is also obviously convex. (consider the definition of convexity reflected on each coordinate) m n T = fx1 2 R j (x1; x2) 2 S for some x2 2 R g Operations that Preserve Convexity of SetsVI 13 n+1 n We define the perspective function P : R ! R , with domain n dom P = R × R++, as P (z; t) = z=t. The perspective function scales or normalizes vectors so the last component is one, and then drops the last component. We can interpret the perspective function as the action of a pin-hole camera. (x1; x2; x3) through a hold at (0; 0; 0) on plane x3 = 0 forms an image at −(x1=x3; x2=x3; 1) at x3 = −1. The last component could be dropped, since the image point is fixed. Proof: That this operation preserves convexity is already proved by affine function + projection preserve convexity. Operations that Preserve Convexity of Sets VII 14 n m A linear-fractional function f : R ! R is formed by composing the perspective function with an affine function. n m+1 Consider the following affine function g : R ! R : A b g(x) = x + cT d m×n m n where A 2 R , b 2 R , c 2 R , d 2 R. m+1 m Followed by a perspective function P : R ! R we have: f(x) = (Ax + b)=(cT x + d); dom f = fx j cT x + d > 0g And it naturally preserves convexity because both affine function and perspective function preserve convexity. Convexity of Function 15 Convex Functions Strict Convex Functions Strong Convex Functions Figure: The three commonly-seen types of convex functions and their relations. In brief, strong convex functions ) strict convex functions ) convex functions. Convexity of Function 16 n f : R ! R is convex iff it satisfies: I dom f is a convex set. I 8x; y 2 dom f, θ 2 [0; 1], we have the Jensen's inequality: f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) f is strictly convex iff when x 6= y and θ 2 (0; 1), strict inequality of the above inequation holds. f is concave when −f is convex, strictly concave when −f strictly convex, and vice versa. f is strong convex iff 9α > 0 such that f(x) − αkxk2 is convex. k·k is any norm. Convexity of Function 17 Proof: strong convex functions ) strict convex functions ) convex functions. That all strict convex functions are convex functions, and that convex functions are not necessarily strict convex. Strong convexity implies, 8x; y 2 dom f; θ 2 [0; 1], x 6= y, 9α > 0: f(θx + (1 − θ)y) − αkθx + (1 − θ)yk2 (1.1) ≤ θf(x) + (1 − θ)f(y) − θαkxk2 − (1 − θ)αkyk2 Something we didn't prove yet but is true: k·k2 is strictly convex. We need it for this proof. kθx + (1 − θ)yk2 < θkxk2 + (1 − θ)kyk2 Convexity of Function 18 (proof continues) αkθx + (1 − θ)yk2 < θαkxk2 + (1 − θ)αkyk2 t = −αkθx + (1 − θ)yk2 + θαkxk2 + (1 − θ)αkyk2 > 0 (1.1) is equivalent with: f(θx + (1 − θ)y) + t ≤ θf(x) + (1 − θ)f(y) where t > 0, thus: f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y) Convexity of Function 19 Figure: Convex function illustration from Prof. Gu's Slides. This figure shows a typical convex function f, and instead of our expression of x and y he used u & v instead. Convexity of Function 20 Commonly-seen uni-variate convex functions include: I Constant: C I Exponential function: eax I Power function: xa (a 2 (−∞; 0] [ [1; 1), otherwise it is concave) I Powers of absolute value: jxjp (p ≥ 1) I Logarithm: − log(x)(x 2 R++) I x log(x)(x 2 R++) I All norm functions kxk I \The inequality follows from the triangle inequality, and the equality follows from homogeneity of a norm." Affine functions are Convex & Concave 21 n m An affine function f : R ! R , f(x) = Ax + b, where m×n m A 2 R , b 2 R , is convex & concave (neither strict convex nor strict concave). Conversely, all functions that are both convex and concave are affine functions. Proof: 8θ 2 [0; 1]; x; y 2 dom f, we have: f(θx + (1 − θ)y) = A(θx + (1 − θ)y) + b = θ(Ax + b) + (1 − θ)(Ay + b) = θf(x) + (1 − θ)f(y) th 0 -Order Characterization 22 f is convex iff it is convex when restricted to any line that intersects its domain. n In other words, f is convex iff 8x 2 dom f and 8v 2 R , the function: g(t) = f(x + tv) is convex. dom g = ft j x + tv 2 dom fg This property allows us to check convexity of a function by restricting it to a line. st 1 -Order Characterization 23 Suppose f is differentiable (its gradient rf exists at each point in dom f, which is open). Then f is convex iff: I dom f is a convex set I 8x; y 2 dom f: f(y) ≥ f(x) + rf(x)T (y − x) It states that, for a convex function, the first-order Taylor approximation (f(x) + rf(x)T (y − x) is the first-order Taylor approximation of f near x) is in fact a global underestimator of the function.