Convex Optimization (EE227BT: UC Berkeley)
Total Page:16
File Type:pdf, Size:1020Kb
Convex Optimization (EE227BT: UC Berkeley) Lecture 1 (Convex sets and functions) August 28, 2014 ◦ Laurent El Ghaoui (slides by Suvrit Sra) Course organization I Course material and interaction: use bCourse I Relevant texts / references: Convex optimization { Boyd & Vandenberghe (BV) ~ Introductory lectures on convex optimisation { Nesterov ~ Nonlinear programming { Bertsekas ~ Convex Analysis { Rockafellar ~ Numerical optimization { Nocedal & Wright ~ Lectures on modern convex optimization { Nemirovski ~ Optimization for Machine Learning { Sra, Nowozin, Wright ~ Optimization Models { Calafiore, El Ghaoui (to appear in ~ November) I Instructor: Laurent El Ghaoui ([email protected]) I HW + Quizzes (40%); Midterm (30%); Project (30%) I TA: Vu Pham ([email protected]) I Office hours: Thu 9-10am (El Ghaoui), TBA (Vu Pham) I If you email me, please put EE227BT in Subject: Linear algebra recap Eigenvalues and Eigenvectors n×n n Def. If A C and x C . Consider the equation 2 2 Ax = λx; x = 0; λ C: 6 2 If scalar λ and vector x satisfy this equation, then λ is called an eigenvalue and x and eigenvector of A. Above equation may be rewritten equivalently as (λI A)x = 0; x = 0: − 6 Thus, λ is an eigenvalue, if and only if det(λI A) = 0: − Def. pA(t) := det(tI A) is called characteristic polynomial. − Eigenvalues are roots of characteristic polynomial. Eigenvalues and Eigenvectors n×n Theorem Let λ1; : : : ; λn be eigenvalues of A C . Then, 2 X X Y Tr(A) = aii = λi ; det(A) = λi : i i i n×n ∗ ∗ Def. Matrix U C unitary if U U = I ([U ]ij = [¯uji ]) 2 n×n Theorem (Schur factorization). If A C with eigenvalues 2 n×n ∗ λ1; : : : ; λn, then there is a unitary matrix U C (i.e., U U = I ), such that 2 ∗ U AU = T = [tij ] is upper triangular with diagonal entries tii = λi . Corollary. If A∗A = AA∗, then there exists a unitary U such that A = UΛU∗. We will call this the Eigenvector Decomposition. Proof.A = VTV ∗, A∗ = VT ∗V ∗, so AA∗ = TT ∗ = T ∗T = A∗A. But T is upper triangular, so only way for TT ∗ = T ∗T , some easy but tedious induction shows that T must be diagonal. Hence, T = Λ. Singular value decomposition m×n Theorem (SVD) Let A C . There are unitaries s.t. U and V 2 ∗ U AV = Diag(σ1; : : : ; σp); p = min(m; n); where σ1 σ2 σp 0. Usually written as ≥ ≥ · · · ≥ A = UΣV ∗: left singular vectors U are eigenvectors of AA∗ right singular vectors V are eigenvectors of A∗A p ∗ p ∗ nonzero singular values σi = λi (AA ) = λi (A A) Positive definite matrices n×n Def. Let A R be symmetric, i.e., aij = aji . Then, A is called positive definite2 if T X x Ax = xi aij xj > 0; x = 0: ij 8 6 If > replaced by , we call A positive semidefinite. ≥ Theorem A symmetric real matrix is positive semidefinite (positive definite) iff all its eigenvalues are nonnegative (positive). Theorem Every semidefinite matrix can be written as BT B Exercise: Prove this claim. Also prove converse. Notation: A 0 (posdef) or A 0 (semidef) Amongst most important objects in convex optimization! Matrix and vector calculus f (x) f (x) T P r x a = i xi ai a T P T x Ax = ij xi aij xj (A + A )x log det(X ) X −1 P T Tr(XA) = ij xij aji A T P Tr(X A) = ij xij aij A Tr(X T AX ) (A + AT )X Easily derived using \brute-force" rules Wikipedia | Suvrit's ancient notes | Matrix cookbook | Convex Sets Convex sets Convex sets n Def. A set C R is called convex, if for any x; y C, the line-segment θx⊂+ (1 θ)y (here θ 0) also lies in C. 2 − ≥ Combinations Convex: θ1x + θ2y C, where θ1; θ2 0 and θ1 + θ2 = 1. I 2 ≥ I Linear: if restrictions on θ1; θ2 are dropped I Conic: if restriction θ1 + θ2 = 1 is dropped Convex sets Theorem (Intersection). Let C1, C2 be convex sets. Then, C1 C2 is also convex. \ Proof. If C1 C2 = , then true vacuously. \ ; Let x; y C1 C2. Then, x; y C1 and x; y C2. 2 \ 2 2 But C1, C2 are convex, hence θx + (1 θ)y C1, and also in C2. − 2 Thus, θx + (1 θ)y C1 C2. − 2 m\ Inductively follows that Ci is also convex. \i=1 Convex sets { more examples (psdcone image from convexoptimization.com, Dattorro) n Let x1; x2;:::; xm R . Their convex hull is ~ 2 nX X o co(x1;:::; xm) := θi xi θi 0; θi = 1 : i j ≥ i m×n m Let A R , and b R . The set x Ax = b is convex (it ~ is an affine2 space over2 subspace of solutionsf j of Axg= 0). halfspace x aT x b . ~ polyhedron xj Ax ≤ b; Cx = d . ~ f j ≤T g ellipsoid x (x x0) A(x x0) 1 ,(A: semidefinite) ~ j − − P ≤ probability simplex x x 0; xi = 1 ~ f j ≥ i g ◦ Quiz: Prove that these sets are convex. Convex functions Convex functions Def. Function f : I R on interval I called midpoint convex if ! f x+y f (x)+f (y) ; whenever x; y I : 2 ≤ 2 2 Read: f of AM is less than or equal to AM of f . n Def. A function f : R R is called convex if its domain dom(f ) ! is a convex set and for any x; y dom(f ) and θ 0 2 ≥ f (θx + (1 θ)y) θf (x) + (1 θ)f (y): − ≤ − Theorem (J.L.W.V. Jensen). Let f : I R be continuous. Then, f is convex if and only if it is midpoint convex.! n I Theorem extends to functions f : R R. Very useful to checking convexity of a given function.X ⊆ ! Convex functions ) f(y) λ)f(y λf(x) + (1 − f(x) x y f (λx + (1 λ)y) λf (x) + (1 λ)f (y) − ≤ − Convex functions f(x) y i ), x − f(y ) + h∇ f(y f(y) y x f (x) f (y) + f (y); x y ≥ hr − i Convex functions P R Q x z = λx + (1 λ)y y − slope PQ slope PR slope QR ≤ ≤ Recognizing convex functions If f is continuous and midpoint convex, then it is convex. ♠ If f is differentiable, then f is convex if and only if dom f is ♠ convex and f (x) f (y) + f (y); x y for all x; y dom f . ≥ hr − i 2 If f is twice differentiable, then f is convex if and only if dom f ♠ is convex and 2f (x) 0 at every x dom f . r 2 Convex functions I Linear: f (θ1x + θ2y) = θ1f (x) + θ2f (y); θ1; θ2 unrestricted I Concave: f (θx + (1 θ)y) θf (x) + (1 θ)f (y) − ≥ − I Strictly convex: If inequality is strict for x = y 6 Convex functions Example The pointwise maximum of a family of convex functions is convex. That is, if f (x; y) is a convex function of x for every y in some \index set" , then Y f (x) := max f (x; y) y2Y is a convex function of x (set is arbitrary). Y n m×n m Example Let f : R R be convex. Let A R , and b R . Prove that g(x) = f!(Ax + b) is convex. 2 2 Exercise: Verify truth of above examples. Convex functions Theorem Let be a nonempty convex set. Suppose L(x; y) is convex in (x; y), then,Y f (x) := inf L(x; y) y2Y is a convex function of x, provided f (x) > . −∞ Proof. Let u; v dom f . Since f (u) = infy L(u; y), for each > 0, there 2 is a y1 , s.t. f (u) + 2 is not the infimum. Thus, L(u; y1) f (u) + 2 . 2 Y ≤ Similarly, there is y2 , such that L(v; y2) f (v) + 2 . Now we prove that f2(λ Yu + (1 λ)v) λf (u≤) + (1 λ)f (v) directly. − ≤ − f (λu + (1 λ)v) = inf L(λu + (1 λ)v; y) − y2Y − L(λu + (1 λ)v; λy1 + (1 λ)y2) ≤ − − λL(u; y1) + (1 λ)L(v; y2) ≤ − λf (u) + (1 λ)f (v) + . ≤ − Since > 0 is arbitrary, claim follows. Example: Schur complement Let A; B; C be matrices such that C 0, and let AB Z := 0; BT C then the Schur complement A BC −1BT 0. Proof.L (x; y) = [x; y]T Z[x; y] is− convex in (x; y) since Z 0 T −1 T Observe that f (x) = infy L(x; y) = x (A BC B )x is convex. − (We skipped ahead and solved y L(x; y) = 0 to minimize). r Recognizing convex functions If f is continuous and midpoint convex, then it is convex. ♠ If f is differentiable, then f is convex if and only if dom f is ♠ convex and f (x) f (y) + f (y); x y for all x; y dom f . ≥ hr − i 2 If f is twice differentiable, then f is convex if and only if dom f ♠ is convex and 2f (x) 0 at every x dom f . r 2 By showing f to be a pointwise max of convex functions ♠ By showing f : dom(f ) R is convex if and only if its ♠ restriction to any line that! intersects dom(f ) is convex. That is, for any x dom(f ) and any v, the function g(t) = f (x + tv) is convex (on2 its domain t x + tv dom(f ) ). f j 2 g See exercises (Ch.