Course notes: Convex Analysis and Optimization
Dmitriy Drusvyatskiy
May 27, 2019

Contents
1 Review of Fundamentals ...... 1
  1.1 Inner products and linear maps ...... 1
  1.2 Norms ...... 2
  1.3 Eigenvalue and singular value decompositions of matrices ...... 4
  1.4 Point-set topology and differentiability ...... 4
  1.5 Fundamental theorems of calculus & accuracy in approximation ...... 8

2 Smooth minimization ...... 13
  2.1 Optimality conditions: Smooth Unconstrained ...... 13
  2.2 Convexity, a first look ...... 15
  2.3 Rates of convergence ...... 19
  2.4 Two basic methods ...... 20
    2.4.1 Majorization view of gradient descent ...... 21
    2.4.2 Newton's method ...... 24
  2.5 Computational complexity for smooth convex minimization ...... 26
  2.6 Conjugate Gradient Method ...... 29
  2.7 Optimal methods for smooth convex minimization ...... 34
    2.7.1 Fast gradient methods ...... 34
    2.7.2 Fast gradient methods through estimate sequences ...... 40
    2.7.3 Optimal quadratic averaging ...... 46

3 Convex geometry and analysis ...... 55
  3.1 Basic convex geometry ...... 55
    3.1.1 Separation theorems ...... 58
    3.1.2 Cones and polarity ...... 60
    3.1.3 Tangents and normals ...... 61
  3.2 Convex functions: basic operations and continuity ...... 63
  3.3 The Fenchel conjugate ...... 67
  3.4 Differential properties ...... 70
  3.5 Directional derivative ...... 72
  3.6 The value function ...... 74
  3.7 Duality and subdifferential calculus ...... 75
    3.7.1 Fenchel-Rockafellar duality ...... 75
    3.7.2 Lagrangian Duality ...... 78
  3.8 Moreau-Yosida envelope and the proximal map ...... 82
  3.9 Orthogonally invariant functions ...... 84

Chapter 1
Review of Fundamentals
1.1 Inner products and linear maps
Throughout, we fix a Euclidean space $\mathbf{E}$, meaning that $\mathbf{E}$ is a finite-dimensional real vector space endowed with an inner product $\langle \cdot, \cdot\rangle$. Recall that an inner product on $\mathbf{E}$ is an assignment $\langle \cdot, \cdot\rangle \colon \mathbf{E} \times \mathbf{E} \to \mathbb{R}$ satisfying the following three properties for all $x, y, z \in \mathbf{E}$ and scalars $a, b \in \mathbb{R}$:
(Symmetry) $\langle x, y\rangle = \langle y, x\rangle$

(Bilinearity) $\langle ax + by, z\rangle = a\langle x, z\rangle + b\langle y, z\rangle$

(Positive definiteness) $\langle x, x\rangle \geq 0$, and equality $\langle x, x\rangle = 0$ holds if and only if $x = 0$.
The most familiar example is the Euclidean space of $n$-dimensional column vectors $\mathbb{R}^n$, which, unless otherwise stated, we always equip with the dot product $\langle x, y\rangle := \sum_{i=1}^n x_i y_i$. One can equivalently write $\langle x, y\rangle = x^T y$. A basic result of linear algebra shows that all Euclidean spaces $\mathbf{E}$ can be identified with $\mathbb{R}^n$ for some integer $n$, once an orthonormal basis is chosen. Though such a basis-specific interpretation can be useful, it is often distracting, with the indices hiding the underlying geometry. Consequently, it is often best to think coordinate-free. The space of real $m \times n$ matrices $\mathbb{R}^{m\times n}$ furnishes another example of a Euclidean space, which we always equip with the trace product $\langle X, Y\rangle := \operatorname{tr} X^T Y$. Some arithmetic shows the equality $\langle X, Y\rangle = \sum_{i,j} X_{ij} Y_{ij}$. Thus the trace product on $\mathbb{R}^{m\times n}$ is nothing but the usual dot product on the matrices stretched out into long vectors. This viewpoint, however, is typically not very fruitful, and it is best to think of the trace product as a standalone object. An important Euclidean subspace of $\mathbb{R}^{n\times n}$ is the space of real symmetric $n \times n$ matrices $\mathbf{S}^n$, along with the trace product $\langle X, Y\rangle := \operatorname{tr} XY$.
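For readers who like to experiment, the observation that the trace product is the dot product of the "stretched out" matrices is easy to verify numerically. A minimal sketch in plain Python (the helper names `trace_product` and `dot` are ours, not part of the text):

```python
# Numerical illustration: the trace product <X, Y> = tr(X^T Y) equals the
# dot product of the matrices stretched out into long vectors.

def trace_product(X, Y):
    """<X, Y> := tr(X^T Y), computed as the sum of entrywise products."""
    m, n = len(X), len(X[0])
    return sum(X[i][j] * Y[i][j] for i in range(m) for j in range(n))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
Y = [[0.5, -1.0, 2.0], [1.0, 0.0, -3.0]]

stretched = dot(sum(X, []), sum(Y, []))   # flatten row-by-row, then dot
print(trace_product(X, Y), stretched)     # both give -9.5
```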
For any linear mapping $\mathcal{A}\colon \mathbf{E} \to \mathbf{Y}$, there exists a unique linear mapping $\mathcal{A}^*\colon \mathbf{Y} \to \mathbf{E}$, called the adjoint, satisfying
$$\langle \mathcal{A}x, y\rangle = \langle x, \mathcal{A}^* y\rangle \quad \text{for all points } x \in \mathbf{E},\ y \in \mathbf{Y}.$$
In the most familiar case of $\mathbf{E} = \mathbb{R}^n$ and $\mathbf{Y} = \mathbb{R}^m$, the matrix representing $\mathcal{A}^*$ is simply the transpose of the matrix representing $\mathcal{A}$.
Exercise 1.1. Given a collection of real $m \times n$ matrices $A_1, A_2, \ldots, A_l$, define the linear mapping $\mathcal{A}\colon \mathbb{R}^{m\times n} \to \mathbb{R}^l$ by setting
$$\mathcal{A}(X) := (\langle A_1, X\rangle, \langle A_2, X\rangle, \ldots, \langle A_l, X\rangle).$$
Show that the adjoint is the mapping $\mathcal{A}^* y = y_1 A_1 + y_2 A_2 + \ldots + y_l A_l$.

Linear mappings $\mathcal{A}$ between $\mathbf{E}$ and itself are called linear operators, and are said to be self-adjoint if the equality $\mathcal{A} = \mathcal{A}^*$ holds. Self-adjoint operators on $\mathbb{R}^n$ are precisely those operators that are representable as symmetric matrices. A self-adjoint operator $\mathcal{A}$ is positive semidefinite, denoted $\mathcal{A} \succeq 0$, whenever $\langle \mathcal{A}x, x\rangle \geq 0$ for all $x \in \mathbf{E}$. Similarly, a self-adjoint operator $\mathcal{A}$ is positive definite, denoted $\mathcal{A} \succ 0$, whenever $\langle \mathcal{A}x, x\rangle > 0$ for all $0 \neq x \in \mathbf{E}$. A positive semidefinite linear operator $\mathcal{A}$ is positive definite if and only if $\mathcal{A}$ is invertible.

Consider a self-adjoint operator $\mathcal{A}$. A number $\lambda$ is an eigenvalue of $\mathcal{A}$ if there exists a vector $0 \neq v \in \mathbf{E}$ satisfying $\mathcal{A}v = \lambda v$. Any such vector $v$ is called an eigenvector corresponding to $\lambda$. The Rayleigh-Ritz theorem shows that the following relation always holds:
$$\lambda_{\min}(\mathcal{A}) \leq \frac{\langle \mathcal{A}u, u\rangle}{\langle u, u\rangle} \leq \lambda_{\max}(\mathcal{A}) \quad \text{for all } u \in \mathbf{E} \setminus \{0\},$$
where $\lambda_{\min}(\mathcal{A})$ and $\lambda_{\max}(\mathcal{A})$ are the minimal and maximal eigenvalues of $\mathcal{A}$, respectively. Consequently, an operator $\mathcal{A}$ is positive semidefinite if and only if $\lambda_{\min}(\mathcal{A}) \geq 0$, and positive definite if and only if $\lambda_{\min}(\mathcal{A}) > 0$.
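Both the adjoint identity and the Rayleigh-Ritz bounds are easy to sanity-check numerically. A sketch in plain Python (the matrices and sample points are our own illustrative choices; for the symmetric matrix $S$ below, the eigenvalues are $1$ and $3$ in closed form):

```python
# Sanity checks: the adjoint identity <Ax, y> = <x, A^T y>, and the
# Rayleigh-Ritz bounds for a symmetric 2x2 matrix with known eigenvalues.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1.0, 2.0], [3.0, 4.0]]
x, y = [0.5, -1.0], [2.0, 1.0]
# Adjoint identity: the matrix representing A* is the transpose of A.
assert abs(dot(matvec(A, x), y) - dot(x, matvec(transpose(A), y))) < 1e-12

# Rayleigh-Ritz for S = [[2, 1], [1, 2]]: eigenvalues are 1 and 3.
S = [[2.0, 1.0], [1.0, 2.0]]
for u in ([1.0, 0.0], [1.0, 1.0], [-2.0, 5.0]):
    rq = dot(matvec(S, u), u) / dot(u, u)
    assert 1.0 <= rq <= 3.0   # lambda_min <= Rayleigh quotient <= lambda_max
print("checks passed")
```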
1.2 Norms
A norm on a vector space $\mathbf{V}$ is a function $\|\cdot\|\colon \mathbf{V} \to \mathbb{R}$ for which the following three properties hold for all points $x, y \in \mathbf{V}$ and scalars $a \in \mathbb{R}$:

(Absolute homogeneity) $\|ax\| = |a| \cdot \|x\|$

(Triangle inequality) $\|x + y\| \leq \|x\| + \|y\|$
(Positivity) Equality $\|x\| = 0$ holds if and only if $x = 0$.

The inner product in the Euclidean space $\mathbf{E}$ always induces a norm $\|x\| := \sqrt{\langle x, x\rangle}$. Unless specified otherwise, the symbol $\|x\|$ for $x \in \mathbf{E}$ will always denote this induced norm. For example, the dot product on $\mathbb{R}^n$ induces the usual 2-norm $\|x\|_2 = \sqrt{x_1^2 + \ldots + x_n^2}$, while the trace product on $\mathbb{R}^{m\times n}$ induces the Frobenius norm $\|X\|_F = \sqrt{\operatorname{tr}(X^T X)}$. Other important norms are the $l_p$-norms on $\mathbb{R}^n$:
$$\|x\|_p = \begin{cases} (|x_1|^p + \ldots + |x_n|^p)^{1/p} & \text{for } 1 \leq p < \infty,\\ \max\{|x_1|, \ldots, |x_n|\} & \text{for } p = \infty. \end{cases}$$
The most notable of these are the $l_1$, $l_2$, and $l_\infty$ norms. For an arbitrary norm $\|\cdot\|$ on $\mathbf{E}$, the dual norm $\|\cdot\|^*$ on $\mathbf{E}$ is defined by
$$\|v\|^* := \max\{\langle v, x\rangle : \|x\| \leq 1\}.$$
For $p, q \in [1, \infty]$, the $l_p$ and $l_q$ norms on $\mathbb{R}^n$ are dual to each other whenever $p^{-1} + q^{-1} = 1$. For an arbitrary norm $\|\cdot\|$ on $\mathbf{E}$, the Cauchy-Schwarz inequality holds:
$$|\langle x, y\rangle| \leq \|x\| \cdot \|y\|^*.$$

Exercise 1.2. Given a positive definite linear operator $\mathcal{A}$ on $\mathbf{E}$, show that the assignment $\langle v, w\rangle_{\mathcal{A}} := \langle \mathcal{A}v, w\rangle$ is an inner product on $\mathbf{E}$, with the induced norm $\|v\|_{\mathcal{A}} = \sqrt{\langle \mathcal{A}v, v\rangle}$. Show that the dual norm with respect to the original inner product is $\|v\|_{\mathcal{A}}^* = \|v\|_{\mathcal{A}^{-1}} = \sqrt{\langle \mathcal{A}^{-1}v, v\rangle}$.

All norms on $\mathbf{E}$ are "equivalent" in the sense that any two are within a constant factor of each other. More precisely, for any two norms $\rho_1(\cdot)$ and $\rho_2(\cdot)$, there exist constants $\alpha, \beta > 0$ satisfying
$$\alpha \rho_1(x) \leq \rho_2(x) \leq \beta \rho_1(x) \quad \text{for all } x \in \mathbf{E}.$$
Case in point, for any vector $x \in \mathbb{R}^n$, the relations hold:
$$\|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\,\|x\|_2,$$
$$\|x\|_\infty \leq \|x\|_2 \leq \sqrt{n}\,\|x\|_\infty,$$
$$\|x\|_\infty \leq \|x\|_1 \leq n\,\|x\|_\infty.$$
For our purposes, the term "equivalent" is a misnomer: the proportionality constants $\alpha, \beta$ strongly depend on the (often enormous) dimension of the vector space $\mathbf{E}$. Hence measuring quantities in different norms can yield strikingly different conclusions. Consider a linear map $\mathcal{A}\colon \mathbf{E} \to \mathbf{Y}$, and norms $\|\cdot\|_a$ on $\mathbf{E}$ and $\|\cdot\|_b$ on $\mathbf{Y}$. We define the induced matrix norm
$$\|\mathcal{A}\|_{a,b} := \max_{x\colon \|x\|_a \leq 1} \|\mathcal{A}x\|_b.$$
The reader should verify the inequality
$$\|\mathcal{A}x\|_b \leq \|\mathcal{A}\|_{a,b} \|x\|_a.$$
In particular, if $\|\cdot\|_a$ and $\|\cdot\|_b$ are the norms induced by the inner products in $\mathbf{E}$ and $\mathbf{Y}$, then the corresponding matrix norm is called the operator norm of $\mathcal{A}$ and will be denoted simply by $\|\mathcal{A}\|$. In the case $\mathbf{E} = \mathbf{Y}$ and $a = b$, we simply use the notation $\|\mathcal{A}\|_a$ for the induced norm.
Exercise 1.3. Equip $\mathbb{R}^n$ and $\mathbb{R}^m$ with the $l_p$-norms. Then for any matrix $A \in \mathbb{R}^{m\times n}$, show the equalities
$$\|A\|_1 = \max_{j=1,\ldots,n} \|A_{\bullet j}\|_1,$$
$$\|A\|_\infty = \max_{i=1,\ldots,m} \|A_{i\bullet}\|_1,$$
where $A_{\bullet j}$ and $A_{i\bullet}$ denote the $j$'th column and $i$'th row of $A$, respectively.
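The column-sum and row-sum formulas of Exercise 1.3 can be tried out directly. A short illustration in plain Python (the matrix and test vectors are our own; `norm1` is an illustrative helper):

```python
# Numeric check: the l1-induced matrix norm is the maximal column l1-norm,
# and the l-infinity-induced norm is the maximal row l1-norm.

def norm1(v):
    return sum(abs(a) for a in v)

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1.0, -2.0], [3.0, 0.5]]

col_norm = max(norm1(col) for col in zip(*A))  # columns (1,3) and (-2,0.5)
row_norm = max(norm1(row) for row in A)        # rows (1,-2) and (3,0.5)

# The defining inequality ||Ax||_1 <= ||A||_1 ||x||_1 on a few sample points:
for x in ([1.0, 0.0], [0.25, -0.75], [-0.5, 0.5]):
    assert norm1(matvec(A, x)) <= col_norm * norm1(x) + 1e-12

print(col_norm, row_norm)  # 4.0 3.5
```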
1.3 Eigenvalue and singular value decompositions of matrices
The symbol $\mathbf{S}^n$ will denote the set of $n \times n$ real symmetric matrices, while $O(n)$ will denote the set of $n \times n$ real orthogonal matrices: those satisfying $X^T X = X X^T = I$. Any symmetric matrix $A \in \mathbf{S}^n$ admits an eigenvalue decomposition, meaning a factorization of the form $A = U \Lambda U^T$ with $U \in O(n)$ and $\Lambda \in \mathbf{S}^n$ a diagonal matrix. The diagonal elements of $\Lambda$ are precisely the eigenvalues of $A$, and the columns of $U$ are corresponding eigenvectors. More generally, any matrix $A \in \mathbb{R}^{m\times n}$ admits a singular value decomposition, meaning a factorization of the form $A = U D V^T$, where $U \in O(m)$ and $V \in O(n)$ are orthogonal matrices and $D \in \mathbb{R}^{m\times n}$ is a diagonal matrix with nonnegative diagonal entries. The diagonal elements of $D$ are uniquely defined and are called the singular values of $A$. Supposing without loss of generality $m \leq n$, the singular values of $A$ are precisely the square roots of the eigenvalues of $A A^T$. In particular, the operator norm of any matrix $A \in \mathbb{R}^{m\times n}$ equals its maximal singular value.
1.4 Point-set topology and differentiability
The symbol $B_r(x)$ will denote the open ball of radius $r$ around a point $x$, namely $B_r(x) := \{y \in \mathbf{E} : \|y - x\| < r\}$. The closure of a set $Q \subset \mathbf{E}$, denoted $\operatorname{cl} Q$, consists of all points $x$ such that the ball $B_\epsilon(x)$ intersects $Q$ for all $\epsilon > 0$; the interior of $Q$, written $\operatorname{int} Q$, is the set of all points $x$ such that $Q$ contains some open ball around $x$. We say that $Q$ is an open set if it coincides with its interior, and a closed set if it coincides with its closure. Any set $Q$ in $\mathbf{E}$ that is closed and bounded is called a compact set. The following classical result will be used throughout.

Theorem 1.4 (Bolzano-Weierstrass). Any sequence in a compact set $Q \subset \mathbf{E}$ admits a subsequence converging to a point in $Q$.

For the rest of the section, we let $\mathbf{E}$ and $\mathbf{Y}$ be two Euclidean spaces, and $U$ an open subset of $\mathbf{E}$. A mapping $F\colon Q \to \mathbf{Y}$, defined on a subset $Q \subset \mathbf{E}$, is continuous at a point $x \in Q$ if for any sequence $x_i$ in $Q$ converging to $x$, the values $F(x_i)$ converge to $F(x)$. We say that $F$ is continuous if it is continuous at every point $x \in Q$. By equivalence of norms, continuity is a property that is independent of the choice of norms on $\mathbf{E}$ and $\mathbf{Y}$. We say that $F$ is $L$-Lipschitz continuous if
$$\|F(y) - F(x)\| \leq L \|y - x\| \quad \text{for all } x, y \in Q.$$
Theorem 1.5 (Extreme value theorem). Any continuous function $f\colon Q \to \mathbb{R}$ on a compact set $Q \subset \mathbf{E}$ attains its supremum and infimum values.

A function $f\colon U \to \mathbb{R}$ is differentiable at a point $x$ in $U$ if there exists a vector, denoted by $\nabla f(x)$, satisfying
$$\lim_{h \to 0} \frac{f(x+h) - f(x) - \langle \nabla f(x), h\rangle}{\|h\|} = 0.$$
Rather than carrying such fractions around, it is convenient to introduce the following notation. The symbol $o(r)$ will always stand for a term satisfying $0 = \lim_{r \downarrow 0} o(r)/r$. Then the equation above simply amounts to
$$f(x+h) = f(x) + \langle \nabla f(x), h\rangle + o(\|h\|).$$
The vector $\nabla f(x)$ is called the gradient of $f$ at $x$. In the most familiar setting $\mathbf{E} = \mathbb{R}^n$, the gradient is simply the vector of partial derivatives
$$\nabla f(x) = \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{pmatrix}.$$
If the gradient mapping $x \mapsto \nabla f(x)$ is well-defined and continuous on $U$, we say that $f$ is $C^1$-smooth. We say that $f$ is $\beta$-smooth if $f$ is $C^1$-smooth and its gradient mapping $\nabla f$ is $\beta$-Lipschitz continuous. More generally, consider a mapping $F\colon U \to \mathbf{Y}$. We say that $F$ is differentiable at $x \in U$ if there exists a linear mapping taking $\mathbf{E}$ to $\mathbf{Y}$, denoted by $\nabla F(x)$, satisfying
$$F(x+h) = F(x) + \nabla F(x)h + o(\|h\|).$$
The linear mapping $\nabla F(x)$ is called the Jacobian of $F$ at $x$. If the assignment $x \mapsto \nabla F(x)$ is continuous, we say that $F$ is $C^1$-smooth. In the most familiar setting $\mathbf{E} = \mathbb{R}^n$ and $\mathbf{Y} = \mathbb{R}^m$, we can write $F$ in terms of coordinate functions $F(x) = (F_1(x), \ldots, F_m(x))$, and then the Jacobian is simply
$$\nabla F(x) = \begin{pmatrix} \nabla F_1(x)^T \\ \nabla F_2(x)^T \\ \vdots \\ \nabla F_m(x)^T \end{pmatrix} = \begin{pmatrix} \frac{\partial F_1(x)}{\partial x_1} & \frac{\partial F_1(x)}{\partial x_2} & \cdots & \frac{\partial F_1(x)}{\partial x_n} \\ \frac{\partial F_2(x)}{\partial x_1} & \frac{\partial F_2(x)}{\partial x_2} & \cdots & \frac{\partial F_2(x)}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_m(x)}{\partial x_1} & \frac{\partial F_m(x)}{\partial x_2} & \cdots & \frac{\partial F_m(x)}{\partial x_n} \end{pmatrix}.$$
Finally, we introduce second-order derivatives. A $C^1$-smooth function $f\colon U \to \mathbb{R}$ is twice differentiable at a point $x \in U$ if the gradient map $\nabla f\colon U \to \mathbf{E}$ is differentiable at $x$. Then the Jacobian of the gradient $\nabla(\nabla f)(x)$ is denoted by $\nabla^2 f(x)$ and is called the Hessian of $f$ at $x$. Unraveling notation, the Hessian $\nabla^2 f(x)$ is characterized by the condition
$$\nabla f(x+h) = \nabla f(x) + \nabla^2 f(x)h + o(\|h\|).$$
If the map $x \mapsto \nabla^2 f(x)$ is continuous, we say that $f$ is $C^2$-smooth. If $f$ is indeed $C^2$-smooth, then a basic result of calculus shows that $\nabla^2 f(x)$ is a self-adjoint operator. In the standard setting $\mathbf{E} = \mathbb{R}^n$, the Hessian is the matrix of second-order partial derivatives
$$\nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix}.$$
The matrix is symmetric, as long as it varies continuously with $x$ in $U$.

Exercise 1.6. Define the function
$$f(x) = \tfrac{1}{2}\langle \mathcal{A}x, x\rangle + \langle v, x\rangle + c,$$
where $\mathcal{A}\colon \mathbf{E} \to \mathbf{E}$ is a linear operator, $v$ lies in $\mathbf{E}$, and $c$ is a real number.

1. Show that if $\mathcal{A}$ is replaced by the self-adjoint operator $(\mathcal{A} + \mathcal{A}^*)/2$, the function values $f(x)$ remain unchanged.
2. Assuming $\mathcal{A}$ is self-adjoint, derive the equations:
$$\nabla f(x) = \mathcal{A}x + v \quad \text{and} \quad \nabla^2 f(x) = \mathcal{A}.$$

3. Using parts 1 and 2, describe $\nabla f(x)$ and $\nabla^2 f(x)$ when $\mathcal{A}$ is not necessarily self-adjoint.
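The formula $\nabla f(x) = \mathcal{A}x + v$ from Exercise 1.6 is easy to check against finite differences. A minimal sketch in plain Python, assuming a concrete symmetric $2\times 2$ matrix of our own choosing:

```python
# Check of Exercise 1.6: for f(x) = (1/2)<Ax, x> + <v, x> + c with A
# self-adjoint (symmetric), the gradient is Ax + v. We compare the claimed
# gradient against centered finite differences, which are exact (up to
# rounding) for a quadratic.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, hence self-adjoint
v = [1.0, -2.0]
c = 5.0

def f(x):
    return 0.5 * dot(matvec(A, x), x) + dot(v, x) + c

x = [0.7, -1.2]
grad = [g + w for g, w in zip(matvec(A, x), v)]   # claimed: Ax + v

h = 1e-5
for i in range(2):
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    fd = (f(xp) - f(xm)) / (2 * h)
    assert abs(fd - grad[i]) < 1e-6
print("gradient matches finite differences")
```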
Exercise 1.7. Define the function $f(x) = \tfrac{1}{2}\|F(x)\|^2$, where $F\colon \mathbf{E} \to \mathbf{Y}$ is a $C^1$-smooth mapping. Prove the identity $\nabla f(x) = \nabla F(x)^* F(x)$.

Exercise 1.8. Consider a function $f\colon U \to \mathbb{R}$ and a linear mapping $\mathcal{A}\colon \mathbf{Y} \to \mathbf{E}$, and define the composition $h(x) = f(\mathcal{A}x)$.

1. Show that if $f$ is differentiable at $\mathcal{A}x$, then
$$\nabla h(x) = \mathcal{A}^* \nabla f(\mathcal{A}x).$$

2. Show that if $f$ is twice differentiable at $\mathcal{A}x$, then
$$\nabla^2 h(x) = \mathcal{A}^* \nabla^2 f(\mathcal{A}x) \mathcal{A}.$$

Exercise 1.9. Consider a mapping $F(x) = G(H(x))$, where $H$ is differentiable at $x$ and $G$ is differentiable at $H(x)$. Derive the formula $\nabla F(x) = \nabla G(H(x)) \nabla H(x)$.

Exercise 1.10. Define the two sets
$$\mathbb{R}^n_{++} := \{x \in \mathbb{R}^n : x_i > 0 \text{ for all } i = 1, \ldots, n\},$$
$$\mathbf{S}^n_{++} := \{X \in \mathbf{S}^n : X \succ 0\}.$$
Consider the two functions $f\colon \mathbb{R}^n_{++} \to \mathbb{R}$ and $F\colon \mathbf{S}^n_{++} \to \mathbb{R}$ given by
$$f(x) = -\sum_{i=1}^n \log x_i \quad \text{and} \quad F(X) = -\ln \det(X),$$
respectively. Note, from basic properties of the determinant, the equality $F(X) = f(\lambda(X))$, where we set $\lambda(X) := (\lambda_1(X), \ldots, \lambda_n(X))$.

1. Find the derivatives $\nabla f(x)$ and $\nabla^2 f(x)$ for $x \in \mathbb{R}^n_{++}$.

2. Using the property $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, prove $\nabla F(X) = -X^{-1}$ and $\nabla^2 F(X)[V] = X^{-1} V X^{-1}$ for any $X \succ 0$. [Hint: To compute $\nabla F(X)$, justify
$$F(X + tV) - F(X) + t\langle X^{-1}, V\rangle = -\ln\det\left(I + tX^{-1/2} V X^{-1/2}\right) + t\operatorname{tr}\left(X^{-1/2} V X^{-1/2}\right).$$
By rewriting the expression in terms of the eigenvalues of $X^{-1/2} V X^{-1/2}$, deduce that the right-hand side is $o(t)$. To compute the Hessian, observe
$$(X + V)^{-1} = X^{-1/2}\left(I + X^{-1/2} V X^{-1/2}\right)^{-1} X^{-1/2},$$
and then use the expansion
$$(I + A)^{-1} = I - A + A^2 - A^3 + \ldots = I - A + O(\|A\|_{\mathrm{op}}^2),$$
whenever $\|A\|_{\mathrm{op}} < 1$.]

3. Show
$$\langle \nabla^2 F(X)[V], V\rangle = \left\|X^{-1/2} V X^{-1/2}\right\|_F^2$$
for any $X \succ 0$ and $V \in \mathbf{S}^n$. Deduce that the operator $\nabla^2 F(X)\colon \mathbf{S}^n \to \mathbf{S}^n$ is positive definite.
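The formula $\nabla F(X) = -X^{-1}$ can be sanity-checked numerically in the $2\times 2$ case, where the determinant and inverse are available in closed form. A sketch in plain Python (the helpers `det2`, `inv2`, and the test matrices are our own illustrative choices):

```python
import math

# Sanity check of Exercise 1.10: for F(X) = -ln det(X) on positive definite
# matrices, the gradient is -X^{-1}, i.e. the directional derivative along a
# symmetric V equals <-X^{-1}, V> in the trace product.

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    d = det2(X)
    return [[ X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d,  X[0][0] / d]]

def F(X):
    return -math.log(det2(X))

def trace_product(A, B):  # <A, B> = tr(A^T B) = sum of entrywise products
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

X = [[2.0, 0.5], [0.5, 1.0]]       # a positive definite matrix
V = [[0.3, -0.2], [-0.2, 0.7]]     # a symmetric direction
t = 1e-6

# Directional derivative of F at X along V, by finite differences:
Xt = [[X[i][j] + t * V[i][j] for j in range(2)] for i in range(2)]
numeric = (F(Xt) - F(X)) / t

# The claimed gradient -X^{-1} predicts <-X^{-1}, V>:
G = [[-g for g in row] for row in inv2(X)]
predicted = trace_product(G, V)

print(abs(numeric - predicted) < 1e-4)  # True
```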
1.5 Fundamental theorems of calculus & accuracy in approximation
For any two points $x, y \in \mathbf{E}$, define the closed segment $[x, y] := \{\lambda x + (1-\lambda)y : \lambda \in [0, 1]\}$. The open segment $(x, y)$ is defined analogously. A set $Q$ in $\mathbf{E}$ is convex if for any two points $x, y \in Q$, the entire segment $[x, y]$ is contained in $Q$. For this entire section, we let $U$ be an open, convex subset of $\mathbf{E}$. Consider a $C^1$-smooth function $f\colon U \to \mathbb{R}$ and a point $x \in U$. Classically, the linear function
$$l(x; y) = f(x) + \langle \nabla f(x), y - x\rangle$$
is a best first-order approximation of $f$ near $x$. If $f$ is $C^2$-smooth, then the quadratic function
$$Q(x; y) = f(x) + \langle \nabla f(x), y - x\rangle + \tfrac{1}{2}\langle \nabla^2 f(x)(y - x), y - x\rangle$$
is a best second-order approximation of $f$ near $x$. These two functions play a fundamental role in designing and analyzing algorithms: they furnish simple linear and quadratic local models of $f$. In this section, we aim to quantify how closely $l(x; \cdot)$ and $Q(x; \cdot)$ approximate $f$. All results will follow quickly by restricting multivariate functions to line segments and then applying the fundamental theorem of calculus for univariate functions. To this end, the following observation plays a basic role.
Exercise 1.11. Consider a function $f\colon U \to \mathbb{R}$ and two points $x, y \in U$. Define the univariate function $\varphi\colon [0, 1] \to \mathbb{R}$ given by $\varphi(t) = f(x + t(y - x))$, and let $x_t := x + t(y - x)$ for any $t$.

1. Show that if $f$ is $C^1$-smooth, then the equality
$$\varphi'(t) = \langle \nabla f(x_t), y - x\rangle$$
holds for any $t \in (0, 1)$.

2. Show that if $f$ is $C^2$-smooth, then the equality
$$\varphi''(t) = \langle \nabla^2 f(x_t)(y - x), y - x\rangle$$
holds for any $t \in (0, 1)$.
The fundamental theorem of calculus now takes the following form.

Theorem 1.12 (Fundamental theorem of multivariate calculus). Consider a $C^1$-smooth function $f\colon U \to \mathbb{R}$ and two points $x, y \in U$. Then the equality
$$f(y) - f(x) = \int_0^1 \langle \nabla f(x + t(y - x)), y - x\rangle \, dt$$
holds.

Proof. Define the univariate function $\varphi(t) = f(x + t(y - x))$. The fundamental theorem of calculus yields the relation
$$\varphi(1) - \varphi(0) = \int_0^1 \varphi'(t) \, dt.$$
Taking into account Exercise 1.11, the result follows.
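Theorem 1.12 can be illustrated numerically by approximating the line integral with a Riemann sum. A sketch in plain Python, assuming a simple smooth test function of our own choosing:

```python
import math

# Numeric illustration of Theorem 1.12: f(y) - f(x) equals the integral of
# <grad f(x + t(y - x)), y - x> over t in [0, 1], approximated here by a
# midpoint Riemann sum.

def f(x):
    return x[0] ** 2 + math.sin(x[1])

def grad_f(x):
    return [2 * x[0], math.cos(x[1])]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = [0.0, 0.0], [1.0, 2.0]
d = [b - a for a, b in zip(x, y)]   # the direction y - x

N = 10000
integral = sum(
    dot(grad_f([a + (k + 0.5) / N * di for a, di in zip(x, d)]), d)
    for k in range(N)
) / N

print(abs((f(y) - f(x)) - integral) < 1e-6)  # True
```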
The following corollary precisely quantifies the gap between $f(y)$ and its linear and quadratic models, $l(x; y)$ and $Q(x; y)$.

Corollary 1.13 (Accuracy in approximation). Consider a $C^1$-smooth function $f\colon U \to \mathbb{R}$ and two points $x, y \in U$. Then we have
$$f(y) = l(x; y) + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x\rangle \, dt.$$
If $f$ is $C^2$-smooth, then the following equation holds:
$$f(y) = Q(x; y) + \int_0^1 \int_0^t \langle (\nabla^2 f(x + s(y - x)) - \nabla^2 f(x))(y - x), y - x\rangle \, ds \, dt.$$
Proof. The first equation is immediate from Theorem 1.12. To see the second equation, define the function $\varphi(t) = f(x + t(y - x))$. Then applying the fundamental theorem of calculus twice yields
$$\varphi(1) - \varphi(0) = \int_0^1 \varphi'(t)\, dt = \int_0^1 \left(\varphi'(0) + \int_0^t \varphi''(s)\, ds\right) dt$$
$$= \varphi'(0) + \frac{1}{2}\varphi''(0) + \int_0^1 \int_0^t \varphi''(s) - \varphi''(0)\, ds\, dt.$$
Appealing to Exercise 1.11, the result follows.
Recall that if $f$ is differentiable at $x$, then the relation holds:
$$\lim_{y \to x} \frac{f(y) - l(x; y)}{\|y - x\|} = 0.$$
An immediate consequence of Corollary 1.13 is that if $f$ is $C^1$-smooth, then the equation above is stable under perturbations of the base point $x$: for any point $\bar{x} \in U$ we have
$$\lim_{x, y \to \bar{x}} \frac{f(y) - l(x; y)}{\|y - x\|} = 0.$$
Similarly, if $f$ is $C^2$-smooth, then
$$\lim_{x, y \to \bar{x}} \frac{f(y) - Q(x; y)}{\|y - x\|^2} = 0.$$
When the mappings $\nabla f$ and $\nabla^2 f$ are Lipschitz continuous, one has even greater control on the accuracy of approximation, in essence passing from little-$o$ terms to big-$O$ terms.

Corollary 1.14 (Accuracy in approximation under Lipschitz conditions). Given any $\beta$-smooth function $f\colon U \to \mathbb{R}$, for any points $x, y \in U$ the inequality
$$|f(y) - l(x; y)| \leq \frac{\beta}{2}\|y - x\|^2$$
holds. If $f$ is $C^2$-smooth with $M$-Lipschitz Hessian, then
$$|f(y) - Q(x; y)| \leq \frac{M}{6}\|y - x\|^3.$$
It is now straightforward to extend the results in this section to mappings $F\colon U \to \mathbb{R}^m$. Given a curve $\gamma\colon \mathbb{R} \to \mathbb{R}^m$, we define the integral $\int_0^1 \gamma(t)\, dt = \left(\int_0^1 \gamma_1(t)\, dt, \ldots, \int_0^1 \gamma_m(t)\, dt\right)$, where the $\gamma_i$ are the coordinate functions of $\gamma$. The main observation is that whenever the $\gamma_i$ are integrable, the inequality
$$\left\|\int_0^1 \gamma(t)\, dt\right\| \leq \int_0^1 \|\gamma(t)\|\, dt$$
holds. To see this, define $w = \int_0^1 \gamma(t)\, dt$ and simply observe
$$\|w\|^2 = \int_0^1 \langle \gamma(t), w\rangle\, dt \leq \|w\| \int_0^1 \|\gamma(t)\|\, dt.$$

Exercise 1.15. Consider a $C^1$-smooth mapping $F\colon U \to \mathbb{R}^m$ and two points $x, y \in U$. Derive the equations
$$F(y) - F(x) = \int_0^1 \nabla F(x + t(y - x))(y - x)\, dt,$$
$$F(y) = F(x) + \nabla F(x)(y - x) + \int_0^1 (\nabla F(x + t(y - x)) - \nabla F(x))(y - x)\, dt.$$
In particular, consider a $C^1$-smooth mapping $F\colon U \to \mathbf{Y}$, where $\mathbf{Y}$ is some Euclidean space, and a point $\bar{x} \in U$. Choosing an orthonormal basis for $\mathbf{Y}$ and applying Exercise 1.15, we obtain the relation
$$\lim_{x, y \to \bar{x}} \frac{\|F(y) - F(x) - \nabla F(x)(y - x)\|}{\|y - x\|} = 0.$$
Supposing that $F$ is $\beta$-smooth, the stronger inequality holds:
$$\|F(y) - F(x) - \nabla F(x)(y - x)\| \leq \frac{\beta}{2}\|y - x\|^2.$$
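The bound of Corollary 1.14 can be probed numerically in one dimension. A sketch in plain Python, under the assumption $f(x) = \sin x$ (our example: its derivative $\cos$ is $1$-Lipschitz, so $f$ is $1$-smooth):

```python
import math

# One-dimensional check of Corollary 1.14: f(x) = sin(x) is 1-smooth, so the
# linear model l(x; y) should never be off by more than (1/2)|y - x|^2.

def f(x):
    return math.sin(x)

def l(x, y):  # first-order model of f around x
    return f(x) + math.cos(x) * (y - x)

worst = 0.0   # largest observed violation of the bound
for i in range(-30, 31):
    for j in range(-30, 31):
        x, y = i / 10, j / 10
        gap = abs(f(y) - l(x, y)) - 0.5 * (y - x) ** 2
        worst = max(worst, gap)

print(worst <= 1e-12)  # True: the beta/2 bound holds at every sampled pair
```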
Exercise 1.16. Show that a $C^1$-smooth mapping $F\colon U \to \mathbf{Y}$ is $L$-Lipschitz continuous if and only if $\|\nabla F(x)\| \leq L$ for all $x \in U$.

Chapter 2
Smooth minimization
In this chapter, we consider the problem of minimizing a smooth function on a Euclidean space $\mathbf{E}$. Such problems are ubiquitous in computational mathematics and the applied sciences.
2.1 Optimality conditions: Smooth Unconstrained
We begin the formal development with a classical discussion of optimality conditions. To this end, consider the problem
$$\min_{x \in \mathbf{E}} f(x),$$
where $f\colon \mathbf{E} \to \mathbb{R}$ is a $C^1$-smooth function. Without any additional assumptions on $f$, finding a global minimizer of the problem is a hopeless task. Instead, we focus on finding a local minimizer: a point $x$ for which there exists a convex neighborhood $U$ of $x$ such that $f(x) \leq f(y)$ for all $y \in U$. After all, gradients and Hessians provide only local information on the function. When encountering an optimization problem such as the one above, one faces two immediate tasks. First, design an algorithm that solves the problem. That is, develop a rule for going from one point $x_k$ to the next $x_{k+1}$ by using computable quantities (e.g. function values, gradients, Hessians) so that the limit points of the iterates solve the problem. The second task is easier: given a test point $x$, either verify that $x$ solves the problem or exhibit a direction along which points with strictly better function values can be found. Though the verification goal seems modest at first, it always serves as the starting point for algorithm design. Observe that naively checking whether $x$ is a local minimizer of $f$ from the very definition requires evaluating $f$ at every point near $x$, an impossible task. We now derive a verifiable necessary condition for local optimality.

Theorem 2.1 (First-order necessary conditions). Suppose that $x$ is a local minimizer of a function $f\colon U \to \mathbb{R}$. If $f$ is differentiable at $x$, then the equality $\nabla f(x) = 0$ holds.
Proof. Set $v := -\nabla f(x)$. Then for all small $t > 0$, we deduce from the definition of the derivative
$$0 \leq \frac{f(x + tv) - f(x)}{t} = -\|\nabla f(x)\|^2 + \frac{o(t)}{t}.$$
Letting $t$ tend to zero, we obtain $\nabla f(x) = 0$, as claimed.
A point $x \in U$ is a critical point of a $C^1$-smooth function $f\colon U \to \mathbb{R}$ if the equality $\nabla f(x) = 0$ holds. Theorem 2.1 shows that all local minimizers of $f$ are critical points. In general, even finding local minimizers is too ambitious, and we will for the most part settle for critical points. To obtain verifiable sufficient conditions for optimality, higher-order derivatives are required.

Theorem 2.2 (Second-order conditions). Consider a $C^2$-smooth function $f\colon U \to \mathbb{R}$ and fix a point $x \in U$. Then the following are true.

1. (Necessary conditions) If $x \in U$ is a local minimizer of $f$, then $\nabla f(x) = 0$ and $\nabla^2 f(x) \succeq 0$.

2. (Sufficient conditions) If the relations $\nabla f(x) = 0$ and $\nabla^2 f(x) \succ 0$ hold, then $x$ is a local minimizer of $f$. More precisely,
$$\liminf_{y \to x} \frac{f(y) - f(x)}{\frac{1}{2}\|y - x\|^2} \geq \lambda_{\min}(\nabla^2 f(x)).$$

Proof. Suppose first that $x$ is a local minimizer of $f$. Then Theorem 2.1 guarantees $\nabla f(x) = 0$. Consider an arbitrary vector $v \in \mathbf{E}$. Then for all $t > 0$, we deduce from a second-order expansion
$$0 \leq \frac{f(x + tv) - f(x)}{\frac{1}{2}t^2} = \langle \nabla^2 f(x)v, v\rangle + \frac{o(t^2)}{t^2}.$$
Letting $t$ tend to zero, we conclude $\langle \nabla^2 f(x)v, v\rangle \geq 0$ for all $v \in \mathbf{E}$, as claimed.

Suppose now $\nabla f(x) = 0$ and $\nabla^2 f(x) \succ 0$. Let $\epsilon > 0$ be such that $B_\epsilon(x) \subset U$. Then for points $y \to x$, we have from a second-order expansion
$$\frac{f(y) - f(x)}{\frac{1}{2}\|y - x\|^2} = \left\langle \nabla^2 f(x)\frac{y - x}{\|y - x\|}, \frac{y - x}{\|y - x\|}\right\rangle + \frac{o(\|y - x\|^2)}{\|y - x\|^2} \geq \lambda_{\min}(\nabla^2 f(x)) + \frac{o(\|y - x\|^2)}{\|y - x\|^2}.$$
Letting $y$ tend to $x$, the result follows.
The reader may be misled into believing that the role of the neces- sary conditions and the sufficient conditions for optimality (Theorem 2.2) is merely to determine whether a putative point x is a local minimizer of a smooth function f. Such a viewpoint is far too limited. Necessary conditions serve as the basis for algorithm design. If necessary conditions for optimality fail at a point, then there must be some point nearby with a strictly smaller objective value. A method for discovering such a point is a first step for designing algorithms. Sufficient conditions play an entirely different role. In Section 2.2, we will see that sufficient conditions for optimality at a point x guarantee that the function f is strongly convex on a neighborhood of x. Strong convexity, in turn, is essential for establishing rapid convergence of numerical methods.
2.2 Convexity, a first look
Finding a global minimizer of a general smooth function $f\colon \mathbf{E} \to \mathbb{R}$ is a hopeless task, and one must settle for local minimizers or even critical points. This is quite natural, since gradients and Hessians provide only local information on the function. However, there is a class of smooth functions, prevalent in applications, whose gradients provide global information. This is the class of convex functions, the main setting for the book. This section provides a short, and limited, introduction to the topic to facilitate algorithmic discussion. Later sections of the book explore convexity in much greater detail.

Definition 2.3 (Convexity). A function $f\colon U \to (-\infty, +\infty]$ is convex if the inequality
$$f(\lambda x + (1 - \lambda)y) \leq \lambda f(x) + (1 - \lambda)f(y)$$
holds for all points $x, y \in U$ and real numbers $\lambda \in [0, 1]$.

In other words, a function $f$ is convex if any secant line joining two points in the graph of the function lies above the graph. This is the content of the following exercise.

Exercise 2.4. Show that a function $f\colon U \to (-\infty, +\infty]$ is convex if and only if the epigraph
$$\operatorname{epi} f := \{(x, r) \in U \times \mathbb{R} : f(x) \leq r\}$$
is a convex subset of $\mathbf{E} \times \mathbb{R}$.

Exercise 2.5. Show that $f\colon U \to (-\infty, +\infty]$ is convex if and only if the inequality
$$f\left(\sum_{i=1}^k \lambda_i x_i\right) \leq \sum_{i=1}^k \lambda_i f(x_i)$$
holds for all integers $k \in \mathbb{N}$, all points $x_1, \ldots, x_k \in U$, and all real $\lambda_i \geq 0$ with $\sum_{i=1}^k \lambda_i = 1$.

Convexity is preserved under a variety of operations. Point-wise maximum is an important example.
Exercise 2.6. Consider an arbitrary set $T$ and a family of convex functions $f_t\colon U \to (-\infty, +\infty]$ for $t \in T$. Show that the function $f(x) := \sup_{t \in T} f_t(x)$ is convex.
Convexity of smooth functions can be characterized entirely in terms of derivatives.
Theorem 2.7 (Differential characterizations of convexity). The following are equivalent for a $C^1$-smooth function $f\colon U \to \mathbb{R}$.

(a) (convexity) $f$ is convex.

(b) (gradient inequality) $f(y) \geq f(x) + \langle \nabla f(x), y - x\rangle$ for all $x, y \in U$.

(c) (monotonicity) $\langle \nabla f(y) - \nabla f(x), y - x\rangle \geq 0$ for all $x, y \in U$.

If $f$ is $C^2$-smooth, then the following property can be added to the list:

(d) The relation $\nabla^2 f(x) \succeq 0$ holds for all $x \in U$.
Proof. Assume (a) holds, and fix two points $x$ and $y$. For any $t \in (0, 1)$, convexity implies
$$f(x + t(y - x)) = f(ty + (1 - t)x) \leq tf(y) + (1 - t)f(x),$$
while the definition of the derivative yields
$$f(x + t(y - x)) = f(x) + t\langle \nabla f(x), y - x\rangle + o(t).$$
Combining the two expressions, canceling $f(x)$ from both sides, and dividing by $t$ yields the relation
$$f(y) - f(x) \geq \langle \nabla f(x), y - x\rangle + o(t)/t.$$
Letting $t$ tend to zero, we obtain property (b).

Suppose now that (b) holds. Then for any $x, y \in U$, appealing to the gradient inequality, we deduce
$$f(y) \geq f(x) + \langle \nabla f(x), y - x\rangle \quad \text{and} \quad f(x) \geq f(y) + \langle \nabla f(y), x - y\rangle.$$
Adding the two inequalities yields (c).
Finally, suppose (c) holds. Define the function $\varphi(t) := f(x + t(y - x))$ and set $x_t := x + t(y - x)$. Then monotonicity shows that for any real numbers $t, s \in [0, 1]$ with $t > s$, the inequality holds:
$$\varphi'(t) - \varphi'(s) = \langle \nabla f(x_t), y - x\rangle - \langle \nabla f(x_s), y - x\rangle = \frac{1}{t - s}\langle \nabla f(x_t) - \nabla f(x_s), x_t - x_s\rangle \geq 0.$$
Thus the derivative $\varphi'$ is nondecreasing, and hence for any $x, y \in U$, we have
$$f(y) = \varphi(1) = \varphi(0) + \int_0^1 \varphi'(r)\, dr \geq \varphi(0) + \varphi'(0) = f(x) + \langle \nabla f(x), y - x\rangle.$$
Some thought now shows that $f$ admits the representation
$$f(y) = \sup_{x \in U}\, \{f(x) + \langle \nabla f(x), y - x\rangle\}$$
for any $y \in U$. Since a pointwise supremum of an arbitrary collection of convex functions is convex (Exercise 2.6), we deduce that $f$ is convex, establishing (a).

Suppose now that $f$ is $C^2$-smooth. Then for any fixed $x \in U$ and $h \in \mathbf{E}$, and all small $t > 0$, property (b) implies
$$f(x) + t\langle \nabla f(x), h\rangle \leq f(x + th) = f(x) + t\langle \nabla f(x), h\rangle + \frac{t^2}{2}\langle \nabla^2 f(x)h, h\rangle + o(t^2).$$
Canceling out like terms, dividing by $t^2$, and letting $t$ tend to zero, we deduce $\langle \nabla^2 f(x)h, h\rangle \geq 0$ for all $h \in \mathbf{E}$. Hence (d) holds.

Conversely, suppose (d) holds. Then Corollary 1.13 immediately implies for all $x, y \in U$ the inequality
$$f(y) - f(x) - \langle \nabla f(x), y - x\rangle = \int_0^1 \int_0^t \langle \nabla^2 f(x + s(y - x))(y - x), y - x\rangle\, ds\, dt \geq 0.$$
Hence (b) holds, and the proof is complete.
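The gradient inequality (b) and monotonicity (c) of Theorem 2.7 are easy to observe on a concrete convex function. A small illustration in plain Python, assuming the one-dimensional example $f(x) = e^x + x^2$ of our own choosing:

```python
import math

# Checking Theorem 2.7 on samples: for the convex function f(x) = e^x + x^2,
# both the gradient inequality (b) and gradient monotonicity (c) hold at
# every pair of grid points.

def f(x):
    return math.exp(x) + x ** 2

def df(x):  # f'(x)
    return math.exp(x) + 2 * x

pts = [i / 4 for i in range(-8, 9)]
for x in pts:
    for y in pts:
        assert f(y) >= f(x) + df(x) * (y - x) - 1e-9   # (b) gradient inequality
        assert (df(y) - df(x)) * (y - x) >= -1e-9      # (c) monotonicity
print("convexity characterizations verified on samples")
```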
Exercise 2.8. Show that the functions f and F in Exercise 1.10 are convex.
Exercise 2.9. Consider a $C^1$-smooth function $f\colon \mathbb{R}^n \to \mathbb{R}$. Prove that each condition below, holding for all points $x, y \in \mathbb{R}^n$, is equivalent to $f$ being $\beta$-smooth and convex.

1. $0 \leq f(y) - f(x) - \langle \nabla f(x), y - x\rangle \leq \frac{\beta}{2}\|x - y\|^2$

2. $f(x) + \langle \nabla f(x), y - x\rangle + \frac{1}{2\beta}\|\nabla f(x) - \nabla f(y)\|^2 \leq f(y)$

3. $\frac{1}{\beta}\|\nabla f(x) - \nabla f(y)\|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$

4. $0 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle \leq \beta\|x - y\|^2$

[Hint: Suppose first that $f$ is convex and $\beta$-smooth. Then 1 is immediate. Suppose now 1 holds, and define the function $\phi(y) = f(y) - \langle \nabla f(x), y\rangle$. Show using 1 that
$$\phi(x) = \min \phi \leq \phi\left(y - \tfrac{1}{\beta}\nabla \phi(y)\right) \leq \phi(y) - \tfrac{1}{2\beta}\|\nabla \phi(y)\|^2.$$
Deduce property 2. To deduce 3 from 2, add two copies of 2 with $x$ and $y$ reversed. Next, applying Cauchy-Schwarz to 3 immediately implies that $f$ is $\beta$-smooth and convex. Finally, show that 1 implies 4 by adding two copies of 1 with $x$ and $y$ reversed. Conversely, rewriting 4, deduce that the gradient of the function $\phi(x) = -f(x) + \frac{\beta}{2}\|x\|^2$ is monotone, and therefore that $\phi$ is convex. Rewriting the gradient inequality for $\phi$, arrive at 1.]

Global minimality, local minimality, and criticality are equivalent notions for smooth convex functions.

Corollary 2.10 (Minimizers of convex functions). For any $C^1$-smooth convex function $f\colon U \to \mathbb{R}$ and a point $x \in U$, the following are equivalent.

(a) $x$ is a global minimizer of $f$,
(b) x is a local minimizer of f,
(c) $x$ is a critical point of $f$.

Proof. The implications (a) $\Rightarrow$ (b) $\Rightarrow$ (c) are immediate. The implication (c) $\Rightarrow$ (a) follows from the gradient inequality in Theorem 2.7.
Exercise 2.11. Consider a $C^1$-smooth convex function $f\colon \mathbf{E} \to \mathbb{R}$. Fix a linear subspace $\mathcal{L} \subset \mathbf{E}$ and a point $x_0 \in \mathbf{E}$. Show that $x \in \mathcal{L}$ minimizes the restriction $f_{\mathcal{L}}\colon \mathcal{L} \to \mathbb{R}$ if and only if the gradient $\nabla f(x)$ is orthogonal to $\mathcal{L}$.

Strengthening the gradient inequality in Theorem 2.7 in a natural way yields an important subclass of convex functions. These are the functions for which numerical methods have a chance of converging at least linearly.

Definition 2.12 (Strong convexity). We say that a $C^1$-smooth function $f\colon U \to \mathbb{R}$ is $\alpha$-strongly convex (with $\alpha \geq 0$) if the inequality
$$f(y) \geq f(x) + \langle \nabla f(x), y - x\rangle + \frac{\alpha}{2}\|y - x\|^2$$
holds for all $x, y \in U$.

Figure 2.1 illustrates geometrically a $\beta$-smooth and $\alpha$-strongly convex function. In particular, a very useful property to remember is that if $x$ is a minimizer of an $\alpha$-strongly convex $C^1$-smooth function $f$, then for all $y$ it holds:
$$f(y) \geq f(x) + \frac{\alpha}{2}\|y - x\|^2.$$
Figure 2.1: Illustration of a $\beta$-smooth and $\alpha$-strongly convex function $f$, where $Q_x(y) := f(x) + \langle \nabla f(x), y - x\rangle + \frac{\beta}{2}\|y - x\|^2$ is an upper model based at $x$ and $q_x(y) := f(x) + \langle \nabla f(x), y - x\rangle + \frac{\alpha}{2}\|y - x\|^2$ is a lower model based at $x$. The fraction $Q := \beta/\alpha$ is often called the condition number of $f$.
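The sandwiching of $f$ between the models $q_x$ and $Q_x$ in Figure 2.1 can be checked numerically. A sketch in plain Python, under the assumption $f(x) = x^2 + \cos x$ (our example: $f''(x) = 2 - \cos x \in [1, 3]$, hence $\alpha = 1$ and $\beta = 3$):

```python
import math

# Checking the picture in Figure 2.1 on samples: for f(x) = x^2 + cos(x),
# the lower model q_x (curvature alpha = 1) and upper model Q_x (curvature
# beta = 3) sandwich f at every sampled pair of points.

alpha, beta = 1.0, 3.0

def f(x):
    return x ** 2 + math.cos(x)

def df(x):  # f'(x)
    return 2 * x - math.sin(x)

def model(x, y, curv):  # f(x) + f'(x)(y - x) + (curv/2)(y - x)^2
    return f(x) + df(x) * (y - x) + 0.5 * curv * (y - x) ** 2

pts = [i / 5 for i in range(-15, 16)]
for x in pts:
    for y in pts:
        assert model(x, y, alpha) - 1e-9 <= f(y) <= model(x, y, beta) + 1e-9
print("q_x <= f <= Q_x on all sampled pairs")
```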
Exercise 2.13. Show that a $C^1$-smooth function $f\colon U \to \mathbb{R}$ is $\alpha$-strongly convex if and only if the function $g(x) = f(x) - \frac{\alpha}{2}\|x\|^2$ is convex.

The following is an analogue of Theorem 2.7 for strongly convex functions.
Theorem 2.14 (Characterization of strong convexity). The following properties are equivalent for any $C^1$-smooth function $f\colon U \to \mathbb{R}$ and any constant $\alpha \geq 0$.

(a) $f$ is $\alpha$-strongly convex.

(b) The inequality $\langle \nabla f(y) - \nabla f(x), y - x\rangle \geq \alpha\|y - x\|^2$ holds for all $x, y \in U$.

If $f$ is $C^2$-smooth, then the following property can be added to the list:

(c) The relation $\nabla^2 f(x) \succeq \alpha I$ holds for all $x \in U$.
Proof. By Exercise 2.13, property (a) holds if and only if $f - \frac{\alpha}{2}\|\cdot\|^2$ is convex, which by Theorem 2.7 is equivalent to (b). Suppose now that $f$ is $C^2$-smooth. Theorem 2.7 then shows that $f - \frac{\alpha}{2}\|\cdot\|^2$ is convex if and only if (c) holds.
2.3 Rates of convergence
In the next section, we will begin discussing algorithms. A theoretically sound comparison of numerical methods relies on precise rates of progress of the iterates. For example, we will predominantly be interested in how fast the quantities $f(x_k) - \inf f$, $\|\nabla f(x_k)\|$, or $\|x_k - x^*\|$ tend to zero as a function of the counter $k$. In this section, we review three types of convergence rates that we will encounter. Fix a sequence of real numbers $a_k > 0$ with $a_k \to 0$.
1. We will say that $a_k$ converges sublinearly if there exist constants $c, q > 0$ satisfying
\[
a_k \leq \frac{c}{k^q} \quad \text{for all } k.
\]
Larger $q$ and smaller $c$ indicate faster rates of convergence. In particular, given a target precision $\varepsilon > 0$, the inequality $a_k \leq \varepsilon$ holds for every $k \geq (c/\varepsilon)^{1/q}$. The importance of the value of $c$ should not be discounted; the convergence guarantee depends strongly on this value.
2. The sequence $a_k$ is said to converge linearly if there exist constants $c > 0$ and $q \in (0, 1]$ satisfying
\[
a_k \leq c \cdot (1 - q)^k \quad \text{for all } k.
\]
In this case, we call $1 - q$ the linear rate of convergence. Fix a target accuracy $\varepsilon > 0$, and let us see how large $k$ needs to be to ensure $a_k \leq \varepsilon$. To this end, taking logarithms we get
\[
c \cdot (1 - q)^k \leq \varepsilon \quad\Longleftrightarrow\quad k \geq \frac{-1}{\ln(1 - q)} \ln\Big(\frac{c}{\varepsilon}\Big).
\]
Taking into account the inequality $\ln(1 - q) \leq -q$, we deduce that the inequality $a_k \leq \varepsilon$ holds for every $k \geq \frac{1}{q}\ln(\frac{c}{\varepsilon})$. The dependence on $q$ is strong, while the dependence on $c$ is very weak, since the latter appears inside a logarithm.
3. The sequence $a_k$ is said to converge quadratically if there is a constant $c$ satisfying
\[
a_{k+1} \leq c \cdot a_k^2 \quad \text{for all } k.
\]
Observe that unrolling the recurrence yields
\[
a_{k+1} \leq \frac{1}{c}(c\,a_0)^{2^{k+1}}.
\]
The only role of the constant $c$ is to determine the moment when convergence begins. In particular, if $c\,a_0 < 1$, then the inequality $a_k \leq \varepsilon$ holds for all $k \geq \log_2 \ln(\frac{1}{c\varepsilon}) - \log_2(-\ln(c\,a_0))$. The dependence on $c$ is negligible.
2.4 Two basic methods
This section presents two classical minimization algorithms: gradient descent and Newton's method. It is crucial for the reader to keep in mind how the convergence guarantees are amplified when (strong) convexity is present.
2.4.1 Majorization view of gradient descent

Consider the optimization problem
\[
\min_{x \in \mathbf{E}} f(x),
\]
where $f$ is a $\beta$-smooth function. Our goal is to design an iterative algorithm that generates iterates $x_k$, such that any limit point of the sequence $\{x_k\}$ is critical for $f$. It is quite natural, at least at first, to seek an algorithm that is monotone, meaning that the sequence of function values $\{f(x_k)\}$ is decreasing. Let us see one way this can be achieved, using the idea of majorization. In each iteration, we will define a simple function $m_k$ (the "upper model") agreeing with $f$ at $x_k$, and majorizing $f$ globally, meaning that the inequality $m_k(x) \geq f(x)$ holds for all $x \in \mathbf{E}$. Defining $x_{k+1}$ to be the global minimizer of $m_k$, we immediately deduce
\[
f(x_{k+1}) \leq m_k(x_{k+1}) \leq m_k(x_k) = f(x_k).
\]
Thus function values decrease along the iterates generated by the scheme, as was desired. An immediate question now is where such upper models $m_k$ can come from. Here is one example of a quadratic upper model:
\[
m_k(x) := f(x_k) + \langle \nabla f(x_k), x - x_k\rangle + \frac{\beta}{2}\|x - x_k\|^2. \tag{2.1}
\]
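As a concrete illustration, here is a minimal sketch of the majorization scheme with the upper model (2.1). Minimizing the quadratic $m_k$ in closed form gives the step $x_{k+1} = x_k - \frac{1}{\beta}\nabla f(x_k)$; the quadratic test function below is a hypothetical choice for illustration.

```python
import numpy as np

# Majorization scheme with the quadratic upper model (2.1), applied to
# a hypothetical test function f(x) = 0.5 * x^T A x - b^T x, for which
# the smoothness constant is beta = lambda_max(A).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
beta = np.linalg.eigvalsh(A)[-1]

x = np.array([5.0, -5.0])
values = [f(x)]
for _ in range(200):
    x = x - grad(x) / beta        # global minimizer of the model m_k
    values.append(f(x))

# Function values decrease monotonically, as the majorization argument shows.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
# The iterates approach the true minimizer A^{-1} b.
assert np.linalg.norm(x - np.linalg.solve(A, b)) < 1e-6
```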
Clearly $m_k$ agrees with $f$ at $x_k$, while Corollary 1.14 shows that the inequality $m_k(x) \geq f(x)$ holds for all $x \in \mathbf{E}$, as required. It is precisely this ability to find quadratic upper models of the objective function $f$ that separates minimization of smooth functions from that of non-smooth functions. Notice that $m_k$ has a unique critical point, which must therefore equal $x_{k+1}$ by first-order optimality conditions, and therefore we deduce
\[
x_{k+1} = x_k - \frac{1}{\beta}\nabla f(x_k).
\]
This algorithm, likely familiar to the reader, is called gradient descent. Let us now see what can be said about limit points of the iterates $x_k$. Appealing to Corollary 1.14, we obtain the descent guarantee
\[
f(x_{k+1}) \leq f(x_k) - \langle \nabla f(x_k), \beta^{-1}\nabla f(x_k)\rangle + \frac{\beta}{2}\|\beta^{-1}\nabla f(x_k)\|^2 = f(x_k) - \frac{1}{2\beta}\|\nabla f(x_k)\|^2. \tag{2.2}
\]
Rearranging and summing over the iterates, we deduce
\[
\sum_{i=0}^{k}\|\nabla f(x_i)\|^2 \leq 2\beta\big(f(x_0) - f(x_{k+1})\big).
\]
Thus either the function values $f(x_k)$ tend to $-\infty$, or the sequence $\{\|\nabla f(x_i)\|^2\}$ is summable and therefore every limit point of the iterates $x_k$ is a critical point of $f$, as desired. Moreover, setting $f^* := \lim_{k\to\infty} f(x_k)$, we deduce the precise rate at which the gradients tend to zero:
\[
\min_{i=0,\dots,k}\|\nabla f(x_i)\|^2 \leq \frac{1}{k+1}\sum_{i=0}^{k}\|\nabla f(x_i)\|^2 \leq \frac{2\beta\big(f(x_0) - f^*\big)}{k+1}.
\]
We have thus established the following result.

Theorem 2.15 (Gradient descent). Consider a $\beta$-smooth function $f\colon \mathbf{E} \to \mathbf{R}$. Then the iterates generated by the gradient descent method satisfy