
Course notes: Convex Analysis and Optimization

Dmitriy Drusvyatskiy

May 27, 2019

Contents

1 Review of Fundamentals
  1.1 Inner products and linear maps
  1.2 Norms
  1.3 Eigenvalue and singular value decompositions of matrices
  1.4 Point-set topology and differentiability
  1.5 Fundamental theorems of calculus & accuracy in approximation

2 Smooth minimization
  2.1 Optimality conditions: Smooth Unconstrained
  2.2 Convexity, a first look
  2.3 Rates of convergence
  2.4 Two basic methods
    2.4.1 Majorization view of gradient descent
    2.4.2 Newton's method
  2.5 Computational complexity for smooth convex minimization
  2.6 Conjugate Gradient Method
  2.7 Optimal methods for smooth convex minimization
    2.7.1 Fast gradient methods
    2.7.2 Fast gradient methods through estimate sequences
    2.7.3 Optimal quadratic averaging

3 Convex geometry and analysis
  3.1 Basic convex geometry
    3.1.1 Separation theorems
    3.1.2 Cones and polarity
    3.1.3 Tangents and normals
  3.2 Convex functions: basic operations and continuity
  3.3 The Fenchel conjugate
  3.4 Differential properties
  3.5 Directional derivative
  3.6 The value function
  3.7 Duality and subdifferential calculus
    3.7.1 Fenchel-Rockafellar duality
    3.7.2 Lagrangian Duality
  3.8 Moreau-Yosida envelope and the proximal map
  3.9 Orthogonally invariant functions

Chapter 1

Review of Fundamentals

1.1 Inner products and linear maps

Throughout, we fix a Euclidean space $\mathbf{E}$, meaning that $\mathbf{E}$ is a finite-dimensional real vector space endowed with an inner product $\langle\cdot,\cdot\rangle$. Recall that an inner product on $\mathbf{E}$ is an assignment $\langle\cdot,\cdot\rangle\colon \mathbf{E}\times\mathbf{E}\to\mathbb{R}$ satisfying the following three properties for all $x,y,z\in\mathbf{E}$ and scalars $a,b\in\mathbb{R}$:

(Symmetry) $\langle x,y\rangle=\langle y,x\rangle$

(Bilinearity) $\langle ax+by,z\rangle=a\langle x,z\rangle+b\langle y,z\rangle$

(Positive definiteness) $\langle x,x\rangle\geq 0$, and equality $\langle x,x\rangle=0$ holds if and only if $x=0$.

The most familiar example is the Euclidean space of $n$-dimensional column vectors $\mathbb{R}^n$, which unless otherwise stated we always equip with the dot product $\langle x,y\rangle:=\sum_{i=1}^{n}x_iy_i$. One can equivalently write $\langle x,y\rangle=x^Ty$. A basic result of linear algebra shows that all Euclidean spaces $\mathbf{E}$ can be identified with $\mathbb{R}^n$ for some integer $n$, once an orthonormal basis is chosen. Though such a basis-specific interpretation can be useful, it is often distracting, with the indices hiding the underlying geometry. Consequently, it is often best to think coordinate-free.

The space of real $m\times n$ matrices $\mathbb{R}^{m\times n}$ furnishes another example of a Euclidean space, which we always equip with the trace product $\langle X,Y\rangle:=\operatorname{tr} X^TY$. Some arithmetic shows the equality $\langle X,Y\rangle=\sum_{i,j}X_{ij}Y_{ij}$. Thus the trace product on $\mathbb{R}^{m\times n}$ is nothing but the usual dot product on the matrices stretched out into long vectors. This viewpoint, however, is typically not very fruitful, and it is best to think of the trace product as a standalone object. An important Euclidean subspace of $\mathbb{R}^{n\times n}$ is the space of real symmetric $n\times n$ matrices $\mathbf{S}^n$, along with the trace product $\langle X,Y\rangle:=\operatorname{tr} XY$.

For any linear mapping $\mathcal{A}\colon\mathbf{E}\to\mathbf{Y}$, there exists a unique linear mapping $\mathcal{A}^*\colon\mathbf{Y}\to\mathbf{E}$, called the adjoint, satisfying
$$\langle \mathcal{A}x,y\rangle=\langle x,\mathcal{A}^*y\rangle \qquad\text{for all points } x\in\mathbf{E},\ y\in\mathbf{Y}.$$
In the most familiar case of $\mathbf{E}=\mathbb{R}^n$ and $\mathbf{Y}=\mathbb{R}^m$, the matrix representing $\mathcal{A}^*$ is simply the transpose of the matrix representing $\mathcal{A}$.
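These identities are easy to sanity-check numerically. The following NumPy sketch (an illustration added here, not part of the original notes; the sizes and random seed are arbitrary) verifies the adjoint relation for a matrix map and the two expressions for the trace product.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6

# The linear map x -> Ax from R^n to R^m; its adjoint is
# represented by the transpose A^T.
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

# <Ax, y> = <x, A^T y>
assert np.isclose(np.dot(A @ x, y), np.dot(x, A.T @ y))

# The trace product <X, Y> = tr(X^T Y) equals the dot product of
# X and Y "stretched out into long vectors".
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))
assert np.isclose(np.trace(X.T @ Y), np.sum(X * Y))
print("adjoint and trace-product identities verified")
```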
Exercise 1.1. Given a collection of real $m\times n$ matrices $A_1,A_2,\ldots,A_l$, define the linear mapping $\mathcal{A}\colon\mathbb{R}^{m\times n}\to\mathbb{R}^l$ by setting
$$\mathcal{A}(X):=(\langle A_1,X\rangle,\langle A_2,X\rangle,\ldots,\langle A_l,X\rangle).$$
Show that the adjoint is the mapping $\mathcal{A}^*y=y_1A_1+y_2A_2+\cdots+y_lA_l$.

Linear mappings $\mathcal{A}$ between $\mathbf{E}$ and itself are called linear operators, and are said to be self-adjoint if equality $\mathcal{A}=\mathcal{A}^*$ holds. Self-adjoint operators on $\mathbb{R}^n$ are precisely those operators that are representable as symmetric matrices. A self-adjoint operator $\mathcal{A}$ is positive semidefinite, denoted $\mathcal{A}\succeq 0$, whenever
$$\langle\mathcal{A}x,x\rangle\geq 0\qquad\text{for all } x\in\mathbf{E}.$$
Similarly, a self-adjoint operator $\mathcal{A}$ is positive definite, denoted $\mathcal{A}\succ 0$, whenever
$$\langle\mathcal{A}x,x\rangle>0\qquad\text{for all } 0\neq x\in\mathbf{E}.$$
A positive semidefinite linear operator $\mathcal{A}$ is positive definite if and only if $\mathcal{A}$ is invertible.

Consider a self-adjoint operator $\mathcal{A}$. A number $\lambda$ is an eigenvalue of $\mathcal{A}$ if there exists a vector $0\neq v\in\mathbf{E}$ satisfying $\mathcal{A}v=\lambda v$. Any such vector $v$ is called an eigenvector corresponding to $\lambda$. The Rayleigh-Ritz theorem shows that the following relation always holds:
$$\lambda_{\min}(\mathcal{A})\leq\frac{\langle\mathcal{A}u,u\rangle}{\langle u,u\rangle}\leq\lambda_{\max}(\mathcal{A})\qquad\text{for all } u\in\mathbf{E}\setminus\{0\},$$
where $\lambda_{\min}(\mathcal{A})$ and $\lambda_{\max}(\mathcal{A})$ are the minimal and maximal eigenvalues of $\mathcal{A}$, respectively. Consequently, an operator $\mathcal{A}$ is positive semidefinite if and only if $\lambda_{\min}(\mathcal{A})\geq 0$, and $\mathcal{A}$ is positive definite if and only if $\lambda_{\min}(\mathcal{A})>0$.

1.2 Norms

A norm on a vector space $\mathbf{V}$ is a function $\|\cdot\|\colon\mathbf{V}\to\mathbb{R}$ for which the following three properties hold for all points $x,y\in\mathbf{V}$ and scalars $a\in\mathbb{R}$:

(Absolute homogeneity) $\|ax\|=|a|\cdot\|x\|$

(Triangle inequality) $\|x+y\|\leq\|x\|+\|y\|$

(Positivity) Equality $\|x\|=0$ holds if and only if $x=0$.

The inner product in the Euclidean space $\mathbf{E}$ always induces a norm $\|x\|:=\sqrt{\langle x,x\rangle}$. Unless specified otherwise, the symbol $\|x\|$ for $x\in\mathbf{E}$ will always denote this induced norm. For example, the dot product on $\mathbb{R}^n$ induces the usual 2-norm $\|x\|_2=\sqrt{x_1^2+\cdots+x_n^2}$, while the trace product on $\mathbb{R}^{m\times n}$ induces the Frobenius norm $\|X\|_F=\sqrt{\operatorname{tr}(X^TX)}$.

Other important norms are the $\ell_p$-norms on $\mathbb{R}^n$:
$$\|x\|_p=\begin{cases}(|x_1|^p+\cdots+|x_n|^p)^{1/p}&\text{for } 1\leq p<\infty,\\ \max\{|x_1|,\ldots,|x_n|\}&\text{for } p=\infty.\end{cases}$$
The most notable of these are the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms.

For an arbitrary norm $\|\cdot\|$ on $\mathbf{E}$, the dual norm $\|\cdot\|^*$ on $\mathbf{E}$ is defined by
$$\|v\|^*:=\max\{\langle v,x\rangle : \|x\|\leq 1\}.$$
For $p,q\in[1,\infty]$, the $\ell_p$ and $\ell_q$ norms on $\mathbb{R}^n$ are dual to each other whenever $p^{-1}+q^{-1}=1$. For an arbitrary norm $\|\cdot\|$ on $\mathbf{E}$, the Cauchy-Schwarz inequality holds:
$$|\langle x,y\rangle|\leq\|x\|\cdot\|y\|^*.$$

Exercise 1.2. Given a positive definite linear operator $\mathcal{A}$ on $\mathbf{E}$, show that the assignment $\langle v,w\rangle_{\mathcal{A}}:=\langle\mathcal{A}v,w\rangle$ is an inner product on $\mathbf{E}$, with the induced norm $\|v\|_{\mathcal{A}}=\sqrt{\langle\mathcal{A}v,v\rangle}$. Show that the dual norm with respect to the original inner product is $\|v\|_{\mathcal{A}}^*=\|v\|_{\mathcal{A}^{-1}}=\sqrt{\langle\mathcal{A}^{-1}v,v\rangle}$.

All norms on $\mathbf{E}$ are "equivalent" in the sense that any two are within a constant factor of each other. More precisely, for any two norms $\rho_1(\cdot)$ and $\rho_2(\cdot)$, there exist constants $\alpha,\beta>0$ satisfying
$$\alpha\rho_1(x)\leq\rho_2(x)\leq\beta\rho_1(x)\qquad\text{for all } x\in\mathbf{E}.$$
Case in point, for any vector $x\in\mathbb{R}^n$, the relations hold:
$$\|x\|_2\leq\|x\|_1\leq\sqrt{n}\,\|x\|_2,$$
$$\|x\|_\infty\leq\|x\|_2\leq\sqrt{n}\,\|x\|_\infty,$$
$$\|x\|_\infty\leq\|x\|_1\leq n\,\|x\|_\infty.$$
For our purposes, the term "equivalent" is a misnomer: the proportionality constants $\alpha,\beta$ strongly depend on the (often enormous) dimension of the vector space $\mathbf{E}$. Hence measuring quantities in different norms can yield strikingly different conclusions.

Consider a linear map $\mathcal{A}\colon\mathbf{E}\to\mathbf{Y}$, and norms $\|\cdot\|_a$ on $\mathbf{E}$ and $\|\cdot\|_b$ on $\mathbf{Y}$. We define the induced matrix norm
$$\|\mathcal{A}\|_{a,b}:=\max_{x\colon\|x\|_a\leq 1}\|\mathcal{A}x\|_b.$$
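As a quick numerical illustration of the Rayleigh-Ritz bounds, the dual-norm inequality, and the equivalence relations above, here is a minimal NumPy sketch added for illustration (the dimension, seed, and the choice of the dual pair $(p,q)=(1,\infty)$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Rayleigh-Ritz: the Rayleigh quotient of a symmetric matrix
# lies between its extreme eigenvalues.
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                  # a self-adjoint operator on R^n
u = rng.standard_normal(n)
quotient = (u @ A @ u) / (u @ u)
lam = np.linalg.eigvalsh(A)        # eigenvalues in ascending order
assert lam[0] <= quotient <= lam[-1]

# Cauchy-Schwarz with dual norms: |<x, y>| <= ||x||_p ||y||_q
# whenever 1/p + 1/q = 1, here for (p, q) = (1, inf).
x = rng.standard_normal(n)
y = rng.standard_normal(n)
assert abs(x @ y) <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf)

# Norm equivalence on R^n, with dimension-dependent constants.
assert np.linalg.norm(x, 2) <= np.linalg.norm(x, 1) <= np.sqrt(n) * np.linalg.norm(x, 2)
assert np.linalg.norm(x, np.inf) <= np.linalg.norm(x, 2) <= np.sqrt(n) * np.linalg.norm(x, np.inf)
print("norm inequalities verified")
```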
The reader should verify the inequality
$$\|\mathcal{A}x\|_b\leq\|\mathcal{A}\|_{a,b}\|x\|_a.$$
In particular, if $\|\cdot\|_a$ and $\|\cdot\|_b$ are the norms induced by the inner products in $\mathbf{E}$ and $\mathbf{Y}$, then the corresponding matrix norm is called the operator norm of $\mathcal{A}$ and will be denoted simply by $\|\mathcal{A}\|$. In the case $\mathbf{E}=\mathbf{Y}$ and $a=b$, we simply use the notation $\|\mathcal{A}\|_a$ for the induced norm.

Exercise 1.3. Equip $\mathbb{R}^n$ and $\mathbb{R}^m$ with the $\ell_p$-norms. Then for any matrix $A\in\mathbb{R}^{m\times n}$, show the equalities
$$\|A\|_1=\max_{j=1,\ldots,n}\|A_{\bullet j}\|_1,$$
$$\|A\|_\infty=\max_{i=1,\ldots,m}\|A_{i\bullet}\|_1,$$
where $A_{\bullet j}$ and $A_{i\bullet}$ denote the $j$'th column and $i$'th row of $A$, respectively.

1.3 Eigenvalue and singular value decompositions of matrices

The symbol $\mathbf{S}^n$ will denote the set of $n\times n$ real symmetric matrices, while $O(n)$ will denote the set of $n\times n$ real orthogonal matrices -- those satisfying $X^TX=XX^T=I$. Any symmetric matrix $A\in\mathbf{S}^n$ admits an eigenvalue decomposition, meaning a factorization of the form $A=U\Lambda U^T$ with $U\in O(n)$ and $\Lambda\in\mathbf{S}^n$ a diagonal matrix. The diagonal elements of $\Lambda$ are precisely the eigenvalues of $A$, and the columns of $U$ are corresponding eigenvectors.

More generally, any matrix $A\in\mathbb{R}^{m\times n}$ admits a singular value decomposition, meaning a factorization of the form $A=UDV^T$, where $U\in O(m)$ and $V\in O(n)$ are orthogonal matrices and $D\in\mathbb{R}^{m\times n}$ is a diagonal matrix with nonnegative diagonal entries. The diagonal elements of $D$ are uniquely defined and are called the singular values of $A$. Supposing without loss of generality $m\leq n$, the singular values of $A$ are precisely the square roots of the eigenvalues of $AA^T$.
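Both the induced-norm formulas of Exercise 1.3 and the stated relationship between singular values and eigenvalues can be checked numerically. The sketch below is an added illustration (sizes and seed are arbitrary); it relies on the fact that `numpy.linalg.norm` with `ord=1` and `ord=inf` computes exactly these induced matrix norms.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5                        # m <= n, as in the text
A = rng.standard_normal((m, n))

# Exercise 1.3: the induced l_1 norm is the largest absolute column
# sum; the induced l_inf norm is the largest absolute row sum.
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())

# Singular value decomposition A = U D V^T.
U, s, Vt = np.linalg.svd(A)
D = np.zeros((m, n))
D[:m, :m] = np.diag(s)
assert np.allclose(A, U @ D @ Vt)

# With m <= n, the singular values are the square roots of the
# eigenvalues of A A^T.
eig = np.linalg.eigvalsh(A @ A.T)  # ascending order
assert np.allclose(np.sort(s), np.sqrt(np.maximum(eig, 0.0)))
print("induced norms and SVD facts verified")
```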