Lecture 3: Convex Sets and Functions 3.1 Convex Sets
Total Page:16
File Type:pdf, Size:1020Kb
EE 227A: Convex Optimization and Applications January 24, 2012 Lecture 3: Convex Sets and Functions Lecturer: Laurent El Ghaoui Reading assignment: Chapters 2 (except x2.6) and sections 3.1, 3.2, 3.3 of BV. You can also look at section 3.1 of the web textbook. 3.1 Convex Sets Definition. A subset C of Rn is convex if and only if it contains the line segment between any two points in it: 8 x1; x2 2 C; 8 θ1 ≥ 0; θ2 ≥ 0; θ1 + θ2 = 1 : θ1x1 + θ2x2 2 C: Some important special cases of convex sets are the following. • The set is said to be an affine subspace if it contains the entire line passing through any two points. This corresponds to the condition above, with θ1; θ2 arbitrary. Subspaces and affine subspaces are convex. • The set is said to be a convex cone if the condition above holds, but with the restriction θ1 + θ2 = 1 removed. Examples. • The convex hull of a set of points fx1; : : : ; xmg is defined as ( m m ) X m X Co(x1; : : : ; xm) := λixi : λ 2 R+ ; λi = 1 ; i=1 i=1 and is convex. The conic hull: ( m ) X m λixi : λ 2 R+ i=1 is a convex cone. • For a 2 Rn, and b 2 R, the hyperplane H = fx : aT x = bg is affine. The half-space fx : aT x ≤ bg is convex. 3-1 EE 227A Lecture 3 | January 24, 2012 Sp'12 n×n n • For a square, non-singular matrix R 2 R , and xc 2 R , the ellipsoid fxc + Ru : kuk2 ≤ 1g is convex. (The center of the epllipsoid is xc, and you can think of R as the "radius".) With P = RRT , and assuming R is full rank, we can describe the ellipsoid as T −1 x :(x − xc) P (x − xc) ≤ 1 : • A polyhedron is a set described by a finite number of affine inequalities and equalities: P = fx : Ax ≤ b; Cx = dg ; where A; C are matrices, b; d are vectors, and inequalities are understood component- wise. Sometimes bounded polyhedra are referred to as polytopes. • The probability simplex ( n ) n X p 2 R+ : pi = 1 i=1 is a special case of a polyhedron, and is useful to describe discrete probabilities. • The second-order cone n+1 (x; t) 2 R : t ≥ kxk2 (3.1) is a convex cone. It is sometimes called \ice-cream cone", for obvious reasons. (We will prove the convexity of this set later.) • The positive semi-definite cone n T n×n S+ := X = X 2 R : X 0 is a convex cone. (Again, we will prove the convexity of this set later.) Support and indicator functions. For a given set S, the function T φS(x) := max x u u2S is called the support function of S. If S is the unit ball for some norm: S = fu : kuk ≤ 1g, then the support function of S is the dual norm. Another important function associated with S is the indicator function 0 x 2 S; I (x) = S +1 x 62 S: 3-2 EE 227A Lecture 3 | January 24, 2012 Sp'12 Operations that preserve convexity. Two important operations that preserve convexity are: • Intersection: the intersection of a (possibly infinite) family of convex sets is convex. n We can use this property to prove that the semi-definite cone S+ is convex, since n T n×n n T S+ = X = X 2 R : 8 z 2 R ; z Xz ≥ 0 ; from which we see that the set is the intersection of the subspace of symmetric matrices with a set described by an infinite number of linear inequalities of the form zT Xz ≥ 0, indexed by z 2 Rn. Likewise, the second-order cone defined in (3.1) is convex, since the T condition t ≥ kxk2 is equivalent to the infinite number of affine inequalities t ≥ u x, kuk2 ≤ 1. • Affine transformation: If a function is affine (that is, it is the sum of a linear function and a constant), and C is convex, then the set f(C) := ff(x): x 2 Cg is convex. A particular example is projection on a subspace, which preserves convexity. Separation theorems. There are many versions of separation theorems in convex analy- sis. One of them is the separating hyperplane theorem: Theorem 1 (Separating hyperplane). If C, D are two convex subsets of Rn that do not intersect, then there is an hyperplane that separates them, that is, there exit a 2 Rn, a 6= 0, and b 2 R, such that aT x ≤ b for every x 2 C, and aT x ≥ b for every x 2 D. Another result involves the separation of a set from a point on its boundary: Theorem 2 (Supporting hyperplane). If C ⊆ Rn is convex and non-empty, then for any x0 at the boundary of C, there exist a supporting hyperplane to C at x0, meaning that there n T exist a 2 R , a 6= 0, such that a (x − x0) ≤ 0 for every x 2 C. 3.2 Convex Functions Domain of a function. The domain of a function f : Rn ! R is the set domf ⊆ Rn over which f is well-defined, in other words: dom f := fx 2 Rn : −∞ < f(x) < +1g: Here are some examples: • The function with values f(x) = log(x) has domain dom f = R++. n • The function with values f(X) = log det(X) has domain dom f = S++ (the set of positive-definite matrices). 3-3 EE 227A Lecture 3 | January 24, 2012 Sp'12 Definition of convexity. A function f : Rn ! R is convex if i) dom f is convex; ii) 8 x; y 2 dom f and 8 λ 2 [0; 1], f (λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y). Note that the convexity of the domain is required. For example, the function f : R ! R defined as x if x 62 [−1; 1] f(x) = +1 otherwise is not convex, although is it linear (hence, convex) on its domain ] − 1; −1) [ (1; +1[. We say that a function is concave if −f is convex. Here are some examples: • The support function of any set is convex. • The indicator function of a set is convex if and only if the set is convex. T T n • The quadratic function f(x) = x P x + 2q x + r, with P 2 S++, is convex. (For a proof, see later.) • The function f : R ! R defined as f(x) = 1=x for x > 0 and f(x) = +1 is convex. Alternate characterizations of convexity. Let f : Rn ! R. The following are equiva- lent conditions for f to be convex. • Epigraph condition: f is convex if and only if its epigraph epif := (x; t) 2 Rn+1 : t ≥ f(x) is convex. We can us this result to prove for example, that the largest eigenvalue n function λmax : S ! R, which to a given n × n symmetric matrix X associates its largest eigenvalue, is convex, since the condition λmax(X) ≤ t is equivalent to the n condition that tI − X 2 S+. • Restriction to a line: The function f is convex if and only if its restriction to any line n n is convex, meaning that for every x0 2 R , and v 2 R , the function g(t) := f(x0 + tv) is convex. For example, the function f(X) = log det X is convex. (Prove this as an exercise.) You can also use this to prove that the quadratic function f(x) = xT P x + 2qT x + r is convex if and only if P 0. • First-order condition: If f is differentiable (that is, domf is open and the gradient exists everywhere on the domain), then f is convex if and only if 8 x; y : f(y) ≥ f(x) + rf(x)T (y − x): The geometric interpretation is that the graph of f is bounded below everywhere by anyone of its tangents. 3-4 EE 227A Lecture 3 | January 24, 2012 Sp'12 • Second-order condition: If f is twice differentiable, then it is convex if and only if its Hessian r2f is positive semi-definite everywhere. This is perhaps the most commonly known characterization of convexity. For example, the function f(x; t) = xT x=t with domain f(x; t): t > 0g, is convex. Pn (Check this!) Other examples include the log-sum-exp function, f(x) = log i=1 exp xi, and the quadratic function alluded to above. Operations that preserve convexity. • The nonnegative weighted sum of convex functions is convex. • The composition with an affine function preserves convexity: if A 2 Rm×n, b 2 Rm and f : Rm ! R is convex, then the function g : Rn ! R with values g(x) = f(Ax + b) is convex. • The pointwise maximum of a family of convex functions is convex: if (fα)α2A is a family of convex functions index by α, then the function f(x) := max fα(x) α2A is convex. For example, the dual norm x ! max yT x y :kyk≤1 is convex, as the maximum of convex (in fact, linear) functions (indexed by the vector y). Another example is the largest singular value of a matrix A: f(A) = σmax(A) = n maxx : kxk2=1 kAxk2. Here, each function (indexed by x 2 R ) A ! kAxk2 is convex, since it is the composition of the Euclidean norm (a convex function) with an affine function A ! Ax. Also, this can be used to prove convexity of the function we introduced in lecture 2, k n X T X n kxk1;k := jxj[i] = max u jxj : ui = k; u 2 f0; 1g ; u i=1 i=1 where we use the fact that for any u feasible for the maximization problem, the function x ! uT jxj is convex (since u ≥ 0).