A Tutorial on Convex Optimization
Haitham Hindi, Palo Alto Research Center (PARC), Palo Alto, California. Email: [email protected]
Abstract—In recent years, convex optimization has become a computational tool of central importance in engineering, thanks to its ability to solve very large, practical engineering problems reliably and efficiently. The goal of this tutorial is to give an overview of the basic concepts of convex sets, functions and convex optimization problems, so that the reader can more readily recognize and formulate engineering problems using modern convex optimization. This tutorial coincides with the publication of the new book on convex optimization by Boyd and Vandenberghe [7], who have made available a large amount of free course material and links to freely available code. These can be downloaded and used immediately by the audience both for self-study and to solve real problems.

I. INTRODUCTION

Convex optimization can be described as a fusion of three disciplines: optimization [22], [20], [1], [3], [4], convex analysis [19], [24], [27], [16], [13], and numerical computation [26], [12], [10], [17]. It has recently become a tool of central importance in engineering, enabling the solution of very large, practical engineering problems reliably and efficiently. In some sense, convex optimization is providing new indispensable computational tools today, which naturally extend our ability to solve problems such as least squares and linear programming to a much larger and richer class of problems.

Our ability to solve these new types of problems comes from recent breakthroughs in algorithms for solving convex optimization problems [18], [23], [29], [30], coupled with the dramatic improvements in computing power, both of which have happened only in the past decade or so. Today, new applications of convex optimization are constantly being reported from almost every area of engineering, including: control, signal processing, networks, circuit design, communication, information theory, computer science, operations research, economics, statistics, and structural design. See [7], [2], [5], [6], [9], [11], [15], [8], [21], [14], [28] and the references therein.

The objectives of this tutorial are:
1) to show that there are straightforward, systematic rules and facts which, when mastered, allow one to quickly deduce the convexity (and hence tractability) of many problems, often by inspection;
2) to review and introduce some canonical optimization problems, which can be used to model problems and for which reliable optimization code can be readily obtained;
3) to emphasize the modeling and formulation aspect; we do not discuss the aspects of writing custom codes.

We assume that the reader has a working knowledge of linear algebra and vector calculus, and some (minimal) exposure to optimization.

Our presentation is quite informal. Rather than provide details for all the facts and claims presented, our goal is instead to give the reader a flavor for what is possible with convex optimization. Complete details can be found in [7], from which all the material presented here is taken. Thus we encourage the reader to skip sections that might not seem clear and continue reading; the topics are not all interdependent.

Motivation: A vast number of design problems in engineering can be posed as constrained optimization problems of the form:

   minimize    f0(x)
   subject to  fi(x) ≤ 0,  i = 1, ..., m        (1)
               hi(x) = 0,  i = 1, ..., p

where x is a vector of decision variables, and the functions f0, fi and hi are, respectively, the cost, the inequality constraints, and the equality constraints. However, such problems can be very hard to solve in general, especially when the number of decision variables in x is large. There are several reasons for this difficulty: First, the problem "terrain" may be riddled with local optima. Second, it might be very hard to find a feasible point (i.e., an x which satisfies all the equalities and inequalities); in fact the feasible set, which needn't even be fully connected, could be empty. Third, stopping criteria used in general optimization algorithms are often arbitrary. Fourth, optimization algorithms might have very poor convergence rates. Fifth, numerical problems could cause the minimization algorithm to stop altogether or wander.

It has been known for a long time [19], [3], [16], [13] that if the fi are all convex and the hi are affine, then the first three problems disappear: any local optimum is, in fact, a global optimum; feasibility of convex optimization problems can be determined unambiguously, at least in principle; and very precise stopping criteria are available using duality. However, convergence rate and numerical sensitivity issues still remained a potential problem.

It was not until the late '80s and '90s that researchers in the former Soviet Union and United States discovered that if, in addition to convexity, the fi satisfied a property known as self-concordance, then issues of convergence and numerical sensitivity could be avoided using interior point methods [18], [23], [29], [30], [25]. The self-concordance property is satisfied by a very large set of important functions used in engineering. Hence, it is now possible to solve a large class of convex optimization problems in engineering with great efficiency.
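As an illustrative aside (not from the paper), the first difficulty listed above — a terrain riddled with local optima — can be made concrete with a tiny grid scan. The two example functions and the helper below are our own choices for illustration:

```python
# Illustrative sketch: count strict interior local minima of a function on a
# uniform grid, to contrast a nonconvex "terrain" with a convex one.
def local_minima(f, lo=-3.0, hi=3.0, n=601):
    """Count grid points that are strictly lower than both neighbors."""
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    return sum(1 for i in range(1, n - 1) if ys[i] < ys[i - 1] and ys[i] < ys[i + 1])

nonconvex = lambda x: x**4 - 3 * x**2 + x   # two valleys: local descent can get trapped
convex    = lambda x: x**2 + x              # one valley: any local minimum is global

print(local_minima(nonconvex))  # 2
print(local_minima(convex))     # 1
```

A descent method started in the wrong valley of the nonconvex function converges to the poorer of its two minima; for the convex cost this failure mode cannot occur.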
II. CONVEX SETS

In this section we list some important convex sets and operations. It is important to note that some of these sets have different representations. Picking the right representation can make the difference between a tractable problem and an intractable one.

We will be concerned only with optimization problems whose decision variables are vectors in R^n or matrices in R^{m×n}. Throughout the paper, we will make frequent use of informal sketches to help the reader develop an intuition for the geometry of convex optimization.

A function f : R^n → R^m is affine if it has the form linear plus constant: f(x) = Ax + b. If F is a matrix valued function, i.e., F : R^n → R^{p×q}, then F is affine if it has the form

   F(x) = A0 + x1 A1 + ··· + xn An

where Ai ∈ R^{p×q}. Affine functions are sometimes loosely referred to as linear.

Recall that S ⊆ R^n is a subspace if it contains the plane through any two of its points and the origin, i.e.,

   x, y ∈ S, λ, µ ∈ R  ⟹  λx + µy ∈ S.

Two common representations of a subspace are as the range of a matrix

   range(A) = {Aw | w ∈ R^q} = {w1 a1 + ··· + wq aq | wi ∈ R}

where A = [a1 ··· aq], or as the nullspace of a matrix

   nullspace(B) = {x | Bx = 0} = {x | b1^T x = 0, ..., bp^T x = 0}

where B = [b1 ··· bp]^T. As an example, let S^n = {X ∈ R^{n×n} | X = X^T} denote the set of symmetric matrices. Then S^n is a subspace, since symmetric matrices are closed under addition. Another way to see this is to note that S^n can be written as {X ∈ R^{n×n} | Xij = Xji, ∀i, j}, which is the nullspace of the linear function X − X^T.

A set S ⊆ R^n is affine if it contains the line through any two points in it, i.e.,

   x, y ∈ S, λ, µ ∈ R, λ + µ = 1  ⟹  λx + µy ∈ S.

[Figure: points λx + (1 − λ)y on the line through x and y, shown for λ = 1.5, λ = 0.6, and λ = −0.5.]

Geometrically, an affine set is simply a subspace which is not necessarily centered at the origin. Two common representations of an affine set are: the range of an affine function

   S = {Az + b | z ∈ R^q},

or the solution set of a system of linear equalities:

   S = {x | b1^T x = d1, ..., bp^T x = dp} = {x | Bx = d}.

A set S ⊆ R^n is a convex set if it contains the line segment joining any two of its points, i.e.,

   x, y ∈ S, λ, µ ≥ 0, λ + µ = 1  ⟹  λx + µy ∈ S.

[Figure: a convex set, where the segment joining any points x and y stays inside, and a nonconvex set, where it does not.]

Geometrically, we can think of convex sets as always bulging outward, with no dents or kinks in them. Clearly subspaces and affine sets are convex, since their definitions subsume convexity.

A set S ⊆ R^n is a convex cone if it contains all rays passing through its points which emanate from the origin, as well as all line segments joining any points on those rays, i.e.,

   x, y ∈ S, λ, µ ≥ 0  ⟹  λx + µy ∈ S.

Geometrically, x, y ∈ S means that S contains the entire "pie slice" between x and y.

[Figure: the pie slice between the rays through x and y.]

The nonnegative orthant R^n_+ is a convex cone. The set S^n_+ = {X ∈ S^n | X ⪰ 0} of symmetric positive semidefinite (PSD) matrices is also a convex cone, since any positive combination of semidefinite matrices is semidefinite. Hence we call S^n_+ the positive semidefinite cone.

A convex cone K ⊆ R^n is said to be proper if it is closed, has nonempty interior, and is pointed, i.e., there is no line in K. A proper cone K defines a generalized inequality ⪯_K in R^n:

   x ⪯_K y  ⟺  y − x ∈ K

(with the strict version x ≺_K y ⟺ y − x ∈ int K). This formalizes our use of the ⪯ symbol:
• K = R^n_+: x ⪯_K y means xi ≤ yi (componentwise vector inequality);
• K = S^n_+: X ⪯_K Y means Y − X is PSD.
These are so common that we drop the K.

Given points xi ∈ R^n and scalars θi ∈ R, the point y = θ1 x1 + ··· + θk xk is said to be a
• linear combination for any real θi;
• affine combination if Σi θi = 1;
• convex combination if Σi θi = 1 and θi ≥ 0;
• conic combination if θi ≥ 0.
The linear (resp. affine, convex, conic) hull of a set S is the set of all linear (resp. affine, convex, conic) combinations of points from S, and is denoted span(S) (resp. Aff(S), Co(S), Cone(S)). It can be shown that this is the smallest such set containing S. As an example, consider the set S = {(1,0,0), (0,1,0), (0,0,1)}. Then span(S) is R^3; Aff(S) is the hyperplane passing through the three points; Co(S) is the unit simplex, which is the triangle joining the vectors along with all the points inside it; and Cone(S) is the nonnegative orthant R^3_+.

Recall that a hyperplane, represented as {x | a^T x = b} (a ≠ 0), is in general an affine set, and is a subspace if b = 0. Another useful representation of a hyperplane is {x | a^T (x − x0) = 0}, where a is the normal vector and x0 lies on the hyperplane. Hyperplanes are convex, since they contain all lines (and hence segments) joining any of their points.

A halfspace, described as {x | a^T x ≤ b} (a ≠ 0), is generally convex and is a convex cone if b = 0. Another useful representation is {x | a^T (x − x0) ≤ 0}, where a is the (outward) normal vector and x0 lies on the boundary. Halfspaces are convex.

[Figure: the hyperplane a^T x = b with normal a at the point x0, dividing R^n into the halfspaces a^T x ≥ b and a^T x ≤ b.]

We now come to a very important fact about properties which are preserved under intersection: let A be an arbitrary index set (possibly uncountably infinite) and {Sα | α ∈ A} a collection of sets; then

   Sα is a subspace (resp. affine, convex, a convex cone) for every α ∈ A
   ⟹  ∩_{α∈A} Sα is a subspace (resp. affine, convex, a convex cone).

In fact, every closed convex set S is the (usually infinite) intersection of the halfspaces which contain it, i.e., S = ∩ {H | H halfspace, S ⊆ H}. For example, another way to see that S^n_+ is a convex cone is to recall that a matrix X ∈ S^n is positive semidefinite if z^T X z ≥ 0 for all z ∈ R^n. Thus we can write

   S^n_+ = ∩_{z ∈ R^n} { X ∈ S^n | z^T X z = Σ_{i,j} zi zj Xij ≥ 0 }.

Now observe that the summation above is actually linear in the components of X, so S^n_+ is the infinite intersection of halfspaces containing the origin (which are convex cones) in S^n.

We continue with our listing of useful convex sets. A polyhedron is the intersection of a finite number of halfspaces:

   P = {x | ai^T x ≤ bi, i = 1, ..., k} = {x | Ax ⪯ b}

where ⪯ above means componentwise inequality.
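The combinations and hulls above are easy to experiment with numerically. The short sketch below (helper names are ours, for illustration only) checks on the example set S of unit vectors that a conic combination lands in Cone(S), the nonnegative orthant, and a convex combination lands in Co(S), the unit simplex:

```python
# Illustrative check: conic and convex combinations of the unit vectors
# stay in the nonnegative orthant and the probability simplex, respectively.
def combine(theta, points):
    """Return sum_i theta_i * x_i for the given weights theta_i."""
    dim = len(points[0])
    return [sum(t * p[k] for t, p in zip(theta, points)) for k in range(dim)]

def in_orthant(x, tol=1e-12):
    return all(xi >= -tol for xi in x)

def in_simplex(x, tol=1e-12):
    return in_orthant(x, tol) and abs(sum(x) - 1.0) <= tol

pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # the set S from the text
y_conic  = combine([2.0, 0.5, 3.0], pts)   # conic: theta_i >= 0
y_convex = combine([0.2, 0.3, 0.5], pts)   # convex: theta_i >= 0, sum = 1
print(in_orthant(y_conic), in_simplex(y_convex))  # True True
```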
[Figure: a polyhedron P formed by five halfspaces with outward normals a1, ..., a5.]

A bounded polyhedron is called a polytope, which also has the alternative representation P = Co{v1, ..., vN}, where {v1, ..., vN} are its vertices. For example, the nonnegative orthant R^n_+ = {x ∈ R^n | x ⪰ 0} is a polyhedron, while the probability simplex {x ∈ R^n | x ⪰ 0, Σi xi = 1} is a polytope.

If f is a norm, then the norm ball B = {x | f(x − xc) ≤ 1} is convex, and the norm cone C = {(x, t) | f(x) ≤ t} is a convex cone. Perhaps the most familiar norms are the ℓp norms on R^n:

   ‖x‖p = (Σi |xi|^p)^{1/p},  p ≥ 1;    ‖x‖∞ = maxi |xi|.

[Figure: the ℓp norm balls in R^2 for p = 1, 1.5, 2, 3, and ∞, nested between the diamond (p = 1) and the square (p = ∞) with corners (±1, ±1).]

Another workhorse of convex optimization is the ellipsoid. In addition to their computational convenience, ellipsoids can be used as crude models or approximations for other convex sets. In fact, it can be shown that any bounded convex set in R^n can be approximated to within a factor of n by an ellipsoid.

There are many representations for ellipsoids. For example, we can use

   E = {x | (x − xc)^T A^{-1} (x − xc) ≤ 1}

where the parameters are A = A^T ≻ 0 and the center xc ∈ R^n. In this case the semiaxis lengths are √λi, where λi are the eigenvalues of A; the semiaxis directions are the eigenvectors of A; and the volume is proportional to (det A)^{1/2}.

[Figure: an ellipsoid centered at xc with semiaxis lengths √λ1 and √λ2.]

However, depending on the problem, other descriptions might be more appropriate, such as (with ‖u‖ = √(u^T u)):
• image of the unit ball under an affine transformation: E = {Bu + xc | ‖u‖ ≤ 1}; vol. ∝ det B;
• preimage of the unit ball under an affine transformation: E = {x | ‖Ax − b‖ ≤ 1}; vol. ∝ det A^{-1};
• sublevel set of a nondegenerate quadratic: E = {x | f(x) ≤ 0} where f(x) = x^T Cx + 2 d^T x + e with C = C^T ≻ 0 and e − d^T C^{-1} d < 0; vol. ∝ (det C)^{-1/2}.
It is an instructive exercise to convert among these representations.

The second-order cone is the norm cone associated with the Euclidean norm:

   S = {(x, t) | √(x^T x) ≤ t}
     = {(x, t) | [x; t]^T [I 0; 0 −1] [x; t] ≤ 0, t ≥ 0}.

The image (or preimage) under an affine transformation of a convex set is convex, i.e., if S, T are convex, then so are the sets

   f^{-1}(S) = {x | Ax + b ∈ S},    f(T) = {Ax + b | x ∈ T}.

An example is coordinate projection: {x | (x, y) ∈ S for some y}. As another example, a constraint of the form

   ‖Ax + b‖2 ≤ c^T x + d,

where A ∈ R^{k×n}, is a second-order cone constraint, since it is the same as requiring the affine function (Ax + b, c^T x + d) to lie in the second-order cone in R^{k+1}. Similarly, if A0, A1, ..., Am ∈ S^n, the solution set of the linear matrix inequality (LMI)

   F(x) = A0 + x1 A1 + ··· + xm Am ⪰ 0

is convex (the preimage of the semidefinite cone under an affine function).
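As a quick numeric sanity check (the helper names are ours, illustrative only), the second-order cone defined above is convex: every point on a segment between two members remains in the cone. The sketch samples such a segment:

```python
import math

# Illustrative check: the second-order cone {(x, t) : ||x||_2 <= t} contains
# the segment between any two of its members, as convexity requires.
def in_soc(x, t, tol=1e-9):
    """Membership test for the second-order cone."""
    return math.sqrt(sum(xi * xi for xi in x)) <= t + tol

def segment_in_soc(p, q, steps=20):
    """Sample the segment between members p = (x, t) and q = (y, s)."""
    (x, t), (y, s) = p, q
    for k in range(steps + 1):
        lam = k / steps
        z = [lam * a + (1 - lam) * b for a, b in zip(x, y)]
        u = lam * t + (1 - lam) * s
        if not in_soc(z, u):
            return False
    return True

p = ([3.0, 4.0], 5.0)   # boundary point: ||(3, 4)|| = 5
q = ([0.0, 1.0], 2.0)   # interior point: ||(0, 1)|| = 1 < 2
print(in_soc(*p), in_soc(*q), segment_in_soc(p, q))  # True True True
```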
A linear-fractional (or projective) function f : R^m → R^n has the form

   f(x) = (Ax + b) / (c^T x + d)

with domain dom f = H = {x | c^T x + d > 0}. If C is a convex set, then its linear-fractional transformation f(C) is also convex. This is because linear-fractional transformations preserve line segments: for x, y ∈ H,

   f([x, y]) = [f(x), f(y)].

[Figure: a segment [x1, x2] and the segment [f(x1), f(x2)] that is its image under a linear-fractional map, together with points x3, x4 and their images.]

Two further properties are helpful in visualizing the geometry of convex sets. The first is the separating hyperplane theorem, which states that if S, T ⊆ R^n are convex and disjoint (S ∩ T = ∅), then there exists a hyperplane {x | a^T x − b = 0} which separates them.

[Figure: disjoint convex sets S and T separated by the hyperplane a^T x = b.]
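The separating hyperplane theorem can be illustrated with a hand-built example (not an algorithm from the paper; the sets and the hyperplane are chosen by inspection, not computed): two disjoint unit balls S and T in R^2 and the hyperplane x1 = 2 between them.

```python
import math

# Illustration: S is the unit ball at the origin, T the unit ball at (4, 0);
# the hyperplane {x | a^T x = b} with a = (1, 0), b = 2 separates them.
a, b = [1.0, 0.0], 2.0

ok = True
for k in range(36):                      # sample each ball's boundary
    ang = 2 * math.pi * k / 36
    u = [math.cos(ang), math.sin(ang)]
    s = [0.0 + u[0], 0.0 + u[1]]         # boundary point of S
    t = [4.0 + u[0], 0.0 + u[1]]         # boundary point of T
    side_s = a[0] * s[0] + a[1] * s[1]   # a^T s
    side_t = a[0] * t[0] + a[1] * t[1]   # a^T t
    if not (side_s <= b and side_t >= b):
        ok = False
print(ok)  # True
```

Every sampled point of S satisfies a^T x ≤ b and every sampled point of T satisfies a^T x ≥ b, as the theorem guarantees some hyperplane must.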
The second property is the supporting hyperplane theorem, which states that there exists a supporting hyperplane at every point on the boundary of a convex set, where a supporting hyperplane {x | a^T x = a^T x0} supports S at x0 ∈ ∂S if

   x ∈ S  ⟹  a^T x ≤ a^T x0.

[Figure: a supporting hyperplane with normal a touching the convex set S at the boundary point x0.]

III. CONVEX FUNCTIONS

In this section, we introduce the reader to some important convex functions and techniques for verifying convexity. The objective is to sharpen the reader's ability to recognize convexity.

A. Convex functions

A function f : R^n → R is convex if its domain dom f is convex and for all x, y ∈ dom f, θ ∈ [0, 1],

   f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y);

f is concave if −f is convex.

[Figure: graphs of a convex function, a concave function, and a function that is neither.]

Here are some simple examples on R: x^2 is convex (dom f = R); log x is concave (dom f = R++); and f(x) = 1/x is convex (dom f = R++).

It is convenient to define the extension of a convex function f:

   f̃(x) = f(x) if x ∈ dom f;   +∞ if x ∉ dom f.

Note that f̃ still satisfies the basic definition for all x, y ∈ R^n, 0 ≤ θ ≤ 1 (as an inequality in R ∪ {+∞}). We will use the same symbol for f and its extension, i.e., we will implicitly assume convex functions are extended.

The epigraph of a function f is

   epi f = {(x, t) | x ∈ dom f, f(x) ≤ t}.

[Figure: the epigraph of f, the region on and above the graph of f.]

The (α-)sublevel set of a function f is

   Sα ≜ {x ∈ dom f | f(x) ≤ α}.

From the basic definition of convexity, it follows that f is a convex function if and only if its epigraph epi f is a convex set. It also follows that if f is convex, then its sublevel sets Sα are convex (the converse is false; see quasiconvex functions later).

The convexity of a differentiable function f : R^n → R can also be characterized by conditions on its gradient ∇f and Hessian ∇²f. Recall that, in general, the gradient yields a first-order Taylor approximation at x0:

   f(x) ≈ f(x0) + ∇f(x0)^T (x − x0).

We have the following first-order condition: f is convex if and only if for all x, x0 ∈ dom f,

   f(x) ≥ f(x0) + ∇f(x0)^T (x − x0),

i.e., the first-order approximation of f is a global underestimator.

[Figure: a convex f lying on or above its tangent f(x0) + ∇f(x0)^T (x − x0).]

To see why this is so, we can rewrite the condition above in terms of the epigraph of f: for all (x, t) ∈ epi f,

   [∇f(x0); −1]^T [x − x0; t − f(x0)] ≤ 0,

i.e., (∇f(x0), −1) defines a supporting hyperplane to epi f at (x0, f(x0)).

[Figure: the vector (∇f(x0), −1) as the normal of a supporting hyperplane to epi f.]

Recall that the Hessian of f, ∇²f, yields a second-order Taylor series expansion around x0:

   f(x) ≈ f(x0) + ∇f(x0)^T (x − x0) + (1/2)(x − x0)^T ∇²f(x0)(x − x0).

We have the following necessary and sufficient second-order condition: a twice differentiable function f is convex if and only if for all x ∈ dom f, ∇²f(x) ⪰ 0.

Convexity is also preserved by a number of operations, among them:
• pointwise supremum (maximum): fα convex ⟹ sup_{α∈A} fα convex (this corresponds to the intersection of epigraphs);

[Figure: epi max{f1, f2} as the intersection of the epigraphs of f1 and f2.]

• affine transformation of the domain: f convex ⟹ f(Ax + b) convex.

The following are important examples of how these further properties can be applied:
• piecewise-linear functions: f(x) = maxi {ai^T x + bi} is convex in x (epi f is a polyhedron);
• the maximum distance to any set, sup_{s∈S} ‖x − s‖, is convex in x ((x − s) is affine in x, ‖·‖ is a norm, so fs(x) = ‖x − s‖ is convex);
• expected value: f(x, u) convex in x ⟹ g(x) = E_u f(x, u) convex;
• f(x) = x[1] + x[2] + x[3] is convex on R^n (x[i] denotes the ith largest component of x). To see this, note that f(x) is the sum of the largest triple of components of x, so f can be written as f(x) = maxi ci^T x, where the ci are the set of all vectors with components zero except for three 1's;
• f(x) = −Σ_{i=1}^m log(bi − ai^T x) is convex (dom f = {x | ai^T x < bi, i = 1, ..., m}).
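The "sum of the three largest components" example above is easy to spot-check against the basic midpoint form of the convexity definition. The sketch below (our own, illustrative only) tests f((x + y)/2) ≤ (f(x) + f(y))/2 on random vectors:

```python
import random

# Illustrative midpoint check: f(x) = sum of the three largest components of x
# is convex, so the midpoint inequality must hold for every pair x, y.
def sum3largest(x):
    return sum(sorted(x, reverse=True)[:3])

random.seed(0)
ok = True
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(6)]
    y = [random.uniform(-1, 1) for _ in range(6)]
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    if sum3largest(mid) > (sum3largest(x) + sum3largest(y)) / 2 + 1e-12:
        ok = False
print(ok)  # True
```

A passing random check is of course no proof; the proof is the max-of-affine representation f(x) = maxi ci^T x given in the text.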