
A Tutorial on Convex Optimization

Haitham Hindi
Palo Alto Research Center (PARC), Palo Alto, California
email: [email protected]

Abstract—In recent years, convex optimization has become a computational tool of central importance in engineering, thanks to its ability to solve very large, practical engineering problems reliably and efficiently. The goal of this tutorial is to give an overview of the basic concepts of convex sets, functions and convex optimization problems, so that the reader can more readily recognize and formulate engineering problems using modern convex optimization. This tutorial coincides with the publication of the new book on convex optimization by Boyd and Vandenberghe [7], who have made available a large amount of free course material and links to freely available code. These can be downloaded and used immediately by the audience, both for self-study and to solve real problems.

I. INTRODUCTION

Convex optimization can be described as a fusion of three disciplines: optimization [22], [20], [1], [3], [4], convex analysis [19], [24], [27], [16], [13], and numerical computation [26], [12], [10], [17]. It has recently become a tool of central importance in engineering, enabling the solution of very large, practical engineering problems reliably and efficiently. In some sense, convex optimization is providing new indispensable computational tools today, which naturally extend our ability to solve problems such as least squares and linear programming to a much larger and richer class of problems.

Our ability to solve these new types of problems comes from recent breakthroughs in algorithms for solving convex optimization problems [18], [23], [29], [30], coupled with the dramatic improvements in computing power, both of which have happened only in the past decade or so. Today, new applications of convex optimization are constantly being reported from almost every area of engineering, including: control, signal processing, networks, circuit design, communication, information theory, computer science, operations research, economics, statistics, and structural design. See [7], [2], [5], [6], [9], [11], [15], [8], [21], [14], [28] and the references therein.

The objectives of this tutorial are:
1) to show that there are straightforward, systematic rules and facts which, when mastered, allow one to quickly deduce the convexity (and hence tractability) of many problems, often by inspection;
2) to review and introduce some canonical optimization problems, which can be used to model problems and for which reliable optimization code can be readily obtained;
3) to emphasize the modeling and formulation aspect; we do not discuss the aspects of writing custom codes.

We assume that the reader has a working knowledge of linear algebra and vector calculus, and some (minimal) exposure to optimization.

Our presentation is quite informal. Rather than provide details for all the facts and claims presented, our goal is instead to give the reader a flavor for what is possible with convex optimization. Complete details can be found in [7], from which all the material presented here is taken. Thus we encourage the reader to skip sections that might not seem clear and continue reading; the topics are not all interdependent.

Motivation

A vast number of design problems in engineering can be posed as constrained optimization problems of the form:

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, ..., m                (1)
                hi(x) = 0,  i = 1, ..., p,

where x is a vector of decision variables, and the functions f0, fi and hi, respectively, are the cost, inequality constraints, and equality constraints. However, such problems can be very hard to solve in general, especially when the number of decision variables in x is large. There are several reasons for this difficulty. First, the problem “terrain” may be riddled with local optima.
Second, it might be very hard to find a feasible point (i.e., an x which satisfies all the equalities and inequalities); in fact, the feasible set, which needn't even be fully connected, could be empty. Third, stopping criteria used in general optimization algorithms are often arbitrary. Fourth, optimization algorithms might have very poor convergence rates. Fifth, numerical problems could cause the minimization algorithm to stop altogether or wander.

It has been known for a long time [19], [3], [16], [13] that if the fi are all convex and the hi are affine, then the first three problems disappear: any local optimum is, in fact, a global optimum; feasibility of convex optimization problems can be determined unambiguously, at least in principle; and very precise stopping criteria are available using duality. However, convergence rate and numerical sensitivity issues still remain a potential problem.

It was not until the late ’80s and ’90s that researchers in the former Soviet Union and United States discovered that if, in addition to convexity, the fi satisfied a property known as self-concordance, then issues of convergence and numerical sensitivity could be avoided using interior point methods [18], [23], [29], [30], [25]. The self-concordance property is satisfied by a very large set of important functions used in engineering. Hence, it is now possible to solve a large class of convex optimization problems in engineering with great efficiency.

II. CONVEX SETS

In this section we list some important convex sets and operations. It is important to note that some of these sets have different representations. Picking the right representation can make the difference between a tractable problem and an intractable one.

We will be concerned only with optimization problems whose decision variables are vectors in R^n or matrices in R^{m×n}. Throughout the paper, we will make frequent use of informal sketches to help the reader develop an intuition for the geometry of convex optimization.
A function f : R^n → R^m is affine if it has the form "linear plus constant": f(x) = Ax + b. If F is a matrix valued function, i.e., F : R^n → R^{p×q}, then F is affine if it has the form

    F(x) = A0 + x1 A1 + ··· + xn An,

where Ai ∈ R^{p×q}. Affine functions are sometimes loosely referred to as linear.

Recall that S ⊆ R^n is a subspace if it contains the plane through any two of its points and the origin, i.e.,

    x, y ∈ S, λ, µ ∈ R  ⟹  λx + µy ∈ S.

Two common representations of a subspace are as the range of a matrix,

    range(A) = {Aw | w ∈ R^q} = {w1 a1 + ··· + wq aq | wi ∈ R},

where A = [a1 ··· aq], or as the nullspace of a matrix,

    nullspace(B) = {x | Bx = 0} = {x | b1^T x = 0, ..., bp^T x = 0},

where B = [b1 ··· bp]^T. As an example, let S^n = {X ∈ R^{n×n} | X = X^T} denote the set of symmetric matrices. Then S^n is a subspace, since symmetric matrices are closed under addition and scalar multiplication. Another way to see this is to note that S^n can be written as {X ∈ R^{n×n} | Xij = Xji, ∀ i, j}, which is the nullspace of the linear function X − X^T.

A set S ⊆ R^n is affine if it contains the line through any two points in it, i.e.,

    x, y ∈ S, λ, µ ∈ R, λ + µ = 1  ⟹  λx + µy ∈ S.

[Figure: the line through points x and y, showing the affine combinations λx + µy for λ = 1.5, λ = 0.6 and λ = −0.5.]

Geometrically, an affine set is simply a subspace which is not necessarily centered at the origin. Two common representations of an affine set are: the range of an affine function,

    S = {Az + b | z ∈ R^q},

or the solution set of a system of linear equalities,

    S = {x | b1^T x = d1, ..., bp^T x = dp} = {x | Bx = d}.

A set S ⊆ R^n is a convex set if it contains the line segment joining any two of its points, i.e.,

    x, y ∈ S, λ, µ ≥ 0, λ + µ = 1  ⟹  λx + µy ∈ S.

[Figure: a convex set, in which the segment joining x and y stays inside, and a nonconvex set, in which it does not.]

Geometrically, we can think of convex sets as always bulging outward, with no dents or kinks in them. Clearly subspaces and affine sets are convex, since their definitions subsume convexity.
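The two representations of a subspace just mentioned can be converted into one another numerically. The following sketch (illustrative only; the matrix A is made-up data, and scipy is used merely for convenience) builds a nullspace description of a subspace that is given as a range:

```python
import numpy as np
from scipy.linalg import null_space

# A subspace of R^3 given as a range: range(A) = {A w : w in R^2}.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Rows of B span the orthogonal complement of range(A),
# so the same subspace is {x : B x = 0}.
B = null_space(A.T).T

rng = np.random.default_rng(0)
x = A @ rng.standard_normal(2)          # a point of range(A) by construction
print(np.allclose(B @ x, 0))            # True: x satisfies the equality description

x_off = x + np.array([0.0, 0.0, 1.0])   # perturb off the subspace
print(np.allclose(B @ x_off, 0))        # False
```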

A set S ⊆ R^n is a convex cone if it contains all rays passing through its points which emanate from the origin, as well as all line segments joining any points on those rays, i.e.,

    x, y ∈ S, λ, µ ≥ 0  ⟹  λx + µy ∈ S.

Geometrically, x, y ∈ S means that S contains the entire ‘pie slice’ between x and y.

[Figure: the ‘pie slice’ spanned by the rays through x and y.]

The nonnegative orthant R^n_+ is a convex cone. The set S^n_+ = {X ∈ S^n | X ⪰ 0} of symmetric positive semidefinite (PSD) matrices is also a convex cone, since any positive combination of semidefinite matrices is semidefinite. Hence we call S^n_+ the positive semidefinite cone.

A convex cone K ⊆ R^n is said to be proper if it is closed, has nonempty interior, and is pointed, i.e., there is no line in K. A proper cone K defines a generalized inequality ⪯_K in R^n:

    x ⪯_K y  ⟺  y − x ∈ K

(with the strict version x ≺_K y ⟺ y − x ∈ int K).

[Figure: the generalized inequality y ⪰_K x, shown as the cone K translated to the point x.]

This formalizes our use of the ⪯ symbol:
• K = R^n_+: x ⪯_K y means xi ≤ yi (componentwise vector inequality);
• K = S^n_+: X ⪯_K Y means Y − X is PSD.
These are so common that we drop the K.

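Both of these generalized inequalities can be checked mechanically, which is occasionally handy when debugging a model. The snippet below is a small illustration with made-up data, not part of the paper:

```python
import numpy as np

def leq_orthant(x, y):
    """x <=_K y for K = R^n_+, i.e., componentwise inequality."""
    return bool(np.all(y - x >= 0))

def leq_psd(X, Y, tol=1e-9):
    """X <=_K Y for K = S^n_+, i.e., Y - X is positive semidefinite."""
    return bool(np.min(np.linalg.eigvalsh(Y - X)) >= -tol)

print(leq_orthant(np.array([1.0, 2.0]), np.array([1.5, 2.0])))   # True

X = np.array([[2.0, 0.0], [0.0, 1.0]])
Y = np.array([[3.0, 1.0], [1.0, 2.0]])
print(leq_psd(X, Y))                  # True: Y - X has eigenvalues 0 and 2

# A positive combination of PSD matrices stays PSD (the cone property).
Z = 2.0 * X + 3.0 * (Y - X)
print(leq_psd(np.zeros((2, 2)), Z))   # True: 0 <=_K Z
```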
We now come to a very important fact about properties which are preserved under intersection: let A be an arbitrary index set (possibly uncountably infinite) and {Sα | α ∈ A} a collection of sets; then

    Sα is a subspace (affine, convex, a convex cone) for every α ∈ A
        ⟹  ∩_{α∈A} Sα is a subspace (affine, convex, a convex cone).

In fact, every closed convex set S is the (usually infinite) intersection of the halfspaces which contain it, i.e., S = ∩ {H | H halfspace, S ⊆ H}. For example, another way to see that S^n_+ is a convex cone is to recall that a matrix X ∈ S^n is positive semidefinite if z^T X z ≥ 0 for all z ∈ R^n. Thus we can write

    S^n_+ = ∩_{z ∈ R^n} { X ∈ S^n | z^T X z = Σ_{i,j} zi zj Xij ≥ 0 }.

Now observe that the summation above is actually linear in the components of X, so S^n_+ is the infinite intersection of halfspaces containing the origin (which are convex cones) in S^n.

Given points xi ∈ R^n and scalars θi ∈ R, the point y = θ1 x1 + ··· + θk xk is said to be a
• linear combination for any real θi;
• affine combination if Σi θi = 1;
• convex combination if Σi θi = 1, θi ≥ 0;
• conic combination if θi ≥ 0.
The linear (resp. affine, convex, conic) hull of a set S is the set of all linear (resp. affine, convex, conic) combinations of points from S, and is denoted by span(S) (resp. Aff(S), Co(S), Cone(S)). It can be shown that this is the smallest such set containing S.

As an example, consider the set S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Then span(S) is R^3; Aff(S) is the hyperplane passing through the three points; Co(S) is the unit simplex, which is the triangle joining the vectors along with all the points inside it; and Cone(S) is the nonnegative orthant R^3_+.

We continue with our listing of useful convex sets. Recall that a hyperplane, represented as {x | a^T x = b} (a ≠ 0), is in general an affine set, and is a subspace if b = 0. Another useful representation of a hyperplane is {x | a^T (x − x0) = 0}, where a is the normal vector and x0 lies on the hyperplane. Hyperplanes are convex, since they contain all lines (and hence segments) joining any of their points.

A halfspace, described as {x | a^T x ≤ b} (a ≠ 0), is generally convex and is a convex cone if b = 0. Another useful representation is {x | a^T (x − x0) ≤ 0}, where a is the (outward) normal vector and x0 lies on the boundary. Halfspaces are convex.

[Figure: a hyperplane a^T x = b with normal vector a and a point x0 on it, dividing R^n into the halfspaces a^T x ≥ b and a^T x ≤ b.]

A polyhedron is the intersection of a finite number of halfspaces,

    P = {x | ai^T x ≤ bi, i = 1, ..., k} = {x | Ax ⪯ b},

where ⪯ above means componentwise inequality.

[Figure: a polyhedron P formed by the intersection of five halfspaces with outward normal vectors a1, ..., a5.]
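Deciding whether a given point is a convex combination of finitely many points, i.e., whether it lies in their convex hull, is itself a small linear feasibility problem. The sketch below (an illustration with assumed data, not from the paper) uses scipy and the same three unit vectors as in the example above:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(y, V):
    """True if y is a convex combination of the columns of V.

    Feasibility LP: find theta >= 0 with V @ theta = y and sum(theta) = 1.
    """
    K = V.shape[1]
    A_eq = np.vstack([V, np.ones((1, K))])
    b_eq = np.append(y, 1.0)
    res = linprog(c=np.zeros(K), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * K)
    return res.success

V = np.eye(3)                                          # columns (1,0,0), (0,1,0), (0,0,1)
print(in_convex_hull(np.array([0.2, 0.3, 0.5]), V))    # True: a point of the unit simplex
print(in_convex_hull(np.array([0.6, 0.6, 0.6]), V))    # False: weights would sum to 1.8
```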

A bounded polyhedron is called a polytope, which also has the alternative representation P = Co{v1, ..., vN}, where {v1, ..., vN} are its vertices. For example, the nonnegative orthant R^n_+ = {x ∈ R^n | x ⪰ 0} is a polyhedron, while the probability simplex {x ∈ R^n | x ⪰ 0, Σi xi = 1} is a polytope.

If f is a norm, then the norm ball B = {x | f(x − xc) ≤ 1} is convex, and the norm cone C = {(x, t) | f(x) ≤ t} is a convex cone. Perhaps the most familiar norms are the ℓp norms on R^n:

    ‖x‖p = (Σi |xi|^p)^{1/p} for p ≥ 1;     ‖x‖∞ = maxi |xi| for p = ∞.

The corresponding norm balls (in R^2) look like this:

[Figure: the unit balls {x ∈ R^2 | ‖x‖p ≤ 1} for p = 1, 1.5, 2, 3 and ∞, nested between the diamond (p = 1) and the square with corners (±1, ±1) (p = ∞).]

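The convexity of these balls for p ≥ 1, and its failure for p < 1 (which does not define a norm), can be spot-checked numerically. This is only an illustrative sketch, not material from the paper:

```python
import numpy as np

def in_unit_ball(x, p):
    """Membership in {x : (sum |x_i|^p)^(1/p) <= 1}."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p) <= 1.0 + 1e-12

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
mid = 0.5 * x + 0.5 * y       # a convex combination of two points of every unit ball

print(in_unit_ball(mid, p=2))     # True:  the Euclidean ball is convex
print(in_unit_ball(mid, p=1))     # True:  the l1 ball is convex
print(in_unit_ball(mid, p=0.5))   # False: for p < 1 the "ball" is not convex
```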
Another workhorse of convex optimization is the ellipsoid. In addition to their computational convenience, ellipsoids can be used as crude models or approximations of other convex sets; in fact, it can be shown that any bounded convex set in R^n can be approximated to within a factor of n by an ellipsoid.

There are many representations for ellipsoids. For example, we can use

    E = {x | (x − xc)^T A^{-1} (x − xc) ≤ 1},

where the parameters are A = A^T ≻ 0 and the center xc ∈ R^n.

[Figure: an ellipsoid with center xc and semiaxis lengths √λ1 and √λ2.]

In this case the semiaxis lengths are √λi, where the λi are the eigenvalues of A; the semiaxis directions are the eigenvectors of A; and the volume is proportional to (det A)^{1/2}. Depending on the problem, however, other descriptions might be more appropriate, such as (with ‖u‖ = √(u^T u)):
• image of the unit ball under an affine transformation: E = {Bu + xc | ‖u‖ ≤ 1}; volume ∝ det B;
• preimage of the unit ball under an affine transformation: E = {x | ‖Ax − b‖ ≤ 1}; volume ∝ det A^{-1};
• sublevel set of a nondegenerate quadratic: E = {x | f(x) ≤ 0} with f(x) = x^T C x + 2 d^T x + e, where C = C^T ≻ 0 and e − d^T C^{-1} d < 0; volume ∝ (det C)^{-1/2}.
It is an instructive exercise to convert among these representations.

The second-order cone is the norm cone associated with the Euclidean norm,

    S = {(x, t) | √(x^T x) ≤ t}
      = {(x, t) | (x, t)^T [I 0; 0 −1] (x, t) ≤ 0, t ≥ 0}.

The image (and preimage) of a convex set under an affine transformation is convex, i.e., if S, T are convex, then so are the sets

    f^{-1}(S) = {x | Ax + b ∈ S},
    f(T) = {Ax + b | x ∈ T}.

An example is coordinate projection, {x | (x, y) ∈ S for some y}. As another example, a constraint of the form

    ‖Ax + b‖2 ≤ c^T x + d,

where A ∈ R^{k×n}, is a second-order cone constraint, since it is the same as requiring the affine function (Ax + b, c^T x + d) to lie in the second-order cone in R^{k+1}. Similarly, if A0, A1, ..., Am ∈ S^n, the solution set of the linear matrix inequality (LMI)

    F(x) = A0 + x1 A1 + ··· + xm Am ⪰ 0

is convex, being the preimage of the semidefinite cone under an affine function.
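Converting among the ellipsoid representations above is a useful exercise; the following sketch (with made-up data) passes from the quadratic-form description to the image-of-the-unit-ball description using a symmetric square root B = A^{1/2}:

```python
import numpy as np

# Quadratic-form description: E = {x : (x - xc)^T A^{-1} (x - xc) <= 1}.
A = np.array([[4.0, 1.0],
              [1.0, 2.0]])        # A = A^T > 0 (assumed data)
xc = np.array([1.0, -1.0])

# Symmetric square root of A, so that E = {B u + xc : ||u|| <= 1}.
lam, Q = np.linalg.eigh(A)
B = Q @ np.diag(np.sqrt(lam)) @ Q.T

print(np.sqrt(lam))               # semiaxis lengths: sqrt of the eigenvalues of A
print(np.isclose(np.linalg.det(B), np.sqrt(np.linalg.det(A))))    # True: det B = (det A)^(1/2)

# A point B u + xc with ||u|| = 1 lies on the boundary of E.
u = np.array([0.6, 0.8])
x = B @ u + xc
print(np.isclose((x - xc) @ np.linalg.solve(A, x - xc), 1.0))      # True
```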

A linear-fractional (or projective) function f : R^m → R^n has the form

    f(x) = (Ax + b) / (c^T x + d),

with domain dom f = H = {x | c^T x + d > 0}. If C is a convex set, then its linear-fractional transformation f(C) is also convex. This is because linear-fractional transformations preserve line segments: for x, y ∈ H,

    f([x, y]) = [f(x), f(y)].

[Figure: a line segment through the points x1, x2, x3, x4 and its image under a linear-fractional map, the segment through f(x1), ..., f(x4).]

Two further properties are helpful in visualizing the geometry of convex sets. The first is the separating hyperplane theorem, which states that if S, T ⊆ R^n are convex and disjoint (S ∩ T = ∅), then there exists a hyperplane {x | a^T x − b = 0} which separates them.

[Figure: two disjoint convex sets S and T separated by the hyperplane a^T x = b.]

The second is the supporting hyperplane theorem, which states that there exists a supporting hyperplane at every point on the boundary of a convex set, where a supporting hyperplane {x | a^T x = a^T x0} supports S at x0 ∈ ∂S if

    x ∈ S  ⟹  a^T x ≤ a^T x0.

[Figure: a convex set S with a supporting hyperplane at a boundary point x0.]

III. CONVEX FUNCTIONS

In this section, we introduce the reader to some important convex functions and techniques for verifying convexity. The objective is to sharpen the reader’s ability to recognize convexity.

A. Convex functions

A function f : R^n → R is convex if its domain dom f is convex and for all x, y ∈ dom f, θ ∈ [0, 1],

    f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y);

f is concave if −f is convex.

[Figure: graphs of a convex function, a concave function, and a function that is neither.]

Here are some simple examples on R: x^2 is convex (dom f = R); log x is concave (dom f = R++); and f(x) = 1/x is convex (dom f = R++).

It is convenient to define the extension of a convex function f:

    f̃(x) = f(x) if x ∈ dom f,   f̃(x) = +∞ if x ∉ dom f.

Note that f̃ still satisfies the basic definition for all x, y ∈ R^n, 0 ≤ θ ≤ 1 (as an inequality in R ∪ {+∞}). We will use the same symbol for f and its extension, i.e., we will implicitly assume convex functions are extended.

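The defining inequality can be spot-checked numerically, which is a handy sanity test when hand arguments get complicated. The following sketch (illustrative only) samples random points and mixing weights for the three examples just given:

```python
import numpy as np

def finds_convexity_violation(f, xs, trials=10000, seed=0):
    """Search for a random counterexample to
    f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) with x, y drawn from xs."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.choice(xs, size=2)
        t = rng.uniform()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + 1e-9:
            return True
    return False

xs = np.linspace(0.1, 10.0, 1000)                        # a grid inside R_++
print(finds_convexity_violation(np.square, xs))          # False: x^2 is convex
print(finds_convexity_violation(lambda x: 1.0 / x, xs))  # False: 1/x is convex on R_++
print(finds_convexity_violation(np.log, xs))             # True:  log is concave, not convex
```

Of course a finite search can only refute convexity, never prove it; the first- and second-order conditions described next give actual proofs.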
The epigraph of a function f is

    epi f = {(x, t) | x ∈ dom f, f(x) ≤ t}.

[Figure: the epigraph of a function, the region on and above its graph.]

The (α-)sublevel set of a function f is

    Sα = {x ∈ dom f | f(x) ≤ α}.

From the basic definition of convexity, it follows that f is a convex function if and only if its epigraph epi f is a convex set. It also follows that if f is convex, then its sublevel sets Sα are convex (the converse is false; see quasiconvex functions later).

The convexity of a differentiable function f : R^n → R can also be characterized by conditions on its gradient ∇f and Hessian ∇²f. Recall that, in general, the gradient yields a first-order Taylor approximation at x0:

    f(x) ≈ f(x0) + ∇f(x0)^T (x − x0).

We have the following first-order condition: f is convex if and only if for all x, x0 ∈ dom f,

    f(x) ≥ f(x0) + ∇f(x0)^T (x − x0),

5 i.e., the first order approximation of f is a global • pointwise supremum (maximum): underestimator. fα convex =⇒ sup ∈A fα convex (corresponds f(x) α to intersection of epigraphs) f2(x) epi max{f1,f2} f1(x) x0 x f x ∇f x T x − x ( 0)+ ( 0) ( 0) x To see why this is so, we can rewrite the condition above in terms of the epigraph of f as: for all (x, t) ∈ epi f, • affine transformation of domain: f convex =⇒ f(Ax + b) convex T ∇f(x0) x − x0 ≤ 0, The following are important examples of how these −1 t − f(x0) further properties can be applied: T • piecewise-linear functions: f(x)=maxi{a x+bi} i.e., (∇f(x0), −1) defines supporting hyperplane to i is convex in x (epi f is polyhedron) epi f at (x0,f(x0)) • − f(x) maximum distance to any set, sups∈S x s ,is convex in x [(x − s) is affine in x, · is a norm, so fs(x)= x − s is convex]. • expected value: f(x, u) convex in x ∇f x0 , − ( ( ) 1) =⇒ g(x)=Eu f(x, u) convex n 2 • f(x)=x[1] + x[2] + x[3] Recall that the Hessian of f, ∇ f, yields a second is convex on R (x[ ] is the ith largest x ). To see this, note that order Taylor series expansion around x 0: i j f(x) is the sum of the largest triple of components T T 1 T 2 of x.Sof can be written as f(x)=maxi c x, f(x)  f(x0)+∇f(x0) (x−x0)+ (x−x0) ∇ f(x0)(x−x0) i 2 where ci are the set of all vectors with components zero except for three 1’s. We have the following necessary and sufficient second m −1 • − T order condition:atwice differentiable function f is f(x)= i=1 log(bi ai x) is convex 2 dom { | T } convex if and only if for all x ∈ dom f, ∇ f(x)  0, ( f = x ai x

6 general. However, if S is convex, then dist(x, S) is Forexample, the following second-order cone con- convex since x − y + δ(y|S) is convex in (x, y). straint on x The technique of composition can also be used to Ax + b ≤eT x + d deduce the convexity of differentiable functions, by is equivalent to LMI means of the chain rule. For example, one can show results like: f(x)=log exp gi(x) is convex if each (eT x + d)IAx+ b i  0. gi is; see [7]. (Ax + b)T eT x + d Jensen’s inequality,aclassical result which essentially This shows that all sets that can be modeled by an SOCP a restatement of convexity, is used extensively in various n constraint can be modeled as LMI’s. Hence LMI’s are forms. If f : R → R is convex more “expressive”. As another nontrivial example, the ≥ ⇒ • two points: θ1 + θ2 =1, θi 0= quadratic-over-linear function f(θ1x1 + θ2x2) ≤ θ1f(x1)+θ2f(x2) 2 • ≥ ⇒ f(x, y)=x /y, dom f = × ++ more than two points: i θi =1, θi 0= R R f( θ x ) ≤ θ f(x ) i i i i i i is convex because its epigraph {(x, y, t) | y>0,x2/y ≤ • continuous version: p(x) ≥ 0, p(x) dx =1 =⇒ t} is convex: by Schur complememts, it’s equivalent to f( xp(x) dx) ≤ f(x)p(x) dx the solution set of the LMI constraints or more generally, f(E x) ≤ E f(x). yx The simple techniques above can also be used to  0,y>0. xt verify the convexity of functions of matrices, which arise in many important problems. The concept of K-convexity generalizes convexity to × • Tr T n n vector- or matrix-valued functions. As mentioned earlier, A X = i,j Aij Xij is linear in X on R −1 ⊆ m • { ∈ n |  } a pointed convex cone K R induces generalized log det X is convex on X S X 0 n m inequality K .Afunction f : R → R is K-convex Proof: Verify convexity along lines. Let λi be the −1/2 −1/2 if for all θ ∈ [0, 1] eigenvalues of X0 HX0 ∆ −1 f t X0 tH ( ) =logdet( + ) f(θx +(1− θ)y) K θf(x)+(1− θ)f(y) −1 −1/2 −1/2 −1 =logdetX0 +logdet( I + tX0 HX0 ) X−1 − tλ =logdet 0 i log(1 + i) In many problems, especially in statistics, log-concave which is a convex function of t. functions are frequently encountered. A function f : 1/n n n • (det X) is concave on {X ∈ S | X  0} R → R+ is log-concave (log-convex) if log f is n • λmax(X) is convex on S since, for X symmetric, concave (convex). Hence one can transform problems in T λmax(X)=supy2=1 y Xy, which is a supre- log convex functions into convex optimization problems. mum of linear functions of X. As an example, the normal density, f(x)=const · T 1/2 T −1 • X 2 = σ1(X)=(λmax(X X)) is con- e−(1/2)(x−x0) Σ (x−x0) is log-concave. m×n vex on R since, by definition, X 2 = B. Quasiconvex functions supu2=1 Xu 2. The Schur complement technique is an indispensible The next best thing to a convex function in optimiza- tool for modeling and transforming nonlinear constraints tion is a , since it can be minimized into convex LMI constraints. Suppose a symmetric ma- in a few iterations of convex optimization problems. n trix X can be decomposed as A function f : R → R is quasiconvex if every sublevel set Sα = {x ∈ dom f | f(x) ≤ α} is AB f X = convex. Note that if is convex, then it is automatically BT C quasiconvex. Quasiconvex functions can have ‘locally flat’ regions. ∆ with A invertible. The matrix S = C − BT A−1B is y called the Schur complement of A in X, and has the following important attributes, which are not too hard to prove: α ∆ • X  0 if and only if A  0 and S = C − BT A−1B  0 x • if A  0, then X  0 if and only if S  0 Sα
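The Schur complement facts just listed are easy to confirm numerically on sample data, which is a useful habit when assembling block LMIs. A small sketch with assumed data (not from the paper):

```python
import numpy as np

def is_psd(M, tol=1e-9):
    return np.min(np.linalg.eigvalsh(M)) >= -tol

# A symmetric block matrix X = [[A, B], [B^T, C]] with A invertible (assumed data).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[0.3], [0.1]])
C = np.array([[1.5]])
X = np.block([[A, B], [B.T, C]])

S = C - B.T @ np.linalg.solve(A, B)       # Schur complement of A in X

# With A > 0, X is PSD exactly when S is PSD.
print(is_psd(X), is_psd(S))               # True True

# Shrinking C below B^T A^{-1} B breaks both statements at once.
C_bad = np.array([[0.02]])
X_bad = np.block([[A, B], [B.T, C_bad]])
S_bad = C_bad - B.T @ np.linalg.solve(A, B)
print(is_psd(X_bad), is_psd(S_bad))       # False False
```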

7 We say f is quasiconcave if −f is quasiconvex, i.e., • linear-fractional transformation of domain { | ≥ } ⇒ Ax+b superlevel sets x f(x) α are convex. A function f quasiconvex = f cT x+d quasiconvex which is both quasiconvex and quasiconcave is called on cT x + d>0 quasilinear. • composition with monotone increasing function: We list some examples: f quasiconvex, g monotone increasing • f(x)= |x| is quasiconvex on R =⇒ g(f(x)) quasiconvex • f(x)=logx is quasilinear on R+ • sums of quasiconvex functions are not quasiconvex • linear fractional function, in general • f quasiconvex in x, y =⇒ g(x)=infy f(x, y) aT x + b f(x)= quasiconvex in x (projection of epigraph remains cT x + d quasiconvex) is quasilinear on the halfspace cT x + d>0  −  IV. CONVEX OPTIMIZATION PROBLEMS • f(x)= x a 2 x−b2 is quasiconvex on the halfspace {x | x − a 2 ≤ x − b 2} In this section, we will discuss the formulation of optimization problems at a general level. In the next Quasiconvex functions have a lot of similar features to section we will focus on useful optimization problems convex functions, but also some important differences. with specific structure. The following quasiconvex properties are helpful in Consider the following optimization in standard form verifying quasiconvexity: • f is quasiconvex if and only if it is quasiconvex on minimize f0(x) ≤ lines, i.e., f(x0 + th) quasiconvex in t for all x0,h subject to fi(x) 0,i=1,...,m • modified Jensen’s inequality: f is quasiconvex iff hi(x)=0,i=1,...,p for all x, y ∈ dom f, θ ∈ [0, 1], n where fi,hi : R → R; x is the optimization variable; f0 is the objective or cost function; f (x) ≤ 0 are f(θx +(1− θ)y) ≤ max{f(x),f(y)} i the inequality constraints; h (x)=0are the equality f x i ( ) constraints. Geometrically, this problem corresponds to the minimization of f0, overaset described by as the intersection of 0-sublevel sets of fi,i =1,...,m with surfaces described by the 0-solution sets of hi. A point x is feasible if it satisfies the constraints; xy the feasible set C is the set of all feasible points; and • for f differentiable, f quasiconvex ⇐⇒ for all the problem is feasible if there are feasible points. The x, y ∈ dom f problem is said to be unconstrained if m = p =0The  optimal value is denoted by f =inf ∈ f0(x), and we ≤ ⇒ − T ∇ ≤ x C f(y) f(x) (y x) f(x) 0 adopt the convention that f  =+∞ if the problem is infeasible). A point x ∈ C is an optimal point if f(x)=   f and the optimal set is Xopt = {x ∈ C | f(x)=f }. As an example consider the problem Sα 5 1 ∇f x x ( ) 4

    minimize    x1 + x2
    subject to  −x1 ≤ 0
                −x2 ≤ 0
                1 − √(x1 x2) ≤ 0

[Figure: the feasible set C, bounded by the coordinate axes and the hyperbola x1 x2 = 1, together with level curves of the objective x1 + x2.]

The objective function is f0(x) = [1 1]^T x = x1 + x2; the feasible set C is a half-hyperboloid; the optimal value is f* = 2; and the only optimal point is x* = (1, 1).

(Continuing the list of quasiconvexity-preserving operations from Section III-B: positive multiples, f quasiconvex, α ≥ 0 ⟹ αf quasiconvex; pointwise supremum, f1, f2 quasiconvex ⟹ max{f1, f2} quasiconvex, which extends to the supremum over an arbitrary set; and affine transformation of the domain, f quasiconvex ⟹ f(Ax + b) quasiconvex.)

[Figure (Section III-B): nested sublevel sets Sα1 ⊆ Sα2 ⊆ Sα3 of a quasiconvex function, for levels α1 < α2 < α3, with the gradient ∇f(x) at a point x.]

In the standard problem above, the explicit constraints are given by fi(x) ≤ 0 and hi(x) = 0. However, there are also the implicit constraints: x ∈ dom fi, x ∈ dom hi,

i.e., x must lie in the set

    D = dom f0 ∩ ··· ∩ dom fm ∩ dom h1 ∩ ··· ∩ dom hp,

which is called the domain of the problem. For example,

    minimize    −log x1 − log x2
    subject to  x1 + x2 − 1 ≤ 0

has the implicit constraint x ∈ D = {x ∈ R^2 | x1 > 0, x2 > 0}.

A feasibility problem is a special case of the standard problem, in which we are interested merely in finding any feasible point. Thus the problem is really to
• either find x ∈ C,
• or determine that C = ∅.
Equivalently, the feasibility problem requires that we either solve the inequality/equality system

    fi(x) ≤ 0,  i = 1, ..., m
    hi(x) = 0,  i = 1, ..., p

or determine that it is inconsistent.

By contrast, seemingly small departures from the convex setting lead to problems that are very hard in general, for example
• minimizing over non-convex sets, e.g., with Boolean variables:
    find x
    such that Ax ⪯ b, xi ∈ {0, 1};
• nonlinear equality constraints, e.g.,
    minimize    c^T x
    subject to  x^T Pi x + qi^T x + ri = 0,  i = 1, ..., K.

To understand global optimality in convex problems, recall that x ∈ C is locally optimal if it satisfies

    y ∈ C, ‖y − x‖ ≤ R  ⟹  f0(y) ≥ f0(x)

for some R > 0. A point x ∈ C is globally optimal means that

    y ∈ C  ⟹  f0(y) ≥ f0(x).

For convex optimization problems, any local solution is also global. [Proof sketch: suppose x is locally optimal but there is a y ∈ C with f0(y) < f0(x). Consider z = θy + (1 − θ)x with θ > 0 small. Then z ∈ C is near x, with f0(z) ≤ θ f0(y) + (1 − θ) f0(x) < f0(x), which contradicts the local optimality of x.]

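To make these ideas concrete, the small logarithmic example above can be handed to an off-the-shelf modeling package. The sketch below uses cvxpy purely as an illustration (an assumption on our part; this tutorial does not reference it). Because the problem is convex, the point returned by the solver is globally, not merely locally, optimal, and by symmetry it should be x1 = x2 = 1/2:

```python
import cvxpy as cp

# minimize -log(x1) - log(x2)  subject to  x1 + x2 - 1 <= 0
x = cp.Variable(2)
objective = cp.Minimize(-cp.log(x[0]) - cp.log(x[1]))
constraints = [x[0] + x[1] - 1 <= 0]      # the domain x > 0 is implicit in log
prob = cp.Problem(objective, constraints)
prob.solve()

print(prob.status)   # 'optimal'
print(x.value)       # approximately [0.5, 0.5]
print(prob.value)    # approximately 2*log(2) = 1.386...
```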
The standard problem can also be rewritten in the so-called epigraph form by introducing an extra scalar variable t:

    minimize    t
    subject to  f0(x) − t ≤ 0
                fi(x) ≤ 0,  i = 1, ..., m
                hi(x) = 0,  i = 1, ..., p,

where the variables are now (x, t) and the objective has essentially been moved into the inequality constraints. Observe that the new objective is linear: t = e_{n+1}^T (x, t)

9 (en+1 is the vector of all zeros except for a 1 in level of modeling capability is at the cost of longer com- its (n +1)th component). Minimizing t will result in putation time. Thus one should use the least complex  minimizing f0 with respect to x,soany optimal x of form for the problem at hand. the new problem with be optimal for the epigraph form A general linear program (LP) has the form and vice versa. minimize cT x + d f0(x) subject to Gx h Ax = b, × × where G ∈ Rm n and A ∈ Rp n. A problem that subsumes both linear and quadratic −e programming is the second-order cone program n+1 C (SOCP): So the linear objective is ‘universal’ for convex opti- minimize f T x ≤ T mization. subject to Aix + bi 2 ci x + di,i=1,...,m The above trick of introducing the extra variable t Fx = g, n ni×n is known as the method of slack variables.Itisvery where x ∈ R is the optimization variable, Ai ∈ R , p×n helpful in transforming problems into canonical form. and F ∈ R . When ci =0, i =1,...,m, the SOCP We will see another example in the next section. is equivalent to a quadratically constrained quadratic A convex optimization problem in standard form with program (QCQP), (which is obtained by squaring each generalized inequalities: of the constraints). Similarly, if Ai =0, i =1,...,m, then the SOCP reduces to a (general) LP. Second-order minimize f0(x) cone programs are, however, more general than QCQPs subject to fi(x) Ki 0,i=1,...,L Ax = b (and of course, LPs). A problem which subsumes linear, quadratic and n → where f0 : R R are all convex; the Ki are second-order cone programming is called a semidefinite mi n mi generalized inequalities on R ; and fi : R → R program (SDP), and has the form are Ki-convex. minimize cT x A quasiconvex optimization problem is exactly the subject to x1F1 + ···+ xnFn + G 0 same as a convex optimization problem, except for one Ax = b, difference: the objective function f0 is quasiconvex. ∈ k ∈ p×n We will see an important example of quasiconvex where G, F1,...,Fn S , and A R . The optimization in the next section: linear-fractional pro- inequality here is a linear matrix inequality. As shown gramming. earlier, since SOCP constraints can be written as LMI’s, SDP’s subsume SOCP’s, and hence LP’s as well. (If V. C ANONICAL OPTIMIZATION PROBLEMS there are multiple LMI’s, they can be stacked into one large block diagonal LMI, where the blocks are the In this section, we present several canonical optimiza- individual LMIs.) tion problem formulations, which have been found to be We will now consider some examples which are extremely useful in practice, and for which extremely themselves very instructive demonstrations of modeling efficient solution codes are available (often for free!). and the power of convex optimization. Thus if a real problem can be cast into one of these First, we show how slack variables can be used to forms, then it can be considered as essentially solved. convert a given problem into canonical form. Consider the constrained ∞- (Chebychev) approximation A. Conic programming minimize Ax − b ∞ We will present three canonical problems in this subject to Fx g. section. They are called conic problems because the inequalities are specified in terms of affine functions and By introducing the extra variable t,wecan rewrite the generalized inequalities. Geometrically, the inequalities problem as: are feasible if the range of the affine mappings intersects minimize t the cone of the inequality. 
    subject to  Ax − b ⪯ t 1
                Ax − b ⪰ −t 1
                Fx ⪯ g,

10 which is an LP in variables x ∈ Rn, t ∈ R (1 is the B. Extensions of conic programming vector with all components 1). We now present two more canonical convex optimiza- Second, as (more extensive) example of how an SOCP tion formulations, which can be viewed as extensions of might arise, consider linear programming with random the above conic formulations. constraints A generalized linear fractional program (GLFP) has minimize cT x the form: Prob T ≤ ≥ minimize f0(x) subject to (ai x bi) η, i =1,...,m subject to Gx h Here we suppose that the parameters ai are independent Ax = b Gaussian random vectors, with mean ai and covariance T ≤ where the objective function is given by Σi.Werequire that each constraint ai x bi should hold with a probability (or confidence) exceeding η, T ci x + di η ≥ 0.5 f0(x)= max , where , i.e., i=1,...,r T ei x + fi T Prob(a x ≤ bi) ≥ η. dom { | T } i with f0 = x ei x + fi > 0,i=1,...,r . The objective function is the pointwise maximum of We will show that this probability constraint can be r quasiconvex functions, and therefore quasiconvex, so expressed as a second-order cone constraint. Letting this problem is quasiconvex. u = aT x σ2 i , with denoting its variance, this constraint A determinant maximization program (maxdet) has can be written as the form: u − u b − u T Prob ≤ i ≥ η. minimize c x − log det G(x) σ σ subject to G(x)  0 F (x)  0 Since (u − u)/σ is a zero mean unit variance Gaussian Ax = b variable, the probability above is simply Φ((b i − u)/σ), where where F and G are linear (affine) matrix functions of z 1 2 Φ(z)=√ e−t /2 dt x,sothe inequality constraints are LMI’s. By flipping 2π −∞ signs, one can pose this as a maximization problem, which is the historic reason for the name. Since G is is the cumulative distribution function of a zero mean affine and, as shown earlier, the function − log det X unit variance Gaussian random variable. Thus the prob- n is convex on the symmetric semidefinite cone S+, the ability constraint can be expressed as maxdet program is a convex problem. b − u We now consider an example of each type. First, we i ≥ Φ−1(η), σ consider an optimal transmitter power allocation prob- lem where the variables are the transmitter powers p k, or, equivalently, k =1,...,m.Givenm transmitters and mn receivers all at the same frequency. Transmitter i wants to transmit −1 ≤ u +Φ (η)σ bi. to its n receivers labeled (i, j), j =1,...,n,asshown

T T 1/2 below (transmitter and receiver locations are fixed): From u = ai x and σ =(x Σix) we obtain transmitter k T −1 1/2 a x +Φ (η) Σ x 2 ≤ bi. i i receiver (i, j) By our assumption that η ≥ 1/2,wehaveΦ−1(η) ≥ 0,sothis constraint is a second-order cone constraint. In summary, the stochastic LP can be expressed as the transmitter i SOCP

T minimize c x Let Aijk ∈ R denote the path gain from transmitter T −1 1/2 ≤ k to receiver (i, j), and N ∈ R be the (self) noise subject to ai x +Φ (η) Σi x 2 bi,i=1,...,m. ij power of receiver (i, j) So at receiver (i, j) the signal We will see examples of using LMI constraints below. power is Sij = Aijipi and the noise plus interference

11 power is Iij = k=i Aijk pk + Nij.Therefore the signal to interference/noise ratio (SINR) is d s Sij /Iij . E

The optimal transmitter power allocation problem is then to choose pi to maximize smallest SINR: For this problem, we represent the ellipsoid as E = A p maximize min iji i {By+d | y ≤1}, which is parametrized by its center i,j k=i Aijk pk + Nij d and shape matrix B = BT  0, and whose volume is ≤ ≤ subject to 0 pi pmax proportional to det B. Note the containment condition means This is a GLFP. E⊆P ⇐⇒ aT (By + d) ≤ b for all y ≤1 Next we consider two related ellipsoid approximation i i ⇐⇒ T T ≤ problems. In addition to the use of LMI constraints, sup (ai By + ai d) bi this example illustrates techniques of computing with y≤1 ellipsoids and also how the choice representation of the ⇐⇒ Ba + aT d ≤ b ,i=1,...,L ellipsoids and polyhedra can make the difference be- i i i tween a tractable convex problem, and a hard intractable which is a convex constraint in B and d. Hence finding one. the maximum volume E⊆P:isconvex problem in First consider computing a minimum volume ellipsoid variables B, d: n around points v1, ..., v ∈ R . This is equivalent K maximize log det B to finding the minimum volume ellipsoid around the subject to B = BT  0 polytope defined by the of those points, Ba + aT d ≤ b ,i=1,...,L represented in vertex form. i i i Note, however, that minor variations on these two problems, which are convex and hence easy, resutl in problems that are very difficult: E • compute the maximum volume ellipsoid inside polyhedron given in vertex form Co{v 1,...,vK } • compute the minimum volume ellipsoid containing polyhedron given in inequality form Ax b For this problem, we choose the representation for In fact, just checking whether a given ellipsoid E covers the ellipsoid as E = {x | Ax − b ≤1}, which is a polytope described in inequality form is extremely parametrized by its center A−1b, and the shape matrix difficult (equivalent to maximizing a convex quadratic A = AT  0, and whose volume is proportional to s.t. linear inequalities). det A−1. This problem immediately reduces to: C. Geometric programming minimize log det A−1 In this section we describe a family of optimization subject to A = AT  0 problems that are not convex in their natural form. Avi − b ≤1,i=1,...,K However, they are an excellent example of how problems can sometimes be transformed to convex optimization which is a convex optimization problem in A, b (n + problems, by a change of variables and a transformation n(n +1)/2 variables), and can be written as a maxdet of the objective and constraint functions. In addition, problem. they have been found tremendously useful for modeling real problems, e.g.circuit design and communication Now, consider finding the maximum volume ellipsoid networks. in a polytope given in inequality form n n A function f : R → R with dom f = R++,defined as T P = {x | a x ≤ bi,i=1,...,L} a1 a2 ··· an i f(x)=cx1 x2 xn ,

12 where c>0 and ai ∈ R,iscalled a monomial Similarly, if f a posynomial, i.e., function,orsimply, a monomial. The exponents a of a i K monomial can be any real numbers, including fractional a1k a2k ank f(x)= ckx1 x2 ···x , c n or negative, but the coefficient must be nonnegative. k=1 A sum of monomials, i.e.,afunction of the form then K K a1k a2k nk T ··· a a y+bk f(x)= ckx1 x2 xn , f(x)= e k , =1 k k=1 where ck > 0,iscalled a posynomial function (with K where a =(a1 ,...,a ) and b =logc . After the terms), or simply, a posynomial. Posynomials are closed k k nk k k change of variables, a posynomial becomes a sum of under addition, multiplication, and nonnegative scaling. exponentials of affine functions. Monomials are closed under multiplication and division. The geometric program can be expressed in terms of If a posynomial is multiplied by a monomial, the result the new variable y as is a posynomial; similarly, a posynomial can be divided T 0 K a0ky+b0k by a monomial, with the result a posynomial. minimize k=1 e T Ki + An optimization problem of the form aiky bik ≤ subject to k=1 e 1,i=1,...,m T g y+hi minimize f0(x) e i =1,i=1,...,p, ≤ subject to fi(x) 1,i=1,...,m n where aik ∈ R , i =0,...,m, contain the exponents hi(x)=1,i=1,...,p n of the posynomial inequality constraints, and g i ∈ R , where f0,...,fm are posynomials and h1,...,hp are i =1,...,p, contain the exponents of the monomial monomials, is called a geometric program (GP). The equality constraints of the original geometric program. D n domain of this problem is = R++; the constraint Now we transform the objective and constraint func-  x 0 is implicit. tions, by taking the logarithm. This results in the prob- Several extensions are readily handled. If f is a lem posynomial and h is a monomial, then the constraint T K0 a y+b0k ≤ f˜0(y)=log e 0k f(x) h(x) can be handled by expressing it as minimize k=1 T ≤ Ki + f(x)/h(x) 1 (since f/his posynomial). This includes ˜ aiky bik subject to fi(y)=log =1 e ≤ 0,i=1,...,m as a special case a constraint of the form f(x) ≤ a, k ˜ T where f is posynomial and a>0.Inasimilar way if hi(y)=gi y + hi =0,i=1,...,p. h1 and h2 are both nonzero monomial functions, then Since the functions f˜i are convex, and h˜i are affine, this we can handle the equality constraint h1(x)=h2(x) problem is a convex optimization problem. We refer to by expressing it as h1(x)/h2(x)=1(since h1/h2 it as a geometric program in convex form.Note that the is monomial). We can maximize a nonzero monomial transformation between the posynomial form geometric objective function, by minimizing its inverse (which is program and the convex form geometric program does also a monomial). not involve any computation; the problem data for the We will now transform geometric programs to convex two problems are the same. problems by a change of variables and a transformation of the objective and constraint functions. VI. NONSTANDARD AND NONCONVEX PROBLEMS We will use the variables defined as yi =logxi,so yi In practice, one might encounter convex problems xi = e .Iff a monomial function of x, i.e., which do not fall into any of the canonical forms a1 a2 ··· an f(x)=cx1 x2 xn , above. In this case, it is necessary to develop custom then code for the problem. Developing such codes requires gradient, for optimal performance, Hessian information. f(x)=f(ey1 ,...,eyn ) If only gradient information is available, the ellipsoid, = c(ey1 )a1 ···(eyn )an subgradient or cutting plane methods can be used. 
    = e^{a^T y + b},

where b = log c. The change of variables yi = log xi thus turns a monomial function into the exponential of an affine function.

These are reliable methods with exact stopping criteria. The same is true for interior point methods; however, these also require Hessian information. The payoff for having Hessian information is much faster convergence; in practice, they solve most problems at the cost of a dozen

13 least squares problems of about the size of the decision [16] J.-B. Hiriart-Urruty and C. Lemar´echal. Fundamentals of Convex variables. These methods are described in the references. Analysis. Springer, 2001. Abridged version of Convex Analysis and Minimization Algorithms volumes 1 and 2. Nonconvex problems are more common, of course, [17] R. A. Horn and C. A. Johnson. Matrix Analysis. Cambridge than nonconvex problems. However, convex optimiza- University Press, 1985. tion can still often be helpful in solving these as well: [18] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984. it can often be used to compute lower; useful initial [19] D. G. Luenberger. Optimization by Vector Space Methods. John starting points; etc. Wiley & Sons, 1969. [20] D. G. Luenberger. Linear and . Addison- VII. CONCLUSION Wesley, second edition, 1984. [21] Z.-Q. Luo. Applications of convex optimization in signal pro- In this paper we have reviewed some essentials of cessing and digital communication. Mathematical Programming Series B, 97:177–207, 2003. convex sets, functions and optimization problems. We [22] O. Mangasarian. Nonlinear Programming. Society for Industrial hope that the reader is convinced that convex optimiza- and Applied Mathematics, 1994. First published in 1969 by tion dramatically increases the range of problems in McGraw-Hill. [23] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial engineering that can be modeled and solved exactly. Methods in Convex Programming. Society for Industrial and Applied Mathematics, 1994. ACKNOWLEDGEMENT [24] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970. Iamdeeply indebted to Stephen Boyd and Lieven [25] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Vandenberghe for generously allowing me to use mate- Linear Optimization. An Interior Point Approach. John Wiley & rial from [7] in preparing this tutorial. Sons, 1997. [26] G. Strang. Linear Algebra and its Applications. Academic Press, 1980. REFERENCES [27] J. van Tiel. Convex Analysis. An Introductory Text. John Wiley & Sons, 1984. [1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear [28] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Hand- Programming. Theory and Algorithms. John Wiley & Sons, book of Semidefinite Programming. Kluwer Academic Publish- second edition, 1993. ers, 2000. [2] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Op- [29] S. J. Wright. Primal-Dual Interior-Point Methods. Society for timization. Analysis, Algorithms, and Engineering Applications. Industrial and Applied Mathematics, 1997. Society for Industrial and Applied Mathematics, 2001. [30] Y. Ye. Interior Point Algorithms. Theory and Analysis. John [3] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Wiley & Sons, 1997. 1999. [4] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997. [5] S. Boyd and C. Barratt. Linear Controller Design: Limits of Performance. Prentice-Hall, 1991. [6] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. Society for Industrial and Applied Mathematics, 1994. [7] S.P. Boyd and L. Vandenberghe. Convex Optimization. Cam- bridge University Press, 2003. In Press. Material available at www.stanford.edu/˜boyd. [8] D. Colleran, C. Portmann, A. Hassibi, C. Crusius, S. Mohan, T. Lee S. Boyd, and M. Hershenson. Optimization of phase- locked loop circuits via geometric programming. 
In Custom Integrated Circuits Conference (CICC), San Jose, California, September 2003. [9] M. A. Dahleh and I. J. Diaz-Bobillo. Control of Uncertain Systems: A Linear Programming Approach. Prentice-Hall, 1995. [10] J. W. Demmel. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997. [11] G. E. Dullerud and F. Paganini. A Course in Robust Control Theory: A Convex Approach. Springer, 2000. [12] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, second edition, 1989. [13] M. Gr¨otschel, L. Lovasz, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988. [14] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer, 2001. [15] M. del Mar Hershenson, S. P. Boyd, and T. H. Lee. of a CMOS op-amp via geometric programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(1):1–21, 2001.
