
A Tutorial on Convex Optimization

Haitham Hindi Palo Alto Research Center (PARC), Palo Alto, California email: [email protected]

Abstract— In recent years, convex optimization has become a computational tool of central importance in engineering, thanks to its ability to solve very large, practical engineering problems reliably and efficiently. The goal of this tutorial is to give an overview of the basic concepts of convex sets, functions and convex optimization problems, so that the reader can more readily recognize and formulate engineering problems using modern convex optimization. This tutorial coincides with the publication of the new book on convex optimization, by Boyd and Vandenberghe [7], who have made available a large amount of free course material and links to freely available code. These can be downloaded and used immediately by the audience both for self-study and to solve real problems.

I. INTRODUCTION

Convex optimization can be described as a fusion of three disciplines: optimization [22], [20], [1], [3], [4], convex analysis [19], [24], [27], [16], [13], and numerical computation [26], [12], [10], [17]. It has recently become a tool of central importance in engineering, enabling the solution of very large, practical engineering problems reliably and efficiently. In some sense, convex optimization is providing new indispensable computational tools today, which naturally extend our ability to solve problems such as least squares and linear programming to a much larger and richer class of optimization problems.

Our ability to solve these new types of problems comes from recent breakthroughs in algorithms for solving convex optimization problems [18], [23], [29], [30], coupled with the dramatic improvements in computing power, both of which have happened only in the past decade or so. Today, new applications of convex optimization are constantly being reported from almost every area of engineering, including: control, signal processing, networks, circuit design, communication, information theory, computer science, operations research, economics, statistics, and structural design. See [7], [2], [5], [6], [9], [11], [15], [8], [21], [14], [28] and the references therein.

The objectives of this tutorial are:
1) to show that there are straightforward, systematic rules and facts, which when mastered, allow one to quickly deduce the convexity (and hence tractability) of many problems, often by inspection;
2) to review and introduce some canonical optimization problems, which can be used to model problems and for which reliable optimization code can be readily obtained;
3) to emphasize modeling and formulation; we do not discuss topics like duality or writing custom codes.

We assume that the reader has a working knowledge of linear algebra and vector calculus, and some (minimal) exposure to optimization.

Our presentation is quite informal. Rather than provide details for all the facts and claims presented, our goal is instead to give the reader a flavor for what is possible with convex optimization. Complete details can be found in [7], from which all the material presented here is taken. Thus we encourage the reader to skip sections that might not seem clear and continue reading; the topics are not all interdependent.

Also, in order to keep the paper quite general, we have tried not to bias our presentation toward any particular audience. Hence, the examples used in the paper are simple and intended merely to clarify the optimization ideas and concepts. For detailed examples and applications, the reader is referred to [7], [2], and the references therein.

We now briefly outline the paper. Sections II and III, respectively, describe convex sets and convex functions along with their calculus and properties. In section IV, we define convex optimization problems, at a rather abstract level, and we describe their general form and desirable properties. Section V presents some specific canonical optimization problems which have been found to be extremely useful in practice, and for which efficient codes are freely available. Section VI comments briefly on the use of convex optimization for solving nonstandard or nonconvex problems. Finally, section VII concludes the paper.

Motivation

A vast number of design problems in engineering can be posed as constrained optimization problems of the form:
\[
\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & f_i(x) \leq 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p,
\end{array}
\qquad (1)
\]
where $x$ is a vector of decision variables, and the functions $f_0$, $f_i$ and $h_i$, respectively, are the cost, inequality constraints, and equality constraints. However, such problems can be very hard to solve in general, especially when the number of decision variables in $x$ is large. There are several reasons for this difficulty: First, the problem "terrain" may be riddled with local optima. Second, it might be very hard to find a feasible point (i.e., an $x$ which satisfies all the equalities and inequalities); in fact the feasible set, which needn't even be fully connected, could be empty. Third, stopping criteria used in general optimization algorithms are often arbitrary. Fourth, optimization algorithms might have very poor convergence rates. Fifth, numerical problems could cause the minimization algorithm to stop altogether or wander.

It has been known for a long time [19], [3], [16], [13] that if the $f_i$ are all convex and the $h_i$ are affine, then the first three problems disappear: any local optimum is, in fact, a global optimum; feasibility of convex optimization problems can be determined unambiguously, at least in principle; and very precise stopping criteria are available using duality. However, convergence rate and numerical sensitivity issues still remained a potential problem.

It was not until the late '80s and '90s that researchers in the former Soviet Union and United States discovered that if, in addition to convexity, the $f_i$ satisfy a property known as self-concordance, then issues of convergence and numerical sensitivity can be avoided using interior point methods [18], [23], [29], [30], [25]. The self-concordance property is satisfied by a very large set of important functions used in engineering. Hence, it is now possible to solve a large class of convex optimization problems in engineering with great efficiency.

II. CONVEX SETS

In this section we list some important convex sets and operations. It is important to note that some of these sets have different representations. Picking the right representation can make the difference between a tractable problem and an intractable one.

We will be concerned only with optimization problems whose decision variables are vectors in $\mathbf{R}^n$ or matrices in $\mathbf{R}^{m\times n}$. Throughout the paper, we will make frequent use of informal sketches to help the reader develop an intuition for the geometry of convex optimization.

A function $f : \mathbf{R}^n \rightarrow \mathbf{R}^m$ is affine if it has the form linear plus constant: $f(x) = Ax + b$. If $F$ is a matrix valued function, i.e., $F : \mathbf{R}^n \rightarrow \mathbf{R}^{p\times q}$, then $F$ is affine if it has the form
\[
F(x) = A_0 + x_1 A_1 + \cdots + x_n A_n,
\]
where $A_i \in \mathbf{R}^{p\times q}$. Affine functions are sometimes loosely referred to as linear.

Recall that $S \subseteq \mathbf{R}^n$ is a subspace if it contains the plane through any two of its points and the origin, i.e.,
\[
x, y \in S, \ \lambda, \mu \in \mathbf{R} \ \Longrightarrow \ \lambda x + \mu y \in S.
\]
Two common representations of a subspace are as the range of a matrix,
\[
\mathrm{range}(A) = \{ Aw \mid w \in \mathbf{R}^q \} = \{ w_1 a_1 + \cdots + w_q a_q \mid w_i \in \mathbf{R} \},
\]
where $A = [a_1 \ \cdots \ a_q]$; or as the nullspace of a matrix,
\[
\mathrm{nullspace}(B) = \{ x \mid Bx = 0 \} = \{ x \mid b_1^T x = 0, \ldots, b_p^T x = 0 \},
\]
where $B = [b_1 \ \cdots \ b_p]^T$. As an example, let $\mathbf{S}^n = \{ X \in \mathbf{R}^{n\times n} \mid X = X^T \}$ denote the set of symmetric matrices. Then $\mathbf{S}^n$ is a subspace, since symmetric matrices are closed under addition and scaling. Another way to see this is to note that $\mathbf{S}^n$ can be written as $\{ X \in \mathbf{R}^{n\times n} \mid X_{ij} = X_{ji}, \ \forall i, j \}$, which is the nullspace of the linear function $X - X^T$.

A set $S \subseteq \mathbf{R}^n$ is affine if it contains the line through any two points in it, i.e.,
\[
x, y \in S, \ \lambda, \mu \in \mathbf{R}, \ \lambda + \mu = 1 \ \Longrightarrow \ \lambda x + \mu y \in S.
\]

[Figure: points $\lambda x + (1-\lambda)y$ on the line through $x$ and $y$, shown for $\lambda = 1.5$, $\lambda = 0.6$, and $\lambda = -0.5$.]

Geometrically, an affine set is simply a subspace which is not necessarily centered at the origin. Two common representations of an affine set are: the range of an affine function,
\[
S = \{ Az + b \mid z \in \mathbf{R}^q \},
\]
or as the solution set of a set of linear equalities:
\[
S = \{ x \mid b_1^T x = d_1, \ldots, b_p^T x = d_p \} = \{ x \mid Bx = d \}.
\]
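As a small numerical aside (an illustrative sketch, not part of the original tutorial; the matrices below are made-up examples), the two subspace representations above can be related with NumPy: the same plane in $\mathbf{R}^3$ is written once as $\mathrm{range}(A)$ and once as $\mathrm{nullspace}(B)$.

```python
import numpy as np

# Plane {x in R^3 : x1 + x2 - x3 = 0}, written two ways (illustrative data).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # range(A) = {w1*a1 + w2*a2}
B = np.array([[1.0, 1.0, -1.0]])    # nullspace(B) = {x : Bx = 0}

# Orthonormal basis of range(A) from the thin SVD.
U, s, _ = np.linalg.svd(A, full_matrices=False)
range_basis = U[:, s > 1e-12]

# Orthonormal basis of nullspace(B): right singular vectors beyond the rank.
_, s2, Vt2 = np.linalg.svd(B)
null_basis = Vt2[np.sum(s2 > 1e-12):].T

# Both describe the same 2-dimensional subspace.
print(np.allclose(B @ range_basis, 0))                               # True
print(np.linalg.matrix_rank(np.hstack([range_basis, null_basis])))   # 2
```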

A set $S \subseteq \mathbf{R}^n$ is a convex set if it contains the line segment joining any two of its points, i.e.,
\[
x, y \in S, \ \lambda, \mu \geq 0, \ \lambda + \mu = 1 \ \Longrightarrow \ \lambda x + \mu y \in S.
\]

[Figure: a convex set (every segment between two of its points stays inside) and a nonconvex set.]

Geometrically, we can think of convex sets as always bulging outward, with no dents or kinks in them. Clearly subspaces and affine sets are convex, since their definitions subsume convexity.

A set $S \subseteq \mathbf{R}^n$ is a convex cone if it contains all rays passing through its points which emanate from the origin, as well as all line segments joining any points on those rays, i.e.,
\[
x, y \in S, \ \lambda, \mu \geq 0 \ \Longrightarrow \ \lambda x + \mu y \in S.
\]
Geometrically, $x, y \in S$ means that $S$ contains the entire 'pie slice' between $x$ and $y$.

[Figure: the pie slice between the rays through $x$ and $y$.]

The nonnegative orthant $\mathbf{R}^n_+$ is a convex cone. The set $\mathbf{S}^n_+ = \{ X \in \mathbf{S}^n \mid X \succeq 0 \}$ of symmetric positive semidefinite (PSD) matrices is also a convex cone, since any positive combination of semidefinite matrices is semidefinite. Hence we call $\mathbf{S}^n_+$ the positive semidefinite cone.

A convex cone $K \subseteq \mathbf{R}^n$ is said to be proper if it is closed, has nonempty interior, and is pointed, i.e., there is no line in $K$. A proper cone $K$ defines a generalized inequality $\preceq_K$ in $\mathbf{R}^n$:
\[
x \preceq_K y \ \Longleftrightarrow \ y - x \in K
\]
(strict version: $x \prec_K y \Longleftrightarrow y - x \in \mathrm{int}\, K$).

[Figure: $y \succeq_K x$ means that $y$ lies in the cone $K$ translated to $x$.]

This formalizes our use of the $\preceq$ symbol:
- $K = \mathbf{R}^n_+$: $x \preceq_K y$ means $x_i \leq y_i$ (componentwise vector inequality);
- $K = \mathbf{S}^n_+$: $X \preceq_K Y$ means $Y - X$ is PSD.
These are so common we drop the $K$.

Given points $x_i \in \mathbf{R}^n$ and $\theta_i \in \mathbf{R}$, then $y = \theta_1 x_1 + \cdots + \theta_k x_k$ is said to be a
- linear combination for any real $\theta_i$;
- affine combination if $\sum_i \theta_i = 1$;
- convex combination if $\sum_i \theta_i = 1$, $\theta_i \geq 0$;
- conic combination if $\theta_i \geq 0$.

The linear (resp. affine, convex, conic) hull of a set $S$ is the set of all linear (resp. affine, convex, conic) combinations of points from $S$, and is denoted by $\mathrm{span}(S)$ (resp. $\mathrm{Aff}(S)$, $\mathrm{Co}(S)$, $\mathrm{Cone}(S)$). It can be shown that this is the smallest such set containing $S$.

As an example, consider the set $S = \{ (1,0,0), (0,1,0), (0,0,1) \}$. Then $\mathrm{span}(S)$ is $\mathbf{R}^3$; $\mathrm{Aff}(S)$ is the hyperplane passing through the three points; $\mathrm{Co}(S)$ is the unit simplex, which is the triangle joining the vectors along with all the points inside it; $\mathrm{Cone}(S)$ is the nonnegative orthant $\mathbf{R}^3_+$.

Recall that a hyperplane, represented as $\{ x \mid a^T x = b \}$ ($a \neq 0$), is in general an affine set, and is a subspace if $b = 0$. Another useful representation of a hyperplane is given by $\{ x \mid a^T (x - x_0) = 0 \}$, where $a$ is the normal vector and $x_0$ lies on the hyperplane. Hyperplanes are convex, since they contain all lines (and hence segments) joining any of their points.

A halfspace, described as $\{ x \mid a^T x \leq b \}$ ($a \neq 0$), is generally convex and is a convex cone if $b = 0$. Another useful representation is $\{ x \mid a^T (x - x_0) \leq 0 \}$, where $a$ is the (outward) normal vector and $x_0$ lies on the boundary.

[Figure: the hyperplane $a^T x = b$ with normal $a$ through $x_0$, separating the halfspaces $a^T x \leq b$ and $a^T x \geq b$.]

We now come to a very important fact about properties which are preserved under intersection: let $\mathcal{A}$ be an arbitrary index set (possibly uncountably infinite) and $\{ S_\alpha \mid \alpha \in \mathcal{A} \}$ a collection of sets; then we have the following:
\[
S_\alpha \ \mbox{is a subspace (affine set, convex set, convex cone)} \ \Longrightarrow \ \bigcap_{\alpha \in \mathcal{A}} S_\alpha \ \mbox{is a subspace (affine set, convex set, convex cone).}
\]
In fact, every closed convex set $S$ is the (usually infinite) intersection of the halfspaces which contain it, i.e., $S = \bigcap \{ \mathcal{H} \mid \mathcal{H} \mbox{ halfspace}, \ S \subseteq \mathcal{H} \}$.
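As an illustrative aside (not from the paper), the matrix generalized inequality $X \preceq Y$ induced by the positive semidefinite cone can be checked numerically by testing whether the smallest eigenvalue of $Y - X$ is nonnegative:

```python
import numpy as np

def psd_leq(X, Y, tol=1e-9):
    """Return True if X <= Y in the semidefinite (Loewner) order, i.e. Y - X is PSD."""
    Z = Y - X
    Z = (Z + Z.T) / 2                       # symmetrize against round-off
    return np.linalg.eigvalsh(Z).min() >= -tol

X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[2.0, 0.5], [0.5, 2.0]])
print(psd_leq(X, Y))   # True:  Y - X has eigenvalues 0.5 and 1.5
print(psd_leq(Y, X))   # False: the order is only partial
```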

For example, another way to see that $\mathbf{S}^n_+$ is a convex cone is to recall that a matrix $X \in \mathbf{S}^n$ is positive semidefinite if $z^T X z \geq 0$ for all $z \in \mathbf{R}^n$. Thus we can write
\[
\mathbf{S}^n_+ = \bigcap_{z \in \mathbf{R}^n} \Big\{ X \in \mathbf{S}^n \ \Big|\ z^T X z = \sum_{i,j=1}^n z_i z_j X_{ij} \geq 0 \Big\}.
\]
Now observe that the summation above is actually linear in the components of $X$, so $\mathbf{S}^n_+$ is the infinite intersection of halfspaces containing the origin (which are convex cones) in $\mathbf{S}^n$.

We continue with our listing of useful convex sets. A polyhedron is the intersection of a finite number of halfspaces:
\[
\mathcal{P} = \{ x \mid a_i^T x \leq b_i, \ i = 1, \ldots, k \} = \{ x \mid Ax \preceq b \},
\]
where $\preceq$ above means componentwise inequality.

[Figure: a polyhedron $\mathcal{P}$ bounded by halfspaces with outward normals $a_1, \ldots, a_5$.]

A bounded polyhedron is called a polytope, which also has the alternative representation $\mathcal{P} = \mathrm{Co}\{ v_1, \ldots, v_N \}$, where $\{ v_1, \ldots, v_N \}$ are its vertices. For example, the nonnegative orthant $\mathbf{R}^n_+ = \{ x \in \mathbf{R}^n \mid x \succeq 0 \}$ is a polyhedron, while the probability simplex $\{ x \in \mathbf{R}^n \mid x \succeq 0, \ \sum_i x_i = 1 \}$ is a polytope. It is an instructive exercise to convert among representations.

If $f$ is a norm, then the norm ball $B = \{ x \mid f(x - x_c) \leq 1 \}$ is convex, and the norm cone $C = \{ (x, t) \mid f(x) \leq t \}$ is a convex cone. Perhaps the most familiar norms are the $\ell_p$ norms on $\mathbf{R}^n$:
\[
\| x \|_p = \left\{ \begin{array}{ll} \left( \sum_i |x_i|^p \right)^{1/p}, & p \geq 1, \\ \max_i |x_i|, & p = \infty. \end{array} \right.
\]

[Figure: the corresponding norm balls in $\mathbf{R}^2$, i.e., the unit balls for $p = 1, 1.5, 2, 3, \infty$, nested inside the square with corners $(\pm 1, \pm 1)$.]

Another workhorse of convex optimization is the ellipsoid. In addition to their computational convenience, ellipsoids can be used as crude models or approximations for other convex sets. In fact, it can be shown that any bounded convex set in $\mathbf{R}^n$ can be approximated to within a factor of $n$ by an ellipsoid.

There are many representations for ellipsoids. For example, we can use
\[
\mathcal{E} = \{ x \mid (x - x_c)^T A^{-1} (x - x_c) \leq 1 \},
\]
where the parameters are $A = A^T \succ 0$ and the center $x_c \in \mathbf{R}^n$.

[Figure: an ellipsoid centered at $x_c$ with semiaxis lengths $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$.]

In this case, the semiaxis lengths are $\sqrt{\lambda_i}$, where the $\lambda_i$ are the eigenvalues of $A$; the semiaxis directions are the eigenvectors of $A$; and the volume is proportional to $(\det A)^{1/2}$. However, depending on the problem, other descriptions might be more appropriate, such as (with $\|u\| = \sqrt{u^T u}$):
- image of the unit ball under an affine transformation: $\mathcal{E} = \{ Bu + x_c \mid \|u\| \leq 1 \}$; volume $\propto \det B$;
- preimage of the unit ball under an affine transformation: $\mathcal{E} = \{ x \mid \| Ax - b \| \leq 1 \}$; volume $\propto \det A^{-1}$;
- sublevel set of a nondegenerate quadratic: $\mathcal{E} = \{ x \mid f(x) \leq 0 \}$, where $f(x) = x^T C x + 2 d^T x + e$ with $C = C^T \succ 0$, $e - d^T C^{-1} d < 0$; volume $\propto (\det C)^{-1/2}$.

The second-order cone is the norm cone associated with the Euclidean norm:
\[
S = \{ (x, t) \mid \sqrt{x^T x} \leq t \}
= \left\{ (x, t) \ \middle|\
\begin{bmatrix} x \\ t \end{bmatrix}^T
\begin{bmatrix} I & 0 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} x \\ t \end{bmatrix} \leq 0, \ t \geq 0 \right\}.
\]

[Figure: the second-order cone in $\mathbf{R}^3$.]
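To connect the ellipsoid representations listed above, here is a small numerical sketch (an illustration, not from the paper; the data are arbitrary): for $\mathcal{E} = \{x \mid (x - x_c)^T A^{-1}(x - x_c) \leq 1\}$, the symmetric square root $B = A^{1/2}$ gives the image-of-unit-ball form $\mathcal{E} = \{Bu + x_c \mid \|u\| \leq 1\}$, and the semiaxes come from the eigendecomposition of $A$.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 2.0]])
xc = np.array([1.0, -1.0])

lam, V = np.linalg.eigh(A)            # eigendecomposition A = V diag(lam) V^T
semiaxis_lengths = np.sqrt(lam)       # semiaxis lengths are sqrt(eigenvalues of A)
B = V @ np.diag(np.sqrt(lam)) @ V.T   # symmetric square root, so E = {B u + xc : ||u|| <= 1}

# Spot-check: points B u + xc with ||u|| = 1 lie on the boundary of E.
u = np.array([0.6, 0.8])
x = B @ u + xc
print(np.isclose((x - xc) @ np.linalg.solve(A, x - xc), 1.0))   # True
```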

The image (or preimage) under an affine transformation of a convex set is convex, i.e., if $S$, $T$ are convex, then so are the sets
\[
f^{-1}(S) = \{ x \mid Ax + b \in S \}, \qquad
f(T) = \{ Ax + b \mid x \in T \}.
\]
An example is coordinate projection, $\{ x \mid (x, y) \in S \mbox{ for some } y \}$. As another example, a constraint of the form
\[
\| Ax + b \|_2 \leq c^T x + d,
\]
where $A \in \mathbf{R}^{k\times n}$, is a second-order cone constraint, since it is the same as requiring the affine expression $(Ax + b, \ c^T x + d)$ to lie in the second-order cone in $\mathbf{R}^{k+1}$.
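Second-order cone constraints of exactly this form are directly supported by modern modeling packages. The following is a hedged sketch (not part of the paper) using CVXPY with arbitrary made-up data; it simply projects a point onto the set defined by one such constraint.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)
c = np.array([1.0, 0.0, 0.0])
d = 10.0
x0 = np.array([5.0, 5.0, 5.0])        # arbitrary point to project

x = cp.Variable(3)
prob = cp.Problem(cp.Minimize(cp.norm(x - x0, 2)),
                  [cp.norm(A @ x + b, 2) <= c @ x + d])   # the SOC constraint
prob.solve()
print(prob.status, x.value)
```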

Similarly, if $A_0, A_1, \ldots, A_m \in \mathbf{S}^n$, the solution set of the linear matrix inequality (LMI)
\[
F(x) = A_0 + x_1 A_1 + \cdots + x_m A_m \preceq 0
\]
is convex (it is the preimage of the semidefinite cone under an affine function).

A linear-fractional (or projective) function $f : \mathbf{R}^n \rightarrow \mathbf{R}^m$ has the form
\[
f(x) = \frac{Ax + b}{c^T x + d}
\]
and domain $\mathrm{dom}\, f = \mathcal{H} = \{ x \mid c^T x + d > 0 \}$. If $C$ is a convex set, then its linear-fractional transformation $f(C)$ is also convex. This is because linear-fractional transformations preserve line segments: for $x, y \in \mathcal{H}$,
\[
f([x, y]) = [f(x), f(y)].
\]

[Figure: a linear-fractional transformation mapping the segment between $x_1$ and $x_2$ to the segment between $f(x_1)$ and $f(x_2)$.]

III. CONVEX FUNCTIONS

In this section, we introduce the reader to some important convex functions and techniques for verifying convexity. The objective is to sharpen the reader's ability to recognize convexity.

A. Convex functions

A function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ is convex if its domain $\mathrm{dom}\, f$ is convex and for all $x, y \in \mathrm{dom}\, f$, $\theta \in [0, 1]$,
\[
f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y);
\]
$f$ is concave if $-f$ is convex.

[Figure: functions on $\mathbf{R}$ that are convex, concave, and neither.]

Here are some simple examples on $\mathbf{R}$: $x^2$ is convex ($\mathrm{dom}\, f = \mathbf{R}$); $\log x$ is concave ($\mathrm{dom}\, f = \mathbf{R}_{++}$); and $f(x) = 1/x$ is convex ($\mathrm{dom}\, f = \mathbf{R}_{++}$).

It is convenient to define the extension of a convex function $f$:
\[
\tilde{f}(x) = \left\{ \begin{array}{ll} f(x), & x \in \mathrm{dom}\, f, \\ +\infty, & x \notin \mathrm{dom}\, f. \end{array} \right.
\]
Note that $\tilde{f}$ still satisfies the basic definition for all $x, y \in \mathbf{R}^n$, $0 \leq \theta \leq 1$ (as an inequality in $\mathbf{R} \cup \{ +\infty \}$). We will use the same symbol for $f$ and its extension, i.e., we will implicitly assume convex functions are extended.

The epigraph of a function $f$ is
\[
\mathrm{epi}\, f = \{ (x, t) \mid x \in \mathrm{dom}\, f, \ f(x) \leq t \}.
\]

[Figure: the epigraph of $f$, the region above its graph.]

Two further properties are helpful in visualizing the geometry of convex sets. The first is the separating hyperplane theorem, which states that if $S, T \subseteq \mathbf{R}^n$ are convex and disjoint ($S \cap T = \emptyset$), then there exists a hyperplane $\{ x \mid a^T x - b = 0 \}$ which separates them.

[Figure: disjoint convex sets $S$ and $T$ separated by a hyperplane with normal $a$.]

The second property is the supporting hyperplane theorem, which states that there exists a supporting hyperplane at every point on the boundary of a convex set, where a supporting hyperplane $\{ x \mid a^T x = a^T x_0 \}$ supports $S$ at $x_0 \in \partial S$ if
\[
x \in S \ \Longrightarrow \ a^T x \leq a^T x_0.
\]

[Figure: a supporting hyperplane with normal $a$ touching $S$ at $x_0$.]
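The separating hyperplane theorem can be made concrete with a small convex program. The sketch below (an illustration, not from the paper; the two point clouds are random and chosen far apart so that their convex hulls are disjoint) finds a hyperplane $\{x \mid a^T x = b\}$ separating them, using CVXPY.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
S = rng.normal(size=(20, 2)) + np.array([4.0, 4.0])   # points whose hull plays the role of S
T = rng.normal(size=(20, 2)) - np.array([4.0, 4.0])   # points whose hull plays the role of T

a = cp.Variable(2)
b = cp.Variable()
# Require a^T x >= b + 1 on S and a^T x <= b - 1 on T; minimizing ||a|| normalizes the slab.
prob = cp.Problem(cp.Minimize(cp.norm(a)),
                  [S @ a >= b + 1, T @ a <= b - 1])
prob.solve()
print(a.value, b.value)   # {x : a^T x = b} separates the two clouds
```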

The ($\alpha$-)sublevel set of a function $f$ is
\[
S_\alpha \stackrel{\Delta}{=} \{ x \in \mathrm{dom}\, f \mid f(x) \leq \alpha \}.
\]
From the basic definition of convexity, it follows that $f$ is a convex function if and only if its epigraph $\mathrm{epi}\, f$ is a convex set. It also follows that if $f$ is convex, then its sublevel sets $S_\alpha$ are convex (the converse is false; see quasiconvex functions later).

The convexity of a differentiable function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ can also be characterized by conditions on its gradient $\nabla f$ and Hessian $\nabla^2 f$. Recall that, in general, the gradient yields a first order Taylor approximation at $x_0$:
\[
f(x) \simeq f(x_0) + \nabla f(x_0)^T (x - x_0).
\]
We have the following first-order condition: $f$ is convex if and only if for all $x, x_0 \in \mathrm{dom}\, f$,
\[
f(x) \geq f(x_0) + \nabla f(x_0)^T (x - x_0),
\]
i.e., the first order approximation of $f$ is a global underestimator.

[Figure: a convex function lying above its tangent line $f(x_0) + \nabla f(x_0)^T (x - x_0)$.]

To see why this is so, we can rewrite the condition above in terms of the epigraph of $f$ as: for all $(x, t) \in \mathrm{epi}\, f$,
\[
\begin{bmatrix} \nabla f(x_0) \\ -1 \end{bmatrix}^T
\begin{bmatrix} x - x_0 \\ t - f(x_0) \end{bmatrix} \leq 0,
\]
i.e., $(\nabla f(x_0), -1)$ defines a supporting hyperplane to $\mathrm{epi}\, f$ at $(x_0, f(x_0))$.

[Figure: the vector $(\nabla f(x_0), -1)$ as the normal of a supporting hyperplane to $\mathrm{epi}\, f$.]

Recall that the Hessian of $f$, $\nabla^2 f$, yields a second order Taylor series expansion around $x_0$:
\[
f(x) \simeq f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2} (x - x_0)^T \nabla^2 f(x_0) (x - x_0).
\]
We have the following necessary and sufficient second order condition: a twice differentiable function $f$ is convex if and only if for all $x \in \mathrm{dom}\, f$, $\nabla^2 f(x) \succeq 0$, i.e., its Hessian is positive semidefinite on its domain.

The convexity of the following functions is easy to verify, using the first and second order characterizations of convexity:
- $x^\alpha$ is convex on $\mathbf{R}_{++}$ for $\alpha \geq 1$;
- $x \log x$ is convex on $\mathbf{R}_+$;
- the log-sum-exp function $f(x) = \log \sum_i e^{x_i}$ (tricky!);
- affine functions $f(x) = a^T x + b$, where $a \in \mathbf{R}^n$, $b \in \mathbf{R}$, are both convex and concave since $\nabla^2 f \equiv 0$;
- quadratic functions $f(x) = x^T P x + 2 q^T x + r$ ($P = P^T$), whose Hessian is $\nabla^2 f(x) = 2P$, are convex $\Longleftrightarrow P \succeq 0$ (concave $\Longleftrightarrow P \preceq 0$).

Further elementary properties are extremely helpful in verifying convexity. We list several:
- $f : \mathbf{R}^n \rightarrow \mathbf{R}$ is convex iff it is convex on all lines: $\tilde{f}(t) \stackrel{\Delta}{=} f(x_0 + th)$ is convex in $t \in \mathbf{R}$, for all $x_0, h \in \mathbf{R}^n$;
- nonnegative sums of convex functions are convex: $\alpha_1, \alpha_2 \geq 0$ and $f_1, f_2$ convex $\Longrightarrow \alpha_1 f_1 + \alpha_2 f_2$ convex;
- nonnegative infinite sums and integrals: $p(y) \geq 0$, $g(x, y)$ convex in $x$ $\Longrightarrow \int p(y)\, g(x, y)\, dy$ convex in $x$;
- pointwise supremum (maximum): $f_\alpha$ convex $\Longrightarrow \sup_{\alpha \in \mathcal{A}} f_\alpha$ convex (corresponds to the intersection of epigraphs);
- affine transformation of the domain: $f$ convex $\Longrightarrow f(Ax + b)$ convex.

[Figure: the pointwise maximum of $f_1$ and $f_2$; $\mathrm{epi}\, \max\{f_1, f_2\}$ is the intersection of the two epigraphs.]

The following are important examples of how these further properties can be applied:
- piecewise-linear functions: $f(x) = \max_i \{ a_i^T x + b_i \}$ is convex in $x$ ($\mathrm{epi}\, f$ is a polyhedron);
- the maximum distance to any set, $\sup_{s \in S} \| x - s \|$, is convex in $x$ [$(x - s)$ is affine in $x$, $\| \cdot \|$ is a norm, so $f_s(x) = \| x - s \|$ is convex];
- expected value: $f(x, u)$ convex in $x$ $\Longrightarrow g(x) = \mathbf{E}_u f(x, u)$ convex;
- $f(x) = x_{[1]} + x_{[2]} + x_{[3]}$ is convex on $\mathbf{R}^n$, where $x_{[i]}$ is the $i$th largest $x_j$. [To see this, note $f(x)$ is the sum of the largest triple of components of $x$. So $f$ can be written as $f(x) = \max_i c_i^T x$, where the $c_i$ are the set of all vectors with components zero except for three 1's.]
- $f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$ is convex ($\mathrm{dom}\, f = \{ x \mid a_i^T x < b_i, \ i = 1, \ldots, m \}$).

Another convexity preserving operation is that of minimizing over some variables. Specifically, if $h(x, y)$ is convex in $x$ and $y$, then
\[
f(x) = \inf_y h(x, y)
\]
is convex in $x$. This is because the operation above corresponds to projection of the epigraph, $(x, y, t) \mapsto (x, t)$.
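As a quick numerical illustration of the second-order condition above (a sketch, not from the paper), the Hessian of the log-sum-exp function at $x$ is $\mathrm{diag}(p) - pp^T$, with $p$ the softmax of $x$; spot-checking its smallest eigenvalue at random points is consistent with convexity:

```python
import numpy as np

def logsumexp_hessian(x):
    """Hessian of f(x) = log(sum_i exp(x_i)) at x, namely diag(p) - p p^T."""
    p = np.exp(x - x.max())
    p /= p.sum()                      # softmax probabilities
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=4)
    H = logsumexp_hessian(x)
    print(np.linalg.eigvalsh(H).min() >= -1e-12)   # True: the Hessian is PSD
```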

[Figure: projection of the epigraph of $h(x, y)$ onto the $(x, t)$ variables.]

An important example of this is the minimum distance function to a set $S \subseteq \mathbf{R}^n$, defined as:
\[
\mathrm{dist}(x, S) = \inf_{y \in S} \| x - y \| = \inf_y \big( \| x - y \| + \delta(y \mid S) \big),
\]
where $\delta(y \mid S)$ is the indicator function of the set $S$, which is infinite everywhere except on $S$, where it is zero. Note that in contrast to the maximum distance function defined earlier, $\mathrm{dist}(x, S)$ is not convex in general. However, if $S$ is convex, then $\mathrm{dist}(x, S)$ is convex, since $\| x - y \| + \delta(y \mid S)$ is convex in $(x, y)$.

The technique of composition can also be used to deduce the convexity of differentiable functions, by means of the chain rule. For example, one can show results like: $f(x) = \log \sum_i \exp g_i(x)$ is convex if each $g_i$ is; see [7].

Jensen's inequality, a classical result which is essentially a restatement of convexity, is used extensively in various forms. If $f : \mathbf{R}^n \rightarrow \mathbf{R}$ is convex:
- two points: $\theta_1 + \theta_2 = 1$, $\theta_i \geq 0$ $\Longrightarrow$ $f(\theta_1 x_1 + \theta_2 x_2) \leq \theta_1 f(x_1) + \theta_2 f(x_2)$;
- more than two points: $\sum_i \theta_i = 1$, $\theta_i \geq 0$ $\Longrightarrow$ $f(\sum_i \theta_i x_i) \leq \sum_i \theta_i f(x_i)$;
- continuous version: $p(x) \geq 0$, $\int p(x)\, dx = 1$ $\Longrightarrow$ $f\!\left( \int x\, p(x)\, dx \right) \leq \int f(x)\, p(x)\, dx$;
or, more generally, $f(\mathbf{E}\, x) \leq \mathbf{E}\, f(x)$.

The simple techniques above can also be used to verify the convexity of functions of matrices, which arise in many important problems.
- $\mathbf{Tr}\, A^T X = \sum_{i,j} A_{ij} X_{ij}$ is linear in $X$ on $\mathbf{R}^{n\times n}$.
- $\log \det X^{-1}$ is convex on $\{ X \in \mathbf{S}^n \mid X \succ 0 \}$.
  Proof: Verify convexity along lines. Let $\lambda_i$ be the eigenvalues of $X_0^{-1/2} H X_0^{-1/2}$; then
  \[
  f(t) \stackrel{\Delta}{=} \log \det (X_0 + tH)^{-1}
  = \log \det X_0^{-1} + \log \det (I + t X_0^{-1/2} H X_0^{-1/2})^{-1}
  = \log \det X_0^{-1} - \sum_i \log(1 + t \lambda_i),
  \]
  which is a convex function of $t$.
- $(\det X)^{1/n}$ is concave on $\{ X \in \mathbf{S}^n \mid X \succ 0 \}$.
- $\lambda_{\max}(X)$ is convex on $\mathbf{S}^n$ since, for $X$ symmetric, $\lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^T X y$, which is a supremum of linear functions of $X$.
- $\| X \|_2 = \sigma_1(X) = (\lambda_{\max}(X^T X))^{1/2}$ is convex on $\mathbf{R}^{m\times n}$ since, by definition, $\| X \|_2 = \sup_{\|u\|_2 = 1} \| Xu \|_2$.

The Schur complement technique is an indispensable tool for modeling and transforming nonlinear constraints into convex LMI constraints. Suppose a symmetric matrix $X$ can be decomposed as
\[
X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}
\]
with $A$ invertible. The matrix $S \stackrel{\Delta}{=} C - B^T A^{-1} B$ is called the Schur complement of $A$ in $X$, and has the following important attributes, which are not too hard to prove:
- $X \succ 0$ if and only if $A \succ 0$ and $S = C - B^T A^{-1} B \succ 0$;
- if $A \succ 0$, then $X \succeq 0$ if and only if $S \succeq 0$.

For example, the following second-order cone constraint on $x$,
\[
\| Ax + b \| \leq e^T x + d,
\]
is equivalent to the LMI
\[
\begin{bmatrix} (e^T x + d) I & Ax + b \\ (Ax + b)^T & e^T x + d \end{bmatrix} \succeq 0.
\]
This shows that all sets that can be modeled by an SOCP constraint can be modeled as LMI's. Hence LMI's are more "expressive". As another nontrivial example, the quadratic-over-linear function
\[
f(x, y) = x^2 / y, \qquad \mathrm{dom}\, f = \mathbf{R} \times \mathbf{R}_{++},
\]
is convex because its epigraph $\{ (x, y, t) \mid y > 0, \ x^2/y \leq t \}$ is convex: by Schur complements, it is equivalent to the solution set of the LMI constraints
\[
\begin{bmatrix} y & x \\ x & t \end{bmatrix} \succeq 0, \qquad y > 0.
\]

The concept of $K$-convexity generalizes convexity to vector- or matrix-valued functions. As mentioned earlier, a pointed convex cone $K \subseteq \mathbf{R}^m$ induces a generalized inequality $\preceq_K$. A function $f : \mathbf{R}^n \rightarrow \mathbf{R}^m$ is $K$-convex if for all $\theta \in [0, 1]$,
\[
f(\theta x + (1 - \theta) y) \preceq_K \theta f(x) + (1 - \theta) f(y).
\]

In many problems, especially in statistics, log-concave functions are frequently encountered. A function $f : \mathbf{R}^n \rightarrow \mathbf{R}_+$ is log-concave (log-convex) if $\log f$ is concave (convex). Hence one can transform problems in log convex functions into convex optimization problems. As an example, the normal density, $f(x) = \mathrm{const} \cdot e^{-(1/2)(x - x_0)^T \Sigma^{-1} (x - x_0)}$, is log-concave.
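Returning to the Schur complement facts above, they are easy to sanity-check numerically. The sketch below (not from the paper; the matrices are constructed purely for illustration) builds an $X$ whose Schur complement is positive definite and confirms that $X$ itself is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
A = M @ M.T                                               # A > 0
B = rng.normal(size=(3, 2))
C = B.T @ np.linalg.solve(A, B) + 0.5 * np.eye(2)         # chosen so that S = 0.5 I > 0

X = np.block([[A, B], [B.T, C]])
S = C - B.T @ np.linalg.solve(A, B)                       # Schur complement of A in X

print(np.linalg.eigvalsh(S).min() >= -1e-9)   # True by construction
print(np.linalg.eigvalsh(X).min() >= -1e-9)   # True, as the Schur complement fact predicts
```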

B. Quasiconvex functions

The next best thing to a convex function in optimization is a quasiconvex function, since it can be minimized in a few iterations of convex optimization problems. A function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ is quasiconvex if every sublevel set $S_\alpha = \{ x \in \mathrm{dom}\, f \mid f(x) \leq \alpha \}$ is convex. Note that if $f$ is convex, then it is automatically quasiconvex. Quasiconvex functions can have 'locally flat' regions.

[Figure: a quasiconvex function on $\mathbf{R}$ with a locally flat region; its sublevel sets $S_{\alpha_1} \subseteq S_{\alpha_2} \subseteq S_{\alpha_3}$ (for $\alpha_1 < \alpha_2 < \alpha_3$) are intervals, hence convex.]

We say $f$ is quasiconcave if $-f$ is quasiconvex, i.e., its superlevel sets $\{ x \mid f(x) \geq \alpha \}$ are convex. A function which is both quasiconvex and quasiconcave is called quasilinear.

We list some examples:
- $f(x) = |x|$ is quasiconvex on $\mathbf{R}$;
- $f(x) = \log x$ is quasilinear on $\mathbf{R}_+$;
- the linear fractional function
  \[
  f(x) = \frac{a^T x + b}{c^T x + d}
  \]
  is quasilinear on the halfspace $c^T x + d > 0$;
- $f(x) = \dfrac{\| x - a \|_2}{\| x - b \|_2}$ is quasiconvex on the halfspace $\{ x \mid \| x - a \|_2 \leq \| x - b \|_2 \}$.

Quasiconvex functions have a lot of similar features to convex functions, but also some important differences. The following properties are helpful in verifying quasiconvexity:
- $f$ is quasiconvex if and only if it is quasiconvex on lines, i.e., $f(x_0 + th)$ is quasiconvex in $t$ for all $x_0, h$;
- modified Jensen's inequality: $f$ is quasiconvex iff for all $x, y \in \mathrm{dom}\, f$, $\theta \in [0, 1]$,
  \[
  f(\theta x + (1 - \theta) y) \leq \max\{ f(x), f(y) \};
  \]
- for $f$ differentiable: $f$ is quasiconvex $\Longleftrightarrow$ for all $x, y \in \mathrm{dom}\, f$,
  \[
  f(y) \leq f(x) \ \Longrightarrow \ (y - x)^T \nabla f(x) \leq 0;
  \]

[Figure: along the segment from $x$ to $y$, a quasiconvex $f$ never exceeds the larger of $f(x)$ and $f(y)$; for differentiable $f$, $\nabla f(x)$ defines a supporting hyperplane to the sublevel set $S_\alpha$ at $x$.]

- positive multiples: $f$ quasiconvex, $\alpha \geq 0$ $\Longrightarrow$ $\alpha f$ quasiconvex;
- pointwise supremum: $f_1, f_2$ quasiconvex $\Longrightarrow$ $\max\{ f_1, f_2 \}$ quasiconvex (extends to the supremum over an arbitrary set);
- affine transformation of domain: $f$ quasiconvex $\Longrightarrow$ $f(Ax + b)$ quasiconvex;
- linear-fractional transformation of domain: $f$ quasiconvex $\Longrightarrow$ $f\!\left( \frac{Ax + b}{c^T x + d} \right)$ quasiconvex on $c^T x + d > 0$;
- composition with a monotone increasing function: $f$ quasiconvex, $g$ monotone increasing $\Longrightarrow$ $g(f(x))$ quasiconvex;
- sums of quasiconvex functions are not quasiconvex in general;
- $f$ quasiconvex in $x, y$ $\Longrightarrow$ $g(x) = \inf_y f(x, y)$ quasiconvex in $x$ (projection of the epigraph remains quasiconvex).

IV. CONVEX OPTIMIZATION PROBLEMS

In this section, we will discuss the formulation of optimization problems at a general level. In the next section we will focus on useful optimization problems with specific structure.

Consider the following optimization problem in standard form:
\[
\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & f_i(x) \leq 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p,
\end{array}
\]
where $f_i, h_i : \mathbf{R}^n \rightarrow \mathbf{R}$; $x$ is the optimization variable; $f_0$ is the objective or cost function; $f_i(x) \leq 0$ are the inequality constraints; and $h_i(x) = 0$ are the equality constraints. Geometrically, this problem corresponds to the minimization of $f_0$ over a set described as the intersection of the 0-sublevel sets of the $f_i$'s with the surfaces described by the 0-solution sets of the $h_i$'s.

A point $x$ is feasible if it satisfies the constraints; the feasible set $C$ is the set of all feasible points; and the problem is feasible if there are feasible points. The problem is said to be unconstrained if $m = p = 0$. The optimal value is denoted by $f^\star = \inf_{x \in C} f_0(x)$, and we adopt the convention that $f^\star = +\infty$ if the problem is infeasible.
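Referring back to the remark that a quasiconvex function can be minimized via a few iterations of convex problems, here is a hedged bisection sketch (not from the paper; the data and the bracket on the optimal value are made up) for minimizing a linear-fractional objective over a box, using a sequence of CVXPY feasibility problems:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 3
a, c = rng.normal(size=n), np.array([1.0, 0.0, 0.0])
b, d = 1.0, 5.0
G = np.vstack([np.eye(n), -np.eye(n)])
h = np.ones(2 * n)                      # box -1 <= x_i <= 1, so c^T x + d >= 4 > 0

x = cp.Variable(n)
lo, hi = -10.0, 10.0                    # assumed bracket on the optimal value
for _ in range(30):
    t = (lo + hi) / 2
    # Convex feasibility problem: is f(x) = (a^T x + b)/(c^T x + d) <= t attainable?
    feas = cp.Problem(cp.Minimize(0),
                      [a @ x + b <= t * (c @ x + d), G @ x <= h])
    feas.solve()
    if feas.status == cp.OPTIMAL:       # feasible, so the optimal value is <= t
        hi = t
    else:
        lo = t
print("p* is approximately", hi)
```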

A point $x \in C$ is an optimal point if $f_0(x) = f^\star$, and the optimal set is $X_{\mathrm{opt}} = \{ x \in C \mid f_0(x) = f^\star \}$. As an example, consider the problem
\[
\begin{array}{ll}
\mbox{minimize} & x_1 + x_2 \\
\mbox{subject to} & -x_1 \leq 0 \\
& -x_2 \leq 0 \\
& 1 - \sqrt{x_1 x_2} \leq 0.
\end{array}
\]

[Figure: the feasible set $C$ (the region above the hyperbola $x_1 x_2 = 1$ in the nonnegative quadrant) and the level lines of $x_1 + x_2$, plotted over $0 \leq x_1, x_2 \leq 5$.]

The objective function is $f_0(x) = [1 \ \ 1]\, x$; the feasible set $C$ is the half-hyperboloid; the optimal value is $f^\star = 2$; and the only optimal point is $x^\star = (1, 1)$.

In the standard problem above, the explicit constraints are given by $f_i(x) \leq 0$, $h_i(x) = 0$. However, there are also the implicit constraints $x \in \mathrm{dom}\, f_i$, $x \in \mathrm{dom}\, h_i$, i.e., $x$ must lie in the set
\[
D = \mathrm{dom}\, f_0 \cap \cdots \cap \mathrm{dom}\, f_m \cap \mathrm{dom}\, h_1 \cap \cdots \cap \mathrm{dom}\, h_p,
\]
which is called the domain of the problem. For example,
\[
\begin{array}{ll}
\mbox{minimize} & -\log x_1 - \log x_2 \\
\mbox{subject to} & x_1 + x_2 - 1 \leq 0
\end{array}
\]
has the implicit constraint $x \in D = \{ x \in \mathbf{R}^2 \mid x_1 > 0, \ x_2 > 0 \}$.

A feasibility problem is a special case of the standard problem, where we are interested merely in finding any feasible point. Thus, the problem is really to
- either find $x \in C$,
- or determine that $C = \emptyset$.
Equivalently, the feasibility problem requires that we either solve the inequality/equality system
\[
f_i(x) \leq 0, \quad i = 1, \ldots, m, \qquad h_i(x) = 0, \quad i = 1, \ldots, p,
\]
or determine that it is inconsistent.

An optimization problem in standard form is a convex optimization problem if $f_0, f_1, \ldots, f_m$ are all convex, and the $h_i$ are all affine:
\[
\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & f_i(x) \leq 0, \quad i = 1, \ldots, m \\
& a_i^T x - b_i = 0, \quad i = 1, \ldots, p.
\end{array}
\]
This is often written as
\[
\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & f_i(x) \leq 0, \quad i = 1, \ldots, m \\
& Ax = b,
\end{array}
\]
where $A \in \mathbf{R}^{p\times n}$ and $b \in \mathbf{R}^p$. As mentioned in the introduction, convex optimization problems have three crucial properties that make them fundamentally more tractable than generic nonconvex optimization problems:
1) no local minima: any local optimum is necessarily a global optimum;
2) exact infeasibility detection: using duality theory (which is not covered here), hence algorithms are easy to initialize;
3) efficient numerical solution methods that can handle very large problems.

Note that often seemingly 'slight' modifications of a convex problem can be very hard. Examples include:
- convex maximization or concave minimization, e.g.,
  \[
  \begin{array}{ll} \mbox{maximize} & \| x \| \\ \mbox{subject to} & Ax \preceq b; \end{array}
  \]
- nonlinear equality constraints, e.g.,
  \[
  \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & x^T P_i x + q_i^T x + r_i = 0, \quad i = 1, \ldots, K; \end{array}
  \]
- minimizing over nonconvex sets, e.g., Boolean variables:
  \[
  \mbox{find } x \quad \mbox{such that } Ax \preceq b, \ x_i \in \{0, 1\}.
  \]

To understand global optimality in convex problems, recall that $x \in C$ is locally optimal if it satisfies
\[
y \in C, \ \| y - x \| \leq R \ \Longrightarrow \ f_0(y) \geq f_0(x)
\]
for some $R > 0$. A point $x \in C$ is globally optimal means that
\[
y \in C \ \Longrightarrow \ f_0(y) \geq f_0(x).
\]
For convex optimization problems, any local solution is also global. [Proof sketch: Suppose $x$ is locally optimal, but that there is a $y \in C$ with $f_0(y) < f_0(x)$. Then we may take a small step from $x$ towards $y$, i.e., $z = \lambda y + (1 - \lambda)x$ with $\lambda > 0$ small. Then $z$ is near $x$, with $f_0(z) < f_0(x)$, which contradicts local optimality.]

There is also a first order condition that characterizes optimality in convex optimization problems. Suppose $f_0$ is differentiable; then $x \in C$ is optimal iff
\[
y \in C \ \Longrightarrow \ \nabla f_0(x)^T (y - x) \geq 0.
\]
So $-\nabla f_0(x)$ defines a supporting hyperplane for $C$ at $x$. This means that if we move from $x$ towards any other feasible $y$, $f_0$ does not decrease.
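As an aside, the small example above with objective $x_1 + x_2$ can be reproduced with a modeling package. The following CVXPY sketch (not part of the paper) expresses the constraint $1 - \sqrt{x_1 x_2} \leq 0$ through the concave geometric mean and recovers $f^\star = 2$, $x^\star = (1, 1)$:

```python
import cvxpy as cp

x = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum(x)),
                  [x >= 0, cp.geo_mean(x) >= 1])   # geo_mean(x) = sqrt(x1 * x2)
prob.solve()
print(prob.value)   # approximately 2
print(x.value)      # approximately [1, 1]
```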

[Figure: contour lines of $f_0$; at the optimal $x$, $-\nabla f_0(x)$ is the normal of a supporting hyperplane of the feasible set $C$.]

Many standard convex optimization algorithms assume a linear objective function. So it is important to realize that any convex optimization problem can be converted to one with a linear objective function. This is called putting the problem into epigraph form. It involves rewriting the problem as:
\[
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & f_0(x) - t \leq 0 \\
& f_i(x) \leq 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p,
\end{array}
\]
where the variables are now $(x, t)$ and the objective has essentially been moved into the inequality constraints. Observe that the new objective is linear: $t = e_{n+1}^T (x, t)$ ($e_{n+1}$ is the vector of all zeros except for a 1 in its $(n+1)$th component). Minimizing $t$ will result in minimizing $f_0$ with respect to $x$, so any optimal $x^\star$ of the new problem will be optimal for the original problem and vice versa.

[Figure: minimizing the linear objective $e_{n+1}^T (x, t)$ over the epigraph of $f_0$ intersected with the feasible set $C$.]

Hence, the linear objective is 'universal' for convex optimization.

The above trick of introducing the extra variable $t$ is known as the method of slack variables. It is very helpful in transforming problems into canonical form. We will see another example in the next section.

A convex optimization problem in standard form with generalized inequalities is written as:
\[
\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & f_i(x) \preceq_{K_i} 0, \quad i = 1, \ldots, L \\
& Ax = b,
\end{array}
\]
where $f_0 : \mathbf{R}^n \rightarrow \mathbf{R}$ is convex; the $\preceq_{K_i}$ are generalized inequalities on $\mathbf{R}^{m_i}$; and the $f_i : \mathbf{R}^n \rightarrow \mathbf{R}^{m_i}$ are $K_i$-convex.

A quasiconvex optimization problem is exactly the same as a convex optimization problem, except for one difference: the objective function $f_0$ is quasiconvex. We will see an important example of quasiconvex optimization in the next section: linear-fractional programming.

V. CANONICAL OPTIMIZATION PROBLEMS

In this section, we present several canonical optimization problem formulations, which have been found to be extremely useful in practice, and for which extremely efficient solution codes are available (often for free!). Thus if a real problem can be cast into one of these forms, then it can be considered as essentially solved.

A. Conic programming

We will present three canonical problems in this section. They are called conic problems because the inequalities are specified in terms of affine functions and generalized inequalities. Geometrically, the inequalities are feasible if the range of the affine mappings intersects the cone of the inequality.

The problems are of increasing expressivity and modeling power. However, roughly speaking, each added level of modeling capability comes at the cost of longer computation time. Thus one should use the least complex form for the problem at hand.

A general linear program (LP) has the form
\[
\begin{array}{ll}
\mbox{minimize} & c^T x + d \\
\mbox{subject to} & Gx \preceq h \\
& Ax = b,
\end{array}
\]
where $G \in \mathbf{R}^{m\times n}$ and $A \in \mathbf{R}^{p\times n}$.

A problem that subsumes both linear and quadratic programming is the second-order cone program (SOCP):
\[
\begin{array}{ll}
\mbox{minimize} & f^T x \\
\mbox{subject to} & \| A_i x + b_i \|_2 \leq c_i^T x + d_i, \quad i = 1, \ldots, m \\
& Fx = g,
\end{array}
\]
where $x \in \mathbf{R}^n$ is the optimization variable, $A_i \in \mathbf{R}^{n_i \times n}$, and $F \in \mathbf{R}^{p\times n}$. When $c_i = 0$, $i = 1, \ldots, m$, the SOCP is equivalent to a quadratically constrained quadratic program (QCQP), which is obtained by squaring each of the constraints. Similarly, if $A_i = 0$, $i = 1, \ldots, m$, then the SOCP reduces to a (general) LP. Second-order cone programs are, however, more general than QCQPs (and of course, LPs).

A problem which subsumes linear, quadratic and second-order cone programming is called a semidefinite program (SDP), and has the form
\[
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & x_1 F_1 + \cdots + x_n F_n + G \preceq 0 \\
& Ax = b,
\end{array}
\]
where $G, F_1, \ldots, F_n \in \mathbf{S}^k$, and $A \in \mathbf{R}^{p\times n}$. The inequality here is a linear matrix inequality.
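As an illustration of the SDP form just defined, the following is a hedged CVXPY sketch (not from the paper; the matrices $F_1$, $F_2$, $G$, the cost $c$, and the added box bound are arbitrary, the box serving only to keep the toy problem bounded). The LMI is written with CVXPY's matrix inequality operator:

```python
import numpy as np
import cvxpy as cp

F1 = np.array([[1.0, 0.0], [0.0, 0.0]])
F2 = np.array([[0.0, 0.0], [0.0, 1.0]])
G  = np.array([[-1.0, 0.3], [0.3, -1.0]])
c  = np.array([1.0, 2.0])

x = cp.Variable(2)
lmi = x[0] * F1 + x[1] * F2 + G << 0          # x1*F1 + x2*F2 + G negative semidefinite
prob = cp.Problem(cp.Minimize(c @ x), [lmi, x >= -10])
prob.solve()
print(prob.status, prob.value, x.value)
```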

As shown earlier, since SOCP constraints can be written as LMI's, SDP's subsume SOCP's, and hence LP's as well. (If there are multiple LMI's, they can be stacked into one large block diagonal LMI, where the blocks are the individual LMI's.)

We will now consider some examples which are themselves very instructive demonstrations of modeling and the power of convex optimization.

First, we show how slack variables can be used to convert a given problem into canonical form. Consider the constrained $\ell_\infty$- (Chebychev) approximation
\[
\begin{array}{ll}
\mbox{minimize} & \| Ax - b \|_\infty \\
\mbox{subject to} & Fx \preceq g.
\end{array}
\]
By introducing the extra variable $t$, we can rewrite the problem as:
\[
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & Ax - b \preceq t \mathbf{1} \\
& Ax - b \succeq -t \mathbf{1} \\
& Fx \preceq g,
\end{array}
\]
which is an LP in the variables $x \in \mathbf{R}^n$, $t \in \mathbf{R}$ ($\mathbf{1}$ is the vector with all components 1).

Second, as a (more extensive) example of how an SOCP might arise, consider linear programming with random constraints,
\[
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & \mathbf{Prob}(a_i^T x \leq b_i) \geq \eta, \quad i = 1, \ldots, m.
\end{array}
\]
Here we suppose that the parameters $a_i$ are independent Gaussian random vectors, with mean $\bar{a}_i$ and covariance $\Sigma_i$. We require that each constraint $a_i^T x \leq b_i$ should hold with a probability (or confidence) exceeding $\eta$, where $\eta \geq 0.5$, i.e.,
\[
\mathbf{Prob}(a_i^T x \leq b_i) \geq \eta.
\]
We will show that this probability constraint can be expressed as a second-order cone constraint. Letting $u = a_i^T x$, with $\sigma^2$ denoting its variance, this constraint can be written as
\[
\mathbf{Prob}\left( \frac{u - \bar{u}}{\sigma} \leq \frac{b_i - \bar{u}}{\sigma} \right) \geq \eta.
\]
Since $(u - \bar{u})/\sigma$ is a zero mean unit variance Gaussian variable, the probability above is simply $\Phi((b_i - \bar{u})/\sigma)$, where
\[
\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt
\]
is the cumulative distribution function of a zero mean unit variance Gaussian random variable. Thus the probability constraint can be expressed as
\[
\frac{b_i - \bar{u}}{\sigma} \geq \Phi^{-1}(\eta),
\]
or, equivalently,
\[
\bar{u} + \Phi^{-1}(\eta)\, \sigma \leq b_i.
\]
From $\bar{u} = \bar{a}_i^T x$ and $\sigma = (x^T \Sigma_i x)^{1/2}$ we obtain
\[
\bar{a}_i^T x + \Phi^{-1}(\eta)\, \| \Sigma_i^{1/2} x \|_2 \leq b_i.
\]
By our assumption that $\eta \geq 1/2$, we have $\Phi^{-1}(\eta) \geq 0$, so this constraint is a second-order cone constraint. In summary, the stochastic LP can be expressed as the SOCP
\[
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & \bar{a}_i^T x + \Phi^{-1}(\eta)\, \| \Sigma_i^{1/2} x \|_2 \leq b_i, \quad i = 1, \ldots, m.
\end{array}
\]
We will see examples of using LMI constraints below.

B. Extensions of conic programming

We now present two more canonical convex optimization formulations, which can be viewed as extensions of the above conic formulations.

A generalized linear fractional program (GLFP) has the form:
\[
\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & Gx \preceq h \\
& Ax = b,
\end{array}
\]
where the objective function is given by
\[
f_0(x) = \max_{i = 1, \ldots, r} \frac{c_i^T x + d_i}{e_i^T x + f_i},
\]
with $\mathrm{dom}\, f_0 = \{ x \mid e_i^T x + f_i > 0, \ i = 1, \ldots, r \}$. The objective function is the pointwise maximum of $r$ quasiconvex functions, and therefore quasiconvex, so this problem is quasiconvex.

A determinant maximization program (maxdet) has the form:
\[
\begin{array}{ll}
\mbox{minimize} & c^T x - \log \det G(x) \\
\mbox{subject to} & G(x) \succ 0 \\
& F(x) \succeq 0 \\
& Ax = b,
\end{array}
\]
where $F$ and $G$ are linear (affine) matrix functions of $x$, so the inequality constraints are LMI's. By flipping signs, one can pose this as a maximization problem, which is the historic reason for the name. Since $G$ is affine and, as shown earlier, the function $\log \det X^{-1}$ is convex on the positive definite matrices, the maxdet program is a convex problem.
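Before moving on, the constrained Chebychev approximation example above is easy to check numerically. The sketch below (not from the paper; random data) solves it once with the $\ell_\infty$ norm directly and once in the explicit slack-variable LP form; the two optimal values agree.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
F, g = np.vstack([np.eye(5), -np.eye(5)]), np.ones(10)   # box constraints -1 <= x <= 1

x = cp.Variable(5)
direct = cp.Problem(cp.Minimize(cp.norm(A @ x - b, "inf")), [F @ x <= g])
direct.solve()

t = cp.Variable()
lp = cp.Problem(cp.Minimize(t),
                [A @ x - b <= t, A @ x - b >= -t, F @ x <= g])
lp.solve()
print(direct.value, lp.value)   # the two optimal values should agree
```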

We now consider an example of each type. First, we consider an optimal transmitter power allocation problem where the variables are the transmitter powers $p_k$, $k = 1, \ldots, m$. We are given $m$ transmitters and $mn$ receivers, all at the same frequency. Transmitter $i$ wants to transmit to its $n$ receivers labeled $(i, j)$, $j = 1, \ldots, n$, as shown below (transmitter and receiver locations are fixed):

[Figure: transmitters $i$ and $k$ and the receivers $(i, j)$ of transmitter $i$.]

Let $A_{ijk} \in \mathbf{R}$ denote the path gain from transmitter $k$ to receiver $(i, j)$, and $N_{ij} \in \mathbf{R}$ be the (self) noise power of receiver $(i, j)$. So at receiver $(i, j)$ the signal power is $S_{ij} = A_{iji}\, p_i$ and the noise plus interference power is $I_{ij} = \sum_{k \neq i} A_{ijk}\, p_k + N_{ij}$. Therefore the signal to interference/noise ratio (SINR) is $S_{ij}/I_{ij}$.

The optimal transmitter power allocation problem is then to choose the $p_i$ to maximize the smallest SINR:
\[
\begin{array}{ll}
\mbox{maximize} & \displaystyle \min_{i,j}\ \frac{A_{iji}\, p_i}{\sum_{k \neq i} A_{ijk}\, p_k + N_{ij}} \\
\mbox{subject to} & 0 \leq p_i \leq p_{\max}.
\end{array}
\]
This is a GLFP.

Next we consider two related ellipsoid approximation problems. In addition to the use of LMI constraints, this example illustrates techniques of computing with ellipsoids and also how the choice of representation of the ellipsoids and polyhedra can make the difference between a tractable convex problem and a hard intractable one.

First consider computing the minimum volume ellipsoid around points $v_1, \ldots, v_K \in \mathbf{R}^n$. This is equivalent to finding the minimum volume ellipsoid around the polytope defined by the convex hull of those points, represented in vertex form.

[Figure: the minimum volume ellipsoid $\mathcal{E}$ containing the points $v_1, \ldots, v_K$.]

For this problem, we choose the representation of the ellipsoid as $\mathcal{E} = \{ x \mid \| Ax - b \| \leq 1 \}$, which is parametrized by its center $A^{-1} b$ and the shape matrix $A = A^T \succ 0$, and whose volume is proportional to $\det A^{-1}$. This problem immediately reduces to:
\[
\begin{array}{ll}
\mbox{minimize} & \log \det A^{-1} \\
\mbox{subject to} & A = A^T \succ 0 \\
& \| A v_i - b \| \leq 1, \quad i = 1, \ldots, K,
\end{array}
\]
which is a convex optimization problem in $A$, $b$ ($n + n(n+1)/2$ variables), and can be written as a maxdet problem.

Now, consider finding the maximum volume ellipsoid in a polytope given in inequality form,
\[
\mathcal{P} = \{ x \mid a_i^T x \leq b_i, \ i = 1, \ldots, L \}.
\]

[Figure: the maximum volume ellipsoid $\mathcal{E}$ inscribed in the polytope $\mathcal{P}$, with center $d$.]

For this problem, we represent the ellipsoid as $\mathcal{E} = \{ By + d \mid \| y \| \leq 1 \}$, which is parametrized by its center $d$ and shape matrix $B = B^T \succ 0$, and whose volume is proportional to $\det B$. Note the containment condition means
\[
\mathcal{E} \subseteq \mathcal{P}
\ \Longleftrightarrow \ a_i^T (By + d) \leq b_i \ \mbox{for all} \ \| y \| \leq 1
\ \Longleftrightarrow \ \sup_{\| y \| \leq 1} \big( a_i^T B y + a_i^T d \big) \leq b_i
\ \Longleftrightarrow \ \| B a_i \| + a_i^T d \leq b_i, \quad i = 1, \ldots, L,
\]
which is a convex constraint in $B$ and $d$. Hence finding the maximum volume $\mathcal{E} \subseteq \mathcal{P}$ is a convex problem in the variables $B$, $d$:
\[
\begin{array}{ll}
\mbox{maximize} & \log \det B \\
\mbox{subject to} & B = B^T \succ 0 \\
& \| B a_i \| + a_i^T d \leq b_i, \quad i = 1, \ldots, L.
\end{array}
\]
Note, however, that minor variations on these two problems, which are convex and hence easy, result in problems that are very difficult:
- compute the maximum volume ellipsoid inside a polyhedron given in vertex form $\mathrm{Co}\{ v_1, \ldots, v_K \}$;
- compute the minimum volume ellipsoid containing a polyhedron given in inequality form $Ax \preceq b$.
In fact, just checking whether a given ellipsoid covers a polytope described in inequality form is extremely difficult (equivalent to maximizing a convex quadratic subject to linear inequalities).
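The minimum volume covering ellipsoid problem above maps almost verbatim to a modeling package. Here is a hedged CVXPY sketch (not from the paper; the points are random), maximizing $\log\det A$ subject to $\|A v_i - b\| \leq 1$:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
V = rng.normal(size=(10, 2))                 # K = 10 points in R^2

A = cp.Variable((2, 2), PSD=True)            # A = A^T >= 0 (symmetric PSD variable)
b = cp.Variable(2)
constraints = [cp.norm(A @ V[k] - b) <= 1 for k in range(V.shape[0])]
prob = cp.Problem(cp.Maximize(cp.log_det(A)), constraints)   # = minimize log det A^{-1}
prob.solve()
print(prob.status, np.linalg.det(A.value))
```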

C. Geometric programming

In this section we describe a family of optimization problems that are not convex in their natural form. However, they are an excellent example of how problems can sometimes be transformed to convex optimization problems, by a change of variables and a transformation of the objective and constraint functions. In addition, they have been found tremendously useful for modeling real problems, e.g., circuit design and communication networks.

A function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ with $\mathrm{dom}\, f = \mathbf{R}^n_{++}$, defined as
\[
f(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n},
\]
where $c > 0$ and $a_i \in \mathbf{R}$, is called a monomial function, or simply, a monomial. The exponents $a_i$ of a monomial can be any real numbers, including fractional or negative, but the coefficient $c$ must be nonnegative. A sum of monomials, i.e., a function of the form
\[
f(x) = \sum_{k=1}^K c_k\, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}},
\]
where $c_k > 0$, is called a posynomial function (with $K$ terms), or simply, a posynomial. Posynomials are closed under addition, multiplication, and nonnegative scaling. Monomials are closed under multiplication and division. If a posynomial is multiplied by a monomial, the result is a posynomial; similarly, a posynomial can be divided by a monomial, with the result a posynomial.

An optimization problem of the form
\[
\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & f_i(x) \leq 1, \quad i = 1, \ldots, m \\
& h_i(x) = 1, \quad i = 1, \ldots, p,
\end{array}
\]
where $f_0, \ldots, f_m$ are posynomials and $h_1, \ldots, h_p$ are monomials, is called a geometric program (GP). The domain of this problem is $D = \mathbf{R}^n_{++}$; the constraint $x \succ 0$ is implicit.

Several extensions are readily handled. If $f$ is a posynomial and $h$ is a monomial, then the constraint $f(x) \leq h(x)$ can be handled by expressing it as $f(x)/h(x) \leq 1$ (since $f/h$ is a posynomial). This includes as a special case a constraint of the form $f(x) \leq a$, where $f$ is a posynomial and $a > 0$. In a similar way, if $h_1$ and $h_2$ are both nonzero monomial functions, then we can handle the equality constraint $h_1(x) = h_2(x)$ by expressing it as $h_1(x)/h_2(x) = 1$ (since $h_1/h_2$ is a monomial). We can maximize a nonzero monomial objective function by minimizing its inverse (which is also a monomial).

We will now transform geometric programs to convex problems by a change of variables and a transformation of the objective and constraint functions. We will use the variables defined as $y_i = \log x_i$, so $x_i = e^{y_i}$. If $f$ is a monomial function of $x$, i.e.,
\[
f(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n},
\]
then
\[
f(x) = f(e^{y_1}, \ldots, e^{y_n}) = c\, (e^{y_1})^{a_1} \cdots (e^{y_n})^{a_n} = e^{a^T y + b},
\]
where $b = \log c$. The change of variables $y_i = \log x_i$ turns a monomial function into the exponential of an affine function.

Similarly, if $f$ is a posynomial, i.e.,
\[
f(x) = \sum_{k=1}^K c_k\, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}},
\]
then
\[
f(x) = \sum_{k=1}^K e^{a_k^T y + b_k},
\]
where $a_k = (a_{1k}, \ldots, a_{nk})$ and $b_k = \log c_k$. After the change of variables, a posynomial becomes a sum of exponentials of affine functions.

The geometric program can be expressed in terms of the new variable $y$ as
\[
\begin{array}{ll}
\mbox{minimize} & \sum_{k=1}^{K_0} e^{a_{0k}^T y + b_{0k}} \\
\mbox{subject to} & \sum_{k=1}^{K_i} e^{a_{ik}^T y + b_{ik}} \leq 1, \quad i = 1, \ldots, m \\
& e^{g_i^T y + h_i} = 1, \quad i = 1, \ldots, p,
\end{array}
\]
where $a_{ik} \in \mathbf{R}^n$, $i = 0, \ldots, m$, contain the exponents of the posynomial inequality constraints, and $g_i \in \mathbf{R}^n$, $i = 1, \ldots, p$, contain the exponents of the monomial equality constraints of the original geometric program.

Now we transform the objective and constraint functions, by taking the logarithm. This results in the problem
\[
\begin{array}{ll}
\mbox{minimize} & \tilde{f}_0(y) = \log \left( \sum_{k=1}^{K_0} e^{a_{0k}^T y + b_{0k}} \right) \\
\mbox{subject to} & \tilde{f}_i(y) = \log \left( \sum_{k=1}^{K_i} e^{a_{ik}^T y + b_{ik}} \right) \leq 0, \quad i = 1, \ldots, m \\
& \tilde{h}_i(y) = g_i^T y + h_i = 0, \quad i = 1, \ldots, p.
\end{array}
\]
Since the functions $\tilde{f}_i$ are convex and the $\tilde{h}_i$ are affine, this problem is a convex optimization problem. We refer to it as a geometric program in convex form. Note that the transformation between the posynomial form geometric program and the convex form geometric program does not involve any computation; the problem data for the two problems are the same.
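Modern modeling tools can also perform this log-transformation automatically. The following is a hedged sketch (not from the paper; the data are arbitrary) of a tiny geometric program solved with CVXPY's geometric-programming mode, which internally applies the change of variables described above:

```python
import cvxpy as cp

x = cp.Variable(pos=True)
y = cp.Variable(pos=True)
z = cp.Variable(pos=True)

prob = cp.Problem(cp.Minimize(x * y**-1.0 * z**-0.5),   # a monomial objective
                  [2 * x * z + x * y <= 10,             # posynomial <= constant
                   y <= 2 * x,                          # monomial <= monomial
                   x * y * z == 1])                     # monomial equality
prob.solve(gp=True)
print(prob.value, x.value, y.value, z.value)
```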

VI. NONSTANDARD AND NONCONVEX PROBLEMS

In practice, one might encounter convex problems which do not fall into any of the canonical forms above. In this case, it is necessary to develop custom code for the problem. Developing such codes requires gradient and, for optimal performance, Hessian information. If only gradient information is available, the ellipsoid, subgradient or cutting plane methods can be used. These are reliable methods with exact stopping criteria. The same is true for interior point methods; however, these also require Hessian information. The payoff for having Hessian information is much faster convergence; in practice, they solve most problems at the cost of a dozen least squares problems of about the size of the decision variables. These methods are described in the references.

Nonconvex problems are, of course, more common than convex problems. However, convex optimization can still often be helpful in solving these as well: it can often be used to compute lower bounds, useful initial starting points, etc.; see for example [4], [13], [23], [30].

VII. CONCLUSION

In this paper we have reviewed some essentials of convex sets, functions and optimization problems. We hope that the reader is convinced that convex optimization dramatically increases the range of problems in engineering that can be modeled and solved exactly.

ACKNOWLEDGEMENT

I am deeply indebted to Stephen Boyd and Lieven Vandenberghe for generously allowing me to use material from [7] in preparing this tutorial.

REFERENCES

[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, second edition, 1993.
[2] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Society for Industrial and Applied Mathematics, 2001.
[3] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[4] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[5] S. Boyd and C. Barratt. Linear Controller Design: Limits of Performance. Prentice-Hall, 1991.
[6] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. Society for Industrial and Applied Mathematics, 1994.
[7] S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2003. In press. Material available at www.stanford.edu/~boyd.
[8] D. Colleran, C. Portmann, A. Hassibi, C. Crusius, S. Mohan, T. Lee, S. Boyd, and M. Hershenson. Optimization of phase-locked loop circuits via geometric programming. In Custom Integrated Circuits Conference (CICC), San Jose, California, September 2003.
[9] M. A. Dahleh and I. J. Diaz-Bobillo. Control of Uncertain Systems: A Linear Programming Approach. Prentice-Hall, 1995.
[10] J. W. Demmel. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.
[11] G. E. Dullerud and F. Paganini. A Course in Robust Control Theory: A Convex Approach. Springer, 2000.
[12] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, second edition, 1989.
[13] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988.
[14] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[15] M. del Mar Hershenson, S. P. Boyd, and T. H. Lee. Optimal design of a CMOS op-amp via geometric programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(1):1–21, 2001.
[16] J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis. Springer, 2001. Abridged version of Convex Analysis and Minimization Algorithms, volumes 1 and 2.
[17] R. A. Horn and C. A. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[18] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984.
[19] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, 1969.
[20] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, second edition, 1984.
[21] Z.-Q. Luo. Applications of convex optimization in signal processing and digital communication. Mathematical Programming Series B, 97:177–207, 2003.
[22] O. Mangasarian. Nonlinear Programming. Society for Industrial and Applied Mathematics, 1994. First published in 1969 by McGraw-Hill.
[23] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Methods in Convex Programming. Society for Industrial and Applied Mathematics, 1994.
[24] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[25] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Linear Optimization: An Interior Point Approach. John Wiley & Sons, 1997.
[26] G. Strang. Linear Algebra and its Applications. Academic Press, 1980.
[27] J. van Tiel. Convex Analysis: An Introductory Text. John Wiley & Sons, 1984.
[28] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of Semidefinite Programming. Kluwer Academic Publishers, 2000.
[29] S. J. Wright. Primal-Dual Interior-Point Methods. Society for Industrial and Applied Mathematics, 1997.
[30] Y. Ye. Interior Point Algorithms: Theory and Analysis. John Wiley & Sons, 1997.
