
A mini-introduction to convexity

Geir Dahl∗ January 20, 2014

1 Introduction

Convexity, or convex analysis, is an area of mathematics where one studies questions related to two basic objects, namely convex sets and convex functions. Triangles, rectangles and “certain” polygons are examples of convex sets in the plane, and the quadratic function f(x) = ax^2 + bx + c is convex provided that a ≥ 0. Actually, the set of points in the plane on or above the graph of this quadratic function is another example of a convex set. But one may also consider convex sets in IR^n for any n, and convex functions of several variables. Convexity is the mathematical core of optimization, and it plays an important role in many other mathematical areas such as statistics, approximation theory, differential equations and mathematical economics.

This note is meant as a short (probably too short) introduction to some concepts and results in convexity. The focus is on convexity in connection with linear optimization. These notes are meant for two (or three) lectures in the course MAT-INF3100 Linear Optimization, where the main project is to study linear programming, but where some knowledge of convexity is useful. Due to the limited scope of these notes we do not discuss convex functions, except for a few remarks in a couple of places.

Example 1. (Optimization and convex functions) A basic optimization problem is to minimize a real-valued function f of n variables, say f(x) where x = (x_1, ..., x_n) ∈ A and A is the domain of f. Such problems arise in all sorts of applications: economics, statistics (estimation, regression, curve fitting), approximation problems, scheduling and planning problems, image analysis, medical imaging, engineering applications etc.

∗University of Oslo, Dept. of Mathematics ([email protected])

Figure 1: Some convex functions

A global minimum of f is a point x* with

f(x∗) ≤ f(x) for all x ∈ A

where A is the domain of f. Often it is hard to find a global minimum, so one settles for a local minimum point, which satisfies f(x*) ≤ f(x) for all x ∈ A that are sufficiently close to x*. There are several optimization algorithms that are able to locate a local minimum of f. Unfortunately, the function value at a local minimum may be much larger than the global minimum value. This raises the question: are there functions where a local minimum point is also a global minimum? The main answer to this question is: if f is a convex function, and the domain A is a convex set, then a local minimum point is also a global minimum point! Thus, one can find the global minimum of convex functions, whereas this may be hard (or even impossible) in other situations. Some convex functions are illustrated in Figure 1.
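To make this guarantee concrete, here is a minimal numerical sketch in Python; the particular function, step size and starting point are illustrative choices, not taken from these notes. Gradient descent on a convex quadratic: whatever local minimum it finds is automatically the global one.

```python
# Minimal sketch: gradient descent on a convex function f(x1, x2).
# Because f is convex, the local minimum found here is also global.
# The concrete f, gradient, step size and starting point are
# illustrative assumptions.

def f(x1, x2):
    return (x1 - 1) ** 2 + (x2 + 2) ** 2  # a convex quadratic

def grad_f(x1, x2):
    return (2 * (x1 - 1), 2 * (x2 + 2))

x1, x2 = 5.0, 5.0          # arbitrary starting point
step = 0.1                 # fixed step size
for _ in range(200):
    g1, g2 = grad_f(x1, x2)
    x1, x2 = x1 - step * g1, x2 - step * g2

print(f(x1, x2))           # close to 0, attained at (1, -2)
```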

In linear optimization (= linear programming) we minimize a linear function f subject to linear constraints; more precisely, these constraints are linear inequalities and linear equations. The feasible set in this case (the set of points satisfying the constraints) is always a convex set; in fact it is a special convex set called a polyhedron.

Example 2. (Convex set) Loosely speaking, a convex set in IR^2 (or IR^n) is a set “with no holes”. More accurately, a convex set C has the following property: whenever we choose two points in the set, say x, y ∈ C, then all points on the line segment between x and y also lie in C. Some examples of convex sets in the plane are: a sphere (ball), an ellipsoid, a point, a line, a line segment, a rectangle, a triangle; see Fig. 2. But, for instance, a set with a finite number p of points is only convex when p = 1. The union of two disjoint (closed) triangles is also nonconvex.

Figure 2: Some convex sets in the plane.

Example 3. (Approximation) A basic approximation problem, with several applications, may be presented in the following way: given some closed set S ⊆ IR^n and a vector a ∉ S, find a nearest point, defined as a point (vector) x ∈ S which is as close to a as possible among elements in S. Let us measure the distance between vectors using the Euclidean norm, so ||x − y|| = (∑_{j=1}^n (x_j − y_j)^2)^{1/2} for x, y ∈ IR^n. One can show that there is always at least one nearest point provided that S is nonempty and closed (contains its boundary). Now, if S is a convex set, then there is a unique nearest point.
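For one concrete convex set S this unique nearest point (the projection) even has a closed form. The small sketch below is an illustration under that assumption, with S taken to be the Euclidean unit ball: the projection of a is a itself if ||a|| ≤ 1, and a scaled to unit length otherwise.

```python
import math

# Minimal sketch: the unique nearest point (projection) of a onto the
# closed convex set S = {x in R^n : ||x|| <= 1}, the Euclidean unit ball.
# For this particular S the projection has a closed form.

def project_to_unit_ball(a):
    norm = math.sqrt(sum(aj * aj for aj in a))
    if norm <= 1.0:
        return list(a)              # a already lies in S
    return [aj / norm for aj in a]  # scale a back to the boundary

print(project_to_unit_ball([3.0, 4.0]))  # [0.6, 0.8]
```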

2 The definitions

You will now see three basic definitions.

1. A set C ⊆ IR^n is called convex if

   (1 − λ)x_1 + λx_2 ∈ C whenever x_1, x_2 ∈ C and 0 ≤ λ ≤ 1.

   Geometrically, this means that C contains the line segment between each pair of points in C.

2. Let C ⊆ IR^n be a convex set and consider a real-valued function f defined on C. The function f is called convex if the inequality

   f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y)     (1)

   holds for every x, y ∈ C and every 0 ≤ λ ≤ 1.

3. A polyhedron P ⊆ IR^n is defined as the solution set of a system of linear inequalities. Thus, P has the form

   P = {x ∈ IR^n : Ax ≤ b}     (2)

   where A is a real m × n matrix, b ∈ IR^m, and where the vector inequality is interpreted componentwise.
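Definition 2 can be explored numerically. The sketch below is an illustration (the chosen f and the sampling scheme are my own assumptions): it tests the inequality (1) on random points and random λ. Note that a finite sample can only refute convexity, never prove it.

```python
import random

# Minimal sketch: numerically test the convexity inequality (1),
# f((1-lam)*x + lam*y) <= (1-lam)*f(x) + lam*f(y),
# for a concrete f on random points. The f below is an assumption.

def f(x):
    return sum(xj * xj for xj in x)  # convex: squared Euclidean norm

def violates_convexity(f, x, y, lam):
    z = [(1 - lam) * xj + lam * yj for xj, yj in zip(x, y)]
    return f(z) > (1 - lam) * f(x) + lam * f(y) + 1e-12

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    lam = random.random()
    assert not violates_convexity(f, x, y, lam)
print("no violations found in 1000 samples")
```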

Here are some important comments on these definitions:

• In part 1 of the definition above the point (1 − λ)x_1 + λx_2 is called a convex combination of x_1 and x_2. So the definition of a convex set says that it is closed under taking convex combinations of pairs of points. Actually, one can show that when C is convex it also contains every convex combination of any (finite) set of its points. A convex combination of points x_1, x_2, ..., x_m is a point of the form

  ∑_{j=1}^m λ_j x_j

where the coefficients λ_1, λ_2, ..., λ_m are nonnegative and sum to 1 (see the sketch after these remarks).

• In the definition of a convex function we actually use that the domain C is a convex set: this assures that the point (1 − λ)x + λy lies in C, so the defining inequality for f makes sense.

• In the definition of a polyhedron we consider systems of linear inequalities. Since a linear equation a^T x = α may be written as two linear inequalities, namely a^T x ≤ α and −a^T x ≤ −α, one may also say that a polyhedron is the solution set of a system of linear equations and inequalities.
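As promised in the first remark, here is a small sketch (with illustrative data) that forms convex combinations of finitely many points; with nonnegative weights summing to 1 the result always lies in the convex hull of the points.

```python
import random

# Minimal sketch: a convex combination sum_j lam_j * x_j of m points,
# with random nonnegative weights normalized to sum to 1.

def convex_combination(points, weights):
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    n = len(points[0])
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(n)]

random.seed(0)
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # a triangle
raw = [random.random() for _ in points]
weights = [r / sum(raw) for r in raw]
print(convex_combination(points, weights))       # lies inside the triangle
```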

Proposition 1. Every polyhedron is a convex set.

Proof. Consider a polyhedron P = {x ∈ IR^n : Ax ≤ b} and let x_1, x_2 ∈ P and 0 ≤ λ ≤ 1. Then

  A((1 − λ)x_1 + λx_2) = (1 − λ)Ax_1 + λAx_2 ≤ (1 − λ)b + λb = b,

which shows that (1 − λ)x_1 + λx_2 ∈ P, and the convexity of P follows.
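A quick numerical companion to this proof (an illustration with made-up A, b and points): every sampled convex combination of two feasible points remains feasible.

```python
import numpy as np

# Minimal sketch echoing Proposition 1 on made-up data: two feasible
# points of P = {x : A x <= b}, and convex combinations of them.

A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 0.0, 0.0])

def in_polyhedron(x):
    return np.all(A @ x <= b + 1e-12)

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 0.5])
assert in_polyhedron(x1) and in_polyhedron(x2)
for lam in np.linspace(0.0, 1.0, 11):
    assert in_polyhedron((1 - lam) * x1 + lam * x2)
print("all sampled convex combinations are feasible")
```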

3 Linear optimization and convexity

Recall that a linear programming problem may be written as

maximize   c_1 x_1 + ··· + c_n x_n
subject to
  a_11 x_1 + ··· + a_1n x_n ≤ b_1
  ...                                          (3)
  a_m1 x_1 + ··· + a_mn x_n ≤ b_m
  x_1, ..., x_n ≥ 0,

or more compactly in matrix form

  maximize   c^T x
  subject to Ax ≤ b                            (4)
             x ≥ O.

Here A = [a_ij] is the m × n coefficient matrix with (i, j)th element a_ij, b ∈ IR^m is a column vector, and O denotes a zero vector (here of dimension n). Again vector inequalities should be interpreted componentwise. Thus, the LP feasible set {x ∈ IR^n : Ax ≤ b, x ≥ O} is a polyhedron and therefore a convex set. (Actually, LP may be defined as minimizing or maximizing a linear function over a polyhedron.) As a consequence we have that if x_1 and x_2 are two feasible points, then every convex combination of these points is also feasible. But what can be said about the set of optimal solutions?

Proposition 2. In an LP problem with finite optimal value the set P* of optimal solutions is a convex set; actually P* is a polyhedron.

Proof. Let v* denote the optimal value. Then

  P* = {x ∈ IR^n : Ax ≤ b, x ≥ O, c^T x = v*},

which is a polyhedron.

So, if you have different optimal solutions of an LP problem, every convex combination of these will also be optimal. An attempt to illustrate the geometry of linear programming is given in Fig. 3 (where the feasible region is the solution set of five linear inequalities).
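This is easy to observe numerically. The sketch below (my illustration with made-up data; it uses SciPy's linprog, which minimizes, so the objective is negated) solves a small LP of the form (4):

```python
import numpy as np
from scipy.optimize import linprog

# Minimal sketch: solve max c^T x s.t. Ax <= b, x >= 0 with SciPy.
# linprog minimizes, so we pass -c. The data are made up.

c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0], [2.0, 1.0]])
b = np.array([4.0, 6.0])

res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # an optimal vertex and the optimal value
```

For this data the optimal value 4 is attained on a whole edge of the feasible set, so any convex combination of the optimal vertices (0, 4) and (2, 2) is again optimal, exactly as Proposition 2 predicts.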

Figure 3: Linear programming (the feasible set, level lines c^T x = const., and an optimal vertex x*).

4 The convex hull

Given a (possibly nonconvex) set S it is natural to ask for the smallest convex set containing S. This question is what we consider in this section.

Let S ⊆ IR^n be any set. Define the convex hull of S, denoted by conv(S), as the set of all convex combinations of points in S (see Fig. 4). The convex hull of two points x_1 and x_2 is the line segment between the two points. An important fact is that conv(S) is a convex set, whatever the set S might be. Thus, taking the convex hull becomes a way of producing new convex sets. The following proposition tells us that the convex hull of a set S is the smallest convex set containing S. Recall that the intersection of an arbitrary family of sets consists of the points that lie in all of these sets.

Proposition 3. Let S ⊆ IR^n. Then conv(S) is equal to the intersection of all convex sets containing S. Thus, conv(S) is the smallest convex set containing S.

Proof. It is an exercise to show that conv(S) is convex. Moreover, S ⊆ conv(S); just look at a convex combination of one point! Therefore W ⊆ conv(S), where W is defined as the intersection of all convex sets containing S. Now, consider a convex set C containing S. Then C must contain all convex combinations of points in S. But then conv(S) ⊆ C, and we conclude that W (the intersection of such sets C) must contain conv(S). This completes the proof.

Figure 4: Convex hull: a) the set S; b) conv(S).

Note that if S is convex, then conv(S) = S. The proof is left as an exercise.

We have seen that by taking the convex hull we produce a convex set whatever set we might start with. If we start with a finite set, a very interesting class of convex sets arises. A set P ⊂ IR^n is called a polytope if it is the convex hull of a finite set of points in IR^n. In Figure 3 the shaded area is a polytope (of dimension 2). An example of a three-dimensional polytope is the dodecahedron.

The convex set in Fig. 4 b) is not a polytope. Every polytope is clearly bounded, i.e., it lies in some suitably large ball. A central result in the theory of polytopes is the following: a set is a polytope if and only if it is a bounded polyhedron. We return to this in Section 6.
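Computing the convex hull of a finite point set is a standard task. The sketch below (illustrative points, using SciPy's ConvexHull) recovers exactly the points needed to span the polytope:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Minimal sketch: the convex hull of a finite point set is a polytope;
# ConvexHull reports which input points are its vertices. The points
# below are made up for illustration.

pts = np.array([[0, 0], [2, 0], [0, 2], [2, 2], [1, 1], [0.5, 0.5]])
hull = ConvexHull(pts)
print(pts[hull.vertices])   # the corners of the square; interior points drop out
```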

Polytopes have been studied a lot during the history of mathematics. Today polytope theory is still a fascinating subject with a lot of activity. One of the reasons is its relation to linear programming, because in LP problems the feasible set P is a polyhedron. If, moreover, P is bounded, then it is actually a polytope (see above). This means that there is a finite set S of points which “span” P in the sense that P consists of all convex combinations of the points in S.

Example 4. (LP and polytopes) Consider a polytope

P = conv({x_1, x_2, ..., x_t}).

We want to solve the optimization problem

  max{c^T x : x ∈ P}     (5)

where c ∈ IR^n. As mentioned above, this problem is an LP problem, but we do not worry too much about this now. The interesting thing is the combination of a linear objective function and the fact that the feasible set is a convex hull of finitely many points. To see this, consider an arbitrary feasible point x ∈ P. Then x may be written as a convex combination of the t points x_1, x_2, ..., x_t, say x = ∑_{j=1}^t λ_j x_j for some λ_j ≥ 0, j = 1, ..., t, where ∑_j λ_j = 1. Define now v* = max_j c^T x_j. We then calculate

  c^T x = c^T ∑_{j=1}^t λ_j x_j = ∑_{j=1}^t λ_j c^T x_j ≤ ∑_{j=1}^t λ_j v* = v* ∑_{j=1}^t λ_j = v*.

Thus, v* is an upper bound for the optimal value in the optimization problem (5). We also see that this bound is attained whenever λ_j is positive only for those indices j satisfying c^T x_j = v*. Let J be the set of such indices. We conclude that the set of optimal solutions of the problem (5) is

  conv({x_j : j ∈ J}),

which is another polytope (contained in P). The procedure just described may be useful computationally if the number t of points defining P is not too large. In some cases t is too large, and then we may still be able to solve the problem (5) by different methods, typically linear programming related methods.
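When t is small this procedure is immediate to implement. Here is a sketch with made-up data:

```python
import numpy as np

# Minimal sketch of Example 4: maximize c^T x over P = conv({x_1,...,x_t})
# by evaluating c^T x_j at the spanning points only. Data are made up.

c = np.array([2.0, 1.0])
X = np.array([[0, 0], [3, 0], [0, 3], [2, 2]], dtype=float)  # the x_j

values = X @ c
v_star = values.max()
J = np.where(np.isclose(values, v_star))[0]
print(v_star, X[J])   # optimal value and the optimal spanning points
```

Every convex combination of the printed points is optimal, in line with the conclusion conv({x_j : j ∈ J}) above.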

5 Consistency and Farkas' lemma

Farkas' lemma is a theoretical result concerned with the consistency of a system of linear inequalities. It gives necessary and sufficient conditions for this system to be consistent, i.e., to have at least one solution. Here is Farkas' lemma; we prove it using LP duality theory. Other proofs also exist, for instance using separation theorems in convexity.

Lemma 4. Let A ∈ IR^{m,n} and b ∈ IR^m. Then the linear system Ax ≤ b has at least one solution x if and only if y^T b ≥ 0 for every y ∈ IR^m satisfying y^T A = O and y ≥ O.

Proof. Consider the pair of dual LP problems (P) max c^T x subject to Ax ≤ b, and (D) min y^T b subject to y^T A = c^T, y ≥ O. Let now c = O. Then (D) is feasible (y = O is feasible), so by LP theory (D) either has an optimal solution or is unbounded. If (D) is unbounded, weak LP duality implies that (P) cannot have any feasible solution, i.e., Ax ≤ b is not consistent. On the other hand, if (D) has an optimal solution, then y^T b ≥ 0 for every y ∈ IR^m satisfying y^T A = O, y ≥ O. (For: if y^T b < 0 for some y with y^T A = O and y ≥ O, then (D) would be unbounded; just consider y' = λy and increase λ.) This proves the result.

The geometrical content of Farkas' lemma may be explained using a different version of this lemma: Ax = b has a nonnegative solution if and only if y^T b ≥ 0 for all y with y^T A ≥ 0. Consider the cone C generated by the columns a_1, a_2, ..., a_n of the matrix A; this is the set of all linear combinations of the columns using nonnegative coefficients only. Then b ∈ C means that there is a nonnegative solution x to the linear system Ax = b. Now, if b ∉ C, Farkas' lemma says that there is a vector y with y^T a_j ≤ 0 for j ≤ n and y^T b > 0, which geometrically means that the hyperplane {x ∈ IR^n : y^T x = 0} separates b from a_1, a_2, ..., a_n. See the illustration in Fig. 5.

Figure 5: Geometry of Farkas' lemma: the cone C = cone({a_1, ..., a_4}), the vector b outside C, and the separating hyperplane {x : y^T x = 0}.
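The alternative in Farkas' lemma can also be observed computationally. The sketch below uses made-up data and SciPy's linprog; the box bounds on y are only there to keep the certificate LP bounded, so this is an illustration rather than a general procedure.

```python
import numpy as np
from scipy.optimize import linprog

# Minimal sketch of the Farkas alternative (made-up data): either
# Ax = b has a solution x >= 0, or there is y with y^T A >= 0 and
# y^T b < 0. We look for each via linprog.

A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, 2.0])          # b lies outside the cone of the columns

# Try to find x >= 0 with Ax = b (pure feasibility: zero objective).
primal = linprog(np.zeros(2), A_eq=A, b_eq=b, bounds=[(0, None)] * 2)
print("Ax = b, x >= 0 solvable:", primal.status == 0)

# Look for a certificate: minimize y^T b subject to y^T A >= 0,
# written as -A^T y <= 0, with box bounds to keep the LP bounded.
cert = linprog(b, A_ub=-A.T, b_ub=np.zeros(2), bounds=[(-1, 1)] * 2)
print("certificate y:", cert.x, "y^T b =", cert.fun)  # negative value
```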

6 Main theorem for polyhedra

We now present one of the most important theorems concerning polyhedra and polytopes. Actually, the theorem explains how these two objects are related. The result in its general form was proved in 1936 by T.S. Motzkin, and earlier more specialized versions are due to G. Farkas, H. Minkowski and H. Weyl.


Theorem 5. Let P ⊆ IR^n be a nonempty polyhedron. Then there are vectors v_1, v_2, ..., v_s ∈ IR^n and w_1, w_2, ..., w_t ∈ IR^n such that P consists precisely of the vectors x that may be written

  x = ∑_{i=1}^s λ_i v_i + ∑_{j=1}^t µ_j w_j     (6)

where all λ_i and µ_j are nonnegative and ∑_{i=1}^s λ_i = 1.

Conversely, let v_1, v_2, ..., v_s ∈ IR^n and w_1, w_2, ..., w_t ∈ IR^n. Then the set Q of vectors x of the form (6) is a polyhedron, so there is a matrix A ∈ IR^{m,n} and a vector b ∈ IR^m such that

  Q = {x ∈ IR^n : Ax ≤ b}.

We omit the proof here. The theorem above is an existence result; it shows the existence of certain vectors and matrices, but it does not tell us how to find these. Still, this can indeed be done using, for instance, the Fourier-Motzkin method, which we explain in Section 8. The core of the main theorem above is that polyhedra may be represented in two ways:

1. as the solution set of a linear system Ax ≤ b: this may be considered an exterior description as the intersection of halfspaces.

2. as an interior description via convex combinations and conical combinations of certain vectors.

An immediate consequence of the main theorem is the following result.

Corollary 6. A set P is a polytope if and only if it is a bounded polyhedron.

Theorem 5 has a strong connection to linear programming theory. Consider an LP problem

  max c^T x subject to Ax ≤ b,

and the corresponding feasible polyhedron P = {x ∈ IR^n : Ax ≤ b}, which we assume is nonempty. Let v_1, v_2, ..., v_s ∈ IR^n and w_1, w_2, ..., w_t ∈ IR^n be the vectors as described in the first part of Theorem 5. So every x ∈ P may be written as

  x = ∑_{i=1}^s λ_i v_i + ∑_{j=1}^t µ_j w_j

for suitable nonnegative scalars where ∑_{i=1}^s λ_i = 1. There are now two possibilities:

• c^T w_j > 0 for some j ≤ t. Then the LP is unbounded; we can get an arbitrarily large value of the objective function by moving along the ray {µ w_j : µ ≥ 0}.

• c^T w_j ≤ 0 for all j ≤ t. Then the LP has an optimal solution and, actually, one of the points v_1, v_2, ..., v_s is optimal (as we may let µ_j = 0 for all j); recall here the argument given in Example 4.

Moreover, the points v_1, v_2, ..., v_s in Theorem 5 may be chosen to be extreme points of the polyhedron P. An extreme point of P is defined as a point which cannot be written as a convex combination of other points in P (actually, it suffices to check whether it can be written as the midpoint between two points in P). Thus: x ∈ P is an extreme point if and only if

  x = (1/2)x^1 + (1/2)x^2 for x^1, x^2 ∈ P implies that x^1 = x^2 = x.

Proposition 7. Let A be an m × n matrix with rank m, and consider the polyhedron P = {x ∈ IR^n : Ax = b, x ≥ O}. Let x ∈ P. Then x is a basic solution (in the LP sense) if and only if x is an extreme point of P.

Proof. If x is a basic feasible solution, then (after possibly reordering variables; see the LP lectures) we may write x = (x_B, x_N), where x_B = A_B^{-1} b, x_N = O and A = [A_B  A_N]; here the m × m submatrix A_B is invertible (nonsingular). (This is possible as A has full row rank, so there must exist m linearly independent columns in A.) Assume that x = (1/2)x^1 + (1/2)x^2 for some x^1, x^2 ∈ P. This immediately implies that x^1_N = x^2_N = O (as x_j = 0 and x^1_j, x^2_j ≥ 0 for each j ∈ N). Moreover, Ax^1 = b, so A_B x^1_B + A_N x^1_N = b and therefore x^1_B = A_B^{-1} b = x_B (as x^1_N = O). Similarly, x^2_B = A_B^{-1} b = x_B, so x^1 = x^2 = x, and it follows that x is an extreme point.

Conversely, assume that x is an extreme point of P and consider the indices corresponding to positive components: J = {j ≤ n : x_j > 0}. Claim: the columns in A corresponding to J are linearly independent. Proof of Claim: If J is empty, there is nothing to prove. Otherwise, let a_j denote the jth column in A. From Ax = b we obtain ∑_{j∈J} x_j a_j = b. Assume ∑_{j∈J} λ_j a_j = O. Multiply this equation by a number ε and add it to the previous vector equation; this gives ∑_{j∈J} (x_j + ελ_j) a_j = b. In this equation each x_j is positive, so we can find a suitably small ε_0 > 0 (small compared to the λ_j's) such that x_j + ελ_j > 0 for all ε ∈ [−ε_0, ε_0]. Define x^1 and x^2 by x^1_j = x_j + ε_0 λ_j (j ∈ J), x^1_j = 0 (j ∉ J), and x^2_j = x_j − ε_0 λ_j (j ∈ J), x^2_j = 0 (j ∉ J). Then x^1, x^2 ∈ P (as x^1 ≥ O and Ax^1 = Ax + ε_0 ∑_{j∈J} λ_j a_j = Ax = b; similarly for x^2). Moreover, x = (1/2)x^1 + (1/2)x^2, and since x is an extreme point, we get x^1 = x^2 and therefore λ_j = 0 for each j ∈ J. This proves that the vectors a_j, j ∈ J, are linearly independent.

Finally, we may extend the columns a_j, j ∈ J, to a basis for IR^m by adding m − |J| other columns from A; this is possible since the column rank of A is m (the “extension theorem” in basic linear algebra). So these m columns form a basis A_B in A (a nonsingular submatrix), and since Ax = b (because x ∈ P) and x_N = O we get A_B x_B = b, so x_B = A_B^{-1} b and x is a basic solution.

Thus, the two possibilities discussed in the bullets above correspond to what is known as the fundamental theorem of linear programming.

7 Finding extreme points

An interesting, and sometimes important, task is to determine all (or some) of the extreme points of a given polyhedron P. From the previous section we now have two techniques for finding the extreme points:

1. Use the definition of extreme point. Look at a general point x in the polyhedron P. Try to write it as the midpoint between two other points in P; if so, x is not an extreme point. Modify x suitably so that “it is harder to write it as a midpoint”; this will happen when you force more inequalities to be active at your point. Eventually, if you succeed, you will find all extreme points in this way. Note that this technique is a combinatorial discussion/argument which could be very hard to perform in practice (sometimes impossible!). But the idea is to exploit the structure of the given inequalities/equations somehow. This technique may be used for a general polyhedron, e.g., in the form P = {x ∈ IR^n : Ax ≤ b}.

2. Use Proposition 7. This method may be applied when the polyhedron has the form P = {x ∈ IR^n : Ax = b, x ≥ O}, where the m × n matrix A has rank m. (Remark: there is a similar technique for Ax ≤ b, but we do not go into this here; enough is enough!) The technique is “simply” to determine all bases of A. So, again there will be a combinatorial discussion. In principle you may consider all choices of m columns selected from the n columns (i.e., choosing a subset B of size m from {1, 2, ..., n}), and for each choice you decide if the corresponding submatrix A_B is invertible by solving A_B x_B = b to see if the solution is unique. If so, and if, in addition, x_B ≥ O, then you have found an extreme point x = (x_B, x_N) = (A_B^{-1} b, O). A code sketch of this enumeration is given below.
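Here is a direct implementation of Technique 2 (a brute-force sketch using NumPy; it enumerates all column subsets, so it is only meant for small instances):

```python
import itertools
import numpy as np

# Minimal sketch of Technique 2: enumerate all choices of m columns of A,
# keep those giving an invertible A_B, solve A_B x_B = b, and accept the
# nonnegative solutions as extreme points.

def extreme_points(A, b, tol=1e-9):
    m, n = A.shape
    found = []
    for B in itertools.combinations(range(n), m):
        AB = A[:, list(B)]
        if abs(np.linalg.det(AB)) < tol:
            continue                      # singular submatrix: not a basis
        xB = np.linalg.solve(AB, b)
        if np.all(xB >= -tol):            # feasible basic solution
            x = np.zeros(n)
            x[list(B)] = xB
            found.append(x)
    if not found:
        return np.empty((0, n))
    # different bases may yield the same point; remove duplicates
    return np.unique(np.round(np.array(found), 9), axis=0)
```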

But a warning here: finding all vertices can be a very difficult mathematical challenge. Let us look at some examples where these methods do the job.

Example 5. (i) Let a_1, a_2, ..., a_n and b be given positive numbers and consider the polyhedron

  P = {x ∈ IR^n : a_1 x_1 + a_2 x_2 + ··· + a_n x_n = b, x ≥ O}.

Then P has the required form, so we can apply Technique 2 above. The matrix A is the 1 × n matrix A = [a_1 a_2 ··· a_n], which has rank m = 1. So a basis in A is simply an entry in A, a 1 × 1 submatrix, and it is invertible as each entry is nonzero. So if B = {j}, we solve A_B x_B = b and get x_B = b/a_j, which is positive. The corresponding extreme point is (0, ..., 0, b/a_j, 0, ..., 0)

where the nonzero entry is in position j. Thus, P has n extreme points (and they are the intersections between the coordinate axes and the hyperplane given by a^T x = b).

(ii) Let

  P = {x ∈ IR^4 : 2x_1 + 3x_2 = 12, x_1 + x_2 + x_3 + x_4 = 1, x ≥ O}.

Again we may apply Technique 2 for

  A = [ 2  3  0  0 ]
      [ 1  1  1  1 ]

which has rank 2. The following selections of index sets B correspond to bases in A: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}. Here {3, 4} does not give a basis, as the first row of the corresponding submatrix is the zero vector (so the submatrix is singular). Note that some of the index sets may lead to the same basis (for example, {2, 3} and {2, 4} do so), but these may give different points x (as the B-parts are in different components). Then, for each basis, we may compute the corresponding basic solution, and the feasible (i.e., nonnegative) basic solutions are then the extreme points. We leave this computation to the reader, or to the code sketch at the end of this example!

(iii) Let

  P = {x ∈ IR^n : ∑_{j=1}^n a_j x_j ≤ b, O ≤ x ≤ e}

where a_j > 0 (j ≤ n), b > 0 and e is the all ones vector. Let us find all extreme points using Technique 1 above.

Consider first an x satisfying ∑_{j=1}^n a_j x_j < b and 0 < x_j < 1 for some j ≤ n. But then x = (1/2)x' + (1/2)x'', where x' = x + εe_j, x'' = x − εe_j and x', x'' ∈ P for suitably small ε > 0 (and where e_j is the jth unit vector). Thus we see that if x is an extreme point satisfying ∑_{j=1}^n a_j x_j < b, then x_j ∈ {0, 1} for each j! This gives the candidate extreme points: those (0, 1)-vectors that satisfy ∑_{j=1}^n a_j x_j < b.

Next, if an extreme point satisfies ∑_{j=1}^n a_j x_j = b, then a slight extension of the same argument shows that x_j ∈ {0, 1} for all except at most one j. Actually, if there were two components strictly between 0 and 1, we could increase one and decrease the other suitably, and violate the extreme point property. Moreover, if one such component x_j is strictly between 0 and 1, then x_j is determined by the equation ∑_{j=1}^n a_j x_j = b.

This gives all candidate extreme points, and it is not difficult to show that all these points are indeed extreme points of P. Again, we leave it to the reader to write down all the extreme points based on this discussion.
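For instance, the extreme_points sketch given earlier in this section carries out the computation left to the reader in (ii):

```python
import numpy as np

# Usage of the extreme_points sketch above on Example 5 (ii).
A = np.array([[2.0, 3.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])
b = np.array([12.0, 1.0])
print(extreme_points(A, b))   # the feasible basic solutions, if any
```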

8 Fourier-Motzkin elimination and projection

Fourier-Motzkin elimination is a computational method which may be seen as a generalization of Gaussian elimination. It is used for finding one or all solutions of a linear system of inequalities, say Ax ≤ b where A ∈ IR^{m,n} and b ∈ IR^m. Moreover, the same method can be used to find the projection of a polyhedron into a subspace. The idea is to eliminate one variable at a time and rewrite the system accordingly. To explain the method we assume that we want to eliminate the variables in the order x_1, x_2, ..., x_n (although any order will do). The system Ax ≤ b may be split into three subsystems

  a_i1 x_1 + a_i2 x_2 + ··· + a_in x_n ≤ b_i   (i ∈ I^+)
  0·x_1 + a_i2 x_2 + ··· + a_in x_n ≤ b_i      (i ∈ I^0)     (7)
  a_i1 x_1 + a_i2 x_2 + ··· + a_in x_n ≤ b_i   (i ∈ I^−)

where I^+ = {i : a_i1 > 0}, I^0 = {i : a_i1 = 0} and I^− = {i : a_i1 < 0}. This system is clearly equivalent to

  a'_k2 x_2 + ··· + a'_kn x_n − b'_k ≤ x_1 ≤ b'_i − a'_i2 x_2 − ··· − a'_in x_n   (i ∈ I^+, k ∈ I^−)
  a_i2 x_2 + ··· + a_in x_n ≤ b_i   (i ∈ I^0)     (8)

where b'_i = b_i/|a_i1| and a'_ij = a_ij/|a_i1| for each i ∈ I^+ ∪ I^−. It follows that x_1, x_2, ..., x_n is a solution of the original system (7) if and only if x_2, x_3, ..., x_n satisfy

  a'_k2 x_2 + ··· + a'_kn x_n − b'_k ≤ b'_i − a'_i2 x_2 − ··· − a'_in x_n   (i ∈ I^+, k ∈ I^−)
  a_i2 x_2 + ··· + a_in x_n ≤ b_i   (i ∈ I^0)     (9)

and x_1 satisfies

  max_{k ∈ I^−} (a'_k2 x_2 + ··· + a'_kn x_n − b'_k) ≤ x_1 ≤ min_{i ∈ I^+} (b'_i − a'_i2 x_2 − ··· − a'_in x_n).     (10)

Note that, when values of x_2, x_3, ..., x_n have been selected, the constraint in (10) says that x_1 lies in a certain interval (determined by x_2, x_3, ..., x_n).

Thus, we have eliminated x_1 and may proceed to solve the system (9), which involves x_2, x_3, ..., x_n. We here eliminate x_2 in a similar way, and proceed until we obtain a linear system which only involves x_n, say l ≤ x_n ≤ u. Then a general solution to Ax ≤ b is obtained by choosing x_n in the interval [l, u], next choosing x_{n−1} in an interval which depends on x_n, then choosing x_{n−2} in an interval depending on x_n and x_{n−1}, etc. We may summarize our findings in the following theorem.

Theorem 8. The Fourier-Motzkin elimination method is a finite algorithm that finds a general solution to a given linear system Ax ≤ b. If there is no solution, the method determines this fact by finding an implied and inconsistent inequality (0 ≤ −1). Moreover, the method finds the projection P' of the given polyhedron P = {x ∈ IR^n : Ax ≤ b} into the space of a subset of the variables, and shows that P' is also a polyhedron by finding a linear inequality description of P'.

Proof. This follows from our description above using induction.

We note that the system we obtain after the elimination of a variable may contain many more inequalities than the original one. However, some of these new inequalities may be redundant, and this is checked for in computer implementations of the algorithm. We conclude with a small illustration of the Fourier-Motzkin elimination method.

Example 6. (Fourier-Motzkin) Consider the system

  1 ≤ x_1 ≤ 2,   1 ≤ x_2 ≤ 4,   x_1 − x_2 ≥ −1.

We eliminate x_1 and get

  max{1, x_2 − 1} ≤ x_1 ≤ 2

and the new system

  1 ≤ 2,   x_2 − 1 ≤ 2,   1 ≤ x_2 ≤ 4.

The last system has two redundant inequalities and is equivalent to

  1 ≤ x_2 ≤ 3.

Thus, the general solution is:

  1 ≤ x_2 ≤ 3,   max{1, x_2 − 1} ≤ x_1 ≤ 2.

Moreover, if P is the polyhedron consisting of the solutions to the original system, then the projection of P into the x_2-space is the interval [1, 3]. These facts may be checked by a suitable drawing in the plane.
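The elimination step (7)-(9) is mechanical enough to code directly. The sketch below is my own formulation, representing each inequality as a pair (a, b) meaning a·x ≤ b; run on Example 6 it reproduces the reduced system above.

```python
# Minimal sketch of one Fourier-Motzkin step: eliminate x1 from a system
# of rows (a, b), each meaning a[0]*x1 + ... + a[-1]*xn <= b. Each pair
# of an upper-bound row (a[0] > 0) and a lower-bound row (a[0] < 0) is
# combined as in (9); rows with a[0] == 0 are kept unchanged.

def eliminate_first(rows):
    upper = [(a, b) for a, b in rows if a[0] > 0]
    lower = [(a, b) for a, b in rows if a[0] < 0]
    new_rows = [(a[1:], b) for a, b in rows if a[0] == 0]
    for au, bu in upper:
        for al, bl in lower:
            cu, cl = au[0], -al[0]        # both positive scaling factors
            coeffs = [cu * lj + cl * uj for lj, uj in zip(al[1:], au[1:])]
            new_rows.append((coeffs, cu * bl + cl * bu))
    return new_rows

# Example 6: 1 <= x1 <= 2, 1 <= x2 <= 4, x1 - x2 >= -1, as rows a.x <= b.
rows = [([1, 0], 2), ([-1, 0], -1), ([0, 1], 4), ([0, -1], -1), ([-1, 1], 1)]
print(eliminate_first(rows))
# [([1], 4), ([-1], -1), ([0], 1), ([1], 3)]: i.e. 1 <= x2 <= 3 plus 0 <= 1
```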

9 Exercises

1. Show that the unit ball B(O, 1) := {x ∈ IR^n : ||x|| ≤ 1} is convex. Here ||·|| denotes the Euclidean norm. Hint: use the triangle inequality. Is the ball B(a, r) := {x ∈ IR^n : ||x − a|| ≤ r} convex for all a ∈ IR^n and r > 0?

2. Prove that every linear subspace of IR^n is a convex set.

3. Is the union of two convex sets again convex?

4. Show that if C_1, ..., C_t ⊆ IR^n are all convex sets, then C_1 ∩ ··· ∩ C_t is convex. In fact, a similar result holds for the intersection of any family of convex sets. Explain this.

5. Is the unit ball B = {x ∈ IR^n : ||x||_2 ≤ 1} a polyhedron?

6. Consider the unit ball B_∞ = {x ∈ IR^n : ||x||_∞ ≤ 1}. Here ||x||_∞ = max_j |x_j| is the max norm of x. Show that B_∞ is a polyhedron. Illustrate when n = 2.

7. Consider the unit ball B_1 = {x ∈ IR^n : ||x||_1 ≤ 1}. Here ||x||_1 = ∑_{j=1}^n |x_j| is the absolute norm of x. Show that B_1 is a polyhedron. Illustrate when n = 2.

8. Explain how you can write the LP problem max{c^T x : Ax ≤ b} in the form max{c_1^T x_1 : A_1 x_1 = b_1, x_1 ≥ O}.

9. Consider the linear system 0 ≤ x_i ≤ 1 for i = 1, ..., n and let P denote the solution set. Explain how to solve a linear programming problem

  max{c^T x : x ∈ P}.

What if the linear system were a_i ≤ x_i ≤ b_i for i = 1, ..., n? Here we assume a_i ≤ b_i for each i.

10. Show that conv(S) is convex for all S ⊆ IR^n. (Hint: look at two convex combinations ∑_j λ_j x_j and ∑_j µ_j y_j, and note that both these points may be written as a convex combination of the same set of vectors.)

11. Give an example of two distinct sets S and T having the same convex hull. It makes sense to look for a smallest possible subset S_0 of a set S such that S = conv(S_0). We study this question later.

12. Prove that if S ⊆ T, then conv(S) ⊆ conv(T).

13. If S is convex, then conv(S) = S. Show this!

14. Let S = {x ∈ IR^2 : ||x||_2 = 1}; this is the unit circle in IR^2. Determine conv(S).

15. Let S = {(0, 0), (1, 0), (0, 1)}. Show that conv(S) = {(x_1, x_2) ∈ IR^2 : x_1 ≥ 0, x_2 ≥ 0, x_1 + x_2 ≤ 1}.

16. Let S consist of the points (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1) and (1, 1, 1). Show that conv(S) = {(x_1, x_2, x_3) ∈ IR^3 : 0 ≤ x_i ≤ 1 for i = 1, 2, 3}. Also determine conv(S \ {(1, 1, 1)}) as the solution set of a system of linear inequalities. Illustrate all these cases geometrically.

17. Use the version of Farkas' lemma given in Lemma 4 to prove the following version: Ax = b has a nonnegative solution if and only if y^T b ≥ 0 for all y with y^T A ≥ 0.

18. Compute explicitly all the extreme points of the polyhedra discussed in Example 5.

19. Find all extreme points of the n-dimensional rectangle P = {x ∈ IR^n : a_i ≤ x_i ≤ b_i (i ≤ n)}, where a_i ≤ b_i (i ≤ n) are given real numbers.

20. Find all extreme points of the polyhedron

  x_1 ≥ 0,   x_2 ≥ 0,   x_1 + x_2 ≤ 1,   x_2 ≤ 1/2.     (11)

21. Find some (or even all!) extreme points of the polyhedron

  x_1 ≥ 0,   x_2 ≥ 0,   x_3 ≥ 0,   x_1 + x_2 + x_3 ≤ 4,   x_1 + x_2 ≥ 1,   x_3 ≤ 2.     (12)

22. Use Fourier-Motzkin elimination to find all solutions to the linear system in (11). Illustrate the solution set geometrically.

23. Use Fourier-Motzkin elimination to find all solutions to the linear system in (12). Illustrate the solution set geometrically.

24. Consider Fourier-Motzkin elimination in the case n = 2 (two variables) and m = 4 (four inequalities). Consider a general (or, if you prefer, a specific) such linear system. Eliminate x_1 and try to interpret the inequalities in the new system geometrically.

10 Further reading

If you are interested in reading more about convexity, there are many topics to choose from, for instance

• Convex functions

• Projection onto convex sets

• Caratheodory's theorem

• Separation theorems

• Polyhedral theory

• Polytopes and graphs

• Polyhedra and combinatorial optimization

• ...

In the references you will find several suggested books for further reading.

Welcome to the world of convexity!

References

[1] V. Chvátal. Linear Programming. W.H. Freeman and Company, 1983.

[2] G. Dahl. An Introduction to Convexity. Report 279, Dept. of Informatics, University of Oslo, 2001.

[3] G. Dahl. Combinatorial properties of Fourier-Motzkin elimination. Electronic Journal of Linear Algebra, 16 (2007), 334–346.

[4] M. Grötschel and M.W. Padberg. Polyhedral theory. In E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys, editors, The Traveling Salesman Problem, chapter 8, pages 251–361. Wiley, 1985.

[5] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I. Springer, 1993.

[6] W.R. Pulleyblank. Polyhedral combinatorics. In Nemhauser et al., editors, Optimization, volume 1 of Handbooks in Operations Research and Management Science, chapter 5, pages 371–446. North-Holland, 1989.

[7] R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970.

[8] R. Schneider. Convex Bodies: The Brunn-Minkowski Theory, volume 44 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press, Cambridge, 1993.

[9] E. Torgersen. Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press, Cambridge, 1992.

[10] R. Webster. Convexity. Oxford University Press, Oxford, 1994.

[11] G. Ziegler. Lectures on Polytopes. Springer, 1995.
