<<

Linear Programming and the Simplex Method David Gale

his exposition of etc. Further, each food has a cost, so among all and the simplex method is intended such adequate meals the problem is to find one as a companion piece to the article that is least costly. in this issue on the life and work of George B. Dantzig in which the impact The Transportation Problem Tand significance of this particular achievement A certain good, say steel, is available in given are described. It is now nearly sixty years since amounts at a set of m origins and is demanded in Dantzig’s original discovery [3] opened up this specified amounts at a set of n destinations. Cor- whole new area of . The subject is responding to each origin i and destination j there now widely taught throughout the world at the is the cost of shipping one unit of steel from i to level of an advanced undergraduate course. The j. Find a shipping schedule that satisfies the given pages to follow are an attempt at a capsule pre- demands from the given supplies at minimum sentation of the material that might be covered in cost. three or four lectures in such a course. Almost all linear programming applications, including these examples, can be motivated in the Linear Programming following way. There is a given set of goods. For The subject of linear programming can be defined the diet problem these are the various nutrients. quite concisely. It is concerned with the problem For the transportation problem there are m + n of maximizing or minimizing a linear function goods, these being steel at each of the origins whose variables are required to satisfy a system and destinations. There is also a set of processes of linear constraints, a constraint being a linear or activities which are specified by amounts of equation or inequality. The subject might more the goods consumed as inputs or produced as appropriately be called linear optimization. Prob- outputs. In the diet problem, for example, the lems of this sort come up in a natural and quite carrot-consuming activity c has an output of c1 elementary way in many contexts but especially units of calories, c2 units of vitamin A, etc. For the in problems of economic planning. Here are two transportation problem the transportation activ- popular examples. ity tij has as input one unit of steel at origin i and as output one unit of steel at destination j. In gen- The Diet Problem eral, an activity a is column vector whose positive A list of foods is given and the object is to pre- entries are outputs, negative entries inputs. These scribe amounts of each food so as to provide a vectors will be denoted by boldface letters. It is meal that has preassigned amounts of various nu- assumed that activity aj may be carried out at any trients such as calories, vitamins, proteins, starch, nonnegative level xj . The constraints are given by another activity vector b, the right-hand side, David Gale is professor of mathematics at the Uni- which specifies the amounts of the different goods versity of , Berkeley. His email address is that are to be produced or consumed. For the diet [email protected]. problem it is the list of amounts of the prescribed

364 Notices of the AMS Volume 54, Number 3 1 2 j n nutrients. For the transportation problem it is the a a ··· a ··· a 1 given supplies and demands at the origins and b x11 x12 ··· x1j ··· x1n 2 destinations. Finally, associated with each activity b x21 x22 ··· x2j ··· x2n j j ...... a is a cost cj . Given m goods and n activities a ...... i the linear programming problem (LP) is then to b xi1 xi2 ··· xij ··· xin find activity levels xj that satisfy the constraints ...... P ...... and minimize the total cost cj xj . Alternatively, j m c may be thought of as the profit generated by ac- b xm1 xm2 ··· xmj ··· xmn tivity a, in which case the problem is to maximize P i rather than minimize j cj xj . where xij is the coefficient of b in the expression The simplex method is an algorithm that finds for aj as a linear combination of the bi . solutions of LPs or shows that none exist. In the In matrix terms, X is the (unique) matrix satis- exposition to follow we will treat only the special fying case where the constraints are equations and the (1) BX = A. variables are nonnegative, but the more general (Symbols A, B are used ambiguously to stand cases are easily reduced to this case. either for a matrix or the set of its columns.) It will often be useful to include the unit vec- The Simplex Method tors ei in the tableau in which case it appears as In the following paragraphs we describe the sim- shown below. plex algorithm by showing how it can be thought 1 2 j m of as a substantial generalization of standard e e ··· e ··· e 1 Gauss-Jordan elimination of ordinary linear al- b y11 y12 ··· y1j ··· y1m 2 gebra. To gain intuition as to why the algorithm b y21 y22 ··· y2j ··· y2m ...... works, we will refer to the linear activity model ...... i of the previous section. Finally we will mention b yi1 yi2 ··· y ij ··· yim some interesting discoveries in the analysis of ...... the algorithm, including a tantalizing unsolved ...... m problem. b ym1 ym2 ··· ymj ··· ymm

Pivots and Tableaus a1 a2 ··· aj ··· an The basic computational step in the simplex algo- x11 x12 ··· x1j ··· x1n rithm is the same as that in most of elementary x21 x22 ··· x2j ··· x2n linear algebra, the so-called pivot operation. This ...... is the operation on matrices used to solve systems xi1 xi2 ··· xij ··· xin of linear equations, to put matrices in echelon ...... form, to evaluate determinants, etc...... Given a matrix A one chooses a nonzero pivot xm1 xm2 ··· xmj ··· xmn entry aij and adds multiples of row i to the other rows so as to obtain zeros in the jth column. The Note that from (1), Y = {yij } is the solution of −1 ith row is then normalized by dividing it by aij . BY = I so Y = B . Multiplying (1) on the left by Y For solving linear equations a pivot element can gives the important equation be any nonzero entry. By contrast, the simplex method restricts the choice of pivot entry and is (2) YA = X. completely described by giving a pair of simple It is easy to verify that if one pivots on en- rules, the entrance rule that determines the pivot 0 try xij , one obtains a new matrix X that is the column j and the exit rule that determines the tableau with respect to the new basis in which aj pivot row i (in theory a third rule may be needed has replaced bi . to take care of degenerate cases). By following these rules starting from the initial data the algo- a1 a2 ··· aj ··· an 1 0 0 0 rithm arrives at the solution of the linear program b x11 x12 ··· 0 ··· x1n 2 0 0 0 in a finite number of pivots. Our purpose here is b x21 x22 ··· 0 ··· x2n to present these rules and show why they work...... We shall need one other concept...... aj x0 x0 ··· 1 ··· x0 Definition. The tableau X of a set of vectors i1 i2 in ...... A = {a1, a2,..., an} with respect to a basis ...... 1 2 m m 0 0 0 B = {b , b ,..., b } is the m × n matrix b xm1 xm2 ··· 0 ··· xmn

March 2007 Notices of the AMS 365 The simplex method solves linear programs ui , is the value of a variable xji whose correspond- ji i by a sequence of pivots in successive tableaus, ing column a = b . Every xj whose corresponding or, equivalently, by finding a sequence of bases, column is not in the current basis has the value where each basis differs from its predecessor by a zero. single vector. Case II. In trying to replace ek there is no avail-

able pivot because xkj = 0 for all j. k Solving Linear Equations (i) If in addition uk = 0, then leave e in the ba- We start by showing how to solve systems of lin- sis and go on to replace ek+1. ear equations using the language of pivots and (ii) If uk ≠ 0, then the kth row yk of Y gives tableaus. Our fundamental result is the following: a solution of (4). To see this, note that from (2) h i h i h i Theorem 1. Exactly one of the following systems Y A b = X u . Since the kth row of X is has a solution: zero, we get ykA = 0, whereas ykb = uk ≠ 0, so (3) Ax = b yk/uk solves (4). The algebra becomes considerably more com- or plicated if one requires the solutions of (3) to be (4) yT A = 0T , yT b = 1. nonnegative. In this case we have the following (Again, boldface letters are column vectors and are existence theorem (known as Farkas’s Lemma): converted to row vectors using the superscript T .) Theorem 2. Exactly one of the following systems Note that rather than giving complicated con- has a solution. ditions for (3) not to be solvable this formulation (5) Ax = b, x ≥ 0 puts the solvable and unsolvable case on an equal footing. This duality is an essential element of the or theory, as will be seen. (6) yT A ≤ 0T , yT b > 0. Geometric Interpretation Again it is not possible for both systems to If (3) has no solution this means that b does not lie have solutions since multiplying (5) by yT gives in the linear subspace generated by the aj . In that yT Ax = yT b > 0 whereas multiplying (6) by x case there is a vector y orthogonal to the aj but not gives yT Ax ≤ 0. to b. We shall continue to present such geometric Geometric Interpretation pictures as aids to intuition but the mathematical Equation (5) says that b lies in the convex cone arguments will not depend on these “pictures”. generated by the columns aj of A. If b does not It is immediate that (3) and (4) cannot both lie in this cone, then there is a “separating hyper- T hold since multiplying (3) by y and (4) by x would plane” whose normal makes a non-acute angle = imply 0 1. The nontrivial part is to show that with the aj and an acute angle with the vector b. either (3) or (4) must hold. We do this by giving an algorithm that in at most m pivots finds a solution The simplex method finds a solution of either to either (3) or (4). (5) or (6) in a finite number of pivots. However, The initial tableau consists of the matrix A there is no useful upper bound on the number of together with the right-hand side b all preceded pivots that may be needed as we shall see shortly. by the identity matrix I. It is represented schemat- We may assume to start out that the vector b ically in the form below. is nonnegative (if not, change the signs of some of {ei } {aj } {b} the equations). Definition. A basis B will be called feasible (strongly {bi } [I A b ] feasible) if b is a nonnegative (positive) linear com- (The symbols like ei , {aj } in curly brackets are bination of vectors of B. This is equivalent to the the row and column headings of the tableau.) condition that the b column u of the tableau is We proceed to try replacing the ei in order by nonnegative (positive). some aj . The tableau at any stage has the form Imitating the algorithm of the previous section i {ei } {aj } {b} we start from the feasible basis {e } of unit vectors and try to bring in the {aj } by a sequence of pivots i {b } [Y X u ] that maintain feasibility, but to do this the pivot There are now two possibilities. rule of the previous section must be restricted. Case I. All of the unit vectors ei can be replaced Suppose aj is to be brought into a new feasible by some of the aj . Then a solution of (3) can be basis. Then in order for the new basis to be feasi- read off from the vector u in the b-column of the ble the basis vector it replaces must satisfy the fol- tableau above. Namely, each component of u, say lowing easily derived condition.

366 Notices of the AMS Volume 54, Number 3 Exit Pivot Rule verify the validity of the von Neumann proof. The first formally published proof is due to Gale, Kuh, The pivot element xij must satisfy and Tucker [4]. (i) x > 0, ij To see how Theorem 2 is a special case of the (ii) u /x ≤ u /x for all k with x > 0. i ij k kj kj duality theorem we take [IA b] as the given ini- Because of the above restriction it is no longer tial tableau of a standard minimum problem and possible to simply replace the basis vectors ei one the vector cT = (1, .., 1, 0, ..., 0) whose first m en- at a time as in the previous section. Some further tries are 1, the rest 0, as the objective function. rule for choosing the pivot column must be given. The set X is nonempty since it contains the non- We will postpone the proof of Theorem 2 as it negative vector (b , .., b , 0, .., 0), and the set Y will be shown to be a special case of a much more 1 m is nonempty since it contains the zero vector. If general theorem, which we now describe. the optimal value is zero, then a subvector of the optimal solution x solves (5). If the optimal value The Standard Minimum Problem is positive, then the dual optimal vector y solves We consider the following important special case (6). of a linear programming problem. We will now see how the simplex method solves T × Given an m-vector b, an n-vector c and an m problems (7), (8) and gives a constructive proof of n matrix A, find an n-vector x so as to the duality theorem. minimize cT x As a first step we incorporate the objective (7) subject to Ax = b function cT x as the 0th row of the tableau. That is, j x ≥ 0. we define the augmented column vector ab to be j 0 (−cj , a ), and we introduce the 0th unit vector e . Corresponding to this problem (and using ex- The augmented initial tableau is then, actly the same data) is a dual problem, find an m- T 0 i j vector y so as to e {e }{ab } b maximize yT b e0 h 1 0 −cT 0 i {ei } 0 IA b (8) subject to yT A ≤ cT . We may assume that we have found an initial fea- Terminology sible basis, by applying the simplex method to the The linear function cT x and yT b are called the nonnegative solution problem (5), (6) as described. objective functions of (7) and (8), respectively. The Then with respect to the general augmented basis vectors x, y that solve these problems are called {e0,b1,...,bm} the tableau has the form below. optimal solutions; the numbers cT x and yb are e0 {ei }{aj } b optimal values. b 0 Let X and Y be the sets (possibly empty) of all e h 1 yT zT w i vectors that satisfy the constraints of (7) and (8), {bi } 0 YX u respectively. Such vectors are said to be feasible. 0 P i Note that by definition of the tableau, we + i ui b P Key Observation = b so for the 0th entry we have w − ui ci = 0, P i If x ∈ X, yT ∈ Y, then yT b ≤ cT x. so w = i ui ci , which is the value of the objective i Proof. Multiply Ax = b by yT and yT A ≤ cT by x. function for the basic feasible basis {b }. P An important consequence of this observation Also from the tableau zj = i xij ci − cj . This is number compares the cost associated with activ- Corollary. If x ∈ X and y ∈ Y satisfy yT b = cT x, ity vector aj to that of the corresponding linear i then x and y are optimal solutions, and the num- combination of the basis vectors b . If zj is posi- bers yT b=cT x are optimal values for their respec- tive it means that one could reduce the total cost tive problems. by bringing the vector aj into the next basis. This The converse of this fact makes up the leads to the following two important observations: Fundamental Duality Theorem. The dual prob- lems above have optimal solutions x, yT if and I. If the row zT is nonpositive, then u can be only if X and Y are nonempty in which case the extended to a solution x of (7) and yT solves (8). optimal values cT xand yT b are equal. Namely, using (2) again we have " #" # " # 1 yT −cT 0 zT w Historical Note = 0 Y A b X u The first explicit statement of the duality theorem T j T T T is due to von Neumann [7] in a manuscript that so −cj + y a = zj ≤ 0. Hence y A ≤ c so y was privately circulated but never formally pub- satisfies the dual constraints. Similarly we have lished in his lifetime. Moreover it has been hard to yT b = w = cT x, so from the corollary above, the

March 2007 Notices of the AMS 367 Duality Theorem is verified. Bland’s Pivot Selection Rule Among eligible entering vectors choose the one II. If for some j we have zj > 0 and xij ≤ 0 for with lowest index and let it replace the eligible all i, then problem (7) has no minimum, for from basis vector with lowest index. j P i the tableau we have a − i xij b = 0; and since the While the condition is simple to state, the xij ≤ 0, we have proof that it avoids cycling is quite subtle. More- over, it is not as efficient as the customary (Dant- j X i X i λ(ab − xij b ) + xi b = b zig) pivot choice rule. i i Finally, there is a natural economic interpre- tation of the dual problem. In the model of goods giving a new feasible (nonnegative) solution for and activities, the vector yT = (y .., y ) can be any λ > 0. The corresponding change in the objec- 1, m thought of as a price vector where y is the price of tive function is λ(c − P x c ) = −λz ; so since i j i ij i j one unit of the ith good. Then yT aj is the revenue z > 0 the function has no minimum. j generated by operating activity aj at unit level. In view of I above, one is looking for a feasible T j The condition of dual feasibility, y a − cj ≤ 0 solution with z nonpositive, so the following pivot states that profit (revenue minus cost) cannot be rule suggests itself. positive. This is an economically natural require- ment, for if an activity generated positive profits Entrance Pivot Rule producers would want to operate it at arbitrarily j Bring into the basis any column ab where zj is pos- high levels and this would clearly not be feasible. itive. Duality then says that prices are such that the Note that as described the second pivot rule price vector yT maximizes the value yT b of the may, in general, require making a choice among right-hand side activity b subject to the condition

many possible positive zj . A natural choice would of no positive profits. be for example to choose the column with the

largest value of zj , a choice originally suggested Some Properties of the Simplex Method by Dantzig. Since its creation by Dantzig in 1947, there has The two pivot rules completely describe the been a huge amount of literature on variations

simplex algorithm. If for some j, zj is positive and and extensions of the simplex method in many j some xij is positive, then bring a into the basis directions, several devised by Dantzig himself. (by a pivot). It remains to show that eventually We will mention here only one of these, revolv- either I or II will occur. The argument is simple if ing around the question of the number of pivot we make the following steps required to solve an LP in the worst case. Based on a vast amount of empirical experience, Nondegeneracy Assumption it seemed that an m × n program was typically solved in roughly 3m/2 pivots. However, in 1972, Every feasible basis is strongly feasible. Klee and Minty [6] gave an example of an m × 2m Note if this were not the case, then b would be standard problem that using the Dantzig choice a positive linear combination of fewer than m of rule required 2m − 1 pivots. To appreciate the ex- j the vectors a , a degenerate situation. Although ample, first consider the matrix A consisting of the degenerate case would seem (on mathematical the m unit vectors ei and m vectors aj = ej where grounds) to be rare, this is not so in practice. the right-hand side is e, the vector all of whose Suppose xij is the (positive) pivot. Then after entries are one. By inspection, (see for example 0 pivoting w = w − zj ui /xij < w since ui > 0 from the tableau below) the set of feasible solutions nondegeneracy, so the value of the objective func- X decomposes into the direct product of m unit tion decreases with each pivot. This means that intervals, thus, it is a unit m-cube. no basis can recur (since the basis determines the {ei } {aj } b objective value), and since there are only finitely    many bases, the algorithm must terminate in 1 0 0 1 0 0 1    either state I or II.  0 1 0 0 1 0  1  If the nondegeneracy assumption is not satis- 0 0 1 0 0 1 1 fied some further argument is necessary. Indeed, It is easy to see that every feasible basis, thus examples have been constructed by Hoffman [5] every vertex of the cube, must contain either ai or (for one) using the Dantzig pivot choice rule which ei , but not both, for all i. This property will be pre- can lead to “cycling,” meaning the sequence of served if the ai are slightly perturbed. By a clever feasible bases recurs indefinitely. It turns out, choice of this perturbation and a suitable objec- however, that the following simple rule due to tive function, the authors show that the Dantzig Bland [1] guarantees that no basis will recur. pivot rule causes the algorithm to visit all of the

368 Notices of the AMS Volume 54, Number 3 2m vertices following a well known Hamiltonian day the method of choice for solving most linear path on the m-cube. programming problems. 4 ...... 5 ...... References ...... [1] R. G. Bland New finite pivoting rules for the sim- 7 ...... 6 ...... plex method, Mathematics of Operations Research 2 ...... (1977) 103–107...... [2] J. B. J. Fourier, Solution d’une question partic- ...... 3 ...... ulière du calcul des inégalités (1826), and extracts ...... from Histoire de l’Academie (1823, 1824), Oeuvres ...... 2 ...... II, French Academy of Sciences, pp. 317–328...... [3] , Maximization of a linear function 0 ...... G. B. Dantzig ...... subject to linear inequalities, in T. C. Koopmans (ed.), 1 Activity Analysis of Production and Allocation, John Wiley & Sons, , 1951, pp. 339–347. Figure 1. A Hamiltonian path on a 3-cube. [4] D. Gale, H. W. Kuhn, and A. W. Tucker, Lin- ear programming and the theory of games, in T. 1 This means, for example, that the vector a enters C. Koopmans (ed.), Activity Analysis of Production the basis on the first pivot, stays in the basis for and Allocation, John Wiley & Sons, New York, 1951, one pivot, then exits and stays out for one pivot, pp. 317–329. then reenters and stays, then exits and stays out, [5] A. J. Hoffman, Cycling in the simplex algorithm, etc. throughout the computation. National Bureau of Standards, Report No. 2974, So we know that in the worst case the simplex Washington, D.C., December 16, 1953. method may require an exponential number of [6] V. Klee and G. J. Minty, How good is the simplex al- gorithm? in (O. Shisha, ed.) Inequalities III, New York, pivots, although, as mentioned earlier, no natu- Academic Press, 1972, 159–175. rally occurring problem has ever exhibited such [7] J. von Neumann, Discussion of a maximum prob- behavior. There are also results on the expected lem, working paper dated November 15–16, 1947, number of pivots of a “random” LP. reproduced in A. H. Taub (ed.), : Since the Klee-Minty example shows that it Collected Works, Vol. VI, Pergamon Press, Oxford, is possible for the simplex algorithm to behave 1963, pp. 89–95. badly, it is natural to ask whether there may be some other pivoting algorithms that are guaran- teed to find an optimal solution in some number of pivots bounded, say, by a polynomial in n. Let us first consider a lower bound on the num- ber of pivots. If the initial feasible basis is A = {a1,..., am} and the optimal basis is a disjoint set B = {b1, .., bm} then clearly it will require at least m pivots to get from A to B. The very surprising fact is that if the set X of feasible solutions is bounded, then in all known examples it is possi- ble to go from any feasible basis to any other in at most m pivots. Recall that the set X is an m- dimensional convex polytope whose vertices are the feasible bases. The above observation is equiv- alent to the statement that any two vertices of this polytope are connected by an edge path of at most m edges. The conjecture originally posed by War- ren Hirsch has been proved by Klee and Walkup through dimension 5 but remains unresolved in higher dimensions. For Dantzig this “Hirsch conjecture” was es- pecially intriguing since a constructive proof of the conjecture would have meant that there might be an algorithm that solves linear programs in at most m pivots. However, even if such an algorithm existed, it might be that the amount of calculation involved in selecting the sequence of pivots would make it far less computationally efficient than the simplex method. Indeed this remarkable algo- rithm and its many refinements remains to this

March 2007 Notices of the AMS 369