<<

A Randomized Polynomial-Time for

Jonathan A. Kelner∗ Daniel A. Spielman† and Artificial Intelligence Department of Computer Science Laboratory Yale University Massachusetts Institute of Technology

ABSTRACT trast, have been developed for solving linear pro- We present the first randomized polynomial-time simplex grams that do have polynomial worst-case complexity [10, algorithm for linear programming. Like the other known 9, 5, 1]. Most notable among these have been the ellipsoid polynomial-time algorithms for linear programming, its run- method [10] and various interior-point methods [9]. All pre- ning time depends polynomially on the number of bits used vious polynomial-time algorithms for linear programming to represent its input. of which we are aware differ from simplex methods in that We begin by reducing the input linear program to a spe- they are fundamentally geometric algorithms: they work ei- cial form in which we merely need to certify boundedness. ther by moving points inside the feasible set, or by enclosing As boundedness does not depend upon the right-hand-side the feasible set in an ellipse. Simplex methods, on the other vector, we run the shadow-vertex simplex method with a hand, walk along the vertices and edges defined by the con- random right-hand-side vector. Thus, we do not need to straints. The question of whether such an algorithm can be bound the diameter of the original . designed to run in polynomial time has been open for over Our analysis rests on a geometric statement of indepen- fifty years. dent interest: given a polytope Ax b in isotropic posi- We recall that a linear program is a constrained optimiza- tion, if one makes a polynomially small≤ perturbation to b tion problem of the form: then the number of edges of the projection of the perturbed maximize c x (1) polytope onto a random 2-dimensional subspace is expected · subject to Ax b, x Rd, to be polynomial. ≤ ∈ where c Rd and b Rn are column vectors, and A is an ∈ ∈ 1. INTRODUCTION n d matrix. The vector c is the objective function, and the× set P := x Ax b is the set of feasible points. If Linear programming is one of the fundamental problems it is non-empt{y, P| is a≤con}vex polyhedron, and each of its of optimization. Since Dantzig [3] introduced the simplex extreme vertices will be determined by d constraints of the method for solving linear programs, linear programming has form a x = b , where a , . . . ,a are the rows of A. It been applied in a diverse range of fields including economics, i i 1 n is not difficult· to show that{ the objectiv} e function is always , and combinatorial optimization. From maximized at an extreme vertex, if this maximum is finite. a theoretical standpoint, the study of linear programming The first simplex methods used heuristics to guide a walk has motivated major advances in the study of , on the graph of vertices and edges of P in search of one that convex geometry, combinatorics, and complexity theory. maximizes the objective function. In order to show that any While the simplex method was the first practically useful such method runs in worst-case polynomial time, one must approach to solving linear programs and is still one of the prove a polynomial upper bound on the diameter of polytope most popular, it was unknown whether any variant of the graphs. Unfortunately, the existence of such a bound is a simplex method could be shown to run in polynomial time wide-open question: the famous asserts in the worst case. In fact, most common variants have been that the graph of vertices and edges of P has diameter at shown to have exponential worst-case complexity. In con- most n d, whereas the best known bound for this diameter is superp−olynomial in n and d [8]. †Partially supported by NSF grant CCR-0324914. Later simplex methods, such as the self-dual simplex method ∗Partially supported by an NSF Graduate Fellowship and NSF grant CCR-0324914. and the criss-cross method, avoided this obstacle by consid- ering more general graphs for which diameter bounds were known. However, even though these graphs have polynomial diameters, they have exponentially many vertices, and no- body had been able to design a polynomial-time algorithm Permission to make digital or hard copies of all or part of this work for that provably finds the optimum after following a polyno- personal or classroom use is granted without fee provided that copies are mial number of edges. In fact, essentially every such algo- not made or distributed for profit or commercial advantage and that copies rithm has well-known counterexamples on which the walk bear this notice and the full citation on the first page. To copy otherwise, to takes exponentially many steps. republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. In this paper, we present the first randomized polynomial- Copyright 200X ACM XXXXXXXXX/XX/XX ...$5.00. time simplex method. Like the other known polynomial- time algorithms for linear programming, the running time start, the shadow-vertex method requires as input a vertex of our algorithm depends polynomially on the bit-length of v 0 of P . It then chooses some objective function optimized the input. We do not prove an upper bound on the diame- at v 0, say f , sets S = span(c, f ), and considers the shadow ter of polytopes. Rather we reduce the linear programming of P onto S. If no degeneracies occur, then for each vertex y problem to the problem of determining whether a set of lin- of P that projects onto the boundary of the shadow, there is ear constraints defines an unbounded polyhedron. We then a unique neighbor of y on P that projects onto the next ver- randomly perturb the right-hand sides of these constraints, tex of the shadow in clockwise-order. Thus, by tracing the observing that this does not change the answer, and we then vertices of P that map to the boundary of the shadow, the use a shadow-vertex simplex method to try solve the per- shadow-vertex method can move from the vertex it knows turbed problem. When the shadow-vertex method fails, it that optimizes f to the vertex that optimizes c. The number suggests a way to alter the distributions of the perturba- of steps that the method takes will be bounded by the num- tions, after which we apply the method again. We prove ber of edges of the shadow polygon. For future reference, that the number of iterations of this loop is polynomial with we call the shadow-vertex simplex method by high probability. ShadowVertex(a , . . . ,a , b, c, S, v , s), It is important to note that the vertices considered during 1 n 0 the course of the algorithm may not all appear on a single where a 1, . . . ,a n, b, and c specify a linear program of form (1), polytope. Rather, they may be viewed as appearing on the S is a two-dimensional subspace containing c, and v 0 is the convex hulls of polytopes with different b-vectors. It is well- start vertex, which must optimize some objective function known that the graph of all of these “potential” vertices has in S. We allow the method to run for at most s steps. If small diameter. However, there was previously no way to it has not found the vertex optimizing c within that time, guide a walk among these potential vertices to one optimiz- it should return (fail, y), where y is its current vertex. If ing any particular objective function. Our algorithm uses it has solved the linear program, it either returns (opt, x), the graphs of polytopes “near” P to impose structure on where x is the solution, or unbounded if it was unbounded. this graph and to help to guide our walk. In the next section, we will show that if the polytope is Perhaps the message to take away from this is that instead in isotropic position and the distances of the facets from the of worrying about the combinatorics of the natural polytope origin are randomly perturbed, then the number of edges of P , one can reduce the linear programming problem to one the shadow onto a random S is expected to be polynomial. whose polytope is more tractable. The first result of our The one geometric fact that we will require in our analysis paper, and the inspiration for the algorithm, captures this is that if an edge of P is tight for inequalities a i x = bi, idea by showing that if one slightly perturbs the b-vector for i I, then the edge projects to an edge in the shado· w if ∈ of a polytope in near-isotropic position, then there will be a and only if S intersects the convex hull of a i i I . Below, polynomial-step path from the vertex minimizing to the ver- we will often abuse notation by identifying an{ edge} ∈ with the tex maximizing a random objective function. Moreover, this set of constraints I for which it is tight. path may be found by the shadow-vertex simplex method. We stress that while our algorithm involves a perturba- 3. THE SHADOW SIZE IN THE k-ROUND tion, it is intrinsically different from previous papers that CASE have provided average-case or smoothed analyses of linear programming. In those papers, one shows that, given some Definition 3.1. We say that a polytope P is k-round if linear program, one can probably use the simplex method to solve a nearby but different linear program; the pertur- B(0 , 1) P B(0 , k), ⊆ ⊆ bation actually modified the input. In the present paper, where B(0, r) is the ball of radius r centered at the origin. our perturbation is used to inform the walk that we take on the (feasible or infeasible) vertices of our linear program; In this section, we will consider a polytope P defined by however, we actually solve the exact instance that we are x i, a T x 1 , given. We believe that ours is the first simplex algorithm to |∀ i ≤ achieve this, and we hope that our results will be a useful in the case that P nis k-round. Noteo that the condition step on the path to a strongly polynomial-time algorithm B(0 , 1) P implies a 1. for linear programming. i We will⊆ then considerk kthe≤ polytope we get by perturbing We note that no effort has been made to optimize our the right-hand sides, bounds; we shall include more precise bounds in a later ver- sion of this paper. T Q = x i, a x 1 + ri , |∀ i ≤ n o 2. THE SHADOW-VERTEX METHOD where each ri is an independent exponentially distributed random variable with expectation λ. That is, Let P be a convex polyhedron, and let S be a two-dimensional t/λ subspace. The shadow of P onto S is simply the projection Pr [ri t] = e− of P onto S. The shadow is a polygon, and every vertex ≥ (edge) of the polygon is the image of some vertex (edge) of for all t 0. P . One can show that the set of vertices of P that project We will≥ prove that the expected number of edges of the onto the boundary of the shadow polygon are exactly the projection of Q onto a random 2-plane is polynomial in n, vertices of P that optimize objective functions in S [2, 6]. k and 1/λ. In particular, this will imply that for a random These observations are the inspiration for the shadow- objective function, the shortest path from the minimum ver- vertex simplex method, which lifts the simplicity of linear tex to the maximum vertex is expected to have a number of programming in two dimensions to the general case [2, 6]. To steps polynomial in n, k and 1/λ. Our proof will proceed by analyzing the expected length first inequality follows from a union bound: of edges that appear on the boundary of the projection. We ∞ shall show that the total length of all such edges is expected E [max ri] = Pr [max ri t] ≥ to be bounded above. However, we shall also show that our Zt=0 perturbation will cause the expected length of each edge ∞ t/λ Pr min(1, ne− ) to be reasonably large. Combining these two statements ≤ Zt=0 will provide a bound on the expected number of edges that λ ln n h i ∞ t/λ appear. = 1 + ne− Zt=0 Zλ ln n = (λ ln n) + λ Theorem 3.2. Let v and w be uniformly random unit = λ ln(ne), vectors, and let V be their span. Then, the expectation over v, w, and the ris of the number of facets of the projection as desired. of Q onto V is at most We shall now prove the lemmas necessary for Lemma 3.9, 12πk(1 + λ ln(ne))√dn which bounds the expected length of an edge, given that it . λ appears in the shadow. Our proof of Lemma 3.9 will have two parts. In Lemma 3.7, we will show that it is unlikely Proof. We first observe that the perimeter of the shadow that the edge indexed by I is short, given that it appears on of P onto V is at most 2πk. Let r = max r . Then, as i i the convex hull of Q. We will then use Lemma 3.8 to show that, given that it appears in the shadow, it is unlikely that Q x i, a T x 1 + r = (1 + r)P, ⊆ |∀ i ≤ its projection onto the shadow plane is much shorter. To fa- n o cilitate the proofs of these lemmas, we shall prove some aux- the perimeter of the shadow of Q onto V is at most 2πk(1 + iliary lemmas about shifted exponential random variables. r). As we shall show in Proposition 3.3, the expectation of r is at most λ ln(ne), so the expected perimeter of the shadow Definition 3.4. We say that r is a shifted exponential of Q on V is at most 2πk(1 + λ ln(ne)). random variable with parameter λ if there exists a t R such Now, each edge of Q is determined by the subset of d 1 that r = s t, where s is an exponential random∈variable − of the constraints that are tight on that edge. For each with expectation− λ. [n] I , let SI (V ) be the event that edge I appears in the ∈ d 1 shadow,− and let `(I) denote the length of that edge in the Proposition 3.5. Let r be a shifted exponential random ` ´ shadow. We now know variable of parameter λ. Then, for all q R and  0, ∈ ≥ 2πk(1 + λ ln(ne)) E [`(I)] Pr r q +  r q /λ. ≥ ≤ ≥ ≤ [n] I ( − ) Proof. ˆ ˛ ˜ ∈Xd 1 As r q is a shifted˛ exponential random variable, it suffices to consider− the case in which q = 0. So, assume = E [`(I) SI (V )] Pr [SI (V )] . | q = 0 and r = s t, where s is an exponential random I [n] − ∈X(d−1) variable of expectation λ. We now need to compute Pr s t +  s t . (2) Below, in Lemma 3.9, we will prove that ≤ ≥ ˆ ˛ ˜ λ We only need to consider the case˛  < λ, as the proposition is E [`(I) SI (V )] . | ≥ 6√dn trivially true otherwise. We first consider the case in which t 0. In this case, we have ≥ From this, we conclude that 1 t+ s/λ e− ds (2) = Pr s t +  s t = λ s=t ≤ ≥ 1 ∞ (1/λ)e s/λds E [number of edges] = Pr [SI (V )] λ s=Rt − [n] ˆ ˛ ˜ t/λ t/λ+/λ I (d−1) ˛ e−R e− ∈X = − e t/λ 12πk(1 + λ ln(ne))√dn − , = 1 e/λ /λ, ≤ λ − ≤ as desired. for /λ 1. Finally≤, the case when t 0 follows from the analysis in the case t = 0. ≤ We now prove the various lemmas used in the proof of Theorem 3.2. Our first is a straightforward statement about Lemma 3.6. For N and P disjoint subsets of 1, . . . , n , exponential random variables. { } let ri i P and rj j N be independent random variables, each{ of} which∈ is {a shifte} ∈ d exponential random variable with Proposition 3.3. Let r1, . . . , rn be independent exponen- parameter at least λ. Then tially distributed random variables of expectation λ. Then,

Pr min(ri) + min(rj ) <  min(ri) + min(rj ) 0 E [max ri] λ ln(ne). i P j N i P j N ≥ ≤ » ∈ ∈ ∈ ∈ – Proof. ˛ n/2λ. This follows by a simple calculation, in which the ˛ ≤ Proof. Assume without loss of generality that P N , Lemma 3.8. Let Q be an arbitrary polytope, and let I in- so P n/2. | | ≤ | | dex an edge of Q. Let v and w be random unit vectors, and | | ≤+ Set r = mini P ri and r− = minj N rj . Sample r− let V be their span. Let SI (V ) be the event that the edge I ∈ ∈ according to the distribution induced by the requirement appears on the convex hull of the projection of Q onto V . + that r + r− 0. Given the sampled value for r−, the Let θI (V ) denote the angle of the edge I to V . Then ≥ + induced distribution on r is simply the base distribution 2 + Pr [cos(θI (V )) <  SI (V )] d . restricted to the space where r r−. So, it suffices to v,w | ≤ bound ≥ − + + max Pr r <  r− r r− r− r+ − ≥ − ˆ ˛ ˜ ˛ = max Pr min(ri) <  r− min(ri) r− W r− r+ i P − i P ≥ − » ∈ ∈ – ˛ ˛ max Pr ri <  r− min(ri) r− q ≤ r− r+ − i P ≥ − i P » ∈ – X∈ ˛ ˛ y = max Pr ri <  r− min(ri) r− r− r+ − i P ≥ − i P » ∈ – x X∈ ˛ V ˛ = max Pr ri <  r− ri r− r− r+ − ≥ − i P X∈ ˆ ˛ ˜ P (/λ), ˛ ≤ | | where the last equality follows from the independence of the ri’s, and the last inequality follows from Proposition 3.5. Figure 1: The points x, y and q. Lemma 3.7. Let I [n] , and let A(I) be the event that ∈ d 1 I appears on the convex hul−l of Q. Let δ(I) denote the length ` ´ Proof. of the edge I on Q. Then, As in the proof of Lemma 3.7, parameterize the edge by n Pr [δ(I) <  A(I)] . | ≤ 2λ l(t) := p + tq, Proof. Without loss of generality, we set I = 1, . . . , d 1 . q { − } where is a unit vector. Observe that SI (V ) holds if and As our proof will not depend upon the values of r1, . . . , rd 1, − only if V non-trivially intersects the cone i I αia i αi 0 , assume that they have been set arbitrarily. Now, parame- which we denote C. To evaluate the probabilit∈ y, |we ≥will terize the line of points satisfying perform a change of variables that will˘bPoth enable us to¯ T easily evaluate the angle between q and V and to determine a i x = 1 + ri, for i I, ∈ whether SI (V ) holds. Some of the new variables that we by introduce are shown in Figure 1. a l(t) := p + tq, First, let W be the span of a i i I , and note that W is also the subspace orthogonal{ to| q∈. The} angle of q to V where p is the point on the line closest to the origin, and q is determined by the angle of q to the unit vector through is a unit vector orthogonal to p. For each i d, let ti index the projection of q onto V , which we will call y. Fix any th ≥ the point where the i constraint intersects the line, i.e., vector c C, and let x be the unique unit vector in V that ∈ T is orthogonal to y and has positive inner product with c. a l(t ) = 1 + r . (3) i i i Note that x is also orthogonal to q, and so x V W . ∈ ∩ Now, divide the constraints indexed by i I into a pos- Also note that SI (V ) holds if and only if x C. itive set, P = i d a T q 0 , and a negativ6∈ e set N = Instead of expressing V as the span of v ∈and w, we will ≥ | i ≥ i d a T q < 0 . Note that each constraint in the positive express it as the span of x and y, which are much more i ˘ ¯ set≥is satisfied| by l( ) and each constraint in the negative useful vectors. In particular, we need to express v and w set˘ is satisfied b¯y l(−∞). The edge I appears in the convex in terms of x and y, which we do by introducing two more ∞ hull if and only if for each i P and j N, tj < ti. When variables, α and β, so that the edge I appears, its length∈ is ∈ v = x cos α + y sin α, and min ti tj . i P, j N w = x cos β + y sin β. ∈ ∈ − 1 a T p Note that number of degrees of freedom has not changed: v Solving (3) for i P , we find ti = a T q 1 a i + ri . ∈ i − w x 1 T and each had d 1 degrees of freedom, while only has Similarly, for j N, we find tj = 1 + a p rj . − a T q ` j ´ d 2 degrees of freedom since it is restricted to be orthogonal ∈ j − − − Thus, t for i P and t for j | N| are both shifted to q, and given x, y only has d 2 degrees of freedom since i j ` ´ − exponential random∈ variables− with parameter∈ at least λ. So, it is restricted to be orthogonal to x. by Lemma 3.6, We now make one more change of variables so that the angle between q and y becomes a variable. To do this, we

Pr min ti tj <  A(I) < n/2λ. let θ = θI (V ) be the angle between y and q, and note that ri i I i P, j N − | { | 6∈ } » ∈ ∈ – once θ and x have been specified, y is constrained to lie on a d 2 dimensional sphere. We let z denote the particular 2. setting v to be a uniformly chosen unit vector of angle poin−t on that sphere. θ to u. Deshpande and Spielman [4, Full version] prove that the Jacobian of this change of variables from α, β, x , θ, z to v and w is Theorem 4.2. Let a , . . . ,a be vectors of norm at most d 3 d 2 1 n c(cos θ)(sin θ) − sin(α β) − , − 1. Let r1, . . . , rn be independent exponentially distributed where c is a constant depending only on the dimension. random variables with expectation λ. Let Q be the polytope Now, to compute the probability, we will fix α and β ar- given by bitrarily and integrate over x C. T ∈ Q = x i, a x 1 + ri . |∀ i ≤ Pr [cos(θI (V )) <  SI (V )] V | n o Let u be an arbitrary unit vector, ρ < 1/√d, and let v be a n−1 1 dv dw v,w S :V C= and θI (V )  random ρ perturbation of u. Let w be a uniformly chosen = ∈ ∩ 6 ∅ ≤ R v,w Sn−1:Span(v,w) C= 1 dv dw random unit vector. Then, for all t > 1, ∈ ∩ 6 ∅ d 3 x RC,z θ>arccos() c(cos θ)(sin θ) − dx dz dθ = ∈ d 3 x z R xR C,z ,θ c(cos θ)(sin θ) − d d dθ Er1,...,rn,v,w ShadowSizespan(v,w)(Q B(0 , t)) ∈ ∩ π/2 d 3 R (cos θ)(sin θ) − dθ ˆ 42πt(1 + λ log˜n)√dn = θ=arccos() . π/2 d 3 ≤ λρ R θ=0 (cos θ)(sin θ) − dθ d 2 π/2 Proof. (sinRθ) − The proof of Theorem 4.2 is almost identical to = arccos() that of Theorem 3.2, except that we substitute Lemma 4.3 (sin θ)d ˛ 2 π/2 −˛ 0 for Lemma 3.7, and we substitute Lemma 4.4 for Lemma 3.8. d 2 1 (sin(arccos˛ ()) − ≤ − ˛ 2 d 2 2 1 (1  ) − (d 2) . ≤ − − ≤ − Lemma [n] 4.3. For I d 1 and t > 0, Lemma [n] ⊆ − 3.9. For all I d 1 , EV,r1,...,rn [`(I) SI (V )] λ ∈ − | ≥ ` ´ √ . n 6 dn ` ´ Pr δ(I) <  A(I) and I B(0 , t) = . Proof. For each edge I, `(I) = δ(I) cos(θI (V )). By Lemma 3.7, ∩ 6 ∅ ≤ 2λ ˆ ˛ ˜ λ Proof. ˛ Pr δ(I) A(I) 1/2. The proof is identical to the proof of Lemma 3.7, ≥ n ≥ except that in the proof of Lemma 3.6 we must condition » ˛ – ˛ upon the events that By Lemma 3.8, ˛ √ Pr cos(θI (V )) 1/ 2d SI (V ) 1/2. + V ≥ ≥ r t p and r− t p . h ˛ i ≥ − − k k ≤ − k k ˛ √ λ So, given that edge I appears on the˛ shadow, `(I) > (1/ 2d) n p p with probability at least 1/4. Thus, its expected length These conditions have no impact on any part of the proof. when it appears is at least λ . ` ´ 6√dn 4. THE SHADOW SIZE IN THE GENERAL CASE Lemma 4.4. Let Q be an arbitrary polytope, and let I in- dex an edge of Q. Let u be any unit vector, let ρ < 1/√d, In this section, we present an extension of Theorem 3.2 and let v be a random ρ perturbation of u. Let w be a uni- that we will require in the analysis of our simplex algorithm. formly chosen random unit vector, and let V = span(u, v). We extend the theorem in two ways. First of all, we examine Then what happens when P is not near isotropic position. In this 2 2 case, we just show that the shadow of the convex hull of Pr [cos(θI (V )) <  SI (V )] 3.5 /ρ . v,w | ≤ the vertices of bounded norm probably has few edges. As such, if we take a polynomial number of steps around the Proof. We perform the same change of variables as in shadow, we should either come back to where we started or Lemma 3.8. find a vertex far from the origin. Secondly, we consider the To bound the probability that cos θ < , we will allow shadow onto random planes that come close to a particular the variables x, z , α and β to be fixed arbitrarily, and just vector, rather than just onto uniformly random planes. consider what happens as we vary θ. To facilitate writing the Definition 4.1. For a unit vector u and a ρ > 0, we resulting probability, let µ denote the density function on v. define the ρ-perturbation of u to be the random unit vector If we fix x, z , α and β, then we can write v as a function of v chosen by θ. Moreover, as we vary θ by φ, v moves through an angle of at most φ. So, for all φ < ρ and θ, 1. choosing a θ [0, π] according to the restriction of the exponential distribution∈ of expectation ρ to the range [0, π], and µ(v(θ)) < µ(v(θ + φ))/e. (4) With this fact in mind, we compute the probability to be and only if the original is, and from whose solution one can obtain the solution to the original1 . By solving this system v,w Sn−1:V C= and θ (V )  µ(v) dv dw ∈ ∩ 6 ∅ I ≤ with a randomly chosen right-hand side vector we can solve R v,w Sn−1:V C= µ(v) dv dw system (1) while avoiding the combinatorial complications ∈ ∩ 6 ∅ π/2 d 3 of the feasible set of (1). R (cos θ)(sin θ) − µ(v(θ)) dθ = θ=arccos() In our algorithm, we will certify boundedness of (1) by π/2 d 3 finding the vertices minimizing and maximizing some objec- R θ=0 (cos θ)(sin θ) − µ(v(θ)) dθ tive function. Provided that the system is non-degenerate, π/2 d 3 θ=arccosR ()(cos θ)(sin θ) − µ(θ) dθ which it is with high probability under our choice of right- ≤ π/2 d 3 hand sides, this can be converted into a solution to (7). R θ=π/2 ρ(cos θ)(sin θ) − µ(θ) dθ − R π/2 d 3 e θ=arccos()(cos θ)(sin θ) − dθ 6. OUR ALGORITHM , by (4) ≤ π/2 d 3 Our bound from Theorem 3.2 suggests a natural algo- Rθ=π/2 ρ(cos θ)(sin θ) − dθ − rithm for certifying the boundedness of a linear program of R d 2 π/2 (sin θ) − the form given in (8): set each bi to be 1 + ri, where ri is an = e arccos() π/2 exponential random variable, pick a random objective func- d 2˛ (sin θ) − π/2 ρ ˛ − tion c and a random two-dimensional subspace containing d 2 1 (sin(arccos˛ ()) − it, and then use the shadow-vertex method with the given = e − ˛ 1 (sin(ρ)d 2 subspace to maximize and minimize c. − − 2 d 2 In order to make this approach into a polynomial-time 1 (1  ) − e − − algorithm, there are two difficulties that we must surmount: ≤ 1 (1 ρ2/2)d 2 − − − (d 2)2 1. To use the shadow-vertex method, we need to start e − , as ρ < 1/√d, with some vertex that appears on the boundary of the ≤ (d 2)2(1 1/√e)ρ2 shadow. If we just pick an arbitrary shadow plane, − 2 − 3.5(/ρ) . there is no obvious way to find such a vertex. ≤ 5. REDUCTION OF LINEAR PROGRAM- 2. Theorem 3.2 bounds the expected shadow size of the vertices of bounded norm in polytopes with perturbed MING TO CERTIFYING BOUNDEDNESS right-hand sides, whereas the polytope that we are We now recall an old trick [12, p. 125] for reducing the given may have vertices of exponentially large norm. problem of solving a linear program in form (1) to a different If we naively choose our perturbations, objective func- form that will be more useful for our purposes. We recall tion, and shadow plane as if we were in a coordi- that the dual of such a linear program is given by nate system in which all of our vertices had bounded norm, the distribution of vertices that appear on the minimize b y (5) · shadow may be very different, and we have no guar- subject to AT y = c, y 0, ≥ antees about the expected shadow size. and that when the programs are both feasible and bounded, We address the first difficulty by constructing an artificial they have the same solution. Thus, any feasible solution to vertex at which to start our simplex algorithm. To address the system of constraints the second difficulty, we start out by choosing our random Ax b, x Rd, (6) variables from the naive distributions. If this doesn’t work, T ≤ ∈ we iteratively use information about how it failed to improve A y = c, y 0 , ≥ the probability distributions from which we sample and try c x = b y · · again. provides a solution to both the linear program and its dual. 6.1 Constructing a Starting Vertex Using standard techniques, one can reduce the solution of a feasibility problem in this form to a feasibility problem of In order to use the shadow-vertex method on a polytope the form P , we need a shadow plane S and a vertex v that appears on the boundary of the shadow. One way to obtain such a T A1 z = 0 (7) pair is to pick any vertex v, randomly choose (from some z 0 , z = 0 , ) an objective function c optimized ≥ 6 by v, let u be a uniformly random unit vector, and set b c where A1 is a matrix constructed from A, and . When S = span(c, u). the system (7) is non-degenerate, a solution to the system However, to apply the bound on the shadow size given by is equivalent to a certificate that a system of form Theorem 4.2, we need to choose c to be a ρ-perturbation

A1w b1 (8) of some vector. For such a c to be likely to be optimized ≤ 1 is bounded, where the choice of the vector b1 does not mat- If one views the problem as asking if the origin is contained ter as it does not affect boundedness. However, our reduc- inside the convex hull of the rows of A1, then the pertur- bation pushes the origin slightly towards the average of the tion produces a system that is degenerate. This is easily rows. The magnitude of the perturbation is small enough remedied, in strongly-polynomial time, by applying the - that it cannot make an infeasible system feasible, but it only perturbation technique of Megiddo and Chandrasekaran [11], increases the size of the input by a polynomial factor in n which produces a non-degenerate system that is solvable if and d. by v, we need v to optimize a reasonably large ball of ob- Algorithm 6.1: CheckBoundedness(a 1, . . . ,a n) jective functions. To guarantee that we can find such a v, we create one. That is, we add constraints to our polytope Require each a i has norm at most 1. to explicitly construct an artificial vertex with the desired Set k = 16d + 1, λ = 1/n, ρ = 1/6dk2 properties. (This is similar to the “Phase I” approaches that 7 9/2 s = 4 10 d n, and w as described in text; have appeared in some other simplex algorithms.) i Initialize Q· := Id , r := 0; Suppose for now that the polytope x Ax 1 is k-round. n { | ≤ } Repeat until you return an answer Construct a modified polytope P 0 by adding d new con- T Construct constraints for starting corner: straints, w i x 1, i = 1, . . . , d , where T ≤ a n+i := Q w i/(1 w i (Qr)) for i = 1, . . . , d; 8 T− · ˘ ¯ 2 bi := (1 + βi)(1 + a i r) for i = 1, . . . n + d, (1) w i = ej + √dei/3k , > − > βi exponential random vars with expectation λ; j > “ X ” > T >Set starting corner x 0 := point where a i x 0 = bi w w w x > and i = i/(2 i ). Let 0 be the vertex at which > for i = n + 1, . . . , n + d; || || > T w 1, . . . , w d are all tight. Furthermore, let c be a ρ-perturbation >If x 0 violates a x 0 bi for any i, go back to (1) 2 > i ≤ of the vector 1/√d, with ρ = 1/6dk , and let x 1 be the ver- > and generate new random variables; > tex at which c is maximized. We can prove: >c := QT γ, with γ a ρ-perturbation of 1/√d; > > S c Q T u Lemma 6.1. The following three properties hold with high >Shadow plane := span( , ), > with u a uniformly random unit vector; probability. Furthermore, they remain true with probability > n >Run ShadowVertex((a , . . . ,a ), b, c, S, x , s) : 1 (d + 2)e− if we perturb all of the right-hand sides of > 1 n+d 0 − > If returns unbounded then the constraints in P 0 by an exponential random variable of > > return (unbounded); expectation λ = 1/n. > > 8If returns (fail, y ) then > > 0 1. The vertex x appears on P 0, > > 0 > > set y := y 0 and go to (3); > <> > If returns (opt, v 0) then 2. c is maximized at x 0, and > v v − > > set := 0 and continue to (2); >Run> ShadowVertex((a , . . . ,a ), b, c, S, v, s) : (2) 3. None of the constraints w 1, . . . , w d is tight at x 1. < > 1 n :If returns unbounded then Proof. Follows from Lemma 7.1 and bounds on tails of > > return (unbounded); exponential random variables. > 8 > If returns (fail, y 0) then > > > > set y := y 0 and go to (3); Set > <> > If returns (opt, v 0) then 7 9/2 > v v return v v k := 16d + 1 and s := 4 10 d n. > > set 0 := 0 and ( , 0); · > > Q r >Up>date and : (3) Let S = span(c, u), where u is a uniform random unit vec- > : > If Q(y + r) 2k then tor. If P is k-round, then by Lemma 6.1 and Theorem 3.2 > || || ≤ Q r > don’t change or we can run the shadow vertex method on P 0 with shadow > 8else > > plane S and starting at vertex x 0, and we will find the ver- > > Set M := the matrix that scales down > > tex x 1 that maximizes c within s steps, with probability at > > Q(y + r) by factor of 8 and scales w x x > > least 1/2. Since none of the i are tight at 1, 1 will also > >8 vectors in orthogonal complement up c > <> be the vertex of the original polytope P that maximizes . > > by factor of 1 1/d; This gives us the vertex x of P that maximizes c. We > > − 1 > > > > > r := r + 7Q(y + r)/ Q(y + r) ; the same shadow plane. This time, we start at x and find > >> || || 1 > >> the vertex that maximizes c. We are again guaranteed to :> :>:> have an expected polynomial-sized− shadow, so this will again Instead, we shall start out as we would in the k-round case, succeed with high probability. This will give us a pair of adding in artificial constraints w 1, . . . , w d, and choosing an vertices that optimize c and c, from which we can compute objective function and shadow plane as in Section 6.1. By our desired certificate of boundedness.− It just remains to Theorem 4.2, running the shadow-vertex method for s steps deal with polytopes that are not near isotropic position. will yield one of two results with probability at least 1/2:

6.2 Polytopes Far from Isotropic Position 1. It will find the optimal vertex x 1, or We first observe that for every polytope there exists an 2. It will find a vertex y of norm at least 2k. affine change of coordinates (i.e., a translation composed with a change of basis) that makes it k-round. An affine In the first case, we can proceed just as in the k-round case change of coordinates does not change the combinatorial and run the shadow-vertex method a second time to opti- structure of a polytope, so this means that there exists some mize c, for which we will have the same two cases. probability distribution on b and S for which the shadow In the− second case, we have not found the optimal ver- has polynomial expected size. We would like to sample b tex, but we have with high probability learned a point of and S from these probability distributions and then pull large norm inside our polytope. We can use this point to the result back along the change of coordinates. Unfortu- change the probability distributions from which we draw nately, we don’t know an affine transformation that makes our random variables and then start over. This changes our our polytope k-round, so we are unable to sample from these randomized pivot rule on the graph of potential vertices of distributions. our polytope, hopefully putting more probability mass on short paths from the starting vertex to the optimum. We of any number in the input, and so their convex hull has 3 shall show that, with high probability, we need only repeat volume at most 2O(n L) times that of the unit ball [7]. Each this process a polynomial number of times before we find iteration of the algorithm that finds a point of norm at least a right-hand side and shadow plane for which the shadow- k decreases the volume of P by a factor of at least 2. All vertex method finds the optimum. of the polytopes that we construct contain the unit ball, so Our analysis rest upon the following geometric lemma, this can occur at most O(n3L) times. This guarantees that proved in Section 7: the Repeat loop finds an answer after a O(n3L) iterations with high probability, as desired. Lemma 6.2. Let B Rd be the unit ball, let P be a point ⊆ at distance S from the origin, and let C = conv(B, P ) be While the algorithm requires samples from the exponen- their convex hull. If S 16d + 1, then C contains an ellipse ≥ 2 tial distribution and uniform random points on the unit of volume at least twice that of B, having d 1 semi-axes of sphere, it is not difficult to show that it suffices to use stan- length 1 1/d and one semi-axis of length at−least 8 centered − dard discretizations of these distributions of bit-length poly- at the point of distance 7 from the origin in the direction of nomial in n and d. P .

We remark that the number of times that we have to 7. GEOMETRIC LEMMAS FOR ALGORITHM’S change probability distributions depends on the bit-length CORRECTNESS of the inputs, and that this is the only part of our algo- rithm in which this is a factor. Otherwise, the execution of Lemma our algorithm is totally independent of the bit-length of the 7.1. Let P be a k-round polytope, let c and q be inputs. unit vectors, and let Theorem v = argmax c x 6.3. If each entry of the vectors a i is specified x P · using L bits, then CheckBoundedness() either produces a ∈ certificate that its input is bounded or that it is unbounded be the vertex of P at which c x is maximized. If c q 2 2 · · ≤ within O(n3L) iterations, with high probability. (2k 1)/2k , then v q 0. − − · ≤ Proof. It will be helpful to think of the input to Proof. We first note that T CheckBoundedness() as being the polytope x a i x 1 i 2 | ≤ ∀ 2 2 2 2k 1 1 instead of just the vectors a 1, . . . ,a n. We can then talk q + c = q + c + 2(c q) 2 2− = 2 , about running this algorithm on an arbitrary˘ polytope ¯ || || || || || || · ≤ − k k T x α x τi i by rewriting this polytope as so q + c 1/k. The fact that P is contained in B(0, k) | i ≤ ∀ || || ≤ x (α /τ )T x 1 i . implies that v k, and the fact that P contains the unit ˘ i i ¯ || || ≤ With| this notation,≤ ∀ it is easy to check that running an ball implies that ˘iteration of the Repeat¯ loop on a polytope P with Q = Q 0 v c = max c x 1. and r = r 0 is equivalent to running the same code on the x P · ∈ · ≥ polytope Q0(P + r 0) with Q = Id and r = 0. The update step at the end of the algorithm can therefore be thought of We therefore have as applying an affine change of coordinates to the input and q v = c v + (q + c) v 1 + q + c v 0, then restarting the algorithm. · − · · ≤ − || |||| || ≤ If Q = Idn and r = 0, the argument from Section 6.1 as desired. proves that the first iteration of the Repeat loop will either prove boundedness, prove unboundedness, or find a point We now prove some geometric facts that will be necessary with norm at least k with probability at least 1/2. In either for the analysis of our algorithm. We first prove a two- of the first two cases, the algorithm will have succeeded, so dimensional geometric lemma. We then use this to prove a it suffices to consider the third. higher-dimensional analogue, which is the version that we If a point y is in the polytope P 0 = x Ax b , the shall actually use to analyze our algorithm. point y/2 will be in the polytope P = {x |Ax ≤1 } with n { | ≤ } probability at least 1 ne− . This guarantees that P con- 7.1 2-Dimensional Geometry Lemma − tains a point of norm at least k. Since P contains the unit In this section, we prove a lemma about the two-dimensional ball, Lemma 6.2 implies that P contains an ellipse of volume objects shown in Figure 2. In this picture, C is the center at least twice that of the unit ball. The update step of our of a circle C of radius 1. P is a point somewhere along the algorithm identifies such an ellipse and scales and translates positive x-axis, and we have drawn the two lines tangent to so that it becomes the unit ball, and it then restarts with the circle through P , the top one of which we have labeled L. this new polytope as its input. This new polytope has at E is the center of an axis-parallel ellipse E with horizontal most half the volume of the original polytope. semi-axis M 1 and vertical semi-axis m 1. The ellipse All the vertices of the original polyhedron are contained in ≥ ≤ 2 is chosen to be a maximal ellipse contained in the convex a ball of radius 2O(n L), where L is the maximum bit-length hull of the circle and P . Furthermore, let S be the distance 2 2 T 1 from C to P , and let Q = (1 m )/2. If an ellipsoid E is given as the set E = x x Q− x 1 , − where Q is a symmetric, positive definite matrix,| then≤ the semi-axes of E have lengths equal to the˘the eigenvalues ¯of Lemma 7.2. With the definitions above, Q. For example, the semi-axes of the sphere are all of length M = Q(S 1) + 1. 1. − 1 m L S C E M P

Q=(1-m2)/2

Figure 2: The geometric objects considered in Lemma 7.2

Proof. Without loss of generality, let E be the origin. 7.2 High-Dimensional Geometry Lemma The circle and ellipse are mutually tangent at their leftmost d points on the x-axis, so C is at ( M + 1, 0), and P is there- Lemma 7.3. Let B R be the unit ball, let P be a point ⊆ fore at (S M + 1, 0). Let − at distance S from the origin, and let C = conv(B, P ) be − their convex hull. For any m 1, C contains an ellipsoid 1 1 with (d 1) semi-axes of length≤ m and one semi-axis of ` = , 1 , − 2 S − S2 length 1 m (S 1)/2 + 1. r ! − − Proof. Without loss of generality, take P = (S, 0, . . . , 0). and let L be the line given by ` ´ Consider an axis-parallel ellipsoid E with the axes described S M + 1 in the above theorem, with its distinct axis parallel to e1, L = (x, y) ` (x, y) = − . | · S and translated so that it is tangent to B at ( 1, 0, . . . , 0).  ff We assert that E is contained in C. It suffices− to check We claim that L has the following three properties, as the containment when we intersect with an arbitrary 2- shown in Figure 2: dimensional subspace containing 0 and P . In this case, we 1. L passes through P . have exactly the setup of Lemma 7.2, and our result follows immediately. 2. L is tangent to C. Proof Proof of Lemma 6.2. If we set m = 1 1/d, − 3. If we take the major semi-axis M of the ellipse E to then Lemma 7.3 guarantees that the length of the longer be Q(S 1) + 1, then L is tangent to E. semi-axis of the ellipse will be at least − Establishing these properties would immediately imply 1 2 16d 1 1 8. Lemma 7.2, so it suffices to check them one by one. − − d 2 ≥ „ « ! 1. This follows by direct computation—we simply note So, the ratio of the volume of the unit ball to the ellipse is that the point P = (S M +1, 0) satisfies the equation at least − for L. d 1 1 − V/vol(B) 1 8 2. It suffices to show that the distance from the point C ≥ − d „ « to the line L is exactly 1. Since is the unit normal 8 to L, it suffices to check that L = 2. ≥ 4 S M + 1 M + 1 ` C = − 1 = − , 8. REFERENCES · S − S „ « [1] D. Bertsimas and S. Vempala. Solving convex which again follows by direct computation. programs by random walks. J. ACM, 51(4):540–556, 2004. 3. Let [2] K. H. Borgwardt. The Simplex Method: a probabilistic S analysis. Number 1 in Algorithms and Combinatorics. = ( x, y) = ` L L L S M + 1 Springer-Verlag, 1980. − 1 √S2 1 [3] G. B. Dantzig. Maximization of linear function of = , − , S M + 1 S M + 1 variables subject to linear inequalities. In T. C. „ − − « Koopmans, editor, Activity Analysis of Production and so that L = (x, y) (x, y) = 1 . When expressed Allocation, pages 339–347. 1951. { | L · } in this form, L will be tangent to E if and only if [4] A. Deshpande and D. A. Spielman. Improved 2 2 2 2 xM + ym = 1. This can be verified by plugging of the shadow vertex simplex L L 2 in M = Q(S 1) + 1 and Q = (1 m )/2, and then method. preliminary version appeared in FOCS ’05, − − expanding the left-hand side of the equation. 2005. [5] J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. In Proceedings of the thirty-sixth annual ACM Symposium on Theory of Computing (STOC-04), pages 315–320, New York, June 13–15 2004. ACM Press. [6] S. Gass and T. Saaty. The computational algorithm for the parametric objective function. Naval Research Logistics Quarterly, 2:39–45, 1955. [7] M. Grotschel, L. Lovasz, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, 1991. [8] G. Kalai and D. J. Kleitman. A quasi-polynomial bound for the diameter of graphs of polyhedra. Bulletin Amer. Math. Soc., 26:315–316, 1992. [9] N. Karmarkar. A new polynomial time algorithm for linear programming. Combinatorica, 4:373–395, 1984. [10] L. G. Khachiyan. A polynomial algorithm in linear programming. Doklady Akademia Nauk SSSR, pages 1093–1096, 1979. [11] N. Megiddo and R. Chandrasekaran. On the epsilon-perturbation method for avoiding degeneracy. Operations Research Letters, 8:305–308, 1989. [12] A. Schrijver. Theory of Linear and . John Wiley & Sons, 1986.