Smoothed Analysis of Algorithms: Why the Simplex Algorithm Usually Takes Polynomial Time

DANIEL A. SPIELMAN

Massachusetts Institute of Technology, Cambridge, Massachusetts

AND

SHANG-HUA TENG

Boston University, Boston, Massachusetts, and Akamai Technologies, Inc.

Abstract. We introduce the smoothed analysis of algorithms, which continuously interpolates between the worst-case and average-case analyses of algorithms. In smoothed analysis, we measure the maximum over inputs of the expected performance of an algorithm under small random perturbations of that input. We measure this performance in terms of both the input size and the magnitude of the perturbations. We show that the simplex algorithm has smoothed complexity polynomial in the input size and the standard deviation of Gaussian perturbations.

Categories and Subject Descriptors: F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems; G.1.6 [Numerical Analysis]: Optimization

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Simplex method, smoothed analysis, complexity, perturbation

1. Introduction

The Analysis of Algorithms community has been challenged by the existence of remarkable algorithms that are known by scientists and engineers to work well

A preliminary version of this article was published in the Proceedings of the 33rd Annual ACM Symposium on Theory of Computing (Hersonissos, Crete, Greece, July 6–8). ACM, New York, 2001, pp. 296–305.

D. Spielman's work at M.I.T. was partially supported by an Alfred P. Sloan Foundation Fellowship, NSF grants No. CCR-9701304 and CCR-0112487, and a Junior Faculty Research Leave sponsored by the M.I.T. School of Science.

S.-H. Teng's work was done at the University of Illinois at Urbana-Champaign, Boston University, and while visiting the Department of Mathematics at M.I.T. His work was partially supported by an Alfred P. Sloan Foundation Fellowship and NSF grant No. CCR-9972532.

Authors' addresses: D. A. Spielman, Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, e-mail: [email protected]; S.-H. Teng, Department of Computer Science, Boston University, Boston, MA 02215.

in practice, but whose theoretical analyses are negative or inconclusive. The root of this problem is that algorithms are usually analyzed in one of two ways: by worst-case or average-case analysis. Worst-case analysis can improperly suggest that an algorithm will perform poorly by examining its performance under the most contrived circumstances. Average-case analysis was introduced to provide a less pessimistic measure of the performance of algorithms, and many practical algorithms perform well on the random inputs considered in average-case analysis. However, average-case analysis may be unconvincing as the inputs encountered in many application domains may bear little resemblance to the random inputs that dominate the analysis.

We propose an analysis that we call smoothed analysis which can help explain the success of algorithms that have poor worst-case complexity and whose inputs look sufficiently different from random that average-case analysis cannot be convincingly applied. In smoothed analysis, we measure the performance of an algorithm under slight random perturbations of arbitrary inputs. In particular, we consider Gaussian perturbations of inputs to algorithms that take real inputs, and we measure the running times of algorithms in terms of their input size and the standard deviation of the Gaussian perturbations.

We show that the simplex method has polynomial smoothed complexity. The simplex method is the classic example of an algorithm that is known to perform well in practice but which takes exponential time in the worst case [Klee and Minty 1972; Murty 1980; Goldfarb and Sit 1979; Goldfarb 1983; Avis and Chvátal 1978; Jeroslow 1973; Amenta and Ziegler 1999]. In the late 1970s and early 1980s the simplex method was shown to converge in expected polynomial time on various distributions of random inputs by researchers including Borgwardt, Smale, Haimovich, Adler, Karp, Shamir, Megiddo, and Todd [Borgwardt 1980; Borgwardt 1977; Smale 1983; Haimovich 1983; Adler et al. 1987; Adler and Megiddo 1985; Todd 1986]. These works introduced novel probabilistic tools to the analysis of algorithms, and provided some intuition as to why the simplex method runs so quickly. However, these analyses are dominated by "random looking" inputs: even if one were to prove very strong bounds on the higher moments of the distributions of running times on random inputs, one could not prove that an algorithm performs well in any particular small neighborhood of inputs.

To bound expected running times on small neighborhoods of inputs, we consider linear programming problems in the form

  maximize    z^T x
  subject to  Ax ≤ y,                                                      (1)

and prove that for every vector z, every matrix Ā, and every vector ȳ, the expectation over Gaussian perturbations A and y of Ā and ȳ of standard deviation σ max_i ‖(ȳ_i, ā_i)‖ of the time taken by a two-phase shadow-vertex simplex method to solve such a linear program is polynomial in 1/σ and the dimensions of A.

1.1. LINEAR PROGRAMMING AND THE SIMPLEX METHOD. It is difficult to overstate the importance of linear programming to optimization. Linear programming problems arise in innumerable industrial contexts. Moreover, linear programming is often used as a fundamental step in other optimization algorithms. In a linear programming problem, one is asked to maximize or minimize a linear function over a polyhedral region.

Perhaps one reason we see so many linear programs is that we can solve them efficiently. In 1947, Dantzig introduced the simplex method (see Dantzig [1951]), which was the first practical approach to solving linear programs and which remains widely used today. To state it roughly, the simplex method proceeds by walking from one vertex to another of the polyhedron defined by the inequalities in (1). At each step, it walks to a vertex that is better with respect to the objective function. The algorithm will either determine that the constraints are unsatisfiable, determine that the objective function is unbounded, or reach a vertex from which it cannot make progress, which necessarily optimizes the objective function.

Because of its great importance, other algorithms for linear programming have been invented. Khachiyan [1979] applied the ellipsoid algorithm to linear programming and proved that it always converged in time polynomial in d, n, and L—the number of bits needed to represent the linear program. However, the ellipsoid algorithm has not been competitive with the simplex method in practice. In contrast, the interior-point method introduced by Karmarkar [1984], which also runs in time polynomial in d, n, and L, has performed very well: variations of the interior point method are competitive with and occasionally superior to the simplex method in practice.

In spite of half a century of attempts to unseat it, the simplex method remains the most popular method for solving linear programs. However, there has been no satisfactory theoretical explanation of its excellent performance. A fascinating approach to understanding the performance of the simplex method has been the attempt to prove that there always exists a short walk from each vertex to the optimal vertex. The Hirsch conjecture states that there should always be a walk of length at most n − d. Significant progress on this conjecture was made by Kalai and Kleitman [1992], who proved that there always exists a walk of length at most n^{log_2 d + 2}. However, the existence of such a short walk does not imply that the simplex method will find it.

A simplex method is not completely defined until one specifies its pivot rule—the method by which it decides which vertex to walk to when it has many to choose from. There is no deterministic pivot rule under which the simplex method is known to take a subexponential number of steps. In fact, for almost every deterministic pivot rule there is a family of polytopes on which it is known to take an exponential number of steps [Klee and Minty 1972; Murty 1980; Goldfarb and Sit 1979; Goldfarb 1983; Avis and Chvátal 1978; Jeroslow 1973]. (See Amenta and Ziegler [1999] for a survey and a unified construction of these polytopes.) The best present analysis of randomized pivot rules shows that they take expected time n^{O(√d)} [Kalai 1992; Matoušek et al. 1996], which is quite far from the polynomial complexity observed in practice. This inconsistency between the exponential worst-case behavior of the simplex method and its everyday practicality leaves us wanting a more reasonable theoretical analysis.

Various average-case analyses of the simplex method have been performed.
Most relevant to this article is the analysis of Borgwardt [1977, 1980], who proved that the simplex method with the shadow vertex pivot rule runs in expected polynomial time for polytopes whose constraints are drawn independently from spherically symmetric distributions (e.g., Gaussian distributions centered at the origin). Independently, Smale [1983, 1982] proved bounds on the expected running time of Lemke's self-dual parametric simplex algorithm on linear programming problems chosen from a spherically-symmetric distribution. Smale's analysis was substantially improved by Megiddo [1986].

While these average-case analyses are significant accomplishments, it is not clear whether they actually provide intuition for what happens on typical inputs. Edelman [1992] writes on this point:

  What is a mistake is to psychologically link a random matrix with the intuitive notion of a "typical" matrix or the vague concept of "any old matrix."

Another model of random linear programs was studied in a line of research initiated independently by Haimovich [1983] and Adler [1983]. Their works considered the maximum over matrices, A, of the expected time taken by parametric simplex methods to solve linear programs over these matrices in which the directions of the inequalities are chosen at random. As this framework considers the maximum of an average, it may be viewed as a precursor to smoothed analysis—the distinction being that the random choice of inequalities cannot be viewed as a perturbation, as different choices yield radically different linear programs. Haimovich and Adler both proved that parametric simplex methods would take an expected linear number of steps to go from the vertex minimizing the objective function to the vertex maximizing the objective function, even conditioned on the program being feasible. While their theorems confirmed the intuitions of many practitioners, they were geometric rather than algorithmic,^1 as it was not clear how an algorithm would locate either vertex. Building on these analyses, Todd [1986], Adler and Megiddo [1985], and Adler et al. [1987] analyzed parametric algorithms for linear programming under this model and proved quadratic bounds on their expected running time. While the random inputs considered in these analyses are not as special as the random inputs obtained from spherically symmetric distributions, the model of randomly flipped inequalities provokes some similar objections.

1.2. SMOOTHED ANALYSIS OF ALGORITHMS AND RELATED WORK. We introduce the smoothed analysis of algorithms in the hope that it will help explain the good practical performance of many algorithms that worst-case analysis does not and for which average-case analysis is unconvincing. Our first application of the smoothed analysis of algorithms will be to the simplex method. We will consider the maximum over Ā and ȳ of the expected running time of the simplex method on inputs of the form

  maximize    z^T x
  subject to  (Ā + G)x ≤ (ȳ + h),                                          (2)

where we let Ā and ȳ be arbitrary and G and h be a matrix and a vector of independently chosen Gaussian random variables of mean 0 and standard deviation σ max_i ‖(ȳ_i, ā_i)‖. If we let σ go to 0, then we obtain the worst-case complexity of the simplex method; whereas, if we let σ be so large that G swamps out Ā, we obtain the average-case analyzed by Borgwardt. By choosing polynomially small σ, this analysis combines advantages of worst-case and average-case analysis, and roughly corresponds to the notion of imprecision in low-order digits.
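To make the perturbation model in (2) concrete, here is a minimal NumPy sketch (ours, not part of the paper; the function name smoothed_instance is hypothetical) that draws one smoothed instance from a given (Ā, ȳ).

```python
import numpy as np

def smoothed_instance(A_bar, y_bar, sigma, rng=np.random.default_rng()):
    """Return a Gaussian perturbation (A, y) of (A_bar, y_bar).

    Each entry is perturbed by an independent Gaussian of mean 0 and
    standard deviation sigma * max_i ||(y_bar_i, a_bar_i)||, as in (2).
    """
    scale = sigma * np.max(np.linalg.norm(np.column_stack([y_bar, A_bar]), axis=1))
    A = A_bar + scale * rng.standard_normal(A_bar.shape)
    y = y_bar + scale * rng.standard_normal(y_bar.shape)
    return A, y

# Example: a 5-constraint, 2-variable program perturbed with sigma = 0.01.
A_bar = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
y_bar = np.ones(5)
A, y = smoothed_instance(A_bar, y_bar, 0.01)
```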

^1 Our results in Section 4 are analogous to these results.

In a smoothed analysis of an algorithm, we assume that the inputs to the algorithm are subject to slight random perturbations, and we measure the complexity of the algorithm in terms of the input size and the standard deviation of the perturbations. If an algorithm has low smoothed complexity, then one should expect it to work well in practice since most real-world problems are generated from data that is inherently noisy. Another way of thinking about smoothed complexity is to observe that if an algorithm has low smoothed complexity, then one must be unlucky to choose an input instance on which it performs poorly. We now provide some definitions for the smoothed analysis of algorithms that take real or complex inputs. For an algorithm A and input x, let

C_A(x) be a complexity measure of A on input x. Let X be the domain of inputs to A, and let X_n be the set of inputs of size n. The size of an input can be measured in various ways. Standard measures are the number of real variables contained in the input and the sums of the bit-lengths of the variables. Using this notation, one can say that A has worst-case C-complexity f(n) if

  max_{x ∈ X_n} C_A(x) = f(n).

Given a family of distributions μ_n on X_n, we say that A has average-case C-complexity f(n) under μ if

  E_{x ∈_{μ_n} X_n} [C_A(x)] = f(n).

Similarly, we say that A has smoothed C-complexity f(n, σ) if

  max_{x ∈ X_n} E_g [C_A(x + (σ‖x‖_?) g)] = f(n, σ),                        (3)

where (σ‖x‖_?)g is a vector of Gaussian random variables of mean 0 and standard deviation σ‖x‖_?, and ‖x‖_? is a measure of the magnitude of x, such as the largest element or the norm. We say that an algorithm has polynomial smoothed complexity if its smoothed complexity is polynomial in n and 1/σ. In Section 6, we present some generalizations of the definition of smoothed complexity that might prove useful.

To further contrast smoothed analysis with average-case analysis, we note that the probability mass in (3) is concentrated in a region of radius O(σ√n) and volume at most O(σ√n)^n, and so, when σ is small, this region contains an exponentially small fraction of the probability mass in an average-case analysis. Thus, even an extension of average-case analysis to higher moments will not imply meaningful bounds on smoothed complexity.

A discrete analog of smoothed analysis has been studied in a collection of works inspired by Santha and Vazirani's semi-random source model [Santha and Vazirani 1986]. In this model, an adversary generates an input, and each bit of this input has some probability of being flipped. Blum and Spencer [1995] design a polynomial-time algorithm that k-colors k-colorable graphs generated by this model. Feige and Krauthgamer [1998] analyze a model in which the adversary is more powerful, and use it to show that Turner's algorithm [Turner 1986] for approximating the bandwidth performs well on semi-random inputs. They also improve Turner's analysis. Feige and Kilian [1998] present polynomial-time algorithms that recover large independent sets, k-colorings, and optimal bisections in semi-random graphs. They also demonstrate that significantly better results would lead to surprising collapses of complexity classes.
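As an illustration of definition (3) (our addition, not the paper's), the following sketch estimates the inner expectation in (3) by Monte Carlo for one input x and a user-supplied cost function; estimate_smoothed_cost and the toy cost are ours, and the Euclidean norm is assumed as the magnitude measure ‖x‖_?.

```python
import numpy as np

def estimate_smoothed_cost(cost, x, sigma, trials=1000, rng=np.random.default_rng()):
    """Monte Carlo estimate of E_g[cost(x + sigma*||x||*g)] from (3)."""
    scale = sigma * np.linalg.norm(x)
    return np.mean([cost(x + scale * rng.standard_normal(x.shape))
                    for _ in range(trials)])

# Toy cost standing in for a running-time measure.
cost = lambda v: float(np.sum(np.abs(v) > 0.5))
print(estimate_smoothed_cost(cost, np.ones(10), sigma=0.1))
```

The smoothed complexity is then the maximum of this estimate over inputs of size n.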

1.3. OUR RESULTS. We consider the maximum over z, ȳ, and ā_1, ..., ā_n of the expected time taken by a two-phase shadow vertex simplex method to solve linear programming problems of the form

  maximize    z^T x
  subject to  ⟨a_i | x⟩ ≤ y_i, for 1 ≤ i ≤ n,                               (4)

where each a_i is a Gaussian random vector of standard deviation σ max_i ‖(ȳ_i, ā_i)‖ centered at ā_i, and each y_i is a Gaussian random variable of standard deviation σ max_i ‖(ȳ_i, ā_i)‖ centered at ȳ_i.

We begin by considering the case in which y = 1, ‖ā_i‖ ≤ 1, and σ ≤ 1/(3√(d ln n)). In this case, our first result, Theorem 4.1, says that for every vector t the expected size of the shadow of the polytope—the projection of the polytope defined by the equations (4) onto the plane spanned by t and z—is polynomial in n, the dimension, and 1/σ. This result is the geometric foundation of our work, but it does not directly bound the running time of an algorithm, as the shadow relevant to the analysis of an algorithm depends on the perturbed program and cannot be specified beforehand as the vector t must be. In Section 3.3, we describe a two-phase shadow-vertex simplex algorithm, and in Section 5, we use Theorem 4.1 as a black box to show that it takes expected time polynomial in n, d, and 1/σ in the case described above.

Efforts have been made to analyze how much the solution of a linear program can change as its data is perturbed. For an introduction to such analyses, and an analysis of the complexity of interior point methods in terms of the resulting condition number, we refer the reader to the work of Renegar [1995b, 1995a, 1994].

1.4. INTUITION THROUGH CONDITION NUMBERS. For those already familiar with the simplex method and condition numbers, we include this section to provide some intuition for why our results should be true. Our analysis will exploit geometric properties of the condition number of a matrix, rather than of a linear program.

We start with the observation that if a corner of a polytope is specified by the equation A_I x = y_I, where I is a d-set, then the condition number of the matrix A_I provides a good measure of how far the corner is from being flat. Moreover, it is relatively easy to show that if A is subject to perturbation, then it is unlikely that A_I has poor condition number. So, it seems intuitive that if A is perturbed, then most corners of the polytope should have angles bounded away from being flat. This already provides some intuition as to why the simplex method should run quickly: one should make reasonable progress as one rounds a corner if it is not too flat.

There are two difficulties in making the above intuition rigorous: the first is that even if A_I is well conditioned for most sets I, it is not clear that A_I will be well conditioned for most sets I that are bases of corners of the polytope. The second difficulty is that even if most corners of the polytope have reasonable condition number, it is not clear that a simplex method will actually encounter many of these corners. By analyzing the shadow vertex pivot rule, it is possible to resolve both of these difficulties.
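The claim that perturbed submatrices are rarely ill conditioned is easy to probe numerically. The following sketch (ours, not from the paper) perturbs a deliberately singular Ā and reports how often s_min(A_I) is tiny over all d-subsets I.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, sigma = 20, 4, 0.1

# A deliberately degenerate matrix: every row is the same, so every
# submatrix A_I of the unperturbed data is singular (s_min = 0).
A_bar = np.ones((n, d))
A = A_bar + sigma * rng.standard_normal((n, d))

smins = [np.linalg.svd(A[list(I)], compute_uv=False)[-1]
         for I in combinations(range(n), d)]
print("fraction of d-subsets with s_min < sigma/100:",
      np.mean(np.array(smins) < sigma / 100))
```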

The first advantage of studying the shadow vertex pivot rule is that its analysis comes down to studying the expected sizes of shadows of the polytope. From the specification of the plane onto which the polytope will be projected, one obtains a characterization of all the corners that will be in the shadow, thereby avoiding the complication of an iterative characterization. The second advantage is that these corners are specified by the property that they optimize a particular objective function, and using this property one can actually bound the probability that they are ill-conditioned. While the results of Section 4 are not stated in these terms, this is the intuition behind them.

Condition numbers also play a fundamental role in our analysis of the shadow-vertex algorithm. The analysis of the algorithm differs from the mere analysis of the sizes of shadows in that, in the study of an algorithm, the plane onto which the polytope is projected depends upon the polytope itself. This correlation of the plane with the polytope complicates the analysis, but is also resolved through the help of condition numbers. In our analysis, we view the perturbation as the composition of two perturbations, where the second is small relative to the first. We show that our choice of the plane onto which we project the shadow is well-conditioned with high probability after the first perturbation. That is, we show that the second perturbation is unlikely to substantially change the plane onto which we project, and therefore unlikely to substantially change the shadow. Thus, it suffices to measure the expected size of the shadow obtained after the second perturbation onto the plane that would have been chosen after just the first perturbation. The technical lemma that enables this analysis, Lemma 5.3, is a concentration result that proves that it is highly unlikely that almost all of the minors of a random matrix have poor condition number. This analysis also enables us to show that it is highly unlikely that we will need a large "big-M" in phase I of our algorithm.

We note that the condition numbers of the A_I s have been studied before in the complexity of linear programming algorithms. The condition number χ̄_A of Vavasis and Ye [1996] measures the condition number of the worst submatrix A_I, and their algorithm runs in time proportional to ln(χ̄_A). Todd et al. [2001] have shown that for a Gaussian random matrix the expectation of ln(χ̄_A) is O(min(d ln n, n)). That is, they show that it is unlikely that any A_I is exponentially ill-conditioned. It is relatively simple to apply the techniques of Section 5.1 to obtain a similar result in the smoothed case. We wonder whether our concentration result that it is exponentially unlikely that many A_I are even polynomially ill-conditioned could be used to obtain a better smoothed analysis of the Vavasis–Ye algorithm.

1.5. DISCUSSION. One can debate whether the definition of polynomial smoothed complexity should be that an algorithm have complexity polynomial in 1/σ or in log(1/σ). We believe that the choice of being polynomial in 1/σ will prove more useful as the other definition is too strong and quite similar to the notion of being polynomial in the worst case. In particular, one can convert any algorithm for linear programming whose smoothed complexity is polynomial in d, n and log(1/σ) into an algorithm whose worst-case complexity is polynomial in d, n, and L. That said, one should certainly prefer complexity bounds that are lower as a function of 1/σ, d and n.

We also remark that a simple examination of the constructions that provide exponential lower bounds for various pivot rules [Klee and Minty 1972; Murty 1980; Goldfarb and Sit 1979; Goldfarb 1983; Avis and Chvátal 1978; Jeroslow

1973] reveals that none of these pivot rules has smoothed complexity polynomial in n and subpolynomial in 1/σ. That is, these constructions are unaffected by exponentially small perturbations.

2. Notation and Mathematical Preliminaries

In this section, we define the notation that will be used in the article. We will also review some background from mathematics and derive a few simple statements that we will need. The reader should probably skim this section now, and save a more detailed examination for when the relevant material is referenced.

—[n] denotes the set of integers between 1 and n, and ([n] choose k) denotes the set of subsets of [n] of size k.
—Subsets of [n] are denoted by the capital Roman letters I, J, L, K. M will denote a subset of integers, and K will denote a set of subsets of [n].
—Subsets of IR^d are denoted by the capital Roman letters A, B, P, Q, R, S, T, U, V.
—Vectors in IR^d are denoted by bold lower-case Roman letters, such as a_i, ā_i, ã_i, b_i, c_i, d_i, h, t, q, z, y.
—Whenever a vector, say a ∈ IR^d, is present, its components will be denoted by lower-case Roman letters with subscripts, such as a_1, ..., a_d.
—Whenever a collection of vectors, such as a_1, ..., a_n, is present, the similar bold upper-case letter, such as A, will denote the matrix of these vectors. For I ∈ ([n] choose k), A_I will denote the matrix of those a_i for which i ∈ I.
—Matrices are denoted by bold upper-case Roman letters, such as A, Ā, Ã, B, M and R_ω.
—S^{d−1} denotes the unit sphere in IR^d.
—Vectors in S^{d−1} will be denoted by bold Greek letters, such as ω.
—Generally speaking, univariate quantities with scale, such as lengths or heights, will be represented by lower case Roman letters such as c, h, l, r, s, and t. The principal exceptions are that κ and M will also denote such quantities.
—Quantities without scale, such as the ratios of quantities with scale or affine coordinates, will be represented by lower case Greek letters; λ will denote a vector of such quantities, such as (λ_1, ..., λ_d).
—Density functions are denoted by lower case Greek letters such as μ and ν.
—The standard deviations of Gaussian random variables are denoted by lower-case Greek letters such as σ.
—Indicator random variables are denoted by upper case Roman letters, such as A, B, E, F, V, W, X, Y, and Z.
—Functions into the reals or integers will be denoted by calligraphic upper-case letters, such as F, G, S^+, S^0, T.
—Functions into IR^d are denoted by upper-case Greek letters, such as Φ, Υ, Ψ.
—⟨x | y⟩ denotes the inner product of vectors x and y.
—For vectors ω and z, we let angle(ω, z) denote the angle between these vectors at the origin.

—The logarithm base 2 is written lg and the natural logarithm is written ln.
—The probability of an event A is written Pr[A], and the expectation of a variable X is written E[X].
—The indicator random variable for an event A is written [A].

2.1. GEOMETRIC DEFINITIONS. For the following definitions, we let a_1, ..., a_k denote a set of vectors in IR^d.

—Span(a_1, ..., a_k) denotes the subspace spanned by a_1, ..., a_k.
—Aff(a_1, ..., a_k) denotes the hyperplane that is the affine span of a_1, ..., a_k: the set of points ∑_i α_i a_i, where ∑_i α_i = 1.
—ConvHull(a_1, ..., a_k) denotes the convex hull of a_1, ..., a_k.
—Cone(a_1, ..., a_k) denotes the positive cone through a_1, ..., a_k: the set of points ∑_i α_i a_i, for α_i ≥ 0.
—△(a_1, ..., a_d) denotes the simplex ConvHull(a_1, ..., a_d).

For a linear program specified by a_1, ..., a_n, y and z, we will say that the linear program is in general position if

—The points a_1, ..., a_n are in general position with respect to y, which means that for all I ∈ ([n] choose d) and x = A_I^{−1} y_I, and all j ∉ I, ⟨a_j | x⟩ ≠ y_j.
—For all I ∈ ([n] choose d−1), z ∉ Cone(A_I).

Furthermore, we will say that the linear program is in general position with respect to a vector t if the set of λ for which there exists an I ∈ ([n] choose d−1) such that

(1 − λ)t + λz ∈ Cone(A_I) is finite and does not contain 0.

2.2. VECTOR AND MATRIX NORMS. The material of this section is principally used in Sections 3.3 and 5.1. The following definitions and propositions are standard, and may be found in standard texts on Numerical Linear Algebra.

Definition 2.1 (Vector Norms). For a vector x, we define
—‖x‖ = √(∑_i x_i²),
—‖x‖_1 = ∑_i |x_i|, and
—‖x‖_∞ = max_i |x_i|.

PROPOSITION 2.2 (VECTOR NORMS). For a vector x ∈ IR^d,

  ‖x‖ ≤ ‖x‖_1 ≤ √d ‖x‖.

Definition 2.3 (Matrix Norm). For a matrix A, we define

  ‖A‖ = max_x ‖Ax‖ / ‖x‖.

PROPOSITION 2.4 (PROPERTIES OF MATRIX NORM). For d-by-d matrices A and B, and a d-vector x,
(a) ‖Ax‖ ≤ ‖A‖ ‖x‖.
(b) ‖AB‖ ≤ ‖A‖ ‖B‖.

(c) ‖A‖ = ‖A^T‖.
(d) ‖A‖ ≤ √d max_i ‖a_i‖, where A = (a_1, ..., a_d).
(e) det(A) ≤ ‖A‖^d.
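As a quick numerical illustration of Propositions 2.2 and 2.4 (our addition), the following checks the stated inequalities on a random matrix and vector; the small tolerances guard against floating-point error only.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.standard_normal((d, d))
x = rng.standard_normal(d)

op_norm = np.linalg.norm(A, 2)           # ||A||, the operator (spectral) norm
col_norms = np.linalg.norm(A, axis=0)    # ||a_i|| for the columns a_i

assert np.linalg.norm(x) <= np.linalg.norm(x, 1) <= np.sqrt(d) * np.linalg.norm(x)
assert np.linalg.norm(A @ x) <= op_norm * np.linalg.norm(x) + 1e-12
assert op_norm <= np.sqrt(d) * col_norms.max() + 1e-12
assert abs(np.linalg.det(A)) <= op_norm ** d + 1e-12
print("all norm inequalities hold on this sample")
```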

Definition 2.5 (s_min(·)). For a matrix A, we define

  s_min(A) = ‖A^{−1}‖^{−1}.

We recall that s_min(A) is the smallest singular value of the matrix A, and that it is not a norm.

PROPOSITION 2.6 (PROPERTIES OF s_min(·)). For d-by-d matrices A and B,

(a) s_min(A) = min_x ‖Ax‖ / ‖x‖.
(b) s_min(B) ≥ s_min(A) − ‖A − B‖.

2.3. PROBABILITY. For an event A, we let [A] denote the indicator random variable for the event. We generally describe random variables by their density functions. If x has density μ, then

  Pr[A(x)] = ∫ [A(x)] μ(x) dx.

If B is another event, then

  Pr_B[A(x)] = Pr[A(x) | B(x)] = ∫ [B(x)][A(x)] μ(x) dx / ∫ [B(x)] μ(x) dx.

In a context where multiple densities are present, we will use the notation Pr_μ[A(x)] to indicate the probability of A when x is distributed according to μ. In many situations, we will not know the density μ of a random variable x, but rather a function ν such that μ(x) = cν(x) for some constant c. In this case, we will say that x has density proportional to ν.

The following Propositions and Lemmas will play a prominent role in the proofs in this article. The only one of these which might not be intuitively obvious is Lemma 2.11.

PROPOSITION 2.7 (AVERAGE ≤ MAXIMUM). Let μ(x, y) be a density function, and let x and y be distributed according to μ(x, y). If A(x, y) is an event and X(x, y) is a random variable, then

  Pr_{x,y}[A(x, y)] ≤ max_x Pr_y[A(x, y)], and
  E_{x,y}[X(x, y)] ≤ max_x E_y[X(x, y)],

where in the right-hand terms, y is distributed in accordance with the induced distribution μ(y | x).

PROPOSITION 2.8 (EXPECTATION ON SUBDOMAIN). Let x be a random variable and A(x) an event. Let P be a measurable subset of the domain of x. Then,

  Pr_{x ∈ P}[A(x)] ≤ Pr[A(x)] / Pr[x ∈ P].

PROOF. By the definition of conditional probability,

  Pr_{x ∈ P}[A(x)] = Pr[A(x) | x ∈ P]
                   = Pr[A(x) and x ∈ P] / Pr[x ∈ P], by Bayes' rule,
                   ≤ Pr[A(x)] / Pr[x ∈ P].

LEMMA 2.9 (COMPARING EXPECTATIONS). Let X and Y be nonnegative random variables and A an event satisfying
(1) X ≤ k,
(2) Pr[A] ≥ 1 − ε, and
(3) there exists a constant c such that E[X | A] ≤ c E[Y | A].
Then, E[X] ≤ c E[Y] + εk.

PROOF.

  E[X] = E[X | A] Pr[A] + E[X | not(A)] Pr[not(A)]
       ≤ c E[Y | A] Pr[A] + εk
       ≤ c E[Y] + εk.

LEMMA 2.10 (SIMILAR DISTRIBUTIONS). Let X be a nonnegative random variable such that X ≤ k. Let μ and ν be density functions for which there exists a set S such that
(1) Pr_ν[S] ≥ 1 − ε, and
(2) there exists a constant c ≥ 1 such that for all a ∈ S, ν(a) ≤ cμ(a).
Then, E_ν[X(a)] ≤ c E_μ[X(a)] + εk.

PROOF. We write

  E_ν[X] = ∫_{a ∈ S} X(a)ν(a) da + ∫_{a ∉ S} X(a)ν(a) da
         ≤ c ∫_{a ∈ S} X(a)μ(a) da + εk
         ≤ c ∫_a X(a)μ(a) da + εk
         = c E_μ[X] + εk.

LEMMA 2.11 (COMBINATION LEMMA). Let x and y be random variables distributed in accordance with μ(x, y). Let F(x) and G(x, y) be nonnegative functions and α and β be constants such that

—∀ε ≥ 0, Pr_{x,y}[F(x) ≥ 1/ε] ≤ αε, and
—∀ε ≥ 0, max_x Pr_y[G(x, y) ≥ 1/ε] ≤ (βε)²,

where in the second line y is distributed according to the induced density μ(y | x). Then

  Pr_{x,y}[F(x)G(x, y) ≥ 1/ε] ≤ 4αβε.

PROOF. Consider any x and y for which F(x)G(x, y) ≥ 1/ε. If i is the integer for which 1/(2^{i+1}βε) < F(x) ≤ 1/(2^i βε), then G(x, y) ≥ 2^i β. Thus, F(x)G(x, y) ≥ 1/ε implies that either F(x) ≥ 1/(2βε), or there exists an integer i ≥ 1 for which F(x) ≥ 1/(2^{i+1}βε) and G(x, y) ≥ 2^i β. So, we obtain the bound

  Pr_{x,y}[F(x)G(x, y) ≥ 1/ε]
    ≤ Pr_{x,y}[F(x) ≥ 1/(2βε)] + ∑_{i≥1} Pr_{x,y}[F(x) ≥ 1/(2^{i+1}βε) and G(x, y) ≥ 2^i β]
    ≤ 2αβε + ∑_{i≥1} Pr_{x,y}[F(x) ≥ 1/(2^{i+1}βε)] Pr_{x,y}[G(x, y) ≥ 2^i β | F(x) ≥ 1/(2^{i+1}βε)]
    ≤ 2αβε + ∑_{i≥1} Pr_{x,y}[F(x) ≥ 1/(2^{i+1}βε)] max_x Pr_y[G(x, y) ≥ 2^i β], by Proposition 2.7,
    ≤ 2αβε + ∑_{i≥1} (2^{i+1}αβε)(2^{−i})²
    = 2αβε + ∑_{i≥1} 2^{1−i}αβε
    = 4αβε.
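As a sanity check of Lemma 2.11 (ours, not from the paper), one can take F = 1/U with U uniform on [0, 1], so that Pr[F ≥ 1/ε] = ε (α = 1), and G = 1/√V with V uniform, so that Pr[G ≥ 1/ε] = ε² (β = 1), and compare the empirical tail of FG with the bound 4ε. This simulates only the independent special case of the lemma.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10**6
F = 1.0 / rng.uniform(size=N)            # Pr[F >= 1/eps] = eps       (alpha = 1)
G = 1.0 / np.sqrt(rng.uniform(size=N))   # Pr[G >= 1/eps] = eps**2    (beta = 1)

for eps in [0.1, 0.01, 0.001]:
    empirical = np.mean(F * G >= 1.0 / eps)
    print(f"eps={eps}: Pr[FG >= 1/eps] ~ {empirical:.5f}  <=  bound {4 * eps}")
```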

As we have found this lemma very useful in our work, and we suspect others may as well, we state a more broadly applicable generalization. Its proof is similar.

LEMMA 2.12 (GENERALIZED COMBINATION LEMMA). Let x and y be random variables distributed in accordance with μ(x, y). There exists a function c(a, b) such that if F(x) and G(x, y) are nonnegative functions and α, β, a and b are constants such that

—∀ε ≥ 0, Pr_{x,y}[F(x) ≥ 1/ε] ≤ (αε)^a, and
—∀ε ≥ 0, max_x Pr_y[G(x, y) ≥ 1/ε] ≤ (βε)^b,

where in the second line y is distributed in accordance with the induced density μ(y | x), then

  Pr_{x,y}[F(x)G(x, y) ≥ 1/ε] ≤ c(a, b)(αβε)^{min(a,b)} lg(1/ε)^{[a=b]},

where [a = b] is 1 if a = b, and 0 otherwise.

LEMMA 2.13 (ALMOST POLYNOMIAL DENSITIES). Let k > 0 and let t be a nonnegative random variable with density proportional to μ(t)t^k such that, for some t_0 > 0,

  max_{0 ≤ t ≤ t_0} μ(t) ≤ c · min_{0 ≤ t ≤ t_0} μ(t).

Then,

  Pr[t < ε] ≤ c (ε/t_0)^{k+1}.

PROOF. For ε ≥ t_0, the lemma is vacuously true. Assuming ε < t_0, we compute

  Pr[t < ε] = ∫_0^ε μ(t)t^k dt / ∫_0^∞ μ(t)t^k dt
            ≤ ∫_0^ε μ(t)t^k dt / ∫_0^{t_0} μ(t)t^k dt
            ≤ (max_{0≤t≤t_0} μ(t)) ε^{k+1} / ((min_{0≤t≤t_0} μ(t)) t_0^{k+1})
            ≤ c (ε/t_0)^{k+1}.
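The bound of Lemma 2.13 can be checked numerically by rejection sampling (our sketch, not the paper's); here μ(t) = 2 − cos(t) on [0, t_0] = [0, 1], k = 2, and c = 3 is a valid bound on max μ / min μ.

```python
import numpy as np

rng = np.random.default_rng(3)
k, t0 = 2.0, 1.0
mu = lambda t: 2.0 - np.cos(t)            # max/min of mu on [0, t0] is below c = 3
density = lambda t: mu(t) * t**k          # unnormalized density on [0, t0]

# Rejection sampling; density(t) is increasing on [0, t0], so density(t0) is an envelope.
t = rng.uniform(0, t0, size=2_000_000)
keep = rng.uniform(0, density(t0), size=t.size) < density(t)
samples = t[keep]

eps, c = 0.1, 3.0
print("empirical Pr[t < eps]:", np.mean(samples < eps))
print("bound c*(eps/t0)^(k+1):", c * (eps / t0) ** (k + 1))
```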

2.4. GAUSSIAN RANDOM VECTORS. For the convenience of the reader, we recall some standard facts about Gaussian random variables and vectors. These may be found in Feller [1968, VII.1] and Feller [1971, III.6]. We then draw some corollaries of these facts and derive some lemmas that we will need later in the article.

We first recall that a univariate Gaussian distribution with mean 0 and standard deviation σ has density

  (1/(√(2π) σ)) exp(−a²/2σ²),

and that a Gaussian random vector in IR^d centered at a point ā with covariance matrix M has density

  (1/((√(2π))^d √(det(M)))) exp(−(a − ā)^T M^{−1} (a − ā)/2).

For positive-definite M, there exists a basis in which the density can be written

  ∏_{i=1}^d (1/(√(2π) σ_i)) exp(−a_i²/2σ_i²),

where σ_1², ..., σ_d² are the eigenvalues of M. When all the eigenvalues of M are the same and equal to σ², we will refer to the density as a Gaussian distribution of standard deviation σ.

PROPOSITION 2.14 (ADDITIVITY OF GAUSSIANS). If a_1 is a Gaussian random vector with covariance matrix M_1 centered at a point ā_1 and a_2 is an independent Gaussian random vector with covariance matrix M_2 centered at a point ā_2, then a_1 + a_2 is the Gaussian random vector with covariance matrix M_1 + M_2 centered at ā_1 + ā_2.

LEMMA 2.15 (SMOOTHNESS OF GAUSSIANS). Let μ(x) be a Gaussian distribution of standard deviation σ centered at a point ā. Let k ≥ 1, let dist(x, ā) ≤ k, and let dist(x, y) ≤ ε ≤ k. Then,

  μ(y)/μ(x) ≤ exp(3εk/2σ²).

PROOF. By translating ā, x and y, we may assume ā = 0 and ‖x‖ ≤ k. We then have

  μ(y)/μ(x) = exp(−(‖y‖² − ‖x‖²)/2σ²)
            ≤ exp((2ε‖x‖ + ε²)/2σ²),  as ‖y‖ ≥ ‖x‖ − ε,
            ≤ exp((2εk + ε²)/2σ²),    as ‖x‖ ≤ k,
            ≤ exp(3εk/2σ²),           as ε ≤ k.

PROPOSITION 2.16 (RESTRICTIONS OF GAUSSIANS). Let μ be a Gaussian distribution of standard deviation σ centered at a point ā. Let v be any vector and r be any real number. Then, the induced distribution μ(x | v^T x = r) is a Gaussian distribution of standard deviation σ centered at the projection of ā onto the plane {x : v^T x = r}.

PROPOSITION 2.17 (GAUSSIAN MEASURE OF HALFSPACES). Let ω be any unit vector in IR^d and r any real. Then,

  ∫_g (1/(√(2π)σ))^d [⟨ω | g⟩ ≤ r] exp(−‖g‖²/2σ²) dg = (1/(√(2π)σ)) ∫_{t=−∞}^{r} exp(−t²/2σ²) dt.

PROOF. Immediate if one expresses the Gaussian density in a basis containing ω.

The distribution of the square of the norm of a Gaussian random vector is the Chi-Square distribution. We use the following weak bound on the Chi-Square distribution, which follows from Equality (26.4.8) of Abramowitz and Stegun [1970].

PROPOSITION 2.18 (CHI-SQUARE BOUND). Let x be a Gaussian random vector in IR^d of standard deviation σ centered at the origin. Then,

  Pr[‖x‖ ≥ kσ] ≤ (k²)^{d/2−1} exp(−k²/2) / (2^{d/2−1} Γ(d/2)).               (5)

From this, we derive

COROLLARY 2.19 (A CHI-SQUARE BOUND). Let x be a Gaussian random vector in IR^d of standard deviation σ centered at the origin. Then, for n ≥ 3,

  Pr[‖x‖ ≥ 3σ√(d ln n)] ≤ n^{−2.9d}.

Moreover, if n > d ≥ 3, and x_1, ..., x_n are such vectors, then

  Pr[max_{1≤i≤n} ‖x_i‖ ≥ 3σ√(d ln n)] ≤ n^{−2.9d+1} ≤ 0.0015.

PROOF. For κ = 3√(ln n), we can apply Stirling's formula [Abramowitz and Stegun 1970], in the form Γ(d/2) ≥ √(2π)(d/2)^{(d−1)/2} e^{−d/2}, to (5) to find

  Pr[‖x‖ ≥ κσ√d] ≤ (κ²d)^{d/2−1} exp(−κ²d/2) / (2^{d/2−1} Γ(d/2))
                 ≤ (κ²d)^{d/2−1} exp(−(κ² − 1)d/2) / (2^{d/2−1} √(2π) (d/2)^{(d−1)/2})
                 = κ^{d−2} exp(−(κ² − 1)d/2) / √(πd)
                 ≤ exp(−(κ² − 2 ln κ − 1)d/2)
                 ≤ exp(−2.9 d ln n) = n^{−2.9d},

as κ² − 2 ln κ − 1 = 9 ln(n) − ln(9 ln n) − 1 ≥ ln(n)(9 − ln 9 − 1) ≥ 5.8 ln(n). The bound on the maximum then follows from a union bound over the n vectors.

We also prove it is unlikely that a Gaussian random variable has small norm.

PROPOSITION 2.20 (GAUSSIAN NEAR POINT OR PLANE). Let x be a d-dimensional Gaussian random vector of standard deviation σ centered anywhere. Then,

(a) for any point p, Pr[dist(x, p) ≤ ε] ≤ (min(1, √(e/d)) (ε/σ))^d, and
(b) for a plane H of dimension h, Pr[dist(x, H) ≤ ε] ≤ (ε/σ)^{d−h}.

PROOF. Let x̄ be the center of the Gaussian distribution, and let B_ε(p) denote the ball of radius ε around p. Recall that the volume of B_ε(p) is

  2 π^{d/2} ε^d / (d Γ(d/2)).

To prove part (a), we bound the probability that dist(x, p) ≤ ε by

  ∫_{x ∈ B_ε(p)} (1/(√(2π)σ))^d exp(−‖x − x̄‖²/2σ²) dx ≤ (1/(√(2π)σ))^d · 2 π^{d/2} ε^d / (d Γ(d/2)) = (ε/σ)^d · 2/(d 2^{d/2} Γ(d/2)).

By Proposition 2.21, we have for d ≥ 3

  2/(d 2^{d/2} Γ(d/2)) ≤ (e/d)^{d/2}.

Applying the inequality 2/(d 2^{d/2} Γ(d/2)) ≤ 1 for all d ≥ 1, we establish (a).

To prove part (b), we consider a basis in which d − h vectors are perpendicular to H, and apply part (a) to the components of x in the span of those basis vectors.

PROPOSITION 2.21 (GAMMA INEQUALITY). For d ≥ 3,

  2/(d 2^{d/2} Γ(d/2)) ≤ (e/d)^{d/2}.

PROOF. For d ≥ 3, we apply the inequality Γ(x + 1) ≥ √(2πx)(x/e)^x with x = (d − 2)/2 to show

  2/(d 2^{d/2} Γ(d/2)) ≤ 2 exp((d − 2)/2) / (d 2^{d/2} √(π(d − 2)) ((d − 2)/2)^{(d−2)/2})
                       = exp((d − 2)/2) / (d √(π(d − 2)) (d − 2)^{(d−2)/2})
                       = (e/d)^{d/2} · (d/(d − 2))^{(d−2)/2} / (e √(π(d − 2)))
                       ≤ (e/d)^{d/2},

where in the last inequality we used the facts (1 + 2/(d − 2))^{(d−2)/2} ≤ e and d ≥ 3 implies √(π(d − 2)) ≥ 1.

PROPOSITION 2.22 (NONCENTRAL GAUSSIAN NEAR THE ORIGIN). For d ≥ 3, let x be a d-dimensional Gaussian random vector of standard deviation σ centered at x̄. Then, for ε ≤ 1/(√2 e),

  Pr[‖x‖ ≤ ε √(‖x̄‖² + dσ²)] ≤ (√(2e) ε)^d.

PROOF. Let ρ = ‖x̄‖. We divide the analysis into two cases: (1) ρ ≤ σ√d, and (2) ρ > σ√d.

For ρ ≤ σ√d,

  Pr[‖x‖ ≤ ε√(ρ² + dσ²)] ≤ Pr[‖x‖ ≤ εσ√(2d)] ≤ (√(2e) ε)^d,

by part (a) of Proposition 2.20.

For ρ > σ√d, let B_r be the ball of radius r around the origin. Applying the assumption ε ≤ 1/(√2 e) and letting ρ = cσ√d for c ≥ 1, we have

  Pr[‖x‖ ≤ ε√(ρ² + dσ²)] ≤ Pr[‖x‖ ≤ √2 ερ]
    = ∫_{x ∈ B_{√2ερ}} (1/(√(2π)σ))^d exp(−‖x − x̄‖²/2σ²) dx
    ≤ (1/(√(2π)σ))^d · (2 π^{d/2} (√2ερ)^d / (d Γ(d/2))) · exp(−(1 − 1/e)²ρ²/2σ²)
    ≤ (√(2e) ε)^d (ρ/(σ√d))^d exp(−(1 − 1/e)²ρ²/2σ²)
    = (√(2e) ε)^d exp(d(ln c − c²(1 − 1/e)²/2))
    ≤ (√(2e) ε)^d,

where the second inequality holds because ε ≤ 1/(√2 e) implies that for any point x ∈ B_{√2ερ},

  exp(−‖x − x̄‖²/2σ²) ≤ exp(−(1 − √2ε)²ρ²/2σ²) ≤ exp(−(1 − 1/e)²ρ²/2σ²);

the third inequality follows from Proposition 2.21; and, the last inequality holds because one can prove that for any c ≥ 1, ln c − c²(1 − 1/e)²/2 < 0.

Bounds such as the following on the tails of Gaussian distributions are standard (see, e.g., Feller [1968, Section VII.1]).

PROPOSITION 2.23 (GAUSSIAN TAIL BOUND).

  (exp(−x²/2σ²)/√(2π)) (σ/x − σ³/x³) ≤ (1/(√(2π)σ)) ∫_{t=x}^∞ exp(−t²/2σ²) dt ≤ (exp(−x²/2σ²)/√(2π)) (σ/x).

Using this, we prove:

LEMMA 2.24 (COMPARING GAUSSIAN TAILS). Let σ ≤ 1 and let

  μ(t) = (1/(√(2π)σ)) exp(−t²/2σ²).

Then, for x ≤ 2 and |x − y| ≤ ε,

  ∫_{t=y}^∞ μ(t) dt / ∫_{t=x}^∞ μ(t) dt ≤ 1 + (8ε/(3σ²)) exp(2/σ²).          (6)

PROOF. If y ≥ x, the left-hand side of (6) is at most 1 and the lemma holds trivially, so assume y ≥ x − ε, in which case

  ∫_{t=y}^∞ μ(t) dt ≤ ∫_{t=x}^∞ μ(t) dt + ∫_{t=x−ε}^{x} μ(t) dt ≤ ∫_{t=x}^∞ μ(t) dt + ε/(√(2π)σ).

We then combine this bound with Proposition 2.23, which for x ≤ 2 and σ ≤ 1 gives

  ∫_{t=x}^∞ μ(t) dt ≥ ∫_{t=2}^∞ μ(t) dt ≥ (1/√(2π)) (σ/2 − σ³/8) exp(−2/σ²) ≥ (3σ/(8√(2π))) exp(−2/σ²),

to obtain

  ∫_{t=y}^∞ μ(t) dt / ∫_{t=x}^∞ μ(t) dt ≤ 1 + (ε/(√(2π)σ)) (8√(2π)/(3σ)) exp(2/σ²) = 1 + (8ε/(3σ²)) exp(2/σ²).

PROPOSITION 2.25 (MONOTONICITY OF GAUSSIAN DENSITY). Let

  μ(t) = (1/(√(2π)σ)) exp(−t²/2σ²).

(a) For all a > 0, μ(x)/μ(x + a) is monotonically increasing in x; and,
(b) the ratio μ(x) / ∫_{t=x}^∞ μ(t) dt is monotonically increasing in x.

PROOF. Part (a) follows from

  μ(x)/μ(x + a) = exp((2ax + a²)/2σ²),

and the fact that exp(2ax) is monotonically increasing in x. To prove part (b), note that for all a > 0,

  ∫_{t=x}^∞ μ(t) dt / μ(x) = ∫_{t=0}^∞ μ(x + t) dt / μ(x) ≥ ∫_{t=0}^∞ μ(x + a + t) dt / μ(x + a) = ∫_{t=x+a}^∞ μ(t) dt / μ(x + a),

where the inequality follows from part (a).

2.5. CHANGES OF VARIABLES. The main proof technique used in Section 4 is change of variables. For the reader's convenience, we recall how a change of variables affects probability distributions.

PROPOSITION 2.26 (CHANGE OF VARIABLES). Let y be a random variable distributed according to density μ. If y = Φ(x), then x has density

  μ(Φ(x)) |det(∂Φ(x)/∂x)|.

Recall that det(∂y/∂x) is the Jacobian of the change of variables.

We now introduce the fundamental change of variables used in this article. Let a_1, ..., a_d be linearly independent points in IR^d. We will represent these points by specifying the plane passing through them and their positions on that plane. Many studies of the convex hulls of random point sets have used this change of variables (e.g., see Renyi and Sulanke [1963, 1964], Efron [1965], and Miles [1971]).

We specify the plane containing a_1, ..., a_d by ω and r, where ‖ω‖ = 1, r ≥ 0 and ⟨ω | a_i⟩ = r for all i. We will not concern ourselves with the issue that ω is ill-defined if the a_1, ..., a_d are affinely dependent, as this is an event of probability zero. To specify the positions of a_1, ..., a_d on the plane specified by (ω, r), we must choose a coordinate system for that plane. To choose a canonical set of coordinates for each (d − 1)-dimensional hyperplane specified by (ω, r), we first fix a reference unit vector in IR^d, say q, and an arbitrary coordinatization of the subspace orthogonal to q. For any ω ≠ −q, we let

R_ω denote the linear transformation that rotates q to ω in the two-dimensional subspace through q and ω and that is the identity in the orthogonal subspace. Using R_ω, we can map points specified in the (d − 1)-dimensional hyperplane specified by r and ω to IR^d by

  a_i = R_ω b_i + rω,

where b_i is viewed both as a vector in IR^{d−1} and as an element of the subspace orthogonal to q. We will not concern ourselves with the fact that this map is not well defined if ω = −q, as the set of a_1, ..., a_d that result in this coincidence has measure zero.

The Jacobian of this change of variables is computed by a famous theorem of integral geometry due to Blaschke [1935] (for more modern treatments, see Miles [1971] or Santalo [1976, 12.24]), and actually depends only marginally on the coordinatizations of the hyperplanes.

THEOREM 2.27 (BLASCHKE). For variables b_1, ..., b_d taking values in IR^{d−1}, ω ∈ S^{d−1} and r ∈ IR, let

  (a_1, ..., a_d) = (R_ω b_1 + rω, ..., R_ω b_d + rω).

The Jacobian of this map is

  |det(∂(a_1, ..., a_d)/∂(ω, r, b_1, ..., b_d))| = (d − 1)! Vol(△(b_1, ..., b_d)).

That is,

  da_1 ··· da_d = (d − 1)! Vol(△(b_1, ..., b_d)) dω dr db_1 ··· db_d.

We will also find it useful to specify the plane by ω and s, where ⟨sq | ω⟩ = r, so that sq lies on the plane specified by ω and r. We will also arrange our coordinate system so that the origin on this plane lies at sq.

COROLLARY 2.28 (BLASCHKE WITH s). For variables b_1, ..., b_d taking values in IR^{d−1}, ω ∈ S^{d−1} and s ∈ IR, let

  (a_1, ..., a_d) = (R_ω b_1 + sq, ..., R_ω b_d + sq).

The Jacobian of this map is

  |det(∂(a_1, ..., a_d)/∂(ω, s, b_1, ..., b_d))| = (d − 1)! ⟨ω | q⟩ Vol(△(b_1, ..., b_d)).

PROOF. So that we can apply Theorem 2.27, we will decompose the map into three simpler maps:

  (b_1, ..., b_d, s, ω)
    ↦ (b_1 + R_ω^{−1}(sq − rω), ..., b_d + R_ω^{−1}(sq − rω), s, ω)
    ↦ (b_1 + R_ω^{−1}(sq − rω), ..., b_d + R_ω^{−1}(sq − rω), r, ω)
    ↦ (R_ω(b_1 + R_ω^{−1}(sq − rω)) + rω, ..., R_ω(b_d + R_ω^{−1}(sq − rω)) + rω)
    = (R_ω b_1 + sq, ..., R_ω b_d + sq).

As sq − rω is orthogonal to ω, R_ω^{−1}(sq − rω) can be interpreted as a vector in the (d − 1)-dimensional space in which b_1, ..., b_d lie. So, the first map is just a translation, and its Jacobian is 1. The Jacobian of the second map is

  ∂r/∂s = ⟨q | ω⟩.

Finally, we note

  Vol(△(b_1 + R_ω^{−1}(sq − rω), ..., b_d + R_ω^{−1}(sq − rω))) = Vol(△(b_1, ..., b_d)),

and that the third map is the one described in Theorem 2.27.

In Section 4.2, we will need to represent ω by c = ⟨ω | q⟩ and ψ ∈ S^{d−2}, where ψ gives the location of ω in the cross-section of S^{d−1} for which ⟨ω | q⟩ = c. Formally, the map can be defined in a coordinate system with first coordinate q by

  ω = (c, √(1 − c²) ψ).

For this change of variables, we have:

PROPOSITION 2.29 (LATITUDE AND LONGITUDE). The Jacobian of the change of variables from ω to (c, ψ) is

  |det(∂(ω)/∂(c, ψ))| = (1 − c²)^{(d−3)/2}.

PROOF. We begin by changing ω to (θ, ψ), where θ is the angle between ω and q, and ψ represents the position of ω in the (d − 2)-dimensional sphere of radius sin(θ) of points at angle θ to q. To compute the Jacobian of this change of variables, we choose a local coordinate system on S^{d−1} at ω by taking the great circle through ω and q, and then an arbitrary coordinatization of the great (d − 2)-dimensional sphere through ω orthogonal to the great circle. In this coordinate system, θ is the position of ω along the first great circle. As the (d − 2)-dimensional sphere of points at angle θ to q is orthogonal to the great circle at ω, the coordinates in ψ can be mapped orthogonally into the coordinates of the great (d − 2)-dimensional sphere—the only difference being the radii of the subspheres. Thus,

  |det(∂(ω)/∂(θ, ψ))| = sin(θ)^{d−2}.

FIG. 1. A shadow of a polytope.

If we now let c = cos(θ), then we find

  |det(∂(ω)/∂(c, ψ))| = |det(∂(ω)/∂(θ, ψ))| |det(∂(θ)/∂(c))|
                      = (√(1 − c²))^{d−2} (1/√(1 − c²))
                      = (√(1 − c²))^{d−3}.

3. The Shadow Vertex Method

In this section, we will review the shadow vertex method and formally state the two-phase method analyzed in this article. We will begin by motivating the method. In Section 3.1, we will explain how the method works assuming a feasible vertex is known. In Section 3.2, we present a polar perspective on the method, from which our analysis is most natural. We then present a complete two-phase method in Section 3.3. For a more complete exposition of the Shadow Vertex Method, we refer the reader to Borgwardt [1980, Chap. 1].

The shadow-vertex simplex method is motivated by the observation that the simplex method is very simple in two dimensions: the set of feasible points forms a (possibly open) polygon, and the simplex method merely walks along the exterior of the polygon. The shadow-vertex method lifts the simplicity of the simplex method in two dimensions to higher dimensions. Let z be the objective function of a linear program and let t be an objective function optimized by x, a vertex of the polytope of feasible points for the linear program. The shadow-vertex method considers the shadow of the polytope—the projection of the polytope onto the plane spanned by z and t (see Figure 1). One can verify that

(1) this shadow is a (possibly open) polygon,
(2) each vertex of the polygon is the image of a vertex of the polytope,

(3) each edge of the polygon is the image of an edge between two adjacent vertices of the polytope,
(4) the projection of x onto the plane is a vertex of the polygon, and
(5) the projection of the vertex optimizing z onto the plane is a vertex of the polygon.

Thus, if one walks along the vertices of the polygon starting from the image of x, and keeps track of the vertices' pre-images on the polytope, then one will eventually encounter the vertex of the polytope optimizing z. Given one vertex of the polytope that maps to a vertex of the polygon, it is easy to find the vertex of the polytope that maps to the next vertex of the polygon: fact (3) implies that it must be a neighbor of the vertex on the polytope; moreover, for a linear program that is in general position with respect to t, there will be d such vertices. Thus, the method will be efficient provided that the shadow polygon does not have too many vertices. This is the motivation for the shadow vertex method.
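To make the notion of a shadow concrete, the following brute-force sketch (ours; it is not the shadow-vertex algorithm itself) enumerates the vertices of a random polytope {x : ⟨a_i | x⟩ ≤ y_i}, projects them onto span(t, z), and counts the vertices of the resulting shadow polygon.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import ConvexHull

rng = np.random.default_rng(4)
n, d = 30, 4
A = rng.standard_normal((n, d))
y = np.ones(n)
z = rng.standard_normal(d)
t = rng.standard_normal(d)

# Enumerate polytope vertices by brute force over d-subsets of constraints.
vertices = []
for I in combinations(range(n), d):
    AI = A[list(I)]
    if abs(np.linalg.det(AI)) < 1e-9:
        continue
    x = np.linalg.solve(AI, y[list(I)])
    if np.all(A @ x <= y + 1e-9):
        vertices.append(x)
vertices = np.array(vertices)

# Project onto the plane spanned by t and z; the convex hull of the
# projected points is the shadow polygon (the polytope is bounded w.h.p.).
basis, _ = np.linalg.qr(np.column_stack([t, z]))
shadow = vertices @ basis
print("polytope vertices:", len(vertices),
      " shadow vertices:", len(ConvexHull(shadow).vertices))
```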

3.1. FORMAL DESCRIPTION. Our description of the shadow vertex simplex method will be facilitated by the following definition:

Definition 3.1 (optVert). Given vectors z, a_1, ..., a_n in IR^d and y ∈ IR^n, we define optVert_z(a_1, ..., a_n; y) to be the set of x solving

  maximize    z^T x
  subject to  ⟨a_i | x⟩ ≤ y_i, for 1 ≤ i ≤ n.

If there are no such x, either because the program is unbounded or infeasible, we let optVert_z(a_1, ..., a_n; y) be ∅. When a_1, ..., a_n and y are understood, we will use the notation optVert_z.

We note that, for linear programs in general position, optVert_z will either be empty or contain one vertex. Using this definition, we will give a description of the shadow vertex method assuming that a vertex x_0 and a vector t are known for which optVert_t = {x_0}. An algorithm that works without this assumption will be described in Section 3.3. Given t and z, we define objective functions interpolating between the two by

  q_λ = (1 − λ)t + λz.

The shadow-vertex method will proceed by varying λ from 0 to 1, and tracking optVert_{q_λ}. We will denote the vertices encountered by x_0, x_1, ..., x_k, and we will set λ_i so that x_i ∈ optVert_{q_λ} for λ ∈ [λ_i, λ_{i+1}]. As our main motivation for presenting the primal algorithm is to develop intuition in the reader, we will not dwell on issues of degeneracy in its description. We will present a polar version of this algorithm with a proof of correctness in the next section.

primal shadow-vertex method
Input: a_1, ..., a_n, y, z, and x_0 and t satisfying {x_0} = optVert_t(a_1, ..., a_n; y).

(1) Set λ_0 = 0, and i = 0.
(2) Set λ_1 to be maximal such that {x_0} = optVert_{q_λ} for λ ∈ [λ_0, λ_1].

FIG. 2. In example (a), optSimp = {{a_1, a_2, a_3}}. In example (b), optSimp = {{a_1, a_2, a_3}, {a_2, a_3, a_4}}. In example (c), optSimp = ∅.

(3) while λ_{i+1} < 1,
  (a) Set i = i + 1.
  (b) Find an x_i for which there exists a λ_{i+1} > λ_i such that x_i ∈ optVert_{q_λ} for λ ∈ [λ_i, λ_{i+1}]. If no such x_i exists, return unbounded.
  (c) Let λ_{i+1} be maximal such that x_i ∈ optVert_{q_λ} for λ ∈ [λ_i, λ_{i+1}].

(4) return x_i.

Step (b) of this algorithm deserves further explanation. Assuming that the linear program is in general position with respect to t, each vertex x_i will have exactly d neighbors, and the vertex x_{i+1} will be one of these [Borgwardt 1980, Lemma 1.3]. Thus, the algorithm can be described as a simplex method. While one could implement the method by examining these d vertices in turn, more efficient implementations are possible. For an efficient implementation of this algorithm in tableau form, we point the reader to the exposition in Borgwardt [1980, Section 1.3].

3.2. POLAR DESCRIPTION. Following Borgwardt [1980], we will analyze the shadow vertex method from a polar perspective. This polar perspective is natural provided that all y_i > 0. In this section, we will describe a polar variant of the shadow-vertex method that works under this assumption. In the next section, we will describe a two-phase shadow vertex method that uses this polar variant to solve linear programs with arbitrary y_i s.

While it is not strictly necessary for the results in this article, we remind the reader that the polar of a polytope P = {x : ⟨x | a_i⟩ ≤ 1, ∀i} is defined to be {y : ⟨x | y⟩ ≤ 1, ∀x ∈ P}. This equals ConvHull(0, a_1, ..., a_n). We remark that P is bounded if and only if 0 is in the interior of ConvHull(a_1, ..., a_n). The polar motivates:

Definition 3.2 (optSimp). For z and a_1, ..., a_n in IR^d and y ∈ IR^n, y_i > 0, we let optSimp_z(a_1, ..., a_n; y) denote the set of I ∈ ([n] choose d) such that A_I has full rank, △((a_i/y_i)_{i∈I}) is a facet of ConvHull(0, a_1/y_1, ..., a_n/y_n) and z ∈ Cone((a_i)_{i∈I}). When y is understood to be 1, we will write optSimp_z(a_1, ..., a_n). When a_1, ..., a_n and y are understood, we will use the notation optSimp_z.

We remark that for y, z and a_1, ..., a_n in general position, the set optSimp_z(a_1, ..., a_n; y) will be the empty set or contain just one set of indices I. For examples, see Figure 2. The following proposition follows from the duality theory of linear programming:

PROPOSITION 3.3 (DUALITY). For y_1, ..., y_n > 0, I ∈ optSimp_z(a_1/y_1, ..., a_n/y_n) if and only if there exists an x such that x ∈ optVert_z(a_1, ..., a_n; y) and ⟨x | a_i⟩ = y_i, for all i ∈ I.
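Proposition 3.3 is easy to observe numerically. The sketch below (ours; it assumes SciPy is available) solves a random linear program, reads off the d tight constraints I at the optimal vertex, and checks that z ∈ Cone(A_I) by nonnegative least squares.

```python
import numpy as np
from scipy.optimize import linprog, nnls

rng = np.random.default_rng(5)
n, d = 25, 3
A = rng.standard_normal((n, d))
y = np.ones(n)                       # y > 0, as the polar description requires
z = rng.standard_normal(d)

# Solve max z^T x s.t. Ax <= y (linprog minimizes, so negate z).
res = linprog(-z, A_ub=A, b_ub=y, bounds=[(None, None)] * d, method="highs")
x = res.x

# I = the d tight constraints at the optimal vertex.
I = np.argsort(y - A @ x)[:d]
coeffs, residual = nnls(A[I].T, z)   # z = sum_{i in I} coeffs_i * a_i, coeffs >= 0
print("tight set I:", sorted(I), " z in Cone(A_I)?", residual < 1e-8)
```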

We now state the polar shadow vertex method.

polar shadow-vertex method
Input:

—a_1, ..., a_n, z, and y_1, ..., y_n > 0,
—I ∈ ([n] choose d) and t satisfying I ∈ optSimp_t(a_1/y_1, ..., a_n/y_n).

(1) Set λ_0 = 0 and i = 0.

(2) Set λ_1 to be maximal such that for λ ∈ [λ_0, λ_1],

  I ∈ optSimp_{q_λ}(a_1/y_1, ..., a_n/y_n).

(3) while λ_{i+1} < 1,
  (a) Set i = i + 1.
  (b) Find a j and k for which there exists a λ_{i+1} > λ_i such that

  I ∪ {j} − {k} ∈ optSimp_{q_λ}(a_1/y_1, ..., a_n/y_n)

  for λ ∈ [λ_i, λ_{i+1}]. If no such j and k exist, return unbounded.
  (c) Set I = I ∪ {j} − {k}.
  (d) Let λ_{i+1} be maximal such that I ∈ optSimp_{q_λ}(a_1/y_1, ..., a_n/y_n) for λ ∈ [λ_i, λ_{i+1}].

(4) return I.

The x optimizing the linear program, namely optVert_z(a_1, ..., a_n; y), is given by the equations ⟨x | a_i⟩ = y_i, for i ∈ I.

Borgwardt [1980, Lemma 1.9] establishes that such j and k can be found in step (b) if there exists an ε > 0 for which optSimp_{q_{λ_i+ε}}(a_1/y_1, ..., a_n/y_n) ≠ ∅. That the algorithm may conclude that the program is unbounded if a j and k cannot be found in step (b) follows from:

PROPOSITION 3.4 (DETECTING UNBOUNDED PROGRAMS). If there is an i and an ε > 0 such that λ_i + ε < 1 and optSimp_{q_{λ_i+ε}}(a_1/y_1, ..., a_n/y_n) = ∅, then optSimp_z(a_1/y_1, ..., a_n/y_n) = ∅.

PROOF. The set optSimp_{q_{λ_i+ε}}(a_1/y_1, ..., a_n/y_n) is empty if and only if q_{λ_i+ε} ∉ Cone(a_1, ..., a_n). The proof now follows from the facts that Cone(a_1, ..., a_n) is a convex set and q_{λ_i+ε} is a positive multiple of a convex combination of t and z.

The running time of the shadow-vertex method is bounded by the number of vertices in the shadow of the polytope defined by the constraints of the linear program. Formally, this is

Definition 3.5 (Shadow). For independent vectors t and z, a_1, ..., a_n in IR^d and y ∈ IR^n, y > 0,

  Shadow_{t,z}(a_1, ..., a_n; y) = ∪_{q ∈ Span(t,z)} optSimp_q(a_1/y_1, ..., a_n/y_n).

If y is understood to be 1, we will just write Shadow_{t,z}(a_1, ..., a_n).

3.3. TWO-PHASE METHOD. We now describe a two-phase shadow vertex method that solves linear programs of the form

  maximize    ⟨z | x⟩
  subject to  ⟨a_i | x⟩ ≤ y_i, for 1 ≤ i ≤ n.                              (LP)

There are three issues that we must resolve before we can apply the polar shadow vertex method as described in Section 3.2 to the solution of such programs: (1) the method must know a feasible vertex of the linear program, (2) the linear program might not even be feasible, and (3) some y_i might be non-positive. The first two issues are standard motivations for two-phase methods, while the third is motivated by the polar perspective from which we prefer to analyze the shadow vertex method.

We resolve these issues in two stages. We first relax the constraints of LP to construct a linear program LP' such that (a) the right-hand vector of the linear program is positive, and (b) we know a feasible vertex of the linear program. After solving LP', we construct another linear program, LP+, in one higher dimension that interpolates between LP and LP'. LP+ has properties (a) and (b), and we can use the shadow vertex method on LP+ to transform a solution to LP' into a solution of LP.

Our two-phase method first chooses a d-set I to define the known feasible vertex of LP'. The linear program LP' is determined by A, z and the choice of I. However, the magnitude of the right-hand entries in LP' depends upon s_min(A_I). To reduce the chance that these entries will need to be large, we examine several randomly chosen d-sets, and use the one maximizing s_min. The algorithm then sets

  M = 2^{⌈lg(max_i ‖(y_i, a_i)‖)⌉+2},
  κ = 2^{⌊lg(s_min(A_I))⌋}, and
  y'_i = M for i ∈ I, and y'_i = √d M²/(4κ) otherwise.

These define the program LP':

  maximize    ⟨z | x⟩
  subject to  ⟨a_i | x⟩ ≤ y'_i, for 1 ≤ i ≤ n.                             (LP')

By Proposition 3.6, A_I is a feasible basis for LP' and optimizes any objective function of the form A_I^T λ (that is, ∑_{i∈I} λ_i a_i), for λ > 0. Our two-phase algorithm will solve LP' by starting the polar shadow-vertex algorithm at the basis I and the objective function A_I^T λ for a randomly chosen λ satisfying ∑_i λ_i = 1 and λ_i ≥ 1/d², for all i.

PROPOSITION 3.6 (INITIAL SIMPLEX OF LP'). For every λ > 0,

  I ∈ optSimp_{A_I^T λ}(a_1, ..., a_n; y').

PROOF. Let x' be the solution to the linear system

  ⟨a_i | x'⟩ = y'_i, for i ∈ I.

By Definition 2.3 and Proposition 2.4(a),

  ‖x'‖ ≤ ‖y'_I‖ ‖A_I^{−1}‖ ≤ M√d ‖A_I^{−1}‖ = M√d / s_min(A_I).

So, for all i ∉ I,

  ⟨a_i | x'⟩ ≤ (max_i ‖a_i‖) M√d / s_min(A_I) < √d M²/(4κ).

Thus, for all i ∉ I,

  ⟨a_i | x'⟩ < y'_i,

and, by Definition 3.2, I ∈ optSimp_{A_I^T λ}(a_1, ..., a_n; y').

We will now define a linear program LP+ that interpolates between LP' and LP. This linear program will contain an extra variable x_0 and constraints of the form

  ⟨a_i | x⟩ ≤ ((1 + x_0)/2) y_i + ((1 − x_0)/2) y'_i,

and −1 ≤ x_0 ≤ 1. So, for x_0 = 1, we see the original program LP, while for x_0 = −1 we get LP'. Formally, we let

  a_i^+ = ((y'_i − y_i)/2, a_i)   for 1 ≤ i ≤ n,
  a_0^+ = (1, 0, ..., 0),
  a_{−1}^+ = (−1, 0, ..., 0),

  y_i^+ = (y_i + y'_i)/2   for 1 ≤ i ≤ n,
  y_0^+ = 1,
  y_{−1}^+ = 1,

  z^+ = (1, 0, ..., 0),

and we define LP+ by

  maximize    ⟨z^+ | (x_0, x)⟩
  subject to  ⟨a_i^+ | (x_0, x)⟩ ≤ y_i^+, for −1 ≤ i ≤ n,                   (LP+)

and we set

  y^+ = (y_1^+, ..., y_n^+).

By Proposition 3.7, √d M/(4κ) ≥ 1, so y'_i ≥ M and y_i^+ > 0, for all i. If LP is infeasible, then the solution to LP+ will have x_0 < 1. If LP is feasible, then the solution to LP+ will have the form (1, x) where x is a feasible point for LP. If we use the shadow-vertex method to solve LP+ starting from the appropriate initial vector, then x will be an optimal solution to LP.

PROPOSITION 3.7 (RELATION OF M AND κ). For M and κ as set by the algorithm, √d M/(4κ) ≥ 1.

PROOF. By definition, $\kappa \le s_{\min}(A_I)$. On the other hand, $s_{\min}(A_I) \le \|A_I\| \le \sqrt{d}\max_i\|a_i\|$, by Proposition 2.4(d). Finally, $M \ge 4\max_i\|a_i\|$.

We now state and prove the correctness of the two-phase shadow vertex method.

two-phase shadow-vertex method
Input: $A = (a_1, \ldots, a_n)$, $y$, $z$.

(1) Let $\mathcal{I} = \{I_1, \ldots, I_{3nd\ln n}\}$ be a collection of randomly chosen sets in $\binom{[n]}{d}$, and let $I \in \mathcal{I}$ be the set maximizing $s_{\min}(A_I)$.
(2) Set $M = 2^{\lceil \lg(\max_i\|(y_i, a_i)\|)\rceil + 2}$ and $\kappa = 2^{\lfloor\lg(s_{\min}(A_I))\rfloor}$.
(3) Set $y'_i = \begin{cases} M & \text{for } i \in I,\\ \sqrt{d}M^2/4\kappa & \text{otherwise.}\end{cases}$
(4) Choose $\lambda$ uniformly at random from $\{\lambda : \sum_i\lambda_i = 1 \text{ and } \lambda_i \ge 1/d^2\}$. Set $t' = A_I\lambda$.
(5) Let $J$ be the output of the polar shadow vertex algorithm on LP′ on input $I$ and $t'$. If LP′ is unbounded, then return unbounded.
(6) Let $\alpha > 0$ be such that $\{-1\}\cup J \in \mathrm{optSimp}_{(-\alpha, z)}\left(a_1^+/y_1^+, \ldots, a_n^+/y_n^+\right)$.

(7) Let $K$ be the output of the polar shadow vertex algorithm on LP⁺ on input $\{-1\}\cup J$ and $(-\alpha, z)$.
(8) Compute $(x_0, x)$ satisfying $\langle (x_0, x)\mid a_i^+\rangle = y_i^+$ for $i \in K$.

(9) If x0 < 1, return infeasible. Otherwise, return x.
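To make the data produced in steps (1) through (4), and the LP⁺ interpolation, concrete, here is a minimal sketch in Python. It assumes numpy; the function name `setup_two_phase` and its return signature are ours, and the polar shadow-vertex pivoting itself is deliberately left out, so this illustrates the setup rather than the authors' implementation.

```python
import numpy as np

def setup_two_phase(A, y, rng=None):
    """Build the data for LP' and LP+ from steps (1)-(4); illustrative only.

    A : (d, n) array with columns a_1, ..., a_n;  y : (n,) right-hand sides.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, n = A.shape
    # (1) best-conditioned basis among 3*n*d*ln(n) random d-subsets
    best_I, best_smin = None, -np.inf
    for _ in range(int(3 * n * d * np.log(n))):
        I = rng.choice(n, size=d, replace=False)
        smin = np.linalg.svd(A[:, I], compute_uv=False)[-1]
        if smin > best_smin:
            best_I, best_smin = I, smin
    # (2) M rounded up, kappa rounded down, to powers of two
    M = 2.0 ** (np.ceil(np.log2(np.hypot(y, np.linalg.norm(A, axis=0)).max())) + 2)
    kappa = 2.0 ** np.floor(np.log2(best_smin))
    # (3) relaxed right-hand side y'
    y_prime = np.full(n, np.sqrt(d) * M ** 2 / (4 * kappa))
    y_prime[best_I] = M
    # (4) random objective A_I @ lam, with lam uniform on {lam_i >= 1/d^2, sum = 1}
    lam = 1 / d ** 2 + (1 - 1 / d) * rng.dirichlet(np.ones(d))
    t_prime = A[:, best_I] @ lam
    # LP+ data for 1 <= i <= n: the extra coordinate x_0 interpolates
    # between y (at x_0 = 1) and y' (at x_0 = -1)
    A_plus = np.vstack([(y_prime - y) / 2, A])   # columns a_i^+
    y_plus = (y_prime + y) / 2                   # entries y_i^+
    return best_I, y_prime, t_prime, A_plus, y_plus
```

The two pivoting passes of steps (5) and (7) would consume `t_prime` and `(A_plus, y_plus)` respectively; they are omitted here because the pivot rule is the subject of Section 3.2, not of this sketch.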

The following propositions prove the correctness of the algorithm.

PROPOSITION 3.8 (UNBOUNDED PROGRAMS). The following are equivalent:

(a) LP is unbounded;
(b) LP′ is unbounded;
(c) there exists a $1 \gg \gamma > 0$ such that $\mathrm{optSimp}_{\gamma(1,\mathbf0)+(1-\gamma)(-\alpha,z)}(a_1^+, \ldots, a_n^+;\, y^+)$ is empty;
(d) for all $1 \gg \gamma > 0$, $\mathrm{optSimp}_{\gamma(1,\mathbf0)+(1-\gamma)(-\alpha,z)}(a_1^+, \ldots, a_n^+;\, y^+)$ is empty.

PROPOSITION 3.9 (BOUNDED PROGRAMS). If LP′ is bounded and has solution $J$, then
(a) there exists $\alpha_0$ such that $\forall\alpha>\alpha_0$, $\{-1\}\cup J \in \mathrm{optSimp}_{(-\alpha,z)}(a_1^+, \ldots, a_n^+;\, y^+)$;
(b) if LP is feasible, then for $K \in \mathrm{optSimp}_z(a_1, \ldots, a_n;\, y)$, there exists $\alpha_0$ such that $\forall\alpha>\alpha_0$, $\{0\}\cup K \in \mathrm{optSimp}_{(\alpha,z)}(a_1^+, \ldots, a_n^+;\, y^+)$; and,
(c) if we use the shadow vertex method to solve LP⁺ starting from $\{-1\}\cup J$ and objective function $(-\alpha, z)$, then the output of the algorithm will have form $\{0\}\cup K'$, where $K'$ is a solution to LP.

PROOF OF PROPOSITION 3.8. LP is unbounded if and only if there exists a vector $v$ such that $\langle z\mid v\rangle > 0$ and $\langle a_i\mid v\rangle \le 0$ for all $i$. The same holds for LP′, and this establishes the equivalence of (a) and (b). To show that (a) or (b) implies (d), observe
$$\langle \gamma(1,\mathbf0) + (1-\gamma)(-\alpha, z) \mid (0, v)\rangle = (1-\gamma)\langle z\mid v\rangle > 0, \tag{8}$$
$$\langle a_i^+\mid (0, v)\rangle = \langle a_i\mid v\rangle \le 0, \quad\text{for } i = 1, \ldots, n, \tag{9}$$
$$\langle a_0^+\mid (0, v)\rangle = 0, \qquad\text{and}\qquad \langle a_{-1}^+\mid (0, v)\rangle = 0.$$
To show that (c) implies (a) and (b), note that $a_0^+$ and $a_{-1}^+$ are arranged so that if for some $v$ we have
$$\langle a_i^+\mid (v_0, v)\rangle \le 0, \quad\text{for } -1 \le i \le n,$$
then $v_0 = 0$. This identity allows us to apply (8) and (9) to show that (c) implies (a) and (b).

PROOF OF PROPOSITION 3.9. Let $J$ be the solution to LP′ and let $x' = A_J^{-1} y'_J$ be the corresponding vertex. We then have
$$\langle x'\mid a_i\rangle = y'_i, \quad\text{for } i \in J, \qquad\text{and}\qquad \langle x'\mid a_i\rangle \le y'_i, \quad\text{for } i \notin J.$$
Therefore, it is clear that
$$\langle (-1, x')\mid a_i^+\rangle = y_i^+, \quad\text{for } i \in \{-1\}\cup J, \qquad\text{and}\qquad \langle (-1, x')\mid a_i^+\rangle \le y_i^+, \quad\text{for } i \notin \{-1\}\cup J.$$
Thus, $\triangle\left(a_{-1}^+, (a_i^+)_{i\in J}\right)$ is a facet of LP⁺. To see that there exists an $\alpha_0$ such that it optimizes $(-\alpha, z)$ for all $\alpha > \alpha_0$, first observe that there exist $\lambda_i > 0$, for $i \in J$, such that $\sum_{i\in J}\lambda_i a_i = z$. Now, let $(\zeta_0, z) = \sum_{i\in J}\lambda_i a_i^+$. For $\alpha > -\zeta_0$, we have
$$(-\alpha, z) = (\alpha + \zeta_0)\, a_{-1}^+ + \sum_{i\in J}\lambda_i a_i^+,$$
which proves $(-\alpha, z) \in \mathrm{Cone}\left(a_{-1}^+, (a_i^+)_{i\in J}\right)$ and completes the proof of (a). The proof of (b) is similar.

To prove part (c), let $K$ be as in step (7). Then, there exists a $\gamma_K$ such that for all $\gamma \in (\gamma_K, 1)$,
$$K = \mathrm{optSimp}_{(1-\gamma)(-\alpha,z)+\gamma z^+}\left(a_1^+, \ldots, a_n^+;\, y^+\right).$$
Let $(x_0, x)$ satisfy $\langle (x_0, x)\mid a_i^+\rangle = y_i^+$, for $i \in K$. Then, by Proposition 3.3,
$$(x_0, x) = \mathrm{optVert}_{(1-\gamma)(-\alpha,z)+\gamma z^+}\left(a_1^+, \ldots, a_n^+;\, y^+\right).$$
If $x_0 < 1$, then LP was infeasible. Otherwise, let $x^* = \mathrm{optVert}_z(a_1, \ldots, a_n;\, y)$. By part (b), there exists $\alpha_0$ such that for all $\alpha' > \alpha_0$,
$$(1, x^*) = \mathrm{optVert}_{(\alpha', z)}\left(a_1^+, \ldots, a_n^+;\, y^+\right).$$
For $\alpha' = -\alpha + \gamma/(1-\gamma)$, we have
$$(\alpha', z) = \frac{1}{1-\gamma}\left((1-\gamma)(-\alpha, z) + \gamma z^+\right).$$

So, as $\gamma$ approaches 1, $\alpha' = -\alpha + \gamma/(1-\gamma)$ goes to infinity and we have
$$\mathrm{optVert}_{(1-\gamma)(-\alpha,z)+\gamma z^+}\left(a_1^+, \ldots, a_n^+;\, y^+\right) = \mathrm{optVert}_{(\alpha',z)}\left(a_1^+, \ldots, a_n^+;\, y^+\right),$$
which implies $(x_0, x) = (1, x^*)$.

Finally, we bound the number of steps taken in step (7) by the shadow size of a related polytope:

LEMMA 3.10 (SHADOW PATH OF LP⁺). For $a_1^+, \ldots, a_n^+$ and $y_1^+, \ldots, y_n^+$ as defined in LP⁺, if $\{-1\}\cup J = \mathrm{optSimp}_{(-\alpha,z)}\left(a_1^+/y_1^+, \ldots, a_n^+/y_n^+\right)$ for $\alpha > 0$, then the number of simplex steps made by the polar shadow vertex algorithm while solving LP⁺ from initial basis $\{-1\}\cup J$ and vector $(-\alpha, z)$ is at most
$$2 + \left|\mathrm{Shadow}_{(0,z),\, z^+}\left(a_1^+/y_1^+, \ldots, a_n^+/y_n^+\right)\right|.$$

PROOF. We will establish that $\{-1\}$ belongs to the current basis for the first step only. One can similarly prove that $\{0\}$ belongs to the basis only at termination.

Let $I \in \mathrm{optSimp}_{q_\gamma}\left(a_1^+/y_1^+, \ldots, a_n^+/y_n^+\right)$ have form $\{-1\}\cup L$, where $q_\gamma = (1-\gamma)(-\alpha,z)+\gamma z^+$. As $q_0 = (q_\gamma + \gamma a_{-1}^+)/(1-\gamma)$, as $a_{-1}^+ \in \mathrm{Cone}(A_{\{-1\}\cup L})$, and as $\mathrm{Cone}(A_{\{-1\}\cup L})$ is a convex set, we have $q_{\gamma'} \in \mathrm{Cone}(A_{\{-1\}\cup L})$ for all $\gamma' \le \gamma$. As $[\gamma_i, \gamma_{i+1}]$ is exactly the set of $\gamma$ optimized by $\triangle(A_I)$ in the $i$th step of the polar shadow vertex method, $I$ must be the initial set.

3.4. DISCUSSION. We note that our analysis of the two-phase algorithm actually takes advantage of the fact that $\kappa$ and $M$ have been set to powers of two. In particular, this fact will be used to show that there are not too many likely choices for $\kappa$ and $M$. For the reader who would like to drop this condition, we briefly explain how the argument of Section 5 could be modified to compensate: first, we could consider setting $\kappa$ and $M$ to powers of $1 + 1/\mathrm{poly}(n, d, 1/\sigma)$. This would still result in a polynomially bounded number of choices for $\kappa$ and $M$. One could then drop this assumption by observing that allowing $\kappa$ and $M$ to vary in a small range would not introduce too much dependency between the variables.
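The value of rounding to powers of two is that a quantity confined to a range of polynomially bounded ratio can then take only logarithmically many values. A tiny illustration (plain Python; the helper name is ours):

```python
import math

def power_of_two_choices(lo, hi):
    """All values 2^floor(lg x) can take as x ranges over [lo, hi]."""
    return [2.0 ** k for k in range(math.floor(math.log2(lo)),
                                    math.floor(math.log2(hi)) + 1)]

# A quantity confined to a range of ratio 10^15 has only ~lg(10^15) rounded values.
print(len(power_of_two_choices(1e-9, 1e6)))   # prints 50
```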

4. Shadow Size

In this section, we bound the expected size of the shadow of the perturbation of a polytope onto a fixed plane. This is the main geometric result of this article. The algorithmic results of this article will rely on extensions of this theorem derived in Section 4.3.

THEOREM 4.1 (SHADOW SIZE). Let $d \ge 3$ and $n > d$. Let $z$ and $t$ be independent vectors in $\mathbb{R}^d$, and let $\mu_1, \ldots, \mu_n$ be Gaussian distributions in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points each of norm at most 1. Then,
$$\mathop{\mathbf{E}}_{a_1,\ldots,a_n}\left[\left|\mathrm{Shadow}_{t,z}(a_1, \ldots, a_n)\right|\right] \le \mathcal{D}(n, d, \sigma), \tag{10}$$
where
$$\mathcal{D}(n, d, \sigma) = \frac{58{,}888{,}678\; n d^3}{\min\left(\sigma,\ 1/(3\sqrt{d \ln n})\right)^6},$$
and $a_1, \ldots, a_n$ have density $\prod_{i=1}^n \mu_i(a_i)$.

The proof of Theorem 4.1 will use the following definitions.
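Before turning to those definitions, readers who want a feel for the magnitude of this bound can evaluate it directly. A one-line helper (a sketch assuming numpy; the name `shadow_bound_D` is ours):

```python
import numpy as np

def shadow_bound_D(n, d, sigma):
    """Evaluate the shadow-size bound D(n, d, sigma) of Theorem 4.1."""
    return 58_888_678 * n * d**3 / min(sigma, 1 / (3 * np.sqrt(d * np.log(n)))) ** 6

# The constants are large, but the point is that the bound grows only
# polynomially in n, d and 1/sigma.
print(shadow_bound_D(n=100, d=5, sigma=0.1))
```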

Definition 4.2 (ang). For a vector q and a set S, we define

$$\mathrm{ang}(q, S) = \min_{x\in S}\ \mathrm{angle}(q, x).$$
If $S$ is empty, we set $\mathrm{ang}(q, \emptyset) = \infty$.

Definition 4.3 ($\mathrm{ang}_q$). For a vector $q$ and points $a_1, \ldots, a_n$ in $\mathbb{R}^d$, we define
$$\mathrm{ang}_q(a_1, \ldots, a_n) = \mathrm{ang}\left(q,\ \partial\triangle\left(\mathrm{optSimp}_q(a_1, \ldots, a_n)\right)\right),$$
where $\partial\triangle(\mathrm{optSimp}_q(a_1, \ldots, a_n))$ is the boundary of $\triangle(\mathrm{optSimp}_q(a_1, \ldots, a_n))$.

These definitions are arranged so that if the ray through $q$ does not pierce the convex hull of $a_1, \ldots, a_n$, then $\mathrm{ang}_q(a_1, \ldots, a_n) = \infty$.

In our proofs, we will make frequent use of the fact that it is very unlikely that a Gaussian random variable is far from its mean. To capture this fact, we define:

Definition 4.4 ($\mathcal P$). $\mathcal P$ is the set of $(a_1, \ldots, a_n)$ for which $\|a_i\| \le 2$, for all $i$.

Applying a union bound to Corollary 2.19, we obtain

PROPOSITION 4.5 (MEASURE OF P).

$$\Pr\left[(a_1, \ldots, a_n)\in\mathcal P\right] \ge 1 - n\left(n^{-2.9d}\right) = 1 - n^{-2.9d+1}.$$

PROOF OF THEOREM 4.1. We first observe that we can assume $\sigma \le 1/(3\sqrt{d\ln n})$: if $\sigma > 1/(3\sqrt{d\ln n})$, then we can scale down all the data until $\sigma = 1/(3\sqrt{d\ln n})$. As this could only decrease the norms of the centers of the distributions, the theorem statement would be unaffected.

Assume without loss of generality that $z$ and $t$ are orthogonal. Let

$$q_\theta = z\sin(\theta) + t\cos(\theta). \tag{11}$$

We discretize the problem by using the intuitively obvious fact, which we prove as Lemma 4.6, that the left-hand side of (10) equals
$$\lim_{m\to\infty}\ \mathop{\mathbf E}_{a_1,\ldots,a_n}\left[\,\left|\bigcup_{\theta\in\left\{\frac{2\pi}{m},\frac{2\cdot2\pi}{m},\ldots,\frac{m\cdot2\pi}{m}\right\}}\mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)\right|\,\right].$$

Let $E_i$ denote the event
$$\left[\mathrm{optSimp}_{q_{2\pi i/m}}(a_1, \ldots, a_n) \ne \mathrm{optSimp}_{q_{2\pi((i+1)\bmod m)/m}}(a_1, \ldots, a_n)\right].$$

Then, for any $m \ge 2$ and for all $a_1, \ldots, a_n$,
$$\left|\bigcup_{\theta\in\left\{\frac{2\pi}{m},\frac{2\cdot2\pi}{m},\ldots,\frac{m\cdot2\pi}{m}\right\}}\mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)\right| \le \sum_{i=1}^m E_i(a_1, \ldots, a_n).$$

We bound this sum by
$$\mathbf E\left[\sum_{i=1}^m E_i\right] = \mathop{\mathbf E}_{\mathcal P}\left[\sum_i E_i\right]\Pr[\mathcal P] + \mathop{\mathbf E}_{\bar{\mathcal P}}\left[\sum_i E_i\right]\Pr[\bar{\mathcal P}]
\le \mathop{\mathbf E}_{\mathcal P}\left[\sum_i E_i\right] + \binom nd n^{-2.9d+1}
\le \mathop{\mathbf E}_{\mathcal P}\left[\sum_i E_i\right] + 1.$$
Thus, we will focus on bounding $\mathbf E_{\mathcal P}\left[\sum_i E_i\right]$. Observing that $E_i$ implies $\left[\mathrm{ang}_{q_{2\pi i/m}}(a_1, \ldots, a_n) \le 2\pi/m\right]$, and applying linearity of expectation, we obtain
$$\mathop{\mathbf E}_{\mathcal P}\left[\sum_i E_i\right] = \sum_{i=1}^m \Pr_{\mathcal P}\left[E_i\right]
\le \sum_{i=1}^m \Pr_{\mathcal P}\left[\mathrm{ang}_{q_{2\pi i/m}}(a_1, \ldots, a_n) < \frac{2\pi}{m}\right]
\le m\cdot\frac{9{,}372{,}424\; nd^3}{\sigma^6}\cdot\frac{2\pi}{m}, \ \text{by Lemma 4.7,}
\le \frac{58{,}888{,}677\; nd^3}{\sigma^6}.$$
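The discretization used in this proof is easy to simulate on small instances: sample directions $q_\theta$ on a grid and count how many distinct optimizers appear. The sketch below (assuming numpy and SciPy; the helper name is ours) counts distinct optimal vertices of $\max\,\langle q_\theta\mid x\rangle$ subject to $A^T x \le y$, which approximates the shadow size from below once the grid is fine enough.

```python
import numpy as np
from scipy.optimize import linprog

def estimate_shadow_size(A, y, z, t, m=2000, tol=1e-7):
    """Count distinct optimizers of q_theta = z*sin + t*cos over a theta-grid."""
    d = A.shape[0]
    vertices = set()
    for theta in np.linspace(0.0, 2 * np.pi, m, endpoint=False):
        q = z * np.sin(theta) + t * np.cos(theta)
        # maximize <q, x> subject to A^T x <= y  (linprog minimizes, so negate)
        res = linprog(-q, A_ub=A.T, b_ub=y, bounds=[(None, None)] * d, method="highs")
        if res.status == 0:  # feasible and bounded in this direction
            vertices.add(tuple(np.round(res.x / tol) * tol))
    return len(vertices)

# Example: a perturbed cube-like system in d = 3 with n = 12 constraints.
rng = np.random.default_rng(1)
d, n, sigma = 3, 12, 0.1
A = rng.choice([-1.0, 1.0], size=(d, n)) + sigma * rng.standard_normal((d, n))
y = np.ones(n)
print(estimate_shadow_size(A, y, z=np.eye(d)[0], t=np.eye(d)[1]))
```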

LEMMA 4.6 (DISCRETIZATION IN LIMIT). Let $z$ and $t$ be orthogonal vectors in $\mathbb{R}^d$, and let $\mu_1, \ldots, \mu_n$ be nondegenerate Gaussian distributions. Then,
$$\mathop{\mathbf E}_{a_1,\ldots,a_n}\left[\,\left|\bigcup_{q\in\mathrm{Span}(z,t)}\mathrm{optSimp}_q(a_1, \ldots, a_n)\right|\,\right]
= \lim_{m\to\infty}\mathop{\mathbf E}_{a_1,\ldots,a_n}\left[\,\left|\bigcup_{\theta\in\left\{\frac{2\pi}{m},\frac{2\cdot2\pi}{m},\ldots,\frac{m\cdot2\pi}{m}\right\}}\mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)\right|\,\right], \tag{12}$$
where $q_\theta$ is as defined in (11).

PROOF. For a set $I \in \binom{[n]}{d}$, let
$$F_I(a_1, \ldots, a_n) = \int_\theta \left[\mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n) = I\right]\, d\theta.$$
The left- and right-hand sides of (12) can differ only if there exists a $\delta > 0$ such that for all $\epsilon > 0$,
$$\Pr_{a_1,\ldots,a_n}\left[\exists I:\ I = \mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)\ \text{for some}\ \theta,\ \text{and}\ F_I(a_1, \ldots, a_n) < \epsilon\right] \ge \delta.$$

As there are only finitely many choices for $I$, this would imply the existence of a $\delta' > 0$ and a particular $I$ such that for all $\epsilon > 0$,
$$\Pr_{a_1,\ldots,a_n}\left[I = \mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)\ \text{for some}\ \theta,\ \text{and}\ F_I(a_1, \ldots, a_n) < \epsilon\right] \ge \delta'.$$
As $F_I(a_1, \ldots, a_n) = F_I(A_I)$ given that $I = \mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)$ for some $\theta$, this implies that for all $\epsilon > 0$,
$$\Pr_{a_1,\ldots,a_n}\left[I = \mathrm{optSimp}_{q_\theta}(A_I)\ \text{for some}\ \theta,\ \text{and}\ F_I(A_I) < \epsilon\right] \ge \delta'. \tag{13}$$
Note that $I = \mathrm{optSimp}_{q_\theta}(A_I)$ if and only if $q_\theta \in \mathrm{Cone}(A_I)$. Now, let
$$G(A_I) = \int_\theta \left[q_\theta \in \mathrm{Cone}(A_I)\right]\left(\mathrm{ang}\left(q_\theta, \partial\triangle(A_I)\right)/\pi\right) d\theta.$$
As $G(A_I) \le F_I(A_I)$, (13) implies that for all $\epsilon > 0$,
$$\Pr_{a_1,\ldots,a_n}\left[I = \mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)\ \text{for some}\ \theta,\ \text{and}\ G(A_I) < \epsilon\right] \ge \delta'.$$
However, $G$ is a continuous function, and therefore measurable, so this would imply
$$\Pr_{a_1,\ldots,a_n}\left[I = \mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n)\ \text{for some}\ \theta,\ \text{and}\ G(A_I) = 0\right] \ge \delta',$$
which is clearly false, as the set of $A_I$ satisfying
—$G(A_I) = 0$, and
—$\exists\theta:\ \mathrm{optSimp}_{q_\theta}(a_1, \ldots, a_n) = I$
has codimension 1, and so has measure zero under the product distribution of nondegenerate Gaussians.

LEMMA 4.7 (ANGLE BOUND). Let $d \ge 3$ and $n > d$. Let $q$ be any unit vector and let $\mu_1, \ldots, \mu_n$ be Gaussian measures in $\mathbb{R}^d$ of standard deviation $\sigma \le 1/(3\sqrt{d\ln n})$ centered at points of norm at most 1. Then,
$$\Pr_{\mathcal P}\left[\mathrm{ang}_q(a_1, \ldots, a_n) < \epsilon\right] \le \frac{9{,}372{,}424\; nd^3\epsilon}{\sigma^6},$$
where $a_1, \ldots, a_n$ have density $\prod_{i=1}^n\mu_i(a_i)$.

The proof will make use of the following definition:

Definition 4.8 ($\mathcal P_I^j$). For $I \in \binom{[n]}{d}$ and $j \in I$, we define $\mathcal P_I^j$ to be the set of $(a_1, \ldots, a_n)$ satisfying
(1) For all $q$, if $\mathrm{optSimp}_q(a_1, \ldots, a_n) \ne \emptyset$, then $s \le 2$, where $s$ is the real number for which $sq \in \triangle\left(\mathrm{optSimp}_q(a_1, \ldots, a_n)\right)$,
(2) $\mathrm{dist}(a_i, a_k) \le 4$, for $i, k \in I\setminus\{j\}$,
(3) $\mathrm{dist}\left(a_j, \mathrm{Aff}(A_{I\setminus\{j\}})\right) \le 4$, and

(4) $\mathrm{dist}\left(a_j^\perp, a_i\right) \le 4$, for all $i \in I\setminus\{j\}$, where $a_j^\perp$ is the orthogonal projection of $a_j$ onto $\mathrm{Aff}(A_{I\setminus\{j\}})$.

PROPOSITION 4.9 ($\mathcal P \subseteq \mathcal P_I^j$). For all $j \in I$, $\mathcal P \subseteq \mathcal P_I^j$.

PROOF. Parts (2), (3), and (4) follow immediately from the restrictions ka i k 2. To see why part (1) is true, note that sq lies in the convex hull of a 1,...,an, and so its norm, s, can be at most maxi ka i k 2, for (a 1,...,an)∈ P. PROOF OF LEMMA 4.7. Applying a union bound twice, we write ,..., < Pr [angq (a 1 an) ] P X optSimp (a ,...,a )= I and Pr q 1 n P ang(q,∂ 4(AI))< I XXd optSimp (a ,...,a )= I and Pr q 1 n P ang(q, 4 AI { j} ) < I j=1 XXd . optSimp (a ,...,a )= I and Pr q 1 n Pr [P] j ang(q, 4 AI { j} ) < j I j=1 PI PI (by Proposition 2.8) X Xd . optSimp (a ,...,a )= I and Pr q 1 n Pr [P] j ang(q, 4 AI { j} ) < I j=1 PI (by P P j ) I X Xd ,..., = 1 optSimpq(a 1 an) I and . + Pr 1 n 2 9d 1 j ang(q, 4 AI { j} ) < I j=1 PI (by Proposition 4.5) Xd X ,..., = 1 optSimp q (a 1 an) I and , . + Pr 1 n 2 9d 1 j ang q, 4 AI { j} <, j=1 I PI by changing the order of summation. We now expand the inner summation using Bayes’ rule to get X ,..., = optSimp q (a 1 an) I and Pr , 4 < P j ang q AI { j} I I X = ,..., = Pr [optSimpq (a 1 an) I] P j I I ang q, 4 A { } < Pr I j (14) j optSimp (a 1,...,an)= I PI q ,..., As optSimpq (a 1 an)is a set of size zero or one with probability 1, X ,..., = Pr [optSimpq (a 1 an) I] 1; I 418 D. A. SPIELMAN AND S.-H. TENG from which we derive X ,..., = Pr [optSimpq (a 1 an) I] P j I XI ,..., = j Pr [optSimpq (a 1 an) I] Pr PI I (by Proposition 2.8) 1 X Pr [optSimp (a ,...,a )= I] 2.9d+1 q 1 n 1 n I j (by P PI and Proposition 4.5)

1 . 1 n2.9d+1 So, , 4 < 1 . ang q AI { j} . (14) . + max Pr 1 n 2 9d 1 I j optSimp (a 1,...,an)= I PI q Plugging this bound in to the first inequality derived in the proof, we obtain the bound of ,..., < Pr [angq (a 1 an) ] P , 4 < d ang q AI { j} . + max Pr (1 n 2 9d 1)2 j,I j optSimp (a 1,...,an)= I PI q 9,372, 424 nd3 d , by Lemma 4.11, d 3 and n d + 1, 6 , , 3 = 9 372 424 nd . 6 d 1 Definition 4.10 (Q). We define Q to be the set of (b1,...,bd) ∈ IR satis- fying

(1) $\mathrm{dist}\left(b_1, \mathrm{Aff}(b_2, \ldots, b_d)\right) \le 4$,
(2) $\mathrm{dist}(b_i, b_j) \le 4$ for all $i, j \ge 2$,
(3) $\mathrm{dist}\left(b_1^\perp, b_i\right) \le 4$ for all $i \ge 2$, where $b_1^\perp$ is the orthogonal projection of $b_1$ onto $\mathrm{Aff}(b_2, \ldots, b_d)$, and
(4) $\mathbf 0 \in \triangle(b_1, \ldots, b_d)$.

LEMMA 4.11 (ANGLE BOUND GIVEN OPTSIMP). Let $\mu_1, \ldots, \mu_n$ be Gaussian measures in $\mathbb{R}^d$ of standard deviation $\sigma \le 1/(3\sqrt{d\ln n})$ centered at points of norm at most 1. Then
$$\Pr_{\mathcal P_{1,\ldots,d}^1}\left[\mathrm{ang}\left(q, \triangle(a_2, \ldots, a_d)\right) < \epsilon \;\middle|\; \mathrm{optSimp}_q(a_1, \ldots, a_n) = \{1, \ldots, d\}\right] \le \frac{9{,}371{,}990\; nd^2\epsilon}{\sigma^6}, \tag{15}$$
where $a_1, \ldots, a_n$ have density $\prod_{i=1}^n\mu_i(a_i)$.

PROOF. We begin by making the change of variables from a 1,...,ad to ω, s, b1,...,bd described in Corollary 2.28, and we recall that the Jacobian of this change of variables is

(d 1)! hω | qi Vol (4 (b 1 ,...,bd)) .

As this change of variables is arranged so that sq ∈4(a1,...,ad)if and only if ∈4 ,..., ,..., ={ ,... , } 0 (b1 bd), the condition that optSimpq (a 1 an) 1 d can be expressed as Y [0 ∈4(b1,...,bd)] [hω |ajihω|sqi]. j>d

Let x be any point on 4 (a 2,...,ad). Given that sq ∈4(a1,...,ad), conditions 1 (3) and (4) for membership in P1,... ,d imply that q √ , , , ,..., 2 + ⊥, 2 , dist (sq x) dist (a 1 x) dist (a 1 Aff (a 2 ad)) dist a 1 x 4 2 ⊥ ,..., where a 1 is the orthogonal projection of a 1 onto Aff (a 2 ad). So, Lemma 4.12 implies

dist (sq, Aff (a 2,...,ad)) hω | qi ang (q, 4 (a 2,...,ad)) √ 2 + 4 2 , ,..., h | i = dist (0 Aff (b2 √ bd)) ω q . 2 + 4 2 ,..., ∈ 1 Finally, observe that (a 1 ad) P1,...,d is equivalent to the conditions s 2 ,..., ∈ ,..., = { ,..., } and (b1 bd) Q, given that optSimpq (a 1 ad) 1 d .Now,the left-hand side of (15) can be bounded by dist (0, Aff (b ,...,b )) hω | qi Pr 2 √ d < , (16) ω,s2 2 + 4 2 (b1,... ,bd )∈Q where the variables have density proportional to

hω | qi Vol (4 (b 1 ,...,bd)) Z ! Y Yd [hω | a j ishω|qi]j(aj)daj i(Rωbi +sq). j>d a j i=1 As Lemma 4.13 implies 900 e2/3d2 Pr [dist (0, Aff (b2,...,bd)) <] , ω,s2 4 (b1,... ,bd )∈Q 420 D. A. SPIELMAN AND S.-H. TENG and Lemma 4.16 implies 340n 2 max Pr h | i < < , [ ω q ] 2 s2,b1,... ,bd ∈Q ω we can apply Lemma 2.11 to prove √ 900 e2/3d2 340n 9, 371, 990 nd2 (16) 4 (2 + 4 2) . 4 2 6

LEMMA 4.12 (DIVISION INTO DISTANCE AND ANGLE). Let $x$ be a vector, let $0 < s \le 2$, and let $q$ and $\omega$ be unit vectors satisfying
(a) $\langle\omega\mid x - sq\rangle = 0$, and
(b) $\mathrm{dist}(x, sq) \le 4\sqrt2$.
Then,
$$\mathrm{angle}(q, x) \ge \frac{\mathrm{dist}(x, sq)\,\langle\omega\mid q\rangle}{2+4\sqrt2}.$$

PROOF. Let $r = x - sq$. Then, (a) implies
$$\langle\omega\mid q\rangle^2 + \left(\frac{\langle r\mid q\rangle}{\|r\|}\right)^2 \le \|q\| = 1;$$
so,
$$\langle r\mid q\rangle \le \sqrt{1-\langle\omega\mid q\rangle^2}\,\|r\|.$$
Let $h$ be the distance from $x$ to the ray through $q$. Then,
$$h^2 + \langle r\mid q\rangle^2 = \|r\|^2;$$
so,
$$h \ge \langle\omega\mid q\rangle\|r\| = \langle\omega\mid q\rangle\,\mathrm{dist}(x, sq).$$
Now,
$$\mathrm{angle}(q, x) \ge \sin(\mathrm{angle}(q, x)) = \frac{h}{\|x\|} \ge \frac{h}{s + \mathrm{dist}(x, sq)} \ge \frac{\langle\omega\mid q\rangle\,\mathrm{dist}(x, sq)}{2+4\sqrt2}.$$

4.1. DISTANCE. The goal of this section is to prove it is unlikely that $\mathbf 0$ is near $\partial\triangle(b_1, \ldots, b_d)$.

LEMMA 4.13 (DISTANCE BOUND). Let $q$ be a unit vector and let $\mu_1, \ldots, \mu_n$ be Gaussian measures in $\mathbb{R}^d$ of standard deviation $\sigma \le 1/(3\sqrt{d\ln n})$ centered at points of norm at most 1. Then,
$$\Pr_{\substack{\omega,\ s\le2\\ (b_1,\ldots,b_d)\in Q}}\left[\mathrm{dist}\left(\mathbf 0, \mathrm{Aff}(b_2, \ldots, b_d)\right) < \epsilon\right] \le \frac{900\,e^{2/3}d^2\epsilon}{\sigma^4}, \tag{17}$$
where the variables have density proportional to
$$\langle\omega\mid q\rangle\,\mathrm{Vol}\left(\triangle(b_1, \ldots, b_d)\right)\prod_{j>d}\left(\int_{a_j}\left[\langle\omega\mid a_j\rangle\le s\langle\omega\mid q\rangle\right]\mu_j(a_j)\,da_j\right)\prod_{i=1}^d\mu_i(R_\omega b_i + sq).$$

PROOF. Note that if we fix $\omega$ and $s$, then the first and third terms in the density become constant. For any fixed plane specified by $(\omega, s)$, Proposition 2.16 tells us that the induced density on $b_i$ remains a Gaussian of standard deviation $\sigma$ and is centered at the projection of the center of $\mu_i$ onto the plane. As the origin of this plane is the point $sq$, and $s \le 2$, these induced Gaussians have centers of norm at most 3. Thus, we can use Lemma 4.14 to bound the left-hand side of (17) by
$$\max_{\omega,\ s\le2}\ \Pr_{(b_1,\ldots,b_d)\in Q}\left[\mathrm{dist}\left(\mathbf 0, \mathrm{Aff}(b_2, \ldots, b_d)\right) < \epsilon\right] \le \frac{900\,e^{2/3}d^2\epsilon}{\sigma^4}.$$

LEMMA 4.14 (DISTANCE BOUND IN PLANE). Let $\mu_1, \ldots, \mu_d$ be Gaussian measures in $\mathbb{R}^{d-1}$ of standard deviation $\sigma \le 1/(3\sqrt{d\ln n})$ centered at points of norm at most 3. Then
$$\Pr_{(b_1,\ldots,b_d)\in Q}\left[\mathrm{dist}\left(\mathbf 0, \mathrm{Aff}(b_2, \ldots, b_d)\right) < \epsilon\right] \le \frac{900\,e^{2/3}d^2\epsilon}{\sigma^4}, \tag{18}$$
where $b_1, \ldots, b_d$ have density proportional to
$$\mathrm{Vol}\left(\triangle(b_1, \ldots, b_d)\right)\prod_{i=1}^d\mu_i(b_i).$$

PROOF. In Lemma 4.15, we will prove that it is unlikely that b1 is close to Aff (b2,...,bd). We will exploit this fact by proving that it is unlikely that 0 is much closer than b1 to Aff (b2,...,bd). We do this by fixing the shape of 4 (b1,...,bd), and then considering slight translations of this simplex. That is, we make a change of variables to 1 Xd h = bi d i=1 di = h bi , for i 2.

The vectors d2,...,dd specify the shape of the simplex, and h specifies its location. As this change of variables is a linear transformation,P its Jacobian is constant. For = = convenience, we also define d1 h b1 i2di. (see Figure 3.) It is easy to verify that

0 ∈4(b1,...,bd) ⇔ h ∈4(d1,...,dd), dist (0, Aff (b2,...,bd)) = dist (h, Aff (d2,...,dd)) , dist (b1, Aff (b2,...,bd)) = dist (d1, Aff (d2,...,dd)) , and Vol (4 (b 1 ,...,bd)) = Vol (4 (d 1 ,...,dd)) .

Note that the relation between d1 and d2,...,dd guarantees 0 ∈4(d1,...,dd) for all d2,...,dd. So, (b1,...,bd) ∈ Q if and only if (d1,...,dd) ∈ Q and 422 D. A. SPIELMAN AND S.-H. TENG

FIG. 3. The change of variables in Lemma 4.14.

0 h ∈4(d1,...,dd).Asd1 is a function of d2,...,dd,weletQ be the set of d2,...,dd for which (d1,...,dd)∈ Q. So, the left-hand side of (18) equals , ,..., < , Pr 0 [dist (h Aff (d2 dd)) ] (d2,... ,dd )∈Q h∈4(d1,... ,dd ) where h, d2,...,dd have density proportional to Yd Vol (4 (d 1 ,...,dd)) i (h di ). (19) i=1 Similarly, Lemma 4.15 can be seen to imply , ,..., < Pr 0 [dist (d1 Aff (d2 dd)) ] (d2,... ,dd )∈Q h∈4(d ,... ,d ) 1 d 3 exp(2/3)d 3 3 exp(2/3)d 2 (20) 2 2 under density proportional to (19). We take advantage of (20) by proving dist (h, Aff (d ,...,d )) 75d max Pr 2 d < < , (21) ,... , ∈ 0 ∈4 ,... , 2 d2 dd Q h (d1 dd ) dist (d1, Aff (d2,...,dd)) where h has density proportional to Yd i (h di ). i=1 Before proving (21), we point out that using Lemma 2.11 to combine (20) and (21), we obtain 900 e2/3d2 Pr [dist (h, Aff (d ,...,d )<)] , 0 2 d 4 (d2,... ,dd )∈Q h∈4(d1,... ,dd ) from which the lemma follows. Smoothed Analysis of Algorithms 423

To prove (21), we let dist (h, Aff (d2,...,dd)) U = h ∈4(d1,...,dd): , dist (d1, Aff (d2,...,dd)) Q = d and we set (h) i=1 i (h di ). Under this notation, the probability in (21) is equal to

((U0) (U))/(U0).

To bound this ratio, we construct an isomorphism from U0 to U. The natural isomorphism, which we denote 8, is the map that contracts the simplex by a factor of (1 )atd1. To use this isomorphism to compare the measures of the sets, we use the facts that for d1,...,dd ∈ Q and h ∈4(d1,...,dd), √ k k k k — h di maxi, j di d j 4 2, so the distance√ from h di to the center of its distribution is at most kh d k + 3 4 2 + 3; i √ —dist (h,8(h))maxi dist (d1, di ) 4 2 to apply Lemma 2.15 to show that for all h ∈4(d1,...,dd), √ √ ! √ ! (8(h)d ) 3 4 2(4 2 + 3) (48 + 18 2) i i exp = exp . 2 2 i(h di) 2

So,

Yd (8(h)) (8(h) d ) min = min i i ∈4 ,... , ∈4 ,... , h (d1 dd ) (h) h (d1 dd ) = i (h di ) i 1 √ ! √ (48 + 18 2)d (48 + 18 2)d exp 1 . (22) 2 2

As the Jacobian ∂8 (h) d = (1 ) 1 d, ∂h using the change of variables x = 8(h) we can compute Z

(U) = (x) dx Zx ∈U ∂8(h) = (8(h)) dh ∈ ∂h h U0 Z

(1 d) (8(h)) dh . (23) h∈U0 424 D. A. SPIELMAN AND S.-H. TENG

So, R 8 (U) (1 d ) ∈ ( (h)) dh R h U0 by (23) (U0) ∈ (h) dh hU0 R (8(h)) ∈ dh (1 d) min Rh U0 ∈4 ,... , h (d1 dd ) (h) ∈ dh √ ! h U0 (48 + 18 2)d (1 d) 1 by (22) 2 75d 1 , as 1. 2 / < 75d (21) now follows from ( (U0) (U)) (U0) 2 .

LEMMA 4.15 (HEIGHT OF SIMPLEX). Let $\mu_1, \ldots, \mu_d$ be Gaussian measures in $\mathbb{R}^{d-1}$ of standard deviation $\sigma \le 1/(3\sqrt{d\ln n})$ centered at points of norm at most 3. Then
$$\Pr_{(b_1,\ldots,b_d)\in Q}\left[\mathrm{dist}\left(b_1, \mathrm{Aff}(b_2, \ldots, b_d)\right) < \epsilon\right] \le \frac{3e^{2/3}d^3\epsilon}{\sigma^2},$$
where $b_1, \ldots, b_d$ have density proportional to
$$\mathrm{Vol}\left(\triangle(b_1, \ldots, b_d)\right)\prod_{i=1}^d\mu_i(b_i).$$

PROOF. We begin with a simplifying change of variables. As in Theorem 2.27, we let

(b2,...,bd)=(Rc2 +t,... ,Rcd +t), d2 where ∈ S and t 0 specify the plane through b2,...,bd, and c2,...,cd ∈ IR d 2 denote the local coordinates of these points on that plane. Recall that the Jacobian of this change of variables is Vol (4 (c 2 ,...,cd)). Let l =h|b1i, d2 and let c1 denote the coordinates in IR of the projection of b1 onto the plane specified by and t. (See Figure 4.) Note that l 0. In this notation, we have

dist (b1, Aff (b2,...,bd)) = l + t.

The Jacobian of the change from b1 to (l, c1) is 1 as the transformation is just an orthogonal change of coordinates. The conditions for (b1,...,bd) ∈ Q translate into the conditions (a) dist ci , c j 4 for all i 6= j; (b) (l + t) 4; and (c) 0 ∈4(b1,...,bd).

Let R denote the set of c1,...,cd satisfying the first condition. As the lemma is vacuously true for 4, we will drop the second condition and note that doing so cannot decrease the probability that (t + l) <. Thus, our goal is to bound Pr [(l + t) <], (24) ,t,l,(c1,... ,cd )∈R Smoothed Analysis of Algorithms 425

FIG. 4. The change of variables in Lemma 4.15. where the variables have density proportional to2

Yd [0 ∈4(b1,...,bd)]Vol (4 (b 1 ,...,bd)) Vol (4 (c 2 ,...,cd)) i (bi ). i=1

As Vol (4 (b 1 ,...,bd)) = (l +t)Vol (4 (c 1 ,...,cd)) /d, this is the same as having density proportional to

Yd 2 (l + t) [0 ∈4(b1,...,bd)]Vol (4 (c 2 ,...,cd)) i (bi ). i=1

Under a suitable system of coordinates, we can express b1 = (l, c1) and bi = (t, ci ) for i 2. The key idea of this proof is that multiplying the first coordinates of these points by a constant does not change whether or not 0 ∈4(b1,...,bd); so, we can determine whether 0 ∈4(b1,...,bd)from the data (l/t, c1,...,cd). Thus, we will introduce a new variable , set l = t, and let S denote the set of (, c1,...,cd) for which 0 ∈4(b1,...,bd)and (c1,...,cd) ∈ R. This change ∂l = of variables from l to incurs a Jacobian of ∂ t, so (24) equals

Pr [(1 + )t <], ,t,(,c1,... ,cd )∈S

2 While we keep terms such as b1 in the expression of the density, they should be interpreted as functions of , t, l, c1,...,cd. 426 D. A. SPIELMAN AND S.-H. TENG where the variables have density proportional to

Yd 2 2 t (1 + )Vol (4 (c 2 ,...,cd)) 1(t, c1) i (t, ci ). i=2 We upper bound this probability by max Pr [(1 + )t <] max Pr [max(1,)t <], ,(,c1,... ,cd )∈S t ,(,c1,... ,cd )∈S t where t has density proportional to

Yd 2 t 1(t, c1) i (t, ci ). i=2

For c1,...,cd fixed, the points (t, c1), (t, c2),... ,(t,cd) become univariate Gaussians of standard deviation and mean of absolute value at most 3. Let 2 t0 = /(3 max(1,)d). Then, for t in the range [0, t0], t is at most 3 + t0 from the mean of the first distribution and t is at most 3 + t0 from the means of the other distributions. We will now observe that if t is restricted to a sufficiently small domain, then the densities of these Gaussians will have bounded variation. In particular, Lemma 2.15 implies that Q d max ∈ , (t, c ) (t, c ) t [0 t0] 1 1 Qi=2 i i d ∈ , , , mint [0 t0] 1( t c1) i=2 i (t ci ) Yd 2 2 exp 3(3 + t0)t0/2 exp 3(3 + t0)t0/2 i=2 ! ! Yd Yd / 2 / 2 2/ 2 2/ 2 exp 9 t0 2 exp 9t0 2 exp 3 t0 2 exp 3t0 2 i=2 ! i=2 ! Yd Yd exp(3/2d) exp(3/2d) exp 2/6d2 exp( 2/6d2) i=2 i=2 exp(3/2) exp(1/6d) exp(2). Thus, we can now apply Lemma 2.13 to show that 3(max(1,)d 3 Pr [t <]

4.2. ANGLE OF q TO ω.

LEMMA 4.16 (ANGLE OF INCIDENCE). Let $d \ge 3$ and $n > d$. Let $\mu_1, \ldots, \mu_n$ be Gaussian densities in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1. Let $s \le 2$ and let $(b_1, \ldots, b_d) \in Q$. Then,
$$\Pr_\omega\left[\langle\omega\mid q\rangle < \epsilon\right] < \frac{340\,n\epsilon^2}{\sigma^2}, \tag{25}$$
where $\omega$ has density proportional to
$$\langle\omega\mid q\rangle\prod_{j>d}\left(\int_{a_j}\left[\langle\omega\mid a_j\rangle\le s\langle\omega\mid q\rangle\right]\mu_j(a_j)\,da_j\right)\prod_{i=1}^d\mu_i(R_\omega b_i + sq).$$

PROOF. First note that the conditions for $(b_1, \ldots, b_d)$ to be in $Q$ imply that for $1 \le i \le d$, $b_i$ has norm at most $\sqrt{4^2+4^2} = 4\sqrt2$, by properties (1), (3) and (4) of $Q$. As in Proposition 2.29, we change $\omega$ to $(c, \psi)$, where $c = \langle\omega\mid q\rangle$ and $\psi \in \mathbb S^{d-2}$. The Jacobian of this change of variables is $(1-c^2)^{(d-3)/2}$. In these variables, the bound follows from Lemma 4.17.

LEMMA 4.17 (ANGLE OF INCIDENCE, II). Let $d \ge 3$ and $n > d$. Let $\mu_{d+1}, \ldots, \mu_n$ be Gaussian densities in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1. Let $s \le 2$, and let $b_1, \ldots, b_d$ each have norm at most $4\sqrt2$. Let $\psi \in \mathbb S^{d-2}$. Then
$$\Pr\left[c < \epsilon\right] < \frac{340\,n\epsilon^2}{\sigma^2},$$
where $c$ has density proportional to
$$(1-c^2)^{(d-3)/2}\, c\,\prod_{j>d}\left(\int_{a_j}\left[\langle\omega_{\psi,c}\mid a_j\rangle\le s\langle\omega_{\psi,c}\mid q\rangle\right]\mu_j(a_j)\,da_j\right)\prod_{i=1}^d\mu_i\left(R_{\omega_{\psi,c}}b_i + sq\right). \tag{26}$$

PROOF. Let 2 (d3)/2 1(c) = (1 c ) , Y Z 2(c) = [hω,c | a j ishω,c |qi]j(aj)daj, and j>d a j Yd = + . 3(c) i (Rω,c bi sq) i=1 Then, the density of c is proportional to

(26) = c 1(c)2(c)3(c). Let 2 c = . (27) 0 240n 428 D. A. SPIELMAN AND S.-H. TENG

We will show that, for c between 0 and c0, the density will vary by a factor no greater than 2. We begin by letting 0 = /2 arccos(c0), and noticing that a simple plot of the arccos function reveals c0 < 1/26 implies

0 1.001c0. (28) , So, as c varies in the range [0 c0], ω,c travels in an arc of angle at most 0 and therefore travels a distance at most 0.Asc= q |ω,c , we can apply Lemma 4.18 to show min (c) 8n(1 + s) 24n 1.001 0 c c0 2 1 0 1 0 1 , (29) 2 2 max0cc0 2(c) 3 3 30 by (27) and (28). + We similarly note that as c varies between 0 and c0, the point Rω,c bi sq moves a distance of at most √ 0 kbi k 4 20. As this point is at distance at most √ 1 + s + kbi k 4 2 + 3 from the center of i , Lemma 2.15 implies

+ √ √ min0 c c0 i (Rω,c bi sq) 2 exp (3(4 2 + 3)4 20)/2 max (R b + sq) 0 c c0 i ω,c i 2 exp 1470/ . So,

min (c) 0 c c0 3 / 2 / , exp 147d 0 exp( 148 240) (30) max0cc0 3(c) by (27) and (28) and d n. Finally, we note that 1 1 (c) = (1 c2)(d3)/2 (1 1/26d)(d3)/2 1 . (31) 1 52 So, combining Eqs. (29), (30), and (31), we obtain . min0cc0 1(c) 2(c) 3(c) 1 148 1 001 240 / . 1 e 1 1 2 max0cc0 1(c) 2(c) 3(c) 52 30 We conclude by using Lemma 2.13 to show 2 2 2 240 n 340 n Pr [c <]2(/c0) = 2 . c 2 2

LEMMA 4.18 (POINTS UNDER PLANE). For $n > d$, let $\mu_{d+1}, \ldots, \mu_n$ be Gaussian distributions in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1. Let $s \ge 0$ and let $\omega_1$ and $\omega_2$ be unit vectors such that $\langle\omega_1\mid q\rangle$ and $\langle\omega_2\mid q\rangle$ are nonnegative. Then,
$$\prod_{j>d}\frac{\int_{a_j}\left[\langle\omega_2\mid a_j\rangle\le s\langle\omega_2\mid q\rangle\right]\mu_j(a_j)\,da_j}{\int_{a_j}\left[\langle\omega_1\mid a_j\rangle\le s\langle\omega_1\mid q\rangle\right]\mu_j(a_j)\,da_j}
\ \ge\ 1 - \frac{8n(1+s)\,\|\omega_1-\omega_2\|}{3\sigma^2}.$$

PROOF. As the integrals in the statement of the lemma are just the integrals of Gaussian measures over half-spaces, they can be reduced to univariate integrals. If j is centered ata ¯ j , then Z

[hω1 | a j ishω1 |qi]j(aj)daj a j Z d 1 2 2 = √ [hω1 | a jishω1 |qi]exp ka j a¯ j k /2 da j 2 a Z j 1 d = √ [hω | g +a¯ ishω |qi]exp kg k2/2 2 dg 1 j j 1 j j 2 g j = (setting g j a j a¯ j ) Z 1 d = √ [hω | g ihω |sqa¯ i]exp kg k2/2 2 dg 1 j 1 j j j 2 g j Z =h | i 1 t ω1 sq a¯ j = √ exp t 2/2 2 dt 2 t=∞

(by Proposition 2.17) Z 1 t=∞ = √ exp t 2/2 2 dt . 2 t=hω1|sqa¯ j i

As ka¯ j k1, we know

hω1 | sq a¯ j i=hω1|sqi+hω1 |a¯jihω1|a¯ji1. (32) Similarly,

|hω1 |sqa¯ji+hω2 |sqa¯ji|=|hω1 ω2 |sqa¯ji| kω1ω2kksq a¯ j k kω1ω2k(s+1). (33) Thus, by applying Lemma 2.24 to (32) and (33), we obtain R R t=∞ 2 2 [hω2 | a j ishω2 |qi]j(aj)daj =h | i exp(t /2 ) dt . Ra j t ω2 sq a¯ j =R =∞ h | i h | i t 2/ 2 . a [ω1 aj s ω1 q ] j(aj)daj =h | i exp( t 2 ) dt j t ω1 sq a¯ j 8(1 + s)kω ω k 1 1 2 . 3 2 430 D. A. SPIELMAN AND S.-H. TENG

Thus,
$$\prod_{j>d}\frac{\int_{a_j}\left[\langle\omega_2\mid a_j\rangle\le s\langle\omega_2\mid q\rangle\right]\mu_j(a_j)\,da_j}{\int_{a_j}\left[\langle\omega_1\mid a_j\rangle\le s\langle\omega_1\mid q\rangle\right]\mu_j(a_j)\,da_j}
\ge \left(1 - \frac{8(1+s)\|\omega_1-\omega_2\|}{3\sigma^2}\right)^{n-d}
\ge 1 - \frac{8n(1+s)\|\omega_1-\omega_2\|}{3\sigma^2}.$$
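The first step of this proof, rewriting each factor as a one-dimensional Gaussian integral, is easy to check numerically. The sketch below (assuming numpy; the helper name is ours) compares a Monte Carlo estimate of $\Pr[\langle\omega\mid a_j\rangle\le s\langle\omega\mid q\rangle]$ with the corresponding univariate Gaussian tail.

```python
import math
import numpy as np

def halfspace_prob_check(omega, q, a_bar, s, sigma, samples=200_000, seed=0):
    """Pr[<omega, a> <= s<omega, q>] for a ~ N(a_bar, sigma^2 I), two ways."""
    rng = np.random.default_rng(seed)
    a = a_bar + sigma * rng.standard_normal((samples, a_bar.size))
    empirical = np.mean(a @ omega <= s * np.dot(omega, q))
    # <omega, a> is univariate N(<omega, a_bar>, sigma^2) for a unit vector omega,
    # so the probability is a standard normal CDF at <omega, s q - a_bar> / sigma.
    t = np.dot(omega, s * q - a_bar) / sigma
    exact = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return empirical, exact

omega = np.array([1.0, 0.0, 0.0])                    # unit vector
q = np.array([0.6, 0.8, 0.0])                        # unit vector
print(halfspace_prob_check(omega, q, a_bar=np.array([0.3, -0.2, 0.5]), s=1.5, sigma=0.2))
```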

4.3. EXTENDING THE SHADOW BOUND. In this section, we relax the restrictions made in the statement of Theorem 4.1. The extensions of Theorem 4.1 are needed in the proof of Theorem 5.1. We begin by removing the restrictions on where the distributions are centered in the shadow bound.

COROLLARY 4.19 ($\|\bar a_i\|$ FREE). Let $z$ and $t$ be unit vectors and let $a_1, \ldots, a_n$ be Gaussian random vectors in $\mathbb{R}^d$ of standard deviation $\sigma \le 1/(3\sqrt{d\ln n})$ centered at points $\bar a_1, \ldots, \bar a_n$. Then,
$$\mathbf E\left[\left|\mathrm{Shadow}_{z,t}(a_1, \ldots, a_n)\right|\right] \le \mathcal D\left(d, n, \frac{\sigma}{\max\left(1, \max_i\|\bar a_i\|\right)}\right),$$
where $\mathcal D(d, n, \sigma)$ is as given in Theorem 4.1.

PROOF. Let $k = \max_i\|\bar a_i\|$. Assume without loss of generality that $k \ge 1$, and let $b_i = a_i/k$ for all $i$. Then, $b_i$ is a Gaussian random variable of standard deviation $\sigma/k$ centered at a point of norm at most 1. So, Theorem 4.1 implies
$$\mathbf E\left[\left|\mathrm{Shadow}_{z,t}(b_1, \ldots, b_n)\right|\right] \le \mathcal D\left(d, n, \frac{\sigma}{k}\right).$$
On the other hand, the shadow of the polytope defined by the $b_i$s can be seen to be a dilation of the shadow of the polytope defined by the $a_i$s: the division of the $b_i$s by a factor of $k$ is equivalent to the multiplication of $x$ by $k$. So, we may conclude that for all $a_1, \ldots, a_n$,
$$\left|\mathrm{Shadow}_{z,t}(a_1, \ldots, a_n)\right| = \left|\mathrm{Shadow}_{z,t}(b_1, \ldots, b_n)\right|.$$

COROLLARY 4.20 (GAUSSIANS FREE). Let $z$ and $t$ be unit vectors and let $a_1, \ldots, a_n$ be Gaussian random vectors in $\mathbb{R}^d$ with covariance matrices $M_1, \ldots, M_n$ centered at points $\bar a_1, \ldots, \bar a_n$, respectively. If the eigenvalues of each $M_i$ lie between $\sigma^2$ and $1/9d\ln n$, then
$$\mathbf E\left[\left|\mathrm{Shadow}_{z,t}(a_1, \ldots, a_n)\right|\right] \le \mathcal D\left(d, n, \frac{\sigma}{1+\max_i\|\bar a_i\|}\right) + 1,$$
where $\mathcal D(d, n, \sigma)$ is as given in Theorem 4.1.

PROOF. By Proposition 2.14, each $a_i$ can be expressed as
$$a_i = \bar a_i + g_i + \tilde g_i,$$
where $\tilde g_i$ is a Gaussian random vector of standard deviation $\sigma$ centered at the origin and $g_i$ is a Gaussian random vector centered at the origin with covariance matrix $M_i' = M_i - \sigma^2 I$, each of whose eigenvalues is at most $1/9d\ln n$. Let $\tilde a_i = \bar a_i + g_i$. If $\|\tilde a_i\| \le 1 + \|\bar a_i\|$ for all $i$, then we can apply Corollary 4.19 to show
$$\mathop{\mathbf E}_{\tilde g_1,\ldots,\tilde g_n}\left[\left|\mathrm{Shadow}_{z,t}(a_1, \ldots, a_n)\right|\right]
\le \mathcal D\left(d, n, \frac{\sigma}{\max\left(1, \max_i\|\tilde a_i\|\right)}\right)
\le \mathcal D\left(d, n, \frac{\sigma}{1+\max_i\|\bar a_i\|}\right).$$
On the other hand, Corollary 2.19 implies
$$\Pr_{g_1,\ldots,g_n}\left[\exists i : \|\tilde a_i\| > 1+\|\bar a_i\|\right] \le 0.0015\binom nd^{-1}.$$
So, using Lemma 2.9 and $\left|\mathrm{Shadow}_{z,t}(a_1, \ldots, a_n)\right| \le \binom nd$, we can show
$$\mathop{\mathbf E}_{g_1,\ldots,g_n}\mathop{\mathbf E}_{\tilde g_1,\ldots,\tilde g_n}\left[\left|\mathrm{Shadow}_{z,t}(a_1, \ldots, a_n)\right|\right] \le \mathcal D\left(d, n, \frac{\sigma}{1+\max_i\|\bar a_i\|}\right) + 1,$$
from which the Corollary follows.

COROLLARY 4.21 ($y_i$ FREE). Let $y \in \mathbb{R}^n$ be a positive vector. Let $z$ and $t$ be unit vectors and let $a_1, \ldots, a_n$ be Gaussian random vectors in $\mathbb{R}^d$ with covariance matrices $M_1, \ldots, M_n$ centered at points $\bar a_1, \ldots, \bar a_n$, respectively. If the eigenvalues of each $M_i$ lie between $\sigma^2$ and $1/9d\ln n$, then
$$\mathbf E\left[\left|\mathrm{Shadow}_{z,t}(a_1, \ldots, a_n;\, y)\right|\right] \le \mathcal D\left(d, n, \frac{\sigma}{\left(1+\max_i\|\bar a_i\|\right)(\max_i y_i)/(\min_i y_i)}\right) + 1,$$
where $\mathcal D(d, n, \sigma)$ is as given in Theorem 4.1.

PROOF. Nothing in the statement is changed if we rescale the $y_i$s. So, assume without loss of generality that $\min_i y_i = 1$. Let $b_i = a_i/y_i$. Then $b_i$ is a Gaussian random vector with covariance matrix $M_i/y_i^2$ centered at a point of norm at most $\|\bar a_i\|/y_i \le \|\bar a_i\|$. The eigenvalues of each $M_i/y_i^2$ lie between $\sigma^2/y_i^2$ and $1/(9d\ln n\, y_i^2) \le 1/9d\ln n$, so we may complete the proof by applying Corollary 4.20.
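The proofs of Corollary 4.20 and of Lemma 5.15 in the next section both split a single Gaussian perturbation into two independent stages. The split is elementary, but seeing it concretely may help; a minimal sketch (assuming numpy; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 4, 10, 0.05
sigma1 = sigma / 2                               # any split with sigma0^2 + sigma1^2 = sigma^2
sigma0 = np.sqrt(sigma**2 - sigma1**2)

A_bar = rng.standard_normal((d, n))              # arbitrary centers
G = sigma0 * rng.standard_normal((d, n))         # first-stage perturbation
G_tilde = sigma1 * rng.standard_normal((d, n))   # second-stage perturbation
A = A_bar + G + G_tilde                          # same law as A_bar + N(0, sigma^2 I) entrywise

# Sanity check: the summed perturbation has the intended standard deviation.
samples = sigma0 * rng.standard_normal(10**6) + sigma1 * rng.standard_normal(10**6)
print(samples.std(), sigma)                      # the two numbers should be close
```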

5. Smoothed Analysis of a Two-Phase Simplex Algorithm

In this section, we will analyze the smoothed complexity of the two-phase shadow-vertex simplex method introduced in Section 3.3. The analysis of the algorithm will use as a black-box the bound on the expected sizes of shadows proved in the previous section. However, the analysis is not immediate from this bound. The most obvious difficulty in applying the shadow bound to the analysis of an algorithm is that, in the statement of the shadow bound, the plane onto which the polytope was projected to form the shadow was fixed and unrelated to the data defining the polytope. However, in the analysis of the shadow-vertex algorithm, the plane onto which the polytope is projected will necessarily depend upon the data defining the linear program. This is the dominant complication in the analysis of the number of steps taken to solve LP′.

Another obstacle will stem from the fact that, in the analysis of LP⁺, we need to consider the expected sizes of shadows of the convex hulls of points of the form $a_i^+/y_i^+$, which do not have a Gaussian distribution. In our analysis of LP⁺, we essentially handle this complication by demonstrating that in almost every sufficiently small region the distribution can be approximated by some Gaussian distribution.

The last issue we need to address is that if $s_{\min}(A_I)$ is too small, then the resulting values for $y'_i$ and $y_i^+$ can be too large. In Section 5.1, we resolve this problem by proving that one of $3nd\ln n$ randomly chosen $I$ will have reasonable $s_{\min}(A_I)$ with very high probability. Having a reasonable $s_{\min}(A_I)$ is also essential for the analysis of LP′.

As our two-phase shadow-vertex simplex algorithm is randomized, we will measure its expected complexity on each input. For an input linear program specified by $A$, $y$ and $z$, we let $\mathcal C(A, y, z)$ denote the expected number of simplex steps taken by the algorithm on input $(A, y, z)$. As this expectation is taken over the choices for $I$ and $\lambda$, and can be divided into the number of steps taken to solve LP⁺ and LP′, we introduce the function
$$\mathcal S'_z(A, y, I, \lambda)$$
to denote the number of simplex steps taken by the algorithm in step (5) to solve LP′ for a given $A$, $y$, $I$ and $\lambda$, and
$$\mathcal S^+_z(A, y, I) + 2$$
to denote the number of simplex steps³ taken by the algorithm in step (7) to solve LP⁺ for a given $A$, $y$ and $I$. We note that the complexity of the second phase does not depend upon $\lambda$; however, it does depend upon $I$, as $I$ affects the choice of $\kappa$ and $M$. We have
$$\mathcal C(A, y, z) = \mathop{\mathbf E}_{I,\lambda}\left[\mathcal S'_z(A, y, I, \lambda)\right] + \mathop{\mathbf E}_{I,\lambda}\left[\mathcal S^+_z(A, y, I) + 2\right].$$

³ The seemingly odd appearance of $+2$ in this definition is explained by Lemma 3.10.

THEOREM 5.1 (MAIN). There exists a polynomial $\mathcal P$ and a constant $\sigma_0$ such that for every $n > d \ge 3$, $\bar A = [\bar a_1, \ldots, \bar a_n] \in \mathbb{R}^{d\times n}$, $\bar y \in \mathbb{R}^n$ and $z \in \mathbb{R}^d$, and $\sigma > 0$,
$$\mathop{\mathbf E}_{A,y}\left[\mathcal C(A, y, z)\right] \le \min\left(\mathcal P\left(d, n, 1/\min(\sigma, \sigma_0)\right),\ \binom nd + \binom n{d+1} + 2\right),$$
where $A$ is a Gaussian random matrix of standard deviation $\sigma\max_i\|(\bar y_i, \bar a_i)\|$ centered at $\bar A$ and $y$ is a Gaussian random vector of standard deviation $\sigma\max_i\|(\bar y_i, \bar a_i)\|$ centered at $\bar y$.

PROOF. We first observe that the behavior of the algorithm is unchanged if one multiplies $A$ and $y$ by a power of two. That is,
$$\mathcal C(A, y, z) = \mathcal C(2^k A, 2^k y, z),$$
for any integer $k$. When $A$ and $y$ are Gaussian random variables centered at $\bar A$ and $\bar y$ of standard deviation $\sigma\max_i\|(\bar y_i, \bar a_i)\|$, $2^k A$ and $2^k y$ are Gaussian random variables centered at $2^k\bar A$ and $2^k\bar y$ of standard deviation $\sigma\max_i\|(2^k\bar y_i, 2^k\bar a_i)\|$. Accordingly, we may assume without loss of generality in our analysis that $\max_i\|(\bar y_i, \bar a_i)\| \in (1/2, 1]$. The Theorem now follows from Proposition 5.2 and Lemmas 5.15 and 5.27.

Before proceeding with the proof of Theorem 5.1, we state a trivial upper bound on $\mathcal S'$ and $\mathcal S^+$:

PROPOSITION 5.2 (TRIVIAL SHADOW BOUNDS). For all $A$, $y$, $z$, $I$ and $\lambda$,
$$\mathcal S'_z(A, y, I, \lambda) \le \binom nd \qquad\text{and}\qquad \mathcal S^+_z(A, y, I) \le \binom n{d+1}.$$

PROOF. The bound on $\mathcal S'$ follows from the fact that there are $\binom nd$ $d$-subsets of $[n]$. The bound on $\mathcal S^+$ follows from the observation in Lemma 3.10 that the number of steps taken by the second phase is at most 2 plus the number of $(d+1)$-subsets of $[n]$.

5.1. MANY GOOD CHOICES. For a Gaussian random $d$-by-$d$ matrix $(a_1, \ldots, a_d)$ of standard deviation $\sigma$, it is possible to show that the probability that the smallest singular value of $(a_1, \ldots, a_d)$ is less than $\epsilon$ is at most $O(\epsilon\sqrt d/\sigma)$. In this section, we consider the probability that almost all of the $d$-by-$d$ minors of a $d$-by-$n$ matrix $(a_1, \ldots, a_n)$ have small singular value. If the events for different minors were independent, then the proof would be straightforward. However, distinct minors may have significant overlap. While we believe stronger concentration results should be obtainable, we have only been able to prove:

LEMMA 5.3 (MANY GOOD CHOICES). For $n > d \ge 3$, let $a_1, \ldots, a_n$ be Gaussian random variables in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1. Let $A = (a_1, \ldots, a_n)$. Then, we have
$$\Pr_{a_1,\ldots,a_n}\left[\sum_{I\in\binom{[n]}{d}}\left[s_{\min}(A_I)\le\kappa_0\right] \le \frac1n\binom nd\right] \ge 1 - \left(n^{-d} + n^{-n+d-1} + n^{-2.9d+1}\right),$$
where
$$\kappa_0 \stackrel{\mathrm{def}}{=} \frac{\min(1,\sigma)}{12\,d^2 n^7\sqrt{\ln n}}. \tag{34}$$

In the analyses of LP′ and LP⁺, we use the following consequence of Lemma 5.3, whose statement is facilitated by the following notation for a set of $d$-sets $\mathcal I$:
$$\mathcal I(A) \stackrel{\mathrm{def}}{=} \mathop{\mathrm{argmax}}_{I\in\mathcal I}\ s_{\min}(A_I).$$

COROLLARY 5.4 (PROBABILITY OF SMALL $s_{\min}(A_{\mathcal I(A)})$). For $n > d \ge 3$, let $a_1, \ldots, a_n$ be Gaussian random variables in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1, and let $A = (a_1, \ldots, a_n)$. For $\mathcal I$ a set of $3nd\ln n$ randomly chosen $d$-subsets of $[n]$,
$$\Pr_{A,\mathcal I}\left[s_{\min}\left(A_{\mathcal I(A)}\right)\le\kappa_0\right] \le 0.417\binom nd^{-1}.$$

PROOF.
$$\Pr_{A,\mathcal I}\left[s_{\min}\left(A_{\mathcal I(A)}\right)\le\kappa_0\right] = \Pr_{A,\mathcal I}\left[\forall I\in\mathcal I : s_{\min}(A_I)\le\kappa_0\right]$$
$$\le \Pr_A\left[\sum_{I\in\binom{[n]}{d}}\left[s_{\min}(A_I)\le\kappa_0\right] > \frac1n\binom nd\right]
+ \Pr_{\mathcal I,A}\left[\forall I\in\mathcal I: s_{\min}(A_I)\le\kappa_0 \;\middle|\; \sum_{I\in\binom{[n]}{d}}\left[s_{\min}(A_I)\le\kappa_0\right] \le \frac1n\binom nd\right]$$
$$\le n^{-d} + n^{-n+d-1} + n^{-2.9d+1} + \left(\frac1n\right)^{|\mathcal I|}, \ \text{by Lemma 5.3,}$$
$$\le n^{-d} + n^{-n+d-1} + n^{-2.9d+1} + n^{-3d}, \ \text{as } |\mathcal I| = 3nd\ln n,$$
$$\le 0.417\binom nd^{-1},$$
for $n > d \ge 3$.

We also use the following corollary, which states that it is highly unlikely that $\kappa$ falls outside the set $\mathcal K$, which we now define:
$$\mathcal K = \left\{2^{\lfloor\lg x\rfloor} : \kappa_0 \le x \le \sqrt d + 3d\sqrt{\ln n}\right\}. \tag{35}$$

COROLLARY 5.5 (PROBABILITY OF $\kappa$ IN $\mathcal K$). For $n > d \ge 3$, let $a_1, \ldots, a_n$ be Gaussian random variables in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1, and let $A = (a_1, \ldots, a_n)$. For $\mathcal I$ a set of $3nd\ln n$ randomly chosen $d$-subsets of $[n]$,
$$\Pr_{A,\mathcal I}\left[2^{\lfloor\lg\left(s_{\min}\left(A_{\mathcal I(A)}\right)\right)\rfloor}\notin\mathcal K\right] \le 0.42\binom nd^{-1}.$$

PROOF. It follows from Corollary 5.4 that
$$\Pr_{A,\mathcal I}\left[s_{\min}\left(A_{\mathcal I(A)}\right)\le\kappa_0\right] \le 0.417\binom nd^{-1}.$$
On the other hand, as
$$s_{\min}(A_I)\le\|A_I\|\le\sqrt d\max_i\|a_i\|,$$
$$\Pr_{A,\mathcal I}\left[s_{\min}\left(A_{\mathcal I(A)}\right)\ge\sqrt d\left(1+3\sqrt{d\ln n}\right)\right]
\le \Pr_A\left[\max_i\|a_i\|\ge1+3\sqrt{d\ln n}\right] \le 0.0015\binom nd^{-1},$$
by Corollary 2.19.

PROPOSITION 5.6 (SIZE OF K). |K| 9 lg(nd/min(, 1)). The rest of this section is devoted to the proof of Lemma 5.3. The key to the proof is an examination of the relation between the events which we now define. ∈ [n] ∈ [n] 6∈ Definition 5.7. For I ( d ), K (d1), and j K , we define the indicator random variables X = [s (A ) ] , and I min I 0 j = , , YK dist a j Span (AK ) h0 where h def= . 0 4n4 j In Lemma 5.10, we obtain a concentration result on the YK s using the fact that j the YK are independent for fixed K and different j. To relate this concentration result to the X I s, we show in Lemma 5.11 that, when X I is true, it is probably the j case that YI { j} is true for most j. PROOF OF LEMMA 5.3. The proof has two parts. The first, and easier, part is Lemma 5.10, which implies   X X n d 1 n Pr  Y j >1nn+d1. ,..., K a 1 a n 2 d 1 ∈ [n] j6∈K K (d1) To apply this inequality, we use Lemma 5.11, which implies   X X d X Pr  Y j > X  > 1 nd n2.9d+1. ,..., K I a 1 a n 2 ∈ [n] j6∈K I K (d1) Combining these two inequalities, we obtain " # X d n d 1 n d n+d1 2.9d+1 Pr X I < 1n n n . a ,...,a 1 n 2 I 2 d 1 Observing, d X n d 1 n X n d n X < =⇒ X < 2 I 2 d 1 I d d 1 I I = n d n + n d 1 d = 1 n 1 + n d 1 d 1 n 1 , n d 436 D. A. SPIELMAN AND S.-H. TENG we obtain " # X 1 n d n+d1 2.9d+1 Pr X I 1 n +n +n . a ,...,a 1 n I n d

j . LEMMA 5.8 (PROBABILITY OF YK ). Under the conditions of Lemma 5 3, for ∈ [n] 6∈ all K d1 and j K, h i j h0 . Pr YK a 1,...,a n

PROOF. Follows from Proposition 2.20.

j . LEMMA 5.9 (SUM OVER j OF YK ). Under the conditions of Lemma 5 3, for all ∈ [n] K d1 , " # d + / e X 4h (n d 1) 2 Pr Y j d(n d + 1)/2e 0 . a ,...,a K 1 n j6∈K

j PROOF. Using the fact that for fixed K the events YK are independent, we compute " # X Pr Y j d(n d + 1)/2e a ,...,a K 1 n j6∈K X h i ∀ ∈ , j Pr j J YK a 1,...,a n ∈ [n]K J (d + / e) (Xn d 1) 2 Y h i = j Pr YK a 1,...,a n ∈ [n]K j∈J J (d + / e) (n d 1) 2 X d(nd+1)/2e h0 , by Lemma 5.8, ∈ [n]K J (d + / e) (n d1) 2 d(nd+1)/2e 4h0 , | [n]K | |[n]K| = nd+1 as d(nd+1)/2e 2 2 .

j LEMMA 5.10 (SUM OVER K AND j OF YK ). Under the conditions of Lemma 5.3,   X X n d 1 n Pr  Y j > nn+d1. ,..., K a 1 a n 2 d 1 ∈ [n] j6∈K K (d1) Smoothed Analysis of Algorithms 437 P P j > d nd1 e [n] PROOF.If K∈([n]) j6∈K YK (d1), then there must exist a K for P d1 2 j > d nd1 e which j6∈K YK 2 , which implies for that K X + j n d 1 + = n d 1 . YK 1 j6∈K 2 2 Using this trick, we compute   X X n d 1 n Pr  Y j  ,..., K a 1 a n 2 d 1 ∈ [n] j6∈K K (d1) " # [n] X n d + 1 Pr ∃K ∈ : Y j a ,...,a K 1 n d 1 j6∈K 2 " # n X n d + 1 Pr Y j a ,...,a K d 1 1 n j6∈K 2 d(nd+1)/2e n 4h0 d 1 (by Lemma 5.9) d(nd+1)/2e = n 4h0 n d + 1 d + / e 1 (n d 1) 2 nnd+1 n4 nn+d1. The other statement needed for the proof of Lemma 5.3 is:

LEMMA 5.11 (RELATING XSTOYS). Under the conditions of Lemma 5.3,   X X d X Pr  Y j X  nd + n2.9d+1. ,..., K I a 1 a n 2 ∈ [n] j6∈K I K (d1)

PROOF. Follows immediately from Lemmas 5.12 and 5.14.

LEMMA 5.12 (GEOMETRIC CONDITION FOR BAD I ). If there exists a d-set I such that X j d , X I and YI { j} j∈I 2 then there exists a set L I,|L| = bd/21cand a j0 ∈ I L such that √ k k , + d maxi a i . dist (a j0 Span(AL )) d 0 1 2 h0 438 D. A. SPIELMAN AND S.-H. TENG

PROOF. Let I = {i1,...,id}. By Proposition 2.6 (a), X I implies the existence ,..., k ,..., k= of ui1 uid, (ui1 uid) 1, such that X . uia i 0 i∈I P j / | | = On the other hand, j∈I YI { j} d 2 implies the existence of a J I , J d / e j = ∈ | | </ d 2 , such that YI { j} 0 for all j J. By Lemma 5.13, this√ implies u j 0 h0 for all j ∈ J.Ask(ui ,...,u√i )k=1 and 0/h0 1/ d, there exists some ∈ | 1 | /d = { } j0 I J such that u j0 1 d. Setting L I J j0 , we compute X X X u a =⇒ u a + u a + u a j j 0 j 0 j 0 j j j j 0 j∈I j ∈ L j ∈ J X X =⇒ u a + u a + u a j 0 j 0 j j 0 j j ∈ ∈ j L j J ! X X =⇒ a + ( u / u )a (1/ u ) + u a j 0 j j 0 j j0 0 j j ∈ ∈ j L j J ! X √ X =⇒ a + ( u / u )a d + u a j 0 j j 0 j 0 j j ∈ ∈ j L j J √ k k =⇒ , + d 0 maxi a i . dist (a j0 Span (AL )) d 0 2 h0

LEMMA 5.13 (BIG HEIGHT,SMALL COEFFICIENT). Let a 1,...,ad be vectors and u be a unit vector such that Xd . uia i 0 i=1 If dist (a j , Span({a i }i6= j )) > h0, then u j <0/h0.

PROOF.Wehave Xd X =⇒ + uiai 0 u j a j uia i 0 i=1 i 6= j X =⇒ + / / a j (ui u j )a i 0 u j 6= i j =⇒ , { } / , dist a j Span a i i6= j 0 u j from which the lemma follows. Smoothed Analysis of Algorithms 439

LEMMA 5.14 (PROBABILITY OF BAD GEOMETRY). Under the conditions of Lemma 5.3, " # [n] ∃L ∈ , j0 6∈ Lsuch that bd/21c √ d 2.9d+1 Pr k k n + n . ,..., d maxi a i a 1 a n dist (a j , Span(AL )) d0 1 + 0 2 h0

PROOF. We first note that " # ∃ ∈ [n] , 6∈ L bd/21c j0 L such that Pr √ d maxi ka i k a 1,... ,a n , + dist (a j0 Span (AL )) d 0 1 2 h " 0 # ∃ ∈ [n] , 6∈ L bd/21c j0 L such that Pr √ √ ,... , d 1+3 d ln n a 1 a n dist (a j , Span (AL )) d0 1 + 0 2 h0 (36) √ + Pr [max ka i k > 1 + 3 d ln n]. (37) a 1,... ,a n i We now apply Proposition 2.20 to bound (36) by " √ !# X X √ d 1 + 3 d ln n Pr dist (a , Span (A )) d 1 + ,..., j0 L 0 a 1 a n 2 h ∈ [n] j 6∈L 0 L (b / c) 0 d 2 1 !! √ √ d|L| n d d 1 + 3 d ln n ( n d / 2 + 1) 0 1 + . b d / 2 1 c 2 h0 (38) d d e 2d To simplify this expression, we note that 2 3 , for d 3. We then recall , 0 = min(√ 1) , h0 3d2n3 ln n and apply d 3 to show √ √ ! √ d d 1 + 3 d ln n d 2d3/2 √ 0 1 + 0 + 0 + 2d2 ln n 2 h0 h0 3 1 . n3 So, we have

d / e n 1 d 2 (38) (n d/2 + 1) bd/2 1c n3 nbd/21c+1n3d/2 nd . On the other hand, we can use Corollary 2.19 to bound (37) by n2.9d+1. 440 D. A. SPIELMAN AND S.-H. TENG

5.1.1. Discussion. It is natural to ask whether one could avoid the complication of this section by setting $I = \{1, \ldots, d\}$, or even choosing $I$ to be the best $d$-set in $\{1, \ldots, d+k\}$ for some constant $k$. It is possible to show that the probability that all $d$-by-$d$ minors of a perturbed $d$-by-$(d+k)$ matrix have smallest singular value at most $\epsilon$ grows like $(\epsilon\sqrt d/\sigma)^k$. Thus, the best of these sets would have reasonable condition number with polynomially high probability. This bound would be sufficient to handle our concerns about the magnitude of $y'_i$. The analysis in Lemma 5.18 might still be possible in this situation; however, it would require considering multiple possible splittings of the perturbation (for multiple values of $\sigma_1$), and it is not clear whether such an analysis can be made rigorous. Finally, it seems difficult in this situation to apply the trick in the proofs of Lemmas 5.27 and 5.15 of summing over all likely values for $\kappa$. If the algorithm is given $\sigma$ as input, then it is possible to avoid the need for this trick (and such an analysis appeared in an earlier draft of this article). However, we believe that it is preferable for the algorithm to make sense without taking $\sigma$ as an input.

While choosing $I$ in such a simple fashion could possibly simplify this section, albeit at the cost of complicating others, we feel that once Lemma 5.3 has been improved and the correct concentration bound has been obtained, the technique presented here will provide the best bounds.

One of the anonymous referees pointed out that it should be possible to use the rank revealing QR factorization to find an $I$ with almost maximal $s_{\min}(A_I)$ (see Chan and Hansen [1992]). While doing so seems to be the best choice algorithmically, it is not clear to us how we could analyze the smoothed complexity of the resulting two-phase algorithm. The difficulty is that the assumption that a particular $I$ was output by the rank revealing QR factorization would impose conditions on $A$ that we are currently not able to analyze.

5.2. BOUNDING THE SHADOW OF LP′. Before beginning our analysis of the shadow of LP′, we define the set from which $\lambda$ is chosen to be $\mathcal A_{1/d^2}$, where we define
$$\mathcal A = \{\lambda : \langle\lambda\mid\mathbf1\rangle = 1\},\ \text{and}$$

for $\rho \ge 0$,
$$\mathcal A_\rho = \{\lambda : \langle\lambda\mid\mathbf1\rangle = 1\ \text{and}\ \lambda_i \ge \rho,\ \forall i\}.$$
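Sampling $\lambda$ uniformly from $\mathcal A_{1/d^2}$ is straightforward, since $\mathcal A_{1/d^2}$ is an affine image of the standard simplex $\mathcal A_0$. A minimal sketch (assuming numpy; the function name is ours):

```python
import numpy as np

def sample_lambda(d, rho=None, rng=None):
    """Uniform sample from A_rho = {lam : sum(lam) = 1, lam_i >= rho}; rho = 1/d^2 by default."""
    rng = np.random.default_rng() if rng is None else rng
    rho = 1.0 / d**2 if rho is None else rho
    mu = rng.dirichlet(np.ones(d))        # uniform on the standard simplex A_0
    return rho + (1.0 - d * rho) * mu     # affine bijection of A_0 onto A_rho

lam = sample_lambda(d=5)
print(lam.sum(), lam.min())               # sums to 1, every entry at least 1/25
```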

The principal obstacle to proving the bound for LP′ is that Theorem 4.1 requires one to specify the plane onto which the perturbed polytope will be projected before the perturbation is known, whereas the shadow relevant to the analysis of LP′ depends on the perturbation: it is the shadow onto $\mathrm{Span}(A_{\mathcal I(A)}\lambda, z)$. To overcome this obstacle, we prove in Lemma 5.18 that if $s_{\min}\left(\tilde A_{\mathcal I(A)}\right) \ge \kappa_0/2$, then the expected size of the shadow onto $\mathrm{Span}(A_{\mathcal I(A)}\lambda, z)$ is close to the expected size of the shadow onto $\mathrm{Span}\left(\tilde A_{\mathcal I(A)}\tilde\lambda, z\right)$, where $\tilde\lambda$ is chosen from $\mathcal A_0$. As this plane is independent of the perturbation, we can apply Theorem 4.1 to bound the size of the shadow on this plane. Unfortunately, $\bar A$ is arbitrary, so we cannot make any assumptions about $s_{\min}\left(\bar A_{\mathcal I(A)}\right)$. Instead, we decompose the perturbation into two parts, as in Corollary 4.20, and can then use Corollary 5.4 to show that with high probability $s_{\min}\left(\tilde A_{\mathcal I(A)}\right) \ge \kappa_0/2$. We begin the proof with this decomposition, and build to the point at which we can apply Lemma 5.18.

A secondary obstacle in the analysis is that $\kappa$ and $M$ are correlated with $A$ and $y$. We overcome this obstacle by considering the sum of the expected sizes of the shadows when $\kappa$ and $M$ are fixed to each of their likely values. This analysis is facilitated by the notation
$$\mathcal T'_z(A, I, \lambda, \kappa, M) \stackrel{\mathrm{def}}{=} \left|\mathrm{Shadow}_{A_I\lambda,\, z}(a_1, \ldots, a_n;\, y')\right|,
\qquad\text{where}\quad y'_i = \begin{cases}\kappa M & \text{if } i\in I,\\ \sqrt dM^2/4 & \text{otherwise.}\end{cases}$$
We note that
$$\mathcal S'_z(A, y, I, \lambda) = \mathcal T'_z\!\left(A,\ \mathcal I(A),\ \lambda,\ 2^{\lfloor\lg s_{\min}(A_{\mathcal I(A)})\rfloor},\ 2^{\lceil\lg(\max_i\|(y_i, a_i)\|)\rceil+2}\right).$$

0 n d LEMMA 5.15 (LP ). Let d 3 and n d + 1. Let A¯ = [a¯ 1,...,a¯n]∈IR , n d y¯ ∈ IR and z ∈ IR satisfy maxi k(y¯i , a¯ i )k ∈ (1/2, 1]. For any >0, let A be a Gaussian random matrix centered at A¯ of standard deviation , and let y by a Gaussian random vector centered at y¯ of standard deviation . Let be chosen uniformly at random from A1/d2 and let I be a collection of 3nd ln n randomly chosen d-subsets of [n]. Then, E [S0 (A, y, I, )] , ,I, z A y min(1,4) 326nd(ln n) lg(dn/min(1,)) D d, n, , 12, 960d8.5n14 ln2.5 n where D(d, n,)is as given in Theorem 4.1. PROOF. Instead of treating A as a perturbation of standard deviation of A¯ , we will view A as the result of applying a perturbation of standard deviation 0 2 + 2 = 2 followed by a perturbation of standard deviation 1, where 0 1 . Formally, we will let G be a Gaussian random matrix of standard deviation 0 centered at the origin, A˜ = A¯ + G, G˜ be a Gaussian random matrix of standard deviation 1 centered at the origin, and A = A˜ + G˜ , where

def 0 1 = √ , 6d3 ln n 2 = 2 2 and 0 1 . We similarly decompose the perturbation to y into a perturbation of standard deviation 0 from which we obtainy ˜ , and a perturbation of standard deviation 1 from which we obtain y. We will let h˜ = y y˜ . We can then apply Lemma 5.16 to show n 1 Pr smin A˜ I(A) <0/2 <0.42 . (39) I,A˜ ,G˜ d One difficulty in bounding the expectation of T 0 is that its input parameters are correlated. To resolve this difficulty, we will bound the expectation of T 0 by the sum over the expectations obtained by substituting each of the likely choices for and M. In particular, we set √ √ dlg xe+2 M = 2 : max k(y˜i , a˜ i )k3 dln n1 x max k(y˜i , a˜ i )k+3 dln n1 . i i 442 D. A. SPIELMAN AND S.-H. TENG

We now define indicator random variables V , W, X, Y , and Z by V = [|M| 2] , h p i W = max k(y˜i , a˜ i )k 1 + 3 (d + 1) ln n , i X = s A˜ I /2 , h min (A) 0 i b c Y = 2 lg smin(AI(A)) ∈ K , and d k , ke+ Z = 2 lg maxi (yi a i ) 2 ∈ M , and then expand

E [S0(A, y, I, )] I,A,y, = E [S0(A, y, I, )VWXYZ] + E [S0(A, y, I, )(1 VWXYZ)]. I,A,y, I,A,y, (40) From Corollary 5.5, we know h i 1 b c n Pr [not(Y )] = Pr 2 lg smin(AI(A)) 6∈ K 0.42 . (41) A,I A,I d

Similarly, Corollary 2.19 implies for any A˜ andy ˜ and n > d 3, 1 d k , ke+ n Pr [not(Z)] = Pr 2 lg maxi (yi a i ) 2 6∈ M 0.0015 . (42) G˜ ,h˜ ,I G˜ ,h˜ ,I d From Corollary 2.19, we have

n 1 Pr [not(W)] n2.9(d+1)+1 0.0015 . A˜ ,y˜ d k , k / For i0 an index for which (yi0 a i0 ) 1 2, Proposition 2.22 implies h i p n 1 Pr [not(V )] Pr k(y˜ , a˜ )k < 9 (d + 1) ln n 0.01 . , i0 i0 1 A˜ a˜ i0 y˜i0 d By also applying inequality (39) to bound the probability of not(X), we find

n 1 Pr [(1 VWXYZ) = 1] 0.86 . A,y,I d As n S0(A, y, I, ) , (by Proposition 5.2) d the second term of (40) can be bounded by 1. Smoothed Analysis of Algorithms 443

To bound the first term of (40), we note

E [S0(A, y, I, )VWXYZ] I, , , A y " # X E VW E [T0(A,I(A),,,M)XW] . (43) I, ˜, ˜,˜, A y˜ ∈K,M∈MG h

Moreover, E T 0 (A, I(A), ,,M)XW ˜ G˜ ,h, " # X 0 = E T (A,I,,,M)W[smin(A˜ I ) 0/2][I(A) = I ] G˜ ,h˜, " I∈I # X 0 E T (A, I, ,,M)W[smin(A˜ I ) 0/2] G˜ ,h˜ , " I ∈I # X 0 E T (A, I, ,,M)|W and smin(A˜ I ) 0/2 G˜ ,h˜ , ∈I X I 0 = E T (A, I, ,,M)|W and smin(A˜ I ) 0/2 ˜ , ˜ , I ∈I G h X (6.01)D d, n, √ 1 √ , by Lemma 5.17, (2 + 3 d ln n)( dM2/4M) I ∈I 4 3(6.01)nd(ln n) D d, n, √ 1 √ . (2 + 3 d ln n)( dM)

Thus,

(43) " # X 4 E VW 3(6.01)nd(ln n)D d, n, √ 1 √ I,A˜ ,y˜ (2 + 3 d ln n)( dM) ∈K,M∈M 4 min(K) E 3(6.01)nd(ln n)(V |M|) |K| WD d, n, √ 1 √ I, ˜, + M A y˜ (2 3 d ln n )( d max( )) 4 min(K) E 6(6.01)nd(ln n) |K| WD d, n, √ 1 √ I , ˜ , + M A y˜ (2 3 d ln n )( d max( )) 2 6(6.01)nd(ln n) |K| D d, n, √ √ 1 0 √ , d(2 + 3 d ln n)(1 + 6 (d + 1) ln n) √ where the last inequality follows from max(M) 1 + 6 (d + 1) ln n when W is true. 444 D. A. SPIELMAN AND S.-H. TENG

To simplify, we first bound the third argument of the function D by: 2 √ √ 1 0 √ d(2 + 3 d ln n )(1 + 6 (d + 1) ln n) 1 2 = √ √ √ 0 √ 3d3 ln n d(2 + 3 d ln n)(1 + 6 (d + 1) ln n) 1 1 2 2(min(1,))2 = √ √ √ √ 3d3.5 ln n 12d2n7 ln n (2 + 3 d ln n)(1 + 6 (d + 1) ln n) 1 min(1,4) √ √ 432d7.5n14(ln n)1.5 (2 + 3 d ln n)(1 + 6 (d + 1) ln n) ,4 1 min(1 ) 432d7.5n14(ln n)1.5 30d ln n ,4 = min(1 ) , 12, 960d8.5n14 ln2.5 n where the last inequality follows from the assumption that n > d 3. Applying Proposition 5.6 to show |K| 9 lg(dn/min(1,)), we now obtain min(1,4) (43) 6(6.01) |K| nd(ln n)D d, n, , 8.5 14 2.5 12 960d n ln n min(1,4) 325nd(ln n) lg(dn/min(1,))D d, n, . 12, 960d8.5n14 ln2.5 n

LEMMA 5.16 (PROBABILITY OF SMALL smin(A˜ I(A))). For A, A˜ , and I as de- fined in the proof of Lemma 5.15, n 1 Pr smin A˜ I(A) <0/2 <0.42 . I,A˜ ,G˜ d

PROOF. Let I = I(A), we have

Pr [smin(AI ) <0] Pr [smin(A˜ I ) <0/2] . Pr [smin(AI ) <0|smin(A˜ I ) <0/2] From Corollary 5.4, we have

n 1 Pr [s (A ) <]0.417 . min I 0 d On the other hand, we have Pr smin(AI ) 0 smin(A˜ I ) <0/2 Pr smin(AI ) 0 and smin(A˜ I ) <0/2 Pr kAI A˜ I k0/2, by Proposition 2.6 (b), Smoothed Analysis of Algorithms 445 √ Pr [max ka i a˜ i k0/2 d], by Proposition 2.4 (d), A i √ 5/2 = Pr [max ka i a˜ i k3d ln n1] A i n2.9d+1, by Corollary 2.19. Thus, 1 0.417 n n 1 (5.2) d 0.42 , 1 n2.9d+1 d for n > d 3. [n] ,..., LEMMA 5.17 (FROM a˜ ).√Let I be a set in ( d ) and let a˜ 1 a˜n be points each of norm at most 1 + 3 (d + 1) ln n such that s (A˜ ) 0 . min I 2 Then, ,..., 0 E [ShadowAI ,z (a 1 an;y )] A,∈A 1/d2 ! (6.01)D d, n, √ 1 . (44) + 0/ 0 (2 3 d ln n )(maxi yi mini yi ) PROOF. We apply Lemma 5.18 to show ,..., 0 E ShadowAI ,z a 1 an;y A,∈A 1/d2 0 6 E Shadow ˜ , a 1,...,an;y +1 , ∈ AI ˜ z A ˜ A0 0 6 max E Shadow ˜ , a 1,...,an;y +1 ∈ AI ˜ z ˜ A0 A ! 6D d,n, √ 1 + 7 (2 + 3 d ln n)(max y0/ min y0) i i i i ! (6.01)D d, n, √ 1 , + 0/ 0 (2 3 d ln n )(maxi yi mini yi ) by Corollary 4.21 and fact that D(n, d,) 58, 888, 678 for any positive n, d,. ∈ [n] ,..., LEMMA 5.18 (CHANGING TO ˜ ). Let I ( d ). Let a 1 an be Gaussian d ,..., random vectors in IR of standard deviation 1, centered at points a˜ 1 a˜n.If smin A˜ I 0/2, then ,..., 0 E ShadowAI ,z a 1 an;y A,∈A 1/d2 6 E Shadow a ,...,a ;y0 +1. A˜ I ˜ ,z 1 n A,˜∈A0 446 D. A. SPIELMAN AND S.-H. TENG

PROOF. The key to our proof is Lemma 5.19. To ready ourselves for the appli- cation of this lemma, we let 0 FA(t) = Shadowt,z a 1,...,an;y , √ and note that FA(t) = FA(t/ ktk). If A˜ A 3d ln n1, then 1 1 I A˜ A A˜ A˜ A √ 2 √ 2 3d ln n 1 3d ln n √ 0 . 1 2 0 0 12d3 ln n 2d By Proposition 2.6(b), s (A ) s A˜ A˜ A min I min I √ /2 3d ln n 0 1 1 0 1 2 2 2d 0 17 , 2 18 for d 3. So, we can similarly bound 9 I A1A˜ . 17d2 We can then apply Lemma 5.19 to show 0 0 E [|ShadowA ,z (a 1,...,an;y )|]6 E [|Shadow ˜ , (a 1,...,an;y )|]. ∈ I ∈ AI ˜ z A1/d2 ˜ A From Corollary√ 2.19 and Proposition 2.4(d), we know that the probability that kA˜ Ak > 3d ln n is at most n2.9d+1.AsShadow (a ,...,a ;y0)(n), 1 A˜ I ˜ ,z 1 n d we can apply Lemma 2.9 to show ,..., 0 E E ShadowAI ,z (a 1 an;y ) A ∈A 2 1/d ,..., 0 + . 6E E ShadowA˜ ˜ ,z (a 1 an;y ) 1 A ˜∈A I To compare the expected sizes of the shadows, we will show that the distribution Span(A, z ) is close to the distribution Span(A˜ ˜ , z ). To this end, we note that for a given ˜ ∈ A0, the ∈ A for which A is a positive multiple of A˜ ˜ is given by

1 def A A˜ ˜ = 9(˜ ) = . (45) A1A˜ ˜ | 1

To derive this equation, note that A˜ ˜ is the point in 4 (a˜ 1,...,a˜d) specified by ˜ . A1A˜ ˜ provides the coordinates of this point in the basis A. Dividing by Smoothed Analysis of Algorithms 447

1 hA A˜ ˜ | 1i provides the ∈ A specifying the parallel point in Aff (a 1,...,ad). We can similarly derive ˜ 1 91 = A A . () 1 A˜ A | 1 Our analysis will follow from a bound on the Jacobian of 9.

LEMMA 5.19 (APPROXIMATION OF BY ˜ ). Let F(x) be a nonnegative func- 1 tion depending only on x/kxk.If =1/d2,kIA˜ Ak, and kI A1A˜ k , where 9/17d2, then E [F(A)] 6 E F(A˜ ˜ ) . ∈A ˜ ∈A0 PROOF. Expressing the expectations as integrals, the lemma is equivalent to Z Z 1 6 F (A) d F (A˜ ˜ ) d˜ . Vol ( A ) ∈ A Vol ( A 0 ) ˜ ∈ A 0 Applying Lemma 5.21 and setting = 9(˜ ), we bound Z 1 F (A) d Vol ( A ) ∈ A Z 1 F (A) d Vol ( A ) ∈9 Z ( A 0 ) 1 ∂9(˜ ) = F (A9 (˜ )) d˜ Vol ( A ) ∈ ∂˜ Z ˜ A 0 ∂9 = 1 F ˜ (˜ ) (A˜ ) ∂ d˜ Vol ( A ) ˜ ∈ A 0 ˜ ˜ 9 F / k k (as A˜ is a positive multiple of A Z(˜ ) and (x) only depends on x x ) ∂9(˜ ) 1 max F (A˜ ˜ ) d˜ ˜ ∈A0 ∂˜ Vol ( A ) ∈ ˜ A 0 Z ∂9 (˜ ) Vol ( A 0 ) 1 = max F (A˜ ˜ ) d˜ ˜ ∈A0 ∂˜ Vol ( A ) Vol ( A ) ∈ 0 ˜Z A 0 (1 + )d 1 d 1 √ F (A˜ ˜ ) d˜ d (1 d) (1 ) 1 d Vol ( A 0 ) ˜ ∈ A 0 (by Proposition 5.20Z and Lemma 5.24) 1 6 F (A˜ ˜ ) d˜ , Vol ( A 0 ) ˜ ∈ A 0 for 9/17d2, = 1/d2 and d 3.

PROPOSITION 5.20 (VOLUME DILATION). d Vol ( A 0 ) = 1 . Vol ( A ) 1 d 448 D. A. SPIELMAN AND S.-H. TENG

FIG.5. 0u,v can be understood as the projection through the origin from one plane onto the other.

PROOF. The set A may be obtained by contracting the set A0 at the point (1/d, 1/d,... ,1/d) by the factor (1 d). LEMMA 5.21 (PROPER SUBSET). Under the conditions of Lemma 5.19,

A 9(A0). PROOF. We will prove 1 9 (A) A0. 0 1 0 0 Let ∈ A, = A˜ A and ˜ = /h | 1i. Using Proposition 2.2 to show k k k k = 1 1 and Proposition 2.4(a), we bound 0 | 0|k 0k i i i i 1 kI A˜ Akkk>0. So, all components of 0 are positive and therefore all components of ˜ = 0/ 0 | 1 are positive. We will now begin a study of the Jacobian of 9. This study will be simplified by decomposing 9 into the composition of two maps. The second of these maps is given by: d Definition 5.22 (0u,v ). Let u and v be vectors in IR and let 0u,v (x)bethe map from {x : hx | ui = 1} to {x : hx | vi = 1} given by x 0 , (x) = . u v hx | vi (See Figure 5.)

LEMMA 5.23 (JACOBIAN OF $\Psi$).
$$\left|\frac{\partial\Psi(\tilde\nu)}{\partial\tilde\nu}\right|=\det\!\left(A^{-1}\tilde A\right)\frac{\|\mathbf 1\|}{\left\langle A^{-1}\tilde A\tilde\nu\mid\mathbf 1\right\rangle^{d}\left\|\left(\tilde A^{-1}A\right)^{T}\mathbf 1\right\|}.$$

PROOF. Let $\nu=\Psi(\tilde\nu)$ and let $\nu'=A^{-1}\tilde A\tilde\nu$. As $\langle\tilde\nu\mid\mathbf 1\rangle=1$, we have
$$\left\langle\nu'\mid\left(\tilde A^{-1}A\right)^{T}\mathbf 1\right\rangle=1.$$

So, $\nu=\Gamma_{u,v}(\nu')$, where $u=(\tilde A^{-1}A)^{T}\mathbf 1$ and $v=\mathbf 1$. By Lemma 5.25,
$$\left|\frac{\partial\nu}{\partial\tilde\nu}\right|
=\left|\det\frac{\partial\Gamma_{u,v}(\nu')}{\partial\nu'}\right|\left|\det\frac{\partial\nu'}{\partial\tilde\nu}\right|
=\frac{\|\mathbf 1\|}{\left\langle\nu'\mid\mathbf 1\right\rangle^{d}\left\|\left(\tilde A^{-1}A\right)^{T}\mathbf 1\right\|}\det\!\left(A^{-1}\tilde A\right)
=\det\!\left(A^{-1}\tilde A\right)\frac{\|\mathbf 1\|}{\left\langle A^{-1}\tilde A\tilde\nu\mid\mathbf 1\right\rangle^{d}\left\|\left(\tilde A^{-1}A\right)^{T}\mathbf 1\right\|}.$$

LEMMA 5.24 (BOUND ON JACOBIAN OF $\Psi$). Under the conditions of Lemma 5.19,
$$\left|\frac{\partial\Psi(\tilde\nu)}{\partial\tilde\nu}\right|\ \le\ \frac{(1+\epsilon)^{d}}{(1-\sqrt d\,\epsilon)^{d}(1-\epsilon)},$$
for all $\tilde\nu\in\Delta_{A'}$.

PROOF. The condition $\|I-A^{-1}\tilde A\|\le\epsilon$ implies $\|A^{-1}\tilde A\|\le 1+\epsilon$, so Proposition 2.4(e) implies
$$\det\!\left(A^{-1}\tilde A\right)\ \le\ (1+\epsilon)^{d}.$$
Observing that $\|\mathbf 1\|=\sqrt d$, and $\|I-(\tilde A^{-1}A)^{T}\|=\|I-(\tilde A^{-1}A)\|$, we compute
$$\left\|\left(\tilde A^{-1}A\right)^{T}\mathbf 1\right\|\ \ge\ \|\mathbf 1\|-\left\|\left(I-\left(\tilde A^{-1}A\right)^{T}\right)\mathbf 1\right\|\ \ge\ \sqrt d-\epsilon\sqrt d.$$
So,
$$\frac{\|\mathbf 1\|}{\left\|\left(\tilde A^{-1}A\right)^{T}\mathbf 1\right\|}\ \le\ \frac{1}{1-\epsilon}.$$
Finally, as $\langle\tilde\nu\mid\mathbf 1\rangle=1$ and $\|\tilde\nu\|\le 1$, we have
$$\left\langle A^{-1}\tilde A\tilde\nu\mid\mathbf 1\right\rangle
=\langle\tilde\nu\mid\mathbf 1\rangle+\left\langle\left(A^{-1}\tilde A-I\right)\tilde\nu\mid\mathbf 1\right\rangle
\ \ge\ 1-\left\|A^{-1}\tilde A-I\right\|\|\tilde\nu\|\|\mathbf 1\|\ \ge\ 1-\epsilon\sqrt d.$$
Applying Lemma 5.23, we obtain
$$\left|\frac{\partial\Psi(\tilde\nu)}{\partial\tilde\nu}\right|
=\det\!\left(A^{-1}\tilde A\right)\frac{\|\mathbf 1\|}{\left\langle A^{-1}\tilde A\tilde\nu\mid\mathbf 1\right\rangle^{d}\left\|\left(\tilde A^{-1}A\right)^{T}\mathbf 1\right\|}
\ \le\ \frac{(1+\epsilon)^{d}}{(1-\sqrt d\,\epsilon)^{d}(1-\epsilon)}.$$

LEMMA 5.25 (JACOBIAN OF $\Gamma_{u,v}$).
$$\left|\det\frac{\partial\Gamma_{u,v}(x)}{\partial x}\right|=\frac{\|v\|}{\langle x\mid v\rangle^{d}\,\|u\|}.$$

PROOF. Consider dividing $\mathbb{R}^d$ into $\mathrm{Span}(u,v)$ and the space orthogonal to $\mathrm{Span}(u,v)$. In the $(d-2)$-dimensional orthogonal space, $\Gamma_{u,v}$ acts as a multiplication by $1/\langle x\mid v\rangle$. On the other hand, the Jacobian of the restriction of $\Gamma_{u,v}$ to $\mathrm{Span}(u,v)$ is computed by Lemma 5.26 to be

$$\frac{\|v\|}{\langle x\mid v\rangle^{2}\,\|u\|}.$$

So,
$$\left|\det\frac{\partial\Gamma_{u,v}(x)}{\partial x}\right|
=\left(\frac{1}{\langle x\mid v\rangle}\right)^{d-2}\frac{\|v\|}{\langle x\mid v\rangle^{2}\,\|u\|}
=\frac{\|v\|}{\langle x\mid v\rangle^{d}\,\|u\|}.$$
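The determinant formula of Lemma 5.25 can also be checked numerically. The sketch below is ours (the dimension, seed, and step size are arbitrary); it parameterizes the two hyperplanes by orthonormal bases, computes the Jacobian of $\Gamma_{u,v}$ between them by central differences, and compares its determinant with $\|v\|/(\langle x\mid v\rangle^{d}\|u\|)$.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
u, v = rng.standard_normal(d), rng.standard_normal(d)

def basis_orthogonal_to(w):
    """Orthonormal basis (d-1 columns) of the subspace orthogonal to w."""
    q, _ = np.linalg.qr(np.column_stack([w, rng.standard_normal((d, d - 1))]))
    return q[:, 1:]

U, V = basis_orthogonal_to(u), basis_orthogonal_to(v)
gamma = lambda x: x / (x @ v)                     # Gamma_{u,v}

# s parameterizes {<x|u> = 1}; f reads the image off in coordinates of {<x|v> = 1}.
f = lambda s: V.T @ (gamma(u / (u @ u) + U @ s) - v / (v @ v))

s0 = 0.3 * rng.standard_normal(d - 1)
x0 = u / (u @ u) + U @ s0
eps = 1e-6
J = np.column_stack([(f(s0 + eps * e) - f(s0 - eps * e)) / (2 * eps)
                     for e in np.eye(d - 1)])

lemma_5_25 = np.linalg.norm(v) / (abs(x0 @ v) ** d * np.linalg.norm(u))
print(abs(np.linalg.det(J)), lemma_5_25)          # the two values agree closely
```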

LEMMA 5.26 (JACOBIAN OF $\Gamma_{u,v}$ IN 2D). Let $u$ and $v$ be vectors in $\mathbb{R}^{2}$ and let $\Gamma_{u,v}(x)$ be the map from $\{x:\langle x\mid u\rangle=1\}$ to $\{x:\langle x\mid v\rangle=1\}$ given by

$$\Gamma_{u,v}(x)=\frac{x}{\langle x\mid v\rangle}.$$

Then,
$$\left|\det\frac{\partial\Gamma_{u,v}(x)}{\partial x}\right|=\frac{\|v\|}{\langle x\mid v\rangle^{2}\,\|u\|}.$$
PROOF. Let $R=\begin{pmatrix}0&-1\\1&0\end{pmatrix}$, the rotation by $90^{\circ}$ counterclockwise. Let

$$u^{\perp}=Ru/\|u\|\quad\text{and}\quad v^{\perp}=Rv/\|v\|.$$

Express the $x$ such that $\langle x\mid u\rangle=1$ as $x=u/\|u\|^{2}+xu^{\perp}$. Similarly, parameterize the line $\{x:\langle x\mid v\rangle=1\}$ by $v/\|v\|^{2}+yv^{\perp}$. Then, we have
$$\Gamma_{u,v}\!\left(u/\|u\|^{2}+xu^{\perp}\right)=v/\|v\|^{2}+yv^{\perp},$$
where
$$y=\frac{\left\langle u/\|u\|^{2}+xu^{\perp}\mid v^{\perp}\right\rangle}{\left\langle u/\|u\|^{2}+xu^{\perp}\mid v\right\rangle}
=\frac{\left\langle u/\|u\|^{2}+xu^{\perp}\mid v^{\perp}\right\rangle}{\langle x\mid v\rangle}.$$

So,
$$\left|\det\frac{\partial\Gamma_{u,v}(x)}{\partial x}\right|=\left|\frac{\partial y}{\partial x}\right|
=\frac{\left|\left\langle u^{\perp}\mid v^{\perp}\right\rangle\left\langle\tfrac{u}{\|u\|^{2}}+xu^{\perp}\mid v\right\rangle-\left\langle u^{\perp}\mid v\right\rangle\left\langle\tfrac{u}{\|u\|^{2}}+xu^{\perp}\mid v^{\perp}\right\rangle\right|}{\langle x\mid v\rangle^{2}}$$
$$=\frac{\left|\left\langle u^{\perp}\mid v^{\perp}\right\rangle\left\langle\tfrac{u}{\|u\|^{2}}\mid v\right\rangle-\left\langle u^{\perp}\mid v\right\rangle\left\langle\tfrac{u}{\|u\|^{2}}\mid v^{\perp}\right\rangle\right|}{\langle x\mid v\rangle^{2}}$$
$$=\frac{\|v\|}{\|u\|}\cdot\frac{\left|\left\langle u^{\perp}\mid v^{\perp}\right\rangle\left\langle\tfrac{u}{\|u\|}\mid\tfrac{v}{\|v\|}\right\rangle-\left\langle u^{\perp}\mid\tfrac{v}{\|v\|}\right\rangle\left\langle\tfrac{u}{\|u\|}\mid v^{\perp}\right\rangle\right|}{\langle x\mid v\rangle^{2}}$$
$$=\frac{\|v\|}{\|u\|}\cdot\frac{\left\langle\tfrac{u}{\|u\|}\mid\tfrac{v}{\|v\|}\right\rangle^{2}+\left\langle u^{\perp}\mid\tfrac{v}{\|v\|}\right\rangle^{2}}{\langle x\mid v\rangle^{2}},\quad\text{as $R$ is orthogonal and }R^{2}=-I,$$
$$=\frac{\|v\|}{\|u\|\,\langle x\mid v\rangle^{2}},\quad\text{as }\left(\tfrac{u}{\|u\|},u^{\perp}\right)\text{ is an orthonormal basis.}$$

5.3. BOUNDING THE SHADOW OF LP$^{+}$. The main obstacle to proving a bound on the size of the shadow of LP$^{+}$ is that the vectors $a_i^{+}/y_i^{+}$ are not Gaussian random vectors. To resolve this problem, we will show that, in almost every sufficiently small region, we can construct a family of Gaussian random vectors with distributions similar to the vectors $a_i^{+}/y_i^{+}$. We will then bound the expected size of the shadow of the vectors $a_i^{+}/y_i^{+}$ by a small multiple of the expected size of the shadow of these Gaussian vectors. These regions are defined by splitting the original perturbation into two, and letting the first perturbation define the region. As in the analysis of LP$_0$, a secondary obstacle is the correlation of $\nu$ and $M$ with $A$ and $y$. We again overcome this obstacle by considering the sum of the expected sizes of the shadows when $\nu$ and $M$ are fixed to each of their likely values, and use the notation
$$T_{\nu}^{+}(A,y,\lambda,M)\ \stackrel{\mathrm{def}}{=}\
\begin{cases}
\left|\mathrm{Shadow}_{(0,\nu),z^{+}}\!\left(a_1^{+}/y_1^{+},\ldots,a_n^{+}/y_n^{+}\right)\right| & \text{if }\lambda\le\sqrt d\,M/4,\\[1mm]
0 & \text{otherwise,}
\end{cases}$$
where
$$a_i^{+}=\bigl((y_i^{0}-y_i)/2,\ a_i\bigr),\qquad y_i^{+}=(y_i^{0}+y_i)/2,\quad\text{and}\quad
y_i^{0}=\begin{cases}M & \text{if }i\in I,\\ \sqrt d\,M^{2}/(4\lambda) & \text{otherwise.}\end{cases}$$

By Lemma 3.10 and Proposition 3.7, we then have
$$S_{z}^{+}(A,y,I)=T_{\nu}^{+}\!\left(A,\ y,\ 2^{\lfloor\lg s_{\min}(A_{I(A)})\rfloor},\ 2^{\lceil\lg(\max_i\|(y_i,a_i)\|)\rceil+2}\right).$$

LEMMA 5.27 (LP$^{+}$). Let $d\ge 3$ and $n\ge d+1$. Let $\bar A=[\bar a_1,\ldots,\bar a_n]\in\mathbb{R}^{n\times d}$, $\bar y\in\mathbb{R}^{n}$ and $z\in\mathbb{R}^{d}$, satisfy $\max_i\|(\bar y_i,\bar a_i)\|\in(1/2,1]$. For any $\sigma>0$, let $A$ be a Gaussian random matrix centered at $\bar A$ of standard deviation $\sigma$, and let $y$ be a Gaussian random vector centered at $\bar y$ of standard deviation $\sigma$. Let $I$ be a set of $3nd\ln n$ randomly chosen $d$-subsets of $[n]$. Then,
$$\mathop{\mathbf E}_{A,y,I}\left[S^{+}(A,y,I)\right]\ \le\ 49\lg\!\left(nd/\min(\sigma,1)\right)\,D\!\left(d,n,\frac{\min(1,\sigma)^{5}}{2^{23}(d+1)^{11/2}n^{14}(\ln n)^{5/2}}\right)+n,$$

where $D(d,n,\sigma)$ is as given in Theorem 4.1.

PROOF. For $\sigma_0$ and $\sigma_1$ defined below, we let $G$ and $\tilde G$ be Gaussian random matrices centered at the origin of standard deviations $\sigma_0$ and $\sigma_1$, respectively. We then let $\tilde A=\bar A+G$ and $A=\tilde A+\tilde G$. We similarly let $h$ and $\tilde h$ be Gaussian random vectors centered at the origin of standard deviations $\sigma_0$ and $\sigma_1$, respectively, and let $\tilde y=\bar y+h$ and $y=\tilde y+\tilde h$. If
$$\sigma\ \le\ \frac{3^{1/4}}{\sqrt{2en}\,\bigl(60n(d+1)^{3/2}(\ln n)^{3/2}\bigr)},$$
we set $\sigma_1=\sigma$. Otherwise, we set $\sigma_1$ so that
$$\sigma_1=\frac{3^{1/4}\sqrt{\sigma_0^{2}+d\sigma^{2}}}{\sqrt{2en}\,\bigl(60n(d+1)^{3/2}(\ln n)^{3/2}\bigr)},$$
and set $\sigma_0^{2}=\sigma^{2}-\sigma_1^{2}$. We note that
$$\sigma_1=\min\!\left(\sigma,\ \frac{3^{1/4}\sqrt{\sigma_0^{2}+d\sigma^{2}}}{\sqrt{2en}\,\bigl(60n(d+1)^{3/2}(\ln n)^{3/2}\bigr)}\right).$$
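The only property of the two-stage perturbation used here is that composing an independent $\sigma_0$-Gaussian and $\sigma_1$-Gaussian reproduces the original $\sigma$-Gaussian when $\sigma_0^2+\sigma_1^2=\sigma^2$. A toy sketch of ours (the values of $\sigma$ and $\sigma_1$ are arbitrary and are not the formulas above):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.5
sigma1 = 0.1                          # toy second-stage width (not the paper's formula)
sigma0 = np.sqrt(sigma**2 - sigma1**2)

n_samples = 200_000
a_bar = 1.7                           # a fixed (scalar) center
a_tilde = a_bar + sigma0 * rng.standard_normal(n_samples)   # first stage: a~ = a_bar + G
a = a_tilde + sigma1 * rng.standard_normal(n_samples)       # second stage: a = a~ + G~

# The composition is again a Gaussian of standard deviation sigma about a_bar.
print(a.mean(), a.std())              # ~1.7 and ~0.5
```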

As in the proof of Lemma 5.15, we define the set of likely values for $M$:
$$\mathcal M=\left\{2^{\lceil\lg x\rceil+2}:\
\max_i\|(\tilde y_i,\tilde a_i)\|\left(1-\frac{9\sqrt{(d+1)\ln n}}{60n(d+1)^{3/2}(\ln n)^{3/2}}\right)\ \le\ x\ \le\
\max_i\|(\tilde y_i,\tilde a_i)\|\left(1+\frac{9\sqrt{(d+1)\ln n}}{60n(d+1)^{3/2}(\ln n)^{3/2}}\right)\right\}.$$

Observe that $|\mathcal M|\le 2$.

As in the proof of Lemma 5.15, we define random variables:
$$W=\left[\max_i\|(\tilde y_i,\tilde a_i)\|\le 1+3\sqrt{(d+1)\ln n}\,\sigma_0\right],$$
$$X=\left[\max_i\|(\tilde y_i,\tilde a_i)\|\ge\frac{3^{1/4}\sqrt{\sigma_0^{2}+d\sigma^{2}}}{\sqrt{2en}}\right],$$
$$Y=\left[2^{\lfloor\lg s_{\min}(A_{I(A)})\rfloor}\in\mathcal K\right],\quad\text{and}\quad
Z=\left[2^{\lceil\lg(\max_i\|(y_i,a_i)\|)\rceil+2}\in\mathcal M\right].$$
In order to apply the shadow bound proved below in Lemma 5.28, we need

$$M\ \ge\ 3\max_i\|(\tilde y_i,\tilde a_i)\|,$$
and
$$M\ \ge\ \bigl(60n(d+1)^{3/2}(\ln n)^{3/2}\bigr)\,\sigma_1.$$
From the definition of $\mathcal M$ and the inequality
$$1-\frac{9\sqrt{(d+1)\ln n}}{60n(d+1)^{3/2}(\ln n)^{3/2}}\ \ge\ 3/4,$$
the first of these inequalities holds if $Z$ is true. Given that $Z$ is true, the second inequality holds if $X$ is also true.
From Corollary 5.5, we know
$$\Pr_{A,I}[\mathrm{not}(Y)]\ \le\ \Pr_{A,I}\left[2^{\lfloor\lg s_{\min}(A_{I(A)})\rfloor}\notin\mathcal K\right]\ \le\ 0.42\binom{n}{d}^{-1}\ \le\ 0.42\,n\binom{n}{d+1}^{-1}.\qquad(46)$$
From Corollary 2.19, we have
$$\Pr_{\tilde A,\tilde y}[\mathrm{not}(W)]\ \le\ n^{-2.9(d+1)+1}\ \le\ 0.0015\binom{n}{d+1}^{-1}.\qquad(47)$$
From Proposition 2.22, we know
$$\Pr[\mathrm{not}(X)]\ =\ \Pr\left[\max_i\|(\tilde y_i,\tilde a_i)\|<\frac{3^{1/4}\sqrt{\sigma_0^{2}+d\sigma^{2}}}{\sqrt{2en}}\right]\ <\ n^{-(d+1)}\ \le\ \frac{1}{24}\binom{n}{d+1}^{-1}.\qquad(48)$$

To bound the probability that $Z$ fails, we note that
$$\max_i\|(\tilde y_i,\tilde a_i)\|\ \ge\ \frac{3^{1/4}\sqrt{\sigma_0^{2}+d\sigma^{2}}}{\sqrt{2en}}
\quad\text{and}\quad
\max_i\|(y_i-\tilde y_i,\ a_i-\tilde a_i)\|\ \le\ 3\sqrt{(d+1)\ln n}\,\sigma_1$$
imply $Z$ is true. Hence, by Corollary 2.19 and (48),
$$\Pr[\mathrm{not}(Z)]\ \le\ n^{-2.9(d+1)+1}+n^{-(d+1)}\ \le\ 0.044\binom{n}{d+1}^{-1}.\qquad(49)$$
As in the proof of Lemma 5.15, we now expand
$$\mathop{\mathbf E}_{I,A,y}\left[S^{+}(A,y,I)\right]
=\mathop{\mathbf E}_{I,A,y}\left[S^{+}(A,y,I)\,WXYZ\right]+\mathop{\mathbf E}_{I,A,y}\left[S^{+}(A,y,I)\,(1-WXYZ)\right].\qquad(50)$$
To bound the second term by $n$, we apply (47), (48), (46) and (49) to show
$$\Pr_{A,I}\left[\mathrm{not}(W)\text{ or not}(X)\text{ or not}(Y)\text{ or not}(Z)\right]\ \le\ n\binom{n}{d+1}^{-1},$$
and then combine this inequality with Proposition 5.2. To bound the first term of (50), we note
$$\mathop{\mathbf E}_{I,A,y}\left[S^{+}(A,y,I)\,WXYZ\right]
\ \le\ \mathop{\mathbf E}_{I,\tilde A,\tilde y}\left[WX\sum_{\lambda\in\mathcal K,\,M\in\mathcal M}\mathop{\mathbf E}_{\tilde G,\tilde h}\left[T_{\nu}^{+}(A,y,\lambda,M)\,XZ\right]\right]$$
$$\le\ \mathop{\mathbf E}_{I,\tilde A,\tilde y}\left[WX\sum_{\lambda\in\mathcal K,\,M\in\mathcal M}\mathop{\mathbf E}_{\tilde G,\tilde h}\left[T_{\nu}^{+}(A,y,\lambda,M)\ \middle|\ XZ\right]\right]$$
$$\le\ \mathop{\mathbf E}_{I,\tilde A,\tilde y}\left[WX\sum_{\lambda\in\mathcal K,\,M\in\mathcal M}\left(e\,D\!\left(d,n,\frac{\sigma_1\min_i y_i^{0}}{3(\max_i y_i^{0})^{2}}\right)+1\right)\right],\quad\text{by Lemma 5.28,}\qquad(51)$$
$$\le\ \mathop{\mathbf E}_{I,\tilde A,\tilde y}\left[WX\sum_{\lambda\in\mathcal K,\,M\in\mathcal M}\left(e\,D\!\left(d,n,\frac{\sigma_1 M}{3\bigl(\sqrt d\,M^{2}/(4\lambda)\bigr)^{2}}\right)+1\right)\right]$$
$$\le\ \mathop{\mathbf E}_{I,\tilde A,\tilde y}\left[WX\left(|\mathcal K||\mathcal M|\,e\,D\!\left(d,n,\frac{16\,\sigma_1\min(\mathcal K)^{2}}{3d\max(\mathcal M)^{3}}\right)+1\right)\right].\qquad(52)$$
As $\min(\mathcal K)\ge\lambda_0/2$ and $W$ implies $\max(\mathcal M)\le 9\bigl(1+3\sqrt{(d+1)\ln n}\,\sigma_0\bigr)$, a direct calculation shows
$$\frac{16\,\sigma_1\min(\mathcal K)^{2}}{3d\max(\mathcal M)^{3}}\ \ge\ \frac{\min(1,\sigma)^{5}}{2^{23}(d+1)^{11/2}n^{14}(\ln n)^{5/2}}.$$
Applying this inequality, Proposition 5.6, and the fact that $X$ implies $|\mathcal M|\le 2$, we obtain
$$(52)\ \le\ 49\lg\!\left(nd/\min(\sigma,1)\right)D\!\left(d,n,\frac{\min(1,\sigma)^{5}}{2^{23}(d+1)^{11/2}n^{14}(\ln n)^{5/2}}\right).$$

LEMMA 5.28 (LP$^{+}$ SHADOW, PART 2). Let $d\ge 3$ and $n\ge d+1$. Let $y$ be a Gaussian random vector of standard deviation $\sigma_1$ centered at a point $\tilde y$, and let $a_1,\ldots,a_n$ be Gaussian random vectors in $\mathbb{R}^{d}$ of standard deviation $\sigma_1$ centered at $\tilde a_1,\ldots,\tilde a_n$, respectively. Under the conditions
$$y_i^{0}\ \ge\ 3\,\|(\tilde y_i,\tilde a_i)\|,\quad\forall i,\qquad(53)$$
and
$$y_i^{0}\ \ge\ 60n(d+1)^{3/2}(\ln n)^{3/2}\,\sigma_1,\quad\forall i,\qquad(54)$$
let
$$a_i^{+}=\bigl((y_i^{0}-y_i)/2,\ a_i\bigr)\quad\text{and}\quad y_i^{+}=(y_i^{0}+y_i)/2.$$
Then,
$$\mathop{\mathbf E}_{(y_1,a_1),\ldots,(y_n,a_n)}\left[\mathrm{Shadow}_{(0,\nu),z^{+}}\!\left(a_1^{+}/y_1^{+},\ldots,a_n^{+}/y_n^{+}\right)\right]
\ \le\ e\,D\!\left(d,n,\frac{\sigma_1\min_i y_i^{0}}{3(\max_i y_i^{0})^{2}}\right)+1.$$

PROOF. We use the notation
$$\bigl(p_{i,0}(\tilde h_i),\ p_i(\tilde h_i,\tilde g_i)\bigr)=a_i^{+}/y_i^{+}
=\left(\frac{y_i^{0}-\tilde y_i-\tilde h_i}{y_i^{0}+\tilde y_i+\tilde h_i},\ \frac{2(\tilde a_i+\tilde g_i)}{y_i^{0}+\tilde y_i+\tilde h_i}\right),$$
where $\tilde g_1,\ldots,\tilde g_n$ are the columns of $\tilde G$ and $(\tilde h_1,\ldots,\tilde h_n)=\tilde h$ as defined in the proof of Lemma 5.27. The Gaussian random vectors that we will use to approximate these will come from their first-order approximations:

$$\bigl(\hat p_{i,0}(\tilde h_i),\ \hat p_i(\tilde h_i,\tilde g_i)\bigr)
=\left(\frac{y_i^{0}-\tilde y_i-\tilde h_i\bigl(2y_i^{0}/(y_i^{0}+\tilde y_i)\bigr)}{y_i^{0}+\tilde y_i},\
\frac{2\tilde a_i+2\tilde g_i-\tilde h_i\bigl(2\tilde a_i/(y_i^{0}+\tilde y_i)\bigr)}{y_i^{0}+\tilde y_i}\right).$$
Let $\hat\mu_i(\hat p_{i,0},\hat p_i)$ be the induced density on $(\hat p_{i,0},\hat p_i)$. In Lemma 5.30, we prove that there exists a set $B$ of $((p_{1,0},p_1),\ldots,(p_{n,0},p_n))$ such that
$$\Pr_{\prod_{i=1}^{n}\mu_i(p_{i,0},p_i)}\left[\bigl((p_{1,0},p_1),\ldots,(p_{n,0},p_n)\bigr)\in B\right]\ \ge\ 1-0.0015\binom{n}{d+1}^{-1},$$
and for $((p_{1,0},p_1),\ldots,(p_{n,0},p_n))\in B$,
$$\prod_{i=1}^{n}\mu_i(p_{i,0},p_i)\ \le\ e\prod_{i=1}^{n}\hat\mu_i(p_{i,0},p_i).$$
Consequently, Lemma 2.10 allows us to prove
$$\mathop{\mathbf E}_{\prod_{i=1}^{n}\mu_i(p_{i,0},p_i)}\left[\mathrm{Shadow}_{(0,\nu),z^{+}}\bigl((p_{1,0},p_1),\ldots,(p_{n,0},p_n)\bigr)\right]
\ \le\ e\mathop{\mathbf E}_{\prod_{i=1}^{n}\hat\mu_i(p_{i,0},p_i)}\left[\mathrm{Shadow}_{(0,\nu),z^{+}}\bigl((p_{1,0},p_1),\ldots,(p_{n,0},p_n)\bigr)\right]+1.$$
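To see why the first-order approximation is useful, note that $(\hat p_{i,0},\hat p_i)$ agrees with $(p_{i,0},p_i)$ to first order in $(\tilde h_i,\tilde g_i)$, so the two differ only quadratically in the perturbation. The toy sketch below is ours; the values of $y_i^{0}$, $\tilde y_i$, $\tilde a_i$ are arbitrary but respect condition (53), and it exhibits the quadratic decay.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
y0, y_t = 10.0, 1.2                     # toy y_i^0 and y~_i with y0 >= 3||(y~_i, a~_i)||
a_t = 0.5 * rng.standard_normal(d)      # toy center a~_i

def exact(h, g):                        # (p_{i,0}, p_i)
    den = y0 + y_t + h
    return np.concatenate([[(y0 - y_t - h) / den], 2 * (a_t + g) / den])

def first_order(h, g):                  # (p^_{i,0}, p^_i)
    den = y0 + y_t
    return np.concatenate([[(y0 - y_t - h * (2 * y0 / den)) / den],
                           (2 * a_t + 2 * g - h * (2 * a_t / den)) / den])

h_dir, g_dir = rng.standard_normal(), rng.standard_normal(d)
for scale in [1e-1, 1e-2, 1e-3]:
    err = np.linalg.norm(exact(scale * h_dir, scale * g_dir)
                         - first_order(scale * h_dir, scale * g_dir))
    print(scale, err)                   # err shrinks roughly like scale**2
```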

By Lemma 5.29, the densities $\hat\mu_i$ represent Gaussian distributions centered at points of norm at most
$$\left\|\left(\frac{y_i^{0}-\tilde y_i}{y_i^{0}+\tilde y_i},\ \frac{2\tilde a_i}{y_i^{0}+\tilde y_i}\right)\right\|\ \le\ \sqrt 5\quad\text{(by condition (53))}$$
whose covariance matrices have eigenvalues at most
$$\bigl(9\sigma_1/(2y_i^{0})\bigr)^{2}\ \le\ \Bigl(9\big/\bigl(2\cdot 60n(d+1)^{3/2}(\ln n)^{3/2}\bigr)\Bigr)^{2}\ \le\ \frac{1}{9d\ln n}\quad\text{(by condition (54))}$$
and at least $\bigl(9\sigma_1/(8y_i^{0})\bigr)^{2}$. Thus, we can apply Corollary 4.21 to bound
$$\mathop{\mathbf E}_{\prod_{i=1}^{n}\hat\mu_i(p_{i,0},p_i)}\left[\mathrm{Shadow}_{(0,\nu),z^{+}}\bigl((p_{1,0},p_1),\ldots,(p_{n,0},p_n)\bigr)\right]
\ \le\ D\!\left(d,n,\frac{9\sigma_1/(8\max_i y_i^{0})}{(1+\sqrt 5)(\max_i y_i^{0}/\min_i y_i^{0})}\right)
\ \le\ D\!\left(d,n,\frac{\sigma_1\min_i y_i^{0}}{3(\max_i y_i^{0})^{2}}\right),$$
thereby proving the lemma.

LEMMA 5.29 ($\hat\mu$). Under the conditions of Lemma 5.28, $\bigl(\hat p_{i,0}(\tilde h_i),\hat p_i(\tilde h_i,\tilde g_i)\bigr)$ is a Gaussian random vector centered at
$$\left(\frac{y_i^{0}-\tilde y_i}{y_i^{0}+\tilde y_i},\ \frac{2\tilde a_i}{y_i^{0}+\tilde y_i}\right)$$
and has a covariance matrix with eigenvalues between $\bigl(9\sigma_1/(8y_i^{0})\bigr)^{2}$ and $\bigl(9\sigma_1/(2y_i^{0})\bigr)^{2}$.

PROOF. Because $\bigl(\hat p_{i,0}(\tilde h_i),\hat p_i(\tilde h_i,\tilde g_i)\bigr)$ is linear in $(\tilde h_i,\tilde g_i)$ and $(\tilde h_i,\tilde g_i)$ is a Gaussian random vector, $\bigl(\hat p_{i,0}(\tilde h_i),\hat p_i(\tilde h_i,\tilde g_i)\bigr)$ is a Gaussian vector. The statement about the center of the distributions follows immediately from the fact that $(\tilde h_i,\tilde g_i)$ is centered at the origin. To construct the covariance matrix, we note that the matrix corresponding to the transformation from $(\tilde h_i,\tilde g_i)$ to $\bigl(\hat p_{i,0}(\tilde h_i),\hat p_i(\tilde h_i,\tilde g_i)\bigr)$ is
$$C_i\ \stackrel{\mathrm{def}}{=}\
\begin{pmatrix}
-\dfrac{2y_i^{0}}{(y_i^{0}+\tilde y_i)^{2}} & 0\ \cdots\ 0\\[2mm]
-\dfrac{2\tilde a_i}{(y_i^{0}+\tilde y_i)^{2}} & \dfrac{2}{y_i^{0}+\tilde y_i}\,I
\end{pmatrix}.$$
Thus, the covariance matrix of $\bigl(\hat p_{i,0}(\tilde h_i),\hat p_i(\tilde h_i,\tilde g_i)\bigr)$ is given by $\sigma_1^{2}C_iC_i^{T}$.

We now note that
$$C_i=\frac{2}{y_i^{0}+\tilde y_i}\left(\frac{1}{y_i^{0}+\tilde y_i}\begin{pmatrix}\tilde y_i\\ -\tilde a_i\end{pmatrix}\begin{pmatrix}1&0&\cdots&0\end{pmatrix}-\begin{pmatrix}1&0\\ 0&-I\end{pmatrix}\right).$$
As all the singular values of the second matrix are $1$, and the norm of the first, rank-one, matrix is $\|(\tilde y_i,\tilde a_i)\|/(y_i^{0}+\tilde y_i)$, all the singular values of $C_i$ lie between
$$\frac{2}{y_i^{0}+\tilde y_i}\left(1-\frac{\|(\tilde y_i,\tilde a_i)\|}{y_i^{0}+\tilde y_i}\right)
\quad\text{and}\quad
\frac{2}{y_i^{0}+\tilde y_i}\left(1+\frac{\|(\tilde y_i,\tilde a_i)\|}{y_i^{0}+\tilde y_i}\right).$$
The stated bounds now follow from inequality (53).

LEMMA 5.30 (ALMOST GAUSSIAN). Under the conditions of Lemma 5.28, let $\mu_i(p_{i,0},p_i)$ be the induced density on $(p_{i,0},p_i)$, and let $\hat\mu_i(\hat p_{i,0},\hat p_i)$ be the induced density on $(\hat p_{i,0},\hat p_i)$. Then, there exists a set $B$ of $((p_{1,0},p_1),\ldots,(p_{n,0},p_n))$ such that
(a) $\Pr\left[\bigl((p_{1,0},p_1),\ldots,(p_{n,0},p_n)\bigr)\in B\right]\ \ge\ 1-0.0015\binom{n}{d+1}^{-1}$; and
(b) for all $((p_{1,0},p_1),\ldots,(p_{n,0},p_n))\in B$,
$$\prod_{i=1}^{n}\mu_i(p_{i,0},p_i)\ \le\ e\prod_{i=1}^{n}\hat\mu_i(p_{i,0},p_i).$$

PROOF. Let
$$B=\left\{\bigl((p_{1,0}(\tilde h_1),p_1(\tilde h_1,\tilde g_1)),\ldots,(p_{n,0}(\tilde h_n),p_n(\tilde h_n,\tilde g_n))\bigr):\
\|(\tilde h_i,\tilde g_i)\|\le 3\sqrt{(d+1)\ln n}\,\sigma_1\ \text{for}\ 1\le i\le n\right\}.$$
From inequalities (53) and (54), and the assumption $|\tilde h_i|\le 3\sqrt{(d+1)\ln n}\,\sigma_1$, we can show $y_i^{0}+\tilde y_i+\tilde h_i>0$, and so the map from $(\tilde h_1,\tilde g_1),\ldots,(\tilde h_n,\tilde g_n)$ to $(p_{1,0},p_1),\ldots,(p_{n,0},p_n)$ is invertible for $((p_{1,0},p_1),\ldots,(p_{n,0},p_n))\in B$. Thus, we may apply Corollary 2.19 to establish part (a). Part (b) follows directly from Lemma 5.31.

LEMMA 5.31 (ALMOST GAUSSIAN, SINGLE VARIABLE). Under the conditions of Lemma 5.28, for all $\tilde h_i$ and $\tilde g_i$ such that $\|(\tilde h_i,\tilde g_i)\|\le 3\sqrt{(d+1)\ln n}\,\sigma_1$,
$$\mu_i\bigl(p_{i,0}(\tilde h_i),p_i(\tilde h_i,\tilde g_i)\bigr)\ \le\ \exp(1/n)\,\hat\mu_i\bigl(p_{i,0}(\tilde h_i),p_i(\tilde h_i,\tilde g_i)\bigr).$$

PROOF. Let $\rho(\tilde h_i,\tilde g_i)$ be the density on $(\tilde h_i,\tilde g_i)$. As observed in the proof of Lemma 5.30, the map from $(\tilde h_i,\tilde g_i)$ to $\bigl(p_{i,0}(\tilde h_i),p_i(\tilde h_i,\tilde g_i)\bigr)$ is injective for $\|(\tilde h_i,\tilde g_i)\|\le 3\sqrt{(d+1)\ln n}\,\sigma_1$; so, by Proposition 2.26, the induced density $\mu_i$ is
$$\mu_i(p_{i,0},p_i)=\frac{\rho(\tilde h_i,\tilde g_i)}{\left|\det\dfrac{\partial(p_{i,0},p_i)}{\partial(\tilde h_i,\tilde g_i)}\right|},
\quad\text{where }(p_{i,0},p_i)=\bigl(p_{i,0}(\tilde h_i),p_i(\tilde h_i,\tilde g_i)\bigr).$$

Similarly,
$$\hat\mu_i(\hat p_{i,0},\hat p_i)=\frac{\rho(\hat h_i,\hat g_i)}{\left|\det\dfrac{\partial(\hat p_{i,0},\hat p_i)}{\partial(\hat h_i,\hat g_i)}\right|},
\quad\text{where }(\hat p_{i,0},\hat p_i)=\bigl(\hat p_{i,0}(\hat h_i),\hat p_i(\hat h_i,\hat g_i)\bigr).$$
The proof now follows from Lemma 5.32, which tells us that
$$\frac{\rho(\tilde h_i,\tilde g_i)}{\rho(\hat h_i,\hat g_i)}\ \le\ \exp(0.81/n),$$
and Lemma 5.33, which tells us that
$$\frac{\left|\det\dfrac{\partial(\hat p_{i,0},\hat p_i)}{\partial(\hat h_i,\hat g_i)}\right|}{\left|\det\dfrac{\partial(p_{i,0},p_i)}{\partial(\tilde h_i,\tilde g_i)}\right|}\ \le\ \exp(1/10n).$$

LEMMA 5.32 (ALMOST GAUSSIAN, POINTWISE). Under the conditions of Lemma 5.31, if $p_{i,0}(\tilde h_i)=\hat p_{i,0}(\hat h_i)$, $p_i(\tilde h_i,\tilde g_i)=\hat p_i(\hat h_i,\hat g_i)$, and $\|(\tilde h_i,\tilde g_i)\|\le 3\sqrt{(d+1)\ln n}\,\sigma_1$, then
$$\frac{\rho(\tilde h_i,\tilde g_i)}{\rho(\hat h_i,\hat g_i)}\ \le\ \exp(0.81/n).$$

PROOF. We first observe that the conditions of the lemma imply
$$\hat h_i=\frac{\tilde h_i(y_i^{0}+\tilde y_i)}{y_i^{0}+\tilde y_i+\tilde h_i},\quad\text{and}\quad
\hat g_i=\frac{\tilde g_i(y_i^{0}+\tilde y_i)}{y_i^{0}+\tilde y_i+\tilde h_i}.$$
We then compute
$$\frac{\rho(\tilde h_i,\tilde g_i)}{\rho(\hat h_i,\hat g_i)}
=\exp\!\left(-\frac{1}{2\sigma_1^{2}}\,\|(\tilde h_i,\tilde g_i)\|^{2}\,
\frac{2\tilde h_i(y_i^{0}+\tilde y_i)+\tilde h_i^{2}}{(y_i^{0}+\tilde y_i+\tilde h_i)^{2}}\right).\qquad(55)$$
Assuming $\|(\tilde h_i,\tilde g_i)\|\le 3\sqrt{(d+1)\ln n}\,\sigma_1$, the absolute value of the exponent in (55) is at most
$$\frac{9(d+1)\ln n}{2}\cdot\frac{2\tilde h_i(y_i^{0}+\tilde y_i)+\tilde h_i^{2}}{(y_i^{0}+\tilde y_i+\tilde h_i)^{2}}.$$
From inequalities (53) and (54), we find
$$\frac{y_i^{0}+\tilde y_i}{(y_i^{0}+\tilde y_i+\tilde h_i)^{2}}\ \le\ \frac{40}{(37)^{2}\,n(d+1)^{3/2}(\ln n)^{3/2}\,\sigma_1}.$$
Observing that $\tilde h_i\le(1/40)(y_i^{0}+\tilde y_i)$, we can now lower bound the exponent in (55) by
$$-\frac{9(d+1)\ln n}{2}\cdot\frac{2\tilde h_i(81/80)\cdot 40}{(37)^{2}\,n(d+1)^{3/2}(\ln n)^{3/2}\,\sigma_1}\ \ge\ -0.81/n.$$

LEMMA 5.33 (ALMOST GAUSSIAN, JACOBIANS). Under the conditions of Lemma 5.31,
$$\frac{\left|\det\dfrac{\partial(\hat p_{i,0},\hat p_i)}{\partial(\hat h_i,\hat g_i)}\right|}{\left|\det\dfrac{\partial(p_{i,0},p_i)}{\partial(\tilde h_i,\tilde g_i)}\right|}\ \le\ \exp(0.094/n).$$

PROOF. We first note that
$$\left|\det\frac{\partial(\hat p_{i,0},\hat p_i)}{\partial(\hat h_i,\hat g_i)}\right|
=\left|\det(C_i)\right|=\frac{2^{d+1}y_i^{0}}{(y_i^{0}+\tilde y_i)^{d+2}}.$$
To compute $\left|\det\frac{\partial(p_{i,0},p_i)}{\partial(\tilde h_i,\tilde g_i)}\right|$, we note that
$$\frac{\partial p_{i,0}}{\partial\tilde h_i}=\frac{-2y_i^{0}}{(y_i^{0}+\tilde y_i+\tilde h_i)^{2}},\quad\text{and}\quad
\frac{\partial p_{i,j}(\tilde h_i,\tilde g_i)}{\partial\tilde g_{i,k}}=
\begin{cases}0 & \text{if }j\ne k,\\[1mm] \dfrac{2}{y_i^{0}+\tilde y_i+\tilde h_i} & \text{otherwise.}\end{cases}$$
Thus, the matrix of partial derivatives is lower-triangular, and its determinant has absolute value
$$\left|\det\frac{\partial(p_{i,0},p_i)}{\partial(\tilde h_i,\tilde g_i)}\right|=\frac{2^{d+1}y_i^{0}}{(y_i^{0}+\tilde y_i+\tilde h_i)^{d+2}}.$$
Thus,
$$\frac{\left|\det\dfrac{\partial(\hat p_{i,0},\hat p_i)}{\partial(\hat h_i,\hat g_i)}\right|}{\left|\det\dfrac{\partial(p_{i,0},p_i)}{\partial(\tilde h_i,\tilde g_i)}\right|}
=\left(\frac{y_i^{0}+\tilde y_i+\tilde h_i}{y_i^{0}+\tilde y_i}\right)^{d+2}
=\left(1+\frac{\tilde h_i}{y_i^{0}+\tilde y_i}\right)^{d+2}
\ \le\ \left(1+\frac{3\tilde h_i}{2y_i^{0}}\right)^{d+2},\quad\text{by (53),}$$
$$\le\ \exp\!\left(\frac{3(d+2)\tilde h_i}{2y_i^{0}}\right)\ \le\ \exp(0.094/n),\quad\text{by }d\ge 3\text{ and (54).}$$
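The determinant of the exact map, which drives the proof of Lemma 5.33, is easy to confirm by finite differences. The sketch below is ours (all numerical values are arbitrary toy choices); it builds the Jacobian of $(\tilde h_i,\tilde g_i)\mapsto(p_{i,0},p_i)$ numerically and compares $|\det|$ with $2^{d+1}y_i^{0}/(y_i^{0}+\tilde y_i+\tilde h_i)^{d+2}$.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
y0, y_t = 10.0, 1.2                     # toy y_i^0 and y~_i
a_t = 0.5 * rng.standard_normal(d)      # toy center a~_i

def p(hg):                              # (h, g) -> (p_{i,0}, p_i) as in Lemma 5.28
    h, g = hg[0], hg[1:]
    den = y0 + y_t + h
    return np.concatenate([[(y0 - y_t - h) / den], 2 * (a_t + g) / den])

hg0 = np.concatenate([[0.3], 0.1 * rng.standard_normal(d)])
eps = 1e-6
J = np.column_stack([(p(hg0 + eps * e) - p(hg0 - eps * e)) / (2 * eps)
                     for e in np.eye(d + 1)])

closed_form = 2 ** (d + 1) * y0 / (y0 + y_t + hg0[0]) ** (d + 2)
print(abs(np.linalg.det(J)), closed_form)   # the two values agree closely
```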

6. Discussion and Open Questions

The results proved in this article support the assertion that the shadow-vertex simplex algorithm usually runs in polynomial time. However, our understanding of the performance of the simplex algorithm is far from complete. In this section, we discuss problems in the analysis of the simplex algorithm and in the smoothed analysis of algorithms that deserve further study.

6.1. PRACTICALITY OF THE ANALYSIS. While we have demonstrated that the smoothed complexity of the shadow-vertex algorithm is polynomial, the polynomial we obtain is quite large. Yet, we believe that the present analysis provides some intuition for why the shadow-vertex simplex algorithm should run quickly. It is clear that the proofs in this article are very loose and make many worst-case assumptions that are unlikely to be simultaneously valid. We did not make any attempt to optimize the coefficients or exponents of the polynomial we obtained. We have not attempted such optimization for two reasons: it would increase the length of the paper and probably make it more difficult to read; and, we believe that it should be possible to improve the bounds in this paper by simplifying the analysis rather than making it more complicated. Finally, we point out that most of our intuition comes from the shadow size bound, which is not as bad as the bound for the two-phase algorithm.

6.2. FURTHER ANALYSIS OF THE SIMPLEX ALGORITHM.
—While we have analyzed the shadow-vertex pivot rule, there are many other pivot rules that are more commonly used in practice. Knowing that one pivot rule usually takes polynomial time makes it seem reasonable that others should as well. We consider the maximum-increase and steepest-increase rules, as well as randomized pivot rules, to be good candidates for smoothed analysis. However, the reader should note that there is a reason that the shadow-vertex pivot rule was the first to be analyzed: there is a simple geometric description of the vertices encountered by the algorithm. For other pivot rules, the only obvious characterization of the vertices encountered is by iterative application of the pivot rule. This iterative characterization introduces dependencies that make probabilistic analysis difficult.
—Even if we cannot perform a smoothed analysis of other pivot rules, we might be able to measure the diameter of a polytope under smoothed analysis. We conjecture that its expectation is polynomial in m, d, and 1/σ.
—Given that the shadow-vertex simplex algorithm can solve perturbations of linear programs efficiently, it seems natural to ask if we can follow the solutions as we unperturb the linear programs. For example, having solved an instance of type (4), it makes sense to follow the solution as we let σ approach zero. Such an approach is often called a homotopy or path-following method. So far, we know of no reason that there should exist an A for which one cannot follow these solutions in expected polynomial time, where the expectation is taken over the choice of G. Of course, if one could follow these solutions in expected polynomial time for every A, then one would have a randomized strongly polynomial time algorithm for linear programming!

6.3. DEGENERACY. One criticism of our model is that it does not allow for degenerate linear programs. It is an interesting problem to find a model of local perturbations that will preserve meaningful degeneracies. It seems that one might be able to expand upon the ideas of Todd [1991] to construct such a model. Until such a model presents itself and is analyzed, we make the following observations about two types of degeneracies.
—In primal degeneracy, a single feasible vertex may correspond to multiple bases, I. In the polar formulation, this corresponds to an unexpectedly large number of the a_i s lying in a (d − 1)-dimensional affine subspace. In this case, a simplex

method may cycle—spending many steps switching among bases for this vertex, failing to make progress toward the objective function. Unlike many simplex methods, the shadow-vertex method may still be seen to be making progress in this situation: each successive basis corresponds to a simplex that maps to an edge further along the shadow. It just happens that these edges are collinear. A more severe version of this phenomenon occurs when the set of feasible points of a linear program lies in an affine subspace of fewer than d dimensions. By considering perturbations to the constraints under the condition that they do not alter the affine span of the set of feasible points, the results on the sizes of shadows obtained in Section 4 carry over unchanged. However, how such a restriction would affect the results in Section 5 is presently unclear.
—In dual degeneracy, the optimal solution of the linear program is a face of the polyhedron rather than a vertex. This does not appear to be a very strong condition, and we expect that one could extend our analysis to a model that preserves such degeneracies.

6.4. SMOOTHED ANALYSIS. We believe that many algorithms will be better understood through smoothed analysis. Scientists and engineers routinely use algorithms with poor worst-case performance. Often, they solve problems that appear intractable from the worst-case perspective. While we do not expect smoothed analysis to explain every such instance, we hope that it can explain away a significant fragment of the discrepancy between the algorithmic intuitions of engineers and theorists. To make it easier to apply smoothed analysis, we briefly discuss some alternative definitions of smoothed analysis.

Zero-Preserving Perturbations. One criticism of smoothed complexity as defined in Section 1.2 is that the additive Gaussian perturbations destroy any zero-structure that the problem has, as they will replace the zeros with small values. One can refine the model to fix this problem by studying zero-preserving perturbations. In this model, one applies Gaussian perturbations only to non-zero entries. Zero entries remain zero.

Relative Perturbations. A further refinement is the model of relative perturbations. Under a relative perturbation, an input is mapped to a constant multiple of itself. For example, a reasonable definition would be to map each variable by

x ↦ x(1 + σg),

where g is a Gaussian random variable of mean zero and variance 1. Thus, each number is usually mapped to one of similar magnitude, and zero is always mapped to zero. When we measure smoothed complexity under relative perturbations, we call it relative smoothed complexity. Smoothed complexity as defined in Section 1.2 above can be called absolute smoothed complexity if clarification is necessary. It would be very interesting to know if the simplex method has polynomial relative smoothed complexity.

δ-Smoothed-Complexity. Even if we cannot bound the expectation of the running time of an algorithm under perturbations, we can still obtain computationally meaningful results for an algorithm by proving that it has δ-smoothed-complexity f(n, σ, δ), by which we mean that the probability that it takes time more than f(n, σ, δ) is at most δ:

∀ x ∈ X_n:  Pr_g [C(A, x + σ max(x) g) ≤ f(n, σ, δ)] ≥ 1 − δ.
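To make these variants concrete, the sketch below is our own illustration: the input x, the value of σ, and the cost function are stand-ins (not the simplex method or anything from this paper). It implements the three perturbation models just described and estimates an empirical δ-smoothed bound for one of them by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.array([0.0, 2.0, -1.0, 0.0, 5.0])     # an arbitrary "worst-case" input
sigma = 0.1

def absolute(x):          # absolute model: Gaussian noise scaled by max(x)
    return x + sigma * np.max(np.abs(x)) * rng.standard_normal(x.shape)

def zero_preserving(x):   # perturb only the nonzero entries; zeros stay zero
    g = sigma * np.max(np.abs(x)) * rng.standard_normal(x.shape)
    return np.where(x == 0, 0.0, x + g)

def relative(x):          # x_i -> x_i (1 + sigma g_i); zeros stay zero
    return x * (1 + sigma * rng.standard_normal(x.shape))

def cost(x):              # hypothetical running-time proxy, not a real algorithm
    return np.sum(np.abs(x)) ** 2

# Empirical delta-smoothed bound: an f with Pr_g[cost >= f] <= delta = 0.01.
samples = np.array([cost(relative(x)) for _ in range(100_000)])
print(np.quantile(samples, 0.99))
```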

ACKNOWLEDGMENTS. We thank Diana, Donna and Julia for their patience while we wrote, wrote, and re-wrote this article. We thank Marcos Kiwi and Mohammad Mahdian for comments on an early draft of this article. We thank Alan Edelman, Charles Leiserson, and Michael Sipser for helpful conversations. We thank Alan Edelman for proposing the name "smoothed analysis". Finally, we thank the referees for their extraordinary efforts and many helpful suggestions.

REFERENCES

ABRAMOWITZ, M., AND STEGUN, I. A., Eds. 1970. Handbook of Mathematical Functions, 9 ed. Applied Mathematics Series, vol. 55. National Bureau of Standards.
ADLER, I. 1983. The expected number of pivots needed to solve parametric linear programs and the efficiency of the self-dual simplex method. Tech. Rep., University of California at Berkeley. May.
ADLER, I., KARP, R. M., AND SHAMIR, R. 1987. A simplex variant solving an m × d linear program in O(min(m², d²)) expected number of pivot steps. J. Complex. 3, 372–387.
ADLER, I., AND MEGIDDO, N. 1985. A simplex algorithm whose average number of steps is bounded between two quadratic functions of the smaller dimension. J. ACM 32, 4 (Oct.), 871–895.
AMENTA, N., AND ZIEGLER, G. 1999. Deformed products and maximal shadows of polytopes. In Advances in Discrete and Computational Geometry, B. Chazelle, J. Goodman, and R. Pollack, Eds. Number 223 in Contemporary Mathematics. American Mathematics Society, 57–90.
AVIS, D., AND CHVÁTAL, V. 1978. Notes on Bland's pivoting rule. In Polyhedral Combinatorics. Math. Programming Study, vol. 8. 24–34.
BLASCHKE, W. 1935. Integralgeometrie 2: Zu Ergebnissen von M. W. Crofton. Bull. Math. Soc. Roumaine Sci. 37, 3–11.
BLUM, A., AND SPENCER, J. 1995. Coloring random and semi-random k-colorable graphs. J. Algorithms 19, 2, 204–234.
BORGWARDT, K. H. 1977. Untersuchungen zur Asymptotik der mittleren Schrittzahl von Simplexverfahren in der linearen Optimierung. Ph.D. dissertation, Universität Kaiserslautern.
BORGWARDT, K. H. 1980. The Simplex Method: A Probabilistic Analysis. Number 1 in Algorithms and Combinatorics. Springer-Verlag, New York.
CHAN, T. F., AND HANSEN, P. C. 1992. Some applications of the rank revealing QR factorization. SIAM J. Sci. Stat. Comput. 13, 3 (May), 727–741.
DANTZIG, G. B. 1951. Maximization of linear function of variables subject to linear inequalities. In Activity Analysis of Production and Allocation, T. C. Koopmans, Ed. 339–347.
EDELMAN, A. 1992. Eigenvalue roulette and random test matrices. In Linear Algebra for Large Scale and Real-Time Applications, M. S. Moonen, G. H. Golub, and B. L. R. D. Moor, Eds. NATO ASI Series. 365–368.
EFRON, B. 1965. The convex hull of a random set of points. Biometrika 52, 3/4, 331–343.
FEIGE, U., AND KILIAN, J. 1998. Heuristics for finding large independent sets, with applications to coloring semi-random graphs. In 39th Annual Symposium on Foundations of Computer Science: Proceedings (Palo Alto, Calif., Nov. 8–11). IEEE Computer Society Press, Los Alamitos, Calif.
FEIGE, U., AND KRAUTHGAMER, R. 1998. Improved performance guarantees for bandwidth minimization heuristics. http://www.cs.berkeley.edu/robi/papers/FK-BandwidthHenristics-manuscript98.ps.gz.
FELLER, W. 1968. An Introduction to Probability Theory and Its Applications. Vol. 1. Wiley, New York.
FELLER, W. 1971. An Introduction to Probability Theory and Its Applications. Vol. 2. Wiley, New York.
GOLDFARB, D. 1983. Worst case complexity of the shadow vertex simplex algorithm. Tech. Rep., Columbia Univ., New York.
GOLDFARB, D., AND SIT, W. T. 1979. Worst case behaviour of the steepest edge simplex method. Disc. Appl. Math. 1, 277–285.
HAIMOVICH, M. 1983. The simplex algorithm is very good!: On the expected number of pivot steps and related properties of random linear programs. Tech. Rep., Columbia Univ., New York.
JEROSLOW, R. G. 1973. The simplex algorithm with the pivot rule of maximizing improvement criterion. Disc. Math. 4, 367–377.

KALAI, G. 1992. A subexponential randomized simplex algorithm. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing (Victoria, B.C., Canada). ACM, New York, 475–482.
KALAI, G., AND KLEITMAN, D. J. 1992. A quasi-polynomial bound for the diameter of graphs of polyhedra. Bull. Amer. Math. Soc. 26, 315–316.
KARMARKAR, N. 1984. A new polynomial time algorithm for linear programming. Combinatorica 4, 373–395.
KHACHIYAN, L. G. 1979. A polynomial algorithm in linear programming. Doklady Akademia Nauk SSSR, 1093–1096.
KLEE, V., AND MINTY, G. J. 1972. How good is the simplex algorithm? In Inequalities—III, Shisha, O., Ed. Academic Press, Orlando, Fla., 159–175.
MATOUŠEK, J., SHARIR, M., AND WELZL, E. 1996. A subexponential bound for linear programming. Algorithmica 16, 4/5 (Oct./Nov.), 498–516.
MEGIDDO, N. 1986. Improved asymptotic analysis of the average number of steps performed by the self-dual simplex algorithm. Math. Prog. 35, 2, 140–172.
MILES, R. E. 1971. Isotropic random simplices. Adv. Appl. Prob. 3, 353–382.
MURTY, K. G. 1980. Computational complexity of parametric linear programming. Math. Prog. 19, 213–219.
RENEGAR, J. 1994. Some perturbation theory for linear programming. Math. Prog. Ser. A 65, 1, 73–91.
RENEGAR, J. 1995a. Incorporating condition measures into the complexity theory of linear programming. SIAM J. Optim. 5, 3, 506–524.
RENEGAR, J. 1995b. Linear programming, complexity theory and elementary functional analysis. Math. Prog. Ser. A 70, 3, 279–351.
RÉNYI, A., AND SULANKE, R. 1963. Über die konvexe Hülle von n zufällig gewählten Punkten, I. Z. Wahrsch. 2, 75–84.
RÉNYI, A., AND SULANKE, R. 1964. Über die konvexe Hülle von n zufällig gewählten Punkten, II. Z. Wahrsch. 3, 138–148.
SANTALÓ, L. A. 1976. Integral Geometry and Geometric Probability. Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, Mass.
SANTHA, M., AND VAZIRANI, U. 1986. Generating quasi-random sequences from semi-random sources. J. Comput. System Sciences 33, 75–87.
SMALE, S. 1982. The problem of the average speed of the simplex method. In Proceedings of the 11th International Symposium on Mathematical Programming. 530–539.
SMALE, S. 1983. On the average number of steps in the simplex method of linear programming. Math. Prog. 27, 241–262.
TODD, M. 1986. Polynomial expected behavior of a pivoting algorithm for linear complementarity and linear programming problems. Math. Prog. 35, 173–192.
TODD, M. J. 1991. Probabilistic models for linear programming. Math. Oper. Res. 16, 4, 671–693.
TODD, M. J., TUNÇEL, L., AND YE, Y. 2001. Characterizations, bounds, and probabilistic analysis of two complexity measures for linear programming problems. Math. Program., Ser. A 90, 59–69.
TURNER, J. S. 1986. On the probable performance of a heuristic for bandwidth minimization. SIAM J. Comput. 15, 2, 561–580.
VAVASIS, S. A., AND YE, Y. 1996. A primal-dual interior-point method whose running time depends only on the constraint matrix. Math. Program. 74, 79–120.

RECEIVED JUNE 2002; REVISED OCTOBER 2003; ACCEPTED OCTOBER 2003
