Annals of Mathematics, 169 (2009), 595–632

Inverse Littlewood-Offord theorems and the condition number of random discrete matrices

By Terence Tao and Van H. Vu*

Abstract

Consider a random sum η1v1 + ··· + ηnvn, where η1, . . . , ηn are indepen- dently and identically distributed (i.i.d.) random signs and v1, . . . , vn are inte- gers. The Littlewood-Offord problem asks to maximize concentration probabil- ities such as P(η1v1+···+ηnvn = 0) subject to various hypotheses on v1, . . . , vn. In this paper we develop an inverse Littlewood-Offord theory (somewhat in the spirit of Freiman’s inverse theory in additive combinatorics), which starts with the hypothesis that a concentration probability is large, and concludes that almost all of the v1, . . . , vn are efficiently contained in a generalized arithmetic progression. As an application we give a new bound on the magnitude of the least of a random Bernoulli , which in turn provides upper tail estimates on the condition number.

1. Introduction

Let v be a multiset (allowing repetitions) of n integers v1, . . . , vn. Consider a class of discrete random walks Yµ,v on the integers Z, which start at the origin and consist of n steps, where at the ith step one moves backwards or forwards with magnitude vi and probability µ/2, and stays at rest with probability 1−µ. More precisely:

Definition 1.1 (Random walks). For any 0 ≤ µ ≤ 1, let ηµ ∈ {−1, 0, 1} denote a random variable which equals 0 with probability 1 − µ and ±1 with probability µ/2 each. In particular, η1 is a random sign ±1, while η0 is iden- tically zero. Given v, we define Yµ,v to be the random variable

n X µ Yµ,v := ηi vi i=1

*T. Tao is a Clay Prize Fellow and is supported by a grant from the Packard Foundation. V. Vu is an A. Sloan Fellow and is supported by an NSF Career Grant. 596 TERENCE TAO AND VAN H. VU

µ µ where the ηi are i.i.d. copies of η . Note that the exact enumeration v1, . . . , vn of the multiset is irrelevant. The concentration probability Pµ(v) of this random walk is defined to be the quantity

(1) Pµ(v) := max P(Yµ,v = a). a∈Z

Thus we have 0 < Pµ(v) ≤ 1 for any µ, v.

The concentration probability (and more generally, the concentration ) is a central notion in probability theory and has been studied exten- sively, especially by the Russian school (see [21], [19], [18] and the references therein). The first goal of this paper is to establish a relation between the magnitude of Pµ(v) and the arithmetic structure of the multiset v = {v1, . . . , vn}. This gives an answer to the general question of finding conditions under which one can squeeze large probability inside a small interval. We will primarily be interested in the case µ = 1, but for technical reasons it will be convenient to consider more general values of µ. Generally, however, we think of µ as fixed, while letting n become very large. A classical result of Littlewood-Offord [16], found in their study of the number of real roots of random , asserts that if all of the vi’s −1/2 are nonzero, then P1(v) = O(n log n). The log term was later removed by Erd˝os[5]. Erd˝os’bound is sharp, as shown by the case v1 = ··· = vn 6= 0. How- ever, if one forbids this special case and assumes that the vi’s are all distinct, then the bound can be improved significantly. Erd˝osand Moser [6] showed −3/2 that under this stronger assumption, P1(v) = O(n ln n). They conjectured that the logarithmic term is not necessary and this was confirmed by S´ark¨ozy and Szemer´edi[22]. Again, the bound is sharp (up to a constant factor), as can be seen by taking v1, . . . , vn to be a proper arithmetic progression such as 1, . . . , n. Later, Stanley [24], using algebraic methods, gave a very explicit bound for the probability in question. The higher dimensional version of Littlewood-Offord’s problem (where d the vi are nonzero vectors in R , for some fixed d) also drew lots of attention. Without the assumption that the vi’s are different, the best result was obtained by Frankl and F¨urediin [7], following earlier results by Katona [11], Kleitman [12], Griggs, Lagarias, Odlyzko and Shearer [8] and many others. However, the techniques used in these papers did not seem to yield the generalization of S´ark¨ozyand Szemer´edi’sresult (the O(n−3/2) bound under the assumption that the vectors are different). The generalization of S´ark¨ozyand Szemer´edi’sresult was obtained by Hal´asz[9], using analytical methods (especially harmonic analysis). Hal´asz’ paper was one of our starting points in this study. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 597

In the above two examples, we see that in order to make Pµ(v) large, we have to impose a very strong additive structure on v (in one case we set the vi’s to be the same, while in the other we set them to be elements of an arithmetic progression). We are going to show that this is the only way to make Pµ(v) large. More precisely, we propose the following phenomenon:

If Pµ(v) is large, then v has a strong additive structure. In the next section, we are going to present several theorems supporting this phenomenon. Let us mention here that there is an analogous phenomenon in combinatorial number theory. In particular, a famous theorem of Freiman asserts that if A is a finite set of integers and A+A is small, then A is contained efficiently in a generalized arithmetic progression [28, Ch. 5]. However, the proofs of Freiman’s theorem and those in this paper are quite different. As an application, we are going to use these inverse theorems to study µ random matrices. Let Mn be an n by n random matrix, whose entries are i.i.d. copies of ηµ. We are going to show that with very high probability, µ the condition number of Mn is bounded from above by a in n (see Theorem 3.3 below). This result has high potential of applications in the theory of probability in Banach spaces, as well as in and theoretical computer science. A related result was recently established by Rudelson [20], with better upper bounds on the condition number but worse probabilities. We will discuss this application with more detail in Section 3. To see the connection between this problem and inverse Littlewood-Offord theory, observe that for any v = (v1, . . . , vn) (which we interpret as a column µ vector), the entries of the product Mn v are independent copies of Yµ,v. Thus T µ we expect that v is unlikely to lie in the of Mn unless the concentration probability Pµ(v) is large. These ideas are already enough to control the µ singularity probability of Mn (see e.g. [10], [25], [26]). To obtain the more quantitative condition number estimates, we introduce a new discretization technique that allows one to estimate the probability that a certain random variable is small by the probability that a certain discretized analogue of that variable is zero. The rest of the paper is organized as follows. In Section 2 we state our main inverse theorems. In Section 3 we state our main results on condition numbers, as well as the key lemmas used to prove these results. In Section 4, we give some brief applications of the inverse theorems. In Section 7 we prove the result on condition numbers, assuming the inverse theorems and two other key ingredients: a discretization of generalized progressions and an extension of the famous result of Kahn, Koml´osand Szemer´edi[10] on the probability that a random Bernoulli matrix is singular. The inverse theorems is proven in Section 6, after some preliminaries in Section 5 in which we establish basic properties of Pµ(v). The result about discretization of progressions are proven 598 TERENCE TAO AND VAN H. VU in Section 8. Finally, in Section 9 we prove the extension of Kahn, Koml´osand Szemer´edi[10]. We conclude this section by setting out some basic notation. A set

0 P = {c + m1a1 + ··· + mdad|Mi ≤ mi ≤ Mi } is called a generalized arithmetic progression (GAP) of rank d. It is convenient to think of P as the image of an integer box

0 B := {(m1, . . . , md)|Mi ≤ mi ≤ Mi } in Zd under the linear map

Φ:(m1, . . . , md) 7→ c + m1a1 + ··· + mdad.

The numbers ai are the generators of P . In this paper, all GAPs have ra- tional generators. A GAP is proper if Φ is one to one on B. The product Qd 0 0 i=1(Mi − Mi + 1) is the volume of P . If Mi = −Mi and c = 0 (so P = −P ) then we say that P is symmetric. For a set A of reals and a positive integer k, we define the iterated sumset

kA := {a1 + ··· + ak|ai ∈ A}.

One should take care to distinguish the sumset kA from the dilate k·A, defined for any real k as k · A := {ka|a ∈ A}.

We always assume that n is sufficiently large. The asymptotic notation O(), o(), Ω(), Θ() is used under the assumption that n → ∞. Notation such as Od(f) means that the hidden constant in O depends only on d.

2. Inverse Littlewood-Offord theorems

Let us start by presenting an example when Pµ(v) is large. This example is the motivation of our inverse theorems.

Example 2.1. Let P be a symmetric generalized arithmetic progression of rank d and volume V ; we view d as being fixed independently of n, though V can grow with n. Let v1, . . . , vn be (not necessarily different) elements of V . Pn Then the random variable Yµ,v = i=1 ηivi takes values in the GAP nP which has volume ndV . From the pigeonhole principle it follows that

−d −1 Pµ(v) ≥ n V .

In fact, the central limit theorem suggests that Pµ(v) should typically be of the order of n−d/2V −1. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 599

This example shows that if the elements of v belong to a GAP with small rank and small volume then Pµ(v) is large. One might hope that the inverse also holds, namely,

If Pµ(v) is large, then (most of ) the elements of v belong to a GAP with small rank and small volume. In the rest of this section, we present three theorems, which support this statement in a quantitative way.

Definition 2.2 (Dissociativity). Given a multiset w = {w1, . . . , wr} of real numbers and a positive number k, we define the GAP Q(w, k) and the cube S(w) as follows:

Q(w, k) := {m1w1 + ··· + mrwr| − k ≤ mi ≤ k},

S(w) := {1w1 + ··· + rwr|i ∈ {−1, 1}}. We say that w is dissociated if S(w) does not contain zero. Furthermore, w is k-dissociated if there do not exist integers −k ≤ m1, . . . , mr ≤ k, not all zero, such that m1w1 + ··· + mrwr = 0.

Our first result is the following simple proposition:

Proposition 2.3 (Zeroth inverse theorem). Let v = {v1, . . . , vn} be such −d−1 that P1(v) > 2 for some integer d ≥ 0. Then v contains a subset w of size d such that the cube S(w) contains v1, . . . , vn.

The next two theorems are more involved and also more useful. In these two theorems and their corollaries, we assume that k and n are sufficiently large, whenever needed.

Theorem 2.4 (First inverse theorem). Let µ be a positive constant at most 1 and let d be a positive integer. Then there is a constant C =C(µ, d)≥1 such that the following holds. Let k ≥ 2 be an integer and let v = {v1, . . . , vn} be a multiset such that −d Pµ(v) ≥ C(µ, d)k .

Then there exists a k-dissociated multiset w = {w1, . . . , wr} such that

(1) r ≤ d − 1 and w1, . . . , wr are elements of v;

S 1 2 (2) The union τ∈Z,1≤τ≤k τ · Q(w, k) contains all but k of the integers v1, . . . , vn (counting multiplicity).

This theorem should be compared against the heuristics in Example 2.1 √ (setting k equal to a small multiple of n). In particular, note that the GAP Q(w, k) has very small volume, only O(kd−1). 600 TERENCE TAO AND VAN H. VU

The above theorem does not yet show that most of the elements of v belong to a single GAP. Instead, it shows that they belong to the union of a 1 few dilates of a GAP. One could remove the unwanted τ factor by clearing denominators, but this costs us an exponential factor such as k!, which is often too large in applications. Fortunately, a more refined argument allows us to eliminate these denominators while losing only polynomial factors in k:

Theorem 2.5 (Second inverse theorem). Let µ be a positive constant at most one,  be an arbitrary positive constant and d be a positive integer. Then there are constants C = C(µ, , d) ≥ 1 and k0 = k0(µ, , d) ≥ 1 such that the following holds. Let k ≥ k0 be an integer and let v = {v1, . . . , vn} be a multiset such that −d Pµ(v) ≥ Ck . Then there exists a GAP Q with the following properties: (1) The rank of Q is at most d − 1; (2) The volume of Q is at most k2(d2−1)+; (3) Q contains all but at most k2 log k elements of v (counting multiplicity); (4) There exists a positive integer s at most kd+ such that su ∈ v for each generator u of Q.

Remark 2.6. A small number of exceptional elements cannot be avoided. For instance, one can add O(log k) completely arbitrary elements to v, and −O(1) decrease Pµ(v) by a factor of k at worst.

For the applications in this paper, the following corollary of Theorem 2.5 is convenient.

Corollary 2.7. For any positive constants A and α there is a positive constant A0 such that the following holds. Let µ be a positive constant at most one and assume that v = {v1, . . . , vn} is a multiset of integers satisfying −A 0 Pµ(v) ≥ n . Then there is a GAP Q of rank at most A and volume at most nA0 which contains all but at most nα elements of v (counting multiplicity). Furthermore, there exists a positive integer s ≤ nA0 such that su ∈ v for each generator u of Q.

−A Remark 2.8. The assumption Pµ(v) ≥ n in all statements can be re- placed by the following more technical, but somewhat weaker, assumption that Z 1 Y −A |(1 − µ) + µ cos 2πviξ| dξ ≥ n . 0 i=1

The right-hand side is an upper bound for Pµ(v), provided that µ is sufficiently −A small. Assuming that Pµ(v) ≥ n , what is actually used in the proofs is the INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 601 consequence Z 1 Y −A |(1 − µ) + µ cos 2πviξ|dξ ≥ n . 0 i=1 (See §5 for more details.) This weaker assumption is useful in applications (see [27]).

The vector versions of all three theorems hold (when the vi’s are vectors in Rr, for any positive integer r), thanks to Freiman’s isomorphism principle (see, e.g., [28, Ch. 5]). This principle allows us to project the problem from Rr onto Z. The value of r is irrelevant and does not appear in any quantitative bound. In fact, one can even replace Rr by any torsion free additive group. In an earlier paper [26] we introduced another type of inverse Littlewood- Offord theorem. This result showed that if Pµ(v) was comparable to P1(v), then v could be efficiently contained inside a GAP of bounded rank (see [26, Th. 5.2] for details). We shall prove these inverse theorems in Section 6, after some combina- torial and Fourier-analytic preliminaries in Section 5. For now, we take these results for granted and turn to an application of these inverse theorems to random matrices.

3. The condition number of random matrices

If M is an n × n matrix, we use

σ1(M) := sup kMxk n x∈R ,kxk=1 to denote the largest singular value of M. Tthis parameter is also often called the operator of M. Here kxk denotes the Euclidean magnitude of a vector x ∈ Rn. If M is invertible, the condition number c(M) is defined as

−1 c(M) := σ1(M)σ1(M ). We adopt the convention that c(M) is infinite if M is not invertible. The condition number plays a crucial role in applied and computer science. In particular, the complexity of any algorithm which re- quires solving a system of linear equations usually involves the condition num- ber of a matrix; see [1], [23]. Another area of mathematics where this parameter is important is the theory of probability in Banach spaces (e.g. see [15], [20]). The condition number of a random matrix is a well-studied object (see [3] and the references therein). In the case when the entries of M are i.i.d. Gaussian random variables (with mean zero and variance one), Edelman [3], answering a question of Smale [23] showed 602 TERENCE TAO AND VAN H. VU

Theorem 3.1. Let Nn be an n × n random matrix, whose entries are i.i.d. Gaussian random variables (with mean zero and variance one). Then E(ln c(Nn)) = ln n + c + o(1), where c > 0 is an explicit constant.

In application, it is usually useful to have a tail estimate. It was shown by Edelman and Sutton [4] that

Theorem 3.2. Let Nn be a n by n random matrix, whose entries are i.i.d. Gaussian random variables (with mean zero and variance one). Then for any constant A > 0, A+1 −A P(c(Nn) ≥ n ) = OA(n ).

On the other hand, for the other basic case when the entries are i.i.d. Bernoulli random variables (copies of η1), the situation is far from being settled. Even to prove that the condition number is finite with high probability is a nontrivial task (see [13]). The techniques used to study Gaussian matrices rely heavily on the explicit joint distribution of the eigenvalues. This distribution is not available for discrete models. Using our inverse theorems, we can prove the following result, which is µ comparable to Theorem 3.2, and is another main result of this paper. Let Mn be the n by n random matrix whose entries are i.i.d. copies of ηµ. In particular, the Bernoulli matrix mentioned above is the case when µ = 1.

Theorem 3.3. For any positive constant A, there is a positive constant B such that the following holds. For any positive constant µ at most one and any sufficiently large n µ B −A P(c(Mn ) ≥ n ) ≤ n .

Given an M of order n, we set σn(M) to be the smallest singular value of M: σ (M) := min kMxk. n n x∈R ,kxk=1 Then c(M) = σ1(M)/σn(M).

It is well known that there is a constant Cµ such that the largest singular µ 1/2 value of Mn is at most Cµn with exponential probability 1 − exp(−Ωµ(n)) (see, e.g. [14]). Thus, Theorem 3.3 reduces to the following lower tail estimate for the smallest singular value of σn(M):

Theorem 3.4. For any positive constant A, there is a positive constant B such that the following holds. For any positive constant µ at most one and any sufficiently large n µ −B −A P(σn(Mn ) ≤ n ) ≤ n . INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 603

Shortly prior to this paper, Rudelson [20] proved the following result.

Theorem 3.5. Let 0 < µ ≤ 1. There are positive constants c1(µ), c2(µ) −1/2 such that the following holds. For any  ≥ c1(µ)n ,

µ −3/2 P(σn(Mn ) ≤ c2(µ)n ) ≤ .

In fact, Rudelson’s result holds for a larger class of matrices. The descrip- tion of this class is, however, somewhat technical. We refer the reader to [20] for details. It is useful to compare Theorems 3.4 and 3.5. Theorem 3.5 gives an explicit dependence between the bound on σn and the probability, while the dependence between A and B in Theorem 3.4 is implicit. Actually our proof does provide an explicit value for B, but it is rather large and we make no attempt to optimize it. On the other hand, Theorem 3.5 does not yield a probability better than n−1/2. In many applications (especially those involving the union bound), it is important to have a probability bound of order n−A with arbitrarily given A. The proof of Theorem 3.4 relies on Corollary 2.7 and two other ingredients, which are of independent interest. In the rest of this section, we discuss these ingredients. These ingredients will then be combined in Section 7 to prove Theorem 3.4.

3.1. Discretization of GAPs. Let P be a GAP of integers of rank d and volume V . We show that given any specified scale parameter R0, one can “discretize” P near the scale R0. More precisely, one can cover P by the sum of a coarse progression and a small progression, where the diameter of the small progression is much smaller (by an arbitrarily specified factor of S) than the spacing of the coarse progression, and that both of these quantities are close to R0 (up to a bounded power of SV ).

Theorem 3.6 (Discretization). Let P ⊂ Z be a symmetric GAP of rank d and volume V . Let R0,S be positive integers. Then there exists a scale R ≥ 1 and two GAPs Psmall, Psparse of rational numbers with the following properties.

O (1) • (Scale) R = (SV ) d R0.

• (Smallness) Psmall has rank at most d, volume at most V , and takes values in [−R/S, R/S].

• (Sparseness) Psparse has rank at most d, volume at most V , and any two distinct elements of SPsparse are separated by at least RS.

• (Covering) P ⊆ Psmall + Psparse. 604 TERENCE TAO AND VAN H. VU

This theorem is elementary but is somewhat involved. The detailed proof will appear in Section 8. Here, we give an informal explanation, appealing to the analogy between the combinatorics of progressions and linear algebra. Recall that a GAP of rank d is the image Φ(B) of a d-dimensional box under a linear map Φ. This can be viewed as a discretized, localized analogue of the object Φ(V ), where Φ is a linear map from a d-dimensional vector space V to some other vector space. The analogue of a “small” progression would be an object Φ(V ) in which Φ vanished. The analogue of a “sparse” progression would be an object Φ(V ) in which the map Φ was injective. Theorem 3.6 is then a discretized, localized analogue of the obvious linear algebra fact that given any object of the form Φ(V ), one can split V = Vsmall + Vsparse for which Φ(Vsmall) is small and Φ(Vsparse) is sparse. Indeed one simply sets Vsmall to be the kernel of Φ, and Vsparse to be any complementary subspace to Vsmall in V . The proof of Theorem 3.6 follows these broad ideas, with Psmall be- ing essentially a “kernel” of the progression P , and Psparse being a kind of “complementary progression” to this kernel. To oversimplify, we shall exploit this discretization result (as well as the inverse Littlewood-Offord theorems) to control the event that the singular value is small, by the event that the singular value (of a slightly modified random matrix) is zero. The control of this latter quantity is the other ingredient of the proof, to which we now turn.

3.2. Singularity of random matrices. A famous result of Kahn, Koml´os 1 and Szemer´edi[10] asserts that the probability that Mn is singular (or equiv- 1 alently, that σn(Mn) = 0) is exponentially small: Theorem 3.7. There is a positive constant ε such that 1 n P(σn(Mn) = 0) ≤ (1 − ε) . In [10] it was shown that one can take ε = .001. Improvements on ε are obtained recently in [25], [26]. The value of  does not play a critical role in this paper. To prove Theorem 3.3, we need the following generalization of Theo- 1 1 rem 3.7. Note that the row vectors of Mn are i.i.d. copies of X , where 1 1 1 1 1 X = (η1, . . . , ηn) and ηi are i.i.d. copies of η . By changing 1 to µ, we can µ define X in the obvious manner. Now let Y be a set of l vectors y1, . . . , yl in n µ,Y µ µ R and Mn be the random matrix whose rows are X1 ,...,Xn−l, y1, . . . , yl, µ µ where Xi are i.i.d. copies of X . Theorem 3.8. Let 0 < µ ≤ 1, and let l be a nonnegative integer. Then there is a positive constant ε = ε(µ, l) such that the following holds. For any set Y of l independent vectors from Rn, µ,Y n P(σn(Mn ) = 0) ≤ (1 − ε) . INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 605

Corollary 3.9. Let 0 < µ ≤ 1. Then there is a positive constant ε = ε(µ) such that the following holds. For any vector y ∈ Rn, the probability that there are w1, . . . , wn−1, not all zeros, such that µ µ y = X1 w1 + ...Xn−1wn−1 is at most (1 − ε)n.

We will prove Theorem 3.10 in Section 9 by using the machinery from [25].

4. Some quick applications of the inverse theorems

The inverse theorems provide effective bounds for counting the number of “exceptional” collections v of numbers with high concentration probability; see [26] for a demonstration of how such bounds can be used in applications. In this section, we present two such bounds that can be obtained from the inverse theorems developed here. In the first example, let  be a positive constant and M be a large integer, and consider the following question: How many sets v of n integers with absolute values at most M are there such that P1(v) ≥ ? By Erd˝os’result, all but at most O(−2) of the elements of v are nonzero. n  O(−2) Thus we have the upper bound −2 (2M +1) for the number in question. Using Proposition 2.3, we can obtain a better bound as follows. There are only M O(ln −1) ways to choose the generators of the cube. After the cube is fixed, we need to choose O(−2) nonzero elements inside it. As the cube has volume −1 1 O(−2) O( ), the number of ways to do this is (  ) . Thus, we end up with a bound  O(−2) −1 1 M O(ln  )  which is better than the previous bound if M is considerably larger than −1. For the second application, we return to the question of bounding the sin- 1 gularity probability P(σn(Mn) = 0) studied in Theorem 3.7. This probability is conjectured to equal (1/2 + o(1))n, but this remains open (see [26] for the 1 latest results and some further discussion). The event that Mn is singular is the same as the event that there exists some nonzero vector v ∈ Rn such that 1 1 Mnv = 0. For simplicity, we use the notation Mn instead of Mn in the rest of this section. It turns out that one can obtain the optimal bound (1/2 + o(1))n if one restricts v to some special set of vectors. n Let Ω1 be the set of vectors in R with at least 3n/ log2 n coordinates. Koml´osproved the following:

Theorem 4.1. The probability that Mnv = 0 for some nonzero v ∈ Ω1 is (1/2 + o(1))n. 606 TERENCE TAO AND VAN H. VU

A proof of this theorem can be found in Bollob´as’book [2]. We are going to consider another restricted class. Let C be an arbitrary n positive constant and let Ω2 be the set of integer vectors in R where the coordinates have absolute values at most nC . Using Theorem 2.4, we can prove

Theorem 4.2. The probability that Mnv = 0 for some nonzero v ∈ Ω2 is (1/2 + o(1))n.

Proof. The lower bound is trivial so we focus on the upper bound. For each nonzero vector v, let p(v) be the probability that X · v = 0, where X is n a random Bernoulli vector. From independence we have P(Mnv = 0) = p(v) . Since a hyperplane can contain at most 2n−1 vectors from {−1, +1}n, p(v) is at most 1/2. For j = 1, 2,... , let Sj be the number of nonzero vectors v in Ω2 −j−1 −j such that 2 < p(v) ≤ 2 . Then the probability that Mnv = 0 for some nonzero v ∈ Ω2 is at most n X −j n (2 ) Sj. j=1

Let us now restrict the range of j. Note that if p(v) ≥ n−1/3, then by Erd˝os’s result (mentioned in the introduction) most of the coordinates of v are zero. In this case, by Theorem 4.1 the contribution from these v is at most (1/2+o(1))n. C n (C+1)n Next, since the number of vectors in Ω2 is at most (2n + 1) ≤ n , we can ignore those j where 2−j ≤ n−C−2. Now it suffices to show that

X −j n n (2 ) Sj = o((1/2) ). n−C−2≤2−j ≤n−1/3 For any relevant j, we can find an integer d = O(1) and a positive number  = Ω(1) such that n−(d−1/3) ≤ 2−j < n−(d−2/3).

Set k := n. Thus 2−j  k−d and we can use Theorem 2.4 to esti- mate Sj. Indeed, by invoking this theorem, we see that there are at most n  C k2 O(k2 o(n) k2 (2n + 1) = n ) = n ways to choose the positions and values of exceptional coordinates of v. Furthermore, there are only (2nC +1)d−1 = nO(1) ways to fix the generalized progression P := Q(w, k). Note that the elements of P are polynomially bounded in n. Such integers have only no(1 divisors. Thus, if P is fixed any (nonexceptional) coordinate of v has at most |P |no(1) possible values. This means that once P is fixed, the number of ways to set the nonexceptional coordinates of v is at most (no(1)|P |)n = (2k + 1)(d−1+o(1))n. Putting these together,

O(k2) (d−1+o(1))n Sj ≤ n k . INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 607

As k = nε and 2−j ≤ n−(d−2/3), it follows that  1  2−jnS ≤ no(n)n−n/3 = o 2−n. j log n Since there are only O(log n) relevant j, we can conclude the proof by summing the bound over j.

5. Properties of Pµ(v)

In order to prove the inverse Littlewood-Offord theorems in Section 2, we shall first need to develop some useful tools for estimating the quantity Pµ(v). Note that the tools here are only used for the proof of the inverse Littlewood-Offord theorems in Section 6 and are not required elsewhere in the paper. It is convenient to think of v as a word, obtained by concatenating the numbers vi: v = v1v2 . . . vn. This allows us to perform several operations such as concatenating, truncating and repeating. For instance, if v = v1 . . . vn and w = w1 . . . wm, then n m  X µ X µ  Pµ(vw) = max ηi vi + ηn+jwj = a a∈Z i=1 j=1 µ µ k where ηk , 1 ≤ k ≤ n + m are i.i.d. copies of η . Furthermore, we use v to denote the concatenation of k copies of v. It turns out that there is a nice concerning the expressions Pµ(v), especially when µ is small. The core properties are summarized in the next lemma.

Lemma 5.1. The following properties hold.

• Pµ(v) is invariant under permutations of v. • For any words v, w

(2) Pµ(v)Pµ(w) ≤ Pµ(vw) ≤ Pµ(v). • For any 0 < µ ≤ 1, any 0 < µ0 ≤ µ/4, and any word v,

(3) Pµ(v) ≤ Pµ0 (v). • For any number 0 < µ ≤ 1/2 and any word v, k (4) Pµ(v) ≤ Pµ/k(v ). • For any number 0 < µ ≤ 1/2 and any words v, w1,..., wm, m 1/m  Y m  (5) Pµ(vw1 ... wm) ≤ Pµ(vwj ) . j=1 608 TERENCE TAO AND VAN H. VU

• For any number 0 < µ ≤ 1/2 and any words v, w1,..., wm, there is an index 1 ≤ j ≤ m such that

m (6) Pµ(vw1 ... wm) ≤ Pµ(vwj ).

Proof. The first two properties are trivial. To verify the rest, note that from Fourier analysis

Z 1 n (µ) (µ) −2πiaξ Y (7) P(η1 v1 + ··· + ηn vn = a) = e (1 − µ + µ cos(2πvjξ)) dξ. 0 j=1

When 0 < µ ≤ 1/2, the expression 1 − µ + µ cos(2πvjξ)) is positive, and thus n Z 1 Y (8) Pµ(v) = P(Yµ,v = 0) = (1 − µ + µ cos(2πvjξ)) dξ. 0 j=1

To prove (3), note that for any 0 < µ ≤ 1, 0 < µ0 ≤ µ/4 and any θ we have the elementary inequality

|(1 − µ) + µ cos θ| ≤ (1 − µ0) + µ0 cos 2θ.

Using this,

n Z 1 Y Pµ(v) ≤ |(1 − µ + µ cos(2πvjξ))| dξ 0 j=1 Z 1 n Y 0 0 ≤ (1 − µ + µ cos(4πvjξ)) dξ 0 j=1 Z 1 n Y 0 0 = (1 − µ + µ cos(4πvjξ))dξ 0 j=1

= Pµ0 (v) where the next to last equality follows by changing ξ to 2ξ and considering the periodicity of cosine. Similarly, observe that for 0 < µ ≤ 1/2 and k ≥ 1,  µ µ k (1 − µ + µ cos(2πv ξ)) ≤ 1 − + cos(2πv ξ) . j k k j t From the concavity of log(1 − t) when 0 < t < 1, log(1 − t) ≤ k log(1 − k ). The claim follows by exponentiating this with t := µ(1 − cos(2πvjξ))), which proves (4). Finally, (5) is a consequence of (8) and H¨older’sinequality, while (6) follows directly from (5). INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 609

µ Now we consider the distribution of the equal-steps random walk η1 + µ ··· + ηm = Yµ,1m . Intuitively, this random walk is concentrated in an interval of length O((1+µm)1/2) and has a roughly uniform distribution in the integers in this interval (though when µ is close to 1, parity considerations may cause Yµ,1m to favor the even integers over the odd ones, or vice versa); compare with the discussion in Example 2.1. The following lemma is a quantitative version of this intuition.

Lemma 5.2. For any 0 < µ ≤ 1 and m ≥ 1 m µ µ −1/2 (9) Pµ(1 ) = sup P(η1 + ··· + ηm = a) = O((µm) ). a In fact, we have the more general estimate µ µ −1 −1/2 µ µ (10) P(η1 +···+ηm = a) = O((τ +(µm) )P(η1 +···+ηm ∈ [a−τ, a+τ]) for any a ∈ Z and τ ≥ 1. Finally, if τ ≥ 1 and if S is any τ-separated set of integers (i.e. any two distinct elements of S are at least τ apart) then µ µ −1 −1/2 (11) P(η1 + ··· + ηm ∈ S) ≤ O(τ + (µm) ). Proof. We first prove (9). From (3) we may assume µ ≤ 1/4, and then by (8) Z 1 m m Pµ(1 ) = |1 − µ + µ cos(2πξ)| dξ. 0 Next we use the elementary estimate 1 − µ + µ cos(2πξ) ≤ exp(−µkξk2/100), where kξk denotes the distance to the nearest integer. This implies that m R 1 2 Pµ(1 ) is bounded from above by 0 exp(−µmkξk /100)dξ, which is of or- der O((µm)−1/2). To see this, note that for ξ ≥ 1000(µm)−1/2 the function exp(−µmkξk2/100) is quite small and its integral is negligible. Now we prove (10). We may assume that τ ≤ (µm)1/2, since the claim for larger τ follows automatically. By symmetry we can take a ≥ 2. For each integer a, let ca denote the probability (µ) (µ) ca := P(η1 + ··· + ηm = a). Direct computation (letting i denote the number of η(µ) variables which equal zero) yields the explicit formula m X m  m − j  c = (1 − µ)j(µ/2)m−j , a j (a + m − j)/2 j=0 a with the convention that the binomial coefficient b is zero when b is not an integer between 0 and a. This in particular yields the monotonicity property 610 TERENCE TAO AND VAN H. VU ca ≥ ca+2 whenever a ≥ 0. This is already enough to yield the claim when a > τ, so it remains to verify the claim when a ≤ τ. Now the random variable µ µ η1 + ··· + ηm is symmetric around the origin and has variance µm, so from Chebyshev’s inequality we know that X ca = Θ(1). 0≤a≤2(µm)1/2

−1/2 From (9) we also have ca = O((µm) ) for all a. From this and the monotonicity property ca ≥ ca+2 and the pigeonhole principle we see that −1/2 1/2 ca = Θ((µm) ) either for all even 0 ≤ a ≤ (µm) , or for all odd 0 ≤ a ≤ (µm)1/2. In either case, the claim (10) is easily verified. The bound in (11) P then follows by summing (10) over all a ∈ S and noting that a ca = 1.

One can also use the formula for ca to prove (9). The simple details are left as an exercise.

6. Proofs of the inverse theorems

We now have enough machinery to prove the inverse Littlewood-Offord theorems. We first give a quick proof of Proposition 2.3:

Proof of Proposition 2.3. Suppose that the conclusion failed. Then an easy greedy algorithm argument shows that v must contain a dissociated subword w = (w1, . . . , wd+1) of length d + 1. By (2), −d−1 2 < P1(v) ≤ P1(w).

On the other hand, since w is dissociated, all the sums of the form η1w1 + ··· −d−1 + ηd+1wd+1 are distinct and so P1(w) ≤ 2 , yielding the desired contradic- tion.

To prove Theorem 2.4, we modify the above argument by replacing the notion of dissociativity by k-dissociativity. Unfortunately this makes the proof somewhat longer:

Proof of Theorem 2.4. We construct an k-dissociated tuple (w1, . . . , wr) for some 0 ≤ r ≤ d − 1 by the following algorithm:

• Step 0. Initialize r = 0. In particular, (w1, . . . , wr) is trivially k-dissoci- ated. From (4) we have d (12) Pµ/4d(v ) ≥ Pµ/4(v) ≥ Pµ(v).

• Step 1. Count how many 1 ≤ j ≤ n there are such that (w1, . . . , wr, vj) is k-dissociated. If this number is less than k2, halt the algorithm. Oth- erwise, move on to Step 2. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 611

• Step 2. Applying the last property of Lemma 5.1, we can locate a vj such that (w1, . . . , wr, vj) is k-dissociated, and

d−r k2 k2 d−r−1 k2 k2 k2 (13) Pµ/4d(v w1 . . . wr ) ≤ Pµ/4d(v w1 . . . wr vj ).

Then set wr+1 := vj and increase r to r + 1. Return to Step 1. Note that (w1, . . . , wr) remains k-dissociated, and (12) remains true. Suppose that we terminate at some step r ≤ d − 1. Then we have an r-tuple (w1, . . . , wr) which is k-dissociated, but such that (w1, . . . , wr, vj) is 2 k-dissociated for at most k values of vj. Unwinding the definitions, this shows 2 that for all but at most k values of vj, there exists τ ∈ [1, k] such that τvj ∈ Q(w, k), proving the claim. It remains to show that we must indeed terminate at some step r ≤ d − 1. Assume (for a contradiction) that we have reached step d. Then there exists a k-dissociated tuple (w1, . . . , wd), and by (12), (13), k2 k2 µ(v) ≤ µ/4d(w1 . . . wd ) = P(Y k2 k2 = 0). P P µ/4d,w1 ...wd Let Γ ⊂ Zd be the lattice d Γ := {(m1, . . . , md) ∈ Z : m1w1 + ··· + mdwd = 0}. By using independence we can write d X Y (14) µ(v) ≤ P(Y k2 k2 = 0) = P(Y k2 = mj). P µ/4d,w1 ...wd µ/4d,1 (m1,...,md)∈Γ j=1 Now we use a volume packing argument. From Lemma 5.2,   1 X 0 P(Y k2 = m) = O P(Y k2 = m ) µ/4d,1 µ,d k µ/4d,1  m0∈m+(−k/2,k/2) and hence from (14),

−d X Pµ(v) ≤ Oµ,d k

(m1,...,md)∈Γ d ! X Y 0 P(Yµ/4d,1k2 = mj) . 0 0 d (m1,...,md)∈(m1,...,md)+(−k/2,k/2) j=1 0 0 Since (w1, . . . , wd) is k-dissociated, all the (m1, . . . , md) tuples in Γ + (−k/2, k/2)d are different. Thus, we conclude d ! −d X Y Pµ(v) ≤ Oµ,d k P(Yµ/4d,1k2 = mj) . d j=1 (m1,...,md)∈Z 612 TERENCE TAO AND VAN H. VU

But from the union bound

d X Y P(Yµ/4d,1k2 = mj) = 1, d j=1 (m1,...,md)∈Z and so −d Pµ(v) ≤ Oµ,d(k ). To complete the proof, set the constant C = C(µ, d) in the theorem to be −d larger than the hidden constant in Oµ,d(k ).

Remark 6.1. One can also use the Chernoff bound and obtain a shorter proof (avoiding the volume packing argument) but with an extra logarithmic loss in the estimates.

1 Finally we perform some additional arguments to eliminate the τ dilations in Theorem 2.4 and obtain our final inverse Littlewood-Offord theorem. The key will be the following lemma. Given a set S and a number v, the torsion of v with respect to S is the smallest positive integer τ such that τv ∈ S. If such τ does not exists, we say that v has infinite torsion with respect to S. The key new ingredient will be the following lemma, which asserts that adding a high torsion element to a random walk reduces the concentration probability significantly.

Lemma 6.2 (Torsion implies dispersion). Let 0 < µ ≤ 1 and consider a Pd GAP Q := { i=1 xiWi| − Li ≤ xi ≤ Li}. Assume that Wd+1 has finite torsion τ with respect to 2Q. Then there is a constant Cµ depending only on µ such that 2 L1 Ld τ −1 L1 Ld Pµ(W1 ...Wd Wd+1) ≤ Cµτ Pµ(W1 ...Wd ).

Proof. Let a be an integer such that

2 d Li τ ! 2 L1 Ld τ X X µ X µ Pµ(W1 ...Wd Wd+1) = P Wi ηj,i + Wd+1 ηj,d+1 = a , i=1 j=1 j=1

µ µ where the ηj,i are i.i.d. copies of η . It suffices to show that

 2  d Li τ X X µ X µ −1 L1 Ld P  Wi ηj,i + Wd+1 ηj,d+1 = a = Oµ(τ )Pµ(W1 ...Wd ). i=1 j=1 j=1

2 2 Let S be the set of all m ∈ [−τ , τ ] such that Q + mWd+1 contains a. 2 Pd PLi µ Pτ µ Observe that in order for i=1 Wi j=1 ηj,i +Wd+1 j=1 ηj,d+1 to equal a, the INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 613

Pk µ L1 Ld quantity j=1 ηj,d+1 must lie in S. By the definition of Pµ(W1 ...Wd ) and Bayes identity, we conclude

2 d Li τ ! X X µ X µ P Wi ηj,i + Wd+1 ηj,d+1 = a i=1 j=1 j=1 τ 2 ! L1 Ld X µ ≤ Pµ(W1 ...Wd )P ηj,d+1 ∈ S . j=1 Consider two elements x, y ∈ S. By the definition of S,(x−y)v ∈ Q−Q = 2Q. From the definition of τ, |x − y| is either zero or at least τ. This implies that S is τ-separated and the claim now follows from Lemma 5.2. The following technical lemma is also needed.

Lemma 6.3. Consider a GAP Q(w,L). Assume that v is an element with (finite) torsion τ with respect to Q(w,L). Then 1 Q(w,L) + Q(v, L0) ⊂ · Q(w,L(L0 + τ)). τ 1 Pr Proof. Assume w = w1 . . . wr. We can write v as τ i=1 aiwi, where 0 |ai| ≤ L. An element y in Q(w,L) + Q(v, L ) can be written as r X y = xiwi + xv i=1 0 where |xi| ≤ L and |x| ≤ L . Substituting v, r r r X 1 X 1 X y = x w + x a w = w (τx + xa ), i i τ i i τ i i i i=1 i=1 i=1 0 where |τxi + xai| ≤ τL + L L. This concludes the proof.

Proof of Theorem 2.5. We begin by running the algorithm in the proof of Theorem 2.4 to locate a word w of length at most d − 1 such that the set S 1 2 [0] 1≤τ≤k τ · Q(w, k) covers all but at most k elements of v. Set v to be the word formed by removing the (at most k2) exceptional elements from v which S 1 do not lie in 1≤τ≤k τ · Q(w, k). By increasing the constant k0 in the assumption of the theorem, we can assume, in all arguments below, that k is sufficiently large, whenever needed. By (2) and (3) (15) [0] k2 k2 k2 −d k2 Pµ/4d(v w ) ≥ Pµ/4d(vw ) ≥ Pµ/4d(v)Pµ/4d(vw ) ≥ k Pµ/4d(vw ). In the following, assume that there is at least one nonzero entry in w; otherwise the claim is trivial. 614 TERENCE TAO AND VAN H. VU

Now we perform an additional algorithm. Let K = K(µ, d, ) > 2 be a large constant to be chosen later. 2 [0] • Step 0. Initialize i = 0 and set Q0 := Q(w, k ) and v as above. • Step 1. Count how many v ∈ v[i−1] having torsion at least K with re- spect to 2Qi−1. (We need to have the factor 2 here in order to apply Lemma 6.2.) If this number is less than k2, halt the algorithm. Other- wise, move on to Step 2. • Step 2. Locate a multiset S of k2 elements of v[i−1] with torsion at least K with respect to 2Qi−1. Applying (6), we can find an element v ∈ S such that

2 2 2 2 2 2 2 [i−1] k τ1 τi−1 [i] k τ1 τi−1 k Pµ/4d(v w W1 ...Wi−1 ) ≤ Pµ/4d(v w W1 ...Wi−1 v ) where v[i] is obtained from v[i−1] by deleting S. Let τi be the torsion of v with respect to 2Qi−1. Since every element of [0] v has torsion at most k with respect to Q0, K ≤ τi ≤ k. We then set 2 Wi := v, Qi := Qi−1 + Q(Wi, τi ), increase i to i + 1 and return to Step 1. Consider a stage i of the algorithm. From construction and induction and (15), we have a word W1 ...Wi with

2 2 2 2 2 [i] k τ1 τi [0] k −d k Pµ/4d(v w W1 ...Wi ) ≥ P(v w ) ≥ k P(w ). On the other hand, by applying Lemma 6.2 iteratively, i 2 2 2 2 k τ1 τi k Y −1 Pµ/4d(w W1 ...Wi ) ≤ Pµ/4d(w ) (Cµτj ). j=1 Qi −1 −d Qi −1 d It follows that j=1(Cµτj ) ≥ k , or equivalently j=1(Cµ τj) ≤ k . Recall that τj ≥ K. Thus by setting K sufficiently large (compared to Cµ, d and 1/), we can guarantee that i Y d+/2d (16) τj ≤ k j=1 where  is the constant in the assumption of the theorem. It also follows that d+/2d the algorithm must terminate at some stage D ≤ logK k ≤ (d+1) logK k. Now look at the final set QD. Applying Lemma 6.3 iteratively, D ! Y 1 Q ⊂ · Q(w,L ) D τ D j=1 j 2 where L0 := k and 2 2 (17) Li := Li−1(τi + τi ) ≤ (1 + 1/K)Li−1τi . 1 1 We now show that the GAP Q := K! ·(2K!)Q(w,LD) = K! ·Q(w, 2K!LD) satisfies the claims of the theorem. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 615

• (Rank) We have rank(Q) = rank(Q(w,LD)) = rank(Q0) = r ≤ d − 1, as shown in the proof of the previous theorem. r • (Volume) We have Vol(Q) = (2K!) Vol(Q(w,LD)) = O(Vol(Q(w,LD))). On the other hand, by (16) and (17)

D ! r r r  2 Y 2 Vol(Q(w,LD)) = (2LD + 1) ≤ (3LD) = O k (1 + 1/K)τj j=1 !  r = O k2+2(d+/2d)(1 + K)D .

d+/2d By definition, D ≤ logK k < log k, given that K is sufficiently large compared to d. Thus (1 + 1/K)D ≤ exp(D/K) ≤ k1/K which implies that

r(2+2(d+/2d)+1/K) 2(d2−1)+ Vol(Q(w,LD)) = O(k ) = o(k )

provided that r ≤ d − 1 and K is sufficiently large compared to d and 1/. (The asymptotic notation here is used under the assumption that k → ∞.) • (Number of exceptional elements) At each stage in the second algorithm, 2 2 2 we discard a set of k elements; thus all but Dk ≤ (d + 1)k logK k) [0] elements of v have torsion at most K with respect to 2QD. As QD ⊂ [0] 2 Q(w,LD) and v\v ≤ k , it follows that all but at most

2 2 (d + 1)k logK k + k

elements of v have torsion at most K with respect to

2Q(w,LD) = Q(w, 2LD).

By setting K sufficiently large compared to d and 1/, we can guarantee that 2 2 2 (d + 1)k logK k + k ≤ k log k.

To conclude, note that any element with torsion at most K with respect 1 to Q(w, 2LD) belongs to Q := K! · Q(w, 2K!LD). Thus, Q contains all but at most k2 log k elements of v. 1 1 • (Generators) The generators of K! · Q(w, 2K!LD) are QD wi, 1 ≤ K! j=1 τj QD d+/2d d+ i ≤ r. Since wi ∈ v and j=1 τj ≤ k = o(k ), the claim about generators follows.

The proof is complete. 616 TERENCE TAO AND VAN H. VU

7. The smallest singular value

In this section, we prove Theorem 3.4, modulo two key results, Theo- rem 3.6 and Corollary 3.9, which will be proved in later sections. Let B10 be a large number (depending on A) to be chosen later. Suppose µ −B that σn(Mn ) < n . This means that there exists a unit vector v such that µ −B kMn vk < n . By rounding each coordinate v to the nearest multiple of n−B−2, we can find a vectorv ˜ ∈ n−B−2 · Zn of magnitude 0.9 ≤ kv˜k ≤ 1.1 such that µ −B kMn v˜k ≤ 2n . Thus, writing w := nB+2v˜, we can find an integer vector w ∈ Zn of magnitude 0.9nB+2 ≤ kwk ≤ 1.1nB+2 such that µ 2 kMn wk ≤ 2n . Let Ω be the set of integer vectors w ∈ Zn of magnitude 0.9nB+2 ≤ kwk ≤ 1.1nB+2. It suffices to show the probability bound µ 2 −A P(there is some w ∈ Ω such that kMn wk ≤ 2n ) = OA,µ(n ).

We now partition the elements w = (w1, . . . , wn) of Ω into three sets:

• We say that w is rich if −A−10 Pµ(w1 . . . wn) ≥ n

and poor otherwise. Let Ω1 be the set of poor w’s. • A rich w is singular w if fewer than n0.2 of its coordinates have absolute B−10 value n or greater. Let Ω2 be the set of rich and singular w’s. • A rich w is nonsingular w, if at least n0.2 of its coordinates have absolute B−10 value n or greater. Let Ω3 be the set of rich and nonsingular w’s.

The desired estimate follows directly from the following lemmas and the union bound.

Lemma 7.1 (Estimate for poor w). µ 2 −A P(there is some w ∈ Ω1 such that kMn wk ≤ 2n ) = o(n ).

Lemma 7.2 (Estimate for rich singular w). µ 2 −A P(there is some w ∈ Ω2 such that kMn wk ≤ 2n ) = o(n ).

Lemma 7.3 (Estimate for rich nonsingular w). µ 2 −A P(there is some w ∈ Ω3 such that kMn wk ≤ 2n ) = o(n ). INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 617

Remark 7.4. Our arguments will show that the probabilities in Lemmas 7.2 and 7.3 are exponentially small.

The proofs of Lemmas 7.1 and 7.2 are relatively simple and rely on well- known methods. We delay these proofs to the end of this section and focus on the proof of Lemma 7.3, which is the heart of the matter, and which uses all the major tools discussed in previous sections. Proof of Lemma 7.3. Informally, the strategy is to use the inverse Littlewood-Offord theorem (Corollary 2.7) to place the integers w1, . . . , wn in a progression, which we then discretize using Theorem 3.6. This allows us µ 2 µ,Y to replace the event kMn wk ≤ 2n by the discretized event Mn = 0 for a suitable Y , at which point we apply Corollary 3.9. We turn to the details. Since w is rich, we see from Corollary 2.7 that there exists a symmetric GAP Q of integers of rank at most A0 and volume at A0 0.1 0 most n which contains all but bn c of the integers w1, . . . , wn, where A is a constant depending on µ and A. Also the generators of Q are of the form A0 wi/s for some 1 ≤ i ≤ n and 1 ≤ s ≤ n . Using the description of Q and the fact that w1, . . . , wn are polynomially bounded (in n), it is easy to derive that the total number of possible Q is nOA0 (1). Next, by paying a factor of   n 0.1 ≤ nbn c = exp(o(n)) bn0.1c 0.1 we may assume that it is the last bn c integers wm+1, . . . , wn which possibly 0.1 lie outside Q, where we set m := n − bn c. As each of the wi has absolute value at most 1.1nB+2, the number of ways to fix these exceptional elements is at most (2.2nB+2)n0.1 = exp(o(n)). Overall, it costs a factor of at most exp(o(n)) to fix Q and the positions and values of the exceptional elements of w. Once we have fixed wm+1, . . . , wn, we can then write µ µ Mnw = w1X1 + ··· + wmXm + Y, µ where Y is a random variable determined by Xi and wi, m < i ≤ n. (In this µ proof we think of Xi as the column vectors of the matrix.) For any number y, let Fy be the event that there exists w1, . . . , wm in Q, where at least one of the B−10 wi has absolute value larger or equal n , such that µ µ 2 |w1X1 + ··· + wmXm + y| ≤ 2n . It suffices to prove that −A P(Fy) = o(n ) for any y. Our argument will in fact show that this probability is exponentially small. 618 TERENCE TAO AND VAN H. VU

B/2 10 We now apply Theorem 3.6 to the GAP Q with R0 := n and S := n B/2+O (1) to find a scale R = n A and symmetric GAPs Qsparse, Qsmall of rank at most A0 and volume at most nA0 such that:

• Q ⊆ Qsparse + Qsmall.

−10 −10 • Qsmall ⊆ [−n R, n R].

10 10 • The elements of n Qsparse are n R-separated.

10 Since Q (and hence n Q) contains w1, . . . , wm, we can write sparse small wj = wj + wj sparse small for all 1 ≤ j ≤ m, where wj ∈ Qsparse and wj ∈ Qsmall. In fact, this decomposition is unique. µ µ µ µ Suppose that the event Fy holds. Writing Xi = (ηi,1, . . . , ηi,n) (where ηi,j µ are i.i.d. copies of η ) and y = (y1, . . . , yn), µ µ 2 w1ηi,1 + ··· + wmηi,m = yi + O(n ) for all 1 ≤ i ≤ n. Splitting the wj into sparse and small components and estimating the small components using the triangle inequality, we obtain sparse µ sparse µ −9 w1 ηi,1 + ··· + wm ηi,m = yi + O(n R) 10 for all 1 ≤ i ≤ n. Note that the left-hand side lies in mQsparse ⊂ n Qsparse, which is known to be n10R-separated. Thus there is a unique value for the 0 right-hand side, denoted as yi, which depends only on y and Q such that sparse sparse 0 w1 ηi,1 + ··· + wm ηi,m = yi. The point is that now we have eliminated the O() errors, and thus have essen- tially converted the singular value problem to the zero determinant problem. Note also that since one of the w1, . . . , wm is known to have magnitude at least nB−10 (which will be much larger than n10R if B is chosen large depending sparse sparse on A), we see that at least one of the w1 , . . . , wn is nonzero. Consider the random matrix M 0 of order m × m + 1 whose entries are µ 0 m+1 0 0 0 i.i.d. copies of η and let y ∈ R be the column vector y = (y1, . . . , ym+1). We conclude that if the event Fy holds, then there exists a nonzero vector w ∈ Rm such that M 0w = y0. But from Corollary 3.9, this holds with the desired probability exp(−Ω(m + 1)) = exp(−Ω(n)) = o(n−A) and we are done. Proof of Lemma 7.1. We use a conditioning argument, following [20]. (An argument of the same spirit was used by Koml´osto prove the bound O(n−1/2) for the singularity problem [2].) INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 619

2 Let M be a matrix such that there is w ∈ Ω1 satisfying kMwk ≤ 2n . Since M and its transpose have the same spectral norm, there is a vector w0 0 2 0 which has the same norm as w such that kw Mk ≤ 2n . Let u = w M and Xi be the row vectors of M. Then n X 0 u = wiXi i=1 0 0 where wi are the coordinates of w . Now we think of M as a random matrix. By paying a factor of n, we can 0 0 assume that wn has the largest absolute value among the wi. We expose the 2 first n − 1 rows X1,...,Xn−1 of M. If there is w ∈ Ω1 satisfying kMwk ≤ 2n , then there is a vector y ∈ Ω1, depending only on the first n − 1 rows such that n−1 1/2  X 2 2 (Xi · y) ≤ 2n . i=1

Now consider the inner product Xn · y. We can write Xn as n−1 1  X  X = u − w0X . n w0 i i n i=1 Thus, n−1 1 X |X · y| = |u · y − w0X · y|. n kw0 k i i n i=1 The right-hand side, by the triangle inequality, is at most

n−1 1  X  (kukkyk + kw0k (X · y)2)1/2 . kw0 k i n i=1 0 −1/2 0 2 By assumption kwnk ≥ n kw k. Furthermore, as kuk ≤ 2n , kukkyk ≤ 2 2 0 0 2n kyk ≤ 3n kw k as kw k = kwk, and both y and w belong to Ω1. (Any two Pn−1 2 1/2 2 vectors in Ω1 have roughly the same length.) Finally ( i=1 (Xi ·y) ) ≤ 2n . Putting all these together,

5/2 |Xn · y| ≤ 5n .

Recall that y is fixed (after we expose the first n − 1 rows) and Xn is a copy µ µ 5/2 5/2 of X . The probability that |X · y| ≤ 5n is at most (10n + 1)Pµ(y). On −A−10 the other hand, y is poor, and so Pµ(y) ≤ n . Thus, it follows that µ 2 P(there is some w ∈ Ω1 such that kMn wk ≤ 2n ) ≤ n−A−10(10n5/2 + 1)n = o(n−A),

0 where the extra factor n comes from the assumption that wn has the largest absolute value. This completes the proof. 620 TERENCE TAO AND VAN H. VU

Proof of Lemma 7.2. We use an argument from [15]. The key point will be that the set Ω2 of rich nonsingular vectors has sufficiently low entropy so that one can proceed using the union bound. A set N of vectors on the n-dimensional unit sphere Sn−1 is said to be an -net if for any x ∈ Sn−1, there is y ∈ N such that kx − yk ≤ . A standard greedy argument shows the following:

Lemma 7.5. For any n and  ≤ 1, there exists an -net of cardinality at most O(1/ε)n.

Next, a simple concentration of measure argument shows

Lemma 7.6. For any fixed vector y of magnitude between 0.9 and 1.1

µ −2 P(kMn yk ≤ n ) = exp(−Ω(n)).

It suffices to verify this statement for the case |y| = 1. Note that

n n µ 2 X 2 X kMn yk = (Xi · y) = Zi i=1 i=1

2 where Zi = (Xi · y) . The Zi are i.i.d. random variables with expectation µ Pn and bounded variance. Thus i=1 Zi has mean Ω(n) and the claimed bound follows from Chernoff’s large deviation inequality (see, e.g., [28, Ch. 1]). (In fact, one can replace the n−2 by cn1/2 for some small constant c, but this refinement is not necessary.) 0 0 0 For a vector w ∈ Ω2, let w be its normalization w := w/kwk. Thus, w is a unit vector with at most n0.2 coordinates with absolute values larger or −10 0 0 equal n . Let Ω2 be the collection of those w with this property. 2 0 −B B+2 If kMwk ≤ 2n for some w ∈ Ω2, then kMw k ≤ 3n , as kwk ≥ 0.9n . 0 0 Thus, it suffices to give an exponential bound on the event that there is w ∈ Ω2 µ 0 −B such that kMn w k ≤ 3n . n  By paying a factor n0.2 = exp(o(n)) in probability, we can assume that the large coordinates (with absolute value at least n−10) are among the first 0.2 −3 l := n coordinates. Consider an n -net N in Sl−1. For each vector y ∈ N, let y0 be the n-dimensional vector obtained from y by letting the last n − l coordinates be zeros, and let N 0 be the set of all such vectors obtained. These vectors have magnitude between 0.9 and 1.1, and from Lemma 7.5, |N 0| ≤ O(n3)l. 0 00 Now consider a rich singular vector w ∈ Ω2 and let w be the l-dimen- sional vector formed by the first l coordinates of this vector. Since the remain- ing coordinates are small, kw00 k = 1 + O(n−9.5). There is a vector y ∈ N such that 00 ky − w k ≤ n−3 + O(n−9.5). INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 621

It follows that there is a vector y0 ∈ N 0 such that ky0 − w0k ≤ n−3 + O(n−9.5) ≤ 2n−3. For any matrix M of norm at most n, kMw0k ≥ kMy0k − 2n−3n = kMy0k − 2n−2. It follows that if kMw0k ≤ 3n−B for some B ≥ 2, then kMy0k ≤ 5n−2. Now µ 0 0 −2 take M = Mn . For each fixed y , the probability that kMy k ≤ 5n is at most exp(−Ω(n)), by Lemma 7.6. Furthermore, the number of y0 is subexponential (at most O(n3)l = O(n)3n.2 = exp(o(n))). Thus the claim follows directly by the union bound.

8. Discretization of progressions

The purpose of this section is to prove Theorem 3.6. The arguments here are elementary (based mostly on the pigeonhole principle and linear algebra, in particular Cramer’s rule) and can be read independently of the rest of the paper. We shall follow the informal strategy outlined in Section 3.1. We begin with a preliminary observation, which asserts the intuitive fact that progres- sions do not contain large lacunary subsets.

Lemma 8.1. Let P ⊂ Z be a symmetric generalized arithmetic progression of rank d and volume V , and let x1, . . . , xd+1 be nonzero elements of P . Then there exist 1 ≤ i < j ≤ d + 1 such that

−1 −1 Cd V |xi| ≤ |xj| ≤ CdV |xi| for some constant Cd > 0 depending only on d.

Proof. We may order |xd+1| ≥ |xd| ≥ · · · ≥ |x1|. If we write

P = {m1v1 + ··· + mdvd : |mi| ≤ Mi for all 1 ≤ i ≤ d}

(so that V = Θd(M1 ...Md)), then each of the x1, . . . , xd+1 can be written as a linear combination of the v1, . . . , vd. Applying Cramer’s rule, we conclude that there exists a nontrivial relation

a1x1 + ··· + ad+1xd+1 = 0 where a1, . . . , ad+1 = Od(V ) are integers, not all zero. If we let j be the largest index such that aj is nonzero, then j > 1 (since x1 is nonzero) and in particular, we conclude that |xj| = O(|ajxj|) = Od(V |xj−1|) from which the claim follows. 622 TERENCE TAO AND VAN H. VU

Proof of Theorem 3.6. We can assume that R0 is very large compared to O (1) (SV ) d since otherwise the claim is trivial (take Psparse := P and Psmall := {0}). We can also take V ≥ 2. Let B = Bd be a large integer depending only on d to be chosen later. The −BB+2 BB+2 first step is to subdivide the interval [(SV ) R0, (SV ) R0] into Θ(B) overlapping subintervals of the form [(SV )−BB+1 R, (SV )BB+1 R], with every integer being contained in at most O(1) of the subintervals. From Lemma 8.1 and the pigeonhole principle we see that at most Od(1) of the intervals can B B contain an element of (SV )B P (which has volume O((SV )Od(B )). If we let B be sufficiently large, we can thus find an interval [(SV )−BB+1 R, (SV )BB+1 R] which is disjoint from (SV )BB P . Since P is symmetric, this means that every x ∈ (SV )BB P is either larger than (SV )BB+1 R in magnitude, or smaller than (SV )−BB+1 R in magnitude. Having located a good scale R to discretize, we now split P into small ( R) and sparse ( R-separated) components. We write P explicitly as

P = {m1v1 + ··· + mdvd : |mi| ≤ Mi for all 1 ≤ i ≤ d} so that V = Θd(M1 ...Md) and more generally

kP = {m1v1 + ··· + mdvd : |mi| ≤ kMi for all 1 ≤ i ≤ d} d for any k ≥ 1. For any 1 ≤ s ≤ B, let As ⊂ Z denote the set Bs As := {(m1, . . . , md): |mi| ≤ V Mi for all 1 ≤ i ≤ d; −BB+1 |m1v1 + ··· + mdvd| ≤ (SV ) R}. Roughly speaking, this space corresponds to the kernel of Φ as discussed in Section 3.1; the additional parameter s is a technicality needed to compensate for the fact that boxes, unlike vector spaces, are not quite closed under dila- d tions. We now view As as a subset of the Euclidean space R . As such it spans d a vector space Xs ⊂ R . Clearly

X1 ⊆ X2 ⊆ · · · ⊆ XB. Therefore if B is large enough, by the pigeonhole principle (applied to the dimensions of these vector spaces) we can find 1 ≤ s < B such that we have the stabilization property Xs = Xs+1. Let the dimension of this space be r; thus 0 ≤ r ≤ d. There are two cases, depending on whether r = d or r < d. Suppose first that r = d (so the kernel has maximal dimension). Then by definition of As we have d “equations” in d unknowns, (j) (j) −BB+1 m1 v1 + ··· + md vd = O((SV ) R) for all 1 ≤ j ≤ d, (j) Bs (j) (j) where mi = O(MiV ) and the vectors (m1 , . . . , md ) ∈ As are linearly independent as j varies. Using Cramer’s rule we conclude that s B+1 Od(B ) −B vi = Od((SV ) (SV ) R) for all 1 ≤ j ≤ d INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 623 since all the determinants and minors which arise from Cramer’s rule are in- O (B) tegers that vary from 1 to Od(V d ) in magnitude. Since Mi = O(V ) for O (Bs) −BB+1 all i, we conclude that x = Od(V d (SV ) R) for all x ∈ P , which by construction of R (and the fact that s < B) shows that

B+1 B+1 P ⊂ [−(SV )−B R, (SV )−B R]

(if B is sufficiently large). Thus in this case we can take Psmall = P and Psparse = {0}. Now we consider the case when r < d (so the kernel is proper). In this r d−r case we can write Xs as a graph of some linear transformation T : R → R : after permutation of the coordinates, we have r d−r r Xs = {(x, T x) ∈ R × R : x ∈ R }. The coefficients of T form an r × d − r matrix, which can be computed by Cramer’s rule to be rational numbers with numerator and denominator O (Bs) Od((SV ) d ); this follows from Xs being spanned by As, and on the inte- grality and size bounds on the coefficients of elements of As. Let m ∈ As be arbitrary. Since As is also contained in Xs, we can write s r Od(B ) m = (m[1,r], T m[1,r]) for some m[1,r] ∈ Z with magnitude Od((SV ) ). By definition of As, we conclude that −BB+1 hmr, v[1,r]iRr + hT mr, v[r+1,d]iRd−r = O((SV ) R) where v[1,r] := (v1, . . . , vr), v[r+1,d] := (vr+1, . . . , vd), and the inner products on Rr and Rd−r are the standard ones. Thus ∗ −BB+1 hmr, v[1,r] + T v[r+1,d]iRr = O((SV ) R) where T ∗ : Rd−r → Rr is the adjoint linear transformation to T . Now since A r spans X, the m[1,r] will linearly span R as we vary over all elements m of A. Thus by Cramer’s rule we conclude that

s B+1 ∗ Od(B ) −B (18) v[1,r] + T v[r+1,d] = Od(V (SV ) R). ∗ Write (w1, . . . , wr) := T v[r+1,d]; thus w1, . . . , wr are rational numbers. Then construct the symmetric generalized arithmetic progressions Psmall and Psparse explicitly as

Psparse := {m1w1 + ··· + mrwr + mr+1vr+1

+ ··· + mdvd : |mi| ≤ Mi for all 1 ≤ i ≤ d} and

Psmall := {m1(v1 + w1) + ··· + mr(vr + wr): |mi| ≤ Mi for all 1 ≤ i ≤ d}.

It is clear from construction that P ⊆ Psparse + Psmall, and that Psparse and Psmall have rank at most d and volume at most V . Now from (18), s B+1 Od(B ) −B vi + wi = Od((SV ) (SV ) R) 624 TERENCE TAO AND VAN H. VU and hence for any x ∈ Psmall,

s B+1 Od(B ) −B x = Od((SV ) (SV ) R). By choosing B large enough we conclude that

|x| ≤ R/S which gives the desired smallness bound on Psmall. The only remaining task is to show that SPsparse is sparse. It suffices to show that SPsparse − SPsparse has no nonzero intersection with [−RS, RS]. Suppose for contradiction that this failed. Then we can find m1, . . . , md with |mi| ≤ 2SMi for all i and

0 < m1w1 + ··· + mrwr + mr+1vr+1 + ··· + mdvd < RS. Let Q be the least common denominator of all the coefficients of T ∗, then O (Bs) Q = Od((SV ) d ). Multiplying the above equation by Q, we obtain

0 < m1Qw1 + ··· + mrQwr + mr+1Qvr+1 + ··· + mdQvd s B+1 < O(RSV Od(B )) < (SV )B R.

∗ Since (w1, . . . , wr) = T v[r+1,r+d], the expression between the inequality signs is an integer linear combination of vr+1, . . . , vd, with all coefficients of size O (Bs) Od((SV ) d ), for example

m1Qw1 + ··· + mrQwr + mr+1Qvr+1 + ··· + mdQvd = ar+1vr+1 + ··· + advd. In particular, this expression lies in (SV )BB P (again taking B to be suffi- ciently large). Thus by construction of R, we can improve the upper bound of (SV )BB+1 R to (SV )−BB+1 R:

−BB+1 (19) 0 < ar+1vr+1 + ··· + advd < (SV ) R.

Taking B to be large, this implies that (0,..., 0, ar+1, . . . , ad) lies in Xs+1, r d−r which equals Xs. But Xs was a graph from R to R , and thus ar+1 = ··· = ad = 0, which contradicts (19). This establishes the sparseness.

9. Proof of Theorem 3.10

n Let Y = {y1, . . . , yl} be a set of l independent vectors in R . Re- µ,Y µ µ call that Mn denote the random matrix with row vectors X1 ,...,Xn−l, µ µ µ µ y1, . . . , yl, where Xi are i.i.d. copies of X = (η1 . . . , ηn). Define δ(µ) := max{1 − µ, µ/2}. It is easy to show that for any subspace V of dimension d,

(20) P(Xµ ∈ V ) ≤ δ(µ)d−n. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 625

In the following, we use N to denote the quantity (1/δ(µ))n. As 0 < µ ≤ 1, δ(µ) > 0 and thus N is exponentially large in n. Thus it will suffice to show that µ,Y −ε+o(1) P(Mn singular ) ≤ N for some ε = ε(µ, l) > 0, where the o(1) term is allowed to depend on µ, l, and ε. We may assume that n is large depending on µ and l since the claim is trivial otherwise. µ,Y Note that if Mn is singular, then the row vectors span a proper sub- space V . To prove the theorem, it suffices to show that for any sufficiently small positive constant ε X µ µ −ε+o(1) P(X1 ,...,Xn−l, y1, . . . , yl span V ) ≤ N . V,V proper subspace Arguing as in [25, Lemma 5.1], we can restrict ourselves to hyperplanes. Thus, it is enough to prove X µ µ −ε+o(1) P(X1 ,...,Xn−l, y1, . . . , yl span V ) ≤ N . V,V hyperlane We may restrict our attention to those hyperplanes V which are spanned n by their intersection with {−1, 0, 1} , together with y1, . . . , yl. Let us call such hyperplanes nontrivial. Furthermore, we call a hyperplane H degenerate if there is a vector v orthogonal to H and at most log log n coordinates of v are nonzero. Following [25, Lemma 5.3], it is easy to see that the number of degenerate nontrivial hyperplanes is at most N o(1). Thus, their contribution in the sum is at most N o(1)δ(µ)n−l = N −1+o(1) which is acceptable. Therefore, from now on we can assume that V is nonde- generate. For each nontrivial hyperplane V , define the discrete codimension d(V ) of V to be the unique integer multiple of 1/n such that

− d(V ) − 1 µ − d(V ) (21) N n n2 < P(X ∈ V ) ≤ N n . Thus d(V ) is large when V contains few elements from {−1, 0, 1}n, and con- versely. µ µ Let BV denote the event that X1 ,...,Xn−l, y1, . . . , yl span V . We denote by Ωd the set of all nondegenerate, nontrivial hyperplanes with discrete codi- mension d. It is simple to see that 1 ≤ d(V ) ≤ n2 for all nontrivial V . In particular, there are n2 = N o(1) possible values of d, so to prove our theorem it suffices to show that X −ε+o(1) (22) P(BV ) ≤ N

V ∈Ωd for all 1 ≤ d ≤ n2. 626 TERENCE TAO AND VAN H. VU

We first handle the (simpler) case when d is large. Note that if

µ µ X1 ,...,Xn−l, y1, . . . , yl span V, then some subset of n − l − 1 vectors Xi together with the yj’s already span V (since the yj’s are independent). By symmetry, we have X P(BV )

V ∈Ωd X µ µ µ ≤ (n − l) P(X1 ,...,Xn−l−1, y1, . . . , yl span V )P(Xn−l ∈ V )

V ∈Ωd − d X µ µ n ≤ nN P(X1 ,...,Xn−l−1, y1, . . . , yl span V )

V ∈Ωd − d − d +o(1) ≤ nN n = N n .

This disposes of the case when d ≥ εn. It remains to verify the following lemma.

Lemma 9.1. For all sufficiently small positive constant ε, the following holds. If d is any integer multiple of 1/n such that

(23) 1 ≤ d ≤ (ε − o(1))n, then X −ε+o(1) P(BV ) ≤ N .

V ∈Ωd

Proof. For 0 < µ ≤ 1 we define the quantity 0 < µ∗ ≤ 1/8 as follows. If µ = 1 then µ∗ := 1/16. If 1/2 ≤ µ < 1, then µ∗ := (1 − µ)/4. If 0 < µ < 1/2, then µ∗ := µ/4. We will need the following inequality, which is a generalization of [25, Lemma 6.2].

Lemma 9.2. Let V be a nondegenerate nontrivial hyperplane. Then   1 ∗ P(Xµ ∈ V ) ≤ + o(1) P(Xµ ∈ V ). 2

The proof of Lemma 9.2 relies on some Fourier-analytic ideas of Hal´asz[9] (see also [10], [25], [26]) and is deferred until the end of the section. Assuming it for now, we continue the proof of Lemma 9.1. 1 Let us set γ := 2 ; this is not the optimal value of this parameter, but will suffice for this argument. µ∗ µ∗ µ µ Let AV be the event that X1 ,...,X(1−γ)n, X1 ,..., X(γ−ε)n are linearly µ∗ µ∗ µ independent in V , where Xi ’s are i.i.d. copies of X and Xj ’s are i.i.d. copies of Xµ. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 627

Lemma 9.3. (1−γ)−(1−ε)d+o(1) P(AV ) ≥ N .

Proof. Note that the right-hand side on the bound in Lemma 9.3 is 0 µ∗ µ∗ µ µ the probability of the event AV that X1 ,...,X(1−γ)n, X1 ,..., X(γ−ε)n belong to V . Thus, by Bayes’ identity it is sufficient to show that 0 o(1) P(AV |AV ) = N . From (21), (24) P(Xµ ∈ V ) = (1 + O(1/n))δ(µ)d and hence by Lemma 9.2

∗ (25) P(Xµ ∈ V ) ≥ (2 + O(1/n))δ(µ)d. On the other hand, by (20)

∗ P(Xµ ∈ W ) ≤ (1 − µ∗)n−dim(W ) for any subspace W . Thus by Bayes’ identity, we have the conditional proba- bility bound

∗ ∗ P(Xµ ∈ W |X(µ ) ∈ V ) ≤ (2 + O(1/n))−1δ(µ)−d(1 − µ∗)n−dim(W ) ≤ δ(µ)−d(1 − µ∗)n−dim(W ). When dim(W ) ≤ (1 − γ)n, the bound is less than one when ε is sufficiently 1 small, thanks to the bound on d and the choice γ = 2 . µ∗ µ∗ Let Ek be the event that X1 ,...,Xk are linearly independent. The above estimates imply that 0 −d ∗ n−k P(Ek+1|Ek ∧ AV ) ≥ 1 − δ(µ) (1 − µ ) for all 0 ≤ k ≤ (1 − γ)n. Thus applying Bayes’ identity repeatedly, we obtain 0 −o(1) P(E(1−γ)n|AV ) ≥ N . To complete the proof, observe that since P(Xµ ∈ W ) ≤ δ(µ)n−dim(W ) for any subspace W , it follows that by (24), P(Xµ ∈ W |Xµ ∈ V ) ≤ (1 + O(1/n))δ(µ)−dδ(µ)n−dim(W ).

Let us assume E(1−γ)n and denote by W the (1−γ)n-dimensional subspace µ∗ µ∗ µ µ spanned by X1 ,...,X(1−γ)n. Let Uk denote the event that X1 ,..., Xk ,W are liearly independent. We have 0 pk = P(Uk+1|Uk ∧ AV ) 1 ≥ 1 − (1 + O(1/n))δ(µ)−dδ(µ)n−k−(1−γ)n ≥ 1 − δ(µ)−(γ−ε)n+k 100 628 TERENCE TAO AND VAN H. VU for all 0 ≤ k < (γ − ε)n, thanks to (23). Thus by Bayes’ identity we obtain

0 o(1) Y o(1) P(AV |AV ) ≥ N pk = N 0≤k<(γ−ε)n as desired.

Now we continue the proof of the theorem. Fix V ∈ Ωd. Since AV and BV are independent, by Lemma 9.3,

P(AV ∧ BV ) −(1−γ)+(1−ε)d+o(1) P(BV ) = ≤ N P(AV ∧ BV ). P(AV ) Consider a set µ∗ µ∗ µ µ µ µ X1 ,...,X(1−γ)n, X1 ,..., X(γ−ε)n,X1 ,...,Xn−l µ µ of vectors satisfying AV ∧BV . Then there exists εn−l−1 vectors Xj ,...,Xj µ µ 1 εn−l−1 inside X1 ,...,Xn−l, which together with µ∗ µ∗ µ µ X1 ,...,X(1−γ)n, X1 ,..., X(γ−ε)n, y1, . . . , yl, n−l  span V . Since the number of possible indices j1, . . . , jεn−l−1 is εn−l−1 = 2(h(ε)+o(1))n (with h being the entropy function), by conceding a factor of 2(h(ε)+o(1))n = N ah(ε)+o(1),

where a = log1/δ(µ) 2, we can assume that ji = i for all relevant i. Let CV be the event that µ∗ µ∗ µ µ µ µ X1 ,...,X(1−γ)n, X1 ,..., X(γ−ε)n,X1 ,...,Xεn−l−1, y1, . . . , yl span V. Then we have −(1−γ)+(1−ε)d+ah(ε)+o(1)  µ µ  P(BV ) ≤ N P CV ∧ (Xεn,...,Xn−l in V ) .

On the other hand, CV and the event (Xεn,...,Xn in V ) are independent, so

 µ µ  µ (1−ε)n+1−l P CV ∧ (Xεn,...,Xn−l in V ) = P(CV )P(X ∈ V ) . Putting the last two estimates together we obtain −(1−γ)+(1−ε)d+ah(ε)+o(1) −((1−ε)n+1−l)d/n P(BV ) ≤ N N P(CV ) −(1−γ)+ah(ε)+(l−1)ε+o(1) = N P(CV ). Since any set of vectors can only span a single space V , we have P P(C ) V ∈Ωd V ≤ 1. Thus, by summing over Ωd, X −(1−γ)+ah(ε)+(l−1)ε+o(1) P(BV ) ≤ N .

V ∈Ωd 1 −ε+o(1) With the choice γ = 2 , we obtain a bound of N as desired, by choosing ε sufficiently small. This provides the desired bound in Lemma 9.1. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 629

9.1. Proof of Lemma 9.2. To conclude, we prove Lemma 9.2. Let v = (a1, . . . , an) be the normal vector of V and define n Y Fµ(ξ) := ((1 − µ) + µ cos 2πaiξ). i=1 From Fourier analysis (cf. [25]) Z 1 µ µ P(X ∈ V ) = P(X · v = 0) = Fµ(ξ)dξ. 0 The proof of Lemma 9.2 is based on the following technical lemma.

Lemma 9.4. Let µ1 and µ2 be a positive numbers at most 1/2 such that the following two properties hold for for any ξ, ξ0 ∈ [0, 1]:

4 (26) Fµ1 (ξ) ≤ Fµ2 (ξ) and

0 0 2 (27) Fµ1 (ξ)Fµ1 (ξ ) ≤ Fµ2 (ξ + ξ ) . Furthermore, Z 1

(28) Fµ1 (ξ) dξ = o(1). 0 Then Z 1 Z 1

(29) Fµ1 (ξ) dξ ≤ (1/2 + o(1)) Fµ2 (ξ) dξ. 0 0

Proof. Since µ1, µ2 ≤ 1/2, Fµ1 (ξ) and Fµ2 (ξ) are positive for any ξ. From (27) we have the sumset inclusion

{ξ ∈ [0, 1] : Fµ1 (ξ) > α} + {ξ ∈ [0, 1] : Fµ1 (ξ)α} ⊆ {ξ ∈ [0, 1] : Fµ2 (ξ) > α} for any α > 0. Taking measures of both sides and applying the Mann-Kneser- Macbeath “α + β inequality” |A + B| ≥ min(|A| + |B|, 1) (see [17]), we obtain

min(2|{ξ ∈ [0, 1] : Fµ1 (ξ) > α}|, 1) ≤ |{ξ ∈ [0, 1] : Fµ2 (ξ) > α}|.

But from (28) we see that |{ξ ∈ [0, 1] : Fµ2 (ξ) > α}| is strictly less than 1 if α > o(1). Thus we conclude that 1 |{ξ ∈ [0, 1] : F (ξ) > α}| ≤ |{ξ ∈ [0, 1] : F (ξ) > α}| µ1 2 µ2 when α > o(1). Integrating this in α, we obtain Z 1 Z 1 F (ξ) dξ ≤ F (ξ) dξ. µ1 2 µ2 [0,1]:Fµ1 (ξ)>o(1) 0 630 TERENCE TAO AND VAN H. VU

On the other hand, from (26) we see that when Fµ1 (ξ) ≤ o(1), then Fµ1 (ξ) = 1/4 o(Fµ1 (ξ) ) = o(Fµ2 (ξ)), and thus Z Z 1

Fµ1 (ξ) dξ ≤ o(1) Fµ2 dξ. [0,1]:Fµ1 (ξ)≤o(1) 0 Adding these two inequalities we obtain (29) as desired. By Lemma 5.1 Z 1 µ P(X · v = 0) ≤ Pµ(v) ≤ Pµ/4(v) = Fµ/4(ξ)dξ. 0

It suffices to show that the conditions of Lemma 9.4 hold with µ1 = µ/4 ∗ R 1 and µ2 = µ = µ/16. The last estimate 0 Fµ1 (ξ) dξ ≤ o(1) is a simple corollary of the fact that at least log log n among the ai are nonzero (instead of log log n, one can use any function tending to infinity with n), so we only need to verify the other two. Inequality (26) follows from the fact that µ2 = µ1/4 and the proof of the fourth property of Lemma 5.1. To verify (27), it suffices to show that for any µ0 ≤ 1/2 and any θ, θ0 µ0 ((1 − µ0) + µ0 cos θ)((1 − µ0) + µ0 cos θ0) ≤ ((1 − µ0/4) + cos(θ + θ0)2. 4 0 0 θ+θ0 2 The left-hand side is bounded from above by ((1 − µ ) + µ cos 2 ) , due to convexity. Thus, it remains to show that θ + θ0  µ0  µ0 (1 − µ0) + µ0 cos ≤ 1 − + cos(θ + θ0) 2 4 4

0 θ+θ0 since both expressions are positive for µ < 1/2. By defining x := cos 2 , the last inequality becomes  µ0  µ0 (1 − µ0) + µ0x ≤ 1 − + (2x2 − 1) 4 4 which trivially holds. This completes the proof of Lemma 9.2.

University of California, Los Angeles, CA E-mail address: [email protected] Rutgers University, Piscataway, NJ E-mail address: [email protected]

References

[1] D. Bau III and L. N. Trefethen, , Society for Industrial and Applied Math. (SIAM), Philadelphia, PA, 1997. [2] B. Bollobas´ , Random Graphs, Second edition, Cambridge Studies in Adv. Math. 73, Cambridge Univ. Press, Cambridge, 2001. INVERSE LITTLEWOOD-OFFORD AND CONDITION NUMBER 631

[3] A. Edelman, Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal. Appl. 9 (1988), 543–560. [4] A. Edelman and B. Sutton, Tails of condition number distributions, SIMAX 27 (2005), 547–560. [5] P. Erdos˝ , On a lemma of Littlewood and Offord, Bull. Amer. Math. Soc. 51 (1945), 898–902. [6] ———, Extremal problems in number theory, Proc. Sympos. Pure Math. VIII (1965), 181–189, A. M. S., Providence, R.I. [7] P. Frankl and Z. Furedi¨ , Solution of the Littlewood-Offord problem in high dimensions, Ann. of Math. 128 (1988), 259–270. [8] J. Griggs, J. Lagarias, A. Odlyzko, and J. Shearer, On the tightest packing of sums of vectors, European J. Combin. 4 (1983), 231–236. [9] G. Halasz´ , Estimates for the concentration function of combinatorial number theory and probability, Period. Math. Hungar. 8 (1977), 197–211. [10] J. Kahn, J. Komlos´ , and E. Szemeredi´ , On the probability that a random ±1 matrix is singular, J. Amer. Math. Soc. 8 (1995), 223–240. [11] G. Katona, On a conjecture of Erd˝osand a stronger form of Sperner’s theorem, Studia Sci. Math. Hungar. 1 (1966), 59–63. [12] D. Kleitman, On a lemma of Littlewood and Offord on the distributions of linear com- binations of vectors, Advances in Math. 5 (1970), 155–157. [13] J. Komlos´ , On the determinant of (0, 1) matrices, Studia Sci. Math. Hungar. 2 (1967), 7–22. [14] N. Alon, M. Krivelevich, and V. H. Vu, On the concentration of eigenvalues of random symmetric matrices, Israel J. Math. 131 (2002), 259–267. [15] A. Litvak, A. Pajor, M. Rudelson, and N. Tomczak-Jaegermann, Smallest singular value of random matrices and geometry of random polytopes, Adv. in Math. 195 (2005), 491– 523. [16] J. E. Littlewood and A. C. Offord, On the number of real roots of a random algebraic equation. III, Rec. Math. Mat. Sbornik 12 (1943), 277–286. [17] A. M. Macbeath, On measure of sum sets II. The sum-theorem for the torus, Proc. Cambridge Philos. Soc. 49 (1953), 40–43. [18] L. P. Postnikova and A. A. Judin, An analytic method for estimates of the concen- tration function (Russian), Analytic Number Theory, Mathematical Analysis and their Applications (dedicated to I. M. Vinogradov on his 85th birthday), Trudy Mat. Inst. Steklov. 143 (1977), 143–151, 210. [19] B. A. Rogozin, The concentration functions of sums of independent random variables, Proc. of the Second Japan-USSR Symposium on Probability Theory (Kyoto, 1972), pp. 370–376. Lecture Notes in Math. 330, 370–376, Springer-Verlag, New York, 1973. [20] M. Rudelson, Invertibility of random matrices: Norm of the inverse, Annals of Math. 168 (2008), 575–600. [21] N. P. Salikhov, An estimate for the concentration function by the Esseen method. (Russian) Teor. Veroyatnost. i Primenen. 41 (1996), 561–577; translation in Theory Probab. Appl. 41 (1997), 504–518. [22] A. Sark´ ozy¨ and E. Szemeredi´ , Uber¨ ein Problem von Erd˝osund Moser, Acta Arithmetica 11 (1965), 205–208. [23] S. Smale, On the efficiency of algorithms of analysis, Bull. Amer. Math. Soc. 13 (1985), 87–121. 632 TERENCE TAO AND VAN H. VU

[24] R. Stanley, Weyl groups, the hard Lefschetz theorem, and the Sperner property, SIAM J. Algebraic Discrete Methods 1 (1980), 168–184. [25] T. Tao and V. H. Vu, On random ±1 matrices: Singularity and determinant, Random Structures Algorithms 28 (2006), 1–23. [26] ———, On the singularity probability of random Bernoulli matrices, J. Amer. Math. Soc. 20 (2007), 603–628. [27] ———, On the condition number of a randomly perturbed matrix, Proc. Thirty-Ninth Annual ACM Sympos. on Theory of Computing 2007, 248–255. [28] ———, Additive Combinatorics, Cambridge Univ. Press, Cambridge, 2006. (Received November 8, 2005)