Asia Pacific Mathematics Newsletter, October 2013, Volume 3 No 4

Discrepancy, Graphs, and the Kadison–Singer Problem

Nikhil Srivastava

Discrepancy theory seeks to understand how well a continuous object can be approximated by a discrete one, with respect to some measure of uniformity. For instance, a celebrated result due to Spencer says that given any set family S_1, ..., S_n ⊂ [n], it is possible to colour the elements of [n] Red and Blue in a manner that:

$$\left| |S_i \cap R| - \frac{|S_i|}{2} \right| \le 3\sqrt{n} \quad \forall i,$$

where R ⊂ [n] denotes the set of red elements. In other words, it is possible to partition [n] into two subsets so that this partition is very close to balanced on each one of the test sets S_i. Note that a "continuous" partition which splits each element exactly in half will be exactly balanced on each S_i; the content of Spencer's theorem is that we can get very close to this ideal situation with an actual, discrete partition which respects the wholeness of each element.

Spencer's theorem and its variants have had applications in approximation algorithms, numerical integration, and many other areas. In this post I will describe a new discrepancy theorem [1] due to Adam Marcus, Dan Spielman, and myself, which also seems to have many applications. The theorem is about "uniformly" partitioning sets of vectors in R^n and says the following:

Theorem 1. (implied by Corollary 1.3 in [1]) Given vectors v_1, ..., v_m ∈ R^n satisfying ||v_i||^2 ≤ α and

$$\sum_{i=1}^m \langle v_i, x\rangle^2 = 1 \quad \forall \|x\| = 1, \tag{1}$$

there exists a partition T_1 ∪ T_2 = [m] satisfying

$$\left| \sum_{i \in T_j} \langle v_i, x\rangle^2 - \frac{1}{2} \right| \le 5\sqrt{\alpha} \quad \forall \|x\| = 1.$$

Thus, instead of being nearly balanced with respect to a finite set family as in Spencer's setting, we require our partition of v_1, ..., v_m to be nearly balanced with respect to the infinite set of test vectors ||x|| = 1. In this context, "nearly balanced" means that about half of the quadratic form ("energy") of the v_1, ..., v_m in direction x comes from T_1 (and the rest, which must also be about a half, comes from T_2). We will henceforth refer to the maximum deviation from perfect balance (i.e. 1/2) over all x as the discrepancy of a partition. Note that every partition has discrepancy at most 1/2, so the guarantee of the theorem is nontrivial whenever 5√α < 1/2.

This type of theorem was conjectured to hold by Nik Weaver [2], with any constant strictly less than 1/2 (independent of m and n) in place of 5√α. The reason he was interested in it is that he showed it implies a positive solution to the so-called Kadison–Singer (KS) problem, a central question in operator theory which had been open since 1959. KS was itself motivated by a basic question about the mathematical foundations of quantum mechanics; check out the blog soulphysics [3] for an intuitive description of its physical significance. If you want to know exactly what the statement of KS is and how it can be reduced to finite-dimensional vector discrepancy statements similar to Theorem 1, I highly recommend the accessible and self-contained survey article written recently by Nick Harvey [4].

In the rest of the post I will try to demystify what Theorem 1 is about, say a bit about the proof, and describe a simple application to graph theory.

1. What the Theorem Says

Let's examine how restrictive the hypotheses of Theorem 1 are. To see that some bound on the norms of the v_i is necessary for the conclusion of the theorem to hold, consider an example where one vector has large norm, say ||v_1||^2 = 3/4. In any partition of v_1, ..., v_m, one of the sets, say T_1, will contain v_1. If we now examine the quadratic form in the direction x = v_1/||v_1||, we see that

$$\sum_{i \in T_1} \langle v_i, x\rangle^2 \ge \|v_1\|^2 = 3/4,$$

so this partition has discrepancy at least 1/4.
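Since the discrepancy of a partition equals the spectral norm of $\sum_{i \in T_1} v_i v_i^T - I/2$, the quantities in Theorem 1 are easy to experiment with. The sketch below is my own illustration (the dimensions, seed, and random construction are arbitrary choices, not from [1]): it builds an isotropic family of vectors and measures the discrepancy of a uniformly random partition. Theorem 1 itself is existential; the random partition merely illustrates the definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 200

# Random vectors, then a change of basis so that sum_i v_i v_i^T = I
# (the isotropy condition (1)).  The rows of V are the vectors v_i.
G = rng.normal(size=(m, n))
V = G @ np.linalg.inv(np.linalg.cholesky(G.T @ G)).T

def discrepancy(V, mask):
    """sup over ||x||=1 of |sum_{i in T} <v_i,x>^2 - 1/2| = ||sum_{i in T} v_i v_i^T - I/2||."""
    A = V[mask].T @ V[mask]
    return np.linalg.norm(A - np.eye(V.shape[1]) / 2, ord=2)

alpha = (V ** 2).sum(axis=1).max()      # max squared norm of a v_i
mask = rng.random(m) < 0.5              # a uniformly random partition T_1
print("alpha =", alpha, "discrepancy =", discrepancy(V, mask))
```

Every partition has discrepancy at most 1/2 here, since both halves of the quadratic form lie between 0 and I.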

The problem is that v_1 by itself accounts for significantly more than half of the quadratic form in direction x, and there is no way to get closer to half without splitting the vector.

Another instructive example is the one-dimensional instance v_1, ..., v_m ∈ R^1, with v_i^2 = 1/m = α for all i and m odd. Here, the larger side of any partition must have

$$\sum_{i \in T_j} \langle v_i, e_1\rangle^2 \ge \frac{1}{2} + \frac{\alpha}{2},$$

leading to a discrepancy of at least α/2.

In general, the above examples show that the presence of large vectors is an obstruction to the existence of a low discrepancy partition. Theorem 1 shows that this is the only obstruction: if all the vectors have sufficiently small norm, then an appropriately low discrepancy partition must exist. It is worth mentioning that by a more sophisticated example than the ones above, Weaver has shown that the O(√α) dependence in Theorem 1 cannot be improved.

Let us now consider the "isotropy" condition (1). This may seem like a very strong requirement at first, but it is in fact best viewed as a normalisation condition. To see why, let us first write the theorem using matrix notation. It says that given vectors v_1, ..., v_m ∈ R^n with ||v_i||^2 ≤ α and

$$\sum_{i=1}^m v_i v_i^T = I,$$

there is a partition T_1 ∪ T_2 = [m] satisfying

$$\left( \frac{1}{2} - 5\sqrt{\alpha} \right) I \preceq \sum_{i \in T_j} v_i v_i^T \preceq \left( \frac{1}{2} + 5\sqrt{\alpha} \right) I,$$

where A ⪯ B means that

$$x^T A x \le x^T B x \quad \forall x \in \mathbb{R}^n,$$

or equivalently that B − A is positive semidefinite.

Now suppose I am given some arbitrary vectors w_1, ..., w_m ∈ R^n, which are not necessarily isotropic. Assume that the span of the w_i is R^n (otherwise, change the basis and write them as vectors in some lower-dimensional R^k). This implies that the positive semidefinite matrix W := ∑_{i=1}^m w_i w_i^T is invertible, and therefore has an inverse square root W^{−1/2}. Now consider the "normalised" vectors

$$v_i = W^{-1/2} w_i, \quad i = 1, \ldots, m,$$

and observe that

$$\sum_{i=1}^m v_i v_i^T = W^{-1/2} \left( \sum_{i=1}^m w_i w_i^T \right) W^{-1/2} = I,$$

so these vectors are isotropic. The normalised vectors have norms ||v_i||^2 = ||W^{−1/2} w_i||^2. To better grasp what these norms mean, we can write:

$$\|W^{-1/2} w_i\|^2 = \sup_{x \neq 0} \frac{\langle x, W^{-1/2} w_i\rangle^2}{x^T x} = \sup_{y = W^{-1/2} x \neq 0} \frac{\langle W^{1/2} y, W^{-1/2} w_i\rangle^2}{y^T W y} = \sup_{y \neq 0} \frac{\langle y, w_i\rangle^2}{\sum_i \langle y, w_i\rangle^2}. \tag{*}$$

Thus, the norms ||v_i||^2 measure the maximum fraction of the quadratic form of W that a single vector w_i can be responsible for: exactly the critical quantity in the examples at the beginning of this section. These numbers are sometimes called "leverage scores" in numerical linear algebra and statistics.

As long as the leverage scores are bounded by α, we can apply Theorem 1 to v_1, ..., v_m to obtain a partition satisfying

$$\left( \frac{1}{2} - 5\sqrt{\alpha} \right) I \preceq W^{-1/2} \left( \sum_{i \in T_j} w_i w_i^T \right) W^{-1/2} \preceq \left( \frac{1}{2} + 5\sqrt{\alpha} \right) I.$$

We now appeal to the fact that A ⪯ B iff MAM ⪯ MBM for any invertible symmetric M (this amounts to a simple change of variables similar to what we did in (*)). Multiplying by W^{1/2} on both sides, we find that the partition T_1 ∪ T_2 guaranteed by Theorem 1 satisfies:

$$\left( \frac{1}{2} - 5\sqrt{\alpha} \right) \sum_{i=1}^m w_i w_i^T \preceq \sum_{i \in T_j} w_i w_i^T \preceq \left( \frac{1}{2} + 5\sqrt{\alpha} \right) \sum_{i=1}^m w_i w_i^T. \tag{2}$$

Thus, we have the following restatement of Theorem 1:

Theorem 2. Given any vectors w_1, ..., w_m ∈ R^n, there is a partition T_1 ∪ T_2 = [m] such that (2) holds with

$$\alpha = \max_i\, w_i^T \left( \sum_{j=1}^m w_j w_j^T \right)^{+} w_i.$$

Note that we have used the pseudoinverse instead of the usual inverse to handle the case where the vectors do not span R^n.
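The normalisation just described is straightforward to check numerically. The following sketch is my own illustration (the random data is arbitrary): it computes the leverage scores $w_i^T W^+ w_i$ with the pseudoinverse and verifies that they equal the squared norms of the normalised vectors $v_i = W^{-1/2} w_i$, which are isotropic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 50
Wrows = rng.normal(size=(m, n))          # rows are arbitrary vectors w_i spanning R^n
M = Wrows.T @ Wrows                      # W = sum_i w_i w_i^T

# Leverage scores tau_i = w_i^T W^+ w_i, computed with the pseudoinverse.
tau = np.einsum('ij,jk,ik->i', Wrows, np.linalg.pinv(M), Wrows)

# Normalised vectors v_i = W^{-1/2} w_i, via an eigendecomposition of W.
lam, U = np.linalg.eigh(M)
V = Wrows @ (U @ np.diag(lam ** -0.5) @ U.T)

print(np.allclose(V.T @ V, np.eye(n)))           # the v_i are isotropic
print(np.allclose(tau, (V ** 2).sum(axis=1)))    # tau_i = ||v_i||^2
print(tau.sum())                                 # leverage scores sum to rank(W) = n
```

The last line reflects a general fact: the leverage scores always sum to the rank of W, so they cannot all be large.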


For those who do not like to think about sums of rank one matrices (I know you're out there), Theorem 2 may be restated very concretely as:

Theorem 3. Any matrix B_{m×n} whose rows w_i^T have leverage scores w_i^T (B^T B)^+ w_i bounded by α can be partitioned into two row submatrices B_1 and B_2 so that for all x ∈ R^n:

$$\left( \frac{1}{2} - 5\sqrt{\alpha} \right) \|Bx\|^2 \le \|B_j x\|^2 \le \left( \frac{1}{2} + 5\sqrt{\alpha} \right) \|Bx\|^2.$$

The reason this theorem is powerful is that lots of diverse objects can be encoded as quadratic forms of matrices. We will see one such application later in the post.

2. Matrix Chernoff Bounds and Interlacing Polynomials

Let me quickly say a bit about the proof of Theorem 1. One reasonable way to try to find a good partition T_1 ∪ T_2 is randomly, and indeed this strategy is successful to a certain extent. The tool that we use to analyse a random partition is the so-called "Matrix Chernoff Bound", developed and refined by Lust-Piquard, Rudelson, Ahlswede–Winter, Tropp, and others. The variant that is most convenient for our application is the following:

Theorem 4. (Theorem 4.1 in [5]) Given symmetric matrices A_1, ..., A_m ∈ R^{n×n} and independent random Bernoulli signs ε_1, ..., ε_m, we have

$$\mathbb{P}\left[ \left\| \sum_{i=1}^m \epsilon_i A_i \right\| \ge t \right] \le 2n \cdot \exp\left( \frac{-t^2}{2 \left\| \sum_{i=1}^m A_i^2 \right\|} \right).$$

Applying the theorem to A_i = v_i v_i^T and taking T_1 = {i : ε_i = +1} yields Theorem 1 with a discrepancy of O(√(α log n)), which is nontrivial when α ≤ O(1/log n). This bound is interesting and useful in some settings, but it is not sufficient to prove the Kadison–Singer conjecture (which requires a uniform bound as n → ∞), or for the application in the next section. It may be seen as analogous to the discrepancy of O(√(n log n)) achieved by a random colouring of a set family S_1, ..., S_n ⊂ [n], which is easily analysed using the usual Chernoff bound and a union bound.

In order to remove the logarithmic factor and obtain Theorem 1, we prove the following stronger but less general inequality, which controls the deviation of a sum of independent rank-one matrices at a constant rather than logarithmic scale, but only with nonzero (rather than high) probability:

Theorem 5. (Theorem 1.2 in [1]) If α > 0 and v_1, ..., v_m are independent random vectors in R^n with finite support such that

$$\sum_{i=1}^m \mathbb{E}\, v_i v_i^T = I,$$

and E||v_i||^2 ≤ α for all i, then

$$\mathbb{P}\left[ \left\| \sum_{i=1}^m v_i v_i^T \right\| \le (1 + \sqrt{\alpha})^2 \right] > 0.$$

The conclusion of the theorem is equivalent to the following existence statement: there is a point ω ∈ Ω in the probability space implicitly defined by the v_i such that

$$\left\| \sum_{i \le m} v_i(\omega) v_i(\omega)^T \right\| \le (1 + \sqrt{\alpha})^2.$$

To prove the theorem, we begin by considering for every ω the univariate polynomial

$$P[\omega](x) := \det\left( xI - \sum_{i \le m} v_i(\omega) v_i(\omega)^T \right).$$

The roots of P[ω] are real since it is the characteristic polynomial of a symmetric matrix, and in particular its largest root is equal to the spectral norm of ∑_{i≤m} v_i(ω)v_i(ω)^T.

The proof now proceeds in two steps. First, we show that there must exist an ω such that

$$\lambda_{\max}(P[\omega]) \le \lambda_{\max}(\mathbb{E}P), \tag{3}$$

where λ_max denotes the largest root of a polynomial. This type of statement may be seen as a generalisation of the probabilistic method to polynomial-valued random variables, and was introduced in the paper [6], where we used it to show the existence of bipartite Ramanujan graphs of every degree. Note that (3) does not hold for general polynomial-valued random variables; in general, the roots of a sum of polynomials do not have much to do with the roots of the individual polynomials. The reason it holds in this particular case is that the P[ω] are generated by sums of rank-one matrices (which by Cauchy's theorem produce interlacing characteristic polynomials) and form what we call an "interlacing family".
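The random-signing strategy behind Theorem 4 can be simulated directly. For isotropic vectors, taking $T_1 = \{i : \epsilon_i = +1\}$ gives $\sum_{i \in T_1} v_i v_i^T = (I + \sum_i \epsilon_i v_i v_i^T)/2$, so the discrepancy of the random partition is exactly $\|\sum_i \epsilon_i v_i v_i^T\|/2$. The sketch below is my own illustration with arbitrary dimensions and seed; it compares the observed discrepancy to the $\sqrt{\alpha \log n}$ scale suggested by the bound.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 400
G = rng.normal(size=(m, n))
V = G @ np.linalg.inv(np.linalg.cholesky(G.T @ G)).T   # isotropic rows: V^T V = I

eps = rng.choice([-1.0, 1.0], size=m)                  # independent Bernoulli signs
S = (eps[:, None] * V).T @ V                           # sum_i eps_i v_i v_i^T
disc = np.linalg.norm(S, ord=2) / 2                    # discrepancy of T1 = {i: eps_i = +1}

alpha = (V ** 2).sum(axis=1).max()
print("discrepancy:", disc)
print("sqrt(alpha log n) scale:", np.sqrt(alpha * np.log(n)))
```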

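The first step, inequality (3), can be checked by hand on a tiny instance. The example below is my own toy construction, not from the paper: $v_1$ and $v_2$ are independent and each uniform on $\{e_1, e_2\}$, so that $\sum_i \mathbb{E}\, v_i v_i^T = I$. Enumerating the four outcomes shows that the smallest $\lambda_{\max}(P[\omega])$ is indeed at most $\lambda_{\max}(\mathbb{E}P)$.

```python
import numpy as np
from itertools import product

# v1, v2 independent, each uniform on {e1, e2}: E v1 v1^T + E v2 v2^T = I.
E = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

lam_max, coeff_sum = [], np.zeros(3)
for v1, v2 in product(E, E):                  # the four points omega of the space
    A = np.outer(v1, v1) + np.outer(v2, v2)
    p = np.poly(A)                            # P[omega](x) = det(xI - A)
    coeff_sum += p
    lam_max.append(np.roots(p).real.max())    # lambda_max(P[omega]) = ||A||

lam_exp = np.roots(coeff_sum / 4).real.max()  # largest root of E P = x^2 - 2x + 1/2
print(sorted(lam_max), lam_exp)               # min outcome 1.0 <= lam_exp = 1 + 1/sqrt(2)
```

Here the outcomes have largest roots 1, 1, 2, 2 while the expected polynomial has largest root $1 + 1/\sqrt{2} \approx 1.707$: some outcome does at least as well as the average polynomial, even though no outcome's polynomial equals it.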

The second step is to upper bound the roots of the expected polynomial µ(x) := EP(x). It turns out that the right way to do this is to write µ(x) as a linear transformation of a certain m-variate polynomial Q(z_1, ..., z_m), and show that Q does not have any roots in a certain region of R^m. This is achieved by a new multivariate generalisation of the "barrier function" argument used in [7] to construct spectral sparsifiers of graphs. The multivariate barrier argument relies heavily on the theory of real stable polynomials [8], which are a multivariate generalisation of real-rooted polynomials.

Rather than giving any further details, I encourage you to read the paper. The proof is not difficult to follow, and from what I have heard quite "readable".

3. Partitioning a Graph into Sparsifiers

One very fruitful setting in which to apply Theorem 2 is that of Laplacian matrices of graphs. Recall that for an undirected graph G = (V, E) on n vertices, the Laplacian is the n × n symmetric matrix defined by:

$$L_G = \sum_{ij \in E} b_{ij} b_{ij}^T,$$

where b_ij := (e_i − e_j) is the incidence vector of the edge ij. The Laplacian quadratic form

$$x^T L_G x = \sum_{ij \in E} (x(i) - x(j))^2$$

encodes a lot of useful information about a graph. For instance, it is easy to check that given any cut S ⊂ V, the quadratic form x_S^T L_G x_S of the indicator vector x_S(i) = 1{i ∈ S} is equal to the number of edges between S and its complement. Thus, the values of x^T L_G x completely determine the cut structure of G. (We mention in passing that the extremisers of the quadratic form are eigenvectors and are related to various other properties of G; this is the subject of spectral graph theory.)

Now consider G = K_n, the complete graph on n vertices, which has Laplacian

$$L_{K_n} = \sum_{ij} b_{ij} b_{ij}^T.$$

An elementary calculation reveals that the leverage scores in this graph are all very small:

$$b_{ij}^T L_{K_n}^{+} b_{ij} = \frac{2}{n}.$$

This is a good time to mention that the leverage scores of the incidence vectors b_ij in any graph G have a natural interpretation: they are simply the effective resistances of the edges ij when the graph is viewed as an electrical network. (This happens because inverting L_G is equivalent to computing an electrical flow, and the quantity x^T L_G^+ x is equal to the energy dissipated by the flow.) In any case, for the complete graph, all of the edges have effective resistances equal to 2/n, so we may apply Theorem 2 with α = 2/n to conclude that there is a partition of the edges into two sets, T_1 and T_2, each satisfying

$$\left( \frac{1}{2} - O(1/\sqrt{n}) \right) L_{K_n} \preceq \sum_{ij \in T_k} b_{ij} b_{ij}^T \preceq \left( \frac{1}{2} + O(1/\sqrt{n}) \right) L_{K_n}. \tag{4}$$

Now observe that each sum over T_k is the Laplacian L_{G_k} of a subgraph G_k of K_n. By recalling the connection to cuts, this implies that K_n can be partitioned into two subgraphs, G_1 and G_2, each of which approximates its cuts up to a 1/2 ± O(1/√n) factor.

This seems like a cute result, but we can go a lot further. As long as the effective resistances of edges in G_1 and G_2 are sufficiently small, we can apply Theorem 2 again to each of them to obtain four subgraphs. And then again to obtain eight subgraphs, and so on.

How long can we keep doing this? The answer depends on how fast the effective resistances grow as we keep partitioning the graph. The following simple calculation reveals that they grow geometrically at a favourable rate. Initially, all of the effective resistances are equal to ℓ_0 = 2/n. After one partition, the maximum effective resistance of an edge in G_k is at most

$$\ell_1 := \max_{ij \in G_k} b_{ij}^T L_{G_k}^{+} b_{ij} \le \left( \frac{1}{2} - O(1/\sqrt{n}) \right)^{-1} b_{ij}^T L_{K_n}^{+} b_{ij} = \left( \frac{1}{2} - O(1/\sqrt{n}) \right)^{-1} \cdot \frac{2}{n}.$$

In general, after i levels of partitioning, we have the inequalities:

$$2 \exp\left( O(\sqrt{\ell_{i-1}}) \right) \ell_{i-1} \ge \left( \frac{1}{2} - O(\sqrt{\ell_{i-1}}) \right)^{-1} \ell_{i-1} \ge \ell_i \ge \left( \frac{1}{2} + O(\sqrt{\ell_{i-1}}) \right)^{-1} \ell_{i-1} \ge \frac{3}{2} \ell_{i-1},$$

as long as ℓ_{i−1} is bounded by some sufficiently small absolute constant δ. Applying these inequalities iteratively, we find that after t levels:

$$\ell_t \le 2^t \exp\left( O\left( \sum_{i=0}^{t-1} \sqrt{\ell_i} \right) \right) \ell_0 \le 2^t \exp\left( O\left( \sum_{i=0}^{t-1} \sqrt{(2/3)^{t-1-i}\, \ell_{t-1}} \right) \right) \cdot \frac{2}{n} \le \exp\left( O(\sqrt{\delta}) \right) \cdot 2^t \cdot \frac{2}{n},$$

and the inequalities are valid as long as we maintain that ℓ_{t−1} ≤ δ. Taking binary logs, we find that these conditions are satisfied as long as

$$O(\sqrt{\delta}) + t + \log(2/n) \le \log(\delta),$$

which means we can continue the recursion for t = log n − 1 + log(δ) − O(√δ) = log n − O(1) levels. This yields a partition of K_n into O(n) subgraphs, each of which is an O(1)-factor spectral approximation of (1/2^t) K_n, in the sense of (4). This latter approximation property implies that each of the graphs must have constant degree (by considering that the degree cuts must approximate those of (1/2^t) K_n) and constant spectral gap; thus we have shown that K_n can be partitioned into O(n) constant degree expander graphs.

The real punchline, however, is that we did not use anything special about the structure of K_n other than the fact that its effective resistances are bounded by O(1/n). In fact, the above proof works exactly the same way on any graph on n vertices with m edges whose effective resistances are bounded by O(n/m): for such a graph, the same calculations reveal that we can recursively partition the graph for log(m/n) − O(1) levels, while maintaining a constant factor approximation! Note that the total effective resistance of any connected unweighted graph on n vertices is n − 1, so the boundedness condition is just saying that every effective resistance is at most a constant times the average over all m edges.

For instance, the effective resistances of all edges in the hypercube Q_n on N = 2^n vertices are very close to 2/n = 2/log N. Thus, repeatedly applying Theorem 2 implies that it can be partitioned into O(log N) constant degree subgraphs, each of which is an O(1)-factor spectral approximation of (1/log N) · Q_n. In fact, this type of conclusion holds for any edge-transitive graph, in which symmetry implies that each edge has exactly the same effective resistance.

The above result may be seen as a generalisation of the theorem of Frieze and Molloy [9], which says that, up to a certain extent, any sufficiently good expander graph may be partitioned into sparser expander graphs. It may also be seen as an unweighted version of the spectral sparsification theorem of Batson, Spielman, and myself [7], which says that every graph has a weighted O(1)-factor spectral approximation with O(n) edges. The recursive partitioning argument that we have used is quite natural and appears to have been observed a number of times in various contexts; see for instance the paper of Rudelson [10], as well as the very recent work of Harvey and Olver [11].

4. Conclusion and Open Questions

Theorem 1 essentially shows that, under the mildest possible conditions, a quadratic form/sum of outer products can be "split in two" while preserving its spectral properties. Since graphs can be encoded as quadratic forms/outer products, the theorem implies that they too can be "split in two" while preserving some properties. However, a lot of other objects can also be encoded this way. For instance, applying Theorem 1 to a submatrix of a Discrete Fourier Transform (it also holds over C^n) or Hadamard matrix yields a strengthening of the "uncertainty principle" for Fourier matrices, which says that a signal cannot be localised both in the time domain and the frequency domain; see the paper of Casazza and Weber [12] for details. This strengthening has implications in signal processing, and its infinite-dimensional analogue is useful in analytic number theory. For a thorough survey of the consequences of the Kadison–Singer conjecture and Theorem 1 in many diverse areas, check out [13].

To conclude, let me point out that the current proof of Theorem 1 is not algorithmic, since it involves reasoning about polynomials which are in general #P-hard to compute. Finding a polynomial-time algorithm which delivers the low-discrepancy partition promised by the theorem is likely to yield further insights into the techniques used to prove it, as well as more connections to other areas, just as the beautiful work of Moser–Tardos [14], Bansal [15], and Lovett–Meka [16] has done for the Lovász Local Lemma and Spencer's theorem. It would also be nice to see if the methods used here can also be used to


recover known results in discrepancy theory, such as Spencer's theorem itself.

Acknowledgements

Thanks to Nick Harvey, Daniel Spielman, and Nisheeth Vishnoi for helpful suggestions, comments, and corrections during the preparation of this article. Special thanks to my coauthors Adam Marcus and Daniel Spielman.

References

[1] A. Marcus, D. A. Spielman and N. Srivastava, Interlacing families II: Mixed characteristic polynomials and the Kadison–Singer problem, arXiv:1306.3969 (2013).
[2] N. Weaver, The Kadison–Singer problem in discrepancy theory, Discrete Mathematics 278 (2004) 227–239.
[3] B. Roberts, Philosophy and physics in the Kadison–Singer conjecture, http://www.soulphysics.org/2013/06/philosophy-and-physics-in-the-kadison-singer-conjecture/
[4] N. J. A. Harvey, An introduction to the Kadison–Singer problem and the paving conjecture (2013).
[5] J. A. Tropp, User-friendly tail bounds for sums of random matrices, arXiv:1004.4389 (2011).
[6] A. Marcus, D. A. Spielman and N. Srivastava, Interlacing families I: Bipartite Ramanujan graphs of all degrees, arXiv:1304.4132 (2013).
[7] J. Batson, D. A. Spielman and N. Srivastava, Twice-Ramanujan sparsifiers, arXiv:0808.0163 (2009).
[8] D. G. Wagner, Multivariate stable polynomials: theory and applications, arXiv:0911.3569 (2009).
[9] A. M. Frieze and M. Molloy, Splitting an expander graph, J. Algorithms 33(1) (1999) 166–172.
[10] M. Rudelson, Almost orthogonal submatrices of an orthogonal matrix, arXiv:math/9606213 (1996).
[11] N. J. A. Harvey and N. Olver, Pipage rounding, pessimistic estimators and matrix concentration, arXiv:1307.2274 (2013).
[12] P. G. Casazza and E. Weber, The Kadison–Singer problem and the uncertainty principle, Proc. Amer. Math. Soc. 136(12) (2008) 4235–4243.
[13] P. G. Casazza, M. Fickus, J. C. Tremain and E. Weber, The Kadison–Singer problem in mathematics and engineering: a detailed account, in D. Han, P. E. T. Jorgensen and D. R. Larson, eds., Contemporary Mathematics, Vol. 414 (2006), pp. 299–356.
[14] R. A. Moser and G. Tardos, A constructive proof of the general Lovász Local Lemma, arXiv:0903.0544 (2009).
[15] N. Bansal, Constructive algorithms for discrepancy minimization, arXiv:1002.2259 (2010).
[16] S. Lovett and R. Meka, Constructive discrepancy minimization by walking on the edges, arXiv:1203.5747 (2012).

Nikhil Srivastava
Microsoft Research, Bangalore, India
[email protected]

Nikhil Srivastava graduated in Mathematics and Computer Science from Union College, Schenectady, NY in June 2005. In May 2010 he obtained his PhD in Computer Science from Yale University with the dissertation "Spectral Sparsification and Restricted Invertibility". He did his postdocs at Princeton, MSRI (Berkeley) and IAS (Princeton). Currently he is with Microsoft Research, Bangalore, India. His current research interests are theoretical computer science, linear algebra, random matrices, and convex geometry.
