Abstract Spectral Sparsification and Restricted Invertibility

Nikhil Srivastava 2010 B n × m m ≥ n <  < In this thesis we prove the following two basic statements in linear algebra. Let Spectral Sparsification. S be an arbitrary matrix where and suppose 0 1 is given.m×m dn/ e −  BBT  BSBT   BBT T 1. 2 ThereBB is a nonnegative2 diagonalweighted matrix 2 with at most B nonzero entries for which (1 n ) (1 + ) . Thus the spectral behavior of is captured by a subset of the kBkF Restricted Invertibility. Sm×m k −  columns of , of size proportional to its rank . kBk2 T BSB k 2 2 2 kBk 2.  F There is a diagonal withB at least = (1 ) m 2 kBk nonzero2 entries, allS equal to 1, for which hasnumericaleigenvalues rank greaterF than kBk2 . Thus there is a large coordinate restriction of (i.e., a submatrix2 of its 2 columns, given by ), of size proportional to its , which is S well-invertible. This improves a theorem of Bourgain and Tzafriri [14]. O mn / O −  mn We give deterministic3 2 algorithms2 for3 constructing the promised diagonal matrices in time (n ) and ((1 ) ), respectively. O n By applying (1) to the class of Laplacian matrices of graphs, we show that every graph on vertices can be spectrally approximated by a weighted graph with ( ) edges, thus generalizing the concept of expanders, which are constant-degree approx- imations of the complete graph. Our quantitative bounds are within a factor of two of those achievedO byn then celebrated Ramanujan graphs. We then present a second graph sparsification algorithm based on random sampling, whichB produces weaker sparsifiers with ( log ) edges but runs in nearly-linear time. We also prove a refinement of (1) for the special case of arising from John’s decompositions of the identity, which allows us to show that every convex body is close to one which has very few contact points with its minimum volume ellipsoid. Spectral Sparsification and Restricted Invertibility

A Dissertation Presented to the Faculty of the Graduate School of in Candidacy for the Degree of Doctor of Philosophy

by Nikhil Srivastava

Dissertation Director: Daniel A. Spielman

May 2010

Copyright c 2010 by Nikhil Srivastava All rights reserved.

ii for Mamma, Papa, Neha, Moni Didi, and Cundi.

iii Acknowledgments

This thesis and graduate school in general turned out better than I could have imag- ined, and that is because of many people. First I want to thank Dan Spielman, who I was outrageously lucky to have as my advisor. Dan tolerated and encouraged me in the early days, when I had no idea what a singular value was and was often para- lyzed by fear. Homer (Simpson) said tryingMATLAB is the first step towards failure; without Dan’s support I might still be stuck in that well. Dan was generous in every respect  with time, freedom, ideas, advice, and refutations of conjectures (which should have been my job). He was happy for me when I succeeded, and he smiled like a kid at my first conference talk. Without his heroic optimism (and of course, his bold mathematics), we would certainly have given up on some of the stronger theorems in this thesis. Knowing Dan is a pleasure and an honor, both personally and intellectually, and unquestionably one of the best things in my life. I am forever indebted to my two great undergraduate advisors at Union: Alan Taylor drew me to mathematics with his beautiful lectures, went out of his way to get me involved in research, supplied wise maxims which I live by to this day, and was generally an awesome role model; Peter Heinegg taught me how to read and write and (along with Rosie Heinegg) cut weeds and paint furniture  I ended up minoring in English after trying to take every class that he offered, and they were all great. I also learned a lot from lectures and collaborations with Chris Fernandes, Brian Postow, Steve Sargent, and Bill Zwicker. At Yale, I thank Dana Angluin for being my friend and unofficial guide starting the day I arrived, David Pollard for teaching the best course I have ever taken, and Joan Feigenbaum for convincing me to come here in the first place. I thank my committee members Vladmir Rokhlin, Steve Zucker, and Shang-Hua Teng for checking out my thesis, and Linda Dobb for helping me submit it. I thank my pal Lev Reyzin, with whom I had fun and symbiotic collaborations and some of my first successes in research in graduate school, as well as my first sprats. I was also very fortunate to work with Josh Batson and Adam Marcus, and some of our joint results appear here. I thank Ravi Kannan and Rina Panigrahy, who mentored me during wonderful summer internships at Microsoft Research in Bangalore and Silicon Valley. Moses Charikar invited me to give one of my first talks; I was lucky to run into him again at SVC, where we had several exciting discussions about graphs. I enjoyed hanging out and working with many interns during those summers  Hari, Moritz, Debmalya, Yi, Madhur  and also with Amit Deshpande, Ittai Abraham, and Parikshit Gopalan.

iv I thank my friends for being rad and holding time together in one piece, among them in vaguely chronological order: The Entity, Omar Ali, Bukhari, Trinka, Alameen, Neethu, The Cage (and groupies), The School for Nocturnal Learning, Shipdog, Anja, the AKW diasapora  Eli, Edo, Aaron, Pradipta, Ronny, Kevin, Alex, Felipe, Rich, Sam, Nick, Justin, Antonis, Andi, Azza, Reynard, Huan  Yao, Dominik, Argyro, Akshi, Jennifer, Ricardo, Jackie, and Hannah. I thank Ulli for zaubering. I thank Broken Social Scene for supplying a wicked soundtrack and multiple religious experiences. I also owe a lot to acid-free paper, which looks just like regular paper. I thank my grandfather, my philosophical opponent and friend for many years, and my grandmothers and all of their children for spoiling me. Last, I thank my parents and sister, who raised me right, loved me, and gave me a happy childhood which continues to the present day.

v Contents

1 Introduction 1

1.1 Spectral Sparsification ...... 2 1.2 Restricted Invertibility ...... 5 1.3 Summary ...... 8 2 Preliminaries 9 1.4 Bibliographic Note ...... 8

2.1 Linear Algebra ...... 9 2.1.1 Basic Notions ...... 9 2.1.2 The Pseudoinverse ...... 11 2.1.3 Formulas for Rank-one Updates ...... 11 2.1.4 Positive Semidefinite Matrices ...... 12 3 Spectral Sparsification 14 2.2 Graphs and Laplacians ...... 12

3.1 Strong Sparsification ...... 14 3.1.1 Intuition for the Proof ...... 16 3.1.2 Proof by Barrier Functions ...... 18 3.1.3 Optimality ...... 25 3.2 Weak Sparsification by Random Sampling ...... 25 4 Sparsification of Graphs 29 3.3 Other Notions of Matrix Approximation ...... 28

4.1 Twice-Ramanujan Sparsifiers ...... 32 4.1.1 Expanders: Sparsifiers of the Complete Graph ...... 33 4.1.2 A Lower Bound ...... 34 4.2 Sparsification by Effective Resistances ...... 36 4.2.1 Electrical Flows ...... 37 5 Application to Contact Points 43 4.2.2 Computing Approximate Resistances Quickly ...... 38

5.1 Introduction ...... 43 5.2 Approximate John’s Decompositions ...... 46 5.2.1 An Outline of The Proof ...... 47

vi 5.2.2 Realizing the Proof ...... 50 6 Restricted Invertibility 59 5.3 Construction of the Approximating Body ...... 53

6.1 Introduction ...... 59 7 The Kadison-Singer Conjecture 65 6.2 Proof of the Theorem ...... 60

vii Chapter 1

Introduction

Matrices are central objects in Mathematics and Computer Science. They can be used to represent a wide variety of data, for instance: graphs, point configurations, poly- hedra, covariances of random variables, systems of linear equations, Markov chains,A T quantum states,aij recurrences, second order derivatives,x 7→ Ax and general linear transforma-x 7→ x Ax tions. Depending on the context, we may emphasize different features of a matrix : the entries ( ) themselves, the linear function , the quadratic form , the eigendecomposition/invariant subspaces, and so on; every square matrix comes equipped with all of these interpretations, and each can be enlightening in its own way. In this thesis, we will investigate the extent to which arbitrary matrices can be approximated by ‘sparse’ or ‘structured’ ones. We will show that for a certain large class of matrices this is always possible in a certain useful sense. We will then use • (Chapter 4). G n that fact and related ones to obtain new results in: H O n Every undirected weighted graph on ver- tices can be approximated by a weighted graph which has ( ) edges. This generalizes the concept of expander graphs, which are constant-degree approx- imations of the complete graph  in fact, the quantitativeH bounds which we achievesparsifier are within a factor of two of those achieved byG the celebrated Ramanu- jan graphs. Besides being of inherentH interest,O n the graph , which we call a can be used to speedH up computationsO n n on . We give a polynomial time algorithm for constructing with ( ) edges, as well as a nearly-linear • Asymptotic Convex Geometry (Chapter 5). K n time algorithm for constructing with ( log ) edges. R H O n Every convex body in is close to a body whichH hasO atn mostn ( ) contact points with the minimum volume ellipsoid which contains it. This improves a result of Rudelson [38], who showed • Functional/Numerical Analysis (Chapter 6). the existence of with ( log ) contact points. coordinate numerical rank Every matrix with large trace has a subspace of size proportional to its on which it is well-invertible (i.e., close to an isometry). Qualitatively, this fact has been

1 known for some time as the Restricted Invertibility Theorem of Bourgain and Tzafriri [14]. Our construction is significantly simpler and supplies much sharper quantitative bounds.

A notable feature of our proofs is that they use only elementary linear algebra and yield deterministic polynomial time algorithms for constructing the desired objects. All previous results in these directions were weaker and had considerably more complex proofs which were probabilistic or nonconstructive. All of our theorems rest on two foundational results, which we will motivate and introduce in the next two sections. In each case, we will start by asking a basic question about matrices, show that the spectral theorem provides a weak answer to 1.1that question, Spectral and then Sparsification present the result of this thesis as a more satisfying answer.

A n

T The object of study in this section isA a positiveviv semidefinitei ; matrix of rank written i≤m as a sum of outer products X m = n vi where the number of terms may be much more than the rank . Here we think of the as being ‘elementary’, ‘meaningful’, or ‘interesting’ directions arising from the combinatorial,A probabilistic, geometric, or other origin of the matrix. The following examples illustrate how such a representation can provide a more natural coordinate Example 1.1. G V ; E; w system for than simply considering its entries. n Laplacian G (Graph Laplacians). Suppose = ( ) is an undirected weighted T graph on vertices. Then theLG wij eofi − ejise definedi − ej as: ij∈E X = ( )( n)

Thus the space of Laplacians of weighted graphs on vertices is simply the set of e − e e − e T i; j ∈ n ; i 6 j all nonnegative linear combinationsi j i of j  ( )( ) : [ ] = Example 1.2. X ⊂ n which correspond to the possible edges between vertices.R p X (Covariance Matrices). Suppose is a discrete set of points T T and we are interested in probabilityA Ep distributionsxx p x xxon . Then the set of covariance x∈X matrices X {xxT x ∈ =X} = ( )

is the convex hull of : .

2 T T n × m B bi A BB i≤m bibi B More generally, such a representation as a sum of outer productsP can arise from any matrix with columns by considering = = ; we often doQuestion this, for 1.3. instance, whenA computingsparse the singular values of . The question we seekA to answer is: small number When does admit a representation as a sum of outer products?A Being able to write as the sum of a of outer products may be Answeruseful in 1.4. succinctly storing, computing with, and understandingA the behavior of . One answer to this question is readily supplied by the spectral theorem. (Sparse Representation in Eigenbasis).λ ≥ :::; As≥ λisn symmetric and positive semidefinite,u ; : : : all un of its eigenvalues are nonnegative real numbers. Thus there are 1 efficiently computable nonnegative weights (the eigenvalues) and 1 T vectors (the eigenvectors)A for whichλiuiui ; i≤n X A = n and we have written compactly as a weighted sum of outer products, which is the minimal number. A The above representation has several unique and desirable geometric properties  for instance, the eigenvectors are orthogonal and cleanly decouple{uj } thej≤n action of into separate components  which account for the immense power{ ofvi} thei≤m spectral theorem. However, it has theA serious drawback that the eigenvectors need not havevi meaningful interpretations in terms of the elementary directions ; indeed, decomposing the action of nalong eigenspaces mayn be quite unnatural in terms of the uj. To see this moreuj concretely, consider that Answeredges 1.4 allowsei − ej us to represent the Laplaciannot of any graph on vertices as a sum of outer products in its eigenvectorsn T ; however, the do not in general correspond to uj uj( ), and so this does tell us anything about, say, the graph being equivalent a sparser graph with edges. In essence, as soon as we start talking about which do not come from our set of elementary directions, we have left the space of matrices which correspond T ton graphs.A i≤m vivi A T vivi To remedy the above situation, wen consider a more demanding task: given a rank P T matrix = , we seek to represent by as shorteri ≥ sum of theA elementaryi sivivi , ideally of length comparable to as attained in Answer 1.4. UnfortunatelyP it is not always the case that there are sparseapproximation weights 0 forA which = exactly. What we show is that such weights do always exist (and can be computed efficiently) if we are willing to settle for an of rather than an exact

Definitionrepresentation. 1.5 Let us take a moment to. describe the notion of approximationSn that we use. (Matrix Approximation) The positive semidefinite cone comes with A  B A − B ∈ S : a natural partial order, given by n

iff

3 multiplicative approximation A κ A This induces a strong notion of for such matrices: we A  A  κ · A: say that ˜ is a -approximation for if ˜ kA−Ak The above compares favorably with more common notions of approximation that appear in the literature, such as requiring ˜ to be small in some norm; we defer Theorema more thorough 1.6 discussion to Section. 3.3.Suppose <  < and We are now in a position to state our first main theorem. T (Spectral Sparsification)A vivi 0 1 i≤m X n = are given, with vi ∈ R . Then there are nonnegative weights {si}i≤m, at most dn/ e of which are nonzero, for which 2

T −  A  A sivivi   A: i≤m 2 X 2 (1 ) ˜ = (1 + ) There is an algorithm which computes the weights si in deterministic O n m/ time. A any 3 2 ( ) sparseIn short, the theorem says that if can be written as nonnegative linear combinationdn/ ofe elementary (rank one) matrices, then it can be approximated by a nonnegative2 linear combination of the same elementary matrices. The number of terms is only a constant factor greater than that promised by the spectral theorem in Fact 1.4. In fact, we show in Section 3.1.3 that the sparsity/approximation tradeoff which we achieve cannot be improved by more than a factor of 4. n We mention in passingdictionary that the above bears some syntactic{vi}i≤m ⊆ resemblanceR to the n well-studiedx ∈ R sparse approximation problem for vectors (see,si for instance,x [48]),i siv ini x which we are given a of elementary vectors and aP signal vector and we seek a sparse set of coefficients for which = . We remark that the analogue of Answer 1.4 in this case would be to write as a sparse sum of just one vector, namely itself. However, there are important differences between this setting and ours, such as requiring the coefficientsA spectral to sparsifierbe nonnegative,A T and we do not dwell on the comparisonA further in this thesis.A i vivi one v vT A ˜ We prove Theorem 1.6 in Chapteri i 3, and we call a P for . The procedure we give for finding ˜ is iterative: given = , it selects some multiple of outer product to add to ˜nin each step, and sparsity is ensuredvi by proving that the running sum converges to a matrix with the appropriate spectral properties inv ai small number of steps, linear in . The rule for selecting a particular in any step is greedy, and based on two ‘barrier’ functions which blow up if one tries to choose a which affectsA the eigenvalues in an undesirable manner. The analysis of why the process works is local in that we just prove that each step by itself brings us closer to approximating .

4 At the end of Chapter 3, in Section 3.2, we observe that a simple randomizedA method for spectral sparsification follows immediately from aO probabilityn n/ inequality of Rudelson [39] regarding sums of random rank one matrices. The approximation2 ˜ produced by that method has a weaker sparsity guarantee of ( log ) terms, but it can be computed more quickly than the stronger one promised in Theorem 1.6. In Chapter 4 we motivate and define a spectral notion of approximation for undi- rected weighted graphs, which corresponds exactly to restricting attention to the class of Laplacian matrices described in Example 1.1. We observe that Theorem 1.6 imme- diately indicates a way to deterministically construct near-optimal sparsifiers of such graphs. Then we describe a faster algorithm based on the random sampling approach discussed in Section 3.2, which gives sparsifiers of somewhat worse quality but runs in nearly linear time. vi In Chapter 5, we prove a refinement of Theorem 1.6 which provides additional guarantees in the case when we are given elementary vectors whose mean is zero. We apply this to prove a sparsification result for John’s decompositions of the identity, which characterize the contact points of convex bodies with their minimum 1.2volume ellipsoids, Restricted and use Invertibility that to improve a result of Rudelson on the matter [38].

n × m B m n The resultsR R of this section are best appreciated in the slightly more general settingB where we consider an arbitrary matrix , which we view as a linearx 7→ operatorBx from to . An important step towards understandingB the behavior of such a m m is to determineB ⊆ R whether/whereB it is⊆ invertible,R i.e., where the functionB is injective. It is wellB known thatB the→ invertibiltyB of is characterized by the subspaces rowspan( ) and colspan( ) , both of dimension rank( ), for which the restrictedFact 1.7. map : rowspanB ( ) colspan( ) is a bijection. We summarize this information as: Every matrix is invertible precisely on a linear subspace of dimension equal to its rank, and this subspace is spanned by any maximal linearly independent subset of its columns. B 11T However, both rank and invertibility are fragile notions in the sense that adding a tiny amount of noise to a degenerateleast singular matrix value such as = immediately makes it full rank and invertible everywhere. A more quantitative and robust measure of in- vertibility is supplied by the T T , which/ is defined for any rectangular x B Bx T matrix as: σmin B kBxk λmin B B : kxk x6 xT x 1 2   p ( ) = min=1 = min=0 B = ( ) σmin B > well-invertible Theσmin leastB > csingular value tells usc  how much can shrink a vector in the worst case. Thus a matrix is invertible precisely when ( ) 0, and when ( ) for some constant 0.

5 numerical rank

kBk BT B The corresponding quantitative notion ofF rank turns out to be the , B T : which is defined as: kBk2 kB Bk 2 Tr( ) B nrank( ) = 2 = 2 B ≤ B B We think of nrank as counting the number of directions in which is large. Notice T that we always have nrank( ) rank( ), with11 equality when' is an orthonormaln basis. Also observe that this definition is robust against perturbations, so that in the degenerate example mentioned earlier nrank( + noise) 1 rather than . Question 1.8. B We may now ask if a quantitativen×m version of Fact 1.7 holds, namely: numerical rank B well-invertible Given a matrix , is there a subset of columns of size equal to the so that the restriction of to these columns is ? Answer 1.9. A BBT As in the previous section, a partial answer is provided byn× then spectral theorem. λi A σi B A (Well-Invertibilityi λi onλ Largei ≤ kA Eigenspace.)k Let = 2 , so that the eigenvaluesA of are equal to the squares of the singular values, , of . Recalling P A 2 that Tr( ) = and that eachk , we canB deduce by Markov’s inequality kAk that has at least λ ; : : : ; λ 1 Tr( ) 1 k = 2 = nrank( ) 2 2 1 A kBk eigenvalues greater thanθ F ; n n 2 1 Tr( ) 1 = = n 2 2 L ⊂ R u ; : : : ; uk − m 0 whichD we assume for normalizationB L ⊂ R is at least a constant. B D → L / 1 Let denoteσ ; : the: : ;1 σ spank of the correspondingθ eigenvectors , and let be its preimage . Thenk the restricted1 2 linear map : 1 has singularB values , all greater than , and is therefore well-invertible. Moreover, these subspaces have dimension , which is proportional to the numerical B rank of . T Thus weBB have shown that every matrix is well-invertible on a subspace of size proportional to its numerical rank, and this subspace isL spanned by the top eigen- T vectors of BB. Again, the answer provided by the spectral theorem is geometricallyB satisfying in certain ways  for instance, the subspace columnswhich weB restrict to is in- variant under  but it fundamentally disrespects the structure of the matrix in others.S ⊂ m In particular, it doesB not supply a restrictioncoordinate to of as in Fact 1.7. The truly satisfying answer to Question 1.8 would be an appropriately large subset B S → B [ ] of the columns of for whichS R the S restriction B n × |S| B S : colspan( ) S is well-invertible, where denotes the submatrix of with columns in . In 1987, Bourgain and Tzafriri [14] proved that this is true upto constant factors

6 m n B

when = and the columns of have unit length. Their result was used to great effect in geometric functional analysis and convex geometry. It can be seen as a Ramsey type theorem, in that it says that every matrix satisfying some reasonable properties contains a large submatrix (corresponding to the coordinate subspace) which is ‘structured’ in the sense that it is close to anB isometry by virtue of not shrinking or stretching vectors too much. It can also be viewed as an ‘average caseB vs. worst case’ kind of statement, in that it says that if is well-invertible for an average vector (i.e., has large trace), then there is some coordinate restriction of which is well-invertible for every vector. − Bourgain and Tzafriri’s proof used probabilistic and functional-analytic techniques and was nonconstructive. Moreover, the constant factors it achieved were far from72 practical  in particular, the bounds were off from those in Answer 1.9 by about 10  and so the theorem was not very influential in numerical analysis even though it is qualitatively related to the Column Subset Selection problem and the Rank-Revealing QR Factorization [47]. These constants were improved in recent works by Tropp [47] (who also gave a randomized polynomial time algorithm), Casazza [16], and others in certain regimes, but they remained far from optimal. The second main theorem of this thesis is an elementary constructive proof of the Theoremfollowing 1.10 generalization of Bourgain and. Given Tzafriri’s an n× theorem,m matrix withB, there a sharp exists constant a subset of S1. ⊂ m of linearly independent columns of size (Restricted Invertibility) kBk [ ] |S| ≥ −  F −  B kBk2 2 2 2 for which the coordinate restriction(1 )BS has2 = least (1 singular) nrank value( ) at least

kBkF σmin BS ≥  : m 2 2 2 Moreover, S can be found deterministically( ) in O −  n m time. 2 3 ((1 ) ) T si ≥In Chapter 6, we usei≤m s aiv variantivi of the barrierA method of Chapter 3 (with one barrier) to prove Theorem 1.10.P Theorem 1.6 guarantees the existence of a sparse set of scalars 0 forS which approximates ; here, we are able to guarantee that the scalars1 are are either 0 or 1, corresponding to whether or not a column is chosen to be in . At a high level, this becomes possible because we are only interested in a lower bound on the spectrum of the vectors wesi select (this is what is meant by ’well-invertible’), as opposed to both upper and lower bounds as in TheoremT 1, andT A BB i≤m bibi this gives us more freedom toS choose⊂ m the weights . −  A b bT  1 kAk iP∈S i i A  ThisI connection becomes clearer if we state Theorem 1.10 in terms2 Tr( of) = = , m bi i∈S 2 P for2 Tr( which) it guarantees a subset [ ] of size at least (1 ) satisfying span( : ) .

7 1.3 Summary

Thus, our two main results can be seen as ‘coordinate’ versionsvi of Answers 1.4 and 1.9uj  they supply comparable quantitative conclusions to those implied by the spectralbarrier theorem,method but in terms of columns / elementary vectors rather than eigenvectors . They share the same iterative method of proof, which we will refer to as the . Unlike previous approaches, this method immediately yields deterministic polynomial time algorithms. Because of the pliability of the method as well as the central place of linear algebra in mathematics, we are able to apply our results to 1.4prove interesting Bibliographic theorems in Note diverse areas.

Many of the results presented in this dissertation have been published or submitted for publication. The new results in Chapter 3 are joint work with Joshua Batson and Daniel Spielman, and appeared in [8]. The results of Chapter 4 are joint work with Daniel Spielman and appeared in [41]. Chapter 6 is also joint work with Daniel Spielman, and has been submitted for publication.

8 Chapter 2

Preliminaries

2.1 Linear Algebra

Most of the proofs in this thesis rely only on elementary facts about symmetric ma- trices. All of this material can be found inA; any B; X textbook on linear algebra, such as [27]. v; w; x A n × n We willR use uppercase letters such as to denote matrices andn × lowercase letters such as to denote vectors. In what follows, is an matrix with 2.1.1entries in Basicand Notions all vectors are column vectors (matrices of dimension 1). • Entries. A i; j v i

• Transpose,We Norm, will use Inner parentheses Product, as and in Outer( ) and Product.( ) to denote the entries of mmatrices× n and vectors,X respectively.n × m X T i; j X j; i v; w ∈ Rn inner product The transpose of an matrix is the matrix ( ) = ( ). T If are vectors thenhv; w thei v w v iisw ai real: number given by i≤n X = = ( ) ( ) √ Euclidean norm kvk vT v outer product vwT n × n v i w j The of a vector is denoted by = . • Spectral Theorem. A λ ; : : : λn The is an n matrix with entries ( ) ( ). u ; : : : un ∈ R 1 If is symmetric then there are real numbers and 1 Au λ u orthonormal vectors fori whichi i

= T A uiui : i≤n and X =

9 eigenvalues eigenvectors

• Trace. trace These are called the and , respectively.

The of a matrix is the sum of its diagonal entries; equivalently, the sum of its eigenvalues: A A i; i λi: i≤n i≤n X X Tr( ) = ( ) = • Determinant. determinant

The of aA matrix isλ giveni; by i≤n Y det( ) =

• Characteristic Polynomial. characteristic polynomial the product of the eigenvalues. p x xI − A TheA λ ; : : : λ ( ) = det( n ) • Image and Kernel. A 1 A {Ax x ∈ n} has zeros equal to the eigenvalues . R A {x Ax } If is symmetric then the subspaces im( ) = : • Rank. A and ker( ) = : = 0 are orthogonal to each other. A λi The rank of is equal to the number of linearly independent columns of • Matrix Product. A a ; : : : ; a ∈ n B bT ; : : : ; bT ∈ , the dimension of its image, and the numbern R of nonzero eigenvalues . n Rn 1 T 1 If has columnsAB aibi : and has rows i≤n , then X • Matrix Norms. spectral norm =operator norm

kAk kAxk: The or kxk of a matrix is given by 2 = sup=1 A kAk i λi

Frobenius norm 2 When is symmetric then = max , the largest eigenvalue. / / The is given by T / kAkF A i; j 1 2 A A λi 1 2 : i;j≤n ! 1 2 i≤n ! X 2  X 2 = ( ) = Tr( ) = k · k

When we omit the subscript in , we refer to the spectral norm.

10 • Quadratic form and Entrywise Product.

T A • B A i; j · BLeti; j A B i;j≤n X entrywise = ( ) ( ) = Tr( ) quadratic form A v 7→ vT Av Rn → R denote the product. T T The of is thev functionAv A • vv : from , and can be written as • Courant-Fisher Theorem. λ ≤ λ ;::: ≤ λ = n A 1 2 The eigenvalues of a symmetric vT Av vT Av matrix haveλi the following variational characterization: ; S S n−i v∈S\{ } vT v S S i v∈S\{ } vT v

S =:dim( max)= +1 min0Rn =:dim( min)= max0

2.1.2where The Pseudoinverseranges over subspaces of .

An×n

n If is symmetric then we can diagonalize it andT write A λiuiui i X = λ ; : : : ; λk =1 A u ; : : : ; uk Moore-Penrose Pseudoinverse A 1 1 where are the nonzero eigenvaluesk of and are a corresponding set of orthonormal eigenvectors. The T of is then A uiui : λi defined as i + X 1 ⊥ = =1 A A A + k The pseudoinverse behaves as the inverse on ker( )T . Notice that ker( ) = ker( ) AA A A uiui ; and that i + + X = = =1 A A AA A A A A ⊥ which is simply the projection+ onto the span+ of+ the nonzero eigenvectors of (which are2.1.3 also the Formulas eigenvectors for ofRank-one). Thus, Updates= is the identity on im( ) = ker( ) .

The following well-known and easily verified identity describes the behavior of the inverse of a matrix under rank-one updates (see [27, Section 2.1.3]). As many of our constructions involve iteratively building a matrix by adding one outer product at a time, we will use it frequently.

11 Lemma 2.1 . If A is a nonsingular n × n matrix and v is a vector, then A− vvT A− (Sherman-Morrison Formula)T − − A vv A − T − : 1 v A 1v 1 1 ( + ) = 1 determinant 1 + There is a related formula describing the change in the of a matrix Lemma 2.2 . If A is nonsingular and v is a vector, then under the same update: A vvT A vT A− v : (Matrix Determinant Lemma) 1 2.1.4 Positive Semidefinitedet( + Matrices) = det( )(1 + )

positive semidefinite matrices R

In this thesis we will often focus on symmetric over , Definition 2.3. A ∈ n×n positive semidefinite (PSD) which have the following equivalent characterizations.R B ∈ n×m A BBT A symmetricR matrix is if A b bT 1. There is a matrix for which =i i . Thus we can write i≤m X = bi B A where are the columns of .

2. is in the conical hullT of the setn of rank one symmetricT outer products,n i.e. A ∈ vv v ∈ R aivivi ai ≥ ; vi ∈ R : ( i≤m )  X cone : = : 0 v ∈ Rn vT Av A 3. For every vector the quadratic form is nonnegative. n × n S 4. The eigenvalues of are nonnegative real numbers.n

We will denote the cone of PSD matrices by . This cone comes with a A  B A − B ∈ S : natural partial order given by n

2.2 Graphs and Laplaciansiff

G V ; E; w n m we ≥ G Let = ( ) be a connected weighted undirected graph with vertices and edges and edge weights 0. If we orient the edges of arbitrarily, we can write

12 T L B WB Bm×n signed edge-vertex incidence matrix

its Laplacian as = , where is thev e , B e; v − v e given by   1 if is ’s head ( ) = 1 if is ’s tail Wm×m  W e; e we T 0 otherwise be B e u; v and is the diagonal matrix with ( ) = . To put this another way, the b χ − χ row of corresponding to an edgee =u ( )v is given by χ u u = 1 where is the canonical basis vector with a 1 in position . Thus the Laplacian T T admits an expansionL as a sumweb ofeb outere productswuv χu − χv χu − χv : e∈E uv∈E X X L= = ( )( )

xT Lx xT BT W Bx kW / Bxk It is immediate that is positive semidefinite since: 1 2 2 n wu;v xu − xv ≥ ;2 x ∈ R = u;v ∈E = X 2 = ( ) 0 for every . G ( ) L W / B 1

1 2 and that is connected if and only if ker( ) = ker( ) = span( ).

χu ei; ej 1 e We are using to denote canonical vectors here rather than the more conventional to avoid confusion with edges named .

13 Chapter 3

Spectral Sparsification

In the first section we will prove our main spectral sparsification result, Theorem 1.6, restated shortly for convenience. In the second section we will describe a weaker version of this result, due to Rudelson [39], which is based on random sampling and 3.1is computationally StrongSparsification more efficient.

Theorem 3.1 . Suppose <  < and Let us begin by recalling the statement of Theorem 1.6. T (Spectral Sparsification,A copy ofviv Theoremi 1.6) 0 1 i≤m X n = are given, with vi ∈ R . Then there are nonnegative weights {si}i≤m, at most dn/ e of which are nonzero, for which 2

T −  A  A sivivi   A: i≤m 2 X 2 (1 ) ˜ = (1 + ) (3.1) There is an algorithm which computes the weights si in deterministic O n m/ time. 3 2 ( )

For conceptualA I and notational convenience, we will actually prove the following equivalent statement, which amounts to saying that it is sufficient to consider the Theorem 3.2. Suppose d > and v ; v ;:::; v are vectors in n with case when = . m R 1 1 2 T 1 vivi I i≤m X

vi = u; l 1 We use boldface to denote the vectors in order to avoid confusion with lowercase letters which denote real numbers in the subsequent proofs.

14 Then there exist scalars si ≥ with |{i si 6 }| ≤ dn so that √ 0 T : d= 0 d I  sivivi  √ I: i≤m d − d! X + 1 + 2 + 1 2 Proof of Theorem 3.1. A Theorem 3.1 can be deduced immediately from this. T Assume withoutA lossw ofiw generalityi : that has full rank and i≤m X − / = vi A wi 1 2 Setting = , we see thatT − / T − / vivi A wiwi A I; i i≤m ! X 1 2 X 1 2 = = d /

si ≥ dn/ e 2 satisfying the requirement2 of Theorem 3.2. We√ can now set = 1 and obtain scalars 0, at most Tnonzero,d for whichd  I  sivivi  √ I I: d − d −  2 i≤m !   X + 1 + 2 1 + T = A −  i siwiwi+ 1 2 1 2 ˜ P  Thus if we take = (1I  ) −  − A− / thenAA− /  I; −  2 2 1 2 1 2   1 + (1 ) ˜ 1

which is equivalent to the desired conclusion. si The rest of this{ sectionvi} is devoted to proving Theorem 3.2. The proof is constructivesi T andA yieldsi sivi avi deterministic polynomial time algorithm for findingA the scalars . T A siviv GivenP vectors , our goal is to choosei a small set of coefficients so that = is well-conditioned. We will build the matrix in steps, starting with = 0 and adding one outer product at a time. Before beginning the proof, it will be instructive√ to study how the eigenvalues and characteristic polynomial of a matrix d √d evolved − d upon the addition of a vector. This discussion should provide some intuition2 for the+1+2 structure of the proof, and demystify the origin of the ‘Twice-Ramanujan’ number +1 2 which appears in our final result. 2 See Section 4.1

15 3.1.1 Intuition for the Proof A vvT A

T AIt isvv well known that the eigenvalues of + interlace those of . In fact, the new eigenvalues can be determined exactly by looking at the characteristic polynomial of + , which is computed using LemmaT 2.2 as follows: hv; uj i pA vvT x xI − A − vv pA x − ; x − λ 2 j i ! + X ( ) = det( ) = ( ) 1 λi A uj pA vvT x λ where are the eigenvalues of and are the corresponding eigenvectors. The + p λ λ A polynomial ( ) hasA two kinds of zeros : j v uj T 1. Those for which ( ) = 0. These arevv equal to the eigenvalues of for which the added vector is orthogonal to the corresponding eigenvector , and which p λ 6 do not therefore ‘move’A upon adding . hv; u i 2. Those for which ( ) =f 0λand − j : λ − λ 2 j j ! X ( ) = 1 = 0

These are the eigenvalues which have moved and strictly interlace the old eigenvalues. The above equation immediately suggests a simple physical model v, u 2 = 1/4 which gives intuition as to where these new eigenvalues are located.h ni

λn v, u 2 = 0 λn h 3i v, u 2 = 1/4 h 2i

λ3 v, u 2 = 1/2 λ3 h 1i λ2 λ2

λ1 λ1

PhysicalFigure Model. 3.1: Physical model of interlacingλ eigenvalues. n λj We interpret the eigenvalues as charged particles lying on a slope. On the slope are fixed, chargeless barriers located at the initial eigenvalues , and each particle is resting against one of the barriers under

16 vvT hv; uj i λj the influence2 of gravity. Adding the vector corresponds to placing a charge of on the barrier corresponding to . The charges on the barriers repel those on the eigenvaluesj with a force that is proportional to the charge on the barrier and inversely proportional to the distance from the barrier  i.e., the hv; u i force from barrier is given by j ; λ − λj2

λj λ

a quantity which is positive for ‘below’ , which are pushing the particle ‘upward’, and negative otherwise. The eigenvalues move up the slope until they reach an equilibrium in which the repulsive forces from the barriers cancel the effectf λ of gravity, which we take to be a +1 in the downward direction. Thus the equilibrium condition corresponds exactly to having the total ‘downward pull’ ( ) equal to zero. A random {vi} With this physical modeluj in mind, we begin to consider what happens to the eigenvalues of whenv ∈ we {vi add}i≤m a vector from our set . The first observation is that for any eigenvector (in fact for any vector at all), the expected projection of a randomly chosen is T T kuj k Evhv; uj i hvi; uj i uj vivi uj : m m m 2 m 2 i 2 i ! 1 X 1 X 1 = = = = vi

Of course, this does not mean that there is any single vector in our set that realizes/m this ‘expected behavior’3 of equal projections on the eigenvectors. ButA if we were to add such a vector in our physical model, we would add equal charges of 1 to each of the barriers, and we would expect all of the eigenvalues of to drift forward ‘steadily’. In fact, one might expect that after sufficiently many iterationsλmax/λ ofmin this process, the eigenvalues would all march forward together, with no eigenvalue too far ahead or too far behind, and we would end up in a position where is bounded. In fact, this intuition turns out to be correct. Adding a vector with equal projections changes the characteristic polynomial in the/m following manner: 0 p T x p x − p x − /m p x ; A vavgvavg A x − λ A A j j ! + X 1 ( ) = ( ) 1 = ( ) (1 ) ( )

3 For concreteness, we remark that thisvavg ‘average’√ vectoru would: be precisely m j j 1 X =

17 0 pA x j i6 j x − λi A p x xn k P Q = since ( ) = ( ). If we start with = 0, which has characteristic 0 p x I − /m D k xn polynomial ( ) = , then afterk iterations of this process we obtain the polynomial D x ( ) = ( (1 ) ) I − αD α > where is the derivativeassociated with respect Laguerre to . polynomials Fortunately, iterating the operator ( ) for any 0 generates polynomials which are members of a standard orthogonalk dn family – the [20]. These polynomials are very well-studied and the locations√ of their zeros are known; in particular, after d d = iterations the ratio of the largest√ to the smallestn → ∞ zero is known [20] to approach d − d + 1 + 2 as , + 1 2 which is exactly what we want. T vavgTo prove the theorem, we will show that we can choose a sequence ofvi actualvi vectors thats realizesi ≥ the expected behavior (i.e. the behavior of repeatedly adding ), as long as we are allowed to add arbitrary fractional amounts of the via the weights 0. We will control the eigenvalues of our matrix by maintaining two barriers as in the physical model, and keeping the eigenvalues between them. The lower barrier will ‘repel’ the eigenvalues forward; the upper one will make sure they do not go too far. The barriers will move forward at a steady pace. By maintaining that the total ‘repulsion’ at every step of this process is bounded, we will be able to guarantee that there is always some multiple of a vector to add that allows us to 3.1.2continue Proof the process. by Barrier Functions

We begin by defining two ‘barrier’ potential functions which measure the quality of the eigenvalues of a matrix. These potential functions are inspired by the inverse law Definition 3.3. u; l ∈ R A λ ; λ ; : : : ; λn of repulsion in the physical model discussed in the last section. 1 2 Foru A uIand− A −a symmetric matrix with eigenvalues: , u − λi define: def 1 i X 1 Φ ( ) = Tr( )− = (Upper potential) l A A − lI : λi − l def 1 i X 1 AΦ≺( uI) = Tr( A lI) = λ A < u(Lowerλ potential)A > l A u l max min As long as and (i.e., ( ) and ( ) ), these potentialuI − A functionsA − lI measure how far the eigenvalues of are from the barriers and . In u particular, they blow up as any eigenvalue approaches a barrier,A since≤ then (or λi ) approaches a singularu matrix.λi Their strength lies in thatk they reflect thek locations of all the eigenvalues simultaneously: for instance, Φ ( ) 1 implies that no is within distance one of , no 2 ’s are at distance 2, no are at distance ,

18 u A A u l A and so on. In terms of the physical model, the upper potential Φ ( ) is equal to the T total repulsion of the eigenvalues of from the upperi si barriervivi , while Φ ( ) is the analogous quantity for the lower barrier. P To prove the theorem, we will build the sum iteratively, adding one A ;A ;:::;A Q vector at a time. Specifically, we will construct a sequence of matrices (0) (1) ( ) u ; l ; δ ; δ ;   0 = U L U L 4 0 0 along with positive constants and which satisfy the following con- u u l l ditions: u A  0 0 A  : (a) Initially, the barriers are at = U and = l and the potentialsL are 0 (0) 0 (0) Φ ( ) = and Φ ( ) = vi (b) Each matrix is obtained by a rank-one update of the previous one  specifically A q A q tvvT v ∈ {v } t ≥ by adding a positive multiple of an outer product ofi some . ( +1) ( )

= + u forl someδU δL and 0. q ; ;:::Q (c) If we increment the barriers and by and respectively at each step, u δU A q ≤ u A q ≤  u u qδ then the upper and lower potentials do notU increase. For everyU = 0 1 , + ( +1) ( ) A q ≤ A q ≤  l l 0 qδ Φ l δ(L ) Φ l( ) L for = + L . ( +1) ( ) + q0 ; ;:::Q Φ ( ) Φ ( ) for = + . λ A q < u qδ λ A q > l qδ : (d) No eigenvalue ever jumps across aU barrier. For every = 0 1 L , ( ) ( ) max 0 min 0 ( ) + u ; l and; δU; δL; U( ) L + Q dn A Q 0 0 To complete the proof we will choose( ) and√ so that after = λ A Q u dnδ d d steps, the condition number of is boundedU by √ Q ≤ : λ A( ) l dnδL d − d max 0 ( ( ) ) + + 1 + 2 A Q min 0 = dn ( ) + + 1 2 ( ) T By construction, is a weighted sum of at most vvof the vectors, as desired. The main technical challengeboth is to show that conditions (b) and (c) can be satisfied simultaneously  i.e., that there is always a choice of to add to the current matrix which allows us to shift barriers up by a constant without increasing either U L u n potential. We achieve this in the following three lemmas. l 4 −n δU δL / d / d − 0 On first reading, we suggest the reader follow the proof with the assignment = = 1, = , 0 = , = 2, = 1 3. This will provide the bound (6 + 1) ( 1), and eliminates the need to use Claim 3.7.

19 u u δU A u A The first lemmaλi concerns shiftingu the upper barrier. If we shift forward to + T without changing the matrix ,tvv then the upper potential Φλi ( ) decreasesl since the eigenvalues do not move and moves away from them. This gives us room to T add some multiple of a vector , whichvv will move the towards and increase the potential, counteracting the initial decrease due to shifting. The following lemma quantifies exactly how much of a given we can add without increasing the potential Lemma 3.4 . Suppose λ A < u, and v is any vector. If beyond its original value before shifting. vT u δ I − A − v max (Upper Barrier Shift)U T ( ) − ≥ u u δ v u δU I − A v UA v t A − U A 2 1 def 1 (( + ) ) then + + (( + ) ) = ( ) u δU Φ ( ) TΦ (u ) T A tvv ≤ A and λ A tvv < u δU: + T That is, if we add t times vv to A and shift themax upper barrier by δU, then we do not increase the upperΦ potential.( + ) Φ ( ) ( + ) +

T UA v vv 0 Proof. u u δU We remark that ( ) is linear in the outer product . Let = + . By the Sherman-Morrison formula, we can write the updated u δU A tvvT u0I − A − tvvT − potential as: + 0 − T 10 − 0 − t u I − A vv u I − A u I − A T 0 − Φ ( + ) = Tr( − tv 1u I −) A v 1  1  t (vT u0I −) A − (u0I − A )− v = Tr (0 −) + 1 u I − A T 0 − 1− tv (u I1− A ) v 1 1 Tr( ( XY) ( YX ) ) = Tr( ) + 1 tvT u0I 1− A − v( ) u δU A T 0 − since Tr is linear− tv andu I −Tr(A2 v) = Tr( ) + T 0 − ( ) 1 v u I − A v = Φu ( ) +u u δU A − A − A T 0 − 1 ( ) /t − v u I − A2 v + ( ) 1 v > v=T Φu0I(−) A −(Φv ( ) Φ ( )) + /t ≥ v UA 1 ( U) A u δU T u /t ≥ UA v 1 A tvv ≤ A T λ AsA tvv( ) < u( δU ) , the last+ term is finite for 1 ( ). By now sub- 0 0 T 0 u δU 0 T stitutingt ≤ t any 1 λ A ( )twevv find uΦ δU( + ) Φ ( t). This alsoA tellst vv us that max ( + ) + , as if this were not the case, then there would+ be some positive max for which ( + ) = + . But, at such a , Φ ( + l ) would blow up, and we have just established that it is finite. l δL A l AThe second lemma isl about shifting the lower barrier. Here,λi shifting forward to T + whiletvv keeping fixed hasλi the opposite effect  it increases the lower potential Φ ( ) since the barrier moves towards the eigenvalues . Adding a multiple of a vector will move the forward and away from the barrier, decreasing the

20 vvT l potential. Here, we quantify exactly how much of a given we need to add to compensate for the initial increase from shifting , and return the potential to its Lemma 3.5 . Suppose λ A > l, A ≤ /δ , and v is any original value before the shift. l L vector. If min (Lower BarrierT Shift) − ( ) Φ ( ) 1 v A − l δL I v T − < ≤ − v A − l δL I v LA v t l δL A − l A 2 1 def 1 ( ( + ) ) then 0 + ( ( + ) ) = ( ) Φ T( ) Φ ( ) T l δL A tvv ≤ l A and λ A tvv > l δL: T That is, if we add+ t times vv to A and shift themin lower barrier by δL, then we do not increase the lowerΦ potential.( + ) Φ ( ) ( + ) +

Proof. λ A > l l A ≤ /δL λ A > l δL T t > λ A tvv > l δL min 0 min First, observe that ( ) and Φ ( ) 1 implyl thatl δL ( ) + . min So, for every 0, ( + ) + . Now proceed as in the proof for the upper potential. Let = + . By Sherman- A tvvT A tvvT − l0I − Morrison, we have:l δL t A − l0I − vvT A1− l0I − + 0 − A − l I − T 0 − Φ ( + ) = Tr( + tv 1 A −) l v 1  1  t (vT A −)l0I − (A − l0I )− v = Tr ( 0 −) 1 A − l I − T 0 − 1 +tv A( −1 l I) v 1 1 tvTTr(A −(l0I − v ) ( ) ) = Tr( ) 1 l δL A − T 0 − tv A1− +l I2 ( v ) + ( ) vT A − l0I − v = Φ ( ) 1 l A l δL A − l A − T 0 − 1 + ( ) /t v A − l 2I v

+ ( ) 1 = Φ ( ) + (ΦA ( tvv) T Φ≤( )) A /t ≤ v l δL l 1 + ( LA ) T + tvv Rearranging shows that Φ ( + ) Φ ( ) when 1 ( ). The third lemma identifies the conditions under which we can find a single which allows us to maintain both potentials while shifting barriers, and thereby con- tinue the process. The proof that suchvavg a vector exists is by an averaging argument,t so this can be seen as thesi step in which we relate the behavior of actual vectors to the behavior of the expected vector . Notice that the use of variable weights , Lemma 3.6 . If λ A < u, λ A > l, u A ≤  , A ≤  , and from which the eventual arise, is crucial to this part of the proof.U l L U; L; δU and δL satisfy max min (Both Barriers) ≤ ( ) U ≤ ( −) L Φ ( ) Φ ( ) δU δL 1 1 0 + (3.2)

21 then there exists an i and positive t for which

T LA vi ≥ /t ≥ UA vi ; λ A tvivi < u δU;

and λ A maxtv vT > l δ : ( ) 1 ( ) i( i + ) L + Proof. min ( + ) + LA vi ≥ UA vi ; i i We will show that X X ( ) ( ) vT u δ I − A − v from which the claim willi followi byU Lemmas 3.4i and 3.5.T We begin by bounding− UA vi u u δ vi u δU I − A vi A − U A 2 i i X P X 1 −(( + )+T ) u ( δ)U =I − A • i vivi + (( −+ ) ) T u Φ u( δ) Φ ( ) u δU I − A • vivi A − 2 U A i ! P 1 X (( + ) ) +− ( ) = u δU I − A + (( + −) ) u Φ ( )u δΦ ( ) u δU I − A A − U A 2 1 Tr(( + ) T ) = v+ivi I + Tr((X •+I ) X ) Φ ( ) i Φ ( ) X u δ − λ − since i =U andi = Tr( ) u δU − − A i u − λi − i u δU2 − λi P + − u( 1 +δU − λi ) 1 = P i P +u ΦδU ( ) − − A δU ( i u −) λi u ( δ+U 2− λi ) P + u δ ≤ /δ ( U+A1 ; ) 1 = PU + Φ ( ) ( + )− ( + )− − u − λi u δU − λi ≥ u δU − λi 1 + Φ ( ) i 1 1 i 2 X u X ≤as/δU ( A) ≤( /δ+U U: ) ( + )

1 + Φ ( ) 1 +

22 T − i vi A − l δL vi T − On the otherLA hand,vi we have − vi A − l δL I vi A − A 2 i l δL l i X P X 1 − (( ( +T )) A −( l) = δL I +• i vivi ( − ( + ) ) T Φ ( ) Φ ( −) A − l δL I • vivi l δL A −2 l A P 1 i ! ( ( + ) ) − ( ) X = A − +l δL I ( ( −+ ) ) Φ ( ) Φ (−) A − l δL I l δL A − l A 2 1 Tr( ( + )T ) = + vivi I Tr( X • I( + )X) Φ ( )i Φ ( ) X − since i λi −=l − δandL = Tr( ) − − − − λi − l − δL : λ − l − δ − 2λ − l i i L i i i P X 1 ( 1− ) 1 ≥= P/δL − λi − l P /δL − L; ( ) ( i ) ( ) X 1 1 ( ) = 1

by Claim 3.7. Putting these together,v we≤ find that ≤ −  ≤ v ; UA i δ U δ L LA i i U L i X 1 1 X ( ) + ( )

Claim 3.7. If λ > l for all i, ≤ λ − l − ≤  , and /δ −  ≥ , then as desired. i i i L L L 1 λ −Pl − δ − i 0i ( L ) 1 0 − − − λ − l − δ − 2λ − l λ − l − δ i i P L i i i i L ( ) X 1 P 1 P 1 ( ) ≥ −( ) : δ λ − l L i i 1 X 1 Proof. (3.3) δL ≤ /L ≤ λi − l; Wei have 1 for every . So, the denominator of the left-most term on the left-hand side is positive,

23 and the claimed− inequality is equivalent to λi − l − δL ≥ − − λi − l − δL λi − l δL λi − l − δL λi − l i 2 i i ! i i ! X X 1 X 1 1 X 1 X 1 ( ) + δ δ L λ − l − δ λ − l δ L λ − l − δ λ − l i i L i ! L i i L i ! X 1 1 X 1 = + ( )( ) δ ( )( ; ) λ − l − δ λ − l L λ − l − δ λ − l 2 i i L i i i L i ! X 1 X 1 = + ( )( ) ( )( ) which, by moving the first term on the RHS to the LHS, is just δ ≥ δ : L λ − l − δ λ − l L λ − l − δ λ − l 2 i i L i i i L i ! X X 1 2 1 ( ) ( ) ( )( )

By Cauchy-Schwartz, δ ≤ δ δ L λ − l − δ λ − l 2 L λ − l L λ − l − δ λ − l i i L i ! i i ! i i L i ! X X X 1 1 1 2 ( )( ) ≤ δLL δL ( ) ( ) λ − l − δ λ − l i i L i ! X − 1 2 ( ) λi − l ≤ L ( ) ( ) X 1 ≤ sinceδL ( ) ; λ − l − δ λ − l i i L i ! X 1 2 1 − L ≥ ; δL ( ) ( ) 1 since 0

Proof of Theorem 3.2. U; L; δU δL and so (3.3) is established. q q A All weA need to doA now is set , and viin a manner that satisfies(0) Lemma 3.6 and gives( +1) a good bound( ) on the condition number. Then, we can q v ≥ q v take = 0 and construct LfromA i byUA choosingi any vector with ( ) ( ) A q A q tv vT ( ) ( ) i i t ≥ ( +1) ( )

(such a vector is guaranteed to existq v by≥ Lemma≥ 3.6)q v and: setting = + LA i t UA i for any 0 satisfying: ( ) ( ) 1 ( ) ( )

24 It is sufficient toδL take L √ l −n/L d √ √ d 1d − 0 δU = 1√ U = √ u = n/U: d − d d + 1 1 0 = = = √ 1 √ + d − d − We can check that:U √ √ √ − √ − L δU d d d d δL 1 1 1 1 1 + = + = 1 = n + 1 ( + 1) U U n L dn L so that (3.2) is satisfied. dn The initial potentials areλΦ (0)A = andn/UΦ (0)dnδ =U . After steps, we have dn ≤ λ A( ) −n/L dnδL max √ √ d d d ( ( ) ) √ +d √ min d− d− ( ) +√ + d − d +1 1 √ 1 d +d = √ ; d − d + 2 + 1 = 2 + 1 The Algorithm. as desired. vi O n m u δ I − A − u δ I − A − ToU turn this proof into2U  an algorithm, one must first compute the vectors , which can be done in time1 . For each2 iteration of theO algorithm,n we must UA vi3 LA vi compute (( + ) ) , (( + ) ) , and the same matrices  for the lower potentialvi function. This computationO cann m be performed in timedn . Finally, we O dn m can decide which vector to add in each iteration2  by computing ( ) and ( ) for each , which can be done in3 time . As we run for iterations, the total 3.1.3time of the Optimality algorithm is .

It can be shown that the approximation guarantee of Theorem 3.2 is optimal up to a constant factor of four. As the matrices which demonstrate this are the Laplacians of complete graphs, we present them in Section 4.1.2 along with the results on graph 3.2sparsification. Weak Sparsification by Random Sampling

The results of the previous section more or less settle the question of spectral spar- sification as we have formulated it. However, the method presented is not compu-

25 tationally practical for large matrices as it requires computing inverses, which is a slow operation. In this section we briefly discuss an alternate approach which was discoveredn by Rudelson and Vershynin in [40]. The approach, which is based on ran- dom sampling and can be computationally faster and more robust, but loses a factor of log in sparsity. Y |Y | ≤ M Theq starting point is to recallY ;:::;Y the Chernoffq bound from probabilityµ theoryq i≤ (see,q Yi for instance, [34]). Suppose is a boundedµ randomEY variable, ,1 and we 1 P take independent copies of it, . Then the empirical mean ˜ := concentrates very strongly about the mean := : −q P Yi − EY >  ≤ : q M 2 i≤q ! X   1 2 exp q M / 4µ  µ − 2 −2 Thus,µ if we take = 8 samples, we can expect ˜ to be withinµ of withq probability at least 1 exp( 2). This can be interpreted as a kind of sparsification, since may be a sum of many (even infinitely many) terms whereas ˜ has only terms. Y n × n It was discovered byO Rudelsonn [39] and later by Ahlswedeq and Winter [3] that the same phenomenon continues to hold if is a random matrix, but with a (necessary) blowup of (log ) in the numberEY I of samples that is required to attain concentration. Here is the slightly stronger version described in lecture notes of Theorem 3.8. Suppose Y is a random n×n matrix satisfying Y I and kY k ≤ M. Vershynin [50]; note the normalization = . E n If Y ;:::;Yq are independent samples of Y , then 2 = 1 −q P Yi − In >  ≤ n · :  q  M2 i≤q   1 X   exp q M n/ 2 Y 4

I 2 Thus, taking = 8 log samples of should suffice to obtain a good ap- proximation to . This suggests a strategy for sparsification which we formalize in the Theorem 3.9. Suppose v ; v ;:::; v are vectors in n with following theorem, analogous to Theoremm 3.2. R 1 2 T vivi I i≤m X and <  < is a constant. Let Y be the random= rank one matrix given by

n T 0 1 Y vivi with probability pi ∝ kvik : kvik 2 2 = 2

26 If we take q n n/ independent samples Y ;:::;Yq then 2 1 = 8 log −  I  Y   I q i i≤q 1 X with probability at least / (1. ) (1 + )

Proof. kv k I n p kv k /n i i 1 2 i i P 2 2 kvik n T Note that =E Tr(Y ) = , so that v=ivi I . Thus n 2 kv k i i X 2 M= n =

and invoking Theorem 3.8 with = gives the desired conclusion. pi An important feature ofpi this∝ kv samplingik procedure is that it is quite robust with regards to the sampling probabilities2 ; in particular, any constant-factor approxi- Corollary 3.10. Suppose v vT I, {p } are given by p ∝ kv k , and q are mation to the distribution i≤m i i also works.i i≤m i i i numbers satisfying qi ≥ pi/α and qi ≤ α for some factor α ≥ . Then taking P i≤m 2 q α n n/ independent samples Y ;:::;Yq from the distribution P= 2 2 1 q 1 = 8 log Y i i v vT with probability q : q i i i P i yields a sparsifier in that=

−  I  Y   I q i i≤q 1 X with probability at least / (1. ) (1 + )

Proof. Y 1 2 q α p Following the proof ofi Theoremi v vT ≤ 3.9, thei normi v vT of≤ αisn now bounded by q i i p /α i i P i Pi 2 M α n n ( ) 2 so we need to take = instead of just . kvik The flexibility of using approximate2 probabilities for sampling is important in applications, where the exact may be difficult to compute. This is, in fact, the case in the application to graph sparsification which we will see at the end of the next chapter.

27 3.3 Other Notions of Matrix Approximation

There isk aA large− Ak body≤  of workA on sparse [6, 2] and low-rankA [23, 2, 40, 21, 22] approx- imations for general matrices.A The algorithms in this literature provide guarantees ofx 2 the form ˜ , where is the originalA matrix and ˜ is obtained by entrywise or columnwise sampling of . This is analogous to satisfying (3.1) only for vectors in the span of the dominant eigenvectors of , and corresponds to an ‘additive’ rather than a multiplicative approximation. The above constructions have the advantage of being simple and quickly com- putable, but are not as useful in many contexts where much of the interesting infor- mation lies in the smaller eigenspaces. For instance, if we were to use these sparsi- fiers on Laplacian matrices of graphs (discussed in the next chapter), they would only preserve the large cuts whereas our spectral sparsifiers would preserve all the cuts.

28 Chapter 4

Sparsification of Graphs

In this chapter we will apply the spectral sparsification theorems of Chapter 3 to the class of matrices corresponding to Laplacians of undirected weighted graphs; more e − e e − e T i; j ∈ n ; i 6 j concretely, this is exactly thei conej ofi matricesj generated by  ( )( ) : [ ] = as discussed in Example 1.1. Let us begin by motivating the problem of graph sparsifi- cation and examining why theG spectralV ; E; notion w of approximationH for Laplacian matricesG corresponds to a combinatorially interesting notion of approximation for graphs. A sparsifier of a graph = ( ) is a sparse graph that is similar to inH some useful way. Many notions ofG similarity have been considered. For example, Chew’s [18] spanners have the property that the distance between every pair of vertices in is approximatelyG H the same as in . Benczur and Karger’s [9] cut-sparsifiers have the property that the weight of the boundary of every setH of verticesκ is approximatelyG V the samex in∈ Ras in . The spectral notion of similarity which we consider was first T T T introduced by Spielman andx TengLGx [42,≤ x 45]:LHx we≤ κ say· x thatLGx; is a -approximation of if for allLG LH, G H κ (4.1) where and are the Laplacian matrices of and . This matches exactly our definition from Chapter 1 of -approximation for PSD matrices. T Recall from Chapter 2 thatx LGx wu;v xu − xv ; u;v ∈E X 2 = ( ) ( ) wu;v u; v G x where is the weight of edge ( ) in . By considering vectors that1 are the characteristic vectors ofG sets, one can seen thatH condition (4.1)n is strictly strongerH than the cut condition of Benczur and Karger. It is worth mentioning that although 1 G x ; ; : : : ; n H To see strictness,G take to be a cycle on [ ] and to be a path on [ ]. Then every cut in is within half of its weight in , but the vector = (1 2 ) witnesses the fact that is not a spectral sparsifier for .

29 / / LH LG LH the spectral condition is stronger,+ 1 it2 can+ be1 2 verified easily in polynomial time by computingTheoretical the Motivation. eigenvalues of ( ) ( ) , whereas this is not clear for the cut condition as there are exponentially many cuts. Spectral graph sparsifiers preserve many interesting prop- xT Lx erties of graphs, in addition to the weights of cuts. The Courant-Fischer Theorem λi T : tells us that S S k x∈S x x λ ; : : : ; λn =:dim( maxL)=G min λ ;:::; λn LH

1 1 Thus, if are the eigenvaluesλi ≤ ofλi ≤andκλi;˜ ˜ are the eigenvalues of , then we have ˜ and the eigenspaces spanned by corresponding eigenvalues are related. As the eigen- xT D− / LD− / x values of the normalized Laplacian are given by λi T ; S S k x∈S 1x2 x 1 2

=:dim( max)= min D− L

1 and are the same as the eigenvalues of the walk matrix , we obtain the same relationship between the eigenvalues of the walk matrix of the original graph and its sparsifier. Many properties of graphs and random walks are known to be revealed by their spectra (see for example any text on Spectral Graph Theory [13, 19, 25]). While the existence of sparse subgraphs which retain these properties is interesting its own right, we also expect it to be useful in theoretical applications. We remark that the study of sparse approximations of the complete graph, otherwise known as expanders, has been hugely successful over the past few decades, with significant impact in such diversePractical fields Motivation. as robust network design, coding theory, and derandomization. We will discussH expanders in moreG detail in Section 4.1. H G There is also a practical reason for studying graph sparsifica-H tion: If is close to in some appropriate metric, thenH can be used as a proxy for in computations without introducing too much error. At the same time, since has very few edges, computation with and storage of should be cheaper. Thus a fast algorithm for graph sparsification can be used as a preprocessing step to speed up computations on graphs. This is an especially useful primitive to have given the increasingly large graphs we encounter in practice, for instance from internet and scientific data. Here are three outstanding examples of how different kinds of fast graph sparsification have been used to speed up computations. multiplicative spanners G k /k 1. Thorup andH Zwick⊂ G [46] gaveO n a fast random sampling algorithm for constructing ; specifically,1+1 on input and their algorithm produces u; v ≤ u; v ≤ k · u; v ∀u; v ∈ G; a subgraph G with ( H ) edges for whichG

dist ( ) dist ( ) dist ( )

where dist is the shortest path distance. They used these spanners to construct

30 fast and compact oracles for pairwise distance queries between vertices in a G m graph. c O m  H O n n 2. Spielman and Teng [45]c gave a algorithm which on input with edges pro- duces in ˜ ( ) time a spectral (1+ )-approximation with ( log ) edges for some large constant . They used these sparsifiers to construct preconditioners for symmetric diagonally-dominant matrices, which led to the first nearly-linear G time solvers for such systems of equations.  > H O n n/ 3. Benczur and Karger gave a nearly-linear timeH procedure which takes a graph±  2 and a parameterG 0, and outputs a weighted subgraph withO mn( log ) edges such that the weightO n of every cut in is withinst a factor of (1 ) of / its weight in . This was2 used to turnO Goldbergn m and Tarjan’sO ne( ) max-flow algorithm [26] into an e( ) algorithm for3 approximate2 -mincut,2 and appeared more recently as the first step of an e( + )-time (log ) approximation Two Sparsification Algorithms. algorithm for sparsest cut [32].

The strong and weak sparsification theorems of Chap- ter 3 yield graph sparsification algorithms with very different characteristics. Roughly speaking, the first one is mainly of theoretical interest while the second is more likely • G  > to be useful in practice. H dn/ e 2 Theorem 3.12 implies that every has, for every 0, a weighted subgraph −  L  L   L : with edges for which G H G 2 2 H (1 ) (1 + ) O mn / The procedure given in the3 proof2 of that theorem shows that can be computed in deterministic time ( ), which is prohibitively slow for large graphs. The significance of these sparsifiers lies in their small size  they have constant average degree for any constant , and can thus be seen as generalizations of expander graphs, which are constant degree approximations of the complete graph. In Section 4.1 we dwell on this analogy, and prove that our bounds • G  > H cannot be improved by more than a factor of 4. O n n/  Theorem 3.9 shows2 that every has, for every 0, a weightedG subgraph with ( log ) edges which is a spectralpe (1 + )-approximation. This con-ef- structionfective resistances is randomized and involves sampling the edges of independently with certain appropriate probabilities , which turn out to be exactly the of the edges. In Section 4.2, we show how to compute the effective resistances in nearly linear time, leading to an algorithm which is both fast and conceptuallyH simple, although itG is not optimal with respect to the sparsity/approximation ratio. 2 i.e., the set of edges of is a subset of the edges of , but the weights are different

31 4.1 Twice-Ramanujan Sparsifiers

G Ramanujan Graphs d√ H√ In the case where is the complete graph,d − excellentd − spectrald sparsifiersd − are supplied by √ [33, 35].n These are -regular graphs all of whose non- n/zerod − Laplaciand − eigenvalues lie between κ 2 1 and + 2 1. Thus, if we take a on vertices and√ multiply the weight of every edge by d d − ( 2 1), we obtain a graphκ that -approximates√ : the complete graph, for d − d − + 2 1 = 2 1

In this section,d we observe3 that Theoremdn/ 3.1 implies that every graph can be approxi- mated at least this well by a graph with only twice as manyd edges/ as the Ramanujan graph (as a -regular graph has 2 edges). The following is2 essentially a direct restatement of Theorem 3.1, but with the substitution = 1 to emphasize the Theorem 4.1. For every d > , every undirected weighted graph G V ; E; w on average degree of the graphs produced. n vertices contains a weighted subgraph H V;F; w with dd n − e edges (i.e., average degree at most d) that1 satisfies: = ( ) √= ( ˜) ( 1) T T2 d d T V x LGx ≤ x LHx ≤ √ · x LGx ∀x ∈ R : d − d! + 1 + 2 H G + 1 2 H G G We remark that while the edges of areH a subset of the edges of , the weights of edges in and will typically be different. In fact, there exist unweighted graphs for which every good spectral sparsifier must contain edges of widelyd− varying weights [45]. The significance of Ramanujan graphs lies in their exact optimality: no regular graph can exhibit sharper concentration of eigenvalues (or larger spectral gap) [36]. All known constructions of Ramanujan graphs are number-theoretic and rely on heavy tools from algebra to prove their correctness. Thered have been many attempts (e.g. [37, 12]) at elementary constructions of such graphs, but none have come within a constant factor of the optimal dependence on . In the remainder of this section, we argue that Theorem 4.1 should be seen as a step towards this goal. First we discuss why the weighted graphs produced by our construction should be considered expanders at all. Next, we extend the Alon-Boppana bound [36] for the extremality √ √ κ ofd Ramanujan√d− graphsd to≥ include weighted graphs. d d− 3 d− +2 Strictly1 speaking, our approximation1+ 5 constant is only better than the Ramanujan bound = 2 1 in the regime 2 . This includes the actual Ramanujan graphs, for which is an integer greater than 2.

32 4.1.1 Expanders: Sparsifiers of the Complete Graph G H

In the case that is a complete graph, our construction produces graphs that are expanders. However, these expanders are slightly unusual in that their edges have weights, they may be irregular, and the weighted degrees of vertices can vary slightly. This may lead one to ask whether they should reallyH be considered expanders. In this section, we argue thatH they should be, as they may easily be shown to have theH properties that define expanders. In particular, has high edge-conductance, random walks mix rapidly on and converge to an almost-uniform distribution, and satisfies the Expander Mixing Property (see [5] or [30, Lemma 2.5]). High edge- conductance and rapid mixing would not be so interesting if the weighted degrees were not nearly uniform  for example, the star graph has both of these properties, but the random walk on the star graphH converges to a very non-uniform distribution, and the star does not satisfy the Expander Mixing Property. For the convenience of Lemma 4.2. Let LH V ; E; w be a graph that  -approximates LG, the complete thegraph reader, on V . we Then, include for every a proof pair that of disjointhas the sets ExpanderS and T Mixing, Property below. = ( )  (1+ ) w S;T − |S| |T | ≤ n / |S| |T |;   p where w S;T denotes ( the) sum1 of + the weights of edges( 2) between S and T . 2 Proof. ( )    − LG  LH − LG  LG; We have   1 + 2 LH L2G M; 2 so we can write M  / kL k ≤ n/ x = 1 + G + S y 2 T where is a matrix of norm at most ( 2) 2. Let be the characteristic −w S;T xT L y: vector of , and let be the characteristic vectorH of . We have G S T ( ) = xT L y − |S| |T | : As is the complete graph and andG are disjoint, we also know

=

T  T T Thus, x LHy x LGy x My   − |S| |T | xT My: = 1 + +  2  = 1 + + 2 T The lemma now followsx byMy observing≤ kMk kx thatk kyk ≤ n / |S| |T |: p ( 2)

33 4.1.2 A Lower Bound

κ As the graphs we produce are irregular√ and weighted, it is also not immediately clear d d − that we should be comparing √with the Ramanujan√ O bound/d : of d − d − d + 2 1 4 d = 1 + + (1 )κ (4.2) 2 1 κ4 Itaverage is known thatd no -regularκ graph of uniform weight can -approximateκ a complete graph for asymptotically better than (4.2) [36]. While we believe that no graph of degree can be a -approximation of a complete graph for asymptotically better than (4.2), we are unable to show this at the moment and prove instead the Proposition 4.3. Let G be the complete graph on vertex set V , and let H V ; E; w following weaker claim. be a weighted graph with n vertices and a vertex of degree d. If H κ-approximates G, then √ = ( ) d κ ≥ √ − O : d n ! 2 Proof. 1 + H κ x∗ y∗ 1 We use a standard approach. Suppose is a -approximation of the complete y∗T L y∗ kx∗k graph. We will construct vectors andH orthogonal to the vector so that ∗T ∗ ∗ x LHx ky k2 κ2 v d v ; : : : ; vd vi is large, and thisv will give us a lower boundwi on . 0 1 vi Let be the vertex of degreev ; v ; : : :, ; and vd letδi its neighbors be . Supposex is 0 connectedy to by an edge of weight , and the total weight of the edges between 0 1 and vertices other than is . We begin by considering vectors and u v ; with √ x u / d u vi i ≥ ;  0 1 for u =6∈ {v ; : : : ; vd} ( ) = 1 for = , 1  d 0 0 for√ 4 d − d − √ While lower bounds [36] on the spectral gap of -regular graphsd focus ond showing− that the second- smallest eigenvalue is asymptotically at most 2 1, the same proofs by test functions can be used to show that the largest eigenvalue is at asymptotically least + 2 1.

34 u v ; √ y u − / d u vi i ≥ ;  0 1 for u =6∈ {v ; : : : ; vd} ( ) = 1 for = , 1  1 0 0 for x y These vectors are not orthogonal to , but we will take care of that later. It is easy d √ d √ to compute the valuesT taken by the quadratic form at and : x LHx wi − / d δi / d − i i X 2 X 2 d d d √ = =1 (1 1 ) + =1 (1 0) wi δi wi /d − wi/ d i i i X X X = =1 + =1 ( + ) 2 =1

d √ d √ T and y LHy wi / d δi − / d − i i X 2 X 2 d d d √ = =1 (1 + 1 ) + =1 ( 1 0) wi δi wi /d wi/ d: i i i X X X = =1 + =1 ( + ) + 2 =1 √ yT L y w δ w /d w / d The ratio in question isH thus i i i i i i i √ T x LHx w δ w /d − w / d Pi i Pi i i Pi i w √ + ( i +i ) + 2 = P d Pwi δi wi /d P i Pi + 2( +w ) : 2 − √1 P P i i d wi+ (δi+wi)/d 1 + i Pi 2 = 1 P P H κ 1 + ( + ) n nκ

Since is a -approximation,i allwi weighted degrees must lie between and , which δ w /d ≥ : wi δi wi /d i i i κ gives i i w P P i i 2 2( + ) 2 P P = P + ( +T ) 1 +√ 1 + y LHy d κ T ≥ : Therefore, x LHx − √1 2 d 1+κ ∗ ∗ 1 + 1 2 x y x y 1+ 1 (4.3) 1 √ √ d Let and be thekx projections∗k kxk of− hx;and1/ nirespectively− orthogonal to the vector. n 2 Then 2 2 2 (1 + ) = = 2

35 √ √ − d ky∗k kyk − hy; 1/ ni − n 2 and 2 2 2 n → ∞ √ (1 ) = kx∗k =d 2 ∗ − O : so that as ky k2 n ! 2 = 1 (4.4) √ ∗T ∗ ∗ √ Combining (4.3) andy (4.4),LHy wekx concludek thatd asymptotically:κ d ∗T ∗ ∗ ≥ − O x LHx ky k2 − √1 2 : n d κ !! 1 + 1+ 2 1 2 1 1 κ1+ √ √ d But by our assumption theκ ≥ LHS is atd mostκ ,− soO we have − √1 2 : n d κ !! 1 + 1+ 1 2 1 1 1+ √ d which on rearranging gives κ ≥ √ − O d n ! 2 1 + as4.2 desired. Sparsification by Effective Resistances

O n n/

2 In this section, we show how to construct spectral sparsifiersO m with ( log ) edges in nearly-linear time, thus improving on both [10] and [43]. Our sparsifiers are sub- graphs of the original graph and can be computed in e( ) time by random sampling, T where the samplingG probabilities are given by the effective resistancesLG ofe∈ theE we edgesbebe be H in the graph. P O nSupposen/ is a connected weighted graph with Laplacian = , 2 where the are incidence vectors of edges./ The existence of a sparsifier with ve we L be ( log ) edges follows immediately by applying1 Theorem 3.9 to the vectors 1 2 + 2 = ( )

for which it is easilyT checked that T veve L webebe L L L L I : 1 1 1 1 e + 2 e ! + 2 + 2 + 2 X X im(L) = ( ) ( ) = ( ) ( ) = ve

In particular, Theorem 3.9 tells us to sample the vectors , which correspond to edges

36 G pe ∝ kvek effective resistances2 Re G of with,we probabilities . In Section 4.2.1, we observe that these lengths correspond to the of edges in (multiplied by their edge weights ), establishing the following conceptually simple and satisfying description ofH theReffSparsify sparsificationG algorithm. q n n/ e G = pe ( 2 ) weRe e H we/qpe q Let = 8 log as in Theorem 3.9. Choose a random edge of with proba- bility proportional to , and add to with weight . Take samples

independently withfast replacement, summing weights if an edge is chosen moreRe than once. To turn this into a algorithm, we must compute the effective resistances effi- ciently. In Section 4.2.2, we show how to compute approximate effective resistances in nearly-linear time, which is essentially optimal. The tools we useRuv to do this are Spielman and Teng’s nearly-linear timeu solverv [43, 44] and the Johnson-Lindenstrauss LemmaTheorem [31, 4.4. 1].There Specifically, is an O wem prover/ the followingtime algorithm theorem, which in which on input denotes > and the effective resistance between vertices and . G V ; E; w with r wmax/wmin computes2 a n/ × n matrix Z such that with probability at least −e(/nlog ) 2 0 = ( ) = (24 log ) e −1  R1uv ≤ kZ χu − χv k ≤  Ruv 2 for every pair of vertices u; v ∈ V . (1 ) e( ) (1 + )

Z χu − χv Z u; v SinceOe( n/ ) is simply the difference of theO correspondingm n/ two columns of e, we can query theO m2 approximater/ effective resistance between any2 pair of vertices ( ) in time (log ), and for2 all the edges in time ( log ). By Corollary 3.10, 4.2.1this yields Electrical an e( log Flows) time for sparsifying graphs, as advertised. G V ; E; w n e we Let/we us begin by identifying = ( ) with an electrical network on nodes in which each edge corresponds to a link of conductancei u (i.e., a resistor of resistance 1 ), orientedi e arbitrarily as as in Section 2.2. We will use the same notation as [29] ext to describev u electrical flows on graphs: for a vector ( ) of currents injected at the vertices, let ( ) be the currents induced in the edges (in the direction of orientation) and ( ) the potentials induced at the vertices. By Kirchoff’s current law, the sum of BT i i : the currents entering a vertex is equal to the amount injected at the vertex:

ext =

By Ohm’s law, the current flow in ani edgeWB isv equal: to the potential difference across its ends times its conductance: =

37 i BT WBv Lv: Combining these two facts, we obtain i ⊥ 1 L ext = ( ) =

ext If span( ) = ker( )  i.e., if the total amount of current injected is equal to the v L i total amount extracted  then we can write + L ext = effective+ resistance u v by the definition of in Section 2.1.2. Recall that the between two vertices and is defined as the potential differenceL induced between them when a unit current is injected at one and extractede atu; the v other.+ Wei willb derivee χv an− χ algebraicu expression for the effective1 resistance in terms of .i To inject and extract a unit currentv L acrossbe the endpoints ext of an edge = ( ), we sete u;= v = ( ), which isb clearlye + orthogonal to . ext The potentials induced by at the vertices are given by = ; to measure the v v − v u χ − χ T v bT L b : potential difference across = ( ), wev simplyu multiplye bye on the left: + T e Re be L be k L bek ( ) ( ) = ( ) = 1 + + 2 2

It follows that the effective resistance across/ is given by = = ( ) . kvek kwe L bek weRe; Thus we conclude that 1 2 1 2 + 2 2 = ( ) =

4.2.2as desired. Computing Approximate Resistances Quickly

{Re}

It isR note clear howO m to computer all the effective resistances exactly andO efficiently.n × n In this section,Z we show that one can compute constant factor approximations to all the in time e( log ). In fact, we do something stronger:O wen build a (log ) e Proofmatrix of Theoremfrom which 4.4. theu effectivev resistance betweenG any two vertices (including vertices not connected by an edge) can be computed inu (logv ) time. If and are vertices in , then following the discussion in R χ − χ T L χ − χ the previous section,uv the effectiveu v resistanceu betweenv and can be written as: T χu − χv L+LL χu − χv = ( ) T ( T / ) / χu − χv L+ B+W W BL χu − χv = ( / ) ( ) kW BL χ+u − χv 1k2: 1 2 + = (( 1 2 +) 2)( ( )) 2 {W / BL χ } = ( ) v v∈V 1 2 + Thus effective resistances are just pairwise distances between vectors in . By the Johnson-Lindenstrauss Lemma, these distances are preserved if we project the

38 O n

vectors onto a subspace spanned by (log ) randomd vectors. For concreteness, we Lemma√ 4.5. Given fixed vectors v : : : vn ∈ R and  > , let Qk×d be a random use± / thek matrix following (i.e., version independent of the Johnson-Lindenstrauss Bernoulli entries) with Lemmak ≥ due ton/ Achlioptas. Then with [1]. 1 probability at least − /n 0 2 1 24 log −  kv − v k ≤ kQv − Qv k ≤  kv − v k 1 1i j i j i j 2 2 2 for all pairs i; j ≤ n. 2 2 2 (1 ) (1 + ) / {QW BL χv } 1 2 + TheoremOur goal 4.6 is now to compute. There the projections is an algorithm x . We willL; y; exploit δ which the takeslinear a system Laplacian solver matrix of SpielmanL, a column and Teng vector [43,y, 44], and which an error we recall parameter satisfies:δ > , and returns a column(Spielman-Teng) vector x satisfying = STSolve( ) 0 kx − L ykL ≤ kL ykL; + + T where kykL y Ly. The algorithm runs in expected time O m /δ , where m is the number of non-zero entries in L. p Z =QW / BL Ze ( log(1 )STSolve)

1 2 + Z zi zi T i Let =Z Z . We will compute anzi approximationi e byZ using to approximately computeZ the rows of . Let the column vectors and ˜ denote the th rows of and ˜, respectively√ (so that is the th column of ). Now we can Q ± / k k × n k n/ construct the matrix e in the following three steps. 2 Y QW / B m × n/ m O m/ 1. Let be a random 1 matrix of/ dimension where = 24 log . B m1 2 W 2 2 2. Compute = . Note that this1 2 takes 2 24 log + = e( ) y ≤ i ≤ k Y z L; y ; δ time sincei has 2 entries and is diagonal. i i i 3. Let , for 1 , denote the rows of , and compute ˜ = STSolve( ) STSolve for each .  −  w We now prove that, for ourδ purposes, it sufficesmin to: call with s  n wmax 2(1 ) = 3 Lemma 4.7. Suppose 3 (1 + )

−  Ruv ≤ kZ χu − χv k ≤  Ruv ; 2 for every pair u; v ∈ V . If for all i, (1 ) ( ) (1 + )

kzi − zikL ≤ δkzikL;

˜ (4.5)

39 where  −  w δ ≤ min s  n wmax 2(1 ) then 3 (4.6) 3 (1 + ) −  Ruv ≤ kZ χu − χv k ≤  Ruv ; for every uv. 2 2 2 (1 ) e( ) (1 + ) Proof. u v  Consider ank arbitraryZ χu − χv pairk − of kZ verticesχu − χv ,k .≤ It sufficeskZ χu − toχ showv k that

( ) ˜( ) ( ) (4.7) 3 since this willkZ χ implyu − χv k − kZ χu − χv k 2 2 kZ χ − χ k − kZ χ − χ k · kZ χ − χ k kZ χ − χ k ( u ) v ˜( u ) v u v u v   ≤ · kZ χ − χ k : = ( ) u˜( v ) ( ) + ˜( )   2 G 2 + ( ) P u v 3 3 As is connected, there is a simple path connecting to . Applying the triangle inequalitykZ χu twice,− χv k we − obtainZ χu − χv ≤ Z − Z χu − χv

≤ Z − Z χ − χ : ( ) e( ) ( e)( a ) b ab∈P X ( e)( )

40 We will upper bound this later term by considering its square: Z − Z χa − χb 2 ≤ n Z − Z χa − χb ! 2 ab∈P ab∈P X X ( e)( ) ( e)( ) by Cauchy-Schwarz ≤ n Z − Z χa − χb ab∈E 2 X e n Z − (Z BT )( ) 2F

T = n (B Z −e)Z writing this as a Frobenius norm 2F n / T =≤ ( W eB) Z − Z wmin 2F 1 2− / √ kW k ≤ / wmin ( e) n /1 2 T ≤ δ W BZ2 F wmin since 21 2 1 2 / / kW B zi − zi k ≤ δ kW Bzik n 1 2 2 2 1 2 2 δ wab kZ χa − χb k wminsince ( ˜ ) by (4.5) 2 ab∈E 2 n X ≤= δ wab (  Rab ) wmin ab∈E 2 X n  (1 + ) ≤ δ n − wabRab n − wmin 2 ab∈E (1 + ) X ( 1) since = 1.

−  On the other hand,kZ χu − χv k ≥ −  Ruv ≥ ; nwmax 2 2(1 ) ( ) (1 )

by PropositionkZ χu − 4.8.χv k Combining − Z χu − theseχv bounds,n we have / nw / ≤ δ n − · max kZ χ − χ k w 1 2 −  1 2 u ev  min    ( ) ( )  (1 + ) ≤ ( 1) ( ) 2(1 ) by (4.6), 3 Proposition 4.8. If G V ; E; w is a connected graph, then for all u; v ∈ V , as desired.

= ( ) Ruv ≥ : nwmax 2 Proof. Ruv G

By Rayleigh’s monotonicity law (see [13]), each resistance in is at least

41 0 0 Ruv G wmax × Kn 0 wmax G 0 0 the correspondingG resistance in = Ruv G(the complete graph with all edge weights ) since is obtained by increasing weights (i.e., conductances) of R0 n − /w edges in . But by symmetryuv eachuv resistancemax in is: exactly n n n − / nw P max ( 1) 2  = = Ruv ≥ u; v 2∈ V nwmax ( 1) 2 2 Thus for all Z . O m /δ / O m r/ kZ χu − χv k2 ≈ Ruv 2 u; v ∈ V O Thusn/ the construction of e takes e( log(1 ) 2) = Ze( log ) time. We can then find2 the approximate resistance e( ) for any in (log ) time simply by subtracting two columns of e and computing the norm of their difference. {Re} Using the above procedure, we can compute arbitrarily good approximations to the effective resistances which we need for sampling in nearly-linear time. By Corollary 3.10, any constant factor approximation yields a sparsifier, so we are done.

42 Chapter 5

Application to Contact Points

In this chapter we will prove a refinement of the Spectral Sparsification Theorem of Chapter 3 and use it to make progress on a problem in high dimensional convex geometry. The proof will again use two ‘barrier’ potential functions to iteratively build amean sparse weighted sum of rank one outer products with desirable spectral properties. The additional twist in this case is that we require the vectors produced to have small ; in Theorem 5.5, we show how to do this at some cost in both sparsity and control on the spectrum. We view this as an important first step in extending the barrier 5.1method Introductionto more constrained settings and exploring the scope of its applicability.

K Rn E K contact points K E LetK be an arbitraryE convex body in and let be a minimal volume ellipsoid n containing . Then theK of are the points of intersectionR of Eand n . The ellipsoid is uniqueB and characterizedm by≤ aN celebratedn n theorem/ of F. John n [7],x ; : which : : ; xm says∈ K ∩ thatB if is embedded via an affinec ; : : :transformation cm in so that is 2 the standard Euclidean ball , then there are = ( + 3) 2 contact points 1 2 1 candixi nonnegative weights for which i X T cixix=i 0 I (mean zero) (5.1) i X = 0 (inertia matrix identity)n (5.2) K x ; : : : ; xm B ci; xi i≤m John’s 1 2 decompositionMoreover, any convex body containing must have inside its John ellipsoid. We refer to a system ( ) which satisfies (5.1) and (5.2) as a of the identity. The study of contact points has been fruitful in Convex Geometry, for instance in understanding the behavior of volume ratios of symmetric and nonsymmetric convex bodies [7] and in estimating distances between convex bodies and the cube or simplex [38, 24]. In this chapter, we consider the number of contact points of a convex body.

43 d K H Rn Define a distance 1 between two (not necessarily symmetric) convex bodies and d K;H {c H u ⊂ TK ⊂ c H u } in as follows : T ∈GL n ;u∈Rn

K ( ) = inf( ) : + ( + ) d K N n n / and let be the space of all convex bodiesK equipped with the topology induced by K. Gruber [28] proved that the set of having fewer than = ( + 3) 2 contact points is of the first Baire category in . However, Rudelson has shown that every Theorem 5.1 . Suppose K is a convex body in n and  > . Then is arbitrarily close to a body which has a much smaller numberR of contact points. there is a convex body H such that d H;K ≤  and H has at most m ≤ Cn n/ contact points,(Rudelson where C is [39]) a universal constant. 0 2 ( ) 1+ log n

In this chapter,d H;K we≤ show that the log factor in Rudelson’s theoremm ≤ n/ is unnecessary in many cases.H For symmetric convex bodies, we obtain exactly the same2 distance guarantee ( ) H 1 + but with a much smallerd H;K number≤ : 32m ≤ Cnof contact points of . For arbitraryC convex bodies, weO shown an somewhat weaker result that only guaranteesd H;K an within< : constant distance ( ) 2 24, with contact points for some universal . Thus Rudelson’sH ( log ) bound is still the best known in the regime ( ) 2 24 for nonsymmetric bodies. Our approach for constructing is the same as Rudelson’s, and consists of two c ; x K steps: i i i≤m xi approximately 1. Given a John’s decompositionbi s ( O n) for , extract a small subsequence of points whichu are a John’s decomposition. To be precise, choose a set of scalars , at most = ( ) of which are nonzero and a small ‘recen- tering’ vector for which T bi xi u I − bi xi u xi u ≤ :

i i X X ( + ) = 0 ( + )( + ) bi; xi u i≤s ai; ui i≤s 2. Map the points ( H + ) in the approximateK John’s decomposition to an exact John’s decomposition ( ) , and show that it characterizes the John Ellipsoid of a body that is close to . bi The source of our improvement is a new method for extracting the approximate decom- position in step (1). Whereas the were chosen by random methods in Rudelson’s work, we now use the deterministic procedure described in Theorem 3.1, restated here for convenience:K H u d 1 When and are symmetric then we can take = 0 and becomes the usual Banach-Mazur distance.

44 Theorem 5.2 . Suppose n d > and v ; v ;:::; vm are vectors in R with (Spectral Sparsification of the Identity, copy of Theorem 3.2) 1 2 T 1 vivi I: i≤m X = Then there exist scalars si ≥ with |{i si 6 }| ≤ dn so that √ 0 T: = 0d I  sivivi  √ 2 I: i≤m d − ! X + 1 1symmetric

A sharp result regarding the contact points of convex bodies can be Corollaryderived as 5.3. an immediateIf K is a symmetric corollary convexof Theorem body 5.2 in R andn and Rudelson’s > , then proof there of Theorem exists a 1.1body [38].H such that H ⊂ K ⊂  H and H has at most m ≤ n/ contact points with its John Ellipsoid. 0 2 Proof. K (1 + ) 32 Bn X {x ; : : : xm} c ; : : : cm K 2 Suppose is a symmetricxi convex∈ X ⇐⇒ body − whosexi ∈ X John ellipsoid is , and let 1 1 = ci be contact points satisfying (5.1,5.2) with weights . Since X is symmetric we can√ assume that , and that the corresponding weights arevi equal.cixi d / si Y ⊂ X We will extractxi an approximatesi John’s decomposition2 from . Apply Theorem 5.2 to the vectors = with parameter = 16 to obtain scalars , and let be the set of with nonzero . We areT now guaranteed/ that I  sicixixi  I / − 2 xi∈Y   X 4 + 1 |Y | ≤ n/ 4 1 2 / with 16 . Notice that by an easy calculation≤  / − 2   4 + 1 1 + 4 1

for sufficiently small epsilon. In order toY obtain a John’s decompositionsi from these vectors, we need to ensure the mean zero condition (5.1). This is achieved easily by taking a negative copy of each vector in and halving thesi/ scalarscixi s,i/ sinceci −xi x ∈Y Xi ( 2) + ( 2) ( ) = 0 T T T si/ cixixi si/ ci −xi −xi sicixixi x ∈Y x ∈Y and Xi Xi ( 2) + ( 2) ( )( ) =

45 Y ∪ −Y bi sici/ xi −xi  which we know is a good approximationn/ to the identity. Thus the vectors in with weights = 2 on2 and constitute a (1 + )-approximate John’s decomposition withK only 32 points. Substituting this fact in place of Lemma 3.1 [38] in the proof of Theorem 1.1 [38] gives the promised result.

i bixi When the body is notu symmetric,− b there is noxi immediate way to guarantee the P i i mean zero condition. If we simply recenterP the vectors produced by Theorem 5.2 to have mean zero by adding = T to each T , then the correspondingT inertia bi xi u xi u bixixi − bi uu matrix is i i i ! X X X ( + )( + ) = (5.3) T k i bi uu k whichP no longer approximates the identity (indeed, it can have negative eigenvalues) if u is large. This is the issue that we address here. In Section 5.2, we prove a variant of Theorem 5.2 which allows us to obtain very good control on the mean at the cost of having a worse (at best, factorH 4) approximation of the inertia matrix to the identity. In Section 5.3, we show that this is still sufficient to carry out Rudelson’s construction of the approximating body . The end result is the following Theorem 5.4. For every  > the following is true for n sufficiently large. If K is a theorem. convex body in Rn, then there is a convex body H such that 0 √ H ⊂ K ⊂  H

has at most O n contact points with its John Ellipsoid.  ( 5 + )

5.2 Approximate( ) John’s Decompositions

Theorem 5.5. Suppose we are given a John’s decomposition of the identity, i.e., unit In this section we willn prove the following theorem. vectors x ; : : : ; xm ∈ R with nonnegative scalars ci such that

1 cixi i X T cixixi =I: 0 (5.4) i X = (5.5) Then for every  > there are scalars bi, at most O n nonzero, and a vector u such that T 0 I  bi xi u xi u ( )  I i X ( +bi x)(i +u ) (4 + ) i X ( + ) = 0

46 bi kuk ≤ : i ! X 2 u xi ei ci {ei}i≤n We remark that the requirement (5.4) is necessary to allow a useful bound on , since otherwise we can take = with = 1 for the canonical basis vectors and it is easily seen that T bi kuk ≥ bi λmin bixixi i ! i ! X 2 X min = bi

5.2.1for every An choice Outline of scalars of The, which Proof is worthless considering (5.3).

bi; xi u As in the proof of Theorem 5.2, we will build the approximate John’s decomposition T ( + ) by an iterative processA which addsbj xj x onej vector at a time. At any step of j the process, let X = bi denote the inertia matrix of the vectors that have already been added (i.e., the ’s that have been set to some nonzeroz value), andbj xj let j X = A z Barrier Functions. denote their weighted sum. Initially both = 0 and = 0. As in the proofA of Theorem 3.1, Theu; lchoice∈ R of vector to add in each step will be guided by two ‘barrier’ potential functions which we will use to maintain control on the eigenvalues of . For real numbers , which we will call the upper and loweru barrierA respectively,uI − A − define: : u − λi def 1 i X 1 Φ ( ) = Tr( ) = (Upper potential) − l A A − lI ; λi − l def 1 i X 1 λ ; : : : ; λnΦ ( ) = Tr( ) = A (Lower potential) A ≺ uI A lI λ A < u λ A > l 1 where are the eigenvalues of . A u l max min As long as and (i.e., ( ) and ( ) ), these potentialuI − A u functionsA − lI measure how far the eigenvalues of arel A from the barriersA and . In particular, they blow up as any eigenvalue approaches aA barrier, since then (or ) approaches a singular matrix. Thus if Φ ( ) and Φ ( ) are appropriately bounded, then we can conclude that the eigenvalues of are ‘well-behaved’. For a more thorough discussion of where these barrier functions come from and why they work, we refer the reader to Section 3.1.1.

47 Invariants. u; l; A z PU;PL;  We will maintain three invariants throughout the process. Note that • A l u and vary from step to step, while and remain fixed. lI ≺ A ≺ uI: The eigenvalues of lie strictly between and :

• PL (5.6) PU u Both the upper and lower potentialsl A ≤ PL are boundedA ≤ P byU: some fixed values and : • z Φ ( ) Φ ( ) (5.7) kzk ≤  · A The running sum is appropriately small: 2 Initialization. Tr( ) A z (5.8)

At the beginning of the process we have = 0 and = 0, and the l l − u u : barriers at initial values 0 0 P P n = = 1 and = = 1U L (5.9) Maintenance. It is easytv to seerw that (5.6),t; (5.7), r ≥ and (5.8)v; w all∈ {holdxi}i≤ withm = = at this point. The process will evolve in steps. Each step will consist of adding two vectors, and , where 0 and . We will call these the primary vector and theδ fixl > vector, respectively.δu > The primary vector will allow us to move the upper and lower barriers forward by fixed amounts 0 and 0 while maintaining the invariants (5.6) and (5.7); in A tvvT ≤ A u δu A tvvT ≤ u A l δ I ≺ A tvvT ≺ u δ I: particular,l δl we will choosel it to satisfy: l u + + Φ ( + ) Φ ( )Φ ( + ) Φ ( )( + ) + ( + ) rw (5.10) The fix will correct any undesirable impact that the primary has on the sum; hz; tv rwi ≤ specifically, we will choose in a way that guarantees z + 0 (5.11) where is the sum at the end of the previous step. Thus the net increase in the kz tv rw k kzk hz; tv rwi ktv rwk ≤ kzk t r ; length of the sum in any step is given by 2 2 2 2 2 v w + ( + ) = + 2 + + + + ( + ) (5.12) t r since and are unit vectors. The corresponding increase in the trace is simply t r ≤  + , and so if we guarantee in addition that the steps are sufficiently small:

+ (5.13)

48 kz tv rw k kzk t r then the invariant (5.8) can be maintained by induction as follows: T T ≤ A tvv rww2 A2 t r2 + ( + ) +k (zk+ )t r ≤ ; by (5.12) Tr( + + ) Tr( ) + (A+ )t r  2 2  ≤  ( + ) max Tr( ) + by (5.13). rw f However, we need to makeδu > sure that adding the fix vector does not cause us to violate (5.6) or (5.7). To do this,rw the addition of will be accompanied by an

appropriatelyf large shift of 0 in the upper barrier. In particular, we will make u δu δu T T u δu T T T f sure thatA ontvv top ofrww satisfying≤ (5.11),A tvvalso satisfiesA tvv rww ≺ u δu δu I: + + + T T T Φ A ( +tvv +rww ) AΦ tvv( + ) and + + ( + + ) (5.14) Since + + + , there is no need to shift the lower barrier, and A tvvT rwwT ≤ A tvvT A tvvT rwwT l δ I: thel analogousδl bound for the lowerl δl potential is immediate: l

+ +0 + Φ ( + + ) Φ ( + ) with + + ( + ) A tvvT rwwT ≤ A ≤ P Together with (5.10), thesel δ inequalitiesl guarantee thatl L

+ Φ (f + + ) Φ ( ) u δu δu T T u A tvv rww ≤ A ≤ PU; and + + Φ ( + + ) Φ ( ) tv f thusrw maintaining bothl (5.6)u and (5.7), as desired.δl δl δu To summarize what has occurred during the step: we have added two vectors and and shifted and forward by and + , respectively, in a manner that ourt; three r ≥ invariants continuev; w ∈ to {x hold.i}i≤m To show that such a step can actually be taken, we need to prove that as long as the invariants are maintained there must exist Termination.scalars 0 ands vectors which satisfy all of the conditions (5.10), (5.11), (5.13), and (5.14). We will do this in Lemma 5.6. − sδ I ≺ A ≺ s δ δf I : After steps ofl the process, weu haveu s λ A /λ A f ( 1 + ) (1 + ( + )) by (5.6) δu δu δl max min f If we take + sufficiently large, then we can makeδl; δu; theδu ratio ( ) ( ) arbitrarily close to . In the actuals O proof,n which we will present shortly, we will show that the process can be realized with parameters for which this ratio can be made arbitrarily close to 4 in = ( ) steps.

49 j bj xj z u − b − A P j j P Tr( ) As for the mean, we will set = =kzk , so that immediately bj kuk ≤  A2 j ! X 2 = (5.15) Tr( )

5.2.2as promised. Realizing the Proof

f δl; δu; δu

To complete the proof, we will identify a range of parameters for which the Lemma 5.6 . Suppose c ; x is a John’s decomposition and z is any above process can actually be sustained.i i i≤m vector. Let A be a matrix satisfying the invariants (5.6) and (5.7). If

(One Step) ( ) n f PU PL ≤ δu δu  δl 1 1 4 1 Then there are scalars t; r ≥+ and+ vectors 2 + v; w+∈ {xi} which satisfy (5.10), (5.11),(5.16) (5.13), and (5.14). 0

To this end, we recall the following lemmas from Chapter 3, which characterize how much of a vector one can add to a matrix without increasing the upper and lower Lemma 5.7 . Suppose A ≺ uI and δ > . potentials. u Then there is a positive definite matrix U U A; u; δu so that if v is any vector which satisfies (Upper Barrier Shift, copy of Lemma 3.4) 0 vT =v ≥ ( ) U t then 1 u δU T u T A tvv ≤ A and λ A tvv < u δU: + T That is, if we add t times vv to A and shift themax upper barrier by δU, then we do not increase the upperΦ potential.( + ) Φ ( ) ( + ) +

Lemma 5.8 . Suppose A lI, δl > , and l A < /δl. Then there is a matrix L L A; l; δl so that if v is any vector which satisfies (Lower Barrier Shift, copy of Lemma 3.5) 0 Φ ( ) 1 vT =v ≤( ) L t then 1 T T l δL A tvv ≤ l A and λ A tvv > l δL: T That is, if we add+ t times vv to A and shift themin lower barrier by δL, then we do not increase the lowerΦ potential.( + ) Φ ( ) ( + ) +

50 T T v Uv v Lv {xi} {ci} We will prove that desirable vectors exist by taking averages of the quantities and over ourT contact points Twith weights . Since cixi Uxi U • cixixi i i ! X X = U • I U = by (5.2) L = Tr( ) U L A u; l (and similarlyδ foru ),δ itl will be useful to recall bounds on Tr( ) and Tr( ) from Chapter 3. Remarkably, these bounds do not depend on the matrix or on at all, but only Lemma 5.9 . If lI ≺ A ≺ uL with A < P and u A < P then on the shifts and Land onU the potentials. l L U

(Traces of and ) U ≤ PU Φ ( ) Φ ( ) δu 1 and Tr( ) + L ≥ − PL: δl 1 Tr( ) f f Proof of Lemma 5.6. L L A; l; δl ; U U A; u; δu ; U U A; u δu; δu We are now in a position to prove Lemma 5.6. Let = ( ) = ( ) = ( + ) be thetv matrices produced by lemmas 5.7 and 5.8. Let us focus on the primary vector first. By Lemmas 5.7 and 5.8, we can add vT v ≤ /t ≤ vT v: without increasing potentials if U L v 1 vT v / ≤ /t ≤ vT v In fact, we will insist on for whichU L t ≤ / T T + 2 1 D v v Lv − v Uv − / F {xi D xi ≥ } feasible as this will ensureP { thatxi h wexi; z cani > take} 2. Letz ( ) = N {xi hxi; zi2 ≤ }and call = : ( ) 0 the set of vectors. Let = : 0 be the set of vectors with positive inner product T with ,/t and letv Lv = : v ∈ F0 be the vectors in the complementary halfspace.v We will always addv as littleth ofv; a zi primary vector can, so we can assume that we take 1 = whenever . Here isw the rule for choosing which to add: chooseF ⊂ the P feasible for which is minimized. If this quantity is negative then hv; zi there is no need for a fix vector, and≥ takingvT Lv =∀ 0vwe∈ F are: done. Otherwise we know that and α t α {thv; zi v ∈ F} 1 = (5.17)

where = min : . Taking a sum, we notice that

51 hx ; zi hx ; zi c i ≥ c i F ⊂ P i α i α P F X X T ≥ cixi Lxi since F X T ≥ ciD xi by (5.17)U  D xi ≤ xi Lxi F X ≥ ciD(xi) since D < 0 implies thatF ( ) i X ( ) since 0 outside i cixi P hx ; −zi hx ; zi However, since =c 0 thisi implies thatc i ≥ c D x : i α i α i i P i XN X X = w ( ) (5.18) w ∈ {xi} r ≥ We will use (5.18) to show that a suitable fix vector exists. We are interested rhw; −zi ≥ α α thv; zi in finding a and 0 for which T f f w U w / ≤ /r δu r ≤ / (sufficient to reverse = )for (5.11) w + 2 1 (upper barrier feasible with shift and 2)for (5.13,5.14) hw; −zi wT f w / ≤ ; Thus it suffices to find a for whichU α

/r + 2

and then we can squeeze 1 in between. Taking a weighted sum over all vectors of T f hxi; −zi interest, it will be sufficientc toix showxi thatci/ ≤ ci : i U α XN XN + 2

c xT f x c / ≤ c xT f x c / f n/ For the left handi sidei U wei usei the crude estimatei i U i i U XN NX ∪P + 2 + 2 = Tr( ) + 2

52 hx ; −zi and for thec righti hand≥ sidec weD x consider that i α i i i XN X T T cixi (Lx)i − cixi Uxi − ci/ by (5.18) i X T = L − U − n/ 2 cixixi I i X = Tr( ) Tr( ) 2 since =

f n/ ≤ − − n/ Thus it will be enough to haveU L U

Tr( ) + 2 Tr( ) Tr( ) 2

Proof of Theorem 5.5. l − u PL which follows from our hypothesisf (5.16) and Lemma 5.9. PU n δu δu  δl 0 0 If we start with = 1 and = 1 then we can take = n = . If we set = = (2 + ) , thenn (5.16)≤ reduces; to  δl  δl 2 4 1 + 3 + (2 + )  which it is easy to check is satisfied wheneverδl ≤ : 2n s 10 λ A s ·  δ At the end of stepsκ weA have ≤ l λ A − sδl max ( ) 1 + 2(2 + ) ( ) = min  s ≥ n/ : T ( ) 1 + : A uu  3 which can be made arbitrarily at most 4 + 3 by taking 100 steps. By (5 3) and5.3(5 15) Construction, the term Tr( ) ofonly the affects Approximating this by at most Body.

− The contents of this section are very similar to Rudelson [38] Section 4, but the calculations are a bit more subtle because we only have a 4 approximate John’s decomposition as opposed to a (1 + )-approximate one. The main result is a generic procedure for turning approximate John’s decomposi- Lemma 5.10. Suppose K has contact points c ; x and b ; y x u are the tions into approximating bodies: i i i≤m i i i i≤s

( ) ( = + )

53 vectors produced by Theorem 5.5, with

T A biyiyi : i X Then for  > and n sufficiently large,= there is a body H with at most s contact points and / 0 κ A − d H;K ≤  κ A 1 2 : p 2 !! ( ( ) 1) ( ) (1 + ) ( ) 1 + (5.19) 4 Proof of Theorem 5.4. κ A This immediately yields a proof of Theorem 5.4: Since the condition number ( ) guaranteed by Theorem 5.5 / √ can be made arbitrarily close to 4, the number in (5.19) can be made arbitrarily close 1 2 to    1 4 1 + = 5 4 bi; yi xi u i≤s as desired.

Let ( = + ) be the approximatebiyi John’s decomposition guaranteed by i Theorem 5.5, with X = 0 T biyiyi A: i and X yi = E A / Bn There are two1 problems2 with this: the are not unit vectors, andai; u theiri i≤s moment 2 ellipsoid = isv not the sphere. We will adjust theyi vectors in a manner that fixes both these problems, to obtain an exact John’s decomposition (ˆ ˆ ) . y y v Add a small vector , to be determinedi later,i to each to obtain vectors

ˆ = + A b y yT b y yT b vvT A A vvT : with inertia matrixv i i i i i i i i i i X X X = ˆ ˆ = + = + Tr(/ ) (5.20)n · v Rv Av Ev Rv B yi 1 2 Ev 2 (we will use ˆ to denote vectors that depend on .) Let = and let = yi − be the correspondingz momenti ellipsoid. If weky rescaleikE v eachkRv yˆikto lie on : kyikE v 1 ˆ ˆ = whereR−ˆ = ˆE .Bn (5.21) ˆ v v 1 2 and then apply the inverse transformation which maps to , then we obtain

54 − ui Rv zi: unit vectors 1 ˆ = ˆ a b ky k Moreover, if these are given weights i i i E v 2 ˆ = ˆ

T then we have an exactT decomposition− of they identityiyi since− − − aiuiui Rv bikyikE v Rv Rv Av Rv I: kyikE v i 1 i 2 ! 1 1 1 X X ˆ ˆ ˆ ˆ ˆ = ˆ 2 = = ˆ v

In the following lemma, we show that there− y musti exist− a small for which the weighted aiui bikyikE v Rv Rv bikyikE v yi sum kyikE v i i 2 1 1 i ! X X ˆ X ˆ ˆ = ˆ = ai; ui i≤ˆs ˆ ˆ Lemma 5.11. Let b ; y ;A, etc. be as above and let  > , and let n be sufficiently is equal to zero. Thisi willi complete the construction of (ˆ ˆ ) . large. Then there is a vector v with 0  kAk kvk ≤ νA κ A − def s A 1 + p  = ( ) 1 (5.22) for which 2 Tr( ) bikyikE v yi : i X Proof. v ˆ ˆ = 0

b y v T A− y v y v : We need to find a ifor whichi v i i i X p 1 ( + ) ( + )( + ) = 0

As in [38], we will do thisv using the Brouwer fixed point theorem. In particular it will i biβi yi v T T − sufficeF tov show− that the function βi yi v A vv yi v ( )v i biβi P ( ) 1 (v ) p ( ) = Pi bi βi − µ yi where = ( + ) ( + ) ( + ) − µ ∈ R biyi ( ) v P i biβi i ( ( ) ) X = P for any since = 0

55 n νAB

2 b β v − µ hy ; wi maps kF v tok itself. Thisi cani i be demonstratedi as follows: kwk ( ) v P i biβi ( ) / =1 v /( ) ( ) = maxi bi βi −Pµ ≤ · bihyi; wi 1 2 ( ) v kwk biβ 2 1 2 P i i i 2! ( ( ( ) ) ) X P max=1 b β v − µ / i i i · kAk / by( Cauchy-Schwarz) v biβ 2 1 2 P i i 1 2 ( ( ( ) ) ) b β v − µ / =≤ P i i i · kAk / − / ( ) − O A biβ 2 1 2 P i i 1 2 1 1 2 ( ( v(0) ) ) P βi 1 (Tr( ) ) b β v −( )µ / ≤ i i i · kAk / · kAk / by Lemma− / 5.12 and( ) = Ω(1) − O A i bi 2 1 2 P 1 2 1 2 1 ( ( − ) ) 1 2 βi ≥ kAk i P 1 (Tr( ) ) (0) 1 / / i bi βi − µ O A ≤ since −min/ · kAk − O A (0) 2 A1 2 1 4 P √ √ √ 1 1 2 ( ( ) ) +a (Tr(b ≤) )a b 1 (Tr( )b β) − µ / Tr( ) ≤ i i i · kAk A (0)by Lemma 5.13 and + + P 2 1 2 (1 + )( ( ) ) > n A n  Tr(b /) |β − µ| ≤ i i i i · kAk Afor any (0) 0 and large , since Tr( ) = Ω( ) P 1 2 (1 + )(| β) max− β | kAk  i i i i Tr((0) ) (0) A /

max µ min β − 1β2 / = (1 + ) i i 2 i Tr( i )  (0) kAk (0) ≤ kA− k / − setting = (maxkAk / minA / ) 2  1 1 2  / 1 +  − / / 1 1 2 kAk 1 2 kA k kAk − / ; 2 Tr(A1)2 1 1 2 1 2 1 +  = 1 1 2 2 Tr( ) Lemma 5.12. If kvk O A − / then as desired. 1 2 v / = (Tr( b)iβi )≥ biβi − O A : i i X ( ) X (0) 1 2 (Tr( ) )

56 Proof.

β v kR− y v k We can lowerboundi v thei individual terms as ( ) −1 − ≥ kRv yik − kRv vk = T 1( + T) − 1 / − yi A vv yi − kRv vk 1− 1 T2 − 1 / T − A vv A − = yi( +A −) T − yi − kRv vk 1 v A v1 1 2   1   1 = 1 1 + − T − / T − T A vv A − ≥ yi Abyy Sherman-Morrisoni − yi T − yi − kRv vk 1 v A v1 1 2 q 1 √  √ √   1 a − b ≥ a − 1 b 1 + since .

− T − / Taking a sum, we observev the differenceT ofA interestvv A is bounded by − biβi − biβi ≤ bi yi T − yi bikRv vk: 1 v A v1 1 2 i i i i X (0) X ( ) X     X 1 1 + 1 + / / − T − / − T − The first sumT isA handledvv A by Cauchy Schwarz: T A vv A bi yi T − yi ≤ bi 1 2 biyi T − yi 1 2 1 v A v1 1 2 1 v A v1 i i ! i ! X     X X   1 − T − / 1 / AA vv A 1 + A T − 1 + v1 A v 1 1 2 1 2   Tr( 1T ) = Tr( ) biyiyi A 1 +i / X < A since: = 1 2 Tr( )

− − / For the second, we observebikR crudelyv vk ≤ thatA kRv kkvk O A i X 1 1 1 2 − Tr( ) = (Tr( ) ) kRv k O Lemma 5.13.1 If kvk O A − / then since = (1). 1 2 v − / =bi β(Tr(i −)µ )≤ bi βi − µ O A : i i X ( ) 2 X (0) 2 1 2 Proof. ( ) ( ) + (Tr( ) ) v v v bi βi − µ biβi − biβi biµ : i i Write X ( ) 2 X ( )2 ( ) 2 ( ) = 2 +

57 v T βi ≤ βi A vv  A ( )2 (0)2 v / Then − bybiβi +≤ − ,b andiβi O A ; i i X ( ) X (0) 1 2 2 2 + (2Tr( ) ) by Lemma 5.12

Proof of Lemma 5.10. H as desired. K Bn ci; xi i≤m We are now in a position to constructbi the; yi i body≤s promisedO n 2 in Theoremv; 5.4. Av ; { Letzi}i≤s be aa convexi; ui i≤s body with as itsθmax John ellipsoidi kzik and contactθmin pointsi kzi(k ) . Use > Theorem 5.5 to obtain a subsequence ( ) with only ( ) 2 points Let ˆ , and (ˆ ˆ ) be as above. Let = max ˆ and = 2 θ min ˆ and let 0, andL consider themin bodyK; z ;:::; z :  s   1 = conv ˆ ˆ (5.23) θ 1 + min K ⊂ L ⊂  θ K:  max We will show that z ∈  θ K (1 + ) i max n 1 + The first containment is obvious; for the second, observe that each ˆ (1 + ) kyikK kxi u vkK fork sufficientlyzikK large since kyikE v kyikE v

ˆ kxi u+ v+k − / ˆ ≤=  = kxik kxikK kuk; kvk ≤ O n ˆ kyikEˆv 2 1 2  kz k+ + 2 (1 + ) i since = and ( ) ≤  θmax:ˆ 2 = (1 + ) ˆ z L E z ∈/ θmin Bn (1 + ) i v i  − − Ev L Rv Rv L − n 1+ 2 It is also clear thatui ˆRarev zi the onlyB contact points of with since1 all ˆ 1 . It remains to show that is1 the John Ellipsoida ofi . Applying , we see that 2 has contact points ˆ = L ˆ withK , and we have already shown that these satisfy the conditions of John’s theorem with weights ˆ . θmax i kyikE v The distance between and≤ is now at most ≤  κ Av θmin i kyikE v 2 2 3 max ˆ p ky k ≤ (1 + )kx k (1 + )n (1 + ) ( ) i E v i E v min ˆ

since ˆ (1 + ) for large . Combining (5.20) and (5.22) gives the required bound (5.19).

58 Chapter 6

Restricted Invertibility

6.1 Introduction

Theorem 6.1 . There are universal constants c; d > , suchIn this that chapter whenever we studyB is the an followingn × n matrix well-known with unit theorem length of columns, Bourgain one and can Tzafriri. find a subset S ⊂ n(Restrictedof cardinality Invertibility [14]) 0 |S| ≥ cn/kBk [ ] 2 for which 2 σmin BS ≥ d; where BS is the submatrix of B with columns in S. ( ) (6.1)

This theorem has had significant applications in the local theory of Banach spaces and in the study of convexn bodies in high dimensions. It isS ;:::;S also consideredk a step towards the resolution of the famous Kadison-Singer conjecture, which asks if there 1 exists a partition of [ ] into a constant number of subsets for which (6.1) holds. Recently, the theorem has attracted attention in numerical analysis due to its connection with the column subset selection problem, which seeks to selectS a ‘representative’ subset of columns from a given matrix. In particular, Tropp [47] has developed a randomized polynomial time algorithm which finds the subset effi- ciently. Bourgain andc Tzafriri’sd ∼ proof of Theorem 6.1 uses probabilistic andc functionalc  c0 d  − <  < c0 analytic techniques and is1 non-constructive.72 In the original paper the theorem was 10 shown2 to hold for = 1 . Later on [15], the same authors proved it for = ( c) =  and = (1 + ) for every 0 1, where √ is a universal (tiny) constant. They were interested in the casec when/ isd small;/ theπ quadratic dependence of ( ) on was shown to be necessary in [11]. In another regime, modern methods can be used to obtain the constants = 1 128 and = 1 8 2 [16, 47]. S Here we present a short proofS that uses only basic linear algebra, achieves much better constants, and contains a deterministic algorithm for finding . Our method of proof involves building the set iteratively using a ‘barrier’ potential function, similar

59 to that in the proof of Theorem 3.1 in Chapter 3. The main conceptual differences between this proof and the earlier oneS are: 1. Here we use only one barrier instead of two, since we seek only a lower bound on the eigenvalues of vectors in , the set that we construct. si S 2. The additional freedom arising from having only one barrier allows us to guar- antee that the weights are either 0 or 1 (i.e., a vector is either inside or not) and that no vector is chosen twice. n T 1 n n Theorem 6.2. Suppose v ; : : : vm ∈ R , i vivi I, and <  < . Let L ` → ` Specifically, we prove the following generalization of Theorem 6.1 kLk be a linear operator. Then there is a subset S ⊂ m of size |S| ≥  F for which 1 P kLk2 2 2 {Lvi}i is linearly independent and = 0 1 2 2 : 2 [ ] T −  kLkF λ Lvi Lvi > ; m2 2 i∈S ! min X (1 ) ( ) where λ is computed on {Lvi}i∈S.

min span n n This form of generalization was introduced by VershyninL ` → ` [49] in his application to the study of contact pointsL of convex bodies via John’s decompositions of the identity. kLk 2 2 It says thatF given any such decomposition and any : , there is a part of the kLk2 decomposition2 on which is well-invertible whose size is proportional to the stable 2 rank . The original form of Bourgainc   and Tzafriri’sd  theorem−  follows quickly from Theorem 2 2 6.2 with constants{vi} {ei}i≤n kLeik ( ) = and ( ) = (1 ) by taking from the standard basis and assuming = 1. This domi- 6.2nates previous Proof bounds of the in all Theorem regimes, for small and large.

T A i∈S Lvi Lvi S P We will build the matrix = ( )( ) by an iterative process that adds one T − vector to in eachb step.A The processLvi A − willbI beLv guidedi by the potential function i X 1 T − T Φ ( ) = L( A) −( bI L) ( ) vivi I i  1  X = Tr ( ) since B = , n × m

1 Lvi This statement is identical to Chapter 1 Theorem 1.10 if we take to be the matrix with columns . We write it differently here in order to maintain consistency with the presentation in the literature [49, 14].

60 barrier b A b b > where the is a real number that varies from step to step. 0 T − T kLkF Initially =b 0, the barrierL is− atb I = L −0, andL theL /b potential− is : b 2 1 0  0    0 Φ (0) = Tr (0 ) = Tr = 0wwT A w ∈ {Lvi}i≤m w Lvj j S Each step of the process involves adding someδ > rank-one matrix to where (if = then this corresponds to adding to ) and shifting the T barrier towards zero by some fixedb−δ A amountww ≤ 0b, withoutA : increasing the potential. Specifically, we want k A k Φ ( + b ) Φ ( ) We will maintain the invariant that after vectors have been added, has exactly nonzero eigenvalues, all greater than . Keeping the potentialw small (in fact, 0 sufficiently negative) will ensure that thereb is ab suitable− δ vector to add at each step. In any step of the process, we areA only B interested inA − vectorsB which add a new nonzero eigenvalue that is greater than = . These are identified in the Lemma 6.3. Suppose A  has k nonzero eigenvalues, all greater than b0 > . If following lemma, where the notation means that is positive semidefinite. w 6 and T 0 − 0 w A − b I w < − 0 then= 0A wwT has k nonzero eigenvalues1 greater than b0. ( ) 1 (6.2) Proof. λ ≥ · · · ≥ λ A λ0 ≥ · · · ≥ λ0 + +k 1 k k A wwT A 1 1 +1 Let be the nonzero eigenvalues of , and let be the + 1 largest eigenvalues of + . As the latter matrix is obtained from by λ0 ≥ λ ≥ λ0 ≥ · · · ≥ λ ≥ λ0 : the addition of a rank one positive semi-definite matrix,k k their eigenvalues interlace:

1 1 2 +1

0 − Consider the quantity A − b I 0 0 ; λi − b − b 1 i≤k i>k   X 1 X 1 Tr ( ) = + 0

where we have written the positive and negative terms in theT sum separately.0 − By the T 0 − 0 − w A − b I w Sherman-MorissonA ww formula,− b I − A − b I − T 0 − : w A − b I2 w 1 1     ( ) 1 wTTrA −( b+0I − w < −) Tr ( ) = (6.3) 0 1 + ( 0 −) 1 A − b I A − b I  Since ( ) 1, the denominator in the right-hand term2 is negative. The numerator is positive since is non-singular and ( ) 0. So, the right-hand side of (6.3) is positive.

61 < A wwT − b0I − − A − b0I − On the other hand, a direct evaluation of this difference yields k k  1  1 0 Tr0 ( + 0 − 0 ) Tr0 ( 0 − ) 0 λ − b − b λ − b λi − b k i i i 1 1 X 1 X 1 = +1 : + =1 =1 0 0 0 λk − b b

0 1 1 0 0 λk ≥ = +1 + λk > b 0 +1 +1 b b b − δ As 0, this is only possible if , as desired.

The updated potentialT afterT one0 step, asT − the barrier moves from to = , 0 A ww L A − b I ww L can be calculatedb using the Sherman-Morisson formula: 1 LT A − b0I − wwT A − b0I − L LT A − b0I − L −  Φ ( + ) = Tr ( + ) wT A1 − b0I − w 1 1     wTrT A −( b0I − LL) T A −( b0I − w) = Tr T ( 0 )− 1 L A − b I L − T 0 − 1 +w 1A(− b I )w 1 T 1 0 − T 0 −  w A − b I LL( A − b)I w ( 1 ) = Tr 0 ( ) b A − T 0 − : w 1A − b1I + w ( 1 ) ( ) ( ) = Φ ( ) 1 w 1 + ( ) wT A − b0I − LLT A − b0I − w To prevent an increase0 in potential, we want choose a such that b A − T 0 − ≤ b A : w 1A − b I w 1 ( ) ( ) Φ ( ) 1 Φ ( ) (6.4) w 1 + ( ) We can now determine how small we need the potential to be in order to guarantee Lemma 6.4. Suppose A has k nonzero eigenvalues, all of which are greater than that a suitable , whichn× willn allow us to keep on going, always exists. b, and let Z be the orthogonal projection onto the kernel of A. If

kLk b A ≤ −m − δ 2 2 and Φ ( ) (6.5) kLZk < δ < b ≤ δ F kLk 2

2 T then there exists a vector w ∈0 {Lvi}i≤m for which2 A ww has k nonzero(6.6) 0 T eigenvalues greater than b b − δ and b0 A ww ≤ b A . + + 1 Proof. both = Φ ( + ) Φ ( ) 2 The vectors satisfying of the inequalities (6.2) and (6.4) are precisely 2 We would like to thank Pete Casazza for pointing out an important mistake in an earlier version of this proof.

62 w

wT A − b0I − LLT A − b0I − w those for which T 0 − ≤ b A1 − b0 A · −1 − w A − b I w : ( ) ( ) 1 w w ∈ {Lv } (Φ ( ) Φ ( )) ( 1 ( ) ) i i≤m

We can show that such a exists by taking the sum over all and LT A − b0I − LLT A − b0I − L ensuring that the inequality holds in the sum, i.e., that T 0 −  ≤ b A −1 b0 A · −m1− L A − b I L : Tr ( ) ( ) 1  kLk b b A − b0 A(Φ ( ) Φ ( )) ( Tr b A ( ≤ −m )− ) ) (6.7) δ 2 2

T 0 − kLk Let ∆ := Φ ( ) LΦ (A −).b FromI L the assumptionb A − b Φ≤( −m) − − bwe immediately have δ 2  1  2 Tr ( ) = Φ ( ) ∆ ∆ kLk and so (6.7) will followLT A − fromb0I − LLT A − b0I − L ≤ · : b δ b  2   1 1  2 LLTrT  k(Lk I ) ( ) ∆ + ∆ (6.8)

2 LT A − 2b0I − LLT A − b0I − L ≤ kLk LT A − b0I − L : Noting that , we can bound the left hand side as 1 1 2 2 P  A 2 Z  Tr ( ) ( P ) T Tr ( 0 − ) Z (6.9) P Z I b0 A L P A − b I PL b0 A T 0 − LetL ZbeA − theb I projectionZL onto the image of and let be the1 projection ontoP itsZ A A − b0I − A − b0I −   kernel, so that 1 + = . Let Φ ( ) = Tr ( ) and Φ ( ) = 1 2 Tr ( ) be theP potentialsZ computed onP theseZ subspaces. Since , , 0 A 0 A 0 A ; ; , ( ) , andb ( )b are mutuallyb diagonalizable,b b b we can write LT A − b0I − L LT P A − b0I − PL LT Z A − b0I − ZL : Φ ( ) = Φ ( ) + Φ ( ) ∆ = ∆ + ∆ and P A −b0I − P  2  P A − bI − P  2   2  Tr ( ) = Tr ( ) + Tr ( ) 1 1 b − b0 P A − b0I − P  P A − bI − P − P A − b0I − P As ( ) 0 and ( ) 0, it is easy to check that 2 1 1 ( ) ( ) ( ) ( )

T 0 − P kLk which immediately giveskLk L P A − b I PL ≤ b : δ 2 2 2 2 2   Tr ( ) ∆ (6.10)

T 0 − P Z kLk P kLk Thus, byk (6.8),Lk (6.9),L Z andA − (6.10),b I ZL we are≤ doneb ifb we· can show thatb − b : δ 2 δ 2 2 2  2  2 2   Tr ( ) (∆ + ∆ ) + ∆ ∆

63 P Z b ; b ≥

T 0 − Z kLk Z Taking into accountkLk that ∆L Z∆A − b0,I thisZL is implied≤ b · by the statementb : δ 2 2 2  2  2   TrT ( 0 − ) kLZkF∆ + ∆ (6.11) L Z A − b I ZL 0 b 2  2  2 We now computeZTr (T ) − = and0 − kLZkF b L Z A − bI − A − b I ZL δ 0 bb 2  1 1  ∆ = Tr (( ) ( ) )) = δkLZk which upon substituting and rearrangingkLk ≤ reducesF (6.11) to b 2 2 2

Proof of Theorem 6.2. The requirements (6.5) and (6.6) of Lemma 6.4 are satisfied at the beginning of the process if we start with b_0 and δ satisfying

    Φ^{b_0}(0) = −‖L‖_F²/b_0 ≤ −m − ‖L‖²/δ    and    0 < δ < b_0 ≤ δ‖L‖_F²/‖L‖²,

since initially A = 0 and Z = Proj_{ker(A)} = I. Both of these conditions are met if we take

    b_0 = ε‖L‖_F²/m    and    δ = ε‖L‖²/((1 − ε)m).

They are maintained at every step of the process by Lemma 6.4 and the fact that ‖LZ‖_F² decreases by at most ‖L‖² in one step. Taking (1 − ε)²‖L‖_F²/‖L‖² steps leaves the barrier at

    b_0 − (1 − ε)²(‖L‖_F²/‖L‖²) · δ = ε‖L‖_F²/m − ε(1 − ε)‖L‖_F²/m = ε²‖L‖_F²/m,

which is the desired bound.
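The proof is constructive, and it may help to see the resulting selection loop written out. The sketch below is a rough, unoptimized numpy transcription using the barrier parameters chosen above; it is an illustration under those assumptions, not the thesis's reference implementation. At each step it adds the candidate minimizing the ratio from the proof of Lemma 6.4, which the lemma guarantees is admissible whenever (6.5) and (6.6) hold.

import numpy as np

def restricted_invertibility(V, L, eps):
    # V is n x m with V @ V.T = I_n (columns v_i); L is n x n; 0 < eps < 1.
    # Returns indices sigma and A = sum_{i in sigma} (L v_i)(L v_i)^T.
    n, m = V.shape
    W = L @ V                                   # candidate vectors w_i = L v_i
    frob2 = np.linalg.norm(L, 'fro') ** 2
    op2 = np.linalg.norm(L, 2) ** 2
    k = int(np.floor((1 - eps) ** 2 * frob2 / op2))
    b = eps * frob2 / m                         # initial barrier b_0, as above
    delta = eps * op2 / ((1 - eps) * m)         # barrier decrement per step, as above

    A = np.zeros((n, n))
    sigma = []
    for _ in range(k):
        b_new = b - delta
        N = np.linalg.inv(A - b_new * np.eye(n))
        best_i, best_score = None, np.inf
        for i in range(m):
            w = W[:, i]
            den = 1 + w @ N @ w                 # vectors already added give den > 0
            if den >= 0:
                continue
            score = (w @ N @ L @ L.T @ N @ w) / (-den)
            if score < best_score:
                best_i, best_score = i, score
        if best_i is None:
            raise RuntimeError("no admissible vector found")
        sigma.append(best_i)
        A = A + np.outer(W[:, best_i], W[:, best_i])
        b = b_new
    return sigma, A

# Tiny usage example on synthetic data (all parameters here are illustrative).
rng = np.random.default_rng(7)
n, m, eps = 10, 80, 0.3
Q = np.linalg.qr(rng.standard_normal((m, m)))[0]
V = Q[:n, :]                                    # columns satisfy sum_i v_i v_i^T = I_n
L = np.diag(np.linspace(1.0, 0.6, n))
sigma, A = restricted_invertibility(V, L, eps)
lam = np.sort(np.linalg.eigvalsh(A))[::-1]
print(len(sigma), lam[len(sigma) - 1], eps ** 2 * np.linalg.norm(L, 'fro') ** 2 / m)

Running the example prints the number of selected vectors, the smallest nonzero eigenvalue of the selected sum, and the target ε²‖L‖_F²/m for comparison.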

Chapter 7

The Kadison-Singer Conjecture

We conclude by drawing a connection between the main results of this thesis and an outstanding open problem in mathematics, the Kadison-Singer conjecture. This conjecture, which dates back to 1959, is equivalent to the well-known Paving Conjecture [4, 17] as well as to a stronger form of the restricted invertibility theorem in Chapter 6. The following formulation is due to Nik Weaver [51].

Conjecture 7.1. There are universal constants ε > 0, δ > 0, and r ∈ ℕ for which the following statement holds. If v_1, ..., v_m ∈ ℝ^n satisfy ‖v_i‖ ≤ δ for all i and

    Σ_{i≤m} v_iv_i^T = I,

then there is a partition X_1, ..., X_r of {1, ..., m} for which

    ‖Σ_{i∈X_j} v_iv_i^T‖ ≤ 1 − ε

for every j = 1, ..., r.

Suppose we had an 'unweighted' version of Theorem 3.2 which, assuming ‖v_i‖ ≤ δ, guaranteed that the scalars s_i were all either 0 or some constant β > 0, and gave a constant approximation factor κ < β. Then we would have

    I ⪯ β Σ_{i∈S} v_iv_i^T ⪯ κ·I,

for S = {i : s_i ≠ 0}, yielding a proof of Conjecture 7.1 with r = 2 and ε = min{1 − κ/β, 1/β}, since

    ‖Σ_{i∈S} v_iv_i^T‖ ≤ κ/β ≤ 1 − ε

and

    ‖Σ_{i∉S} v_iv_i^T‖ = 1 − λ_min(Σ_{i∈S} v_iv_i^T) ≤ 1 − 1/β ≤ 1 − ε.
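The bookkeeping in the last two displays is easy to check numerically. The toy sketch below does not construct an unweighted sparsifier (that is exactly what is open); it simply takes an arbitrary isotropic family, an arbitrary index set S, the best constants β and κ that happen to hold for that S, and verifies that ε = min{1 − κ/β, 1/β} bounds both operator norms as claimed.

import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 30

# A toy isotropic family (this does not produce a sparsifier; it only checks the algebra).
Q = np.linalg.qr(rng.standard_normal((m, m)))[0]
V = Q[:n, :]                                   # columns v_i with sum_i v_i v_i^T = I_n
S = rng.choice(m, size=m // 2, replace=False)
Sc = np.setdiff1d(np.arange(m), S)

sum_S = V[:, S] @ V[:, S].T
sum_Sc = V[:, Sc] @ V[:, Sc].T

# Suppose an 'unweighted' theorem handed us S and beta with I <= beta * sum_S <= kappa * I.
beta = 1.0 / np.min(np.linalg.eigvalsh(sum_S))
kappa = beta * np.max(np.linalg.eigvalsh(sum_S))

eps = min(1 - kappa / beta, 1 / beta)
print(np.linalg.norm(sum_S, 2) <= 1 - eps + 1e-12)    # ||sum_{i in S} v_i v_i^T|| <= 1 - eps
print(np.linalg.norm(sum_Sc, 2) <= 1 - eps + 1e-12)   # ||sum_{i not in S} v_i v_i^T|| <= 1 - eps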

Similarly, a strengthening of Theorem 6.2 which partitions [m] into a constant number of sets σ_i also implies an affirmative resolution to the conjecture. Thus, Kadison-Singer is the common conjectured strengthening of the two main results of this thesis, and indeed they can be viewed as steps towards its resolution. As a special case, a proof of the conjecture would also imply the existence of unweighted sparsifiers for the complete graph and other (sufficiently dense) edge-transitive graphs. It is also worth noting that the condition ‖v_i‖ ≤ δ, when applied to vectors arising from a graph, simply means that the effective resistances of all edges are bounded; thus, we would be able to conclude that any graph with sufficiently small resistances can be split into two graphs that approximate it spectrally.
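To make the last remark concrete: for a connected graph, the vectors in question are the edge incidence vectors mapped through the square root of the Laplacian pseudoinverse, and ‖v_e‖² is then exactly the effective resistance of edge e. A small self-contained numpy illustration follows; the example graph is ours.

import numpy as np

# A small unweighted graph given by its edge list (illustrative example).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
B = np.zeros((len(edges), n))
for idx, (u, v) in enumerate(edges):
    B[idx, u], B[idx, v] = 1.0, -1.0

Lap = B.T @ B                                  # graph Laplacian
Lplus = np.linalg.pinv(Lap)

# Isotropic edge vectors v_e = Lap^{+/2} b_e: their outer products sum to the
# identity on the space orthogonal to the all-ones vector.
w_eig, U = np.linalg.eigh(Lplus)
sqrt_Lplus = U @ np.diag(np.sqrt(np.clip(w_eig, 0, None))) @ U.T
Vecs = (sqrt_Lplus @ B.T).T                    # row e is v_e

P = np.eye(n) - np.ones((n, n)) / n            # projection orthogonal to all-ones
print(np.allclose(Vecs.T @ Vecs, P))           # True: sum_e v_e v_e^T = I on that space

# ||v_e||^2 equals the effective resistance of edge e.
eff_res = np.array([B[e] @ Lplus @ B[e] for e in range(len(edges))])
print(np.allclose(np.sum(Vecs ** 2, axis=1), eff_res))   # True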

Bibliography

[1] Achlioptas, D. Database-friendly random projections. In PODS '01: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (New York, NY, USA, 2001), ACM Press, pp. 274–281.
[2] Achlioptas, D., and McSherry, F. Fast computation of low rank matrix approximations. In STOC '01: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing (New York, NY, USA, 2001), ACM, pp. 611–618.
[3] Ahlswede, R., and Winter, A. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory 48, 3 (March 2002), 569–579.
[4] Akemann, C. A., and Anderson, J. Lyapunov theorems for operator algebras. Mem. Amer. Math. Soc. 94 (1991).
[5] Alon, N., and Chung, F. Explicit construction of linear sized tolerant networks. Discrete Mathematics 72 (1988), 15–19.
[6] Arora, S., Hazan, E., and Kale, S. A fast random sampling algorithm for sparsifying matrices. In APPROX-RANDOM (2006), pp. 272–279.
[7] Ball, K. An elementary introduction to modern convex geometry. In Flavors of Geometry (1997), Univ. Press, pp. 1–58.
[8] Batson, J. D., Spielman, D. A., and Srivastava, N. Twice-Ramanujan sparsifiers. In STOC '09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing (New York, NY, USA, 2009), ACM, pp. 255–262.
[9] Benczúr, A. A., and Karger, D. R. Approximating s-t minimum cuts in Õ(n²) time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing (STOC '96) (New York, USA, May 1996), ACM Press, pp. 47–55.
[10] Benczúr, A. A., and Karger, D. R. Approximating s-t minimum cuts in Õ(n²) time. In STOC (1996), pp. 47–55.
[11] Berman, K., Halpern, H., Kaftal, V., and Weiss, G. Matrix norm inequalities and the relative Dixmier property. Integral Equations and Operator Theory 11 (1988), 28–48.
[12] Bilu, Y., and Linial, N. Constructing expander graphs by 2-lifts and discrepancy vs. spectral gap. In FOCS (2004), pp. 404–412.
[13] Bollobás, B. Modern Graph Theory. Springer, July 1998.
[14] Bourgain, J., and Tzafriri, L. Invertibility of "large" submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel Journal of Mathematics 57 (1987), 137–224.
[15] Bourgain, J., and Tzafriri, L. On a problem of Kadison and Singer. J. Reine Angew. Math. 420 (1991), 1–43.
[16] Casazza, P., and Tremain, J. Revisiting the Bourgain-Tzafriri Restricted Invertibility Theorem. Operators and Matrices 3 (2009), 97–110.
[17] Casazza, P. G., and Tremain, J. C. The Kadison-Singer problem in mathematics and engineering. Proceedings of the National Academy of Sciences of the United States of America 103, 7 (2006), 2032–2039.
[18] Chew, P. There is a planar graph almost as good as the complete graph. In SCG '86: Proceedings of the Second Annual Symposium on Computational Geometry (1986), ACM, pp. 169–177.
[19] Chung, F. R. K. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.
[20] Dette, H., and Studden, W. J. Some new asymptotic properties for the zeros of Jacobi, Laguerre, and Hermite polynomials. Constructive Approximation 11, 2 (1995), 227–238.
[21] Drineas, P., and Kannan, R. Fast Monte-Carlo algorithms for approximate matrix multiplication. In FOCS '01 (2001), pp. 452–459.
[22] Drineas, P., and Kannan, R. Pass efficient algorithms for approximating large matrices. In SODA '03 (2003), pp. 223–232.
[23] Frieze, A., Kannan, R., and Vempala, S. Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM 51, 6 (2004), 1025–1041.
[24] Giannopoulos, A., and Milman, V. Extremal problems and isotropic positions of convex bodies.
[25] Godsil, C., and Royle, G. Algebraic Graph Theory. Graduate Texts in Mathematics. Springer, 2001.
[26] Goldberg, A. V., and Tarjan, R. E. A new approach to the maximum flow problem. In STOC '86: Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing (New York, NY, USA, 1986), ACM, pp. 136–146.
[27] Golub, G. H., and Van Loan, C. F. Matrix Computations, 3rd Edition. The Johns Hopkins University Press, Baltimore, MD, 1996.
[28] Gruber, P. M. Minimal ellipsoids and their duals. Rendiconti del Circolo Matematico di Palermo 37, 1 (1988), 35–64.
[29] Guattery, S., and Miller, G. L. Graph embeddings and Laplacian eigenvalues. SIAM J. Matrix Anal. Appl. 21, 3 (2000), 703–723.
[30] Hoory, S., Linial, N., and Wigderson, A. Expander graphs and their applications. Bulletin of the American Mathematical Society 43, 4 (2006), 439–561.
[31] Johnson, W., and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26 (1984), 189–206.
[32] Khandekar, R., Rao, S., and Vazirani, U. Graph partitioning using single commodity flows. In STOC '06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (New York, NY, USA, 2006), ACM, pp. 385–390.
[33] Lubotzky, A., Phillips, R., and Sarnak, P. Ramanujan graphs. Combinatorica 8, 3 (1988), 261–277.
[34] Lugosi, G. Concentration-of-measure inequalities, 2003. Available at http://www.econ.upf.edu/~lugosi/anu.ps.
[35] Margulis, G. A. Explicit group theoretical constructions of combinatorial schemes and their application to the design of expanders and concentrators. Problems of Information Transmission 24, 1 (July 1988), 39–46.
[36] Nilli, A. On the second eigenvalue of a graph. Discrete Mathematics 91, 2 (1991), 207–210.
[37] Reingold, O., Vadhan, S., and Wigderson, A. Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors. In Annual IEEE Symposium on Foundations of Computer Science (2000), p. 3.
[38] Rudelson, M. Contact points of convex bodies. Israel J. Math. 101, 1 (1997), 93–124.
[39] Rudelson, M. Random vectors in the isotropic position. J. of Functional Analysis 164, 1 (1999), 60–72.
[40] Rudelson, M., and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. ACM 54, 4 (2007), 21.
[41] Spielman, D. A., and Srivastava, N. Graph sparsification by effective resistances. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (2008), pp. 563–568. Full version available at http://arXiv.org/abs/0803.0929.
[42] Spielman, D. A., and Teng, S.-H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing (STOC '04) (2004), pp. 81–90.
[43] Spielman, D. A., and Teng, S.-H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing (STOC '04) (New York, June 13–15, 2004), ACM Press, pp. 81–90. Full version available at http://arxiv.org/abs/cs.DS/0310051.
[44] Spielman, D. A., and Teng, S.-H. Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems, 2006. Available at http://www.arxiv.org/abs/cs.NA/0607105.
[45] Spielman, D. A., and Teng, S.-H. Spectral sparsification of graphs. CoRR abs/0808.4134 (2008). Available at http://arxiv.org/abs/0808.4134.
[46] Thorup, M., and Zwick, U. Approximate distance oracles, pp. 183–192.
[47] Tropp, J. A. Column subset selection, matrix factorization, and eigenvalue optimization. In SODA '09: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Philadelphia, PA, USA, 2009), Society for Industrial and Applied Mathematics, pp. 978–986.
[48] Tropp, J. A. Topics in Sparse Approximation. Ph.D. dissertation, Computational and Applied Mathematics, Univ. Texas at Austin, Aug. 2004.
[49] Vershynin, R. John's decompositions: Selecting a large part. Israel Journal of Mathematics 122 (2001), 253–277.
[50] Vershynin, R. A note on sums of independent random matrices after Ahlswede-Winter, 2009. Available at http://www-personal.umich.edu/~romanv/teaching/reading-group/ahlswede-winter.pdf.
[51] Weaver, N. The Kadison-Singer problem in discrepancy theory. Discrete Mathematics 278, 1-3 (2004), 227–239.
